0% found this document useful (0 votes)
12 views21 pages

NP Compl

This chapter discusses NP-completeness and the challenges of solving algorithmic problems efficiently. It introduces key concepts in complexity theory, including the definitions of complexity classes such as P, NP, NP-hard, and NP-complete, emphasizing the distinction between problems that can be solved in polynomial time and those that cannot. The chapter also highlights the importance of decision problems and verification algorithms in understanding the hardness of computational problems.

Uploaded by

leconstheoriques
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views21 pages

NP Compl

This chapter discusses NP-completeness and the challenges of solving algorithmic problems efficiently. It introduces key concepts in complexity theory, including the definitions of complexity classes such as P, NP, NP-hard, and NP-complete, emphasizing the distinction between problems that can be solved in polynomial time and those that cannot. The chapter also highlights the importance of decision problems and verification algorithms in understanding the hardness of computational problems.

Uploaded by

leconstheoriques
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

CHAPTER 34

NP-Completeness
Complexity Theory
• At this point of the semester we have been building up your ``bag of tricks'' for
solving algorithmic problems.
• Hopefully when presented with a problem you now have a little better idea of
how to go about solving the problem.
• what sort of design paradigm should be used (divideandconquer, DFS, greedy,
dynamic programming, parallel),
• what sort of data structures might be relevant (trees, heaps, graphs), and
• what representations would be best (adjacency list, adjacency matrices),
• what is the running time of your algorithm.
• All of this is fine if it helps you discover an acceptably efficient algorithm to solve
your problem.
• The question that often arises in practice is that you have tried every trick in the
book, and nothing seems to work. Although your algorithm can solve small
problems reasonably efficiently (e.g. n < 20), the really large applications that you
want to solve (e.g. n = 1.000 or n = 10.000) your algorithm never terminates.
• When you analyze its running time, you realize that it is running in exponential
n
time, perhaps n n , or 2n , or 22 , or n!, or worse!
Complexity Theory (cont.)
• Near the end of the 60's where there was great success in finding efficient
solutions to many combinatorial problems, but there was also a growing list of
problems for which there seemed to be no known efficient algorithmic solutions.
• People began to wonder whether there was some unknown paradigm that would
lead to a solution to these problems, or perhaps some proof that these problems
are inherently hard to solve and no algorithmic solutions exist that run under
exponential time.
• Near the end of the 60's a remarkable discovery was made. Many of these hard
problems were interrelated in the sense that if you could solve any one of them in
polynomial time, then you could solve all of them in polynomial time.
• This discovery gave rise to the notion of NPcompleteness, and created possibly
the biggest open problems in computer science: is P = NP?
• We will be studying this concept over the next few lectures.
• This area is a radical departure from what we have been doing because the
emphasis will change.
• The goal is no longer to prove that a problem can be solved efficiently by
presenting an algorithm for it.
• Instead we will be trying to show that a problem cannot be solved efficiently.
The question is how to do this?
Laying down the rules
• We need some way to separate the class of efficiently solvable problems from
inefficiently solvable problems.
• We will do this by considering problems that can be solved in polynomial time.
• When designing algorithms it has been possible for us to be rather informal with various
concepts. We have made use of the fact that an intelligent programmer could fill in any
missing details.
• However, the task of proving that something cannot be done efficiently must be handled
much more carefully, since we do not want leave any ``loopholes'' that would allow
someone to subvert the rules in an unreasonable way and claim to have an efficient
solution when one does not really exist.
• We have measured the running time of algorithms using worstcase complexity, as a
function of n, the size of the input. We have defined input size variously for different
problems, but the bottom line is the number of bits (or bytes) that it takes to represent the
input using any reasonably efficient encoding.
• By a reasonably efficient encoding, we assume that there is not some significantly shorter
way of providing the same information. For example, you could write numbers in unary
notation 11111111  100  8 rather than binary, but that would be unacceptably
1 2
inefficient.
• You could describe graphs in some highly inefficient way, such as by listing all of its
cycles, but this would also be unacceptable.
• We will assume that numbers are expressed in binary or some higher base and graphs are
expressed using either adjacency matrices or adjacency lists.
Laying down the rules (cont.)
• We will usually restrict numeric inputs to be integers (as opposed to calling them ``reals''),
so that it is clear that arithmetic can be performed efficiently.
• We have also assumed that operations on numbers can be performed in constant time.
From now on, we should be more careful and assume that arithmetic operations require at
least as much time as there are bits of precision in the numbers being stored.
• Up until now all the algorithms we have seen have had the property that their worstcase
running times are bounded above by some polynomial in the input size, n.
• A polynomial time algorithm is any algorithm that runs in time O(n k ) where k is some
constant that is independent of n.
• A problem is said to be solvable in polynomial time if there is a polynomial time
algorithm that solves it.
• Some functions that do not ``look'' like polynomials (such as O(n log n)) are bounded
above by polynomials (such as O(n 2 ) ).Some functions that do ``look'' like polynomials are
not. For example, supposek you have an algorithm which inputs a graph of size n and an
integer k and runs in O(n ) time. Is this a polynomial? No, because k is an input to the
problem, so the user is allowed to choose k = n, implying that the running time would be
O(n n ) which is not a polynomial in n. The important thing is that the exponent must be a
constant independent of n.
• Of course, saying that all polynomial time algorithms are ``efficient'' is untrue. An
algorithm whose running time is O(n1000) is certainly pretty inefficient. Nonetheless, if an
algorithm runs in worse than polynomial time (e.g.O(2n ) ), then it is certainly not efficient,
except for very small values of n.
Decision Problems
• Many of the problems that we have discussed involve optimization of one form
or another: find an optimal parenthesization in the matrix-chain multiplication
problem, select a maximum-size subset of mutually compatible activities in the
activity-selection problem, find the minimum weight triangulation.
• For rather technical reasons, most NPcomplete problems that we will discuss will
be phrased as decision problems.
• A problem is called a decision problem if its output is a simple ``yes'' or ``no'' (or
you may think of this as True/False, 0/1, accept/reject).
• We will phrase many optimization problems in terms of decision problems. For
example, the minimum spanning tree decision problem might be: Given a weighted
graph G and an integer k, does G have a spanning tree whose weight is at most k?
• This may seem like a less interesting formulation of the problem. It does not ask
for the weight of the minimum spanning tree, and it does not even ask for the edges
of the spanning tree that achieves this weight.
• However, our job will be to show that certain problems cannot be solved
efficiently.
• If we show that the simple decision problem cannot be solved efficiently, then the
more general optimization problem certainly cannot be solved efficiently either.
Language Recognition Problems
• Observe that a decision problem can also be thought of as a language
recognition problem. We could define a language L
L = {(G, k) : G has a MST of weight at most k}:
• This set consists of pairs, the first element is a graph (e.g. the adjacency matrix
encoded as a string) followed by an integer k encoded as a binary number.
• At first it may seem strange expressing a graph as a string, but obviously anything
that is represented in a computer is broken down somehow into a string of bits.
• When presented with an input string (G, k), the algorithm would answer ``yes'' if
(G, k)  L implying that G has a spanning tree of weight at most k, and ``no''
otherwise.
• In the first case we say that the algorithm ``accepts'' the input and otherwise it
``rejects'' the input.
• Given any language, we can ask the question of how hard it is to determine
whether a given string is in the language.
• For example, in the case of the MST language L, we can determine membership
easily in polynomial time. We just store the graph internally, run Kruskal's
algorithm, and see whether the final optimal weight is at most k. If so we accept,
and otherwise we reject.
Complexity Classes: Definitions
• Define P to be the set of all languages for which membership can be tested in
polynomial time. (Intuitively, this corresponds to the set of all decisions problems
that can be solved in polynomial time.)
• Note that languages are sets of strings, and P is a set of languages. P is defined in
terms of how hard it is computationally to recognized membership in the language.
• A set of languages that is defined in terms of how hard it is to determine
membership is called a complexity class.
• Since we can compute minimum spanning trees in polynomial time, we have L P.
• Here is a harder one. M = {(G, k) : G has a simple path of length at least k}
• Given a graph G and integer k how would you ``recognize'' whether it is in the
language M?
• You might try searching the graph for a simple paths, until finding one of length at
least k.
• If you find one then you can accept and terminate. However, if not then you may
spend a lot of time searching (especially if k is large, like n-1, and no such path
exists). So is M  P? No one knows the answer. In fact, we will see that M is NP-
complete.
•In what follows, we will be introducing a number of classes. We will jump back
and forth between the terms ``language'' and ``decision problems'', but for our
purposes they mean the same things.
Definitions (cont.)
• Before giving all the technical definitions, let us say a bit about what the general
classes look like at an intuitive level.
P: This is the set of all decision problems that can be solved in polynomial time. We
will generally refer to these problems as being ``easy'' or ``efficiently solvable''.
(Although this may be an exaggeration in many cases.)
NP: This is the set of all decision problems that can be verified in polynomial time.
(We will give a definition of this below.) This class contains P as a subset. Thus, it
contains a number of easy problems, but it also contains a number of problems that
are believed to be very hard to solve. The term NP does not mean ``not polynomial''.
Originally the term meant ``nondeterministic polynomial time''. But it is bit more
intuitive to explain the concept from the perspective of verification.
NPhard: In spite of its name, to say that problem is NPhard does not mean that it
is hard to solve. Rather it means that if we could solve this problem in polynomial
time, then we could solve all NP problems in polynomial time. Note that for a
problem to be NP-hard, it does not have to be in the class NP. Since it is widely
believed that all NP problems are not solvable in polynomial time, it is widely
believed that no NPhard problem is solvable in polynomial time.
NPcomplete: A problem is NPcomplete if (1) it is in NP, and (2) it is NPhard.
That is, NP-complete = NPNPhard.
Definitions (cont.)
• The figure below illustrates one way that the sets P, NP, NPhard, and NP-
complete (NPC) might look. We say might because we do not know whether all of
these complexity classes are distinct or whether they are all solvable in polynomial
time.
•There are some problems in the figure
that we will not discuss.
• One is Graph Isomorphism, which asks
whether two graphs are identical up to a
renaming of their vertices. It is known that
this problem is in NP, but it is not known
to be in P.
• The other is QBF, which stands for
Quantified Boolean Formulas. In this
problem you are given a boolean formula
with quantifiers (  and  ) and you want to
know whether the formula is true or false.
This problem is beyond the scope of this
course, but may be discussed in an
advanced course on complexity theory.
Polynomial Time Verification and Certificates
• Before talking about the class of NPcomplete problems, it is important to introduce the
notion of a verification algorithm.
• Many language recognition problems that may be very hard to solve, but they have the
property that it is easy to verify whether a string is in the language.
• Consider the following problem, called the Hamiltonian cycle problem.
• Given an undirected graph G, does G have a cycle that visits every vertex exactly once.
(There is a similar problem on directed graphs, and there is also a version which asks whether
there is a path that visits all vertices.)
• We can describe this problem as a language recognition problem, where the language is
HC = {(G) : G has a Hamiltonian cycle},
where (G) denotes an encoding of a graph G as a string.
• The Hamiltonian cycle problem seems to be much harder, and there is no known
polynomial time algorithm for this problem.
• For example, the figure below shows two graphs, one which is Hamiltonian and one which
is not. However, suppose that a graph did have a Hamiltonian cycle. Then it would be a very
easy matter for someone to convince us of this. They would simply say ``the cycle is
v3 , v7 , v1 ,..., v13 “.
•We could then inspect the graph, and check that this
is indeed a legal cycle and that it visits all the
vertices of the graph exactly once.
• Thus, even though we know of no efficient way to
solve the Hamiltonian cycle problem, there is a very
efficient way to verify that a given graph is in HC.
Polynomial Time Verification and Certificates (cont.)
• The given cycle is called a certificate. This is some piece of information which
allows us to verify that a given string is in a language.
• More formally, given a language L, and given x  L, a verification algorithm is an
algorithm which given x and a string y called the certificate, can verify that x is in
the language L using this certificate as help. If x is not in L then there is nothing to
verify.
• Note that not all languages have the property that they are easy to verify. For
example, consider the following languages:
UHC = {(G) : G has a unique Hamiltonian cycle},
HC = {(G) : G has no Hamiltonian cycle}.
• Suppose that a graph G is in the language UHC. What information would someone
give us that would allow us to verify that G is indeed in the language? They could
give us an example of the unique Hamiltonian cycle, and we could verify that it is a
Hamiltonian cycle, but what sort of certificate could they give us to convince us that
this is the only one?
• They could give another cycle that is NOT Hamiltonian, but this does not mean
that there is not another cycle somewhere that is Hamiltonian. They could try to list
every other cycle of length n, but this would not be at all efficient, since there are n!
possible cycles in general. Thus, it is hard to imagine that someone could give us
some information that would allow us to efficiently convince ourselves that a given
graph is in the language.
The Class NP
Definition: Define NP to be the set of all languages that can be verified by a
polynomial time algorithm.
• Why is the set called ``NP'' rather than ``VP''? The original term NP stood for
``non-deterministic polynomial time''. This referred to a program running on a non-
deterministic computer that can make guesses. Basically, such a computer could
non-deterministically guess the value of certificate, and then verify that the string is
in the language in polynomial time.
• We have avoided introducing non-determinism here. It would be covered in a
course on complexity theory or formal language theory.
• Like P, NP is a set of languages based on some complexity measure (the
complexity of verification). Observe that P NP. In other words, if we can solve a
problem in polynomial time, then we can certainly verify membership in
polynomial time. (More formally, we do not even need to see a certificate to solve
the problem, we can solve it in polynomial time anyway).
• However it is not known whether P = NP. It seems unreasonable to think that this
should be so. In other words, just being able to verify that you have a correct
solution does not help you in finding the actual solution very much.
• Most experts believe that P  NP, but no one has a proof of this.
• Next time we will define the notions of NPhard and NPcomplete.
Summary of the previous lecture
• Last time we introduced a number of concepts, on the way to defining NP-
completeness. In particular, the following concepts are important.
• Decision Problems: are problems for which the answer is either yes or no. NP-
complete problems are expressed as decision problems, and hence can be thought of
as language recognition problems, assuming that the input has been encoded as a
string. We encode inputs as strings.
• For example: HC = {G : G has a Hamiltonian cycle}
MST = {(G; x): G has a MST of cost at most x}.
• P: is the class of all decision problems which can be solved in polynomial time,
O(n k ) for some constant k. For example MSTP but HC is not known (and
suspected not) to be in P.
• Certificate: is a piece of evidence that allows us to verify in polynomial time that a
string is in a given language. For example, suppose that the language is the set of
Hamiltonian graphs. To convince someone that a graph is in this language, we could
supply the certificate consisting of a sequence of vertices along the cycle. It is easy
to access the adjacency matrix to determine that this is a legitimate cycle in G.
Therefore HC NP.
• NP: is defined to be the class of all languages that can be verified in polynomial
time. Note that since all languages in P can be solved in polynomial time, they can
certainly be verified in polynomial time, so we have P NP. However, NP also
seems to have some pretty hard problems to solve, such as HC.
NP-Completeness: Reductions
• The class of NPcomplete problems consists of a set of decision problems (languages) (a
subset of the class NP) that no one knows how to solve efficiently, but if there were a
polynomial time solution for even a single NPcomplete problem, then every problem in NP
would be solvable in polynomial time.
• To establish this, we need to introduce the concept of a reduction.
• Before discussing reductions, let us just consider the following question. Suppose that there
are two problems, A and B. You know (or you strongly believe at least) that it is impossible to
solve problem A in polynomial time. You want to prove that B cannot be solved in
polynomial time. How would you do this?
• We want to show that (A P)  (B P).
• To do this, we could prove the contrapositive, (B P)  (A  P):
• In other words, to show that B is not solvable in polynomial time, we will suppose that there
is an algorithm that solves B in polynomial time, and then derive a contradiction by showing
that A can be solved in polynomial time.
• How do we do this? Suppose that we have a subroutine that can solve any instance of
problem B in polynomial time. Then all we need to do is to show that we can use this
subroutine to solve problem A in polynomial time. Thus we have ``reduced'' problem A to
problem B.
• It is important to note here that this supposed subroutine is really a fantasy. We know (or
strongly believe) that A cannot be solved in polynomial time, thus we are essentially proving
that the subroutine cannot exist, implying that B cannot be solved in polynomial time.
• Be sure that you understand this, this is the basis behind all reductions.
Example: 3-Colorability and Clique Cover
• Let us consider an example to make this clearer. The following problem is well-
known to be NPcomplete, and hence it is strongly believed that the problem cannot
be solved in polynomial time.
3coloring (3Col): Given a graph G, can each of its vertices be labeled with one of 3
different ``colors'', such that no two adjacent vertices have the same label.
• Coloring arises in various partitioning problems, where there is a constraint that
two objects cannot be assigned to the same set of the partition. The term ``coloring''
comes from the original application which was in map drawing. Two countries that
share a common border should be colored with different colors. It is well known
that planar graphs can be colored with 4 colors, and there exists a polynomial time
algorithm for this. But determining whether 3 colors are possible (even for planar
graphs) seems to be hard and there is no known polynomial time algorithm.
• In the figure below we give two graphs. One which can be colored with 3 colors,
and one that cannot.

• The 3Col problem will play the role of


problem A, which we strongly suspect to not
be solvable in polynomial time.
• For our problem B, consider the following problem.
• Given a graph G = (V, E), we say that a subset of
vertices V’ V forms a clique if for every pair of
vertices u, v V’, (u, v)  E. That is, the subgraph
induced by V’ is a complete graph.
Clique Cover (CCov): Given a graph G and an integer k, can we find k subsets of
vertices V1 ,V2 ,...,Vk , such that  i Vi  V , and that each Vi is a clique of G.
• The clique cover problem arises in applications of clustering. We put an edge
between two nodes if they are similar enough to be clustered in the same group. We
want to know whether it is possible to cluster all the vertices into k groups.
• Suppose that you want to solve the CCov problem, but after a while of fruitless
effort, you still cannot find a polynomial time algorithm for the CCov problem. How
can you prove that CCov is likely to not have a polynomial time solution?
• You know that 3Col is NPcomplete, and hence experts believe that 3Col P. You
feel that there is some connection between the CCov problem and the 3Col problem.
• Thus, you want to show that (3Col  P)  (CCov  P), which you will show by
proving the contrapositive (CCov  P)  (3Col  P).
• To do this, you assume that you have access to a subroutine CCov(G, k). Given a
graph G and an integer k, this subroutine returns true if G has a clique cover of size
k and false otherwise, and furthermore, this subroutine runs in polynomial time.
• How can we use this ``alleged'' subroutine to solve the wellknown hard 3Col
problem?
• We want to write a polynomial time subroutine for 3Col, and this subroutine is
allowed to call the subroutine CCov(G, k) for any graph G and any integer k.
• Both problems involve partitioning the vertices up into groups. The only
difference here is that in one problem the number of cliques is specified as part of
the input and in the other the number of color classes is fixed at 3.
• In the clique cover problem, for two vertices to be in the same group they must be
adjacent to each other. In the 3coloring problem, for two vertices to be in the same
color group, they must not be adjacent.
• In some sense, the problems are almost the same, but the requirement adjacent/
nonadjacent is exactly reversed.
• We claim that we can reduce the 3coloring problem to the clique cover problem as
follows.
• Given a graph G for which we want to determine its 3colorability, output the pair
(G, 3) where G denotes the complement of G. We can then feed the pair (G, 3) into
a subroutine for clique cover. This is illustrated in the figure below.
Claim: A graph G is 3colorable if and only if its complement G has a cliquecover of
size 3.
In other words, G  3Col iff (G, 3)  CCov.
Proof: () If G 3colorable, then let V1 ,V2 ,V3 be the three color classes. We claim
that this is a clique cover of size 3 for G, since if u and v are distinct vertices in Vi ,
then (u, v)  E(G) (since adjacent vertices cannot have the same color) which
implies that (u, v)  E(G). Thus every pair of distinct vertices in Vi are adjacent in
G.
() Suppose G has a clique cover of size 3, denoted V1 ,V2 ,V3 . For i =1,2,3 give
the vertices of Vi color i. We assert that this is a legal coloring for G, since if
distinct vertices u and v are both in Vi , then (u, v)  E(G) (since they are in a
common clique), implying that (u, v)  E(G). Hence, two vertices with the same
color are not adjacent.
• We now take this intuition of reducing one problem to another through the use of a
subroutine call, and place it on more formal footing.
• Notice that in the example above, we converted an instance of the 3coloring
problem (G) into an equivalent instance of the Clique Cover problem (G, 3).
Polynomial – Time Reduction
Definition: We say that a language (i.e. decision problem) L1 is polynomialtime
reducible to language L2 (written L1  P L2 ) if there is a polynomial time
computable function f , such that for all x, x  L1 if and only if f(x) L2 .
• In the previous example we showed that 3Col  P CCov.
• In particular we have f(G) = (G, 3). Note that it is easy to complement a graph in
O(n 2 ) (i.e. polynomial) time (e.g. flip 0's and 1's in the adjacency matrix). Thus f
is computable in polynomial time.
• Intuitively, saying that L1  P L2 means that ``if L2 is solvable in polynomial time,
then so is L1 .'' This is because a polynomial time subroutine for L2 could be
applied to f(x) to determine whether f(x) L2 , or equivalently whether x  L1 .
• Thus, in sense of polynomial time computability, L1 is ``no harder'' than L2 .
• The way in which this is used in NPcompleteness is exactly the converse. We
usually have strong evidence that L1 is not solvable in polynomial time, and hence
the reduction is effectively equivalent to saying ``since L1 is not likely to be
solvable in polynomial time, then L2 is also not likely to be solvable in polynomial
time.''
• Thus, this is how polynomial time reductions can be used to show that problems
are as hard to solve as known difficult problems.
Lemma : If L1  P L2 and L2  P then L1 P.
Lemma : If L1  P L2 and L1 P then L2  P.
NP-Completeness
• One important fact about reducibility is that it is transitive. In other words
Lemma : If L1  P L2 and L2  P L3 then L1  P L3.
• The reason is that if two functions f(x) and g(x) are computable in polynomial
time, then their composition f(g(x)) is computable in polynomial time as well.
NPcompleteness: The set of NPcomplete problems are all problems in the
complexity class NP, for which it is known that if any one is solvable in polynomial
time, then they all are, and conversely, if any one is not solvable in polynomial time,
then none are.
• This is made mathematically formal using the notion of polynomial time
reductions.
Definition: A language L is NPhard if: L’  PL for all L’  NP.
Definition: A language L is NPcomplete if: L NP, and L is NPhard.
• An alternative (and usually easier way) to show that a problem is NPcomplete is to
use transitivity.
Lemma: L is NPcomplete if (1) L NP and
(2) L’  P L for some known NPcomplete language L’.
• The reason is that all L’’  NP are reducible to L’ (since L’ is NPcomplete and
hence NPhard) and hence by transitivity L’’ is reducible to L, implying that L is
NPhard.
NP-Completeness (cont.)
• This gives us a way to prove that problems are NPcomplete, once we know that
one problem is NPcomplete.
• Unfortunately, it appears to be almost impossible to prove that one problem is NP-
complete, because the definition says that we have to be able to reduce every
problem in NP to this problem.
• There are infinitely many such problems, so how can we ever hope to do this?
• We will talk about this next time with Cook's theorem. Cook showed that there is
one problem called SAT (short for boolean satisfiability) that is NPcomplete.
• To prove a second problem is NPcomplete, all we need to do is to show that our
problem is in NP (and hence it is reducible to SAT), and then to show that we can
reduce SAT (or generally some known NPC problem) to our problem. It follows
that our problem is equivalent to SAT (with respect to solvability in polynomial
time). This is illustrated in the figure below.

You might also like