
6 Dynamic programming

Dynamic programming is about identifying and solving subproblems, then putting them together to solve larger problems.

6.1 Shortest paths in dags, revisited


The special distinguishing feature of a dag is that its nodes can be linearized; that is, they can be arranged on a line
so that all edges go from left to right.

Algorithm:
Let u1 , ..., un be a topological sort of the nodes
d[ ] = ∞ , d[s] = 0
for i = 1, ..., n:
    for each v in Adj[ui]:
        d[v] = min(d[v], d[ui] + w[ui, v])

Explanation:
We initialize an array d[ ] to store the shortest distances from the source node s to each node in the DAG: every entry of d[ ] starts at ∞ except d[s], which is set to 0. Then, for each node ui in topological order, we iterate over each neighbor v of ui and update the shortest distance d[v] by taking the minimum between its current value d[v] and the sum of the distance to ui (d[ui]) and the weight of the edge from ui to v (w[ui, v]).
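As a sketch, here is a direct Python implementation of the relaxation above (the graph representation, the helper name dag_shortest_paths and the small example are illustrative, not from the notes):

import math

def dag_shortest_paths(adj, topo_order, s):
    # d[u] = shortest distance from s to u, initially infinity except d[s] = 0
    d = {u: math.inf for u in topo_order}
    d[s] = 0
    for u in topo_order:                 # process nodes left to right
        for v, w in adj[u]:              # relax every outgoing edge (u, v)
            d[v] = min(d[v], d[u] + w)
    return d

# Example DAG: s -> a (1), s -> b (4), a -> b (2), b -> t (1)
adj = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2)], 'b': [('t', 1)], 't': []}
print(dag_shortest_paths(adj, ['s', 'a', 'b', 't'], 's'))  # d['t'] = 4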

6.2 Longest increasing subsequences


For a sequence a1 , a2 , ..., an we want to find the length of the longest increasing subsequence ai1 , ai2 , ..., aik , that is
just a sequence of elements where each one of them has both a larger value and a larger index than the previous one:
i1 < i2 < ... < ik , and ai1 < ai2 < ... < aik .

Example:
Consider A=[3, 1, 8, 2, 5], in this case the longest increasing subsequence is [1, 2, 5], with a total length of 3.
To solve this kind of problem we can imagine that each element of the sequence is a node in a graph, and we construct a directed edge from node i to node j (with i < j) whenever the element at index j is greater than the element at index i.

We notice that this is a DAG, where an increasing subsequence is just a path in the graph; indeed, the length of the longest increasing subsequence corresponds to the length of the longest path in this DAG + 1.

- We have to find a subproblem: We know that all increasing subsequences have a start and an end, and we focus on the end index of an increasing subsequence.
Define L[k] = length of the longest increasing subsequence ending at index k; for example if k = 3 it would be [1, 2], with a total length of 2.

- We have to find relationships among subproblems: If I want to find the longest increasing subsequence ending at index 4, what subproblems are needed to solve L[4]? → We look at all the edges into our node at index 4: we have an edge from node 0, one from node 1 and one from node 3.
Now we need the length of the longest increasing subsequence ending at index 0, which happens to be 1 (L[0] = 1), then ending at index 1 (L[1] = 1), and finally ending at index 3 (L[3] = 2).
Therefore the length of the longest increasing subsequence ending at index 4 is L[4] = 1 + max{L[0], L[1], L[3]} = 3.

- Now we have to generalize this relationship: L[4] = 1 + max{L[k] : k < 4 and A[k] < A[4]} = 1 + max{L[0], L[1], L[3]}.

- Implement by solving subproblems in order: In this case we have to solve subproblems from left to right.

Algorithm:
Let u1 , ..., un be the nodes in topological order (for a sequence, simply left to right)
for i = 1, ..., n:
    d[ui] = 1
    for each v in AdjIncoming[ui]:
        d[ui] = max(d[ui], 1 + d[v])
return max(d)

Explanation:
We initialize an array d[ ] to store the length of the longest increasing subsequence ending at each node. Every d[ui] starts at 1, as the node by itself forms an increasing subsequence of length 1.
The update d[ui] = max(d[ui], 1 + d[v]) says that the length of the longest increasing subsequence ending at ui is one plus the maximum length of the increasing subsequences ending at its predecessors.
The length of the longest increasing subsequence is the maximum of d[ ].

Running time: The i-th step of the outer loop takes time O(out-deg(ui)), where the out-degree is the number of outgoing edges of ui, so the total time is O(∑_{i=1}^{n} out-deg(ui)); since the sum of the out-degrees of all nodes equals the total number of edges in the graph, this is O(m), where m is the number of edges.
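The recurrence translates directly into Python; the sketch below enumerates predecessors by scanning all earlier indices instead of building the DAG explicitly, which costs O(n²) rather than O(m):

def longest_increasing_subsequence(a):
    n = len(a)
    d = [1] * n                          # each element alone is an increasing subsequence
    for i in range(n):
        for k in range(i):               # predecessors of i in the implicit DAG
            if a[k] < a[i]:
                d[i] = max(d[i], 1 + d[k])
    return max(d) if d else 0

print(longest_increasing_subsequence([3, 1, 8, 2, 5]))  # 3, e.g. [1, 2, 5]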

6.3 Knapsack
Example: During a robbery, a burglar finds much more loot than he had expected and has to decide what to take.
His bag (or “knapsack”) will hold a total weight of at most W pounds, and there are n items to pick from, of weight
w1 , ..., wn , and dollar value v1 , ..., vn .
What’s the most valuable combination of items he can fit into his bag?
For instance, take W = 10 and:

Item  Weight  Value
 1      6      $30
 2      3      $14
 3      4      $16
 4      2      $9

Note that: If this application seems frivolous, replace “weight” with “CPU time” and “only W pounds can be taken”
with “only W units of CPU time are available”.

We distinguish two versions of the problem:

- If there are unlimited quantities of each item available, the optimal choice is to pick item 1 and two of item 4 (total: $48).
- If there is one of each item, then the optimal knapsack contains items 1 and 3 (total: $46).

Knapsack with repetition


This is the case where each item can be included multiple times in the knapsack.

Algorithm:
K(0) = 0
for w = 1 to W:
    K(w) = max{K(w − wi) + vi : wi ≤ w}
return K(W)

Explanation:
We initialize an array K of length W + 1 to store the maximum value achievable for each knapsack capacity.
For each capacity w, we consider each item i whose weight wi does not exceed the current capacity w, and for each such item we compute K(w − wi) + vi, the maximum value achievable by including item i in the knapsack (w − wi is the capacity remaining after including it); K(w) is the best of these values.
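A minimal Python sketch of this recurrence, checked against the burglar example above:

def knapsack_with_repetition(weights, values, W):
    K = [0] * (W + 1)                    # K[w] = best value achievable with capacity w
    for w in range(1, W + 1):
        for wi, vi in zip(weights, values):
            if wi <= w:
                K[w] = max(K[w], K[w - wi] + vi)
    return K[W]

# W = 10, items (6, $30), (3, $14), (4, $16), (2, $9): item 1 plus two of item 4
print(knapsack_with_repetition([6, 3, 4, 2], [30, 14, 16, 9], 10))  # 48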

Knapsack without repetition
Each item can be included in the knapsack at most once.

Algorithm: 0/1 Knapsack problem

Input: weight[i], cost[i], total capacity t
Output: maximum value of a subset of items with total weight ≤ t

for i = 0, ..., n:
    for j = 0, ..., t:
        if i == 0:
            A[0][j] = 0
        else if j < weight[i-1]:
            A[i][j] = A[i-1][j]
        else:
            A[i][j] = max(A[i-1][j], cost[i-1] + A[i-1][j - weight[i-1]])

Explanation:
We initialize a 2D array A of size (n+1) × (t+1), where n is the number of items and t is the total weight capacity of the knapsack.
- If i == 0 there are no items to consider, so A[0][j] = 0.
- If j < weight[i-1], the weight of the current item i-1 is greater than the capacity j, so we cannot include it: we set A[i][j] = A[i-1][j], the value obtained without including the current item.
- If j ≥ weight[i-1], the current item i-1 can be included in the knapsack, and we have two options:
1) Do not include the current item: A[i][j] = A[i-1][j].
2) Include it: A[i][j] = cost[i-1] + A[i-1][j - weight[i-1]], where cost[i-1] is the value of the current item and A[i-1][j - weight[i-1]] is the maximum value achievable with the remaining capacity.
We take the maximum of these two options as the value of A[i][j] (the first option wins exactly when including the current item does not improve the total value).
Once all items and capacities have been considered, the maximum value achievable is stored in A[n][t].

Example:
n = 4, t = 8, cost[i] = {1, 2, 5, 6} , weight[i] = {2, 3, 4, 5}

0 1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 0 0 0
1 0 0 1 1 1 1 1 1 1
2 0 0 1 2 2 3 3 3 3
3 0 0 1 2 5 5 6 7 7
4 0 0 1 2 5 6 6 7 8∗

A[i][j] = max(A[i-1][j], cost[i-1] + A[i-1][j - weight[i-1]])
A[4][8] = max(A[3][8], cost[3] + A[3][8 - 5]) = max(7, 6+2) = 8
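A Python sketch that fills exactly this table row by row (the function name knapsack_01 is illustrative):

def knapsack_01(weight, cost, t):
    n = len(weight)
    A = [[0] * (t + 1) for _ in range(n + 1)]   # A[i][j]: best value, first i items, capacity j
    for i in range(1, n + 1):
        for j in range(t + 1):
            A[i][j] = A[i - 1][j]               # option 1: skip item i-1
            if j >= weight[i - 1]:              # option 2: take item i-1, if it fits
                A[i][j] = max(A[i][j], cost[i - 1] + A[i - 1][j - weight[i - 1]])
    return A

A = knapsack_01([2, 3, 4, 5], [1, 2, 5, 6], 8)
print(A[4][8])  # 8, the starred entry of the table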

Remark: The following algorithm finds the items included in the optimal solution to the 0/1 Knapsack problem.

Algorithm:
A = table(cost[ ], weight[ ])
items = [ ]
j = t (total capacity)
for i in range(n, 0, -1):
    if A[i][j] != A[i-1][j]:
        items.append(i-1)
        j = j - weight[i-1]
return items

Explanation:
The algorithm iterates over the items from n down to 1, where n is the total number of items.
For each item it checks whether A[i][j] ≠ A[i-1][j], comparing the optimal value at the current position (i, j) with the optimal value obtained by excluding the current item (i-1, j).
If the condition is true, the current item is included in the optimal solution, so we append its index to the items list and update j by subtracting the weight of the current item (j = j - weight[i-1]): j always holds the capacity not yet accounted for.
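In Python, continuing from the table A computed by the knapsack_01 sketch above:

def recover_items(A, weight):
    n, j = len(weight), len(A[0]) - 1    # start from the full capacity t
    items = []
    for i in range(n, 0, -1):
        if A[i][j] != A[i - 1][j]:       # item i-1 must be in the optimal solution
            items.append(i - 1)
            j -= weight[i - 1]
    return items

A = knapsack_01([2, 3, 4, 5], [1, 2, 5, 6], 8)   # table from the previous sketch
print(recover_items(A, [2, 3, 4, 5]))  # [3, 1]: weights 5 and 3, values 6 and 2, total value 8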

6.4 Edit distance
Consider the alignment of the strings SNOWY and SUNNY:
S – N O W Y
S U N N – Y
The cost of an alignment is the number of columns in which the letters differ, and the edit distance between two strings is the cost of their best possible alignment.
The edit distance can also be seen as the minimum number of edits (insertions, deletions, and substitutions of characters) needed to transform the first string into the second one.
In this case we have to insert U, substitute O → N, and delete W, therefore the total cost is 3.

Objective: We want to find the edit distance between two strings x[1 ··· m] and y[1 ··· n].
What is a good subproblem? We could look at the edit distance between some prefix of the first string, x[1 ··· i], and
some prefix of the second, y[1 ··· j].
Call this subproblem E(i, j), then our final aim is to compute E(m, n), and for this to work we need to somehow express
E(i, j) in terms of smaller subproblems.
The rightmost column of the alignment can only be one of three things:
(x[i], −)    (−, y[j])    (x[i], y[j])

The 1st case has a cost of 1, and it remains to align x[1 ... i - 1] with y[1 ... j]; this is the subproblem E(i - 1, j).
In the 2nd case, also with cost 1, we still need to align x[1 ... i] with y[1 ... j - 1]; this is the subproblem E(i, j - 1).
The final case, which costs 1 (if x[i] ≠ y[j]) or 0 (if x[i] = y[j]), leaves the subproblem E(i - 1, j - 1).

We have no idea which of them is the right one, so we need to try them all and pick the best:
E (i, j) = min{1 + E(i - 1, j), 1 + E(i, j - 1), diff(i, j) + E(i - 1, j - 1)} , and diff(i, j) is 0 if x[i] = y[j] or 1 otherwise.

Algorithm:
Input: two strings A, B
Output: min # operations transforming A into B

for i = 0, 1, 2, ..., m: (rows)
    E(i, 0) = i
for j = 1, 2, ..., n: (columns)
    E(0, j) = j
for i = 1, 2, ..., m:
    for j = 1, 2, ..., n:
        E(i, j) = min{E(i - 1, j) + 1, E(i, j - 1) + 1, E(i - 1, j - 1) + diff(i, j)}
return E(m, n)

Example:

S N O W Y
0 1 2 3 4 5
S 1 0 1 2 3 4
U 2 1 1 2 3 4
N 3 2 1 2 3 4
N 4 3 2 2 3 4
Y 5 4 3 3 3 3∗

E(i, j) = min{E(i - 1, j) + 1, E(i, j - 1) + 1, E(i - 1, j - 1) + diff(i, j)}
E(5,5) = min{E(4, 5) + 1, E(5, 4) + 1, E(4, 4) + diff(5, 5)} = min{4+1, 3+1, 3+0} = 3
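The whole procedure in Python, reproducing the table above (rows index x = SUNNY, columns index y = SNOWY):

def edit_distance(x, y):
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i                              # delete all of x[1..i]
    for j in range(n + 1):
        E[0][j] = j                              # insert all of y[1..j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(E[i - 1][j] + 1,         # delete x[i]
                          E[i][j - 1] + 1,         # insert y[j]
                          E[i - 1][j - 1] + diff)  # keep or substitute
    return E[m][n]

print(edit_distance("SUNNY", "SNOWY"))  # 3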

6.5 Shortest paths
Floyd-Warshall Algorithm:
It is an "all-pairs shortest path" algorithm: it finds the shortest path between all pairs of nodes.
We represent our graph as a 2D adjacency matrix A, where A[i][j] is the weight of the edge going from node i to j.

Note that the distance from a node to itself is assumed to be zero (this is why we have a diagonal of zeros), while if there is no edge from node i to j we set A[i][j] = +∞.

Main idea:
Gradually build up all intermediate routes between nodes i and j, and then find the optimal path.
Suppose our adjacency matrix tells us that the distance from a to b is A[a][b] = 11, and suppose there exists a third node c: if A[a][c] + A[c][b] < A[a][b], then it is better to route through c.

Input: n × n matrix A, with A[i][j] = weight of the edge from i to j, or +∞ if there is no edge.
Output: D[i][j] = length of the shortest path from i to j.

D = A
for k = 0 ... n-1:
    for i = 0 ... n-1:
        for j = 0 ... n-1:
            D[i][j] = min(D[i][j], D[i][k] + D[k][j])

Explanation:
By iterating through all possible intermediate vertices k, the algorithm gradually builds up the optimal solution, considering first all paths routing through 0, then all paths routing through 0 and 1, then all paths routing through 0, 1, and 2, and so on, until all possible intermediate vertices have been considered.
This ensures that the algorithm finds the shortest paths between all pairs of vertices in the graph.

Case 1) k is not on the shortest path from i to j.
In this case, D[i][j] remains unchanged after the k-th iteration: D[i][j] already contains the length of the shortest path from i to j considering intermediate vertices below k.

Case 2) k is on this path. Before the k-th iteration, D[i][k] and D[k][j] already hold the lengths of the shortest i-k and k-j paths using intermediate vertices below k, so D[i][j] gets updated to D[i][k] + D[k][j], the length of the shortest path from i to j considering intermediate vertices up to k.
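A Python sketch of the triple loop (the toy matrix is illustrative):

import math

def floyd_warshall(A):
    n = len(A)
    D = [row[:] for row in A]            # D starts as a copy of the adjacency matrix
    for k in range(n):                   # allow k as an intermediate node
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

INF = math.inf
A = [[0, 4, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(floyd_warshall(A))  # [[0, 4, 5], [3, 0, 1], [2, 6, 0]]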

7 Flows in networks (and cuts)
Recall that a network is a directed graph with capacities associated to edges, and two special nodes s and t.

Example 1:
Suppose you have a water pipeline network, represented as a directed graph.
Arcs represent the directions in which the water flows, and there are numbers associated with arcs representing the
capacity of the pipe (max amount of water that can be pushed in it).
What is the maximum amount of water that can be pushed from s (source) to t (sink)?
(...) On the ipad

Example 2:
Imagine a group of people that can lend / borrow money from each other.
An arc from person a to person b represents the level of trust; in particular it shows that person a is willing to lend money to person b up to a maximum value expressed by the capacity.
Stefano has money to lend, Tommaso would like to borrow some money. How much money can Tommaso borrow from Stefano?
(...) On the ipad

Useful notation: For a given directed graph G = (V, A) and a given node u ∈ V, define
δ+(u) = {(w, v) ∈ A : w = u}, the arcs outgoing from u (tail = u)
δ−(u) = {(w, v) ∈ A : v = u}, the arcs incoming in u (head = u)

Definition of flow: Let G = (V, A) be a directed graph, and let s ∈ V, t ∈ V be two specified nodes in V.
A function f : A → R+ is called an s − t flow if:
(i) f(a) ≥ 0 for every a ∈ A (non-negativity constraint)
(ii) ∑_{a ∈ δ−(u)} f(a) = ∑_{a ∈ δ+(u)} f(a) for all u ∈ V \ {s, t}
(flow conservation ⇒ the amount of flow entering a vertex u must equal the amount of flow leaving u).

Definition of value: The value of an s − t flow is defined as val(f) = ∑_{a ∈ δ+(s)} f(a) − ∑_{a ∈ δ−(s)} f(a).
⇒ Note that: The value is the net amount of flow leaving s, which is equal to the net amount of flow entering t.

Definition: Let c : A → R+ be a capacity function.
We say that f respects c if f(a) ≤ c(a) for every a ∈ A (capacity constraint).

Maximum flow problem:
Given a directed graph G = (V, A), nodes s ∈ V, t ∈ V, and a capacity function c, find an s − t flow of maximum value that respects c.

Definition of cut: A cut in a graph G = (V, A) is a partition of the set of nodes into two sets, U and V \ U .
In particular, an s − t cut is a cut U where s ∈ U and t ∈ V \ U .

Let's define the capacity of a cut U as the amount of capacity on the arcs going out of U, that is cap(U) = ∑_{(w,v) ∈ A : w ∈ U, v ∈ V\U} c(w, v).

Useful notation: For any U ⊆ V:
δ+(U) = {(w, v) ∈ A : w ∈ U, v ∉ U}
δ−(U) = {(w, v) ∈ A : w ∉ U, v ∈ U}
With this notation cap(U) = ∑_{a ∈ δ+(U)} c(a).

The notion of s − t cuts is useful to certify the maximality of a given flow: the capacity of any s − t cut is an upper bound on the value of any s − t flow, since the capacity of the cut limits the amount of flow that can be sent from s to t.

Proposition: For any s − t flow f respecting c, and any s − t cut δ + (U ), one has val(f ) ≤ cap(δ + (U )).
Proof:
First we observe that for any s − t flow f and any s − t cut δ+(U),
we have val(f) = ∑_{a ∈ δ+(U)} f(a) − ∑_{a ∈ δ−(U)} f(a).
Why is this true?

val(f) = ∑_{a ∈ δ+(s)} f(a) − ∑_{a ∈ δ−(s)} f(a)
       = ∑_{a ∈ δ+(s)} f(a) − ∑_{a ∈ δ−(s)} f(a) + ∑_{v ∈ U\{s}} ( ∑_{a ∈ δ+(v)} f(a) − ∑_{a ∈ δ−(v)} f(a) )
       = ∑_{v ∈ U} ( ∑_{a ∈ δ+(v)} f(a) − ∑_{a ∈ δ−(v)} f(a) )
       = ∑_{a ∈ δ+(U)} f(a) − ∑_{a ∈ δ−(U)} f(a)

(each term added in the second line is 0 by flow conservation, and in the last step the arcs with both endpoints inside U cancel out).

Now note that:
- ∑_{a ∈ δ+(U)} f(a) ≤ ∑_{a ∈ δ+(U)} c(a), because of the capacity constraint.
- ∑_{a ∈ δ−(U)} f(a) ≥ 0, because of the non-negativity constraint.

Therefore val(f) = ∑_{a ∈ δ+(U)} f(a) − ∑_{a ∈ δ−(U)} f(a) ≤ ∑_{a ∈ δ+(U)} c(a) = cap(δ+(U)).

Ford-Fulkerson Algorithm: We want to find the s − t flow of maximum value that respects c.
1) Set f(a) = 0 for all a ∈ A (initially there is no flow on any edge).
2) Construct a residual graph Df = (V, Af), where a ∈ Af if c(a) − f(a) > 0 and a⁻¹ ∈ Af if f(a) > 0.
3) Search for an s − t path P in Df:
3.1) Let the path have arcs {e1, ..., ek}.
Set ϵ = min{ min over forward arcs ei ∈ P with ei ∈ A of c(ei) − f(ei) , min over reverse arcs ei ∈ P with ei⁻¹ ∈ A of f(ei⁻¹) }
3.2) Set f(ei) = f(ei) + ϵ for each ei ∈ P with ei ∈ A,
and f(ei⁻¹) = f(ei⁻¹) − ϵ for each ei ∈ P with ei⁻¹ ∈ A.
3.3) For each ei ∈ P:
    if ei ∈ A and f(ei) = c(ei): remove ei from Af, and add ei⁻¹ to Af if ei⁻¹ ∉ Af
    else if ei⁻¹ ∈ A and f(ei⁻¹) = 0: remove ei from Af, and add ei⁻¹ to Af if ei⁻¹ ∉ Af
4) When such paths do not exist anymore, output f.

7
The idea behind the algorithm:
(1) Start with the flow equal to 0 everywhere.
(2) Construct a residual graph Df = (V, Af), where Af consists of the arcs a such that c(a) − f(a) > 0, indicating that residual capacity is available, and the arcs a⁻¹ such that f(a) > 0, indicating that flow can be pushed back in the reverse direction.
(3) Search for an s − t path P in the residual graph Df.
(3.1) Find the minimum residual capacity ϵ along the path P: this represents the maximum amount of flow that can be pushed along P without violating the capacity constraints.
(3.2) Update the flow along the path P, adding ϵ to the flow of each arc of A traversed forward by P and subtracting ϵ from the flow of each arc of A traversed in reverse by P.
(3.3) Adjust the residual graph Af based on the updated flow:
- If an arc ei reaches its capacity (i.e. f(ei) = c(ei)), remove it from Af.
- If an arc ei no longer carries flow (i.e. f(ei) = 0), remove ei⁻¹ from Af.
- If an arc ei now carries flow (i.e. f(ei) > 0), add ei⁻¹ to Af.
- If an arc ei has residual capacity again (i.e. f(ei) < c(ei)), add ei to Af.
(4) Repeat step (3) until no s − t path exists in the residual graph Df, then return the maximum flow f.
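As a sketch, here is the BFS variant (Edmonds-Karp, anticipated below) in Python; the residual graph is kept implicitly as a dictionary r of residual capacities, so steps (3.2) and (3.3) reduce to updating r (the representation and the small example are illustrative):

from collections import defaultdict, deque

def max_flow(edges, s, t):
    r = defaultdict(lambda: defaultdict(int))    # r[u][v] = residual capacity of arc (u, v)
    for u, v, c in edges:
        r[u][v] += c                             # forward arc starts with capacity c
        r[v][u] += 0                             # reverse arc starts at 0
    value = 0
    while True:
        parent = {s: None}                       # BFS for an s-t path in the residual graph
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in r[u]:
                if v not in parent and r[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return value                         # no augmenting path left: f is maximum
        path, v = [], t                          # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        eps = min(r[u][v] for u, v in path)
        for u, v in path:                        # push eps units of flow along the path
            r[u][v] -= eps                       # forward residual capacity shrinks
            r[v][u] += eps                       # reverse residual capacity grows
        value += eps

edges = [('s', 'a', 3), ('s', 'b', 2), ('a', 'b', 1), ('a', 't', 2), ('b', 't', 3)]
print(max_flow(edges, 's', 't'))  # 5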

Theorem: The flow f output at the end of the algorithm has maximum value.
Proof:
By construction, f is an s − t flow respecting the capacity c:
- At the beginning f is a feasible flow respecting c.
- After each flow update, f(a) ≥ 0 and f(a) ≤ c(a) by our choice of ϵ, and flow conservation holds at every node.
Let's show that f is of maximum value:
Since our algorithm stopped, we know that the residual graph Df does not contain an s − t directed path.
Let U = {w ∈ V : ∃ an s − w path in Df}.
Note that: U is an s − t cut, since s ∈ U and t ∉ U.
If (v, w) ∈ δ+(U), then (v, w) ∉ Af, so f(v, w) = c(v, w).
If (v, w) ∈ δ−(U), then (v, w)⁻¹ ∉ Af, so f(v, w) = 0.
Hence val(f) = ∑_{(w,v) ∈ δ+(U)} f(w, v) − ∑_{(w,v) ∈ δ−(U)} f(w, v) = cap(U).
Since we found a flow whose value equals the capacity of an s − t cut, our flow is maximum.

Theorem: (Max flow / min cut)
The maximum value of an s − t flow (respecting the capacity) is equal to the minimum capacity of an s − t cut, that is max_{s−t flow f} val(f) = min_{s−t cut U} cap(δ+(U)).

⇒ Observation 1) If all capacities are integers, then ∃ an s − t flow of max value with f(a) integer for every a.
⇒ Observation 2) Suppose you are asked to find a cut in G of minimum total capacity, i.e. we want U ⊆ V with U ≠ ∅ and V \ U ≠ ∅ which minimizes cap(U).
Can we find an algorithm? ⇒ Ford & Fulkerson's algorithm: for every ordered pair w, v ∈ V we can set s = w, t = v and compute a min s − t cut, then output the minimum among all such cuts.

Efficiency of the Ford-Fulkerson algorithm:
The running time is O(# iterations × (O(|V|) + O(|E|))), where O(|V|) + O(|E|) is the time needed to find an augmenting path.

We can do so much better! ⇒ Idea (Edmonds & Karp):
use the F & F algorithm with BFS to find the paths in the residual graph.

Theorem: The F & F algorithm with BFS finds a max s − t flow in O(|V||E|(|V| + |E|)) running time.
Proof:
During the execution of our algorithm, arcs can get saturated* multiple times, and we'll exploit BFS to give a bound on how many times this can happen.
Let dT(u) be the distance from s to u in the residual graph at iteration T.
Observation: if T′ is an iteration after T, then dT(u) ≤ dT′(u) (the distances never decrease).
Assume a = (u, v) gets saturated at iteration T and again at iteration T″ (after T).
In order for a to reappear in the residual graph, a⁻¹ must lie on a BFS (shortest) augmenting path at some iteration T′ between T and T″.
But then dT′(u) = dT′(v) + 1 ≥ dT(v) + 1 = dT(u) + 2.
Hence each saturation of an arc increases the distance of its tail by at least 2, so an arc cannot get saturated more than |V|/2 times → the number of iterations is O(|V||E|).

*Definition: An arc is saturated at iteration i if its residual capacity goes to 0 after that iteration.

8 Linear programming
8.1 Introductory examples
A production problem:
Each week, a food industry produces 2 types of flour:
x1 is the quantity of type 1 flour produced per week
x2 is the quantity of type 2 flour produced per week

Profit: 3 for each unit of product of type 1, and 5 for each unit of product of type 2.

Plant 1 has 3 hours available per week


Plant 2 has 21 hours available per week
Plant 3 has 25 hours available per week

Each x1 uses 6 hours of Plant 2 and 3 hours of Plant 3


Each x2 uses 1 hour of Plant 1, 2 hours of Plant 2 and 7 hours of Plant 3

The goal of the company is to maximize z(x1 , x2 ) = 3x1 + 5x2


Summarizing: max z(x1 , x2 ) = 3x1 + 5x2 with x1 , x2 ≥ 0 and x1 , x2 ∈ R , subject to:
x2 ≤ 3 (Plant 1: used hrs less than available hrs)
6x1 + 2x2 ≤ 21 (Plant 2: used hrs less than available hrs)
3x1 + 7x2 ≤ 25 (Plant 3: used hrs less than available hrs)

The optimal solution is x1* = 97/36 ≈ 2.69 and x2* = 29/12 ≈ 2.42.
At (x1*, x2*) the objective is 3 · (97/36) + 5 · (29/12) = 242/12 ≈ 20.17, and the constraints read:
29/12 ≤ 3 (P1: hours left over: NOT BINDING)
21 ≤ 21 (P2: full capacity: BINDING)
25 ≤ 25 (P3: full capacity: BINDING)

Note that: A constraint is binding if the optimal solution occurs at the limit imposed by that constraint (it touches or "binds" the constraint).
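As a quick check, the problem can be handed to an off-the-shelf LP solver; this sketch assumes SciPy is available (scipy.optimize.linprog minimizes, so the objective is negated to maximize):

from scipy.optimize import linprog

res = linprog(c=[-3, -5],                       # maximize 3*x1 + 5*x2
              A_ub=[[0, 1], [6, 2], [3, 7]],    # Plant 1, 2, 3 constraints
              b_ub=[3, 21, 25],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # approx [2.694, 2.417], 20.17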

A mix (or diet) problem:
For a healthy life, every day we must get a minimum quantity of certain substances (vitamins, fats, fiber, etc.) contained in some ingredients (flour, sugar, milk, etc.), which contain the substances in various proportions.
To simplify things, consider just two ingredients (1, 2) and two substances (A, B):

One unit of I1 contains 7 units of substance A and 2 units of substance B


One unit of I2 contains 2 units of substance A and 12 units of substance B
Everyday, at least 28 units of substance A and 24 units of substance B are required.

One unit of I1 costs 5$


One unit of I2 costs 10$

We want to get at least the minimum of both substances paying the minimum cost:
- x1 is the quantity of ingredient 1
- x2 is the quantity of ingredient 2

Summarizing:
min 5x1 + 10x2 with x1 , x2 ∈ R , subject to:
7x1 + 2x2 ≥ 28 (min request for substance A)
2x1 + 12x2 ≥ 24 (min request for substance B)
x1 , x 2 ≥ 0
The minimum cost is 32, attained at (3.6, 1.4).
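The same SciPy sketch works for the diet problem once the "≥" constraints are flipped to "≤" by negating both sides:

from scipy.optimize import linprog

res = linprog(c=[5, 10],                        # minimize the cost
              A_ub=[[-7, -2], [-2, -12]],       # -7x1 - 2x2 <= -28, -2x1 - 12x2 <= -24
              b_ub=[-28, -24],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)  # approx [3.6, 1.4], 32.0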

The production problem, revisited


Suppose that, in the first production problem, x1 and x2 are quantities of barrels of flour (a barrel equals 196 pounds).
In this case, x1 and x2 can’t take decimal values and their domain is instead restricted to the non-negative integers.
The problem then becomes:
Maximize 3x1 + 5x2 with x1 , x2 ∈ N , subject to:
x2 ≤ 3 (Plant 1: used hrs less than available hrs)
6x1 + 2x2 ≤ 21 (Plant 2: used hrs less than available hrs)
3x1 + 7x2 ≤ 25 (Plant 3: used hrs less than available hrs)
x1 , x 2 ≥ 0

In this simple case we can actually examine all the feasible solutions: the points with integer coordinates in the feasible set found for the first case.
The optimal solution is found to be (1, 3), with a value of 18 for the profit ("far" from the non-integer optimal solution).

8.2 Formalization
Names: In the previous models, a choice of the value for the variables represents a decision about an activity to be
done, and this is called “programming”, in the sense of planning.
These problems are called programming problems, while the variables are called decision variables (sometimes, the
problem itself is called “program”).

Variables: The modeling dictates the type of variables that have to be used in the problem: real numbers (when decimal values are legal), integers or even binary numbers.

Constraints: Very often, variables are restricted in sign and can only take non-negative values.
In addition there are further requirements on the values, written in terms of equations and inequalities.
All these equations and inequalities are collectively called constraints.
Moreover, the set of all values the decision variables can take, that satisfy all the constraints, is called the feasible set.
Note that: If the constraints are continuous (almost always the case), the feasible set is a closed region of Rn .

Objective: The solution to every problem is “optimal” → It is the minimum or maximum value of a given function.
The function to be maximized or minimized is called objective function, and the process of finding the extreme value
is called “optimization”.

Linearity: The problems with only real variables are called linear programming, LP problems.
The problems where variables are integers are called integer programming, IP problems.
The two types of problem are related because, when the formulation of the problem is the same, the feasible set for
the IP problem is a subset of the feasible set for the LP problem.
Note that: Given an IP problem, the corresponding LP problem where the constraint about the integer variables
is removed, is called “relaxation”.

8.3 Standard form of an LP problem


The two (non-integer) LP problems in the previous section have these forms:

max cT x subject to Ax ≤ b, x ≥ 0        and        min cT x subject to Ax ≥ b, x ≥ 0

However, it can be proved that both forms can be manipulated to get the following standard form:

min/max cT x subject to Ax = b, x ≥ 0

The transformation into the standard form uses the following remarks:
1. Every inequality can be transformed into an equation by introducing a slack or surplus variable.
For example, an inequality constraint Ax ≤ b becomes the equation Ax + s = b with a new variable s ≥ 0, called a slack variable; an inequality Ax ≥ b becomes Ax − s = b with s ≥ 0, where s is called a surplus variable.
2. If a variable xi is unrestricted in sign, then we can add two non-negative variables x′i and x″i and set xi = x′i − x″i.

Definition: Pick a subset of the inequalities → If there is a unique point that satisfies them with equality,
and this point happens to be feasible, then it is a vertex (each vertex is specified by a set of n (in)equalities).

Definition: Two vertices are neighbours if they have n − 1 defining (in)equalities in common.

Theorem: Given an LP problem, if there exists an optimal solution, there also exists an optimal solution
which is at a vertex of the feasible set.

Consider the following LP problem:


min/max cT x subject to
Ax = b , x ≥ 0

We have m equalities from the matrix A and n inequalities from the non-negativity constraints, thus we have m + n constraints in total.
A system of n chosen constraints determines a unique point (a candidate vertex) only if its rank is n.

A possible approach:
1. Pick all possible subsets of n linearly independent constraints out of the m + n constraints.
2. Solve, in the worst case, (m+n choose n) systems of equations of the type A*x = b*, where A*, b* are the restrictions of A and b to the subset of n constraints. This can be done, for example, by Gaussian elimination.
3. Check feasibility of all solutions, evaluate the objective function at each solution, and pick the best.
This approach is correct but inefficient.

Example: On the ipad (...)

8.4 The simplex algorithm


The simplex algorithm provides a method to reduce the number of vertices to visit in order to find the optimum.
On each iteration, simplex has two tasks:
1. Check whether the current vertex is optimal (and if so, terminate the algorithm).
2. Determine where to move next.
As we will see, both tasks are easy if the vertex happens to be at the origin → If the vertex is elsewhere, we will
transform the coordinate system to move it to the origin.

A worked example:
A wooden toy factory produces cars and trains: the demand of cars is 50 units per month, while that of trains is 80,
overall, the factory cannot produce more than 100 items per month.
Every car is sold for 2 money units, and every train for 4 money units, we need to find the production plan that
maximizes the revenues.
The problem is:
max z = 2x1 + 4x2
x1 ≤ 50
x2 ≤ 80
x1 + x2 ≤ 100
x1 , x2 ≥ 0
Graphically we find the optimal solution at x1 = 20, x2 = 80, with an optimal value for the objective of 360.

How should we proceed?


(...) On the ipad

Some implementation issues: While running, the simplex algorithm might encounter some issues.
- Unbounded Solution Issue → Sometimes, the simplex algorithm may find that there’s no limit to how much it can
improve the objective function. In such cases, the algorithm stops and reports that the solution is unbounded.

- Cycles, degenerate case → In rare cases, a vertex can be defined by more than n constraints, say m > n of them (a degenerate vertex).
In such situations, the simplex algorithm might fall into a cycle, repeatedly substituting one of the n constraints currently in use with one of the remaining m − n > 0 constraints, and vice versa.

- Lack of Initial Feasible Solution → Occasionally, the simplex algorithm may struggle to find an initial feasible solution; to overcome this, we create an artificial problem by adding artificial variables and constraints.
These artificial variables help us start at a feasible point, even if it's not optimal. But how can we do that?
We penalize the artificial variables in the objective function with a large coefficient (often represented by M); in this way the algorithm prioritizes driving the artificial variables to zero while still considering the original problem's constraints.

Example:
min 4x1 + x2 , subject to:
3x1 + x2 = 3
4x1 + 3x2 ≥ 6
x1 + 2x2 ≤ 3
x1 , x2 ≥ 0

Here, x1 = x2 = 0 is not feasible, and we need a feasible solution to start the simplex.
We then add an artificial variable a1 ≥ 0 to the first constraint, obtaining 3x1 + x2 + a1 = 3, and add the penalty term M·a1 to the objective; in this way the problem has changed.
The initial solution to this problem is x1 = x2 = 0, a1 = 3, and if we run the simplex for this problem it will
end up with a1 = 0 (to minimize the objective) and will return a feasible solution to the original problem.

Time of execution: The simplex algorithm shows that a linear program can always be solved in finite time, in particular in time that is at most exponential in the number of variables.
This is because each iteration takes polynomial time and moves to a new vertex, and if there are m inequalities and n variables, there can be at most (m+n choose n) vertices (that is, O(2^{m+n})).

9 Duality
9.1 A numerical example
We can approach the problem of the optimality of the solution of the original problem from another point of view.
The problem is:
max 2x1 + 4x2 , subject to:
x1 ≤ 50
x2 ≤ 80
x1 + x2 ≤ 100
x1, x2 ≥ 0

And graphically we’ve already found the optimal solution at x1 = 20, x2 = 80 with an optimal value of 360.

- First option: We multiply the 1st inequality by 2 and the 2nd by 4, from which we get 2x1 ≤ 100 and 4x2 ≤ 320 .
Adding the two together, we get 2x1 + 4x2 ≤ 420, which provides an upper bound, although not very tight.

- Second option: However, there is a better choice of factors by multiplying the 2nd and the 3rd inequalities by 2,
and adding them together, we get 2x1 + 4x2 ≤ 360 , that is a certificate of optimality.

Is there a way to get the best coefficients?
We assign to every constraint a non-negative multiplicative coefficient, y1 , y2 , y3 ≥ 0 , and we get:
y1 x1 ≤ 50y1
y2 x2 ≤ 80y2
y3 (x1 + x2 ) ≤ 100y3
Adding them together we obtain: x1 (y1 + y3 ) + x2 (y2 + y3 ) ≤ 50y1 + 80y2 + 100y3
Now we want to build the best upper bound to 2x1 + 4x2: any expression c1x1 + c2x2 on the left-hand side with c1 ≥ 2 and c2 ≥ 4 provides an upper bound (because all variables are non-negative), and we want the upper bound on the right-hand side to be as small as possible.

The problem is:

min 50y1 + 80y2 + 100y3 (to find the best upper bound)
subject to:
y1 + y3 ≥ 2 (where y1 + y3 = c1)
y2 + y3 ≥ 4 (where y2 + y3 = c2)
y1, y2, y3 ≥ 0

We can see the link with the original problem.


Problem 1) The original problem is:
max cT x
subject to
Ax ≤ b, x ≥ 0

where cT = (2, 4), b = (50, 80, 100)T, and A is the 3 × 2 matrix with rows (1, 0), (0, 1), (1, 1).

Problem 2) This is written in terms of rows, instead of columns:


min yT b
subject to
y T A ≥ cT , y ≥ 0

The second problem is called the dual of the first, which, in turn, is called the primal problem.
Moreover the optimal solution to the dual is (0, 2, 2), and the corresponding objective is 360.
(...) On the ipad

9.2 More on duality


Note that each decision variable in the primal problem corresponds to a constraint in the dual problem, and each
constraint in the primal problem corresponds to a variable in the dual problem.

Theorem: (Weak duality)


For any pair x and y of feasible solutions to the primal and the dual respectively, we have cT x ≤ y T b.
Proof:
If y is a feasible solution of the dual, then yT A ≥ cT , and since x ≥ 0 we can right-multiply the previous inequality
by x to get yT Ax ≥ cT x.
If x is a feasible solution of the primal, then Ax ≤ b, and similarly, since y ≥ 0, we can left-multiply the last inequality
by yT to get yT Ax ≤ yT b.
Combining the two inequalities we get cT x ≤ yT Ax ≤ yT b.

Theorem: (Certificate of optimality)
If x and y are feasible solutions of the primal and the dual and cT x = yT b, then x and y must be optimal solutions
to the primal and the dual.

Remark: There is another interesting consequence of weak duality that relates infiniteness of optimal values in the
primal/dual with feasibility of the dual/primal.
Let y be a feasible solution of the dual. By weak duality, we have cT x ≤ yT b for all feasible x. If the optimal value
in the primal is ∞, then ∞ ≤ yT b. This is not possible, so the dual cannot have a feasible solution.

Theorem:
If the optimal value in the primal is ∞, then the dual must be infeasible.
If the optimal value of the dual is −∞, then the primal must be infeasible.

Theorem: (Strong duality)


The dual has an optimal solution if and only if the primal does.
If x∗ and y ∗ are optimal solutions to the primal and dual, then cT x∗ = (y ∗ )T b.

9.3 Sensitivity analysis


Suppose that the resource quantities b change by a small amount ∆ ∈ Rm , then the primal and dual become:

Primal Dual :
T
max c x min yT (b + ∆)
subject to: subject to:
Ax ≤ b + ∆, y T A ≥ cT ,
x≥0 y≥0

Let z(∆) be the optimal value of the problem (the same in the primal and the dual, by strong duality), so that z(0) is the optimal value of the original problem.
Suppose that the optimal solution y to the dual is unique and that, for sufficiently small ∆, it does not change.
In this case, the optimal value changes by z(∆) − z(0) = yT(b + ∆) − yT b = yT ∆.
Remark: By strong duality, the optimal value of the primal changes by the same amount, yT ∆.

Summary: A small change in the level of resources in the primal induces a change in the optimal value which is scaled by the dual solution: every dual optimal variable scales the variation coming from the corresponding resource.
The change in the primal's optimal value, z(∆) − z(0), is determined by how much each component of the dual solution vector y weights the corresponding component of ∆.

Example: Suppose there's a constraint related to production (e.g., buying a new machine) that could increase the weekly production by a certain amount δ.
If the corresponding dual variable equals 2, the increase in weekly production leads to a variation in revenues of 2δ.
If the cost of the machine is less than 2δ, then investing in the machine is profitable; otherwise the investment would result in a loss, because the cost of acquiring the machine would outweigh the potential increase in revenue.
This is why the dual optimal solutions are sometimes called shadow prices.
(...)

Theorem: (Complementary slackness)
Let x and y be feasible solutions to the primal and the dual.
Then, x and y are optimal solutions if and only if yT(b − Ax) = 0 and (yT A − cT)x = 0.
Proof:
Feasibility for x and y means:
- x is feasible ⇒ Ax ≤ b, that is b − Ax ≥ 0, and since y ≥ 0 ⇒ yT(b − Ax) ≥ 0
- y is feasible ⇒ yT A ≥ cT, that is yT A − cT ≥ 0, and since x ≥ 0 ⇒ (yT A − cT)x ≥ 0
⇒ Suppose x and y are optimal. Adding the two quantities we get:
yT b − yT Ax + yT Ax − cT x ≥ 0 ⇒ yT b − cT x ≥ 0

However, x and y are also supposed to be optimal, not just feasible, so by strong duality we impose the equality and get:

yT b = cT x ⇒ yT b − cT x = 0

Going back to yT(b − Ax) and (yT A − cT)x, their sum is 0, and since yT(b − Ax) ≥ 0 and (yT A − cT)x ≥ 0, they must both be equal to 0.

⇐ Conversely, suppose that yT(b − Ax) = 0 and (yT A − cT)x = 0.
Then yT b − cT x = 0 ⇒ yT b = cT x, and so, by the certificate of optimality, x and y are optimal.

10 NP-completeness
10.1 Introduction
How do we judge how good an algorithm is? Typically we analyze an algorithm by predicting the resources it requires, and most often we measure the running time.
Given an instance of a particular problem, we can measure its size by an integer p that represents the length of the encoding of the input data (in binary notation).
Running time of an algorithm: the number of elementary operations performed, measured as a function of the input size p.
However, a precise measure is not so important: we are interested in the order of growth of the running time function, and for this reason we use asymptotic notation.
Definition: Given f : N → N and g : N → N , we say that f = O(g) if ∃ c, p′ such that f (p) ≤ c · g(p) for all p ≥ p′ .
An algorithm is efficient (or polynomial time) if its running time f (p) is a polynomial function of the input size p,
i.e. f (p) = O(pk ) for some fixed constant k.

⇒ Remark: The notion of running time allows us to classify problems in terms of computational complexity, according to whether or not we currently know how to solve them efficiently.
The formal classification applies to the so-called decision problems, that are problems in which the answer is yes or no.
Some examples of this kind of problem are:
- Does a graph G have an s-t path with at most k edges?
- Does a graph G have an s-t path with at least k edges?
- Does a minimization LP have a feasible solution of objective function value ≤ k?

⇒ Note that: So far we have mostly dealt with optimization problems, in which we are asked to find the best solution according to a given cost function.
Typically we can state optimization problems as decision problems, and solve them by relying on solving the decision version multiple times, e.g. with binary search.

P is the class of decision problems that can be solved by a polynomial-time algorithm, and by definition P is a subset of NP.
⇒ In particular, for these problems we can find a solution and verify its correctness in polynomial time.

NP is the class of decision problems admitting a certificate from which the correctness of a yes-answer can be verified in polynomial time.
⇒ In particular, we can verify correctness in polynomial time.

NP-complete problems: (Informally) NP-complete problems are problems in NP such that the following holds: «if you have a polynomial-time algorithm for one of them, you can use it to solve any problem in NP in polynomial time».

But what does this mean formally?


To define NP-complete formally, we need to introduce reductions.

Definition: (Reductions) We say that a decision problem A reduces to a decision problem B if:
i) Given an instance I of problem A, you can construct an instance I’ of problem B in polynomial time in size(I).
ii) I admits a yes-answer if and only if I’ admits a yes-answer.
We let A ≤ B denote that A reduces to B.

⇒ Remark: Assume that problem A reduces to problem B. If B admits a polynomial-time algorithm, then A admits a polynomial-time algorithm.

Definition: (NP-complete problems, formally)
A decision problem X is NP-complete if:
i) X is in NP
ii) for any Y ∈ NP, Y ≤ X (*)

We will often encounter problems called NP-hard, that are problems for which only (*) applies.

Procedure: When dealing with your own problem X you can exploit reductions in 2 ways:
(1) Try to reduce your problem X to a problem Y ∈ P (solvable in polynomial time) → this shows that your problem X can be solved efficiently.

(2) Try to reduce an NP-complete problem Y to your problem X → this shows that your problem X is NP-complete too, which means that most likely it can't be solved efficiently (unless P = NP).

Definition: A bipartite graph is an undirected graph G = (V, E) where V can be partitioned into two subsets L, R
such that every edge e ∈ E has an endpoint in L and the other one in R.

Definition: (Matching) A matching in a graph G = (V, E) is a subset of edges M ⊂ E such that every node in V is
the endpoint of at most one edge in M .

Definition: (Maximum matching in bipartite graphs) Given G = (V, E), a maximum matching is a matching M
of maximum cardinality (i.e. maximizing |M |).

Theorem: Maximum matching in bipartite graphs is solvable in polynomial time (this is true also if G is not bipartite).
Proof: We'll prove the theorem using a reduction to flow, but first let's state the problem in a decision version:
Given G = (V, E) and a value k, does G have a matching M with |M| ≥ k? (*)
We'll reduce (*) to a flow problem.

Given an instance G = (L ∪ R, E) of (*) we'll construct a flow instance on a directed graph G′ = (V′, E′) as follows.
- We set V ′ = L ∪ R ∪ {s} ∪ {t}
- For each (u, v) ∈ E orient the edge from u ∈ L to v ∈ R, and add it to E ′ .
- Add (s, u) to E ′ for all u ∈ L.
- Add (v, t) to E ′ for all v ∈ R.
- Set capacity of all arcs to 1
This construction can be done in polynomial time (in size(G)).

Claim: There exists an s-t flow in G′ of value ≥ k if and only if there exists a matching in G of size ≥ k.
Given the claim, since we know we can decide whether G′ has a flow of value ≥ k in polynomial time (apply F&F), we can decide (*) in polynomial time.
⇒ Note that: If we can decide (*) in polynomial time, we can solve the maximum matching problem in polynomial time, by trying all values of k from 1 to |V|.

Now it remains to prove the claim:
Given a matching M with |M| ≥ k, one can construct a flow from s to t of value ≥ k: for each (u, v) ∈ M send 1 unit of flow in G′ on the path with nodes s, u, v, t.
Conversely, given a flow of value ≥ k, we know that ∃ an integral flow f with f(a) ∈ {0, 1} for every a ∈ E′.
We can take all the edges with flow value 1 that go from u ∈ L to v ∈ R: they form a matching M of size ≥ k, because the unit capacities on the arcs (s, u) and (v, t) guarantee that every node is touched by at most one such edge.

Conclusion: Since we know that we can decide the existence of a flow with value ≥ k in polynomial time using
algorithms like Ford-Fulkerson, and because of the correspondence established in the claim, we can also decide whether
a graph has a matching of size ≥ k in polynomial time.
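A sketch of the reduction in Python, reusing the max_flow function from the Ford-Fulkerson section; the tuple node labels ('L', u) and ('R', v) are just a device to keep the two sides disjoint:

def max_bipartite_matching(L, R, E):
    edges = [('s', ('L', u), 1) for u in L]             # s -> u with capacity 1
    edges += [(('L', u), ('R', v), 1) for u, v in E]    # u -> v with capacity 1
    edges += [(('R', v), 't', 1) for v in R]            # v -> t with capacity 1
    return max_flow(edges, 's', 't')                    # flow value = size of a maximum matching

# Example: L = {1, 2}, R = {3, 4}, edges (1,3), (1,4), (2,3): maximum matching has size 2
print(max_bipartite_matching([1, 2], [3, 4], [(1, 3), (1, 4), (2, 3)]))  # 2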

10.2 SAT
The first problem shown to be NP-complete is SAT; however, to define it formally we need the following notions:

Definition of literal: Given a set of boolean variables x1 , ..., xn (variables that can take 0 or 1 value),
a literal l is either a variable (xi ) or its negation (x′i ).

Definition of clause: A clause C is given by taking an OR of literals: C = (l1 ∨ l2 ∨ ... ∨ lk), where ∨ denotes the symbol of OR. Example: C = (x1 ∨ x′2 ∨ x3)

Conjunctive normal form: A boolean formula in conjunctive normal form is given by an AND of clauses, so it is an expression of the form C1 ∧ C2 ∧ ... ∧ Cm, where ∧ denotes the symbol of AND.

Truth assignment: A truth assignment assigns a value 1/0 (true/false) to each variable. (...) On the ipad

SAT problem: (Decision version) Given a boolean formula in conjunctive normal form, is there a truth assignment for the variables that satisfies all the clauses? (i.e. that makes the whole formula true?)

Theorem: (Cook-Levin) SAT is NP-complete.


Proof: We want to show that every problem X ∈ NP reduces to SAT.
But what do we know about X? For every instance I of X, ∃ an algorithm Ax with runtime polynomial in size(I) that, for any possible solution S of I, verifies whether S certifies a yes-answer for I or not.
Assume size(I) = n, and assume that Ax:
- performs t(n) steps (elementary operations)
- uses s(n) bits of memory in its execution.

General idea:
(1) The execution of one step of the algorithm can be simulated by a so-called boolean circuit of size O(s(n)).
(2) The execution of the entire algorithm can be simulated by combining t(n) such circuits, yielding a boolean circuit of size O(t(n)s(n)) = O(t²(n)) (because the algorithm at each step affects O(1) memory locations).
(3) Model such a circuit with a SAT formula.
What is a boolean circuit? A boolean circuit is an abstract device that computes boolean functions (f : {0,1}ⁿ → {0,1}).
The device is made of boolean gates and wires.
A boolean circuit is modeled as a directed acyclic graph whose nodes are gates and whose arcs are wires connecting the gates.
- input of circuit → nodes of in-degree 0
- output of circuit → nodes of out-degree 0
- size of circuit → total number of gates
Given an instance I of X, one computes n = size(I) and constructs a circuit Cn that simulates Ax on (S, I).
(...) On the ipad
Essentially, given X ∈ NP and Ax, for any instance I of X we construct (*) (→ in time polynomial in size(I)).
Assume (*) is a circuit with h inputs and m gates; we'll define a boolean formula with h + m variables g1, ..., gh+m.
Think of the circuit as an acyclic graph and number the input nodes and the gate nodes consistently, according to the topological order:
v1, ..., vh, vh+1, ..., vh+m, where v1, ..., vh are the inputs and vh+1, ..., vh+m are the gates.
We will construct a SAT formula by:
- associating a small formula to each gate
- taking the AND of all these small formulas and the final gate.

Theorem: (Circuits can simulate algorithms) If A is an algorithm that takes worst-case running time t(n) on
input of length n, then for every n there is a boolean circuit of size O(t2 (n)), such that for every input I of
length n, the output of the circuit on I is the same as the output of algorithm A on I.
Furthermore, given the code of A and n, such circuit is computable in polynomial time in t(n).

For each i = 1, ..., m the formula associated to vh+i (the i-th gate) is the following, where g′ denotes the negation of g:
- If vh+i represents a NOT gate, with an incoming edge from vj:
(g′h+i ∨ g′j) ∧ (gh+i ∨ gj), which represents gh+i = g′j
- If vh+i represents an AND gate, with incoming edges from vj and vl:
(gh+i ∨ g′j ∨ g′l) ∧ (g′h+i ∨ gj ∨ gl) ∧ (g′h+i ∨ g′j ∨ gl) ∧ (g′h+i ∨ gj ∨ g′l), which represents gh+i = gj ∧ gl
- If vh+i represents an OR gate, with incoming edges from vj and vl:
(gh+i ∨ g′j ∨ g′l) ∧ (g′h+i ∨ gj ∨ gl) ∧ (gh+i ∨ g′j ∨ gl) ∧ (gh+i ∨ gj ∨ g′l), which represents gh+i = gj ∨ gl

The resulting formula is (*) F: (formula for gate 1) ∧ ... ∧ (formula for gate m) ∧ gh+m.
Now it remains to observe that:
- If (g1, ..., gh+m) = (b1, ..., bh+m) is an assignment that satisfies the formula (*), then on input (b1, ..., bh) the i-th gate evaluates to bh+i, and bh+m = 1 (the circuit outputs 1).
- Conversely, if b̃1, ..., b̃h is a set of input values that makes the circuit output 1, then letting b̃h+i be the value of gate i on this input, (b̃1, ..., b̃h+m) satisfies the formula (*).
This concludes the sketch of the NP-completeness of SAT.

10.3 3-SAT
3-SAT problem: (Decision version) Given a boolean formula in conjunctive normal form, where each clause has exactly 3 literals, is there a truth assignment for the variables that satisfies all clauses?

Theorem: 3-SAT is NP-complete.


Proof: Observe that 3-SAT is in NP (easy).
Given a SAT formula I with n variables and m clauses, we will construct a 3-SAT formula I′ with poly(n, m) variables and clauses such that I is satisfiable iff I′ is satisfiable.
We need to modify the clauses which have ≠ 3 literals.
- Suppose I contains a clause Ci with 2 literals, e.g. (X1 ∨ X2).
(1) We introduce a new variable Zi and 2 clauses instead of Ci → (X1 ∨ X2 ∨ Zi) ∧ (X1 ∨ X2 ∨ Z′i) (*).
Note that: ∃ a truth assignment satisfying Ci iff ∃ a truth assignment (on X1, X2, Zi) satisfying (*).
- Suppose I contains a clause Ci with 1 literal, e.g. Ci = X1.
(2) We add a new variable Zi and replace Ci with (X1 ∨ Zi) ∧ (X1 ∨ Z′i) → then apply the transformation in (1) to each of the two clauses.
- Suppose I contains a clause Ci with k > 3 literals, e.g. Ci = (X1 ∨ ... ∨ Xk).
(3) We add a new variable Zi and replace Ci with the 2 clauses (X1 ∨ X2 ∨ Zi) and (Z′i ∨ X3 ∨ ... ∨ Xk).
Example: (X1 ∨ X2 ∨ X3 ∨ X4) → (X1 ∨ X2 ∨ Zi) ∧ (Z′i ∨ X3 ∨ X4).
Repeat (3) until all clauses have 3 literals (it will be repeated ≤ k − 3 times per clause).
After all these steps we obtain an instance I′ of 3-SAT; moreover, by construction, the number of additional variables and new clauses is polynomial in n, m (= size(I)).
We can see that I′ is satisfiable iff I is satisfiable, hence SAT reduces to 3-SAT, and thus 3-SAT is NP-complete.
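The clause transformations (1)-(3) are mechanical enough to sketch in Python; here a literal is a nonzero integer (xi is +i, x′i is −i) and the fresh variables Zi are numbered after the original n (the encoding is illustrative):

def to_3sat(clauses, n):
    out, z = [], n                       # z counts variables used so far; z+1 is always fresh
    for c in clauses:
        c = list(c)
        while len(c) > 3:                # step (3): split off the first two literals
            z += 1
            out.append([c[0], c[1], z])
            c = [-z] + c[2:]
        if len(c) == 3:
            out.append(c)
        elif len(c) == 2:                # step (1): pad with one fresh variable
            z += 1
            out.append(c + [z])
            out.append(c + [-z])
        else:                            # step (2) followed by (1): two fresh variables
            z1, z2, z = z + 1, z + 2, z + 2
            for s1 in (z1, -z1):
                for s2 in (z2, -z2):
                    out.append(c + [s1, s2])
    return out, z

# (x1 v x2' v x3 v x4) and (x2) become six 3-literal clauses over 7 variables
print(to_3sat([[1, -2, 3, 4], [2]], 4))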

10.4 Vertex Cover


Definition: (Vertex cover) Given a graph G = (V, E) a vertex cover is a subset C ⊂ V with the property that
for all edges (u, v) ∈ E, either u ∈ C or v ∈ C, or both.

Vertex cover problem: (Decision version) Given a graph G = (V, E) and an integer k (0 < k < |V |),
is there a vertex cover C with |C| ≤ k?

Theorem: The vertex cover problem is NP-complete.


Proof: We observe that vertex cover is in NP: given a subset of vertices we can efficiently check whether it's a vertex cover or not.
We'll show that 3-SAT ≤ vertex cover (i.e. we reduce 3-SAT to vertex cover).
Given an instance I of 3-SAT with n variables and m clauses, we will construct a vertex cover instance I′, defined on a graph G and an integer k with poly(n, m) nodes and edges, such that I is satisfiable iff G has a vertex cover of cardinality ≤ k (I′ is a yes-instance).
Given an instance I of 3-SAT with variables x1, ..., xn and clauses C1, ..., Cm, we create a "gadget" for every variable xi: an edge joining two nodes labelled xi and x′i.
We also create a gadget for every clause Ci: a triangle where each node of the triangle is associated with a literal of the clause (example: Ci = (l1 ∨ l2 ∨ l3)).
For each literal in Ci we add an edge between the triangle node for that literal and the corresponding node of the variable gadget.
Example: (...) photo
We now claim that a vertex cover of cardinality ≤ n + 2m exists iff the 3-SAT formula I is satisfiable.
Assume a vertex cover of cardinality ≤ n + 2m exists; its cardinality is then exactly n + 2m, because we need at least n distinct nodes in the cover to cover the variable gadgets plus 2m distinct ones to cover the clause gadgets (triangles).

Then necessarily, for every variable xi, exactly one of the two nodes in the xi-gadget is in the cover.
We set xi = 1 if the node xi is in the cover and xi = 0 otherwise.
Now we observe that this truth assignment satisfies all clauses: exactly 2 nodes of each Ci-gadget are in the cover, so the edge from the remaining triangle node to the variable gadget must be covered on the variable side, i.e. the literal not included in the cover has the right truth value.
Now assume ∃ a truth assignment for the 3-SAT formula; we can construct a vertex cover C as follows:
for all xi, add xi to C if xi = 1 and x′i to C if xi = 0.
For each clause Ci we know ∃ at least one literal satisfying it; we add the other 2 nodes of its triangle to C.
⇒ Note that: |C| = n + 2m, and C is a vertex cover.
⇒ Note that: our graph has poly(n, m) nodes and edges; hence vertex cover is NP-complete.

10.5 Independent set:

Given G = (V, E) and an integer k, is there a subset S ⊆ V such that the nodes in S are pairwise non-adjacent, with |S| ≥ k?

Theorem: Independent set is NP-complete.

Proof: Independent set is in NP; we'll show that vertex cover ≤ independent set.
Main observation: If S is an independent set, V \ S is a vertex cover.
Vice versa, if C is a vertex cover, V \ C is an independent set.
So an independent set of cardinality ≥ k exists in G iff a vertex cover of cardinality ≤ |V| − k exists.

10.6 Integer programming:


Given an IP of the form min{cT x : Ax ≤ b, x integer}, find an optimal solution.

Theorem: Integer programming is NP-complete.


Proof: We use a reduction from vertex cover.
Given a vertex cover instance I defined on a graph with n nodes and m edges, we construct an integer program with poly(n, m) variables and constraints (and bounded coefficients) such that an optimal solution to the IP allows us to decide whether I is a yes-instance of vertex cover.
Suppose we have a vertex cover instance on G = (V, E) and integer k.
(...) photo

10.7 Subset sum:

Given integers a1, ..., an and an integer k, we ask whether ∃ a subset of these integers whose sum is exactly k (i.e. whether ∃ I ⊆ {1, ..., n} such that ∑_{i ∈ I} ai = k).

Theorem: Subset sum is NP-complete.


Proof: First observe the problem is in NP; we'll show that independent set ≤ subset sum.
Given an instance of independent set defined on G = (V, E) and parameter k, we'll construct an instance of subset sum by defining integers:
- a′i for all i ∈ V
- b′ij for all (i, j) ∈ E
- a parameter k′

The goal is to achieve the following: if we have a subset of the integers that sums up to k′, then the chosen a′i give us an independent set of cardinality k, the chosen b′ij correspond to the edges whose endpoints are not both in our independent set, and vice versa.
We represent the integers in a matrix where each integer is a row, and the row should be read as the base-10 representation of the integer.
- We have one special column plus a column for every edge.
- There is a row for every vertex and a row for every edge.
- Column (i, j) has a 1 in the rows of a′i, a′j and b′ij (and every a′i has a 1 in the special column).
We fix k′ = k · 10^|E| + ∑_{i=0}^{|E|−1} 10^i.
⇒ Note that: the construction is polynomial in the size of the independent set instance.
Now we observe that ∃ integers summing up to k′ iff ∃ an independent set of cardinality ≥ k.

10.8 Knapsack:
Given n objects with weights w1, ..., wn, a budget B, profits p1, ..., pn and a parameter k,
we ask whether ∃ I ⊆ {1, ..., n} such that ∑_{i ∈ I} wi ≤ B and ∑_{i ∈ I} pi ≥ k (decision version).

Theorem: Knapsack is NP-complete.


Proof: One observes that the problem is in NP; we'll show that subset sum reduces to knapsack.
Given an instance of subset sum with integers a1, ..., an and parameter k, we construct a knapsack instance with n objects where:
wi = pi = ai
B = k
k̃ = k
The construction is polynomial in the size of the subset sum instance.
∃ a subset I of objects for knapsack satisfying ∑_{i ∈ I} wi ≤ B and ∑_{i ∈ I} pi ≥ k̃
⇔ ∑_{i ∈ I} ai ≤ k and ∑_{i ∈ I} ai ≥ k ⇔ ∑_{i ∈ I} ai = k
⇔ the subset sum instance admits a yes-answer.
