Dynamic programming is about identifying and solving subproblems and putting them together to solve larger problems.
Algorithm:
    Let u1 , ..., un be a topological sort of the nodes
    d[ ] = ∞ , d[s] = 0
    for i = 1, ..., n:
        for each v in Adj[ui ]:
            d[v] = min(d[v], d[ui ] + w[ui , v])

Explanation: We initialize an array d[ ] to store the shortest distances from the source node s to each node in the DAG, and initialize all elements of d[ ] to ∞ except for d[s], which is set to 0. For each node ui , we iterate over each neighbor v of ui and update the shortest distance d[v] to node v by taking the minimum between its current value d[v] and the sum of the distance to ui (d[ui ]) and the weight of the edge from ui to v (w[ui , v]).
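A direct transcription of this pseudocode (a sketch in Python; the adjacency-list representation, the function name and the toy graph are assumptions, not from these notes):

import math

def dag_shortest_paths(order, adj, s):
    # order: the nodes u1, ..., un in topological order
    # adj:   adj[u] = list of (v, w) pairs, one per edge u -> v of weight w
    d = {u: math.inf for u in order}    # d[ ] = infinity ...
    d[s] = 0                            # ... except d[s] = 0
    for u in order:                     # for i = 1, ..., n
        for v, w in adj[u]:             # for each v in Adj[ui]
            d[v] = min(d[v], d[u] + w)  # relax the edge (u, v)
    return d

adj = {"s": [("a", 2), ("b", 5)], "a": [("b", 1)], "b": []}
print(dag_shortest_paths(["s", "a", "b"], adj, "s"))   # {'s': 0, 'a': 2, 'b': 3}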
Example (longest increasing subsequence):
Consider A = [3, 1, 8, 2, 5]; in this case the longest increasing subsequence is [1, 2, 5], with a total length of 3.
To solve this kind of problem we can imagine that each element of the sequence is a node in a graph, and we construct a directed edge from one node to another if the target node contains a larger value: there is a directed edge from node i to node j if the element at index j is greater than the element at index i.
We notice that this is a DAG, where an increasing subsequence is just a path in the graph; indeed, the length of the longest increasing subsequence corresponds to the length of the longest path in this DAG + 1.
- We have to find a subproblem: we know that every increasing subsequence has a start and an end, and we focus on the end index of an increasing subsequence.
Consider L[k] = the length of the longest increasing subsequence ending at index k; for example, for k = 3 the subsequence would be [1, 2], with a total length of 2.
- We have to find relationships among subproblems: if we want to find the longest increasing subsequence ending at index 4, what subproblems are needed to solve L[4]? → We look at all the edges entering our node at index 4: there is an edge from node 0, one from node 1, and one from node 3.
Now we need to know the length of the longest increasing subsequence ending at index 0, which happens to be 1, L[0] = 1; then ending at index 1, L[1] = 1; and finally ending at index 3, L[3] = 2.
Therefore the length of the longest increasing subsequence ending at index 4 is L[4] = 1 + max{L[0], L[1], L[3]} = 3.
- Now we have to generalize this relationship: L[4] = 1 + max{L[k] : k < 4 and A[k] < A[4]} = 1 + max{L[0], L[1], L[3]}.
- Implement by solving subproblems in order: In this case we have to solve subproblems from left to right.
Algorithm:
    Let u1 , ..., un be a topological sort of the nodes
    for i = 1, ..., n:
        d[ui ] = 1
        for each v in AdjIncoming[ui ]:
            d[ui ] = max(d[ui ], 1 + d[v])
    return max(d)

Explanation: We use an array d[ ] to store the length of the longest increasing subsequence ending at each node. Each d[ui ] is initialized to 1, as the node by itself forms an increasing subsequence of length 1. The update d[ui ] = max(d[ui ], 1 + d[v]) means that the length of the longest increasing subsequence ending at ui is one plus the maximum length of the increasing subsequences ending at its predecessors. The length of the longest increasing subsequence is the maximum of d[ ].
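The same algorithm can be written without building the DAG explicitly: scanning the array left to right visits the nodes in topological order, and the inner loop over smaller indices plays the role of AdjIncoming (a sketch; names are assumptions):

def lis_length(A):
    n = len(A)
    L = [1] * n                         # each element alone is an increasing subsequence
    for k in range(n):
        for i in range(k):              # the "incoming edges" i -> k ...
            if A[i] < A[k]:             # ... exist exactly when A[i] < A[k]
                L[k] = max(L[k], 1 + L[i])
    return max(L)

print(lis_length([3, 1, 8, 2, 5]))      # 3, e.g. [1, 2, 5]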
Running time: The i-th step of the outer loop takes time O(out-deg(ui )), where the out-degree represents the number of outgoing edges of ui , so the total time will be O(Σ_{i=1}^{n} out-deg(ui )); since the sum of the out-degrees of all nodes is equal to the total number of edges in the graph, this is O(m), where m is the number of edges.
6.3 Knapsack
Example: During a robbery, a burglar finds much more loot than he had expected and has to decide what to take.
His bag (or “knapsack”) will hold a total weight of at most W pounds, and there are n items to pick from, of weight
w1 , ..., wn , and dollar value v1 , ..., vn .
What’s the most valuable combination of items he can fit into his bag?
For instance, take W = 10 and:
Note that: If this application seems frivolous, replace “weight” with “CPU time” and “only W pounds can be taken”
with “only W units of CPU time are available”.
Algorithm:
    K(0) = 0
    for w = 1 to W :
        K(w) = max{K(w − wi ) + vi : wi ≤ w}
    return K(W )

Explanation: We initialize an array K of length W + 1 to store the maximum value for each knapsack capacity. For each capacity w, we consider each item i such that the weight of item i (wi ) does not exceed the current capacity w, and for each such item we calculate K(w − wi ) + vi , which represents the maximum value achievable by including item i in the knapsack (w − wi represents the remaining capacity after including item i).
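In code, the recurrence might look as follows (a sketch; since the item table for W = 10 is not reproduced in these notes, the instance below is made up). Note that this version allows each item to be taken any number of times:

def knapsack_with_repetition(W, weights, values):
    K = [0] * (W + 1)                   # K(0) = 0
    for w in range(1, W + 1):
        # consider every item i with wi <= w
        K[w] = max([K[w - wi] + vi for wi, vi in zip(weights, values) if wi <= w],
                   default=0)           # if no item fits, the best value is 0
    return K[W]

# hypothetical instance with W = 10
print(knapsack_with_repetition(10, [6, 3, 4, 2], [30, 14, 16, 9]))   # 48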
Knapsack without repetition
Each item can be included in the knapsack at most once.
Example:
n = 4, t = 8 (the knapsack capacity), cost[i] = {1, 2, 5, 6}, weight[i] = {2, 3, 4, 5}

The table K[i][w] (row i: only the first i items may be used; column w: capacity):

         w = 0   1   2   3   4   5   6   7   8
  i = 0:     0   0   0   0   0   0   0   0   0
  i = 1:     0   0   1   1   1   1   1   1   1
  i = 2:     0   0   1   2   2   3   3   3   3
  i = 3:     0   0   1   2   5   5   6   7   7
  i = 4:     0   0   1   2   5   6   6   7   8∗

(the starred entry K[4][8] = 8 is the optimal value)
Remark: The following algorithm finds the items included in the optimal solution to the 0/1 Knapsack problem.
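The algorithm itself is not written out in these notes; below is a sketch of the standard approach: fill the table K[i][w] row by row, then backtrack from K[n][t], observing that item i was taken exactly when K[i][w] ≠ K[i − 1][w]. The function and variable names are assumptions.

def knapsack_01(t, weight, cost):
    n = len(weight)
    K = [[0] * (t + 1) for _ in range(n + 1)]       # K[i][w]: first i items, capacity w
    for i in range(1, n + 1):
        for w in range(t + 1):
            K[i][w] = K[i - 1][w]                   # option 1: skip item i
            if weight[i - 1] <= w:                  # option 2: take item i, if it fits
                K[i][w] = max(K[i][w], K[i - 1][w - weight[i - 1]] + cost[i - 1])
    items, w = [], t                                # backtrack to recover the items
    for i in range(n, 0, -1):
        if K[i][w] != K[i - 1][w]:                  # the value changed: item i was taken
            items.append(i)
            w -= weight[i - 1]
    return K[n][t], sorted(items)

# the instance from the table above: optimal value 8, taking items 2 and 4
print(knapsack_01(8, [2, 3, 4, 5], [1, 2, 5, 6]))   # (8, [2, 4])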
6.4 Edit distance
S − N O W Y
S U N N − Y
The cost of an alignment is the number of columns in which the letters differ, and the edit distance between two strings
is the cost of their best possible alignment.
The edit distance can also be seen as the minimum number of edits (insertions, deletions, and substitutions of characters) needed to transform the first string into the second one.
In this case we have to insert U, substitute O → N, and delete W, therefore the total cost will be 3.
Objective: We want to find the edit distance between two strings x[1 ··· m] and y[1 ··· n].
What is a good subproblem? We could look at the edit distance between some prefix of the first string, x[1 ··· i], and
some prefix of the second, y[1 ··· j].
Call this subproblem E(i, j), then our final aim is to compute E(m, n), and for this to work we need to somehow express
E(i, j) in terms of smaller subproblems.
The rightmost column of the alignment can only be one of three things:

x[i]        −         x[i]
 −         y[j]       y[j]
The 1st case has a cost of 1, and it remains to align x[1 ... i − 1] with y[1 ... j]; this is the subproblem E(i − 1, j).
In the 2nd case, also with cost 1, we still need to align x[1 ... i] with y[1 ... j − 1]; this is the subproblem E(i, j − 1).
The final case costs 1 if x[i] ≠ y[j] and 0 if x[i] = y[j], and leaves the subproblem E(i − 1, j − 1).
We have no idea which of them is the right one, so we need to try them all and pick the best:
E(i, j) = min{1 + E(i − 1, j), 1 + E(i, j − 1), diff(i, j) + E(i − 1, j − 1)}, where diff(i, j) is 0 if x[i] = y[j] and 1 otherwise.
Algorithm:
Input: two strings A, B
Output: min # operations transforming A into B
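The body of the algorithm is not reproduced here; a sketch of the table-filling version of the recurrence above (names are assumptions):

def edit_distance(x, y):
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i                           # x[1..i] -> "" takes i deletions
    for j in range(n + 1):
        E[0][j] = j                           # "" -> y[1..j] takes j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(1 + E[i - 1][j],            # delete x[i]
                          1 + E[i][j - 1],            # insert y[j]
                          diff + E[i - 1][j - 1])     # substitute (free if equal)
    return E[m][n]

print(edit_distance("SUNNY", "SNOWY"))        # 3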
Example:
          S   N   O   W   Y
      0   1   2   3   4   5
  S   1   0   1   2   3   4
  U   2   1   1   2   3   4
  N   3   2   1   2   3   4
  N   4   3   2   2   3   4
  Y   5   4   3   3   3   3∗

(the starred entry E(5, 5) = 3 is the edit distance)
6.5 Shortest paths
Floyd-Warshall Algorithm:
It is an “all-pairs shortest paths” algorithm: it can find the shortest path between all pairs of nodes.
We represent our graph as a 2D adjacency matrix A, where A[i][j] is the weight of the edge going from node i to j.
Note that the distance from a node to itself is assumed to be zero (this is why we have a diagonal of zeros); if instead there is no edge from node i to j, then we set A[i][j] = +∞.
Main idea:
Gradually build up all intermediate routes between nodes i and j, and then find the optimal path.
Suppose our adjacency matrix tells us that the distance from a to b is A[a][b] = 11, and suppose there exists a third node c: if A[a][c] + A[c][b] < A[a][b], then it is better to route through c.
Explanation:
By iterating through all possible intermediate vertices k, the algorithm gradually builds up the optimal solution, considering paths routing through 0, then all paths routing through 0 and 1, then all paths routing through 0, 1, and 2, and so on, until considering all possible intermediate vertices.
This ensures that the algorithm finds the shortest paths between all pairs of vertices in the graph.
Case 2) k is on this path: before the last iteration we had D[i][k] ≤ α and D[k][j] ≤ β.
In this case, D[i][j] gets updated to min(D[i][j], D[i][k] + D[k][j]) ≤ α + β, which is the length of the shortest path from i to j considering intermediate vertices up to k.
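A sketch of the triple loop in Python (the pseudocode itself is not in these notes, and the weights 4 and 5 below are made up around the A[a][b] = 11 story above):

INF = float("inf")

def floyd_warshall(A):
    n = len(A)
    D = [row[:] for row in A]           # D starts as a copy of the adjacency matrix
    for k in range(n):                  # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]   # better to route through k
    return D

# a -> b directly costs 11, but a -> c -> b costs 4 + 5 = 9
A = [[0, 11, 4],
     [INF, 0, INF],
     [INF, 5, 0]]
print(floyd_warshall(A)[0][1])          # 9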
7 Flows in networks (and cuts)
Recall that a network is a directed graph with capacities associated to edges, and two special nodes s and t.
Example 1:
Suppose you have a water pipeline network, represented as a directed graph.
Arcs represent the directions in which the water flows, and there are numbers associated with arcs representing the
capacity of the pipe (max amount of water that can be pushed in it).
What is the maximum amount of water that can be pushed from s (source) to t (sink)?
(...) On the ipad
Example 2:
Imagine having a group of people who can lend money to / borrow money from each other.
An arc from person a to person b represents a level of trust; in particular it shows that person a is willing to lend money to person b up to a maximum value expressed by the capacity.
Stefano has money to lend and Tommaso would like to borrow some. How much money can Tommaso borrow from Stefano?
(...) On the ipad
Useful notation: For a given directed graph G = (V, A) and a given node u ∈ V , define
δ+(u) = {(v, w) ∈ A : v = u}, the arcs outgoing from u (tail = u)
δ−(u) = {(v, w) ∈ A : w = u}, the arcs incoming to u (head = u)
Definition of flow: Let G = (V, A) be a directed graph, and let s ∈ V , t ∈ V be two specified nodes in V .
A function f : A → R+ is called an s − t flow if:
(i) f (a) ≥ 0 for every a ∈ A (non-negativity constraint)
(ii) Σ_{a∈δ−(u)} f (a) = Σ_{a∈δ+(u)} f (a) for all u ∈ V \ {s, t}
(flow conservation ⇒ the amount of flow entering a vertex u should be equal to the amount of flow leaving u)
Definition of value: The value of an s − t flow f is defined as val(f ) = Σ_{a∈δ+(s)} f (a) − Σ_{a∈δ−(s)} f (a).
⇒ Note that: The value is the net amount of flow leaving s, which is equal to the net amount of flow entering t.
Definition of cut: A cut in a graph G = (V, A) is a partition of the set of nodes into two sets, U and V \ U .
In particular, an s − t cut is a cut U where s ∈ U and t ∈ V \ U .
Let’s define the capacity of a cut U as the total capacity of the arcs that are going out of U,
that is cap(U) = Σ_{(w,v)∈A : w∈U, v∈V\U} c(w, v).
Useful notation: For any U ⊆ V ,
δ+(U) = {(w, v) ∈ A : w ∈ U, v ∉ U}
δ−(U) = {(w, v) ∈ A : w ∉ U, v ∈ U}
With this notation, cap(U) = Σ_{a∈δ+(U)} c(a).
Proposition: For any s − t flow f respecting c, and any s − t cut δ + (U ), one has val(f ) ≤ cap(δ + (U )).
Proof:
First we observe that for any s − t flow f and any s − t cut δ+(U),
we have val(f ) = Σ_{a∈δ+(U)} f (a) − Σ_{a∈δ−(U)} f (a).
Why is this true?

val(f ) = Σ_{a∈δ+(s)} f (a) − Σ_{a∈δ−(s)} f (a)
        = Σ_{a∈δ+(s)} f (a) − Σ_{a∈δ−(s)} f (a) + Σ_{v∈U\{s}} ( Σ_{a∈δ+(v)} f (a) − Σ_{a∈δ−(v)} f (a) )
        = Σ_{v∈U} ( Σ_{a∈δ+(v)} f (a) − Σ_{a∈δ−(v)} f (a) )
        = Σ_{a∈δ+(U)} f (a) − Σ_{a∈δ−(U)} f (a)

(each term added in the second line is zero by flow conservation, and in the last step the contribution of arcs with both endpoints inside U cancels out).
Now note that:
- Σ_{a∈δ+(U)} f (a) ≤ Σ_{a∈δ+(U)} c(a), because of the capacity constraint.
- Σ_{a∈δ−(U)} f (a) ≥ 0, because of the non-negativity constraint.
Hence val(f ) ≤ Σ_{a∈δ+(U)} c(a) = cap(U).
Ford-Fulkerson Algorithm: We want to find the s − t flow of maximum value that respects c.
1) Set f (a) = 0 for all a ∈ A (initially there is no flow on any edge)
2) Construct a residual graph Df = (V, Af ), where a ∈ Af if c(a) − f (a) > 0 and a⁻¹ ∈ Af if f (a) > 0
3) Search for an s − t path P in Df :
3.1) Let the path have arcs {e1 , ..., ek } and set ε = min{ min_{ei∈P : ei∈A} {c(ei ) − f (ei )} , min_{ei∈P : ei⁻¹∈A} {f (ei )} }
The idea behind the algorithm:
(1) Start with the zero flow.
(2) Construct a residual graph Df = (V, Af ), where Af consists of the edges a s.t. c(a) − f (a) > 0, indicating that there is residual capacity available, and the edges a⁻¹ s.t. f (a) > 0, indicating that there is flow that can be pushed back in the reverse direction.
(3) Search for an s − t path P in the residual graph Df .
(3.1) Find the minimum residual capacity along the path P : indeed ε = min{ min_{ei∈P : ei∈A} {c(ei ) − f (ei )} , min_{ei∈P : ei⁻¹∈A} {f (ei )} } represents the maximum amount of flow that can be pushed along the path P without violating the capacity constraints.
(3.2) Update the flow along the path P , adding ε to the flow of each edge ei in the path if ei ∈ A (forward arc) and subtracting ε from the flow of each edge ei in the path if ei⁻¹ ∈ A (reverse arc).
(3.3) Adjust the residual graph Af based on the updated flow:
- If an edge ei reaches its capacity (i.e. f (ei ) = c(ei )), remove ei from Af .
- If an edge ei no longer carries flow (i.e. f (ei ) = 0), remove ei⁻¹ from Af .
- If an edge ei now carries flow (i.e. f (ei ) > 0), add ei⁻¹ to Af .
- If an edge ei now has residual capacity (i.e. f (ei ) < c(ei )), add ei to Af .
(4) Repeat step (3) until no s − t path exists in the residual graph Df , then return the maximum flow.
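A compact sketch of these steps, storing residual capacities directly, so that step (3.3)'s add/remove bookkeeping reduces to res[(u, v)] -= ε and res[(v, u)] += ε; the path search already uses BFS, anticipating the Edmonds-Karp refinement below. The representation and names are assumptions.

from collections import defaultdict, deque

def max_flow(capacity, s, t):
    res = defaultdict(int)                  # residual capacities of the graph D_f
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                       # reverse arcs live in the residual graph
    value = 0
    while True:
        parent, queue = {s: s}, deque([s])
        while queue and t not in parent:    # step (3): BFS for an s-t path in D_f
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value                    # step (4): no augmenting path, f is maximum
        path, v = [], t                     # recover the path found by the BFS
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        eps = min(res[a] for a in path)     # step (3.1): bottleneck residual capacity
        for u, v in path:                   # steps (3.2)/(3.3): push eps along the path
            res[(u, v)] -= eps
            res[(v, u)] += eps
        value += eps

# two disjoint s-t paths, of capacities 1 and 2 -> max flow 3
caps = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2}
print(max_flow(caps, "s", "t"))             # 3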
Theorem: The flow f output at the end of the algorithm has maximum value.
Proof:
By construction, f is an s − t flow respecting the capacity c, as:
- At the beginning f is a feasible flow respecting c.
- After each flow update, f (a) ≥ 0 and f (a) ≤ c(a) by our choice of ε, and flow conservation holds at every node.
Let’s show that f is of maximum value:
Since our algorithm stopped, we know that the residual graph Df does not contain an s − t directed path.
Let U = {w ∈ V : ∃ an s − w path in Df }.
Note that: U is an s − t cut, since s ∈ U and t ∉ U.
If (v, w) ∈ δ+(U), then since (v, w) ∉ Af we must have f (v, w) = c(v, w).
If (v, w) ∈ δ−(U), then since (v, w)⁻¹ ∉ Af we must have f (v, w) = 0.
Hence, val(f ) = Σ_{(w,v)∈δ+(U)} f (w, v) − Σ_{(w,v)∈δ−(U)} f (w, v) = cap(U).
Since we found a flow whose value = capacity of an s − t cut, our flow is maximum.
⇒ Observation 1) If all capacities are integers, then there exists an s − t flow of maximum value with f (a) integer for every a.
⇒ Observation 2) Suppose you are asked to find a cut in G of minimum total capacity,
i.e. we want U ⊆ V with U ≠ ∅ and V \ U ≠ ∅ which minimizes cap(U).
Can we find an algorithm? ⇒ Ford & Fulkerson’s algorithm:
for every ordered pair w, v ∈ V we can set s = w, t = v and compute a minimum s − t cut,
then output the minimum among all such cuts.
We can do so much better! ⇒ Idea: Edmonds & Karp
Use F & F algorithm with BFS to find paths in the residual graph.
Theorem: F & F algorithm with BFS finds a max s − t flow in O(|V ||E|(|V | + |E|)) running time.
Proof:
During the execution of the algorithm, arcs can get saturated∗ multiple times, and we’ll exploit BFS to bound how many times this can happen.
Let dT (u) be the distance from s to u in the residual graph at iteration T.
Observation: let T′ be an iteration after T; then dT (u) ≤ dT′ (u), i.e. distances never decrease.
Assume a = (u, v) gets saturated at iteration T and again at iteration T″ (after T).
In order for a to reappear in the residual graph, a⁻¹ must appear in a BFS tree at some iteration T′ between T and T″.
But then dT (u) = dT (v) − 1 ≤ dT′ (v) − 1 = dT′ (u) − 1 − 1 ≤ dT″ (u) − 2.
Hence an arc cannot get saturated more than |V|/2 times → the number of iterations is O(|V||E|).
∗ Definition: An arc is saturated at iteration i if its residual capacity goes to 0 after that iteration.
8 Linear programming
8.1 Introductory examples
A production problem:
Each week, a food industry produces 2 types of flour:
- x1 is the quantity of type 1 flour produced per week
- x2 is the quantity of type 2 flour produced per week
Profit: 3 for each unit of product of type 1, and 5 for each unit of product of type 2.
A mix (or diet) problem:
For a healthy life, every day we must get a minimum quantity of certain substances (vitamins, fats, fiber, etc.) contained in some ingredients (flour, sugar, milk, etc.), which contain the substances in various proportions.
To simplify things, consider just two ingredients (1, 2) and two substances (A, B):
We want to get at least the minimum of both substances paying the minimum cost:
- x1 is the quantity of ingredient 1
- x2 is the quantity of ingredient 2
Summarizing:
min 5x1 + 10x2 with x1 , x2 ∈ R , subject to:
7x1 + 2x2 ≥ 28 (min request for substance A)
2x1 + 12x2 ≥ 24 (min request for substance B)
x1 , x 2 ≥ 0
The minimum cost is 32, attained at (x1 , x2 ) = (3.6, 1.4).
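As a sanity check, the same LP can be handed to an off-the-shelf solver (a sketch assuming SciPy is available; linprog only accepts ≤ constraints, so the ≥ rows are negated):

from scipy.optimize import linprog

# min 5*x1 + 10*x2  s.t.  7*x1 + 2*x2 >= 28,  2*x1 + 12*x2 >= 24,  x1, x2 >= 0
res = linprog(c=[5, 10],
              A_ub=[[-7, -2], [-2, -12]],   # the >= rows, negated
              b_ub=[-28, -24],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)                       # approximately [3.6 1.4] 32.0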
8.2 Formalization
Names: In the previous models, a choice of the value for the variables represents a decision about an activity to be
done, and this is called “programming”, in the sense of planning.
These problems are called programming problems, while the variables are called decision variables (sometimes, the
problem itself is called “program”).
Variables: The modeling dictates the type of variables that have to be used in the problem: real numbers (when decimal values are legal), integers, or even binary variables.
Constraints: Very often, variables are restricted in sign and can only take non-negative values.
In addition there are further requirements on the values, written in terms of equations and inequalities.
All these equations and inequalities are collectively called constraints.
Moreover, the set of all values the decision variables can take, that satisfy all the constraints, is called the feasible set.
Note that: If the constraints are continuous (almost always the case), the feasible set is a closed region of Rn .
Objective: The solution to every problem is “optimal” → It is the minimum or maximum value of a given function.
The function to be maximized or minimized is called objective function, and the process of finding the extreme value
is called “optimization”.
Linearity: The problems with only real variables are called linear programming, LP problems.
The problems where variables are integers are called integer programming, IP problems.
The two types of problem are related because, when the formulation of the problem is the same, the feasible set for
the IP problem is a subset of the feasible set for the LP problem.
Note that: Given an IP problem, the corresponding LP problem where the constraint about the integer variables
is removed, is called “relaxation”.
However, it can be proved that both forms can be manipulated to get the following standard form:
min/max cT x subject to
Ax = b , x ≥ 0
The transformation into the standard form uses the following remarks:
1. Every inequality can be transformed into an equation by introducing a slack or surplus variable.
For example, an inequality constraint Ax ≤ b becomes Ax + s = b by adding a new variable s ≥ 0, called a slack variable; an inequality constraint Ax ≥ b becomes Ax − s = b with s ≥ 0, where s is called a surplus variable.
2. If a variable xi is unrestricted in sign, then we can add two non-negative variables x′i and x′′i , and set xi = x′i − x′′i .
Definition: Pick a subset of the inequalities → If there is a unique point that satisfies them with equality,
and this point happens to be feasible, then it is a vertex (each vertex is specified by a set of n (in)equalities).
Definition: Two vertices are neighbours if they have n − 1 defining (in)equalities in common.
11
Theorem: Given an LP problem, if there exists an optimal solution, there also exists an optimal solution
which is at a vertex of the feasible set.
We have m equalities from the matrix A, and n inequalities from the non-negativity constraints,
thus we have m + n constraints in total.
A system of n of these constraints, imposed as equalities, determines a unique point only if its rank is n.
A possible approach:
1. Pick all possible subsets of n linearly independent constraints out of the m + n constraints.
2. Solve, in the worst case, C(m + n, n) systems of equations of the type A∗x = b∗, where A∗, b∗ are the restrictions of A and b to the chosen subset of n constraints. This can be done, for example, by Gaussian elimination.
3. Check feasibility of all solutions, evaluate the objective function at each solution, and pick the best.
This approach is correct but inefficient.
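A brute-force sketch of this approach for problems written in the inequality form max cT x subject to Gx ≤ h (with the non-negativity rows folded into G); NumPy and all names here are assumptions. It reproduces the solution of the toy-factory example below:

import itertools
import numpy as np

def lp_by_vertex_enumeration(c, G, h):
    # correct but inefficient: tries all C(m+n, n) subsets of the constraints
    n = len(c)
    best_x, best_val = None, -np.inf
    for rows in itertools.combinations(range(len(G)), n):
        G_star, h_star = G[list(rows)], h[list(rows)]
        if np.linalg.matrix_rank(G_star) < n:
            continue                            # no unique intersection point
        x = np.linalg.solve(G_star, h_star)     # candidate vertex
        if np.all(G @ x <= h + 1e-9) and c @ x > best_val:
            best_x, best_val = x, c @ x         # feasible and better than the best so far
    return best_x, best_val

# max 2*x1 + 4*x2  s.t.  x1 <= 50, x2 <= 80, x1 + x2 <= 100, x1, x2 >= 0
G = np.array([[1, 0], [0, 1], [1, 1], [-1, 0], [0, -1]], dtype=float)
h = np.array([50, 80, 100, 0, 0], dtype=float)
print(lp_by_vertex_enumeration(np.array([2.0, 4.0]), G, h))   # ([20., 80.], 360.0)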
A worked example:
A wooden toy factory produces cars and trains: the demand for cars is 50 units per month, while that for trains is 80; overall, the factory cannot produce more than 100 items per month.
Every car is sold for 2 money units and every train for 4 money units; we need to find the production plan that maximizes the revenues.
The problem is:
max z = 2x1 + 4x2
x1 ≤ 50
x2 ≤ 80
x1 + x2 ≤ 100
x1 , x2 ≥ 0
Graphically we find the optimal solution at x1 = 20, x2 = 80, with an optimal value for the objective of 360.
Some implementation issues: While running, the simplex algorithm might encounter some issues.
- Unbounded Solution Issue → Sometimes, the simplex algorithm may find that there’s no limit to how much it can
improve the objective function. In such cases, the algorithm stops and reports that the solution is unbounded.
- Cycles, degenerate case → In rare cases, more than n constraints can be satisfied with equality at a vertex (say m > n of them).
In such situations, the simplex algorithm might fall into a cycle, repeatedly substituting one of the n defining constraints with one of the remaining m − n > 0 constraints, and vice versa.
- Lack of Initial Feasible Solution → Occasionally, the simplex algorithm may struggle to find an initial feasible
solution, and to overcome this, we create an artificial problem by adding artificial variables and constraints.
These artificial variables help us start at a feasible point, even if it’s not optimal. But how can we do that?
We penalize the artificial variables in the objective function with a large coefficient (often denoted by M ); in this way the algorithm prioritizes driving the artificial variables to zero while still considering the original problem’s constraints.
Example:
min 4x1 + x2 , subject to:
3x1 + x2 = 3
4x1 + 3x2 ≥ 6
x1 + 2x2 ≤ 3
x1 , x2 ≥ 0
Here, x1 = x2 = 0 is not feasible, and we need a feasible solution to start the simplex.
We then add an artificial variable a1 ≥ 0 to the first constraint, obtaining 3x1 + x2 + a1 = 3, and add the penalty term M a1 to the objective; in this way the problem has changed.
The initial solution to this problem is x1 = x2 = 0, a1 = 3, and if we run the simplex for this problem it will
end up with a1 = 0 (to minimize the objective) and will return a feasible solution to the original problem.
Time of execution: The simplex algorithm shows that a linear program can always be solved in finite time, in
particular in time that is at most exponential in the number of variables.
This is because each iteration takes polynomial time and moves to a new vertex, and if there are m inequalities and n variables, there can be at most C(m + n, n) vertices (that is, O(2^{m+n})).
9 Duality
9.1 A numerical example
We can approach the problem of the optimality of the solution of the original problem from another point of view.
The problem is:
max 2x1 + 4x2 , subject to:
x1 ≤ 50
x2 ≤ 80
x1 + x2 ≤ 100
x1 , x 2 ≥ 0
And graphically we’ve already found the optimal solution at x1 = 20, x2 = 80 with an optimal value of 360.
- First option: We multiply the 1st inequality by 2 and the 2nd by 4, from which we get 2x1 ≤ 100 and 4x2 ≤ 320 .
Adding the two together, we get 2x1 + 4x2 ≤ 420, which provides an upper bound, although not very tight.
- Second option: However, there is a better choice of factors by multiplying the 2nd and the 3rd inequalities by 2,
and adding them together, we get 2x1 + 4x2 ≤ 360 , that is a certificate of optimality.
Is there a way to get the best coefficients?
We assign to every constraint a non-negative multiplicative coefficient, y1 , y2 , y3 ≥ 0 , and we get:
y1 x1 ≤ 50y1
y2 x2 ≤ 80y2
y3 (x1 + x2 ) ≤ 100y3
Adding them together we obtain: x1 (y1 + y3 ) + x2 (y2 + y3 ) ≤ 50y1 + 80y2 + 100y3
Now we want to build the best upper bound to 2x1 + 4x2 , so any expression c1 x1 + c2 x2 (on the left-hand side), where
c1 ≥ 2 and c2 ≥ 4, would provide an upper bound, because all variables are non-negative, and we want the upper
bound (on the right-hand side) to be as small as possible.
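Collecting these requirements, the choice of the best coefficients is itself an optimization problem:

min 50y1 + 80y2 + 100y3 , subject to:
y1 + y3 ≥ 2
y2 + y3 ≥ 4
y1 , y2 , y3 ≥ 0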
The second problem is called the dual of the first, which, in turn, is called the primal problem.
Moreover the optimal solution to the dual is (0, 2, 2), and the corresponding objective is 360.
(...) On the ipad
Theorem: (Certificate of optimality)
If x and y are feasible solutions of the primal and the dual and cT x = yT b, then x and y must be optimal solutions
to the primal and the dual.
Remark: There is another interesting consequence of weak duality that relates infiniteness of optimal values in the
primal/dual with feasibility of the dual/primal.
Let y be a feasible solution of the dual. By weak duality, we have cT x ≤ yT b for all feasible x. If the optimal value
in the primal is ∞, then ∞ ≤ yT b. This is not possible, so the dual cannot have a feasible solution.
Theorem:
If the optimal value in the primal is ∞, then the dual must be infeasible.
If the optimal value of the dual is −∞, then the primal must be infeasible.
Primal: max cT x , subject to Ax ≤ b + ∆ , x ≥ 0
Dual: min yT (b + ∆) , subject to yT A ≥ cT , y ≥ 0
Let the optimal value of the problem in both the primal and the dual be z(∆),
so that z(0) is the optimal value of the original problem.
Suppose that the optimal solution to the dual is unique and that, for sufficiently small ∆,
the optimal solution to the dual does not change.
In this case, the optimal value changes by z(∆) − z(0) = yT b + yT ∆ − yT b = yT ∆.
Remark: By strong duality, the optimal solution to the primal changes by the same amount, yT ∆.
Summary: A small change in the level of resources in the primal induces a change in the optimal value, scaled by the dual solution: every dual optimal variable scales the corresponding variation in the objective of the primal.
The change in the primal’s optimal value, z(∆) − z(0) = yT ∆, is determined by how much each component of the dual solution vector y weighs the corresponding change in resources.
Example: Suppose there’s a constraint related to production (e.g., buying a new machine) that could increase the
weekly production by a certain amount δ.
The increase in weekly production leads to a variation in revenues of 2δ.
If the cost of the machine is less than 2δ, then investing in the machine is profitable, otherwise the investment would
result in a loss because the cost of acquiring the machine would outweigh the potential increase in revenue.
This is why the dual optimal solutions are sometimes called shadow prices.
(...)
Theorem: (Complementary slackness)
Let x and y be feasible solutions to the primal and the dual.
Then, x and y are optimal solutions if and only if yT (b − Ax) = 0 and (yT A − cT )x = 0.
Proof:
Feasibility for x and y means:
- x is feasible ⇒ Ax ≤ b, that is b − Ax ≥ 0; since y ≥ 0, this gives yT (b − Ax) ≥ 0.
- y is feasible ⇒ yT A ≥ cT , that is yT A − cT ≥ 0; since x ≥ 0, this gives (yT A − cT )x ≥ 0.
Adding the two inequalities we get yT (b − Ax) + (yT A − cT )x = yT b − yT Ax + yT Ax − cT x = yT b − cT x ≥ 0.
If x and y are optimal, then yT b = cT x (strong duality), so the two non-negative terms sum to zero, and therefore each of them must be zero.
Conversely, if yT (b − Ax) = 0 and (yT A − cT )x = 0, then cT x = yT b, and by the certificate-of-optimality theorem x and y are optimal.
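As a sanity check on the example of Section 9.1, take x = (20, 80) and y = (0, 2, 2): b − Ax = (50 − 20, 80 − 80, 100 − 100) = (30, 0, 0), so yT (b − Ax) = 0; and yT A − cT = (0 + 0 + 2 − 2, 0 + 2 + 2 − 4) = (0, 0), so (yT A − cT )x = 0. Both conditions hold, as they must at the optimum; note also that y1 = 0 is exactly the multiplier of the constraint x1 ≤ 50, the one that is not tight.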
10 NP-completeness
10.1 Introduction
How do we judge how good an algorithm is? Typically we try to analyze an algorithm by predicting the resources that it requires; most often we measure the running time.
Given an instance of a particular problem, we can measure its size by an integer p that represents the length of the encoding of the input data (in binary notation).
Running time of an algorithm: the number of elementary operations performed, measured as a function of the input size p.
However, a precise measure is not so important: we are interested in the order of growth of the running time function, and for this reason we use asymptotic notation.
Definition: Given f : N → N and g : N → N , we say that f = O(g) if ∃ c, p′ such that f (p) ≤ c · g(p) for all p ≥ p′ .
An algorithm is efficient (or polynomial time) if its running time f (p) is a polynomial function of the input size p,
i.e. f (p) = O(pk ) for some fixed constant k.
⇒ Remark: The notion of running time allows us to classify problems in terms of computational complexity, whether or not we currently know how to solve them efficiently.
The formal classification applies to so-called decision problems, that is, problems in which the answer is yes or no.
Some examples of this kind of problem are:
- Does a graph G have an s-t path with at most k edges?
- Does a graph G have an s-t path with at least k edges?
- Does a minimization LP have a feasible solution of objective function value ≤ k?
⇒ Note that: So far we have mostly dealt with optimization problems, in which we are asked to find the best solution according to a given cost function.
Typically we can state optimization problems as decision problems, and solve them by relying on solving the decision version multiple times, e.g. with binary search.
P is the class of decision problems that can be solved by a polynomial-time algorithm,
and by definition P is a subset of NP.
⇒ In particular, for a problem in P we can find a solution and verify its correctness in polynomial time.
NP is the class of decision problems admitting a certificate, from which the correctness of a yes-answer
can be derived in polynomial time.
⇒ In particular, we can verify the correctness of a proposed solution in polynomial time.
NP-complete problems: (Informally) NP-complete problems are problems in NP such that the following holds: << If you have a polynomial-time algorithm for one of them, you can use it to solve any problem in NP in polynomial time >>.
Definition: (Reductions) We say that a decision problem A reduces to a decision problem B if:
i) Given an instance I of problem A, you can construct an instance I’ of problem B in polynomial time in size(I).
ii) I admits a yes-answer if and only if I’ admits a yes-answer.
We let A ≤ B denote that A reduces to B.
We will often encounter problems called NP-hard, which are problems for which only (*) applies (they need not belong to NP themselves).
Procedure: When dealing with your own problem X you can exploit reductions in 2 ways:
(1) Trying to reduce your own problem X to a problem Y ∈ P (solvable in polynomial time) → This shows that
your problem X can be solved efficiently.
(2) Trying to reduce an NP-complete problem Y to your problem X → This shows that your problem X is NP-complete too, which means that most likely it can’t be solved efficiently (unless P = NP).
Definition: A bipartite graph is an undirected graph G = (V, E) where V can be partitioned into two subsets L, R
such that every edge e ∈ E has an endpoint in L and the other one in R.
Definition: (Matching) A matching in a graph G = (V, E) is a subset of edges M ⊂ E such that every node in V is
the endpoint of at most one edge in M .
Definition: (Maximum matching in bipartite graphs) Given G = (V, E), a maximum matching is a matching M
of maximum cardinality (i.e. maximizing |M |).
Theorem: Maximum matching in bipartite graphs is solvable in polynomial-time (true also if G is non bipartite).
Proof: We’ll prove the theorem using a reduction to flow, but first let’s state the problem in a decision version:
Given G = (V, E) and a value k, does G have a matching M with |M | ≥ k? (*)
We’ll reduce (*) to a flow problem.
Given an instance G = (L ∪ R, E) of (*) we’ll construct a flow instance on a directed G′ = (V ′ , E ′ ) as follows.
- We set V ′ = L ∪ R ∪ {s} ∪ {t}
- For each (u, v) ∈ E orient the edge from u ∈ L to v ∈ R, and add it to E ′ .
- Add (s, u) to E ′ for all u ∈ L.
- Add (v, t) to E ′ for all v ∈ R.
- Set capacity of all arcs to 1
This construction can be done in polynomial time (in size(G)).
Claim: There exists an s-t flow in G′ of value ≥ k if and only if there exists a matching in G of value ≥ k.
Given the claim, since we know we can decide whether G′ has a flow of value ≥ k in polynomial-time (apply F&F),
we can decide (*) in polynomial-time.
⇒ Note that: If we can decide (*) in polynomial-time, we can solve the maximum matching problem in polynomial
time, by trying all values of k from 1 to |V |.
Conclusion: Since we know that we can decide the existence of a flow with value ≥ k in polynomial time using
algorithms like Ford-Fulkerson, and because of the correspondence established in the claim, we can also decide whether
a graph has a matching of size ≥ k in polynomial time.
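A self-contained sketch of the whole idea: the function below searches for augmenting paths directly on G′, which is exactly what Ford & Fulkerson does on this unit-capacity network (s and t are kept implicit). All names are assumptions.

def max_bipartite_matching(L, R, E):
    # adjacency of G, with every edge oriented from L to R as in E'
    adj = {u: [v for (a, v) in E if a == u] for u in L}
    match = {}                              # match[v] = vertex of L currently matched to v

    def augment(u, visited):
        # DFS for an augmenting path starting at u in the residual graph
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                if v not in match or augment(match[v], visited):
                    match[v] = u            # (re)match v to u
                    return True
        return False

    return sum(augment(u, set()) for u in L)

# L = {1, 2}, R = {3, 4}, edges (1,3), (1,4), (2,3) -> maximum matching of size 2
print(max_bipartite_matching({1, 2}, {3, 4}, {(1, 3), (1, 4), (2, 3)}))   # 2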
10.2 SAT
The first problem shown to be NP-complete is SAT; to define it formally we need the following notions:
Definition of literal: Given a set of boolean variables x1 , ..., xn (variables that can take value 0 or 1),
a literal l is either a variable (xi ) or its negation (x′i ).
Conjunctive normal form: A boolean formula in conjunctive normal form is an AND of clauses, where each
clause is an OR of literals; so it is an expression of the form C1 ∧ C2 ∧ ... ∧ Cn , where ∧ denotes AND.
Truth assignment: A truth assignment assigns a value 1/0 (true/false) to each variable. (...) On the ipad
SAT problem: (Decision version) Given a boolean formula in conjunctive normal form, is there a truth assignment
for the variables that satisfies all the clauses (i.e. that makes the whole formula true)?
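A tiny sketch of these definitions in code: a literal is a signed integer (+i for xi, −i for x′i), a clause is a list of literals, and checking a given truth assignment is easy — finding one is the hard part (the brute-force search below takes 2^n steps in the worst case):

from itertools import product

def satisfies(clauses, a):
    # a[i] is the truth value of x_i; a clause is satisfied if some literal in it is true
    return all(any(a[abs(l)] == (l > 0) for l in clause) for clause in clauses)

clauses = [[1, -2], [-1, 2, 3]]                 # (x1 v x2') ^ (x1' v x2 v x3)
for bits in product([False, True], repeat=3):   # brute force over all 2^3 assignments
    a = {i + 1: b for i, b in enumerate(bits)}
    if satisfies(clauses, a):
        print(a)                                # a satisfying truth assignment
        break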
General idea:
(1) The execution of one step of the algorithm can be simulated by a so-called Boolean circuit of size O(s(n)).
(2) The execution of the entire algorithm can be simulated by combining t(n) such circuits, yielding a boolean circuit of size O(t(n)s(n)) = O(t²(n)) (because the algorithm at each step affects O(1) memory locations).
(3) Model such a circuit with a SAT formula.
What is a boolean circuit? A boolean circuit is an abstract device that computes boolean functions (f : {0, 1}^n → {0, 1}).
The device is made of boolean gates and wires, and is modeled as a directed acyclic graph whose nodes are gates and whose arcs are wires connecting the gates.
- input of circuit → nodes of in-degree 0
- output of circuit → nodes of out-degree 0
- size of circuit → number of total gates
Given an instance I of X, one computes n = size(I) and constructs a circuit Cn that simulates AX on (S, I).
(...) On the ipad
Essentially, given X ∈ NP and its verifier AX , for any instance I of X we construct (*) (→ in time polynomial in size(I)).
Assume (*) is a circuit with h inputs and m gates; we’ll define a boolean formula with h + m variables g1 , ..., gh+m .
Think of the circuit as an acyclic graph and number the input nodes and the gate nodes consistently, according to the topological order: v1 , ..., vh , vh+1 , ..., vh+m , where v1 , ..., vh are the inputs and vh+1 , ..., vh+m are the gates.
We will construct a SAT formula by:
- associating a small formula to each gate
- taking the AND of all these small formulas and the final gate.
Theorem: (Circuits can simulate algorithms) If A is an algorithm that takes worst-case running time t(n) on
input of length n, then for every n there is a boolean circuit of size O(t2 (n)), such that for every input I of
length n, the output of the circuit on I is the same as the output of algorithm A on I.
Furthermore, given the code of A and n, such a circuit is computable in time polynomial in t(n).
For each i = 1, ..., m, the formula associated to vh+i (the i-th gate) is built as follows, where ¬g denotes the negation of g:
- If vh+i is a NOT gate, with an incoming edge from vj :
  (¬gh+i ∨ ¬gj ) ∧ (gh+i ∨ gj ), which represents gh+i = ¬gj
- If vh+i is an AND gate, with incoming edges from vj and vl :
  (gh+i ∨ ¬gj ∨ ¬gl ) ∧ (¬gh+i ∨ gj ∨ gl ) ∧ (¬gh+i ∨ ¬gj ∨ gl ) ∧ (¬gh+i ∨ gj ∨ ¬gl ), which represents gh+i = gj ∧ gl
- If vh+i is an OR gate, with incoming edges from vj and vl :
  (gh+i ∨ ¬gj ∨ ¬gl ) ∧ (¬gh+i ∨ gj ∨ gl ) ∧ (gh+i ∨ ¬gj ∨ gl ) ∧ (gh+i ∨ gj ∨ ¬gl ), which represents gh+i = gj ∨ gl
Now it remains to observe that:
- If (g1 , ..., gh+m ) = (b1 , ..., bh+m ) is an assignment that satisfies the formula (*), then on input (b1 , ..., bh ) the i-th gate evaluates to bh+i , and bh+m = 1 (the circuit outputs 1).
(*) F: (formula for gate 1) ∧ ... ∧ (formula for gate m) ∧ gh+m
- Conversely, if b̃1 , ..., b̃h is a set of input values that makes the circuit output 1, then, letting b̃h+i be the value of gate i on this input, (b̃1 , ..., b̃h+m ) satisfies the formula (*).
This concludes the sketch of the proof that SAT is NP-complete.
10.3 3-SAT
3-SAT problem: (Decision version) Given a boolean formula in conjunctive normal form where each clause has exactly 3 literals, is there a truth assignment for the variables that satisfies all clauses?
Vertex cover problem: (Decision version) Given a graph G = (V, E) and an integer k (0 < k < |V |),
is there a vertex cover C with |C| ≤ k?
Then necessarily, for every variable xi , exactly one of the two nodes in the xi -gadget is in the cover.
We set xi = 1 if the node xi is in the cover and xi = 0 otherwise.
Now we observe that this truth assignment satisfies all clauses, because exactly 2 nodes of each Ci -gadget must be included in the cover, so the literal whose node is not included in the cover has the right truth value.
Now assume there exists a satisfying truth assignment for the 3-SAT formula; we can construct a vertex cover C as follows:
for all xi , add the node xi to C if xi = 1 and the node x′i to C if xi = 0.
For each clause Ci we know there exists at least one literal satisfying it; we add the other 2 nodes of its triangle to C.
⇒ Note that: |C| = n + 2m, and C is a vertex cover.
⇒ Note that: our graph has poly(n, m) nodes and edges, hence Vertex Cover is NP-complete.
The goal is to achieve the following: if we have a subset of the integers that sums up to k′, then the chosen a′ij give us an independent set of cardinality k, the chosen b′ij correspond to the edges with no endpoint in our independent set, and vice versa.
We represent the integers in a matrix where each integer is a row, and the row should be read as the base-10 representation of that integer.
- There is one special column plus a column for every edge.
- There is a row for every vertex (a′i ) and a row for every edge (b′ij ).
- Column (i, j) has a 1 in rows a′i , a′j and b′ij .
We fix k′ = k · 10^{|E|} + Σ_{i=0}^{|E|−1} 10^i (that is, a 1 in every edge column and k in the special column).
⇒ Note that: the constructed subset sum instance has size polynomial in the size of the graph.
Now we observe that there exist integers summing up to k′ iff there exists an independent set of cardinality ≥ k.
10.8 Knapsack
Given n objects with weights w1 , ..., wn , a budget B, profits p1 , ..., pn and a parameter k,
we ask whether there exists I ⊆ {1, ..., n} such that Σ_{i∈I} wi ≤ B and Σ_{i∈I} pi ≥ k (decision version).