Combined Doc
Classification
A motivating example
Optimizing the Product Mix
in Manufacturing
Linear Programming (LP)
formulation
Graphical Representation
Objective:
About the Solution
The feasible region is delimited by a convex polygon.
Determined by its vertices (corner points)
Linear Programming
min Z = c1 x1 + c2 x2 + . . . + cn xn
subject to
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
...
am1 x1 + am2 x2 + . . . + amn xn = bm
x1 ≥ 0, . . . , xn ≥ 0
For maximization it would look the same, just the “min” would be replaced
by “max”.
In matrix notation, with A a constant matrix and b, c constant vectors, this can be written compactly as
min Z = cx
subject to
Ax = b
x ≥ 0
Other formulations are also possible. An often used version is shown below
(some textbooks call this one the standard form, rather than the one above):
min Z = cx
subject to
Ax ≥ b
x≥0
Note: x is meant to be a column vector, which implies that c must be a row vector,
so that the product cx makes sense. If c is a column vector, then we write
cT x. Usually it is clear from the context which vector is a column and which
is a row, so the distinction is often not shown explicitly.
Why does it make sense to increase the number of variables? Because the
standard form may be advantageous (For example, an LP solver program
may require the input in standard form).
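To make this concrete, here is a small sketch (not part of the original note; the numbers, the variable names and the use of scipy.optimize.linprog are assumptions made for illustration). The same tiny LP is solved once in inequality form and once after conversion to the equality ("standard") form with surplus/slack variables; both runs should report the same optimum.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])                     # objective: min x + 2y

# (a) inequality form; linprog wants A_ub @ v <= b_ub, so the ">=" row is negated
A_ub = np.array([[-1.0, -1.0],               # -(x + y) <= -3   (i.e. x + y >= 3)
                 [ 1.0, -1.0]])              #   x - y   <=  1
b_ub = np.array([-3.0, 1.0])
res_ineq = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)

# (b) standard (equality) form: subtract a surplus s1, add a slack s2
#     x + y - s1 = 3,    x - y + s2 = 1,    all variables >= 0
c_std = np.array([1.0, 2.0, 0.0, 0.0])
A_eq = np.array([[1.0,  1.0, -1.0, 0.0],
                 [1.0, -1.0,  0.0, 1.0]])
b_eq = np.array([3.0, 1.0])
res_std = linprog(c_std, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)

print(res_ineq.fun, res_ineq.x)              # same optimal value in both runs
print(res_std.fun, res_std.x[:2])
```

The extra variables change nothing about the optimum of the original variables; they only reshape the constraints into the fixed form a solver (or the simplex method) expects.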
Exercises
min Z = 5x − 6y
subject to
2x − 3y ≥ 6
x−y ≤ 4
x ≥ 3
Notice that this will indeed force the original left-hand side (the part before
x4 ) to be greater than or equal to 6, as the original inequality required. The reason
is that x4 ≥ 0 must hold, since all variables are forced to be nonnegative in
the standard form. Therefore, the expression before x4 must be ≥ 6, since
subtracting a nonnegative quantity from it makes it equal to 6. If the original
inequality had the ≤ direction, then the only difference would be that we
would add the slack variable, rather than subtracting it.
After transforming the other inequalities, too, and also substituting the old
variables in the objective function with the new ones, we get the entire stan-
dard form:
3.
4.
5.
schedule that minimizes the cost, such that in each month the demand is satisfied.
• The routes are fixed and known in advance, each route goes through a
known set of links. (These sets can possibly overlap, as the routes may
share links.)
• Each link has a known available capacity, which cannot be exceeded by the
routes that use the link, in the sense that the sum of the route bandwidths on
the link cannot be more than the link capacity.
• Each route generates a profit, due to the traffic it carries. The profit of each
route is proportional to the bandwidth assigned to the route. The profit
generated by unit bandwidth is known for each route (may be different for
different routes).
Under the above conditions, the company wants to decide how much bandwidth to
assign to each route. The goal is that the ratio of the total profit vs. the total cost is
maximized. In other words, they want to maximize the yield of the bandwidth
investment in the sense that it brings the highest profit percentage. Formulate this
optimization problem as a linear program.
Hint: Formulate it first as optimizing the ratio of two linear functions under linear
constraints. This is still not an LP, since the objective function is a fraction. Then
convert it into an LP in two steps, using the following idea.
Step 1. If you multiply both the numerator and the denominator of the fractional
objective function by the same (nonzero) number, then the value of the fraction
remains the same. Therefore, after introducing such a scaling factor, which will be
an extra variable, we can fix the denominator at an arbitrary fixed nonzero value,
say 1. This gives a new constraint. After adding the new constraint, we can aim at
maximizing only the numerator in the new system, so we got rid of the fraction in
the objective function.
Step 2. While this new system is still nonlinear, containing the product of the
original variables and the scaling factor, we can make it linear by introducing
another new variable for the product of variables.
Solution of Exercises 5 and 6
Exercise 5. (See the problem description in the previous lecture note.)
Let us introduce the following variables:
xi : amount of regular production in month i
yi : amount of overtime production in month i
ui : amount used from storage in month i
vi : amount put into storage in month i
wi : amount available in storage at the beginning of month i
Then in each month we need to satisfy the following constraints:
xi + yi + ui − vi = di
wi+1 = wi − ui + vi
Explanation: The first constraint expresses that the regular production,
plus the overtime production, plus what we use from storage, should satisfy
the demand (di ), after subtracting the amount that we put into storage. The
second constraint expresses that the storage content at the beginning of the
next month results from what is available in the current month, minus what
we use from storage, plus what we add to it.
We have to include some further constraints:
• xi ≤ r (∀i)
(at most r units can be produced by regular production each month)
• w1 = 0
(there is nothing yet in storage at the beginning of the first month)
• wn+1 = 0
(there is nothing left in storage after the last month)
• ui ≤ wi (∀i)
(we cannot use more from storage than what is available there)
The objective function is the total cost, summed over all months, including
regular production, overtime, and storage. To make the expression more
general, let us allow the cost coefficients to vary month by month; this is expressed
by indexing them with the month index i. Thus, the objective function is:
Z = Σ_{i=1}^{n} (bi xi + ci yi + si wi ).

The entire linear program is thus:

min Z = Σ_{i=1}^{n} (bi xi + ci yi + si wi )
Subject to
xi + yi + ui − vi = di (∀i)
wi+1 = wi − ui + vi (∀i)
xi ≤ r, ui ≤ wi (∀i)
w1 = 0, wn+1 = 0
xi , yi , ui , vi , wi ≥ 0 (∀i)
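As an illustration (not part of the original exercise), the LP above can be handed to a modeling library almost line by line. The sketch below uses PuLP with made-up demand and cost data, and for simplicity keeps the cost coefficients constant over the months:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

d = [100, 150, 200, 120]          # demand d_i per month (made-up numbers)
r = 130                           # regular production capacity per month
b, c, s = 2.0, 3.5, 0.5           # per-unit costs: regular, overtime, storage
n = len(d)
months = list(range(1, n + 1))

x = LpVariable.dicts("x", months, lowBound=0)                 # regular production
y = LpVariable.dicts("y", months, lowBound=0)                 # overtime production
u = LpVariable.dicts("u", months, lowBound=0)                 # taken out of storage
v = LpVariable.dicts("v", months, lowBound=0)                 # put into storage
w = LpVariable.dicts("w", list(range(1, n + 2)), lowBound=0)  # storage at month start

prob = LpProblem("production_schedule", LpMinimize)
prob += lpSum(b * x[i] + c * y[i] + s * w[i] for i in months)

for i in months:
    prob += x[i] + y[i] + u[i] - v[i] == d[i - 1]   # demand satisfaction
    prob += w[i + 1] == w[i] - u[i] + v[i]          # storage balance
    prob += x[i] <= r                               # regular production capacity
    prob += u[i] <= w[i]                            # cannot use more than stored
prob += w[1] == 0
prob += w[n + 1] == 0

prob.solve()
print("total cost:", value(prob.objective))
for i in months:
    print(i, x[i].value(), y[i].value(), u[i].value(), v[i].value(), w[i].value())
```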
Exercise 6. (See the problem description in the previous lecture note.)
Let aij = 1 if route j uses link i, and aij = 0 otherwise. Note that this matrix is
also known in advance (it is part of the input), since it is assumed that the
route system is given.
Variables: xj , j = 1, . . . , R; xj represents the bandwidth (capacity) assigned to route j.
The capacity constraint of link i can then be written as Σ_{j=1}^{R} aij xj ≤ Ci ,
where Ci is the capacity of link i.
Note that while all routes occur in this expression, only those contribute to
the sum which use link i. For these we have aij = 1. For the rest, which do
not use the link, we have aij = 0, so they do not contribute to the sum.
The objective function is the ratio of the total profit vs. the total cost,
summed over all routes:

Z = ( Σ_{j=1}^{R} pj xj ) / ( Σ_{j=1}^{R} cj xj )
Thus, the mathematical programming (but not yet linear programming!)
formulation that directly follows the verbal description is this:
max Z = ( Σ_{j=1}^{R} pj xj ) / ( Σ_{j=1}^{R} cj xj )
Subject to
Σ_{j=1}^{R} aij xj ≤ Ci (∀i)
xj ≥ 0, j = 1, . . . , R
In general, the same idea applies to any fractional objective of the form

max Z = (cx + α) / (dx + β)
Subject to
Ax = b
x≥0
To avoid additional difficulties, we assume that the denominator dx + β
is always positive in the feasible domain. Let us introduce a new variable
t, which we call scaling factor. If we multiply both the numerator and the
denominator in the objective function by t, then the value of the ratio remains
the same:
Z = (cx + α)/(dx + β) = ((cx + α)t)/((dx + β)t) = (cxt + αt)/(dxt + βt)
Observe now that the arbitrary value of t allows it to be chosen such that the
denominator becomes 1. This can be enforced by a new constraint. Then,
the denominator being 1, it is enough to maximize the numerator. Thus, the
following task will have the same optimum as the original:
max cxt + αt
Subject to
dxt + βt = 1
Ax = b
x≥0
(A “sanity check” question: the usage of t assumed that t ≠ 0. How do we
know that it holds? Answer: the constraint dxt + βt = 1 guarantees it, as
it could not hold with t = 0. Note: it also could not hold with dx + β = 0,
but we assumed that the denominator dx + β is always positive.)
The above formulation is still nonlinear, since xt is a product of two variables.
Let us introduce a new variable y by
y = xt. (1)
max cy + αt
Subject to
dy + βt = 1
Ax = b
x≥0
This is already a linear programming task, but the relationship between x, y
and t is expressed by the nonlinear formula (1). This relationship cannot be
ignored, since it represents a dependence among the variables. If, however,
we include (1) as a constraint, then the task becomes nonlinear again! We
can avoid this problem by expressing x from (1) as
x = (1/t) y
and using it in place of x. Then we get
max cy + αt
Subject to
dy + βt = 1
A (1/t) y = b
(1/t) y ≥ 0
Multiplying both sides in the last two constraints by t, we get
max cy + αt
Subject to
dy + βt = 1
Ay − bt = 0
y≥0
which is already a linear programming task. This is what we wanted! ♥
1. How do we know that the inequality (1/t) y ≥ 0 indeed transforms into y ≥ 0?
After all, this only holds if t > 0, which was not explicitly required.
Answer: the constraint dy + βt = 1 enforces t > 0, because it is equivalent
to dxt + βt = 1, which is in turn equivalent to (dx + β)t = 1. As we assumed
dx + β > 0, the equation (dx + β)t = 1 could not hold with t ≤ 0.
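A small numerical sketch of the whole two-step transformation (the data and the use of scipy.optimize.linprog are assumptions made for illustration, not part of the note):

```python
import numpy as np
from scipy.optimize import linprog

c, alpha = np.array([3.0, 1.0]), 1.0     # numerator coefficients
d, beta  = np.array([1.0, 2.0]), 2.0     # denominator coefficients (positive on feasible set)
A, b     = np.array([[1.0, 1.0]]), np.array([4.0])   # original constraint A x = b

# LP variables are (y1, y2, t); linprog minimizes, so negate the objective c.y + alpha*t
obj  = np.concatenate([-c, [-alpha]])
A_eq = np.vstack([
    np.concatenate([d, [beta]]),              # d.y + beta*t = 1   (fixes the denominator)
    np.hstack([A, -b.reshape(-1, 1)]),        # A y - b t   = 0    (original constraint times t)
])
b_eq = np.array([1.0, 0.0])

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
y, t = res.x[:2], res.x[2]
x = y / t                                     # map back to the original variables
print("optimal ratio:", -res.fun, "at x =", x)   # expect 13/6 at x = (4, 0) for these data
```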
An LP Formulation Example:
Minimum Cost Flow
The Minimum Cost Flow (MCF) problem is a frequently used model. It can
be described in our context as follows.
Model
Given a network with N nodes and links among them. We would like to
transport some entity among nodes (for example, data), so that it flows
along the links. The goal is to determine the optimal flow, that is, how much
flow is put on each link, according to the conditions discussed below.
Let us review the input for the problem, the objective and constraints and
then let us formulate it as a linear programming task.
Input data:
• Each node i is associated with a given number bi , the source rate of the
node. If bi > 0, the node is called a source, if bi < 0, the node is a sink.
If bi = 0, the node is a “transshipment” node that only forwards the
flow with no loss and no gain.
• Each link is associated with a cost factor aij ≥ 0. The value of aij is
the cost of sending unit amount of flow on link (i, j). Thus, sending xij
amount of flow on the link costs aij xij .
• Each link (i, j) is associated with a capacity Cij ≥ 0, the maximum amount
of flow the link can carry.
Remarks:
• The links are directed in this model, Cij and Cji may be different.
• If the link from i to j is missing from the network, then Cij = 0. Thus,
the capacities automatically describe the network topology, too.
Constraints:
• Capacity constraint: The flow on each link cannot exceed the capacity
of the link.
• Flow conservation: The total outgoing flow of a node minus the total
incoming flow of the node must be equal to the source rate of the node.
That is, the difference between the flow out and into the node is exactly
what the node produces or sinks. For transshipment nodes (bi = 0) the
outgoing and incoming flow amounts are equal.
Objective:
Find the amount of flow sent on each link, so that the constraints are satisfied
and the total cost of the flow is minimized.
LP Formulation
Let xij denote the flow on link (i, j). The xij are the variables we want to
determine.
xij ≥ 0 (∀i, j)
• Capacity constraints:
xij ≤ Cij (∀i, j)
• Flow conservation:
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = bi (∀i)
Here the first sum is the total flow out of node i, the second sum is the
total flow into node i.
The objective function is the total cost, summed for all choices of i, j :
Z = Σ_{i,j} aij xij

Putting everything together, the LP formulation is:

min Z = Σ_{i,j} aij xij
subject to
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = bi (∀i)
xij ≤ Cij (∀i, j)
xij ≥ 0 (∀i, j)
Is this in standard form? No, but can be easily transformed into standard
form. Only the xij ≤ Cij inequalities have to be transformed into equations.
This can be done by introducing slack variables yij ≥ 0 for each, and replacing
each original inequality xij ≤ Cij by xij + yij = Cij .
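A sketch of this formulation in code (the network data are made up, and scipy.optimize.linprog is just one possible solver). Here the capacity constraints are passed as simple variable bounds, which is equivalent to adding the slack variables described above:

```python
import numpy as np
from scipy.optimize import linprog

edges = [(0, 1, 3.0, 1.0),   # (i, j, capacity C_ij, unit cost a_ij)
         (0, 2, 2.0, 2.0),
         (1, 2, 1.0, 1.0),
         (1, 3, 3.0, 1.0),
         (2, 3, 2.0, 1.0)]
b = [4.0, 0.0, 0.0, -4.0]    # source rates: node 0 is the source, node 3 the sink
n_nodes, n_edges = len(b), len(edges)

cost = np.array([a for (_, _, _, a) in edges])
A_eq = np.zeros((n_nodes, n_edges))
for k, (i, j, _, _) in enumerate(edges):
    A_eq[i, k] += 1.0        # x_ij leaves node i
    A_eq[j, k] -= 1.0        # x_ij enters node j
bounds = [(0.0, C) for (_, _, C, _) in edges]   # 0 <= x_ij <= C_ij

res = linprog(cost, A_eq=A_eq, b_eq=np.array(b), bounds=bounds)
print("minimum total cost:", res.fun)
for (i, j, _, _), f in zip(edges, res.x):
    print(f"flow on ({i},{j}) = {f:.1f}")
```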
An Application to Network Design
We can build links between any pair of nodes. The cost for
unit capacity (=1 Mbit/s) on a link from node i to j is aij .
Higher capacity costs proportionally more, lower capacity costs
proportionally less.
The goal is to design which links will be built and with how
much capacity, so that the given demand can be satisfied and
the overall cost is minimum.
Let zij be the capacity we implement on link (i, j). This is not
given, this is what we want to optimize. If the result is zij = 0
for some link, then that link will not be built.
With this notation the cost of link (i, j) is aij zij , so the objective
function to be minimized is
Z = Σ_{i,j} aij zij
To express the constraints, let xij^(kl) be the amount of flow on
link (i, j) that carries traffic from node k to l (not known in
advance). Then the total traffic on link (i, j) is obtained by
summing up these variables for all k, l :

Σ_{k,l} xij^(kl) .
The flow conservation should hold for each piece of flow, that
is, for the flow carrying traffic between each source-destination
pair k, l. To express this concisely, let us define new constants
by
di^(kl) = bkl if i = k,  −bkl if i = l,  0 otherwise.

The value of di^(kl) shows whether node i is a source, sink or trans-
shipment node for the k → l flow.
Thus, the flow conservation constraints (one for each node and
for each flow) can be written as
Σ_j xij^(kl) − Σ_r xri^(kl) = di^(kl) (∀i, k, l)
min Z = Σ_{i,j} aij zij
subject to
Σ_j xij^(kl) − Σ_r xri^(kl) = di^(kl) (∀i, k, l)
Σ_{k,l} xij^(kl) − zij ≤ 0 (∀i, j)
xij^(kl) ≥ 0 (∀i, j, k, l)
zij ≥ 0 (∀i, j)
Comments
A Fast Solution
• For each demand pair (k, l), find a minimum cost path from k to l, using the
aij values as link weights.
• Set the capacity of link (i, j) to the sum of those bkl values
for which (i, j) is on the min cost path found for k, l.
(A small code sketch of this heuristic is shown below.)
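A possible sketch of this heuristic (the graph and demand data are made up, and networkx is only one convenient way to get the shortest paths):

```python
import networkx as nx
from collections import defaultdict

a = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0,
     (0, 2): 2.5, (2, 0): 2.5}                 # unit capacity costs a_ij
demand = {(0, 2): 5.0, (2, 1): 3.0}            # traffic demands b_kl

G = nx.DiGraph()
for (i, j), cost in a.items():
    G.add_edge(i, j, weight=cost)

z = defaultdict(float)                         # capacities to install
for (k, l), bkl in demand.items():
    path = nx.shortest_path(G, k, l, weight="weight")
    for i, j in zip(path, path[1:]):
        z[(i, j)] += bkl                       # this demand loads every link on its path

total = sum(a[e] * cap for e, cap in z.items())
print(dict(z), "total cost:", total)
```

For this toy instance the 0 → 2 demand is routed over the two cheap links rather than the expensive direct one, and each link is then sized to the total demand it carries.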
Geometric Interpretation
[Figure: the feasible region of a two-variable LP is a convex polygon; the objective function line is pushed as far as possible, and the optimum is attained at a vertex (marked “← Optimum”).]
In case there are only two variables, this can be graphically represented and solved
in the plane.
Graphical solution: After finding the polygonal boundary of the feasible domain
D, as illustrated in the figure above, we “push” the line 3x1 + 3x2 = a, representing
the objective function, as far as possible, so that it still intersects D. The optimum
will be attained at a vertex of the polygon.
If, however, as is typical in applications, there are many variables, this simple
graphical approach does not work; one needs more systematic methods.
Comments on LP Algorithms
The historically first and most fundamental network flow problem is the
Maximum Flow Problem. Recall that we have already seen two other flow
models: Minimum Cost Flow and Multicommodity Flow.
Model
Given a network with N nodes and links among them. We would like to
transport some entity (for example, data) from a source node to a desti-
nation node so that it flows along the links. The goal is to determine the
maximum possible amount of flow that can be pushed through the network,
such that link capacity contraints are obeyed and at the intermediate nodes
(also called transshipment nodes) the flow is conserved, i.e., whatever flows
into an intermediate node, the same amount must flow out.
Let us review the input for the problem, the objective and constraints and
then let us formulate it as a linear programming task.
Input data:
• A source node s and a destination node d, between which we want to send the flow.
• Each link (i, j) is associated with a capacity Cij ≥ 0, the maximum amount of
flow the link can carry.
Remarks:
• The links are directed in this model, Cij and Cji may be different.
• If the link from i to j is missing from the network, then Cij = 0. Thus,
the capacities automatically describe the network topology, too.
Constraints:
• Capacity constraint: The flow on each link cannot exceed the capacity
of the link.
Objective:
Find the maximum amount of flow that can be sent through the network from
the source to the destination, such that the link capacity is not exceeded on
any link and the flow conservation constraint is satisfied at every intermediate
(transshipment) node.
LP Formulation
Let xij denote the flow on link (i, j). (The values of these variables are not
known in advance, we want to optimize them.)
Let us express the constraints:
xij ≥ 0 (∀i, j)
• Capacity constraints:
xij ≤ Cij (∀i, j)
• Flow conservation:
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = 0 (∀i ≠ s, d)
Here the first sum is the total flow out of node i, the second sum is the
total flow into node i.
The objective function is the total net flow out of the source:
F = Σ_{j=1}^{N} xsj − Σ_{k=1}^{N} xks

The complete LP formulation is:

max F = Σ_{j=1}^{N} xsj − Σ_{k=1}^{N} xks
subject to
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = 0 (∀i ≠ s, d)
xij ≤ Cij (∀i, j)
xij ≥ 0 (∀i, j)
Solution
Solving the Maximum Flow Problem directly as a linear program, although
possible, is not economical. There are several special algorithms which utilize
the special structure of the problem and, therefore, are faster and simpler
than a general LP solution.
The historically first and most fundamental maximum flow algorithm is due
to Ford and Fulkerson. Here we provide an informal summary on how the
Ford-Fulkerson algorithm works.
The key idea of the Ford-Fulkerson algorithm is that if we can find a so
called augmenting path, then the current flow (which may be initially 0) can
be increased without violating any constraints.
An augmenting path is any undirected path from s to d, such that on any
forward edge (an edge whose direction agrees with the path traversal) the
flow is strictly less than the capacity, while on any backward edge (an edge
with opposite direction to the path traversal) the flow is strictly positive.
If such an augmenting path is found then we can increase the net flow out
of the source, without constraint violation, by the following operation:
• Increase the flow on every forward edge by some ε > 0 (the same ε for
each edge).
• Decrease the flow on every backward edge by the same ε.
One can directly check that with this operation we do not violate any con-
straint, assuming ε was chosen small enough, such that we do not go over
the capacity on any forward edge, and do not pull the flow into negative on
any backward edge.
While the augmenting operation keeps all constraints satisfied, the net flow
out of the source increases by ε. The reasons are:
• If the first edge on the augmenting path is a forward edge, then the
flow on this edge grows by ε, and it does not change on any other edge
adjacent with the source. As a result, more flow is flowing out of the
source.
• If the first edge on the augmenting path is a backward edge, then the
flow on this edge decreases by ε, and does not change on any other edge
adjacent with the source. As a result, less flow is flowing back into
the source. Therefore, the net flow out of the source still grows by ε.
Thus, the augmenting operation increases the net flow that leaves the source,
while keeping all constraints satisfied. This increased flow must eventually
reach the destination, since it is preserved at every intermediate node.
Remark. It may seem at first an appealing interpretation of the augmenting
operation that we “push” more flow through the augmenting path.
This is, however, not correct. It may even happen that the flow is reduced on
every edge of the path, if they are all backward edges. Then how can the flow
always increase? The answer is that the augmenting operation rearranges the
flow in a way that the net flow out of the source eventually grows, but this
does not necessarily mean that the increment is simply realized along the
augmenting path.
Having introduced the augmenting path concept, there are some natural
questions that arise at this point:
For each edge (i, j) of the augmenting path, let rij denote the
maximum remaining allowed change in the flow (increase or decrease,
depending on whether the edge is forward or backward): rij = Cij − xij on a
forward edge, and rij = xij on a backward edge.
Then the right value of ε is the minimum of the rij values along the
augmenting path. Why? Because we want ε as large as possible (so
that we gain the largest possible flow increment by the augmentation),
but ε must fit into each gap along the path, so that we do not violate
the capacity and non-negativity constraints.
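A compact sketch of these ideas in code (the graph and capacities are made up). This is the breadth-first-search variant: each augmenting path is found by a BFS in the residual graph, and ε is taken as the minimum residual value rij along it.

```python
from collections import deque

def max_flow(capacity, s, d):
    # capacity: dict {(u, v): C_uv}; residual capacities stored in a nested dict
    nodes = {u for u, _ in capacity} | {v for _, v in capacity}
    res = {u: {} for u in nodes}
    for (u, v), c in capacity.items():
        res[u][v] = res[u].get(v, 0) + c
        res[v].setdefault(u, 0)              # reverse (backward) direction, initially 0
    flow = 0
    while True:
        # breadth-first search for an augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and d not in parent:
            u = queue.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if d not in parent:
            return flow                      # no augmenting path left: flow is maximum
        path, v = [], d                      # reconstruct the path found
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        eps = min(res[u][v] for u, v in path)   # epsilon = min residual along the path
        for u, v in path:
            res[u][v] -= eps                 # use up residual capacity forward
            res[v][u] += eps                 # ... and create it backward
        flow += eps

caps = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
print(max_flow(caps, 's', 't'))              # expect 5 for this small example
```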
Dijkstra’s algorithm), and then map the path back into the original
graph.
When no augmenting path can be found anymore, let A be the set of
nodes still reachable from s along edges on which the flow could still be
increased (forward) or decreased (backward), and let B contain the remaining
nodes, including d. Consider the cut defined
by A and B: all edges from A to B carry the maximum possible
amount of flow (their capacity), and nothing is flowing back.
Observe now that every such cut is a bottleneck: we cannot push
through more flow than the capacity of the cut, which is defined
as the summed capacity of all edges that go from the source side
to the destination side.
Thus, having reached a bottleneck, the flow must be maximum,
since no more flow can be pushed through than the capacity of
any cut. Since we can only reach the smallest bottleneck, as it
already limits how high we can go, therefore, the flow has reached
its maximum, once we find no more augmenting paths. As a by-
product, we obtain an important result, the famous Max Flow
Min Cut Theorem:
Theorem 1 The value of the maximum flow from s to d is equal
to the minimum capacity of any cut which separates d from s.
Remark. Since no more flow can be pushed through any cut than its
capacity, therefore, the total s → d flow can never be larger than the
capacity of any s → d cut (a cut that separates d from s). That is,
for any feasible value F of the s → d flow and for any s → d cut with
capacity C
F ≤C
always holds. The nice thing is that if we maximize the left-hand side
and minimize the right-hand side, then we get precise equality. In other
words, whatever is not obviously excluded by the capacity limitations,
can actually be achieved. There is no gap here between the theoretically
possible and the practically achievable!
On the other hand, one can easily avoid this, as shown in the following
theorem.
Theorem 2 If among the possible augmenting paths in the residual
graph a shortest one (shortest in the number of edges) is
chosen in every iteration, then the algorithm reaches the maximum flow
after O(nm) iterations, where n is the number of nodes and m
is the number of edges in the graph.
Note: This variant of the Ford-Fulkerson method is often called Edmonds-
Karp algorithm.
Yet another important result is about the integrality of the maximum flow.
Theorem 3 If all capacities are integers, then there exists a maximum flow
in which the value of the flow on each edge is an integer (and, therefore, the
total flow is also integer). Such an integer flow can also be efficiently found
by the Ford-Fulkerson (or Edmonds-Karp) algorithms.
This theorem is not hard to prove: if we start with the all-zero flow, and
all capacities are integers, then one can prove with a simple induction that
ε can always be chosen to be an integer, so the flow remains integer after each
augmentation.
The integrality theorem makes it possible to use maximum flow computations
for solving graph optimization problems. Key examples are the maximum
number of disjoint paths, and the maximum bipartite matching problem.
See details in the slide set entitled “Application of Maximum Flow to Other
Optimization Problems,” as well as in the class discussion.
Application of Maximum Flows
to Solve Other Optimization
Problems
Disjoint Paths
Disjoint path network: G = (V, E, s, t).
■ Directed graph (V, E), source s, sink t.
■ Two paths are edge-disjoint if they have no arc in common.
[Figure: a small directed example network with source s, sink t and intermediate nodes.]
Disjoint Paths
Max flow formulation: assign unit capacity to every edge.
[Figure: the same network with unit capacity assigned to every edge.]
[Figure: an example network with source s, sink t and intermediate nodes 1–12.]
Deleting 3 arcs disconnects s and t.
There are 2 node disjoint s-t paths.
Deleting 5 and 6 disconnects t from s?
[Figure: bipartite graph with node sets L = {1, . . . , 5} and R = {1’, . . . , 5’}; Matching: 1-2’, 3-1’, 4-5’.]
Bipartite Matching
Bipartite matching.
■ Input: undirected, bipartite graph G = (L ∪ R, E).
■ M ⊆ E is a matching if each node appears in at most one edge in M.
■ Max matching: find a max cardinality matching.
[Figure: the same bipartite graph with the matching 1-1’, 2-2’, 3-3’, 4-4’.]
Bipartite Matching
Max flow formulation.
■ Create directed graph G’ = (L ∪ R ∪ {s, t}, E’ ).
■ Direct all arcs from L to R, and give infinite (or unit) capacity.
■ Add source s, and unit capacity arcs from s to each node in L.
■ Add sink t, and unit capacity arcs from each node in R to t.
[Figure: the flow network G’ — unit-capacity arcs from s to each node of L, arcs of capacity ∞ (or 1) from L to R, and unit-capacity arcs from each node of R to t.]
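A sketch of this construction in code (the bipartite graph is made up; networkx's maximum_flow is used as the max flow solver):

```python
import networkx as nx

left  = [1, 2, 3, 4, 5]
right = ["1p", "2p", "3p", "4p", "5p"]
edges = [(1, "1p"), (1, "2p"), (2, "2p"), (3, "1p"), (3, "3p"), (4, "4p"), (5, "4p")]

G = nx.DiGraph()
for u in left:
    G.add_edge("s", u, capacity=1)           # unit capacity from the source to L
for v in right:
    G.add_edge(v, "t", capacity=1)           # unit capacity from R to the sink
for u, v in edges:
    G.add_edge(u, v, capacity=1)             # unit capacity is enough on the L->R arcs

flow_value, flow = nx.maximum_flow(G, "s", "t")
matching = [(u, v) for u, v in edges if flow[u][v] > 0.5]
print(flow_value, matching)
```

By the integrality theorem the optimal flow can be taken 0-1 valued, so the saturated L → R arcs form a maximum matching, and the flow value equals the matching size.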
Graph Connectivity and Minimum Cuts
1 Basic Concepts
Often we want to find a minimum cut in the entire graph, not just between a
specified source and destination. The size of this minimum cut characterizes
the connectivity of the graph. It is important to study it, as it allows us to
decide how vulnerable the network is to link or node failures.
Definitions
• Node-connectivity of the graph: κ(G) = minimum number of
nodes (vertices) that need to be deleted, such that the remaining graph
is either disconnected, or it has no edge at all. κ(G) is the size of a
minimum vertex-cut, also called node-cut.
Exercises
• C contains at least one endpoint from each cut edge, so the re-
moval of C removes all cut edges. If there are no more edges at
all in the graph, then C satisfies the definition of a node-cut.
• If, after the removal of C, at least one edge remains, then such an
edge must be fully either in A or B, since it cannot be in the cut,
as the cut edges have all been removed. Let e′ = {u, v} be such
an edge. Pick the endpoint of the cut edge e = {x, y} that is not
in the same set as u. Let us say, this endpoint of e is y (since we
can name the nodes arbitrarily). Then u and y are separated by
C, since they are on different sides of the cut, and the removal of
C removes all edges in the cut. Therefore C is a node-cut, as it
separates (at least) two nodes.
Inequalities (3) and (2) together imply (1) that we wanted to prove.
Comment. The graph parameters κ(G), λ(G) and δ(G) are positive
integers for any connected graph G, with at least two nodes. Fur-
thermore, as we have seen, they must always satisfy the condition
κ(G) ≤ λ(G) ≤ δ(G).
At this point it is natural to ask: is there any other condition they must
satisfy, so that a graph exists with the given parameters? Interestingly,
the (nontrivial) answer is that nothing else is imposed:
Theorem. For any three positive integers k, ℓ, d, if they satisfy
k ≤ ℓ ≤ d,
3. We want to design a network topology, represented by an undirected
graph, such that it satisfies the following conditions:
κ(G) ≥ 4
λ(G) ≥ 5
∆(G) ≤ 9
d(G) ≤ 6
diam(G) ≤ 3
n = 30.
How was this graph found? Designing it by hand may be a daunting
task...
111 = 2m
cannot hold, as the right-hand side is even, the left-hand side is odd.
Thus, the assumption that all degrees are 3 leads to a contradiction.
Therefore, there must be a node with degree d ≠ 3. Since we cannot
disconnect the network by removing 2 or fewer links, d ≤ 2 is impossi-
ble. This yields that this node must have degree at least 4.
5. Consider a network, represented by a k-connected graph G. We want
to extend the network by adding a new node u. We decide to connect
the new node u to ℓ different old nodes v1 , . . . , vℓ that we can choose
from G. What should be the value of ℓ, and how should we choose the
ℓ neighbors v1 , . . . vℓ of u, if we want to guarantee that the extended
network preserves k-connectivity?
6. Based on the idea of Exercise 5, one can also prove the following result:
Proof. Assume indirectly that the resulting graph has λ(G) < k. Then
its nodes can be partitioned into two non-empty sets, A and B, such
that they are separated by a cut C containing at most k −1 edges. Now
a key observation is that G1 must be either fully in A or fully in B. The
reason is that if G1 has nodes in both sets, then its nodes in A would
be separated from its nodes in B by the cut C. Since |C| ≤ k − 1, this
would contradict λ(G1 ) = k. As the naming of the sets is irrelevant,
let us say that G1 falls fully in A.
The same argument applies to G2 , as well, so it must also fall fully in
one of the sets. Since A, B are non-empty, and they contain no other
nodes than the original ones, therefore, G2 must fall fully in the other
set B. This means, the subgraph induced by A is equal to G1 , and
the subgraph induced by B is equal to G2 . The sets A, B, however,
are connected to each other by at most k − 1 edges, according to the
indirect assumption. This contradicts to the construction, in which we
connected G1 and G2 by k edges.
This result can be used, for example, to optimally solve the following
network design problem, which would otherwise look quite hard:
This could appear as a hard optimization problem, since one can choose
the connecting links in exponentially many different ways, and we have
to find the least expensive solution under the constraint that the re-
sulting larger network remains k-connected.
However, in view of the above result, we can simply choose any k links
to connect the two networks, the obtained larger network will always
be k-connected. Therefore, the solution that attains the minimum cost
is simply the one in which we select the k least expensive links.
• For any two different nodes x, y the value of λ(x, y) is equal to the
maximum number of edge-disjoint paths connecting x and y.
• λ(G) is equal to the smallest value k with the property that between any
two nodes there are at least k edge-disjoint paths.
Exercises
any two nodes there is a route system that contains (at least) 3 link-
disjoint routes.
Solution: Yes. Let us add a new node u and connect it to all nodes in
A. Similarly, add another new node v and connect it to all nodes in B.
By Exercise 5 in Section 1, the extended graph remains k-connected.
Then, by Menger’s Theorem, there are k edge-disjoint paths connecting
u and v. The parts of these paths that fall in the original graph satisfy
the requirements.
• Set δ = ⌊2m/n⌋.
• Connect any two nodes that are within distance ⌊δ/2⌋ along the circle.
• If not all nodes have degree ≥ δ, then pick a node with degree < δ,
connect it to a node that is at distance ⌊n/2⌋ away along the circle.
Repeat this until all nodes have degree δ, possibly with a single node
that can have degree δ + 1. (A small code sketch of this construction is shown below.)
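A small sketch of this circulant construction (parameters made up, written for even n for simplicity; networkx is used only to double-check the resulting edge-connectivity):

```python
import networkx as nx

def circulant_design(n, m):
    delta = (2 * m) // n                     # target degree, delta = floor(2m/n)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for step in range(1, delta // 2 + 1):
        for i in range(n):
            G.add_edge(i, (i + step) % n)    # connect nodes within 'step' along the circle
    if delta % 2 == 1:                       # odd delta: pair each node with its opposite
        for i in range(n // 2):
            G.add_edge(i, (i + n // 2) % n)
    return G

G = circulant_design(12, 30)
print(sorted(d for _, d in G.degree()))      # all degrees equal 5 for these parameters
print("edge connectivity:", nx.edge_connectivity(G))   # 5 for this instance
```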
Exercise
3 Algorithms
The minimum cut between two specified nodes can be obtained as a by-
product of the maximum flow computation. If, however, we want an overall
minimum cut in the whole graph, then a single maximum flow computation
does not suffice.
Interestingly, one can find an overall minimum cut directly, without using
anything about maximum flows. Below we present two algorithms for this
problem.
This algorithm uses randomization, so it will guarantee the result only with
a certain probability, less than 1, but it can be made arbitrarily close to 1.
A single run of the algorithm works as follows (we will have to repeat
these runs sufficiently many times, with independent random choices).
Step 1 Pick an edge (v, w) uniformly at random among all edges from the
current graph G.
Step 2 Contract the selected edge (i.e., merge its endpoints). Keep the
arising parallel edges, but remove the self-loops, if any. The contracted
graph, denoted by G/(v, w), will be the new value of G.
Step 3 Are there still more than 2 nodes? If yes, repeat from Step 1. If no,
then STOP, the set of edges between the 2 remaining nodes form a cut.
Note that the obtained cut is not necessarily minimum. On the other
hand, as shown in the next (optional) lecture note, it will be a minimum cut
with probability at least 1/n² in a graph on n nodes. Therefore, if we repeat
the algorithm K times independently and choose the smallest of the found
cuts, say C, then the following holds:

Pr(C is not a minimum cut) ≤ (1 − 1/n²)^K
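A sketch of the contraction algorithm with repetition (small made-up graph; the number of runs K is chosen arbitrarily here):

```python
import random

def contract_once(edges):
    edges = list(edges)
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    while len(nodes) > 2:
        v, w = random.choice(edges)                 # Step 1: pick an edge uniformly at random
        nodes.discard(w)                            # Step 2: merge w into v
        merged = []
        for a, b in edges:
            a = v if a == w else a
            b = v if b == w else b
            if a != b:                              # drop self-loops, keep parallel edges
                merged.append((a, b))
        edges = merged
    return len(edges)                               # size of the cut between the 2 super-nodes

def randomized_min_cut(edges, runs=200):
    return min(contract_once(edges) for _ in range(runs))

# a 4-cycle plus one chord: the minimum cut has 2 edges
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(randomized_min_cut(edges))                    # prints 2 with high probability
```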
Exercise
Show that the bound n(n − 1)/2 on the number of minimum cuts is tight
(that is, it can hold with equality).
Solution: Consider a cycle on n nodes. Its edge-connectivity is 2, and any two of
its n edges form a cut of size 2, so the cycle has exactly n(n − 1)/2
minimum cuts. Thus, in this case the bound holds with equality, which
shows that it is tight.
We can now use the following result, which is proved among the exercises below:
λ(G) = min{λ(x, y), λ(Gxy )}, where x, y are any two nodes and Gxy denotes
the graph obtained from G by merging x and y.
One can indeed easily find a pair x, y of nodes for which λ(x, y) is easy to
compute. This is done via the so-called Maximum Adjacency (MA) ordering.
An MA ordering v1 , . . . , vn of the nodes is generated recursively by the
following algorithm: start with an arbitrary node v1 ; then, having already chosen
v1 , . . . , vi , choose as vi+1 a node that has the maximum number of edges
connecting it to the already chosen set {v1 , . . . , vi }.
Nagamochi and Ibaraki proved that this ordering has the following nice
property:
Theorem In any MA ordering v1 , . . . , vn of the nodes
λ(vn−1 , vn ) = d(vn )
holds, where d(.) denotes the degree of the node.
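Combining this theorem with the identity λ(G) = min{λ(x, y), λ(Gxy )} gives a complete global minimum cut algorithm (the Stoer-Wagner / Nagamochi-Ibaraki scheme). A sketch with a small made-up graph; each phase computes an MA ordering, records d(vn ) as a candidate cut value, and then merges the last two nodes:

```python
from collections import defaultdict

def global_min_cut(nodes, edges):
    # w[u][v] = total weight of edges between the current super-nodes u and v
    w = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        w[u][v] += 1.0
        w[v][u] += 1.0
    V = list(nodes)
    best = float("inf")
    while len(V) > 1:
        # one phase: MA ordering, always picking the node most connected to the chosen set
        order = [V[0]]
        conn = {v: w[V[0]][v] for v in V if v != V[0]}
        while conn:
            nxt = max(conn, key=conn.get)
            order.append(nxt)
            del conn[nxt]
            for v in conn:
                conn[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        best = min(best, sum(w[t][v] for v in V if v != t))   # d(v_n) = lambda(v_{n-1}, v_n)
        for v in V:                          # merge t into s and continue
            if v not in (s, t):
                w[s][v] += w[t][v]
                w[v][s] += w[v][t]
                w[v].pop(t, None)
        w.pop(t, None)
        w[s].pop(t, None)
        V.remove(t)
    return best

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(global_min_cut([1, 2, 3, 4], edges))   # expect 2.0 for this small graph
```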
Exercises
1. Prove that λ(G) ≤ λ(Gxy ) always holds, assuming Gxy still has at least
two nodes. In other words, merging two nodes in a graph G that has
at least 3 nodes, can never decrease the edge-connectivity.
a.) The nodes x, y are on different sides of the cut C. In this case
C is also a cut between x and y. Since C is a minimum cut in the
whole graph, there cannot be a smaller cut between x and y, so C is a
minimum cut between x and y. Therefore, in this case, λ(G) = λ(x, y)
holds.
b.) The nodes x, y are on the same side of the cut C. Then, after
merging x and y, C still remains a cut in the new graph Gxy . Since C
cannot be smaller than a minimum cut in Gxy , therefore, |C| ≥ λ(Gxy ).
We chose C, such that |C| = λ(G), yielding λ(G) ≥ λ(Gxy ). On the
other hand, we already know from Exercise 1 that λ(G) ≤ λ(Gxy ).
Therefore, in this case, the only possibility is λ(G) = λ(Gxy ).
Thus, we obtain that in case a) λ(G) = λ(x, y), while in case b) λ(G) =
λ(Gxy ). We also know that λ(G) ≤ λ(x, y) (by definition), and λ(G) ≤
λ(Gxy ) (by Exercise 1). Since precisely one of cases a), b) must occur,
therefore, by combining the above facts, we obtain that λ(G) must
be equal to the smaller of λ(x, y) and λ(Gxy ). This means, λ(G) =
min{λ(x, y), λ(Gxy )}, as desired.
Comments on Minimum Cut Algorithms
Finding the Most Connected Subgraph,
and Other Clusters in Graphs
The example graph on the next page shows that all the three cases can occur.
In the example graph we can observe:
• Find a minimum cut in the original graph G, let its size be denoted by
k = λ(G).
• Let A, B be the two node set into which the minimum cut divides
the graph. A key observation is that if there is any subgraph G0 with
λ(G0 ) > k, then it must be either fully in A or fully in B. Why?
The reason is that if G0 had nodes on both sides, then the found cut
of size k would separate these nodes, too. G0 , however, cannot be
separated by k edges, due to λ(G0 ) > k. Thus, if such a G0 exists, then
it must be either fully in A or fully in B.
f (n) ≤ f (k) + f (n − k) + 1 ≤ (k − 1) + (n − k − 1) + 1 = n − 1.
2. If the remaining graph is empty, then output the message “the k-core
is empty” and halt.
Exercises
1. In the last line, why do we jump back to Step 1? After all, when Step
1 was previously executed, it deleted all nodes of degree < k. Why is
this done again?
Answer: When the nodes of degree < k are deleted, this may create
new nodes of degree < k among the remaining ones. For example,
assume node u has degree k − 1 and it is a neighbor of another node
v, which has degree k. Then u is deleted in the first round, because its
degree is < k, but v is not deleted, as its degree originally is not < k.
But after deleting u, the degree of v becomes k − 1, so we created a
new node of degree < k.
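A sketch of the peeling procedure in code (small made-up graph), which repeats the deletion step exactly because of the effect described in the answer above:

```python
import networkx as nx

def k_core(G, k):
    H = G.copy()
    while True:
        low = [v for v, deg in H.degree() if deg < k]
        if not low:
            return H                       # every remaining node has degree >= k
        H.remove_nodes_from(low)           # deleting may create new low-degree nodes,
                                           # hence the repetition

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (1, 4), (2, 4)])
core = k_core(G, 3)
print(sorted(core.nodes()))                # expect [1, 2, 3, 4]; node 5 is peeled off
```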
Answer: Yes. Assume indirectly that there are two sets, A and B,
with A ≠ B, that both satisfy the definition of the k-core. Then each
node in A has at least k neighbors in A. The same holds for B. Then
A ∪ B also has this property. However, if A ≠ B, then at least one
of them cannot be the largest such subset of nodes, as A ∪ B also
has the property, and for any two different sets |A ∪ B| > min{|A|, |B|}
holds. Therefore, at least one of A, B violates the definition of a k-core,
contradicting the initial assumption.
Clusters based on density
Definition: The edge density (or simply, the density) of a graph is the ratio
of the number of edges vs the number of nodes.
Example: The graph used to illustrate the k-core concept (see 2 pages back)
has n = 29 nodes, and m = 36 edges. Therefore, its density is
m 36
= ≈ 1.241
n 29
On the other hand, some of its subgraphs have higher density. For example,
the subgraph spanned by the red nodes has 5 vertices and 8 edges, so its
density is 8/5=1.6, which is larger. The subgraph induced by the red and
green nodes together has 17 vertices and 25 edges, so its density is 25/17 ≈
1.47, which is smaller than 1.6, but still larger than 1.241.
Answer: Yes. If a graph has m edges, n nodes, and the degree of the ith
node is di , then we can write:
2m = Σ_{i=1}^{n} di .
The reason is that each edge contributes by 2 to the sum of the degrees.
Then we have for the edge-density:
m/n = (1/2) · ( Σ_{i=1}^{n} di ) / n .
However, the expression ( Σ_{i=1}^{n} di )/n is nothing else but the average degree.
Therefore, the edge-density is just half of the average degree. Thus, finding
a subgraph with maximum edge-density is equivalent to finding a subgraph
with the highest average degree.
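Building on this equivalence, a common simple heuristic for finding a dense subgraph is greedy peeling: repeatedly delete a minimum-degree node and remember the densest intermediate graph. (The heuristic and its 1/2-approximation guarantee are standard results, not something proved in this note; the example graph below is made up.)

```python
import networkx as nx

def densest_subgraph_peeling(G):
    H = G.copy()
    best_nodes = set(H.nodes())
    best_density = H.number_of_edges() / H.number_of_nodes()
    while H.number_of_nodes() > 1:
        v = min(H.nodes(), key=H.degree)        # peel off a minimum-degree node
        H.remove_node(v)
        density = H.number_of_edges() / H.number_of_nodes()
        if density > best_density:
            best_density, best_nodes = density, set(H.nodes())
    return best_nodes, best_density

G = nx.Graph([(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4), (4, 5), (5, 6)])
print(densest_subgraph_peeling(G))              # the K4 on {1,2,3,4} has density 6/4 = 1.5
```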
Sample Questions for the CS 6385 Midterm Exam
Contains 7 pages. The correct answer to each question is provided at the bottom of its page.
EXAM RULES:
1. Exactly one answer is correct for each multiple choice question. Encircle the number
of the chosen answer. You have two options for each multiple choice question, allowing
you to get partial credit even for multiple choice questions:
(a) Select one answer. If you selected the correct one, then it is worth 1 point, other-
wise 0.
(b) You may select 2 answers. If the correct one is among them, then it is worth 1/2
point, otherwise 0. This allows partial credit for multiple choice questions. Note
that by selecting 2 answers you may double your chance to hit the correct one,
but at the price of getting only half of the points for this question.
2. Your choice of the answer(s) should be clear and unambiguous. If you feel you have to
revisit a question and must change some of your choices during the exam, you can still
do it, given that you mark clearly and unambiguously your final choices. If ambiguity
remains for a question regarding your final choice(s), then it will be counted as an
unanswered question.
3. The instructor cannot give any hint during the exam about the answers. Therefore,
please refrain from asking questions that could lead to such hints (even remotely),
because such questions cannot be answered.
MULTIPLE CHOICE QUESTIONS
2 Assume the variables x, y occur in a linear programming task. We would like to add the
new constraint |2x| + |3y| ≤ 3 to the LP. Which of the following formulations does it
correctly, given that we must keep the formulation linear:
Correct answer: 4.
3 Consider the maximum flow problem from a source node s to a destination node t in a
directed graph. Assume we were able to find a flow that pushes a total of 10 units of
flow through the graph from s to t. (But it may not be a maximum flow.) Which of
the following is correct:
1. If originally each edge has capacity at most 5 and we increase the capacity of each
edge by 1, then the maximum s-t flow will necessarily be at least 12.
2. If we remove all edges from the graph on which the found flow is 0, then the
minimum cut capacity between s and t in the new graph necessarily remains the
same as in the original graph.
3. If there is a cut in the graph that separates s and t, and its capacity is 11, then
the found flow was not maximum.
4. If the found flow is maximum and all edge capacities are integers, then there must
be at least 10 edges that are saturated by the flow, that is, they are filled up to
their capacities.
Correct answer: 1.
5 Assume a large network with undirected links has minimum cut size 4, that is, it cannot
be disconnected by removing less than 4 links, but it can be disconnected by removing
4 links. We would like to disconnect the network in the following strong sense: we
remove enough links so that the network falls apart into at least 3 components. In
other words, the disconnected network cannot be made connected again by adding a
single link. To achieve this strong disconnection, which of the following is true:
Correct answer: 4.
PROBLEM SOLVING
9 Consider a network with directed links, and each link in this network has 1 Mb/s capacity.
Assume that from a source node s to a terminal node t the network can carry a flow of
20 Mb/s, but not more. Let us call a link critical if it has the following property: if the
link is removed, then the network cannot carry the 20 Mb/s flow from s to t anymore.
a) Is it possible that this network contains less than 20 critical links? Justify your
answer.
Answer: No, it is not possible. Let us consider a maximum flow, which has
value 20 Mb/s, according to the conditions. If we look at a minimum cut which
separates s and t, then, due to the max flow min cut theorem, this cut must have
capacity 20 Mb/s. Since each link has 1 Mb/s capacity, there must be 20 links in
the cut. If we remove any of these links, the capacity of the cut goes down to 19
Mb/s, so it is not possible to send 20 Mb/s flow anymore. Thus, each link in the
minimum cut is critical and there are 20 of them.
b) Is it possible that this network contains more than 20 critical links? Justify your
answer.
Answer: Yes, it is possible. For example, let the network have 3 nodes: s, a, t.
Define the links in the following way: connect s to a by 20 directed links, each
of capacity 1 Mb/s, and also connect a to t by 20 directed links, each of capacity
1 Mb/s. Then the maximum flow from s to t is 20 Mb/s. The links between s
and a form a minimum cut in which each link is critical. The same is true for the
links that connect a to t. Thus, this network satisfies the conditions and it has
40 critical links.
9 * Consider an undirected graph, and let λ(a, b) denote the edge connectivity between
any two different nodes a, b.
Let x, y, z be three distinct nodes in the graph. Does the inequality
Comment. One may ask here: could it be true that the inequality
Basic Reliability Configurations
The function f (...) above depends on the configuration, which defines when
the system is considered operational, given the states of the components.
Basic examples are shown in the configurations discussed below.
Series Configuration
In the series configuration the system is operational if and only if all com-
ponents are functioning. This can be schematically represented by the figure
below, in which the system is considered operational if there is an operational
path between the two endpoints, that is, all components are functioning:
o— p1 —— p2 —— p3 ——.........—— pN —o
The reliability of the series configuration is computed simply as the product
of the component reliabilities:
Rseries = p1 p2 . . . pN
Note: If many components are connected in series, then the reliability may
be much lower than the individual reliabilities. For example, if p = 0.98
and N = 10, then Rseries = (0.98)^10 ≈ 0.82, significantly lower than the
individual reliabilities.
Parallel Configuration
In the parallel configuration the system is operational if at least one of the
components p1 , . . . , pN is functioning. Schematically, the components are
connected as N parallel branches between the two endpoints:
[Figure: N parallel branches p1 , . . . , pN between the two endpoints.]
The system fails only if every component fails, so the reliability is
Rparallel = 1 − (1 − p1 )(1 − p2 ) · · · (1 − pN ).
k-out-of-N Configuration
The system is operational if at least k of the N components are functioning.
If each component has the same reliability p, then the probability that exactly
k components are functioning is
C(N, k) p^k (1 − p)^(N−k) ,
where
C(N, k) = N! / ( k! (N − k)! )
represents the number of ways one can choose a k-element set out of N .
Summing over all operational cases, the reliability is
R_k/N = Σ_{i=k}^{N} C(N, i) p^i (1 − p)^(N−i)
Exercise
Exercise
Comparison of the series-parallel and parallel-series configurations:
Solution
Rs−p = [1 − (1 − p)^N ]^N ,
Rp−s = 1 − (1 − p^N )^N
We can answer the question using the following known formula from calculus:
lim_{x→0} (1 − x)^(1/x) = e^(−1) ,
where e = 2.71... is the base of the natural logarithm. We use it for approx-
imation: when x is very small, then
(1 − x)^(1/x) ≈ e^(−1)
holds.
Rs−p → 1 (1)
as N grows.
Rp−s → 0 (2)
Comparing (1) and (2), we can conclude that with N → ∞ the reliability of
the series-parallel configuration tends to 1, while the reliability of the parallel-
series configuration tends to 0. Thus, the series-parallel configuration will be
more reliable when N grows very large.
Interpretation
The series-parallel configuration has many more possible paths between the
endpoints. We can pick any component from each parallel subnetwork to
build a path, yielding N^N possible paths. On the other hand, the parallel-
series configuration has only N possible paths between the endpoints, as each
series subnetwork can serve as such a path, but there is no more possibility.
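A quick numerical check of this comparison (the value p = 0.5 is chosen only to make the trend visible at small N):

```python
def r_series_parallel(p, N):
    return (1 - (1 - p) ** N) ** N           # R_sp = [1 - (1-p)^N]^N

def r_parallel_series(p, N):
    return 1 - (1 - p ** N) ** N             # R_ps = 1 - (1 - p^N)^N

p = 0.5
for N in (2, 5, 10, 20):
    print(N, round(r_series_parallel(p, N), 6), round(r_parallel_series(p, N), 6))
# as N grows, the first column tends to 1 and the second to 0, as argued above
```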
! ""#
"" #
""#
$
%1 &1
' (
Method of minimal paths
A minimal path is a set of components such that if all of them are functioning,
then the system is operational, and no proper subset of the set has this
property (i.e., the set is minimal).
Remark: Do not confuse this concept with the well known concept of shortest
path. A minimal path in the reliability context does not have to be a shortest
path. (In fact, it does not even have to be a path, it can be an arbitrary
subset of components with the said properties.)
Example: In the example on page 1 the minimal paths are A-D, B-E, A-C-E
and B-C-D. The latter two are not shortest paths.
How can we compute the reliability using minimal paths? Observe that if
any one of the minimal paths is fully functional, then the system is already
operational, by the definition of the minimal path. Therefore, if we can list
all minimal paths in the system, then we can regard them as units connected
in parallel.
Let us use this approach in our example configuration. For simplicity, let the
reliability of each component be denoted by the same letter as the name of
the component. Then the reliability of the 4 minimal paths listed above will
be AD, BE, ACE, BCD, respectively. Then we can form a parallel configu-
ration from them, thus obtaining:
R ≈ 1 − (1 − AD)(1 − BE)(1 − ACE)(1 − BCD).
Answer: The number of minimal paths may be much smaller than the number
of all states, so the method is more scalable than the exhaustive enumeration
of all states.
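A small sketch computing this parallel-of-minimal-paths estimate for the example (the component reliabilities are made-up values; as the note points out later, the result is only an approximation, since the paths share components):

```python
def parallel(reliabilities):
    prod = 1.0
    for r in reliabilities:
        prod *= (1 - r)        # probability that all units fail
    return 1 - prod            # at least one unit works

# component reliabilities (made-up values)
A, B, C, D, E = 0.9, 0.9, 0.8, 0.95, 0.95

paths = [A * D, B * E, A * C * E, B * C * D]   # reliability of each minimal path
print(round(parallel(paths), 4))
```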
This method is similar in spirit to the previous one, but considers cuts rather
than paths.
Property 1 If all components in C fail, then the whole system fails, inde-
pendently of the status of the other components.
Remark: Observe the duality between the definition of the minimal path
and minimal cut. If we replace “functioning/operational” in the definiton of
minimal path by “fail”, then we obtain the definition of minimal cut.
Example: In our example on page 1 the minimal cuts are A-B, D-E, A-C-E
and B-C-D. The latter two are larger than the smallest cuts.
To compute the system reliability, observe that if any one of the minimal cuts
fully fails, then the system must fail, independently of the rest. Therefore,
all minimal cuts must be operational for the system to work. Thus, we can
consider the minimal cuts as if they were connected in series.
The probability that a given cut, say A-B, works is 1 − (1 − A)(1 − B), since
the probability that all components fail in the cut is (1 − A)(1 − B), so the
complement 1 − (1 − A)(1 − B) means the cut is operational, i.e., not all
components fail in it. Since we conceptually consider the cuts as connected
in series with each other, therefore, we obtain
R ≈ [1 − (1 − A)(1 − B)] · [1 − (1 − D)(1 − E)] · [1 − (1 − A)(1 − C)(1 − E)] · [1 − (1 − B)(1 − C)(1 − D)].
Note: The same comments apply here as for the method of minimal paths:
it is only an approximation and its advantage is that it tends to be more
scalable than the exhaustive enumeration of all states.
In the previous sections a high level of abstraction was used that ignored the
time dependence of reliability. Now we look into some fundamental quantities
that describe the temporal behavior of component reliability.
Let T be the failure time, when the considered component breaks down. Nat-
urally, T is a random variable. Its probability distribution function is
F (t) = Pr(T < t).
In other words, F (t) is the probability that the breakdown occurs before a
given time t. Naturally, the value of this probability depends on what this
given time t is; this dependence is described by F (t).
Let us define now further important functions that characterize how the
reliability of a component behaves in time.
Lifetime measures
S(t) = 1 − F (t)
where F (t) is the probability distribution function of the failure time.
The meaning of the survivor function directly follows from the defini-
tion of F (t):
S(t) = 1 − Pr(T < t) = Pr(T ≥ t).
In other words, S(t) is the probability that the component is still op-
erational at time t, since T ≥ t means the component has not failed up
to time t.
Note that the risk may be quite different from the probability of failure.
As an example, one can say that the probability that a person dies at
age 120 is very small, since most people do not live that long. On the
other hand, the risk that a person of age 120 dies is quite high, since
the concept of risk assumes that the person is still alive (otherwise it
would be meaningless to talk about risk).
To make the risk expression (1) exact and independent of ∆t, we con-
sider the limit ∆t → 0. Then, however, the probability would always
tend to 0. To avoid this, we divide it by ∆t, thus obtaining a quan-
tity that is similar in spirit to the pdf. This gives the definition of the
hazard function:
h(t) = lim_{∆t→0} Pr(t ≤ T ≤ t + ∆t | T ≥ t) / ∆t
Thus, the hazard function gives the risk density for the failure to occur
at time t.
S(t) = 1 − F (t)
F (t) = 1 − S(t)
By taking derivatives we can get a relationship between the survivor function
and the pdf:
f (t) = −S′(t).
We can also express the survivor function with the pdf from the definition
S(t) = 1 − F (t). Using that 1 − F (t) = ∫_t^∞ f (t)dt, we obtain
S(t) = ∫_t^∞ f (t)dt.
Relating the hazard function with the others is slightly more complex. One
can prove the following formula:
h(t) = −S′(t) / S(t) .
Using the previous relationships, this implies
h(t) = f (t) / ∫_t^∞ f (t)dt .
It is known from basic calculus that the formula h(t) = −S′(t)/S(t) is equiv-
alent to
h(t) = − (d/dt) ln S(t).
Using this we can express S(t) by the hazard function as
S(t) = e^( − ∫_0^t h(t)dt ) .
Looking at the formula h(t) = f (t) / ∫_t^∞ f (t)dt , we can see that there are
two possible reasons for having high hazard (risk) at time t:
• either the numerator is large, that is, the probability of a failure is high
around time t, and/or
• the denominator is small. The denominator ∫_t^∞ f (t)dt gives the prob-
ability that the failure has not occurred before t. If this is small, that
means the probability that the component is still alive is low. In other
words, the component is old from the reliability point of view.
Since a typical pdf sooner or later starts decreasing with time, this effect
tends to diminish the hazard as time advances. On the other hand, the
denominator decreases as the component gets older, which will increase the
hazard. These two opposite effects can balance each other in different ways.
A special case when they precisely balance each other is the exponential
distribution, where the pdf is of the form
f (t) = λe^(−λt) .
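As a quick check using the formulas above: for this pdf the survivor function is S(t) = ∫_t^∞ λe^(−λu) du = e^(−λt), so h(t) = f(t)/S(t) = λe^(−λt)/e^(−λt) = λ, a constant. The decreasing pdf in the numerator and the aging effect in the denominator cancel exactly.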
In many practical cases the hazard function has a bathtub shape that consists
of three typical parts: a decreasing initial part (early “infant mortality” failures),
a roughly constant middle part (the useful life period), and an increasing final
part (wear-out failures).
[Table: conversion between the lifetime measures — each of F (t), S(t), f (t) and h(t) expressed in terms of the others; for example, S(t) = 1 − F (t).]
1. Given some lifetime measure of the components, express them in terms of the survivor
function.
2. Compute the time dependent reliability of the configuration just like in the static case but using
the component survivor functions instead of static component reliabilities.
3. We get the survivor function of the system.
4. Convert it into the requested life-time measure using the table. (A small sketch of this procedure is shown below.)
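A minimal sketch of this recipe for two components with exponential lifetimes (made-up hazard rates), using the series and parallel rules with the component survivor functions in place of the static reliabilities:

```python
import numpy as np

lam1, lam2 = 0.01, 0.02                  # made-up hazard rates (failures per unit time)

def S1(t):
    return np.exp(-lam1 * t)             # survivor function of component 1

def S2(t):
    return np.exp(-lam2 * t)             # survivor function of component 2

def S_series(t):
    return S1(t) * S2(t)                 # series rule, with S_i(t) in place of p_i

def S_parallel(t):
    return 1 - (1 - S1(t)) * (1 - S2(t)) # parallel rule, same substitution

for t in (0, 10, 50, 100):
    print(t, round(float(S_series(t)), 4), round(float(S_parallel(t)), 4))
```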
ILP Exercises
1.
Solution
[Figure: three West coast cities (1, 2, 3) and three East coast cities (4, 5, 6), with the two possible hub locations A and B between them.]
• Exactly one of the West coast cities is connected to one of the hubs:
Σ_{i=1}^{3} Σ_{j=A}^{B} xij = 1
• Similarly, exactly one of the East coast cities is connected to one of the
hubs:
Σ_{i=4}^{6} Σ_{j=A}^{B} xij = 1
• The hub used by the selected West coast city must be the same as the hub
used by the selected East coast city; for hub A this can be written as
Σ_{i=1}^{3} xiA = Σ_{i=4}^{6} xiA .
Note that this already implies the same equation for hub B.
Thus, the entire mathematical program will be an ILP with 0-1 valued vari-
ables:
min Z = Σ_{i=1}^{6} Σ_{j=A}^{B} cij xij
Subject to:
Σ_{i=1}^{3} Σ_{j=A}^{B} xij = 1
Σ_{i=4}^{6} Σ_{j=A}^{B} xij = 1
Σ_{i=1}^{3} xiA = Σ_{i=4}^{6} xiA
Σ_{i=1}^{3} xiB = Σ_{i=4}^{6} xiB
xij ∈ {0, 1} (∀i, j)
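For an instance this small, the 0-1 program can even be checked by brute force. The sketch below enumerates all (West city, East city, hub) choices with made-up cabling costs cij; a real instance would instead be handed to an ILP solver.

```python
import itertools

west, east, hubs = [1, 2, 3], [4, 5, 6], ["A", "B"]
# c[(i, hub)] = cabling cost of connecting city i to that hub (made-up numbers)
c = {(i, h): cost for (i, h), cost in zip(
        itertools.product(range(1, 7), hubs),
        [4, 7, 3, 6, 5, 5, 6, 2, 4, 9, 8, 3])}

# enumerate every feasible assignment: one West city and one East city on the same hub
best = min((c[i, h] + c[j, h], i, j, h)
           for i in west for j in east for h in hubs)
print("cost:", best[0], " connect city", best[1], "and city", best[2], "to hub", best[3])
```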
Answer:
This means they cannot all be 1, so their sum is at most n − 1:
x1 + x2 + . . . + xn ≤ n − 1
or, equivalently,
x1 + x2 + . . . + xn < n
Answer:
This means at least n − 1 of them are 1, so their sum is at least n − 1:
x1 + x2 + . . . + xn ≥ n − 1
c) Either all variables are 0, or none of them.
Answer:
This means they all must be equal:
x1 = x2 = . . . = xn
which can be expressed by the linear equations
x1 = x2
x2 = x3
...
xn−1 = xn
Solution
It is easy to check that the system of linear inequalities
z ≤ x
z ≤ y
z ≥ x + y − 1
is equivalent to
z = xy
for 0-1 valued variables. Note that the restriction x, y, z ∈ {0, 1} is essential
here; without it the equivalence would not hold. (A quick exhaustive check is
sketched below.)
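The promised exhaustive check, enumerating all 0-1 assignments of x and y and verifying that the three inequalities leave z = xy as the only feasible value:

```python
from itertools import product

for x, y in product((0, 1), repeat=2):
    feasible_z = [z for z in (0, 1)
                  if z <= x and z <= y and z >= x + y - 1]
    assert feasible_z == [x * y], (x, y, feasible_z)
print("z <= x, z <= y, z >= x+y-1 force z = x*y on binary variables")
```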
Solution
Similarly to the previous exercise, it is easy to check that the following system
of linear inequalities provides an equivalent expression for the case when
z, x1 , . . . , xn ∈ {0, 1}:
z ≤ x1
...
z ≤ xn
z ≥ x1 + . . . + xn − n + 1
Case Study
An Optimization Problem in Cellular Network Design:
Formulation via Integer Linear Program
In cellular networks the coverage area is divided into cells. Each cell is served by a base station to
which the mobile units connect wirelessly. The base stations are connected to switching centers
through a wired network.
Whenever a mobile unit moves to another cell, a handoff procedure has to be carried out. There
are two types of handoffs:
1. Intra-switch handoff: the mobile moves between two cells that are connected to the same
switch.
2. Inter-switch handoff: the mobile moves between two cells that are connected to different
switching centers.
The two different types of handoff procedures have different costs (for example, in terms of
processing time).
The goal of the optimization (which is part of the cellular network planning process) in this case
study is to decide which base station will be connected to which switching center, such that the
total cost is minimized.
The total cost takes into account the cost of the different handoff procedures (based on traffic
volume data), as well as the cabling cost among base stations and switching centers. In addition,
the traffic processing capacity of the switching centers is bounded, which has to be factored into
the optimization.
Note that the handoff cost between two cells depends on the traffic volume (amount of handoffs
between them). Furthermore, it also depends on whether they are assigned to the same switch or
not, since that assignment decides whether intra-switch or inter-switch handoff is needed. The
assignment between cells and switches, however, is not given in advance, it is obtained from the
optimization. On the other hand, the optimization, in turn, depends on the handoff costs. This
circular dependence makes the first formulation of the problem nonlinear. Then we can linearize
it with the trick that was used in earlier exercises to linearize a product of variables.
The source of this case study is T.G. Robertazzi, “Planning Telecommunication Networks,”
IEEE Press, 1999.
Integer Linear Programming
Introduction
In many network planning problems the variables can take only integer val-
ues, because they represent choices among a finite number of possibilities.
Such a mathematical program is known as integer program.
Remark: Often the variables can only take two values, 0 and 1. Then we
speak about 0-1 programming.
If the problem, apart from the restriction that the variables are integer val-
ued, has the same formulation as a linear program, then it is called an integer
linear program (ILP).
min Z = c1 x1 + c2 x2 + . . . + cn xn
Subject to
linear constraints of the same form as in an LP, plus the requirement that x1 , . . . , xn are integers.
One can, of course, use other LP formulations, too, and then add the inte-
grality constraint x1 , . . . , xn ∈ Z to obtain an ILP.
Comments:
The minimum can be replaced by maximum, this does not make any essential
difference.
It is important to know that ILP is usually much more difficult to solve than
LP. Often we have to apply heuristic or approximative approaches, because
finding the exact global optimum for a large problem is rarely feasible.
Example 1: Capital Budgeting
• There are N candidate sites where switching centers could be built. Building
a center at site i has cost ci and would bring profit pi .
• The total available budget that can be used for building the centers is
C.
• Objective: select which of the N sites will be used for building switching
centers, so that the total profit is maximized, while keeping the total
cost within the budget. (Any number of sites can be selected out of
the N .)
Let us formulate the problem as a mathematical program!
Variables: let xi = 1 if site i is selected for building a center, and xi = 0 otherwise.
In other words, xi is the indicator variable of the fact that site i is selected/not
selected as a center.
Objective function
How can we express it? The profit from site i is pi if the site is selected as a
center, otherwise 0. The definition of xi suggests that this can be conveniently
expressed as pi xi . Then the total profit is:
Σ_{i=1}^{N} pi xi .
Thus, the entire program is:
max Z = Σ_{i=1}^{N} pi xi
Subject to
Σ_{i=1}^{N} ci xi ≤ C
xi ∈ {0, 1}, i = 1, . . . , N
Comment: Though we have not mentioned it, one can easily recognize that
this problem is exactly identical to the well known Knapsack Problem.
Why is this good? Because we can then use all the methodology that has
been developed for the Knapsack Problem, so we can stand “on the shoulders
of giants.”
How can we find a solution?
It is well known from the study of algorithms that the Knapsack Problem in
the general case is NP-complete. In the special case when the coefficients are
polynomially bounded, it is known to have a Dynamic Programming solution,
but in the general case (where the coefficients can be exponentially large) we
cannot reasonably hope to find the exact optimum for a large problem, via
an efficient algorithm. A good approximation, however, is still possible:
Principle of the heuristics: Sort the sites according to their profit/cost ratio.
(Clearly, a higher profit/cost ratio is more preferable.) Select the sites one
by one in this order, thus, always trying the most preferable first, among the
yet unselected ones. (Preference = profit/cost). If the considered site still
fits in the remaining budget, then select it, otherwise proceed to the next
one in the preference order.
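A sketch of this greedy rule in code (the profit and cost numbers are made up and are not the counterexample discussed on the next page):

```python
def greedy_budget(profits, costs, budget):
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / costs[i], reverse=True)
    chosen, total_profit, spent = [], 0, 0
    for i in order:                         # most preferable (highest p/c ratio) first
        if spent + costs[i] <= budget:      # take the site if it still fits the budget
            chosen.append(i)
            spent += costs[i]
            total_profit += profits[i]
    return chosen, total_profit

p = [60, 40, 30, 20]
c = [20, 15, 10, 5]
print(greedy_budget(p, c, 35))              # selected site indices and the achieved profit
```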
Exercise (to test your understanding for yourself): Find a specific example
when the above greedy algorithm does not yield the optimum.
A simple example is on the next page, but try to find one yourself, before
proceeding to the next page.
Solution to the exercise
p1 = 90, c1 = 10
p2 = 20, c2 = 2
p3 = 8, c3 = 1
Budget: C = 11
The Greedy Algorithm would select i = 2 first, since p2 /c2 is the largest.
This choice uses c2 = 2 units from the budget. The next largest ratio is
p1 /c1 , but c1 = 10 already does not fit in the remaining budget of 11 − 2 = 9.
So the algorithm proceeds to i = 3, which still fits in the budget.
Thus, the Greedy Algorithm selects sites 2 and 3, achieving a total profit
of 20 + 8 = 28. On the other hand, if we select sites 1 and 3, then we can
achieve a profit of 90 + 8 = 98, while the cost still fits in the budget.
Comment
As we have seen, the Greedy Algorithm does not necessarily find the optimum
for this problem. A natural question is:
How far is it from the optimum? In other words, can we prove some perfor-
mance guarantee for the greedy solution?
Theorem:
Let Topt be the optimum solution to the considered problem, that is, the max-
imum achievable total profit under the budget constraints. Let Tgreedy be the
profit achieved by the Greedy Algorithm. Further, denote by pmax the largest
profit coefficient, that is, pmax = maxi pi . Then

Tgreedy ≥ Topt − pmax

always holds.
Proof: See the Appendix at the end of this note. The proof is included only
to show how one can analyze the result of the greedy heuristic in this
situation; you do not have to learn the proof details.
Exercises
Solution
2. In the example which showed that the Greedy Algorithm can produce a
non-optimal solution, the greedy solution was less than half of the optimum.
Can we modify the Greedy Algorithm such that it always guarantees at least
half of the optimum profit?
Solution
To exclude trivial cases, let us assume that ci ≤ C for every i. (If ci > C for
some i, then it can be excluded, since it never fits in the budget).
Tgreedy ≥ Topt − pmax ≥ Topt / 2 ,
since the first inequality follows from the Theorem and the second inequality,
after rearranging, is equivalent to Topt ≥ 2pmax .
Thus, if Topt ≥ 2pmax , then the greedy solution satisfies the requirement.
What if Topt < 2pmax ? This means pmax > Topt /2, so then simply selecting
the site with maximum profit already achieves more than half of Topt . Let
us call this latter choice the 1-site heuristic. Clearly, the profit achieved by the
1-site heuristic is pmax .
It follows that at least one of the greedy and 1-site heuristics gives half or
more of Topt . But we do not know in advance which one. There is, however,
an easy way to know it: run both and take the result that gives the larger
profit. Thus, the required modification is:
Run the Greedy Algorithm to obtain Tgreedy . If Tgreedy ≥ pmax , then keep the
greedy solution, otherwise replace it with the 1-site heuristic.
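A minimal sketch of this modified heuristic, reusing the illustrative greedy_budget function from the earlier sketch:

def greedy_or_best_single(profits, costs, budget):
    """Run the greedy heuristic and the 1-site heuristic; keep the better one.

    By the argument above, the returned profit is at least half of the optimum,
    assuming every single item fits in the budget (c_i <= C for all i).
    """
    greedy_sel, greedy_profit = greedy_budget(profits, costs, budget)
    best_site = max(range(len(profits)), key=lambda i: profits[i])
    if greedy_profit >= profits[best_site]:
        return greedy_sel, greedy_profit        # keep the greedy solution
    return [best_site], profits[best_site]      # otherwise use the 1-site heuristic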
Appendix
Proof of the Theorem.

Assume that the items (sites) are indexed in decreasing pi /ci order, let
x = (x1 , . . . , xn ) denote the greedy solution and y = (y1 , . . . , yn ) an optimal
solution, and let j be the first index for which the greedy algorithm sets
xj = 0. (If no item is ever rejected, then the greedy solution is optimal and
there is nothing to prove.) One can then show that

Tgreedy ≥ Σ_{i=1}^{j} pi yi + (pj /cj ) Σ_{i=1}^{j} ci (xi − yi ).
A is the cost incurred by the first j variables in the greedy solution. This
does not include cj , since xj = 0. Moreover, cj cannot be added without
violating the budget constraint, since otherwise the greedy algorithm would
have set xj = 1. Therefore, we must have A + cj > C, which yields
A = Σ_{i=1}^{j} ci xi > C − cj .
Considering the other sum (B), we can use the fact that the optimal solution
must also fit in the budget. Therefore,
Σ_{i=1}^{n} ci yi ≤ C
holds, which implies
B = Σ_{i=1}^{j} ci yi ≤ C − Σ_{i=j+1}^{n} ci yi .
Substituting the sums A and B by the bounds, we can observe that the
expression can only decrease, as A is substituted by a smaller quantity, while
the subtracted B is substituted by a larger quantity. Thus, we obtain
Tgreedy ≥ Σ_{i=1}^{j} pi yi + (pj /cj ) [ (C − cj ) − ( C − Σ_{i=j+1}^{n} ci yi ) ],

where the quantity C − cj (< A) replaces A, and C − Σ_{i=j+1}^{n} ci yi (≥ B) replaces B.
Using now that pj /cj ≥ pi /ci when i > j, the second sum can only decrease if the
coefficient of yi is replaced by pi . This yields
Tgreedy ≥ Σ_{i=1}^{j} pi yi + Σ_{i=j+1}^{n} pi yi − pj = Σ_{i=1}^{n} pi yi − pj .
We can now observe that the last summation is precisely Topt , while pj ≤
pmax , so if we subtract pmax instead of pj , then the expression can only
decrease. Thus, we obtain

Tgreedy ≥ Topt − pmax ,

which completes the proof.
A telecom company wants to offer new services to its customers. Each service
can be offered in different amounts (e.g., with different bandwidth). The goal
is to decide how much of each service is to be offered to maximize the total
profit, such that the total cost remains within a given budget.
max Z = Σ_{i=1}^{N} pi xi

Subject to

Σ_{i=1}^{N} (ci xi + ki yi ) ≤ C

xi ≥ 0, i = 1, . . . , N

yi ∈ {0, 1}, i = 1, . . . , N

yi = 1 if xi > 0, and yi = 0 if xi = 0
This is a mixed integer program, since some of the variables are continuous.
On the other hand, this formulation is not linear, since the last constraint
(the one that enforces the relationship between yi and xi ) is not linear.
Often, however, a linear formulation is desirable, for example when we
want to use commercial ILP software.
Exercises
Solution:
Solution:
Let i be the service that the company offers. Then the total profit will be
pi xi , as for all j 6= i we have xj = 0. For any given i, the profit will be
maximum, if xi is as large as possible. Since there is no other service in the
considered case, therefore, service i will use the entire available budget C.
This means
ci xi + ki = C
(Since obviously xi > 0 in this case, therefore, the fixed charge will be there.)
From this we get

xi = (C − ki ) / ci

and the profit will be

pi xi = pi (C − ki ) / ci .
How do we know the value of i? We simply have to check which i makes the
expression
pi (C − ki ) / ci
the largest. (Since the expression contains only known constants, we can
easily check which i makes it the largest.)
3*. Show that the special case discussed in 2 is not as special as it looks:
even in the general case the optimum is always attained by a solution in
which only one xi differs from 0. Using this, devise an algorithm that solves
the problem optimally in the general case without ILP. This gives a clever
shortcut, using the special features of the problem, making the task efficiently
solvable even in the general case.
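Based on the observation in the previous solution and the claim of Exercise 3*, a minimal Python sketch of such a shortcut could look as follows; the function name and data layout are illustrative assumptions, and the correctness of restricting attention to a single service rests on the claim of Exercise 3*.

def best_single_service(p, c, k, C):
    """Pick the single service i that maximizes p_i * (C - k_i) / c_i.

    Services whose fixed charge k_i exceeds the budget C are skipped,
    since they cannot be offered at all.
    """
    best_i, best_profit = None, 0.0
    for i in range(len(p)):
        if k[i] <= C:
            profit = p[i] * (C - k[i]) / c[i]
            if profit > best_profit:
                best_i, best_profit = i, profit
    return best_i, best_profit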
ILP Solution Techniques
• Randomized rounding
• Dynamic programming
• Constraint aggregation
The randomized rounding technique works the best when the ILP is of combi-
natorial nature, that is, a 0-1 programming problem. The principle is shown
via the following example. Consider the problem
max Z = cx
subject to
Ax = b
xi ∈ {0, 1}, i = 1, . . . , n
On the other hand, we can have a much smaller violation of the constraints by
rounding randomly, according to the following rule. The randomly rounded
value x̃i of xi will be:

x̃i = 1 with probability xi , and x̃i = 0 with probability 1 − xi .
This can be easily implemented by drawing a uniformly distributed random
number yi ∈ [0, 1] and setting x̃i = 1 if yi ≤ xi , otherwise x̃i = 0.
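A minimal Python sketch of this rounding rule; the fractional values xi are assumed to come from an LP relaxation solved elsewhere, and the value 0.51 in the usage example is only an illustration of a case whose expected rounded sum is 510.

import random

def randomized_rounding(x_frac):
    """Round each fractional LP value x_i in [0, 1] to 1 with probability x_i."""
    return [1 if random.random() <= xi else 0 for xi in x_frac]

# Usage example: 1000 variables with fractional value 0.51 each;
# the expected value of the sum of the rounded variables is 510.
rounded = randomized_rounding([0.51] * 1000)
print(sum(rounded))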
What is the advantage of the random rounding? We can observe that the
expected value of x̃1 + . . . + x̃1000 will be exactly 510, thus it satisfies the
constraint. Of course, the actual value may still violate it, but one can
expect that the actual values fluctuate around the average, so the errors of
different signs largely cancel out if there are many variables.
One can quantitatively estimate the error that can still occur with a cer-
tain small probability. The way of this estimation is shown in the following
theorem (informal explanation follows after the theorem).
Theorem 1: Let x = (x1 , . . . , xn ) satisfy the constraint ax = b with 0 ≤ xi ≤ 1,
and let x̃ be its randomized rounding. Then

| a x̃ − b | ≤ amax √( α n log n )

holds with probability at least 1 − n^(−α) , where α > 0 is any constant and
amax = maxi |ai |.
The Theorem says that the error, which is the deviation of the actual value
of a x̃ from the expected value b, is bounded by amax √( α n log n ). This may be
much smaller than b.
Observe the effect of the α parameter. The bound on the error holds with
probability at least 1 − n^(−α) . If α is large, then this probability is very close
to 1. Then, however, the error bound amax √( α n log n ) also gets larger. If α is
small, then the error bound is small, but it holds with smaller probability. Thus,
there is a trade-off between providing a tighter bound less surely or a looser
bound more surely. Note, however, that this plays a role only in the analysis,
not in the actual algorithm.
Thus, choosing α = 1 (so that the probability is at least 1 − 1000^(−1) = 99.9%),
we obtain that the deviation from the required ax = 510 is bounded as

amax √( α n log n ) = √( 1000 log 1000 ) ≈ 83.   (1)
What if we are satisfied with less certainty, say, with 90% probability? Then
we can choose α = 1/3, since 1 − 1000^(−1/3) = 0.9, and then we get the error
bound of

amax √( α n log n ) = √( (1/3) · 1000 log 1000 ) ≈ 48.   (2)
What if we would like to have the tighter bound (2) but also the higher 99.9%
probability? There is a way to achieve that, too. Repeat the randomized
rounding 3 times independently. Then in each trial the probability of violat-
ing (2) will be at most 10%. Hence the probability that all the 3 independent
trials violate (2) is 0.1^3 = 0.001 = 0.1%. Thus, among the 3 trials there must
be one with 99.9% probability that satisfies the bound (2), so we can choose
this one. The moral is that we can “amplify” the power of the method by
repeating it several times independently and then choosing the best result.
The Theorem can directly be extended to the case that involves repeated
independent trials. With r repetitions the probability is amplified to 1 − n^(−αr) ,
while the error bound remains the same.
Remark: The approach is most useful if the problem contains “soft” con-
straints. What are these? It is customary to differentiate two types of
constraints:
• Hard constraints: these must be obeyed. For example, they may rep-
resent some physical law which is impossible to violate.
better solution via randomized rounding than with naive deterministic round-
ing, especially if there are many variables.
2. Generalize Theorem 1 for the case when there are more constraints.
Consider the problem of maximizing a function f over the set B of all
n-dimensional 0-1 vectors:

max_{x ∈ B} f (x).

For vectors x = (x1 , . . . , xn ) and b = (b1 , . . . , bn ) in B, define

Bk (b) = {x ∈ B | x1 = b1 , . . . , xk = bk }.
That is, the first coordinate is fixed at b1 , the second at b2 , . . . , the k-th is fixed
at bk , and the rest of the coordinates of x are arbitrary (0 or 1). If k = 0,
then it means we did not fix anything, so B0 (b) = B. If k = n, then all
coordinates are fixed at the corresponding bi value, so then the set Bn (b)
contains only a single vector, b itself: Bn (b) = {b}.
Let us introduce now the following family of functions:

Fk (b) = max { f (x) | x ∈ Bk (b) }.

The meaning of Fk (b) is that this is the maximum value of f (x) if the maxi-
mization is done over the restricted set Bk (b), that is, the first k coordinates
of x are fixed at the values determined by b and only the rest can change.
What is the advantage of the Fk (b) functions? The reason we have
introduced them is that we can easily construct a recursive algorithm to
compute them. Once we have this subroutine that computes Fk (b) for any k
and b, then we can solve the original problem by simply calling the subroutine
with k = 0, since, by definition, we have

F0 (b) = max_{x ∈ B} f (x).
Exhaustive Search
1. function Fk (b)
2. if k = n then return f (b)
3. else
4. begin
5. bk+1 := 0; u := Fk+1 (b)
6. bk+1 := 1; v := Fk+1 (b)
7. return max(u, v)
8. end
Explanation: In line 2 we handle the case when all coordinates are fixed,
i.e., we have arrived at a leaf of the search tree. In this case Bn (b) = {b},
so the maximum over this set can only be f (b). If k < n then in lines 5
and 6 we compute recursively the function for two possible cases (branch):
when the next coordinate is fixed at 0 (line 5) and when it is fixed at 1 (line
6). The maximum of the two gives the value of Fk (b) in line 7, since this is
the maximum over the set when the (k + 1)th coordinate is not fixed, only
the first k. What guarantees that the recursion ends after a finite number of
steps? This is guaranteed by the fact that the recursive call is always made
for a larger value of k and when the largest value k = n is reached then there
is no more recursive call.
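As an illustration, the recursion above can be written, for instance, as the following Python sketch; the function name and the way f is passed in are illustrative assumptions.

def exhaustive_max(f, n):
    """Exhaustive search over all n-dimensional 0-1 vectors, mirroring F_k(b)."""
    b = [0] * n

    def F(k):
        # All coordinates fixed: we are at a leaf of the search tree.
        if k == n:
            return f(tuple(b))
        # Branch: try both values for the (k+1)-th coordinate.
        b[k] = 0
        u = F(k + 1)
        b[k] = 1
        v = F(k + 1)
        return max(u, v)

    return F(0)    # F_0(b) is the unrestricted maximum

# Tiny usage example with an arbitrary test function:
print(exhaustive_max(lambda x: sum(x) - 2 * x[0], 3))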
Let us now turn to the Branch and Bound algorithm. Suppose that we have
an upper bound function Uk (b) available, such that

Fk (b) ≤ Uk (b)

holds for every k and b.
The B&B algorithm is obtained from the exhaustive search by adding a few
new lines (using Uk and a variable maxlow that stores the best objective value
found so far); if these new lines were removed, we would get back the
exhaustive search (ignoring the unnecessary begin-end pairs).
Thus, let us take a closer look at how these lines change the exhaustive search.
In the case k = n the only change is that before returning f (b) we update
the value of maxlow in line 4: if the new function value is larger than the
old value of maxlow, then this larger value will be the new best lower bound
on the optimum, otherwise the old one remains.
The most essential difference is in line 8; this is the bounding. Here the idea is
that we only make the recursive calls if the upper bound Uk (b) is larger than
the current value of maxlow. Why can we do this without losing anything?
Because if the condition in line 8 is not satisfied, then

Uk (b) ≤ maxlow
must hold. This means that from this parameter combination (k, b) we
cannot hope for an improvement on the best value found so far, since the maximum
in this subtree of the search tree (i.e., Fk (b)) is already known to be not more
than maxlow, so it makes no sense to explore this subtree. Thus, in this case
we do not make any recursive call and return maxlow in line 14 (just to
return something).
Note that for the above bounding step we need to compute the value of Uk (b),
but we assumed that for this an independent subroutine is available. The
savings may be essential if this subroutine is much faster than the exhaustive
recursive algorithm. This is typically the case in the successful applications
of B&B.
Comments:
• If we also want to find the maximizing vector x, not just the maximum
value f (x) (i.e., we ask not only how much the maximum is, but also
where it occurs), then we have to keep a record of the latest b vector
along with updating maxlow, whenever a higher value (an update)
occurs in line 4.
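As an illustration, a minimal Python sketch of the branch and bound scheme discussed above; the objective f and the upper bound function U are assumed to be supplied by the caller, and the dictionary field best["val"] plays the role of maxlow.

def branch_and_bound_max(f, U, n):
    """Branch and Bound over 0-1 vectors of dimension n.

    f(b)    : objective value of a fully fixed vector b
    U(k, b) : upper bound on the best objective reachable when the first
              k coordinates are fixed as in b (later coordinates are ignored)
    Returns (best value found, a maximizing vector).
    """
    b = [0] * n
    best = {"val": float("-inf"), "vec": None}

    def search(k):
        if k == n:
            val = f(tuple(b))
            if val > best["val"]:              # update the best lower bound (maxlow)
                best["val"], best["vec"] = val, tuple(b)
            return
        if U(k, tuple(b)) <= best["val"]:      # bounding: this subtree cannot improve
            return
        for bit in (0, 1):                     # branching on the (k+1)-th coordinate
            b[k] = bit
            search(k + 1)

    search(0)
    return best["val"], best["vec"]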
Application Example of the
Branch and Bound Algorithm
Consider the Capital Budgeting problem (which, as we already know, is structurally identical
to the Knapsack problem, so we can use either name). Its formulation is this:
max Z = Σ_{i=1}^{n} pi xi

Subject to

Σ_{i=1}^{n} ci xi ≤ C

xi ∈ {0, 1}, i = 1, . . . , n
Let us try to find the exact solution with the Branch and Bound (B&B) algorithm. In order
to do it, we need to handle the following issue. The B&B algorithm, as described in the
Lecture Note, solves an unconstrained problem of the form max_{x ∈ B} f (x), where B is the
set of all binary vectors of dimension n. In contrast, our problem does have a constraint
and seeks the optimum only over those binary vectors that satisfy the constraint.
To handle this difference, we need to transform our constrained task into an unconstrained
one. It can be done by the method that is often called penalty function method. It is based on
the idea that we can substitute a constraint by modifying the objective function, such that
we “penalize” those vectors that violate the constraint. In this way, we can do the search
over all vectors (with the modified objective function), because when a global, unconstrained
optimum is found, the “penalty” does not allow it to be taken at a vector that violates the
constraint.
Specifically, in our case, we can do the following. The original objective function was
f (x) = Σ_{i=1}^{n} pi xi .
Let the modified objective function be
f̃(x) = Σ_{i=1}^{n} pi xi    if Σ_{i=1}^{n} ci xi ≤ C,

f̃(x) = 0    if Σ_{i=1}^{n} ci xi > C.
We can observe that whenever a vector x satisfies the constraint, f̃(x) = f (x) holds. On the
other hand, if x violates the constraint, then f̃(x) = 0. Since the maximum we are looking
for is positive (assuming each item alone fits in the knapsack, since we can a priori exclude
those that do not), this positive maximum of f̃(x) cannot be taken on a vector that
violates the constraint, as f̃(x) = 0 for those vectors, by definition. Thus, if we solve
max_{x ∈ B} f̃(x)
by B&B, then the optimum provides us with the optimum of the Knapsack problem.
In B&B we use an upper bound function Uk (b) that provides an upper bound on the partial
optimum Fk (b) (see the Lecture Note about B&B).
What will Fk (b) be in our case? If we fix the first k variables at values b1 , . . . , bk ∈ {0, 1},
then we get the following new Knapsack problem:
max Σ_{i=k+1}^{n} pi xi + Σ_{i=1}^{k} pi bi

Subject to

Σ_{i=k+1}^{n} ci xi ≤ C − Σ_{i=1}^{k} ci bi

xi ∈ {0, 1}, i = k + 1, . . . , n
Fk (b) is the optimum of this task. How do we get the upper bound function Uk (b)? We can
take the LP relaxation of this ILP, replacing the constraint xi ∈ {0, 1} by 0 ≤ xi ≤ 1. Thus,
Uk (b) is the optimum solution of the LP
Uk (b) = max Σ_{i=k+1}^{n} pi xi + Σ_{i=1}^{k} pi bi

Subject to

Σ_{i=k+1}^{n} ci xi ≤ C − Σ_{i=1}^{k} ci bi

0 ≤ xi ≤ 1, i = k + 1, . . . , n
The whole B&B approach makes sense only if we can compute the upper bound function
significantly faster than the original task. In our case we face the problem: how do we quickly
solve the above LP that defines Uk (b)? Fortunately, this LP has a special structure: it is the
continuous version of the Knapsack problem. By continuous we mean that xi ∈ {0, 1} has
been replaced by 0 ≤ xi ≤ 1. For such a continuous knapsack it is known that the optimum
can be found by a fast greedy algorithm, as follows.
Order the variables according to decreasing pi /ci ratio. Let us call it preference ordering.
The most preferred variable is the one with the highest pi /ci ratio. Consider the variables
starting by the most preferred one, and assign them the value 1, as long as it is possible
without violating the budget constraint. (Note that in the considered subproblem the budget
is C − Σ_{i=1}^{k} ci bi . If it happens to be negative, then there is no solution to the subproblem,
and we can represent it by Uk (b) = −1, so that this branch of the search tree will surely
be cut down.) When we cannot continue assigning 1 anymore without violating the budget,
then assign a fractional value to the next variable, so that the remaining part of the budget,
if any, is exactly used up. The remaining variables, if any, take 0 as their value. One can
prove that this value assignment gives the optimum solution to the continuous Knapsack
problem. Therefore, we have a fast way of computing Uk (b).
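A minimal Python sketch of this greedy computation of the continuous knapsack optimum; the profits and costs are those of the still-free variables, and budget stands for the remaining budget of the subproblem. The names are illustrative assumptions.

def continuous_knapsack(profits, costs, budget):
    """Optimum of the continuous (LP-relaxed) knapsack; usable as U_k(b)."""
    if budget < 0:
        return -1.0                      # infeasible subproblem: cut this branch
    # Preference ordering: decreasing profit/cost ratio.
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / costs[i], reverse=True)
    value, remaining = 0.0, budget
    for i in order:
        if costs[i] <= remaining:        # take the whole item
            value += profits[i]
            remaining -= costs[i]
        else:                            # take a fractional amount and stop
            value += profits[i] * remaining / costs[i]
            break
    return value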
Now the B&B algorithm can be run for the unconstrained problem

max_{x ∈ B} f̃(x),

and whenever the algorithm calls for Uk (b), we can compute it by the fast algorithm described
above.
Sample question about the Branch and Bound Algorithm
Which of the following describes best the definition of the functions Fk (b) used in
the Branch and Bound algorithm:
1. These functions can be computed recursively and then the original optimum is
obtainable by calling the function Fk(b) with k = 0; b = 0.
3. Fk(b) is the optimum value of the original function f(x) over a restricted set, in
which the first k coordinates of x are fixed according to b.
4. These functions make it possible to incorporate an upper bound function into the exhaustive
search. As a result, possibly large parts of the exhaustive search can be eliminated, by
eliminating the recursive call when it cannot bring an improvement.
Branch and Cut Algorithm
It is the combination of the Cutting Plane and Branch and Bound Algorithms.
Dynamic Programming
The formulation can yield an efficient algorithm if the following two condi-
tions are satisfied:
• The number of different subproblems is not too large (does not grow
exponentially).
• The iterative reductions finally reduce the task to some initial cases for
which the solution is already known or directly computable.
Example
A telecom company wants to create an extension plan for its network, year by
year over a time horizon of T years, such that the network will have enough
capacity to keep up with the growing demand in every year, and the total
cost is minimum. The following information is available as input:
Solution
• The time (year) will be denoted by t. This variable runs over a time
horizon T, that is t = 1, . . . , T.
If the last amount of capacity we have bought is r and now we have altogether
k, then by the end of the previous year we must have had k − r. If we already
know the optimum cost up to year t − 1 for all possible values of the capacity,
then we can express the updated total cost as

A[t, k] = min over r = 0, . . . , k of ( A[t − 1, k − r] + cost[t, r] ),

where A[t, k] denotes the minimum total cost of reaching a total capacity of k by
year t, and cost[t, r] is the cost of buying r units of capacity in year t.
We can apply the above recursive formula whenever t > 1. But how do
we handle the first year, that is, how do we start the recursion? When
t = 1, there is no previously accumulated capacity yet, so then we simply take
A[1, k] = cost[1, k] for every k.
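A minimal Python sketch of this recursion; the table layout, the names cost[t][r], demand[t] and max_cap, and the discretization of capacity into integer units are illustrative assumptions.

def min_expansion_cost(cost, demand, max_cap):
    """Dynamic programming for the multi-year capacity expansion plan.

    cost[t][r] : cost of buying r extra units of capacity in year t (t = 1..T)
    demand[t]  : capacity that must be available in year t (index 0 unused)
    max_cap    : largest total capacity considered
    A[t][k]    : minimum total cost of ending year t with exactly k units
    """
    T = len(demand) - 1
    INF = float("inf")
    A = [[INF] * (max_cap + 1) for _ in range(T + 1)]

    for k in range(demand[1], max_cap + 1):      # first year: everything bought now
        A[1][k] = cost[1][k]
    for t in range(2, T + 1):
        for k in range(demand[t], max_cap + 1):
            # r is the amount bought in year t; before that we had k - r units.
            A[t][k] = min(A[t - 1][k - r] + cost[t][r] for r in range(0, k + 1))
    return min(A[T][k] for k in range(demand[T], max_cap + 1))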
Constraint Aggregation

Recall the Capital Budgeting (Knapsack) problem:

max Z = Σ_{i=1}^{N} pi xi

Subject to

Σ_{i=1}^{N} ci xi ≤ C

xi ∈ {0, 1}, i = 1, . . . , N
Let us assume now that instead of a single inequality constraint we have two.
Assume further that all constants are nonnegative integers. Thus, the task
becomes this:
max Z = Σ_{i=1}^{N} pi xi

Subject to

Σ_{i=1}^{N} ai xi ≤ A

Σ_{i=1}^{N} bi xi ≤ B

xi ∈ {0, 1}, i = 1, . . . , N
Introducing slack variables xN+1 and xN+2 (with coefficients aN+1 = 1, aN+2 = 0,
bN+1 = 0, bN+2 = 1), the two inequalities can be turned into equalities:

max Z = Σ_{i=1}^{N} pi xi

Subject to

Σ_{i=1}^{N+2} ai xi = A    (1)

Σ_{i=1}^{N+2} bi xi = B    (2)

xi ∈ {0, 1}, i = 1, . . . , N

xN+1 ∈ {0, 1, . . . , A}

xN+2 ∈ {0, 1, . . . , B}
Now the question is this: can we replace the two equality constraints above
by a single one, so that this single constraint is equivalent to the joint effect
of the two? By equivalence we mean that they generate the same set of
feasible solutions.
The answer to the above question is yes, even though it may seem counter-
intuitive at first. Let us define the parameters of the new constraint by
ci = ai + M bi
for every i, and
C = A + MB
where M is a constant that we are going to choose later.
Clearly, if a vector satisfies (1) and (2), then it also satisfies the aggregated constraint

Σ_{i=1}^{N+2} ci xi = C    (3)

(taking aN+2 = 0), since (3) is just a linear combination of (1) and (2). The
question, however, is this: can we choose M such that the implication also
holds in the opposite direction, that is, such that the aggregated constraint
(3) implies (1) and (2)?
The answer is yes. To see it, let us rearrange (3) in the following form:

( Σ_{i=1}^{N+1} ai xi − A ) + M ( Σ_{i=1}^{N+2} bi xi − B ) = 0    (4)

Let us denote the first parenthesized expression by Y and the second one by Z.
With this notation we can rewrite (4) as

Y + M Z = 0.    (5)
How can this equality hold? One possibility is that Y = Z = 0. In that case,
(1) and (2) are both satisfied, since
Y = Σ_{i=1}^{N+1} ai xi − A    and    Z = Σ_{i=1}^{N+2} bi xi − B.
Remark: We can assume that A < Σ_{i=1}^{N} ai , since otherwise the original
constraint Σ_{i=1}^{N} ai xi ≤ A would always be satisfied, so we could simply leave
it out. With this assumption the choice for the constant M becomes

M = 1 + Σ_{i=1}^{N} ai .
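A small brute-force sketch, on a hypothetical tiny instance, of the aggregation described above: it builds the single aggregated equality and checks that, over all 0-1 vectors, it admits exactly the same solutions as the original pair of inequalities (the slack variables are enumerated explicitly).

from itertools import product

def aggregate_check(a, b, A, B):
    """Verify by enumeration that the aggregated equality constraint is
    equivalent to the pair sum(a_i x_i) <= A, sum(b_i x_i) <= B."""
    N = len(a)
    M = 1 + sum(a)                                   # the multiplier from the text
    for x in product((0, 1), repeat=N):
        ok_pair = (sum(ai * xi for ai, xi in zip(a, x)) <= A and
                   sum(bi * xi for bi, xi in zip(b, x)) <= B)
        ok_agg = any(
            sum((ai + M * bi) * xi for ai, bi, xi in zip(a, b, x)) + s1 + M * s2
            == A + M * B
            for s1 in range(A + 1) for s2 in range(B + 1)
        )
        if ok_pair != ok_agg:
            return False
    return True

print(aggregate_check(a=[3, 2, 2], b=[1, 4, 2], A=5, B=5))   # expected: True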
Tabu Search
Let us first consider the simplest strategy to optimize a function f (x) over
a discrete domain x ∈ D, the (greedy) local search. This can be defined as
follows: starting from an initial point, in every step move to the best neighbor
of the current point if it improves the objective function, and stop when no
neighbor offers an improvement.
The critical problem of greedy local search is that it may easily get stuck in
a local optimum, which can be very far from the global optimum.
Tabu search offers an approach that diminishes (but does not eliminate!)
the danger of getting stuck in a local optimum. The fundamental principle
is summarized below.
• The algorithm does local search, but it keeps moving even if a local
optimum is reached. Thus, it can “climb out” from a local minimum.
This would result, however, in cycling, since whenever it moves away
from a local minimum, the next step could take it back if only the
improvement is considered. This problem is handled by the rule that
the points on the tabu list cannot be chosen as next steps. In the tabu
list example mentioned above we can decrease the possibility of cycling
by disallowing recently visited points.
• The tabu list can also be more complicated than the example above.
The algorithm may also maintain more than one tabu list. For exam-
ple, one list may contain the last k visited points, while another list can
contain those points that are local minimums visited in the search. A
third list can contain, for example, those points from which the search
directly moved into a local minimum.
• The rule that the points in the actual tabu list cannot be visited
may be too rigid in certain cases. To make it more flexible, it is usually
allowed that under certain conditions a point may be removed from the
tabu list. These conditions are called aspiration criteria. For example,
if with respect to the current position a point on the tabu list offers
an improvement larger than a certain threshold, then we may remove
the point from the tabu list. Or, if all possible moves from the current
position would be prohibited by the tabu list(s), then we can remove
from the tabu list the point that offers the best move. In this way we
can avoid getting “paralyzed” by the tabu list.
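A minimal Python sketch of the basic scheme with a single tabu list of recently visited points; the neighbor-generating function and the objective f are assumed to be supplied by the caller, and the aspiration criteria discussed above are omitted for brevity.

from collections import deque

def tabu_search(f, neighbors, x0, tabu_size=20, max_iter=1000):
    """Minimize f by tabu search: keep moving even out of local minima,
    but never revisit the points currently on the tabu list."""
    current, best = x0, x0
    tabu = deque([x0], maxlen=tabu_size)       # last few visited points are tabu
    for _ in range(max_iter):
        candidates = [y for y in neighbors(current) if y not in tabu]
        if not candidates:                     # every move is tabu: stop (no aspiration here)
            break
        current = min(candidates, key=f)       # best admissible neighbor, even if worse
        tabu.append(current)
        if f(current) < f(best):
            best = current
    return best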
stop the algorithm.
Recall that the search space consists of an underlying domain, and a neigh-
borhood definition. The domain is usually given, as it is the set of the possible
objects in which we search an optimal one. But the neighborhood structure
is up to us to choose.
A good neighborhood structure should satisfy the following conditions:

• The arising neighborhoods are not too large, so that they can be
searched relatively easily.
• At the same time, the neighborhoods are also not too small, so that
they do not overly restrict the choices.
• We expect that the search space is connected, that is, every point is
reachable from every other one, via a sequence of neighbor to neighbor
moves. This is important, since otherwise it may be impossible to reach
the optimum simply because it may not be reachable from the initial
position at all.
In some cases these conditions are easily satisfied, see, e.g., the introductory
example of this lecture note, where the neighborhood is defined by (1). But
in other cases it may be somewhat harder to design the “right” search space.
Exercises
Task: Design a search space over the set of all spanning trees of the complete
graph, such that the neighborhood definition satisfies the above presented
three conditions for a good neighborhood structure.
2. Assume we have n nodes, and we want to design a ring network connecting
all the nodes. The cost of connecting any two given nodes is known, and we
look for the minimum cost ring. (This problem is known to be NP-hard.)
Tasks:
a.) Design a search space over the set of all possible rings, such that the
neighborhood definition satisfies the above presented three conditions for a
good neighborhood structure.
Hint: Use the result from algebra that every permutation can be represented
as the superposition of transpositions (transposition: a special permutation
in which only two entries are exchanged).
b.) Show that the search space can be designed in a way that, beyond
satisfying the three conditions, it is also regular. Regularity means that each
element (point) of the space has the same number of neighbors. This is
usually a desirable property, because it means that all points are equally
well connected, none of them is distinguished by having more neighbors than
others.
Simulated Annealing
Simulated annealing improves the greedy local search in the following way.
Assume the task is to minimize a function f (x). In every iteration a random
neighbor y of the current point x is picked.

• If f (y) < f (x), that is, a better point is found, then we accept it and
y will be the next position.

• If f (y) ≥ f (x), then y may still be accepted as the next position, but
only with a probability that depends on ∆E = f (y) − f (x) and on a
control parameter T (the “temperature”), typically e^(−∆E/T) ; the
temperature is gradually decreased according to a cooling schedule.
An essential feature of simulated annealing is that it can climb out from a local
minimum, since it can accept worse neighbors as the next step. Such an
acceptance happens with a probability that is smaller if the neighbor’s quality
is worse. That is, with larger ∆E the acceptance probability gets smaller.
At the same time, with decreasing temperature all acceptance probabilities
get smaller. This means that initially the system is “hot”: it makes more random
movements and can relatively easily jump into worse states, too. On the other
hand, over time it “cools down”, so that worsening moves are accepted less
and less frequently. Finally, the system gets “frozen”.
This process can be interpreted such that initially, when the achieved quality
is not yet too good, we are willing to accept worse positions in order to be able
to climb out of local minimums, but later, with improving quality we tend
to reject the worsening moves more and more frequently. These tendencies
can be balanced and controlled by the choice of the cooling schedule. If it
is chosen carefully, then convergence to a global optimum can be guaranteed
(proven mathematically).
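A minimal Python sketch of the scheme described above; the neighbor-generating function, the initial temperature and the geometric cooling schedule are illustrative assumptions, not a prescription from this note.

import math
import random

def simulated_annealing(f, random_neighbor, x0, T0=1.0, alpha=0.95, max_iter=10000):
    """Minimize f: always accept improving neighbors, accept worsening ones
    with probability exp(-dE / T), and cool the temperature geometrically."""
    x, best = x0, x0
    T = T0
    for _ in range(max_iter):
        y = random_neighbor(x)
        dE = f(y) - f(x)
        if dE < 0 or random.random() < math.exp(-dE / T):
            x = y                               # accept the move (possibly a worse one)
            if f(x) < f(best):
                best = x
        T *= alpha                              # cooling schedule
        if T < 1e-12:                           # effectively "frozen"
            break
    return best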
Advantages of Simulated Annealing
Genetic Algorithms
The genetic algorithm applies concepts from evolutionary biology and at-
tempts to find a good solution by mimicking the process of natural selection
and the “survival of the fittest”. The algorithm can be outlined as follows.
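As an illustration, a minimal generic sketch of such an algorithm in Python; the population size, the selection scheme, and the crossover and mutation operators are all illustrative assumptions supplied by the caller.

import random

def genetic_algorithm(fitness, random_individual, crossover, mutate,
                      pop_size=50, generations=200, mutation_rate=0.1):
    """Generic GA skeleton: selection by fitness, crossover, occasional mutation."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # fitter individuals first
        survivors = population[: pop_size // 2]      # "survival of the fittest"
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            p1, p2 = random.sample(survivors, 2)     # pick two parents
            child = crossover(p1, p2)
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = survivors + offspring
    return max(population, key=fitness)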
Advantages of Genetic Algorithms
• Offers a chance that the initial solutions evolve into good ones that are
close to optimal.
• The evolving population tends to have more and more fit individuals,
so the algorithm may find many good solutions, not just one.
Exercise
p = 1 − (1 − ε)^n .    (1)

The reason is that (1 − ε)^n is the probability that all the n individuals have
their fitness in the interval [0, 1 − ε). Then 1 − (1 − ε)^n is the probability of
the complement event, which is the event that not all have their fitness in
[0, 1 − ε), that is, at least one must fall in [1 − ε, 1]. Answer 1 claims that for
every fixed ε > 0 we have p → 1. From formula (1), we see that this would
require (1 − ε)^n → 0. However, for a constant n this is not satisfied. On
the other hand, if n → ∞, then (1 − ε)^n → 0 indeed holds. Therefore, the
correct choice is Answer 3.
CS 6385 Final Exam Sample Questions with Answers
RULES:
Exactly one answer is correct for each multiple choice question. Encircle the
number of the chosen answer. You have two options for each multiple choice
question:
(a) Select one answer. If it is correct, then it is worth 1 point, otherwise 0.
(b) You may select 2 answers. If the correct one is among them, then it is worth
1/2 point, otherwise 0. This allows partial credit for multiple choice questions.
Note that by selecting 2 answers you may double your chance to hit the correct
one, but at the price of getting only half of the credit for this question.
Your choice of the answer(s) should be clear and unambiguous. If ambiguity
remains for a question, then it will be counted as an unanswered question. The
instructor cannot give any hint during the exam about the answers, so please
refrain from asking such questions. It is an open-notes exam, but no other help
can be used. In particular, no device can be used that has a communicating
capability (laptop, cellphone, etc).
Part A : MULTIPLE CHOICE QUESTIONS
Answer: 3
2 We observe that for a component the probability of being operational at some given time
t is the same as at time t + 5. Which of the following is true about potential failures:
Answer: 1
Justification: The probability of being operational at time t is given by the survival
function S(t), since S(t) = 1 − F (t) = Pr(T ≥ t), that is, it is the probability that
the component has not failed before t. In our case the condition says S(t) = S(t + 5).
Therefore, we must also have F (t) = F (t + 5). Since the probability distribution
function F (t) cannot decrease, therefore, it must be constant in the entire [t, t + 5]
interval. (If it were not constant, then at some point in the interval it must differ from
F (t). If it is smaller there than F (t), then it must have decreased from F (t), which is
not possible. If it is larger, then it must decrease again to get back to F (t) = F (t + 5),
which is also not possible.) Thus, F (t) is constant in the entire interval, which implies
that its derivative, the probability density function f (t), must be 0. But that means,
no failure can occur in the interval, since the probability of failure in an interval is given
by the integral of f (t) over the interval, which is 0 in our case.
3 In a mixed ILP x is a continuous variable and y is a discrete variable with y ∈ {0, 1}.
We would like to express both of the following conditions via linear constraints: (a) if
y = 0 then x = 0; and (b) if y = 1 then −1 ≤ x ≤ 1. Which of the following does it
correctly:
1. xy = 0, −1 ≤ x ≤ 1
2. x = y = 0 or −1 ≤ x ≤ 1
3. x + y ≥ 0, x − y ≤ 0
4. −1 − y ≤ x + y ≤ 1 − y
5. y(−1 − y) ≤ x ≤ y(1 − y)
6. None of the above.
Answer: 3
4 We observe that a component has a survivor function that satisfies 0 < S(t1 ) = S(t2 ) < 1
for some given time instants t1 < t2 . Somebody claims this implies that the hazard
function h(t) must be 0 everywhere in the interval [t1 , t2 ]. Which of the following is
true about this claim:
1. The claim cannot hold in general, since we only have information at the time
instants t1 , t2 , but we do not know anything about what happens between t1 and
t2 .
2. The claim may or may not be true, depending on how S(t) behaves between t1
and t2 .
3. The claim is always true, because S(t) = 1−F (t), and the probability distribution
function F (t) cannot decrease, so S(t) cannot increase. Therefore, S(t1 ) = S(t2 )
implies that S(t) must be constant in the whole [t1 , t2 ] interval, which yields that
h(t) = 0 holds everywhere in the interval, due to h(t) = −S ′ (t)/S(t).
4. If h(t) = 0 for every t ∈ [t1 , t2 ], it means that there is no risk (hazard) of failure
in the interval. Since we know that S(t2 ) < 1, therefore, the operational status
of the component at the end of the interval is not guaranteed, having probability
less than 1. Thus, there must be some risk of failure before the interval ends, so
h(t) = 0 cannot hold in the entire interval, making the claim wrong.
Answer: 3
5 A network has 3 nodes and there is an undirected link between each pair of nodes. Each
link is operational independently of the others with probability p, 0 < p < 1, while the
nodes are always up. The system is considered operational if the network topology is
connected. Which of the following is true about the reliability configuration:
1. This is a series configuration, because each link should be up to make the system
operational.
2. Since it is enough if two links are up to preserve connectivity, and it is not neces-
sary that all are operational, therefore, this is a parallel configuration.
3. It is a k out of N configuration with k = 2 and N = 3.
4. Since one link has to be operational, and at least one of the other two should
be up, too, therefore, it can be regarded as a combination of series and parallel
configurations.
5. None of the above.
Answer: 3
6 Consider the reliability of a series configuration with n components and the reliability of
each component is p = 1 − 1/n2 . What happens if n grows very large (n → ∞)?
1. The reliability of the configuration tends to 0.
2. The reliability of the configuration tends to 1.
3. The reliability of the configuration tends to a constant that is strictly between 0
and 1.
4. The reliability of the configuration does not tend to any number, because the
limit does not exist.
Answer: 2
Justification: Since it is a series configuration, we have

R = ( 1 − 1/n^2 )^n .

Using that ( 1 − 1/n^2 )^(n^2) → e^(−1) , it can be reformulated as

R = ( 1 − 1/n^2 )^n = [ ( 1 − 1/n^2 )^(n^2) ]^(1/n) ∼ e^(−1/n) → 1.
7 Assume that in a Simulated Annealing algorithm the state space consists of all n-
dimensional binary vectors for some fixed n ≥ 2. We want to decide whether the state
space is connected, that is, every vector can be reached from every other via a sequence
of neighbor-to-neighbor moves. Let w(x) denote the number of 1 bits in x (the weight
of x). The binary vectors x, y are defined neighbors if and only if w(x) + w(y) = n
holds. Which of the following is correct with this neighborhood definition?
1. The state space will not be connected for any n ≥ 2.
2. The state space will always be connected for any n ≥ 2.
3. If n ≥ 2, then the state space will be connected for any even n, but not for odd
n.
4. If n ≥ 2, then the state space will be connected for any odd n, but not for even
n.
5. None of the above.
Answer: 1
Justification: Consider the vector x = 0 that contains only 0 bits. Its weight is 0.
It can be a neighbor only of the vector y = 1, which consists of only 1 bits, and its
weight is n, since otherwise the weight sum could not be n. Since no other vector has 0
or n weight, these two vectors are only neighbors of each other, but are not connected
to any other vector. Since for n ≥ 2 the state space contains more than 2 vectors, it
cannot be connected.
Part B : PROBLEM SOLVING
Solution
If the reliability doubles, then we have

1 − (1 − p)^(2n) = 2 [ 1 − (1 − p)^n ].

Let us introduce a new variable x = (1 − p)^n . With this the above formula becomes

1 − x^2 = 2 − 2x.
9 A company wants to install switches at n sites, at most one at each site. 4 types
of switches are available: Type-1, Type-2, Type-3 and Type-4, each at a cost of
c1 , c2 , c3 and c4 , respectively. There is a restriction, however, that we can use
altogether at most two different types out of the four, that is, the number of
types used in the entire system (not just at a single site) can be at most two. If
a switch is installed at a site, it generates a profit of p1 , p2 , p3 or p4 , respectively,
depending on its type. There is an available budget of C. Formulate as an integer
linear programming problem the following task: find an installation plan that
maximizes the total profit, such that the total cost does not exceed the available
budget, and altogether at most two different types of switches are used in the
entire system.
Solution
Let xij ∈ {0, 1} indicate whether a switch of Type-j is installed at site i or not.
Furthermore, let yj ∈ {0, 1} indicate whether a Type-j switch is used somewhere in the
system or not (j = 1, 2, 3, 4). Then the task can be formulated as

max Σ_{i=1}^{n} (p1 xi1 + p2 xi2 + p3 xi3 + p4 xi4 )

subject to

Σ_{i=1}^{n} (c1 xi1 + c2 xi2 + c3 xi3 + c4 xi4 ) ≤ C

xi1 + xi2 + xi3 + xi4 ≤ 1    (∀i)

xij ≤ yj    (∀i, j)

y1 + y2 + y3 + y4 ≤ 2

xij , yj ∈ {0, 1}    (∀i, j)
MULTIPLE CHOICE QUESTIONS
2 Assume the variables x, y occur in a linear programming task. We would like to add the
new constraint |2x| + |3y| ≤ 3 to the LP. Which of the following formulations does it
correctly, given that we must keep the formulation linear:
Correct answer: 4.
3 Consider the maximum flow problem from a source node s to a destination node t in a
directed graph. Assume we were able to find a flow that pushes a total of 10 units of
flow through the graph from s to t. (But it may not be a maximum flow.) Which of
the following is correct:
1. If originally each edge has capacity at most 5 and we increase the capacity of each
edge by 1, then the maximum s-t flow will necessarily be at least 12.
2. If we remove all edges from the graph on which the found flow is 0, then the
minimum cut capacity between s and t in the new graph necessarily remains the
same as in the original graph.
3. If there is a cut in the graph that separates s and t, and its capacity is 11, then
the found flow was not maximum.
4. If the found flow is maximum and all edge capacities are integers, then there must
be at least 10 edges that are saturated by the flow, that is, they are filled up to
their capacities.
Correct answer: 1.
5 Assume a large network with undirected links has minimum cut size 4, that is, it cannot
be disconnected by removing less than 4 links, but it can be disconnected by removing
4 links. We would like to disconnect the network in the following strong sense: we
remove enough links so that the network falls apart into at least 3 components. In
other words, the disconnected network cannot be made connected again by adding a
single link. To achieve this strong disconnection, which of the following is true:
Correct answer: 4.
PROBLEM SOLVING
9 Consider a network with directed links, and each link in this network has 1 Mb/s capacity.
Assume that from a source node s to a terminal node t the network can carry a flow of
20 Mb/s, but not more. Let us call a link critical if it has the following property: if the
link is removed, then the network cannot carry the 20 Mb/s flow from s to t anymore.
a) Is it possible that this network contains less than 20 critical links? Justify your
answer.
Answer: No, it is not possible. Let us consider a maximum flow, which has
value 20 Mb/s, according to the conditions. If we look at a minimum cut which
separates s and t, then, due to the max flow min cut theorem, this cut must have
capacity 20 Mb/s. Since each link has 1 Mb/s capacity, there must be 20 links in
the cut. If we remove any of these links, the capacity of the cut goes down to 19
Mb/s, so it is not possible to send 20 Mb/s flow anymore. Thus, each link in the
minimum cut is critical and there are 20 of them.
5
b) Is it possible that this network contains more than 20 critical links? Justify your
answer.
Answer: Yes, it is possible. For example, let the network have 3 nodes: s, a, t.
Define the links in the following way: connect s to a by 20 directed links, each
of capacity 1 Mb/s, and also connect a to t by 20 directed links, each of capacity
1 Mb/s. Then the maximum flow from s to t is 20 Mb/s. The links between s
and a form a minimum cut in which each link is critical. The same is true for the
links that connect a to t. Thus, this network satisfies the conditions and it has
40 critical links.
9 * Consider an undirected graph, and let λ(a, b) denote the edge connectivity between
any two different nodes a, b.
Let x, y, z be three distinct nodes in the graph. Does the inequality
Comment. One may ask here: could it be true that the inequality
CS 6385 SAMPLE EXAM ANSWERS
Correct answer: 2
2 The formula for the network-wide mean delay is T = (1/γ) Σ_{i=1}^{l} fi /(Ci − fi ), where γ is the total volume
of traffic in the network. Assume we double each link capacity and also double each link flow.
Which of the following describes correctly the resulting change in the value of T ?

1. Since fi /(Ci − fi ) = 2fi /(2Ci − 2fi ) holds for every i, therefore, T remains the same.
2. Since by doubling the flow on each link the total traffic volume γ will also double and T is
inversely proportional to γ, therefore, taking into account that the summands fi /(Ci −fi )
preserve their values, T will decrease to half of its original value.
3. The resulting change in T depends on how the flow is distributed in the network, so the
given information is insufficient to determine how much the value will change.
4. None of the above.
Correct answer: 2
3 A step in the Cut Saturation Algorithm is the finding of a saturated cut. Consider now the
following slight modification of the problem. Let us call a cut nearly saturated if it contains
at most one non-saturated link and the rest of the links in the cut are all saturated. Which
of the following statements is true?
1. If the network contains a nearly saturated cut, then the saturated links within this cut
form a saturated cut.
2. If the network contains exactly one saturated cut, then it cannot contain a nearly satu-
rated cut.
3. If there is no nearly saturated cut in the network, then there is at most one saturated
cut.
4. If there is no nearly saturated cut in the network, then there is no saturated cut either.
5. None of the above.
Correct answer: 4
4 A network has 3 nodes and there is an undirected link between each pair of nodes. Each link
is operational independently of the others with probability p=0.7. The system is considered
operational if the network topology is connected. Which of the following is true?
Correct answer: 1
5 In a mixed ILP x is a continuous variable and y is a discrete variable with y ∈ {0, 1}. We would
like to express the following conditions via linear constraints:
• if y = 0 then also x = 0
• if y = 1 then a ≤ x ≤ b for some given constants 0 < a < b.
Which of the following does it correctly:
1. xy = 0, a≤x≤b
2. x = y = 0 or a ≤ x ≤ b
3. a − y ≤ x + y ≤ b − y
4. y(a − y) ≤ x ≤ y(b − y)
5. ay ≤ x ≤ by
6. None of the above.
Correct answer: 5
6 Consider the reliability of a series configuration with n components and the reliability of each
component is p = 1 − 1/(n + √n). What happens if n grows very large (n → ∞)?
Correct answer: 3
7 Assume that in a Tabu Search algorithm we define the neighborhood of an n-dimensional binary
vector as follows. Let w(x) denote the number of 1 bits in x (the weight of x). The binary
vectors x, y are defined neighbors if and only if |w(x) − w(y)| ≥ 2 holds. Which of the
following is correct?
1. The state space will not be connected for any n, because by the above definition neighbors
differ in at least 2 bits, so one cannot reach a vector that differs from the current one in
only one bit.
2. The state space will always be connected for any n ≥ 2.
3. If n ≥ 3, then any vector x which has at least 3 bits with value 1 has the zero vector
among its neighbors.
4. None of the above.
Correct answer: 3
8 Assume x1 , . . . , xn are 0-1 variables. Express each of the following conditions with linear
constraints.

a) At least one of the variables must take the value 0.

Answer:

This means, not all of them can be 1, so their sum is at most n − 1:

x1 + x2 + . . . + xn ≤ n − 1

or, equivalently,

x1 + x2 + . . . + xn < n
b) At most one variable can take the value 0.
Answer:
This means, at least n − 1 of them is 1, so their sum is at least n − 1:
x1 + x2 + . . . + xn ≥ n − 1
c) Either all variables are 0, or none of them.
Answer:
This means, they all must be equal:
x1 = x2 = . . . = xn
or
x1 = x2
x2 = x3
..
.
xn−1 = xn
9 Consider the Concentrator Location Problem with the following modification: the cost
of placing a concentrator at a site is charged only when the site serves more than 5
terminals. For those sites that serve at most 5 terminals the cost is waived. Let us
assume that the capacity k of a concentrator satisfies k > 5. Provide a modification
of the original Concentrator Location Problem formulation to cover this modified case.
The formulation should preserve the linearity of the original formulation, that is, you
cannot add any case separation, like ”if ... then ...”, or any other nonlinear formulation.
Justify your solution and explain the meaning of every variable and constraint.
Answer:
For reference, here is the original formulation of the Concentrator Location Problem.
(In an exam it does not have to be included, we just include it here for handy reference.)
Variables:

xij = 1 if terminal i is connected to site j, and xij = 0 otherwise

yj = 1 if a concentrator is placed at site j, and yj = 0 otherwise

Optimization task:

min Z = Σ_{i=1}^{n} Σ_{j=1}^{m} cij xij + Σ_{j=1}^{m} dj yj .

Subject to

Σ_{i=1}^{n} xij ≤ k yj    (∀j)

Σ_{j=1}^{m} xij = 1    (∀i)

xij , yj ∈ {0, 1}    (∀i, j)
Note: The above formulation is just for reference, the actual solution follows below.
To obtain the required modification, it is enough to add 5 to the right-hand side of the
first constraint and to change the meaning of yj accordingly: now yj will be 1 only if
more than 5 terminals are served by the concentrator. Those sites that serve at most 5 are
free. We also replace the coefficient of yj in this constraint by k − 5, to maintain that a
maximum of k terminals can be served by a concentrator (otherwise it could grow to
k + 5).
Variables:

xij = 1 if terminal i is connected to site j, and xij = 0 otherwise

yj = 1 if a concentrator is placed at site j and it serves more than 5 terminals, and yj = 0 otherwise

min Z = Σ_{i=1}^{n} Σ_{j=1}^{m} cij xij + Σ_{j=1}^{m} dj yj .

Subject to

Σ_{i=1}^{n} xij ≤ (k − 5) yj + 5    (∀j)

Σ_{j=1}^{m} xij = 1    (∀i)

xij , yj ∈ {0, 1}    (∀i, j)