
Combined Doc

Mathematical programming is an optimization task aimed at finding solutions that minimize or maximize an objective function under constraints, with various classifications including linear, nonlinear, and combinatorial optimization. The document outlines the formulation of linear programming (LP) problems, emphasizing the importance of proper formulation for utilizing existing algorithms and software for solutions. It also provides examples and exercises to illustrate the application of LP in real-world scenarios such as manufacturing and telecommunications.


Mathematical Programming for Planning

What is a mathematical program?

An optimization task where we want to find a solution that minimizes or maximizes an objective function under constraints. Typically, it has many variables and constraints.

Historical note: the word “program” in the name originally had nothing to do with computer programming, even though the solving algorithm is naturally implemented by a computer program. The name “programming” was originally motivated by its meaning “preparing a schedule of activities,” before computers existed at all. It is still used in this sense, e.g., in oil refineries, where the refinery programmers prepare detailed schedules of how the various process units will be operated and the products blended.

Classification

• Continuous vs. Discrete: If all the variables can take continuous values (real numbers), then the task is called continuous. If the variables are allowed to take discrete values only (e.g., integers), then we speak about a discrete programming task. (Sometimes both cases occur within the same problem; then we speak about mixed programming.)
• Linear Programming (LP): A linear programming problem
(see more later) is a continuous optimization task in which
the objective function is linear and the constraints are ex-
pressed by linear inequalities or linear equations.

• Nonlinear Programming: If either the objective function or the constraints (or both) are nonlinear, then we speak about nonlinear programming.

• Combinatorial Optimization: Many optimization problems that occur in network design are associated with a combinatorial structure, typically a graph. This often imposes the constraint that the variables are 0-1 valued, i.e., can only take 2 possible values (0 or 1). Then we often speak about combinatorial optimization.

Our first objective is to show how to formulate network design related problems as mathematical programming tasks.

Why is it important? Because once the task is formulated as a standard mathematical programming problem, the solution can be found using known algorithms that are available as commercial software, or even freeware.

Thus, if we can ask the question properly, the answer is often available off-the-shelf. But, as we are going to see, it is not always easy to properly formulate the task in a given framework.
Linear Programming

A motivating example: Optimizing the Product Mix in Manufacturing

[Slides with the LP formulation and the graphical representation of this example are omitted here.]

About the solution: the feasible region is delimited by a convex polygon, determined by its vertices (corner points).
Linear Programming

The objective function is of the form


Z = c1 x1 + c2 x2 + . . . + cn xn
where

c1 , . . . , cn are given constants

x1 , . . . , xn are the variables to be optimized.

In the standard form of LP the constraints are formulated as


a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
...
am1 x1 + am2 x2 + . . . + amn xn = bm
x1 ≥ 0, . . . , xn ≥ 0

We either want to minimize or maximize the objective function, subject to


the constraints. That is, given the parameters aij , bi , ci as input, we want to
find values for the xi variables, such that all constraints are satisfied and the
objective function takes minimum or maximum value, depending on whether
we want to minimize or maximize.

From the mathematical point of view, minimization and maximization are equivalent, since minimizing an objective function Z means the same as maximizing −Z, and vice versa.
Thus, a linear programming problem (or linear program, for short; abbreviated LP) looks like this for minimization:

min Z = c1 x1 + c2 x2 + . . . + cn xn
subject to
a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
...
am1 x1 + am2 x2 + . . . + amn xn = bm
x1 ≥ 0, . . . , xn ≥ 0

For maximization it would look the same, just the “min” would be replaced
by “max”.

An important restriction is that we only allow linear expressions, both in the objective function and in the constraints. A linear expression (in terms of the xi variables) can only be one of the following:

• a constant

• a variable xi multiplied by a constant (such as ci xi )

• a sum of expressions of the above two types

(Examples: c1 x1 + c2 x2 + . . . + cn xn , or 3x4 − 2).

No other types of expressions are allowed. In particular, no product of variables, no logical case separation, etc.
The standard form LP in vector-matrix notation:

min Z = cx
subject to

Ax = b
x ≥ 0

Other formulations are also possible. An often used version is shown below
(some textbooks call this one the standard form, rather than the one above):

min Z = cx
subject to
Ax ≥ b
x≥0

Note: x is meant to be a column vector, which implies that c must be a row vector, so that the product cx makes sense. If c is a column vector, then we write cT . Usually it is clear from the context which vector is a column and which is a row, so the distinction is often not shown explicitly.

When we write an inequality between vectors, it means that the inequality holds componentwise. For example, Ax ≤ b means that each component of the matrix-vector product Ax is ≤ the corresponding component of the vector b. Similarly, x ≥ 0 means that each component of x is ≥ 0.
The different formulations can be easily converted into each other, usually
at the price of increasing the number of variables.

Why does it make sense to increase the number of variables? Because the standard form may be advantageous (for example, an LP solver program may require its input in standard form).

Exercises

1. Convert the following LP into standard form:

min Z = 5x − 6y
subject to

2x − 3y ≥ 6
x−y ≤ 4
x ≥ 3

Solution: Set y = x2 − x3 for the free (unbounded) variable y, using the fact that any number can be expressed as the difference of two nonnegative numbers. Furthermore, introduce a slack variable in each inequality. These slack variables, as discussed in class, “fill the gap” between the two sides of the inequality, to convert it into an equation.

For example, the first constraint 2x − 3y ≥ 6 will be transformed as follows. Let us use x1 instead of x, for uniform notation. The variable y will be replaced by x2 − x3 , where x2 , x3 ≥ 0. Then we get

2x1 − 3(x2 − x3) − x4 = 6,

where x1 plays the role of x, the difference x2 − x3 replaces y, and x4 is the slack variable.

Notice that this will indeed force the original left-hand side (the part before x4) to be greater than or equal to 6, as the original inequality required. The reason is that x4 ≥ 0 must hold, since all variables are forced to be nonnegative in the standard form. Therefore, the expression before x4 must be ≥ 6, since subtracting the nonnegative quantity x4 from it makes it equal to 6. If the original inequality had the ≤ direction, then the only difference would be that we would add the slack variable, rather than subtract it.

After transforming the other inequalities, too, and also substituting the old
variables in the objective function with the new ones, we get the entire stan-
dard form:

min Z = 5x1 − 6x2 + 6x3
subject to
2x1 − 3x2 + 3x3 − x4 = 6
x1 − x2 + x3 + x5 = 4
x1 − x6 = 3
x1 , x2 , x3 , x4 , x5 , x6 ≥ 0
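As a sanity check, the conversion can be verified numerically. The sketch below is an addition for illustration (it assumes SciPy's linprog is available); it solves both the original and the standard form and compares the optima:

```python
# Sketch: verifying the Exercise 1 conversion with scipy.optimize.linprog.
# The solver choice is an assumption; any LP solver would do.
from scipy.optimize import linprog

# Original form: min 5x - 6y  s.t.  2x - 3y >= 6,  x - y <= 4,  x >= 3, y free.
# linprog takes <= rows, so 2x - 3y >= 6 becomes -2x + 3y <= -6.
orig = linprog(c=[5, -6],
               A_ub=[[-2, 3], [1, -1]], b_ub=[-6, 4],
               bounds=[(3, None), (None, None)])

# Standard form: min 5x1 - 6x2 + 6x3 over x1..x6 >= 0, equalities only.
std = linprog(c=[5, -6, 6, 0, 0, 0],
              A_eq=[[2, -3, 3, -1, 0, 0],
                    [1, -1, 1, 0, 1, 0],
                    [1, 0, 0, 0, 0, -1]],
              b_eq=[6, 4, 3],
              bounds=[(0, None)] * 6)

print(orig.fun, std.fun)  # the two optima should coincide
```

Both forms report the same optimal objective value, as the conversion argument predicts.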
2., 3., 4. (statements missing from this copy)

5. (beginning of statement missing) ... schedule that minimizes the cost, such that in each month the demand is satisfied.

6. * A telecommunications company sets up routes through its network to serve certain source-destination (S-D) pairs of traffic. We want to assign bandwidth to each route, under the following conditions:

• The routes are fixed and known in advance, each route goes through a
known set of links. (These sets can possibly overlap, as the routes may
share links.)

• Each link has a known available capacity, which cannot be exceeded by the
routes that use the link, in the sense that the sum of the route bandwidths on
the link cannot be more than the link capacity.

• Assigning bandwidth to a route has a cost. This cost is proportional to the bandwidth assigned. The cost of unit bandwidth is known for each route (may be different for different routes).

• Each route generates a profit, due to the traffic it carries. The profit of each
route is proportional to the bandwidth assigned to the route. The profit
generated by unit bandwidth is known for each route (may be different for
different routes).

Under the above conditions, the company wants to decide how much bandwidth to
assign to each route. The goal is that the ratio of the total profit vs. the total cost is
maximized. In other words, they want to maximize the yield of the bandwidth
investment in the sense that it brings the highest profit percentage. Formulate this
optimization problem as a linear program.
Hint: Formulate it first as optimizing the ratio of two linear functions under linear
constraints. This is still not an LP, since the objective function is a fraction. Then
convert it into an LP in two steps, using the following idea.

Step 1. If you multiply both the numerator and the denominator of the fractional objective function by the same (nonzero) number, then the value of the fraction remains the same. Therefore, after introducing such a scaling factor, which will be an extra variable, we can fix the denominator at an arbitrary fixed nonzero value, say 1. This gives a new constraint. After adding the new constraint, we can aim at maximizing only the numerator in the new system, so we get rid of the fraction in the objective function.

Step 2. While this new system is still nonlinear, containing the product of the original variables and the scaling factor, we can make it linear by introducing another new variable for the product of variables.
Solution of Exercises 5 and 6
Exercise 5. (See the problem description in the previous lecture note.)
Let us introduce the following variables:
xi : amount of regular production in month i
yi : amount of overtime production in month i
ui : amount used from storage in month i
vi : amount put into storage in month i
wi : amount available in storage at the beginning of month i
Then in each month we need to satisfy the following constraints:

xi + yi + ui − vi = di
wi+1 = wi − ui + vi
Explanation: The first constraint expresses that the regular production,
plus the overtime production, plus what we use from storage, should satisfy
the demand (di ), after subtracting the amount that we put into storage. The
second constraint expresses that the storage content at the beginning of the
next month will result by what is available in the current month, minus what
we use from storage, plus what we add to it.
We have to include some further constraints:

• xi ≤ r (∀i)
(at most r units can be produced by regular production each month)

• w1 = 0
(there is nothing yet in storage at the beginning of the first month)

• wn+1 = 0
(there is nothing left in storage after the last month)

• ui ≤ wi (∀i)
(we cannot use more from storage than what is available there)

• All variables are non-negative.

The objective function is the total cost, summed over all months, including
regular production, overtime, and storage. To make the expression more
general, let us allow the cost coefficients vary month by month; it is expressed
by indexing them with the month index i. Thus, the objective function is:
Z = Σ_{i=1}^{n} (bi xi + ci yi + si wi).

Collecting the parts, we get the complete LP formulation:

min Z = Σ_{i=1}^{n} (bi xi + ci yi + si wi)

subject to
xi + yi + ui − vi = di (∀i)
wi+1 = wi − ui + vi (∀i)
xi ≤ r, ui ≤ wi (∀i)
w1 = 0, wn+1 = 0
xi , yi , ui , vi , wi ≥ 0 (∀i)
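Such an LP is easy to hand to an off-the-shelf solver. The following sketch solves a tiny invented instance (3 months, constant cost coefficients); SciPy's linprog is an assumed tool, not part of the lecture:

```python
# Sketch: a tiny instance of the Exercise 5 production LP.
# All data (3 months, costs, capacity) is invented for illustration.
import numpy as np
from scipy.optimize import linprog

n, r = 3, 3                              # months, regular capacity per month
d = [2, 4, 5]                            # demand per month
b_cost, c_cost, s_cost = 1.0, 2.0, 0.5   # regular, overtime, storage unit costs

# Variable vector: x_1..x_n, y_1..y_n, u_1..u_n, v_1..v_n, w_1..w_{n+1}
X, Y, U, V, W = 0, n, 2*n, 3*n, 4*n      # index offsets
nvar = 5*n + 1

cost = np.zeros(nvar)
cost[X:X+n] = b_cost
cost[Y:Y+n] = c_cost
cost[W:W+n] = s_cost                     # storage cost on w_1..w_n

A_eq, b_eq = [], []
for i in range(n):
    row = np.zeros(nvar)                 # x_i + y_i + u_i - v_i = d_i
    row[X+i] = 1; row[Y+i] = 1; row[U+i] = 1; row[V+i] = -1
    A_eq.append(row); b_eq.append(d[i])
    row = np.zeros(nvar)                 # w_{i+1} - w_i + u_i - v_i = 0
    row[W+i+1] = 1; row[W+i] = -1; row[U+i] = 1; row[V+i] = -1
    A_eq.append(row); b_eq.append(0)
for j in (W, W+n):                       # w_1 = 0 and w_{n+1} = 0
    row = np.zeros(nvar); row[j] = 1
    A_eq.append(row); b_eq.append(0)

A_ub, b_ub = [], []
for i in range(n):
    row = np.zeros(nvar); row[X+i] = 1                 # x_i <= r
    A_ub.append(row); b_ub.append(r)
    row = np.zeros(nvar); row[U+i] = 1; row[W+i] = -1  # u_i <= w_i
    A_ub.append(row); b_ub.append(0)

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(res.fun)   # minimum total cost for this instance
```

On this data the cheapest schedule pre-produces one unit in month 1 for month 2 and covers the month 3 shortfall with overtime.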

Exercise 6. (See the problem description in the previous lecture note.)

a.) Choosing notations


Let us index the links by i = 1, . . . , L, and the routes by j = 1, . . . , R. We use
the following further constants, which are assumed known (given as input):
Bi : bandwidth available on link i
cj : cost of unit bandwidth on route j
pj : profit brought by unit capacity on route j
A = [aij ]: link-route incidence matrix. Its size is L × R. The matrix entries tell which link is used by which routes:

aij = 1 if link i is on route j, and aij = 0 otherwise.

Note that this matrix is also known as input, since it is assumed that the
route system is given.
Variables: xj , j = 1, . . . , R, where xj represents the capacity assigned to route j.

b.) Finding a mathematical programming formulation


The constraint we need to satisfy for each link i is this: the summed capacity
of the routes that use the link cannot exceed the bandwidth available on the
link. This can be expressed as

ai1 x1 + ai2 x2 + . . . + aiR xR ≤ Bi

Note that while all routes occur in this expression, only those contribute to
the sum which use link i. For these we have aij = 1. For the rest, which do
not use the link, we have aij = 0, so they do not contribute to the sum.
The objective function is the ratio of the total profit vs. the total cost, summed over all routes:

Z = (Σ_{j=1}^{R} pj xj) / (Σ_{j=1}^{R} cj xj)
Thus, the mathematical programming (but not yet linear programming!) formulation that directly follows the verbal description is this:

max Z = (Σ_{j=1}^{R} pj xj) / (Σ_{j=1}^{R} cj xj)

subject to

ai1 x1 + ai2 x2 + . . . + aiR xR ≤ Bi (i = 1, . . . , L)

xj ≥ 0 (j = 1, . . . , R)

Here the objective function is nonlinear, so we want to replace it with some


linear formulation in the next step.

c.) Transforming into a linear programming task


Let us carry out this step in a slightly more general form, so that it becomes usable in other situations as well. Let x be the variable vector, and c, d be constant vectors. Further, let α, β be scalar constants, A be a constant matrix, and b a constant vector. They all have dimensions that make the operations below executable. Note that these notations are generic, not coming directly from the previous considerations.
Let us now consider the following nonlinear optimization task, in which the
objective function is the ratio of two linear functions, subject to a system
of linear constraints. The latter is in the LP standard form. Clearly, the
formulation found in b.) can be put in this format.

max Z = (cx + α) / (dx + β)

subject to
Ax = b
x ≥ 0

To avoid additional difficulties, we assume that the denominator dx + β is always positive in the feasible domain. Let us introduce a new variable t, which we call the scaling factor. If we multiply both the numerator and the denominator in the objective function by t, then the value of the ratio remains the same:

Z = (cx + α) / (dx + β) = ((cx + α)t) / ((dx + β)t) = (cxt + αt) / (dxt + βt)
Observe now that the arbitrary value of t allows it to be chosen such that the
denominator becomes 1. This can be enforced by a new constraint. Then,
the denominator being 1, it is enough to maximize the numerator. Thus, the
following task will have the same optimum as the original:

max cxt + αt

subject to
dxt + βt = 1
Ax = b
x ≥ 0

(A “sanity check” question: the usage of t assumed that t ≠ 0. How do we know that it holds? Answer: the constraint dxt + βt = 1 guarantees it, as it could not hold with t = 0. Note: it also could not hold with dx + β = 0, but we assumed that the denominator dx + β is always positive.)
The above formulation is still nonlinear, since xt is a product of two variables.
Let us introduce a new variable y by

y = xt. (1)

Substituting it in the formulation, we get

max cy + αt

subject to
dy + βt = 1
Ax = b
x ≥ 0
This is already a linear programming task, but the relationship between x, y
and t is expressed by the nonlinear formula (1). This relationship cannot be
ignored, since it represents a dependence among the variables. If, however,
we include (1) as a constraint, then the task becomes nonlinear again! We
can avoid this problem by expressing x from (1) as

x = (1/t) y

and using it in place of x. Then we get

max cy + αt

subject to
dy + βt = 1
A (1/t) y = b
(1/t) y ≥ 0
Multiplying both sides in the last two constraints by t, we get

max cy + αt

Subject to
dy + βt = 1
Ay − bt = 0
y≥0
which is already a linear programming task. This is what we wanted! ♥

Two “sanity check” questions:

1. How do we know that the inequality (1/t) y ≥ 0 indeed transforms into y ≥ 0? After all, this only holds if t > 0, which was not explicitly required.
Answer: the constraint dy + βt = 1 enforces t > 0, because it is equivalent to dxt + βt = 1, which is in turn equivalent to (dx + β)t = 1. As we assumed dx + β > 0, the equation (dx + β)t = 1 could not hold with t ≤ 0.

2. Having obtained an LP formulation, assume we solve it by some off-the-shelf software, and get some optimal solution y∗ , t∗ . But these variables come from a transformed task; they are not the original route capacity values that we were looking for! Then what will be the optimal route capacities?
Answer: the route capacities are the components of the x vector. Once y∗ , t∗ are numerically available, we can express the optimal value of x as

x∗ = (1/t∗) y∗

giving us the optimal route capacities as the components of the x∗ vector.
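The whole two-step transformation can be exercised on a toy fractional program. In the sketch below, the data, the variable names, and the use of SciPy's linprog are illustrative assumptions; y1, y2 play the role of y = xt and t is the scaling factor:

```python
# Sketch: the scaling-factor transformation on an invented toy instance.
from scipy.optimize import linprog

# Original task: max (3x1 + x2 + 1)/(x1 + 2x2 + 2)  s.t.  x1 + x2 <= 4, x >= 0.
# Transformed variables: y1 = x1*t, y2 = x2*t, and the scaling factor t.
# Objective:   max 3y1 + y2 + t        (linprog minimizes, so negate)
# Constraints: y1 + 2y2 + 2t = 1       (denominator fixed at 1)
#              y1 + y2 - 4t <= 0       (x1 + x2 <= 4, multiplied by t)
res = linprog(c=[-3, -1, -1],
              A_eq=[[1, 2, 2]], b_eq=[1],
              A_ub=[[1, 1, -4]], b_ub=[0],
              bounds=[(0, None)] * 3)

y1, y2, t = res.x
x1, x2 = y1 / t, y2 / t          # recover the original variables: x = (1/t) y
ratio = (3*x1 + x2 + 1) / (x1 + 2*x2 + 2)
print(x1, x2, ratio)
```

Note that the inequality constraint transforms the same way as the equality Ax = b in the derivation above: multiplying x1 + x2 ≤ 4 by t > 0 gives y1 + y2 − 4t ≤ 0.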

An LP Formulation Example:
Minimum Cost Flow

The Minimum Cost Flow (MCF) problem is a frequently used model. It can
be described in our context as follows.

Model

Given a network with N nodes and links among them. We would like to
transport some entity among nodes (for example, data), so that it flows
along the links. The goal is to determine the optimal flow, that is, how much
flow is put on each link, according to the conditions discussed below.

Let us review the input for the problem, the objective and constraints and
then let us formulate it as a linear programming task.

Input data:

• The link from node i to j has a given capacity Cij ≥ 0.

• Each node i is associated with a given number bi , the source rate of the
node. If bi > 0, the node is called a source, if bi < 0, the node is a sink.
If bi = 0, the node is a “transshipment” node that only forwards the
flow with no loss and no gain.

• Each link is associated with a cost factor aij ≥ 0. The value of aij is
the cost of sending unit amount of flow on link (i, j). Thus, sending xij
amount of flow on the link costs aij xij .

Remarks:
• The links are directed in this model, Cij and Cji may be different.

• If the link from i to j is missing from the network, then Cij = 0. Thus,
the capacities automatically describe the network topology, too.

Constraints:

• Capacity constraint: The flow on each link cannot exceed the capacity
of the link.

• Flow conservation: The total outgoing flow of a node minus the total
incoming flow of the node must be equal to the source rate of the node.
That is, the difference between the flow out and into the node is exactly
what the node produces or sinks. For transshipment nodes (bi = 0) the
outgoing and incoming flow amounts are equal.

Objective:

Find the amount of flow sent on each link, so that the constraints are satisfied
and the total cost of the flow is minimized.

LP Formulation

Let xij denote the flow on link (i, j). The xij are the variables we want to
determine.

Let us express the constraints:

• The flow is nonnegative (by definition):

xij ≥ 0 (∀i, j)
• Capacity constraints:
xij ≤ Cij (∀i, j)

• Flow conservation:

Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = bi (∀i)

Here the first sum is the total flow out of node i, the second sum is the total flow into node i.

The objective function is the total cost, summed for all choices of i, j:

Z = Σ_{i,j} aij xij

Thus, the LP formulation is:

min Z = Σ_{i,j} aij xij
subject to
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = bi (∀i)
xij ≤ Cij (∀i, j)
xij ≥ 0 (∀i, j)

Is this in standard form? No, but can be easily transformed into standard
form. Only the xij ≤ Cij inequalities have to be transformed into equations.
This can be done by introducing slack variables yij ≥ 0 for each, and replacing
each original inequality xij ≤ Cij by xij + yij = Cij .
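As an illustration, the sketch below solves a three-node MCF instance directly as an LP. The network data is invented, and SciPy's linprog stands in for any LP solver:

```python
# Sketch: Minimum Cost Flow on a tiny made-up network, solved as an LP.
from scipy.optimize import linprog

# Links (i, j) with capacity C and unit cost a; node 0 is a source (b=2),
# node 1 is a transshipment node (b=0), node 2 is a sink (b=-2).
links = [(0, 1), (0, 2), (1, 2)]
cap   = [2, 1, 2]
cost  = [1, 3, 1]
b     = [2, 0, -2]

# Flow conservation: sum_j x_ij - sum_k x_ki = b_i for every node i.
A_eq = [[0.0] * len(links) for _ in b]
for col, (i, j) in enumerate(links):
    A_eq[i][col] += 1.0      # flow out of i
    A_eq[j][col] -= 1.0      # flow into j

# Capacity constraints become simple variable bounds 0 <= x_ij <= C_ij.
res = linprog(cost, A_eq=A_eq, b_eq=b,
              bounds=[(0, c) for c in cap])
print(res.x, res.fun)   # optimal flows and minimum total cost
```

Here the direct link 0→2 is more expensive per unit than the two-hop path 0→1→2, so the optimal flow avoids it.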
An Application to Network Design

Consider the following network design problem. Given N nodes and a demand of transporting data from node i to node j (i, j = 1, . . . , N, i ≠ j) at a speed of bij Mbit/s.

We can build links between any pair of nodes. The cost for
unit capacity (=1 Mbit/s) on a link from node i to j is aij .
Higher capacity costs proportionally more, lower capacity costs
proportionally less.

Set aii = 0, bii = 0 for all i so we do not have to take care of the case when i = j in the formulas.

The goal is to design which links will be built and with how
much capacity, so that the given demand can be satisfied and
the overall cost is minimum.

Let us find an LP formulation of the problem!

Let zij be the capacity we implement on link (i, j). This is not
given, this is what we want to optimize. If the result is zij = 0
for some link, then that link will not be built.
With this notation the cost of link (i, j) is aij zij , so the objective function to be minimized is

Z = Σ_{i,j} aij zij

To express the constraints, let xij^(kl) be the amount of flow on link (i, j) that carries traffic from node k to l (not known in advance). Then the total traffic on link (i, j) is obtained by summing up these variables for all k, l:

Σ_{k,l} xij^(kl).

Thus, the capacity constraint for link (i, j) can be expressed as

Σ_{k,l} xij^(kl) ≤ zij .

The flow conservation should hold for each piece of flow, that is, for the flow carrying traffic between each source-destination pair k, l. To express this concisely, let us define new constants by

di^(kl) = bkl if i = k;  di^(kl) = −bkl if i = l;  di^(kl) = 0 otherwise.

The value of di^(kl) shows whether node i is a source, a sink, or a transshipment node for the k → l flow.
Thus, the flow conservation constraints (one for each node and for each flow) can be written as

Σ_j xij^(kl) − Σ_r xri^(kl) = di^(kl) (∀i, k, l)

Collecting all the pieces, the LP formulation is:

min Z = Σ_{i,j} aij zij

subject to
Σ_j xij^(kl) − Σ_r xri^(kl) = di^(kl) (∀i, k, l)
Σ_{k,l} xij^(kl) − zij ≤ 0 (∀i, j)
xij^(kl) ≥ 0 (∀i, j, k, l)
zij ≥ 0 (∀i, j)
Comments

• This task is a variant of flow problems that are often referred to as Multicommodity Flow, because several flows are handled simultaneously.

• This model contains several major simplifications. They make the problem solvable via linear programming, but at the price of distancing the model from reality. Try to list some of these simplifications!

• Having found the LP formulation, a “brute force” way to solve it would be to apply a general purpose LP algorithm. Given that the number of variables and constraints is large, it would be very slow. Below we show that there is, however, a clever way of circumventing the complexity, using the special features of the problem.

A Fast Solution

Observe that the cheapest way of sending bkl amount of flow from node k to l is to send it all along a path for which the sum of the link costs is minimum. If this path consists of nodes k = i1 , i2 , . . . , ir−1 , ir = l, then the resulting piece of cost is

bkl (ai1,i2 + . . . + air−1,ir )

Due to the linear nature of the model, these costs simply sum up, independently of each other. This suggests the following simple algorithm:

• Find a minimum cost path between each pair k, l of nodes, with edge weights aij . This can be done by any standard shortest path algorithm that you met in earlier courses.

• Set the capacity of link (i, j) to the sum of those bkl values
for which (i, j) is on the min cost path found for k, l.

• The optimum cost can be expressed explicitly. Let Ekl be the set of edges that are on the min cost k → l path. Then, according to the above, the optimal cost is:

Zopt = Σ_{k,l} bkl ( Σ_{(i,j)∈Ekl} aij )
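The steps above can be sketched in a few lines of code. The cost matrix a and the demands b below are invented, and Floyd-Warshall stands in for "any standard shortest path algorithm":

```python
# Sketch: the shortest-path-based fast solution on made-up data.
# Floyd-Warshall computes all-pairs min cost paths with path reconstruction.
N = 3
a = [[0, 1, 3],        # a[i][j]: cost of unit capacity on link (i, j)
     [1, 0, 1],
     [3, 1, 0]]
b = [[0, 2, 5],        # b[k][l]: demand from k to l
     [0, 0, 0],
     [0, 0, 0]]

# dist[i][j] = min path cost, nxt[i][j] = next hop on that min cost path
dist = [[a[i][j] for j in range(N)] for i in range(N)]
nxt = [[j for j in range(N)] for i in range(N)]
for m in range(N):                     # intermediate node in the outer loop
    for i in range(N):
        for j in range(N):
            if dist[i][m] + dist[m][j] < dist[i][j]:
                dist[i][j] = dist[i][m] + dist[m][j]
                nxt[i][j] = nxt[i][m]

# Accumulate link capacities z along each demand's min cost path
z = {}
Z_opt = 0
for k in range(N):
    for l in range(N):
        if b[k][l] > 0:
            Z_opt += b[k][l] * dist[k][l]
            i = k
            while i != l:              # walk the reconstructed path
                j = nxt[i][l]
                z[(i, j)] = z.get((i, j), 0) + b[k][l]
                i = j
print(z, Z_opt)
```

On this data the 0 → 2 demand is routed via node 1 (cost 2 instead of the direct cost 3), so link (0, 1) ends up carrying both demands.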
Solving Linear Programs

Geometric Interpretation

The constraints determine the set of feasible solutions. This is a polyhedron, the higher dimensional generalization of a 2-dimensional polygon.

Finding the maximum of a linear objective function of the form Z = cx over this polyhedron essentially means to find a vertex of the polyhedron that is the farthest in the direction determined by the vector c:
[Figure: a polygon with the direction vector c; the optimum is attained at the vertex of the polygon farthest in the direction of c.]
In case there are only two variables, this can be graphically represented and solved
in the plane.

Graphical solution: After finding the polygonal boundary of the feasible domain
D, as illustrated in the figure below, we “push” the line 3x1+3x2 = a , representing
the objective function, as far as possible, so that it still intersects D. The optimum
will be attained at a vertex of the polygon.

If, however, as is typical in applications, there are many variables, this simple graphical approach does not work; one needs more systematic methods.
Comments on LP Algorithms

Finding the optimal solution in Linear Programming takes relatively complex algorithms. Studying the details of LP algorithms is beyond the scope of this course, since in most cases the network designer can apply off-the-shelf commercial software. A lot of freeware is also available on the Internet.

Some historical points about solving linear programs:

• The first and most widely used LP algorithm has been the Simplex Method of Dantzig, published in 1951. The key idea of the method is to explore the vertices of the polyhedron, moving along edges, until an optimal vertex is reached.

• There are many variants of the Simplex Method, and they usually work fast in practice. In pathological worst cases, however, they may take exponential running time.

• It was a long standing open problem whether linear programming could be solved by a polynomial-time algorithm at all, in the worst case. The two most important discoveries in this area were the following:

– The first polynomial-time LP algorithm was published by Khachiyan in 1979. This result was considered a theoretical breakthrough, but was not very practical.
– A practically better algorithm was found by Karmarkar in 1984. This is a so-called interior point method that starts from a point in the polyhedron and proceeds towards the optimum in a step-by-step descent fashion. Later many variants, improvements and implementations were elaborated, and now it has similar practical performance as the Simplex Method, while guaranteeing polynomially bounded worst-case running time.

• It is interesting that, after more than half a century, a major problem is still open in the world of LP algorithms: does there exist an algorithm that solves the LP, such that the worst-case running time is bounded by a polynomial in terms of the number of variables and constraints only, independently of how large the numbers that occur in them are? (Counting elementary arithmetic operations as single steps.) Such an algorithm is called a strongly polynomial time algorithm. Khachiyan's and Karmarkar's methods do not have this feature.

• LP solvers of different sorts are available as commercial software, or even as freeware. Thus, the network designer typically does not have to develop his/her own LP solver. Once a problem is formulated as a linear programming task, off-the-shelf software can readily be used.
The Maximum Flow Problem

The historically first and most fundamental network flow problem is the
Maximum Flow Problem. Recall that we have already seen two other flow
models: Minimum Cost Flow and Multicommodity Flow.
Model
Given a network with N nodes and links among them. We would like to transport some entity (for example, data) from a source node to a destination node so that it flows along the links. The goal is to determine the maximum possible amount of flow that can be pushed through the network, such that link capacity constraints are obeyed and at the intermediate nodes (also called transshipment nodes) the flow is conserved, i.e., whatever flows into an intermediate node, the same amount must flow out.
Let us review the input for the problem, the objective and constraints and
then let us formulate it as a linear programming task.
Input data:

• The capacity of the i → j link is Cij ≥ 0 (i, j = 1, . . . , N ).

• There are two distinguished nodes: a source node s, and a destination node d.

Remarks:

• The links are directed in this model, Cij and Cji may be different.

• If the link from i to j is missing from the network, then Cij = 0. Thus,
the capacities automatically describe the network topology, too.

Constraints:

• Capacity constraint: The flow on each link cannot exceed the capacity
of the link.

• Flow conservation: The total outgoing flow of a node is equal to the total incoming flow of the node, except for the source and the destination.

Objective:
Find the maximum amount of flow that can be sent through the network from
the source to the destination, such that the link capacity is not exceeded on
any link and the flow conservation constraint is satisfied at every intermediate
(transshipment) node.
LP Formulation
Let xij denote the flow on link (i, j). (The values of these variables are not known in advance; we want to optimize them.)
Let us express the constraints:

• The flow is nonnegative (by definition):

xij ≥ 0 (∀i, j)

• Capacity constraints:

xij ≤ Cij (∀i, j)

• Flow conservation:

Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = 0 (∀i ≠ s, d)

Here the first sum is the total flow out of node i, the second sum is the total flow into node i.

The objective function is the total net flow out of the source:

F = Σ_{j=1}^{N} xsj − Σ_{k=1}^{N} xks

Thus, the LP formulation is:

max F = Σ_{j=1}^{N} xsj − Σ_{k=1}^{N} xks

subject to
Σ_{j=1}^{N} xij − Σ_{k=1}^{N} xki = 0 (∀i ≠ s, d)
xij ≤ Cij (∀i, j)
xij ≥ 0 (∀i, j)

Solution
Solving the Maximum Flow Problem directly as a linear program, although
possible, is not economical. There are several special algorithms which utilize
the special structure of the problem and, therefore, are faster and simpler
than a general LP solution.
The historically first and most fundamental maximum flow algorithm is due
to Ford and Fulkerson. Here we provide an informal summary on how the
Ford-Fulkerson algorithm works.
The key idea of the Ford-Fulkerson algorithm is that if we can find a so
called augmenting path, then the current flow (which may be initially 0) can
be increased without violating any constraints.
An augmenting path is any undirected path from s to d, such that on any forward edge (an edge whose direction agrees with the path traversal) the flow is strictly less than the capacity, while on any backward edge (an edge with opposite direction to the path traversal) the flow is strictly positive.
If such an augmenting path is found then we can increase the net flow out
of the source, without constraint violation, by the following operation:

• Increase the flow on every forward edge by some ε > 0 (the same ε for
each edge).

• Decrease the flow by ε on every backward edge (using the same ε as
above).

• Leave the flow unchanged on every edge not on the path.

One can directly check that with this operation we do not violate any con-
straint, assuming ε was chosen small enough, such that we do not go over
the capacity on any forward edge, and do not pull the flow into negative on
any backward edge.
While the augmenting operation keeps all constraints satisfied, the net flow
out of the source increases by ε. The reasons are:

• If the first edge on the augmenting path is a forward edge, then the
flow on this edge grows by ε, and it does not change on any other edge
adjacent to the source. As a result, ε more flow is flowing out of the
source.

• If the first edge on the augmenting path is a backward edge, then the
flow on this edge decreases by ε, and does not change on any other edge
adjacent to the source. As a result, ε less flow is flowing back into
the source. Therefore, the net flow out of the source still grows by ε.

Thus, the augmenting operation increases the net flow that leaves the source,
while keeping all constraints satisfied. This increased flow must eventually
reach the destination, since it is preserved at every intermediate node.

Remark. At first it may seem appealing to interpret the augmenting
operation as "pushing" more flow through the augmenting path.
This is, however, not correct. It may even happen that the flow is reduced on
every edge of the path, if they are all backward edges. Then how can the flow
always increase? The answer is that the augmenting operation rearranges the
flow in a way that the net flow out of the source eventually grows, but this
does not necessarily mean that the increment is simply realized along the
augmenting path.
Having introduced the augmenting path concept, there are some natural
questions that arise at this point:

1. What is a good choice for ε?

2. How do we find an augmenting path?

3. What if there is no augmenting path?

4. How many iterations (augmentations) are needed?

Let us present the answers for each:

1. The value of ε should be small enough, so that we do not overshoot the
capacity on any forward edge, and do not pull the flow into negative
on any backward edge. Let us define the residual capacity of an edge
(i, j) on the augmenting path as follows:

rij = Cij − xij   if (i, j) is a forward edge
rij = xij         if (i, j) is a backward edge

Thus, rij is the maximum flow increment allowed on the edge, if it is
a forward edge, or the maximum flow decrement, if it is a backward
edge. The reason to call it residual capacity is that this represents the
maximum remaining allowed change in the flow (increase or decrease,
depending on whether the edge is forward or backward).
Then the right value of ε is the minimum of the rij values along the
augmenting path. Why? Because we want ε as large as possible (so
that we gain the largest possible flow increment by the augmentation),
but ε must fit into each gap along the path, so that we do not violate
the capacity and non-negativity constraints.
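In code, the ε computation for a given augmenting path can be sketched as follows (a small illustration; the graph, flow values, and path are made up for the example):

```python
def residual_capacity(edge, direction, flow, cap):
    """Residual capacity r_ij of an edge on an augmenting path."""
    if direction == "forward":    # the flow can still grow up to the capacity
        return cap[edge] - flow[edge]
    else:                         # backward: the flow can shrink down to 0
        return flow[edge]

# Hypothetical augmenting path: a list of (edge, direction) steps.
cap  = {("s", "a"): 4, ("b", "a"): 3, ("b", "d"): 5}
flow = {("s", "a"): 1, ("b", "a"): 2, ("b", "d"): 2}
path = [(("s", "a"), "forward"),    # r = 4 - 1 = 3
        (("b", "a"), "backward"),   # r = 2
        (("b", "d"), "forward")]    # r = 5 - 2 = 3

eps = min(residual_capacity(e, d, flow, cap) for e, d in path)
print(eps)  # 2, the bottleneck along the path
```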

2. An augmenting path, if one exists, can be found by a shortest path com-
putation in an auxiliary graph G′, called the residual graph. This is easily
constructed from the original, using the current flow, as follows. Let
the residual graph have the same number of nodes as the original. Let
us index them such that for each original node i we assign a node de-
noted by i′ in the residual graph. Look now at each edge (i, j) in the
original graph, and assign edges between i′ and j′, as follows:

• If xij = 0, then include the edge (i′, j′) in G′.
Reason: this can be a forward edge on an augmenting path (be-
cause the flow can be increased on it), but not a backward edge
(because the flow cannot be decreased on it).

• If xij = Cij , then include the edge (j′, i′) in G′.
Reason: this can be a backward edge on an augmenting path
(because the flow can be decreased on it), but not a forward edge
(because the flow cannot be increased on it).

• If 0 < xij < Cij , then include both edges (i′, j′) and (j′, i′) in
G′. Reason: it can be either a forward or a backward edge on an
augmenting path.

With this construction it is easy to see that any directed s′ → d′ path
in G′ corresponds to an s → d augmenting path in the original graph.
Thus, we only need to find a directed s′ → d′ path in G′ (e.g., by
Dijkstra's algorithm), and then map back the path into the original
graph.
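The three construction rules can be turned into code directly (a sketch; the dictionaries `cap` and `flow` and their values are made up):

```python
def residual_graph(cap, flow):
    """Edge set of the residual graph G' built by the three rules above."""
    res = set()
    for (i, j), c in cap.items():
        x = flow[(i, j)]
        if x < c:        # flow can still grow: forward edge (i', j')
            res.add((i, j))
        if x > 0:        # flow can shrink: backward edge (j', i')
            res.add((j, i))
    return res

# Hypothetical two-edge example: one saturated edge, one partially used edge.
cap  = {("s", "a"): 2, ("a", "d"): 2}
flow = {("s", "a"): 2, ("a", "d"): 1}
print(sorted(residual_graph(cap, flow)))
# [('a', 'd'), ('a', 's'), ('d', 'a')]
```

Note that the two `if` branches together cover all three cases: a saturated edge yields only a backward edge, an unused edge only a forward edge, and a partially used edge both.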

3. Having found an augmenting path, we can increase the flow, without
violating any constraint. But what if no augmenting path exists?
Answer: in that case, the flow is necessarily maximum. This can be
proved by the following simple, but ingenious reasoning:

• Let A be the subset of nodes that are reachable from s via an
augmenting path, with respect to the current flow. The source
itself is also in A. Let B contain the remaining nodes. Observe
that d ∈ B, because we look at the situation when there is no
s → d augmenting path, so, by definition, d cannot be in A.
• Look at the edges connecting A and B.
– On any edge e = (i, j) with i ∈ A and j ∈ B, the flow must
be equal to the capacity. The reason is that, by definition, i is
reachable from s via an augmenting path. If e is not saturated,
then it could be added to this path as a forward edge, creating
an augmenting path from s to j. This is, however, impossible,
as j ∈ B, and B contains those nodes that are not reachable
from s via an augmenting path.
– On any edge e = (k, ℓ) with k ∈ B and ℓ ∈ A, the flow must be
zero. The reason is similar as above: the node ℓ, being in A,
is reachable from s via an augmenting path. If e has positive
flow, then it could be added to this path as a backward edge,
creating an augmenting path from s to k. This is, however,
impossible, as k ∈ B, and B contains those nodes that are
not reachable from s via an augmenting path.
Thus, we can conclude that the edges that go from A to B must
be all saturated, and the ones that go from B to A do not carry
any flow. This means, the current flow saturates the cut defined
by A and B: all edges from A to B carry the maximum possible
amount of flow (their capacity), and nothing is flowing back.
Observe now that every such cut is a bottleneck: we cannot push
through more flow than the capacity of the cut, which is defined
as the summed capacity of all edges that go from the source side
to the destination side.
Thus, having reached a bottleneck, the flow must be maximum,
since no more flow can be pushed through than the capacity of
any cut. In fact, the smallest bottleneck already limits how high
we can go, so once we find no more augmenting paths, the flow
has reached its maximum. As a by-product, we obtain an
important result, the famous Max Flow Min Cut Theorem:
Theorem 1 The value of the maximum flow from s to d is equal
to the minimum capacity of any cut which separates d from s.

Remark. Since no more flow can be pushed through any cut than its
capacity, therefore, the total s → d flow can never be larger than the
capacity of any s → d cut (a cut that separates d from s). That is,
for any feasible value F of the s → d flow and for any s → d cut with
capacity C
F ≤C
always holds. The nice thing is that if we maximize the left-hand side
and minimize the right-hand side, then we get precise equality. In other
words, whatever is not obviously excluded by the capacity limitations,
can actually be achieved. There is no gap here between the theoretically
possible and the practically achievable!

4. Regarding the number of iterations (augmentations) the algorithm has
to do in the worst case, it can happen that with unlucky choices for
the augmenting paths, the number of iterations can grow exponentially.
On the other hand, one can easily avoid this, as shown in the following
theorem.
Theorem 2 If among the possible augmenting paths in the residual
graph a shortest path (shortest in the number of edges) is
chosen in every iteration, then the algorithm reaches the maximum flow
after at most nm/4 iterations, where n is the number of nodes and m
is the number of edges in the graph.
Note: This variant of the Ford-Fulkerson method is often called Edmonds-
Karp algorithm.
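Putting the pieces together, the method can be sketched in plain Python, with breadth-first search used to find a shortest augmenting path in the residual graph (an illustrative teaching sketch, not a tuned implementation; the function name `max_flow` and the example network are made up):

```python
from collections import deque

def max_flow(cap, s, d):
    """Edmonds-Karp: augment along a BFS (shortest) path in the residual graph."""
    nodes = {u for e in cap for u in e}
    flow = {e: 0 for e in cap}

    def residual(u, v):
        # forward slack plus flow that could be pushed back on the reverse edge
        return cap.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)

    while True:
        # BFS for a shortest s -> d path with positive residual capacity
        parent, queue = {s: None}, deque([s])
        while queue and d not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if d not in parent:          # no augmenting path: the flow is maximum
            value = sum(flow.get((s, v), 0) - flow.get((v, s), 0) for v in nodes)
            return value, flow
        path, v = [], d              # recover the path from the BFS tree
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        eps = min(residual(u, v) for u, v in path)   # bottleneck capacity
        for u, v in path:            # augment: cancel reverse flow first
            back = min(eps, flow.get((v, u), 0))
            if back:
                flow[(v, u)] -= back
            if eps > back:
                flow[(u, v)] = flow.get((u, v), 0) + eps - back

# Hypothetical 4-node network; the cut around s shows the answer is 5.
cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "d"): 2, ("b", "d"): 3}
value, _ = max_flow(cap, "s", "d")
print(value)  # 5
```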

Yet another important result is about the integrality of the maximum flow.
Theorem 3 If all capacities are integers, then there exists a maximum flow
in which the value of the flow on each edge is an integer (and, therefore, the
total flow is also integer). Such an integer flow can also be efficiently found
by the Ford-Fulkerson (or Edmonds-Karp) algorithms.
This theorem is not hard to prove: if we start with the all-zero flow, and
all capacities are integers, then one can prove with a simple induction that
ε can always be chosen an integer, so the flow remains integer after each
augmentation.
The integrality theorem makes it possible to use maximum flow computations
for solving graph optimization problems. Key examples are the maximum
number of disjoint paths, and the maximum bipartite matching problem.
See details in the slide set entitled “Application of Maximum Flow to Other
Optimization Problems,” as well as in the class discussion.

Application of Maximum Flows
to Solve Other Optimization
Problems
Disjoint Paths
Disjoint path network: G = (V, E, s, t).
■ Directed graph (V, E), source s, sink t.
■ Two paths are edge-disjoint if they have no arc in common.

Disjoint path problem: find max number of edge-disjoint s-t paths.
■ Application: communication networks.

[Figure: example network with source s, sink t, and intermediate nodes 2–7]
Disjoint Paths
Max flow formulation: assign unit capacity to every edge.

[Figure: the same network with unit capacity assigned to every edge]

Theorem. There are k edge-disjoint paths from s to t if and only if the
max flow value is k.
Proof. ⇒
■ Suppose there are k edge-disjoint paths P1, . . . , Pk.
■ Set f(e) = 1 if e participates in some path Pi ; otherwise, set f(e) = 0.
■ Since paths are edge-disjoint, f is a flow of value k.

5
Proof. ⇐
■ Suppose max flow value is k. By integrality theorem, there exists
{0, 1} flow f of value k.
■ Consider edge (s,v) with f(s,v) = 1.
– by conservation, there exists an arc (v,w) with f(v,w) = 1
– continue until reach t, always choosing a new edge
■ Produces k (not necessarily simple) edge-disjoint paths.
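The theorem suggests a direct algorithm: run the unit-capacity max flow, where saturating an arc simply reverses it in the residual graph. A sketch (the function name and the example arcs, modeled loosely on the slide figure, are made up):

```python
def edge_disjoint_paths(arcs, s, t):
    """Max number of edge-disjoint s-t paths = max flow with unit capacities."""
    residual = set(arcs)            # every arc has capacity 1
    count = 0
    while True:
        # depth-first search for a directed s -> t path in the residual graph
        parent, stack, seen = {}, [s], {s}
        while stack:
            u = stack.pop()
            if u == t:
                break
            for (a, b) in list(residual):
                if a == u and b not in seen:
                    seen.add(b)
                    parent[b] = u
                    stack.append(b)
        if t not in seen:           # no augmenting path left
            return count
        v = t
        while v != s:               # augment: saturating a unit arc reverses it
            u = parent[v]
            residual.remove((u, v))
            residual.add((v, u))
            v = u
        count += 1

# Hypothetical network in the spirit of the slide figure: 3 parallel routes.
arcs = {("s", 2), ("s", 3), ("s", 4), (2, 5), (3, 6), (4, 7),
        (5, "t"), (6, "t"), (7, "t")}
print(edge_disjoint_paths(arcs, "s", "t"))  # 3
```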
Link Disjoint Routes
There are 3 arc-disjoint s-t paths

[Figure: network with nodes 1–12 between s and t, showing 3 arc-disjoint s-t paths]
Deleting 3 arcs disconnects s and t

[Figure: the same network with the 3 cut arcs deleted]

Let S = {s, 3, 4, 8}. The only arcs from S to T = N\S are the 3 deleted arcs.
Node disjoint paths
Two s-t paths P and P' are said to be node-
disjoint if the only nodes in common to P and P'
are s and t.

How can one determine the maximum number of
node-disjoint s-t paths?
Answer: node splitting (split each internal node v into
v_in and v_out joined by a unit-capacity arc, route all
incoming arcs into v_in and all outgoing arcs out of v_out)

Theorem. Let G = (N,A) be a network with no arc
from s to t. The maximum number of node-
disjoint paths from s to t equals the minimum
number of nodes whose removal from G
disconnects all paths from node s to node t.
There are 2 node disjoint s-t paths.

[Figure: the same network, showing 2 node-disjoint s-t paths]
Deleting 5 and 6 disconnects t from s?

[Figure: the network with nodes 5 and 6 deleted]

Let S = {s, 1, 2, 3, 4, 8} and T = {7, 9, 10, 11, 12, t}.
There is no arc directed from S to T.
Bipartite Matching
Bipartite matching.
■ Input: undirected, bipartite graph G = (L ∪ R, E).
■ M ⊆ E is a matching if each node appears in at most one edge in M.
■ Max matching: find a max cardinality matching.

[Figure: bipartite graph with L = {1,...,5}, R = {1',...,5'};
matching 1-2', 3-1', 4-5' highlighted]
[Figure: the same bipartite graph; a maximum matching 1-1', 2-2', 3-3', 4-4' highlighted]
Bipartite Matching
Max flow formulation.
■ Create directed graph G’ = (L ∪ R ∪ {s, t}, E’ ).
■ Direct all arcs from L to R, and give infinite (or unit) capacity.
■ Add source s, and unit capacity arcs from s to each node in L.
■ Add sink t, and unit capacity arcs from each node in R to t.

[Figure: the constructed graph G' with source s, unit-capacity arcs s → L,
arcs L → R of capacity ∞, and unit-capacity arcs R → t]
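This max flow formulation is equivalent to the classical augmenting-path algorithm for bipartite matching, which can be sketched directly (an illustrative sketch; the adjacency lists and node names are made up):

```python
def max_bipartite_matching(adj, left):
    """adj[u] = list of right-side neighbours of left node u."""
    match = {}                       # right node -> its matched left node

    def try_augment(u, visited):
        # Look for an augmenting path starting at the free left node u.
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                # v is free, or its current partner can be re-matched elsewhere
                if v not in match or try_augment(match[v], visited):
                    match[v] = u
                    return True
        return False

    return sum(try_augment(u, set()) for u in left)

# Hypothetical instance with left nodes 1..4 and right nodes 1'..3'.
adj = {1: ["1'", "2'"], 2: ["1'"], 3: ["2'", "3'"], 4: ["3'"]}
print(max_bipartite_matching(adj, [1, 2, 3, 4]))  # 3
```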
Graph Connectivity and Minimum Cuts

1 Basic Concepts

Often we want to find a minimum cut in the entire graph, not just between a
specified source and destination. The size of this minimum cut characterizes
the connectivity of the graph. It is important to study it, as it allows us to
decide how vulnerable the network is to link or node failures.

Exercise: Create an example where the (edge-)connectivity of the graph
is 3 times smaller than the connectivity between a given source-destination
pair.

Let us consider an undirected graph, which is also unweighted, that is, no
capacities are assigned to the edges. If it models the topology of a network, then
the size of the minimum cut tells us the minimum number of links that have
to fail to disconnect the network (assuming, of course, that it was originally
connected). This is the edge-connectivity. We can similarly define node
connectivity, as well. These connectivity values characterize the vulnerability
of the network, that is, how easy it is to disconnect it. We mainly focus on
the (simpler) edge-connectivity. Therefore, often we just call it connectivity,
for short.

Definitions

• Edge-connectivity between two nodes: λ(x, y) = minimum num-
ber of edges that need to be deleted to disconnect nodes x ≠ y.
λ(x, y) is the size of a minimum cut between x and y.

• Edge-connectivity (of the graph): λ(G) = minimum number of
edges that need to be deleted to disconnect G.

λ(G) = min_{x≠y} λ(x, y).

λ(G) is the size of a minimum cut.

• Node-connectivity of the graph: κ(G) = minimum number of
nodes (vertices) that need to be deleted, such that the remaining graph
is either disconnected, or it has no edge at all. κ(G) is the size of a
minimum vertex-cut, also called node-cut.

Naming convention: If a graph has λ(G) = k, we call it k-connected.
Similarly, if κ(G) = k, then the graph is called k-node-connected. Occasion-
ally we also use the phrase that the graph is ≥k-connected, to express that
λ(G) ≥ k.

Exercises

1. Construct a graph with κ(G) = 2 and λ(G) = 3.


2. Let δ(G) denote the minimum node degree in a graph G. A basic result
is that the following holds for every graph G:
κ(G) ≤ λ(G) ≤ δ(G). (1)
(Sometimes it is referred to as Whitney’s Theorem.) Prove it!
Solution: Regarding the minimum degree, it is clear that if we remove
all the δ(G) edges that are adjacent to a node of minimum degree, then
it gets disconnected from the rest of the graph. Thus, we obtain
λ(G) ≤ δ(G). (2)

To prove κ(G) ≤ λ(G), assume λ(G) = k (note that we must have
k ≥ 1, as we deal with connected graphs). Then the nodes of G can
be partitioned into 2 sets, say A and B, such that precisely k edges
go between the two sets. Let us call this set of k edges E, and let
e = {x, y} be one of them. For every other edge in E, if any, pick one
of its endpoints, which is different from x and y. Clearly, each edge
must have such an endpoint, otherwise it would coincide with e. Let
W be the set of the points we picked (possibly W = ∅, if k = 1).
Since the cut has k − 1 edges other than e, we have |W | ≤ k − 1 (the
reason for possibly not having exact equality is that some edges may
share an end node, and that is picked only once).
Now set C = W ∪ {x}. Clearly, |C| ≤ k, due to |W | ≤ k − 1. We claim
that C is a node-cut. Why?

• C contains at least one endpoint from each cut edge, so the re-
moval of C removes all cut edges. If there are no more edges at
all in the graph, then C satisfies the definition of a node-cut.
• If, after the removal of C, at least one edge remains, then such an
edge must be fully either in A or B, since it cannot be in the cut,
as the cut edges have all been removed. Let e′ = {u, v} be such
an edge. Pick the endpoint of the cut edge e = {x, y} that is not
in the same set as u. Let us say, this endpoint of e is y (since we
can name the nodes arbitrarily). Then u and y are separated by
C, since they are on different sides of the cut, and the removal of
C removes all edges in the cut. Therefore C is a node-cut, as it
separates (at least) two nodes.

Knowing |C| ≤ k, and that C is a node-cut, we have

κ(G) ≤ |C| ≤ k = λ(G). (3)

Inequalities (3) and (2) together imply (1) that we wanted to prove.

Comment. The graph parameters κ(G), λ(G) and δ(G) are positive
integers for any connected graph G, with at least two nodes. Fur-
thermore, as we have seen, they must always satisfy the condition
κ(G) ≤ λ(G) ≤ δ(G).
At this point it is natural to ask: is there any other condition they must
satisfy, so that a graph exists with the given parameters? Interestingly,
the (nontrivial) answer is that nothing else is imposed:
Theorem. For any three positive integers k, ℓ, d, if they satisfy

k ≤ ℓ ≤ d,

then there exists a graph G, such that

κ(G) = k, λ(G) = ℓ, δ(G) = d.
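For small graphs, all three parameters can be computed by brute force, which lets one check the inequality κ(G) ≤ λ(G) ≤ δ(G) directly (an exponential-time sketch intended only for tiny examples; the sample graph, two triangles sharing a node, is made up):

```python
from itertools import combinations

def connected(nodes, edges):
    """Check connectivity by a simple graph search."""
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for e in edges for w in e if u in e and w != u)
    return seen == set(nodes)

def edge_connectivity(nodes, edges):     # lambda(G), brute force
    for k in range(len(edges) + 1):
        for cut in combinations(edges, k):
            if not connected(nodes, [e for e in edges if e not in cut]):
                return k

def node_connectivity(nodes, edges):     # kappa(G), brute force
    for k in range(len(nodes)):
        for cut in combinations(nodes, k):
            rest = [v for v in nodes if v not in cut]
            rest_edges = [e for e in edges if not set(e) & set(cut)]
            if not rest_edges or not connected(rest, rest_edges):
                return k

# Two triangles sharing node 3: kappa = 1, lambda = 2, delta = 2.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
delta = min(sum(v in e for e in edges) for v in nodes)
kappa, lam = node_connectivity(nodes, edges), edge_connectivity(nodes, edges)
print(kappa, lam, delta)  # 1 2 2
```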

3. We want to design a network topology, represented by an undirected
graph, such that it satisfies the following conditions:

• If any 3 nodes fail, the network remains connected.
• If any 4 links fail, the network remains connected.
• Every node has at most 9 neighbors.
• The average number of neighbors the nodes have is at most 6.
• Any two nodes are reachable from each other in at most 3 hops.
• The network has exactly 30 nodes.

Solution. Let us denote the maximum degree by ∆(G), the average
degree by d(G), and the diameter by diam(G). (The diameter is defined
as the smallest integer k, such that any two nodes can reach each other
on a path of length at most k.)
Then the conditions mean that we are looking for a graph with the
following parameters:

κ(G) ≥ 4
λ(G) ≥ 5
∆(G) ≤ 9
d(G) ≤ 6
diam(G) ≤ 3
n = 30.

How to find such a graph?

[Figure: a 30-node graph satisfying all of the above requirements]
How was this graph found? Designing it by hand may be a daunting
task...

Answer: Check out the website House of Graphs.

Comment: The above exercises suggest the following general graph
synthesis problem:

Given a list of graph parameter values, find a graph with these
parameters.

Unfortunately, no efficient algorithm is known to solve this for an ar-
bitrary set of parameters.

4. Consider a connected network with undirected links. We have two
pieces of information about the network:

• The network has 37 nodes.
• It is impossible to disconnect the network by removing 2 or fewer
links.

Justify whether the following statement is true or not:

There must be a node with at least 4 links adjacent to it.

Solution: If it is impossible to disconnect the network by removing 2
links, then each node must have degree ≥ 3. Let us assume that the
degree of each node is exactly 3. Then, if we sum up the degrees, we
get 3 × 37 = 111. On the other hand, the sum of the degrees is twice
the number of edges, since each edge contributes 2 to the sum (1 at
each of its endpoints). Thus, we get 111 = 2m, where m is the number
of edges. Since m must be an integer, the equality

111 = 2m

cannot hold, as the right-hand side is even, the left-hand side is odd.
Thus, the assumption that all degrees are 3 leads to a contradiction.
Therefore, there must be a node with degree d ≠ 3. Since we cannot
disconnect the network by removing 2 or fewer links, d ≤ 2 is impossi-
ble. This yields that this node must have degree at least 4.

5. Consider a network, represented by a k-connected graph G. We want
to extend the network by adding a new node u. We decide to connect
the new node u to ℓ different old nodes v1, . . . , vℓ that we can choose
from G. What should be the value of ℓ, and how should we choose the
ℓ neighbors v1, . . . , vℓ of u, if we want to guarantee that the extended
network preserves k-connectivity?

Solution: We prove that if we choose any k different nodes for the
neighbors of u, the resulting graph G′ is always k-connected. That is,
we can use ℓ = k, and the neighbors can be chosen arbitrarily.

Proof. Assume that λ(G′) < k. Then there is a cut C containing
≤ k − 1 edges, such that their removal separates the nodes of G′ into
two sets A, B with no connection to each other. Let us call A the set
that contains u. We observe that u cannot be alone in A, since then
all its neighbors would be in B, which means all the k edges adjacent
to u would be in the cut, contradicting |C| ≤ k − 1. If, however, u
is not alone in A, then after removing u, we get a decomposition of the
original nodes into two non-empty sets (A − {u} and B), such that they
are separated by the cut C. Since |C| ≤ k − 1, this would contradict
λ(G) ≥ k. Therefore, we obtain that no cut of size ≤ k − 1 can exist
in G′, so λ(G′) ≥ k must hold.

6. Based on the idea of Exercise 5, one can also prove the following result:

Let G1, G2 be two k-connected graphs, such that their node
sets are disjoint. Connect the two graphs arbitrarily by k
new edges, to obtain a larger graph G. Then the resulting
graph G remains k-connected.

Proof. Assume indirectly that the resulting graph has λ(G) < k. Then
its nodes can be partitioned into two non-empty sets, A and B, such
that they are separated by a cut C containing at most k − 1 edges. Now
a key observation is that G1 must be either fully in A or fully in B. The
reason is that if G1 has nodes in both sets, then its nodes in A would
be separated from its nodes in B by the cut C. Since |C| ≤ k − 1, this
would contradict λ(G1) ≥ k. As the naming of the sets is irrelevant,
let us say that G1 falls fully in A.
The same argument applies to G2, as well, so it must also fall fully in
one of the sets. Since A, B are non-empty, and they contain no other
nodes than the original ones, therefore, G2 must fall fully in the other
set B. This means, the subgraph induced by A is equal to G1, and
the subgraph induced by B is equal to G2. The sets A, B, however,
are connected to each other by at most k − 1 edges, according to the
indirect assumption. This contradicts the construction, in which we
connected G1 and G2 by k edges.

This result can be used, for example, to optimally solve the following
network design problem, which would otherwise look quite hard:

Given two k-connected networks, connect them into a single
k-connected network, by adding new links. Assume that ev-
ery added new link incurs a given (possibly different) cost,
and we want to achieve the goal with minimum cost.

This could appear as a hard optimization problem, since one can choose
the connecting links in exponentially many different ways, and we have
to find the least expensive solution under the constraint that the re-
sulting larger network remains k-connected.
However, in view of the above result, we can simply choose any k links
to connect the two networks, the obtained larger network will always
be k-connected. Therefore, the solution that attains the minimum cost
is simply the one in which we select the k least expensive links.

2 Some Fundamental Theorems

A fundamental question is how many disjoint paths (routes) can be
guaranteed between any two nodes. We focus here on the link-disjoint case.
In a network topology this is important to ensure backup routes in case of
link failures. A fundamental characterization is provided by the following
theorem.

Theorem (Menger's Theorem) For any connected graph G the follow-
ing hold:

• For any two different nodes x, y the value of λ(x, y) is equal to the
maximum number of edge-disjoint paths connecting x and y.

• λ(G) is equal to the smallest value k with the property that between any
two nodes there are at least k edge-disjoint paths.

• Analogous statements hold for node connectivity, as well.

Exercises

1. Consider an optical network. Assume we know that this network cannot
be disconnected by two link failures. We want to select for any two
nodes a primary route to carry the traffic between them. Furthermore,
we also want to select two backup routes for the same node pair, such
that the backup routes are link disjoint from the primary route, and
also from each other, to provide adequate protection whenever at most
two links fail simultaneously. Is it true that such a route system can
always be selected under the given conditions?

Solution: The answer is yes. Since 2 link failures cannot disconnect
the network, we have λ(G) ≥ 3. Then, by Menger's Theorem, between
any two nodes there is a route system that contains (at least) 3 link-
disjoint routes.

2. In a network two disjoint sets of nodes, A and B, are selected, each
containing k nodes. We know that the network is k-connected. Is
it always possible to connect each node in A with a node in B, such
that the connecting routes are link-disjoint, and, moreover, they do not
share any end-nodes?

Solution: Yes. Let us add a new node u and connect it to all nodes in
A. Similarly, add another new node v and connect it to all nodes in B.
By Exercise 5 in Section 1, the extended graph remains k-connected.
Then, by Menger’s Theorem, there are k edge-disjoint paths connecting
u and v. The parts of these paths that fall in the original graph satisfy
the requirements.

Another important question for network topology design is this: what is
the maximum connectivity that can be achieved with a given number of nodes
and edges? Or, turning the question around: for a given number of nodes we
can look for the minimum number of edges that makes it possible to satisfy
a given connectivity requirement (node or edge-connectivity). Interestingly,
there is a surprisingly simple formula answering the question:

Theorem (Harary) The achievable maximum node and edge connec-
tivities that a simple graph on n nodes with m edges can have are given by

max κ(G) = max λ(G) = ⌊2m/n⌋

assuming n − 1 ≤ m ≤ n(n − 1)/2, which is necessary for a connected
graph to exist. Then the optimal connectivity can be achieved by the following
construction: if m = n − 1, then take any tree on n nodes; otherwise follow
the following steps:

• Take a circle on n nodes.

• Set δ = ⌊2m/n⌋.

• Connect any two nodes that are within distance ⌊δ/2⌋ along the circle.

• If not all nodes have degree ≥ δ, then pick a node with degree < δ,
connect it to a node that is at distance ⌊n/2⌋ away along the circle.
Repeat this until all nodes have degree δ, possibly with a single node
that can have degree δ + 1.
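The core circulant step of the construction (before the fix-up for odd degrees) can be sketched as follows (an illustrative sketch for the case where 2m/n is an even integer; the parameters n = 8 and δ = 4 are made up):

```python
def circulant(n, delta):
    """Ring of n nodes, each joined to its floor(delta/2) nearest
    neighbours on both sides (the core of the construction above)."""
    edges = set()
    for i in range(n):
        for step in range(1, delta // 2 + 1):
            edges.add(frozenset({i, (i + step) % n}))
    return edges

E = circulant(8, 4)                  # n = 8, target degree delta = 4
deg = {v: sum(v in e for e in E) for v in range(8)}
print(len(E), set(deg.values()))     # 16 edges, every node has degree 4
```

With m = 16 edges, ⌊2m/n⌋ = 32/8 = 4 = δ, matching the formula.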

Exercise

Design a minimum cost network topology on 11 nodes, if each link costs


1 unit of cost, and the network is required to have node connectivity at least
5. Justify that the solution indeed has minimum cost.

3 Algorithms

The minimum cut between two specified nodes can be obtained as a by-
product of the maximum flow computation. If, however, we want an overall
minimum cut in the whole graph, then a single maximum flow computation
does not suffice.

Interestingly, one can find an overall minimum cut directly, without using
anything about maximum flows. Below we present two algorithms for this
problem.

3.1 The Contraction Algorithm

This algorithm uses randomization, so it will guarantee the result only with
a certain probability, less than 1, but it can be made arbitrarily close to 1.

A single run of the algorithm works as follows (we will have to repeat
these runs sufficiently many times, with independent random choices).

Step 1 Pick an edge (v, w) uniformly at random among all edges from the
current graph G.

Step 2 Contract the selected edge (i.e., merge its endpoints). Keep the
arising parallel edges, but remove the self-loops, if any. The contracted
graph, denoted by G/(v, w), will be the new value of G.

Step 3 Are there still more than 2 nodes? If yes, repeat from Step 1. If no,
then STOP, the set of edges between the 2 remaining nodes form a cut.

Note that the obtained cut is not necessarily minimum. On the other
hand, as shown in the next (optional) lecture note, it will be a minimum cut
with probability at least 1/n² in a graph on n nodes. Therefore, if we repeat
the algorithm K times independently and choose the smallest of the found
cuts, say C, then the following holds:

Pr(C is not a minimum cut) ≤ (1 − 1/n²)^K

If K is chosen as K = ℓn², then the above probability will be approxi-
mately e^{−ℓ}, which is very small already for moderate values of ℓ.
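A run of the contraction algorithm, repeated K times, can be sketched as follows (an illustrative sketch; the example graph, two triangles joined by a single bridge edge, and the number of runs are made up):

```python
import random

def contract_once(arcs, n, rng):
    """One run of the contraction algorithm; returns the size of the found cut."""
    label = {v: v for v in range(n)}     # which super-node each node belongs to
    multi = list(arcs)                   # parallel edges are kept
    alive = n
    while alive > 2:
        u, v = rng.choice(multi)         # Step 1: pick an edge uniformly
        lu, lv = label[u], label[v]
        for w in label:                  # Step 2: merge the two super-nodes
            if label[w] == lv:
                label[w] = lu
        # drop the self-loops that the contraction created
        multi = [(a, b) for (a, b) in multi if label[a] != label[b]]
        alive -= 1
    return len(multi)                    # edges between the last 2 super-nodes

def contraction_min_cut(arcs, n, runs=300, seed=1):
    rng = random.Random(seed)
    return min(contract_once(arcs, n, rng) for _ in range(runs))

# Two triangles joined by a single bridge edge: the minimum cut has size 1.
arcs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(contraction_min_cut(arcs, 6))
```

With 300 independent runs on a 6-node graph, the failure probability bound above is astronomically small, so the minimum cut of size 1 is found in practice.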

An interesting by-product of this algorithm is the theorem that the number
of minimum cuts in a graph is at most n², since each arises with at least 1/n²
probability in a run and these are mutually exclusive events. Note that the
total number of different cuts may well be exponentially large, but only at
most n² of them can be minimum (actually, n² is only an upper bound, the
precise tight bound is n(n − 1)/2).

Exercise

Show that the bound n(n − 1)/2 on the number of minimum cuts is tight
(that is, it can hold with equality).

Solution: Consider a graph on n nodes which is a simple circle (ring). It
cannot be disconnected by removing a single edge, but it can be disconnected
by removing 2 edges, so the minimum cut size is 2. If we remove any 2 edges
from the graph, it gets disconnected. Therefore, every pair of edges forms a
minimum cut. Since there are n edges in a circle on n nodes, we can choose
n(n − 1)/2 different minimum cuts, which is exactly the bound on the number
of minimum cuts. Thus, in this case the bound holds with equality, which
shows that it is tight.
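The counting argument can be checked by brute force on a small ring (a sketch with n = 6, where the bound gives 6 · 5/2 = 15 minimum cuts):

```python
from itertools import combinations

n = 6
ring = [(i, (i + 1) % n) for i in range(n)]     # a simple circle on 6 nodes

def connected(edges):
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b); stack.append(b)
            elif b == u and a not in seen:
                seen.add(a); stack.append(a)
    return len(seen) == n

# Every pair of ring edges disconnects the ring, so every pair is a minimum cut.
cuts = [pair for pair in combinations(ring, 2)
        if not connected([e for e in ring if e not in pair])]
print(len(cuts))  # 15 = n(n-1)/2
```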

3.2 The Nagamochi-Ibaraki Algorithm for Minimum Cut

A clever deterministic algorithm for minimum cuts was discovered by Nag-
amochi and Ibaraki. Recall that the (edge-)connectivity of a graph G is
denoted by λ(G); and λ(x, y) denotes the connectivity between two different
nodes x, y.

Let Gxy be the graph obtained from G by contracting (merging) nodes x, y,
whether or not they are originally connected. In this operation we omit the
possibly arising self-loops (if x, y are connected in G), but keep the parallel
edges.

We can now use the following result that is proved among the exercises:

For any two nodes x, y

λ(G) = min{λ(x, y), λ(Gxy)}        (4)

Thus, if we can find two nodes x, y for which λ(x, y) can be computed easily
(without a flow computation), then the above formula yields a recursive
algorithm.

One can indeed easily find such a pair x, y of nodes. This is done via the
so-called Maximum Adjacency (MA) ordering. An MA ordering v1, . . . , vn of
the nodes is generated recursively by the following algorithm:

• Take any of the nodes for v1.

• Once v1, . . . , vi is already chosen, take a node for vi+1 that has the
maximum number of edges connecting it with the set {v1, . . . , vi}.

Nagamochi and Ibaraki proved that this ordering has the following nice
property:

Theorem In any MA ordering v1 , . . . , vn of the nodes
λ(vn−1 , vn ) = d(vn )
holds, where d(.) denotes the degree of the node.

Therefore, it is enough to create an MA ordering, as described above, and
take the last 2 nodes in this ordering for x, y. Then the connectivity between
them will be equal to the degree of the last node in the MA ordering. Then
one can apply formula (4) recursively to get the minimum cut. Here we make
use of the fact that it reduces the problem to computing λ(Gxy) and Gxy is
already a smaller graph.
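The MA ordering and the recursive use of formula (4) can be sketched together as follows (an illustrative sketch for unweighted multigraphs, with parallel edges stored as integer multiplicities; the example graph is made up):

```python
def ma_ordering(nodes, weight):
    """Maximum Adjacency ordering; weight[frozenset({u, v})] = multiplicity."""
    order = [next(iter(nodes))]
    rest = set(nodes) - {order[0]}
    attach = {v: weight.get(frozenset({order[0], v}), 0) for v in rest}
    while rest:
        v = max(rest, key=lambda x: attach[x])  # most edges into the ordered set
        order.append(v)
        rest.remove(v)
        for w in rest:
            attach[w] += weight.get(frozenset({v, w}), 0)
    return order

def min_cut(nodes, weight):
    """lambda(G) via formula (4), contracting the last two MA-ordered nodes."""
    nodes, weight = set(nodes), dict(weight)
    best = float("inf")
    while len(nodes) > 1:
        order = ma_ordering(nodes, weight)
        x, y = order[-2], order[-1]
        d_y = sum(w for e, w in weight.items() if y in e)  # = lambda(x, y)
        best = min(best, d_y)
        merged = {}                                        # contract y into x
        for e, w in weight.items():
            a, b = [x if z == y else z for z in e]
            if a != b:                                     # drop self-loops
                merged[frozenset({a, b})] = merged.get(frozenset({a, b}), 0) + w
        weight = merged
        nodes.remove(y)
    return best

# Two triangles joined by the single bridge {2, 3}: the minimum cut is 1.
links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weight = {frozenset(e): 1 for e in links}
print(min_cut(range(6), weight))  # 1
```

By the Nagamochi-Ibaraki theorem the answer is correct for any valid MA ordering, so the arbitrary tie-breaking in `max` does not affect the result.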

Exercises

1. Prove that λ(G) ≤ λ(Gxy) always holds, assuming Gxy still has at least
two nodes. In other words, merging two nodes in a graph G that has
at least 3 nodes can never decrease the edge-connectivity.

Solution: Let u denote the node obtained by merging x and y. Let C


be an edge set that forms a minimum cut in Gxy . Then |C| = λ(Gxy ),
and C cuts the node set of Gxy into two disjoint, non-empty subsets.
Let us call these sets A and B, such that the one that contains the
node u is called A. Let us now undo the merging, such that u is turned
back into the original two nodes x, y, still keeping them in the set A.
Then C will still separate A and B, so it is a cut in the resulting graph.
However, the resulting graph is just the original graph G, since undoing
the merging gives back the original graph. Thus, C remains a cut in
G, but not necessarily a minimum cut. Therefore, |C| ≥ λ(G). At the
same time, we had |C| = λ(Gxy ), yielding λ(Gxy ) ≥ λ(G), which is
what we wanted to prove.
2. Building on Exercise 1, prove
λ(G) = min{λ(x, y), λ(Gxy )}.

Solution: Let C be an edge set that forms a minimum cut of G. Then


|C| = λ(G) holds. Let A, B be the two node sets into which C cuts the
nodes of G. Now we can consider two cases:

a.) The nodes x, y are on different sides of the cut C. In this case
C is also a cut between x and y. Since C is a minimum cut in the
whole graph, there cannot be a smaller cut between x and y, so C is a
minimum cut between x and y. Therefore, in this case, λ(G) = λ(x, y)
holds.
b.) The nodes x, y are on the same side of the cut C. Then, after
merging x and y, C still remains a cut in the new graph Gxy . Since C
cannot be smaller than a minimum cut in Gxy , therefore, |C| ≥ λ(Gxy ).
We chose C, such that |C| = λ(G), yielding λ(G) ≥ λ(Gxy ). On the
other hand, we already know from Exercise 1 that λ(G) ≤ λ(Gxy ).
Therefore, in this case, the only possibility is λ(G) = λ(Gxy ).
Thus, we obtain that in case a) λ(G) = λ(x, y), while in case b) λ(G) =
λ(Gxy ). We also know that λ(G) ≤ λ(x, y) (by definition), and λ(G) ≤
λ(Gxy ) (by Exercise 1). Since precisely one of cases a), b) must occur,
therefore, by combining the above facts, we obtain that λ(G) must
be equal to the smaller of λ(x, y) and λ(Gxy ). This means, λ(G) =
min{λ(x, y), λ(Gxy )}, as desired.

3. Create a pseudo-code for the Nagamochi-Ibaraki minimum cut algo-


rithm.
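One possible answer, written as runnable Python rather than pseudo-code. The multigraph representation (adj[u][v] = number of parallel edges between u and v) and all names are my own choices, not part of the notes:

```python
from collections import defaultdict

def ma_ordering(adj, nodes):
    """Return a Maximum Adjacency ordering of `nodes`.
    adj[u][v] = number of parallel edges between u and v."""
    order = [next(iter(nodes))]          # v1 can be any node
    chosen = set(order)
    weight = defaultdict(int)            # edges joining each node to the chosen set
    for v, m in adj[order[0]].items():
        weight[v] += m
    while len(order) < len(nodes):
        # pick the unchosen node with the most edges into the chosen set
        u = max((v for v in nodes if v not in chosen), key=lambda v: weight[v])
        order.append(u)
        chosen.add(u)
        for v, m in adj[u].items():
            if v not in chosen:
                weight[v] += m
    return order

def nagamochi_ibaraki(adj):
    """Minimum cut value of a connected multigraph. Destroys its input."""
    nodes = set(adj)
    best = float('inf')
    while len(nodes) > 1:
        order = ma_ordering(adj, nodes)
        x, y = order[-2], order[-1]
        best = min(best, sum(adj[y].values()))   # by the theorem, λ(x, y) = d(y)
        for v, m in adj[y].items():              # contract y into x: keep parallel
            if v != x:                           # edges, drop the self-loop
                adj[x][v] = adj[x].get(v, 0) + m
                adj[v][x] = adj[v].get(x, 0) + m
            adj[v].pop(y, None)
        del adj[y]
        nodes.remove(y)
    return best
```

Instead of literally recursing on formula (4), the loop keeps the running minimum of d(v_n) over the successive contractions, which amounts to the same thing.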

Comments on Minimum Cut Algorithms

1. The Contraction Algorithm is much simpler, but it is randomized, so


it does not give a 100% guarantee that the found cut is indeed min-
imum. There exist de-randomized (i.e., deterministic) versions of the
Contraction Algorithm, but they are more complicated than the simple
randomized version.

2. An interesting, already mentioned, by-product of the Contraction Al-


gorithm is that any n-node graph can have at most n(n − 1)/2 = O(n^2)
different minimum cuts. Furthermore, they can all be listed by a
polynomial-time algorithm.

3. The above can be generalized to approximately minimum cuts. Let us


say that a cut is α-minimum, if the number of edges in it is within α
times the minimum, where α > 1 is a constant. Then one can prove
that the number of α-minimum cuts in any n-node graph is at most
O(n^(2α)). They can all be listed by a polynomial-time algorithm, as well.

Exercise. Given a k-connected network, create a polynomial-time algorithm


that can decide whether it can be disconnected by at most 3k isolated link
failures. (Isolated link failures: the failing links do not share any node.)

Solution (sketch): This kind of disconnection can happen precisely when


there is a cut in the graph that contains at most 3k edges, and they are all
disjoint. In other words, they form a matching, so such a cut is called a
matching cut.

Generally, the problem whether a graph contains a matching cut is known


to be NP-complete. We, however, want to decide only if there is a matching
cut of size ≤ 3k, where k = λ(G).

By the above discussion, all α-minimum cuts can be listed in polynomial


time, and their number is bounded by O(n^(2α)). In our case α = 3, so we can
list all 3-minimum cuts; there will be at most O(n^6) of them. We can directly
check for each one whether it is a matching cut, i.e., whether it consists only
of disjoint edges. If at least one of them does, then ≤ 3k isolated link
failures can disconnect the network, otherwise not.

Finding the Most Connected Subgraph,
and Other Clusters in Graphs

The parameter λ(G) addresses a bottleneck in connectivity, in the sense


that a minimum cut is the least resistant part of the network; it is where
the network is the easiest to disconnect. We may, however, also look for the
other end of the spectrum: it is also of interest to find the most connected
part, that is, a subgraph that has the highest connectivity.
The task of finding highly connected parts of the network is motivated by
several applications:

• Social network analysis. A part of a social network with high connec-


tivity is likely to represent a community within the social network. For
example, people connected to each other by some common interest or
affiliation.

• Network vulnerability analysis. When analyzing network vulnerability,


it is useful to know not only the most vulnerable parts, but also the
least vulnerable ones.

Regarding the most connected subgraph, a first question that comes to


mind is: what is the relationship between the connectivity of a graph and
that of its subgraphs? Specifically, if G is a graph, and G′ is a subgraph of
it, then how do λ(G) and λ(G′) relate to each other? Which of the following
relationships can occur:

λ(G′) < λ(G), λ(G′) > λ(G), λ(G′) = λ(G)?

The example graph on the next page shows that all the three cases can occur.
In the example graph we can observe:

• The connectivity of the entire graph is λ(G) = 2.

• The subgraph induced by nodes 8 and 9 has connectivity 1, which is


smaller than λ(G). (In fact, any single edge would do as a 1-connected
subgraph.)

• The subgraph induced by nodes 1, 2, 3, 4, 10 has connectivity 4, which


is larger than λ(G).

• The subgraph induced by nodes 1, 6, 7 has connectivity 2, which is


equal to λ(G).

Let us now consider the following task: assuming we have a subroutine


that can find a minimum cut, develop an algorithm that finds a subgraph
with maximum connectivity.

Principle of the algorithm

• Find a minimum cut in the original graph G, let its size be denoted by
k = λ(G).

• Let A, B be the two node sets into which the minimum cut divides
the graph. A key observation is that if there is any subgraph G′ with
λ(G′) > k, then it must be either fully in A or fully in B. Why?
The reason is that if G′ had nodes on both sides, then the found cut
of size k would separate these nodes, too. G′, however, cannot be
separated by k edges, due to λ(G′) > k. Thus, if such a G′ exists, then
it must be either fully in A or fully in B.

• Let G1 , G2 denote the most connected subgraphs within the sets A, B,


respectively. We can find them by recursively calling the algorithm for
the smaller graphs induced by A and B.
• Return the graph that has the highest connectivity among the three
graphs G, G1 , and G2 .
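The principle above can be sketched in Python as follows. For brevity, the min-cut subroutine here is a brute-force enumeration over node subsets (exponential, so only for tiny demonstration graphs); in a real implementation it would be replaced by, e.g., the Nagamochi-Ibaraki algorithm. All names are my own:

```python
from itertools import combinations

def cut_size(edges, A):
    """Number of edges with exactly one endpoint in the node set A."""
    return sum((u in A) != (v in A) for u, v in edges)

def min_cut(nodes, edges):
    """Brute-force global minimum cut; returns (size, A, B).
    Exponential in |nodes| -- a stand-in for a real min-cut routine."""
    nodes = list(nodes)
    best = None
    for r in range(1, len(nodes)):
        for subset in combinations(nodes, r):
            A = set(subset)
            c = cut_size(edges, A)
            if best is None or c < best[0]:
                best = (c, A, set(nodes) - A)
    return best

def most_connected_subgraph(nodes, edges):
    """Return (connectivity, node set) of an induced subgraph of maximum
    edge-connectivity, following the recursive principle above."""
    nodes = set(nodes)
    if len(nodes) == 1:
        return 0, nodes
    k, A, B = min_cut(nodes, edges)
    best = (k, nodes)                    # candidate: the whole graph, with λ = k
    for part in (A, B):                  # any better subgraph lies fully in A or B
        sub_edges = [(u, v) for u, v in edges if u in part and v in part]
        cand = most_connected_subgraph(part, sub_edges)
        if cand[0] > best[0]:
            best = cand
    return best
```

For example, on a K4 with a pendant node attached, the whole graph has λ = 1, but the routine recurses into the two sides of the minimum cut and returns the K4 with connectivity 3.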

Exercise. The subgraph with maximum connectivity may not be unique.


How would you modify the above algorithm if you want to find a most con-
nected subgraph which also has the maximum number of nodes among the
possibly multiple subgraphs of maximum connectivity?

Analysis: at most how many minimum cut computations are needed?


The graph is repeatedly split into two parts by a min cut computation,
and then the algorithm is recursively called for each part. This may give the
impression that in the worst case we may make an exponentially high number
of min cut computations, since we keep splitting each graph, and therefore,
keep doubling the number of parts. This impression, however, is incorrect.
Let f (n) denote the number of min cut computations we carry out when
the recursive algorithm is run on an n-node graph.
Claim: f (n) ≤ n − 1.
Proof. Use induction. The base case n = 1 is trivial, since f (1) = 0,
as no min cut computation is needed for n = 1. In the recursive step we
partition the graph into two parts, let the number of nodes in them be k
and n − k. We do not know the value of k in advance, only that it is an
integer between 1 and n − 1. Nevertheless, by the induction hypothesis, we
can assume f (k) ≤ k − 1 and f (n − k) ≤ n − k − 1. The total number of
min cut computations we make for n nodes is at most the sum of f (k) and
f (n − k), plus one more to generate the split. Therefore,

f (n) ≤ f (k) + f (n − k) + 1 ≤ (k − 1) + (n − k − 1) + 1 = n − 1.

Observe that this holds regardless of the value of k, so it indeed completes


the inductive proof.
Algorithm to find the k-core

1. Delete all nodes of degree < k.

2. If the remaining graph is empty, then output the message “the k-core
is empty” and halt.

3. If the remaining graph is non-empty, then

• If every node in the remaining graph has degree ≥ k, then this is


the k-core; output this graph and halt.
• Else repeat from Step 1.
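A direct Python transcription of the algorithm above (the adjacency-set representation and the function name are my own choices):

```python
def k_core(adj, k):
    """Node set of the k-core of a simple graph; empty set if the k-core is empty.
    adj maps each node to the set of its neighbors."""
    alive = set(adj)
    changed = True
    while changed:                       # Step 1: delete nodes of degree < k ...
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:  # degree counted in the remaining graph
                alive.remove(v)
                changed = True           # ... and repeat, since deletions may
    return alive                         #     create new nodes of degree < k
```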

Exercises

1. In the last line, why do we jump back to Step 1? After all, when Step
1 was previously executed, it deleted all nodes of degree < k. Why is
this done again?

Answer: When the nodes of degree < k are deleted, this may create
new nodes of degree < k among the remaining ones. For example,
assume node u has degree k − 1 and it is a neighbor of another node
v, which has degree k. Then u is deleted in the first round, because its
degree is < k, but v is not deleted, as its degree originally is not < k.
But after deleting u, the degree of v becomes k − 1, so we created a
new node of degree < k.

2. It is clear that if the algorithm outputs a graph, then every degree is


≥ k in the output graph (as otherwise we would keep deleting nodes of
degree < k). But why is it true that the algorithm finds the largest such
subgraph? Could it be that possibly with some tricky different order
of deleting nodes we might be able to find a larger such subgraph?
Answer: Let V denote the vertex set of the original graph, and let A
be the set of nodes with degree < k in the original graph. Then we can
argue:

• If A = ∅, then the k-core is the entire graph, which is clearly the


largest possible such subgraph.
• If A ≠ ∅, then all nodes of A must be removed, since no node can
stay with degree < k, and subsequent removals cannot increase
the degree. Thus, we have no choice but to remove all nodes of
A, as they cannot stay in the output.
• If, after removing A, in the remaining graph new nodes arise with
degree < k, then they must be removed again.
• Continuing this way, in each phase we only remove nodes that
must be removed, as they cannot stay in the output. Therefore,
we remove the minimum number of nodes, only those that must
be removed, with no other choice. Thus, the graph that remains
at the end of the algorithm, if not empty, must be indeed the
largest with the property.

3. Is it true that every graph has a unique k-core?

Answer: Yes. Assume indirectly that there are two sets, A and B,
with A ≠ B, that both satisfy the definition of the k-core. Then each
node in A has at least k neighbors in A. The same holds for B. Then
A ∪ B also has this property. However, if A ≠ B, then at least one
of them cannot be the largest such subset of nodes, as A ∪ B also
has the property, and for any two different sets |A ∪ B| > min{|A|, |B|}
holds. Therefore, at least one of A, B violates the definition of a k-core,
contradicting the initial assumption.
Clusters based on density

The density of a graph can be defined in many different ways. A simple


definition is this:

Definition: The edge density (or simply, the density) of a graph is the ratio
of the number of edges to the number of nodes.

Example: The graph used to illustrate the k-core concept (see 2 pages back)
has n = 29 nodes and m = 36 edges. Therefore, its density is

m/n = 36/29 ≈ 1.241

On the other hand, some of its subgraphs have higher density. For example,
the subgraph spanned by the red nodes has 5 vertices and 8 edges, so its
density is 8/5 = 1.6, which is larger. The subgraph induced by the red and
green nodes together has 17 vertices and 25 edges, so its density is 25/17 ≈
1.47, which is smaller than 1.6, but still larger than 1.241.

Since a dense subgraph is likely to represent a cluster, the above consid-


erations lead to the following concept: as a cluster, find a subgraph that has
the highest density. We have seen in the above example that this densest
subgraph may possibly be a proper subgraph, not the entire graph. This
naturally leads to the following:

Algorithmic problem: Given a graph, find a subgraph with maximum edge-


density.
It is not obvious at all that this can be done by an efficient (polynomial-
time) algorithm, but it is doable. In fact, it can be reduced to bipartite
matching (which, in turn, is reducible to maximum flow). The details, how-
ever, are somewhat complicated, so we refer to the next (optional) lecture
note. It is a paper that presents an efficient solution not only to this task, but
also a solution to find densest subgraphs according to a much more general
density concept, which contains many others as special cases, including the
edge-density.

Exercise: Can we express the edge-density by the node degrees?

Answer: Yes. If a graph has m edges, n nodes, and the degree of the ith
node is di, then we can write:

2m = d1 + d2 + · · · + dn.

The reason is that each edge contributes 2 to the sum of the degrees.
Then we have for the edge-density:

m/n = (1/2) · (d1 + d2 + · · · + dn)/n.

However, the expression (d1 + d2 + · · · + dn)/n is nothing else but the average
degree. Therefore, the edge-density is just half of the average degree. Thus,
finding a subgraph with maximum edge-density is equivalent to finding a
subgraph with the highest average degree.
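The equivalence can be checked numerically on a small example (8 edges on 5 nodes; the helper names are my own):

```python
from collections import Counter

def edge_density(nodes, edges):
    """Edge density m/n: number of edges divided by number of nodes."""
    return len(edges) / len(nodes)

def average_degree(nodes, edges):
    """Average degree, computed from the degree sequence."""
    deg = Counter()
    for u, v in edges:                  # each edge contributes 2 to the degree sum
        deg[u] += 1
        deg[v] += 1
    return sum(deg[v] for v in nodes) / len(nodes)
```

On any graph, edge_density equals average_degree divided by 2, matching the derivation above.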
Sample Questions for the CS 6385 Midterm Exam

Contains 7 pages. The correct answer to each question is provided at the bottom of its page.

EXAM RULES:

1. Exactly one answer is correct for each multiple choice question. Encircle the number
of the chosen answer. You have two options for each multiple choice question, which
allows you to get partial credit even for multiple choice:

(a) Select one answer. If you selected the correct one, then it is worth 1 point, other-
wise 0.
(b) You may select 2 answers. If the correct one is among them, then it is worth 1/2
point, otherwise 0. This allows partial credit for multiple choice questions. Note
that by selecting 2 answers you may double your chance to hit the correct one,
but at the price of getting only half of the points for this question.

2. Your choice of the answer(s) should be clear and unambiguous. If you feel you have to
revisit a question and must change some of your choices during the exam, you can still
do it, given that you mark clearly and unambiguously your final choices. If ambiguity
remains for a question regarding your final choice(s), then it will be counted as an
unanswered question.

3. The instructor cannot give any hint during the exam about the answers. Therefore,
please refrain from asking questions that could lead to such hints (even remotely),
because such questions cannot be answered.

4. It is an open-notes exam, but no other help is allowed. In particular, no device can be


used that has a communicating capability (laptop, cellphone, etc). The work has to
be fully individual.

MULTIPLE CHOICE QUESTIONS

2 Assume the variables x, y occur in a linear programming task. We would like to add the
new constraint |2x| + |3y| ≤ 3 to the LP. Which of the following formulations does it
correctly, given that we must keep the formulation linear:

1. if (x ≥ 0 and y ≥ 0) then 2x + 3y ≤ 3 else −2x − 3y ≤ 3


2. 2x + 3y ≤ 3, −2x − 3y ≤ 3
3. x + y ≤ 3, −x + y ≤ 3, x − y ≤ 3, −x − y ≤ 3
4. 2x + 3y ≤ 3, −2x + 3y ≤ 3, 2x − 3y ≤ 3, −2x − 3y ≤ 3
5. x ≤ 3, y ≤ 3, 2x + 3y ≤ 3
6. x ≤ 3, y ≤ 3, 2x + 3y ≤ 3, −2x − 3y ≤ 3

Correct answer: 4.

3 Consider the maximum flow problem from a source node s to a destination node t in a
directed graph. Assume we were able to find a flow that pushes a total of 10 units of
flow through the graph from s to t. (But it may not be a maximum flow.) Which of
the following is correct:

1. If originally each edge has capacity at most 5 and we increase the capacity of each
edge by 1, then the maximum s-t flow will necessarily be at least 12.
2. If we remove all edges from the graph on which the found flow is 0, then the
minimum cut capacity between s and t in the new graph necessarily remains the
same as in the original graph.
3. If there is a cut in the graph that separates s and t, and its capacity is 11, then
the found flow was not maximum.
4. If the found flow is maximum and all edge capacities are integers, then there must
be at least 10 edges that are saturated by the flow, that is, they are filled up to
their capacities.

Correct answer: 1.

5 Assume a large network with undirected links has minimum cut size 4, that is, it cannot
be disconnected by removing less than 4 links, but it can be disconnected by removing
4 links. We would like to disconnect the network in the following strong sense: we
remove enough links so that the network falls apart into at least 3 components. In
other words, the disconnected network cannot be made connected again by adding a
single link. To achieve this strong disconnection, which of the following is true:

1. It is always enough to remove the links of a minimum cut.


2. It is always enough to remove the links of a minimum cut plus one additional link.
3. It is always enough to remove a number of links that is at most twice the size of a
minimum cut.
4. We may have to remove a number of links that can be arbitrarily higher than the
size of a minimum cut.

Correct answer: 4.

PROBLEM SOLVING

9 Consider a network with directed links, and each link in this network has 1 Mb/s capacity.
Assume that from a source node s to a terminal node t the network can carry a flow of
20 Mb/s, but not more. Let us call a link critical if it has the following property: if the
link is removed, then the network cannot carry the 20 Mb/s flow from s to t anymore.

a) Is it possible that this network contains less than 20 critical links? Justify your
answer.

Answer: No, it is not possible. Let us consider a maximum flow, which has
value 20 Mb/s, according to the conditions. If we look at a minimum cut which
separates s and t, then, due to the max flow min cut theorem, this cut must have
capacity 20 Mb/s. Since each link has 1 Mb/s capacity, there must be 20 links in
the cut. If we remove any of these links, the capacity of the cut goes down to 19
Mb/s, so it is not possible to send 20 Mb/s flow anymore. Thus, each link in the
minimum cut is critical and there are 20 of them.

b) Is it possible that this network contains more than 20 critical links? Justify your
answer.

Answer: Yes, it is possible. For example, let the network have 3 nodes: s, a, t.
Define the links in the following way: connect s to a by 20 directed links, each
of capacity 1 Mb/s, and also connect a to t by 20 directed links, each of capacity
1 Mb/s. Then the maximum flow from s to t is 20 Mb/s. The links between s
and a form a minimum cut in which each link is critical. The same is true for the
links that connect a to t. Thus, this network satisfies the conditions and it has
40 critical links.

9 * Consider an undirected graph, and let λ(a, b) denote the edge connectivity between
any two different nodes a, b.
Let x, y, z be three distinct nodes in the graph. Does the inequality

λ(x, z) ≥ min{λ(x, y), λ(y, z)}


always hold? Justify your answer!

Answer: Yes. Let λ(x, z) = k. Then x and z can be separated by removing k


edges. But this cut must also separate at least one of the source-destination pairs
(x, y), or (y, z). The reason is this: if none of them is separated by this cut, then after
removing the cut, a path would remain connecting x, y, and another path connecting
y, z. Together they would provide a connection between x and z, contradicting the
fact that we removed a cut between (x, z). Thus, at least one of the pairs (x, y) and
(y, z) must be separated by these k edges, so at least one of λ(x, y) and λ(y, z) must
be ≤ k. This is equivalent to min{λ(x, y), λ(y, z)} ≤ k, which proves the statement,
since k = λ(x, z).

Comment. One may ask here: could it be true that the inequality

λ(x, z) ≥ min{λ(x, y), λ(y, z)}

always holds with equality?


The answer is no. In some cases the inequality may be strict. Consider the following
example. Let the graph contain 5 nodes: x, y, z, a, b. Connect x with a, b, y and connect
z also with a, b, y. Then λ(x, y) = 2, since removing the (x, y) and (y, z) edges separates
x and y. Similarly, the removal of the same two edges separates y and z, so λ(y, z) = 2.
(Draw a figure to see it.) On the other hand, λ(x, z) = 3, since there are 3 edge-disjoint
paths between x and z, namely, x − a − z, x − b − z and x − y − z.

Basic Reliability Configurations

Assumptions on the reliability model:

• Each component has two possible states: operational or failed.

• The failure of each component is an independent event.

• Component i is functioning (operational) with probability pi and is


non-operational (failed) with probability 1 − pi. (These probabilities are
usually known.)

• The reliability R of the system is some function of the component relia-


bilities:
R = f (p1 , p2 , . . . , pN )
where N is the number of components.

The function f (...) above depends on the configuration, which defines when
the system is considered operational, given the states of the components.
Basic examples are shown in the configurations discussed below.

Series Configuration

In the series configuration the system is operational if and only if all com-
ponents are functioning. This can be schematically represented by the figure
below, in which the system is considered operational if there is an operational
path between the two endpoints, that is, all components are functioning:

o— p1 —— p2 —— p3 ——.........—— pN —o
The reliability of the series configuration is computed simply as the product
of the component reliabilities:

Rseries = p1 p2 . . . pN

Note: If many components are connected in series, then the reliability may
be much lower than the individual reliabilities. For example, if p = 0.98
and N = 10, then Rseries = (0.98)^10 ≈ 0.82, significantly lower than the
individual reliabilities.

Parallel Configuration

The parallel configuration is defined to be operational if at least one of the
components is functioning. This is schematically represented in the figure below:

        — p1 —
          ..
   o—     ..     —o
          ..
        — pN —

The reliability can be computed as follows. The probability that component
i fails is 1 − pi. The probability that all components fail is
(1 − p1)(1 − p2) . . . (1 − pN). The complement of this event is that not all
components fail, that is, at least one of them works:

Rparallel = 1 − (1 − p1)(1 − p2) . . . (1 − pN)
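Both formulas translate directly to code (the function names are my own):

```python
from math import prod

def r_series(ps):
    """Series configuration: the system works iff every component works."""
    return prod(ps)

def r_parallel(ps):
    """Parallel configuration: the system works iff at least one component works."""
    return 1 - prod(1 - p for p in ps)
```

For instance, r_series([0.98] * 10) reproduces the ≈ 0.82 figure from the series example above.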
k out of N Configuration

In this configuration the system is considered functional if at least k compo-


nents out of the total of N are functioning.

The probability that a given set of k components is functioning (and the
remaining N − k components have failed) is

p^k (1 − p)^(N−k).

The probability that some set of exactly k components is functioning is

C(N, k) p^k (1 − p)^(N−k)

where the binomial coefficient

C(N, k) = N! / (k!(N − k)!)

represents the number of ways one can choose a k-element set out of N.

Finally, since we need at least k operational components, we have to sum
up the above for all acceptable numbers of functioning components, i = k,
k + 1, . . . , N. This gives the reliability of the k out of N configuration:

         N
Rk/N =   Σ   C(N, i) p^i (1 − p)^(N−i)
        i=k
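A direct transcription of the formula, for identical component reliabilities (the function name is my own). As a sanity check, k = 1 recovers the parallel formula and k = N the series formula:

```python
from math import comb

def r_k_of_n(k, n, p):
    """Reliability of a k-out-of-N configuration with identical components:
    sum of C(n, i) p^i (1-p)^(n-i) over i = k, ..., n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```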
Exercise
Comparison of the series-parallel and parallel-series configurations:

Assume N = M , that is, we have the same number of components in each


column and row. Also assume that each component has the same reliability
p with 0 < p < 1. Which of the two configurations tends to be more reliable
if N grows very large (N → ∞)?

Solution

With N = M we obtain that the series-parallel configuration has reliability

Rs−p = [1 − (1 − p)^N]^N,

and the parallel-series has reliability

Rp−s = 1 − (1 − p^N)^N

Which of these similarly looking expressions is larger if N tends to infinity?

We can answer the question using the following known formula from calculus:

lim_{x→0} (1 − x)^(1/x) = e^(−1)

where e = 2.71... is the base of the natural logarithm. We use it for approx-
imation: when x is very small, then

(1 − x)^(1/x) ≈ e^(−1)

holds.

For Rs−p let us take x = (1 − p)^N. Then we have x → 0 if N → ∞. We can


then write

Rs−p = (1 − x)^N = (1 − x)^((1/x)Nx) = [(1 − x)^(1/x)]^(Nx) ≈ e^(−Nx),

using (1 − x)^(1/x) ≈ e^(−1) in the last step. Since Nx = N(1 − p)^N, therefore, we
have Nx → 0, as the second factor tends to 0 exponentially, while the first
grows only linearly. Thus, we obtain that

Rs−p → 1 (1)

as N grows.

For Rp−s we take x = p^N. Then, similarly to the above, we have x → 0 if
N → ∞, so we can write

Rp−s = 1 − (1 − x)^N = 1 − (1 − x)^((1/x)Nx) = 1 − [(1 − x)^(1/x)]^(Nx) ≈ 1 − e^(−Nx).

Since now Nx = N p^N, therefore, we have again Nx → 0, as the second
factor tends to 0 exponentially, while the first grows only linearly. Thus, we
obtain that e^(−Nx) → 0, implying

Rp−s → 0 (2)

Comparing (1) and (2), we can conclude that with N → ∞ the reliability of
the series-parallel configuration tends to 1, while the reliability of the parallel-
series configuration tends to 0. Thus, the series-parallel configuration will be
more reliable when N grows very large.
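The limiting behavior is easy to observe numerically; the value p = 0.7 below is an arbitrary illustrative choice of mine:

```python
def r_sp(n, p):
    """Series of n parallel banks, each of n components: [1 - (1-p)^n]^n."""
    return (1 - (1 - p)**n)**n

def r_ps(n, p):
    """Parallel of n series chains, each of n components: 1 - (1 - p^n)^n."""
    return 1 - (1 - p**n)**n

# With p = 0.7, r_sp(n, p) climbs toward 1 while r_ps(n, p) drops toward 0
# as n grows, matching limits (1) and (2).
```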

Interpretation

It is interesting that the two apparently similar configurations behave so


differently. The series-parallel tends to be more reliable as its size grows,
while the parallel-series loses reliability with growing size. How could we
informally explain this difference?

The series-parallel configuration has many more possible paths between the
endpoints. We can pick any component from each parallel subnetwork to
build a path, yielding N^N possible paths. On the other hand, the parallel-
series configuration has only N possible paths between the endpoints, as each
series subnetwork can serve as such a path, but there is no other possibility.

Similarly, if we consider minimum cuts, that is, minimal subsets of compo-


nents whose failure disconnects the network, then we find that the situation is
reversed. The series-parallel configuration has only N possible minimum cuts
(taking an entire parallel subnetwork), while the parallel-series configuration
has N^N minimum cuts (taking one component from each series subnetwork).

The above considerations show that the series-parallel configuration is much


more connected, i.e., it has many more paths and far fewer cuts than the
other. Thus, it is not surprising that it becomes more reliable as it grows.
[Unrecoverable figure and text: the example configuration with components
A, B, C, D, E that is referred to below as "the example on page 1".]


Method of minimal paths

A minimal path is a set P of components with the following properties:

Property 1 If all components in P are functioning, then the whole system


is operational, independently of the status of the other components.

Property 2 P is minimal with respect to Property 1, that is, no proper


subset of P has Property 1.

Remark: Do not confuse this concept with the well-known concept of shortest
path. A minimal path in the reliability context does not have to be a shortest
path. (In fact, it does not even have to be a path, it can be an arbitrary
subset of components with the said properties.)

Example: In the example on page 1 the minimal paths are A-D, B-E, A-C-E
and B-C-D. The latter two are not shortest paths.

How can we compute the reliability using minimal paths? Observe that if
any one of the minimal paths is fully functional, then the system is already
operational, by the definition of the minimal path. Therefore, if we can list
all minimal paths in the system, then we can regard them as units connected
in parallel.

Let us use this approach in our example configuration. For simplicity, let the
reliability of each component be denoted by the same letter as the name of
the component. Then the reliability of the 4 minimal paths listed above will
be AD, BE, ACE, BCD, respectively. Then we can form a parallel configu-
ration from them, thus obtaining:

R = 1 − (1 − AD)(1 − BE)(1 − ACE)(1 − BCD).
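The approximation is a one-liner in code. Below, setting every component reliability to 0.9 is an arbitrary illustrative choice of mine:

```python
from math import prod

def r_minimal_paths(paths, rel):
    """Approximate system reliability: treat the minimal paths as independent
    units connected in parallel (an approximation, since paths may share
    components)."""
    return 1 - prod(1 - prod(rel[c] for c in p) for p in paths)

# The four minimal paths of the example, all component reliabilities 0.9:
rel = {c: 0.9 for c in "ABCDE"}
paths = [["A", "D"], ["B", "E"], ["A", "C", "E"], ["B", "C", "D"]]
approx = r_minimal_paths(paths, rel)
```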


Question: Does this method guarantee an exact result?

Answer: No, it is only an approximation. The reason is that when we com-


pute the reliability of the parallel configuration consisting of the minimal
paths, we regard the paths independent. They may not be independent,
however, since they can share components.

Question: What is the advantage of this method?

Answer: The number of minimal paths may be much smaller than the number
of all states, so the method is more scalable than the exhaustive enumeration
of all states.

Method of minimal cuts

This method is similar in spirit to the previous one, but considers cuts rather
than paths.

A minimal cut is a set C of components with the following properties:

Property 1 If all components in C fail, then the whole system fails, inde-
pendently of the status of the other components.

Property 2 C is minimal with respect to Property 1, that is, no proper


subset of C has Property 1.

Remark: Observe the duality between the definition of the minimal path
and minimal cut. If we replace “functioning/operational” in the definiton of
minimal path by “fail”, then we obtain the definition of minimal cut.

Example: In our example on page 1 the minimal cuts are A-B, D-E, A-C-E
and B-C-D.
To compute the system reliability, observe that if any one of the minimal cuts
fully fails, then the system must fail, independently of the rest. Therefore,
all minimal cuts must be operational for the system to work. Thus, we can
consider the minimal cuts as if they were connected in series.

The probability that a given cut, say A-B, works is 1 − (1 − A)(1 − B), since
the probability that all components fail in the cut is (1 − A)(1 − B), so the
complement 1 − (1 − A)(1 − B) means the cut is operational, i.e., not all
components fail in it. Since we conceptually consider the cuts as connected
in series with each other, therefore, we obtain

R = [1 − (1 − A)(1 − B)][1 − (1 − D)(1 − E)] ·


·[1 − (1 − A)(1 − C)(1 − E)][1 − (1 − B)(1 − C)(1 − D)].
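This approximation can be coded just like the minimal-path one; again, the component reliabilities of 0.9 are an arbitrary illustrative choice of mine:

```python
from math import prod

def r_minimal_cuts(cuts, rel):
    """Approximate system reliability: treat the minimal cuts as independent
    units connected in series (an approximation, since cuts may share
    components)."""
    return prod(1 - prod(1 - rel[c] for c in cut) for cut in cuts)

# The four minimal cuts of the example, all component reliabilities 0.9:
rel = {c: 0.9 for c in "ABCDE"}
cuts = [["A", "B"], ["D", "E"], ["A", "C", "E"], ["B", "C", "D"]]
approx = r_minimal_cuts(cuts, rel)
```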

Note: The same comments apply here as for the method of minimal paths:
it is only an approximation and its advantage is that it tends to be more
scalable than the exhaustive enumeration of all states.

Question: Which method is more advantageous to apply, the minimal paths
or the minimal cuts?

Answer: It depends on the configuration. If there are many minimal paths
but fewer minimal cuts, then the method of minimal cuts is less complex.
On the other hand, if there are fewer paths but many cuts, then the method
of minimal paths is easier. The previously considered series-parallel and
parallel-series configurations show that the number of paths and the number
of cuts in a configuration can be very different.
Reliability and Lifetimes

In the previous sections a high level of abstraction was used that ignored the
time dependence of reliability. Now we look into some fundamental quantities
that describe the temporal behavior of component reliability.

Basic Probabilistic Model

Let T be the failure time when the considered component breaks down. Nat-
urally, T is a random variable.

A fundamental characteristics of any random variable is its probability dis-


tribution function (sometimes also called cumulative distribution function).
For the failure time it is defined as

F (t) = Pr(T < t).

In other words, F(t) is the probability that the breakdown occurs before a
given time t. Naturally, the value of this probability depends on the given
time t; this dependence is described by F(t).

Let us define now further important functions that characterize how the
reliability of a component behaves in time.

Lifetime measures

• Survivor function (also called reliability function)


The survivor function is defined by

S(t) = 1 − F (t)

where F (t) is the probability distribution function of the failure time.
The meaning of the survivor function directly follows from the defini-
tion of F (t):
S(t) = 1 − Pr(T < t) = Pr(T ≥ t).
In other words, S(t) is the probability that the component is still op-
erational at time t, since T ≥ t means the component has not failed up
to time t.

• Probability density function (pdf)


If F(t) is differentiable, then its derivative is called the probability density
function:
    f(t) = F'(t)
With the pdf f(t) we can easily express the probability that the failure
occurs in a time interval [t1, t2]:

    Pr(t1 ≤ T ≤ t2) = ∫_{t1}^{t2} f(t) dt.

• Hazard function (also called failure rate or hazard rate)


The goal of the hazard function h(t) is to express the risk that the
component fails at time t.
How can we capture this risk with the already known quantities?
Take a very small time ∆t. Then the fact that the failure occurs at time
t can be approximately expressed with the condition t ≤ T ≤ t + ∆t.
Of course, this can only happen if the component still has not broken
down before t. Thus, the risk that the component breaks down at time
t can be approximated by the conditional probability

Pr(t ≤ T ≤ t + ∆t | T ≥ t). (1)

Note that the risk may be quite different from the probability of failure.
As an example, one can say that the probability that a person dies at

age 120 is very small, since most people do not live that long. On the
other hand, the risk that a person of age 120 dies is quite high, since
the concept of risk assumes that the person is still alive (otherwise it
would be meaningless to talk about risk).
To make the risk expression (1) exact and independent of ∆t, we con-
sider the limit ∆t → 0. Then, however, the probability would always
tend to 0. To avoid this, we divide it by ∆t, thus obtaining a quan-
tity that is similar in spirit to the pdf. This gives the definition of the
hazard function:
    h(t) = lim_{∆t→0} Pr(t ≤ T ≤ t + ∆t | T ≥ t) / ∆t
Thus, the hazard function gives the risk density for the failure to occur
at time t.

Relationships between different lifetime measures

As we have seen, there is a direct relationship between the survivor function
and the probability distribution function:

S(t) = 1 − F (t)

F (t) = 1 − S(t)
By taking derivatives we can get a relationship between the survivor function
and the pdf:
    f(t) = −S'(t).
We can also express the survivor function with the pdf from the definition
S(t) = 1 − F(t). Using that 1 − F(t) = ∫_t^∞ f(u) du, we obtain

    S(t) = ∫_t^∞ f(u) du.

Relating the hazard function with the others is slightly more complex. One
can prove the following formula:
    h(t) = −S'(t) / S(t).
Using the previous relationships, this implies
    h(t) = f(t) / ∫_t^∞ f(u) du.

It is known from basic calculus that the formula h(t) = −S'(t)/S(t) is
equivalent to
    h(t) = −(d/dt) ln S(t).
Using this we can express S(t) by the hazard function as
    S(t) = e^{−∫_0^t h(u) du}.

Typical hazard functions

If we look at the formula


    h(t) = f(t) / ∫_t^∞ f(u) du

then we can see that there are two possible reasons for having high hazard
(risk) at time t:

• either the numerator is large, that is, the probability of a failure is high
around time t, and/or

• the denominator is small. The denominator ∫_t^∞ f(u) du gives the
probability that the failure has not occurred before t. If this is small, that
means the probability that the component is still alive is low. In other
words, the component is old from the reliability point of view.

Since a typical pdf sooner or later starts decreasing with time, this effect
tends to diminish the hazard as time advances. On the other hand, the
denominator decreases as the component gets older, which will increase the
hazard. These two opposite effects can balance each other in different ways.
A special case when they precisely balance each other is the exponential
distribution, where the pdf is of the form

    f(t) = λe^{−λt}.

In this case, if we compute the hazard function, we obtain h(t) = λ. That


is, in this case the hazard function is constant.
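This constant hazard is easy to check numerically from the relation h(t) = f(t)/S(t). The sketch below is only an illustration; the rate λ = 0.5 is an arbitrarily chosen example value.

```python
import math

lam = 0.5  # arbitrary example failure rate

def f(t):
    # pdf of the exponential distribution: f(t) = lam * e^(-lam t)
    return lam * math.exp(-lam * t)

def S(t):
    # survivor function of the exponential distribution: S(t) = e^(-lam t)
    return math.exp(-lam * t)

def hazard(t):
    # h(t) = f(t) / S(t)
    return f(t) / S(t)

# the hazard is constant and equals lam at every time point
for t in (0.1, 1.0, 10.0):
    assert abs(hazard(t) - lam) < 1e-12
```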

In many practical cases the hazard function has a bathtub shape that consists
of three typical parts:

1. Initial “burn-in” period, when the hazard is relatively large, due to


potential manufacturing defects that result in early failure.

2. Steady part with approximately constant hazard function.

3. Aging with increasing hazard.

The table below summarizes how each lifetime measure can be expressed by
the others (rows: the measure to express; columns: the measure it is
expressed by).

Lifetime measure | by F(t)           | by S(t)      | by f(t)              | by h(t)
F(t)             | F(t)              | 1 − S(t)     | ∫_0^t f(u) du        | 1 − e^{−∫_0^t h(u) du}
S(t)             | 1 − F(t)          | S(t)         | ∫_t^∞ f(u) du        | e^{−∫_0^t h(u) du}
f(t)             | F'(t)             | −S'(t)       | f(t)                 | h(t) e^{−∫_0^t h(u) du}
h(t)             | F'(t)/(1 − F(t))  | −S'(t)/S(t)  | f(t)/∫_t^∞ f(u) du   | h(t)
General Approach to Compute System Lifetime Measures

1. Given some lifetime measure of the components, express them in terms of the survivor
function S(t).
2. Compute the time dependent reliability of the configuration just like in the static case, but using
the component survivor functions instead of static component reliabilities.
3. We get the survivor function of the system.
4. Convert it into the requested lifetime measure using the table.
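The steps above can be sketched for the simplest case: two components with exponential lifetimes in a series configuration, where the system works only if both components work. The rates λ1, λ2 below are arbitrary example values.

```python
import math

lam1, lam2 = 0.2, 0.5  # arbitrary example component failure rates

def S1(t):
    return math.exp(-lam1 * t)  # survivor function of component 1

def S2(t):
    return math.exp(-lam2 * t)  # survivor function of component 2

def S_sys(t):
    # step 2: series configuration, so the static formula R = R1 * R2
    # becomes S_sys(t) = S1(t) * S2(t)
    return S1(t) * S2(t)

def F_sys(t):
    # step 4: convert the system survivor function into the
    # probability distribution function
    return 1.0 - S_sys(t)

# the series system is again exponential, with rate lam1 + lam2
t = 3.0
assert abs(S_sys(t) - math.exp(-(lam1 + lam2) * t)) < 1e-12
```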
ILP Exercises

1.
Solution

First we draw a map of the situation:

[Figure: cities 1, 2, 3 on the West coast and cities 4, 5, 6 on the East
coast, with the two hubs A and B between the coasts; candidate links connect
the cities to the hubs.]
B

A potential layout of the transcontinental link.


Let us now formulate the constraints.

• Exactly one of the West coast cities is connected to one of the hubs:
    ∑_{i=1}^{3} ∑_{j=A}^{B} x_ij = 1

• Similarly, exactly one of the East coast cities is connected to one of the
hubs:
    ∑_{i=4}^{6} ∑_{j=A}^{B} x_ij = 1

• The same hub is connected to both coasts:


    ∑_{i=1}^{3} x_iA = ∑_{i=4}^{6} x_iA

Note that this already implies the same equation for hub B.

• The variables can only take 0-1 values:

xij ∈ {0, 1}, i = 1, . . . , 6; j = A, B

The objective function is the total cost:


    ∑_{i=1}^{6} ∑_{j=A}^{B} c_ij x_ij

Thus, the entire mathematical program will be an ILP with 0-1 valued vari-
ables:
    min Z = ∑_{i=1}^{6} ∑_{j=A}^{B} c_ij x_ij

Subject to:
    ∑_{i=1}^{3} ∑_{j=A}^{B} x_ij = 1
    ∑_{i=4}^{6} ∑_{j=A}^{B} x_ij = 1
    ∑_{i=1}^{3} x_iA = ∑_{i=4}^{6} x_iA
    ∑_{i=1}^{3} x_iB = ∑_{i=4}^{6} x_iB

xij ∈ {0, 1}, i = 1, . . . , 6; j = A, B
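Since the program has only 12 binary variables, the formulation can be sanity-checked by brute force. The cost matrix below is made up purely for illustration; it is not part of the exercise.

```python
from itertools import product

# hypothetical costs c_ij of connecting city i to hub j (illustration only)
cost = {
    (1, 'A'): 5, (1, 'B'): 7, (2, 'A'): 4, (2, 'B'): 6,
    (3, 'A'): 8, (3, 'B'): 3, (4, 'A'): 6, (4, 'B'): 5,
    (5, 'A'): 3, (5, 'B'): 9, (6, 'A'): 7, (6, 'B'): 4,
}
keys = [(i, j) for i in range(1, 7) for j in ('A', 'B')]

best = None
for bits in product((0, 1), repeat=len(keys)):
    x = dict(zip(keys, bits))
    # exactly one West coast city is connected to one of the hubs
    if sum(x[i, j] for i in (1, 2, 3) for j in ('A', 'B')) != 1:
        continue
    # exactly one East coast city is connected to one of the hubs
    if sum(x[i, j] for i in (4, 5, 6) for j in ('A', 'B')) != 1:
        continue
    # the same hub is connected to both coasts
    if sum(x[i, 'A'] for i in (1, 2, 3)) != sum(x[i, 'A'] for i in (4, 5, 6)):
        continue
    z = sum(cost[k] * x[k] for k in keys)
    if best is None or z < best[0]:
        best = (z, sorted(k for k in keys if x[k] == 1))

print(best)  # minimum total cost and the selected links
```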


2. Assume that in an integer linear programming problem we have n vari-
ables. Each variable can only take the value 0 or 1. Express each of the
conditions a.), b.), c.) given below, such that you can only use linear equa-
tions or inequalities, nothing else.

a) At least one variable must take the value 0.

Answer:
This means they cannot all be 1, so their sum is at most n − 1:

x1 + x2 + . . . + xn ≤ n − 1

or

x1 + x2 + . . . + xn < n

b) At most one variable can take the value 0.

Answer:
This means at least n − 1 of them are 1, so their sum is at least n − 1:

x1 + x2 + . . . + xn ≥ n − 1
c) Either all variables are 0, or none of them.

Answer:
This means, they all must be equal:

x1 = x2 = . . . = xn

or, expressed via separate equations:

x1 = x2
x2 = x3
. . .
xn−1 = xn

3. Let x, y, z be 0-1 valued variables. Express the following constraint via


linear inequalities:
z = xy

Solution

If any of x, y is 0, then z must be 0, too. This can be expressed by two


inequalities:
z≤x
z≤y
On the other hand, if x, y are both 1, then we have to force z to be 1. It can
be done by
z ≥ x+y−1
Note that if at least one of x, y is 0, then the right-hand side is ≤ 0, so then
nothing is forced on z.

Thus, the system

z ≤ x
z ≤ y
z ≥ x+y−1

is equivalent to the nonlinear constraint

z = xy

for 0-1 valued variables. Note that the restriction x, y, z ∈ {0, 1} is essential
here, without it the equivalence would not hold.
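The equivalence is easy to verify by enumerating all eight 0-1 assignments:

```python
from itertools import product

for x, y, z in product((0, 1), repeat=3):
    # the three linear inequalities of the system
    system_holds = (z <= x) and (z <= y) and (z >= x + y - 1)
    # they hold exactly when the nonlinear constraint z = x*y holds
    assert system_holds == (z == x * y)
```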

4. Let us generalize the previous exercise to a longer product! Let x1 , . . . , xn


and z be 0-1 valued variables. Express the following constraint via linear
inequalities:
z = x1 · . . . · xn

Solution

Similarly to the previous exercise, it is easy to check that the following system
of linear inequalities provides an equivalent expression for the case when
z, x1 , . . . , xn ∈ {0, 1}:

z ≤ x1
. . .
z ≤ xn
z ≥ x1 + . . . + xn − n + 1
Case Study
An Optimization Problem in Cellular Network Design:
Formulation via Integer Linear Program

In cellular networks the coverage area is divided into cells. Each cell is served by a base station to
which the mobile units connect wirelessly. The base stations are connected to switching centers
through a wired network.

Whenever a mobile unit moves to another cell, a handoff procedure has to be carried out. There
are two types of handoffs:

1. Intra-switch handoff: the mobile moves between two cells that are connected to the same
switch.

2. Inter-switch handoff: the mobile moves between two cells that are connected to different
switching centers.

The two different types of handoff procedures have different costs (for example, in terms of
processing time).

The goal of the optimization (which is part of the cellular network planning process) in this case
study is to decide which base station will be connected to which switching center, such that the
total cost is minimized.

The total cost takes into account the cost of the different handoff procedures (based on traffic
volume data), as well as the cabling cost among base stations and switching centers. In addition,
the traffic processing capacity of the switching centers is bounded, which has to be factored into
the optimization.

Note that the handoff cost between two cells depends on the traffic volume (amount of handoffs
between them). Furthermore, it also depends on whether they are assigned to the same switch or
not, since that assignment decides whether intra-switch or inter-switch handoff is needed. The
assignment between cells and switches, however, is not given in advance, it is obtained from the
optimization. On the other hand, the optimization, in turn, depends on the handoff costs. This
circular dependence makes the first formulation of the problem nonlinear. Then we can linearize
it with the trick that was used in earlier exercises to linearize a product of variables.

The source of this case study is T.G. Robertazzi, “Planning Telecommunication Networks,”
IEEE Press, 1999.
Integer Linear Programming

Introduction

In many network planning problems the variables can take only integer val-
ues, because they represent choices among a finite number of possibilities.
Such a mathematical program is known as integer program.

Remark: Often the variables can only take two values, 0 and 1. Then we
speak about 0-1 programming.

If the problem, apart from the restriction that the variables are integer val-
ued, has the same formulation as a linear program, then it is called an integer
linear program (ILP).

Sometimes it happens that only a subset of the variables are restricted to be


integer valued, others may vary continuously. Then we speak about a mixed
programming task. If it is also linear, then it is a mixed ILP.
A general ILP is formulated as follows:

min Z = c1 x1 + c2 x2 + . . . + cn xn
Subject to

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . .
am1 x1 + am2 x2 + . . . + amn xn = bm
x1 , . . . , xn ∈ Z

where Z denotes the set of integers.

One can, of course, use other LP formulations, too, and then add the inte-
grality constraint x1 , . . . , xn ∈ Z to obtain an ILP.

Comments:

The minimum can be replaced by maximum, this does not make any essential
difference.

If x1 , . . . , xn ∈ Z is replaced by x1 , . . . , xn ∈ {0, 1}, then we obtain a 0-1


programming problem. Many of the tasks encountered in network design
belong to this class, as will be seen later.

It is important to know that ILP is usually much more difficult to solve than
LP. Often we have to apply heuristic or approximative approaches, because
finding the exact global optimum for a large problem is rarely feasible.
Example 1: Capital Budgeting

A telecommunication company wants to build switching centers at certain


given sites. The sites should be selected so that the total profit is maximized,
under the constraint that the total cost remains within the available budget.

The following information is available to formulate the problem:

• There are N available sites.

• The cost of building a switching center at site i is ci .

• If a center is built at site i, then it generates a profit pi (say, per year).

• The total available budget that can be used for building the centers is
C.

• Objective: select which of the N sites will be used for building switching
centers, so that the total profit is maximized, while keeping the total
cost within the budget. (Any number of sites can be selected out of
the N .)
Let us formulate the problem as a mathematical program!

Variables: one variable, xi , for each site i. The meaning of xi is:


    xi = 1 if site i is selected as a center
    xi = 0 if site i is not selected as a center

In other words, xi is the indicator variable of the fact that site i is selected/not
selected as a center.

Such indicator variables are frequently used in network planning formulations


when one has to express yes/no choices.

Objective function

Goal: maximize the total profit.

How can we express it? The profit from site i is pi if the site is selected as a
center, otherwise 0. The definition of xi suggests that this can be conveniently
expressed as pi xi . Then the total profit is:
    ∑_{i=1}^{N} pi xi .

Thus, the objective function is:


    max Z = ∑_{i=1}^{N} pi xi .
Constraints:

The total cost cannot be more than the budget C:


    ∑_{i=1}^{N} ci xi ≤ C

The variables can only take 0-1 values:

xi ∈ {0, 1}, i = 1, . . . , N.

Collecting the parts, the complete formulation is this:


    max Z = ∑_{i=1}^{N} pi xi

Subject to
    ∑_{i=1}^{N} ci xi ≤ C
    xi ∈ {0, 1}, i = 1, . . . , N

Comment: Though we have not mentioned it, one can easily recognize that
this problem is exactly identical to the well known Knapsack Problem.

Why is this good? Because we can then use all the methodology that has
been developed for the Knapsack Problem, so we can stand “on the shoulders
of giants.”
How can we find a solution?

It is well known from the study of algorithms that the Knapsack Problem in
the general case is NP-complete. In the special case when the coefficients are
polynomially bounded, it is known to have a Dynamic Programming solution,
but in the general case (where the coefficients can be exponentially large) we
cannot reasonably hope to find the exact optimum for a large problem, via
an efficient algorithm. A good approximation, however, is still possible:

Principle of the heuristics: Sort the sites according to their profit/cost ratio.
(Clearly, a higher profit/cost ratio is more preferable.) Select the sites one
by one in this order, thus, always trying the most preferable first, among the
yet unselected ones. (Preference = profit/cost). If the considered site still
fits in the remaining budget, then select it, otherwise proceed to the next
one in the preference order.

This is a natural variant of the so called Greedy Algorithm.

Trying the sites according to the profit/cost preference order is so natural
that one may wonder: why does it not guarantee an optimal solution?

Exercise (to test your understanding for yourself): Find a specific example
when the above greedy algorithm does not yield the optimum.

A simple example is on the next page, but try to find one yourself, before
proceeding to the next page.
Solution to the exercise

Consider the following data:

p1 = 90, c1 = 10
p2 = 20, c2 = 2
p3 = 8, c3 = 1
Budget: C = 11

Then the profit/cost ratios are:

p1 /c1 = 9, p2 /c2 = 10, p3 /c3 = 8.

The Greedy Algorithm would select i = 2 first, since p2 /c2 is the largest.
This choice uses c2 = 2 units from the budget. The next largest ratio is
p1 /c1 , but c1 = 10 already does not fit in the remaining budget of 11 − 2 = 9.
So the algorithm proceeds to i = 3, which still fits in the budget.

Thus, the Greedy Algorithm selects sites 2 and 3, achieving a total profit
of 20 + 8 = 28. On the other hand, if we select sites 1 and 3, then we can
achieve a profit of 90 + 8 = 98, while the cost still fits in the budget.
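The run above can be reproduced with a short sketch of the greedy heuristic, using the same data:

```python
def greedy_knapsack(profits, costs, budget):
    # try the sites in decreasing profit/cost order; select each one that
    # still fits into the remaining budget
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / costs[i], reverse=True)
    chosen, remaining = [], budget
    for i in order:
        if costs[i] <= remaining:
            chosen.append(i)
            remaining -= costs[i]
    return chosen, sum(profits[i] for i in chosen)

profits, costs, C = [90, 20, 8], [10, 2, 1], 11
chosen, total = greedy_knapsack(profits, costs, C)
print(chosen, total)  # sites 2 and 3 (0-based indices 1 and 2), profit 28

# selecting sites 1 and 3 instead is feasible and yields profit 98
assert costs[0] + costs[2] <= C and profits[0] + profits[2] == 98
```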
Comment

As we have seen, the Greedy Algorithm does not necessarily find the optimum
for this problem. A natural question is:

How far is it from the optimum? In other words, can we prove some perfor-
mance guarantee for the greedy solution?

The answer is yes. We present the performance guarantee below.

Theorem:

Let Topt be the optimum solution to the considered problem, that is, the max-
imum achievable total profit under the budget constraints. Let Tgreedy be the
profit achieved by the Greedy Algorithm. Further, denote by pmax the largest
profit coefficient, that is, pmax = maxi pi . Then

Tgreedy ≥ Topt − pmax

always holds.

Proof: See the Appendix at the end of this note. The proof is included only
for showing how can one possibly analyze the result of the greedy heuristic
in this situation, you do not have to learn the proof details.
Exercises

1. Assume that we execute the Greedy Algorithm on a specific Capital


Budgeting problem and we find that Tgreedy = 1000 (in some appropriate
units). Further, we also know that the largest profit coefficient is pmax = 5.
Give an estimate on the optimum profit Topt .

Solution

From the preceding Theorem we know that

Tgreedy ≥ Topt − pmax

always holds. Rearranging, we get


    Tgreedy / Topt ≥ 1 − pmax / Topt .
Since Topt ≥ Tgreedy by definition, if we replace Topt by Tgreedy on
the right-hand side, then pmax / Topt can only grow. This means we can only
subtract more from 1, so
    1 − pmax / Topt ≥ 1 − pmax / Tgreedy
must hold. As a result, we get
    Tgreedy / Topt ≥ 1 − pmax / Tgreedy .
Substituting the given numerical values on the right-hand side yields
    Tgreedy / Topt ≥ 1 − 5/1000 = 0.995 = 99.5%.
This means, in this case we can argue that the Greedy Algorithm will give
a result that is within 0.5% of the optimum, even though we do not know
what the optimum is!
The value of Tgreedy = 1000 gives a lower bound on Topt . If we also want an
upper bound, then from the Theorem we have Tgreedy ≥ Topt − pmax , which,
after rearranging, yields

Topt ≤ Tgreedy + pmax .

Using the numerical values, we get Topt ≤ 1005.
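Restated as a few lines of arithmetic:

```python
T_greedy, p_max = 1000, 5

# lower bound on the ratio T_greedy / T_opt, from the Theorem
ratio_lower = 1 - p_max / T_greedy
# upper bound on the unknown optimum: T_opt <= T_greedy + p_max
T_opt_upper = T_greedy + p_max

assert abs(ratio_lower - 0.995) < 1e-12
assert T_opt_upper == 1005
```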

2. In the example which showed that the Greedy Algorithm can produce a
non-optimal solution, the greedy solution was less than half of the optimum.
Can we modify the Greedy Algorithm such that it always guarantees at least
half of the optimum profit?

Solution

To exclude trivial cases, let us assume that ci ≤ C for every i. (If ci > C for
some i, then site i can be excluded, since it never fits in the budget.)

If Topt ≥ 2pmax , then we have

    Tgreedy ≥ Topt − pmax ≥ Topt / 2,
since the first inequality follows from the Theorem and the second inequality,
after rearranging, is equivalent to Topt ≥ 2pmax .

Thus, if Topt ≥ 2pmax , then the greedy solution satisfies the requirement.
What if Topt < 2pmax ? This means pmax > Topt /2, so then simply selecting
the site with maximum profit already achieves more than half of Topt . Let
us call this latter choice 1-site heuristic. Clearly, the profit achieved by the
1-site heuristics is pmax .

It follows that at least one of the greedy and 1-site heuristics gives half or
more of Topt . But we do not know in advance which one. There is, however,
an easy way to know it: run both and take the result that gives the larger
profit. Thus, the required modification is:

Run the Greedy Algorithm to obtain Tgreedy . If Tgreedy ≥ pmax , then keep the
greedy solution, otherwise replace it with the 1-site heuristics.
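The modified heuristic can be sketched as follows (equivalently, taking the better of the greedy result and the single most profitable site):

```python
def greedy_profit(profits, costs, budget):
    # greedy heuristic: try sites in decreasing profit/cost order
    order = sorted(range(len(profits)),
                   key=lambda i: profits[i] / costs[i], reverse=True)
    total, remaining = 0, budget
    for i in order:
        if costs[i] <= remaining:
            total += profits[i]
            remaining -= costs[i]
    return total

def modified_greedy(profits, costs, budget):
    # assumes c_i <= budget for every i (sites that never fit are excluded)
    t_greedy = greedy_profit(profits, costs, budget)
    p_max = max(profits)  # profit of the 1-site heuristic
    # keep the greedy solution if T_greedy >= p_max, otherwise the 1-site one
    return max(t_greedy, p_max)

# on the earlier counterexample: greedy gives 28, the 1-site heuristic 90,
# and 90 is more than half of the optimum 98
assert modified_greedy([90, 20, 8], [10, 2, 1], 11) == 90
```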

Appendix
Proof of the Theorem.

Let x1 , x2 , . . . , xn be the solution found by the greedy algorithm. Similarly,


let y1 , y2 , . . . , yn be an optimal solution. If the greedy solution is not optimal,
then there must be an index j for which xj = 0 and yj = 1. The reason is
that the 1s in the optimal solution cannot form a proper subset of the 1s in
the greedy solution, since then the greedy would have higher value than the
optimal. Thus, if the greedy is not optimal, then there must be a 1 in the
optimal solution that is not in the greedy solution.

Let us assume that the variables are indexed such that


    p1/c1 ≥ p2/c2 ≥ . . . ≥ pn/cn
holds and let j be the smallest index with xj = 0, yj = 1. Then we can write
for the total profit obtained by the greedy algorithm:
    Tgreedy = ∑_{i=1}^{n} pi xi ≥ ∑_{i=1}^{j} pi xi = ∑_{i=1}^{j} pi yi + ∑_{i=1}^{j} pi (xi − yi)

Let us now use the fact that
    pi / ci ≥ pj / cj
holds if i ≤ j, due to the chosen indexing. Therefore, we have
    pi ≥ (pj / cj) ci
and then substituting pi by (pj / cj) ci in the last summation, we obtain

    Tgreedy ≥ ∑_{i=1}^{j} pi yi + ∑_{i=1}^{j} (pj / cj) ci (xi − yi).

Note that xi − yi ≥ 0 holds for i < j, since, by definition, j is the first


index with xj = 0, yj = 1. Therefore, the inequality remains correct after
the substitution, as we have no negative number in the summation for i < j.
When i = j, then xi − yi < 0, but then the coefficient does not change, being
pi /ci = pj /cj for i = j.

Further rearranging, we get

    Tgreedy ≥ ∑_{i=1}^{j} pi yi + (pj / cj) (A − B),
where
    A = ∑_{i=1}^{j} ci xi and B = ∑_{i=1}^{j} ci yi .

Let us bound now the sums denoted by A and B above.

A is the cost incurred by the first j variables in the greedy solution. This
does not include cj , since xj = 0. Moreover, cj cannot be added without
violating the budget constraint, since otherwise the greedy algorithm would
have set xj = 1. Therefore, we must have A + cj > C, which yields
    A = ∑_{i=1}^{j} ci xi > C − cj .

Considering the other sum (B), we can use the fact that the optimal solution
must also fit in the budget. Therefore,
    ∑_{i=1}^{n} ci yi ≤ C
holds, which implies
    B = ∑_{i=1}^{j} ci yi ≤ C − ∑_{i=j+1}^{n} ci yi .

Substituting the sums A and B by these bounds, we can observe that the
expression can only decrease, as A is substituted by a smaller quantity, while
the subtracted B is substituted by a larger quantity. Thus, we obtain

    Tgreedy ≥ ∑_{i=1}^{j} pi yi + (pj / cj) [ (C − cj) − (C − ∑_{i=j+1}^{n} ci yi) ].

After rearranging, we get

    Tgreedy ≥ ∑_{i=1}^{j} pi yi + ∑_{i=j+1}^{n} (pj / cj) ci yi − pj

Using now that pj / cj ≥ pi / ci when i > j, the second sum can only decrease if the
coefficient of yi is replaced by pi . This yields

    Tgreedy ≥ ∑_{i=1}^{j} pi yi + ∑_{i=j+1}^{n} pi yi − pj = ∑_{i=1}^{n} pi yi − pj

We can now observe that the last summation is precisely Topt , while pj ≤
pmax , so if we subtract pmax instead of pj , then the expression can only
decrease. Thus, we obtain

Tgreedy ≥ Topt − pmax

which proves the Theorem.


Example 2: Fixed Charge Problem

A telecom company wants to offer new services to its customers. Each service
can be offered in different amounts (e.g., with different bandwidth). The goal
is to decide how much of each service is to be offered to maximize the total
profit, such that the total cost remains within a given budget.

The following information is available to formulate the problem:

• There are N potential services.

• If service i is offered, then it incurs a cost of ci per unit amount of


service, plus a fixed charge ki , which is independent of the amount.

• Service i brings a profit of pi per unit amount of offered service.

• The total available budget is C.

To formulate a mathematical program, let us introduce variables:

xi = the amount of service i offered

Note that xi is a continuous variable. To handle the fixed charge we also


introduce indicator variables for the services, to indicate which ones are ac-
tually offered:

    yi = 1 if xi > 0
    yi = 0 if xi = 0

How much is the cost of service i? The proportional cost is clearly ci xi . It


would be wrong, however, to say that the cost with fixed charge is ci xi + ki ,
since ki is incurred only if xi > 0. Thus, we have to distinguish the cases
when xi > 0 and xi = 0. This is why we need the yi variables. Thus, the
correct expression is ci xi + ki yi and the total cost is:
    ∑_{i=1}^{N} (ci xi + ki yi).

The total profit is:


    ∑_{i=1}^{N} pi xi .

The complete formulation then can be expressed as


    max Z = ∑_{i=1}^{N} pi xi

Subject to
    ∑_{i=1}^{N} (ci xi + ki yi) ≤ C
    xi ≥ 0, i = 1, . . . , N
    yi ∈ {0, 1}, i = 1, . . . , N
    yi = 1 if xi > 0, and yi = 0 if xi = 0

This is a mixed integer program, since some of the variables are continuous.
On the other hand, this formulation is not linear, since the last constraint
(the one that enforces the relationship between yi and xi ) is not linear. Often
it is desirable, however, to have a linear formulation which is useful when we
want to use commercial ILP software.
Exercises

1. Replace the last constraint with a linear formulation.

Solution:

Observe that the constraint


    ∑_{i=1}^{N} (ci xi + ki yi) ≤ C

implies that ci xi ≤ C must hold for every i. Therefore, if we introduce a
new constant value by
    M = max_i (C / ci)
then xi ≤ M holds for every xi . Then we may replace the nonlinear constraint
by the following linear one:
    yi ≥ (1/M) xi .
Why does the new (linear) formulation remain equivalent with the original
one? If xi > 0, then 0 < (1/M) xi ≤ 1, so then yi ≥ (1/M) xi forces yi = 1. That
is exactly what we want if xi > 0. If xi = 0, then, according to the new
formulation, yi can be both 0 or 1. In this case, however, whenever it is 1,
changing it to 0 does not cause any new constraint violation and it also does
not change the objective function. Therefore, the new formulation serves
the purpose just as well as the original one, it does not change the result.
It is, however, linear, which is useful when we want to use commercial ILP
software.
2. What will be the optimal solution, if the company decides to offer only
a single service? (But we do not know in advance, which one will be the
offered service.)

Solution:

Let i be the service that the company offers. Then the total profit will be
pi xi , as for all j ≠ i we have xj = 0. For any given i, the profit will be
maximum, if xi is as large as possible. Since there is no other service in the
considered case, therefore, service i will use the entire available budget C.
This means
    ci xi + ki = C
(Since obviously xi > 0 in this case, therefore, the fixed charge will be there.)
From this we get
    xi = (C − ki) / ci
and the profit will be
    pi xi = pi (C − ki) / ci .
How do we know the value of i? We simply have to check which i makes the
expression
    pi (C − ki) / ci
the largest. (Since the expression contains only known constants, we can
easily check which i makes it the largest.)
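The selection rule only involves known constants, so it takes one pass over the data to evaluate; the numbers below are made-up illustration values:

```python
# hypothetical per-unit profits, fixed charges, per-unit costs and budget
p = [10, 8, 12]
k = [50, 10, 80]
c = [5, 4, 6]
C = 100

# profit of offering only service i: p_i * (C - k_i) / c_i
single_profits = [pi * (C - ki) / ci for pi, ki, ci in zip(p, k, c)]
best = max(range(len(p)), key=lambda i: single_profits[i])
print(best, single_profits[best])  # service index 1 with profit 180.0
```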

3*. Show that the special case discussed in 2 is not as special as it looks:
even in the general case the optimum is always attained by a solution in
which only one xi differs from 0. Using this, devise an algorithm that solves
the problem optimally in the general case without ILP. This gives a clever
shortcut, using the special features of the problem, making the task efficiently
solvable even in the general case.
ILP Solution Techniques

ILP is much more difficult to solve than LP. No efficient general
algorithm is known for ILP, and it is not even likely to exist. To
provide at least an idea of how certain methods work, we outline a
few principles in the subsequent notes.

Some general methods that we outline are:

• Cutting plane algorithm

• Randomized rounding

• Branch and bound

• Branch and cut

• Dynamic programming

• Constraint aggregation

• Heuristic search and optimization techniques


Randomized Rounding

The randomized rounding technique works the best when the ILP is of combi-
natorial nature, that is, a 0-1 programming problem. The principle is shown
via the following example. Consider the problem

max Z = cx

subject to

Ax = b
xi ∈ {0, 1}, i = 1, . . . , n

Replace the last constraint with 0 ≤ xi ≤ 1. The new problem is already an


LP, called the LP relaxation of the 0-1 program.

We can solve the LP relaxation by any LP algorithm, which generally re-


sults in variable assignments that are between 0 and 1. The next step is to
round the variables to 0/1 values. This could be done deterministically as
usual rounding, but then in a pessimistic case too much violation of the con-
straints can occur. For example, assume the problem contains the following
constraint:
x1 + x2 + . . . + x1000 = 510 (1)
and the LP solution is x1 = . . . = x1000 = 0.51, which satisfies the constraint.
If we round the variables in the usual way, then they all will be rounded to
one, so the lefthand-side becomes 1000.

On the other hand, we can have much smaller violation of the constraints by
rounding randomly, according to the following rule. The randomly rounded
value x̃i of xi will be:
    x̃i = 1 with probability xi
    x̃i = 0 with probability 1 − xi
This can be easily implemented by drawing a uniformly distributed random
number yi ∈ [0, 1] and setting x̃i = 1 if yi ≤ xi , otherwise x̃i = 0.

What is the advantage of the random rounding? We can observe that the
expected value of x̃1 + . . . + x̃1000 will be exactly 510, thus it satisfies the
constraint in expectation. Of course, the actual value may still violate it, but one can
expect that the actual values fluctuate around the average, so the errors of
different signs largely cancel out if there are many variables.

One can quantitatively estimate the error that can still occur with a cer-
tain small probability. The way of this estimation is shown in the following
theorem (informal explanation follows after the theorem).

Theorem 1. Let x be an n-dimensional vector with 0 ≤ xi ≤ 1 for all i.


Assume x satisfies the constraint ax = b. Let x̃ be the rounded version of x,
obtained by randomized rounding. Then the following inequality holds
    b − amax √(αn log n) ≤ ax̃ ≤ b + amax √(αn log n)

with probability at least 1 − n^{−α}, where α > 0 is any constant and amax =
max_i |ai |.

The Theorem says that the error, which is the deviation of the actual value
of ax̃ from the expected value b, is bounded by amax √(αn log n). This may be
much smaller than b.

Observe the effect of the α parameter. The bound on the error holds with
probability at least 1 − n^{−α}. If α is large, then this probability is very close
to 1. Then, however, the error bound amax √(αn log n) also gets larger. If α is small,
then the error bound is small, but it holds with smaller probability. Thus, there
is a trade-off between providing a tighter bound less surely or a looser bound
more surely. Note, however, that this plays a role only in the analysis, not
in the actual algorithm.

Let us look at example (1) we used. Choose α = 1. The other parameters
are:
    n = 1000, amax = 1, b = 510
The error term will be:
    amax √(αn log n) = √(1000 log 1000) ≈ 83

and the probability is

1 − n^{−α} = 1 − 1000^{−1} = 0.999 = 99.9%.

Thus we obtain that the deviation from the required ax = 510 is bounded as

427 ≤ ax̃ ≤ 593

with 99.9% probability, thus, almost surely. In contrast, with deterministic


rounding we could only say that the value is between 0 and 1000.
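The numeric example can be simulated directly (a sketch; the seed is arbitrary and only makes the run reproducible):

```python
import random

random.seed(1)  # arbitrary seed, only for reproducibility

n, b = 1000, 510
x = [0.51] * n  # LP relaxation solution satisfying x1 + ... + x1000 = 510

# randomized rounding: round x_i to 1 with probability x_i, else to 0
x_rounded = [1 if random.random() <= xi else 0 for xi in x]

total = sum(x_rounded)
print(total)
# with 99.9% probability the rounded sum stays within [427, 593]
assert 427 <= total <= 593
```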

What if we are satisfied with less certainty, say, with 90% probability? Then
we can choose α = 1/3, since 1 − 1000^{−1/3} = 0.9, and then we get the error
bound of
    amax √(αn log n) = √((1/3) · 1000 log 1000) ≈ 48.

This yields the tighter estimation

462 ≤ ax̃ ≤ 558 (2)

which holds, however, only with 90% probability.

What if we would like to have the tighter bound (2) but also the higher 99.9%
probability? There is a way to achieve that, too. Repeat the randomized
rounding 3 times independently. Then in each trial the probability of violat-
ing (2) will be at most 10%. Hence the probability that all the 3 independent
trials violate (2) is 0.1³ = 0.001 = 0.1%. Thus, with 99.9% probability, among
the 3 trials there is one that satisfies the bound (2), so we can choose
this one. The moral is that we can “amplify” the power of the method by
repeating it several times independently and then choosing the best result.

The Theorem can directly be extended to the case that involves repeated
independent trials. With r repetitions the probability is amplified to 1 − n^{−αr},
while the error bound remains the same.

Remark: The approach is most useful if the problem contains “soft” constraints.
What are these? It is customary to differentiate two types of constraints:

• Hard constraints: these must be obeyed. For example, they may rep-
resent some physical law which is impossible to violate.

• Soft constraints: these can possibly be violated if there is no other way


to solve the problem, but then we have to pay a penalty, so we would
like to minimize the violation. For example, budget constraints often
behave this way.

Since randomized rounding may result in constraint violation, it is typically


good for soft constraints.
Exercises

1. If we can guarantee an error bound B with probability p, then how many


repetitions are needed if we want to decrease the error bound by a factor of
10, while keeping the same probability?

Answer: If we use r repetitions, then the probability bound is amplified to
1 − n^(−αr). To keep the original probability we can choose a new α′ = α/r.
Then the error decreases by a factor of √r, since α appears under the
square root in the expression for the error. Thus, if we want to decrease the
error by a factor of 10, then we need r = 100 repetitions.

In general, we can expect a better solution via randomized rounding than with
naive deterministic rounding, especially if there are many variables.

2. Generalize Theorem 1 for the case when there are more constraints.

Answer: Theorem 1, when applied to several constraints, implies the theorem


below, via the union bound of probabilities. The union bound says that for
any events A1 , A2 , . . . Am , the probability of their union (i.e., the probability
that at least one of them occurs) is bounded by the sum of their individ-
ual probabilities, no matter whether the events are independent or not. In
formula:
Pr(A1 ∪ . . . ∪ Am ) ≤ Pr(A1 ) + . . . + Pr(Am ).

Theorem 2. Let x be an n-dimensional vector with 0 ≤ xi ≤ 1 for all i.
Assume x satisfies Ax = b, representing m constraints. Let x̃ be the rounded
version of x, obtained by randomized rounding. Then the following inequality
holds:

b − (amax·√(αn log n))·1 ≤ Ax̃ ≤ b + (amax·√(αn log n))·1

with probability at least 1 − m·n^(−α), where α > 0 is any constant, amax is
the largest absolute value of any entry in A, and 1 is a vector in which all
coordinates are 1.
The Branch and Bound Algorithm
The Branch and Bound (B&B) algorithm is a general method to solve discrete
optimization problems. The key idea is that B&B attempts to prune parts
of the exhaustive search tree by using bounds on the value of the optimum
solution. Below we present a formalized description of the method. In order
to understand how it works, let us review first how we can formalize the
exhaustive search.
Consider the following optimization problem. Let B be the set of all n-
dimensional binary vectors and we would like to find the maximum of a
function f (x) over B, that is, we are looking for

max_{x∈B} f(x).

Note: boldface letters will be used to denote vectors, such as

x = (x1 , . . . , xn ), b = (b1 , . . . , bn ).

We can formalize the exhaustive search as a recursive procedure. To present


it, let us introduce a few notations. Let Bk (b) denote the subset of B which
is obtained by fixing the first k coordinates of the binary vectors according
to the coordinates of a given binary vector b:

Bk (b) = {x ∈ B | x1 = b1 , . . . , xk = bk }.

That is, the first coordinate is fixed at b1, the second at b2, ..., the k-th is fixed
at bk, and the rest of the coordinates of x are arbitrary (0 or 1). If k = 0,
then it means we did not fix anything, so B0 (b) = B. If k = n, then all
coordinates are fixed at the corresponding bi value, so then the set Bn (b)
contains only a single vector, b itself: Bn (b) = {b}.
Let us introduce now the following family of functions:

Fk(b) = max_{x∈Bk(b)} f(x).

The meaning of Fk (b) is that this is the maximum value of f (x) if the maxi-
mization is done over the restricted set Bk (b), that is, the first k coordinates
of x are fixed at the values determined by b and only the rest can change.

What is the advantage of the Fk(b) functions? The reason we have
introduced them is that we can easily construct a recursive algorithm to
compute them. Once we have this subroutine that computes Fk(b) for any k
and b, then we can solve the original problem by simply calling the subroutine
with k = 0, since, by definition we have

F0(b) = max_{x∈B0(b)} f(x) = max_{x∈B} f(x)

for any b (so we can use, for example, b = 0 in this call).


Let us see now how we can compute the Fk (b) values recursively. We present
an algorithm in pseudo-code below first for the exhaustive search.

Exhaustive Search

1. function Fk (b)
2. if k = n then return f (b)
3. else
4. begin
5. bk+1 := 0; u := Fk+1 (b)
6. bk+1 := 1; v := Fk+1 (b)
7. return max(u, v)
8. end

Explanation: In line 2 we handle the case when all coordinates are fixed,
i.e., we have arrived at a leaf of the search tree. In this case Bn (b) = {b},
so the maximum over this set can only be f (b). If k < n then in lines 5
and 6 we compute recursively the function for two possible cases (branch):
when the next coordinate is fixed at 0 (line 5) and when it is fixed at 1 (line
6). The maximum of the two gives the value of Fk (b) in line 7, since this is
the maximum over the set when the (k + 1)th coordinate is not fixed, only
the first k. What guarantees that the recursion ends after a finite number of
steps? This is guaranteed by the fact that the recursive call is always made
for a larger value of k and when the largest value k = n is reached then there
is no more recursive call.
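The pseudocode above can be transcribed almost line by line into Python (a sketch; note that with 0-based indexing below, fixing b[k] corresponds to fixing the (k+1)-th coordinate of the pseudocode):

```python
def exhaustive_max(f, n):
    # Maximize f over all n-dimensional binary vectors, mirroring F_k(b):
    # the first k coordinates of b are fixed, the rest are still free.
    b = [0] * n
    def F(k):
        if k == n:                # leaf: all coordinates fixed, B_n(b) = {b}
            return f(tuple(b))
        b[k] = 0                  # branch with the next coordinate fixed at 0
        u = F(k + 1)
        b[k] = 1                  # branch with the next coordinate fixed at 1
        v = F(k + 1)
        return max(u, v)          # F_k(b) = max of the two branches
    return F(0)                   # F_0(b) is the global maximum

# Example: f sums weighted bits; the maximum is attained at the all-ones vector.
value = exhaustive_max(lambda x: sum((i + 1) * xi for i, xi in enumerate(x)), 4)
# value == 10 (= 1 + 2 + 3 + 4)
```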

Let us now turn to the Branch and Bound algorithm. Suppose that we have
an upper bound function Uk (b) available, such that

Fk (b) ≤ Uk (b)

holds for any k and b. It is assumed that an independent subroutine is


available that can compute Uk (b) for any parameter value.
Let us introduce the variable maxlow that will carry the largest lower bound
on the global optimum that has been achieved so far. We do not need a sepa-
rate subroutine for computing a lower bound, because whenever we compute
a value f (x) for a particular x, it is automatically a lower bound on the max-
imum. This is so, simply because the function at any x is either maximum
or less, that is,
f(x) ≤ max_{y∈B} f(y)

holds for any x.


Now we can present the B&B algorithm by modifying the exhaustive search
in the following manner. For simplicity, let us assume that the function f (x)
is nonnegative. Then the variable maxlow can be initially set to 0.

Branch and Bound


1. function Fk (b)
2. if k = n then
3. begin
4. if f (b) > maxlow then maxlow := f (b)
5. return f (b)
6. end
7. else
8. if Uk (b) > maxlow then
9. begin
10. bk+1 := 0; u := Fk+1 (b)
11. bk+1 := 1; v := Fk+1 (b)
12. return max(u, v)
13. end
14. else return maxlow
Explanation. Observe that if we remove lines 4, 8 and 14 then we exactly

get back the exhaustive search (ignoring the unnecessary begin-end pairs).
Thus, let us take a closer look how these lines change the exhaustive search.
In the case k = n the only change is that before returning f(b) we update
the value of maxlow in line 4: if the new function value is larger than the
old value of maxlow, then this larger value becomes the new best lower bound
on the optimum; otherwise the old one remains.
The most essential difference is in line 8: this is the bounding. Here the idea is
that we only make the recursive calls if the upper bound Uk(b) is larger than
the current value of maxlow. Why can we do this without losing anything?
Because if the condition in line 8 is not satisfied, then

maxlow ≥ Uk (b) ≥ Fk (b)

must hold. This means that from this parameter combination (k, b) we
cannot hope for an improvement on the best value found so far, since the maximum
in this subtree of the search tree (i.e., Fk(b)) is already known to be not more
than maxlow, so it makes no sense to explore this subtree. Thus, in this case
we do not make any recursive call and return maxlow in line 14 (just to
return something).
Note that for the above bounding step we need to compute the value of Uk(b),
but we assumed that an independent subroutine is available for this. The
savings may be essential if this subroutine is much faster than the exhaustive
recursive algorithm. This is typically the case in the successful applications
of B&B.
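The modified pseudocode can likewise be sketched in Python; the upper bound U(k, prefix) is passed in as a separate subroutine, exactly as the text assumes (the tiny instance and the crude profit-sum bound at the bottom are ours, for illustration only):

```python
def branch_and_bound_max(f, U, n):
    # Maximize a nonnegative f over n-bit vectors; U(k, prefix) must be an
    # upper bound on F_k(b), i.e. on f over all completions of the prefix.
    b = [0] * n
    state = {"maxlow": 0.0}           # best lower bound found so far
    def F(k):
        if k == n:
            v = f(tuple(b))
            state["maxlow"] = max(state["maxlow"], v)   # line 4: update maxlow
            return v
        if U(k, tuple(b[:k])) > state["maxlow"]:        # line 8: bounding
            b[k] = 0; u = F(k + 1)
            b[k] = 1; v = F(k + 1)
            return max(u, v)
        return state["maxlow"]                          # line 14: prune subtree
    return F(0)

# Tiny knapsack instance with a penalty-style objective (profit if the budget
# holds, 0 otherwise) and a crude bound: fixed profit + sum of all free profits.
p, c, C = [5, 4, 3, 8], [3, 2, 4, 5], 7

def f(x):
    cost = sum(ci * xi for ci, xi in zip(c, x))
    return sum(pi * xi for pi, xi in zip(p, x)) if cost <= C else 0

def U(k, prefix):
    return sum(p[i] * prefix[i] for i in range(k)) + sum(p[k:])

best = branch_and_bound_max(f, U, 4)
# best == 12 (items 2 and 4, cost 2 + 5 = 7)
```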

Comments:

• If we drop the assumption f (x) ≥ 0, the algorithm still remains the


same, just maxlow should be initialized to a value for which maxlow ≤
f (x) holds for all x.

• If the original maximization problem allows only certain binary vectors


x, then we first extend the function to all binary vectors by assigning a
small enough function value to those vectors which are excluded in the
original maximization problem. (“Small enough” means here a value
that is surely smaller than any function value on those vectors that are
not excluded in the original problem).

• If we also want to find the maximizing vector x, not just the maximum
value f(x) (i.e., we ask not only how much the maximum is, but also
where it occurs), then we have to keep a record of the latest b vector
along with updating maxlow, whenever a higher value (an update)
occurs in line 4.

• The algorithm is presented here for a maximization problem. If the


problem is minimization, then everything should be changed accord-
ingly, that is, then we need a lower bound function instead of an upper
bound function, we maintain a minimum upper bound minup instead
of a maximum lower bound maxlow etc. If you understand the algo-
rithm, it should not be a problem to convert it into minimization.

Application Example of the
Branch and Bound Algorithm
Consider the Capital Budgeting problem (which, as we already know, is structurally identical
to the Knapsack problem, so we can use either name). Its formulation is this:
max Z = Σ_{i=1}^{n} pi·xi

Subject to

Σ_{i=1}^{n} ci·xi ≤ C

xi ∈ {0, 1}, i = 1, . . . , n

Let us try to find the exact solution with the Branch and Bound (B&B) algorithm. In order
to do it, we need to handle the following issues.

1. Transforming the formulation

In B&B we considered the optimization task


max_{x∈B} f(x).

This does not have any constraint and seeks for the optimum over all binary vectors of
dimension n. In contrast, our problem does have a constraint and seeks for the optimum
only over those binary vectors that satisfy the constraint.
To handle this difference, we need to transform our constrained task into an unconstrained
one. It can be done by the method that is often called penalty function method. It is based on
the idea that we can substitute a constraint by modifying the objective function, such that
we “penalize” those vectors that violate the constraint. In this way, we can do the search
over all vectors (with the modified objective function), because when a global, unconstrained
optimum is found, the “penalty” does not allow it to be taken at a vector that violates the
constraint.
Specifically, in our case, we can do the following. The original objective function was
f(x) = Σ_{i=1}^{n} pi·xi.

Let the modified objective function be

f̃(x) = Σ_{i=1}^{n} pi·xi  if Σ_{i=1}^{n} ci·xi ≤ C,
f̃(x) = 0                  if Σ_{i=1}^{n} ci·xi > C.

We can observe that whenever a vector x satisfies the constraint, f̃(x) = f(x) holds. On the
other hand, if x violates the constraint, then f̃(x) = 0. Since the maximum we are looking
for is positive¹, this positive maximum of f̃(x) cannot be taken on a vector that
violates the constraint, as f̃(x) = 0 for those vectors, by definition. Thus, if we solve

max_{x∈B} f̃(x)

by B&B, then the optimum provides us with the optimum of the Knapsack problem.
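As a small illustration, the penalty-function construction can be written out directly (a sketch; the function names are ours):

```python
def make_penalized_objective(p, c, C):
    # f~(x): the profit sum when the budget constraint holds, 0 otherwise,
    # so an unconstrained maximizer of f~ never violates the constraint
    # (assuming the constrained maximum is positive).
    def f_tilde(x):
        if sum(ci * xi for ci, xi in zip(c, x)) <= C:
            return sum(pi * xi for pi, xi in zip(p, x))
        return 0
    return f_tilde

f_tilde = make_penalized_objective(p=[3, 4], c=[2, 3], C=4)
# f_tilde((1, 1)) == 0 because the cost 2 + 3 = 5 exceeds the budget C = 4.
```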

2. Finding a good upper bound function

In B&B we use an upper bound function Uk (b) that provides an upper bound on the partial
optimum Fk (b) (see the Lecture Note about B&B).
What will Fk(b) be in our case? If we fix the first k variables at values b1, . . . , bk ∈ {0, 1},
then we get the following new Knapsack problem:

max Σ_{i=k+1}^{n} pi·xi + Σ_{i=1}^{k} pi·bi

Subject to

Σ_{i=k+1}^{n} ci·xi ≤ C − Σ_{i=1}^{k} ci·bi

xi ∈ {0, 1}, i = k + 1, . . . , n

Fk (b) is the optimum of this task. How do we get the upper bound function Uk (b)? We can
take the LP relaxation of this ILP, replacing the constraint xi ∈ {0, 1} by 0 ≤ xi ≤ 1. Thus,
Uk (b) is the optimum solution of the LP
¹ Assuming each item alone fits in the knapsack, since we can a priori exclude those that do not.

Uk(b) = max Σ_{i=k+1}^{n} pi·xi + Σ_{i=1}^{k} pi·bi

Subject to

Σ_{i=k+1}^{n} ci·xi ≤ C − Σ_{i=1}^{k} ci·bi

0 ≤ xi ≤ 1, i = k + 1, . . . , n

3. Fast computation of the upper bound function

The whole B&B approach makes sense only if we can compute the upper bound function
significantly faster than the original task. In our case we face the problem: how do we quickly
solve the above LP that defines Uk (b)? Fortunately, this LP has a special structure: it is the
continuous version of the Knapsack problem. By continuous we mean that xi ∈ {0, 1} has
been replaced by 0 ≤ xi ≤ 1. For such a continuous knapsack it is known that the optimum
can be found by a fast greedy algorithm, as follows.
Order the variables according to decreasing pi /ci ratio. Let us call it preference ordering.
The most preferred variable is the one with the highest pi /ci ratio. Consider the variables
starting by the most preferred one, and assign them the value 1, as long as it is possible
without violating the budget constraint. (Note that in the considered subproblem the budget
is C − Σ_{i=1}^{k} ci·bi. If it happens to be negative, then there is no solution to the subproblem,
and we can represent it by Uk(b) = −1, so that this branch of the search tree will surely
be cut down.) When we cannot continue assigning 1 anymore without violating the budget,
then assign a fractional value to the next variable, so that the remaining part of the budget,
if any, is exactly used up. The remaining variables, if any, take 0 as their value. One can
prove that this value assignment gives the optimum solution to the continuous Knapsack
problem. Therefore, we have a fast way of computing Uk (b).

Thus, we have all the ingredients to apply B&B. We aim at computing

max_{x∈B} f̃(x)

and whenever the algorithm calls for Uk (b), we can compute it by the fast algorithm described
above.
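The greedy computation of Uk(b) can be sketched as follows (a sketch under the assumption that all ci > 0; the function names and the instance are ours):

```python
def continuous_knapsack_opt(p, c, budget):
    # Greedy optimum of the continuous knapsack: take items in decreasing
    # p_i/c_i order, filling the last one that fits only fractionally.
    total = 0.0
    for pi, ci in sorted(zip(p, c), key=lambda t: t[0] / t[1], reverse=True):
        if ci <= budget:
            total += pi
            budget -= ci
        else:
            total += pi * budget / ci   # fractional item uses up the budget
            break
    return total

def U(k, b, p, c, C):
    # Upper bound U_k(b): profit of the fixed prefix plus the LP optimum
    # over the free variables; -1 signals an infeasible subproblem.
    rest = C - sum(c[i] * b[i] for i in range(k))
    if rest < 0:
        return -1.0
    return sum(p[i] * b[i] for i in range(k)) + continuous_knapsack_opt(p[k:], c[k:], rest)

# Classic instance: the continuous optimum 240 exceeds the integer optimum 220,
# so U is a valid (and here strict) upper bound.
ub = U(0, (), [60, 100, 120], [10, 20, 30], 50)
# ub == 240.0
```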

Sample question about the Branch and Bound Algorithm

Which of the following describes best the definition of the functions Fk(b) used in
the Branch and Bound algorithm:

1. These functions can be computed recursively and then the original optimum is
obtainable by calling the function Fk(b) with k = 0; b = 0.

2. Via these functions the original optimization problem is replaced by a sequence


of simpler ones.

3. Fk(b) is the optimum value of the original function f(x) over a restricted set, in
which the first k coordinates of x are fixed according to b.

4. These functions make it possible to incorporate an upper bound function into the exhaustive
search. As a result, possibly large parts of the exhaustive search can be eliminated, by
eliminating the recursive call when it cannot bring an improvement.
Branch and Cut Algorithm

It is the combination of the Cutting Plane and Branch and Bound algorithms. We show the
principle below through an example.

[Figure omitted: worked example.]

We can then recursively repeat the algorithm for the subproblem.
Dynamic Programming

The general idea

Dynamic programming is essentially a recursive algorithm, which is often


useful if the problem can be represented as a parametrized set of subproblems,
such that each subproblem is reducible to others with smaller parameters.

The formulation can yield an efficient algorithm if the following two condi-
tions are satisfied:

• The number of different subproblems is not too large (does not grow
exponentially).

• The iterative reductions finally reduce the task to some initial cases for
which the solution is already known or directly computable.

We show the principle through an example.

Example

A telecom company wants to create an extension plan for its network, year by
year over a time horizon of T years, such that the network will have enough
capacity to keep up with the growing demand in every year, and the total
cost is minimum. The following information is available as input:

• cost[t, k] is the known cost we have to pay if in year t we extend the


capacity by k units (k is assumed to be an integer).

• demand[t] is the known capacity demand in year t.


Goal: find an extension plan by specifying how much new capacity has to
be bought in each year, such that in every year the accumulated capacity
satisfies the demand, and the total cost is minimum.

Solution

For easy description, let us use the following notations:

• k will denote the amount of capacity.

• The time (year) will be denoted by t. This variable runs over a time
horizon T, that is t = 1, . . . , T.

• cost[t, k] denotes the known cost we have to pay if in year t we buy k


units of capacity.

• demand[t] denotes the known demand for year t.

• Let A[t, k] be the minimum achievable accumulated cost of a decision


sequence up to the end of year t, if it achieves a total capacity of k.

Let us now formulate the recursive algorithm:

The optimal accumulated cost up to the end of year t with accumulated


capacity k can be obtained recursively as follows.

If the last amount of capacity we have bought is r and now we have altogether
k, then by the end of the previous year we must have had k − r. If we already
know the optimum cost up to year t − 1 for all possible values of the capacity,
then we can express the updated total cost as

A[t, k] = A[t − 1, k − r] + cost[t, r].


Since r is a free parameter, the updated optimum will be the minimum with
respect to r:
A[t, k] = min_r { A[t − 1, k − r] + cost[t, r] }

where the minimum is taken over all possible values of r ≤ k that correspond
to the possible amounts of capacity we can buy.

Of course, k capacity in year t is acceptable only if k ≥ demand[t], that is,
the accumulated capacity satisfies the demand. To exclude those values that
violate this requirement, we can prevent using them by assigning infinite cost
(or some very large number) to these cases. Thus, the final recursive formula
will be

A[t, k] = ∞                                        if k < demand[t]
A[t, k] = min_r { A[t − 1, k − r] + cost[t, r] }   if k ≥ demand[t]

We can apply the above recursive formula whenever t > 1. But how do
we handle the first year, that is, how do we start the recursion? When
t = 1, there is no previously accumulated capacity yet, so then we simply take
A[1, k] = cost[1, k] for every k.

In this way the A[t, k] values can be computed recursively for t = 1, . . . , T.

Having done the iteration, the optimum cost will be

OPTCOST = min_{k ≥ demand[T]} A[T, k]

Comment:

This systematic solution solves the problem in time bounded by a polynomial


of the time horizon length (T ) and the maximum value of k (capacity) as
opposed to the full decision tree where the running time would grow expo-
nentially.
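The recursion can be implemented in a few lines of Python (a sketch; we index years and capacities from 0, and we also apply the demand check in the first year, which the initialization A[1, k] = cost[1, k] leaves implicit):

```python
import math

def min_expansion_cost(cost, demand, kmax):
    # cost[t][r]: price of buying r new units in year t (0-indexed years);
    # demand[t]: capacity demand in year t; kmax: largest capacity considered.
    T = len(demand)
    # First year: A[k] = cost of buying k units outright (inf if demand unmet).
    A = [cost[0][k] if k >= demand[0] else math.inf for k in range(kmax + 1)]
    for t in range(1, T):
        nxt = [math.inf] * (kmax + 1)
        for k in range(demand[t], kmax + 1):   # "infinite cost" below demand[t]
            # last purchase r leaves k - r accumulated by the previous year
            nxt[k] = min(A[k - r] + cost[t][r] for r in range(k + 1))
        A = nxt
    return min(A[demand[-1]:])

# Tiny instance: 2 years, capacity up to 3 units; cost[t][r] includes r = 0.
cost = [[0, 3, 5, 6],
        [0, 2, 4, 7]]
demand = [1, 2]
opt = min_expansion_cost(cost, demand, kmax=3)
# opt == 5 (buy 2 units in year 1; or buy 1 unit in each year)
```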
Constraint Aggregation

In the constraint aggregation method we replace several constraints by a
single one. Let us show the technique through the example of the knapsack
problem. The original knapsack problem is formulated as follows.

max Z = Σ_{i=1}^{N} pi·xi

Subject to

Σ_{i=1}^{N} ci·xi ≤ C

xi ∈ {0, 1}, i = 1, . . . , N

Let us assume now that instead of a single inequality constraint we have two.
Assume further that all constants are nonnegative integers. Thus, the task
becomes this:

max Z = Σ_{i=1}^{N} pi·xi

Subject to

Σ_{i=1}^{N} ai·xi ≤ A

Σ_{i=1}^{N} bi·xi ≤ B

xi ∈ {0, 1}, i = 1, . . . , N

A possible motivation, referring to the Capital Budgeting problem, is that
there are two kinds of costs associated with each item, and the budgets
for the different types come from different sources, so they may have to be
obeyed separately. Or, in the knapsack terminology, each item has a size and
a weight parameter, and the knapsack can carry only a certain total size and
a certain total weight, so we have to satisfy both constraints.

Let us first convert the inequality constraints into equations, by introducing
slack variables. Let x_{N+1} be the slack variable in the first inequality, and
x_{N+2} in the second. Further, let a_{N+1} = 1, b_{N+1} = 0, b_{N+2} = 1. In this way
we can simply include the slack variables in the summation, without extra
terms. Moreover, we can observe that the size of the “gap” the slack variable
has to fill can be at most A in the first constraint, and at most B in the
second.

max Z = Σ_{i=1}^{N} pi·xi

Subject to

Σ_{i=1}^{N+2} ai·xi = A

Σ_{i=1}^{N+2} bi·xi = B

xi ∈ {0, 1}, i = 1, . . . , N
x_{N+1} ∈ {0, 1, . . . , A}
x_{N+2} ∈ {0, 1, . . . , B}

Now the question is this: can we replace the two equality constraints above
by a single one, so that this single constraint is equivalent to the joint effect
of the two? By equivalence we mean that they generate the same set of
feasible solutions.
The answer to the above question is yes, even though it may seem counterintuitive
at first. Let us define the parameters of the new constraint by

ci = ai + M·bi

for every i, and

C = A + M·B

where M is a constant that we are going to choose later.

It is clear that the two constraints


Σ_{i=1}^{N+1} ai·xi = A    (1)

Σ_{i=1}^{N+2} bi·xi = B    (2)

imply the new one


Σ_{i=1}^{N+2} (ai + M·bi)·xi = A + M·B    (3)

(taking a_{N+2} = 0), since (3) is just a linear combination of (1) and (2). The
question, however, is this: can we choose M such that the implication also
holds in the opposite direction, that is, such that the aggregated constraint
(3) implies (1) and (2)?

The answer is yes. To see it, let us rearrange (3) in the following form:
(Σ_{i=1}^{N+1} ai·xi − A) + M·(Σ_{i=1}^{N+2} bi·xi − B) = 0    (4)

where Y denotes the first term in parentheses and Z the second.

Now observe that the value of

Y = Σ_{i=1}^{N+1} ai·xi − A

must be at least −A, and cannot be more than Σ_{i=1}^{N} ai + A − A = Σ_{i=1}^{N} ai.
So we have the bound

|Y| ≤ A′ = max{A, Σ_{i=1}^{N} ai}.

Using Z = Σ_{i=1}^{N+2} bi·xi − B, we can rewrite (4) as

Y + M·Z = 0.    (5)

How can this equality hold? One possibility is that Y = Z = 0. In that case,
(1) and (2) are both satisfied, since
Y = Σ_{i=1}^{N+1} ai·xi − A    and    Z = Σ_{i=1}^{N+2} bi·xi − B.

On the other hand, if Z ≠ 0, then |Z| ≥ 1, being an integer. In this case Y
must make the left-hand side 0 in (5). If, however, we choose M = A′ + 1,
then it becomes impossible, since with this choice we have M·|Z| ≥ A′ + 1,
and we already know |Y| ≤ A′.

Thus, with the choice of M = A′ + 1, we can achieve that whenever the
aggregated constraint (3) is satisfiable, both original constraints (1) and (2)
will also be satisfiable. Of course, this is only guaranteed when the variables
take their values from the allowed ranges.

Remark: We can assume that A < Σ_{i=1}^{N} ai, since otherwise the original
constraint Σ_{i=1}^{N} ai·xi ≤ A would always be satisfied, so we could simply leave
it out. With this assumption the choice for the constant M becomes

M = 1 + Σ_{i=1}^{N} ai.
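A brute-force check of the equivalence on a small instance (the numbers are ours, chosen only for illustration):

```python
from itertools import product

# N = 3 items; slack variables x4 in {0..A} and x5 in {0..B} already added,
# with a4 = 1, a5 = 0 and b4 = 0, b5 = 1 as in the text.
a = [2, 1, 3, 1, 0]
b = [1, 2, 1, 0, 1]
A, B = 4, 3

M = 1 + sum(a[:3])                 # M = 1 + sum of the original a_i = 7
c = [ai + M * bi for ai, bi in zip(a, b)]
C = A + M * B

def satisfies_pair(x):
    return (sum(ai * xi for ai, xi in zip(a, x)) == A and
            sum(bi * xi for bi, xi in zip(b, x)) == B)

def satisfies_agg(x):
    return sum(ci * xi for ci, xi in zip(c, x)) == C

# Over the allowed ranges, the aggregated equality accepts exactly the
# vectors accepted by the two original equalities.
domain = list(product([0, 1], [0, 1], [0, 1], range(A + 1), range(B + 1)))
assert all(satisfies_pair(x) == satisfies_agg(x) for x in domain)
```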
Tabu Search

Let us first consider the simplest strategy to optimize a function f (x) over
a discrete domain x ∈ D, the (greedy) local search. This can be defined as
follows.

Let us define a neighborhood N (x) for each x ∈ D. The domain D, together


with the neighborhood definition, is often referred to as the search space.

For example, if D is the set of n-dimensional binary vectors, then an example


of a neighborhood definition is
N (x) = {y | y differs from x in at most 2 bits}. (1)

Then the local search can be defined as an iteration in which we always


move from the current point to its best neighbor, that is, to the one that
offers the most improvement in the function value in the neighborhood. If
no improvement is possible in the neighborhood, then the algorithm stops.

If we consider minimization, then local search can be visualized as sliding


down on the steepest slope until we reach a local minimum.

The critical problem of greedy local search is that it may easily get stuck in
a local optimum, which can be very far from the global optimum.

Tabu search offers an approach that diminishes (but does not eliminate!)
the danger of getting stuck in a local optimum. The fundamental principle
is summarized below.

• The algorithm maintains a tabu list that contains disallowed moves.


For example, a simple tabu list can contain the last k visited points,
for some constant k.

• The algorithm does local search, but it keeps moving even if a local
optimum is reached. Thus, it can “climb out” from a local minimum.
This would result, however, in cycling, since whenever it moves away
from a local minimum, the next step could take it back if only the
improvement is considered. This problem is handled by the rule that
the points on the tabu list cannot be chosen as next steps. In the tabu
list example mentioned above we can decrease the possibility of cycling
by disallowing recently visited points.

• The tabu list can also be more complicated than the example above.
The algorithm may also maintain more than one tabu list. For example,
one list may contain the last k visited points, while another list can
contain those points that are local minimums visited in the search. A
third list can contain, for example, those points from which the search
directly moved into a local minimum.

• The rule that the points in the actual tabu list cannot be visited
may be too rigid in certain cases. To make it more flexible, it is usually
allowed that under certain conditions a point may be removed from the
tabu list. These conditions are called aspiration criteria. For example,
if with respect to the current position a point on the tabu list offers
an improvement larger than a certain threshold, then we may remove
the point from the tabu list. Or, if all possible moves from the current
position would be prohibited by the tabu list(s), then we can remove
from the tabu list the point that offers the best move. In this way we
can avoid getting “paralyzed” by the tabu list.

• Stopping criterion. The search can, in principle, be continued indefi-


nitely. A possible stopping criterion can be obtained in the following
way. Let us keep (and refresh after each iteration) the best value that
has been achieved so far. If during a certain number of iterations no
improvement is found relative to the earlier best value, then we can

stop the algorithm.
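Putting the pieces together, a minimal tabu search over binary vectors might look like this (a sketch; the tabu list is simply the last few visited points, and the aspiration rule only fires when every move is tabu):

```python
from collections import deque

def tabu_search(f, n, start, tabu_len=8, max_iter=100):
    # Maximize f over n-bit vectors; neighbors differ in exactly one bit.
    x = tuple(start)
    best_x, best_val = x, f(x)
    tabu = deque([x], maxlen=tabu_len)         # the last tabu_len visited points
    for _ in range(max_iter):
        neighbors = [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(n)]
        allowed = [y for y in neighbors if y not in tabu]
        if not allowed:                        # aspiration: all moves are tabu,
            allowed = neighbors                # so permit the best one anyway
        x = max(allowed, key=f)                # keep moving, even to a worse point
        tabu.append(x)
        if f(x) > best_val:
            best_x, best_val = x, f(x)
    return best_x, best_val

# Toy objective: maximal (value 0) when exactly 3 of the 6 bits are set.
f = lambda x: -((sum(x) - 3) ** 2)
best_x, best_val = tabu_search(f, 6, (0,) * 6)
# best_val == 0
```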

Advantages of Tabu Search

• Improves greedy local search by avoiding getting trapped in a local


optimum too early.

• Offers flexibility by the possibility of adjusting the tabu list structure


and rules to the specific problem. If this is done well, good results can
be achieved.

Disadvantages of Tabu Search

• Offers only heuristic improvement, but does not guarantee anything in


a strict sense: there is no guaranteed convergence to a global optimum
(as opposed to simulated annealing).

• Usually requires heavy problem-specific adjustment and there are no


general rules that would tell how to do it, much is left to intuition.
Therefore, it often involves a significant algorithm design challenge.

A critical step: designing a good search space

Recall that the search space consists of an underlying domain, and a neigh-
borhood definition. The domain is usually given, as it is the set of the possible
objects in which we search an optimal one. But the neighborhood structure
is up to us to choose.

What can we expect from a good neighborhood definition?

• The arising neighborhoods are not too large, so that they can be
searched relatively easily.

• At the same time, the neighborhoods are also not too small, so that
they do not overly restrict the choices.

• We expect that the search space is connected, that is, every point is
reachable from every other one, via a sequence of neighbor to neighbor
moves. This is important, since otherwise it may be impossible to reach
the optimum simply because it may not be reachable from the initial
position at all.

In some cases these conditions are easily satisfied, see, e.g., the introductory
example of this lecture note, where the neighborhood is defined by (1). But
in other cases it may be somewhat harder to design the “right” search space.

Exercises

1. Assume we have an n-node complete graph, with non-negative weights on the
edges. The weights represent link delays. We want to design a network
that has a tree topology, spanning all the nodes, such that the average delay
between two uniformly chosen random nodes, when they are connected via
a path within the tree, is minimum. Here the delay of a path is the sum of
the link delays (edge weights) along the path. That is, we want to select a
spanning tree among all spanning trees of the complete graph, such that it
is optimal in the above sense. (This problem is known to be NP-hard.)

Task: Design a search space over the set of all spanning trees of the complete
graph, such that the neighborhood definition satisfies the above presented
three conditions for a good neighborhood structure.

2. Assume we have n nodes, and we want to design a ring network connecting
all the nodes. The cost of connecting any two given nodes is known, and we
look for the minimum cost ring. (This problem is known to be NP-hard.)

Tasks:

a.) Design a search space over the set of all possible rings, such that the
neighborhood definition satisfies the above presented three conditions for a
good neighborhood structure.

Hint: Use the result from algebra that every permutation can be represented
as the superposition of transpositions (transposition: a special permutation
in which only two entries are exchanged).

b.) Show that the search space can be designed in a way that, beyond
satisfying the three conditions, it is also regular. Regularity means that each
element (point) of the space has the same number of neighbors. This is
usually a desirable property, because it means that all points are equally
well connected, none of them is distinguished by having more neighbors than
others.

Simulated Annealing

Simulated annealing improves the greedy local search in the following way.
Assume the task is to minimize a function f (x).

• At each iteration we choose a point randomly from the neighborhood


of the current point. Let x denote the current point and let y be the
randomly chosen neighbor.

• If f (y) < f (x), that is, a better point is found, then we accept it and
y will be the next position.

• If y is not better, or is even worse than x, we still accept it with a certain
probability. This probability is computed as follows. Define a quantity
∆E by
∆E = f (y) − f (x).
This shows by how much y is worse than x. The notation is based on
a physical analogy in which the function value represents the energy of
a state in the state space and then ∆E is the energy difference. With
this the acceptance probability is defined as
Prob_accept = e^(−ΔE/(kT))

where k is a constant and T is a parameter called temperature (these


come again from the physical analogy, where k is called the Boltzmann
constant, but this plays no role in the optimization; here we can simply
take k = 1). Thus, the new point y is accepted with probability
Probaccept or we stay in x with probability 1 − Probaccept .

• As the iteration proceeds the temperature is gradually decreased. Thus,


instead of a constant T , we use a so called cooling schedule, which is a
sequence Tn , such that in the nth iteration Tn is used as the temperature
parameter. The cooling schedule is chosen such that Tn → 0 holds. A
typical choice is the logarithmic cooling schedule:
T_n = a / (b + log n)
where a, b are positive constants.

• As a stopping rule we can again say that if no improvement was obtained
over a certain number of iterations then the algorithm stops.
Or, we can continue the iterations until the temperature drops below
a small ε > 0, that is, the system “freezes”.

Interpretation of the Algorithm

An essential feature of simulated annealing is that it can climb out from a local
minimum, since it can accept worse neighbors as the next step. Such an
acceptance happens with a probability that is smaller if the neighbor’s quality
is worse. That is, with larger ΔE the acceptance probability gets smaller.
At the same time, with decreasing temperature all acceptance probabilities
get smaller. This means, initially the system is “hot”, makes more random
movement, it can relatively easily jump into worse states, too. On the other
hand, over time it “cools down”, so that worsening moves are accepted less
and less frequently. Finally, the system gets “frozen”.

This process can be interpreted as follows: initially, when the achieved quality
is not yet very good, we are willing to accept worse positions in order to be
able to climb out of local minima, but later, with improving quality, we tend
to reject the worsening moves more and more frequently. These tendencies
can be balanced and controlled by the choice of the cooling schedule. If it
is chosen carefully, then convergence to a global optimum can be guaranteed
(this has been proven mathematically).
Advantages of Simulated Annealing

• Improves greedy local search by avoiding getting trapped in a local optimum.

• With an appropriately chosen cooling schedule, the algorithm converges to the global optimum.

Disadvantages of Simulated Annealing

• Convergence to the optimum can be slow.

• There is no general way of estimating the number of iterations needed for a given problem, so we cannot easily guarantee that the result is within a certain error bound of the global optimum.
Exercise

Assume that in a Simulated Annealing algorithm we define the neighborhood
of an n-dimensional binary vector x as follows. Let w(x) denote the number
of 1 bits in x. The binary vector y is a neighbor of x if and only if
|w(x) − w(y)| = α min{w(x), w(y)} holds, where α is a constant that we
can choose. We want to choose the value of α such that the state space
becomes connected, that is, any vector can be reached from any other one
through some sequence of moves, going from neighbor to neighbor. Which
of the following is correct?

1. If α = 1, then the state space will be connected.

2. If α ≥ 2, then the state space will be connected.

3. The state space will be connected for any α > 0.

4. No matter how α is chosen, this state space will never be connected.

5. None of the above.


Genetic Algorithm

The genetic algorithm applies concepts from evolutionary biology and at-
tempts to find a good solution by mimicking the process of natural selection
and the “survival of the fittest”. The algorithm can be outlined as follows.

• The potential solutions are viewed as members of a population. Each
  such member can be described, for example, by a vector. Initially we
  can start with a random population, i.e., a set of random vectors.

• In each iteration certain operations are executed that randomly change
  the population. Two fundamental operations are:

– Mutation. This means that some randomly chosen individuals
  (vectors) undergo random changes. For example, some of their
  bits (in case of binary vectors) are reversed.
– Crossover. Two randomly selected individuals (vectors) “mate”
  and create offspring that inherit the combined properties of the
  parents. For example, the components of the offspring vectors are
  obtained by swapping some components of the parent vectors.

• After each round of random changes the fitness of each individual is
  evaluated by a fitness function that measures how good each solution
  vector is. Those that are not fit enough die, while the fit ones
  survive.

• The population evolves through such iterations. After a large number
  of iterations one can hope that the arising population already contains
  good solutions with high fitness value.
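The outline above can be turned into a minimal code sketch. The population size, mutation probability, and one-point crossover below are illustrative choices, and the "one-max" objective (count of 1 bits) is just a toy fitness function:

```python
import random

def genetic_algorithm(fitness, length, pop_size=30, generations=200,
                      p_mut=0.05, seed=0):
    """Evolve a population of binary vectors toward high fitness values."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Crossover: pair up random parents, swap suffixes at a random cut point.
        offspring = []
        for _ in range(pop_size // 2):
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, length)
            offspring += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Mutation: flip each offspring bit independently with probability p_mut.
        for child in offspring:
            for i in range(length):
                if rng.random() < p_mut:
                    child[i] = 1 - child[i]
        # Selection: keep the pop_size fittest among parents and offspring.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return pop[0]

# Toy example: maximize the number of 1 bits ("one-max").
best = genetic_algorithm(sum, length=20)
```

On this toy objective the elitist selection (keeping the fittest of parents and offspring together) makes the best fitness non-decreasing, so the population steadily accumulates 1 bits.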

Advantages of Genetic Algorithms

• Offers a chance that the initial solutions evolve into good ones that are
close to optimal.

• The evolving population tends to have more and more fit individuals,
so the algorithm may find many good solutions, not just one.

Disadvantages of Genetic Algorithms

• Nothing is guaranteed formally (usual problem with heuristic algorithms).

• There is no general way of estimating the number of iterations needed for a problem, so we cannot easily decide how close the result is to a global optimum.

Exercise

Assume that in a Genetic Algorithm the fitness of each individual member of
the population, in each iteration, can be modeled as a random variable that
is uniformly distributed in the interval [0, 1]. Further assume that for each
individual this random variable is independent from all the others, in each
iteration. We are looking for a solution (an individual) with as high a fitness
value as possible. The initial population has size n. After each iteration of
modifying the population via mutation and crossover operations we only keep
the n fittest individuals; the rest “die”. Which of the following is correct?

1. The maximum fitness value found in the population will converge to 1,
   as the number of iterations grows. The reason is that the fittest individual
   in the current population always survives when we keep only the n
   fittest. Since the fitness in our case is modeled as an independent, uniformly
   distributed random variable in [0, 1], the probability that we have
   an individual with fitness in [1 − ε, 1] tends to 1 for every fixed
   0 < ε < 1, as the number of iterations grows.

2. Since there is no general performance guarantee for the Genetic Algorithm, we can never guarantee any kind of convergence.

3. The convergence of the maximum occurring fitness to 1 under the given assumptions is only guaranteed if n also grows with the iterations.

Solution (with explanation): Let p denote the probability that in a given
iteration there is an individual with fitness in [1 − ε, 1]. Under the given
assumptions, in any iteration, the value of p can be computed this way:

                        p = 1 − (1 − ε)^n .                    (1)

The reason is that (1 − ε)^n is the probability that all the n individuals have
their fitness in the interval [0, 1 − ε). Then 1 − (1 − ε)^n is the probability of
the complement event, which is the event that not all have their fitness in
[0, 1 − ε), that is, at least one must fall in [1 − ε, 1]. Answer 1 claims that for
every fixed ε > 0 we have p → 1. From formula (1), we see that this would
require (1 − ε)^n → 0. However, for a constant n this is not satisfied. On
the other hand, if n → ∞, then (1 − ε)^n → 0 indeed holds. Therefore, the
correct choice is Answer 3.
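The closed form (1) is easy to validate by simulation. The values n = 10, ε = 0.1 and the trial count below are arbitrary choices made only for illustration:

```python
import random

def estimate_p(n, eps, trials=200_000, seed=1):
    """Estimate Pr(max of n iid Uniform[0,1] draws falls in [1 - eps, 1])."""
    rng = random.Random(seed)
    hits = sum(max(rng.random() for _ in range(n)) >= 1 - eps
               for _ in range(trials))
    return hits / trials

n, eps = 10, 0.1
exact = 1 - (1 - eps) ** n      # formula (1): about 0.6513
approx = estimate_p(n, eps)     # agrees with the formula up to Monte Carlo noise
```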

CS 6385 Final Exam Sample Questions with Answers

RULES:
Exactly one answer is correct for each multiple choice question. Encircle the
number of the chosen answer. You have two options for each multiple choice
question:
(a) Select one answer. If it is correct, then it is worth 1 point, otherwise 0.
(b) You may select 2 answers. If the correct one is among them, then it is worth
1/2 point, otherwise 0. This allows partial credit for multiple choice questions.
Note that by selecting 2 answers you may double your chance to hit the correct
one, but at the price of getting only half of the credit for this question.
Your choice of the answer(s) should be clear and unambiguous. If ambiguity
remains for a question, then it will be counted as an unanswered question. The
instructor cannot give any hint during the exam about the answers, so please
refrain from asking such questions. It is an open-notes exam, but no other help
can be used. In particular, no device with communication capability (laptop,
cellphone, etc.) can be used.

Part A : MULTIPLE CHOICE QUESTIONS

1 The reliability of a series configuration

1. is determined by the least reliable component.


2. can be larger than any of the component reliabilities.
3. cannot be larger than the reliability of the least reliable component.
4. is always strictly smaller than the reliability of the most reliable component.
5. is the average of the component reliabilities.

Answer: 3

2 We observe that for a component the probability of being operational at some given time
t is the same as at time t + 5. Which of the following is true about potential failures:

1. No failure can occur between t and t + 5.


2. Failure can still occur between t and t + 5, just it is equally likely everywhere in
the [t, t + 5] interval.
3. The given information alone is not sufficient to determine whether failures can
occur in the [t, t + 5] interval.
4. The lifetime of the component cannot be more than t.
5. The lifetime of the component cannot be more than t + 5.
6. None of the above.

Answer: 1
Justification: The probability of being operational at time t is given by the survival
function S(t), since S(t) = 1 − F (t) = Pr(T ≥ t), that is, it is the probability that
the component has not failed before t. In our case the condition says S(t) = S(t + 5).
Therefore, we must also have F (t) = F (t + 5). Since the probability distribution
function F (t) cannot decrease, therefore, it must be constant in the entire [t, t + 5]
interval. (If it were not constant, then at some point in the interval it must differ from
F (t). If it is smaller there than F (t), then it must have decreased from F (t), which is
not possible. If it is larger, then it must decrease again to get back to F (t) = F (t + 5),
which is also not possible.) Thus, F (t) is constant in the entire interval, which implies
that its derivative, the probability density function f (t), must be 0. But that means,
no failure can occur in the interval, since the probability of failure in an interval is given
by the integral of f (t) over the interval, which is 0 in our case.

3 In a mixed ILP x is a continuous variable and y is a discrete variable with y ∈ {0, 1}.
We would like to express both of the following conditions via linear constraints: (a) if
y = 0 then x = 0; and (b) if y = 1 then −1 ≤ x ≤ 1. Which of the following does it
correctly:
1. xy = 0, −1 ≤ x ≤ 1
2. x = y = 0 or −1 ≤ x ≤ 1
3. x + y ≥ 0, x − y ≤ 0
4. −1 − y ≤ x + y ≤ 1 − y
5. y(−1 − y) ≤ x ≤ y(1 − y)
6. None of the above.

Answer: 3
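To see why answer 3 is correct, note that the two constraints together say −y ≤ x ≤ y. A small numeric check (the helper name and the grid of x values are just for illustration):

```python
def satisfies(x, y):
    """The constraints of answer 3: x + y >= 0 and x - y <= 0, i.e. -y <= x <= y."""
    return x + y >= 0 and x - y <= 0

xs = [i / 10 for i in range(-20, 21)]   # grid of candidate x values
# y = 0 forces x = 0, as required by condition (a):
assert all(satisfies(x, 0) == (x == 0) for x in xs)
# y = 1 allows exactly -1 <= x <= 1, as required by condition (b):
assert all(satisfies(x, 1) == (-1 <= x <= 1) for x in xs)
```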

4 We observe that a component has a survivor function that satisfies 0 < S(t1 ) = S(t2 ) < 1
for some given time instants t1 < t2 . Somebody claims this implies that the hazard
function h(t) must be 0 everywhere in the interval [t1 , t2 ]. Which of the following is
true about this claim:

1. The claim cannot hold in general, since we only have information at the time
instants t1 , t2 , but we do not know anything about what happens between t1 and
t2 .
2. The claim may or may not be true, depending on how S(t) behaves between t1
and t2 .
3. The claim is always true, because S(t) = 1−F (t), and the probability distribution
function F (t) cannot decrease, so S(t) cannot increase. Therefore, S(t1 ) = S(t2 )
implies that S(t) must be constant in the whole [t1 , t2 ] interval, which yields that
h(t) = 0 holds everywhere in the interval, due to h(t) = −S ′ (t)/S(t).
4. If h(t) = 0 for every t ∈ [t1 , t2 ], it means that there is no risk (hazard) of failure
in the interval. Since we know that S(t2 ) < 1, therefore, the operational status
of the component at the end of the interval is not guaranteed, having probability
less than 1. Thus, there must be some risk of failure before the interval ends, so
h(t) = 0 cannot hold in the entire interval, making the claim wrong.

Answer: 3

5 A network has 3 nodes and there is an undirected link between each pair of nodes. Each
link is operational independently of the others with probability p, 0 < p < 1, while the
nodes are always up. The system is considered operational if the network topology is
connected. Which of the following is true about the reliability configuration:
1. This is a series configuration, because each link should be up to make the system
operational.
2. Since it is enough if two links are up to preserve connectivity, and it is not neces-
sary that all are operational, therefore, this is a parallel configuration.
3. It is a k out of N configuration with k = 2 and N = 3.
4. Since one link has to be operational, and at least one of the other two should
be up, too, therefore, it can be regarded as a combination of series and parallel
configurations.
5. None of the above.

Answer: 3

6 Consider the reliability of a series configuration with n components and the reliability of
each component is p = 1 − 1/n2 . What happens if n grows very large (n → ∞)?
1. The reliability of the configuration tends to 0.
2. The reliability of the configuration tends to 1.
3. The reliability of the configuration tends to a constant that is strictly between 0
and 1.
4. The reliability of the configuration does not tend to any number, because the
limit does not exist.

Answer: 2
Justification: Since it is a series configuration, we have

                        R = (1 − 1/n^2)^n .

Using that (1 − 1/n^2)^(n^2) → e^(−1), it can be reformulated as

        R = (1 − 1/n^2)^n = [ (1 − 1/n^2)^(n^2) ]^(1/n) ∼ e^(−1/n) → 1.
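The limit can also be seen numerically; the sample values of n below are arbitrary:

```python
# Series reliability R = (1 - 1/n^2)^n for growing n.
Rs = [(1 - 1 / n**2) ** n for n in (10, 100, 1_000, 10_000)]
# The values increase toward 1 (roughly 0.904, 0.990, 0.999, 0.9999).
```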

7 Assume that in a Simulated Annealing algorithm the state space consists of all n-
dimensional binary vectors for some fixed n ≥ 2. We want to decide whether the state
space is connected, that is, every vector can be reached from every other via a sequence
of neighbor-to-neighbor moves. Let w(x) denote the number of 1 bits in x (the weight
of x). The binary vectors x, y are defined neighbors if and only if w(x) + w(y) = n
holds. Which of the following is correct with this neighborhood definition?
1. The state space will not be connected for any n ≥ 2.
2. The state space will always be connected for any n ≥ 2.
3. If n ≥ 2, then the state space will be connected for any even n, but not for odd
n.
4. If n ≥ 2, then the state space will be connected for any odd n, but not for even
n.
5. None of the above.

Answer: 1
Justification: Consider the vector x = 0 that contains only 0 bits. Its weight is 0.
It can be a neighbor only of the vector y = 1, which consists of only 1 bits, and its
weight is n, since otherwise the weight sum could not be n. Since no other vector has 0
or n weight, these two vectors are only neighbors of each other, but are not connected
to any other vector. Since for n ≥ 2 the state space contains more than 2 vectors, it
cannot be connected.
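The justification can be confirmed by brute force for a small n. The sketch below builds the w(x) + w(y) = n neighbor relation for n = 3 and runs a breadth-first search from the all-zero vector:

```python
from itertools import product
from collections import deque

n = 3
vectors = list(product([0, 1], repeat=n))

def neighbors(x):
    # x and y are neighbors iff w(x) + w(y) = n, where w counts the 1 bits
    return [y for y in vectors if y != x and sum(x) + sum(y) == n]

start = (0,) * n
seen, queue = {start}, deque([start])
while queue:                      # BFS over the neighbor relation
    for y in neighbors(queue.popleft()):
        if y not in seen:
            seen.add(y)
            queue.append(y)

reachable = len(seen)             # only (0,0,0) and (1,1,1) are reached: 2 of 8
```

Only the all-zero and all-one vectors are mutually reachable, so the state space is indeed disconnected, in line with the justification above.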

Part B : PROBLEM SOLVING

8 In a parallel reliability configuration we have n components, connected in parallel,
  and each component has reliability p > 0. Let us now change the system by
  doubling the number of components and keeping p at the same value. That is, we
  use 2n instead of n, but p remains the same. Is it possible that the reliability of
  the entire system will also double as a result of this change? Justify your answer!

Solution
If the reliability doubles, then we have

1 − (1 − p)^(2n) = 2[1 − (1 − p)^n ].

Let us introduce a new variable x = (1 − p)^n . With this the above formula becomes

1 − x2 = 2 − 2x.

After rearranging, we get


x2 − 2x + 1 = 0
which is equivalent to
(x − 1)2 = 0.
This can only hold if x = 1, which implies p = 0, since x = (1 − p)^n . That,
however, contradicts the condition p > 0. Thus, we arrived at a contradiction,
so the reliability cannot double.
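The algebraic conclusion can be sanity-checked numerically for a few sample values of p and n:

```python
# Parallel reliability R(n) = 1 - (1 - p)^n; check that doubling n never doubles R.
for p in (0.1, 0.5, 0.9):
    for n in (1, 2, 5, 20):
        R_n = 1 - (1 - p) ** n
        R_2n = 1 - (1 - p) ** (2 * n)
        assert R_2n < 2 * R_n   # strict inequality, matching (x - 1)^2 > 0 for x != 1
```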

9 A company wants to install switches at n sites, at most one at each site. 4 types
of switches are available: Type-1, Type-2, Type-3 and Type-4, each at a cost of
c1 , c2 , c3 and c4 , respectively. There is a restriction, however, that we can use
altogether at most two different types out of the four, that is, the number of
types used in the entire system (not just at a single site) can be at most two. If
a switch is installed at a site, it generates a profit of p1 , p2 , p3 or p4 , respectively,
depending on its type. There is an available budget of C. Formulate as an integer
linear programming problem the following task: find an installation plan that
maximizes the total profit, such that the total cost does not exceed the available
budget, and altogether at most two different types of switches are used in the
entire system.

Solution
Let xij ∈ {0, 1} indicate whether at site i a switch of Type-j is installed or not.
Furthermore, let yj indicate if a Type-j switch is used somewhere in the system
or not (j = 1, 2, 3, 4). Then the task can be formulated as
max Σ_{i=1}^{n} (p1 xi1 + p2 xi2 + p3 xi3 + p4 xi4)

subject to

Σ_{i=1}^{n} (c1 xi1 + c2 xi2 + c3 xi3 + c4 xi4) ≤ C

xi1 + xi2 + xi3 + xi4 ≤ 1   (∀i)

xij ≤ yj   (∀i, j)

y1 + y2 + y3 + y4 ≤ 2

xij , yj ∈ {0, 1}   (∀i, j)
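For a tiny instance the formulation can be cross-checked by brute force. The costs, profits, and budget below are made-up sample data; we enumerate every assignment of {no switch, Type-1, ..., Type-4} to n = 3 sites and keep the best feasible plan:

```python
from itertools import product

n = 3
costs = [4, 3, 5, 2]        # c1..c4 (sample data)
profits = [7, 5, 9, 3]      # p1..p4 (sample data)
C = 9                       # budget (sample data)

best = None
for plan in product(range(5), repeat=n):          # 0 = no switch, 1..4 = type
    used_types = {t for t in plan if t > 0}
    if len(used_types) > 2:                       # at most two types system-wide
        continue
    cost = sum(costs[t - 1] for t in plan if t > 0)
    profit = sum(profits[t - 1] for t in plan if t > 0)
    if cost <= C and (best is None or profit > best[0]):
        best = (profit, plan)
```

With this sample data the optimum places one Type-3 and one Type-1 switch (profit 16 at cost 9); an ILP solver applied to the formulation above would return the same optimum.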

MULTIPLE CHOICE QUESTIONS

2 Assume the variables x, y occur in a linear programming task. We would like to add the
new constraint |2x| + |3y| ≤ 3 to the LP. Which of the following formulations does it
correctly, given that we must keep the formulation linear:

1. if (x ≥ 0 and y ≥ 0) then 2x + 3y ≤ 3 else −2x − 3y ≤ 3


2. 2x + 3y ≤ 3, −2x − 3y ≤ 3
3. x + y ≤ 3, −x + y ≤ 3, x − y ≤ 3, −x − y ≤ 3
4. 2x + 3y ≤ 3, −2x + 3y ≤ 3, 2x − 3y ≤ 3, −2x − 3y ≤ 3
5. x ≤ 3, y ≤ 3, 2x + 3y ≤ 3
6. x ≤ 3, y ≤ 3, 2x + 3y ≤ 3, −2x − 3y ≤ 3

Correct answer: 4.
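Answer 4 works because |2x| + |3y| is exactly the maximum of ±2x ± 3y over the four sign combinations, so bounding all four linear expressions by 3 is equivalent to the absolute-value constraint. A grid check (the grid is an arbitrary illustration):

```python
def abs_form(x, y):
    return abs(2 * x) + abs(3 * y) <= 3

def linear_form(x, y):
    # answer 4: all four sign combinations of 2x + 3y are bounded by 3
    return all(s * 2 * x + t * 3 * y <= 3 for s in (1, -1) for t in (1, -1))

grid = [i / 4 for i in range(-10, 11)]
assert all(abs_form(x, y) == linear_form(x, y) for x in grid for y in grid)
```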

3 Consider the maximum flow problem from a source node s to a destination node t in a
directed graph. Assume we were able to find a flow that pushes a total of 10 units of
flow through the graph from s to t. (But it may not be a maximum flow.) Which of
the following is correct:

1. If originally each edge has capacity at most 5 and we increase the capacity of each
edge by 1, then the maximum s-t flow will necessarily be at least 12.
2. If we remove all edges from the graph on which the found flow is 0, then the
minimum cut capacity between s and t in the new graph necessarily remains the
same as in the original graph.
3. If there is a cut in the graph that separates s and t, and its capacity is 11, then
the found flow was not maximum.
4. If the found flow is maximum and all edge capacities are integers, then there must
be at least 10 edges that are saturated by the flow, that is, they are filled up to
their capacities.

Correct answer: 1.

5 Assume a large network with undirected links has minimum cut size 4, that is, it cannot
be disconnected by removing less than 4 links, but it can be disconnected by removing
4 links. We would like to disconnect the network in the following strong sense: we
remove enough links so that the network falls apart into at least 3 components. In
other words, the disconnected network cannot be made connected again by adding a
single link. To achieve this strong disconnection, which of the following is true:

1. It is always enough to remove the links of a minimum cut.


2. It is always enough to remove the links of a minimum cut plus one additional link.
3. It is always enough to remove a number of links that is at most twice the size a
minimum cut.
4. We may have to remove a number of links that can be arbitrarily higher than the
size of a minimum cut.

Correct answer: 4.

PROBLEM SOLVING

9 Consider a network with directed links, and each link in this network has 1 Mb/s capacity.
Assume that from a source node s to a terminal node t the network can carry a flow of
20 Mb/s, but not more. Let us call a link critical if it has the following property: if the
link is removed, then the network cannot carry the 20 Mb/s flow from s to t anymore.

a) Is it possible that this network contains less than 20 critical links? Justify your
answer.

Answer: No, it is not possible. Let us consider a maximum flow, which has
value 20 Mb/s, according to the conditions. If we look at a minimum cut which
separates s and t, then, due to the max flow min cut theorem, this cut must have
capacity 20 Mb/s. Since each link has 1 Mb/s capacity, there must be 20 links in
the cut. If we remove any of these links, the capacity of the cut goes down to 19
Mb/s, so it is not possible to send 20 Mb/s flow anymore. Thus, each link in the
minimum cut is critical and there are 20 of them.

b) Is it possible that this network contains more than 20 critical links? Justify your
answer.

Answer: Yes, it is possible. For example, let the network have 3 nodes: s, a, t.
Define the links in the following way: connect s to a by 20 directed links, each
of capacity 1 Mb/s, and also connect a to t by 20 directed links, each of capacity
1 Mb/s. Then the maximum flow from s to t is 20 Mb/s. The links between s
and a form a minimum cut in which each link is critical. The same is true for the
links that connect a to t. Thus, this network satisfies the conditions and it has
40 critical links.

9 * Consider an undirected graph, and let λ(a, b) denote the edge connectivity between
any two different nodes a, b.
Let x, y, z be three distinct nodes in the graph. Does the inequality

λ(x, z) ≥ min{λ(x, y), λ(y, z)}


always hold? Justify your answer!

Answer: Yes. Let λ(x, z) = k. Then x and z can be separated by removing k
edges. But this cut must also separate at least one of the source-destination pairs
(x, y) or (y, z). The reason is this: if neither of them is separated by this cut, then after
removing the cut, a path would remain connecting x, y, and another path connecting
y, z. Together they would provide a connection between x and z, contradicting the
fact that we removed a cut between (x, z). Thus, at least one of the pairs (x, y) and
(y, z) must be separated by these k edges, so at least one of λ(x, y) and λ(y, z) must
be ≤ k. This is equivalent to min{λ(x, y), λ(y, z)} ≤ k, which proves the statement,
since k = λ(x, z).

Comment. One may ask here: could it be true that the inequality

λ(x, z) ≥ min{λ(x, y), λ(y, z)}

always holds with equality?


The answer is no. In some cases the inequality may be strict. Consider the following
example. Let the graph contain 5 nodes: x, y, z, a, b. Connect x with a, b, y and connect
z also with a, b, y. Then λ(x, y) = 2, since removing the (x, y) and (y, z) edges separates
x and y. Similarly, the removal of the same two edges separates y and z, so λ(y, z) = 2.
(Draw a figure to see it.) On the other hand, λ(x, z) = 3, since there are 3 edge-disjoint
paths between x and z, namely, x − a − z, x − b − z and x − y − z.
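The λ values in this example can be verified mechanically. Edge connectivity equals a unit-capacity maximum flow (each undirected edge becomes two opposite arcs of capacity 1); the sketch below is a minimal, unoptimized illustration of that reduction using BFS augmenting paths:

```python
from collections import deque

def edge_connectivity(edges, s, t):
    """lambda(s,t): maximum number of edge-disjoint s-t paths in an undirected
    graph, computed as a unit-capacity max flow with BFS augmenting paths."""
    cap, adj = {}, {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            cap[(a, b)] = cap.get((a, b), 0) + 1
            adj.setdefault(a, set()).add(b)
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:        # BFS in the residual graph
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                         # no augmenting path remains
        v = t
        while parent[v] is not None:            # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] = cap.get((v, u), 0) + 1
            v = u
        flow += 1

# The 5-node counterexample described above:
edges = [('x', 'a'), ('x', 'b'), ('x', 'y'),
         ('z', 'a'), ('z', 'b'), ('z', 'y')]
lam_xy = edge_connectivity(edges, 'x', 'y')   # 2
lam_yz = edge_connectivity(edges, 'y', 'z')   # 2
lam_xz = edge_connectivity(edges, 'x', 'z')   # 3
```

The computed values λ(x, y) = λ(y, z) = 2 and λ(x, z) = 3 match the argument above and show that the inequality can be strict.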

CS 6385 SAMPLE EXAM ANSWERS

Part A : MULTIPLE CHOICE QUESTIONS

1 The reliability of a parallel configuration

1. is determined by the least reliable component.


2. can be larger than any of the component reliabilities, given that none of them is 1.
3. cannot be larger than the reliability of the most reliable component.
4. is always strictly smaller than the reliability of the most reliable component.
5. is the average of the component reliabilities.

Correct answer: 2
2 The formula for the network-wide mean delay is T = (1/γ) Σ_{i=1}^{l} f_i/(C_i − f_i), where γ is the total volume
of traffic in the network. Assume we double each link capacity and also double each link flow.
Which of the following describes correctly the resulting change in the value of T ?
1. Since f_i/(C_i − f_i) = 2f_i/(2C_i − 2f_i) holds for every i, therefore, T remains the same.
2. Since by doubling the flow on each link the total traffic volume γ will also double and T is
inversely proportional to γ, therefore, taking into account that the summands fi /(Ci −fi )
preserve their values, T will decrease to half of its original value.
3. The resulting change in T depends on how the flow is distributed in the network, so the
given information is insufficient to determine how much the value will change.
4. None of the above.

Correct answer: 2
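The reasoning behind answer 2 can be checked on made-up sample data (the flows and capacities below are arbitrary, chosen so that f_i < C_i):

```python
def mean_delay(flows, caps):
    """Network-wide mean delay T = (1/gamma) * sum of f_i / (C_i - f_i)."""
    gamma = sum(flows)
    return sum(f / (c - f) for f, c in zip(flows, caps)) / gamma

flows = [3.0, 5.0, 2.0]
caps = [10.0, 8.0, 6.0]
T1 = mean_delay(flows, caps)
T2 = mean_delay([2 * f for f in flows], [2 * c for c in caps])
# Each summand f_i/(C_i - f_i) is unchanged, but gamma doubles, so T2 = T1 / 2.
```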

3 A step in the Cut Saturation Algorithm is the finding of a saturated cut. Consider now the
following slight modification of the problem. Let us call a cut nearly saturated if it contains
at most one non-saturated link and the rest of the links in the cut are all saturated. Which
of the following statements is true?

1. If the network contains a nearly saturated cut, then the saturated links within this cut
form a saturated cut.
2. If the network contains exactly one saturated cut, then it cannot contain a nearly satu-
rated cut.
3. If there is no nearly saturated cut in the network, then there is at most one saturated
cut.
4. If there is no nearly saturated cut in the network, then there is no saturated cut either.
5. None of the above.

Correct answer: 4

4 A network has 3 nodes and there is an undirected link between each pair of nodes. Each link
is operational independently of the others with probability p=0.7. The system is considered
operational if the network topology is connected. Which of the following is true?

1. The reliability of the system is at least 0.4.


2. The reliability of the system is strictly between 0.3 and 0.4.
3. The reliability of the system is less than 0.2.
4. The reliability of the system is at least 0.2, but not more than 0.3.
5. None of the above.

Correct answer: 1
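The claim in answer 1 can be confirmed by enumerating all 2³ link states; the triangle is connected exactly when at least two of its three links are up:

```python
from itertools import product
from math import prod

p = 0.7
R = 0.0
for state in product([True, False], repeat=3):
    if sum(state) >= 2:                          # connected iff >= 2 links up
        R += prod(p if up else 1 - p for up in state)
# R = p^3 + 3 p^2 (1 - p) = 0.784, which is indeed at least 0.4
```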

5 In a mixed ILP x is a continuous variable and y is a discrete variable with y ∈ {0, 1}. We would
like to express the following conditions via linear constraints:
• if y = 0 then also x = 0
• if y = 1 then a ≤ x ≤ b for some given constants 0 < a < b.
Which of the following does it correctly:
1. xy = 0, a≤x≤b
2. x = y = 0 or a ≤ x ≤ b
3. a − y ≤ x + y ≤ b − y
4. y(a − y) ≤ x ≤ y(b − y)
5. ay ≤ x ≤ by
6. None of the above.

Correct answer: 5
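A quick check of why answer 5 is the right encoding; the constants a = 0.5, b = 2 are arbitrary sample values with 0 < a < b:

```python
a, b = 0.5, 2.0

def satisfies(x, y):
    return a * y <= x <= b * y          # answer 5: ay <= x <= by

xs = [i / 10 for i in range(-5, 31)]    # grid of candidate x values
# y = 0 forces x = 0:
assert all(satisfies(x, 0) == (x == 0) for x in xs)
# y = 1 allows exactly a <= x <= b:
assert all(satisfies(x, 1) == (a <= x <= b) for x in xs)
```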

6 Consider the reliability of a series configuration with n components and the reliability of each
  component is p = 1 − 1/(n + √n). What happens if n grows very large (n → ∞)?

1. The reliability of the configuration tends to 0.


2. The reliability of the configuration tends to 1.
3. The reliability of the configuration tends to a constant that is strictly between 0 and 1.
4. The reliability of the configuration does not tend to any number, because the limit does
not exist.

Correct answer: 3

7 Assume that in a Tabu Search algorithm we define the neighborhood of an n-dimensional binary
vector as follows. Let w(x) denote the number of 1 bits in x (the weight of x). The binary
vectors x, y are defined neighbors if and only if |w(x) − w(y)| ≥ 2 holds. Which of the
following is correct?

1. The state space will not be connected for any n, because by the above definition neighbors
differ in at least 2 bits, so one cannot reach a vector that differs from the current one in
only one bit.

2. The state space will always be connected for any n ≥ 2.
3. If n ≥ 3, then any vector x which has at least 3 bits with value 1 has the zero vector
among its neighbors.
4. None of the above.

Correct answer: 3

Part B : PROBLEM SOLVING

8 In an integer linear programming problem we have n variables, denoted by x1 , x2 , . . . , xn .
  Each variable can only take the value 0 or 1. Express each of the following conditions
  by a linear formulation:
a) At least one variable must take the value 0.
Answer:
This means, they cannot be all 1, so their sum is at most n − 1:

x1 + x2 + . . . + xn ≤ n − 1
or

x1 + x2 + . . . + xn < n
b) At most one variable can take the value 0.
Answer:
This means, at least n − 1 of them is 1, so their sum is at least n − 1:

x1 + x2 + . . . + xn ≥ n − 1
c) Either all variables are 0, or none of them.
Answer:
This means, they all must be equal:

x1 = x2 = . . . = xn
or

x1 = x2
x2 = x3
...
xn−1 = xn
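All three formulations can be checked by enumerating every 0/1 assignment for a small n:

```python
from itertools import product

n = 4
for xs in product([0, 1], repeat=n):
    s = sum(xs)
    assert (s <= n - 1) == any(x == 0 for x in xs)    # (a) at least one 0
    assert (s >= n - 1) == (xs.count(0) <= 1)         # (b) at most one 0
    assert (s in (0, n)) == (len(set(xs)) == 1)       # (c) all equal
```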
9 Consider the Concentrator Location Problem with the following modification: the cost
of placing a concentrator at a site is charged only when the site serves more than 5
terminals. For those sites that serve at most 5 terminals the cost is waived. Let us
assume that the capacity k of a concentrator satisfies k > 5. Provide a modification
of the original Concentrator Location Problem formulation to cover this modified case.
The formulation should preserve the linearity of the original formulation, that is, you
cannot add any case separation, like ”if ... then ...”, or any other nonlinear formulation.
Justify your solution and explain the meaning of every variable and constraint.

Answer:
For reference, here is the original formulation of the Concentrator Location Problem.
(In an exam it does not have to be included, we just include it here for handy reference.)
Variables:

xij = 1 if terminal i is connected to site j, and 0 otherwise
yj = 1 if a concentrator is placed at site j, and 0 otherwise

Optimization task:

min Z = Σ_{i=1}^{n} Σ_{j=1}^{m} cij xij + Σ_{j=1}^{m} dj yj

Subject to

Σ_{i=1}^{n} xij ≤ k yj   (∀j)

Σ_{j=1}^{m} xij = 1   (∀i)

xij , yj ∈ {0, 1}   (∀i, j)

Note: The above formulation is just for reference, the actual solution follows below.

To obtain the required modification, it is enough to add 5 to the right-hand side of the
first constraint and to change the meaning of yj accordingly: now yj will be 1 only if
more than 5 terminals are served by the concentrator. Sites that serve at most 5
terminals are free. We also replace the coefficient of yj in this constraint by k − 5, to
maintain that at most k terminals can be served by a concentrator (otherwise the
limit would grow to k + 5).
Variables:

xij = 1 if terminal i is connected to site j, and 0 otherwise
yj = 1 if a concentrator is placed at site j and it serves more than 5 terminals, and 0 otherwise

Thus, the optimization task is:

min Z = Σ_{i=1}^{n} Σ_{j=1}^{m} cij xij + Σ_{j=1}^{m} dj yj

Subject to

Σ_{i=1}^{n} xij ≤ (k − 5) yj + 5   (∀j)

Σ_{j=1}^{m} xij = 1   (∀i)

xij , yj ∈ {0, 1}   (∀i, j)
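The effect of the modified first constraint can be checked in isolation. Here k = 8 is an assumed sample capacity with k > 5:

```python
k = 8   # sample concentrator capacity, k > 5

def capacity_ok(num_terminals, y):
    """The modified constraint at one site j: sum_i x_ij <= (k - 5) * y_j + 5."""
    return num_terminals <= (k - 5) * y + 5

# y_j = 0 (no cost charged): at most 5 terminals may be served
assert all(capacity_ok(t, 0) == (t <= 5) for t in range(0, 2 * k))
# y_j = 1 (cost d_j charged): up to the full capacity k
assert all(capacity_ok(t, 1) == (t <= k) for t in range(0, 2 * k))
```

Since the objective minimizes cost, the solver sets yj = 1 only when it must, i.e., exactly when site j serves more than 5 terminals.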
