Optimisation
Easter 2015
These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.
Lagrangian methods
General formulation of constrained problems; the Lagrangian sufficiency theorem.
Interpretation of Lagrange multipliers as shadow prices. Examples. [2]
Network problems
The Ford-Fulkerson algorithm and the max-flow min-cut theorems in the rational case.
Network flows with costs, the transportation algorithm, relationship of dual variables
with nodes. Examples. Conditions for optimality in more general networks; *the
simplex-on-a-graph algorithm*. [3]
Contents
1 Introduction and preliminaries
1.1 Constrained optimization
1.2 Review of unconstrained optimization
2 The method of Lagrange multipliers
3 Solutions of linear programs
4 Non-cooperative games
4.1 Games and Solutions
4.2 The minimax theorem
5 Network problems
5.1 Definitions
5.2 Minimum-cost flow problem
5.3 The transportation problem
5.4 The maximum flow problem
1 Introduction and preliminaries
where we’ve explicitly written out the different forms the constraints can take.
This is too clumsy. Instead, we can perform some tricks and turn them into
a nicer form:
Definition (General and standard form). The general form of a linear program is
  minimize c^T x subject to Ax ≥ b, x ≥ 0.
The standard form is
  minimize c^T x subject to Ax = b, x ≥ 0.
It takes some work to show that these are indeed the most general forms. The
equivalence between the two forms can be established via slack variables, as described
above. We still have to check some more cases. For example, this form says
that x ≥ 0, i.e. all decision variables have to be non-negative. What if we want x
to be unconstrained, i.e. able to take any value we like? We can split x into two parts,
x = x^+ − x^−, where each part has to be non-negative. Then x can take any positive
or negative value.
Note that when I said “nicer”, I don’t mean that turning a problem into this
form necessarily makes it easier to solve in practice. However, it will be much
easier to work with when developing general theory about linear programs.
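For instance, spelling out these tricks: an inequality constraint a_i^T x ≥ b_i of the general form becomes an equality by introducing a slack variable z_i,
  a_i^T x − z_i = b_i,   z_i ≥ 0,
and an unconstrained variable x_j is replaced throughout by x_j^+ − x_j^− with x_j^+, x_j^− ≥ 0, which puts the problem into standard form.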
Example. We want to minimize −(x1 + x2 ) subject to
x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x2 ≥ 0
[Figure: the feasible region bounded by x1 + 2x2 = 6, x1 − x2 = 3 and the axes, with the cost vector c and the level lines −(x1 + x2) = 0, −2, −5 drawn.]
The shaded region is the feasible region, and c is our cost vector. The dotted lines,
which are orthogonal to c, are lines on which the objective function is constant.
To minimize our objective function, we want to push the line as far up and to the
right as possible, which is clearly achieved at the intersection of the two boundary lines.
Now we have a problem. In the general case, we have absolutely no idea how
to solve it. What we do know is how to do unconstrained optimization.
Definition (Convex region). A region S ⊆ R^n is convex iff for all δ ∈ [0, 1] and
x, y ∈ S, we have δx + (1 − δ)y ∈ S. Alternatively, if you take any two points of S, the
line segment joining them lies completely within the region.
[Figure: a non-convex region and a convex region, with a point δx + (1 − δ)y marked on the segment between two points x and y.]
2 The method of Lagrange multipliers
since we just have to switch the sign of λ. So we don’t have to worry about
getting the sign of λ wrong when defining the Lagrangian.
If we minimize L over both x and λ, then we will magically find the minimal
solution subject to the constraints. Sometimes.
Theorem (Lagrangian sufficiency). Let x* ∈ X and λ* ∈ R^m be such that
  L(x*, λ*) = inf_{x∈X} L(x, λ*)  and  h(x*) = b.
Then x* is optimal for the constrained problem.
Proof. We first define the “feasible set”: let X(b) = {x ∈ X : h(x) = b}, i.e. the
set of all x that satisfy the constraints. Then
  inf_{x∈X(b)} f(x) = inf_{x∈X(b)} (f(x) − λ*^T (h(x) − b))
                    ≥ inf_{x∈X} (f(x) − λ*^T (h(x) − b))
                    = L(x*, λ*)
                    = f(x*) − λ*^T (h(x*) − b)
                    = f(x*),
using h(x) = b on X(b) for the first equality and h(x*) = b for the last. Since x* ∈ X(b), it attains this infimum, so x* is optimal.
How can we interpret this result? To find these values of λ∗ and x∗ , we have
to solve
∇L = 0
h(x) = b.
Rewriting, this system is equivalent to
∇f = λ∇h
h(x) = b.
What does this mean? For better visualization, we take the special case where
f and h are functions R^2 → R. Usually, if we want to minimize f without
restriction, then for small changes dx, there should be no (first-order) change
in f, i.e. df = ∇f · dx = 0. This has to be true for all possible directions of dx.
However, if we are constrained by h(x) = b, this corresponds to forcing x to
lie along a particular path. Hence the restriction df = 0 only has to hold when
dx lies along the path. Since we need ∇f · dx = 0, this means that ∇f has to be
perpendicular to dx. Alternatively, ∇f has to be parallel to the normal to the
path. Since the normal to the path is given by ∇h, we obtain the requirement
∇f = λ∇h.
This is how we should interpret the condition ∇f = λ∇h. Instead of requiring
that ∇f = 0 as in usual minimization problems, we only require ∇f to point in a
direction perpendicular to the allowed space.
Example. Minimize x1 − x2 − 2x3 subject to
  x1 + x2 + x3 = 5
  x1^2 + x2^2 = 4
The Lagrangian is
  L(x, λ) = x1 − x2 − 2x3 − λ1(x1 + x2 + x3 − 5) − λ2(x1^2 + x2^2 − 4).
(iii) Find
  Y = {λ : inf_{x∈X, z≥0} L(x, z, λ) > −∞},
and for each λ ∈ Y, find x*(λ) ∈ X and z*(λ) ≥ 0 such that
  L(x*(λ), z*(λ), λ) = inf_{x∈X, z≥0} L(x, z, λ).
Finally, find λ* ∈ Y such that x*(λ*) and z*(λ*) satisfy the functional constraints.
Then by the Lagrangian sufficiency condition, x*(λ*) is optimal for the constrained problem.
i.e. we fix λ, and see how small we can get L to be. As before, let
  g(λ) = inf_{x∈X} L(x, λ).
Then we have
Theorem (Weak duality). If x ∈ X(b) (i.e. x satisfies both the functional and
regional constraints) and λ ∈ Y , then
g(λ) ≤ f (x).
In particular,
  sup_{λ∈Y} g(λ) ≤ inf_{x∈X(b)} f(x).
Proof.
  g(λ) = inf_{x′∈X} L(x′, λ)
       ≤ L(x, λ)
       = f(x) − λ^T (h(x) − b)
       = f(x),
since h(x) = b for x ∈ X(b).
Definition (Strong duality). (P) and (D) are said to satisfy strong duality if
  sup_{λ∈Y} g(λ) = inf_{x∈X(b)} f(x).
It turns out that problems satisfying strong duality are exactly those for
which the method of Lagrange multipliers works.
Example. Again consider the problem to minimize x1 − x2 − 2x3 subject to
x1 + x2 + x3 = 5
x21 + x22 = 4
We saw that
Y = {λ ∈ R2 : λ1 = −2, λ2 < 0}
and
  x*(λ) = (3/(2λ2), 1/(2λ2), 5 − 4/(2λ2)).
The dual function is
  g(λ) = inf_{x∈X} L(x, λ) = L(x*(λ), λ) = 10/(4λ2) + 4λ2 − 10.
The dual is the problem to
  maximize 10/(4λ2) + 4λ2 − 10 subject to λ2 < 0.
The maximum is attained for
  λ2 = −√(5/8).
After calculating the values of g and f , we can see that the primal and dual do
have the same optimal value.
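Indeed, as a quick check: λ2 = −√(5/8) = −√10/4 gives 4λ2 = −√10 and 10/(4λ2) = −√10, so g(λ*) = −2√10 − 10; and substituting this λ* into x*(λ) gives f(x*(λ*)) = −2√10 − 10 as well.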
Right now, what we’ve got isn’t helpful, because we won’t know if our problem
satisfies strong duality!
[Figure: the graph of φ, with a supporting hyperplane touching it at the point (b, φ(b)).]
Theorem. (P) satisfies strong duality iff φ(c) = inf_{x∈X(c)} f(x) has a supporting hyperplane at b.
Note that here we fix a b, and let φ be a function of c.
Proof. (⇐) Suppose there is a supporting hyperplane. Then since the plane
passes through φ(b), it must be of the form
φ(b) + λT (c − b) ≤ φ(c),
or
  φ(b) ≤ φ(c) − λ^T (c − b)  for all c.
This implies that
  φ(b) ≤ inf_c (φ(c) − λ^T (c − b)) = inf_{x∈X} (f(x) − λ^T (h(x) − b)) = g(λ).
By weak duality, g(λ) ≤ φ(b), so φ(b) = g(λ) and strong duality holds.
(⇒) Conversely, suppose strong duality holds with optimal λ, so that φ(b) = g(λ). Then for any c,
  φ(b) = g(λ)
       = inf_{x∈X} L(x, λ)
       ≤ inf_{x∈X(c)} L(x, λ)
       = φ(c) − λ^T (c − b),
which is exactly the statement that φ has a supporting hyperplane at b.
Theorem. Consider the problem to minimize f(x) subject to h(x) ≤ b and x ∈ X, and let φ(b) = inf_{x∈X(b)} f(x) with X(b) = {x ∈ X : h(x) ≤ b}. If X, f and h are convex, then φ is convex.
Proof. Consider b1 , b2 ∈ Rm such that φ(b1 ) and φ(b2 ) are defined. Let δ ∈ [0, 1]
and define b = δb1 + (1 − δ)b2 . We want to show that φ(b) ≤ δφ(b1 ) + (1 − δ)φ(b2 ).
Consider x1 ∈ X(b1 ), x2 ∈ X(b2 ), and let x = δx1 + (1 − δ)x2 . By convexity
of X, x ∈ X.
By convexity of h,
  h(x) = h(δx1 + (1 − δ)x2) ≤ δh(x1) + (1 − δ)h(x2) ≤ δb1 + (1 − δ)b2 = b,
so x ∈ X(b). Hence
  φ(b) ≤ f(x)
       = f(δx1 + (1 − δ)x2)
       ≤ δf(x1) + (1 − δ)f(x2).
This holds for any x1 ∈ X(b1 ) and x2 ∈ X(b2 ). So by taking infimum of the
right hand side,
φ(b) ≤ δφ(b1 ) + (1 − δ)φ(b2 ).
So φ is convex.
h(x) = b is equivalent to h(x) ≤ b and −h(x) ≤ −b. So the result holds for
problems with equality constraints if both h and −h are convex, i.e. if h(x) is
linear.
So we have:
Theorem. If a linear program is feasible and bounded, then it satisfies strong
duality.
3 Solutions of linear programs
x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x2 ≥ 0
[Figures: the feasible region of this linear program, with the cost vector c and level lines of the objective drawn; the optimum is attained at a corner of the region.]
This already allows us to solve linear programs, since we can just try all
corners and see which has the smallest value. However, this can be made more
efficient, especially when we have a large number of dimensions and hence corners.
Definition (Extreme point). A point x of a convex set S is an extreme point if whenever we write it as a convex combination
  x = δy + (1 − δ)z  with δ ∈ (0, 1) and y, z ∈ S,
then x = y = z.
Consider again the linear program in standard form, i.e.
  maximize c^T x subject to Ax = b, x ≥ 0,
where A ∈ R^{m×n} and b ∈ R^m. Note that now we are talking about maximization instead of minimization.
Definition (Basic solution and basis). A solution x ∈ R^n is basic if it has at
most m non-zero entries (out of n), i.e. if there exists a set B ⊆ {1, . . . , n} with
|B| = m such that x_i = 0 if i ∉ B. In this case, B is called the basis, and the x_i
with i ∈ B are the basic variables.
We will later see (via an example) that basic solutions correspond to solutions
at the “corners” of the solution space.
Definition (Non-degenerate solutions). A basic solution is non-degenerate if it
has exactly m non-zero entries.
Note that by “solution”, we do not mean a solution to the whole maximization
problem. Instead we are referring to a solution to the constraint Ax = b. Being a
solution does not require that x ≥ 0. Those that satisfy this regional constraint
are known as feasible.
Definition (Basic feasible solution). A basic solution x is feasible if it satisfies
x ≥ 0.
Example. Consider the linear program
maximize f (x) = x1 + x2 subject to
x1 + 2x2 + z1 = 6
x1 − x2 + z2 = 3
x1 , x2 , z1 , z2 ≥ 0
Setting two of the four variables to zero in turn, the basic solutions are:

       x1   x2   z1   z2   f(x)
  A     0    0    6    3     0
  B     0    3    0    6     3
  C     4    1    0    0     5
  D     3    0    3    0     3
  E     6    0    0   −3     6
  F     0   −3   12    0    −3
Among all 6, E and F are not feasible solutions since they have negative entries.
So the basic feasible solutions are A, B, C, D.
[Figure: the feasible region in the (x1, x2)-plane, bounded by x1 + 2x2 = 6, x1 − x2 = 3 and the axes, with the basic solutions marked; the basic feasible solutions A, B, C, D are the corners of the region.]
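For a problem of this size we can also enumerate the basic solutions directly. The following is a minimal Python sketch (an illustration only, assuming numpy is available); it reproduces the table above.

import numpy as np
from itertools import combinations

# Constraints in standard form: A x = b with x = (x1, x2, z1, z2) >= 0.
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([6.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])   # objective x1 + x2

m, n = A.shape
for B in combinations(range(n), m):       # choose a basis of size m
    cols = list(B)
    A_B = A[:, cols]
    if abs(np.linalg.det(A_B)) < 1e-12:   # singular choice of columns: skip
        continue
    x = np.zeros(n)
    x[cols] = np.linalg.solve(A_B, b)     # the basic solution for this basis
    feasible = bool(np.all(x >= -1e-9))
    print(cols, x, "feasible" if feasible else "infeasible", "f =", c @ x)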
In the previous example, we saw that the extreme points are exactly the basic
feasible solutions. This is true in general.
Theorem. A vector x is a basic feasible solution of Ax = b if and only if it is
an extreme point of the set X(b) = {x′ : Ax′ = b, x′ ≥ 0}.
We will not prove this.
Now suppose x has r > m non-zero entries. Since it is not an extreme point,
we have y 6= z ∈ X(b), δ ∈ (0, 1) such that
x = δy + (1 − δ)z.
We will show there exists an optimal solution with strictly fewer than r non-zero
entries. Then the result follows by induction.
By optimality of x, we have cT x ≥ cT y and cT x ≥ cT z.
Since cT x = δcT y + (1 − δ)cT z, we must have that cT x = cT y = cT z, i.e. y
and z are also optimal.
Since y ≥ 0 and z ≥ 0, x = δy + (1 − δ)z implies that yi = zi = 0 whenever
xi = 0.
So the non-zero entries of y and z form a subset of the non-zero entries of x. So
y and z have at most r non-zero entries, which must occur in positions where x is
also non-zero.
If y or z has strictly fewer than r non-zero entries, then we are done. Otherwise, for any δ̂ (not necessarily in (0, 1)), let
  x_δ̂ = δ̂y + (1 − δ̂)z.
Then Ax_δ̂ = b and c^T x_δ̂ = c^T x, and x_δ̂ is zero wherever x is zero. Since y ≠ z, some entry of x_δ̂ varies linearly with δ̂, so we can move δ̂ away from δ until the first moment an additional entry of x_δ̂ hits zero, while all entries remain non-negative. This gives an optimal feasible solution with strictly fewer than r non-zero entries.
So any point in the interior cannot be better than the extreme points.
Since x, z can be arbitrarily positive, this has a finite minimum if and only if
c^T − λ^T A ≥ 0 and λ ≥ 0. Call the feasible set Y. Then for fixed λ ∈ Y, the minimum of L(x, z, λ) is attained when (c^T − λ^T A)x = 0 and λ^T z = 0 (complementary slackness), so that g(λ) = λ^T b.
Example. Consider the problem to maximize 3x1 + 2x2 subject to 2x1 + x2 ≤ 4, 2x1 + 3x2 ≤ 6 and x1, x2 ≥ 0. Adding slack variables, the constraints become
  2x1 + x2 + z1 = 4
  2x1 + 3x2 + z2 = 6
  x1, x2, z1, z2 ≥ 0.
The dual is to minimize 4λ1 + 6λ2 subject to 2λ1 + 2λ2 ≥ 3, λ1 + 3λ2 ≥ 2 and λ1, λ2 ≥ 0, or, with surplus variables µ1, µ2,
  2λ1 + 2λ2 − µ1 = 3
  λ1 + 3λ2 − µ2 = 2
  λ1, λ2, µ1, µ2 ≥ 0.
We can compute all basic solutions of the primal and the dual by setting n − m = 2
variables to zero in turn.
Given a particular basic solution of the primal, the corresponding solution
of the dual can be found by using the complementary slackness conditions:
  λ1 z1 = λ2 z2 = 0,   µ1 x1 = µ2 x2 = 0.
       x1    x2    z1    z2    f(x)   λ1    λ2    µ1    µ2    g(λ)
  A     0     0     4     6     0      0     0    −3    −2     0
  B     2     0     0     2     6     3/2    0     0   −1/2    6
  C     3     0    −2     0     9      0    3/2    0    5/2    9
  D    3/2    1     0     0    13/2   5/4   1/4    0     0    13/2
  E     0     2     2     0     4      0    2/3  −5/3    0     4
  F     0     4     0    −6     8      2     0     1     0     8
[Figures: the primal feasible region in the (x1, x2)-plane, bounded by 2x1 + x2 = 4 and 2x1 + 3x2 = 6, and the dual feasible region in the (λ1, λ2)-plane, bounded by 2λ1 + 2λ2 = 3 and λ1 + 3λ2 = 2, with the basic solutions A–F marked in each.]
We see that D is the only solution such that both the primal and dual solutions
are feasible. So we know it is optimal without even having to calculate f (x). It
turns out this is always the case.
Theorem. Let x and λ be feasible for the primal and the dual of the linear
program in general form. Then x and λ are optimal if and only if they satisfy
complementary slackness, i.e. if
  (c^T − λ^T A)x = 0  and  λ^T (Ax − b) = 0.
Proof. If x and λ are optimal, then
  c^T x = λ^T b,
since every linear program satisfies strong duality. So
  c^T x = λ^T b
        = inf_{x′∈X} (c^T x′ − λ^T (Ax′ − b))
        ≤ c^T x − λ^T (Ax − b)
        ≤ c^T x.
The last line is since Ax ≥ b and λ ≥ 0.
The first and last term are the same. So the inequalities hold with equality.
Therefore
λT b = cT x − λT (Ax − b) = (cT − λT A)x + λT b.
So
(cT − λT A)x = 0.
Also,
cT x − λT (Ax − b) = cT x
implies
λT (Ax − b) = 0.
On the other hand, suppose we have complementary slackness, i.e.
(cT − λT A)x = 0 and λT (Ax − b) = 0,
then
cT x = cT x − λT (Ax − b) = (cT − λT A)x + λT b = λT b.
Hence by weak duality, x and λ are optimal.
Example. Consider again the problem to maximize x1 + x2 subject to
x1 + 2x2 + z1 = 6
x1 − x2 + z2 = 3
x1 , x2 , z1 , z2 ≥ 0.
We can express this as a tableau:

              x1   x2   z1   z2
Constraint 1   1    2    1    0    6
Constraint 2   1   −1    0    1    3
Objective      1    1    0    0    0

We see an identity matrix in the z1 and z2 columns, and these correspond
to the basic feasible solution z1 = 6, z2 = 3, x1 = x2 = 0. It's pretty clear that our
basic feasible solution is not optimal, since our objective value is 0. This is
because something in the last row is positive, and we can increase the objective by,
say, increasing x1.
The simplex method says that we can find the optimal solution if we make
the bottom row all non-positive while keeping the right column non-negative, by doing
row operations.
We multiply the first row by 1/2 and subtract/add it to the other rows to obtain

              x1    x2    z1    z2
Constraint 1  1/2    1    1/2    0     3
Constraint 2  3/2    0    1/2    1     6
Objective     1/2    0   −1/2    0    −3
We rearrange the columns so that all basis columns are on the left. Then we
can write our matrices as
  A = (A_B  A_N),   x = (x_B, x_N),   c = (c_B, c_N),
where A_B is the m × m matrix of basis columns, A_N is the m × (n − m) matrix of the remaining columns, and x_B, c_B ∈ R^m, x_N, c_N ∈ R^{n−m}. Then
  Ax = b
can be decomposed as
  A_B x_B + A_N x_N = b.
We can rearrange this to obtain
  x_B = A_B^{-1}(b − A_N x_N).
In particular, setting x_N = 0 gives the basic solution
  x_B = A_B^{-1} b.
In this notation, the simplex tableau takes the form

  A_B^{-1} A_B = I      A_B^{-1} A_N                      A_B^{-1} b
  0                     c_N^T − c_B^T A_B^{-1} A_N        −c_B^T A_B^{-1} b
This might look really scary, and it is! Without caring too much about where the
formulas for the cells come from, we see the identity matrix on the left, which is
where we find our basic feasible solution. Below that is the row for the objective
function. The values of this row must be 0 for the basis columns.
On the right-most column, we have A_B^{-1} b, which is our x_B. Below that is
−c_B^T A_B^{-1} b, which is the negative of our objective value c_B^T x_B.
  f(x) = c^T x
        = c_B^T x_B + c_N^T x_N
        = c_B^T A_B^{-1}(b − A_N x_N) + c_N^T x_N
        = c_B^T A_B^{-1} b + (c_N^T − c_B^T A_B^{-1} A_N) x_N.
We will maximize c^T x by choosing a basis such that c_N^T − c_B^T A_B^{-1} A_N ≤ 0, i.e.
non-positive everywhere, and A_B^{-1} b ≥ 0.
If this is true, then for any feasible solution x ∈ R^n, we must have x_N ≥ 0.
So (c_N^T − c_B^T A_B^{-1} A_N) x_N ≤ 0 and
  f(x) = c_B^T A_B^{-1} b + (c_N^T − c_B^T A_B^{-1} A_N) x_N ≤ c_B^T A_B^{-1} b,
so the basic feasible solution with x_B = A_B^{-1} b and x_N = 0 is optimal.
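To make these formulas concrete, here is a small Python sketch (an illustration with numpy, not part of the development above) computing x_B, the reduced costs c_N^T − c_B^T A_B^{-1} A_N and the objective value for one particular basis of the running example:

import numpy as np

# maximize x1 + x2 subject to x1 + 2x2 + z1 = 6, x1 - x2 + z2 = 3, x, z >= 0.
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([6.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])

B = [0, 1]                                   # basis {x1, x2}
N = [j for j in range(A.shape[1]) if j not in B]

A_B, A_N = A[:, B], A[:, N]
x_B = np.linalg.solve(A_B, b)                        # A_B^{-1} b
reduced = c[N] - c[B] @ np.linalg.solve(A_B, A_N)    # c_N^T - c_B^T A_B^{-1} A_N
value = c[B] @ x_B                                   # c_B^T A_B^{-1} b

print(x_B, reduced, value)
# x_B = [4, 1]; both reduced costs are negative and x_B >= 0, so this basis is
# optimal, with value 5 (the point C in the earlier table).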
In general, we write the entries of the tableau as
  a_{ij}   a_{i0}
  a_{0j}   a_{00}
where a_{i0} is b, a_{0j} corresponds to the objective function, and a_{00} is initially 0.
The simplex method proceeds as follows:
(i) Find an initial basic feasible solution.
(ii) Check whether a_{0j} ≤ 0 for every j. If so, the current solution is optimal. Stop.
(iii) If not, choose a pivot column j such that a_{0j} > 0. Choose a pivot row
i ∈ {i : a_{ij} > 0} that minimizes a_{i0}/a_{ij}. If multiple rows minimize
a_{i0}/a_{ij}, then the problem is degenerate, and things might go wrong. If
a_{ij} ≤ 0 for all i, i.e. we cannot choose a pivot row, the problem is unbounded,
and we stop.
(iv) We update the tableau by multiplying row i by 1/a_{ij} (so that the new
a_{ij} = 1), and adding a (−a_{kj}/a_{ij}) multiple of row i to each row k ≠ i,
including k = 0 (so that a_{kj} = 0 for all k ≠ i). Then we return to step (ii).
After each update we still have a basic feasible solution, since our choice of pivot
row keeps the entries of the right-hand column non-negative (apart from a_{00}).
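As an illustration (a minimal sketch, not a robust implementation), steps (ii)–(iv) can be coded in Python with the tableau stored as a numpy array, the objective row last and the right-hand side in the last column; degeneracy is not handled:

import numpy as np

def simplex(T):
    """Repeat steps (ii)-(iv) on the tableau T (modified in place)."""
    while True:
        obj = T[-1, :-1]
        if np.all(obj <= 1e-9):                  # step (ii): a_0j <= 0 for all j
            return T
        j = int(np.argmax(obj))                  # step (iii): a pivot column with a_0j > 0
        col = T[:-1, j]
        if np.all(col <= 1e-9):
            raise ValueError("problem is unbounded")
        ratios = np.full(col.shape, np.inf)
        pos = col > 1e-9
        ratios[pos] = T[:-1, -1][pos] / col[pos]
        i = int(np.argmin(ratios))               # pivot row minimizing a_i0 / a_ij
        T[i] /= T[i, j]                          # step (iv): make a_ij = 1 ...
        for k in range(T.shape[0]):
            if k != i:
                T[k] -= T[k, j] * T[i]           # ... and clear the rest of column j

# The example above: maximize x1 + x2.
T = np.array([[1.0,  2.0, 1.0, 0.0, 6.0],
              [1.0, -1.0, 0.0, 1.0, 3.0],
              [1.0,  1.0, 0.0, 0.0, 0.0]])
simplex(T)
print(T[-1, -1])   # -5.0, the negative of the optimal value, as described above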
Example. Consider the problem to minimize 6x1 + 3x2 subject to
x1 + x2 ≥ 1
2x1 − x2 ≥ 1
3x2 ≤ 2
x1 , x2 ≥ 0
Adding slack variables z1, z2, z3, the constraints become
x1 + x2 − z1 = 1
2x1 − x2 − z2 = 1
3x2 + z3 = 2
x1 , x2 , z1 , z2 , z3 ≥ 0
We don't have an obvious initial basic feasible solution here, so we introduce artificial variables y1 and y2:
x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x2 , z1 , z2 , z3 , y1 , y2 ≥ 0
Note that adding y1 and y2 might create new solutions, which is bad. We solve
this problem by first trying to make y1 and y2 both 0 and find a basic feasible
solution. Then we can throw away y1 and y2 and get a basic feasible solution for
our original problem. So momentarily, we want to solve
minimize y1 + y2 subject to
x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x2 , z1 , z2 , z3 , y1 , y2 ≥ 0
x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
0 0 0 0 0 -1 -1 0
Note that we keep both our original and “kill-yi ” objectives, but now we only
care about the second one. We will keep track of the original objective so that
we can use it in the second phase.
We see an initial feasible solution y1 = y2 = 1, z3 = 2. However, this is not a
proper simplex tableau, as the basis columns should not have non-zero entries
(apart from the identity matrix itself). But we have the two −1s at the bottom!
So we add the first two rows to the last to obtain
x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
3 0 -1 -1 0 0 0 2
Our pivot column is x1, and our pivot row is the second row. We divide it by 2
and add/subtract multiples of it to the other rows.
  x1    x2    z1    z2    z3    y1    y2
   0    3/2   −1    1/2    0     1   −1/2   1/2
   1   −1/2    0   −1/2    0     0    1/2   1/2
   0     3     0     0     1     0     0     2
   0    −6     0    −3     0     0     3     3
   0    3/2   −1    1/2    0     0   −3/2   1/2
There are two possible pivot columns. We pick z2 and use the first row as the
pivot row.
x1 x2 z1 z2 z3 y1 y2
0 3 -2 1 0 2 -1 1
1 1 -1 0 0 1 0 1
0 3 0 0 1 0 0 2
0 3 -6 0 0 6 0 6
0 0 0 0 0 -1 -1 0
We see that y1 and y2 are no longer in the basis, and hence take value 0. So we
drop all the phase I stuff, and are left with
x1 x2 z1 z2 z3
0 3 -2 1 0 1
1 1 -1 0 0 1
0 3 0 0 1 2
0 3 -6 0 0 6
We pivot on the x2 column and the first row to obtain
  x1    x2    z1     z2    z3
   0     1   −2/3    1/3    0    1/3
   1     0   −1/3   −1/3    0    2/3
   0     0     2     −1     1     1
   0     0    −4     −1     0     5
Since the last row is all non-positive, the solution is optimal. So x1 = 2/3,
x2 = 1/3, z3 = 1 is an optimal feasible solution, and our optimal value is 5.
Note that we previously said that the bottom right entry is the negative
of the optimal value, not the optimal value itself! This is correct, since in the
tableau, we are maximizing −6x1 − 3x2 , whose maximum value is −5. So the
minimum value of 6x1 + 3x2 is 5.
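As a quick sanity check (not part of the notes), the same problem can be handed to an off-the-shelf solver, assuming scipy is available:

from scipy.optimize import linprog

# minimize 6x1 + 3x2 subject to x1 + x2 >= 1, 2x1 - x2 >= 1, 3x2 <= 2, x >= 0.
# linprog expects <= constraints, so the two >= rows are negated.
res = linprog(c=[6, 3],
              A_ub=[[-1, -1], [-2, 1], [0, 3]],
              b_ub=[-1, -1, 2],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # approximately [0.667, 0.333] and 5.0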
4 Non-cooperative games
Here we have a short digression to game theory. We mostly focus on games with
two players.
Here a victory gives you a payoff of 1, a loss gives a payoff of −1, and a draw
gives a payoff of 0. Also the first row/column corresponds to playing rock, the second
corresponds to paper and third corresponds to scissors.
Usually, this is not the best way to display the payoff matrices. First of all,
we need to write out two matrices, and there isn't an easy way to indicate which
row corresponds to which decision. Instead, we usually write this as a table.
R P S
R (0, 0) (−1, 1) (1, −1)
P (1, −1) (0, 0) (−1, 1)
S (−1, 1) (1, −1) (0, 0)
By convention, the first item in the tuple (−1, 1) indicates the payoff of the row
player, and the second item indicates the payoff of the column player.
Definition (Strategy). Players are allowed to play randomly. The set of strategies
the row player can have is
  X = {x ∈ R^m : x ≥ 0, Σ_i x_i = 1},
and similarly Y ⊆ R^n is the set of strategies of the column player.
Example (Prisoner’s dilemma). Suppose Alice and Bob commit a crime together,
and are caught by the police. They can choose to remain silent (S) or testify
(T ). Different options will lead to different outcomes:
– Both keep silent: the police has little evidence and they go to jail for 2
years.
– One testifies and one remains silent: the one who testifies is rewarded and
freed, while the other gets stuck in jail for 10 years.
– Both testify: they both go to jail for 5 years.
We can represent this by a payoff table:
S T
S (2, 2) (0, 3)
T (3, 0) (1, 1)
Example (Chicken). In the game of chicken, each player can chicken out (C) or dare (D), with payoffs
     C       D
C  (2, 2)  (1, 3)
D  (3, 1)  (0, 0)
Consider
  max_{x∈X} min_{y∈Y} p(x, y),
where p(x, y) denotes the expected payoff to the row player. Such an x is the strategy the row player can employ that minimizes the worst possible loss. This is called the maximin strategy. We can formulate this as a linear program.
Here the maximin strategy is to chicken. However, this isn’t really what we
are looking for, since if both players employ this maximin strategy, it would be
better for you to not chicken out.
Definition (Best response and equilibrium). A strategy x ∈ X is a best response
to y ∈ Y if for all x′ ∈ X,
  p(x, y) ≥ p(x′, y).
A pair (x, y) is an equilibrium if x is a best response against y and y is a best
response against x.
Example. In the chicken game, there are two pure equilibria, with payoffs (3, 1) and (1, 3),
and there is a mixed equilibrium in which the players pick the two options with equal
probability.
Theorem (Nash, 1951). Every bimatrix game has an equilibrium.
We are not proving this since it is too hard.
Theorem (Minimax theorem). If p(x, y) = x^T P y for a payoff matrix P, then
  max_{x∈X} min_{y∈Y} p(x, y) = min_{y∈Y} max_{x∈X} p(x, y).
The left hand side is the worst payoff the row player can get if he employs the
minimax strategy. The right hand side is the worst payoff the column player
can get if he uses his minimax strategy.
The theorem then says that if both players employ the minimax strategy,
then this is an equilibrium.
Proof. Recall that the optimal value of max_x min_y p(x, y) is the optimal value of the linear
program
  maximize v subject to
    Σ_{i=1}^m x_i p_{ij} ≥ v   for all j = 1, . . . , n,
    Σ_{i=1}^m x_i = 1,
    x ≥ 0.
This has a finite minimum for all v ∈ R and x ≥ 0 iff Σ_j y_j = 1, Σ_j p_{ij} y_j ≤ w for all i,
and y ≥ 0. The dual is therefore
minimize w subject to
  Σ_{j=1}^n p_{ij} y_j ≤ w   for all i,
  Σ_{j=1}^n y_j = 1,
  y ≥ 0.
This corresponds to the column player choosing a strategy (y_j) such that the
expected payoff is bounded above by w.
The optimal value of the dual is min_{y∈Y} max_{x∈X} p(x, y). So the result follows from
strong duality.
Definition (Value). The value of the matrix game with payoff matrix P is
  v = max_{x∈X} min_{y∈Y} p(x, y) = min_{y∈Y} max_{x∈X} p(x, y),
and optimal strategies are x and y that are optimizers for the max min and min max problems respectively.
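As an illustration (not part of the notes), the value of a matrix game and an optimal strategy for the column player can be computed by solving the dual linear program above with an off-the-shelf solver, assuming scipy is available; here for rock-paper-scissors:

import numpy as np
from scipy.optimize import linprog

# Row player's payoff matrix for rock-paper-scissors.
P = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])
m, n = P.shape

# Variables (y_1, ..., y_n, w): minimize w subject to sum_j p_ij y_j <= w for all i,
# sum_j y_j = 1, y >= 0, with w unrestricted in sign.
c = np.concatenate([np.zeros(n), [1.0]])
A_ub = np.hstack([P, -np.ones((m, 1))])            # P y - w <= 0
b_ub = np.zeros(m)
A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, None)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:n], res.fun)   # y = (1/3, 1/3, 1/3) and value 0, as expected by symmetry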
5 Network problems
5.1 Definitions
We are going to look into several problems that involve graphs. Unsurprisingly,
we will need some definitions from graph theory.
Definition (Directed graph/network). A directed graph or network is a pair
G = (V, E), where V is the set of vertices and E ⊆ V × V is the set of edges. If
(u, v) ∈ E, we say there is an edge from u to v.
Definition (Degree). The degree of a vertex u ∈ V is the number of v ∈ V such
that (u, v) ∈ E or (v, u) ∈ E.
Definition (Walk). A walk from u ∈ V to v ∈ V is a sequence of vertices
u = v1, . . . , vk = v such that (v_i, v_{i+1}) ∈ E for all i. An undirected walk allows
(v_i, v_{i+1}) ∈ E or (v_{i+1}, v_i) ∈ E, i.e. we are allowed to walk backwards.
Definition (Path). A path is a walk where v1 , · · · , vk are pairwise distinct.
Definition (Cycle). A cycle is a walk where v1 , · · · , vk−1 are pairwise distinct
and v1 = vk .
Definition (Connected graph). A graph is connected if for any pair of vertices,
there is an undirected path between them.
Definition (Tree). A tree is a connected graph without (undirected) cycles.
Definition (Spanning tree). A spanning tree of a graph G = (V, E) is a tree
(V′, E′) with V′ = V and E′ ⊆ E.
5.2 Minimum-cost flow problem
In the minimum-cost flow problem, each vertex i ∈ V has a supply b_i (which may be negative, representing a demand), and a flow x must satisfy
  b_i + Σ_{j:(j,i)∈E} x_{ji} = Σ_{j:(i,j)∈E} x_{ij}   for each i ∈ V.
This problem is a linear program. In theory, we can write it in the general
form Ax = b, where A is a huge matrix given by
  a_{ik} =  1   if the kth edge starts at vertex i,
           −1   if the kth edge ends at vertex i,
            0   otherwise.
However, using this huge matrix to solve this problem by the simplex method is
not very efficient. So we will look for better solutions.
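As a small illustration (not from the notes), this matrix is easy to build explicitly; a sketch assuming numpy:

import numpy as np

def incidence_matrix(n_vertices, edges):
    """a[i, k] = +1 if edge k starts at vertex i, -1 if it ends there, 0 otherwise."""
    A = np.zeros((n_vertices, len(edges)))
    for k, (u, v) in enumerate(edges):
        A[u, k] = 1.0     # edge k starts at u
        A[v, k] = -1.0    # edge k ends at v
    return A

# A small example: vertices 0, 1, 2, 3 and edges 0->1, 0->2, 1->3, 2->3.
print(incidence_matrix(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))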
Note that for the system to make sense, we must have
  Σ_{i∈V} b_i = 0,
i.e. the total supply equals the total demand.
5.3 The transportation problem
In the transportation problem, there are n suppliers with supplies s_1, . . . , s_n and m consumers with demands d_1, . . . , d_m, and transporting one unit from supplier i to consumer j costs c_{ij}. We want to minimize the total cost Σ_{i,j} c_{ij} x_{ij} subject to
  Σ_{j=1}^m x_{ij} = s_i   for i = 1, . . . , n,
  Σ_{i=1}^n x_{ij} = d_j   for j = 1, . . . , m,
  x ≥ 0.
The idea is that if the capacity of the edge (i, j) is, say, 5, in the original network,
and we want to transport 3 along this edge, then in the new network, we send 3
units from ij to j, and 2 units to i.
The tricky part of the proof is to show that we have the same constraints in
both graphs.
For any flow x in the original network, the corresponding flow on (ij, j) is
x_{ij} and the flow on (ij, i) is m_{ij} − x_{ij}. The total flow into i is then
  Σ_{k:(i,k)∈E} (m_{ik} − x_{ik}) + Σ_{k:(k,i)∈E} x_{ki},
which is exactly the constraint for the node i in the original minimum-cost flow
problem. So done.
To solve the transportation problem, it is convenient to have two sets of
Lagrange multipliers, one for the supplier constraints and one for the consumer
constraints. The Lagrangian is
  L(x, λ, µ) = Σ_{i,j} c_{ij} x_{ij} − Σ_{i=1}^n λ_i (Σ_{j=1}^m x_{ij} − s_i) + Σ_{j=1}^m µ_j (Σ_{i=1}^n x_{ij} − d_j).
Note that we use different signs for the Lagrange multipliers for the suppliers
and the consumers, so that our ultimate optimality condition will look nicer.
This is equivalent to
  L(x, λ, µ) = Σ_{i=1}^n Σ_{j=1}^m (c_{ij} − λ_i + µ_j) x_{ij} + Σ_{i=1}^n λ_i s_i − Σ_{j=1}^m µ_j d_j.
Since x ≥ 0, the Lagrangian has a finite minimum iff cij − λi + µj ≥ 0 for all
i, j. So this is our dual feasibility condition.
At an optimum, complementary slackness entails that
(cij − λi + µj )xij = 0
for all i, j.
In this case, we have a tableau as follows (each cell holds the value λ_i − µ_j, the allocation x_{ij} and the cost c_{ij}; row i is labelled by λ_i and carries the supply s_i, column j is labelled by µ_j and carries the demand d_j):

         µ1             µ2             µ3             µ4
  λ1  λ1−µ1 x11 c11  λ1−µ2 x12 c12  λ1−µ3 x13 c13  λ1−µ4 x14 c14   s1
  λ2  λ2−µ1 x21 c21  λ2−µ2 x22 c22  λ2−µ3 x23 c23  λ2−µ4 x24 c24   s2
  λ3  λ3−µ1 x31 c31  λ3−µ2 x32 c32  λ3−µ3 x33 c33  λ3−µ4 x34 c34   s3
         d1             d2             d3             d4
We have a row for each supplier and a column for each consumer.
Example. Suppose we have three suppliers with supplies 8, 10 and 9; and four
consumers with demands 6, 5, 8, 8.
It is easy to create an initial feasible solution - we just start from the first
consumer and first supplier, and supply as much as we can until one side runs
out of stuff.
We first fill our tableau with our feasible solution; in each cell the allocation x_{ij} (if any) is written to the left of the cost c_{ij}:

      6 | 5    2 | 3      | 4      | 6     8
        | 2    3 | 7    7 | 4      | 1    10
        | 5      | 6    1 | 2    8 | 4     9
        6        5        8        8
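A sketch of this greedy filling (sometimes called the north-west corner rule; not part of the notes) in plain Python:

def initial_allocation(supplies, demands):
    """Greedily satisfy demands in order, moving on when a supplier is exhausted."""
    s, d = list(supplies), list(demands)
    x = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        amount = min(s[i], d[j])     # send as much as possible from supplier i to consumer j
        x[i][j] = amount
        s[i] -= amount
        d[j] -= amount
        if s[i] == 0:
            i += 1                   # supplier i has run out
        else:
            j += 1                   # consumer j is satisfied
    return x

print(initial_allocation([8, 10, 9], [6, 5, 8, 8]))
# [[6, 2, 0, 0], [0, 3, 7, 0], [0, 0, 1, 8]] -- the allocation in the tableau above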
[Figure: the bipartite graph of suppliers (s1 = 8, s2 = 10, s3 = 9) and consumers (d1 = 6, d2 = 5, d3 = 8, d4 = 8), with the cells carrying flow drawn as edges with flows 6, 2, 3, 7, 1 and 8.]
We see that our basic feasible solution corresponds to a spanning tree. In general,
if we have n suppliers and m consumers, then we have n + m vertices, and hence
n + m − 1 edges. So we have n + m − 1 dual constraints. So we can arbitrarily
choose one Lagrange multiplier, and the other Lagrange multipliers will follow.
We choose λ1 = 0. Since we require
  (c_{ij} − λ_i + µ_j) x_{ij} = 0,
every basic cell (with x_{ij} > 0) must satisfy λ_i − µ_j = c_{ij}, and we can read the remaining multipliers off along the spanning tree: λ = (0, 4, 2) and µ = (−5, −3, 0, −2).

            µ1 = −5    µ2 = −3    µ3 = 0    µ4 = −2
  λ1 = 0     [6] 5      [2] 3     0 | 4     2 | 6      8
  λ2 = 4     9 | 2      [3] 7     [7] 4     6 | 1     10
  λ3 = 2     7 | 5      5 | 6     [1] 2     [8] 4      9
                6          5          8          8

Here allocations are shown in brackets, and for each non-basic cell the value λ_i − µ_j is written before the cost.
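The propagation of multipliers along the spanning tree is mechanical; here is a sketch (an illustration only, not from the notes) that recovers the values above:

def multipliers(costs, basic_cells):
    """Solve lambda_i - mu_j = c_ij over the basic cells, taking lambda_1 = 0."""
    lam = {0: 0.0}
    mu = {}
    changed = True
    while changed:                        # keep propagating along the tree
        changed = False
        for (i, j) in basic_cells:
            if i in lam and j not in mu:
                mu[j] = lam[i] - costs[i][j]
                changed = True
            elif j in mu and i not in lam:
                lam[i] = costs[i][j] + mu[j]
                changed = True
    return lam, mu

costs = [[5, 3, 4, 6],
         [2, 7, 4, 1],
         [5, 6, 2, 4]]
basic = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
print(multipliers(costs, basic))
# lambda = {0: 0, 1: 4, 2: 2}, mu = {0: -5, 1: -3, 2: 0, 3: -2}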
The dual feasibility condition is
  λ_i − µ_j ≤ c_{ij},
and it is violated, for example, at the cell (2, 1), where λ2 − µ1 = 9 > 2 = c21. So we introduce a flow of δ on the edge (2, 1) and adjust the flows around the resulting cycle:
  [6−δ] 5   [2+δ] 3        4         6
  [δ]   2   [3−δ] 7    [7] 4         1
        5         6    [1] 2     [8] 4
[Figure: the corresponding graph, in which the cycle through cells (1,1), (1,2), (2,2), (2,1) carries flows 6 − δ, 2 + δ, 3 − δ and δ, while the remaining basic edges carry 7, 1 and 8 as before.]
The maximum possible value of δ is 3. So we have
  [3] 5   [5] 3       4       6
  [3] 2       7   [7] 4       1
      5       6   [1] 2   [8] 4
We re-compute the Lagrange multipliers to obtain λ = (0, −3, −5) and µ = (−5, −3, −7, −9):

            µ1 = −5    µ2 = −3    µ3 = −7    µ4 = −9
  λ1 = 0     [3] 5      [5] 3      7 | 4      9 | 6
  λ2 = −3    [3] 2      0 | 7      [7] 4      6 | 1
  λ3 = −5    0 | 5     −2 | 6      [1] 2      [8] 4
We see a violation at the bottom right. So we do it again:
  [3] 5   [5] 3         4          6
  [3] 2       7   [7−δ] 4     [δ]  1
      5       6   [1+δ] 2   [8−δ] 4
The maximum possible value of δ is 7. So we have
  [3] 5   [5] 3       4       6
  [3] 2       7       4   [7] 1
      5       6   [8] 2   [1] 4
5.4 The maximum flow problem
All this clumsy notation says is that we add up the capacities of all edges from
S to V \ S.
Assume x is a feasible flow vector that sends δ units from 1 to n. For
X, Y ⊆ V, we define
  f_x(X, Y) = Σ_{(i,j)∈(X×Y)∩E} x_{ij}.
For any solution x_{ij} and cut S ⊆ V with 1 ∈ S, n ∈ V \ S, the total flow
from 1 to n can be written as
  δ = Σ_{i∈S} ( Σ_{j:(i,j)∈E} x_{ij} − Σ_{j:(j,i)∈E} x_{ji} ).
This is true since by flow conservation, for any i ≠ 1 the term Σ_{j:(i,j)∈E} x_{ij} − Σ_{j:(j,i)∈E} x_{ji} vanishes, and for i = 1 it is δ. So the sum is δ. Hence
δ = fx (S, V ) − fx (V, S)
= fx (S, S) + fx (S, V \ S) − fx (V \ S, S) − fx (S, S)
= fx (S, V \ S) − fx (V \ S, S)
≤ fx (S, V \ S)
≤ C(S)
This says that the flow through the cut is less than the capacity of the cut, which
is obviously true. The less obvious result is that this bound is tight, i.e. there is
always a cut S such that δ = C(S).
Theorem (Max-flow min-cut theorem). Let δ be an optimal solution. Then
δ = min{C(S) : S ⊆ V, 1 ∈ S, n ∈ V \ S}
δ = fx (S, V \ S) − fx (V \ S, S).
δ = C(S).
The max-flow min-cut theorem does not tell us how to find a maximum flow.
Instead, it provides a quick way to confirm that a flow we have found is optimal.
It turns out that it isn’t difficult to find an optimal solution. We simply keep
adding flow along augmenting paths until we cannot do so. This is known as the
Ford-Fulkerson algorithm.
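A minimal sketch of the algorithm (an illustration, not from the notes), using breadth-first search to find augmenting paths in the residual network; capacities are given as a dictionary of dictionaries:

from collections import deque

def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS augmenting paths. capacity[u][v] is the capacity of (u, v)."""
    # Residual capacities, with reverse edges of capacity 0 added.
    residual = {u: dict(adj) for u, adj in capacity.items()}
    for u, adj in capacity.items():
        for v in adj:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                          # no augmenting path left: flow is maximal
        # Walk back along the path, find the bottleneck, and augment.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= delta
            residual[v][u] += delta
        flow += delta

# Example: two routes from 's' to 't'.
print(max_flow({'s': {'a': 3, 'b': 2}, 'a': {'t': 2}, 'b': {'t': 2}}, 's', 't'))   # 4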
[Figures: the example network with its edge capacities, and the flow obtained by repeatedly augmenting along paths from 1 to n]
(red is flow, black is capacity). We know this is an optimum, since our total flow
is 6, and we can draw a cut with capacity 6:
[Figure: the same network with a cut of capacity 6 separating vertex 1 from vertex n]
38