
Part IB — Optimisation

Theorems with proof

Based on lectures by F. A. Fischer


Notes taken by Dexter Chua

Easter 2015

These notes are not endorsed by the lecturers, and I have modified them (often
significantly) after lectures. They are nowhere near accurate representations of what
was actually lectured, and in particular, all errors are almost surely mine.

Lagrangian methods
General formulation of constrained problems; the Lagrangian sufficiency theorem.
Interpretation of Lagrange multipliers as shadow prices. Examples. [2]

Linear programming in the nondegenerate case


Convexity of feasible region; sufficiency of extreme points. Standardization of problems,
slack variables, equivalence of extreme points and basic solutions. The primal simplex
algorithm, artificial variables, the two-phase method. Practical use of the algorithm;
the tableau. Examples. The dual linear problem, duality theorem in a standardized
case, complementary slackness, dual variables and their interpretation as shadow prices.
Relationship of the primal simplex algorithm to dual problem. Two person zero-sum
games. [6]

Network problems
The Ford-Fulkerson algorithm and the max-flow min-cut theorems in the rational case.
Network flows with costs, the transportation algorithm, relationship of dual variables
with nodes. Examples. Conditions for optimality in more general networks; *the
simplex-on-a-graph algorithm*. [3]

Practice and applications


*Efficiency of algorithms*. The formulation of simple practical and combinatorial
problems as linear programming or network problems. [1]


Contents
1 Introduction and preliminaries
  1.1 Constrained optimization
  1.2 Review of unconstrained optimization

2 The method of Lagrange multipliers
  2.1 Complementary Slackness
  2.2 Shadow prices
  2.3 Lagrange duality
  2.4 Supporting hyperplanes and convexity

3 Solutions of linear programs
  3.1 Linear programs
  3.2 Basic solutions
  3.3 Extreme points and optimal solutions
  3.4 Linear programming duality
  3.5 Simplex method
    3.5.1 The simplex tableau
    3.5.2 Using the Tableau
  3.6 The two-phase simplex method

4 Non-cooperative games
  4.1 Games and Solutions
  4.2 The minimax theorem

5 Network problems
  5.1 Definitions
  5.2 Minimum-cost flow problem
  5.3 The transportation problem
  5.4 The maximum flow problem


1 Introduction and preliminaries


1.1 Constrained optimization
1.2 Review of unconstrained optimization
Lemma. Let f be twice differentiable. Then f is convex on a convex set S if the
Hessian matrix

Hf_ij = ∂²f/(∂x_i ∂x_j)

is positive semidefinite for all x ∈ S, i.e. vT Hf(x) v ≥ 0 for all v ∈ Rn.

Theorem. Let X ⊆ Rn be convex, f : Rn → R be twice differentiable on X.
If x∗ ∈ X satisfies ∇f(x∗) = 0 and Hf(x) is positive semidefinite for all x ∈ X,
then x∗ minimizes f on X.
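As a quick sanity check of the lemma (my own illustration, not from the lectures), positive semidefiniteness of a Hessian can be verified numerically by checking that its eigenvalues are non-negative. A minimal Python sketch, assuming NumPy is available, for f(x) = x1² + x1x2 + x2², whose Hessian is constant:

    import numpy as np

    # f(x) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]].
    H = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # A symmetric matrix is positive semidefinite iff all its eigenvalues are >= 0.
    eigenvalues = np.linalg.eigvalsh(H)
    print(eigenvalues)                    # [1. 3.]
    print(np.all(eigenvalues >= -1e-12))  # True, so f is convex on R^2 by the lemma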


2 The method of Lagrange multipliers


Theorem (Lagrangian sufficiency). Let x∗ ∈ X and λ∗ ∈ Rm be such that

L(x∗, λ∗) = inf_{x∈X} L(x, λ∗)   and   h(x∗) = b.

Then x∗ is optimal for (P).

In words, if x∗ minimizes L for a fixed λ∗, and x∗ satisfies the constraints,
then x∗ minimizes f.
Proof. We first define the “feasible set”: let X(b) = {x ∈ X : h(x) = b}, i.e. the
set of all x that satisfy the constraints. Then

min_{x∈X(b)} f(x) = min_{x∈X(b)} (f(x) − λ∗T(h(x) − b))    since h(x) − b = 0
                  ≥ min_{x∈X} (f(x) − λ∗T(h(x) − b))
                  = f(x∗) − λ∗T(h(x∗) − b)    since x∗ attains the infimum over X
                  = f(x∗)    since h(x∗) = b.

Since x∗ ∈ X(b), this shows x∗ is optimal for (P).
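As a simple illustration (my own example, not from the lectures): to minimize
f(x) = x1² + x2² over X = R² subject to h(x) = x1 + x2 = b, take
L(x, λ) = x1² + x2² − λ(x1 + x2 − b). For a fixed λ the unconstrained minimizer is
x1 = x2 = λ/2, and choosing λ∗ = b makes this point satisfy the constraint, so by
the theorem x∗ = (b/2, b/2) is optimal.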

2.1 Complementary Slackness


2.2 Shadow prices
Theorem. Consider the problem

minimize f(x) subject to h(x) = b.

Here we assume all functions are continuously differentiable. Suppose that for
each b ∈ Rn, φ(b) is the optimal value of f and λ∗ is the corresponding Lagrange
multiplier. Then

∂φ/∂b_i = λ∗_i.
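Continuing the illustration above: there φ(b) = (b/2)² + (b/2)² = b²/2 and
λ∗ = b, so ∂φ/∂b = b = λ∗, in agreement with the theorem.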

2.3 Lagrange duality


Theorem (Weak duality). If x ∈ X(b) (i.e. x satisfies both the functional and
regional constraints) and λ ∈ Y, then

g(λ) ≤ f(x).

In particular,

sup_{λ∈Y} g(λ) ≤ inf_{x∈X(b)} f(x).

Proof.

g(λ) = inf_{x′∈X} L(x′, λ)
     ≤ L(x, λ)
     = f(x) − λT(h(x) − b)
     = f(x)    since h(x) = b for x ∈ X(b).


2.4 Supporting hyperplanes and convexity


Theorem. (P) satisfies strong duality iff φ(c) = inf_{x∈X(c)} f(x) has a supporting
hyperplane at b.
Proof. (⇐) Suppose there is a supporting hyperplane. Then since the plane
passes through φ(b), it must be of the form

α(c) = φ(b) + λT(c − b).

Since this is supporting, for all c ∈ Rm,

φ(b) + λT(c − b) ≤ φ(c),

or

φ(b) ≤ φ(c) − λT(c − b).

This implies that

φ(b) ≤ inf_{c∈Rm} (φ(c) − λT(c − b))
     = inf_{c∈Rm} inf_{x∈X(c)} (f(x) − λT(h(x) − b))
          (since φ(c) = inf_{x∈X(c)} f(x) and h(x) = c for x ∈ X(c))
     = inf_{x∈X} L(x, λ)
          (since ∪_{c∈Rm} X(c) = X, which is true since for any x ∈ X, we have x ∈ X(h(x)))
     = g(λ).

By weak duality, g(λ) ≤ φ(b). So φ(b) = g(λ). So strong duality holds.


(⇒) Assume now that we have strong duality. Then there exists λ such that
for all c ∈ Rm,

φ(b) = g(λ)
     = inf_{x∈X} L(x, λ)
     ≤ inf_{x∈X(c)} L(x, λ)
     = inf_{x∈X(c)} (f(x) − λT(h(x) − b))
     = φ(c) − λT(c − b).

So φ(b) + λT(c − b) ≤ φ(c). So this defines a supporting hyperplane.


Theorem (Supporting hyperplane theorem). Suppose that φ : Rm → R is
convex and b ∈ Rm lies in the interior of the set of points where φ is finite. Then
there exists a supporting hyperplane to φ at b.
Theorem. Let

φ(b) = inf_{x∈X} {f(x) : h(x) ≤ b}.

If X, f, h are convex, then so is φ (assuming feasibility and boundedness).


Proof. Consider b1, b2 ∈ Rm such that φ(b1) and φ(b2) are defined. Let δ ∈ [0, 1]
and define b = δb1 + (1 − δ)b2. We want to show that φ(b) ≤ δφ(b1) + (1 − δ)φ(b2).
Consider x1 ∈ X(b1), x2 ∈ X(b2), and let x = δx1 + (1 − δ)x2. By convexity
of X, x ∈ X.
By convexity of h,

h(x) = h(δx1 + (1 − δ)x2)
     ≤ δh(x1) + (1 − δ)h(x2)
     ≤ δb1 + (1 − δ)b2
     = b.

So x ∈ X(b). Since φ(b) is the optimal value over X(b), and f is convex,

φ(b) ≤ f(x)
     = f(δx1 + (1 − δ)x2)
     ≤ δf(x1) + (1 − δ)f(x2).

This holds for any x1 ∈ X(b1) and x2 ∈ X(b2). So by taking the infimum of the
right hand side,

φ(b) ≤ δφ(b1) + (1 − δ)φ(b2).

So φ is convex.
Theorem. If a linear program is feasible and bounded, then it satisfies strong
duality.
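As a numerical illustration of this (my own sketch, not from the lectures; the
particular LP is invented, and SciPy's linprog is assumed to be available), one can
solve a small primal of the form minimize cTx subject to Ax ≥ b, x ≥ 0 together
with its dual, maximize λTb subject to ATλ ≤ c, λ ≥ 0, and check that the two
optimal values agree:

    import numpy as np
    from scipy.optimize import linprog

    # Primal: minimize c^T x subject to Ax >= b, x >= 0.
    c = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([3.0, 4.0])

    # linprog minimizes subject to A_ub x <= b_ub, so rewrite Ax >= b as -Ax <= -b.
    primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)

    # Dual: maximize b^T lam subject to A^T lam <= c, lam >= 0,
    # i.e. minimize -b^T lam.
    dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

    print(primal.fun)   # primal optimal value, 3.5 for this instance
    print(-dual.fun)    # dual optimal value, also 3.5, as strong duality predicts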


3 Solutions of linear programs


3.1 Linear programs
3.2 Basic solutions
Theorem. A vector x is a basic feasible solution of Ax = b if and only if it is
an extreme point of the set X(b) = {x′ : Ax′ = b, x′ ≥ 0}.

3.3 Extreme points and optimal solutions


Theorem. If (P) is feasible and bounded, then there exists an optimal solution
that is a basic feasible solution.
Proof. Let x be an optimal solution of (P). If x has at most m non-zero entries, it
is a basic feasible solution, and we are done.
Now suppose x has r > m non-zero entries. Since it is then not an extreme point,
we have y ≠ z ∈ X(b) and δ ∈ (0, 1) such that

x = δy + (1 − δ)z.

We will show there exists an optimal solution with strictly fewer than r non-zero
entries. The result then follows by induction.
By optimality of x, we have cT x ≥ cT y and cT x ≥ cT z.
Since cT x = δcT y + (1 − δ)cT z, we must have that cT x = cT y = cT z, i.e. y
and z are also optimal.
Since y ≥ 0 and z ≥ 0, x = δy + (1 − δ)z implies that yi = zi = 0 whenever
xi = 0.
So the non-zero entries of y and z form a subset of the non-zero entries of x. So
y and z have at most r non-zero entries, which must occur in rows where x is
also non-zero.
If y or z has strictly fewer than r non-zero entries, then we are done. Otherwise,
for any δ̂ (not necessarily in (0, 1)), let

x_δ̂ = δ̂y + (1 − δ̂)z = z + δ̂(y − z).

Observe that x_δ̂ is optimal for every δ̂ ∈ R, since cT y = cT z.
Moreover, y − z ≠ 0, and all non-zero entries of y − z occur in rows where
x is non-zero as well. We can thus choose δ̂ ∈ R such that x_δ̂ ≥ 0 and x_δ̂ has
strictly fewer than r non-zero entries.
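To make this concrete (a small sketch of my own, not from the lectures, assuming
NumPy): for A ∈ Rm×n in standard form, a basic solution is obtained by picking m
linearly independent columns of A, solving for those variables and setting the rest
to zero; the feasible ones are exactly the extreme points, and by the theorem above
a finite optimum of any linear objective is attained at one of them.

    import itertools
    import numpy as np

    # Standard-form constraints Ax = b, x >= 0 (a toy instance for illustration).
    A = np.array([[1.0, 1.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0, 1.0]])
    b = np.array([4.0, 6.0])
    c = np.array([3.0, 5.0, 0.0, 0.0])   # some linear objective
    m, n = A.shape

    # A basic solution: choose m columns, solve for them, set the others to zero.
    for basis in itertools.combinations(range(n), m):
        cols = list(basis)
        B = A[:, cols]
        if abs(np.linalg.det(B)) < 1e-12:
            continue                      # chosen columns not linearly independent
        x = np.zeros(n)
        x[cols] = np.linalg.solve(B, b)
        if np.all(x >= -1e-12):           # basic *feasible* solution = extreme point
            print(cols, x, c @ x)         # the BFS and its objective value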

3.4 Linear programming duality


Theorem. The dual of the dual of a linear program is the primal.
Proof. It suffices to show this for the linear program in general form. We have
shown above that the dual problem is

minimize −bTλ subject to −ATλ ≥ −c, λ ≥ 0.

This problem has the same form as the primal, with −b taking the role of c, −c
taking the role of b, and −AT taking the role of A. So taking the dual again, we
get back to the original problem.


Theorem. Let x and λ be feasible for the primal and the dual of the linear
program in general form. Then x and λ are optimal if and only if they satisfy
complementary slackness, i.e. if

(cT − λTA)x = 0   and   λT(Ax − b) = 0.

Proof. If x and λ are optimal, then

cTx = λTb,

since every linear program satisfies strong duality. So

cTx = λTb
    = inf_{x′∈X} (cTx′ − λT(Ax′ − b))
    ≤ cTx − λT(Ax − b)
    ≤ cTx.

The last line is since Ax ≥ b and λ ≥ 0.


The first and last terms are the same, so the inequalities hold with equality.
Therefore
λT b = cT x − λT (Ax − b) = (cT − λT A)x + λT b.
So
(cT − λT A)x = 0.
Also,
cT x − λT (Ax − b) = cT x
implies
λT (Ax − b) = 0.
On the other hand, suppose we have complementary slackness, i.e.

(cT − λT A)x = 0 and λT (Ax − b) = 0,

then
cT x = cT x − λT (Ax − b) = (cT − λT A)x + λT b = λT b.
Hence by weak duality, x and λ are optimal.
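A toy numerical check (an invented instance of my own, assuming NumPy): for
the primal minimize cTx subject to Ax ≥ b, x ≥ 0 with c = (1, 2), A = (1 1) and
b = (3), the primal optimum is x∗ = (3, 0) and the dual optimum is λ∗ = 1 (both
with value 3), and both complementary slackness products vanish:

    import numpy as np

    c = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0]])
    b = np.array([3.0])

    x_opt = np.array([3.0, 0.0])    # primal optimum, value 3
    lam_opt = np.array([1.0])       # dual optimum, value 3

    print((c - lam_opt @ A) @ x_opt)    # (c^T - lam^T A) x = 0
    print(lam_opt @ (A @ x_opt - b))    # lam^T (Ax - b)   = 0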

3.5 Simplex method


3.5.1 The simplex tableau
3.5.2 Using the Tableau

3.6 The two-phase simplex method


4 Non-cooperative games
4.1 Games and Solutions
Theorem (Nash, 1951). Every bimatrix game has an equilibrium.

4.2 The minimax theorem


Theorem (von Neumann, 1928). If P ∈ Rm×n, then

max_{x∈X} min_{y∈Y} p(x, y) = min_{y∈Y} max_{x∈X} p(x, y).

Note that this is equivalent to

max_{x∈X} min_{y∈Y} p(x, y) = − max_{y∈Y} min_{x∈X} (−p(x, y)).

The left hand side is the worst payoff the row player can get if he employs the
minimax strategy. The right hand side is the worst payoff the column player
can get if he uses his minimax strategy.
The theorem then says that if both players employ the minimax strategy,
then this is an equilibrium.
Proof. Recall that max_{x∈X} min_{y∈Y} p(x, y) is the optimal value of the linear
program

maximize v such that
    Σ_{i=1}^{m} x_i p_ij ≥ v   for all j = 1, · · · , n
    Σ_{i=1}^{m} x_i = 1
    x ≥ 0.

Adding slack variables z ∈ Rn with z ≥ 0, we obtain the Lagrangian

L(v, x, z, w, y) = v + Σ_{j=1}^{n} y_j (Σ_{i=1}^{m} x_i p_ij − z_j − v) − w (Σ_{i=1}^{m} x_i − 1),

where w ∈ R and y ∈ Rn are Lagrange multipliers. This is equal to

(1 − Σ_{j=1}^{n} y_j) v + Σ_{i=1}^{m} (Σ_{j=1}^{n} p_ij y_j − w) x_i − Σ_{j=1}^{n} y_j z_j + w.

This has a finite maximum over v ∈ R, x ≥ 0 and z ≥ 0 if and only if Σ_j y_j = 1,
Σ_j p_ij y_j ≤ w for all i, and y ≥ 0. The dual is therefore

minimize w subject to
    Σ_{j=1}^{n} p_ij y_j ≤ w   for all i
    Σ_{j=1}^{n} y_j = 1
    y ≥ 0.


This corresponds to the column player choosing a strategy (yi ) such that the
expected payoff is bounded above by w.
The optimum value of the dual is min_{y∈Y} max_{x∈X} p(x, y). So the result follows
from strong duality.
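A sketch of how such a game might be solved numerically (my own illustration,
not from the lectures; the payoff matrix is invented and SciPy's linprog is assumed
to be available). The primal above, written as a minimization over the variables
(x, v), can be handed to a generic LP solver:

    import numpy as np
    from scipy.optimize import linprog

    # Row player's payoff matrix (an arbitrary example).
    P = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    m, n = P.shape

    # Variables (x_1, ..., x_m, v): maximize v, i.e. minimize -v, subject to
    # sum_i x_i p_ij >= v for all j, sum_i x_i = 1, x >= 0, v unrestricted.
    objective = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-P.T, np.ones((n, 1))])          # v - sum_i x_i p_ij <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]

    result = linprog(objective, A_ub=A_ub, b_ub=b_ub,
                     A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    x_opt, value = result.x[:m], -result.fun
    print(x_opt, value)   # optimal mixed strategy (0.25, 0.75) and game value 2.5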

Theorem. (x, y) ∈ X × Y is an equilibrium of the matrix game with payoff
matrix P if and only if

min_{y′∈Y} p(x, y′) = max_{x′∈X} min_{y′∈Y} p(x′, y′)
max_{x′∈X} p(x′, y) = min_{y′∈Y} max_{x′∈X} p(x′, y′),

i.e. x and y are optimizers for the max min and min max problems.


5 Network problems
5.1 Definitions
5.2 Minimum-cost flow problem
5.3 The transportation problem
Theorem. Every minimum-cost flow problem with finite capacities or non-negative
costs has an equivalent transportation problem.

Proof. Consider a minimum-cost flow problem on a network (V, E), writing ℓ_ij and
m_ij for the lower and upper capacity of edge (i, j). It is wlog to assume that
ℓ_ij = 0 for all (i, j) ∈ E. Otherwise, set ℓ_ij to 0, m_ij to m_ij − ℓ_ij, b_i to b_i − ℓ_ij,
b_j to b_j + ℓ_ij, and x_ij to x_ij − ℓ_ij. Intuitively, we just secretly ship the minimum
amount without letting the network know.
Moreover, we can assume that all capacities are finite: if some edge has
infinite capacity but non-negative cost, then setting the capacity to a large
enough number, for example Σ_{i∈V} |b_i|, does not affect the optimal solutions.
This is since cost is non-negative, and the optimal solution will not want shipping
loops. So we will have at most Σ_{i∈V} |b_i| shipments.
We will construct an instance of the transportation problem as follows:
For every i ∈ V, add a consumer with demand Σ_{k:(i,k)∈E} m_ik − b_i.
For every (i, j) ∈ E, add a supplier with supply m_ij, an edge to consumer i
with cost c_(ij,i) = 0 and an edge to consumer j with cost c_(ij,j) = c_ij.
[Diagram: supplier ij (supply m_ij) is joined to consumer i (demand
Σ_{k:(i,k)∈E} m_ik − b_i) by an edge of cost 0, and to consumer j (demand
Σ_{k:(j,k)∈E} m_jk − b_j) by an edge of cost c_ij.]

The idea is that if the capacity of the edge (i, j) is, say, 5, in the original network,
and we want to transport 3 along this edge, then in the new network, we send 3
units from ij to j, and 2 units to i.
The tricky part of the proof is to show that we have the same constraints in
both graphs.
For any flow x in the original network, the corresponding flow on (ij, j) is
x_ij and the flow on (ij, i) is m_ij − x_ij. The total flow into consumer i is then

Σ_{k:(i,k)∈E} (m_ik − x_ik) + Σ_{k:(k,i)∈E} x_ki.

This satisfies the constraints of the new network if and only if

Σ_{k:(i,k)∈E} (m_ik − x_ik) + Σ_{k:(k,i)∈E} x_ki = Σ_{k:(i,k)∈E} m_ik − b_i,

which is true if and only if

b_i + Σ_{k:(k,i)∈E} x_ki − Σ_{k:(i,k)∈E} x_ik = 0,


which is exactly the constraint for the node i in the original minimal-cost flow
problem. So done.
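A small sketch of this construction in code (illustrative only; the function name
and data structures are my own, not from the lectures, and lower capacities are
assumed to have already been shifted to zero):

    # Build the transportation instance from a min-cost flow instance.
    # capacity[(i, j)] and cost[(i, j)] describe the edges; b[i] is the net
    # supply (positive) or demand (negative) at node i.
    def to_transportation(nodes, capacity, cost, b):
        # One consumer per node i, with demand sum_k m_ik - b_i.
        demand = {i: sum(m for (u, _), m in capacity.items() if u == i) - b[i]
                  for i in nodes}
        # One supplier per edge (i, j) with supply m_ij, joined to consumer i
        # at cost 0 and to consumer j at cost c_ij.
        supply = {}
        edge_cost = {}
        for (i, j), m in capacity.items():
            supply[(i, j)] = m
            edge_cost[((i, j), i)] = 0.0
            edge_cost[((i, j), j)] = cost[(i, j)]
        return supply, demand, edge_cost

    # Tiny example: one edge of capacity 5 and cost 3, shipping 2 units from 1 to 2.
    supply, demand, edge_cost = to_transportation(
        nodes=[1, 2],
        capacity={(1, 2): 5.0},
        cost={(1, 2): 3.0},
        b={1: 2.0, 2: -2.0},
    )
    print(supply)      # {(1, 2): 5.0}
    print(demand)      # {1: 3.0, 2: 2.0} (total demand equals total supply)
    print(edge_cost)   # {((1, 2), 1): 0.0, ((1, 2), 2): 3.0}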

5.4 The maximum flow problem


Theorem (Max-flow min-cut theorem). Let δ be the value of an optimal flow. Then

δ = min{C(S) : S ⊆ V, 1 ∈ S, n ∈ V \ S}.

Proof. Consider any feasible flow vector x. Call a path v_0, · · · , v_k an augmenting
path if the flow along the path can be increased. Formally, it is a path that
satisfies

x_{v_{i−1} v_i} < C_{v_{i−1} v_i}   or   x_{v_i v_{i−1}} > 0

for i = 1, · · · , k. The first condition says that we have a forward edge where
we have not hit the capacity, while the second condition says that we have a
backwards edge with positive flow. If these conditions are satisfied, we can
increase the flow of each forward edge (or decrease the flow of each backwards
edge), and the total flow increases.
Now assume that x is optimal and let

S = {1} ∪ {i ∈ V : there exists an augmenting path from 1 to i}.

Since there is an augmenting path from 1 to every vertex of S, we could increase
the flow from 1 to any vertex in S. So n ∉ S by optimality. So n ∈ V \ S.
We have previously shown that

δ = fx (S, V \ S) − fx (V \ S, S).

We now claim that fx(V \ S, S) = 0. If it is not 0, it means that there is a node
v ∈ V \ S such that there is flow from v to a vertex u ∈ S. Then we can add
that edge to the augmenting path to u to obtain an augmenting path to v.
Also, we must have fx(S, V \ S) = C(S): otherwise some edge from S to V \ S is
below capacity, so its endpoint in V \ S would be reachable by an augmenting path
and hence lie in S, a contradiction. So we have

δ = C(S).
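The proof suggests the usual algorithm: repeatedly look for an augmenting path
and push flow along it; when none exists, the set S of vertices reachable by
augmenting paths certifies optimality. A minimal Python sketch along these lines
(my own illustration, not from the lectures; it finds augmenting paths by
breadth-first search):

    from collections import deque

    def max_flow(capacity, n, source, sink):
        # capacity[u][v] is the capacity of edge (u, v); flow[u][v] is built up gradually.
        flow = [[0] * n for _ in range(n)]
        total = 0
        while True:
            # Breadth-first search for an augmenting path: forward edges below
            # capacity, or backward edges carrying positive flow.
            parent = [None] * n
            parent[source] = source
            queue = deque([source])
            while queue and parent[sink] is None:
                u = queue.popleft()
                for v in range(n):
                    if parent[v] is None and capacity[u][v] - flow[u][v] + flow[v][u] > 0:
                        parent[v] = u
                        queue.append(v)
            if parent[sink] is None:
                return total, flow          # no augmenting path: the flow is maximal
            # Find the bottleneck along the path, then push that much flow along it.
            path, v = [], sink
            while v != source:
                path.append((parent[v], v))
                v = parent[v]
            push = min(capacity[u][v] - flow[u][v] + flow[v][u] for u, v in path)
            for u, v in path:
                back = min(push, flow[v][u])    # cancel backward flow first
                flow[v][u] -= back
                flow[u][v] += push - back
            total += push

    # Example: 4 nodes, source 0, sink 3; the maximum flow (and minimum cut) is 4.
    cap = [[0, 3, 2, 0],
           [0, 0, 1, 2],
           [0, 0, 0, 2],
           [0, 0, 0, 0]]
    print(max_flow(cap, 4, 0, 3)[0])   # 4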
