Chapter 1: Linear Programming

1.1 Transportation of Commodities
The freight F in dollars per case per thousand miles is F = 90, so that
the transport cost cij in thousands of dollars per case is given by

cij = F dij / 1000 .
[Fig. 1: the polyhedron P of feasible solutions with vertices A, . . . , E; the boundary lines x1 = 0, x2 = 0, x3 = 0, x4 = 0 are indicated.]
submatrix of A with columns aji , i = 1, . . . , r ; xJ refers to the vector (xj1 , . . . , xjr )ᵀ. For
notational simplicity, the set {ji | i = 1, 2, . . . , r} of components of J will
also be denoted by J, and we will use the notation p ∈ J if there exists t
such that p = jt . We define:
Definition 1.1: An index vector J = (j1 , . . . , jM ) with M := m + 1 different
indices ji ∈ N is called a basis of Ax = b resp. of LP (I, p), if AJ is regular (i.e., nonsingular).
Obviously, A has a basis if and only if the rows of A are linearly independent.
Besides J, the matrix AJ is sometimes referred to as a basis as well; the variables xi
with i ∈ J are called basis variables, the other variables xk with
k ∉ J are said to be non-basis variables (the k are non-basis indices). If an
index vector K contains all non-basis indices, we write J ⊕ K = N .
In the previous example, JA := (3, 4, 5), JB := (4, 5, 2) are bases.
To a basis J, J ⊕ K = N we assign a uniquely determined solution
x̄ = x̄(J) of Ax = b, called a basis solution, with the property x̄K = 0. Since
Ax̄ = AJ x̄J + AK x̄K = AJ x̄J = b ,
x̄ is given by
(1.21) x̄J := b̄ , x̄K := 0 with b̄ := AJ⁻¹ b .
Moreover, for any basis J, a solution x of Ax = b is uniquely determined by
its non-basis part xK and its basis part x̄: this follows from multiplying
Ax = AJ xJ + AK xK = b by AJ⁻¹ and using (1.21):

(1.22) xJ = b̄ − AJ⁻¹ AK xK = x̄J − AJ⁻¹ AK xK .
Choosing xK arbitrarily and defining xJ and hence x by (1.22), we have
that x solves Ax = b. (1.22) thus provides a specific parametrization of the
solution set {x | Ax = b} via the parameters xK .
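For illustration, (1.21) and the parametrization (1.22) amount to a few lines of numpy; the data A, b and the basis J in the following sketch are illustrative, not taken from the notes:

    import numpy as np

    def basis_solution(A, b, J):
        # Basis solution x̄(J) of Ax = b, cf. (1.21):
        # x̄_J = A_J^{-1} b and x̄_K = 0 on the non-basis indices.
        x = np.zeros(A.shape[1])
        x[J] = np.linalg.solve(A[:, J], b)   # b̄ := A_J^{-1} b
        return x

    # Illustrative data: 3 equations, 5 unknowns, basis J = (3, 4, 5)
    # (0-based indices [2, 3, 4]); the last three columns form A_J = I.
    A = np.array([[ 1., 0., 1., 0., 0.],
                  [ 0., 1., 0., 1., 0.],
                  [-1., -2., 0., 0., 1.]])
    b = np.array([2., 4., 0.])
    print(basis_solution(A, b, [2, 3, 4]))   # -> [0. 0. 2. 4. 0.]

Any other solution of Ax = b is then obtained from (1.22) by prescribing the non-basis part xK and solving for xJ .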
If the basis solution x̄ associated with the basis J of Ax = b is a feasible
solution of LP (I, p), x̄ ∈ P , i.e., due to x̄K = 0 there holds
(1.23) x̄i ≥ 0 for all i ∈ I ∩ J ,
J is called a feasible basis of LP (I, p) and x̄ is said to be a feasible basis
solution. Moreover, a feasible basis is called non-degenerate if, instead of
(1.23), the sharper condition

(1.24) x̄i > 0 for all i ∈ I ∩ J

holds true. The linear program LP (I, p) is said to be non-degenerate if all
feasible bases J of LP (I, p) are non-degenerate.
Geometrically, the feasible basis solutions of the different bases of LP (I, p)
correspond to the vertices of the polyhedron P of feasible solutions, provided
the set of vertices of P is non-empty. In the example (cf. Fig. 1), the
vertex A ∈ P corresponds to the feasible basis JA = (3, 4, 5), since A is
determined by x1 = x2 = 0 and {1, 2} is the complementary set of JA with
respect to N = {1, 2, 3, 4, 5}; B corresponds to JB = (4, 5, 2), C corresponds
and thus the basis solution x̄ associated with J satisfying x̄J := b̄, x̄K := 0.
Step 2: Compute the row vector

π := etᵀ AJ⁻¹ ,
If yes, stop: the basis solution x̄ is the optimal solution of LP (I, p).
If no, determine s ∈ K such that

(1.26) cs = min {ck < 0 | k ∈ K ∩ I} or |cs | = max {|ck | ≠ 0 | k ∈ K \ I} .
Step 5: If

(1.27) σ āi ≤ 0 for all i with ji ∈ I ,
i.e., the basis solution x̄ is the optimal solution of LP (I, p). This motivates
the test (1.25) and the assertion a) of step 3. If (1.25) does not hold true,
there exists an index s ∈ K for which either
(1.29) cs < 0 , s ∈ K ∩ I ,

or

(1.30) |cs | ≠ 0 , s ∈ K \ I .
Assume that s is such an index. We set σ := −sign(cs ). Since, due to
(1.28), an increase in σxs yields an increase in the objective functional xp ,
we consider the following family of vectors x(θ) ∈ lRn+m1+1 , θ ∈ lR:

(1.31) x(θ)J := b̄ − θσ AJ⁻¹ as = b̄ − θσ ā ,
(1.32) x(θ)s := θσ ,
(1.33) x(θ)k := 0 for k ∈ K , k ≠ s .

Here, ā := AJ⁻¹ as is chosen as in step 4.
In the example we have I = {1, 2, 3, 4}, and J0 = JA = (3, 4, 5) is a
feasible basis, K0 = (1, 2), p = 5 ∈ J0 , t0 = 3. We obtain:
AJ0 = ( 1 0 0 ; 0 1 0 ; 0 0 1 ) , b̄ = (2, 4, 0)ᵀ ,

x̄(J0 ) = (0, 0, 2, 4, 0)ᵀ (≙ point A in Fig. 1) and π AJ0 = et0ᵀ ⇒ π = (0, 0, 1).
The reduced costs are c1 = πa1 = −1, c2 = πa2 = −2. Hence, J0 is not
optimal. Choosing in step 3 the index s = 2, we obtain
ā = AJ0⁻¹ a2 = (1, 1, −2)ᵀ .
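These numbers are easy to check numerically. In the following sketch, AJ0 = I and t0 = 3 are taken from the example; the first two components of the non-basis columns a1 , a2 are assumptions made only for illustration (their third components, −1 and −2, are fixed by the reduced costs quoted above):

    import numpy as np

    A_J0 = np.eye(3)                    # A_J0 = I, as in the example
    a1 = np.array([1., 0., -1.])        # first two entries assumed
    a2 = np.array([1., 1., -2.])        # first two entries assumed

    t0 = 3
    pi = np.linalg.solve(A_J0.T, np.eye(3)[t0 - 1])  # solves pi A_J0 = e_t0^T
    print(pi)                           # -> [0. 0. 1.]
    print(pi @ a1, pi @ a2)             # reduced costs c1 = -1.0, c2 = -2.0
    print(np.linalg.solve(A_J0, a2))    # ā = A_J0^{-1} a2 = [ 1.  1. -2.]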
A(1) = (a(1)ik ) , a(1)ik := 1/(i + k) , i, k = 1, . . . , 5 ,
A(2) := I5 (the 5 × 5 unit matrix).
Here, A(1) is badly conditioned, whereas A(2) is well conditioned. The right-
hand side is chosen as the vector b := A(1) · e, e := (1, 1, 1, 1, 1)T ,
bi := Σ_{k=1}^{5} 1/(i + k) ,
so that both bases J1 := (1, 2, 3, 4, 5), J2 := (6, 7, 8, 9, 10) are feasible for
(1.42) with the basis solutions

(1.43) x̄(J1 ) = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)ᵀ , x̄(J2 ) = (0, 0, 0, 0, 0, b1 , . . . , b5 )ᵀ .

A sequence of basis exchange steps then leads from J2 to J1 and back:

J2 → · · · → J1 → · · · → J2 .
For the associated basis solutions (1.43), this cycling process yields the fol-
lowing results (machine accuracy eps ≈ 10−11 , inexact digits are underlined):
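The conditioning effect behind this loss of accuracy is easy to reproduce with a few lines of numpy (an illustrative check, not the notes' exact tableau computation):

    import numpy as np

    # A(1): entries 1/(i+k), i, k = 1..5 (a Hilbert-type matrix); A(2) = I5.
    i, k = np.indices((5, 5)) + 1
    A1 = 1.0 / (i + k)
    A2 = np.eye(5)

    e = np.ones(5)
    b = A1 @ e                         # b := A(1)·e, so x = e is exact

    print(np.linalg.cond(A1))          # very large: A(1) is badly conditioned
    print(np.linalg.cond(A2))          # 1.0: A(2) is perfectly conditioned
    print(np.linalg.solve(A1, b) - e)  # several digits already lost

Each basis exchange that involves columns of A(1) is therefore subject to a comparable amplification of roundoff.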
minimize c1 x1 + · · · + cn xn over x ∈ lRn

subject to ai1 x1 + · · · + ain xn ≤ bi , i = 1, 2, . . . , m ,
xi ≥ 0 for i ∈ I1 ⊂ {1, 2, . . . , n} ,
is feasible due to bj ≥ 0.
A 4-tuple M0 = {Ĵ0 ; t̂0 ; F̂0 ; R̂0 } corresponding to Ĵ0 is given by
t̂0 := m + 1 ,

F̂0 := (  1             0 )        R̂0 := ( 1       0 )
      (      ⋱           )               (    ⋱      )
      (          1       )               ( 0       1 )
      ( −1  · · ·  −1   1 )

i.e., F̂0 is the (m + 1) × (m + 1) unit matrix with its last row replaced by
(−1, . . . , −1, 1), and R̂0 is the (m + 1) × (m + 1) unit matrix.
Now, for the solution of LP (Î, p̂), phase II of the simplex method can be
launched. Due to xn+m+1 = −Σ_{i=1}^{m} xn+i ≤ 0, LP (Î, p̂) has a finite
maximum and hence, phase II provides an optimal basis J̄ and the associated
basis solution x̄ = x̄(J̄), which is the optimal solution of LP (Î, p̂).
We distinguish the three cases:
1: x̄n+m+1 < 0, i.e., (1.46) does not hold true for x̄,
2: x̄n+m+1 = 0 and no artificial variable is a basis variable,
3: x̄n+m+1 = 0 and there exists an artificial variable in J̄.
In case 1, (1.44) is not solvable, since any feasible solution corresponds to
a feasible solution of LP (Î, p̂) with xn+m+1 = 0. In case 2, the optimal
basis J̄ of LP (Î, p̂) readily gives a feasible start basis for phase II of the
simplex method. Case 3 represents a degenerate problem, since the artificial
variables in the basis J̄ are zero. If necessary, by renumbering
the equations and the artificial variables we may achieve that the artificial
variables in the basis J̄ are the variables xn+1 , xn+2 , . . . , xn+k . In LP (Î, p̂),
we then eliminate the remaining artificial variables which are not in J̄ and,
instead of xn+m+1 , introduce a new variable xn+k+1 := −xn+1 − · · · − xn+k
and a new variable xn+k+2 for the objective functional. The optimal
basis J̄ of LP (Î, p̂) yields a feasible start basis J̄ ∪ {xn+k+2 } for the problem
equivalent to (1.44):
maximize xn+k+2 over all x with

a11 x1 + · · · + a1n xn + xn+1 = b1
  ⋮
ak1 x1 + · · · + akn xn + xn+k = bk
xn+1 + · · · + xn+k + xn+k+1 = 0
ak+1,1 x1 + · · · + ak+1,n xn = bk+1
  ⋮
am1 x1 + · · · + amn xn = bm
c1 x1 + · · · + cn xn + xn+k+2 = 0
xi ≥ 0 for i ∈ I ∪ {n + 1, . . . , n + k + 1} .
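The phase-I construction itself is mechanical. The following numpy sketch (function name and array layout are illustrative) appends the artificial variables xn+1 , . . . , xn+m and the auxiliary variable xn+m+1 and returns the feasible start basis:

    import numpy as np

    def phase_one_data(A, b):
        # Phase-I system for Ax = b:
        #   rows 1..m:  A x + I x_art = b,
        #   extra row:  x_art,1 + ... + x_art,m + x_aux = 0,
        # so that maximizing x_aux = -(x_art,1 + ... + x_art,m)
        # drives the artificial variables to zero.
        m, n = A.shape
        sgn = np.where(b < 0, -1.0, 1.0)   # ensure b >= 0 (w.l.o.g.)
        A, b = sgn[:, None] * A, sgn * b
        top = np.hstack([A, np.eye(m), np.zeros((m, 1))])
        bottom = np.concatenate([np.zeros(n), np.ones(m), [1.0]])
        A_hat = np.vstack([top, bottom])
        b_hat = np.append(b, 0.0)
        J0 = list(range(n, n + m + 1))     # start basis: artificials + x_aux
        return A_hat, b_hat, J0

The start basis solution is x_art = b ≥ 0 and x_aux = −Σ bi ≤ 0, which is feasible, as noted above.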
The sets
(1.48) FP := {x ∈ lRn | Ax = b , x ≥ 0} ,
(1.49) FPo := {x ∈ lRn | Ax = b , x > 0}
are called the primal feasible set and the primal strictly feasible set, respec-
tively.
The dual of the LP is given by: Find λ ∈ lRm , s ∈ lRn , such that

(1.50) maximize bᵀλ subject to Aᵀλ + s = c , s ≥ 0 .

The sets

FD := {(λ, s) ∈ lRm × lRn | Aᵀλ + s = c , s ≥ 0} ,
FDo := {(λ, s) ∈ lRm × lRn | Aᵀλ + s = c , s > 0}

are referred to as the dual feasible set and the dual strictly feasible set,
respectively.
Theorem 1.2 (KKT conditions) A vector (x∗ , λ∗ , s∗ ) ∈ lRn × lRm × lRn
is a solution of (1.47),(1.50) if and only if the following Karush-Kuhn-Tucker
conditions hold:

Aᵀλ∗ + s∗ = c , Ax∗ = b , xi∗ si∗ = 0 , 1 ≤ i ≤ n , (x∗ , s∗ ) ≥ 0 .
(ii) Assume that the dual problem is feasible. Then, the objective functional
bT λ is bounded from above on its feasible region if and only if the primal
problem is feasible.
Proof: The proof is left as an exercise.
The following result reveals a condition for the existence and boundedness
of the primal and dual solution sets:
Theorem 1.4 (Existence of primal/dual solutions) Assume that the
primal and dual problems are feasible, i.e., FPD ≠ ∅. Then, there holds:
(i) If the dual problem has a strictly feasible point, the primal solution set
ΩP is nonempty and bounded.
(ii) If the primal problem has a strictly feasible point, the set
{s∗ ∈ lRn | (λ∗ , s∗ ) ∈ ΩD for some λ∗ ∈ lRm }
is nonempty and bounded.
Proof of (i): Let (λ̄, s̄) be a strictly feasible dual point and assume that
x̂ is some primal feasible point. Then, we have
(1.62) 0 ≤ s̄T x̂ = cT x̂ − bT λ̄ ,
and the set
T := {x ∈ lRn | Ax = b , x ≥ 0 , cT x ≤ cT x̂}
is nonempty (x̂ ∈ T ) and closed. For any x ∈ T , (1.62) implies
Σ_{i=1}^{n} s̄i xi = s̄ᵀx = cᵀx − bᵀλ̄ ≤ cᵀx̂ − bᵀλ̄ = s̄ᵀx̂ .
It follows that

0 = ( (1/τ) Xe − S⁻¹e )ᵀ (X^{−1/2} S^{1/2}) (X^{1/2} S^{−1/2}) ( (1/τ) Se − X⁻¹e )
  = ‖ (1/τ) (XS)^{1/2} e − (XS)^{−1/2} e ‖² ,

and hence,

(1/τ) (XS)^{1/2} e − (XS)^{−1/2} e = 0 ⟹ XSe = τe ,
which concludes the proof of the theorem.
A commonly used primal-dual interior-point approach is to treat the inequality
constraints x ≥ 0 by a standard logarithmic barrier function, parametrized
by a barrier parameter τ > 0, which leads to the family of parametrized
minimization subproblems

(1.87) min over x of cᵀx − τ Σ_{i=1}^{n} log xi subject to Ax = b .
The domain of the logarithmic barrier function is the set of strictly feasible
points for the LP, and the optimality conditions imply the existence of a
Lagrange multiplier λ ∈ lRm such that
τ X −1 e + AT λ = c ,
Ax = b ,
x > 0.
If we define s ∈ lRn by means of
si := τ / xi , 1 ≤ i ≤ n ,
we see that the minimizer xτ of (1.87) is the x-component of the central
path vector (xτ , λτ , sτ ) ∈ C. Hence, we may refer to the path
(1.88) {xτ ∈ lRn | xτ solves (1.87) , τ > 0}
as the primal central path.
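For a tiny LP, the primal central path (1.88) can be traced by solving the barrier subproblems (1.87) for decreasing τ. The sketch below uses illustrative data and scipy's generic SLSQP solver in place of a tailored Newton method:

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative LP: min x1 + 2 x2 subject to x1 + x2 = 1, x >= 0.
    c = np.array([1.0, 2.0])
    A, b = np.array([[1.0, 1.0]]), np.array([1.0])

    def barrier(x, tau):
        # Objective of the barrier subproblem (1.87).
        return c @ x - tau * np.sum(np.log(x))

    for tau in [1.0, 0.1, 0.01]:
        res = minimize(barrier, x0=np.array([0.5, 0.5]), args=(tau,),
                       method='SLSQP', bounds=[(1e-9, None)] * 2,
                       constraints={'type': 'eq', 'fun': lambda x: A @ x - b})
        x_tau = res.x
        s_tau = tau / x_tau            # dual slack via s_i := tau / x_i
        print(tau, x_tau)              # x_tau approaches the solution (1, 0)

As τ ↓ 0, the minimizers xτ move along the central path towards the solution of the LP, while xi si = τ → 0 enforces complementarity in the limit.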
where ‖ · ‖2 stands for the Euclidean norm, and the one-sided ∞-norm
neighborhood

(1.95) N−∞ (γ) := {(x, λ, s) ∈ FPDo | xi si ≥ γµ , 1 ≤ i ≤ n} , γ ∈ (0, 1).
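For a given strictly feasible point, membership in either neighborhood is a one-line test; in the following sketch the 2-norm neighborhood N2 (θ) is taken in its standard form ‖XSe − µe‖2 ≤ θµ (cf. (1.94) and [5]), and feasibility itself is assumed rather than checked:

    import numpy as np

    def mu(x, s):
        # Duality measure: mu = x^T s / n.
        return x @ s / len(x)

    def in_N2(x, s, theta):
        # ||XSe - mu e||_2 <= theta * mu, cf. (1.94).
        m = mu(x, s)
        return np.linalg.norm(x * s - m) <= theta * m

    def in_N_minus_infty(x, s, gamma):
        # x_i s_i >= gamma * mu for all i, cf. (1.95).
        return bool(np.all(x * s >= gamma * mu(x, s)))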
In the sequel, we will investigate three classes of methods:
• short-step path following methods,
• Mizuno-Todd-Ye predictor-corrector methods,
• long-step path following methods.
[Figure 2: iterates of the short-step path following method in the (x1 s1 , x2 s2 )-plane; the central path (the diagonal emanating from the origin) and the neighborhood N2 (θ) are indicated.]
Step 1: Initialization
Choose (x0 , λ0 , s0 ) ∈ FPDo and set θ := 0.4 , σ := 1 − 0.4/√n .

Step 2: Iteration loop
For k ≥ 0 set σk = σ and compute

(1.96)   ( 0    Aᵀ   I  ) ( ∆xk )     (         0          )
         ( A    0    0  ) ( ∆λk )  =  (         0          ) ,
         ( Sk   0    Xk ) ( ∆sk )     ( −Xk Sk e + σk µk e )
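In dense form, one pass of the loop amounts to assembling and solving (1.96); a minimal numpy sketch (no structure exploited):

    import numpy as np

    def short_step_direction(A, x, s, sigma):
        # Assemble and solve the Newton system (1.96) in dense form.
        m, n = A.shape
        mu_k = x @ s / n
        K = np.block([
            [np.zeros((n, n)), A.T,              np.eye(n)],
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)],
        ])
        rhs = np.concatenate([np.zeros(n + m),
                              -x * s + sigma * mu_k * np.ones(n)])
        d = np.linalg.solve(K, rhs)
        return d[:n], d[n:n + m], d[n + m:]

Practical implementations eliminate ∆s and ∆x and solve a much smaller system for ∆λ instead; the dense solve here is only for exposition.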
Figure 2 contains the first iterates of the algorithm. The horizontal and the
vertical coordinate axes stand for the x1 s1 and the x2 s2 product, respec-
tively. The central path is the line emanating from (0, 0) at an angle of π/4.
The search directions appear to be curves rather than straight lines. The
solution is at (0, 0) and the problem is to reach that point maintaining the
feasibility conditions
Ax = b , AT λ + s = c .
The choice of θ and σ is motivated by the following result:
Theorem 1.7 (Properties of the short-step path following algo-
rithm) Let θ ∈ (0, 1) and σ ∈ (0, 1) be given such that
(1.98) (θ² + n(1 − σ)²) / (2^{3/2} (1 − θ)) ≤ σθ .
Then
(1.99) (x, λ, s) ∈ N2 (θ) =⇒ (x(α), λ(α), s(α)) ∈ N2 (θ) , α ∈ [0, 1] .
Proof: For a proof we refer to [5].
1.3.2.3 Mizuno-Todd-Ye predictor-corrector methods
Predictor-corrector methods consist of predictor steps with σk = 0 to reduce
the duality measure µ and corrector steps with σk = 1 to improve centrality.
They work with an inner neighborhood N2 (0.25) and an outer neighborhood
N2 (0.5) such that even-index iterates are confined to the inner neighborhood,
whereas odd-index iterates stay in the outer neighborhood.
Step 1: Initialization
Choose (x0 , λ0 , s0 ) ∈ N2 (0.25).
Step 2: Iteration loop
For k ≥ 0 do:
Predictor step: If k is even, set σk = 0 and solve
(1.100)  ( 0    Aᵀ   I  ) ( ∆xk )     (     0     )
         ( A    0    0  ) ( ∆λk )  =  (     0     ) .
         ( Sk   0    Xk ) ( ∆sk )     ( −Xk Sk e  )
Choose αk as the largest value of α ∈ [0, 1] such that
(1.101) (xk + α∆xk , λk + α∆λk , sk + α∆sk ) ∈ N2 (0.5) .
Set

(1.102)  ( xk+1 )     ( xk )        ( ∆xk )
         ( λk+1 )  =  ( λk )  + αk  ( ∆λk ) .
         ( sk+1 )     ( sk )        ( ∆sk )
Corrector step: If k is odd, set σk = 1 and solve
(1.103)  ( 0    Aᵀ   I  ) ( ∆xk )     (        0         )
         ( A    0    0  ) ( ∆λk )  =  (        0         ) ,
         ( Sk   0    Xk ) ( ∆sk )     ( −Xk Sk e + µk e  )
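The step length required in (1.101) can be determined, for instance, by a simple search; the following sketch (grid resolution and the positivity safeguard are pragmatic choices) returns the largest sampled α for which the updated products xi si stay in N2 (0.5):

    import numpy as np

    def largest_alpha_in_N2(x, s, dx, ds, theta=0.5, grid=1000):
        # Largest sampled alpha in [0, 1] with the updated point in N_2(theta),
        # cf. (1.101); the equality constraints remain satisfied along the
        # Newton direction, so only the products x_i s_i need to be examined.
        best = 0.0
        for a in np.linspace(0.0, 1.0, grid + 1):
            xa, sa = x + a * dx, s + a * ds
            if np.any(xa <= 0.0) or np.any(sa <= 0.0):
                break
            m = xa @ sa / len(xa)
            if np.linalg.norm(xa * sa - m) <= theta * m:
                best = a
        return best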
[Figure: iterates of the Mizuno-Todd-Ye predictor-corrector method in the (x1 s1 , x2 s2 )-plane, alternating between the inner neighborhood N2 (0.25) and the outer neighborhood N2 (0.5) around the central path.]

[Figure: iterates of the long-step path following method in the (x1 s1 , x2 s2 )-plane inside the neighborhood N−∞ (γ) around the central path.]
The lower bound σmin on the centering parameter guarantees that the
search directions start out by moving off the boundary of N−∞ (γ) and into
its interior: Small steps would improve the centrality, whereas large steps
lead outside the neighborhood. The step size selection αk ensures that we
stay at least at the boundary.
Lemma 1.4 (Properties of the long-step path following algorithm)
For given γ ∈ (0, 1) and 0 < σmin < σmax < 1 there exists δ > 0, independent
of n, such that

(1.112) µk+1 ≤ (1 − δ/n) µk , k ≥ 0 .
Proof: We refer to [5].
1.3.2.5 Convergence of the path following algorithms
As far as the convergence of the sequence of iterates of the three previously
introduced path following primal-dual interior-point methods is concerned,
we have the following result:
Theorem 1.8 (Convergence of iterates of path following meth-
ods) Assume that {(xk , λk , sk )}k∈lN0 is a sequence of iterates generated
either by the short-step or the long-step path following method or by the
predictor-corrector path-following algorithm and suppose that the sequence
{µk }k∈lN0 of duality measures tends to zero as k → ∞. Then, the sequence
{(xk , sk )}k∈lN0 is bounded and thus contains a convergent subsequence. Each
limit point is a strictly complementary primal-dual solution.
Instead of doing so, we will combine the centering step with the corrector
step.
Corrector step: The impact of a full step in the affine-scaling direction
on the pairwise products xi si , 1 ≤ i ≤ n , is as follows:

(1.119) (xi + ∆xi^aff)(si + ∆si^aff) = xi si + xi ∆si^aff + si ∆xi^aff + ∆xi^aff ∆si^aff
        = ∆xi^aff ∆si^aff ,
where we have used that the sum of the first three terms is zero due to
(1.113). The corrector step is designed in such a way that the pairwise
products xi si come closer to the target value of zero:
(1.120)  ( 0   Aᵀ  I ) ( ∆x^cor )     (        0          )
         ( A   0   0 ) ( ∆λ^cor )  =  (        0          ) ,
         ( S   0   X ) ( ∆s^cor )     ( −∆X^aff ∆S^aff e  )
where
∆X^aff = diag(∆x1^aff , . . . , ∆xn^aff) ,
∆S^aff = diag(∆s1^aff , . . . , ∆sn^aff) .
Now, it is an easy exercise to show that (1.119) and (1.120) imply
(1.121) (xi + ∆xi^aff + ∆xi^cor)(si + ∆si^aff + ∆si^cor)
        = ∆xi^aff ∆si^cor + ∆xi^cor ∆si^aff + ∆xi^cor ∆si^cor .
If for µ → 0 the coefficient matrix in (1.113) resp. (1.120) approaches a
nonsingular limit, we indeed have

‖(∆x^aff , ∆s^aff)‖ = O(µ) , ‖(∆x^cor , ∆s^cor)‖ = O(µ²) ,
which implies

∆xi^aff ∆si^aff = O(µ²) ,
∆xi^aff ∆si^cor + ∆xi^cor ∆si^aff + ∆xi^cor ∆si^cor = O(µ³) .
However, if the limiting matrix is singular, it is not guaranteed that the
corrector step is smaller in norm than the predictor step (in fact, often it is
larger). Nevertheless, numerical evidence suggests that also in this case the
corrector step improves the overall performance of the algorithm.
Combining the centering and the corrector step amounts to the solution of
the linear system
(1.122)  ( 0   Aᵀ  I ) ( ∆x^cc )     (           0            )
         ( A   0   0 ) ( ∆λ^cc )  =  (           0            ) .
         ( S   0   X ) ( ∆s^cc )     ( σµe − ∆X^aff ∆S^aff e  )
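Assembled in dense form, the affine-scaling predictor system (1.113) (right-hand side −XSe) and the combined centering-corrector system (1.122) differ only in their right-hand sides; a minimal numpy sketch:

    import numpy as np

    def kkt_solve(A, x, s, r3):
        # Solve the block system of (1.120)/(1.122) with right-hand side (0, 0, r3).
        m, n = A.shape
        K = np.block([
            [np.zeros((n, n)), A.T,              np.eye(n)],
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)],
        ])
        d = np.linalg.solve(K, np.concatenate([np.zeros(n + m), r3]))
        return d[:n], d[n:n + m], d[n + m:]

    def mehrotra_direction(A, x, s, sigma):
        # Affine-scaling predictor followed by the combined step (1.122).
        n = len(x)
        mu_k = x @ s / n
        dx_a, dl_a, ds_a = kkt_solve(A, x, s, -x * s)
        dx_c, dl_c, ds_c = kkt_solve(A, x, s,
                                     sigma * mu_k * np.ones(n) - dx_a * ds_a)
        return dx_a + dx_c, dl_a + dl_c, ds_a + ds_c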
A commonly used variant of Mehrotra’s predictor-corrector step is given as
follows:
Step 1: Initialization
References
[1] G.B. Dantzig; Linear Programming and Extensions. Princeton Univ. Press,
Princeton, 1963
[2] R. Fletcher; Practical Methods of Optimization. Wiley, New York, 1987
[3] O.L. Mangasarian; Nonlinear Programming. McGraw-Hill, New York, 1969
[4] J. Stoer and R. Bulirsch; Introduction to Numerical Analysis. 3rd Edition.
Springer, Berlin-Heidelberg-New York, 2002
[5] S.J. Wright; Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1997