Linear and Nonlinear Programming
Chapter 5
Interior-Point Algorithms
5.1 Introduction
Linear programming (LP) plays a distinguished role in optimization theory.
In one sense it is a continuous optimization problem since the goal is to
minimize a linear objective function over a convex polyhedron. But it is
also a combinatorial problem involving selecting an extreme point among
a finite set of possible vertices, as seen in Chapters 3 and 4.
An optimal solution of a linear program always lies at a vertex of the
feasible region, which itself is a polyhedron. Unfortunately, the number of
vertices associated with a set of n inequalities in m variables can be
exponential in the dimension—up to n!/(m!(n − m)!). Except for small values of
m and n, this number is sufficiently large to prevent examining all possible
vertices when searching for an optimal vertex.
The simplex method examines optimal candidate vertices in an intel-
ligent fashion. As we know, it constructs a sequence of adjacent vertices
with improving values of the objective function. Thus, the method travels
along edges of the polyhedron until it hits an optimal vertex. Improved in
various ways in the intervening four decades, the simplex method continues
to be the workhorse algorithm for solving linear programming problems.
Although it performs well in practice, the simplex method will examine
every vertex when applied to certain linear programs. Klee and Minty in
1972 gave such an example. These examples confirm that, in the worst
case, the simplex method uses a number of iterations that is exponential
in the size of the problem to find the optimal solution. As interest in
complexity theory grew, many researchers believed that a good algorithm
should be polynomial —that is, broadly speaking, the running time required
to obtain a solution should be bounded above by a polynomial in the size of
the problem.
5.2 The Simplex Method Is Not Polynomial-Time*

[Figure 5.1: the feasible region for n = 2 and ε = 1/4, a perturbation of the unit square in the (x1, x2) plane.]
where 0 < ε < 1/2. This presentation of the problem emphasizes the idea
that the feasible region of the problem is a perturbation of the n-cube.
In the case of n = 2 and ε = 1/4, the feasible region of the linear
program above looks like that of Figure 5.1.
For the case where n = 3, the feasible region of the problem above looks
like that of Figure 5.2.
[Figure 5.2: the feasible region for n = 3, a perturbation of the unit cube.]
Consider

    maximize    Σ_{j=1}^{n} 10^{n−j} x_j
    subject to  2 Σ_{j=1}^{i−1} 10^{i−j} x_j + x_i ≤ 100^{i−1},   i = 1, . . . , n        (5.1)
                x_j ≥ 0,   j = 1, . . . , n.
For n = 3, we have three constraints and three variables (along with
their nonnegativity constraints). After adding slack variables, we get a
problem in standard form. The system has m = 3 equations and n = 6
nonnegative variables. In tableau form, the problem is
T0:
    Variable    x1    x2    x3    x4    x5    x6        b
       x4        1     0     0     1     0     0        1
       x5       20     1     0     0     1     0      100
       x6      200    20     1     0     0     1   10,000
       cT      100    10     1     0     0     0        0
    Basic columns: x4, x5, x6.
The line below each tableau lists the columns that are basic.
Note that we are maximizing, so the goal is to find a feasible basis that
prices out nonpositive. In the objective function, the nonbasic variables
x1 , x2 , and x3 have coefficients 100, 10, and 1, respectively. Using the
greedy rule² for selecting the incoming variable (see Section 3.8, Step 2
of the revised simplex method), we start making x1 positive and find (by
the minimum ratio test) that x4 becomes nonbasic. After pivoting on the
element in row 1 and column 1, we obtain a sequence of tables:
T1:
    Variable    x1    x2    x3     x4    x5    x6       b
       x1        1     0     0      1     0     0       1
       x5        0     1     0    −20     1     0      80
       x6        0    20     1   −200     0     1   9,800
       rT        0    10     1   −100     0     0    −100
    Basic columns: x1, x5, x6.
T2:
    Variable    x1    x2    x3     x4     x5    x6       b
       x1        1     0     0      1      0     0       1
       x2        0     1     0    −20      1     0      80
       x6        0     0     1    200    −20     1   8,200
       rT        0     0     1    100    −10     0    −900
    Basic columns: x1, x2, x6.
T3:
    Variable     x1    x2    x3    x4     x5    x6        b
       x4         1     0     0     1      0     0        1
       x2        20     1     0     0      1     0      100
       x6      −200     0     1     0    −20     1    8,000
       rT      −100     0     1     0    −10     0   −1,000
    Basic columns: x4, x2, x6.
T4:
    Variable     x1    x2    x3    x4     x5    x6        b
       x4         1     0     0     1      0     0        1
       x2        20     1     0     0      1     0      100
       x3      −200     0     1     0    −20     1    8,000
       rT       100     0     0     0     10    −1   −9,000
    Basic columns: x4, x2, x3.
² That is, selecting the pivot column as that with the largest “reduced cost” coefficient.
T5:
    Variable    x1    x2    x3     x4     x5    x6        b
       x1        1     0     0      1      0     0        1
       x2        0     1     0    −20      1     0       80
       x3        0     0     1    200    −20     1    8,200
       rT        0     0     0   −100     10    −1   −9,100
    Basic columns: x1, x2, x3.
T6:
    Variable    x1     x2    x3     x4    x5    x6        b
       x1        1      0     0      1     0     0        1
       x5        0      1     0    −20     1     0       80
       x3        0     20     1   −200     0     1    9,800
       rT        0    −10     0    100     0    −1   −9,900
    Basic columns: x1, x5, x3.
T7:
    Variable     x1     x2    x3    x4    x5    x6         b
       x4         1      0     0     1     0     0         1
       x5        20      1     0     0     1     0       100
       x3       200     20     1     0     0     1    10,000
       rT      −100    −10     0     0     0    −1   −10,000
    Basic columns: x4, x5, x3.
From T7 we see that the corresponding basic feasible solution
(x1, x2, x3, x4, x5, x6) = (0, 0, 10^4, 1, 10^2, 0)
is optimal and that the objective function value is 10, 000. Along the way,
we made 2^3 − 1 = 7 pivot steps. The objective function strictly increased
with each change of basis.
We see that the instance of the linear program (5.1) with n = 3 leads to
2^3 − 1 pivot steps when the greedy rule is used to select the pivot column.
The general problem of the class (5.1) takes 2^n − 1 pivot steps. To get
an idea of how bad this can be, consider the case where n = 50. Now
2^50 − 1 ≈ 10^15. In a year with 365 days, there are approximately 3 × 10^7
seconds. If a computer ran continuously, performing a hundred thousand
iterations of the simplex algorithm per second, it would take approximately

    10^15 / (3 × 10^7 × 10^5) ≈ 330 years

to solve the problem using the greedy pivot selection rule.³
³ A path that visits each vertex of a unit n-cube once and only once is said to be a
Hamiltonian path. In this example, the simplex method generates a sequence of points
which yields a Hamiltonian path on the cube. There is an amusing recreational literature
that connects the Hamiltonian path with certain puzzles. See Martin Gardner in the
references.
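To make the count concrete, here is a small computational sketch (our own illustration, not from the text; the tolerances and the tableau bookkeeping are ad hoc) that sets up the n = 3 instance of (5.1) in standard form and applies the greedy rule. It performs exactly 2^3 − 1 = 7 pivots.

```python
import numpy as np

# Sketch (not from the text): replay the n = 3 instance of (5.1) with the
# greedy (largest reduced cost) pivot rule and count the pivots.

def greedy_simplex_max(A, b, c):
    """Tableau simplex for  max c'x  s.t.  Ax = b, x >= 0, started from the
    all-slack basis (the last m columns of A are assumed to be the slacks)."""
    m, n = A.shape
    basis = list(range(n - m, n))
    T = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    cost = np.append(c.astype(float), 0.0)      # maintained reduced-cost row
    pivots = 0
    while True:
        j = int(np.argmax(cost[:n]))            # greedy rule: largest reduced cost
        if cost[j] <= 1e-9:
            return basis, T[:, -1], pivots      # no improving column: optimal
        ratios = [T[i, -1] / T[i, j] if T[i, j] > 1e-9 else np.inf for i in range(m)]
        i = int(np.argmin(ratios))              # minimum ratio test
        T[i, :] /= T[i, j]                      # pivot on element (i, j)
        for k in range(m):
            if k != i:
                T[k, :] -= T[k, j] * T[i, :]
        cost -= cost[j] * T[i, :]
        basis[i] = j
        pivots += 1

A = np.array([[  1,  0, 0, 1, 0, 0],            # x1                  <= 1
              [ 20,  1, 0, 0, 1, 0],            # 20 x1 + x2          <= 100
              [200, 20, 1, 0, 0, 1]])           # 200 x1 + 20 x2 + x3 <= 10,000
b = np.array([1, 100, 10_000])
c = np.array([100, 10, 1, 0, 0, 0])

basis, values, pivots = greedy_simplex_max(A, b, c)
print(pivots)                                   # 7
```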
5.3 The Ellipsoid Method
The ellipsoid method seeks a point in the solution set of a system of linear inequalities

    Ω = {y ∈ R^m : y^T a_j ≤ c_j, j = 1, . . . , n},

under two assumptions:

(A1) There is a known point y0 ∈ R^m and a known scalar R > 0 such that the ball

    S(y0, R) = {y ∈ R^m : |y − y0| ≤ R}

contains Ω.
(A2) There is a known scalar r > 0 such that if Ω is nonempty, then it
contains a ball of the form S(y∗ , r) with center at y∗ and radius
r. (This assumption implies that if Ω is nonempty, then it has a
nonempty interior and its volume is at least vol(S(0, r)))4 .
A (general) ellipsoid with center z is a set of the form

    E = ell(z, Q) := {y ∈ R^m : (y − z)^T Q (y − z) ≤ 1},

where Q is a positive definite matrix. The axes of a general ellipsoid are the eigenvectors
of Q and the lengths of the axes are λ_1^{−1/2}, λ_2^{−1/2}, . . . , λ_m^{−1/2}, where the λ_i's are the corresponding
eigenvalues. It is easily seen that the volume of an ellipsoid is

    vol(E) = vol(S(0, 1)) Π_{i=1}^{m} λ_i^{−1/2} = vol(S(0, 1)) det(Q^{−1/2}).
    Ω ⊂ (1/2)E_k := {y ∈ E_k : a_j^T y ≤ a_j^T y_k}.
This set is half of the ellipsoid, obtained by cutting the ellipsoid in half
through its center.
The successor ellipsoid Ek+1 will be the minimal-volume ellipsoid con-
taining (1/2)E_k. It is constructed as follows. Define

    τ = 1/(m + 1),    δ = m^2/(m^2 − 1),    σ = 2τ.

Then put

    y_{k+1} = y_k − (τ / (a_j^T B_k a_j)^{1/2}) B_k a_j,

    B_{k+1} = δ ( B_k − σ (B_k a_j a_j^T B_k) / (a_j^T B_k a_j) ).          (5.2)
Theorem 5.1 The ellipsoid E_{k+1} = ell(y_{k+1}, B_{k+1}^{−1}) defined as above is
the ellipsoid of least volume containing (1/2)E_k. Moreover,

    vol(E_{k+1}) / vol(E_k) = ( m^2/(m^2 − 1) )^{(m−1)/2} · m/(m + 1) < exp( −1/(2(m + 1)) ) < 1.
Figure 5.3: Illustration of the minimal-volume ellipsoid containing a half-
ellipsoid.
Proof. We shall not prove the statement about the new ellipsoid being
of least volume, since that is not necessary for the results that follow. To
prove the remainder of the statement, we have
    vol(E_{k+1}) / vol(E_k) = det(B_{k+1})^{1/2} / det(B_k)^{1/2}
product of the square roots of these, giving the equality in the theorem.
Then, using (1 + x)^p ≤ e^{px}, we have

    ( m^2/(m^2 − 1) )^{(m−1)/2} · m/(m + 1)
        = ( 1 + 1/(m^2 − 1) )^{(m−1)/2} ( 1 − 1/(m + 1) )
        < exp( 1/(2(m + 1)) − 1/(m + 1) ) = exp( −1/(2(m + 1)) ).   □
Convergence
The ellipsoid method is initiated by selecting y0 and R such that condition
(A1) is satisfied. Then B0 = R2 I, and the corresponding E0 contains Ω.
The updating of the Ek ’s is continued until a solution is found.
Under the assumptions stated above, the ellipsoid method reduces the volume
of the ellipsoid to one-half of its initial value in O(m) iterations. Hence it can
reduce the volume to less than that of a sphere of radius r in O(m^2 log(R/r))
iterations, since its volume is bounded from below by vol(S(0, 1))r^m and the
initial volume is vol(S(0, 1))R^m.
Generally a single iteration requires O(m2 ) arithmetic operations. Hence
the entire process requires O(m4 log(R/r)) arithmetic operations.5
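As an illustration only (a sketch under our own assumptions, not an implementation taken from the text), the update (5.2) translates directly into code; the routine below searches for a point of Ω = {y : A^T y ≤ c} starting from the ball S(y0, R).

```python
import numpy as np

def ellipsoid_feasibility(A, c, y0, R, max_iter=10_000):
    """Ellipsoid method sketch for finding y with A.T @ y <= c (the columns of A
    are the vectors a_j).  Assumes the ball S(y0, R) contains Omega."""
    m = A.shape[0]
    y = y0.astype(float)
    B = (R ** 2) * np.eye(m)                    # E_0 = S(y0, R), i.e. B_0 = R^2 I
    tau, delta, sigma = 1.0 / (m + 1), m ** 2 / (m ** 2 - 1.0), 2.0 / (m + 1)
    for _ in range(max_iter):
        violated = np.where(A.T @ y > c)[0]
        if violated.size == 0:
            return y                            # the current center lies in Omega
        a = A[:, violated[0]]                   # a violated inequality a'y <= c_j
        Ba = B @ a
        y = y - (tau / np.sqrt(a @ Ba)) * Ba    # center update from (5.2)
        B = delta * (B - sigma * np.outer(Ba, Ba) / (a @ Ba))   # shape update
    return None                                 # gave up (Omega may be empty)

# tiny example: y1 >= 0, y2 >= 0, y1 + y2 <= 1, starting from S((5, 5), 10)
A = np.array([[-1.0, 0.0, 1.0],
              [0.0, -1.0, 1.0]])
c = np.array([0.0, 0.0, 1.0])
print(ellipsoid_feasibility(A, c, np.array([5.0, 5.0]), R=10.0))
```

Each pass performs the O(m^2) work mentioned above: one matrix–vector product with B and a rank-one update of B.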
Now consider the linear program

    maximize  c^T x
(P) subject to Ax ≤ b
              x ≥ 0

and its dual

    minimize  y^T b
(D) subject to A^T y ≥ c
              y ≥ 0.
Both problems can be solved by finding a feasible point to inequalities
    −c^T x + b^T y ≤ 0
    Ax ≤ b                                   (5.3)
    −A^T y ≤ −c
    x, y ≥ 0,
where both x and y are variables. Thus, the total number of arithmetic
operations of solving a linear program is bounded by O((m + n)^4 log(R/r)).
⁴ If the data consists of integers, it is possible to perturb the problem so that (A2) is satisfied,
and if the perturbed problem has a feasible solution, so does the original Ω.
5.4 The Analytic Center
It is the interior of the feasible set rather than the vertices and edges that plays a dominant role in this
type of algorithm. In fact, these algorithms purposely avoid the edges of
the set, only eventually converging to one as a solution.
The study of these algorithms begins in the next section, but it is useful
at this point to introduce a concept that definitely focuses on the interior
of a set, termed the set’s analytic center. As the name implies, the center
is away from the edge.
In addition, the study of the analytic center introduces a special struc-
ture, termed a barrier that is fundamental to interior-point methods.
Consider a set S in a subset X of R^n defined by a group of inequalities
as
S = {x ∈ X : gj (x) ≥ 0, j = 1, 2, . . . , m},
and assume that the functions g_j are continuous. S has a nonempty interior
S° = {x ∈ X : g_j(x) > 0, all j}. Associated with this definition of the set is
the potential function
    ψ(x) = − Σ_{j=1}^{m} log g_j(x)

defined on S°.
The analytic center of S is the vector (or set of vectors) that minimizes
the potential; that is, the vector (or vectors) that solve
    min ψ(x) = min { − Σ_{j=1}^{m} log g_j(x) : x ∈ X, g_j(x) > 0 for each j }.
There are several sets associated with linear programs for which the
analytic center is of particular interest. One such set is the feasible region
itself. Another is the set of optimal solutions. There are also sets associated
with the dual and primal-dual formulations. All of these are related in
important ways.
Let us illustrate, by considering the analytic center associated with a
bounded polytope Ω in Rm represented by n (> m) linear inequalities; that
is,
Ω = {y ∈ Rm : cT − yT A ≥ 0},
where A ∈ Rm×n and c ∈ Rn are given and A has rank m. Denote the
interior of Ω by
    Ω° = {y ∈ R^m : c^T − y^T A > 0}.
The potential function for this set is
    ψ_Ω(y) ≡ − Σ_{j=1}^{n} log(c_j − y^T a_j) = − Σ_{j=1}^{n} log s_j,          (5.4)

where s_j ≡ c_j − y^T a_j.
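For instance, the analytic center of such an Ω can be computed by applying Newton's method to ψ_Ω. The sketch below is our own illustration (the damped step rule is ad hoc, not a statement of any algorithm from the text); it recovers the center (1/2, 1/2) of the unit square.

```python
import numpy as np

def analytic_center(A, c, y0, iters=50):
    """Newton's method on psi(y) = -sum_j log(c_j - a_j' y); the columns of A
    are the a_j.  y0 must be strictly feasible (c - A.T @ y0 > 0)."""
    y = y0.astype(float)
    for _ in range(iters):
        s = c - A.T @ y                          # slacks s_j = c_j - a_j' y
        g = A @ (1.0 / s)                        # gradient of psi
        if np.linalg.norm(g) < 1e-10:
            break
        H = A @ np.diag(1.0 / s ** 2) @ A.T      # Hessian of psi
        d = np.linalg.solve(H, -g)               # Newton direction
        t = 1.0
        while np.any(c - A.T @ (y + t * d) <= 0):
            t *= 0.5                             # damp the step to stay interior
        y = y + t * d
    return y

# the unit square 0 <= y1 <= 1, 0 <= y2 <= 1 written as c - A'y >= 0
A = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
c = np.array([1.0, 0.0, 1.0, 0.0])
print(analytic_center(A, c, np.array([0.3, 0.7])))   # approximately [0.5, 0.5]
```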
The analytic center can be defined when the interior is empty or equal-
ities are present, such as
Ω = {y ∈ Rm : cT − yT A ≥ 0, By = b}.
5.5 The Central Path

The interior-point methods described in the remainder of this chapter show how a merger of linear and nonlinear programming produces elegant
and effective methods. And these ideas take an especially pleasing form
when applied to linear programming. Study of them here, even without
all the detailed analysis, should provide good intuitive background for the
more general manifestations.
Consider a primal linear program in standard form
(LP) minimize cT x (5.5)
subject to Ax = b
x ≥ 0.
We denote the feasible region of this program by Fp . We assume that
F° = {x : Ax = b, x > 0} is nonempty and the optimal solution set of the
problem is bounded.
Associated with this problem, we define for µ ≥ 0 the barrier problem
(BP)   minimize  c^T x − µ Σ_{j=1}^{n} log x_j          (5.6)
subject to Ax = b
x > 0.
It is clear that µ = 0 corresponds to the original problem (5.5). As µ → ∞,
the solution approaches the analytic center of the feasible region (when it is
bounded), since the barrier term swamps out cT x in the objective. As µ is
varied continuously toward 0, there is a path x(µ) defined by the solution to
(BP). This path x(µ) is termed the primal central path. As µ → 0 this path
converges to the analytic center of the optimal face {x : cT x = z ∗ , Ax =
b, x ≥ 0}, where z ∗ is the optimal value of (LP).
A strategy for solving (LP) is to solve (BP) for smaller and smaller
values of µ and thereby approach a solution to (LP). This is indeed the
basic idea of interior-point methods.
At any µ > 0, under the assumptions that we have made for problem
(5.5), the necessary and sufficient conditions for a unique and bounded
solution are obtained by introducing a Lagrange multiplier vector y for the
linear equality constraints to form the Lagrangian
    c^T x − µ Σ_{j=1}^{n} log x_j − y^T (Ax − b).
The derivatives with respect to the xj ’s are set to zero, leading to the
conditions
cj − µ/xj − yT aj = 0, for each j
or
    µX^{−1}1 + A^T y = c          (5.7)
where as before aj is the j-th column of A and X is the diagonal matrix
whose diagonal entries are x > 0. Setting sj = µ/xj the complete set of
conditions can be rewritten
x◦s = µ1
Ax = b (5.8)
AT y + s = c.
Note that y is a dual feasible solution and c − AT y > 0 (see Exercise 5.3).
Example 5.3 A square primal Consider the problem of maximizing x1
within the unit square S = [0, 1]^2. The problem is formulated as
max x1
subject to x1 + x3 = 1
x2 + x4 = 1
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
Here x3 and x4 are slack variables for the original problem to put it in
standard form. The optimality conditions for x(µ) consist of the original 2
linear constraint equations and the four equations
y1 + s1 = 1
y2 + s2 = 0
y1 + s3 = 0
y2 + s4 = 0
together with the relations si = µ/xi for i = 1, 2 . . . , 4. These equations are
readily solved with a series of elementary variable eliminations to find
    x1(µ) = (1 − 2µ ± √(1 + 4µ^2)) / 2,
    x2(µ) = 1/2.
Using the “+” solution, it is seen that as µ → 0 the solution goes to
x → (1, 1/2). Note that this solution is not a corner of the cube. Instead
it is at the analytic center of the optimal face {x : x1 = 1, 0 ≤ x2 ≤ 1}.
The limit of x(µ) as µ → ∞ can be seen to be the point (1/2, 1/2). Hence,
the central path in this case is a straight line progressing from the analytic
center of the square (at µ → ∞) to the analytic center of the optimal face
(at µ → 0).
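A quick numerical check of these two limits, using the closed-form expressions above (an illustration only):

```python
import numpy as np

def x_of_mu(mu):
    # central path of the square example, taking the "+" solution above
    x1 = (1.0 - 2.0 * mu + np.sqrt(1.0 + 4.0 * mu ** 2)) / 2.0
    return np.array([x1, 0.5])

for mu in [100.0, 1.0, 0.1, 1e-3, 1e-6]:
    print(mu, x_of_mu(mu))
# large mu: approaches the analytic center (1/2, 1/2) of the square;
# mu -> 0:  approaches (1, 1/2), the analytic center of the optimal face.
```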
The dual of (LP) is

(LD)   maximize  y^T b
       subject to y^T A + s^T = c^T
                  s ≥ 0.

The associated dual barrier problem is, for µ ≥ 0,

(BD)   maximize  y^T b + µ Σ_{j=1}^{n} log s_j
       subject to y^T A + s^T = c^T
                  s > 0.
We assume that the dual feasible set F_d has a nonempty interior F°_d = {(y, s) :
y^T A + s^T = c^T, s > 0} and that the optimal solution set of (LD)
is bounded. Then, as µ is varied continuously toward 0, there is a path
(y(µ), s(µ)) defined by the solution to (BD). This path y(µ) is termed the
dual central path.
To work out the necessary and sufficient conditions we introduce x as a
Lagrange multiplier and form the Lagrangian
    y^T b + µ Σ_{j=1}^{n} log s_j − (y^T A + s^T − c^T) x.
Setting to zero the derivative with respect to y_i yields b_i − a_{i·} x = 0,
where a_{i·} is the i-th row of A. Setting to zero the derivative with respect
to s_j leads to

    µ/s_j − x_j = 0, for all j.
Combining these equations and including the original constraint yields the
complete set of conditions
    x ◦ s = µ1
    Ax = b
    A^T y + s = c.
These are identical to the optimality conditions for the primal central path
(5.8). Note that x is a primal feasible solution and x > 0.
To see the geometric representation of the dual central path, consider
the dual level set
Ω(z) = {y : cT − yT A ≥ 0, yT b ≥ z}
for any z < z^*, where z^* is the optimal value of (LD). Then, the analytic
center (y(z), s(z)) of Ω(z) coincides with the dual central path as z tends to the
optimal value z^* from below (Figure 5.4).
Figure 5.4: The central path as analytic centers in the dual feasible region.
Example 5.4 The square dual Consider the dual of the LP of example
5.3. The feasible region is the set of vectors y with y1 ≤ −1, y2 ≤ 0.
(The values of s1 and s2 are the slack variables of these inequalities.) The
solution to the dual barrier problem is easily found from the solution of the
primal barrier problem to be
x◦s = µ1
Ax = b
(5.9)
AT y + s = c
x, s ≥ 0
Proposition 5.2 Suppose the feasible sets of the primal and dual pro-
grams contain interior points. Then the primal–dual central path (x(µ), y(µ), s(µ))
exists for all µ, 0 ≤ µ < ∞. Furthermore, x(µ) is the primal central path,
and (y(µ), s(µ)) is the dual central path.
Duality gap
Let (x(µ), y(µ), s(µ)) be on the primal-dual central path. Then from (5.9)
it follows that
cT x − yT b = yT Ax + sT x − yT b = sT x = nµ.
For any primal feasible x, the value c^T x gives an upper bound, c^T x ≥ z^*, where
z^* is the optimal value of the primal. Likewise, for any dual feasible pair
(y, s), the value yT b gives a lower bound as yT b ≤ z ∗ . The difference, the
duality gap g = cT x − yT b, provides a bound on z ∗ as z ∗ ≥ cT x − g. Hence
if at a feasible point x, a dual feasible (y, s) is available, the quality of x
can be measured as cT x − z ∗ ≤ g.
At any point on the primal–dual central path, the duality gap is equal
to nµ. Hence, this value gives a measure of optimality. It is clear that as
µ → 0 the duality gap goes to zero, and hence both x(µ) and (y(µ), s(µ))
approach optimality for the primal and dual, respectively.
5.6 Solution Strategies

In principle, a solution could be found by solving the barrier problem (5.10)
for a very small value of µ. In fact, if we desire to reduce the duality gap to
ε it is only necessary to solve the problem for µ = ε/n. Unfortunately, when
µ is small, the problem (5.10) could be highly ill-conditioned in the sense
that the necessary conditions are nearly singular. This makes it difficult to
directly solve the problem for small µ.
An overall strategy, therefore, is to start with a moderately large µ
(say µ = 100) and solve that problem approximately. The corresponding
solution is a point approximately on the primal central path, but it is likely
to be quite distant from the point corresponding to the limit of µ → 0.
However this solution point at µ = 100 can be used as the starting point
for the problem with a slightly smaller µ, for this point is likely to be close
to the solution of the new problem. The value of µ might be reduced at each
stage by a specific factor, giving µk+1 = γµk , where γ is a fixed positive
parameter less than one and k is the stage count.
If the strategy is begun with a value µ0 , then at the k-th stage we have
µ_k = γ^k µ_0. Hence to reduce µ_k/µ_0 to below ε requires

    k = log ε / log γ

stages.
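For example, with γ = 0.5, reducing µ by a factor ε = 10^{−8} needs k = ⌈log 10^{−8} / log 0.5⌉ = 27 stages, a count that depends only on γ and ε and not on the problem dimensions.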
Often a version of Newton’s method for minimization is used to solve
each of the problems. For the current strategy, Newton’s method works
on a problem (5.10) with fixed µ by moving from a given point x ∈ F°_p to
a closer point x^+ ∈ F°_p by solving for d_x, d_y and d_s from the linearized
version of the central path equations of (5.7), as

    µX^{−2} d_x + d_s = µX^{−1}1 − c,
    A d_x = 0,                                             (5.11)
    −A^T d_y − d_s = 0.
(Recall that X is the diagonal matrix whose diagonal entries are x > 0.)
The new point is then updated by taking a step in the direction of dx , as
x+ = x + dx .
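The following sketch is our own illustration of this strategy (not an algorithm stated in the text): d_s and d_y are eliminated from (5.11) by hand, a normal-equations solve gives d_y, and a simple fraction-to-boundary rule keeps the iterate positive.

```python
import numpy as np

def barrier_newton_step(A, c, x, mu):
    """One Newton step for the barrier problem (5.6) at fixed mu, obtained by
    eliminating d_s = -A'd_y from the linearized system (5.11).
    x is assumed strictly feasible: Ax = b, x > 0."""
    X2 = np.diag(x ** 2)
    dy = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c - mu * (A @ x))
    dx = x - X2 @ (c - A.T @ dy) / mu
    return dx

def primal_path_following(A, b, c, x, mu0=100.0, gamma=0.5, tol=1e-8):
    """Crude version of the overall strategy: shrink mu geometrically and take a
    few damped Newton steps for each value of mu."""
    mu = mu0
    while len(x) * mu > tol:                     # n*mu is (roughly) the duality gap
        for _ in range(5):                       # approximate centering at this mu
            dx = barrier_newton_step(A, c, x, mu)
            neg = dx < 0                          # fraction-to-boundary step-size
            t = min(1.0, 0.95 * np.min(-x[neg] / dx[neg])) if neg.any() else 1.0
            x = x + t * dx
        mu *= gamma
    return x

# Example 5.3 recast as a minimization: min -x1 over the unit square, standard form
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([-1.0, 0.0, 0.0, 0.0])
x = primal_path_following(A, b, c, np.array([0.5, 0.5, 0.5, 0.5]))
print(x)                                          # close to (1, 1/2, 0, 1/2)
```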
    µ d_x + X^2 d_s = µX1 − X^2 c.
and

    F°_d = {(y, s) : s = c − A^T y > 0} ≠ ∅,
and denote by z ∗ the optimal objective value.
for some η ∈ (0, 1), say η = 1/4. This can be thought of as a tube whose
center is the central path.
The idea of the path-following method is to move within a tubular
neighborhood of the central path toward the solution point. A suitable
initial point (x0 , y0 , s0 ) ∈ N (η) can be found by solving the barrier problem
for some fixed µ0 or from an initialization phase proposed later. After that,
step by step moves are made, alternating between a predictor step and a
corrector step. After each pair of steps, the point achieved is again in the
fixed given neighborhood of the central path, but closer to the LP solution
set.
The predictor step is designed to move essentially parallel to the true
central path. The step d ≡ (dx , dy , ds ) is determined from the linearized
version of the primal-dual central path equations of (5.9), as
s ◦ dx + x ◦ ds = γµ1 − x ◦ s,
Adx = 0, (5.13)
−AT dy − ds = 0,
where one selects γ = 0. (To show the dependence of d on the current pair
(x, s) and the parameter γ, we write d = d(x, s, γ).)
The new point is then found by taking a step in the direction of d,
as (x+ , y+ , s+ ) = (x, y, s) + α(dx , dy , ds ), where α is called the step-size.
Note that d_x^T d_s = −d_x^T A^T d_y = 0 here. Then

    (x^+)^T s^+ = (x + α d_x)^T (s + α d_s) = x^T s + α (d_x^T s + x^T d_s) = (1 − α) x^T s.

Thus, the predictor step will reduce the duality gap by a factor 1 − α. The
maximum possible step-size α is taken in that parallel direction without going
outside of the neighborhood N(2η).
The corrector step essentially moves perpendicular to the central path
in order to get closer to it. This step moves the solution back to within the
neighborhood N(η), and the step is determined by selecting γ = 1 in (5.13).
Thus, the operation complexity of the method is identical to that of the bar-
rier method, with overall operation complexity O(n^{0.5}(nm^2 + m^3) log(1/ε))
to achieve µ/µ_0 ≤ ε, where nµ_0 is the initial duality gap. Moreover, one can
prove that the step-size α → 1 as x^T s → 0, that is, the duality-gap reduction
speed is accelerated as the gap becomes smaller.
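As an illustration (our own sketch, not the algorithm of the text verbatim), the direction d(x, s, γ) of (5.13) can be computed by eliminating d_s, and a predictor step (γ = 0) can be alternated with a corrector step (γ = 1). A simple rule that keeps the iterates strictly positive stands in here for the exact neighborhood tests N(η) and N(2η).

```python
import numpy as np

def max_positive_step(x, s, dx, ds):
    """Largest alpha with x + alpha*dx > 0 and s + alpha*ds > 0."""
    v, d = np.concatenate([x, s]), np.concatenate([dx, ds])
    neg = d < 0
    return np.min(-v[neg] / d[neg]) if neg.any() else np.inf

def newton_direction(A, x, s, gamma):
    """Direction d(x, s, gamma) of (5.13): eliminate d_s = -A'd_y, then solve
    the normal equations  A diag(x/s) A' d_y = A x - gamma*mu* A (1/s)."""
    mu = x @ s / len(x)
    dy = np.linalg.solve((A * (x / s)) @ A.T, A @ x - gamma * mu * (A @ (1.0 / s)))
    ds = -A.T @ dy
    dx = (gamma * mu - x * s - x * ds) / s
    return dx, dy, ds

def predictor_corrector(A, b, c, x, y, s, tol=1e-8, max_iter=200):
    """Alternate predictor (gamma = 0) and corrector (gamma = 1) steps."""
    for _ in range(max_iter):
        if x @ s <= tol:
            break
        dx, dy, ds = newton_direction(A, x, s, gamma=0.0)      # predictor
        a = 0.7 * min(1.0, max_positive_step(x, s, dx, ds))
        x, y, s = x + a * dx, y + a * dy, s + a * ds
        dx, dy, ds = newton_direction(A, x, s, gamma=1.0)      # corrector
        a = min(1.0, 0.95 * max_positive_step(x, s, dx, ds))
        x, y, s = x + a * dx, y + a * dy, s + a * ds
    return x, y, s
```

Given a strictly feasible starting triple (x, y, s), for example one obtained by the initialization techniques of Section 5.7, the loop drives the duality gap x^T s toward zero.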
The primal–dual potential function for linear programming is defined as

    ψ_{n+ρ}(x, s) ≡ (n + ρ) log(x^T s) − Σ_{j=1}^{n} log(x_j s_j),

where ρ ≥ 0.
From the arithmetic and geometric mean inequality (also see Exercise
5.9) we can derive that
    n log(x^T s) − Σ_{j=1}^{n} log(x_j s_j) ≥ n log n.
Then
    ψ_{n+ρ}(x, s) = ρ log(x^T s) + n log(x^T s) − Σ_{j=1}^{n} log(x_j s_j) ≥ ρ log(x^T s) + n log n.     (5.15)
Thus, for ρ > 0, ψn+ρ (x, s) → −∞ implies that xT s → 0. More precisely,
we have from (5.15)
    x^T s ≤ exp( (ψ_{n+ρ}(x, s) − n log n) / ρ ).
Hence the primal–dual potential function gives an explicit bound on the
magnitude of the duality gap.
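A direct numerical check of this bound, with arbitrary positive vectors (an illustration only):

```python
import numpy as np

def psi(x, s, rho):
    """psi_{n+rho}(x, s) = (n + rho) log(x's) - sum_j log(x_j s_j)."""
    return (len(x) + rho) * np.log(x @ s) - np.sum(np.log(x * s))

rng = np.random.default_rng(0)
n = 8
rho = np.sqrt(n)
x, s = rng.random(n) + 0.1, rng.random(n) + 0.1
bound = np.exp((psi(x, s, rho) - n * np.log(n)) / rho)
print(x @ s, bound)          # the first number never exceeds the second
```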
The objective of this method is to drive the potential function down
toward minus infinity. The method of reduction is a version of Newton’s
method (5.13). In this case we select γ = n/(n + ρ). Notice that this is a
combination of a predictor and corrector choice. The predictor uses γ =
0 and the corrector uses γ = 1. The primal–dual potential method uses
something in between. This seems logical, for the predictor moves parallel
to the central path toward a lower duality gap, and the corrector moves
perpendicular to get close to the central path. This new method does both
at once. Of course, this intuitive notion must be made precise.
For ρ ≥ √n, there is in fact a guaranteed decrease in the potential
function by a fixed amount δ (see Exercises 5.11 and 5.12). Specifically,
    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −δ          (5.16)
for a constant δ ≥ 0.2. This result provides a theoretical bound on the
number of required iterations and the bound is competitive with other
methods. However, a faster algorithm may be achieved by conducting a line
search along direction d to achieve the greatest reduction in the primal-dual
potential function at each iteration.
We outline the algorithm here:
Algorithm 5.1  Given (x^0, y^0, s^0) ∈ F° with ψ_{n+ρ}(x^0, s^0) ≤ ρ log((s^0)^T x^0) +
n log n + O(√n log n), set ρ ≥ √n and k = 0.

While (s^k)^T x^k / (s^0)^T x^0 ≥ ε, compute the direction d = d(x^k, s^k, n/(n + ρ))
from (5.13), choose the step-size by a line search that minimizes ψ_{n+ρ} along d,
update (x^{k+1}, y^{k+1}, s^{k+1}), and set k := k + 1.
In particular,

    ρ (log((s^k)^T x^k) − log((s^0)^T x^0)) ≤ −k δ + O(√n log n).

Therefore, as soon as k ≥ O(ρ log(n/ε)), we must have

    (s^k)^T x^k / (s^0)^T x^0 ≤ ε.   □
Theorem 5.4 holds for any ρ ≥ √n. Thus, by choosing ρ = √n, the
iteration complexity bound becomes O(√n log(n/ε)).
where X and S are two diagonal matrices whose diagonal entries are x > 0
and s > 0, respectively. Premultiplying both sides by S−1 we have
    (x^k)^T s^k / (x^0)^T s^0 ≤ ε.
5.7 Termination and Initialization*

Termination
In the previously studied complexity bounds for interior-point algorithms,
we have an ε which needs to be zero in order to obtain an exact optimal
solution. Sometimes it is advantageous to employ an early termination or
rounding method while ε is still moderately large. There are five basic
approaches.
• A “purification” procedure finds a feasible corner whose objective
value is at least as good as the current interior point. This can be
accomplished in strongly polynomial time (that is, the complexity
bound is a polynomial only in the dimensions m and n). One difficulty
is that there may be many non-optimal vertices close to the
optimal face, and the procedure might require many pivot steps for
difficult problems.
• A second method seeks to identify an optimal basis. It has been
shown that if the LP problem is nondegenerate, the unique optimal
basis may be identified early. The procedure seems to work well for
some LP problems but it has difficulty if the problem is degenerate.
Unfortunately, most real LP problems are degenerate.
• The third approach is to slightly perturb the data such that the new
LP problem is nondegenerate and its optimal basis remains one of
the optimal bases of the original LP problem. There are questions
about how and when to perturb the data during the iterative process,
decisions which can significantly affect the success of the effort.
• The fourth approach is to guess the optimal face and find a feasible
solution on that face. It consists of two phases: the first phase uses
interior point algorithms to identify the complementarity partition
(P ∗ , Z ∗ ) (see Exercise 5.5), and the second phase adapts the simplex
method to find an optimal primal (or dual) basic solution and one can
use (P^*, Z^*) as a starting basis for the second phase. This method is
often called the cross-over method. It is guaranteed to work in finite
time and is implemented in several popular LP software packages.
• The fifth approach is to guess the optimal face and project the current
interior point onto the interior of the optimal face. This again uses
interior point algorithms to identify the complementarity partition
(P ∗ , Z ∗ ), and then solves a least-squares problem for the projection;
see Figure 5.5. The termination criterion is guaranteed to work in
finite time.
Figure 5.5: Illustration of the projection of an interior point onto the opti-
mal face.
The fourth and fifth methods above are based on the fact that (as observed
in practice and subsequently proved) many interior-point algorithms for
linear programming generate solution sequences that converge to a strictly
complementary solution or an interior solution on the optimal face; see
Exercise (5.7).
Initialization
Most interior-point algorithms must be initiated at a strictly feasible point.
The complexity of obtaining such an initial point is the same as that of
solving the LP problem itself. More importantly, a complete LP algorithm
should accomplish two tasks: 1) detect the infeasibility or unboundedness
status of the LP problem, then 2) generate an optimal solution if the prob-
lem is neither infeasible nor unbounded.
Several approaches have been proposed to accomplish these goals:
• The primal and dual can be combined into a single linear feasibil-
ity problem, and then an LP algorithm can find a feasible point of
the problem. Theoretically, this approach achieves the currently best
iteration complexity bound, i.e., O(√n log(1/ε)). Practically, a sig-
nificant disadvantage of this approach is the doubled dimension of the
system of equations that must be solved at each iteration.
5.8 Notes
Computation and complexity models were developed by a number of sci-
entists; see, e.g., Cook [14], Hartmanis and Stearns [30] and Papadimitriou
and Steiglitz [49] for the bit complexity models and Blum et al. [12] for the
real number arithmetic model.
Most of the material in Section 5.1 is based on a teaching note of Cottle
on linear programming taught at Stanford. Practical performance of the
simplex method can be seen in Bixby [10].
The ellipsoid method was developed by Khachiyan [34]; more develop-
ments of the ellipsoid method can be found in Bland and Todd [11] and
Goldfarb and Todd [25].
The “analytic center” for a convex polyhedron given by linear inequali-
ties was introduced by Huard [32], and later by Sonnevend [53]. The barrier
function was introduced by Frisch [23].
McLinden [40] earlier, then Bayer and Lagarias [6, 7], Megiddo [41],
and Sonnevend [53], analyzed the central path for linear programming and
convex optimization; also see Fiacco and McCormick [21].
The path-following algorithms were first developed by Renegar [50]. A
primal barrier or path-following algorithm was independently analyzed by
Gonzaga [28]. Both Gonzaga [28] and Vaidya [59] developed a rank-one
updating technique in solving the Newton equation of each iteration, and
proved that each iteration uses O(n^{2.5}) arithmetic operations on average.
Kojima, Mizuno and Yoshise [36] and Monteiro and Adler [44] developed
a symmetric primal-dual path-following algorithm with the same iteration
and arithmetic operation bounds.
The predictor-corrector algorithms were developed by Mizuno et al. [43].
A more practical predictor-corrector algorithm was proposed by Mehrotra
[42] (also see Lustig et al. [39] and Zhang and Zhang [66]). His technique
has been used in almost all of the LP interior-point implementations.
A primal potential reduction algorithm was initially proposed by Kar-
markar [33]. The primal-dual potential function was proposed by Tanabe
[54] and Todd and Ye [56]. The primal-dual potential reduction algorithm
was developed by Ye [64], Freund [22], Kojima, Mizuno and Yoshise [37],
Goldfarb and Xiao [26], Gonzaga and Todd [29], Todd [55], Tunçel [57], etc.
5.9 Exercises
5.1 Prove the volume reduction rate in Theorem 5.1 for the ellipsoid method.
5.2 Develop a cutting plane method, based on the ellipsoid method, to find
a point satisfying convex inequalities
    f_i(x) ≤ 0,  i = 1, . . . , m,   |x|^2 ≤ R^2,

where the f_i's are convex functions of x in C^1.
5.3 Consider linear program (5.5) and assume that F° = {x : Ax = b, x >
0} is nonempty and its optimal solution set is bounded. Then, the dual of
the problem has a nonempty interior.
5.4 (Farkas’ lemma) Exactly one of the feasible set {x : Ax = b, x ≥ 0}
and the feasible set {y : yT A ≤ 0, yT b = 1} is nonempty. A vector y in
the latter set is called an infeasibility certificate for the former.
5.5 (Strict complementarity) Given any linear program in the standard
form, the primal optimal face is
    Ω_p = {x_{P*} : A_{P*} x_{P*} = b, x_{P*} ≥ 0},
and the dual optimal face is
    Ω_d = {(y, s_{Z*}) : A_{P*}^T y = c_{P*}, s_{Z*} = c_{Z*} − A_{Z*}^T y ≥ 0},
where (P ∗ , Z ∗ ) is a partition of {1, 2, ..., n}. Prove that the partition is
unique, and each face has an interior, that is,
    Ω°_p = {x_{P*} : A_{P*} x_{P*} = b, x_{P*} > 0},

and

    Ω°_d = {(y, s_{Z*}) : A_{P*}^T y = c_{P*}, s_{Z*} = c_{Z*} − A_{Z*}^T y > 0},
are both nonempty.
5.6 (Central path theorem) Let (x(µ), y(µ), s(µ)) be the central path of
(5.9). Then prove
i) The central path point (x(µ), s(µ)) is bounded for 0 < µ ≤ µ0 and any
given 0 < µ0 < ∞.
iii) (x(µ), s(µ)) converges to an optimal solution pair for (LP) and (LD).
Moreover, the limit point x(0)P ∗ is the analytic center on the pri-
mal optimal face, and the limit point s(0)Z ∗ is the analytic center on
the dual optimal face, where (P ∗ , Z ∗ ) is the strict complementarity
partition of the index set {1, 2, ..., n}.
xj ≥ δ, ∀j ∈ P ∗
and
sj ≥ δ, ∀j ∈ Z ∗ ,
Prove
i)
Ψ(δ_1) ⊂ Ψ(δ_2) if δ_1 ≤ δ_2.
ii) For every δ, Ψ(δ) is bounded and its closure Ψ̂(δ) has non-empty inter-
section with the LP solution set.
and

    x^T s ≤ exp( (ψ_{n+ρ}(x, s) − n log n) / ρ ).
Then, using Exercise 5.10 and the concavity of the logarithmic function, show
that (x^+, y^+, s^+) ∈ F° and

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −δ

for a constant δ. One can verify that δ > 0.2 when θ = 0.4.
Chapter 12

Penalty and Barrier Methods
    minimize   Q • X
    subject to I_j • X = 1,  ∀j,
               X ⪰ 0.
It has been shown that an optimal SDP solution constitutes a good approx-
imation to the original problem.
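As an illustration of how compactly such a relaxation can be posed, the sketch below uses the cvxpy modeling package (an external tool, not referenced in the text) with a random symmetric Q; the constraints I_j • X = 1 become unit-diagonal constraints on X.

```python
import numpy as np
import cvxpy as cp

n = 4
rng = np.random.default_rng(1)
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2                      # symmetric data matrix (hypothetical instance)

X = cp.Variable((n, n), PSD=True)      # X is constrained to be positive semidefinite
problem = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                     [cp.diag(X) == np.ones(n)])
problem.solve()                        # needs an SDP-capable solver, e.g. SCS
print(problem.value)
```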
where e_i ∈ R^n is the vector with 1 at the i-th position and zero everywhere
else. Let Y = X^T X. Then the SDP relaxation of the localization problem is
to find Y such that
12.11 Notes
Many researchers have applied interior-point algorithms to convex QP prob-
lems. These algorithms can be divided into three groups: the primal scaling
algorithm, the dual scaling algorithm, and the primal-dual scaling algo-
rithm. Relations among these algorithms can be seen in Anstreicher, den
Hertog, Roos and Terlaky [31, 4].
There have been several remarkable applications of SDP; see, e.g., Goe-
mans and Williamson [24] and Vandenberghe and Boyd [13, 60].
The SDP example with a duality gap was constructed by Freund.
The primal potential reduction algorithm for positive semi-definite pro-
gramming is due to Alizadeh [2, 1] and to Nesterov and Nemirovskii [47].
The primal-dual SDP algorithm described here is due to Nesterov and Todd
[48].
12.12 Exercises
12.1 (Farkas' lemma in SDP) Let A_i, i = 1, . . . , m, have rank m (i.e.,
Σ_{i=1}^{m} y_i A_i = 0 implies y = 0). Then, there exists a symmetric matrix
X ≻ 0 with

    A_i • X = b_i,   i = 1, . . . , m,

if and only if  Σ_{i=1}^{m} y_i A_i ⪯ 0 and Σ_{i=1}^{m} y_i A_i ≠ 0 imply b^T y < 0.
12.2 Let X and S be both positive definite. Then prove
n log(X • S) − log(det(X) · det(S)) ≥ n log n.
12.3 Consider SDP and the potential level set
    Ψ(δ) := {(X, y, S) ∈ F° : ψ_{n+ρ}(X, S) ≤ δ}.

Prove that

    Ψ(δ_1) ⊂ Ψ(δ_2) if δ_1 ≤ δ_2,
and for every δ Ψ(δ) is bounded and its closure Ψ̂(δ) has non-empty inter-
section with the SDP solution set.
12.4 Let both (SDP) and (SDD) have interior feasible points. Then for
any 0 < µ < ∞, the central path point (X(µ), y(µ), S(µ)) exists and is
unique. Moreover,
i) the central path point (X(µ), S(µ)) is bounded where 0 < µ ≤ µ0 for any
given 0 < µ0 < ∞.
ii) For 0 < µ′ < µ,

    C • X(µ′) < C • X(µ)  and  b^T y(µ′) > b^T y(µ)

if X(µ) ≠ X(µ′) and y(µ) ≠ y(µ′).
iii) (X(µ), S(µ)) converges to an optimal solution pair for (SDP) and (SDD),
and the rank of the limit of X(µ) is maximal among all optimal solu-
tions of (SDP) and the rank of the limit S(µ) is maximal among all
optimal solutions of (SDD).
Bibliography
[19] G.B. Dantzig and M.N. Thapa. Linear Programming 2: Theory and
Extensions. Springer-Verlag, New York, 2003.
[56] M. J. Todd and Y. Ye. A centered projective algorithm for linear pro-
gramming. Mathematics of Operations Research, 15:508–529, 1990.
[57] L. Tunccel. Constant potential primal–dual algorithms: A frame-
work. Math. Programming, 66:145–159, 1994.
[58] R. Tutuncu. An infeasible-interior-point potential-reduction algo-
rithm for linear programming. Ph.D. Thesis, School of Operations
Research and Industrial Engineering, Cornell University, Ithaca,
NY, 1995.
[59] P. M. Vaidya. An algorithm for linear programming which requires
O((m + n)n2 + (m + n)1.5 nL) arithmetic operations. Math. Pro-
gramming, 47:175–201, 1990. Condensed version in : Proceedings of
the 19th Annual ACM Symposium on Theory of Computing, 1987,
pages 29–38.
[60] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM
Review, 38(1):49–95, 1996.