
Supplement to “Linear and Nonlinear
Programming”

David G. Luenberger and Yinyu Ye

Revised October 2005


Contents

Preface

List of Figures

5 Interior-Point Algorithms
  5.1 Introduction
  5.2 The simplex method is not polynomial-time∗
  5.3 The Ellipsoid Method
  5.4 The Analytic Center
  5.5 The Central Path
  5.6 Solution Strategies
      5.6.1 Primal-dual path-following
      5.6.2 Primal-dual potential function
  5.7 Termination and Initialization∗
  5.8 Notes
  5.9 Exercises

12 Penalty and Barrier Methods
  12.9 Convex quadratic programming
  12.10 Semidefinite programming
  12.11 Notes
  12.12 Exercises

Bibliography

Index

Preface

List of Figures

5.1 Feasible region for the Klee-Minty problem; n = 2
5.2 The case n = 3
5.3 Illustration of the minimal-volume ellipsoid containing a half-ellipsoid
5.4 The central path as analytic centers in the dual feasible region
5.5 Illustration of the projection of an interior point onto the optimal face

Chapter 5

Interior-Point Algorithms

5.1 Introduction
Linear programming (LP) plays a distinguished role in optimization theory.
In one sense it is a continuous optimization problem since the goal is to
minimize a linear objective function over a convex polyhedron. But it is
also a combinatorial problem, one of selecting an extreme point from among
a finite set of possible vertices, as seen in Chapters 3 and 4.
An optimal solution of a linear program always lies at a vertex of the
feasible region, which itself is a polyhedron. Unfortunately, the number of
vertices associated with a set of n inequalities in m variables can be expo-
nential in the dimension—up to n!/(m!(n − m)!). Except for small values of
m and n, this number is sufficiently large to prevent examining all possible
vertices when searching for an optimal vertex.
The simplex method examines candidate vertices in an intelligent
fashion. As we know, it constructs a sequence of adjacent vertices
with improving values of the objective function. Thus, the method travels
along edges of the polyhedron until it hits an optimal vertex. Improved in
various ways in the intervening four decades, the simplex method continues
to be the workhorse algorithm for solving linear programming problems.
Although it performs well in practice, the simplex method will examine
every vertex when applied to certain linear programs. Klee and Minty in
1972 gave such an example. These examples confirm that, in the worst
case, the simplex method uses a number of iterations that is exponential
in the size of the problem to find the optimal solution. As interest in
complexity theory grew, many researchers believed that a good algorithm
should be polynomial—that is, broadly speaking, the running time required to
compute the solution should be bounded above by a polynomial in the size, or
the total data length, of the problem. The simplex method is not a polynomial
algorithm.1

1 We will be more precise about complexity notions such as “polynomial algorithm” in Section 5.1 below.

In 1979, a new approach to linear programming, Khachiyan’s ellipsoid
method, received dramatic and widespread coverage in the international
press. Khachiyan proved that the ellipsoid method, developed during the
1970s by other mathematicians, is a polynomial algorithm for linear pro-
gramming under a certain computational model. It constructs a sequence
of shrinking ellipsoids with two properties: the current ellipsoid always con-
tains the optimal solution set, and each member of the sequence undergoes
a guaranteed reduction in volume, so that the solution set is squeezed more
tightly at each iteration.
Experience with the method, however, led to disappointment. It was
found that even with the best implementations the method was not even
close to being competitive with the simplex method. Thus, after the
dust eventually settled, the prevalent view among linear programming re-
searchers was that Khachiyan had answered a major open question on the
polynomiality of solving linear programs, but the simplex method remained
the clear winner in practice.
This contradiction, the fact that an algorithm with the desirable the-
oretical property of polynomiality might nonetheless compare unfavorably
with the (worst-case exponential) simplex method, set the stage for excit-
ing new developments. It was no wonder, then, that the announcement by
Karmarkar in 1984 of a new polynomial-time algorithm, an interior-point
method, with the potential to improve upon the practical effectiveness of the
simplex method, made front-page news in major newspapers and magazines
throughout the world.
Interior-point algorithms are continuous iterative algorithms. Computa-
tional experience with sophisticated procedures suggests that the number
of necessary iterations grows very slowly with problem size. This provides the
potential for improvements in computational effectiveness for solving large-scale
linear programs. The goal of this chapter is to provide some understanding of
the complexity theory of linear programming and of polynomial-time
interior-point algorithms.

A few words on complexity theory


Complexity theory is arguably the foundation stone for analysis of com-
puter algorithms. The goal of the theory is twofold: to develop criteria for
measuring the effectiveness of various algorithms (and thus be able to compare
algorithms using these criteria), and to assess the inherent difficulty
of various problems.
The term complexity refers to the amount of resources required by a
computation. In this chapter we focus on a particular resource, namely,
computing time. In complexity theory, however, one is not interested in
the execution time of a program implemented in a particular programming
language, running on a particular computer over a particular input. This
involves too many contingent factors. Instead, one wishes to associate with
an algorithm more intrinsic measures of its time requirements.
Roughly speaking, to do so one needs to define:

• a notion of input size,

• a set of basic operations, and

• a cost for each basic operation.

The last two allow one to associate a cost with a computation. If x is any
input, the cost C(x) of the computation with input x is the sum of the
costs of all the basic operations performed during this computation.
    Let A be an algorithm and I_n be the set of all its inputs having size n.
The worst-case cost function of A is the function T_A^w defined by

    T_A^w(n) = sup_{x ∈ I_n} C(x).

If there is a probability structure on I_n it is possible to define the
average-case cost function T_A^a given by

    T_A^a(n) = E_n(C(x)),

where E_n is the expectation over I_n. However, the average is usually more
difficult to find, and there is of course the issue of what probabilities to
assign.
We now discuss how the objects in the three items above are selected.
The selection of a set of basic operations is generally easy. For the algo-
rithms we consider in this chapter, the obvious choice is the set {+, −, ×, /, ≤}
of the four arithmetic operations and the comparison. Selecting a notion
of input size and a cost for the basic operations depends on the kind of
data dealt with by the algorithm. Some kinds can be represented within a
fixed amount of computer memory; some others require a variable amount.
Examples of the first are fixed-precision floating-point numbers, stored
in a fixed amount of memory (usually 32 or 64 bits). For this kind of data the
size of an element is usually taken to be 1, so that each number has unit size.
Examples of the second are integer numbers which require a number
of bits approximately equal to the logarithm of their absolute value. This
(base 2) logarithm is usually referred to as the bit size of the integer. Similar
ideas apply for rational numbers.
Let A be some kind of data and x = (x_1, . . . , x_n) ∈ A^n. If A is of the first
kind above then we define size(x) = n. Otherwise, we define

    size(x) = Σ_{i=1}^{n} bit-size(x_i).
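
As a small illustration (ours, not from the text), the following Python sketch computes size(x) under the two conventions just described; the function names bit_size and size are our own.

    import math

    def bit_size(k):
        # Number of bits needed to write the integer k: roughly the base-2
        # logarithm of its absolute value, as used in the Turing model.
        return max(1, math.ceil(math.log2(abs(k) + 1)))

    def size(xs, model="bit"):
        # size(x) for x = (x_1, ..., x_n): n in the real-number model,
        # the sum of bit-sizes in the Turing (bit) model.
        if model == "real":
            return len(xs)
        return sum(bit_size(k) for k in xs)

    print(size([3, 100, 12345], model="real"))   # 3
    print(size([3, 100, 12345], model="bit"))    # 2 + 7 + 14 = 23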
The cost of operating on two unit-size numbers is taken to be 1 and is
called unit cost. In the bit-size case, the cost of operating on two numbers
is the product of their bit-sizes (for multiplications and divisions) or their
maximum (for additions, subtractions, and comparisons).
The consideration of integer or rational data with their associated bit
size and bit cost for the arithmetic operations is usually referred to as the
Turing model of computation. The consideration of idealized reals with unit
size and unit cost is today referred to as the real number arithmetic model.
When comparing algorithms, one should make clear which model of com-
putation is used to derive complexity bounds.
A basic concept related to both models of computation is that of polynomial
time. An algorithm A is said to be a polynomial time algorithm if T_A^w is
bounded above by a polynomial. A problem can be solved in polynomial time
if there is a polynomial time algorithm solving the problem. The notion of
average polynomial time is defined similarly, replacing T_A^w by T_A^a.
The notion of polynomial time is usually taken as the formalization of
efficiency and it is the ground upon which complexity theory is built.

5.2 The simplex method is not polynomial-time∗
When the simplex method is used to solve a linear program in standard
form with the coefficient matrix A ∈ Rm×n , b ∈ Rm and c ∈ Rn , the
number of iterations to solve the problem starting from a basic feasible
solution is typically a small multiple of m: usually between 2m and 3m.
In fact, Dantzig observed that for problems with m ≤ 50 and n ≤ 200 the
number of iterations is ordinarily less than 1.5m.
At one time researchers believed—and attempted to prove—that the
simplex algorithm (or some variant thereof) always requires a number of
iterations that is bounded by a polynomial expression in the problem size.
That was until Victor Klee and George Minty published their landmark
paper that exhibited a class of linear programs each of which requires an
exponential number of iterations when solved by the conventional simplex
method.
As originally stated, the Klee–Minty problems are not in standard form;
they are expressed in terms of 2n linear inequalities in n variables. Their
feasible regions are perturbations of the unit cube in n-space; that is,

[0, 1]n = {x : 0 ≤ xj ≤ 1, j = 1, . . . , n}.

One way to express a problem in this class is


maximize xn
subject to x1 ≥ 0
x1 ≤ 1
xj ≥ εxj−1 j = 2, . . . , n
xj ≤ 1 − εxj−1 j = 2, . . . , n

where 0 < ε < 1/2. This presentation of the problem emphasizes the idea
that the feasible region of the problem is a perturbation of the n-cube.
In the case of n = 2 and ε = 1/4, the feasible region of the linear
program above looks like that of Figure 5.1.

Figure 5.1: Feasible region for the Klee-Minty problem; n = 2.

For the case where n = 3, the feasible region of the problem above looks
like that of Figure 5.2.

Figure 5.2: The case n = 3.

Consider

    maximize    Σ_{j=1}^{n} 10^{n−j} x_j
    subject to  2 Σ_{j=1}^{i−1} 10^{i−j} x_j + x_i ≤ 100^{i−1},   i = 1, . . . , n        (5.1)
                x_j ≥ 0,   j = 1, . . . , n

The problem above is easily cast as a linear program in standard form.

Example 5.1 Suppose we want to solve the linear program

maximize 100x1 + 10x2 + x3


subject to x1 ≤ 1
20x1 + x2 ≤ 100
200x1 + 20x2 + x3 ≤ 10,000

In this case, we have three constraints and three variables (along with
their nonnegativity constraints). After adding slack variables, we get a
problem in standard form. The system has m = 3 equations and n = 6
nonnegative variables. In tableau form, the problem is

T0:
  Variable    x1     x2    x3     x4    x5    x6          b
  x4           1      0     0      1     0     0          1
  x5          20      1     0      0     1     0        100
  x6         200     20     1      0     0     1     10,000
  cT         100     10     1      0     0     0          0
  basic columns: x4, x5, x6

The line below each tableau lists the columns that are currently basic.
Note that we are maximizing, so the goal is to find a feasible basis that
prices out nonpositive. In the objective function, the nonbasic variables
x1 , x2 , and x3 have coefficients 100, 10, and 1, respectively. Using the
greedy rule2 for selecting the incoming variable (see Section 3.8, Step 2
of the revised simplex method), we start making x1 positive and find (by
the minimum ratio test) that x4 becomes nonbasic. After pivoting on the
element in row 1 and column 1, we obtain a sequence of tables:
T1:
  Variable    x1     x2    x3     x4    x5    x6          b
  x1           1      0     0      1     0     0          1
  x5           0      1     0    −20     1     0         80
  x6           0     20     1   −200     0     1      9,800
  rT           0     10     1   −100     0     0       −100
  basic columns: x1, x5, x6

T2:
  Variable    x1     x2    x3     x4    x5    x6          b
  x1           1      0     0      1     0     0          1
  x2           0      1     0    −20     1     0         80
  x6           0      0     1    200   −20     1      8,200
  rT           0      0     1    100   −10     0       −900
  basic columns: x1, x2, x6

T3:
  Variable    x1     x2    x3     x4    x5    x6          b
  x4           1      0     0      1     0     0          1
  x2          20      1     0      0     1     0        100
  x6        −200      0     1      0   −20     1      8,000
  rT        −100      0     1      0   −10     0     −1,000
  basic columns: x2, x4, x6

T4:
  Variable    x1     x2    x3     x4    x5    x6          b
  x4           1      0     0      1     0     0          1
  x2          20      1     0      0     1     0        100
  x3        −200      0     1      0   −20     1      8,000
  rT         100      0     0      0    10    −1     −9,000
  basic columns: x2, x3, x4

2 That is, selecting the pivot column as that with the largest “reduced cost” coefficient.

T5:
  Variable    x1     x2    x3     x4    x5    x6          b
  x1           1      0     0      1     0     0          1
  x2           0      1     0    −20     1     0         80
  x3           0      0     1    200   −20     1      8,200
  rT           0      0     0   −100    10    −1     −9,100
  basic columns: x1, x2, x3

T6:
  Variable    x1     x2    x3     x4    x5    x6          b
  x1           1      0     0      1     0     0          1
  x5           0      1     0    −20     1     0         80
  x3           0     20     1   −200     0     1      9,800
  rT           0    −10     0    100     0    −1     −9,900
  basic columns: x1, x3, x5

T7:
  Variable    x1     x2    x3     x4    x5    x6          b
  x4           1      0     0      1     0     0          1
  x5          20      1     0      0     1     0        100
  x3         200     20     1      0     0     1     10,000
  rT        −100    −10     0      0     0    −1    −10,000
  basic columns: x3, x4, x5
From T7 we see that the corresponding basic feasible solution

    (x1, x2, x3, x4, x5, x6) = (0, 0, 10^4, 1, 10^2, 0)

is optimal and that the objective function value is 10,000. Along the way,
we made 2^3 − 1 = 7 pivot steps. The objective function strictly increased
with each change of basis.
    We see that the instance of the linear program (5.1) with n = 3 leads to
2^3 − 1 pivot steps when the greedy rule is used to select the pivot column.
The general problem of the class (5.1) takes 2^n − 1 pivot steps. To get an
idea of how bad this can be, consider the case where n = 50. Now
2^50 − 1 ≈ 10^15. In a year with 365 days, there are approximately 3 × 10^7
seconds. If a computer ran continuously, performing a million iterations of
the simplex algorithm per second, it would take approximately

    10^15 / (3 × 10^7 × 10^6) ≈ 33 years

to solve the problem using the greedy pivot selection rule.3
3 A path that visits each vertex of a unit n-cube once and only once is called a Hamiltonian path. In this example, the simplex method generates a sequence of points which yields a Hamiltonian path on the cube. There is an amusing recreational literature that connects the Hamiltonian path with certain puzzles. See Martin Gardner in the references.
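
To make the pivot count concrete, here is a rough Python/NumPy sketch (ours, not from the text) of a revised simplex method that uses Dantzig's greedy rule, applied to the standard-form version of problem (5.1). The function names and tolerances are our own choices, and unboundedness handling is omitted.

    import numpy as np

    def dantzig_simplex(A, b, c, basis):
        # Maximize c^T x subject to A x = b, x >= 0, starting from the given
        # basis and entering the variable with the largest reduced cost.
        m, n = A.shape
        basis = list(basis)
        pivots = 0
        while True:
            B = A[:, basis]
            x_B = np.linalg.solve(B, b)              # current basic solution
            y = np.linalg.solve(B.T, c[basis])       # simplex multipliers
            r = c - A.T @ y                          # reduced costs
            j = int(np.argmax(r))
            if r[j] <= 1e-9:                         # no improving column: optimal
                x = np.zeros(n)
                x[basis] = x_B
                return x, pivots
            d = np.linalg.solve(B, A[:, j])          # entering column, basis coordinates
            ratios = np.full(m, np.inf)
            pos = d > 1e-9
            ratios[pos] = x_B[pos] / d[pos]
            i = int(np.argmin(ratios))               # minimum ratio test
            basis[i] = j
            pivots += 1

    def klee_minty_standard_form(n):
        # Data of problem (5.1) with slack variables appended (2n columns).
        A = np.zeros((n, 2 * n))
        b = np.array([100.0 ** i for i in range(n)])
        c = np.zeros(2 * n)
        for i in range(n):
            c[i] = 10.0 ** (n - 1 - i)
            A[i, i] = 1.0
            A[i, n + i] = 1.0                        # slack of row i
            for j in range(i):
                A[i, j] = 2.0 * 10.0 ** (i - j)
        return A, b, c

    n = 3
    A, b, c = klee_minty_standard_form(n)
    x, pivots = dantzig_simplex(A, b, c, basis=range(n, 2 * n))
    print(pivots, c @ x)   # the chapter reports 2**n - 1 = 7 pivots, optimum 10,000

For n = 3 this should retrace the seven tableaux of Example 5.1; increasing n illustrates the exponential growth discussed above.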

5.3 The Ellipsoid Method


The basic ideas of the ellipsoid method stem from research done in the
nineteen sixties and seventies mainly in the Soviet Union (as it was then
called) by others who preceded Khachiyan. The idea in a nutshell is to
enclose the region of interest in each member of a sequence of ever smaller
ellipsoids.
The significant contribution of Khachiyan was to demonstrate in two
papers—published in 1979 and 1980—that under certain assumptions, the
ellipsoid method constitutes a polynomially bounded algorithm for linear
programming.
The method discussed here is really aimed at finding a point of a poly-
hedral set Ω given by a system of linear inequalities.

Ω = {y ∈ R^m : y^T a_j ≤ c_j , j = 1, . . . , n}.

Finding a point of Ω can be thought of as being equivalent to solving a


linear programming problem.
Two important assumptions are made regarding this problem:
(A1) There is a vector y0 ∈ Rm and a scalar R > 0 such that the closed
ball S(y0 , R) with center y0 and radius R, that is

{y ∈ Rm : |y − y0 | ≤ R},

contains Ω.
(A2) There is a known scalar r > 0 such that if Ω is nonempty, then it
contains a ball of the form S(y∗ , r) with center at y∗ and radius
r. (This assumption implies that if Ω is nonempty, then it has a
nonempty interior and its volume is at least vol(S(0, r)))4 .

Definition 5.1 An Ellipsoid in Rm is a set of the form

E = {y ∈ Rm : (y − z)T Q(y − z) ≤ 1}

where z ∈ Rm is a given point (called the center) and Q is a positive definite


matrix (see Section A.4 of Appendix A) of dimension m. This ellipsoid is
denoted ell(z, Q).

The unit sphere S(0, 1) centered at the origin 0 is a special ellipsoid


with Q = I, the identity matrix.
4 The (topological) interior of any set Ω is the set of points in Ω which are the centers of some balls contained in Ω.

The axes of a general ellipsoid are the eigenvectors of Q, and the lengths of the
axes are λ_1^{−1/2}, λ_2^{−1/2}, . . . , λ_m^{−1/2}, where the λ_i's are the corresponding
eigenvalues. It is easily seen that the volume of an ellipsoid is

    vol(E) = vol(S(0, 1)) Π_{i=1}^{m} λ_i^{−1/2} = vol(S(0, 1)) det(Q^{−1/2}).

Cutting plane and new containing ellipsoid


In the ellipsoid method, a series of ellipsoids E_k are defined, with centers y_k
and defining matrices Q = B_k^{−1}, where B_k is symmetric and positive definite.
    At each iteration of the algorithm, we will have Ω ⊂ E_k. It is then possible
to check whether y_k ∈ Ω. If so, we have found an element of Ω as required. If
not, there is at least one constraint that is violated. Suppose a_j^T y_k > c_j. Then

    Ω ⊂ (1/2)E_k := {y ∈ E_k : a_j^T y ≤ a_j^T y_k}.
This set is half of the ellipsoid, obtained by cutting the ellipsoid in half
through its center.
    The successor ellipsoid E_{k+1} will be the minimal-volume ellipsoid containing
(1/2)E_k. It is constructed as follows. Define

    τ = 1/(m + 1),   δ = m²/(m² − 1),   σ = 2τ.

Then put

    y_{k+1} = y_k − (τ / (a_j^T B_k a_j)^{1/2}) B_k a_j,

    B_{k+1} = δ ( B_k − σ (B_k a_j a_j^T B_k)/(a_j^T B_k a_j) ).                    (5.2)
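
A minimal NumPy sketch of the update (5.2) (ours, not from the text); it assumes m ≥ 2 and that B_k is supplied as a dense symmetric positive definite matrix.

    import numpy as np

    def ellipsoid_update(y, B, a):
        # One step of the ellipsoid method, equation (5.2): given the center y,
        # the matrix B (so that E_k = ell(y, B^{-1})), and the normal a of a
        # violated constraint a^T y > c_j, return the center and matrix of the
        # minimal-volume ellipsoid containing {z in E_k : a^T z <= a^T y}.
        m = len(y)
        tau = 1.0 / (m + 1)
        delta = m**2 / (m**2 - 1.0)
        sigma = 2.0 * tau
        Ba = B @ a
        denom = a @ Ba                    # a^T B a > 0 since B is positive definite
        y_new = y - (tau / np.sqrt(denom)) * Ba
        B_new = delta * (B - sigma * np.outer(Ba, Ba) / denom)
        return y_new, B_new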

Theorem 5.1 The ellipsoid E_{k+1} = ell(y_{k+1}, B_{k+1}^{−1}) defined as above is
the ellipsoid of least volume containing (1/2)E_k. Moreover,

    vol(E_{k+1}) / vol(E_k) = ( m²/(m² − 1) )^{(m−1)/2} ( m/(m + 1) ) < exp( −1/(2(m + 1)) ) < 1.

Figure 5.3: Illustration of the minimal-volume ellipsoid containing a half-ellipsoid.

Proof. We shall not prove the statement about the new ellipsoid being of least
volume, since that is not necessary for the results that follow. To prove the
remainder of the statement, we have

    vol(E_{k+1}) / vol(E_k) = det(B_{k+1})^{1/2} / det(B_k)^{1/2}.

For simplicity, by a change of coordinates, we may take B_k = I. Then B_{k+1}
has m − 1 eigenvalues equal to δ = m²/(m² − 1) and one eigenvalue equal to
δ − 2δτ = (m²/(m² − 1))(1 − 2/(m + 1)) = (m/(m + 1))². The reduction in
volume is the product of the square roots of these, giving the equality in the
theorem. Then using (1 + x)^p ≤ e^{px}, we have

    ( m²/(m² − 1) )^{(m−1)/2} ( m/(m + 1) ) = ( 1 + 1/(m² − 1) )^{(m−1)/2} ( 1 − 1/(m + 1) )
                                            < exp( 1/(2(m + 1)) − 1/(m + 1) ) = exp( −1/(2(m + 1)) ).   □

Convergence
The ellipsoid method is initiated by selecting y0 and R such that condition
(A1) is satisfied. Then B0 = R2 I, and the corresponding E0 contains Ω.
The updating of the Ek ’s is continued until a solution is found.
    Under the assumptions stated above, the ellipsoid method reduces the volume
of the containing ellipsoid to one-half of its initial value in O(m) iterations.
Hence it can reduce the volume to less than that of a sphere of radius r in
O(m² log(R/r)) iterations, since the volume of Ω is bounded from below by
vol(S(0, 1))r^m and the initial volume is vol(S(0, 1))R^m. Generally a single
iteration requires O(m²) arithmetic operations. Hence the entire process
requires O(m^4 log(R/r)) arithmetic operations.5

5 Assumption (A2) is sometimes too strong. It has been shown, however, that when the data consists of integers, it is possible to perturb the problem so that (A2) is satisfied and if the perturbed problem has a feasible solution, so does the original Ω.
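
Putting the pieces together, the following is a rough sketch of the resulting feasibility method (ours, not from the text); it reuses the ellipsoid_update function sketched after (5.2), and the iteration limit and example data are arbitrary choices.

    import numpy as np

    def ellipsoid_feasibility(A, c, y0, R, max_iter=10000):
        # Find a point of Omega = {y : a_j^T y <= c_j for all j}, where the
        # columns of A are the a_j, starting from E_0 = S(y0, R) (assumption (A1)).
        m = len(y0)
        y, B = np.array(y0, dtype=float), R**2 * np.eye(m)
        for _ in range(max_iter):
            viol = A.T @ y - c                        # constraint violations
            j = int(np.argmax(viol))
            if viol[j] <= 0:
                return y                              # y is a point of Omega
            y, B = ellipsoid_update(y, B, A[:, j])    # cut through the center
        return None

    # Example: the square 0 <= y1, y2 <= 1 written as A^T y <= c.
    A = np.array([[-1.0, 0.0, 1.0, 0.0],
                  [0.0, -1.0, 0.0, 1.0]])
    c = np.array([0.0, 0.0, 1.0, 1.0])
    print(ellipsoid_feasibility(A, c, y0=np.array([10.0, -7.0]), R=20.0))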

Ellipsoid method for usual form of LP


Now consider the linear program (where A is m × n)

maximize cT x
(P) subject to Ax ≤ b
x≥0

and its dual


    (D)   minimize   y^T b
          subject to y^T A ≥ c^T
                     y ≥ 0.
Both problems can be solved by finding a feasible point of the inequalities

    −c^T x + b^T y ≤ 0
             Ax ≤ b                                    (5.3)
          −A^T y ≤ −c
           x, y ≥ 0,

where both x and y are variables. Thus, the total number of arithmetic
operations for solving a linear program is bounded by O((m + n)^4 log(R/r)).

5.4 The Analytic Center


The new form of interior-point algorithm introduced by Karmarkar moves
by successive steps inside the feasible region. It is the interior of the feasible
set rather than the vertices and edges that plays a dominant role in this
type of algorithm. In fact, these algorithms purposely avoid the edges of
the set, only eventually converging to one as a solution.
The study of these algorithms begins in the next section, but it is useful
at this point to introduce a concept that definitely focuses on the interior
of a set, termed the set’s analytic center. As the name implies, the center
is away from the edge.
In addition, the study of the analytic center introduces a special struc-
ture, termed a barrier that is fundamental to interior-point methods.
    Consider a set S in a subset X of R^n defined by a group of inequalities as

    S = {x ∈ X : g_j(x) ≥ 0, j = 1, 2, . . . , m},

and assume that the functions g_j are continuous. S has a nonempty interior
S̊ = {x ∈ X : g_j(x) > 0, all j}. Associated with this definition of the set is the
potential function

    ψ(x) = − Σ_{j=1}^{m} log g_j(x)

defined on S̊.
    The analytic center of S is the vector (or set of vectors) that minimizes the
potential; that is, the vector (or vectors) that solve

    min ψ(x) = min { − Σ_{j=1}^{m} log g_j(x) : x ∈ X, g_j(x) > 0 for each j }.

Example 5.2 A cube Consider the set S defined by xi ≥ 0, (1 − xi ) ≥ 0,


for i = 1, 2, . . . , n. This is S = [0, 1]n , the unit cube in Rn . The analytic
center can be found by differentiation to be xi = 1/2, for all i. Hence, the
analytic center is identical to what one would normally call the center of
the unit cube.

In general, the analytic center depends on how the set is defined—on


the particular inequalities used in the definition. For instance, the unit
cube is also defined by the inequalities xi ≥ 0, (1 − xi)^d ≥ 0 with d > 1.
In this case the solution is xi = 1/(d + 1) for all i. For large d this point is
near the inner corner of the unit cube.
Also, the addition of redundant inequalities can change the location of the
analytic center. For example, repeating a given inequality will change the
center's location.

There are several sets associated with linear programs for which the
analytic center is of particular interest. One such set is the feasible region
itself. Another is the set of optimal solutions. There are also sets associated
with the dual and primal-dual formulations. All of these are related in
important ways.
    Let us illustrate by considering the analytic center associated with a bounded
polytope Ω in R^m represented by n (> m) linear inequalities; that is,

    Ω = {y ∈ R^m : c^T − y^T A ≥ 0},

where A ∈ R^{m×n} and c ∈ R^n are given and A has rank m. Denote the interior
of Ω by

    Ω̊ = {y ∈ R^m : c^T − y^T A > 0}.

The potential function for this set is

    ψ_Ω(y) ≡ − Σ_{j=1}^{n} log(c_j − y^T a_j) = − Σ_{j=1}^{n} log s_j,              (5.4)

where s ≡ c − AT y is a slack vector. Hence the potential function is the


negative sum of the logarithms of the slack variables.
The analytic center of Ω is the interior point of Ω that minimizes the
potential function. This point is denoted by ya and has the associated
sa = c − AT ya . (ya , sa ) is uniquely defined, since the potential function is
strictly convex in a bounded convex Ω.
    Setting to zero the derivative of ψ_Ω(y) with respect to each y_i gives

    Σ_{j=1}^{n} a_ij / (c_j − y^T a_j) = 0,   for all i,

which can be written

    Σ_{j=1}^{n} a_ij / s_j = 0,   for all i.

Now define x_j = 1/s_j for each j. We introduce the notation

    x ◦ s ≡ (x_1 s_1, x_2 s_2, . . . , x_n s_n)^T,

which is component multiplication. Then the analytic center is defined by the
conditions

    x ◦ s = 1
       Ax = 0
    A^T y + s = c.
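
A small numerical sketch of these ideas (ours, not from the text): a damped Newton iteration that minimizes the potential ψ_Ω directly. The function name, stopping rule, and the unit-square example are our own choices.

    import numpy as np

    def analytic_center(A, c, y0, tol=1e-12):
        # Analytic center of Omega = {y : c - A^T y >= 0}, where the columns of
        # A are the a_j, found by minimizing psi(y) = -sum_j log(c_j - a_j^T y).
        # y0 must be strictly interior.
        def psi(y):
            s = c - A.T @ y
            return np.inf if np.any(s <= 0) else -np.sum(np.log(s))

        y = np.array(y0, dtype=float)
        for _ in range(100):
            s = c - A.T @ y
            g = A @ (1.0 / s)                 # gradient of psi
            H = (A / s**2) @ A.T              # Hessian  A S^{-2} A^T
            d = np.linalg.solve(H, -g)        # Newton direction
            if -g @ d < tol:                  # Newton decrement small: done
                return y
            t = 1.0
            while psi(y + t * d) > psi(y):    # backtrack: stay feasible, descend
                t *= 0.5
            y = y + t * d
        return y

    # The unit square 0 <= y1, y2 <= 1 written as c - A^T y >= 0.
    A = np.array([[-1.0, 0.0, 1.0, 0.0],
                  [0.0, -1.0, 0.0, 1.0]])
    c = np.array([0.0, 0.0, 1.0, 1.0])
    print(analytic_center(A, c, y0=np.array([0.2, 0.7])))   # approximately (0.5, 0.5)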

The analytic center can be defined when the interior is empty or equal-
ities are present, such as

Ω = {y ∈ Rm : cT − yT A ≥ 0, By = b}.

In this case the analytic center is chosen on the linear surface {y : By = b}


to maximize the product of the slack variables s = c − AT y. Thus, in this
context the interior of Ω refers to the interior of the positive orthant of slack
variables: R^n_+ ≡ {s : s ≥ 0}. This definition of interior depends only on the
region of the slack variables. Even if there is only a single point in Ω with
s = c − A^T y for some y where By = b with s > 0, we still say that Ω̊ is not
empty.

5.5 The Central Path


The concept underlying interior-point methods for linear programming is to
use nonlinear programming techniques of analysis and methodology. The
analysis is often based on differentiation of the functions defining the prob-
lem. Traditional linear programming does not require these techniques since
the defining functions are linear. Duality in general nonlinear programs is
typically manifested through Lagrange multipliers (which are called dual
variables in linear programming). The analysis and algorithms of the re-
maining sections of the chapter use these nonlinear techniques. These tech-
niques are discussed systematically in later chapters, so rather than treat
them in detail at this point, these current sections provide only minimal
detail in their application to linear programming. It is expected that most
readers are already familiar with the basic method for minimizing a func-
tion by setting its derivative to zero, and for incorporating constraints by
introducing Lagrange multipliers. These methods are discussed in detail in
Chapters 11-15.
The computational algorithms of nonlinear programming are typically
iterative in nature, often characterized as search algorithms. At any point, a
direction of search is established, and then a move in that direction is made to
define the next point. There are many varieties of such search algorithms, and
they are systematically presented in later chapters. In this chapter, we use
versions of Newton's method as the search
algorithm, but we postpone a detailed study of the method until later
chapters.
The transfer of ideas has gone in two directions. The ideas of interior-
point methods for linear programming have been extended to provide new
approaches to nonlinear programming as well. This chapter is intended to
show how this merger of linear and nonlinear programming produces elegant
and effective methods. And these ideas take an especially pleasing form
when applied to linear programming. Study of them here, even without
all the detailed analysis, should provide good intuitive background for the
more general manifestations.
Consider a primal linear program in standard form
(LP) minimize cT x (5.5)
subject to Ax = b
x ≥ 0.
We denote the feasible region of this program by F_p. We assume that
F̊ = {x : Ax = b, x > 0} is nonempty and the optimal solution set of the
problem is bounded.
Associated with this problem, we define for µ ≥ 0 the barrier problem
    (BP)   minimize    c^T x − µ Σ_{j=1}^{n} log x_j                  (5.6)
           subject to  Ax = b
                       x > 0.
It is clear that µ = 0 corresponds to the original problem (5.5). As µ → ∞,
the solution approaches the analytic center of the feasible region (when it is
bounded), since the barrier term swamps out cT x in the objective. As µ is
varied continuously toward 0, there is a path x(µ) defined by the solution to
(BP). This path x(µ) is termed the primal central path. As µ → 0 this path
converges to the analytic center of the optimal face {x : cT x = z ∗ , Ax =
b, x ≥ 0}, where z ∗ is the optimal value of (LP).
A strategy for solving (LP) is to solve (BP) for smaller and smaller
values of µ and thereby approach a solution to (LP). This is indeed the
basic idea of interior-point methods.
At any µ > 0, under the assumptions that we have made for problem
(5.5), the necessary and sufficient conditions for a unique and bounded
solution are obtained by introducing a Lagrange multiplier vector y for the
linear equality constraints to form the Lagrangian (see Chapter ?)
    c^T x − µ Σ_{j=1}^{n} log x_j − y^T (Ax − b).
j=1

The derivatives with respect to the xj ’s are set to zero, leading to the
conditions
cj − µ/xj − yT aj = 0, for each j
or

    µX^{−1} 1 + A^T y = c                              (5.7)
where as before aj is the j-th column of A and X is the diagonal matrix
whose diagonal entries are x > 0. Setting sj = µ/xj the complete set of
conditions can be rewritten
x◦s = µ1
Ax = b (5.8)
AT y + s = c.

Note that y is a dual feasible solution and c − AT y > 0 (see Exercise 5.3).
Example 5.3 A square primal Consider the problem of maximizing x1
within the unit square S = [0, 1]2 . The problem is formulated as
max x1
subject to x1 + x3 = 1
x2 + x4 = 1
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0.
Here x3 and x4 are slack variables that put the original problem in standard
form (with the objective written as minimizing −x1). The optimality conditions
for x(µ) consist of the original two linear constraint equations and the four
equations

y1 + s1 = −1
y2 + s2 = 0
y1 + s3 = 0
y2 + s4 = 0
together with the relations s_i = µ/x_i for i = 1, 2, . . . , 4. These equations are
readily solved with a series of elementary variable eliminations to find

    x_1(µ) = (1 − 2µ ± sqrt(1 + 4µ²)) / 2,
    x_2(µ) = 1/2.
Using the “+” solution, it is seen that as µ → 0 the solution goes to
x → (1, 1/2). Note that this solution is not a corner of the cube. Instead
it is at the analytic center of the optimal face {x : x1 = 1, 0 ≤ x2 ≤ 1}.
The limit of x(µ) as µ → ∞ can be seen to be the point (1/2, 1/2). Hence,
the central path in this case is a straight line progressing from the analytic
center of the square (at µ → ∞) to the analytic center of the optimal face
(at µ → 0).
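
As a small numerical check of this example (ours, not from the text), the closed-form path can be substituted into the central-path conditions (5.8), with the objective written as minimizing −x1.

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]])
    b = np.array([1.0, 1.0])
    c = np.array([-1.0, 0.0, 0.0, 0.0])   # "maximize x1" as "minimize -x1"

    def central_path_point(mu):
        # Closed-form point of Example 5.3 and the corresponding (y, s).
        x1 = (1 - 2 * mu + np.sqrt(1 + 4 * mu**2)) / 2
        x = np.array([x1, 0.5, 1 - x1, 0.5])
        s = mu / x
        y = np.array([-1 - s[0], -s[1]])
        return x, y, s

    for mu in [10.0, 1.0, 1e-3]:
        x, y, s = central_path_point(mu)
        # Residuals of the three blocks of (5.8); all should be near zero.
        print(mu, x[0],
              np.max(np.abs(x * s - mu)),
              np.max(np.abs(A @ x - b)),
              np.max(np.abs(A.T @ y + s - c)))

The residuals are at the level of rounding error, and x1(µ) moves from about 1/2 at large µ toward 1 as µ → 0, as described above.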

Dual central path


Now consider the dual problem

(LD) maximize yT b
subject to yT A + sT = cT
s ≥ 0.

We may apply the barrier approach to this problem by formulating the


problem
    (BD)   maximize    y^T b + µ Σ_{j=1}^{n} log s_j
           subject to  y^T A + s^T = c^T
                       s > 0.

We assume that the dual feasible set F_d has a nonempty interior
F̊_d = {(y, s) : y^T A + s^T = c^T, s > 0} and that the optimal solution set of
(LD) is bounded. Then, as µ is varied continuously toward 0, there is a path
(y(µ), s(µ)) defined by the solution to (BD). This path y(µ) is termed the
dual central path.
To work out the necessary and sufficient conditions we introduce x as a
Lagrange multiplier and form the Lagrangian
    y^T b + µ Σ_{j=1}^{n} log s_j − (y^T A + s^T − c^T) x.

Setting to zero the derivative with respect to y_i leads to

    b_i − a_{i·} x = 0,   for all i,

where a_{i·} is the i-th row of A. Setting to zero the derivative with respect to
s_j leads to

    µ/s_j − x_j = 0,   for all j.
Combining these equations and including the original constraint yields the
complete set of conditions

    x ◦ s = µ1
       Ax = b
    A^T y + s = c.

These are identical to the optimality conditions for the primal central path
(5.8). Note that x is a primal feasible solution and x > 0.
To see the geometric representation of the dual central path, consider
the dual level set

Ω(z) = {y : cT − yT A ≥ 0, yT b ≥ z}

for any z < z ∗ where z ∗ is the optimal value of (LD). Then, the analytic
center (y(z), s(z)) of Ω(z) coincides with the dual central path as z tends to the
optimal value z ∗ from below (Figure 5.4).

Figure 5.4: The central path as analytic centers in the dual feasible region.

Example 5.4 The square dual Consider the dual of the LP of example
5.3. The feasible region is the set of vectors y with y1 ≤ −1, y2 ≤ 0.
(The values of s1 and s2 are the slack variables of these inequalities.) The
solution to the dual barrier problem is easily found from the solution of the
primal barrier problem to be

y1 (µ) = −1 − µ/x1 (µ), y2 = −2µ.

As µ → 0, we have y1 → −1, y2 → 0, which is the unique solution to the


dual LP. However, as µ → ∞, the vector y is unbounded, for in this case
the dual feasible set is itself unbounded.

Primal–dual central path


Suppose the feasible region of the primal (LP) has interior points and its
optimal solution set is bounded. Then, the dual also has interior points
(again see Exercise 5.3). The primal–dual path is defined to be the set of
vectors (x(µ), y(µ), s(µ)) that satisfy the conditions

    x ◦ s = µ1
       Ax = b                                         (5.9)
    A^T y + s = c
     x, s ≥ 0

for 0 ≤ µ ≤ ∞. Hence the central path is defined without explicit reference


to any optimization problem. It is simply defined in terms of the set of
equality and inequality conditions.
Since conditions (5.8) and (5.9) are identical, the primal–dual central
path can be split into two components by projecting onto the relevant space,
as described in the following proposition.

Proposition 5.2 Suppose the feasible sets of the primal and dual pro-
grams contain interior points. Then the primal–dual central path (x(µ), y(µ), s(µ))
exists for all µ, 0 ≤ µ < ∞. Furthermore, x(µ) is the primal central path,
and (y(µ), s(µ)) is the dual central path.

Moreover, the central path converges to the primal–dual optimal solu-


tion set, as stated below.
Proposition 5.3 Suppose the feasible sets of the primal and dual pro-
grams contain interior points. Then x(µ) and (y(µ), s(µ)) converge to the
analytic centers of the optimal primal solution and dual solution faces, re-
spectively, as µ → 0.

Duality gap
Let (x(µ), y(µ), s(µ)) be on the primal-dual central path. Then from (5.9)
it follows that

cT x − yT b = yT Ax + sT x − yT b = sT x = nµ.

The value cT x − yT b = sT x is the difference between the primal objective


value and the dual objective value. This value is always nonnegative (see
the weak duality lemma in Section 4.2) and is termed the duality gap.
The duality gap provides a measure of closeness to optimality. For any
primal feasible x, the value cT x gives an upper bound as cT x ≥ z ∗ where


z ∗ is the optimal value of the primal. Likewise, for any dual feasible pair
(y, s), the value yT b gives a lower bound as yT b ≤ z ∗ . The difference, the
duality gap g = cT x − yT b, provides a bound on z ∗ as z ∗ ≥ cT x − g. Hence
if at a feasible point x, a dual feasible (y, s) is available, the quality of x
can be measured as cT x − z ∗ ≤ g.
At any point on the primal–dual central path, the duality gap is equal
to nµ. Hence, this value gives a measure of optimality. It is clear that as
µ → 0 the duality gap goes to zero, and hence both x(µ) and (y(µ), s(µ))
approach optimality for the primal and dual, respectively.

5.6 Solution Strategies


The various definitions of the central path directly suggest corresponding
strategies for solution of an LP. We outline three general approaches here:
the primal barrier or path-following method, the primal-dual path-following
method and the primal-dual potential-reduction method, although the de-
tails of their implementation and analysis must be deferred to later chapters
after study of general nonlinear methods. The following table depicts these
solution strategies and the simplex methods described in Chapters 3 and 4
with respect to how they meet the three optimality conditions: Primal Fea-
sibility (P-F), Dual Feasibility (D-F), and Zero-Duality during the iterative
process:

                                        P-F    D-F    0-Duality
    Primal Simplex                       √               √
    Dual Simplex                                 √       √
    Primal Barrier                       √
    Primal-Dual Path-Following           √       √
    Primal-Dual Potential-Reduction      √       √

For example, the primal simplex method keeps improving a primal
feasible solution, maintains the zero-duality gap (complementarity condi-
tion) and moves toward dual feasibility; while the dual simplex method
keeps improving a dual feasible solution, maintains the zero-duality gap
(complementarity condition) and moves toward primal feasibility (see Sec-
tion 4.3). The primal barrier method will keep improving a primal feasible
solution and move toward dual feasibility and complementarity; and the
primal-dual interior-point methods will keep improving a primal and dual
feasible solution pair and move toward complementarity.

Primal barrier method


A direct approach is to use the barrier construction and solve the problem

    minimize    c^T x − µ Σ_{j=1}^{n} log x_j                       (5.10)
    subject to  Ax = b
                x ≥ 0,

for a very small value of µ. In fact, if we desire to reduce the duality gap to
ε it is only necessary to solve the problem for µ = ε/n. Unfortunately, when
µ is small, the problem (5.10) could be highly ill-conditioned in the sense
that the necessary conditions are nearly singular. This makes it difficult to
directly solve the problem for small µ.
An overall strategy, therefore, is to start with a moderately large µ
(say µ = 100) and solve that problem approximately. The corresponding
solution is a point approximately on the primal central path, but it is likely
to be quite distant from the point corresponding to the limit of µ → 0.
However this solution point at µ = 100 can be used as the starting point
for the problem with a slightly smaller µ, for this point is likely to be close
to the solution of the new problem. The value of µ might be reduced at each
stage by a specific factor, giving µk+1 = γµk , where γ is a fixed positive
parameter less than one and k is the stage count.
    If the strategy is begun with a value µ_0, then at the k-th stage we have
µ_k = γ^k µ_0. Hence reducing µ_k/µ_0 to below ε requires

    k = log ε / log γ

stages.
Often a version of Newton’s method for minimization is used to solve
each of the problems. For the current strategy, Newton’s method works

on a problem (5.10) with fixed µ by moving from a given point x ∈F p to

a closer point x+ ∈F p by solving for dx , dy and ds from the linearized
version of the central path equations of (5.7), as

    µX^{−2} d_x + d_s = µX^{−1} 1 − c,
              A d_x = 0,                               (5.11)
    −A^T d_y − d_s = 0.

(Recall that X is the diagonal matrix whose diagonal entries are x > 0.)
The new point is then updated by taking a step in the direction of dx , as
x+ = x + dx .

Notice that if x◦s = µ1 for some s = c−AT y, then d ≡ (dx , dy , ds ) = 0


because the current point is already the central path solution for µ. If some
component of x ◦ s is less than µ, then d will tend to increment the solution
so as to increase that component. The converse will occur if x ◦ s is greater
than µ.
This process may be repeated several times until a point close enough
to the proper solution to the barrier problem for the given value of µ is
obtained. That is, until the necessary and sufficient conditions (5.7) are
(approximately) satisfied.
There are several details involved in a complete implementation and
analysis of Newton’s method. These items are discussed in later chapters
of the text. However, the method works well if either µ is moderately
large, or if the algorithm is initiated at a point very close to the solution,
exactly as needed for the barrier strategy discussed in this subsection.
    To solve (5.11), premultiply the first equation by X² to obtain

    µ d_x + X² d_s = µX1 − X² c.

Then, premultiplying by A and using A d_x = 0, we have

    A X² d_s = µAX1 − AX² c.

Using d_s = −A^T d_y we have

    (A X² A^T) d_y = −µAX1 + AX² c.

Thus, d_y can be computed by solving the above linear system of equations,
which, together with computing d_s and d_x, amounts to O(nm² + m³)
arithmetic operations for each Newton step.
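
A compact sketch of this computation (ours, not from the text); it assumes x is strictly positive and primal feasible, and it forms the normal matrix densely.

    import numpy as np

    def barrier_newton_step(A, c, x, mu):
        # One Newton step (5.11) for the primal barrier problem at fixed mu.
        # Requires x > 0 with A x = b.  Returns (dx, dy, ds).
        X2 = np.diag(x**2)
        M = A @ X2 @ A.T                       # normal matrix A X^2 A^T
        rhs = A @ X2 @ c - mu * (A @ x)        # = A X^2 c - mu A X 1
        dy = np.linalg.solve(M, rhs)
        ds = -A.T @ dy
        dx = (mu * x - X2 @ c - X2 @ ds) / mu
        return dx, dy, ds

The new point is then x^+ = x + d_x, in practice with a step-size safeguard to keep x strictly positive.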

5.6.1 Primal-dual path-following


Another strategy for solving LP is to follow the central path from a given
initial primal-dual solution pair. Consider a linear program in the standard

form (LP) and (LD). Assume that F̊ ≠ ∅; that is, both

    F̊_p = {x : Ax = b, x > 0} ≠ ∅

and

    F̊_d = {(y, s) : s = c − A^T y > 0} ≠ ∅,
and denote by z ∗ the optimal objective value.

    The central path can be expressed as

    C = { (x, y, s) ∈ F̊ : x ◦ s = (x^T s / n) 1 }

in the primal-dual form. On the path we have x ◦ s = µ1 and hence s^T x = nµ.
A neighborhood of the central path C is of the form

    N(η) = { (x, y, s) ∈ F̊ : |s ◦ x − µ1| < ηµ, where µ = s^T x/n }        (5.12)

for some η ∈ (0, 1), say η = 1/4. This can be thought of as a tube whose
center is the central path.
The idea of the path-following method is to move within a tubular
neighborhood of the central path toward the solution point. A suitable
initial point (x0 , y0 , s0 ) ∈ N (η) can be found by solving the barrier problem
for some fixed µ0 or from an initialization phase proposed later. After that,
step by step moves are made, alternating between a predictor step and a
corrector step. After each pair of steps, the point achieved is again in the
fixed given neighborhood of the central path, but closer to the LP solution
set.
The predictor step is designed to move essentially parallel to the true
central path. The step d ≡ (dx , dy , ds ) is determined from the linearized
version of the primal-dual central path equations of (5.9), as

    s ◦ d_x + x ◦ d_s = γµ1 − x ◦ s,
              A d_x = 0,                               (5.13)
    −A^T d_y − d_s = 0,

where one selects γ = 0. (To show the dependence of d on the current pair
(x, s) and the parameter γ, we write d = d(x, s, γ).)
The new point is then found by taking a step in the direction of d,
as (x+ , y+ , s+ ) = (x, y, s) + α(dx , dy , ds ), where α is called the step-size.
Note that d_x^T d_s = −d_x^T A^T d_y = 0 here. Then

    (x^+)^T s^+ = (x + αd_x)^T (s + αd_s) = x^T s + α(d_x^T s + x^T d_s) = (1 − α) x^T s.

Thus, the predictor step reduces the duality gap by a factor 1 − α. The maximum
possible step-size α is taken in that (essentially parallel) direction without going
outside of the neighborhood N(2η).
The corrector step essentially moves perpendicular to the central path
in order to get closer to it. This step moves the solution back to within the
neighborhood N (η), and the step is determined by selecting γ = 1 as in

(5.13) with µ = xT s/n. Notice that if x ◦ s = µ1, then d = 0 because the


current point is already the central path solution.
This corrector step is identical to one step of the barrier method. Note,
however, that the predictor–corrector method requires only one sequence
of steps each consisting of a single predictor and corrector. This contrasts
with the barrier method which requires a complete sequence for each µ to
get back to the central path, and then an outer sequence to reduce the µ’s.
    One can prove that for any (x, y, s) ∈ N(η) with µ = x^T s/n, the step-size
satisfies

    α ≥ 1/(2√n).

Thus, the operation complexity of the method is identical to that of the barrier
method, with overall operation complexity O(n^{0.5}(nm² + m³) log(1/ε)) to
achieve µ/µ_0 ≤ ε, where nµ_0 is the initial duality gap. Moreover, one can prove
that the step-size α → 1 as x^T s → 0; that is, the duality reduction speed is
accelerated as the gap becomes smaller.
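
The following sketch (ours, not from the text) computes the direction of (5.13) by the normal-equation elimination described at the end of this section, and strings together one predictor and one corrector step. The neighborhood test uses the Euclidean norm, the backtracking factor is arbitrary, and a production code would safeguard the steps more carefully.

    import numpy as np

    def primal_dual_direction(A, b, x, y, s, gamma):
        # Solve the linearized central-path equations (5.13):
        #   s o dx + x o ds = gamma*mu*1 - x o s,  A dx = 0,  -A^T dy - ds = 0.
        n = len(x)
        mu = x @ s / n
        d = x / s                                   # diagonal of S^{-1} X
        M = (A * d) @ A.T                           # A S^{-1} X A^T
        rhs = b - gamma * mu * (A @ (1.0 / s))      # b - gamma*mu*A S^{-1} 1
        dy = np.linalg.solve(M, rhs)
        ds = -A.T @ dy
        dx = gamma * mu / s - x - d * ds
        return dx, dy, ds

    def in_neighborhood(x, s, eta):
        # Membership test for N(eta) of (5.12), with |.| the Euclidean norm.
        mu = x @ s / len(x)
        return np.all(x > 0) and np.all(s > 0) and np.linalg.norm(x * s - mu) < eta * mu

    def predictor_corrector_iteration(A, b, x, y, s, eta=0.25):
        # Predictor (gamma = 0): longest backtracked step staying in N(2*eta).
        dx, dy, ds = primal_dual_direction(A, b, x, y, s, gamma=0.0)
        alpha = 1.0
        while not in_neighborhood(x + alpha * dx, s + alpha * ds, 2 * eta):
            alpha *= 0.9
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
        # Corrector (gamma = 1): full Newton step back toward the central path.
        dx, dy, ds = primal_dual_direction(A, b, x, y, s, gamma=1.0)
        return x + dx, y + dy, s + ds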

5.6.2 Primal-dual potential function


In this method a primal-dual potential function is used to measure the so-
lution’s progress. The potential is reduced at each iteration. There is no
restriction on either neighborhood or step-size during the iterative process
as long as the potential is reduced. The greater the reduction of the po-
tential function, the faster the convergence of the algorithm. Thus, from
a practical point of view, potential-reduction algorithms may have an ad-
vantage over path-following algorithms where iterates are confined to lie in
certain neighborhoods of the central path.
For x ∈ F̊_p and (y, s) ∈ F̊_d the primal–dual potential function is defined by

    ψ_{n+ρ}(x, s) ≡ (n + ρ) log(x^T s) − Σ_{j=1}^{n} log(x_j s_j),          (5.14)

where ρ ≥ 0.
From the arithmetic and geometric mean inequality (also see Exercise
5.9) we can derive that

    n log(x^T s) − Σ_{j=1}^{n} log(x_j s_j) ≥ n log n.

Then

    ψ_{n+ρ}(x, s) = ρ log(x^T s) + n log(x^T s) − Σ_{j=1}^{n} log(x_j s_j) ≥ ρ log(x^T s) + n log n.    (5.15)

Thus, for ρ > 0, ψ_{n+ρ}(x, s) → −∞ implies that x^T s → 0. More precisely, we
have from (5.15)

    x^T s ≤ exp( (ψ_{n+ρ}(x, s) − n log n) / ρ ).
Hence the primal–dual potential function gives an explicit bound on the
magnitude of the duality gap.
The objective of this method is to drive the potential function down
toward minus infinity. The method of reduction is a version of Newton’s
method (5.13). In this case we select γ = n/(n + ρ). Notice that this is a
combination of a predictor and corrector choice. The predictor uses γ =
0 and the corrector uses γ = 1. The primal–dual potential method uses
something in between. This seems logical, for the predictor moves parallel
to the central path toward a lower duality gap, and the corrector moves
perpendicular to get close to the central path. This new method does both
at once. Of course,
√ this intuitive notion must be made precise.
For ρ ≥ n, there is in fact a guaranteed decrease in the potential
function by a fixed amount δ (see Exercises 5.11 and 5.12). Specifically,
ψn+ρ (x+ , s+ ) − ψn+ρ (x, s) ≤ −δ (5.16)
for a constant δ ≥ 0.2. This result provides a theoretical bound on the
number of required iterations and the bound is competitive with other
methods. However, a faster algorithm may be achieved by conducting a line
search along direction d to achieve the greatest reduction in the primal-dual
potential function at each iteration.
We outline the algorithm here:

Algorithm 5.1 Given (x^0, y^0, s^0) ∈ F̊ with ψ_{n+ρ}(x^0, s^0) ≤ ρ log((s^0)^T x^0) +
n log n + O(√n log n). Set ρ ≥ √n and k = 0.

While ((s^k)^T x^k) / ((s^0)^T x^0) ≥ ε do

1. Set (x, s) = (x^k, s^k) and γ = n/(n + ρ) and compute (d_x, d_y, d_s) from
   (5.13).

2. Let x^{k+1} = x^k + ᾱ d_x, y^{k+1} = y^k + ᾱ d_y, and s^{k+1} = s^k + ᾱ d_s, where

       ᾱ = arg min_{α ≥ 0} ψ_{n+ρ}(x^k + α d_x, s^k + α d_s).

3. Let k = k + 1 and return to Step 1.

Theorem 5.4 Algorithm 5.1 terminates in at most O(ρ log(n/ε)) iterations with

    ((s^k)^T x^k) / ((s^0)^T x^0) ≤ ε.

Proof. Note that after k iterations, we have from (5.16)

    ψ_{n+ρ}(x^k, s^k) ≤ ψ_{n+ρ}(x^0, s^0) − k·δ ≤ ρ log((s^0)^T x^0) + n log n + O(√n log n) − k·δ.

Thus, from the inequality (5.15),

    ρ log((s^k)^T x^k) + n log n ≤ ρ log((s^0)^T x^0) + n log n + O(√n log n) − k·δ,

or

    ρ(log((s^k)^T x^k) − log((s^0)^T x^0)) ≤ −k·δ + O(√n log n).

Therefore, as soon as k ≥ O(ρ log(n/ε)), we must have

    ρ(log((s^k)^T x^k) − log((s^0)^T x^0)) ≤ −ρ log(1/ε),

or

    ((s^k)^T x^k) / ((s^0)^T x^0) ≤ ε.   □
    Theorem 5.4 holds for any ρ ≥ √n. Thus, by choosing ρ = √n, the
iteration complexity bound becomes O(√n log(n/ε)).
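
A rough sketch of the potential function (5.14) and of the line search in Step 2 of Algorithm 5.1 (ours, not from the text). The direction (d_x, d_y, d_s) is assumed to come from (5.13) with γ = n/(n + ρ), for instance via the primal_dual_direction sketch of Section 5.6.1, and the grid search is only a crude stand-in for an exact minimization.

    import numpy as np

    def potential(x, s, rho):
        # Primal-dual potential psi_{n+rho}(x, s) of (5.14).
        n = len(x)
        return (n + rho) * np.log(x @ s) - np.sum(np.log(x * s))

    def potential_reduction_step(x, y, s, dx, dy, ds, rho):
        # Step 2 of Algorithm 5.1: pick the step size that (approximately)
        # minimizes the potential along d while keeping (x, s) > 0.
        v = np.concatenate([x, s])
        dv = np.concatenate([dx, ds])
        neg = dv < 0
        alpha_max = 0.99 * np.min(-v[neg] / dv[neg]) if np.any(neg) else 1.0
        alphas = np.linspace(0.0, alpha_max, 201)[1:]      # crude grid search
        k = int(np.argmin([potential(x + a * dx, s + a * ds, rho) for a in alphas]))
        a = alphas[k]
        return x + a * dx, y + a * dy, s + a * ds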

Computational cost of each iteration

The computation of each iteration basically involves solving (5.13) for d. Note
that the first equation of (5.13) can be written as

Sdx + Xds = γµ1 − XS1

where X and S are two diagonal matrices whose diagonal entries are x > 0
and s > 0, respectively. Premultiplying both sides by S−1 we have

dx + S−1 Xds = γµS−1 1 − x.

Then, premultiplying by A and using Adx = 0, we have

AS−1 Xds = γµAS−1 1 − Ax = γµAS−1 1 − b.



Using ds = −AT dy we have

(AS−1 XAT )dy = b − γµAS−1 1.

    Thus, the primary computational cost of each iteration of the interior-point
algorithm discussed in this section is to form and invert the normal matrix
AXS^{−1}A^T, which typically requires O(nm² + m³) arithmetic operations.
However, an approximation of this matrix can be updated and inverted using
far fewer arithmetic operations. In fact, using a rank-one technique (see
Chapter 10) to update the approximate inverse of the normal matrix during the
iterative process, one can reduce the average number of arithmetic operations
per iteration to O(√n m²). Thus, if the relative tolerance ε is viewed as a
variable, we have the following total arithmetic operation complexity bound for
solving a linear program:

Corollary 5.5 Let ρ = √n. Then, Algorithm 5.1 terminates in at most
O(nm² log(n/ε)) arithmetic operations with

    ((x^k)^T s^k) / ((x^0)^T s^0) ≤ ε.

5.7 Termination and Initialization∗


There are several remaining important issues concerning interior-point al-
gorithms for LP. The first issue involves termination. Unlike the simplex
method for linear programming which terminates with an exact solution,
interior-point algorithms are continuous optimization algorithms that gen-
erate an infinite solution sequence converging to the optimal solution set.
If the data of an LP instance are integral or rational, an argument is made
that, after the worst-case time bound, an exact solution can be rounded
from the latest approximate solution. Several questions arise. First, under
the real number computation model (that is, the LP data consists of real
numbers), how can we terminate at an exact solution? Second, regardless
of the data’s status, is there a practical test, which can be computed cost-
effectively during the iterative process, to identify an exact solution so that
the algorithm can be terminated before the worst-case time bound? Here,
by exact solution we mean one that could be found using exact arithmetic,
such as the solution of a system of linear equations, which can be computed
in a number of arithmetic operations bounded by a polynomial in n.
    The second issue involves initialization. Almost all interior-point algorithms
solve the LP problem under the regularity assumption that F̊ ≠ ∅.
What is to be done if this is not true?

A related issue is that interior-point algorithms have to start at a strictly


feasible point near the central path. One way is to explicitly bound the
feasible region of the LP problem by a big M number. If the LP problem
has integral data, this number can be set to 2L in the worst case, where
L is the length of the LP data in binary form. This is not workable for
large problems. Moreover, if the LP problem has real data, no computable
bound is known.

Termination
In the previously studied complexity bounds for interior-point algorithms,
we have an ε which needs to be zero in order to obtain an exact optimal
solution. Sometimes it is advantageous to employ an early termination or
rounding method while ε is still moderately large. There are five basic
approaches.
• A “purification” procedure finds a feasible corner whose objective
value is at least as good as the current interior point. This can be
accomplished in strongly polynomial time (that is, the complexity
bound is a polynomial only in the dimensions m and n). One difficulty is
that there may be many non-optimal vertices close to the optimal face, and
the procedure might require many pivot steps for difficult problems.
• A second method seeks to identify an optimal basis. It has been
shown that if the LP problem is nondegenerate, the unique optimal
basis may be identified early. The procedure seems to work well for
some LP problems but it has difficulty if the problem is degenerate.
Unfortunately, most real LP problems are degenerate.
• The third approach is to slightly perturb the data such that the new
LP problem is nondegenerate and its optimal basis remains one of
the optimal bases of the original LP problem. There are questions
about how and when to perturb the data during the iterative process,
decisions which can significantly affect the success of the effort.
• The fourth approach is to guess the optimal face and find a feasible solution on that face. It consists of two phases: the first phase uses interior-point algorithms to identify the complementarity partition (P*, Z*) (see Exercise 5.5), and the second phase adapts the simplex method to find an optimal primal (or dual) basic solution, using (P*, Z*) to construct a starting basis. This method is often called the cross-over method. It is guaranteed to work in finite time and is implemented in several popular LP software packages.
• The fifth approach is to guess the optimal face and project the current interior point onto the interior of the optimal face. This again uses interior-point algorithms to identify the complementarity partition (P*, Z*), and then solves a least-squares problem for the projection; see Figure 5.5 and the small computational sketch following it. This termination procedure is also guaranteed to work in finite time.

Figure 5.5: Illustration of the projection of an interior point onto the optimal face. (Labels in the figure: the iterate y^k, the central path, the objective hyperplane, the optimal face, and the limit point y*.)

The fourth and fifth methods above are based on the fact that (as observed
in practice and subsequently proved) many interior-point algorithms for
linear programming generate solution sequences that converge to a strictly
complementary solution or an interior solution on the optimal face; see
Exercise 5.7.
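
To make the fifth approach concrete, here is a minimal numerical sketch of the least-squares projection step (an illustration only, not the method as implemented in practice; it assumes the partition guess P has already been formed, for example from the relative magnitudes of x_j and s_j at the current iterate, and it uses dense NumPy linear algebra):

import numpy as np

def project_onto_guessed_face(A, b, x, P):
    """Project the current interior point x onto the affine hull of the
    guessed optimal face {z : A z = b, z_j = 0 for j not in P}."""
    AP = A[:, P]                     # columns indexed by the guess P
    xP = x[P]
    # Minimum-norm correction d solving AP (xP + d) = b; np.linalg.lstsq
    # returns the least-squares solution of minimum norm.
    d, *_ = np.linalg.lstsq(AP, b - AP @ xP, rcond=None)
    z = np.zeros_like(x)
    z[P] = xP + d
    return z

If every component of z[P] is strictly positive, z lies in the interior of the guessed optimal face and the algorithm can stop; otherwise the guess was premature and the interior-point iteration simply continues.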

Initialization
Most interior-point algorithms must be initiated at a strictly feasible point.
The complexity of obtaining such an initial point is the same as that of
solving the LP problem itself. More importantly, a complete LP algorithm
should accomplish two tasks: 1) detect the infeasibility or unboundedness
status of the LP problem, then 2) generate an optimal solution if the prob-
lem is neither infeasible nor unbounded.
Several approaches have been proposed to accomplish these goals:

• The primal and dual can be combined into a single linear feasibil-
ity problem, and then an LP algorithm can find a feasible point of
the problem. Theoretically, this approach achieves the currently best
iteration complexity bound, i.e., O(√n log(1/ε)). Practically, a sig-
nificant disadvantage of this approach is the doubled dimension of the
system of equations that must be solved at each iteration.

• The big M method can be used by adding one or more artificial


column(s) and/or row(s) and a huge penalty parameter M to force
solutions to become feasible during the algorithm. Theoretically, this
approach also achieves the best iteration bound. A major disadvan-
tage of this approach is the numerical problems caused by the addition
of coefficients of large magnitude.

• Phase I-then-Phase II methods are effective. They first try to find


a feasible point (and possibly one for the dual problem), and then
begin to look for an optimal solution if the problem is feasible and
bounded. Theoretically, this approach also achieves the best iteration
complexity bound. A major disadvantage of this approach is that the
two (or three) related LP problems must be solved sequentially.

• A variation is the combined Phase I-Phase II method. This method approaches feasibility and optimality simultaneously. To our knowledge, the currently best iteration complexity bound of this approach is O(n log(1/ε)), as compared to O(√n log(1/ε)) for the three approaches above. Other disadvantages of the method include the assumption of a nonempty interior and the need for an objective lower bound.

There is a homogeneous and self-dual (HSD) LP algorithm that overcomes the difficulties mentioned above. The algorithm, while also achieving the theoretically best O(√n log(1/ε)) complexity bound, is often used in LP software packages and possesses the following features:

• It solves a linear programming problem without any regularity as-


sumption concerning the existence of optimal, feasible, or interior
feasible solutions.

• It can start at x = e, y = 0 and s = e, feasible or infeasible, on the


central ray of the positive orthant (cone), and it does not require a
big M penalty parameter or lower bound.

• Each iteration solves a system of linear equations whose dimension is


almost the same as that solved in the standard (primal-dual) interior-
point algorithms.

• If the LP problem has a solution, the algorithm generates a sequence


that approaches feasibility and optimality simultaneously; if the problem is infeasible or unbounded, the algorithm will produce an infeasibility certificate for at least one of the primal and dual problems; see
Exercise 5.4.
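
The way an HSD solution is typically interpreted can be summarized in a few lines of code (a hedged sketch of the standard recovery rule, not the algorithm itself; here tau and kappa denote the homogenizing variable and its complementary slack in the embedding):

def recover_from_hsd(x, y, s, tau, kappa, tol=1e-8):
    """Interpret an (approximately) converged HSD point."""
    if tau > tol and kappa <= tol:
        # Scaling by tau recovers an optimal primal-dual pair.
        return "optimal", (x / tau, y / tau, s / tau)
    if kappa > tol and tau <= tol:
        # (x, y, s) now acts as an infeasibility certificate for at
        # least one of the primal and dual problems (Exercise 5.4).
        return "primal or dual infeasible", (x, y, s)
    return "undecided", None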

5.8 Notes
Computation and complexity models were developed by a number of scientists; see, e.g., Cook [14], Hartmanis and Stearns [30], and Papadimitriou and Steiglitz [49] for the bit complexity models, and Blum et al. [12] for the real number arithmetic model.
Most of the material in Section 5.1 is based on a teaching note of Cottle on linear programming taught at Stanford. The practical performance of the simplex method is discussed in Bixby [10].
The ellipsoid method was developed by Khachiyan [34]; further developments of the ellipsoid method can be found in Bland, Goldfarb and Todd [11] and in Goldfarb and Todd [25].
The “analytic center” for a convex polyhedron given by linear inequalities was introduced by Huard [32], and later by Sonnevend [53]. The barrier function was introduced by Frisch [23].
The central path for linear programming and convex optimization was analyzed first by McLinden [40], and later by Bayer and Lagarias [6, 7], Megiddo [41], and Sonnevend [53]; also see Fiacco and McCormick [21].
The path-following algorithms were first developed by Renegar [50]. A primal barrier or path-following algorithm was independently analyzed by Gonzaga [28]. Both Gonzaga [28] and Vaidya [59] developed a rank-one updating technique for solving the Newton equation at each iteration, and proved that each iteration uses O(n^{2.5}) arithmetic operations on average.
Kojima, Mizuno and Yoshise [36] and Monteiro and Adler [44] developed
a symmetric primal-dual path-following algorithm with the same iteration
and arithmetic operation bounds.
The predictor-corrector algorithms were developed by Mizuno et al. [43].
A more practical predictor-corrector algorithm was proposed by Mehrotra
[42] (also see Lustig et al. [39] and Zhang and Zhang [66]). His technique
has been used in almost all of the LP interior-point implementations.
A primal potential reduction algorithm was initially proposed by Kar-
markar [33]. The primal-dual potential function was proposed by Tanabe
[54] and Todd and Ye [56]. The primal-dual potential reduction algorithm
was developed by Ye [64], Freund [22], Kojima, Mizuno and Yoshise [37],
Goldfarb and Xiao [26], Gonzaga and Todd [29], Todd [55], Tunçel [57], etc.
The homogeneous and self-dual embedding method can be found in Ye et al. [65], Luo et al. [38], Andersen and Ye [3], and many others.
In addition to those mentioned earlier, there are several comprehensive
books which cover interior-point linear programming algorithms. They are
Bazaraa, Jarvis and Sherali [5], Bertsekas [8], Bertsimas and Tsitsiklis [9],
Cottle [15], Cottle, Pang and Stone [16], Dantzig and Thapa [18, 19], Fang
and Puthenpura [20], den Hertog [31], Murty [45], Nash and Sofer [46],
Roos et al. [51], Saigal [52], Vanderbei [61], Vavasis [62], Wright [63], etc.

5.9 Exercises
5.1 Prove the volume reduction rate in Theorem 5.1 for the ellipsoid method.
5.2 Develop a cutting plane method, based on the ellipsoid method, to find
a point satisfying convex inequalities
    f_i(x) ≤ 0,  i = 1, ..., m,    |x|^2 ≤ R^2,
where the f_i's are convex functions of x in C^1.

5.3 Consider linear program (5.5) and assume that F̊ = {x : Ax = b, x >
0} is nonempty and its optimal solution set is bounded. Then, the dual of
the problem has a nonempty interior.
5.4 (Farkas’ lemma) Exactly one of the feasible set {x : Ax = b, x ≥ 0}
and the feasible set {y : yT A ≤ 0, yT b = 1} is nonempty. A vector y in
the latter set is called an infeasibility certificate for the former.
5.5 (Strict complementarity) Given any linear program in the standard form, the primal optimal face is

    Ω_p = {x_{P*} : A_{P*} x_{P*} = b, x_{P*} ≥ 0},

and the dual optimal face is

    Ω_d = {(y, s_{Z*}) : A_{P*}^T y = c_{P*}, s_{Z*} = c_{Z*} − A_{Z*}^T y ≥ 0},

where (P*, Z*) is a partition of {1, 2, ..., n}. Prove that the partition is unique, and each face has an interior, that is,

    Ω̊_p = {x_{P*} : A_{P*} x_{P*} = b, x_{P*} > 0}

and

    Ω̊_d = {(y, s_{Z*}) : A_{P*}^T y = c_{P*}, s_{Z*} = c_{Z*} − A_{Z*}^T y > 0}

are both nonempty.
5.6 (Central path theorem) Let (x(µ), y(µ), s(µ)) be the central path of
(5.9). Then prove

i) The central path point (x(µ), s(µ)) is bounded for 0 < µ ≤ µ^0 and any given 0 < µ^0 < ∞.

ii) For 0 < µ^0 < µ,

    c^T x(µ^0) ≤ c^T x(µ)   and   b^T y(µ^0) ≥ b^T y(µ).

Furthermore, if x(µ^0) ≠ x(µ) and y(µ^0) ≠ y(µ),

    c^T x(µ^0) < c^T x(µ)   and   b^T y(µ^0) > b^T y(µ).

iii) (x(µ), s(µ)) converges to an optimal solution pair for (LP) and (LD). Moreover, the limit point x(0)_{P*} is the analytic center on the primal optimal face, and the limit point s(0)_{Z*} is the analytic center on the dual optimal face, where (P*, Z*) is the strict complementarity partition of the index set {1, 2, ..., n}.

5.7 Consider a primal-dual interior point (x, y, s) ∈ N(η) where η < 1. Prove that there is a fixed quantity δ > 0 such that

    x_j ≥ δ,  ∀ j ∈ P*,

and

    s_j ≥ δ,  ∀ j ∈ Z*,

where (P*, Z*) is defined in Exercise 5.5.

5.8 (Potential level theorem) Define the potential level set

    Ψ(δ) := {(x, y, s) ∈ F̊ : ψ_{n+ρ}(x, s) ≤ δ}.

Prove

i)  Ψ(δ^1) ⊂ Ψ(δ^2) if δ^1 ≤ δ^2.

ii) For every δ, Ψ(δ) is bounded and its closure Ψ̂(δ) has non-empty intersection with the LP solution set.
5.9 Given 0 < x, s ∈ R^n, show that

    n log(x^T s) − Σ_{j=1}^n log(x_j s_j) ≥ n log n

and

    x^T s ≤ exp( (ψ_{n+ρ}(x, s) − n log n) / ρ ).

5.10 (Logarithmic approximation) If d ∈ R^n such that |d|_∞ < 1, then

    e^T d ≥ Σ_{i=1}^n log(1 + d_i) ≥ e^T d − |d|^2 / (2(1 − |d|_∞)).

5.11 Let the direction (d_x, d_y, d_s) be generated by system (5.13) with γ = n/(n + ρ) and µ = x^T s/n, and let the step size be

    α = θ √(min(Xs)) / ‖ (XS)^{−1/2} ( (x^T s/(n + ρ)) 1 − Xs ) ‖,          (5.17)

where θ is a positive constant less than 1. Let

    x^+ = x + α d_x,   y^+ = y + α d_y,   and   s^+ = s + α d_s.

Then, using Exercise 5.10 and the concavity of the logarithmic function, show that (x^+, y^+, s^+) ∈ F̊ and

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s)
        ≤ −θ √(min(Xs)) ‖ (XS)^{−1/2} ( 1 − ((n + ρ)/(x^T s)) Xs ) ‖ + θ^2/(2(1 − θ)).

5.12 Let v = Xs in Exercise 5.11. Then, prove

    √(min(v)) ‖ V^{−1/2} ( 1 − ((n + ρ)/(1^T v)) v ) ‖ ≥ √(3/4),

where V is the diagonal matrix of v. Thus, the two exercises imply

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −θ √(3/4) + θ^2/(2(1 − θ)) = −δ

for a constant δ. One can verify that δ > 0.2 when θ = 0.4.
Chapter 12

Penalty and Barrier


Methods

The interior-point algorithms discussed for linear programming in Chap-


ter 5 are, as we mentioned then, closely related to the barrier methods
presented in this chapter. They can be naturally extended to solving non-
linear programming problems while maintain both theoretical and practical
efficiency. Below we present two important such problems.

12.9 Convex quadratic programming


Quadratic programming (QP), especially convex quadratic programming, plays an important role in optimization. In one sense it is a continuous optimization problem and a fundamental subroutine frequently employed in general nonlinear programming, but it is also considered a challenging combinatorial problem, because its optimality conditions involve linear equalities and inequalities.
The barrier and interior-point LP algorithms discussed in Chapter 5 can
be extended to solve convex quadratic programming (QP). We present one
extension, the primal-dual potential reduction algorithm, in this section.
Solving a convex QP problem in the form

    minimize    (1/2) x^T Q x + c^T x
    subject to  Ax = b,                                                     (12.1)
                x ≥ 0,

where the given matrix Q ∈ R^{n×n} is positive semidefinite, A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m, reduces to finding x ∈ R^n, y ∈ R^m and s ∈ R^n that satisfy the
following optimality conditions:

    x^T s = 0,
    M (x; y) − (s; 0) = q,                                                  (12.2)
    x, s ≥ 0,

where

    M = [ Q  −A^T ]  ∈ R^{(n+m)×(n+m)}     and     q = [ −c ]  ∈ R^{n+m}.
        [ A    0  ]                                    [  b ]
The potential function for solving this problem has the same form as (5.14) over the interior of the feasible region, F̊, of (12.2). Once we have an interior feasible point (x, y, s) with µ = x^T s/n, we can generate a new iterate (x^+, y^+, s^+) by solving for (d_x, d_y, d_s) from the system of linear equations:

    S d_x + X d_s = γµ 1 − Xs,
    M (d_x; d_y) − (d_s; 0) = 0,                                            (12.3)
where X and S are two diagonal matrices whose diagonal entries are x > 0
and s > 0, respectively.
We again choose γ = n/(n + ρ) < 1. Then, assign x^+ = x + ᾱ d_x, y^+ = y + ᾱ d_y, and s^+ = s + ᾱ d_s, where

    ᾱ = arg min_{α ≥ 0} ψ_{n+ρ}(x + α d_x, s + α d_s).

Since Q is positive semidefinite, we have

    d_x^T d_s = (d_x; d_y)^T (d_s; 0) = (d_x; d_y)^T M (d_x; d_y) = d_x^T Q d_x ≥ 0

(where we recall that d_x^T d_s = 0 in the LP case).
Similar to the case of LP, one can prove that for ρ ≥ √n,

    ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −θ √(3/4) + θ^2/(2(1 − θ)) = −δ

for a constant δ > 0.2. This result will provide an iteration complexity bound that is identical to that of linear programming; see Chapter 5.
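
To illustrate, the following is a minimal sketch of one such iteration in NumPy (an illustration under the assumptions that A has full row rank and that a strictly feasible starting point is available; it is not the textbook's implementation, and the crude grid line search merely stands in for the exact minimization of ψ over α ≥ 0):

import numpy as np

def qp_potential_step(Q, A, x, y, s, rho=None):
    """One primal-dual potential-reduction step for the convex QP (12.1),
    given a strictly feasible point (x, y, s) of (12.2)."""
    n, m = Q.shape[0], A.shape[0]
    rho = np.sqrt(n) if rho is None else rho
    mu = x @ s / n
    gamma = n / (n + rho)

    # Newton-like system (12.3), unknowns ordered (dx, dy, ds):
    #   S dx          + X ds = gamma*mu*1 - Xs
    #   Q dx - A^T dy -   ds = 0
    #   A dx                 = 0
    K = np.block([
        [np.diag(s), np.zeros((n, m)), np.diag(x)      ],
        [Q,          -A.T,             -np.eye(n)      ],
        [A,          np.zeros((m, m)), np.zeros((m, n))]])
    rhs = np.concatenate([gamma * mu - x * s, np.zeros(n), np.zeros(m)])
    d = np.linalg.solve(K, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]

    # Crude grid search standing in for argmin over alpha >= 0 of the
    # primal-dual potential function (up to an additive constant).
    def psi(u, v):
        return (n + rho) * np.log(u @ v) - np.sum(np.log(u * v))
    best_val, best_alpha = psi(x, s), 0.0
    for alpha in np.linspace(0.01, 1.0, 100):
        xx, ss = x + alpha * dx, s + alpha * ds
        if xx.min() <= 0 or ss.min() <= 0:
            break
        if psi(xx, ss) < best_val:
            best_val, best_alpha = psi(xx, ss), alpha
    return x + best_alpha * dx, y + best_alpha * dy, s + best_alpha * ds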

12.10 Semidefinite programming


Semidefinite programming, hereafter SDP, is a natural extension of linear programming. In LP the variables form a vector that is required to be component-wise nonnegative, whereas in SDP they are the components of a symmetric matrix, which is constrained to be positive semidefinite. Both may have linear equality constraints as well. Although SDP is a convex optimization model, it was long unclear how to solve it efficiently. One of the major results in optimization during the past decade is the discovery that the interior-point algorithms for LP, discussed in Chapter 5, can be effectively adapted to solve SDP with both theoretical and practical efficiency. During the same period, the applicable domain of SDP has dramatically widened to combinatorial optimization, statistical computation, robust optimization, Euclidean distance geometry, quantum computing, etc., and SDP has established widespread popularity.
Let C and Ai , i = 1, 2, ..., m, be given n-dimensional symmetric ma-
trices and b ∈ Rm , and let X be an unknown n-dimensional symmetric
matrix. Then, the SDP model can be written as

    (SDP)    minimize    C • X
             subject to  A_i • X = b_i,  i = 1, 2, ..., m,    X ⪰ 0,        (12.4)

where the • operation is the matrix inner product

    A • B ≡ trace(A^T B) = Σ_{i,j} a_{ij} b_{ij}.

The notation X ⪰ 0 means that X is positive semidefinite, and X ≻ 0 means that X is positive definite. If a matrix X ≻ 0 satisfies all equalities in (SDP), it is called a (primal) strictly feasible or interior feasible solution.
Note that, in SDP, we minimize a linear function of a symmetric matrix constrained to lie in the positive semidefinite matrix cone, subject to linear equality constraints. This is in contrast to the nonnegative orthant cone required for LP.
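
For example, the • operation is one line in NumPy (a trivial sketch):

import numpy as np

def inner(A, B):
    """Matrix inner product A • B = trace(A^T B)."""
    return float(np.trace(A.T @ B))   # equivalently, (A * B).sum()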

Example 12.1 (Binary quadratic optimization) Consider the binary quadratic optimization problem

    minimize    x^T Q x
    subject to  x_j ∈ {1, −1},  ∀ j,

which is a difficult nonconvex optimization problem. The problem can be written as

    minimize    Q • xx^T
    subject to  I_j • xx^T = 1,  ∀ j,

where I_j is the all-zero matrix except for a 1 at the jth diagonal entry.
An SDP relaxation of the problem is

    minimize    Q • X
    subject to  I_j • X = 1,  ∀ j,
                X ⪰ 0.

It has been shown that an optimal SDP solution constitutes a good approximation to the original problem.
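
For illustration, this relaxation can be written down directly with an off-the-shelf modeling tool (a sketch only, assuming the third-party CVXPY package and an installed SDP solver; it is not part of the text):

import cvxpy as cp

def binary_quadratic_sdp(Q):
    """SDP relaxation of  min x^T Q x  s.t.  x_j in {1, -1}."""
    n = Q.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, cp.diag(X) == 1]   # X is PSD and X_jj = 1
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), constraints)
    prob.solve()
    return X.value, prob.value

A rank-one rounding of the optimal X (for instance, taking signs of a random vector drawn with covariance X) then produces an approximate binary solution, in the spirit of the Goemans and Williamson analysis cited in the Notes.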

Example 12.2 (Sensor localization) Suppose we have n unknown points (sensors) x_j ∈ R^d, j = 1, ..., n. For each pair (i, j) in a given set N_e we have a Euclidean distance measure d̂_{ij} between x_i and x_j. Then, the localization problem is to find x_j, j = 1, ..., n, such that

    |x_i − x_j|^2 = (d̂_{ij})^2,  ∀ (i, j) ∈ N_e,

subject to possible rotation and translation.
Let X = [x_1 x_2 ... x_n] be the d × n matrix that needs to be determined. Then

    |x_i − x_j|^2 = (e_i − e_j)^T X^T X (e_i − e_j),

where e_i ∈ R^n is the vector with 1 at the ith position and zero everywhere else. Let Y = X^T X. Then an SDP relaxation of the localization problem is to find Y such that

    (e_i − e_j)(e_i − e_j)^T • Y = (d̂_{ij})^2,  ∀ (i, j) ∈ N_e,
    Y ⪰ 0.
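
A small sketch of how the constraint data for this relaxation can be assembled in NumPy (an illustration with a hypothetical edge-list input format, not code from the text):

import numpy as np

def localization_constraints(n, edges):
    """Build A_k and b_k with A_k • Y = b_k for the relaxation above.

    `edges` is assumed to be a list of (i, j, dij) triples, where dij is
    the measured distance between sensors i and j (0-based indices)."""
    A_list, b_list = [], []
    for i, j, dij in edges:
        u = np.zeros(n)
        u[i], u[j] = 1.0, -1.0            # u = e_i - e_j
        A_list.append(np.outer(u, u))     # (e_i - e_j)(e_i - e_j)^T
        b_list.append(dij ** 2)
    return A_list, np.array(b_list)

Each constraint matrix here has rank one, a structure that practical SDP solvers exploit.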

Example 12.3 (Fast Markov chain design) This problem is to design a symmetric stochastic probability transition matrix P such that p_ij = p_ji = 0 for all (i, j) in a given non-edge set N_o and the spectral radius of the matrix P − ee^T/n is minimized.
Let I_ij ∈ R^{n×n} be the matrix with 1 at the (i, j)th and (j, i)th positions and zero everywhere else. Then, the problem can be framed as an SDP problem

    minimize    λ
    subject to  I_ij • P = 0,  ∀ (i, j) ∈ N_o,
                I_ij • P ≥ 0,  ∀ (i, j) ∉ N_o,
                e_i e^T • P = 1,  ∀ i = 1, ..., n,
                λ I ⪰ P − (1/n) ee^T ⪰ −λ I.
Duality theorems for SDP


The dual of (SDP) is

    (SDD)    maximize    y^T b
             subject to  Σ_{i=1}^m y_i A_i + S = C,    S ⪰ 0.               (12.5)
Example 12.4 Given symmetric matrices A_i, i = 0, ..., m, consider designing y ∈ R^m to maximize the lowest eigenvalue of A_0 + Σ_{i=1}^m y_i A_i. The problem can be posed as an SDP problem:

    maximize    λ
    subject to  λ I − Σ_{i=1}^m y_i A_i + S = A_0,
                S ⪰ 0.
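
This eigenvalue problem can be transcribed almost verbatim into CVXPY (again a sketch under the assumption that CVXPY and an SDP solver are available):

import numpy as np
import cvxpy as cp

def max_lowest_eigenvalue(A0, A_list):
    """Maximize lambda_min(A0 + sum_i y_i A_i) over y, via the SDP above."""
    n, m = A0.shape[0], len(A_list)
    y = cp.Variable(m)
    lam = cp.Variable()
    S = cp.Variable((n, n), symmetric=True)          # slack matrix
    lhs = lam * np.eye(n) - sum(y[i] * A_list[i] for i in range(m)) + S
    prob = cp.Problem(cp.Maximize(lam), [lhs == A0, S >> 0])
    prob.solve()
    return lam.value, y.value
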
The weak duality theorem for SDP is identical to that of (LP) and (LD).
Lemma 12.1 (Weak Duality Lemma in SDP) Let the feasible regions Fp of (SDP) and Fd of (SDD) both be non-empty. Then,

    C • X ≥ b^T y   where X ∈ Fp, (y, S) ∈ Fd.
But we need more to make the strong duality theorem hold.
Theorem 12.2 (Strong Duality Theorem in SDP) Suppose Fp and Fd are
non-empty and at least one of them has an interior. Then, X is optimal
for (SDP) if and only if the following conditions hold:
i) X ∈ Fp ;
ii) there is (y, S) ∈ Fd ;
iii) C • X = b^T y, or equivalently X • S = 0.
If the non-empty interior condition of Theorem 12.2 does not hold, then
the duality gap may not be zero at optimality.
Example 12.5 The following SDP has a duality gap:

    C  = [ 0  1  0 ]        A_1 = [ 0  0  0 ]        A_2 = [  0  −1  0 ]
         [ 1  0  0 ],             [ 0  1  0 ],              [ −1   0  0 ],
         [ 0  0  0 ]              [ 0  0  0 ]               [  0   0  2 ]

and b = (0; 10). The primal minimal objective value is 0 and the dual maximal objective value is −10, so that the duality gap is 10.
Interior-point algorithms for SDP


Let (SDP) and (SDD) both have interior feasible solutions. Then, the central path can be expressed as

    C = { (X, y, S) ∈ F̊ : XS = µI, 0 < µ < ∞ }.

The primal-dual potential function for SDP is

    ψ_{n+ρ}(X, S) = (n + ρ) log(X • S) − log(det(X) · det(S)),

where ρ ≥ 0. Note that if X and S are diagonal matrices, these definitions reduce to those for LP.
Once we have an interior feasible point (X, y, S), we can generate a new iterate (X^+, y^+, S^+) by solving for (D_x, d_y, D_s) from the system of linear equations

    D^{−1} D_x D^{−1} + D_s = γµ X^{−1} − S,
    A_i • D_x = 0,  ∀ i,                                                    (12.6)
    − Σ_{i=1}^m (d_y)_i A_i − D_s = 0,

where the (scaling) matrix is

    D = X^{1/2} (X^{1/2} S X^{1/2})^{−1/2} X^{1/2}

and µ = X • S/n. Then assign X^+ = X + ᾱ D_x, y^+ = y + ᾱ d_y, and S^+ = S + ᾱ D_s, where

    ᾱ = arg min_{α ≥ 0} ψ_{n+ρ}(X + α D_x, S + α D_s).

Furthermore, we can show that

    ψ_{n+ρ}(X^+, S^+) − ψ_{n+ρ}(X, S) ≤ −δ

for a constant δ > 0.2. This provides an iteration complexity bound identical to that of linear programming; see Chapter 5.
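
The scaling matrix D above can be formed numerically with two symmetric eigendecompositions (a minimal sketch using SciPy; a production SDP solver would organize this computation more carefully):

import numpy as np
from scipy.linalg import eigh

def sym_power(M, p):
    """M^p for a symmetric positive definite matrix M."""
    w, V = eigh(M)
    return (V * w**p) @ V.T

def scaling_matrix(X, S):
    """D = X^{1/2} (X^{1/2} S X^{1/2})^{-1/2} X^{1/2}; it satisfies D S D = X."""
    Xh = sym_power(X, 0.5)
    return Xh @ sym_power(Xh @ S @ Xh, -0.5) @ Xh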

12.11 Notes
Many researchers have applied interior-point algorithms to convex QP prob-
lems. These algorithms can be divided into three groups: the primal scaling
algorithm, the dual scaling algorithm, and the primal-dual scaling algo-
rithm. Relations among these algorithms can be found in Anstreicher, den Hertog, Roos and Terlaky [4] and in den Hertog [31].
There have been several remarkable applications of SDP; see, e.g., Goemans and Williamson [24], Boyd et al. [13], and Vandenberghe and Boyd [60].
The SDP example with a duality gap was constructed by Freund.
The primal potential reduction algorithm for positive semidefinite programming is due to Alizadeh [1, 2] and to Nesterov and Nemirovskii [47].
The primal-dual SDP algorithm described here is due to Nesterov and Todd
[48].

12.12 Exercises
12.1 (Farkas’ lemma in SDP) Let A_i, i = 1, ..., m, have rank m (i.e., Σ_{i=1}^m y_i A_i = 0 implies y = 0). Then, there exists a symmetric matrix X ≻ 0 with

    A_i • X = b_i,  i = 1, ..., m,

if and only if Σ_{i=1}^m y_i A_i ⪯ 0 and Σ_{i=1}^m y_i A_i ≠ 0 imply b^T y < 0.
12.2 Let X and S both be positive definite. Then prove

    n log(X • S) − log(det(X) · det(S)) ≥ n log n.
12.3 Consider SDP and the potential level set

    Ψ(δ) := {(X, y, S) ∈ F̊ : ψ_{n+ρ}(X, S) ≤ δ}.

Prove that

    Ψ(δ^1) ⊂ Ψ(δ^2)  if δ^1 ≤ δ^2,

and that for every δ, Ψ(δ) is bounded and its closure Ψ̂(δ) has non-empty intersection with the SDP solution set.
12.4 Let both (SDP) and (SDD) have interior feasible points. Then for
any 0 < µ < ∞, the central path point (X(µ), y(µ), S(µ)) exists and is
unique. Moreover,
i) the central path point (X(µ), S(µ)) is bounded where 0 < µ ≤ µ^0, for any given 0 < µ^0 < ∞.

ii) For 0 < µ^0 < µ,

    C • X(µ^0) < C • X(µ)   and   b^T y(µ^0) > b^T y(µ)

if X(µ) ≠ X(µ^0) and y(µ) ≠ y(µ^0).
iii) (X(µ), S(µ)) converges to an optimal solution pair for (SDP) and (SDD),
and the rank of the limit of X(µ) is maximal among all optimal solu-
tions of (SDP) and the rank of the limit S(µ) is maximal among all
optimal solutions of (SDD).
Bibliography

[1] F. Alizadeh. Combinatorial optimization with interior point methods


and semi–definite matrices. Ph.D. Thesis, University of Minnesota,
Minneapolis, MN, 1991.
[2] F. Alizadeh. Optimization over the positive semi-definite cone:
Interior–point methods and combinatorial applications. In P. M.
Pardalos, editor, Advances in Optimization and Parallel Computing,
pages 1–25. North Holland, Amsterdam, The Netherlands, 1992.
[3] E. D. Andersen and Y. Ye. On a homogeneous algorithm for the
monotone complementarity problem. Math. Programming, 84:375-
400, 1999.
[4] K. M. Anstreicher, D. den Hertog, C. Roos, and T. Terlaky. A long
step barrier method for convex quadratic programming. Algorith-
mica, 10:365–382, 1993.
[5] M. S. Bazaraa, J. J. Jarvis, and H. F. Sherali. Linear Program-
ming and Network Flows, chapter 8.4 : Karmarkar’s projective al-
gorithm, pages 380–394, chapter 8.5: Analysis of Karmarkar’s al-
gorithm, pages 394–418. John Wiley & Sons, New York, second
edition, 1990.
[6] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of lin-
ear programming, Part I: Affine and projective scaling trajectories.
Transactions of the American Mathematical Society, 314(2):499–526,
1989.
[7] D. A. Bayer and J. C. Lagarias. The nonlinear geometry of linear
programming, Part II: Legendre transform coordinates. Transac-
tions of the American Mathematical Society, 314(2):527–581, 1989.
[8] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Bel-
mont, MA, 1995.

[9] D.M. Bertsimas and J.N. Tsitsiklis. Linear Optimization. Athena


Scientific, Belmont, Massachusetts, 1997.

[10] R. E. Bixby. Progress in linear programming. ORSA J. on Comput.,


6(1):15–22, 1994.

[11] R. G. Bland, D. Goldfarb and M. J. Todd. The ellipsoidal method:


a survey. Operations Research, 29:1039–1091, 1981.

[12] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and Real


Computation (Springer-Verlag, 1996).

[13] S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan. Linear Ma-


trix Inequalities in System and Control Science. SIAM Publications.
SIAM, Philadelphia, 1994.

[14] S. A. Cook. The complexity of theorem-proving procedures. Proc.


3rd ACM Symposium on the Theory of Computing, 151–158, 1971.

[15] R. W. Cottle. Linear Programming. Lecture Notes for MS&E 310,


Stanford University, Stanford, 2002.

[16] R. Cottle, J. S. Pang, and R. E. Stone. The Linear Complementar-


ity Problem, chapter 5.9 : Interior–point methods, pages 461–475.
Academic Press, Boston, 1992.

[17] G. B. Dantzig. Linear Programming and Extensions. Princeton


University Press, Princeton, New Jersey, 1963.

[18] G.B. Dantzig and M.N. Thapa. Linear Programming 1: Introduc-


tion. Springer-Verlag, New York, 1997.

[19] G.B. Dantzig and M.N. Thapa. Linear Programming 2: Theory and
Extensions. Springer-Verlag, New York, 2003.

[20] S. C. Fang and S. Puthenpura. Linear Optimization and Extensions.


Prentice–Hall, Englewood Cliffs, NJ, 1994.

[21] A. V. Fiacco and G. P. McCormick. Nonlinear Programming : Se-


quential Unconstrained Minimization Techniques. John Wiley &
Sons, New York, 1968. Reprint : Volume 4 of SIAM Classics in
Applied Mathematics, SIAM Publications, Philadelphia, PA, 1990.

[22] R. M. Freund. Polynomial–time algorithms for linear programming


based only on primal scaling and projected gradients of a potential
function. Math. Programming, 51:203–222, 1991.
[23] K. R. Frisch. The logarithmic potential method for convex program-


ming. Unpublished manuscript, Institute of Economics, University
of Oslo, Oslo, Norway, 1955.

[24] M. X. Goemans and D. P. Williamson. Improved approximation al-


gorithms for maximum cut and satisfiability problems using semidef-
inite programming. Journal of Assoc. Comput. Mach., 42:1115–
1145, 1995.

[25] D. Goldfarb and M. J. Todd. Linear Programming. In G. L.


Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, editors, Opti-
mization, volume 1 of Handbooks in Operations Research and Man-
agement Science, pages 141–170. North Holland, Amsterdam, The
Netherlands, 1989.

[26] D. Goldfarb and D. Xiao. A primal projective interior point method


for linear programming. Math. Programming, 51:17–43, 1991.

[27] A. J. Goldman and A. W. Tucker. Polyhedral convex cones. In


H. W. Kuhn and A. W. Tucker, Editors, Linear Inequalities and
Related Systems, page 19–40, Princeton University Press, Princeton,
NJ, 1956.

[28] C. C. Gonzaga. An algorithm for solving linear programming prob-


lems in O(n3 L) operations. In N. Megiddo, editor, Progress in
Mathematical Programming : Interior Point and Related Methods,
Springer Verlag, New York, 1989, pages 1–28.

[29] C. C. Gonzaga and M. J. Todd. An O(√nL)–iteration large–step
primal–dual affine algorithm for linear programming. SIAM J. Op-
timization, 2:349–359, 1992.

[30] J. Hartmanis and R. E. Stearns, On the computational complexity


of algorithms. Trans. A.M.S, 117:285–306, 1965.

[31] D. den Hertog. Interior Point Approach to Linear, Quadratic


and Convex Programming, Algorithms and Complexity. Ph.D.
Thesis, Faculty of Mathematics and Informatics, TU Delft, NL–
2628 BL Delft, The Netherlands, 1992.

[32] P. Huard. Resolution of mathematical programming with nonlinear


constraints by the method of centers. In J. Abadie, editor, Nonlin-
ear Programming, pages 207–219. North Holland, Amsterdam, The
Netherlands, 1967.
[33] N. K. Karmarkar. A new polynomial–time algorithm for linear pro-


gramming. Combinatorica, 4:373–395, 1984.
[34] L. G. Khachiyan. A polynomial algorithm for linear programming.
Doklady Akad. Nauk USSR, 244:1093–1096, 1979. Translated in So-
viet Math. Doklady, 20:191–194, 1979.
[35] V. Klee and G. J. Minty. How good is the simplex method. In
O. Shisha, editor, Inequalities III, Academic Press, New York, NY,
1972.
[36] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial–time algo-
rithm for a class of linear complementarity problems. Math. Pro-
gramming, 44:1–26, 1989.

[37] M. Kojima, S. Mizuno, and A. Yoshise. An O(√nL) iteration poten-
tial reduction algorithm for linear complementarity problems. Math.
Programming, 50:331–342, 1991.
[38] Z. Q. Luo, J. Sturm and S. Zhang. Conic Convex Programming And
Self-Dual Embedding. Optimization Methods and Software, 14:169–
218, 2000.
[39] I. J. Lustig, R. E. Marsten, and D. F. Shanno. On implementing
Mehrotra’s predictor–corrector interior point method for linear pro-
gramming. SIAM J. Optimization, 2:435–449, 1992.
[40] L. McLinden. The analogue of Moreau’s proximation theorem, with
applications to the nonlinear complementarity problem. Pacific
Journal of Mathematics, 88:101–161, 1980.
[41] N. Megiddo. Pathways to the optimal set in linear programming.
In N. Megiddo, editor, Progress in Mathematical Programming : In-
terior Point and Related Methods, pages 131–158. Springer Verlag,
New York, 1989.
[42] S. Mehrotra. On the implementation of a primal–dual interior point
method. SIAM J. Optimization, 2(4):575–601, 1992.
[43] S. Mizuno, M. J. Todd, and Y. Ye. On adaptive step primal–dual
interior–point algorithms for linear programming. Mathematics of
Operations Research, 18:964–981, 1993.
[44] R. D. C. Monteiro and I. Adler. Interior path following primal–dual
algorithms : Part I : Linear programming. Math. Programming,
44:27–41, 1989.
[45] K. G. Murty. Linear Complementarity, Linear and Nonlinear


Programming, volume 3 of Sigma Series in Applied Mathematics,
chapter 11.4.1 : The Karmarkar’s algorithm for linear program-
ming, pages 469–494. Heldermann Verlag, Nassauische Str. 26, D–
1000 Berlin 31, Germany, 1988.
[46] S. G. Nash and A. Sofer. Linear and Nonlinear Programming. New
York, McGraw-Hill Companies, Inc., 1996.
[47] Yu. E. Nesterov and A. S. Nemirovskii. Interior Point Polynomial
Methods in Convex Programming: Theory and Algorithms. SIAM
Publications. SIAM, Philadelphia, 1993.
[48] Yu. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-
point methods for convex programming. Mathematics of Operations
Research, 22(1):1–42, 1997.
[49] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization:
Algorithms and Complexity. Prentice–Hall, Englewood Cliffs, NJ,
1982.
[50] J. Renegar. A polynomial–time algorithm, based on Newton’s
method, for linear programming. Math. Programming, 40:59–93,
1988.
[51] C. Roos, T. Terlaky and J.-Ph. Vial. Theory and Algorithms for
Linear Optimization: An Interior Point Approach. John Wiley &
Sons, Chichester, 1997.

[52] R. Saigal. Linear Programming: Modern Integrated Analysis. Kluwer


Academic Publisher, Boston, 1995.
[53] G. Sonnevend. An “analytic center” for polyhedrons and new classes
of global algorithms for linear (smooth, convex) programming. In
A. Prekopa, J. Szelezsan, and B. Strazicky, editors, System Mod-
elling and Optimization : Proceedings of the 12th IFIP–Conference
held in Budapest, Hungary, September 1985, volume 84 of Lecture
Notes in Control and Information Sciences, pages 866–876. Springer
Verlag, Berlin, Germany, 1986.
[54] K. Tanabe. Complementarity–enforced centered Newton method
for mathematical programming. In K. Tone, editor, New Methods
for Linear Programming, pages 118–144, The Institute of Statistical
Mathematics, 4–6–7 Minami Azabu, Minatoku, Tokyo 106, Japan,
1987.
[55] M. J. Todd. A low complexity interior point algorithm for linear


programming. SIAM J. Optimization, 2:198–209, 1992.

[56] M. J. Todd and Y. Ye. A centered projective algorithm for linear pro-
gramming. Mathematics of Operations Research, 15:508–529, 1990.
[57] L. Tunccel. Constant potential primal–dual algorithms: A frame-
work. Math. Programming, 66:145–159, 1994.
[58] R. Tutuncu. An infeasible-interior-point potential-reduction algo-
rithm for linear programming. Ph.D. Thesis, School of Operations
Research and Industrial Engineering, Cornell University, Ithaca,
NY, 1995.
[59] P. M. Vaidya. An algorithm for linear programming which requires
O((m + n)n^2 + (m + n)^{1.5} nL) arithmetic operations. Math. Pro-
gramming, 47:175–201, 1990. Condensed version in : Proceedings of
the 19th Annual ACM Symposium on Theory of Computing, 1987,
pages 29–38.
[60] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM
Review, 38(1):49–95, 1996.

[61] R. J. Vanderbei. Linear Programming: Foundations and Extensions.


Kluwer Academic Publishers, Boston, 1997.

[62] S. A. Vavasis. Nonlinear Optimization: Complexity Issues. Oxford


Science, New York, NY, 1991.

[63] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadel-


phia, 1996.

[64] Y. Ye. An O(n^3 L) potential reduction algorithm for linear program-


ming. Math. Programming, 50:239–258, 1991.

[65] Y. Ye, M. J. Todd and S. Mizuno. An O(√nL)–iteration homoge-
neous and self–dual linear programming algorithm. Mathematics of
Operations Research, 19:53–67, 1994.
[66] Y. Zhang and D. Zhang. On polynomiality of the Mehrotra-type
predictor-corrector interior-point algorithms. Math. Programming,
68:303–317, 1995.
Index

analytic center, 148
barrier function, 148
basis identification, 145
big M method, 145, 147
central path, 139
combined Phase I and II, 147
combined primal and dual, 147
complementarity partition, 145
complexity, 118
dual potential function, 130
ellipsoid, 125, 126
ellipsoid method, 118
exponential time, 117
Farkas’ lemma in SDP, 159
feasible region, 117
feasible solution
  strictly or interior, 155
homogeneous and self-dual, 147
infeasibility certificate, 148
initialization, 144, 146
interior-point algorithm, 118
level set, 135
linear programming, 117, 154
  degeneracy, 145
  infeasible, 146
  nondegeneracy, 145
  optimal basis, 145
  optimal face, 145
  unbounded, 146
matrix inner product, 155
maximal-rank complementarity solution, 159
nondegeneracy, 145
normal matrix, 144
optimal face, 149
optimal solution, 117
Phase I, 147
Phase II, 147
polynomial time, 118
positive definite, 155
positive semi-definite, 155
potential function, 130
potential reduction algorithm, 141, 154
⪯, 159
primal-dual algorithm, 154
primal-dual potential function, 141, 148, 158
ψ_{n+ρ}(X, S), 158
ψ_{n+ρ}(x, s), 141
purification, 145
random perturbation, 145
rank-one update, 144
real number model, 144
set
  Ω, 125
simplex method, 117, 144
simultaneously feasible and optimal, 148
slack variable, 130
step-size, 141
strict complementarity partition, 150
strong duality theorem in SDP, 157
≻, 159
termination, 144, 145
trace, 155
vertex, 117
weak duality theorem in SDP, 157
