On the Implementation of an Interior-point Filter Line-search Algorithm for Large-scale Nonlinear Programming

Andreas Wächter: IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY, 10598, USA. e-mail: [email protected]
Lorenz T. Biegler: Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA. e-mail: [email protected]
Abstract. We present a primal-dual interior-point algorithm with a filter line-search method for nonlinear
programming. Local and global convergence properties of this method were analyzed in previous work. Here
we provide a comprehensive description of the algorithm, including the feasibility restoration phase for the fil-
ter method, second-order corrections, and inertia correction of the KKT matrix. Heuristics are also considered
that allow faster performance. This method has been implemented in the IPOPT code, which we demonstrate
in a detailed numerical study based on 954 problems from the CUTEr test set. An evaluation is made of several
line-search options, and a comparison is provided with two state-of-the-art interior-point codes for nonlinear
programming.
Key words. Nonlinear programming – Nonconvex constrained optimization – Filter method – Line search –
Interior-point method – Barrier method
1. Introduction
Growing interest in efficient optimization methods has led to the development of
interior-point or barrier methods for large-scale nonlinear programming. In particu-
lar, these methods provide an attractive alternative to active set strategies in handling
problems with large numbers of inequality constraints. Over the past 15 years, there has
also been a better understanding of the convergence properties of interior-point meth-
ods [16] and efficient algorithms have been developed with desirable global and local
convergence properties.
To allow convergence from poor starting points, interior-point methods, in both
trust region and line-search frameworks, have been developed that use exact penalty
merit functions to enforce progress toward the solution [2, 21, 29]. On the other hand,
Fletcher and Leyffer [14] recently proposed filter methods, offering an alternative to
merit functions, as a tool to guarantee global convergence in algorithms for nonlinear
programming. The underlying concept is that trial points are accepted if they improve
the objective function or improve the constraint violation instead of a combination of
those two measures defined by a merit function.
More recently, this filter approach has been adapted to barrier methods in a number
of ways. M. Ulbrich, S. Ulbrich, and Vicente [22] consider a trust region filter method
that bases the acceptance of trial steps on the norm of the optimality conditions. Also,
Benson, Shanno, and Vanderbei [1] proposed several heuristics based on the idea of
filter methods, for which improved efficiency is reported compared to their previous
merit function approach, although no convergence analysis is given. Finally, global con-
vergence of an interior-point algorithm with a filter line search is analyzed in [26]. The
assumptions made for the analysis of the interior-point method in [26] are less restrictive
than those made for previously proposed line-search interior-point methods for nonlinear
programming (e.g., [10, 29]).
A number of interior-point methods have been implemented in robust software codes
(such as [3, 23]), and numerical tests have shown them to be efficient and robust in
practice. In this paper we describe the detailed development of a primal-dual interior-
point algorithm with a filter line-search, based on the analysis in [26]. We consider a
primal-dual barrier method to solve nonlinear optimization problems of the form
$$\min_{x \in \mathbb{R}^n} \ f(x) \qquad (1a)$$
$$\text{s.t.}\quad c(x) = 0 \qquad (1b)$$
$$\qquad\quad x_L \le x \le x_U, \qquad (1c)$$
where xL ∈ [−∞, ∞)n and xU ∈ (−∞, ∞]n , with xL ≤ xU , are the lower and
upper bounds on the variables x. The objective function f : Rn −→ R and the equality
constraints c : Rn −→ Rm , with m ≤ n, are assumed to be twice continuously differ-
entiable. Problems with general nonlinear inequality constraints, “d(x) ≤ 0,” can be
reformulated in the above form by introducing slack variables.
The paper is organized as follows. Section 2 presents the overall algorithm, including
the step computation, the filter line-search procedure, and a second-order correction. In
Section 3, we describe some aspects of the algorithm, and its implementation, in more
detail, including the restoration phase for the filter procedure, as well as several heu-
ristics to improve the performance of the overall method. Section 4 presents numerical
results of our implementation, called IPOPT, for the CUTEr test set [18], including a
comparison of the filter method with a penalty function approach, and a comparison with
two state-of-the-art nonlinear optimization codes, KNITRO [3, 28] and LOQO [23].
1.1. Notation
The i-th component of a vector v ∈ R^n is written as v^(i). Norms ‖·‖ denote a fixed vector
norm and its compatible matrix norm unless explicitly noted. We further introduce the
notation X := diag(x) for a vector x (similarly Z := diag(z), etc.), and e stands for the
vector of all ones of appropriate dimension.
Finally, we refer to ε_mach as the machine precision of the finite-precision arithmetic. For the
computer and implementation used for our numerical experiments, ε_mach ≈ 10^{-16}.
The algorithm presented in this paper has parameters, which have to be given values
for a practical implementation. Except where explicitly noted, these parameters do not
depend on the details of the finite arithmetic.
The following sections motivate the proposed algorithm, which is formally summarized
in Section 2.5.
To simplify notation, we first describe the method for the problem formulation
$$\min_{x \in \mathbb{R}^n} \ f(x) \qquad (2a)$$
$$\text{s.t.}\quad c(x) = 0 \qquad (2b)$$
$$\qquad\quad x \ge 0. \qquad (2c)$$
The changes necessary to handle the general case (1) will be briefly outlined in Sec-
tion 3.4.
As a barrier method, like the methods discussed in [2, 8, 11, 29], the proposed
algorithm computes (approximate) solutions for a sequence of barrier problems
$$\min_{x\in\mathbb{R}^n} \ \varphi_\mu(x) := f(x) - \mu \sum_{i=1}^{n} \ln\bigl(x^{(i)}\bigr) \qquad (3a)$$
$$\text{s.t.}\quad c(x) = 0 \qquad (3b)$$
with a barrier parameter µ > 0. Equivalently, (3) can be interpreted as applying a homotopy method to the primal-dual equations
$$\nabla f(x) + \nabla c(x)\lambda - z = 0 \qquad (4a)$$
$$c(x) = 0 \qquad (4b)$$
$$XZe - \mu e = 0, \qquad (4c)$$
with the homotopy parameter µ, which is driven to zero (see e.g., [4, 17]). Here, λ ∈ R^m
and z ∈ R^n correspond to the Lagrangian multipliers for the equality constraints (2b) and
the bound constraints (2c), respectively. Note that the equations (4) for µ = 0 together
with “x, z ≥ 0” are the Karush-Kuhn-Tucker (KKT) conditions for the original problem
(2). Those are the first order optimality conditions for (2) if constraint qualifications are
satisfied [7].
The presented method computes an approximate solution to the barrier problem (3)
for a fixed value of µ, then decreases the barrier parameter, and continues the solution
of the next barrier problem from the approximate solution of the previous one.
Using the individual parts of the primal-dual equations (4), we define the optimality
error for the barrier problem as
$$E_\mu(x,\lambda,z) := \max\left\{ \frac{\|\nabla f(x) + \nabla c(x)\lambda - z\|_\infty}{s_d},\ \|c(x)\|_\infty,\ \frac{\|XZe - \mu e\|_\infty}{s_c} \right\}, \qquad (5)$$
with scaling parameters s_d, s_c ≥ 1. The algorithm accepts an approximate solution of the barrier problem for the current barrier parameter µ_j as soon as
$$E_{\mu_j}(x, \lambda, z) \le \kappa_\epsilon\,\mu_j \qquad (6)$$
holds, for a constant κ_ε > 0, before the algorithm continues with the solution of the next barrier
problem. The new barrier parameter is obtained from
$$\mu_{j+1} = \max\left\{ \frac{\epsilon_{\text{tol}}}{10},\ \min\bigl\{\kappa_\mu\,\mu_j,\ \mu_j^{\theta_\mu}\bigr\} \right\}, \qquad (7)$$
with constants κµ ∈ (0, 1) and θµ ∈ (1, 2). In this way, the barrier parameter is even-
tually decreased at a superlinear rate. On the other hand, the update rule (7) does not
allow µ to become smaller than necessary given the desired tolerance ε_tol, thus avoiding
numerical difficulties at the end of the optimization procedure.
For later reference, we also choose a “fraction-to-the-boundary” parameter
τj = max{τmin , 1 − µj } (8)
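For illustration, the update rules (7) and (8) translate directly into the following Python sketch; the parameter names mirror κ_µ, θ_µ, τ_min and ε_tol, and the default values are only those quoted later in Section 2.5.

```python
def update_barrier_parameter(mu_j, eps_tol, kappa_mu=0.2, theta_mu=1.5):
    """Barrier parameter update (7): superlinear decrease, bounded below by eps_tol/10."""
    return max(eps_tol / 10.0, min(kappa_mu * mu_j, mu_j ** theta_mu))

def fraction_to_boundary_parameter(mu_j, tau_min=0.99):
    """Fraction-to-the-boundary parameter (8)."""
    return max(tau_min, 1.0 - mu_j)

# Example: the sequence of barrier parameters starting from mu_0 = 0.1
mu, eps_tol = 0.1, 1e-8
while mu > eps_tol / 10.0:
    mu = update_barrier_parameter(mu, eps_tol)
```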
In order to solve the barrier problem (3) for a given fixed value µj of the barrier param-
eter, a damped Newton’s method is applied to the primal-dual equations (4). Here, we
use k to denote the iteration counter for the “inner loop.” Given an iterate (xk , λk , zk )
with xk , zk > 0, search directions (dkx , dkλ , dkz ) are obtained from the linearization of (4)
at (xk , λk , zk ), namely
$$\begin{bmatrix} W_k & A_k & -I \\ A_k^T & 0 & 0 \\ Z_k & 0 & X_k \end{bmatrix} \begin{pmatrix} d_k^x \\ d_k^\lambda \\ d_k^z \end{pmatrix} = - \begin{pmatrix} \nabla f(x_k) + A_k \lambda_k - z_k \\ c(x_k) \\ X_k Z_k e - \mu_j e \end{pmatrix}. \qquad (9)$$
Here, A_k := ∇c(x_k), and W_k denotes the Hessian ∇²_{xx}L(x_k, λ_k, z_k) of the Lagrangian function for the original problem (2),
$$\mathcal{L}(x, \lambda, z) := f(x) + c(x)^T\lambda - z^T x. \qquad (10)$$
Instead of solving the nonsymmetric linear system (9) directly, the proposed method
computes the solution equivalently by first solving the smaller, symmetric linear system
$$\begin{bmatrix} W_k + \Sigma_k & A_k \\ A_k^T & 0 \end{bmatrix} \begin{pmatrix} d_k^x \\ d_k^\lambda \end{pmatrix} = - \begin{pmatrix} \nabla\varphi_{\mu_j}(x_k) + A_k \lambda_k \\ c(x_k) \end{pmatrix}, \qquad (11)$$
with Σ_k := X_k^{-1} Z_k, derived from (9) by eliminating the last block row. The vector d_k^z
is then obtained from
$$d_k^z = \mu_j X_k^{-1} e - z_k - \Sigma_k d_k^x. \qquad (12)$$
As is common for most line-search methods, we have to ensure that the matrix in the
top-left block in the matrix in (11), projected onto the null space of the constraint Jaco-
bian ATk , is positive definite. This is necessary to guarantee certain descent properties
for the filter line-search procedure below [26]. Also, if Ak does not have full rank, the
iteration matrix in (11) is singular, and a solution of (11) might not exist. Therefore, it
might be necessary to modify the iteration matrix. In our implementation, we solve the
linear system
$$\begin{bmatrix} W_k + \Sigma_k + \delta_w I & A_k \\ A_k^T & -\delta_c I \end{bmatrix} \begin{pmatrix} d_k^x \\ d_k^\lambda \end{pmatrix} = - \begin{pmatrix} \nabla\varphi_{\mu_j}(x_k) + A_k \lambda_k \\ c(x_k) \end{pmatrix}, \qquad (13)$$
for some δw , δc ≥ 0. The choice of the scalars δw and δc for each iteration is discussed
below in Section 3.1.
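As an illustration, a dense linear-algebra sketch of the step computation (11)-(13) is given below. It assembles the symmetric system, solves it with a generic factorization, and recovers d^z from (12); the actual implementation instead uses the sparse symmetric indefinite solver MA27, and all function and variable names here are ours.

```python
import numpy as np

def compute_search_direction(W, A, x, z, lam, grad_f, c, mu, delta_w=0.0, delta_c=0.0):
    """Solve the symmetric primal-dual system (13) and recover d^z from (12).

    W      : (n, n) Hessian of the Lagrangian
    A      : (n, m) matrix A_k = grad c(x_k) (columns are constraint gradients)
    x, z   : current primal and bound-multiplier iterates (strictly positive)
    lam    : current equality-constraint multipliers
    grad_f : gradient of f at x; c : constraint values c(x)
    """
    n, m = W.shape[0], A.shape[1]
    Sigma = np.diag(z / x)                      # Sigma_k = X^{-1} Z
    grad_phi = grad_f - mu / x                  # gradient of the barrier objective

    K = np.zeros((n + m, n + m))
    K[:n, :n] = W + Sigma + delta_w * np.eye(n)
    K[:n, n:] = A
    K[n:, :n] = A.T
    K[n:, n:] = -delta_c * np.eye(m)

    rhs = -np.concatenate([grad_phi + A @ lam, c])
    sol = np.linalg.solve(K, rhs)
    dx, dlam = sol[:n], sol[n:]
    dz = mu / x - z - Sigma @ dx                # back-substitution (12)
    return dx, dlam, dz
```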
Having computed search directions from (13) and (12), step sizes α_k, α_k^z ∈ (0, 1]
have to be determined in order to obtain the next iterate as
$$x_{k+1} := x_k + \alpha_k\, d_k^x \qquad (14a)$$
$$\lambda_{k+1} := \lambda_k + \alpha_k\, d_k^\lambda \qquad (14b)$$
$$z_{k+1} := z_k + \alpha_k^z\, d_k^z. \qquad (14c)$$
Note that we allow a different step size in the z variables from that of the other variables.
In our experience, this is more efficient since it does not unnecessarily restrict the steps.
Since x and z are both positive at an optimal solution of the barrier problem (3), this
property is maintained for all iterates. It is attained using the fraction-to-the-boundary
rule
$$\alpha_k^{\max} := \max\bigl\{\alpha \in (0, 1] : x_k + \alpha d_k^x \ge (1 - \tau_j)\,x_k\bigr\} \qquad (15a)$$
$$\alpha_k^{z} := \max\bigl\{\alpha \in (0, 1] : z_k + \alpha d_k^z \ge (1 - \tau_j)\,z_k\bigr\}, \qquad (15b)$$
where the parameter τj ∈ (0, 1) is defined in (8). Note that αkz is the actual step size
used in (14c). In order to ensure global convergence, the step size αk ∈ (0, αkmax ] for the
remaining variables is determined by a backtracking line-search procedure exploring a
decreasing sequence of trial step sizes αk,l = 2−l αkmax (with l = 0, 1, 2, . . . ). We use
a line-search variant of Fletcher and Leyffer’s filter method [14], which we present and
analyze in [26]. In particular, in [26] we prove that this procedure is globally convergent
under appropriate (mild) assumptions.
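A small sketch of the fraction-to-the-boundary rule (15) and of the trial step sizes α_{k,l} = 2^{-l} α_k^max explored by the backtracking procedure follows; the helper names are ours.

```python
import numpy as np

def max_step_to_boundary(v, dv, tau):
    """Largest alpha in (0, 1] with v + alpha*dv >= (1 - tau)*v, assuming v > 0 (rule (15))."""
    mask = dv < 0.0                 # only components moving toward the bound restrict the step
    if not np.any(mask):
        return 1.0
    return min(1.0, np.min(-tau * v[mask] / dv[mask]))

def trial_step_sizes(alpha_max, l_max=20):
    """Decreasing sequence alpha_{k,l} = 2**(-l) * alpha_k^max tried by the line search."""
    return [alpha_max * 0.5**l for l in range(l_max + 1)]
```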
Before reviewing this procedure in the next section, we briefly note that a require-
ment for the convergence proof in [26] is that the “primal-dual barrier term Hessian”
Σ_k does not deviate arbitrarily much from the “primal Hessian” µ_j X_k^{-2}. We ensure this
by resetting
$$z_{k+1}^{(i)} \leftarrow \max\left\{ \min\left\{ z_{k+1}^{(i)},\ \frac{\kappa_\Sigma\,\mu_j}{x_{k+1}^{(i)}} \right\},\ \frac{\mu_j}{\kappa_\Sigma\, x_{k+1}^{(i)}} \right\}, \qquad i = 1,\ldots,n, \qquad (16)$$
for some fixed κ_Σ ≥ 1 after each step. This guarantees that each component σ_{k+1}^{(i)} of
Σ_{k+1} is in the interval
$$\sigma_{k+1}^{(i)} \in \left[ \frac{1}{\kappa_\Sigma}\,\frac{\mu_j}{\bigl(x_{k+1}^{(i)}\bigr)^2},\ \ \kappa_\Sigma\,\frac{\mu_j}{\bigl(x_{k+1}^{(i)}\bigr)^2} \right]. \qquad (17)$$
Such safeguards are common for the global convergence proof of primal-dual methods
for NLP (see e.g., [8, 30]), and do not interfere with the primal-dual spirit of the method
in terms of local convergence, when the parameter κ_Σ is chosen sufficiently large. In
our implementation, κ_Σ = 10^{10}.
Filter methods were originally proposed by Fletcher and Leyffer [14]. In the context of
solving the barrier problem (3) for µj , the basic idea behind this approach is to inter-
pret (3) as a bi-objective optimization problem with the two goals of minimizing the
objective function ϕ_{µ_j}(x) and the constraint violation θ(x) := ‖c(x)‖ (with a certain
emphasis on the latter quantity). Following this paradigm, we might consider a trial point
xk (αk,l ) := xk + αk,l dkx during the backtracking line search to be acceptable, if it leads
to sufficient progress toward either goal compared to the current iterate, i.e., if
θ (xk (αk,l )) ≤ (1 − γθ )θ (xk ) (18a)
or ϕµj (xk (αk,l )) ≤ ϕµj (xk ) − γϕ θ(xk ) (18b)
holds for fixed constants γθ , γϕ ∈ (0, 1). However, the above criterion is replaced by
requiring sufficient progress in the barrier objective function, whenever for the current
iterate we have θ (xk ) ≤ θ min , for some constant θ min ∈ (0, ∞], and the following
“switching condition”
∇ϕµj (xk )T dkx < 0 and αk,l [−∇ϕµj (xk )T dkx ]sϕ > δ [θ(xk )]sθ , (19)
with constants δ > 0, sθ > 1, sϕ ≥ 1 holds. If θ (xk ) ≤ θ min and (19) is true for the
current step size αk,l , the trial point has to satisfy the Armijo condition
ϕµj (xk (αk,l )) ≤ ϕµj (xk ) + ηϕ αk,l ∇ϕµj (xk )T dkx , (20)
for a fixed constant η_ϕ ∈ (0, ½) instead of (18). During the line search, a trial point is also rejected if it is not acceptable to the current filter F_k ⊆ R², a set of (θ, ϕ) combinations that are prohibited for a successful trial point. At the beginning of the optimization, the filter is initialized to
$$F_0 := \bigl\{(\theta, \varphi) \in \mathbb{R}^2 : \theta \ge \theta^{\max}\bigr\} \qquad (21)$$
for some θ^max, so that the algorithm will never allow trial points to be accepted that have
a constraint violation larger than θ max . Later, the filter is augmented, using the update
formula
$$F_{k+1} := F_k \cup \bigl\{ (\theta, \varphi) \in \mathbb{R}^2 : \theta \ge (1 - \gamma_\theta)\,\theta(x_k) \ \text{and}\ \varphi \ge \varphi_{\mu_j}(x_k) - \gamma_\varphi\,\theta(x_k) \bigr\}, \qquad (22)$$
after every iteration, in which the accepted trial step size does not satisfy the switching
condition (19), or in which the Armijo condition (20) does not hold. This ensures that
the iterates cannot return to the neighborhood of xk . On the other hand, if both (19) and
(20) hold for the accepted step size, the filter remains unchanged.
Overall, this procedure ensures that the algorithm cannot cycle, for example between
two points that alternatingly decrease the constraint violation and the barrier objective
function.
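The filter logic of this section can be summarized in a few lines of Python. The sketch below stores the filter as a list of (θ, ϕ) corner entries and checks the acceptance tests (18)-(20) for a trial point; all names are ours and the routine only illustrates the logic, not the actual IPOPT data structures.

```python
def in_filter(filter_entries, theta, phi):
    """A pair is prohibited if it is dominated by some filter corner added via (22)."""
    return any(theta >= th_f and phi >= ph_f for (th_f, ph_f) in filter_entries)

def switching_condition(grad_phi_dx, alpha, theta, delta=1.0, s_theta=1.1, s_phi=2.3):
    """Condition (19): the step is a sufficient descent direction for the barrier function."""
    return grad_phi_dx < 0.0 and alpha * (-grad_phi_dx) ** s_phi > delta * theta ** s_theta

def acceptable(theta_trial, phi_trial, theta_k, phi_k, grad_phi_dx, alpha,
               theta_min, gamma_theta=1e-5, gamma_phi=1e-5, eta_phi=1e-4):
    """Sufficient-decrease test with respect to the current iterate (Step A-5.4)."""
    if theta_k <= theta_min and switching_condition(grad_phi_dx, alpha, theta_k):
        # Armijo condition (20) on the barrier objective
        return phi_trial <= phi_k + eta_phi * alpha * grad_phi_dx
    # Otherwise, progress in either measure suffices, (18a) or (18b)
    return (theta_trial <= (1.0 - gamma_theta) * theta_k or
            phi_trial <= phi_k - gamma_phi * theta_k)

def augment_filter(filter_entries, theta_k, phi_k, gamma_theta=1e-5, gamma_phi=1e-5):
    """Filter update (22), applied when (19) or (20) did not hold for the accepted step."""
    filter_entries.append(((1.0 - gamma_theta) * theta_k, phi_k - gamma_phi * theta_k))
```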
Finally, in some cases it is not possible to find a trial step size αk,l that satisfies the
above criteria. We approximate a minimum desired step size using linear models of the
involved functions. For this, we define
$$\alpha_k^{\min} := \gamma_\alpha \cdot \begin{cases}
\min\left\{ \gamma_\theta,\ \dfrac{\gamma_\varphi\,\theta(x_k)}{-\nabla\varphi_{\mu_j}(x_k)^T d_k^x},\ \dfrac{\delta\,[\theta(x_k)]^{s_\theta}}{\bigl[-\nabla\varphi_{\mu_j}(x_k)^T d_k^x\bigr]^{s_\varphi}} \right\} & \text{if } \nabla\varphi_{\mu_j}(x_k)^T d_k^x < 0 \text{ and } \theta(x_k) \le \theta^{\min} \\[2ex]
\min\left\{ \gamma_\theta,\ \dfrac{\gamma_\varphi\,\theta(x_k)}{-\nabla\varphi_{\mu_j}(x_k)^T d_k^x} \right\} & \text{if } \nabla\varphi_{\mu_j}(x_k)^T d_k^x < 0 \text{ and } \theta(x_k) > \theta^{\min} \\[2ex]
\gamma_\theta & \text{otherwise,}
\end{cases} \qquad (23)$$
with a “safety factor” γα ∈ (0, 1]. If the backtracking line search encounters a trial step
size with αk,l ≤ αkmin , the algorithm reverts to a feasibility restoration phase. Here, the
algorithm tries to find a new iterate xk+1 > 0 which is acceptable to the current filter and
for which (18) holds, by reducing the constraint violation with some iterative method.
Note that the restoration phase algorithm might not be able to produce a new iterate
for the filter line-search method, for example, when the problem is infeasible. In this
case, a suitable restoration phase algorithm should converge to a local minimizer (or at
least a stationary point) for the constraint violation, indicating to the user that the prob-
lem seems (at least locally) infeasible. Details on the implemented restoration phase are
presented in Section 3.3.
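A direct transcription of (23) into Python reads as follows; the default constants are those listed at the end of Section 2.5, and the function name is ours.

```python
def alpha_min(theta_k, grad_phi_dx, theta_min, gamma_alpha=0.05,
              gamma_theta=1e-5, gamma_phi=1e-5, delta=1.0, s_theta=1.1, s_phi=2.3):
    """Minimal trial step size (23); below this the restoration phase is invoked."""
    if grad_phi_dx < 0.0 and theta_k <= theta_min:
        candidates = [gamma_theta,
                      gamma_phi * theta_k / (-grad_phi_dx),
                      delta * theta_k ** s_theta / (-grad_phi_dx) ** s_phi]
    elif grad_phi_dx < 0.0 and theta_k > theta_min:
        candidates = [gamma_theta, gamma_phi * theta_k / (-grad_phi_dx)]
    else:
        candidates = [gamma_theta]
    return gamma_alpha * min(candidates)
```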
To ensure global convergence of the overall method it is sufficient to ensure global
convergence for each barrier parameter with a fixed value, µl . Therefore, the filter Fk is
reset to its initial definition (21), whenever µl is decreased. It might be possible to reset
the filter in ways that include information from the previous barrier problem, but in our
experience the re-initialization works well in practice.
Many methods for nonlinear optimization use second-order corrections (see, e.g., [7,
12]) to improve the proposed step if a trial point has been rejected. A second-order
correction (SOC) for some step d̃kx aims to reduce the infeasibility by applying an addi-
tional Newton-type step for the constraints at the point xk + d̃kx , using the Jacobian ATk
at xk . In the proposed method, if the first trial step size αk,0 has been rejected and if
θ(xk (αk,0 )) ≥ θ (xk ), a second-order correction dkx,soc (for the step d̃kx = αk,0 dkx ) is
computed that satisfies
ATk dkx,soc + c(xk + αk,0 dkx ) = 0. (24)
The new corrected search direction is then obtained from
dkx,cor = αk,0 dkx + dkx,soc . (25)
Condition (24) does not uniquely define the second-order correction, and different
choices would be possible. In order to avoid additional matrix factorizations, the pro-
posed method uses the same matrix as in (13) to compute the overall corrected step (25)
from
$$\begin{bmatrix} W_k + \Sigma_k + \delta_w I & A_k \\ A_k^T & -\delta_c I \end{bmatrix} \begin{pmatrix} d_k^{x,\text{cor}} \\ d_k^\lambda \end{pmatrix} = - \begin{pmatrix} \nabla\varphi_{\mu_j}(x_k) + A_k \lambda_k \\ c_k^{\text{soc}} \end{pmatrix}. \qquad (26)$$
Here, we choose
$$c_k^{\text{soc}} := \alpha_{k,0}\, c(x_k) + c\bigl(x_k + \alpha_{k,0} d_k^x\bigr), \qquad (27)$$
apply the fraction-to-the-boundary rule
$$\alpha_k^{\text{soc}} := \max\bigl\{\alpha \in (0, 1] : x_k + \alpha\, d_k^{x,\text{cor}} \ge (1 - \tau_j)\,x_k\bigr\}, \qquad (28)$$
and check if the resulting trial point xksoc := xk + αksoc dkx,cor is acceptable to the filter
and satisfies the filter acceptance criteria. Note that the original search direction dkx is
still used in (19) and the right hand side of (20). Also, xksoc replaces x(αk ) in (20).
If this trial point passes the tests, it is accepted as the new iterate. Otherwise, we
apply additional second-order corrections, unless the correction step has not decreased
the constraint violation by a fraction κsoc ∈ (0, 1) or a maximum number p max of sec-
ond-order corrections has been performed. In that case, the original search direction dkx
is restored and the regular backtracking line search is resumed with a shorter step size
α_{k,1} = ½ α_{k,0}.
Note that by choosing to apply the second-order correction at the step d̃kx = αk,0 dkx
instead of, say, the full step dkx , no additional evaluation of the constraints is required.
This also guarantees that the constraints are never evaluated for arguments violating the
bound constraints (2c), at which they might not be defined.
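The second-order correction loop can be sketched as follows. The routine solve_corrected_step (assumed to solve (26) for a given right-hand side) and the other callables are placeholders standing in for the corresponding pieces of the implementation; the control flow follows the description above and Steps A-5.5 to A-5.9 of the algorithm stated next, but the names and interfaces are ours.

```python
def second_order_corrections(x_k, dx, alpha_0, c_fun, theta_fun, solve_corrected_step,
                             fraction_to_boundary, is_acceptable,
                             kappa_soc=0.99, p_max=4):
    """Sketch of the SOC loop (24)-(28): repeatedly correct the step for the constraints.

    solve_corrected_step(c_soc) is assumed to return d^{x,cor} from system (26);
    is_acceptable(x_trial, dx) is assumed to bundle the filter and sufficient-decrease tests.
    Returns the accepted corrected point, or None to resume the regular backtracking search.
    """
    c_soc = alpha_0 * c_fun(x_k) + c_fun(x_k + alpha_0 * dx)     # (27)
    theta_old = theta_fun(x_k)
    for p in range(1, p_max + 1):
        dx_cor = solve_corrected_step(c_soc)                     # solve (26)
        alpha_soc = fraction_to_boundary(x_k, dx_cor)            # (28)
        x_soc = x_k + alpha_soc * dx_cor
        if is_acceptable(x_soc, dx):
            return x_soc                                         # accept corrected point
        if theta_fun(x_soc) > kappa_soc * theta_old:
            break                                                # insufficient progress: abort SOC
        c_soc = alpha_soc * c_soc + c_fun(x_soc)                 # accumulate correction
        theta_old = theta_fun(x_soc)
    return None
```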
Next we formally state the overall filter line-search algorithm, Algorithm A, for solving the barrier
problem (3).
A-1. Initialize. Initialize the iteration counters j ← 0 and k ← 0, as well as the filter
F0 from (21). Obtain τ0 from (8).
A-2. Check convergence for the overall problem. If E_0(x_k, λ_k, z_k) ≤ ε_tol (with the error
estimate E_0 defined in (5)), then STOP [CONVERGED].
A-3. Check convergence for the barrier problem. If E_{µ_j}(x_k, λ_k, z_k) ≤ κ_ε µ_j, then:
A-3.1. Compute µj +1 and τj +1 from (7) and (8), and set j ← j + 1;
A-3.2. Re-initialize the filter Fk ← {(θ, ϕ) ∈ R2 : θ ≥ θ max };
A-3.3. If k = 0 repeat this Step A-3, otherwise continue at A-4.
A-4. Compute the search direction. Compute (dkx , dkλ , dkz ) from (13), where δw and δc
are obtained from Algorithm IC described in Section 3.1.
A-5. Backtracking line search.
A-5.1. Initialize the line search. Set αk,0 = αkmax with αkmax from (15a), and set
l ← 0.
A-5.2. Compute the new trial point. Set xk (αk,l ) := xk + αk,l dkx .
A-5.3. Check acceptability to the filter. If (θ(x_k(α_{k,l})), ϕ_{µ_j}(x_k(α_{k,l}))) ∈ F_k, reject
the trial step and go to Step A-5.5.
A-5.4. Check sufficient decrease with respect to the current iterate.
– Case I: θ (xk ) ≤ θ min and (19) holds: If (20) holds, accept the trial
step xk+1 := xk (αk,l ) and go to A-6. Otherwise, continue at A-5.5.
– Case II: θ (xk ) > θ min or (19) is not satisfied: If (18) holds, accept the
trial step xk+1 := xk (αk,l ) and go to A-6. Otherwise, continue at A-5.5.
A-5.5. Initialize the second-order correction. If l > 0 or θ(x_k(α_{k,0})) < θ(x_k), skip the
second-order correction (SOC) and continue at A-5.10. Otherwise, initial-
ize the SOC counter p ← 1 and c_k^soc from (27). Initialize θ_old^soc ← θ(x_k).
A-5.6. Compute the second-order correction. Compute d_k^{x,cor} and d_k^λ from (26),
α_k^soc from (28), and x_k^soc := x_k + α_k^soc d_k^{x,cor}.
A-5.7. Check acceptability to the filter (in SOC). If (θ (xksoc ), ϕµj (xksoc )) ∈ Fk ,
reject the trial step size and go to Step A-5.10.
A-5.8. Check sufficient decrease with respect to the current iterate (in SOC).
– Case I: θ (xk ) ≤ θ min and (19) holds (for αk,0 ): If (20) holds with
“xk (αk,l )” replaced by “xksoc ”, accept the trial step xk+1 := xksoc and
go to A-6. Otherwise, continue at A-5.9.
– Case II: θ (xk ) > θ min or (19) is not satisfied (for αk,0 ): If (18) holds
with “xk (αk,l )” replaced by “xksoc ”, accept the trial step xk+1 := xksoc
and go to A-6. Otherwise, continue at A-5.9.
A-5.9. Next second-order correction. If p = p^max or θ(x_k^soc) > κ_soc θ_old^soc, abort
the SOC procedure and continue at A-5.10. Otherwise, increase the SOC counter
p ← p + 1, set c_k^soc ← α_k^soc c_k^soc + c(x_k^soc) and θ_old^soc ← θ(x_k^soc), and go back to A-5.6.
A-5.10. Choose a new trial step size. Set α_{k,l+1} := α_{k,l}/2 and l ← l + 1. If the trial
step size becomes too small, i.e., α_{k,l} < α_k^min with α_k^min defined in (23),
go to the feasibility restoration phase in A-9. Otherwise, go back to A-5.2.
A-6. Accept the trial point. Set α_k := α_{k,l} (or α_k := α_k^soc if the SOC trial point
was accepted in A-5.8), and update the iterates according to (14), with α_k^z
from (15b), followed by the correction (16).
A-7. Augment the filter if necessary. If (19) or (20) does not hold for α_k, augment the
filter using (22); otherwise leave the filter unchanged, i.e., F_{k+1} := F_k.
A-8. Continue with the next iteration. Increase the iteration counter k ← k + 1 and go
back to A-2.
A-9. Feasibility restoration phase. Augment the filter using (22), and compute a new
iterate x_{k+1} > 0 by decreasing the infeasibility measure θ, so that x_{k+1} is
acceptable to the augmented filter, i.e., (θ(x_{k+1}), ϕ_{µ_j}(x_{k+1})) ∉ F_{k+1}.
Then continue with the regular iteration in Step A-8.
Note that in each iteration at least one trial point will be tested before the algorithm
may switch to the restoration phase. Also, the condition in Step A-3.3 ensures that even-
tually at least one step is taken for each decreased value of the barrier parameter. This
is necessary to achieve fast local convergence in the neighborhood of a local solution
satisfying the strong second-order optimality conditions [4].
In our implementation, the ℓ1 norm is used to measure the infeasibility, i.e., θ(x) =
‖c(x)‖_1. The values of the constants in our implementation (if their value has not yet
been mentioned earlier) are κ_ε = 10; κ_µ = 0.2; θ_µ = 1.5; τ_min = 0.99; γ_θ = 10^{-5};
γ_ϕ = 10^{-5}; δ = 1; γ_α = 0.05; s_θ = 1.1; s_ϕ = 2.3; η_ϕ = 10^{-4}; κ_soc = 0.99; p^max = 4;
as well as µ_0 = 0.1, θ^max = 10^4 max{1, θ(x_0)} and θ^min = 10^{-4} max{1, θ(x_0)}, where
x_0 is the starting point. These values were chosen because they produce good overall
performance compared to other values we have explored, although the most efficient
choice of these numerical parameters is usually problem dependent. The numerical results
in Section 4 were obtained with the tolerance ε_tol = 10^{-8} (which is approximately √ε_mach).
In order to be able to compute the search direction from (11), we need to ensure that the
iteration matrix is non-singular. In addition, as mentioned earlier, the filter line-search
method requires that the matrix in the top-left block of (11), projected onto the null space
of the constraint Jacobian ATk , is positive definite1 . These conditions are satisfied if the
iteration matrix has the inertia (n, m, 0), i.e., if it has exactly n positive, m negative,
and no zero eigenvalues [20]. Therefore, if the inertia of this matrix is not (n, m, 0), the
linear system (13) is re-solved in our implementation with a modified iteration matrix
for different trial values for the scalars δw , δc ≥ 0 until the inertia is as desired. The
inertia of the iteration matrix is readily available from several symmetric indefinite linear
solvers such as the factorization routine MA27 from the Harwell subroutine library [19]
used in our implementation.
Note that the desired inertia is obtained if δw is sufficiently large and the constraint
Jacobian ∇c(xk )T has full rank. If ∇c(xk )T is rank-deficient, the matrix is singular as
long as δc is zero, but a positive value for δc and a sufficiently large value of δw gives the
correct eigenvalue signatures2 . In practice, however, the iteration matrix can become so
ill-conditioned that the factorization cannot be performed successfully, even with very
large values of δw and some δc > 0. In this case, we give up on the current step compu-
tation and switch directly to the feasibility restoration phase, hoping that the matrix has
better properties close to feasible points.
1 The global convergence proof in [26] requires that the eigenvalues of the projection are uniformly bounded
away from zero. However, since guaranteeing this property does not seem to be possible without consider-
able computational effort, e.g., construction of the projected matrix explicitly, followed by an eigenvalue
decomposition, we only guarantee positive definiteness in each iteration.
2 The minus sign for the δ_c-perturbation is used to avoid generating too many positive eigenvalues.
In each iteration k:
IC-1. Attempt to factorize the unmodified matrix in (13) with δw = δc = 0. If the matrix
is non-singular and its inertia is (n, m, 0), use the resulting search direction in the
line search. Otherwise continue with IC-2.
IC-2. If the iteration matrix has zero eigenvalues, set δ_c ← δ̄_c µ^{κ_c}; otherwise set δ_c ← 0.
IC-3. If δ_w^last = 0, set δ_w ← δ̄_w^0; otherwise set δ_w ← max{δ̄_w^min, κ_w^- δ_w^last}.
IC-4. Attempt to factorize the modified matrix in (13). If the inertia is now (n, m, 0),
set δ_w^last ← δ_w and use the resulting search direction in the line search. Otherwise
continue with IC-5.
IC-5. If δ_w^last = 0, set δ_w ← κ̄_w^+ δ_w; otherwise set δ_w ← κ_w^+ δ_w.
IC-6. If δ_w > δ̄_w^max, abort the search direction computation, skip the backtracking line
search, and switch directly to the restoration phase in Step A-9 of Algorithm A.
Otherwise, go back to IC-4.
In our implementation, we have δ̄_w^min = 10^{-20}, δ̄_w^0 = 10^{-4}, δ̄_w^max = 10^{40}, as well as
κ̄_w^+ = 100, κ_w^+ = 8, κ_w^- = 1/3 and κ_c = 1/4. The value δ̄_c = 10^{-8} is chosen to be
approximately √ε_mach.
The above heuristic first checks in IC-1 if the unmodified matrix has the desired
inertia so that the “pure” Newton search direction is used whenever possible (with an
exception mentioned below). If IC-1 is unsuccessful, increasing values for δw are used.
Note that the first trial value is based on δwlast , which stores the perturbation value from
the last time a modification of the iteration matrix was necessary. In this way, we attempt
to find the smallest perturbation necessary (within some factor) while at the same time
avoiding futile factorizations in IC-4 for values of δw that are too small. Here we assume
that the minimum necessary perturbation is of the same order of magnitude in succes-
sive iterations. The reason for using a much larger factor κ̄w+ in IC-5 for the very first
necessary correction than for the correction in later iterations is that we want to avoid
a high number of trial factorizations when the scale of the problem and the order of
magnitude for a successful correction is not yet known. By choosing κ_w^- and κ_w^+ so that
κ_w^- κ_w^+ ≠ 1 we avoid situations where the same perturbation δ_w is used in successive
iterations. Otherwise, the algorithm could repeatedly produce very large steps dkx due to
a nearly singular iteration matrix, so that only very small step sizes αk would be taken
and little progress would be made.
A nonzero value for δc is always chosen if the unmodified iteration matrix has a
zero eigenvalue, as we assume that the singularity is caused by a rank-deficient con-
straint Jacobian. We do not attempt to verify whether the singularity is instead caused
by a singular projected Hessian matrix, because this would increase the number of trial
factorizations. Note that the nonzero value for δc in Step IC-2 converges to zero as
µ → 0 (if κc > 0), so that the perturbation is smaller when a solution of the problem is
approached.
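A dense-matrix sketch of Algorithm IC is given below; it computes the inertia from an eigenvalue decomposition for clarity, whereas the implementation obtains it as a by-product of the MA27 factorization. The function names and the handling of the persistent value delta_w_last are ours.

```python
import numpy as np

def inertia(M, tol=1e-12):
    """(number of positive, negative, zero eigenvalues) of a symmetric matrix."""
    eigs = np.linalg.eigvalsh(M)
    return (int(np.sum(eigs > tol)), int(np.sum(eigs < -tol)), int(np.sum(np.abs(eigs) <= tol)))

def inertia_correction(build_kkt, n, m, mu, delta_w_last,
                       delta_w_min=1e-20, delta_w_0=1e-4, delta_w_max=1e40,
                       kappa_w_plus_bar=100.0, kappa_w_plus=8.0, kappa_w_minus=1.0/3.0,
                       delta_c_bar=1e-8, kappa_c=0.25):
    """Sketch of Algorithm IC: find (delta_w, delta_c) giving inertia (n, m, 0).

    build_kkt(delta_w, delta_c) is assumed to return the matrix in (13).
    Returns (delta_w, delta_c, new delta_w_last), or None if the restoration phase must be invoked.
    """
    npos, nneg, nzero = inertia(build_kkt(0.0, 0.0))
    if (npos, nneg, nzero) == (n, m, 0):
        return 0.0, 0.0, delta_w_last                                           # IC-1
    delta_c = delta_c_bar * mu**kappa_c if nzero > 0 else 0.0                   # IC-2
    delta_w = delta_w_0 if delta_w_last == 0.0 else max(delta_w_min, kappa_w_minus * delta_w_last)  # IC-3
    while delta_w <= delta_w_max:                                               # IC-4 / IC-6
        if inertia(build_kkt(delta_w, delta_c))[:2] == (n, m):
            return delta_w, delta_c, delta_w                                    # success: remember delta_w
        delta_w *= kappa_w_plus_bar if delta_w_last == 0.0 else kappa_w_plus    # IC-5
    return None                                                                 # IC-6: go to restoration phase
```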
In some problem instances, the iteration matrix is structurally singular, for exam-
ple, when the equality constraint gradients are always linearly dependent, or when the
reduced Hessian is always rank deficient. We therefore deviate from Algorithm IC in our
implementation, if the iteration matrix is singular in the first three iterations and if this
can be corrected by choosing a positive value for δc . In that case, the value of δc used in
later iterations will always be δ̄_c µ^{κ_c} (also in IC-1) in order to avoid futile factorizations
with δc = 0. Similarly, if in the first three iterations singularity of the iteration matrix
can be avoided by choosing δw > 0, we assume that a correction δw > 0 is necessary in
any case, and Step IC-1 above is executed with δ_w = max{δ̄_w^min, κ_w^- δ_w^last}.
We note that the optimization code LOQO [23] also uses a similar trial procedure
to find an appropriate perturbation of the Hessian.
One possible pitfall of the filter method described in Section 2.3 is that the filter Fk in
the current iteration might include (θ, ϕ)-pairs that have been added earlier for an iterate
in a different region, with similar values for ϕµj (x) and θ(x) (see also [26, Remark 7]).
This could prevent the algorithm from taking good steps toward a nearby local solution.
As a result, the backtracking line-search algorithm might repeatedly cut back the step
size, or could be forced unnecessarily to resort to the feasibility restoration phase.
We also noticed that in some cases the full step (with αk,0 = αkmax , even with a
second-order correction) is rejected in successive iterations, because it does not achieve
sufficient progress with respect to the current iterate (condition (18) or (20)). This causes
the algorithm to make little progress, even though the method may converge faster when
the acceptance of the full step is temporarily allowed.
In order to avoid these inefficiencies, two heuristics are added to the proposed method.
The algorithm counts the number of successive iterations in which the first trial step
(including a second-order correction) is rejected. If this number exceeds a given thresh-
old (four in our implementation), then one of the following actions are taken after the
last of those iterations, say iteration k:
• Case I: If θ max > θ (xk+1 )/10, and the last unsuccessful trial step size in the back-
tracking line search was rejected in A-5.3 because the trial point was not acceptable
to the filter.
In this case, the current filter might be blocking good progress, caused by historic
information from iterates in a different (and now irrelevant) region of Rn . To avoid
further inefficiencies, the filter is re-initialized for the next iteration by setting Fk+1 =
{(θ, ϕ) : θ ≥ θ max } in Step A-7, after the maximal permitted constraint violation has
been reduced, i.e., θ^max ← 0.1 θ^max. Note that the decrease of θ^max ensures that
the filter is not reset infinitely many times, unless the infeasibility becomes arbitrarily
small.
• Case II: Otherwise.
Here, we hope to overcome possible inefficiencies by tentatively ignoring the filter
criteria for one iteration, similar to a watchdog procedure [5] (with one relaxed step).
In the next iteration, k + 1, we choose α_{k+1} = α_{k+1}^max without any backtracking line
search. The filter is not augmented in Step A-7 for iteration k + 1, and the search
directions d_{k+1}^x, d_{k+1}^λ, d_{k+1}^z are stored as a backup. Then, new search directions d_{k+2}^x
etc. are computed at the point x_{k+2} = x_{k+1} + α_{k+1}^max d_{k+1}^x. We check whether
the trial point for the full step α_{k+2}^max is acceptable to the filter F_{k+1} and satisfies
the line-search acceptance criteria for the previous iteration k + 1, i.e., whether
“θ(x_{k+2} + α_{k+2}^max d_{k+2}^x) ≤ (1 − γ_θ) θ(x_{k+1})” (similar to (18a)) or the corresponding
sufficient decrease condition for the barrier function with respect to x_{k+1} holds
(depending on the switching condition in Step A-5.4 for iteration k + 1). If these
tests are passed, the trial point is accepted as iterate x_{k+3}, and λ_{k+3} and z_{k+3} are
updated accordingly. In this case, we have made sufficient progress with respect to
x_{k+1} within two iterations, and the filter is augmented using (22) with x_{k+1}, if (19)
or (20) does not hold (for x_{k+1} and d_{k+1}^x). If the tests fail, the tentative iterate x_{k+2}
is abandoned, the original search directions d_{k+1}^x, d_{k+1}^λ, d_{k+1}^z are restored, and the
usual backtracking line-search procedure from x_{k+1} is resumed to produce a new
iterate x_{k+3} = x_{k+1} + α_{k+1,l} d_{k+1}^x.
Even though these heuristics are not frequently activated and the watchdog heuristic
might in some cases increase the number of iterations, they appear to have an overall
positive effect.
A key ingredient of the filter line-search method is the feasibility restoration phase (see
Step A-9). The task of the restoration phase is to compute a new iterate acceptable to the
augmented filter Fk+1 by decreasing the infeasibility, whenever the regular backtrack-
ing line-search procedure cannot make sufficient progress and the step size becomes too
small (see Step A-5.10). In addition, as mentioned in Section 3.1, the method switches to
the restoration phase whenever the linear system (13) is very ill-conditioned and cannot
be factorized successfully despite modifications of the iteration matrix. In summary, the
feasibility restoration phase is very important in the sense that it is invoked whenever
the progress to the solution becomes difficult, and hence it needs to be very robust.
The feasibility restoration phase has another significant purpose, namely to detect
(local) infeasibility. Infeasible problems arise, for example, due to modeling errors, and
a user should be notified quickly of a badly-posed problem. If the problem is infeasible,
the algorithm is ultimately not able to generate sufficient progress in the regular back-
tracking line-search procedure and reverts to the restoration phase. We would then want
the restoration phase to converge to a non-zero minimizer of the constraint violation (in
some norm), and in this way to provide an indication of infeasibility.
We note that for the global convergence proof of the filter line-search method in [26]
it is assumed that, in the neighborhood of feasible points, the gradients of the active con-
straints are linearly independent. It is shown in [26] that as a consequence the algorithm
does not switch to the feasibility restoration phase at (almost) feasible points. However,
in practice this assumption might be violated, and the restoration phase might be called at
a point with a very small (or zero) value of θ . Since further reduction of the infeasibility
might then be difficult and not lead to progress in the optimization process, the current
implementation aborts with an error message if the restoration phase is invoked at a point
whose constraint violation is already below the termination tolerance.
3.3.1. Minimization of the Constraint Violation In this section we describe the first
algorithm for the restoration phase. The goal of this method is to return a new iterate
x_{k+1} > 0 with (θ(x_{k+1}), ϕ_{µ_j}(x_{k+1})) ∉ F_{k+1} for Step A-9, or to converge to a non-zero
minimizer (or at least a stationary point) of some norm of the constraint violation. The
restoration phase algorithm applies the primal-dual interior-point filter line-search algo-
rithm outlined in the previous sections to a smooth reformulation of the optimization
problem
$$\min_{\bar x \in \mathbb{R}^n}\ \|c(\bar x)\|_1 + \frac{\zeta}{2}\,\bigl\|D_R(\bar x - \bar x_R)\bigr\|_2^2 \qquad (29a)$$
$$\text{s.t.}\quad \bar x \ge 0. \qquad (29b)$$
Here, a term is included in the objective function that penalizes the deviation from a
reference point x̄R , where ζ > 0 is the weighting parameter, and the scaling matrix DR
is defined by
$$D_R = \mathrm{diag}\Bigl(\min\bigl\{1, 1/|\bar x_R^{(1)}|\bigr\}, \ldots, \min\bigl\{1, 1/|\bar x_R^{(n)}|\bigr\}\Bigr).$$
The reference point x̄R is chosen to be the iterate xk at which the restoration phase
is called in Step A-9. In this way, we seek to decrease the constraint violation but try
to avoid a large deviation from x̄R and an undesired significant increase in the barrier
objective function ϕµj . A related restoration phase problem formulation that attempts
to minimize the constraint violation and also includes a regularization term based on
x̄ − x̄R has been proposed by Ulbrich et al. [22].
A smooth reformulation of (29) is obtained by introducing non-negative variables
p̄, n̄ ∈ Rm that capture the positive and negative parts of the constraints,
$$\min_{\bar x\in\mathbb{R}^n,\ \bar p,\bar n\in\mathbb{R}^m}\ \sum_{i=1}^{m}\bigl(\bar p^{(i)} + \bar n^{(i)}\bigr) + \frac{\zeta}{2}\,\bigl\|D_R(\bar x - \bar x_R)\bigr\|_2^2 \qquad (30a)$$
$$\text{s.t.}\quad c(\bar x) - \bar p + \bar n = 0 \qquad (30b)$$
$$\qquad\quad \bar x, \bar p, \bar n \ge 0. \qquad (30c)$$
This nonlinear optimization problem is of the form (2). We can therefore apply the “reg-
ular” interior-point algorithm described in the earlier sections and solve a sequence of
barrier problems
$$\min_{\bar x\in\mathbb{R}^n,\ \bar p,\bar n\in\mathbb{R}^m}\ \rho\sum_{i=1}^{m}\bigl(\bar p^{(i)} + \bar n^{(i)}\bigr) + \frac{\zeta}{2}\,\bigl\|D_R(\bar x - \bar x_R)\bigr\|_2^2 - \bar\mu\sum_{i=1}^{n}\ln\bigl(\bar x^{(i)}\bigr) - \bar\mu\sum_{i=1}^{m}\ln\bigl(\bar p^{(i)}\bigr) - \bar\mu\sum_{i=1}^{m}\ln\bigl(\bar n^{(i)}\bigr) \qquad (31a)$$
$$\text{s.t.}\quad c(\bar x) - \bar p + \bar n = 0 \qquad (31b)$$
with the filter line-search procedure. We introduced the additional scaling parameter
ρ > 0 in order to allow a relative scaling of the overall objective function (31a) with
respect to the constraints (31b). By default, the parameter ρ is chosen to be 1000, which
seems to work well in practice.
Note that if the regularization parameter ζ > 0 is chosen sufficiently small, the
optimization problem (30) is the exact penalty formulation [12] of the problem “find the
feasible point that is closest (in a weighted norm) to the reference point x̄R ,”
This appears to be an intuitive goal for the restoration phase. An additional desired con-
sequence of the penalty term is that the optimal solution of (30) is usually a strict local
solution, which makes this nonlinear optimization problem easier to solve. This would
usually not be the case for the choice ζ = 0, because then all points in the manifold
defined by “c(x) = 0” would be minimizers. Since a sufficiently small value of the
regularization parameter ζ is not known before a solution of (29) is determined, we
choose ζ = √µ̄, so that ζ is driven to zero together with µ̄.
In addition to the original variables x̄, the barrier problem (31) contains the vari-
ables p̄ and n̄, and the corresponding primal-dual equations (similar to (4)) include their
accompanying dual variables, say z̄p and z̄n . Search directions for the line search are,
as before, obtained by linearization of these equations. Some straight-forward algebraic
manipulations show that they can be computed from (omitting the iteration index t for
simplicity)
$$\begin{bmatrix} \bar W + \zeta D_R^2 + \bar\Sigma & \nabla c(\bar x) \\ \nabla c(\bar x)^T & -\bar\Sigma_p^{-1} - \bar\Sigma_n^{-1} \end{bmatrix} \begin{pmatrix} \bar d^{\,x} \\ \bar d^{\,\lambda} \end{pmatrix} = - \begin{pmatrix} \zeta D_R^2 (\bar x - \bar x_R) + \nabla c(\bar x)\bar\lambda - \bar\mu \bar X^{-1} e \\ c(\bar x) - \bar p + \bar n + \rho\,\bar Z_p^{-1}(\bar\mu e - \bar p) + \rho\,\bar Z_n^{-1}(\bar\mu e - \bar n) \end{pmatrix}, \qquad (32)$$
where W̄ = Σ_{i=1}^m λ̄^{(i)} ∇²_{xx} c^{(i)}(x̄), Σ̄ = X̄^{-1} Z̄, Σ̄_p = P̄^{-1} Z̄_p, and Σ̄_n = N̄^{-1} Z̄_n. Subse-
quently, d̄^p, d̄^n, d̄^{z_p} and d̄^{z_n} are obtained by back-substitution into the linearized
primal-dual equations, and d̄^z from
$$\bar d^{\,z} = \bar\mu\, \bar X^{-1} e - \bar z - \bar\Sigma\, \bar d^{\,x}.$$
Note that the structure of the nonzero elements of the linear system in (32) is identical
to the one in (13), which allows us to use the same code (and symbolic factorization)
in the step computations as for the regular iteration, including the Hessian correction
mechanism described in Section 3.1. Here, we keep δc = 0 at all times since the Jacobian
of the constraint (31b) cannot be rank-deficient. We note that second-order corrections
as described in Section 2.4 have not been implemented for the restoration phase.
The filter line-search method applied to (31) might itself revert to a restoration phase.
If this occurs, we compute the optimal solution of (31) for a fixed value of x̄ (namely
the current iterate x̄t ) and use this as the “result” of the restoration phase within the
restoration phase. Since (31) then becomes separable, this can easily be done by solving
a quadratic equation for each (p̄ (i) , n̄(i) ) pair, that is
$$\bar n^{(i)} = \frac{\bar\mu - \rho\, c^{(i)}(\bar x)}{2\rho} + \sqrt{\left(\frac{\bar\mu - \rho\, c^{(i)}(\bar x)}{2\rho}\right)^{2} + \frac{\bar\mu\, c^{(i)}(\bar x)}{2\rho}}, \qquad \bar p^{(i)} = c^{(i)}(\bar x) + \bar n^{(i)}. \qquad (33)$$
Since the gradients of the constraints (31b) always have full rank, the analysis in [26]
shows that the restoration phase (within the restoration phase filter algorithm) is not
invoked at a feasible point (for (31)).
At the beginning of the restoration phase algorithm, the first barrier parameter µ̄0
is chosen to be the maximum of the current barrier parameter, µj , of the regular iter-
ation and c(xk )∞ . The initial value for x̄0 is simply chosen as the regular iterate xk
at which the restoration phase is called (identical to x̄R ). To initialize the dual variables
we set λ̄_0 = 0 and z̄_0^{(i)} = min{ρ, z_k^{(i)}}, for i = 1, . . . , n. Furthermore, p̄_0 and n̄_0 are
computed from (33), and their dual variables are initialized as z̄p,0 = µ̄(P̄0 )−1 e and
z̄n,0 = µ̄(N̄0 )−1 e. In this way, the optimality conditions for the variables added for the
restoration phase problem are all satisfied at the starting point, so that the first restoration
phase step usually tends to reduce θ (x) = c(x)1 without being “distracted” by the
introduction of the new variables.
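The closed-form initialization (33) of the variables p̄ and n̄, together with the multiplier initialization just described, can be written as the following sketch (names ours):

```python
import numpy as np

def init_restoration_variables(c_val, mu_bar, rho=1000.0):
    """Initialize (p_bar, n_bar, z_p, z_n) for the restoration phase barrier problem (31).

    c_val : constraint values c(x_R) at the reference point.
    Uses the closed-form solution (33) of the separable subproblem; the radicand
    equals (mu_bar**2 + (rho*c)**2) / (2*rho)**2 and is therefore never negative.
    """
    t = (mu_bar - rho * c_val) / (2.0 * rho)
    n_bar = t + np.sqrt(t**2 + mu_bar * c_val / (2.0 * rho))   # (33)
    p_bar = c_val + n_bar
    z_p = mu_bar / p_bar                                       # z_{p,0} = mu_bar * P_0^{-1} e
    z_n = mu_bar / n_bar                                       # z_{n,0} = mu_bar * N_0^{-1} e
    return p_bar, n_bar, z_p, z_n
```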
The restoration phase is discontinued as soon as (i) the current restoration phase iter-
ate, say x̄_t, is acceptable for the augmented regular filter (i.e., (θ(x̄_t), ϕ_{µ_j}(x̄_t)) ∉ F_{k+1},
see Step A-9) and (ii) θ(x̄_t) ≤ κ_resto θ(x̄_R) for some constant κ_resto ∈ (0, 1) (κ_resto = 0.9
in our implementation). The motivation for the second condition is to ensure that once
the restoration phase is activated, reasonable progress toward feasibility is achieved;
this has proven advantageous in our numerical experiments. The regular method is then
resumed from xk+1 = x̄t . Note, that because an interior-point algorithm is used to solve
(30), it is guaranteed that x̄t > 0. In order to compute a step for the bound multipliers
z after the return to the regular method, we pretend that the entire progress during the
restoration phase was one single step, dkx := xk+1 − xk , and obtain zk+1 from (12),
(14c) and (15b). The equality constraint multipliers are re-initialized as described in
Section 3.6 below.
On the other hand, if the termination criterion for the restoration phase problem,
similar to (6), is satisfied before the regular method can be resumed, the proposed algo-
rithm terminates with the message that the problem seems locally infeasible.
3.3.2. Reducing the KKT Error As mentioned earlier in Section 3.2, “historic” infor-
mation in the filter Fk originating from points in a different region of Rn can prevent fast
progress in the neighborhood of a local solution (x∗ , λ∗ , z∗ ). The heuristics in Section 3.2
might not always be able to overcome this difficulty, so that eventually the restoration
phase might be invoked. However, the regular iteration steps are Newton(-type) steps
for the primal-dual equations, and should therefore be taken close to (x∗ , λ∗ , z∗ ).
Therefore, we do not immediately revert to the algorithm described in Section 3.3.1
when the restoration phase is called in Step A-9. Instead, we try to achieve reduction in
the norm of the primal-dual equations, using the regular iteration steps (as proposed in
[26, Remark 8]). In the following description of this procedure, Fµ (x, λ, z) denotes the
nonlinear system of equations on the left hand side of (4).
Starting from the regular iterate at which the restoration phase was called, i.e., (x̄_0, λ̄_0, z̄_0) := (x_k, λ_k, z_k), the following steps are performed for t = 0, 1, 2, . . . :
R-1. Compute a search direction (d̄_t^x, d̄_t^λ, d̄_t^z) from the regular primal-dual system (13) and (12) at the current point (x̄_t, λ̄_t, z̄_t).
R-2. Apply the fraction-to-the-boundary rule (analogous to (15)) to obtain a step size β_t ∈ (0, 1] that keeps the trial point strictly positive.
R-3. Check whether the norm of the primal-dual equations is sufficiently reduced, i.e., whether ‖F_{µ_j}(x̄_{t+1}, λ̄_{t+1}, z̄_{t+1})‖ ≤ κ_F ‖F_{µ_j}(x̄_t, λ̄_t, z̄_t)‖ for a constant κ_F ∈ (0, 1),
with
$$\bigl(\bar x_{t+1}, \bar\lambda_{t+1}, \bar z_{t+1}\bigr) = \bigl(\bar x_t, \bar\lambda_t, \bar z_t\bigr) + \beta_t \bigl(\bar d_t^{\,x}, \bar d_t^{\,\lambda}, \bar d_t^{\,z}\bigr).$$
If the evaluation of the functions at the trial point results in an error, or if this
decrease condition is not satisfied, discard the trial point and switch to the robust
restoration phase algorithm described in Section 3.3.1, using x̄t as the reference
point in the initialization of the robust restoration phase.
R-4. If (θ(x̄_{t+1}), ϕ_{µ_j}(x̄_{t+1})) ∉ F_{k+1}, continue the regular interior-point method from
the point (xk+1 , λk+1 , zk+1 ) := (x̄t+1 , λ̄t+1 , z̄t+1 ). Otherwise, set t ← t + 1 and
continue with Step R-1.
In the neighborhood of a strict local solution satisfying the second-order sufficient opti-
mality conditions for the barrier problem, the projection of the Hessian W_t + Σ_t onto
the null space of the constraint Jacobian ∇c(x̄t )T is positive definite, and therefore no
modification of the iteration matrix, as described in Section 3.1, is applied. As a con-
sequence, the search directions computed from (11)–(12) are the Newton steps for (4),
so that the above procedure will accept those steps and quickly converge toward this
solution, if it is started sufficiently close.
Since the norm of the KKT conditions is decreased by at least a fixed fraction,
κF , it is guaranteed that the method eventually either resumes the regular procedure,
Algorithm A, or reverts to the restoration phase described in Section 3.3.1.
The above algorithm is not attempted, if the restoration phase is triggered in the
regular method because of numerical problems during the solution of the linear system
(11) in Step IC-6. In that case, the method immediately proceeds to the restoration phase
described in Section 3.3.1.
For simplicity, the algorithm has been described for solving optimization problems of
the form (2), but it is straight-forward to generalize the procedures outlined so far to the
more general formulation (1). In particular, the resulting barrier problem then becomes
$$\min_{x\in\mathbb{R}^n}\ \varphi_{\mu_j}(x) = f(x) - \mu_j\sum_{i\in I_L}\ln\bigl(x^{(i)} - x_L^{(i)}\bigr) - \mu_j\sum_{i\in I_U}\ln\bigl(x_U^{(i)} - x^{(i)}\bigr) \qquad (34a)$$
$$\text{s.t.}\quad c(x) = 0, \qquad (34b)$$
where I_L = {i : x_L^{(i)} ≠ −∞} and I_U = {i : x_U^{(i)} ≠ +∞}. Bound multipliers z_L^{(i)}
and z_U^{(i)} are introduced for all finite lower and upper bounds, and the primal-dual Hes-
sian Σ_k of the barrier terms is defined as the sum of Σ_k^L = diag(σ_{k,1}^L, . . . , σ_{k,n}^L) and
Σ_k^U = diag(σ_{k,1}^U, . . . , σ_{k,n}^U), where
$$\sigma_{k,i}^L = \begin{cases} z_{L,k}^{(i)} \big/ \bigl(x_k^{(i)} - x_L^{(i)}\bigr) & \text{if } i \in I_L \\ 0 & \text{otherwise,} \end{cases} \qquad \sigma_{k,i}^U = \begin{cases} z_{U,k}^{(i)} \big/ \bigl(x_U^{(i)} - x_k^{(i)}\bigr) & \text{if } i \in I_U \\ 0 & \text{otherwise.} \end{cases}$$
For completeness, we define z_{L,k}^{(i)} = 0 for i ∉ I_L and z_{U,k}^{(i)} = 0 for i ∉ I_U.
If the given lower and upper bounds for a variable are identical, this component
of x is fixed to this value for all function evaluations and removed from the problem
statement.
As a barrier method, the proposed algorithm relies on the existence of a strict relative
interior of the feasible region, i.e., of points x with xL < x < xU and c(x) = 0, since
otherwise a solution to the barrier problem (34) does not exist. However, this assump-
tion can easily be violated in practice, for example, if the equality constraints implicitly
imply x^{(i)} = x_L^{(i)} for some component i. In such a case, in the process of trying to
find a feasible point for a fixed value of µj , the algorithm might generate a sequence of
iterates where x_k^{(i)} − x_L^{(i)} becomes very small. This in turn can lead to numerical diffi-
culties during the solution of the linear system (11), because the corresponding entry in
Σ_k, which is roughly of the order of µ_j/(x_k^{(i)} − x_L^{(i)})² (see (17)), becomes very large.
As a remedy, we found it helpful to slightly relax the bounds before solving the
problem by
$$x_L^{(i)} \leftarrow x_L^{(i)} - \epsilon_{\text{tol}}\,\max\bigl\{1, |x_L^{(i)}|\bigr\} \qquad (35)$$
(similarly for xU ), in order to avoid an empty relative interior from the very beginning.
Since this perturbation is of the order of the termination tolerance ε_tol, we believe that
this does not constitute an unwanted modification of the problem statement.
Furthermore, the lower bound on x^{(i)} is slightly relaxed by (ε_mach)^{3/4} max{1, x_L^{(i)}},
whenever x_k^{(i)} − x_L^{(i)} < ε_mach µ_j, where ε_mach is the machine precision. An analogous
procedure is applied for very small slack to upper bounds. Even if these corrections are
applied repeatedly, the changes are so small that the problem statement is essentially not
modified, but the numerical difficulties are usually avoided.
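A sketch of the bound relaxation (35), and of the subsequent tiny relaxations applied when an iterate comes too close to a lower bound, follows; the function names are ours, and the handling of upper bounds is analogous.

```python
import numpy as np

def relax_bounds(x_lower, x_upper, eps_tol=1e-8):
    """Initial one-time relaxation (35) of the variable bounds."""
    xl = x_lower - eps_tol * np.maximum(1.0, np.abs(x_lower))
    xu = x_upper + eps_tol * np.maximum(1.0, np.abs(x_upper))
    return xl, xu

def relax_lower_bound_if_needed(x, x_lower, mu_j, eps_mach=1e-16):
    """Relax individual lower bounds whose slack x - x_L has dropped below eps_mach * mu_j."""
    too_close = (x - x_lower) < eps_mach * mu_j
    x_lower = x_lower.copy()
    x_lower[too_close] -= eps_mach**0.75 * np.maximum(1.0, x_lower[too_close])
    return x_lower
```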
3.6. Initialization
Since the algorithm requires the iterates to strictly satisfy the bound constraints (1c), it
is often necessary to modify the user-provided initial point so that it is sufficiently away
from the boundary. For this purpose, each component i of the initial point, which has
only one (say, a lower) bound, is modified by
$$x_0^{(i)} \leftarrow \max\Bigl\{ x_0^{(i)},\ x_L^{(i)} + \kappa_1 \max\bigl\{1, |x_L^{(i)}|\bigr\} \Bigr\}$$
for a constant κ1 > 0 (similarly for variables only bounded above). The initial value of a
variable x^{(i)} bounded on two sides is projected into the interval [x_L^{(i)} + p_L^{(i)}, x_U^{(i)} − p_U^{(i)}]
with the perturbations
$$p_L^{(i)} := \min\bigl\{\kappa_1 \max\{1, |x_L^{(i)}|\},\ \kappa_2\,(x_U^{(i)} - x_L^{(i)})\bigr\}$$
$$p_U^{(i)} := \min\bigl\{\kappa_1 \max\{1, |x_U^{(i)}|\},\ \kappa_2\,(x_U^{(i)} - x_L^{(i)})\bigr\},$$
for some κ_2 ∈ (0, ½). The default choices in our implementation are κ_1 = κ_2 = 10^{-2}.
The dual variables corresponding to the bound constraints are initialized to one com-
ponent-wise. Finally, using the possibly modified initial point x0 and the initial bound
multipliers, the multipliers λ0 for the equality constraints are obtained as least-square
solutions for the dual infeasibility (4a), i.e., by solving the linear system
$$\begin{bmatrix} I & \nabla c(x_0) \\ \nabla c(x_0)^T & 0 \end{bmatrix} \begin{pmatrix} w \\ \lambda_0 \end{pmatrix} = - \begin{pmatrix} \nabla f(x_0) - z_{L,0} + z_{U,0} \\ 0 \end{pmatrix}, \qquad (36)$$
where w is discarded after this computation. However, if λ_0 obtained in this way is
too large, i.e., if ‖λ_0‖_∞ > λ_max (with λ_max = 10^3 in our implementation), the least
square estimate is discarded and we set λ0 = 0. In practice this seems to avoid poor
initial guesses for λ0 in cases where the constraint Jacobian is nearly linearly dependent
at the initial point. This procedure for estimating the equality constraint multipliers is
also used after the restoration phase algorithm described in Section 3.3.1 reverts to the
regular method.
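A compact sketch of the initialization described in this section, pushing the starting point away from its bounds and computing the least-squares multiplier estimate from (36), is given below; dense linear algebra and the function names are ours, and for simplicity all bounds are assumed finite in the projection routine.

```python
import numpy as np

def project_into_interior(x0, x_lower, x_upper, kappa_1=1e-2, kappa_2=1e-2):
    """Move the user-provided starting point sufficiently inside (finite) bounds (Section 3.6)."""
    pL = np.minimum(kappa_1 * np.maximum(1.0, np.abs(x_lower)), kappa_2 * (x_upper - x_lower))
    pU = np.minimum(kappa_1 * np.maximum(1.0, np.abs(x_upper)), kappa_2 * (x_upper - x_lower))
    return np.clip(x0, x_lower + pL, x_upper - pU)

def initial_equality_multipliers(grad_f0, jac_c0, z_L0, z_U0, lambda_max=1e3):
    """Least-squares estimate of lambda_0 from (36); discarded (set to zero) if it is too large.

    jac_c0 : (n, m) matrix grad c(x_0) whose columns are the constraint gradients.
    """
    n, m = jac_c0.shape
    K = np.block([[np.eye(n), jac_c0], [jac_c0.T, np.zeros((m, m))]])
    rhs = -np.concatenate([grad_f0 - z_L0 + z_U0, np.zeros(m)])
    lam0 = np.linalg.solve(K, rhs)[n:]
    return lam0 if np.max(np.abs(lam0)) <= lambda_max else np.zeros(m)
```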
In some cases, the set S∗ of optimal points for (1) does not consist of isolated points,
but contains an unbounded connected component. Then, the objective function of the
corresponding barrier problem (34) for a fixed value of µj is unbounded below over the
feasible set, since a log-barrier term converges to −∞ as its argument goes to infinity.
As a consequence, the method for solving the barrier problem might fail to converge,
even though the original problem is well-posed.
In order to prevent this behavior, linear damping terms for all variables with exactly
one finite bound are added to the barrier objective function (34a), which then becomes
$$\varphi_{\mu_j}(x) = f(x) - \mu_j\sum_{i\in I_L}\ln\bigl(x^{(i)} - x_L^{(i)}\bigr) - \mu_j\sum_{i\in I_U}\ln\bigl(x_U^{(i)} - x^{(i)}\bigr) + \kappa_d\,\mu_j\sum_{i\in I_L\setminus I_U}\bigl(x^{(i)} - x_L^{(i)}\bigr) + \kappa_d\,\mu_j\sum_{i\in I_U\setminus I_L}\bigl(x_U^{(i)} - x^{(i)}\bigr),$$
with a fixed positive damping coefficient κ_d.
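For illustration, the damped barrier function can be evaluated as in the sketch below, assuming for simplicity that every variable has exactly one finite lower bound; the default value 1e-4 for the damping coefficient kappa_d is only an assumption made for this example.

```python
import numpy as np

def damped_barrier_objective(f_val, x, x_lower, mu_j, kappa_d=1e-4):
    """Barrier objective with linear damping terms, for variables with only a lower bound.

    f_val       : objective value f(x)
    x, x_lower  : iterate and (finite) lower bounds of the damped components
    """
    slack = x - x_lower
    return (f_val
            - mu_j * np.sum(np.log(slack))        # log-barrier terms
            + kappa_d * mu_j * np.sum(slack))     # linear damping terms
```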
The Newton steps for the primal dual equations (4) computed from (11) are invariant to
scaling of the variables, the objective and constraint functions, i.e., to replacing x, f , and c
by x̃ = D_x x, f̃(x) = d_f f(x) and c̃(x) = D_c c(x) for some d_f > 0 and positive definite
diagonal matrices D_x = diag(d_x^{(1)}, . . . , d_x^{(n)}), D_c = diag(d_c^{(1)}, . . . , d_c^{(m)}). However,
the overall optimization algorithm with its initialization procedures, globalization strat-
egy and stopping criteria usually behaves very differently for different scaling factors,
particularly if the scaling factors are very large or very small. In addition, numerical
difficulties due to finite precision are more likely to arise if the occurring numbers are
of very different orders of magnitude.
Automatic scaling of optimization problems has been examined in the past, but it
is not clear how, in the nonlinear case, the variables and functions should be scaled in
order to obtain good efficiency and robustness (where the sensitivities of functions with
respect to changes in variables might vary drastically from one iteration to another).
In the context of this paper we take the perspective that ideally we would like to
scale the variables and functions so that changing a variable by a given amount has a
comparable effect on any function which depends on this variables, or in other words, so
that the non-zero elements of the function gradients are of the same order of magnitude
(say, 1).
For example, we experimented with applying an equilibration algorithm to the matrix J_0 of objective and constraint gradients at the starting point, in order to obtain scaling matrices D_x and D_cf = diag(D_c, d_f) so that the nonzero elements in
D_cf J_0 D_x^{-1} are of order one (as proposed in [6]). Similarly, we computed scaling factors
so that the matrix
$$\begin{bmatrix} D_x^{-1} & 0 \\ 0 & D_c \end{bmatrix}\begin{bmatrix} \nabla^2_{xx} f(x_0) & \nabla_x c(x_0) \\ \nabla_x c(x_0)^T & 0 \end{bmatrix}\begin{bmatrix} D_x^{-1} & 0 \\ 0 & D_c \end{bmatrix}$$
has non-zero entries close to one. While these strategies seem to work well in some
instances, the overall performance on the considered test set became worse. Neverthe-
less, these procedures are available to users of our implementation as options.
The automatic scaling procedure finally used by default in the proposed method is
rather conservative and assumes that usually the given problem is well scaled and does
not require modification, unless some sensitivities are large. Given a threshold value
gmax > 0 (gmax = 100 in our implementation), we choose the scaling factors according
to
$$d_f = \min\left\{1,\ \frac{g_{\max}}{\|\nabla_x f(x_0)\|_\infty}\right\}, \qquad d_c^{(j)} = \min\left\{1,\ \frac{g_{\max}}{\|\nabla_x c^{(j)}(x_0)\|_\infty}\right\}, \quad j = 1, \ldots, m,$$
and we set D_x = I. Note that this will never multiply a function by a number larger than
one, and that all gradient components in the scaled problem are at most of the size gmax
at the starting point.
The scaling factors are computed only at the beginning of the optimization using the
starting point after the modifications described in Section 3.6.
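The default gradient-based scaling rule can be sketched as follows; g_max = 100 is the threshold quoted above, and the function name and the tiny divide-by-zero guard are ours.

```python
import numpy as np

def gradient_based_scaling(grad_f0, jac_c0, g_max=100.0):
    """Scaling factors d_f and d_c^{(j)} so that no scaled gradient component exceeds g_max.

    grad_f0 : gradient of the objective at the starting point
    jac_c0  : (m, n) Jacobian of the constraints at the starting point
    Functions are never multiplied by a factor larger than one.
    """
    tiny = 1e-300  # guard against constant functions with zero gradient
    d_f = min(1.0, g_max / max(np.max(np.abs(grad_f0)), tiny))
    d_c = np.minimum(1.0, g_max / np.maximum(np.max(np.abs(jac_c0), axis=1), tiny))
    return d_f, d_c
```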
In a few instances we observed that the search directions dkx generated from (13) become
very small compared to the size of the iterate xk itself. For example, this can occur if the
primal variables are already very close to their final optimal value, but the dual variables
have not yet converged. We also observed this situation for very ill-scaled problems.
Performing the regular backtracking line-search procedure can then be unsuccessful
due to rounding errors, and can result in an unnecessary switch to the restoration phase.
In order to prevent this, we allow the algorithm to take the full step with α_k = α_k^max
whenever max{|(d_k^x)^{(i)}|/(1 + |x_k^{(i)}|) : i = 1, . . . , n} < 10 ε_mach. If this is true for two
consecutive iterations, the algorithm assumes that the current barrier problem has been
solved as well as possible given the finite precision, and reduces the barrier parameter
in A-3. If µ_j is already very small (µ_j ≤ ε_tol/10), the algorithm terminates with a warning
message.
In our implementation of the proposed algorithm, the linear systems (13) and (32) are
solved by the Harwell routine [19] MA27, after they have been equilibrated with the
scaling routine MC19. As default pivot tolerance for MA27 we specify ε_piv = 10^{-8},
which is about the square root of ε_mach. In our experience, it is very important to use
iterative refinement in order to improve robustness of the implementation and to be
able to obtain highly accurate solutions. Whereas iterative refinement on the linear sys-
tems of the form (13) itself provides somewhat better search directions than using no
iterative refinement, we found that a considerable gain in robustness and precision can
be achieved by applying iterative refinement on the unreduced non-symmetric Newton
system (such as (9), but including the perturbations δw and δc ). Here, we still use (13)
and (12) to solve the linear system, but compute the iterative refinement residual for
the larger linear system (9). This appears to be particularly important for a numerically
robust implementation of the restoration phase, where iterative refinement only on (32)
seems insufficient to solve the restoration phase problem to the default tolerance, even
for very large pivot tolerances ε_piv. We believe that this is partly due to the fact that the
diagonal elements in the smaller formulation (32) are obtained by adding numbers that
may be very different in magnitude, which may lead to severe rounding error. For exam-
ple, if a variable with two bounds converges to one of its bounds, then the corresponding
entry in k is obtained by adding two numbers, one of which converges to zero, and the
other one goes to infinity in the limit.
In addition, if the linear systems cannot be solved sufficiently well despite iter-
ative refinement, the algorithm increases the pivot tolerance for the linear solver by
ε_piv ← max{10^{-2}, ε_piv^{3/4}}. Here, the pivot tolerance is increased at most once per itera-
tion. If an increase in the pivot tolerance still does not lead to a sufficiently small residual,
the search direction is used as is.
In order to handle round-off error in the acceptance criteria, such as (18) and (20), we
relax those slightly based on the machine precision ε_mach. For example, (20) is replaced
in the code by
$$\varphi_{\mu_j}\bigl(x_k(\alpha_{k,l})\bigr) - \varphi_{\mu_j}(x_k) - 10\,\epsilon_{\text{mach}}\,|\varphi_{\mu_j}(x_k)| \le \eta_\varphi\,\alpha_{k,l}\,\nabla\varphi_{\mu_j}(x_k)^T d_k^x.$$
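In code, the relaxed form of (20) amounts to a one-line check such as the following (names ours):

```python
def armijo_relaxed(phi_trial, phi_k, grad_phi_dx, alpha, eta_phi=1e-4, eps_mach=1e-16):
    """Armijo condition (20), relaxed by a machine-precision-dependent slack."""
    return (phi_trial - phi_k - 10.0 * eps_mach * abs(phi_k)
            <= eta_phi * alpha * grad_phi_dx)
```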
4. Numerical Results
In the following sections we examine the practical behavior of the algorithm proposed
in this paper. Our implementation, called IPOPT, is written in Fortran 77 and avail-
able as open source3 . The numerical results have been obtained on a PC with a 1.66
GHz Pentium IV microprocessor and 1 GB of memory running RedHat Linux 9.0. The
executables were generated with the Intel Fortran compiler version 7.1, using the flags
“-O3 -mp -pc64”. The machine precision is ε_mach ≈ 10^{-16}. The source code for the
required BLAS and LAPACK routines has been obtained from www.netlib.org
and compiled with the rest of the code.
3 The source code for IPOPT is available at http://www.coin-or.org/Ipopt. In addition, readily
available BLAS and LAPACK routines as well as certain subroutines from the Harwell library are required to
compile the IPOPT executable.
For the numerical comparison we use the CUTEr test set [18] (as of Jan 1, 2004).
Here, problems with general inequality constraints of the form “dL ≤ d(x) ≤ dU ”
are reformulated into the formulation (1) by adding slack variables dL ≤ s ≤ dU and
replacing the inequality constraint by “d(x) − s = 0.” The initial point for the slack
variables is chosen as s0 = d(x̃0 ), where x̃0 is the starting point given by CUTEr for
the original problem formulation.
The test problems initially used in our experiments were all 979 problems with
analytical twice continuously differentiable functions that have at least as many free
variables as equality constraints, after the reformulation of the general inequality con-
straints. For problems with variable size we used the default size, except for 46 cases
where we decreased the number of variables in order to allow a solution within the given
time limit4. The problems vary in size from n = 2 to 125,050 variables and m = 0 to
125,025 constraints (after the introduction of slack variables).
IPOPT was run for the test set using the default options and a termination tolerance
of $\epsilon_{\rm tol} = 10^{-8}$, with a CPU time limit of 1 hour and an iteration limit of 3000. (The iter-
ation count includes the iterations in the restoration phase algorithms.) Based on those
results, we removed 11 problems from the test set, because they appeared to be unbounded
below5. The problems S365 and S365MOD were excluded because the constraint Jaco-
bian could not be evaluated at the starting point. In addition, VANDERM4 was removed
since at the initial point $c(x_0) \approx 10^{63}$, and numerical problems occurred. Finally, we
excluded 11 problems on which IPOPT with default options terminated at a point $\tilde x_*$
satisfying the termination criterion for the feasibility restoration phase problem (29) (for
the tolerance $\epsilon_{\rm tol} = 10^{-8}$) with $\|c(\tilde x_*)\|_1 > \sqrt{\epsilon_{\rm tol}}$, and for which also the optimization
codes KNITRO and LOQO (see Section 4.2 below) both failed. These problems might
truly be infeasible6. We note that IPOPT was able to converge to a point satisfying the
convergence tolerance for the restoration phase problem within the given iteration limit,
therefore producing a user message indicating that the problem seems locally infeasible,
whereas the other methods (except in two cases) exceeded the iteration limit.
Of the remaining 954 problems, which are those used in the comparisons in the next
sections, IPOPT was able to solve 895 problems. This corresponds to a success rate of
93.8%. In 7 cases, it failed to converge within the time limit7 , and in 24 cases the itera-
tion limit was exceeded. Furthermore, IPOPT aborted in 3 problems because it reverted
to the restoration phase when the constraint violation was already below the termina-
4 The problems with altered problem size are CATENARY, CHARDIS1, CONT5-QP, CONT6-QQ,
CVXQP1, CVXQP2, CVXQP3, DRCAV1LQ, DRCAV2LQ, DRCAV3LQ, DRCAVTY3, EG3, EI-
GENA, EIGENALS, EIGENB, EIGENB2, EIGENBCO, EIGENBLS, EIGENC, EIGENC2,
EIGENCCO, EIGENCLS, FLOSP2HH, FLOSP2HL, FLOSP2HM, FMINSURF, GAUSSELM,
HARKERP2, LUBRIF, LUBRIFC, NCVXQP[1-9], NONCVXU2, NONMSQRT, POWER,
SCURLY30, SPARSINE, SPARSQUR.
5 IPOPT failed to converge and was able to produce iterates with very small constraint violation
and at the same time very large negative values of the objective function for the following prob-
lems: FLETCBV3, FLETCHBV, INDEF, LUKVLE2, LUKVLE4, LUKVLI2, LUKVLI4, MESH,
RAYBENDL, RAYBENDS, STATIC3.
6 Those problems were: CONT6-QQ, DRCAVTY3, FLOSP2HH, FLOSP2HL, FLOSP2HM,
HIMMELBD, JUNKTURN, LUBRIF, LUBRIFC, MODEL, WOODSNE.
7 The sizes of those problems could not be altered, except for LUKVLE15 and LUKVLI10, whose
sizes we left unchanged because the time limit was not exceeded for the other problems in the
LUKVL* family.
tion tolerance, and in 21 problems because the restoration phase algorithm encountered
points where the infeasibility was below the convergence tolerance, but the point was
not acceptable to the (regular) filter. Finally, in 3 cases IPOPT converged to a stationary
point for the infeasibility (but at least one of the codes LOQO and KNITRO was able
to solve the problem), and in one case the evaluation of the constraints was repeatedly
unsuccessful (producing the IEEE numbers Inf or NaN).
For the comparisons in the next sections we make use of the Dolan-Moré perfor-
mance profiles [9]. Given a test set P containing $n_p$ problems, and $n_s$ runs (e.g., obtained
with different solver options) for each problem, these profiles provide a way to graphi-
cally compare quantities $t_{p,s}$ (such as number of iterations or required CPU time)
obtained for each problem $p$ and each option $s$. For this, the performance ratio for a
problem $p$ and option $s$ is defined as
$$ r_{p,s} := \frac{t_{p,s}}{\min\{t_{p,\bar s} : 1 \le \bar s \le n_s\}} . \qquad (37) $$
If option $s$ for problem $p$ leads to a failure, we define $r_{p,s} := \infty$. Then,
$$ \rho_s(\tau) := \frac{1}{n_p}\,{\rm card}\{p \in P : r_{p,s} \le \tau\} $$
is the fraction of the test problems that were solved by option $s$ within a factor $\tau \ge 1$
of the performance obtained by the best option. The performance plots present $\rho_s$ for
each option $s$ as a function of $\tau$; in this paper we use a logarithmic scale for the $\tau$-axis.
Since the considered optimization methods only try to find local solutions of the prob-
lems, it can easily happen that two different options converge to different local solutions.
In an attempt to avoid comparing runs that converge to different local solutions, we exclude those
problems for which the final values of the objective functions $f(x_*^1), \ldots, f(x_*^{n_s})$ were
not close; that is, we discard those problems from a performance plot for which
$$ \frac{f_*^{\max} - f_*^{\min}}{1 + \max\{|f_*^{\min}|, |f_*^{\max}|\}} > 10^{-1}, \qquad (38) $$
where $f_*^{\max} = \max\{f(x_*^1), \ldots, f(x_*^{n_s})\}$ and $f_*^{\min} = \min\{f(x_*^1), \ldots, f(x_*^{n_s})\}$, with
the objective function values of unsuccessful runs omitted.
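The following Python sketch illustrates how such profiles can be computed; the function names and the exclusion helper are illustrative and are not part of IPOPT or of the original experiments.

```python
import numpy as np

def performance_profile(t, tau_grid):
    """Dolan-More performance profile (sketch).

    t        : array of shape (n_p, n_s); t[p, s] is the measure (iterations,
               CPU time, ...) for problem p under option s, np.inf on failure
    tau_grid : values of tau >= 1 at which to evaluate rho_s(tau)

    Returns rho of shape (len(tau_grid), n_s), where rho[i, s] is the fraction
    of problems solved by option s within a factor tau_grid[i] of the best option.
    """
    n_p, n_s = t.shape
    best = np.min(t, axis=1, keepdims=True)      # best option per problem
    r = t / best                                 # performance ratios, cf. (37)
    r[~np.isfinite(t)] = np.inf                  # failures get an infinite ratio
    rho = np.array([[np.mean(r[:, s] <= tau) for s in range(n_s)]
                    for tau in tau_grid])
    return rho

def comparable_objectives(f_star, tol=1e-1):
    """Exclusion test, cf. (38): True if the final objective values of the
    successful runs for one problem are close enough to be compared."""
    f = np.asarray([v for v in f_star if np.isfinite(v)])
    if f.size == 0:
        return False
    spread = (f.max() - f.min()) / (1.0 + max(abs(f.min()), abs(f.max())))
    return spread <= tol
```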
4.1. Comparison with a merit function approach
In this section we examine the practical performance of the proposed filter method in
comparison with an approach based on the exact penalty function
$$ \phi_\nu(x) = \varphi_{\mu_j}(x) + \nu \|c(x)\| . \qquad (39) $$
The update rule and step acceptance criteria chosen for the comparison in this paper
were proposed recently by Waltz et al. in [27] as part of a hybrid trust region and
line-search interior-point method. We chose this option because the algorithm in [27] is
in many aspects similar to the method proposed here, and because its practical behavior
seems promising (in particular, it performs considerably better than the penalty function
approach used in our earlier comparison [24]). In the following we only briefly state
the algorithm; its motivation can be found in [27]. We should point out, however, that
the algorithm proposed in [27] is more complex and, in particular, reverts under certain
circumstances to a trust region approach, which ensures global convergence (in contrast to
the penalty function option considered here, which uses only a backtracking line-search procedure; see [25]).
For the penalty function based option, the search direction is computed from (13) in
an iteration k, and the maximum step sizes are obtained from (15). After this, the penalty
parameter is updated according to the formula
$$ \nu_k := \begin{cases} \nu_{k-1} & \text{if } \nu_{k-1} \ge \nu_k^+ \\ \nu_k^+ + 1 & \text{otherwise,} \end{cases} $$
where
$$ \nu_k^+ = \frac{\nabla\varphi_{\mu_j}(x_k)^T d_k^x + \frac{\varsigma_k}{2}\,(d_k^x)^T (W_k + \Sigma_k + \delta_w I)\, d_k^x}{(1-\rho)\,\|c(x_k)\|} , $$
for $\rho \in (0,1)$, with $\rho = 0.1$ in our implementation. The scalar $\varsigma_k$ is set to one if
$(d_k^x)^T (W_k + \Sigma_k + \delta_w I)\, d_k^x > 0$, and to zero otherwise. After this, a backtracking line-
search procedure with $\alpha_{k,l} = 2^{-l} \alpha_k^{\max}$ is performed. For each trial step size $\alpha_{k,l}$ the
predicted reduction of the merit function is computed as
$$ {\rm pred}_k(\alpha_{k,l} d_k^x) = -\alpha_{k,l} \nabla\varphi_{\mu_j}(x_k)^T d_k^x - \frac{\alpha_{k,l}^2 \varsigma_k}{2}\,(d_k^x)^T (W_k + \Sigma_k + \delta_w I)\, d_k^x
+ \nu_k \left( \|c(x_k)\| - \|c(x_k) + \alpha_{k,l} \nabla c(x_k)^T d_k^x\| \right) . $$
If the trial point $x_k(\alpha_{k,l}) = x_k + \alpha_{k,l} d_k^x$ satisfies
$$ \phi_{\nu_k}(x_k(\alpha_{k,l})) \le \phi_{\nu_k}(x_k) - \eta\, {\rm pred}_k(\alpha_{k,l} d_k^x) $$
for a constant $\eta \in (0, \tfrac{1}{2})$, then the trial step size $\alpha_k = \alpha_{k,l}$ is accepted and the iterates
are updated according to (14). Otherwise, a shorter trial step size is tried. It can be shown
that $d_k^x$ is a descent direction for $\phi_{\nu_k}$, so that eventually a sufficiently small trial step size
is accepted. At the beginning of the optimization and after each decrease of the barrier
parameter $\mu_j$, the penalty parameter $\nu_k$ is set to $10^{-6}$.
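As an illustration, the following sketch collects the penalty parameter update and the backtracking test described above in one routine. It is a simplified rendering under stated assumptions (dense NumPy arrays, a caller-supplied merit function phi, 2-norm for the constraint violation, and a placeholder value for eta), not the actual Fortran implementation used in the experiments.

```python
import numpy as np

def penalty_line_search(x, dx, grad_phi, W_sigma, c, jac_c, phi,
                        nu_prev, alpha_max, rho=0.1, eta=1e-8):
    """Sketch of the penalty-function option: nu update and backtracking.

    grad_phi : gradient of the barrier objective at x
    W_sigma  : W_k + Sigma_k + delta_w I (already regularized)
    c        : constraint values c(x) (assumed nonzero here); jac_c : Jacobian of c at x
    phi      : callable phi(x, nu) evaluating the exact penalty function (39)
    """
    curvature = dx @ W_sigma @ dx
    sigma = 1.0 if curvature > 0 else 0.0
    nu_plus = (grad_phi @ dx + 0.5 * sigma * curvature) / ((1 - rho) * np.linalg.norm(c))
    nu = nu_prev if nu_prev >= nu_plus else nu_plus + 1.0

    alpha = alpha_max
    for _ in range(60):                    # illustrative cap on the number of halvings
        pred = (-alpha * grad_phi @ dx
                - 0.5 * alpha**2 * sigma * curvature
                + nu * (np.linalg.norm(c) - np.linalg.norm(c + alpha * jac_c @ dx)))
        if phi(x + alpha * dx, nu) <= phi(x, nu) - eta * pred:
            return alpha, nu               # sufficient decrease of the merit function
        alpha *= 0.5                       # try a shorter trial step size
    return alpha, nu                       # fall back to the last trial step
```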
In order to achieve a fair comparison between the filter method and this approach, all
comparable constants (such as $\eta$) are set to the same values, so that the methods behave
identically on problems without equality constraints. In addition, the details described
in Section 3 are identical (including a second-order correction for the merit function
method), unless they pertain specifically to the filter method, such as the restoration
phase. In particular, the Hessian correction scheme described in Section 3.1 is also used
for the line-search algorithm using (39), with the exception that $\delta_c$ is always kept at zero
to ensure that the generated search direction $d_k^x$ is always a descent direction for the exact
penalty function. As a consequence, this line-search option aborts when a point with
[Figure: performance profile comparing the options "Filter (default)", "Filter (no heuristics)", "Penalty Function", and "Full Step"; vertical axis: % of problems (100ρ); horizontal axis: not more than 2^x-times worse than best option (log2(τ)).]
[Figure: performance profile restricted to the options "Filter (no heuristics)" and "Penalty Function"; same axes as above.]
Fig. 2. Comparing iteration count for problems solved by "Filter (no heuristics)" and "Penalty Function"
that in many cases Newton’s method does not require a safeguarding scheme (note that
second derivatives are used in the computation of the search directions), or alternatively,
that many problems in the test set are not very difficult.
On the other hand, the different options do not seem to differ very much in terms
of efficiency. If we compare the number of iterations for the "Filter (no heuristics)"
and "Penalty Function" options only for those 814 problems in which both
options were able to find a solution with comparable final objective function values,
the performance plots turn out to be very similar; see Figure 2 (note that the range of the
vertical axis starts at 80%). The filter option appears to be only slightly more efficient for
those problems.
4.2. Comparison with other interior-point codes
In this section we present a comparison of IPOPT with the optimization codes KNITRO
[3, 28] (version 3.1.1) and LOQO [23] (version 6.06), both well regarded and recog-
nized software packages for large-scale nonlinear optimization. Like IPOPT, they are
based on interior-point approaches. Tables with detailed results for every test problem
and each solver can be downloaded from the first author's home page8.
8 http://www.research.ibm.com/people/a/andreasw
The comparison presented here is not meant to be a rigorous assessment of the per-
formance of these three algorithms, as this would require very careful handling of subtle
details such as comparable termination criteria etc., and would be outside the scope of
this paper. In addition, all three software packages are continuously being improved, so
that a comparison might quickly become out of date. The main purpose of the comparison
here is to give an idea of the relative performance of IPOPT, and to encourage read-
ers to consider IPOPT as a potential candidate when looking for a practical nonlinear
optimization code.
All three optimization codes were run with their default options on the 954 problems
of our test set on the same machine as used to obtain the IPOPT results. Again, a CPU
time limit of 1 hour and an iteration count limit of 3000 was imposed. The default termi-
nation tolerance for KNITRO and LOQO is $10^{-6}$, whereas we still chose $\epsilon_{\rm tol} = 10^{-8}$
for IPOPT. The termination criteria are not directly comparable, for example due to
different scalings of various entities and different reformulations of the problems, but
we believe that on average the chosen termination criterion for IPOPT is tighter than
those for the other two codes. We include a run for IPOPT, for which the automatic
problem scaling procedure described in Section 3.8 has been disabled, since the other
codes do not perform any scaling of the problem statement.
As mentioned earlier, IPOPT in default mode terminated successfully for 895 out
of the 954 problems, whereas only 872 could be solved when the scaling was disabled.
KNITRO terminated successfully in 829 cases, and LOQO for 847 problems. Figure 3
[Fig. 3: performance profile for the iteration count; solvers: Ipopt, Ipopt (no scaling), Knitro 3.1, Loqo 6.06; vertical axis: % of problems (100ρ); horizontal axis: not more than 2^x-times worse than best solver (log2(τ)).]
[Fig. 4: performance profile for the number of function evaluations; same solvers and axes as Fig. 3.]
presents a performance plot for the iteration count, and Figure 4 compares the number
of function evaluations9 . Here, 75 problems were excluded because the final objective
function values were too different (see (38)). IPOPT appears to be more efficient in both
measures compared to LOQO, and comparable to KNITRO in terms of iteration counts.
However, KNITRO is a trust region method, and the computational costs per iteration are
usually not comparable; each unsuccessful trial point in KNITRO is counted as one iter-
ation. Looking at Figure 4, KNITRO seems to require overall fewer function evaluations
than IPOPT for the given test set. These figures also show that the scaling procedure
proposed in Section 3.8 does indeed improve IPOPT’s robustness and efficiency.
Finally, Figure 5 presents a comparison of the CPU time10. Since the CPU time is
measured in 0.01 s increments on the machine used for obtaining the results, we excluded
from the graph the 444 test problems for which the CPU time for the fastest solver was
less than 0.05 s, as well as 48 additional problems with different final objective function
values. As can be seen, IPOPT seems to perform well compared to the other solvers.
9 LOQO appears to compute the function value for each accepted iterate twice, so that a minimum of two
function evaluations is observed per iteration. To correct for this, the function evaluation count for LOQO has
been decreased by the number of iterations for the performance plots.
10 Like IPOPT, KNITRO is written in Fortran and has been compiled with the same compiler and compiler
options. LOQO is written in C, and we used the default Linux library available at the LOQO website.
[Fig. 5: performance profile for the CPU time; solvers: Ipopt, Ipopt (no scaling), Knitro 3.1, Loqo 6.06; vertical axis: % of problems (100ρ); horizontal axis: not more than 2^x-times worse than best solver (log2(τ)).]
5. Conclusions
Acknowledgements. The authors would like to thank Richard Waltz and Jorge Nocedal, as well as Hande
Benson and Robert Vanderbei, for their help and for providing copies of their optimization codes KNITRO and
LOQO, respectively. We further thank Arvind Raghunathan for insightful comments on different aspects of
the algorithm, Carl Laird for his help in obtaining the numerical results, and Dominique Orban for support
on CUTEr issues. We are also very grateful to Andrew Conn and Jorge Nocedal, as well as two anonymous
referees, whose comments on the manuscript greatly helped to improve the exposition of the material.
References
1. Benson, H. Y., Shanno, D. F., Vanderbei, R. J.: Interior-point methods for nonconvex nonlinear program-
ming: Filter methods and merit functions. Computational Optimization and Applications, 23 (2), 257–272
(2002)
2. Byrd, R. H., Gilbert, J. Ch., Nocedal, J.: A trust region method based on interior point techniques for
nonlinear programming. Mathematical Programming, 89, 149–185 (2000)
3. Byrd, R. H., Hribar, M. E., Nocedal, J.: An interior point algorithm for large-scale nonlinear programming.
SIAM Journal on Optimization, 9 (4), 877–900 (1999)
4. Byrd, R. H., Liu, G., Nocedal, J.: On the local behavior of an interior point method for nonlinear program-
ming. In: Griffiths, D. F., Higham, D. J. (eds), Numerical Analysis 1997, pages 37–56. Addison–Wesley
Longman, Reading, MA, USA, 1997
5. Chamberlain, R. M., Lemarechal, C., Pedersen, H. C., Powell, M. J. D.: The watchdog technique for
forcing convergence in algorithms for constrained optimization. Mathematical Programming Study, 16,
1–17 (1982)
6. Conn, A. R., Gould, N. I. M., Toint, Ph. L.: LANCELOT: a Fortran package for large-scale nonlinear
optimization (Release A). Number 17 in Springer Series in Computational Mathematics. Springer Verlag,
Heidelberg, Berlin, New York, 1992
7. Conn, A. R., Gould, N. I. M., Toint, Ph. L.: Trust-Region Methods. SIAM, Philadelphia, PA, USA, 2000
8. Conn, A. R., Gould, N.I.M., Orban, D., Toint, Ph. L.: A primal-dual trust-region algorithm for non-convex
nonlinear programming. Mathematical Programming, 87 (2), 215–249 (2000)
9. Dolan, E. D., Moré, J. J.: Benchmarking optimization software with performance profiles. Mathematical
Programming, 91 (2), 201–213 (2002)
10. El-Bakry, A. S., Tapia, R. A., Tsuchiya, T., Zhang, Y.: On the formulation and theory of the Newton inte-
rior-point method for nonlinear programming. Journal of Optimization Theory and Application, 89 (3),
507–541 (1996)
11. Fiacco, A. V., McCormick, G. P.: Nonlinear Programming: Sequential Unconstrained Minimization Tech-
niques. John Wiley, New York, USA, 1968. Reprinted by SIAM Publications, 1990.
12. Fletcher, R.: Practical Methods of Optimization. John Wiley and Sons, New York, USA, second edition,
1987
13. Fletcher, R., Gould, N. I. M., Leyffer, S., Toint, Ph. L., Wächter, A.: Global convergence of a trust-
region SQP-filter algorithm for general nonlinear programming. SIAM Journal on Optimization, 13 (3),
635–659 (2002)
14. Fletcher, R., Leyffer, S.: Nonlinear programming without a penalty function. Mathematical Programming,
91 (2), 239–269 (2002)
15. Fletcher, R., Leyffer, S., Toint, Ph. L.: On the global convergence of a filter-SQP algorithm. SIAM Journal
on Optimization, 13 (1), 44–59 (2002)
16. Forsgren, A., Gill, P. E., Wright, M. H.: Interior methods for nonlinear optimization. SIAM Review, 44 (4),
525–597 (2002)
17. Gould, N. I. M., Orban, D., Sartenaer, A., Toint, Ph. L.: Superlinear convergence of primal-dual interior
point algorithms for nonlinear programming. SIAM Journal on Optimization, 11 (4), 974–1002 (2001)
18. Gould, N. I. M., Orban, D., Toint, Ph. L.: CUTEr (and SifDec), a constrained and unconstrained testing
environment, revisited. Technical Report TR/PA/01/04, CERFACS, Toulouse, France, 2001
19. Harwell Subroutine Library, AEA Technology, Harwell, Oxfordshire, England. A catalogue of subroutines
(HSL 2000), 2002
20. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York, NY, USA, 1999
21. Tits, A. L., Wächter, A., Bakhtiari, S., Urban, T. J., Lawrence, C. T.: A primal-dual interior-point method
for nonlinear programming with strong global and local convergence properties. SIAM Journal on Opti-
mization, 14 (1), 173–199 (2003)
22. Ulbrich, M., Ulbrich, S., Vicente, L. N.: A globally convergent primal-dual interior-point filter method
for nonlinear programming. Mathematical Programming, 100 (2), 379–410 (2004)
23. Vanderbei, R. J., Shanno, D. F.: An interior-point algorithm for nonconvex nonlinear programming. Com-
putational Optimization and Applications, 13, 231–252 (1999)
24. Wächter, A.: An Interior Point Algorithm for Large-Scale Nonlinear Optimization with Applications in
Process Engineering. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, January 2002
25. Wächter, A., Biegler, L. T.: Failure of global convergence for a class of interior point methods for nonlinear
programming. Mathematical Programming, 88 (2), 565–574 (2000)
26. Wächter, A., Biegler, L. T.: Line search filter methods for nonlinear programming: Motivation and global
convergence. Technical Report RC 23036, IBM T.J. Watson Research Center, Yorktown Heights, USA,
2001; revised 2004. To appear in SIAM Journal on Optimization.
27. Waltz, R. A., Morales, J. L., Nocedal, J., Orban, D.: An interior algorithm for nonlinear optimization that
combines line search and trust region steps. Technical Report OTC 6/2003, Optimization Technology
Center, Northwestern University, Evanston, IL, USA. To appear in Mathematical Programming A
28. Waltz, R. A., Nocedal, J.: KNITRO user’s manual. Technical Report OTC 2003/05, Optimization Tech-
nology Center, Northwestern University, Evanston, IL, USA, April 2003
29. Yamashita, H.: A globally convergent primal-dual interior-point method for constrained optimization.
Optimization Methods and Software, 10, 443–469 (1998)
30. Yamashita, H., Yabe, H., Tanabe, T.: A globally and superlinearly convergent primal-dual interior point
trust region method for large scale constrained optimization. Technical report, Mathematical System
Institute, Inc., Tokyo, Japan, July 1997. Revised July 1998