Optimizing With Ga
² Institute of Applied Mathematics and Statistics, University of Würzburg, Am Hubland, 97074 Würzburg, Germany; e-mail: [email protected]
³ Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan; e-mail: [email protected], [email protected]
¹ This research was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Sports, and Culture of Japan.
Abstract. We consider the problem of finding a solution of a constrained (and not nec-
essarily square) system of equations, i.e., we consider systems of nonlinear equations and
want to find a solution that belongs to a certain feasible set. To this end, we present two
Levenberg-Marquardt-type algorithms that differ in the way they compute their search di-
rections. The first method solves a strictly convex minimization problem at each iteration,
whereas the second one solves only one system of linear equations in each step. Both meth-
ods are shown to converge locally quadratically under an error bound assumption that is
much weaker than the standard nonsingularity condition. Both methods can be globalized
in an easy way. Some numerical results for the second method indicate that the algorithm
works quite well in practice.
1 Introduction
In this paper we consider the problem of finding a solution of the constrained system of
nonlinear equations
F (x) = 0, x ∈ X, (1)
where X ⊆ Rn is a nonempty, closed and convex set and F : O → Rm is a given mapping
defined on an open neighbourhood O of the set X. Note that the dimensions n and m do
not necessarily coincide. We denote by X ∗ the set of solutions to (1).
The solution of an unconstrained square system of nonlinear equations, where X = Rn and n = m in (1), is a classical problem in mathematics for which many well-known
solution techniques are available, such as Newton’s method, quasi-Newton methods, Gauss-Newton methods, and Levenberg-Marquardt methods; see, e.g., [19, 4, 14] for three standard books on this subject.
The solution of a constrained (and possibly nonsquare) system of equations like problem
(1), however, has not been the subject of intense research. In fact, the authors are currently
only aware of the recent papers [9, 15, 12, 13, 18, 23, 22, 1, 20] that deal with constrained
(typically box constrained) systems of equations. Most of these papers describe algorithms
that have certain global and local fast convergence properties under a nonsingularity as-
sumption at the solution.
The nonsingularity assumption implies that the solution is locally unique. Here we
present some Levenberg-Marquardt-type algorithms that are locally quadratically conver-
gent under a weaker assumption that, in particular, allows the solution set to be (locally)
nonunique. To this end, we replace the nonsingularity assumption by an error bound con-
dition. This is motivated by the recent paper [24] that deals with unconstrained equations
only. See also [3, 7] for some subsequent related results for the unconstrained case.
On the other hand, the possibility of dealing with constrained equations is very important.
In fact, systems of nonlinear equations arising in several applications are often constrained.
For example, in chemical equilibrium systems (see, e.g., [16, 17]), the variables correspond to the concentrations of certain elements, which are naturally nonnegative. Furthermore, in many economic equilibrium problems, the mapping F is not defined everywhere (see, e.g., [6]), so that one is compelled to impose suitable constraints on the variables. Finally, engineers often
have a good guess regarding the area where they expect their solution to lie; such a priori
knowledge can then easily be incorporated by adding suitable constraints to the system of
equations.
The organization of this paper is as follows: Section 2 describes a constrained Levenberg-
Marquardt method for the solution of problem (1). It is shown that this method has some nice
local convergence properties under fairly mild assumptions. We also note that the method
can be globalized quite easily. The main disadvantage of this method is that it has to solve
relatively complicated subproblems at each iteration, namely (strictly convex) quadratic pro-
grams in the special case where the set X is polyhedral, and convex minimization problems
in the general case.
In order to avoid this drawback, we present a variant of the constrained Levenberg-
Marquardt method in Section 3 (called the projected Levenberg-Marquardt method) that
solves only a system of linear equations per iteration. This method is shown to have essentially the same local (and global) convergence properties as the method of Section 2.
Numerical results for this method are presented in Section 4. We conclude the paper with
some remarks in Section 5.
The notation used in this paper is standard: The Euclidean norm is denoted by ‖·‖, Bδ(x) := {y ∈ Rn | ‖y − x‖ ≤ δ} is the closed ball centered at x with radius δ > 0, dist(y, X∗) := inf{‖y − x‖ | x ∈ X∗} denotes the distance from a point y to the solution set X∗, and PX(x) is the projection of a point x ∈ Rn onto the feasible set X.
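Both methods below make repeated use of the projection PX. For box constraints, which is the setting of the numerical experiments in Section 4, the projection reduces to componentwise clipping; the following small Python sketch (our own illustration, not part of the original text) makes this explicit.

import numpy as np

def project_box(x, lower, upper):
    # Projection P_X(x) onto the box X = {y : lower <= y <= upper}.
    # For a general closed convex X, P_X(x) is the unique minimizer of
    # ||y - x|| over y in X; for a box this is componentwise clipping.
    return np.minimum(np.maximum(x, lower), upper)

# Example: project the point (2, -3) onto the box [0, 1] x [0, 1].
print(project_box(np.array([2.0, -3.0]), 0.0, 1.0))  # -> [1. 0.]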
Algorithm 2.1 (Constrained Levenberg-Marquardt Method: Local Version)
(S.2) Choose Hk ∈ Rm×n, set µk := µ‖F(xk)‖², and compute dk as the solution of (4).
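In the special case where X is a box, subproblem (4), i.e., the minimization of θk(d) = ‖F(xk) + Hk d‖² + µk‖d‖² subject to xk + d ∈ X, is a bound-constrained linear least-squares problem. The following Python sketch of one step is our own illustration under this box assumption; it uses SciPy's lsq_linear solver, and all names are ours.

import numpy as np
from scipy.optimize import lsq_linear

def constrained_lm_step(x, F_val, H, mu_bar, lower, upper):
    # One step of Algorithm 2.1 for X = {y : lower <= y <= upper}.
    # Stacking sqrt(mu_k) * I below H realizes the regularization term
    # mu_k ||d||^2, so that (4) becomes min ||A d - b||^2 over a box.
    n = H.shape[1]
    mu_k = mu_bar * np.dot(F_val, F_val)           # mu_k := mu * ||F(x^k)||^2
    A = np.vstack([H, np.sqrt(mu_k) * np.eye(n)])
    b = np.concatenate([-F_val, np.zeros(n)])
    d = lsq_linear(A, b, bounds=(lower - x, upper - x)).x
    return x + d                                    # x^{k+1} := x^k + d^k

Even in this favourable case, a quadratic program has to be solved per iteration, which is the main disadvantage mentioned in the introduction.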
Note that the algorithm is well-defined and that all iterates xk belong to the feasible set
X. To establish our (local) convergence results for Algorithm 2.1, we need the following
assumptions.
Assumption 2.2 The solution set X ∗ of problem (1) is nonempty. For some solution x∗ ∈
X ∗ , there exist constants δ > 0, c1 > 0, c2 > 0 and L > 0 such that the following inequalities
hold:
Assumption (6) is a local error bound condition and is known to be much weaker than the more
standard nonsingularity of the Jacobian F 0 (x∗ ) in the case where this Jacobian exists and is
a square matrix (i.e., if F is differentiable and n = m). For example, this local error bound
condition is satisfied when F is affine and X is polyhedral. To see this, let F (x) = Ax + a
and X = {x | Bx ≤ b} with appropriate matrices A, B and vectors a, b. Due to Hoffman’s
[11] famous error bound result, there exists τ > 0 such that
τ dist(x, X∗) ≤ ‖F(x)‖ for all x ∈ X.
2.2 Local Convergence of the Distance Function
Throughout this subsection, we suppose that Assumption 2.2 holds. The constants δ, c1 , c2 ,
and L that appear in the subsequent analysis are always the constants from Assumption 2.2.
Our aim is to show that Algorithm 2.1 is locally quadratically convergent in the sense that
the distance from the iterates xk to the solution set X ∗ goes down to zero with a quadratic
rate. In order to verify this result, we need to prove a couple of technical lemmas. These
lemmas can be derived by suitable modifications of the corresponding unconstrained results
in [24].
Lemma 2.3 There exist constants c3 > 0 and c4 > 0 such that the following inequalities
hold for each xk ∈ Bδ/2 (x∗ ) ∩ X:
(a) ‖dk‖ ≤ c3 dist(xk, X∗).
(b) ‖F(xk) + Hk dk‖ ≤ c4 dist(xk, X∗)².
Proof. (a) Let x̄k ∈ X∗ denote the closest solution to xk, so that

‖xk − x̄k‖ = dist(xk, X∗). (10)

Since dk is the global minimizer of subproblem (4) and xk + d̄k ∈ X holds for the vector d̄k := x̄k − xk, we have

θk(dk) ≤ θk(d̄k) = θk(x̄k − xk). (11)
Furthermore, since xk ∈ Bδ/2(x∗) by assumption, we obtain

‖x̄k − x∗‖ ≤ ‖x̄k − xk‖ + ‖xk − x∗‖ ≤ ‖x∗ − xk‖ + ‖xk − x∗‖ ≤ δ,

so that x̄k ∈ Bδ(x∗) ∩ X. Moreover, the definition of µk in Algorithm 2.1 together with (6) and (10) gives

µk = µ‖F(xk)‖² ≥ µc1² dist(xk, X∗)² = µc1² ‖xk − x̄k‖². (12)
Using (10), (11), (12) and (7), we obtain from the definition of the function θk in (5) that

‖dk‖² ≤ (1/µk) θk(dk)
≤ (1/µk) θk(x̄k − xk)
= (1/µk) ‖F(xk) + Hk(x̄k − xk)‖² + ‖x̄k − xk‖²
= (1/µk) ‖F(xk) − F(x̄k) − Hk(xk − x̄k)‖² + ‖x̄k − xk‖²
≤ (1/µk) c2² ‖xk − x̄k‖⁴ + ‖xk − x̄k‖²
≤ (c2²/(µc1²)) ‖xk − x̄k‖² + ‖xk − x̄k‖²
= (c2²/(µc1²) + 1) dist(xk, X∗)²,

where the second equality uses F(x̄k) = 0. Therefore, statement (a) holds with c3 := √(c2²/(µc1²) + 1).

(b) Using (11), the definition of θk in (5), (7), (8), and (10), we obtain

‖F(xk) + Hk dk‖² ≤ θk(dk)
≤ θk(x̄k − xk)
= ‖F(xk) + Hk(x̄k − xk)‖² + µk ‖x̄k − xk‖²
≤ c2² ‖xk − x̄k‖⁴ + µ ‖F(xk)‖² ‖x̄k − xk‖²
≤ (c2² + µL²) ‖xk − x̄k‖⁴
= (c2² + µL²) dist(xk, X∗)⁴.

Hence statement (b) holds with c4 := √(c2² + µL²). □
The next result is a major step in verifying local quadratic convergence of the distance
function.
Lemma 2.4 Assume that both xk−1 and xk belong to the ball Bδ/2 (x∗ ) for each k ∈ N. Then
there is a constant c5 > 0 such that
dist(xk , X ∗ ) ≤ c5 dist(xk−1 , X ∗ )2
for each k ∈ N.
Proof. Since xk, xk−1 ∈ Bδ/2(x∗) and xk = xk−1 + dk−1, we obtain from (7) that

‖F(xk−1 + dk−1) − (F(xk−1) + Hk−1 dk−1)‖ ≤ c2 ‖dk−1‖².

Using the error bound assumption (6) and Lemma 2.3, we therefore obtain

c1 dist(xk, X∗) ≤ ‖F(xk)‖
= ‖F(xk−1 + dk−1)‖
≤ ‖F(xk−1) + Hk−1 dk−1‖ + c2 ‖dk−1‖²
≤ c4 dist(xk−1, X∗)² + c2 c3² dist(xk−1, X∗)²
= (c4 + c2 c3²) dist(xk−1, X∗)²,

and this completes the proof by setting c5 := (c4 + c2 c3²)/c1. □
The next result shows that the assumption of Lemma 2.4 is satisfied if the starting point x0 in Algorithm 2.1 is chosen sufficiently close to the solution set X∗. Let

r := min{ δ/(2(1 + 2c3)), 1/(2c5) }. (15)
Lemma 2.5 Assume that the starting point x0 ∈ X used in Algorithm 2.1 belongs to the ball
Br (x∗ ), where r is defined by (15). Then all iterates xk generated by Algorithm 2.1 belong
to the ball Bδ/2 (x∗ ).
Proof. The proof is by induction on k. We start with k = 0. By assumption, we have x0 ∈ Br(x∗). Since r ≤ δ/2, this implies x0 ∈ Bδ/2(x∗). Now let k ≥ 0 be arbitrarily given and assume that xl ∈ Bδ/2(x∗) for all l = 0, . . . , k. In order to show that xk+1 belongs to Bδ/2(x∗), first note that

‖xk+1 − x∗‖ = ‖xk + dk − x∗‖
≤ ‖xk − x∗‖ + ‖dk‖
= ‖xk−1 + dk−1 − x∗‖ + ‖dk‖
≤ ‖xk−1 − x∗‖ + ‖dk−1‖ + ‖dk‖
⋮
≤ ‖x0 − x∗‖ + ∑_{l=0}^{k} ‖dl‖
≤ r + c3 ∑_{l=0}^{k} dist(xl, X∗),
where the last inequality follows from Lemma 2.3. Since Lemma 2.4 implies

dist(xl, X∗) ≤ c5 dist(xl−1, X∗)², l = 1, . . . , k,

we have

dist(xl, X∗) ≤ c5 dist(xl−1, X∗)²
≤ c5 c5² dist(xl−2, X∗)⁴
⋮
≤ c5 c5² · · · c5^{2^{l−1}} dist(x0, X∗)^{2^l}
= c5^{2^l − 1} dist(x0, X∗)^{2^l}
≤ c5^{2^l − 1} ‖x0 − x∗‖^{2^l}
≤ c5^{2^l − 1} r^{2^l},
for all l = 0, . . . , k. Using r ≤ 1/(2c5), we therefore get

‖xk+1 − x∗‖ ≤ r + c3 ∑_{l=0}^{k} c5^{2^l − 1} r^{2^l}
= r + c3 r ∑_{l=0}^{k} (c5 r)^{2^l − 1}
≤ r + c3 r ∑_{l=0}^{k} (1/2)^{2^l − 1}
≤ r + c3 r ∑_{l=0}^{∞} (1/2)^{l}
= (1 + 2c3) r
≤ δ/2,

where the last inequality follows from the definition (15) of r. This completes the induction. □
We now obtain the following quadratic convergence result for the distance function as an
immediate consequence of Lemmas 2.4 and 2.5.
Theorem 2.6 Let Assumption 2.2 be satisfied and {xk } be a sequence generated by Algo-
rithm 2.1 with starting point x0 ∈ Br (x∗ ), where r is defined by (15). Then the sequence
{dist(xk , X ∗ )} converges to zero quadratically, i.e., the iterates xk approach the solution set
X ∗ locally quadratically.
Theorem 2.6 is the main result in this subsection and shows that the constrained Levenberg-
Marquardt method of Algorithm 2.1 is locally quadratically convergent under fairly mild
assumptions.
Theorem 2.7 Let Assumption 2.2 be satisfied and {xk } be a sequence generated by Algo-
rithm 2.1 with starting point x0 ∈ Br (x∗ ), where r is defined by (15). Then the sequence
{xk } converges to a solution x̄ of (1) belonging to the ball Bδ/2 (x∗ ).
Proof. Since the entire sequence {xk } remains in the closed ball Bδ/2 (x∗ ) by Lemma 2.5,
every limit point of this sequence belongs to this set, too. Hence it remains to show that the
sequence {xk } converges. To this end, we first note that, for any positive integers k and m
such that k > m, we have
In order to prove that the sequence {xk } converges locally quadratically, we need some
further preparatory results.
Lemma 2.8 Let x0 ∈ Br (x∗ ) and {xk } be a sequence generated by Algorithm 2.1. Then
there is a constant c6 > 0 such that
dist(xk, X∗) ≤ c6 ‖dk‖
for all k ∈ N sufficiently large. Letting x̄k+1 ∈ X ∗ denote the closest solution to xk+1 , we
then obtain
The next result shows that the length of the search direction dk goes down to zero locally
quadratically.
Lemma 2.9 Let x0 ∈ Br(x∗) and {xk} be a sequence generated by Algorithm 2.1. Then there is a constant c7 > 0 such that

‖dk+1‖ ≤ c7 ‖dk‖²

for all k ∈ N sufficiently large.

Proof. Using Lemmas 2.3, 2.4 and 2.8, we obtain

‖dk+1‖ ≤ c3 dist(xk+1, X∗)
≤ c3 c5 dist(xk, X∗)²
≤ c3 c5 c6² ‖dk‖²

for all k ∈ N sufficiently large. Setting c7 := c3 c5 c6² gives the desired result. □
We next show that the length of the search direction dk is eventually of the same order as the distance from the current iterate xk to the limit point x̄ of the sequence {xk}.
Lemma 2.10 Let x0 ∈ Br (x∗ ) and {xk } be a sequence generated by Algorithm 2.1 and
converging to x̄. Then there exist constants c8 > 0 and c9 > 0 such that
for all k ∈ N. In order to verify the left inequality, let k ∈ N be sufficiently large so that
Lemma 2.9 applies and
c7 ‖dk‖ ≤ 1

holds. Without loss of generality, we may also assume that

‖dk+1‖ ≤ (1/2) ‖dk‖
holds. We can then apply Lemma 2.9 successively to obtain
‖dk+2‖ ≤ c7 ‖dk+1‖² ≤ (1/2)² c7 ‖dk‖² ≤ (1/2)² ‖dk‖,
‖dk+3‖ ≤ c7 ‖dk+2‖² ≤ (1/2)⁴ c7 ‖dk‖² ≤ (1/2)³ ‖dk‖,
‖dk+4‖ ≤ c7 ‖dk+3‖² ≤ (1/2)⁶ c7 ‖dk‖² ≤ (1/2)⁴ ‖dk‖,
⋮

i.e.,

‖dk+j‖ ≤ (1/2)^j ‖dk‖ for all j = 0, 1, 2, . . .
Since

xk+l = xk + ∑_{j=0}^{l−1} dk+j

and

x̄ = lim_{l→∞} xk+l,
we therefore get
As a consequence of the previous lemmas, we now obtain our main local convergence result
of this subsection.
Theorem 2.11 Let Assumption 2.2 be satisfied and {xk } be a sequence generated by Al-
gorithm 2.1 with starting point x0 ∈ Br (x∗ ) and limit point x̄. Then the sequence {xk }
converges locally quadratically to x̄.
Proof. Using Lemmas 2.9 and 2.10, we immediately obtain
for all k ∈ N sufficiently large. This shows that {xk } converges locally quadratically to the
limit point x̄. 2
(S.2) Choose Hk ∈ Rm×n, set µk := µ‖F(xk)‖², and compute dk as the solution of (4).
(S.3) If
‖F(xk + dk)‖ ≤ γ‖F(xk)‖, (16)
then set xk+1 := xk + dk , k ← k + 1, and go to (S.1); otherwise go to (S.4).
where xk(t) := PX[xk − t∇f(xk)]. Set xk+1 := xk(tk), k ← k + 1, and go to (S.1).
The convergence properties of Algorithm 2.12 are summarized in the following theorem.
Theorem 2.13 Let {xk } be a sequence generated by Algorithm 2.12. Then any accumulation
point of this sequence is a stationary point of (2). Moreover, if an accumulation point x∗
of the sequence {xk } is a solution of (1) and Assumption 2.2 is satisfied at this point, then
the entire sequence {xk } converges to x∗ , the rate of convergence is locally quadratic, and the
sequence {dist(xk , X ∗ )} also converges locally quadratically.
Based on our previous results, the proof can be carried out in exactly the same way as that
of Theorem 3.1 in [24]. We therefore skip the details here.
xk+1 := PX(xk + dkU), k = 0, 1, . . . ,

where dkU is the unique solution of the unconstrained (hence the subscript ‘U’) subproblem

min θk(dU), dU ∈ Rn.
We call this the projected Levenberg-Marquardt method since the unconstrained step gets projected onto the feasible region X. Note that, whenever the projection can be carried out efficiently (as in the box constrained case), this method requires significantly less work per iteration, since the strict convexity of the function θk ensures that dkU is a global minimizer of this function if and only if ∇θk(dkU) = 0, i.e., if and only if dkU is the unique solution of the system of linear equations

(HkᵀHk + µk I) dU = −Hkᵀ F(xk). (17)
(S.2) Choose Hk ∈ Rm×n, set µk := µ‖F(xk)‖², and compute dkU as the solution of (17).
Note that the algorithm is well-defined since the coefficient matrix in (17) is always symmetric
positive definite. Furthermore, all iterates xk belong to the feasible set X.
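For illustration, here is a minimal Python sketch of one iteration of Algorithm 3.1; the names and the concluding example are our own choices, but the computation follows (17) and the projected update, with the Jacobian taken for Hk.

import numpy as np

def projected_lm_step(F, H, x, mu_bar, project):
    # One iteration of the projected LM method: d_U solves the linear
    # system (17), i.e. it is the unconstrained minimizer of theta_k,
    # and the trial point x^k + d_U is then projected onto X.
    Fx = F(x)
    Hk = H(x)                                   # e.g. the Jacobian F'(x^k)
    mu_k = mu_bar * np.dot(Fx, Fx)              # mu_k := mu * ||F(x^k)||^2
    M = Hk.T @ Hk + mu_k * np.eye(x.size)       # symmetric positive definite
    d_U = np.linalg.solve(M, -Hk.T @ Fx)
    return project(x + d_U)                     # x^{k+1} := P_X(x^k + d_U^k)

# Example from this section: F(x) = ||x|| - 1 on X = [-1,1] x [-1,0].
F = lambda x: np.array([np.linalg.norm(x) - 1.0])
H = lambda x: (x / np.linalg.norm(x)).reshape(1, -1)
box = lambda y: np.clip(y, np.array([-1.0, -1.0]), np.array([1.0, 0.0]))
x = np.array([0.4, -0.2])
for _ in range(8):
    x = projected_lm_step(F, H, x, mu_bar=1.0, project=box)
print(x, F(x))  # x approaches a point on the lower unit semicircle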
The following assumption is supposed to hold throughout this section.
Assumption 3.2 The solution set X ∗ of problem (1) is nonempty. For some solution x∗ ∈
X ∗ , there exist constants ε > 0, κ1 > 0, κ2 > 0 and L > 0 such that the following inequalities
hold:
We tacitly assume that the constant ε > 0 in Assumption 3.2 is taken sufficiently small so
that the mapping F is defined in the entire ball Bε (x∗ ). Note that this is always possible
since F is assumed to be defined on an open set O containing the feasible region X.
Apart from this, the only difference between Assumptions 2.2 and 3.2 lies in the fact
that we now assume that the three conditions (18)–(20) hold in the entire ball Bε (x∗ ),
whereas before it was only assumed that the corresponding conditions (6)–(8) hold in the
intersection Bδ (x∗ ) ∩ X. The reason for this slight modification is that we sometimes have
to apply conditions (18)–(20) to the vector xk + dkU that may lie outside X.
Without the restriction on X, condition (18) is more restrictive than the corresponding
condition (6). Whenever there exists a point x such that F (x) = 0 and x 6∈ X, (18) may
fail even if F is affine and X is polyhedral. Nevertheless, condition (18) is still significantly
weaker than the nonsingularity of the Jacobian of F . To see this, consider the example with
F : R2 → R and X ⊆ R2 being defined by

F(x) = √(x1² + x2²) − 1

and

X = { x | −1 ≤ x1 ≤ 1, −1 ≤ x2 ≤ 0 },
respectively. Note that the solution set of F (x) = 0 without the constraint is the unit circle,
while the solution set of the constrained equation F (x) = 0, x ∈ X, is the lower half of the
unit circle. By substituting x := (r cos θ, r sin θ) with r ≥ 0, we have |F (x)| = |r − 1|. It is
easy to see that dist(x, X ∗ ) = |r − 1| when x is an interior point of X. Therefore (18) holds
on the interior of X. However, when x∗ = (−1, 0)T , which is a boundary point of X, (18)
fails since F (x) = 0 but dist(x, X ∗ ) > 0 for any x such that r = 1 and 0 < θ < π. On the
other hand, when x∗ = (0, −1)T , which is also a boundary point of X, (18) is satisfied for
sufficiently small ε > 0.
Lemma 3.3 There exist constants κ3 > 0 and κ4 > 0 such that the following inequalities
hold for each xk ∈ Bε/2 (x∗ ):
(a) ‖dkU‖ ≤ κ3 dist(xk, X∗).
We next state the counterpart of Lemma 2.4. Note, however, that the vector xk−1 + dk−1U is no longer equal to the next iterate xk in the method considered here. Hence the assumption in the following result is somewhat different from the assumption in the corresponding result in Lemma 2.4.

Lemma 3.4 Assume that both xk−1 and xk−1 + dk−1U belong to the ball Bε/2(x∗) for each
k ∈ N. Then there is a constant κ5 > 0 such that
dist(xk , X ∗ ) ≤ κ5 dist(xk−1 , X ∗ )2
for each k ∈ N.
Proof. The definition of xk and the nonexpansiveness of the projection operator imply that

κ1 dist(xk, X∗) = κ1 dist(PX(xk−1 + dk−1U), X∗)
= κ1 inf_{x̄ ∈ X∗} ‖PX(xk−1 + dk−1U) − x̄‖
= κ1 inf_{x̄ ∈ X∗} ‖PX(xk−1 + dk−1U) − PX(x̄)‖
≤ κ1 inf_{x̄ ∈ X∗} ‖xk−1 + dk−1U − x̄‖ (21)
= κ1 dist(xk−1 + dk−1U, X∗)
≤ ‖F(xk−1 + dk−1U)‖,

where the last inequality follows from (18) together with our assumption that xk−1 + dk−1U ∈ Bε/2(x∗). Now, using (19) as well as xk−1, xk−1 + dk−1U ∈ Bε/2(x∗), we have

‖F(xk−1 + dk−1U) − (F(xk−1) + Hk−1 dk−1U)‖ ≤ κ2 ‖dk−1U‖². (22)
The next result is the counterpart of Lemma 2.5 and states that the assumptions in Lemma 3.4 are satisfied if the starting point x0 is chosen sufficiently close to the solution set. Let

r := min{ ε/(2(1 + 2κ3)), 1/(2κ5) }. (23)
Lemma 3.5 Assume that the starting point x0 ∈ X used in Algorithm 3.1 belongs to the
ball Br (x∗ ), where x∗ denotes a solution of (1) satisfying Assumption 3.2 and r is defined
by (23). Then
xk−1, xk−1 + dk−1U ∈ Bε/2(x∗)
holds for all k ∈ N.
Proof. The proof is by induction on k. We start with k = 1. By assumption, we have
x0 ∈ Br (x∗ ). Since r ≤ ε/2, this implies x0 ∈ Bε/2 (x∗ ). Furthermore, we obtain from Lemma
3.3
To see that xk + dkU ∈ Bε/2 (x∗ ), first note that
where the last inequality follows from Lemma 3.3. Using Lemma 3.4, the induction can then
be completed by following the arguments in the proof of Lemma 2.5. 2
We are now able to state our main local convergence result of this subsection. It is an
immediate consequence of Lemmas 3.4 and 3.5.
Theorem 3.6 Let Assumption 3.2 be satisfied and {xk } be a sequence generated by Algo-
rithm 3.1 with starting point x0 ∈ Br (x∗ ), where r is defined by (23). Then the sequence
{dist(xk , X ∗ )} converges to zero locally quadratically.
Theorem 3.7 Let Assumption 3.2 be satisfied and {xk } be a sequence generated by Algo-
rithm 3.1 with starting point x0 ∈ Br (x∗ ), where r is defined by (23). Then the sequence
{xk } converges to a solution x̄ of (1) belonging to the ball Bε/2 (x∗ ).
Proof. Similar to the proof of Theorem 2.7, we verify that {xk} is a Cauchy sequence. Indeed, for any integers k and m such that k > m, we have

‖xk − xm‖ = ‖PX(xk−1 + dk−1U) − PX(xm)‖
≤ ‖xk−1 + dk−1U − xm‖
≤ ‖xk−1 − xm‖ + ‖dk−1U‖
= ‖PX(xk−2 + dk−2U) − PX(xm)‖ + ‖dk−1U‖
≤ ‖xk−2 + dk−2U − xm‖ + ‖dk−1U‖
≤ ‖xk−2 − xm‖ + ‖dk−2U‖ + ‖dk−1U‖
⋮
≤ ∑_{l=m}^{k−1} ‖dlU‖
≤ ∑_{l=m}^{∞} ‖dlU‖.
We next want to show that the sequence {xk } is locally quadratically convergent. To this
end, we begin with the following preliminary result.
Lemma 3.8 Let x0 ∈ Br (x∗ ) and {xk } be a sequence generated by Algorithm 3.1. Then
there is a constant κ6 > 0 such that
dist(xk, X∗) ≤ κ6 ‖dkU‖
for all k ∈ N sufficiently large.
Proof. The proof is a modification of that of Lemma 2.8. First note that Theorem 3.6 implies that

dist(xk+1, X∗) ≤ (1/2) dist(xk, X∗)

for all k ∈ N sufficiently large. Let x̄k+1 be the closest solution to xk+1, i.e.,

dist(xk+1, X∗) = ‖xk+1 − x̄k+1‖.

Then we obtain from the nonexpansiveness of the projection operator

‖dkU‖ = ‖xk + dkU − xk‖
≥ ‖PX(xk + dkU) − PX(xk)‖
= ‖xk+1 − xk‖
≥ ‖x̄k+1 − xk‖ − ‖xk+1 − x̄k+1‖
≥ dist(xk, X∗) − dist(xk+1, X∗)
≥ dist(xk, X∗) − (1/2) dist(xk, X∗)
= (1/2) dist(xk, X∗)

for all k ∈ N large enough, i.e., the assertion holds with κ6 := 2. □
The next result shows that the length of the unconstrained search direction dkU goes down
to zero locally quadratically.
Lemma 3.9 Let x0 ∈ Br(x∗) and {xk} be a sequence generated by Algorithm 3.1. Then there is a constant κ7 > 0 such that

‖dk+1U‖ ≤ κ7 ‖dkU‖²

for all k ∈ N sufficiently large.

Proof. Using Lemmas 3.3, 3.4 and 3.8, we obtain

‖dk+1U‖ ≤ κ3 dist(xk+1, X∗)
≤ κ3 κ5 dist(xk, X∗)²
≤ κ3 κ5 κ6² ‖dkU‖²

for all k ∈ N sufficiently large. The desired result then follows by setting κ7 := κ3 κ5 κ6². □
We next state the counterpart of Lemma 2.10 that relates the length of dkU with the distance
from the iterates xk to their limit point x̄.
Lemma 3.10 Let x0 ∈ Br (x∗ ) and {xk } be a sequence generated by Algorithm 3.1 and
converging to x̄. Then there exist constants κ8 > 0 and κ9 > 0 such that
Since

x̄ = lim_{l→∞} xk+l,
Since this holds for an arbitrary (sufficiently large) k ∈ N, we obtain the desired result by
setting κ8 := 1/2. □
Using Lemmas 3.9 and 3.10, we get the following local convergence result for the iterates xk
in exactly the same way as in the proof of the corresponding Theorem 2.11.
Theorem 3.11 Let Assumption 3.2 be satisfied and {xk } be a sequence generated by Al-
gorithm 3.1 with starting point x0 ∈ Br (x∗ ) and limit point x̄. Then the sequence {xk }
converges locally quadratically to x̄.
Hence it turns out that the projected Levenberg-Marquardt method of Algorithm 3.1 has
essentially the same local convergence properties as the constrained Levenberg-Marquardt
method of Algorithm 2.1.
(S.3) If
‖F(PX(xk + dkU))‖ ≤ γ‖F(xk)‖, (24)
then set xk+1 := PX (xk + dkU ), k ← k + 1, and go to (S.1); otherwise go to (S.4).
where xk(t) := PX[xk − t∇f(xk)]. Set xk+1 := xk(tk), k ← k + 1, and go to (S.1).
Algorithm 3.12 has the advantage of having simpler subproblems than Algorithm 2.12. How-
ever, this advantage is realized only if the projections onto the feasible set X can be computed
in a convenient manner, which is particularly the case when X is described by some box
constraints.
Based on our previous results, it is not difficult to see that the counterpart of Theorem
2.13 also holds for Algorithm 3.12. We skip the details here.
4 Numerical Results
We have implemented Algorithm 3.12 in MATLAB and tested it on a number of examples
from different areas. The implementation differs slightly from the description of Algorithm
3.12. Specifically, Algorithm 3.12 considers two types of steps only, namely Levenberg-
Marquardt and projected gradient steps, whereas our implementation uses the following
three types of steps:
• LM-step (Levenberg-Marquardt step): This is used when the descent condition (24) is
satisfied, i.e., (S.3) is carried out.
• LS-step (Line Search step): This step occurs if condition (24) is not satisfied but the search direction sk := PX(xk + dkU) − xk is a descent direction for f in the sense that ∇f(xk)T sk ≤ −ρ‖sk‖^p for some constants ρ > 0 and p > 1. We then use an Armijo-type line search to reduce f along the direction sk.
• PG-step (Projected Gradient step): If neither an LM-step nor an LS-step can be used,
we apply a projected gradient step as described in (S.4) of Algorithm 3.12.
It is easy to see that this modification does not change the local and global convergence
properties of Algorithm 3.12.
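The following Python sketch assembles these three step types into a single loop in the spirit of Algorithm 3.12. It is our own reconstruction, not the actual test code: the descent test (24) and the general form of the steps come from the text above, whereas the concrete constants (σ, ρ, p, and the initial step size) are placeholder choices.

import numpy as np

def globalized_projected_lm(F, H, x, project, mu_bar=1.0, gamma=0.9,
                            beta=0.9, sigma=1e-4, rho=1e-8, p=2.1,
                            eps=1e-5, k_max=100, t_min=1e-12):
    # Merit function f(x) = 0.5 * ||F(x)||^2 with gradient H(x)^T F(x).
    f = lambda y: 0.5 * float(np.dot(F(y), F(y)))
    for _ in range(k_max):
        Fx, Hk = F(x), H(x)
        if np.linalg.norm(Fx) <= eps:
            break                                # approximate solution found
        grad = Hk.T @ Fx
        mu_k = mu_bar * float(np.dot(Fx, Fx))
        d_U = np.linalg.solve(Hk.T @ Hk + mu_k * np.eye(x.size), -grad)
        trial = project(x + d_U)
        if np.linalg.norm(F(trial)) <= gamma * np.linalg.norm(Fx):
            x = trial                            # LM-step: descent test (24) holds
            continue
        s = trial - x                            # candidate search direction
        if float(grad @ s) <= -rho * np.linalg.norm(s) ** p:
            t = 1.0                              # LS-step: Armijo search along s
            while f(x + t * s) > f(x) + sigma * t * float(grad @ s) and t > t_min:
                t *= beta
            x = x + t * s
        else:
            t = 1.0                              # PG-step: projected gradient search
            xt = project(x - t * grad)
            while f(xt) > f(x) + sigma * float(grad @ (xt - x)) and t > t_min:
                t *= beta
                xt = project(x - t * grad)
            x = xt
    return x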
The parameters used for our test runs are
with
ε = 10−5 , kmax = 100 and tmin = 10−12 .
The computational results obtained with these parameters are shown in Tables 1–6.
Tables 1 and 2 give the results for some square systems of equations. All these systems
have some bound constraints. For example, many of the test examples come from chemical
equilibrium problems where the components of the vector x correspond to chemical concen-
trations, so that these problems have some nonnegativity constraints. Other examples are
obtained from complementarity problems
G(x) − y = 0, x ≥ 0, y ≥ 0, xi yi = 0 ∀i.
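Such a problem fits the pattern (1) directly: stacking z = (x, y), the residuals G(x) − y and the products xi yi form F, and X is the nonnegative orthant. A small Python sketch of this reformulation (our own construction, not taken from the paper):

import numpy as np

def complementarity_as_constrained_system(G, n):
    # Recast G(x) - y = 0, x >= 0, y >= 0, x_i y_i = 0 for all i as
    # F(z) = 0, z in X, with z = (x, y) and X = {z in R^{2n} : z >= 0}.
    def F(z):
        x, y = z[:n], z[n:]
        return np.concatenate([G(x) - y, x * y])
    lower = np.zeros(2 * n)              # X is the nonnegative orthant
    upper = np.full(2 * n, np.inf)
    return F, lower, upper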
Also some convex optimization problems are solved by applying the algorithm to the corre-
sponding KKT conditions.
The starting point taken for all test examples is the vector of lower bounds except for
those examples which arise from complementarity or optimization problems. For the latter
problems we used the standard starting point from the literature (filled with zero Lagrange
multipliers).
The columns in Table 1 contain the name of the test problem (together with a hint to the
literature that, however, is usually not the original reference for that particular example),
the dimension n(= m) of this example, the number of iterations, the number of LM-, LS-
and PG-steps, the number of function evaluations as well as the final value of the merit
function f . Table 2 has a similar structure except that the first column gives the value of a
parameter for the particular problem (we use all three different parameters given in [8]).
Table 3 states the results obtained for some underdetermined systems taken from [3].
The columns have a similar meaning to those of Table 1 except that we added one more
column that gives the dimension m of the corresponding (nonsquare) system.
Finally, Tables 4–6 contain numerical results for some parameter-dependent problems
where the starting point of a problem is equal to the solution of the previous problem, i.e.,
we apply Algorithm 3.12 in the framework of a path-following method. Note, however, that
the dependence of these problems on the corresponding parameters might be nonsmooth,
e.g., the number of (known) solutions in the example given in Table 4 varies significantly
with the values of parameters.
To summarize the results shown in the tables, we were able to solve most of the test problems without any difficulties. Only in a few cases were we unable to find an approximate solution (the same is true for the method of [1], which has also been tested on many
of the examples used here). This is typically due to the fact that the step size gets too small
(except for the circuit design problem in Table 1, for which we observed convergence to a
non-optimal stationary point). For some examples, we also needed a relatively large number
of function evaluations (at least compared to the number of iterations), but this is mainly
due to the fact that the stepsize reduction factor β was chosen equal to 0.9 (both for LS-
and PG-steps).
Test problem, source n iter LM/LS/PG F-eval. f (x)
Himmelblau function, [8, 14.1.1] 2 8 8/0/0 9 1.1e-11
Equilibrium combustion, [8, 14.1.2] 5 10 6/4/0 11 5.2e-11
Bullard-Biegler system, [8, 14.1.3] 2 11 9/2/0 40 9.5e-15
Ferraris-Tronconi system, [8, 14.1.4] 2 3 3/0/0 4 8.9e-15
Brown’s almost lin. syst., [8, 14.1.5] 5 10 10/0/0 11 9.1e-16
Robot kinematics system, [8, 14.1.6] 8 5 5/0/0 6 2.1e-19
Circuit design problem, [8, 14.1.7] 9 – –/–/– – –
Chem. equil. system, [17, system 1] 11 15 13/1/1 64 6.5e-11
Chem. equil. system, [17, system 2] 5 – –/–/– – –
Combust. system (Lean case), [16] 10 7 5/2/0 99 2.0e-11
Combust. system (Rich case), [16] 10 – –/–/– – –
Kojima-Shindo problem, [6] 4 5 4/1/1 21 3.1e-13
Josephy problem, [6] 4 11 8/2/1 80 9.5e-21
Mathiesen problem, [6] 4 3 3/0/0 4 2.0e-16
Hock-Schittkowski 34, [10] 16 8 7/1/0 32 7.6e-18
Hock-Schittkowski 35, [10] 8 2 2/0/0 3 1.2e-13
Hock-Schittkowski 66, [10] 16 65 35/30/0 253 3.4e-11
Hock-Schittkowski 76, [10] 14 43 23/0/20 428 7.1e-11
Table 2: Numerical results for test problem 14.1.9 from [8] (Smith steady state temperature)
5 Final Remarks
This paper described two Levenberg-Marquardt-type methods for the solution of a con-
strained system of equations. Both methods were shown to possess a local quadratic rate
of convergence under a suitable error bound condition. This property is motivated by the recent research on unconstrained equations in [24], and appears to be stronger than the convergence properties of any other method for constrained equations known to the authors.
The globalization strategy used in this paper is quite standard and can certainly be
improved, although the numerical results indicate that the method works quite well with this
strategy. However, numerical experiments were carried out for the case of box constraints
only since otherwise the computation of the projections onto the feasible set becomes very
expensive and, in fact, dominates the overall cost of the algorithm. The question of how to
deal with a general convex set X in a numerically efficient way is still open.
Test problem, source n m iter LM/LS/PG F-eval. f (x)
Linear system, [3, Problem 2] 100 50 3 3/0/0 4 1.3e-11
Linear system, [3, Problem 2] 200 100 6 6/0/0 7 1.8e-14
Linear system, [3, Problem 2] 300 150 13 13/0/0 14 7.8e-29
Quadratic system, [3, Problem 4] 100 50 11 11/0/0 12 1.2e-11
Quadratic system, [3, Problem 4] 200 100 26 26/0/0 27 5.0e-12
Quadratic system, [3, Problem 4] 300 150 72 72/0/0 73 2.6e-15
Table 4: Numerical results for test problem 14.1.8 from [8] (CSTR)
Acknowledgment. The authors would like to thank Stefania Bellavia for sending them
some of the test problems.
References
[1] S. Bellavia, M. Macconi and B. Morini, An affine scaling trust-region approach to
bound-constrained nonlinear systems, Technical Report, Dipartimento di Energetica,
University of Florence, Italy, 2001.
[2] D.P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, MA, 1995.
c n iter LM/LS/PG F-eval. f (x)
c=0.5 100 4 4/0/0 5 4.1e-11
c=0.6 100 4 4/0/0 5 2.3e-11
c=0.7 100 5 5/0/0 6 1.3e-10
c=0.8 100 9 9/0/0 10 5.1e-11
c=0.9 100 95 3/92/0 383 1.6e-10
c=0.99 100 98 97/1/1 102 1.7e-10
Table 6: Numerical results for a chemical equilibrium problem (propane), see [5]
[4] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and
Nonlinear Equations, Prentice-Hall, Englewood Cliffs, 1983.
[6] S.P. Dirkse and M.C. Ferris, MCPLIB: A collection of nonlinear mixed complementarity
problems, Optimization Methods and Software, 5 (1995), pp. 319–345.
[7] J.Y. Fan and Y.X. Yuan, On the convergence of a new Levenberg-Marquardt method,
Technical Report, AMSS, Chinese Academy of Sciences, 2001.
[8] C.A. Floudas, P.M. Pardalos, C.S. Adjiman, W.R. Esposito, Z.H. Gumus, S.T. Harding,
J.L. Klepeis, C.A. Meyer and C.A. Schweiger, Handbook of Test Problems in Local
and Global Optimization, Nonconvex Optimization and Its Applications 33, Kluwer
Academic Publishers, 1999.
[9] S.A. Gabriel and J.-S. Pang, A trust region method for constrained nonsmooth equa-
tions, in: W.W. Hager, D.W. Hearn and P.M. Pardalos (eds.), Large Scale Optimization
– State of the Art, Kluwer Academic Publishers, 1994, pp. 155–181.
[10] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes, Lec-
ture Notes in Economics and Mathematical Systems 187, Springer, 1981.
[11] A.J. Hoffman, On approximate solutions of systems of linear inequalities, Journal of the
National Bureau of Standards, 49 (1952), pp. 263–265.
[12] C. Kanzow, An active set-type Newton method for constrained nonlinear systems, in:
M.C. Ferris, O.L. Mangasarian and J.-S. Pang (eds.): Complementarity: Applications,
Algorithms and Extensions, Kluwer Academic Publishers, 2001, pp. 179–200.
[13] C. Kanzow, Strictly feasible equation-based methods for mixed complementarity prob-
lems, Numerische Mathematik, 89 (2001), pp. 135–160.
[14] C.T. Kelley, Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia,
PA, 1995.
[15] D.N. Kozakevich, J.M. Martinez and S.A. Santos, Solving nonlinear systems of equations
with simple bounds, Journal of Computational and Applied Mathematics, 16 (1997), pp.
215–235.
[16] K. Meintjes and A.P. Morgan, A methodology for solving chemical equilibrium systems,
Applied Mathematics and Computation, 22 (1987), pp. 333–361.
[17] K. Meintjes and A.P. Morgan, Chemical equilibrium systems as numerical test problems,
ACM Transactions on Mathematical Software, 16 (1990), pp. 143–151.
[18] R.D.C. Monteiro and J.-S. Pang, A potential reduction Newton method for constrained
equations, SIAM Journal on Optimization, 9 (1999), pp. 729–754.
[19] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several
Variables, Academic Press, New York, NY, 1970.
[20] L. Qi, X.-J. Tong and D.-H. Li, An active-set projected trust region algorithm for box
constrained nonsmooth equations, Technical Report, Department of Applied Mathe-
matics, Hong Kong Polytechnic University, Hong Kong, October 2001.
[23] T. Wang, R.D.C. Monteiro and J.-S. Pang, An interior point potential reduction method
for constrained equations, Mathematical Programming, 74 (1996), pp. 159–195.