Algorithms For Solving Nonlinear Systems of Equations (*)
1. INTRODUCTION
Nonlinear systems of equations appear in many real-life problems. Moré [1989] has reported a collection of practical examples which includes: Aircraft Stability problems, Inverse Elastic Rod problems, Equations of Radiative Transfer, Elliptic Boundary Value problems, etc. We have also worked with Power Flow problems, Distribution of Water on a Pipeline, Discretization of Evolution problems using Implicit Schemes, Chemical Plant Equilibrium problems, and others. The scope of applications becomes even greater if we include the family of Nonlinear Programming problems, since the first-order optimality conditions of these problems are nonlinear systems.
Given F : ℝ^n → ℝ^n, F = (f_1, ..., f_n)^T, our aim is to find solutions of

    F(x) = 0.   (1.1)
We assume that F is well defined and has continuous partial derivatives on an open set of ℝ^n. We denote by J(x) the matrix of partial derivatives of F (the Jacobian matrix), so that

    J(x) = (∂f_i(x)/∂x_j), i, j = 1, ..., n.
(*) Work supported by FAPESP (PT Nº 90-3724-6), FAEP-UNICAMP, FINEP and CNPq.
2. NEWTON'S METHOD

Newton's method is the most widely used algorithm for solving nonlinear systems of equations. Given an initial estimation x^0 of the solution of (1.1), this method considers, at each iteration, the approximation

    L_k(x) = F(x^k) + J(x^k)(x - x^k)   (2.1)

and computes x^{k+1} as a solution of the linear system L_k(x) = 0. This solution exists and is unique if J(x^k) is nonsingular. Therefore, an iteration of Newton's method is described by

    J(x^k) s^k = -F(x^k),   (2.2)

    x^{k+1} = x^k + s^k.   (2.3)
At each iteration of Newton's method, we must compute the Jacobian J(x^k) and solve the linear system (2.2). Using modern techniques of automatic differentiation (see Rall [1984, 1987], Griewank [1992], and references therein) we can compute F(x) and J(x) in a reliable and economical way. If, instead of the true Jacobian in (2.2), we use an approximation of J(x^k) by differences, which is generally expensive, we obtain the Finite-Difference Newton's Method, whose convergence properties are very similar to those of Newton's method.
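To fix ideas, here is a minimal dense sketch (ours; all names and tolerances are illustrative) of a Newton iteration and of the finite-difference variant that replaces J(x^k) by a forward-difference approximation:

    import numpy as np

    def fd_jacobian(F, x, h=1e-7):
        # Approximate J(x) column by column with forward differences.
        n = x.size
        Fx = F(x)
        J = np.empty((n, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = h
            J[:, j] = (F(x + e) - Fx) / h
        return J

    def newton(F, x, J=None, tol=1e-10, maxit=50):
        # Newton's method (2.2)-(2.3); if no analytic Jacobian is supplied,
        # this is the Finite-Difference Newton's Method.
        for _ in range(maxit):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            Jx = J(x) if J is not None else fd_jacobian(F, x)
            x = x + np.linalg.solve(Jx, -Fx)
        return x

    # Example: F(x) = (x1^2 + x2^2 - 1, x1 - x2).
    F = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
    print(newton(F, np.array([2.0, 0.5])))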
Now, (2.2) is a linear system of equations. If n is small, this system can be solved using the LU factorization with partial pivoting or the QR factorization. See Golub and Van Loan [1989]. Using these linear solvers, the cost of solving (2.2) is O(n^3) floating point operations. If n is large this cost becomes prohibitive. However, in many situations, where the matrix J(x^k) is sparse, we can solve (2.2) using LU factorizations. In fact, many times the structure of the matrix is such that the factors L and U of its factorization are also sparse, and can be computed using a moderate amount of operations. Computer algorithms for sparse LU factorizations are surveyed in Duff, Erisman and Reid [1989]. In Gomes-Ruggiero, Martinez and Moretti [1992] we describe the first version of the NIGHTINGALE package for solving sparse nonlinear systems. In NIGHTINGALE, we use the sparse linear solver of George and Ng [1987]. The George-Ng method performs the LU factorization with partial pivoting of a sparse matrix A using a static data structure defined before beginning the numerical computations. In Newton's method we solve a sequence of linear systems with the same structure, so the symbolic phase that defines the data structure is executed only once.
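For large sparse Jacobians the same Newton step can be computed with a sparse LU factorization. The sketch below uses SciPy's splu (a SuperLU interface, not the George-Ng solver of NIGHTINGALE); unlike the static-structure approach described above, SciPy does not expose the symbolic phase for explicit reuse:

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import splu

    def sparse_newton_step(J_csc, Fx):
        # LU factorization with partial pivoting of the sparse Jacobian,
        # then a solve of (2.2).
        lu = splu(J_csc)
        return lu.solve(-Fx)

    # Tiny example with a tridiagonal Jacobian.
    n = 5
    J = sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    s = sparse_newton_step(J, np.ones(n))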
The system (2.2) has a unique solution if and only if J(x^k) is nonsingular. If the Jacobian is singular, the iteration must be modified. Moreover, if J(x^k) is nearly singular, it is also convenient to modify the iteration in order to prevent numerical instability. Many modifications are possible to keep this phenomenon controlled. In the NIGHTINGALE package, when a very small pivot, relative to the size of the matrix, occurs, it is replaced by a nonzero scalar whose modulus is sufficiently large. Moreover, nearly singular or ill-conditioned matrices usually cause very large increments s^k, so ||s^k|| must also be controlled. Computer algorithms usually normalize the stepsize by
    s^k ← min{1, M / ||s^k||} s^k,

where M is a large positive constant.
The local convergence properties of Newton's method are given by the following theorem.
Theorem 2.1. Let us assume that F : Ω ⊂ ℝ^n → ℝ^n, Ω an open and convex set, F ∈ C^1(Ω), F(x*) = 0, J(x*) nonsingular, and that there exist L, p > 0 such that for all x ∈ Ω

    ||J(x) - J(x*)|| ≤ L ||x - x*||^p.   (2.4)

Then there exists ε > 0 such that if ||x^0 - x*|| ≤ ε, the sequence {x^k} generated by (2.2)-(2.3) is well defined, converges to x*, and satisfies

    ||x^{k+1} - x*|| ≤ c ||x^k - x*||^{1+p}   (2.5)

for some c > 0.

Proof. See Ortega and Rheinboldt [1970], Dennis and Schnabel [1983], etc. □
3. QUASI-NEWTON METHODS
In this survey, we call Quasi-Newton methods those methods for solving (1.1) whose general form is

    x^{k+1} = x^k - B_k^{-1} F(x^k), k ∈ ℕ,   (3.1)

where each B_k ∈ ℝ^{n×n} is nonsingular.
Newton's method, studied in Section 2, belongs to this family, with B_k = J(x^k). Most Quasi-Newton methods use less expensive iterations than Newton's, but their convergence properties are not very different. In general, Quasi-Newton methods avoid either the necessity of computing derivatives, or the necessity of solving a full linear system per iteration, or both tasks.
The simplest Quasi-Newton method is the Stationary Newton Method, where B_k = J(x^0) for all k ∈ ℕ. In this method, derivatives are computed at the initial point and we only need the LU factorization of J(x^0). A variation of this method is the Stationary Newton Method with restarts, where B_k = J(x^k) if k is a multiple of a fixed integer m and B_k = B_{k-1} otherwise. The number of iterations used by this method tends to increase with m, but the average computer time per iteration decreases. In some situations we can determine an optimal choice for m (Shamanskii [1967]).
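A minimal sketch of the restarted variant: the Jacobian is factorized only when k is a multiple of m, and the stored LU factors are reused otherwise (SciPy's lu_factor/lu_solve; all other names are ours):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def stationary_newton(F, J, x, m=5, tol=1e-10, maxit=100):
        # B_k = J(x^k) when k is a multiple of m; otherwise B_k = B_{k-1}.
        lu_piv = None
        for k in range(maxit):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            if k % m == 0:             # restart: refactorize the Jacobian
                lu_piv = lu_factor(J(x))
            x = x + lu_solve(lu_piv, -Fx)
        return x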
An obvious drawback of the stationary Newton methods is that, except when k ≡ 0 (mod m), B_k does not incorporate information about x^k and F(x^k). Therefore, the adequacy of the model L_k(x) ≡ F(x^k) + B_k(x - x^k) to the real function F(x) can decrease rapidly as k grows. Observe that, due to (3.1), in Quasi-Newton methods x^{k+1} is defined as the solution of L_k(x) = 0, which exists and is unique if B_k is nonsingular. One way to incorporate new information about F into the linear model is to impose the interpolatory conditions L_{k+1}(x^j) = F(x^j), j = k+1, k, ..., k-n+1. The condition for j = k+1 holds automatically; the condition for j = k is the Secant Equation

    B_{k+1} s^k = y^k,   (3.5)

where s^k = x^{k+1} - x^k and y^k = F(x^{k+1}) - F(x^k).
Defining, more generally, s^j = x^{j+1} - x^j and y^j = F(x^{j+1}) - F(x^j) for all j, and imposing all the conditions

    B_{k+1} s^j = y^j,   j = k, k-1, ..., k-n+1,   (3.6)
we obtain the Sequential Secant Method (Wolfe [1959], Barnes [1965], Gragg and Stewart [1976], Martinez [1979a], etc.). If the set of increments {s^k, s^{k-1}, ..., s^{k-n+1}} is linearly independent, there exists only one matrix B_{k+1} that satisfies (3.6). In this case,
    B_{k+1} = (y^k, y^{k-1}, ..., y^{k-n+1}) (s^k, s^{k-1}, ..., s^{k-n+1})^{-1}   (3.7)

and

    B_{k+1}^{-1} = (s^k, s^{k-1}, ..., s^{k-n+1}) (y^k, y^{k-1}, ..., y^{k-n+1})^{-1}.   (3.8)
B_{k+1}^{-1} can be obtained from B_k^{-1} using O(n^2) floating point operations. However, in order to ensure numerical stability, the definition of the increments s^j that appear in (3.7) and (3.8) must sometimes be modified. When these modifications are not necessary, the Sequential Secant Method has the following interpolatory property: the model interpolates F at the last n+1 iterates, that is, L_{k+1}(x^j) = F(x^j) for j = k+1, k, ..., k-n+1.
Most popular Quasi-Newton methods impose only the last of the conditions (3.6), the Secant Equation (3.5), using the degrees of freedom inherent to this equation to guarantee
numerical stability. Broyden's "good" method (Broyden [1965]) and the Column Up-
dating Method (COLUM) (Martinez [1984]) are two examples of this idea. In both
methods
    B_{k+1} = B_k + ((y^k - B_k s^k)(z^k)^T) / ((z^k)^T s^k),   (3.10)

where

    z^k = s^k   (3.11)

for Broyden's method and

    z^k = e_{j_k},   (3.12)

where j_k is chosen so that

    |e_{j_k}^T s^k| = ||s^k||_∞,   (3.13)

for COLUM (e_1, ..., e_n denoting the columns of the identity matrix). By the Sherman-Morrison formula,

    B_{k+1}^{-1} = B_k^{-1} + ((s^k - B_k^{-1} y^k)(z^k)^T / ((z^k)^T B_k^{-1} y^k)) B_k^{-1}.   (3.14)
Formula (3.14) shows that B_{k+1}^{-1} can be obtained from B_k^{-1} using O(n^2) floating point operations in the dense case. Moreover,
    B_k^{-1} = (I + u^{k-1}(z^{k-1})^T) ··· (I + u^0(z^0)^T) B_0^{-1}   (3.16)

for k = 1, 2, 3, ..., where u^j = (s^j - B_j^{-1} y^j) / ((z^j)^T B_j^{-1} y^j).
Formula (3.16) is used when n is large. In this case, the vectors u^0, z^0, ..., u^{k-1}, z^{k-1} are stored and the product B_k^{-1} F(x^k) is computed using (3.16). In this way, the computer time of iteration k is O(kn) plus the computer time of computing B_0^{-1} F(x^k). If k is large the process must be periodically restarted, taking B_k = J(x^k).
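A minimal dense sketch of Broyden's "good" method in the form (3.10)-(3.16): the pairs (u^j, z^j) are stored and B_k^{-1} F(x^k) is accumulated as a product of rank-one corrections applied to B_0^{-1}. The restart policy is omitted and all names are ours:

    import numpy as np

    def broyden_good(F, x, B0_solve, tol=1e-10, maxit=50):
        # B0_solve(v) returns B_0^{-1} v (e.g. from a stored LU of J(x^0)).
        U, Z = [], []                     # history of rank-one factors u^j, z^j

        def Binv(v):
            # Apply (3.16): (I + u^{k-1}(z^{k-1})^T)...(I + u^0(z^0)^T) B_0^{-1} v.
            w = B0_solve(v)
            for u, z in zip(U, Z):
                w = w + u * (z @ w)
            return w

        Fx = F(x)
        for _ in range(maxit):
            if np.linalg.norm(Fx) <= tol:
                break
            s = -Binv(Fx)                 # solve B_k s^k = -F(x^k)
            x = x + s
            Fnew = F(x)
            y = Fnew - Fx
            By = Binv(y)
            u = (s - By) / (s @ By)       # (3.14) with z^k = s^k, choice (3.11)
            U.append(u)
            Z.append(s)
            Fx = Fnew
        return x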
An intermediate method between Broyden's method and the Sequential Secant Method is Broyden's Method with Projected Updates, which was introduced by Gay and Schnabel [1978]. See also Martinez [1979b] and Lopes and Martinez [1980]. This method is probably more efficient than Broyden's for small problems, but we are not aware of large-scale implementations.
Broyden's method is a particular case of the family of Least Change Secant Update (LCSU) methods (Dennis and Schnabel [1979, 1983], Dennis and Walker [1981], Martinez [1990b, 1992a]), which include many algorithms that are useful for particular structures: Hart-Soul algorithms for boundary value problems (Hart and Soul [1973], Kelley and Sachs [1987]), Partitioned Quasi-Newton methods for separable problems (Griewank and Toint [1982a, 1982b, 1982c, 1984], Toint [1986]), Methods of Direct Updating of Factorizations (Dennis and Marwil [1982], Johnson and Austria [1983], Chadee [1985], Martinez [1990a]), BFGS and DFP algorithms for unconstrained minimization (see Dennis and Schnabel [1983]), etc.
Let us survey the main convergence results related to Quasi-Newton algorithms. We are going to assume, as in Theorem 2.1, that F : Ω ⊂ ℝ^n → ℝ^n, Ω open and convex, F ∈ C^1(Ω), F(x*) = 0, J(x*) nonsingular, and that the Hölder condition (2.4) is satisfied. The first result is the "Theorem of two neighborhoods".

Theorem 3.1. Given r ∈ (0,1), there exist ε, δ > 0 such that if ||x^0 - x*|| ≤ ε and ||B_k - J(x*)|| ≤ δ for all k ∈ ℕ, then the sequence {x^k} generated by (3.1) is well defined, converges to x*, and satisfies

    ||x^{k+1} - x*|| ≤ r ||x^k - x*||   (3.17)

for all k ∈ ℕ.
Using Theorem 3.1 we can prove that the Stationary Newton Method and its variations with restarts have local convergence at a linear rate. The main tool for proving superlinear convergence of Quasi-Newton methods is the following theorem, due to Dennis and Moré.

Theorem 3.2. Assume that the sequence {x^k} generated by (3.1) is well defined and convergent to x*. Then, the two following properties are equivalent:

    lim_{k→∞} ||(B_k - J(x*)) s^k|| / ||s^k|| = 0;   (3.18)

    lim_{k→∞} ||x^{k+1} - x*|| / ||x^k - x*|| = 0.   (3.19)

(3.18) is called the Dennis-Moré condition. Using (3.18), we can prove that the Stationary Newton Method with periodic restarts (for which lim_{k→∞} B_k = J(x*)) is superlinearly convergent. The Dennis-Moré condition says that the effect of B_k - J(x*) on the normalized increment tends to be null when k → ∞. This condition
is weaker than saying that B_k tends to J(x*). The Dennis-Moré condition is closely related to the Secant Equation. In fact, from (2.4) we can deduce that, for all x, z ∈ Ω,
    ||F(z) - F(x) - J(x*)(z - x)|| ≤ L ||z - x|| max{||x - x*||^p, ||z - x*||^p}   (3.20)
(Broyden, Dennis and Moré [1973]). So, writing x = x^k, z = x^{k+1}, and assuming that the sequence converges,

    lim_{k→∞} ||y^k - J(x*) s^k|| / ||s^k|| = 0.   (3.23)
Theorem 3.1 does not guarantee local convergence of all Secant Methods. In fact, the hypothesis of this theorem is that all the B_k's belong to a neighborhood of J(x*) of radius δ. Observe that, even if B_0 belongs to this neighborhood, it could be possible that ||B_k - J(x*)|| >> ||B_0 - J(x*)||, destroying convergence. Fortunately, for LCSU methods (including Broyden's) we are able to prove that there exists δ' > 0 such that ||B_k - J(x*)|| ≤ δ for all k ∈ ℕ whenever ||B_0 - J(x*)|| ≤ δ'. This is a Bounded Deterioration property. Moreover, the Successive Projection scheme that characterizes LCSU methods also guarantees (3.23). Summing up, the following result holds for Broyden's and other LCSU methods.
Theorem 3.4. There exist ε, δ > 0 such that, if ||x^0 - x*|| ≤ ε and ||B_0 - J(x*)|| ≤ δ, the sequence generated by Broyden's method is well defined, converges to x*, and satisfies (3.19).

Proof. See Broyden, Dennis and Moré [1973]. For an extension to "all" LCSU methods see Martinez [1990b, 1992a]. □
For Broyden's method, we also have the following result, which states that the convergence is 2n-step quadratic.

Theorem 3.5. Under the hypotheses of Theorem 3.4, if p ≥ 1, there exists c > 0 such that the sequence generated by Broyden's method satisfies

    ||x^{k+2n} - x*|| ≤ c ||x^k - x*||^2

for all k ∈ ℕ.

Proof. See Gay [1979]. □
Theorem 3.6. Assume that the sequence {x^k} is generated by COLUM, except that when k ≡ 0 (mod m), B_k = J(x^k). Then, there exists ε > 0 such that, if ||x^0 - x*|| ≤ ε, the sequence converges superlinearly to x*.

Proof. See Martinez [1984]. For a similar result, concerning an Inverse Column Updating Method, see Martinez and Zambaldi [1992]. □
Theorem 3.7. Assume that n = 2. Let r ∈ (0,1). Then, there exist ε, δ > 0 such that, if ||x^0 - x*|| ≤ ε and ||B_0 - J(x*)|| ≤ δ, the sequence {x^k} generated by COLUM is well defined, converges to x*, satisfies (3.17), and

    lim_{k→∞} ||x^{k+2n} - x*|| / ||x^k - x*|| = 0.   (3.24)
Several questions remain open. Does COLUM with restarts every m iterations satisfy

    ||x^{k+q} - x*|| ≤ c ||x^k - x*||^2   (3.26)

for some c > 0 and q strictly greater than m? The motivation of this question is that the Stationary Newton method with restarts every m iterations satisfies (3.26) with q = m. Does Theorem 3.7 hold for n > 2? Is COLUM superlinearly convergent in the sense of (3.19)?
4. INEXACT-NEWTON METHODS
Many times, large nonlinear systems can be solved using Newton's method, employing a sparse LU factorization for solving (2.2). Most frequently, even if Newton's method is applicable, more efficient algorithms are obtained using COLUM or Broyden's method with Newton restarts. In the NIGHTINGALE package, an automatic restart procedure has been incorporated, by means of which a Newton iteration is performed only when its efficiency is expected to be greater than the efficiency of the previous Quasi-Newton iterations.
Sometimes, the structure of the Jacobian matrix is unsuitable for LU factorizations. That is, a lot of fill-in appears due to that structure, and so the iteration becomes very expensive. In many of these cases a strategy of "false Jacobians" works well. By this we mean that we use a Quasi-Newton iteration with restarts, where, at the restarted iterations, B_k is not J(x^k) but a "simplified Jacobian" J̃(x^k) such that its LU factorization can be performed without problems. Unhappily, in many cases, ||J̃(x^k) - J(x^k)|| is excessively large, and the Quasi-Newton method loses its local convergence properties. In these cases, it is strongly recommended to use Inexact-Newton methods.
The idea is the following. Since we cannot solve (2.2) using a direct (LU) method, we use an Iterative Linear Method for solving (2.2). Usually, iterative linear methods based on Krylov subspaces are preferred (Golub and Van Loan [1989], Hestenes and Stiefel [1952], Saad and Schultz [1986], etc.). Iterative Linear Methods are interesting for solving large-scale systems of equations because of their low memory requirements.
When we solve (2.2) using an iterative linear method, we need a stopping criterion for deciding when to finish the calculation. A very reasonable stopping criterion is

    ||F(x^k) + J(x^k) s^k|| ≤ θ_k ||F(x^k)||,   (4.1)

where θ_k ∈ (0,1). The condition θ_k < 1 is necessary because, otherwise, the null increment s^k = 0 could be accepted as an approximate solution of (2.2). On the other hand, if θ_k ≈ 0, the number of iterations needed by the Iterative Linear Method to obtain (4.1) could be excessively large. Therefore, in practice, an intermediate value θ_k ≈ 0.1 is recommended.
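As an illustration of one inexact Newton iteration, the sketch below uses GMRES as the inner Krylov solver and stops it with the relative-residual test (4.1). The relative-tolerance keyword is rtol in recent SciPy releases (tol in older ones); function names are ours:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def inexact_newton_step(F, Jmatvec, x, theta=0.1):
        # Jmatvec(x, v) computes the product J(x) v, e.g. by automatic
        # differentiation or by a directional finite difference.
        n = x.size
        Fx = F(x)
        A = LinearOperator((n, n), matvec=lambda v: Jmatvec(x, v))
        # GMRES stops when ||F(x^k) + J(x^k) s|| <= theta ||F(x^k)||,
        # which is criterion (4.1) with theta_k = theta.
        s, info = gmres(A, -Fx, rtol=theta, atol=0.0)
        return x + s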
Dembo, Eisenstat and Steihaug introduced the criterion (4.1) and proved the main
local convergence properties of the algorithms based on this criterion.
Theorem 4.1. Assume that F(x*) = 0, J is continuous at x*, J(x*) is nonsingular, and θ_k ≤ θ_max < θ < 1 for all k. Then there exists ε > 0 such that, if ||x^0 - x*|| ≤ ε, the sequence {x^k} obtained using (4.1) and (2.3) converges to x* and satisfies

    ||x^{k+1} - x*||_* ≤ θ ||x^k - x*||_*   (4.2)

for all k ≥ 0, where ||y||_* = ||J(x*) y||. If lim_{k→∞} θ_k = 0, the convergence is superlinear.
Krylov subspace methods for solving systems like (2.2) are usually implemented using some preconditioning scheme. See Axelsson [1985]. By this we mean that the original system is replaced by an equivalent one, which is easier to solve by the Iterative Linear Solver. In the case of (2.2), we wish to replace the linear system by

    B_k^{-1} J(x^k) s = -B_k^{-1} F(x^k),   (4.3)

where B_k^{-1} (or, at least, the product B_k^{-1} z) must be easy to compute and B_k ≈ J(x^k). For general linear systems, many useful preconditioners B_k have been introduced. Most of them are based on Incomplete LU Factorizations, or on Stationary Linear iterations. A very cheap and popular procedure is to use the diagonal of the original matrix as preconditioner. Many other preconditioners for specific problems can be found in the papers published in Spedicato [1991]. A common feature of different preconditioning schemes applied to a linear system Az = b is that the first iteration of the preconditioned Iterative Linear Solver is z^1 = λ B^{-1} b, where B is the preconditioner. So, in the case of the system (2.2), the first increment tried should be of the form -λ B_k^{-1} F(x^k). This increment will be accepted if it satisfies (4.1). However, (2.2) is not an isolated linear system of equations. In fact, probably J(x^k) ≈ J(x^{k-1}), especially when k → ∞. Therefore, we are motivated to use information about B_k, F(x^k), F(x^{k+1}), x^{k+1}, x^k when we choose the preconditioner B_{k+1}.
This idea leads us to impose a Secant Condition on the preconditioner. So, we would like to have

    B_{k+1} s^k = y^k   (4.4)

for all k ∈ ℕ.
We saw in Section 3 that there exist infinitely many possible choices of B_{k+1} satisfying (4.4). Nazareth and Nocedal [1978] and Nash [1985] suggested using the classical BFGS formula in order to precondition (2.2) when we deal with minimization problems. Our preference is to define

    B_{k+1} = C_{k+1} + D_k,   (4.5)

where C_{k+1} is a classical preconditioner and D_k is chosen so that B_{k+1} satisfies (4.4).
The main appeal of Secant Preconditioners is that it has been shown (Martinez [1992b]) that using them it is possible to obtain stronger convergence results than the one mentioned in Theorem 4.1. In fact, the main drawback of that result is the necessity of θ_k → 0 for obtaining superlinear convergence. The following Preconditioned Inexact Newton Method was introduced by Martinez [1992b] with the aim of obtaining superlinear convergence without imposing a precision tending to infinity in the iterative resolution of (2.2).
Algorithm 4.2. Let θ_k ∈ (0, θ) for all k ∈ ℕ, θ ∈ (0,1), and lim_{k→∞} θ_k = 0. Assume that x^0 ∈ ℝ^n is an initial approximation to the solution of (1.1) and B_0 ∈ ℝ^{n×n} is an initial nonsingular preconditioner. Given x^k ∈ ℝ^n and B_k nonsingular, the steps for obtaining x^{k+1}, B_{k+1} are the following.

Step 1. Compute the Quasi-Newton increment

    s_Q^k = -B_k^{-1} F(x^k).   (4.6)

Step 2. If

    ||F(x^k) + J(x^k) s_Q^k|| ≤ θ_k ||F(x^k)||,   (4.7)

define

    s^k = s_Q^k.   (4.8)

Else, find an increment s^k such that (4.1) holds, using some iterative method.

Step 3. Define x^{k+1} = x^k + s^k and choose a new nonsingular preconditioner B_{k+1} (for instance, satisfying the Secant Equation (4.4)).
The following theorem states the main convergence result relative to Algorithm 4.2.

Theorem 4.3. Assume that F : Ω ⊂ ℝ^n → ℝ^n, Ω an open and convex set, F ∈ C^1(Ω), J(x*) nonsingular, F(x*) = 0, and (2.4) holds for some L ≥ 0, p ≥ 1. Suppose that ||B_k|| and ||B_k^{-1}|| are bounded and that the Dennis-Moré condition (3.18) is satisfied. Then, there exists ε > 0 such that, if ||x^0 - x*|| ≤ ε, the sequence {x^k} generated by Algorithm 4.2 converges superlinearly to x*. Moreover, there exists k_0 ∈ ℕ such that s^k = s_Q^k for all k ≥ k_0.
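A schematic rendering of Algorithm 4.2 follows. It is a sketch under our own assumptions about the surrounding code: Binv applies the current preconditioner inverse, theta_seq supplies θ_k, and the secant update of the preconditioner is only indicated by a comment.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def preconditioned_inexact_newton(F, Jmatvec, x, Binv, theta_seq, maxit=50):
        # Binv(v) ~ B_k^{-1} v; theta_seq(k) yields theta_k -> 0 in (0, theta).
        n = x.size
        for k in range(maxit):
            Fx = F(x)
            if np.linalg.norm(Fx) < 1e-12:
                break
            theta = theta_seq(k)
            sQ = -Binv(Fx)                       # (4.6)
            if np.linalg.norm(Fx + Jmatvec(x, sQ)) <= theta * np.linalg.norm(Fx):
                s = sQ                           # (4.7)-(4.8): accept s_Q^k
            else:
                A = LinearOperator((n, n), matvec=lambda v: Jmatvec(x, v))
                s, _ = gmres(A, -Fx, x0=sQ, rtol=theta, atol=0.0)  # enforce (4.1)
            x = x + s
            # here B_{k+1} (i.e. Binv) would be updated to satisfy (4.4)
        return x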
5. DECOMPOSITION METHODS
The methods studied in the previous sections evaluate all the components of the
function F at the same points. This is not always the best possible strategy. In many
practical problems, given a guess of the solution, the evaluation of a few components
of F is enough to suggest a new useful estimate. Methods that evaluate different
components at different points are called Decomposition Methods.
The (Block) SOR-Newton method proceeds as follows. Assume that the components of x and of F are divided into m blocks, x = (x_1, ..., x_m), F = (F_1, ..., F_m), with x_i, F_i ∈ ℝ^{n_i}, and let ω > 0 be a relaxation parameter. Then, for i = 1, ..., m,

    x_i^{k+1} = x_i^k - ω (∂F_i/∂x_i (x_1^{k+1}, ..., x_{i-1}^{k+1}, x_i^k, ..., x_m^k))^{-1} F_i(x_1^{k+1}, ..., x_{i-1}^{k+1}, x_i^k, ..., x_m^k).   (5.1)
If the diagonal Jacobian blocks ∂F_i/∂x_i in (5.1) are replaced by approximations B_{i,k}, we obtain

    x_i^{k+1} = x_i^k - ω B_{i,k}^{-1} F_i(x_1^{k+1}, ..., x_{i-1}^{k+1}, x_i^k, ..., x_m^k).   (5.2)
Methods of type (5.2) are called SOR-Quasi-Newton methods (Martinez [1992d, 1992e]). If B_{i,k+1} satisfies

    B_{i,k+1}(x_i^{k+1} - x_i^k) = F_i(x_1^{k+1}, ..., x_i^{k+1}, x_{i+1}^k, ..., x_m^k) - F_i(x_1^{k+1}, ..., x_{i-1}^{k+1}, x_i^k, ..., x_m^k),   (5.3)
we say that (5.2) defines a SOR-Secant method (Martinez [1992e]). The local convergence analysis of (5.1), (5.2) and (5.3) has been made in Martinez [1992e]. Essentially, the condition for the local linear convergence of (5.1)-(5.3) is the same convergence condition of the linear SOR method for a linear system with matrix J(x*). If n_i is small for all i = 1, ..., m, SOR methods have low storage requirements and so they are useful for large-scale problems.
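A minimal sketch of one SOR-Newton sweep (5.1); the in-place block updates, dense solves inside each block, and the user-supplied partition are our own simplifications:

    import numpy as np

    def sor_newton_sweep(F_blocks, J_blocks, x_blocks, omega=1.0):
        # F_blocks[i](x_blocks): value of F_i at the current block iterate;
        # J_blocks[i](x_blocks): the square matrix dF_i/dx_i there.
        # Updating in place means block i already sees the new x_1 ... x_{i-1},
        # exactly as in (5.1).
        for i in range(len(x_blocks)):
            Fi = F_blocks[i](x_blocks)
            Ji = J_blocks[i](x_blocks)
            x_blocks[i] = x_blocks[i] - omega * np.linalg.solve(Ji, Fi)
        return x_blocks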
More generally, we can consider Asynchronous methods, in which block i is updated only at the iterations belonging to a set T_i ⊂ ℕ, possibly using out-of-date information on the other blocks:

    x_i^{k+1} = x_i^k - ω B_{i,k}^{-1} F_i(x_1^{v(i,1,k)}, ..., x_m^{v(i,m,k)})   (5.4)

if k+1 ∈ T_i, and

    x_i^{k+1} = x_i^k   (5.5)

if k+1 ∉ T_i, where 0 ≤ v(i,j,k) ≤ k for all k ∈ ℕ, i, j = 1, ..., m. (5.2) is a particular case of (5.4)-(5.5), defining T_i = {i + jm, j ∈ ℕ} and v(i,j,k) = k for all i ≠ j, k ∈ ℕ. The Jacobi-Quasi-Newton method corresponds to T_i = {1, 2, 3, ...}, v(i,j,k) = k. Secant Asynchronous methods based on (5.4) can also be considered.
The SOR and Jacobi methods for linear systems are strongly related to the projection methods of Kaczmarz [1937] and Cimmino [1938], respectively. The advantage of Kaczmarz and Cimmino is that convergence is guaranteed for every linear system, while SOR and Jacobi require a condition on the spectral radius of the transformation. However, in general, when SOR and Jacobi converge, they are far more efficient than Kaczmarz and Cimmino. Nonlinear generalizations of Kaczmarz and Cimmino may be found in Tompkins [1955], McCormick [1977], Meyn [1983], Martinez [1986a, 1986b, 1986c] and Diniz-Ehrhardt and Martinez [1992].
Different decomposition methods are motivated by direct methods for solving linear systems, for example, the family of Nonlinear ABS algorithms developed in Abaffy and Spedicato [1989], Abaffy, Broyden and Spedicato [1984], Abaffy, Galantai and Spedicato [1987], Spedicato, Chen and Deng [1992], etc. These methods generalize classical algorithms due to Brown [1969], Brent [1973], Gay [1975] and Martinez [1979c, 1980]. The idea is the following. Divide the components of F into m groups F_1, ..., F_m. Assume that x^k has been computed. We generate x^{k,1}, ..., x^{k,m} by:

    x^{k,0} = x^k,   (5.6)

    J_i(x^{k,i-1})(x^{k,i} - x^{k,i-1}) = -F_i(x^{k,i-1}),   (5.7)

    J_j(x^{k,j-1})(x^{k,i} - x^{k,i-1}) = 0,   j = 1, ..., i-1,   (5.8)

for i = 1, ..., m, where J_j(x) = F_j'(x), and finally

    x^{k+1} = x^{k,m}.   (5.9)
Clearly, the scheme (5.6)-(5.9) solves a linear system of equations in one cycle. However, there are infinitely many ways to choose the intermediate points x^{k,1}, ..., x^{k,m-1}. Different choices of these points originate different methods of the ABS class. The first motivation given in the papers of Brown and Brent for methods of type (5.6)-(5.9) was that, using suitable factorizations, the derivatives can be approximated by differences in a more economical way than in the Finite-Difference Newton method. Methods of this class have, in general, the same local convergence properties as Newton's method, though the proofs are technically complicated.
Many other decomposition algorithms have been introduced with the aim of taking advantage of particular structures. For systems that are reducible to block lower triangular form see Eriksson [1976] and Dennis, Martinez and Zhang [1992]. For Block Tridiagonal Systems, see Hoyer, Schmidt and Shabani [1989]. Much theory about decomposition methods has been produced by the German school (Schmidt [1987], Burmeister and Schmidt [1988], Hoyer [1987], Hoyer and Schmidt [1984], Schmidt, Hoyer and Haufe [1985], etc.).
6. GLOBALIZATION BY OPTIMIZATION
In the previous sections we studied local methods, that is, algorithms that converge, usually with a high rate of convergence, if the initial point is close enough to the solution. Luckily, in many cases the domain of convergence of local algorithms is large enough to guarantee practical efficiency. However, locally convergent methods may not converge if the starting point is very poor, or if the system is highly nonlinear. For this reason, local methods are usually modified in order to improve their global convergence properties. The most usual way to do this is to transform (1.1) into an Optimization Problem, with the objective function f(x) = (1/2)||F(x)||^2. Then, (1.1) becomes the problem of finding a global minimizer of f.
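As an illustration (ours, not a method from this survey) of how f is typically used, the following sketch damps the Newton step with an Armijo backtracking line search on f:

    import numpy as np

    def damped_newton(F, J, x, alpha=1e-4, tol=1e-10, maxit=100):
        # Globalization by optimization: accept the largest tested fraction t
        # of the Newton step that sufficiently decreases f(x) = 0.5 ||F(x)||^2.
        f = lambda v: 0.5 * np.dot(F(v), F(v))
        for _ in range(maxit):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            d = np.linalg.solve(J(x), -Fx)     # Newton direction
            slope = -np.dot(Fx, Fx)            # grad f(x) . d = -||F(x)||^2
            t = 1.0
            while f(x + t * d) > f(x) + alpha * t * slope and t > 1e-12:
                t *= 0.5                       # Armijo backtracking
            x = x + t * d
        return x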
However, the decision of merely using a method to minimize f in order to solve (1.1) is not satisfactory. In fact, sometimes efficient local methods converge rapidly to a solution although the generated sequence {x^k} does not exhibit monotonic behavior in f(x^k). In these cases, the pure local method is much more efficient than the f-minimization method. Often, the minimization method converges to a local (non-global) minimizer of f, while the local method converges to a solution of (1.1). For these reasons, it is necessary to give a chance to the local method before calling the minimization algorithm. Different solutions have been proposed to this problem (Grippo, Lampariello and Lucidi [1986]). Here we describe the strategy combining local algorithms and minimization methods introduced in the NIGHTINGALE package.
We define "ordinary iterations" and "special iterations". By an ordinary iteration
we understand an iteration produced by any of the methods described in sections 2
to 4 of this paper. Decomposition methods can also be considered, with some mod-
ifications. A special iteration is an iteration produced by a minimization algorithm
applied to J. We define, for all k E IN,
Step 1. If FLAG = 1, obtain Xk+l using an ordinary iteration. Else, obtain xk+1
using a special iteration.
Step 2. If
If the test (6.2) is satisfied infinitely many times, then there exists a subsequence of {x^k} such that lim_{k→∞} ||F(x^k)|| = 0. So, if the sequence is bounded, we will be able to find a solution of (1.1) up to any prescribed accuracy. Conversely, if (6.2) fails for all k ≥ k_0, then all the iterations starting from the k_0-th will be special, and the convergence properties of the sequence will be those of the minimization algorithm.
The special iterations of NIGHTINGALE are based on the following box trust-region scheme.

Algorithm 6.2. Assume that Δ_min > 0 and α ∈ (0,1) are given independently of the iteration k. Define ψ_k(x) = ||F(x^k) + J(x^k)(x - x^k)||^2 and let Δ ≥ Δ_min.

Step 1. Compute x̄, an approximate solution of

    Minimize ψ_k(x) subject to ||x - x^k||_∞ ≤ Δ.   (6.4)

Step 2. If

    ||F(x̄)||^2 ≤ ||F(x^k)||^2 + α (ψ_k(x̄) - ψ_k(x^k)),   (6.3)

define x^{k+1} = x̄. Else, choose Δ_new ∈ [0.1 ||x̄ - x^k||, 0.9 Δ], replace Δ by Δ_new and go to Step 1. □
(6.4) is the problem of minimizing a convex quadratic with box constraints. For this problem, algorithms based on combinations of Krylov Subspace methods with Gradient Projection strategies are currently preferred. In NIGHTINGALE, the approximate solution x̄ of (6.4) is defined as a point that satisfies ψ_k(x̄) ≤ ψ_k(x^k) and where, in addition, the norm of the projected gradient of ψ_k is less than 0.1 ||J(x^k)^T F(x^k)||. We also choose Δ_min = 0.001 × (typical ||x||), the initial Δ ≡ Δ_0 = ||x^0||, Δ_new = 0.5 ||x̄ - x^k||, and a fourfold increase Δ := 4Δ after successful iterations.
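A minimal projected-gradient sketch for the box-constrained quadratic subproblem (6.4); the constant steplength and the stopping rule (mirroring the projected-gradient test quoted above) are our own choices, and practical codes combine such steps with Krylov subspace accelerations:

    import numpy as np

    def box_quadratic_pg(Jk, Fk, xk, delta, tol_factor=0.1, maxit=200):
        # Approximately minimize psi_k(x) = ||F(x^k) + J(x^k)(x - x^k)||^2
        # subject to ||x - x^k||_inf <= delta, by projected gradient steps.
        lo, hi = xk - delta, xk + delta
        x = xk.copy()
        g0 = 2.0 * (Jk.T @ Fk)                      # gradient of psi_k at x^k
        step = 0.5 / np.linalg.norm(Jk.T @ Jk, 2)   # 1/L, L = grad Lipschitz const.
        for _ in range(maxit):
            g = 2.0 * (Jk.T @ (Fk + Jk @ (x - xk)))
            pg = np.clip(x - g, lo, hi) - x         # projected gradient
            if np.linalg.norm(pg) <= tol_factor * np.linalg.norm(g0):
                break
            x = np.clip(x - step * g, lo, hi)       # project the gradient step
        return x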
The convergence properties of Algorithm 6.2 were given in Friedlander, Gomes-Ruggiero, Martinez and Santos [1993]. Every limit point x* of a sequence {x^k} generated by this algorithm satisfies J(x*)^T F(x*) = 0. Therefore, x* is a solution of (1.1) if J(x*) is nonsingular. Unhappily, if J(x*) is singular, it is possible that F(x*) ≠ 0. This is the main weakness of algorithms based on optimization for the globalization of (1.1).
An advantage of special iterations based on box trust regions is that they can be easily adapted to situations where we have natural bounds for the solution of (1.1). Other recent methods, based on the inexact Newton approach, with global convergence properties were given by Deuflhard [1991] and Eisenstat and Walker [1993].
7. GLOBALIZATION BY HOMOTOPIES
In Section 6 we saw that local methods for solving F(x) = 0 can be "globalized" through their transformation into minimization problems. In this section we study another popular technique to solve (1.1) when the initial approximation is poor. This technique is based on homotopies. A homotopy associated to this problem is a function H(x,t) : ℝ^n × ℝ → ℝ^n such that

    H(x, 1) = F(x) for all x,
    H(x^0, 0) = 0.   (7.1)

The solutions of H(x,t) = 0, for t ∈ [0,1], define a path that, under suitable conditions, joins x^0 to a solution of (1.1), and Homotopy methods follow this path numerically.
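A classical concrete choice (ours for illustration; not necessarily the homotopy used in the works cited in this survey) is the convex homotopy H(x,t) = t F(x) + (1-t)(x - x^0), which satisfies (7.1). Below, a minimal predictor-corrector sketch that advances t and applies Newton corrector steps to H(·, t):

    import numpy as np

    def homotopy_solve(F, J, x0, steps=20, newton_its=5):
        # H(x,t) = t F(x) + (1-t)(x - x0);  dH/dx = t J(x) + (1-t) I.
        x = x0.copy()
        n = x0.size
        for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
            for _ in range(newton_its):      # corrector phase at fixed t
                H = t * F(x) + (1.0 - t) * (x - x0)
                dH = t * J(x) + (1.0 - t) * np.eye(n)
                x = x + np.linalg.solve(dH, -H)
        return x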
8. CONCLUSIONS
Many practical applications give rise to large-scale nonlinear systems of equations. From the local point of view, the most interesting techniques for this case are variations of the Inexact Newton method. Developing preconditioners for Krylov Subspace solvers that take into account the structure of these methods and problems is a challenging problem. Since solving a linear system is a particular case of minimizing a quadratic on a box, it turns out that solving this last problem efficiently is crucial. Moreover, the problem appears again in the development of methods for globalization using optimization, and in the Corrector Phase of Homotopy methods.
For very small problems Newton's method continues to be the best choice, and for medium to large problems a combination of Newton and Quasi-Newton iterations seems to be better. Many decomposition methods are interesting when they are induced by characteristics of the problem structure, or when decomposition is suggested by the computer architecture.

In this paper, we surveyed methods based on first-order approximations of the nonlinear system. Methods based on approximations of higher order have also been developed (Schnabel and Frank [1984]) and are useful in the presence of singularities of the Jacobians. Large-scale implementations of these methods seem to be hard.
References
Abaffy, J.; Broyden, C.G.; Spedicato, E. [1984]: A class of direct methods for linear
equations, Numerische Mathematik 45, pp. 361-376.
Barnes, J.G.P. [1965]: An algorithm for solving nonlinear equations based on the secant method, Computer Journal 8, pp. 66-72.
Brent, R.P. [1973]: Some efficient algorithms for solving systems of nonlinear equations, SIAM Journal on Numerical Analysis 10, pp. 327-344.
Brown, K.M. [1969]: A quadratically convergent Newton-like method based upon Gaussian elimination, SIAM Journal on Numerical Analysis 6, pp. 560-569.
Broyden, C.G. [1965]: A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation 19, pp. 577-593.
Broyden, C.G.; Dennis Jr., J.E.; Moré, J.J. [1973]: On the local and superlinear convergence of quasi-Newton methods, Journal of the Institute of Mathematics and its Applications 12, pp. 223-245.
Burmeister, W.; Schmidt, J.W. [1978]: On the k-order of coupled sequences arising in single-step type methods, Numerische Mathematik 33, pp. 653-661.
Chadee, F.F. [1985]: Sparse quasi-Newton methods and the continuation problem, T.R. S.O.L. 85-8, Department of Operations Research, Stanford University.
Chow, S.N.; Mallet-Paret, J.; Yorke, J.A. [1978]: Finding zeros of maps: Homotopy methods that are constructive with probability one, Mathematics of Computation 32, pp. 887-899.
Coleman, T.F.; Garbow, B.S.; Moré, J.J. [1984]: Software for estimating sparse Jacobian matrices, ACM Trans. Math. Software 11, pp. 363-378.
Coleman, T.F.; Moré, J.J. [1983]: Estimation of sparse Jacobian matrices and graph coloring problems, SIAM Journal on Numerical Analysis 20, pp. 187-209.
Dembo, R.S.; Eisenstat, S.C.; Steihaug, T. [1982]: Inexact Newton methods, SIAM Journal on Numerical Analysis 19, pp. 400-408.
Dennis Jr., J.E.; Marwil, E.S. [1982]: Direct secant updates of matrix factorizations, Mathematics of Computation 38, pp. 459-476.
Dennis Jr., J.E.; Moré, J.J. [1977]: Quasi-Newton methods, motivation and theory, SIAM Review 19, pp. 46-89.
Dennis Jr., J.E.; Schnabel, R.B. [1979]: Least change secant updates for quasi-Newton methods, SIAM Review 21, pp. 443-459.
Dennis Jr., J.E.; Walker, H.F. [1981]: Convergence theorems for least-change secant update methods, SIAM Journal on Numerical Analysis 18, pp. 949-987.
Deuflhard, P. [1991]: Global inexact Newton methods for very large scale nonlinear problems, Impact of Computing in Science and Engineering 3, pp. 366-393.
Duff, I.S. [1977]: MA28 - a set of Fortran subroutines for sparse unsymmetric linear equations, AERE R8730, HMSO, London.
Duff, I.S.; Erisman, A.M.; Reid, J.K. [1989]: Direct Methods for Sparse Matrices, Oxford Scientific Publications.
Eisenstat, S.C.; Walker, H.F. [1993]: Globally convergent inexact Newton methods,
to appear in SIAM Journal on Optimization.
Gay, D.M. [1975]: Brown's method and some generalizations with applications to minimization problems, Ph.D. Thesis, Computer Science Department, Cornell University, Ithaca, New York.
Gay, D.M. [1979]: Some convergence properties of Broyden's method, SIAM Journal on Numerical Analysis 16, pp. 623-630.
Gay, D.M.; Schnabel, R.B. [1978]: Solving systems of nonlinear equations by Broy-
den's method with projected updates, in Nonlinear Programming 3, edited by
O. Mangasarian, R. Meyer and S. Robinson, Academic Press, New York, pp.
245-281.
George, A.; Ng, E. [1987]: Symbolic factorization for sparse Gaussian elimination
with partial pivoting, SIAM Journal on Scientific and Statistical Computing 8,
pp. 877-898.
Golub, G.H.; Van Loan, Ch.F. [1989]: Matrix Computations, The Johns Hopkins
University Press, Baltimore and London.
Gragg, W.B.; Stewart, G.W. [1976]: A stable variant of the secant method for
solving nonlinear equations, SIAM Journal on Numerical Analysis 13, pp. 127
- 140.
Griewank, A.; Toint, Ph.L. [1982b]: Partitioned variable metric updates for large structured optimization problems, Numerische Mathematik 39, pp. 119-137.
Griewank, A.; Toint, Ph.L. [1982c]: Local convergence analysis for partitioned quasi-Newton updates, Numerische Mathematik 39, pp. 429-448.
Griewank, A.; Toint, Ph.L. [1984]: Numerical experiments with partially separable
optimization problems, in Numerical Analysis Proceedings Dundee 1983, edited
by D.F. Griffiths, Lecture Notes in Mathematics vol. 1066, Springer - Verlag,
Berlin, pp. 203-220.
Grippo, L.; Lampariello, F.; Lucidi, S. [1986]: A nonmonotone line search technique
for Newton's method, SIAM Journal on Numerical Analysis 23, pp. 707 - 716.
Hart, W.E.; Soul, S.O.W. [1973]: Quasi-Newton methods for discretized nonlinear boundary value problems, J. Inst. Math. Applics. 11, pp. 351-359.
Hestenes, M.R.; Stiefel, E. [1952]: Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards B49, pp. 409-436.
Hoyer, W.; Schmidt, J.W. [1984]: Newton-type decomposition methods for equations arising in network analysis, Z. Angew. Math. Mech. 64, pp. 397-405.
Kelley, C.T.; Sachs, E.W. [1987]: A quasi-Newton method for elliptic boundary value problems, SIAM Journal on Numerical Analysis 24, pp. 516-531.
Lopes, T.L.; Martinez, J.M. [1980]: Combination of the Sequential Secant Method and Broyden's method with projected updates, Computing 25, pp. 379-386.
Martinez, J.M. [1979a]: Three new algorithms based on the sequential secant method, BIT 19, pp. 236-243.
Martinez, J.M. [1979c]: Generalization of the methods of Brent and Brown for
solving nonlinear simultaneous equations, SIAM Journal on Numerical Analysis
16, pp. 434 - 448.
Martinez, J.M. [1983]: A quasi-Newton method with a new updating for the LDU factorization of the approximate Jacobian, Matemática Aplicada e Computacional 2, pp. 131-142.
Martinez, J.M. [1986a]: The method of Successive Orthogonal Projections for solv-
ing nonlinear simultaneous equations, Calcolo 23, pp. 93 - 105.
Martinez, J.M. [1987]: Quasi-Newton Methods with Factorization Scaling for Solv-
ing Sparse Nonlinear Systems of Equations, Computing 38, pp. 133-141.
Martinez, J.M. [1990a]: A family of quasi-Newton methods for nonlinear equations
with direct secant updates of matrix factorizations, SIAM Journal on Numerical
Analysis 27, pp. 1034-1049.
Martinez, J.M. [1990b]: Local convergence theory of inexact Newton methods based
on structured least change updates, Mathematics of Computation 55, pp. 143-
168.
Martinez, J.M. [1991]: Quasi-Newton Methods for Solving Underdetermined Nonlinear Simultaneous Equations, Journal of Computational and Applied Mathematics 34, pp. 171-190.
Martinez, J.M. [1992a]: On the relation between two local convergence theories
of least change secant update methods, Mathematics of Computation 59, pp.
457-481.
Matthies, H.; Strang, G. [1979]: The solution of nonlinear finite element equations, International Journal of Numerical Methods in Engineering 14, pp. 1613-1626.
Milnor, J.W. [1969]: Topology from the differentiable viewpoint, The University Press of Virginia, Charlottesville, Virginia.
Moré, J.J. [1989]: A collection of nonlinear model problems, Preprint MCS-P60-0289, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois.
Rall, L.B. [1984]: Differentiation in PASCAL-SC: Type Gradient, ACM Transactions on Mathematical Software 10, pp. 161-184.
Schmidt, J.W.; Hoyer, W.; Haufe, Ch. [1985]: Consistent approximations in Newton-type decomposition methods, Numerische Mathematik 47, pp. 413-425.
Schnabel, R.B.; Frank, P.D. [1984]: Tensor methods for nonlinear equations, SIAM
Journal on Numerical Analysis 21, pp. 815 - 843.
Spedicato, E.; Chen, Z.; Deng, N. [1992]: A class of difference ABS-type algorithms for a nonlinear system of equations, Technical Report, Department of Mathematics, University of Bergamo.
Walker, H.F.; Watson, L.T. [1989]: Least-Change Update Methods for underdetermined systems, Research Report, Department of Mathematics, Utah State University.
Watson, L.T. [1979]: An algorithm that is globally convergent with probability one
for a class of nonlinear two-point boundary value problems, SIAM Journal on
Numerical Analysis 16, pp. 394-401.
Toint, Ph.L. [1986]: Numerical solution of large sets of algebraic nonlinear equations, Mathematics of Computation 46, pp. 175-189.
Watson, L.T.; Billups, S.C.; Morgan, A.P. [1987]: Algorithm 652: HOMPACK: A
suite of codes for globally convergent homotopy algorithms, ACM Trans. Math.
Software 13, pp. 281-310.
Watson, L.T.; Wang, C.Y. [1981]: A homotopy method applied to elastica problems, International Journal of Solids and Structures 17, pp. 29-37.
Wolfe, P. [1959]: The secant method for solving nonlinear equations, Communications of the ACM 2, pp. 12-13.
Zambaldi, M.C. [1990]: Estruturas estáticas e dinâmicas para resolver sistemas não lineares esparsos, Tese de Mestrado, Departamento de Matemática Aplicada, Universidade Estadual de Campinas, Campinas, Brazil.
Zlatev, Z.; Wasniewski, J.; Schaumburg, K. [1981]: Y12M. Solution of large and
sparse systems of linear algebraic equations, Lecture Notes in Computer Science
121, Springer-Verlag, New York, Berlin, Heidelberg and Tokyo.