Nonlinear Equations and Systems: Lectures, Lesson VI
COMPUTATIONAL ENGINEERING
Master and Doctoral Study, Belgrade - Niš
LECTURES
LESSON VI
6.1.0. Introduction
We consider that most basic of tasks, solving equations numerically. While most
equations are born with both a right-hand side and a left-hand side, one traditionally
moves all terms to the left, leaving
(6.1.0.1) f (x) = 0
whose solution or solutions are desired. When there is only one independent variable, the
problem is one-dimensional, namely to find the root or roots of a function. With more
than one independent variable, more than one equation can be satisfied simultaneously.
You likely once learned the implicit function theorem which (in this context) gives us
the hope of satisfying n equations in n unknowns simultaneously. Note that we have
only hope, not certainty. A nonlinear set of equations may have no (real) solutions at
all. Contrariwise, it may have more than one solution. The implicit function theorem
tells us that generically the solutions will be distinct, pointlike, and separated from each
other. But in nongeneric, i.e., degenerate, cases, one can get a continuous family
of solutions. In vector notation, we want to find one or more n-dimensional solution
vectors ~x such that
(6.1.0.2) f~(~x) = ~0
where f~ is the n-dimensional vector-valued function whose components are the individ-
ual equations to be satisfied simultaneously. Simultaneous solution of equations in n
dimensions is much more difficult than finding roots in the one-dimensional case. The
principal difference between one and many dimensions is that, in one dimension, it is
possible to bracket or "trap" a root between bracketing values, and then home in on it.
In multidimensions, you can never be sure that the root is there at all until you
have found it. Except in linear problems, root finding invariably proceeds by iteration,
and this is equally true in one or in many dimensions. Starting from some approximate
trial solution, a useful algorithm will improve the solution until some predetermined
convergence criterion is satisfied. For smoothly varying functions, good algorithms will
always converge, provided that the initial guess is good enough. Indeed one can even
determine in advance the rate of convergence of most algorithms. It cannot be overem-
phasized, however, how crucially success depends on having a good first guess for the
solution, especially for multidimensional problems. This crucial beginning usually de-
pends on analysis rather than numerics. Carefully crafted initial estimates reward you
not only with reduced computational effort, but also with understanding and increased
self-esteem. Hamming's motto, "the purpose of computing is insight, not numbers,"
is particularly apt in the area of finding roots. One should repeat this motto aloud
whenever a program converges, with ten-digit accuracy, to the wrong root of a problem,
or whenever it fails to converge because there is actually no root, or because there is
a root but the initial estimate was not sufficiently close to it. For one-dimensional root
finding, it is possible to give some straightforward answers: You should try to get some
idea of what your function looks like before trying to find its roots. If you need to
mass-produce roots for many different functions, then you should at least know what
some typical members of the ensemble look like. Next, you should always bracket a root,
that is, know that the function changes sign in an identified interval, before trying to
converge to the root's value. Finally, one should never let an iteration method get outside
of the best bracketing bounds obtained at any stage. We will see that some pedagogically
important algorithms, such as the secant method or Newton-Raphson, can violate this
last constraint, and are thus not recommended unless certain fixups are implemented.
Multiple roots, or very close roots, are a real problem, especially if the multiplicity is
an even number. In that case, there may be no readily apparent sign change in the
function, so the notion of bracketing a root and maintaining the bracket becomes diffi-
cult. We nevertheless insist on bracketing a root, even if it takes the minimum-searching
techniques to determine whether a tantalizing dip in the function really does cross zero
or not. As usual, we want to discourage the reader from using routines as black boxes
without understanding them.
There are two distinct phases in finding the roots of a nonlinear equation (see [2], pp.
130-135):
(1) Bounding the solution, and
(2) Refining the solution.
In general, nonlinear equations can behave in many different ways in the vicinity of
a root.
(1) Bounding the solution
Bounding the solution involves finding a rough estimate of the solution that can
be used as the initial approximation, or the starting point, in a systematic procedure
that refines the solution to a specified tolerance in an efficient manner. If possible, it
is desirable to bracket the root between two points at which the value of the nonlinear
function has opposite signs. The bounding procedures can be:
1. Graphing the function,
2. Incremental search,
3. Previous experience or similar problem,
4. Solution of a simplified approximate model.
Graphing the function involves plotting the nonlinear function over the range of
interest. Spreadsheets generally have graphing capabilities, as do Mathematica, Matlab,
and Mathcad. The resolution of such plots is generally not precise enough for an accurate
result. However, it is accurate enough to bound the solution. The plot of the nonlinear
function displays the behavior of the nonlinear equation and gives a view of the scope
of the problem.
An incremental search is conducted by starting at one end of the region of interest
and evaluating the nonlinear function with small increments across the region. When
the value of the function changes sign, it is assumed that a root lies in that interval.
The two end points of the interval containing the root can be used as initial guesses for a
refining method (the second phase of the solution). If multiple roots are suspected, one has to
check for sign changes in the derivative of the function between the ends of the interval.
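As an illustration, the incremental search can be sketched in a few lines of Python (the lecture's own codes are otherwise in Fortran and Mathematica); the function name bracket_roots and its arguments are chosen only for this example:

import math

def bracket_roots(f, a, b, n):
    # Scan [a, b] in n equal increments and collect every subinterval
    # on which f changes sign; each such subinterval brackets a root.
    brackets = []
    h = (b - a) / n
    x_left, f_left = a, f(a)
    for i in range(1, n + 1):
        x_right = a + i * h
        f_right = f(x_right)
        if f_left * f_right <= 0.0:          # sign change (or an exact zero)
            brackets.append((x_left, x_right))
        x_left, f_left = x_right, f_right
    return brackets

# Example: f(x) = x - cos x has exactly one sign change on [0, pi/2].
print(bracket_roots(lambda x: x - math.cos(x), 0.0, math.pi / 2, 50))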
(2) Refining the solution
Refining the solution involves determining the solution to a specified tolerance by
an efficient procedure. The basic methods for refining the solution are:
2.1 Trial and error,
2.2 Closed domain methods (bracketing method),
2.3 Open domain methods.
Trial and error methods simply presume (guess) the root, x = α, evaluate f(α), and
compare it to zero. If f(α) is close enough to zero, quit; if not, guess another α and continue
until f(α) is close enough to zero.
Closed domain (bracketing) methods are methods that start with two values of x
which bracket the root, x = α, and systematically reduce the interval, keeping the root
inside the brackets. The two most popular methods of this kind are:
2.2.1 Interval halving (bisection),
2.2.2 False position (Regula Falsi).
Bracketing methods are robust and reliable, since the root always remains inside a closed
interval, but they can be slow to converge.
Open domain methods do not restrict the root to remain trapped in a closed interval.
Therefore, they are not as robust as bracketing methods and can diverge. However, they
use information about the nonlinear function itself to obtain better estimates of
the root. Thus, they are generally much more efficient than bracketing methods.
Some general hints for root finding
Nonlinear equations can behave in various ways in the vicinity of a root. Algebraic
and transcendental equations may have simple real roots, multiple real roots, or complex
roots. Polynomials may have real or complex roots. If the polynomial coefficients are all
real, complex roots occur in conjugate pairs. If the polynomial coefficients are complex,
single complex roots can occur.
There are numerous methods for finding the roots of a nonlinear equation. Some
general philosophy of root finding is given below.
1. The bounding method should bracket a root, if possible.
2. Good initial approximations are extremely important.
3. Closed domain methods are more robust than open domain methods because they
keep the root in a closed interval.
4. Open domain methods, when they converge, generally converge faster than
closed domain methods.
5. For smoothly varying functions, most algorithms will always converge if the initial
approximation is close enough. The rate of convergence of most algorithms can be
determined in advance.
6. Many problems in engineering and science are well behaved and straightforward.
In such cases, a straightforward open domain method, such as Newton’s method,
or the secant method, can be applied without worrying about special cases and
strange behavior. If problems arise during the solution, then the peculiarities of the
nonlinear equation and the choice of solution method can be reevaluated.
7. When a problem is to be solved only once or a few times, then the efficiency of the
method is not of major concern. However, when a problem is to be solved many
times, efficiency is of major concern.
8. Polynomials can be solved by any of the methods for solving nonlinear equations.
However, the special techniques applicable to polynomials should be considered.
9. If a nonlinear equation has complex roots, that has to be anticipated when choosing
a method.
10. Time for problem analysis versus computer time has to be considered during method
selection.
11. Generalizations about root-finding methods are generally not possible.
The root-finding algorithms should contain the following features:
1. An upper limit on the number of iterations.
2. If the method uses the derivative f'(x), it should be monitored to ensure that it does
not approach zero.
3. A convergence test for the change in the magnitude of the solution, |x_{i+1} − x_i|, or the
magnitude of the nonlinear function, |f(x_{i+1})|, has to be included.
4. When convergence is indicated, the final root estimate should be inserted into the
nonlinear function f (x) to guarantee that f (x) = 0 within the desired tolerance.
Let x = a be a simple root of the equation f(x) = 0 and let x0 be an approximation of it.
Expanding f in a Taylor series about x0 gives
(6.1.1.1) f(a) = f(x0) + f'(x0)(a − x0) + (1/2)f''(ξ)(a − x0)²,
where ξ = x0 + θ(a − x0) (0 < θ < 1). Having in mind that f(a) = 0, by neglecting the last
member on the right-hand side of (6.1.1.1), we get
a ≅ x1 = x0 − f(x0)/f'(x0).
Here x1 represents the abscissa of the intersection of the tangent to the curve y = f(x) at the
point (x0, f(x0)) with the x-axis (see Figure 6.1.1.1).
Repeating this construction leads to the Newton (Newton-Raphson) iteration
(6.1.1.3) x_{k+1} = x_k − f(x_k)/f'(x_k) (k = 0, 1, ...),
i.e., x_{k+1} = φ(x_k) with the iteration function φ(x) = x − f(x)/f'(x). By differentiation we get
(6.1.1.4) φ'(x) = 1 − (f'(x)² − f(x)f''(x))/f'(x)² = f(x)f''(x)/f'(x)².
Note that φ(a) = a and φ'(a) = 0. Since, under the accepted assumptions on f, the function φ'
is continuous on [α, β] and φ'(a) = 0, there exists a neighborhood of the point x = a, denoted
by U(a), in which it holds
(6.1.1.5) |φ'(x)| = |f(x)f''(x)/f'(x)²| ≤ q < 1.
Theorem 6.1.1.1. If x0 ∈ U(a), the sequence {x_k} generated using (6.1.1.3) converges to the point x = a, whereby
(6.1.1.6) lim_{k→+∞} (x_{k+1} − a)/(x_k − a)² = f''(a)/(2f'(a)).
Example 6.1.1.1. Consider the equation
f(x) = x − cos x = 0.
Note that f'(x) = 1 + sin x > 0 (∀x ∈ [0, π/2]). Starting with x0 = 1, we get the results given
in Table 6.1.1.
Table 6.1.1
k xk
0 1.000000
1 0.750364
2 0.739133
3 0.739085
4 0.739085
The last two iterations give the solution of the equation under consideration to six exact
figures.
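The iterations of Table 6.1.1 can be reproduced with a short program; the following is a minimal Python sketch (illustrative only) of the Newton iteration x_{k+1} = x_k − f(x_k)/f'(x_k) applied to f(x) = x − cos x with x0 = 1:

import math

def newton(f, df, x0, tol=1e-8, max_iter=50):
    # Newton's method: x_{k+1} = x_k - f(x_k)/f'(x_k)
    x = x0
    for k in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x - math.cos(x), lambda x: 1.0 + math.sin(x), 1.0)
print(root)   # approx. 0.739085, as in Table 6.1.1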
Example 6.1.1.2.
By applying Newton's method to the equation f(x) = x^n − a = 0 (a > 0, n > 1), we obtain
the iterative formula for determining the n-th root of a positive number a:
x_{k+1} = x_k − (x_k^n − a)/(n x_k^{n−1}) = (1/n)((n − 1)x_k + a/x_k^{n−1}) (k = 0, 1, ...).
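For n = 2 this recurrence is the classical Heron iteration for the square root; a minimal Python sketch (names chosen for this illustration only):

def nth_root(a, n, x0=1.0, tol=1e-12, max_iter=100):
    # Newton iteration x_{k+1} = ((n-1)*x_k + a/x_k**(n-1)) / n for the n-th root of a > 0
    x = x0
    for _ in range(max_iter):
        x_new = ((n - 1) * x + a / x ** (n - 1)) / n
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(nth_root(2.0, 2))   # ~1.414213562..., the square root of 2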
The case f''(a) = 0 is to be analyzed separately. Namely, if we suppose that f ∈ C³[α, β], one
can prove that
lim_{k→+∞} (x_{k+1} − a)/(x_k − a)³ = f'''(a)/(3f'(a)).
Example 6.1.1.3. Consider the equation f(x) = x³ − 3x² + 4x − 2 = 0.
Because f(0) = −2 and f(1.5) = 0.625, we conclude that this equation has a root on the segment
[0, 1.5]. On the other hand, f'(x) = 3x² − 6x + 4 = 3(x − 1)² + 1 > 0, which means that
the root is simple, enabling the application of Newton's method. Starting with x0 = 1.5, we
get the results in Table 6.1.2.
Table 6.1.2
k xk
0 1.5000000
1 1.1428571
2 1.0054944
3 1.0000003
For the modified Newton method x_{k+1} = x_k − f(x_k)/f'(x0) (k = 0, 1, ...), with iteration
function φ1(x) = x − f(x)/f'(x0), because φ1(a) = a and φ1'(a) = 1 − f'(a)/f'(x0), we conclude
that the method has order of convergence one, i.e., it holds
x_{k+1} − a ∼ (1 − f'(a)/f'(x0))(x_k − a) (k → +∞).
A special case of this method, for p = 1 − n, is known as Tihonov's method ([7]) for the
case when f is an algebraic polynomial of degree n.
The following modification of Newton's method consists of the successive application
of the formulas
(6.1.1.10) y_k = x_k − f(x_k)/f'(x_k), x_{k+1} = y_k − f(y_k)/f'(x_k) (k = 0, 1, ...).
Suppose now that x = a is a root of multiplicity m ≥ 2, i.e., f(x) = (x − a)^m g(x) with g(a) ≠ 0. Then
∆(x) = f(x)/f'(x) = (x − a)g(x)/(m g(x) + (x − a)g'(x)) (x ≠ a),
and the Newton iteration function is Φ(x) = x − ∆(x). Because Φ(a) = a, Φ'(a) = 1 − 1/m,
1/2 ≤ Φ'(a) < 1 (m ≥ 2), and Φ' is a continuous function, it follows that there exists a
neighborhood of the root x = a in which |Φ'(x)| ≤ q < 1, wherefrom we conclude that
Newton's method is in this case also convergent, but with order of convergence 1.
If we know in advance the order of multiplicity of the root, then Newton's method can be
modified in such a way that it has order of convergence 2. Namely, one takes
(6.1.2.2) x_{k+1} = x_k − m f(x_k)/f'(x_k) (k = 0, 1, ...).
Remark 6.1.2.1.
Formally, the formula (6.1.2.2) is Newton's method applied to solving the equation
F(x) = f(x)^{1/m} = 0.
Theorem 6.1.2.1. If x0 is chosen close enough to the root x = a of multiplicity m, then the sequence
{x_k}k∈N0 defined by (6.1.2.2) converges to a with order of convergence 2. Note that this iteration
function could be obtained from (6.1.1.8) by taking Ψ(x) = 1/f'(x).
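A minimal Python sketch of the iteration (6.1.2.2), assuming the multiplicity m is known in advance (the test function below is a hypothetical example):

def newton_multiple(f, df, m, x0, tol=1e-10, max_iter=100):
    # Modified Newton iteration x_{k+1} = x_k - m*f(x_k)/f'(x_k) for a root of multiplicity m
    x = x0
    for _ in range(max_iter):
        step = m * f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical example: f(x) = (x - 1)^2 (x + 2) has a double root at x = 1 (m = 2)
f  = lambda x: (x - 1) ** 2 * (x + 2)
df = lambda x: 2 * (x - 1) * (x + 2) + (x - 1) ** 2
print(newton_multiple(f, df, 2, 3.0))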
(6.1.3.1) x_{k+1} = x_k − (x_k − x_{k−1})/(f(x_k) − f(x_{k−1})) · f(x_k) (k = 1, 2, ...),
which belongs to the open domain methods (it is a two-step method). To start the iterative
process (6.1.3.1), two initial values x0 and x1 are needed. A geometric interpretation of the
secant method is given in Figure 6.1.3.1.
Let the equation f(x) = 0 have a unique root x = a on the segment [α, β]. To examine the
convergence of the iterative process (6.1.3.1), suppose that f ∈ C²[α, β] and f'(x) ≠ 0 (∀x ∈ [α, β]).
If we put e_k = x_k − a (k = 0, 1, ...), from (6.1.3.1) it follows
(6.1.3.2) e_{k+1} = e_k − (e_k − e_{k−1})/(f(x_k) − f(x_{k−1})) · f(x_k),
i.e.
(6.1.3.3) e_{k+1} = e_k e_{k−1} · f''(a)/(2f'(a)) · (1 + O(e_{k−1})),
wherefrom it follows
r² − r − 1 = 0 and C_r = (f''(a)/(2f'(a)))^{1/r}.
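A minimal Python sketch of the secant iteration (6.1.3.1), which needs the two starting values x0 and x1 (illustrative only):

import math

def secant(f, x0, x1, tol=1e-10, max_iter=100):
    # Secant method: x_{k+1} = x_k - (x_k - x_{k-1})/(f(x_k) - f(x_{k-1})) * f(x_k)
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

print(secant(lambda x: x - math.cos(x), 0.5, 1.0))   # ~0.739085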
Remark 6.1.3.1. For solving the equation
(6.1.3.5) x = g(x)
one can find in the literature Wegstein's method ([9]), where, starting from x0, the sequence
{x_k}k∈N is generated using
(6.1.3.6) x1 = g(x0),
x_{k+1} = g(x_k) − (g(x_k) − g(x_{k−1}))(g(x_k) − x_k)/((g(x_k) − g(x_{k−1})) − (x_k − x_{k−1})) (k = 1, 2, ...).
It can be shown that this method is actually the secant method with the initial values x0 and
x1 = g(x0). Namely, if we write equation (6.1.3.5) in the form
This method is often called regula falsi or the false position method. Unlike the secant
method, where it is enough to take x1 ≠ x0, this method requires x0 and x1 to lie on
opposite sides of the root x = a. A geometric interpretation of the false position method is given
in Figure 6.1.3.2.
The iteration function of the modified secant method is
Φ(x) = x − (x − x0)/(f(x) − f(x0)) · f(x) = (x0 f(x) − x f(x0))/(f(x) − f(x0)).
Because Φ(a) = a and Φ'(a) ≠ 0, we conclude that the iterative process (6.1.3.8), if it converges,
has order of convergence 1. The condition of convergence is, in this case,
Example 6.1.3.4.
Using convergence acceleration (see [1, Theorem 2.4.1, p. 197]) on the iterative
process (6.1.3.8), we get an iterative process of second order,
x_{k+1} = (x0 g(x_k) − x_k h(x_k))/(g(x_k) − h(x_k)) (k = 1, 2, ...),
Consider the equation
(6.1.4.1) f(x) = 0,
where f ∈ C[α, β]. The method of interval bisection for solving equation (6.1.4.1) consists
in the construction of a sequence of intervals {(x_k, y_k)}k∈N such that
y_{k+1} − x_{k+1} = (1/2)(y_k − x_k) (k = 1, 2, ...),
with lim_{k→+∞} x_k = lim_{k→+∞} y_k = a. The process of constructing the intervals is
interrupted when, for example, the interval length becomes smaller than a given small
positive number ε. This method can be described by the following four steps:
I. k := 0, x1 := α, y1 := β;
II. k := k + 1, z_k := (x_k + y_k)/2;
III. If f(z_k)f(x_k) < 0, take x_{k+1} := x_k, y_{k+1} := z_k;
     if f(z_k)f(x_k) > 0, take x_{k+1} := z_k, y_{k+1} := y_k;
     if f(z_k)f(x_k) = 0, take a := z_k and end the calculation;
IV. If |y_{k+1} − x_{k+1}| ≥ ε, go to II;
    if |y_{k+1} − x_{k+1}| < ε, take z_{k+1} := (x_{k+1} + y_{k+1})/2 and end the calculation.
Note that the error estimate for the approximation z_{k+1} is
|z_{k+1} − a| ≤ (β − α)/2^{k+1}.
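Steps I-IV translate directly into code; a minimal Python sketch, assuming f(α)f(β) < 0 (names are illustrative):

import math

def bisection(f, alpha, beta, eps=1e-8):
    # Interval halving: keep the half-interval on which f changes sign
    x, y = alpha, beta
    fx = f(x)
    while abs(y - x) >= eps:
        z = 0.5 * (x + y)                # step II
        fz = f(z)
        if fz == 0.0:
            return z                     # step III, third case: exact root found
        if fx * fz < 0.0:                # root lies in [x, z]
            y = z
        else:                            # root lies in [z, y]
            x, fx = z, fz
    return 0.5 * (x + y)                 # step IV: final midpoint z_{k+1}

print(bisection(lambda x: x - math.cos(x), 0.0, math.pi / 2))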
Finding the higher derivatives of the function F, supposing that it is differentiable enough
times, can be very complicated, so for the Schröder development a recursive procedure is
suggested (see [1], p. 353).
Suppose that the function f is (n + 1) times differentiable on [α, β], and that
(6.1.5.1) F^{(k)}(y) = X_k/(f')^{2k−1} (k = 1, ..., n + 1),
Suppose that the function f has on the segment [α, β] a simple zero x = a, and denote a
neighborhood of it by U(a). If we put h = −f(x)/f'(x) (x ∈ U(a)), then f(x) + h f'(x) = 0,
wherefrom we have
a = F(0) = F(f + h f').
where y = f + t h f' = (1 − t)f = θf (t, θ ∈ (0, 1)). Finally, using (6.1.5.1) we get the Schröder
development
a − x = Σ_{k=1}^{n} (1/k!) X_k(f''/f', f'''/f', ..., f^{(k)}/f') h^k + O(f(x)^{n+1}),
i.e.
(6.1.5.3) a − x = h − (f''/(2f')) h² + ((3f''² − f'f''')/(6f'²)) h³
+ ((10f'f''f''' − f'²f^{IV} − 15f''³)/(24f'³)) h⁴ + ...
Suppose that the equation
(6.1.6.1) f(x) = 0
has on the segment [α, β] a unique simple root x = a, and that the function f is differentiable
sufficiently many times on [α, β].
1. Using the Schröder development, by taking a finite number of leading terms on the
right-hand side of (6.1.5.3), we can obtain a number of iterative formulas.
Let
Φ2(x) = x + h = x − f(x)/f'(x),
Φ3(x) = Φ2(x) − (f''(x)/(2f'(x))) h² = x − f(x)/f'(x) − f''(x)f(x)²/(2f'(x)³),
Φ4(x) = Φ3(x) + ((3f''² − f'f''')/(6f'²)) h³
      = x − f(x)/f'(x) − f''(x)f(x)²/(2f'(x)³) − (f(x)³/(6f'(x)⁴))(3f''(x)²/f'(x) − f'''(x)),
etc.
Note that Φ2 is the iteration function of Newton's method.
Because h behaves as a − x in the first approximation (x → a), on the basis of (6.1.5.3) we have
(6.1.6.3) x_{k+1} = x_k − f(x_k)/f'(x_k) − (f(x_k)²/(2f'(x_k)³)) f''(x_k) (k = 1, 2, ...),
where f''(x_k) = −(6/ε_k²)(f(x_k) − f(x_{k−1})) + (2/ε_k)(2f'(x_k) + f'(x_{k−1})) and ε_k = x_k − x_{k−1}.
The order of convergence of this process is r = 1 + √3. The iteration function of this process is
a modification of the Chebyshev function Φ3.
In the paper [11], Milovanović and Petković considered a modification of the function Φ3
using the approximation
f''(x_k) ≈ (f'(x_k + ε_k) − f'(x_k))/ε_k,
whereby ε_k → 0 when k → +∞. The corresponding iterative process is
(6.1.6.4) x_{k+1} = x_k − f(x_k)/f'(x_k) − (f(x_k)²/(2f'(x_k)³)) · (f'(x_k + ε_k) − f'(x_k))/ε_k.
(6.1.6.5) f'(x) ≈ f̄'(x) = (f(x + f(x)) − f(x − f(x)))/(2f(x)),
f''(x) ≈ f̄''(x) = (f(x + f(x)) − 2f(x) + f(x − f(x)))/f(x)².
x_{k+1} = x_k − (x_k − Φ(x_k))/(1 − (1/r)Φ'(x_k)) (k = 0, 1, ...);
for Newton's method, Φ(x) = x − f(x)/f'(x), we get the method
x_{k+1} = x_k − 2f(x_k)f'(x_k)/(2f'(x_k)² − f(x_k)f''(x_k)) (k = 0, 1, ...).
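The last formula coincides with Halley's method; a minimal Python sketch, assuming f, f' and f'' are available (illustrative only):

import math

def halley(f, df, d2f, x0, tol=1e-12, max_iter=50):
    # x_{k+1} = x_k - 2 f f' / (2 f'^2 - f f'')
    x = x0
    for _ in range(max_iter):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        step = 2.0 * fx * dfx / (2.0 * dfx ** 2 - fx * d2fx)
        x -= step
        if abs(step) < tol:
            break
    return x

print(halley(lambda x: x - math.cos(x),
             lambda x: 1.0 + math.sin(x),
             lambda x: math.cos(x), 1.0))   # ~0.739085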
6.2.0. Introduction
The system of nonlinear equations
(6.2.0.1) fi(x1, ..., xn) = 0 (i = 1, ..., n)
can be written in the form of the operator equation
(6.2.0.2) F u = θ,
where F is an operator which maps the Banach space X to the Banach space Y, and θ is the
null-vector. Here X = Y = R^n, u = ~x = [x1 ... xn]^T, θ = [0 ... 0]^T, and
(6.2.0.3) F u = ~f(~x) = [f1(x1, ..., xn) ... fn(x1, ..., xn)]^T.
The basic method for solving the operator equation (6.2.0.2), and thus also the system of
equations (6.2.0.1), is the Newton-Kantorowich (Newton-Raphson) method, which is a
generalization of the Newton method (6.1.1.3). It can be written in the form
(6.2.1.2) u_{k+1} = T u_k (k = 0, 1, ...),
where
T u = u − Γ(u)F u and Γ(u) = [F'(u)]^{−1}.
For developing methods for the solution of systems of nonlinear equations, we will state
some crucial theorems without proofs (see [1], pp. 375-380).
Theorem 6.2.1.1. Let the operator F be twice Fréchet differentiable on D, and let the operator Γ(u)
exist for every u ∈ D. If the operators Γ(u) and F''(u) are bounded, and u0 ∈ D is close enough to the
point a, the iterative process (6.2.1.2) has order of convergence at least two.
For the usual considerations we suppose that D is the ball K[u0, R], where u0 is the starting
value of the sequence {u_k}k∈N0.
If the Lipschitz condition
(6.2.1.3) ||F'(u) − F'(v)|| ≤ L ||u − v|| (u, v ∈ K[u0, R])
is fulfilled, from
F u − F v − F'(v)(u − v) = ∫_0^1 [F'(v + t(u − v)) − F'(v)](u − v) dt
it follows that, under the conditions
(6.2.1.5) ||Γ0|| ≤ b0, ||Γ0 F u0|| ≤ η0, h0 = b0 L η0 ≤ 1/2,
the sequence {u_k}k∈N0 generated by means of (6.2.1.2) converges to the solution a ∈ K[u0, r0] of equation (6.2.0.2).
the existence of the sequence {u_k}k∈N0 is proven, and the relations
(6.2.1.7) ||Γ(u_k)|| ≤ b_k, ||Γ(u_k)F u_k|| ≤ η_k, h_k ≤ 1/2
hold.
Theorem 6.2.1.3. When the conditions of the previous theorem are fulfilled, it holds
(6.2.1.9) ||u_k − a|| ≤ (1/2^{k−1}) (2h_0)^{2^k − 1} η_0 (k ∈ N).
(6.2.1.11) T u = u − Γ0 F u, u_{k+1} = T u_k (k = 0, 1, 2, ...);
the sequence generated using (6.2.1.10) converges to the solution a ∈ K[u0, r0] of equation (6.2.0.2).
where ~r^(k) = [r1^(k) ... rn^(k)]^T. If the Jacobian matrix of ~f is regular, then we have
~a = ~x^(k) − W^{−1}(~x^(k)) ~f(~x^(k)) − W^{−1}(~x^(k)) ~r^(k).
By neglecting the last member on the right-hand side, instead of the vector ~a we get
its new approximation, denoted by ~x^(k+1). In this way, one obtains (6.2.1.13).
As already noted, the method (6.2.1.13) can be modified in the sense that the inverse matrix
of W(~x) is not evaluated at every step, but only at the first one. Thus,
(6.2.1.14) ~x^(k+1) = ~x^(k) − W^{−1}(~x^(0)) ~f(~x^(k)) (k = 0, 1, ...).
Remark 6.2.1.1. The modified method (6.2.1.14) can be considered as a simple iterative method
with the matrix Λ obtained from the condition that the derivative of T is the null-operator, i.e., that
I + ΛW(~x^(0)) is the null-matrix. If W(~x^(0)) is a regular matrix, then Λ = −W^{−1}(~x^(0)).
The previous introductory theorems can be adapted to the case of a system of nonlinear
equations, whereby the conditions for convergence of the processes (6.2.1.13) and (6.2.1.14)
can be expressed in different ways, depending on the norms introduced in X. For example,
taking for the norm in R^n
||~x|| = ||~x||∞ = max_i |x_i|
and supposing that ~f ∈ C²(D), where D is the ball K[~x^(0), R], from Theorem 6.2.1.2 it follows
(6.2.1.16) ||~f(~x^(0))|| ≤ Q, ||W^{−1}(~x^(0))|| ≤ b;
(6.2.1.17) ∆0 = det W(~x^(0)) ≠ 0, h = nNQb² ≤ 1/2.
Then, if R ≥ r = ((1 − √(1 − 2h))/h) Qb, the Newton-Kantorowich method (6.2.1.13) converges to a
solution a ∈ K[~x^(0), r].
Because for 0 < h ≤ 1/2 it holds (1 − √(1 − 2h))/h ≤ 2, for r in Corollary 6.2.1.1
we can take r = 2Qb.
The modified Newton-Kantorowich method (6.2.1.14) also converges under the conditions
given in Corollary 6.2.1.1.
In [1, pp. 384-386] the Newton-Kantorowich method is illustrated on a system of
nonlinear equations in two unknowns. It is suggested to the reader to write the program code
in Mathematica and Fortran.
Example 6.2.1.1. Solve the system of nonlinear equations
Open(1, File=’Newt-Kant.out’)
x10=2.d0
x20=1.d0
EPS=1.d-6
Iter=0
write(1,5)
5 format(1h ,// 3x, ’i’,7x,’x1(i)’,9x,’x2(i)’,
* 9x,’f1(i)’, 9x,’f2(i)’/)
write(1,10)Iter, x10,x20,F1(x10,x20),F2(x10,x20)
1 x11=x10-((32*x20+1)*f1(x10,x20)-(9*x10**2+8*x20)*
* f2(x10,x20)) /Delta(x10,x20)
x21=x20-(4*x10**3*f1(x10,x20)+18*x10*x20*f2(x10,x20))
* /Delta(x10,x20)
Iter=Iter+1
write(1,10)Iter, x11,x21,F1(x11,x21),F2(x11,x21)
10 Format(1x,i3, 4D14.8,2x)
If(Dabs(x10-x11).lt.EPS.and.Dabs(x20-x21).lt.EPS)stop
If(Iter.gt.100)Stop
x10=x11
x20=x21
go to 1
End
and the output list of results is
i x1(i) x2(i) f1(i) f2(i)
0 .20000000D+01 .10000000D+01 .40000000D+01 .20000000D+01
1 .19830508D+01 .92295840D+00 .73136345D-01 .88110835D-01
2 .19837071D+01 .92074322D+00-.28694053D-04 .68348441D-04
3 .19837087D+01 .92074264D+00-.10324186D-10-.56994853D-10
4 .19837087D+01 .92074264D+00 .00000000D+00-.15543122D-14
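For comparison, here is a minimal Python/NumPy sketch of the Newton-Kantorowich iteration (6.2.1.13) for a general system; the 2x2 test system used below is a hypothetical illustration, not the (omitted) system of Example 6.2.1.1:

import numpy as np

def newton_kantorovich(f, jacobian, x0, eps=1e-10, max_iter=100):
    # x^(k+1) = x^(k) - W^(-1)(x^(k)) f(x^(k)), with W the Jacobian matrix
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jacobian(x), f(x))
        x = x - dx
        if np.max(np.abs(dx)) < eps:
            break
    return x

# Hypothetical test system: x^2 + y^2 - 4 = 0, x*y - 1 = 0
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
W = lambda v: np.array([[2.0*v[0], 2.0*v[1]], [v[1], v[0]]])
print(newton_kantorovich(f, W, [2.0, 1.0]))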
The gradient method for solving a given system of equations is based on minimization of
the functional
U(~x) = Σ_{i=1}^{n} f_i(x_1, ..., x_n)² = (~f(~x), ~f(~x)),
wherefrom we obtain
(6.2.2.3) λ_k = t = (Σ_{i=1}^{n} H_i f_i(~x^(k))) / (Σ_{i=1}^{n} H_i²),
we have
where ~f^(k) = ~f(~x^(k)) and W_k = W(~x^(k)). Finally, the gradient method can be represented in
the form
~x^(k+1) = ~x^(k) − 2λ_k W_k^T ~f(~x^(k)) (k = 0, 1, ...).
As we see, instead of the matrix W^{−1}(~x^(k)), which appears in the Newton-Kantorowich
method, we now have the matrix 2λ_k W_k^T.
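A minimal Python/NumPy sketch of the gradient step ~x^(k+1) = ~x^(k) − 2λ_k W_k^T ~f(~x^(k)) is given below; since the quantities H_i of (6.2.2.3) are introduced in a part of the derivation omitted here, the sketch chooses λ_k by minimizing the linearized residual, which is one common choice (the test system is hypothetical):

import numpy as np

def gradient_step(f, jacobian, x):
    # One gradient step for U(x) = f(x).f(x): x_new = x - 2*lam*W^T f(x).
    # Here 2*lam*W^T f(x) = lam*g with g = grad U, and lam is chosen to
    # minimize the linearized residual ||f(x) - lam*W g||^2.
    fx = f(x)
    W = jacobian(x)
    g = 2.0 * W.T @ fx
    Wg = W @ g
    lam = 0.5 * (g @ g) / (Wg @ Wg)
    return x - lam * g

# Hypothetical test system: x^2 + y^2 - 4 = 0, x*y - 1 = 0
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
W = lambda v: np.array([[2.0*v[0], 2.0*v[1]], [v[1], v[0]]])
x = np.array([2.0, 1.0])
for _ in range(20):
    x = gradient_step(f, W, x)
print(x)   # slowly approaches a solution of the system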
Example 6.2.1.2. The system of nonlinear equations given in Example 6.2.1.1 will be solved using the gradient
method, starting with the same initial vector ~x^(0) = [2 1]^T, giving the following list of results
to a solution from almost any starting point. Therefore, it is our goal to develop an
algorithm that combines the rapid local convergence of Newton's method with a glob-
ally convergent strategy that will guarantee some progress towards the solution at each
iteration. The algorithm is closely related to the quasi-Newton method of minimization
(see [5], p. 376).
From (6.2.1.13), the Newton-Raphson method, we have the so-called Newton step in the
iteration formula,
(6.2.3.1) W · δ~x = −F,
where W is the Jacobian matrix. The question is how one should decide whether to accept the
Newton step δ~x. If we denote F = ~f(~x^(k)), a reasonable strategy for step acceptance is
that |F|² = F · F decreases, which is the same requirement one would impose if trying to
minimize
(6.2.3.2) f = (1/2) F · F.
Every solution of (6.2.1.12) minimizes (6.2.3.2), but there may be some local minima of
(6.2.3.2) that are not solutions of (6.2.1.12). Thus, simply applying some minimum-finding
algorithm can go wrong.
To develop a better strategy, note that the Newton step (6.2.3.1) is a descent direction
for f:
(6.2.3.3) ∇f · δ~x = (F · W) · (−W^{−1} · F) = −F · F < 0.
Thus, the strategy is quite simple. One should first try the full Newton step, because
once we are close enough to the solution, we will get quadratic convergence. However,
we should check at each iteration that the proposed step reduces f. If not, we go
back (backtrack) along the Newton direction until we get an acceptable step. Because the
Newton direction is a descent direction for f, we are sure to find an acceptable step by
backtracking.
Note that this strategy essentially minimizes f by taking Newton steps
determined in such a way that they bring ∇f to zero. Although this method can
occasionally lead to a local minimum of f, this is rather rare in practice. In such a case,
one should try a new starting point.
The aim is to find λ so that f (~xold + λ~p) has decreased sufficiently. Until the early 1970s,
standard practice was to choose λ so that ~xnew exactly minimizes f in the direction p~.
However, we now know that it is extremely wasteful of function evaluations to do so. A
better strategy is as follows: Since p~ is always the Newton direction in our algorithms,
we first try λ = 1, the full Newton step. This will lead to quadratic convergence when
~x is sufficiently close to the solution. However, if f (~xnew ) does not meet our acceptance
criteria, we backtrack along the Newton direction, trying a smaller value of λ, until
we find a suitable point. Since the Newton direction is a descent direction, we are
guaranteed to decrease f for sufficiently small λ. What should the criterion for accepting
a step be? It is not sufficient to require merely that f (~xnew ) < f (~xold ). This criterion can
fail to converge to a minimum of f in one of two ways. First, it is possible to construct
a sequence of steps satisfying this criterion with f decreasing too slowly relative to the
step lengths. Second, one can have a sequence where the step lengths are too small
relative to the initial rate of decrease of f. A simple way to fix the first problem is to
require the average rate of decrease of f to be at least some fraction α of the initial rate
of decrease ∇f · ~p:
f(~x_new) ≤ f(~x_old) + α ∇f · (~x_new − ~x_old).
Here the parameter α satisfies 0 < α < 1. We can get away with quite small values of
α; α = 10−4 is a good choice. The second problem can be fixed by requiring the rate of
decrease of f at ~xnew to be greater than some fraction β of the rate of decrease of f at ~xold .
In practice, we will not need to impose this second constraint because our backtracking
algorithm will have a built-in cutoff to avoid taking steps that are too small.
Here is the strategy for a practical backtracking routine. Define
g(λ) ≡ f(~x_old + λ~p),
so that
(6.2.3.7) g'(λ) = ∇f · ~p.
If we need to backtrack, then we model g with the most current information we have
and choose λ to minimize the model. We start with g(0) and g'(0) available. The first
step is always the Newton step, λ = 1. If this step is not acceptable, we have available
g(1) as well. We can therefore model g(λ) as a quadratic,
g(λ) ≈ (g(1) − g(0) − g'(0)) λ² + g'(0) λ + g(0),
and take the value of λ that minimizes it. On later backtracks, g is modeled as a cubic in λ,
g(λ) = a λ³ + b λ² + g'(0) λ + g(0),
using the previous value g(λ1) and the second most recent value g(λ2). Requiring this
expression to give the correct values of g at λ1 and λ2 gives two equations that can be solved
for the coefficients a and b:
(6.2.3.11) [a, b]^T = (1/(λ1 − λ2)) · [[1/λ1², −1/λ2²], [−λ2/λ1², λ1/λ2²]] · [g(λ1) − g'(0)λ1 − g(0), g(λ2) − g'(0)λ2 − g(0)]^T.
One should enforce that λ lie between λmax = 0.5λ1 and λmin = 0.1λ1. The corresponding
code in FORTRAN is given in [5], pp. 378-381. It is suggested to the reader to write the
corresponding code in Mathematica.
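A minimal Python sketch of this backtracking strategy, using the sufficient-decrease test with α = 1e-4 and a quadratic model of g(λ) at each backtrack (the cubic refinement of (6.2.3.11) is omitted for brevity); it is written for illustration and is not a transcription of the FORTRAN routine in [5]:

import numpy as np

def line_search(func, x_old, f_old, grad, p, alpha=1e-4):
    # Backtrack along the (descent) direction p until the sufficient-decrease
    # condition f(x_old + lam*p) <= f(x_old) + alpha*lam*grad.p is satisfied.
    slope = grad @ p                       # g'(0) = grad f . p, must be negative
    lam = 1.0                              # always try the full Newton step first
    for _ in range(30):
        x_new = x_old + lam * p
        f_new = func(x_new)
        if f_new <= f_old + alpha * lam * slope:
            break
        # minimize the quadratic model through g(0), g'(0) and g(lam) ...
        lam_tmp = -0.5 * slope * lam ** 2 / (f_new - f_old - slope * lam)
        # ... and keep the new lam within [0.1*lam, 0.5*lam]
        lam = min(0.5 * lam, max(lam_tmp, 0.1 * lam))
    return x_new, f_new, lam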
where δ~x_i = ~x_{i+1} − ~x_i. The quasi-Newton or secant condition is that B_{i+1} satisfy
(6.2.3.14) B_{i+1} · δ~x_i = δF_i,
where δF_i = F_{i+1} − F_i. This is a generalization of the one-dimensional secant approximation
to the derivative, δF/δx. However, equation (6.2.3.14) does not determine B_{i+1} uniquely
in more than one dimension. Many different auxiliary conditions to determine B_{i+1} have
been examined, but the best one results in Broyden's formula. This formula is
based on the idea of obtaining B_{i+1} by making the least change to B_i consistent with the secant
equation (6.2.3.14). Broyden gave the formula
(6.2.3.15) B_{i+1} = B_i + ((δF_i − B_i · δ~x_i) ⊗ δ~x_i) / (δ~x_i · δ~x_i).
Thus, instead of solving equation (6.2.3.1) by, for example, LU decomposition, one determines
the quasi-Newton step δ~x_i = −B_i^{−1} · F_i.
Accordingly, one should implement the update formula in the form (6.2.3.15). However,
we can still preserve the O(n²) solution of (6.2.3.1) by using a QR decomposition of B_{i+1},
updated in O(n²) operations. All that is needed is an initial approximation B_0 to start the process.
It is often acceptable to take the identity matrix, and then allow O(n) updates to produce a reasonable
approximation to the Jacobian. In [5], pp. 382-383, the first n function evaluations are
spent on a finite-difference approximation in order to initialize B. Since B is not the exact
Jacobian, it is not guaranteed that δ~x is a descent direction for f = (1/2) F · F (see eq. (6.2.3.3)).
As a consequence, the line search algorithm can fail to return a suitable step if B is far
from the true Jacobian. In this case we simply reinitialize B.
Like the secant method in one dimension, Broyden’s method converges superlinearly
once you get close enough to the root. Embedded in a global strategy, it is almost as
robust as Newton’s method, and often needs far fewer function evaluations to determine
a zero. Note that the final value of B is not always close to the true Jacobian at the
root, even though the method converges.
The program code ([5], pp. 383-385) for Broyden's method differs from the Newton-type
methods in using QR decomposition instead of LU, and in determining the Jacobian by a
finite-difference approximation instead of direct evaluation.
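A minimal Python/NumPy sketch of Broyden's method with the update (6.2.3.15); for clarity it initializes B by a finite-difference Jacobian, solves the linear system directly instead of maintaining a QR factorization, and omits the global line-search strategy (the test system is hypothetical):

import numpy as np

def fd_jacobian(f, x, h=1e-7):
    # Forward-difference approximation of the Jacobian used to initialize B
    fx = f(x)
    J = np.empty((len(x), len(x)))
    for j in range(len(x)):
        xh = x.copy()
        xh[j] += h
        J[:, j] = (f(xh) - fx) / h
    return J

def broyden(f, x0, tol=1e-10, max_iter=100):
    # Quasi-Newton steps with the rank-one update (6.2.3.15):
    # B_{i+1} = B_i + outer(dF - B_i dx, dx) / (dx . dx)
    x = np.asarray(x0, dtype=float)
    Fx = f(x)
    B = fd_jacobian(f, x)
    for _ in range(max_iter):
        dx = -np.linalg.solve(B, Fx)       # LU solve instead of QR, for brevity
        x_new = x + dx
        F_new = f(x_new)
        dF = F_new - Fx
        B = B + np.outer(dF - B @ dx, dx) / (dx @ dx)
        x, Fx = x_new, F_new
        if np.max(np.abs(Fx)) < tol:
            break
    return x

# Hypothetical test system: x^2 + y^2 - 4 = 0, x*y - 1 = 0
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
print(broyden(f, [2.0, 1.0]))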
More Advanced Implementations
One of the principal ways that the methods described above can fail is if the matrix W
(Newton-Kantorowich) or B (Broyden's method) becomes singular or nearly singular,
so that ∆x cannot be determined. This situation will not occur very often in practice.
Methods developed to deal with this problem involve monitoring the condition
number of W and perturbing W if singularity or near singularity is detected. This
feature is most easily implemented if QR decomposition is used instead of LU decomposition
in the Newton (or quasi-Newton) method. However, although this approach
can solve problems where W is exactly singular and Newton's and Newton-like
methods fail, it is occasionally less robust on other problems where LU decomposition
succeeds. Implementation details, like roundoff, underflow, etc., have to be considered
and taken into account.
In [5], considering the effectiveness of strategies for minimization and zero finding, the
global strategies have been based on line searches. Other global algorithms, like the hook-step
and dogleg-step methods, are based on the model-trust-region approach, which is
related to the Levenberg-Marquardt algorithm for nonlinear least squares. Despite being
more complicated than line searches, these methods have a reputation for robustness
even when starting far from the desired zero or minimum.
Numerous libraries and software packages are available for solving nonlinear equa-
tions. Many workstations and mainframe computers have such libraries attached to their
operating systems. Many commercial software packages contain nonlinear equation
solvers. Very popular among engineers are Matlab and Mathcad. More sophisticated
packages like Mathematica, IMSL, Macsyma, and Maple contain programs for nonlin-
ear equation solving. The book Numerical Recipes [5] contains numerous programs for
solving nonlinear equations.
Bibliography (Cited references and further reading)
[1] Milovanović, G.V., Numerical Analysis I, Naučna knjiga, Beograd, 1988 (Serbian).
[2] Hoffman, J.D., Numerical Methods for Engineers and Scientists. Taylor & Francis,
Boca Raton-London-New York-Singapore, 2001.
[3] Milovanović, G.V. and Djordjević, Dj.R., Programiranje numeričkih metoda na
FORTRAN jeziku. Institut za dokumentaciju zaštite na radu ”Edvard Kardelj”,
Niš, 1981 (Serbian).
[4] Stoer, J., and Bulirsch, R., Introduction to Numerical Analysis, Springer, New York,
1980.
[5] Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T., Numerical Re-
cipes: The Art of Scientific Computing. Cambridge University Press, 1989.