


Solving Nonlinear Equations
Rómulo A. Chumacero∗
September 2001

1 Introduction
Concepts of economic equilibrium are often expressed as systems of nonlinear equations. Such problems generally take two forms: zeros and fixed points. If f : ℝⁿ → ℝⁿ,
then a zero of f is any x such that f (x) = 0, and a fixed point of f is any x such
that f (x) = x. These are of course essentially the same problem, since x is a fixed
point of f (x) if and only if it is a zero of f (x) − x.
The most famous nonlinear equation problem in economics is the Arrow-Debreu
concept of general equilibrium which reduces to finding a price vector at which excess
demand is zero. Other examples include finding Nash equilibria in games, transition
paths of dynamic systems, and computation of steady states in nonlinear deterministic
models to name a few. In general, given that one of the most important paradigms
in economics is summarized by the assumption that agents maximize an objective
function subject to constraints, solutions to these type of problems usually reduce
to finding values that satisfy the underlying first order conditions which are charac-
terized by nonlinear equations. In econometrics we confront the same problem when
estimating models by say maximum likelihood, where the combination of the statisti-
cal model and the data give rise to a first order condition that when solved yields the
estimates that maximize the probability of occurrence of the data given the model.
The document is organized as follows: Section 2 discusses the techniques used
to solve one-dimensional problems. Section 3 discusses methods for solving general
finite-dimensional problems. Finally, Section 4 concludes.

2 Solving One-Dimensional Problems


We first consider the case f : ℝ → ℝ of a single variable and the problem f (x) = 0.
This special case is of interest because several multivariate root-finding methods are

∗ Department of Economics of the University of Chile and Research Department of the Central Bank of Chile. E-mail address: [email protected]

generalizations of simpler single-variable techniques and in several instances, complex
multivariate problems can be reduced to a single nonlinear equation.
We begin this section by describing several methods to solve one-dimensional prob-
lems. All of them are iterative, that is, they produce a sequence of values x1 , x2 , ...,
which converges (hopefully!) to the solution. After presenting the algorithms, we dis-
cuss the collateral issues of choosing a starting value for the sequence, and of deciding
when to stop the iteration.

2.1 Simple Iteration


The simplest and often easiest method for solving f (x) = 0 is to solve a related
fixed-point equation which can be used when f (x) is continuous. First, reformulate
f (x) = 0 in the form g (x) = x. From a starting value x0 , compute xi = g (xi−1 ) until
convergence is attained.1

Algorithm 1 (Simple iteration) Objective: Find a fixed point for g (x).


Initialization: Choose convergence criterion and starting point x0 .
Step 1. Compute next iterate: xi = g (xi−1 )
Step 2. If convergence is achieved go to step 3, else go to step 1.
Step 3. xi is the solution.

Under what conditions will successive application of this adjustment lead to con-
vergence? Figure 1 shows an example of when this algorithm will work. Notice that
a necessary condition for convergence is that f be defined when evaluated at each
iterate. The following theorem guarantees existence and uniqueness of a solution, as
well as convergence.

Theorem 1 Let g (x) be a continuous function defined on the interval I = [a, b] such
that g (x) ∈ I whenever x ∈ I, and satisfying a Lipschitz condition with L < 1. Then
for any x0 ∈ I, the sequence defined by xi = g (xi−1 ) converges to the solution of the
equation g (x) = x, and the solution is unique.

Proof. Left as an exercise.2

Because it is easy to program, fixed-point iteration is worth knowing and trying;


if convergence fails, no more than five minutes may have been lost. Nevertheless, it
has several shortcomings; the most important being the Lipschitz condition required
for convergence, and a rate of convergence that can be painfully slow.
1. The most obvious reformulation is to set g (x) = x + f (x), so that xi = xi−1 + f (xi−1 ).
2. A function g (x) is said to satisfy a Lipschitz condition on I with index L if, for any s, t ∈ I, |g (s) − g (t)| ≤ L |s − t|.
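As a minimal sketch, Algorithm 1 with the reformulation from footnote 1 might be coded as follows (Python here is purely illustrative; the test equation g(x) = cos(x), a contraction on [0, 1], is not from the text):

```python
import math

# A minimal sketch of Algorithm 1 (simple iteration). The choice
# g(x) = cos(x), a contraction on [0, 1], is illustrative only.
def fixed_point(g, x0, tol=1e-10, max_iter=1000):
    """Iterate x_i = g(x_{i-1}) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("simple iteration did not converge")

# Since |g'(x)| = |sin(x)| < 1 near the fixed point, Theorem 1 applies.
root = fixed_point(math.cos, x0=0.5)
```

The Lipschitz bound L = max |sin(x)| < 1 on the relevant interval is what makes Theorem 1 bite here.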

Figure 1: Simple Iteration.

2.2 Newton-Raphson
The simple iteration method requires f (x) to be continuous and to satisfy a Lipschitz
condition. The Newton-Raphson method requires f (x) to be twice continuously
differentiable and f′ ≠ 0 at the solution s. Even though these requirements appear
to be more restrictive, they are commonly satisfied in practice. An advantage of this
method over simple iteration is that it converges quadratically, while
the latter usually exhibits only linear convergence.3
We derive the method by expanding f (x) in a Taylor series about the current
iterate, writing

0 = f (s) = f (xi ) + (s − xi ) f′(xi ) + [(s − xi )² / 2] f″(x*)

where x* is a point between s and xi . When xi is sufficiently close to s, the remainder
term will be small relative to the other terms (provided that f′(s) ≠ 0). Dropping
the remainder then yields the approximation

s ≈ xi − f (xi ) / f′(xi )

Algorithm 2 (Newton-Raphson) Objective: Find a zero of f (x).


Initialization: Choose convergence criterion and starting point x0 .
Step 1. Compute next iterate: xi = xi−1 − f (xi−1 ) / f′(xi−1 )
Step 2. If convergence is achieved go to step 3, else go to step 1.
Step 3. xi is the solution.
3. The order of convergence is defined as follows. Let s denote the solution being sought, and
εi = |xi − s| the error at step i. The sequence x1 , x2 , ... is said to have convergence of order β if
limi→∞ εi+1 / εi^β = c, for some non-zero constant c.

This algorithm requires that the starting value be sufficiently close to the solution
to guarantee convergence. Even though an important advantage of this method is
its rapid convergence, it may be computationally costly given that for each iteration
the function and its derivative have to be evaluated. Furthermore, oftentimes it is impossible to write down a closed form for the derivative, and numerical approximations
(which may be inaccurate) have to be used.
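A minimal sketch of Algorithm 2, assuming an analytic derivative is available (the equation x² − 2 = 0 and its derivative are illustrative choices, not from the text):

```python
# A sketch of Algorithm 2 (Newton-Raphson). The equation x^2 - 2 = 0
# and its analytic derivative are illustrative, not from the text.
def newton(f, fprime, x0, tol=1e-12, max_iter=100):
    """Iterate x_i = x_{i-1} - f(x_{i-1}) / f'(x_{i-1})."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Starting from x0 = 1, the iterates converge quadratically to sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```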

2.3 The Secant Method


When the derivative of f (x) is either hard or impossible to write down, or when
the computational effort required to evaluate f 0 (x) is very large compared to that of
f (x), Newton-Raphson iteration is impossible or costly to carry out. An alternative
is to approximate the derivative by a finite difference, that is, to write
f′(xi ) ≈ [f (xi ) − f (xi−1 )] / (xi − xi−1 )
Algorithm 3 (Secant method) Objective: Find a zero of f (x).
Initialization: Choose convergence criterion and starting points x0 , x1 .
Step 1. Compute next iterate: xi = xi−1 − f (xi−1 ) (xi−1 − xi−2 ) / [f (xi−1 ) − f (xi−2 )]
Step 2. If convergence is achieved go to step 3, else go to step 1.
Step 3. xi is the solution.

This iteration is called the secant method because it approximates the function
f (x) by the secant line through two successive points in the iteration, rather than
the tangent at a single point used in the Newton-Raphson iteration. This method
provides a quite good approximation to the derivative provided that the two iterates
on which it is based are close to one another. It has the disadvantage of requiring two
starting points and is sensitive to them.4
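A sketch of Algorithm 3, keeping only the two most recent iterates (the cubic test equation is illustrative, not from the text):

```python
# A sketch of Algorithm 3 (secant method): the derivative in the Newton
# step is replaced by the slope through the two most recent iterates.
# The test equation x^3 - x - 2 = 0 is illustrative, not from the text.
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("secant method did not converge")

root = secant(lambda x: x ** 3 - x - 2.0, 1.0, 2.0)
```

Note that only one new evaluation of f is needed per iteration, which is the method's main economy over Newton-Raphson.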

2.4 Bracketing Methods


The methods discussed previously have essentially required that f′(x) exist and be
well-behaved. When the behavior of the first derivative is either unknown or unpleas-
ant, other numerical methods based on bracketing can be used. The basic bracketing
method requires that two points x0 < x1 be known for which f (x0 ) f (x1 ) < 0, and
that f be continuous on [x0 , x1 ], and the various algorithms proceed by systematically
reducing the length of the interval in which the root is known to lie. The simplest
such method is bisection.5
4. A generalization of the secant method is known as Muller's method, and has an order of
convergence of 1.839, while the order of convergence of the secant method is 1.618. It is nevertheless
more difficult to program.
5. This method works because if f is continuous, the intermediate value theorem tells us that there
is some zero of f in [x0 , x1 ].

The bisection method divides the current interval at its midpoint, x2 = (x0 + x1 ) /2.
If f (x2 ) = 0, we are done. If not, we take as the next interval the half which continues
to bracket a root.

Figure 2: Bisection Method.

Algorithm 4 (Bisection) Objective: Find a zero of f (x).


Initialization: Choose convergence criterion and starting points x0 , x1 with f (x0 ) f (x1 ) < 0.
Step 1. Define a = x0 , b = x1 .
Step 2. Compute next iterate: xi = (a + b) /2.
Step 3. Redefine bracket: If f (xi ) f (a) < 0, set b = xi ; else set a = xi .
Step 4. If convergence is achieved go to step 5, else go to step 2.
Step 5. xi is the solution.
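A sketch of Algorithm 4 (the bracket [0, 2] for x² − 2 is an illustrative choice, not from the text):

```python
# A sketch of Algorithm 4 (bisection); f(x) = x^2 - 2 on [0, 2] is an
# illustrative bracket, not from the text.
def bisect(f, a, b, tol=1e-12, max_iter=200):
    fa = f(a)
    if fa * f(b) >= 0:
        raise ValueError("starting points do not bracket a root")
    for _ in range(max_iter):
        m = (a + b) / 2.0
        fm = f(m)
        if fm == 0.0 or (b - a) / 2.0 < tol:
            return m
        if fa * fm < 0:
            b = m            # root lies in [a, m]
        else:
            a, fa = m, fm    # root lies in [m, b]
    return (a + b) / 2.0

root = bisect(lambda x: x * x - 2.0, 0.0, 2.0)
```

Each iteration halves the bracket, so the error after k steps is at most (x1 − x0)/2^(k+1): guaranteed but, as the text notes, slow.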

While this method guarantees convergence, it may be excruciatingly slow. A


modification applies the secant method to the bracketing interval, drawing the secant
connecting (a, f (a)) and (b, f (b)), and then splitting the interval at the point where
the secant intersects the horizontal axis. This method (which we shall refer to as the
secant-bracket method) speeds the convergence at the initial steps. However, as soon
as the interval is reduced to one in which f (x) is either concave or convex, one of the
two endpoints of the interval becomes fixed in the subsequent iterations, producing
only linear convergence.6

Algorithm 5 (Secant-bracket) Objective: Find a zero of f (x).


Initialization: Choose convergence criterion and starting points x0 , x1 with f (x0 ) f (x1 ) < 0.
Step 1. Define a = x0 , b = x1 .
Step 2. Compute next iterate: xi = [af (b) − bf (a)] / [f (b) − f (a)] .
Step 3. Redefine bracket: If f (xi ) f (a) < 0, set b = xi ; else set a = xi .
Step 4. If convergence is achieved go to step 5, else go to step 2.
Step 5. xi is the solution.

6. A further refinement has become known as the Illinois method, which has an order of convergence
of 1.442, while bisection and secant-bracket have a rate of 1.
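A sketch of Algorithm 5 (again with the illustrative bracket [0, 2] for x² − 2, not from the text). Since f is convex on this interval, the upper endpoint becomes fixed after the first step, illustrating the linear-convergence behavior described above:

```python
# A sketch of Algorithm 5 (secant-bracket, i.e. false position). The
# bracket [0, 2] for x^2 - 2 is illustrative, not from the text.
def secant_bracket(f, a, b, tol=1e-12, max_iter=500):
    fa, fb = f(a), f(b)
    x_old = a
    for _ in range(max_iter):
        # split the bracket where the secant crosses the horizontal axis
        x = (a * fb - b * fa) / (fb - fa)
        fx = f(x)
        if fx == 0.0 or abs(x - x_old) < tol:
            return x
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        x_old = x
    return x

root = secant_bracket(lambda x: x * x - 2.0, 0.0, 2.0)
```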

2.5 Starting Values and Convergence Criteria


All of the methods that we have discussed are iterative. A fully specified algorithm
based on an iterative scheme should have three components: a method for deciding
on a starting value for the iteration, a method for obtaining the next iterate from
its predecessors, and a method for deciding when to stop the iterative process. The
preceding sections examined ways to generate the iterative sequence itself, but treated
as given the starting values and stopping criteria. Next we discuss the practicalities
of selecting initial values and determining convergence.
Starting values matter in two ways. First, if the initial value for the iteration is
too far away from the solution, the iteration can diverge. Second, it is possible that
the function f (x) = 0 has multiple roots. In this case, the root to which the sequence
converges will depend on the starting value for the iteration. Unfortunately, there
is little constructive theory about choosing starting values; however, a few words of
general advice can be given. Much of the time, the equation being solved is similar to
another whose roots are easily obtained, so that a root of the latter can be used as a
starting point for the equation of actual interest. For example, in several econometric
problems, particularly when f (x) is the score function for a parameter, a preliminary
estimator such as a moment estimator can be used to get sufficiently close for the
iteration to converge rapidly. However, sometimes there is no alternative but to start
with a guess or two and observe the progress of the iteration, hoping that it
will be possible to adjust the starting point or iteration method as needed to achieve
convergence. A useful approach, when possible, is to graph the function. This can
often provide not only good starting values, but also some insight concerning an
appropriate form for the iteration.
The problem of detecting multiple roots, and of settling on the appropriate one
when more than one root is found, is a difficult one. Probably the most successful
general approach for discovering whether there are multiple roots is to start the
iteration several times, from the vicinity of possible solutions if enough is known
about the function, or using randomly chosen starting values otherwise.
There is more to say about stopping an iteration than about starting one. There are
two reasons for bringing an iteration to a halt: either the iteration has converged or
it has not. Since the solution of the equation is not known explicitly, the decision as
to whether an iteration has converged is based on monitoring either the sequence of
iterates to see if xi is sufficiently close to xi−1 , or the sequence of function evaluations
f (xi ) to see if these become sufficiently close to zero. The two most common defi-
nitions for successive iterates to be “sufficiently close” are embodied in the absolute
convergence criterion, which asserts convergence when |xi − xi−1 | < tol, and the relative
convergence criterion, which asserts convergence when |(xi − xi−1 ) /xi−1 | < tol, where

tol is a preselected tolerance. The absolute convergence criterion is most suitable
when the solution is close to zero; in this case the denominator of the relative crite-
rion can foster numerical difficulties. On the other hand, when the solution is large
(far away from zero) the relative criterion is generally more satisfactory. Once the
convergence criterion is satisfied, we ask if f (xi ) is “nearly” zero. More precisely we
stop if |f (xi )| ≤ δ for some prespecified δ. If we want high precision, we will choose
small δ, but that choice must be reasonable. Choosing δ = 0 is nonsense, since it is
unachievable; equally pointless is choosing δ = 10−20 on a 12-digit machine where f
can be calculated with at most 12 digits of accuracy.7
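A common way to combine the two criteria is to scale the change by max(|xi−1|, 1), which behaves like the relative criterion far from zero and like the absolute criterion near it. This hybrid rule is an illustrative recipe, not one prescribed by the text:

```python
# Hybrid stopping rule (illustrative): relative criterion away from
# zero, absolute criterion near zero, via the scale max(|x_old|, 1).
def converged(x_new, x_old, tol=1e-8):
    return abs(x_new - x_old) / max(abs(x_old), 1.0) < tol

ok_near_one = converged(1.0 + 1e-11, 1.0)   # tiny step: converged
ok_big_step = converged(2.0, 1.0)           # unit step: not converged
```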

3 Solving Multivariate Nonlinear Equations


Most problems have several unknowns, requiring the use of multidimensional methods. Suppose that f : ℝⁿ → ℝⁿ and that we want to solve f (x) = 0, a list of n
equations in n unknowns:

f^1(x_1 , ..., x_n ) = 0,
...
f^n(x_1 , ..., x_n ) = 0.

For the most part, methods for solving nonlinear systems are generalizations of
methods for single equations. Each of the methods (except for those based on brack-
eting) has at least one extension to the multivariate case. The problem of solving
nonlinear systems of equations arises most frequently in the context of optimizing a
scalar objective function in several variables. In that case, the system to be solved is
obtained by setting the gradient vector of the objective function to zero.
The methods discussed below will for the most part be based on locally linear
approximations to the vector function whose root is sought, so that generally the
algorithms discussed here will employ a mix of techniques from linear systems and
from multivariate nonlinear equations. As in the univariate case, we usually begin by
expanding f in a Taylor series about the current solution, and then evaluating the
series at the solution.
7. The range of numbers that are machine-representable varies greatly across machines; one should
always have a good idea of their values when working on a computer. Machine epsilon is the
smallest relative quantity that is machine-representable. Formally, this is the smallest ε such that
the machine knows that 1 − ε < 1 < 1 + ε. It is also important to know machine infinity, that
is, the largest number such that both it and its negative are representable. Overflow occurs when
an operation takes machine-representable numbers but wants to produce a number which exceeds
machine infinity in magnitude. A machine zero is any quantity that is equivalent to zero on the
machine. Underflow occurs when an operation takes nonzero quantities but tries to produce a
nonzero magnitude less than machine zero. The analyst must either know these important
constants for his machine or use more conservative guesses. Much of the available software contains a section
where the user must specify these arithmetic constants.

3.1 Gauss-Jacobi Algorithm
The simplest iteration method for solving multivariate nonlinear equations is the
Gauss-Jacobi method. Given the known value of the kth iterate, x^k, we use the ith
equation to compute the ith component of the next iterate, x^{k+1}. Formally,
x^{k+1} is defined in terms of x^k by the equations:

f^1(x_1^{k+1}, x_2^k, x_3^k, ..., x_n^k) = 0,
f^2(x_1^k, x_2^{k+1}, x_3^k, ..., x_n^k) = 0,
...
f^n(x_1^k, x_2^k, x_3^k, ..., x_n^{k+1}) = 0.

Each of these equations is a single nonlinear equation with one unknown, allowing
us to apply the single-equation methods presented in the previous sections. This
method reduces the problem of solving n unknowns simultaneously in n equations to
that of repeatedly solving n equations with one unknown.
The Gauss-Jacobi method is affected by the indexing scheme for the variables
and the equations. There is no natural choice for which variable is variable 1 and
which equation is equation 1. There are therefore n! different schemes, and it is
difficult to determine which is best, but some simple situations come to mind.
For example, if some equation depends on only one unknown, then that equation
should be equation 1 and that variable should be variable 1.
Each step in the Gauss-Jacobi method is a nonlinear equation and is usually solved
by some iterative method. There is little point in solving each one precisely, since
we must solve each equation again in the next iteration. We could instead solve each
equation approximately by taking a single Newton step for each component of x^{k+1}.
The resulting scheme is

x_i^{k+1} = x_i^k − f^i(x^k) / f^i_{x_i}(x^k),   i = 1, ..., n

where f^i_{x_i} denotes the partial derivative of f^i with respect to x_i.
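The single-Newton-step scheme can be sketched as follows. The 2-equation system (x1² + x2 = 3, x1 + x2² = 5), whose solution is (1, 2), and its diagonal partials are illustrative choices, not from the text:

```python
# A sketch of the Gauss-Jacobi scheme with one Newton step per
# component: every component of the new iterate uses only the old one.
def gauss_jacobi(fs, dfs, x, tol=1e-10, max_iter=500):
    """fs[i] is equation i; dfs[i] is its partial w.r.t. x[i]."""
    for _ in range(max_iter):
        x_new = [x[i] - fs[i](x) / dfs[i](x) for i in range(len(x))]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Gauss-Jacobi did not converge")

# Illustrative system: x1^2 + x2 - 3 = 0, x1 + x2^2 - 5 = 0.
fs = [lambda x: x[0] ** 2 + x[1] - 3.0,
      lambda x: x[0] + x[1] ** 2 - 5.0]
dfs = [lambda x: 2.0 * x[0],   # d f1 / d x1
       lambda x: 2.0 * x[1]]   # d f2 / d x2
sol = gauss_jacobi(fs, dfs, [1.0, 1.5])
```

For this system the iteration matrix at the solution has spectral radius well below one, so the scheme converges (linearly) from nearby starting values.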

3.2 Gauss-Seidel Algorithm


In the Gauss-Jacobi method we use the new guess of x_i, x_i^{k+1}, only after we have
computed the entire vector of new values, x^{k+1}. The basic idea of the Gauss-Seidel
method is to use the new guess of x_i as soon as it is available. In the general nonlinear
case, this implies that given x^k, we construct x^{k+1} componentwise by solving the
following one-dimensional problems in sequence:

f^1(x_1^{k+1}, x_2^k, x_3^k, ..., x_n^k) = 0,
f^2(x_1^{k+1}, x_2^{k+1}, x_3^k, ..., x_n^k) = 0,
...
f^{n−1}(x_1^{k+1}, x_2^{k+1}, ..., x_{n−1}^{k+1}, x_n^k) = 0,
f^n(x_1^{k+1}, x_2^{k+1}, ..., x_{n−1}^{k+1}, x_n^{k+1}) = 0.

Again we solve f 1 , f 2 , ..., f n in sequence, but we immediately use each new com-
ponent. Now the indexing scheme matters even more because it affects the way in
which later results depend on earlier ones. We can implement a single Newton step
to economize on computation costs at each iteration by using:

x_i^{k+1} = x_i^k − f^i(x_1^{k+1}, ..., x_{i−1}^{k+1}, x_i^k, ..., x_n^k) / f^i_{x_i}(x_1^{k+1}, ..., x_{i−1}^{k+1}, x_i^k, ..., x_n^k),   i = 1, ..., n
While often used, Gaussian methods have some problems. These are risky meth-
ods to use if the system is not diagonally dominant (there is not enough “block
recursion”); furthermore, convergence is at best linear.
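A sketch of the Gauss-Seidel variant, identical to the Gauss-Jacobi sketch except that each updated component is used immediately. The test system (x1² + x2 = 3, x1 + x2² = 5), solved by (1, 2), is again an illustrative choice:

```python
# Gauss-Seidel with one Newton step per component: x[i] is overwritten
# in place, so equation i+1 already sees the updated value.
def gauss_seidel(fs, dfs, x, tol=1e-10, max_iter=500):
    x = list(x)
    for _ in range(max_iter):
        max_step = 0.0
        for i in range(len(x)):
            step = fs[i](x) / dfs[i](x)   # one Newton step in x[i]
            x[i] -= step                  # used immediately below
            max_step = max(max_step, abs(step))
        if max_step < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")

fs = [lambda x: x[0] ** 2 + x[1] - 3.0,
      lambda x: x[0] + x[1] ** 2 - 5.0]
dfs = [lambda x: 2.0 * x[0], lambda x: 2.0 * x[1]]
sol = gauss_seidel(fs, dfs, [1.0, 1.5])
```

On this example the Seidel sweep reaches the solution in noticeably fewer iterations than the Jacobi version, reflecting the immediate use of new components.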

3.3 Newton-Raphson
Just as in the one-dimensional case, the Newton-Raphson method replaces f with a
linear approximation, and then solves the linear problem to generate the next guess.
By Taylor's theorem, the linear approximation of f around the initial guess x0 is
f (x) ≈ f (x0 ) + J (x0 ) (x − x0 ).8 We can solve for the zero of this linear approximation, yielding x1 = x0 − J (x0 )^{−1} f (x0 ). This zero then serves as the new guess
around which we again linearize.

Algorithm 6 (Newton-Raphson’s Method) Objective: Find a zero of f (x).


Initialization: Choose convergence criterion and starting point x0 .
Step 1. Compute next iterate: xi+1 = xi − J (xi )^{−1} f (xi )
Step 2. If convergence is achieved go to step 3, else go to step 1.
Step 3. xi is the solution.

As with the one-dimensional case, this method is quadratically convergent in a


neighborhood of the solution but the multivariate iteration is even more sensitive
to starting values. An even greater drawback of Newton-Raphson’s method is the
necessity of computing n (n + 1) /2 derivatives when J is symmetric, and n² in the
8. We define J (x) as the Jacobian of f. J (x) is an n × n matrix, where element i, j is defined
as J_{i,j}(x) = ∂f^i(x)/∂x_j.

general case.9 There are three aspects to the derivative problem. First, there is the
problem of actually obtaining the analytic form for the set of derivative functions.
Even when this is possible to do by hand, it is often difficult to do so correctly (the first
time at least!). Second, the derivatives must then be transcribed to the programming
language or software used. Third, the cost of performing so many function evaluations
at each step of the iteration may make the algorithm too costly to employ.10
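A sketch of Algorithm 6 for a 2-equation system, solving the linear step J(x) s = −f(x) by Cramer's rule; the system (x1² + x2 = 3, x1 + x2² = 5) and its analytic Jacobian are illustrative choices, not from the text:

```python
# Multivariate Newton-Raphson for n = 2, with the linear solve done by
# Cramer's rule. For larger n one would use a general linear solver.
def newton_2d(f, jac, x, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        f1, f2 = f(x)
        (a, b), (c, d) = jac(x)          # J = [[a, b], [c, d]]
        det = a * d - b * c
        s1 = (-f1 * d + f2 * b) / det    # Cramer's rule for J s = -f
        s2 = (-a * f2 + c * f1) / det
        x = (x[0] + s1, x[1] + s2)
        if abs(s1) + abs(s2) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

f = lambda x: (x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0)
jac = lambda x: ((2.0 * x[0], 1.0), (1.0, 2.0 * x[1]))
sol = newton_2d(f, jac, (1.5, 1.5))   # converges to (1, 2)
```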

3.4 Newton-like Methods


Several variants of the Newton-Raphson method are often used in order to ease com-
putational burden of evaluating J.

3.4.1 Rescaled Simple Iteration


The simplest way to reduce the cost of approximating J is simply not to recompute
it at all, but rather to use an initial approximation A throughout the iteration, that
is

xi+1 = xi − A^{−1} f (xi )

A good choice for A is J (x0 ). Such a method will converge from a starting value
that is sufficiently close to the solution, provided that A^{−1} is sufficiently close to J^{−1}
evaluated at the solution; the convergence is generally linear. It is difficult to
guarantee in advance that this condition will be met. A compromise approach is to
take one Newton step followed by several additional iteration steps without updating
the Jacobian matrix. At that point the derivative matrix can be computed afresh,
and several more steps taken.

3.4.2 Generalized Secant Method (Broyden)


Another solution to the problem of Jacobian calculation is to begin with a rough
guess of the Jacobian and use successive evaluations of f to update
the guess of J. In this way we make more use of the computed values of f and avoid
the cost of recomputing J at each iteration.
The one-dimensional secant method did this, but the problem in n dimensions
is more complex. To see this, suppose that we have computed f at y and z. In
the one-dimensional case the slope, m, near y and z is approximated by the solution
9. One example of a symmetric J arises in econometrics when estimating parameters by maximum likelihood, in which case the Jacobian of the first-order conditions (scores) is the Hessian of
the (log) likelihood function.
10. When the derivatives can be expressed analytically, it is sometimes helpful to perform or check
the computations using a symbolic algebra package such as Maple, Mathematica, or even Matlab.
When numerical derivatives are used in order to approximate J, we refer to this method as the Discrete Newton Method. Appendix A presents the algorithms generally used for obtaining numerical
derivatives.

to the scalar equation f (y) − f (z) = m (y − z), which has a unique solution whenever y ≠ z.
The n-dimensional analogue of the slope m is the Jacobian M , which near y and
z approximately satisfies the multidimensional secant-like equation f (y) − f (z) =
M (y − z). There is no unique such matrix: since f (y) − f (z) and y − z are column
vectors, this equation imposes only n conditions on the n² elements of M . We need
some way to fill in the rest of our estimates of M .
Broyden's method is the ℝⁿ version of the secant method. It produces a sequence
of points x^k and matrices A_k which serve as Jacobian guesses. Suppose that after k
iterations our guess for x is x^k and our guess for the Jacobian at x^k is A_k . We use A_k
to compute the Newton step x^{k+1} = x^k − A_k^{−1} f (x^k), and with it we obtain A_{k+1}
according to:

Algorithm 7 (Broyden’s Method) Objective: Find a zero of f (x).


Initialization: Choose convergence criterion, starting point x^0 and A_0 = I_n .
Step 1. Compute next iterate: x^{i+1} = x^i − A_i^{−1} f (x^i)
Step 2. Update Jacobian guess: A_{i+1} = A_i − f (x^{i+1}) [A_i^{−1} f (x^i)]′ / ([A_i^{−1} f (x^i)]′ [A_i^{−1} f (x^i)])
Step 3. If convergence is achieved go to step 4, else go to step 1.
Step 4. x^{i+1} is the solution.

The convergence properties of this algorithm are inferior to Newton’s method but
are better than Gaussian methods. Note that the convergence is only asserted for
the x sequence, while the A sequence need not converge to J. Each iteration of the
Broyden method is far less costly to compute because there is no Jacobian calculated,
but the Broyden method will generally need more iterations than Newton’s method.
For large systems, the Broyden method may be much faster, since Jacobian calcu-
lation can be very expensive; however, for highly nonlinear problems the Jacobian
may change drastically between iterations, causing the Broyden approximation to be
quite poor. Of course that will also give Newton-Raphson’s method problems since
the underlying assumption of any Newton method is that a linear approximation is
appropriate, which is the same as saying that the Jacobian does not change much.
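A sketch of Algorithm 7. One hedged deviation from the text's initialization A0 = In: here A0 is set to the analytic Jacobian at x0, which makes convergence more robust for the illustrative system (x1² + x2 = 3, x1 + x2² = 5); neither the system nor this initialization choice is from the text:

```python
import numpy as np

# Broyden's method: Newton-like steps with a rank-one secant update of
# the Jacobian guess. Note the update below is algebraically the same
# as Step 2 of Algorithm 7, since s = -A^{-1} f(x).
def broyden(f, x0, A0, tol=1e-10, max_iter=200):
    x = np.asarray(x0, dtype=float)
    A = np.asarray(A0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        s = -np.linalg.solve(A, fx)            # step = -A^{-1} f(x)
        x_new, fx_new = x + s, f(x + s)
        A = A + np.outer(fx_new, s) / (s @ s)  # rank-one secant update
        if np.linalg.norm(s) < tol:
            return x_new
        x, fx = x_new, fx_new
    raise RuntimeError("Broyden did not converge")

f = lambda x: np.array([x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0])
x0 = np.array([1.2, 1.8])
A0 = np.array([[2.0 * x0[0], 1.0], [1.0, 2.0 * x0[1]]])  # J(x0)
sol = broyden(f, x0, A0)
```

As the text notes, only the x sequence converges; the final A need not equal the Jacobian at the solution.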

3.4.3 Quasi-Newton Methods


Quasi-Newton methods are similar to Broyden’s method in the sense that they also
include iterative procedures for obtaining A_k . In econometrics, two are of particular interest: the Davidon-Fletcher-Powell (DFP) algorithm and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. These algorithms were proposed for solving
optimization problems, wherein f (x) is the gradient of a scalar objective function,
in which case J is symmetric and positive definite. Both methods ensure that A_k has
this property (if A0 does).
The Davidon-Fletcher-Powell method is usually expressed in terms of A^{−1} rather
than A; for that purpose let d_k = x^{k+1} − x^k and g_k = f (x^{k+1}) − f (x^k). The algorithm
written in terms of the inverse matrix uses

A_{k+1}^{−1} = A_k^{−1} + d_k d_k′ / (d_k′ g_k) − A_k^{−1} g_k g_k′ A_k^{−1} / (g_k′ A_k^{−1} g_k)

This matrix will be nonsingular provided that the denominators of both terms
are non-zero, and provided that g_k′ A_k^{−1} f (x^k) ≠ 0.
The BFGS algorithm is generally considered to be superior to the Davidon-Fletcher-Powell method for updating the Jacobian estimate. The updating algorithm
is

A_{k+1} = A_k + g_k g_k′ / (d_k′ g_k) − A_k d_k d_k′ A_k / (d_k′ A_k d_k)
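Both updates are constructed so that the new matrix satisfies the secant condition A_{k+1} d_k = g_k while preserving symmetry. A small numeric check of the BFGS formula (the vectors d_k and g_k below are illustrative values, not from the text):

```python
import numpy as np

# BFGS update A_{k+1} = A_k + g g'/(d'g) - A d d' A/(d' A d).
def bfgs_update(A, d, g):
    Ad = A @ d
    return A + np.outer(g, g) / (d @ g) - np.outer(Ad, Ad) / (d @ Ad)

A = np.eye(2)
d = np.array([0.3, -0.1])   # d_k = x_{k+1} - x_k (illustrative)
g = np.array([0.5, 0.2])    # g_k = f(x_{k+1}) - f(x_k) (illustrative)
A_next = bfgs_update(A, d, g)
# By construction, A_next @ d equals g (the secant condition), and
# A_next stays symmetric when A is.
```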

3.5 Starting Values and Convergence Criteria


Practically everything we had to say about choosing starting values in the univariate
case applies as well in the multivariate case, and there is little to add. Although
deciding when a multivariate iteration has converged is somewhat more problematic,
it is usually satisfactory to define relative and absolute convergence in terms of some
norm on x such as the ℓ2 (Euclidean) norm (sum of squares) or the sup-norm (maximum component magnitude). Always remember that even if x^k satisfies the stopping
rule, we want to check that f (x^k) is close to zero.

4 Concluding Remarks
We presented the basic methods for solving nonlinear equations. One-dimensional
problems are easily solved, reliably by bracketing methods and often very quickly
by Newton’s method. Nonlinear systems of equations are more difficult. Solving
systems of nonlinear equations reduces to an iterative search guided generally by the
Jacobian or diagonal portions of the Jacobian. For small systems we generally use
Newton’s method because of its good local convergence properties. Large systems
are generally solved by breaking the system into smaller systems as in Gauss-Jacobi
and Gauss-Seidel methods and their block versions. Newton and Gaussian methods
need good initial guesses, but finding them is often an art and usually conducted in
an ad hoc fashion.
While not discussed here, it is obvious that there are strong connections between
optimization problems and nonlinear equations. First, one can often be transformed
into the other. For example, if f (x) is a twice continuously differentiable function, then
the solution to minx f (x) is also the solution to the system of first-order conditions
∇f (x) = 0. Newton’s method for optimization in fact solves minimization problems
by solving the first-order conditions.
We can also go in the opposite direction, converting a set of nonlinear equations
into an optimization problem. Sometimes there is a function F (x) such that f (x) =
∇F (x), in which case the zeros of f are exactly the critical points of F . Such systems
f (x) are called integrable. In a general sense, any nonlinear equation problem can
be converted to an optimization problem: any solution to f (x) = 0 is also a global
solution to

min_x Σ_{i=1}^n f^i(x)²

and, provided a zero exists, any global minimum of Σ_{i=1}^n f^i(x)² is a solution to f (x) = 0.
Several of the methods discussed here are used by software packages such as
GAUSS and Matlab. The former uses the library NLSYS to solve systems of nonlinear
equations.

References
Arrau, P., J. Quiroz, and R. Chumacero (1992), “Ahorro Fiscal y Tipo de Cambio
Real,” Cuadernos de Economía 88, 349-86.

Judd, K. (1998), Numerical Methods in Economics. The MIT Press.

Thisted, R. (1988), Elements of Statistical Computing. Chapman and Hall.

A Numerical Differentiation
The Newton-Raphson method that we described previously is an example of an algo-
rithm that required evaluating derivatives. Finite-difference approximations of gra-
dients, Hessians, and Jacobians are routinely used in optimization and nonlinear
equation problems. The main reason is that analytic derivatives may be difficult (or impossible) or time-consuming for the programmer to obtain. Here, we briefly develop
numerical derivative formulas.
The derivative is defined by

f′(x) = lim_{ε→0} [f (x + ε) − f (x)] / ε

This suggests the formula

f′(x) ≈ [f (x + h) − f (x)] / h

to approximate f′(x). More accurate derivative approximations can be achieved
using central differences, which yields the two-sided formula

f′(x) ≈ [f (x + h) − f (x − h)] / (2h)
More generally, if f : ℝⁿ → ℝ, the one-sided formula for ∂f /∂x_i is

∂f /∂x_i ≈ [f (x_1 , ..., x_i + h_i , ..., x_n ) − f (x_1 , ..., x_i , ..., x_n )] / h_i

and the two-sided formula is

∂f /∂x_i ≈ [f (x_1 , ..., x_i + h_i , ..., x_n ) − f (x_1 , ..., x_i − h_i , ..., x_n )] / (2h_i )
Note that the marginal cost of computing a one-sided derivative equals one evalu-
ation of f , since it is assumed that one computes f (x) no matter how the derivative
is computed. Hence the marginal cost equals its average cost. Of course, while more
precise, two-sided derivatives require two evaluations of f , thus making them costlier.
The problem of computing the Jacobian of a multivariate function f : ℝⁿ → ℝᵐ reduces to the previous formulas, since each element in the
Jacobian of f is the first derivative of one of the component functions of f . Elements
of the Hessian of f : ℝⁿ → ℝ are of two types. Cross partials are approximated by

∂²f /∂x_i ∂x_j ≈ (1/h_j) { [f (x_1 , .., x_i + h_i , .., x_j + h_j , .., x_n ) − f (x_1 , .., x_i , .., x_j + h_j , .., x_n )] / h_i
− [f (x_1 , .., x_i + h_i , .., x_j , .., x_n ) − f (x_1 , .., x_i , .., x_j , .., x_n )] / h_i }

and the second partials are approximated by

∂²f /∂x_i² ≈ [f (x_1 , .., x_i + h_i , .., x_n ) − 2f (x_1 , .., x_i , .., x_n ) + f (x_1 , .., x_i − h_i , .., x_n )] / h_i²

One obvious question is how big (or small) h should be. A typical choice is
h_i = max {εx_i , ε}, where ε is usually on the order of 10⁻⁶. The relation between ε
and h is motivated by two contrary factors: first, we want h to be small relative to
x, but second, we want h to stay away from zero to keep the division well-behaved.
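The two-sided gradient formulas above can be sketched as follows; the step rule uses |x_i| (a variant of the rule just stated, guarding against negative components), and the test function is an illustrative choice:

```python
# Central-difference gradient with step h_i = max(eps * |x_i|, eps),
# a variant of the appendix's step-size rule. Each partial costs two
# extra evaluations of f, as noted in the text.
def gradient(f, x, eps=1e-6):
    grad = []
    for i in range(len(x)):
        h = max(eps * abs(x[i]), eps)
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

# Illustrative test: f(x) = x1^2 + 3 x2, whose gradient at (2, 1) is (4, 3).
g = gradient(lambda x: x[0] ** 2 + 3.0 * x[1], [2.0, 1.0])
```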

B Workout Problems
1. Program source codes to compute solutions to nonlinear equations using the
methods discussed.

2. Consider a two-period (t = 0, 1) endowment economy with two types of agents
(i = 1, 2). Each agent is interested in maximizing the time-separable utility
function Σ_{t=0}^1 β^t c_t^{1−γ} / (1 − γ). The endowment for each type of agent in each
period is: y_0^1 = 1, y_1^1 = 2, y_0^2 = 2, y_1^2 = 5. Find the equilibrium real interest rate
if β = 0.8, γ = 2. Which agent is a net lender?

3. Consider a simple deterministic growth model in which a representative agent
is interested in maximizing Σ_{t=0}^∞ β^t u (c_t ); u (c_t ) = c_t^{1−γ} / (1 − γ); subject to the
flow constraint f (k_t ) = c_t + k_{t+1} − (1 − δ) k_t ∀t with f (k_t ) = k_t^α . If 0 <
α, β, δ < 1, verify whether the equation that determines the steady state of k satisfies
the conditions stated in Theorem 1. Use alternative values for preference and
technology parameters along with the algorithms presented to solve for the steady-state
values of c and k. Comment on your results.

4. Replicate the results of Arrau, et al. (1992).

5. Show that the error in using a one-sided difference for approximating f′(x) is
O (h), whereas the error using a two-sided difference is O (h²).

6. Use the Newton-Raphson method to solve the nonlinear equation
f (x) = x^{1/3} e^{−x²} = 0, using as starting values x_0 = 0.3, x_0 = −1.2, and x_0 = 1.01.
Discuss what is wrong with this function and which method would be more
suitable to solve it.

7. Write a program that determines your machine ε.

