Solving Nonlinear Equations
Rómulo A. Chumacero
University of Chile
1 Introduction
Concepts of economic equilibrium are often expressed as systems of nonlinear equa-
tions. Such problems generally take two forms, zeros and fixed points. If $f : \mathbb{R}^n \to \mathbb{R}^n$,
then a zero of f is any x such that f (x) = 0, and a fixed point of f is any x such
that f (x) = x. These are of course essentially the same problem, since x is a fixed
point of f (x) if and only if it is a zero of f (x) − x.
The most famous nonlinear equation problem in economics is the Arrow-Debreu
concept of general equilibrium which reduces to finding a price vector at which excess
demand is zero. Other examples include finding Nash equilibria in games, transition
paths of dynamic systems, and computation of steady states in nonlinear deterministic
models, to name a few. In general, given that one of the most important paradigms
in economics is summarized by the assumption that agents maximize an objective
function subject to constraints, solutions to these types of problems usually reduce
to finding values that satisfy the underlying first order conditions, which are charac-
terized by nonlinear equations. In econometrics we confront the same problem when
estimating models by, say, maximum likelihood, where the combination of the statisti-
cal model and the data gives rise to a first order condition that, when solved, yields the
estimates that maximize the probability of occurrence of the data given the model.
The document is organized as follows: Section 2 discusses the techniques used
to solve one-dimensional problems. Section 3 discusses methods for solving general
finite-dimensional problems. Finally, Section 4 concludes.
2 One-Dimensional Problems
Most methods for solving systems of nonlinear equations are generalizations of simpler single-variable techniques, and in several instances complex
multivariate problems can be reduced to a single nonlinear equation.
We begin this section by describing several methods to solve one-dimensional prob-
lems. All of them are iterative, that is, they produce a sequence of values x1 , x2 , ...,
which converges (hopefully!) to the solution. After presenting the algorithms, we dis-
cuss the collateral issues of choosing a starting value for the sequence, and of deciding
when to stop the iteration.
2.1 Simple Iteration
The simplest approach is to rewrite the equation f (x) = 0 as an equivalent fixed-point problem g (x) = x and, starting from a guess $x_0$, repeatedly adjust the guess by setting $x_i = g(x_{i-1})$.
Under what conditions will successive application of this adjustment lead to con-
vergence? Figure 1 shows an example of when this algorithm will work. Notice that
a necessary condition for convergence is that f be defined when evaluated at each
iterate. The following theorem guarantees existence and uniqueness of a solution, as
well as convergence.
Theorem 1 Let $g(x)$ be a continuous function defined on the interval $I = [a, b]$ such
that $g(x) \in I$ whenever $x \in I$, and satisfying a Lipschitz condition with constant $L < 1$. Then
for any $x_0 \in I$, the sequence defined by $x_i = g(x_{i-1})$ converges to the solution of the
equation $g(x) = x$, and the solution is unique.
[Figure 1: plot of g(x) and x on [0, 1.4], illustrating a convergent simple iteration.]
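To make the iteration concrete, here is a minimal Python sketch of the simple iteration scheme; the particular map g(x) = sqrt(x + 1), the starting value, the tolerance, and the iteration cap are illustrative choices, not taken from the text.

```python
def simple_iteration(g, x0, tol=1e-10, max_iter=500):
    """Iterate x_i = g(x_{i-1}) until successive iterates stop changing."""
    x_prev = x0
    for _ in range(max_iter):
        x_next = g(x_prev)
        if abs(x_next - x_prev) <= tol * max(1.0, abs(x_prev)):
            return x_next
        x_prev = x_next
    raise RuntimeError("simple iteration did not converge")

# g(x) = sqrt(x + 1) maps [1, 2] into itself with Lipschitz constant below 1,
# so Theorem 1 applies; the fixed point solves x^2 - x - 1 = 0 (about 1.618).
print(simple_iteration(lambda x: (x + 1.0) ** 0.5, x0=1.0))
```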
2.2 Newton-Raphson
The simple iteration method requires $f(x)$ to be continuous and to satisfy a Lipschitz
condition. The Newton-Raphson method requires $f(x)$ to be twice continuously
differentiable and $f'(s) \neq 0$ at the solution $s$. Even though these requirements appear
to be more restrictive, they are commonly satisfied in practice. An advantage of this
method over simple iteration is that it converges quadratically, while
the latter usually exhibits linear convergence.3
We derive the method by expanding $f(x)$ in a Taylor series about the current
iterate, writing
$$0 = f(s) = f(x_i) + (s - x_i) f'(x_i) + \frac{(s - x_i)^2}{2} f''(x^*),$$
where $x^*$ is a point between $s$ and $x_i$. When $x_i$ is sufficiently close to $s$, the remainder
term will be small relative to the other terms (provided that $f'(s) \neq 0$). Dropping
the remainder then yields the approximation
$$s \approx x_i - \frac{f(x_i)}{f'(x_i)},$$
which we take as the next iterate $x_{i+1}$.
This algorithm requires that the starting value be sufficiently close to the solution
to guarantee convergence. Even though an important advantage of this method is
its rapid convergence, it may be computationally costly, given that at each iteration
both the function and its derivative have to be evaluated. Furthermore, it is often im-
possible to write down a closed form for the derivative, and numerical approximations
(which may be inaccurate) have to be used.
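A minimal Python sketch of the Newton-Raphson iteration follows; the test equation $x^3 - 2 = 0$, the starting value, and the stopping rule are hypothetical choices made for illustration.

```python
def newton_raphson(f, fprime, x0, tol=1e-12, max_iter=100):
    """Newton-Raphson: x_{i+1} = x_i - f(x_i) / f'(x_i)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:          # stop once the residual is nearly zero
            return x
        x = x - fx / fprime(x)
    raise RuntimeError("Newton-Raphson did not converge")

# Example: the positive root of x^3 - 2 = 0 is the cube root of 2.
print(newton_raphson(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, x0=1.0))
```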
2.3 Secant Method
When the derivative in the Newton-Raphson step is unavailable or costly, it can be replaced by the slope through the two most recent iterates, giving the iteration
$$x_{i+1} = x_i - f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}.$$
This iteration is called the secant method because it approximates the function
$f(x)$ by the secant line through two successive points in the iteration, rather than
the tangent at a single point used in the Newton-Raphson iteration. This method
provides a quite good approximation to the derivative provided that the two iterates
on which it is based are close to one another. It has the disadvantage of requiring two
starting points and is sensitive to them.4
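The following sketch, again in Python with illustrative tolerances, shows how the secant update replaces the derivative in the Newton step with the slope through the last two iterates.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant method: needs two starting points x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) <= tol:
            return x1
        # slope through the last two iterates replaces f'(x_i)
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    raise RuntimeError("secant method did not converge")
```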
2.4 Bisection
Suppose we have located an interval $[x_0, x_1]$ on which $f$ changes sign, so that the interval brackets a root.
The bisection method divides the current interval at its midpoint, $x_2 = \frac{1}{2}(x_0 + x_1)$.
If $f(x_2) = 0$, we are done. If not, we take as the next interval the half which continues
to bracket a root.
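A short Python sketch of the bisection step; the sign test and tolerance below are standard choices, not prescribed by the text.

```python
def bisection(f, a, b, tol=1e-12):
    """Bisection on an interval [a, b] with f(a) and f(b) of opposite sign."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("[a, b] does not bracket a root")
    while (b - a) / 2.0 > tol:
        m = (a + b) / 2.0
        fm = f(m)
        if fm == 0.0:
            return m
        if fa * fm < 0:        # the root lies in [a, m]
            b, fb = m, fm
        else:                  # the root lies in [m, b]
            a, fa = m, fm
    return (a + b) / 2.0
```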
Step 4. If convergence is achieved go to step 5, else go to step 1.
Step 5. xi is the solution.
Two common stopping rules are the absolute criterion $|x_i - x_{i-1}| \leq tol$ and the relative criterion $\frac{|x_i - x_{i-1}|}{|x_{i-1}|} \leq tol$, where
tol is a preselected tolerance. The absolute convergence criterion is most suitable
when the solution is close to zero; in this case the denominator of the relative crite-
rion can cause numerical difficulties. On the other hand, when the solution is large
(far away from zero) the relative criterion is generally more satisfactory. Once the
convergence criterion is satisfied, we ask if $f(x_i)$ is “nearly” zero. More precisely, we
stop if $|f(x_i)| \leq \delta$ for some prespecified $\delta$. If we want high precision, we will choose a
small $\delta$, but that choice must be reasonable. Choosing $\delta = 0$ is nonsense, since it is
unachievable; equally pointless is choosing $\delta = 10^{-20}$ on a 12-digit machine where $f$
can be calculated with at most 12 digits of accuracy.7
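In code, the stopping rules above can be combined as in the following Python sketch; the default values of tol and delta are arbitrary illustrations.

```python
def converged(x_new, x_old, f_new, tol=1e-8, delta=1e-8, relative=True):
    """Step-size test (absolute or relative) combined with the residual test |f(x_i)| <= delta."""
    if relative:
        step_ok = abs(x_new - x_old) <= tol * abs(x_old)   # relative criterion
    else:
        step_ok = abs(x_new - x_old) <= tol                # absolute criterion
    return step_ok and abs(f_new) <= delta
```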
3 Solving Systems of Nonlinear Equations
For the most part, methods for solving nonlinear systems are generalizations of
methods for single equations. Each of the methods (except for those based on brack-
eting) has at least one extension to the multivariate case. The problem of solving
nonlinear systems of equations arises most frequently in the context of optimizing a
scalar objective function in several variables. In that case, the system to be solved is
obtained by setting the gradient vector of the objective function to zero.
The methods discussed below will for the most part be based on locally linear
approximations to the vector function whose root is sought, so that generally the
algorithms discussed here will employ a mix of techniques from linear systems and
from multivariate nonlinear equations. As in the univariate case, we usually begin by
expanding f in a Taylor series about the current solution, and then evaluating the
series at the solution.
7 The range of numbers that are machine-representable varies greatly across machines; one should
always have a good idea of their value when working on a computer. Machine epsilon is the
smallest relative quantity that is machine-representable. Formally, it is the smallest $\varepsilon$ such that
the machine knows that $1 - \varepsilon < 1 < 1 + \varepsilon$. It is also important to know machine infinity, that
is, the largest number such that both it and its negative are representable. Overflow occurs when
an operation takes machine-representable numbers but wants to produce a number which exceeds
machine infinity in magnitude. A machine zero is any quantity that is equivalent to zero on the
machine. Underflow occurs when an operation takes nonzero quantities but tries to produce
a nonzero magnitude less than machine zero. The analyst must either know these important
constants for his machine or use more conservative guesses. Much of the available software contains a section
where the user must specify these arithmetic constants.
3.1 Gauss-Jacobi Algorithm
The simplest iteration method for solving multivariate nonlinear equations is the
Gauss-Jacobi method. Given the known value of the kth iterate, $x^k$, we use the ith
equation to compute the ith component of the unknown $x^{k+1}$, the next iterate. Formally,
$x^{k+1}$ is defined in terms of $x^k$ by the equations:
$$\begin{aligned}
f^1\!\left(x_1^{k+1}, x_2^k, x_3^k, \ldots, x_n^k\right) &= 0,\\
f^2\!\left(x_1^k, x_2^{k+1}, x_3^k, \ldots, x_n^k\right) &= 0,\\
&\;\;\vdots\\
f^n\!\left(x_1^k, x_2^k, x_3^k, \ldots, x_n^{k+1}\right) &= 0.
\end{aligned}$$
Each of these equations is a single nonlinear equation with one unknown, allowing
us to apply the single-equation methods presented in the previous sections. This
method reduces the problem of solving simultaneously for n unknowns in n equations to
that of repeatedly solving n equations, each with a single unknown.
The Gauss-Jacobi method is affected by the indexing scheme for the variables
and the equations. There is no natural choice for which variable is variable 1 and
which equation is equation 1. There are therefore n! different indexing schemes, and it
is generally difficult to determine which is best, but some simple rules come to mind.
For example, if some equation depends on only one unknown, then that equation
should be equation 1 and that variable should be variable 1.
Each step in the Gauss-Jacobi method is a single nonlinear equation and is usually solved
by some iterative method. There is little point in solving each one precisely, since
we must solve each equation again in the next iteration. We could instead approximately
solve each equation by taking a single Newton step for each component of $x^{k+1}$. The
resulting scheme is
$$x_i^{k+1} = x_i^k - \frac{f^i(x^k)}{f_{x_i}^i(x^k)}, \qquad i = 1, \ldots, n.$$
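A sketch of this Gauss-Jacobi scheme with one Newton step per equation, written in Python with NumPy; the interface (one callable per equation and one per own-partial) is a hypothetical arrangement chosen for clarity.

```python
import numpy as np

def gauss_jacobi_newton(f_list, df_list, x0, tol=1e-8, max_iter=500):
    """One sweep updates every component using only the previous iterate x^k."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_new = x.copy()
        for i, (fi, dfi) in enumerate(zip(f_list, df_list)):
            x_new[i] = x[i] - fi(x) / dfi(x)   # Newton step in equation i, evaluated at x^k
        if np.max(np.abs(x_new - x)) <= tol * (1.0 + np.max(np.abs(x))):
            return x_new
        x = x_new
    raise RuntimeError("Gauss-Jacobi did not converge")
```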
3.2 Gauss-Seidel Algorithm
The Gauss-Seidel method differs from Gauss-Jacobi in that each new component is used as soon as it is computed. Formally, $x^{k+1}$ is defined by:
$$\begin{aligned}
f^1\!\left(x_1^{k+1}, x_2^k, x_3^k, \ldots, x_n^k\right) &= 0,\\
f^2\!\left(x_1^{k+1}, x_2^{k+1}, x_3^k, \ldots, x_n^k\right) &= 0,\\
&\;\;\vdots\\
f^{n-1}\!\left(x_1^{k+1}, x_2^{k+1}, \ldots, x_{n-1}^{k+1}, x_n^k\right) &= 0,\\
f^n\!\left(x_1^{k+1}, x_2^{k+1}, \ldots, x_{n-1}^{k+1}, x_n^{k+1}\right) &= 0.
\end{aligned}$$
Again we solve f 1 , f 2 , ..., f n in sequence, but we immediately use each new com-
ponent. Now the indexing scheme matters even more because it affects the way in
which later results depend on earlier ones. We can implement a single Newton step
to economize on computation costs at each iteration by using:
$$x_i^{k+1} = x_i^k - \frac{f^i\!\left(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, \ldots, x_n^k\right)}{f_{x_i}^i\!\left(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, \ldots, x_n^k\right)}, \qquad i = 1, \ldots, n.$$
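The corresponding Python sketch differs from the Gauss-Jacobi version above only in that each freshly computed component overwrites x immediately; the interface and tolerances remain illustrative assumptions.

```python
import numpy as np

def gauss_seidel_newton(f_list, df_list, x0, tol=1e-8, max_iter=500):
    """Like Gauss-Jacobi, but new components are used as soon as they are computed."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i, (fi, dfi) in enumerate(zip(f_list, df_list)):
            x[i] = x[i] - fi(x) / dfi(x)   # x already contains components 1..i-1 of x^{k+1}
        if np.max(np.abs(x - x_old)) <= tol * (1.0 + np.max(np.abs(x_old))):
            return x
    raise RuntimeError("Gauss-Seidel did not converge")
```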
While often used, Gaussian methods have some problems. These are risky meth-
ods to use if the system is not diagonally dominant (there is not enough “block
recursion”); furthermore, convergence is at best linear.
3.3 Newton-Raphson
Just as in the one-dimensional case, the Newton-Raphson method replaces f with a
linear approximation and then solves the linear problem to generate the next guess.
By Taylor's theorem, the linear approximation of f around the initial guess $x^0$ is
$f(x) \approx f(x^0) + J(x^0)(x - x^0)$.8 We can solve for the zero of this linear approxi-
mation, yielding $x^1 = x^0 - J(x^0)^{-1} f(x^0)$. This zero then serves as the new guess
around which we again linearize.
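A minimal NumPy sketch of the multivariate Newton-Raphson step; the test system and starting point are hypothetical, and the linear system is solved directly rather than forming the inverse Jacobian.

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-10, max_iter=100):
    """Multivariate Newton-Raphson: x^{k+1} = x^k - J(x^k)^{-1} f(x^k)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        fx = f(x)
        if np.max(np.abs(fx)) <= tol:
            return x
        x = x - np.linalg.solve(jac(x), fx)   # solve J step = f instead of inverting J
    raise RuntimeError("Newton-Raphson did not converge")

# Example system: x^2 + y^2 = 1 and x = y, with root (1/sqrt(2), 1/sqrt(2)).
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
print(newton_system(f, J, np.array([1.0, 0.5])))
```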
general case.9 There are three aspects to the derivative problem. First, there is the
problem of actually obtaining the analytic form for the set of derivative functions.
Even when this is possible to do by hand, it is often difficult to do so correctly (the first
time at least!). Second, the derivatives must then be transcribed to the programming
language or software used. Third, the cost of performing so many function evaluations
at each step of the iteration may make the algorithm too costly to employ.10
3.4 Broyden's Method
In one dimension, the secant method replaces the derivative with the slope m that solves
the scalar equation $f(y) - f(z) = m(y - z)$, which is unique whenever $y \neq z$.
The n-dimensional analogue to the slope, m, is the Jacobian, M, which near y and
z approximately satisfies the multidimensional secant-like equation $f(y) - f(z) = M(y - z)$.
There is no unique such matrix: since $f(y) - f(z)$ and $y - z$ are column
vectors, this equation imposes only n conditions on the $n^2$ elements of M. We need
some way to fill in the rest of our estimate of M.
Broyden's method is the $\mathbb{R}^n$ version of the secant method. It produces a sequence
of points $x^k$ and matrices $A_k$ which serve as Jacobian guesses. Suppose that after k
iterations our guess for x is $x^k$ and our guess for the Jacobian at $x^k$ is $A_k$. We use $A_k$
to compute the Newton step $x^{k+1} = x^k - A_k^{-1} f(x^k)$, and with these we obtain $A_{k+1}$
according to the rank-one update
$$A_{k+1} = A_k + \frac{\left(f(x^{k+1}) - f(x^k) - A_k d_k\right) d_k'}{d_k' d_k}, \qquad d_k = x^{k+1} - x^k.$$
The convergence properties of this algorithm are inferior to Newton’s method but
are better than Gaussian methods. Note that the convergence is only asserted for
the x sequence, while the A sequence need not converge to J. Each iteration of the
Broyden method is far less costly to compute because there is no Jacobian calculated,
but the Broyden method will generally need more iterations than Newton’s method.
For large systems, the Broyden method may be much faster, since Jacobian calcu-
lation can be very expensive; however, for highly nonlinear problems the Jacobian
may change drastically between iterations, causing the Broyden approximation to be
quite poor. Of course that will also give Newton-Raphson’s method problems since
the underlying assumption of any Newton method is that a linear approximation is
appropriate, which is the same as saying that the Jacobian does not change much.
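The following Python sketch implements Broyden's method with the rank-one update stated above; starting the Jacobian guess at the identity matrix is a common but arbitrary choice.

```python
import numpy as np

def broyden(f, x0, A0=None, tol=1e-10, max_iter=200):
    """Broyden's method: secant-style rank-one updates of the Jacobian guess A_k."""
    x = np.asarray(x0, dtype=float).copy()
    A = np.eye(x.size) if A0 is None else np.asarray(A0, dtype=float).copy()
    fx = f(x)
    for _ in range(max_iter):
        if np.max(np.abs(fx)) <= tol:
            return x
        d = -np.linalg.solve(A, fx)                 # step implied by the current guess A_k
        x_new, fx_new = x + d, f(x + d)
        g = fx_new - fx                             # observed change in f along the step
        A = A + np.outer(g - A @ d, d) / (d @ d)    # rank-one Broyden update of A_k
        x, fx = x_new, fx_new
    raise RuntimeError("Broyden's method did not converge")
```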
Written in terms of the inverse matrix, the update uses
$$A_{k+1}^{-1} = A_k^{-1} + \frac{d_k d_k'}{d_k' g_k} - \frac{A_k^{-1} g_k g_k' A_k^{-1}}{g_k' A_k^{-1} g_k},$$
where $g_k = f(x^{k+1}) - f(x^k)$.
4 Concluding Remarks
We presented the basic methods for solving nonlinear equations. One-dimensional
problems are easily solved, reliably by bracketing methods and often very quickly
by Newton’s method. Nonlinear systems of equations are more difficult. Solving
systems of nonlinear equations reduces to an iterative search guided generally by the
Jacobian or diagonal portions of the Jacobian. For small systems we generally use
Newton’s method because of its good local convergence properties. Large systems
are generally solved by breaking the system into smaller systems as in Gauss-Jacobi
and Gauss-Seidel methods and their block versions. Newton and Gaussian methods
need good initial guesses, but finding them is often an art and usually conducted in
an ad hoc fashion.
While not discussed here, it is obvious that there are strong connections between
optimization problems and nonlinear equations. First, one can often be transformed
into the other. For example, if $f(x)$ is a twice continuously differentiable function, then
the solution to $\min_x f(x)$ is also a solution to the system of first-order conditions
∇f (x) = 0. Newton’s method for optimization in fact solves minimization problems
by solving the first-order conditions.
We can also go in the opposite direction, converting a set of nonlinear equations
into an optimization problem. Sometimes there is a function $F(x)$ such that $f(x) = \nabla F(x)$,
in which case the zeros of f are exactly the critical points of F. Such systems
$f(x)$ are called integrable. In a more general sense, any nonlinear equation problem can
be converted to an optimization problem. Any solution to $f(x) = 0$ is also a global
solution to
$$\min_x \sum_{i=1}^n f^i(x)^2,$$
and any global minimum of $\sum_{i=1}^n f^i(x)^2$ at which the objective equals zero is a solution to $f(x) = 0$.
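As a sketch of this reformulation, the system from the Newton example above can be handed to a generic minimizer; the use of scipy.optimize.minimize and the chosen starting point are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def sum_of_squares(v):
    """Sum of squared residuals of the system x^2 + y^2 = 1, x = y."""
    f = np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
    return float(np.sum(f**2))

result = minimize(sum_of_squares, x0=np.array([1.0, 0.0]))
print(result.x)   # near (1/sqrt(2), 1/sqrt(2)); the minimized objective is close to zero
```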
Several of the methods discussed here are used by software packages such as
GAUSS and Matlab. The former uses the library NLSYS to solve systems of nonlinear
equations.
A Numerical Differentiation
The Newton-Raphson method described previously is an example of an algo-
rithm that requires evaluating derivatives. Finite-difference approximations of gra-
dients, Hessians, and Jacobians are routinely used in optimization and nonlinear
equation problems. The main reason is that analytic derivatives may be diffi-
cult (or even impossible) or time-consuming for the programmer to obtain. Here, we briefly develop
numerical derivative formulas.
The derivative is defined by
$$f'(x) = \lim_{\varepsilon \to 0} \frac{f(x + \varepsilon) - f(x)}{\varepsilon}.$$
This suggests the formula
$$f'(x) \approx \frac{f(x + h) - f(x)}{h}$$
to approximate $f'(x)$. More accurate derivative approximations can be achieved
using central differences, which yield the two-sided formula
$$f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}.$$
More generally, if $f : \mathbb{R}^n \to \mathbb{R}$, the one-sided formula for $\partial f / \partial x_i$ is
$$\frac{\partial f}{\partial x_i} \approx \frac{f(x_1, \ldots, x_i + h_i, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h_i},$$
and the second partials are approximated by
$$\frac{\partial^2 f}{\partial x_i^2} \approx \frac{f(x_1, \ldots, x_i + h_i, \ldots, x_n) - 2 f(x_1, \ldots, x_i, \ldots, x_n) + f(x_1, \ldots, x_i - h_i, \ldots, x_n)}{h_i^2}.$$
One obvious question is how big (or small) $h$ should be. A typical choice is
$h_i = \max\{\varepsilon x_i, \varepsilon\}$, where $\varepsilon$ is usually on the order of $10^{-6}$. The relation between $\varepsilon$
and h is motivated by two contrary factors; first, we want h to be small relative to
x, but second, we want h to stay away from zero to keep the division well-behaved.
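A Python sketch of these finite-difference formulas, using the step-size rule just described (with $|x_i|$ in place of $x_i$ to guard against negative components, an assumption on our part):

```python
import numpy as np

def gradient_fd(f, x, eps=1e-6, central=True):
    """Finite-difference gradient with step h_i = max(eps * |x_i|, eps)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        h = max(eps * abs(x[i]), eps)
        e = np.zeros_like(x)
        e[i] = h
        if central:
            grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)   # two-sided, error O(h^2)
        else:
            grad[i] = (f(x + e) - f(x)) / h               # one-sided, error O(h)
    return grad
```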
B Workout Problems
1. Write source code to compute solutions to nonlinear equations using the
methods discussed.
5. Show that the error in using a one-sided difference for approximating $f'(x)$ is
$O(h)$, whereas the error using a two-sided difference is $O(h^2)$.