
IMM DEPARTMENT OF MATHEMATICAL MODELLING Technical University of Denmark DK-2800 Lyngby Denmark

J. No. H38 7.7.1999 HBN/ms

METHODS FOR NON-LINEAR LEAST SQUARES PROBLEMS

Kaj Madsen    Hans Bruun Nielsen    Ole Tingleff

IMM

Contents
1. Introduction and Definitions
2. Descent Methods
   2.1. The Steepest Descent Method
   2.2. Newton's Method
   2.3. Line Search
3. Non-Linear Least Squares Problems
   3.1. The Gauss-Newton Method
   3.2. Marquardt's Method
   3.3. A Hybrid Method: Marquardt and Quasi-Newton
   3.4. A Secant Version of Marquardt's Method
   3.5. Powell's Dog Leg Method
   3.6. Final Remarks
Appendix
References
Index

1. Introduction and Definitions


In this booklet we consider the problem of finding an argument which gives the minimum value of a given differentiable function $F : \mathbb{R}^n \to \mathbb{R}$, the so-called objective or cost function. In other words:

Definition. Global Minimizer

  Find $x^+ = \operatorname{argmin}_x \{F(x)\}$, where $F : \mathbb{R}^n \to \mathbb{R}$.   (1.1)

This problem is very hard to solve in general, and the methods we give here are built to solve the simpler problem of finding a local minimizer for $F$: an argument vector which gives a minimum value of $F$ inside a certain region whose size is given by $\varepsilon$, with $\varepsilon > 0$ and small enough:

Definition. Local Minimizer

  Find $x^*$ so that $F(x^*) \le F(x)$ for $\|x - x^*\| < \varepsilon$.   (1.2)

The main subject of this booklet is the treatment of methods for a special kind of optimization problems, where the function $F$ has the following form:

Definition. Least Squares Problem

  Find $x^*$, a local minimizer for
  $F(x) = \frac{1}{2} \sum_{i=1}^{m} (f_i(x))^2$, where $f_i : \mathbb{R}^n \to \mathbb{R}$ and $m \ge n$.   (1.3)

The factor $\frac{1}{2}$ has no effect on $x^*$, and is introduced for convenience; see the footnote to (3.3).


Example 1.1. An important source of least squares problems is data fitting. As an example consider the data points $(t_1, y_1), \ldots, (t_m, y_m)$ shown in Figure 1.1. Further, we are given a fitting model,

  $M(x, t) = x_3 e^{x_1 t} + x_4 e^{x_2 t}$.

The model depends on the parameters $x = [x_1, x_2, x_3, x_4]^\top$. We assume that there exists an $x^\dagger$ so that

  $y_i = M(x^\dagger, t_i) + \varepsilon_i$,

where the $\{\varepsilon_i\}$ are (measurement) errors on the data ordinates, assumed to behave like "white noise". For any choice of $x$ we can compute the residuals

  $f_i(x) = y_i - M(x, t_i) = y_i - x_3 e^{x_1 t_i} - x_4 e^{x_2 t_i}, \quad i = 1, \ldots, m$.

For a least squares fit the parameters are determined as the minimizer $x^*$ of the sum of squared residuals. This is seen to be a problem of the form (1.3) with $m = 45$, $n = 4$. The graph of $M(x^*, t)$ is shown by full line in Figure 1.1.

[Figure 1.1. Data points $\{(t_i, y_i)\}$ (marked by +) and model $M(x^*, t)$ (full line).]

In the remainder of this introduction we shall discuss some basic concepts in optimization, and Chapter 2 is a brief review of methods for finding a local minimizer for general cost functions. For more details we refer to Frandsen et al. (1999). In Chapter 3 we give methods that are specially tuned for least squares problems.

We assume that the cost function $F$ is so smooth that the following Taylor expansion is valid,1)

  $F(x+h) = F(x) + h^\top g + \frac{1}{2} h^\top H h + O(\|h\|^3)$,   (1.4a)

where $g$ is the gradient,

  $g \equiv F'(x) = \left[ \frac{\partial F}{\partial x_1}(x), \ldots, \frac{\partial F}{\partial x_n}(x) \right]^\top$,   (1.4b)

and $H$ is the Hessian matrix,

  $H \equiv F''(x) = \left[ \frac{\partial^2 F}{\partial x_i \partial x_j}(x) \right]$.   (1.4c)

If $x^*$ is a local minimizer and $\|h\|$ is sufficiently small, then we cannot find a point $x^* + h$ with a smaller $F$-value. Combining this observation with (1.4a) we see that

Necessary Condition for a Local Minimizer:

  $x^*$ is a local minimizer $\implies$ $g^* \equiv F'(x^*) = 0$.   (1.5)

We use a special name for arguments that satisfy the necessary condition:

  $x_s$ is a stationary point $\iff$ $g_s \equiv F'(x_s) = 0$.   (1.6)

Thus, the local minimizers are also stationary points, but so are the local maximizers. A stationary point which is neither a local maximizer nor a local minimizer is called a saddle point. In order to determine whether a given stationary point is a local minimizer or not, we need to include the second order term in the Taylor series (1.4a). Inserting $x_s$ we see that

  $F(x_s + h) = F(x_s) + \frac{1}{2} h^\top H_s h + O(\|h\|^3)$ with $H_s \equiv F''(x_s)$.   (1.7)

From definition (1.4c) of the Hessian matrix it follows that any $H$ is symmetric. If we request that $H_s$ is positive definite, then its eigenvalues are greater than some number $\delta > 0$ (see Appendix A), and

  $h^\top H_s h > \delta \|h\|^2$.

This shows that for $\|h\|$ sufficiently small the third term on the right-hand side of (1.7) will be dominated by the second. This term is positive, so that we get

Sufficient Condition for a Local Minimizer:

  $x^*$ is a stationary point and $F''(x^*)$ is positive definite
  $\implies$ $x^*$ is a local minimizer.   (1.8)

If $H_s$ is negative definite, then $x_s$ is a local maximizer. If $H_s$ is indefinite (i.e. it has both positive and negative eigenvalues), then $x_s$ is a saddle point.

1) Unless otherwise specified, $\|\cdot\|$ denotes the 2-norm, $\|h\| = \sqrt{h_1^2 + \cdots + h_n^2}$.

2. Descent Methods
All methods for non-linear optimization are iterative: From a starting point $x_0$ the method produces a series of vectors $x_1, x_2, \ldots$, which (hopefully) converges towards $x^*$, a local minimizer for the given function, see Definition (1.2). Most methods have measures which enforce the descending condition

  $F(x_{k+1}) < F(x_k)$.   (2.1)

This prevents convergence to a maximizer and also makes it less probable that we converge towards a saddle point, cf. Chapter 1. If the given function has several minimizers the result will depend on the starting point $x_0$. We do not know which of the minimizers will be found; quite often it is not the minimizer closest to $x_0$.

In many cases the method produces vectors which converge towards the minimizer in two clearly different stages. When $x_0$ is far from the solution we want the method to produce iterates which move steadily towards $x^*$. In this "global stage" of the iteration we are satisfied if the errors do not increase except in the very first steps, i.e.

  $\|e_{k+1}\| < \|e_k\|$ for $k > K$,   (2.2a)

where $e_k$ denotes the current error,

  $e_k \equiv x_k - x^*$.   (2.2b)

In the final stage of the iteration, where $x_k$ is close to $x^*$, we want faster convergence. We distinguish between

  Linear convergence:      $\|e_{k+1}\| \le a \|e_k\|$ when $\|e_k\|$ is small, $0 < a < 1$,   (2.3a)
  Quadratic convergence:   $\|e_{k+1}\| = O(\|e_k\|^2)$ when $\|e_k\|$ is small,   (2.3b)
  Superlinear convergence: $\|e_{k+1}\| / \|e_k\| \to 0$ for $k \to \infty$.   (2.3c)

The methods presented in this booklet are descent methods, which satisfy the descending condition (2.1) in each step of the iteration. One step from the current iterate consists in

1. Find a descent direction hdd (discussed below), and
2. find a step length giving a good decrease in the F-value.

Thus an outline of a descent method is

Algorithm 2.4. Descent Method

begin
  k := 0;  x := x0;  found := false              {starting point}
  while (not found) and (k < kmax)
    hdd := search_direction(x)                   {from x and downhill}
    if no such hdd exists
      found := true                              {x is stationary}
    else
      α := line_search(x, hdd)                   {from x in direction hdd}
      x := x + α*hdd;  k := k+1                  {next iterate}
end                                              {... of descent algorithm}

Consider the variation of the $F$-value along the half line starting at $x$ and with direction $h$. From the Taylor series (1.4a) we see that

  $F(x + \alpha h) = F(x) + \alpha h^\top F'(x) + O(\alpha^2) \simeq F(x) + \alpha h^\top F'(x)$ for $\alpha$ sufficiently small.   (2.5)

We say that $h$ is a descent direction if $F(x + \alpha h)$ is a decreasing function of $\alpha$ at $\alpha = 0$. This leads to the following

Definition. $h$ is a Descent Direction for $F$ at $x$ $\iff$ $h^\top F'(x) < 0$.   (2.6)

If no such $h$ exists, then $F'(x) = 0$, showing that in this case $x$ is stationary. We want to fulfil the descending property (2.1). In other words, we want $F(x + \alpha h) < F(x)$. In some methods we want to find (an approximation to) the best value of $\alpha$, i.e.

  $\alpha_e = \operatorname{argmin}_{\alpha > 0} \{F(x + \alpha h)\}$.   (2.7)

The process of finding a good value for $\alpha$ is called a line search; this is discussed in Section 2.3.

2.1. The Steepest Descent Method

From (2.5) we see that when we perform a step $\alpha h$ with positive $\alpha$, then the relative gain in function value satisfies

  $\lim_{\alpha \to 0} \frac{F(x) - F(x + \alpha h)}{\alpha \|h\|} = -\frac{1}{\|h\|} h^\top F'(x) = -\|F'(x)\| \cos\theta$,

where $\theta$ is the angle between the vectors $h$ and $F'(x)$. This shows that we get the greatest gain rate if $\theta = \pi$, i.e. if we use the steepest descent direction $h_{sd}$ given by

  $h_{sd} = -F'(x)$.   (2.8)

The method based on (2.8) (i.e. $h_{dd} = h_{sd}$ in Algorithm 2.4) is called the steepest descent method or gradient method. The choice of descent direction is "the best" (locally), and we could combine it with an exact line search (2.7). A method like this converges, but the final convergence is linear and often very slow. Examples in Frandsen et al. (1999) show how the steepest descent method with exact line search and finite computation accuracy can fail to find the minimizer of a second degree polynomial. However, for many problems the method has quite good performance in the initial stage of the convergence.

Considerations like this have led to the so-called hybrid methods, which, as the name suggests, are based on two different methods: one which is good in the initial stage, like the gradient method, and another which is good in the final stage, like Newton's method; see the next section. A major problem with a hybrid method is the mechanism which switches between the two methods when appropriate.

2.2. Newton's Method

We can derive this method from the condition that $x^*$ is a stationary point. According to (1.6) it satisfies $F'(x^*) = 0$. This is a nonlinear system of equations, and from the Taylor expansion

  $F'(x+h) = F'(x) + F''(x)h + O(\|h\|^2) \simeq F'(x) + F''(x)h$ for $\|h\|$ sufficiently small   (2.9)

we derive Newton's method: Find $h_N$ as the solution to

  $H h_N = -F'(x)$ with $H = F''(x)$,   (2.10a)

and compute the next iterate by

  $x := x + h_N$.   (2.10b)

Suppose that $H$ is positive definite. Then it is nonsingular (implying that (2.10a) has a unique solution), and $u^\top H u > 0$ for all nonzero $u$. Thus, by multiplying both sides of (2.10a) with $h_N^\top$ we get

  $0 < h_N^\top H h_N = -h_N^\top F'(x)$,   (2.11)

showing that $h_N$ is a descent direction: it satisfies (2.6).

Newton's method is very good in the final stage of the iteration, where $x \simeq x^*$. We can show (see Frandsen et al. (1999)) that if the Hessian matrix at the solution is positive definite (the sufficient condition (1.8) is satisfied), and if we are at a position inside the region about $x^*$ where $F''(x)$ is positive definite, then we get quadratic convergence, see (2.3).

In the opposite situation, i.e. $x$ is in a region where $F''(x)$ is negative definite everywhere, and where there is a stationary point, the "raw" Newton method (2.10) would converge (quadratically) towards this stationary point, which is a maximizer. We do not want this, and we can avoid it by requiring that all steps taken are in descent directions. Now we can build a hybrid method, based on Newton's method:

  if hN is a descent direction
    use hN
  else
    use hsd

The controlling mechanism is the descent condition, $h_N^\top F'(x) < 0$. As shown in (2.11), this is satisfied if $F''(x)$ is positive definite, so a sketch of the central section of this version of the algorithm is:

  if F''(x) is positive definite
    if F(x + hN) < F(x)
      x := x + hN
    else
      x := x + α*hN                              (2.12)
  else
    x := x + α*hsd

Here, $h_{sd}$ is the steepest descent direction, and $\alpha$ is found by a line search; see Section 2.3. The line search is included also with the Newton direction to make sure that the descending condition (2.1) is satisfied.

An alternative reaction when the Hessian is not positive definite is the use of a so-called damped Newton method:

  if F''(x) is not positive definite
    find μ so that F''(x) + μI is positive definite
  find hdN by solving (F''(x) + μI) hdN = −F'(x)   (2.13)

The step $h_{dN}$ produced in this way is a descent direction and can be used instead of $h_{sd}$ in the hybrid (2.12). For the indefinite case we use $\mu > 0$, and with $\mu$ large, $h_{dN}$ is a short step whose direction is close to the steepest descent direction $h_{sd}$, whereas we use the Newton step $h_N$ in the definite case. Thus we can say that $\mu$ interpolates between these two directions. Notice that a good tool for checking a matrix for positive definiteness is Cholesky's method (see Appendix A) which, when successful, is also used for solving the linear system in question. Thus, the check for definiteness is almost for free.
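To make this concrete, here is a minimal Matlab sketch of one step of (2.13). It is only an illustration, not part of the original presentation: the function name, the starting value mu = 1e-3 and the growth factor 4 are arbitrary choices, and chol serves both as the definiteness check and as the solver, as just discussed.

  function x = damped_newton_step(x, g, H)
  % One damped Newton step (2.13); g = F'(x), H = F''(x).
  % mu = 1e-3 and the factor 4 are illustrative choices.
  mu = 1e-3;
  [R, p] = chol(H + mu*eye(size(H,1)));  % p == 0 iff the matrix is positive definite
  while p > 0
    mu = 4*mu;                           % increase the damping until definite
    [R, p] = chol(H + mu*eye(size(H,1)));
  end
  hdN = -(R \ (R' \ g));                 % reuse the Cholesky factor to solve (2.13)
  x = x + hdN;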

The hybrid methods indicated above can be very efficient, but they are hardly ever used. The reason is that they need an implementation of $F''(x)$, and for complicated application problems this is not available. Instead we have to make do with a so-called Quasi-Newton method, based on a series of matrices which gradually approach $H^* = F''(x^*)$, or $(H^*)^{-1}$, or a factorization of $H^*$. In Section 3.3 we present such a method. See also Frandsen et al. (1999).

2.3. Line Search

Given a point $x$ and a descent direction $h$. The next iteration step is a move from $x$ in direction $h$. To find out how far to move, we study the variation of the given function along the half line from $x$ in the direction $h$:

  $\varphi(\alpha) = F(x + \alpha h)$, with $x$ and $h$ fixed, $\alpha \ge 0$.   (2.14)

An example of the behaviour of $\varphi(\alpha)$ is shown in Figure 2.1 below. Our $h$ being a descent direction ensures that

  $\varphi'(0) = h^\top F'(x) < 0$,

indicating that if $\alpha$ is sufficiently small, we satisfy the descending condition (2.1), which is equivalent with

  $\varphi(\alpha) < \varphi(0)$.   (2.15)

[Figure 2.1. Variation of the cost function along the search line: $y = \varphi(\alpha)$, with the level $y = \varphi(0)$ indicated.]

Often, we are given an initial guess on $\alpha$, e.g. $\alpha = 1$ with Newton's method. Figure 2.1 illustrates that three different situations can arise:

1. $\alpha$ is so small that the gain in value of the objective function is very small. We should increase $\alpha$.
2. $\alpha$ is too large: $\varphi(\alpha) \ge \varphi(0)$. We must decrease $\alpha$ in order to satisfy the descent condition (2.1).
3. $\alpha$ is close to the minimizer1) of $\varphi(\alpha)$. We happily accept this $\alpha$-value.

An exact line search is an iterative process producing a series $\alpha_1, \alpha_2, \ldots$. The aim is to find the true minimizer $\alpha_e$ defined in (2.7), and the algorithm stops when the iterate $\alpha_s$ satisfies

  $|\varphi'(\alpha_s)| \le \tau |\varphi'(0)|$,

where $\tau$ is a small, positive number. In the iteration we can use approximations to the variation of $\varphi(\alpha)$ based on the computed values of

  $\varphi(\alpha_k) = F(x + \alpha_k h)$ and $\varphi'(\alpha_k) = h^\top F'(x + \alpha_k h)$.

See Sections 2.5 and 2.6 in Frandsen et al. (1999) for details.

An exact line search can waste a lot of computing time: When $x$ is far from $x^*$, the search direction $h$ may be far from the direction $x^* - x$, and there is no need to find the true minimum of $\varphi$ very accurately. This is the background for the so-called soft line searches, where we accept an $\alpha$-value if it does not fall in the categories 1. or 2. listed above. We use a stricter version of the descending condition (2.1), viz.

  $\varphi(\alpha_s) \le \varphi(0) + \varrho \cdot \varphi'(0) \cdot \alpha_s$ with $0 < \varrho < 0.5$.   (2.16a)

This ensures that we are not in case 2. Case 1 corresponds to the point $(\alpha, \varphi(\alpha))$ being too close to the starting tangent, and we supplement with the condition

  $\varphi'(\alpha_s) \ge \beta \cdot \varphi'(0)$ with $\varrho < \beta < 1$.   (2.16b)

If the starting guess on $\alpha$ satisfies both these criteria, then we accept it as $\alpha_s$. Otherwise, we have to iterate as outlined for exact line search. Details can be seen in Section 2.5 of Frandsen et al. (1999).

1) More precisely: the smallest local minimizer of $\varphi$. If we increase $\alpha$ beyond the interval shown in Figure 2.1, it may well happen that we get close to another local minimum for $F$.
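As a small illustration, condition (2.16a) alone already gives a usable backtracking line search. The following Matlab sketch is a deliberate simplification of the soft line search described above: it only decreases alpha (case 2), does not test (2.16b), and the values rho = 1e-3 and the halving factor are arbitrary choices.

  function alpha = soft_linesearch(F, x, h, g)
  % F: handle returning F(x); h: descent direction; g = F'(x)
  rho = 1e-3;  alpha = 1;
  f0 = F(x);  dphi0 = h'*g;                    % phi(0) and phi'(0) < 0
  while F(x + alpha*h) > f0 + rho*dphi0*alpha  % (2.16a) violated: case 2
    alpha = alpha/2;                           % decrease alpha
  end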

3. Non-Linear Least Squares Problems


In the remainder of this booklet we shall discuss methods for nonlinear least squares problems. Given a vector function $f : \mathbb{R}^n \to \mathbb{R}^m$ with $m \ge n$, we want to minimize $\|f(x)\|$, or equivalently to find

  $x^* = \operatorname{argmin}_x \{F(x)\}$,   (3.1a)

where

  $F(x) = \frac{1}{2} \sum_{i=1}^m (f_i(x))^2 = \frac{1}{2} \|f(x)\|^2 = \frac{1}{2} f(x)^\top f(x)$.   (3.1b)

Least squares problems can be solved by general optimization methods, but we shall present special methods that are more efficient. In many cases they achieve better than linear convergence, sometimes even quadratic convergence, even though they do not need implementation of second derivatives.

In the description of the methods in this chapter we shall need formulae for derivatives of $F$: Provided that $f$ has continuous second partial derivatives, we can write its Taylor series as

  $f(x+h) = f(x) + J_f(x) h + O(\|h\|^2)$,   (3.2a)

where $J_f \in \mathbb{R}^{m \times n}$ is the Jacobian matrix containing the first partial derivatives of the function components,

  $(J_f(x))_{ij} = \frac{\partial f_i}{\partial x_j}(x)$.   (3.2b)

As regards $F : \mathbb{R}^n \to \mathbb{R}$, it follows from the first formulation in (3.1b) that1)

  $\frac{\partial F}{\partial x_j}(x) = \sum_{i=1}^m f_i(x) \frac{\partial f_i}{\partial x_j}(x)$.   (3.3)

Thus, the gradient (1.4b) is

  $F'(x) = J_f(x)^\top f(x)$.   (3.4a)

We shall also need the Hessian matrix of $F$. From (3.3) we see that the element in position $(j,k)$ is

  $\frac{\partial^2 F}{\partial x_j \partial x_k}(x) = \sum_{i=1}^m \left( \frac{\partial f_i}{\partial x_j}(x) \frac{\partial f_i}{\partial x_k}(x) + f_i(x) \frac{\partial^2 f_i}{\partial x_j \partial x_k}(x) \right)$,

showing that

  $F''(x) = J_f(x)^\top J_f(x) + \sum_{i=1}^m f_i(x) f_i''(x)$.   (3.4b)

1) If we had not used the factor $\frac{1}{2}$ in the definition (1.3), we would have got an annoying factor of 2 in a lot of expressions.
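In code, (3.1b) and (3.4a) are one line each. A minimal Matlab fragment, assuming a user-supplied function fun that returns f(x) and Jf(x) (like fitexp in Example 3.4 below):

  [f, J] = fun(x);   % residual vector f(x) and Jacobian Jf(x)
  Fx = 0.5*(f'*f);   % F(x) = (1/2) f(x)' f(x),   cf. (3.1b)
  g  = J'*f;         % F'(x) = Jf(x)' f(x),       cf. (3.4a)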

Example 3.1. The simplest case of (3.1) is when $f(x)$ has the form

  $f(x) = b - Ax$,

where the vector $b \in \mathbb{R}^m$ and matrix $A \in \mathbb{R}^{m \times n}$ are given. We say that this is a linear least squares problem. In this case $J_f(x) = -A$ for all $x$, and from (3.4a) we see that

  $F'(x) = -A^\top (b - Ax)$.

This is zero for $x^*$ determined as the solution to the so-called normal equations,

  $(A^\top A) x^* = A^\top b$.   (3.5)

The problem can be written in the form

  $Ax \simeq b$,

and alternatively we can solve it via orthogonal transformation: Find an orthogonal matrix $Q$ so that

  $Q^\top A = \begin{bmatrix} R \\ 0 \end{bmatrix}$,

where $R \in \mathbb{R}^{n \times n}$ is upper triangular. The solution is found by back substitution in the system2)

  $R x^* = (Q^\top b)_{1:n}$.

This is the method employed in Matlab. It is more accurate than the solution via the normal equations.

As the title of the booklet suggests, we assume that $f$ is nonlinear, and shall not discuss linear problems in detail. We refer to Chapter 6 in Nielsen (1996) or Section 5.2 in Golub and Van Loan (1989).

2) An expression like $u_{p:q}$ is used to denote the subvector with elements $u_i$, $i = p, \ldots, q$. The $i$th row and $j$th column of a matrix $A$ are denoted $A_{i,:}$ and $A_{:,j}$, respectively.
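In Matlab the two approaches from Example 3.1 can be tried side by side; the backslash operator applied to a rectangular system uses an orthogonal factorization, while the second line below forms the normal equations (3.5) explicitly. The data are made up for illustration:

  A = [1 1; 1 2; 1 3];  b = [1; 2; 2];  % small overdetermined system, m = 3, n = 2
  x_qr = A \ b;                         % solution via orthogonal transformation
  x_ne = (A'*A) \ (A'*b);               % solution via the normal equations (3.5)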

Example 3.2. In Example 1.1 we saw a nonlinear least squares problem arising from data fitting. Another application is in the solution of nonlinear systems of equations,

  $f(x^*) = 0$, where $f : \mathbb{R}^n \to \mathbb{R}^n$.

We can use Newton-Raphson's method: From an initial guess $x_0$ we compute $x_1, x_2, \ldots$ by the algorithm, which is based on seeking $h$ so that $f(x+h) = 0$ and ignoring the term $O(\|h\|^2)$ in (3.2a),

  Solve $J_f(x_k) h_k = -f(x_k)$ for $h_k$;  $x_{k+1} = x_k + h_k$.   (3.6)

Here, the Jacobian matrix $J_f$ is given by (3.2b). If $J_f(x^*)$ is nonsingular, then the method has quadratic final convergence, i.e. if $d_k = \|x_k - x^*\|$ is small, then $\|x_{k+1} - x^*\| = O(d_k^2)$. However, if $x_k$ is far from $x^*$, then we risk getting even further away, but as in Section 2.2 we can supply the method with a line search: We are seeking a zero for $f(x)$. This is a minimizer of the function $F$ defined by (3.1),

  $F(x) = \frac{1}{2} \|f(x)\|^2$,

with $F(x^*) = 0$ and $F(x) > 0$ if $f(x) \ne 0$. We get a robust method by modifying (3.6) to

  Solve $J_f(x_k) h_k = -f(x_k)$ for $h_k$
  $\alpha_k = \operatorname{argmin}_{\alpha > 0} \{ \frac{1}{2} \|f(x_k + \alpha h_k)\|^2 \}$
  $x_{k+1} = x_k + \alpha_k h_k$.

Here (an approximation to) $\alpha_k$ is found as outlined in Section 2.3. The method may fail if the Jacobian matrix has singularities in the neighbourhood of $x^*$. As an example consider the following problem, taken from Powell (1970),

  $f(x) = \begin{bmatrix} x_1 \\ \frac{10 x_1}{x_1 + 0.1} + 2 x_2^2 \end{bmatrix}$,

with $x^* = 0$ as the only solution. If we take $x_0 = [3, 1]^\top$ and use the above algorithm with exact line search, then the iterates converge to $x_c \simeq [1.8016, 0]^\top$, which is not a solution. To explain this behaviour, we look at the Jacobian matrix

  $J_f(x) = \begin{bmatrix} 1 & 0 \\ (x_1 + 0.1)^{-2} & 4 x_2 \end{bmatrix}$.

This is singular for $x_2 = 0$, and for $x_k$ close to the $x_1$-axis the vector $h_k = -J_f(x_k)^{-1} f(x_k)$ will have large components. Then $\alpha_k$ will be very small, and we get stuck at the current position. A rigorous proof is given by Powell (1970).

An alternative approach is to reformulate the problem so that we aim directly at minimizing $F$, instead of just using it in the line search. By itself this does not cure the problems associated with singular Jacobian matrices, but it allows us to use all the "tools" we are going to present in this chapter.
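The behaviour described in Example 3.2 is easy to reproduce. The following Matlab sketch implements (3.6) with a simple backtracking search on ||f|| instead of the exact line search used in the example (an illustrative simplification); started at x0 = [3; 1] the iterates approach the non-solution xc.

  f = @(x) [x(1); 10*x(1)/(x(1)+0.1) + 2*x(2)^2];
  J = @(x) [1, 0; (x(1)+0.1)^(-2), 4*x(2)];
  x = [3; 1];
  for k = 1:50
    h = -J(x) \ f(x);                   % Newton-Raphson step, (3.6)
    alpha = 1;
    while norm(f(x + alpha*h)) >= norm(f(x)) && alpha > 1e-12
      alpha = alpha/2;                  % backtrack until ||f|| decreases
    end
    x = x + alpha*h;
  end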

3.1. The Gauss-Newton Method

This method is the basis of the very efficient method we will describe in the next section. It is based on implemented first derivatives of the components of the vector function. In special cases it can give quadratic convergence as the Newton method does for general optimization, see Frandsen et al. (1999). As we shall see in an example, there is a risk of convergence towards a non-stationary point if we incorporate an exact line search into it.

The basis of the Gauss-Newton method is a linear approximation to the components of $f$ (a linear model of $f$) in the neighbourhood of $x$: For small $\|h\|$ we see from the Taylor expansion (3.2) that

  $f(x+h) \simeq \ell(h) \equiv f(x) + J_f(x) h$.   (3.7a)

Inserting this in the definition (3.1) of $F$ we see that

  $F(x+h) \simeq L(h) \equiv \frac{1}{2} \ell(h)^\top \ell(h) = \frac{1}{2} f^\top f + h^\top J_f^\top f + \frac{1}{2} h^\top J_f^\top J_f h = F(x) + h^\top J_f^\top f + \frac{1}{2} h^\top J_f^\top J_f h$   (3.7b)

(with $f = f(x)$ and $J_f = J_f(x)$). The Gauss-Newton step $h_{GN}$ minimizes $L(h)$,

  $h_{GN} = \operatorname{argmin}_h \{L(h)\}$.

It is easily seen that the gradient and the Hessian matrix of $L$ are

  $L'(h) = J_f^\top f + J_f^\top J_f h$, $\quad L''(h) = J_f^\top J_f$.   (3.8)

Comparison with (3.4a) shows that $L'(0) = F'(x)$. Further, we see that the matrix $L''(h)$ is independent of $h$. It is symmetric, and if $J_f$ has full rank, i.e. if the columns are linearly independent, then $L''(h)$ is also positive definite, cf. Appendix A. This implies that $L(h)$ has a unique minimizer, which can be found by solving

  $(J_f^\top J_f) h_{GN} = -J_f^\top f$.   (3.9)

This is a descent direction for $F$ since

  $h_{GN}^\top F'(x) = h_{GN}^\top (J_f^\top f) = -h_{GN}^\top (J_f^\top J_f) h_{GN} < 0$.   (3.10)

Thus, we can use $h_{GN}$ for $h_{dd}$ in Algorithm 2.4. The typical step is

  Solve $(J_f^\top J_f) h_{GN} = -J_f^\top f$;  $x := x + \alpha h_{GN}$,   (3.11)

where $\alpha$ is found by line search. The classical Gauss-Newton method uses $\alpha = 1$ in all steps.
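A minimal Matlab sketch of the classical Gauss-Newton iteration, i.e. (3.11) with alpha = 1 in all steps; fun is assumed to return f(x) and Jf(x), and the fixed iteration count stands in for proper stopping criteria (these are discussed in Section 3.2):

  function x = gauss_newton(fun, x, kmax)
  for k = 1:kmax
    [f, J] = fun(x);
    hGN = -(J'*J) \ (J'*f);   % solve the normal equations (3.9)
    x = x + hGN;              % classical method: alpha = 1
  end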

The method with line search can be shown to have guaranteed convergence, provided that

a) $\{x \mid F(x) \le F(x_0)\}$ is bounded, and
b) the Jacobian $J_f(x)$ has full rank in all steps.

In Chapter 2 we saw that Newton's method for optimization has quadratic convergence. This is normally not the case with the Gauss-Newton method. To see this, we compare the search directions used in the two methods,

  $F''(x) h_N = -F'(x)$ and $L''(h) h_{GN} = -L'(0)$.

We already remarked at (3.8) that the two right-hand sides are identical, but from (3.4b) and (3.8) we see that the coefficient matrices differ:

  $F''(x) = L''(h) + \sum_{i=1}^m f_i(x) f_i''(x)$.   (3.12)

Therefore, if $f(x^*) = 0$, then $L''(h) \simeq F''(x)$ for $x$ close to $x^*$, and we get quadratic convergence also with the Gauss-Newton method. We can expect superlinear convergence if the functions $\{f_i\}$ have small curvatures or if the $\{|f_i(x^*)|\}$ are small, but in general we must expect linear convergence. It is remarkable that the value of $F(x^*)$ controls the convergence speed.

Example 3.3. Consider the simple problem with $n = 1$, $m = 2$ given by

  $f(x) = \begin{bmatrix} x + 1 \\ \lambda x^2 + x - 1 \end{bmatrix}$, $\quad F(x) = \frac{1}{2}(x+1)^2 + \frac{1}{2}(\lambda x^2 + x - 1)^2$.

It follows that

  $F'(x) = 2\lambda^2 x^3 + 3\lambda x^2 - 2(\lambda - 1)x$,

so $x = 0$ is a stationary point for $F$. Now,

  $F''(x) = 6\lambda^2 x^2 + 6\lambda x - 2(\lambda - 1)$.

This shows that if $\lambda < 1$, then $F''(0) > 0$, so $x = 0$ is a local minimizer; actually, it is the global minimizer. The Jacobian matrix is

  $J_f(x) = \begin{bmatrix} 1 \\ 2\lambda x + 1 \end{bmatrix}$,

and the classical Gauss-Newton method from $x_k$ gives

  $x_{k+1} = x_k - \frac{2\lambda^2 x_k^3 + 3\lambda x_k^2 - 2(\lambda - 1)x_k}{1 + (2\lambda x_k + 1)^2}$.

Now, if $\lambda \ne 0$ and $x_k$ is close to zero, then

  $x_{k+1} = x_k + (\lambda - 1)x_k + O(x_k^2) = \lambda x_k + O(x_k^2)$.

Thus, if $|\lambda| < 1$, we have linear convergence. If $\lambda < -1$, then the classical Gauss-Newton method cannot find the minimizer. E.g. with $\lambda = -2$ and $x_0 = 0.1$ we get the iterates

  k     0        1         2        3         4       ...
  x_k   0.1000   -0.2425   0.4046   -4.7724   44.2975 ...

Finally, if $\lambda = 0$, then $x_{k+1} = x_k - x_k = 0$, i.e. we find the solution in one step. The reason is that in this case $f$ is a linear function.

Example 3.4. For the data fitting problem from Example 1.1 the $i$th row of the Jacobian matrix is

  $J_f(x)_{i,:} = \left[ -x_3 t_i e^{x_1 t_i},\ -x_4 t_i e^{x_2 t_i},\ -e^{x_1 t_i},\ -e^{x_2 t_i} \right]$.

If the problem is consistent (i.e. $f(x^*) = 0$), then the Gauss-Newton method with line search will have quadratic final convergence, provided that $x_1^*$ is significantly different from $x_2^*$. If $x_1^* = x_2^*$, then $\operatorname{rank}(J_f(x^*)) \le 2$, and the Gauss-Newton method fails.

If one or more measurement errors are large, then $f(x^*)$ has some large components, and this may slow down the convergence.

In Matlab we can give a very compact function for computing $f$ and $J_f$: Suppose that x holds the current iterate and that the $m \times 2$ array ty holds the coordinates of the data points. The following function returns f and J containing $f(x)$ and $J_f(x)$, respectively.

  function [f, J] = fitexp(x, ty)
  t = ty(:,1);  y = ty(:,2);
  E = exp(t * [x(1), x(2)]);                    % E(i,:) = [exp(x1*t_i), exp(x2*t_i)]
  f = y - E*[x(3); x(4)];                       % residuals f_i(x) = y_i - M(x,t_i)
  J = -[x(3)*t.*E(:,1), x(4)*t.*E(:,2), E];     % rows as in the formula above

Example 3.5. Consider the problem from Example 3.2, $f(x^*) = 0$ with $f : \mathbb{R}^n \to \mathbb{R}^n$. If we use Newton-Raphson's method to solve this problem, the typical iteration step is

  Solve $J_f(x) h_{NR} = -f(x)$;  $x := x + h_{NR}$.

The Gauss-Newton method applied to the minimization of $F(x) = \frac{1}{2} f(x)^\top f(x)$ has the typical step

  Solve $(J_f(x)^\top J_f(x)) h_{GN} = -J_f(x)^\top f(x)$;  $x := x + h_{GN}$.

Note that $J_f(x)$ is a square matrix, and we assume that it is nonsingular. Then $(J_f(x)^\top)^{-1}$ exists, and it follows that $h_{GN} = h_{NR}$. Therefore, when applied to Powell's problem from Example 3.2, the Gauss-Newton method will have the same troubles as discussed for Newton-Raphson's method in that example.

These examples show that the Gauss-Newton method may fail, both with and without a line search. Still, in many applications it gives quite good performance, though it normally only has linear convergence, as opposed to the quadratic convergence from Newton's method with implemented second derivatives. In Section 3.2 we give a method with superior global performance, and in Section 3.3 we give modifications to the method so that we achieve superlinear final convergence.

3.2. Marquardt's Method

In Section 2.2 we suggested a hybrid method with better global performance than Newton's method. One problem with the latter is that if we are far from a local minimizer, the Hessian matrix may be indefinite or even negative definite, so the Newton step $h_N$ is perhaps not a descent direction, and in that case it would be better to use the steepest descent direction $h_{sd}$. In Section 3.1 we saw that the Gauss-Newton step $h_{GN}$ is well-defined only if $J_f(x)$ has full rank. In that case $h_{GN}$ is a descent direction.

Both Newton's method and the Gauss-Newton method may suggest steps that are so long that the non-linearity of the components of $f$ gives a value of $F$ which is larger than the one we are about to leave. The reason is that the linear models behind the two methods, (2.9) and (3.11), are good approximations only for small values of $\|h\|$.

Levenberg (1944) and later Marquardt (1963) suggested a method where the step $h_M$ is computed by the following modification of the system (3.9) defining $h_{GN}$:

  $(J_f^\top J_f + \mu I) h_M = -g$ with $g = J_f^\top f$ and $\mu \ge 0$.   (3.13)

Here, $J_f = J_f(x)$ and $f = f(x)$. The damping parameter $\mu$ has several effects:

a) For all $\mu > 0$ the coefficient matrix is positive definite, and this ensures that $h_M$ is a descent direction, cf. (3.10).
b) For large values of $\mu$ we get $h_M \simeq -\frac{1}{\mu} g = -\frac{1}{\mu} F'(x)$, i.e. a short step in the steepest descent direction.
c) If $\mu$ is very small, then $h_M \simeq h_{GN}$, which is a good step in the final stages of the iteration, when $x$ is close to $x^*$. If $F(x^*) = 0$ (or very small), then we can get (almost) quadratic final convergence.

Thus, the damping parameter $\mu$ influences both the direction and the size of the step, and this leads us to make a method without a specific line search. The choice of initial $\mu$-value should be related to the size of the elements in $A_0 = J_f(x_0)^\top J_f(x_0)$, e.g. by letting

  $\mu_0 = \tau \cdot \max_i a_{ii}^{(0)}$,   (3.14)

where $\tau$ is chosen by the user. During iteration the size of $\mu$ can be controlled by the gain ratio

  $\varrho = \frac{F(x) - F(x + h_M)}{L(0) - L(h_M)}$,   (3.15a)

where the denominator is the gain predicted by the linear model (3.7b),

  $L(0) - L(h_M) = -h_M^\top J_f^\top f - \frac{1}{2} h_M^\top J_f^\top J_f h_M = -\frac{1}{2} h_M^\top \left( 2g + (J_f^\top J_f + \mu I - \mu I) h_M \right) = \frac{1}{2} h_M^\top (\mu h_M - g)$.   (3.15b)

Note that both $h_M^\top h_M$ and $-h_M^\top g$ are positive, so $L(0) - L(h_M)$ is guaranteed to be positive.

A large value of $\varrho$ indicates that $L(h_M)$ is a good approximation to $F(x + h_M)$, and we can decrease $\mu$ so that the next Marquardt step is closer to the Gauss-Newton step. If $\varrho$ is small (maybe even negative), then $L(h_M)$ is a poor approximation, and we should increase $\mu$ with the twofold aim of getting closer to the steepest descent direction and reducing the step length. These goals can be met in different ways, e.g. by using the following simple updating strategy:

  if ϱ > 0
    x := x + hM
    μ := μ * max{1/3, 1 − (2ϱ − 1)³};  ν := 2    (3.16)
  else
    μ := μ * ν;  ν := 2ν

The factor $\nu$ is initialized to $\nu = 2$. Note that $x$ is updated only if $\varrho > 0$, i.e. $F(x + h_M) < F(x)$, so the descending condition (2.1) is satisfied, and that a series of consecutive failures results in rapidly increasing $\mu$-values.

Example 3.6. The currently most widely used strategy has the form

  if ϱ > 0.75
    μ := μ/3
  if ϱ < 0.25
    μ := μ*2
  if ϱ > 0
    x := x + hM                                  (3.17)

This strategy was originally proposed by Marquardt (1963), and small changes in the thresholds 0.25 and 0.75 and in the factors 1/3 and 2 can be seen. The two updating formulas are illustrated in Figure 3.1.

[Figure 3.1. Updating of μ by (3.16) with ν = 2 (full line) and by Marquardt's strategy (3.17) (dashed line), as functions of the gain ratio ϱ; the thresholds 0.25 and 0.75 are marked.]

The smoother change of $\mu$ for $\varrho$ in the range $0 < \varrho < 1$ has a beneficial influence on the convergence, see Figures 3.2a-b below. Also, if $\varrho \le 0$ in consecutive steps, then (3.16) needs fewer steps to get $\mu$ sufficiently large. Extensive testing in Nielsen (1999) shows that generally (3.16) is significantly superior to (3.17).

The stopping criteria for the algorithm should reflect that at a global minimizer we have $F'(x^*) = g(x^*) = 0$, so we can use

  $\|g\|_\infty \le \varepsilon_1$,   (3.18a)

where $\varepsilon_1$ is a small, positive number, chosen by the user. Another relevant criterion is to stop if the relative change in $x$ is small,

  $\|x_{new} - x\| \le \varepsilon_2 \|x\|$.   (3.18b)

Finally, to guard against an infinite loop, we need a "safety valve":

  $k \ge k_{max}$.   (3.18c)

Also $\varepsilon_2$ and $k_{max}$ are chosen by the user. The last two criteria come into effect e.g. if $\varepsilon_1$ is chosen so small that effects of rounding errors have large influence. This will typically reveal itself in a poor accordance between the actual gain in $F$ and the gain predicted by the linear model (3.7b), and will result in $\mu$ being augmented in every step. Our strategy for augmenting $\mu$ implies that in this case $\mu$ grows fast, resulting in small $\|h_M\|$, and the process will be stopped by (3.18b).

The algorithm is summarized below. As regards practical implementation, we do not need to store the complete $m \times n$ Jacobian matrix. Suppose e.g. that we compute it one row at a time; then we can build up the matrix $A = J_f(x)^\top J_f(x)$ and vector $g = J_f(x)^\top f(x)$ by using the relations

  $A = \sum_{i=1}^m J_{i,:}^\top J_{i,:}$, $\quad g = \sum_{i=1}^m f_i(x) J_{i,:}^\top$,   (3.19)

where $J_{i,:}$ is the $i$th row in $J_f(x)$, holding the derivatives of $f_i$.

Finally, remember that the Gauss-Newton step $h_{GN}$ minimizes the function $L(h)$, $h_{GN} = \operatorname{argmin}_h \{L(h)\}$. Marquardt has shown that inside a ball of radius $\|h_M\|$ the Marquardt step $h_M$ minimizes $L$:

  $h_M = \operatorname{argmin}_{\|h\| \le \|h_M\|} \{L(h)\}$.   (3.20)

The proof is given in Appendix B.

Algorithm 3.21. Marquardt's Method3)

begin
  k := 0;  ν := 2;  x := x0
  A := Jf(x)ᵀ Jf(x);  g := Jf(x)ᵀ f(x)
  found := (‖g‖∞ ≤ ε1);  μ := τ * max{aii}
  while (not found) and (k < kmax)
    k := k+1;  Solve (A + μI) hM = −g
    if ‖hM‖ ≤ ε2 ‖x‖
      found := true
    else
      xnew := x + hM
      ϱ := (F(x) − F(xnew)) / (L(0) − L(hM))     {cf. (3.15)}
      if ϱ > 0                                   {step acceptable}
        x := xnew
        A := Jf(x)ᵀ Jf(x);  g := Jf(x)ᵀ f(x)
        found := (‖g‖∞ ≤ ε1)
        μ := μ * max{1/3, 1 − (2ϱ − 1)³};  ν := 2
      else
        μ := μ * ν;  ν := 2ν
end

3) The algorithm is sometimes called the Levenberg-Marquardt method.

Example 3.7. Comparing (3.9) and the normal equations (3.5) we see that $h_{GN}$ is simply the least squares solution to the linear problem

  $f(x) + J_f(x) h \simeq 0$.

Similarly, the Marquardt equations (3.13) are the normal equations for the linear problem

  $\begin{bmatrix} f(x) \\ 0 \end{bmatrix} + \begin{bmatrix} J_f(x) \\ \sqrt{\mu}\, I \end{bmatrix} h \simeq 0$.

As mentioned in Example 3.1, the most accurate solution is found via orthogonal transformation. However, the solution $h_M$ is just a step in an iterative process, and needs not be computed very accurately, and since the solution via the normal equations is "cheaper", this method is normally employed.
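As an illustration, here is a compact Matlab sketch of Algorithm 3.21. It is a minimal version, not the authors' reference implementation: fun is assumed to return f(x) and Jf(x) (like fitexp in Example 3.4), and tau, eps1, eps2, kmax correspond to τ, ε1, ε2 and kmax in the text.

  function [x, k] = marquardt(fun, x, tau, eps1, eps2, kmax)
  k = 0;  nu = 2;
  [f, J] = fun(x);  A = J'*J;  g = J'*f;
  mu = tau * max(diag(A));                          % initial damping, (3.14)
  found = (norm(g, inf) <= eps1);
  while ~found && (k < kmax)
    k = k + 1;
    hM = -(A + mu*eye(length(x))) \ g;              % Marquardt step, (3.13)
    if norm(hM) <= eps2 * norm(x)
      found = true;                                 % stopped by (3.18b)
    else
      xnew = x + hM;
      [fnew, Jnew] = fun(xnew);
      rho = (f'*f - fnew'*fnew) / (hM'*(mu*hM - g));  % gain ratio, (3.15)
      if rho > 0                                    % step acceptable
        x = xnew;  f = fnew;  J = Jnew;
        A = J'*J;  g = J'*f;
        found = (norm(g, inf) <= eps1);             % stopped by (3.18a)
        mu = mu * max(1/3, 1 - (2*rho - 1)^3);  nu = 2;
      else
        mu = mu * nu;  nu = 2*nu;                   % (3.16)
      end
    end
  end

For the data fitting problem it could be called as [x, k] = marquardt(@(x) fitexp(x, ty), x0, 1e-3, 1e-8, 1e-8, 200), matching the parameter values used in Example 3.8 below.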

Example 3.8. We have used Algorithm 3.21 on the data fitting problem from Examples 1.1 and 3.4. Figure 1.1 indicates that both $x_1^*$ and $x_2^*$ are negative and that $M(x^*, 0) \simeq 0$. These conditions are satisfied by $x_0 = [-1, -2, 1, -1]^\top$. Further, we used $\tau = 10^{-3}$ in the expression (3.14) for $\mu_0$ and the stopping criteria given by (3.18) with $\varepsilon_1 = \varepsilon_2 = 10^{-8}$, $k_{max} = 200$. The algorithm stopped after 62 iteration steps with $x \simeq [-4, -5, 4, -4]^\top$. The performance is illustrated in Figure 3.2a; note the logarithmic ordinate axis.

[Figure 3.2a. Marquardt's method applied to the fitting problem from Example 1.1: $F(x)$ and $\|g\|$ versus iteration number.]

This problem is not consistent, so we could expect linear final convergence. The last 7 iteration steps indicate a much better (superlinear) convergence. The explanation is that the $f_i''(x)$ are slowly varying functions of $t_i$, and the $f_i(x^*)$ have "random" sign, so that the contributions to the "forgotten term" in (3.12) almost cancel out. Such a situation occurs in many data fitting applications.

For comparison, Figure 3.2b shows the performance with the updating strategy (3.17). From step 20 to step 68 we see that each decrease in $\mu$ is immediately followed by an increase, and the norm of the gradient has a rugged behaviour. This slows down the convergence, but the final stage is as in Figure 3.2a.

[Figure 3.2b. Performance with updating strategy (3.17).]
Example 3.9. Figure 3.3 illustrates the performance of Algorithm 3.21 applied to Powell's problem from Examples 3.2 and 3.5. The starting point is $x_0 = [3, 1]^\top$, $\mu_0$ given by $\tau = 1$ in (3.14), and we use $\varepsilon_1 = \varepsilon_2 = 10^{-15}$, $k_{max} = 100$ in the stopping criteria (3.18).

[Figure 3.3. Marquardt's method applied to Powell's problem: $F(x)$ and $\|g\|$ versus iteration number.]

The iteration seems to stall between steps 22 and 30. This is an effect of the (almost) singular Jacobian matrix. After that there seems to be linear convergence. The iteration is stopped by the "safety valve" at the point $x = [-3.82\mathrm{e}{-08}, -1.38\mathrm{e}{-03}]^\top$. This is a better approximation to $x^* = 0$ than we found in Example 3.2, but still we want to be able to do better; see Examples 3.15 and 3.17.

3.3. A Hybrid Method: Marquardt and Quasi-Newton

In 1988 Madsen presented a hybrid method which combines Marquardt's method (quadratic convergence if $F(x^*) = 0$, linear convergence otherwise) with a quasi-Newton method which gives superlinear convergence, even if $F(x^*) \ne 0$. The iteration starts with a series of steps with the Marquardt method. If the performance indicates that $F(x^*)$ is significantly nonzero, then we switch to the quasi-Newton method for better performance. It may happen that we get an indication that it is better to switch back to Marquardt's method, so there is also a mechanism for that.

The switch from Marquardt's method to the quasi-Newton method is made if

  $\|F'(x)\|_\infty < 0.02 \cdot F(x)$   (3.22)

is satisfied in three consecutive, successful iteration steps. This is interpreted as an indication that we are approaching an $x^*$ with $F'(x^*) = 0$ and $F(x^*)$ significantly nonzero. As discussed in connection with (3.12), this can lead to slow, linear convergence.

The quasi-Newton method is based on having an approximation $B$ to the Hessian matrix $F''(x)$ at the current iterate $x$, and the step $h_{qN}$ is found by solving

  $B h_{qN} = -F'(x)$,   (3.23)

which is an approximation to the Newton equation (2.10a). The approximation $B$ is updated by the BFGS strategy, cf. Section 5.10 in Frandsen et al. (1999): Every $B$ in the series of approximation matrices is symmetric (as any $F''(x)$) and positive definite. This ensures that $h_{qN}$ is "downhill", cf. (2.11). We start with the symmetric, positive definite matrix $B_0 = I$, and the BFGS update consists of a rank 2 matrix to be added to the current $B$. Madsen (1988) uses the following version, advocated by Al-Baali and Fletcher (1985):

  h := xnew − x;  y := Jnewᵀ Jnew h + (Jnew − J)ᵀ f(xnew)
  if hᵀy > 0
    v := Bh;  B := B + (1/(hᵀy)) y yᵀ − (1/(hᵀv)) v vᵀ     (3.24)

with $J = J_f(x)$, $J_{new} = J_f(x_{new})$. As mentioned, the current $B$ is positive definite, and it is changed only if $h^\top y > 0$. In this case it can be shown that also the new $B$ is positive definite.
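A minimal Matlab version of the update (3.24), assuming fnew = f(xnew) and J, Jnew are the Jacobians at x and xnew:

  h = xnew - x;
  y = Jnew'*(Jnew*h) + (Jnew - J)'*fnew;
  if h'*y > 0                               % only then is the new B positive definite
    v = B*h;
    B = B + (y*y')/(h'*y) - (v*v')/(h'*v);  % rank 2 BFGS update, (3.24)
  end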

The quasi-Newton method is not robust in the global stage of the iteration; specifically, it does not check a descending condition like (2.1). At the solution $x^*$ we have $F'(x^*) = 0$, and good final convergence is indicated by rapidly decreasing values of $\|F'(x)\|$. If these norm values do not decrease rapidly enough, then we switch back to the Marquardt method.

The algorithm is summarized below. It calls the auxiliary functions M_Step and Q_Step, implementing the two methods. We have the following remarks:

1. Initialization. $\mu_0$ can be found by (3.14). The stopping criteria are given by (3.18).
2. The dots indicate that we also transfer current values of $f$ and $J_f$ etc., so that we do not have to recompute them for the same $x$.
3. The currently best approximation found by Marquardt's method is saved. If the quasi-Newton method fails, we return to Marquardt's method and start it from $x_{best}$.
4. Notice that both Marquardt and quasi-Newton steps contribute information for the approximation of the Hessian matrix.

Algorithm 3.25. A Hybrid Method

begin
  k := 0;  x := x0;  μ := μ0;  ν := 2;  B := I              {1}
  found := (‖F'(x)‖∞ ≤ ε1);  method := Marquardt
  while (not found) and (k < kmax)
    k := k+1
    case method of
      Marquardt:
        [xnew, found, method, ...] := M_Step(x, ...)        {2}
        if method = Quasi_Newton
          xbest := xnew                                     {3}
      Quasi_Newton:
        [xnew, found, method, ...] := Q_Step(x, B, xbest, ...)  {2}
    Update B by (3.24);  x := xnew                          {4}
end

Function 3.25a. Marquardt Step

[xnew, found, method, ...] := M_Step(x, ...)
begin
  xnew := x;  method := Marquardt
  Solve (Jf(x)ᵀ Jf(x) + μI) hM = −F'(x)
  found := (‖hM‖ ≤ ε2 ‖x‖)
  if not found
    ϱ := (F(x) − F(x+hM)) / (L(0) − L(hM))
    if ϱ > 0
      xnew := x + hM;  found := (‖F'(xnew)‖∞ ≤ ε1)
      μ := μ * max{1/3, 1 − (2ϱ − 1)³};  ν := 2
      if ‖F'(xnew)‖∞ < 0.02 * F(xnew)                       {5}
        count := count + 1
        if count = 3                                        {6}
          method := Quasi_Newton
      else
        count := 0
    else
      μ := μ * ν;  ν := 2ν;  count := 0
end

Function 3.25b. Quasi-Newton Step

[xnew, found, method, ...] := Q_Step(x, B, xbest, ...)
begin
  method := Quasi_Newton;  Solve B hqN = −F'(x)
  found := (‖hqN‖ ≤ ε2 ‖x‖)
  if not found
    xnew := x + hqN;  found := (‖F'(xnew)‖∞ ≤ ε1)
    if (not found) and (‖F'(xnew)‖∞ > 0.99 ‖F'(x)‖∞)        {7}
      method := Marquardt
      if F(xnew) > F(xbest)
        xnew := xbest
end

We have the following remarks on the functions M_Step and Q_Step:

5. Indication that it might be time to switch method. The parameter count is initialized to zero at the start of Algorithm 3.25.
6. (3.22) was satisfied in three consecutive steps, all of which had ϱ > 0, i.e. x was changed.
7. The gradients do not decrease fast enough.

Example 3.10. Notice that in the updating formula (3.24) the computation of $y$ involves the product $J_f(x)^\top f(x_{new})$. This implies that we have to store the previous Jacobian matrix (or to recompute it, if we want to exploit the possibilities discussed in connection with (3.19)). Instead, we could use

  $y = F'(x_{new}) - F'(x) = g_{new} - g$

in the updating formula, but Madsen (1988) found that (3.24) performs better.

Example 3.11. This hybrid method will not outperform Algorithm 3.21 on the problems discussed in Examples 3.8 and 3.9. In the latter case (see Figure 3.3) $F(x) \to 0$, and the switching condition at remark 5 will never be satisfied. In the former case, $F(x^*)$ is significantly nonzero, but, as discussed in Example 3.8, the simple Marquardt method has the desired superlinear final convergence.

To demonstrate the efficiency of Algorithm 3.25 we consider the modified Rosenbrock problem, cf. Example 5.5 in Frandsen et al. (1999), given by $f : \mathbb{R}^2 \to \mathbb{R}^3$,

  $f(x) = \begin{bmatrix} 10(x_2 - x_1^2) \\ 1 - x_1 \\ \lambda \end{bmatrix}$,

where the parameter $\lambda$ can be chosen. The minimizer of $F(x) = \frac{1}{2} f(x)^\top f(x)$ is $x^* = [1, 1]^\top$ with $F(x^*) = \frac{1}{2}\lambda^2$. Below we give results for Algorithms 3.21 and 3.25 for some values of $\lambda$. In all cases we use $x_0 = [-1.2, 1]^\top$, the initial damping parameter $\mu_0$ defined by $\tau = 10^{-3}$ in (3.14), and $(\varepsilon_1, \varepsilon_2, k_{max}) = (10^{-12}, 10^{-12}, 200)$ in the stopping criteria (3.18).

           Algorithm 3.21        Algorithm 3.25
  λ        its   ‖x − x*‖        its   ‖x − x*‖
  0        18    8.84e-15        18    8.84e-15
  1e-5     18    8.84e-15        18    8.84e-15
  1        23    9.12e-09        19    3.61e-13
  1e2      23    1.79e-06        22    1.20e-15
  1e4      22    1.18e-04        22    1.20e-15

In the first two cases $\lambda$ is too small to really influence the iterations, but for the larger $\lambda$-values we see that the hybrid method is much better than the simple Marquardt algorithm, especially with respect to the accuracy obtained.

In Figure 3.4 we illustrate the performance of Algorithms 3.21 and 3.25 in the case $\lambda = 10^4$. With Marquardt's method all steps after no. 14 seem to fail to improve the objective function; $\mu$ increases rapidly, and the stopping criterion (3.18b) is satisfied at step no. 22. With the hybrid method there are several attempts to use the quasi-Newton method, starting at step nos. 5, 11 and 19. The last attempt ends with (3.18a) being satisfied.

[Figure 3.4. Marquardt's method (left: $F(x)$, $\|g\|$) and the hybrid method (right: $F(x)$, $\|g\|$ in the Marquardt and quasi-Newton phases) applied to the modified Rosenbrock problem with $\lambda = 10^4$.]

3.4. A Secant Version of Marquardt's Method

The methods discussed in this booklet assume that the vector function $f$ is differentiable, i.e. the Jacobian matrix

  $J_f(x) = \left[ \frac{\partial f_i}{\partial x_j} \right]$

exists. In many practical optimization problems it happens that we cannot give formulae for the elements in $J_f$, e.g. because $f$ may be given by a "black box". The secant version of Marquardt's method is intended for problems of this type.

The simplest remedy is to replace $J_f(x)$ by a matrix $B$ obtained by numerical differentiation: the $(i,j)$th element is approximated by the finite difference approximation

  $b_{ij} = \frac{f_i(x + \delta e_j) - f_i(x)}{\delta} \simeq \frac{\partial f_i}{\partial x_j}(x)$,   (3.26)

where $e_j$ is the unit vector in the $j$th coordinate direction and $\delta$ is an appropriately small real number. With this strategy each iterate $x$ needs $n+1$ evaluations of $f$, and since $\delta$ is probably much smaller than the distance $\|x - x^*\|$, we do not get much more information on the global behaviour of $f$ than we would get from just evaluating $f(x)$. We want better efficiency.
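A minimal Matlab sketch of the difference approximation (3.26); fun is assumed to return f(x) as a column vector, and delta is the step δ:

  function B = diff_jacobian(fun, x, delta)
  fx = fun(x);  n = length(x);
  B = zeros(length(fx), n);
  for j = 1:n
    ej = zeros(n, 1);  ej(j) = 1;               % unit vector e_j
    B(:,j) = (fun(x + delta*ej) - fx) / delta;  % column j of (3.26)
  end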

Example 3.12. Let $m = n = 1$ and consider one nonlinear equation, $f : \mathbb{R} \to \mathbb{R}$: Find $\hat{x}$ so that $f(\hat{x}) = 0$. For this problem we can write the Newton-Raphson algorithm (3.6) in the form

  $f(x+h) \simeq \ell(h) \equiv f(x) + f'(x) h$;  solve the linear problem $\ell(h) = 0$;  $x_{new} := x + h$.   (3.27)

If we cannot implement $f'(x)$, then we can approximate it by $(f(x+\delta) - f(x))/\delta$ with $\delta$ chosen appropriately small. More generally, we can replace (3.27) by

  $f(x+h) \simeq \lambda(h) \equiv f(x) + bh$ with $b \simeq f'(x)$;  solve the linear problem $\lambda(h) = 0$;  $x_{new} := x + h$.   (3.28a)

Suppose that we already know $x_{prev}$ and $f(x_{prev})$. Then we can fix the factor $b$ (the approximation to $f'(x)$) by requiring that

  $f(x_{prev}) = \lambda(x_{prev} - x)$.   (3.28b)

This gives us $b = (f(x) - f(x_{prev})) / (x - x_{prev})$, and with this choice of $b$ we recognize (3.28) as the secant method; see e.g. p. 29 in Barker and Tingleff (1991). The main advantage of the secant method over an alternative finite difference approximation to Newton-Raphson's method is that we only need one function evaluation per iteration step instead of two. For a more thorough discussion of computational efficiency see p. 30f in Barker and Tingleff (1991).

Now, consider the linear model (3.7a) for $f : \mathbb{R}^n \to \mathbb{R}^m$, $f(x+h) \simeq \ell(h) \equiv f(x) + J_f(x)h$. We will replace it by

  $f(x+h) \simeq \lambda(h) \equiv f(x) + Bh$,

where $B$ is the current approximation to $J_f(x)$. In the next iteration step we need $B_{new}$ so that

  $f(x_{new}+h) \simeq f(x_{new}) + B_{new} h$.

Especially, we want this model to hold with equality for $h = x - x_{new}$, i.e.

  $f(x) = f(x_{new}) + B_{new}(x - x_{new})$.   (3.29a)

This gives us $m$ equations in the $m \cdot n$ unknown elements of $B_{new}$, so we need more conditions. Broyden (1965) suggested to supplement (3.29a) with

  $B_{new} v = B v$ for all $v \perp (x - x_{new})$.   (3.29b)

It is easy to verify that the conditions (3.29a-b) are satisfied by

Broyden's Rank One Update:

  $B_{new} = B + u h^\top$, where $h = x_{new} - x$, $u = \frac{1}{h^\top h} \left( f(x_{new}) - f(x) - Bh \right)$.   (3.30)

Note that condition (3.29a) corresponds to the secant condition (3.28b) in the case $n = 1$. We say that this approach is a generalized secant method. A brief sketch of the central part of Algorithm 3.21 with this modification has the form

  solve (BᵀB + μI) hsM = −Bᵀ f(x)
  xnew := x + hsM
  Update B by (3.30)
  Update μ and x as in (3.16)
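In Matlab the update (3.30) is a one-liner; fx = f(x) and fnew = f(xnew) are assumed already computed:

  h = xnew - x;
  B = B + ((fnew - fx - B*h) / (h'*h)) * h';  % Broyden's rank one update, (3.30)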

Powell has shown that if the set of vectors $x_0, x_1, x_2, \ldots$ converges to $x^*$, and if the sets of steps $\{h_k \equiv x_k - x_{k-1}\}$ satisfy the condition that $\{h_{k-n+1}, \ldots, h_k\}$ are linearly independent (they span the whole of $\mathbb{R}^n$) for each $k \ge n$, then the set of approximations $\{B_k\}$ converges to $J_f(x^*)$, irrespective of the choice of $B_0$.

In practice, however, it often happens that the previous $n$ steps do not span the whole of $\mathbb{R}^n$, and there is a risk that after some iteration steps the current $B$ is such a poor approximation to the true Jacobian matrix that $-B^\top f(x)$ is not even a downhill direction. In that case $x$ will stay unchanged and $\mu$ is increased. The approximation $B$ is changed, but may still be a poor approximation, leading to a further increase in $\mu$, etc. Eventually the process is stopped by $h_{sM}$ being so small that (3.18b) is satisfied, although $x$ may be far from $x^*$.

A number of strategies have been proposed to overcome this problem. A simple idea is occasionally to recompute $B$ by finite differences. This idea is used in Algorithm 3.31 below. We have the following remarks:

1. Again the stopping criteria are given by (3.18), and the input is the starting vector $x_0$ and the scalars $\tau$, $\varepsilon_1$, $\varepsilon_2$, $k_{max}$ and $\delta$ (for use in (3.26)). We also need $K$ and $\nu_{th}$, see remark 3.
2. The expression for $\mu_0$ is identical with (3.14) with $J_f(x_0)$ replaced by $B$.
3. updx = true shows that $x$ has moved since the previous difference approximation was computed. $\nu > \nu_{th}$ might be caused by $-B^\top f(x)$ not being downhill, and if $K$ updates have been performed, it is time for a fresh approximation. The results given in the following example were computed with $\nu_{th} = 16$, $K = \max\{n, 10\}$.
4. Whereas the iterate $x$ is updated only if the descending condition (2.1) is satisfied, the approximation $B$ is normally updated in every step, see remark 5. Therefore the approximate gradient $g$ may change also when $f(x)$ is unchanged.

Algorithm 3.31. Secant Version of Marquardt's Method

begin
  k := 0;  x := x0                                          {1}
  Compute B by (3.26);  μ := τ * max{‖B:,j‖²};  ν := 2      {2}
  updB := 0;  updx := false;  found := false
  while (not found) and (k < kmax)
    if (updx and ν > νth) or updB = K                       {3}
      Compute B by (3.26);  ν := 2;  updB := 0;  updx := false
    g := Bᵀ f(x);  found := (‖g‖∞ ≤ ε1)                     {4}
    if not found
      k := k+1;  Solve (BᵀB + μI) hsM = −g
      if ‖hsM‖ ≤ ε2 ‖x‖
        found := true
      else
        xnew := x + hsM;  dF := F(x) − F(xnew)
        if updx or (dF > 0)                                 {5}
          Update B by (3.30);  updB := updB + 1
        ϱ := dF / (L(0) − L(hsM))                           {6}
        if ϱ > 0
          x := xnew;  updx := true
          μ := μ * max{1/3, 1 − (2ϱ − 1)⁷};  ν := 2         {7}
        else
          μ := μ * ν;  ν := 2ν
end

5. If a new difference approximation does not immediately lead to a lower value of $F$ (because $\mu$ is too small), then keep the current $B$. Otherwise, update the approximation: also a "failing" step contributes information about the local behaviour of $f$.
6. As in (3.15b) we can show that $L(0) - L(h_{sM}) = \frac{1}{2} h_{sM}^\top (\mu h_{sM} - g)$.

7. Compared with (3.16) we have changed the updating formula for $\mu$. As in the Marquardt algorithm, $\varrho$ is a measure of how well the linear model for $f$ approximates the true behaviour. Now, however, it is influenced also by the difference between the current $B$ and the current $J_f(x)$. It is not possible to distinguish between these two error contributions, so we use a more conservative updating of $\mu$.

Example 3.13. We have used Algorithm 3.31 on the modified Rosenbrock problem from Example 3.11 with $\lambda = 0$. If we use the same starting point and stopping criteria as in that example, and take $\delta = 10^{-6}$ in the difference approximation (3.26), we find the solution after 26 iteration steps, involving a total of 33 evaluations of $f(x)$. For comparison, the "true" Marquardt algorithm needs only 18 steps, implying a total of 19 evaluations of $f(x)$ and $J_f(x)$.

We have also used the secant algorithm on the data fitting problem from Examples 1.1, 3.4 and 3.8. With $\delta = 10^{-6}$ and the same starting point and stopping criteria as in Example 3.8 the iteration was stopped by (3.18a) after 104 steps, involving a total of 149 evaluations of $f(x)$. For comparison, Algorithm 3.21 needs 62 iteration steps.

These two problems indicate that Algorithm 3.31 is robust, but they also illustrate a general rule of thumb: if gradient information is available, it normally pays to use it.

In many practical applications the numbers $m$ and $n$ are large, but each of the functions $f_i(x)$ depends only on a few of the elements in $x$. In that case most of the $\frac{\partial f_i}{\partial x_j}(x)$ are zero, and we say that $J_f(x)$ is a sparse matrix. There are efficient methods exploiting sparsity in the solution of the Marquardt equation (3.13), see e.g. Nielsen (1997). In the updating formula (3.30), however, normally all elements in the vectors $h$ and $u$ are nonzero, so that $B_{new}$ will be a dense matrix. It is outside the scope of this booklet to discuss how to cope with this; we refer to Gill et al. (1984) and Toint (1987).

3.5. Powell's Dog Leg Method

As the Marquardt method, this method works with combinations of the Gauss-Newton and the steepest descent directions. Now, however, controlled explicitly via the radius of a trust region, cf. Section 2.4 in Frandsen et al. (1999). Given $f : \mathbb{R}^n \to \mathbb{R}^m$. At the current iterate $x$ we can compute the Gauss-Newton step $h_{GN}$ by solving

  $(J_f(x)^\top J_f(x)) h_{GN} = -J_f(x)^\top f(x)$   (3.32)

and the steepest descent direction

  $h_{sd} = -g = -J_f(x)^\top f(x)$.

The latter is a direction, not a step, and to see how far we should go, we look at the linear model

  $f(x + \alpha h_{sd}) \simeq f(x) + \alpha J_f(x) h_{sd}$,
  $F(x + \alpha h_{sd}) \simeq \frac{1}{2} \|f(x) + \alpha J_f(x) h_{sd}\|^2 = F(x) + \alpha h_{sd}^\top J_f(x)^\top f(x) + \frac{1}{2} \alpha^2 \|J_f(x) h_{sd}\|^2$.

This function of $\alpha$ is minimal for

  $\alpha = -\frac{h_{sd}^\top J_f(x)^\top f(x)}{\|J_f(x) h_{sd}\|^2} = \frac{\|g\|^2}{\|J_f(x) g\|^2}$.   (3.33)

Now we have two candidates for the step to take from the current point $x$: $a = \alpha h_{sd}$ and $b = h_{GN}$. Powell suggested to use the following strategy for choosing the step when the trust region has radius $\Delta$. The last case in the strategy is illustrated in Figure 3.5.

  if ‖hGN‖ ≤ Δ
    hdl := hGN
  elseif ‖α hsd‖ ≥ Δ
    hdl := (Δ/‖hsd‖) hsd                       (3.34a)
  else
    hdl := α hsd + β (hGN − α hsd)
      with β chosen so that ‖hdl‖ = Δ

[Figure 3.5. Trust region and Dog Leg step4): the step $h_{dl}$ goes from $x$ via the end point of $a = \alpha h_{sd}$ towards $b = h_{GN}$.]

With $a$ and $b$ as defined above, and $c = a^\top(b - a)$, we can write

  $\psi(\beta) \equiv \|a + \beta(b-a)\|^2 - \Delta^2 = \|b-a\|^2 \beta^2 + 2c\beta + \|a\|^2 - \Delta^2$.

We seek a root of this second order polynomial, and note that $\psi \to +\infty$ for $\beta \to -\infty$, $\psi(0) = \|a\|^2 - \Delta^2 < 0$ and $\psi(1) = \|h_{GN}\|^2 - \Delta^2 > 0$. Thus, $\psi$ has one negative root and one root in $]0, 1]$. We seek the latter, and the most accurate computation of it is given by

  if c ≤ 0
    β := (−c + √(c² + ‖b−a‖²(Δ² − ‖a‖²))) / ‖b−a‖²
  else
    β := (Δ² − ‖a‖²) / (c + √(c² + ‖b−a‖²(Δ² − ‖a‖²)))     (3.34b)

4) The name Dog Leg is taken from golf: the fairway at a "dog leg hole" has a shape as the line from $x$ (the tee point) via the end point of $a$ to the end point of $h_{dl}$ (the hole). Powell is a keen golfer!
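A minimal Matlab sketch of the step selection (3.34); a = alpha*hsd and b = hGN are assumed already computed, and Delta is the trust region radius:

  if norm(b) <= Delta
    hdl = b;                          % Gauss-Newton step inside the region
  elseif norm(a) >= Delta
    hdl = (Delta/norm(a)) * a;        % steepest descent step scaled to Delta
  else
    c = a'*(b - a);                   % dog leg step, beta from (3.34b)
    d = sqrt(c^2 + (b-a)'*(b-a)*(Delta^2 - a'*a));
    if c <= 0
      beta = (-c + d) / ((b-a)'*(b-a));
    else
      beta = (Delta^2 - a'*a) / (c + d);
    end
    hdl = a + beta*(b - a);
  end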

As in Marquardt's method we introduce the gain ratio

  $\varrho = \frac{F(x) - F(x + h_{dl})}{L(0) - L(h_{dl})}$,

where $L$ is the linear model

  $L(h) = \frac{1}{2} \|f(x) + J_f(x) h\|^2$.

In Marquardt's method we used $\varrho$ to control the size of the damping parameter. Here, we use it to control the radius $\Delta$ of the trust region. A large value of $\varrho$ indicates that the linear model is good: we can increase $\Delta$ and thereby take longer steps, and they will be closer to the Gauss-Newton direction. If $\varrho$ is small (maybe even negative), then we reduce $\Delta$, implying smaller steps, closer to the steepest descent direction. Similar to (3.16) we can use

  if ϱ > 0
    x := x + hdl
    Δ := Δ / max{1/3, 1 − (2ϱ − 1)³};  ν := 2    (3.35)
  else
    Δ := Δ/ν;  ν := 2ν

For general least squares problems the Dog Leg method has the same disadvantage as Marquardt's method: the final convergence can be expected to be linear (and slow) if $F(x^*) \ne 0$. This problem does not arise, however, in the case where we seek a root of $f : \mathbb{R}^n \to \mathbb{R}^n$. Then $J_f(x)$ is a square matrix, which we shall assume to be nonsingular, and (3.32) is equivalent with

  $J_f(x) h_{GN} = -f(x)$,   (3.36)

which we recognize as the system of equations defining the Newton step. In the final stage of the Dog Leg algorithm we can expect $\|h_{GN}\| \le \Delta$, which means that we shall use Newton-Raphson's method, known to have quadratic final convergence, still provided that $J_f(x^*)$ is nonsingular. Now we can formulate

begin

Algorithm 3.37. Dog Leg Method for systems of nonlinear equations


f1 g f2 g

k := 0;
while

:= 2; x := x0 ; := 0 g := Jf (x)> f (x); found := (kf (x)k1 "3 ) or (kgk1 "1 ) (not found) and (k < kmax ) k := k+1; Compute by (3.33)

hsd := ? g; Solve Jf (x)hGN = ?f (x) Compute hdl by (3.34) if khdl k "2 kxk
else

found := true

xnew := x + hdl % := (F (x) ? F (xnew ))=(L(0) ? L(hdl ))


if

else end

x := xnew ; g := Jf (x)> f (x) found := (kf (x)k1 "3 ) or (kgk1 "1 ) := = maxf 1 ; 1 ? (2%?1)3 g; := 2 3
:= = ; := 2 ; found := (

%>0

f3 g

"2 kxk)

f4 g

We have the following remarks.

1° Initialization. Δ0 should be supplied by the user.

2° We use the stopping criteria (3.18) supplemented with ||f(x)||_∞ ≤ ε3, reflecting that f(x*) = 0.

3° Corresponding to the three cases in (3.34a) we can show that

    L(0) - L(h_dl) =
        F(x)                                         if h_dl = h_GN ,
        Δ (2||α g|| - Δ) / (2α)                      if h_dl = -(Δ/||g||) g ,
        (1/2) α (1-β)^2 ||g||^2 + β(2-β) F(x)        otherwise .

4° Extra stopping criterion. If Δ ≤ ε2 ||x||, then (3.18b) will surely be satisfied in the next step.

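For concreteness, here is a compact Python/NumPy sketch of Algorithm 3.37. It reuses the dogleg_step function from above; all names are our own, and for brevity the predicted gain L(0) - L(h_dl) is evaluated directly from the linear model rather than via the case formulas of remark 3°.

```python
def dogleg_solve(f, J, x0, Delta0=1.0, eps1=1e-12, eps2=1e-12, eps3=1e-12, kmax=100):
    """Dog Leg method for solving f(x) = 0, f: R^n -> R^n (Algorithm 3.37)."""
    x = np.asarray(x0, dtype=float)
    Delta, nu, k = Delta0, 2.0, 0
    fx, Jx = f(x), J(x)
    g = Jx.T @ fx
    F = 0.5 * (fx @ fx)
    found = np.max(np.abs(fx)) <= eps3 or np.max(np.abs(g)) <= eps1
    while not found and k < kmax:
        k += 1
        alpha = (g @ g) / np.sum((Jx @ g)**2)        # step length (3.33)
        a = -alpha * g                               # scaled steepest descent step
        b = np.linalg.solve(Jx, -fx)                 # Newton step (3.36)
        h = dogleg_step(a, b, Delta)
        if np.linalg.norm(h) <= eps2 * np.linalg.norm(x):
            found = True
        else:
            x_new = x + h
            f_new = f(x_new)
            F_new = 0.5 * (f_new @ f_new)
            Lh = fx + Jx @ h                         # linear model L(h) = 0.5*||f + J h||^2
            pred = F - 0.5 * (Lh @ Lh)               # predicted gain L(0) - L(h)
            rho = (F - F_new) / pred if pred > 0 else -1.0
            if rho > 0:                              # step accepted
                x, fx, F = x_new, f_new, F_new
                Jx = J(x)
                g = Jx.T @ fx
                found = np.max(np.abs(fx)) <= eps3 or np.max(np.abs(g)) <= eps1
                Delta /= max(1/3, 1 - (2*rho - 1)**3)
                nu = 2.0
            else:                                    # step rejected: shrink trust region
                Delta /= nu
                nu *= 2
                found = Delta <= eps2 * np.linalg.norm(x)
    return x, k
```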
Example 3.14. We have used Algorithm 3.37 on Rosenbrock's function f : IR^2 → IR^2, given by

    f(x) = [ 10(x_2 - x_1^2) ; 1 - x_1 ] ,

cf. Example 3.11. With the starting point x0 = [-1.2, 1]^T, Δ0 = 1 and the stopping criteria given by ε1 = ε2 = ε3 = 10^-12 we found the solution after 13 iteration steps. In Figure 3.6 we illustrate the convergence. In the left hand part we give the iterates and the level curves of F, and in the right hand part we show the trust region radius Δ and the parameter β in (3.34) as functions of the iteration step number. β = 1 in the last three steps indicates that there the algorithm uses the "pure" Newton step h_GN.
Figure 3.6. Dog Leg method applied to Rosenbrock's function. (Left: iterates and level curves of F. Right: Δ and β versus iteration number.)
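Using the sketches above, this experiment can be reproduced in a few lines. This is our own illustrative code; the exact iteration count need not match the booklet's, since the sketch differs from Algorithm 3.37 in small details such as the predicted-gain computation.

```python
def f_rosen(x):
    return np.array([10*(x[1] - x[0]**2), 1 - x[0]])

def J_rosen(x):
    return np.array([[-20*x[0], 10.0],
                     [-1.0,      0.0]])

x, k = dogleg_solve(f_rosen, J_rosen, x0=[-1.2, 1.0], Delta0=1.0)
print(x, k)        # x should be close to the root [1, 1]
```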

Example 3.15. In Figure 3.7 we illustrate the performance of the Dog Leg method applied to Powell's problem from Examples 3.2 and 3.9 with starting point x0 = [3, 1]^T, Δ0 = 1 and the stopping criteria given by ε1 = ε2 = ε3 = 10^-20, kmax = 100. The criterion from remark 2° on Algorithm 3.37 was satisfied after 42 iteration steps, returning x = [3.68·10^-38, -4.72·10^-11]^T, which is quite a good approximation to x* = 0. After the 6th iteration step the algorithm uses "pure" Newton iteration, and Δ is increased in each step. As in Figure 3.3 we see that the convergence is linear (caused by the singular J_f(x*)), but considerably faster than with the Marquardt method.

Figure 3.7. Dog Leg method applied to Powell's problem. (||f(x)|| and ||g(x)|| versus iteration number.)

The above examples indicate that the Dog Leg method is good. It is presently considered the best method for solving systems of nonlinear equations, but its real success is in the solution of such problems when the Jacobian matrix is not available. As discussed in Section 3.4 we can compute B ≈ J_f(x) via Broyden's updating formula (3.30),

    B_new = B + (1/(h^T h)) (y - B h) h^T ,                                (3.38a)

where

    h = x_new - x ,   y = f(x_new) - f(x) .

Broyden (1965) has also given a formula for updating an approximate inverse of the Jacobian matrix, D ≈ J_f(x)^-1. The formula is

    D_new = D + (1/(h^T D y)) (h - D y) (h^T D) ,                          (3.38b)

where h and y are defined in (3.38a).
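Both rank one updates require only matrix-vector work. A minimal sketch (our code; B and D are dense NumPy arrays):

```python
def broyden_update(B, D, h, y):
    """Rank one updates (3.38a) and (3.38b): B approximates J_f, D approximates J_f^{-1}."""
    B_new = B + np.outer(y - B @ h, h) / (h @ h)       # (3.38a)
    Dy = D @ y
    D_new = D + np.outer(h - Dy, h @ D) / (h @ Dy)     # (3.38b)
    return B_new, D_new
```

The pairing is consistent: (3.38b) is the Sherman-Morrison inverse of (3.38a), so if D = B^-1 before the update, then D_new = B_new^-1 afterwards; this is the relation BD = I mentioned below.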


With these matrices the steepest descent direction h_sd and the Gauss-Newton step h_GN of (3.36) are approximated by

    h̃_sd = -B^T f(x)   and   h̃_GN = -D f(x) .                            (3.39)

Algorithm 3.37 is easily modified to use these approximations. The initial B = B0 can be found by the difference approximation (3.26), and D0 computed as B0^-1. It is easy to show that then the current B and D satisfy BD = I. The step parameter α is found by (3.33) with J_f(x) replaced by B, and the predicted gain is still given by the expressions in remark 3° on Algorithm 3.37. As in remark 6° on Algorithm 3.31 we recommend that the updating formula for the trust region radius in (3.35) be changed to the more conservative Δ := Δ / max{1/3, 1 - (2ρ - 1)^7}. Also, we recommend occasional recomputation of B by finite differences, followed by D := B^-1.

Each update with (3.38) "costs" 10n^2 flops 5) and the computation of the two step vectors by (3.39) plus the computation of α by (3.33) costs 6n^2 flops. Thus, each iteration step with the gradient-free version of the Dog Leg method costs about 16n^2 flops plus the evaluation of f(x_new). For comparison, each step with Algorithm 3.37 costs about (2/3)n^3 + 6n^2 flops plus the evaluation of f(x_new) and J_f(x_new); for n = 1000, say, that is roughly 1.6·10^7 flops per step against 6.7·10^8. Thus, for large values of n the gradient-free version is cheaper per step. It should be mentioned, however, that the number of iteration steps is often considerably larger, and if the Jacobian matrix is available, then the gradient version is normally faster.
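In code, the two approximated steps of (3.39) are a matrix-vector product each, with no linear solve needed (our sketch, continuing the notation from broyden_update above):

```python
def approx_steps(B, D, fx):
    """Gradient-free substitutes (3.39) for the steepest descent and Newton steps."""
    g = B.T @ fx            # approximate gradient J_f^T f
    h_sd = -g
    h_gn = -D @ fx          # approximate Newton step, using D ~ J_f^{-1}
    return h_sd, h_gn, g
```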

Example 3.16. We have used the gradient-free Dog Leg method on Rosenbrock's function from Example 3.14 with the same starting values and stopping criteria as given there, and with δ = 10^-6 in the difference approximation (3.26) used to find B0. The solution was found after 24 iteration steps, i.e. about twice as many as needed with the gradient version.

5) One "flop" is a simple arithmetic operation between two floating point numbers.

3.6. Final Remarks

We have discussed a number of algorithms for solving nonlinear least squares problems. All of them appear in any good program library, and implementations can be found via GAMS (Guide to Available Mathematical Software) at the Internet address

    http://gams.nist.gov

The examples in this booklet were computed in Matlab. The programs are available via

    http://www.imm.dtu.dk/~hbn/software.html
Finally, it should be mentioned that sometimes a reformulation of the problem can make it easier to solve. We shall illustrate this claim by examples involving ideas that may be applicable also to your problem.

Example 3.17. In Powell's problem from Examples 3.2, 3.9 and 3.15 the variable x_2 occurs only as x_2^2. We can introduce new variables z = [x_1, x_2^2]^T, and the problem takes the form: Find z* ∈ IR^2 such that f(z*) = 0, where

    f(z) = [ z_1 ; 10 z_1/(z_1 + 0.1) + 2 z_2 ]
    with
    J_f(z) = [ 1 , 0 ; (z_1 + 0.1)^-2 , 2 ] .

This Jacobian matrix is nonsingular for all z. Marquardt's algorithm 3.21 with starting point z0 = [3, 1]^T, τ = 10^-16 and ε1 = ε2 = 10^-15 in the stopping criteria (3.18) stops after 3 steps with z ≈ [-3.77·10^-26, -2.83·10^-24]^T. This is a good approximation to z* = 0.
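The reformulated problem is easy to state in code. Here, as an illustration, we feed it to the Dog Leg sketch from Section 3.5 instead of Marquardt's algorithm (our code throughout; the booklet's own Matlab programs are at the address given above):

```python
def f_powell_z(z):
    return np.array([z[0], 10*z[0]/(z[0] + 0.1) + 2*z[1]])

def J_powell_z(z):
    return np.array([[1.0,               0.0],
                     [(z[0] + 0.1)**-2,  2.0]])

z, k = dogleg_solve(f_powell_z, J_powell_z, x0=[3.0, 1.0])
print(z, k)        # z should be a good approximation to [0, 0]
```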

Example 3.18. The data fitting problem from Examples 1.1, 3.4 and 3.8 can be reformulated to have only two parameters, x_1 and x_2: We can write the model in the form

    M(x, t) = c_1 e^(x_1 t) + c_2 e^(x_2 t) ,

where, for given x, the vector c = c(x) ∈ IR^2 is found as the least squares solution to the linear problem

    E c ≈ y ,

with E = E(x) ∈ IR^(m×2) given by the rows (E)_i,: = [ e^(x_1 t_i) , e^(x_2 t_i) ]. As in Example 1.1 the function f is defined by f_i(x) = y_i - M(x, t_i), leading to

    f(x) = y - E(x) c(x) .

It can be shown that the Jacobian matrix is

    J_f = -( E G + H [c] ) ,

where, for any vector u, we define the diagonal matrix [u] = diag(u), and

    H = [t] E ,   G = (E^T E)^-1 ( [H^T f] - H^T E [c] ) .

Marquardt's algorithm with the same poor starting guess as in Example 3.8, x0 = [-1, -2]^T, τ = 1 and ε1 = ε2 = 10^-8 finds the solution x* ≈ [-4, -5]^T after 9 iteration steps; about 1/7 of the number of steps needed with the 4-parameter model.
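The computational core of this reformulation is that c(x) is an ordinary linear least squares solution, which NumPy provides directly. A sketch of the reduced residual function (our code; ts and ys denote the data abscissas and ordinates):

```python
def residual_2param(x, ts, ys):
    """f(x) = y - E(x) c(x) for the reduced two-parameter model."""
    E = np.column_stack((np.exp(x[0]*ts), np.exp(x[1]*ts)))   # E(x), an m-by-2 matrix
    c, *_ = np.linalg.lstsq(E, ys, rcond=None)                # linear LSQ solution c(x)
    return ys - E @ c
```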

Example 3.19. The final example illustrates a frequent difficulty with least squares problems: Normally the algorithms work best when the problem is scaled so that all the (nonzero) |x*_j| are of the same order of magnitude. Consider the so-called Meyer's problem

    f_i(x) = y_i - x_1 exp( x_2 / (t_i + x_3) ) ,   i = 1, ..., 16 ,

with t_i = 45 + 5i and

     i     y_i        i     y_i        i     y_i
     1    34780       7    11540      12     5147
     2    28610       8     9744      13     4427
     3    23650       9     8261      14     3820
     4    19630      10     7030      15     3307
     5    16370      11     6005      16     2872
     6    13720

The minimizer is x* ≈ [ 5.61·10^-3 , 6.18·10^3 , 3.45·10^2 ]^T with F(x*) ≈ 43.97.

An alternative formulation is

    φ_i(z) = 10^-3 y_i - z_1 exp( 10 z_2 / (u_i + z_3) - 13 ) ,   i = 1, ..., 16 ,

with u_i = 0.45 + 0.05i. The reformulation corresponds to

    z = [ 10^-3 e^13 x_1 , 10^-3 x_2 , 10^-2 x_3 ]^T ,

and the minimizer is z* ≈ [ 2.48 , 6.18 , 3.45 ]^T with Φ(z*) ≈ 4.397·10^-5. If we use Algorithm 3.21 with τ = 1, ε1 = ε2 = 10^-12 and the equivalent starting vectors

    x0 = [ 2·10^-2 , 4·10^3 , 2.5·10^2 ]^T ,   z0 = [ 8.85 , 4 , 2.5 ]^T ,

then the iteration is stopped by (3.18b) after 182 iteration steps with the first formulation, and after 97 steps with the well-scaled reformulation.
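The correspondence between the two formulations can be checked numerically: by construction φ_i(z(x)) = 10^-3 f_i(x) for every x, so Φ = 10^-6 F. A sketch (our code):

```python
ts = 45 + 5*np.arange(1, 17)
us = 0.45 + 0.05*np.arange(1, 17)
ys = np.array([34780, 28610, 23650, 19630, 16370, 13720, 11540, 9744,
               8261, 7030, 6005, 5147, 4427, 3820, 3307, 2872], dtype=float)

def f_meyer(x):                    # original, badly scaled formulation
    return ys - x[0]*np.exp(x[1]/(ts + x[2]))

def phi_meyer(z):                  # rescaled formulation
    return 1e-3*ys - z[0]*np.exp(10*z[1]/(us + z[2]) - 13)

def x_to_z(x):                     # the variable transformation given in the text
    return np.array([1e-3*np.exp(13)*x[0], 1e-3*x[1], 1e-2*x[2]])

x0 = np.array([2e-2, 4e3, 2.5e2])
print(np.allclose(phi_meyer(x_to_z(x0)), 1e-3*f_meyer(x0)))   # True: phi = 1e-3 * f
```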

Appendix

A. Symmetric, Positive Definite Matrices

The matrix A ∈ IR^(n×n) is symmetric if A = A^T, i.e. if a_ij = a_ji for all i, j.

Definition (A.1). The symmetric matrix A ∈ IR^(n×n) is
    positive definite      ⇔  x^T A x > 0 for all x ∈ IR^n, x ≠ 0 ,
    positive semidefinite  ⇔  x^T A x ≥ 0 for all x ∈ IR^n, x ≠ 0 .

Some useful properties of such matrices are listed in Theorem A below. The proof can be found by combining theorems in almost any textbook on linear algebra and on numerical linear algebra. At the end of this appendix we give some practical implications of the theorem.

Now, let J ∈ IR^(m×n) be given, and let A = J^T J. Then A^T = J^T (J^T)^T = A, i.e. A is symmetric. Further, for any nonzero x ∈ IR^n let y = Jx. Then

    x^T A x = x^T J^T J x = y^T y ≥ 0 ,

showing that A is positive semidefinite. If m ≥ n and the columns of J are linearly independent, then x ≠ 0 ⇒ y ≠ 0 and y^T y > 0. Thus, in this case A is positive definite.

Theorem A. Let A ∈ IR^(n×n) be symmetric and let A = LU, where L is a unit lower triangular matrix and U is an upper triangular matrix. Further, let {(λ_j, v_j)}, j = 1, ..., n, denote the eigensolutions of A, i.e.

    A v_j = λ_j v_j ,   j = 1, ..., n .                                    (A.2)

Then
1° The eigenvalues are real, λ_j ∈ IR, and the eigenvectors {v_j} form an orthonormal basis of IR^n.
2° The following statements are equivalent:
    a) A is positive definite (positive semidefinite),
    b) all λ_j > 0 (λ_j ≥ 0),
    c) all u_ii > 0 (u_ii ≥ 0).
If A is positive definite, then
3° The LU-factorization is numerically stable.
4° U = D L^T with D = diag(u_ii).
5° A = C C^T, the Cholesky factorization, where C ∈ IR^(n×n) is lower triangular.

From (A.2) it follows immediately that

    (A + μI) v_j = (λ_j + μ) v_j ,   j = 1, ..., n ,

for any μ ∈ IR. Combining this with 2° in Theorem A we see that if A is symmetric and positive semidefinite and μ > 0, then the matrix A + μI is symmetric and guaranteed to be positive definite.

The condition number of a symmetric matrix A is

    κ_2(A) = max{|λ_j|} / min{|λ_j|} .

If A is positive (semi)definite and μ > 0, then

    κ_2(A + μI) = (max{λ_j} + μ) / (min{λ_j} + μ) ≤ (max{λ_j} + μ) / μ ,

and this is a decreasing function of μ.
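The damping effect on the condition number is easy to observe numerically; the following small sketch (our code) uses a J with nearly linearly dependent columns:

```python
import numpy as np

J = np.array([[1.0, 1.0],
              [1.0, 1.0001]])        # nearly rank-deficient
A = J.T @ J                          # symmetric, positive definite but ill-conditioned
lam = np.linalg.eigvalsh(A)          # real eigenvalues, cf. Theorem A
for mu in (0.0, 1e-8, 1e-4, 1e-2, 1.0):
    print(mu, (lam.max() + mu) / (lam.min() + mu))   # kappa_2(A + mu*I) decreases with mu
```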

Finally, some remarks on Theorem A and practical details: A unit lower triangular matrix L is characterized by l_ii = 1 and l_ij = 0 for j > i. Note that the LU-factorization A = LU is made without pivoting. Also note that points 4°-5° give the following relation between the LU- and the Cholesky-factorization,

    A = L U = L D L^T = C C^T ,

showing that

    C = L D^(1/2) ,   with D^(1/2) = diag( sqrt(u_ii) ) .

The Cholesky factorization can be computed directly (i.e. without the intermediate results L and U) by the following algorithm, which includes a test for positive definiteness.

Algorithm (A.3). Cholesky factorization

    begin
        k := 0;  posdef := true                                     {initialisation}
        while posdef and k < n
            k := k+1;  d := a_kk - Σ_(j=1..k-1) c_kj^2
            if d > 0                                                {test for pos. def.}
                c_kk := sqrt(d)                                     {diagonal element}
                for i := k+1, ..., n                                {subdiagonal elements}
                    c_ik := ( a_ik - Σ_(j=1..k-1) c_ij c_kj ) / c_kk
            else
                posdef := false
    end

The "cost" of this algorithm is about (1/3)n^3 flops. Once C is computed, the system Ax = b can be solved by forward and back substitution in

    C z = b   and   C^T x = z ,

respectively. Each of these steps costs about n^2 flops.
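Algorithm (A.3) and the two triangular solves map directly to code. A sketch (our code; in practice one would call a library routine such as numpy.linalg.cholesky):

```python
def cholesky_posdef(A):
    """Algorithm (A.3): return (C, True) with A = C C^T if A is
    positive definite, otherwise (None, False)."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for k in range(n):
        d = A[k, k] - C[k, :k] @ C[k, :k]
        if d <= 0:
            return None, False                        # A is not positive definite
        C[k, k] = np.sqrt(d)
        for i in range(k+1, n):
            C[i, k] = (A[i, k] - C[i, :k] @ C[k, :k]) / C[k, k]
    return C, True

def solve_chol(C, b):
    """Solve A x = b, given A = C C^T, by forward and back substitution."""
    n = len(b)
    z = np.zeros(n)
    for i in range(n):                                # forward: C z = b
        z[i] = (b[i] - C[i, :i] @ z[:i]) / C[i, i]
    x = np.zeros(n)
    for i in range(n-1, -1, -1):                      # backward: C^T x = z
        x[i] = (z[i] - C[i+1:, i] @ x[i+1:]) / C[i, i]
    return x
```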

B. Proof of (3.20)

We introduce the function

    G(h) = L(h) + (1/2) μ h^T h   with μ > 0 .

The gradient of this is the linear function

    G'(h) = L'(h) + μ h = J_f^T f + (J_f^T J_f + μI) h ,

where J_f = J_f(x), f = f(x), and we have used (3.8) in the reformulation. According to Appendix A the matrix J_f^T J_f + μI is positive definite, so the linear system of equations G'(h) = 0 has a unique solution, and this is the minimizer of G. By comparison with (3.13) we see that this minimizer is h_M. Now, let

    h_m = argmin_{||h|| ≤ ||h_M||} { L(h) } .

Then L(h_m) ≤ L(h_M) and h_m^T h_m ≤ h_M^T h_M, and

    G(h_m) = L(h_m) + (1/2) μ h_m^T h_m ≤ L(h_M) + (1/2) μ h_M^T h_M = G(h_M) .

However, h_M is the unique minimizer of G, so h_m = h_M.

References

1. M. Al-Baali & R. Fletcher (1985): "Variational Methods for Non-Linear Least-Squares". J. Opl. Res. Soc. 36, No. 5, pp 405-421.
2. V.A. Barker & O. Tingleff (1991): "Numerisk Løsning af Ikke-Lineære Ligninger" (in Danish). Hæfte 57, IMM, DTU.
3. C.G. Broyden (1965): "A Class of Methods for Solving Nonlinear Simultaneous Equations". Maths. Comp. 19, pp 577-593.
4. J.E. Dennis, Jr. & R.B. Schnabel (1983): "Numerical Methods for Unconstrained Optimization and Nonlinear Equations". Prentice Hall.
5. P.E. Frandsen, K. Jonasson, H.B. Nielsen & O. Tingleff (1999): "Unconstrained Optimization". IMM, DTU. Available at http://www.imm.dtu.dk/~hbn/publ/H69.ps
6. P.E. Gill, W. Murray, M.A. Saunders & M.H. Wright (1984): "Sparse Matrix Methods in Optimization". SIAM J.S.S.C. 5, pp 562-589.
7. G. Golub & C.F. Van Loan (1989): "Matrix Computations". Johns Hopkins Univ. Press.
8. P. Hegelund, K. Madsen & P.C. Hansen (1991): "Robust C Functions for Non-Linear Optimization". Institute for Numerical Analysis (now part of IMM), DTU. Report NI-91-03.
9. K. Levenberg (1944): "A Method for the Solution of Certain Problems in Least Squares". Quart. Appl. Math. 2, pp 164-168.
10. K. Madsen (1988): "A Combined Gauss-Newton and Quasi-Newton Method for Non-Linear Least Squares". Institute for Numerical Analysis (now part of IMM), DTU. Report NI-88-10.
11. K. Madsen & O. Tingleff (1990): "Robust Subroutines for Non-Linear Optimization". Institute for Numerical Analysis (now part of IMM), DTU. Report NI-90-06.
12. K. Madsen, H.B. Nielsen & O. Tingleff (1999): "Optimization with Constraints". IMM, DTU. Available at http://www.imm.dtu.dk/~hbn/publ/H40.ps.Z
13. D. Marquardt (1963): "An Algorithm for Least Squares Estimation of Nonlinear Parameters". SIAM J. Appl. Math. 11, pp 431-441.
14. H.B. Nielsen (1996): "Numerisk Lineær Algebra" (in Danish). IMM, DTU.
15. H.B. Nielsen (1997): "Direct Methods for Sparse Matrices". IMM, DTU. Available at http://www.imm.dtu.dk/~hbn/publ/Nsparse.ps.Z
16. H.B. Nielsen (1999): "Damping Parameter in Marquardt's Method". IMM, DTU. Report IMM-REP-1999-05. Available at http://www.imm.dtu.dk/~hbn/publ/TR9905.ps.Z
17. M.J.D. Powell (1970): "A Hybrid Method for Non-Linear Equations". In P. Rabinowitz (ed): Numerical Methods for Non-Linear Algebraic Equations, Gordon & Breach, pp 87ff.
18. P.L. Toint (1987): "On Large Scale Nonlinear Least Squares Calculations". SIAM J.S.S.C. 8, pp 416-435.

Index

Al-Baali, 29
Algorithm: Cholesky, 51; Descent Method, 6; Dog Leg, 42; Hybrid Method, 30; Marquardt, 25; secant Marquardt, 37
Barker, 34
BFGS, 28
black box, 33
Broyden, 35, 44
Cholesky, 10, 49
condition number, 50
consistency, 19
convergence: linear, 6, 7, 18, 19, 27, 43; quadratic, 6, 9, 18, 19, 41; superlinear, 6, 18, 26, 28
damped Newton method, 9
damping parameter, 21
data fitting, 2, 19, 26, 46
dense matrix, 38
descending condition, 5, 23, 36
descent direction, 7, 8, 17, 21
descent method, 6
difference approximation, 33, 38, 45
Dog Leg method, 40
eigensolution, 49
exact line search, 12
fairway, 40
final convergence, 7
fitting model, 2, 46
Fletcher, 29
flops, 45, 51
Frandsen et al., 2, 7, 8, 10, 12, 16, 28, 32, 39
full rank, 17, 35, 49
gain ratio, 22, 41
GAMS, 46
Gauss-Newton, 17, 39
generalized secant method, 35
Gill et al., 38
global minimizer, 1, 19
global stage, 5, 29
golf, 40
Golub, 15
gradient, 3, 14, 17
gradient method, 7
Hessian matrix, 3, 8, 14, 17, 28
hybrid method, 8, 32
implementation, 24
indefinite, 4
infinite loop, 24
initial stage, 7
Internet, 46
inverse, 44
Jacobian matrix, 13, 16, 19, 24, 33, 44, 46, 47
least squares problem, 1, 13
level curves, 43
Levenberg, 21, 25
line search, 6, 7, 9, 10, 12
linear convergence, 6, 7, 18, 19, 27, 43
linear independency, 17, 35, 49
linear least squares, 14, 46
linear model, 17, 21, 22, 34, 39
local minimizer, 1, 3, 4, 19
LU-factorization, 49
Madsen, 28, 29
Marquardt, 21, 23
Marquardt equations, 21, 25
Marquardt method, 25, 26, 33, 46
Marquardt step, 24, 30
Matlab, 15, 20
Meyer's problem, 47
necessary condition, 3
Newton method, 8
Newton step, 41
Newton-Raphson, 15, 20, 41
Nielsen, 15, 23, 38
nonlinear system, 15, 20, 27, 41, 46
normal equations, 14, 25
numerical differentiation, 33
orthogonal matrix, 14
orthogonal transformation, 14, 25
orthonormal basis, 49
parameter estimation, 46
positive definite, 4, 8, 9, 17, 21, 28, 49
Powell, 16, 20, 27, 35, 39, 43, 46
quadratic convergence, 6, 9, 18, 19, 41
quasi-Newton, 10, 28, 32
rank one update, 35
reformulation, 46
residual, 2
robust method, 15
Rosenbrock, 32, 38, 43, 45
rounding error, 24
saddle point, 4
safety valve, 24
scaling, 47
secant Dog Leg, 45
secant Marquardt, 37
secant method, 34, 35
semidefinite, 49
singular Jacobian, 16, 27, 44
soft line search, 12
sparse matrix, 38
stationary point, 3
steepest descent, 7, 21, 39
stopping criterion, 23, 26, 32, 36, 42
submatrix, 15
sufficient condition, 4, 8
superlinear convergence, 6, 18, 26, 28
symmetric matrix, 28, 49
Taylor expansion, 3, 6, 8, 13, 17
tee point, 40
Tingleff, 34
Toint, 38
trust region, 39, 43
updating, 22, 23, 27, 35, 38
Van Loan, 15
white noise, 2
