0% found this document useful (0 votes)
1K views104 pages

Algorithms For Minimization Without Derivatives

Uploaded by

correita77
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
1K views104 pages

Algorithms For Minimization Without Derivatives

Uploaded by

correita77
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 104
ALGORITHMS FOR MINIMIZATION WITHOUT DERIVATIVES RICHARD P. BRENT Prentice-Hall Inc. Englewood Cliffs, Now Jersey PRENTICE-HALL SERIES IN AUTOMATIC COMPUTATION George Forsythe, editor LARIMANS AND STEARNS, Algobraie Si HULL, Introduction fo Com) sacony, et Ew ant1n, Programming Real-Time Con Manns, Sostems Analysis for Data Tran arin, Teleco Language usm, editor, Fa SACKMAN AND EITRENRALM, editors, On-Line Plonnits su.toN, editor, The SMART Retrieval System: Exper Processing savower, Programming Laneuages: History ane Fundamentals SHERMAN, T SIMON AND SiELOSSY, editors, Representation and Meaning: Exp STROUD AND sreResT, Gaussian Quadrature "Tsui, editor, The Computer Impact ‘rnaue, Hert vn, Pa CONTENTS (© 1973 by Prentice-Hall, Inc, Engh Allrights reserved. No part of jnany form or by any meas wi Som the publisher. ISBN: 0-13.022335.2 PREFACE xi Library of Congress Catalog Card Number: 78-39843, 1 Printed in the United States of America INTRODUCTION AND SUMMARY 1 w9RT6S 4321 1.1 Introduction 1 1.2 Summary 4 2 SOME USEFUL RESULTS ON TAYLOR SERIES, DIVIDED DIFFERENCES, AND LAGRANGE INTERPOLATION 9 21 Introduction 8 2.2 Notation and definitions 10 23° Truncated Taylor series 11 24 Lagrange interpoletion 12 PRENTICE-HALL INTERNATIONAL, INC. Lor 25 Divided differences 13 Prentice-HALL oF AustRALtA, Pry. Lrp., Syudiey 26 Differentiating the error 15 Prestice-HALL oF Cana, Lro,, Toronto PresTice-HALt oF INDIA PRIVATE LiMiTED, New Det vii PReNtice-HALt oF JAPAN, INC., Toky wit conrenrs 3 THE USE OF SUCCESSIVE INTERPOLATION FOR FINDING SIMPLE ZEROS OF A FUNCTION AND ITS DERIVATIVES 3.1 Introduction 3.2 The definition of order 3.3 Convergence to a zero 4 Superlinear convergence 35 Strict supertinear convergence 3.6 The exact order of convergence 2.7 — Stronger results for q = 1 and 2 8.8 Accelerating convergence 2.9 Some numerical examples 3.10 Summary 4 AN ALGORITHM WITH GUARANTEED CONVERGENCE FOR FINDING A ZERO OF A FUNCTION 41° Introduction 42° The algorithm 43° Convergence properties 44° Practical tests 45° Conclusion 46 ALGOL 60 procedures 5 AN ALGORITHM WITH GUARANTEED CONVERGENCE FOR FINDING A MINIMUM OF A FUNCTION OF ONE VARIABLE 5.1 Introduction 5.2 Fundamental limitations because of rounding errors 5.3 Unimodality and 5-unimodality 5.4 An algorithm analogous to Dekker's algorithm 55 Convergence propertios 5.6 Practical tests 57 Conclusion 68 An ALGOL 60 procedure 19 21 22 24 26 29 34 46 47 48 53 54 56 58 ot 63 65 2 75 76 78 79 19 47 or 6 GLOBAL MINIMIZATION GIVEN AN UPPER BOUND ON THE SECOND DERIVATIVE 61 Introduction 62 The basic theorems 6.3 An algorithm for global minimization 6.4 The rate of convergence in some special cases 65 A lower bound on the number af function evaluations required 66 Practical tests 6.7 Some extensions and generalizations 68 An algorithm for global minimization of a function of several variables 69 Summary and conclusions 6.10 ALGOL 60 procedures 7 A NEW ALGORITHM FOR MINIMIZING A FUNCTION OF SEVERAL VARIABLES WITHOUT CALCULATING DERIVATIVES 7.1 Introduction and survey of the literature 7.2 The effect of rounding errors 7.3 Powell's algorithm 7.4 The main modification 7.5 The resolution ridge problem 7.6 Some further details 7.7 Numerical results and comparison with othor methods 78 Conclusion 7.9 An ALGOL W procedure and test program BIBLIOGRAPHY APPENDIX: FORTRAN subroutines INDEX ar o4 86 97 100 103 105 107 mt 112 116 122 124 128 132 136 137 154 186 ar 116 169 187 193 PREFACE ‘The problem of finding numerical approximations to the zeros and extrema of functions, using hand computation, has a long history. Recent ress has heen made in the development of algorithms suit- , computer. In this book we suggest improvements s, extend the mathematical theory behind them, f Jocal and global ns considered depend entirely atives are required. to some of these algor and descritie some new al 100k appeared as Stanford University Report g zeros and extrema of functions without yw out of print. This expanded version is published fe students and research workers in mputer science, and operations research. ly indebted to Professors G. E. Forsythe and G. H, Golub id encouragement during my stay at Stanford. ‘Thanks are n and to Professors J. G. Herriot, F. W. Dorr, and C. B. Moler, I reading of various drafts and for many helpful sugges. suggested how to find bounds on polynomials (Chapter nd Dr. J, sduced me to Dekker's algorithm (CI rts of Chapter 4 appeared in Brent (971d), and are included in by kind permission of the Editor of Tie Com 1. Thanks go to cs-71 calculating de in the hope tl pasrace Professor F. Dorr and Dr. I, Sobel for their help in tes inders, and Alan George for nd to Phyllis Winkler for her nearly perfect n also grateful for the influence of my teachers V. Grenness, H. Smith, Dr. D, Faulkner, Dr. E, Strzelecki, Professors G. Preston, J. Mille, Z, Janko, R. Floyd, D. Knuth, G. Polya, and M. Schiffer. Deepest thanks go to Erin Brent for her help in obtaining some of the Finally I wish to thank the Commonwealth Scie Research Organization, Austeal Stanford is work is dedicated to Oscar and Naney Brent, who laid the fo ; and to George Forsythe, w fed the construction, and Industrial for its generous support during my stay at tio) INTRODUCTION AND SUMMARY Section 7 INTRODUCTION Consider the problem of finding an approximate zero or minimum of unction of one real variable, using limited-precision arithmetic on a se- tal computer. The function f may not he differentiable, or the derivative” may be difficult to compute, so a method which uses only com- puted values of f is desirable. Since an evaluation of f may be very expen- .puter time, a good method shouid guarantee to find a correct solution, to within some prescribed tolerance, using fons. Henve, we study algc on evaluating f at a small number of poi properties are guaranteed, even in the presence of rounding errors. Slow, safe alg which may occasionally fal ‘matter how well behaved, as it is worst possible case (ignoring. the possi- bility that an exact zero may occasionally be found by chance). As a con: trasting example, consider the method of successive linear interpolation, 1 2 prRODUCHION AN SUMMARY chap. § I approxi sulficiently good to ensure convergence, or if the effect of rounding errors mportant n Chapter 4 we describe so fast that a less rel cly (0 be preferred on grounds of speed, is described in Chapter 5. This algorithi isfy one of our requirements: in certain ap} where repeated one- 'S are required, and where accuracy is not very less reliable) method is preferable. One such 1a of functions of several variables without ves, is discussed in Chapter 7, (Note that wherever we gorithms for es find, at best a, there is no 1 ie oF lowest) interest in minimum, For a which is of 4 serious practical disadvantage of most ipter 5 is no excep- sand, perhaps, 8 of the minimization procedure, in the hope found is the global minimum. This approach as the same local minimum may be found several times. I for, no matter how many starting points are tried. 1¢ global mi has been found, n Chapter 6 we discuss the problem of finding the global to within a prescribed tolerance. this proble} 4@ priori informa to be minimized is known. We describe an efficient algori if an upper bound on f” is known, and we show how this algor nber of variables, the recursive method is practical ns of less than four variables. For functions of more varie Set wnrmonuction 3 ables, we * method, information abou lab. , we are led to consider practical methods for finding local strained) minima of functions of several variables. As before, we consider methods which depend on evaluating the function at a small number of points. Unfortunately, without imposing very strict conditions on the func tions (© be minimization algorithm produces results which are correct to within some prescribed tolerance, or that into account. We have to be s sive correct results for the functions likely to arise in pr As suggested by the length of our bi ide can hardly expect to find a good method which is completely unrelated to the known ones. In Chapter 7 we take one of the better methods which does not use derivatives, that of Powell (1964), and modify it (o try to overcome some of the difficulties observed in the literature. Numerical tests suggest iF proposed method is faster than Powell’s original method, and just as reliable. It also compares quite well with a different method proposed by Stewart (1967), at least for functions of less than ten variables. (We have few numerical results for non-quadratic functions of ten or more variables.) ALGOL implementations of all the above algorithms are given, Most, e with ALGOL W (Wirth and Hoare (1966)) on IBM 360/67 puters. As ALGOL W is not widely used, we give ALGOL 60, procedures (Naur (1963)), except for the n-dimensional minimization algo- FORTRAN subroutines for the one-dimensional zero-finding and imization algor Appendix. To recapitulate, we describe algorithms, and give ALGOL procedures, for solving the following problems efficiently, mn (not deriva- tio 1. Finding a zero of a function of one variable if an interval in which the Function changes sign is given 2. Finding a local minimum of a function of one variable, defined on a given interval inding, to within a prescribed tolerance, the global minimum of @ function of one or more variables, given upper bounds on the second der 4, Finding a For the first three algorithms, rigorous bounds on the error and the number of function evaluations required are established, taking the effect of rounding inction of several variables, 4 wrsoouctION ano suimany cap. errors into accou first two algorit differences, are also of ing the order of converge on interpolation and di of the “d Section 2 suMMARY In this section wi ize the main results of the following chapters. Amore d fe places in each chapter. This summary is intended to serve as a guide to the reader who is interested in some of our results, thas been made to keep exch cl a reader, an attempt Chapter 2 In Chapter 2 we collect some res n, and divided differences. Most of these results are needed in Chapter xd the casual reader might prefer to skip Chapter 2 and refer back to it when necessary. Some of the re lar to classical ones, but wstead of assuming that f has 7+ 1 continuous derivatives, we onl s Lipschitz contin (@)in the classic and 2.5.1 are of this nature. Since @ Lipschitz cont iable almost everywhere, these rosulls are not surprising, # hey have not been found in the (erature, except where references are given. (Sometimes Lipschitz conditions are imposed on the derivatives of functions of several vat example, Armijo (1966) and McCormick (1969),) The proofs are mostly to those for the classical results. Theorem 2.6.1 is a slight generalization of some results of Ralston (1963, 1965) on differentiating the error in Lagrange interpolation. It is included oth for its independent interest, and because it may be used to prove a ightly weaker form of Lemma 3.6.1 for the important case q = 2. (A proof along these lines is sketched in Kowalik and Osborne (1968).) teresting result of Chapter 2 is Theorem 2.6.2, which gives an of the error in Lagrange interpolation at the points of interpol that the conclusion of Theorem 2.6.2 holds iff has + I continuous derivatives, but Theorem 2.6.2 shows that itis sufficient for f to have n continuous derivatives, ‘Theorem 2.5.1, es an expansion of divided differences, may tee regarded as a general in Chapter 3: for example, see Theorem 3.4.1 and Ler An expression for the derivati See2 summary 6 js useful for the analysis of interpolation processes whenever the coef- ficients of the interpolation polynomials can conveniently be expressed in terms of divided differences, Chapter 3 In Chapter 3 we prove some theorems which provide ibed in Cl converge superline: I be. For these results the effect of rounding, 0 is mainly interested in the practical appl 3.10), zeros (used in Chapter 4), turning points (used in Chapter 5), can be giv. process for finding a zero { of '"-", for any fixed g > 1. Successive ear interpolation and successive pa jon ave just the sp ceases q = Tand g = 2. Ani rest i inflexion points. As the proofs for general g are essential ‘given for general g are Theorem 3.4.1, which gives conditions under which convergence is super linear, and Theorem 3.5.1, which shows when the order js at least 1.618 (for g¢ =1) oF 1.324... (for g = 2). These numbers are well our assumptions about the differentiability of fare weaker than those of thors, e.g. Ostrowski (1966) and Jarratt (1967, 1968) matical point of view, the most interesting result of Chap- ter 3is Theorem 3.7.1. The result forg = | isgiven in Ostrowski (1966), except weaker assumption about the smoothness of /- For q = 2, order at least 1,378... is possib } v0, appears to be new. Jarratt (1967) and Kowal even iff) Osborne 98) assume that bier) 7 Sat ee and then, from Lemma 3.61, the order of convergence How ever, even for such a simple functi f@) (2.2) there are starting points xy, x,, and x,, arbitrarily close tof, such that (2.1) lorder is atleast 1.378... We should point out that this exceptional case is unlikely to occur: that the set of starting points for w) ing conjecture is re 2er0. & INTRODUCTION AND suMAtARy chap 1 ‘The practical conclusion to be drawn from Theorem 3,7 it convergence js to be accelerated, then the result of Lemma 3.6.1 should be preference to a result like equation (3.2.1), In Section 3.8 we gi ne of the many ways in which this may be done. Finally, some numerical examples, ill wlerated and unaccelerated processes, are given in Section 3.9. Chapter 4 In Chapter 4 we describe an algorithm for finding a zero of a fun, h changes sign in a giv ithm is based on a combi n of successive linear interpolation and bisection, in much the same way as “Dekker’s algorithm” (van Wijagaarden, Zonneveld, and Dijkstra (1963); 967); Peters and Wilkinson (1969); and Dekker (1969)). Our algorithm never converges much more slowly than reas Dekker’s n may converge extremely slowly in certain cases. (Examples are tien in Section 42) : : eae It is well known that bisection is the optimal algo sense, for fi 's which change sign in an interval, (We ial algorithms: sce Robbins (1952), Wilde (1964), and for both our algorithm and Dekker’s is n is nat oy if the class of allowable fu stricted, For example, itis not o ex functions (Bi Dreyfus (1962), Gross and Johnson (1959), or for C! fun Both our algorithm and Dekker’s ex! zero of a C* function, for eventui Chapter 6 Igorithm for finding a local minimum of a function of one variable is described in Chapter 5. The algorit ines golden section search (Bellman (1957), Kiefer (1953), Wilde (1964), Witzgall (1969)) and successive interpolation are com zero-finding 1m of Chapter 4, Convergence in a reasonable number of func' raranteed (Section 5.5). For a C? function with posi um, the results of Chapter 3 show that convergence is supetlinear, provided that the minimum is at an interi iven in the literature either fi their order of convergence In Sections 5.2 and 5.3 we consider the effect of roun 1c accuracy of any algorithm ions, and this the ALGOL mited-precision section should be studied by the reader wh procedure given in Section 5.8 If f is unimodal, then our algorithm will find the unique provided there are no rounding errors, To study the effect of rounding err ictions. A unimodal function is d-unimodal jon toa unimodal function is not . the size of 5 dependis and on the precision of computation, (5 0 as the pr increases indefinitely.) We prove some theorems about d-unimodal functions, ‘and give an upper bound on the error i ich found by our a ». In this way we can justify the use of our algorithm in the presence of rounding errors, and account for their effect. Our motiva- tion is rather si (1968) in developing the e-calculus, but we are not concerned with properties that hold as ¢ -» 0. The reader who is not interested in the effect of rounding errors can skip Section 5.3. Chapter 6 In Chapter 6 we consider the problem of finding an approximation to the global minimum of @ function f; defined on a finite interval, if some «priori information about f is given. This interesting probl (0 have received much attention, although there have been some empirical investigations (Magee (1960)). In Section 6.1, we show why some a pric information is necessary, and discuss some of the possi der of the chapter we suppose that the information is an upper bound on f" ‘An algorithm for global minimization of a function of one vai applicable when an upper bound on f” is known, is des 6.3. The basic idea of this on a polynomial in a given interval. We pay particular attention to the prob- lem of giving guaranteed bounds in the presence of rounding errors, and the reader may find the details in the last half of Section 6.3 rather in- He. In Section 6.4, we try to obtain some i algorithm by considering some tractable special cases. Then, in Sections 6.5 and 6.6, we show that no algorithm which uses only function evaluations and an upper bound on * could be much faster than our algorithm, f9 the behavior of our 2 mrropucriON 4no sumuany SOME USEFUL RESULTS ON TAYLOR SERIES, DIVIDED DIFFERENCES, AND LAGRANGE INTERPOLATION Section 1 INTRODUCTION are new, but they t where references are given ugh they are given mainly for their independent terest. Pethaps the most interesting rest jeorem 6.2, which shows is is well known if f tim 6.2 shows that it is sufi for f to have Section 2 NOTATION AND DEFINITIONS nite, closed. interval, a nonnegative integer, ber in, I). Definitions The modulus of continuity w(f; 5) of f in (a, B) is defined, for 6-0, by WL sup, 1) — £0) Qn If f has a continuous n-th derivative on [a, b, then we write f= C*fa, b addition, /° © Lipy a ie, wif; 5) <= Md™ @2) for all 6 >0, then we write © LC*{a, 6; M, a], (This notation is not standard, but it is convenient if we want to mention the constants M and ly) Ife LC*a, b; M, I] then we write simply f= LC*a, b; My sr++2%, are distinct points in [a,b], then IP(fs x5... %,) is ‘the Lagrange interpolating polynomial, i.e., the unique polynomial of degree n or less which +%y The divided difference Fl¥ «5 ,]is defined by L(x) lf fas 3] 3) (There are ma 2 see, for example, Milne (1949), Milne- ‘Thomson (1933), and Traub (1964).) Note that, although we suppose for implicity that x,,...,.x, are distinct, neatly all the results given here and in Chapter 3 hold if some of x,.....x, coincide. (We then have Hermite interpolation and confluent divided differences: see Traub (1964).) For the statement of these results, the word “distinct” is enclosed in parentheses. S08 TRUNCATED TAYLOR Semles 11 Neviton’s identities represent 1. fhe) = sl) flee. e4) 2. WP = IPUS; Xe 110) = POs) + (FPO = =) -eesta sh — 2S) and P(x) = flee] + Oe — xp) F 1X9. X41) + + — x) + Xe ¥d- 26 Section 3 TRUNCATED TAYLOR SERIES In this section we give some fo! is needed in Chapter 6, and applies condition. LEMMA 3. Suppose fe C0, 8) For some & M such that, for a < (0, 4), Fy) = FO) < My. G1) Then, for all x © [0, 6}. fo = SFO 62 where sma) 1, nh Oe! : 65) the bound obtained from the classical result (8) = FSP prngyy 4 CA png, G.10) for some & between x and y, is not sharp. Section 4 LAGRANGE INTERPOLATION The following lemma, used in Chapter 6, gives a one-sided bound on the error in Lagrange interpolation if 7 satisfies a one-sided Lipschitz condition. Thus, it is similar to Lemma 3.1. The corresponding two-sided result follows from Theorem 3 of Baker (1970), but the proof given hete is simpler, and similar to the usual proof of the classical result that, if fe CvMa, bl, then mx) = se E(a)), for some E(x) € [a,b]. (See, for example, Isaacson and Keller (1966). pg. 190.) LEMMA 4. Suppose that f = C*la,d]: x,...,.¥,are (distinct) points in [a, 6]; P= IPF. X58): and, for all x,» © [a,b] with x > y, fix) — Then, for all x e fa, bl, 70) = M2) + (Lex) the (42) YS M@— y). aD se 5 DimoeD oIFFeRENCES 13 where lo) O and x # x, for any r= 0, ...,n, for otherwise the result i trivial. Let wad = [Lo — xd, 4) and write Six) = P(x) + w)SQ). (4.5) Regarding x as fixed, define F(2) = fiz) — P(e) — w&z)Six) 4.6) for z © (4,8) Thus Fe Cela,b), and F(z) vanishes at the m+ 2 distinct points Xp +X Applying Rolle’s theorem 7 times shows that there are two distinet points &,€, © (a6) such that FG) = FPG.) = 0. an Differentiating (46) m times gives ) = f(2) — (n+ 1)! Stay + el), as) where e(x) is independent of z. Thus, from (4.7), 1p) = f"E,)| OF me oi and the result follows from condition (4.1). «9 SQ) = Section 5 DIVIDED DIFFERENCES Lemma $.1 and Theorem 5.1 are needed in Chapter 3. The first part ‘of Lemma 5.1 follows immediately from Lemma 4.1 and the identity (2.5) (we state the two-sided result for variety). The second part is well known, and follows similarly. Theorem $.1 is more interesting, and most of the results of Chapter 3 depend on it, It may be regarded as a generalization of ‘Taylor's theorem, which is the special case n = 0. LEMMA 6.1 Suppose that f= LC*la,b; M] and that x, points in fa, 6), Then are (distinct) 6.) 14 Pretinawany wesours 6 Furthermore, if f © C*" a, 6 for some € © [a, b} ad Suppose that k,n =O; fe C*[a, bl: a <0; 6 0; and x... 5%, nts in fa, b). Then R (G4) oyl Fe RFC) — LO) (53) for some...» in the interval spanned by x; and 0 cono.iaay 5.3 Mf, in Theorem 5.1, (5.6) then 6n Proof of Theorem 6.1 The result for k = 0 is immediate from the second part of Lemma 5.1, 0 suppose that k > 0. Take points y,,...,., which ate distinct, and dis. FEO Nays Xe Then S18, Pal PR Tnsas etd flys, dh (5.8) = Zeya se, 69) by the identity (24). See 6 DIFFERENTIATING THE ERROR 18 We may suppose, by induction on k, that the theorem holds if is replaced by & — 1 and n by n + 1. Use 6%), it aS Ygys.. 4 ¥, tend to 0, By Lemma 5.1, f{%o --- 3s) tends to f‘*'(0)/n!, so the result follows. (Strictly to show the existence of the points &,..,,« we must add to the inductive hypothesis the resolt that 7 "is a continuous fun Byers Mad ry S.1 is immediate, once we note that there are exactly 2) term inthe sum (5.3). Section 6 DIFFERENTIATING THE ERROR ‘The two theorems in this section are concerned with differentiating the error term for Lagrange interpolation. These theorems ate not needed later, but are included for their independent interest, and also because they may be used to give alternative proofs of some of the results of Chapter 3 see Kowalik and Osborne (1968), pp. 18-20. ‘Theorem 6.1 is given by Ralston (1963, 1965) iff © C*""[a, 8). We state the result under the slightly weaker assump b; M] for some M: the only difference in the conclusion is that Ralston’s term ‘fe*MUH(a)) 58 replaced by m(x), where |m(x)| 1; fe LC*a, B; M]s X55 «++ %4- ate (distinct) points in [a bls (3) = (& — ) 0-2 (© Aes P= IPS Koy es) and ‘FG) — P(x) + Rx). Then there are functions &: fa, 6] [a0] and ‘m: [a, B]-> [=A M] such that 1 continuous function of x & [a, bl (although g(x) is not continuous); 2, m(x) is continuous on [a,b except possibly at x. 3. forall x e (a,b) (a) = SEDER (61) and i = PCED), wom 62) 76 Precimwany aesutts ohp2 and 4. fxs x, for =, anh then HImeeay = POY 63) THEOREM 6.2 Suppose that 21; fe Cla, Bs xy... 6X5 are istinet) points in [a, BY; w(x) = (x — x4) ++ — x) P= IPS Xp ‘ F(x) = P(x) + RCs). Then there is a’ function €: [a,5] > [a,b] such F(ECO) is 9 continuous funetion of x € fa, ]; for all x € [a, wf"), R(x) = and, for r= 0,-0.2 1, RQ) = oo fetad) (65) Before proving Theorem 6.2, we need some lemmas. Note the si between Lemma 6.2 and Theorem 6.1 larity LEMMA 6.1 Suppose that » 1; f.¢ C'{a, 6]; xy,...4, are distinet points in P= IP(f, x, ede A= max |//: (6.6) and omg, 6 for all x © fa, 6}, 16) = P09) + (Thor x3} ste), 68) where sel 24 (69) Proot =x, forsomer=0,..., by the identity (2.3). we can, ke S{x) = 0, Otherwise, SO) fle, pH (6.10) Write x,,, for x, and reorder x9. 2.5 %y)1 50 thi are oy 5 Xhayy then ‘ = max i the reordered points 26 (6.11) S06 DIFFERENTIATING THE ERROR 17 From (6.10) and the identity (2.4), Tass cere xd = fix so, by Lemma 5.1, (6.12) (6.13) for some & and 2 in [a, bl. In view of (6.6) and (6.11), the result follows. LEMMA 6.2 Suppose that = 2; fe Cola, Bl: Xo. «are distint points eA may | 9%) 58 max |i — ys Py = IPL oy (x —x,)5 and f(x) =P) + R(x). Then there is 6] (a, 6] such that, for all x & (a, b), G(x) is a con- tinuous function of 3 ex) = MODES, 14) |rer- eo FCECO)| 2 (6.15) and, if x # x, forr = 0,..., aaa 4 pm = 24 |Aroeon |= (616) Proof : Let x, be a point in [a 8), distinct from x and yy. tye For k= orm + 1, define Pye IPF Xp +2 Me) (6.17) and WAX) = (= Xp) 0+ OF — ey). (6.18) By the classical result corresponding to Lemma 4.1, there is a function & such that (6.14) holds, Suppose, until further notice, that x # x, for re 0,...,2, Then, from (6.14) and the identity PQ) = 19) we have LUGO _ FO) _ 5 __ fe 620) mW) 28 Xe) Since the right side of (6.20) is cor side, and inuously differentiable at x, so is the left redo = £(L9) + 3 Ae (621) ds\iv,0x) plea 3 18° Pacuiviwany nesuLrs Define Six, x.) by Since THE USE OF SUCCESSIVE INTERPOLATION FOR FINDING SIMPLE ZEROS OF A FUNCTION AND ITS DERIVATIVES Section 7 INTRODUCTION 20. SUCCESSIVE INTERPOLATION chan 3 unnecessary: all that is required is that x,,,,, i8 a (not necessarily unique) solution of (1). ‘The cases of most practical interest are q ~ 1, 2, and 3. For g= the successive interpolation process reduces Co the familiar method of succes- sive lineat interpolation for finding a zero of f, and some of our results are well known, (See Collatz (1964), Householder (1970), Ortega and Rheinboldt (1970), Ostrowski (1966), Schriider (1870), and Traub (1964, 1967).) For 4g =2, we have a process of successive parabolic interpol a tuming point; for g = 3, a process for finding an inflexion point. These two cases are discussed separately by Jarratt (1967, 1968), who assumes that fis analytic near £. By using (1.3) and Theorem 2.5.1, we show that nuch milder assumptions on the smoothness of f suffice (Theorems 4.1, , and 7.1). Also, most of our results hold for any g>. 1, and the proofs are no more dificult than those for the special eases g = 2 and q = 3. Some simplifying assumptions Practical algorithms for finding zeros and extrema, using the results of this chapter, are discussed in Chapters 4 and 5. Until then we ignore the problem of rounding errors, and usually suppose that the initial approxi- mations x, , are sufficiently good. For the sake of simplicity, we assume that any q + I consecutive points Nyy os ++ Xqug Fe distinct, This is always true in the applications described in Chapters 4 and 5, Thus, P, is just the Lagrange interpolating polynom and the results of Chapter 2 are applicable. As in Chapter 2, the assumption of distinct points is not necessary, and the same results hold without this assumption if P, is the appropriate Hermite interpola A preview of the results ‘The definition of “order of convergence” is discussed in Section 2, and in Section 3 we show that, if sequence (x) satisfies (1.1) and converges to then £2-"@) = 0 (Theorem In Sections 4 to 7, we consider the rate of convergence to a simple zero {of s*-, making increasingly stronger assumptions about the smoothness of f° For practical applications, the most import is probably orem 4.1, which shows that convergence starting values are sufficiently good, As in similar results for Newton's method (Collatz (1968), Kantorovich and Akilov (1959), Ortega (1968), Ortega and Rheinboldt (1970), ete), it is possible to say precisely what “sufficiently good” means. Theorem 5.1 is an easy consequence of Theorem 4.1, and gives 1 lower bound on the order of convergence if /'® is Lipschitz cont ‘The question of when the order of convergence is equal (0 t bound given by Theorem 5.1 is the subject of Sections 6 and 7. Although {ThE pEFMuTION OF ORDER 34 the results are interesting in practical problems it is merely a pleasant converges faster than expected! Thus, the reader interested mainly in prac- Lical applications may prefer to skip Sections 6 and 7 (and also Theorem 3.1), except for Lemma 6.1 In Section 8, we consider the interesting problem of accelerating the rate of convergence. Theorem §.1 shows how this may be done, We make use of Lemma 6.1, which gives a recurrence relation for the error approximations to ¢, and is a generalization of results of Ostrow: and Jarratt (1967, 1968) Finally, in Section 9 the theoretical results are il numerical examples, and a brief summary of the m: in Section 10. The reader may find it worthwhile to glance at this su occasionally in order to see the pattern of the results, they are ‘ i (1966) strated by some Section 2 THE DEFINITION OF ORDER Suppose that lim x, = ¢. There are many reasonable definitions of the “order of convergence” of the sequence (x,). For example, we could say that the order of convergence is g if one or more of (2.1) to (24) holds: en en ex 24) inf (—log|x, — £ "= p. ‘These conditions are in decreasing order of strength, ie. (2.1) > (2.2) > (2.3) = 2.4), and none of them are equivalent, (2.1) is used by Ostrowski (1966), Jarratt (1967), and Traub (1964, 1967), while (2.2) is used by Wall (1956), Tornheim (1964), and Jarratt (1968), Voigt (1971) and Ortega and Rheinboldt (1970) give some more possibilities. For example, we may take the supremum of p such that the limit K in (2.1) exists and is zero, or the infimum of p such that X is infinite, For our purposes it is convenient to use 2.1) and (2.4), s0 we make the following definitions. DEFINITION 21 We say x,—{ with strong order p and asymptotic constant K if x, a8. & and (2.1) holds, We say x, -> { with weak order p if x,—> ¢ a8 n— co and (24) holds If-x,=C forall sufficiently large m then we say that x,» with weak order oo. 22 SUCCESSIVE IVTERPOLATION ome. DEFINITION 2.2 Let c= lim sup}, = 6 23) We say x, € sublinearly if x,» 5 and ¢= 1, We say x,» linearly if OSe< 1. We say x, supertinearly if c= 0. We say x, strictly supertincarly Cowith weak order p> 1 Examples _ Ip to clarify the definitions, 1f ind x = exp(—p'[| + o(l)]ast —> co, then x, ~> O with strong order ic constant 1. If @ > 1 and x, —exp(—oJ2 + ‘weak order @, but not rong order, for the limit nite if p > a. Thus, weak order p, in (2.1) ot (2.4) exists, and x, >, then p > 1. If the limit en K <1. (K | forlinear convergence, ear convergence.) inear, superlinear, and strictly superlinear Section 3 CONVERGENCE TO A ZERO In this section we show converges, then it must converge to & zero of Je Cra, b. First, we need a lemma which gives a relation between the point y+ Beas tewma 33 Se ’ are (distinct) points in [a, (Sess Saeed) PBs aed JB Gl Proot By the identity (2.2.6), PX) = fx) 4+ O = xf dh 2) S03 CONVERGENCE TOA ZERO 23, =@- oir exper d(E Os = 9) Re 63) ‘Thus, the result follows from (1 ‘THEOREM 3.1 Suppose that f= C™ifa, defined in (a, 8]; and that there exists lim x, == €. Then ; that a sequence (x,) satisfying (1-1) is )=0. Proof Suppose, by way of contradiction, that Fr e0. G4) (2.24) shows that Seed = FB. ‘Thus, from Lemma 3.1, Base — Nata = 68) where ; on Both divided differences in (3.7) tend to CN — I) as n—> oo, so there is no loss of generality in assuming that the denominator f [xq +++ + %ery-i) is nonzero for all (on the assumption (3.4)), and we have lim p,,, = 0. G38) Summing (2.6) over r= 0,...,q~ Land rearranging terms gives FE leper — Hesest) = Miliase — aed 69) where eae ett By —Se— G10) Sy, By (3.8), there is no loss of generality in assuming that the denominator in (3.10) is nonzero for all n > 0. From (3.6), with r=q ~ 1, and (3.9), we have Gl) Xyeaen — Xase = Haare — Sse we 24 SUCCESSIVE INTERPOLATION chap 3 where = Bate (3.12) The repeated apy By ~ Br GB) and, by (38), (3.10) and (3.12), 2, ~» 0.88 > €9, so the right side of (3.13) tends to zero as n~> co, This contradicts the assumption that ,_, # 80 (3.4) is false, and the proof is complete. (If we do not wish to assume th any q ++ | consecutive points x... Xy,, are distinet, then we may arg as follows: on the assumption (3.4), the right side of (3.1) is nonzero for all sulfciently large 1, and thus at least two consecutive points from x... =, are distinct. Taking these two points in place of x,_. and x,, we gel a Section 4 SUPERLINEAR CONVERGENCE Iff/has one more continuous derivative than required in Theorem 3.1, then Theorem 4.1 shows that convergence to a simple zero of fis super jon 2.2, provided the starting values are suffi equation (4.1), w is the modulus of continuity: see S gence to a multiple zero of jot usually superlinear, even if g = 1 (Section 4.2}, and Theorem 3.1 above is the only theorem in this chapter for we do not need to assume that the zero is simple. Thus, there is no ters 4 and 5 will con- THEOREM 4.1 Suppose that fe CHa, MM: ¢ © (a. 0 %, are (distinct) points in [a, BY: by = max Ix, —& E) = 05 1f — by, 6 + do] S [@, A]; and : Swiss b.) <1 FO) aD Then a seque (x,) is uniquely defined by (1.1), and x, ~» ¢ superlinearly if, for n> 0, = max [x1 = Cl 42 and as then the sequence (5,) is monotonie decreasing and Seuaet Saber 4) Son 4 SUPERLINEAR CONVERGENCE 25 Proof Without loss of generality, assume that [= 0. Let 5, and 2, be as in the statemer Corollary 65) where (4.6) it an Js (as) where 49) (4.10) (Note that the assumption (4.1) ensures that fis, 50) From Lemma 3.1 (with x, and x, interchanged), (3), and (4.8), (Seis. GP =(E9) 4° aun where R= RS sO + Rae ry (4.12) From (4.6), (4.7), nd (4.10), equation (412) gives |Ral (413) so, from (43) and (47), (4.14) Now, from (4.11), we have \ 15) By the assumption (4.1), 4g < defined, 3, = 6° < by, 4, = Ay, at [xee11 Ady (4.16) 5 in [4,5], 8, and 2, are wel 26 SUCCESSIVE mTERFOLATION chan. In the same way, we see that 6, > 3, > 6, > and, for = 0, {Seeees| S AGous (17) Thus, the inequality (4.4) holds, and superlincarly. From (4.4) and the above, Spgs Saby + ox-inds BB, a. and Ay <1 by assumption (4.1), 50.3, —> Oas > oo. Thus, by the conti of f° and the definition (4.3), 2, —> 0.48 > o». Take any ¢ > 0. For all sufficiently large 7, ee, 19) only remains to show that x, — 0 50, from (4.4), lim sup 6" < &. (4.20) As is arbitrarily small, this shows that im x," lim 83" =. (at) ‘Thus, x, + = 0 superlinearly, and the proof is complete. Romarks ‘The proof of Theor ‘greater than the second-largest of |x, ~ [ase the sequence (|x, — £)) is monotonic decreasing, except perhaps for the first term, In fact, the proof shows that, for g = 1 and » > m 4.1 shows tha for n>, ea El <3, —20an—o (provided x, # ©). This isa common definition of “superlinear convergence,” stronger than our Definition 2.2. If g > 2, the sequence £) need not be eventually monotonic decreasing. Monotonicity would follow from strong superlinear convergence with order greater than 1, but more conditions are necessary to ensure this sort of convergence: see Sections 6 and 7. Section 5 STRICT SUPERLINEAR CONVERGENCE Assuming slightly more than Theorem 4.1, Theorem 5.1 shows that con- vergence 10 simple zero of f'*-"’ is strictly superlinear (Definition 2.2). Before stating the theorem, we define some constants f,,. and 7,,, which are needed here and in Sections 6 and 7, see STRICT SUPERLINEAR CONVERGENCE 27 For g = Land a > 0, let the roots of Ga) Then the con- Since the case & = I often occurs, we write si for 74, Ply By FOE By a 75 Remarks Ban is the unique positive real root of ($.1), and it is easy to see that, for 0 0, exactly q— 2 roots of ay, 62) when a= 1. If = land waxtl 6 is are omitted, forall cases of practical interest are covered by Table 5. ses i, and p, to 12 decimal plac for q = 1, ..., 10. The table was computed by finding all roots of (5.3) wit the program of Jenkins (1969), and the entries are the correctly rounded values of f and y, if Jenkins’s a posteriori error bounds are correct. TABLE 5.1. Tue constants Baud Y, for q — 1()10" 4 Be ’ 1618033988750 D41sos3988750 2 1324717987215 0.868836961833 3 1220744084606 1,063336938821 4 1167303978261 109900031516 3 1134724138402 1099174913506 6 1112775688279 1,091953305766, 7 1.996981587299 1,083743696285 8 1.95070245491 1.076133134083 9 1.075766066087 1969448852721 0 068297188921 1.063666938408 ‘See Definition 51 andthe remarks ebove for » deterption of ‘he constants etd 28 SUCCESSIVE HTERPOLATION Chan 9 THEOREM 5.1 Suppose that f= LC+{a, b; M, a} (see Section 2.2); C= (a, b): F*-"C)=0; and £1015) #0. If xy... are (distinct and) sufficiently close to £. then sequence (x,) is 'y defined by (1.1), and x,» with weak order at least fi,.. the positive real root of xe*!) — x 4 6, Remark then, ftom Soa, 3M65 < G4) fied, then an upper bound on |x, — ¢| follows from equation (5.10) below. Proof of Theorem 6.1 For n> BL MAX [xy = 85) Suppose that x,,..., x, are so clase to £ that the conditions mentioned in the remark above are satisfied. Then Theorem 4.1 shows that (5,) is mono. tonic decreasing to zero, and 6.6) then the result follows immediately: by our defini tion, x, > ¢ with weak order co, Hence, suppose that J, + O for all n >: 0 Let (67) (not the same 2, as in Theorem 4.1). From condi fe fact that (6,) is monotonic decreasing, 0 < 4, <4, 1, we have 1 dy Bie (39) for m = 0,. .. .q. Thus, from (5.8) and the definition of £4, thei log f1> “086, 2, Bis + E tog Since 4y > O and f,. > 1, equation (5.10) shows that lim inf (—log |x, — CD!" = By which completes the proof, Sen IME EXACT ORDER OF COMERGENGE 29 Note that, in the important cuse a = 1, there is a simple proof of Theorem 5.1 which does not depend on Theorems 2.3.1 and 4.1, This proof shows that, instead of (5.4), the condition 3MB, <2) FC) (6.12) is sulicient, The idea is this: by applying Rolle’s Theorem q — 1 times, we see that Pie © $158, and |g, —¢/ <0, the secor ‘+g ~ Ch Thus, from Lemma 2.4.1, (5.13) On the other hand, equati (S14) so we can bound |.x,,4., —¢|, and then the result follows in much the same way as above, Section 6 THE EXACT ORDER OF CONVERGENCE Theorem 5.1 gives conditions under whict least ly ICis natural to ask ifthe order is exactly f. In general this is true, but some conditions are necessary to ensure that the rate of convergence is not too fast: for example, the successive linear interpolation process (q~ 1) converges to a simple zero £ with weak order at least 2(-> , — 1.618 iF it happens that /") = 0, for the more aecu- beexpected, Theorem 6.1 gives for the order to be exactly i. Apart from the condition /' necessary to impose some ci extra conditions are Before proving x, > { with weak order at 7 = 1: sce Section 7.) We need two lemmas, Lemma 6.2 is con- cerned with the solution of a certain differ jon, and is closely related {0 Theorem 12.1 of Ostrowski (1966). The lemma could easily be generalized, but we only need the result stated, Lemma 6.1 gives @ recurrence relation for the error x, — ¢, Special cases of this lemma have been given by Ostrowski (1966) and Jarratt (1967, 1968). Ostrowski essentially gives the case q = 1, and Jarratt gives weaker results for g = 2 and ¢ = 3. (Our bound on the remainder R, is sharper than Jarratt’s, and we do not assume that f is analytic.) Tn Section 8, we show how the result of Lemma 6.1 may be used to accelerate convergence of the sequence (x). 30 SUCCESSIVE ivrEnPOLATION chan 9 Lemma 6. Suppose that fe CHa, b}; C= aA fC) = 0; FO) #05 Nays +++ ¥aoqat@ (distinet) points in [a,b]; and x, 94, satisfies equation (1-1). Let 6, be the largest of |x, — ¢ Then 1d 6, the secon largest. Gm gE Bag Co — 1 nnp D+ RG) R, = Of6, (6.2) as 5,0, Proof Without loss of generality, ass t ne that m=0 and ¢ = 0. Rearrang if necessary, so From Lemma 3, (63) “(3 an) 2 nh Fa ig I! 64) where 64), (65) _ 40) O60¢ Te = Oy 66) and 62) asd, 0. ‘The right side of (6.4) is just : 68) where 6) a8 , 0, so the result follows. See6 THE EXACT ORDER OF CONVERGENCE a From the bounds on +e itis easy to derive an explicit bound on | R, |for sulci 4, For our purposes, though, the relation (6.2) is adequate. A simple corollary of (6.2) is that, if f'**" = Lipy,a, then R, = 015)" 5) (6.10) 15 5,0, Lemna 6.2 Suppose that 2, co as m—> oo and, for n > 0, Bavast ~ Aan be = hin G1 where k, = O18 (6.12) asm» oo, saconstant. Ify, << B, then A, = eB} + Os") (6.13) as n—> 20, and if k, = of98) a8 n> co then = By + fe) (6.14) a8 n> 00. Os < py then > B+ OMY) (6.15) 8 n> 60, where fo itq=, eh ifg>1, Ca and eis a nonnegative constant. sary, for we can choose any A wi and consider 2/2" instead of 2, in Ostrowski's proof, Thus, in view of the remarks after Defini- (6.13) and (6.15) follow from Ostrowski's Theorem 12.1. (6.14) does w Sime way, but the proof of Ostrowski’s Theorem and s Lemma 12.1, but this follows from the Toeplitz Jemma: if k, > 0, |E| <1, and 2, =k, + kyu ++ yg then z,—+ 0 as m—> co (See Ortega and Rheinboldt (1970), pg. 399), THEOREM 6.1 Let fe C¥ifa, df < (@, 85% ‘Suppose that |x, — ¢| is sufficiently sms 32 SUCCESSIVE miTERPOLATION coop 3 fori=1,2,...49, and that 2 $1 = 61 Koy — Ole, — 0) (6.18) a Eee BaF) steed iely defined by (1.1), and x, —»¢ with weak order 8, In fact. ifg = 1 or 2then x, —» ¢ with strong order f, and asymp- totic constant |i"! and if g > 3 then log lx, —€| = Bj + Otny) (6.20) 1%, approach sufficiently fast, € (6.18) makes sure they do not approach & too fast. These conditions lary 7.1 P= By, but 2.1) does not necessarily hold, for y, = Lif g > 3 Proof of Theorem 6.1 Let ¥,= |KO, ~ Oh 21) 17) and (6.18) we have, at least for a = 0, Sen 6.25 From the assumption: show that (6.22) and (6.23) hold for all w = 0. Suppase, as indi < m. Then, by taking] x, — ¢ | suffice ‘ay suppose that the remainder &, of Lemme hypothes (6.24) See THE EXACT ORDER OF COWVERGENCE 30 From (6.23) » (6.26) Simi 3 Yasert 2 Fn Pm Ro)- 6.2 Also, fron 1 > 0, so the right side of (6.28) is p (6.26) and (6.28), we see that (6.22) and (6.23) hold for — 15 (6.25) and (6.27) hold for 4, = —log and a From (6.25) and (6,27) Ik, < log 2, (63, 6.2 with s = 1. fg > 3 then A= cB + Olay) €> 0, so the res so we may apply Lemi a8 > s9, From Theorem Iq =! of 2 then 7, < A= fi + 00) (6.33) 5 n+ co, From (6.29), (6.30), (6.33) and Lemma 6.1, we now see k= a(t) (6.34) as 1 c0, so, by equation (6.14) with s = I, fis + afl) (6.35) asm» oo, Thus, there exists so the result follows from equation (6.21). Note that, if f°? © Lipy a for any Mand a > 0, then (6.34) may be replaced by k, = of5*) for any s > 0, so (6.15) holds, and [KPO + 0098 24 SUCCESSIVE mTERPOLATION chao. Section 7 STRONGER RESULTS FOR q~ 1 AND 2 In section we restrit our attention to the two cases of greatest practical interest: g = I (successive linear interpolation) and g = 2 (sueces- sive parabolic interpolation for finding an extreme point). Corollary 7.1 shows that the conditions (6.17) and (6.18) of Theorem 6.1 are unnecessary f= coRoLLARY 73 Suppose that q = 1; f¢ C¥fa, bl: £ € (@, 8); /@) = 0; f"@) # Oj and £7) #0. Ixy, x, and { are distinct and sufficiently close together, then a sequence (x,) is uniquely defined by (J.1), and x,» with strong order By = AL-+ JS) and asymptotic constant | VAP GIP? as a» vo. Proof From Lemina 6.1, sao = FAB lay — Dien, — O10 + oft a) as max (/xy-- 6 sequence (x,), where x, 6) +0. Thus, Theorem 6.1 is applicable to the Remarks that fe Cla, Corollary 7.1 are satisfied, then 45 1 > oo, As we remarked at the end of the proof of Theorem 6.1, the rela- tion (7.2) holds provided that f= LC%Ia, b: M, al for some M and a (see ‘equation (6.37)). For an even weaker condition, see (7.7) and (7.8) below. ‘The following theorem removes the rather artificial restrictions (6.17) and (6.18) of Theorem 6.1, if f°" is Lipschitz-continuous and q = | ot 2 The proof does not extend to g > 3 because it depends on the assumption that y, <1, which is only true forg = 1 and q = 2 (see Table 5.1). THEOREM 7.1 Suppose thatg = 1 or 2:f'= LC and f°) #0. If.x,,...,x,are (ist sequence (x,) is uniquely defined by ( 1, x, + with strong order f, and asymptotic constant fa, bs ANNE = Bs f°) =O; et and) sufficiently close to ¢, then and either e0.7 STRONGER RESULTS FOR q=1.AND 2 a5 "sin fact f O(n) 13) leant OwrnD os) a6 n—+0o (fecal that 2, ~ 1.618, , = 1.3 7, 0869); or 2. x,» with weak order at least 2 if g = 1, or at least [G4 V5)? = 1378 ifg=2. 7) 0.618, and Romarks. Ifq = then, by Cor S10) =0 (orif one of x, a ary 7.1 cae2 of Theorem 7.1 is posible only if cides with en tho weak one is). If q=2 then case 2 is possible, although unlikely, even if (2) + 0 and x, 4 ¢ for all n. that is necessary is that the terms in relation (7.28) repeatedly nearly cancel out. Jarratt (1967) and Kowalik and Osborne (1968) fsvume that sich cancelation wll eventally de out, so the order wil be Bis. The conditions (6.17) and (6.18) are sufficient for this to be true, bu ott some a ions there i remote poss ontnte indefinitely. For example, wih 7) = 20? t-theve ae staring values Nee andy soc hat mre] and os ran > —e0Dt=2), J $0 x,» = Owith weak order /T. Similarly, if y= 404/53) =2618..., (75) that Zanes ~ eG — DY. 6) and Xyqea ~ —exP(—(9 = Dy 30 x, 0 with weak order 7! = 1.378... The proof is omitted, but the reader may easily verify that (7.4) and (7.6) are compatible with’ Lemma 73 below (this depends on the relation 2y ~ 1 = y(9 — 1). wwe have not stated Theorem 7.1 in the sharpest possible form. If f°") =0, then x, > £ with weak order atleast By > Be provided thai f"*"”” = Lipy-a for some M and a > 0.18 2°") 2 0, then the theorem holds provided that fe C' no longer hold, bu if there is an € > O such (f'4*05 5) = Olllog 5 For the sake of simj 48 successive wrenrovarion chap. OU . ao (oe | 78) O17) . a5 n— co. (A condition like (7.7) ovcurs in some variants of Jackson's theorem: see Meinardus (1967).) Before proving Theorem 7.1, we need three rather technical lemmas. as 5» 0, then ee Lemma 7 Suppose that, for n> 0, = Beet + Mperhaee + aMaa tm, 55 Be, (79) where 6, is the largest of|x,| |x... [and |x,..|, and d; is the second largest. Ie there is a positive constant L such that I Te Shel 31,1291 Lh and Im) 0. for all n > 0, then |x,} > 3 Proof As in the proof of Theorem 6.1, it follows by induction on » that 2 é a 9/2 Hearne |S 1231 xpal aa for all n > 0 Lemma 72 Ifthe conditions of Lemma 7.1 are satisfied, then either x, = 0 for all sufficiently large 1, or O1ng3) a8 n=» co Proot fx, # O for infinitely many » then, by Lemma 7.1, x, % Ofor all > 0. I this is s0, define 2, = —tog|x,| and k, = 2,.,—a,,, — dy From equa- jon (7.11): &, is bounded, so Lemma 6.2 with s— | gives 2, — oft + O(1) asm — oo, By Lemma 7.1, 2, —> + 09, 80 ¢ > 0, Thus, from (7.9), k, = Ofexpl—e(B: — 18") @.12) a8 n> co, (This is not necessarily true in the proof of Theorem 6.1.) Nov, seer STRONGER RESULTS FOR g=1 AND.2 a7 Lemma 6.2 with s < y, gives 4, = Bs + Ow7) 13) as n> c9, and the result follows from the definition of 2, LEMMA 7.3 Suppose that (7.9) (depending on £) such id (7.10) hold. There are constants K and N for some n > N, =x, (7.14) and 7.15) then Huss = Mga (l $04, (7.16) Kang = ML 12) + Xe Xeeall + 05,0), An (L$ ta) + XX teeall + 05,05 (7.18) and Byes = ABM 6.) + aethmeaall tye (7.19) where Wal (720) fori=t,.. Proot The lemma follows by repeated use of the recurrence relation (7.9) and the inequalities (7.10), (7.14), and (7.15). Proof of Theorem 7.1 Without loss of generality assume that £ = 0. First suppose that ¢ — If /*(0) # 0 then the theorem holds, by Corollary 7.1. If f"(0) = 0 then, by Lemma 6.1, 2 = 0(638,) a2 as 6,» 0, where 6, and 6, are as in Lemma 6.1. If x, and 2 ate suificiently smal, equation (7:31) implies that (7.22) and (7.23) 38 SUCCESSIVE mTERPOLATION cme. for all n > 1, Thus x, ->0 as n> 22, and Al xy] < exp(—2) (7.25) n= 1, From (7.24), equation (7.25) holds for all m > 0, by induction on 1. Thus, the weak order of convergence is at least 2, and the proof forq = lis n now 01 —+ eo. IF f40) = 0 then the weak order of root of x? = x +2, bya proof Ii olds 28 fiy.,= 1.52... > 1.38 0) = 0, then we may as well suppose that vergence is at least ys. that above for g =I, 1 7.27 ro aan ige of scale, as in the proof of Theorem 6.1, Thus, we must study the bya interesting recurrence rel $x eSsrx + OS, (7.28) and, by Theorem 5.1, we can assume that x, 0 with weak order at least 1 (729) and (7.30) 2 Mand 3st) and Mgeal < 4mlxadial (7.32) in the sequence N. To avoid confusion with subscripts, write = n+ 2orn +3). Ifa =n, is sufficiently large, and (7.29) 2lx,xe1] (7.33) 00.7 STRONGER RESULTS FOR q=1 AND 2 39 and, by Lemma 7.3, sx (7.34) 1F (7.31) and (7.32) hold then, similarly, Ix) < xyes! (7.35) and SA4lxxb| (7.36) Let v= Axl a) = n, in N, suppose that the (7.31), and then the next s > 1 satisfy (7.29). Then repeater ‘equalities (7.33) to (7.36) gives nts of N satisfy use of the in- MALT y9 5 where a9 = 2 Lec whe, 8) = gtr, sy" (7.40) For fixed «> 1, yin.) i a decreasing fumetion of r, with limit os BES)” ink wer. aan as r—r os. Thus, x,» 0 with weak order at least c, so case 2 of the theorem holds. Now suppose that there linear convergence of (x, Choose such 10 such infinite sequence WV. By the super- na 7.3 is apy n (sufficiently large). There are only three possibi 1, Equation (7.30) holds; 2, Equation (7.32) holds; or 3. Neither (7.30) nor (7.32) holds, so [Xye2] > 21x tyes (7.42) In the first case, Lemma 7.3 shows that we can replace w by n + 2, and con- 1ue with one of the three cases (itis crucial to note that Lemma 7.3 is sti applicable). In the second case, Lemma 7.3 shows that we can replace m by m+ 3 and continue. Since no infinite sequence N with the above proper- ties exists the third case must eventually arise. Then, from (7.42) and Lemma 7.3, we see that Lemma 7.2 is applicable to the sequence (x,,), where x, Npsnto: By Lemma 7.2, (x,) converges with strong order f and asymptotic constant 1, and hence so does (x,). In view of the assumption (7.21), this completes the proof, 4 successive INTERPOLATION cass Section 8 ACCELERATING CONVERGENCE Ifa very accurate solution is required, and high-precision evaluations of fare expensive, then it may be worthwhile to try to increase the order of convergence of the successive approximations by some acceleration techniq For example, we can use Lemma 6.1 to improve the current approximation at each step of the iterative process. Jarratt (1967) suggests one way of doing this if ¢ = 2, but the method which we are about to describe seems easier to (sce Theorem 8.1), and applies for any q = 1 Suppose that x95-.., yy are approximations to a simple zero ¢ of f°”. For example, they could be the last q -} 2 approximations generated by the successive interpolation process discussed above. We may define ¥yea» Yoayy--- in the following way: if m > 1 and xy. <. X,,, are already defined, let P, = IP(Ps 5). +-5 Xyyq)s and choose y, such that %y,) = 0. 8.1) Le, », is just the next approxima process, Fr generated by our usual interpol n Lemma 3.1, y, is given explic = GB — Feeney). Instead of taking y, as the next approximation x,,,.,, we use Lemma 6.1 to compute a correction 10 yy, and take the corrected value as the next ap- proximation. Formally, we defi by (8.2) 3) theorem shows that, under suitable conditions, the sequence (x, is welldefined, and x,—>{ with weak order appreciably greater than Be. which i the usual order of convergence of the unacoclerated process (cee Sections Sto 7). Note that there is very litte extra work involved in comp, equations (8.3) and (8.4) if y, is computed via (8.2), for fx, eres Xe) and fl, seq] Will already be known, except atthe frst iteration Before stating Theorem 8.1, we define some constants ff, which take the place of the constants fi, (Definition 5.1) if the accelerated process is used. DEFINITION 8.1 Forg > 1, is the positive real root of MSM ped 8.5) Seo 8 ACCELERATING CONVERGENCE 41 Romarks Itis easy to see we have B, > B, and, corresponding to the bound (5.2), hen, by the definition of order (see If x, with weak order B > 1 Section 2), for any € > 0 we eventual ive tolerance (log 8, Vlog f, suggests how accelerated process, rather than ocelerated 3s, if very high accuracy is required. From the bounds (5.2) and (8.6), 108 Bo — top, 2 = 0.6309 (8.8) eit log By '% : : there is 37 percent saving for only practical interest < of By Be and (log By) es for fae correctly rounded « & Boe B08 1 0.7897 2 07387 3 0.7093 4 0.6936 5 0.46832 6 0757 7 t os702 8 L1SOIS957I868 od 9 1116575138368 0.6623, 10 1105367322949 0.4595 *Sce Definiion £1, and the remarks above, for a desert . constant By and the signflcnnoe af the et to 12 decimal places, and the other ent Table 5.1 for the f, to 12 places.) The table are given to four places. (See jegests that is true, for x* — x? — x ~ 1 = (x8 — x — 1? + 1 THEOREM 8:1 Suppose that f¢ LE a, db; MIs € (a © = 05 7D # 0; and eas ae ( BIIEX) oy close to £, then a sequence (x,) is uniquely defined by and x, C wi 1 are suiicient ns (8.2) t0 (8.4), weak order at least fy (Definition 8.1) as n> 20, 42° SUCCESSIVE INTERPOLATION tap. Proof Forn> second-larg¢ 1et 6, be the largest of isa — Sf ltd, be the 6% Ify, is defined by equation (8.2), then Lemma 6.1 shows that CHK DC OG, OF OG) as 6, +0, where =f & ae r@ at In particular, (8.10) implies that »— $= 065) (6.12) a5 3,0. Thus, for0<7 {as n — oo. From equation (8.16), there is 2 positive constant A such that, for all nel, ai — 61 A°8,5,6), 17) Also, if 8, is sufficiently small, then —logtA |x, — CD > BY G18) for n= 0,...,g + 1. From equation (8.17) and the definition of, we see that (8.18) holds for all n > 0, by induction on m. Thus Jim inf —log|x, —£)"* = Be 6.19) ic., the weak order of convergence is at least f, $0 the proof is complete. see.9 SOME NumtEaIcAL EXAMPLES 43 Section 9 SOME NUMERICAL EXAMPLES the f GLA Xb ay = GH 2A = BF Oe FAN Oe y= 2, G3, fl) = 1+ 40x + 100 Set xy OS, xy = 0.25; and Ag = 4 fle) = 1+ 2x + 40x? +f Sxt 4 2x3 Xz = OS, xy = 0.25, xg = 0.125 In all these examples ¢ = 0, and the iterative process defined by (1.1) converges, even though the initial values are not yery close to {. Apart from ses the sequences (x,) p n process, for the functions and starting \ear convergence, the entries are given until |x, | < would seldom be required in pra le also gives the sequences (x;) produced by the accelerated interpolati process described in Sect “for 1 = 0, 4+ |, As predicted by Theorem 8.1 and Table 8.1, lerated sequences converge appreciably faster than the unaccelerated ones. To verify relations (8.12) and (8.16), the table gives Od and ne (0.2) when they are defined. With a few exceptions near the begit ()x,D and (1x, ing of some of ic decreasing, sor, and r’, wwe expect that 03) is just 2/[g(q + 1)] for our examples. 8.1, we expect that from the proof of pon Tas tg and this is just —6/{g(q + 1)(q + 2)]. The results support these predictions, . (9.4) TABLE 9.1. Numerical results for g = 1, 2,3 and 4 9.3636 03473, 06851 0.3523, 09568 0.9949 0.9998 1.0000 1.0000 #0000 1325-8 1712 035s 03430 0.3360 0339 03334 03333 4 8 02874 0.2755 07178 ass = 10056 o039 0.0588 0.1253, 0.0757 oni ‘0970 0.0921 ore 1.0847 —0.1055 0.0989 See 10 summany 45 TABLE 9.1 (continued) 4 ” Xe “ » n 4 0.1050, o.1046, .4040 0.1022 2367-23 0.1005 Table 9.1 was computed on an IBM 360/91 computer, with 14-digit truncated floating-point arithmetic to base 16, When computing the divided differences in equations (8.2) and (8.3), we took advantage of the fact that vided differences of 1, x,x%,..., x"! vanish identically. Otherwise itis not possible to reduce | x, |or|>,|to 10-* without using higher precision arithmetic, because of the effect of rounding errors, except When g = 1 For g = 2 our example is the same as that used by Jarratt (1967), and our results agree with his for m <9. For n= 10 and 11 our results differ slightly, presumably because of rounding errors. The example given by Jarratt (1968) for q = 3 has also been verified. Section 10 summary ‘The main results of this chapter for g = 1 (successive linear interpola~ tion for finding a zero) and g = 2 (successive parabolic interpolation for finding a turning point) are summarized on p. 46. SUCCESSIVE INTERPOLATION AN ALGORITHM WITH GUARANTEED CONVERGENCE FOR FINDING A ZERO OF A FUNCTION ed sequence Section 1 INTRODUCTION fed sequence a7 48 FINDING A ZERO chap Dekker's algorithm The algorithm described here is similar to one, which we call Dekker’s algorithm for short, variants of which have been given by van Wijngearden, Zonneveld, and Dijkstra (1963); Wilkinson (1967); Peters and Wilkinson (1969); and Dekker (1969), We wish to emphasize that, although these vari- ants of Dekker’s algorithm have proved satisfactory in most practical cases, intees convergence in less than about (b — a)/6 f ‘ions. An example for which this bound. (On the other hand, our algorithm must converge function evaluations (sce Section 3). Typical values are b—a=1 and 5 = 10", giving 10 and 1600 function evaluations respe point of view is that 1600 is a reasonable number, but 101? puter program which attempts to evaluate a funy ime. ‘behaved functions, e.g., polynomials of moderate degree with arated zeros, both our algorithm and Dekker’s are mucl bisection. Our algorithm is at least as fast as Dekker’s, and often slightly er (see Section 4), so the only price to pay for the improvement in the guaranteed rate of convergence is a slight increase in the complexity of algorithm. Section 2 THE ALGORITHM The algorithm is defined precis by the ALGOL 60 procedure zero given in Section 6, Here we describe the algorithm, but the ALGOL proce- dure should be referred to for points of detail. For the moti ‘be d both our algorithm and Dekker’s algorithm, see Dekker (1969) or Wilkinson (1967) Ata typical step we have three points a, 6 and c such that f(b) (0) <0, | #()| =| f€)|,and a may coincide with c. The points a, 6, and ¢ change dur- ing the algorithm, but there should be no confusion if we omit subscripts. bb is the best approxi ion so far to {, a is the previous value of b, and { between b and c. a=) If f(b) = O then we are finished. The ALGOL procedure giv: (1969) does not recognize this case, and can take a large number of s1 steps iff vanishes on an interval, which may happen because of underflow, This oce x on an TBM 360 computer. Hf(6) # 0, let mt — Mle ~ B), We prefer not to return with € most by Dekker IL =jo+0) [<6 Go the error is ween 6 and ©), and id the possibi id numbers p and g such that Dawa by Des 6° = (ae b—b"|>6, a we (b + d sign(m) otherwise (a “step of 3” Dekker’s algorithm forms a new set fa, b, old set [b, 6 nately itis easy to construct a function f for which steps of Bare taken every time, so about (b — a)/6 function evaluations are required for convergence For example, let es DY as the next point at wh 2 fora +5 D. E> 0, Ce, ~ Oly —O 1 — 6), and x, is sufficiently close to f, then successive linear interps gives a se- quence (x,} which converges linearly to ¢. In fact, equation (3.2.1) holds with p = 1 and K = fi; are det Defi 22) is the ratio Cives 50 FmoING 4 ze80 chap 88 xq ~>G, provided yj, remains bounded away from 1, Now the iteration 2utt = EE 24) where a= es has fixed point 2 = £,1,, and Ie@l Jlel then we do a bisection, otherwise we do either a bisec lerpolation just as in Dekker’s algorithm. Thus, |e| decreases by at a factor of two on every second step, and when |e| <5 a done. (After a biseetion we take ¢ = m for the next step.) T algorithm, unlike Dekker's, is never much slower than bisection, A simpler idea is to take e asthe value of pig at the last step, but pract cal tests show that this slows down convergence for well-behaved functions by causing unnecessary bisections. With the b experience hras been that convergence is always atleast as fast as for Dekker's algorithm (ee Section 4), wee, and, by Theorem be 0. Thus, successive linear interpolation does converge, Inverse quadratic interpolation If the three y inverse quad of by linear interpol urrent points a, b, and ¢ are distinct, we can find the point ic interpolation, ie., fitting x as a quad fon using just @ and b. Experiments show th 70) he form Id be accepted. Cox ( asa function of x), Where p and se the acceleration te id possibility is (See also Ostrowski ‘oid overflow or division by zero whe ew point i. Since A is the most recent approxim he previous value of b, we doa bisee | FB)| n= fla), perform the division unless itis safe to do so. {needed anyway.) When inverse 8 parabola ca is to be done interpolation is used 1 mation to f unless natural to accept the of the way from 4 to ¢: consider parabola has a vertical tangen 2| p> 31mg. interpol ng case wl inter Fld) —f(e), Hence, we reject i i? The tolerance bound evaluations required. In the ALGOL procedures t tol is used for 6 The effect of rounding errors so that IL prevent convergence may be increased if a 52 FINDING & ZERO can higher relative error is acceptable, but it should not be decreased, for then rounding errors might prevent convergence. “The bound for | {| has to be increased slightly if we take rounding errors into account. Suppose that, for floating-point numbers and. t computed arithmetic operations satisfy flix x 3) = 39 +e) 2.10) and flee yy =x be) ty +e). any Se for (= 1,2,3 (see Wilkinson (1963). Also suppose that = |x| exactly, for any floating-point number x. The algorithm com- imations m= f105 x (@—B) @.12) tal =fIQ x € x |b] +9, 2.13) ing only when < 101 14 (unless f(6) = 0, when € = { = 6). Our assumptions (2.10) and (2.11) give [ale Me — | — ley 0) 15) and HOI <= (2elb| + DCL + &), (2.16) so (2.14) implies that je~ 61S (2 ,)eebl+ 00+ er Fahl tle. GID Since|f — | <|e— b| and b = &, this gives If — €]< Gel} + 2, (2.18) neglecting terms of order et and bound (see above). (OF course, it is the user’s responsibility to consider the effect of rounding, errors in the computation of f- The ALGOL procedures only guarantee to find a zero { of the computed function f to an accuracy given by (2.18), and Cmay be nowhere near a root of the mathematically defined function that the User is really interested int {| Usually the error is less than half this Extended exponent range Jn some applications the range of f may be larger than is allowed for standard floating-point numbers. For example, f(x) might be det(4 — x1), where A is a matrix whose eigenvalues are to be found. In Section 6 we give an ALGOL procedure (zero2) which accepts f(x) represented as a pait Soe. 3 CONVERGENCE PaapenTiES 53 (oO), 22), where f(x) = 909-2 (y real, z integer). Thus, 2 functions in the same representation as is assumed by Peter (1969), although zero2 does not require that 1/16 < | y(x) Hx) = 0), and could be simplified slightly if this assump! Section 3 CONVERGENCE PROPERTIES Wthe and let where 6,, is the minimum over [a, 6] of the tolerance B(x) Imacheps|x| + G3) (sce Section 2), and f°] means che least integer y = x. By K > 0. (Procedure zero takes only two function ev First consider a bisection process tert to contain a zero has tength = 24,, (60 the endpoint minimizing within d, of the zero, and certainly within 25,). It is easy to see that this pro- cess terminates after exactly & + I function evaluations unless, by good fortune, f happens to vanish at one of the points of evaluation. ‘Now consider procedure zero or zero2, If k = 1 then the procedure r two function evaluations, one at each end-point of the terval. If k = 2 then there are two initial evaluations, and after no ‘more than four more evaluations a bisection must be done, for the reason described in Section 2. After this bisection, w! evaluations required is 2EGHTHO.. FOK+ Mk + DF 2 64) Since Dekker’s algorithm may take up to 2 function evaluations (see Section 2), this justifies the remarks made in Section 1. Also, although the upper bound (3.4) is attainable, it is clear that it is unlikely to be attained except for very contrived examples, and in practical tests our algorithm has never taken more than 3(k ++ 1) function evaluations (see Section 4). This justifies the claim that our is never much slower than bisecti Superlinear convergence Ignoring the effect of rounding errors and the tolerance 3, we sec, as in Dekker (1969), that the algorithm will eventually stop doing bisect when it is approaching a simple zero £ of a C? function. Thus, temporar jon (3.4.22) holds (see Theorem hen the weak order of convergence ih the .dification described in Section 2, the weak *), The idea of the proof is that ye approximating parabolas is ty (3.5.13) still holds for some M (no longer the inverse parabs order of convergence is Thus, our procedure of steps and, under the cone superlinear with order at (618... P= 2.618... > 2 Section 4 PRACTICAL TESTS The ALGOL procedures zero (for standard floating-point numbers) and zero? (for floating-p ‘an extended exponent range) have been tested using ALGOL W (Wirth and Hoare (1966), Bauer, Becker, and Graham (1968)) on TBM 360/67 and 360/91 computers with machine precision 16, The number of function evaluations for convergence has never been greater than three times the m sd for bisection, even for the cers algorithm 5 been tested extensively function evaluations. Zero2 with eigenval es, and in this applica usually takes the sa tone less fu uation per cigenval Dekker’s algoi considerably less than bisection. Table 4.1, we give convergence Wi where ¢ number of function ev. procedure zero? and functions 3° for x), and Als), p a tala x 10% ste 1 fee troll 29) otherwise, i and ye [iMexD) iP > 10%, a 100 [Fane io5— e-em) onewief Seo. PRACTICAL TESTS 55 The parameters a, b, and ¢ of procedure zero? are given in the table, In all ceases macheps = 16-'* TABLE 4.1 ‘The number of function evaluations for convergence with procedure zero? So) a of = DUT and Fu = 0. In Table 4.2, we compare the procedure given by Dekker (1969) with procedure zero (procedure zero? gives identical results as no underflow or overflow occurs) for a typical application: finding the eigenvalues of a sym- metric band matrix by repeated di jon. Let A be the n by 1 Sdingonal matrix defined by por if injotorinjam pif 1 2, A has eigenvalues a, =p — 49.005 KE) 4 2rcos WH) a4 h (1970). Table 4.2 gives the eigenvalues A, the number mp of fonction evaluations per eigenvalue for Dekker’s proce- dure, and the number , of function evaluations for procedure zero. For eigenvalue, the tolerances for DekKer’s procedure and for proce zero were the same. (The tolerance was adjusted by the eigenvalui to ensure that the computed eigenvalues had a relative error of 5 x 107) Tests were run for several values of m, p, q, and r: the table gives a typical set of results form = 15, p = 7, q = 7/4, and r = 1/2, To ob the same accuracy with bisection, at least 40 function evaluations per eigen: would be required, so both our procedure and Dekker's are at least es as fast as bisection for this application. 56 FINDING A 2680 con 4 TABLE 4.2 Comparison of Dekker's procedure with procedure zero & dy 1 9583825096886 23995005360754 5239614628727 2.05025253169417 4471048521337581 ‘0000000000000, 9 7481 TSITALGOI6L 0 8.971 67724536008 u 40.5063081987721 R 11.9497474683058 5 13.2029707188829 4 1a. 1742635087688 ‘Some more experimental results are given in Chapter S. of the superlinear convergence, see the examples given in Sei Section 5 CONCLUSION s fast as Dekker’s on well-behaved e Dekker's, itis guaranteed to converge in a reasonable wber of steps for any function. The ALGOL proced id cera given in Section 6 have been written to avoid problems wi or overflow, and floating-point underflow is not harmful is set to 22r0, Before giving the ALGOL procedures zero and zero2, we briefly discuss some possible extensions. s long as the result Cox's algorithm Cox (1970) gives an algorithm which combines bisection with polation, using both and /’. This algorithm may fail to converge in reasonable number of steps in the same way as DekKer’s. A simple modifica S008 concwusiow 57 2 for Dekker’s of convergence Parallel algorithms In this chapt known (see, for ex: only fu order less than 2, fat {, Winograd only fi func wan 2 for al be gained by goi ion. However, Miranker (1969) lass of converge nd. superlin behaved fu iF convergence with order greater tions. Searching an ordered file Apt which is commonly solved by a binary search (j.., bisection) be fe 9: S-> Ran order-preserving mapping from $ into the 1 that T= [tye Given ¢ © [p(t.). ot, wumbers, Suppose st] is a finite subset of S, with % <1, < hy ‘we may define a monotonic function f on (0, n] by FO) = wt) ~ ¢ 6) where x © [0, ]and / = [x — 1], Thus, finding an index i such that g(t) = ¢ 1 usual i le to modify our algorithm slightly to problem 52 rivownG A zeRO chan Section 6 ALGOL 60 PROCEDURES The ALGOL procedures and zero2 (for floating-point » below. For a deser (for standard floating-point numbers) han extended exponent range) are gi 12 idea of the algorichm, see Section 2. Some iS are described in Section 4. A FORTRAN ation of procedure zero is given in the Appendix. real procedure zero (a, macheps, 1,f): macheps, t; real a,b, macheps, t: real procedure f; comment: Zero rel terval eps is the nce. The procedure precision and ¢ is a positive tol assumes that fia) and /(b) have different signs: real ¢, de, fo, fb fe, se begin comment: Inverse quadratic interpolation; 4: = faifes 12 = foifes N= ba x1) w gq: = —q else ps = —p a: 2% p <3 x mx g— abs(ool % 9) \ p< abs(O.5 xs xq) then d: = pg else d: = Jol then d else if nx > 0 then rol else ~10l); see.6 L604 60 PROCEDURES 59 fb: = JO) 90 to if fb > O= fe > O then int else ext end; zero: =b end zero; real procedure 2ero2 (a, 6, macheps, 1, ); macheps, ¢; real a, 6, macheps, 1: procedure f: Procedure zer02 finds a zero of the function 7 in the same way as procedure zero, except that the procedure /(x, y, 2) returns (real) and z (integer) so that f(x) = y-2%. Thus underflow and overflow can be avoided with a very large function range: real procedure pwr? (x. n); value x, 1; real x, integer 1 ‘comment: This procedure is machine-dependent. It co: <0, avoiding underfiow in the intermedi 2:= ifn > —200 then x x 2 tm else ifn > —400 then (x x 2 7 (—200)} x 2 1 (a 4: 200) else if n> —600 then ((x x 2} (—200)) x 21 (200) x 2} (x + 400) else 0. integer ea, eb, ec; real ¢, de, fa, fb, fe, tol, m, p fla, fa, ea): f(b, fb, eb) int: ¢: = a; fe: = fa; ec: = ea; di =e: = b~ a; ext: if (eo < eb \ pwr 2(abs( fo), ee — eb) < abs(/D V (ee > eb \ pur 2abs(/b), eb — ec) > abst fc) then begin a: = by fa: = fb; ea: = eb; b: = ¢; fb: = fe: eb: = ec wes -2" for ars c: =a: fe: = fa; ec: = ea ends tol: = 2 macheps x abs(b) ++ 13m: = 0.5 x (e— 5); if abst) > fo 0 then begin if abs(e) < tof fabs( fa), ea — eb) < abs( fb) V +b ~ ea) > abs(fa)) then (ea Seb Ap (ea > eb /\ pwr d: =e: =m else ins. = pur2(fo, eb — eadifa; ita = ¢ then begin p: = 2X mx s:q7—1—send else i fa, ea — cif 1s = pwr2(fb, eb — ecvife: pia=sXQxmxgx(g—N—b—ax0=1) a=) xX O-DXG-D 40 FiNoiNG 4 zE50 ctao. 4 xQa AN ALGORITHM WITH b> O-= fe > 0 then int else ext GUARANTEED CONVERGENCE FOR FINDING A MINIMUM OF A FUNCTION OF ONE VARIABLE > 0 then rol else Section 1 INTRODUCTION ‘A common nding an appro: 62 mavoMizing A FUNCTION OF ONE vaniAeLE In Section 2 we consider the effect of rounding errors on any mi tion algorithm based entirely on fined in Section 3, why methods Ii example). In Sec' gous to the zero- ization algorithm analo- ing algorithm of Chapter 4, and some numerical results some possible extensions are described in Section 7, and an ALGOL 60 procedure is given on 8 Reduction to a zero-finding problem If Fis differentiable in [a 5], a necessary condition for fto have a tocal ‘minimum at an interior point we & (a,b) is Fp) = 0. ata or b: for example, [a,b]. If we are prepared to check for 'y point of fit is possible , OF even an inffexion point, rather 1 is necessary to check whether the point found is a continue the search in some way ifit is not IF itis difficult or impossible to compute f” directly, we could approxi- mate /* numerically (e.g., by finite differences), and search for a zero of f” as above. However, a method wi errors; 2. A method which does not need /" may be more efficient (see below): and 3. Whether /" can be computed directly or not, a method which avoids difficulty with maxima and inflexion points is cl Jarratt’s method Jarratt (1967) suggests a method, using successive parabolic interpola. tion, which is a special case of the iteration anslyzed in Chapter 3. With arbitrary starting points Jarratt’s method may diverge, or converge to a maximum or inflexion point, but this defect need not be fatal if the method is used in combi 1d such as golden section search, in the Seo2 FUNDAMENTAL LIMITATIONS BECAUSE OF ROUNDING ERRORS 63 same way that we used a combination of suocessive linear interpolation and bisection for finding a zero. Theorem 3.5.1 shows if f has a Lipschitz continuous second derivative which is positive at an interior minimum 1 then Jarratt's method gives superlinear convergence to je with weak order al least i, — 1.3247... (Gee Definitions 3.2.1 and 3.5.1), provided the ini- tial approximation is good and rounding errors are negligible Let us compare Jarratt’s method with one of the alternatives: estima J" by finite differences, and then using successive linear interpolation to find a zero off’. (This process may also diverge, or converge to a maximum.) Suppose that f(g) > 0 and fu) +0, to avoid exceptional cases (se 1.324... , Jarratt’s ‘Newton's method and successive linear interpolation: see 966), method has rat betwee Section 2 FUNDAMENTAL LIMITATIONS BECAUSE OF ROUNDING ERRORS Suppose that f= LC'fa.8; M11 has a minimum at gt © (a8). Since Fu) 0, Lemma 23.1 gives, for x = [2,5 M3) Sot ELS = + Bele WY, en where jm, Ale 5 TAR « Thus, if 0 and the term MB/(6; ace I, and James ( s simple analytie represental ely. For example, perhaps AFC) =LOU + EM + ed, Where |, |< € and Je!!|<€, so we can expect to find a zero of f° with a ative error bounded by € (see Lancaster (1966) and Ostrowski (1967b)). 2.6) holds it might be worthwhile to use the algorithm described in Chapter 4 to search for a zero of /", oF at least use it (0 refine the approximation /2 given by a procedure using only evaluations of f, However, this is not so if t be approximated by differences, for then (2.6) cannot be expected to hold, Even if/(x)is a unimod tbe numbers x 67).) i, the ‘may be easy to compute le computed approximation fH f(x)) .F(6)) must be constant over small intervals of real have the same floating-point approximation fl(x). In the Seed LUNIMRODALITY AND S-uMtlNoDALITY 65 of approx nd that the minimum of the y differ from the minimum that he is really interested ‘much as 6 (see equation (2.5) above}. There is no point in wasting ns by finding the of the computed function to not be Section 3 UNIMODALITY AND 8-UNIMODALITY There ate several different definitions of a unimodal function in the literature. One source of confusion is that the definition depends on whether the funeti josed to have a unique minimum or a unique maximum (we is unimodal on fa, 6) if Fhas only one stationary tion has two disadvantages, Firs ‘|. This defini- is meaningless unless f is differentiable Funct ingent are prohib- 3x? is unimodal on. L Wilde (1964) gives another definition: f is unimodal on fa, 6) Kime € [bh By S23 Oy SD) A Oy > x* 3 flr) < Sle). Gy la, 6), (We have 1a rather for all where x* is a point at which f attains its least va reversed some of Wi jes as he considers: ma minima.) Wilde’s de , oF even cont nity, but to verify that a function f satisfies (3.1) we need to know the x" (and such a point must exist). Hence, we prefer the following defi- nt to Wilde’s (see Lemma 3.1), but avoids any reference to the point x*. The d not as complicated as it looks: it have a “hump” between any two points x, and merely Sf can X, in [a, 6}. Two possible configurations of the points xy. X,.X,. and 2x° in G1) and (3.2) are illustrated in Diagram 3.1 DEFINITION 3.1 Lf i8 unimodal on (a, b) if, for all xq, *, and x, & [a, 5], eS a Aa <8 > FO) S/O) 2/6) fle) > fO4)) 68 MINIMIZING A FUNCTION OF ONE veRIAaLE onan. 5 DIAGRAM 3.1, Unimodal functions LEMMA 3. la, bl exist jon 3.1 are equiva SS) =o), G33) so equal dl variables gives Sf) > POE, G4) and thus fis) > fle). G5) Similarly, if x, x*, equation (3.2) gives f0%) < fos G6) Thus, from (3.5) and (3.6), equation (3.1) holds Conversely, suppose that (3.1) holds and xp f(D, contradicting the assumption that f(x (fCx,). Hence case 3 is impossible and, by (3.7) and (3.9), we always have fox) < fOr). Similarly, and the proof ) then f(x) > FU;) $0 equation (3.2) complete, A simple cor if f is continuous, then Wilde's definition of unimodality and ours are equivalent. For arbitrary f the definitions are not equivalent. For example, -x it x 4 y= 310 fo) [x if x>0 ate 1, I) by our definition, but not by Wilde's, for x* does not ‘The following theorem imple characterization of unimod: ‘These is no assumption that / is continuous. jon (@g., x") may have stationary points, 1 definition and Wilde's are essentially different from Kowalik jously differentiable. (Although this point is looked! See also Corollary 3.3) THEOREM 3.1 3.1) iff, for some F is unimodal on [a, 6] (according to De decreasing on (a, ) and (unique) jz © [a, b), either f is strictly ‘monotonic increasing on [1, b], or J is strietly monotonic decreasing ly monotonic increasing on (1, 6] is a special case of Theorem 3.2 below, so the proof is COROLLARY 3.1 1odal on [a, b], then fattains its least value at most once on it must attain IF fis ori [a,b], (IF f attains its least va by Theorem 3.1.) COROLLARY 3.2 IE fis unimod: once on fa, 6). and continu corottary 3.3 If f€ C'fa,6) then f is unimodal iff, for some = [ almost everywhere on [a. z] and f" > 0 almost everywhere on [jt that /" may vanish at a finite number of points.) (Note 65 mnunGZING & FUNCTION OF ONE VARIABLE Chop 5 Fibonacci and golden section search ‘oyd (1965a, b), Johnson (1955), Krolak 965), Pike and Pixner (1967), and Witzgall (1969).) Care the coordinates of the points at which f is I COROLLARY 3.4 Suppose that f is unimodal on (a, 6), dasa, x, Proof If x,

fle). Thus, if fly) flor) problem of computing unimodal B-unimodality We pointed out at the end of Section 2 1 imited-precision arithmetic are not unimodal. Thus, the theoreti sr methods is irrelevant, and it is give even approximately correct results in the pres- ence of rounding errors. To analyze thi that Fibonacci or golden section search. f is not necessari the distance between points h fis evaluated is always greater than 5. ‘The results of Section 2 indicate how large 6 is likely to be in practice. (Our aim differs from that of Richman (1968) in defining the ¢-calculus, for he is interested in proper Id as € ~» 0.) For "t approach to the problem of rounding errors, see Overholt (1967). In the remainder of this section, 5 is a fixed nonnegative number. As well as Sunimodality, we need to define d-monotonicity. If 80 then even though seed UNIMODALITY AND S-UNIMODALITY 69 nodality (Definition 3.1) ‘valued function on J. We say that fis forall x,,x; < J, > fe) < fx). Gal) As an abbreviation, we shall write simply “/ is 5-1 on J”. Strictly 6- ‘monotonic decreasing functions (abbreviated 5-|) are defined in the obvious DEFINITION 3.3 and f a real-valued function on J. We say that fis, Yee Finke © Met F Sm AM FE <> (FOO) LO) M0) > foe). 6.12) of S-unimodat funet The following theorem gives a characterizai ns, es to Theorem 3.1 if 6 = 0. Tere THEOREM 32 {is d-unimodal on [a, 6] iff there exists jr = [a,b] such that either f is on fa, pt) and 6-1 on [11,5], oF fis d+| on (a, 4} and -t on (41, d), Fur it fis 5-unimodal on (a, 6) then there is a unique interval [jt 43] [2, A] such that the points 4 with the above properties are precisely the elements of [jt,, 4) and yt, min(a + 6,6). Tt is immediate from the definitions (3.13) and (3.14) that / is 6-t on (uty, Bl and fis 5-| on fa, 4r,). We shall show that HS My G15) 72 mminzING A FUNCTION OF ONE VARIABLE hop Suppose, by way of cont G16) the definitions of jr, and ji. This G7 such that fis not 5 points »’, "2, on [a,x] Thus, there are PEbay Sx ax xycr 6 (3.18) fe = fi G19) and fo) 2 fe. 8.20) 7 aa G21) otherwise os Xy, and x, co Thus (3.16) is impossible, (3.1 Choose any se in (jt, al. From the definitions of pz, and jy, f is 8+] on [a, 42) and 6-7 on (4, 5]. Suppose, if it is possible, that f is neither 6- or 5 [#, b]. Then there are points 6, (3.22) s, the points y,, 4, and y, contradict aj or bf on [y, 6}. TI part of the theorem. nd (3.14), t of the theorem is precisely [j,, :}. Since f i both 6: EL on (sys ds). We have sly < ft, + 6, and the proof is complete ns its minimum on [a 1), 80 See 3 UMMODALITY AND S-unmmoDALiTY 78 As an example, consider SD = 8 + go) (3.28) |, where gis any function (not necessarily continuous) with |e(x)|<¢, = 0. Since f(x) is bounded above and below by the unimodal fune- f € and x* —e, we see that f is S-unimodal for any 5 > /2E. 4 practical case € might be a small multiple of the relative machine preci- sion, and the fact that the least 5 for which fis e-unimodal is of order ¢' rather than €, is to be expected from the discussi ‘The following theorem is a generalization of Corollary 3.4 (which is just the special case 6 — 0), and shows why methods and golden section search work on d-unimodal fun between points at which / is evaluated is greater than d, THEOREM 3.3 Suppose that fis d-unimodal on A, 1, and ate the points given yeorem 3.2, x, and x, are in [a, band x, +6 < xp. I fix,) fle) then py < xz, and if f(x,) > fOx,) then pr, > Proof I x, < pe, then fix,) > f(x.) for, by Theorem 32 with p= py. f is 4-| on (a, 43). Hence, if flx,) < f(x;) then 4, Golden section search t T Superlinear convergence Successive linear <—> Successive parabolic interpolation polation Many or less ad hoc algorithms have been proposed for one dimensional minimization, particularly as components of n-dimensional minimization algorithms. See Box, Davies, and Swann (1969); Flanagar Vitale, and Mendelsohn (1969); Fletcher and Reeves (1964); Jacoby, Kowalik, and Pizzo (1971); Kowalik and Osborne (1968); Pierre (1969); Powell (1964) etc. The algorithm presented here might be regarded as an unwarrante addition to this list, but it seems to be more natural than these algorithm: Which involve arbitrary preseriptions like then halve the step-size Of course, our algorithm is not quite free of arbitrary pre- ism of the ad hoe algorithms is that asymptotic rate of convergence (when f is sufficiently smooth) for our algorithm (Section 5). Note that we do not claim that our algor ble for use in an m nal minimization procedure: an adh algorithm may be more efficient (sce Seetions 7.6 and 7.7. Sect LAN ALGORITHM ANALOGOUS TO DEKKER'S ALGORITHM 73 A description of the algorithm Here we give an outline which should make the main rithm clear. For: reader refer to Section 8, where im is described formally by the AL‘ nto the minimum defined on the interval [a, 6]. Unless a is very close to b, fis never eva and 6, so f need only be defined on (a, b), and if th at a or b then an interior point di ed, where fol is a tolerance (see equation (4.2) may be local, but non-global, unless f is 6 Ata typical step there are six si istinet. The positions of these points change during the algorithm, but there should be no confusion if we om interval on which fis defined, and vewex=a4 (Soy sie number (3 — 4/S)/2 = 0.381966 so that the first scep is the same as for a golden sec start of a eyele (label “loop” of procedure 1, ad x allvays serve as follows; rm lies in fa Bs fof all the points at which / has been evaluated, x is the one with the least value of f oF the point of the most recent evaluation if there isa tie: w is the point with the next lowest value of f;v is the previous value of w3 and w is the last point at which f has been evaluated (undefined the frst time). One possible configuration is shown in Diagram 4.1 DIAGRAM 4.1 A possile contigoration As in procedure zero (Chapter 4), the to ation of a ive and an absolute tolerance, If noe is a con tol = eps|x 42) 78 mnNWING A FUNCTION OF One VARIABLE oan. jal near x and 5 of the di general ‘machine-preci ‘ease the where € is the be positive in sible that ay exceed 2o! + 5 ‘ors in determining if the stopping c: terion is satisfied, but the addi or greater. 2t01 — Hb ~ a), ive., if max(e — a, b— x) erminates with x as the approximate position numbers p and 4(q > 0) are computed so that point of the parabola passing thro: ore of these points coincide, or if the parabola degen- hen g = 0, crates 10 a straight pand q are given by p= Ele — DAF) — FO) = 6 — WL) = (OO) (43) = Ee = BE = wh = Ye — Wl. md ET ay and 9 = Fle — WK F6) — FOO) = — WF) — FO] 4.3) = FAx — v(x — wKw — fT x). (4.6) From (4.4) and (4,6), the correction pig should be small ifs close to @ mi where the second derivative is positive, so the eect of rounding errors @ is minimized, (Golub and Smith (1967) compute a + w) for the same reason.) procedure zero, let e be the value of p) gle step is performed, ie., the next value of wis ce fereep i Creare an (A) Uf the next k steps are golden sect mal choice as k > co: see Wi “parabolic interps and 6 — must be at least fol. Then fis evaluated at 1 5.0, b, 0, W, and x are updated as necessary, and the cycle is repeated (the proces rns t0 We see that fis never evaluated at two points closer together than fol, so d-unimodality for some 3 < tol ig enough fo ensure that the global minimum is found to an accuracy of 2tol + 5 (see Theorem 3,3 and the following remarks), new point w, the see 5 CONVERGENCE PROPERTIES 75 way: x = b — tol after a parabolic interpolation step has beer WF enforced. The next parabo ies very close to x and 4 sd to be x — fi wes (0 4, b — a becomes 2fol, and the termination m 4.2). Note that two consecutive steps of h were done or less, then id une ¥ * ° t ‘wi DIAGRAM 4.2. A typical coniguration alter termination Section 5 CONVERGENCE PROPERTIES F of at Teast two on every second eycle of the algo < tola golden section step is performed. (In this section, golden section step does not necessarily decrease b — a significantly, ¢8.. if x b — sol and fw) < f(x), then b — ais only decreased by sol, but two golden section steps must decrease b — a by a factor of at least (1 } 4/5 )/2 L618... As in Section 4.3, we see t sonvergence cannot require more 2efe. (54) wn val Lat. (82) 76 MiNMaNZING A FUNCTION OF ONE VARIABLE ome. By comparison, a golden se: ply here as were made al {ests convergence has never been more than 5 per for a Fibonacci search (see Section 6). ing (5.1) we have ignored 1 Section 4.2, itis easy to see that they cf fy (4.2.10) and (4.2. is at least 2e. true if the successive parabolic interpol order By = 1.3247...(6 and 3.7). For most of the ad hoe mi with a n process converges with strong ns for this are given in Sect hods given in the literai iaranteed error bound of order fo! in the number of steps given yes occur, the order is no of Davies, greater than for ouralgori wana, and Campey (Box, Davi or for each parabolic fit, so the order of convergence is at most Section 6 PRACTICAL TESTS The ALGOL procedure localmin given in Section 8 has been tested using ALGOL W (Wirth and Hoare (1966); Bauer, Becker, and Graham (1968)) on IBM 360/67 and 360/91 computers with machine precision 16-'°. Although in example where the bound (5.1) on the ni is nearly at requires, at worst, only 5 perc se the same accuracy using Fibonacci search. In most pr: seo PRACHCAL TESTS 77 ses superinear convergence sets in after few golden section steps, andthe eure is much faster than Fibonacei search ‘Asan extmple, in Table 6.1 we giveth requited to find the minima of the function (2-5) so = 3 (2=2) (a) ‘This function has poles at x = +208, Res 1e open interval 2, /-+ 18) for #= 1,2. ‘odal (ignoring rounding errors) ‘an interior minimum, The fourth column of ‘of function evaluations required to find this mi ocalmin with eps = 16-7 and t = 10-"° (so t where fol = 16°7 || + 107"), ‘The last column of the table gives the number 11z of function evaluations required to find ind 4,6) with macheps = 16°? and ¢ = 10-)®, so the guaranteed accuracy is nearly the same as for Ic in practical cases we would seldom be lucky enough to have such a simple analytic expression for”, so procedure 19.67¢0001, 29.8282273 41,9061162 35.9538958, T1.9856685 90 088685 100265527 56036524295, 5.8956037976 10 e812 9 50762593, ° 5333662003 9 166803639849 9 6.7938538365, 9 6.3634981083 9 6.8539024631 9 6. 6008470481 9 78 MINAMRING A FUNCTION OF ONE VARIABLE chan. 6 zero could not easily be use find mini cedure zero could find a maximum rather than a minimum, Table 6.1 shows that the number of function evaluations required by h would require IO to the same acouracy. ing the superlinear convergence see Section 3.9. Section 7 CONCLUSION ‘The algorithm given in this chapter has the some advantages as the al and thus much faster than Fibonacci search, There is no contrac here: Fibonacci search is the fastest method for the worst possible fun ‘but our algorithm is faster on a large class of functions, including, for example, C? fi positive second deriv: A similar algorithen using derivatives We pointed out in Section 4.5 that bisection could be combined with las which use both f and f". We could combine golden, ith an inter J’ iva similar way. Davidon (1959) su ng a cubic polynomial to agree with fand and taking a (urning point of the cubic as the next approxi- mn. (See also Johnson and Myers hod, which gives the compute, If the cubic isa local mi of th Parallel algorithms a. Ifa paral ich take advantage of t zero-finding problem (see Section 4.5). Karp search method whi analogo id Miranker (1968) give a of Fibonacci search, and ‘optimal in the same sense, if a sufficient d Wi 1g the root of a function, and be used to find a root off”. (Parallel methods for finding a root of. 2 Idd also be used.) These methods could be ‘method with guaranteed convergence, and often her order than for our serial method. combined to give a par ear convergence with a hi Section 8 AN ALGOL 60 PROCEDURE ‘The ALGOL procedure Jocaltin for finding a local minimum of @ fune- tion of ene variable is given below. The algorithm and some numerical results are described in Sections 4 to 6. A FORTRAN translation of proce ure localmin js given in the Appendix. real procedure localnin (a, bs €P5, sf.) value a, b, eps, t; real , h, eps, t,x; real procedure f; begin comment: If the fanction fis defined on the interval (a 8), then Jacalnia finds fan approximation x to the point at which f attains its minimum (or the ‘appropriate limit point), and returns the value of fat x. rand eps defi fa tolerance to! = eps|x| +f, and f is never evaluated at two points closer together than ‘ol. If f is é-unimodal (Definition 3.3) for some 6 < tol, then x approximates the global minimum of f with an error less than 3f0! (See Section 4). If fis not d-unimodal on (a, 6), then x may approximate a loct bbe no smaller 2mackeps, and preferably not much less than sqrt (mache ‘macheps is the relative machine precision (Section 4.2). ¢ should be positive. For further details, see Section 2. ‘The method used is a combination of golden section search and successive parabolic interpolation. Convergence is never much slower than for a Fibonacci search (see Sections 5 and 6). If fhas a contin second derivative which is positive at the um (not at a or A) the ignoring rounding errors, convergence order is at least 1.3247 real cy dsm, By de Fs #0L, 12, Uy BW, ft, fF, £83 ce: = 0.381966; comment: ¢ = (3 ~ sqrt(5))/2s pawiaxizatex (base: 0; fo: = fu: = fr: =f ‘comment: Main loop; loop: m: = 0.5 x (a+ Bs tol: = eps % abs(x) +15 12: = 2 x sh 0 amimiazine & FUNCTION OF ONE VARIABLE cho 5 comment: ‘oping criterion; 0.5 x (b — a) then wW) x Ue = P= (x— 0) xg — (x fq > 0 then p: = —p else g rseead (=) x efi; Kg =2 Othen 4, ~ 1, butif 5 < Othen wy ~ —1, soa very small ch in fcan cause a large change inj, Instead of trying to approximate 1), we should seek 10 approxi = flu). Since lo, ~ ols to find @ such that @-olee with a finite number N, of function evaluat information about f A priori conditions on f ure ¢| 22) by and wf 6) < Wid) for all 5 > 0. Given 1 > 0, choose 6 > 0 such Wat G.toy (always possible by (1.8)), and evaluate fat points xy, ..., x, in [a, 5] such ‘that max min |x ~ x,|<6. aly (For example, we might choose x4 —a-+ 8 x; = a+ 33, m= a +55, ete.) If o> min f then, from (1.7), (1.9), (1.10), and (1.11), 0s $-9,<¢ 3) Ib a finite (a.t2) Thus, a quite weak con: number of function evaluations is that we have a bound W(6), satisfying (18), on the modulus of continuity (fs 8) of f For example, if f ¢ C'fa, 6] and if : 2, and let [e=@ (amy (a) | (1.20) au Dafne 5 = (6 ~ ano 0-4 1 for FO. G0 =D) and . ‘| azn cif (0 a, +4,,) be the polynomial of degree ‘Lemma 2.4.1 and the bound ( =a, Let — 1 which coincide 19) show that, for M 190) = POOLS) a4.) 0 6 ay (1.22) side of (1.22) is no greater than {6/[2eos(n/2r)}}"Mi(71 2°") and, by (1.20) and the choice of 6, this is no greater than 1/2. Thus, we need only find the minimum of each polynomial P(x) in 10 within a tolerance 172. This is easy if r= 2, for then inear, If r > 2, ton was suggested by Ri the method of Golds [a, a,,,) are adjacent, the number of f does not exceed Noir (1970). A\ ize Px) by in and Price (1971).) Because successive intervals red to find (1.23) ely to be practical for small r unless r >-2. On the other hand, in practical prob- lems difficult to obtain good bounds on the third or higher \ey exist). Thus, in the rest of this chapter we suppose that that a one-sided bound rem ided bound (1.19). If f"(x) has ion (e.g., as an acceleration), then @ bound of the form (1,24) ean ies be obtained from physical considerations Section 2 THE BASIC THEOREMS The gl on algorithm which is described in the nex depends on the simple Theorems 2.1, 2.2, and 2.3. Theorem 2.1 is related to the maximum le for elliptic difference operators, and also to some results in Davis (1965). We assume that f= Cla, bl, and that £0) — FOS Me ~y) Qn ad THE BASIC THEOREMS as for all x, v in fa, ] with x > (Weaker conditions s ice: see Soction 7.) IEL© Cla, 6}, then the one-sided Lipschitz condition (2.1) is equivalent to L@osM 2.2) for all x © [a, 4 : THEOREM 24 Suppose (2.1) holds. Then, for all x = (a, 6), E foxy = C=, (af) 1 boa eee ‘The proor is immediate From Lemna 2.4.1 bx. 2.3) Lemma 23 Suppose (2.1) holds and @ <0 <6. Then 110) LD=LO Sb yg es Proof Applying Lemma 2.3.1 to f(—x), we have F(a) = f(O) + af (0) + $Me’, es s0 the result follows, THEOREM 2.2 Suppose (2.1 Then Ids M> 0, a flo, and (0) ~0. 26) Proof Applying Lemma 2.1 w so Fla) ~ fle) < 1M(e ~ ay, es) and the result follows LEMMA 2.2 Suppose (2.1) holds, M> 0, and a 0 is not trivial, although we saw in Section 1 that there does exist an algorithm to solve it. The basic algorithm his section is an elaboration and refinement consistent of the 10), except that we write M for m, fi for The algorithm described i da, a ne point in 1, Set @ min (/(@),/0)), A iP = fle) then a else 2. FM <0 or a, 2b then halt. Otherwise set a, — 8 2s ste below for a better choice). 3, Ifla,) <8 then set a, and 6 —fla,) 4. 1f the parabola y= PG), with Ps) = M, Pla.) = fla), and Pla,) =a.) satisies P(x) > 6 —1 for all xin faa}, then go to 5. Otherwise set a, «ia, + @,) and go back 103, 5. Sota, «a, and go back to 2 We sh sensible choice of a, at step 2) the basic algor ate in a finite number of steps. In view of Theorem 2.1 ‘and step 4, it is clear that the algorithm terminates with @ satisfying (3.2). Refinements of the basic algorithm The he problem is how to make a good choice of a, at step 2 of the basic algorithm. We want (o choose a, as large as possible, but not so large that it has to be reduced at step 4. Theorems 2.2 and 2.3 provide useful lower bounds. If the global minimum yz, lies outside (a,, 5), or if oy > 6 tethen we may halt for @ already sates (32). Other Pn) = os) and Sus -t B.6) 0, from Theorem 2.2 with a replaced by a, and ¢ by py, Thus, at step 2 itis safe to take a, — a, where a= inh. a, 4 (alge a G8) 48 ctoBat mammnzarion crap. 6 and Ww 4. Since the 1 ve to be reduced at step verge in a finite number of steps if, at step 2, we choose any a, in the range 4, is decreasing rapidly at a,,t than (3.1). Apply Theorem 2.3 ‘Theorem 2.3 may give a better bound ith ¢ replaced by a, and a replaced by a dd, (with d, > 0) where f has al is not possible if a, ~ 2) Combining the result wi it is safe to choose a, = a at step 2, where eC oe (as) ~ fla; ~ dy) +: 2.01e fel eee i) 89) ive tolerance, and the term 201 is introduced to combat, i errors (see equations (3.4) and (3.52). ‘The choice a, = aj is safe, but it is possible to speed up the algorithm 1 choosing a, > a. Because we want to avoid having to decrease 4a, at step 4, the best el © d, = min (b, af.) where af is =f ich passes through (a,, /a,)) right of a,. Here @ = min G fa) is the value of @ after step 3 has been executed, and we ean exte of f by defining f(x) =. G.19) ye domain {(b) for x > b if this is necessary. A typical situation It is not practical to choose a, funeti sglomin (Se af, for, evaluations are needed to approximate it aecurately. Procedure See 3 420) ALGORITINS FOR GLOBAL HUNINRATION 39 equacy of the pai “safety factor” fh € (0, 1) IF 4, = min (ba, + Ala — a3), 6 then at step 2 we choose dion to f, the procedure uses a he a, = max (aS, d,), G.12) and if it is neccessary to reduce a, at step 4 thei 4(a, + a,)). Procedure glomin also makes a rather p adjustment depending on the outcome of step 4 set a, — max (a, attempt 10 adjust Some details of procedure glomin ‘The ALGOL 60 procedure glomin given in Section 10 uses the basic al- with the refinements suggested above. From equation (3,8) and the Igo ‘want to find a rough approximai le. In other words, @ should be near! is reason, procedure strategies which are designed to reduce @ qu global minimum would be found without merely reduce the number of fu Sand 6). The first strategy for re About ten percent of the fu dom” points uniformly di point a, if Theorem 2.) fas) > ¢ value as soon as orates several hentistic y. We emphasize that the jing these strategies: the strategies ns required (see Sections ing ¢ quickly is a pseudo-random search, tion evaluations are used to evaluate fat “ran- ributed in (ay,6). (fis not evaluated at the random th @ replaced by a, and x by a, indicates n would be a waste of time.) At worst this strategy wastes ten percent of the function evaluations, but the saving in function eval sed by quickly finding a good value of @ is usually much more than ten percent. The arbitrary choice of ten percent was after some numerical experiments. By comparison with the random search strategy, the second stra lated at the n of the parabola wh three points at which /has been evaluated, provided his minimum a, lies in (¢,, 6) and Theorem 2.1 does not show that the evaluation is futile for the purpose of reducing @. The details are similar to those of procedure localmin (see Chapter 5). This strategy helps to locate the local minima of f whicl less the global minimum is at a or 6, onc of bonus is that, i fis suffcienily well-behaved near the global minimum (see Chapter 3 for more precise conditions), then the minimum more accurately than would be expected with the bas lobal 90 cLonaL wrsimizarion Chan 6 ical examples given in Sections 6 and 8 illustrate this. To avoid wasting func tion evaluations by repeatedly finding the same local minimum, this strategy is only used once in about every tenth eycle, although it is always used if @ = Sla,), for then there is a good chance that fla,) < 6. Finally, the user may be able to make a good guess at the global mini- mum. For example, he may know a local mi to be the global global minimum ofa slightly different ion discussed in Section 8). Thus, proceduce glomin put parameter ¢ which may be set by the user at the suspected po: tion of the global minimum, and on entry the procedure evaluates f at ¢ in an atiempt to reduce @. If the user knows nothing about the likely position of the global minimum, he ean set ¢ = a or b. We can now summarize procedure glomin, (For points of detail, soe Section 10.) Step I of the basic algorithm is performed, and the algorithm terminates immediately unless M>0 and a 0, 2, = —a, > 0,9, =O), and y, = f(a.) then G.13) is equivalent to alr, — 92) + 2290, — G+ HN) < zymyrlz.g — Ys (3.14) Which is the condition tested after label “retry” of procedure glomin. (If q = 0 then (3.14) is false, and it is also false if a, + rig lies outside (a,, 6), since m, > O-and @— 1 = min (yy, 9,)) To approximate a}, we need the point a}* where the parabola y = P(x), Passing through (4, y,) for i = 0, 1, 2, intersects the parabola. BEE tar pam (sae aus {in procedure glomin we use ¢ in place of a, to save a storage location.) Let | See8 AN ALGORITH FOR GLOBAL MINIMIZATION 91 4d, =a, — agyand d, = a, — a. iready computed numbers p and g, Fo= V2 — Junta Yn = In the nonrandom search we have (and g above) with (3.16) and 4, = Ady, ~ dsz0) Gan) in order to find the turning point a, + pig, of P(x). By forming the quadratic equation for a}*, and dividing out the unwanted root a,, we find that at mart Ee G18) where ve G19) gare (3.20) r= dean, 821) and say a$u (3.22) Finally, there is the inspection of the lower bound on fin (a,,@,) given by the parabola yo BEART ER OM — mls — alas, B29) where m, = JM > 0 and dy =a, — a2 >0. (3.24) if pate 3.25) then the parabola (3.23) is monotonic increasing or decreasing in (a,,a,) provided |p| dy. (3.26) Otherwise, the parabola (3.23) attains its minimum in (a,,4a,), and the mini- num value is J(y, + 3) ~ dras(d3 +p?) at x = 4(a, + a, + p). Thos, at step 4 of the basic algorithm, a must be reduced if IPl do AM + ys) ~ dels + p*) <6 — 1, (3.27) [pl 02-9) +0, — +2 3.28) 32 GLOBAL wunuonarion Chap. 6 The effect of rounding errors both in the computation of f(x) and in the internal comput procedure glomin. Now we show how these rounding errors can be accounted for. Lete be the relative machine precision (parameter macheps of procedure slomir cota Spee for r-digit floating-point arithmetic to base 8. We suppose, following Wilkin- son (1963), that (cruncated. (rounded arithmetic), Hix BY = lw BNI where 3 stands for any of the arit os 3.29) ind. metic operations +, —, (3.30) On machines without guard digits, the relations (3.29) and (3.30) may fa to hold for addition and subtraction: we may only have the weaker relation MEEN= MHD MTS | where G3 Se for i= 1,2 With these machines it see to be sure that rounding errors com- mitted inside procedure glo At any rate, our analysis depends heavily on rel analysis.) We also suppose that square roots are computed with a small relative error, say jon (3.29). (See equa! mn (3.52) and the following FKsqrtts)) = ( we | where (32) Wise I (Any good square root routine should satisfy (3.32) very easily. The library routines for IBM 360 computers certainly do: see Clark, Cody, Hillstrom, and Thicleker (1967),) Let us frst consider the effect of rounding errors in the computation of f, supposing for the moment that the internal computations of procedure slomin are done exsetly. The user has to provide procedure glomin with a Positive tolerance e which gives a bound on the absolute error in computing Jf-More precisely, we assume that, for all 6 and x with || =e and x, x(1 in, 6}, we have LIN FU + 3) — fo) |< 0, S05 AN ALGORITHM FOR GLOBAL HINIMIZATION 28 where f(a) is the exact mathematical fu and /(f(3)) is its computed floating-point approximation. The reason for condition (3.33) will be apparent later: at present we only need the special LUL09) — f09| 0 and mi, > 0, and as ss (3.40) 2p + 2.0le > zy + 20> fa.) — flay), we have rs 4 (4, + Le)= fad) Lay Seay) Gal) 98 GLOBAL awmatwzarion chap. and P(a,) = Let y = Q(x) bet =fla,). Since y= fUMa)) 40 + Bem, In the comput computes . G4?) and since errors in the computation of f have already been accounted for, ‘we can assume that y', and @ are exact floating-point numbers. From (3.46) S008 AN ALGORITHM FOR GLOBAL MNIMIZATION 95 and the assumptions (3.29) and (3.32), 5504 36)(l0 Gas TH+ 13) whore [d,| <€ for i= 1,... 4, Since yy ~ @ and ¢ are both nonnegative, G2 - OA +O4+1<0,- 6400 o. (3.49) (3.50) wt the computed s is no the derivation of (3.50), ive error, so the ast that y, — is con is necessary: (3.31) is not enough. Similarly, to find a, we actually compute where ¢ > 0, mz > 0, and a, > ay. We are only interested in F if ¥ > 0, so — 9) +201) +O+2 + 2e)(1 + €F, G.52) Fae bes Gy where O<@.— ail sn Sale) — 5H and sete ay +200 = 99) o> ney tie 6.55) ‘so, from (3.53) to (3.55), rere tee — a) + (te srs] a) + (Pps) 6.56) As befor The same is not true for a, the computed value of 1b, flla, +1), oF ‘la, +). Suppose, for example, that 2 toe os) ‘Then SUH = FtEpMas + HCL + A} 3.58) 36 ctosae sannzarion ern where |6| <¢, so, from (3.33), MALE) — fla, +1 NEY, — 8) + Or, — + 2A, 6.63) and we shall show that (3.63) is true whenever n (3.28) is true tn < d, implies that |p| < dy(l + Se), and thus la| <1 + 96)d,). 3.64) pl < dy and AM (di + BY > (. — 6) + — +2 B.65) then BPE + pI (3.66) so Lm sd + BY = AMUa3 + pV + 46) > [Oa —O+ Os ~ + 20 +36) Pll ~ 8) + rs = 8) + 24 3.67) (Note the importance of grouping the terms: since y, 2, are-all nonnegative, their sum can be computed with a sm: From (3.64) and (3.67), the inexact test (3,63) results in a, being reduced whenever the exact test (3.28) says that it must be, a, may occasionally be seduced because of rounding errors, but this does not in the bounds (3.36) and (3.37); it merely causes some evaluations. We should mention a remote possibility that rounding errors can Prevent convergence. This is only possible if fila, + see THE RATE OF CONVERGENCE IN SOME SPECIAL CASES 97 Fl ~ 14ey(2epy>, possible if 1 Me? max (a, 6), 3.68) Thus, convergence can only be prevented by rounding errors if 1 is unreason- ably small In conclusion, procedure glomin is guaranteed to return 6 and ing the bounds (3.36) and (3.37), provided the input parameters macheps, 1, and e are set correctly, Section 4 THE RATE OF CONVERGENCE IN SOME SPECIAL CASES ‘hin general about the number of function evalua- ibed in Section 3, In the next section ‘we compare the algorithm with the best possible one for given Mf and ¢. In section, we iry insight into the dependence of the number the tolerance 1, by looking at tosay The worst case As p nted out above (equation (3.4)), two function determine @ and @ if M <0, so suppose that Mf ~ 0, a _ ff b= c We showed above that, ifthe last function evaluation was at ay € [a 4), we could safely choose let a, = min(b, a, + 5) for the next evalus of a,, about (6 — jon evaluations would be required, Procedure sglomin tries to do better than this, and is nearly always stccessful (see Sect 6), but the worst that can happen is that a, will be chose es at step 4 of the basic algor of a, there can be at most, Caeilsfan’z)) a ions of a, at step 4. Thus, at worst, about Con lO54) as algoritl consecutive red 98 GLOBAL MinmeariON Chap. 5 nonrandom search evaluations, If 5 is given by (4.1), the with M and f, so the In partic will be required. We have ignored the random and , but these can only add about 2(b — a/6 extra function i (b — a6 in (4.4) varies only slowly per bound is roughly proportional to (b — a) roughly proportional to «/W7, and it seems ions is roughly alsa Section 6), A straight line If the global minim F(u) #0, we can obtain ie behavior of the a Suppose, for example, that M0) = koe ay ye some k > 0, so 4 = a. Ignoring the random searches, the algorit te fat the points a, 8, ¢, and then at points x, t, then wx (MeO, 13) is independent of ¢. ( If cis very small, so that Ab — a) < f, then (4.12) gives, wba nabs, (414) and the algorithm proceeds in steps of size about 24, where 6 is given by (4.1). A parabola Af the global minimum of f occurs. then f“(y) = 0. If "(#0 we may analyze the bs m near j by considering the parabol ‘Thus, suppose that approximation f(w) + 1/"(ux — 4)? t0 fie) M>m>0 (4.15) and fla) = 4m = wy (4.16) 14a GLOBAL mnumnzarion Chen. where 4 © (a, 6). The nonrandom search will quickly locate jz, so we may suppose that i= and, without loss of generality, = 0. The algorithm call for the evaluation of fat points to the left, and then ¢o the right, of As these two cases are similar, let us define x) = y= 0, and study the points x,, defined above, except that now fis given by (4.16) instead of by (4.5). In place of (4.7), we find that [May 2) ase ale tH) <5 It does not seem to be possible to give a simple expression like ( for x,, defined by the recurrence relation (4.17), but we may solve for x, in terms of x,, obtaining we BE. aww fi a) ax Suppose that p is close to I, ie., AF is not much larger than m =f"). Then 7 (421) and, for n> 1, Xp = (B4y)xat + Olfp — IPI} as pr (4.22) ‘Thus (4.23) As the factor (p + 1)(p — 1) is large, only a few function evaluations wi be required Section 5 A LOWER BOUND ON THE NUMBER OF FUNCTION EVALUATIONS REQUIRED Suppose that a positive tolerance ‘and bound M are given, that f atta its global minimum g, in [a, 6] at ,, and that f'G) 9, ith second derivative M, ‘and touches the line DeFI ION 5.2 An integer N and points a’— xy g, — 1 for all x € [a, 6}, given only con 1), it is sufficient to evaluate f at re Gaba meuiazaTiON comp. Lemma 5.2 If g © Cla, dh, g%) Xy.s- Thus, the parabola y= P(x), with P'(x) = Mf, PCS) — e.6 PRACTICAL TESTS 108 FO), and PO.) = flys such that PU) > f(y) (sce breaks down if = f(y), as expected mn (4.23). (See the results for f, with M = 2, 2.1 ) It appears that the number of function evaluation’ does not depend strongly on ¢: comparing N” with we see that the average number of fune- tion evaluations required is only about twenty percent more for ¢ = 10°! than for ¢ = 10-% Finally, the efficiency E of the algorithm is fat ch, even for the these ex: Section 7 SOME EXTENSIONS AND GENERALIZATIONS So far we have assumed that f © C2fa, b] and LO0, Flu BY ~ fla) + flu =) < Mae 3) — fl. Then, for all x € (a, bl, fis) 2 C= DL += FO) _ for all ue (a+ h, TMa—ab-%). (14) 106 GLOBAL suniezaTiON cap. 6 Proof There is no loss of generality in assuming that Sa) = f(b) =0 5) and M=0, 76) for we can consider f(x) — P(x), where P(x) is the right side of (7.4), instead of f(x). Thus, we have to show that o> an where 9, is the least value of f on [a, 6]. Suppose, by way of contradiction, that (28) and let = suplx € (2, 61), /09) = 94). By the continuity of fu) = 9, <0, 80 ua oF b. Th > 0,u ¢ [a+ hy 6 — A] and, from the defi Su) > fo and flu +1) > fe. ai Because of the assumption (7.6), this contradicts (7.3), so (7.8) is impossibl and the result follows. (Note the close connection with the maximum pri ciple for elliptic difference operators.) THEOREM 7.2 Suppose that (7.3) ho Then Is, M>0, ae, <¢, fle,)=fie,)- Me (a) — FE). o-0> (gD aan) Prot Apply Theorem 7.1 with x replaced by o, and b by c,, The hypothesis that f(c,) = fle.) gives, after some simplification, .— ae, — a) = ‘Of ), (713) and the result follows since ¢, ~ a> e, — a0. THEOREM 7.3, ippose that (7.3) holds, M>0, a fer. ve 10 show that Apply Theorem 7.1, first with x replaced by ¢ and b by x, placed by bby xy. The two resi fla) = flo) Ma cy The right sid ive, so (7.14) holds. Remarks Theorems 7.1 to 7.3 generalize Theorems 2.1 to 2.3 respectively. Since ihm described in Section 3 is based entirely on Theorems 2.1 to that condition (7.3) is sufficient for to find a co rect approximation to the global mi surprising, for condition (7.3) iseq global minimum of a function f of several variables, The conditions on f fare much weaker than those required by Newman (1965), Sugie (964), or Krolak and Cooper (1963). (See also Kaupe (1964) and Kiefer (1957).) Section 8 AN ALGORITHM FOR GLOBAL MINIMIZATION OF A FUNCTION OF SEVERAL VARIABLES Suppose iat D = (a, b,) * [a,,6,] i8 a rectangle in RY, f: D> R has continuous second derivatives on D, and constants Af, and M, are known such that Tock @D and R by 90) = min fes (83) Clearly 9(») is continuous, and vane ie Thus, we to the Sections 3 and 10) can be used to evaluate g(») for a given y, using condition 108 GLosat wnnanesrion chp. 6 (8.1) If we could sho ° My 6s) then procedure glomin could be used again (recursively) to minimize @(y),and from (8.4),/(x, 3). Unfortunate where in fa, 6,)- 80 (8.5) may be meat fos 1. Then ot) = min (y, —9) = 7 which is not differentiable at y= 0, and we cannot expect to prove (8.5) The same problem may arise if the minimum in (8.3) occurs at an interior point of D: one example is £159) = (8 = 32) on D= LVF. 9/3] x [-10, 10}. (fe (0) = —2sin y | which is not differen Fortunately, the following tion like (7.3), $0 the ri ) need not be differentiable every- less. For example, consider =x 66) on D=[=1, 1] x [— (8.8) ) vanishes for x= 41, so le at 0, -E2, etc.) orem shows that gy) docs satisfy a condi- of Section 7 show that procedure glomin can f ov) even if gC») is not differentiable THEOREM 8.1 Let f(x, 3) and p(y) be as above, Then, for all k > Oand y € [a, + 4, b,-h, OLY + A) — 29) + Cy — A) < MB (89) Proof From the definition (8.3) of g(»), there is a function p(y) from fa, 6,] into fa,, ,] (not necessarily continuous), such that 00) = fH)... (8.10) Thus wt AS May)y + H), 1 80 (P+) = 299) + OEY =A) MMC IY — 2a and the result follows from condition (8.2). COROLLARY 2.1 For all y € [2,6] at which @”(9) exis oO) EM, 8.13) See GLOBAL MINILIRATION OF A FUNCTION OF SEVERAL VARIABLES 109 Functions of n variables ‘Theorem 8.2 generalizes Theorem 8.1 to functions of any finite number of variables, THEOREM 8.2 Suppose that n> 1; J, is a nonempty compact set in R fori 1,.-.. nel: Dah xT, Xo % yy, SRG Ff DR is continuous, and IX + he) — 2408) + fx — he (4) for all. sufficient A> 0, all x © ROY sue xx dD. and i= 1,2,.-.,2-4 1 Let D’ =, x +++ % iy and define g: D’ > R by (9) = min Ay x (8.15) Then p is continuous on D min f(x) = min gly), (8.16) and 2oly) + or A> 0, y © Re such that y, y + de; < D’, and fn) 1 of variables, provided upper bounds are known for the pai ratives f,.(X) @=1,...,m. Itis interesting that no bounds on the cross derivatives FaxkX) (ij) are necessary, If a one-dimensional minimization using procedure glo! about K function evalu Je to use procedure glomin to find the in requires Kis likely to be in the range 10 < K < 100 in practice (sce S computation involved i ‘of more than three variables, we probably must be sat Which find local, but not necessarily global, minima (see Chapter 7). The theorems of Section 5 have not been extended to functions of more than one variable, so we do not know how far our procedute is from the best possi (given only upper bounds On fa, for i = Thus, there is a chance that a much better method for finding the gh several variables exists. It is also possible that sl conditions on f(e-g., both upper and lower bounds on certain derivatives) might enable us to find the global 110 ctonas atmunwzarion chan. 6 Minimization of a function of two variables: procedure glomin2d In Section 10 we give an ALGOL 60 procedure (g/onin2d) for finding the global minimum of a function f(x, 1) of two variables, suggested above. Note that glomin2d uses procedure g! ‘manner, for glomtin is required both to evaluate and to minimize g. The error bounds given in t from the error bounds (3.36) and (3.37) for procedure glo Procedure glomin2d was tested on an IBM 360/91 computer (using ALGOL W), and some numerical results are summarized in T 2s shown in the . 10°, and 10-! respectively. (Thus g, — I! 1.0002 x’ 10!9is guaranteed, where value returned by the procedure.) In the table we give the upper bounds M, ‘and M, (see equations (8.1) and (8.2)), the total number of funetion evalu- ns N, and the approximate global minimum @ (always very close to the global minimum 9) + f f 2210 200 13320) h 200 ists ° 986 0,39665296108547 100336 —0,396652961085468 130496 (0.396682961085434 Fels) ~ Fylss9) 091-2, Soo.9 summary ano concwusions 111 Comments on Table 8.1 TThe results for the simple functions f, and f, are hardly surprising, expected from the behavior of procedure glomin on functions of one (Gee Seotions 5 and 6), the number of function evaluations (1) increases w ‘M, and M, PY) = WOOLY — 32) + (F—3)? Fs the well-known Rosenbrock function (Rosenbrock (1940)), and it has a steep curved valley along the y= x%, Note that f, is just the Rosenbrock function in disguise, and it is interesting that only 1813 function evaluations were required to minimize f,, compared to 13320 for f,. Thus, itcan make a large difference ‘whether we minimize frst over x (With » fixed) and then over y, oF vice versa, ould be done first. OF tions is very high by course, even the loner figure of 1815 function e comparison with 100 of less for methods which seek local minima (see Chapter isthe price which must be paid to guarantee that wwe do have the global minimum. (This is only a conjecture, for the results of Section 5 have not been extended to functions of several variables.) The functions f, and f, are the same, domain of J, is four times as large as the domain off, For this function the size of the domain has much more influence on WV than do the bounds M, and M,: increasing the size of the domain by a factor of four increased N by a factor of about 50, but doubling Mf, and M, only increased N by about 30 percent. With a different wwe could easily reach the opposite conclusion. s possible to give upper bounds M, and M, on the partial derivatives f.. and f,, then procedure g da guaranteed imum, but a considerable number of luations may be required if the domain of fis large or if the Finally, we should note that we have restricted ourselves to rectang! domains merely for the sake of simplicity: there is no essential di in dealing with nonrectangular domains. Section 9 SUMMARY AND CONCLUSIONS In Section 1 we show that the problem of finding 9, = flu) of a Function f defined on a compact set is w whereas, jons on f conditions are discussed in Section 1. We concentrate our attention mainly 12 GLOBAL tmazaTioNn 1d the number of function evalua to 5, and numerical results are s for functions of one variable are used to give an for finding the global minimum of a function of several variables (practically useful for (wo or three variables), and ALGOL procedures are given in Sec: tion 10. The ALGOL procedures are guaranteed to give correct rest provided the basic ari error, (See the remark For practical prob! of this chapter lies in finding the necessary bounds on second derivatives. One intriguin, (3) is expressed in terms of elementary functions, then the second derivatives can be computed symby upper bo: can then be obtained from the symbolic second derivatives via simple inequ: ities. Thus, the emtite process of finding the global minimum can be auto- ‘mated. In some cases functions defined on unbounded domains can also be dealt with automatically by using suitable elementary transformations Section 10 ALGOL 60 PROCEDURES ‘The ALGOL procedures glomin (for global minimization of a function of one variable) and glomin2d (or global minimization of a function of two variables) are given below. The algor some numerical results are deseribed in Sections 3 to 6 and 8. A FORTRAN translation of procedure (a,b, 6m, macheps, ¢, tf.) racheps, om, macheps e, x; real procedure f: begin comment: slomin returns the global minimum value of the function f(x) defined on (a, 5]. The procedure assumes that f= C2fa, 6] and f”(a) 0 \a 0, a < 8) mid: = 05 X (1-416 x macheps) X mi; We 100000 /\ y < y2 then go to skip; retry: ifg x (rx (yb — 92) +22 xq x G2—-Y+9) < a2 x m2 xX (DX G1) begin a3: = a2} rig: y3: = fla): if y3 < y then begin x: = a3; y: = y3 end end; ‘comment: With probability about 0.1 do a random search fora lower value of f- Any reasonable random number generator can be used in place of the one here (it need not be very good); skip: ke: = J611 x ks kz = k — 1048576 > (k + 1048576); q=ljn=(b—a) (00000); ifr < 22 then go to retry; ‘comment: Prepare to step as far as possible; r= m2 x dO dl x d2; $: = sqrt((y2 A= 05 x (+ hs Pim hx (DE2XKK Si grmr +05 x gs; P= 05 x (0+ (204201 x edd x m2); = a2+ (ifr -< 5 V dO <0 then s else r) D+oI 114 GLonaL imunexrion Chap. 6 comment: It is safe to step to r, but we may try to step further; a3: = Ip x q > O then a2 + pig else r inner: if a3 b then Degin a3: =; y3: = yh end else 93: = fad); ify3 < y then Degin x: = = Bend; dO: = 03 ~ ad; if a3 > then Degin comment: Inspect the parabolic lower (a2, a3); Bp: = 2% (92 — sym x dO); if abs(p) < (I +9 % macheps) x dO A.0.5 x m2 x (d0 x d0+p x p) > — 9) £03 — 9) 42% then begin comment: Halve the step and try agains @B: = 0.5 % (@2 + a3); fh: = 0.9 % A go to inner end end if a3 2; >...> Aq with a set of cor responding normalized eigenvectors t,t... Uy Since pis a local mit mum of f(x), certainly 4, =0, 28) and we suppose that 4,,>-0. (The position of the minimum is less well deter= mined if 2, = 0.) If M6/2, is small compared to unity, and we take = t, then (2.7) is possible a~ Dawe aa Bes, ‘an upper bound on |iji — yj can hardly be less than the right side of The condition number With the assumptions above, and d given by (2.9), fe + 6u,) ~f(p) + xe | FQD|, 2.10) where @uy is the spectral condition number of 4. We shall say that x is the condition number of the minimization problem for the local minimum p. The condition ‘number determines the rate of convergence of some minimization methods (e4g., steepest descent), and itis also important because rounding errors make it difficult to solve problems with condition numbers of the order of e-* or greater, 26 ING A FUNCTION OF SEVERAL VARIABLES crea. Sealing of choosing 5 to even if 4 is known ex mizing the } A good general rule is that SAS should be roughly row (and 1965a). In practical mini- areasonable to do this is described in Section 4. Section 3 POWELL'S ALGORITHM jon we briefly describe Powell’s algorithm for minimization sulating derivatives, The algorithm is described more fully in error in this paper is pointed out by Zangwi (1967a), Numerical results are given in Fleteher (1965), Box (1966), and Kowalik and Osborne (1968). A modified algorithm, which is suitable for use on a parallel computer, and which converges for strictly convex C? funetions with bounded level sets, is described by Chazan and Miranker (1970), Powell's method is a modifica proposed by Smith number of steps, for of some properties of c n of a quadratically convergent method Conjugate directions If A is positive defini funeti id. symmettic, then minimizing the q tic fe ne ee is equivalent to sol of linear equations AX =b, 2) for example, by forming the Cholesky jons of interest here, A is the Hessian See POWELL ALGORITHM 125 matrix of a certain function, and is not known explicit of the problems (3.1) and (3.2) is still useful, , bul the equivalence DEFINITION 3.1 ‘Two vectors wand v are said to be conjugate with respect to the positive ix Ait way = 0. ¢ 3) When there is no risk of confusion, we sh gate, By a set of conjusare pairwise conjugate simply say that u and vare set of vectors which are Romark AF (u,,.--,u,} is any set of nonzero conjugate directions in Re, + -+es Uy are linearly independent, Thus m fl,) unless the ridge is sharply curved. This is one motivation for the method suggested by Rosenbrock (1960), and improved by Davies, ‘Swann, and Campey. (See Swann (1964), and also Andrews (1969), Baer (1962), Fletcher (1965, 1969, d), Osborne (1969), Palmer (1969), Powell (1968a), and Section 7.) Finding another point on the ridge If fincar searches from the pois lution ridge is suspected, then the following strategy may be successful: take ep of length about 105 in a random direction from x,, reaching the point xq. Then perform one or more linear search the point x). As Diagram 5.1 shows, the pi soa linear search in the di Although he does noi uses such a strategy as pat strategy during the regular (1968) ing criterion, We propose to use this rations as well, Incorporating a random step into Powell's basic procedure Suppose that we are commencing iteration k of Powell’s basic procedure, and 2 << k 0. Numerical tests indi often approximated fairly well by the space-curve Wi) A dak = digs yA de did ara 63) which satisfies x(—d,) = x’, x(0) = x”, and x(d,) — x". Before the third, fourth, fifth, ... singular value decompositions, procedure praxis (Section 9) moves to the point x(Ap), where 2 approximately minimizes f(x(2)). 2, is computed by the procedure that performs linesr searches. See Some ruRTHER DETALS 138 Section 6 SOME FURTHER DETAILS In this section we give some more details of the ALGOL procedure praxis of Section 9. The criterion for discarding search directions, th search procedure, and the stopping criterion are described briefly The discarding criterion Suppose for the moment that (x) is the quadra ion given by equation (3.7), In steps 2 and 3 of Powell’s basic procedure (Section 3), we effectively discard the search direction w, and replace it by x, — x. The algorithm suggested by Powell mentioned in Section 3, it discards one of u,, to maximize instead, as =X, 80.as. [dewy, -..¥) where y, is given by equation (3.14) after rei discard the direction, from uy, Uy gor 0" nant (6.1). Suppose that the new direction x, — x, = tye, satisfies Tinalissy = Barats Are The effect of dis directions, is to is to choose i, wi git by and the linear mi then, from (3.7), A= Pina, (63) so /Bi/ fy} may be used as an estimate of (af4u)", (If f, = 0 we use the result of a previous iteration.) ‘Suppose that the random step procedure described in Section 5 moves from x, to yas t Daa, (64) 136 AINUIZING A FUNCTION OF SEVERAL VARIABLES comp. before the linear seatches in the directions u,,...,u, are performed. Then =x x = DG, +2, 65) 1) are given by (Ben its = 66) may From (6.2), (6.3), and (6.5), (uf, Au, (6.7) so we must discard direction uw to maximize the ‘modulus of the tight side of (67). Since this does not explicitly depend on quadr mn that ”) ed even if fis not nec apart from our restri random steps (ie., ify, =O for i= 1, convergence is guaranteed if we ensure that, for k = 2...., Bi= B= = Bien =0 (68) never holds at iteration & The linear search ‘Our linear search procedure is similar to that suggested by Powel We wish to find a value of 4 which approximately mi 964). (A) = fx, + Au), (6.9) where the initial point xp and direction w~ 0 are given, and 9(0) = fix.) is already known. Ifa linear search in the direction w has already been per- ed, oF iw resulted from a singular value decomposition, then an estimate {(0) is available. A parabola P(j) is fitted to g(A), using g(0), the estimate ble, and the computed value of g(2) at another point, or at mate of @”(0). If PA) 2 and g(2*) < (0), then 2* is accepted as & value of A to minimize (6.9) approximately. Otherwise A* is replaced by 1/2, (2%) is re-evaluated, and the test is repeated. (After a number of unsuccessful tries, the procedure returns with 2 = 0.) The stopping criterion user of procedure praxis provides two parameters: ¢ (a positive absolute tolerance), and ¢ ~ macheps (the machine precision). The proce- dure attempls to return x satistying Ix 6.10) See. LMUMERICAL RESULTS AND COMPARISON WITH ODHER METHODS 137 ion of the 4 form of 10) is ly be changed. It w ‘chosen because of the analogy with 1 It is impossible to guarantee that (6.10) wil C? near pO case (Chapter hold for all ping criterion is, howev iC one of The sole exception is the extremely Section 7, Foo) = Ax, where 4 is a twelve by twelve Hilbert matri at the stopping crit iproved criterion could rent best imum before an procedure, and let x” be the best approximation after eration. We test if Let x’ to imply to stop, and return the approxim if (6.12) is satisfied for a prescribed number of consecutive iterations. The ions depends on how cai for the examples described andom step strategy described in Section 5 is always used if ere is no need for a more sed by Powell (1964), Section 7 NUMERICAL RESULTS AND COMPARISON WITH OTHER METHODS ‘The ALGOL W proce ‘on IBM 360/67 and 360/91 tested In and compare lerature. Our proce- ion of ALGOL: see results for other dure has also been translated inehart and Sproul les on a PDP 10 computer ing problem is described Table 7.1 summatizes the performance of procedure pra ions descried es the tolerance ¢ ~ 10 The table gives the number of variables, n; the initial step-size (a igh estimate of the distance to the minimum), f; and the st precision 2, The par 128 MINMSIZNNG A FUNCTION OF SEVERAL VARIABLES han 7 can be compared with those of methods with a different 1s and the 2X4. So that the resul stopping criterion, we give the number 7, of f number 7, of linear searches (in quired to redu Of f As f(x) was only printed o (ie, after every 1 linear mi ction evaluations required to reduce /(x) ~ fiw) t0 10°” is usu less than n,, $0 We also give the actual value of f(x) — f() after n, function evaluations , the table gives x, the estimated condition number of the problem. Except for the few cases where analytically, is estimated from the computed singt s, and may be rather inaccurate. where /(p) isthe true mi sie procedure, TABLE 7.1 Results for various test functions Funetion 120 Rosenbroci Rosenbroek Rosenbrock Cube Teale Helix Powell Box* Singular ‘Wood Chebyquad Chebyquad ‘Chebyquad Chebyquad 1s 17 4.510 1613 1716 t 4 See? NUWERICAL RESULTS AND COMPARISON WITH OTHER METHODS 139 For those examples marked with an asterisk, the random step strategy ‘was used from the start, (In the initialization phase of procedure praxi the variable H/e was set to true.) For the other examples the proce used as given in Section 9 (wi is feature was switched off for the examples ziven in the table. (The bound seh of equation (4.15) was set to 1.) Definitions of the test functions, and comments on the results sum- marized in Table 7.1, are given below. A cautionary note When c and Stewart’ 10 forget tl imerical results reported for the methods may have been obtained on different computers (with differ- ent word-lengths), and with different linear search procedures. Except for he effect of different word-lengths should only be rounding errors determine racy at reason why we prefer 10 consider the number of function evaluations required to reduce f(x) — fn) toa reasonable threshold, rather than the number required for convergence. Because apparently minor differences in the linear search be quite important, Flet searches, n, instead of the number of func! discriminates against methods such as Powell's, which use most of the search directions several times, and can thus use second derivative estimates to reduce the number of function evaluations required for the second. searches in each direction. Note that, for the examples given 2,]n, Ties between 2.1 and 2.7, but it would be at least 3.0 for m do not use second derivative informa involves n of the parabola. Also, there are ich do not use linear searches at all (ee Broyden (1967), Davidon (1968, 1969), Goldstein and Price (1967), and Powell (1970e)), and these methods can be adapted to accept diffe pro. derivatives. Thus, we prefer to compare methods on the basis of the number of function evaluations required, and regard the linear search procedure, if ‘any, as an integral part of each method. Definitions of the test functions and comments on Table 7.1 Rosenbrock (Rosenbrock (1960): £09) = 1006, — 4) 4 I-known function wit nto the valley and then Tax aay valley. Descent methods tend 140 sMnWIMZING A FUNCTION OF SEVERAL VARIAELES Chop. 7 Details of the progress of the algorithm, for the starting point (—1.2, 1, are given in Table 7.2. In Diagram 7.1 we compare these results wit reported for Stewart's method (Stewart (1967)), Powell's method, and the method of Davies, Swann, and Campey (as reported by Fletcher (1965)). ‘The graph shows that our method compares favorably with the other methods. Although the function (7.1) is rather artificial, similar curved valleys often arise when penalty function methods are used to reduce constrained problems to unconstrained problems: consider minimizing by a simple (1 =x, with the constraint that x, = funtion method, Cube (Leon (196) P18) = 100(x3 — (7.2) ‘This function is similar to Rosenbrock’s, and much the same remarks apply. Here the valley follows the curve x; = a. Beale (Beale (198) inded penalty FO) = Ble, — x0 — 4 (73) (1968) report that the Davidon-Fletcher-Powell algorithm reduced f to 2.18 x 10" in 20 function and gradient evaluations (equivalent to 60 f n evaluations if the usual (7 + 1) weighting factor is used), and Powell thod required 86 function evaluations to reduce f to 2.94 x 10"*. Thu r method compares favorably on this example (Fletcher and Powell (1963)): Flos) = 100Ffr, = 108)? ++ (r — IP] + a, (74) where ra Git aye 9) and nef ansaia,) if xy =a nF arctan(xylx,) if x, <0. This function of three variables has a helical valley, and a minimum at 1, 0, 0/7. The results are given in more detail in Table 7.3 and Diagram 7.2 For this example our method is faster than Powells, but slightly slower than Stewart Powell (Powell (1964)): 10)=3-( ry q@tay)- see) ol (76) S007 [NUMERICAL RESULTS AND COMPARISON WITH OTHER METHODS 141 For a descrip , see Powell (1964). Perhaps by good luck, our procedure had no difficulty with this function: it found the true quickly and did not stop prematurely. Box (Box (1966)): 10) — exp(—ixy/10] xslexp(—i/10) — exp(—)] 10, 1)", and also atong the line {(2, 2, OY") f= 78) (Our procedure found the first minimum.) Kowalik and Osborne (1968) 10-7, so itis faster, than any of the methods compared by Box (1966), wi method for sums of squares (Powell (1965)). See the commer Section 1 about special methods for minimizing sums of squates! Singular (Powell (1962)): FO) = (5 + Ox) ory — XP + Oy — ey) + 1G — Yt. (79) ‘This Function is difficult (o minimize, and provides a severe test ofthe stop- Ping entero, beeaus the Hessian matric a the minimum (x = 0) dovbly singular. The function varies very slowly near 0 in the two-dimensional su space {(04,,—4,, 2,, 47}. Table 7.4 and Diagram 7.3 suggest that the algorithm converges only incary, as does Powel’ algorithm. Its interesting to note that the output from our procedure would strongly suggest the si larity, if we did not know it in advance: after 219 funetion evaluations, wi J (8) = 7.67% 10°, the computed eigenvalues were 101.0, 9.999, 0.003790, and 0.001014. (The exact eigenvalues at O are 101, 10, 0, and 0.) After 384 function ev. 5, with f(x) reduced to 1.02 < 10-17, the two smallest ‘eigenvalues were 1.56 x 10-7 and 5.98 x 10-. Thus, our procedure should allow singularity of the Hessian matrix to be detected, in the unlikely event that it occurs in a practical problem. (For one example, see Freudenstein and Roth (1963) Wood (Colville (1968)): A(S) = 100(x, — aD" + = x)? + 90(, — PE +L x) 4 1OAG%s — DF + Gx = 1) 419.80, = Dory — This function is rather tike Rosenbrock’s, but with four variables of two. Procedures with an inadequate stopping criterion may terminate 142° MINIMIZING 4 FUNCTION OF SEVERAL VARIABLES cten.7 prematurely on this function (McCormick and Pearson (1969)), but our procedure successfully found the minimum at w= (1, 1,1, 1. Chebyauad (Fletcher (1963): {02 defiaa by the ALGOL procedure given by Fletcher (1969, As the minimization problem is still valid, we have not corrected a sm error in this procedure, which docs not compute exactly wl wher intended. In contrast to most of our other test functions, wi to be difficult to minimize, this function is fairly easy to mi 1()7 and 9 the minimum is 0; for other m it is nonzero. (For n = 8 itis ap- proximately 0,00351687372568,) The results given below, and illustrated in Diagrams 7.4 to 7.7, show that our method is faster than those of Powell or Davies, Swann, and Campey, but a little stower than Stewart ‘Watson (Kowalik and Osborne (1968)): = 8+ = iP Gay Here a polynomial PO =x, +4 7.12) is fitted, by least squares, to appro tion & +, (0) = 0, 7.13) for 4c (0,1). (The exact solution is 2 = tan 1.) The minimization problem ned, and rather difficult to solve, because of a bad choice of functions {1,1 ‘oF m= 6, the minimum i fiw) = 2.28767005355 % 10°, at w ~ (0.015725, 1.012435, —0.252992, 1.260430, —1,513729, 0.992996)". For m= 9, f(w) = 1,399760138 x 10-*, and p= (0.000015, 0.999790, 0.014764, 0.146342, 1.000821, —2.617731, 4.104403, 3.143612, 1.052627)". (We do not claim that all the figures given are significant) »walik and Osborne (1968) report that, after 700 Function evaluations, Powell's method had only reduced f to 2.434 x 10°? (for 1 = 6), so our method is at least twice as fast here. The Watson problem for m —9 is very ill-conditioned, and seems to be a good test for a minimization procedure. ‘Tridiag (Gregory and Karney (1969), pp. 4t and 74): J) = TAX — 2x, 4) S007 NUMERICAL RESULTS AND COMPARISON WITH OTHER METHODS 143 (7.13) This function is useful for testing the quadratic convergence property. The num f(x) = —m occurs when p is the fist column of A-!, ic, ne 22.0F (7.16) given in Table 7.1 show that, as expecied, the minimum is found ns. The eigenvalues of 4 are just 2, = f(x) = x"Ax, 7) where 4 is ann by n Hilbert matrix, i. mo (118) rapidly with 1. B rounding errors, more than n° linear ns were required to reduce £10 10° for m = 4, The procedure successfully found the minimum w= 0, prescribed tolerance, for n= 10. For um were greater than 0. lustrates how Same more detailed results es 7.2 t0 7.8 give m Rosenbrock, Helix, $) 7.1 to 7.7, we plot details of the progress of our procedure (B) lar, and Chebyguad functions. In Diagrams A= loge (£9) — FW) 7.19) ie number of function evaluations. Using the’ results given by Fletcher (1965) and Stewart (1967), the correspond 8 araphs For the methods of Davies, Swann, and Cai for purposes of compar (n= 8) are not available. Fag taeeral aiae tie ae pov tad vaneael chp? S007 [NUMERICAL RESULTS AND COMPARISON WITH OFHER METHODS 145 TABLE 7.2 Rosenbrock TABLE 7.4 Singular® TABLE 7.3 Vieix 148 INTINLING A FUIVCTION OF SEVERAL VARIABLES TABLE 76 SBT = 0.026728, 06062057, 05937963, 03973272) Chebyquad: 2 = 4* S00) 4ay7 186-8 78-14 Lass TABLE 7.7 Chebyquad: n ~ 6° 195 238 or = onsse7, 0711289, 0833129) feo 409-2 [NUMERICAL RESULTS AND COMPARISON With OrHER METHODS TABLE 7.8 CI “HT 008 o.s00000, 0735 Thebyquad: 2 — 8° ™ feo 0,0386176982859 0.917 124813073, 0.0109131813978 o.0102s60260995 10.0093337335931 ©.0071908895069 10.0049952481593 0,0044832513468 o.00s7940816125 0.0038300722159 0.0035260968747 ©.0035191392605 0.0035180637576 0,0035176361629, 0.0035168737890 0.0038168737290, 0.0035168737288 19300, 0.266329, 0500000, 0.808910, 98684), ver 140 tunnnnzins A FUNCHION OF SEVERAL variAaLes ng. Se 7 Key: oy NUMERICAL RESULTS AND COMPARISON WITH OTHER METHODS 148 AF og, 9(0— He)) A ieereersu tar 2 30 75 vo 15 13a, po 730 DIAGRAM 7.1. Rosentrack DIAGRAM 7.2 Helix method: 1¢ method of Davies, Swi (1965); Powell’s (1964) method, as n by Fletcher oan. S007 AUMERICAL RESULTS AND COMPARISON WTH OTHER METHODS 161 alos a EEE pee eeag eee ey 1 mo a a a) 7; DIAGRAM 7.3. Singular : DIAGRAM 7.4 Chebyauad, n= 2 152° AMINIUIZING A FUNCTION OF SEVERAL VARIABLES Cr ee 209,06) fie!) -s ° 25 Ea 75 100 «125 7m DIAGRAM 7.5. Chebyqvad, n= 4 . He NUMERICAL RESULTS AND COMPARISON WITH OTHER METHODS 700) 200 ae a0 DIAGRAM 7.6 Chebyquai, n= 6 Es 7 152 194 suOIMeING & FLINCTION OF SEVERAL VARIABLES cop. 7 ° vo mC OSC ™ DIAGRAM 7.7 Chebyquad, » = 8 Section 8 CONCLUSION Powell (1964) observes that, wi new. rections (Section 3) be accepted less often as the number of variables increases, and the quadratic "2 property of his basic procedure is lost. Our aim was to avoid keep the quadratic convergence property, and ensure that the search directions continue to span the whole space, while using basically the same method as Powell to generate conjugate directions. nerical results given in Section 7 suggest that our a is faster than Powell's, and comparable to Stewart's, if the criterion is the See. 9 AN ALGOL W PROCEDURE AND TEST PROGRAM 185 Also, our algor Jems like Watson breaks down because of Rosenbrock and Singul algorithm wi hm Keeps on perfor sctions, the si coordinate seare! joved. Since jogonal di convergence of the method of our algorithm to a local converge to the (unique) conver, and satisly ited pact for 7 ay be very impori algorithm converges ve definite at easy (0 prove Zook wre described in Brent (1971¢), Section 9 AN ALGOL W PROCEDURE AND TEST PROGRAM 756 MINWIMING A FUNCTION OF SEVERAL VARIABLES chp 7 1s (LONG EAL VALUE Te MACHEPS, HE HAL RATE OF CONVERDENCE SERVE THE COMMENT UN FE me nee pur tnrenmeotare A-AND SCALE FACTORS ARE ALSO bX AME PRINTED AFTER EVERY FEW LINEAR keruaus TMITIALIZATICN MUST SE. DO THe. BRICEDURE” LS. RACH SraTenenTs. ND. ‘neuer eRow iW2 qurpur WEPSe ME ASSUME THAT ‘DOES THEN wacHeDs. MUST 10 FLOATING SDTNT UNDERPLOM. THE PROCEDURE HINEIT (161 bent commen covus & The URTHUGOWAL MATRIX V SUCH 1 HERE UTS ANCTHER ORTHOGONAL MAT 1F Eco THEN LonGsuRES? Elbe “Lowesan: Se0.8 [AN ALGOL W PROCEDURE AND TEST EROGRAM 167 FUNTIL N00 F t= F 6 AatKeTPSARER LITE ND TRANSFORM ASCE fe ARUN accuse Foes m1 urn, 1 90 SeSLEtLy 1-695 THEN Goro TeSTECCHVERGENCE: IF ges (QUC-1) reeees Tew abTO CANCELLATION 168° mmuanzING A FUNCTION OF SEVERAL VARIABLES eu 60 TO TESTFSPLITTIN Gowvensence: Teneo TaN enn MFT PROceOURE saRts SEGIN COWMENT: RUINS GF rows f= ol) enor nHes vores ae yess no THEN ABSUHIY Fess g THEN. else of SCENDING ERDERE rere tune wom Sees ‘AN ALGOL W PROCEDURE AND TEST PROGRAM 159, Guosaly AND ESTIMATES THE DF FAT THe ME USED ONLY FoR NG LOGLEX = FIN sunt (180) VALUE 5 LONG REAL ARRAY arene v coLuMN ey caLuNns 1 ow @ bo 0) THEN N ELSE 64K? STRINGES2) VALUE 5; LONG REAL ORRAY Vie14 INTs THE ING S AND NAVECTOR vs (TIL 18 0D WRETEONROUNOTOREAL (VK1I9) INTEGER VALUE Jy WL ING REAL VALUE LONG AEAL VALUE: SUaLeaN VALUE Fx wINIMIzes F FRUM x IN thE DERECTION UMLESS JeLy WHEN & QUADRATIC. SEARGN TS, GONE TH THe PLANE DEFINED BY dy QL.aND” x, RETURNED AS The. DI xt ano 2 SES AND ALTERS xy Px, Ted Cr Uses vsti ants USES Ae Ne Ty Mey Vining macneess LONG SEAL PROCEDURE FLIN (LONG REAL VALUE LI (COMMENT: THE FUNCTION OF ONE VARIABLE L“KHICH TS NUNUMIZED BY PROCEDURE. RINe BEGIN LENG REAL ARRAY a 169 anunnzinG A FUNCTION OF SEVERAL VARIABLES hap.7 UerH WOO et deoawectysacwoncny Ten counreRs (02 € mncsersis hoo 8 i= 5 + xtLMHees em os FL EnDE ew te FL ENDS FUN At ANOTHER POINT AND. ‘OF Secono peRtvartves else IF K 5g THEN 9 ELSE 0: “ ce SMALL THEN 02 t= SHALLT x1 t= Sx1 enDe See 8 AN ALGOL W PROCEDURE AND TEST PROGRAM Yor comment Fu Linean seaecn Sut no rus anana.ie tes >o = t + eteve PRocevuae ai BEGIN comme! Loves 0% re wHNIMUR aLcns & CURVE anes ben exp END unos MINIMHRING A FUNCTION OF SEVERAL VARIABLES chap. 7 COMENTS WINIMIZE ALONG FIRST DIRECTION WU Ui 2+ DULy Se Fy PALS F's c= 0" THeN ean ist <2 (o.oe000) Foal += 2"burie m0 Bt FOR k= 2 UNTIL W BD beer STUNT N 90 very ee x tte au er > ore AEGI Gowers RanoOm STEP TO GET OFF RESOLUTION VALLEYS Fost fe 1 uN © NUMBER ONIFURRLY DISTREBUTED. THAT ANY UNIT TSU IZATLON OF HE SEMERATIR HAS ALREADY. BEEN DOM INTICN OO ACI) fe xtd) © 38 Bie Besta’ su" ehawents se raver 6070 LT be a am pion oe braecrioNs: Trubs cay heh 7 LoraceLors 1 Lor < Los THEN LoT += Lt FoR L f= 1 UNILL Won 12 22 12 + xcLIeas 12s waecenosaartray + Ti Comments See Tf Ste LENGTH EXCEEDS HALE THE TOLERANCES END sen 8 ‘AN ALGOL W PROCEDURE AND TEST PROGRAM 168 COMMENT: Tay GUADRATIC EXTRAPOLATION IN CASE AE ARE STUCK NEN OLAECTIONS™, Vy Ny NIE FHL UNTIC N00 VOyat 22 sevetysy N COMMENT? SCALE axeS Te TRY Ta RELUCE CoNDITEON Nnoens 1 viTiL 90 eT UNTION 99 St t= stovt ihe THEN att See eS > ree rMeN Lyre 9 Le sree BEGIN Ste 17Sca04 coment? “raansouse v Fowl z= 2/ONTIC A 00 coment =5e80 fl ie nt tHbr S00) Stes 4 viusineers SPvGse SMALL THEN VLaRGE ELSE LONSDE ese. wstay vecoRtyt TE paIM > 1 THEN VECPRINT (Et CeAVAL My Oy NI Tf DRIN > 3° THEN MATPRINT {EIGENVECTORS OF iy Va Ne NOE Chwwents” Go'8ack 10 MAIN (90s Sort LB: TP PRIN > 0 THEN VEGPRINT (4x E595 Ky NIE Ep Pearse PROCEDURE RANDOM KETUANS 4 LENE REAL RANDOM NUMBER U 164 muinr2iNG 4 FUNCTION OF SevERAL VARIABLES chp. DUSTAISUTED IW (O91) INCLUDING 0 BUT NOT wert fiw BEFORE Tor FIRST CALL. TO. KANGDM, a ee mS DF Ra + Aan2 AND aaNS HUST BE GLOBAL THE ALGORTTUM RETURNS X(N) /20%S6y HHERE KOM tne) a eda) (HOD 2085: Since 1s xe xeetey tS PAIMITIVE (MOO 2), THE PERTUD TS Se SEE KNUTH P 264 Shy 4686 ReRSt 2eni2T <1 > Lo swerve LONG REAL RANLE INTEGER RANZ; LONG HEAL ARRAY RAN Ranz te 12 Rew S190 + Eoaan2 S & on FoR : FANS (RaNe) te Rat TrimaNy < OL THEM RANL © O¥5L Else RAN * 0451 Rant + 65. END RANDDNG LONG AFAL PROCEDURE ROS {LONG REAL ARRAY KC) INTEGER VALUE NIE LOOLFLUK(2) = KCLISAZIEF2) + LOKG REAL PRUCEOURE SINGKLONG MEAL ARRAY XLe):INTEGER VALUE ND MENT: See. POWELL. Tye wgeexczy te OtetxtLy Kear 2 + (xz retex anes LONG REAL PROCEDURE HELIACLUNG REAL AKAAY XC) INTEGER VALUE 81 COMMENTS. “SEE FLETCHER € POWELL (1963) 2g men T xis nee LONG geAL PROCEDURE cu! Toouetnizy = xt1) 983} LONG REAL PROCEDURE WATSON (LONG REAL ARRAY xt=IE sees Love aeau PRocenune ‘AN ALGOL W PROCEDURE AND TEST PROGRAM 168: WUTEGER VALUE Wy resem vate Gowen SEE sein (one aeat es otra, raLuss WOOLEAN EVENS TENG RGAE ARbay ve try rmts CHFOYQUAD CLUNG REAL ARRAY xC+) beta 25 ese on Eno cresvousps LONG REAL PROCEDURE POWELL «LONG REAL AKKAY x0#) Tnrecea VALUE Gownents "See Fi ata s te OL ELSE LONGEAP(= 40x LONG REAL PROCEDURE Hoi Comments SEE HCcokw REAL ARRAY x¢e Turecck VALUE N)s COMMENTS COMPUTES xY.Rexy WHERE A 15 THEN BY W HILBER seorn ventas SE ERARNEY 1909}, PPL Sty ce Eeears 166 HMIMNNLING A FUNCTION OF SEVERAL VARIABLES chap? See.9 AN ALGOL W PROCEDURE AND TEST PROGRAM 167 xan & (40) THEN © Ole FMIN t= we RMAT OF INTEGERS: ABOLIC vA BIBLIOGRAPHY also some references dei converting const References on cluded, and we Jacoby, Kowali arily been assigned the year 1971 1 and integer progra 169 170° reuiocRAPHY ‘Andrews, A. M., 1969, The calculation of orthogonal vectors, Comp. J. 12. 411 as) i Armijo,L., 1966, Minimization of itz continuous first partial tives, Pacific J. M sof 2, 2-315.) 4.5, 193. (7.5) Math. 15, Baer, RM. Baker, C. T. H. 315-319, 1962, Note on an extre 970, The error in ard, Y., 1970, Compari an rameter estimation problems, STAM Bard, Y., see Greenstadt (1970), ACM 12, 266-268, (7.1) b, G.H., and Saunders, M. A., 1970, ng, Report CS 162, Computer aver HE, Bikes and Graham, $1948, ALGOL W lage des meena C59 (eed 25 CS 110 wth Es Serta, 109, Camper Depts Stanford Une (58,6679) reed fo Koyo. "mimeton Univ 2.79) esl, EM. 1 1865, Marea pro eb Becker, 8 se Bast, eskr, and Graham (1968), Beightler, C. S., see Wilde and Beightler (1967). Mand Pike, M,C, 1966, Remark Comm 4c809, 686.0.) Belinan, R. Ey 198%, Bymom Rowe. (13) RF and Drv, S198 Api dic por Unis Press, Prine, Now Joey. 184.) 1M, | Sz xy, New York, ne fn practice, W '78(E4), Direct Seat programming, Princeton Univ, Press, Princeton, 1g, Princeton. aner. Math. 7, Ben! Berman, G., 1969, Lattice approximations to the minima of functions of several variables, J. ACM 16, 286-294. (7.1) ARijSrek, A., 1967, Solving linear least squares problems by Gram-Schmidt ortho gonalization, BIT, I-21. (7.1) Bjorck, A., 19676, Iterative 257-278, (7.1) Bjorck, A., 1968, Iterative refinement of 8-30. (7.1) Boothroyd, J., 1965a, Algorithm 7, MINIX, Conip. Bulletin 9, 104, (5.3) Boothroyd, J.. 1965b, Certification of Algorithm 2, Fi Bulletin 9, 105. (5,3) Bowler, H., Martin, R. S., Reinsch, C., and Wi 1968, The QR and QL algorithms for symmetric m |. 293-306. (7.4) Bos, G. E. P., 1957, Evolutionary operations: a method for increasing productivity, Appl. Star. 6, 3-23. (7.1) Box, M, J., 1965, A new method for constrained optimizatior with other methods, Comp. J. 8, 42-52. (7.1) Box, M. J., 1966, A comparison of several current of use of transformations in constrained problems, C 11,79) Box, M. J,, Davies, D., and Swann, W. H., 1969, Non lates, {CI Monograph No. $, Oliver R wement of linear least squares 5 linear least squares solutions 11, BIT 8, ace Sear Comp. A note on the Dav ‘method for solving nonlinear ws Report RC 3506, IBM T. J. Watson Research Lab, Yorktown ‘New York, to appear in JBAT. cf Of algorithms for solving systems 3725, IBM, Yorktown He algorithm for unconstrained. Brown, K. M., and Dennis, JE. 1968, On Newton-like iteration functions: general convergence and a specific algorithm, Numer. Marh. 12, 186-191. ra) Brown, K. M., and Dennis, J. E, i97la, On the second order convergence Brown’s method for solving si near equations, to appear. (7.1) Brown, K. Mand Dennis, JE, 1971b, Derivative-ree analogues ofthe Levenberg- Marquardt and Gauss algorithms for nonlinear least squares approximation 0 appear in Numer. Math, (7.1) rm sisuocearHy Brosden, C. G., 1965, A 12 nonlinear simultaneous ‘equations, Mazh. Comp. 19, $77-593, (1.1) Broyden, C. G., 1967, Quasi-Newton methods and their appli imization, Math. Comp. 21, 368-381. (7.1, 7.7) Broyden, C. G., 1969, A new method of solving nonlinear simultaneous equations, Comp. 5.12, 94-99. (7.1 Broyden, C. G., 1970a, The convergence of a class of doub! algorithms, Pacts Land I,J. dast. Maths. Apps. 6, 16-90 a1 Broyden, C.G., 19708, The convergence of single-rank qui Math, Comp. 24, 365-382. (7.1) Buchler, R. J., sce Shah, Buehler, and Kempthorne (1964). 1d Golub, G. H., 1965, Linear least squares solutions by Hou transformations, Numer. Math. 7, 269-276. (7.1) Buys, J. D., see Haashoff and Buys (1970), Cantrell, 1. W., 1969, Relation betwee Newton methods, memory gradient method and the px. and Apps. 4, 67-11, (1.1) 840, Sur les fonctions ‘Oeuvres completes, Gaul snousko, F. L., 1970, On opt 0a). A (Cla, N.A., Cody, W. J, Hillstrom, K.E., and Thieleker, E.A. ‘Statistics of the FORTRAN IV (ID) library for the 18M System!360, Argonne Nat. Lab, Report ANL-7321. (6.3) Cody, W. 1, sce Clark, Cody, Hillstrom, and Thieleker (1967). Berlin (translation by H. Oser, Academic Press, New York, 1966). (3.1) 1A. R198, 4 comparative study of nonlietrprogranming codes, IBM. New York etme Center Report 3202983, (71,77, 19) Conte, S, 9, se Brown aed Conte (1967) Cooper se Kelak and Cooper (1963) Cox MG 1970, bracketing fecigue or computing 20 function, Comp. cris iol-o2, a4 Cage. FE. and Levy, A.V. 168, Study of a upermemory ardent method fo “the minimization of fancons, J Open Try and Ap 191-1) aieuiocnarny 170 Crowder. H., and Wolfe, P., 1971, Linear convergence of the coniugate gradient tthod, Report RC3330, IBM T. J. Watson Research Lab, Yorktown Heights, New York, to appear in IBM Jour. Res. andi Dev. (74, 74) Cutty, H., 1944, The method of steepest descent for nonlinear minimization probe lems, Quart. Appl. Math, 2, 258-261, (7.1) |. W.. 1967a, The conjugate gradient method for ‘operator equations, SIAM J. Numer, Anal 4, 10-26. (1 |. J. W,, 1967, Convergence of the conjugate gradient method with computa Wenient modifications, Numer. Marh, 10, 125-131. (7.1) J. W., 1970, A correction concerning the convergence rate for the conjugate gradient method, S14 J. Numer. Anal. 7, 277-280. (7.1) Davidon, W. C., 1959, Variable ization, Argonne Nat, Lab. Report ANLL-990. (5.7, 7.1) rear and nonlinear 1m for minimization, Comp, J. 10, 406-410. Davidon, W. C., 1969, Variance algorithms for minim 4,79) Davies, D., see Box, Davies, and Swann (1969), Matthews and Davies (1971), ‘Swann (1968), Davis, P. J., 1965, tnterpotarion and approximation, 2nd ed., Blaisdell, New York and London. (6.2) ion, in Fletcher (19698). Dejon, B., and Henrici, P. (eds), 1969, Constructive aspects of the fundamental ‘theorem of algebra, Interscience, New York Dekker, TJ. (ed), 1963, The series 4P200 of procedures in ALGOL 60, The Mathe- matical Centre, Amsterdarn, ing a zoro by means of successive linear interpolation, in 169). (1.2, 44,4.2, 43,44) 324-330. (7.1) ie local convergence of Brosden's method for nonlinea is, Tech, Report 69-46, Dept. of Computer Science, Cornel Dennis, J. E., 1969b, Ow the conrergence of Broyden’s method for nonlinear systems of equations, Report 69-88, Dept. of Computer Science, Cornell Univ. appear in Math. Comp, (7.1) see Brown and Dennis (1968, 197la, 6). E. W., See van Wijngaarden, Zonneveld, and Dijkstra (1963). Dixon, L. C. W., 1971a, Variable ns: necessary ane sufficient cond ‘ions for idestcal behaviour on now-guadratic functions, Report 26, Nuse Optimisation Centre, The Hatfield Polyte Dixon, L. C. W., 19716, appear. (7.1) to 1968), Springer Nerlag, Bet nn, B., 1970b, see Spmposi inger-Verlag, Betlin. (7.1) Dreyfus, S. E., so Bellman and Dreyfus (1962) Dyer, P,, see Hanson and Dyer (197). B., see Dold and Eckman (1970s, b) Ehslich, L, W., 1970, Eigenvalues of symmetric five-diagonal ms aa) Evans, J. P., and Gould, F. J, 1970, Stabi 18.7.) Fiacco, A.V a Fiacco, A. V., 1969, A general regulatized sequential unconstrained minimization technique, SI4M J. Appl, Math. 17, 1239-1245, (7 Fiacco, A. V., and Jones, A. P., 1969, Generalized penalty m spaces, STAM J. Appl. Math. 17, 996-1000. (7.1) Fiacco, A. V., and McCo Flanagan, P.D., Oper. Res. 9, 184, trical investigation inear regression problems, 265-284. (6.4) R., 1965, Fu review, Comp. 5.8, 33-41. (12, T, 73, 75,1. 19) Fletcher, R., 1966, C 251, Conun. ACM 9, 686. (7.1) 1g derivatives—a of systems of non-linear equa 392-399. (7.1) Fletcher, R., 1968b, Programming under tinea of inequa ICI Manageme! Fletcher, R. (ed), 1969, Opi Fletcher, R., 1969, 4 class of ‘and convergence properties, Report TP 386, AERE, Harwell, Englan Fletcher, R., 1969, A review of methods for unc Fletcher (1969a). (71, 7.5) Fletcher, R., 19694, A technigue for orthogonalization, J. Inst. Maths. Apps. 5, 162-166. (7.5) Fletcher, R., 1970, A new approach 317-322. 7.1) er, R., and Powell, M. J. D., 1963, A ‘minimization, Comp. J. 6, 165-168. (7.1, 7.7.7.9) Fletcher, R., and Reeves, C. M., 1964, radients, Comp. J. 7, 149-154, (5.4, 7 a) imiaation, in strained 0) o variable metric algo Fiet sieuioGRAPHY 175 of linear algel "-Hall, Englewood Chifs, New Jersey. (7.2) Fox, L., Henrie, P. eigenvalues of el Francis, J, 1962, The QR tr mn, Comp. J. 4, 265-271. (7.4) stein, F., and Roth, B., 1963, Num ‘equations, J. ACM 10, $50-556, (7.7) ‘a unitary analogue to the LR transforma- corpon Gill, PLE. Tech. Report Mi Goldfarb, D, tion und 739-164. ( ically sable form ofthe simplex ale Teddington, England. (7.1) iable metric method to ma nis, STAM J. Appl. Ata Goldfarb, D., 1970, A family of variable-metric methods derived by neans, Math. Comp. 24, 23-26, (7.1) quadratic Res. Mem. 95, Princeton Univ, (7. Goldstein, A. A., 1962, Cauchy's method of 150. (7.1) . On steepest desce Price, 3. F. 84-189. (7 Foldstein, A. A., and Prive, 1. F197 25, 569-574. (6.1) Golub, G. H., 1965, Numerical methods for solving linear least squares problems, Numer. Math. 7, 206-216, (7.1) Golub, G. H., see Businger and G. 5 (1970). , W., 1965, Calculating the singular inverse of a matrix, STAM J. Numer, Anal. 2, 205-224. (1.4) Reinsch, C., 1970, Singular 14, 403-40. (7 inima, Math, Comp. (1965), Bartels and Golub (1969), Bartels, il pseudo- n and least 176 BieuIocRAPHY approximation of ce) C3 72, Compu Golub, G. sfuares sol Gould, F.4., see Evans and Graham, S..see Bauer, Heeker, and Gran Greenst 1967, On ‘Comp, 21, 360-367. (1.2.7.1) ve refinement of iable metric methods, Math. Comp. 24, 1-22 (appendix by Y. Bard) Gregory, R. T.. and Karney, D. L., 1969, 4 collection of 0 ices far testing Gross, 0. convex function, MAC (now Math, Comp.) Haarhoft, P. C., and imax search for a zero of & jon of a inear constrain . 178-184, if, Addison-Wesley, Reading, Massachusetts. (7.1) Hanson, R. J., 1970, C quadratic programming problems: linear ineq and equelity constraints, Tech, Memo. 240, JPL, Pasadena. (7.1) Hanson, R. J, and Dyer, P., 1971, A computational algorithm for sequential Comp. J. 14, 285-290, (7.1) Hartley, H. O., te modified Gauss-Newton method for fitting of nonlinear regression functions by least squares, Technomezrics 3, 269-280. (7.1) Henrici, P., see Dejon and Henrici (1969), Fox, Henrici, and Maler (1967) and James (1967) trom, and Thieleker (1967). Himsworth, P. R., see Spendley, Hext, and F tn (1962) Hoare, C., see Wirth and Hoare (1966). Hooke, R., and Jeeves, T. A., 1961, Direct search solution of numerical statistical problems, J. ACA 8, 212-229..(7.1) Householder, A. S., 1964, The theary of matrices in numerisol analysis, Blaisd New York. (7.4) Householder, A. S., 1970, The McGraw-Hill, New York. (.1) uang, H. ¥., 1910, Unified approach to quadratically convergent algorithms for Finction minimization, J. Opizn. Tary. and Apps. 5, 405-423. (7.1) and eal earment of a single nonlinear equation, sreuocRariy 177 Isaacson, E., and Keller, H. B., 1966, Analysis of York. (2.2, 2.4) Jacoby, $. L.S., Kow: ‘aptimization proble 64,7. James, F. D., see Pike, Hill, and James (1967). Jarratt, P., 1967, An iterative method for locating turning points, Comp. J. 10, 82-84. (1.2, 3.1, 3.2, 36, 3.7, 38, 3.9, 5.1) Jarratt, P,, 1968, A numerical m 35, (12, 3.1, 32, 3.6, 39) Jeeves, T. A., see Hooke and Jeeves (1961), merical methods, ley, New ‘methods for nonlinear few Jersey, to appear. hod for determi 1 points of inflexion, BIT 8, ft iterations for the solution of bownds far the zeros, Report CS 138, er Sci, Dept, Stanford Univ. (3.5) Johnsen, §. E, J,, see Allran and Johnsen (1970), Johnson, I. L., and Myers, G. E., 1967, nd cuble fen Spacecraft Center, Houston. (5.7) Johnson, 8, M., 1955, Best exploration for Report RM-1590. (5.3) Johnson, S. M., see Gross and Johnson (1959), Bk Drevfus (1962), Jones, A. P., 1970, Spiral—a new algorithm for non-linear parameter estimation . using least squares, Comp. J. 13, 301-308, (7.1) Jones, A. P., see Fiseco and Jones (1969), nds, Report No8-18823 (NASA), Manned is Fibonacelan, RAND Corp. Moscow (trans York, 1964). (3.1) Kaplan, J. L., see Mitchell and Kaplan (1968), Karey, D, L,, see Gregory and Karney (1969), Karp, R. M., and Miranker, W. L., 1968, Par J. Comb, inimax Search for a ma ues, Comm. ACM 7, 38, (6.1) . Buehler, and Kempt inno andl Kettler (1969). Kiefer, J., 1953, Sequential minimax search for a maximum, Proc. Amer, Math. Soe. 4, 503-506. (1.2) 1987, Optimal sequential search and approximation methods under imptions, S/AM J. Appl. Math. 5, 105-136. (6.7) ene (1964). 178 sie.socRseRy 1969, The art of computer progran a9) Reading, Massac E.G,, 1955, Sol ik, 1. S,, and Osborne, M. R., 1968, Methods for unconstrained optimization Elsevier, New York. (1.2,2.6,3.7, 5.3, $4,7.1, 77,79) 1d Ryan, D. M., 1969, A new method for con- and Pizzo (1971 sions of Fibonaccian search to nonl SIAM J. Control 6, 288-265. (5.3) Krolak, P. D., and Cooper, L., 1963, An extension of Fi bles, Comm. ACM 6, 639. (6.7) onaccian search to several thms for the solu! of the complete rang: Neuere Verfahren eal methods of ‘Academic Press, New York. (7.1) Lancaster, P., 1966, Error analysis for the Newton-Rapbson method, Num Math, 9, 35-68. (5.2) Lapidus, L, see Goldfarb and Lapidus (1968). Lavi, A., and Vogl T. P. (eds), 1966, Recent advances in optimization techn ey, New York. (7.1, 8) Lawson, C.L., 1968, Bibliography of recent publicarions in approximation theory with ‘emphasis on computer applications, Tech, Mera, 201, IPL, Pasadena, Leon, A., 1966, A comparison of eight known opt Vogl (1966). (7.7, 7.9) Levenberg, K.A., 1944, A method for the solution of certain non-linear problemsin Feast squares, Quart. Appl. Math. 2, 164-168. (7.1) Levy, A. V., see Cragg and Levy (1969). ing procedures, in Lavi and FA, 1968, Co Report 23, 408. ( Lootsma, F. A. 970, Boundary properties of penalty Junetions for constrained m by vector space methods, Wiley, New York Hyperbolic pairs in the method of conjugate gradients, 1263-1267. (7.1) STAM I. Appl. Mat aibuiocnary 170 Luenberger, D. G., 1970, The conjugate residual method for constrained tion problems, S/AMJ. Numer. Anal. 7, 390-398. (7.1) Magee, E.J., 1960, An empirical of procedures for locating th , Lincoln Lab, Report 22G-0046, peak of a multiple-peak vegress a2 18, O. L., 1969, Nonlinear programming, McGraw-Hill, New York. (7.1) D. W., 1963, An algorithm for least squares estimation of non parameters, J. STAM 11, 431-441, (7.1) 1 R.S., see Bowdler, Martin, Reinsch, and Wiki 1 R.S., Reinsch, C., and Wilkinson, J. H., 1968, Housel mn of a symmet Matthews, A.,.and Davies, D., 1971. A comparison of modified New ined optimization, Comp. J. 14, 293-294, (7.1) ick, G. P., 1969, The rate of convergence of the reset Davidon variable metric ‘method, MRC Report 1012, Univ. of Wisconsin. (1.2, 7.1, 7.8) 1ec0 and MeCormick (1968 ‘MeCormick, G. P., and Pearson, J. D., 1969, Variable metric met strained optimization, in Fletcher (19698). (1.2, 7.1, 7.7, 7.9) Mead, R., see Nelder and Mead (1968). Meinardus, G., 1967, Approximation of fumetions: theory and numerical methods, Springer-Verlag, Berlin, (3.7) ‘Mendelsohn, J., see Flanagan, Vitale, and Mendelsolin (1969) Miele, A., and Cantrell, J. W., 1969, Study of a memory gradient method for the ” minimization of functions, J. Optan, Tiny. and Apps. 3, 459-470, (7-1) Milne, W. E., 1949, Numerical eafeulus, Princeton Univ, Press, Princeton, New Jersey. (2.2) Milne-Thomson, L. M., 1933, The caleuhs offi 22 Mirenker, W.L., 1969, Pari IBM Jour. Res, and Dev, 13, 297-301. (4.5, 5.7) Miranker, W. L., see Chazan and Miranker (1970), Karp and R. A., and Kaplan, J L., 1968, Nonlinear constrained optimiz non-random complex method, J. Res. NBS (Er Moler, C. B., see Forsythe and Moler (1967), Fox, Hetiei, and Moles (1967). ferences, Mace Nan, London, hods for approximating the root of a function, ternational syn ling, Princeton, New Jersey, 1967. (7.1) Murray, W., see Murtagh, B. A., and quadratica 13, 185-194, (7.1) ‘and Davidon methods, ‘4. Optzn. Thy. anu Apps. 2, 209-219, (7.1) 180° s1eLioGRAeHY Myers, G. E,, see Johnson and Myers (1967). Naur, P. (ed.), 1963, Revised report on the ACM 6, 1-17. Nelder, J. A., and Mead, R., 1965, A simplex method for funetion mi guage ALGOL 60, ation, n of the maximum on unimodal surfaces, J, ACM 1970), Ortega, J. M., 1968, The Newton-Kantorovich theorem, 75, 658-660. 3 Press, New York. @. ‘on Powell's method for calculating orthogonal vectors, as) Osborne, M. R., see Kowalik and Osborne (1968), Kowalik, Osborne, and Ryan (1969), ear programming, . Canberra. (7.1) Osborne, M. R., and Ryan, D, M., 1971, On penalty function methods for nonlinear programming problems, J. Math. Anal. Apps., to appear. (7-1) Ostrowski, A. M., 1966, Solution of equations and systems of equations, Academie Press, New York (2nd edition). (1.2, 3.1, 32, 3.6,3.7, 4.2, 5.1.7.1) Ostrowski, A. M., 19672, Contributions to the theory of the method of steepest descent, Arch. Rational Mech, Anal. 26, Ostrowski, A. M., 1967b, The round-off stabil Mech. 47, 77-82, (5.2) lt, K. J, 1965, methods, BIT 5, 284. (5.3) Overholt, K. J., 1967, Note on Algorithm 2, Algorithm 16 end Algorithes 17, Comp. 1.9, 414. 5.3) Palmer, J. R,, 1969, An improved procedure for or n Rosenbrock’s and Swan's direct sei 9. (7.5) Parlett.B. N., 1971, Analysis of 13, 197-208. (7.4) Pearson, J. D., 1969, Vs 178. 0.0) Pearson, J. D.. soe MeCe 1970, A new m mizing a sum of s tients, Comp. J. 13, 418-420, (7.1) Peters, G., and Wilkinson, J. H., 1969, Eigenvalues of 4x = ZBx with band sy metric 4 and B, Comp. J. 12, 398-404. (1.2, 4.1.4.2) terations, Z. Angew. Math. cr y in the Fibonacci andthe golden section search 0% the search vectors ‘optimization methods, Comp. J sorithms for reflections in bisectors, SIAM Review ‘ble metric methods of minimization, Comp. J. 12, 171- s1auiocRAcny et Pierte, D. A., 1969, Optimizat Pietraykowski, T., 1969, An exact potent J. Numer. Anal. 6, 29. 7-1) Pike, M.C., H James, F. D., 1967, Note on Al 5.9, 416. (5.2) » 1967, Algorithm 2, Fibonacci Search, Comp. Bulletin ith applications, ley, New York. (5.4) tethod for constrained maxima, SLAM Pike, M, C. 8, 147. (5.3) Pike, M. C,, see Bell and Pike (1966), Pinner, J., see Pike and Pixner (1967), Pizzo, J.T, see Jacoby, Kowalik, and Pizza (1971). Powell, M. J. D., 1962, An iterative method for tion of several variables, Ci Powell. M. 1. D., 1964, An efficient method for finding the minimum of a of several variables without calculating derivatives, Comp. J. 7, 135-162, ( 73,15, 16,71, 78,79) 965, A method of minimizing a sum of squares of not nat calculating derivatives, Comp. J. 7, 303-307. (7.17.2) '966, Minimization of functions of several variables, in Walsh nis J.5, 147-151. (1.7, 7.9) of a fune- n of orthogonal vectors, Comp. J Powell, M. J. D., 1968, 4 FORTRAN subrou eqations, Report R-$947, AERE, Han Powell, M. J. D., 19692, 4 hy AERE, Harve Powell, M.J. D., 19696, On the convergence ofthe valable metric algorithn, Report TP 382, AERE, Harwell, England. (7.1) Powell, M. J. D., 1968¢, A theorem on rank one modifications to a matrix and its verse, Comp, J. 12, 288-290. (7.1) Powell, M. J. D., 1970, A survey of numerical methods for unconst tion, SIAM Review 12, 79-97. (7.1) M. 1. D., 1970b, A new algorithon for TP 393, AFRE, Harwell, England. (7.1) M. J. D., 19702, Rank one adie (1970). (7.1) Powell, M. J. D., 1970d, 4 FORTRAN minimization, es of the objective function, Report R-6469, AERT, re for solving systems of nor-lincar land, 7 id method for nonlinear equations, Report TP 368, England. (7.1) red optimiza- Pow neonstrained optimization, Report hods for unconstrained optimization, in tization, Report TP Powell, M. J. D., see Fletcher and Powell (1963). 192 wwe.iocRaPHY Price, J. F., see Goldste Quandt, R. E., see Goldfeld, Quandt, and T nd Price (1967, 1970). er (1968). Vol. 2, Wiley, New York |, L. B., 1966, Convergence of the Newton process to multiple solu Math. 9, 23-37. (7.1) I, L. B., 1969, Comput New York. (7.1) Ralston, A., 1963, On differer 21,26) A., 1965, A first course in numerical analysis, McGraw-Hill, New York. (12,26) Ralston, A..and Wi 1960, Mathematical methods for digitalcomputers, Vol. 1, Wiley, Ralston, A.,and Wilf, H-S. (eds), 1967, Mathematical methods for digital computers, Vol. 2, Wiley, New York Ramsay, J. O., 1970, A fa 413-417. 7.1, hods for optimization, Comp, J. 13, Some mumericel experiments on Zan strained minimization, Working Paper ICSU 319, Univ. of London. (7.3) Rheinboldt, W. C., see Ortega and Rheinboldt (1970). Rice, J. R., 1970, Minimization and techniques in nonlinear approximation, SZAM lies in Numer. Anal. 2, 80-98. (7.1) P. Lu, 1968, e-caleulus, Report CS 105, Stanford Univ. (1.2, 5.3 1970, Bounds on a polynomial, J. Res. NBS 74B, 47-54, (1.2, 6.1) Robbins, H., 1952, Some aspects of the sequer nis, Bull mer. Math. Soe. $8, 827-836, (1.2) Rosen, J. B., 1960, The gradient projection method for nonl Pact 1: Linear constraints, J. SIAM 8, 181, (7.1) Rosen, J. B., 1961, The gradient projection method for non! Past 2: Nonlinear constraints, J. STAM 9, $14, (7.1) Rosen, J. B., and Suzuki, S., 1968, Construs problems, Comm. ACM 8, 113. (7.1) Roseabrock, H. H., 1960, An automatic method for finding the greatest or least value of a function, Com 75-184. (68, 7.5, 7.7, 7.9) Roth, B., see Freudenstein and Roth (1963). Ryan, D. M., see Osborne and Ryan (1970, 1971), Kowalik, Osborne, and Ryan 196, Sargent, R. W. H., see Murtagh and Sargent (1969, 1970) Satterthwaite, E., see Bauer, Becker, and Graham (1968) design of exper ar programming, 1ear programming. hear programing test Saunders, M., see Gol (1970). Schriider, E., 1870, Uber unendlich vi ungen, Math. Av, 2, 317-365. 3, Schubert, L. K., 1970, Modification of a quasi-Newton method for tions with a sparse Jacobian, Marh, Comp. 24, 27-30. (7.1) RJ. and Kempthorne, ©., 1964, Some algorithms for ion of several variables, STAM J. Appl. Math. 12, 74-92. Saunders 169), Bartels, Golub, and Saunders le Algori ‘aur Aufldsung der Gleich- inear equa izing a fur ay Shanno, D. F., 1970a, Parameter selection for modified Newton met n minimization, S/4M J. Numer, Anal. 7, 366-372. (7.1) inno, D. F.. 1970b, An accelerated gradient projection method for lined STAM J. Appl. Math, 18, 322-334, (7.1) inno, D. F., and Kettler, P, C., 1989, Optimal conditioning of quast-Newton methods, Center for Math, Studies in Business and Economies Report 6937, Univ. of Chicago. (7.1) Smith, C. S., 1962, The automatic computation of maxi NCB Sci. Dept, Report SC 846/MRJ40, (7.1.73) L. B., see Golub and Smith (1967) is for fune- Spang, H. A., 1962, A review of SIAM Review 4, 343-365, (7.1) H., 1967, The damped Taylor series ‘and for solving syster Spendley, W., Hext, G. Rv implex desi 441. 7.1) Sproull, R., see Swinehart and Sproull (1970), Stewart, G. W., 1967, A modification of Davido nee approximations of derivatives, 717,18) Spa imizing a sum of squares ions, Comm. ACM 10, 726-728, (7.1) th, F.R,, 1962, Seque: AC-9, 105, (6.7) Suzuki (1965), 2 on the development of a hod of timization, ICI Ltd. Cent. last, Lab. Research Note 64/3, (1.2, 7.1, 73) ‘Swann (1960), Yet sie.ocnapHy Swinchart, D., and Spr Operating Note $7.1. 7.7) Takahashi, I, 1965, A note on the conjugate gradient method, Information Pro: cessing in Japan 5, 45-49. (7.1) ‘Thieleker, E. A., see Clark, Cody, Hillstrom, and Thieleker (1967) Tornheim, L., 1964, Convergence of multipoint iterative methods, J. ACM 1 210-220. (3.2) 1964, Iterative methods for the solution of equ fs, New Jersey. (2.2, 3.1,3.2, 4.5) Traub, J. F., 1967, The solution of transcende: (1967. B.1, 32) ‘Trotter, H. F., see Goldfeld, Quandt, ‘Teschach, H. G., see Kiinzi, Teschach, and Zehnder (1968) and Mendelsoha (1968), and Vogt (1966). Voigt, R. G., 1971, Orders of convergence ‘Anal, 8, 22-243. (3.2, 7.1) R,, 1970, SAIL, Stanford Arti fence Project ns, Prentice-Ha ns, in Ralston and Wilf iterative procedures, STAM J. Nu Ish, J. (ed), 1966, York. M,, 1965, Algori a.) van Wijngaarden, A., Zonneveld, J. A., and Difkstra, B. W. and AP230 de serie AP200, in Dekker (1 Wilde, D. J., 1964, Option seeking methods, Prentice-Hal New Jersey. (1.2, 4.5, 5.3, 87, 7.1, 7.3) Wilde, D. 4., and Beightler, C. S, 1967, Foundations of o few Jersey. (7.1) Wilf, H. S., see Ralston and Wilf (1960, 1967). Jnson, J. Hy 1963, Rownding 42, 63,72) 500, J. H., 1965a, The algebraic eigenvalue probl Oxford. (7.2, 7.4) 1965b, Error analysis of transfo the form 1-2ww#, in Rall (1968). (7.4) J. HL, 1967, Two algorithms based on Report CS 60, Computer Soi, Dept., Stanford Wilkinson, J. H., 1968, Global convergence of TEIPS Congress (Edinburgh, 1968), (7.4) Wilkinson (1969), Golub and Wilkinson (1966), inson (1968), Bowdler, Martin, Reinsch, and Wilkin 1s in aleebraic processes, HMSO, London, 1, Oxford Univ, Press, rations based on he use of wecessive linear interpolation, 11.2, 4.4, 4.2) QR algor Proceedings of Martin, Reinsch, and Wi son (1968). Winfield, D. H., 1967, Function development of ALGOL, Comm, ACM 9, 413-431. (1.1, 4.4, 5.6, 66, 7.9) Witzgall, C., 1969, Fibonaect search with arbitrary frst evaluation, Report D1-82- 0916, Roeing Scientific Research Labs. Woll, P,, 1959, The secant method for sim: ACM 2, 12-13, (7.1) Wolle, P., 1963, Methods of nonlinear programming, programming, edited by R. L.. Graves and a Wolle, P., 1969, Convergence condi 226-235. (7.1) Wolfe, P., 1971, Convergence conditions for ascent methods IT: some corrections, Review 13, 185-188. (7.1) see Crowder and Wolfe (1971), Winograd and Wolfe Zadeh, L. A. (ed.), 1969, Computing methads in opt Academic Press, New York, (7.1) W. 1, 19672, Minimizing Comp. 4. 10, 293-296, (7.1, 7.3) |, W. 1., 1967b, Nonlinear programming via penalty functions, Mem. Sci 344-358. (7.1) 71), cation problems, Vol. 2, ition without calculating derivatives, near progre s, New Jersey, (7.1) algorithms, Mgmt. Sei, 16 1. (7.1) ler, C. A., ste Kin2i ns, J ACM ik, G., 1960, Methods of feasible direct rork. (7.1) Zoutendiik, G., 1966, Nonlinear programming: a numerical survey, SIAM J. Control 4, 194-210. (7.1) ul New APPENDIX procedures zero, loca ‘The FORTRAN subro, possible, IBM 3% been tested with 2 FORTRAN H compiler on an 1897 losely as = serenon 4 rograan See Paoceoune ZERO. SECT ton ANSLATION DF THE ALGOL pROCEOURE ZERO. 0 Di-te-TOL? Ga ro 110 10 10 120 no 10 0 2 10 10 6 10 19 140 PROCEDURE LOCALMINe OMENS eTCL ba ra 49 190.6243 .OHEG-ABSCTOL¥G)) sOReIPAGELABS(OL5#5ARN)I GO TO 90) F2.U Vee FUNF VE PGP fo 0 100 0 130 to 56.12) -ANd.1S8-Ue6E.T2)) 60 10 90 IF (X.GE.MI GO TO 50 ot ¥ 60 10 160 60 T0170 10 SB Sy APPEND fOstS8%1) Go Ta 40 109 190 aprenon 20 TE UCFULGTAFMD SANDS (WoNE.KI) 60 TO 180 TEV) SANDS (VeNE=X) 0 fu HEA) GOT 10 so 70 10 Tocatn'? ex neruan A FORTRAN TRANSLATION OF OL PROCEDURE GLOMIN, 16," FOR COMMENTS EF By Ge he ACHE sYOVYLAY24¥Sy¥D4 20921422 Neeital Te (¥GsGe.¥) G0 10 10 60 10 30 GELZ24N2eRS (2260-2)) GO TO 50 DOES NOT OVERFLOW. +0-00001#FLOArIKD Wye rime APPEND "ra 40 19 Te (PegiuF.9.9) 69 10 89 As Snes 579) 63 Taga 0 13°90 so Tr (Asrce.a) Go Ta 140 ao SE se 2 8s Bia INDEX ALGOL 60, 3, 38-60, 19-80, 112-15 ALGOL W, 3, 54, 76,103, 110,137, 135-67 6, 47,78, 100-103 7, 78-79, 124 ie constant, 21-22, 32, 34 Barter function, 117 Base, 51, 92 Beales function, 138, Bos 27, 29,32, 34-35, linear, 2, 24, 49, 72,119, 141 193 ‘Convergence, order, 21-22, 28-46, 54, $7, 76,79 ‘quadratie, 8,119, 124,127, 143 rate, §, 97-1 iva at, 22, 26-28 22,50 3,22, 24, 26, $3-54, 76,119, Cov's 181, 56-57 Dekker’ algorthin, 6, 48-80, 55-57, 72 S-monotoniciy, 68 Seunimodlity, 7, 64, 68-71, 73 Desceat property, 120, Difference, divided, 4 5, Difference equation, TPA, 0. 69) 10 ference operators, 84, 105 equations, SI "s theorem, 38 "method, 3, 20, 45, 62 172, 1618 18, 119, 128 arithmetic, $1-33, 63-65, 68, Local, function, 189 16, 92-57, 123 overflow, 51 ‘onderfiow. 4 FORTRAN, 3, FORTRAN ye 27, 32-37 36-27 Greenstadt’s method, § 130, 35-36, 82 Guard dit, 92,95, ‘0 swenk, 21, 28, 35,41, $4, 76 vergence, 31-22, 28-46, S4, $7, 79 PDP 10 computer, 137 Penalty function, 117 EX 195 thods, Pyado-randon SAIL, 137 Scaling. 124, 131-32, Searching ordered fe, Singular fanstion, 138, 45, 150 rrbo1 20/08/10 3:30 PM Errata for Algorithms for Minimization without Derivatives (Prentice-Hall edition) © Page 80, line 11, "p q x (a- x)". ‘The corresponding Fortran code on page 189 is correct. Thanks to Jason M. Lenthe for finding this error. © Page 163, immediately before the line reading COMMENT: TRANSPOSE V FOR MINFIT; insert the following: FOR J := 1 UNTIL N DO V(1,J) := SL*V(I,J) END END;

You might also like