
1 Basic concepts

1.1 Continuity

f(x) is continuous at x_0 if f(x_0) and lim_{x→x_0} f(x) both exist, and lim_{x→x_0} f(x) = f(x_0). f(x) = 1/x and f(x) = 1/x^2 are not continuous at x = 0. f(x) = ln(x) is continuous for x > 0; for x ≤ 0, f(x) is undefined. Derivative-based optimization may fail if discontinuities exist. If the derivative changes sign about x_0, the algorithm may also oscillate. Reactors, heat exchangers, and pipes are only available in certain sizes; hence discontinuities arise in process design (similarly for vehicles, bus sizes, aircraft). Splines can be used to ensure differentiability.

The general problem: In general,

min f(x)  subject to  a_i ≤ g_i(x) ≤ b_i, i = 1, ..., m  and  l_j ≤ x_j ≤ u_j, j = 1, ..., n

If a_i = b_i, the i-th constraint is an equality constraint. If a_i = −∞ and b_i = +∞, then x_i is unbounded. Local vs. global minima: the solution may be on the boundary, at a boundary extreme point, or in the interior (in which case it is as if you have an unconstrained problem).

1.2 Convexity

Convex set: For all pairs of points x_1 and x_2 in the set Ω, the straight line segment joining them lies entirely in the set. A point on the straight line is λx_1 + (1 − λ)x_2, with 0 ≤ λ ≤ 1.

f(x) is a convex function if

f(λx_1 + (1 − λ)x_2) ≤ λ f(x_1) + (1 − λ) f(x_2)

Linear functions are both convex and concave, but not strictly convex/concave. If f(x) is convex, then the set R = {x | f(x) ≤ k} is convex for all k. If ≤ holds, the function is convex; if < holds (strict inequality), it is strictly convex.
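As an illustration (not part of the original notes), the chord inequality can be checked numerically on sample points; the helper name is_convex_on_samples and the test functions below are arbitrary choices.

```python
import numpy as np

def is_convex_on_samples(f, x1, x2, n_lambda=101, tol=1e-12):
    """Check f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) on a grid of l in [0, 1].
    A True result only means no violation was found on these samples;
    it is not a proof of convexity."""
    for lam in np.linspace(0.0, 1.0, n_lambda):
        x = lam * x1 + (1.0 - lam) * x2
        if f(x) > lam * f(x1) + (1.0 - lam) * f(x2) + tol:
            return False
    return True

f_convex = lambda x: x**2            # convex everywhere
f_nonconvex = lambda x: np.sin(x)    # not convex on [0, 2*pi]

print(is_convex_on_samples(f_convex, -1.0, 2.0))        # True
print(is_convex_on_samples(f_nonconvex, 0.0, 2*np.pi))  # False: sin lies above its chord
```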

If f_1 and f_2 are two convex functions, then f_1 + f_2 is convex:


f_1(λx_1 + (1 − λ)x_2) + f_2(λx_1 + (1 − λ)x_2) ≤ λ f_1(x_1) + (1 − λ) f_1(x_2) + λ f_2(x_1) + (1 − λ) f_2(x_2) = λ [f_1(x_1) + f_2(x_1)] + (1 − λ) [f_1(x_2) + f_2(x_2)]

If f(x_1) ≤ c and f(x_2) ≤ c, then f(λx_1 + (1 − λ)x_2) ≤ λ f(x_1) + (1 − λ) f(x_2) ≤ c. Convex sets can be combined to get a convex set.

For differentiable convex functions: If f is convex, then for all λ, 0 ≤ λ ≤ 1, with x and y two points in Ω,

f(λy + (1 − λ)x) ≤ λ f(y) + (1 − λ) f(x)

i.e. f(x + λ(y − x)) − f(x) ≤ λ [f(y) − f(x)]. As λ → 0,

∇f(x)(y − x) ≤ f(y) − f(x)

i.e. a linear approximation based on the local derivative underestimates the function. Previously, while defining a convex function, we observed that a linear interpolation between two points overestimates the function.

Conversely, assume f(y) ≥ f(x) + ∇f(x)(y − x). Set x = λx_1 + (1 − λ)x_2 and y = x_1 or y = x_2:

f(x_1) ≥ f(x) + ∇f(x)(x_1 − x)
f(x_2) ≥ f(x) + ∇f(x)(x_2 − x)

Multiplying by λ and (1 − λ) respectively and adding,

λ f(x_1) + (1 − λ) f(x_2) ≥ f(x) + ∇f(x)(λx_1 + (1 − λ)x_2 − x)

But x = λx_1 + (1 − λ)x_2. Therefore λ f(x_1) + (1 − λ) f(x_2) ≥ f(λx_1 + (1 − λ)x_2).

Convex functions and global minima: If x* is a relative minimum of f and there is a y with f(y) < f(x*), then on the line λy + (1 − λ)x* we have

f(λy + (1 − λ)x*) ≤ λ f(y) + (1 − λ) f(x*) < f(x*)

which contradicts the claim that x* is a relative minimum. Hence x* is a global minimum.

Convexity and the Hessian: If f is convex and twice differentiable,

f(y) = f(x) + ∇f(x)(y − x) + (1/2)(y − x)^T H(x + θ(y − x))(y − x)

Clearly if H is positive semidefinite everywhere, then f(y) ≥ f(x) + ∇f(x)(y − x) and therefore f is convex.

  f(x) is             H(x) is                  Eigenvalues of H(x)
  Strictly convex     Positive definite        all > 0
  Convex              Positive semidefinite    all ≥ 0
  Concave             Negative semidefinite    all ≤ 0
  Strictly concave    Negative definite        all < 0

Notation: f ∈ C^2.

Quadratic functions

f(x) = a_0 + a_1 x_1 + a_2 x_2 + a_11 x_1^2 + a_22 x_2^2 + a_12 x_1 x_2

Determine the Hessian and find its eigenvalues, λ_1 and λ_2.
λ_1, λ_2 > 0: minimum (valley); λ_1 = λ_2 gives circular contours, and λ_1 > λ_2 gives elliptical contours.
λ_1, λ_2 < 0: maximum (hill).
λ_1 > 0, λ_2 < 0: hyperbolic contours, saddle point.

λ_1 nonzero, λ_2 = 0: the contours are straight lines.
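A small sketch of this classification (not from the notes): build the constant Hessian of the quadratic above and classify its stationary point from the eigenvalues; the coefficient values in the example calls are made up.

```python
import numpy as np

def classify_quadratic(a11, a22, a12):
    """For f(x) = a0 + a1*x1 + a2*x2 + a11*x1^2 + a22*x2^2 + a12*x1*x2,
    the Hessian is constant; classify the stationary point from its eigenvalues."""
    H = np.array([[2*a11, a12],
                  [a12,  2*a22]])
    lam1, lam2 = np.linalg.eigvalsh(H)   # eigenvalues in ascending order
    if lam1 > 0 and lam2 > 0:
        kind = "minimum (elliptical valley; circular if eigenvalues are equal)"
    elif lam1 < 0 and lam2 < 0:
        kind = "maximum (hill)"
    elif lam1 * lam2 < 0:
        kind = "saddle point (hyperbolic contours)"
    else:
        kind = "degenerate (a zero eigenvalue: straight-line contours)"
    return (lam1, lam2), kind

print(classify_quadratic(a11=1.0, a22=2.0, a12=0.5))   # illustrative coefficients
print(classify_quadratic(a11=1.0, a22=-1.0, a12=0.0))  # saddle point
```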


Converging across a valley is slow if movement is along the direction of the smaller eigenvalue.

  ∇²f(x*) = H(x*)         d^T ∇²f(x*) d          x* is
  Positive definite        > 0                    a minimum
  Positive semidefinite    ≥ 0                    possibly a minimum
  Negative definite        < 0                    a maximum
  Negative semidefinite    ≤ 0                    possibly a maximum
  Indefinite               both signs possible    unknown

Weak minima vs. strong minima: For f(x) = x_2^2 (x_1 + 3), a surface of minima exists along x_2 = 0, and hence a weak minimum exists.

1.3 Unconstrained minimization

Consider a set of points Ω. x* is a relative minimum if f(x*) ≤ f(x* + Δx) for all sufficiently small Δx with x* + Δx ∈ Ω. x* is a global minimum if f(x*) ≤ f(x) for all x ∈ Ω.

First order necessary condition: If x* is a relative minimum of f over Ω, then for any feasible direction d at x*,

∇f(x*) d ≥ 0

Proof: For any α, 0 ≤ α ≤ ᾱ, let x(α) = x* + αd and g(α) = f(x(α)); then g(α) has a relative minimum at α = 0, and

g(α) − g(0) = g'(0) α + o(α)

If g'(0) = ∇f(x*) d < 0, the right-hand side is negative for small α > 0, a contradiction; hence ∇f(x*) d ≥ 0. More generally,

f(x) = f(x*) + ∇f(x*)(x − x*) + (1/2)(x − x*)^T H(x*)(x − x*) + higher-order terms

Second order necessary conditions:
1. ∇f(x*) d ≥ 0.
2. If ∇f(x*) d = 0, then d^T ∇²f(x*) d ≥ 0.

Proof: Let x(α) = x* + αd and g(α) = f(x(α)). Then g'(0) = 0 and

g(α) − g(0) = (1/2) g''(0) α² + o(α²)

Big Oh: O(α^n) implies the term scales as α^n. Small oh: o(α) implies that these terms go to 0 faster than α itself does.

If g''(0) < 0, the right-hand side is negative for small α, so α = 0 is not a minimum. Hence g''(0) = d^T ∇²f(x*) d ≥ 0. Note that the second condition states that the Hessian must be positive semidefinite for a minimum.

Sufficient conditions for a minimum:
1. ∇f(x*) = 0.
2. ∇²f(x*) is positive definite.
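As a numerical illustration (not from the notes), the sufficient conditions can be checked at a candidate point with finite-difference derivatives; the helper names grad_fd, hess_fd and the test function are my own choices.

```python
import numpy as np

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

def hess_fd(f, x, h=1e-4):
    """Central-difference approximation of the Hessian."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej) - f(x-ei+ej) + f(x-ei-ej)) / (4*h*h)
    return H

def check_sufficient_conditions(f, x_star, gtol=1e-4):
    """Sufficient conditions for a minimum: gradient ~ 0 and Hessian positive definite."""
    g = grad_fd(f, x_star)
    eigs = np.linalg.eigvalsh(hess_fd(f, x_star))
    return np.linalg.norm(g) < gtol and np.all(eigs > 0)

f = lambda x: (x[0] - 1)**2 + 2*(x[1] + 0.5)**2   # illustrative quadratic
print(check_sufficient_conditions(f, [1.0, -0.5]))  # True: this is its minimum
```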

1.4 Convergence of line search methods

Linear convergence: For large k (i.e. close to x*), if c is the convergence ratio,

||x_{k+1} − x*|| ≤ c ||x_k − x*||,   0 ≤ c < 1

The error in the result drops slowly: if c = 0.1, the error drops 10-fold with each iteration, i.e. one additional digit of accuracy per iteration.

Order p convergence



||x_{k+1} − x*|| ≤ c ||x_k − x*||^p,   c ≥ 0, p ≥ 1, k large

p = 2 implies quadratic convergence. If ||x_k − x*|| = 10^−1 for some k, then ||x_{k+1} − x*|| ≤ c·10^−2, ||x_{k+2} − x*|| ≤ c^3·10^−4, ||x_{k+3} − x*|| ≤ c^7·10^−8, etc. Even if c = 1, only a few iterations are needed to obtain 16 significant decimal digits.

Superlinear convergence:

||x_{k+1} − x*|| = c_k ||x_k − x*||   with c_k → 0 as k → ∞
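For intuition (not in the notes), a few lines showing how error sequences obeying the linear and quadratic bounds shrink, assuming the bounds hold with equality and using c = 0.5, an arbitrary choice.

```python
# Compare linear (order 1) and quadratic (order 2) error recurrences.
c = 0.5
e_lin, e_quad = 1e-1, 1e-1
for k in range(5):
    e_lin = c * e_lin          # ||e_{k+1}|| = c * ||e_k||
    e_quad = c * e_quad**2     # ||e_{k+1}|| = c * ||e_k||^2
    print(f"iter {k+1}: linear {e_lin:.1e}   quadratic {e_quad:.1e}")
```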

1.5 Bracketing

Let the initial point be x_A. Let the trial length be t, so x_1 = x_A + t. If f(x_1) < f(x_A), then x_B = x_1 and the interval remains t; else try x_1 = x_A + t/2. In outline (a code sketch follows this list):
1. Let x_A and x_1 be at the interval bounds, with x_1 = x_A + t.
2. If f(x_1) > f(x_A), set x_C = x_1 and x_B = x_A + t/2.
3. If f(x_1) ≤ f(x_A), set x_B = x_1 and x_2 = x_A + 2t.
4. If f(x_2) > f(x_1), set x_C = x_2. If f(x_2) ≤ f(x_1), set x_1 = x_B, t = 2t, and go to step 2.
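A rough Python sketch of such a step-doubling bracketing routine; the function name bracket_minimum and the exact control flow are my paraphrase of the steps above, not the notes' procedure verbatim.

```python
def bracket_minimum(f, x_a, t, max_iter=50):
    """Sketch of a step-doubling bracketing routine: return (x_a, x_b, x_c)
    with f(x_b) <= f(x_a) and f(x_b) < f(x_c), assuming a minimum lies to
    the right of x_a."""
    # shrink the trial step until the first move goes downhill
    for _ in range(max_iter):
        x_b = x_a + t
        if f(x_b) <= f(x_a):
            break
        t /= 2.0
    else:
        raise RuntimeError("could not find a downhill step")
    # double the step until the function turns upward
    for _ in range(max_iter):
        t *= 2.0
        x_c = x_b + t
        if f(x_c) > f(x_b):
            return x_a, x_b, x_c        # minimum bracketed
        x_a, x_b = x_b, x_c             # keep stepping out
    raise RuntimeError("no bracket found")

print(bracket_minimum(lambda x: (x - 3.0)**2, x_a=0.0, t=0.1))
```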

Fibonacci and golden section searches


We wish to find a fixed interval [c_1, c_2] in which f is unimodal. We intend to select n measurement points in [c_1, c_2] at which f will be evaluated (f evaluations are expensive!):

c_1 ≤ x_1 ≤ x_2 ≤ ... ≤ x_n ≤ c_2

Let x_0 = c_1 and x_{n+1} = c_2. If x_k is the minimum-valued point of the n, then the region of uncertainty becomes [x_{k−1}, x_{k+1}]. Let d_1 = c_2 − c_1 be the initial feasible region and d_k the width of uncertainty after k measurements. If a total of n measurements are to be made,

d_k = (F_{n−k+1} / F_n) d_1

where F_n = F_{n−1} + F_{n−2} are terms in the Fibonacci series, with F_0 = F_1 = 1. Given the endpoints c_1 and c_2, place x_1 and x_2 symmetrically from each extreme, at a distance d_2 = F_{n−1} d_1 / F_n. If f(x_1) < f(x_2), the interval becomes [c_1, x_2]. Place a third point symmetrically in [c_1, x_2] with respect to x_1: the interval of uncertainty is now d_3 = F_{n−2} d_1 / F_n. Each successive point is placed in the current interval of uncertainty symmetrically with respect to the point already in the interval. We need a tolerance for the final interval size.

Golden section: As n → ∞, the Fibonacci method becomes the golden section method. The solution to the Fibonacci difference equation F_n = F_{n−1} + F_{n−2} is F_n = A τ_1^n + B τ_2^n, where τ_1 and τ_2 are the roots of τ² = τ + 1:

τ_1 = (1 + √5)/2 ≈ 1.618,   τ_2 = (1 − √5)/2 < 0

For large n, F_n ≈ A τ_1^n, so lim_{n→∞} F_{n−1}/F_n = 1/τ_1 ≈ 0.618. Since d_k = F_{n−k+1} d_1 / F_n ≈ d_1 (1/τ_1)^{k−1},

d_{k+1}/d_k = 1/τ_1 ≈ 0.618

i.e. linear convergence.
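A compact golden section search following the 0.618 interval reduction described above (an illustrative sketch; the test function and interval are arbitrary).

```python
import math

def golden_section(f, c1, c2, tol=1e-6):
    """Golden section search for the minimum of a unimodal f on [c1, c2].
    Each iteration shrinks the interval of uncertainty by 1/tau ~ 0.618."""
    inv_tau = (math.sqrt(5.0) - 1.0) / 2.0      # 1/tau_1 = 0.618...
    x1 = c2 - inv_tau * (c2 - c1)
    x2 = c1 + inv_tau * (c2 - c1)
    f1, f2 = f(x1), f(x2)
    while (c2 - c1) > tol:
        if f1 < f2:                 # minimum lies in [c1, x2]
            c2, x2, f2 = x2, x1, f1
            x1 = c2 - inv_tau * (c2 - c1)
            f1 = f(x1)
        else:                       # minimum lies in [x1, c2]
            c1, x1, f1 = x1, x2, f2
            x2 = c1 + inv_tau * (c2 - c1)
            f2 = f(x2)
    return 0.5 * (c1 + c2)

# Example: minimum of (x - 2)^2 + 1 on [0, 5]
print(golden_section(lambda x: (x - 2.0)**2 + 1.0, 0.0, 5.0))  # ~2.0
```

Only one new function evaluation is needed per iteration, since the surviving interior point is reused, which is the whole appeal of the symmetric placement described above.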



Line search by curve fitting


The Fibonacci/golden-section methods assume only unimodality. If f is in addition smooth, more efficient methods exist which give convergence order p > 1.

Newton's method: If f(x_k), f'(x_k), and f''(x_k) all exist at x_k, then we can fit a quadratic q at x_k which approximates f up to the second derivative:

q(x) = f(x_k) + f'(x_k)(x − x_k) + (1/2) f''(x_k)(x − x_k)²

We find x_{k+1} by setting dq/dx = 0:

q'(x_{k+1}) = 0 = f'(x_k) + f''(x_k)(x_{k+1} − x_k)   ⇒   x_{k+1} = x_k − f'(x_k)/f''(x_k)

The new point does not depend on f(x) itself but on its derivatives!

This can be performed iteratively. Using the notation g(x) = f'(x),

x_{k+1} = x_k − g(x_k)/g'(x_k)
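A minimal sketch of this iteration (my own function names; the test function and starting point are made up). g and gprime here are the first and second derivatives of the f being minimized.

```python
def newton_1d(g, gprime, x0, tol=1e-10, max_iter=50):
    """Newton's method on g(x) = f'(x): x_{k+1} = x_k - g(x_k)/g'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x^4 - 3x^2 + x, so g = f' and gprime = f''
g = lambda x: 4*x**3 - 6*x + 1
gp = lambda x: 12*x**2 - 6
print(newton_1d(g, gp, x0=1.0))   # converges to a local minimum near x ~ 1.13
```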

Convergence of Newton's method: Suppose g has continuous second derivatives (i.e. f has continuous third derivatives), and let x* satisfy g(x*) = 0 with g'(x*) ≠ 0. If x_0 is sufficiently close to x*, the sequence {x_k}, k = 0, 1, ..., generated by Newton's method converges to x* with an order of convergence of at least two.

Proof: g(x*) = 0 and x_{k+1} = x_k − g(x_k)/g'(x_k), so

x_{k+1} − x* = x_k − x* − g(x_k)/g'(x_k) = [g(x*) − g(x_k) − g'(x_k)(x* − x_k)] / g'(x_k)

By Taylor's theorem, the numerator is zero to first order. For some ξ between x* and x_k,

x_{k+1} − x* = (1/2) (g''(ξ)/g'(x_k)) (x_k − x*)²

so near x*, |x_{k+1} − x*| ≤ c |x_k − x*|². Newton's method would require one iteration for a quadratic function (good!). We need to compute f'(x) and f''(x) (bad!), however, and if f''(x) ≈ 0, convergence would be very slow (very bad!). We need to start close enough to the minimum to ensure convergence.

Finite difference approximation to Newton: If f'(x) is not available, we can use a numerical version of the derivative(s):

x_{k+1} = x_k − { [f(x_k + h) − f(x_k − h)] / 2h } / { [f(x_k + h) − 2f(x_k) + f(x_k − h)] / h² }

where we have used a central difference method. We could use a forward difference approach instead. h should be appropriately small.
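A sketch of the finite-difference variant using the central-difference formulas above; the step h, tolerance, and test function are illustrative choices of mine.

```python
def newton_fd(f, x0, h=1e-4, tol=1e-8, max_iter=50):
    """Newton's method using central-difference approximations of f' and f''.
    h must be small enough for accuracy but not so small that round-off dominates."""
    x = x0
    for _ in range(max_iter):
        d1 = (f(x + h) - f(x - h)) / (2.0 * h)            # ~ f'(x)
        d2 = (f(x + h) - 2.0*f(x) + f(x - h)) / (h * h)   # ~ f''(x)
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

print(newton_fd(lambda x: (x - 2.0)**2 + 0.5*x, x0=0.0))  # minimum at x = 1.75
```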

Method of false position (Regula falsi)


We approximate the second derivative using the two most recent points:

x_{k+1} = x_k − g(x_k) (x_{k−1} − x_k) / (g(x_{k−1}) − g(x_k))

The method is started at two points where g(x) has opposite signs. Next find x_{k+1} and g(x_{k+1}). Keep x_{k+1} and whichever of x_k and x_{k−1} gives a pair of points with g(x) of opposite sign.
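A sketch of this update applied to g = f'; the bookkeeping that keeps a sign-bracketing pair follows the description above, with my own variable names and an arbitrary example.

```python
def regula_falsi(g, x_prev, x_curr, tol=1e-10, max_iter=100):
    """Method of false position on g(x) = f'(x). Requires g(x_prev) and
    g(x_curr) to have opposite signs; maintains a sign-bracketing pair."""
    g_prev, g_curr = g(x_prev), g(x_curr)
    assert g_prev * g_curr < 0, "starting points must bracket a root of g"
    for _ in range(max_iter):
        x_new = x_curr - g_curr * (x_prev - x_curr) / (g_prev - g_curr)
        g_new = g(x_new)
        if abs(g_new) < tol:
            return x_new
        # keep x_new plus whichever old point has the opposite sign of g
        if g_new * g_curr < 0:
            x_prev, g_prev = x_curr, g_curr
        x_curr, g_curr = x_new, g_new
    return x_curr

# Example: g = f' for f(x) = x^2 - 2x, stationary point at x = 1
print(regula_falsi(lambda x: 2*x - 2, 0.0, 3.0))
```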

Order of convergence



With g(x*) = 0,

x_{k+1} − x* = (x_k − x*) − g(x_k) (x_k − x_{k−1}) / (g(x_k) − g(x_{k−1}))
            = (x_k − x*) · [ (g(x_k) − g(x_{k−1}))/(x_k − x_{k−1}) − (g(x_k) − g(x*))/(x_k − x*) ] / [ (g(x_k) − g(x_{k−1}))/(x_k − x_{k−1}) ]

This is of the form (x_{k+1} − x*) = M (x_k − x*)(x_{k−1} − x*). Let ε_k = x_k − x*. Taking logs, you get an equation of the form y_{k+1} = y_k + y_{k−1}, which is a Fibonacci-type recurrence. Hence the order of convergence is τ_1 ≈ 1.618.

Polynomial approximation methods


No derivatives are needed.

Quadratic approximation: Fit f(x) = a + bx + cx²; then df/dx = 0 gives x* = −b/(2c). Given three points x_1, x_2 and x_3, and using the notation f_i = f(x_i), a_ij = x_i − x_j and b_ij = x_i² − x_j²,

x* = (1/2) (b_23 f_1 + b_31 f_2 + b_12 f_3) / (a_23 f_1 + a_31 f_2 + a_12 f_3)

If x_1 has the greatest value of f, the new triplet will be {x_2, x*, x_3}, as long as the bracket about x* is not lost.

Cubic interpolation: Let f(x) = a_1 x³ + a_2 x² + a_3 x + a_4. Setting f'(x) = 3a_1 x² + 2a_2 x + a_3 = 0 gives two roots

x* = ( −2a_2 ± √(4a_2² − 12 a_1 a_3) ) / (6 a_1)

The root taken depends on the second derivative (maximum or minimum). If derivatives are available, only two points are needed. Given (x_{k−1}, f(x_{k−1}), f'(x_{k−1})) and (x_k, f(x_k), f'(x_k)),

x_{k+1} = x_k − (x_k − x_{k−1}) [f'(x_k) + u_2 − u_1] / [f'(x_k) − f'(x_{k−1}) + 2 u_2]

where

u_1 = f'(x_{k−1}) + f'(x_k) − 3 [f(x_{k−1}) − f(x_k)] / (x_{k−1} − x_k),
u_2 = [u_1² − f'(x_{k−1}) f'(x_k)]^{1/2}

For a minimization problem, having a bracket requires that for the two points x_1 < x_2, f'(x_1) < 0 and f'(x_2) > 0. Calculate f'(x_{k+1}) and then choose which of the k and k−1 points to replace.
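A direct transcription of the quadratic-fit minimizer formula above (the sample function is my own; since it is itself quadratic, the fit recovers the minimizer exactly).

```python
def quadratic_min(x1, x2, x3, f1, f2, f3):
    """Minimizer of the quadratic through (x1,f1), (x2,f2), (x3,f3), using
    a_ij = x_i - x_j and b_ij = x_i^2 - x_j^2 as in the formula above."""
    a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
    b23, b31, b12 = x2**2 - x3**2, x3**2 - x1**2, x1**2 - x2**2
    return 0.5 * (b23*f1 + b31*f2 + b12*f3) / (a23*f1 + a31*f2 + a12*f3)

# Example: f(x) = (x - 1.3)^2 sampled at three points
f = lambda x: (x - 1.3)**2
print(quadratic_min(0.0, 1.0, 2.0, f(0.0), f(1.0), f(2.0)))   # 1.3
```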

1.6 Unidimensional search in a multidimensional problem


Example:

min f(x) = x_1^4 − 2 x_2 x_1^2 + x_2^2 + x_1^2 − 2 x_1 + 5

Start at x^(0) = [1, 2]^T and use the negative gradient as the initial search direction.

∇f(x) = [ 4x_1^3 − 4x_1 x_2 + 2x_1 − 2,  −2x_1^2 + 2x_2 ]^T

At x^(0) = [1, 2]^T, f(x^(0)) = 5 and d = −∇f(x^(0)) = [4, −2]^T. The new point is x_new = x_old + α d, i.e.

x_{1,new} = x_{1,old} + α d_1
x_{2,new} = x_{2,old} + α d_2

We choose α^(0) = 0.05 and look to bracket the minimum:

x_1^(1) = x_1^(0) + (0.05)(4) = 1.2
x_2^(1) = x_2^(0) + (0.05)(−2) = 1.9

Then f(x^(1)) = 4.25. We next try α^(1) = 2α^(0) = 0.1, stepping from x^(1):

x_1^(2) = x_1^(1) + (0.1)(4) = 1.6
x_2^(2) = x_2^(1) + (0.1)(−2) = 1.7

f(x^(2)) = 5.10. The minimum is now bracketed. The optimal value (α ≈ 0.0797) is obtained by quadratic interpolation.
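A sketch reproducing this worked example numerically: the same f, starting point, steepest-descent direction and bracketing steps, followed by a quadratic fit over the bracketed α values. Variable names are mine, and the quadratic-fit value is only a first estimate of the optimal step length.

```python
import numpy as np

# f(x) = x1^4 - 2*x2*x1^2 + x2^2 + x1^2 - 2*x1 + 5, minimized along -grad f at [1, 2]
def f(x):
    x1, x2 = x
    return x1**4 - 2*x2*x1**2 + x2**2 + x1**2 - 2*x1 + 5

def grad(x):
    x1, x2 = x
    return np.array([4*x1**3 - 4*x1*x2 + 2*x1 - 2, -2*x1**2 + 2*x2])

x0 = np.array([1.0, 2.0])
d = -grad(x0)                          # [4, -2]
phi = lambda a: f(x0 + a*d)            # 1D function along the search direction

# The three alpha values visited while bracketing (0, 0.05, 0.05 + 0.1)
alphas = [0.0, 0.05, 0.15]
vals = [phi(a) for a in alphas]        # 5.0, ~4.25, ~5.10
print(vals)

# Quadratic interpolation over the bracket (same formula as in the curve-fitting part)
a1, a2, a3 = alphas
f1, f2, f3 = vals
num = (a2**2 - a3**2)*f1 + (a3**2 - a1**2)*f2 + (a1**2 - a2**2)*f3
den = (a2 - a3)*f1 + (a3 - a1)*f2 + (a1 - a2)*f3
alpha_star = 0.5 * num / den
print(alpha_star, phi(alpha_star))     # first estimate of the optimal step length along d
```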

1.7 Termination of a line search

A line search involves starting at a point and finding a direction to move along; we move to a minimum of the objective function along that line, and then restart from the new point. By focusing on α and φ(α), we can use 1D search techniques for a multidimensional problem. The underlying question: has there been sufficient decrease in the line search?

Percentage test: The step size (x or α) comes to within a fixed percentage of its true value. Let x = proposed step size and x̄ = the step size needed to reach the true minimum. Require

|x − x̄| ≤ c x̄,   0 < c < 1

c = 0.1 is OK.

Armijo's rule: We first guarantee that α is not too large, and next that it is not too small. Let φ(α) = f(x_k + α d_k). α is not too large if, for a fixed ε (0 < ε < 1, typically ε ≈ 10^−4),

φ(α) ≤ φ(0) + ε φ'(0) α

i.e. in the original notation,

f(x_k + α d_k) ≤ f(x_k) + ε α ∇f(x_k)^T d_k

Note that ∇f(x_k)^T d_k is the slope φ'(0); for a descent direction d_k it is negative. α should not be too small: define η > 1. α is not too small if

φ(ηα) > φ(0) + ε φ'(0) ηα

i.e. if α is scaled up by η, the first test (on largeness) fails. A common problem is that very small values of α (such as 10^−15) may satisfy both conditions for a given ε and η.

Goldstein test: Like Armijo's rule, α should not be too large:

φ(α) ≤ φ(0) + ε φ'(0) α

but 0 < ε < 0.5 here. α is not too small if

φ(α) > φ(0) + (1 − ε) φ'(0) α

or, in the original notation,

f(x_{k+1}) − f(x_k) ≥ (1 − ε) α ∇f(x_k)^T d_k

Therefore 0.5 < 1 − ε < 1.

Wolfe's test: This test is used when derivatives are available:

φ'(α) ≥ (1 − ε) φ'(0),   0 < ε < 0.5

Notice the similarity to the Goldstein test.

Backtracking: This is a line search method where we start with an initial guess of α (usually α_0 = 1) and use two parameters η > 1 (usually 1.1 or 1.2) and ε < 1 (usually ε < 0.5). The stopping criterion is the same as the first condition of the Armijo and Goldstein tests: φ(α) ≤ φ(0) + ε φ'(0) α. If the initial α satisfies the test, it is used as the step size. Otherwise α is reduced by the factor 1/η, giving α_new = α_old/η, repeatedly until it finally satisfies the test. At that point α_old = η α_new does not pass the first test, so the accepted α also passes the second condition of Armijo's rule. This backtracking method is important because we apply it repeatedly during line searches for a multidimensional optimization problem.
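A minimal backtracking sketch implementing the Armijo sufficient-decrease test described above, using η = 1.2 and ε = 10^−4 as the typical values mentioned; the function names are mine, and the example reuses the problem from section 1.6 for illustration.

```python
import numpy as np

def backtracking(f, grad_f, x, d, alpha0=1.0, eta=1.2, eps=1e-4, max_iter=60):
    """Backtracking line search: shrink alpha by 1/eta until the Armijo
    (sufficient decrease) condition f(x + a*d) <= f(x) + eps*a*grad(x).d holds."""
    fx = f(x)
    slope = grad_f(x) @ d                 # phi'(0); negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha*d) <= fx + eps * alpha * slope:
            return alpha
        alpha /= eta
    return alpha

# Example on the function from section 1.6, along the steepest-descent direction
f = lambda x: x[0]**4 - 2*x[1]*x[0]**2 + x[1]**2 + x[0]**2 - 2*x[0] + 5
grad_f = lambda x: np.array([4*x[0]**3 - 4*x[0]*x[1] + 2*x[0] - 2,
                             -2*x[0]**2 + 2*x[1]])
x = np.array([1.0, 2.0])
d = -grad_f(x)
print(backtracking(f, grad_f, x, d))      # accepted step length
```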
