
MODULE – 3

Unconstrained Nonlinear
Optimization
• Let us consider various methods of solving the
unconstrained minimization problem.

• Most real-world system models involve nonlinear
optimization with complicated objective functions or
constraints for which analytical solutions (solutions using
quadratic programming, geometric programming, etc.) are not
available.
• In such cases one possible approach is a search algorithm in
which the objective function is first computed at a trial
solution, and the solution is then sequentially improved,
based on the corresponding objective function values, until
convergence.
• Classification of Unconstrained Minimization Methods
• Several methods are available for solving an unconstrained
minimization problem.
• These methods can be classified into two categories: direct search
methods and descent methods.
• A direct search algorithm depends on the objective function only
through the ranking of a countable set of function values. It does
not involve the partial derivatives of the function and hence is
also called a non-gradient or zeroth-order method.
• An indirect search algorithm, also called a descent method, depends
on the first derivatives (first-order methods) and often the second
derivatives (second-order methods) of the objective function.
• Direct methods are most suitable for simple problems involving a
relatively small number of variables.
• These methods are generally less efficient than descent methods.
• The descent techniques require, in addition to the
function values, the first and in some cases the
second derivatives of the objective function.
• Since more information about the function being
minimized is used, descent methods are generally
more efficient than direct search techniques.
• Descent methods are also known as gradient methods.
• Among the gradient methods, those requiring only
first derivatives of the function are called first-
order methods;
• Those requiring both first and second derivatives
of the function are termed second-order methods.
Unconstrained Minimization Methods

[Classification chart of methods; among the direct search methods listed
is Hooke and Jeeves’ method.]
• UNIVARIATE METHOD
• Here we change only one variable at a time and seek to produce a
sequence of improved approximations to the minimum point.
• Starting from a base point Xi in the ith iteration, we fix the
values of n − 1 variables and vary the remaining variable.
• Since only one variable is changed, the problem becomes a one-
dimensional minimization problem, and any one-dimensional method
can be used to produce a new base point Xi+1.
• The search is then continued in a new direction.
• This new direction is obtained by changing any one of the n − 1
variables that were fixed in the previous iteration; the search
procedure is continued by taking each coordinate direction in turn.
• After all n directions are searched, the first cycle is complete and
we repeat the entire process of sequential minimization.
• The procedure is continued until no further improvement is possible
in the objective function in any of the n directions of a cycle.
The univariate method is summarized as follows:
1. Choose an arbitrary starting point X1 and set i = 1.
2. Find the search direction Si as

Si^T = (1, 0, 0, ..., 0)   for i = 1, n+1, 2n+1, ...
       (0, 1, 0, ..., 0)   for i = 2, n+2, 2n+2, ...
       (0, 0, 1, ..., 0)   for i = 3, n+3, 2n+3, ...
       ...
       (0, 0, 0, ..., 1)   for i = n, 2n, 3n, ...
3. Determine whether λi should be positive or negative.
For the current direction Si, find whether the function value
decreases in the positive or negative direction.
For this we take a small probe length (ε) and evaluate fi = f(Xi),
f+ = f(Xi + εSi), and f− = f(Xi − εSi). If f+ < fi, +Si will be the
correct direction for decreasing the value of f, and if f− < fi, −Si
will be the correct one. If both f+ and f− are greater than fi,
we take Xi as the minimum along the direction Si.
4. Find the optimal step length λi∗ such that

f(Xi ± λi∗Si) = min over λi of f(Xi ± λiSi)

where the + or − sign is used depending on whether
Si or −Si is the direction for decreasing the function value.
5. Set Xi+1 = Xi ± λi ∗ Si depending on the direction for
decreasing the function value,
and fi+1 = f(Xi+1).
6. Set the new value of i = i + 1 and go to step 2.
Continue this procedure until no significant change is
achieved in the value of the objective function.
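• The cycle above can be sketched compactly in Python. This is an
illustrative sketch only: the helper name univariate_search, the default
probe length, and the use of SciPy's minimize_scalar for the one-dimensional
search of step 4 are assumptions of this sketch, not part of the method's
definition.

```python
# Minimal sketch of the univariate method for an objective f mapping a
# NumPy vector to a scalar (illustrative names and defaults).
import numpy as np
from scipy.optimize import minimize_scalar

def univariate_search(f, x0, eps=0.01, tol=1e-8, max_cycles=1000):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(max_cycles):
        f_cycle_start = f(x)
        for j in range(n):
            s = np.zeros(n)
            s[j] = 1.0                             # step 2: coordinate direction
            fx = f(x)
            f_plus, f_minus = f(x + eps * s), f(x - eps * s)
            if f_plus >= fx and f_minus >= fx:
                continue                           # step 3: no descent along +/- s
            if f_minus < f_plus:
                s = -s                             # step 3: pick the descending sign
            lam = minimize_scalar(lambda t: f(x + t * s)).x  # step 4: line search
            if lam > 0:
                x = x + lam * s                    # step 5: move to new base point
        if abs(f_cycle_start - f(x)) < tol:        # step 6: negligible change => stop
            return x
    return x
```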
• The univariate method is very simple and can be
implemented easily.
• However, it will not converge rapidly to the
optimum solution, as it oscillates with steadily
decreasing progress toward the optimum.
• Hence it is better to stop the computations at
some point near the optimum point rather than
trying to find the precise optimum point.
• In theory, the univariate method can be applied to
find the minimum of any function that possesses
continuous derivatives.
• However, if the function has a steep valley, the
method may not even converge
• Example: Minimize f(x1, x2) = x1 − x2 + 2x1^2 + 2x1x2 + x2^2 with
the starting point X1 = (0, 0).
• SOLUTION: Take the probe length ε = 0.01 to find the correct
direction for decreasing the function value in step 3.
• We will use the differential calculus method to find the
optimum step length λi∗ along the direction ±Si in step 4.
• Iteration i = 1
• Step 2: Choose the search direction S1 as S1 = (1, 0)^T.
• Step 3: To find whether the value of f decreases along S1 or − S1,
we use the probe length ε.
• Since
• f1 = f(X1) = f(0, 0) = 0, f+ = f(X1 + εS1) = f(0.01, 0) = 0.01 − 0 +
2(0.0001) + 0 + 0 = 0.0102 > f1, and
• f− = f(X1 − εS1) = f(−0.01, 0) = −0.01 − 0 + 2(0.0001) + 0 + 0 =
−0.0098 < f1, −S1 is the correct direction for minimizing f from X1.
• Step 4: To find the optimum step length λ1∗, we minimize f(X1 − λ1S1):
replacing x1 by −λ1, f(−λ1, 0) = (−λ1) − 0 + 2(−λ1)^2 + 0 + 0 = 2λ1^2 − λ1.
• As df/dλ1 = 4λ1 − 1 = 0 at λ1 = 1/4, we have λ1∗ = 1/4.
• Step 5: Set X2 = X1 − λ1∗S1 = (−1/4, 0), so x1 = −1/4, x2 = 0.
• Iteration i = 2
• Step 2: Choose the search direction S2 as S2 = (0, 1)^T.
• Step 3: Since f2 = f(X2) = −1/8 = −0.125, f+ = f(X2 + εS2) =
f(−0.25, 0.01) = −0.1399 < f2, and
• f− = f(X2 − εS2) = f(−0.25, −0.01) = −0.1099 > f2,
• +S2 is the correct direction for decreasing the value of f from X2.
• Step 4: We minimize f(X2 + λ2S2) to find λ2∗.
• Replacing x1 by −0.25 and x2 by λ2: f(X2 + λ2S2) = f(−0.25, λ2) =
−0.25 − λ2 + 2(−0.25)^2 + 2(−0.25)(λ2) + λ2^2 = λ2^2 − 1.5λ2 − 0.125
• df/dλ2 = 2λ2 − 1.5 = 0 at λ2∗ = 0.75
• Step 5: Set X3 = X2 + λ2∗S2 = (−0.25, 0.75), with f3 = f(X3) = −0.6875.
• Next we set the iteration number to i = 3 and continue the
procedure until the optimum solution X∗ = (−1, 1.5)
with f(X∗) = −1.25 is found.
• If method is to be computerized, a suitable convergence
criterion has to be used to test point Xi+1(i = 1, 2, . . .) for
optimality.
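• As an illustration, the univariate_search sketch given earlier can be
applied to this example; it creeps slowly toward the optimum, exactly as
described above:

```python
# Assumes the univariate_search sketch defined earlier.
f = lambda x: x[0] - x[1] + 2 * x[0]**2 + 2 * x[0] * x[1] + x[1]**2

x_star = univariate_search(f, x0=(0.0, 0.0))
print(x_star, f(x_star))   # approaches (-1.0, 1.5) with f = -1.25
```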
Pattern Directions
• In the univariate method, we search for the minimum along
directions parallel to the coordinate axes.
• This method may not converge in some cases, and at times it is
very slow in approaching the optimum point.
• To resolve this, we can change the directions of search in a
favourable manner instead of keeping them parallel to the
coordinate axes.
Univariate Method: Second Solution Sketch
• Iteration 1
• S1 = (1, 0)
• f1 = 0, f+ = 0.0999, f− = −0.1001, so the negative direction is chosen.
• The step length obtained is λ = −5, giving X2 = (5, 0) with f(5, 0) = 25.
• Since f(X2) is higher, we discard it and try the direction (0, 1) from
(0, 0); that value is also higher, so the current solution (0, 0) is
retained as the best.
Hooke and Jeeves Method
• It is a sequential search method with two types of moves:
• Exploratory search: a local search that looks for an improving
direction in which to move.
• Pattern move: a larger search in the improving direction.
Larger moves are made as long as the improvement continues.
• Exploratory Search
The main idea is to find some improving direction.
This is done by perturbing (upsetting) the current
point by small amounts in each of the variable
directions and observing whether the objective
function value improves or worsens.
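• The two moves can be sketched in Python as follows. The function names,
the shrink factor of 0.5, and the loop limits are assumptions of this
sketch rather than part of the original method statement.

```python
# Minimal sketch of Hooke and Jeeves' method (illustrative names).
import numpy as np

def exploratory_search(f, x, step):
    """Perturb each variable by +/- its step, keeping any improving change."""
    x = x.copy()
    for j in range(x.size):
        for delta in (step[j], -step[j]):
            trial = x.copy()
            trial[j] += delta
            if f(trial) < f(x):
                x = trial          # keep the improving perturbation
                break
    return x

def hooke_jeeves(f, x0, step0, tol, accel=2.0, shrink=0.5, max_iter=500):
    x_base = np.asarray(x0, dtype=float)
    step = np.asarray(step0, dtype=float)
    for _ in range(max_iter):
        x_new = exploratory_search(f, x_base, step)
        if f(x_new) < f(x_base):
            # Pattern moves: keep stepping in the improving direction
            # (with a = 2 the tentative point is 2*x_new - x_base).
            while True:
                x_tent = x_base + accel * (x_new - x_base)
                x_tent = exploratory_search(f, x_tent, step)
                if f(x_tent) < f(x_new):
                    x_base, x_new = x_new, x_tent   # accept the move
                else:
                    break                           # reject; keep best point
            x_base = x_new
        elif np.all(step < tol):
            return x_base                           # no improvement, steps tiny
        else:
            step = step * shrink                    # shrink and restart
    return x_base
```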
[Handwritten worked example, largely illegible in this copy; the recoverable
steps show a pattern move X = 2(2.5, 2.5) − (2, 2) = (3, 3), followed by
exploratory probes around (2.5, 2.5) in the x- and y-directions.]
Solve the following function using
Hooke and Jeeves
• Example:
• Minimize f(x1, x2) = 3x1^2 + x2^2 − 12x1 − 8x2
• Given: x1 = 1, x2 = 1, Δx1 = Δx2 = 0.5, ε = 0.1, a = 2.
• Initialize:
• Initial point: x(0) = (1, 1), f(x(0)) = -16.
• Acceleration factor a = 2.
• Perturbation vector P0 = (0.5, 0.5).
• Perturbation tolerance vector T = (0.1, 0.1).
• P  P0 .
• Note that these are not very good choices for P0 and T.
• They are chosen in this case just to show that the
algorithm terminates after a small number of steps.
• The elements in T would normally be much smaller.
• Start/Restart: fbest = f(x(0)) = -16.
• Try x(1) = (1.5, 1), f(x(1)) = -18.25, keep
perturbation, update fbest = -18.25.
• Try x(1) = (1.5, 1.5), f(x(1)) = -21, keep perturbation
and update fbest = -21.
• The steps in the exploratory search are shown in
this first Start/Restart, but are omitted from here
forward.
• Pattern Move from x(0) = (1, 1) to x(1) = (1.5, 1.5):
• Tentative x(2) = 2x(1) – x(0) = 2[(1.5, 1.5)] – (1, 1) =
(2, 2), f(2, 2) = -24.
• Final x(2) after exploratory search around tentative
x(2) is (2.0, 2.5), f(x(2)) = -25.75 is better than
f(x(1)) = -21 so the move is accepted.
• Update points: x(0) x(1) = (1.5, 1.5) and x(1)
x(2) = (2.0, 2.5).
• Pattern Move from x(0) = (1.5, 1.5) through x(1) =
(2.0, 2.5):
• Tentative x(2) = 2(2.0, 2.5) − (1.5, 1.5) = (2.5, 3.5),
f(2.5, 3.5) = −27.
• Final x(2) after exploratory search around
tentative x(2) is (2.0, 4.0).
• f(x(2)) = −28 is better than f(x(1)) = −25.75, so the
move is accepted.
• Update points: x(0) ← x(1) = (2.0, 2.5) and
x(1) ← x(2) = (2.0, 4.0).
• Pattern Move from x(0) = (2.0, 2.5) through x(1) = (2.0, 4.0):
Tentative x(2) = 2(2.0, 4.0) − (2.0, 2.5) = (2.0, 5.5), f(2.0, 5.5) =
−25.75.
• Final x(2) after exploratory search around tentative x(2) is
(2.0, 5.0); f(x(2)) = −27 is worse than f(x(1)) = −28, so the move
is rejected.
• Update points: x(0) ← x(1) = (2.0, 4.0).
• Start/Restart:
• Exploratory search around x(0) = (2.0, 4.0) fails at all levels of
perturbation size.
• Exit with solution x(0) = (2.0, 4.0) and f(x(0)) = -28.
• The data points are as follows:
• a: f(1, 1) = −16
• b: f(1.5, 1.5) = −21
• c: f(2, 2) = −24
• d: f(2, 2.5) = −25.75
• e: f(2.5, 3.5) = −27
• f: f(2, 4) = −28 [eventual solution]
• g: f(2, 5.5) = −25.75
• h: f(2, 5) = −27
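• As a numerical check, the hooke_jeeves sketch given earlier, applied to
the quadratic objective reconstructed from the tabulated values, terminates
at the tabulated solution:

```python
# Assumes the hooke_jeeves sketch defined earlier; the objective below is
# reconstructed from the tabulated values (e.g. f(1, 1) = -16, f(2, 4) = -28).
f = lambda x: 3 * x[0]**2 + x[1]**2 - 12 * x[0] - 8 * x[1]

x_star = hooke_jeeves(f, x0=(1.0, 1.0), step0=(0.5, 0.5), tol=0.1)
print(x_star, f(x_star))   # (2.0, 4.0) with f = -28, matching point f above
```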
Indirect Search (Descent) Methods:
Gradient of a Function
• The gradient of a function f(X) of n variables is the n-component vector

∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)^T
• The gradient has a very important property:
• If we move along the gradient direction from any point in n-
dimensional space, the function value increases at the fastest rate.
• Hence the gradient direction is called the direction of steepest ascent.
• Unfortunately, the direction of steepest ascent is a local
property and not a global one.
• This is illustrated in Fig. below, where the gradient vectors
∇f evaluated at points 1, 2, 3, and 4 lie along the directions
11′, 22′, 33′, and 44′, respectively.
• Thus the function value increases at the fastest rate in the
direction 11′ at point 1, but not at point 2.
• Similarly, the function value increases at the fastest rate in
direction 22′(33′) at point 2 (3), but not at point 3 (4).
• In other words, the direction of steepest ascent generally
varies from point to point, and if we make infinitely small
moves along the direction of steepest ascent, the path will be
a curved line like the curve 1–2–3–4 in Fig below.
• Since the gradient vector represents the direction of steepest
ascent, the negative of the gradient vector denotes the
direction of steepest descent.
• Thus any method that makes use of the gradient vector can
be expected to give the minimum point faster than one that
does not make use of the gradient vector.
• All the descent methods make use of the gradient vector,
either directly or indirectly, in finding the search directions.
Steepest Ascent Direction

Theorem: The maximum rate of change of f at any
point X is equal to the magnitude of the gradient
vector at the same point.
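• A standard one-line justification of this theorem (not reproduced on the
slide): the rate of change of f along a unit vector u is the directional
derivative ∇f^T u, which is largest when u points along ∇f.

```latex
\frac{df}{ds} = \nabla f^{\mathsf{T}} \mathbf{u}
             = \|\nabla f\|\,\|\mathbf{u}\|\cos\theta
             = \|\nabla f\|\cos\theta \le \|\nabla f\|,
\qquad \text{with equality when } \theta = 0,
\text{ i.e. } \mathbf{u} = \nabla f / \|\nabla f\|.
```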
• Steepest Descent (Cauchy Method)
• The use of the negative of the gradient vector as a direction for
minimization was first made by Cauchy in 1847
• Here we start from an initial trial point X1 and iteratively move
along the steepest descent directions until the optimum point is found.
• Steepest descent method can be summarized by the steps:
• 1. Start with an arbitrary initial point X1.
• Set the iteration number as i = 1.
• 2. Find the search direction Si as Si = −∇fi = −∇f (Xi )
• 3. Determine the optimal step length λi* in the direction Si and set
• Xi+1 = Xi + λi* Si = Xi − λi* ∇fi
• 4. Test the new point, Xi+1, for optimality.
• If Xi+1 is optimum, stop the process. Otherwise, go to step 5.
• 5. Set the new iteration number i = i + 1 and go to step 2.
• However, owing to the fact that the steepest descent direction is a
local property, the method is not really effective in most problems.
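• The loop can be sketched in Python as follows; the gradient is supplied
by the caller and SciPy's minimize_scalar performs the exact line search of
step 3 (both are choices of this sketch, not part of the method statement).

```python
# Minimal sketch of Cauchy's steepest descent method (illustrative names).
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = -grad(x)                        # step 2: steepest descent direction
        if np.linalg.norm(s) < tol:         # step 4: gradient ~ 0 => optimum
            break
        lam = minimize_scalar(lambda t: f(x + t * s)).x   # step 3: step length
        x = x + lam * s                     # move to the new point
    return x
```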
• Example: Minimize f(x1, x2) = x1 − x2 + 2x1^2 + 2x1x2 +
x2^2 starting from the point X1 = (0, 0).
• Convergence Criteria
• The following criteria can be used to terminate the iterative process:
• 1) When the change in function value in two consecutive
iterations is small:
|f(Xi+1) − f(Xi)| / |f(Xi)| ≤ ε1
• 2) When the partial derivatives (components of the
gradient of f) are small:
|∂f/∂xj| ≤ ε2, j = 1, 2, ..., n
• 3) When the change in the design vector in two consecutive
iterations is small:
‖Xi+1 − Xi‖ ≤ ε3
Conjugate Gradient (Fletcher – Reeves) Method
• The convergence characteristics of the steepest descent
method can be improved greatly by modifying it into a
conjugate gradient method (which can be considered as a
conjugate directions method involving the use of the
gradient of the function).
• We saw that any minimization method that makes use of
the conjugate directions is quadratically convergent.
• This property of quadratic convergence is very useful
because it ensures that the method will minimize a
quadratic function in n steps or less.
• Since any general function can be approximated reasonably
well by a quadratic near the optimum point, any
quadratically convergent method is expected to find the
optimum point in a finite number of iterations.
• We have seen that Powell’s conjugate
direction method requires n single-variable
minimizations per iteration and sets up a new
conjugate direction at the end of each
iteration.
• Thus it requires, in general, n^2 single-variable
minimizations to find the minimum of a
quadratic function.
• On the other hand, if we can evaluate the
gradients of the objective function, we can set
up a new conjugate direction after every one-
dimensional minimization, and hence we can
achieve faster convergence
• Fletcher–Reeves Method
• The iterative procedure of Fletcher–Reeves method is as
follows:
• 1. Start with an arbitrary initial point X1.
• 2. Set the first search direction S1 = −∇f (X1) = −∇f1.
• 3. Find the point X2 according to the relation X2 = X1 + λ1*S1
• where λ1* is the optimal step length in the direction S1.
• Set i = 2 and go to the next step.
• 4. Find ∇fi = ∇f(Xi) and set Si = −∇fi + (|∇fi|^2 / |∇fi−1|^2) Si−1
• 5. Compute the optimum step length λi∗ in the direction Si, and
find the new point Xi+1 = Xi + λi∗Si
• 6. Test for the optimality of the point Xi+1.
• If Xi+1 is optimum, stop the process.
• Otherwise, set the value of i = i + 1 and go to step 4.
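• A sketch of this procedure in Python is given below; the helper names are
illustrative, and SciPy's minimize_scalar stands in for the exact line
searches of steps 3 and 5.

```python
# Minimal sketch of the Fletcher-Reeves conjugate gradient method.
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x0, tol=1e-6, max_iter=100):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                      # step 2: first direction
    for _ in range(max_iter):
        lam = minimize_scalar(lambda t: f(x + t * s)).x   # optimal step length
        x = x + lam * s
        g_new = grad(x)
        if np.linalg.norm(g_new) < tol:         # step 6: optimality test
            break
        beta = (g_new @ g_new) / (g @ g)        # step 4: |grad_i|^2 / |grad_(i-1)|^2
        s = -g_new + beta * s                   # new conjugate direction
        g = g_new
    return x
```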
• Example: Minimize f(x1, x2) = x1 − x2 + 2x1^2 + 2x1x2 +
x2^2 starting from the point X1 = (0, 0).
• Iteration 1
• ∇f = (1 + 4x1 + 2x2, −1 + 2x1 + 2x2)^T, so ∇f1 = ∇f(X1) = (1, −1)^T.
• The search direction is taken as S1 = −∇f1 = (−1, 1)^T.
• To find the optimal step length λ1∗ along S1, we minimize
f(X1 + λ1S1) with respect to λ1.
• Here f(X1 + λ1S1) = f(−λ1, λ1) = λ1^2 − 2λ1, and df/dλ1 = 2λ1 − 2 = 0
gives λ1∗ = 1.
• Therefore, X2 = X1 + λ1∗S1 = (−1, 1)^T with f(X2) = −1.
• Iteration 2
• ∇f2 = ∇f(X2) = (−1, −1)^T, so S2 = −∇f2 + (|∇f2|^2/|∇f1|^2) S1 =
(1, 1)^T + (2/2)(−1, 1)^T = (0, 2)^T.
• Minimizing f(X2 + λ2S2) = f(−1, 1 + 2λ2) = 4λ2^2 − 2λ2 − 1 gives
λ2∗ = 1/4, so X3 = X2 + λ2∗S2 = (−1, 1.5)^T with f(X3) = −1.25.
• Thus the optimum point X∗ = (−1, 1.5)^T is reached in two iterations.
• Even if we did not know this point to be the optimum, we would
not be able to move from this point in the next iteration.
• This can be verified as follows: ∇f3 = ∇f(X3) = (0, 0)^T, so
S3 = −∇f3 + (|∇f3|^2/|∇f2|^2) S2 = (0, 0)^T, and no further move is
possible.
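• As a numerical check, the fletcher_reeves sketch given earlier reproduces
this two-iteration convergence:

```python
# Assumes the fletcher_reeves sketch defined earlier.
import numpy as np

f = lambda x: x[0] - x[1] + 2 * x[0]**2 + 2 * x[0] * x[1] + x[1]**2
grad = lambda x: np.array([1 + 4 * x[0] + 2 * x[1], -1 + 2 * x[0] + 2 * x[1]])

x_star = fletcher_reeves(f, grad, x0=(0.0, 0.0))
print(x_star, f(x_star))   # (-1.0, 1.5) with f = -1.25, in two iterations
```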
