2 - Non Linear Optimization - V3
Samih Abdul-Nabi
P:  Maximize f(x)
    Subject to x ∈ Ω

With x an n-vector of independent variables: x = [x1, x2, ..., xn]^T ∈ ℝⁿ; these are the decision
variables of the problem. The set Ω is the set of feasible solutions. A feasible solution is a
solution that satisfies all the constraints of the problem. So the problem P can be written as:

P:  Maximize f(x)
    Subject to gi(x) ≤ bi   for i = 1, 2, ..., m
               x ≥ 0
The function f : ℝⁿ → ℝ that we wish to maximize is a real-valued function called the objective
function or cost function.
P is a decision problem where we are asked to find the best vector x among all possibilities in Ω.
Also note that we might be asked to minimize rather than maximize; in fact, minimizing f is the
same as maximizing −f.
NOTES:
• If a variable xi is restricted to be negative, a change of variable xi = −x'i is needed, with x'i ≥ 0.
• If a variable xi is not restricted (it can be positive or negative), a change of variable
  xi = x'i − x''i is needed, with x'i ≥ 0 and x''i ≥ 0.
• If a constraint is of the ≥ type, it can be multiplied by −1 to bring it to the standard ≤ form
  (a short illustration of these transformations follows below).
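As a small illustration of these transformations (this example is not in the original notes),
consider maximizing f(x1, x2) = 3 x1 + x2 subject to x1 + x2 ≥ 2, with x1 ≥ 0 and x2 unrestricted in
sign. Multiplying the constraint by −1 gives −x1 − x2 ≤ −2, and writing x2 = x'2 − x''2 with x'2 ≥ 0
and x''2 ≥ 0 puts the problem in the standard form:

    Maximize 3 x1 + x'2 − x''2
    Subject to −x1 − x'2 + x''2 ≤ −2
               x1, x'2, x''2 ≥ 0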
1.1 Neighbors
The neighbors of x, identified as N(x), form the set of points defined as:
N(x) = {y ∈ Ω : ||x − y|| < ε}
with ε > 0 a very small number and ||x − y|| the Euclidean distance between x and y.
1.3 Global maximizer
Definition: x* is a global maximizer if f(x*) ≥ f(x) for any x ∈ Ω.
Meaning that the value of f(x*) is the highest among all x in the feasible domain.
If we have f(x*) > f(x) for any x ∈ Ω with x ≠ x*, then x* is a strict global maximizer.
Figure 1-1 shows some local and global maxima. X1 is a local maximum (not strict) while X2 is a
strict local maximum and X3 is a strict global maximum.
The major difficulty of nonlinear optimization is that algorithms are unable to easily
differentiate between local and global maximizers.
1.4 Level sets
The level set of a function f : ℝⁿ → ℝ at level c is the set of points S = {x ∈ Ω : f(x) = c}.
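For a quick illustration (not part of the original notes): for f(x1, x2) = x1² + x2², the level set at
level c > 0 is S = {x : x1² + x2² = c}, a circle of radius √c centered at the origin; higher levels
correspond to larger circles.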
Figure 1-2 shows a nonlinear optimization problem. The feasible domain defined by the linear
constraints is shown in gray.
To draw the domain:
• Consider the first constraint
- Draw the curve defined by the corresponding equality (x1 = 4).
  This line divides the plane into two parts. The set of points satisfying x1 ≤ 4 is one of
  these parts.
- Test any point, say (0, 0), to see which part is the feasible part. In our case, we
  have 0 ≤ 4, therefore the part containing the point (0, 0) belongs to the feasible
  domain.
• We repeat the same exercise for the four constraints (including constraints on the sign
of the decision variables).
Figure 1-2 also shows some level sets of Z. Note that in this case the highest value of Z is
reached at the point (3, 3), which is the global maximizer of Z.
1.5 Boundary points
A point x ∈ Ω is said to be a boundary point if every neighborhood of x has a point in Ω and a
point outside Ω. The set of all boundary points of Ω is called the boundary of Ω.
Note that a feasible domain might not have any boundary points; in that case, every point of the
set is an interior point. For example, the open interval ]-1, 1[ has no boundary points.
1.6 Search direction
Definition: A vector d ∈ ℝⁿ with d ≠ 0 (at least one component is not 0) is a feasible direction at
x ∈ Ω if there exists α0 > 0 such that x + αd ∈ Ω for all α ∈ [0, α0].
Figure 1-4 shows d1 as a feasible direction while d2 is not a feasible direction from x. Note that x
is on the boundary of the feasible domain.
The directional derivative of the function f in the direction d is a real-valued function denoted
by ∂f/∂d, defined as:
∂f/∂d (x) = lim (α→0) [f(x + αd) − f(x)] / α = ⟨∇f(x), d⟩ = d^T ∇f(x)
If ||d|| = 1 (d is a unit vector), then ∂f/∂d (x) is the rate of increase of the function f at x in the
direction of d.
Example: consider f : ℝ³ → ℝ with f(x) = x1 x2 x3, and let d = [1/2, 1/2, 1/√2]^T (a unit vector).
The directional derivative of the function f in the direction d is:
∂f/∂d (x) = ∇f(x)^T d = [x2 x3, x1 x3, x1 x2] [1/2, 1/2, 1/√2]^T = (x2 x3 + x1 x3 + √2 x1 x2) / 2
At x = (1, 1, 2) this gives (2 + 2 + √2)/2 ≈ 2.71. Note that because d is a unit vector, this value is
the rate of increase of the function f at x in the direction of d.
The directional derivative gives an indication about what will happen to the function if we move
in the direction of d starting from x. If the directional derivative is > 0, then the value of the
function f will increase if we move in that direction. The maximum increase is found in the
direction of the gradient: taking d = ∇f(x), the directional derivative equals ∇f(x)^T ∇f(x) = ||∇f(x)||².
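The following short Python sketch (not part of the original notes; it only assumes NumPy is
available) checks the directional derivative of the example above numerically and illustrates that
the largest rate of increase over all unit directions is obtained along the gradient:

import numpy as np

def f(x):
    # f(x) = x1 * x2 * x3, the function used in the example above
    return x[0] * x[1] * x[2]

def grad_f(x):
    # Analytic gradient: (x2*x3, x1*x3, x1*x2)
    return np.array([x[1] * x[2], x[0] * x[2], x[0] * x[1]])

def directional_derivative(x, d, h=1e-6):
    # Finite-difference approximation of lim_{a->0} [f(x + a*d) - f(x)] / a
    return (f(x + h * d) - f(x)) / h

x = np.array([1.0, 1.0, 2.0])
d = np.array([0.5, 0.5, 1.0 / np.sqrt(2)])    # the unit vector d of the example
print(directional_derivative(x, d))           # ~ 2.707 = (2 + 2 + sqrt(2)) / 2
print(grad_f(x) @ d)                          # exact value d^T grad f(x)

g = grad_f(x)
print(g @ (g / np.linalg.norm(g)))            # rate of increase along the gradient = ||grad f(x)||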
Figure 1-5 shows the directional derivative at two border points. First, at x1, if we move in the
direction of d1 the value of the function f will decrease since d1^T ∇f(x1) < 0 (the angle between
d1 and the gradient is larger than 90°). From x2, moving in the direction of the gradient will
increase the value of the function f.
1.7 Types of functions
1.7.1 Convex sets
A set Ω ⊂ ℝⁿ is called convex if, for any two points x' and x'' ∈ Ω, the line segment joining x' and
x'' completely belongs to Ω.
In other words: the point x = (1 − t)x' + tx'' is also in Ω for every t ∈ [0, 1].
Definition: A function f : Ω ⊂ ℝⁿ → ℝ defined on a convex set Ω is concave if, given any two points
x' and x'' ∈ Ω, we have: (1 − t)·f(x') + t·f(x'') ≤ f((1 − t)·x' + t·x'') for every t ∈ [0, 1].
Definition: A function f : Ω ⊂ ℝⁿ → ℝ defined on a convex set Ω is convex if, given any two points
x' and x'' ∈ Ω, we have: (1 − t)·f(x') + t·f(x'') ≥ f((1 − t)·x' + t·x'') for every t ∈ [0, 1].
Figure 1-7 shows a convex and a concave function. A concave function is said to be curved
down, while a convex function is said to be curved up.
The summation of concave functions is also a concave function. Similarly, the summation of
convex functions is also a convex function.
Note: when we maximize a function of a single variable without any constraint, knowing that the
function is concave guarantees that a local maximizer is also a global maximizer. This guarantee
can be given when:
∂²f/∂x² ≤ 0
Similarly, a local minimizer of a convex function of a single variable without any constraint is
also a global minimizer if:
∂²f/∂x² ≥ 0
A necessary condition for a solution x to be optimal, when f(x) is a differentiable function, is:
∂f/∂xj = 0   for j = 1, 2, ..., n
Note: when f(x) is a concave function the condition is also sufficient.
2.2 One variable unconstrained optimization
Conditions:
• n=1 (one variable)
• the function is concave
Therefore, the necessary and sufficient condition for a particular solution x = x* to be
optimal (a global maximum) is:
∂f/∂x = 0 at x = x*
NOTE: the function might not be that easy to differentiate (or the resulting equation easy to
solve). For that reason, different methods exist to find a local maximizer of such a function.
NOTE: even when differentiating the whole function is very hard, we can still obtain the value of
the derivative at a specific value of the variable.
Figure 2-1: Newton's method for a one-variable unconstrained function (the curve f(x) with the
successive iterates x0, x1, x2, x3).
Starting from x0, we approximate f(x) by a quadratic function and we find its maximum (x1). We
then approximate f(x) at x1 to find x2, then x3. When the difference between xk-1 and xk falls
below a threshold, we stop.
The approximation of the function is obtained using a Taylor series as follows:
p(x) = f(xi) + f'(xi)(x − xi) + [f''(xi)/2](x − xi)²
Note that p(x) is similar to f(x) in the neighborhood of xi.
Differentiating p and setting the derivative to 0 to find the maximum, we get:
p'(x) = f'(xi) + 2·[f''(xi)/2](x − xi) = f'(xi) + f''(xi)(x − xi)
f'(xi) + f''(xi)(x* − xi) = 0
x* = xi − f'(xi)/f''(xi), and then we take xi+1 = x*.
Newton's method therefore moves from xi in the direction of (−f'(xi)/f''(xi)).
In general, a move is written as x = xi + α·d (a step of size α in the direction d).
The method (a short code sketch follows these steps):
Step 0. Find an initial solution x0 and set i = 0.
Step 1. Compute f'(xi) and f''(xi).
Step 2. Set xi+1 = xi − f'(xi)/f''(xi).
Step 3. If |xi+1 − xi| < ε then stop; else set i = i + 1 and go to Step 1.
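A minimal Python sketch of these steps (not part of the original notes; the test function and the
tolerance are illustrative), assuming we can evaluate f'(x) and f''(x) at any point:

def newton_1d(fprime, fsecond, x0, eps=1e-6, max_iter=100):
    # One-variable Newton method as described in Steps 0-3 above.
    x = x0
    for _ in range(max_iter):
        x_new = x - fprime(x) / fsecond(x)   # Step 2: x_{i+1} = x_i - f'(x_i)/f''(x_i)
        if abs(x_new - x) < eps:             # Step 3: stopping test
            return x_new
        x = x_new
    return x

# Illustrative example: maximize f(x) = 4x - x^2 - x^4.
# f'(x) = 4 - 2x - 4x^3 and f''(x) = -2 - 12x^2 < 0 everywhere, so f is concave.
fp = lambda x: 4 - 2 * x - 4 * x ** 3
fpp = lambda x: -2 - 12 * x ** 2
x_star = newton_1d(fp, fpp, x0=1.0)
print(x_star, fp(x_star))                    # f'(x*) is ~0 at the maximizer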
2.3.1 Gradient search
In this context, the objective function is assumed to be differentiable, thus it has a gradient
∇f(x) = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn). As we said before, the direction of the gradient is the one that
increases the objective function the most.
Starting with an initial point X0, we move in the direction of the gradient. So let X1 be the next
point to reach from X0 in the direction of the gradient: X1 = X0 + α∇f(X0).
Note that replacing X by X1 in the objective function (that we need to maximize) gives a
function f(α) with only α as variable. In this case we are back to a one-variable unconstrained
function that we know how to maximize.
The question is: what is the best value of α, the one giving the new point X1 that maximizes f(X1)
starting from X0? This value can be found by maximizing f(α).
2.3.1.2 Example
Consider the following two-variable problem:
f(x) = 2 x1 x2 + 2 x2 − x1² − 2 x2²
The gradient is ∇f(x) = (2 x2 − 2 x1, 2 x1 + 2 − 4 x2)^T.
To apply the gradient search algorithm, let us start with (0, 0); we have ∇f(0, 0) = (0, 2)^T.
Iteration 1: the new point is X1 = X0 + t·∇f(X0) = (0, 0)^T + t·(0, 2)^T = (0, 2t)^T
f(0, 2t) = 4t − 8t²; this function has its maximum for t = 1/4 (by setting the derivative to 0).
Thus the new point is (0, 1/2), and
∇f(0, 1/2) = (1, 0)^T
Iteration 2: the new point is X2 = X1 + t·∇f(X1) = (0, 1/2)^T + t·(1, 0)^T = (t, 1/2)^T
f(t, 1/2) = t − t² + 1/2; this function has its maximum for t = 1/2 (by setting the derivative to 0).
Thus the new point is (1/2, 1/2), and
∇f(1/2, 1/2) = (0, 1)^T
And so on until the stopping criterion is met, considering for example that we stop when the
absolute value of each partial derivative is at most 0.1. Figure 2-2 shows how the gradient
method moves from one solution to another until it converges towards the optimal solution.
The optimal solution reached is a global maximum since the function is concave. If this were not
the case, the solution would simply be a local maximum.
Figure 2-3 shows another illustration of how the gradient method works. The method starts at
X(0) and moves in the direction of the gradient, increasing the objective value as much as
possible. This increase ends when the search direction becomes tangent to a level set, at X(1).
Starting from X(1) we move again in the direction of the gradient (which is perpendicular to the
tangent to the level set) and so on.
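The following Python sketch (illustrative only, not the course's code; it assumes NumPy and uses
a crude grid line search in place of an exact one) reproduces the gradient search on the example
f(x) = 2 x1 x2 + 2 x2 − x1² − 2 x2²:

import numpy as np

def f(x):
    return 2 * x[0] * x[1] + 2 * x[1] - x[0] ** 2 - 2 * x[1] ** 2

def grad_f(x):
    return np.array([2 * x[1] - 2 * x[0], 2 * x[0] + 2 - 4 * x[1]])

x = np.array([0.0, 0.0])                  # starting point X(0)
while np.max(np.abs(grad_f(x))) > 0.1:    # stop when each |partial derivative| <= 0.1
    d = grad_f(x)
    # Crude line search: evaluate f(x + t*d) on a grid of t values and keep the best one.
    ts = np.linspace(0.0, 1.0, 1001)
    t_best = ts[np.argmax([f(x + t * d) for t in ts])]
    x = x + t_best * d
    print(x, f(x))
# The iterates (0, 0.5), (0.5, 0.5), (0.5, 0.75), ... converge towards (1, 1), where f = 1.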
2.3.2 Newton
We saw in 2.2.1 how to use Newton's method to solve a one-variable unconstrained problem. The
same concept and the same approximation are used in the case of multi-variable unconstrained
optimization. However, since we are now in a higher dimension, the update xi+1 = xi − f'(xi)/f''(xi)
becomes:
x(k+1) = x(k) − F(x(k))⁻¹ ∇f(x(k))
where F(x(k)) is the Hessian matrix of f evaluated at x(k).
2.3.2.1 Example
Consider the function f(x1, x2, x3, x4) = −(x1 + 10 x2)² − 5(x3 − x4)² − (x2 − 2 x3)⁴ − 10(x1 − x4)⁴.
In order to apply the Newton method, we select x(0) = [3, −1, 0, 1]^T as starting point. This gives
f(x(0)) = −215. Before starting with the method, let us compute the gradient and the Hessian.
∇f(x) = [ −2(x1 + 10 x2) − 40(x1 − x4)³,
          −20(x1 + 10 x2) − 4(x2 − 2 x3)³,
          −10(x3 − x4) + 8(x2 − 2 x3)³,
           10(x3 − x4) + 40(x1 − x4)³ ]^T
Iteration 1:
∇f(x(0)) = [−306, 144, 2, 310]^T
x(1) = x(0) − F(x(0))⁻¹ ∇f(x(0))
The following iterate is then obtained in the same way, x(2) = x(1) − F(x(1))⁻¹ ∇f(x(1)), and so on.
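As a sketch (not from the notes), the following Python code carries out the first Newton step of
this example numerically, using the analytic gradient above and a finite-difference approximation
of the Hessian F(x):

import numpy as np

def grad_f(x):
    x1, x2, x3, x4 = x
    return np.array([
        -2 * (x1 + 10 * x2) - 40 * (x1 - x4) ** 3,
        -20 * (x1 + 10 * x2) - 4 * (x2 - 2 * x3) ** 3,
        -10 * (x3 - x4) + 8 * (x2 - 2 * x3) ** 3,
        10 * (x3 - x4) + 40 * (x1 - x4) ** 3,
    ])

def hessian(x, h=1e-5):
    # Approximate F(x) column by column: dgrad/dx_j ~ [grad(x + h e_j) - grad(x - h e_j)] / (2h)
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        H[:, j] = (grad_f(x + e) - grad_f(x - e)) / (2 * h)
    return H

x0 = np.array([3.0, -1.0, 0.0, 1.0])
g = grad_f(x0)                              # [-306, 144, 2, 310]
x1 = x0 - np.linalg.solve(hessian(x0), g)   # Newton step x(1) = x(0) - F(x(0))^-1 grad f(x(0))
print(x1)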
The method starts by formulating the Lagrangian function F(X, λ) = f(X) − Σ_{i=1}^{m} λi [gi(X) − bi],
where the λi, i = 1, ..., m, are the Lagrange multipliers associated with the m constraints.
2.4.1.1 Example
Consider the problem
Maximize f(x1, x2) = x1² + 2 x2
Subject to:
g(x1, x2) = x1² + x2² = 1
The corresponding Lagrangian function is F(x1, x2, λ) = x1² + 2 x2 − λ(x1² + x2² − 1) and the partial
derivatives are:
∂F/∂x1 = 2 x1 − 2λ x1 = 0  →  x1(1 − λ) = 0  →  x1 = 0 or λ = 1
∂F/∂x2 = 2 − 2λ x2 = 0
∂F/∂λ = −(x1² + x2² − 1) = 0
If λ = 1 then, from the remaining two partial derivatives, we get x2 = 1 and x1 = 0.
If x1 = 0 then, from the third partial derivative, x2 = ±1. Therefore, the two critical points are (0, 1)
and (0, −1).
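A short SymPy sketch (illustrative, not part of the notes) that solves the three partial-derivative
equations of this example and recovers the two critical points:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
F = x1**2 + 2*x2 - lam*(x1**2 + x2**2 - 1)          # Lagrangian F(x1, x2, lambda)
equations = [sp.diff(F, v) for v in (x1, x2, lam)]  # the three partial derivatives
print(sp.solve(equations, [x1, x2, lam], dict=True))
# Expected: x1 = 0, x2 = 1 (lambda = 1) and x1 = 0, x2 = -1 (lambda = -1).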
2.4.2 Duality
We consider now the following problem
Minimize f(x)
Subject to:
hi(x) = 0,  i = 1, ..., m
gj(x) ≤ 0,  j = 1, ..., r
We denote by f* the optimal value of the function f and x* the value of the variables leading to
the optimal value.
The Lagrangian of this problem can be written as follows:
L(x, λ, μ) = f(x) + Σ_{i=1}^{m} λi hi(x) + Σ_{j=1}^{r} μj gj(x)
From the Lagrangian, we define:
q(λ, μ) = inf_{x ∈ ℝⁿ} L(x, λ, μ) = inf_{x ∈ ℝⁿ} [ f(x) + Σ_{i=1}^{m} λi hi(x) + Σ_{j=1}^{r} μj gj(x) ]
The function q is called the dual function. The Lagrange multipliers λ and μ are also called dual
variables.
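As a small worked illustration (not in the original notes), consider minimizing f(x) = x² subject to
the single inequality constraint g(x) = 1 − x ≤ 0 (that is, x ≥ 1). The Lagrangian is
L(x, μ) = x² + μ(1 − x). For a fixed μ ≥ 0, the infimum over x is attained at x = μ/2, which gives the
dual function q(μ) = μ − μ²/4. Maximizing q over μ ≥ 0 gives μ* = 2 and q(μ*) = 1, which equals the
optimal value f* = f(1) = 1 of the original problem.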
3 Exercises
1- Consider the function f(X) = −x1² − x2 + x3 defined on Ω = {Y ∈ ℝ³ : y2 = y1/2, y3 = 6 y1/5}.
a. Say if these points are feasible solutions to the problem.
   i.   X^T = (0.25, 0.5, 0.3)
   ii.  X^T = (0.5, 0.25, 0.6)
   iii. X^T = (0.35, 0.175, 0.45)
b. Find a local maximizer.
c. Say if it is global.
Key Solution
a. No, Yes, No
b. x1 = 0.35, x2 = 0.175, x3 = 0.42
c. Yes; after substituting the constraints, the second derivative is always < 0, so the function is
concave (a short verification in Python follows below).
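A short SymPy check of this key (illustrative; it assumes SymPy is available). Substituting the
constraints reduces the problem to a single variable:

import sympy as sp

x1 = sp.symbols('x1', real=True)
# Substitute x2 = x1/2 and x3 = 6*x1/5 into f(X) = -x1^2 - x2 + x3
f = -x1**2 - x1 / 2 + sp.Rational(6, 5) * x1
print(sp.solve(sp.diff(f, x1), x1))   # [7/20], i.e. x1 = 0.35
print(sp.diff(f, x1, 2))              # -2, always < 0, so the local maximizer is global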
Solution
The figure shows the level sets. The gradient and the Hessian are as follows:
∇Z = (∂z/∂x, ∂z/∂y) = (−2x + 4, −2y + 4)
H = [ −2   0
       0  −2 ]  which is negative definite.
Gradient at (4, 1) is (−4, 2). Note this gives us the direction.
NOTE: A 2×2 symmetric matrix A = [ a  b
                                    b  d ] is:
1- positive definite if and only if a > 0 and det(A) > 0
2- negative definite if and only if a < 0 and det(A) > 0
3- indefinite if and only if det(A) < 0
(a small Python check of this rule is sketched below)
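A small Python check of the rule (illustrative only), applied to the Hessian above:

import numpy as np

def classify_2x2(A):
    # Classify a symmetric 2x2 matrix using a = A[0, 0] and det(A), as in the rule above.
    a, det = A[0, 0], np.linalg.det(A)
    if det < 0:
        return "indefinite"
    if a > 0 and det > 0:
        return "positive definite"
    if a < 0 and det > 0:
        return "negative definite"
    return "semidefinite / degenerate case (det = 0)"

print(classify_2x2(np.array([[-2.0, 0.0], [0.0, -2.0]])))   # negative definite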
3- Consider the function Max f(X) = ln(1 + x1 − x2)² − x1² − x2²
a. Determine ∇f, the gradient of f, and ∇²f, the Hessian of f.
b. Starting from X(0) = (−5, −5), determine the best search direction.
c. Using Newton's method, what search direction is used?
d. Perform one iteration of Newton's method to find X(1). By how much does the objective
function increase?
e. Write the expression of X(i) using the gradient method.
f. Starting with X(0) = (−5, −5), what problem should be solved to find the best value
of X(1)?
g. Solve the problem to find X(1) and the increase in the objective function.
Solution
a.
∇f(x) = [ 2/(1 + x1 − x2) − 2 x1,  −2/(1 + x1 − x2) − 2 x2 ]^T

∇²f(x1, x2) = [ −2/(1 + x1 − x2)² − 2        2/(1 + x1 − x2)²
                 2/(1 + x1 − x2)²           −2/(1 + x1 − x2)² − 2 ]
b. The best search direction from (−5, −5) is the direction of the gradient ∇f(−5, −5), i.e. the
direction (12, 8).
c. For Newton, X(i+1) = X(i) − H⁻¹(X(i)) ∇f(X(i)) = X(i) + Newton search direction.
So the search direction for Newton is (5.33, 4.67) (a short numerical check follows below).
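A short NumPy sketch (illustrative, not the official solution) for parts b, c, and d: it evaluates the
gradient and the Hessian at X(0) = (−5, −5), takes one Newton step, and reports the increase in
the objective:

import numpy as np

def f(x):
    x1, x2 = x
    return np.log((1 + x1 - x2) ** 2) - x1 ** 2 - x2 ** 2

def grad_f(x):
    x1, x2 = x
    u = 1 + x1 - x2
    return np.array([2 / u - 2 * x1, -2 / u - 2 * x2])

def hess_f(x):
    x1, x2 = x
    u2 = (1 + x1 - x2) ** 2
    return np.array([[-2 / u2 - 2, 2 / u2], [2 / u2, -2 / u2 - 2]])

x0 = np.array([-5.0, -5.0])
print(grad_f(x0))                                  # (12, 8): steepest-ascent direction (part b)
step = -np.linalg.solve(hess_f(x0), grad_f(x0))    # Newton search direction, ~ (5.33, 4.67) (part c)
x1 = x0 + step                                     # X(1) after one Newton iteration (part d)
print(step, x1, f(x1) - f(x0))                     # increase in the objective function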
4- Use Lagrange multipliers to show that the problem
f = 81 x² + y²   subject to the constraint 4 x² + y² = 9,   x, y ∈ ℝ
has four extreme points that need to be identified.