CHP 3 Part One
Introduction to Optimization Techniques
By Megha V Gupta
“Machine Learning” by Anuradha Srinivasaraghavan & Vincy Joseph
Copyright © 2019 Wiley India Pvt. Ltd. All rights reserved.
What is Optimization?
• Choosing the best element from some set of available alternatives
• Solving problems in which one seeks to minimize or maximize a real function
• A convex problem has a single optimum, which is both local and global.
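As a quick illustration (an added example, not part of the slide), a smooth convex function can be minimized numerically, e.g. with SciPy's general-purpose minimizer, and the single global optimum is reached regardless of the starting point:

import numpy as np
from scipy.optimize import minimize

# Convex quadratic: f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2  (illustrative choice)
def f(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

result = minimize(f, x0=np.array([10.0, -7.0]))   # any starting point works here
print(result.x)                                   # approx. [1.0, -2.0], the unique minimum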
Video: Introduction To Optimization: Objective Functions and Decision Variables (YouTube)
Definition
• Optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function.
• If a point x∗ corresponds to the minimum value of a function f(x), the same point also corresponds to the maximum value of −f(x).
Optimization
• Thus, optimization can be taken as a minimization problem, since the maximum of a function can be found by seeking the minimum of the negative of the same function.
• In addition, the following operations on the objective function will not change the optimum solution x∗:
  1. Multiplication (or division) of f(x) by a positive constant c.
  2. Addition (or subtraction) of a positive constant c to (or from) f(x).
Optimization
• There is no single method available for solving all optimization problems efficiently.
• Hence a number of optimization methods have been developed for solving different types of optimization problems.
• Operations research is a branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions.
Statement of an Optimization Problem
Types of optimization
• Constrained: the objective function is maximized or minimized subject to constraints imposed on the decision variables.
• Unconstrained: no constraints are imposed on the decision variables, and differential calculus can be used to analyze them.
Statement of an Optimization Problem
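In general form (reconstructed here from the definitions listed below), the problem can be stated as:

Find X = (x1, x2, …, xn)^T which minimizes f(X)
subject to gj(X) ≤ 0,  j = 1, 2, …, m
and lj(X) = 0,  j = 1, 2, …, p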
where
◦ X is an n-dimensional vector called the design vector,
◦ f(X) is termed the objective function, and
◦ gj(X) and lj(X) are known as inequality and equality constraints, respectively.
• The number of variables n and the number of constraints m and/or p need not be related in any way.
• The problem stated above is called a constrained optimization problem.
Statement of an Optimization Problem
• Some optimization problems do not involve any constraints and can be stated as shown below.
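In the same notation as before, the unconstrained form is simply:

Find X = (x1, x2, …, xn)^T which minimizes f(X), with no constraints on X.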
Finding Optimum Solution
• A point X∗ will be a relative minimum of f(X) if the necessary conditions
  ∂f/∂xi (X∗) = 0,  i = 1, 2, …, n
  are satisfied.
• The point X∗ is guaranteed to be a relative minimum if, in addition, the Hessian matrix of f evaluated at X∗ is positive definite.
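A minimal sketch of these two checks using SymPy (the objective below is an illustrative quadratic, the same one used later in the steepest descent example; it is not specified by this slide):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 2*x1*x2 + 2*x2**2 + x1              # illustrative objective

grad = [sp.diff(f, v) for v in (x1, x2)]        # necessary condition: gradient = 0
H = sp.hessian(f, (x1, x2))                     # sufficient condition: Hessian positive definite

for point in sp.solve(grad, (x1, x2), dict=True):
    print(point, "positive definite Hessian:", H.subs(point).is_positive_definite)
# prints {x1: -1, x2: 1/2} positive definite Hessian: True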
Classification of Unconstrained Minimization Methods
• Several methods are available for solving an unconstrained minimization problem.
• These methods can be classified into two broad categories:
  ◦ direct search methods
  ◦ descent methods
Direct search methods
• Direct search methods use only the values of the objective function; they do not require its partial derivatives and are therefore also called nongradient or zeroth-order methods.
Classification of Unconstrained Minimization Methods
• Descent techniques require first and, in some cases, second derivatives of the objective function.
• Since derivative information is used, descent methods are generally more efficient than direct search techniques.
• The descent methods are also known as gradient methods.
• Among the gradient methods,
  ◦ those requiring only first derivatives are called first-order methods;
  ◦ those requiring both first and second derivatives are termed second-order methods.
General Approach
• All the unconstrained minimization methods are iterative in nature.
• They start from an initial trial solution and proceed toward the minimum point in a sequential manner.
General Approach
• The iterative process is given by
  Xi+1 = Xi + λi* Si
  where
  ◦ Xi is the starting point of iteration i,
  ◦ Si is the search direction,
  ◦ λi* is the optimal step length, and
  ◦ Xi+1 is the final point of iteration i.
General Approach
• Note: all unconstrained minimization methods
  1. require an initial point X1 to start the iterative procedure, and
  2. differ from one another only in
     ◦ the method of generating the new point Xi+1 (from Xi) and
     ◦ the way the point Xi+1 is tested for optimality.
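A hedged sketch of this shared structure in Python (the function names, the gradient-norm stopping test and the tolerances are illustrative choices, not from the text):

import numpy as np

def minimize_iteratively(grad, direction, step_length, x1, tol=1e-6, max_iter=1000):
    """Generic unconstrained minimization loop: individual methods differ only in
    how the search direction and step length are generated and in the optimality
    test (here: a small gradient norm)."""
    x = np.asarray(x1, dtype=float)
    for i in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # test the current point for optimality
            break
        s = direction(x, g)               # generate the search direction Si
        lam = step_length(x, s)           # choose the step length λi*
        x = x + lam * s                   # Xi+1 = Xi + λi* Si
    return x

# Steepest descent with a fixed step is one instance of this template:
# minimize_iteratively(grad, lambda x, g: -g, lambda x, s: 0.1, x1=[0.5, 0.5])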
Classification of Optimization algorithms
• Derivative-based Optimization
  ◦ Descent Methods
  ◦ The Method of Steepest Descent
  ◦ Classical Newton’s Method
  ◦ Step Size Determination
• Derivative-free Optimization
  ◦ Genetic Algorithms
  ◦ Simulated Annealing
  ◦ Random Search
  ◦ Downhill Simplex Search
https://www.youtube.com/watch?v=Gbz8RljxIHo
Gradient of a Scalar Function
• The gradient of a scalar function f(X) is the vector of its first partial derivatives:
  ∇f = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn]^T
Evaluation of the Gradient
• The evaluation of the gradient requires the computation of the partial derivatives ∂f/∂xi, i = 1, 2, …, n.
Problems in Evaluation of Gradient
1. The function is differentiable at all points, but the calculation of the components of the gradient, ∂f/∂xi, is either impractical or impossible.
2. The expressions for the partial derivatives ∂f/∂xi can be derived, but they require a large computational time for evaluation.
3. The gradient is not defined at all points.
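A common workaround for the first two cases (standard practice, not stated on this slide) is to approximate the gradient numerically with finite differences; a minimal sketch:

import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of grad f(x), one component at a time."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Example with the quadratic used later in this chapter:
F = lambda x: x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + x[0]
print(numerical_gradient(F, [0.5, 0.5]))   # approx. [3., 3.]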
Steepest Descent (Cauchy) Method
• The use of the negative gradient vector as a direction for minimization was first made by Cauchy in 1847.
• In this method we start from an initial trial point X1 and iteratively move along the steepest descent directions until the optimum point is found.
Steps in Steepest Descent Method
1. Start with an arbitrary initial point X1. Set the iteration number as i = 1.
2. Find the search direction Si as Si = −∇f(Xi).
3. Determine the optimal step length λi* in the direction Si and set Xi+1 = Xi + λi* Si.
4. Test the new point Xi+1 for optimality. If Xi+1 is optimum, stop the process. Otherwise, go to step 5.
5. Set the new iteration number i = i + 1 and go to step 2.
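For a quadratic objective F(X) = ½ X^T A X + b^T X, the optimal step length in step 3 has the closed form λi* = (g^T g)/(g^T A g) with g = ∇F(Xi), which allows a compact sketch of the steps above (the matrix A, the vector b and the starting point are illustrative assumptions, chosen to match the example that follows):

import numpy as np

def cauchy_steepest_descent(A, b, x, tol=1e-8, max_iter=500):
    """Steepest descent with exact line search for F(X) = 1/2 X^T A X + b^T X."""
    for i in range(max_iter):
        g = A @ x + b                    # gradient of F at the current point
        if np.linalg.norm(g) < tol:      # step 4: optimality test
            break
        s = -g                           # step 2: search direction
        lam = (g @ g) / (g @ (A @ g))    # step 3: optimal step length
        x = x + lam * s                  # move to the new point Xi+1
    return x, i

A = np.array([[2.0, 2.0], [2.0, 4.0]])   # Hessian of the example below
b = np.array([1.0, 0.0])
x_opt, iters = cauchy_steepest_descent(A, b, np.array([0.5, 0.5]))
print(x_opt, iters)                      # approx. [-1.0, 0.5]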
Steepest Descent Method
• Owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
Example
F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1

Starting point: x0 = [0.5, 0.5]^T, learning rate α = 0.1

∇F(x) = [∂F/∂x1, ∂F/∂x2]^T = [2x1 + 2x2 + 1, 2x1 + 4x2]^T

g0 = ∇F(x)|x=x0 = [3, 3]^T
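A short script reproducing this example with the fixed learning rate (the three-iteration horizon is an arbitrary choice for illustration):

import numpy as np

def grad_F(x):
    # gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
alpha = 0.1                               # learning rate from the example
for k in range(3):
    g = grad_F(x)
    print(f"x{k} = {x}, g{k} = {g}")
    x = x - alpha * g                     # fixed-step steepest descent update
# g0 comes out as [3, 3]; the iterates move toward the minimizer [-1, 0.5].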
Effect of learning rate
• The larger the learning rate, the more oscillatory the trajectory becomes.
• Too large a learning rate will make the algorithm unstable.
• For quadratic functions, an upper limit on the learning rate can be set.
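The upper limit itself is a standard result, stated here for reference rather than taken from the slide: for a quadratic objective with Hessian A, steepest descent with a fixed learning rate α is stable only if

α < 2 / λmax(A),

where λmax(A) is the largest eigenvalue of A. For the example above, A = [[2, 2], [2, 4]] has λmax = 3 + √5 ≈ 5.24, so α must stay below roughly 0.38; the value α = 0.1 used earlier satisfies this.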
Problem
Newton’s Method
• In Newton’s method, we consider the quadratic approximation of the function f(X) at X = Xi using the Taylor series expansion
  f(X) ≈ f(Xi) + ∇fi^T (X − Xi) + ½ (X − Xi)^T [Ji] (X − Xi)
  where [Ji] is the Hessian matrix of f evaluated at the point Xi.
Newton’s Method
• By setting the partial derivatives of the above approximation equal to zero for the minimum of f(X), we get
  ∇f = ∇fi + [Ji] (X − Xi) = 0,
  which gives the Newton iteration
  Xi+1 = Xi − [Ji]^−1 ∇fi.
Newton’s Method
• Since higher-order terms of the Taylor series are neglected, the above equation is to be used iteratively to find the optimum solution X*.
• The sequence of points X1, X2, . . . , Xi+1 can be shown to converge to the actual solution X* from any initial point X1 sufficiently close to the solution X*, provided that [J1] is nonsingular.
• Since Newton's method uses the second partial derivatives of the objective function, it is a second-order method.
Problem with Newton’s Method
• Each iteration requires computing the matrix of second partial derivatives [Ji] and solving a linear system with it (or inverting it), which is expensive when the number of variables is large.
• If [Ji] is singular or not positive definite, the Newton step may be undefined or may fail to be a descent direction, and the method can diverge when started far from the solution.
• Thus, X2 is the optimum point.
• The method has thus converged in one iteration for this quadratic function.
Let's see how the Newton method converges in one iteration (refer to the quadratic function used in the problem above).
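A minimal sketch of that single Newton step on the quadratic F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1, starting from the same point x0 = [0.5, 0.5]^T used in the steepest descent example (the NumPy phrasing is an added illustration, not the book's own code):

import numpy as np

x0 = np.array([0.5, 0.5])
g0 = np.array([2*x0[0] + 2*x0[1] + 1, 2*x0[0] + 4*x0[1]])   # gradient at x0: [3, 3]
J = np.array([[2.0, 2.0], [2.0, 4.0]])                      # constant Hessian of F

x_next = x0 - np.linalg.solve(J, g0)    # Newton update: X2 = X1 - [J1]^-1 grad f1
g_next = np.array([2*x_next[0] + 2*x_next[1] + 1, 2*x_next[0] + 4*x_next[1]])

print(x_next)   # [-1.   0.5]
print(g_next)   # [0. 0.] -- the gradient vanishes, so X2 is the optimum in one step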