CHP 3 Part One
Introduction to Optimization Techniques
By Megha V Gupta
“Machine Learning” by Anuradha Srinivasaraghavan & Vincy Joseph
Copyright © 2019 Wiley India Pvt. Ltd. All rights reserved.
What is Optimization?
• Choosing the best element from some set of available alternatives
• Solving problems in which one seeks to minimize or maximize a real function
• A convex problem has a single optimum, which is both local and global.
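As a quick illustration (an added example, not part of the slide), a smooth convex function can be minimized numerically, e.g. with SciPy's general-purpose minimizer, and the single global optimum is reached regardless of the starting point:

import numpy as np
from scipy.optimize import minimize

# Convex quadratic: f(x1, x2) = (x1 - 1)^2 + (x2 + 2)^2  (illustrative choice)
def f(x):
    return (x[0] - 1.0)**2 + (x[1] + 2.0)**2

result = minimize(f, x0=np.array([10.0, -7.0]))   # any starting point works here
print(result.x)                                   # approx. [1.0, -2.0], the unique minimum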
Video: Introduction To Optimization: Objective Functions and Decision Variables (YouTube)
Definition
• Optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function.
• If a point x∗ corresponds to the minimum value of a function f(x), the same point also corresponds to the maximum value of −f(x).
Optimization
• Thus, optimization can be taken as a minimization problem, since the maximum of a function can be found by seeking the minimum of the negative of the same function.
• In addition, the following operations on the objective function will not change the optimum solution x∗:
  1. Multiplication (or division) of f(x) by a positive constant c.
  2. Addition (or subtraction) of a positive constant c to (or from) f(x).
Optimization
• There is no single method available for solving all optimization problems efficiently.
• Hence a number of optimization methods have been developed for solving different types of optimization problems.
• Operations research is a branch of mathematics concerned with the application of scientific methods and techniques to decision-making problems and with establishing the best or optimal solutions.
Statement of an Optimization Problem
Types of optimization
• Constrained: the objective function is maximized or minimized subject to constraints imposed on the decision variables.
• Unconstrained: no constraints are imposed on the decision variables, and differential calculus can be used to analyze them.
Statement of an Optimization Problem
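In general form (reconstructed here from the definitions listed below), the problem can be stated as:

Find X = (x1, x2, …, xn)^T which minimizes f(X)
subject to gj(X) ≤ 0,  j = 1, 2, …, m
and lj(X) = 0,  j = 1, 2, …, p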
where
◦ X is an n-dimensional vector called the design vector,
◦ f(X) is termed the objective function, and
◦ gj(X) and lj(X) are known as inequality and equality constraints, respectively.
• The number of variables n and the number of constraints m and/or p need not be related in any way.
• The problem stated above is called a constrained optimization problem.
Statement of an Optimization Problem
• Some optimization problems do not involve any constraints and can be stated as shown below.
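In the same notation as before, the unconstrained form is simply:

Find X = (x1, x2, …, xn)^T which minimizes f(X), with no constraints on X.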
Finding Optimum Solution
• A point X∗ will be a relative minimum of f(X) if the necessary conditions
  ∂f/∂xi (X∗) = 0,  i = 1, 2, …, n
  are satisfied.
• The point X∗ is guaranteed to be a relative minimum if, in addition, the Hessian matrix of f evaluated at X∗ is positive definite.
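A minimal sketch of these two checks using SymPy (the objective below is an illustrative quadratic, the same one used later in the steepest descent example; it is not specified by this slide):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 2*x1*x2 + 2*x2**2 + x1              # illustrative objective

grad = [sp.diff(f, v) for v in (x1, x2)]        # necessary condition: gradient = 0
H = sp.hessian(f, (x1, x2))                     # sufficient condition: Hessian positive definite

for point in sp.solve(grad, (x1, x2), dict=True):
    print(point, "positive definite Hessian:", H.subs(point).is_positive_definite)
# prints {x1: -1, x2: 1/2} positive definite Hessian: True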
Classification of Unconstrained Minimization Methods
• Several methods are available for solving an unconstrained minimization problem.
• These methods can be classified into two broad categories:
  ◦ direct search methods
  ◦ descent methods
Direct search methods
• Direct search methods use only the values of the objective function; they do not require its partial derivatives and are therefore also called nongradient or zeroth-order methods.
Classification of Unconstrained Minimization Methods
• Descent techniques require first and, in some cases, second derivatives of the objective function.
• Since derivative information is used, descent methods are generally more efficient than direct search techniques.
• The descent methods are also known as gradient methods.
• Among the gradient methods,
  ◦ those requiring only first derivatives are called first-order methods;
  ◦ those requiring both first and second derivatives are termed second-order methods.
General Approach
• All the unconstrained minimization methods are iterative in nature.
• They start from an initial trial solution and proceed toward the minimum point in a sequential manner.
General Approach
• The iterative process is given by
  Xi+1 = Xi + λi* Si
  where
  ◦ Xi is the starting point of iteration i,
  ◦ Si is the search direction,
  ◦ λi* is the optimal step length, and
  ◦ Xi+1 is the final point of iteration i.
General Approach
• Note: all unconstrained minimization methods
  1. require an initial point X1 to start the iterative procedure, and
  2. differ from one another only in
     ◦ the method of generating the new point Xi+1 (from Xi) and
     ◦ the way the point Xi+1 is tested for optimality.
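A hedged sketch of this shared structure in Python (the function names, the gradient-norm stopping test and the tolerances are illustrative choices, not from the text):

import numpy as np

def minimize_iteratively(grad, direction, step_length, x1, tol=1e-6, max_iter=1000):
    """Generic unconstrained minimization loop: individual methods differ only in
    how the search direction and step length are generated and in the optimality
    test (here: a small gradient norm)."""
    x = np.asarray(x1, dtype=float)
    for i in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # test the current point for optimality
            break
        s = direction(x, g)               # generate the search direction Si
        lam = step_length(x, s)           # choose the step length λi*
        x = x + lam * s                   # Xi+1 = Xi + λi* Si
    return x

# Steepest descent with a fixed step is one instance of this template:
# minimize_iteratively(grad, lambda x, g: -g, lambda x, s: 0.1, x1=[0.5, 0.5])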
Classification of Optimization algorithms
• Derivative-based Optimization
  ◦ Descent Methods
  ◦ The Method of Steepest Descent
  ◦ Classical Newton’s Method
  ◦ Step Size Determination
• Derivative-free Optimization
  ◦ Genetic Algorithms
  ◦ Simulated Annealing
  ◦ Random Search
  ◦ Downhill Simplex Search
https://www.youtube.com/watch?v=Gbz8RljxIHo
Gradient of a Scalar Function
• The gradient of a scalar function f(X) is the vector of its first partial derivatives:
  ∇f = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn]^T
Evaluation of the Gradient
• The evaluation of the gradient requires the computation of the partial derivatives ∂f/∂xi, i = 1, 2, …, n.
Problems in Evaluation of Gradient
1. The function is differentiable at all points, but the calculation of the components of the gradient, ∂f/∂xi, is either impractical or impossible.
2. The expressions for the partial derivatives ∂f/∂xi can be derived, but they require a large computational time for evaluation.
3. The gradient is not defined at all points.
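A common workaround for the first two cases (standard practice, not stated on this slide) is to approximate the gradient numerically with finite differences; a minimal sketch:

import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of grad f(x), one component at a time."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Example with the quadratic used later in this chapter:
F = lambda x: x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + x[0]
print(numerical_gradient(F, [0.5, 0.5]))   # approx. [3., 3.]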
Steepest Descent (Cauchy) Method
• The use of the negative gradient vector as a direction for minimization was first made by Cauchy in 1847.
• In this method we start from an initial trial point X1 and iteratively move along the steepest descent directions until the optimum point is found.
Steps in Steepest Descent Method
1. Start with an arbitrary initial point X1. Set the iteration number as i = 1.
2. Find the search direction Si as Si = −∇f(Xi).
3. Determine the optimal step length λi* in the direction Si and set Xi+1 = Xi + λi* Si.
4. Test the new point Xi+1 for optimality. If Xi+1 is optimum, stop the process. Otherwise, go to step 5.
5. Set the new iteration number i = i + 1 and go to step 2.
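For a quadratic objective F(X) = ½ X^T A X + b^T X, the optimal step length in step 3 has the closed form λi* = (g^T g)/(g^T A g) with g = ∇F(Xi), which allows a compact sketch of the steps above (the matrix A, the vector b and the starting point are illustrative assumptions, chosen to match the example that follows):

import numpy as np

def cauchy_steepest_descent(A, b, x, tol=1e-8, max_iter=500):
    """Steepest descent with exact line search for F(X) = 1/2 X^T A X + b^T X."""
    for i in range(max_iter):
        g = A @ x + b                    # gradient of F at the current point
        if np.linalg.norm(g) < tol:      # step 4: optimality test
            break
        s = -g                           # step 2: search direction
        lam = (g @ g) / (g @ (A @ g))    # step 3: optimal step length
        x = x + lam * s                  # move to the new point Xi+1
    return x, i

A = np.array([[2.0, 2.0], [2.0, 4.0]])   # Hessian of the example below
b = np.array([1.0, 0.0])
x_opt, iters = cauchy_steepest_descent(A, b, np.array([0.5, 0.5]))
print(x_opt, iters)                      # approx. [-1.0, 0.5]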
Steepest Descent Method
• Owing to the fact that the steepest descent direction is a local property, the method is not really effective in most problems.
Example
F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1

Starting point: x0 = [0.5, 0.5]^T, learning rate α = 0.1

∇F(x) = [∂F/∂x1, ∂F/∂x2]^T = [2x1 + 2x2 + 1, 2x1 + 4x2]^T

g0 = ∇F(x)|x=x0 = [3, 3]^T
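A short script reproducing this example with the fixed learning rate (the three-iteration horizon is an arbitrary choice for illustration):

import numpy as np

def grad_F(x):
    # gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1
    return np.array([2*x[0] + 2*x[1] + 1, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
alpha = 0.1                               # learning rate from the example
for k in range(3):
    g = grad_F(x)
    print(f"x{k} = {x}, g{k} = {g}")
    x = x - alpha * g                     # fixed-step steepest descent update
# g0 comes out as [3, 3]; the iterates move toward the minimizer [-1, 0.5].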
Effect of learning rate
• The larger the learning rate, the more oscillatory the trajectory becomes.
• Too large a learning rate will make the algorithm unstable.
• For quadratic functions, an upper limit on the learning rate can be set.
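The upper limit itself is a standard result, stated here for reference rather than taken from the slide: for a quadratic objective with Hessian A, steepest descent with a fixed learning rate α is stable only if

α < 2 / λmax(A),

where λmax(A) is the largest eigenvalue of A. For the example above, A = [[2, 2], [2, 4]] has λmax = 3 + √5 ≈ 5.24, so α must stay below roughly 0.38; the value α = 0.1 used earlier satisfies this.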
Problem
Newton’s Method
• In Newton’s method, we consider the quadratic approximation of the function f(X) at X = Xi using the Taylor series expansion
  f(X) ≈ f(Xi) + ∇fi^T (X − Xi) + ½ (X − Xi)^T [Ji] (X − Xi)
  where [Ji] is the Hessian matrix of f evaluated at the point Xi.
Newton’s Method
• By setting the partial derivatives of the above approximation equal to zero for the minimum of f(X), we get
  ∇f = ∇fi + [Ji] (X − Xi) = 0,
  which gives the Newton iteration
  Xi+1 = Xi − [Ji]^−1 ∇fi.
Newton’s Method
• Since higher-order terms of the Taylor series are neglected, the above equation is to be used iteratively to find the optimum solution X*.
• The sequence of points X1, X2, . . . , Xi+1 can be shown to converge to the actual solution X* from any initial point X1 sufficiently close to the solution X*, provided that [J1] is nonsingular.
• Since Newton's method uses the second partial derivatives of the objective function, it is a second-order method.
Problem with Newton’s Method
• Each iteration requires computing the matrix of second partial derivatives [Ji] and solving a linear system with it (or inverting it), which is expensive when the number of variables is large.
• If [Ji] is singular or not positive definite, the Newton step may be undefined or may fail to be a descent direction, and the method can diverge when started far from the solution.
• Thus, X2 is the optimum point.
• The method has thus converged in one iteration for this quadratic function.
Let's see how the Newton method converges in one iteration (refer to the quadratic function used in the problem above).
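A minimal sketch of that single Newton step on the quadratic F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1, starting from the same point x0 = [0.5, 0.5]^T used in the steepest descent example (the NumPy phrasing is an added illustration, not the book's own code):

import numpy as np

x0 = np.array([0.5, 0.5])
g0 = np.array([2*x0[0] + 2*x0[1] + 1, 2*x0[0] + 4*x0[1]])   # gradient at x0: [3, 3]
J = np.array([[2.0, 2.0], [2.0, 4.0]])                      # constant Hessian of F

x_next = x0 - np.linalg.solve(J, g0)    # Newton update: X2 = X1 - [J1]^-1 grad f1
g_next = np.array([2*x_next[0] + 2*x_next[1] + 1, 2*x_next[0] + 4*x_next[1]])

print(x_next)   # [-1.   0.5]
print(g_next)   # [0. 0.] -- the gradient vanishes, so X2 is the optimum in one step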