The Unconstrained Optimization Problem

An unconstrained optimization problem deals with finding a local minimizer $\mathbf{x}^*$ of a real-valued, smooth objective function $f(\mathbf{x})$ of $n$ variables, $f: \mathbb{R}^n \rightarrow \mathbb{R}$, formulated as

$$\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}) \tag{1}$$

with no restrictions on the decision variables $\mathbf{x}$. We work towards computing $\mathbf{x}^*$ such that, $\forall\ \mathbf{x}$ near $\mathbf{x}^*$, the following inequality is satisfied:

$$f(\mathbf{x}^*) \leq f(\mathbf{x}) \tag{2}$$

Necessary and Sufficient Conditions for Local Minimizer in Unconstrained Optimization

First-Order Necessary Condition

If $\mathbf{x}^*$ is a local minimizer of a real-valued smooth function $f(\mathbf{x}): \mathbb{R}^n \rightarrow \mathbb{R}$, and $\mathbf{\delta}$ is any direction in an open neighborhood $\subset \mathbb{R}^n$ of $\mathbf{x}^*$, then the first-order necessary condition for the minimizer is given by:

$$\nabla^T f(\mathbf{x}^*)\mathbf{\delta} = 0 \quad \forall\ \mathbf{\delta} \neq \mathbf{0} \tag{3}$$

i.e., the directional derivative is $0$, which ultimately reduces to the equation:

$$\nabla f(\mathbf{x}^*) = \mathbf{0} \tag{4}$$

An Example

The Rosenbrock function of $n$ variables is given by:

$$f(\mathbf{x}) = \sum_{i=1}^{n-1}\left(100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2\right) \tag{5}$$

where $\mathbf{x} \in \mathbb{R}^n$. For this example let us consider the Rosenbrock function of two variables, given by:

$$f(\mathbf{x}) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 \tag{6}$$

We will show that the first-order necessary condition is satisfied for the local minimizer $\mathbf{x}^* = \begin{bmatrix}1 \\ 1\end{bmatrix}$. We first check whether $\mathbf{x}^*$ is a minimizer or not. Putting $x_1 = x_2 = 1$ in $f(\mathbf{x})$, we get $f(\mathbf{x}^*) = 0$. Now, we check whether $\mathbf{x}^*$ satisfies the first-order necessary condition. For that we calculate $\nabla f(\mathbf{x}^*)$:

$$\nabla f(\mathbf{x}^*) = \begin{bmatrix}-400x_1(x_2 - x_1^2) - 2(1 - x_1) \\ 200(x_2 - x_1^2)\end{bmatrix}_{\mathbf{x}^*} = \begin{bmatrix}0 \\ 0\end{bmatrix} \tag{7}$$

So, we see that the first-order necessary condition is satisfied. We can do a similar analysis using the scipy.optimize package.

import numpy as np
import scipy
# Import the Rosenbrock function, its gradient and Hessian respectively
from scipy.optimize import rosen, rosen_der, rosen_hess

x_m = np.array([1, 1]) #given local minimizer
rosen(x_m) # check whether x_m is a minimizer

0.0

The result is 0.0, so $\mathbf{x}^*$ is a minimizer. We then check the first-order necessary condition using the gradient:

rosen_der(x_m) # calculates the gradient at the point x_m

array([0, 0])

This matches our calculations and confirms that the first-order necessary condition is satisfied.
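
As a quick preview of the second-order condition discussed in the next section, a minimal sketch (our own addition) can use the already-imported rosen_hess to inspect the Hessian of the Rosenbrock function at x_m:

# Sketch: check the definiteness of the Rosenbrock Hessian at x_m
H = rosen_hess(x_m)                 # 2x2 Hessian at the candidate minimizer
eigenvalues = np.linalg.eigvals(H)  # eigenvalues reveal (semi-)definiteness
print("Hessian at x_m:\n", H)
print("Eigenvalues:", eigenvalues)
print("Positive semi-definite:", bool(np.all(eigenvalues >= 0)))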

Second-Order Necessary Conditions

If $\mathbf{x}^*$ is a local minimizer of a real-valued smooth function $f(\mathbf{x}): \mathbb{R}^n \rightarrow \mathbb{R}$ in an open neighborhood $\subset \mathbb{R}^n$ of $\mathbf{x}^*$ along the feasible direction $\mathbf{\delta}$, and the Hessian $\mathbf{H}f(\mathbf{x})$ exists and is continuous in that open neighborhood, then the second-order necessary conditions for the minimizer are given by:

$$\nabla^T f(\mathbf{x}^*)\mathbf{\delta} = 0, \quad \forall\ \mathbf{\delta} \neq \mathbf{0} \tag{8}$$

and

$$\mathbf{\delta}^T \mathbf{H}f(\mathbf{x}^*)\mathbf{\delta} \geq 0, \quad \forall\ \mathbf{\delta} \neq \mathbf{0} \tag{9}$$

which reduce to the following:

$$\nabla f(\mathbf{x}^*) = \mathbf{0} \tag{10}$$

and

$$\mathbf{H}f(\mathbf{x}^*) \succeq 0 \tag{11}$$

where equation (11) means that the Hessian matrix should be positive semi-definite.

If you are interested in the proofs, refer to the books mentioned or the blog.
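
Equation (11) can be checked numerically. Below is a minimal sketch (the helper name is our own) that tests positive semi-definiteness of a symmetric matrix through its eigenvalues, with a small tolerance to guard against round-off:

import numpy as np

def is_positive_semidefinite(H, tol=1e-10):
    # A symmetric matrix is positive semi-definite exactly when all its eigenvalues are >= 0.
    eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh assumes a symmetric matrix
    return bool(np.all(eigenvalues >= -tol))

# Example: positive semi-definite (eigenvalues 2 and 0) but not positive definite
print(is_positive_semidefinite(np.array([[2.0, 0.0], [0.0, 0.0]])))  # True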

An Example

Let us now work with a new test function called Himmelblau's function, given by:

$$f(\mathbf{x}) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2 \tag{12}$$

where $\mathbf{x} \in \mathbb{R}^2$. We will check whether $\mathbf{x}^* = \begin{bmatrix}3 \\ 2\end{bmatrix}$ satisfies the second-order sufficient conditions, confirming that it is a strong local minimizer. We will again use the autograd package to do the analysis for this objective function. Let us first define the function and the local minimizer x_star in Python:

def Himm(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

x_star = np.array([3, 2], dtype='float') # local minimizer

We then check whether x_star is a minimizer.

print("function at x_star:", Himm(x_star))

function at x_star: 0.0

Now, we calculate the gradient vector and the Hessian matrix of the function at x_star and look at the results:

# import the necessary packages
import autograd.numpy as au
from autograd import grad, jacobian

# gradient vector of Himmelblau's function
Himm_grad = grad(Himm)
print("gradient vector at x_star:", Himm_grad(x_star))

# Hessian matrix of Himmelblau's function (the Jacobian of the gradient)
Himm_hess = jacobian(Himm_grad)
M = Himm_hess(x_star)
eigs = np.linalg.eigvals(M)
print("The eigenvalues of M:", eigs)
if np.all(eigs > 0):
    print("M is positive definite")
elif np.all(eigs >= 0):
    print("M is positive semi-definite")
else:
    print("M is not positive semi-definite")

gradient vector at x_star: [0. 0.]


The eigenvalues of M: [82.28427125 25.71572875]
M is positive definite

We see that x_star satisfies the second-order sufficient conditions and is a strong local minimizer. We performed the analysis with the autograd package instead of scipy.optimize because there might be cases where we need test functions that are not predefined in the scipy.optimize package, unlike the Rosenbrock function.
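
As a sanity check, a minimal sketch (our own addition; the helper name is ours) compares the autograd Hessian M with the Hessian of Himmelblau's function derived by hand from equation (12):

# Closed-form Hessian of Himmelblau's function, obtained by differentiating Eq. (12) twice
def Himm_hess_analytic(x):
    return np.array([[12*x[0]**2 + 4*x[1] - 42, 4*x[0] + 4*x[1]],
                     [4*x[0] + 4*x[1],          12*x[1]**2 + 4*x[0] - 26]])

print(Himm_hess_analytic(x_star))                   # [[74. 20.] [20. 34.]]
print(np.allclose(M, Himm_hess_analytic(x_star)))   # True if autograd agrees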

Algorithms for Solving Unconstrained Minimization Tasks


An optimization algorithm for solving an unconstrained minimization problem requires an initial point $\mathbf{x}_0$ to start with. The choice of $\mathbf{x}_0$ depends either on the user, who may have some idea about the data and the task at hand, or it can be set by the algorithm in charge. Starting with $\mathbf{x}_0$, the optimization algorithm iterates through a sequence of successive points $\{\mathbf{x}_0, \mathbf{x}_1, \ldots\}$, which stops when all the termination conditions for approximating the minimizer $\mathbf{x}^*$ are met. The algorithm generates this sequence by taking into consideration the value of the objective function at the current point, $f(\mathbf{x}_n)$. A new iterate $\mathbf{x}_{n+1}$ is added to the sequence only if the condition $f(\mathbf{x}_{n+1}) < f(\mathbf{x}_n)$ is satisfied. Although in many special cases the algorithm might fail to find such a point at each and every step, it must ensure that after some stipulated number $k$ of steps the following condition is met:

$$f(\mathbf{x}_{n+k}) < f(\mathbf{x}_n)$$

One of the important termination conditions, for example, is to check whether the first-order necessary condition holds to sufficient accuracy for a smooth objective function, i.e., $\|\nabla f(\mathbf{x}_n)\| < \epsilon$ at the final iterate, where $\epsilon$ is a small tolerance value. We will discuss these conditions further in the subsequent chapters.
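
As an illustration of such a gradient-norm termination test (this sketch is our own; the BFGS method itself is discussed in later chapters), scipy.optimize.minimize can be asked to stop once the gradient norm drops below the tolerance gtol:

# Sketch: terminate when ||grad f(x_n)|| < epsilon, using scipy's BFGS on the Rosenbrock function
from scipy.optimize import minimize, rosen, rosen_der
import numpy as np

x0 = np.array([-1.2, 1.0])                 # initial point x_0
result = minimize(rosen, x0, jac=rosen_der,
                  method='BFGS',
                  options={'gtol': 1e-6})  # stop when the gradient norm falls below 1e-6
print("approximate minimizer:", result.x)  # should be close to [1, 1]
print("gradient norm at the solution:", np.linalg.norm(rosen_der(result.x)))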

Fundamentally, there are two approaches available to generate $\mathbf{x}_{n+1}$ from $\mathbf{x}_n$:

Line Search Descent Method: Using this method, the optimization algorithm first picks a direction $\mathbf{\delta}_n$ for the $n$-th step and performs a search along this direction from the previously generated iterate $\mathbf{x}_{n-1}$ to find a new iterate $\mathbf{x}_n$ such that the condition $f(\mathbf{x}_n) < f(\mathbf{x}_{n-1})$ is satisfied. A direction $\mathbf{\delta}_n$ is selected for the next iterate if the following condition is satisfied:

$$\nabla^T f(\mathbf{x}_{n-1})\mathbf{\delta}_n < 0 \tag{13}$$

i.e., if the directional derivative in the direction $\mathbf{\delta}_n$ is negative. Here $f$ is the objective function. In view of that, the algorithm then needs to ascertain a distance by which it has to move along the direction $\mathbf{\delta}_n$ to figure out $\mathbf{x}_n$. The distance $\beta > 0$, which is called the step length, can be figured out by solving the one-dimensional minimization problem (a short sketch of this step follows the two methods below) formulated as:

$$\min_{\beta > 0} \tilde{f}(\beta) = \min_{\beta > 0} f(\mathbf{x}_{n-1} + \beta \mathbf{\delta}_n) \tag{14}$$

Trust Region Method: Using this method, the optimization algorithm develops a model function [refer to Nocedal & Wright], $M_n$, such that its behavior inside a boundary set around the current iterate $\mathbf{x}_n$ matches that of the objective function $f(\mathbf{x}_n)$ at that point. The model function is not expected to give a reasonable approximation to the behavior of the objective function at a point $\mathbf{x}_t$ which is far away from $\mathbf{x}_n$, i.e., not lying inside the boundary defined around $\mathbf{x}_n$. As a result, the algorithm restricts the search for the minimizer of $M_n$ to the boundary region, which is called the trust region, denoted by $\mathcal{T}$, and finds the step $\mathbf{\zeta}$ by solving the minimization problem formulated by:

$$\min_{\mathbf{\zeta}} M_n(\mathbf{x}_n + \mathbf{\zeta}), \quad \text{where } \mathbf{x}_n + \mathbf{\zeta} \in \mathcal{T} \tag{15}$$

Using this $\mathbf{\zeta}$, if the decrease in the value of $f(\mathbf{x}_{n+1})$ from $f(\mathbf{x}_n)$ is not sufficient, it can be inferred that the selected trust region is unnecessarily large. The algorithm then reduces the size of $\mathcal{T}$ accordingly and re-solves the problem given by equation (15). Most often, the trust region $\mathcal{T}$ is defined by a circle in the case of a two-dimensional problem or a sphere in the case of a three-dimensional problem of radius $\mathcal{T}_r > 0$, which follows the condition $\|\mathbf{\zeta}\| \leq \mathcal{T}_r$. In special cases, the shape of the trust region might vary. The model function is usually a quadratic, given by:

$$M_n(\mathbf{x}_n + \mathbf{\zeta}) = f(\mathbf{x}_n) + \mathbf{\zeta}^T \nabla f(\mathbf{x}_n) + \frac{1}{2}\mathbf{\zeta}^T \mathbf{B}f(\mathbf{x}_n)\mathbf{\zeta} \tag{16}$$

where $\mathbf{B}f(\mathbf{x}_n)$ is either the Hessian matrix $\mathbf{H}f(\mathbf{x}_n)$ or an approximation to it.
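
To make the line-search idea concrete, here is a minimal sketch (our own, including the arbitrary search bracket for the step length) of a single steepest-descent step on the Rosenbrock function: the direction $\mathbf{\delta}_n = -\nabla f(\mathbf{x}_{n-1})$ automatically satisfies condition (13), and the step length $\beta$ of equation (14) is found with scipy.optimize.minimize_scalar:

# Sketch: one line-search descent step on the Rosenbrock function
import numpy as np
from scipy.optimize import rosen, rosen_der, minimize_scalar

x_prev = np.array([-1.2, 1.0])       # current iterate x_{n-1}
delta = -rosen_der(x_prev)           # steepest-descent direction, so grad^T delta < 0

# One-dimensional problem of Eq. (14): f_tilde(beta) = f(x_{n-1} + beta * delta)
f_tilde = lambda beta: rosen(x_prev + beta * delta)
res = minimize_scalar(f_tilde, bounds=(0.0, 1.0), method='bounded')  # search beta in (0, 1]

beta_star = res.x
x_next = x_prev + beta_star * delta  # new iterate x_n
print("step length beta*:", beta_star)
print("f decreased from", rosen(x_prev), "to", rosen(x_next))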

Before moving into detailed discussions on line search descent methods and trust region methods in the later chapters, we will first deal with solving equation (14) in the immediate next chapter, which is itself an unconstrained one-dimensional minimization problem, where we have to solve

$$\min_{\beta > 0} \tilde{f}(\beta)$$

and deduce the value of $\beta^*$, which is the minimizer of $\tilde{f}(\beta)$.
