Unconstrained Optimization
An unconstrained optimization problem deals with finding the local minimizer $\mathbf{x}^*$ of a real-valued and smooth objective function $f(\mathbf{x})$ of $n$ variables, given by $f: \mathbb{R}^n \to \mathbb{R}$, formulated as

$$\min_{\mathbf{x}} f(\mathbf{x}) \tag{1}$$

with no restrictions on the decision variables $\mathbf{x}$. We work towards computing $\mathbf{x}^*$, such that $\forall\, \mathbf{x}$ near $\mathbf{x}^*$ the following inequality is satisfied:

$$f(\mathbf{x}^*) \leq f(\mathbf{x}) \tag{2}$$
Necessary and Sufficient Conditions for a Local Minimizer in Unconstrained Optimization
First-Order Necessary Condition
If there exists a local minimizer $\mathbf{x}^*$ for a real-valued smooth function $f(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}$, defined in an open neighborhood $\subset \mathbb{R}^n$ of $\mathbf{x}^*$, then along any direction $\delta$ the first-order necessary condition for the minimizer is given by:

$$\nabla^T f(\mathbf{x}^*)\,\delta = 0, \quad \forall\, \delta \neq 0 \tag{3}$$
An Example
The Rosenbrock function of $n$ variables is given by:

$$f(\mathbf{x}) = \sum_{i=1}^{n-1} \left( 100\,(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right) \tag{5}$$

where $\mathbf{x} \in \mathbb{R}^n$. For this example, let us consider the Rosenbrock function for two variables, given by:
$$f(\mathbf{x}) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2 \tag{6}$$

We will show that the first-order necessary condition is satisfied for the local minimizer $\mathbf{x}^* = [1, 1]^T$. We first check whether $\mathbf{x}^*$ is a minimizer or not. Putting $x_1 = x_2 = 1$ in $f(\mathbf{x})$, we get $f(\mathbf{x}) = 0$. Now, we check whether $\mathbf{x}^*$ satisfies the first-order necessary condition. For that we calculate $\nabla f(\mathbf{x}^*)$:
$$\nabla f(\mathbf{x}^*) = \begin{bmatrix} -400\,x_1 (x_2 - x_1^2) - 2\,(1 - x_1) \\ 200\,(x_2 - x_1^2) \end{bmatrix}_{\mathbf{x}^*} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{7}$$
So we see that the first-order necessary condition is satisfied. We can do a similar analysis using the scipy.optimize package.
import numpy as np
import scipy
# Import the Rosenbrock function, its gradient and Hessian respectively
from scipy.optimize import rosen, rosen_der, rosen_hess
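We evaluate the Rosenbrock function at $\mathbf{x}^* = [1, 1]^T$. The cell below is a minimal sketch of this step; the variable name x_star is illustrative:
# Candidate local minimizer of the two-variable Rosenbrock function
x_star = np.array([1, 1])
# Evaluate the objective at x_star
rosen(x_star)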
0.0
The result is 0.0 . So x∗ is a minimizer. We then check for the first order necessary condition, using the gradient:
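A minimal sketch of the corresponding cell, using rosen_der imported above:
# Gradient of the Rosenbrock function at x_star
rosen_der(x_star)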
array([0, 0])
This matches our calculations and confirms that the first-order necessary condition is satisfied.
In addition to the vanishing gradient $\nabla f(\mathbf{x}^*) = 0$, the second-order conditions for a local minimizer involve the Hessian matrix $\mathbf{H}f(\mathbf{x}^*)$:

$$\delta^T \mathbf{H}f(\mathbf{x}^*)\,\delta \geq 0, \quad \forall\, \delta \neq 0 \tag{9}$$

and

$$\mathbf{H}f(\mathbf{x}^*) \succeq 0 \tag{11}$$

where Eq. (11) means that the Hessian matrix should be positive semi-definite.
If you are interested in the proofs, refer to the books mentioned or the blog.
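Continuing the Rosenbrock example, we can check the Hessian condition numerically with rosen_hess, which was already imported above. A minimal sketch:
# Hessian of the Rosenbrock function at x* = [1, 1]
H = rosen_hess(np.array([1.0, 1.0]))
# Both eigenvalues are positive, so the Hessian is positive (semi-)definite at x*
np.linalg.eigvals(H)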
An Example
Let us now work with a new test function called Himmelblau's function, given by:

$$f(\mathbf{x}) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2 \tag{12}$$

where $\mathbf{x} \in \mathbb{R}^2$. We will check whether $\mathbf{x}^* = [3, 2]^T$ satisfies the second-order sufficient conditions, confirming that it is a strong local minimizer. We will again use the autograd package to do the analysis for this objective function. Let us first define the function and the local minimizer as x_star in Python:
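A minimal sketch of this cell, assuming autograd's NumPy wrapper; the function name himmelblau is illustrative:
import autograd.numpy as np
from autograd import grad, hessian

# Himmelblau's function of two variables, Eq. (12)
def himmelblau(x):
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

# Candidate strong local minimizer
x_star = np.array([3.0, 2.0])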
Now, we calculate the gradient vector and the Hessian matrix of the function at x_star and look at the results:
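A sketch of that computation using autograd's grad and hessian:
# Gradient and Hessian of Himmelblau's function, evaluated at x_star
gradient = grad(himmelblau)
hess = hessian(himmelblau)

print(gradient(x_star))  # numerically the zero vector
print(hess(x_star))      # a positive-definite matrix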
We see that x_star satisfies the second-order sufficient conditions and is a strong local minimizer. We performed the analysis using the autograd package instead of scipy.optimize because there might be cases where we need test functions that are not predefined in the scipy.optimize package, unlike the Rosenbrock function.
An iterative optimization algorithm generates a sequence of iterates $\{\mathbf{x}_n\}$ that stops when termination conditions are met for approximating the minimizer $\mathbf{x}^*$. The algorithm generates this sequence taking into consideration the value of the objective function at a particular point, $f(\mathbf{x}_n)$. A new iterate $\mathbf{x}_{n+1}$ is added to the sequence if the condition $f(\mathbf{x}_{n+1}) < f(\mathbf{x}_n)$ holds. Although in many special cases the algorithm might fail to find such a point in each and every step, it must ensure that after some stipulated number $k$ of steps the following condition is met:

$$f(\mathbf{x}_{n+k}) < f(\mathbf{x}_n)$$

One important terminating condition, for example, is to check whether the first-order necessary condition holds to sufficient accuracy for a smooth objective function, i.e., $\|\nabla f(\mathbf{x}_n)\| < \epsilon$, where $\epsilon$ is a small tolerance value. We will discuss these conditions further in the subsequent chapters.
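As a quick illustration of this stopping test, here is a sketch reusing the Rosenbrock gradient from the example above; the tolerance value is an assumption:
eps = 1e-6                  # tolerance value (illustrative)
x_n = np.array([1.0, 1.0])  # current iterate
# Terminate when the gradient norm falls below the tolerance
np.linalg.norm(rosen_der(x_n)) < eps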
Fundamentally, there are two approaches available to generate the new iterate $\mathbf{x}_{n+1}$ from $\mathbf{x}_n$:
Line Search Descent Method: Using this method, the optimization algorithm first picks a direction $\delta_n$ for the $n$th step and performs a search along this direction from the previously generated iterate $\mathbf{x}_{n-1}$ to find a new iterate $\mathbf{x}_n$ such that the condition $f(\mathbf{x}_n) < f(\mathbf{x}_{n-1})$ is satisfied. A direction $\delta_n$ is selected for the next iterate if the following condition is satisfied:

$$\nabla^T f(\mathbf{x}_{n-1})\,\delta_n < 0 \tag{13}$$

i.e., if the directional derivative in the direction $\delta_n$ is negative. Here $f$ is the objective function. In view of that, the algorithm then needs to ascertain the distance by which it has to move along the direction $\delta_n$ to figure out $\mathbf{x}_n$. This distance $\beta > 0$, which is called the step length, can be figured out by solving the one-dimensional minimization problem formulated as:

$$\min_{\beta > 0} \tilde{f}(\beta) = \min_{\beta > 0} f(\mathbf{x}_{n-1} + \beta\,\delta_n) \tag{14}$$
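For instance, the negative gradient always satisfies Eq. (13) away from a stationary point. A minimal sketch for the Rosenbrock function (the iterate x_prev is illustrative):
x_prev = np.array([-1.2, 1.0])  # illustrative iterate x_{n-1}
delta_n = -rosen_der(x_prev)    # steepest-descent direction
# The directional derivative along delta_n is negative, so Eq. (13) holds
rosen_der(x_prev) @ delta_n < 0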
Trust Region Method: Using this method, the optimization algorithm develops a model function [refer to Nocedal & Wright], $M_n$, such that its behavior inside a boundary set around the current iterate $\mathbf{x}_n$ matches that of the objective function $f(\mathbf{x}_n)$ at that point. The model function is not expected to give a reasonable approximation to the behavior of the objective function at a point $\mathbf{x}_t$ which is far away from $\mathbf{x}_n$, i.e., not lying inside the boundary defined around $\mathbf{x}_n$. As a result, the algorithm restricts the search for the minimizer of $M_n$ to the inside of the boundary region, which is called the trust region and denoted by $\mathcal{T}$, and finds the step $\zeta$ by solving the minimization problem formulated by:

$$\min_{\zeta} M_n(\mathbf{x}_n + \zeta), \quad \text{where } \mathbf{x}_n + \zeta \in \mathcal{T} \tag{15}$$

Using this $\zeta$, if the decrease in the value of $f(\mathbf{x}_{n+1})$ from $f(\mathbf{x}_n)$ is not sufficient, it can be inferred that the selected trust region is unnecessarily large. The algorithm then reduces the size of $\mathcal{T}$ accordingly and re-solves the problem given by Eq. (15). Most often, the trust region $\mathcal{T}$ is defined by a circle in the case of a two-dimensional problem or a sphere in the case of a three-dimensional problem, of radius $\mathcal{T}_r > 0$, which follows the condition $\|\zeta\| \leq \mathcal{T}_r$. In special cases, the shape of the trust region might vary. The model function is usually quadratic, given by:

$$M_n(\mathbf{x}_n + \zeta) = f(\mathbf{x}_n) + \zeta^T \nabla f(\mathbf{x}_n) + \frac{1}{2}\,\zeta^T \mathbf{B}f(\mathbf{x}_n)\,\zeta \tag{16}$$

where $\mathbf{B}f(\mathbf{x}_n)$ is either the Hessian matrix $\mathbf{H}f(\mathbf{x}_n)$ or an approximation to it.
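To make the idea concrete, the following is a minimal sketch of a single model-minimization step for the Rosenbrock function, taking $\mathbf{B}f(\mathbf{x}_n)$ as the exact Hessian; the iterate, the trust-region radius, and the use of a generic constrained solver in place of a dedicated trust-region subproblem solver are all assumptions for illustration:
from scipy.optimize import minimize

def model(zeta, x_n):
    # Quadratic model M_n(x_n + zeta) of Eq. (16), with B f(x_n) = H f(x_n)
    return rosen(x_n) + zeta @ rosen_der(x_n) + 0.5 * zeta @ rosen_hess(x_n) @ zeta

x_n = np.array([-1.2, 1.0])  # illustrative current iterate
T_r = 0.5                    # assumed trust-region radius

# Solve Eq. (15): minimize the model subject to ||zeta|| <= T_r
result = minimize(model, x0=np.zeros(2), args=(x_n,),
                  constraints=[{"type": "ineq", "fun": lambda zeta: T_r - np.linalg.norm(zeta)}])
result.x  # candidate step zeta inside the trust region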
Before moving into detailed discussions on line search descent methods and trust region methods in the later chapters, we will first deal with solving Eq. (14) in the immediate next chapter, which is itself an unconstrained one-dimensional minimization problem, where we have to solve for

$$\min_{\beta > 0} \tilde{f}(\beta)$$

and deduce the value of $\beta^*$, which is the minimizer of $\tilde{f}(\beta)$.
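As a preview, such a one-dimensional problem can already be approximated with scipy.optimize.minimize_scalar. A minimal sketch; the iterate, the direction, and the bracketing interval are illustrative:
from scipy.optimize import minimize_scalar

x_prev = np.array([-1.2, 1.0])  # illustrative iterate x_{n-1}
delta_n = -rosen_der(x_prev)    # a descent direction, see Eq. (13)

# f~(beta) = f(x_{n-1} + beta * delta_n), minimized over beta > 0 (Eq. (14))
f_tilde = lambda beta: rosen(x_prev + beta * delta_n)
result = minimize_scalar(f_tilde, bounds=(0, 1), method="bounded")
result.x  # approximate step length beta*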