Lecture 2: Unconstrained Optimization
PPGEQ – UFRGS
Prof. Jorge Otávio Trierweiler
Outline
1 Introduction
Basic Concepts
Indirect Solution
2 Methods
3 Line Search
4 Trust Region
5 Least Squares
6 Final Remarks
Introduction
minimize_x f(x)
Introduction
f′(x) = df/dx = lim_{d→0} [f(x + d) − f(x)] / d
Introduction
Gradient Vector:
∇f(x) = [∂f(x)/∂x_1, …, ∂f(x)/∂x_n]^T
Hessian Matrix:
H(x) = ∇²f(x), with entries [H(x)]_{ij} = ∂²f(x)/∂x_i ∂x_j,  i, j = 1, …, n
∇f(x) points in the direction of greatest increase of f(x); H(x) = H(x)^T.
Introduction
f(x) ≈ f(x_k) + ∇f(x_k)^T d + (1/2) d^T H(x_k) d,    d = x − x_k
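To make these definitions concrete, here is a minimal Python sketch that approximates the gradient and Hessian by finite differences and evaluates the quadratic model above; the test function is a made-up example, not one from the lecture.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return 0.5 * (H + H.T)   # enforce symmetry, H = H^T

def quad_model(f, xk, x):
    """Second-order Taylor model f(x) ~ f(xk) + g'd + 0.5 d'Hd, with d = x - xk."""
    d = x - xk
    g, H = num_grad(f, xk), num_hess(f, xk)
    return f(xk) + g @ d + 0.5 * d @ H @ d

# Hypothetical test function (not from the lecture)
f = lambda x: x[0]**2 + 3*x[1]**2 + x[0]*x[1]
xk = np.array([1.0, 1.0])
print(num_grad(f, xk), quad_model(f, xk, np.array([1.1, 0.9])))
```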
Basic Concepts
What is a solution?
Basic Concepts
Convexity
Why is it so important?
For a convex function any local minimizer is a global minimizer
Basic Concepts
[Figure: stationary points of a two-variable function: a maximum (λ1 = −2, λ2 = −2) and a saddle (λ1 = 2, λ2 = −2), where λ1, λ2 are the Hessian eigenvalues.]
Indirect Solution
Example
minimize_x x² − 2x + 1
Analytical Solution:
df/dx = 2x − 2 = 0  ⇒  x* = 1
Sufficient condition:
d²f/dx² = 2 > 0
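A quick symbolic check of this analysis (a minimal sketch assuming SymPy is available; not part of the course files):

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - 2*x + 1

df = sp.diff(f, x)            # first-order condition: df/dx = 0
stationary = sp.solve(df, x)  # -> [1]
d2f = sp.diff(f, x, 2)        # sufficient condition: d2f/dx2 > 0
print(stationary, d2f)        # [1] 2
```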
Indirect Solution
∇f(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, …, ∂f(x)/∂x_n]^T = 0
Indirect Solution
minimize_x x² − x
Indirect Solution
minimize_x x⁴ − x + 1
Newton's method: x_{k+1} = x_k − (4x_k³ − 1)/(12x_k²). For x_0 = 3:

k    x_k       f(x_k)
1    3.0000    79.0000
2    2.0093    15.2904
3    1.3601     3.0619
…    …          …
8    0.6300     0.5275
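A minimal Python sketch of this Newton iteration (not the course's Matlab file); it reproduces the table above:

```python
def newton_1d(x, tol=1e-6, max_iter=50):
    """Newton's method for min x^4 - x + 1: x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    f   = lambda x: x**4 - x + 1
    df  = lambda x: 4*x**3 - 1
    d2f = lambda x: 12*x**2
    for k in range(1, max_iter + 1):
        print(f"{k:2d}  {x: .4f}  {f(x): .4f}")
        step = df(x) / d2f(x)
        if abs(step) < tol:
            return x
        x -= step
    return x

newton_1d(3.0)   # k=1: 3.0000 79.0000, k=2: 2.0093 15.2904, ...
```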
Methods
Direct Methods:
Scanning and Bracketing
Grid search
Interpolation
Stochastic Algorithms
Indirect Methods:
Steepest Descent Method;
Newton Method;
Quasi-Newton Method;
Conjugate Gradient.
Methods
Direct Methods
General Algorithm:
1. Select initial set of point(s);
2. Evaluate objective function at each point;
3. Compare the values and keep the best solution (smallest value);
Outline: Do it forever and then you will surely find the optimal solution (a minimal grid-search sketch is given after the remarks below).
Remarks:
they are easy to apply and suitable for nonsmooth problems;
they require many objective function evaluations;
there is no guarantee of convergence, nor any proof that the point is an optimum;
methods of last resort: use them when nothing else works.
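The grid-search sketch referred to above, following the three steps of the general algorithm; the two-variable objective is a made-up example:

```python
import itertools
import numpy as np

def grid_search(f, bounds, n=50):
    """Evaluate f on a regular grid and keep the point with the smallest value."""
    axes = [np.linspace(lo, hi, n) for lo, hi in bounds]   # 1. set of points
    best_x, best_f = None, np.inf
    for point in itertools.product(*axes):                 # 2. evaluate at each point
        fx = f(np.array(point))
        if fx < best_f:                                     # 3. keep the best (smallest) value
            best_x, best_f = np.array(point), fx
    return best_x, best_f

# Hypothetical example objective
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
print(grid_search(f, bounds=[(-5, 5), (-5, 5)]))
```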
Methods
Indirect Methods
Line Search:
Choose a promising direction;
Find a step size that minimizes f(x) along this direction.
Trust Region:
Choose the maximum step size (the trust region radius);
Find a direction that minimizes a model of f(x) within that region.
Methods
Rates of Convergence
Linear:        ‖x_{k+1} − x*‖ ≤ c ‖x_k − x*‖,   0 < c < 1
Order p:       ‖x_{k+1} − x*‖ ≤ M ‖x_k − x*‖^p   (p = 2: quadratic)
Superlinear:   ‖x_{k+1} − x*‖ ≤ c_k ‖x_k − x*‖,  c_k → 0
Line Search
Outline:
0. Start from an initial point x_k;
1. Choose a search direction d_k;
2. Minimize along that direction to find a new point:
x_{k+1} = x_k + α d_k
where α is a positive scalar called the step size.
Search Direction
The search direction must be a descent direction:
∇f(x_k)^T d_k < 0
[Figure 2.6: a downhill direction p_k and the gradient ∇f_k.]
Exact line search (on the quadratic model): for φ(α) = f(x_k + α d),
∂φ(α)/∂α = ∇f(x_k)^T d + α d^T ∇²f(x_k) d = 0  ⇒  α = −(∇f(x_k)^T d)/(d^T ∇²f(x_k) d)
Line Search
[Figure: φ(α) along a search direction, marking the first local minimizer, the first stationary point, and the global minimizer.]
Basic Idea: Try out a sequence of candidate values for α, stopping to accept one of these values when certain conditions are satisfied.
[Figure: φ(α) = f(x_k + α p_k) with the sufficient-decrease line l(α), the desired slope (tangent), and the acceptable step-length intervals.]
The sufficient decrease and curvature conditions are known collectively as the Wolfe conditions:
f(x_k + α d) ≤ f(x_k) + c₁ α ∇f(x_k)^T d      (sufficient decrease)
∇f(x_k + α d)^T d ≥ c₂ ∇f(x_k)^T d            (curvature)
with 0 < c₁ < c₂ < 1.
Remarks:
The initial step length ᾱ is chosen to be 1 in Newton and quasi-Newton
methods, but can have different values in other algorithms such as
steepest descent or conjugate gradient;
An acceptable step length will be found after a finite number of trials!!!
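A minimal backtracking sketch of this idea, checking only the sufficient-decrease (Armijo) part of the Wolfe conditions; the constants and the quadratic test function are illustrative choices, not course values:

```python
import numpy as np

def backtracking(f, grad_f, x, d, alpha0=1.0, c1=1e-4, rho=0.5, max_trials=50):
    """Try alpha = alpha0, alpha0*rho, ... until sufficient decrease holds:
       f(x + alpha*d) <= f(x) + c1*alpha*grad_f(x)'d  (d must be a descent direction)."""
    alpha = alpha0
    fx, slope = f(x), grad_f(x) @ d
    for _ in range(max_trials):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= rho
    return alpha

# Illustrative use: steepest-descent direction on a hypothetical quadratic
f = lambda x: x[0]**2 + 10*x[1]**2
grad_f = lambda x: np.array([2*x[0], 20*x[1]])
x0 = np.array([1.0, 1.0])
print(backtracking(f, grad_f, x0, -grad_f(x0)))
```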
Methods
Steepest Descent
d = −∇f (xk )
The gradient is the vector that gives the (local) direction of the
greatest increase in f (x).
Methods
Steepest Descent
As an example, consider the function f(x) = x³ − 100x.
Methods
Steepest Descent
The Gradient:
f′(x) = 3x² − 100
Steepest Descent:
x_{k+1} = x_k − α_k (3x_k² − 100)
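A minimal steepest-descent sketch using the exact step length α = −(∇f(x_k)^T d)/(d^T ∇²f(x_k) d) derived earlier; the positive-definite quadratic below is a made-up example:

```python
import numpy as np

def steepest_descent(H, b, x, tol=1e-8, max_iter=500):
    """Minimize the quadratic f(x) = 0.5 x'Hx + b'x with d = -grad f
       and the exact step alpha = -(grad'd)/(d'Hd)."""
    for _ in range(max_iter):
        g = H @ x + b                 # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            break
        d = -g                        # steepest-descent direction
        alpha = -(g @ d) / (d @ H @ d)
        x = x + alpha * d
    return x

# Hypothetical quadratic: H positive definite, minimizer at [1, 1]
H = np.array([[2.0, 0.0], [0.0, 20.0]])
b = np.array([-2.0, -20.0])
print(steepest_descent(H, b, np.zeros(2)))
```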
Methods
f(x) ≈ f(x_k) + ∇f(x_k)^T d + (1/2) d^T ∇²f(x_k) d
What is the optimal direction d that minimizes this model of f(x)?
∂f(x)/∂d = ∇f(x_k) + ∇²f(x_k) d = 0
It results in:
d = −∇²f(x_k)^{-1} ∇f(x_k)
Methods
Interpretations:
d minimizes the second-order approximation of f(x):
f(x) ≈ f(x_k) + ∇f(x_k)^T d + (1/2) d^T ∇²f(x_k) d
d solves the linearized optimality conditions for min f(x):
∇f(x_k) + ∇²f(x_k) d = 0
Note: Newton's method has an implicit step size α = 1.
Methods
Newton Example
Newton Step:
x_{k+1} = x_k − (3x_k² − 100)/(6x_k)
For x_0 = 10 and a tolerance of 1e−3 (Matlab File - Ex01a.m):
If α_k = 1, the procedure converges in 4 iterations;
If exact line search is used, the procedure converges in 1 iteration.
Methods
Newton Direction
Newton Direction:
d = −∇²f(x_k)^{-1} ∇f(x_k)
Remarks:
Close to the solution it converges fast (quadratic convergence);
In order to converge from poor starting points, use a line search;
The Newton direction may not be a descent direction, even for sufficiently small α!
Methods
Numerical Example 02
As an example consider the following function:
[Figure: contour plot of the example function in the (x1, x2) plane.]
Methods
Newton Example
Line Search:
x_{k+1} = x_k − α_k H(x_k)^{-1} ∇f(x_k)
Practical Methods
Levenberg-Marquardt
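A sketch of the usual Levenberg-Marquardt idea: damp the Hessian with λI so the step interpolates between the Newton step (small λ) and a steepest-descent-like step (large λ). The λ update rule and the test function below are illustrative assumptions, not the course's implementation:

```python
import numpy as np

def lm_step(g, H, lam):
    """Damped Newton step d = -(H + lam*I)^{-1} g.
       lam -> 0 gives the Newton step; large lam approaches -g/lam."""
    return -np.linalg.solve(H + lam * np.eye(len(g)), g)

def damped_newton(f, grad_f, hess_f, x, lam=1e-3, tol=1e-8, max_iter=100):
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = lm_step(g, hess_f(x), lam)
        if f(x + d) < f(x):        # accept the step and trust the model more
            x, lam = x + d, lam / 10
        else:                      # reject the step and increase the damping
            lam *= 10
    return x

# Hypothetical example function with minimizers at (1, 0) and (-1, 0)
f = lambda x: (x[0]**2 - 1)**2 + x[1]**2
grad_f = lambda x: np.array([4*x[0]*(x[0]**2 - 1), 2*x[1]])
hess_f = lambda x: np.array([[12*x[0]**2 - 4, 0.0], [0.0, 2.0]])
print(damped_newton(f, grad_f, hess_f, np.array([2.0, 2.0])))
```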
Practical Methods
Quasi-Newton
Alternatively, the Hessian can be approximated from gradient information:
∇f(x_{k+1}) ≈ ∇f(x_k) + B_k d_k  ⇒  B_k d_k ≈ y_k
where d_k = x_{k+1} − x_k and y_k = ∇f(x_{k+1}) − ∇f(x_k).
Practical Methods
Quasi-Newton
The most popular update formulas are:
Broyden-Fletcher-Goldfarb-Shanno (BFGS):
B_{k+1} = B_k + (y_k y_k^T)/(y_k^T d_k) − (B_k d_k (B_k d_k)^T)/(d_k^T B_k d_k)
Davidon-Fletcher-Powell (DFP):
B_{k+1} = (I − (y_k d_k^T)/(y_k^T d_k)) B_k (I − (d_k y_k^T)/(y_k^T d_k)) + (y_k y_k^T)/(y_k^T d_k)
Note: For the first iteration, B_0 = I. Numerical experiments have shown that the BFGS update usually performs better than DFP in practice.
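A minimal sketch of the BFGS update above; the curvature check y_k^T d_k > 0 is a standard safeguard added here, and the quadratic test only verifies the secant relation B_{k+1} d_k = y_k:

```python
import numpy as np

def bfgs_update(B, d, y, eps=1e-12):
    """BFGS: B+ = B + (y y')/(y'd) - (B d)(B d)'/(d'B d).
       Skips the update if y'd is too small (curvature condition violated)."""
    yd = y @ d
    if yd < eps:
        return B
    Bd = B @ d
    return B + np.outer(y, y) / yd - np.outer(Bd, Bd) / (d @ Bd)

# Illustrative check on a quadratic f = 0.5 x'Hx: then y = H d exactly,
# and the updated B satisfies the secant equation B+ d = y.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)                     # B0 = I, as in the note above
d = np.array([1.0, 0.5])
y = H @ d
B = bfgs_update(B, d, y)
print(np.allclose(B @ d, y))      # True
```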
Practical Methods
Quasi-Newton
DFP:
B_{k+1}^{-1} = B_k^{-1} − (B_k^{-1} y_k y_k^T B_k^{-1})/(y_k^T B_k^{-1} y_k) + (d_k d_k^T)/(y_k^T d_k)
Practical Methods
Quasi-Newton
The methods presented in this section differ by the search direction:
d = −B_k^{-1} ∇f(x_k)
where:
Steepest Descent: B_k = I
Newton Method: B_k = H (the Hessian)
Quasi-Newton Method: B_k (a Hessian approximation)
Practical Methods
Quasi-Newton: Example
As an example consider the function (toy02.m):
[Figure: contour plot of the example function (toy02.m) in the (x1, x2) plane.]
Practical Methods
Conjugate Gradient: the search directions are conjugate with respect to the Hessian H,
d_i^T H d_j = 0,  ∀ i ≠ j,
and the minimizer can be expanded in these directions:
x* = α_0 d_0 + … + α_{n−1} d_{n−1}
Note: For non-quadratic functions, more iterations may be necessary!
Practical Methods
The new direction combines the current gradient and the previous direction (see the sketch below):
d_{k+1} = −∇f(x_{k+1}) + β_k d_k
where (Fletcher-Reeves):
β_k = (∇f(x_{k+1})^T ∇f(x_{k+1})) / (∇f(x_k)^T ∇f(x_k))
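The sketch referred to above: a minimal Fletcher-Reeves conjugate-gradient loop with a simple Armijo backtracking step and a steepest-descent restart as a safeguard; the quadratic test function is a made-up example:

```python
import numpy as np

def conjugate_gradient(f, grad_f, x, tol=1e-8, max_iter=200):
    """Fletcher-Reeves nonlinear CG:
       d0 = -grad f(x0);  d_{k+1} = -grad f(x_{k+1}) + beta_k d_k,
       beta_k = ||grad f(x_{k+1})||^2 / ||grad f(x_k)||^2."""
    g = grad_f(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                # safeguard: restart with steepest descent
            d = -g
        alpha = 1.0                   # simple Armijo backtracking for the step size
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
        g_new = grad_f(x)
        beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d
        g = g_new
    return x

# Hypothetical quadratic with minimizer [1, -2]
f = lambda x: (x[0] - 1)**2 + 5*(x[1] + 2)**2
grad_f = lambda x: np.array([2*(x[0] - 1), 10*(x[1] + 2)])
print(conjugate_gradient(f, grad_f, np.zeros(2)))
```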
Practical Methods
General Algorithm:
0. Guess an initial point x0 and d0 = −∇f (x0 );
1. At xk for the given direction dk select αk (exact, inexact);
2. Set xk+1 = xk + αk dk ;
3. Estimate d_{k+1} (Steepest Descent, Newton, Quasi-Newton, Conjugate Gradient) and repeat from step 1 until convergence;
Have fun!!!
Trust Region
At each iteration, solve the subproblem
minimize_d  ∇f(x_k)^T d + (1/2) d^T H(x_k) d    s.t.  ‖d‖ ≤ ∆_k
where ∆_k is the trust region radius. It can be shown that the solution is
d(λ) = −(H(x_k) + λ I)^{-1} ∇f(x_k),   for some λ ≥ 0.
When λ varies between 0 and ∞, the search direction d(λ) varies between the Newton direction d_N and a multiple of −∇f(x_k).
[Figure: the trust-region step p(λ) varies between the Newton direction p_N and the steepest-descent direction −g = −∇f(x) as λ increases.]
Trust Region
The ratio ρ_k compares the actual reduction in f(x) with the reduction predicted by the quadratic model:
ρ_k is always non-negative;
if ρ_k is close to one, there is good agreement with the quadratic approximation and the trust region is expanded;
if ρ_k is close to zero, the trust region radius is shrunk;
Trust Region
Subproblem Solution
Convergence does not require the exact solution of the subproblem; a crude approximation that gives a sufficient reduction is enough.
Scaled steepest-descent step to the boundary:
d = −(∇f(x_k)/‖∇f(x_k)‖) ∆_k
Cauchy point and Newton step:
d_C = −((∇f(x_k)^T ∇f(x_k))/(∇f(x_k)^T B_k ∇f(x_k))) ∇f(x_k),    d_N = −H(x_k)^{-1} ∇f(x_k)
The factor τ depends on d_C, d_N and ∆_k.
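A minimal dogleg-style sketch for this subproblem, combining d_C and d_N inside the radius ∆_k; the piecewise rule for τ below is the standard dogleg choice and is shown as an illustration, not necessarily the course's exact method:

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Approximate the trust-region subproblem with the dogleg path:
       d_C (Cauchy point) -> d_N (Newton step), truncated at radius delta."""
    d_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(d_newton) <= delta:          # full Newton step fits
        return d_newton
    d_cauchy = -(g @ g) / (g @ B @ g) * g          # minimizer along -g
    if np.linalg.norm(d_cauchy) >= delta:          # even the Cauchy point is outside
        return -delta * g / np.linalg.norm(g)
    # otherwise walk from d_C toward d_N until ||d_C + tau*(d_N - d_C)|| = delta
    p = d_newton - d_cauchy
    a, b, c = p @ p, 2 * d_cauchy @ p, d_cauchy @ d_cauchy - delta**2
    tau = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)
    return d_cauchy + tau * p

# Illustrative use with a hypothetical positive-definite B
B = np.array([[4.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, 1.0])
for delta in (0.1, 0.4, 1.0):
    d = dogleg_step(g, B, delta)
    print(delta, d, np.linalg.norm(d))
```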
Least Squares
Curve Fitting: minimize the sum of squared residuals, where
r(β) = y − f(x, β)
Solution (stationarity condition):
∂(r^T r)/∂β = 2 r^T ∂r/∂β = 0
Note: For m equations we can determine at most m parameters.
Least Squares
Applications:
Fitting of linear functions:
r_i = y_i − (β_1 x_i + β_2)
Least Squares
Weighted linear least squares (residual r = W(y − Aβ)):
β* = (A^T W^T W A)^{-1} A^T W^T W y
Least Squares
Example 1: For given experimental data, estimate A and B in
log(P^SAT) = A − B/T
Example 2: For given experimental data (Exa2.m), estimate α and β in
Nu = α Re^β
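A minimal sketch of Example 2: taking logarithms turns Nu = αRe^β into the linear model log Nu = log α + β log Re, which ordinary least squares can fit. The data below are synthetic, generated from assumed values α = 0.3 and β = 0.6; the course data live in Exa2.m:

```python
import numpy as np

# Synthetic (Re, Nu) data generated from Nu = 0.3 Re^0.6 with a little noise
rng = np.random.default_rng(0)
Re = np.array([1e3, 5e3, 1e4, 5e4, 1e5])
Nu = 0.3 * Re**0.6 * (1 + 0.02 * rng.standard_normal(Re.size))

# Linearized model: log(Nu) = log(alpha) + beta*log(Re)  ->  A c = y
A = np.column_stack([np.ones_like(Re), np.log(Re)])
y = np.log(Nu)
c, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares solution of A c ~ y

alpha, beta = np.exp(c[0]), c[1]
print(alpha, beta)                            # close to 0.3 and 0.6
```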
Final Remarks
Further Readings