
Optimization methods (MFE)

Lecture 01

Elena Perazzi

EPFL

Fall 2018



Course essentials
5 lectures on 24/9, 1/10, 8/10, 15/10, 29/10. Mondays, from 9:15 to 12:00.
  - Two 15-minute breaks, at 10:00 and 11:00.

Exam: in class on 12/11.

Exercise sessions on Tuesdays, 15:15-17:00.
Grading: 40% assignments, 60% final exam.
Assignments can be done in groups of (up to) 3 people.
Assignments will be small programming projects (in Matlab) or theoretical exercises that will help you prepare for the exam.
Exercise sessions will be taught by the TA, Damien Klossner ([email protected]).
  - Purpose of the exercise sessions: help you understand and get started with the assignments; answer questions on lecture material.
Course topics

Unconstrained optimization: numerical methods.

Constrained optimization with equality constraints
  - Theory of Lagrange multipliers
  - Numerical methods
Constrained optimization with inequality constraints
  - Theory of Kuhn-Tucker multipliers
  - Numerical methods
Dynamic optimization
  - Bellman equation



Today’s Topics
Unconstrained and constrained optimization defined.
Part I: one-dimensional unconstrained optimization
  - Basic theorems
  - Analytical method
  - Newton's method
  - Secant method
Part II: multidimensional unconstrained optimization
  - Basic theorems
  - Analytical method
  - Newton's method
  - Steepest descent (or gradient descent)
  - Quasi-Newton methods
  - Simplex method

A few words about local vs. global methods.



Intro

Constrained and Unconstrained Optimization


Given a function F : R^n → R:

Unconstrained optimization problem:

    min_x F(x)   or   max_x F(x),    x ∈ R^n

Constrained optimization problem:

    min_x F(x)   or   max_x F(x)
    s.t.  g(x) = 0
    and/or  h(x) < 0  (or h(x) > 0)

We usually consider minimization problems only, as finding the maximum of F(x) is equivalent to finding the minimum of −F(x).



Part I

Preliminary: Taylor theorem

Let the function f(x) : R → R be N times differentiable at the point x_0. Then

    f(x) = f(x_0) + Σ_{n=1}^{N} [f^(n)(x_0) / n!] (x − x_0)^n + o(|x − x_0|^N)        (1)

with

    lim_{x→x_0}  o(|x − x_0|^N) / |x − x_0|^N  = 0        (2)

Local approximation:

    (linear)     f(x) ≈ f(x_0) + [df(x_0)/dx] (x − x_0)
    (quadratic)  f(x) ≈ f(x_0) + [df(x_0)/dx] (x − x_0) + (1/2) [d²f(x_0)/dx²] (x − x_0)²



Part I

Taylor theorem

Let the function f(x) : R → R be N + 1 times differentiable at the point x_0. Then

    f(x) = f(x_0) + Σ_{n=1}^{N} [f^(n)(x_0) / n!] (x − x_0)^n + [f^(N+1)(ξ) / (N+1)!] (x − x_0)^(N+1)        (3)

with ξ between x and x_0.



Part I

Basic theorems
Assume: F(x) : R → R and F'(x) exists everywhere.

First-order necessary condition: If x* is a local minimum (or maximum) of the function F(x), then F'(x*) = 0.
  - Proof by contradiction: if F'(x*) < 0, then F(x* + dx) < F(x*) for a small dx > 0, so x* cannot be a local minimum (and similarly if F'(x*) > 0).
Second-order sufficient condition:
  - If F'(x*) = 0 and F''(x*) > 0, then x* is a local minimum of F(x).
  - If F'(x*) = 0 and F''(x*) < 0, then x* is a local maximum of F(x).
Proof: quadratic approximation of F around x*:
    F(x) ≈ F(x*) + F'(x*)(x − x*) + (1/2) F''(x*)(x − x*)².
If F'(x*) = 0 and F''(x*) < 0, then F(x) < F(x*) for every x sufficiently close to x*, so x* is a local maximum (analogously for a minimum).



Part I

Examples
[Figure: examples of stationary points. At each stationary point F'(x) = 0; F''(x) < 0 corresponds to a local maximum, F''(x) > 0 to a local minimum, and F''(x) = 0 to a case where the test is inconclusive (e.g. an inflection point). The sign of F'(x) is indicated on either side of each point.]

Analytical Approach

Solve analytically the equation F'(x) = 0.

Check the second-order conditions to be sure that we have found a minimum (or maximum).



Part I

Newton method – Root finding

Method to solve numerically for the roots (= zeros) of a function G(x).

Given a starting point x_0, use a linear approximation of G(x) around x_0:

    G(x) ≈ G(x_0) + (x − x_0) G'(x_0)        (4)

Solve for x such that the right-hand side is 0:

    G(x_0) + (x − x_0) G'(x_0) = 0
    →  x = x_0 − G(x_0) / G'(x_0)

If G(x) is not linear this will not lead exactly to the solution, but by iterating this procedure we get closer and closer to the solution.
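For concreteness, a minimal Matlab sketch of this iteration (the function g, its derivative dg, the starting point, and the tolerance below are illustrative assumptions, not taken from the slides):

% Newton root finding: minimal illustrative sketch
g   = @(x) x.^3 - 2;            % example function whose root we seek (assumption)
dg  = @(x) 3*x.^2;              % its derivative
x   = 1.0;                      % starting point x0 (assumption)
tol = 1e-5;                     % tolerance on |g(x)|
for k = 1:100
    if abs(g(x)) <= tol, break, end
    x = x - g(x)/dg(x);         % Newton update: x_{k+1} = x_k - g(x_k)/g'(x_k)
end
fprintf('approximate root: %.8f (exact: 2^(1/3) = %.8f)\n', x, 2^(1/3));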



Part I

Newton method

[Figure: graphical illustration of the Newton root-finding iteration.]



Part I

Newton method – Min/Max


To find the minimum (or maximum) of F(x), apply the root-finding method to the zeros of G(x) = F'(x). This leads to

    x = x_0 − F'(x_0) / F''(x_0)        (5)

Algorithm:
Step 1: Start from a point x_0 and set a tolerance level ε > 0 (e.g. ε = 10^−5). Set k = 0.
Step 2: While |F'(x_k)| > ε compute

    x_{k+1} = x_k − F'(x_k) / F''(x_k)        (6)

Step 3: If |F'(x_{k+1})| < ε, accept x_{k+1} as the solution. Otherwise, go back to Step 2 for a new iteration.
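A minimal Matlab sketch of this algorithm (the objective f, its derivatives, the starting point, and the tolerance are illustrative assumptions):

% Newton's method for minimization: minimal illustrative sketch
f   = @(x) x.^4 - 3*x.^2 + x;   % example objective (assumption)
df  = @(x) 4*x.^3 - 6*x + 1;    % F'
d2f = @(x) 12*x.^2 - 6;         % F''
x = 2; tol = 1e-5;              % starting point and tolerance (assumptions)
for k = 1:100
    if abs(df(x)) <= tol, break, end
    x = x - df(x)/d2f(x);       % Newton step on F'(x) = 0
end
% check the second-order condition: F''(x) > 0 indicates a local minimum
fprintf('stationary point: x = %.6f, F''''(x) = %.4f\n', x, d2f(x));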



Part I

Speed of Convergence

If there is an integer k such that x_k = x*, we say that the sequence {x_k} has finite convergence.
Otherwise, we study the asymptotic properties of {x_k}, i.e. the behavior of the error E_k ≡ ||x_k − x*||.
{x_k} has order of convergence p if

    lim_{k→∞}  E_{k+1} / (E_k)^p  =  C_p,    p ≥ 1,  0 < C_p < ∞        (7)

p = 1 → linear convergence, p = 2 → quadratic convergence. Usually an algorithm is regarded as efficient if its convergence is superlinear, i.e. C_1 = 0 (even if C_2 can be ∞!).



Part I

Example (from Wikipedia, "Rate of convergence"): the sequence b_k = 1/2^k converges linearly to 0, while c_k = 1/2^(2^k) converges quadratically.



Part I

Properties of Newton’s method

Convergence: not guaranteed (see Lecture Notes 1).
If it converges, convergence is quadratic (see Lecture Notes 1).
Requires knowledge of the first and second derivatives.
Could converge to a local maximum instead of a local minimum (it only finds stationary points).



Part I

Secant method (Root finding)

Similar to Newton's method.

Only difference: at each iteration, given the points x_{k−1} and x_k, replace the derivative G'(x_k) with the slope of the secant of the function between x_{k−1} and x_k:

    G_sec([x_{k−1}, x_k]) = [G(x_k) − G(x_{k−1})] / (x_k − x_{k−1})        (8)

Given x_k and a tolerance level ε, if |G(x_k)| > ε compute

    x_{k+1} = x_k − G(x_k) (x_k − x_{k−1}) / [G(x_k) − G(x_{k−1})]        (9)



Part I

Secant method (Root finding)

Algorithm:
Step 1: Start from two points x_0 and x_1, set a tolerance level ε > 0 (e.g. ε = 10^−5). Set k = 1.
Step 2: While |G(x_k)| > ε compute

    x_{k+1} = x_k − G(x_k) (x_k − x_{k−1}) / [G(x_k) − G(x_{k−1})]        (10)

Step 3: If |G(x_{k+1})| < ε, accept x_{k+1} as the solution. Otherwise, go back to Step 2 for a new iteration.
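A minimal Matlab sketch of the secant iteration (g, the two starting points, and the tolerance are illustrative assumptions):

% Secant method for root finding: minimal illustrative sketch
g = @(x) x.^3 - 2;               % example function (assumption)
x_prev = 1.0; x_curr = 2.0;      % two starting points x0, x1 (assumptions)
tol = 1e-5;
for k = 1:100
    if abs(g(x_curr)) <= tol, break, end
    % replace g'(x_k) by the secant slope between x_{k-1} and x_k
    x_next = x_curr - g(x_curr)*(x_curr - x_prev)/(g(x_curr) - g(x_prev));
    x_prev = x_curr;
    x_curr = x_next;
end
fprintf('approximate root: %.8f\n', x_curr);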



Part I

The secant method

[Figure: graphical illustration of a secant-method iteration.]



Part I

Comparison Newton vs secant

Newton's convergence is quadratic; the secant method's convergence can be shown to be superlinear but slower than quadratic (p_secant ≈ 1.6).
However, Newton requires the evaluation of both the function and its derivative at every step, whereas the secant method only requires the evaluation of the function.
Secant requires more steps, but each step might be faster. For example, if evaluating the derivative takes as much time as evaluating the function and all other costs are negligible, each step is twice as fast with the secant method.



Part II

Gradient of a function
Generalization of the first derivative to a multi-variable function.
The gradient of a function F : R^n → R, commonly denoted by ∇F, is a vector-valued function whose i-th component is ∂F/∂x_i.
Example in 3 dimensions. Consider a function F(x, y, z). The gradient is

    ∇F = [ ∂F/∂x ,  ∂F/∂y ,  ∂F/∂z ]'        (11)

The gradient of a function at a point x* is a vector pointing in the direction in which the function F increases most rapidly (relative to F(x*)).
Part II

Hessian of a function

F : R^n → R. In 3 dimensions, with variables (x, y, z):

    H_F = [ ∂²F/∂x²    ∂²F/∂x∂y   ∂²F/∂x∂z
            ∂²F/∂y∂x   ∂²F/∂y²    ∂²F/∂y∂z
            ∂²F/∂z∂x   ∂²F/∂z∂y   ∂²F/∂z²  ]        (12)



Part II

Taylor Approximation

Linear approximation:

    F(x) ≈ F(x_0) + ∇F(x_0) · (x − x_0)        (13)

Quadratic approximation:

    F(x) ≈ F(x_0) + ∇F(x_0) · (x − x_0) + (1/2) (x − x_0)' H_F(x_0) (x − x_0)        (14)

Gradient (linear approximation):

    ∇F(x) ≈ ∇F(x_0) + H_F(x_0) (x − x_0)        (15)



Part II

First-order necessary condition: If x* is a local minimum (or maximum) of the function F(x), then ∇F(x*) = 0.

Second-order sufficient condition:
  - If ∇F(x*) = 0 and H_F(x*) is positive definite, then x* is a local minimum of F(x).
      - M is positive definite if, for every x ≠ 0, x'Mx > 0.
      - A symmetric matrix M is positive definite ↔ all its eigenvalues are > 0 (the Hessian is a symmetric matrix).
  - If ∇F(x*) = 0 and H_F(x*) is negative definite (i.e. all its eigenvalues are < 0), then x* is a local maximum of F(x).
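As a quick check in Matlab, the second-order condition at a candidate point can be verified through the eigenvalues of the Hessian (the function and point below are illustrative assumptions):

% Second-order sufficient condition via eigenvalues: minimal illustrative sketch
% Example: F(x,y) = x^2 + 3y^2 + xy, candidate point x* = (0,0) (assumption)
H = [2 1; 1 6];          % Hessian of this F (constant, since F is quadratic)
lambda = eig(H);         % eigenvalues of the (symmetric) Hessian
if all(lambda > 0)
    disp('H positive definite: x* is a local minimum')
elseif all(lambda < 0)
    disp('H negative definite: x* is a local maximum')
end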



Part II

Newton method (max/min)


To find the minimum of F(x):
At each iteration, given a point x_k, use a linear approximation of the gradient of F around x_k, and solve for the point around x_k where the gradient is 0:

    ∇F(x) ≈ ∇F(x_0) + H_F(x_0)(x − x_0) = 0
    →  x = x_0 − inv(H_F(x_0)) ∇F(x_0)

Algorithm:
  - Step 1: Start from x_0 and set a tolerance level ε > 0. Set k = 0.
  - Step 2: While ||∇F(x_k)|| > ε compute

        x_{k+1} = x_k − inv(H_F(x_k)) ∇F(x_k)        (16)

  - Step 3: If ||∇F(x_{k+1})|| < ε, accept x_{k+1} as the solution; otherwise go back to Step 2.
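A minimal Matlab sketch of the multidimensional Newton iteration, using the Rosenbrock function as an illustrative test problem (the gradient, Hessian, starting point, and tolerance are assumptions, not from the slides):

% Multidimensional Newton: minimal illustrative sketch on the Rosenbrock function
% F(x,y) = (1-x)^2 + 100*(y - x^2)^2, whose unique minimum is at (1, 1)
gradF = @(v) [ -2*(1-v(1)) - 400*v(1)*(v(2)-v(1)^2) ; 200*(v(2)-v(1)^2) ];
hessF = @(v) [ 2 - 400*(v(2)-v(1)^2) + 800*v(1)^2 , -400*v(1) ; -400*v(1) , 200 ];
x = [-1.2; 1];                    % starting point (assumption)
tol = 1e-6;
for k = 1:100
    g = gradF(x);
    if norm(g) <= tol, break, end
    x = x - hessF(x) \ g;         % solve H*dx = grad instead of forming inv(H)
end
disp(x')                          % should be close to the minimizer (1, 1)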
Part II

Steepest descent (or Gradient descent) method

Main idea: starting from a point x_0, move in the direction opposite to the gradient, until a point is found where the gradient is sufficiently close to 0.
Variants of the method:
  - At each iteration, use a constant-length step.
  - At each iteration, search for a (local) minimum along the direction opposite to the gradient, for example with a line search method.
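A minimal Matlab sketch of the constant-step variant (the objective, its gradient, the starting point, and the step length are illustrative assumptions):

% Steepest (gradient) descent with a constant step: minimal illustrative sketch
gradF = @(v) [2*v(1) + v(2); 6*v(2) + v(1)];   % gradient of F(x,y) = x^2 + 3y^2 + xy (assumption)
x = [4; -3];                                   % starting point (assumption)
step = 0.1; tol = 1e-6;                        % fixed step length and tolerance (assumptions)
for k = 1:10000
    g = gradF(x);
    if norm(g) <= tol, break, end
    x = x - step*g;                            % move in the direction opposite to the gradient
end
disp(x')                                       % should approach the minimizer (0, 0)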



Part II

Steepest descent

[Figure: illustration of steepest-descent iterations.]



Part II

Quasi-Newton methods
Main feature of these methods: use an approximation of the Hessian rather than recalculating it at each iteration.
Secant-method algorithm:
  - Step 1: Start from x_0, use an initial Hessian approximation H_0, set a tolerance level ε > 0. Set k = 0.
  - Step 2: While ||∇F(x_k)|| > ε compute

        x_{k+1} = x_k − inv(H_k) ∇F(x_k)        (17)

  - Step 3: If ||∇F(x_{k+1})|| < ε, accept x_{k+1} as the solution; otherwise compute H_{k+1} by solving

        ∇F(x_{k+1}) = ∇F(x_k) + H_{k+1} (x_{k+1} − x_k)        (18)

    and go back to Step 2.

Note: the system (18) is under-determined for n > 1. E.g. for n = 2 it is a system of 2 equations in 3 unknowns (the distinct entries of a symmetric H_{k+1}). We need to impose some additional constraints on the Hessian components; this is what specific quasi-Newton updates (e.g. BFGS) do.
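For illustration, a minimal Matlab sketch using Broyden's rank-one update, which is one simple way to keep the Hessian approximation consistent with the secant equation (18); the objective, gradient, starting point, and H_0 are assumptions, and the course may use a different update (e.g. BFGS):

% Quasi-Newton with Broyden's rank-one update: minimal illustrative sketch
gradF = @(v) [2*v(1) + v(2); 6*v(2) + v(1)];   % gradient of F(x,y) = x^2 + 3y^2 + xy (assumption)
x = [4; -3];                                   % starting point (assumption)
H = eye(2);                                    % initial Hessian approximation H0 (assumption)
tol = 1e-6;
for k = 1:200
    g = gradF(x);
    if norm(g) <= tol, break, end
    x_new = x - H \ g;                         % quasi-Newton step with the current approximation
    s = x_new - x;                             % step taken
    y = gradF(x_new) - g;                      % change in the gradient
    H = H + ((y - H*s)*s') / (s'*s);           % rank-one update: enforces H_{k+1}*s = y (eq. 18)
    x = x_new;
end
disp(x')                                       % should approach the minimizer (0, 0)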
Part II

Simplex method

[Figure: illustration of the simplex method.]



Part II

Simplex method (derivative-free!)

Minimization problem in N dimensions: pick N + 1 points {x_1, x_2, ..., x_{N+1}}. In 2 dimensions the simplex is a triangle (in 3 dimensions it is a tetrahedron).
If x_k has the worst objective value (i.e. the maximum function value), move away from this point by reflecting it through the center of the face formed by the other points. The "centroid" of the best N points is

    c = (1/N) Σ_{i ≠ k} x_i        (19)

Replace the old x_k with

    x_new = c + α (c − x_k)        (20)

for a coefficient α > 0.
If x_new is worse than the old x_k, the step was probably too long → the simplex should be contracted.
If x_new is the new best point, we found a good direction and the simplex should be expanded.
Stop and accept the current best point as the solution after no better point can be found.
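A minimal Matlab sketch of the reflect/contract idea on this slide (a simplified variant, not the full Nelder-Mead algorithm used by fminsearch; the objective, initial simplex, and α are illustrative assumptions):

% Simplified simplex iteration (reflection + contraction): minimal illustrative sketch
F = @(v) (v(1)-1)^2 + (v(2)+2)^2;            % example objective (assumption), minimum at (1,-2)
X = [0 1 0; 0 0 1];                          % initial simplex: 3 points in 2D, stored as columns
alpha = 1;                                   % reflection coefficient
for it = 1:200
    fv = [F(X(:,1)), F(X(:,2)), F(X(:,3))];
    [~, worst] = max(fv);                    % vertex with the worst (largest) value
    c = mean(X(:, setdiff(1:3, worst)), 2);  % centroid of the best N points
    xr = c + alpha*(c - X(:,worst));         % reflect the worst vertex through the centroid
    if F(xr) < fv(worst)
        X(:,worst) = xr;                     % accept the reflected point
    else
        X(:,worst) = (X(:,worst) + c)/2;     % step too long: contract toward the centroid
    end
end
fv = [F(X(:,1)), F(X(:,2)), F(X(:,3))];
[~, best] = min(fv);
disp(X(:,best)')                             % best vertex found (true minimizer is (1, -2))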
Part II

Unconstrained Optimization in Matlab

Main routines: fminunc and fminsearch.

[x, fval, exitflag, output] = fminunc(@objfun, x0, options)
  - 'options' allows us to choose the algorithm. The main algorithms are 'quasi-newton' and 'trust-region'.
    (In addition, options allows us to choose other things, e.g. the tolerance level, the maximum number of iterations, etc.)
  - 'trust-region' is another variant of the Newton method in which x_{k+1} is constrained to remain within a certain distance of x_k.

[x, fval, exitflag, output] = fminsearch(objfun, x0, options)
  - Uses derivative-free methods, in particular the simplex method.
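A minimal usage sketch (the objective function and starting point are illustrative; the option names follow the standard fminunc interface, but check your Matlab version):

% Minimal usage sketch of fminunc and fminsearch (illustrative objective and x0)
objfun = @(v) (1 - v(1))^2 + 100*(v(2) - v(1)^2)^2;   % Rosenbrock function (assumption)
x0 = [-1.2, 1];

opts = optimoptions('fminunc', 'Algorithm', 'quasi-newton');
[x1, fval1, exitflag1, output1] = fminunc(objfun, x0, opts);

[x2, fval2] = fminsearch(objfun, x0);                 % derivative-free (simplex method)

disp(x1); disp(x2);                                   % both should be close to the minimizer (1, 1)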



Part II

Local or Global Optimization?


All the methods we have discussed are designed to find a local minimum: start from a trial point and look for improvement directions; when a local minimum is found, the algorithm stops.
How to avoid getting stuck in a local minimum?
First idea: try different starting points.
A more sophisticated algorithm: simulated annealing, a technique inspired by the controlled cooling of materials in materials science.
  - Introduce randomness in the algorithm.
  - At each step, try points at a random distance from the current point.
  - Accept a new point (with some probability) even if it is worse than the previous one. This leads to a more extensive exploration of the function domain.
  - A high acceptance probability is analogous to a 'high temperature'. Start with a 'high temperature' and progressively 'cool down the temperature'.
Implemented in the Matlab Global Optimization Toolbox.
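A minimal Matlab sketch of the acceptance rule behind simulated annealing (the objective, proposal scale, and cooling schedule are illustrative assumptions; the Global Optimization Toolbox provides a full implementation, e.g. simulannealbnd):

% Simulated annealing: minimal illustrative sketch of random proposals + probabilistic acceptance
F = @(x) x.^2 + 10*sin(3*x);        % example objective with several local minima (assumption)
x = 5; fx = F(x);                   % starting point (assumption)
best_x = x; best_f = fx;
T = 1.0;                            % initial 'temperature'
for k = 1:5000
    xprop = x + T*randn;            % candidate at a random distance from the current point
    fprop = F(xprop);
    % accept if better, or with probability exp(-(fprop-fx)/T) if worse
    if fprop < fx || rand < exp(-(fprop - fx)/T)
        x = xprop; fx = fprop;
        if fx < best_f, best_x = x; best_f = fx; end
    end
    T = 0.999*T;                    % progressively 'cool down the temperature'
end
fprintf('best point found: x = %.4f, F(x) = %.4f\n', best_x, best_f);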

