Optimization Methods (MFE) : Elena Perazzi
Lecture 01
Elena Perazzi
EPFL
Fall 2018
Taylor theorem

(linear)     f(x) ≈ f(x_0) + (df(x_0)/dx)(x − x_0)

(quadratic)  f(x) ≈ f(x_0) + (df(x_0)/dx)(x − x_0) + (1/2)(d²f(x_0)/dx²)(x − x_0)²
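As a quick illustration of these two approximations, here is a short Python sketch; the choice f(x) = exp(x) and the expansion point x_0 = 0 are my own example, not from the slides:

```python
import math

# f(x) = exp(x): its first and second derivatives are also exp(x)
f = math.exp
x0 = 0.0
f0, df0, d2f0 = f(x0), f(x0), f(x0)

for x in [0.1, 0.5, 1.0]:
    linear = f0 + df0 * (x - x0)
    quadratic = linear + 0.5 * d2f0 * (x - x0) ** 2
    print(f"x={x:.1f}  exact={f(x):.4f}  linear={linear:.4f}  quadratic={quadratic:.4f}")
```

The quadratic approximation stays accurate further away from x_0 than the linear one.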
Basic theorems
Assume: F(x) : R → R, F'(x) exists everywhere.
- If F'(x*) = 0 and F''(x*) > 0, then x* is a local minimum of F(x)
- If F'(x*) = 0 and F''(x*) < 0, then x* is a local maximum of F(x)
Proof: quadratic approximation of F around x*:
F(x) ≈ F(x*) + F'(x*)(x − x*) + (1/2)F''(x*)(x − x*)².
If F'(x*) = 0 and F''(x*) < 0, then F(x) < F(x*) for every x ≠ x* sufficiently close to x*.
Examples
[Figure: two plots. One shows a local maximum, where F'(x) = 0, F''(x) < 0 and F' changes sign from positive to negative; the other shows a local minimum, where F'(x) = 0, F''(x) > 0 and F' changes sign from negative to positive.]
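To make the first- and second-order conditions concrete, here is a small Python sketch; the function F(x) = x³ − 3x and its derivatives are my own example, not from the slides:

```python
# F(x) = x**3 - 3*x has critical points where F'(x) = 3*x**2 - 3 = 0, i.e. x = -1 and x = 1.
def d2F(x):
    return 6 * x  # second derivative of x**3 - 3*x

for x_star in (-1.0, 1.0):
    if d2F(x_star) < 0:
        print(f"x* = {x_star}: local maximum (F'' < 0)")
    elif d2F(x_star) > 0:
        print(f"x* = {x_star}: local minimum (F'' > 0)")
    else:
        print(f"x* = {x_star}: second-order test is inconclusive (F'' = 0)")
```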
Analytical Approach
To find a root of G(x) = 0, approximate G linearly around a point x_0 and solve:

G(x_0) + (x − x_0)G'(x_0) = 0
→ x = x_0 − G(x_0)/G'(x_0)
If G(x) is not linear, this will not lead exactly to the solution, but by iterating this procedure we get closer and closer to it.
Newton method

To find a stationary point of F, apply the same idea to G(x) = F'(x):

x = x_0 − F'(x_0)/F''(x_0)    (5)
Algorithm:
Step 1: Start from a point x_0 and set a tolerance level ε > 0 (e.g. ε = 10^−5). Set k = 0.
Step 2: While |F'(x_k)| > ε compute

x_{k+1} = x_k − F'(x_k)/F''(x_k)    (6)
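A minimal Python sketch of the algorithm above, assuming the derivatives F' and F'' are available as functions; the test function F(x) = x⁴ − 3x² + 2 and the iteration cap are my own additions:

```python
def newton_1d(dF, d2F, x0, eps=1e-5, max_iter=100):
    """Newton iteration x_{k+1} = x_k - F'(x_k)/F''(x_k), stopped when |F'(x_k)| <= eps."""
    x = x0
    for _ in range(max_iter):
        if abs(dF(x)) <= eps:
            break
        x = x - dF(x) / d2F(x)
    return x

# Example: F(x) = x**4 - 3*x**2 + 2, so F'(x) = 4*x**3 - 6*x and F''(x) = 12*x**2 - 6.
x_star = newton_1d(lambda x: 4*x**3 - 6*x, lambda x: 12*x**2 - 6, x0=1.0)
print(x_star)  # converges to the local minimum at sqrt(3/2) ≈ 1.2247
```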
Speed of Convergence
Let E_k denote the error at iteration k. The sequence converges with order p if

lim_{k→∞} E_{k+1}/(E_k)^p = C_p ,   p ≥ 1, 0 < C_p < ∞    (7)
Example:

b_k = 1/2^k
c_k = 1/2^(2^k)

b_k converges to 0 linearly (p = 1, C_1 = 1/2), while c_k converges quadratically (p = 2, C_2 = 1).
From Wikipedia: "Rate of convergence"
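A short Python sketch (based on the sequences b_k and c_k reconstructed above) that prints the ratios from definition (7), illustrating linear versus quadratic convergence:

```python
# b_k = 1/2**k and c_k = 1/2**(2**k) both converge to 0, but at very different rates.
for k in range(1, 6):
    b_k, b_next = 0.5**k, 0.5**(k + 1)
    c_k, c_next = 0.5**(2**k), 0.5**(2**(k + 1))
    print(f"k={k}:  b_(k+1)/b_k = {b_next / b_k:.3f}   c_(k+1)/c_k^2 = {c_next / c_k**2:.3f}")
# The b ratios stay at 0.5 (linear, C_1 = 1/2); the c ratios stay at 1.0 (quadratic, C_2 = 1).
```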
Secant method

Approximate the derivative of G with the secant through the last two iterates:

G^(sec)([x_{k−1}, x_k]) = (G(x_k) − G(x_{k−1}))/(x_k − x_{k−1})    (8)
Algorithm:
Step 1: Start from two points x_0 and x_1, set a tolerance level ε > 0 (e.g. ε = 10^−5). Set k = 1.
Step 2: While |G(x_k)| > ε compute

x_{k+1} = x_k − G(x_k)(x_k − x_{k−1})/(G(x_k) − G(x_{k−1}))    (10)
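A minimal Python sketch of iteration (10); the example G(x) = x² − 2 (whose positive root is √2) and the iteration cap are my own additions, not from the slides:

```python
def secant(G, x0, x1, eps=1e-5, max_iter=100):
    """Secant iteration x_{k+1} = x_k - G(x_k)*(x_k - x_{k-1})/(G(x_k) - G(x_{k-1}))."""
    x_prev, x = x0, x1
    for _ in range(max_iter):
        if abs(G(x)) <= eps:
            break
        x_prev, x = x, x - G(x) * (x - x_prev) / (G(x) - G(x_prev))
    return x

print(secant(lambda x: x**2 - 2, x0=1.0, x1=2.0))  # ≈ 1.41421 (sqrt(2))
```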
Gradient of a function
Generalization of the first derivative for a multi-variable function.
The gradient of a function F : R^n → R, commonly denoted by ∇F, is a vector-valued function whose i-th component is ∂F/∂x_i.
Example in 3 dimensions. Consider a function F(x, y, z). The gradient is

∇F = ( ∂F/∂x , ∂F/∂y , ∂F/∂z )^T    (11)
The gradient of a function at a point x* is a vector pointing in the direction in which the function F increases most rapidly (relative to F(x*)).
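As an illustration, a small Python sketch that approximates the gradient with central finite differences and compares it with the analytical gradient; the test function F(x, y, z) = x² + 2y² + 3z² is my own example:

```python
import numpy as np

def numerical_gradient(F, x, h=1e-6):
    """Central-difference approximation of the gradient of F: R^n -> R at the point x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (F(x + e) - F(x - e)) / (2 * h)
    return grad

F = lambda v: v[0]**2 + 2*v[1]**2 + 3*v[2]**2
x = np.array([1.0, 1.0, 1.0])
print(numerical_gradient(F, x))            # ≈ [2, 4, 6]
print(np.array([2*x[0], 4*x[1], 6*x[2]]))  # analytical gradient, for comparison
```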
Part II
Hessian of a function

Generalization of the second derivative for a multi-variable function. The Hessian of F : R^n → R, denoted H_F, is the n × n matrix whose (i, j) entry is ∂²F/∂x_i∂x_j.
Taylor Approximation
Linear approximation
F(x) ≈ F(x_0) + ∇F(x_0) · (x − x_0)    (13)
Quadratic approximation

F(x) ≈ F(x_0) + ∇F(x_0) · (x − x_0) + (1/2)(x − x_0)^T H_F(x_0)(x − x_0)    (14)

Setting the gradient of the quadratic approximation to zero gives the Newton step:

∇F(x) ≈ ∇F(x_0) + H_F(x_0)(x − x_0) = 0
→ x = x_0 − inv(H_F(x_0)) ∇F(x_0)    (15)
Algorithm:
- Step 1: Start from x_0 and set a tolerance level ε > 0. Set k = 0.
- Step 2: While ||∇F(x_k)|| > ε compute

  x_{k+1} = x_k − inv(H_F(x_k)) ∇F(x_k)    (16)

- Step 3: If ||∇F(x_{k+1})|| < ε, accept x_{k+1} as the solution, otherwise go back to Step 2.
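A minimal numpy sketch of the algorithm above; the quadratic test function, its gradient and Hessian are my own example (for a quadratic function the Newton step lands on the minimum in one iteration):

```python
import numpy as np

def newton_nd(grad, hess, x0, eps=1e-5, max_iter=100):
    """Iterate x_{k+1} = x_k - inv(H_F(x_k)) grad F(x_k) until ||grad F(x_k)|| <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - np.linalg.solve(hess(x), g)  # solve H d = g rather than forming inv(H)
    return x

# Example: F(x) = 0.5 x^T A x - b^T x with A positive definite; the minimum solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_nd(lambda x: A @ x - b, lambda x: A, x0=[5.0, -5.0])
print(x_star)  # ≈ np.linalg.solve(A, b)
```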
Steepest descent

Main idea: starting from a point x_0, move in the direction opposite to the gradient, until a point is found where the gradient is sufficiently close to 0.

Variants of the method (a sketch of the first variant follows below):
- At each iteration, use a constant-length step.
- At each iteration, search for a (local) minimum along the direction opposite to the gradient, for example with a line search method.
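A minimal Python sketch of the constant-step variant; the test function, step length and iteration cap are my own choices, not from the slides:

```python
import numpy as np

def steepest_descent(grad, x0, step=0.05, eps=1e-5, max_iter=1000):
    """Move opposite to the gradient with a constant step until ||grad F(x_k)|| <= eps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - step * g
    return x

# Example: F(x, y) = (x - 1)^2 + 10*(y + 2)^2, with minimum at (1, -2).
grad = lambda v: np.array([2 * (v[0] - 1), 20 * (v[1] + 2)])
print(steepest_descent(grad, x0=[0.0, 0.0]))  # ≈ [1, -2]
```

In the second variant, the fixed step would be replaced by a one-dimensional minimization of F along the direction −∇F(x_k) at every iteration.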
Quasi-Newton methods
Main feature of these methods: use an approximation of the Hessian rather than recalculating it at each iteration.
Secant method

Algorithm:
- Step 1: Start from x_0, use an initial Hessian H_0, and set a tolerance level ε > 0. Set k = 0.
- Step 2: While ||∇F(x_k)|| > ε compute

  x_{k+1} = x_k − inv(H_k) ∇F(x_k)    (17)

- Step 3: If ||∇F(x_{k+1})|| < ε, accept x_{k+1} as the solution, otherwise compute H_{k+1} by solving

  ∇F(x_{k+1}) = ∇F(x_k) + H_{k+1}(x_{k+1} − x_k)    (18)

  and go back to Step 2.
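In more than one dimension, equation (18) does not determine H_{k+1} uniquely; one common rank-one choice that satisfies it is Broyden's update, sketched below (the slides do not name a specific update, and the test gradient and starting point are my own example):

```python
import numpy as np

def quasi_newton(grad, x0, H0, eps=1e-5, max_iter=100):
    """Quasi-Newton iteration where H_{k+1} is chosen (Broyden rank-one update) so that
    grad F(x_{k+1}) = grad F(x_k) + H_{k+1} (x_{k+1} - x_k)."""
    x, H = np.asarray(x0, dtype=float), np.asarray(H0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x_new = x - np.linalg.solve(H, g)
        s = x_new - x                             # step taken
        y = grad(x_new) - g                       # change in the gradient
        H = H + np.outer(y - H @ s, s) / (s @ s)  # enforces H_new s = y
        x = x_new
    return x

# Same quadratic example as before: grad F(x) = A x - b, minimum where A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(quasi_newton(lambda x: A @ x - b, x0=[0.0, 0.0], H0=np.eye(2)))  # ≈ np.linalg.solve(A, b)
```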
Simplex method