
Optimization 2

Lecture outline

Unconstrained continuous optimization:


• Convexity
• Iterative optimization algorithms
• Gradient descent
• Newton’s method
• Gauss-Newton method

New topics:
• Axial iteration
• Levenberg-Marquardt algorithm
• Application
Introduction: Problem specification

Suppose we have a cost function (or objective function)

f ! x " # $%n & $%


Our aim is to find the value of the parameters x that minimize this function:

x* = arg min_x f(x)
subject to the following constraints:

• equality: c_i(x) = 0,  i = 1, . . . , m_e

• inequality: c_i(x) ≥ 0,  i = m_e + 1, . . . , m

We will start by focussing on unconstrained problems


Unconstrained optimization
function of one variable f(x)

min_x f(x)

[Figure: a 1-D function f(x) with a local minimum and a global minimum]

• down-hill search (gradient descent) algorithms can find local minima


• which of the minima is found depends on the starting point
• such minima often occur in real applications
Reminder: convexity
Class of functions

[Figure: an example of a convex function and of a non-convex function]

• Convexity provides a test for a single extremum


• A non-negative sum of convex functions is convex
Class of functions continued

[Figure: single extremum – convex; single extremum – non-convex; multiple extrema – non-convex; noisy "horrible" function – non-convex]


Optimization algorithm – key ideas

! "#$% δx &'() * ) + * f , x - δx . < f , x .

! /)#& 0 12+%&0* 3 0 +$0#*24+*#520'6%+*20 x n ! " # 7 0 x n - δx

! 82%'(20*)20643912:0* 3 0 +0&24#2&03;0< = 0 1#$20&2+4()2&0δx 7 α p

[Figure: contour plot of a quadratic cost function ('Optimization algorithm – Random direction')]
Choosing the direction 1: axial iteration

Alternate minimization over x and y

[Figure: alternating minimization along the coordinate axes ('Optimization algorithm – axial directions')]
Gradient and Partial Derivatives

A function of several variables can be written as f (x1 , x2 ), Gradient and Tangent Plane /
etc. Often times, we abbreviate multiple arguments in a 1st Degree Taylor Expansion
single vector as f (x).

Let a function f : Rn → R. The gradient of f is the


column vector of partial derivatives ∇f (x)
 ∂f (x) 
∂x1

∇f (x) :=  .. 
.

 
∂f (x)
∂xn

Suppose now a function g(x, y) with signature g : Rn × τx1 (y) = f (x) + (y − x)> ∇f (x)
Rm → R. Its derivative with respect to just x is written
as ∇x g(x, y).
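For example (an illustration of these definitions, not from the slides): for f(x) = x1² + 3 x1 x2, the gradient is ∇f(x) = (2x1 + 3x2, 3x1)ᵀ, and the first-degree Taylor expansion about x = (1, 1)ᵀ is τ_x¹(y) = 4 + 5(y1 − 1) + 3(y2 − 1).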
Choosing the direction 2: steepest descent

Move in the direction of the negative gradient −∇f(x_n)


[Figure: steepest-descent steps on a quadratic cost ('Optimization algorithm – Steepest descent')]
Steepest descent
[Figure: steepest-descent steps zig-zagging across the contours of a quadratic cost]

• The gradient is everywhere perpendicular to the contour lines.

• After each line minimization the new gradient is always orthogonal to the previous step direction (true of any line minimization).

• Consequently, the iterates tend to zig-zag down the valley in a very inefficient manner.
Gradient Descent

• Iterative method starting at an initial point x(0)


• Step to the next point x(k+1) in the direction of the
negative gradient

x(k+1) = x(k) − ∇f(x(k))

• Repeat until ‖∇f(x(k))‖ < ε for a chosen ε

• But: no convergence is guaranteed. For convergence, an additional line search is required.

[Figure: gradient-descent iterates on the surface of f(x) = ½(x1)² + 5(x2)²]

Line Search

• Take the descent step direction d = −∇f(x)

• Select the step length α as the minimizer min_{α ≥ 0} f(x + αd)

• In practice, α is selected with heuristics
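To make the update and the line-search heuristic concrete, here is a minimal NumPy sketch for the quadratic f(x) = ½(x1)² + 5(x2)² shown in the figure. It is only a sketch under my own assumptions: the function names, the backtracking rule (halving α until f decreases), the tolerance 1e-6 and the starting point are arbitrary choices, not taken from the slides.

    import numpy as np

    def f(x):                                    # the quadratic from the figure
        return 0.5 * x[0]**2 + 5.0 * x[1]**2

    def grad_f(x):                               # its gradient: (x1, 10*x2)
        return np.array([x[0], 10.0 * x[1]])

    def gradient_descent(x0, eps=1e-6, max_iters=1000):
        x = np.asarray(x0, dtype=float)
        for k in range(max_iters):
            g = grad_f(x)
            if np.linalg.norm(g) < eps:          # stop when ||grad f(x)|| < eps
                break
            alpha = 1.0                          # crude backtracking line search:
            while f(x - alpha * g) >= f(x) and alpha > 1e-12:
                alpha *= 0.5                     # halve alpha until the step decreases f
            x = x - alpha * g                    # step along the negative gradient
        return x, k

    x_min, iters = gradient_descent([1.0, 1.0])
    print(x_min, iters)                          # x_min approaches the minimizer (0, 0)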
A harder case: Rosenbrock’s function

f(x, y) = 100 (y − x²)² + (1 − x)²
[Figure: contour plot of the Rosenbrock function]

" # $ # % & % ' #(') * ' +,, ,-


Steepest descent on Rosenbrock function

[Figure: steepest descent on the Rosenbrock function – full view and zoomed view]

• The zig-zag behaviour is clear in the zoomed view (100 iterations)

• The algorithm crawls down the valley


Optimization algorithm – Steepest descent 2
Optimization algorithm – Steepest descent for matrices
Conjugate Gradients – sketch only
! " # $ # % " & ' &( c o n j u g a t e g r a d i e n t s )"&&*#* *+))#**,-# '#*)#.% ',/#)0
%,&.* p n *+)" % " 1 % , % ,* 2+1/1.%##' % & /#1)" %"# $ , . , $ + $ ,. 1 3. ,% #
.+$4#/ &( *%#5*6

7 81)"9 p n ,*9)"&*#.9% & 9 4#9)&.:+21%#9% & 9 1;;95/#-,&+*9*#1/)"9',/#)%,&.*99


< , % " 9 /#*5#)%9% & 9 %"#9=#**,1.9 H>

p!nHp j ? @, @?< j < n

7 ! " # 9 /#*+;%,.29*#1/)"9',/#)%,&.*91/#9$+%+1;;C9;,.#1/;C ,.'#5#.'#.%6

7 RemarkablyD p n )1. 4# )"&*#. +*,.2 &.;C E.&<;#'2# &( p n " # , A f F x n " # G 9 9


1.'9A f F x n G 9 F*##9H+$#/,)1; I#),5#*G

Afn!Afn p
pn ? A f n B n" #
A f n!" # A f n " #
Choosing the direction 3: conjugate gradients

Again, uses first derivatives only, but avoids "undoing" previous work.

• An N-dimensional quadratic form can be minimized in at most N conjugate descent steps.

[Figure: conjugate-gradient minimization of a 2D quadratic from 3 different starting points; the minimum is reached in exactly 2 steps]
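A minimal sketch of the Fletcher-Reeves update described above. It is not the exact algorithm from the slides: the crude backtracking loop below stands in for the exact line minimization that conjugate gradients assumes, and all names and tolerances are my own choices.

    import numpy as np

    def conjugate_gradient(f, grad, x0, eps=1e-6, max_iters=200):
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        p = -g                                   # first direction: steepest descent
        for n in range(max_iters):
            if np.linalg.norm(g) < eps:
                break
            alpha = 1.0                          # stand-in for an exact line minimization
            while f(x + alpha * p) >= f(x) and alpha > 1e-12:
                alpha *= 0.5
            x = x + alpha * p
            g_new = grad(x)
            beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
            p = -g_new + beta * p                # next search direction
            g = g_new
        return x, n

    # e.g. on the quadratic used earlier:
    x_min, n = conjugate_gradient(lambda x: 0.5 * x[0]**2 + 5.0 * x[1]**2,
                                  lambda x: np.array([x[0], 10.0 * x[1]]),
                                  [1.0, 1.0])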
The Hessian Matrix

Let f : ℝⁿ → ℝ be twice differentiable. Its second (partial) derivatives make up the Hessian matrix ∇²f(x):

∇²f(x) :=  [ ∂²f(x)/∂x1∂x1  · · ·  ∂²f(x)/∂x1∂xn ]
           [       ⋮           ⋱         ⋮       ]
           [ ∂²f(x)/∂xn∂x1  · · ·  ∂²f(x)/∂xn∂xn ]

• The order of differentiation does not matter if the function has continuous second (and higher-order) partial derivatives (Schwarz's theorem).

• Then the Hessian is symmetric: ∇²f(x) = [∇²f(x)]ᵀ

2nd Degree Taylor Expansion:

τ_x²(y) = f(x) + (y − x)ᵀ ∇f(x) + ½ (y − x)ᵀ ∇²f(x) (y − x)
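As a concrete illustration (mine, not from the slides), take the quadratic f(x) = ½(x1)² + 5(x2)² used in the gradient-descent example: ∇f(x) = (x1, 10 x2)ᵀ and ∇²f(x) = [1 0; 0 10], a constant, symmetric, positive-definite matrix, so f is convex with a single minimum at x = (0, 0)ᵀ.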
Choosing the direction 4: Newton’s method
Start from Taylor expansion in 2D
A function may be approximated locally by its Taylor series expansion about a point x₀:

f(x₀ + δx) ≈ f(x₀) + [∂f/∂x, ∂f/∂y] (δx, δy)ᵀ + ½ (δx, δy) [ ∂²f/∂x²  ∂²f/∂x∂y ; ∂²f/∂x∂y  ∂²f/∂y² ] (δx, δy)ᵀ

The expansion to second order is a quadratic function:

f(x + δx) ≈ a + gᵀ δx + ½ δxᵀ H δx

Now minimize this expansion over δx:

min_δx  a + gᵀ δx + ½ δxᵀ H δx

For a minimum we require that ∇f(x + δx) = 0, and so

∇f(x + δx) = g + H δx = 0

with solution δx = −H⁻¹ g  (Matlab: δx = −H\g).
This gives the iterative update

x_{n+1} = x_n − H_n⁻¹ g_n

[Figure: Newton steps on a quadratic cost function]

• If f(x) is quadratic, then the solution is found in one step.

• The method has quadratic convergence (as in the 1D case).

• The solution δx = −H_n⁻¹ g_n is guaranteed to be a downhill direction provided that H is positive definite.

• Rather than jump straight to the predicted solution at x_n − H_n⁻¹ g_n, it is better to perform a line search:

x_{n+1} = x_n − α_n H_n⁻¹ g_n

• If H = I, then this reduces to steepest descent.
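A minimal NumPy sketch of the damped Newton update x_{n+1} = x_n − α_n H_n⁻¹ g_n on the Rosenbrock function, reusing rosenbrock and rosenbrock_grad from the earlier sketch. The Hessian is derived by hand, the backtracking loop is a crude stand-in for a proper line search, and the fall-back to the gradient direction when the Newton step is not downhill (the positive-definiteness issue mentioned above) is my own addition.

    import numpy as np

    def rosenbrock_hess(v):                      # hand-derived Hessian of the Rosenbrock function
        x, y = v
        return np.array([[1200.0 * x**2 - 400.0 * y + 2.0, -400.0 * x],
                         [-400.0 * x,                        200.0]])

    def newton(x0, eps=1e-3, max_iters=100):
        x = np.asarray(x0, dtype=float)
        for n in range(max_iters):
            g = rosenbrock_grad(x)
            if np.linalg.norm(g) < eps:          # same stopping rule as in the figures
                break
            H = rosenbrock_hess(x)
            dx = np.linalg.solve(H, -g)          # Newton step: solve H dx = -g
            if g @ dx >= 0:                      # H not positive definite here:
                dx = -g                          # fall back to steepest descent
            alpha = 1.0
            while rosenbrock(x + alpha * dx) >= rosenbrock(x) and alpha > 1e-12:
                alpha *= 0.5                     # crude backtracking line search
            x = x + alpha * dx
        return x, n

    print(newton([-1.2, 1.0]))                   # iterates approach the minimum at (1, 1)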


Newton’s method - example
[Figure: Newton method with line search on the Rosenbrock function (full and zoomed views); gradient < 1e-3 after 15 iterations. Ellipses show successive quadratic approximations.]

• The algorithm converges in only 15 iterations – far superior to steepest descent.

• However, the method requires computing the Hessian matrix at each iteration – this is not always feasible.
Optimization algorithm – Newton method
Optimization algorithm – Newton2 method
Performance issues for optimization algorithms

1. Number of iterations required

2. Cost per iteration

3. Memory footprint

4. Region of convergence
Non-linear least squares

f(x) = Σ_{i=1}^{M} r_i(x)²

Gradient:

∇f(x) = 2 Σ_i r_i(x) ∇r_i(x)

Hessian:

H = ∇ ∇ᵀ f(x) = 2 Σ_i ∇( r_i(x) ∇ᵀ r_i(x) )
  = 2 Σ_i ( ∇r_i(x) ∇ᵀ r_i(x) + r_i(x) ∇ ∇ᵀ r_i(x) )

which is approximated as

H_GN = 2 Σ_i ∇r_i(x) ∇ᵀ r_i(x)

This is the Gauss-Newton approximation.

x_{n+1} = x_n − α_n H_n⁻¹ g_n   with   H_n = H_GN(x_n)

Gauss-Newton method with line search

[Figure: Gauss-Newton method with line search on the Rosenbrock function (full and zoomed views); gradient < 1e-3 after 14 iterations]

• Minimization with the Gauss-Newton approximation with line search takes only 14 iterations.
Comparison
[Figure: Newton method with line search (gradient < 1e-3 after 15 iterations) and Gauss-Newton method with line search (gradient < 1e-3 after 14 iterations) on the Rosenbrock function]

Newton:
• requires computing the Hessian
• exact solution if f is quadratic

Gauss-Newton:
• approximates the Hessian by products of gradients of the residuals
• requires only first derivatives
Summary of minimization methods

Update: x_{n+1} = x_n + δx

1. Newton:
   H δx = −g

2. Gauss-Newton:
   H_GN δx = −g

3. Gradient descent:
   λ δx = −g
Levenberg-Marquardt algorithm
• Away from the minimum, in regions of negative curvature, the Gauss-Newton approximation is not very good.

• In such regions, a simple steepest-descent step is probably the best plan.

• The Levenberg-Marquardt method is a mechanism for varying between steepest-descent and Gauss-Newton steps depending on how good the H_GN approximation is locally.
[Figure: 1-D illustration comparing the directions of the Newton step and the gradient-descent step]
• The method uses the modified Hessian

H(x, λ) = H_GN + λ I

• When λ is small, H approximates the Gauss-Newton Hessian.

• When λ is large, H is close to the identity, causing steepest-descent steps to be taken.
LM Algorithm
H(x, λ) = H_GN(x) + λ I

1. Set λ = 0.001 (say).

2. Solve δx = −H(x, λ)⁻¹ g.

3. If f(x_n + δx) > f(x_n), increase λ (×10 say) and go to 2.

4. Otherwise, decrease λ (×0.1 say), let x_{n+1} = x_n + δx, and go to 2.

Note: this algorithm does not require explicit line searches.
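A minimal NumPy sketch of these four steps, reusing residuals and jacobian from the Gauss-Newton sketch. The λ = 0.001 start and the ×10 / ×0.1 factors follow the "say" values above; the stopping rule and iteration cap are my own, and each rejected trial counts as one loop iteration.

    import numpy as np

    def levenberg_marquardt(x0, lam=1e-3, eps=1e-3, max_iters=200):
        x = np.asarray(x0, dtype=float)
        for n in range(max_iters):
            r = residuals(x)
            J = jacobian(x)
            g = 2.0 * J.T @ r
            if np.linalg.norm(g) < eps:
                break
            H = 2.0 * J.T @ J + lam * np.eye(len(x))   # modified Hessian H_GN + lambda*I
            dx = np.linalg.solve(H, -g)
            if np.sum(residuals(x + dx)**2) > np.sum(r**2):
                lam *= 10.0                      # worse: behave more like gradient descent
            else:
                lam *= 0.1                       # better: accept the step, move towards Gauss-Newton
                x = x + dx
        return x, n

    print(levenberg_marquardt([-1.2, 1.0]))      # converges to (1, 1) with no line search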
Example

[Figure: Levenberg-Marquardt method on the Rosenbrock function (full and zoomed views); gradient < 1e-3 after 31 iterations]

! "#$#%#&'(#)$*+,#$-*./0/$1/2-3"'24+'25(*6 $ ) *7#$/*,/'289:*(';/,*<=**
#(/2'(#)$,>

Matlab: lsqnonlin
Comparison

[Figure: Gauss-Newton method with line search (gradient < 1e-3 after 14 iterations) and Levenberg-Marquardt method (gradient < 1e-3 after 31 iterations) on the Rosenbrock function]

Levenberg-Marquardt:
• more iterations than Gauss-Newton, but
• no line search is required,
• and it converges more frequently.
