Multivariate Optimization
J. McNames, Portland State University, ECE 4/557, Ver. 1.14
Overview of Multivariate Optimization Topics

- Problem definition
- Algorithms
  - Cyclic coordinate method
  - Steepest descent
  - Conjugate gradient algorithms
  - PARTAN
  - Newton's method
  - Levenberg-Marquardt
- Concise, subjective summary

Multivariate Optimization Overview

- The unconstrained optimization problem is a generalization of the line
  search problem: find a vector a* such that

      a* = argmin_a f(a)

- Note that there are no constraints on a.
- Example: find the vector of coefficients w ∈ R^(p×1) that minimizes
  the average absolute error of a linear model.
- Akin to a blind person trying to find their way to the bottom of a
  valley in a multidimensional landscape: we want to reach the bottom
  with the minimum number of cane taps.
- Also vaguely similar to taking core samples for oil prospecting.
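To make the linear-model example above concrete, here is a minimal MATLAB sketch. It uses fminsearch (derivative-free Nelder-Mead) because the average absolute error is not differentiable everywhere; the synthetic data, model size, and starting vector are illustrative assumptions, not part of the original slides.

% Fit w (slope and intercept) by minimizing the average absolute error.
% The data here are synthetic, for illustration only.
rng(0);                          % reproducible example
n  = 100;
x  = randn(n,1);
t  = 2*x + 1 + 0.5*randn(n,1);   % targets from an assumed "true" model
X  = [x ones(n,1)];              % design matrix with intercept column
mae = @(w) mean(abs(t - X*w));   % f(w): average absolute error
w0  = zeros(2,1);                % starting vector
w   = fminsearch(mae, w0);       % Nelder-Mead, no gradient required
fprintf('w = [%.3f %.3f], MAE = %.3f\n', w(1), w(2), mae(w));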

Example 1: Optimization Problem

[Figures: contour map of the example cost function over a_1, a_2 ∈ [-5, 5],
a quiver plot of its gradient field, and three 3-D surface views.]

Example 1: MATLAB Code

function [] = OptimizationProblem();

%===============================================================================
% User-Specified Parameters
%===============================================================================
x = -5:0.05:5;
y = -5:0.05:5;

%===============================================================================
% Evaluate the Function
%===============================================================================
[X,Y] = meshgrid(x,y);
[Z,G] = OptFn(X,Y);
functionName   = 'OptimizationProblem';
fileIdentifier = fopen([functionName '.tex'],'w');

%===============================================================================
% Contour Map
%===============================================================================
figure;
FigureSet(2,'Slides');
contour(x,y,Z,50);
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Contour');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\stepcounter{exc}\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');

%===============================================================================
% Quiver Map
%===============================================================================
figure;
FigureSet(1,'Slides');
axis([-5 5 -5 5]);
contour(x,y,Z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
hold on;
xCoarse = -5:0.5:5;
yCoarse = -5:0.5:5;
[X,Y] = meshgrid(xCoarse,yCoarse);
[ZCoarse,GCoarse] = OptFn(X,Y);
nr  = length(xCoarse);            % number of grid points
dzx = GCoarse(1:nr,1:nr);
dzy = GCoarse(nr+(1:nr),1:nr);
quiver(xCoarse,yCoarse,dzx,dzy);
hold off;
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Quiver');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');

%===============================================================================
% 3D Maps
%===============================================================================
figure;
set(gcf,'Renderer','zbuffer');
FigureSet(1,'Slides');
h = surf(x,y,Z);
set(h,'LineStyle','None');
xlabel('a_1');
ylabel('a_2');
shading interp;
grid on;
AxisSet(8);
hl = light('Position',[0,0,30]);
set(hl,'Style','Local');
set(h,'BackFaceLighting','unlit');
material dull

for c1 = 1:3
    switch c1
        case 1, view(45,10);
        case 2, view(-55,22);
        case 3, view(-131,10);
        otherwise, error('Not implemented.');
    end
    fileName = sprintf('%s-%s%d',functionName,'Surface',c1);
    print(fileName,'-depsc');
    fprintf(fileIdentifier,'%%==============================================\n');
    fprintf(fileIdentifier,'\\newslide\n');
    fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
    fprintf(fileIdentifier,'%%==============================================\n');
    fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
    fprintf(fileIdentifier,'\n');
end

%===============================================================================
% List the MATLAB Code
%===============================================================================
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: MATLAB Code}\n');
fprintf(fileIdentifier,'%%==============================================\n');
fprintf(fileIdentifier,'\t\\matlabcode{Matlab/%s.m}\n',functionName);

fclose(fileIdentifier);

Global Optimization?

- In general, all optimization algorithms find a local minimum in as few
  steps as possible.
- There are also global optimization algorithms based on ideas such as
  - Evolutionary computing
  - Genetic algorithms
  - Simulated annealing
- None of these guarantee convergence in a finite number of iterations.
- All require a lot of computation.
Optimization Comments

- Ideally, when we construct models we should favor those that can be
  optimized with few shallow local minima and reasonable computation.
- Graphically, you can think of the function to be minimized as the
  elevation in a complicated high-dimensional landscape. The problem is
  to find the lowest point.
- The most common approach is to go downhill.
- The gradient points in the most uphill direction, so the steepest
  downhill direction is the opposite of the gradient.
- Most optimization algorithms use a line search algorithm.
- The methods mostly differ only in the way that the direction of
  descent is generated.

Optimization Algorithm Outline

The basic steps of these algorithms are as follows (a minimal sketch
follows this outline):

1. Pick a starting vector a.
2. Find the direction of descent, d.
3. Move in that direction until a minimum is found:

       α := argmin_α f(a + αd)
       a := a + αd

4. Loop to 2 until convergence.

- Most of the theory of these algorithms is based on quadratic surfaces.
  Near local minima, this is a good approximation.
- Note that the functions should (must) have continuous gradients
  (almost) everywhere.
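A minimal sketch of this outline, assuming a simple differentiable test function with an analytic gradient; fminbnd plays the role of the line search in step 3. All names and tolerances here are illustrative.

% Generic descent loop following the outline above.
f    = @(a) (a(1)-1)^2 + 4*(a(2)+2)^2;   % example quadratic cost
grad = @(a) [2*(a(1)-1); 8*(a(2)+2)];    % its gradient
a    = [-3; 1];                          % 1. starting vector
for k = 1:50
    d = -grad(a);                        % 2. direction of descent
    if norm(d) < 1e-8, break; end        % converged
    d = d/norm(d);
    phi   = @(alpha) f(a + alpha*d);     % 3. line search along d
    alpha = fminbnd(phi, 0, 10);
    a     = a + alpha*d;                 % 4. loop to 2
end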

Cyclic Coordinate Method

1. For i = 1 to p,

       a_i := argmin_λ f([a_1, a_2, ..., a_{i-1}, λ, a_{i+1}, ..., a_p])

2. Loop to 1 until convergence.

+ Simple to implement
+ Each line search can be performed semi-globally to avoid shallow
  local minima
+ Can be used with nominal variables
+ f(a) can be discontinuous
+ No gradient required
− Very slow compared to gradient-based optimization algorithms
− Usually only practical when the number of parameters, p, is small

There are modified versions with faster convergence. A self-contained
sketch of the basic method follows.
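The instructor's example code further below relies on the course's OptFn and LineSearch helpers; here, as a complement, is a self-contained sketch for p = 2 on an illustrative cost function of my own choosing, with fminbnd performing each one-dimensional search over [-5, 5]. No gradient is used.

% Self-contained cyclic coordinate sketch (illustrative function).
f = @(a) (a(1)-2)^2 + (a(2)+3)^2 + 0.5*sin(3*a(1))*sin(3*a(2));
a = [-3 1];                        % starting point
for pass = 1:20                    % loop until (approximate) convergence
    f1 = @(lam) f([lam a(2)]);     % vary coordinate 1 only
    a(1) = fminbnd(f1, -5, 5);
    f2 = @(lam) f([a(1) lam]);     % vary coordinate 2 only
    a(2) = fminbnd(f2, -5, 5);
end
fprintf('a = [%.3f %.3f], f(a) = %.4f\n', a(1), a(2), f(a));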
Example 2: Cyclic Coordinate Method

[Figures: full contour map over X, Y ∈ [-5, 5] with the cyclic
coordinate search trajectory, a zoomed contour map near the path, and
the function value versus iteration over roughly 25 iterations.]

Example 2: Cyclic Coordinate Method

[Figure: Euclidean position error versus iteration.]

Example 2: Relevant MATLAB Code

function [] = CyclicCoordinate();
%clear all;
close all;

ns = 26;

x  = -3;
y  = 1;
b0 = -1;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,dzx,dzy] = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
for cnt = 2:ns,
    if rem(cnt,2)==1,
        d = [1 0]'; % Along x direction
    else
        d = [0 1]'; % Along y direction
    end;

    [b,fmin] = LineSearch([x y]',d,b0,ls);

    x = x + b*d(1);
    y = y + b*d(2);

    a(cnt,:) = [x y];
    f(cnt)   = fmin;
end;

% Locate the two local minima by brute force on fine grids
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1  = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);

[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1   = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);

figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourA;

figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.5+(-2:0.05:2),-1.5+(-2:0.05:2));
[z,dzx,dzy] = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourB;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a - ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinatePositionError;

figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinateErrorLinear;
Steepest Descent

- The gradient of the function f(a) is defined as the vector of partial
  derivatives:

      ∇_a f(a) = [ ∂f(a)/∂a_1  ∂f(a)/∂a_2  ...  ∂f(a)/∂a_p ]^T

- It can be shown that the gradient, ∇_a f(a), points in the direction
  of maximum ascent.
- The negative of the gradient, -∇_a f(a), points in the direction of
  maximum descent.
- A vector d is a direction of descent if there exists a δ > 0 such
  that f(a + λd) < f(a) for all 0 < λ < δ.
- It can also be shown that d is a direction of descent if and only if
  (∇_a f(a))^T d < 0.
- The algorithm of steepest descent uses d = -∇_a f(a).
- This is the most fundamental of all algorithms for minimizing a
  continuously differentiable function.

Steepest Descent (continued)

+ Very stable algorithm
− Can converge very slowly once near the local minima, where the
  surface is approximately quadratic
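A self-contained sketch of steepest descent under illustrative assumptions: a simple quadratic cost with an analytic gradient, and fminbnd as the line search. The fixed iteration count mirrors the examples in these notes.

% Minimal steepest descent sketch (illustrative function).
f    = @(a) (a(1)-2)^2 + 4*(a(2)+3)^2;
grad = @(a) [2*(a(1)-2); 8*(a(2)+3)];
a    = [-3; 1];
fval = zeros(26,1); fval(1) = f(a);
for k = 2:26
    g = grad(a);
    if norm(g) < 1e-10, fval(k:end) = f(a); break; end
    d = -g/norm(g);                          % steepest descent direction
    alpha = fminbnd(@(s) f(a + s*d), 0, 10); % line search
    a = a + alpha*d;
    fval(k) = f(a);
end
plot(0:25, fval, '.-'); xlabel('Iteration'); ylabel('Function Value');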

Example 3: Steepest Descent

[Figures: full contour map over X, Y ∈ [-5, 5] with the steepest
descent trajectory, a zoomed contour map near (-1.6, -1.7) showing the
characteristic zig-zag path, the function value versus iteration, and
the Euclidean position error versus iteration.]

Example 3: Relevant MATLAB Code

function [] = SteepestDescent();
%clear all;
close all;

ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,g]  = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
d = -g/norm(g);
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);

    x = x + b*d(1);
    y = y + b*d(2);

    [z,g] = OptFn(x,y);
    d = -g;
    d = d/norm(d);

    a(cnt,:) = [x y];
    f(cnt)   = z;
end;

% Locate the two local minima on fine grids (as in Example 2), giving
% (xopt,yopt,zopt) and (xopt2,yopt2,zopt2)

[zopt zopt2]

% (The remaining plotting code is identical to Example 2 except that the
% zoomed contour uses meshgrid(-1.6+(-0.5:0.01:0.5),-1.7+(-0.5:0.01:0.5))
% and the figures are printed as SteepestDescentContourA, ...ContourB,
% ...PositionError, and ...ErrorLinear.)
Conjugate Gradient Algorithms

1. Take a steepest descent step.
2. For i = 2 to p:

       α   := argmin_α f(a + αd)
       a   := a + αd
       g_i := ∇f(a)
       β   := (g_i^T g_i) / (g_{i-1}^T g_{i-1})
       d   := -g_i + βd

3. Loop to 1 until convergence.

- Based on quadratic approximations of f
- This version is called the Fletcher-Reeves method
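A minimal Fletcher-Reeves sketch, assuming a convex quadratic test function with an analytic gradient and using fminbnd for the line searches; the function and iteration count are illustrative.

% Minimal Fletcher-Reeves conjugate gradient sketch.
f    = @(a) (a(1)-2)^2 + 4*(a(2)+3)^2 + a(1)*a(2);
grad = @(a) [2*(a(1)-2) + a(2); 8*(a(2)+3) + a(1)];
a = [-3; 1];
g = grad(a);
d = -g;                                      % 1. steepest descent step
for k = 1:20
    alpha = fminbnd(@(s) f(a + s*d), 0, 10); % line search along d
    a  = a + alpha*d;
    go = g;                                  % previous gradient
    g  = grad(a);
    if norm(g) < 1e-10, break; end
    beta = (g'*g)/(go'*go);                  % Fletcher-Reeves beta
    d = -g + beta*d;
end
fprintf('a = [%.4f %.4f]\n', a(1), a(2));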
Example 4: Fletcher-Reeves Conjugate Gradient

[Figures: full contour map over X, Y ∈ [-5, 5] with the conjugate
gradient trajectory, a zoomed contour map near (2, -3), the function
value versus iteration, and the Euclidean position error versus
iteration.]
Example 4: Relevant MATLAB Code

function [] = FletcherReeves();
%clear all;
close all;

ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,g]  = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);

    x = x + b*d(1);
    y = y + b*d(2);

    go    = g; % Old gradient
    [z,g] = OptFn(x,y);

    beta = (g'*g)/(go'*go); % Fletcher-Reeves
    d    = -g + beta*d;

    a(cnt,:) = [x y];
    f(cnt)   = z;
end;

% Locate the two local minima on fine grids (as in Example 2), then plot.
% (The plotting code is identical to Example 2 except that the zoomed
% contour uses meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5) and the figures are
% printed as FletcherReevesContourA, ...ContourB, ...PositionError, and
% ...ErrorLinear.)
Conjugate Gradient Algorithms Continued

- There is also a variant called Polak-Ribiere, where

      β := ((g_i - g_{i-1})^T g_i) / (g_{i-1}^T g_{i-1})

+ Only requires the gradient
+ Converges in a finite number of steps when f(a) is quadratic and
  perfect line searches are used
− Less stable numerically than steepest descent
− Sensitive to inexact line searches
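The two updates differ by a single inner-product term; the snippet below shows them side by side with illustrative gradient values. For a quadratic with exact line searches the two coincide, since successive gradients are orthogonal.

% go is the previous gradient, g the current one (example values only).
go = [-9; 29];
g  = [ 3;  1];
beta_fr = (g'*g)/(go'*go)           % Fletcher-Reeves
beta_pr = ((g - go)'*g)/(go'*go)    % Polak-Ribiere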

Example 5: Polak-Ribiere Conjugate Gradient Example 5: Polak-Ribiere Conjugate Gradient

5 2.5

4 2.6

3 2.7

2 2.8

1 2.9
Y

Y
0 3

1 3.1

2 3.2

3 3.3

4 3.4

5 3.5
5 0 5 1.5 2 2.5
X X

J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 47 J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 48
Example 5: Polak-Ribiere Conjugate Gradient Example 5: Polak-Ribiere Conjugate Gradient

7
6

6
5

Euclidean Position Error


5
4
Function Value

4
3
3
2
2

1
1

0 0
0 5 10 15 20 25 0 5 10 15 20 25
Iteration Iteration

J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 49 J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 50

Example 5: MATLAB Code d = -g + beta * d ;

a ( cnt ,:) = [ x y ];
f ( cnt ) = z;
end ;
function [] = PolakRibiere ();
% clear all ;
[x , y ] = meshgrid (0+( -0 .01 :0 .001 :0 .01 ) ,3+( -0 .01 :0 .001 :0 .01 ));
close all ;
[z , dzx , dzy ] = OptFn (x , y );
[ zopt , id1 ] = min ( z );
ns = 26;
[ zopt , id2 ] = min ( zopt );
x = -3;
id1 = id1 ( id2 );
y = 1;
xopt = x ( id1 , id2 );
b0 = 0 .01 ;
yopt = y ( id1 , id2 );
ls = 30;

[x , y ] = meshgrid (1 .883 +( -0 .02 :0 .001 :0 .02 ) , -2 .963 +( -0 .02 :0 .001 :0 .02 ));
a = zeros ( ns ,2);
f = zeros ( ns ,1); [z , dzx , dzy ] = OptFn (x , y );
[ zopt2 , id1 ] = min ( z );
[z , g ] = OptFn (x , y ); [ zopt2 , id2 ] = min ( zopt2 );
a (1 ,:) = [ x y ]; id1 = id1 ( id2 );
f (1) = z; xopt2 = x ( id1 , id2 );
d = -g / norm ( g ); % First direction yopt2 = y ( id1 , id2 );
for cnt = 2: ns ,
[b , fmin ] = LineSearch ([ x y ] ,d , b0 , ls ); figure ;
FigureSet (1 ,4 .5 ,2 .75 );
x = x + b * d (1); [x , y ] = meshgrid ( -5:0 .1 :5 , -5:0 .1 :5);
y = y + b * d (2); z = OptFn (x , y );
contour (x ,y ,z ,50);
go = g ; % Old gradient h = get ( gca , Children );
[z , g ] = OptFn (x , y ); set (h , LineWidth ,0 .2 );
axis ( square );
beta = (( g - go ) * g )/( go * go ); hold on ;

J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 51 J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 52
h = plot ( a (: ,1) , a (: ,2) , k ,a (: ,1) , a (: ,2) , r ); AxisSet (8);
set ( h (1) , LineWidth ,1 .2 ); print - depsc P o l a k R i b i e r e C o n t o u r B ;
set ( h (2) , LineWidth ,0 .6 );
h = plot ( xopt , yopt , kx , xopt , yopt , rx ); figure ;
set ( h (1) , LineWidth ,1 .5 ); FigureSet (2 ,4 .5 ,2 .75 );
set ( h (2) , LineWidth ,0 .5 ); k = 1: ns ;
set ( h (1) , MarkerSize ,5); xerr = ( sum ((( a - ones ( ns ,1)*[ xopt2 yopt2 ]) ) . ^2) ) . ^(1/2);
set ( h (2) , MarkerSize ,4); h = plot (k -1 , xerr , b );
hold off ; set ( h (1) , Marker , . );
xlabel ( X ); set (h , MarkerSize ,6);
ylabel ( Y ); xlabel ( Iteration );
zoom on ; ylabel ( Euclidean Position Error );
AxisSet (8); xlim ([0 ns -1]);
print - depsc P o l a k R i b i e r e C o n t o u r A ; ylim ([0 xerr (1)]);
grid on ;
figure ; set ( gca , Box , Off );
FigureSet (1 ,4 .5 ,2 .75 ); AxisSet (8);
[x , y ] = meshgrid (1 .5 :0 .01 :2 .5 , -3 .5 :0 .01 : -2 .5 ); print - depsc P o l a k R i b i e r e P o s i t i o n E r r o r ;
z = OptFn (x , y );
contour (x ,y ,z ,75); figure ;
h = get ( gca , Children ); FigureSet (2 ,4 .5 ,2 .75 );
set (h , LineWidth ,0 .2 ); k = 1: ns ;
axis ( square ); h = plot (k -1 ,f , b ,[0 ns ] , zopt *[1 1] , r ,[0 ns ] , zopt2 *[1 1] , g );
hold on ; set ( h (1) , Marker , . );
h = plot ( a (: ,1) , a (: ,2) , k ,a (: ,1) , a (: ,2) , r ); set (h , MarkerSize ,6);
set ( h (1) , LineWidth ,1 .2 ); xlabel ( Iteration );
set ( h (2) , LineWidth ,0 .6 ); ylabel ( Function Value );
hold off ; ylim ([0 f (1)]);
xlabel ( X ); xlim ([0 ns -1]);
ylabel ( Y ); grid on ;
zoom on ; set ( gca , Box , Off );

J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 53 J. McNames Portland State University ECE 4/557 Multivariate Optimization Ver. 1.14 54

AxisSet (8);
print - depsc P o l a k R i b i e r e E r r o r L i n e a r ;
Parallel Tangents (PARTAN)

1. First gradient step:

       d   := -∇f(a)
       α   := argmin_α f(a + αd)
       s_p := αd
       a   := a + s_p

2. Gradient step:

       d_g := -∇f(a)
       α   := argmin_α f(a + αd_g)
       s_g := αd_g
       a   := a + s_g

3. Conjugate step:

       d_p := s_p + s_g
       α   := argmin_α f(a + αd_p)
       s_p := αd_p
       a   := a + s_p

4. Loop to 2 until convergence.
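A minimal PARTAN sketch under the same illustrative assumptions as the earlier sketches (analytic gradient, fminbnd line searches); the anchor variable mirrors the role of (xa, ya) in the example code further below.

% Minimal PARTAN sketch: alternate a gradient step with a search along
% the direction joining the current point to the point two steps back.
f    = @(a) (a(1)-2)^2 + 4*(a(2)+3)^2 + a(1)*a(2);
grad = @(a) [2*(a(1)-2) + a(2); 8*(a(2)+3) + a(1)];
anchor = [-3; 1];                             % starting point
a = anchor;
g = grad(a); d = -g/norm(g);
a = a + fminbnd(@(s) f(a + s*d), 0, 10)*d;    % stand-in conjugate step
for k = 1:10
    g  = grad(a); d = -g/norm(g);             % gradient step
    ag = a + fminbnd(@(s) f(a + s*d), 0, 10)*d;
    dc = ag - anchor;                         % PARTAN (conjugate) direction
    if norm(dc) > 1e-12
        dc = dc/norm(dc);
        a = ag + fminbnd(@(s) f(ag + s*dc), 0, 10)*dc;
    else
        a = ag;                               % could not move
    end
    anchor = ag;                              % update anchor point
end
fprintf('a = [%.4f %.4f]\n', a(1), a(2));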
PARTAN Concept

[Diagram: zig-zag trajectory through points a_0, a_1, ..., a_7, with
gradient steps alternating with conjugate steps.]

- The first two steps are steepest descent.
- Thereafter, each iteration consists of two steps:
  1. Search along the direction d_i = a_i - a_{i-2}, where a_i is the
     current point and a_{i-2} is the point from two steps ago.
  2. Search in the direction of the negative gradient, d_i = -∇f(a_i).

Example 6: PARTAN

[Figure: full contour map over X, Y ∈ [-5, 5] with the PARTAN
trajectory.]

Example 6: PARTAN

[Figures: zoomed contour map near (2, -3) and the function value versus
iteration.]
Example 6: PARTAN

[Figure: Euclidean position error versus iteration.]

Example 6: MATLAB Code

function [] = Partan();
%clear all;
close all;

ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;

a = zeros(ns,2);
f = zeros(ns,1);

[z,g]  = OptFn(x,y);
a(1,:) = [x y];
f(1)   = z;
xa = x;
ya = y;

% First step - substitute for a conjugate step
d = -g/norm(g); % First direction
[bp,fmin] = LineSearch([x y]',d,b0,100);
x = x + bp*d(1); % Stand-in for a conjugate step
y = y + bp*d(2);
a(2,:) = [x y];
f(2)   = fmin;

cnt = 2;
while cnt < ns,
    % Gradient step
    [z,g] = OptFn(x,y);
    d = -g/norm(g); % Direction
    [bg,fmin] = LineSearch([x y]',d,b0,ls);

    xg = x + bg*d(1);
    yg = y + bg*d(2);

    cnt = cnt + 1;
    a(cnt,:) = [xg yg];
    f(cnt)   = OptFn(xg,yg);
    fprintf('G : %d %5.3f\n',cnt,f(cnt));
    if cnt==ns,
        break;
    end;

    % Conjugate step
    d = [xg-xa yg-ya]';
    if norm(d)~=0,
        d = d/norm(d);
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
    else
        bp = 0;
    end;

    if bp>0, % Line search in conjugate direction was successful
        fprintf('P : ');
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    else % Could not move - do another gradient update
        cnt = cnt + 1;
        a(cnt,:) = a(cnt-1,:);
        f(cnt)   = f(cnt-1);
        if cnt==ns,
            break;
        end;
        fprintf('G2: ');
        [z,g] = OptFn(xg,yg);
        d = -g/norm(g); % Direction
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    end;

    % Update anchor point
    xa = xg;
    ya = yg;

    cnt = cnt + 1;
    a(cnt,:) = [x y];
    f(cnt)   = OptFn(x,y);
    fprintf('%d %5.3f\n',cnt,f(cnt));
end;

% Locate the two local minima on fine grids (as in Example 2), then plot.
% (The plotting code is identical to Example 2 except that the zoomed
% contour uses meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5) and the figures are
% printed as PartanContourA, ...ContourB, ...PositionError, and
% ...ErrorLinear.)

PARTAN Pros and Cons

[Diagram: the same zig-zag trajectory a_0 through a_7 as above.]

+ For quadratic functions, converges in a finite number of steps
+ Easier to implement than 2nd order methods
+ Can be used with a large number of parameters
+ Each (composite) step is at least as good as steepest descent
+ Tolerant of inexact line searches
− Each (composite) step requires two line searches
Newton's Method

      a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)

where ∇f(a_k) is the gradient and H(a_k) is the Hessian of f(a), the
p × p matrix of second partial derivatives:

      H(a_k) = [ ∂²f(a)/∂a_1²      ∂²f(a)/∂a_1∂a_2  ...  ∂²f(a)/∂a_1∂a_p
                 ∂²f(a)/∂a_2∂a_1   ∂²f(a)/∂a_2²     ...  ∂²f(a)/∂a_2∂a_p
                 ...               ...              ...  ...
                 ∂²f(a)/∂a_p∂a_1   ∂²f(a)/∂a_p∂a_2  ...  ∂²f(a)/∂a_p²   ]

- Based on a quadratic approximation of the function f(a).
- If f(a) is quadratic, converges in one step.
- If H(a) is positive-definite, the problem is well defined near local
  minima where f(a) is nearly quadratic.
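A minimal safeguarded Newton sketch under illustrative assumptions (a quadratic cost, so the Hessian is constant). It solves H d = -g with backslash rather than forming inv(H), and reverts to steepest descent when the Newton direction is not a direction of descent, mirroring the safeguard in Example 7 below.

% Minimal Newton sketch with a steepest descent safeguard.
f    = @(a) (a(1)-2)^2 + 4*(a(2)+3)^2 + a(1)*a(2);
grad = @(a) [2*(a(1)-2) + a(2); 8*(a(2)+3) + a(1)];
H    = [2 1; 1 8];                       % constant Hessian for this f
a    = [-3; 1];
for k = 1:20
    g = grad(a);
    if norm(g) < 1e-10, break; end
    d = -H\g;                            % Newton direction (solve, no inv)
    if d'*g > 0, d = -g; end             % safeguard: ensure descent
    alpha = fminbnd(@(s) f(a + s*d), 0, 2);
    a = a + alpha*d;
end
fprintf('a = [%.4f %.4f] after %d iterations\n', a(1), a(2), k);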

Example 7: Newton's with Steepest Descent Safeguard

[Figures: full contour map over X, Y ∈ [-5, 5] with the safeguarded
Newton trajectory, a zoomed contour map near (1, -2.4), and the function
value versus iteration (about 90 iterations).]
Example 7: Newton's with Steepest Descent Safeguard

[Figure: Euclidean position error versus iteration.]

Example 7: Relevant MATLAB Code

function [] = Newtons();
%clear all;
close all;

ns = 100;
x  = -3; % Starting x
y  = 1;  % Starting y
b0 = 1;

a = zeros(ns,2);
f = zeros(ns,1);
[z,g,H] = OptFn(x,y);
a(1,:)  = [x y];
f(1)    = z;

for cnt = 2:ns,
    d = -inv(H)*g;
    if d'*g>0, % Revert to steepest descent if not a direction of descent
        %fprintf('(%2d of %2d) Min. Eig:%5.3f Reverting...\n',cnt,ns,min(eig(H)));
        d = -g;
    end;
    d = d/norm(d);
    [b,fmin] = LineSearch([x y]',d,b0,100);
    %a(cnt,:) = (a(cnt-1,:)' - inv(H)*g)'; % Pure Newton's method

    x = x + b*d(1);
    y = y + b*d(2);

    [z,g,H] = OptFn(x,y);

    a(cnt,:) = [x y];
    f(cnt)   = z;
end;

% Locate the two local minima on fine grids (as in Example 2), then plot.
% (The plotting code is identical to Example 2 except that the zoomed
% contour uses meshgrid(1.0+(-1:0.02:1),-2.4+(-1:0.02:1)) and the figures
% are printed as NewtonsContourA, ...ContourB, ...PositionError, and
% ...ErrorLinear.)

Newton's Method Pros and Cons

      a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)

+ Very fast convergence near local minima
− Not guaranteed to converge (may actually diverge)
− Requires the p × p Hessian
− Requires a p × p matrix inverse that uses O(p³) operations

Levenberg-Marquardt

1. Determine whether μ_k I + H(a_k) is positive definite. If not,
   μ_k := 4μ_k and repeat.
2. Solve the following equation for a_{k+1}:

       [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

3. Compute the ratio of the actual to the predicted reduction,

       r_k = (f(a_k) - f(a_{k+1})) / (q(a_k) - q(a_{k+1}))

   where q(a) is the quadratic approximation of f(a) based on f(a_k),
   ∇f(a_k), and H(a_k).
4. If r_k < 0.25, then μ_{k+1} := 4μ_k.
   If r_k > 0.75, then μ_{k+1} := μ_k/2.
   If r_k ≤ 0, then a_{k+1} := a_k.
5. If not converged, k := k + 1 and loop to 1.

Levenberg-Marquardt Comments

- Similar to Newton's method, but with safety provisions for regions
  where the quadratic approximation is inappropriate.
- Compare:

      Newton's: a_{k+1} = a_k - H(a_k)^{-1} ∇f(a_k)
      LM:       [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

- If μ = 0, these are equivalent.
- If μ → ∞, then a_{k+1} → a_k.
- μ is chosen to ensure that the smallest eigenvalue of μ_k I + H(a_k)
  is positive and sufficiently large.
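A minimal sketch of steps 1-5, assuming an illustrative quadratic cost with an analytic gradient and constant Hessian. The 0.25/0.75 thresholds follow the slide above; note that the example code further below uses 0.50 in place of 0.75 for speed.

% Minimal Levenberg-Marquardt sketch; no line search is needed, and
% mu is adapted from the trust ratio r.
f    = @(a) (a(1)-2)^2 + 4*(a(2)+3)^2 + a(1)*a(2);
grad = @(a) [2*(a(1)-2) + a(2); 8*(a(2)+3) + a(1)];
H    = [2 1; 1 8];                        % constant Hessian for this f
a    = [-3; 1];
mu   = 1e-4;
for k = 1:25
    g = grad(a);
    while min(eig(mu*eye(2) + H)) <= 0    % 1. ensure positive definiteness
        mu = 4*mu;
    end
    dx   = -(mu*eye(2) + H)\g;            % 2. solve the key equation
    aNew = a + dx;
    predRed = -(g'*dx + 0.5*dx'*H*dx);    % q(a) - q(aNew)
    if predRed < 1e-14, break; end        % model predicts no progress
    r = (f(a) - f(aNew))/predRed;         % 3. trust ratio
    if r < 0.25, mu = 4*mu;               % 4. adapt mu
    elseif r > 0.75, mu = mu/2;
    end
    if r > 0, a = aNew; end               % if r <= 0, keep a_k
end
fprintf('a = [%.4f %.4f]\n', a(1), a(2));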

Example 8: Levenberg-Marquardt

[Figures: full contour map over X, Y ∈ [-5, 5] with the
Levenberg-Marquardt trajectory, a zoomed contour map near (2, -3), and
the function value versus iteration.]
Example 8: Levenberg-Marquardt

[Figure: Euclidean position error versus iteration.]

Example 8: Relevant MATLAB Code

function [] = LevenbergMarquardt();
%clear all;
close all;

ns  = 26;
x   = -3; % Starting x
y   = 1;  % Starting y
eta = 0.0001;

a = zeros(ns,2);
f = zeros(ns,1);
[zn,g,H] = OptFn(x,y);
a(1,:)   = [x y];
f(1)     = zn;
ap = [x y]'; % Previous point

for cnt = 2:ns,
    [zn,g,H] = OptFn(x,y);
    while min(eig(eta*eye(2)+H))<0,
        eta = eta*4;
    end;

    a(cnt,:) = (ap - inv(eta*eye(2)+H)*g)';

    x = a(cnt,1);
    y = a(cnt,2);

    zo = zn; % Old function value
    zn = OptFn(x,y);

    xd = (a(cnt,:)' - ap);

    qo = zo;
    qn = zo + g'*xd + 0.5*xd'*H*xd; % Quadratic model prediction

    if qo == qn, % Test for convergence
        x = a(cnt,1);
        y = a(cnt,2);
        a(cnt:ns,:) = ones(ns-cnt+1,1)*[x y];
        f(cnt:ns,:) = OptFn(x,y);
        break;
    end;

    r = (zo-zn)/(qo-qn);

    if r<0.25,
        eta = eta*4;
    elseif r>0.50, % 0.75 is recommended, but much slower
        eta = eta/2;
    end;

    if zn>zo, % Back up
        a(cnt,:) = a(cnt-1,:);
    else
        ap = a(cnt,:)';
    end;

    x = a(cnt,1);
    y = a(cnt,2);

    a(cnt,:) = [x y];
    f(cnt)   = OptFn(x,y);

    %disp([cnt a(cnt,:) f(cnt) r eta])
end;

% Locate the two local minima on fine grids (as in Example 2), then plot.
% (The plotting code is identical to Example 2 except that the zoomed
% contour uses meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5) and the figures are
% printed as LevenbergMarquardtContourA, ...ContourB, ...PositionError,
% and ...ErrorLinear.)

Levenberg-Marquardt Pros and Cons

      [μ_k I + H(a_k)] (a_{k+1} - a_k) = -∇f(a_k)

- Many equivalent formulations
+ No line search required
+ Can be used with approximations to the Hessian
+ Extremely fast convergence (2nd order)
− Requires the gradient and Hessian (or approximate Hessian)
− Requires O(p³) operations for each solution to the key equation
Optimization Algorithm Summary

Algorithm             Convergence   Stable   Needs ∇f(a)   Needs H(a)   Line Search
Cyclic Coordinate     Slow          Y        N             N            Y
Steepest Descent      Slow          Y        Y             N            Y
Conjugate Gradient    Fast          N        Y             N            Y
PARTAN                Fast          Y        Y             N            Y
Newton's Method       Very Fast     N        Y             Y            N
Levenberg-Marquardt   Very Fast     Y        Y             Y            N
