Lec3 Gradient Based Method Part I
Optimisation – Part I
Convex Sets
A set $S$ is convex if $\lambda a + (1-\lambda) b \in S$ for all $a, b \in S$ and all $\lambda \in [0, 1]$.
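As an illustration (an addition, not from the slides), the sketch below numerically checks this definition for the unit disk; the two points are arbitrary choices.

    import numpy as np

    def in_unit_disk(x):
        # S = {x : ||x|| <= 1} is a convex set
        return np.linalg.norm(x) <= 1.0

    a = np.array([0.6, -0.3])              # a point in S (assumed)
    b = np.array([-0.8, 0.1])              # another point in S (assumed)
    for lam in np.linspace(0.0, 1.0, 11):
        assert in_unit_disk(lam * a + (1 - lam) * b)   # every convex combination stays in S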
Convex vs non-convex function
Condition for convexity
$f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2), \quad 0 \le \lambda \le 1$
[Figure: plots of $f(x)$ against $x$ for a convex function, whose local optimum is also the global optimum, and a non-convex function, which has several local optima in addition to the global optimum; the points $x_1$, $x_2$ and a convex combination of them are marked on the $x$-axis.]
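A minimal numerical sketch of the convexity condition (again an addition; the function $f(x) = x^2$ and the test points are assumed):

    import numpy as np

    f = lambda x: x**2                     # a convex function
    x1, x2 = -2.0, 3.0                     # assumed test points
    for lam in np.linspace(0.0, 1.0, 11):
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        assert lhs <= rhs + 1e-12          # the convexity inequality holds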
Gradient and Hessian
The gradient of a function is
$\nabla J = G(J) = \begin{bmatrix} \partial J / \partial X_1 \\ \partial J / \partial X_2 \\ \vdots \\ \partial J / \partial X_n \end{bmatrix}$
For the quadratic function
$J(\mathbf{X}) = \tfrac{1}{2}\mathbf{X}^T H \mathbf{X} + G^T \mathbf{X} + C$
it can be seen that the condition for a minimum is
$\nabla J(\mathbf{X}) = \tfrac{1}{2}(H\mathbf{X} + H^T\mathbf{X}) + G = 0$
If $H$ is symmetric, this becomes $H\mathbf{X} + G = 0$.
Thus the minimisation of $J$ is identical to the solution of a set of linear algebraic equations, usually written as $A\mathbf{X} = \mathbf{b}$ (with $A = H$ and $\mathbf{b} = -G$).
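A small sketch of this equivalence (the values of $H$ and $G$ are assumptions for illustration):

    import numpy as np

    H = np.array([[4.0, 1.0],
                  [1.0, 3.0]])            # symmetric positive definite Hessian (assumed)
    G = np.array([-1.0, 2.0])             # linear term (assumed)

    x_star = np.linalg.solve(H, -G)       # solve A X = b with A = H, b = -G
    print(x_star, H @ x_star + G)         # the gradient H X + G vanishes at the solution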
Hessian and its use
Expanding $J(\mathbf{x})$ about the stationary point $\mathbf{x}^*$ in a direction $\mathbf{p}$, and noting that $G(\mathbf{x}^*) = 0$,
$J(\mathbf{x}^* + \mathbf{p}) \approx J(\mathbf{x}^*) + \tfrac{1}{2}\mathbf{p}^T H \mathbf{p}$
so the behaviour of the function at the stationary point is determined by $H$: the point is a minimum if $H > 0$ (positive definite).
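A sketch of how this is used in practice (the Hessian below is an assumed example): classify the stationary point from the eigenvalues of $H$.

    import numpy as np

    H = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # Hessian at the stationary point (assumed)
    eigvals = np.linalg.eigvalsh(H)       # eigenvalues of the symmetric Hessian
    if np.all(eigvals > 0):
        print("H positive definite: minimum")
    elif np.all(eigvals < 0):
        print("H negative definite: maximum")
    else:
        print("H indefinite: saddle point")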
[Flowchart: the general gradient-based algorithm]
Start: input $\mathbf{X}^0$, set $q = 0$.
Sensitivity analysis: calculate $J(\mathbf{X})$, $G(\mathbf{X})$, $H(\mathbf{X})$ (some methods do not need $H(\mathbf{X})$).
Calculate the search direction $\mathbf{S}^q$.
Perform a 1-D search: $\mathbf{X}^q = \mathbf{X}^{q-1} + \alpha^* \mathbf{S}^q$; set $q = q + 1$.
Converged? If no (n), return to the sensitivity analysis; if yes (y), stop.
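The loop can be sketched as follows (the objective, its gradient, and the fixed step length are assumptions standing in for the problem-specific sensitivity analysis and 1-D search):

    import numpy as np

    def J(x):  return (x[0] - 1)**2 + 2 * (x[1] + 1)**2            # assumed objective
    def G(x):  return np.array([2 * (x[0] - 1), 4 * (x[1] + 1)])   # its gradient

    x = np.array([5.0, 5.0])                     # input X^0
    for q in range(1, 201):
        g = G(x)                                 # sensitivity analysis (this method does not need H)
        if np.linalg.norm(g) < 1e-8:             # converged?
            break
        S = -g                                   # search direction S^q (gradient-based choice)
        alpha = 0.1                              # stand-in for the 1-D search
        x = x + alpha * S                        # X^q = X^{q-1} + alpha* S^q
    print(q, x)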
The search direction
There are many algorithms
Random search
Powell method
Steepest descent
Fletcher-Reeves (FR) method
Davidon-Fletcher-Powell (DFP) method
Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
Newton's method
For a quadratic function the step length $\alpha$ along a search direction $\mathbf{p}^k$ can be found exactly. With
$J(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x} - \mathbf{x}^T \mathbf{b}$,
$J(\mathbf{x}^k + \alpha \mathbf{p}^k) = \tfrac{1}{2}(\mathbf{x}^k + \alpha\mathbf{p}^k)^T A (\mathbf{x}^k + \alpha\mathbf{p}^k) - (\mathbf{x}^k + \alpha\mathbf{p}^k)^T \mathbf{b}$
$= \tfrac{1}{2}\alpha^2\, \mathbf{p}^{(k)T} A \mathbf{p}^k + \alpha\,\mathbf{p}^{(k)T} A \mathbf{x}^k - \alpha\,\mathbf{p}^{(k)T}\mathbf{b} + \text{constants}$
since $A$ is $(n \times n)$, symmetric and positive definite.
To minimise $J$ with respect to $\alpha$ we set $dJ/d\alpha = 0$, which gives
$\alpha\,\mathbf{p}^{(k)T} A \mathbf{p}^k + \mathbf{p}^{(k)T} A \mathbf{x}^k - \mathbf{p}^{(k)T}\mathbf{b} = 0$, or
$\alpha = -\dfrac{\mathbf{p}^{(k)T}(A\mathbf{x}^k - \mathbf{b})}{\mathbf{p}^{(k)T} A \mathbf{p}^k}$
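A sketch of this exact line-search formula (the values of $A$, $\mathbf{b}$, the iterate and the direction are assumed):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                 # symmetric positive definite (assumed)
    b = np.array([1.0, -1.0])                  # assumed
    xk = np.array([2.0, 2.0])                  # current iterate (assumed)
    pk = -(A @ xk - b)                         # e.g. the steepest-descent direction

    alpha = -pk @ (A @ xk - b) / (pk @ A @ pk) # alpha = -p^T (A x - b) / (p^T A p)
    x_next = xk + alpha * pk
    print(alpha, x_next)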
Method of Steepest descent
The search direction is the negative gradient, so
$\mathbf{X}^{k+1} = \mathbf{X}^k - \alpha\,\nabla J(\mathbf{X}^k)$
At the optimal step length,
$\dfrac{dJ(\mathbf{X}^{k+1})}{d\alpha} = -\nabla J(\mathbf{X}^{k+1})^T\, \nabla J(\mathbf{X}^k) = 0$, i.e.
$G^T(\mathbf{X}^{k+1})\, G(\mathbf{X}^k) = 0$
so successive gradients (and hence successive search directions) are orthogonal.
Method of Steepest descent
Example: $J(\mathbf{X}) = \tfrac{1}{2}(X_1^2 + 10\,X_2^2)$
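A sketch of steepest descent with the exact step length applied to this example (the starting point is an assumption); the iterates zig-zag towards the origin because successive gradients are orthogonal:

    import numpy as np

    A = np.diag([1.0, 10.0])                   # Hessian of the example, so G(X) = A X
    x = np.array([10.0, 1.0])                  # assumed starting point

    for k in range(200):
        g = A @ x                              # gradient G(X^k)
        if np.linalg.norm(g) < 1e-8:           # converged?
            break
        p = -g                                 # steepest-descent direction
        alpha = (g @ g) / (g @ A @ g)          # exact step length for a quadratic
        x = x + alpha * p                      # X^{k+1} = X^k - alpha G(X^k)
    print(k, x)                                # slow zig-zag convergence to the minimum at the origin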