This document discusses gradient methods for optimization problems. It begins with background on motivation, the gradient notion, and Wolfe theorems. It then describes the steepest descent method, which finds the search direction as the negative gradient at each step. Finally, it introduces the conjugate gradient method, which finds conjugate or orthogonal search directions to span the space in fewer steps than steepest descent. The conjugate gradient method is especially useful for minimizing quadratic functions.


Gradient Methods

May 2005
Preview

 Background
 Steepest Descent
 Conjugate Gradient
Background

 Motivation
 The gradient notion
 The Wolfe Theorems
Motivation

 The min(max) problem:

$$\min_{x} f(x)$$

 But we learned in calculus how to solve that kind of problem!
Motivation

 Not exactly:
 Functions: $f : \mathbb{R}^n \to \mathbb{R}$
 High-order polynomials:

$$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$$

 What about functions that don't have an analytic representation: a "black box"?
Motivation: "real world" problem

 Connectivity shapes (Isenburg, Gumhold, Gotsman)

$$\text{mesh} = \{\, C = (V, E),\ \text{geometry} \,\}$$

 What do we get from C alone, without the geometry?
Motivation: "real world" problem

 First we introduce error functionals and then try to minimize them:

$$E_s\big(x \in \mathbb{R}^{n \times 3}\big) = \sum_{(i,j) \in E} \big( \lVert x_i - x_j \rVert - 1 \big)^2$$

$$E_r\big(x \in \mathbb{R}^{n \times 3}\big) = \sum_{i=1}^{n} \lVert L(x_i) \rVert^2$$

$$L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} x_j \;-\; x_i$$
Motivation: "real world" problem

 Then we minimize:

$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \big[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \big]$$

 A high-dimensional, non-linear problem.

 The authors use the conjugate gradient method, which is perhaps the most popular optimization technique based on the ideas we'll see here.
Motivation: "real world" problem

 Changing the parameter $\lambda$:

$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \big[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \big]$$
Motivation

 General problem: find a global min (max).

 This lecture will concentrate on finding local minima.
Background

 Motivation
 The gradient notion
 The Wolfe Theorems
Example surface (plotted): $f := (x, y) \mapsto \cos\!\left(\tfrac{1}{2}x\right)\cos\!\left(\tfrac{1}{2}y\right)\cdot x$
Directional Derivatives:
first, the one-dimensional derivative:


Directional Derivatives:
Along the Axes…

$$\frac{\partial f(x, y)}{\partial y}, \qquad \frac{\partial f(x, y)}{\partial x}$$
Directional Derivatives:
In a general direction…

$$v \in \mathbb{R}^2, \quad \lVert v \rVert = 1, \qquad \frac{\partial f(x, y)}{\partial v}$$
Directional Derivatives

$$\frac{\partial f(x, y)}{\partial y}, \qquad \frac{\partial f(x, y)}{\partial x}$$
The Gradient: Definition in $\mathbb{R}^2$

$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad \nabla f(x, y) := \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)$$

In the plane: $\nabla f(x, y)$
The Gradient: Definition

$$f : \mathbb{R}^n \to \mathbb{R}, \qquad \nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1},\ \ldots,\ \frac{\partial f}{\partial x_n} \right)$$
The Gradient Properties

 The gradient defines a (hyper)plane approximating the function infinitesimally:

$$\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y$$
The Gradient properties

 By the chain rule (important for later use): for $\lVert v \rVert = 1$,

$$\frac{\partial f}{\partial v}(p) = \big\langle \nabla f|_p ,\, v \big\rangle$$
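This identity lends itself to a quick numerical check. A minimal sketch; the test function, point, and direction below are arbitrary choices, not taken from the slides.

```python
import numpy as np

# arbitrary smooth test function and its analytic gradient
f = lambda p: p[0] ** 2 * p[1] + np.sin(p[1])
grad_f = lambda p: np.array([2 * p[0] * p[1], p[0] ** 2 + np.cos(p[1])])

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0]); v /= np.linalg.norm(v)     # unit direction

h = 1e-6
num = (f(p + h * v) - f(p - h * v)) / (2 * h)        # df/dv at p, central difference
ana = grad_f(p) @ v                                  # <grad f(p), v>
print(num, ana)                                      # the two values should agree closely
```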
The Gradient properties

 Proposition 1:

$\dfrac{\partial f}{\partial v}$ is maximal when choosing $v = \dfrac{1}{\lVert \nabla f|_p \rVert}\,\nabla f|_p$,

and minimal when choosing $v = \dfrac{-1}{\lVert \nabla f|_p \rVert}\,\nabla f|_p$.

(Intuitively: the gradient points in the direction of greatest change.)


The Gradient properties

Proof (only for the minimum case):

Assign $v = \dfrac{-1}{\lVert \nabla f|_p \rVert}\,\nabla f|_p$. By the chain rule:

$$\frac{\partial f(x, y)}{\partial v}(p) = \Big\langle \nabla f|_p ,\, \frac{-1}{\lVert \nabla f|_p \rVert}\,\nabla f|_p \Big\rangle = \frac{-1}{\lVert \nabla f|_p \rVert}\,\big\langle \nabla f|_p ,\, \nabla f|_p \big\rangle = \frac{-\lVert \nabla f|_p \rVert^2}{\lVert \nabla f|_p \rVert} = -\lVert \nabla f|_p \rVert$$
The Gradient properties

On the other hand, for a general $v$ (by Cauchy-Schwarz):

$$\left| \frac{\partial f(x, y)}{\partial v}(p) \right| = \big| \langle \nabla f|_p ,\, v \rangle \big| \le \lVert \nabla f|_p \rVert \cdot \lVert v \rVert = \lVert \nabla f|_p \rVert$$

$$\Rightarrow \quad \frac{\partial f(x, y)}{\partial v}(p) \ge -\lVert \nabla f|_p \rVert$$
The Gradient Properties

Proposition 2: let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth $C^1$ function around $p$.
If $f$ has a local minimum (maximum) at $p$, then

$$\nabla f|_p = 0$$

(Intuitively: a necessary condition for a local min (max).)
The Gradient Properties

Proof:
Intuitive:
The Gradient Properties

Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get:

$$0 = \frac{d f(p + t \cdot v)}{dt}(0) = \big\langle (\nabla f)|_p ,\, v \big\rangle \quad \Rightarrow \quad (\nabla f)|_p = 0$$
The Gradient Properties

 We found the best INFINITESIMAL DIRECTION at each point.
 Looking for a minimum: a "blind man" procedure.
 How can we derive the way to the minimum using this knowledge?
Background

 Motivation
 The gradient notion
 The Wolfe Theorems
The Wolfe Theorem

 This is the link from the previous gradient properties to the constructive algorithm.
 The problem:

$$\min_{x} f(x)$$
The Wolfe Theorem

 We introduce a model algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$.

Step 1: if $\nabla f(x_i) = 0$, stop;
else, compute a search direction $h_i \in \mathbb{R}^n$.

Step 2: compute the step size

$$\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, and go to Step 1.
The Wolfe Theorem

The Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

$$k : \mathbb{R}^n \to [0, 1]$$

with

$$\forall x :\ \nabla f(x) \ne 0 \ \Rightarrow\ k(x) > 0,$$

and the search vectors constructed by the model algorithm satisfy:

$$\big\langle \nabla f(x_i),\, h_i \big\rangle \le -k(x_i) \cdot \lVert \nabla f(x_i) \rVert \cdot \lVert h_i \rVert$$
The Wolfe Theorem

And f ( y )  0  hi  0
Then {ifxi }i=0 is the sequence constructed by
the algorithm model,
then any accumulation point y of this sequence
satisfy:
f ( y ) = 0
The Wolfe Theorem

The theorem has a very intuitive interpretation:
always move in a descent direction.

(Figure: the search direction $h_i$ versus the gradient $\nabla f(x_i)$.)
Steepest Descent

 What does it mean?
 We now use what we have learned to implement the most basic minimization technique.
 First we introduce the algorithm, which is a version of the model algorithm.
 The problem:

$$\min_{x} f(x)$$
Steepest Descent

 Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$

Step 0: set $i = 0$.

Step 1: if $\nabla f(x_i) = 0$, stop;
else, set the search direction $h_i = -\nabla f(x_i)$.

Step 2: compute the step size

$$\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$$

Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, and go to Step 1.
Steepest Descent

 From the chain rule, at the minimizing step size $\lambda$:

$$\frac{d}{d\lambda} f(x_i + \lambda \cdot h_i) = \big\langle \nabla f(x_i + \lambda \cdot h_i),\, h_i \big\rangle = 0$$

 Therefore the method of steepest descent looks like this:
Steepest Descent
Steepest Descent

 Steepest descent finds critical points and local minima.
 Implicit step-size rule.
 Actually, we reduced the problem to finding the minimum of a one-dimensional function $f : \mathbb{R} \to \mathbb{R}$.
 There are extensions that give the step-size rule in a discrete sense (Armijo), as sketched below.
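One common form of the Armijo (backtracking) rule, as a hedged sketch; the constants lam0, c, and rho are conventional defaults rather than values from the slides.

```python
import numpy as np

def armijo_step(f, grad_f, x, h, lam0=1.0, c=1e-4, rho=0.5, max_halvings=50):
    """Backtracking (Armijo) line search: shrink lambda until the
    sufficient-decrease condition f(x + lam*h) <= f(x) + c*lam*<grad f(x), h> holds."""
    fx = f(x)
    slope = grad_f(x) @ h          # directional derivative along h (negative for descent)
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= fx + c * lam * slope:
            return lam
        lam *= rho                 # step was too long: shrink it
    return lam
```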
Steepest Descent

 Back to our connectivity shapes: the authors solve the one-dimensional problem analytically.

$$\lambda_i \in \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$$

 They change the spring energy and get a quartic polynomial in $x$:

$$E_s\big(x \in \mathbb{R}^{n \times 3}\big) = \sum_{(i,j) \in E} \big( \lVert x_i - x_j \rVert^2 - 1 \big)^2$$
Preview

 Background
 Steepest Descent
 Conjugate Gradient
Conjugate Gradient

 From now on we assume we want to minimize the quadratic function:

$$f(x) = \frac{1}{2} x^T A x - b^T x + c$$

 This is equivalent to solving the linear problem:

$$0 = \nabla f(x) = Ax - b$$

 There are generalizations to general functions.
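A quick numerical check of this equivalence; it assumes $A$ is symmetric positive definite, as the conjugate gradient derivation below also does, and the small system is arbitrary.

```python
import numpy as np

# a small symmetric positive definite system (chosen only for illustration)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x          # quadratic objective
grad = lambda x: A @ x - b                     # its gradient

x_star = np.linalg.solve(A, b)                 # solution of A x = b
print(grad(x_star))                            # ~ [0, 0]: x_star is the minimizer of f
```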


Conjugate Gradient

 What is the problem with steepest descent?

 We can repeat the same directions over and over…
 Conjugate gradient takes at most n steps.
Conjugate Gradient

$d_0, d_1, \ldots, d_j, \ldots$ are search directions – they should span $\mathbb{R}^n$.

$$x_{i+1} = x_i + \alpha_i d_i$$

$$A\tilde{x} = b, \qquad e_i = x_i - \tilde{x}$$

$$\nabla f(x) = Ax - b = Ax - A\tilde{x}, \qquad \nabla f(x_i) = A(x_i - \tilde{x}) = A e_i$$
Conjugate Gradient

Given $d_i$, how do we calculate $\alpha_i$? (As before:)

$$d_i^T \nabla f(x_{i+1}) = 0$$

$$d_i^T A e_{i+1} = 0$$

$$d_i^T A (e_i + \alpha_i d_i) = 0$$

$$\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$$
Conjugate Gradient

How do we find the $d_j$?
We want that after $n$ steps the error will be 0. Write $e_0$ in the (yet unknown) basis of search directions, for some coefficients $\delta_i$:

$$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$$

$$e_0 = e_1 - \alpha_0 d_0 = e_2 - \alpha_0 d_0 - \alpha_1 d_1 = \ldots = e_j - \sum_{i=0}^{j-1} \alpha_i d_i$$

$$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$$
Conjugate Gradient

Here is an idea: if $\delta_j = -\alpha_j$ for every $j$, then:

$$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$$

So if $j = n$:

$$e_n = 0$$
Conjugate Gradient

So we look for $d_j$ such that $\delta_j = -\alpha_j$.

A simple calculation shows that it suffices to take the directions $A$-conjugate ($A$-orthogonal):

$$d_j^T A d_i = 0 \quad \text{for } i \ne j$$
Conjugate Gradient

 We have to find an $A$-conjugate basis $d_j$, $j = 0, \ldots, n-1$.

 We can do a "Gram-Schmidt" process, but we should be careful since it is an $O(n^3)$ process: starting from some series of vectors $u_1, u_2, \ldots, u_n$,

$$d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k}\, d_k$$
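A sketch of this conjugation process: Gram-Schmidt with respect to the $A$-inner product $\langle u, v \rangle_A = u^T A v$. The matrix and starting vectors below are arbitrary choices for illustration.

```python
import numpy as np

def a_conjugate_basis(U, A):
    """Gram-Schmidt with the A-inner product <u, v>_A = u^T A v.
    Turns the columns of U into pairwise A-conjugate directions d_i.
    Even with the cached A @ d_k products this costs O(n^3),
    which is why conjugate gradient avoids doing it explicitly."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)    # conjugate directions d_0 .. d_{n-1}
    AD = np.zeros_like(U, dtype=float)   # cached products A @ d_k
    for i in range(n):
        d = U[:, i].astype(float)
        for k in range(i):
            beta = -(U[:, i] @ AD[:, k]) / (D[:, k] @ AD[:, k])
            d += beta * D[:, k]
        D[:, i] = d
        AD[:, i] = A @ d
    return D

# check on a random SPD matrix: D^T A D should come out (numerically) diagonal
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)              # symmetric positive definite
D = a_conjugate_basis(np.eye(4), A)
print(np.round(D.T @ A @ D, 6))
```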
Conjugate Gradient

 So for an arbitrary choice of $u_i$ we don't gain anything.
 Luckily, we can choose $u_i$ so that the conjugate-direction calculation is $O(m)$, where $m$ is the number of non-zero entries in $A$.
 The correct choice of $u_i$ is:

$$u_i = -\nabla f(x_i)$$
Conjugate Gradient
 So the conjugate gradient algorithm for minimizing $f$:

Data: $x_0 \in \mathbb{R}^n$

Step 0: $d_0 = r_0 := -\nabla f(x_0)$

Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$, with $r_i := -\nabla f(x_i)$

Step 2: $x_{i+1} = x_i + \alpha_i d_i$

Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$

Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat $n$ times.
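A direct NumPy transcription of Steps 0–4, as a sketch: it assumes $A$ is symmetric positive definite and adds an early stop on a small residual, which is not part of the statement on the slide.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """CG for f(x) = 1/2 x^T A x - b^T x with A symmetric positive definite.
    Follows Steps 0-4 above; r_i = -grad f(x_i) = b - A x_i."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x                    # Step 0: residual = -gradient
    d = r.copy()
    for _ in range(n):               # at most n steps in exact arithmetic
        rr = r @ r
        if np.sqrt(rr) < tol:        # early stop when the residual is tiny
            break
        alpha = rr / (d @ A @ d)     # Step 1
        x = x + alpha * d            # Step 2
        r = r - alpha * (A @ d)      # residual recurrence: r_{i+1} = r_i - alpha * A d_i
        beta = (r @ r) / rr          # Step 3
        d = r + beta * d             # Step 4
    return x

# example: solve A x = b for a small SPD system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))
```

The update `r = r - alpha * (A @ d)` is just the recurrence form of $r_{i+1} = -\nabla f(x_{i+1})$; in exact arithmetic the loop reaches $Ax = b$ after at most $n$ steps.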
