Gradient Methods
May 2005
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min (max) problem:

\min_{x} f(x)

But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: f : \mathbb{R}^n \to \mathbb{R}
High-order polynomials, e.g.

x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7

What about functions that don't have an analytic representation: a "Black Box"?
Motivation - “real world” problem
Connectivity shapes (Isenburg, Gumhold, Gotsman)
mesh = {C = (V, E), geometry}
What do we get from C alone, without the geometry?
Motivation - “real world” problem
First we introduce error functionals and then try to minimize them:

E_s(x \in \mathbb{R}^{n \times 3}) = \sum_{(i,j) \in E} \left( \| x_i - x_j \| - 1 \right)^2

E_r(x \in \mathbb{R}^{n \times 3}) = \sum_{i=1}^{n} \| L(x_i) \|^2

L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} \left( x_j - x_i \right)
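To make the functionals concrete, here is a minimal Python/NumPy sketch that evaluates E_s and E_r for a small, made-up edge list. The toy mesh, the vertex positions, and the function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Toy connectivity C = (V, E) and arbitrary positions x in R^{n x 3} (illustrative only).
edges = [(0, 1), (1, 2), (2, 0)]
x = np.array([[0.0, 0.0, 0.0],
              [1.2, 0.0, 0.0],
              [0.5, 0.9, 0.0]])

def e_s(x, edges):
    # E_s(x) = sum over (i,j) in E of (||x_i - x_j|| - 1)^2
    return sum((np.linalg.norm(x[i] - x[j]) - 1.0) ** 2 for i, j in edges)

def e_r(x, edges):
    # E_r(x) = sum_i ||L(x_i)||^2, with L(x_i) = (1/d_i) * sum_{(i,j) in E} (x_j - x_i)
    n = x.shape[0]
    L = np.zeros_like(x)
    deg = np.zeros(n)
    for i, j in edges:
        L[i] += x[j] - x[i]
        L[j] += x[i] - x[j]
        deg[i] += 1
        deg[j] += 1
    L /= deg[:, None]
    return float((L ** 2).sum())

print(e_s(x, edges), e_r(x, edges))
```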
Motivation - “real world” problem
Then we minimize:

E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1-\lambda)\, E_s(x) + \lambda\, E_r(x) \right]

This is a high-dimensional, non-linear problem.
The authors use the conjugate gradient method, which is perhaps the most popular optimization technique, built on the ideas we will see here.
Motivation - “real world” problem
Changing the parameter \lambda:

E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1-\lambda)\, E_s(x) + \lambda\, E_r(x) \right]
Motivation
General problem: find the global min (max).
This lecture will concentrate on finding a local minimum.
Background
Motivation
The gradient notion
The Wolfe Theorems
Example surface (3-D plot):

f := (x, y) \mapsto \cos\left(\tfrac{1}{2} x\right) \cos\left(\tfrac{1}{2} y\right) x
Directional Derivatives:
First, the one-dimensional derivative:
Directional Derivatives:
Along the axes…

\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}
Directional Derivatives:
In a general direction…

v \in \mathbb{R}^2, \quad \| v \| = 1

\frac{\partial f(x,y)}{\partial v}
Directional Derivatives

\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}
The Gradient: Definition in \mathbb{R}^2

f : \mathbb{R}^2 \to \mathbb{R}

\nabla f(x,y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)

In the plane: \nabla f(x,y)
The Gradient: Definition

f : \mathbb{R}^n \to \mathbb{R}

\nabla f(x_1, \dots, x_n) := \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)
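As a quick numerical illustration of this definition (not part of the original slides), here is a central finite-difference approximation of the gradient; the test function is an arbitrary choice.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f: R^n -> R at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Example: f(x, y) = x^2 + 3*y^2, so grad f = (2x, 6y).
f = lambda p: p[0] ** 2 + 3 * p[1] ** 2
print(numerical_gradient(f, [1.0, 2.0]))   # approximately [2.0, 12.0]
```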
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:

\Delta z = \frac{\partial f}{\partial x}\,\Delta x + \frac{\partial f}{\partial y}\,\Delta y
The Gradient Properties
By the chain rule (important for later use): for \| v \| = 1,

\frac{\partial f}{\partial v}(p) = \langle \nabla f|_p , v \rangle
The Gradient Properties
Proposition 1:

\frac{\partial f}{\partial v}(p) is maximal when choosing v = \frac{\nabla f|_p}{\| \nabla f|_p \|},

and minimal when choosing v = -\frac{\nabla f|_p}{\| \nabla f|_p \|}.

(Intuitive: the gradient points in the direction of greatest change.)
The Gradient Properties
Proof (only for the minimum case):
Assign v = -\frac{\nabla f|_p}{\| \nabla f|_p \|}. By the chain rule:

\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p , -\frac{\nabla f|_p}{\| \nabla f|_p \|} \right\rangle = -\frac{\| \nabla f|_p \|^2}{\| \nabla f|_p \|} = -\| \nabla f|_p \|
The Gradient Properties
On the other hand, for a general v with \| v \| = 1, the Cauchy–Schwarz inequality gives:

\frac{\partial f}{\partial v}(p) = \langle \nabla f|_p , v \rangle \geq -\| \nabla f|_p \| \, \| v \| = -\| \nabla f|_p \|

so every unit direction satisfies \frac{\partial f}{\partial v}(p) \geq -\| \nabla f|_p \|, and the choice above attains this bound.
The Gradient Properties
Proposition 2: Let f : \mathbb{R}^n \to \mathbb{R} be a smooth C^1 function around p.
If f has a local minimum (maximum) at p, then

\nabla f|_p = 0

(Intuitive: a necessary condition for a local min (max).)
The Gradient Properties
Proof:
Intuitively: (figure)
The Gradient Properties
Formally: for any v \in \mathbb{R}^n \setminus \{0\} we get

0 = \frac{d\, f(p + t v)}{dt}(0) = \langle (\nabla f)|_p , v \rangle

\Rightarrow (\nabla f)|_p = 0
The Gradient Properties
We found the best INFINITESIMAL DIRECTION at each point.
Looking for a minimum: a “blind man” procedure.
How can we derive the way to the minimum using this knowledge?
Background
Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem
This is the link from the previous gradient properties to the constructive algorithm.
The problem:

\min_{x} f(x)
The Wolfe Theorem
We introduce a model algorithm:
Data: x_0 \in \mathbb{R}^n
Step 0: set i = 0
Step 1: if \nabla f(x_i) = 0, stop;
        else, compute a search direction h_i \in \mathbb{R}^n
Step 2: compute the step size
        \lambda_i \in \arg\min_{\lambda \geq 0} f(x_i + \lambda h_i)
Step 3: set x_{i+1} = x_i + \lambda_i h_i and go to Step 1
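A minimal sketch of this model algorithm as a generic loop, assuming the caller supplies the gradient, a search-direction rule, and a step-size rule. The function names and the tolerance-based stopping test are my additions; the slides stop exactly when the gradient vanishes.

```python
import numpy as np

def model_algorithm(grad, direction, step_size, x0, tol=1e-8, max_iter=1000):
    """Generic descent loop: x_{i+1} = x_i + lambda_i * h_i."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # Step 1: stop when the gradient (numerically) vanishes
            break
        h = direction(x, g)            # Step 1: search direction h_i
        lam = step_size(x, h)          # Step 2: step size lambda_i
        x = x + lam * h                # Step 3
    return x
```

Steepest descent, introduced next, is the special case direction(x, g) = -g.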
The Wolfe Theorem
The Theorem: Suppose f : \mathbb{R}^n \to \mathbb{R} is C^1 smooth, and there exists a continuous function
k : \mathbb{R}^n \to [0, 1]
such that
\forall x : \nabla f(x) \neq 0 \Rightarrow k(x) > 0,
and the search vectors constructed by the model algorithm satisfy:

\langle \nabla f(x_i), h_i \rangle \leq -k(x_i)\, \| \nabla f(x_i) \| \, \| h_i \|
The Wolfe Theorem
and \nabla f(x_i) \neq 0 \Rightarrow h_i \neq 0.
Then, if \{ x_i \}_{i=0}^{\infty} is the sequence constructed by the model algorithm, any accumulation point y of this sequence satisfies:

\nabla f(y) = 0
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction.
(Figure: the search direction h_i and the gradient \nabla f(x_i).)
Steepest Descent
What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem:

\min_{x} f(x)
Steepest Descent
Steepest descent algorithm:
Data: x_0 \in \mathbb{R}^n
Step 0: set i = 0
Step 1: if \nabla f(x_i) = 0, stop;
        else, compute the search direction h_i = -\nabla f(x_i)
Step 2: compute the step size
        \lambda_i \in \arg\min_{\lambda \geq 0} f(x_i + \lambda h_i)
Step 3: set x_{i+1} = x_i + \lambda_i h_i and go to Step 1
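To show the algorithm in action, here is a sketch of steepest descent applied to a quadratic f(x) = ½xᵀAx − bᵀx. The quadratic test problem is an assumption made here so that Step 2 has the closed form λ = rᵀr / rᵀAr with r = −∇f(x); the example matrix is arbitrary.

```python
import numpy as np

def steepest_descent_quadratic(A, b, x0, tol=1e-10, max_iter=10000):
    """Steepest descent for f(x) = 0.5*x^T A x - b^T x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                   # r = -grad f(x): the steepest descent direction
        if np.linalg.norm(r) <= tol:
            break
        lam = (r @ r) / (r @ (A @ r))   # exact minimizer of f(x + lam*r) along the line
        x = x + lam * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 1.0])
print(steepest_descent_quadratic(A, b, np.zeros(2)))   # converges to A^{-1} b
```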
Steepest Descent
From the chain rule, at the exact step size:

\frac{d}{d\lambda} f(x_i + \lambda h_i) = \langle \nabla f(x_i + \lambda h_i), h_i \rangle = 0

so the next gradient is orthogonal to the current direction, and the steepest descent path zig-zags toward the minimum.
Steepest Descent
Steepest descent finds critical points, and in practice local minima.
Implicit step-size rule:
in effect we reduced the problem to finding the minimum of a one-dimensional function

f : \mathbb{R} \to \mathbb{R}

There are extensions that give the step-size rule in a discrete sense (Armijo), as sketched below.
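The slides only mention Armijo by name; here is a common backtracking version of that rule as a sketch. The constants c and rho are conventional choices, and the test function is arbitrary, not taken from the lecture.

```python
import numpy as np

def armijo_step(f, grad, x, h, lam0=1.0, c=1e-4, rho=0.5, max_halvings=50):
    """Backtracking (Armijo) rule: shrink lambda until a sufficient-decrease test holds."""
    g = grad(x)
    slope = g @ h                          # directional derivative along h (should be < 0)
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= f(x) + c * lam * slope:
            return lam
        lam *= rho
    return lam

# Usage sketch with the steepest descent direction h = -grad f(x):
f = lambda p: (p[0] - 1) ** 2 + 10 * p[1] ** 2
grad = lambda p: np.array([2 * (p[0] - 1), 20 * p[1]])
x = np.array([0.0, 1.0])
h = -grad(x)
print(armijo_step(f, grad, x, h))
```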
Steepest Descent
Back to our connectivity shapes: the authors solve the one-dimensional problem analytically.

\lambda_i \in \arg\min_{\lambda \geq 0} f(x_i + \lambda h_i)

They change the spring energy and get a quartic polynomial, so along the line x_i + \lambda h_i the energy is a quartic in \lambda:

E_s(x \in \mathbb{R}^{n \times 3}) = \sum_{(i,j) \in E} \left( \| x_i - x_j \|^2 - 1 \right)^2
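As a sketch of why the modified energy helps: restricted to the line x_i + λh_i, a quartic energy is a quartic polynomial in λ, so the minimizer over λ ≥ 0 is either 0 or a nonnegative real root of the cubic derivative. The coefficients below are an arbitrary example, not the paper's data.

```python
import numpy as np

def minimize_quartic_nonneg(coeffs):
    """Minimize p(lam) over lam >= 0, where p is a quartic with coefficients
    [c4, c3, c2, c1, c0] (highest degree first).

    The minimizer is either lam = 0 or a nonnegative real root of the cubic p'(lam) = 0.
    """
    c4, c3, c2, c1, c0 = coeffs
    p = np.poly1d(coeffs)
    roots = np.roots([4 * c4, 3 * c3, 2 * c2, c1])   # solve p'(lam) = 0
    candidates = [0.0] + [r.real for r in roots if abs(r.imag) < 1e-12 and r.real >= 0]
    return min(candidates, key=p)                    # candidate with the smallest p value

# Illustrative quartic with a positive leading coefficient:
print(minimize_quartic_nonneg([1.0, -2.0, -1.0, 2.0, 0.5]))
```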
Preview
Background
Steepest Descent
Conjugate Gradient
Conjugate Gradient
From now on we assume we want to minimize the quadratic function

f(x) = \frac{1}{2} x^T A x - b^T x + c

(with A symmetric positive definite). This is equivalent to solving the linear problem:

0 = \nabla f(x) = A x - b

There are generalizations to general functions.
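A small numerical check of this equivalence (the matrices are illustrative, assuming A symmetric positive definite): the gradient Ax − b vanishes exactly at the solution of the linear system, which is the minimizer of f.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])            # symmetric positive definite
b = np.array([1.0, 2.0])
c = 0.0

f = lambda x: 0.5 * x @ A @ x - b @ x + c
grad = lambda x: A @ x - b                        # gradient of the quadratic (A symmetric)

x_star = np.linalg.solve(A, b)                    # solves Ax = b
print(grad(x_star))                               # ~ [0, 0]: the gradient vanishes here
print(f(x_star) <= f(x_star + np.array([0.1, -0.1])))   # True: a nearby point is worse
```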
Conjugate Gradient
What is the problem with steepest descent?
We can repeat the same directions over and
over…
Conjugate gradient takes at most n steps.
Conjugate Gradient
d_0, d_1, \dots, d_j, \dots — search directions, which should span \mathbb{R}^n

x_{i+1} = x_i + \alpha_i d_i

Let \tilde{x} be the exact solution, A \tilde{x} = b, and define the error e_i = x_i - \tilde{x}. Then:

\nabla f(x) = A x - b = A x - A \tilde{x}

\nabla f(x_i) = A (x_i - \tilde{x}) = A e_i

(Figure: x_0, x_1, \tilde{x}, the errors e_0, e_1, and the direction d_0.)
Conjugate Gradient
Given d_i, how do we calculate \alpha_i? (As before:)

d_i^T \nabla f(x_{i+1}) = 0

d_i^T A e_{i+1} = 0

d_i^T A (e_i + \alpha_i d_i) = 0

\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}
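A quick numerical sanity check of this formula (the SPD matrix and the search direction are arbitrary, chosen here only for illustration): after stepping with this α_i, the new gradient is orthogonal to d_i, as derived.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])    # symmetric positive definite
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

x_i = np.array([0.0, 0.0])
d_i = np.array([1.0, 0.5])                # some search direction
alpha_i = -(d_i @ grad(x_i)) / (d_i @ A @ d_i)
x_next = x_i + alpha_i * d_i

print(d_i @ grad(x_next))                 # ~ 0: d_i^T grad f(x_{i+1}) = 0
```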
Conjugate Gradient
How do we find the d_j?
We want the error to be 0 after n steps. Expand the initial error in the (spanning) directions:

e_0 = \sum_{i=0}^{n-1} \delta_i d_i

Since x_{i+1} = x_i + \alpha_i d_i,

e_0 = e_1 - \alpha_0 d_0 = e_2 - \alpha_0 d_0 - \alpha_1 d_1 = \dots = e_j - \sum_{i=0}^{j-1} \alpha_i d_i

e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i
Conjugate Gradient
Here is an idea: if \alpha_i = -\delta_i, then:

e_j = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i

So if j = n,

e_n = 0
Conjugate Gradient
So we look for d_j such that \alpha_j = -\delta_j.
A simple calculation shows that this holds if we take

d_j^T A d_i = 0 \quad \text{for } i \neq j \qquad (A\text{-conjugate, i.e. } A\text{-orthogonal})
Conjugate Gradient
We have to find an A-conjugate basis

d_j, \quad j = 0, \dots, n-1

We can run a Gram–Schmidt process (see the sketch below), but we should be careful since it is an O(n^3) process: starting from some sequence of vectors u_1, u_2, \dots, u_n,

d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k} d_k
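A direct (O(n³)) sketch of the A-orthogonal Gram–Schmidt step described above, with an arbitrary starting basis u_1, ..., u_n. This is the naive construction, not the O(m) shortcut discussed next.

```python
import numpy as np

def a_conjugate_basis(A, U):
    """Make the columns of U A-conjugate: d_i = u_i + sum_{k<i} beta_{i,k} d_k."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float).copy()
        for k in range(i):
            # beta_{i,k} chosen so that d_i^T A d_k = 0
            beta = -(U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d += beta * D[:, k]
        D[:, i] = d
    return D

A = np.array([[4.0, 1.0], [1.0, 3.0]])
U = np.eye(2)                              # arbitrary starting vectors
D = a_conjugate_basis(A, U)
print(D[:, 0] @ A @ D[:, 1])               # ~ 0: the directions are A-orthogonal
```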
Conjugate Gradient
So for an arbitrary choice of u_i we gain nothing.
Luckily, we can choose u_i so that the conjugate direction calculation is O(m), where m is the number of non-zero entries of A.
The correct choice of u_i is:

u_i = -\nabla f(x_i)
Conjugate Gradient
So the conjugate gradient algorithm for minimizing f is:
Data: x_0 \in \mathbb{R}^n
Step 0: d_0 = r_0 := -\nabla f(x_0)
Step 1: \alpha_i = \frac{r_i^T r_i}{d_i^T A d_i}, \quad r_i := -\nabla f(x_i)
Step 2: x_{i+1} = x_i + \alpha_i d_i
Step 3: \beta_{i+1} = \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i}
Step 4: d_{i+1} = r_{i+1} + \beta_{i+1} d_i, and repeat n times.
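A runnable sketch of the algorithm above for the quadratic case (Python/NumPy). The tolerance-based early stop is my addition for floating-point safety, and the residual is updated recursively as r_{i+1} = r_i − α_i A d_i, which equals −∇f(x_{i+1}) from the slide; the test matrix is an arbitrary SPD example.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12):
    """CG for f(x) = 0.5*x^T A x - b^T x, A symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                            # r_0 = -grad f(x_0)
    d = r.copy()                             # d_0 = r_0
    for _ in range(len(b)):                  # at most n steps in exact arithmetic
        if np.linalg.norm(r) <= tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)           # Step 1
        x = x + alpha * d                    # Step 2
        r_new = r - alpha * Ad               # r_{i+1} = -grad f(x_{i+1})
        beta = (r_new @ r_new) / (r @ r)     # Step 3
        d = r_new + beta * d                 # Step 4
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # agrees with np.linalg.solve(A, b)
```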