
Lecture 4 - The Gradient Method
Amir Beck, “Introduction to Nonlinear Optimization” - Lecture Slides

Objective: find an optimal solution of the problem


min{f (x) : x ∈ Rn }.

The iterative algorithms that we will consider are of the form


xk+1 = xk + tk dk , k = 0, 1, . . .

I dk - direction.
I tk - stepsize.
We will limit ourselves to descent directions.
Definition. Let f : Rn → R be a continuously differentiable function over
Rn . A vector d ≠ 0, d ∈ Rn , is called a descent direction of f at x if the
directional derivative f'(x; d) is negative, meaning that

    f'(x; d) = ∇f (x)T d < 0.

The Descent Property of Descent Directions
Lemma: Let f be a continuously differentiable function over Rn , and let
x ∈ Rn . Suppose that d is a descent direction of f at x. Then there exists
ε > 0 such that
f (x + td) < f (x)
for any t ∈ (0, ε].
Proof.
I Since f'(x; d) < 0, it follows from the definition of the directional derivative
that

    lim_{t→0+} [f (x + td) − f (x)]/t = f'(x; d) < 0.

I Therefore, there exists ε > 0 such that

    [f (x + td) − f (x)]/t < 0

for any t ∈ (0, ε], which readily implies the desired result.

Schematic Descent Direction Method

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . set
(a) pick a descent direction dk .
(b) find a stepsize tk satisfying f (xk + tk dk ) < f (xk ).
(c) set xk+1 = xk + tk dk .
(d) if a stopping criterion is satisfied, then STOP and xk+1 is the output.
Of course, many details are missing in the above schematic algorithm:
I What is the starting point?
I How to choose the descent direction?
I What stepsize should be taken?
I What is the stopping criterion?

Stepsize Selection Rules
I constant stepsize - tk = t̄ for any k.
I exact stepsize - tk is a minimizer of f along the ray xk + tdk :

    tk ∈ argmin_{t≥0} f (xk + tdk ).

I backtracking¹ - the method requires three parameters: s > 0, α ∈ (0, 1),
β ∈ (0, 1). Start with an initial stepsize tk = s; while

    f (xk ) − f (xk + tk dk ) < −αtk ∇f (xk )T dk ,

set tk := βtk . The resulting stepsize satisfies the sufficient decrease property:

    f (xk ) − f (xk + tk dk ) ≥ −αtk ∇f (xk )T dk .

¹ also referred to as the Armijo rule
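For concreteness, a minimal MATLAB sketch of the backtracking loop (the quadratic objective and starting point are borrowed from the numerical examples later in the lecture; the handle names f and g are ours, not the book's):

f = @(x) x(1)^2 + 2*x(2)^2;          % example objective from the later slides
g = @(x) [2*x(1); 4*x(2)];           % its gradient
x = [2; 1]; d = -g(x);               % current iterate and a descent direction
s = 2; alpha = 0.25; beta = 0.5;     % backtracking parameters
t = s;                               % start from the initial stepsize s
while f(x) - f(x + t*d) < -alpha*t*(g(x)'*d)
    t = beta*t;                      % shrink until sufficient decrease holds
end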


Exact Line Search for Quadratic Functions
f (x) = xT Ax + 2bT x + c where A is an n × n positive definite matrix, b ∈ Rn and
c ∈ R. Let x ∈ Rn and let d ∈ Rn be a descent direction of f at x. The objective
is to find a solution to
    min_{t≥0} f (x + td).

In class
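A sketch of the derivation: g (t) = f (x + td) is a strictly convex quadratic in t with

    g'(t) = 2dT A(x + td) + 2bT d,

so setting g'(t) = 0 gives t* = −dT (Ax + b)/(dT Ad). Since ∇f (x) = 2(Ax + b) and d is a descent direction, dT (Ax + b) = f'(x; d)/2 < 0; together with dT Ad > 0 this gives t* > 0, so the constraint t ≥ 0 is inactive.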

The Gradient Method - Taking the Direction of Minus the
Gradient
I In the gradient method dk = −∇f (xk ).
I This is a descent direction as long as ∇f (xk ) ≠ 0 since

    f'(xk ; −∇f (xk )) = −∇f (xk )T ∇f (xk ) = −‖∇f (xk )‖² < 0.

I In addition to being a descent direction, minus the gradient is also the
steepest descent direction.
Lemma: Let f be a continuously differentiable function and let x ∈ Rn be
a non-stationary point (∇f (x) ≠ 0). Then an optimal solution of

    min_d { f'(x; d) : ‖d‖ = 1 }     (1)

is d = −∇f (x)/‖∇f (x)‖.

Proof. In class

The Gradient Method

The Gradient Method

Input: ε > 0 - tolerance parameter.

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) pick a stepsize tk by a line search procedure on the function

g (t) = f (xk − t∇f (xk )).

(b) set xk+1 = xk − tk ∇f (xk ).


(c) if ‖∇f (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
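A minimal MATLAB sketch of the method with a constant stepsize (objective, starting point, stepsize and tolerance are taken from the constant-stepsize example a few slides ahead; the code itself is ours, not the book's):

f = @(x) x(1)^2 + 2*x(2)^2;        % objective of the numerical examples
g = @(x) [2*x(1); 4*x(2)];         % gradient
x = [2; 1]; t = 0.1; tol = 1e-5;   % starting point, constant stepsize, tolerance
iter = 0;
while norm(g(x)) > tol
    x = x - t*g(x);                % gradient step
    iter = iter + 1;
    fprintf('iter_number = %3d norm_grad = %f fun_val = %f\n', ...
        iter, norm(g(x)), f(x));
end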

Numerical Example
min x² + 2y²
x0 = (2; 1), ε = 10−5 , exact line search.

13 iterations until convergence.


The Zig-Zag Effect
Lemma. Let {xk }k≥0 be the sequence generated by the gradient method
with exact line search for solving a problem of minimizing a continuously
differentiable function f . Then for any k = 0, 1, 2, . . .

(xk+2 − xk+1 )T (xk+1 − xk ) = 0.

Proof.
I xk+1 − xk = −tk ∇f (xk ), xk+2 − xk+1 = −tk+1 ∇f (xk+1 ).
I Therefore, we need to prove that ∇f (xk )T ∇f (xk+1 ) = 0.
I tk ∈ argmin_{t≥0} { g (t) ≡ f (xk − t∇f (xk )) }.
I Hence, g'(tk ) = 0.
I −∇f (xk )T ∇f (xk − tk ∇f (xk )) = 0.
I ∇f (xk )T ∇f (xk+1 ) = 0.
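A quick numerical check of the lemma in MATLAB (our own illustration, using the exact stepsize formula for quadratics derived on a later slide):

A = diag([1, 2]);                  % f(x) = x'*A*x = x1^2 + 2*x2^2
x = [2; 1]; d = 2*A*x;             % d = gradient of x'*A*x
for k = 1:3
    t = (d'*d) / (2*(d'*A*d));     % exact stepsize along -d
    x = x - t*d; d_new = 2*A*x;
    fprintf('grad_k'' * grad_{k+1} = %g\n', d'*d_new);  % prints ~0
    d = d_new;
end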

Numerical Example - Constant Stepsize, t̄ = 0.1

min x² + 2y²
x0 = (2; 1), ε = 10−5 , t̄ = 0.1.
iter_number = 1 norm_grad = 4.000000 fun_val = 3.280000
iter_number = 2 norm_grad = 2.937210 fun_val = 1.897600
iter_number = 3 norm_grad = 2.222791 fun_val = 1.141888
: : :
iter_number = 56 norm_grad = 0.000015 fun_val = 0.000000
iter_number = 57 norm_grad = 0.000012 fun_val = 0.000000
iter_number = 58 norm_grad = 0.000010 fun_val = 0.000000

I quite a lot of iterations...

Numerical Example - Constant Stepsize, t̄ = 10

min x² + 2y²
x0 = (2; 1), ε = 10−5 , t̄ = 10.

iter_number = 1 norm_grad = 1783.488716 fun_val = 476806.000000


iter_number = 2 norm_grad = 656209.693339 fun_val = 56962873606.00
iter_number = 3 norm_grad = 256032703.004797 fun_val = 83183008071
: : :
iter_number = 119 norm_grad = NaN fun_val = NaN

I The sequence diverges :(


I Important question: how can we choose the constant stepsize so that
convergence is guaranteed?

Lipschitz Continuity of the Gradient
Definition. Let f be a continuously differentiable function over Rn . We say
that f has a Lipschitz gradient if there exists L ≥ 0 for which

    ‖∇f (x) − ∇f (y)‖ ≤ L‖x − y‖ for any x, y ∈ Rn .

L is called the Lipschitz constant.

I If ∇f is Lipschitz with constant L, then it is also Lipschitz with constant L̃
for all L̃ ≥ L.
I The class of functions with Lipschitz gradient with constant L is denoted by
C_L^{1,1}(Rn ) or just C_L^{1,1}.
I Linear functions - Given a ∈ Rn , the function f (x) = aT x is in C_0^{1,1}.
I Quadratic functions - Let A be a symmetric n × n matrix, b ∈ Rn and
c ∈ R. Then the function f (x) = xT Ax + 2bT x + c is a C^{1,1} function. The
smallest Lipschitz constant of ∇f is 2‖A‖₂ - why? In class
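A sketch of the in-class argument: ∇f (x) = 2(Ax + b), so ‖∇f (x) − ∇f (y)‖ = 2‖A(x − y)‖ ≤ 2‖A‖₂ ‖x − y‖, and choosing x − y along an eigenvector corresponding to a largest-magnitude eigenvalue of A shows that the constant 2‖A‖₂ cannot be improved.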

Equivalence to Boundedness of the Hessian
Theorem. Let f be a twice continuously differentiable function over Rn .
Then the following two claims are equivalent:
1. f ∈ C_L^{1,1}(Rn ).
2. ‖∇²f (x)‖ ≤ L for any x ∈ Rn .
Proof on pages 73-74 of the book.
Example: f (x) = √(1 + x²) ∈ C^{1,1}
In class

Convergence of the Gradient Method
Theorem. Let {xk }k≥0 be the sequence generated by GM for solving

    min_{x∈Rn} f (x)

with one of the following stepsize strategies:

I constant stepsize t̄ ∈ (0, 2/L).
I exact line search.
I backtracking procedure with parameters s > 0 and α, β ∈ (0, 1).

Assume that
I f ∈ C_L^{1,1}(Rn ).
I f is bounded below over Rn , that is, there exists m ∈ R such that
f (x) > m for all x ∈ Rn .
Then
1. for any k, f (xk+1 ) < f (xk ) unless ∇f (xk ) = 0.
2. ∇f (xk ) → 0 as k → ∞.
Theorem 4.25 in the book.
Two Numerical Examples - Backtracking
min x² + 2y²
x0 = (2; 1), s = 2, α = 0.25, β = 0.5, ε = 10−5 .
iter_number = 1 norm_grad = 2.000000 fun_val = 1.000000
iter_number = 2 norm_grad = 0.000000 fun_val = 0.000000
I fast convergence (also due to luck!)
I no real advantage to exact line search.
ANOTHER EXAMPLE:
min 0.01x² + y² , s = 2, α = 0.25, β = 0.5, ε = 10−5 .
iter_number = 1 norm_grad = 0.028003 fun_val = 0.009704
iter_number = 2 norm_grad = 0.027730 fun_val = 0.009324
iter_number = 3 norm_grad = 0.027465 fun_val = 0.008958
: : :
iter_number = 201 norm_grad = 0.000010 fun_val = 0.000000
Important Question: Can we detect key properties of the objective function that
imply slow/fast convergence?
Kantorovich Inequality
Lemma. Let A be a positive definite n × n matrix. Then for any x ≠ 0, x ∈ Rn ,
the inequality

    (xT x)² / ((xT Ax)(xT A−1 x)) ≥ 4λmax (A)λmin (A) / (λmax (A) + λmin (A))²

holds.
Proof.
I Denote m = λmin (A) and M = λmax (A).
I The eigenvalues of the matrix A + MmA−1 are λi (A) + Mm/λi (A).
I The maximum of the one-dimensional function ϕ(t) = t + Mm/t over [m, M] is
attained at the endpoints m and M with a corresponding value of M + m.
I Thus, the eigenvalues of A + MmA−1 are at most M + m.
I A + MmA−1 ⪯ (M + m)I.
I xT Ax + Mm(xT A−1 x) ≤ (M + m)(xT x).
I Therefore,

    (xT Ax)[Mm(xT A−1 x)] ≤ (1/4)( xT Ax + Mm(xT A−1 x) )² ≤ ((M + m)²/4)(xT x)²,

and dividing both sides by Mm(xT Ax)(xT A−1 x) gives the claimed inequality.
Gradient Method for Minimizing xT Ax
Theorem. Let {xk }k≥0 be the sequence generated by the gradient method
with exact linesearch for solving the problem

    min_{x∈Rn} xT Ax    (A ≻ 0).

Then for any k = 0, 1, . . .:

    f (xk+1 ) ≤ ((M − m)/(M + m))² f (xk ),

where M = λmax (A), m = λmin (A).


Proof.
I xk+1 = xk − tk dk , where dk = 2Axk and tk = (dTk dk )/(2dTk Adk ).

Proof of Rate of Convergence Contd.
I

    f (xk+1 ) = xTk+1 Axk+1 = (xk − tk dk )T A(xk − tk dk )
              = xTk Axk − 2tk dTk Axk + tk² dTk Adk
              = xTk Axk − tk dTk dk + tk² dTk Adk .

I Plugging in the expression for tk :

    f (xk+1 ) = xTk Axk − (1/4)(dTk dk )²/(dTk Adk )
              = (1 − (dTk dk )²/(4(dTk Adk )(xTk AA−1 Axk ))) xTk Axk
              = (1 − (dTk dk )²/((dTk Adk )(dTk A−1 dk ))) f (xk ).

I By the Kantorovich inequality:

    f (xk+1 ) ≤ (1 − 4Mm/(M + m)²) f (xk ) = ((M − m)/(M + m))² f (xk )
             = ((κ(A) − 1)/(κ(A) + 1))² f (xk ),

where κ(A) = M/m is the condition number defined on the next slide.

The Condition Number
Definition. Let A be an n × n positive definite matrix. Then the condition
number of A is defined by

    κ(A) = λmax (A)/λmin (A).

I matrices (or quadratic functions) with large condition number are called
ill-conditioned.
I matrices with small condition number are called well-conditioned.
I large condition number implies large number of iterations of the gradient
method.
I small condition number implies small number of iterations of the gradient
method.
I For a non-quadratic function, the asymptotic rate of convergence of xk to a
stationary point x∗ is usually determined by the condition number of ∇2 f (x∗ ).

A Severely Ill-Conditioned Function - Rosenbrock

min f (x1 , x2 ) = 100(x2 − x1²)² + (1 − x1 )².




I optimal solution: (x1 , x2 ) = (1, 1), optimal value: 0.


I

    ∇f (x) = [ −400x1 (x2 − x1²) − 2(1 − x1 )
                          200(x2 − x1²)       ],

    ∇²f (x) = [ −400x2 + 1200x1² + 2   −400x1
                       −400x1            200   ].

I

    ∇²f (1, 1) = [  802   −400
                   −400    200  ],

condition number: 2508
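The stated condition number is easy to verify in MATLAB:

>> cond([802 -400; -400 200])   % returns approximately 2.508e+03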

Solution of the Rosenbrock Problem with the Gradient
Method
x0 = (2; 5), s = 2, α = 0.25, β = 0.5, ε = 10−5 , backtracking stepsize selection.

6890(!!!) iterations.
Sensitivity of Solutions to Linear Systems

I Suppose that we are given the linear system

    Ax = b,

where A ≻ 0, and we assume that x is indeed the solution of the system
(x = A−1 b).
I Suppose that the right-hand side is perturbed to b + ∆b. What can be said
about the solution x + ∆x of the new system?
I ∆x = A−1 ∆b.
I Result (derivation In class):

    ‖∆x‖/‖x‖ ≤ κ(A) ‖∆b‖/‖b‖
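A sketch of the derivation: ∆x = A−1 ∆b gives ‖∆x‖ ≤ ‖A−1 ‖‖∆b‖, and b = Ax gives ‖b‖ ≤ ‖A‖‖x‖, i.e. 1/‖x‖ ≤ ‖A‖/‖b‖. Multiplying the two bounds yields ‖∆x‖/‖x‖ ≤ ‖A‖‖A−1 ‖ ‖∆b‖/‖b‖ = κ(A)‖∆b‖/‖b‖, since for A ≻ 0, ‖A‖ = λmax (A) and ‖A−1 ‖ = 1/λmin (A).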

Numerical Example
I Consider the ill-conditioned matrix:

    A = [ 1 + 10−5       1
              1      1 + 10−5 ]
>> A=[1+1e-5,1;1,1+1e-5];
>> cond(A)
ans =
2.000009999998795e+005

I We have
>> A\[1;1]
ans =
0.499997500018278
0.499997500006722
I However,
>> A\[1.1;1]
ans =
1.0e+003 *
5.000524997400047
-4.999475002650021
Scaled Gradient Method
I Consider the minimization problem
(P) min{f (x) : x ∈ Rn }.

I For a given nonsingular matrix S ∈ Rn×n , we make the linear change of


variables x = Sy, and obtain the equivalent problem
(P’) min{g (y) ≡ f (Sy) : y ∈ Rn }.

I Since ∇g (y) = ST ∇f (Sy) = ST ∇f (x), the gradient method for (P’) is


yk+1 = yk − tk ST ∇f (Syk ).

I Multiplying the latter equality by S from the left, and using the notation
xk = Syk :
xk+1 = xk − tk SST ∇f (xk ).

I Defining D = SST , we obtain the scaled gradient method:


xk+1 = xk − tk D∇f (xk ).
Scaled Gradient Method
I D ≻ 0, so the direction −D∇f (xk ) is a descent direction:

    f'(xk ; −D∇f (xk )) = −∇f (xk )T D∇f (xk ) < 0.

We also allow different scaling matrices at each iteration.


Scaled Gradient Method

Input: ε > 0 - tolerance parameter.


Initialization: pick x0 ∈ Rn arbitrarily.
General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) pick a scaling matrix Dk ≻ 0.
(b) pick a stepsize tk by a line search procedure on the function

g (t) = f (xk − tDk ∇f (xk )).

(c) set xk+1 = xk − tk Dk ∇f (xk ).


(d) if ‖∇f (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
Choosing the Scaling Matrix Dk
I The scaled gradient method with scaling matrix D is equivalent to the
gradient method employed on the function g (y) = f (D1/2 y).
I Note that the gradient and Hessian of g are given by

    ∇g (y) = D1/2 ∇f (D1/2 y) = D1/2 ∇f (x),
    ∇²g (y) = D1/2 ∇²f (D1/2 y)D1/2 = D1/2 ∇²f (x)D1/2 .
I The objective is usually to pick Dk so as to make Dk^{1/2} ∇²f (xk )Dk^{1/2} as
well-conditioned as possible.
I A well known choice (Newton's method): Dk = (∇²f (xk ))−1 .
I diagonal scaling: Dk is picked to be diagonal. For example,

    (Dk )ii = ( ∂²f (xk )/∂xi² )−1 .

I Diagonal scaling can be very effective when the decision variables are of
different magnitudes.
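As an illustration, a single diagonally scaled gradient step in MATLAB on the ill-conditioned quadratic min 0.01x² + y² from the backtracking example earlier (the setup is our own toy example):

g = @(x) [0.02*x(1); 2*x(2)];   % gradient of 0.01*x1^2 + x2^2
h = [0.02; 2];                  % diagonal of the (constant) Hessian
x = [2; 1]; t = 1;
x = x - t*(g(x)./h)             % Dk = diag(1./h); reaches the minimizer (0,0)
                                % in one step since the Hessian is diagonal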
The Gauss-Newton Method
I Nonlinear least squares problem:

    (NLS): min_{x∈Rn} g (x) ≡ Σ_{i=1}^m (fi (x) − ci )².

f1 , . . . , fm are continuously differentiable over Rn and c1 , . . . , cm ∈ R.


I Denote:

    F (x) = [ f1 (x) − c1
              f2 (x) − c2
                 ...
              fm (x) − cm ],

I Then the problem becomes:

min ‖F (x)‖².

The Gauss-Newton Method
Given the kth iterate xk , the next iterate is chosen to minimize the sum of squares
of the linearized terms, that is,

    xk+1 = argmin_{x∈Rn} Σ_{i=1}^m ( fi (xk ) + ∇fi (xk )T (x − xk ) − ci )².

I The general step actually consists of solving the linear LS problem

    min ‖Ak x − bk ‖²,

where

    Ak = [ ∇f1 (xk )T
           ∇f2 (xk )T
              ...
           ∇fm (xk )T ] = J(xk )

is the so-called Jacobian matrix, assumed to have full column rank, and

    bk = [ ∇f1 (xk )T xk − f1 (xk ) + c1
           ∇f2 (xk )T xk − f2 (xk ) + c2
                       ...
           ∇fm (xk )T xk − fm (xk ) + cm ] = J(xk )xk − F (xk ).
The Gauss-Newton Method
I The Gauss-Newton method can thus be written as:
xk+1 = (J(xk )T J(xk ))−1 J(xk )T bk .

I The gradient of the objective function f (x) = ‖F (x)‖² is

    ∇f (x) = 2J(x)T F (x).

I The GN method can be rewritten as follows:

    xk+1 = (J(xk )T J(xk ))−1 J(xk )T (J(xk )xk − F (xk ))
         = xk − (J(xk )T J(xk ))−1 J(xk )T F (xk )
         = xk − (1/2)(J(xk )T J(xk ))−1 ∇f (xk ),

I that is, it is a scaled gradient method with a special choice of scaling matrix:

    Dk = (1/2)(J(xk )T J(xk ))−1 .

The Damped Gauss-Newton Method
The Gauss-Newton method does not incorporate a stepsize, which might cause it
to diverge. A well known variation of the method incorporating stepsizes is the
damped Gauss-Newton method.
Damped Gauss-Newton Method

Input: ε - tolerance parameter.

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) Set dk = −(J(xk )T J(xk ))−1 J(xk )T F (xk ).
(b) Set tk by a line search procedure on the function

h(t) = g (xk + tdk ).

(c) set xk+1 = xk + tk dk .
(d) if ‖∇g (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
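A minimal MATLAB sketch of the damped Gauss-Newton method on a tiny curve-fitting instance (the model, data and backtracking parameters below are our own illustration, not from the slides):

u = [0; 1; 2; 3]; c = [1.0; 1.6; 2.7; 4.4];    % sample data (illustrative)
F = @(x) x(1)*exp(x(2)*u) - c;                 % residual vector F(x)
J = @(x) [exp(x(2)*u), x(1)*u.*exp(x(2)*u)];   % Jacobian of the residuals
obj = @(x) norm(F(x))^2;                       % objective g(x) = ||F(x)||^2
x = [1; 0.1]; alpha = 0.25; beta = 0.5;
for k = 1:20
    d = -(J(x)'*J(x)) \ (J(x)'*F(x));          % direction of step (a)
    grad = 2*J(x)'*F(x);                       % gradient of the objective
    t = 1;                                     % backtracking on h(t) = g(x+t*d)
    while obj(x + t*d) > obj(x) + alpha*t*(grad'*d)
        t = beta*t;
    end
    x = x + t*d;                               % step (c)
end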

Fermat-Weber Problem

Fermat-Weber Problem: Given m points a1 , . . . , am in Rn (also
called “anchor points”) and m weights ω1 , ω2 , . . . , ωm > 0, find a
point x ∈ Rn that minimizes the weighted sum of distances of x to
the points a1 , . . . , am :

    min_{x∈Rn} f (x) ≡ Σ_{i=1}^m ωi ‖x − ai ‖.

I The objective function is not differentiable at the anchor points a1 , . . . , am .


I One of the simplest instances of facility location problems.

Weiszfeld’s Method (1937)
I Start from the stationarity condition ∇f (x) = 0:²

    Σ_{i=1}^m ωi (x − ai )/‖x − ai ‖ = 0.

I Equivalently,

    ( Σ_{i=1}^m ωi /‖x − ai ‖ ) x = Σ_{i=1}^m ωi ai /‖x − ai ‖,

I that is,

    x = ( 1 / Σ_{i=1}^m ωi /‖x − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖x − ai ‖ ).

I The stationarity condition can be written as x = T (x), where T is the
operator

    T (x) ≡ ( 1 / Σ_{i=1}^m ωi /‖x − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖x − ai ‖ ).

I Weiszfeld’s method is a fixed point method:

xk+1 = T (xk ).

² We implicitly assume here that x is not an anchor point.


Weiszfeld’s Method as a Gradient Method
Weiszfeld’s Method
Initialization: pick x0 ∈ Rn such that x0 ≠ a1 , a2 , . . . , am .
General step: for any k = 0, 1, 2, . . . compute:

    xk+1 = T (xk ) = ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖xk − ai ‖ ).

I Weiszfeld’s method is a gradient method since

    xk+1 = ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖xk − ai ‖ )
         = xk − ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ωi (xk − ai )/‖xk − ai ‖
         = xk − ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) ∇f (xk ).

I A gradient method with a special choice of stepsize:

    tk = 1 / Σ_{i=1}^m ωi /‖xk − ai ‖.
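A minimal MATLAB sketch of Weiszfeld’s method (the anchor points and weights below are an arbitrary illustration):

A = [0 0; 10 0; 5 8];                    % anchor points ai as rows (illustrative)
w = [1; 1; 1];                           % weights
x = [1; 1];                              % x0, chosen different from all anchors
for k = 1:100
    dist = sqrt(sum((A - x').^2, 2));    % ||xk - ai|| for each anchor
    x = (A'*(w./dist)) / sum(w./dist);   % xk+1 = T(xk); breaks down if xk
end                                      % hits an anchor (excluded by assumption)
x                                        % approximate Fermat-Weber point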

