
Lecture 4 - The Gradient Method
Amir Beck, “Introduction to Nonlinear Optimization” - Lecture Slides

Objective: find an optimal solution of the problem


min{f (x) : x ∈ Rn }.

The iterative algorithms that we will consider are of the form


xk+1 = xk + tk dk , k = 0, 1, . . .

I dk - direction.
I tk - stepsize.
We will limit ourselves to descent directions.
Definition. Let f : Rn → R be a continuously differentiable function over
Rn . A vector d ≠ 0, d ∈ Rn , is called a descent direction of f at x if the
directional derivative f'(x; d) is negative, meaning that

    f'(x; d) = ∇f (x)T d < 0.

The Descent Property of Descent Directions
Lemma: Let f be a continuously differentiable function over Rn , and let
x ∈ Rn . Suppose that d is a descent direction of f at x. Then there exists
ε > 0 such that
f (x + td) < f (x)
for any t ∈ (0, ε].
Proof.
I Since f'(x; d) < 0, it follows from the definition of the directional derivative
that

    lim_{t→0+} [f (x + td) − f (x)]/t = f'(x; d) < 0.

I Therefore, there exists ε > 0 such that

    [f (x + td) − f (x)]/t < 0

for any t ∈ (0, ε], which readily implies the desired result.

Schematic Descent Direction Method

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . set
(a) pick a descent direction dk .
(b) find a stepsize tk satisfying f (xk + tk dk ) < f (xk ).
(c) set xk+1 = xk + tk dk .
(d) if a stopping criterion is satisfied, then STOP and xk+1 is the output.
Of course, many details are missing in the above schematic algorithm:
I What is the starting point?
I How to choose the descent direction?
I What stepsize should be taken?
I What is the stopping criterion?

Stepsize Selection Rules
I constant stepsize - tk = t̄ for any k.
I exact stepsize - tk is a minimizer of f along the ray xk + tdk :

    tk ∈ argmin_{t≥0} f (xk + tdk ).

I backtracking¹ - the method requires three parameters: s > 0, α ∈ (0, 1),
β ∈ (0, 1). Start with an initial stepsize tk = s; while

    f (xk ) − f (xk + tk dk ) < −αtk ∇f (xk )T dk ,

set tk := βtk . The resulting stepsize satisfies the sufficient decrease property:

    f (xk ) − f (xk + tk dk ) ≥ −αtk ∇f (xk )T dk .

¹ also referred to as the Armijo rule
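For concreteness, a minimal MATLAB sketch of the backtracking loop (the quadratic objective and starting point are borrowed from the numerical examples later in the lecture; the handle names f and g are ours, not the book's):

f = @(x) x(1)^2 + 2*x(2)^2;          % example objective from the later slides
g = @(x) [2*x(1); 4*x(2)];           % its gradient
x = [2; 1]; d = -g(x);               % current iterate and a descent direction
s = 2; alpha = 0.25; beta = 0.5;     % backtracking parameters
t = s;                               % start from the initial stepsize s
while f(x) - f(x + t*d) < -alpha*t*(g(x)'*d)
    t = beta*t;                      % shrink until sufficient decrease holds
end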


Exact Line Search for Quadratic Functions
f (x) = xT Ax + 2bT x + c where A is an n × n positive definite matrix, b ∈ Rn and
c ∈ R. Let x ∈ Rn and let d ∈ Rn be a descent direction of f at x. The objective
is to find a solution to
    min_{t≥0} f (x + td).

In class
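A sketch of the derivation: g (t) = f (x + td) is a strictly convex quadratic in t with

    g'(t) = 2dT A(x + td) + 2bT d,

so setting g'(t) = 0 gives t* = −dT (Ax + b)/(dT Ad). Since ∇f (x) = 2(Ax + b) and d is a descent direction, dT (Ax + b) = f'(x; d)/2 < 0; together with dT Ad > 0 this gives t* > 0, so the constraint t ≥ 0 is inactive.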

The Gradient Method - Taking the Direction of Minus the
Gradient
I In the gradient method dk = −∇f (xk ).
I This is a descent direction as long as ∇f (xk ) ≠ 0 since

    f'(xk ; −∇f (xk )) = −∇f (xk )T ∇f (xk ) = −‖∇f (xk )‖² < 0.

I In addition to being a descent direction, minus the gradient is also the
steepest descent direction.
Lemma: Let f be a continuously differentiable function and let x ∈ Rn be
a non-stationary point (∇f (x) ≠ 0). Then an optimal solution of

    min_d { f'(x; d) : ‖d‖ = 1 }     (1)

is d = −∇f (x)/‖∇f (x)‖.

Proof. In class

The Gradient Method

The Gradient Method

Input: ε > 0 - tolerance parameter.

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) pick a stepsize tk by a line search procedure on the function

g (t) = f (xk − t∇f (xk )).

(b) set xk+1 = xk − tk ∇f (xk ).


(c) if ‖∇f (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
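A minimal MATLAB sketch of the method with a constant stepsize (objective, starting point, stepsize and tolerance are taken from the constant-stepsize example a few slides ahead; the code itself is ours, not the book's):

f = @(x) x(1)^2 + 2*x(2)^2;        % objective of the numerical examples
g = @(x) [2*x(1); 4*x(2)];         % gradient
x = [2; 1]; t = 0.1; tol = 1e-5;   % starting point, constant stepsize, tolerance
iter = 0;
while norm(g(x)) > tol
    x = x - t*g(x);                % gradient step
    iter = iter + 1;
    fprintf('iter_number = %3d norm_grad = %f fun_val = %f\n', ...
        iter, norm(g(x)), f(x));
end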

Numerical Example
min x² + 2y²
x0 = (2; 1), ε = 10−5 , exact line search.

13 iterations until convergence.


The Zig-Zag Effect
Lemma. Let {xk }k≥0 be the sequence generated by the gradient method
with exact line search for solving a problem of minimizing a continuously
differentiable function f . Then for any k = 0, 1, 2, . . .

(xk+2 − xk+1 )T (xk+1 − xk ) = 0.

Proof.
I xk+1 − xk = −tk ∇f (xk ), xk+2 − xk+1 = −tk+1 ∇f (xk+1 ).
I Therefore, we need to prove that ∇f (xk )T ∇f (xk+1 ) = 0.
I tk ∈ argmin_{t≥0} { g (t) ≡ f (xk − t∇f (xk )) }.
I Hence, g'(tk ) = 0.
I −∇f (xk )T ∇f (xk − tk ∇f (xk )) = 0.
I ∇f (xk )T ∇f (xk+1 ) = 0.
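A quick numerical check of the lemma in MATLAB (our own illustration, using the exact stepsize formula for quadratics derived on a later slide):

A = diag([1, 2]);                  % f(x) = x'*A*x = x1^2 + 2*x2^2
x = [2; 1]; d = 2*A*x;             % d = gradient of x'*A*x
for k = 1:3
    t = (d'*d) / (2*(d'*A*d));     % exact stepsize along -d
    x = x - t*d; d_new = 2*A*x;
    fprintf('grad_k'' * grad_{k+1} = %g\n', d'*d_new);  % prints ~0
    d = d_new;
end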

Numerical Example - Constant Stepsize, t̄ = 0.1

min x² + 2y²
x0 = (2; 1), ε = 10−5 , t̄ = 0.1.
iter_number = 1 norm_grad = 4.000000 fun_val = 3.280000
iter_number = 2 norm_grad = 2.937210 fun_val = 1.897600
iter_number = 3 norm_grad = 2.222791 fun_val = 1.141888
: : :
iter_number = 56 norm_grad = 0.000015 fun_val = 0.000000
iter_number = 57 norm_grad = 0.000012 fun_val = 0.000000
iter_number = 58 norm_grad = 0.000010 fun_val = 0.000000

I quite a lot of iterations...

Numerical Example - Constant Stepsize, t̄ = 10

min x² + 2y²
x0 = (2; 1), ε = 10−5 , t̄ = 10.

iter_number = 1 norm_grad = 1783.488716 fun_val = 476806.000000


iter_number = 2 norm_grad = 656209.693339 fun_val = 56962873606.00
iter_number = 3 norm_grad = 256032703.004797 fun_val = 83183008071
: : :
iter_number = 119 norm_grad = NaN fun_val = NaN

I The sequence diverges :(


I Important question: how can we choose the constant stepsize so that
convergence is guaranteed?

Lipschitz Continuity of the Gradient
Definition. Let f be a continuously differentiable function over Rn . We say
that f has a Lipschitz gradient if there exists L ≥ 0 for which

    ‖∇f (x) − ∇f (y)‖ ≤ L‖x − y‖ for any x, y ∈ Rn .

L is called the Lipschitz constant.

I If ∇f is Lipschitz with constant L, then it is also Lipschitz with constant L̃
for all L̃ ≥ L.
I The class of functions with Lipschitz gradient with constant L is denoted by
C_L^{1,1}(Rn ) or just C_L^{1,1}.
I Linear functions - Given a ∈ Rn , the function f (x) = aT x is in C_0^{1,1}.
I Quadratic functions - Let A be a symmetric n × n matrix, b ∈ Rn and
c ∈ R. Then the function f (x) = xT Ax + 2bT x + c is a C^{1,1} function. The
smallest Lipschitz constant of ∇f is 2‖A‖₂ - why? In class
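A sketch of the in-class argument: ∇f (x) = 2(Ax + b), so ‖∇f (x) − ∇f (y)‖ = 2‖A(x − y)‖ ≤ 2‖A‖₂ ‖x − y‖, and choosing x − y along an eigenvector corresponding to a largest-magnitude eigenvalue of A shows that the constant 2‖A‖₂ cannot be improved.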

Equivalence to Boundedness of the Hessian
Theorem. Let f be a twice continuously differentiable function over Rn .
Then the following two claims are equivalent:
1. f ∈ C_L^{1,1}(Rn ).
2. ‖∇²f (x)‖ ≤ L for any x ∈ Rn .
Proof on pages 73-74 of the book.
Example: f (x) = √(1 + x²) ∈ C^{1,1}
In class

Convergence of the Gradient Method
Theorem. Let {xk }k≥0 be the sequence generated by GM for solving

    min_{x∈Rn} f (x)

with one of the following stepsize strategies:

I constant stepsize t̄ ∈ (0, 2/L).
I exact line search.
I backtracking procedure with parameters s > 0 and α, β ∈ (0, 1).

Assume that
I f ∈ C_L^{1,1}(Rn ).
I f is bounded below over Rn , that is, there exists m ∈ R such that
f (x) > m for all x ∈ Rn .
Then
1. for any k, f (xk+1 ) < f (xk ) unless ∇f (xk ) = 0.
2. ∇f (xk ) → 0 as k → ∞.
Theorem 4.25 in the book.
Two Numerical Examples - Backtracking
min x² + 2y²
x0 = (2; 1), s = 2, α = 0.25, β = 0.5, ε = 10−5 .
iter_number = 1 norm_grad = 2.000000 fun_val = 1.000000
iter_number = 2 norm_grad = 0.000000 fun_val = 0.000000
I fast convergence (also due to luck!)
I no real advantage to exact line search.
ANOTHER EXAMPLE:
min 0.01x² + y² , s = 2, α = 0.25, β = 0.5, ε = 10−5 .
iter_number = 1 norm_grad = 0.028003 fun_val = 0.009704
iter_number = 2 norm_grad = 0.027730 fun_val = 0.009324
iter_number = 3 norm_grad = 0.027465 fun_val = 0.008958
: : :
iter_number = 201 norm_grad = 0.000010 fun_val = 0.000000
Important Question: Can we detect key properties of the objective function that
imply slow/fast convergence?
Kantorovich Inequality
Lemma. Let A be a positive definite n × n matrix. Then for any x ≠ 0, x ∈ Rn ,
the inequality

    (xT x)² / ((xT Ax)(xT A−1 x)) ≥ 4λmax (A)λmin (A) / (λmax (A) + λmin (A))²

holds.
Proof.
I Denote m = λmin (A) and M = λmax (A).
I The eigenvalues of the matrix A + MmA−1 are λi (A) + Mm/λi (A).
I The maximum of the one-dimensional function ϕ(t) = t + Mm/t over [m, M] is
attained at the endpoints m and M with a corresponding value of M + m.
I Thus, the eigenvalues of A + MmA−1 are at most M + m.
I A + MmA−1 ⪯ (M + m)I.
I xT Ax + Mm(xT A−1 x) ≤ (M + m)(xT x).
I Therefore,

    (xT Ax)[Mm(xT A−1 x)] ≤ (1/4)( xT Ax + Mm(xT A−1 x) )² ≤ ((M + m)²/4)(xT x)²,

and dividing both sides by Mm(xT Ax)(xT A−1 x) gives the claimed inequality.
Gradient Method for Minimizing xT Ax
Theorem. Let {xk }k≥0 be the sequence generated by the gradient method
with exact linesearch for solving the problem

    min_{x∈Rn} xT Ax    (A ≻ 0).

Then for any k = 0, 1, . . .:

    f (xk+1 ) ≤ ((M − m)/(M + m))² f (xk ),

where M = λmax (A), m = λmin (A).


Proof.
I xk+1 = xk − tk dk , where dk = 2Axk and tk = (dTk dk )/(2dTk Adk ).

Proof of Rate of Convergence Contd.
I

    f (xk+1 ) = xTk+1 Axk+1 = (xk − tk dk )T A(xk − tk dk )
              = xTk Axk − 2tk dTk Axk + tk² dTk Adk
              = xTk Axk − tk dTk dk + tk² dTk Adk .

I Plugging in the expression for tk :

    f (xk+1 ) = xTk Axk − (1/4)(dTk dk )²/(dTk Adk )
              = (1 − (dTk dk )²/(4(dTk Adk )(xTk AA−1 Axk ))) xTk Axk
              = (1 − (dTk dk )²/((dTk Adk )(dTk A−1 dk ))) f (xk ).

I By the Kantorovich inequality:

    f (xk+1 ) ≤ (1 − 4Mm/(M + m)²) f (xk ) = ((M − m)/(M + m))² f (xk )
             = ((κ(A) − 1)/(κ(A) + 1))² f (xk ),

where κ(A) = M/m is the condition number defined on the next slide.

The Condition Number
Definition. Let A be an n × n positive definite matrix. Then the condition
number of A is defined by

    κ(A) = λmax (A)/λmin (A).

I matrices (or quadratic functions) with large condition number are called
ill-conditioned.
I matrices with small condition number are called well-conditioned.
I large condition number implies large number of iterations of the gradient
method.
I small condition number implies small number of iterations of the gradient
method.
I For a non-quadratic function, the asymptotic rate of convergence of xk to a
stationary point x∗ is usually determined by the condition number of ∇2 f (x∗ ).

A Severely Ill-Conditioned Function - Rosenbrock

min f (x1 , x2 ) = 100(x2 − x1²)² + (1 − x1 )².




I optimal solution: (x1 , x2 ) = (1, 1), optimal value: 0.


I

    ∇f (x) = [ −400x1 (x2 − x1²) − 2(1 − x1 )
                          200(x2 − x1²)       ],

    ∇²f (x) = [ −400x2 + 1200x1² + 2   −400x1
                       −400x1            200   ].

I

    ∇²f (1, 1) = [  802   −400
                   −400    200  ],

condition number: 2508
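The stated condition number is easy to verify in MATLAB:

>> cond([802 -400; -400 200])   % returns approximately 2.508e+03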

Solution of the Rosenbrock Problem with the Gradient
Method
x0 = (2; 5), s = 2, α = 0.25, β = 0.5, ε = 10−5 , backtracking stepsize selection.

6890(!!!) iterations.
Sensitivity of Solutions to Linear Systems

I Suppose that we are given the linear system

    Ax = b,

where A ≻ 0, and we assume that x is indeed the solution of the system
(x = A−1 b).
I Suppose that the right-hand side is perturbed to b + ∆b. What can be said
about the solution x + ∆x of the new system?
I ∆x = A−1 ∆b.
I Result (derivation In class):

    ‖∆x‖/‖x‖ ≤ κ(A) ‖∆b‖/‖b‖
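A sketch of the derivation: ∆x = A−1 ∆b gives ‖∆x‖ ≤ ‖A−1 ‖‖∆b‖, and b = Ax gives ‖b‖ ≤ ‖A‖‖x‖, i.e. 1/‖x‖ ≤ ‖A‖/‖b‖. Multiplying the two bounds yields ‖∆x‖/‖x‖ ≤ ‖A‖‖A−1 ‖ ‖∆b‖/‖b‖ = κ(A)‖∆b‖/‖b‖, since for A ≻ 0, ‖A‖ = λmax (A) and ‖A−1 ‖ = 1/λmin (A).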

Numerical Example
I Consider the ill-conditioned matrix:

    A = [ 1 + 10−5       1
              1      1 + 10−5 ]
>> A=[1+1e-5,1;1,1+1e-5];
>> cond(A)
ans =
2.000009999998795e+005

I We have
>> A\[1;1]
ans =
0.499997500018278
0.499997500006722
I However,
>> A\[1.1;1]
ans =
1.0e+003 *
5.000524997400047
-4.999475002650021
Scaled Gradient Method
I Consider the minimization problem
(P) min{f (x) : x ∈ Rn }.

I For a given nonsingular matrix S ∈ Rn×n , we make the linear change of


variables x = Sy, and obtain the equivalent problem
(P’) min{g (y) ≡ f (Sy) : y ∈ Rn }.

I Since ∇g (y) = ST ∇f (Sy) = ST ∇f (x), the gradient method for (P’) is


yk+1 = yk − tk ST ∇f (Syk ).

I Multiplying the latter equality by S from the left, and using the notation
xk = Syk :
xk+1 = xk − tk SST ∇f (xk ).

I Defining D = SST , we obtain the scaled gradient method:


xk+1 = xk − tk D∇f (xk ).
Scaled Gradient Method
I D ≻ 0, so the direction −D∇f (xk ) is a descent direction:

    f'(xk ; −D∇f (xk )) = −∇f (xk )T D∇f (xk ) < 0.

We also allow different scaling matrices at each iteration.


Scaled Gradient Method

Input: ε > 0 - tolerance parameter.


Initialization: pick x0 ∈ Rn arbitrarily.
General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) pick a scaling matrix Dk ≻ 0.
(b) pick a stepsize tk by a line search procedure on the function

g (t) = f (xk − tDk ∇f (xk )).

(c) set xk+1 = xk − tk Dk ∇f (xk ).


(d) if ‖∇f (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
Choosing the Scaling Matrix Dk
I The scaled gradient method with scaling matrix D is equivalent to the
gradient method employed on the function g (y) = f (D1/2 y).
I Note that the gradient and Hessian of g are given by

    ∇g (y) = D1/2 ∇f (D1/2 y) = D1/2 ∇f (x),
    ∇²g (y) = D1/2 ∇²f (D1/2 y)D1/2 = D1/2 ∇²f (x)D1/2 .
I The objective is usually to pick Dk so as to make Dk^{1/2} ∇²f (xk )Dk^{1/2} as
well-conditioned as possible.
I A well known choice (Newton's method): Dk = (∇²f (xk ))−1 .
I diagonal scaling: Dk is picked to be diagonal. For example,

    (Dk )ii = ( ∂²f (xk )/∂xi² )−1 .

I Diagonal scaling can be very effective when the decision variables are of
different magnitudes.
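As an illustration, a single diagonally scaled gradient step in MATLAB on the ill-conditioned quadratic min 0.01x² + y² from the backtracking example earlier (the setup is our own toy example):

g = @(x) [0.02*x(1); 2*x(2)];   % gradient of 0.01*x1^2 + x2^2
h = [0.02; 2];                  % diagonal of the (constant) Hessian
x = [2; 1]; t = 1;
x = x - t*(g(x)./h)             % Dk = diag(1./h); reaches the minimizer (0,0)
                                % in one step since the Hessian is diagonal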
The Gauss-Newton Method
I Nonlinear least squares problem:

    (NLS): min_{x∈Rn} g (x) ≡ Σ_{i=1}^m (fi (x) − ci )².

f1 , . . . , fm are continuously differentiable over Rn and c1 , . . . , cm ∈ R.


I Denote:

    F (x) = [ f1 (x) − c1
              f2 (x) − c2
                 ...
              fm (x) − cm ],

I Then the problem becomes:

min ‖F (x)‖².

The Gauss-Newton Method
Given the kth iterate xk , the next iterate is chosen to minimize the sum of squares
of the linearized terms, that is,

    xk+1 = argmin_{x∈Rn} Σ_{i=1}^m ( fi (xk ) + ∇fi (xk )T (x − xk ) − ci )².

I The general step actually consists of solving the linear LS problem

    min ‖Ak x − bk ‖²,

where

    Ak = [ ∇f1 (xk )T
           ∇f2 (xk )T
              ...
           ∇fm (xk )T ] = J(xk )

is the so-called Jacobian matrix, assumed to have full column rank, and

    bk = [ ∇f1 (xk )T xk − f1 (xk ) + c1
           ∇f2 (xk )T xk − f2 (xk ) + c2
                       ...
           ∇fm (xk )T xk − fm (xk ) + cm ] = J(xk )xk − F (xk ).
The Gauss-Newton Method
I The Gauss-Newton method can thus be written as:
xk+1 = (J(xk )T J(xk ))−1 J(xk )T bk .

I The gradient of the objective function f (x) = ‖F (x)‖² is

    ∇f (x) = 2J(x)T F (x).

I The GN method can be rewritten as follows:

    xk+1 = (J(xk )T J(xk ))−1 J(xk )T (J(xk )xk − F (xk ))
         = xk − (J(xk )T J(xk ))−1 J(xk )T F (xk )
         = xk − (1/2)(J(xk )T J(xk ))−1 ∇f (xk ),

I that is, it is a scaled gradient method with a special choice of scaling matrix:

    Dk = (1/2)(J(xk )T J(xk ))−1 .

The Damped Gauss-Newton Method
The Gauss-Newton method does not incorporate a stepsize, which might cause it
to diverge. A well known variation of the method incorporating stepsizes is the
damped Gauss-Newton method.
Damped Gauss-Newton Method

Input: ε - tolerance parameter.

Initialization: pick x0 ∈ Rn arbitrarily.


General step: for any k = 0, 1, 2, . . . execute the following steps:
(a) Set dk = −(J(xk )T J(xk ))−1 J(xk )T F (xk ).
(b) Set tk by a line search procedure on the function

h(t) = g (xk + tdk ).

(c) set xk+1 = xk + tk dk .
(d) if ‖∇g (xk+1 )‖ ≤ ε, then STOP and xk+1 is the output.
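A minimal MATLAB sketch of the damped Gauss-Newton method on a tiny curve-fitting instance (the model, data and backtracking parameters below are our own illustration, not from the slides):

u = [0; 1; 2; 3]; c = [1.0; 1.6; 2.7; 4.4];    % sample data (illustrative)
F = @(x) x(1)*exp(x(2)*u) - c;                 % residual vector F(x)
J = @(x) [exp(x(2)*u), x(1)*u.*exp(x(2)*u)];   % Jacobian of the residuals
obj = @(x) norm(F(x))^2;                       % objective g(x) = ||F(x)||^2
x = [1; 0.1]; alpha = 0.25; beta = 0.5;
for k = 1:20
    d = -(J(x)'*J(x)) \ (J(x)'*F(x));          % direction of step (a)
    grad = 2*J(x)'*F(x);                       % gradient of the objective
    t = 1;                                     % backtracking on h(t) = g(x+t*d)
    while obj(x + t*d) > obj(x) + alpha*t*(grad'*d)
        t = beta*t;
    end
    x = x + t*d;                               % step (c)
end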

Fermat-Weber Problem

Fermat-Weber Problem: Given m points a1 , . . . , am in Rn (also
called “anchor points”) and m weights ω1 , ω2 , . . . , ωm > 0, find a
point x ∈ Rn that minimizes the weighted sum of distances of x to
the points a1 , . . . , am :

    min_{x∈Rn} f (x) ≡ Σ_{i=1}^m ωi ‖x − ai ‖.

I The objective function is not differentiable at the anchor points a1 , . . . , am .


I One of the simplest instances of facility location problems.

Weiszfeld’s Method (1937)
I Start from the stationarity condition ∇f (x) = 0:²

    Σ_{i=1}^m ωi (x − ai )/‖x − ai ‖ = 0.

I Equivalently,

    ( Σ_{i=1}^m ωi /‖x − ai ‖ ) x = Σ_{i=1}^m ωi ai /‖x − ai ‖,

I that is,

    x = ( 1 / Σ_{i=1}^m ωi /‖x − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖x − ai ‖ ).

I The stationarity condition can be written as x = T (x), where T is the
operator

    T (x) ≡ ( 1 / Σ_{i=1}^m ωi /‖x − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖x − ai ‖ ).

I Weiszfeld’s method is a fixed point method:

xk+1 = T (xk ).

² We implicitly assume here that x is not an anchor point.


Weiszfeld’s Method as a Gradient Method
Weiszfeld’s Method
Initialization: pick x0 ∈ Rn such that x0 ≠ a1 , a2 , . . . , am .
General step: for any k = 0, 1, 2, . . . compute:

    xk+1 = T (xk ) = ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖xk − ai ‖ ).

I Weiszfeld’s method is a gradient method since

    xk+1 = ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ( ωi ai /‖xk − ai ‖ )
         = xk − ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) Σ_{i=1}^m ωi (xk − ai )/‖xk − ai ‖
         = xk − ( 1 / Σ_{i=1}^m ωi /‖xk − ai ‖ ) ∇f (xk ).

I A gradient method with a special choice of stepsize:

    tk = 1 / Σ_{i=1}^m ωi /‖xk − ai ‖.
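A minimal MATLAB sketch of Weiszfeld’s method (the anchor points and weights below are an arbitrary illustration):

A = [0 0; 10 0; 5 8];                    % anchor points ai as rows (illustrative)
w = [1; 1; 1];                           % weights
x = [1; 1];                              % x0, chosen different from all anchors
for k = 1:100
    dist = sqrt(sum((A - x').^2, 2));    % ||xk - ai|| for each anchor
    x = (A'*(w./dist)) / sum(w./dist);   % xk+1 = T(xk); breaks down if xk
end                                      % hits an anchor (excluded by assumption)
x                                        % approximate Fermat-Weber point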

