
Projected Secant-Based Nesterov Gradient Method

R. Olusegun Alli-Oke, William P. Heatha


a University of Manchester, Manchester, United Kingdom. Email: [email protected]

Abstract
Our previous work, the secant-based Nesterov method [1] for unconstrained convex optimization,
extends the classical Nesterov gradient method by updating the estimate-sequence parameter
with secant information whenever possible. This is achieved by imposing a secant condition that
embodies an "update rule with reset". In this brief technical note, we extend this work to
handle constrained convex optimization.
Keywords: constrained convex optimization, projected gradient descent, secant method, global
convergence, gradient mapping, fast gradient methods, projected Nesterov gradient method.

1. Notation

A continuously differentiable function f : R^n → R has a Lipschitz-continuous gradient with
constant L if there exists an L > 0 such that ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥ , ∀ x, y ∈ R^n.
A continuously differentiable function f : R^n → R is convex with convexity parameter µ ≥ 0 if
f(x) ≥ f(y) + ∇f(y)ᵀ(x − y) + (µ/2)∥x − y∥² , ∀ x, y ∈ R^n.
Denote F^{1,1}_{µ,L}(R^n) as the sub-class of convex functions with convexity parameter µ ≥ 0 and
L-Lipschitz-continuous gradient. The optimal values of f(x) and Φ_k(x) are denoted by f*
and Φ*_k respectively. Let ∇ denote the gradient operator; the gradient of f is defined by
∇f(x) = [ df(x)/dx_1 , ··· , df(x)/dx_n ]ᵀ. Assumption: the gradient's Lipschitz constant L is known.
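For a concrete instance of these definitions, consider a quadratic f(x) = ½xᵀAx − bᵀx with A symmetric positive definite: then ∇f(x) = Ax − b is L-Lipschitz with L = λ_max(A), and the convexity parameter is µ = λ_min(A). A minimal numerical check in Python (hypothetical data, not from the paper):

```python
import numpy as np

# Hypothetical quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: A @ x - b

# For a quadratic, the gradient's Lipschitz constant is lambda_max(A) and the
# convexity parameter is lambda_min(A). eigvalsh returns eigenvalues in ascending order.
eigvals = np.linalg.eigvalsh(A)
mu, L = eigvals[0], eigvals[-1]

# Sanity check of the Lipschitz definition on random pairs:
# ||grad(x) - grad(y)|| <= L ||x - y||.
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
```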

2. Introduction

This paper considers the constrained optimization of a convex function f:

CCOP : arg min_{x ∈ K ⊂ R^n} f(x)    (1)

where K ⊂ R^n is a nonempty closed convex set, and f : K → R is a convex function with
convexity parameter µ ≥ 0 and an L-Lipschitz-continuous gradient, i.e. f ∈ F^{1,1}_{µ,L}(K).
Accelerated gradient methods compute future iterates by relying on the gradient and momentum
from previous iterates. Unlike gradient-descent methods, accelerated gradient methods are not
guaranteed to monotonically decrease the objective value; in other words, they are non-monotone
gradient methods that utilize the momentum of previous iterates. An example of an accelerated
gradient method is the Nesterov gradient method [2], which uses the framework of estimate
sequences as opposed to the relaxation sequences of gradient-descent methods. Nesterov's
principle ensures that the local functions Φ_k(x) constituting the estimate sequence have
minima Φ*_k that approach the minimum f* of f(x) as λ_k → 0, where λ_k is the
estimate-sequence parameter. Nesterov's classical rule for the recursion of these local

Preprint submitted to Results in Control and Optimization February 13, 2023


functions Φ_k(x) was revisited and relaxed in [1], which allowed freedom in the computation of the
estimate-sequence parameter λ_k. The secant-based NGM extends the classical Nesterov gradient
method (NGM) by updating the estimate-sequence parameter λ_k with secant information whenever
possible [1]. This is achieved by imposing a secant condition that embodies an "update rule with
reset". The secant-based NGM [1] significantly outperforms the classical NGM and the well-known
accelerated gradient method [3] in the unconstrained minimization of convex functions. The
reader is referred to our work in [1] for more details and insights into the construction of the
NGM and the secant-based NGM.
In this article, we extend our work in [1] to constrained convex optimization by showing that
the projected secant-based Nesterov method is globally convergent. The proposed algorithm is
presented in Section 3, while the proof of its global convergence is derived in Section 4.

3. Projected Secant-Based Nesterov Gradient Method


Consider the constrained convex optimization problem CCOP given in (1) with µ ≥ 0 and L > 0.
The projected secant-based Nesterov method is defined for the CCOP by the inclusion of
a projection operator (π_K) and a projected gradient (g^α_K), as shown in Algorithm 1. The
constants µ and L, and the operators π_K and g^α_K, are defined in Sections 1 and 4 respectively.

Algorithm 1: Projected-SBNGM

Given a starting point x_0 ∈ K, β_0 ∈ (0, 1) and y_0 = x_0.
If µ ≠ 0, choose β_0 = √(µ/L). If µ = 0, choose β_0 = (√5 − 1)/2.
repeat until stopping criterion is satisfied
  1. Compute Nesterov iterate: x_{k+1} = x^α_K = π_K(y_k − α_k ∇f(y_k)) , with α_k = 1/L.
  2. Compute γ^F_{k+1} = β_k² L ;  τ_k = (1 − β_k)/β_k ;  g^α_K = (1/α_k)(y_k − x^α_K).
  3. Compute y_v = α_k g^α_K − τ_k (x_{k+1} − x_k) ;  γ̂_{k+1} = (y_vᵀ g^α_K)/(y_vᵀ y_v) ;
     γ^E_{k+1} = [ γ̂_{k+1}/(γ̂_{k+1} + γ^F_{k+1}) ] × γ^F_{k+1}.
  4. Compute γ_{k+1} : γ_{k+1} = min̂_µ( γ^E_{k+1} , γ^F_{k+1} ).  ··· update rule
  5. If γ̂_{k+1} < 0 , then set γ_{k+1} = β_k² µ.  ··· reset rule
  6. Compute β_{k+1} ∈ (0, 1) from β_{k+1}² L + β_{k+1}(γ_{k+1} − µ) − γ_{k+1} = 0.
  7. Compute θ_{k+1} : θ_{k+1} = ρ_{k+1} γ_{k+1} τ_k , where ρ_{k+1} = β_{k+1}/(γ_{k+1} + β_{k+1} µ).
  8. Compute y_{k+1} = x_{k+1} + θ_{k+1}(x_{k+1} − x_k).
end (repeat)

In the case µ = 0, the reset rule in step 5 can be replaced with γ_{k+1} = γ^F_{k+1}/L². However, to
account for cases where L ≫ 1, we can use γ_{k+1} = min( γ^F_{k+1}/L² , ϵ γ^F_{k+1} ) where ϵ = 10⁻⁶.
Remark 1. The min̂_µ operator in step 4 is given by

c = min̂_µ( a, b ) :   c = a                  if γ^F_{k+1} > µ,
                       c = min( γ̂_{k+1} , b )  if γ^F_{k+1} < µ and γ̂_{k+1} > γ̂_k,
                       c = b                  otherwise.

The extra condition in the case γ^F_{k+1} < µ serves to penalize oscillation in the trajectory of γ̂_{k+1}.
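Algorithm 1 and the min̂_µ rule above can be sketched compactly in Python. The sketch below is a hedged reading of the algorithm, not the authors' reference implementation [4]: the function names and the demo problem are illustrative, the µ = 0 reset variant from the text is folded in, and the reset test is performed before forming the quotient defining γ^E_{k+1} so that the quotient is never evaluated when γ̂_{k+1} < 0.

```python
import numpy as np

def projected_sbngm(grad, proj, x0, mu, L, iters=300):
    """Sketch of Algorithm 1 (Projected-SBNGM) for min f(x) over x in K."""
    # Choose beta_0 as prescribed in Algorithm 1.
    beta = np.sqrt(mu / L) if mu > 0 else (np.sqrt(5.0) - 1.0) / 2.0
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    gamma_hat_prev = np.inf          # previous secant estimate, for the min-hat rule
    alpha = 1.0 / L
    for _ in range(iters):
        x_new = proj(y - alpha * grad(y))        # step 1: projected Nesterov iterate
        gamma_F = beta**2 * L                    # step 2
        tau = (1.0 - beta) / beta
        g = (y - x_new) / alpha                  # projected gradient (gradient mapping)
        yv = alpha * g - tau * (x_new - x)       # step 3: secant direction
        denom = float(yv @ yv)
        gamma_hat = float(yv @ g) / denom if denom > 1e-30 else -1.0
        if gamma_hat < 0:
            # step 5: reset rule, with the mu = 0 variant noted in the text
            gamma = beta**2 * mu if mu > 0 else gamma_F / L**2
        else:
            gamma_E = gamma_hat / (gamma_hat + gamma_F) * gamma_F
            # step 4: min-hat_mu update rule (Remark 1)
            if gamma_F > mu:
                gamma = gamma_E
            elif gamma_hat > gamma_hat_prev:
                gamma = min(gamma_hat, gamma_F)
            else:
                gamma = gamma_F
        gamma_hat_prev = gamma_hat
        # step 6: positive root of beta^2 L + beta (gamma - mu) - gamma = 0
        b_, c_ = gamma - mu, -gamma
        beta = (-b_ + np.sqrt(b_**2 - 4.0 * L * c_)) / (2.0 * L)
        # steps 7-8: momentum extrapolation
        rho = beta / (gamma + beta * mu)
        theta = rho * gamma * tau
        y = x_new + theta * (x_new - x)
        x = x_new
    return x

# Hypothetical demo: minimize 0.5*(10*x1^2 + x2^2) over the box [1,2] x [-1,1],
# whose constrained minimizer is (1, 0). Here mu = 1 and L = 10.
D = np.array([10.0, 1.0])
grad = lambda x: D * x
proj = lambda v: np.clip(v, np.array([1.0, -1.0]), np.array([2.0, 1.0]))
x_star = projected_sbngm(grad, proj, np.array([2.0, 1.0]), mu=1.0, L=10.0)
```

Note that the extrapolated point y_{k+1} may leave K; only the iterates x_{k+1} are projected, which is all the convergence argument of Section 4 requires.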

4. Global Convergence Proof of Projected-SBNGM

It has been shown that the SBNGM is globally convergent for convex unconstrained optimization
problems; see [1]. Now, we show that the projected-SBNGM presented in Algorithm 1 retains the
global convergence of the SBNGM. To this end, we present the following definitions, lemma and
corollary.
Definition 1. The projection operator π_K(v) is the Euclidean projection of a vector v onto the
feasible set K. For simple sets such as box constraints, the projection is an entry-wise saturation,

b_i = π_K(v_i) = min{ ub_i , max{ lb_i , v_i } } ,  i = 1, ···, n    (2)

Here, the lower and upper bounds of the box constraints are denoted by lb_i and ub_i respectively.
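The saturation in (2) is one line of vectorized code; a minimal sketch with a hypothetical example vector:

```python
import numpy as np

# Entry-wise saturation (2) for box constraints lb <= x <= ub.
def project_box(v, lb, ub):
    return np.minimum(ub, np.maximum(lb, v))

# Hypothetical example: project v = (3, -2, 0.5) onto the box [0, 1]^3,
# which clips the first two entries to the bounds and leaves the third unchanged.
p = project_box(np.array([3.0, -2.0, 0.5]), 0.0, 1.0)   # (1, 0, 0.5)
```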
Definition 2 ([2]). For some y_k, α > 0, denote x^α_K and g^α_K as

x^α_K = arg min_{x ∈ K} [ f(y_k) + ∇f(y_k)ᵀ(x − y_k) + (1/2α)∥x − y_k∥² ]
g^α_K = (1/α)(y_k − x^α_K)

That is,

x^α_K = π_K(y_k − α∇f(y_k)) = y_k − α g^α_K    (3)

Here, g^α_K is the gradient mapping of f on K. It is well-defined for all x ∈ R^n [2, pg. 80, § 2.2.3].
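By (3), computing the gradient mapping amounts to one projected gradient step. A short sketch (hypothetical quadratic and box, not from the paper) also illustrates that g^α_K coincides with the plain gradient ∇f(y_k) whenever the step y_k − α∇f(y_k) stays in the interior of K:

```python
import numpy as np

def gradient_mapping(grad, proj, y, alpha):
    """g^alpha_K = (y - x^alpha_K)/alpha with x^alpha_K = proj(y - alpha*grad(y)); cf. (3)."""
    x_alpha = proj(y - alpha * grad(y))
    return (y - x_alpha) / alpha, x_alpha

# Hypothetical example: f(x) = ||x||^2 on a large box, so the projected
# step is interior and the mapping reduces to the plain gradient.
grad = lambda x: 2.0 * x
proj = lambda v: np.clip(v, -10.0, 10.0)
g, _ = gradient_mapping(grad, proj, np.array([1.0, -2.0]), alpha=0.25)
# here g equals grad(y) = (2, -4) because no constraint is active
```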
Two key properties of the gradient ∇f(y_k) were used in the convergence proof of our work in [1], i.e.
the descent property and the global lower-bound property; see [1, pg. 86, § 3.3] and [1, § 3.1]
respectively. The following corollary shows that the gradient mapping g^α_K preserves these properties.
Lemma 1 ([2, pg. 81]). Let f ∈ F^{1,1}_{µ,L}(R^n), 0 < α ≤ 1/L and y_k ∈ R^n. Then for any x ∈ K ⊂ R^n,

f(x) ≥ f(x^α_K) + (x − y_k)ᵀ g^α_K + (α/2)∥g^α_K∥² + (µ/2)∥x − y_k∥².    (4)

Observe that (4) still holds when µ = 0, since f ∈ F^{1,1}_{µ,L} with µ ≥ 0, as pointed out in [2, pg. 67].
Corollary 1. Let f ∈ F^{1,1}_{µ,L}(R^n), 0 < α ≤ 1/L and y_k ∈ R^n. Then for any x ∈ K ⊂ R^n we have

f(x^α_K) ≤ f(y_k) − (α/2)∥g^α_K∥²    (5)

f(x) ≥ f(y_k) + (x − y_k)ᵀ g^α_K + (µ/2)∥x − y_k∥²    (6)

Proof: Taking x = y_k in (4) gives (5). Using (5) in (4), we obtain (6).
Remark 2. The inequalities in Corollary 1 indicate that the gradient mapping g^α_K preserves the
descent property (5) and the global lower-bound property (6).
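The key inequalities (4) and (5) can be spot-checked numerically. The snippet below does so for a hypothetical strongly convex quadratic on a box (not from the paper), with the step α = 1/L used in Algorithm 1; random points of K are sampled and both inequalities are verified up to floating-point tolerance:

```python
import numpy as np

# Hypothetical quadratic on a box: f(x) = 0.5 x^T diag(5,1) x, K = [0.5, 2]^2,
# so L = 5, mu = 1, and alpha = 1/L as in Algorithm 1.
A = np.array([5.0, 1.0])
f = lambda x: 0.5 * float(A @ (x * x))
grad = lambda x: A * x
proj = lambda v: np.clip(v, 0.5, 2.0)
L, mu = 5.0, 1.0
alpha = 1.0 / L

rng = np.random.default_rng(1)
holds_4 = holds_5 = True
for _ in range(200):
    y = rng.uniform(0.5, 2.0, size=2)        # y_k in K
    x = rng.uniform(0.5, 2.0, size=2)        # arbitrary x in K
    x_a = proj(y - alpha * grad(y))          # x^alpha_K
    g = (y - x_a) / alpha                    # gradient mapping g^alpha_K
    # Lemma 1, inequality (4)
    rhs4 = f(x_a) + (x - y) @ g + 0.5 * alpha * (g @ g) + 0.5 * mu * ((x - y) @ (x - y))
    holds_4 &= f(x) >= rhs4 - 1e-9
    # descent property (5)
    holds_5 &= f(x_a) <= f(y) - 0.5 * alpha * (g @ g) + 1e-9
```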
The recursive rule Φ_{k+1}(x) ≤ (1 − β_k)Φ_k(x) + β_k f(x) for the projected-SBNGM now becomes

Φ_{k+1}(x) ≤ (1 − β_k)Φ_k(x) + β_k [ f(y_k) + (x − y_k)ᵀ g^α_K + (µ/2)∥x − y_k∥² ].    (7)
Comparing (7) with [1, eq. 28], it follows that [1, Lemma 6] remains valid except that all occurrences
of ∇f(y_k) are replaced with g^α_K. Furthermore, with (3) we have sufficient descent (5) to ensure
that Φ*_{k+1} ≥ f(x_{k+1}), cf. [1, § 3.3], as required by Nesterov's principle. Also, the secant condition
for updating the parameter γ_k becomes ∇Φ_{k+1}(y_k) = g^α_K, where g^α_K = (1/α)(y_k − x^α_K). When x^α_K is
an iterate with no active constraints, the secant condition reduces to ∇Φ_{k+1}(y_k) = ∇f(y_k),
as before in [1, pg. 93]. Hence, the projected-SBNGM satisfies Nesterov's principle,

f(x_k) ≤ Φ*_k ,  Φ*_k = min_x Φ_k(x) ,  ∀k > 0.    (8)

Consequently, the projected-SBNGM satisfies the premises of [1, Theorem 1] and therefore the
projected-SBNGM is globally convergent with

f(x_k) − f(x*) ≤ [ ∏_{i=0}^{k−1} (1 − β_i) ] × [ (γ_0 + L)/2 ] ∥x* − x_0∥² ,    (9)

for all k ≥ 1, where β_i ∈ (0, 1) and γ_0 > 0. If β_0 ∈ ( √(µ/L), 1 ), then γ_0 ≥ µ. With γ_0 ≥ µ and
by [2, Lemma 2.2.4], we have that γ_k > µ ∀k and

∏_{i=0}^{k−1} (1 − β_i) ≤ min { (1 − √(µ/L))^k ,  4L/(2√L + k√γ_0)² } .    (10)

It then follows that the inequality (9) reduces to

f(x_k) − f(x*) ≤ [ (γ_0 + L)/2 ] min { (1 − √(µ/L))^k ,  4L/(2√L + k√γ_0)² } × ∥x_0 − x*∥²    (11)

Thus, the projected-SBNGM retains the convergence ratio O(1 − √q) and the iteration complexity
O( √(1/q) ln(1/ϵ) ) of the projected-NGM, where q = µ/L. In the case µ = 0, it retains the
convergence ratio O(1/k²) and the iteration complexity O( √(1/ϵ) ) of the projected-NGM.
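As a numerical illustration of (11) (with hypothetical values of µ, L, γ_0 and ∥x_0 − x*∥², not taken from the paper), the bound can be tabulated to see the two regimes: for small k the sublinear 1/k² branch is tighter, while the geometric branch eventually dominates:

```python
import numpy as np

# Hypothetical problem data for evaluating the right-hand side of (11).
mu, L, gamma0 = 1.0, 100.0, 1.0
R2 = 4.0  # ||x0 - x*||^2

def bound(k):
    linear = (1.0 - np.sqrt(mu / L)) ** k              # geometric branch
    sublinear = 4.0 * L / (2.0 * np.sqrt(L) + k * np.sqrt(gamma0)) ** 2
    return (gamma0 + L) / 2.0 * min(linear, sublinear) * R2

# Both branches decrease in k, so the bound is monotonically decreasing.
vals = [bound(k) for k in (1, 10, 100)]
```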

5. Conclusion

The projected-SBNGM algorithm is globally convergent for constrained convex optimization.
Specifically, we have shown that the algorithm retains the same convergence ratio and iteration
complexity as its unconstrained counterpart. The convergence proof relies on the fact that the
gradient mapping g^α_K preserves two key properties required for satisfying Nesterov's principle,
i.e. the descent property and the global lower-bound property. MATLAB and Python implementations
of the projected-SBNGM algorithm can be found at the GitHub repository [4].

References

[1] R. Alli-Oke, W. P. Heath, A secant-based Nesterov method for convex functions, Optimization
Letters 11 (2017) 81–105.
[2] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer
Academic Publishers, 2004.
[3] B. O'Donoghue, E. Candès, Adaptive restart for accelerated gradient schemes, Foundations
of Computational Mathematics 15 (2015) 715–732.
[4] R. Alli-Oke, Repository, https://github.com/droa28/Secant-Based-NGM (March 2023).
