Abstract
Our previous work, the secant-based Nesterov method [1] for unconstrained convex optimization,
extends the classical Nesterov gradient method by updating the estimate-sequence parameter
with secant information whenever possible. This is achieved by imposing a secant condition that
embodies an "update rule with reset". In this brief technical note, we extend this work to
handle constrained convex optimization.
Keywords: constrained convex optimization, projected gradient descent, secant method, global
convergence, gradient mapping, fast gradient methods, projected Nesterov gradient method.
1. Notation
2. Introduction
Algorithm 1: Projected-SBNGM
In the case µ = 0, the reset rule in step 5 can be replaced with γ_{k+1} = γ^F_{k+1}/L². However, to
account for cases where L ≫ 1, we can use γ_{k+1} = min( γ^F_{k+1}/L², ϵ γ^F_{k+1} ), where ϵ = 10^{-6}.
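For illustration, a minimal Python sketch of this reset rule is given below; the names gamma_F, L and eps stand for γ^F_{k+1}, the Lipschitz constant and ϵ, and are placeholder identifiers chosen here rather than names taken from [4].

    def reset_gamma(gamma_F, L, eps=1e-6):
        # Reset rule sketched for the mu = 0 case (step 5 of Algorithm 1):
        # gamma_{k+1} = min( gamma^F_{k+1} / L^2, eps * gamma^F_{k+1} ).
        return min(gamma_F / L**2, eps * gamma_F)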
Remark 1. The min_µ operator rule in step 4, c = min_µ( a, b ), is given by
c = a                      if γ^F_{k+1} > µ,
c = min( γ̂_{k+1}, b )      if γ^F_{k+1} < µ and γ̂_{k+1} > γ̂_k,
c = b                      otherwise.
The extra condition in the case of γ^F_{k+1} < µ serves to penalize oscillation in the trajectory of γ̂_{k+1}.
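Read literally, the rule of Remark 1 can be sketched as the following helper; the argument names (a, b, gamma_F, gamma_hat_new, gamma_hat_old, mu) are illustrative placeholders for the quantities in step 4, not identifiers from [4].

    def min_mu(a, b, gamma_F, gamma_hat_new, gamma_hat_old, mu):
        # Piecewise rule of Remark 1 for c = min_mu(a, b) in step 4.
        if gamma_F > mu:
            return a                          # c = a
        if gamma_F < mu and gamma_hat_new > gamma_hat_old:
            return min(gamma_hat_new, b)      # c = min(gamma_hat_{k+1}, b)
        return b                              # c = b otherwise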
4. Global Convergence Proof of Projected-SBNGM
It has been shown that the SBNGM is globally convergent for unconstrained convex optimization
problems; see [1]. We now show that the projected-SBNGM presented in Algorithm 1
retains the global convergence of the SBNGM. To this end, we present the following definitions,
lemma and corollary.
Definition 1. The projection operator π_K(v) is the Euclidean projection of a vector v onto the
feasible set K. For simple sets such as box constraints, the projection is an entry-wise saturation,
[π_K(v)]_i = min( ub_i, max( lb_i, v_i ) ), where the lower and upper bounds of the box constraints
are denoted by lb_i and ub_i respectively.
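For the box-constraint case, this entry-wise saturation can be written, for example, as the following NumPy one-liner, where lb and ub denote the stacked lower and upper bounds:

    import numpy as np

    def project_box(v, lb, ub):
        # Entry-wise saturation: [pi_K(v)]_i = min(ub_i, max(lb_i, v_i)).
        return np.minimum(np.maximum(v, lb), ub)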
Definition 2 ([2]). For some y_k and α > 0, denote x^α_K and g^α_K as
x^α_K = arg min_{x ∈ K} [ f(y_k) + ∇f(y_k)^T (x − y_k) + (1/(2α)) ∥x − y_k∥² ],
g^α_K = (1/α) ( y_k − x^α_K ).
That is,
x^α_K = π_K( y_k − α∇f(y_k) ) = y_k − α g^α_K.   (3)
Here, g^α_K is the gradient mapping of f on K. It is well-defined for all x ∈ R^n [2, pg. 80, § 2.2.3].
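Equation (3) translates directly into code: one projected gradient step yields x^α_K, and rescaling the step gives g^α_K. The sketch below assumes the caller supplies the gradient ∇f(y_k) and a projection routine such as project_box above.

    def gradient_mapping(y_k, grad_yk, alpha, project):
        # Eq. (3): x_K^alpha = pi_K(y_k - alpha * grad f(y_k)),
        #          g_K^alpha = (y_k - x_K^alpha) / alpha.
        x_K = project(y_k - alpha * grad_yk)
        g_K = (y_k - x_K) / alpha
        return x_K, g_K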
Two key properties of the gradient ∇f(y_k) were used in the convergence proof of our work in [1],
namely the descent property and the global lower-bound property; see [1, pg. 86, § 3.3] and [1, § 3.1]
respectively. The following corollary shows that the gradient mapping g^α_K preserves these properties.
Lemma 1 ([2, pg. 81]). Let f ∈ F^{1,1}_{µ,L}(R^n), α ≤ 1/L and y_k ∈ R^n. Then for any x ∈ K ⊆ R^n,
f(x) ≥ f(x^α_K) + (x − y_k)^T g^α_K + (α/2) ∥g^α_K∥² + (µ/2) ∥x − y_k∥².   (4)
Observe that (4) still holds when µ = 0, since f ∈ F^{1,1}_{µ,L} with µ ≥ 0, as pointed out in [2, pg. 67].
Corollary 1. Let f ∈ F^{1,1}_{µ,L}(R^n), α ≤ 1/L and y_k ∈ R^n. Then for any x ∈ K ⊆ R^n we have
f(x^α_K) ≤ f(y_k) − (α/2) ∥g^α_K∥²,   (5)
f(x) ≥ f(y_k) + (x − y_k)^T g^α_K + (µ/2) ∥x − y_k∥².   (6)
Proof: Taking x = y_k in (4) gives (5). Using (5) in (4), we obtain (6).
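For instance, writing out the first step: setting x = y_k in (4) makes the inner-product and strong-convexity terms vanish,
f(y_k) ≥ f(x^α_K) + (y_k − y_k)^T g^α_K + (α/2) ∥g^α_K∥² + (µ/2) ∥y_k − y_k∥² = f(x^α_K) + (α/2) ∥g^α_K∥²,
which rearranges to (5).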
Remark 2. The inequalities in Corollary 1 indicate that the gradient mapping g^α_K preserves the
descent property (5) and the global lower-bound property (6).
The recursive rule Φ_{k+1}(x) ≤ (1 − β_k)Φ_k(x) + β_k f(x) for the projected-SBNGM now becomes
Φ_{k+1}(x) ≤ (1 − β_k)Φ_k(x) + β_k [ f(y_k) + (x − y_k)^T g^α_K + (µ/2) ∥x − y_k∥² ].   (7)
Comparing (7) with [1, eq. 28], it follows that [1, Lemma 6] remains valid except that all occurrences
of ∇f(y_k) are replaced with g^α_K. Furthermore, with (3) we have sufficient descent (5) to ensure
that Φ*_{k+1} ≥ f(x_{k+1}), cf. [1, § 3.3], as required by Nesterov's principle. Also, the secant condition
for updating the parameter γ_k becomes ∇Φ_{k+1}(y_k) = g^α_K, where g^α_K = (1/α)( y_k − x^α_K ). When x^α_K is
an iterate with no active constraints, the secant condition reduces to ∇Φ_{k+1}(y_k) = ∇f(y_k),
as before in [1, pg. 93]. Hence, the projected-SBNGM satisfies Nesterov's principle.
Consequently, the projected-SBNGM satisfies the premises of [1, Theorem 1] and is therefore
globally convergent with
f(x_k) − f(x*) ≤ [ ∏_{i=0}^{k−1} (1 − β_i) ] × [ (γ_0 + L)/2 ] ∥x* − x_0∥²,   (9)
for all k ≥ 1, where β_i ∈ (0, 1) and γ_0 > 0. If β_0 ∈ ( √(µ/L), 1 ), then γ_0 ≥ µ. With γ_0 ≥ µ and
by [2, Lemma 2.2.4], we have that γ_k > µ for all k and
∏_{i=0}^{k−1} (1 − β_i) ≤ min{ (1 − √(µ/L))^k,  4L / ( 2√L + k√γ_0 )² }.   (10)
Thus, the projected-SBNGM retains the convergence ratio O(1 − √q) and the iteration complexity
O( √(q^{-1}) ln(1/ϵ) ) of the projected-NGM, where q = µ/L. In the case µ = 0, it retains
the convergence ratio O(1/k²) and the iteration complexity O( √(1/ϵ) ) of the projected-NGM.
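As a quick numerical sanity check, the right-hand side of (9) combined with the bound (10) can be evaluated directly; the helper below is an illustrative sketch (not taken from [4]), with dist0 standing for ∥x* − x_0∥.

    import math

    def sbngm_upper_bound(k, mu, L, gamma0, dist0):
        # Bound (10) on the product of (1 - beta_i), i = 0, ..., k-1.
        linear = (1.0 - math.sqrt(mu / L)) ** k
        sublinear = 4.0 * L / (2.0 * math.sqrt(L) + k * math.sqrt(gamma0)) ** 2
        # Bound (9): multiply by (gamma0 + L)/2 * ||x* - x0||^2.
        return min(linear, sublinear) * 0.5 * (gamma0 + L) * dist0 ** 2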
5. Conclusion
References
[1] R. Alli-Oke, W. P. Heath, A secant-based Nesterov method for convex functions, Optimization
Letters 11 (2017) 81–105.
[2] Y. Nesterov, Introductory Lectures on Convex Programming, Volume I: Basic course, Kluwer
Academic Publishers, 2004.
[3] B. O'Donoghue, E. Candès, Adaptive restart for accelerated gradient schemes, Foundations of
Computational Mathematics 15 (2015) 715–732.
[4] R. Alli-Oke, Repository, https://github.com/droa28/Secant-Based-NGM (March 2023).