
Journal of Optimization Theory and Applications (2022) 194:364–373

https://doi.org/10.1007/s10957-022-02032-z

A Note on the Optimal Convergence Rate of Descent Methods with Fixed Step Sizes for Smooth Strongly Convex Functions

André Uschmajew1 · Bart Vandereycken2

Received: 1 July 2021 / Accepted: 20 March 2022 / Published online: 22 April 2022
© The Author(s) 2022

Abstract
Based on a result by Taylor et al. (J Optim Theory Appl 178(2):455–476, 2018) on
the attainable convergence rate of gradient descent for smooth and strongly convex
functions in terms of function values, an elementary convergence analysis for general
descent methods with fixed step sizes is presented. It covers general variable metric
methods, gradient-related search directions under angle and scaling conditions, as well
as inexact gradient methods. In all cases, optimal rates are obtained.

Keywords Convergence rate estimates · Variable metric method · Inexact gradient method · SR1 update

Mathematics Subject Classification 90C25 · 65K05

1 Introduction

An $L$-smooth and $\mu$-strongly convex function $f : \mathbb{R}^n \to \mathbb{R}$ is characterized by the two properties

$$\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|$$

and

Communicated by Claudia Alejandra Sagastizábal.

André Uschmajew
[email protected]
Bart Vandereycken
[email protected]

1 Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany
2 Section of Mathematics, University of Geneva, 1211 Geneva, Switzerland

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

$$f(x) \ge f(y) + \langle \nabla f(y),\, x - y\rangle + \frac{\mu}{2}\,\|x - y\|^2$$

for some constants $0 < \mu \le L$ and all $x, y \in \mathbb{R}^n$. Here, $\langle\,\cdot\,,\,\cdot\,\rangle$ can be any inner product on $\mathbb{R}^n$ with corresponding norm $\|\cdot\|$, and $\nabla f$ denotes the gradient with respect to this inner product. Note that the constants $\mu$ and $L$ depend on the chosen inner product.
The class of such functions plays a main role in the convergence theory of the gradient
method and related descent methods for finding the unique global minimum x ∗ of a
given f . The update rule of the gradient method is

$$x^+ = x - h\,\nabla f(x),$$

where h > 0 is a step size which may depend on the current point x. It is well known
that the fixed step size

$$h = \frac{2}{L + \mu}$$

achieves the optimal error reduction


$$\|x^+ - x^*\|^2 \le \left(\frac{\kappa_f - 1}{\kappa_f + 1}\right)^2 \|x - x^*\|^2, \qquad \kappa_f = \frac{L}{\mu}, \tag{1.1}$$

per step, which inductively implies the convergence of the method to x ∗ . We refer to
[6, Theorem 2.1.15] for details.
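As a numerical sanity check, the one-step contraction (1.1) can be verified on a small strongly convex quadratic; the matrix $D$, the constants $\mu, L$, and the starting point below are illustrative choices, not taken from the paper.

```python
# Check of (1.1): one gradient step with h = 2/(L + mu) on the
# quadratic f(x) = 0.5 <x, Dx>, whose gradient is Dx and whose
# minimizer is x* = 0.  D, mu, L and x are illustrative choices.
import numpy as np

mu, L = 1.0, 10.0
D = np.diag([mu, 4.0, L])          # Hessian with eigenvalues in [mu, L]
h = 2.0 / (L + mu)
kappa_f = L / mu
rate = ((kappa_f - 1) / (kappa_f + 1)) ** 2

x = np.array([1.0, -2.0, 0.5])
x_plus = x - h * (D @ x)           # one gradient step

lhs = np.linalg.norm(x_plus) ** 2  # ||x^+ - x*||^2
rhs = rate * np.linalg.norm(x) ** 2
print(lhs <= rhs + 1e-12)          # the contraction (1.1) holds
```

Iterating the step drives $\|x_\ell - x^*\|$ to zero with the factor $(\kappa_f - 1)/(\kappa_f + 1)$ per step, as the inductive argument in the text describes.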
In a more general setting of proximal gradient methods, it has recently been shown
by Taylor et al. [9, Theorem 3.3 with h = 0] that the same rate is also valid for the
error in function value. Specifically, for any

$$0 \le h \le \frac{2}{L + \mu} \tag{1.2}$$

it holds that

$$f(x^+) - f(x^*) \le (1 - h\mu)^2\,(f(x) - f(x^*)). \tag{1.3}$$


Moreover, for $\frac{2}{L+\mu} \le h < \frac{2}{L}$ one has $f(x^+) - f(x^*) \le (hL - 1)^2\,(f(x) - f(x^*))$. This automatically follows from (1.2) and (1.3) by using a weaker strong convexity bound $0 < \mu' \le \mu$ satisfying $h = \frac{2}{L+\mu'}$ and noting that $1 - h\mu' = hL - 1$. The optimal choice in the estimates is $h = 2/(L+\mu)$ and leads to

$$f(x^+) - f(x^*) \le \left(\frac{\kappa_f - 1}{\kappa_f + 1}\right)^2 (f(x) - f(x^*)). \tag{1.4}$$

This estimate for one step of the method is highly nontrivial. Obviously, it implies
the same inequality for the gradient descent method with exact line search (when the
left side is minimized over all h), which has been obtained earlier in [2]. Moreover,


this estimate is known to be optimal in the class of L-smooth and μ-strongly convex
functions. In fact, it is already optimal for quadratic functions in that class; see, e.g.,
[2, Example 1.3].
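The function-value estimate (1.3) can likewise be checked numerically over the whole admissible range (1.2) of step sizes; the quadratic, the starting point, and the step-size grid below are illustrative choices.

```python
# Check of (1.3): for every 0 <= h <= 2/(L + mu), a gradient step on
# f(x) = 0.5 <x, Dx> contracts the function-value error by (1 - h*mu)^2.
# D, x, and the grid of step sizes are illustrative choices.
import numpy as np

mu, L = 0.5, 6.0
D = np.diag([mu, 2.0, L])
f = lambda x: 0.5 * x @ (D @ x)    # minimizer x* = 0, f(x*) = 0
x = np.array([1.0, -1.0, 2.0])

ok = all(
    f(x - h * (D @ x)) <= (1 - h * mu) ** 2 * f(x) + 1e-12
    for h in np.linspace(0.0, 2.0 / (L + mu), 25)
)
print(ok)                          # (1.3) holds on the whole range
```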
Of course, in many applications the difference f (x) − f (x ∗ ) is a natural error mea-
sure by itself. For example, for strongly convex quadratic functions it is proportional
to the squared energy norm of the quadratic form. In general, for an $L$-smooth and $\mu$-strongly convex function we always have

$$\frac{\mu}{2}\,\|x - x^*\|^2 \le f(x) - f(x^*) \le \frac{L}{2}\,\|x - x^*\|^2,$$

which clearly shows that $f(x_\ell) - f(x^*) \to 0$ for an iterative method implies $\|x_\ell - x^*\| \to 0$ for $\ell \to \infty$. Moreover, both error measures will exhibit the same R-linear
convergence rate. The novelty of the estimate (1.4) is that one also has an optimal Q-
linear rate for the function values, both for fixed step sizes and exact line search. (We
refer to [8] for the definitions of R- and Q-linear rate.) However, compared to (1.1)
an estimate like (1.4) is “more intrinsic,” because the chosen inner product in Rn
enters only via the constants μ and L. In this short note, we illustrate this advantage
by showing that (1.4) allows for a rather clean analysis of general variable metric
methods, as well as gradient-related methods subject to angle and scaling conditions.
In addition, in Theorem 4.2 we show how (1.4) already implies the sharp rates for
inexact gradient methods under relative error bounds with fixed step sizes, based on a
suitable change of the metric, thereby improving and simplifying a similar result in [3].

2 Variable Metric Method

We first consider the variable metric method. Here the update rule reads

$$x^+ = x - h\,A^{-1} \nabla f(x), \tag{2.1}$$

where A is a symmetric (with respect to the given inner product) and positive definite
matrix. It is well known that such an update step can also be interpreted as a gradient
step with respect to a modified inner product. This leads to the following result that
will be the basis for our further considerations.

Theorem 2.1 Assume the eigenvalues of $A$ lie in the positive interval $[\lambda, \Lambda]$ and define

$$\bar h = \frac{2}{L/\lambda + \mu/\Lambda}.$$

Then, $x^+$ in (2.1) with $0 \le h \le \bar h$ satisfies

$$f(x^+) - f(x^*) \le \left(1 - \frac{h\mu}{\Lambda}\right)^2 (f(x) - f(x^*)).$$


In particular, the step size $h = \bar h$ yields

$$f(x^+) - f(x^*) \le \left(\frac{\kappa_{f,A} - 1}{\kappa_{f,A} + 1}\right)^2 (f(x) - f(x^*)), \qquad \kappa_{f,A} = \frac{L}{\mu}\,\frac{\Lambda}{\lambda}. \tag{2.2}$$

Proof The result is obtained from (1.3) by noting that $\nabla_A f(x) = A^{-1}\nabla f(x)$ is the gradient of $f$ with respect to the $A$-inner product $\langle x, y\rangle_A = \langle x, Ay\rangle$. We have

$$\langle \nabla_A f(x) - \nabla_A f(y),\, x - y\rangle_A \le L\,\|x - y\|^2 \le \frac{L}{\lambda}\,\|x - y\|_A^2,$$

as well as

$$\langle \nabla_A f(x) - \nabla_A f(y),\, x - y\rangle_A \ge \mu\,\|x - y\|^2 \ge \frac{\mu}{\Lambda}\,\|x - y\|_A^2$$

for all $x, y$. These two conditions are equivalent to $f$ being $(L/\lambda)$-smooth and $(\mu/\Lambda)$-strongly convex in the $A$-inner product; see, e.g., [6, Theorems 2.1.5 & 2.1.9]. Thus, in (1.2) and (1.3), we can replace $\mu$ with $\mu/\Lambda$ and $L$ with $L/\lambda$, which is exactly the statement of the theorem. □

An alternative, and somewhat more direct, proof of Theorem 2.1 that does not require changing the inner product can be given by applying the result (1.3) directly to the function $g(y) = f(A^{-1/2}y)$ at $y = A^{1/2}x$.
Observe that $\kappa_{f,A} = \kappa_f \cdot \kappa_A$ with $\kappa_A = \Lambda/\lambda \ge 1$ the condition number of $A$. The contraction factor in (2.2) will therefore always be worse than the original factor in (1.4), which corresponds to $A = I$. This might seem suboptimal since in Newton's method, and under additional regularity conditions, the contraction factor improves when choosing $A = \nabla^2 f(x)$. However, for the general class of methods (2.1), the result in Theorem 2.1 is optimal. This can already be seen for the function $f(x) = \frac{1}{2}\|x\|^2$, in which case (2.1) becomes the linear iteration $x^+ = (I - hA^{-1})x$. Its contraction factor as predicted by (2.2) is bounded by $(\kappa_A - 1)^2/(\kappa_A + 1)^2$, which is indeed a tight bound: as in [2, Example 1.3], take $A = \operatorname{diag}(\lambda, \dots, \Lambda)$ and $x = (x_1, 0, \dots, 0, x_n)$. Then, an exact line search yields $x^+ = (\kappa_A - 1)/(\kappa_A + 1)\cdot(-x_1, 0, \dots, 0, x_n)$, and clearly there cannot be a better contraction factor with a fixed step size. Note that the step size $\bar h$ in Theorem 2.1 also leads to equality in (2.2) when $x$ is an eigenvector corresponding to $\lambda$ or $\Lambda$. For a less trivial example, consider $f(x) = \frac{1}{2}\langle x, A^{-1}x\rangle$. Then, (2.1) becomes $x^+ = (I - hA^{-2})x$, and the same $x$ from above now leads to a contraction with the factor $(\kappa_A^2 - 1)^2/(\kappa_A^2 + 1)^2$, where indeed $\kappa_{A^2} = \kappa_A^2 = \kappa_f\,\kappa_A$, as predicted by Theorem 2.1.
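Theorem 2.1 can be sketched numerically on the example $f(x) = \frac{1}{2}\|x\|^2$ just discussed; the concrete matrix $A$ and point $x$ below are illustrative choices.

```python
# Sketch of Theorem 2.1 on the example f(x) = 0.5 ||x||^2 (mu = L = 1):
# one variable metric step x^+ = x - h_bar * A^{-1} x contracts the
# function value by the kappa_{f,A} factor from (2.2).
# The matrix A and the point x are illustrative choices.
import numpy as np

lam, Lam = 1.0, 5.0                       # eigenvalue bounds of A
A = np.diag([lam, 2.0, Lam])
mu = L = 1.0
h_bar = 2.0 / (L / lam + mu / Lam)        # step size from Theorem 2.1
kappa = (L / mu) * (Lam / lam)            # kappa_{f,A} = kappa_f * kappa_A
rate = ((kappa - 1) / (kappa + 1)) ** 2   # contraction factor (2.2)

f = lambda x: 0.5 * x @ x                 # minimizer x* = 0, f(x*) = 0
x = np.array([1.0, -1.0, 1.0])
x_plus = x - h_bar * np.linalg.solve(A, x)

print(f(x_plus) <= rate * f(x) + 1e-12)   # (2.2) holds
```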


3 Gradient-Related Methods

Next, we provide error estimates for gradient-related descent methods under angle and
scaling conditions. Specifically, we consider the update rule

$$x^+ = x - h\,d, \tag{3.1}$$

where −d is a descent direction, that is, d satisfies

$$\langle \nabla f(x), d\rangle = \cos\theta\,\|\nabla f(x)\|\,\|d\|, \qquad \cos\theta > 0, \tag{3.2}$$

for some θ ∈ [0, π/2). This condition is very natural since it guarantees the conver-
gence of (3.1); see, e.g., [7, Chapter 3.2]. In particular, for the case of exact line search,
it has been shown in [2, Theorem 5.1] that
$$f(x^+) - f(x^*) \le \left(\frac{\kappa_{f,\theta} - 1}{\kappa_{f,\theta} + 1}\right)^2 (f(x) - f(x^*)), \qquad \kappa_{f,\theta} = \frac{L}{\mu}\,\frac{1 + \sin\theta}{1 - \sin\theta}, \tag{3.3}$$

and that this Q-linear rate is optimal. For the case of quadratic functions, this has been
known before; see, e.g., [5]. We also mention the result of [1, Theorem 3.3], which
identifies the rate in (3.3) as optimal R-linear rate for exact line search when f is twice
continuously differentiable.
Here, we aim to generalize this result to fixed step sizes. The extent to which this is possible depends on the available information about the quantities $\langle \nabla f(x), d\rangle$, $\|\nabla f(x)\|$, and $\|d\|$. The basic idea is to interpret (3.1) as a variable metric method in order to apply Theorem 2.1. For this, we need to find a symmetric and positive definite matrix $A$ satisfying

$$A\,d = \nabla f(x)$$

and estimate its condition number. Such a matrix can be found explicitly using the
following lemma, which originates from the SR1 update rule; see, e.g., [7].

Lemma 3.1 Let $u, v \in \mathbb{R}^n$ such that $\|u\| = \|v\| = 1$ and $\langle u, v\rangle = \cos\theta$. Then, the matrix

$$B = \frac{1}{\alpha}\left(I - \frac{rr^*}{\langle r, u\rangle}\right), \qquad r = u - \alpha v, \qquad \alpha = \frac{1 - \sin\theta}{\cos\theta} = \frac{\cos\theta}{1 + \sin\theta}$$

is symmetric (for the given inner product), satisfies $Bu = v$, and has

$$\lambda_{\min}(B) = \frac{\cos\theta}{1 + \sin\theta}, \qquad \lambda_{\max}(B) = \frac{\cos\theta}{1 - \sin\theta}$$

as its smallest and largest eigenvalues, respectively. Here, $rr^*$ denotes the rank-one matrix satisfying $(rr^*)x = r\,\langle r, x\rangle$ for all $x \in \mathbb{R}^n$.



Proof This is checked by a straightforward calculation. Obviously, the matrix $I - \frac{rr^*}{\langle r, u\rangle}$ equals the identity on the orthogonal complement of $r$. Its eigenvalue belonging to the eigenvector $r$ is

$$1 - \frac{\|r\|^2}{\langle r, u\rangle} = 1 - \frac{1 - 2\alpha\cos\theta + \alpha^2}{1 - \alpha\cos\theta} = \frac{1 - \sin\theta - \alpha^2}{\sin\theta} = \alpha^2,$$

where one uses $1 - \alpha\cos\theta = \sin\theta$ and $\alpha^2 = (1 - \sin\theta)/(1 + \sin\theta)$. Therefore, the largest eigenvalue of $B$ is $1/\alpha$ (with multiplicity $n - 1$), and the smallest eigenvalue is $\alpha$. □
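The properties stated in Lemma 3.1 (symmetry, $Bu = v$, and the extreme eigenvalues) can be confirmed numerically; the angle and the vectors below are illustrative choices.

```python
# Numerical check of Lemma 3.1: the SR1-type matrix B is symmetric,
# maps u to v, and has extreme eigenvalues cos(t)/(1 +- sin(t)).
# The angle t and the unit vectors u, v are illustrative choices.
import numpy as np

t = 0.4                                   # angle theta, with cos t > 0
u = np.array([1.0, 0.0])
v = np.array([np.cos(t), np.sin(t)])      # <u, v> = cos t, ||u|| = ||v|| = 1

alpha = (1 - np.sin(t)) / np.cos(t)
r = u - alpha * v
B = (np.eye(2) - np.outer(r, r) / (r @ u)) / alpha

print(np.allclose(B, B.T))                # B is symmetric
print(np.allclose(B @ u, v))              # B u = v
print(np.allclose(np.linalg.eigvalsh(B),
                  [np.cos(t) / (1 + np.sin(t)),
                   np.cos(t) / (1 - np.sin(t))]))
```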

With Lemma 3.1 and Theorem 2.1 at our disposal, we can state our main result.
Theorem 3.2 Assume (3.2) and

$$\|d\| = c\,\|\nabla f(x)\| \tag{3.4}$$

for some $c > 0$. Define

$$\bar h = \frac{2\cos\theta}{L\,c\,(1 + \sin\theta) + \mu\,c\,(1 - \sin\theta)}.$$

Then, $x^+$ in (3.1) with $0 \le h \le \bar h$ satisfies

$$f(x^+) - f(x^*) \le \left(1 - \frac{h\mu c\,(1 - \sin\theta)}{\cos\theta}\right)^2 (f(x) - f(x^*)).$$

In particular, the step size $h = \bar h$ yields

$$f(x^+) - f(x^*) \le \left(\frac{\kappa_{f,\theta} - 1}{\kappa_{f,\theta} + 1}\right)^2 (f(x) - f(x^*)).$$

Proof If $d = 0$, the assertion is trivially true. Let $d \ne 0$. By Lemma 3.1, applied to $u = d/\|d\|$ and $v = \nabla f(x)/\|\nabla f(x)\|$, there exists a symmetric and positive definite matrix of the form $A = \frac{\|\nabla f(x)\|}{\|d\|}\,B = \frac{1}{c}\,B$ such that $Ad = \nabla f(x)$ and

$$\lambda_{\min}(A) = \frac{1}{c}\,\frac{\cos\theta}{1 + \sin\theta}, \qquad \lambda_{\max}(A) = \frac{1}{c}\,\frac{\cos\theta}{1 - \sin\theta}.$$

The assertion therefore follows directly from Theorem 2.1. □



Remark 3.3 The condition (3.4) can be replaced with equivalent conditions such as

$$\langle \nabla f(x), d\rangle = \sigma\,\|d\|^2$$

for some $\sigma > 0$. An equivalent version of Theorem 3.2 is obtained by observing that $\cos\theta = \sigma c$.
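A numerical sketch of Theorem 3.2: a direction obtained by rotating the gradient by an angle $\theta$ and rescaling it satisfies (3.2) and (3.4) exactly, and the step $\bar h$ then contracts the function value at the $\kappa_{f,\theta}$ rate. The problem data, the angle, and the scaling below are illustrative choices.

```python
# Sketch of Theorem 3.2: a step along a direction d making angle t with
# grad f(x), with ||d|| = c ||grad f(x)|| and step size h_bar from the
# theorem, contracts the function value by the kappa_{f,theta} factor.
# The quadratic f, the angle t and the scaling c are illustrative.
import numpy as np

mu, L = 1.0, 5.0
D = np.diag([mu, L])
f = lambda x: 0.5 * x @ (D @ x)           # minimizer x* = 0, f(x*) = 0
t, c = 0.3, 2.0

x = np.array([1.0, 1.0])
g = D @ x
# rotating g by t preserves the norm, so <g, Rg> = cos(t) ||g|| ||Rg||
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
d = c * (R @ g)                           # satisfies (3.2) and (3.4)

h_bar = 2 * np.cos(t) / (L * c * (1 + np.sin(t)) + mu * c * (1 - np.sin(t)))
kappa = (L / mu) * (1 + np.sin(t)) / (1 - np.sin(t))
rate = ((kappa - 1) / (kappa + 1)) ** 2

x_plus = x - h_bar * d
print(f(x_plus) <= rate * f(x) + 1e-12)   # the bound of Theorem 3.2 holds
```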


To achieve the optimal rate in Theorem 3.2, the exact values of $\theta$ and $c$ need to be known in order to compute the optimal step size $\bar h$. In practice, this is almost never the case, and only bounds are available. We therefore formulate another, more practical counterpart of Theorem 3.2 under the following relaxed angle and scaling conditions: there exist $0 < c_1 \le c_2$ and $\bar\theta \in [0, \pi/2)$ such that

$$\theta \le \bar\theta, \qquad c_1\,\|\nabla f(x)\| \le \|d\| \le c_2\,\|\nabla f(x)\|. \tag{3.5}$$

Under these conditions, the eigenvalues of the matrix $A = \frac{\|\nabla f(x)\|}{\|d\|}\,B$ in the proof of Theorem 3.2 can be bounded as

$$\lambda_{\min}(A) \ge \frac{1}{c_2}\,\frac{\cos\bar\theta}{1 + \sin\bar\theta}, \qquad \lambda_{\max}(A) \le \frac{1}{c_1}\,\frac{\cos\bar\theta}{1 - \sin\bar\theta},$$

since $\cos\theta/(1 + \sin\theta)$ is monotonically decreasing and $\cos\theta/(1 - \sin\theta)$ monotonically increasing in $\theta \in [0, \pi/2)$. The following result is then again immediately obtained from Theorem 2.1.

Theorem 3.4 Assume (3.5) and define

$$\bar h = \frac{2\cos\bar\theta}{L\,c_2\,(1 + \sin\bar\theta) + \mu\,c_1\,(1 - \sin\bar\theta)}.$$

Then, $x^+$ in (3.1) with $0 \le h \le \bar h$ satisfies

$$f(x^+) - f(x^*) \le \left(1 - \frac{h\mu c_1\,(1 - \sin\bar\theta)}{\cos\bar\theta}\right)^2 (f(x) - f(x^*)).$$

In particular, the step size $h = \bar h$ yields

$$f(x^+) - f(x^*) \le \left(\frac{\kappa^* - 1}{\kappa^* + 1}\right)^2 (f(x) - f(x^*)), \qquad \kappa^* = \frac{L}{\mu}\,\frac{c_2}{c_1}\,\frac{1 + \sin\bar\theta}{1 - \sin\bar\theta}.$$

We remark again that if $c_1 = c_2 = \|d\|/\|\nabla f(x)\|$ and $\bar\theta = \theta$ are known, the resulting statements of Theorem 3.4 coincide with those in Theorem 3.2.

Remark 3.5 We conclude the section with a side remark. When looking at the proofs of Theorems 3.2 and 3.4, it is natural to ask whether there exists a symmetric and positive definite matrix $B$ (and thus $A$) with a smaller condition number than the one from Lemma 3.1. As for the SR1 update rule, when the matrix $B = B_\alpha$ in the lemma is regarded as a function of $\alpha \ne 0$, it is well known that the stated $\alpha$ is one of the minimizers of the condition number in the class of all positive definite $B_\alpha$ (another is $\cos\theta/(1 - \sin\theta)$); see, e.g., [10]. Indeed, any $B$ with a smaller condition number would lead to a faster rate in Theorem 3.2 (via Theorem 2.1), which is not possible since the rate is known to be optimal when exact line search is used. This reasoning therefore provides a (rather indirect) proof of the following general statement.


Theorem 3.6 Let $u, v \in \mathbb{R}^n$ such that $\|u\| = \|v\| = 1$ and $\cos\theta = \langle u, v\rangle > 0$ with $\theta \in [0, \pi/2)$. Then, $(1 + \sin\theta)/(1 - \sin\theta)$ is the minimum possible (spectral) condition number among all symmetric and positive definite matrices $B$ satisfying $Bu = v$.
While probably well known in the field, we did not find this fact explicitly stated
in the literature. It is, of course, not very difficult to prove this result directly by an
elementary calculation on 2 × 2 matrices.

4 Inexact Gradient Method

We now discuss the important case of an inexact gradient method, where, instead of the angle and scaling conditions (3.5), it is assumed that

$$\|d - \nabla f(x)\| \le \varepsilon\,\|\nabla f(x)\| \tag{4.1}$$

for some $\varepsilon \in [0, 1)$. This model is also considered in [2–4]. Our aim is again to derive convergence rates for a fixed step size rule from the variable metric approach. Since
the matrix A in the proof of Theorem 3.2 no longer provides the optimal rates in this
case, we use a different construction.
Lemma 4.1 Let $u, v \in \mathbb{R}^n$ such that $v \ne 0$ and $\|u - v\| < \|v\|$. There exists a symmetric positive definite matrix $A$ that satisfies $Au = v$ and has the eigenvalues $\left(1 \pm \frac{\|u - v\|}{\|v\|}\right)^{-1}$.

Proof Define $A^{-1} = I + \frac{\|u - v\|}{\|v\|}\,Q$ with $Q = I - 2\,\frac{ww^*}{\|w\|^2}$ and $w = \frac{v}{\|v\|} - \frac{u - v}{\|u - v\|}$. Observe that $Q$ is the orthogonal reflection that sends $\frac{v}{\|v\|}$ to $\frac{u - v}{\|u - v\|}$, which implies $A^{-1}v = u$. Since $Q$ is symmetric with eigenvalues $\pm 1$, the result follows. □
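The construction in the proof of Lemma 4.1 is easy to verify numerically; the vectors $u$ and $v$ below are illustrative choices satisfying $\|u - v\| < \|v\|$.

```python
# Numerical check of Lemma 4.1: A^{-1} = I + (||u-v||/||v||) Q with the
# Householder reflection Q sends v to u, and A^{-1} has the eigenvalues
# 1 +- ||u-v||/||v||.  The vectors u, v are illustrative choices.
import numpy as np

v = np.array([2.0, 0.0, 0.0])
u = np.array([2.0, 1.0, 0.5])             # ||u - v|| = sqrt(1.25) < 2 = ||v||
eps = np.linalg.norm(u - v) / np.linalg.norm(v)

w = v / np.linalg.norm(v) - (u - v) / np.linalg.norm(u - v)
Q = np.eye(3) - 2 * np.outer(w, w) / (w @ w)
A_inv = np.eye(3) + eps * Q

print(np.allclose(A_inv @ v, u))          # A^{-1} v = u, i.e. A u = v
eigs = np.linalg.eigvalsh(A_inv)          # ascending: 1-eps, then 1+eps twice
print(np.allclose(eigs, [1 - eps, 1 + eps, 1 + eps]))
```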

Applying the lemma to u = d and v = ∇ f (x), the following theorem on the
inexact gradient model (4.1) is an immediate consequence of Theorem 2.1.
Theorem 4.2 Assume $\nabla f(x) \ne 0$ and (4.1) for some $\varepsilon \in [0, 1)$, and define

$$\bar h = \frac{2}{L\,(1 + \varepsilon) + \mu\,(1 - \varepsilon)}.$$

Then, $x^+ = x - h\,d$ with $0 \le h \le \bar h$ satisfies

$$f(x^+) - f(x^*) \le \left(1 - h\mu\left(1 - \frac{\|d - \nabla f(x)\|}{\|\nabla f(x)\|}\right)\right)^2 (f(x) - f(x^*)) \le \big(1 - h\mu\,(1 - \varepsilon)\big)^2\,(f(x) - f(x^*)).$$

In particular, the step size $h = \bar h$ yields

$$f(x^+) - f(x^*) \le \left(\frac{\kappa_\varepsilon - 1}{\kappa_\varepsilon + 1}\right)^2 (f(x) - f(x^*)), \qquad \kappa_\varepsilon = \frac{L}{\mu}\,\frac{1 + \varepsilon}{1 - \varepsilon}. \tag{4.2}$$


The rate in (4.2) is optimal under the general assumption (4.1), in particular for quadratic $f$ and $d$ satisfying $\langle \nabla f(x), d\rangle = \cos\theta\,\|d\|\,\|\nabla f(x)\|$ with $\sin\theta = \varepsilon$. Trivially, for $f(x) = \frac{1}{2}\|x\|^2$ the estimate (4.2) is sharp for all $d$ satisfying (4.1).
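The per-step bound of Theorem 4.2 can be sketched numerically: any direction with relative error exactly $\varepsilon$ must still contract the function value by the predicted factor. The quadratic, the perturbation direction, and the constants below are illustrative choices.

```python
# Sketch of Theorem 4.2: an inexact step x^+ = x - h*d with
# ||d - grad f(x)|| = eps ||grad f(x)|| still contracts the function
# value by (1 - h*mu*(1 - eps))^2.  Problem data and the perturbation
# direction are illustrative choices.
import numpy as np

mu, L, eps = 1.0, 8.0, 0.25
D = np.diag([mu, 3.0, L])
f = lambda x: 0.5 * x @ (D @ x)           # minimizer x* = 0, f(x*) = 0
h = 2.0 / (L * (1 + eps) + mu * (1 - eps))

x = np.array([1.0, 1.0, 1.0])
g = D @ x
e = np.array([0.3, -0.2, 0.9])            # arbitrary perturbation direction
d = g + eps * np.linalg.norm(g) * e / np.linalg.norm(e)
x_plus = x - h * d

rate = (1 - h * mu * (1 - eps)) ** 2
print(f(x_plus) <= rate * f(x) + 1e-12)   # the bound of Theorem 4.2 holds
```

By construction, $\|d - g\| = \varepsilon\|g\|$, so the test exercises the borderline case of (4.1).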
The result in Theorem 4.2 is not new. In [4, Proposition 1.5], it has been shown that $\left(\frac{\kappa_\varepsilon - 1}{\kappa_\varepsilon + 1}\right)^2$ is an upper bound for the R-linear convergence rate of the inexact gradient method with fixed step size $\bar h$. According to [4, Remark 1.6], the estimate (4.2) per step is implicitly contained in the proof of [3, Theorem 5.3], which, however, is rather technical. In addition, the statement of [3, Theorem 5.3] itself covers the rate (4.2) only for a range $\varepsilon \in [0, \bar\varepsilon]$ with some $\bar\varepsilon < \frac{L - \mu}{L + \mu}$. Our proof via Lemma 4.1 provides a simple alternative for obtaining the result for all $\varepsilon \in [0, 1)$ directly from the estimate (1.4) for the gradient method (which coincides with [3, Theorem 5.3] when $\varepsilon = 0$).

5 Conclusions

Based on the result (1.4) due to [9], we have derived optimal convergence rates for the
function values in gradient-related descent methods and inexact gradient methods with
fixed step sizes for smooth and strongly convex functions. The results are obtained
using an elementary variable metric approach, in which a single step is interpreted as a
standard gradient step. This is possible since function values are a metric-independent error measure. Compared to existing results, our proofs offer a more direct way
for obtaining the convergence rate estimates of perturbed gradient methods given the
rates of their exact counterpart.
Acknowledgements The work of B.V. was supported by the SNSF under research Project 192129. We
thank the anonymous referees for their valuable comments on an earlier version of this work, and for
bringing references [4, 9] to our attention.

Funding Open Access funding enabled and organized by Projekt DEAL.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. Cohen, A.I.: Stepsize analysis for descent methods. J. Optim. Theory Appl. 33(2), 187–205 (1981)
2. de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact
line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)
3. de Klerk, E., Glineur, F., Taylor, A.B.: Worst-case convergence analysis of inexact gradient and Newton
methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3), 2053–
2082 (2020)
4. Gannot, O.: A frequency-domain analysis of inexact gradient methods. Math. Program. (2021)


5. Munthe-Kaas, H.: The convergence rate of inexact preconditioned steepest descent algorithm for
solving linear systems. Technical report NA-87-04, Stanford University (1987)
6. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer, Boston (2004)
7. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
8. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Aca-
demic Press, New York (1970)
9. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case convergence rates of the proximal gradient
method for composite convex minimization. J. Optim. Theory Appl. 178(2), 455–476 (2018)
10. Wolkowicz, H.: Measures for symmetric rank-one updates. Math. Oper. Res. 19(4), 815–830 (1994)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
