
Exact Optimal Accelerated Complexity for Fixed-Point Iterations

Jisun Park¹  Ernest K. Ryu¹

Abstract

Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.

1. Introduction

The fixed-point iteration with 𝕋 : R^n → R^n computes

x_{k+1} = 𝕋x_k

for k = 0, 1, . . . with some starting point x_0 ∈ R^n. The general rubric of formulating solutions of a problem at hand as fixed points of an operator and then performing the fixed-point iteration is ubiquitous throughout applied mathematics, science, engineering, and machine learning.

Surprisingly, however, the iteration complexity of the abstract fixed-point iteration has not been thoroughly studied. This stands in sharp contrast with the literature on convex optimization algorithms, where convergence rates and matching lower bounds are carefully studied.

In this paper, we establish the exact optimal complexity of fixed-point iterations by providing an accelerated method and a matching complexity lower bound. The acceleration is based on a Halpern mechanism, which follows the footsteps of Lieder (2021); Kim (2021); Yoon & Ryu (2021), and is distinct from Nesterov's acceleration.

¹Department of Mathematical Sciences, Seoul National University. Correspondence to: Ernest K. Ryu <[email protected]>.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

1.1. Preliminaries and notations

We review standard definitions and set up the notation.

Monotone and set-valued operators. We follow the standard notation of Bauschke & Combettes (2017); Ryu & Yin (2020). For the underlying space, consider R^n with standard inner product ⟨·, ·⟩ and norm ∥·∥, although our results can be extended to infinite-dimensional Hilbert spaces.

We say 𝔸 is an operator on R^n and write 𝔸 : R^n ⇒ R^n if 𝔸 maps a point in R^n to a subset of R^n. For notational simplicity, we also write 𝔸x = 𝔸(x). Write Gra 𝔸 = {(x, u) | u ∈ 𝔸x} for the graph of 𝔸. Write 𝕀 : R^n → R^n for the identity operator. We say 𝔸 : R^n ⇒ R^n is monotone if

⟨𝔸x − 𝔸y, x − y⟩ ≥ 0,  ∀x, y ∈ R^n,

i.e., if ⟨u − v, x − y⟩ ≥ 0 for all u ∈ 𝔸x and v ∈ 𝔸y. For µ ∈ (0, ∞), we say 𝔸 : R^n ⇒ R^n is µ-strongly monotone if

⟨𝔸x − 𝔸y, x − y⟩ ≥ µ∥x − y∥²,  ∀x, y ∈ R^n.

An operator 𝔸 is maximally monotone if there is no other monotone 𝔹 such that Gra 𝔸 ⊂ Gra 𝔹 properly, and is maximally µ-strongly monotone if there is no other µ-strongly monotone 𝔹 such that Gra 𝔸 ⊂ Gra 𝔹 properly. For L ∈ (0, ∞), a single-valued operator 𝕋 : R^n → R^n is L-Lipschitz if

∥𝕋x − 𝕋y∥ ≤ L∥x − y∥,  ∀x, y ∈ R^n.

𝕋 is contractive if it is L-Lipschitz with L < 1 and nonexpansive if it is 1-Lipschitz. For θ ∈ (0, 1), an operator 𝕊 : R^n → R^n is θ-averaged if 𝕊 = (1 − θ)𝕀 + θ𝕋 for a nonexpansive operator 𝕋.

Write 𝕁_𝔸 = (𝕀 + 𝔸)^{−1} for the resolvent of 𝔸, and ℝ_𝔸 = 2𝕁_𝔸 − 𝕀 for the reflected resolvent of 𝔸. When 𝔸 is maximally monotone, it is well known that 𝕁_𝔸 is single-valued with dom 𝕁_𝔸 = R^n, ℝ_𝔸 is a nonexpansive operator, and 𝕁_𝔸 = (1/2)𝕀 + (1/2)ℝ_𝔸 is 1/2-averaged.

We say x_⋆ ∈ R^n is a zero of 𝔸 if 0 ∈ 𝔸x_⋆. We say y_⋆ is a fixed point of 𝕋 if 𝕋y_⋆ = y_⋆. Write Zer 𝔸 for the set

of zeros of 𝔸 and Fix 𝕋 for the set of all fixed points of 𝕋. For any x ∈ R^n such that x = 𝕁_𝔸 y for some y ∈ R^n, define 𝔸̃x = y − 𝕁_𝔸 y as the resolvent residual of 𝔸 at x. Note that 𝔸̃x ∈ 𝔸x. For any y ∈ R^n, define y − 𝕋y as the fixed-point residual of 𝕋 at y.

Fixed-point iterations. There is a long and rich history of iterative methods for finding a fixed point of an operator 𝕋 : R^n → R^n (Rhoades, 1991; Brezinski, 2000; Rhoades & Saliga, 2001; Berinde & Takens, 2007). In this work, we consider the following three: the Picard iteration

y_{k+1} = 𝕋y_k,

the Krasnosel'skiı̆–Mann iteration (KM iteration)

y_{k+1} = λ_{k+1} y_k + (1 − λ_{k+1}) 𝕋y_k,

and the Halpern iteration

y_{k+1} = λ_{k+1} y_0 + (1 − λ_{k+1}) 𝕋y_k,

where y_0 ∈ R^n is an initial point and {λ_k}_{k∈N} ⊂ (0, 1). Under suitable assumptions, the {y_k}_{k∈N} sequence of these iterations converges to a fixed point of 𝕋.

1.2. Prior work

Fixed-point iterations. The Picard iteration's convergence with a contractive operator was established by Banach's fixed-point theorem (Banach, 1922). What we refer to as the Krasnosel'skiı̆–Mann iteration is a generalization of the setups of Krasnosel'skiı̆ (1955) and Mann (1953). Its convergence with general nonexpansive operators is due to Martinet (1972). The iteration of Halpern (1967) converges for the wider choice of parameters λ_k (including λ_k = 1/(k+1)) due to Wittmann (1992). The Halpern iteration was later generalized to the sequential averaging method (Xu, 2004). The Ishikawa iteration (Ishikawa, 1976) is an iteration with two sequences updated in an alternating manner. Anderson acceleration (Anderson, 1965) is another acceleration scheme for fixed-point iterations, and it has recently attracted significant interest (Walker & Ni, 2011; Scieur et al., 2020; Barré et al., 2020; Zhang et al., 2020; Bertrand & Massias, 2021). A number of inertial fixed-point iterations have also been proposed to accelerate fixed-point iterations (Maingé, 2008; Dong et al., 2018; Shehu, 2018; Reich et al., 2021). Our presented method is optimal (in the sense made precise by the theorems) when compared to these prior non-stochastic fixed-point iterations.

Convergence rates of fixed-point iterations. The squared fixed-point residual ∥y_k − 𝕋y_k∥² is the error measure for fixed-point problems that we focus on. Its convergence to 0 (without a specified rate) is referred to as asymptotic regularity (Browder & Petryshyn, 1966), and it has been established for KM (Ishikawa, 1976; Borwein et al., 1992) and Halpern (Wittmann, 1992; Xu, 2002).

The convergence rate of the KM iteration in terms of ∥y_k − 𝕋y_k∥² was shown to exhibit an O(1/k)-rate (Cominetti et al., 2014; Liang et al., 2016; Bravo & Cominetti, 2018) and an o(1/k)-rate (Baillon & Bruck, 1992; Davis & Yin, 2016; Matsushita, 2017) under various setups. In addition, Borwein et al. (2017); Lin & Xu (2021) studied the convergence rate of the distance to solution under an additional bounded Hölder regularity assumption.

For the convergence rate of the Halpern iteration in terms of ∥y_k − 𝕋y_k∥², Leustean (2007) proved an O(1/(log k)²)-rate, and Kohlenbach (2011) later improved this to an O(1/k)-rate. Sabach & Shtern (2017) first proved the O(1/k²)-rate of the Halpern iteration, and this rate has been improved in its constant by a factor of 16 by Lieder (2021).

Monotone inclusions and splitting methods. As we soon establish in Section 2, monotone operators are intimately connected to fixed-point iterations. Splitting methods such as forward-backward splitting (FBS) (Bruck Jr, 1977; Passty, 1979), the augmented Lagrangian method (Hestenes, 1969; Powell, 1969), Douglas–Rachford splitting (DRS) (Peaceman & Rachford, 1955; Douglas & Rachford, 1956; Lions & Mercier, 1979), the alternating direction method of multipliers (ADMM) (Gabay & Mercier, 1976), Davis–Yin splitting (DYS) (Davis & Yin, 2017), the primal-dual hybrid gradient method (PDHG) (Chambolle & Pock, 2011), and Condat–Vũ (Condat, 2013; Vũ, 2013) are all fixed-point iterations with respect to specific nonexpansive operators. Therefore, an acceleration of the abstract fixed-point iteration is applicable to this broad range of splitting methods for monotone inclusions.

Acceleration. Since the seminal work of Nesterov (1983) on accelerating gradient methods for convex minimization problems, much work has been dedicated to algorithms with faster accelerated rates. Gradient descent (Cauchy, 1847) can be accelerated in terms of function-value suboptimality for smooth convex minimization problems (Nesterov, 1983; Kim & Fessler, 2016a), smooth strongly convex minimization problems (Nesterov, 2004; Van Scoy et al., 2018; Park et al., 2021; Taylor & Drori, 2021; Salim et al., 2022), and convex composite minimization problems (Güler, 1992; Beck & Teboulle, 2009). Recently, accelerated methods for reducing the squared gradient magnitude for smooth convex minimization (Kim & Fessler, 2021; Lee et al., 2021) and smooth convex-concave minimax optimization (Diakonikolas & Wang, 2021; Yoon & Ryu, 2021) were presented.

Recently, it was discovered that acceleration is also possible in solving monotone inclusions. The accelerated proximal point method (APPM) (Kim, 2021) provides an accelerated O(1/k²)-rate of ∥𝔸̃x_k∥² compared to the O(1/k)-rate of the

proximal point method (PPM) (Martinet, 1970; Gu & Yang, 2020) for monotone inclusions. Maingé (2021) improved this rate to an o(1/k²)-rate with another accelerated variant of the proximal point method called CRIPA-S.

Complexity lower bound. Under the information-based complexity framework (Nemirovski, 1992), complexity lower bounds on first-order methods for convex optimization have been thoroughly studied (Nesterov, 2004; Drori, 2017; Drori & Shamir, 2020; Carmon et al., 2020; 2021; Drori & Taylor, 2022). When a complexity lower bound matches an algorithm's guarantee, it establishes the optimality of the algorithm (Nemirovski, 1992; Drori & Teboulle, 2016; Kim & Fessler, 2016a; Taylor & Drori, 2021; Yoon & Ryu, 2021; Salim et al., 2022). In the fixed-point theory literature, Diakonikolas (2020) provided a lower bound result for the rate of ⟨𝔸x_k, x_k − x_⋆⟩ for variational inequalities with a Lipschitz, monotone operator. Colao & Marino (2021) showed an Ω(1/k^{2−2/q}) lower bound on ∥y_k − y_⋆∥² for Halpern iterations in q-uniformly smooth Banach spaces. Recently, there has been work establishing complexity lower bounds for the more restrictive "1-SCLI" class of algorithms (Arjevani et al., 2016). The class of 1-SCLI fixed-point iterations includes the KM iteration but not Halpern. Up-to-constant optimality of the KM iteration among 1-SCLI algorithms was proved with the Ω(1/k) lower bound by Diakonikolas & Wang (2021).

There has also been recent work on lower bounds for the general class of algorithms (not just 1-SCLI) for fixed-point problems. Contreras & Cominetti (2021) established an Ω(1/k²) lower bound on the fixed-point residual for the general Mann iteration, which includes the KM and Halpern iterations, in Banach spaces. Our Ω(1/k²) lower bound of Section 4 is more general than the result of Contreras & Cominetti (2021), as it applies to all deterministic algorithms, not just Mann iterations. Diakonikolas & Wang (2021) established an Ω(1/k²) lower bound on the squared operator norm for algorithms finding zeros of cocoercive operators, which are equivalent to methods finding fixed points of nonexpansive operators. Our lower bound of Section 4 improves upon this result (by a constant of about 80) and establishes exact optimality of the methods in Section 3.

Acceleration with restart. Restarting is a technique that allows one to render a standard accelerated method adaptive to the local structure (Nemirovski & Nesterov, 1985; Nesterov, 2013; Lin & Xiao, 2014; O'Donoghue & Candes, 2015; Kim & Fessler, 2018; Fercoq & Qu, 2019; Roulet & d'Aspremont, 2020; Ito & Fukuda, 2021). Our method of Section 5 was inspired specifically by the restarting scheme of Roulet & d'Aspremont (2020).

Performance estimation problem. The discovery of the main algorithm of Section 3 heavily relied on the use of the performance estimation problem (PEP) technique (Drori & Teboulle, 2014). Loosely speaking, the PEP is a computer-assisted methodology for finding optimal methods by numerically solving semidefinite programs (Drori & Teboulle, 2014; Kim & Fessler, 2016a; Taylor et al., 2018; Drori & Taylor, 2020; Kim & Fessler, 2021). We discuss the details of our use of the PEP in Section C of the appendix.

1.3. Contributions

We summarize the contributions of this work as follows. First, we present a novel accelerated fixed-point iteration (OC-Halpern) and its equivalent form (OS-PPM) for monotone inclusions. Second, we present exactly matching complexity lower bounds and thereby establish the exact optimality of our presented methods. Third, using a restarting mechanism, we extend the acceleration to a broader setup with operators satisfying a Hölder-type growth condition. Finally, we demonstrate the effectiveness of the proposed acceleration mechanism through extensive experiments.

2. Equivalence of nonexpansive operators and monotone operators

Before presenting the main content, we quickly establish the equivalence between the fixed-point problem

find_{y ∈ R^n}  y = 𝕋y

and the monotone inclusion

find_{x ∈ R^n}  0 ∈ 𝔸x,

where 𝕋 : R^n → R^n is (1/γ)-Lipschitz with γ ≥ 1 and 𝔸 : R^n ⇒ R^n is maximally µ-strongly monotone with µ ≥ 0.

Lemma 2.1. Let 𝕋 : R^n → R^n and 𝔸 : R^n ⇒ R^n. If 𝕋 is (1/γ)-Lipschitz with γ ≥ 1, then

𝔸 = (𝕋 + (1/γ)𝕀)^{−1} (1 + 1/γ) − 𝕀

is maximally ((γ−1)/2)-strongly monotone. Likewise, if 𝔸 is maximally µ-strongly monotone with µ ≥ 0, then

𝕋 = (1 + 1/(1+2µ)) 𝕁_𝔸 − (1/(1+2µ)) 𝕀

is 1/(1+2µ)-Lipschitz. Under these transformations, x_⋆ is a zero of 𝔸 if and only if it is a fixed point of 𝕋, i.e., Zer 𝔸 = Fix 𝕋.

The equivalence in the case of γ = 1 and µ = 0 is well known in the optimization literature (Bauschke & Combettes, 2017,

Theorem 23.8) (Bauschke et al., 2012; Combettes, 2018). This lemma generalizes the equivalence to γ ≥ 1 and µ ≥ 0. As we see in Section A of the appendix, the equivalence is straightforwardly established using the scaled relative graph (SRG) (Ryu et al., 2021), but we also provide a classical proof based on inequalities without using the SRG.

Remark. Since 𝕀 − 𝕋 = (1 + 1/(1+2µ))(𝕀 − 𝕁_𝔸), finding an algorithm that effectively reduces ∥y_{N−1} − 𝕋y_{N−1}∥² for the fixed-point problem is equivalent to finding an algorithm that effectively reduces ∥𝔸̃x_N∥² for monotone inclusions.

3. Exact optimal methods

We now present our methods and their accelerated rates.

For a (1/γ)-contractive operator 𝕋 : R^n → R^n, the Optimal Contractive Halpern (OC-Halpern) iteration is

y_k = (1 − 1/φ_k) 𝕋y_{k−1} + (1/φ_k) y_0    (OC-Halpern)

for k = 1, 2, . . . , where φ_k = Σ_{i=0}^{k} γ^{2i} and y_0 ∈ R^n is a starting point. For a maximally µ-strongly monotone operator 𝔸 : R^n ⇒ R^n, the Optimal Strongly-monotone Proximal Point Method (OS-PPM) is

x_k = 𝕁_𝔸 y_{k−1}    (OS-PPM)
y_k = x_k + ((φ_{k−1} − 1)/φ_k)(x_k − x_{k−1}) − (2µφ_{k−1}/φ_k)(y_{k−1} − x_k) + ((1 + 2µ)φ_{k−2}/φ_k)(y_{k−2} − x_{k−1})

for k = 1, 2, . . . , where φ_k = Σ_{i=0}^{k} (1 + 2µ)^{2i}, φ_{−1} = 0, and x_0 = y_0 = y_{−1} ∈ R^n is a starting point. These two methods are equivalent.

Lemma 3.1. Suppose γ = 1 + 2µ. Let 𝔸 = (𝕋 + (1/γ)𝕀)^{−1}(1 + 1/γ) − 𝕀 given 𝕋, or equivalently let 𝕋 = (1 + 1/(1+2µ))𝕁_𝔸 − (1/(1+2µ))𝕀 given 𝔸. Then the y_k-iterates of (OC-Halpern) and (OS-PPM) are identical provided they start from the same initial point y_0 = ỹ_0.

We now state the convergence rates.

Theorem 3.2. Let 𝔸 : R^n ⇒ R^n be maximally µ-strongly monotone with µ ≥ 0. Assume 𝔸 has a zero and let x_⋆ ∈ Zer 𝔸. For N = 1, 2, . . . , (OS-PPM) exhibits the rate

∥𝔸̃x_N∥² ≤ (1 / Σ_{k=0}^{N−1} (1+2µ)^k)² ∥y_0 − x_⋆∥².

Corollary 3.3. Let 𝕋 : R^n → R^n be γ^{−1}-contractive with γ ≥ 1. Assume 𝕋 has a fixed point and let y_⋆ ∈ Fix 𝕋. For N = 0, 1, . . . , (OC-Halpern) exhibits the rate

∥y_N − 𝕋y_N∥² ≤ (1 + 1/γ)² (1 / Σ_{k=0}^{N} γ^k)² ∥y_0 − y_⋆∥².

When 𝔸 is strongly monotone (µ > 0), (OS-PPM) exhibits an accelerated O(e^{−4µN})-rate compared to the O(e^{−2µN})-rate of the proximal point method (PPM) (Rockafellar, 1976; Bauschke & Combettes, 2017). When 𝕋 is contractive (γ > 1), both (OC-Halpern) and the Picard iteration exhibit O(γ^{−2N})-rates on the squared fixed-point residual. In fact, the Picard iteration with the 𝕋 of Lemma 2.1 instead of 𝕁_𝔸 is faster than the regular PPM and achieves an O(e^{−4µN})-rate. (OC-Halpern) is exactly optimal and is faster than Picard in higher-order terms hidden in the big-O notation. To clarify, the O considers the regime µ → 0.

When 𝔸 is not strongly monotone (µ = 0) or 𝕋 is not contractive (γ = 1), (OS-PPM) and (OC-Halpern) respectively reduce to the accelerated PPM (APPM) of Kim (2021) and the Halpern iteration of Lieder (2021), sharing the same O(1/N²)-rate. In this paper, we refer to the method of Lieder (2021) as the optimized Halpern method (OHM).

The discovery of (OC-Halpern) and (OS-PPM) was assisted by the performance estimation problem (Drori & Teboulle, 2014; Kim & Fessler, 2016b; Taylor et al., 2017; Drori & Taylor, 2020; Ryu et al., 2020; Kim & Fessler, 2021; Park & Ryu, 2021). The details are discussed in Section C of the appendix.

3.1. Proof outline of Theorem 3.2

Here, we quickly outline the proof of Theorem 3.2 while deferring the full proof to Section B of the appendix.

Define the Lyapunov function

V^k = (1 + γ^{−k}) [ (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² + 2 (Σ_{n=0}^{k−1} γ^n) ⟨𝔸̃x_k − µ(x_k − x_⋆), x_k − x_⋆⟩ + γ^{−k} ∥ (Σ_{n=0}^{k−1} γ^n) 𝔸̃x_k − γ^k (x_k − x_⋆) + (x_k − y_0) ∥² ] + (1 − γ^{−k}) ∥y_0 − x_⋆∥²    (OS-PPM-Lyapunov)

for k = 0, 1, . . . , where γ = 1 + 2µ and 𝔸̃x_k = y_{k−1} − x_k ∈ 𝔸x_k. After some calculations (deferred to the appendix), we use the µ-strong monotonicity of 𝔸 to conclude

V^{k+1} − V^k = −2γ^{−2k} (1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_{k+1} − 𝔸̃x_k − µ(x_{k+1} − x_k), x_{k+1} − x_k⟩ ≤ 0.

Therefore,

V^N ≤ V^{N−1} ≤ · · · ≤ V^0 = 2∥y_0 − x_⋆∥²

and we conclude

∥𝔸̃x_N∥² ≤ (1 / Σ_{k=0}^{N−1} γ^k)² ∥y_0 − x_⋆∥².

4. Complexity lower bound

We now establish the exact optimality of (OC-Halpern) and (OS-PPM) through a matching complexity lower bound. By exact, we mean that the lower bound is exactly equal to the upper bounds of Theorem 3.2 and Corollary 3.3.

Theorem 4.1. For n ≥ N + 1 and any initial point y_0 ∈ R^n, there exists a (1/γ)-Lipschitz operator 𝕋 : R^n → R^n with a fixed point y_⋆ ∈ Fix 𝕋 such that

∥y_N − 𝕋y_N∥² ≥ (1 + 1/γ)² (1 / Σ_{k=0}^{N} γ^k)² ∥y_0 − y_⋆∥²

for any iterates {y_k}_{k=0}^{N} satisfying

y_k ∈ y_0 + span{y_0 − 𝕋y_0, y_1 − 𝕋y_1, . . . , y_{k−1} − 𝕋y_{k−1}}

for k = 1, . . . , N.

The following corollary translates Theorem 4.1 to an equivalent complexity lower bound for proximal point methods in monotone inclusions.

Corollary 4.2. For n ≥ N and any initial point x_0 = y_0 ∈ R^n, there exists a maximally µ-strongly monotone operator 𝔸 : R^n ⇒ R^n with a zero x_⋆ ∈ Zer 𝔸 such that

∥𝔸̃x_N∥² ≥ (1 / Σ_{k=0}^{N−1} (1+2µ)^k)² ∥y_0 − x_⋆∥²

for any iterates {x_k}_{k=0,1,...} and {y_k}_{k=0,1,...} satisfying

x_k = 𝕁_𝔸 y_{k−1}
y_k ∈ y_0 + span{𝔸̃x_1, 𝔸̃x_2, . . . , 𝔸̃x_k}

for k = 1, . . . , N, where 𝔸̃x_k = y_{k−1} − x_k.

(OC-Halpern) and (OS-PPM) satisfy the span assumptions stated in Theorem 4.1 and Corollary 4.2, respectively. Therefore, the rates of (OC-Halpern) and (OS-PPM) are exactly optimal. The lower bounds in the cases where γ = 1 and µ = 0 establish that the prior rates of OHM (Lieder, 2021) and APPM (Kim, 2021) are exactly optimal. To clarify, the lower bound is novel even for the case γ = 1 and µ = 0.

4.1. Construction of the worst-case operator

We now describe the construction of the worst-case operator, while deferring the proofs to Section D of the appendix. Let e_k be the canonical basis vector with 1 at the k-th entry and 0 at the remaining entries.

Lemma 4.3. 𝕋 is (1/γ)-contractive if and only if 𝔾 = (γ/(1+γ))(𝕀 − 𝕋) is 1/(1+γ)-averaged.

By Lemma 4.3, finding the worst-case (1/γ)-contractive operator 𝕋 is equivalent to finding the worst-case 1/(1+γ)-averaged operator 𝔾, which we define in the following lemma.

Lemma 4.4. Let R > 0. Define ℕ, 𝔾 : R^{N+1} → R^{N+1} as

ℕ(x_1, x_2, . . . , x_N, x_{N+1}) = (x_{N+1}, −x_1, −x_2, . . . , −x_N) − ((1 + γ^{N+1}) / √(1 + γ² + · · · + γ^{2N})) R e_1

and

𝔾 = (1/(1+γ)) ℕ + (γ/(1+γ)) 𝕀.

That is, 𝔾x = Hx − b, where H := (1/(1+γ)) M with M the (N+1)×(N+1) matrix with γ on the diagonal, −1 on the subdiagonal, 1 in the (1, N+1) entry, and zeros elsewhere, and

b := (1/(1+γ)) ((1 + γ^{N+1}) / √(1 + γ² + · · · + γ^{2N})) R e_1.

Then ℕ is nonexpansive, and 𝔾 is 1/(1+γ)-averaged.

The following lemma states that iterates {y_k}_{k=0}^{N} generated with respect to 𝔾 under the proper span condition have gradually expanding support.

Lemma 4.5. Let 𝔾 : R^{N+1} → R^{N+1} be defined as in Lemma 4.4. For any {y_k}_{k=0}^{N} with y_0 = 0 satisfying

y_k ∈ y_0 + span{𝔾y_0, 𝔾y_1, . . . , 𝔾y_{k−1}},  k = 1, . . . , N,

we have

y_k ∈ span{e_1, e_2, . . . , e_k},
𝔾y_k ∈ span{e_1, e_2, . . . , e_{k+1}},  k = 0, . . . , N.

4.2. Proof outline of Theorem 4.1

Let 𝕋_0 : R^n → R^n be the worst-case (1/γ)-contraction for initial point 0. For any given y_0 ∈ R^n, we show in Section D of the appendix that 𝕋 : R^n → R^n defined as 𝕋(·) = 𝕋_0(· − y_0) + y_0 becomes the worst-case (1/γ)-contraction with initial point y_0 ∈ R^n. Therefore, it suffices to consider the case y_0 = 0.

Define 𝔾, H, and b as in Lemma 4.4. By Lemma 4.3, 𝕋 = 𝕀 − ((1+γ)/γ) 𝔾 is a (1/γ)-contraction. Note that H is invertible, as

we can use Gaussian elimination on H to obtain an upper triangular matrix with nonzero diagonals. This makes 𝔾 an invertible affine operator with the unique zero

y_⋆ = (R / √(1 + γ² + · · · + γ^{2N})) (γ^N, γ^{N−1}, . . . , γ, 1)^⊺.

So Fix 𝕋 = Zer 𝔾 = {y_⋆} and ∥y_0 − y_⋆∥ = ∥y_⋆∥ = R.

Let the iterates {y_k}_{k=0}^{N} satisfy the span condition of Theorem 4.1, which is equivalent to

y_k ∈ y_0 + span{𝔾y_0, 𝔾y_1, . . . , 𝔾y_{k−1}},  k = 1, . . . , N.

By Lemma 4.5, y_N ∈ span{e_1, . . . , e_N}. Therefore

𝔾y_N = Hy_N − b ∈ span{He_1, . . . , He_N} − b

and

∥𝔾y_N∥² ≥ ∥P_{span{He_1, ..., He_N}^⊥}(b)∥²,

where P_V is the orthogonal projection onto the subspace V. As span{He_1, . . . , He_N}^⊥ = span{v} with

v = (1, γ, . . . , γ^{N−1}, γ^N)^⊺,

we get

∥𝔾y_N∥² ≥ ∥P_{span{v}}(b)∥² = ∥(⟨b, v⟩/⟨v, v⟩) v∥² (∗)= (1 / Σ_{k=0}^{N} γ^k)² R²,

where (∗) is established in Section D of the appendix. Finally,

∥y_N − 𝕋y_N∥² = (1 + 1/γ)² ∥𝔾y_N∥² ≥ (1 + 1/γ)² (1 / Σ_{k=0}^{N} γ^k)² R² = (1 + 1/γ)² (1 / Σ_{k=0}^{N} γ^k)² ∥y_0 − y_⋆∥².

4.3. Generalized complexity lower bound result

In order to extend the lower bound results of Theorem 4.1 and Corollary 4.2 to general deterministic fixed-point iterations and proximal point methods (which do not necessarily satisfy the span condition), we use the resisting oracle technique of Nemirovski & Yudin (1983). Here, we quickly state the result while deferring the proofs to Section D of the appendix.

Theorem 4.6. Let n ≥ 2N for N ∈ N. For any deterministic fixed-point iteration A and any initial point y_0 ∈ R^n, there exists a (1/γ)-Lipschitz operator 𝕋 : R^n → R^n with a fixed point y_⋆ ∈ Fix 𝕋 such that

∥y_N − 𝕋y_N∥² ≥ (1 + 1/γ)² (1 / Σ_{k=0}^{N} γ^k)² ∥y_0 − y_⋆∥²,

where {y_t}_{t∈N} = A[y_0; 𝕋].

5. Acceleration under Hölder-type growth condition

While (OS-PPM) provides an accelerated rate when the underlying operator is monotone or strongly monotone, many operators encountered in practice have a structure lying between these two assumptions. For (OC-Halpern), this corresponds to a fixed-point operator that is not strictly contractive but has structure stronger than nonexpansiveness. In this section, we accelerate the proximal point method when the underlying operator is uniformly monotone, an assumption weaker than strong monotonicity but stronger than monotonicity.

We say an operator 𝔸 : R^n ⇒ R^n is uniformly monotone with parameters µ > 0 and α > 1 if it is monotone and

⟨𝔸x, x − x_⋆⟩ ≥ µ∥x − x_⋆∥^{α+1}

for any x ∈ R^n and x_⋆ ∈ Zer 𝔸. This is a special case of uniform monotonicity in Bauschke & Combettes (2017, Definition 22.1). We also refer to this as a Hölder-type growth condition, as it resembles the Hölderian error bound condition with function-value suboptimality replaced by ⟨𝔸x, x − x_⋆⟩ (Lojasiewicz, 1963; Bolte et al., 2017).

The following theorem establishes a convergence rate of the (unaccelerated) proximal point method. This rate serves as a baseline to improve upon with acceleration.

Theorem 5.1. Let 𝔸 : R^n ⇒ R^n be uniformly monotone with parameters µ > 0 and α > 1. Let x_⋆ ∈ Zer 𝔸. Then the iterates {x_k}_{k=0}^{N} generated by the proximal point method x_{k+1} = 𝕁_𝔸 x_k starting from x_0 ∈ R^n satisfy

∥𝔸̃x_N∥² ≤ 2^{(α+3)/(α−1)} max{ µ^{−2/(α−1)}, ∥x_0 − x_⋆∥² } / N^{(α+1)/(α−1)}

for N ∈ N, where 𝔸̃x_N = x_{N−1} − x_N.

We now present an accelerated method based on (OS-PPM) and restarting (Nesterov, 2013; Roulet & d'Aspremont, 2020). Given a uniformly monotone operator 𝔸 : R^n ⇒ R^n with µ > 0 and α > 1, x_⋆ ∈ Zer 𝔸, and an initial point x_0 ∈ R^n, Restarted OS-PPM is:

x̃_0 = 𝕁_𝔸 x_0    (OS-PPMres0)
x̃_k ← OS-PPM_0(x̃_{k−1}, t_k),  k = 1, . . . , R,

where OS-PPM_0(x̃_{k−1}, t_k) is the execution of t_k iterations of (OS-PPM) with µ = 0 starting from x̃_{k−1}. The following theorem provides a restarting schedule, i.e., specified values of t_1, . . . , t_R, and an accelerated rate.
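Before stating the schedule, the restart loop just defined can be sketched on the fixed-point side, where each stage is (OC-Halpern) with γ = 1, i.e., OHM. This is a minimal illustration with a generic operator T and the stage lengths {t_k} taken as given; the function names are ours, not the paper's.

```python
import numpy as np

def ohm_stage(T, y0, t):
    """One stage: t iterations of OC-Halpern with gamma = 1 (phi_k = k + 1),
    i.e., the OHM update y_k = (1 - 1/(k+1)) T(y_{k-1}) + (1/(k+1)) y0."""
    y = y0
    for k in range(1, t + 1):
        y = (1 - 1 / (k + 1)) * T(y) + (1 / (k + 1)) * y0
    return y

def restarted_halpern(T, y0, schedule):
    """Restarted scheme: each stage re-anchors at the previous stage's output,
    mirroring x~_k <- OS-PPM_0(x~_{k-1}, t_k) on the fixed-point side."""
    y = y0
    for t in schedule:
        y = ohm_stage(T, y, t)
    return y
```

For a contractive T, each re-anchoring moves the anchor closer to the fixed point, so later stages start from a much better reference point than y_0.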



(a) Fixed-point residual of 𝕋_θ.  (b) Resolvent residual norm of 𝕄.

Figure 1. Fixed-point and resolvent residuals versus iteration count for the 2D toy example of Section 6.1. Here, γ = 1/0.95 = 1.0526, µ = 0.035, θ = 15°, and N = 101. Indeed, (OC-Halpern) and (OS-PPM) exhibit the fastest rates.

(a) Trajectory of 𝕋_θ.  (b) Trajectory of 𝕄.

Figure 2. Trajectories of iterates for the 2D toy example of Section 6.1. Here, γ = 1/0.95 = 1.0526, µ = 0.035, θ = 15°, and N = 101. A marker is placed at every iterate. Picard and PPM are slowed down by the cyclic behavior. Halpern and APPM dampen the cycling behavior, but do so too aggressively. The fastest rate is achieved by (OC-Halpern) and (OS-PPM), which appears to be due to the cycling behavior being optimally dampened.

Theorem 5.2. Let 𝔸 : R^n ⇒ R^n be uniformly monotone with parameters µ > 0 and α > 1, x_⋆ ∈ Zer 𝔸, and let N be the total number of iterations. Define

λ = (e/µ)^{1/α} ∥x_0 − x_⋆∥^{1−1/α},  β = 1 − 1/α.

Let R ∈ N be an integer satisfying

Σ_{k=1}^{R} ⌈λe^{βk}⌉ ≤ N − 1 < Σ_{k=1}^{R+1} ⌈λe^{βk}⌉,

and let t_k be defined as

t_k = ⌈λe^{βk}⌉ for k = 1, . . . , R − 1,  t_R = N − 1 − Σ_{k=1}^{R−1} t_k.

Then (OS-PPMres0) exhibits the rate

∥𝔸̃x_N∥² ≤ ( ((e^β − 1)/(λe^{2β})) (N − 1 − (1/β) log(λe^β)) + 1 )^{−2α/(α−1)} ∥x_0 − x_⋆∥² = O( N^{−2α/(α−1)} ).

The proofs of Theorems 5.1 and 5.2 are presented in Section E of the appendix. When the values of α, µ, and ∥x_0 − x_⋆∥² are unknown, as is the case in most practical setups, one can use a grid search as in Roulet & d'Aspremont (2020) and retain the O( N^{−2α/(α−1)} (log N)² )-rate. Using Lemma 4.3, (OS-PPMres0) can be translated into a restarted OC-Halpern method. The experiments of Section 6 indicate that (OS-PPMres0) does provide an acceleration in cases where (OS-PPM) by itself does not.
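The schedule in Theorem 5.2 is fully determined by µ, α, ∥x_0 − x_⋆∥, and the iteration budget N. The following sketch computes it under the stated definitions λ = (e/µ)^{1/α} ∥x_0 − x_⋆∥^{1−1/α} and β = 1 − 1/α, with the last stage absorbing the leftover budget; the function name and interface are ours.

```python
import math

def restart_schedule(mu, alpha, dist0, N):
    """Stage lengths (t_1, ..., t_R) of Theorem 5.2: t_k = ceil(lam * e^{beta k})
    for k < R, and t_R = N - 1 - sum_{k<R} t_k, where R is the largest integer
    with sum_{k<=R} ceil(lam * e^{beta k}) <= N - 1."""
    beta = 1.0 - 1.0 / alpha
    lam = (math.e / mu) ** (1.0 / alpha) * dist0 ** (1.0 - 1.0 / alpha)
    stages, total, k = [], 0, 1
    while total + math.ceil(lam * math.exp(beta * k)) <= N - 1:
        stages.append(math.ceil(lam * math.exp(beta * k)))
        total += stages[-1]
        k += 1
    if stages:                      # last stage absorbs the remaining budget
        stages[-1] += N - 1 - total
    return stages
```

If even t_1 exceeds the budget, the list is empty and one would fall back to a single unrestarted run; when µ, α, and ∥x_0 − x_⋆∥ are unknown, the grid search mentioned above replaces this closed-form schedule.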

6. Experiments

We now present experiments with illustrative toy examples and real-world problems in medical imaging, optimal transport, and decentralized compressed sensing. Further experimental details are provided in Section F of the appendix.

6.1. Illustrative 2D toy examples

Consider a (1/γ)-contractive operator 𝕋_θ : R² → R²

𝕋_θ (x_1, x_2) = (1/γ) (cos θ · x_1 − sin θ · x_2,  sin θ · x_1 + cos θ · x_2)

and a maximally µ-strongly monotone operator 𝕄 : R² → R²

𝕄 (x_1, x_2) = (1/(N−1)) (x_2, −x_1) + µ (x_1, x_2).

𝕋_θ is a counterclockwise θ-rotation followed by (1/γ)-scaling on the 2D plane, and 𝕄 is a linear combination of the worst-case instances of the proximal point method applied to monotone operators (Gu & Yang, 2020) and µ-strongly monotone operators (Rockafellar, 1976). The results of Figure 1 indicate that (OC-Halpern) and (OS-PPM) indeed provide acceleration.

6.2. Computed tomography (CT) imaging

Consider the medical imaging application of total-variation-regularized computed tomography (CT), which solves

minimize_{x ∈ R^n}  (1/2)∥Ex − b∥² + λ∥Dx∥_1,

where x ∈ R^n is a vectorized image, E ∈ R^{m×n} is the discrete Radon transform, b = Ex is the measurement, and D is the finite difference operator. We use the primal-dual hybrid gradient (PDHG) method (Zhu & Chan, 2008; Pock et al., 2009; Esser et al., 2010; Chambolle & Pock, 2011), an instance of a nonexpansive fixed-point iteration via variable metric PPM (He & Yuan, 2012). The results of Figure 3(a) indicate that restarted OC-Halpern (OS-PPMres0) provides an acceleration.

6.3. Earth mover's distance

Consider the earth mover's distance between two probability measures, also referred to as the Wasserstein distance or the optimal transport problem. The distance is defined through the discretized optimization problem

minimize_{m_x, m_y}  ∥m∥_{1,1} = Σ_{i=1}^{n} Σ_{j=1}^{n} |m_{x,ij}| + |m_{y,ij}|
subject to  div(m) + ρ_1 − ρ_0 = 0,

where ρ_0, ρ_1 are probability measures on R^{n×n}, div is a discrete divergence operator, and m = (m_x, m_y) ∈ R^{(n−1)×n} × R^{n×(n−1)} is the optimization variable. We use the algorithm of Li et al. (2018), an instance of a nonexpansive fixed-point iteration via PDHG. The results of Figure 3(b) indicate that restarted OC-Halpern (OS-PPMres0) provides an acceleration.

6.4. Decentralized optimization with PG-EXTRA

Consider a decentralized optimization setting where each agent i ∈ {1, 2, . . . , n} has access to the sensing matrix A^{(i)} ∈ R^{m_i×n} and the noisy measurement b^{(i)} ≈ A^{(i)}x. The goal is to recover the sparse signal x ∈ R^n by solving the following compressed sensing problem:

minimize_{x ∈ R^n}  (1/n) Σ_{i=1}^{n} ∥A^{(i)}x − b^{(i)}∥² + λ∥x∥_1.

We use PG-EXTRA (Shi et al., 2015), which is an instance of a nonexpansive fixed-point iteration via the Condat–Vũ (Condat, 2013; Vũ, 2013) splitting method (Wu et al., 2018). The results of Figure 3(c) indicate that restarted OC-Halpern (OS-PPMres0) provides an acceleration.

7. Conclusion

This work presents an acceleration mechanism for fixed-point iterations and provides an exact matching complexity lower bound. The acceleration mechanism is an instance of Halpern's method, also referred to as anchoring, and the complexity lower bound is based on an explicit construction satisfying the zero-chain condition.

In this work, we measure the suboptimality of iterates with the fixed-point residual. However, the fixed-point iteration is a meta-algorithm, and almost all instances of it have further specific structure and suboptimality measures that are better suited for the particular problem of interest, such as function-value suboptimality, infeasibility for constrained problems, and primal-dual gap for minimax problems. Therefore, the fact that our proposed method accelerates the reduction of the fixed-point residual does not necessarily imply that it accelerates the reduction of the problem-specific suboptimality measure of practical interest.

Interestingly, the experimental results of Sections 6 and F indicate that our proposed acceleration does indeed provide a benefit in practice. This raises the following question: Under what setups can we expect anchoring-based acceleration to theoretically provide a benefit in terms of other suboptimality measures? Investigating this question would be an interesting direction of future work.
Exact Optimal Accelerated Complexity for Fixed-Point Iterations

[Figure 3: semilog plots of the fixed-point residual ∥x_k − 𝕋x_k∥² versus iteration count for (a) CT imaging (PDHG, OHM, Restarted OC-Halpern), (b) earth mover's distance (PDHG, OHM, Restarted OC-Halpern), and (c) decentralized compressed sensing (PG-EXTRA, OHM, OC-Halpern, Restarted OC-Halpern).]

Figure 3. Reduction of fixed-point residuals. The norms are the norms under which the fixed-point operator 𝕋 is nonexpansive.

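As a small self-contained illustration of the anchoring mechanism compared in these experiments, the sketch below runs the plain fixed-point (Picard) iteration and the Halpern (anchoring) iteration on a toy nonexpansive operator. The operator, a plane rotation with unique fixed point 0, is an assumed illustrative example and not one of the paper's experimental operators.

```python
import numpy as np

# Toy nonexpansive operator: a plane rotation (orthogonal map), fixed point 0.
theta = 0.5
c, s = np.cos(theta), np.sin(theta)
T = np.array([[c, -s], [s, c]])

def residual(x):
    return np.linalg.norm(x - T @ x)

x0 = np.array([1.0, 0.0])
K = 2000

# Plain fixed-point (Picard) iteration: x_{k+1} = T x_k.
# A rotation preserves norms, so the residual never decreases.
x = x0.copy()
for k in range(K):
    x = T @ x
res_picard = residual(x)

# Halpern iteration (anchoring): x_{k+1} = (1/(k+2)) x_0 + ((k+1)/(k+2)) T x_k,
# the stepsize choice attaining the O(1/k) residual rate of Lieder (2021).
x = x0.copy()
for k in range(K):
    x = x0 / (k + 2) + (k + 1) / (k + 2) * (T @ x)
res_halpern = residual(x)

print(res_picard, res_halpern)
```

On this example the Picard residual stays at its initial value while anchoring drives it to zero, mirroring the qualitative gap seen in Figure 3.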
Acknowledgements

JP and EKR were supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) [No. 2022R1C1C1010010], the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) [No. 2022R1A5A6000840], and the Samsung Science and Technology Foundation (Project Number SSTF-BA2101-02). We thank TaeHo Yoon for providing careful reviews and valuable feedback. We thank Jelena Diakonikolas for the discussion on the prior work on complexity lower bounds of the fixed-point iterations. Finally, we thank the anonymous reviewers for their thoughtful comments.

References

Anderson, D. G. Iterative procedures for nonlinear integral equations. Journal of the ACM, 12(4):547–560, 1965.

Arjevani, Y., Shalev-Shwartz, S., and Shamir, O. On lower and upper bounds in smooth and strongly convex optimization. The Journal of Machine Learning Research, 17(1):4303–4353, 2016.

Baillon, J.-B. and Bruck, R. E. Optimal rates of asymptotic regularity for averaged nonexpansive mappings. Fixed Point Theory and Applications, 128:27–66, 1992.

Banach, S. Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fundamenta Mathematicae, 3(1):133–181, 1922.

Barré, M., Taylor, A., and d'Aspremont, A. Convergence of constrained Anderson acceleration. arXiv preprint arXiv:2010.15482, 2020.

Bauschke, H. H. and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, second edition, 2017.

Bauschke, H. H., Moffat, S. M., and Wang, X. Firmly nonexpansive mappings and maximally monotone operators: correspondence and duality. Set-Valued and Variational Analysis, 20:131–153, 2012.

Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

Berinde, V. and Takens, F. Iterative Approximation of Fixed Points. Springer, 2007.

Bertrand, Q. and Massias, M. Anderson acceleration of coordinate descent. AISTATS, pp. 1288–1296, 2021.

Bolte, J., Nguyen, T. P., Peypouquet, J., and Suter, B. W. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165:471–507, 2017.

Borwein, J., Reich, S., and Shafrir, I. Krasnoselski-Mann iterations in normed spaces. Canadian Mathematical Bulletin, 35(1):21–28, 1992.

Borwein, J. M., Li, G., and Tam, M. K. Convergence rate analysis for averaged fixed point iterations in common fixed point problems. SIAM Journal on Optimization, 27(1):1–33, 2017.

Bravo, M. and Cominetti, R. Sharp convergence rates for averaged nonexpansive maps. Israel Journal of Mathematics, 227:163–188, 2018.

Brezinski, C. Convergence acceleration during the 20th century. Journal of Computational and Applied Mathematics, 122(1-2):1–21, 2000.

Browder, F. E. and Petryshyn, W. The solution by iteration of nonlinear functional equations in Banach spaces. Bulletin of the American Mathematical Society, 72(3):571–575, 1966.

Bruck Jr, R. E. On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 61(1):159–164, 1977.

Carmon, Y., Duchi, J. C., Hinder, O., and Sidford, A. Lower bounds for finding stationary points I. Mathematical Programming, 184:71–120, 2020.

Carmon, Y., Duchi, J. C., Hinder, O., and Sidford, A. Lower bounds for finding stationary points II: first-order methods. Mathematical Programming, 185:315–355, 2021.

Cauchy, A.-L. Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes rendus de l'Académie des Sciences, 25:536–538, 1847.

Chambolle, A. and Pock, T. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.

Colao, V. and Marino, G. On the rate of convergence of Halpern iterations. Journal of Nonlinear and Convex Analysis, 22(12):2639–2646, 2021.

Combettes, P. L. Monotone operator theory in convex optimization. Mathematical Programming, 170(1):177–206, 2018.

Cominetti, R., Soto, J. A., and Vaisman, J. On the rate of convergence of Krasnosel'skiĭ-Mann iterations and their connection with sums of Bernoullis. Israel Journal of Mathematics, 199(2):757–772, 2014.

Condat, L. A primal–dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications, 158(2):460–479, 2013.

Contreras, J. P. and Cominetti, R. Optimal error bounds for nonexpansive fixed-point iterations in normed spaces. arXiv preprint arXiv:2108.10969, 2021.

Davis, D. and Yin, W. Convergence rate analysis of several splitting schemes. In Glowinski, R., Osher, S. J., and Yin, W. (eds.), Splitting Methods in Communication, Imaging, Science, and Engineering, pp. 115–163. Springer, 2016.

Davis, D. and Yin, W. A three-operator splitting scheme and its optimization applications. Set-Valued and Variational Analysis, 25(4):829–858, 2017.

Diakonikolas, J. Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities. COLT, 2020.

Diakonikolas, J. and Wang, P. Potential function-based framework for making the gradients small in convex and min-max optimization. arXiv preprint arXiv:2101.12101, 2021.

Dong, Q., Yuan, H., Cho, Y., and Rassias, T. M. Modified inertial Mann algorithm and inertial CQ-algorithm for nonexpansive mappings. Optimization Letters, 12(1):87–102, 2018.

Douglas, J. and Rachford, H. H. On the numerical solution of heat conduction problems in two and three space variables. Transactions of the American Mathematical Society, 82(2):421–439, 1956.

Drori, Y. The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39:1–16, 2017.

Drori, Y. and Shamir, O. The complexity of finding stationary points with stochastic gradient descent. ICML, 2020.

Drori, Y. and Taylor, A. On the oracle complexity of smooth strongly convex minimization. Journal of Complexity, 68, 2022.

Drori, Y. and Taylor, A. B. Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming, 184(1–2):183–220, 2020.

Drori, Y. and Teboulle, M. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1–2):451–482, 2014.

Drori, Y. and Teboulle, M. An optimal variant of Kelley's cutting-plane method. Mathematical Programming, 160(1–2):321–351, 2016.

Esser, E., Zhang, X., and Chan, T. F. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010.

Fercoq, O. and Qu, Z. Adaptive restart of accelerated gradient methods under local quadratic growth condition. IMA Journal of Numerical Analysis, 39(4):2069–2095, 2019.

Gabay, D. and Mercier, B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.

Gu, G. and Yang, J. Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems. SIAM Journal on Optimization, 30(3):1905–1921, 2020.

Güler, O. New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664, 1992.

Halpern, B. Fixed points of nonexpanding maps. Bulletin of the American Mathematical Society, 73(6):957–961, 1967.

He, B. and Yuan, X. Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences, 5(1):119–149, 2012.

Hestenes, M. R. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303–320, 1969.

Ishikawa, S. Fixed points and iteration of a nonexpansive mapping in a Banach space. Proceedings of the American Mathematical Society, 59(1):65–71, 1976.

Ito, M. and Fukuda, M. Nearly optimal first-order methods for convex optimization under gradient norm measure: An adaptive regularization approach. Journal of Optimization Theory and Applications, 188(3):770–804, 2021.

Kim, D. Accelerated proximal point method for maximally monotone operators. Mathematical Programming, 190(1–2):57–87, 2021.

Kim, D. and Fessler, J. A. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1–2):81–107, 2016a.

Kim, D. and Fessler, J. A. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1):81–107, 2016b.

Kim, D. and Fessler, J. A. Adaptive restart of the optimized gradient method for convex optimization. Journal of Optimization Theory and Applications, 178(1):240–263, 2018.

Kim, D. and Fessler, J. A. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of Optimization Theory and Applications, 188(1):192–219, 2021.

Kohlenbach, U. On quantitative versions of theorems due to F. E. Browder and R. Wittmann. Advances in Mathematics, 226(3):2764–2795, 2011.

Krasnosel'skiĭ, M. A. Two remarks on the method of successive approximations. Uspekhi Matematicheskikh Nauk, 10:123–127, 1955.

Lee, J., Park, C., and Ryu, E. K. A geometric structure of acceleration and its role in making gradients small fast. NeurIPS, 2021.

Leustean, L. Rates of asymptotic regularity for Halpern iterations of nonexpansive mappings. Journal of Universal Computer Science, 13(11):1680–1691, 2007.

Li, W., Ryu, E. K., Osher, S., Yin, W., and Gangbo, W. A parallel method for earth mover's distance. Journal of Scientific Computing, 75(1):182–197, 2018.

Liang, J., Fadili, J., and Peyré, G. Convergence rates with inexact non-expansive operators. Mathematical Programming, 159(1–2):403–434, 2016.

Lieder, F. On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2):405–418, 2021.

Lin, Q. and Xiao, L. An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization. ICML, 2014.

Lin, Y. and Xu, Y. Convergence rate analysis for fixed-point iterations of generalized averaged nonexpansive operators. arXiv preprint arXiv:2108.06714, 2021.

Lions, P.-L. and Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.

Lojasiewicz, S. Une propriété topologique des sous-ensembles analytiques réels. Les Équations aux Dérivées Partielles, 117:87–89, 1963.

Maingé, P.-E. Convergence theorems for inertial KM-type algorithms. Journal of Computational and Applied Mathematics, 219(1):223–236, 2008.

Maingé, P.-E. Accelerated proximal algorithms with a correction term for monotone inclusions. Applied Mathematics & Optimization, 84(2):2027–2061, 2021.

Mann, W. R. Mean value methods in iteration. Proceedings of the American Mathematical Society, 4(3):506–510, 1953.

Martinet, B. Régularisation d'inéquations variationnelles par approximations successives. Revue Française d'Informatique et Recherche Opérationnelle, 4(R3):154–158, 1970.

Martinet, B. Algorithmes pour la résolution de problèmes d'optimisation et de minimax. PhD thesis, Université Joseph-Fourier-Grenoble I, 1972.

Matsushita, S.-Y. On the convergence rate of the Krasnosel'skiĭ–Mann iteration. Bulletin of the Australian Mathematical Society, 96(1):162–170, 2017.

Nemirovski, A. S. Information-based complexity of linear operator equations. Journal of Complexity, 8(2):153–175, 1992.

Nemirovski, A. S. and Nesterov, Y. E. Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics, 25(3–4):21–30, 1985.

Nemirovski, A. S. and Yudin, D. B. Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience, 1983.

Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.

Nesterov, Y. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.

Nesterov, Y. E. A method for solving the convex programming problem with convergence rate O(1/k²). Doklady Akademii Nauk SSSR, 269:543–547, 1983.

O'Donoghue, B. and Candes, E. Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics, 15(3):715–732, 2015.

Park, C. and Ryu, E. K. Optimal first-order algorithms as a function of inequalities. arXiv preprint arXiv:2110.11035, 2021.

Park, C., Park, J., and Ryu, E. K. Factor-√2 acceleration of accelerated gradient methods. arXiv preprint arXiv:2102.07366, 2021.

Passty, G. B. Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. Journal of Mathematical Analysis and Applications, 72(2):383–390, 1979.

Peaceman, D. W. and Rachford, Jr, H. H. The numerical solution of parabolic and elliptic differential equations. Journal of the Society for Industrial and Applied Mathematics, 3(1):28–41, 1955.

Pock, T., Cremers, D., Bischof, H., and Chambolle, A. An algorithm for minimizing the Mumford-Shah functional. ICCV, 2009.

Powell, M. J. A method for nonlinear constraints in minimization problems. Optimization, pp. 283–298, 1969.

Reich, S., Thong, D. V., Cholamjiak, P., and Van Long, L. Inertial projection-type methods for solving pseudomonotone variational inequality problems in Hilbert space. Numerical Algorithms, 88(2):813–835, 2021.

Rhoades, B. Some fixed point iteration procedures. International Journal of Mathematics and Mathematical Sciences, 14(1):1–16, 1991.

Rhoades, B. and Saliga, L. Some fixed point iteration procedures. II. Nonlinear Analysis Forum, 6(1):193–217, 2001.

Rockafellar, R. T. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.

Roulet, V. and d'Aspremont, A. Sharpness, restart, and acceleration. SIAM Journal on Optimization, 30(1):262–289, 2020.

Ryu, E. and Yin, W. Large-scale Convex Optimization via Monotone Operators. Draft, 2020.

Ryu, E. K., Taylor, A. B., Bergeling, C., and Giselsson, P. Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3):2251–2271, 2020.

Ryu, E. K., Hannah, R., and Yin, W. Scaled relative graphs: Nonexpansive operators via 2d Euclidean geometry. Mathematical Programming, 2021.

Sabach, S. and Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM Journal on Optimization, 27(2):640–660, 2017.

Salim, A., Condat, L., Kovalev, D., and Richtárik, P. An optimal algorithm for strongly convex minimization under affine constraints. AISTATS, 2022.

Scieur, D., d'Aspremont, A., and Bach, F. Regularized nonlinear acceleration. Mathematical Programming, 179(1–2):47–83, 2020.

Shehu, Y. Convergence rate analysis of inertial Krasnoselskii–Mann type iteration with applications. Numerical Functional Analysis and Optimization, 39(10):1077–1091, 2018.

Shi, W., Ling, Q., Wu, G., and Yin, W. A proximal gradient algorithm for decentralized composite optimization. IEEE Transactions on Signal Processing, 63(22):6013–6023, 2015.

Taylor, A. and Drori, Y. An optimal gradient method for smooth (possibly strongly) convex minimization. arXiv preprint arXiv:2101.09741, 2021.

Taylor, A. B., Hendrickx, J. M., and Glineur, F. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1–2):307–345, 2017.

Taylor, A. B., Hendrickx, J. M., and Glineur, F. Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2):455–476, 2018.

Van Scoy, B., Freeman, R. A., and Lynch, K. M. The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1):49–54, 2018.

Vũ, B. C. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics, 38(3):667–681, 2013.

Walker, H. F. and Ni, P. Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis, 49(4):1715–1735, 2011.

Wittmann, R. Approximation of fixed points of nonexpansive mappings. Archiv der Mathematik, 58(5):486–491, 1992.

Wu, T., Yuan, K., Ling, Q., Yin, W., and Sayed, A. H. Decentralized consensus optimization with asynchrony and delays. IEEE Transactions on Signal and Information Processing over Networks, 4(2):293–307, 2018.

Xu, H.-K. Iterative algorithms for nonlinear operators. Journal of the London Mathematical Society, 66(1):240–256, 2002.

Xu, H.-K. Viscosity approximation methods for nonexpansive mappings. Journal of Mathematical Analysis and Applications, 298(1):279–291, 2004.

Yoon, T. and Ryu, E. K. Accelerated algorithms for smooth convex-concave minimax problems with O(1/k²) rate on squared gradient norm. ICML, 2021.

Zhang, J., O'Donoghue, B., and Boyd, S. Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations. SIAM Journal on Optimization, 30(4):3170–3197, 2020.

Zhu, M. and Chan, T. An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 08-34, 2008.

A. Omitted proofs of Section 2


Proof of Lemma 2.1 with inequalities. Suppose 𝕋 : Rⁿ → Rⁿ is (1/γ)-Lipschitz for γ ≥ 1. Define 𝔸 : Rⁿ ⇒ Rⁿ as

𝔸 = (𝕋 + (1/γ)𝕀)⁻¹ (1 + 1/γ) − 𝕀.

For any x, y ∈ Rⁿ, let u ∈ 𝔸x and v ∈ 𝔸y. Then

u ∈ 𝔸x ⟹ u ∈ (𝕋 + (1/γ)𝕀)⁻¹ (1 + 1/γ) x − x
⟺ x + u ∈ (𝕋 + (1/γ)𝕀)⁻¹ (1 + 1/γ) x
⟺ (𝕋 + (1/γ)𝕀)(x + u) = x + (1/γ)x
⟺ 𝕋(x + u) = x − (1/γ)u.

Likewise,

𝕋(y + v) = y − (1/γ)v.

From the (1/γ)-Lipschitzness of 𝕋,

∥𝕋(x + u) − 𝕋(y + v)∥ ≤ (1/γ)∥(x + u) − (y + v)∥
⟺ ∥(x − (1/γ)u) − (y − (1/γ)v)∥ ≤ (1/γ)∥(x + u) − (y + v)∥
⟺ ∥(x − y) − (1/γ)(u − v)∥² ≤ (1/γ²)∥(x − y) + (u − v)∥²
⟺ (1 − 1/γ²)∥x − y∥² ≤ (2/γ² + 2/γ)⟨u − v, x − y⟩
⟺ ⟨u − v, x − y⟩ ≥ ((γ − 1)/2)∥x − y∥².

This holds for any u ∈ 𝔸x and v ∈ 𝔸y for any x, y ∈ Rⁿ, so 𝔸 is ((γ − 1)/2)-strongly monotone.
We can further prove that

x⋆ ∈ Zer 𝔸 ⟺ 0 ∈ 𝔸x⋆ = (𝕋 + (1/γ)𝕀)⁻¹ (1 + 1/γ) x⋆ − x⋆
⟺ x⋆ ∈ (𝕋 + (1/γ)𝕀)⁻¹ (1 + 1/γ) x⋆
⟺ 𝕋x⋆ + (1/γ)x⋆ = x⋆ + (1/γ)x⋆
⟺ x⋆ = 𝕋x⋆
⟺ x⋆ ∈ Fix 𝕋.

Suppose 𝔸 : Rⁿ ⇒ Rⁿ is µ-strongly monotone for µ ≥ 0. Define 𝕋 : Rⁿ → Rⁿ as

𝕋 = (1 + 1/(1 + 2µ)) 𝕁_𝔸 − (1/(1 + 2µ)) 𝕀.

For any x, y ∈ Rⁿ, let u = 𝕋x and v = 𝕋y. Then

u = 𝕋x
⟺ u = (1 + 1/(1 + 2µ)) 𝕁_𝔸 x − (1/(1 + 2µ)) x
⟺ (1/(1 + 2µ)) x + u = (1 + 1/(1 + 2µ)) 𝕁_𝔸 x
⟺ ((1 + 2µ)/(2 + 2µ)) ((1/(1 + 2µ)) x + u) = (1/(2 + 2µ)) x + ((1 + 2µ)/(2 + 2µ)) u = 𝕁_𝔸 x
⟺ x ∈ (𝕀 + 𝔸) ((1/(2 + 2µ)) x + ((1 + 2µ)/(2 + 2µ)) u)
⟺ ((1 + 2µ)/(2 + 2µ)) (x − u) ∈ 𝔸 ((1/(2 + 2µ)) x + ((1 + 2µ)/(2 + 2µ)) u).

Likewise,

((1 + 2µ)/(2 + 2µ)) (y − v) ∈ 𝔸 ((1/(2 + 2µ)) y + ((1 + 2µ)/(2 + 2µ)) v).
From the µ-strong monotonicity of 𝔸,

⟨𝔸((1/(2+2µ))x + ((1+2µ)/(2+2µ))u) − 𝔸((1/(2+2µ))y + ((1+2µ)/(2+2µ))v), ((1/(2+2µ))x + ((1+2µ)/(2+2µ))u) − ((1/(2+2µ))y + ((1+2µ)/(2+2µ))v)⟩
≥ µ ∥((1/(2+2µ))x + ((1+2µ)/(2+2µ))u) − ((1/(2+2µ))y + ((1+2µ)/(2+2µ))v)∥²

⟹ ⟨((1+2µ)/(2+2µ))(x − u) − ((1+2µ)/(2+2µ))(y − v), ((1/(2+2µ))x + ((1+2µ)/(2+2µ))u) − ((1/(2+2µ))y + ((1+2µ)/(2+2µ))v)⟩
≥ µ ∥((1/(2+2µ))x + ((1+2µ)/(2+2µ))u) − ((1/(2+2µ))y + ((1+2µ)/(2+2µ))v)∥²

⟺ ⟨((1+2µ)/(2+2µ))((x − y) − (u − v)), (1/(2+2µ))(x − y) + ((1+2µ)/(2+2µ))(u − v)⟩
≥ µ ∥(1/(2+2µ))(x − y) + ((1+2µ)/(2+2µ))(u − v)∥²

⟺ ⟨(1 + 2µ)(x − y) − (1 + 2µ)(u − v), (x − y) + (1 + 2µ)(u − v)⟩ ≥ µ ∥(x − y) + (1 + 2µ)(u − v)∥²
⟺ (1 + µ)∥x − y∥² ≥ (1 + µ)(1 + 2µ)² ∥u − v∥²
⟺ ∥u − v∥² ≤ (1/(1 + 2µ)²) ∥x − y∥².

This holds for any u = 𝕋x and v = 𝕋y for any x, y ∈ Rⁿ, so 𝕋 is (1/(1 + 2µ))-Lipschitz.

Finally, we can also prove that

x⋆ ∈ Fix 𝕋 ⟺ x⋆ = 𝕋x⋆ = (1 + 1/(1 + 2µ)) 𝕁_𝔸 x⋆ − (1/(1 + 2µ)) x⋆
⟺ ((2 + 2µ)/(1 + 2µ)) x⋆ = ((2 + 2µ)/(1 + 2µ)) 𝕁_𝔸 x⋆
⟺ x⋆ = 𝕁_𝔸 x⋆ = (𝕀 + 𝔸)⁻¹ x⋆
⟺ x⋆ ∈ x⋆ + 𝔸x⋆
⟺ 0 ∈ 𝔸x⋆
⟺ x⋆ ∈ Zer 𝔸.
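The two correspondences established above can be sanity-checked numerically on a linear operator. The matrix below is an assumed toy instance (its symmetric part dominates µI, so it is µ-strongly monotone), and the resolvent is computed as a matrix inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 4, 0.3
gamma = 1 + 2 * mu

# A mu-strongly monotone linear operator: symmetric part = mu*I + B B^T >= mu*I.
B = rng.standard_normal((n, n))
A = mu * np.eye(n) + B @ B.T + (B - B.T)

# Resolvent J_A and the map T from Lemma 2.1.
J = np.linalg.inv(np.eye(n) + A)
T = (1 + 1 / gamma) * J - (1 / gamma) * np.eye(n)

# For a linear map, the Lipschitz constant is the spectral norm;
# the lemma predicts it is at most 1/(1 + 2*mu).
lip = np.linalg.norm(T, 2)

# The reverse construction recovers A exactly:
# (T + (1/gamma) I)^{-1} (1 + 1/gamma) - I = (I + A) - I = A.
A_rec = np.linalg.inv(T + (1 / gamma) * np.eye(n)) * (1 + 1 / gamma) - np.eye(n)

print(lip, 1 / gamma)
```

Note that the two maps are exact inverses of each other, since 𝕋 + (1/γ)𝕀 = (1 + 1/γ)𝕁_𝔸.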

Proof of Lemma 2.1 with scaled relative graph. In this proof, we use the notation of Ryu et al. (2021) for the operator classes, which we list below. Consider the class Mµ of µ-strongly monotone operators and the class L_{1/γ} of (1/γ)-contractions. Both Mµ and L_{1/γ} are SRG-full classes, meaning that the inclusion of the SRG of an operator in the SRG of an operator class is equivalent to membership of that operator in the class (Ryu et al., 2021, Section 3.3). Thus, instead of verifying the inequality condition equivalent to membership, we establish membership directly in terms of the SRGs.

[Figure 4: SRGs of (a) 𝔸, (b) 𝕀 + 𝔸, (c) 𝕁_𝔸, and (d) 𝕋, with marked points µ, 1 + µ, 1/(1 + µ), −1/(1 + 2µ), and 1/(1 + 2µ).]

Figure 4. SRG changing with invertible transformation F.

Consider an invertible transformation F : ℂ ∪ {∞} → ℂ ∪ {∞} defined as

F(z) = (1 + 1/(1 + 2µ)) (1 + z)⁻¹ − 1/(1 + 2µ).

F is a composition of only scalar addition/subtraction/multiplication and inversion, and therefore maps the SRGs of Mµ and L_{1/γ} as in Figure 4. The SRGs of F(Mµ) and L_{1/γ} match, and the SRGs of F⁻¹(L_{1/γ}) and Mµ match.

B. Omitted proofs of Section 3


B.1. Proof of Lemma 3.1
Lemma B.1. The yk -update in algorithm (OS-PPM) is equivalent to

x_k = 𝕁_𝔸 y_{k−1}
y_k = (1 − 1/φ_k) ((1 + 1/γ) x_k − (1/γ) y_{k−1}) + (1/φ_k) y_0,

where γ = 1 + 2µ.

Proof. It suffices to show the equivalence of the y_k-iterates. For k = 1, from the (OS-PPM) update,

y_1 = x_1 + ((φ_0 − 1)/φ_1)(x_1 − y_0) − ((γ − 1)φ_0/φ_1)(y_0 − x_1)
= x_1 − ((γ − 1)/(γ² + 1))(y_0 − x_1)   (φ_0 = 1, φ_1 = 1 + γ²)
= (1 − 1/φ_1)((1 + 1/γ) x_1 − (1/γ) y_0) + (1/φ_1) y_0.

Assume that the equivalence of the iterates holds for k = 1, 2, …, l. From the (OS-PPM) update,

y_{l+1} = x_{l+1} + ((φ_l − 1)/φ_{l+1})(x_{l+1} − x_l) − ((γ − 1)φ_l/φ_{l+1})(y_l − x_{l+1}) + (γφ_{l−1}/φ_{l+1})(y_{l−1} − x_l)
= (1 + (φ_l − 1)/φ_{l+1} + (γ − 1)φ_l/φ_{l+1}) x_{l+1} − ((φ_l − 1)/φ_{l+1} + γφ_{l−1}/φ_{l+1}) x_l − ((γ − 1)φ_l/φ_{l+1}) y_l + (γφ_{l−1}/φ_{l+1}) y_{l−1}
= γ(γ + 1)(φ_l/φ_{l+1}) x_{l+1} − γ(γ + 1)(φ_{l−1}/φ_{l+1}) x_l − ((γ − 1)φ_l/φ_{l+1}) y_l + (γφ_{l−1}/φ_{l+1}) y_{l−1}.

From the inductive hypothesis, we have

y_l = (1 − 1/φ_l)((1 + 1/γ) x_l − (1/γ) y_{l−1}) + (1/φ_l) y_0,

or

γφ_{l−1} y_{l−1} = γ(γ + 1)φ_{l−1} x_l − φ_l y_l + y_0.

Plugging this into the γφ_{l−1} y_{l−1}-term in y_{l+1}, we get

y_{l+1} = γ(γ + 1)(φ_l/φ_{l+1}) x_{l+1} − γ(γ + 1)(φ_{l−1}/φ_{l+1}) x_l − ((γ − 1)φ_l/φ_{l+1}) y_l + (1/φ_{l+1}){γ(γ + 1)φ_{l−1} x_l − φ_l y_l + y_0}
= γ(γ + 1)(φ_l/φ_{l+1}) x_{l+1} − γ(φ_l/φ_{l+1}) y_l + (1/φ_{l+1}) y_0
= (γ²φ_l/φ_{l+1})((1 + 1/γ) x_{l+1} − (1/γ) y_l) + (1/φ_{l+1}) y_0
= (1 − 1/φ_{l+1})((1 + 1/γ) x_{l+1} − (1/γ) y_l) + (1/φ_{l+1}) y_0.

This is the claimed form for y_{l+1}, so we are done.
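The equivalence of the two recursions can be checked numerically. The strongly monotone linear operator below is an assumed toy instance, with 𝕁_𝔸 computed as a matrix inverse; both forms of the update are run side by side and compared.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 3, 0.2
gamma = 1 + 2 * mu
K = 12

B = rng.standard_normal((n, n))
M = mu * np.eye(n) + B @ B.T + (B - B.T)   # mu-strongly monotone linear operator
J = np.linalg.inv(np.eye(n) + M)           # resolvent J_A

# phi_k = sum_{m=0}^{k} gamma^{2m}
phi = [sum(gamma ** (2 * m) for m in range(k + 1)) for k in range(K + 2)]
y0 = rng.standard_normal(n)

# Halpern-form recursion (Lemma B.1).
ys, xs = [y0], [None]
for k in range(1, K + 1):
    x = J @ ys[-1]
    y = (1 - 1 / phi[k]) * ((1 + 1 / gamma) * x - ys[-1] / gamma) + y0 / phi[k]
    xs.append(x); ys.append(y)

# Three-term (OS-PPM) recursion, as written in the proof.
ys2, xs2 = [y0], [None]
x1 = J @ y0
y1 = x1 + (phi[0] - 1) / phi[1] * (x1 - y0) - (gamma - 1) * phi[0] / phi[1] * (y0 - x1)
xs2.append(x1); ys2.append(y1)
for l in range(1, K):
    x_next = J @ ys2[-1]
    y_next = (x_next
              + (phi[l] - 1) / phi[l + 1] * (x_next - xs2[l])
              - (gamma - 1) * phi[l] / phi[l + 1] * (ys2[l] - x_next)
              + gamma * phi[l - 1] / phi[l + 1] * (ys2[l - 1] - xs2[l]))
    xs2.append(x_next); ys2.append(y_next)

err = max(np.linalg.norm(a - b) for a, b in zip(ys, ys2))
print(err)
```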

Proof of Lemma 3.1. Start from the same initial iterate y_0 = ỹ_0, and suppose y_k = ỹ_k for some k ≥ 0. Then,

y_{k+1} = (1 − 1/φ_{k+1})((1 + 1/(1 + 2µ)) x_{k+1} − (1/(1 + 2µ)) y_k) + (1/φ_{k+1}) y_0   (Lemma B.1)
= (1 − 1/φ_{k+1})((1 + 1/(1 + 2µ)) 𝕁_𝔸 − (1/(1 + 2µ)) 𝕀) y_k + (1/φ_{k+1}) y_0
= (1 − 1/φ_{k+1}) 𝕋y_k + (1/φ_{k+1}) y_0
= (1 − 1/φ_{k+1}) 𝕋ỹ_k + (1/φ_{k+1}) y_0 = ỹ_{k+1},   (y_k = ỹ_k by the induction hypothesis)

so y_k = ỹ_k for all k by induction.

B.2. Proof of Theorem 3.2


Recall that

V^k = (1 + γ^{−k}) [ (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² + 2 (Σ_{n=0}^{k−1} γ^n) ⟨𝔸̃x_k − µ(x_k − x⋆), x_k − x⋆⟩
+ γ^{−k} ∥(Σ_{n=0}^{k−1} γ^n) 𝔸̃x_k − γ^k (x_k − x⋆) + (x_k − y_0)∥² ] + (1 − γ^{−k}) ∥y_0 − x⋆∥²   (OS-PPM-Lyapunov)

for k = 1, 2, …, N and V^0 = 2∥y_0 − x⋆∥², where γ = 1 + 2µ, φ_k = Σ_{n=0}^{k} γ^{2n}, and 𝔸̃x_k = y_{k−1} − x_k ∈ 𝔸x_k. We will often use the following identity:

(1 + γ) φ_k = (1 + γ) Σ_{n=0}^{k} γ^{2n} = (1 + γ^{k+1}) Σ_{n=0}^{k} γ^n.

First, we show that V^k has an alternate form, given below. This form is useful in proving that V^k is monotonically decreasing in k.

Lemma B.2. V^k defined in (OS-PPM-Lyapunov) can be equivalently written as

V^k = γ^{−2k} (1 + γ)² φ²_{k−1} ∥𝔸̃x_k∥² + 2γ^{−2k} (1 + γ) φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩ + 2∥y_0 − x⋆∥².

Proof. Expanding the square term,

∥(Σ_{n=0}^{k−1} γ^n) 𝔸̃x_k − γ^k (x_k − x⋆) + (x_k − y_0)∥²
= ∥(Σ_{n=0}^{k−1} γ^n) 𝔸̃x_k − (γ^k − 1)(x_k − y_0) − γ^k (y_0 − x⋆)∥²
= (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² − 2 (Σ_{n=0}^{k−1} γ^n)(γ^k − 1) ⟨𝔸̃x_k, x_k − y_0⟩ − 2 (Σ_{n=0}^{k−1} γ^n) γ^k ⟨𝔸̃x_k, y_0 − x⋆⟩
+ (γ^k − 1)² ∥x_k − y_0∥² + 2γ^k (γ^k − 1) ⟨x_k − y_0, y_0 − x⋆⟩ + γ^{2k} ∥y_0 − x⋆∥².

Also, we have

⟨𝔸̃x_k − µ(x_k − x⋆), x_k − x⋆⟩
= ⟨𝔸̃x_k − µ(x_k − y_0) − µ(y_0 − x⋆), (x_k − y_0) + (y_0 − x⋆)⟩
= ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩ + ⟨𝔸̃x_k, y_0 − x⋆⟩ − 2µ⟨x_k − y_0, y_0 − x⋆⟩ − µ∥y_0 − x⋆∥².

Then V^k is expressed as

V^k = 2(1 + γ^{−k})(Σ_{n=0}^{k−1} γ^n) [⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩ + ⟨𝔸̃x_k, y_0 − x⋆⟩ − 2µ⟨x_k − y_0, y_0 − x⋆⟩ − µ∥y_0 − x⋆∥²]
+ (1 + γ^{−k}) γ^{−k} { (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² − 2(Σ_{n=0}^{k−1} γ^n)(γ^k − 1) ⟨𝔸̃x_k, x_k − y_0⟩ + (γ^k − 1)² ∥x_k − y_0∥²
− 2(Σ_{n=0}^{k−1} γ^n) γ^k ⟨𝔸̃x_k, y_0 − x⋆⟩ + 2γ^k (γ^k − 1) ⟨x_k − y_0, y_0 − x⋆⟩ + γ^{2k} ∥y_0 − x⋆∥² }
+ (1 + γ^{−k})(Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² + (1 − γ^{−k}) ∥y_0 − x⋆∥².

Since γ^k − 1 = 2µ Σ_{n=0}^{k−1} γ^n, the ⟨𝔸̃x_k, y_0 − x⋆⟩-terms and the ⟨x_k − y_0, y_0 − x⋆⟩-terms cancel, and the ∥y_0 − x⋆∥²-terms collapse to 2∥y_0 − x⋆∥². Thus

V^k = (1 + γ^{−k})² (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² + 2(1 + γ^{−k})(Σ_{n=0}^{k−1} γ^n) ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩
− 2γ^{−k}(1 + γ^{−k})(γ^k − 1)(Σ_{n=0}^{k−1} γ^n) ⟨𝔸̃x_k, x_k − y_0⟩ + γ^{−k}(1 + γ^{−k})(γ^k − 1)² ∥x_k − y_0∥² + 2∥y_0 − x⋆∥²
= (1 + γ^{−k})² (Σ_{n=0}^{k−1} γ^n)² ∥𝔸̃x_k∥² + 2γ^{−k}(1 + γ^{−k})(Σ_{n=0}^{k−1} γ^n) ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩ + 2∥y_0 − x⋆∥².

As

(1 + γ^{−k}) Σ_{n=0}^{k−1} γ^n = ((1 + γ^k)/γ^k) Σ_{n=0}^{k−1} γ^n = ((1 + γ)/γ^k) φ_{k−1},

we have

V^k = γ^{−2k} (1 + γ)² φ²_{k−1} ∥𝔸̃x_k∥² + 2γ^{−2k} (1 + γ) φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩ + 2∥y_0 − x⋆∥².

Next, we prove that {V^k}_{k=0}^{N} is monotonically decreasing in k.

Lemma B.3. For k = 0, 1, …, N with V^k defined as in (OS-PPM-Lyapunov), we have

V^N ≤ V^{N−1} ≤ ⋯ ≤ V^1 ≤ V^0.

Proof. We use the form of V^k as in Lemma B.2. First,

V^1 − V^0 = γ^{−2}(1 + γ)² ∥𝔸̃x_1∥² + 2γ^{−2}(1 + γ) ⟨𝔸̃x_1 − µ(x_1 − y_0), x_1 − y_0⟩
= γ^{−2}(1 + γ) [(1 + γ)∥𝔸̃x_1∥² + 2⟨𝔸̃x_1 − µ(x_1 − y_0), x_1 − y_0⟩]
= γ^{−2}(1 + γ) [(1 + γ)∥𝔸̃x_1∥² − 2(1 + µ)∥𝔸̃x_1∥²]   (x_1 − y_0 = −𝔸̃x_1)
= 0.   (1 + γ = 2(1 + µ))

Now, consider k ≥ 1. Then,

V^{k+1} − V^k = γ^{−2(k+1)}(1 + γ)² φ_k² ∥𝔸̃x_{k+1}∥² − γ^{−2k}(1 + γ)² φ²_{k−1} ∥𝔸̃x_k∥²
+ 2γ^{−2(k+1)}(1 + γ) φ_k ⟨𝔸̃x_{k+1} − µ(x_{k+1} − y_0), x_{k+1} − y_0⟩
− 2γ^{−2k}(1 + γ) φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), x_k − y_0⟩.

Now, we claim that

V^{k+1} − V^k + 2γ^{−2k}(1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_{k+1} − 𝔸̃x_k − µ(x_{k+1} − x_k), x_{k+1} − x_k⟩ = 0.

Since 𝔸 is µ-strongly monotone and 𝔸̃x_{k+1} ∈ 𝔸x_{k+1}, 𝔸̃x_k ∈ 𝔸x_k, the inner product in the claim is nonnegative, so the claim implies V^{k+1} ≤ V^k.

First,

V^{k+1} − V^k + 2γ^{−2k}(1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_{k+1} − 𝔸̃x_k − µ(x_{k+1} − x_k), x_{k+1} − x_k⟩
= V^{k+1} − V^k + 2γ^{−2k}(1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_{k+1} − µ(x_{k+1} − y_0), x_{k+1} − x_k⟩
− 2γ^{−2k}(1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), x_{k+1} − x_k⟩
= γ^{−2(k+1)}(1 + γ)² ⟨φ_k 𝔸̃x_{k+1} − γφ_{k−1} 𝔸̃x_k, φ_k 𝔸̃x_{k+1} + γφ_{k−1} 𝔸̃x_k⟩
+ 2γ^{−2(k+1)}(1 + γ) φ_k ⟨𝔸̃x_{k+1} − µ(x_{k+1} − y_0), γ²φ_{k−1}(x_{k+1} − x_k) + (x_{k+1} − y_0)⟩
− 2γ^{−2k}(1 + γ) φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), φ_k(x_{k+1} − x_k) + (x_k − y_0)⟩.

From Lemma B.1, we have

y_k = (1 − 1/φ_k)((1 + 1/γ) x_k − (1/γ) y_{k−1}) + (1/φ_k) y_0.

Using the facts that y_{k−1} = x_k + 𝔸̃x_k, y_k = x_{k+1} + 𝔸̃x_{k+1}, and φ_k = γ²φ_{k−1} + 1, we obtain

φ_k (x_{k+1} − y_0) + φ_k 𝔸̃x_{k+1} = γ²φ_{k−1} (x_k − y_0) − γφ_{k−1} 𝔸̃x_k.

Letting U_k = φ_k (x_{k+1} − y_0) − γ²φ_{k−1} (x_k − y_0) = −φ_k 𝔸̃x_{k+1} − γφ_{k−1} 𝔸̃x_k, the above formula simplifies to

V^{k+1} − V^k + 2γ^{−2k}(1 + γ) φ_k φ_{k−1} ⟨𝔸̃x_{k+1} − 𝔸̃x_k − µ(x_{k+1} − x_k), x_{k+1} − x_k⟩
= −γ^{−2(k+1)}(1 + γ)² ⟨φ_k 𝔸̃x_{k+1} − γφ_{k−1} 𝔸̃x_k, U_k⟩
+ 2γ^{−2(k+1)}(1 + γ) φ_k ⟨𝔸̃x_{k+1} − µ(x_{k+1} − y_0), U_k⟩
− 2γ^{−2k}(1 + γ) φ_{k−1} ⟨𝔸̃x_k − µ(x_k − y_0), U_k⟩
= γ^{−2(k+1)}(1 + γ) ⟨−(1 + γ)(φ_k 𝔸̃x_{k+1} − γφ_{k−1} 𝔸̃x_k) + 2φ_k {𝔸̃x_{k+1} − µ(x_{k+1} − y_0)} − 2γ²φ_{k−1} {𝔸̃x_k − µ(x_k − y_0)}, U_k⟩
= γ^{−2(k+1)}(1 + γ) ⟨(1 − γ)(φ_k 𝔸̃x_{k+1} + γφ_{k−1} 𝔸̃x_k) − 2µ{φ_k(x_{k+1} − y_0) − γ²φ_{k−1}(x_k − y_0)}, U_k⟩
= γ^{−2(k+1)}(1 + γ) ⟨(γ − 1)U_k − 2µU_k, U_k⟩ = 0.   (γ − 1 = 2µ)
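The monotone decrease of V^k can be verified numerically along the iterates. The strongly monotone linear operator and parameters below are assumed for illustration, with x⋆ = 0; V^k is evaluated in the Lemma B.2 form.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 3, 0.25
gamma = 1 + 2 * mu
K = 15

B = rng.standard_normal((n, n))
M = mu * np.eye(n) + B @ B.T + (B - B.T)   # mu-strongly monotone; zero x* = 0
J = np.linalg.inv(np.eye(n) + M)

phi = [sum(gamma ** (2 * m) for m in range(k + 1)) for k in range(K + 1)]
y0 = rng.standard_normal(n)

V = [2 * np.linalg.norm(y0) ** 2]          # V^0
y = y0.copy()
for k in range(1, K + 1):
    x = J @ y
    At = y - x                              # A~x_k = y_{k-1} - x_k in A x_k
    Vk = (gamma ** (-2 * k) * (1 + gamma) ** 2 * phi[k - 1] ** 2 * np.linalg.norm(At) ** 2
          + 2 * gamma ** (-2 * k) * (1 + gamma) * phi[k - 1] * np.dot(At - mu * (x - y0), x - y0)
          + 2 * np.linalg.norm(y0) ** 2)   # Lemma B.2 form of V^k
    V.append(Vk)
    # Halpern-form (OS-PPM) update of Lemma B.1
    y = (1 - 1 / phi[k]) * ((1 + 1 / gamma) * x - y / gamma) + y0 / phi[k]

decreasing = all(V[k + 1] <= V[k] + 1e-7 * max(1.0, abs(V[k])) for k in range(K))
print(V[0], V[-1], decreasing)
```

On this instance the sequence V^0, V^1, …, V^K is nonincreasing, and V^1 = V^0 holds up to rounding, matching the first step of the proof.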

We now prove Theorem 3.2.

Proof of Theorem 3.2. According to Lemma B.3, we have V^N ≤ V^{N−1} ≤ ⋯ ≤ V^0 = 2∥y_0 − x⋆∥². Therefore,

2∥y_0 − x⋆∥² ≥ V^N
= (1 + γ^{−N})(Σ_{n=0}^{N−1} γ^n)² ∥𝔸̃x_N∥² + 2(1 + γ^{−N})(Σ_{n=0}^{N−1} γ^n) ⟨𝔸̃x_N − µ(x_N − x⋆), x_N − x⋆⟩
+ γ^{−N}(1 + γ^{−N}) ∥(Σ_{n=0}^{N−1} γ^n) 𝔸̃x_N − γ^N (x_N − x⋆) + (x_N − y_0)∥² + (1 − γ^{−N}) ∥y_0 − x⋆∥²
≥ (1 + γ^{−N})(Σ_{n=0}^{N−1} γ^n)² ∥𝔸̃x_N∥² + (1 − γ^{−N}) ∥y_0 − x⋆∥²,

where the dropped terms are nonnegative (the inner product term by the µ-strong monotonicity of 𝔸, since 𝔸̃x_N ∈ 𝔸x_N and 0 ∈ 𝔸x⋆). This can be simplified as

(1 + γ^{−N}) ∥y_0 − x⋆∥² ≥ (1 + γ^{−N})(Σ_{n=0}^{N−1} γ^n)² ∥𝔸̃x_N∥²,

or equivalently,

∥𝔸̃x_N∥² ≤ (1/Σ_{n=0}^{N−1} γ^n)² ∥y_0 − x⋆∥².
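The final bound can likewise be tested on an assumed linear instance (x⋆ = 0), running the OS-PPM iteration in its Halpern form and comparing ∥𝔸̃x_N∥ against ∥y_0 − x⋆∥ / Σ_{n=0}^{N−1} γ^n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu, N = 3, 0.25, 10
gamma = 1 + 2 * mu

B = rng.standard_normal((n, n))
M = mu * np.eye(n) + B @ B.T + (B - B.T)   # mu-strongly monotone; x* = 0
J = np.linalg.inv(np.eye(n) + M)
phi = [sum(gamma ** (2 * m) for m in range(k + 1)) for k in range(N + 1)]

y0 = rng.standard_normal(n)
y = y0.copy()
for k in range(1, N + 1):
    x = J @ y
    At = y - x                              # A~x_k = y_{k-1} - x_k
    y = (1 - 1 / phi[k]) * ((1 + 1 / gamma) * x - y / gamma) + y0 / phi[k]

# Theorem 3.2 bound: ||A~x_N|| <= ||y_0 - x*|| / (1 + gamma + ... + gamma^{N-1})
bound = np.linalg.norm(y0) / sum(gamma ** m for m in range(N))
print(np.linalg.norm(At), bound)
```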

Proof of Corollary 3.3. This immediately follows from Theorem 3.2 and Lemma 3.1, since

𝔸̃x_N = y_{N−1} − x_N = (1 + 1/γ)⁻¹ (y_{N−1} − 𝕋y_{N−1}) ∈ 𝔸x_N.

C. Details on the formulation of performance estimation problem for (OS-PPM)


To obtain an estimate of the worst-case complexity of an algorithm, the performance estimation problem (PEP) technique solves a certain semidefinite program (SDP). This SDP has a positive semidefinite matrix as its optimization variable, and it is solved under constraints formulated from the interpolation condition of the operator at hand.

When discovering (OS-PPM), we used maximal monotonicity as our interpolation condition, just as in Ryu et al. (2020); Kim (2021). We further extended this to cover the case of maximally strongly monotone operators, in a manner slightly different from Taylor & Drori (2021), who considered strongly convex interpolation. The optimization variable is a positive semidefinite matrix of Gram form, which stores information on the iterates of the algorithm. The usual choice of basis vectors for the Gram matrix in PEP is ∇f(x) in the convex minimization setup (Kim & Fessler, 2016b; Taylor et al., 2018; Taylor & Drori, 2021) or 𝔸̃x in the operator setup (Kim, 2021). Here, we used the x-iterates to form the Gram matrix of the SDP.

This basic SDP is the primal problem of the PEP (primal-PEP), and solving it returns an estimate of the worst-case complexity of the given algorithm. If we form a dual problem (dual-PEP) and minimize the optimal value of the dual-PEP over possible choices of stepsizes, as in Kim & Fessler (2016b); Taylor et al. (2018); Kim (2021); Taylor & Drori (2021), this provides the fastest attainable rate, and the solution to this minimization problem yields possibly optimal algorithms. We considered the class of algorithms satisfying the span assumption of Corollary 4.2 and thereby obtained (OS-PPM).

D. Omitted proofs of Section 4


D.1. Proving complexity lower bound with span condition
Proof of Lemma 4.3 with inequalities. From (Bauschke & Combettes, 2017, Proposition 4.35), 𝔾 is 1/(1+γ)-averaged if and only if

∥𝔾x − 𝔾y∥² + ((γ−1)/(γ+1)) ∥x − y∥² ≤ (2γ/(1+γ)) ⟨𝔾x − 𝔾y, x − y⟩,  ∀x, y ∈ Rⁿ.

Then for any x, y ∈ Rⁿ, we get the following chain of equivalences:

∥𝕋x − 𝕋y∥² ≤ (1/γ²) ∥x − y∥²
⟺ ∥γ𝕋x − γ𝕋y∥² ≤ ∥x − y∥²
⟺ ∥{(1+γ)𝔾x − γx} − {(1+γ)𝔾y − γy}∥² ≤ ∥x − y∥²
⟺ (1+γ)² ∥𝔾x − 𝔾y∥² − 2γ(1+γ) ⟨𝔾x − 𝔾y, x − y⟩ + γ² ∥x − y∥² ≤ ∥x − y∥²
⟺ (1+γ)² ∥𝔾x − 𝔾y∥² + (γ² − 1) ∥x − y∥² ≤ 2γ(1+γ) ⟨𝔾x − 𝔾y, x − y⟩
⟺ ∥𝔾x − 𝔾y∥² + ((γ−1)/(γ+1)) ∥x − y∥² ≤ (2γ/(γ+1)) ⟨𝔾x − 𝔾y, x − y⟩. (∵ 1 + γ > 0)

Therefore, 𝕋 is 1/γ-contractive if and only if 𝔾 is 1/(1+γ)-averaged.

Proof of Lemma 4.3 with scaled relative graph. Using the notion of the SRG (Ryu et al., 2021), we get the following chain of equivalences between SRGs, where N_{1/(1+γ)} is the class of 1/(1+γ)-averaged operators and L_{1/γ} is the class of 1/γ-Lipschitz operators.

[Figure 5. SRG of 𝕋 and 𝔾: the SRG of 𝕋 ∈ L_{1/γ}, of (γ/(1+γ))(𝕀 − 𝕋), of 𝕀 − (1 + 1/γ)𝔾, and of 𝔾 ∈ N_{1/(1+γ)}.]



𝕋 ∈ L_{1/γ} ⟺ γ𝕋 ∈ L₁ ⟺ −γ𝕋 ∈ L₁ ⟺ 𝔾 = (γ/(1+γ))𝕀 + (1/(1+γ))(−γ𝕋) ∈ N_{1/(1+γ)},

and conclude that 𝕋 is 1/γ-Lipschitz if and only if 𝔾 is 1/(1+γ)-averaged.
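As a quick numerical sanity check (our own illustration, not part of the original analysis), the averagedness inequality of Lemma 4.3 can be verified for a toy 1/γ-contractive operator. The scaled rotation below and the values of γ and θ are illustrative assumptions.

```python
import math, random

random.seed(0)
gamma = 1.25   # illustrative: T below is (1/gamma)-contractive
theta = 0.7    # illustrative rotation angle

def T(p):
    # T = (1/gamma) * (rotation by theta): a 1/gamma-contractive map on R^2
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return ((c * x - s * y) / gamma, (s * x + c * y) / gamma)

def G(p):
    # G = (gamma/(1+gamma)) * (I - T)
    x, y = p
    tx, ty = T(p)
    a = gamma / (1 + gamma)
    return (a * (x - tx), a * (y - ty))

def check_averaged(p, q):
    # The 1/(1+gamma)-averagedness inequality of Lemma 4.3 at the pair (p, q):
    # ||Gp - Gq||^2 + ((gamma-1)/(gamma+1))||p - q||^2
    #   <= (2*gamma/(gamma+1)) <Gp - Gq, p - q>
    gp, gq = G(p), G(q)
    dg = (gp[0] - gq[0], gp[1] - gq[1])
    d = (p[0] - q[0], p[1] - q[1])
    lhs = (dg[0] ** 2 + dg[1] ** 2) \
        + (gamma - 1) / (gamma + 1) * (d[0] ** 2 + d[1] ** 2)
    rhs = 2 * gamma / (1 + gamma) * (dg[0] * d[0] + dg[1] * d[1])
    return lhs <= rhs + 1e-9

all_hold = all(
    check_averaged((random.uniform(-5, 5), random.uniform(-5, 5)),
                   (random.uniform(-5, 5), random.uniform(-5, 5)))
    for _ in range(1000)
)
print(all_hold)
```

For this particular 𝕋 the inequality in fact holds with equality, since ∥𝕋x − 𝕋y∥² = (1/γ²)∥x − y∥² exactly, which is why a small floating-point tolerance suffices.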

Proof of Lemma 4.4. We restate the definition of ℕ : R^{N+1} → R^{N+1}:

ℕx = ℕ(x1, x2, …, xN+1) = (xN+1, −x1, …, −xN) − ((1 + γ^{N+1}) / √(1 + γ² + ⋯ + γ^{2N})) R e1,  x ∈ R^{N+1}.

For any x = (x1, x2, …, xN+1) and y = (y1, y2, …, yN+1) in R^{N+1}, we have

∥ℕx − ℕy∥² = ∥(xN+1, −x1, …, −xN) − (yN+1, −y1, …, −yN)∥²
= (xN+1 − yN+1)² + (x1 − y1)² + ⋯ + (xN − yN)²
= ∥x − y∥².

Then ℕ is nonexpansive, and by definition, 𝔾 = (1/(1+γ))ℕ + (γ/(1+γ))𝕀 is a 1/(1+γ)-averaged operator.

Proof of Lemma 4.5. By the definition of 𝔾 : R^{N+1} → R^{N+1}, for any x ∈ R^{N+1},

𝔾x = (1/(1+γ))ℕx + (γ/(1+γ))x = Hx − b,

where

H = (1/(1+γ)) ·
⎡  γ   0   0  ⋯   0   1 ⎤
⎢ −1   γ   0  ⋯   0   0 ⎥
⎢  0  −1   γ  ⋯   0   0 ⎥
⎢  ⋮   ⋮   ⋮  ⋱   ⋮   ⋮ ⎥
⎢  0   0   0  ⋯   γ   0 ⎥
⎣  0   0   0  ⋯  −1   γ ⎦ ,

b = (1/(1+γ)) · ((1 + γ^{N+1}) / √(1 + γ² + ⋯ + γ^{2N})) R e1,

and γ = 1 + 2µ. Observe that 𝔾ek ∈ span{e1, ek, ek+1} for k = 1, …, N.

We use induction on k to prove the lemma. The claim holds for k = 0 from

𝔾y0 = 𝔾0 = −(1/(1+γ)) · ((1 + γ^{N+1}) / √(1 + γ² + ⋯ + γ^{2N})) R e1 ∈ span{e1}.

Now, suppose that the claim holds for k < N , i.e.,


yk ∈ span{e1 , e2 , . . . , ek }
𝔾yk ∈ span{e1 , e2 , . . . , ek+1 }.
Then
yk+1 ∈ y0 + span{𝔾y0 , 𝔾y1 , . . . , 𝔾yk }
⊆ span{e1 , e2 , . . . , ek+1 }
𝔾yk+1 = Hyk+1 − b
∈ Hspan{e1 , e2 , . . . , ek+1 } − b
⊆ span{e1 , e2 , . . . , ek+2 }.
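The zero-chain behavior proved above can also be checked numerically. The following sketch (our own illustration; the values of N, µ, and R are assumptions for the demo) builds H and b and verifies that an iterate sequence moving only within the span of past values of 𝔾 activates at most one new coordinate per iteration.

```python
import math

# Illustrative values (not from the paper's experiments)
N, mu, R = 5, 0.3, 1.0
gamma = 1 + 2 * mu      # as in the proof of Lemma 4.5
n = N + 1

# H = (1/(1+gamma)) * (gamma on the diagonal, -1 on the subdiagonal,
#                      and 1 in the top-right corner)
H = [[0.0] * n for _ in range(n)]
for i in range(n):
    H[i][i] = gamma / (1 + gamma)
for i in range(1, n):
    H[i][i - 1] = -1.0 / (1 + gamma)
H[0][n - 1] = 1.0 / (1 + gamma)

S2 = sum(gamma ** (2 * k) for k in range(N + 1))
b0 = R * (1 + gamma ** (N + 1)) / ((1 + gamma) * math.sqrt(S2))
b = [b0] + [0.0] * N    # b = b0 * e1

def G(z):
    # G z = H z - b
    return [sum(H[i][j] * z[j] for j in range(n)) - b[i] for i in range(n)]

# Zero-chain behavior: starting from y0 = 0 and moving only within
# span{G(y0), ..., G(yk)}, at most one new coordinate activates per step.
y = [0.0] * n
for k in range(N):
    g = G(y)
    nz = [i for i, v in enumerate(g) if abs(v) > 1e-12]
    assert (not nz) or max(nz) <= k   # G(y_k) lies in span{e1, ..., e_{k+1}}
    y = [yi - gi for yi, gi in zip(y, g)]   # one span-respecting update
print("zero-chain behavior verified")
```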

Proof of Theorem 4.1. The proof outline of Theorem 4.1 in Section 4.2 is complete except for the part that the identity (∗) holds, and that Theorem 4.1 holds for any initial point y0 ∈ Rn, which is not necessarily zero.
First, we show that for any initial point y0 ∈ Rn, there exists a worst-case operator 𝕋 : Rn → Rn that cannot exhibit better than the desired rate. Denote by 𝕋0 : Rn → Rn the worst-case operator constructed in the proof of Theorem 4.1 for y0 = 0. Define 𝕋 : Rn → Rn as

𝕋y = 𝕋0(y − y0) + y0

given y0 ∈ Rn. Then, first of all, the fixed point of 𝕋 is y⋆ = ỹ⋆ + y0, where ỹ⋆ is the unique fixed point of 𝕋0. Also, if {yk}_{k=0}^N satisfies the span condition

yk ∈ y0 + span{y0 − 𝕋y0, …, yk−1 − 𝕋yk−1},  k = 1, …, N,


then ỹk = yk − y0 forms a sequence satisfying

ỹk ∈ ỹ0 + span{ỹ0 − 𝕋0ỹ0, …, ỹk−1 − 𝕋0ỹk−1},  k = 1, …, N,  with ỹ0 = 0,

which is the same span condition as in Theorem 4.1 with respect to 𝕋0. This is true from the fact that

yk − 𝕋yk = (yk − y0) − 𝕋0(yk − y0) = ỹk − 𝕋0ỹk

for k = 1, …, N.
Now, {ỹk}_{k=0}^N is a sequence starting from ỹ0 = 0 satisfying the span condition for 𝕋0. This implies that

∥yN − 𝕋yN∥² = ∥ỹN − 𝕋0ỹN∥²
≥ (1 + 1/γ)² (1/∑_{k=0}^N γ^k)² ∥ỹ0 − ỹ⋆∥²
= (1 + 1/γ)² (1/∑_{k=0}^N γ^k)² ∥y0 − y⋆∥²,

so 𝕋 is our desired worst-case 1/γ-contraction on Rn.


It remains to show that

∥𝔾yN∥² ≥ ∥P_span{v}(b)∥² = ∥(⟨b, v⟩/⟨v, v⟩) v∥² = (1/∑_{k=0}^N γ^k)² R²,   (∗)

where v = (1, γ, γ², …, γ^N)⊺; in particular, the identity (∗) marking the last equality. Indeed,

∥(⟨b, v⟩/⟨v, v⟩) v∥² = |⟨b, v⟩|² / ∥v∥²
= ( (R/(1+γ)) × ((1 + γ^{N+1})/√(1 + γ² + γ⁴ + ⋯ + γ^{2N})) )² × 1/(1 + γ² + γ⁴ + ⋯ + γ^{2N})
= ( (R/(1+γ)) × ((1 + γ^{N+1})/(1 + γ² + γ⁴ + ⋯ + γ^{2N})) )²
= ( R/(1 + γ + γ² + ⋯ + γ^N) )²
= (1/∑_{k=0}^N γ^k)² R².
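The identity (∗) can also be confirmed numerically; a small sketch (our own, with illustrative values of γ, N, and R):

```python
import math

def proj_sq_and_target(gamma, N, R=1.0):
    # b = b0 * e1 with b0 as in the proof; v = (1, gamma, ..., gamma^N)
    S2 = sum(gamma ** (2 * k) for k in range(N + 1))
    b0 = R * (1 + gamma ** (N + 1)) / ((1 + gamma) * math.sqrt(S2))
    v = [gamma ** k for k in range(N + 1)]
    bv = b0 * v[0]                    # <b, v>, since b = b0 * e1
    vv = sum(vi * vi for vi in v)     # <v, v> = ||v||^2
    proj_sq = bv ** 2 / vv            # || P_span{v}(b) ||^2
    S1 = sum(gamma ** k for k in range(N + 1))
    return proj_sq, (R / S1) ** 2     # the two sides of (*)

for gamma in (1.0, 1.1, 1.5, 3.0):
    for N in (1, 2, 5, 20):
        lhs, rhs = proj_sq_and_target(gamma, N)
        assert abs(lhs - rhs) <= 1e-9 * max(lhs, rhs)
print("identity (*) verified")
```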

Proof of Corollary 4.2. According to Lemma 2.1, 𝕋 is 1/γ-contractive if and only if 𝔸 = (𝕋 + (1/γ)𝕀)^{−1} − 𝕀 is (γ−1)/2-strongly monotone. For any y ∈ Rn, if x = 𝕁𝔸y, then

y − 𝕋y = y − ((1 + 1/γ)𝕁𝔸y − (1/γ)y) = (1 + 1/γ)(y − x) = (1 + 1/γ)𝔸̃x.

This implies that

yk ∈ y0 + span{y0 − 𝕋y0, y1 − 𝕋y1, …, yk−1 − 𝕋yk−1},  k = 1, …, N,

if and only if

yk ∈ y0 + span{𝔸̃x1, …, 𝔸̃xk},  k = 1, …, N,

where xk = 𝕁𝔸yk−1. The span conditions in the statements of Theorem 4.1 and Corollary 4.2 are equivalent under the transformation 𝔸 = (𝕋 + (1/γ)𝕀)^{−1} − 𝕀. Therefore, the lower bound result of this corollary can be derived from the lower bound result of Theorem 4.1.

D.2. Deterministic algorithm classes


In this section, we provide the basic terminology and concepts needed to prove the complexity lower bound result for general algorithms. We follow the information-based complexity framework developed by Nemirovski & Yudin (1983), and use the resisting oracle technique to extend the results of Theorem 4.1 and Corollary 4.2 to general fixed-point iterations and general proximal point methods. The proof itself is motivated by the works of Carmon et al. (2020; 2021), and a large portion of the definitions and notation is due to their work.
In the information-based complexity framework, every iterate {yk}k∈N is a query to an information oracle, which returns restricted information on a given function or operator. Assumptions on the algorithm, such as the linear span condition, then specify how it uses this information. For instance, provided with a gradient oracle Of(x) = ∇f(x) of a convex function f to be minimized, first-order algorithms usually search within the span of previous gradients to reach the next iterate.
A deterministic fixed-point iteration A is a mapping of an initial point y0 and an operator 𝕋 to a sequence of iterates {yt }t∈N
and {ȳt }t∈N , such that the output depends on 𝕋 only through the fixed-point residual oracle O𝕋 (y) = y − 𝕋y. Here,
‘deterministic’ means that given the same initial point y0 and the sequence of oracle evaluations {O𝕋 (yt )}t∈N , the algorithm
yields the same sequence of iterates {(yt , ȳt )}t∈N . More precisely, we define A per iteration by setting A = {At }t∈N with

(yt , ȳt ) = At [y0 ; 𝕋] = At [y0 , O𝕋 (y0 ), . . . , O𝕋 (yt−1 )],

where yt is the t-th query point and ȳt is the t-th approximate solution produced by At . Here, we consider the algorithms
whose query points and approximate solutions are identical (yt = ȳt ).
Even though A is defined to produce infinitely many yt- and ȳt-iterates, the definition includes the case where the algorithm terminates at a predetermined total iteration count N, i.e., the algorithm may have a predetermined iteration count N and its behavior may depend on the specified value of N. In such cases, yN = ȳN = yN+1 = ȳN+1 = ⋯.
Similarly, a deterministic proximal point method A is a mapping of an initial point y0 and a maximal monotone operator 𝔸 to a sequence of query points {yt}t∈N and approximate solutions {ȳt}t∈N, such that the output depends on 𝔸 only through the resolvent residual oracle O𝔸(y) = y − 𝕁𝔸y = 𝔸̃x ∈ 𝔸x, where x = 𝕁𝔸y. Indeed, this method A yields the same sequence of iterates given the same initial point y0 and oracle evaluations {O𝔸(yt)}t∈N.
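To make the oracle formalism concrete, here is a minimal sketch (our own illustration) of a deterministic fixed-point iteration, the Halpern-type (OHM) update, that accesses 𝕋 only through the residual oracle O𝕋(y) = y − 𝕋y. The toy rotation operator is an assumption for the demo.

```python
import math

def ohm(y0, oracle, N):
    # OHM: y_{k+1} = (1 - 1/(k+2)) * T y_k + (1/(k+2)) * y0,
    # accessing T only through the residual oracle O_T(y) = y - T y.
    y = list(y0)
    for k in range(N):
        r = oracle(y)                                 # O_T(y_k)
        Ty = [yi - ri for yi, ri in zip(y, r)]        # recover T y_k
        t = 1.0 / (k + 2)
        y = [(1 - t) * a + t * b for a, b in zip(Ty, y0)]
    return y

# Toy nonexpansive operator on R^2: a rotation (illustrative assumption)
theta = 0.5
def oracle(y):
    c, s = math.cos(theta), math.sin(theta)
    Ty = (c * y[0] - s * y[1], s * y[0] + c * y[1])
    return [y[0] - Ty[0], y[1] - Ty[1]]

y0 = [1.0, 0.0]
yN = ohm(y0, oracle, 1000)
res = math.hypot(*oracle(yN))
print(res)   # small fixed-point residual after 1000 oracle queries
```

Since OHM guarantees ∥yN − 𝕋yN∥ ≤ 2∥y0 − y⋆∥/(N+1), the residual after 1000 queries is on the order of 10⁻³ here.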

D.3. Generalized complexity lower bound


As mentioned earlier, general deterministic fixed-point iterations need not satisfy the span condition. We use the resisting oracle technique (Nemirovski & Yudin, 1983) to prove the lower bound result for general deterministic fixed-point iterations. Recall that Theorem 4.6 is
Theorem 4.6 (Complexity lower bound of general deterministic fixed-point iterations). Let n ≥ 2N for N ∈ N. For any deterministic fixed-point iteration A and any initial point y0 ∈ Rn, there exists a 1/γ-Lipschitz operator 𝕋 : Rn → Rn with a

fixed point y⋆ ∈ Fix 𝕋 such that

∥yN − 𝕋yN∥² ≥ (1 + 1/γ)² (1/∑_{k=0}^N γ^k)² ∥y0 − y⋆∥²

where {yt }t∈N = A[y0 ; 𝕋].

By the equivalence of the optimization problems and algorithms stated in Lemma 2.1 and Lemma 3.1, Theorem 4.6 also generalizes Corollary 4.2 to general proximal point methods.
Corollary D.1 (Complexity lower bound of general proximal point methods). Let n ≥ 2N − 2 for N ∈ N. For any deterministic proximal point method A and arbitrary initial point y0 ∈ Rn, there exists a µ-strongly monotone operator 𝔸 : Rn → Rn with a zero x⋆ ∈ Zer 𝔸 such that

∥𝔸̃xN∥² ≥ (1/(1 + γ + ⋯ + γ^{N−1}))² ∥y0 − x⋆∥²

where {yt}t∈N = A[y0; 𝔸].

D.4. Proof of Theorem 4.6


In order to prove Theorem 4.6, we first extend the result of Theorem 4.1 to zero-respecting sequences, a requirement slightly more general than the span assumption. The worst-case operator of Theorem 4.1 covers the case of zero-respecting sequences, and this result will then be extended to general deterministic fixed-point iterations.
We say that a sequence {zt}_{t∈N∪{0}} ⊆ Rd is zero-respecting with respect to 𝕋 if

supp{zt} ⊆ ∪_{s<t} supp{zs − 𝕋zs}

for every t ∈ N ∪ {0}, where supp{z} := {i ∈ [d] | ⟨z, ei⟩ ≠ 0}. A deterministic fixed-point iteration A is called zero-respecting if A generates a sequence {zt}_{t∈N∪{0}} that is zero-respecting with respect to 𝕋 for any nonexpansive 𝕋 : Rd → Rd. Note that by definition, z0 = 0. For notational simplicity, define supp V = ∪_{z∈V} supp{z}.

This property serves as an important intermediate step toward the generalization of Theorem 4.1; a similar notion called a 'zero-chain' appears frequently in the convex optimization literature (Nesterov, 2004; Drori, 2017; Carmon et al., 2020; Drori & Taylor, 2022). The worst-case operator found in the proof of Theorem 4.1 remains worst-case among all zero-respecting query points with respect to 𝕋, according to the following lemma.
Lemma D.2. Let 𝕋 : R^{N+1} → R^{N+1} be the worst-case operator defined in the proof of Theorem 4.1. If the iterates {zt}_{t=0}^N are zero-respecting with respect to 𝕋, then

∥zN − 𝕋zN∥² ≥ (1 + 1/γ)² (1/(1 + γ + ⋯ + γ^N))² ∥z0 − z⋆∥²

for z⋆ ∈ Fix 𝕋.

Proof. Let 𝔾 be defined as in the proof of Theorem 4.1. Then we have

z ∈ span{e1, e2, …, ek} ⟹ 𝔾z ∈ span{e1, e2, …, ek+1}.

We claim that any zero-respecting sequence {zk}_{k=0,1,…,N} satisfies

zk ∈ span{e1, e2, …, ek}
𝔾zk = (γ/(1+γ))(zk − 𝕋zk) ∈ span{e1, e2, …, ek+1}

for k = 0, 1, …, N, so that the lower bound result of Theorem 4.1 is applicable.



If k = 0, then z0 = 0 and, from this, 𝔾0 ∈ span{e1}, so the case of k = 0 holds. Now, suppose that 0 < k ≤ N and the claim holds for all n < k. Then 𝔾zn ∈ span{e1, …, en+1} ⊆ span{e1, …, ek} for 0 ≤ n < k. {zk}_{k=0}^N is zero-respecting with respect to 𝕋, so

supp{zk} ⊆ ∪_{n<k} supp{zn − 𝕋zn} = supp{𝔾z0, 𝔾z1, …, 𝔾zk−1} ⊆ supp{e1, e2, …, ek}.

Therefore, zk ∈ span{e1, e2, …, ek}, and 𝔾zk ∈ span{e1, e2, …, ek+1}. The claim holds for k = 1, …, N.
According to the proof of Theorem 4.1,

∥zN − 𝕋zN∥² ≥ (1 + 1/γ)² (1/(1 + γ + ⋯ + γ^N))² ∥z0 − z⋆∥²

for any zero-respecting iterates {zk}_{k=0}^N with respect to 𝕋.

We say that a matrix U ∈ R^{m×n} with m ≥ n is orthogonal if its columns {ui}_{i=1}^n ⊆ Rm, as in U = [u1 ⋯ un], are orthonormal to each other, or in other words, U⊺U = In. It directly follows that UU⊺ is the orthogonal projection from Rm onto the range R(U) of U.
Lemma D.3. For any orthogonal matrix U ∈ R^{m×n} with m ≥ n and any arbitrary vector y0 ∈ Rm, if 𝕋 : Rn → Rn is a 1/γ-contractive operator with γ ≥ 1, then 𝕋U : Rm → Rm defined as

𝕋U(y) := U𝕋U⊺(y − y0) + y0,  ∀y ∈ Rm,

is also a 1/γ-contractive operator. Furthermore, z⋆ ∈ Fix 𝕋 if and only if y⋆ = y0 + Uz⋆ ∈ Fix 𝕋U.

Proof. For any x, z ∈ Rm,

∥𝕋U x − 𝕋U z∥ = ∥U𝕋U⊺(x − y0) − U𝕋U⊺(z − y0)∥
= ∥𝕋U⊺(x − y0) − 𝕋U⊺(z − y0)∥ (U is an orthogonal matrix)
≤ (1/γ) ∥U⊺(x − y0) − U⊺(z − y0)∥ (𝕋 is 1/γ-contractive)
= (1/γ) ∥UU⊺(x − z)∥
≤ (1/γ) ∥x − z∥. (UU⊺ is an orthogonal projection onto R(U))

Now, suppose z⋆ is a fixed point of 𝕋. Then

𝕋U(y⋆) = U𝕋U⊺Uz⋆ + y0 = U𝕋z⋆ + y0 = Uz⋆ + y0 = y⋆,

so y⋆ is a fixed point of 𝕋U. On the other hand, if y⋆ is a fixed point of 𝕋U, then z⋆ = U⊺(y⋆ − y0) satisfies

𝕋(z⋆) = 𝕋U⊺(y⋆ − y0)
= U⊺U𝕋U⊺(y⋆ − y0) (U⊺U = In)
= U⊺(𝕋U y⋆ − y0)
= U⊺(y⋆ − y0) = z⋆, (y⋆ ∈ Fix 𝕋U)

so it is a fixed point of 𝕋.
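Lemma D.3 can be sanity-checked numerically; in the sketch below (our own illustration), 𝕋 is a toy 1/γ-contraction on R², U embeds R² into R⁴ using the first two standard basis vectors as columns, and both the contraction factor of 𝕋U and the fixed-point correspondence are verified.

```python
import math, random

random.seed(1)
gamma = 2.0
theta, c0 = 0.9, (0.3, -0.2)   # illustrative toy contraction parameters

def T(z):
    # 1/gamma-contractive map on R^2: scaled rotation plus a shift
    c, s = math.cos(theta), math.sin(theta)
    return ((c * z[0] - s * z[1]) / gamma + c0[0],
            (s * z[0] + c * z[1]) / gamma + c0[1])

def Ut(y):   # U^T y : R^4 -> R^2
    return (y[0], y[1])

def Uv(z):   # U z : R^2 -> R^4
    return (z[0], z[1], 0.0, 0.0)

y0 = (0.7, -1.3, 2.0, 0.4)

def TU(y):
    # T_U(y) = U T U^T (y - y0) + y0
    d = tuple(a - b for a, b in zip(y, y0))
    return tuple(a + b for a, b in zip(Uv(T(Ut(d))), y0))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# T_U is 1/gamma-contractive on R^4
contractive = all(
    norm(tuple(a - b for a, b in zip(TU(x), TU(z))))
    <= norm(tuple(a - b for a, b in zip(x, z))) / gamma + 1e-9
    for x, z in ((tuple(random.uniform(-5, 5) for _ in range(4)),
                  tuple(random.uniform(-5, 5) for _ in range(4)))
                 for _ in range(500))
)

# Fixed-point correspondence: z_star of T maps to y0 + U z_star for T_U
z = (0.0, 0.0)
for _ in range(200):
    z = T(z)   # Banach iteration converges to z_star
ystar = tuple(a + b for a, b in zip(Uv(z), y0))
fp_residual = norm(tuple(a - b for a, b in zip(TU(ystar), ystar)))
print(contractive, fp_residual)
```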

Lemma D.4. Let A be a general deterministic fixed-point iteration, and let 𝕋 : Rn → Rn be a 1/γ-contractive operator. For m ≥ n + N − 1 and any arbitrary point y0 ∈ Rm, there exists an orthogonal matrix U ∈ R^{m×n} and iterates {yt}_{t=0}^N = A[y0; 𝕋U] with the following properties.

(i) Let z(t) := U⊺(yt − y0) for t = 0, 1, …, N. Then {z(t)}_{t=0}^N is zero-respecting with respect to 𝕋.

(ii) {z(t)}_{t=0}^N satisfies ∥z(t) − 𝕋z(t)∥ ≤ ∥yt − 𝕋U yt∥ for t = 0, …, N.

Proof. We first show that (i) implies (ii). From (i), we know that z(t) = U⊺(yt − y0) for t = 0, 1, …, N. Therefore,

∥z(t) − 𝕋z(t)∥ = ∥U⊺(yt − y0) − 𝕋U⊺(yt − y0)∥
= ∥UU⊺(yt − y0) − UU⊺U𝕋U⊺(yt − y0)∥ (U is orthogonal)
= ∥UU⊺{(yt − y0) − U𝕋U⊺(yt − y0)}∥ (U⊺U = In)
≤ ∥(yt − y0) − U𝕋U⊺(yt − y0)∥ (UU⊺ is an orthogonal projection)
= ∥yt − 𝕋U yt∥. (definition of 𝕋U)

Now we prove the existence of an orthogonal U ∈ R^{m×n} with {yt}_{t=0}^N = A[y0; 𝕋U] such that (i) holds. To show the existence of such an orthogonal matrix U, we provide an inductive scheme that finds the columns of U at each iteration. Before describing the actual scheme, we first provide some observations useful for deriving the necessary conditions that the columns {ui}_{i=1}^n of U must satisfy.
Let t ∈ {1, …, N}, and define the set of indices St as

St = ∪_{s<t} supp{z(s) − 𝕋z(s)}.

For {z(t)}_{t=0}^N to satisfy the zero-respecting property with respect to 𝕋, z(t) is required to satisfy

supp{z(t)} ⊆ St

for t = 1, …, N. This requirement is fulfilled when

yt − y0 ∈ span{ui}_{i∈St},

or equivalently,

⟨ui, yt − y0⟩ = 0

for every i ∉ St. Note that z(0) = U⊺(y0 − y0) = 0 is trivial.
We now construct U ∈ R^{m×n}. Note that S0 = ∅ ⊆ S1 ⊆ ⋯ ⊆ SN. The columns {ui}_{i∈St\St−1} are chosen inductively starting from t = 1. Suppose we have already chosen {ui}_{i∈St−1}. Choose {ui}_{i∈St\St−1} from the orthogonal complement of

Wt := span({y1 − y0, ⋯, yt−1 − y0} ∪ {ui}_{i∈St−1})

and let them be orthogonal to each other. In case SNᶜ ≠ ∅, for i ∉ SN, choose proper vectors ui so that U becomes an orthogonal matrix. This is possible when the dimension of Wt⊥ is large enough to draw |St \ St−1|-many orthogonal vectors, or in other words,

dim Wt⊥ ≥ |St \ St−1|.

From the assumption, m − t + 1 ≥ m − N + 1 ≥ n, so we have the guarantee that

dim Wt⊥ = m − dim Wt ≥ m − {(t − 1) + |St−1|} ≥ |St−1ᶜ| = n − |St−1| ≥ |St \ St−1|.

The columns {ui}_{i=1}^n of the constructed U satisfy ⟨ui, yt − y0⟩ = 0 if i ∉ St, for t = 1, …, N. Therefore,

z(t) = U⊺(yt − y0) ∈ span{ei}_{i∈St},

which leads to supp{z(t)} ⊆ St.



We now prove the complexity lower bound result for general fixed-point iterations.

Proof of Theorem 4.6. For any deterministic fixed-point iteration A and initial point y0 ∈ Rn, consider the worst-case operator 𝕋 : R^{N+1} → R^{N+1} defined in the proof of Theorem 4.1. According to Lemma D.4, there exists an orthogonal U ∈ R^{n×(N+1)} with n ≥ (N + 1) + (N − 1) = 2N such that z(k) = U⊺(yk − y0) for k = 0, …, N,

∥z(k) − 𝕋z(k)∥ ≤ ∥yk − 𝕋U yk∥,  k = 0, …, N,

where the query points {yk}_{k=0}^N are generated from applying A to 𝕋U given initial point y0, and {z(k)}_{k=0}^N is a zero-respecting sequence with respect to 𝕋. According to Lemma D.2,

∥z(N) − 𝕋z(N)∥² ≥ (1 + 1/γ)² (1/(1 + γ + ⋯ + γ^N))² ∥z(0) − z⋆∥².

According to Lemma D.3, y⋆ = y0 + Uz⋆ ∈ Fix 𝕋U for z⋆ ∈ Fix 𝕋, so

∥y0 − y⋆∥² = ∥U(z(0) − z⋆)∥² = ∥z(0) − z⋆∥²,

where the second identity comes from the orthogonality of U. We may conclude that

∥yN − 𝕋U yN∥² ≥ (1 + 1/γ)² (1/(1 + γ + ⋯ + γ^N))² ∥y0 − y⋆∥²

and that 𝕋U : Rn → Rn is the desired worst-case 1/γ-contraction with n ≥ 2N.

E. Omitted proofs of Section 5


E.1. Convergence rate of proximal point method
Lemma E.1. Let {xk}k∈N be the iterates generated by applying PPM, xk+1 = 𝕁𝔸xk, starting from x0 ∈ Rn, given a uniformly monotone operator 𝔸 with parameters µ > 0 and α > 1. Now let Ak := ∥xk − x⋆∥² and Bk := ∥𝔸̃xk+1∥². Then for any k ∈ N ∪ {0},

Ak ≥ Ak+1 (1 + µAk+1^{(α−1)/2})²
Bk ≥ Bk+1.

Proof. Note that the PPM update xk+1 = 𝕁𝔸xk is equivalent to xk = xk+1 + 𝔸̃xk+1, where 𝔸̃xk+1 ∈ 𝔸xk+1. Thus

xk − x⋆ = (xk+1 + 𝔸̃xk+1) − x⋆ = (xk+1 − x⋆) + 𝔸̃xk+1.

Then

Ak = Ak+1 + Bk + 2⟨𝔸̃xk+1, xk+1 − x⋆⟩
≥ Ak+1 + Bk + 2µ∥xk+1 − x⋆∥^{α+1}
= Ak+1 + Bk + 2µAk+1^{(α+1)/2}
≥ Ak+1 + µ²Ak+1^α + 2µAk+1^{(α+1)/2}
≥ Ak+1 (1 + µAk+1^{(α−1)/2})²,

where the second inequality follows from

∥𝔸̃xk+1∥∥xk+1 − x⋆∥ ≥ ⟨𝔸̃xk+1, xk+1 − x⋆⟩ ≥ µ∥xk+1 − x⋆∥^{α+1}.

Also, from

Bk − Bk+1 = ∥𝔸̃xk∥² − ∥𝔸̃xk+1∥²
= (∥𝔸̃xk∥² + ∥𝔸̃xk+1∥²) − 2∥𝔸̃xk+1∥²
≥ 2⟨𝔸̃xk, 𝔸̃xk+1⟩ − 2∥𝔸̃xk+1∥² (Young's inequality)
= −2⟨𝔸̃xk+1 − 𝔸̃xk, 𝔸̃xk+1⟩
= 2⟨𝔸̃xk+1 − 𝔸̃xk, xk+1 − xk⟩ ≥ 0, (monotonicity of 𝔸)

we get Bk ≥ Bk+1.
Theorem E.2. If 𝔸 : Rn ⇒ Rn is a uniformly monotone operator with parameters µ > 0 and α > 1, there exists C > 0
such that the iterates {xk }k∈N generated by PPM exhibits the rate

C
∥xk − x⋆ ∥2 ≤ 2
k α−1

for any k ∈ N.

Proof. We use induction on k to show the convergence rate, and find the necessary conditions for C > 0 to satisfy.
In the case k = 1, ∥x1 − x⋆∥² ≤ C must be satisfied. Lemma E.1 implies the monotonicity of Ak, so any C with C ≥ ∥x0 − x⋆∥² is a suitable choice.
Now, suppose that Ak ≤ Ck^{−2/(α−1)} and k ≥ 1. We claim that Ak+1 ≤ C(k+1)^{−2/(α−1)} for the same C > 0. Define fµα : [0, ∞) → [0, ∞) as

fµα(t) := t (1 + µt^{(α−1)/2})².

Then fµα(Ak+1) ≤ Ak from Lemma E.1. If fµα(Ak+1) ≤ fµα(C(k+1)^{−2/(α−1)}), since fµα is a monotonically increasing function over [0, ∞), we are done. Define
an := (n + 1){(1 + 1/n)^{1/(α−1)} − 1},

and the function g : (0, ∞) → R as

g(x) = (1 + 1/x){(1 + x)^{1/(α−1)} − 1},

so that an = g(1/n) for n ∈ N. Then

g′(x) = −(1/x²){(1 + x)^{1/(α−1)} − 1} + (1 + 1/x)(1/(α−1))(1 + x)^{1/(α−1)−1}
= −(1 + x)^{1/(α−1)}/x² + 1/x² + (1/(α−1))(1 + x)(1 + x)^{1/(α−1)−1}/x
= {−(1 + x)^{1/(α−1)} + 1 + (x/(α−1))(1 + x)^{1/(α−1)}}/x²
= ((1 + x)^{1/(α−1)}/x²){(1 + x)^{−1/(α−1)} − (1 − x/(α−1))}.

As x ↦ (1 + x)^{−1/(α−1)} is a convex function on [0, ∞) and x ↦ 1 − x/(α−1) is its first-order approximation at 0, we have g′(x) ≥ 0 for x > 0. Hence g is a monotonically increasing function, so g attains its maximum on (0, 1] at x = 1, and we have

sup_{n∈N} an = sup_{n∈N} g(1/n) = g(1) = 2(2^{1/(α−1)} − 1) = 2^{α/(α−1)} − 2.

The boundedness of an leads to the equivalences

ak ≤ 2^{α/(α−1)} − 2
⟺ (1 + 1/k)^{1/(α−1)} ≤ 1 + (2^{α/(α−1)} − 2)/(k + 1)
⟺ C/k^{2/(α−1)} ≤ (C/(k+1)^{2/(α−1)}) (1 + ((2^{α/(α−1)} − 2)/C^{(α−1)/2}) (C/(k+1)^{2/(α−1)})^{(α−1)/2})²

for any choice of C > 0. Choosing C ≥ µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, which is equivalent to

(2^{α/(α−1)} − 2)/C^{(α−1)/2} ≤ µ,

we get

(C/(k+1)^{2/(α−1)}) (1 + ((2^{α/(α−1)} − 2)/C^{(α−1)/2}) (C/(k+1)^{2/(α−1)})^{(α−1)/2})²
≤ (C/(k+1)^{2/(α−1)}) (1 + µ(C/(k+1)^{2/(α−1)})^{(α−1)/2})²
= fµα(C/(k+1)^{2/(α−1)}).

Gathering all the inequalities above, if C ≥ µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, then

fµα(Ak+1) ≤ Ak ≤ C/k^{2/(α−1)} ≤ fµα(C/(k+1)^{2/(α−1)}),

so we get

Ak ≤ C/k^{2/(α−1)} ⟹ Ak+1 ≤ C/(k+1)^{2/(α−1)}

for k = 1, 2, …. Therefore,

∥xk − x⋆∥² ≤ C/k^{2/(α−1)} = max{µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, ∥x0 − x⋆∥²} / k^{2/(α−1)}.
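A one-dimensional sketch (our own toy instance, not from the paper) illustrates the rate of Theorem E.2: the operator 𝔸x = µx^α on x ≥ 0 has zero x⋆ = 0 and satisfies the inequalities used in the proof along the (nonnegative) PPM iterates, since (x^α − y^α)(x − y) ≥ |x − y|^{α+1} for x, y ≥ 0 and α ≥ 1. The PPM iterates then obey the bound with the constant C from the proof.

```python
# Toy 1-D instance: A x = mu * x**alpha on x >= 0, with zero x_star = 0.
mu, alpha = 0.5, 3.0      # illustrative parameters
x = 2.0                   # x0
C = max(mu ** (-2 / (alpha - 1))
        * (2 ** (alpha / (alpha - 1)) - 2) ** (2 / (alpha - 1)),
        x ** 2)           # the constant from the proof of Theorem E.2

def resolvent(y):
    # PPM step: solve x + mu * x**alpha = y for x in [0, y] by bisection
    lo, hi = 0.0, y
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid + mu * mid ** alpha <= y:
            lo = mid
        else:
            hi = mid
    return lo

rate_holds = True
for k in range(1, 2001):
    x = resolvent(x)      # x_{k+1} = J_A x_k
    rate_holds = rate_holds and x ** 2 <= C / k ** (2 / (alpha - 1)) + 1e-12
print(rate_holds)
```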

We now prove the convergence rate of PPM in terms of Bk = ∥𝔸̃xk+1∥².

Proof of Theorem 5.1. We claim the convergence rate of Bk−1 = ∥𝔸̃xk∥² to be as stated above. From the proof of Lemma E.1, we have

Bk ≤ Ak − Ak+1 − 2µAk+1^{(α+1)/2} ≤ Ak − Ak+1.
If N = 1, then

B0 ≤ A0 − A1 ≤ A0 = ∥x0 − x⋆∥².

Suppose N ≥ 2. Let n := ⌊N/2⌋, where ⌊x⌋ is the largest integer not exceeding x. Summing up the above inequality from k = n to k = N − 1 and using the monotonicity of Bk, we have

(N/2) BN−1 ≤ (N − n) BN−1 ≤ ∑_{k=n}^{N−1} Bk ≤ ∑_{k=n}^{N−1} (Ak − Ak+1) = An − AN ≤ An.

Note that from the convergence analysis of Ak, i.e., Theorem E.2, we have

An ≤ C / n^{2/(α−1)},

where C = max{µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, ∥x0 − x⋆∥²}. Therefore,

(N/2) BN−1 ≤ C / n^{2/(α−1)} ≤ C / ((N−1)/2)^{2/(α−1)},

so we may conclude that, for any N ≥ 2,

BN−1 ≤ 2^{(α+1)/(α−1)} C / ((N−1)^{2/(α−1)} N)
= 2^{(α+1)/(α−1)} max{µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, ∥x0 − x⋆∥²} / ((N−1)^{2/(α−1)} N)
≤ 2^{(α+3)/(α−1)} max{µ^{−2/(α−1)}(2^{α/(α−1)} − 2)^{2/(α−1)}, ∥x0 − x⋆∥²} / N^{(α+1)/(α−1)}
= O(N^{−(α+1)/(α−1)}),

where the second inequality follows from 2(N − 1) ≥ N. Since this bound also holds for the case of N = 1 from B0 ≤ ∥x0 − x⋆∥², we are done.

E.2. Convergence rate of restarted OS-PPM (OS-PPM_0^res)

Roulet & d'Aspremont (2020) showed that if the objective function f of a smooth convex minimization problem satisfies a Hölderian error bound condition

(µ/r) ∥x − x⋆∥^r ≤ f(x) − f⋆,  ∀x ∈ K ⊂ Rn,

where x⋆ ∈ K is a minimizer of f and K is a given set, then the unaccelerated base algorithm can be accelerated with a restarting scheme. The restarting schedule uses tk iterations for the k-th outer loop, recursively satisfying

f(xk) − f⋆ ≤ e^{−ηk}(f(x0) − f⋆),  k = 1, 2, …,

for some η > 0, where xk = A(xk−1, tk) is the output of the k-th outer loop, which applies tk iterations of the base algorithm A starting from xk−1. If the objective function is strongly convex near the solution (r = 2), a constant restarting schedule tk = λ provides a faster rate than the unaccelerated base algorithm (Nemirovski & Nesterov, 1985). If the objective function satisfies a Hölderian error bound condition but is not strongly convex (r > 2), then an exponentially growing schedule tk = λe^{βk} for some λ > 0 and β > 0 results in a faster sublinear convergence rate.
As notable prior work, Kim (2021) studied APPM with a constant restarting schedule in the strongly monotone setup but was not able to obtain a rate faster than plain PPM. We show that restarting with an exponentially increasing schedule accelerates (OS-PPM) under uniform monotonicity, as in the case r > 2 of Roulet & d'Aspremont (2020).

Proof of Theorem 5.2. Given an initial point x0 ∈ Rn, let x̃0 be the iterate generated by applying APPM to x0 only once. Then

x̃0 = (1/2)(2𝕁𝔸x0 − x0) + (1/2)x0 = 𝕁𝔸x0,

so we get

∥x0 − x⋆∥² = ∥𝔸̃x̃0 + (x̃0 − x⋆)∥² = ∥𝔸̃x̃0∥² + 2⟨𝔸̃x̃0, x̃0 − x⋆⟩ + ∥x̃0 − x⋆∥².

From the monotonicity of 𝔸, ⟨𝔸̃x̃0, x̃0 − x⋆⟩ ≥ 0, so we may conclude that

∥𝔸̃x̃0∥² ≤ ∥x0 − x⋆∥².

Now we describe the restarting scheme of APPM. Let tk be the number of inner iterations of APPM in the k-th outer iteration. This outer iteration starts from x̃k−1 and outputs x̃k after applying tk iterations of APPM. Then the k-th outer iteration results in

∥𝔸̃x̃k∥² ≤ (1/(tk + 1)²) ∥x̃k−1 − x⋆∥² ≤ (1/tk²) ∥x̃k−1 − x⋆∥² ≤ (1/(µ^{2/α} tk²)) ∥𝔸̃x̃k−1∥^{2/α},

where the last inequality follows from

∥𝔸̃x̃k−1∥∥x̃k−1 − x⋆∥ ≥ ⟨𝔸̃x̃k−1, x̃k−1 − x⋆⟩ ≥ µ∥x̃k−1 − x⋆∥^{α+1}.

In order to find a possible choice of restart schedule, we will iteratively find the number tk of inner iterations for the k-th outer iteration which satisfies

∥𝔸̃x̃k∥² ≤ e^{−ηk} ∥x0 − x⋆∥²

for some η > 0. The case of k = 0 holds automatically. Suppose k ≥ 1, and t1, …, tk−1 are already chosen to satisfy

∥𝔸̃x̃k−1∥² ≤ e^{−η(k−1)} ∥x0 − x⋆∥²

for k ≥ 1. Then

∥𝔸̃x̃k∥² ≤ (1/(µ^{2/α} tk²)) ∥𝔸̃x̃k−1∥^{2/α} ≤ (1/(µ^{2/α} tk²)) e^{−η(k−1)/α} ∥x0 − x⋆∥^{2/α},

so the claimed convergence rate is guaranteed if

(1/(µ^{2/α} tk²)) e^{−η(k−1)/α} ∥x0 − x⋆∥^{2/α} ≤ e^{−ηk} ∥x0 − x⋆∥².

This is equivalent to

tk ≥ µ^{−1/α} e^{η/(2α)} ∥x0 − x⋆∥^{−(1−1/α)} · exp{(η/2)(1 − 1/α) k},

where we set λ := µ^{−1/α} e^{η/(2α)} ∥x0 − x⋆∥^{−(1−1/α)} and β := (η/2)(1 − 1/α). So if tk ≥ λe^{βk} for k = 1, …, R, then ∥𝔸̃x̃k∥² ≤ e^{−ηk} ∥x0 − x⋆∥² for k = 1, …, R.
Now we prove that the choice of

tk = ⌈λe^{βk}⌉ for k = 1, …, R − 1,  and  tR = N − 1 − ∑_{k=1}^{R−1} tk,

for the integer R satisfying

∑_{k=1}^R ⌈λe^{βk}⌉ ≤ N − 1 < ∑_{k=1}^{R+1} ⌈λe^{βk}⌉,

results in the O(N^{−2α/(α−1)}) rate of ∥𝔸̃x̂∥² for restarted OS-PPM (OS-PPM_0^res).

For k = 1, …, R − 1, tk ≥ λe^{βk} by the definition of tk. If k = R, from

N − 1 = ∑_{k=1}^{R−1} tk + tR = ∑_{k=1}^{R−1} ⌈λe^{βk}⌉ + tR,

we have

tR = N − 1 − ∑_{k=1}^{R−1} ⌈λe^{βk}⌉ ≥ ⌈λe^{βR}⌉ ≥ λe^{βR}.

Therefore, tk ≥ λe^{βk} for k = 1, 2, …, R, and we get

∥𝔸̃x̃R∥² ≤ e^{−ηR} ∥x0 − x⋆∥².

To find the upper bound on ∥𝔸̃x̃R∥² using the inequality above, we obtain a lower bound on R. From λe^{βk} ≤ ⌈λe^{βk}⌉ ≤ λe^{βk} + 1 and ⌈λe^{βR}⌉ ≤ tR < ⌈λe^{βR}⌉ + ⌈λe^{β(R+1)}⌉, we have

∑_{k=1}^R λe^{βk} ≤ N − 1 = ∑_{k=1}^R tk = ∑_{k=1}^{R−1} ⌈λe^{βk}⌉ + tR ≤ ∑_{k=1}^{R+1} λe^{βk} + R + 1.  (1)

Using the first inequality in (1), we have

λe^β (e^{βR} − 1)/(e^β − 1) ≤ N − 1,

or equivalently,

R ≤ (1/β) log( ((N − 1)/λ)((e^β − 1)/e^β) + 1 ).
Plugging this upper bound of R into the second inequality of (1), we get

N − 1 ≤ λe^β (e^{β(R+1)} − 1)/(e^β − 1) + (1/β) log( ((N − 1)/λ)((e^β − 1)/e^β) + 1 ) + 1.

Simplifying this to obtain a lower bound on R, we get

e^{−β} { ((e^β − 1)/(λe^β)) ( N − 2 − (1/β) log( ((N − 1)/λ)((e^β − 1)/e^β) + 1 ) ) + 1 } ≤ e^{βR}.

Therefore,

∥𝔸̃x̃R∥² ≤ e^{−ηR} ∥x0 − x⋆∥²
≤ e^η { ((e^β − 1)/(λe^β)) ( N − 2 − (1/β) log( ((N − 1)/λ)((e^β − 1)/e^β) + 1 ) ) + 1 }^{−η/β} ∥x0 − x⋆∥²
= { ((e^β − 1)/(λe^{2β})) ( N − 2 − (1/β) log( ((e^β − 1)/(λe^β))(N − 1) + 1 ) ) + e^{−β} }^{−2α/(α−1)} ∥x0 − x⋆∥² (choose η = 2)
= O(N^{−2α/(α−1)}),

where λ = (e/µ)^{1/α} ∥x0 − x⋆∥^{−(1−1/α)}.

This is a rate faster than the O(N^{−(α+1)/(α−1)}) rate of PPM. Although the uniform monotonicity parameters µ > 0 and α > 1 are unknown, one can obtain a suboptimal restart schedule with an additional cost for the grid search as in Roulet & d'Aspremont (2020), where the resulting rate for the algorithm is O(N^{−2α/(α−1)} (log N)²).
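The restart schedule in the proof can be computed directly; a minimal sketch (our own; λ, β, and N below are illustrative values, and we assume the budget admits at least one full outer loop):

```python
import math

def restart_schedule(N, lam, beta):
    # R is the largest integer with sum_{k=1}^{R} ceil(lam * e^{beta k}) <= N - 1;
    # then t_k = ceil(lam * e^{beta k}) for k < R, and t_R absorbs the rest.
    R, total = 0, 0
    while total + math.ceil(lam * math.exp(beta * (R + 1))) <= N - 1:
        R += 1
        total += math.ceil(lam * math.exp(beta * R))
    t = [math.ceil(lam * math.exp(beta * k)) for k in range(1, R)]
    t.append(N - 1 - sum(t))          # t_R absorbs the remaining budget
    return R, t

R, t = restart_schedule(N=1000, lam=2.0, beta=0.3)
assert sum(t) == 1000 - 1                       # total inner-iteration budget
assert len(t) == R
assert all(t[k] >= 2.0 * math.exp(0.3 * (k + 1)) for k in range(R))
print(R, t)
```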

F. Experiment details
We now describe the experiments of Section 6 in further detail.

F.1. Experiment details of Section 6.1


In the first example, 𝕋θ is constructed with θ = 15° and γ = 1/0.95, and (OC-Halpern) is applied on 𝕋θ with the same γ = 1/0.95. In the second example, 𝕄 is constructed with µ = 0.035, and (OS-PPM) is applied on 𝕄 with the same µ = 0.035. For both experiments, we use N = 101 total iterations. The plots of both experiments display the position of every iterate with markers, with the methods started from the initial point y0 = (1, 0)⊺ ∈ R².

F.2. Experiment details of Section 6.2


X-ray CT reconstructs an image from measurements received from a number of detectors. Reconstruction of the original image is often formulated as a least-squares problem with total variation regularization:

minimize_{x∈Rⁿ} (1/2) ∥Ex − b∥² + λ∥Dx∥₁,  (2)

where x ∈ Rⁿ is a vectorized image, E ∈ R^{m×n} is the discrete Radon transform, b = Ex is the measurement, and D is the finite difference operator. This regularized least-squares problem can be solved using PDHG, also known as the Chambolle–Pock method (Chambolle & Pock, 2011). PDHG can be interpreted as an instance of variable metric PPM (He & Yuan, 2012); it is a nonexpansive fixed-point iteration (x^{k+1}, u^{k+1}, v^{k+1}) = 𝕋(x^k, u^k, v^k) defined as

x^{k+1} = x^k − αE⊺u^k − βD⊺v^k
u^{k+1} = (1/(1+α)) (u^k + αE(2x^{k+1} − x^k) − αb)
v^{k+1} = Π_{[−λα/β, λα/β]} (v^k + βD(2x^{k+1} − x^k))

with respect to the metric matrix

M = ⎡ (1/α)I    −E⊺      −(β/α)D⊺ ⎤
    ⎢ −E        (1/β)I    0       ⎥
    ⎣ −(β/α)D   0         (1/β)I  ⎦ .

Therefore, we apply OHM on 𝕋 as

(x^{k+1}, u^{k+1}, v^{k+1}) = (1 − 1/(k+2)) 𝕋(x^k, u^k, v^k) + (1/(k+2)) (x^0, u^0, v^0) (PDHG with OHM)

and use an additional restarting strategy to yield faster convergence.


In our experiment, we use a modified Shepp–Logan phantom image. We applied PDHG, PDHG combined with OHM, and PDHG combined with restarted OC-Halpern (OS-PPM_0^res), where the parameters are given as α = 0.01, β = 0.03, and λ = 1.0. We applied restarting with the schedule described in Theorem 5.2, with properly chosen schedule parameters λ > 0 and β > 0.

Figure 6. Images reconstructed by applying PDHG, PDHG with OHM, and PDHG with restarted OC-Halpern for 1000 iterations.

Figure 7. Function value suboptimality f(x^k) − f⋆ plot of PDHG, PDHG with OHM, and PDHG with restarted OC-Halpern (OS-PPM_0^res) in CT image reconstruction.

Figure 6 shows the reconstructed images after 1000 iterations. Restarted OC-Halpern (OS-PPM_0^res) effectively recovers the original image at a faster rate. Figure 7 shows that, even without a theoretical guarantee, the function value suboptimality decreases at a faster rate for OHM and restarted OC-Halpern.

F.3. Experiment details of Section 6.3


In this section, we approximated the Wasserstein distance (or earth mover's distance) between two probability distributions by solving the following discretized problem:

minimize_{mx, my} ∥m∥_{1,1} = ∑_{i=1}^n ∑_{j=1}^n (|m_{x,ij}| + |m_{y,ij}|)
subject to div(m) + ρ1 − ρ0 = 0.

To solve this problem, Li et al. (2018) used PDHG (Chambolle & Pock, 2011):

m̃^{k+1}_{x,ij} = (1/(1+εµ)) shrink1(m̃^k_{x,ij} + µ(∇Φ^k)_{x,ij}, µ)
m̃^{k+1}_{y,ij} = (1/(1+εµ)) shrink1(m̃^k_{y,ij} + µ(∇Φ^k)_{y,ij}, µ)
Φ^{k+1}_{ij} = Φ^k_{ij} + τ((div(2m^{k+1} − m^k))_{ij} + ρ1_{ij} − ρ0_{ij}) (Primal-dual method for EMD-L1)

for k = 1, 2, …, where m̃ = (m̃x, m̃y) is m = (mx, my) with zero padding on the last row and last column, respectively, hence making m̃x, m̃y ∈ R^{n×n}. We denote this fixed-point iteration by 𝕋, so that (m̃^{k+1}_x, m̃^{k+1}_y, Φ^{k+1}) = 𝕋(m̃^k_x, m̃^k_y, Φ^k).
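The shrink1 operation above is soft-thresholding, the proximal operator of the scaled absolute value; a minimal scalar sketch, following the standard definition:

```python
def shrink1(v, mu):
    # Soft-thresholding: prox of mu * |.|, i.e. sign(v) * max(|v| - mu, 0)
    if v > mu:
        return v - mu
    if v < -mu:
        return v + mu
    return 0.0

print(shrink1(3.0, 1.0), shrink1(-3.0, 1.0), shrink1(0.5, 1.0))  # 2.0 -2.0 0.0
```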
Combining OHM with this fixed-point iteration yields the iteration

(m̃^{k+1}_x, m̃^{k+1}_y, Φ^{k+1}) = (1 − 1/(k+2)) 𝕋(m̃^k_x, m̃^k_y, Φ^k) + (1/(k+2)) (m̃^0_x, m̃^0_y, Φ^0)

for k = 1, 2, …, and we also combine the restarting technique with an exponential schedule in hope of further acceleration.
This experiment calculated an approximation of the Wasserstein distance between the two probability distributions shown in Figure 8. We applied 3 different algorithms for N = 100,000 iterations with algorithm parameters µ = 1.0 × 10⁻⁶ and ε = 1.0. Restarting the algorithm with the Halpern scheme every 10,000 iterations already provided an accelerated rate, but we chose a better exponential schedule for the plot in Figure 9.

F.4. Experiment details of Section 6.4


We follow the settings of the decentralized compressed sensing experiment in Section IV of Shi et al. (2015). The underlying network has 10 nodes and 18 edges, and these edges connect the nodes as in Figure 11.

Figure 8. (a) Probability distribution ρ0. (b) Probability distribution ρ1. (c) Solution of the discretized problem in Section 6.3. The distributions are expressed as the colored parts of a 256 × 256 grid, where ρ0 contains the part x² + y² ≤ (0.3)² and ρ1 contains 4 identical circles with radius 0.2, centered at (±1, ±1).

Figure 9. Fixed-point residual of 𝕋 versus iteration count plot for approximating the Wasserstein distance.

The experiment considered the regularized least-squares problem on $\mathbb{R}^{50}$, where the sparse signal $x^\star$ has 10 nonzero entries:
$$
\underset{x\in\mathbb{R}^n}{\text{minimize}}\quad \frac{1}{n}\sum_{i=1}^{n} \|A_{(i)}x - b_{(i)}\|^2 + \lambda\|x\|_1.
$$

Each node $i$ maintains its local estimate $x_i$ of $x \in \mathbb{R}^n$ and has access to a sensing matrix $A_{(i)} \in \mathbb{R}^{m_i\times n}$, where $m_i$ is the number of accessible sensors. Here, we assume each node has $m_i = 3$ sensors, for a total of $m = 30$ sensors.
We applied PG-EXTRA, PG-EXTRA combined with OHM, PG-EXTRA with (OC-Halpern), and PG-EXTRA with Restarted OC-Halpern (OS-PPM$_0^{\mathrm{res}}$), since PG-EXTRA can be understood as a fixed-point iteration (Wu et al., 2018).
Let $x^k \in \mathbb{R}^{10\times n}$ be a vertical stack of $\mathbb{R}^n$ vectors, where the $i$-th row vector $x_i^k$ is the local copy of $x$ stored in node $i$. The vectors in node $i$ only interact with vectors in the close neighborhood of node $i$. The fixed-point iteration $(x^{k+1}, w^{k+1}) = \mathbb{T}(x^k, w^k)$ is
$$
\begin{aligned}
x_i^{k+1} &= \operatorname{Prox}_{\alpha\lambda\|\cdot\|_1}\!\Bigg(\sum_{j} W_{i,j} x_j^{k} - \alpha A_{(i)}^{\intercal}\big(A_{(i)} x_i^{k} - b_{(i)}\big) - w_i^{k}\Bigg) \\
w^{k+1} &= w^{k} + \frac{1}{2}(I - W)x^{k}
\end{aligned}
$$
and PG-EXTRA combined with OHM is
$$
(x^{k+1}, w^{k+1}) = \left(1 - \frac{1}{k+2}\right)\mathbb{T}(x^k, w^k) + \frac{1}{k+2}(x^0, w^0)
\qquad\text{(PG-EXTRA with OHM)}
$$
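The fixed-point map $\mathbb{T}$ displayed above can be sketched in a few lines of numpy; a minimal illustration for dense local data, with our own helper names (`soft_threshold`, `pg_extra_step`), not the reference implementation of Shi et al. (2015):

```python
import numpy as np

def soft_threshold(v, t):
    # Prox of t*||.||_1: entrywise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pg_extra_step(x, w, W, A_list, b_list, alpha, lam):
    # x, w: (num_nodes, n) stacks; row i is node i's local variable.
    mixed = W @ x  # neighborhood averaging by the mixing matrix
    grad = np.stack([A.T @ (A @ xi - b)  # local least-squares gradients
                     for A, xi, b in zip(A_list, x, b_list)])
    x_new = soft_threshold(mixed - alpha * grad - w, alpha * lam)
    w_new = w + 0.5 * (x - W @ x)  # w + (1/2)(I - W) x
    return x_new, w_new
```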

Figure 10. Absolute function-value suboptimality $|f(x^k) - f^\star|$ versus iteration count for approximating the Wasserstein distance (PDHG, OHM, and Restarted OC-Halpern).

Figure 11. The network underlying the setting of Section 6.4.

for $k = 0, 1, \ldots$. For all these methods, we chose the mixing matrix $W \in \mathbb{R}^{10\times 10}$ to be the Metropolis-Hastings weights, with each $(i,j)$-entry $W_{i,j}$ being
$$
W_{i,j} =
\begin{cases}
\dfrac{1}{\max\{\deg(i),\,\deg(j)\}} & (i \ne j) \\[4pt]
1 - \sum_{j\ne i} W_{i,j} & (i = j)
\end{cases}
$$

where $\deg(i)$ is the number of edges connected to node $i$. We ran each method (PG-EXTRA, PG-EXTRA with OHM, PG-EXTRA with OC-Halpern, and PG-EXTRA with Restarted OC-Halpern (OS-PPM$_0^{\mathrm{res}}$)) with stepsize $\alpha = 0.005$ and regularization parameter $\lambda = 0.002$ for 100 iterations.
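The Metropolis-Hastings mixing matrix can be built directly from the edge list; a minimal sketch (the function name is ours), producing a symmetric matrix whose rows sum to one:

```python
import numpy as np

def metropolis_hastings_weights(edges, num_nodes):
    # W[i,j] = 1/max(deg(i), deg(j)) on edges; the diagonal absorbs the
    # remaining mass so that each row sums to 1.
    deg = np.zeros(num_nodes, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / max(deg[i], deg[j])
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W
```

Since the off-diagonal entries are symmetric and each row sums to one, $W$ is doubly stochastic, as required for the consensus step of PG-EXTRA.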

Figure 12. Distance to solution $\|x^k - x^\star\|_F^2$ versus iteration count for PG-EXTRA, PG-EXTRA with OHM, PG-EXTRA with (OC-Halpern), and PG-EXTRA with Restarted OC-Halpern (OS-PPM$_0^{\mathrm{res}}$).
