A Relaxed Inertial Forward-Backward-Forward Algorithm for Solving Monotone Inclusions with Application to GANs
Abstract
We introduce a relaxed inertial forward-backward-forward (RIFBF) splitting algorithm for approaching the set of zeros of the sum of a maximally monotone operator and a single-valued monotone and Lipschitz continuous operator. This work aims to extend Tseng's forward-backward-forward method by using both inertial effects and relaxation parameters. We first formulate a second order dynamical system that approaches the solution set of the monotone inclusion problem to be solved and provide an asymptotic analysis for its trajectories. For RIFBF, which follows by explicit time discretization, we provide a convergence analysis in the general monotone case as well as when applied to the solving of pseudo-monotone variational inequalities. We illustrate the proposed method by applications to a bilinear saddle point problem, in the context of which we also emphasize the interplay between the inertial and the relaxation parameters, and to the training of Generative Adversarial Networks (GANs).
Keywords: forward-backward-forward algorithm, inertial effects, relaxation parameters, continuous time approach, application to GANs
1. Introduction
1.1 Motivation
The main motivation for the investigation of monotone inclusions and variational inequalities governed by monotone and Lipschitz continuous operators comes from convex-concave minimax problems. It is well-known that determining primal-dual pairs of optimal solutions of convex optimization problems actually means solving convex-concave minimax problems (Bauschke and Combettes, 2017); nevertheless, minimax problems are of interest in their own right. Minimax problems arise traditionally in game theory and, more recently, in the training of machine learning systems such as Generative Adversarial Networks.
The regularizers f and g in (1) are assumed to be proper, convex and lower semicontinuous. A
solution of (1) is given by a so-called saddle point (u∗ , v ∗ ), fulfilling for all u ∈ H and all
v∈G
Ψ(u∗ , v) ≤ Ψ(u∗ , v ∗ ) ≤ Ψ(u, v ∗ )
or, equivalently, the system of optimality conditions
where γ, τ : [0, +∞) → [0, +∞), was proposed and studied by (Boţ and Csetnek, 2016a) (see also the work of Alvarez, 2000; Antipin, 1994; Boţ and Csetnek, 2018). Explicit time discretization of this second order dynamical system gives rise to so-called relaxed inertial forward-backward algorithms, which combine inertial effects and relaxation parameters.
In recent years, Attouch and Cabot have promoted in a series of papers relaxed inertial algorithms for monotone inclusions and convex optimization problems, as they combine
the advantages of both inertial effects and relaxation techniques. More precisely, they ad-
dressed the relaxed inertial proximal method (RIPA) (Attouch and Cabot, 2019b,c) and the
relaxed inertial forward-backward method (RIFB) (Attouch and Cabot, 2019a). A relaxed
inertial Douglas-Rachford algorithm for monotone inclusions has been proposed by (Boţ
et al., 2015). (Iutzeler and Hendrickx, 2019) investigated the influence that inertial effects and relaxation techniques have on the numerical performance of optimization algorithms. The
interplay between relaxation and inertial parameters for relative-error inexact under-relaxed
algorithms has been addressed by (Alves and Marcavillaca, 2020; Alves et al., 2020).
Relaxation techniques are essential ingredients in the formulation of algorithms for
monotone inclusions, as they provide more flexibility to the iterative schemes (Bauschke
and Combettes, 2017; Eckstein and Bertsekas, 1992). Inertial effects have been introduced
in order to accelerate the convergence of the numerical methods. This technique traces back
to the pioneering work of (Polyak, 1964), who introduced the heavy ball method in order
to speed up the convergence behavior of the gradient algorithm and allow the detection
of different critical points. This idea was employed and refined by (Nesterov, 1983) in the context of solving smooth convex minimization problems, and by (Alvarez, 2000) and (Alvarez and Attouch, 2001) in the context of monotone inclusions and nonsmooth convex minimization problems. When applied to convex minimization problems, the acceleration
techniques introduced by Nesterov in (Nesterov, 1983) or those defined via asymptotically
similar constructions lead to an improved convergence rate for the sequence of objective
function values. When applied to monotone inclusions, the same techniques lead, as seen in (Attouch and Cabot, 2019a,b,c) for the relaxed inertial proximal and forward-backward methods, to
improved convergence rates for the sequences of discrete velocities and Yosida regulariza-
tions of the governing operator.
In this paper we will focus on solving the monotone inclusion (3) in the case when
B is merely monotone and Lipschitz continuous. To this end we will formulate a relaxed
inertial forward-backward-forward (RIFBF) algorithm, which we obtain through the time
discretization of a second order dynamical system approaching the solution set of (3). The
forward-backward-forward (FBF) method was proposed by (Tseng, 2000) and it generates
an iterative sequence (xk)k≥0 via

(∀k ≥ 0)    yk = JλkA(I − λkB)xk,
            xk+1 = yk − λk(Byk − Bxk),

where x0 ∈ H is the starting point. The sequence (xk)k≥0 converges weakly to a solution of (3) if the sequence of stepsizes (λk)k≥0 is chosen in the interval (0, 1/L), where L > 0 is the Lipschitz constant of B.
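For illustration only, the FBF iteration can be realized in a few lines of NumPy; the operators below (a skew-symmetric B and the projection onto the unit ball as the resolvent of a normal cone) are our own test instance, not part of the original method description.

```python
import numpy as np

def fbf(x0, B, resolvent, lam, num_iter=1000):
    """Tseng's forward-backward-forward method (Tseng, 2000).

    B         : single-valued monotone, L-Lipschitz operator
    resolvent : x -> J_{lam A}(x), the resolvent of lam * A
    lam       : constant stepsize in (0, 1/L)
    """
    x = x0
    for _ in range(num_iter):
        y = resolvent(x - lam * B(x))   # forward-backward step
        x = y - lam * (B(y) - B(x))     # correcting forward step
    return x

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
S = S - S.T                             # skew-symmetric: monotone, not cocoercive
B = lambda x: S @ x
L = np.linalg.norm(S, 2)                # Lipschitz constant of B
proj_ball = lambda x: x / max(1.0, np.linalg.norm(x))   # resolvent of the normal cone
x_sol = fbf(rng.standard_normal(4), B, proj_ball, lam=0.5 / L)
```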
Existing inertial variants of such splitting methods guarantee convergence only when the inertial parameters take very small values. This is in accordance with the inertial parameter choices in the earliest papers on inertial algorithms (Alvarez, 2000) and (Alvarez and Attouch, 2001). It is one of our aims to show that relaxation parameters allow more flexibility in the choice of the inertial ones, which leads to better convergence results.
Recently, a forward-backward algorithm for solving (3), when B is monotone and Lipschitz continuous, was proposed by (Malitsky and Tam, 2020). This method requires in every iteration only one forward step instead of two; however, the sequence of stepsizes has to be chosen constant in the interval (0, 1/(2L)), which slows the algorithm down in comparison to FBF. A popular algorithm used to solve the variational inequality (4), when B is monotone and Lipschitz continuous, is Korpelevich's extragradient method (Korpelevich, 1976). The stepsizes are to be chosen in the interval (0, 1/L); however, this method requires the computation of two projections onto the feasible set in every iteration.
1.4 Outline
First, we will approach the solution set of (3) from a continuous perspective by means of the
trajectories generated by a second order dynamical system of FBF type. We will prove an
existence and uniqueness result for the generated trajectories and provide a general setting
in which these converge to a zero of A + B as time goes to infinity. In addition, we will
show that explicit time discretization of the dynamical system gives rise to an algorithm of
forward-backward-forward type with inertial and relaxation parameters (RIFBF).
In Section 3 we will discuss the convergence of RIFBF and investigate the interplay between the inertial and the relaxation parameters. It is of certain relevance to notice that the standard FBF method, the algorithm by (Malitsky and Tam, 2020) and the extragradient method all require knowledge of the Lipschitz constant of B, which is not always available. This can be avoided by performing a line-search procedure, which usually incurs additional computational costs. In contrast, we will use an adaptive stepsize rule which does not require knowledge of the Lipschitz constant of B. We will also comment on
the convergence of RIFBF when applied to the solving of the variational inequality (4) in
the case when the operator B is pseudo-monotone but not necessarily monotone. Pseudo-
monotone operators appear in the consumer theory of mathematical economics (Hadjisavvas
et al., 2012) and as gradients of pseudo-convex functions (Cottle and Ferland, 1971), such
as ratios of convex and concave functions in fractional programming (Borwein and Lewis,
2006).
Finally, we present two numerical experiments supporting our theoretical results in Section 4. On the one hand, we deal with a bilinear saddle point problem which can
be understood as a two-player zero-sum constrained game. In this context, we emphasize the
interplay between the inertial and the relaxation parameters. On the other hand we employ
variants of RIFBF for training Generative Adversarial Networks (GANs), which is a class of
machine learning systems where two opposing artificial neural networks compete in a zero-
sum game. GANs have achieved outstanding results in producing photorealistic images and are typically known to be difficult to optimize. We show that our method outperforms
“Extra Adam”, a GAN training approach inspired by the extra-gradient algorithm, which
recently achieved state-of-the-art results (Gidel et al., 2019).
We consider the second order dynamical system

ẍ(t) + γ(t)ẋ(t) + τ(t)[x(t) − JλA((I − λB)x(t)) − λ(Bx(t) − B(JλA((I − λB)x(t))))] = 0,
x(0) = x0,  ẋ(0) = v0,   (6)

where γ, τ : [0, +∞) → [0, +∞) are Lebesgue measurable functions, 0 < λ < 1/L and x0, v0 ∈ H, in connection with the monotone inclusion problem (3).
We define M : H → H by

M x := x − JλA((I − λB)x) − λ(Bx − B(JλA((I − λB)x))).   (7)

With this notation, the dynamical system (6) can equivalently be written as

ẍ(t) + γ(t)ẋ(t) + τ(t)M x(t) = 0,   x(0) = x0,  ẋ(0) = v0.   (8)
Proposition 1 Let M be defined as in (7). Then the following statements are true:

(i) Zeros(M) = Zeros(A + B);

(ii) M is Lipschitz continuous with Lipschitz constant (1 + λL)(2 + λL);

(iii) ⟨x − x∗, M x⟩ ≥ ((1 − λL)/(1 + λL)²)‖M x‖² for all x ∈ H and all x∗ ∈ Zeros(M).
Proof (i) For x ∈ H we set y := JλA((I − λB)x), thus M x = x − y − λ(Bx − By). If M x = 0, then ‖x − y‖ = λ‖Bx − By‖ ≤ λL‖x − y‖, which, since λL < 1, forces x = y; this means x = JλA((I − λB)x), which is equivalent to 0 ∈ (A + B)x. The converse follows by the same equivalences.

(ii) Let x, x′ ∈ H and set y := JλA((I − λB)x) and y′ := JλA((I − λB)x′). Using the Lipschitz continuity of B we have

‖M x − M x′‖ ≤ ‖x − x′‖ + ‖y − y′‖ + λ‖Bx − Bx′‖ + λ‖By − By′‖ ≤ (1 + λL)(‖x − x′‖ + ‖y − y′‖).

Since JλA is nonexpansive and I − λB is (1 + λL)-Lipschitz continuous, it holds ‖y − y′‖ ≤ (1 + λL)‖x − x′‖.
Therefore,

‖M x − M x′‖ ≤ (1 + λL)(2 + λL)‖x − x′‖,

which shows that M is Lipschitz continuous with Lipschitz constant (1 + λL)(2 + λL) > 0.
(iii) Let x∗ ∈ H be such that 0 ∈ (A + B)x∗ and x ∈ H. We denote y := JλA((I − λB)x) and can write (I − λB)x ∈ (I + λA)y or, equivalently,

(1/λ)(x − y) − (Bx − By) ∈ (A + B)y.   (11)

Using the monotonicity of A + B we obtain

⟨(1/λ)(x − y) − (Bx − By), y − x∗⟩ ≥ 0,

which is equivalent to ⟨M x, y − x∗⟩ ≥ 0, that is, ⟨M x, x − x∗⟩ ≥ ⟨M x, x − y⟩. Since

⟨M x, x − y⟩ = ‖x − y‖² − λ⟨Bx − By, x − y⟩ ≥ (1 − λL)‖x − y‖²

and ‖M x‖ ≤ (1 + λL)‖x − y‖, the inequality in (iii) follows.
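For concreteness, the operator M can be evaluated numerically as in the following sketch (ours; B and the resolvent are user-supplied placeholders). By Proposition 1 (i), ‖M x‖ can serve as a residual that measures how far x is from the solution set of (3).

```python
def M_operator(x, B, resolvent, lam):
    """Evaluate M x = x - y - lam*(B x - B y) with y = J_{lam A}((I - lam B) x),
    cf. (7). Zeros(M) = Zeros(A + B), and M is Lipschitz continuous with
    constant (1 + lam*L)(2 + lam*L) by Proposition 1."""
    y = resolvent(x - lam * B(x))
    return x - y - lam * (B(x) - B(y))
```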
The following definition makes explicit which kind of solutions of the dynamical system (6) we are looking for. We recall that a function x : [0, b] → H (where b > 0) is said to be absolutely continuous if there exists an integrable function y : [0, b] → H such that

x(t) = x(0) + ∫₀ᵗ y(s) ds   ∀t ∈ [0, b].

This means precisely that x is continuous and its distributional derivative ẋ is Lebesgue integrable on [0, b].
Definition 2 We say that x : [0, +∞) → H is a strong global solution of (6) if the following properties are satisfied:

(i) x, ẋ : [0, +∞) → H are locally absolutely continuous, in other words, absolutely continuous on each interval [0, b] for 0 < b < +∞;

(ii) ẍ(t) + γ(t)ẋ(t) + τ(t)M x(t) = 0 for almost every t ∈ [0, +∞);

(iii) x(0) = x0 and ẋ(0) = v0.
Theorem 3 (see Boţ and Csetnek, 2016a, Theorem 4) Let γ, τ : [0, +∞) → [0, +∞) be
Lebesgue measurable functions such that γ, τ ∈ L1loc ([0, +∞)) (that is γ, τ ∈ L1 ([0, b]) for
all 0 < b < +∞). Then for each x0 , v0 ∈ H there exists a unique strong global solution of
the dynamical system (6).
We will prove the convergence of the trajectories of (6) in a setting which requires the damping function γ and the relaxation function τ to fulfil the assumptions below. We refer to (Boţ and Csetnek, 2016a) for examples of functions which fulfil this assumption and emphasize that, when the two functions are constant, we recover the conditions from (Attouch and Maingé, 2011).
Assumption 1 γ, τ : [0, +∞) → [0, +∞) are locally absolutely continuous and there exists ν > 0 such that for almost every t ∈ [0, +∞) it holds

γ̇(t) ≤ 0 ≤ τ̇(t)   and   γ²(t)/τ(t) ≥ (1 + ν)(1 + λL)²/(1 − λL).
The result which states the convergence of the trajectories is adapted from (Boţ and Csetnek, 2016a, Theorem 8). It cannot, however, be obtained as a direct consequence of that result, since the operator M is not cocoercive, as required there. Nevertheless, as seen in Proposition 1 (iii), M has a property, sometimes called "cocoercivity with respect to its set of zeros", which is far weaker than cocoercivity, but strong enough to allow us to partially apply the techniques used to prove Theorem 8 in (Boţ and Csetnek, 2016a).
Theorem 4 Let γ, τ : [0, +∞) → [0, +∞) be functions satisfying Assumption 1 and x0, v0 ∈ H. Let x : [0, +∞) → H be the unique strong global solution of (6). Then the following statements are true:

(i) the trajectory x is bounded and ẋ, ẍ, M x ∈ L²([0, +∞); H);

(ii) limt→+∞ ẋ(t) = limt→+∞ ẍ(t) = limt→+∞ M x(t) = limt→+∞ [x(t) − y(t)] = 0, where y(t) := JλA((I − λB)x(t));

(iii) x(t) converges weakly to an element of Zeros(A + B) as t → +∞.

Proof Take an arbitrary x∗ ∈ Zeros(A + B) = Zeros(M) and define for all t ∈ [0, +∞) the Lyapunov function h(t) = (1/2)‖x(t) − x∗‖². For almost every t ∈ [0, +∞) we have

ḣ(t) = ⟨x(t) − x∗, ẋ(t)⟩   and   ḧ(t) = ‖ẋ(t)‖² + ⟨x(t) − x∗, ẍ(t)⟩.
Taking into account (8) and Proposition 1 (iii), we obtain for almost every t ∈ [0, +∞) that

ḧ(t) + γ(t)ḣ(t) + ((1 − λL)/(1 + λL)²) τ(t)‖M x(t)‖² ≤ ‖ẋ(t)‖².
From this point, we can proceed as in the proof of (Boţ and Csetnek, 2016a, Theorem 8)
– for a complete explanation see Appendix A. Consequently, we obtain the statements in
(i) and (ii) and the fact that the limit limt→+∞ kx(t) − x∗ k ∈ R exists, which is the first
assumption in the continuous version of the Opial Lemma as it was used, for example,
by (Boţ and Csetnek, 2016a, Lemma 7). In order to show that the second assumption
of the Opial Lemma is fulfilled, which means actually proving that every weak sequential
cluster point of the trajectory x is a zero of M , one cannot use the arguments from (Boţ and
Csetnek, 2016a, Theorem 8), since M is not maximal monotone. We have to use, instead,
different arguments relying on the maximal monotonicity of A + B.
Indeed, let x̄ be a weak sequential cluster point of x, which means that there exists
a sequence tk → +∞ such that (x(tk ))k≥0 converges weakly to x̄ as k → +∞. Since,
according to (ii), limt→+∞ M x(t) = limt→+∞ [x(t) − y(t)] = 0, we conclude that (y(tk ))k≥0
also converges weakly to x̄. According to (11) we have
(1/λ)(x(tk) − y(tk)) − (Bx(tk) − By(tk)) ∈ (A + B)y(tk)   ∀k ≥ 0.   (12)
Since B is Lipschitz continuous and limk→+∞ kx(tk ) − y(tk )k = 0, the left hand side of
(12) converges strongly to 0 as k → +∞. Since A + B is maximal monotone, its graph is
sequentially closed with respect to the weak-strong topology of the product space H × H.
Therefore, letting k → +∞ in (12) we obtain x̄ ∈ Zeros(A + B).
Thus, the continuous Opial Lemma implies that x(t) converges weakly to an element in
Zeros(A + B) as t → +∞.
Remark 5 (explicit discretization) Explicit time discretization of (6) with stepsize sk >
0, relaxation variable τk > 0, damping variable γk > 0, and initial points x0 and x1 yields
for all k ≥ 1 the following iterative scheme:
(1/s²k)(xk+1 − 2xk + xk−1) + (γk/sk)(xk − xk−1) + τk M zk = 0,   (13)

where zk is an extrapolation of xk and xk−1 that will be chosen later. The Lipschitz continuity of M provides a certain flexibility to this choice. We can write (13) equivalently as

(∀k ≥ 1)   xk+1 = xk + (1 − γksk)(xk − xk−1) − s²kτk M zk.
An appropriate choice of zk, sk, γk and τk leads to the relaxed inertial forward-backward-forward algorithm

                     zk = xk + αk(xk − xk−1),
(RIFBF)   (∀k ≥ 1)   yk = JλkA(I − λkB)zk,
                     xk+1 = (1 − ρk)zk + ρk(yk − λk(Byk − Bzk)),

where x0, x1 ∈ H are starting points, (λk)k≥1 and (ρk)k≥1 are sequences of positive numbers, and (αk)k≥1 is a sequence of nonnegative numbers. The following iterative schemes can be obtained as particular instances of RIFBF:
• adaptive stepsize: let µ ∈ (0, 1) and λ1 > 0. The stepsizes for k ≥ 1 are adaptively updated as follows (a numerical sketch of this rule is given below):

λk+1 := min{λk, µ‖yk − zk‖/‖Byk − Bzk‖}   if Byk − Bzk ≠ 0,   and   λk+1 := λk   otherwise.   (14)
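A minimal sketch (ours) of the update rule (14); the points yk, zk and their images under B are assumed to be available from the current iteration.

```python
import numpy as np

def adaptive_stepsize(lam_k, y_k, z_k, By_k, Bz_k, mu):
    """Stepsize update (14); knowledge of the Lipschitz constant L is not needed.

    The generated sequence is nonincreasing and bounded from below by
    min(lam_1, mu / L), hence convergent (see Proposition 6 below)."""
    denom = np.linalg.norm(By_k - Bz_k)
    if denom > 0.0:
        return min(lam_k, mu * np.linalg.norm(y_k - z_k) / denom)
    return lam_k
```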
Proposition 6 Let µ ∈ (0, 1) and λ1 > 0. The sequence (λk)k≥1 generated by (14) is nonincreasing and

limk→+∞ λk = λ ≥ min{λ1, µ/L}.

In addition,

‖Byk − Bzk‖ ≤ (µ/λk+1)‖yk − zk‖   ∀k ≥ 1.   (15)
Proof It is obvious from (14) that λk+1 ≤ λk for all k ≥ 1. Since B is Lipschitz continuous with Lipschitz constant L, we have

µ‖yk − zk‖/‖Byk − Bzk‖ ≥ µ/L   if Byk − Bzk ≠ 0,

hence the nonincreasing sequence (λk)k≥1 is bounded from below by min{λ1, µ/L} and therefore convergent. Inequality (15) follows directly from (14).
Proposition 7 Let wk := yk − λk(Byk − Bzk) and θk := λk/λk+1 for all k ≥ 1. Then for all x∗ ∈ Zeros(A + B) it holds

‖wk − x∗‖² ≤ ‖zk − x∗‖² − (1 − µ²θ²k)‖zk − yk‖²   ∀k ≥ 1.   (16)

Proof For all k ≥ 1 we have
‖zk − x∗‖² = ‖zk − yk + yk − wk + wk − x∗‖²
= ‖zk − yk‖² + ‖yk − wk‖² + ‖wk − x∗‖² + 2⟨zk − yk, yk − x∗⟩ + 2⟨yk − wk, wk − x∗⟩
= ‖zk − yk‖² − ‖yk − wk‖² + ‖wk − x∗‖² + 2⟨zk − wk, yk − x∗⟩
= ‖zk − yk‖² − λ²k‖Byk − Bzk‖² + ‖wk − x∗‖² + 2⟨zk − wk, yk − x∗⟩
≥ ‖zk − yk‖² − (µ²λ²k/λ²k+1)‖yk − zk‖² + ‖wk − x∗‖² + 2⟨zk − wk, yk − x∗⟩.   (17)
Since yk = JλkA((I − λkB)zk), we have (I − λkB)zk ∈ (I + λkA)yk, and therefore

(1/λk)(zk − wk) = (1/λk)(zk − yk) − (Bzk − Byk) ∈ (A + B)yk,

which, together with 0 ∈ (A + B)x∗ and the monotonicity of A + B, implies

⟨zk − wk, yk − x∗⟩ ≥ 0.

Hence,

‖zk − x∗‖² ≥ ‖zk − yk‖² − (µ²λ²k/λ²k+1)‖yk − zk‖² + ‖wk − x∗‖²,

which is (16).
The following result introduces a discrete Lyapunov function for which a decreasing
property is established.
Proposition 8 Let (αk)k≥1 be a nondecreasing sequence of nonnegative numbers and (ρk)k≥1 be such that for some kρ ≥ 1 we have

0 < ρk ≤ 2/(1 + µθk)   ∀k ≥ kρ,   (18)

and define, for a fixed x∗ ∈ Zeros(A + B) and all k ≥ 1,

Hk := ‖xk − x∗‖² − αk‖xk−1 − x∗‖² + 2αk(αk + (1 − αk)/(ρk(1 + µθk)))‖xk − xk−1‖².   (19)

Then there exists k0 ≥ 1 such that

Hk+1 ≤ Hk − δk‖xk+1 − xk‖²   ∀k ≥ k0,   (20)

where

δk := (1 − αk)(2/(ρk(1 + µθk)) − 1) − 2αk+1(αk+1 + (1 − αk+1)/(ρk+1(1 + µθk+1)))   ∀k ≥ 1.
Proof Since µ ∈ (0, 1) and limk→+∞ θk = 1, there exists k1 ≥ 1 such that

1 − µθk > 0   ∀k ≥ k1.

For all k ≥ k1, using (16) and xk+1 = (1 − ρk)zk + ρk wk, one derives

‖xk+1 − x∗‖² ≤ ‖zk − x∗‖² − ((1 − µθk)/(ρk(1 + µθk)) + (1 − ρk)/ρk)‖xk+1 − zk‖²
= ‖zk − x∗‖² − (2/(ρk(1 + µθk)) − 1)‖xk+1 − zk‖².   (22)

Now we will estimate the right-hand side of (22). For all k ≥ 1 we have

‖zk − x∗‖² = (1 + αk)‖xk − x∗‖² − αk‖xk−1 − x∗‖² + αk(1 + αk)‖xk − xk−1‖²   (23)

and

‖xk+1 − zk‖² ≥ (1 − αk)‖xk+1 − xk‖² − αk(1 − αk)‖xk − xk−1‖².   (24)

Combining (22) with (23) and (24), together with (18) and using that (αk)k≥1 is nondecreasing, we obtain (20) for all k ≥ k0 := max{k1, kρ}.
In order to further proceed with the convergence analysis, we have to choose the sequences (αk)k≥1 and (ρk)k≥1 such that lim infk→+∞ δk > 0. This is a manageable task, since we can choose for example the two sequences such that limk→+∞ αk = α ≥ 0 and limk→+∞ ρk = ρ > 0. Recalling that limk→+∞ θk = 1, we obtain lim infk→+∞ δk > 0 whenever

0 < ρ < 2(1 − α)²/((1 + µ)(2α² − α + 1)).   (25)
Remark 9 (inertia versus relaxation) Inequality (25) represents the necessary trade-off between inertia and relaxation (see Figure 1 for two particular choices of µ). The expression is similar to the one obtained in (Attouch and Cabot, 2019b, Remark 2.13), the exception being an additional factor incorporating the stepsize parameter µ. This means that for given 0 ≤ α < 1 the upper bound for the relaxation parameter is

ρ(α, µ) = 2(1 − α)²/((1 + µ)(2α² − α + 1)).

We further see that α ↦ ρ(α, µ) is a decreasing function on the interval [0, 1]. Hence, the maximal value for the limit of the sequence of relaxation parameters is obtained for α = 0 and is ρmax(µ) := ρ(0, µ) = 2/(1 + µ). On the other hand, when α ↗ 1, then ρ(α, µ) ↘ 0. In addition, the function µ ↦ ρmax(µ) is also decreasing on [0, 1], with limiting values 2 as µ ↘ 0 and 1 as µ ↗ 1.
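The bound in (25) is easy to tabulate; the following small script (ours) reproduces, for µ = 0.5, the value ρmax(0.5) = 4/3 that appears as the largest admissible relaxation parameter in Table 1.

```python
def rho_bound(alpha, mu):
    """Upper bound (25) on the limit of the relaxation parameters."""
    return 2 * (1 - alpha) ** 2 / ((1 + mu) * (2 * alpha ** 2 - alpha + 1))

for alpha in (0.0, 0.2, 0.5, 0.9):
    print(f"alpha = {alpha:.1f}:  rho < {rho_bound(alpha, 0.5):.4f}")
# alpha = 0.0 yields rho_max(0.5) = 2 / 1.5 = 1.3333...; as alpha -> 1 the
# bound tends to 0, reflecting the trade-off between inertia and relaxation.
```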
Figure 1: Trade-off between inertia and relaxation for µ = 0.5 (left) and µ = 0.95 (right).
Proof (i) From (20) and lim infk→+∞ δk > 0 we can conclude that there exists k0 ≥ 1 such that the sequence (Hk)k≥k0 is nonincreasing.
We are now in the position to prove the main result of this section. In order to do so,
we first recall two useful lemmas.
Lemma 11 (Alvarez and Attouch, 2001) Let (ϕk)k≥0, (αk)k≥1, and (ψk)k≥1 be sequences of nonnegative real numbers satisfying

ϕk+1 ≤ ϕk + αk(ϕk − ϕk−1) + ψk   ∀k ≥ 1,   with   Σ_{k≥1} ψk < +∞,

and such that 0 ≤ αk ≤ α < 1 for all k ≥ 1. Then the limit limk→+∞ ϕk ∈ R exists.
Proof The result will be a consequence of the discrete Opial Lemma. To this end we will prove that the conditions (i) and (ii) in Lemma 12 for C := Zeros(A + B) are satisfied. Let x∗ ∈ Zeros(A + B). Indeed, it follows from (22) and (23) that for k large enough

limk→+∞ δk‖xk+1 − xk‖² = 0,

which, as lim infk→+∞ δk > 0, yields limk→+∞ ‖xk+1 − xk‖ = 0. Recall that for k ≥ 1 we have wk = yk − λk(Byk − Bzk). Since

‖wk − zk‖ = (1/ρk)‖xk+1 − zk‖ = (1/ρk)‖xk+1 − xk − αk(xk − xk−1)‖   ∀k ≥ 1,
the fact that ‖yk − zk‖ → 0 as k → +∞ then shows that (ykl)l≥0 and (zkl)l≥0 converge weakly to x̄ as l → +∞, too. The definition of (ykl)l≥0 gives

(1/λkl)(zkl − ykl) + Bykl − Bzkl ∈ (A + B)ykl   ∀l ≥ 0.
Using that ((1/λkl)(zkl − ykl) + Bykl − Bzkl)l≥0 converges strongly to 0 and that the graph of the maximal monotone operator A + B is sequentially closed with respect to the weak-strong topology of the product space H × H, we obtain 0 ∈ (A + B)x̄, thus x̄ ∈ Zeros(A + B).
Remark 14 In the particular case of the variational inequality (4), which corresponds to the case when A is the normal cone NC of a nonempty closed convex subset C of H, by taking into account that JλNC = PC is for all λ > 0 the projection operator onto C, the relaxed inertial FBF algorithm reads

                        zk = xk + αk(xk − xk−1),
(RIFBF-VI)   (∀k ≥ 1)   yk = PC(I − λkB)zk,
                        xk+1 = (1 − ρk)zk + ρk(yk − λk(Byk − Bzk)),

where x0, x1 ∈ H are starting points, (λk)k≥1 and (ρk)k≥1 are sequences of positive numbers, and (αk)k≥1 is a sequence of nonnegative numbers.
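A compact sketch of RIFBF-VI (ours; B and the projection onto C are user-supplied, and the constant parameters lam, alpha and rho are assumed to be chosen in accordance with (25)):

```python
def rifbf_vi(x0, x1, B, proj_C, lam, alpha, rho, num_iter=1000):
    """Relaxed inertial FBF for the variational inequality over C."""
    x_prev, x = x0, x1
    for _ in range(num_iter):
        z = x + alpha * (x - x_prev)             # inertial extrapolation
        y = proj_C(z - lam * B(z))               # projected forward-backward step
        w = y - lam * (B(y) - B(z))              # correcting forward step
        x_prev, x = x, (1 - rho) * z + rho * w   # relaxation
    return x
```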
Under the hypotheses of Theorem 13, the algorithm converges weakly to a solution of (4) when B is a monotone and Lipschitz continuous operator. As has been shown in (Boţ et al., 2020), it also converges when B is pseudo-monotone on H, Lipschitz continuous, and sequentially weak-to-weak continuous, as well as when H is finite dimensional and B is pseudo-monotone on C and Lipschitz continuous.
We recall that B is said to be pseudo-monotone on C (on H) if for all x, y ∈ C (x, y ∈ H) it holds

⟨Bx, y − x⟩ ≥ 0 ⇒ ⟨By, y − x⟩ ≥ 0.
Denoting wk := yk − λk(Byk − Bzk) and θk := λk/λk+1 for all k ≥ 1, for all x∗ ∈ Zeros(NC + B) it holds

⟨yk − x∗, zk − wk⟩ ≥ 0   ∀k ≥ 1,

which, combined with (17), leads as in the proof of Proposition 7 to the conclusion.
Now, since (16) holds, the statements in Proposition 8 and Proposition 10 remain true and, as seen in the proof of Theorem 13, they guarantee that the limit limk→+∞ ‖xk − x∗‖ ∈ R exists and that limk→+∞ ‖yk − zk‖ = limk→+∞ ‖Byk − Bzk‖ = 0. Having that, the weak convergence of (xk)k≥0 to a solution of (4) follows by arguing as in the proof of (Boţ et al., 2020, Theorem 3.1), when B is pseudo-monotone on H, Lipschitz continuous and sequentially weak-to-weak continuous, and as in the proof of (Boţ et al., 2020, Theorem 3.2), when H is finite dimensional and B is pseudo-monotone on C and Lipschitz continuous.
4. Numerical Experiments

In this section we provide two numerical experiments that complement our theoretical results. The first one, a bilinear saddle point problem, is a standard deterministic example in which all the assumptions guaranteeing convergence are fulfilled. For the second experiment we leave the safe harbour of justified assumptions and apply RIFBF to a more complex (stochastic) problem, namely the training of a special kind of generative machine learning system that has received a lot of attention recently.
for all u ∈ U and all v ∈ V. The monotone inclusion to be solved in this case is of the form

0 ∈ NU×V(u, v) + F(u, v),   (26)

where NU×V is the maximal monotone normal cone operator to U × V and

F(u, v) = M(u, v)ᵀ + (a, −b)ᵀ,   with   M = [ 0  A ; −Aᵀ  0 ],

is monotone and Lipschitz continuous with Lipschitz constant L = ‖M‖₂. Notice that F is not cocoercive.
For our experiments we choose m = n = 500, and A, a and b to have entries drawn from different random distributions; we consider the uniform distribution on the interval [0, 1], the standard normal distribution and the Poisson distribution with rate 1. For the constraint sets U and V we take unit balls in the Euclidean norm. We use the constant stepsize λk = λ = µ/L, where 0 < µ < 1, the constant inertial parameter αk = α and the constant relaxation parameter ρk = ρ for all k ≥ 1. We set xk := (uk, vk) to fit the framework of our algorithm. The starting point x0 is initialized randomly (with entries drawn from the uniform distribution on [0, 1] for all three settings) and we set x1 = x0. We did not observe different behavior across random trials, so in the following we report the results of a single run each time.
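The experimental setup can be reproduced along the following lines (a sketch under our own naming; rifbf_vi refers to the sketch given after Remark 14, and the uniform entries can be swapped for standard normal or Poisson ones).

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 500
A = rng.uniform(0.0, 1.0, (m, n))   # alternatives: standard normal, Poisson(1)
a = rng.uniform(0.0, 1.0, m)
b = rng.uniform(0.0, 1.0, n)

def F(x):
    """F(u, v) = M (u, v) + (a, -b) with M = [[0, A], [-A^T, 0]]."""
    u, v = x[:m], x[m:]
    return np.concatenate([A @ v + a, -A.T @ u - b])

def proj_UxV(x):
    """Projection onto U x V, both Euclidean unit balls."""
    u, v = x[:m], x[m:]
    return np.concatenate([u / max(1.0, np.linalg.norm(u)),
                           v / max(1.0, np.linalg.norm(v))])

Mmat = np.block([[np.zeros((m, m)), A], [-A.T, np.zeros((n, n))]])
L = np.linalg.norm(Mmat, 2)         # Lipschitz constant of F
x0 = rng.uniform(0.0, 1.0, m + n)   # random start, with x1 = x0
sol = rifbf_vi(x0, x0.copy(), F, proj_UxV, lam=0.5 / L, alpha=0.2, rho=0.9)
```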
In Figure 2 we can see the development of ‖yk − zk‖ for problem (26) with data drawn from the different distributions for µ = 0.5, ρ = 0.5 and various values of α. The quantity ‖yk − zk‖ is the fixed point residual of the operator JλA(I − λB), for which, according to the convergence analysis, we have ‖yk − zk‖ → 0 as k → +∞. As solutions of the
Figure 2: Behavior of the fixed point residual ‖yk − zk‖ for the constrained bilinear saddle-point problem with data drawn from different distributions (µ = 0.5).
Table 1: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the uniform distribution on [0, 1] (µ = 0.5).

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.32
0.00 ≥ 10,000 ≥ 10,000 6,493 4,327 3,245 2,596 2,166 1,860 1,629 1,447 1,234 1,108 1,017 941 929
0.04 ≥ 10,000 ≥ 10,000 6,233 4,154 3,115 2,493 2,080 1,787 1,562 1,375 1,171 1,065 977 - -
0.08 ≥ 10,000 ≥ 10,000 5,973 3,981 2,985 2,389 1,995 1,713 1,496 1,277 1,121 1,024 937 - -
0.12 ≥ 10,000 ≥ 10,000 5,713 3,807 2,856 2,286 1,910 1,635 1,427 1,194 1,077 984 - - -
0.16 ≥ 10,000 ≥ 10,000 5,453 3,634 2,726 2,184 1,824 1,551 1,336 1,140 1,038 - - - -
0.20 ≥ 10,000 ≥ 10,000 5,192 3,461 2,597 2,082 1,735 1,461 1,231 1,099 - - - - -
0.24 ≥ 10,000 9,871 4,932 3,287 2,468 1,976 1,621 1,374 1,170 - - - - - -
0.28 ≥ 10,000 9,350 4,671 3,114 2,339 1,858 1,502 1,282 - - - - - - -
0.32 ≥ 10,000 8,830 4,411 2,943 2,204 1,734 1,396 - - - - - - - -
0.36 ≥ 10,000 8,309 4,150 2,770 2,035 1,589 1,327 - - - - - - - -
0.40 ≥ 10,000 7,787 3,891 2,571 1,866 1,486 - - - - - - - - -
0.44 ≥ 10,000 7,266 3,633 2,351 1,730 - - - - - - - - - -
0.48 ≥ 10,000 6,744 3,340 2,149 - - - - - - - - - - -
0.52 ≥ 10,000 6,223 3,012 2,000 - - - - - - - - - - -
0.56 ≥ 10,000 5,707 2,709 - - - - - - - - - - - -
0.60 ≥ 10,000 5,130 - - - - - - - - - - - - -
0.64 ≥ 10,000 4,480 - - - - - - - - - - - - -
0.68 ≥ 10,000 3,968 - - - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - -
0.88 ≥ 10,000 - - - - - - - - - - - - - -
problem (26) are not available explicitly, we look at this meaningful surrogate instead. Also,
if we have yk = zk for some k ≥ 1 then xk+1 = yk = zk is a solution of (26).
We see that the behavior of the residual is similar for most combinations of parameters
and all three settings. When the inertial parameter α is larger, the algorithm takes fewer
iterations for the fixed point residual to reach the considered threshold of 10−5 . However,
when α gets close to the limiting case the performance is not consistently better throughout
the entire run anymore when the data is drawn from the uniform or the Poisson distribution.
At times the residual is even worse than for smaller α; nevertheless, the algorithm still terminates in fewer iterations in the end.
Table 2: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the uniform distribution on [0, 1].

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.04
0.00 ≥ 10,000 7,850 3,927 2,620 1,967 1,574 1,309 1,110 948 837 734 705
0.04 ≥ 10,000 7,536 3,770 2,516 1,889 1,511 1,256 1,054 908 791 704 -
0.08 ≥ 10,000 7,222 3,613 2,411 1,810 1,447 1,201 999 865 751 - -
0.12 ≥ 10,000 6,908 3,456 2,307 1,731 1,383 1,141 946 822 - - -
0.16 ≥ 10,000 6,595 3,300 2,202 1,652 1,317 1,071 893 774 - - -
0.20 ≥ 10,000 6,281 3,143 2,098 1,572 1,248 999 843 - - - -
0.24 ≥ 10,000 5,967 2,987 1,993 1,491 1,170 934 - - - - -
0.28 ≥ 10,000 5,653 2,830 1,887 1,405 1,083 879 - - - - -
0.32 ≥ 10,000 5,340 2,674 1,780 1,302 1,000 - - - - - -
0.36 ≥ 10,000 5,026 2,517 1,664 1,196 - - - - - - -
0.40 ≥ 10,000 4,713 2,357 1,531 1,105 - - - - - - -
0.44 ≥ 10,000 4,400 2,189 1,390 - - - - - - - -
0.48 ≥ 10,000 4,088 1,999 - - - - - - - - -
0.52 ≥ 10,000 3,773 1,789 - - - - - - - - -
0.56 ≥ 10,000 3,443 - - - - - - - - - -
0.60 ≥ 10,000 3,077 - - - - - - - - - -
0.64 ≥ 10,000 2,663 - - - - - - - - - -
0.68 ≥ 10,000 - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - -
In Table 1 we can see the necessary number of iterations for the algorithm to achieve ‖yk − zk‖ ≤ 10−5 for µ = 0.5 and different choices of the parameters α and ρ. As the
results are very similar for all three considered distributions, here we only report the case
of data drawn from the uniform distribution on the interval [0, 1]. For the tables regarding
the experiments for data drawn from the standard normal distribution and the 1-Poisson
distribution please refer to Appendix B.
As mentioned in Remark 9, there is a trade-off between inertia and relaxation. The
parameters α and ρ also need to fulfil the relations

0 ≤ α < 1   and   0 < ρ < 2(1 − α)²/((1 + µ)(2α² − α + 1)),
which is the reason why not every combination of α and ρ is valid.
We see that for a particular choice of the relaxation parameter the least number of iterations is achieved when the inertial parameter is as large as possible. If, on the other hand, we fix the inertial parameter, we observe that larger values of ρ are also better and lead to fewer iterations. Regarding the trade-off between the two parameters, Table 1 suggests that the influence of the relaxation parameter is stronger than that of the inertial parameter. Given that no numerical experiments are available for the
Table 3: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the uniform distribution on [0, 1] (µ = 0.1).

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.40    1.50    1.60    1.70    1.80
0.00 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,337 7,143 6,247 5,553 4,975 4,274 4,200 3,818 3,406 3,272 3,222 3,146 3,230
0.04 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,607 8,003 6,856 5,997 5,334 4,639 4,246 4,041 3,579 3,333 3,235 3,161 2,846 -
0.08 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,206 7,668 6,569 5,746 5,114 4,314 4,204 3,837 3,382 3,256 3,211 3,209 - -
0.12 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,804 7,332 6,281 5,493 4,774 4,255 4,038 3,617 3,298 3,242 3,314 - - -
0.16 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,402 6,996 5,992 5,227 4,395 4,215 3,837 3,428 3,338 3,371 - - - -
0.20 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,998 6,658 5,685 4,906 4,261 4,048 3,656 3,415 3,574 - - - - -
0.24 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,501 7,592 6,309 5,263 4,497 4,225 3,841 3,626 - - - - - - -
0.28 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,994 7,178 5,872 4,861 4,297 4,065 3,835 - - - - - - - -
0.32 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,481 6,666 5,364 4,612 4,269 4,105 - - - - - - - - -
0.36 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,906 6,098 5,053 4,548 4,450 - - - - - - - - - -
0.40 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,961 7,205 5,649 5,008 4,924 - - - - - - - - - - -
0.44 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,109 6,572 5,520 5,490 - - - - - - - - - - - -
0.48 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,174 6,356 6,115 - - - - - - - - - - - - -
0.52 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,633 6,785 - - - - - - - - - - - - - -
0.56 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,642 - - - - - - - - - - - - - - -
0.60 ≥ 10,000 ≥ 10,000 9,747 - - - - - - - - - - - - - - - -
0.64 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.68 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.72 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.88 ≥ 10,000 - - - - - - - - - - - - - - - - - -
Figure 3: Behavior of the gap function |G(uk , vk )| for the constrained bilinear saddle-point
problem with data drawn from different distributions (µ = 0.5).
To get further insight into the convergence behavior, we also look at the following gap function,

G(s, t) := inf_{u∈U} Φ(u, t) − sup_{v∈V} Φ(s, v).

The quantity (G(uk, vk))k≥0 serves as a measure of the performance of the iterates (uk, vk)k≥0, as for the optimum (u∗, v∗) we have Φ(u∗, v) ≤ Φ(u∗, v∗) ≤ Φ(u, v∗) for all u ∈ U and all v ∈ V and hence G(u∗, v∗) = 0. Because of the particular choice of a bilinear objective and of the constraint sets U and V, these expressions can actually be computed in closed form.
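Under the sign convention Φ(u, v) = ⟨a, u⟩ + ⟨u, Av⟩ + ⟨b, v⟩, which is consistent with the operator F above but is our own choice (the paper's exact closed form is not reproduced here), the gap over unit balls reduces to norm evaluations, since the infimum of ⟨c, u⟩ over the unit ball equals −‖c‖.

```python
import numpy as np

def gap_abs(s, t, A, a, b):
    """|G(s, t)| for Euclidean unit-ball constraints, assuming
    Phi(u, v) = <a, u> + <u, A v> + <b, v> (our convention, chosen to be
    consistent with the gradient field F above)."""
    inf_u = -np.linalg.norm(a + A @ t) + b @ t     # inf_{||u|| <= 1} Phi(u, t)
    sup_v = a @ s + np.linalg.norm(A.T @ s + b)    # sup_{||v|| <= 1} Phi(s, v)
    return abs(inf_u - sup_v)
```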
In Figure 3 we see the development of the absolute value of the gap |G(uk , vk )| for
problem (26) with data drawn from the different distributions for µ = 0.5, ρ = 0.5 and
various values of α. We see that, as in the case of the residual of the fixed point iteration, the
behavior is similar for most combinations of parameters – the larger the inertial parameter
α the better the results in general. However, in the case of the data being drawn from
the uniform or the Poisson distribution, the behavior of the gap is not consistently better
anymore when α gets close to the limiting case. In an intermediate phase the gap for maximal α is even the worst among all parameter combinations, although the algorithm still gives the best results in the end. As the theory suggests, the gap indeed decreases and tends to zero as the number of iterations grows large.
Finally, in Figure 4 we look at the discrete velocity of the iterates ‖xk+1 − xk‖, which is square summable and thus vanishes as k → +∞. Again we look at the three settings where the data is drawn from a uniform, the standard normal and a Poisson distribution and plot the results for µ = 0.5, ρ = 0.5 and various values of α. Once more, larger inertial parameters give better results, the biggest difference from the previous quantities being an intermediate phase, lasting approximately up to iteration 100, in which no clear picture emerges yet and which occurs for all distributions.
Figure 4: Behavior of the discrete velocity of the iterates ‖xk+1 − xk‖ for the constrained bilinear saddle-point problem with data drawn from different distributions (µ = 0.5).
where Φ(u, v) = Ex∼p[log(Dv(x))] + Ex′∼qu[log(1 − Dv(x′))] is the value function, u and v are the parametrisations of the generator and the discriminator, respectively, Dv(x) is the probability the discriminator assigns to the input x being real, and p and qu are the real and the learned distribution, respectively.
Figure 5: Schematic GAN architecture: the generator maps a latent random variable to fake samples, and the discriminator classifies real-world images and fake samples as real or fake, producing the loss.
for all u ∈ U and all v ∈ V. The corresponding inclusion problem then reads

0 ∈ NU×V(u, v) + F(u, v),

where NU×V is the normal cone operator to U × V and F(u, v) = (∇uΦ(u, v), −∇vΦ(u, v))ᵀ. If U and V are nonempty, convex and closed sets, and Φ(u, v) is convex-concave, Fréchet differentiable and has a Lipschitz continuous gradient, then we obtain a variational inequality for which the developed theory holds, and we can apply RIFBF-VI.
This motivates the use of (variants of) FBF methods for the training of GANs, even though in practice the value functions used are typically not convex-concave and, furthermore, the gradient might not be Lipschitz continuous, if it exists at all. Additionally, in general one needs stochastic versions of the algorithms, which we do not provide in the case of RIFBF. A stochastic variant of the relaxed inertial forward-backward-forward algorithm RIFBF has been proposed and analyzed in (Cui et al., 2021), one and a half years after the first version of this article. The convergence analysis of that stochastic numerical method relies heavily on the one carried out for RIFBF.
Recently, a first successful attempt at using methods from the field of variational inequalities for GAN training was made by (Gidel et al., 2019). In particular, they applied the well-established extragradient algorithm and some derived variations. In this spirit we apply the FBF method and a variant with inertial effects (α = 0.05), as well as a primal-dual algorithm for saddle point problems introduced by (Hamedani and Aybat, 2021) (PDSP), which to the best of our knowledge has not been used for GAN training before, and compare the results to the best method (Extra Adam) from the work by (Gidel et al., 2019).
For our experiments we use the standard DCGAN architecture (Radford et al., 2016), where generator and discriminator consist (among other elements) of several convolutional-transpose and convolutional layers, respectively, which have been shown to work very well with images. The opposing networks are trained on the CIFAR10 dataset (Krizhevsky, 2009) with the WGAN objective and weight clipping (Arjovsky et al., 2017), which builds on the idea of minimizing the Wasserstein distance between the true distribution and the one learned by the generator. Note that in the absence of bound constraints on the weights of at least one of the two networks, the backward step in the FBF algorithm would be redundant, as we would project onto the whole space and we would obtain the unconstrained extragradient method.
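Schematically, and under our own naming, one deterministic FBF training step with weight clipping as the projection looks as follows; in the actual experiments the plain gradient steps are replaced by Adam directions, the gradients are stochastic, and only the critic's weights are clipped.

```python
import numpy as np

def fbf_gan_step(theta, grad, clip, lam):
    """One schematic FBF step on the stacked (generator, discriminator)
    parameters theta; grad(theta) returns (grad_u Phi, -grad_v Phi).

    Only the extrapolation point is projected; the correcting step is a
    plain forward step, exactly as in FBF."""
    y = clip(theta - lam * grad(theta))        # projected forward-backward step
    return y - lam * (grad(y) - grad(theta))   # correcting forward step

# Elementwise weight clipping to [-c, c] (WGAN) plays the role of the projection:
clip = lambda w, c=0.01: np.clip(w, -c, c)
```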
Furthermore, in our experiments, instead of plain stochastic gradients we use the Adam optimizer (Kingma and Ba, 2014) with the hyperparameters (β1 = 0.5, β2 = 0.9) that were used by (Gidel et al., 2019), as the best results there were achieved with this choice. We would also like to mention that we only performed a hyperparameter search for the stepsizes of the newly introduced methods; all other parameters were chosen as in the aforementioned work.
Method                  IS             FID
PDSP Adam               4.20 ± 0.04    53.97 ± 0.28
Extra Adam              4.07 ± 0.05    56.67 ± 0.61
FBF Adam                4.54 ± 0.04    45.85 ± 0.35
IFBF Adam (α = 0.05)    4.59 ± 0.04    45.25 ± 0.60

Table 4: Best IS and FID scores (averaged over 5 runs) achieved on CIFAR10.
The model is evaluated using the inception score (IS) (reworked implementation by (Bar-
ratt and Sharma, 2018) that fixes some issues of the original one) as well as the Fréchet
inception distance (FID) (Heusel et al., 2017), both computed on 50,000 samples. Exper-
iments were run with 5 random seeds for 500,000 updates of the generator on an NVIDIA
GeForce RTX 2080Ti GPU. Table 4 reports the best IS and FID achieved by each consid-
ered method. Note that the values of IS for Extra Adam differ from those stated by (Gidel
et al., 2019), due to the usage of the corrected implementation of the score.
We see that even though we have only proved convergence for the monotone case in the deterministic setting, the variants of RIFBF perform well in the training of GANs. IFBF Adam outperforms all other considered methods, both in terms of the IS and the FID. As the theory suggests, making use of inertial effects (even though Adam already incorporates some momentum) seems to provide an additional improvement of the numerical method in practice. The results suggest that employing methods designed for solving variational inequalities can be beneficial for the training of GANs.
Acknowledgments
The authors would like to thank Julius Berner and Axel Böhm for fruitful discussions and their valuable suggestions and comments, as well as the anonymous reviewers for their recommendations, which have improved the quality of the paper. Radu Boţ would like to
acknowledge partial support from the Austrian Science Fund (FWF), projects I 2419-N32
and W 1260. Michael Sedlmayer would like to acknowledge support from the Austrian
Research Promotion Agency (FFG), project “Smart operation of wind turbines under icing
conditions (SOWINDIC)”.
Figure 6: Comparison of the samples of a WGAN with weight clipping trained with the
different methods.
Appendix A.

We show how, starting from the inequality

ḧ(t) + γ(t)ḣ(t) + ((1 − λL)/(1 + λL)²) τ(t)‖M x(t)‖² ≤ ‖ẋ(t)‖²   ∀t ∈ [0, +∞),

one obtains the statements (i) and (ii) in Theorem 4 as well as the existence of the limit limt→+∞ ‖x(t) − x∗‖ ∈ R. We denote κ := (1 − λL)/(1 + λL)² > 0.
Taking again into account (8) one obtains for every t ∈ [0, +∞)

ḧ(t) + γ(t)ḣ(t) + (κ/τ(t))‖ẍ(t) + γ(t)ẋ(t)‖² ≤ ‖ẋ(t)‖²

or, equivalently,

ḧ(t) + γ(t)ḣ(t) + (κγ(t)/τ(t)) (d/dt)‖ẋ(t)‖² + (κγ²(t)/τ(t) − 1)‖ẋ(t)‖² + (κ/τ(t))‖ẍ(t)‖² ≤ 0.

Using that

(κγ(t)/τ(t)) (d/dt)‖ẋ(t)‖² = (d/dt)(κ(γ(t)/τ(t))‖ẋ(t)‖²) + κ((−γ̇(t)τ(t) + γ(t)τ̇(t))/τ²(t))‖ẋ(t)‖²

and

γ(t)ḣ(t) = (d/dt)(γh)(t) − γ̇(t)h(t) ≥ (d/dt)(γh)(t),

it yields for every t ∈ [0, +∞)

ḧ(t) + (d/dt)(γh)(t) + (d/dt)(κ(γ(t)/τ(t))‖ẋ(t)‖²) + (κγ²(t)/τ(t) + κ(−γ̇(t)τ(t) + γ(t)τ̇(t))/τ²(t) − 1)‖ẋ(t)‖² + (κ/τ(t))‖ẍ(t)‖² ≤ 0.
According to Assumption 1, we have for almost every t ∈ [0, +∞) the inequality

ḧ(t) + (d/dt)(γh)(t) + (d/dt)(κ(γ(t)/τ(t))‖ẋ(t)‖²) + ν‖ẋ(t)‖² + (κ/τ̄)‖ẍ(t)‖² ≤ 0,   (27)

where τ̄ denotes a positive upper bound for the function τ. From here we obtain that the function t ↦ ḣ(t) + γ(t)h(t) + κ(γ(t)/τ(t))‖ẋ(t)‖², which is locally absolutely continuous, is
monotonically decreasing. Hence there exists a real number M such that
ḣ(t) + γ(t)h(t) + κ(γ(t)/τ(t))‖ẋ(t)‖² ≤ M   ∀t ∈ [0, +∞),   (28)
which yields

ḣ(t) + γ̲h(t) ≤ M   ∀t ∈ [0, +∞),

where γ̲ denotes a positive lower bound for the function γ. By multiplying this inequality with exp(γ̲t) and then integrating from 0 to T, where T > 0, it yields

h(T) ≤ h(0) exp(−γ̲T) + (M/γ̲)(1 − exp(−γ̲T)),
thus h is bounded and, consequently, the trajectory x is bounded.
On the other hand, from (28) it follows that for every t ∈ [0, +∞)

ḣ(t) + (κγ̲/τ̄)‖ẋ(t)‖² ≤ M,

hence

⟨x(t) − x∗, ẋ(t)⟩ + (κγ̲/τ̄)‖ẋ(t)‖² ≤ M.

This yields, since x is bounded, that ẋ is bounded,
which allows us to conclude that ẋ, ẍ ∈ L2 ([0, +∞); H). Finally, from (8) and Assumption
1 we deduce M x ∈ L2 ([0, +∞); H) and the proof of statement (i) is complete.
In order to prove (ii), we notice that for every t ∈ [0, +∞) it holds

(d/dt)((1/2)‖ẋ(t)‖²) = ⟨ẋ(t), ẍ(t)⟩ ≤ (1/2)‖ẋ(t)‖² + (1/2)‖ẍ(t)‖²,

which, according to (i), leads to limt→+∞ ẋ(t) = 0.
Further, for every t ∈ [0, +∞) we have

(d/dt)((1/2)‖M x(t)‖²) = ⟨M x(t), (d/dt)(M x(t))⟩ ≤ (1/2)‖M x(t)‖² + ((1 + λL)²(2 + λL)²/2)‖ẋ(t)‖².

By using again (i), we obtain limt→+∞ M x(t) = 0, while limt→+∞ ẍ(t) = 0 follows from (8) and Assumption 1.
Finally, we have seen that the function t ↦ ḣ(t) + γ(t)h(t) + κ(γ(t)/τ(t))‖ẋ(t)‖² is monotonically decreasing, thus from (i), (ii) and Assumption 1 we deduce that limt→+∞ γ(t)h(t) exists and is a real number. By taking also into account that limt→+∞ γ(t) ∈ (0, +∞) exists, we obtain that limt→+∞ ‖x(t) − x∗‖ ∈ R exists.
Appendix B.

Table 5: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the standard normal distribution.

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.04
0.00 ≥ 10,000 4,217 2,110 1,407 1,056 845 705 604 529 471 424 400
0.04 ≥ 10,000 4,049 2,025 1,351 1,014 811 677 580 508 452 399 -
0.08 ≥ 10,000 3,880 1,941 1,295 972 778 648 556 487 433 - -
0.12 ≥ 10,000 3,711 1,857 1,238 929 744 620 532 466 - - -
0.16 ≥ 10,000 3,543 1,772 1,182 887 710 592 508 441 - - -
0.20 ≥ 10,000 3,374 1,688 1,126 845 677 564 481 - - - -
0.24 ≥ 10,000 3,206 1,604 1,070 803 643 533 - - - - -
0.28 ≥ 10,000 3,037 1,520 1,014 761 607 498 - - - - -
0.32 ≥ 10,000 2,868 1,435 958 717 564 - - - - - -
0.36 ≥ 10,000 2,700 1,351 901 668 - - - - - - -
0.40 ≥ 10,000 2,531 1,267 841 610 - - - - - - -
0.44 ≥ 10,000 2,363 1,182 770 - - - - - - - -
0.48 ≥ 10,000 2,194 1,093 - - - - - - - - -
0.52 ≥ 10,000 2,026 986 - - - - - - - - -
0.56 ≥ 10,000 1,857 - - - - - - - - - -
0.60 ≥ 10,000 1,679 - - - - - - - - - -
0.64 ≥ 10,000 1,459 - - - - - - - - - -
0.68 ≥ 10,000 - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - -
0.80 8,437 - - - - - - - - - - -
0.84 6,747 - - - - - - - - - - -
Table 6: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the standard normal distribution.

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.32
0.00 ≥ 10,000 7,127 3,565 2,378 1,785 1,429 1,191 1,022 894 795 716 614 600 537 522
0.04 ≥ 10,000 6,842 3,423 2,283 1,713 1,372 1,144 981 859 764 668 623 572 - -
0.08 ≥ 10,000 6,557 3,281 2,188 1,642 1,315 1,096 940 823 731 615 604 542 - -
0.12 ≥ 10,000 6,272 3,138 2,093 1,571 1,258 1,049 900 787 693 623 578 - - -
0.16 ≥ 10,000 5,988 2,996 1,999 1,500 1,201 1,001 858 744 626 607 - - - -
0.20 ≥ 10,000 5,703 2,853 1,904 1,429 1,144 953 813 695 621 - - - - -
0.24 ≥ 10,000 5,418 2,711 1,809 1,358 1,087 900 751 642 - - - - - -
0.28 ≥ 10,000 5,133 2,569 1,714 1,286 1,025 841 682 - - - - - - -
0.32 ≥ 10,000 4,848 2,427 1,619 1,212 952 758 - - - - - - - -
0.36 ≥ 10,000 4,564 2,284 1,524 1,128 861 725 - - - - - - - -
0.40 ≥ 10,000 4,279 2,142 1,421 1,028 795 - - - - - - - - -
0.44 ≥ 10,000 3,995 1,999 1,299 925 - - - - - - - - - -
0.48 ≥ 10,000 3,710 1,847 1,148 - - - - - - - - - - -
0.52 ≥ 10,000 3,426 1,663 1,102 - - - - - - - - - - -
0.56 ≥ 10,000 3,141 1,449 - - - - - - - - - - - -
0.60 ≥ 10,000 2,839 - - - - - - - - - - - - -
0.64 ≥ 10,000 2,458 - - - - - - - - - - - - -
0.68 ≥ 10,000 2,172 - - - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - -
0.88 8,066 - - - - - - - - - - - - - -
Table 7: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the standard normal distribution (µ = 0.1).

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.40    1.50    1.60    1.70    1.80
0.00 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,690 7,269 5,816 4,848 4,156 3,637 3,233 2,910 2,416 2,434 2,162 1,878 1,943 1,702 1,696 1,459
0.04 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,303 6,978 5,584 4,654 3,990 3,492 3,104 2,677 2,515 2,316 1,981 1,929 1,828 1,616 1,588 -
0.08 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,916 6,688 5,351 4,460 3,824 3,346 2,970 2,404 2,448 2,186 1,853 1,916 1,650 1,685 - -
0.12 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,528 6,397 5,119 4,267 3,658 3,198 2,802 2,506 2,343 2,044 1,859 1,816 1,572 - - -
0.16 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,141 6,107 4,887 4,073 3,487 3,014 2,460 2,460 2,237 1,893 1,860 1,678 - - - -
0.20 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,754 5,816 4,654 3,876 3,299 2,796 2,485 2,343 2,130 1,752 1,795 - - - - -
0.24 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,366 5,526 4,419 3,653 3,022 2,551 2,473 2,227 2,023 - - - - - - -
0.28 ≥ 10,000 ≥ 10,000 ≥ 10,000 6,979 5,235 4,165 3,396 2,703 2,448 2,344 2,111 - - - - - - - -
0.32 ≥ 10,000 ≥ 10,000 9,885 6,592 4,929 3,847 3,014 2,622 2,490 2,214 - - - - - - - - -
0.36 ≥ 10,000 ≥ 10,000 9,304 6,201 4,568 3,439 2,894 2,639 2,345 - - - - - - - - - -
0.40 ≥ 10,000 ≥ 10,000 8,723 5,775 4,128 3,148 2,920 2,513 - - - - - - - - - - -
0.44 ≥ 10,000 ≥ 10,000 8,139 5,246 3,667 3,219 2,737 - - - - - - - - - - - -
0.48 ≥ 10,000 ≥ 10,000 7,507 4,563 3,724 3,049 - - - - - - - - - - - - -
0.52 ≥ 10,000 ≥ 10,000 6,708 4,416 3,518 - - - - - - - - - - - - - -
0.56 ≥ 10,000 ≥ 10,000 5,748 4,298 - - - - - - - - - - - - - - -
0.60 ≥ 10,000 ≥ 10,000 5,719 - - - - - - - - - - - - - - - -
0.64 ≥ 10,000 9,879 - - - - - - - - - - - - - - - - -
0.68 ≥ 10,000 8,680 - - - - - - - - - - - - - - - - -
0.72 ≥ 10,000 8,203 - - - - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.88 ≥ 10,000 - - - - - - - - - - - - - - - - - -
Table 8: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the Poisson distribution with rate 1.

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.04
0.00 ≥ 10,000 6,183 3,093 2,063 1,549 1,241 1,036 888 769 675 589 565
0.04 ≥ 10,000 5,936 2,969 1,981 1,487 1,192 995 851 733 638 564 -
0.08 ≥ 10,000 5,689 2,846 1,899 1,426 1,143 953 812 698 603 - -
0.12 ≥ 10,000 5,441 2,722 1,816 1,365 1,094 911 768 662 - - -
0.16 ≥ 10,000 5,194 2,598 1,734 1,304 1,044 866 723 623 - - -
0.20 ≥ 10,000 4,947 2,475 1,653 1,242 993 814 679 - - - -
0.24 ≥ 10,000 4,700 2,352 1,571 1,180 941 756 - - - - -
0.28 ≥ 10,000 4,452 2,229 1,490 1,117 882 706 - - - - -
0.32 ≥ 10,000 4,205 2,106 1,407 1,046 810 - - - - - -
0.36 ≥ 10,000 3,958 1,984 1,321 967 - - - - - - -
0.40 ≥ 10,000 3,711 1,861 1,224 888 - - - - - - -
0.44 ≥ 10,000 3,465 1,733 1,125 - - - - - - - -
0.48 ≥ 10,000 3,219 1,592 - - - - - - - - -
0.52 ≥ 10,000 2,975 1,441 - - - - - - - - -
0.56 ≥ 10,000 2,724 - - - - - - - - - -
0.60 ≥ 10,000 2,444 - - - - - - - - - -
0.64 ≥ 10,000 2,145 - - - - - - - - - -
0.68 ≥ 10,000 - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - -
0.84 9,852 - - - - - - - - - - -
Table 9: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the Poisson distribution with rate 1.

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.32
0.00 ≥ 10,000 ≥ 10,000 5,149 3,432 2,573 2,059 1,717 1,474 1,293 1,153 998 894 822 763 753
0.04 ≥ 10,000 9,888 4,943 3,294 2,470 1,977 1,648 1,416 1,242 1,103 944 861 791 - -
0.08 ≥ 10,000 9,476 4,737 3,157 2,368 1,895 1,580 1,359 1,190 1,037 904 829 762 - -
0.12 ≥ 10,000 9,064 4,530 3,019 2,264 1,812 1,513 1,300 1,138 964 871 799 - - -
0.16 ≥ 10,000 8,651 4,324 2,882 2,162 1,731 1,446 1,237 1,080 919 842 - - - -
0.20 ≥ 10,000 8,239 4,118 2,744 2,059 1,650 1,379 1,164 996 888 - - - - -
0.24 ≥ 10,000 7,827 3,911 2,607 1,956 1,568 1,296 1,093 942 - - - - - -
0.28 ≥ 10,000 7,414 3,705 2,469 1,855 1,477 1,202 1,031 - - - - - - -
0.32 ≥ 10,000 7,002 3,498 2,332 1,751 1,381 1,119 - - - - - - - -
0.36 ≥ 10,000 6,589 3,292 2,197 1,624 1,273 1,067 - - - - - - - -
0.40 ≥ 10,000 6,176 3,085 2,049 1,490 1,196 - - - - - - - - -
0.44 ≥ 10,000 5,762 2,881 1,874 1,386 - - - - - - - - - -
0.48 ≥ 10,000 5,349 2,658 1,714 - - - - - - - - - - -
0.52 ≥ 10,000 4,935 2,404 1,611 - - - - - - - - - - -
0.56 ≥ 10,000 4,524 2,169 - - - - - - - - - - - -
0.60 ≥ 10,000 4,085 - - - - - - - - - - - - -
0.64 ≥ 10,000 3,576 - - - - - - - - - - - - -
0.68 ≥ 10,000 3,198 - - - - - - - - - - - - -
0.72 ≥ 10,000 - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - -
0.88 ≥ 10,000 - - - - - - - - - - - - - -
Table 10: Number of iterations necessary to achieve ‖yk − zk‖ ≤ 10−5 in the constrained bilinear saddle-point problem with data drawn from the Poisson distribution with rate 1 (µ = 0.1).

α \ ρ    0.01    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    1.00    1.10    1.20    1.30    1.40    1.50    1.60    1.70    1.80
0.00 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,060 6,714 5,752 5,031 4,471 4,015 3,532 3,438 3,129 2,893 2,827 2,767 2,728 2,761
0.04 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,675 7,737 6,445 5,521 4,829 4,294 3,773 3,504 3,303 2,964 2,890 2,790 2,784 2,646 -
0.08 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,271 7,414 6,175 5,290 4,627 4,119 3,556 3,448 3,152 2,893 2,838 2,786 2,672 - -
0.12 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,867 7,090 5,905 5,058 4,424 3,874 3,523 3,316 3,026 2,920 2,829 2,863 - - -
0.16 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,462 6,766 5,634 4,826 4,210 3,619 3,471 3,198 2,968 2,946 2,895 - - - -
0.20 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,057 6,441 5,362 4,581 3,960 3,563 3,354 3,142 3,022 3,032 - - - - -
0.24 ≥ 10,000 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,652 6,114 5,080 4,263 3,677 3,522 3,288 3,192 - - - - - - -
0.28 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,670 7,244 5,781 4,750 3,957 3,650 3,469 3,366 - - - - - - - -
0.32 ≥ 10,000 ≥ 10,000 ≥ 10,000 9,128 6,831 5,383 4,373 3,831 3,695 3,588 - - - - - - - - -
0.36 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,582 6,374 4,951 4,210 3,982 3,863 - - - - - - - - - -
0.40 ≥ 10,000 ≥ 10,000 ≥ 10,000 8,023 5,842 4,628 4,352 4,233 - - - - - - - - - - -
0.44 ≥ 10,000 ≥ 10,000 ≥ 10,000 7,357 5,382 4,824 4,681 - - - - - - - - - - - -
0.48 ≥ 10,000 ≥ 10,000 ≥ 10,000 6,655 5,474 5,215 - - - - - - - - - - - - -
0.52 ≥ 10,000 ≥ 10,000 9,424 6,385 5,869 - - - - - - - - - - - - - -
0.56 ≥ 10,000 ≥ 10,000 8,431 6,725 - - - - - - - - - - - - - - -
0.60 ≥ 10,000 ≥ 10,000 8,298 - - - - - - - - - - - - - - - -
0.64 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.68 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.72 ≥ 10,000 ≥ 10,000 - - - - - - - - - - - - - - - - -
0.76 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.80 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.84 ≥ 10,000 - - - - - - - - - - - - - - - - - -
0.88 ≥ 10,000 - - - - - - - - - - - - - - - - - -
Discriminator architecture:
Input: x ∈ R^{3×32×32}
conv. (kernel: 4 × 4, 1 → 64, stride: 2, pad: 1)
LeakyReLU (negative slope: 0.2)
conv. (kernel: 4 × 4, 64 → 128, stride: 2, pad: 1)
Batch Normalization
LeakyReLU (negative slope: 0.2)
conv. (kernel: 4 × 4, 128 → 256, stride: 2, pad: 1)
Batch Normalization
LeakyReLU (negative slope: 0.2)
Linear 128 × 4 × 4 × 4 → 1
References
Boushra Abbas and Hedy Attouch. Dynamical systems and forward-backward algorithms
associated with the sum of a convex subdifferential and a monotone cocoercive operator.
Optimization, 64(10):2223–2252, 2015.
Felipe Alvarez. On the minimizing property of a second order dissipative system in Hilbert
spaces. SIAM Journal on Control and Optimization, 38(4):1102–1119, 2000.
Felipe Alvarez and Hedy Attouch. An inertial proximal method for maximal monotone
operators via discretization of a nonlinear oscillator with damping. Set-Valued Analysis,
9(1-2):3–11, 2001.
Maicon M Alves and Raul T Marcavillaca. On inexact relative-error hybrid proximal ex-
tragradient, forward-backward and Tseng’s modified forward-backward methods with
inertial effects. Set-Valued and Variational Analysis, 28:301–325, 2020.
Maicon M Alves, Jonathan Eckstein, Marina Geremia, and Jefferson G Melo. Relative-error
inertial-relaxed inexact versions of Douglas-Rachford and ADMM splitting algorithms.
Computational Optimization and Applications, 75(2):389–422, 2020.
Anatoly S Antipin. Minimization of convex functions on convex sets by means of differential equations. Differential Equations, 30(9):1365–1375, 1994.

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein Generative Adversarial Networks. In International Conference on Machine Learning (ICML 2017), volume 70, pages 214–223. PMLR, 2017.
Hedy Attouch and Alexandre Cabot. Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Applied Mathematics & Optimization, 80(3):547–598, 2019a.

Hedy Attouch and Alexandre Cabot. Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Mathematical Programming, pages 1–45, 2019b.

Hedy Attouch and Alexandre Cabot. Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization, pages 1–32, 2019c.

Hedy Attouch and Paul-Emile Maingé. Asymptotic behavior of second-order dissipative evolution equations combining potential with non-potential effects. ESAIM: Control, Optimisation and Calculus of Variations, 17(3):836–857, 2011.
Shane Barratt and Rishi Sharma. A note on the inception score. In ICML 2018 Workshop
on Theoretical Foundations and Applications of Deep Generative Models, 2018.
Heinz H Bauschke and Patrick L Combettes. Convex Analysis and Monotone Operator
Theory in Hilbert Spaces. Springer, New York, 2017.
Jérôme Bolte. Continuous gradient projection method in Hilbert spaces. Journal of Optimization Theory and Applications, 119(2):235–259, 2003.
Jonathan M Borwein and Adrian S Lewis. Convex Analysis and Nonlinear Optimization:
Theory and Examples. Springer, New York, 2006.
Radu I Boţ and Ernö R Csetnek. Second order forward-backward dynamical systems for
monotone inclusion problems. SIAM Journal on Control and Optimization, 54(3):1423–
1443, 2016a.
Radu I Boţ and Ernö R Csetnek. Convergence rates for forward-backward dynamical sys-
tems associated with strongly monotone inclusions. Journal of Mathematical Analysis
and Applications, 457(2):1135–1152, 2018.
Radu I Boţ, Ernö R Csetnek, and Christopher Hendrich. Inertial Douglas-Rachford splitting
for monotone inclusion problems. Applied Mathematics and Computation, 256(1):472–
487, 2015.
Radu I Boţ, Ernö R Csetnek, and Phan T Vuong. The Forward-Backward-Forward method
from discrete and continuous perspective for pseudo-monotone variational inequalities in
Hilbert spaces. European Journal of Operational Research, 287:49–60, 2020.
Shisheng Cui, Uday V Shanbhag, Mathias Staudigl, and Phan T Vuong. Stochastic Relaxed Inertial Forward-Backward-Forward splitting for monotone inclusions in Hilbert spaces. arXiv:2107.10335, 2021.

Jonathan Eckstein and Dimitri P Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293–318, 1992.
Gauthier Gidel, Hugo Berard, Gaëtan Vignoud, Pascal Vincent, and Simon Lacoste-Julien.
A variational inequality perspective on Generative Adversarial Networks. In International
Conference on Learning Representations (ICLR 2019), 2019.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in
Neural Information Processing Systems, volume 27, pages 2672–2680, 2014.
Nicolas Hadjisavvas, Siegfried Schaible, and Ngai-Ching Wong. Pseudomonotone operators: a survey of the theory and its applications. Journal of Optimization Theory and Applications, 152(1):1–20, 2012.

Erfan Y Hamedani and Necdet S Aybat. A primal-dual algorithm for general convex-concave saddle point problems. SIAM Journal on Optimization, 31(2):1299–1329, 2021.
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp
Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash
equilibrium. In Advances in Neural Information Processing Systems, volume 30, pages
6626–6637, 2017.
Franck Iutzeler and Julien M Hendrickx. A generic online acceleration scheme for optimiza-
tion algorithms via relaxation and inertia. Optimization Methods and Software, 34(2):
383–405, 2019.
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980, 2014.
Galina M Korpelevich. The extragradient method for finding saddle points and other prob-
lems. Ekonomika i Matematicheskie Metody, 12:747–756, 1976.
Alex Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis,
University of Toronto, Canada, 2009.
Yura Malitsky and Matthew K Tam. A forward-backward splitting method for monotone
inclusions without cocoercivity. SIAM Journal on Optimization, 30(2):1451–1472, 2020.
Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady Akademii Nauk SSSR, 269:543–547, 1983.
Zdzislaw Opial. Weak convergence of the sequence of successive approximations for nonex-
pansive mappings. Bulletin of the American Mathematical Society, 73(4):591–597, 1967.
Boris T Polyak. Some methods of speeding up the convergence of iteration methods. USSR
Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning
with deep convolutional generative adversarial networks. In International Conference on
Learning Representations (ICLR 2016), 2016.
Paul Tseng. A modified forward-backward splitting method for maximal monotone map-
pings. SIAM Journal on Control and Optimization, 38(2):431–446, 2000.