
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

Jelena Diakonikolas (UW-Madison), Constantinos Daskalakis (MIT), Michael I. Jordan (UC Berkeley)

Abstract

The use of min-max optimization in the adversarial training of deep neural network classifiers, and the training of generative adversarial networks, has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even under smoothness conditions, motivating the study of min-max objectives with additional structure. We introduce a new class of structured nonconvex-nonconcave min-max optimization problems, proposing a generalization of the extragradient algorithm which provably converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general ℓp-normed finite-dimensional real vector spaces. We also discuss its stability under stochastic oracles and provide bounds on its sample complexity. Our iteration complexity and sample complexity bounds either match or improve the best known bounds for the same or less general nonconvex-nonconcave settings, such as those that satisfy variational coherence or in which a weak solution to the associated variational inequality problem is assumed to exist.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

1 Introduction

Min-max optimization and min-max duality theory lie at the foundations of game theory and mathematical programming, and have found far-reaching applications across a range of disciplines, including complexity theory, statistics, control theory, and online learning theory. Most recently, min-max optimization has played an important role in machine learning, notably in the adversarial training of deep neural network classifiers and the training of generative deep neural network models. These recent applications have heightened the importance of solving min-max optimization problems with nonconvex-nonconcave objectives, taking the following general form:

    min_x max_y f(x, y),    (1.1)

where x and y are real-valued vectors and f is not (necessarily) convex in x for all y and/or not (necessarily) concave in y for all x. There may also be constraints on x and y, and in many applications x and y are high-dimensional vectors.

When the objective function is not convex-concave, von Neumann's celebrated min-max theorem fails to apply, and so do most standard optimization methods for solving (1.1). This has motivated several lines of investigation, which include extensions of the min-max theorem beyond convex-concave objectives (e.g. Sion's theorem for quasiconvex-quasiconcave objectives), and the pursuit of computational procedures that target solutions to (1.1) even in the absence of a min-max theorem; see Section 1.1 for a review of recent work. Of course, without strong assumptions on f, (1.1) is an intractable problem, at least as intractable as general nonconvex optimization. Thus, the literature has targeted locally optimal solutions, in the same spirit as the targeting of local optima in non-convex optimization. Naturally, there are various notions of local optimality that have been studied in the literature. Our focus here will be on the simplest such notion, namely first-order local optimality, for which, despite the apparent simplicity, many challenges arise (Daskalakis and Panageas, 2018; Mazumdar et al., 2020).

In contrast to classical optimization problems, where useful results can be obtained with very mild assumptions on the objective function, in min-max optimization it is necessary to impose non-trivial assumptions
on f, even when the goal is only to compute locally optimal solutions. Indeed, Daskalakis et al. (2021) establish intractability results in the constrained setting of the problem, wherein first-order locally optimal solutions are guaranteed to exist whenever the objective is smooth. Moreover, they show that even the computation of approximate solutions is PPAD-complete and, if the objective function is accessible through value-queries and gradient-queries, exponentially many such queries are necessary (in particular, exponential in at least one of the following: the inverse approximation parameter, the smoothness constant of f, or the diameter of the constraint set).

We expect similar intractability results to hold in the unconstrained case, which is the case considered in this paper, even when restricting to smooth objectives that have a non-empty set of optimal solutions.[1] Indeed, fixed-point complexity-based intractability results for the constrained case are typically extendable to the unconstrained case, by embedding the hard instances within an unbounded domain.

Relatedly, we already know that the unconstrained Stampacchia variational inequality (SVI) problem for Lipschitz continuous operators F : ℝ^d → ℝ^d (a problem which includes the unconstrained case of (1.1) by setting F([x; y]) = [∇_x f(x, y); −∇_y f(x, y)]) is computationally intractable, even when restricting to operators that have a non-empty set of SVI solutions.[2] This is because: (i) F is Lipschitz-continuous if and only if the operator T(u) = u − F(u) is Lipschitz-continuous; (ii) for ε ≥ 0, points ū ∈ ℝ^d such that ‖F(ū)‖₂ ≤ ε satisfy ‖T(ū) − ū‖₂ ≤ ε, i.e. they are ε-approximate fixed points of T, and vice versa; and (iii) it is known that finding approximate fixed points of Lipschitz operators over ℝ^d is PPAD-hard, even when the operators are guaranteed to have fixed points (Papadimitriou, 1994). Moreover, if we restrict attention to algorithms that only make value queries to T (i.e. F, which corresponds to the type of access that all first-order algorithms have), the query complexity becomes exponential in the dimension (Hirsch et al., 1989). Finally, by the equivalence of norms, these results extend to arbitrary ℓp-normed finite-dimensional real vector spaces. Of course, for these intractability results for SVI to apply to the nonconvex-nonconcave min-max problem (1.1), one would need to prove that these complexity results extend to operators F constructed from a smooth function f by setting F([x; y]) = [∇_x f(x, y); −∇_y f(x, y)].

Our contributions. Given the aforedescribed intractability results, our goal is to identify structural properties that make it possible to solve min-max optimization problems with smooth objectives. Focusing on the unconstrained setting of (1.1), we view it as a special case (obtained by considering the operator F([x; y]) = [∇_x f(x, y); −∇_y f(x, y)]) of the unconstrained variational inequality problem (svi), and consider instead this more general problem. We identify conditions for F under which a generalized version of the extragradient method of Korpelevich (1976), which we propose, converges to a solution of (svi) (or, in the special case of (1.1), to a stationary point of f) at a rate of 1/√k in the number of iterations k. Our condition, presented as Assumption 1, postulates that there exists a solution to (svi) that only violates the stronger (mvi) requirement in a controlled manner that we delineate. Our generalized extragradient method is based on an aggressive interpolation step, as specified by (eg+), and our main convergence result is Theorem 3.2. We additionally show, in Theorems 4.1 and 4.4, that the algorithm converges in non-Euclidean settings, under the stronger condition that an (mvi) solution exists, or when we only have stochastic oracle access to F (or, in the special case of (1.1), to the gradient of f).

The condition on F under which our main result applies is weaker than the assumption that a solution to (mvi) exists (Zhou et al., 2017; Mertikopoulos et al., 2019; Malitsky, 2019; Song et al., 2020), an assumption which is already satisfied by several interesting families of min-max objectives, including quasiconvex-concave families or starconvex-concave families. Our significantly weaker condition applies in particular to (min-max objectives f with corresponding) operators F that are negatively comonotone (Bauschke et al., 2020) or positively cohypomonotone (Combettes and Pennanen, 2004). These conditions have been studied in the literature for at least a couple of decades, but only asymptotic convergence results were available prior to our work for computing solutions to (svi). In contrast, our rates are asymptotically identical to the rates that we would get under the stronger assumption that a solution to (mvi) exists, and sidestep the intractability results for (1.1) suggested by Daskalakis et al. (2021) for general smooth objectives.

1.1 Further Related Work

A large number of recent works target identifying practical first-order, low-order, or efficient online learning methods for solving min-max optimization problems in a variety of settings, ranging from the well-behaved setting of convex-concave objectives to the challenging setting of nonconvex-nonconcave objectives. There has been substantial work for convex-concave and

[1] Note that these are stationary points of f in this case.
[2] We formally define the Stampacchia variational inequality problem, (svi), in Section 2. We also define the harder Minty variational inequality problem, (mvi), in the same section.
Table 1: Comparison of iteration complexities required to find a point x with ‖F(x)‖_{p*} ≤ ε using deterministic algorithms, where ε > 0, F : ℝ^d → ℝ^d is a Lipschitz operator satisfying Assumption 1 (Section 2) with ρ ≥ 0. Parameter p determines the ℓp setup, and p* = p/(p−1) is the exponent conjugate to p. Only the dependence on ε and possibly the dimension d is shown; the dependence on other problem parameters is comparable for all the results. Õ hides logarithmic factors. '—' indicates that the result does not exist/is not known.

    Paper \ Setup          ρ ∈ (0, 1/(4L)), p = 2   ρ = 0, p = 2   ρ = 0, p ∈ (1, 2)          ρ = 0, p > 2
    (Dang and Lan, 2015)   —                        O(1/ε²)        O(poly(d^{1/p−1/2})/ε²)    O(poly(d^{1/2−1/p})/ε²)
    (Lin et al., 2018)     —                        Õ(1/ε²)        —                          —
    (Song et al., 2020)    —                        O(1/ε²)        O(1/ε²)                    —
    This Paper             O(1/ε²)                  O(1/ε²)        O(1/ε²)                    O(1/ε^p)

nonconvex-concave objectives, targeting the computation of min-max solutions to (1.1) or, respectively, stationary points of f or Φ(x) := max_y f(x, y). This work has focused on attaining improved convergence rates (Kong and Monteiro, 2019; Lin et al., 2020b; Thekumparampil et al., 2019; Nouiehed et al., 2019; Lu et al., 2020; Zhao, 2019; Alkousa et al., 2019; Azizian et al., 2020; Golowich et al., 2020; Lin et al., 2020a; Diakonikolas, 2020) and/or obtaining last-iterate convergence guarantees (Daskalakis et al., 2018; Daskalakis and Panageas, 2018; Mazumdar et al., 2020; Mertikopoulos et al., 2018; Lin et al., 2018; Hamedani and Aybat, 2018; Adolphs et al., 2019; Daskalakis and Panageas, 2019; Liang and Stokes, 2019; Gidel et al., 2019; Mokhtari et al., 2020; Abernethy et al., 2019; Liu et al., 2020).

In the nonconvex-nonconcave setting, research has focused on identifying different notions of local min-max solutions (Daskalakis and Panageas, 2018; Mazumdar et al., 2020; Jin et al., 2020; Mangoubi and Vishnoi, 2021) and studying the existence and (local) convergence properties of learning methods to these points (Wang et al., 2019; Mangoubi et al., 2020; Mangoubi and Vishnoi, 2021). As already discussed, recent work of Daskalakis et al. (2021) shows that, for general smooth objectives, the computation of even approximate first-order locally optimal min-max solutions is intractable, motivating the identification of structural assumptions on the objective function for which these intractability barriers can be bypassed.

An example such assumption, which is closely related to the one made in this work, is that an (mvi) solution exists for the operator F([x; y]) = [∇_x f(x, y); −∇_y f(x, y)], as studied by Zhou et al. (2017); Lin et al. (2018); Mertikopoulos et al. (2019); Malitsky (2019); Liu et al. (2020); Song et al. (2020). As we have already discussed, the assumption we make for our main result in this work is weaker. Table 1 provides a comparison of our results to those of existing works, considering the deterministic setting (i.e. having exact value access to F).

In unconstrained Euclidean setups, the best known convergence rates are of the order 1/√k (Dang and Lan, 2015; Song et al., 2020), under the assumption that an (mvi) solution exists. We obtain the same rate under our weaker Assumption 1. Moreover, under our weaker assumption, we show that the accumulation points of the sequence of iterates of our algorithm are (svi) solutions. This was previously established for alternative algorithms and under the stronger assumption that an (mvi) solution exists (Mertikopoulos et al., 2019; Malitsky, 2019).

When it comes to more general ℓp norms, Mertikopoulos et al. (2019) establish the asymptotic convergence of the iterates of an optimistic variant of the mirror descent algorithm, under the assumption that an (mvi) solution exists, but they do not provide any convergence rates. On the other hand, Dang and Lan (2015) prove a 1/√k rate of convergence for a variant of the mirror-prox algorithm in general normed spaces. This result, however, requires the regularizing (prox) function to be both smooth and strongly convex w.r.t. the same norm, and the constant in the convergence bound scales at least linearly with the condition number of the prox function. It is well-known that no function can be simultaneously smooth and strongly convex w.r.t. an ℓp norm with p ≠ 2 and have a condition number independent of the dimension (Borwein et al., 2009). In fact, unless p is trivially close to 2, we only know of functions whose condition number would scale polynomially with the dimension.

Very recent (and independent) work of Song et al. (2020) proposes an optimistic dual extrapolation method with linear convergence for a class of problems that have a "strong" (mvi) solution. (In particular, their assumption is that there exists u* ∈ ℝ^d such that ∀u ∈ ℝ^d : ⟨F(u), u − u*⟩ ≥ m‖u − u*‖² for some constant m ≥ 0; the case m = 0 recovers the existence of a standard (mvi) solution.) Their result only applies to norms that are strongly convex, which in the case of ℓp norms is true only for p ∈ (1, 2]. In that case,
our results match those of Song et al. (2020). For the case of stochastic oracle access to F, our bounds also match those of Song et al. (2020) for p ∈ (1, 2], and we also handle the case p > 2, which is not covered by Song et al. (2020).

Finally, it is worth noting that Zhou et al. (2017); Mertikopoulos et al. (2019); Malitsky (2019); Song et al. (2020) consider constrained optimization setups, which are not considered in our work. We believe that generalizing our results to constrained setups is possible, and defer such generalizations to future work.

2 Notation and Preliminaries

We consider real d-dimensional spaces (ℝ^d, ‖·‖_p), where ‖·‖_p is the standard ℓp norm for p ≥ 1. In particular, ‖·‖₂ = √⟨·,·⟩ is the ℓ₂ (Euclidean) norm, where ⟨·,·⟩ denotes the inner product. When the context is clear, we omit the subscript 2 and just write ‖·‖ for the Euclidean norm ‖·‖₂. Moreover, we denote by p* = p/(p−1) the exponent conjugate to p.

We are interested in finding stationary points for min-max problems of the form:

    min_{x∈ℝ^{d₁}} max_{y∈ℝ^{d₂}} f(x, y),    (P)

where f is a smooth (possibly nonconvex-nonconcave) function and d₁ + d₂ = d. In this case, stationary points can be defined as the points at which the gradient of f is the zero vector. As is standard, the ε-approximate variant of this problem for ε > 0 is to find a point (x, y) ∈ ℝ^{d₁} × ℝ^{d₂} such that ‖∇f(x, y)‖_{p*} ≤ ε.

We will study Problem (P) through the lens of variational inequalities, described in Section 2.1. To do so, we consider the operator F : ℝ^d → ℝ^d defined via F(u) = [∇_x f(x, y); −∇_y f(x, y)], where u = [x; y] and where ∇_x f (respectively, ∇_y f) denotes the gradient of f w.r.t. x (respectively, y). It is clear that F is Lipschitz-continuous whenever f is smooth and that ‖F(u)‖_{p*} ≤ ε for u = [x; y] holds if and only if ‖∇f(x, y)‖_{p*} ≤ ε.

2.1 Variational Inequalities and Structured (Possibly Non-Monotone) Operators

Let F : ℝ^d → ℝ^d be an operator that is L-Lipschitz-continuous w.r.t. ‖·‖_p:

    (∀u, v ∈ ℝ^d) : ‖F(u) − F(v)‖_{p*} ≤ L‖u − v‖_p.    (a1)

F is said to be monotone if:

    (∀u, v ∈ ℝ^d) : ⟨F(u) − F(v), u − v⟩ ≥ 0.    (2.1)

Given a closed convex set U ⊆ ℝ^d and an operator F, the Stampacchia Variational Inequality problem consists in finding u* ∈ ℝ^d such that:

    (∀u ∈ U) : ⟨F(u*), u − u*⟩ ≥ 0.    (svi)

In this case, u* is referred to as the strong solution to the variational inequality corresponding to F and U. When U ≡ ℝ^d (the case considered here), it must be the case that ‖F(u*)‖_{p*} = 0. We will assume that there exists at least one (svi) solution, and will denote the set of all such solutions by U*.

The Minty Variational Inequality problem consists in finding u* such that:

    (∀u ∈ U) : ⟨F(u), u* − u⟩ ≤ 0,    (mvi)

in which case u* is referred to as the weak solution to the variational inequality corresponding to F and U. If we assume that F is monotone, then (2.1) implies that every solution to (svi) is also a solution to (mvi), and the two solution sets are equivalent. More generally, if F is not monotone, all that can be said is that the set of (mvi) solutions is a subset of the set of (svi) solutions. In particular, (mvi) solutions may not exist even when (svi) solutions exist. These facts follow from Minty's theorem (see, e.g., (Kinderlehrer and Stampacchia, 2000, Chapter 3)).

We will not, in general, be assuming that F is monotone. Note that the Lipschitzness of F on its own is not sufficient to guarantee that the problem is computationally tractable, as discussed in the introduction. Thus, additional structure is needed, which we introduce in the following.

Weak MVI solutions. We define the class of problems with weak (mvi) solutions as the class of problems in which F satisfies the following assumption.

Assumption 1 (Weak mvi). There exists u* ∈ U* such that:

    (∀u ∈ ℝ^d) : ⟨F(u), u − u*⟩ ≥ −(ρ/2)‖F(u)‖²_{p*},    (a2)

for some parameter ρ ∈ [0, 1/(4L)).

We will only provide results for ρ > 0 in the case of the ℓ₂ norm. For p ≠ 2, we will require a stronger assumption; namely, that an (mvi) solution exists, which holds when ρ = 0.

2.2 Example Settings Satisfying Assumption 1

The class of problems that have weak (mvi) solutions in the sense of Assumption 1 generalizes other structured non-monotone variational inequality problems, as we discuss in this section.
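As an illustration of the reduction from (P) to the operator setting, the correspondence F([x; y]) = [∇_x f(x, y); −∇_y f(x, y)] can be checked numerically. The sketch below uses a toy bilinear objective f(x, y) = xᵀAy of our own choosing (the matrix A and the two-dimensional blocks are illustrative assumptions, not taken from the text):

```python
import numpy as np

# Toy smooth objective f(x, y) = x^T A y; the operator F stacks the
# x-gradient with the negated y-gradient, as described in Section 2.
A = np.array([[1.0, 2.0], [0.0, 1.0]])

def grad_x(x, y):
    return A @ y          # gradient of f w.r.t. x

def grad_y(x, y):
    return A.T @ x        # gradient of f w.r.t. y

def F(u):
    x, y = u[:2], u[2:]
    return np.concatenate([grad_x(x, y), -grad_y(x, y)])

u = np.array([0.5, -1.0, 2.0, 0.3])
x, y = u[:2], u[2:]
# ||F(u)|| coincides with ||grad f(x, y)||, so ||F(u)|| <= eps holds
# iff (x, y) is an eps-approximate stationary point of f.
full_grad = np.concatenate([grad_x(x, y), grad_y(x, y)])
assert np.isclose(np.linalg.norm(F(u)), np.linalg.norm(full_grad))
# u* = 0 is an (svi) solution here, since F vanishes there.
assert np.allclose(F(np.zeros(4)), 0.0)
```

For this bilinear f the operator F is monotone, so the example also belongs to the ρ = 0 class of Assumption 1.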
When ρ = 0, we recover the class of problems that have an (mvi) solution. This class contains all unconstrained variationally coherent problems studied in, e.g., Zhou et al. (2017); Mertikopoulos et al. (2019), which encompass all min-max problems with objectives that are bilinear, pseudo-convex-concave, quasiconvex-concave, and star-convex-concave.

When ρ > 0 and p = 2, Assumption 1 is implied by F being (−ρ/2)-comonotone (Bauschke et al., 2020) or (ρ/2)-cohypomonotone (Combettes and Pennanen, 2004), defined as follows:

    (∀u, v ∈ ℝ^d) : ⟨F(u) − F(v), u − v⟩ ≥ −(ρ/2)‖F(u) − F(v)‖₂².    (2.2)

In particular, Assumption 1 is equivalent to requiring that (2.2) be satisfied for general u and v = u*, where u* is a solution to (svi) (in which case F(u*) = 0). Note that Assumption 1 does not imply that a solution to (mvi) exists, unless ρ = 0. It is further important to note that cohypomonotone operators arise as inverses of operators that only need to be Lipschitz-continuous. (In fact, even a weaker property suffices; see Bauschke et al. (2020).) This is particularly interesting as, combined with our main result, it implies that we can efficiently find zeros of inverses of Lipschitz-continuous operators, as long as those inverses are sufficiently Lipschitz, even though finding zeros of Lipschitz-continuous operators is computationally intractable, in general, as we have discussed.

It is interesting to note that Assumption 1 does not imply that, in the min-max setting, f is convex-concave (or, more generally, that F is monotone), even in a neighborhood of an (svi) solution u* = [x*; y*], i.e., a stationary point of f. To see this, fix y = y* and consider f(x, y*) for x in a small neighborhood of x*. Using the fact that a continuously-differentiable function is well-approximated by its linear approximation within small neighborhoods, all that we are able to deduce from Assumption 1 is that

    f(x*, y*) − f(x, y*) ≈ ⟨[∇_x f(x, y*); −∇_y f(x, y*)], [x* − x; y* − y*]⟩ ≤ (ρ/2)‖∇f(x, y*)‖²_{p*}.

In particular, Assumption 1 does not preclude that f(x*, y*) is larger than f(x, y*); it only bounds how much larger it can be by a quantity proportional to ‖∇f(x, y*)‖²_{p*}. Compare this also to the Polyak-Łojasiewicz condition (see, e.g., Nouiehed et al. (2019); Yang et al. (2020)), which imposes the opposite inequality, namely, that f(x, y*) − f(x*, y*) is bounded above by a multiple of ‖∇f(x, y*)‖²_{p*}.

One way that a generic operator F may satisfy Assumption 1 is when there is a constant c > 0 such that for some u* ∈ U* we have

    (∀u ∈ ℝ^d) : ⟨F(u), u − u*⟩ ≥ −(c/2)‖u − u*‖²_p,    (2.3)

and when the operator F does not plateau or become too close to a linear operator around u*; namely, ‖F(u) − F(u*)‖_{p*} ≥ μ‖u − u*‖_p. (Note that (2.3) is always satisfied with c = 2L for L-Lipschitz operators, but we may need c to be smaller than 2L.) Then Assumption 1 would be satisfied with ρ = c/μ². For a min-max problem, assuming f is twice differentiable, this would mean that the lowest eigenvalue of the symmetric part of the Jacobian of [∇_x f(x, y); −∇_y f(x, y)] is bounded below by −c/2 in any direction u − u*, and the function f is sufficiently "curved" (not close to a linear or a constant function) around u* = [x*; y*].

Finally, we discuss a concrete min-max application wherein there are no (mvi) solutions, but there do exist (svi) solutions satisfying the weak (mvi) condition of Assumption 1. This application arises in the context of two-agent zero-sum reinforcement learning problems studied by many authors, including recently by Daskalakis et al. (2020). In Section 5.1 of that work, the authors consider a special case of the general two-agent zero-sum RL problem, called von Neumann's ratio game, for which they observe that, even on a random example, the (mvi) solution set is empty, yet the extragradient method still converges in practice (albeit at a slower rate). Interestingly, it is easy to construct examples of the von Neumann ratio game for which no (mvi) solution exists, but the weak (mvi) condition of Assumption 1 does hold, and yet the stronger cohypomonotonicity condition of (2.2) does not hold. Indeed, one such example is obtained for the game shown in Proposition 2 of their paper, setting s = 1/2 and ε = .49. Here (mvi) fails, the weak (mvi) condition of Assumption 1 is satisfied, and cohypomonotonicity fails to hold, e.g., for u = (x, y) = (0.1, 0.3) and v = (x′, y′) = (0.8, 0.3). To be clear, the von Neumann ratio game gives rise to a constrained min-max problem while our algorithm targets the unconstrained setting. While extending our result to the constrained setting remains open, our example here demonstrates that there is value in further studying the weak (mvi) condition of Assumption 1 in the constrained setting as well.

2.3 Useful Definitions and Facts

We now list some useful definitions and facts that will subsequently be used in our analysis. Additional background, including proofs of Propositions 2.3 and 2.4, is provided in Appendix A.

Definition 2.1 (Uniform convexity). Given p ≥ 2, a differentiable function φ : ℝ^d → ℝ ∪ {+∞} is said to
be p-uniformly convex w.r.t. ‖·‖ and with constant m if, ∀x, y ∈ ℝ^d,

    φ(y) ≥ φ(x) + ⟨∇φ(x), y − x⟩ + (m/p)‖y − x‖^p.

Observe that when p = 2, we recover the standard definition of strong convexity. Thus, uniform convexity is a generalization of strong convexity.

Definition 2.2 (Bregman divergence). Let φ : ℝ^d → ℝ be a differentiable function. Then its Bregman divergence between points x, y ∈ ℝ^d is defined by

    D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩.

It is immediate that the Bregman divergence of a convex function is non-negative.

Useful facts for ℓp setups. We now outline some useful auxiliary results used specifically in Section 4, where we study the case that p is not necessarily equal to 2.

Proposition 2.3. Given z, u ∈ ℝ^d, p ∈ (1, ∞) and q ∈ {p, 2}, let

    w = argmin_{v∈ℝ^d} { ⟨z, v⟩ + (1/q)‖u − v‖_p^q }.

Then, for p* = p/(p−1), q* = q/(q−1):

    w = u − ∇((1/q*)‖z‖_{p*}^{q*})  and  ‖w − u‖_p^q = ‖z‖_{p*}^{q*}.

Another useful result is the following proposition, which will allow us to relate the Lipschitzness of F to the uniform convexity of the prox mapping (1/q)‖·‖_p^q in the definition of the algorithm. The ideas used in the proof can be found in the proofs of (d'Aspremont et al., 2018, Lemma 5.7), (Nesterov, 2015, Lemma 2), and in (Devolder et al., 2014, Section 2.3).

Proposition 2.4. For any L > 0, κ > 0, q ≥ 2, t ≥ 0, and δ > 0,

    L t ≤ (κ/q) t^q + (δ/2) Λ*,  where  Λ* = (2(q−1)/(q δ)) · L^{q/(q−1)} / κ^{1/(q−1)}.

3 Generalized Extragradient for Problems with Weak MVI Solutions

In this section, we consider the setup with the Euclidean norm ‖·‖ = ‖·‖₂, i.e., p = 2. To address the class of problems with weak (mvi) solutions (see Assumption 1), we introduce the following generalization of the extragradient algorithm, to which we refer as Extragradient+ (eg+):

    ū_k = argmin_{u∈ℝ^d} { (a_k/γ)⟨F(u_k), u − u_k⟩ + (1/2)‖u − u_k‖² },
    u_{k+1} = argmin_{u∈ℝ^d} { a_k⟨F(ū_k), u − u_k⟩ + (1/2)‖u − u_k‖² },    (eg+)

where γ ∈ (0, 1] is a parameter of the algorithm and a_k > 0 is the step size. When γ = 1, we recover standard eg.

The analysis relies on the following merit (or gap) function:

    h_k := a_k (⟨F(ū_k), ū_k − u*⟩ + (ρ/2)‖F(ū_k)‖²),    (3.1)

for some u* ∈ U* for which F satisfies Assumption 1. Then Assumption 1 implies that h_k ≥ 0, ∀k.

The first (and main) step is to bound all h_k's above, as in the following lemma.

Lemma 3.1. Let F : ℝ^d → ℝ^d be an arbitrary L-Lipschitz operator that satisfies Assumption 1 for some u* ∈ U*. Given an arbitrary initial point u_0, let the sequences of points {u_i}_{i≥1}, {ū_i}_{i≥0} evolve according to (eg+) for some γ ∈ (0, 1] and positive step sizes {a_i}_{i≥0}. Then, for any δ > 0 and any k ≥ 0, we have:

    h_k ≤ (1/2)‖u* − u_k‖² − (1/2)‖u* − u_{k+1}‖²
          + (a_k/2)(ρ − a_k(1 − γ))‖F(ū_k)‖²
          + (a_k²/(2γ²))(δ a_k L − γ)‖F(u_k)‖²
          + (1/2)(a_k L/δ − γ)‖ū_k − u_{k+1}‖²,    (3.2)

where h_k is defined as in Eq. (3.1).

The proof is provided in Appendix B.

Using Lemma 3.1, we can now draw conclusions about the convergence of eg+ by choosing the parameters γ, δ, and the step sizes a_k to guarantee that h_k < (1/2)‖u* − u_k‖² − (1/2)‖u* − u_{k+1}‖² as long as ‖F(ū_k)‖ ≠ 0.

Theorem 3.2. Let F : ℝ^d → ℝ^d be an arbitrary L-Lipschitz operator that satisfies Assumption 1 for some u* ∈ U*. Given an arbitrary initial point u_0 ∈ ℝ^d, let the sequences of points {u_i}_{i≥1}, {ū_i}_{i≥0} evolve according to (eg+) for γ = 1/2 and a_k = 1/(2L). Then:

(i) all accumulation points of {ū_k}_{k≥0} are in U*.

(ii) for all k ≥ 1:

    (1/(k+1)) Σ_{i=0}^{k} ‖F(ū_i)‖² ≤ 2L‖u_0 − u*‖² / ((k+1)(1/(4L) − ρ)).
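The (eg+) update with the parameter choices of Theorem 3.2 (γ = 1/2, a_k = 1/(2L)) can be sketched in a few lines. The bilinear test operator below is our own illustration (it is monotone, hence satisfies Assumption 1 with ρ = 0); the matrix A, the initial point, and the iteration budget are arbitrary choices, not values from the text:

```python
import numpy as np

# Bilinear operator F([x; y]) = [A y; -A^T x]; monotone, with u* = 0 in U*.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
M = np.block([[np.zeros((2, 2)), A], [-A.T, np.zeros((2, 2))]])

def F(u):
    return M @ u

L = np.linalg.norm(M, 2)       # Lipschitz constant of the linear operator F
gamma, a = 0.5, 1.0 / (2 * L)  # parameter choices of Theorem 3.2

u0 = np.array([1.0, -1.0, 0.5, 2.0])
u = u0.copy()
grad_norms = []
for k in range(2000):
    u_bar = u - (a / gamma) * F(u)   # aggressive extrapolation step of (eg+)
    u = u - a * F(u_bar)             # update step of (eg+)
    grad_norms.append(np.linalg.norm(F(u_bar)))

# Theorem 3.2 (ii) with rho = 0 gives, after k iterations,
#   min_i ||F(u_bar_i)||^2 <= 8 L^2 ||u_0 - u*||^2 / k.
k = len(grad_norms)
bound = 8 * L**2 * np.linalg.norm(u0)**2 / k
assert min(grad_norms)**2 <= bound
assert grad_norms[-1] < grad_norms[0]  # the residual indeed shrinks
```

With γ = 1/2 the extrapolation step size a_k/γ = 1/L is twice the update step size, which is the "aggressive interpolation" that distinguishes eg+ from standard eg.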
In particular, we have that

    min_{0≤i≤k} ‖F(ū_i)‖² ≤ 2L‖u_0 − u*‖² / ((k+1)(1/(4L) − ρ))

and

    E_{i∼Unif{0,...,k}} [‖F(ū_i)‖²] ≤ 2L‖u_0 − u*‖² / ((k+1)(1/(4L) − ρ)),

where i ∼ Unif{0,...,k} denotes an index i chosen uniformly at random from the set {0,...,k}.

Proof. Applying Lemma 3.1 with the choice of a_k and γ from the theorem statement and with δ = 1, we get

    h_k ≤ (1/2)‖u* − u_k‖² − (1/2)‖u* − u_{k+1}‖² + (1/(4L))(ρ − 1/(4L))‖F(ū_k)‖².

By Assumption 1, ρ < 1/(4L), and, thus, the constant multiplying ‖F(ū_k)‖² is strictly negative. As h_k ≥ 0 (by Assumption 1), we can conclude that

    (1/2)‖u* − u_{k+1}‖² − (1/2)‖u* − u_k‖² ≤ −(1/(4L))(1/(4L) − ρ)‖F(ū_k)‖² ≤ 0.    (3.3)

As (1/(4L))(1/(4L) − ρ) > 0, Eq. (3.3) implies that ‖F(ū_k)‖ converges to zero as k → ∞. Further, as ū_k − u_k = −(a_k/γ)F(u_k), using the triangle inequality and F(u*) = 0:

    ‖ū_k − u*‖ ≤ ‖u_k − u*‖ + (a_k/γ)‖F(u_k) − F(u*)‖ ≤ (1 + L a_k/γ)‖u_k − u*‖ = 2‖u_k − u*‖,    (3.4)

where we have used that F is L-Lipschitz. Now, as ‖u_k − u*‖ is bounded (by ‖u_0 − u*‖, from Eq. (3.3)), it follows that the sequence {ū_k} is bounded as well, and thus has a converging subsequence. Let {ū_{k_i}} be any converging subsequence of {ū_k} and let ū* be its corresponding accumulation point. Then, as ‖F(ū_k)‖ converges to zero as k → ∞, it follows that ‖F(ū_{k_i})‖ converges to zero as i → ∞, and so it must be ū* ∈ U*.

For Part (ii), telescoping Eq. (3.3), we get:

    (1/(4L))(1/(4L) − ρ) Σ_{i=0}^{k} ‖F(ū_i)‖² ≤ (1/2)‖u_0 − u*‖² − (1/2)‖u_{k+1} − u*‖² ≤ (1/2)‖u_0 − u*‖².

Rearranging the last inequality:

    (1/(k+1)) Σ_{i=0}^{k} ‖F(ū_i)‖² ≤ 2L‖u_0 − u*‖² / ((k+1)(1/(4L) − ρ)).

It remains to observe that E_{i∼Unif{0,...,k}}[‖F(ū_i)‖²] = (1/(k+1)) Σ_{i=0}^{k} ‖F(ū_i)‖² and that (1/(k+1)) Σ_{i=0}^{k} ‖F(ū_i)‖² ≥ min_{0≤i≤k} ‖F(ū_i)‖².

Remark 3.3. Due to Eq. (3.4), we have that all the iterates of eg+ with the parameter setting as in Theorem 3.2 remain in the ball centered at u* and of radius at most 2‖u_0 − u*‖. Thus, Assumption 1 does not need to hold globally for the result to apply; it suffices that it only applies locally to points from the ball around u* with radius 2‖u_0 − u*‖.

Remark 3.4. It is possible to obtain similar convergence results as those of Theorem 3.2 under different parameter choices. In particular, for γ ∈ (0, 1], it suffices that a_k ≤ γ/L and ρ < a_k(1 − γ). We settled on the choice made in Theorem 3.2 as it is simple and requires tuning only one parameter, L.

Remark 3.5. Note that, in fact, we did not need to assume that u* from Assumption 1 is from U*; it could have been any point from ℝ^d for which Assumption 1 is satisfied. All that would change in the proof of Theorem 3.2 is that in Eq. (3.4), using ‖F(u_k)‖ ≤ ‖F(u_k) − F(u*)‖ + ‖F(u*)‖ (by the triangle inequality) we would have 2‖u_k − u*‖ + (1/L)‖F(u*)‖ on the right-hand side. Since u* ∈ ℝ^d and F is Lipschitz-continuous, if F is bounded at any point u ∈ ℝ^d, ‖F(u*)‖ is bounded as well. Thus, we can still conclude that ‖ū_k − u*‖ is bounded and proceed with the rest of the proof. An interesting consequence of this observation and the proof of Theorem 3.2 is that Assumption 1 guarantees the existence of an (svi) solution.

4 Extensions: ℓp Norms and Stochastic Setups

In this section, we show how to extend the results of Section 3 to non-Euclidean, ℓp-normed setups (for ρ = 0) and stochastic evaluations of F. In particular, we let ‖·‖ = ‖·‖_p for p ∈ (1, ∞)[3] and p* = p/(p−1). Further, we let F̃ denote the stochastic estimate of F that at iteration k satisfies:

    E[F̃(ū_k) | F̄_k] = F(ū_k),          E[‖F̃(ū_k) − F(ū_k)‖²_{p*} | F̄_k] ≤ σ̄_k²,
    E[F̃(u_{k+1}) | F_{k+1}] = F(u_{k+1}),  E[‖F̃(u_{k+1}) − F(u_{k+1})‖²_{p*} | F_{k+1}] ≤ σ_{k+1}²,    (4.1)

[3] Note that the norms ‖·‖₁ and ‖·‖_∞ are within a constant factor of the ℓp-norm for p = 1 + 1/log(d) and p = log(d), respectively, and so taking p ∈ (1, ∞) is w.l.o.g.: for any p < 1 + 1/log(d) or p > log(d), we can run the algorithm with p = 1 + 1/log d or p = log d, losing at most a constant factor in the convergence bound.
Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

where $\mathcal{F}_k$ and $\bar{\mathcal{F}}_k$ denote the natural filtrations, including all the randomness up to the construction of the points $u_k$ and $\bar u_k$, respectively, and $\bar\sigma_k^2$, $\sigma_{k+1}^2$ are the variance constants. Observe that $\mathcal{F}_k \subseteq \bar{\mathcal{F}}_k$ and $\bar{\mathcal{F}}_k \subseteq \mathcal{F}_{k+1}$. To simplify the notation, we denote:
\[
\bar\eta_k = \tilde F(\bar u_k) - F(\bar u_k), \qquad \eta_{k+1} = \tilde F(u_{k+1}) - F(u_{k+1}). \tag{4.2}
\]
The variant of the method we consider here is stated as follows:
\[
\begin{aligned}
\bar u_k &= \operatorname*{argmin}_{u \in \mathbb{R}^d}\Big\{\frac{a_k}{\gamma}\big\langle \tilde F(u_k), u - u_k \big\rangle + \frac{1}{q}\|u - u_k\|_p^q\Big\},\\
u_{k+1} &= \operatorname*{argmin}_{u \in \mathbb{R}^d}\Big\{a_k\big\langle \tilde F(\bar u_k), u - u_k \big\rangle + \psi_p(u, u_k)\Big\},
\end{aligned}
\tag{egp+}
\]
where
\[
q = \begin{cases} 2, & \text{if } p \in (1, 2],\\ p, & \text{if } p \in (2, \infty), \end{cases} \tag{4.3}
\]
and
\[
\psi_p(u, u_k) = \begin{cases} D_{\frac{1}{2}\|\cdot - u_0\|_p^2}(u, u_k), & \text{if } p \in (1, 2],\\ \frac{1}{p}\|u - u_k\|_p^p, & \text{if } p \in (2, \infty). \end{cases} \tag{4.4}
\]
Notice that for $p = 2$, egp+ is equivalent to eg+. Thus, egp+ generalizes eg+ to arbitrary $\ell_p$ norms. However, egp+ is different from the standard Extragradient or Mirror-Prox methods, for two reasons. The first is that, as is the case for eg+, the step sizes that determine $\bar u_k$ and $u_{k+1}$ (i.e., $a_k/\gamma$ and $a_k$) are not the same in general, as we could (and will) choose $\gamma \neq 1$. The second is that, unless $p = q = 2$, the function $\frac{1}{q}\|u - u_k\|_p^q$ in the definition of the algorithm is not a Bregman divergence between the points $u$ and $u_k$ of any function. In particular, when $p > 2$, $\frac{1}{q}\|u - u_k\|_p^q$ is not strongly convex. Instead, it is $p$-uniformly convex with constant 1. Additionally, no function whose gap between the maximum and the minimum value is bounded by a constant on any ball of constant radius can have a constant of strong convexity w.r.t. $\|\cdot\|_p$ larger than $O(1/d^{1-2/p})$ (d'Aspremont et al., 2018). When $p \in (1, 2]$, $\frac{1}{q}\|u - u_k\|_p^q$ is strongly convex with constant $p - 1$ (Nemirovski, 2004). We let $m_p$ denote the constant of strong/uniform convexity of $\frac{1}{q}\|u - u_k\|_p^q$, that is:
\[
m_p = \min\{p - 1, 1\}. \tag{4.5}
\]
Observe that
\[
\psi_p(u, u_k) \ge \frac{m_p}{q}\|u - u_k\|_p^q. \tag{4.6}
\]
This is immediate for $p > 2$, by the definition of $\psi_p$ and using that $q = p$ and $m_p = 1$ when $p > 2$. For $p \in (1, 2]$, we have that $q = 2$, and Eq. (4.6) follows by strong convexity of $\frac{1}{2}\|\cdot\|_p^2$.

As in the case of Euclidean norms, the analysis relies on the following merit function:
\[
h_k := a_k\Big(\big\langle F(\bar u_k), \bar u_k - u^*\big\rangle + \frac{\gamma}{2}\|F(\bar u_k)\|_{p^*}^2\Big). \tag{4.7}
\]
Moreover, as before, Assumption 1 guarantees that $h_k \ge 0$, $\forall k$. Even though we only handle the case $\rho = 0$ for $p \neq 2$, the analysis is significantly more challenging than in the $\ell_2$ case, and, due to space constraints, we only state the main results here, while all the technical details are provided in Appendix C.

Deterministic oracle access. The main result is summarized in the following theorem.

Theorem 4.1. Let $p > 1$ and let $F : \mathbb{R}^d \to \mathbb{R}^d$ be an arbitrary $L$-Lipschitz operator w.r.t. $\|\cdot\|_p$ that satisfies Assumption 1 with $\rho = 0$ for some $u^* \in U^*$. Assume that we are given oracle access to the exact evaluations of $F$, i.e., $\bar\eta_i = \eta_i = 0$, $\forall i$. Given an arbitrary initial point $u_0 \in \mathbb{R}^d$, let the sequences of points $\{u_i\}_{i \ge 1}$, $\{\bar u_i\}_{i \ge 0}$ evolve according to (egp+) for $\gamma \in (0, 1]$ and the step sizes $\{a_i\}_{i \ge 0}$ specified below. Then, we have:

(i) Let $p \in (1, 2]$. If $\gamma = m_p = p - 1$ and $a_k = \frac{m_p^{3/2}}{2L}$, then all accumulation points of $\{u_k\}_{k \ge 0}$ are in $U^*$, and, furthermore, $\forall k \ge 0$:
\[
\frac{1}{k+1}\sum_{i=0}^{k}\|F(u_i)\|_{p^*}^2 \le \frac{16 L^2 \psi_p(u^*, u_0)}{m_p^2(k+1)} = O\Big(\frac{L^2\|u^* - u_0\|_p^2}{(p-1)^2(k+1)}\Big).
\]
In particular, within $k = O\big(\frac{L^2\|u^* - u_0\|_p^2}{(p-1)^2\epsilon^2}\big)$ iterations, egp+ can output a point $u$ with $\|F(u)\|_{p^*} \le \epsilon$.

(ii) Let $p \in (2, \infty)$. If $\gamma = \frac{1}{2}$, $\lambda_k = \lambda > 0$, $\Lambda = \frac{q-2}{q}\,2^{q/2-1}\lambda^2\big(\frac{L}{\lambda}\big)^{q/2}$, and $a_k = \frac{\lambda}{2\Lambda} = a$, then, $\forall k \ge 0$:
\[
\frac{1}{k+1}\sum_{i=0}^{k}\|F(\bar u_i)\|_{p^*}^{p^*} \le \frac{2\|u^* - u_0\|_p^p}{a^{p^*}(k+1)} + \frac{2^p \lambda}{a^{p^*-1}}.
\]
In particular, for any $\epsilon > 0$, there is a choice of $\lambda = C_p\frac{\epsilon^2}{L}$, where $C_p$ is a constant that only depends on $p$, such that egp+ can output a point $u$ with $\|F(u)\|_{p^*} \le \epsilon$ in at most
\[
k = O_p\Big(\Big(\frac{L\|u^* - u_0\|_p}{\epsilon}\Big)^p\Big)
\]
iterations. Here, the $O_p$ notation hides constants that only depend on $p$.
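To make the first (egp+) subproblem concrete: by Fenchel conjugacy, the minimizer of $\langle g, d\rangle + \frac{1}{q}\|d\|_p^q$ over $d$ is $d = -\nabla\big(\frac{1}{q^*}\|\cdot\|_{p^*}^{q^*}\big)(g)$ with $q^* = \frac{q}{q-1}$, which is available in closed form. The following is a minimal numpy sketch (the function names are ours, and the check only verifies the first-order optimality condition numerically):

```python
import numpy as np

def lp_prox_step(g, p, q):
    """Closed-form minimizer of <g, d> + (1/q)*||d||_p^q over d in R^n.

    By Fenchel conjugacy, d = -grad of (1/q*)*||.||_{p*}^{q*} at g,
    where p* = p/(p-1) and q* = q/(q-1).
    """
    ps, qs = p / (p - 1.0), q / (q - 1.0)
    gn = np.linalg.norm(g, ord=ps)
    if gn == 0.0:
        return np.zeros_like(g)
    return -(gn ** (qs - ps)) * np.abs(g) ** (ps - 1.0) * np.sign(g)

def grad_lp_q(d, p, q):
    """Gradient of (1/q)*||d||_p^q, used only to verify optimality."""
    dn = np.linalg.norm(d, ord=p)
    return (dn ** (q - p)) * np.abs(d) ** (p - 1.0) * np.sign(d)

rng = np.random.default_rng(0)
g = rng.standard_normal(5)
# q = 2 for p in (1, 2], and q = p for p in (2, infty), as in Eq. (4.3)
for p, q in [(1.5, 2.0), (3.0, 3.0)]:
    d = lp_prox_step(g, p, q)
    # first-order optimality: grad of (1/q)*||d||_p^q at d equals -g
    assert np.allclose(grad_lp_q(d, p, q), -g, atol=1e-8)
```

Note that the second (egp+) step for $p \in (2, \infty)$, where $\psi_p(u, u_k) = \frac{1}{p}\|u - u_k\|_p^p$, reduces to the same computation with $q = p$ after rescaling the linear term by $a_k$.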
Jelena Diakonikolas, Constantinos Daskalakis, Michael I. Jordan
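As a quick numerical illustration of the $p = 2$ case, where egp+ reduces to eg+, the sketch below runs the two-step update on the toy bilinear game $\min_x \max_y xy$, i.e., $F(x, y) = (y, -x)$; the operator and the constants $\gamma = 1/2$, $a = 1/(2L)$ are our illustrative choices for this toy run, not values prescribed by the theorems:

```python
import numpy as np

# Toy monotone operator from the bilinear game min_x max_y x*y:
# F(x, y) = (y, -x), written as F(u) = A u. Its Lipschitz constant is L = 1.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
L = 1.0
F = lambda u: A @ u

gamma, a = 0.5, 1.0 / (2.0 * L)      # illustrative step-size choices

u = np.array([1.0, 1.0])
res = []
for _ in range(200):
    u_bar = u - (a / gamma) * F(u)   # extrapolation step of eg+
    u = u - a * F(u_bar)             # update step of eg+
    res.append(np.linalg.norm(F(u_bar)))

# The stationarity measure ||F(u_bar_k)|| (as in Theorem 3.2) decays to zero.
assert res[-1] < 1e-9 * res[0]
```

On this instance the iteration contracts geometrically, so the averaged-norm guarantee of Theorem 3.2(ii) is easily met; on harder instances in the class only the averaged bound is guaranteed.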

Remark 4.2. There are significant technical obstacles to generalizing the results of Theorem 4.1 to settings with $\rho > 0$. In particular, when $p \in (1, 2)$, the proof fails because we take $\psi_p(u^*, u)$ to be the Bregman divergence of $\|\cdot - u_0\|_p^2$, and relating $\|\bar u_k - u_k\|_p$ to $\|F(u_k)\|_{p^*}$ would require $\|\cdot\|_p^2$ to be smooth, which is not true. If we had, instead, used $\|u^* - u\|_p^2$ in place of $\psi_p(u^*, u)$, we would have incurred $\frac{1}{2}\|u^* - u_k\|_p^2 - \frac{m_p}{2}\|u^* - u_{k+1}\|_p^2$ in the upper bound on $h_k$, which would not telescope, as in this case $m_p < 1$. In the case of $p > 2$, the challenges come from a delicate relationship between the step sizes $a_k$ and the error terms $\lambda_k$. It turns out that it is possible to guarantee local convergence (in the region where $\|F(\bar u_k)\|^2$ is bounded by a constant less than 1) with $\rho > 0$, but $\rho$ would need to scale with $\operatorname{poly}(\epsilon)$ in this case. As this is a weak result whose usefulness is unclear, we have omitted it.

Stochastic oracle access. To obtain results for the stochastic setups, we mainly need to bound the stochastic error terms that decompose from the analysis of the deterministic setups, as in the following lemma.

Lemma 4.3. Let $E^s = -a_k\langle\bar\eta_k, \bar u_k - u^*\rangle - a_k\langle\eta_k - \bar\eta_k, \bar u_k - u_{k+1}\rangle$, where $\eta_k$ and $\bar\eta_k$ are defined as in Eq. (4.2) and all the assumptions of Theorem 4.4 below apply. Then, for $q$ defined by Eq. (4.3), $q^* = \frac{q}{q-1}$, and any $\tau > 0$:
\[
\mathbb{E}[E^s] \le \frac{2^{q^*/2}\,a_k^{q^*}\,(\sigma_k^2 + \bar\sigma_k^2)^{q^*/2}}{q^*\,\tau^{q^*/q}} + \mathbb{E}\Big[\frac{\tau}{q}\|\bar u_k - u_{k+1}\|_p^q\Big],
\]
where the expectation is w.r.t. all the randomness in the algorithm.

Theorem 4.4. Let $p > 1$ and let $F : \mathbb{R}^d \to \mathbb{R}^d$ be an arbitrary $L$-Lipschitz operator w.r.t. $\|\cdot\|_p$ that satisfies Assumption 1 for some $u^* \in U^*$. Given an arbitrary initial point $u_0 \in \mathbb{R}^d$, let the sequences of points $\{u_i\}_{i \ge 1}$, $\{\bar u_i\}_{i \ge 0}$ evolve according to (egp+) for some $\gamma \in (0, 1]$ and positive step sizes $\{a_i\}_{i \ge 0}$. Let the variance of a single query to the stochastic oracle $\tilde F$ be bounded by some $\sigma^2 < \infty$.

(i) Let $p = 2$ and $\rho \in [0, \bar\rho)$, where $\bar\rho = \frac{1}{4\sqrt{2}L}$. If $\gamma = \frac{1}{2}$ and $a_k = \frac{1}{2\sqrt{2}L}$, then egp+ can output a point $u$ with $\mathbb{E}[\|\tilde F(u)\|_2] \le \epsilon$ with at most
\[
O\Big(\frac{L\|u^* - u_0\|_2^2}{\epsilon^2(\bar\rho - \rho)}\Big(1 + \frac{\sigma^2}{L\epsilon^2(\bar\rho - \rho)}\Big)\Big)
\]
oracle queries to $\tilde F$.

(ii) Let $p \in (1, 2]$ and $\rho = 0$. If $a_k = \frac{m_p^{3/2}}{2L}$ and $\gamma = m_p$, then egp+ can output a point $u$ with $\mathbb{E}[\|\tilde F(u)\|_{p^*}] \le \epsilon$ with at most
\[
O\Big(\frac{L^2\|u^* - u_0\|_p^2}{m_p^2\epsilon^2}\Big(1 + \frac{\sigma^2}{m_p\epsilon^2}\Big)\Big)
\]
oracle queries to $\tilde F$, where $m_p = p - 1$.

(iii) Let $p > 2$ and $\rho = 0$. If $\gamma = \frac{1}{2}$ and $a_k = a = \frac{\lambda}{4\Lambda}$, then egp+ can output a point $u$ with $\mathbb{E}[\|\tilde F(u)\|_{p^*}] \le \epsilon$ with at most
\[
O_p\Big(\Big(\frac{L\|u^* - u_0\|_p}{\epsilon}\Big)^p\Big(1 + \Big(\frac{\sigma}{\epsilon}\Big)^{p^*}\Big)\Big)
\]
oracle queries to $\tilde F$, where $p^* = \frac{p}{p-1}$.

5 Discussion

We introduced a new class of structured nonconvex-nonconcave min-max optimization problems and proposed a new generalization of the extragradient method that provably converges to a stationary point in Euclidean setups. Our algorithmic results guarantee that problems in this class contain at least one stationary point (an (svi) solution; see Remark 3.5). The class we introduced generalizes other important classes of structured nonconvex-nonconcave problems, such as those in which an (mvi) solution exists. We further generalized our results to stochastic setups and to $\ell_p$-normed setups in which an (mvi) solution exists.

An interesting direction for future research is to understand to what extent we can further relax the assumptions about the structure of nonconvex-nonconcave problems, while maintaining computational feasibility of algorithms that can address them.

Acknowledgements

We wish to thank Steve Wright for a useful discussion regarding convergence of sequences. We also wish to thank the Simons Institute for the Theory of Computing, where some of this work was conducted.

JD was supported by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation and by the NSF Award CCF-2007757. CD was supported by NSF Awards IIS-1741137, CCF-1617730, and CCF-1901292, by a Simons Investigator Award, by the Simons Collaboration on the Theory of Algorithmic Fairness, and by the DOE PhILMs project (No. DE-AC05-76RL01830). MJ was supported in part by the Mathematical Data Science program of the Office of Naval Research under grant number N00014-18-1-2764.
References

Abernethy, J., Lai, K. A., and Wibisono, A. (2019). Last-iterate convergence rates for min-max optimization. arXiv preprint arXiv:1906.02027.

Adolphs, L., Daneshmand, H., Lucchi, A., and Hofmann, T. (2019). Local saddle point optimization: A curvature exploitation approach. In Proc. AISTATS'19.

Alkousa, M., Dvinskikh, D., Stonyakin, F., and Gasnikov, A. (2019). Accelerated methods for composite non-bilinear saddle point problem. arXiv preprint arXiv:1906.03620.

Azizian, W., Mitliagkas, I., Lacoste-Julien, S., and Gidel, G. (2020). A tight and unified analysis of extragradient for a whole spectrum of differentiable games. In Proc. AISTATS'20.

Bauschke, H. H., Moursi, W. M., and Wang, X. (2020). Generalized monotone operators and their averaged resolvents. Mathematical Programming, pages 1–20.

Bertsekas, D. P. (1971). Control of uncertain systems with a set-membership description of the uncertainty. PhD thesis, MIT.

Bertsekas, D. P., Nedic, A., and Ozdaglar, A. E. (2003). Convex Analysis and Optimization. Athena Scientific.

Borwein, J., Guirao, A., Hájek, P., and Vanderwerff, J. (2009). Uniformly convex functions on Banach spaces. Proceedings of the American Mathematical Society, 137(3):1081–1091.

Borwein, J. M. and Zhu, Q. J. (2004). Techniques of Variational Analysis. Springer.

Boyd, S., Boyd, S. P., and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Combettes, P. L. and Pennanen, T. (2004). Proximal methods for cohypomonotone operators. SIAM Journal on Control and Optimization, 43(2):731–742.

Dang, C. D. and Lan, G. (2015). On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators. Computational Optimization and Applications, 60(2):277–310.

Daskalakis, C., Foster, D. J., and Golowich, N. (2020). Independent policy gradient methods for competitive reinforcement learning. In Proc. NeurIPS'20.

Daskalakis, C., Ilyas, A., Syrgkanis, V., and Zeng, H. (2018). Training GANs with optimism. In Proc. ICLR'18.

Daskalakis, C. and Panageas, I. (2018). The limit points of (optimistic) gradient descent in min-max optimization. In Proc. NeurIPS'18, pages 9236–9246.

Daskalakis, C. and Panageas, I. (2019). Last-iterate convergence: Zero-sum games and constrained min-max optimization. In Proc. ITCS'19.

Daskalakis, C., Skoulakis, S., and Zampetakis, M. (2021). The complexity of constrained min-max optimization. In Proc. STOC'21.

d'Aspremont, A., Guzman, C., and Jaggi, M. (2018). Optimal affine-invariant smooth minimization algorithms. SIAM Journal on Optimization, 28(3):2384–2405.

Devolder, O., Glineur, F., and Nesterov, Y. (2014). First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146(1-2):37–75.

Diakonikolas, J. (2020). Halpern iteration for near-optimal and parameter-free monotone inclusion and strong solutions to variational inequalities. In Proc. COLT'20.

Gidel, G., Hemmat, R. A., Pezeshki, M., Le Priol, R., Huang, G., Lacoste-Julien, S., and Mitliagkas, I. (2019). Negative momentum for improved game dynamics. In Proc. AISTATS'19.

Golowich, N., Pattathil, S., Daskalakis, C., and Ozdaglar, A. E. (2020). Last iterate is slower than averaged iterate in smooth convex-concave saddle point problems. In Proc. COLT'20.

Hamedani, E. Y. and Aybat, N. S. (2018). A primal-dual algorithm for general convex-concave saddle point problems. arXiv preprint arXiv:1803.01401.

Hirsch, M. D., Papadimitriou, C. H., and Vavasis, S. A. (1989). Exponential lower bounds for finding Brouwer fix points. Journal of Complexity, 5(4):379–416.

Jin, C., Netrapalli, P., and Jordan, M. I. (2020). What is local optimality in nonconvex-nonconcave minimax optimization? In Proc. ICML'20.

Kinderlehrer, D. and Stampacchia, G. (2000). An Introduction to Variational Inequalities and Their Applications. SIAM.

Kong, W. and Monteiro, R. D. (2019). An accelerated inexact proximal point method for solving nonconvex-concave min-max problems. arXiv preprint arXiv:1905.13433.
Korpelevich, G. (1976). The extragradient method for finding saddle points and other problems. Matecon, 12:747–756.

Liang, T. and Stokes, J. (2019). Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In Proc. AISTATS'19.

Lin, Q., Liu, M., Rafique, H., and Yang, T. (2018). Solving weakly-convex-weakly-concave saddle-point problems as successive strongly monotone variational inequalities. arXiv preprint arXiv:1810.10207.

Lin, T., Jin, C., and Jordan, M. (2020a). Near-optimal algorithms for minimax optimization. In Proc. COLT'20.

Lin, T., Jin, C., and Jordan, M. I. (2020b). On gradient descent ascent for nonconvex-concave minimax problems. In Proc. ICML'20.

Liu, M., Mroueh, Y., Ross, J., Zhang, W., Cui, X., Das, P., and Yang, T. (2020). Towards better understanding of adaptive gradient algorithms in generative adversarial nets. In Proc. ICLR'20.

Lu, S., Tsaknakis, I., Hong, M., and Chen, Y. (2020). Hybrid block successive approximation for one-sided non-convex min-max problems: Algorithms and applications. IEEE Transactions on Signal Processing, 68:3676–3691.

Malitsky, Y. (2019). Golden ratio algorithms for variational inequalities. Mathematical Programming.

Mangoubi, O., Sachdeva, S., and Vishnoi, N. K. (2020). A provably convergent and practical algorithm for min-max optimization with applications to GANs. arXiv preprint arXiv:2006.12376.

Mangoubi, O. and Vishnoi, N. K. (2021). Greedy adversarial equilibrium: An efficient alternative to nonconvex-nonconcave min-max optimization. In Proc. ICLR'21.

Mazumdar, E., Ratliff, L. J., and Sastry, S. S. (2020). On gradient-based learning in continuous games. SIAM Journal on Mathematics of Data Science, 2(1):103–131.

Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C.-S., Chandrasekhar, V., and Piliouras, G. (2019). Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile. In Proc. ICLR'19.

Mertikopoulos, P., Papadimitriou, C. H., and Piliouras, G. (2018). Cycles in adversarial regularized learning. In Proc. ACM-SIAM SODA'18.

Mokhtari, A., Ozdaglar, A., and Pattathil, S. (2020). A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. In Proc. AISTATS'20.

Nemirovski, A. (2004). Regular Banach spaces and large deviations of random sums. Unpublished, e-print: https://www2.isye.gatech.edu/~nemirovs/LargeDev2004.pdf.

Nesterov, Y. (2015). Universal gradient methods for convex optimization problems. Mathematical Programming, 152(1-2):381–404.

Nouiehed, M., Sanjabi, M., Huang, T., Lee, J. D., and Razaviyayn, M. (2019). Solving a class of non-convex min-max games using iterative first order methods. In Proc. NeurIPS'19.

Papadimitriou, C. H. (1994). On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3):498–532.

Song, C., Zhou, Y., Zhou, Z., Jiang, Y., and Ma, Y. (2020). Optimistic dual extrapolation for coherent non-monotone variational inequalities. In Proc. NeurIPS'20.

Thekumparampil, K. K., Jain, P., Netrapalli, P., and Oh, S. (2019). Efficient algorithms for smooth minimax optimization. In Proc. NeurIPS'19.

Wang, Y., Zhang, G., and Ba, J. (2019). On solving minimax optimization locally: A follow-the-ridge approach. In Proc. ICLR'19.

Yang, J., Kiyavash, N., and He, N. (2020). Global convergence and variance-reduced optimization for a class of nonconvex-nonconcave minimax problems. In Proc. NeurIPS'20.

Zhao, R. (2019). Optimal algorithms for stochastic three-composite convex-concave saddle point problems. arXiv preprint arXiv:1903.01687.

Zhou, Z., Mertikopoulos, P., Bambos, N., Boyd, S., and Glynn, P. W. (2017). Stochastic mirror descent in variationally coherent optimization problems. In Proc. NIPS'17.
