
Saddle Point Optimization with Approximate Minimization Oracle and its Application to Robust Berthing Control

YOUHEI AKIMOTO, Faculty of Engineering, Information and Systems, University of Tsukuba & RIKEN Center for Advanced Intelligence Project, Japan
YOSHIKI MIYAUCHI and ATSUO MAKI, Department of Naval Architecture and Ocean Engineering, Graduate School of Engineering, Osaka University, Japan
We propose an approach to saddle point optimization relying only on oracles that solve minimization problems
approximately. We analyze its convergence property on a strongly convex–concave problem and show its
linear convergence toward the global min–max saddle point. Based on the convergence analysis, we develop a
heuristic approach to adapt the learning rate. An implementation of the developed approach using the (1+1)-
CMA-ES as the minimization oracle, namely Adversarial-CMA-ES, is shown to outperform several existing
approaches on test problems. Numerical evaluation confirms the tightness of the theoretical convergence
rate bound as well as the efficiency of the learning rate adaptation mechanism. As an example of real-world
problems, the suggested optimization method is applied to automatic berthing control problems under model
uncertainties, showing its usefulness in obtaining solutions robust to uncertainty.
CCS Concepts: • Mathematics of computing → Continuous optimization; • Theory of computation
→ Convergence and learning in games; Theory of randomized search heuristics.
Additional Key Words and Phrases: Minimax Optimization, Saddle Point Optimization, Robust Optimization,
Robust Control, Reliability, Zero-order Approach, Convergence Analysis, Automatic Berthing
ACM Reference Format:
Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki. 2022. Saddle Point Optimization with Approximate
Minimization Oracle and its Application to Robust Berthing Control. ACM Trans. Evol. Learn. 1, 1, Article 1
(January 2022), 32 pages. https://doi.org/10.1145/3510425

1 INTRODUCTION
Simulation-based optimization has recently received increasing attention from researchers. Here, the
objective function ℎ : X → R, where X ⊆ R𝑚 , is not explicitly given, but its value for each 𝑥 ∈ X
can be computed through computational simulation. Solvers for simulation-based optimization
problems have been widely developed. While some are domain-specific, others are general-purpose
solvers. For a case where simulation-based optimization is required, we first need to design a
simulator that models reality, for example, a physical equation, and computes the objective function
value for each solution. Then, we apply a numerical solver to solve argmin𝑥 ∈X ℎ(𝑥). However, owing
to modeling errors and uncertainties, the optimal solution to argmin𝑥 ∈X ℎ(𝑥) computed through a
Authors’ addresses: Youhei Akimoto, [email protected], Faculty of Engineering, Information and Systems, University
of Tsukuba & RIKEN Center for Advanced Intelligence Project, 1-1-1 Tennodai, Tsukuba, Ibaraki, Japan, 305-8573; Yoshiki
Miyauchi, [email protected]; Atsuo Maki, [email protected], Department of Naval
Architecture and Ocean Engineering, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka,
565-0971, Japan.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2022 Association for Computing Machinery.
2688-3007/2022/1-ART1 $15.00
https://doi.org/10.1145/3510425

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:2 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

simulator is not necessarily optimal in real environments in which the obtained solution is used.
This issue threatens the reliability of solutions obtained through simulation-based optimization.
An approach to obtain a solution that is robust against modeling errors and uncertainty is to
formulate the problem as a min–max optimization
\[
\min_{x \in X} \max_{y \in Y} f(x, y), \tag{1}
\]

where 𝑦 ∈ Y represents the model parameters and the uncertain parameters. In the following,
𝑦 is referred to as the uncertainty parameter. Assume that the real environment is represented
by 𝑦real ∈ Y. The original objective ℎ(𝑥) is equivalent to 𝑓 (𝑥, 𝑦est ) with an estimated parameter
𝑦est ∈ Y. Then, the solution 𝑥_{𝑦est} = argmin_{𝑥∈X} 𝑓(𝑥, 𝑦est) obtained via simulation does not guarantee good performance in the real environment. That is, 𝑓(𝑥_{𝑦est}, 𝑦real) may be arbitrarily greater than 𝑓(𝑥_{𝑦est}, 𝑦est). In contrast, the solution 𝑥_Y = argmin_{𝑥∈X} max_{𝑦∈Y} 𝑓(𝑥, 𝑦) to (1) guarantees that 𝑓(𝑥_Y, 𝑦real) ⩽ max_{𝑦∈Y} 𝑓(𝑥_Y, 𝑦). That is, by minimizing the worst-case objective value, one can guarantee performance in the real environment provided that 𝑦real ∈ Y.
Robust Berthing Control. As an important real-world application of the min–max optimization (1),
we consider an automatic ship berthing task [Maki et al. 2021, 2020], which can be formulated as
an optimization of the feedback controller of a ship. Currently, the domestic shipping industry in
Japan is facing a shortage of experienced on-board officers. Moreover, the existing workforce of
officers is aging [Ministry of Land, Infrastructure, Transport and Tourism 2020]. This has generated considerable interest
in autonomous ship operation to improve maritime safety, shipboard working environments, and
productivity, and the technology is being actively developed. Automatic berthing/docking requires
fine control so that the ship can reach the target position located near the berth but avoid colliding
with it. Therefore, automatic berthing is central to the realization of automatic ship operations.
Because it is difficult to train the controller in a real environment owing to cost and safety issues, a
typical approach first models the state equation of a ship, for example, using system identification
techniques [Abkowitz 1980; Araki et al. 2012; Miyauchi et al. 2021a; Wakita et al. 2021] and then
optimizes the feedback controller in a simulator. However, such an approach always suffers from
modeling errors and uncertainties. For instance, the coefficients of a state equation model are often
estimated based on captive model tests in towing tanks and regressions; hence, they may include
errors. Moreover, the weather conditions at the time of operation could differ from those considered
in the model. Optimization of the feedback controller on a simulator with an estimated model
may result in a catastrophic accident, such as collision with the berth. Thus, to design a berthing
control solution robust against modeling errors and uncertainties, we formulate the problem as
a min–max optimization (1), where 𝑥 is the parameter of the feedback controller and 𝑦 is the
parameter representing the coefficients of the state equation model and weather conditions.
Saddle Point Optimization. Here, we consider min–max continuous optimization (1), where
𝑓 : X × Y → R is the objective function and X × Y ⊆ R𝑚 × R𝑛 is the search domain. In addition to
the abovementioned situation, min–max optimization can be applied in many fields of engineering,
including robust design [Conn and Vicente 2012; Qiu et al. 2018], robust control [Pinto et al. 2017;
Shioya et al. 2018], constrained optimization [Cherukuri et al. 2017], and generative adversarial
networks (GANs) [Goodfellow et al. 2014; Salimans et al. 2016]. In particular, we are interested in
the min–max optimization of a derivative-free and black-box objective 𝑓 , where the gradient or
higher-order information is unavailable (derivative-free) and no special structures such as convexity
or the Lipschitz constant are available in advance (black-box) [Frazier 2018].¹
¹ In this paper, simulation-based optimization is used to refer to the problems described in the first paragraph above. The terms derivative-free and black-box are used to refer to the characteristics of the objective function.

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:3

We aim to locate a local min–max saddle point of 𝑓 , that is, a point (𝑥 ∗, 𝑦 ∗ ) satisfying 𝑓 (𝑥, 𝑦 ∗ ) ⩾
𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦) in a neighborhood of (𝑥 ∗, 𝑦 ∗ ). Generally, it is difficult to locate the global
minimum of the worst-case objective 𝐹 (𝑥) := max𝑦 ∈Y 𝑓 (𝑥, 𝑦). In a non-convex optimization
context, the goal is often to locate a local minimum of an objective rather than the global minimum
as a realistic target. However, in the min–max optimization context, it is still difficult to locate a
local minimum of the worst-case objective 𝐹 (𝑥) because doing so requires the maximization itself
and there may exist local maxima of 𝑓 (𝑥, 𝑦) unless 𝑓 (𝑥, 𝑦) is concave in 𝑦 for all 𝑥. A local min–max
saddle point is considered as a local optimal solution in the min–max optimization context because
it is a local minimum in 𝑥 and a local maximum in 𝑦. Therefore, as a practical target, we focus on
locating the local min–max saddle point of (1).
Related Works. First-order approaches are often employed for (1) if gradients are available. A
simultaneous gradient descent-ascent (GDA) approach
(𝑥𝑡 +1, 𝑦𝑡 +1 ) = (𝑥𝑡 , 𝑦𝑡 ) + 𝜂 (−∇𝑥 𝑓 (𝑥𝑡 , 𝑦𝑡 ), ∇𝑦 𝑓 (𝑥𝑡 , 𝑦𝑡 )) (2)
has often been analyzed for its local and global convergence properties on twice continuously
differentiable functions owing to its simplicity and popularity. A condition on the learning rate
𝜂 > 0 for the dynamics (2) to be asymptotically stable at a local min–max saddle point has been
studied [Mescheder et al. 2017; Nagarajan and Kolter 2017]. Subsequently, Adolphs et al. [2019]
showed the existence of asymptotically stable points of (2) that are not local min–max saddle
points. Liang and Stokes [2019] have derived a sufficient condition on 𝜂 for (2) to converge toward
the global min–max saddle point on a locally strongly convex–concave function. Frank-Wolfe
type approaches have also been analyzed for constrained situations [Gidel et al. 2017; Nouiehed
et al. 2019]. Although a convergence guarantee was not provided, [Bertsimas et al. 2010a,b] have
proposed a first-order approach targeting 𝑓 that is non-concave in 𝑦.
Zero-order approaches for (1) include coevolutionary approaches [Al-Dujaili et al. 2019; Branke
and Rosenbusch 2008; Jensen 2004; Qiu et al. 2018; Zhou and Zhang 2010], surrogate-model–based
approaches [Bogunovic et al. 2018; Conn and Vicente 2012; Picheny et al. 2019], and gradient approx-
imation approaches [Liu et al. 2020]. Compared to first-order approaches, zero-order approaches
have not been thoroughly analyzed in terms of their convergence guarantees and convergence
rates. In particular, coevolutionary approaches are often designed heuristically and without conver-
gence guarantees. Indeed, they fail to converge toward a min–max saddle point even on strongly
convex–concave problems, as has been reported in [Akimoto 2021] and as noted below in the
experimental results. Recently, Bogunovic et al. [2018] showed regret bounds for a Bayesian op-
timization approach and Liu et al. [2020] showed an error bound for a gradient approximation
approach, where the error is measured by the square norm of the gradient. Both analyses show
sublinear rates under possibly stochastic (i.e., noisy) versions of (1). However, compared to the
first-order approach, which exhibits linear convergence, they show slower convergence.
Contributions. We propose an approach to saddle point optimization (1) that relies solely
on numerical solvers that approximately solve argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦) for each 𝑦 ∈ Y and
argmin𝑦′ ∈Y −𝑓 (𝑥, 𝑦 ′) for each 𝑥 ∈ X. Given an initial solution (𝑥 0, 𝑦0 ) ∈ X × Y, our approach itera-
tively locates the approximate solutions 𝑥˜𝑡 ≈ argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦𝑡 ) and 𝑦˜𝑡 ≈ argmin𝑦′ ∈Y −𝑓 (𝑥𝑡 , 𝑦 ′)
and updates the solution as
(𝑥𝑡 +1, 𝑦𝑡 +1 ) = (𝑥𝑡 , 𝑦𝑡 ) + 𝜂 · (𝑥˜𝑡 − 𝑥𝑡 , 𝑦˜𝑡 − 𝑦𝑡 ), (3)
where 𝜂 > 0 is the learning rate. This approach takes inspiration from the GDA method (2),
where we replace −∇𝑥 𝑓 (𝑥𝑡 , 𝑦𝑡 ) and ∇𝑦 𝑓 (𝑥𝑡 , 𝑦𝑡 ) with 𝑥˜𝑡 − 𝑥𝑡 and 𝑦˜𝑡 − 𝑦𝑡 . However, unlike the
GDA approach, the solvers need not be gradient-based. This is advantageous in the following

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:4 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

situations: (1) there exists a well-developed numerical solver suitable for argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦) and/or
argmin𝑦′ ∈Y −𝑓 (𝑥, 𝑦 ′); (2) derivative-free approaches such as the covariance matrix adaptation
evolution strategy (CMA-ES) [Akimoto and Hansen 2020; Hansen and Auger 2014; Hansen et al.
2003; Hansen and Ostermeier 2001] are sought because gradient information is not available or
gradient-based approaches are known to be sensitive to their initial search points.
We analyze the proposed approach on strongly convex–concave problems, and prove its linear
convergence in terms of the number of numerical solver calls. In particular, we provide an upper
bound on 𝜂 to guarantee linear convergence toward the global min–max saddle point and the
convergence rate bound. This corresponds to the known result for the GDA approach (2). Compared
to existing derivative-free approaches for saddle point optimization, this result is unique in that our
convergence is linear, while the existing results show sublinear convergence [Bogunovic et al. 2018;
Liu et al. 2020]. Although our motivational application is not necessarily a strongly convex–concave
problem, the quantitative analysis helps to understand limitations of the approach—need for 𝜂
adaptation—and provide inspiration on how to improve the approach.
Moreover, we developed a heuristic adaptation mechanism for the learning rate in a black-box
optimization setting. In the black-box setting, we do not know in advance the characteristic constants
of a problem that determines the upper bound for the learning rate to guarantee convergence.
Therefore, a learning rate adaptation mechanism is highly desired to avoid trial and error in tuning
the learning rate. We implemented two variants of the proposed approach, one using (1+1)-CMA-
ES [Arnold and Hansen 2010], a zero-order approach, as the minimization solver, and another
using SLSQP [Kraft 1988], a first-order approach. Empirical studies on test problems show that
the learning rate adaptation achieved performance competitive to the proposed approach with the
optimal static learning rate, while obviating the need for time-consuming parameter tuning. We
also demonstrate the limitations of existing coevolutionary approaches as well as the proposed
approach.
We apply our approach to robust berthing control optimization, as an example of a real-world
application with a non-convex-concave objective. We consider the wind conditions and the co-
efficients of the state equation for the wind force as the uncertainty parameter 𝑦. Some related
works address the wind force as an external disturbance when planning the trajectories [Miyauchi
et al. 2021b]; however, they treat the wind condition as an observable disturbance, and the control
signal is selected according to the observed wind condition. In contrast, we optimize the on-line
feedback controller under wind disturbance without considering the wind condition as an input to
the controller. Moreover, among other studies on automatic berthing control, to the best of our
knowledge, the present work is the first to address model uncertainty. Compared to a naive baseline
approach, the proposed approach located solutions with better worst-case performance.
This paper is an extension of a previous work [Akimoto 2021]. We have improved on the previous
work in the following respects. First, we improved the convergence analysis in Section 3.2. We
have removed unnecessary assumptions on the problem by refining the proof. Second, we have
incorporated the covariance matrix adaptation into our proposed approach in Section 4.3. Third, we
have implemented a restart strategy and other ingenuity for practical use, summarized in Section 4.2.
Fourth, we have extended the comparison with existing approaches in Section 5.2. Finally, we have
evaluated the usefulness of the proposed approach in a real-world application in Section 6.
Our implementation of the proposed approach in the Python programming language, Adversarial-
CMA-ES, is publicly available at GitHub Gist.2
² https://gist.github.com/youheiakimoto/ab51e88c73baf68effd95b750100aad0

Notation. For a twice continuously differentiable function 𝑓 : R𝑚 × R𝑛 → R, that is, 𝑓 ∈ C²(R𝑚 × R𝑛, R), let 𝐻_{𝑥,𝑥}(𝑥, 𝑦), 𝐻_{𝑥,𝑦}(𝑥, 𝑦), 𝐻_{𝑦,𝑥}(𝑥, 𝑦), and 𝐻_{𝑦,𝑦}(𝑥, 𝑦) be the blocks of the Hessian matrix
\[
\nabla^2 f(x, y) = \begin{pmatrix} H_{x,x}(x, y) & H_{x,y}(x, y) \\ H_{y,x}(x, y) & H_{y,y}(x, y) \end{pmatrix}
\]
of 𝑓, whose (𝑖, 𝑗)-th components are ∂²𝑓/∂𝑥ᵢ∂𝑥ⱼ, ∂²𝑓/∂𝑥ᵢ∂𝑦ⱼ, ∂²𝑓/∂𝑦ᵢ∂𝑥ⱼ, and ∂²𝑓/∂𝑦ᵢ∂𝑦ⱼ, respectively, evaluated at a given point (𝑥, 𝑦).
For symmetric matrices 𝐴 and 𝐵, by 𝐴 ≽ 𝐵 and 𝐴 ≻ 𝐵, we mean that 𝐴 − 𝐵 is non-negative definite and positive definite, respectively. For simplicity, we write 𝐴 ≽ 𝑎 and 𝐴 ≻ 𝑎 for 𝑎 ∈ R to mean 𝐴 ≽ 𝑎·𝐼 and 𝐴 ≻ 𝑎·𝐼, respectively. For a positive definite symmetric matrix 𝐴, let √𝐴 denote the matrix square root, that is, √𝐴 is a positive definite symmetric matrix such that 𝐴 = √𝐴·√𝐴. Let ∥𝑧∥_𝐴 = [𝑧ᵀ𝐴𝑧]^{1/2} for a positive definite symmetric 𝐴.
Let 𝐽_𝑔(𝑧) denote the Jacobian of a differentiable 𝑔 = (𝑔₁, …, 𝑔_𝑘) : Rℓ → R𝑘, where the (𝑖, 𝑗)-th element is ∂𝑔ᵢ/∂𝑧ⱼ evaluated at 𝑧 = (𝑧₁, …, 𝑧_ℓ) ∈ Rℓ. If 𝑘 = 1, we write 𝐽_𝑔(𝑧) = ∇𝑔(𝑧)ᵀ.

2 SADDLE POINT OPTIMIZATION


Our objective is to locate the global or local min–max saddle point of the min–max optimization
problem (1). In the following we first define the min–max saddle point. We introduce the notion of
the suboptimality error to measure the progress toward the global min–max saddle point. Finally,
we introduce a strongly convex–concave function as an important class of the objective function,
on which we performed convergence analysis, which is described in the next section.

2.1 Min–Max Saddle Point


The min–max saddle point of a function 𝑓 : X × Y → R is defined as follows. Here, E𝑧 ⊆ Rℓ is
called a neighborhood of 𝑧 ∗ ∈ Rℓ if there exists an open ball 𝐵(𝑧 ∗, 𝑟 ) = {𝑧 ∈ Rℓ : ∥𝑧 − 𝑧 ∗ ∥ < 𝑟 } such
that 𝐵(𝑧 ∗, 𝑟 ) ⊆ E𝑧 . A point (𝑥, 𝑦) is called a critical point if ∇𝑓 (𝑥, 𝑦) = (∇𝑥 𝑓 (𝑥, 𝑦), ∇𝑦 𝑓 (𝑥, 𝑦)) = 0.
Definition 2.1 (Min–Max Saddle Point). A point (𝑥 ∗, 𝑦 ∗ ) ∈ X × Y is a local min–max saddle point
of a function 𝑓 : X × Y → R if there exists a neighborhood E𝑥 × E 𝑦 ⊆ X × Y including (𝑥 ∗, 𝑦 ∗ )
such that for any (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 , the condition 𝑓 (𝑥, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦) holds. If E𝑥 = X
and E 𝑦 = Y, the point (𝑥 ∗, 𝑦 ∗ ) is called the global min–max saddle point. If the equality holds only
if (𝑥, 𝑦) = (𝑥 ∗, 𝑦 ∗ ), it is called a strict min–max saddle point.
For twice continuously differentiable function 𝑓 ∈ C 2 (X × Y, R), a point (𝑥 ∗, 𝑦 ∗ ) is a strict local
min–max saddle point if it is a critical point and 𝐻𝑥,𝑥 (𝑥 ∗, 𝑦 ∗ ) ≻ 0 and 𝐻 𝑦,𝑦 (𝑥 ∗, 𝑦 ∗ ) ≺ 0 hold. In
general, the opposite does not hold. For example, a local min–max saddle point can be a boundary
point of X × Y that is not a critical point.
We comment on the relation between the min–max saddle point and the solutions to the worst-case objective function 𝐹(𝑥) := max_{𝑦∈Y} 𝑓(𝑥, 𝑦). If there exists a global min–max saddle point (𝑥*, 𝑦*) of 𝑓, then 𝑥* is one of the global minimal points of the worst-case objective function 𝐹, and we have 𝐹(𝑥*) = 𝑓(𝑥*, 𝑦*). However, a global minimal point 𝑥̄ ∈ X of 𝐹(𝑥) and its corresponding worst uncertainty parameter 𝑦̄ ∈ argmax_{𝑦∈Y} 𝑓(𝑥̄, 𝑦) do not necessarily form a global min–max saddle point in general. An example case is 𝑓(𝑥, 𝑦) = (𝑥 + sin(𝜋𝑦))², where the worst-case objective function is 𝐹(𝑥) = (|𝑥| + 1)² (the inner maximum is attained by choosing sin(𝜋𝑦) = sign(𝑥), which yields |𝑥| + 1) and its global minimal point is 𝑥̄ = 0. The corresponding worst uncertainty parameters are 1/2 + 𝑖 for all integers 𝑖, but (𝑥̄, 𝑦̄) is not even a local min–max saddle point for any 𝑦̄ = 1/2 + 𝑖. Moreover, if (𝑥*, 𝑦*) is a local min–max saddle point of 𝑓, the point 𝑥* is not necessarily a local minimal point of 𝐹.

2.2 Suboptimality Error


The suboptimality error [Gidel et al. 2017] is a quantity that measures the progress toward the
global min–max saddle point, defined as follows.

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:6 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

Definition 2.2 (Suboptimality Error). For function 𝑓 : X × Y → R, the suboptimality error 𝐺_𝑥 : X × Y → [0, ∞) in 𝑥 and the suboptimality error 𝐺_𝑦 : X × Y → [0, ∞) in 𝑦 are defined as
\[
G_x(x, y) := f(x, y) - \min_{x' \in X} f(x', y), \tag{4}
\]
\[
G_y(x, y) := \max_{y' \in Y} f(x, y') - f(x, y), \tag{5}
\]
and the suboptimality error is
\[
G(x, y) := G_x(x, y) + G_y(x, y) = \max_{y' \in Y} f(x, y') - \min_{x' \in X} f(x', y). \tag{6}
\]

The suboptimality error 𝐺 (𝑥, 𝑦) is zero if and only if (𝑥, 𝑦) is the global min–max saddle point
of 𝑓 . Moreover, the local min–max saddle points of 𝑓 are characterized by suboptimality errors.
This is summarized in the following proposition, whose proof is given in Appendix A.
Proposition 2.3. The point (𝑥*, 𝑦*) is the global min–max saddle point of 𝑓 if and only if it is a global minimal point of 𝐺 with 𝐺(𝑥*, 𝑦*) = 0, noting that 𝐺(𝑥, 𝑦) ⩾ 0 for any (𝑥, 𝑦) ∈ X × Y. The point (𝑥*, 𝑦*) is a local
min–max saddle point of 𝑓 if and only if 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (·, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, ·),
respectively, that is, there exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝐺𝑥 (𝑥, 𝑦 ∗ ) ⩾ 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ )
and 𝐺 𝑦 (𝑥 ∗, 𝑦) ⩾ 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) for any (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 .
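As a concrete numerical illustration (ours, not from the paper), the suboptimality error can be evaluated by solving the two inner problems directly. The Python sketch below uses the scalar quadratic 𝑓(𝑥, 𝑦) = 𝑥²/2 + 𝑥𝑦 − 𝑦²/2, whose unique min–max saddle point is (0, 0):

from scipy.optimize import minimize_scalar

def f(x, y):
    # Strongly convex in x, strongly concave in y.
    return 0.5 * x**2 + x * y - 0.5 * y**2

def suboptimality(x, y):
    # G(x, y) = max_{y'} f(x, y') - min_{x'} f(x', y), cf. Eq. (6).
    inner_max = -minimize_scalar(lambda yp: -f(x, yp)).fun
    inner_min = minimize_scalar(lambda xp: f(xp, y)).fun
    return inner_max - inner_min

print(suboptimality(1.0, 1.0))  # 2.0: max_y' f(1, y') = 1 and min_x' f(x', 1) = -1
print(suboptimality(0.0, 0.0))  # 0.0: G vanishes exactly at the saddle point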

2.3 Strongly Convex–Concave Function


A strongly convex–concave function is often considered for the theoretical investigation of first-
order min–max optimization approaches.
Definition 2.4. A twice continuously differentiable function 𝑓 ∈ C 2 (R𝑚 × R𝑛 , R) is locally 𝜇-
strongly convex–concave around a critical point (𝑥 ∗, 𝑦 ∗ ) for some 𝜇 > 0 if there exist open sets
E𝑥 ⊆ R𝑚 including 𝑥 ∗ and E 𝑦 ⊆ R𝑛 including 𝑦 ∗ such that 𝐻𝑥,𝑥 (𝑥, 𝑦) ≽ 𝜇 and −𝐻 𝑦,𝑦 (𝑥, 𝑦) ≽ 𝜇 for
all (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . The function 𝑓 is globally 𝜇-strongly convex–concave if E𝑥 = R𝑚 and E 𝑦 = R𝑛 .
We say that 𝑓 is locally or globally strongly convex–concave if 𝑓 is locally or globally 𝜇-strongly
convex–concave for some 𝜇 > 0.
If the objective function 𝑓 is a globally strongly convex–concave, the global minimal point of
the worst-case objective function 𝐹 (𝑥) is the global min–max saddle point, and it is the only local
min–max saddle point.
The implicit function theorem, for example, Theorem 5 of [de Oliveira 2013], provides important
characteristics of strongly convex–concave functions.
Proposition 2.5 (Implicit Function Theorem). Let (𝑥 ∗, 𝑦 ∗ ) be a min–max saddle point of
𝑓 ∈ 𝐶 2 (R𝑚 × R𝑛 , R) and 𝑓 be (at least) locally strongly convex–concave around (𝑥 ∗, 𝑦 ∗ ) in E𝑥 × E 𝑦 ⊆
R𝑚 × R𝑛 .
There exist open sets E𝑥,𝑥 ⊆ E𝑥 including 𝑥 ∗ and E𝑥,𝑦 ⊆ E 𝑦 including 𝑦 ∗ , such that there
exists a unique 𝑦ˆ : E𝑥,𝑥 → E𝑥,𝑦 such that ∇𝑦 𝑓 (𝑥, 𝑦ˆ (𝑥)) = 0. Moreover, 𝑦 ∗ = 𝑦ˆ (𝑥 ∗ ) and
𝐽𝑦ˆ (𝑥) = −(𝐻 𝑦,𝑦 (𝑥, 𝑦ˆ (𝑥))) −1 𝐻 𝑦,𝑥 (𝑥, 𝑦ˆ (𝑥)) for all 𝑥 ∈ E𝑥,𝑥 .
Analogously, there exist open sets E 𝑦,𝑦 ⊆ E 𝑦 including 𝑦 ∗ and E 𝑦,𝑥 ⊆ E𝑥 including 𝑥 ∗ , such
that there exists a unique 𝑥ˆ : E 𝑦,𝑦 → E 𝑦,𝑥 such that ∇𝑥 𝑓 (𝑥ˆ (𝑦), 𝑦) = 0. Moreover, 𝑥 ∗ = 𝑥ˆ (𝑦 ∗ ) and
𝐽𝑥ˆ (𝑦) = −(𝐻𝑥,𝑥 (𝑥ˆ (𝑦), 𝑦)) −1𝐻𝑥,𝑦 (𝑥ˆ (𝑦), 𝑦) for all 𝑦 ∈ E 𝑦,𝑦 .
If 𝑓 is globally strongly convex–concave, one can take E𝑥,𝑥 = E 𝑦,𝑥 = R𝑚 and E 𝑦,𝑦 = E𝑥,𝑦 = R𝑛 in
the above statements.
Proposition 2.5 states that for a globally strongly convex-concave 𝑓 ∈ 𝐶 2 (X × Y, R), for each
𝑥 ∈ R𝑚 there exists a unique global maximal point 𝑦ˆ (𝑥) ∈ R𝑛 such that 𝑦ˆ (𝑥) = argmax𝑦 ∈R𝑛 𝑓 (𝑥, 𝑦),

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:7

and for each 𝑦 ∈ R𝑛 there exists a unique global minimal point 𝑥ˆ (𝑦) ∈ R𝑚 such that 𝑥ˆ (𝑦) =
argmin𝑥 ∈R𝑚 𝑓 (𝑥, 𝑦).
The following lemma shows the positivity of the Hessian of the suboptimality error 𝐺, which
implies that the suboptimality error 𝐺 is a globally strongly convex function. The proof is provided
in Appendix A.
Lemma 2.6. Suppose that 𝑓 ∈ C²(R𝑚 × R𝑛, R) is globally 𝜇-strongly convex–concave for some 𝜇 > 0. The Hessian matrix of the suboptimality error 𝐺 is ∇²𝐺(𝑥, 𝑦) = diag(𝐺_{𝑥,𝑥}(𝑥, 𝑦̂(𝑥)), 𝐺_{𝑦,𝑦}(𝑥̂(𝑦), 𝑦)), where
\[
G_{x,x}(x, y) = H_{x,x}(x, y) + H_{x,y}(x, y)\,(-H_{y,y}(x, y))^{-1} H_{y,x}(x, y),
\]
\[
G_{y,y}(x, y) = -H_{y,y}(x, y) + H_{y,x}(x, y)\,(H_{x,x}(x, y))^{-1} H_{x,y}(x, y),
\]
both are symmetric, and 𝐺_{𝑥,𝑥}(𝑥, 𝑦) ≽ 𝜇 and 𝐺_{𝑦,𝑦}(𝑥, 𝑦) ≽ 𝜇.
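For instance, for the scalar quadratic 𝑓(𝑥, 𝑦) = (𝑎/2)𝑥² + 𝑏𝑥𝑦 − (𝑐/2)𝑦² with 𝑎, 𝑐 > 0, Lemma 2.6 gives 𝐺_{𝑥,𝑥} = 𝑎 + 𝑏²/𝑐 and 𝐺_{𝑦,𝑦} = 𝑐 + 𝑏²/𝑎: the interaction term 𝑏 makes the suboptimality error 𝐺 more strongly convex than 𝑓 itself is in each variable.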

3 ORACLE-BASED SADDLE POINT OPTIMIZATION


We now analyze saddle point optimization based on the approximate minimization oracle outlined
in (3). In the following, we formally state the condition for the approximate minimization oracle.
Then, we show the global convergence of (3) on strongly convex–concave functions.

3.1 Approximate Minimization Oracle


First, we formally define the requirement for the minimization problem solvers.
Definition 3.1 (Approximate Minimization Oracle). Given an objective function ℎ : Z → R to
be minimized and a reference solution 𝑧¯ ∈ Z, an approximate minimization oracle M outputs a
solution 𝑧̃ = M(ℎ, 𝑧̄) satisfying ℎ(𝑧̃) < ℎ(𝑧̄) unless 𝑧̄ is a local minimal point of ℎ.
We now reformulate the saddle point optimization with approximate minimization oracles.
Suppose that we have an approximate minimization oracle M𝑥 solving argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦) for any
𝑦 ∈ Y and an approximate minimization oracle M 𝑦 solving argmin𝑦′ ∈Y −𝑓 (𝑥, 𝑦 ′) for any 𝑥 ∈ X. At
each iteration, the algorithm asks the approximate minimization oracles to output the approximate
solutions to argmin_{𝑥′∈X} 𝑓(𝑥′, 𝑦_𝑡) and argmin_{𝑦′∈Y} −𝑓(𝑥_𝑡, 𝑦′) with the current solution (𝑥_𝑡, 𝑦_𝑡) as their reference point. Let 𝑥̃_𝑡 = M_𝑥(𝑓(·, 𝑦_𝑡), 𝑥_𝑡) and 𝑦̃_𝑡 = M_𝑦(−𝑓(𝑥_𝑡, ·), 𝑦_𝑡). The update follows
\[
x_{t+1} = x_t + \eta \cdot (\tilde{x}_t - x_t), \qquad y_{t+1} = y_t + \eta \cdot (\tilde{y}_t - y_t). \tag{7}
\]
A point (𝑥, 𝑦) ∈ X × Y is a stationary point of the dynamics of (7) only if it is a local min–max
saddle point of 𝑓 . Moreover, if (𝑥, 𝑦) is a strict local min–max saddle point of 𝑓 , it is a stationary
point of the dynamics of (7). Therefore, if it converges, the final solution is guaranteed to be a
local min–max saddle point of 𝑓 . To guarantee its convergence, we further assume the following
requirement.
Assumption 3.2. Given an objective function ℎ : Z → R to be minimized and a reference solution 𝑧̄ ∈ Z, an approximate minimization oracle M with an approximation precision parameter 𝜖 ∈ [0, 1) outputs a solution 𝑧̃ = M(ℎ, 𝑧̄) satisfying
\[
h(\tilde{z}) - \min_{z \in Z} h(z) \leqslant \epsilon \cdot \big( h(\bar{z}) - \min_{z \in Z} h(z) \big). \tag{8}
\]
We are particularly interested in algorithms that decrease the objective function value at a
geometric rate on (at least locally) strongly convex objective ℎ as instances of the approximate
minimization oracle M. That is, the runtime — number of ℎ calls or ∇ℎ calls — to decrease the
objective function difference ℎ(𝑧) − ℎ(𝑧 ∗ ) from a local minimum by the factor 𝜖 is 𝑂 (log(1/𝜖)).

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:8 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

For example, the gradient descent method is well known to exhibit a geometric decrease in the
objective function value on strongly convex functions with Lipschitz continuous gradients [Boyd
and Vandenberghe 2004; Karimi et al. 2016]. The (1+1)-ES also exhibits a geometric decrease on
strongly convex functions with Lipschitz continuous gradients [Morinaga and Akimoto 2019]. We
can satisfy the oracle requirement (8) by performing 𝑂 (log(1/𝜖)) iterations of such algorithms.
The condition can also be satisfied by algorithms that exhibit slower convergence, that is, sublinear
convergence. However, for such algorithms, the runtime increases as a candidate solution becomes
closer to a local optimum. Therefore, the stopping condition for the internal algorithm to satisfy (8)
needs to be carefully designed.
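To make the scheme concrete, the following Python sketch (ours, not the authors' released implementation) realizes the oracles by a short run of an off-the-shelf minimizer; the Nelder–Mead choice and the inner iteration budget are illustrative assumptions:

import numpy as np
from scipy.optimize import minimize

def oracle(h, z_ref, inner_iters=50):
    # Approximate minimization oracle M(h, z_ref): a truncated run of a
    # derivative-free minimizer started at the reference solution.
    res = minimize(h, z_ref, method="Nelder-Mead", options={"maxiter": inner_iters})
    return res.x if res.fun < h(z_ref) else np.asarray(z_ref, dtype=float)

def saddle_point_opt(f, x, y, eta, T=300):
    # Oracle-based saddle point optimization, update (7).
    for _ in range(T):
        x_tilde = oracle(lambda xx: f(xx, y), x)   # approx. argmin_x f(., y)
        y_tilde = oracle(lambda yy: -f(x, yy), y)  # approx. argmax_y f(x, .)
        x = x + eta * (x_tilde - x)
        y = y + eta * (y_tilde - y)
    return x, y

# Strongly convex-concave quadratic with the saddle point at (0, 0).
f = lambda x, y: 0.5 * x @ x + x @ y - 0.5 * y @ y
x, y = saddle_point_opt(f, np.ones(3), np.ones(3), eta=0.3)
print(x, y)  # both approach the saddle point (0, 0)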

3.2 Analysis on Strongly Convex–Concave Functions


Next, we investigate the convergence property of the oracle-based saddle point optimization (7)
on strongly convex–concave functions. In particular, we are interested in knowing how small the
learning rate 𝜂 needs to be to guarantee convergence and how fast it converges. The following
theorem provides an upper bound of the suboptimality error at iteration 𝑡 + 1 given the solution at
iteration 𝑡. The proof is provided in Appendix A.
Theorem 3.3. Suppose that 𝑓 ∈ C²(R𝑚 × R𝑛, R) is globally strongly convex–concave, and there exist 𝛽_𝐺 ⩾ 𝛼_𝐺 > 0 and 𝛽_𝐻 ⩾ 𝛼_𝐻 > 0 such that, for all (𝑥, 𝑦) ∈ R𝑚 × R𝑛,
(1) \(\beta_H \succcurlyeq \sqrt{H^*_{x,x}}^{-1}\, H_{x,x}(x,y)\, \sqrt{H^*_{x,x}}^{-1} \succcurlyeq \alpha_H\);
(2) \(\beta_H \succcurlyeq \sqrt{-H^*_{y,y}}^{-1}\, (-H_{y,y}(x,y))\, \sqrt{-H^*_{y,y}}^{-1} \succcurlyeq \alpha_H\);
(3) \(\beta_G \succcurlyeq \sqrt{H^*_{x,x}}^{-1}\, G_{x,x}(x,y)\, \sqrt{H^*_{x,x}}^{-1} \succcurlyeq \alpha_G\);
(4) \(\beta_G \succcurlyeq \sqrt{-H^*_{y,y}}^{-1}\, G_{y,y}(x,y)\, \sqrt{-H^*_{y,y}}^{-1} \succcurlyeq \alpha_G\),
where 𝐻*_{𝑥,𝑥} = 𝐻_{𝑥,𝑥}(𝑥*, 𝑦*), 𝐻*_{𝑦,𝑦} = 𝐻_{𝑦,𝑦}(𝑥*, 𝑦*), and (𝑥*, 𝑦*) is the global min–max saddle point of 𝑓.
Consider approach (7) with approximate minimization oracles M_𝑥 and M_𝑦 satisfying Assumption 3.2 with approximate precision \(\epsilon < \alpha_H^5 / (\beta_H^4 \beta_G)\). Let
\[
\eta^* = \frac{\alpha_H}{\beta_H}\,\frac{\alpha_H}{\beta_G}\,\frac{1 - \sqrt{(\beta_H/\alpha_H)^2 (\beta_G/\alpha_H)\,\epsilon}}{(1 + \sqrt{\epsilon})^2}, \tag{9}
\]
\[
\gamma = -2\eta\,\frac{\alpha_H}{\beta_H}\left(1 - \sqrt{\frac{\beta_H^2}{\alpha_H^2}\cdot\frac{\beta_G}{\alpha_H}\cdot\epsilon}\right) + \eta^2 \cdot (1 + \sqrt{\epsilon})^2 \cdot \frac{\beta_G}{\alpha_H}. \tag{10}
\]
Then, for any 𝜂 < 2·𝜂*, we have 𝛾 < 0 and log(𝐺(𝑥_{𝑡+1}, 𝑦_{𝑡+1})) − log(𝐺(𝑥_𝑡, 𝑦_𝑡)) < 𝛾. In other words, the runtime 𝑇_𝜁 to reach {(𝑥, 𝑦) ∈ R𝑚 × R𝑛 : 𝐺(𝑥, 𝑦) ⩽ 𝜁·𝐺(𝑥₀, 𝑦₀)} for 𝜁 ∈ (0, 1) is 𝑇_𝜁 ⩽ ⌈(1/|𝛾|) log(1/𝜁)⌉ for any initial point (𝑥₀, 𝑦₀) ∈ R𝑚 × R𝑛. Moreover, 𝐺(𝑥_{𝑡+1}, 𝑦_{𝑡+1}) > 𝐺(𝑥_𝑡, 𝑦_𝑡) for all (𝑥_𝑡, 𝑦_𝑡) if 𝜂 > 2·𝜂̄, where
\[
\bar{\eta} = \frac{\beta_H}{\alpha_G}\,\frac{\beta_H}{\alpha_H}\,\frac{1 + \sqrt{(\beta_G/\alpha_H)\,\epsilon}}{\big(1 - \sqrt{(\beta_H/\alpha_H)\,\epsilon}\big)^2}. \tag{11}
\]
Linear Convergence. The proposed approach (7) satisfying Assumption 3.2 converges linearly
toward the global min–max saddle point on a strongly convex–concave objective function if 𝜂 < 2𝜂 ∗ .
If M𝑥 and M 𝑦 are implemented with algorithms that exhibit linear convergence, we can conclude
that the runtime in terms of 𝑓-calls and/or ∇𝑓-calls is
\[
O\left( \frac{1}{|\gamma|} \log\frac{1}{\zeta} \log\frac{1}{\epsilon} \right). \tag{12}
\]

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:9

Necessary Condition. To exhibit convergence, shrinking the learning rate 𝜂 is not only sufficient but also necessary. To determine the closeness of the upper bound 2·𝜂* in the sufficient condition and the lower bound 2·𝜂̄ in the necessary condition, consider a convex–concave quadratic function 𝑓(𝑥, 𝑦) = (𝑎/2)𝑥² + 𝑏𝑥𝑦 − (𝑐/2)𝑦², for instance, where 𝑎 > 0, 𝑏 ∈ R, and 𝑐 > 0. Then, we have 𝛼_𝐻 = 𝛽_𝐻 = 1 and 𝛼_𝐺 = 𝛽_𝐺 = (𝑎𝑐 + 𝑏²)/(𝑎𝑐). Ignoring the effect of 𝜖, we have 𝜂* = 𝜂̄ = 𝑎𝑐/(𝑎𝑐 + 𝑏²). This implies that the sufficient condition for linear convergence, 𝜂 < 2·𝜂*, is indeed the necessary condition for the convergence itself in this example situation. This reveals a limitation of existing approaches [Al-Dujaili et al. 2019; Pinto et al. 2017], which correspond to (7) with 𝜂 = 1.
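With exact best responses (𝜖 = 0), the threshold 2·𝜂* can be verified directly: on this quadratic, 𝑥̃ = −(𝑏/𝑎)𝑦 and 𝑦̃ = (𝑏/𝑐)𝑥, so update (7) is linear and converges exactly when its spectral radius is below one. A small self-contained check (ours, for illustration):

import numpy as np

def spectral_radius(eta, a=1.0, b=2.0, c=1.0):
    # Update (7) with exact oracles on f(x, y) = (a/2)x^2 + bxy - (c/2)y^2:
    # x+ = (1 - eta)x - eta(b/a)y,  y+ = eta(b/c)x + (1 - eta)y.
    M = np.array([[1 - eta, -eta * b / a],
                  [eta * b / c, 1 - eta]])
    return max(abs(np.linalg.eigvals(M)))

eta_star = 1.0 / (1.0 + 2.0 ** 2)  # ac/(ac + b^2) for a = c = 1, b = 2
for delta in [0.5, 1.0, 1.99, 2.01]:
    rho = spectral_radius(delta * eta_star)
    print(f"eta = {delta:.2f} * eta_star: spectral radius = {rho:.4f}")
# The radius is below 1 exactly for eta < 2 eta*; with eta = 1, as in the
# existing approaches mentioned above, the iteration diverges whenever b^2 > ac.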
Runtime Bound. The runtime bound 𝑇_𝜁 is proportional to 1/|𝛾| in (10). The log-progress bound |𝛾| is roughly proportional to 2·𝜂 if 𝜂 ≪ 1. That is, the runtime is proportional to 1/(2·𝜂). The minimal runtime bound is obtained when 𝜂 = 𝜂*, where
\[
\gamma = \gamma^* := -\frac{\alpha_H}{\beta_G}\left(\frac{\alpha_H}{\beta_H}\right)^2 \left(\frac{1 - \sqrt{(\beta_H/\alpha_H)^2 (\beta_G/\alpha_H)\,\epsilon}}{1 + \sqrt{\epsilon}}\right)^2. \tag{13}
\]
Provided that 𝜖 ≪ 1, we have 𝜂* ≈ (𝛼_𝐻/𝛽_𝐺)(𝛼_𝐻/𝛽_𝐻) and 𝛾* ≈ −(𝛼_𝐻/𝛽_𝐺)(𝛼_𝐻/𝛽_𝐻)². The main factor that limits 𝜂* and 𝛾* is 𝛼_𝐻/𝛽_𝐺. As noted in the above example of a convex–concave quadratic function, the ratio 𝛼_𝐻/𝛽_𝐺 is smaller as the influence of the interaction term between 𝑥 and 𝑦 on the objective function value is greater than that of the other terms, that is, as 𝑏²/(𝑎𝑐) is greater. The other factor, 𝛼_𝐻/𝛽_𝐻, is smaller as the condition number Cond(𝐻_{𝑥,𝑥}(𝑥, 𝑦)(𝐻*_{𝑥,𝑥})⁻¹) or Cond(𝐻_{𝑦,𝑦}(𝑥, 𝑦)(𝐻*_{𝑦,𝑦})⁻¹) is higher. This depends on the change in the Hessian matrix over the search space R𝑚 × R𝑛. If the objective function is convex–concave quadratic, that is, 𝑓(𝑥, 𝑦) = ½𝑥ᵀ𝐻_{𝑥,𝑥}𝑥 + 𝑥ᵀ𝐻_{𝑥,𝑦}𝑦 + ½𝑦ᵀ𝐻_{𝑦,𝑦}𝑦, the Hessian matrix is constant over the search space, and we have 𝛼_𝐻/𝛽_𝐻 = 1, whereas \(\beta_G = 1 + \sigma_{\max}^2(\sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1})\) and \(\alpha_G = 1 + \sigma_{\min}^2(\sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1})\), where 𝜎_min and 𝜎_max denote the smallest and greatest singular values. Therefore, we have
\[
\frac{1}{|\gamma^*|} = \left(1 + \sigma_{\max}^2\!\left(\sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1}\right)\right)\left(\frac{1 + \sqrt{\epsilon}}{1 - \sqrt{\epsilon}}\right)^2. \tag{14}
\]
Comparison with GDA. Theorem 1 of [Liang and Stokes 2019] shows that the runtime 𝑇_𝜁 of the GDA (2) is O((1/𝛾_GDA) log(1/𝜁)), where
\[
\frac{1}{\gamma_{\mathrm{GDA}}} = \frac{\max_{(x,y) \in \mathbb{R}^m \times \mathbb{R}^n} \lambda_{\max}(K(x, y))}{\min_{(x,y) \in \mathbb{R}^m \times \mathbb{R}^n} \lambda_{\min}(\mathrm{diag}(H_{x,x}(x, y)^2, (-H_{y,y}(x, y))^2))}, \tag{15}
\]
𝜆_min and 𝜆_max denote the smallest and greatest eigenvalues, and
\[
K(x, y) = \begin{pmatrix}
H_{x,x}^2 + H_{x,y} H_{y,x} & -H_{x,x} H_{x,y} + H_{x,y} (-H_{y,y}) \\
-H_{y,x} H_{x,x} + (-H_{y,y}) H_{y,x} & (-H_{y,y})^2 + H_{y,x} H_{x,y}
\end{pmatrix}, \tag{16}
\]
where we drop (𝑥, 𝑦) from 𝐻_{𝑥,𝑥}(𝑥, 𝑦), 𝐻_{𝑦,𝑦}(𝑥, 𝑦), 𝐻_{𝑥,𝑦}(𝑥, 𝑦), and 𝐻_{𝑦,𝑥}(𝑥, 𝑦) for compact expression. To compare this with our result, consider the pre-conditioned convex–concave quadratic function \(f(x, y) = \frac12 x^{\mathrm{T}} I x + x^{\mathrm{T}} \sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1} y + \frac12 y^{\mathrm{T}} (-I) y\). Then, it may be easily observed that 𝜆_min(diag(𝐻_{𝑥,𝑥}(𝑥, 𝑦)², (−𝐻_{𝑦,𝑦}(𝑥, 𝑦))²)) = 1 = 𝛼_𝐻 = 𝛽_𝐻 and \(\lambda_{\max}(K(x, y)) = 1 + \sigma_{\max}^2(\sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1}) = \beta_G\). Therefore, \(1/\gamma_{\mathrm{GDA}} = 1 + \sigma_{\max}^2(\sqrt{H_{x,x}}^{-1} H_{x,y} \sqrt{-H_{y,y}}^{-1})\). Because the GDA requires one ∇𝑓 call per iteration, 𝑇_𝜁 is the runtime w.r.t. ∇𝑓 calls as well. It indicates that the runtime bounds w.r.t. 𝑓 and/or ∇𝑓 calls are the same for the GDA and the oracle-based saddle point optimization (7) with linearly convergent oracles, whose runtime is obtained by substituting (14) into (12), ignoring the effect of 𝜖. Note, however, that the runtime of the GDA depends on the pre-conditioning, as it is a first-order approach. The number of oracle calls of the oracle-based saddle point optimization is independent of the pre-conditioning, but the number of 𝑓 and/or ∇𝑓 calls in each oracle call may depend on the pre-conditioning.

4 SADDLE POINT OPTIMIZATION WITH LEARNING RATE ADAPTATION


In this section, we propose practical implementations of the saddle point optimization approach (7)
with a heuristic mechanism to adapt the learning rate 𝜂. We implement the proposed approach
using two minimization routines. The first is (1+1)-CMA-ES, which is a zero-order randomized
hill-climbing approach. The second is SLSQP, which is a first-order deterministic hill-climbing
approach.

4.1 Learning Rate Adaptation


The main limitation of oracle-based saddle point optimization when it is applied to a simulation-
based optimization task is that we rarely know the right 𝜂 value in advance. As we see in Theorem 3.3,
𝜂 < 2 · 𝜂 ∗ must be selected to guarantee convergence on a convex–concave function. However,
the optimal value, 𝜂 ∗ , depends on the problem characteristics and is unknown in advance when
considering a black-box setting. In practice, it is a tedious task to find a reasonable 𝜂.
To address this issue, we propose adapting 𝜂 during the optimization process. The overall
framework is presented in Algorithm 1, where we assume 𝑓 Y = 𝑓 for the moment, that is, Y = ∅ to
simplify the main idea.
The main idea is to estimate the convergence speed in terms of the suboptimality error by running
𝑁 step iterations of algorithm (7) with a candidate learning rate 𝜂𝑐 (lines 6–21). If the estimated
convergence speed 𝛾˜𝑐 associated with 𝜂𝑐 is better (greater absolute value with a negative sign)
than the estimated convergence speed 𝛾˜ associated with the base learning rate 𝜂, we replace 𝜂
with 𝜂𝑐 (lines 22–27). The next candidate learning rate is chosen randomly from min(𝜂 · 𝑐𝜂 , 1)
(greater learning rate), 𝜂 (current learning rate), and max(𝜂/𝑐_𝜂, 𝜂_min) (smaller learning rate) with
equal probability, where 𝜂 min is the minimal learning rate value and 𝑐𝜂 > 1 is the hyperparameter
that determines the granularity of the 𝜂 update. A smaller 𝑐𝜂 results in a smoother 𝜂 change, but it
may require more time to adapt 𝜂. It is advised to set 𝑐𝜂 < 2 because Theorem 3.3 indicates that the
upper bound on 𝜂 for convergence is 2 · 𝜂 ∗ , where 𝜂 ∗ is the optimal value.
We estimate the convergence speed by running the algorithm for 𝑁 step iterations. The suboptimal-
ity error 𝐺 (𝑥, 𝑦) is approximated by 𝐹𝑠 in line 17. Because of the oracle condition (8), if there exists
a unique (hence, global) min–max saddle point, we have (1 − max(𝜖𝑥 , 𝜖 𝑦 )) · 𝐺 (𝑥, 𝑦) ⩽ 𝐹𝑠 ⩽ 𝐺 (𝑥, 𝑦).
Then, we have
\[
\left| \frac{1}{N_{\mathrm{step}} - 1} \left( \log\frac{G(x_{N_{\mathrm{step}}}, y_{N_{\mathrm{step}}})}{G(x_1, y_1)} - \log\frac{F_{N_{\mathrm{step}}}}{F_1} \right) \right| \leqslant \frac{-\log(1 - \max(\epsilon_x, \epsilon_y))}{N_{\mathrm{step}} - 1}.
\]
Based on Theorem 3.3, if the objective function is strongly convex–concave, the convergence speed
will be proportional to 1/𝜂. Then, to approximate the convergence speed in line 17, 𝑁 step ∈ Ω(1/𝜂)
must be set to alleviate the estimation error, that is, the right-hand side of the above inequality.
Therefore, we set 𝑁 step = ⌊𝑏𝜂 + 𝑎𝜂 /𝜂𝑐 ⌋, where 𝑎𝜂 > 0 and 𝑏𝜂 ⩾ 0 are the hyperparameters. The
greater they are, the more accurate the estimated convergence speed, but the slower the speed of
adaptation of 𝜂. If the objective function is not strongly convex–concave, the above argument may
not hold, yet we optimistically expect it to reflect the convergence speed of the algorithm toward a
local min–max saddle point.
After estimating the convergence speed 𝛾˜𝑐 for the candidate learning rate 𝜂𝑐 , we replace 𝜂 and
𝛾˜ with 𝜂𝑐 and 𝛾˜𝑐 if 𝛾˜𝑐 is equal to or better than the convergence speed 𝛾˜ for the current learning

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:11

Algorithm 1 Saddle Point Optimization with Learning Rate Adaptation

Require: 𝑥 ∈ X, 𝑦 ∈ Y, 𝜃^𝑥, 𝜃^𝑦, 𝑎_𝜂 > 0, 𝑏_𝜂 ⩾ 0, 𝑐_𝜂 > 1
Require: (optional) 𝑃_𝑥, 𝑃_𝑦, 𝜂_min ⩾ 0, 𝐺_tol ⩾ 0, 𝑑^𝑦_min ⩾ 0
1: 𝜂 ← 1, 𝛾̃ ← 0, Y ← ∅, X ← ∅
2: for 𝑡 = 1, …, 𝑇 do
3:   (𝑥_𝑡, 𝑥̃, 𝜃^𝑥_𝑡, 𝑦_𝑡, 𝑦̃, 𝜃^𝑦_𝑡) ← (𝑥, 𝑥, 𝜃^𝑥, 𝑦, 𝑦, 𝜃^𝑦)
4:   𝜂_𝑐 ← {min(𝜂·𝑐_𝜂, 1), 𝜂, max(𝜂/𝑐_𝜂, 𝜂_min)} w.p. 1/3 for each
5:   𝑁_step ← ⌊𝑏_𝜂 + 𝑎_𝜂/𝜂_𝑐⌋
6:   for 𝑠 = 1, …, 𝑁_step do
7:     Let 𝑓_Y(𝑥, 𝑦) := max_{𝑦′∈Y∪{𝑦}} 𝑓(𝑥, 𝑦′)
8:     (𝑥̃, 𝜃^𝑥) ← (𝑥, 𝜃^𝑥_𝑡) if 𝑓_Y(𝑥̃, 𝑦) > 𝑓_Y(𝑥, 𝑦)
9:     (𝑦̃, 𝜃^𝑦) ← (𝑦, 𝜃^𝑦_𝑡) if 𝑓(𝑥, 𝑦̃) < 𝑓(𝑥, 𝑦)
10:    𝑥̃, 𝜃^𝑥 ← M_𝑥(𝑓_Y(·, 𝑦), 𝑥̃; 𝜃^𝑥)
11:    𝑦̃, 𝜃^𝑦 ← M_𝑦(−𝑓(𝑥, ·), 𝑦̃; 𝜃^𝑦)
12:    (𝑥̃, 𝜃^𝑥) ← (𝑥′, 𝜃^𝑥_𝑡) if 𝑓_Y(𝑥′, 𝑦) < 𝑓_Y(𝑥̃, 𝑦) for 𝑥′ ∼ 𝑃_𝑥
13:    if 𝑓(𝑥, 𝑦′) > 𝑓(𝑥, 𝑦̃) for 𝑦′ ∼ 𝑃_𝑦 then
14:      Y ← Y ∪ {𝑦̃} if 𝑓(𝑥, 𝑦̃) ⩾ 𝑓_Y(𝑥, 𝑦) and ∥𝑦̄ − 𝑦̃∥ > 𝑑^𝑦_min for all 𝑦̄ ∈ Y
15:      (𝑦̃, 𝜃^𝑦) ← (𝑦′, 𝜃^𝑦_𝑡)
16:    end if
17:    𝐹_𝑠 ← 𝑓_Y(𝑥, 𝑦̃) − 𝑓_Y(𝑥̃, 𝑦)
18:    (𝑥, 𝑦) ← (𝑥, 𝑦) + 𝜂_𝑐·(𝑥̃ − 𝑥, 𝑦̃ − 𝑦)
19:    break if 𝑠 ⩾ 𝑏_𝜂 and 𝐹_𝑠 > ⋯ > 𝐹_{𝑠−𝑏_𝜂+1}
20:  end for
21:  𝛾̃_𝑐, 𝜎_{𝛾̃_𝑐} ← slope(log(𝐹_1), …, log(𝐹_𝑠))
22:  if 𝛾̃ ⩾ 0 and 𝛾̃_𝑐 ⩾ 0 then
23:    𝜂 ← 𝜂/𝑐_𝜂³
24:  else if 𝛾̃_𝑐 ⩽ 𝛾̃ or 𝜂 = 𝜂_𝑐 then
25:    𝜂 ← 𝜂_𝑐, 𝛾̃ ← 𝛾̃_𝑐
26:  end if
27:  (𝑥, 𝑦, 𝜃^𝑥, 𝜃^𝑦) ← (𝑥_𝑡, 𝑦_𝑡, 𝜃^𝑥_𝑡, 𝜃^𝑦_𝑡) if 𝛾̃_𝑐 − 2𝜎_{𝛾̃_𝑐} > 0
28:  if 𝐹_𝑠 ⩽ 𝐺_tol then
29:    X ← X ∪ {𝑥}
30:    Y ← Y ∪ {𝑦} if ∥𝑦̄ − 𝑦̃∥ > 𝑑^𝑦_min for all 𝑦̄ ∈ Y
31:    Re-initialize 𝑥, 𝑦, 𝜃^𝑥, 𝜃^𝑦 and reset 𝜂 ← 1 and 𝛾̃ ← 0
32:  end if
33: end for
34: return argmin_{𝑥′∈X∪{𝑥}} 𝑓_Y(𝑥′, 𝑦)

rate 𝜂 (line 25). We also update 𝛾̃ when 𝜂 = 𝜂_𝑐. If both 𝛾̃ and 𝛾̃_𝑐 are non-negative, the learning rate is too high, and we reduce 𝜂 by multiplying it by 1/𝑐_𝜂³. If 𝛾̃_𝑐 − 2𝜎_{𝛾̃_𝑐} > 0, where 𝜎_{𝛾̃} is the estimated standard deviation of 𝛾̃, we revert the solutions and the other strategy parameters 𝜃^𝑥 and 𝜃^𝑦.
Based on our preliminary experiments and the above argument, we set 𝑎𝜂 = 1, 𝑏𝜂 = 5, and
𝑐𝜂 = 1.1 as the default values.
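The routine slope(·) in line 21 is not spelled out here; a natural reading (our assumption) is an ordinary least-squares fit of log 𝐹_𝑠 against 𝑠, returning the fitted slope 𝛾̃ together with the standard error 𝜎_𝛾̃ of the slope estimate. A minimal Python sketch under that assumption:

import numpy as np

def slope(log_F):
    # Least-squares slope of log F_s against s = 1, ..., len(log_F),
    # together with the standard error of the slope estimate.
    log_F = np.asarray(log_F, dtype=float)
    s = np.arange(1, len(log_F) + 1, dtype=float)
    A = np.vstack([s, np.ones_like(s)]).T
    (gamma, _), residuals, *_ = np.linalg.lstsq(A, log_F, rcond=None)
    dof = max(len(log_F) - 2, 1)
    resid_var = float(residuals[0]) / dof if residuals.size else 0.0
    se_gamma = np.sqrt(resid_var / np.sum((s - s.mean()) ** 2))
    return gamma, se_gamma

# Example: F_s decaying geometrically at rate exp(-0.1) with small noise.
rng = np.random.default_rng(0)
F = np.exp(-0.1 * np.arange(1, 21)) * np.exp(0.01 * rng.standard_normal(20))
print(slope(np.log(F)))  # slope close to -0.1 with a small standard error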


4.2 Ingenuity for practical use


Our approach is designed to locate a min–max saddle point. However, in practice, we often cannot
guarantee the existence of min–max saddle points. In such a situation, 𝑥 and 𝑦 may not converge
and oscillate. For example, consider 𝑓 (𝑥, 𝑦) = 𝑥 T𝑦 on [−1, 1] × [−1, 1]. The worst 𝑦 is −1 if 𝑥 < 0
and 1 if 𝑥 > 0, and the best 𝑥 is 1 if 𝑦 < 0 and −1 if 𝑦 > 0. This causes a cyclic behavior: 𝑥 is
positive, then 𝑦 becomes positive, then 𝑥 becomes negative, then 𝑦 becomes negative, and so on.
To stabilize the algorithm in such situations, we maintain a set Y of 𝑦 ∈ Y and replace 𝑓 with
𝑓 Y (𝑥, 𝑦) := max𝑦′ ∈Y∪{𝑦 } 𝑓 (𝑥, 𝑦 ′) using the approach described in Section 4.1. In the above example,
provided that there are points 𝑦1 < 0 and 𝑦2 > 0 in Y, the optimal 𝑥 of 𝑓 Y is zero regardless of 𝑦.
This is the optimal solution for min−1⩽𝑥 ⩽1 max−1⩽𝑦⩽1 𝑓 (𝑥, 𝑦). However, if we replace −𝑓 with −𝑓 Y
for the objective function of M 𝑦 , the optimization is likely to fail because 𝑓 Y (𝑥, 𝑦) is constant with
respect to 𝑦 over {𝑦 ∈ Y : 𝑓(𝑥, 𝑦) ⩽ 𝑓(𝑥, 𝑦′) for some 𝑦′ ∈ Y}. Therefore, we replace 𝑓 with 𝑓_Y only for the parts regarding 𝑥 optimization. We initialize Y with the empty set; hence, 𝑓_Y = 𝑓 at the beginning. The output 𝑦̃ of M_𝑦 is registered to Y if a random sample 𝑦′ ∼ 𝑃_𝑦 provides a worse objective value than 𝑦̃, 𝑓(𝑥, 𝑦̃) ⩾ 𝑓_Y(𝑥, 𝑦), and none of the registered points 𝑦̄ ∈ Y is in the closed ball centered at 𝑦̃ with radius 𝑑^𝑦_min, which is a hyperparameter.
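The cyclic behavior on 𝑓(𝑥, 𝑦) = 𝑥ᵀ𝑦 is easy to reproduce; the following scalar Python sketch (ours, for illustration) iterates update (7) with exact best responses on [−1, 1] × [−1, 1]:

import numpy as np

def best_x(y):  # argmin_{x in [-1, 1]} x * y
    return -float(np.sign(y))

def best_y(x):  # argmax_{y in [-1, 1]} x * y
    return float(np.sign(x))

x, y, eta = 0.5, 0.5, 0.3
for t in range(12):
    # Simultaneous update: both best responses see the current (x, y).
    x, y = x + eta * (best_x(y) - x), y + eta * (best_y(x) - y)
    print(f"t={t:2d}  x={x:+.3f}  y={y:+.3f}")
# The signs of x and y keep flipping in the cyclic pattern described above;
# registering past worst cases in the set Y breaks this cycle.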
The existence of multiple local min–max saddle points is another difficulty that is often encoun-
tered in practice. For such problems, we would like to locate a local min–max saddle point whose
worst-case objective value is as small as possible. To tackle this difficulty, we implement a restart
strategy in lines 28–32 of Algorithm 1. First, we check whether the current solution is nearly a local
min–max saddle point by checking 𝐹𝑠 ⩽ 𝐺 tol , where 𝐺 tol is a user-defined threshold parameter.
Note that 𝐹𝑠 can be close to zero at a local min–max saddle point even if the true suboptimality
error is nonzero because 𝐹𝑠 is computed using the outputs of M𝑥 and M 𝑦 . Therefore, a small 𝐹𝑠
value is a sign of a local min–max saddle point. If this restart criterion is satisfied, we register the
current solution 𝑥 as a candidate for the final solution and append the current 𝑦 to Y unless it is
sufficiently close to the already registered points in Y. Then, we re-initialize the solutions 𝑥 and 𝑦
and the internal parameters 𝜃 𝑥 and 𝜃 𝑦 , and restart the search with 𝜂 = 1.
The other details are described as follows. First, we allow the sharing of the internal parameters
𝜃 𝑥 and 𝜃 𝑦 of M𝑥 and M 𝑦 over oracle calls. Second, we feed the last outputs 𝑥˜ and 𝑦˜ to M𝑥 and
M 𝑦 as the reference points instead of the current solutions 𝑥 and 𝑦 if the former is better. This
contributes to realizing smaller approximation errors 𝜖. Third, we optionally try random samples
𝑥 ′ ∼ 𝑃𝑥 and 𝑦 ′ ∼ 𝑃 𝑦 and check if they are better than the outputs of the oracles if 𝑃𝑥 and 𝑃 𝑦 are
given. A typical choice for 𝑃𝑥 and 𝑃 𝑦 is the uniform distribution on X and Y if they are bounded.
Fourth, we optionally introduce the minimal learning rate 𝜂 min . Because a small 𝜂 slows down the
optimization speed, it is not practical to set an extremely small 𝜂, even though it is necessary for
convergence.

4.3 Adversarial-CMA-ES
We implemented the proposed approach with (1+1)-CMA-ES as M𝑥 and M 𝑦 . The (1+1)-CMA-ES is a
derivative-free randomized hill-climbing approach with step-size adaptation and covariance matrix
adaptation. It samples a candidate solution 𝑧 ′ ∼ N (𝑧, (𝜎𝐴) (𝜎𝐴) T ), where 𝜎 is the step size and
𝐴 · 𝐴T is the covariance matrix. The step size is adapted with the so-called 1/5-success rule [Devroye
1972; Rechenberg 1973; Schumer and Steiglitz 1968], which maintains 𝜎 such that the probability
of generating a better solution is approximately 1/5. We implemented a simplified 1/5-success rule
proposed by [Kern et al. 2004]. The covariance matrix was adapted with the active covariance
matrix update [Arnold and Hansen 2010]. The results show empirically that the covariance matrix


Algorithm 2 (1+1)-CMA-ES as Minimization Oracle

Require: ℎ : Rℓ → R, 𝑧 ∈ Rℓ, 𝜎 > 0, 𝐴 ∈ Rℓ×ℓ, ℎ_𝑧 = ℎ(𝑧), 𝜏_es, 𝜏′_es ∈ N
Require: (optional) 𝜎̄_min ⩾ 0
1: 𝑐 = 𝑒^{2/(2+ℓ)}, 𝑐_𝑝 = 1/12, 𝑐_𝑐 = 2/(ℓ+2), 𝑐_cov+ = 2/(ℓ²+6), 𝑐_cov− = 0.4/(ℓ^{1.6}+1), 𝑝_thre = 0.44
2: 𝑝 ← 0 ∈ Rℓ, 𝑝_succ ← 0.5 ∈ [0, 1], 𝑛_succ = 0
3: Initialize 𝐻 ∈ R⁵ with 𝐻₁ = ℎ_𝑧 and 𝐻₂ = 𝐻₃ = 𝐻₄ = 𝐻₅ = ∞
4: while 𝑛_succ < 𝜏_es·ℓ + 𝜏′_es do
5:   𝑧′ ← 𝑧 + 𝜎𝐴·N(0, 𝐼)
6:   ℎ_{𝑧′} = ℎ(𝑧′)
7:   if ℎ_{𝑧′} ⩽ 𝐻₁ then
8:     𝐻 ← (ℎ_{𝑧′}, 𝐻₁, 𝐻₂, 𝐻₃, 𝐻₄)
9:     𝑝_succ ← (1 − 𝑐_𝑝)·𝑝_succ + 𝑐_𝑝
10:    if 𝑝_succ > 𝑝_thre then
11:      𝑝 ← (1 − 𝑐_𝑐)·𝑝, 𝑐_cov = 𝑐_cov+·(1 − 𝑐_𝑐·(2 − 𝑐_𝑐))
12:    else
13:      𝑝 ← (1 − 𝑐_𝑐)·𝑝 + √(𝑐_𝑐·(2 − 𝑐_𝑐))·(𝑧′ − 𝑧)/𝜎, 𝑐_cov = 𝑐_cov+
14:    end if
15:    𝑤 = 𝐴_inv·𝑝
16:    𝑎 = √(1 − 𝑐_cov), 𝑏 = (√(1 − 𝑐_cov)/∥𝑤∥²)·(√(1 + 𝑐_cov·∥𝑤∥²/(1 − 𝑐_cov)) − 1)
17:    𝐴 ← 𝑎·𝐴 + 𝑏·(𝐴·𝑤)·𝑤ᵀ, 𝐴_inv ← (1/𝑎)·𝐴_inv − (𝑏/(𝑎² + 𝑎·𝑏·∥𝑤∥²))·𝑤·(𝑤ᵀ𝐴_inv)
18:    𝜎 ← 𝜎·𝑐, 𝑧 ← 𝑧′, 𝑛_succ ← 𝑛_succ + 1
19:  else
20:    𝑝_succ ← (1 − 𝑐_𝑝)·𝑝_succ
21:    if ℎ_{𝑧′} > 𝐻₅ and 𝑝_succ ⩽ 𝑝_thre then
22:      𝑤 = 𝐴_inv·(𝑧′ − 𝑧)/𝜎
23:      𝑐_cov = 𝑐_cov− if 𝑐_cov−·(2·∥𝑤∥² − 1) ⩽ 1 else 𝑐_cov = 1/(2·∥𝑤∥² − 1)
24:      𝑎 = √(1 + 𝑐_cov), 𝑏 = (√(1 + 𝑐_cov)/∥𝑤∥²)·(√(1 − 𝑐_cov·∥𝑤∥²/(1 + 𝑐_cov)) − 1)
25:      𝐴 ← 𝑎·𝐴 + 𝑏·(𝐴·𝑤)·𝑤ᵀ, 𝐴_inv ← (1/𝑎)·𝐴_inv − (𝑏/(𝑎² + 𝑎·𝑏·∥𝑤∥²))·𝑤·(𝑤ᵀ𝐴_inv)
26:    end if
27:    𝜎 ← 𝜎·𝑐^{−1/4}
28:  end if
29:  𝜎 ← 𝜎·∥𝐴∥_𝐹/√ℓ, 𝐴 ← 𝐴·√ℓ/∥𝐴∥_𝐹, 𝐴_inv ← 𝐴_inv·∥𝐴∥_𝐹/√ℓ, 𝑝 ← 𝑝·√ℓ/∥𝐴∥_𝐹 every ℓ iterations
30:  break if 𝜎 < 𝜎̄_min
31: end while
32: return 𝑧, max(𝜎, 𝜎̄_min), 𝐴

learned the inverse Hessian matrix on a convex quadratic function. The algorithm is summarized
in Algorithm 2. We call Algorithm 1 with Algorithm 2 as M_𝑥 and M_𝑦 Adversarial-CMA-ES.
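As a sanity check on lines 15–17 of Algorithm 2 (ours, not from the paper), one can verify numerically that the rank-one update of the factor 𝐴 and the companion update of 𝐴_inv stay mutually inverse while realizing the covariance update 𝐶⁺ = (1 − 𝑐_cov)·𝐴𝐴ᵀ + 𝑐_cov·𝑝𝑝ᵀ:

import numpy as np

rng = np.random.default_rng(1)
ell, c_cov = 5, 0.1
A = rng.standard_normal((ell, ell)) + 3.0 * np.eye(ell)  # well-conditioned factor
A_inv = np.linalg.inv(A)
p = rng.standard_normal(ell)

w = A_inv @ p
s = w @ w
a = np.sqrt(1.0 - c_cov)
b = (np.sqrt(1.0 - c_cov) / s) * (np.sqrt(1.0 + c_cov * s / (1.0 - c_cov)) - 1.0)
A_new = a * A + b * np.outer(A @ w, w)
A_inv_new = (1.0 / a) * A_inv - (b / (a**2 + a * b * s)) * np.outer(w, w @ A_inv)

print(np.allclose(A_new @ A_inv_new, np.eye(ell)))  # True: inverses stay in sync
C_new = (1.0 - c_cov) * (A @ A.T) + c_cov * np.outer(p, p)
print(np.allclose(A_new @ A_new.T, C_new))          # True: intended rank-one update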
We shared the strategy parameter 𝜃 = (𝜎, 𝐴) over oracle calls. Here, we implicitly assumed that
the objective function ℎ of the current oracle call and that of the last oracle call are similar because
the changes in 𝑥𝑡 and 𝑦𝑡 are small if 𝜂 is small. Then, reusing the strategy parameter of the last
oracle call reduced the need for its adaptation time.
We ran (1+1)-CMA-ES until it improved the solution 𝜏_es·ℓ + 𝜏′_es times. The reason for this procedure is described below. Because the step size is maintained such that the probability of generating a


successful solution is approximately 1/5, the algorithm runs approximately 𝑇 = 5·(𝜏_es·ℓ + 𝜏′_es) iterations. It was shown in [Morinaga and Akimoto 2019] that the expected runtime E[𝑇_𝜖] of
(1+1)-ES with the simplified 1/5-success rule is Θ(log(1/𝜖)) on strongly convex functions with
Lipschitz continuous gradients and their strictly increasing transformations. Moreover, the scaling
of the runtime with respect to dimension ℓ is Θ(ℓ) on general convex quadratic functions [Morinaga
et al. 2021]. Therefore, we expect that 𝑇 iterations of (1+1)-CMA-ES approximates M with 𝜖 ∈
exp(−Θ(𝑇 /ℓ)) = exp(−Θ(1)). The reason that we count the number of successful iterations instead
of the number of total iterations is to avoid producing no progress because of a bad initialization of
each oracle call.
Another optional stopping condition is 𝜎 < 𝜎¯min for a given minimal step size 𝜎¯min ⩾ 0. Once 𝜎
reaches 𝜎¯min , Algorithm 2 returns 𝜎 = 𝜎¯min . Then, the next M call starts with 𝜎 = 𝜎¯min and it is
expected to stop after a few iterations. That is, if 𝜎 for M𝑥 reaches 𝜎¯min while 𝜎 > 𝜎¯min for M 𝑦 ,
Algorithm 1 spends more 𝑓 -calls for M 𝑦 than for M𝑥 , and vice versa.
Based on our preliminary experiments, we set 𝜏_es = 𝜏′_es = 5 as their default values. If they are set greater, we expect that (1+1)-CMA-ES approximates condition (8) with a smaller 𝜖.

4.4 Adversarial-SLSQP
We also implemented the algorithm with a sequential least squares quadratic programming (SLSQP)
subroutine [Kraft 1988] to demonstrate the applicability of the proposed 𝜂 adaptation mechanism.
This was a first-order approach, which required access to ∇𝑓 . Unlike Adversarial-CMA-ES, no
strategy parameter for SLSQP is shared over oracle calls. The maximum number of iterations is
set to 𝜏slsqp = 5. We used the scipy implementation of SLSQP as M in Algorithm 1. We call this
first-order approach Adversarial-SLSQP (ASLSQP).
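A minimal sketch (ours; the released implementation may differ in details) of wrapping scipy's SLSQP as the oracle M:

import numpy as np
from scipy.optimize import minimize

def slsqp_oracle(h, z_ref, grad=None, tau_slsqp=5):
    # Approximate minimization oracle backed by SLSQP, capped at a small
    # number of iterations as in Adversarial-SLSQP.
    res = minimize(h, z_ref, jac=grad, method="SLSQP",
                   options={"maxiter": tau_slsqp})
    # Keep the reference solution if SLSQP made no progress.
    return res.x if res.fun < h(z_ref) else np.asarray(z_ref, dtype=float)

# Usage: minimization over x for a fixed y.
f = lambda x, y: 0.5 * x @ x + x @ y - 0.5 * y @ y
y = np.array([1.0, -1.0])
x_tilde = slsqp_oracle(lambda x: f(x, y), np.zeros(2))
print(x_tilde)  # close to argmin_x f(x, y) = -y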

5 NUMERICAL EVALUATION
Through experiments on test problems as described below, we confirmed the following hypotheses.
(A) Our implementations of the proposed approach, Adversarial-CMA-ES and Adversarial-SLSQP,
performed as well as the theory implies. (B) Our learning rate adaptation located a nearly optimal
learning rate with little compromise of the objective function calls. (C) Local strong convexity–
concavity of the objective function is necessary for good min–max performance of the proposed
approach. (D) Existing coevolutionary approaches fail to converge even on a convex–concave
quadratic problem.

5.1 On Convex–Concave Quadratic Functions


To confirm (A) and (B), we ran Adversarial-CMA-ES and Adversarial-SLSQP on the following
convex–concave quadratic function 𝑓₁ : R𝑚 × R𝑛 → R with 𝑛 = 𝑚:
\[
f_1(x, y) = \frac{a}{2}\|x\|^2 + b\,\langle x, y\rangle - \frac{c}{2}\|y\|^2, \tag{17}
\]
where 𝑎, 𝑐 > 0 and 𝑏 ∈ R. The global min–max saddle point is located at (𝑥*, 𝑦*) = (0, 0). The suboptimality error is 𝐺₁(𝑥, 𝑦) = ((𝑎·𝑐 + 𝑏²)/(2𝑎·𝑐))·(∥𝑥∥² + ∥𝑦∥²). In this problem, we have 𝛼_𝐻 = 𝛽_𝐻 = 1 and 𝛼_𝐺 = 𝛽_𝐺 = 1 + 𝑏²/(𝑎·𝑐); hence, for 𝜖 ≪ 1, we have 𝜂* ≈ 𝜂̄ ≈ 𝑎·𝑐/(𝑎·𝑐 + 𝑏²). Moreover, for 𝜂 = 𝛿·𝜂* with 𝛿 ∈ (0, 2), 𝛾 = −(𝑎·𝑐/(𝑎·𝑐 + 𝑏²))·𝛿(2 − 𝛿). That is, Theorem 3.3 indicates that the runtime of the proposed approach with a fixed learning rate is proportional to (1 + 𝑏²/(𝑎·𝑐))·1/(𝛿(2 − 𝛿)).
The experimental settings were as follows. We draw the initial solution (𝑥, 𝑦) uniform-randomly
from [−1, 5]𝑚 × [−1, 5] 𝑛 . The strategy parameters for Adversarial-CMA-ES are 𝜃 𝑥 = (𝜎 𝑥 , 𝐴𝑥 ) and
𝜃 𝑦 = (𝜎 𝑦 , 𝐴𝑦 ). The step sizes 𝜎 𝑥 and 𝜎 𝑦 are initialized as one-fourth of the length of the initialization

[Figure 1: two panels, (a) Adversarial-CMA-ES and (b) Adversarial-SLSQP; horizontal axis 𝜂/𝜂*, vertical axis the number of 𝑓-calls.]
Fig. 1. The number of 𝑓-calls until 𝐺(𝑥, 𝑦) ⩽ 10⁻⁵ is reached on 𝑓₁ with 𝑛 = 𝑚 = 10 and 𝑎 = 𝑏 = 𝑐 = 1. (a) Adversarial-CMA-ES with 𝜂-adaptation (adapt) and fixed 𝜂 = 𝜂* × 2^{(3−𝑘)/3} for 𝑘 = 1, …, 12. (b) Adversarial-SLSQP with 𝜂-adaptation (adapt) and fixed 𝜂 = 𝜂* × 2^{(3−𝑘)/3} for 𝑘 = 1, …, 12. The dashed lines are proportional to 1/((𝜂/𝜂*)(2 − 𝜂/𝜂*)).

interval, that is, 𝜎ˣ = 𝜎ʸ = 1.5. The factors 𝐴ˣ and 𝐴ʸ are initialized by the identity matrix. We used the default hyperparameter values described in the previous section. We omitted lines 12–13 and lines 28–32 of Algorithm 1 (i.e., neither 𝑃_𝑥 nor 𝑃_𝑦 are given and 𝐺_tol = 0) in this experiment. The minimal learning rate was set to 𝜂_min = 10⁻⁴. The minimal step sizes are set to 𝜎̄ˣ_min = 𝜎̄ʸ_min = 0. We run 50 independent trials for each setting, with the maximum number of 𝑓-calls of 10⁷.
Figure 1 compares the proposed approaches with and without the 𝜂-adaptation mechanism. For fixed 𝜂 cases, we set 𝜂 to 𝛿·𝜂* with 𝛿 ∈ {2^{(3−𝑘)/3} : 𝑘 = 1, …, 12}. We remark that for both algorithms, all the trials with 𝜂 = 2×𝜂* fail to converge, as implied by Theorem 3.3. As expected, the runtimes of both algorithms with fixed 𝜂 were nearly proportional to 1/((𝜂/𝜂*)(2 − 𝜂/𝜂*)). The best 𝜂 is approximately

𝜂 . We conclude that our implementations closely approximate the oracle condition (8) and that
the proposed approach works as the theory implies.
The proposed approach with the 𝜂-adaptation mechanism succeeds in converging toward the
global min–max saddle point. Comparing the runtime of the 𝜂-adaptation mechanism and the best
fixed 𝜂 = 𝜂 ∗ , we compromise the number of 𝑓 -calls at most three times in the median case for
both Adversarial-CMA-ES and Adversarial-SLSQP to adapt 𝜂. There are also trials that required
a few times more runtime than the median case. However, considering the difficulty in tuning 𝜂
in advance, we conclude that this 𝜂-adaptation mechanism is promising to waive the need for 𝜂
tuning in advance.
Figures 2a and 2b show the runtime of the proposed approaches with and without 𝜂-adaptation
for varying 𝑏 and for varying 𝑎/𝑐. For the fixed 𝜂 case, we set 𝜂 = 𝜂*. It may be observed that the runtimes in terms of the number of iterations are proportional to 1 + 𝑏²/(𝑎·𝑐), as expected from
Theorem 3.3. Moreover, the number of iterations was largely the same for all algorithms, as they all
approximate (8) with 𝜖 ≪ 1. In contrast, the number of 𝑓 -calls was different for the two algorithms.
This is because Adversarial-CMA-ES is expected to spend approximately 5(𝜏es × ℓ + 𝜏es ′ ) = 275

𝑓 -calls per oracle call, whereas Adversarial-SLSQP spends 𝜏slsqp = 5 𝑓 -calls. We remark that if one of
the CMA-ES in Adversarial-CMA-ES (i.e., either M𝑥 or M 𝑦 ) is replaced with SLSQP, the number of
𝑓 -calls will be approximately halved. Therefore, it is advisable to use SLSQP, or another first-order
approach, as an approximate minimization oracle if ∇𝑓 is available and cheap to compute. Figure 2c
shows the scaling of the runtime with respect to the dimension 𝑛 = 𝑚. The number of iterations did

[Figure 2: three panels of #iterations (top row) and #𝑓-calls (bottom row), with legends CMAES(𝜂*), CMAES(adapt), SLSQP(𝜂*), SLSQP(adapt); (a) varying 𝑏 with 𝑎 = 𝑐 = 1 and 𝑛 = 𝑚 = 10, (b) varying 𝑎 = 1/𝑐 with 𝑏 = 𝑎𝑐 = 1 and 𝑛 = 𝑚 = 10, (c) varying 𝑛 = 𝑚 with fixed 𝑎 = 𝑐 = 1 and 𝑏 = 2.]
Fig. 2. The number of iterations and the number of 𝑓-calls until 𝐺₁(𝑥, 𝑦) ⩽ 10⁻⁵ is reached on 𝑓₁. The solid lines indicate the median and the shaded areas indicate the 10–90 percentile ranges. Dashed lines are proportional to 1 + 𝑏²/(𝑎·𝑐).

not depend on the search space dimension. The number of 𝑓 -calls was also constant over varying
𝑛 = 𝑚 for Adversarial-SLSQP. However, it was proportional to 𝑛 + 𝑚 for Adversarial-CMA-ES.3
This is because the runtime of (1+1)-CMA-ES is proportional to the dimension, and iterations must
be run proportional to the search space dimension to approximate (8).

5.2 Comparison with Baseline Approach


To confirm (C) and (D), we ran Adversarial-CMA-ES on the six test problems summarized in Table 1.
In all cases, the domain of the objective function is X × Y = [−1, 5]𝑚 × [−1, 5] 𝑛 . The function 𝑓2 is
globally strongly convex–concave, while 𝑓3 is locally strongly convex–concave. The function 𝑓4
is globally convex–concave but not strongly convex–concave. These functions exhibited a global
min–max saddle point at (𝑥 ∗, 𝑦 ∗ ) = (0, 0) and 𝑥 ∗ was the global optimal solution to the worst-case
objective 𝐹 (𝑥) = 𝑓 (𝑥, 𝑦ˆ (𝑥)). The function 𝑓5 was not strongly convex–concave,
Í but the worst case
𝑦 is independent of 𝑥, and the optimal 𝑥 is constant over 𝑦 such that 𝑛𝑗=1 𝑦 𝑗 > 0. The optimal
solutions 𝑥 ∗ = 0 to the worst-case objective functions for 𝑓6 and 𝑓7 were not min–max saddle points.
The experimental settings were as follows. We ran Adversarial-CMA-ES with and without
sampling distributions 𝑃𝑥 and 𝑃 𝑦 . For the distributions 𝑃𝑥 and 𝑃 𝑦 , uniform distributions over X and
Y are used. Moreover, we use the same initialization as in Section 5.1. The minimal learning rate is
𝜂 min = 10−4 . The minimal step sizes were set to 𝜎¯min
𝑥 𝑦
= 𝜎¯min = 0. The restart was not performed,
that is, 𝐺 tol = 0. The boundary constraint was handled using the mirroring technique, that is, the
domain was virtually extended to R𝑚 × R𝑛 by defining the function value 𝑓 (𝑥, 𝑦) for (𝑥, 𝑦) ∉ X × Y
3 We comment on the computational complexity of the algorithm. The bottleneck of the execution time of each iteration of
Algorithm 1 is an M𝑥 -call and an M 𝑦 -call. The execution time for the 𝜂-adaptation was negligible. The time and space
complexity of Algorithm 2 per 𝑓 -call is 𝑂 (ℓ 2 ), where ℓ is the search space dimension. Therefore, if the number of 𝑓 -calls
scales linearly in 𝑛 + 𝑚, the execution time of Adversarial-CMA-ES scales as 𝑂 (𝑛 3 + 𝑚 3 ).

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:17

Table 1. Definition of the test functions 𝑓 (𝑥, 𝑦) and their worst-case variable 𝑦ˆ (𝑥) = argmax𝑦 ∈Y 𝑓 (𝑥, 𝑦)

𝑓 (𝑥, 𝑦) 𝑦ˆ (𝑥)
1 1 Í𝑚 Í𝑛 1 1 Í𝑚

2 ∥𝑥 ∥ + 𝑚 𝑖=1 𝑥𝑖 − 2 ∥𝑦 ∥2 𝑚 Í𝑖=1 𝑥𝑖  1
𝑓2 2
𝑗=1 𝑦 𝑗
1 1 Í𝑚 Í𝑛 𝑚
𝑓3 2 min ∥𝑥 ∥ 2, ∥𝑥 − 4 · 1∥ 2 + 𝑚 𝑖=1 𝑥𝑖 𝑗=1 𝑦 𝑗 − 21 ∥𝑦 ∥ 2 1
𝑚 𝑖=1 𝑥𝑖 1
1 1 Í𝑚 Í𝑛 1 2 2
 1 Í 𝑚  13
2 ∥𝑥 ∥ + − 2 ∥𝑦 ∥ 1
𝑓4 2
Í𝑚 𝑖=1 𝑥𝑖 𝑗=1 𝑦 𝑗 𝑚 ·𝑛 𝑖=1 𝑥𝑖
1 𝑛
𝑓5 𝑚 Í∥𝑥 𝑖 ∥ 𝑗=1 𝑦 𝑗 5·1 
1 𝑚 Í𝑛 1 1 Í𝑚
𝑗=1 𝑦 𝑗 − 2 ∥𝑦 ∥ (𝑚 𝑖=1 𝑥𝑖Í 1
𝑓6 2
𝑚 𝑖=1 𝑥𝑖
𝑚
1 Í𝑚 Í𝑛 5·1 𝑖=1 𝑥𝑖 ⩾ 0
𝑓7 2 ∥𝑥 ∥
2 + 𝑚1 𝑖=1 𝑥𝑖 𝑗=1 𝑦 𝑗 Í𝑚
−1 · 1 𝑖=1 𝑥𝑖 < 0

by 𝑓 (𝑇X (𝑥),𝑇Y (𝑦)), where 𝑇X and 𝑇Y map each coordinate to 𝑈 − |mod(𝑥 − 𝐿, 2(𝑈 − 𝐿)) − (𝑈 − 𝐿)|,
where 𝑈 = 5 and 𝐿 = −1 denote the upper and lower bounds of each coordinate. The output of (1+1)-
CMA-ES (M𝑥 and M 𝑦 ) is repaired into the feasible domain by applying 𝑇X and 𝑇Y . We compare
the results with those of the naive baseline approach, referred to as CMA-ES(𝑁 𝑦 ). We sampled
𝑁 𝑦 = 10 or 100 points uniform-randomly in Y, and they were denoted as 𝑦𝑘 for 𝑘 = 1, . . . , 𝑁 𝑦 . The
approximate worst-case objective was defined as 𝐹 𝑁 𝑦 (𝑥) = max1⩽𝑘 ⩽𝑁 𝑦 𝑓 (𝑥, 𝑦𝑘 ). Then, we solve 𝐹 𝑁 𝑦
with (1+1)-CMA-ES (Algorithm 2) using mirroring boundary constraint handling. These algorithms
are run 10 times with different initial solutions. We also compared two coevolutionary approaches,
MMDE [Qiu et al. 2018] and COEVA [Al-Dujaili et al. 2019]. These approaches were implemented
based on the Python code provided by the authors of [Al-Dujaili et al. 2019].
Figure 3 shows the results of 10 independent trials of Adversarial-CMA-ES, CMA-ES(𝑁 𝑦 = 10),
CMA-ES(𝑁 𝑦 = 100), MMDE, and COEVA. Adversarial-CMA-ES succeeds in converging the global
min–max saddle point on 𝑓2 , 𝑓3 , and 𝑓6 . The functions 𝑓2 and 𝑓3 were locally strongly convex–concave
functions, and Adversarial-CMA-ES performed well, as expected. The existing coevolutionary
approaches, as well as CMA-ES(𝑁 𝑦 ), failed to converge on these problems. The benchmark problems
used to evaluate the performance of existing coevolutionary approaches [Branke and Rosenbusch
2008; Qiu et al. 2018; Zhou and Zhang 2010] are rather low-dimensional problems (𝑚 ⩽ 2 and
𝑛 ⩽ 2). These approaches do not work well on higher-dimensional problems and perform worse
than the simple baseline, CMA-ES(𝑁 𝑦 ). CMA-ES(𝑁 𝑦 ) tends to the global optimal point on 𝑓5 . This is
because the optimal 𝑥 ∗ is optimal
Í for approximate worst-case functions provided that there exists
𝑦 in 𝑁 𝑦 samples such that 𝑛𝑖=1 𝑦𝑖 > 0 holds. In contrast, no approach succeeded in converging
toward the global optimum of the worst-case function on 𝑓4 , 𝑓6 , and 𝑓7 . From these results, we
conclude that local strong convexity–concavity is an important factor for the convergence of
Adversarial-CMA-ES. These results reveal the limitations of Adversarial-CMA-ES and the difficulty
of locating the solution to the worst-case objective if it is not a min–max saddle point.

6 APPLICATION TO ROBUST BERTHING CONTROL


In this section, we analyze the application of Adversarial-CMA-ES to robust berthing control tasks
under model uncertainty.

6.1 Problem Description


The objective of our robust berthing control task is to obtain a controller that controls a subject
ship toward the target position near a berth while avoiding collision to the berth under the worst
situation in the predefined uncertainty set. The control target was a 3 m model ship representing

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:18 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

103 103 103


maxy f (x, y) − minx maxy f (x, y)

maxy f (x, y) − minx maxy f (x, y)

maxy f (x, y) − minx maxy f (x, y)


101 101 101

10−1 10−1 10−1


Adv-CMA-ES(P)
10−3 Adv-CMA-ES(no P) 10−3 10−3
CMA-ES(Ny = 10)
CMA-ES(Ny = 100)
10−5 MMDE
10−5 10−5
COEVA
10−7 10−7 10−7
0 200000 400000 0 200000 400000 0 200000 400000
#f -calls #f -calls #f -calls

(a) 𝑓2 (b) 𝑓3 (c) 𝑓4

103 103 103


maxy f (x, y) − minx maxy f (x, y)

maxy f (x, y) − minx maxy f (x, y)

maxy f (x, y) − minx maxy f (x, y)


101 101 101

10−1 10−1 10−1

10−3 10−3 10−3

10−5 10−5 10−5

10−7 10−7 10−7


0 200000 400000 0 200000 400000 0 200000 400000
#f -calls #f -calls #f -calls

(d) 𝑓5 (e) 𝑓6 (f) 𝑓7

Fig. 3. Results of 10 independent runs of Adversarial-CMA-ES with and without sampling distribution
(denoted as Adv-CMA-ES(P) and Adv-CMA-ES(no P), respectively), CMA-ES(𝑁 𝑦 = 10), CMA-ES(𝑁 𝑦 = 100),
MMDE, and COEVA. The search space dimension is 𝑚 = 50 and 𝑛 = 20 for all cases.

the vessel MV ESSO OSAKA (Figure 4a). The state variables 𝑠 = (𝑋, 𝑢, 𝑌 , 𝑣𝑚 ,𝜓, 𝑟 ) ∈ R6 and the
control signal 𝑎 = (𝛿, 𝑛 p, 𝑛 BT, 𝑛 ST ) ∈ R4 were as described in Figure 4b. The controller 𝑢𝑥 : 𝑠 ↦→ 𝑎
was modeled by a neural network with 𝑚 = 99 dimensional parameter vector 𝑥. The objective
function 𝑓 (𝑥, 𝑦) measures the distance between the target position and the final position of the
subject ship after a pre-defined control time using the controller 𝑢𝑥 , penalized by the risk of the
collision to the berth, under an uncertainty parameter 𝑦 ∈ Y described below. The details of the
controller and the objective function are explained in Appendix B.
The wind conditions 𝑦 (𝐴) and the model coefficients 𝑦 (𝐵) with respect to the wind forces are
treated as the uncertain factors 𝑦 = (𝑦 (𝐴) , 𝑦 (𝐵) ). The following three situations are considered. (A)
The state equation model is accurately modeled, but the wind conditions are uncertain. In this
situation, the uncertainty parameters 𝑦 (𝐴) ∈ Y𝐴 represent the wind velocity 𝑈𝑇 [m/s] and the wind
direction 𝛾𝑇 [rad], and their feasible values are in Y𝐴 = [0, 0.5] × [0, 2𝜋]. The model coefficients 𝑦 (𝐵)
(𝐵)
are set to the same values as in [Miyauchi et al. 2021b], denoted by 𝑦est . (B) Wind conditions are
known, but the state equation model is uncertain. The coefficients in the state equation model for
the effect of the wind force were derived in [Fujiwara et al. 1998] using regression of wind tunnel
experiment data, and we consider them to be relatively inaccurate. The uncertainty parameters
𝑦 (𝐵) consist of 10 coefficients for the wind force. The feasible domain Y𝐵 is constructed to include
(𝐵)
the coefficient used in [Miyauchi et al. 2021b], that is, 𝑦est ∈ Y𝐵 . For each variable, the feasible

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:19

X
Berth

UT U
u
b
-v
UT
gT UA nB
r
o
n gA
nS

d
y

(a) Photo of 3 m model ship of ESSO OSAKA


Y
O

(b) Coordinate systems

Fig. 4. Control target: 3 m model ship of ESSO OSAKA

(𝐵)
values are defined by the interval. The interval of the 𝑖th component of 𝑦 (𝐵) , denoted by [𝑦est ]𝑖 ,
(𝐵) (𝐵)
is set to [0.9 · [𝑦est ] 𝑖 , 1.1 · [𝑦est ] 𝑖 ] for all 𝑖 = 1, . . . , 10. The other model coefficients are set to the
(𝐴)
same values as [Miyauchi et al. 2021b] and the wind condition is set to 𝑦est = (1.5𝜋, 0.5), meaning
that the velocity of wind blowing orthogonally from the sea to the berth is 0.5 [m/s]. (C) Wind
conditions are unknown, and the model coefficients are uncertain. In this situation, 𝑦 is composed
of the uncertainty parameters 𝑦 (𝐴) and 𝑦 (𝐵) , and the feasible values are set to Y𝐶 = Y𝐴 × Y𝐵 .

6.2 Experiment Setting


We ran Adversarial-CMA-ES and CMA-ES(𝑁 𝑦 = 100) on Y𝐴 , Y𝐵 , and Y𝐶 .4 As baselines, we run
(1+1)-CMA-ES on 𝑓 (𝑥, 𝑦fix ) under two different situations for 𝑦fix ∈ Y. The first situation was
(𝐴) (𝐵) (𝐴)
𝑦fix = (𝑦no , 𝑦est ), where 𝑦no = (0, 0) corresponds to no wind disturbance, and the second situation
(𝐴) (𝐵) (𝐴)
was 𝑦fix = (𝑦est , 𝑦est ), where 𝑦est = (1.5𝜋, 0.5) reflects our prior knowledge that such a wind
is difficult to handle for avoiding collision with the berth. Note that (1+1)-CMA-ES on 𝑓 (𝑥, 𝑦fix )
corresponds to CMA-ES(𝑁 𝑦 = 1) with 𝑦 1 = 𝑦fix . Each algorithm was run 20 times independently
with random initialization of 𝑥 and 𝑦. The search space for 𝑥 and 𝑦 was scaled to X = [−1, 1]𝑚 and
Y = [−1, 1] 𝑛 . The box constraint was treated using the mirroring technique described in Section 5.2.
The initial solution (𝑥, 𝑦) was drawn uniform-randomly from X × Y. For CMA-ES(𝑁 𝑦 = 100), 𝑦𝑘
for 𝑘 = 1, . . . , 𝑁 𝑦 were uniform-randomly generated. The step sizes 𝜎 𝑥 and 𝜎 𝑦 are initialized as
one-fourth of the length of the initialization interval. The factors 𝐴𝑥 and 𝐴𝑦 are initialized by the
−8 𝜎 𝑦 . We set 𝐺 tol = 10−6 and
√ The minimal step size is 𝜎¯min = 10 for both 𝜎 and
identity matrix. 𝑥
𝑦
𝑑 min = 𝜎¯min × 𝑛 for Adversarial-CMA-ES. The 𝑓 -call budget was 10 . 6

4 CMA-ES(𝑁
𝑦 = 10), MMDE and COEVA tested in Section 5.2 were omitted from the comparison based on our preliminary
experiments. The worst-case performance of CMA-ES(𝑁 𝑦 = 10) were worse than the worst-case performance of CMA-
ES(𝑁 𝑦 = 100) on our problems. The worst-case performance of MMDE and COEVA were not competitive to the other
approaches as demonstrated in Figure 3.

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:20 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

For Adversarial-CMA-ES, we used the restart strategy proposed in Algorithm 1. The output
of Adversarial-CMA-ES follows Algorithm 1. For CMA-ES(𝑁 𝑦 ), when the termination condition
𝜎 < 𝜎¯min was satisfied, the candidate solution was recorded and the algorithm was re-started until
it exhausted the 𝑓 -call budget. Note that 𝑦𝑘 (𝑘 = 1, . . . , 𝑁 𝑦 ) were not resampled. The output of CMA-
ES(𝑁 𝑦 ) is determined as follows: Let {𝑥 1, . . . , 𝑥 𝑟 } be the set of recorded candidate solutions and the
solution obtained at the end of the run. We then selected 𝑥 = argmin𝑖=1,...,𝑟 max𝑘=1,...,𝑁 𝑦 𝑓 (𝑥 𝑖 , 𝑦𝑘 ) as
the output of CMA-ES(𝑁 𝑦 ).
The obtained solutions were evaluated as follows. Because the ground truth worst-case objective
function value 𝐹 (𝑥) = max𝑦 ∈Y 𝑓 (𝑥, 𝑦) for a given 𝑥 is unknown, we perform numerical optimization
to approximate 𝐹 (𝑥). We ran (1+1)-CMA-ES for 500 × 𝑛 iterations to obtain a local maximal point 𝑦
of 𝑓 (𝑥, 𝑦). As the objective is expected to have multiple local optima, we repeat it 100 times with
different initial search points 𝑦. The initialization of (1+1)-CMA-ES is as described above.

6.3 Results and Discussion


Figure 5 shows the performance of the resulting controllers of 20 independent trials of each
algorithm under different situations. Some of the trajectories observed for the obtained controllers
are discussed in Appendix C.
(𝐴) (𝐵)
The (1+1)-CMA-ES on 𝑦 = (𝑦no , 𝑦est ) achieves the best performance under no wind disturbance
(𝐴) (𝐵)
(Figure 5a), while (1+1)-CMA-ES on 𝑦 = (𝑦est , 𝑦est ) achieved the best performance under the
(𝐴) (𝐴)
certain wind condition, 𝑦 = 𝑦est (Figure 5c). In all trials, they achieved a cost of < 10−3 . However,
their performances significantly were degraded under the worst case, particularly when the wind
condition was unknown (Figures 5b and 5e), where the ship collided with the berth and the cost
was > 10. The performance was less affected by the uncertainty in the model coefficients in
this experiment, but the effect would be enhanced if we consider a wider uncertainty set Y (𝐵) .
Nonetheless, these results demonstrate the importance of considering model uncertainty to obtain
robust berthing control.
The controllers obtained by Adversarial-CMA-ES and CMA-ES(𝑁 𝑦 = 100) on Y𝐴 and Y𝐶 achieved
better performance under the worst situation in Y𝐴 (Figure 5b) than those obtained by the other
approaches. The numbers of trials that maxA, maxC, advA, and advC succeed in avoiding collision
with the berth under the worst case in Y𝐴 were 3, 1, 4, and 11 out of 20. Interestingly, Adversarial-
CMA-ES on Y𝐶 (advC) achieved better worst-case performance in Y𝐴 than Adversarial-CMA-ES on
Y𝐴 (advA), even though advC considered a wider uncertainty set and hence was expected to show
more conservative performance. The reason for the superior performance of advC may be explained
as follows. Both 𝑦 (𝐴) and 𝑦 (𝐵) represent the uncertainty regarding the wind force. Considering the
worst case in Y𝐶 results in being more conservative to the wind force. This may help to obtain a
controller that can avoid the collision with the berth, while possibly losing the accuracy of the final
position.
The advantage of Adversarial-CMA-ES over CMA-ES(𝑁 𝑦 = 100) was more pronounced in the
worst-case performance on Y𝐵 (Figure 5d). The median of advB and that of maxB were better than
the median of the other results. All 20 trials of advB achieved berthing without collision with the
berth in the worst situation. In contrast, 9 out of 20 trials failed in maxB. This may be because
𝑁 𝑦 = 100 was not sufficiently large to represent the uncertainty in the 10-dimensional space Y𝐵 .
In the worst-case performance on Y𝐶 , 5 out of 20 trials of advC succeed in avoiding a collision
with the berth, whereas all the trials of maxC failed in avoiding a collision with the berth. Because
of the similarity between the results on the worst-case performances on Y𝐴 and Y𝐶 , the difficulties
in obtaining a robust controller under the worst-case in Y𝐶 mainly comes from the difficulty in

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:21

cmaA
cmaB
maxA
maxB
maxC
advA
advB
advC
10 5 10 3 10 1 101 103

(𝐴) (𝐵)
(a) 𝑓 (𝑥, (𝑦no , 𝑦est ))
cmaA
cmaB
maxA
maxB
maxC
advA
advB
advC
10 5 10 3 10 1 101 103

(𝐵)
(b) max𝑦 (𝐴) ∈Y𝐴 𝑓 (𝑥, (𝑦 (𝐴) , 𝑦est ))
cmaA
cmaB
maxA
maxB
maxC
advA
advB
advC
10 5 10 3 10 1 101 103

(𝐴) (𝐵)
(c) 𝑓 (𝑥, (𝑦est , 𝑦est ))
cmaA
cmaB
maxA
maxB
maxC
advA
advB
advC
10 5 10 3 10 1 101 103

(𝐴)
(d) max𝑦 (𝐵) ∈Y𝐵 𝑓 (𝑥, (𝑦est , 𝑦 (𝐵) ))
cmaA
cmaB
maxA
maxB
maxC
advA
advB
advC
10 5 10 3 10 1 101 103

(e) max𝑦 ∈Y𝐶 𝑓 (𝑥, 𝑦)

(𝐴) (𝐵)
Fig. 5. Performance of the controllers obtained in 20 independent trials of (1+1)-CMA-ES on 𝑦 = (𝑦no , 𝑦est )
(𝐴) (𝐵)
and 𝑦 = (𝑦est , 𝑦est ); CMA-ES(𝑁 𝑦 = 100) on Y𝐴 , Y𝐵 , Y𝐶 ; and Adversarial-CMA-ES on Y𝐴 , Y𝐵 , Y𝐶 , denoted
by cmaA, cmaB, maxA, maxB, maxC, advA, advB, and advC, respectively. Each box indicates the lower quartile
𝑄1 and the upper quartile 𝑄3, with the line indicating the median 𝑄2. The lower and upper whiskers are the
lowest datum above 𝑄1 − 1.5(𝑄3 − 𝑄1) and the highest datum below 𝑄3 + 1.5(𝑄3 − 𝑄1).

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:22 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

treating the worst case in Y𝐴 . The results may be improved by running the optimization process
longer and performing more restarts to locate more local min–max saddle points.

7 CONCLUSION
We have proposed a framework for saddle point optimization with approximate minimization
oracles. Our theoretical analysis has shown the conditions under which the learning rate for the
approach converges linearly (i.e., geometrically) toward the min–max saddle point on strongly
convex–concave functions. Numerical evaluation have shown the tightness of the theoretical
results. We have also proposed a learning rate adaptation mechanism for practical use. Numerical
evaluation on convex-concave quadratic problems has demonstrated that the proposed approach
with the learning rate adaptation successfully converges linearly toward the min–max saddle
point, with the compromise of 𝑓 -calls being no more than three times that of 𝑓 -calls with the best
tuned fixed learning rate. Comparison with other baseline approaches on several test problems
revealed the limitations of existing coevolutionary approaches as well as of the proposed approach
on problems with the optimal solution that is not a min–max saddle point. The application of the
proposed approach to a robust berthing control task demonstrated the usefulness of the proposed
approach, and the results imply the importance of considering modeling errors to achieve a reliable
and safe solution.
We conclude the present work with some suggestions for possible avenues for future research.
The main limitation of the proposed approach as a solver to (1) is that it may fail to converge
to a local minimal solution of the worst-case objective max𝑦 ∈Y 𝑓 (𝑥, 𝑦) if it is not a strict local
min–max saddle point of 𝑓 . Such failure cases were observed in Figure 3, not only for the proposed
approach but also for existing coevolutionary approaches. Addressing this difficulty is an important
direction for future work. For the GDA approach (2), Liang and Stokes [2019] have shown that
the GDA failed to converge to the optimal solution on a bi-linear function 𝑓 (𝑥, 𝑦) = 𝑥 T𝐶𝑦 and
some improved gradient-based approaches [Daskalakis et al. 2018; Mescheder et al. 2017; Yadav
et al. 2018] successfully converged. We expect that these gradient-based approaches would be
useful in improving the here-proposed approach. The other limitation is that the best possible
runtime Ω(1/𝛾 ∗ ) in (13) scales as the interaction term; more precisely, 𝛽𝐺 /𝛼 𝐻 , increases. We intend
to address this limitation in future work.
The main limitation of our theoretical result (Theorem 3.3) is that approximate minimization
oracles are required to satisfy Assumption 3.2. In practice, it is often impossible to guarantee
Assumption 3.2 as we do not know the global minimum/maximum of the objective functions. For
the design of Adversarial-CMA-ES and Adversarial-SLSQP, we expect that it is approximately
satisfied by running a linearly convergence approximate minimization oracle until a fixed number
of improvements are observed. See Section 4.3 for details. In such cases, we have condition (8) not
with a constant 𝜖 but rather with a possibly stochastic and time-dependent sequence {𝜖𝑡 }, which is
not covered by Theorem 3.3. Dealing with such 𝜖𝑡 will enlarge the scope of Theorem 3.3 and bridge
the gap between theory and practice.5
The uncertainty parameter is typically constrained in a bounded set Y ⊂ R𝑛 in practice, however,
the effect of Y on the convergence rate has not been investigated in this work. Our theoretical
result was developed for unconstrained situation and the proof does not immediately generalize to
the constrained situation. The effect of the bound Y on the convergence rate is to be investigated
theoretically and empirically.
5 Another possible approach is to replace condition ℎ (𝑧)
˜ − min𝑧∈Z ℎ (𝑧) ⩽ 𝜖 · (ℎ (𝑧)
¯ − min𝑧∈Z ℎ (𝑧)) in Assumption 3.2 with
condition ∥ ∇ℎ (𝑧)
˜ ∥ < 𝜖 · ∥ ∇ℎ (𝑧)
¯ ∥ under the additional assumption that the objective function is continuously differentiable.
An advantage of this approach is that this condition can be approximately verified by estimating the gradients of the
objective function [Nesterov and Spokoiny 2017].

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:23

The experimental results on the robust berthing control task have demonstrated the usefulness
of the proposed approach and the importance of considering model uncertainties. Simultaneously,
they revealed the difficulty of obtaining a robust solution with satisfactory utility. Regarding the
wind condition uncertainty, it is possible to decompose Y𝐴 into disjoint subsets (e.g., based on the
wind direction), train the robust feedback controller for each subset, and switch the controller based
on the wind condition measured at the time of operation. Such an approach is not available for the
uncertainty in the model coefficients. To improve the worst-case performance, it is important to
reduce the set of uncertain parameter values Y as much as possible. In our experiments, we defined
the interval for each uncertain coefficient to form Y, but the corner case may be unrealistic and
will degrade the worst-case performance unnecessarily. Designing more intelligent Y remains as a
very important task for practical applications.

ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers for their valuable comments and
suggestions. This work is partially supported by JSPS KAKENHI Grant Number 19H04179 and
19K04858.

REFERENCES
Martin A Abkowitz. 1980. Measurement of hydrodynamic characteristics from ship maneuvering trials by system identifica-
tion. In Transactions of Society of Naval Architects and Marine Engineers 88. 283–318.
Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, and Thomas Hofmann. 2019. Local Saddle Point Optimization: A
Curvature Exploitation Approach. In International Conference on Artificial Intelligence and Statistics. 486–495.
Youhei Akimoto. 2021. Saddle Point Optimization with Approximate Minimization Oracle. In Proceedings of the Genetic and
Evolutionary Computation Conference (GECCO ’21). 493–501.
Youhei Akimoto and Nikolaus Hansen. 2020. Diagonal acceleration for covariance matrix adaptation evolution strategies.
Evolutionary computation 28, 3 (2020), 405–435.
Abdullah Al-Dujaili, Shashank Srikant, Erik Hemberg, and Una-May O’Reilly. 2019. On the application of Danskin’s theorem
to derivative-free minimax problems. In AIP Conference Proceedings, Vol. 2070. 20–26.
Motoki Araki, Hamid Sadat-Hosseini, Yugo Sanada, Kenji Tanimoto, Naoya Umeda, and Frederick Stern. 2012. Estimating
maneuvering coefficients using system identification methods with experimental, system-based, and CFD free-running
trial data. Ocean Engineering 51 (2012), 63–84.
Dirk V. Arnold and Nikolaus Hansen. 2010. Active Covariance Matrix Adaptation for the (1+1)-CMA-ES. In Proceedings of
the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO ’10). 385–392.
Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. 2010a. Nonconvex Robust Optimization for Problems with
Constraints. INFORMS Journal on Computing 22, 1 (2010), 44–58.
Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. 2010b. Robust Nonconvex Optimization for Simulation based
Problems. Operations Research 58, 1 (2010), 161–178.
Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, and Volkan Cevher. 2018. Adversarially Robust Optimization with
Gaussian Processes. In Advances in Neural Information Processing Systems. 5760–5770.
Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
Jürgen Branke and Johanna Rosenbusch. 2008. New Approaches to Coevolutionary Worst-Case Optimization. In International
Conference on Parallel Problem Solving from Nature. 144–153.
Ashish Cherukuri, Bahman Gharesifard, and Jorge Cortés. 2017. Saddle-Point Dynamics: Conditions for Asymptotic Stability
of Saddle Points. SIAM Journal on Control and Optimization 55, 1 (2017), 486–511.
Andrew R. Conn and Luis N. Vicente. 2012. Bilevel Derivative-Free Optimization and Its Application to Robust Optimization.
Optimization Methods Software 27, 3 (2012), 561–577.
Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. 2018. Training GANs with Optimism. In
International Conference on Learning Representations.
Oswaldo de Oliveira. 2013. The Implicit and Inverse Function Theorems: Easy Proofs. Real Analysis Exchange 39, 1 (2013),
207–218.
Luc Devroye. 1972. The compound random search. In International Symposium on Systems Engineering and Analysis.
195–110.
Peter I Frazier. 2018. A Tutorial on Bayesian Optimization. arXiv:1807.02811 (2018).

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:24 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

Toshifumi Fujiwara, Michio Ueno, and Tadashi Nimura. 1998. Estimation of Wind Forces and Moments acting on Ships.
Journal of the Society of Naval Architects of Japan 1998 (1998), 77–90. Issue 183.
Gauthier Gidel, Tony Jebara, and Simon Lacoste-Julien. 2017. Frank-Wolfe Algorithms for Saddle Point Problems. In
International Conference on Artificial Intelligence and Statistics. 362–371.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua
Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680.
Nikolaus Hansen and Anne Auger. 2014. Principled design of continuous stochastic search: From theory to practice. In
Theory and principled methods for the design of metaheuristics. Springer, 145–180.
Nikolaus Hansen, Sibylle D Müller, and Petros Koumoutsakos. 2003. Reducing the time complexity of the derandomized
evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation 11, 1 (2003), 1–18.
Nikolaus Hansen and Andreas Ostermeier. 2001. Completely derandomized self-adaptation in evolution strategies. Evolu-
tionary computation 9, 2 (2001), 159–195.
Mikkel T. Jensen. 2004. A New Look at Solving Minimax Problems with Coevolutionary Genetic Algorithms. Kluwer Academic
Publishers, 369–384.
Hamed Karimi, Julie Nutini, and Mark Schmidt. 2016. Linear Convergence of Gradient and Proximal-Gradient Methods
Under the Polyak-Łojasiewicz Condition. In Joint European Conference on Machine Learning and Knowledge Discovery in
Databases. 795–811.
Stefan Kern, Sibylle D. Müller, Nikolaus Hansen, Dirk Büche, Jiri Ocenasek, and Petros Koumoutsakos. 2004. Learning
Probability Distributions in Continuous Evolutionary Algorithms - a Comparative Review. Natural Computing 3, 1 (2004),
77–112.
Dieter Kraft. 1988. A software package for sequential quadratic programming. Technical Report. DFVLR-FB 88-28, DLR
German Aerospace Center — Institute for Flight Mechanics, Koln, Germany.
Tengyuan Liang and James Stokes. 2019. Interaction matters: A note on non-asymptotic local convergence of generative
adversarial networks. In International Conference on Artificial Intelligence and Statistics. 907–915.
Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, and Una-May O’Reilly. 2020.
Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks.
In International Conference on Machine Learning. 2307–2318.
Atsuo Maki, Youhei Akimoto, and Naoya Umeda. 2021. Application of optimal control theory based on the evolution
strategy (CMA-ES) to automatic berthing (part: 2). Journal of Marine Science and Technology 26 (2021), 835–845.
Atsuo Maki, Naoki Sakamoto, Youhei Akimoto, Hiroyuki Nishikawa, and Naoya Umeda. 2020. Application of optimal
control theory based on the evolution strategy (CMA-ES) to automatic berthing. Journal of Marine Science and Technology
25 (2020), 221–233.
Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. 2017. The Numerics of GANs. In Advances in Neural Information
Processing Systems. 1823–1833.
Transport Ministry of Land, Infrastructure and Tourism. 2020. White paper on land, infrastructure, transport and tourism
in Japan. https://www.mlit.go.jp/en/statistics/white-paper-mlit-index.html
Yoshiki Miyauchi, Atsuo Maki, Naoya Umeda, Dimas M. Rachman, and Youhei Akimoto. 2021a. System Parameter Exploration
of Ship Maneuvering Model for Automatic Docking / Berthing using CMA-ES. arXiv:2111.06124 (2021).
Yoshiki Miyauchi, Ryohei Sawada, Youhei Akimoto, Naoya Umeda, and Atsuo Maki. 2021b. Optimization on Planning
of Trajectory and Control of Autonomous Berthing and Unberthing for the Realistic Port Geometry. arXiv:2106.02459
(2021).
Daiki Morinaga and Youhei Akimoto. 2019. Generalized drift analysis in continuous domain: linear convergence of (1+1)-ES
on strongly convex functions with Lipschitz continuous gradients. In Foundations of Genetic Algorithms. 13–24.
Daiki Morinaga, Kazuto Fukuchi, Jun Sakuma, and Youhei Akimoto. 2021. Convergence Rate of the (1+1)-Evolution Strategy
with Success-Based Step-Size Adaptation on Convex Quadratic Function. In Proceedings of the Genetic and Evolutionary
Computation Conference (GECCO ’21). 1169–1177.
Vaishnavh Nagarajan and J. Zico Kolter. 2017. Gradient Descent GAN Optimization is Locally Stable. In Advances in Neural
Information Processing Systems. 5591–5600.
Yurii Nesterov and Vladimir Spokoiny. 2017. Random Gradient-Free Minimization of Convex Functions. Foundations of
Computational Mathematics 17, 2 (2017), 527–566.
Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Razaviyayn. 2019. Solving a Class of Non-
Convex Min-Max Games Using Iterative First Order Methods. In Advances in Neural Information Processing Systems.
14934–14942.
Victor Picheny, Mickael Binois, and Abderrahmane Habbal. 2019. A Bayesian optimization approach to find Nash equilibria.
Journal of Global Optimization 73, 1 (2019), 171–192.
Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. 2017. Robust Adversarial Reinforcement Learning. In
International Conference on Machine Learning. 2817–2826.

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:25

Xin Qiu, Jian-Xin Xu, Yinghao Xu, and Kay Chen Tan. 2018. A New Differential Evolution Algorithm for Minimax
Optimization in Robust Design. IEEE Transactions on Cybernetics 48, 5 (2018), 1355–1368.
Ingo Rechenberg. 1973. Evolutionsstrategie: Optimierung technisher Systeme nach Prinzipien der biologischen Evolution.
Frommann-Holzboog.
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, and Xi Chen. 2016. Improved
Techniques for Training GANs. In Advances in Neural Information Processing Systems. 2234–2242.
Michael A. Schumer and Kenneth Steiglitz. 1968. Adaptive step size random search. Automatic Control, IEEE Transactions on
13 (1968), 270–276.
Hiroaki Shioya, Yusuke Iwasawa, and Yutaka Matsuo. 2018. Extending Robust Adversarial Reinforcement Learning Consid-
ering Adaptation and Diversity. In International Conference on Learning Representations, Workshop Track Proceedings.
Kouki Wakita, Atsuo Maki, Naoya Umeda, Yoshiki Miyauchi, Tohga Shimoji, Dimas M. Rachman, and Youhei Akimoto.
2021. On Neural Network Identification for Low-Speed Ship Maneuvering Model. arXiv:2111.06120 (2021).
Abhay Yadav, Sohil Shah, Zheng Xu, David Jacobs, and Tom Goldstein. 2018. Stabilizing Adversarial Nets With Prediction
Methods. In International Conference on Learning Representations.
Aimin Zhou and Qingfu Zhang. 2010. A Surrogate-Assisted Evolutionary Algorithm for Minimax Optimization. In IEEE
Congress on Evolutionary Computation. 1–7.

A PROOFS
A.1 Proof of Proposition 2.3
Proof. Assume that (𝑥 ∗, 𝑦 ∗ ) is a local min–max saddle point of 𝑓 . Then, by definition, there
exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝑓 (𝑥, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦) holds for any
(𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . Let (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . Then, 𝐺𝑥 (𝑥, 𝑦 ∗ ) = 𝑓 (𝑥, 𝑦 ∗ ) − min𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) −
min𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦 ∗ ) = 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, 𝑦) = max𝑦′ ∈Y 𝑓 (𝑥 ∗, 𝑦 ′) − 𝑓 (𝑥 ∗, 𝑦) ⩾ max𝑦′ ∈Y 𝑓 (𝑥 ∗, 𝑦 ′) −
𝑓 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ). This implies that 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (𝑥, 𝑦 ∗ ) and
𝐺 𝑦 (𝑥 ∗, 𝑦), respectively.
Conversely, assume that 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (𝑥, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, 𝑦), respec-
tively. Then, there exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝐺𝑥 (𝑥, 𝑦 ∗ ) ⩾ 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ )
and 𝐺 𝑦 (𝑥 ∗, 𝑦) ⩾ 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) for any (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . They read 𝑓 (𝑥, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) and
𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦), which implies that (𝑥 ∗, 𝑦 ∗ ) is a local min–max saddle point of 𝑓 .
If (𝑥 ∗, 𝑦 ∗ ) is the global min–max saddle point of 𝑓 , then (𝑥 ∗, 𝑦 ∗ ) is a local minimal point of 𝐺.
Moreover, we have 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) = 0, implying that it is the global minimal point of 𝐺.
Conversely, if (𝑥 ∗, 𝑦 ∗ ) is the global minimal point of 𝐺, then it is a local min–max saddle point.
Moreover, because the global minimum of 𝐺 is zero, we have 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) = 0. Then, we
can take E𝑥 = X and E 𝑦 = Y in the above proof, which implies that (𝑥 ∗, 𝑦 ∗ ) is the global min–max
saddle point. □

A.2 Proof of Lemma 2.6


Proof. Noting that (∇𝑥 𝑓 ) (𝑥ˆ (𝑦), 𝑦) = 0 and (∇𝑦 𝑓 ) (𝑥, 𝑦ˆ (𝑥)) = 0, we obtain

∇𝑥 𝐺𝑥 (𝑥, 𝑦) = (∇𝑥 𝑓 ) (𝑥, 𝑦), ∇𝑦 𝐺𝑥 (𝑥, 𝑦) = (∇𝑦 𝑓 ) (𝑥, 𝑦) − (∇𝑦 𝑓 ) (𝑥ˆ (𝑦), 𝑦),
(18)
∇𝑥 𝐺 𝑦 (𝑥, 𝑦) = (∇𝑥 𝑓 ) (𝑥, 𝑦ˆ (𝑥)) − (∇𝑥 𝑓 ) (𝑥, 𝑦), ∇𝑦 𝐺 𝑦 (𝑥, 𝑦) = −(∇𝑦 𝑓 ) (𝑥, 𝑦).

Moreover, we have
 
2 𝐻𝑥,𝑥 (𝑥, 𝑦) 𝐻𝑥,𝑦 (𝑥, 𝑦)
∇ 𝐺𝑥 (𝑥, 𝑦) = ,
𝐻 𝑦,𝑥 (𝑥, 𝑦) 𝐻 𝑦,𝑦 (𝑥, 𝑦) − 𝐻 𝑦,𝑦 (𝑥ˆ (𝑦), 𝑦) − [𝐽𝑥ˆ (𝑥ˆ (𝑦), 𝑦)] T 𝐻𝑥,𝑦 (𝑥ˆ (𝑦), 𝑦)
 
2 −𝐻𝑥,𝑥 (𝑥, 𝑦) + 𝐻𝑥,𝑥 (𝑥, 𝑦ˆ (𝑥)) + [𝐽𝑦ˆ (𝑥, 𝑦ˆ (𝑥))] T 𝐻 𝑦,𝑥 (𝑥, 𝑦ˆ (𝑥)) −𝐻𝑥,𝑦 (𝑥, 𝑦)
∇ 𝐺 𝑦 (𝑥, 𝑦) = .
−𝐻 𝑦,𝑥 (𝑥, 𝑦) −𝐻 𝑦,𝑦 (𝑥, 𝑦)

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:26 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

In light of Proposition 2.5 and the symmetry 𝐻𝑥,𝑦 = 𝐻 𝑦,𝑥 T , we have [𝐽 (𝑥ˆ (𝑦), 𝑦)] T =
𝑥ˆ
−𝐻 𝑦,𝑥 (𝑥ˆ (𝑦), 𝑦) (𝐻𝑥,𝑥 (𝑥ˆ (𝑦), 𝑦)) −1 and [𝐽𝑦ˆ (𝑥, 𝑦ˆ (𝑥))] T = −𝐻𝑥,𝑦 (𝑥, 𝑦ˆ (𝑥)) (𝐻 𝑦,𝑦 (𝑥, 𝑦ˆ (𝑥))) −1 . Then, be-
cause ∇2𝐺 = ∇2 (𝐺𝑥 + 𝐺 𝑦 ) = ∇2𝐺𝑥 + ∇2𝐺 𝑦 , we obtain
 
𝐺 (𝑥, 𝑦ˆ (𝑥)) 0
∇2𝐺 (𝑥, 𝑦) = 𝑥,𝑥 .
0 𝐺 𝑦,𝑦 (𝑥ˆ (𝑦), 𝑦)
The symmetry of 𝐺𝑥,𝑥 and 𝐺 𝑦,𝑦 are clear from their definitions. The positivity of 𝐺𝑥,𝑥 and 𝐺 𝑦,𝑦
follows that 𝐻𝑥,𝑥 ≻ 0, −𝐻 𝑦,𝑦 ≻ 0, 𝐻𝑥,𝑦 (−𝐻 𝑦,𝑦 ) −1 𝐻 𝑦,𝑥 ≽ 𝜎min (𝐻𝑥,𝑦 ) 2 /𝜎max (−𝐻 𝑦,𝑦 ) ≻ 0 and
𝐻 𝑦,𝑥 (𝐻𝑥,𝑥 ) −1 𝐻𝑥,𝑦 ≽ 𝜎min (𝐻𝑥,𝑦 ) 2 /𝜎max (𝐻𝑥,𝑥 ) ≻ 0. This completes the proof. □

A.3 Proof of Theorem 3.3


Proof. Let 𝑣 𝑥 = 𝑥˜𝑡 −𝑥𝑡 , 𝑣 𝑦 = 𝑦˜𝑡 −𝑦𝑡 , and 𝑣 = (𝑣 𝑥 , 𝑣 𝑦 ). Define 𝑥¯ (𝜏) = 𝑥𝑡 +𝜏 ·𝑣 𝑥 and 𝑦¯ (𝜏) = 𝑦𝑡 +𝜏 ·𝑣 𝑦 .
Let 𝑤 𝑥 = 𝑥ˆ (𝑦𝑡 ) − 𝑥𝑡 and 𝑤 𝑦 = 𝑦ˆ (𝑥𝑡 ) − 𝑦𝑡 . Define 𝑥¯ (𝜏) = 𝑥𝑡 + 𝜏 · 𝑤 𝑥 and 𝑦¯ (𝜏) = 𝑦𝑡 + 𝜏 · 𝑤 𝑦 . Then,
𝑥¯ (0) = 𝑥¯ (0) = 𝑥𝑡 and 𝑦¯ (0) = 𝑦¯ (0) = 𝑦𝑡 . Moreover, 𝑥¯ (𝜂) = 𝑥𝑡 +1 and 𝑦¯ (𝜂) = 𝑦𝑡 +1 .
We first derive several inequalities on the norms of 𝑣 and 𝑤. Note that ∇𝑥 𝑓 (𝑥ˆ (𝑦𝑡 ), 𝑦𝑡 ) = 0 and
∇𝑦 𝑓 (𝑥𝑡 , 𝑦ˆ (𝑥𝑡 )) = 0. In light of conditions (1)–(2) in the theorem statement, we have
2 2
(𝛼 𝐻 /2) ∥𝑤 𝑥 ∥ 𝐻 ∗
𝑥,𝑥
⩽ 𝐺𝑥 (𝑥𝑡 , 𝑦𝑡 ) = 𝑓 (𝑥𝑡 , 𝑦𝑡 ) − 𝑓 (𝑥ˆ (𝑦𝑡 ), 𝑦𝑡 ) ⩽ (𝛽𝐻 /2) ∥𝑤 𝑥 ∥ 𝐻 ∗ ,
𝑥,𝑥
(19)
(𝛼 𝐻 /2) ∥𝑤 𝑦 ∥ 2−𝐻 𝑦,𝑦
∗ ⩽ 𝐺 𝑦 (𝑥𝑡 , 𝑦𝑡 ) = 𝑓 (𝑥𝑡 , 𝑦ˆ (𝑥𝑡 )) − 𝑓 (𝑥𝑡 , 𝑦𝑡 ) ⩽ (𝛽𝐻 /2) ∥𝑤 𝑦 ∥ 2−𝐻 𝑦,𝑦
∗ . (20)

Because of condition (8), we have


2
(𝛼 𝐻 /2) ∥𝑤 𝑥 − 𝑣 𝑥 ∥ 𝐻 ∗
𝑥,𝑥
⩽ 𝑓 (𝑥˜𝑡 , 𝑦𝑡 ) − 𝑓 (𝑥ˆ (𝑦𝑡 ), 𝑦𝑡 ) ⩽ 𝜖 · 𝐺𝑥 (𝑥𝑡 , 𝑦𝑡 ), (21)
(𝛼 𝐻 /2) ∥𝑤 𝑦 − 𝑣 𝑦 ∥ 2−𝐻 𝑦,𝑦
∗ ⩽ 𝑓 (𝑥𝑡 , 𝑦ˆ (𝑥𝑡 )) − 𝑓 (𝑥𝑡 , 𝑦˜𝑡 ) ⩽ 𝜖 · 𝐺 𝑦 (𝑥𝑡 , 𝑦𝑡 ). (22)

Then, from Equations (19) to (22), we have


2 2 √ 2
∥𝑣 𝑥 ∥ 𝐻 ∗
𝑥,𝑥
⩽ (∥𝑤 𝑥 ∥ 𝐻𝑥,𝑥
∗ + ∥𝑣 𝑥 − 𝑤 𝑥 ∥ 𝐻𝑥,𝑥
∗ ) ⩽ 2[(1 + 𝜖) /𝛼 𝐻 ] · 𝐺𝑥 (𝑥𝑡 , 𝑦𝑡 ), (23)
2 2 √ 2
∥𝑣 𝑦 ∥ 𝐻 ∗
𝑦,𝑦
⩽ (∥𝑤 𝑦 ∥ −𝐻 𝑦,𝑦
∗ + ∥𝑣 𝑦 − 𝑤 𝑦 ∥ −𝐻 𝑦,𝑦
∗ ) ⩽ 2[(1 + 𝜖) /𝛼 𝐻 ] · 𝐺 𝑦 (𝑥𝑡 , 𝑦𝑡 ). (24)

Because 𝜖 < 𝛼 𝐻 /𝛽𝐻 , we also have


2 2
√︁
∥𝑣 𝑥 ∥ 𝐻 ∗
𝑥,𝑥
⩾ (∥𝑤 𝑥 ∥ 𝐻𝑥,𝑥
∗ − ∥𝑣 𝑥 − 𝑤 𝑥 ∥ 𝐻𝑥,𝑥
∗ ) ⩾ (2/𝛽 𝐻 ) [1 − (𝛽𝐻 /𝛼 𝐻 ) · 𝜖] 2𝐺𝑥 (𝑥𝑡 , 𝑦𝑡 ), (25)
√︁
∥𝑣 𝑦 ∥ 2−𝐻 𝑦,𝑦
∗ ⩾ (∥𝑤 𝑦 ∥ −𝐻 𝑦,𝑦
∗ − ∥𝑣 𝑦 − 𝑤 𝑦 ∥ −𝐻 𝑦,𝑦 2
∗ ) ⩾ (2/𝛽 𝐻 ) [1 − (𝛽𝐻 /𝛼 𝐻 ) · 𝜖] 2𝐺 𝑦 (𝑥𝑡 , 𝑦𝑡 ). (26)

By applying the mean value theorem repeatedly, we have


𝐺 (𝑥𝑡 +1, 𝑦𝑡 +1 ) − 𝐺 (𝑥𝑡 , 𝑦𝑡 ) = 𝐺 (𝑥¯ (𝜂), 𝑦¯ (𝜂)) − 𝐺 (𝑥¯ (0), 𝑦¯ (0))
∫ 𝜂
= ∇𝐺 (𝑥¯ (𝜏), 𝑦¯ (𝜏)) T d𝜏 · 𝑣
0
∫ 𝜂  ∫ 𝜏 T
(27)
= ∇2𝐺 (𝑥¯ (𝑠), 𝑦¯ (𝑠))d𝑠 · 𝑣 d𝜏 · 𝑣
∇𝐺 (𝑥¯ (0), 𝑦¯ (0)) +
0 0
∫ 𝜂∫ 𝜏
= 𝜂 · ∇𝐺 (𝑥𝑡 , 𝑦𝑡 ) T · 𝑣 + 𝑣 T ∇2𝐺 (𝑥¯ (𝑠), 𝑦¯ (𝑠)) T d𝑠d𝜏 · 𝑣.
0 0

First, we evaluate the first term on the right-most side of (27). Note first that we have ∇𝐺 (𝑥𝑡 , 𝑦𝑡 ) T𝑣 =
∇𝑥 𝐺 (𝑥𝑡 , 𝑦𝑡 ) T𝑣 𝑥 + ∇𝑦 𝐺 (𝑥𝑡 , 𝑦𝑡 ) T𝑣 𝑦 . In light of (18), we have ∇𝑥 𝐺 (𝑥𝑡 , 𝑦𝑡 ) = (∇𝑥 𝑓 ) (𝑥𝑡 , 𝑦ˆ (𝑥𝑡 )) =

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:27

(∇𝑥 𝑓 ) (𝑥¯ (0), 𝑦¯ (1)). Noting that (∇𝑥 𝑓 ) (𝑥¯ (1), 𝑦¯ (0)) = (∇𝑥 𝑓 ) (𝑥ˆ (𝑦𝑡 ), 𝑦𝑡 ) = 0, by the mean value
theorem, we obtain
∫ 1 ∫ 1
∇𝑥 𝐺 (𝑥𝑡 , 𝑦𝑡 ) T𝑣 𝑥 = −𝑤 𝑥T · 𝐻𝑥,𝑥 (𝑥¯ (1 − 𝜏), 𝑦¯ (𝜏))d𝜏 · 𝑣 𝑥 + 𝑤 𝑦T · 𝐻 𝑦,𝑥 (𝑥¯ (1 − 𝜏), 𝑦¯ (𝜏))d𝜏 · 𝑣 𝑥 . (28)
0 0

Analogously, we obtain
∫ 1 ∫ 1
T
∇𝑦 𝐺 (𝑥𝑡 , 𝑦𝑡 ) 𝑣 𝑦 = 𝑤 𝑦T · 𝐻 𝑦,𝑦 (𝑥¯ (1 − 𝜏), 𝑦¯ (𝜏))d𝜏 · 𝑣 𝑦 − 𝑤 𝑥T · 𝐻𝑥,𝑦 (𝑥¯ (1 − 𝜏), 𝑦¯ (𝜏))d𝜏 · 𝑣 𝑦 . (29)
0 0

Inserting 𝑣 𝑥 = 𝑤 𝑥 + (𝑣 𝑥 − 𝑤 𝑥 ) and 𝑣 𝑥 = 𝑤 𝑦 + (𝑣 𝑦 − 𝑤 𝑦 ) into (28) and (29), using 𝐻𝑥,𝑦 = 𝐻 𝑦,𝑥


T , and

rearranging them, we obtain


∫ 1  T     T    !
T 𝑤𝑥 𝐻𝑥,𝑥 0 𝑤𝑥 𝑣𝑥 − 𝑤𝑥 𝐻𝑥,𝑥 −𝐻𝑥,𝑦 𝑤 𝑥
∇𝐺 (𝑥𝑡 , 𝑦𝑡 ) 𝑣 = − − d𝜏, (30)
0 𝑤𝑦 0 −𝐻 𝑦,𝑦 𝑤 𝑦 𝑣𝑦 − 𝑤𝑦 𝐻 𝑦,𝑥 −𝐻 𝑦,𝑦 𝑤 𝑦

where we drop (𝑥¯ (1 − 𝜏), 𝑦¯ (𝜏)) from 𝐻𝑥,𝑥 , 𝐻𝑥,𝑦 , 𝐻 𝑦,𝑥 and 𝐻 𝑦,𝑦 for compact expressions.
We aim to bound each term of (27) and (30). The second term on (27) is bounded by using
conditions (3) and (4) of the theorem statement as well as the fact that ∇2𝐺 is block-diagonal
(Lemma 2.6) as
𝛼𝐺 ∥𝑣 ∥ 2diag(𝐻𝑥,𝑥 𝑦,𝑦
T 2
¯ (𝑠), 𝑦¯ (𝑠)) T · 𝑣 ⩽ 𝛽𝐺 ∥𝑣 ∥ 2diag(𝐻𝑥,𝑥
∗ ,−𝐻 ∗ ) ⩽ 𝑣 ∇ 𝐺 (𝑥 ∗ ,−𝐻 ∗ ) .
𝑦,𝑦
(31)

The first term on (30) is bounded by using conditions (1) and (2) of the theorem statement as
 T   
2 𝑤𝑥 𝐻𝑥,𝑥 0 𝑤𝑥
− 𝛽𝐻 ∥𝑤 ∥ diag(𝐻𝑥,𝑥
∗ ,−𝐻 ∗ ) ⩽ − ⩽ −𝛼 𝐻 ∥𝑤 ∥ 2diag(𝐻𝑥,𝑥
∗ ,−𝐻 ∗ ) . (32)
𝑦,𝑦 𝑤𝑦 0 −𝐻 𝑦,𝑦 𝑤 𝑦 𝑦,𝑦

The second term on (30) is bounded as


 T   
𝑣𝑥 − 𝑤𝑥 𝐻𝑥,𝑥 −𝐻𝑥,𝑦 𝑤𝑥
⩽ ∥𝑤 − 𝑣 ∥ diag(𝐻𝑥,𝑥
∗ ,−𝐻 ∗ ) · ∥𝑤 ∥ diag(𝐻 ∗ ,−𝐻 ∗ )
𝑣𝑦 − 𝑤𝑦 𝐻 𝑦,𝑥 −𝐻 𝑦,𝑦 𝑤𝑦 𝑦,𝑦 𝑥,𝑥 𝑦,𝑦

∗ −1 𝐻
" √︁ √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 # !
𝐻𝑥,𝑥 𝑥,𝑥 𝐻𝑥,𝑥 − 𝐻𝑥,𝑥 𝐻𝑥,𝑦 −𝐻 𝑦,𝑦
· 𝜎max √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 , (33)
−𝐻 𝑦,𝑦 𝐻 𝑦,𝑥 𝐻𝑥,𝑥 −𝐻 𝑦,𝑦 (−𝐻 𝑦,𝑦 ) −𝐻 𝑦,𝑦
where the greatest singular value is bounded by using conditions (1)–(4) in the theorem statement
as
∗ −1 𝐻
" √︁ √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 # !
𝐻𝑥,𝑥 𝑥,𝑥 𝐻𝑥,𝑥 − 𝐻𝑥,𝑥 𝐻𝑥,𝑦 −𝐻 𝑦,𝑦 1/2 1/2
𝜎max √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 √︁ ∗ −1 ⩽ 𝛽𝐻 · 𝛽𝐺 /𝛼 𝐻 . (34)
−𝐻 𝑦,𝑦 𝐻 𝑦,𝑥 𝐻𝑥,𝑥 −𝐻 𝑦,𝑦 (−𝐻 𝑦,𝑦 ) −𝐻 𝑦,𝑦
Equations (19) to (24), (27) and (30) to (34) lead to
 √︄ 
 𝛼 𝐻 𝛽𝐻2 𝛽𝐺 ª 2 √ 2 𝛽𝐺 
𝐺 (𝑥𝑡 +1, 𝑦𝑡 +1 ) − 𝐺 (𝑥𝑡 , 𝑦𝑡 ) ⩽ −2𝜂 − · + · (1 + ·  𝐺 (𝑥𝑡 , 𝑦𝑡 ).
©
­1 𝜖 ® 𝜂 𝜖) (35)
 𝛽𝐻 𝛼 𝐻2 𝛼 𝐻 𝛼 𝐻 
 « ¬ 
Here, the right-hand side is 𝛾 ·𝐺 (𝑥𝑡 , 𝑦𝑡 ) with 𝛾 defined in (10). Hence, 𝐺 (𝑥𝑡 +1, 𝑦𝑡 +1 ) ⩽ (1+𝛾)·𝐺 (𝑥𝑡 , 𝑦𝑡 ).
Note that log(1 + 𝛾) < 𝛾 for all 𝛾 ∈ (−1, 0), we thus obtain log (𝐺 (𝑥𝑡 +1, 𝑦𝑡 +1 )) − log (𝐺 (𝑥𝑡 , 𝑦𝑡 )) < 𝛾.
Because log (𝐺 (𝑥𝑡 , 𝑦𝑡 )) −log
l (𝐺 (𝑥0, 𝑦m0 )) < 𝛾 ·𝑡, the minimal 𝑡 that log (𝐺 (𝑥𝑡 , 𝑦𝑡 )) −log (𝐺 (𝑥 0, 𝑦0 )) ⩽
log(𝜁 ) is no greater than 𝛾1 log 𝜁1 = 𝑇𝜁 . Similarly, Equations (19) to (22), (25) to (27) and (30)

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:28 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

to (34) lead to
 √︄ √︄ 2
 𝛽𝐻 © 𝛽𝐺 ª 2 𝛼𝐺 © 𝛽𝐻 ª 
𝐺 (𝑥𝑡 +1, 𝑦𝑡 +1 ) − 𝐺 (𝑥𝑡 , 𝑦𝑡 ) ⩾ −2𝜂
 ­1 + · 𝜖® + 𝜂 ­1 − · 𝜖 ®  𝐺 (𝑥𝑡 , 𝑦𝑡 ) . (36)
 𝛼𝐻 𝛼𝐻 𝛽𝐻 𝛼𝐻 
 « ¬ « ¬
The right-hand side of Equation (36) is positive if 𝜂 > 2 · 𝜂.
¯ This completes the proof. □

B DETAILS OF AUTOMATIC BERTHING CONTROL PROBLEM


Subject Ship. The control target is a 3 m model ship of MV ESSO OSAKA (Figure 4), following a
related study [Maki et al. 2021, 2020]. The state variables 𝑠 = (𝑋, 𝑢, 𝑌 , 𝑣𝑚 ,𝜓, 𝑟 ) ∈ R6 are the 𝑋 [m]
and 𝑌 [m] coordinates of the Earth-fixed coordinate system, the longitudinal velocity 𝑢 [m/s] and the
lateral velocity 𝑣𝑚 [m/s] at the mid-ship, and the yaw direction𝜓 [rad] as seen from the 𝑋 coordinates
and the yaw angular velocity 𝑟 [rad/s]. The control signal 𝑎 = (𝛿, 𝑛 p, 𝑛 BT, 𝑛 ST ) ∈ R4 consists
of the rudder angle 𝛿 [rad], propeller revolution number 𝑛 p [rps], the bow thruster revolution
number𝑛 BT [rps], and  the stan-thruster revolution number 𝑛 ST [rps]. Their feasible values are
35 35
in 𝑈 = − 180 𝜋, 180 𝜋 × [−20, 20] × [−20, 20] × [−20, 20]. We employ the state equation model
𝑠¤ = 𝜙 (𝑠, 𝑎; 𝑦) proposed in [Miyauchi et al. 2021b], where 𝑦 ∈ Y represents the model uncertainty
described in Section 6.
Feedback Controller. The feedback controller 𝑢𝑥 : R6 → 𝑈 is modeled by the following neural
network parameterized by 𝑥 = (𝐵,𝑊 , 𝑉 ):
𝑢𝑥 (𝑠) = 𝑉 · softmax(𝛼 · (𝐵 + 𝑊 · 𝑠)), (37)
where 𝑊 ∈ [−1, 1] 𝐾×6 and 𝐵 ∈ [−1, 1] 𝐾 define a linear map 𝑧 = 𝛼 · (𝐵 + 𝑊 · 𝑠) from the state
vector 𝑠 to the 𝐾 dimensional latent space, and 𝑉 ∈ 𝑈 𝐾 ⊂ R𝑚×𝐾 is a matrix consisting of 𝐾 feasible
control vectors as its columns. The softmax function
(exp(𝑧 1 ), . . . , exp(𝑧𝐾 ))
softmax : 𝑧 = (𝑧 1, . . . , 𝑧𝐾 ) ↦→ Í𝐾 ∈ Δ𝐾−1
𝑘=1 exp(exp(𝑧 1 ), . . . , exp(𝑧𝐾 ))

outputs a point in the 𝐾 − 1 dimensional standard simplex Δ𝐾−1 = {𝑧 ∈ R𝐾 : 𝑧 1 ⩾ 0, . . . , 𝑧𝐾 ⩾


Í𝐾
0, and 𝑘=1 𝑧𝑘 = 1}. The output is a combination of the columns of 𝑉 weighted by the softmax
output. 𝛼 > 0 is a parameter that determines whether the output of softmax is close to the one-hot
vector.
The architecture of this neural network is interpreted as follows. First, 𝑧 = softmax(𝛼 · (𝐵 +𝑊 ·𝑠))
on the first layer divides the state space into 𝐾 regions. For example, if the greatest element of
the vector 𝐵 + 𝑊 · 𝑠 is the 𝑘th coordinate, then 𝑧 is approximated by the one-hot vector 𝑒𝑘 with 1
on the 𝑘th coordinate and 0 on the other coordinates if 𝛼 is sufficiently large. In such a situation,
𝑢 (𝑠) = 𝑉 · 𝑧 ≈ 𝑉 · 𝑒𝑘 = 𝑣𝑘 , where 𝑣𝑘 is the 𝑘th column of 𝑉 . In other words, this neural network
approximates the control law that divides the state space using a Voronoi diagram with respect to
the Euclidean metric and outputs the corresponding column of 𝑉 as a control signal in each region.
If we set 𝛼 to be greater, 𝑧 is more likely to be close to a one-hot vector, which makes it easier to
express the bang-bang type control. If we set 𝛼 to be smaller, 𝑧 is more likely to take a value in the
middle of Δ𝐾−1 , which makes it easier to express a continuous control.
Based on our preliminary experiments, we set 𝛼 = 4 and 𝐾 = 9 in the following experiments.
Then, 𝑥 is of 𝑚 = 99 dimension.
Objective Function. The objective is to find the parameter 𝑥 := (𝐵,𝑊 , 𝑉 ) ∈ X of the controller 𝑢𝑥
that minimizes the cost 𝐶 of the trajectory (𝑠𝑡 ∈ [0,𝑡max ] , 𝑎𝑡 ∈ [0,𝑡max ] ) in the worst environment 𝑦 ∈ Y

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:29

for 𝑢𝑥 . It is modeled as
min max 𝑓 (𝑥, 𝑦) = min max 𝐶 (𝑠𝑡 ∈ [0,𝑡max ] , 𝑎𝑡 ∈ [0,𝑡max ] )
𝑥 ∈X 𝑦 ∈Y 𝑥 ∈X 𝑦 ∈Y
∫ 𝑡
subject to 𝑠𝑡 = 𝑠 0 + 𝜙 (𝑠𝜏 , 𝑎𝜏 ; 𝑦)d𝜏 and 𝑎𝑡 = 𝑢𝑥 (𝑠 ⌊𝑡 /d𝑡 ⌋ ·d𝑡 ),
0
where d𝑡 [seconds] is the control time span, that is, the control signal 𝑎𝑡 changes every d𝑡, and 𝑠 0 is
the initial state.
We define the cost of the trajectory as
𝐶 (𝑠𝑡 ∈ [0,𝑡max ] , 𝑎𝑡 ∈ [0,𝑡max ] ) = 𝐶 1 + 𝑤 · (𝐶 2 + I{𝐶 2 > 0}).
where 𝑤 > 0 is the hyperparameter that determines the trade-off between utility and safety,
6
1 ∑︁
𝐶1 = (𝑠𝑡 ,𝑖 − 𝑠 fin,𝑖 ) 2,
6 𝑖=1 max
evaluates the deviation of the final ship state from the target state 𝑠 fin , and
4 ∫
1 ∑︁ 𝑡max
𝐶2 = dist(𝑃𝜏,𝑖 , 𝐶 berth )d𝜏,
4 𝑖=1 0
measures the collision risk, where 𝑃𝜏,1, . . . , 𝑃𝜏,4 represents the coordinates of the four vertices of
the rectangle surrounding the ship at time 𝜏 and dist(𝑃, 𝐶 berth ) measures the distance from a point
𝑃 to the closest point on the berth boundary. Refer to [Maki et al. 2021, 2020] for the definitions of
𝐶 1 and 𝐶 2 .
Following [Maki et al. 2021], we set 𝑡 max = 200 [seconds] and d𝑡 = 10 [seconds]. The initial state
is 𝑠 0 = (15.0, 0.01, 6.0, 0.0, 𝜋, 0.0) and the target state is 𝑠 fin = (3.0, 0.0, 9.5, 0.0, 𝜋, 0.0). The boundary
of the berth is 𝐶 berth = {𝑌 = 9.994625}. The trade-off coefficient is set to 𝑤 = 10. That is, the cost
𝑓 (𝑥, 𝑦) < 10 implies that the controller 𝑢𝑥 produces a trajectory without collision with the berth
under the uncertainty parameter 𝑦.
Differences from Previous Works. Our problem formulation mostly follows previous studies [Maki
et al. 2021, 2020] but with certain differences. First, we optimize the feedback controller, whereas
the control signals for each time period as well as the total control time are directly optimized in
[Maki et al. 2021, 2020], which we believe is not suitable for obtaining robust control. Second, we
modify the objective function. Previous studies include the term penalizing the control time as
they formulate the problem as minimization of the control time. Because we did not optimize the
control time, it is excluded from our objective function definition. Moreover, for better collision
avoidance, we replaced 𝑤 · 𝐶 2 with 𝑤 · (𝐶 2 + I{𝐶 2 > 0}). Third, following [Miyauchi et al. 2021b],
we implement thrusters to realize robust control under external disturbances and adopt the state
equation model proposed in [Miyauchi et al. 2021b].

C ADDITIONAL RESULTS FOR AUTOMATIC BERTHING CONTROL PROBLEM


Figures 6 to 9 visualize the trajectories obtained in the experiments in Section 6. The route of the
ship, that is, (𝑋, 𝑌 ,𝜓 ) at each time, is displayed in the top figure. The 𝑋 and 𝑌 axes are scaled
by 𝐿𝑝𝑝 = 3 [m]. The changes in the velocities, (𝑢, 𝑣𝑚 , 𝑟 ), as well as the changes in the control
signals, (𝛿, 𝑛𝑝 , 𝑛𝐵𝑇 , 𝑛𝑆𝑇 ), are plotted at the bottom. Note that 𝑟 and 𝛿 are plotted on a degree basis
for better intuition. Figure 6 shows the trajectories observed for the best controller obtained by
(𝐴) (𝐴) (𝐵) (𝐴)
CMA-ES(𝑦no ), which is the controller optimized under 𝑦 = (𝑦no , 𝑦est ), that is, no wind 𝑦 (𝐴) = 𝑦no
(𝐵)
and model parameter 𝑦 (𝐵) = 𝑦est used in the previous study. Figure 7 shows the trajectories

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:30 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

6 6
5
4 4
3
2 2
1
0
0 2 4 0
0 2 4

-3
10 0.01
0.10

vm [m/s]
vm [m/s]

u [m/s]
0.10
u [m/s]

5.00
0.00 0.00 0.00 0.00
-0.10 -5.00 -0.10
-0.01
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]

[degree], n [rps]
[degree], n [rps]

r [degree/s]
r [degree/s]

0.50 20
0.50 20
0.00 0
0.00 0
-0.50 -20 -0.50 -20
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]

n n BT n ST n n BT n ST

𝑓 (𝑥, 𝑦) = 7.30 × 10−7 𝑓 (𝑥, 𝑦) = 4.89 × 10+2


(𝐴) (𝐵) (𝐵)
(a) 𝑦 (𝐴) = 𝑦no , 𝑦 (𝐵) = 𝑦est (b) Worst case in 𝑦 (𝐴) ∈ Y𝐴 , 𝑦 (𝐵) = 𝑦est

(𝐴)
Fig. 6. Trajectories of the best controller obtained by CMA-ES(𝑦no )

observed for the best controller obtained by Adversarial-CMA-ES on Y𝐴 , which is the controller
(𝐵)
optimized under the worst wind condition 𝑦 (𝐴) ∈ Y𝐴 with 𝑦 (𝐵) = 𝑦est . For Figures 6 and 7, the
(𝐴) (𝐵)
left figure is the trajectory under 𝑦 = (𝑦no , 𝑦est ) and the right figure is the trajectory under the
(𝐵)
worst wind condition 𝑦 (𝐴) ∈ Y𝐴 with 𝑦 (𝐵) = 𝑦est . Figure 8 shows the trajectories observed for the
(𝐵) (𝐴) (𝐵)
best controller obtained by CMA-ES(𝑦est ), which is the controller optimized under 𝑦 = (𝑦est , 𝑦est ).
Figure 9 shows the trajectories observed for the best controller obtained by Adversarial-CMA-ES
on Y𝐵 , which is the controller optimized under the worst model parameter 𝑦 (𝐵) ∈ Y𝐵 with wind
(𝐴) (𝐴) (𝐵)
condition 𝑦 (𝐴) = 𝑦est . For Figures 8 and 9, the left figure is the trajectory under 𝑦 = (𝑦est , 𝑦est )
(𝐵) (𝐴) (𝐴)
and the right figure is the trajectory under the worst model parameter 𝑦 ∈ Y𝐵 with 𝑦 = 𝑦est .

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
Saddle Point Optimization with Approximate Minimization Oracle 1:31

6 6
5 5
4 4
3 3
2 2
1 1
0 0
0 2 4 0 2 4

0.02
vm [m/s]

vm [m/s]
0.10 0.10
u [m/s]

u [m/s]
0.01
0.00 0.00 0.00 0.00
-0.10 -0.10 -0.01
-0.02
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]
[degree], n [rps]

[degree], n [rps]
r [degree/s]

r [degree/s]
4.00 4.00
2.00 20 2.00 20
0.00 0 0.00 0
-2.00 -20 -2.00 -20
-4.00 -4.00
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]

n n BT n ST n n BT n ST

𝑓 (𝑥, 𝑦) = 4.79 × 10−2 𝑓 (𝑥, 𝑦) = 1.53 × 10−1


(𝐴) (𝐵) (𝐵)
(a) 𝑦 (𝐴) = 𝑦no , 𝑦 (𝐵) = 𝑦est (b) Worst case in 𝑦 (𝐴) ∈ Y𝐴 , 𝑦 (𝐵) = 𝑦est

Fig. 7. Trajectories of the best controller obtained by Adversarial-CMA-ES on Y𝐴

6 6
5 5
4 4
3 3
2 2
1 1
0 0
0 2 4 0 2 4

0.10 0.01 0.10 0.01


vm [m/s]

vm [m/s]
u [m/s]

u [m/s]

0.00 0.00 0.00 0.00


-0.10 -0.01 -0.10 -0.01
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]
[degree], n [rps]

[degree], n [rps]

0.50 0.50
r [degree/s]

r [degree/s]

20 20
0.00 0 0.00 0
-20 -20
-0.50 -0.50
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]

n n BT n ST n n BT n ST

𝑓 (𝑥, 𝑦) = 1.20 × 10−6 𝑓 (𝑥, 𝑦) = 1.44 × 10+1


(𝐴) (𝐵) (𝐴)
(a) 𝑦 (𝐴) = 𝑦est , 𝑦 (𝐵) = 𝑦est (b) Worst case in 𝑦 (𝐵) ∈ Y𝐵 , 𝑦 (𝐴) = 𝑦est

(𝐵)
Fig. 8. Trajectories of the best controller obtained by CMA-ES(𝑦est )

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.
1:32 Youhei Akimoto, Yoshiki Miyauchi, and Atsuo Maki

6 6
5 5
4 4
3 3
2 2
1 1
0 0
0 2 4 0 2 4

0.02 0.02
vm [m/s]

vm [m/s]
0.05 0.05
u [m/s]

u [m/s]
0.00 0.00 0.00 0.00
-0.05 -0.02 -0.05 -0.02
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]
[degree], n [rps]

[degree], n [rps]
r [degree/s]

r [degree/s]
1.00 20 1.00 20
0.00 0 0.00 0
-1.00 -20 -1.00 -20
0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200
Time [s] Time [s] Time [s] Time [s]

n n BT n ST n n BT n ST

𝑓 (𝑥, 𝑦) = 2.23 × 10−4 𝑓 (𝑥, 𝑦) = 4.77 × 10−4


(𝐴) (𝐵) (𝐴)
(a) 𝑦 (𝐴) = 𝑦est , 𝑦 (𝐵) = 𝑦est (b) Worst case in 𝑦 (𝐵) ∈ Y𝐵 , 𝑦 (𝐴) = 𝑦est

Fig. 9. Trajectories of the best controller obtained by Adversarial-CMA-ES on Y𝐵

ACM Trans. Evol. Learn., Vol. 1, No. 1, Article 1. Publication date: January 2022.

You might also like