Saddle Point Optimization with Approximate Minimization Oracle and Its Application to Robust Berthing Control
YOUHEI AKIMOTO, Faculty of Engineering, Information and Systems, University of Tsukuba & RIKEN Center for Advanced Intelligence Project
YOSHIKI MIYAUCHI and ATSUO MAKI, Department of Naval Architecture and Ocean Engineering, Graduate School of Engineering, Osaka University
1 INTRODUCTION
Simulation-based optimization has recently received increasing attention from researchers. Here, the
objective function ℎ : X → R, where X ⊆ R𝑚 , is not explicitly given, but its value for each 𝑥 ∈ X
can be computed through computational simulation. Solvers for simulation-based optimization
problems have been widely developed. While some are domain-specific, others are general-purpose
solvers. For a case where simulation-based optimization is required, we first need to design a
simulator that models reality, for example, a physical equation, and computes the objective function
value for each solution. Then, we apply a numerical solver to solve argmin𝑥 ∈X ℎ(𝑥). However, owing
to modeling errors and uncertainties, the optimal solution to argmin𝑥 ∈X ℎ(𝑥) computed through a
simulator is not necessarily optimal in real environments in which the obtained solution is used.
This issue threatens the reliability of solutions obtained through simulation-based optimization.
An approach to obtain a solution that is robust against modeling errors and uncertainty is to
formulate the problem as a min–max optimization
min𝑥∈X max𝑦∈Y 𝑓(𝑥, 𝑦),   (1)
where 𝑦 ∈ Y represents the model parameters and the uncertain parameters. In the following,
𝑦 is referred to as the uncertainty parameter. Assume that the real environment is represented
by 𝑦real ∈ Y. The original objective ℎ(𝑥) is equivalent to 𝑓 (𝑥, 𝑦est ) with an estimated parameter
𝑦est ∈ Y. Then, the solution 𝑥𝑦est = argmin𝑥∈X 𝑓(𝑥, 𝑦est) obtained via simulation does not guarantee good performance in the real environment. That is, 𝑓(𝑥𝑦est, 𝑦real) may be arbitrarily greater than 𝑓(𝑥𝑦est, 𝑦est). In contrast, the solution 𝑥Y = argmin𝑥∈X max𝑦∈Y 𝑓(𝑥, 𝑦) to (1) guarantees that 𝑓(𝑥Y, 𝑦real) ⩽ max𝑦∈Y 𝑓(𝑥Y, 𝑦). That is, by minimizing the worst-case objective value, one can
guarantee performance in the real environment provided that 𝑦real ∈ Y.
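As a minimal numerical sketch of this guarantee (the quadratic 𝑓, the discretized domains, and all values below are illustrative choices of ours, not taken from the paper):

import numpy as np

# Toy instance: f(x, y) = 0.5 x^2 + b x y - 0.5 y^2 on discretized domains.
b = 2.0
def f(x, y):
    return 0.5 * x**2 + b * x * y - 0.5 * y**2

X = np.linspace(-2.0, 2.0, 401)    # discretized search domain X
Y = np.linspace(-1.0, 1.0, 201)    # discretized uncertainty set Y
y_est, y_real = 0.3, -0.8          # estimated vs. realized parameter, y_real in Y

x_est = X[np.argmin(f(X, y_est))]            # argmin_x f(x, y_est)
F = np.array([f(x, Y).max() for x in X])     # worst-case objective F(x)
x_rob = X[np.argmin(F)]                      # argmin_x max_y f(x, y)

# The robust solution's realized performance never exceeds its worst-case value,
# whereas the nominal solution enjoys no such bound.
print(f(x_rob, y_real) <= f(x_rob, Y).max())     # True
print(f(x_est, y_real) - f(x_est, y_est))        # can be arbitrarily large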
Robust Berthing Control. As an important real-world application of the min–max optimization (1),
we consider an automatic ship berthing task [Maki et al. 2021, 2020], which can be formulated as
an optimization of the feedback controller of a ship. Currently, the domestic shipping industry in
Japan is facing a shortage of experienced on-board officers. Moreover, the existing workforce of
officers is aging [Ministry of Land, Infrastructure, Transport and Tourism 2020]. This has generated considerable interest
in autonomous ship operation to improve maritime safety, shipboard working environments, and
productivity, and the technology is being actively developed. Automatic berthing/docking requires
fine control so that the ship can reach the target position located near the berth but avoid colliding
with it. Therefore, automatic berthing is central to the realization of automatic ship operations.
Because it is difficult to train the controller in a real environment owing to cost and safety issues, a
typical approach first models the state equation of a ship, for example, using system identification
techniques [Abkowitz 1980; Araki et al. 2012; Miyauchi et al. 2021a; Wakita et al. 2021] and then
optimizes the feedback controller in a simulator. However, such an approach always suffers from
modeling errors and uncertainties. For instance, the coefficients of a state equation model are often
estimated based on captive model tests in towing tanks and regressions; hence, they may include
errors. Moreover, the weather conditions at the time of operation could differ from those considered
in the model. Optimization of the feedback controller on a simulator with an estimated model
may result in a catastrophic accident, such as collision with the berth. Thus, to design a berthing
control solution robust against modeling errors and uncertainties, we formulate the problem as
a min–max optimization (1), where 𝑥 is the parameter of the feedback controller and 𝑦 is the
parameter representing the coefficients of the state equation model and weather conditions.
Saddle Point Optimization. Here, we consider min–max continuous optimization (1), where
𝑓 : X × Y → R is the objective function and X × Y ⊆ R𝑚 × R𝑛 is the search domain. In addition to
the abovementioned situation, min–max optimization can be applied in many fields of engineering,
including robust design [Conn and Vicente 2012; Qiu et al. 2018], robust control [Pinto et al. 2017;
Shioya et al. 2018], constrained optimization [Cherukuri et al. 2017], and generative adversarial
networks (GANs) [Goodfellow et al. 2014; Salimans et al. 2016]. In particular, we are interested in
the min–max optimization of a derivative-free and black-box objective 𝑓 , where the gradient or
higher-order information is unavailable (derivative-free) and no special structures such as convexity
or the Lipschitz constant are available in advance (black-box) [Frazier 2018].1
1 In this paper, simulation-based optimization is used to refer to the problems described in the first paragraph above. The terms derivative-free and black-box are used to refer to the characteristics of the objective function.
We aim to locate a local min–max saddle point of 𝑓 , that is, a point (𝑥 ∗, 𝑦 ∗ ) satisfying 𝑓 (𝑥, 𝑦 ∗ ) ⩾
𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦) in a neighborhood of (𝑥 ∗, 𝑦 ∗ ). Generally, it is difficult to locate the global
minimum of the worst-case objective 𝐹 (𝑥) := max𝑦 ∈Y 𝑓 (𝑥, 𝑦). In a non-convex optimization
context, the goal is often to locate a local minimum of an objective rather than the global minimum
as a realistic target. However, in the min–max optimization context, it is still difficult to locate a
local minimum of the worst-case objective 𝐹 (𝑥) because doing so requires the maximization itself
and there may exist local maxima of 𝑓 (𝑥, 𝑦) unless 𝑓 (𝑥, 𝑦) is concave in 𝑦 for all 𝑥. A local min–max
saddle point is considered as a local optimal solution in the min–max optimization context because
it is a local minimum in 𝑥 and a local maximum in 𝑦. Therefore, as a practical target, we focus on
locating the local min–max saddle point of (1).
Related Works. First-order approaches are often employed for (1) if gradients are available. A
simultaneous gradient descent-ascent (GDA) approach
(𝑥𝑡 +1, 𝑦𝑡 +1 ) = (𝑥𝑡 , 𝑦𝑡 ) + 𝜂 (−∇𝑥 𝑓 (𝑥𝑡 , 𝑦𝑡 ), ∇𝑦 𝑓 (𝑥𝑡 , 𝑦𝑡 )) (2)
has often been analyzed for its local and global convergence properties on twice continuously
differentiable functions owing to its simplicity and popularity. A condition on the learning rate
𝜂 > 0 for the dynamics (2) to be asymptotically stable at a local min–max saddle point has been
studied [Mescheder et al. 2017; Nagarajan and Kolter 2017]. Subsequently, Adolphs et al. [2019]
showed the existence of asymptotically stable points of (2) that are not local min–max saddle
points. Liang and Stokes [2019] have derived a sufficient condition on 𝜂 for (2) to converge toward
the global min–max saddle point on a locally strongly convex–concave function. Frank-Wolfe
type approaches have also been analyzed for constrained situations [Gidel et al. 2017; Nouiehed
et al. 2019]. Although a convergence guarantee was not provided, [Bertsimas et al. 2010a,b] have
proposed a first-order approach targeting 𝑓 that is non-concave in 𝑦.
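For concreteness, a minimal Python sketch of the simultaneous GDA dynamics (2) on a strongly convex–concave quadratic (coefficients and learning rate are illustrative choices of ours):

import numpy as np

# f(x, y) = (a/2) x^2 + b x y - (c/2) y^2: strongly convex in x, concave in y.
a, b, c = 1.0, 2.0, 1.0
grad_x = lambda x, y: a * x + b * y
grad_y = lambda x, y: b * x - c * y

eta = 0.2                   # learning rate; must be small enough for stability
x, y = 1.0, -1.0
for _ in range(200):
    # simultaneous update: both players use the gradients at (x_t, y_t)
    x, y = x - eta * grad_x(x, y), y + eta * grad_y(x, y)
print(x, y)                 # approaches the saddle point (0, 0)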
Zero-order approaches for (1) include coevolutionary approaches [Al-Dujaili et al. 2019; Branke
and Rosenbusch 2008; Jensen 2004; Qiu et al. 2018; Zhou and Zhang 2010], surrogate-model–based
approaches [Bogunovic et al. 2018; Conn and Vicente 2012; Picheny et al. 2019], and gradient approximation approaches [Liu et al. 2020]. Compared to first-order approaches, zero-order approaches
have not been thoroughly analyzed in terms of their convergence guarantees and convergence
rates. In particular, coevolutionary approaches are often designed heuristically and without conver-
gence guarantees. Indeed, they fail to converge toward a min–max saddle point even on strongly
convex–concave problems, as has been reported in [Akimoto 2021] and as noted below in the
experimental results. Recently, Bogunovic et al. [2018] showed regret bounds for a Bayesian optimization approach and Liu et al. [2020] showed an error bound for a gradient approximation
approach, where the error is measured by the square norm of the gradient. Both analyses show
sublinear rates under possibly stochastic (i.e., noisy) versions of (1). However, compared to the
first-order approach, which exhibits linear convergence, they show slower convergence.
Contributions. We propose an approach to saddle point optimization (1) that relies solely
on numerical solvers that approximately solve argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦) for each 𝑦 ∈ Y and
argmin𝑦′ ∈Y −𝑓 (𝑥, 𝑦 ′) for each 𝑥 ∈ X. Given an initial solution (𝑥 0, 𝑦0 ) ∈ X × Y, our approach itera-
tively locates the approximate solutions 𝑥˜𝑡 ≈ argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦𝑡 ) and 𝑦˜𝑡 ≈ argmin𝑦′ ∈Y −𝑓 (𝑥𝑡 , 𝑦 ′)
and updates the solution as
(𝑥𝑡 +1, 𝑦𝑡 +1 ) = (𝑥𝑡 , 𝑦𝑡 ) + 𝜂 · (𝑥˜𝑡 − 𝑥𝑡 , 𝑦˜𝑡 − 𝑦𝑡 ), (3)
where 𝜂 > 0 is the learning rate. This approach takes inspiration from the GDA method (2),
where we replace −∇𝑥 𝑓 (𝑥𝑡 , 𝑦𝑡 ) and ∇𝑦 𝑓 (𝑥𝑡 , 𝑦𝑡 ) with 𝑥˜𝑡 − 𝑥𝑡 and 𝑦˜𝑡 − 𝑦𝑡 . However, unlike the
GDA approach, the solvers need not be gradient-based. This is advantageous in the following
situations: (1) there exists a well-developed numerical solver suitable for argmin𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦) and/or
argmin𝑦′ ∈Y −𝑓 (𝑥, 𝑦 ′); (2) derivative-free approaches such as the covariance matrix adaptation
evolution strategy (CMA-ES) [Akimoto and Hansen 2020; Hansen and Auger 2014; Hansen et al.
2003; Hansen and Ostermeier 2001] are sought because gradient information is not available or
gradient-based approaches are known to be sensitive to their initial search points.
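The following is a minimal sketch of iteration (3) in Python, with scipy's derivative-free Nelder–Mead routine standing in for the approximate minimization oracles (the implementations studied in this paper use (1+1)-CMA-ES or SLSQP instead; all parameter values here are illustrative):

import numpy as np
from scipy.optimize import minimize

def saddle_step(f, x, y, eta, budget=50):
    # One iteration of update (3): query both oracles, then move a fraction
    # eta from (x_t, y_t) toward the approximate best responses.
    opts = {"maxiter": budget}
    x_t = minimize(lambda x_: f(x_, y), x, method="Nelder-Mead", options=opts).x
    y_t = minimize(lambda y_: -f(x, y_), y, method="Nelder-Mead", options=opts).x
    return x + eta * (x_t - x), y + eta * (y_t - y)

# usage on f(x, y) = 0.5 |x|^2 + x.y - 0.5 |y|^2 (illustrative test problem)
f = lambda x, y: 0.5 * x @ x + x @ y - 0.5 * y @ y
x, y = np.ones(3), -np.ones(3)
for _ in range(100):
    x, y = saddle_step(f, x, y, eta=0.3)
print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the saddle at 0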
We analyze the proposed approach on strongly convex–concave problems, and prove its linear
convergence in terms of the number of numerical solver calls. In particular, we provide an upper
bound on 𝜂 to guarantee linear convergence toward the global min–max saddle point and the
convergence rate bound. This corresponds to the known result for the GDA approach (2). Compared
to existing derivative-free approaches for saddle point optimization, this result is unique in that our
convergence is linear, while the existing results show sublinear convergence [Bogunovic et al. 2018;
Liu et al. 2020]. Although our motivational application is not necessarily a strongly convex–concave problem, the quantitative analysis helps us understand a limitation of the approach, namely the need for 𝜂 adaptation, and provides inspiration on how to improve it.
Moreover, we developed a heuristic adaptation mechanism for the learning rate in the black-box optimization setting, where we do not know in advance the characteristic constants of a problem that determine the upper bound on the learning rate guaranteeing convergence. Therefore, a learning rate adaptation mechanism is highly desirable to avoid trial and error in tuning the learning rate. We implemented two variants of the proposed approach, one using (1+1)-CMA-ES [Arnold and Hansen 2010], a zero-order approach, as the minimization solver, and another using SLSQP [Kraft 1988], a first-order approach. Empirical studies on test problems show that the learning rate adaptation achieved performance competitive with the proposed approach under the optimal static learning rate, while obviating the need for time-consuming parameter tuning. We
also demonstrate the limitations of existing coevolutionary approaches as well as the proposed
approach.
We apply our approach to robust berthing control optimization, as an example of a real-world application with a non-convex–concave objective. We consider the wind conditions and the coefficients of the state equation for the wind force as the uncertainty parameter 𝑦. Some related
works address the wind force as an external disturbance when planning the trajectories [Miyauchi
et al. 2021b]; however, they treat the wind condition as an observable disturbance, and the control
signal is selected according to the observed wind condition. In contrast, we optimize the on-line
feedback controller under wind disturbance without considering the wind condition as an input to
the controller. Moreover, among other studies on automatic berthing control, to the best of our
knowledge, the present work is the first to address model uncertainty. Compared to a naive baseline
approach, the proposed approach located solutions with better worst-case performance.
This paper is an extension of a previous work [Akimoto 2021]. We have improved on the previous
work in the following respects. First, we improved the convergence analysis in Section 3.2. We
have removed unnecessary assumptions on the problem by refining the proof. Second, we have
incorporated the covariance matrix adaptation into our proposed approach in Section 4.3. Third, we
have implemented a restart strategy and other practical refinements, summarized in Section 4.2.
Fourth, we have extended the comparison with existing approaches in Section 5.2. Finally, we have
evaluated the usefulness of the proposed approach in a real-world application in Section 6.
Our implementation of the proposed approach in the Python programming language, Adversarial-
CMA-ES, is publicly available at GitHub Gist.2
2 https://gist.github.com/youheiakimoto/ab51e88c73baf68effd95b750100aad0
Notation. For a twice continuously differentiable function 𝑓 : R𝑚 × R𝑛 → R, that is, 𝑓 ∈ C²(R𝑚 × R𝑛, R), let 𝐻𝑥,𝑥(𝑥, 𝑦), 𝐻𝑥,𝑦(𝑥, 𝑦), 𝐻𝑦,𝑥(𝑥, 𝑦), and 𝐻𝑦,𝑦(𝑥, 𝑦) be the blocks of the Hessian matrix ∇²𝑓(𝑥, 𝑦) = [𝐻𝑥,𝑥(𝑥, 𝑦), 𝐻𝑥,𝑦(𝑥, 𝑦); 𝐻𝑦,𝑥(𝑥, 𝑦), 𝐻𝑦,𝑦(𝑥, 𝑦)] of 𝑓, whose (𝑖, 𝑗)-th components are 𝜕²𝑓/𝜕𝑥𝑖𝜕𝑥𝑗, 𝜕²𝑓/𝜕𝑥𝑖𝜕𝑦𝑗, 𝜕²𝑓/𝜕𝑦𝑖𝜕𝑥𝑗, and 𝜕²𝑓/𝜕𝑦𝑖𝜕𝑦𝑗, respectively, evaluated at a given point (𝑥, 𝑦).
For symmetric matrices 𝐴 and 𝐵, by 𝐴 ≽ 𝐵 and 𝐴 ≻ 𝐵, we mean that 𝐴 − 𝐵 is non-negative definite and positive definite, respectively. For simplicity, we write 𝐴 ≽ 𝑎 and 𝐴 ≻ 𝑎 for 𝑎 ∈ R to mean 𝐴 ≽ 𝑎 · 𝐼 and 𝐴 ≻ 𝑎 · 𝐼, respectively. For a positive definite symmetric matrix 𝐴, let √𝐴 denote the matrix square root, that is, √𝐴 is the positive definite symmetric matrix such that 𝐴 = √𝐴 · √𝐴. Let ∥𝑧∥𝐴 = [𝑧ᵀ𝐴𝑧]^(1/2) for a positive definite symmetric 𝐴.
Let 𝐽𝑔(𝑧) denote the Jacobian of a differentiable 𝑔 = (𝑔1, …, 𝑔𝑘) : Rℓ → R𝑘, whose (𝑖, 𝑗)-th element is 𝜕𝑔𝑖/𝜕𝑧𝑗 evaluated at 𝑧 = (𝑧1, …, 𝑧ℓ) ∈ Rℓ. If 𝑘 = 1, we write 𝐽𝑔(𝑧) = ∇𝑔(𝑧)ᵀ.
The suboptimality error 𝐺 (𝑥, 𝑦) is zero if and only if (𝑥, 𝑦) is the global min–max saddle point
of 𝑓 . Moreover, the local min–max saddle points of 𝑓 are characterized by suboptimality errors.
This is summarized in the following proposition, whose proof is given in Appendix A.
Proposition 2.3. The point (𝑥 ∗, 𝑦 ∗ ) is the global min–max saddle point of 𝑓 if and only if it is the
global minimal point of 𝐺, that is, 𝐺 (𝑥, 𝑦) ⩾ 0 for any (𝑥, 𝑦) ∈ X × Y. The point (𝑥 ∗, 𝑦 ∗ ) is a local
min–max saddle point of 𝑓 if and only if 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (·, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, ·),
respectively, that is, there exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝐺𝑥 (𝑥, 𝑦 ∗ ) ⩾ 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ )
and 𝐺 𝑦 (𝑥 ∗, 𝑦) ⩾ 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) for any (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 .
and for each 𝑦 ∈ R𝑛 there exists a unique global minimal point 𝑥ˆ (𝑦) ∈ R𝑚 such that 𝑥ˆ (𝑦) =
argmin𝑥 ∈R𝑚 𝑓 (𝑥, 𝑦).
The following lemma shows the positivity of the Hessian of the suboptimality error 𝐺, which
implies that the suboptimality error 𝐺 is a globally strongly convex function. The proof is provided
in Appendix A.
Lemma 2.6. Suppose that 𝑓 ∈ C 2 (R𝑚 ×R𝑛 , R) is globally 𝜇-strongly convex–concave for some 𝜇 > 0.
The Hessian matrix of the suboptimality error 𝐺 is ∇2𝐺 (𝑥, 𝑦) = diag(𝐺𝑥,𝑥 (𝑥, 𝑦ˆ (𝑥)), 𝐺 𝑦,𝑦 (𝑥ˆ (𝑦), 𝑦)),
where
𝐺𝑥,𝑥(𝑥, 𝑦) = 𝐻𝑥,𝑥(𝑥, 𝑦) + 𝐻𝑥,𝑦(𝑥, 𝑦)(−𝐻𝑦,𝑦(𝑥, 𝑦))⁻¹𝐻𝑦,𝑥(𝑥, 𝑦),
𝐺𝑦,𝑦(𝑥, 𝑦) = −𝐻𝑦,𝑦(𝑥, 𝑦) + 𝐻𝑦,𝑥(𝑥, 𝑦)(𝐻𝑥,𝑥(𝑥, 𝑦))⁻¹𝐻𝑥,𝑦(𝑥, 𝑦),
and they are symmetric, and 𝐺𝑥,𝑥 (𝑥, 𝑦) ≽ 𝜇 and 𝐺 𝑦,𝑦 (𝑥, 𝑦) ≽ 𝜇.
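As a worked instance of the lemma (our example, assuming the scalar quadratic considered later in Section 3):

% A worked instance of Lemma 2.6 for the scalar quadratic
% f(x, y) = (a/2) x^2 + b x y - (c/2) y^2 with a, c >= mu > 0:
% the Hessian blocks are H_{x,x} = a, H_{x,y} = H_{y,x} = b, H_{y,y} = -c, so
\[
  G_{x,x} = a + b\,c^{-1}b = a + \tfrac{b^2}{c} \ge \mu,
  \qquad
  G_{y,y} = c + b\,a^{-1}b = c + \tfrac{b^2}{a} \ge \mu,
\]
% and the suboptimality error G is strongly convex, as the lemma asserts.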
For example, the gradient descent method is well known to exhibit a geometric decrease in the
objective function value on strongly convex functions with Lipschitz continuous gradients [Boyd
and Vandenberghe 2004; Karimi et al. 2016]. The (1+1)-ES also exhibits a geometric decrease on
strongly convex functions with Lipschitz continuous gradients [Morinaga and Akimoto 2019]. We
can satisfy the oracle requirement (8) by performing 𝑂 (log(1/𝜖)) iterations of such algorithms.
The condition can also be satisfied by algorithms that exhibit slower convergence, that is, sublinear
convergence. However, for such algorithms, the runtime increases as a candidate solution becomes
closer to a local optimum. Therefore, the stopping condition for the internal algorithm to satisfy (8)
needs to be carefully designed.
Theorem 3.3. Consider approach (7) with approximate minimization oracles M𝑥 and M𝑦 satisfying Assumption 3.2 with approximation precision 𝜖 < 𝛼𝐻⁵/(4𝛽𝐻⁴𝛽𝐺). Let

𝜂* = (𝛼𝐻/𝛽𝐻) · (𝛼𝐻/𝛽𝐺) · (1 − √((𝛽𝐻/𝛼𝐻)² · (𝛽𝐺/𝛼𝐻) · 𝜖)) / (1 + √𝜖)²,   (9)

𝛾 = −2𝜂 · (𝛼𝐻/𝛽𝐻) · (1 − √((𝛽𝐻/𝛼𝐻)² · (𝛽𝐺/𝛼𝐻) · 𝜖)) + 𝜂² · (1 + √𝜖)² · (𝛽𝐺/𝛼𝐻).   (10)

Then, for any 𝜂 < 2 · 𝜂*, we have 𝛾 < 0 and log(𝐺(𝑥𝑡+1, 𝑦𝑡+1)) − log(𝐺(𝑥𝑡, 𝑦𝑡)) < 𝛾. In other words, the runtime 𝑇𝜁 to reach {(𝑥, 𝑦) ∈ R𝑚 × R𝑛 : 𝐺(𝑥, 𝑦) ⩽ 𝜁 · 𝐺(𝑥0, 𝑦0)} for 𝜁 ∈ (0, 1) is 𝑇𝜁 ⩽ ⌈(1/|𝛾|) · log(1/𝜁)⌉ for any initial point (𝑥0, 𝑦0) ∈ R𝑚 × R𝑛. Moreover, 𝐺(𝑥𝑡+1, 𝑦𝑡+1) > 𝐺(𝑥𝑡, 𝑦𝑡) for all (𝑥𝑡, 𝑦𝑡) if 𝜂 > 2 · 𝜂̄, where

𝜂̄ = (𝛽𝐻/𝛼𝐺) · (𝛽𝐻/𝛼𝐻) · (1 + √((𝛽𝐺/𝛼𝐻) · 𝜖)) / (1 − (𝛽𝐻/𝛼𝐻) · √𝜖)².   (11)
Linear Convergence. The proposed approach (7) satisfying Assumption 3.2 converges linearly
toward the global min–max saddle point on a strongly convex–concave objective function if 𝜂 < 2𝜂 ∗ .
If M𝑥 and M 𝑦 are implemented with algorithms that exhibit linear convergence, we can conclude
that the runtime in terms of 𝑓-calls and/or ∇𝑓-calls is

O((1/|𝛾|) · log(1/𝜁) · log(1/𝜖)).   (12)
Necessary Condition. To exhibit convergence, shrinking the learning rate 𝜂 is not only sufficient but also necessary. To determine the closeness of the upper bound 2 · 𝜂* in the sufficient condition and the lower bound 2 · 𝜂̄ in the necessary condition, consider for instance a convex–concave quadratic function 𝑓(𝑥, 𝑦) = (𝑎/2)𝑥² + 𝑏𝑥𝑦 − (𝑐/2)𝑦², where 𝑎 > 0, 𝑏 ∈ R, and 𝑐 > 0. Then, we have 𝛼𝐻 = 𝛽𝐻 = 1 and 𝛼𝐺 = 𝛽𝐺 = (𝑎𝑐 + 𝑏²)/(𝑎𝑐). Ignoring the effect of 𝜖, we have 𝜂* = 𝜂̄ = 𝑎𝑐/(𝑎𝑐 + 𝑏²). This implies that the sufficient condition for linear convergence, 𝜂 < 2 · 𝜂*, is indeed the necessary condition for the convergence itself in this example situation. This reveals a limitation of existing approaches [Al-Dujaili et al. 2019; Pinto et al. 2017], which correspond to (7) with 𝜂 = 1.
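This threshold behavior can be checked numerically; the following sketch uses exact best-response oracles for this quadratic (an idealization of ours with 𝜖 = 0):

import numpy as np

a, b, c = 1.0, 2.0, 1.0
eta_star = a * c / (a * c + b**2)       # = eta_bar here, ignoring epsilon

def run(eta, steps=500):
    # update (3) with exact best-response oracles for this quadratic:
    # x_hat(y) = -(b/a) y and y_hat(x) = (b/c) x
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x + eta * (-(b / a) * y - x), y + eta * ((b / c) * x - y)
    return np.hypot(x, y)

print(run(0.9 * 2 * eta_star) < 1e-3)   # True: converges below the threshold
print(run(1.1 * 2 * eta_star) > 1.0)    # True: diverges above 2 * eta_star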
Runtime Bound. The runtime bound 𝑇𝜁 is proportional to 1/|𝛾| in (10). The log-progress bound |𝛾| is roughly proportional to 2 · 𝜂 if 𝜂 ≪ 1. That is, the runtime is proportional to 1/(2 · 𝜂). The minimal runtime bound is obtained when 𝜂 = 𝜂*, where

𝛾 = 𝛾* := −(𝛼𝐻/𝛽𝐺) · (𝛼𝐻/𝛽𝐻)² · ((1 − √((𝛽𝐻/𝛼𝐻)² · (𝛽𝐺/𝛼𝐻) · 𝜖)) / (1 + √𝜖))².   (13)

Provided that 𝜖 ≪ 1, we have 𝜂* ≈ (𝛼𝐻/𝛽𝐺)(𝛼𝐻/𝛽𝐻) and 𝛾* ≈ −(𝛼𝐻/𝛽𝐺)(𝛼𝐻/𝛽𝐻)². The main factor that limits 𝜂* and 𝛾* is 𝛼𝐻/𝛽𝐺. As noted in the above example of a convex–concave quadratic function, the ratio 𝛼𝐻/𝛽𝐺 is smaller as the influence of the interaction term between 𝑥 and 𝑦 on the objective function value is greater than that of the other terms, that is, as 𝑏²/(𝑎𝑐) is greater. The other factor, 𝛼𝐻/𝛽𝐻, is smaller as the condition number Cond(𝐻𝑥,𝑥(𝑥, 𝑦)(𝐻*𝑥,𝑥)⁻¹) or Cond(𝐻𝑦,𝑦(𝑥, 𝑦)(𝐻*𝑦,𝑦)⁻¹) is higher. This depends on the change in the Hessian matrix over the search space R𝑚 × R𝑛. If the objective function is convex–concave quadratic, that is, 𝑓(𝑥, 𝑦) = ½𝑥ᵀ𝐻𝑥,𝑥𝑥 + 𝑥ᵀ𝐻𝑥,𝑦𝑦 + ½𝑦ᵀ𝐻𝑦,𝑦𝑦, the Hessian matrix is constant over the search space, and we have 𝛼𝐻/𝛽𝐻 = 1, whereas 𝛽𝐺 = 1 + 𝜎²max((√𝐻𝑥,𝑥)⁻¹𝐻𝑥,𝑦(√(−𝐻𝑦,𝑦))⁻¹) and 𝛼𝐺 = 1 + 𝜎²min((√𝐻𝑥,𝑥)⁻¹𝐻𝑥,𝑦(√(−𝐻𝑦,𝑦))⁻¹), where 𝜎min and 𝜎max denote the smallest and largest singular values.
The corresponding runtime bound for the GDA is obtained by substituting (14) into (12), ignoring the effect of 𝜖. Note, however, that the runtime of the GDA depends on the pre-conditioning, as it is a first-order approach. The number of oracle calls of the oracle-based saddle point optimization is independent of the pre-conditioning, but the number of 𝑓 and/or ∇𝑓 calls in each oracle call may depend on the pre-conditioning.
rate 𝜂 (line 25). We also update 𝛾̃ when 𝜂 = 𝜂𝑐. If both 𝛾̃ and 𝛾̃𝑐 are non-negative, the learning rate is too high, and we reduce 𝜂 by multiplying it by 1/𝑐𝜂³. If 𝛾̃𝑐 − 2𝜎𝛾̃𝑐 > 0, where 𝜎𝛾̃ is the estimated standard deviation of 𝛾̃, we revert the solutions and the other strategy parameters 𝜃𝑥 and 𝜃𝑦.
Based on our preliminary experiments and the above argument, we set 𝑎𝜂 = 1, 𝑏𝜂 = 5, and 𝑐𝜂 = 1.1 as the default values.
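A rough sketch of this decision logic in Python (the function and argument names are ours, and details may differ from Algorithm 1):

def update_learning_rate(eta, gamma_tilde, gamma_tilde_c, sigma_gamma_c,
                         c_eta=1.1, eta_min=1e-4):
    # If neither log-progress estimate indicates progress (both non-negative),
    # the learning rate is deemed too high and is reduced.
    if gamma_tilde >= 0 and gamma_tilde_c >= 0:
        eta = max(eta / c_eta**3, eta_min)
    # A significantly positive recent estimate triggers a rollback of the
    # solutions and strategy parameters (handled by the caller).
    revert = gamma_tilde_c - 2 * sigma_gamma_c > 0
    return eta, revert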
4.3 Adversarial-CMA-ES
We implemented the proposed approach with (1+1)-CMA-ES as M𝑥 and M 𝑦 . The (1+1)-CMA-ES is a
derivative-free randomized hill-climbing approach with step-size adaptation and covariance matrix
adaptation. It samples a candidate solution 𝑧 ′ ∼ N (𝑧, (𝜎𝐴) (𝜎𝐴) T ), where 𝜎 is the step size and
𝐴 · 𝐴T is the covariance matrix. The step size is adapted with the so-called 1/5-success rule [Devroye
1972; Rechenberg 1973; Schumer and Steiglitz 1968], which maintains 𝜎 such that the probability
of generating a better solution is approximately 1/5. We implemented a simplified 1/5-success rule
proposed by [Kern et al. 2004]. The covariance matrix was adapted with the active covariance
matrix update [Arnold and Hansen 2010]. The results show empirically that the covariance matrix learned the inverse Hessian matrix on a convex quadratic function. The algorithm is summarized in Algorithm 2; we call Algorithm 1 with Algorithm 2 as M𝑥 and M𝑦 Adversarial-CMA-ES.
Algorithm 2 (excerpt, lines 5–16): (1+1)-CMA-ES sampling and active covariance matrix update
5: 𝑧′ ← 𝑧 + 𝜎𝐴 · N(0, 𝐼)
6: ℎ𝑧′ = ℎ(𝑧′)
7: if ℎ𝑧′ ⩽ 𝐻1 then
8:   𝐻 ← (ℎ𝑧′, 𝐻1, 𝐻2, 𝐻3, 𝐻4)
9:   𝑝succ ← (1 − 𝑐𝑝) · 𝑝succ + 𝑐𝑝
10:  if 𝑝succ > 𝑝thre then
11:    𝑝 ← (1 − 𝑐𝑐) · 𝑝,  𝑐cov = 𝑐⁺cov · (1 − 𝑐𝑐 · (2 − 𝑐𝑐))
12:  else
13:    𝑝 ← (1 − 𝑐𝑐) · 𝑝 + √(𝑐𝑐 · (2 − 𝑐𝑐)) · (𝑧′ − 𝑧)/𝜎,  𝑐cov = 𝑐⁺cov
14:  end if
15:  𝑤 = 𝐴inv · 𝑝
16:  𝑎 = √(1 − 𝑐cov),  𝑏 = (√(1 − 𝑐cov)/∥𝑤∥²) · (√(1 + 𝑐cov · ∥𝑤∥²/(1 − 𝑐cov)) − 1)
We shared the strategy parameter 𝜃 = (𝜎, 𝐴) over oracle calls. Here, we implicitly assumed that
the objective function ℎ of the current oracle call and that of the last oracle call are similar because
the changes in 𝑥𝑡 and 𝑦𝑡 are small if 𝜂 is small. Then, reusing the strategy parameter of the last
oracle call reduces the time needed for its re-adaptation.
We ran (1+1)-CMA-ES until it improved the solution 𝜏es ·ℓ +𝜏es′ times. The reason for this procedure
is described below. Because the step size is maintained such that the probability of generating a
successful solution is approximately 1/5, the algorithm runs approximately 𝑇 = 5 · (𝜏es · ℓ + 𝜏es ′ )
iterations. It was shown in [Morinaga and Akimoto 2019] that the expected runtime E[𝑇𝜖 ] of
(1+1)-ES with the simplified 1/5-success rule is Θ(log(1/𝜖)) on strongly convex functions with
Lipschitz continuous gradients and their strictly increasing transformations. Moreover, the scaling
of the runtime with respect to dimension ℓ is Θ(ℓ) on general convex quadratic functions [Morinaga
et al. 2021]. Therefore, we expect that 𝑇 iterations of (1+1)-CMA-ES approximates M with 𝜖 ∈
exp(−Θ(𝑇 /ℓ)) = exp(−Θ(1)). The reason that we count the number of successful iterations instead
of the number of total iterations is to avoid producing no progress because of a bad initialization of
each oracle call.
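A sketch of this stopping rule (the es object and its ask/tell interface are hypothetical and for illustration only; Algorithm 2 realizes the actual update):

def run_until_successes(es, h, n_success):
    # Run a (1+1)-type solver until n_success improving steps have occurred.
    # `es` is a hypothetical (1+1)-CMA-ES object exposing ask/tell and its
    # current best value and solution.
    successes = 0
    while successes < n_success:
        z = es.ask()                 # sample from N(z, (sigma A)(sigma A)^T)
        hz = h(z)
        if hz <= es.best_value:      # success: the candidate improves the best
            successes += 1
        es.tell(z, hz)               # 1/5-success rule and covariance update inside
    return es.best_solution

# With the defaults tau_es = tau_es' = 5 and dimension l, one oracle call uses
# n_success = tau_es * l + tau_es', i.e., roughly 5 * n_success iterations.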
Another optional stopping condition is 𝜎 < 𝜎¯min for a given minimal step size 𝜎¯min ⩾ 0. Once 𝜎
reaches 𝜎¯min , Algorithm 2 returns 𝜎 = 𝜎¯min . Then, the next M call starts with 𝜎 = 𝜎¯min and it is
expected to stop after a few iterations. That is, if 𝜎 for M𝑥 reaches 𝜎¯min while 𝜎 > 𝜎¯min for M 𝑦 ,
Algorithm 1 spends more 𝑓 -calls for M 𝑦 than for M𝑥 , and vice versa.
Based on our preliminary experiments, we set 𝜏es = 𝜏′es = 5 as their default values.
4.4 Adversarial-SLSQP
We also implemented the algorithm with a sequential least squares quadratic programming (SLSQP)
subroutine [Kraft 1988] to demonstrate the applicability of the proposed 𝜂 adaptation mechanism.
This is a first-order approach, which requires access to ∇𝑓. Unlike Adversarial-CMA-ES, no
strategy parameter for SLSQP is shared over oracle calls. The maximum number of iterations is
set to 𝜏slsqp = 5. We used the scipy implementation of SLSQP as M in Algorithm 1. We call this
first-order approach Adversarial-SLSQP (ASLSQP).
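For illustration, a single oracle call might be wired as follows (a sketch; the exact integration in Adversarial-SLSQP may differ):

from scipy.optimize import minimize

def oracle_slsqp(h, z0, bounds=None, max_iter=5):
    # One approximate minimization oracle call: a few SLSQP iterations from z0.
    res = minimize(h, z0, method="SLSQP", bounds=bounds,
                   options={"maxiter": max_iter})
    return res.x

# Inside update (3): x_tilde = oracle_slsqp(lambda x_: f(x_, y), x)
#                    y_tilde = oracle_slsqp(lambda y_: -f(x, y_), y)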
5 NUMERICAL EVALUATION
Through experiments on test problems as described below, we confirmed the following hypotheses.
(A) Our implementations of the proposed approach, Adversarial-CMA-ES and Adversarial-SLSQP,
performed as well as the theory implies. (B) Our learning rate adaptation located a nearly optimal
learning rate with little compromise of the objective function calls. (C) Local strong convexity–
concavity of the objective function is necessary for good min–max performance of the proposed
approach. (D) Existing coevolutionary approaches fail to converge even on a convex–concave
quadratic problem.
Fig. 1. The number of 𝑓-calls until 𝐺(𝑥, 𝑦) ⩽ 10⁻⁵ is reached on 𝑓1 with 𝑛 = 𝑚 = 10 and 𝑎 = 𝑏 = 𝑐 = 1. (a) Adversarial-CMA-ES with 𝜂-adaptation (adapt) and fixed 𝜂 = 𝜂* × 2^((3−𝑘)/3) for 𝑘 = 1, …, 12. (b) Adversarial-SLSQP with 𝜂-adaptation (adapt) and fixed 𝜂 = 𝜂* × 2^((3−𝑘)/3) for 𝑘 = 1, …, 12. The dashed lines are proportional to 1/((𝜂/𝜂*) · (2 − 𝜂/𝜂*)).
interval, that is, 𝜎 𝑥 = 𝜎 𝑦 = 1.5. The factors 𝐴𝑥 and 𝐴𝑦 are initialized by the identity matrix. We
used the default hyperparameter values described in the previous section. We omitted lines 12–13
and lines 28–32 of Algorithm 1 (i.e., neither 𝑃𝑥 nor 𝑃 𝑦 are given and 𝐺 tol = 0) in this experiment.
The minimal learning rate was set to 𝜂min = 10⁻⁴. The minimal step sizes were set to 𝜎̄𝑥min = 𝜎̄𝑦min = 0. We ran 50 independent trials for each setting, with the maximum number of 𝑓-calls set to 10⁷.
Figure 1 compares the proposed approaches with and without the 𝜂-adaptation mechanism. For the fixed-𝜂 cases, we set 𝜂 to 𝛿 · 𝜂* with 𝛿 ∈ {2^((3−𝑘)/3) : 𝑘 = 1, …, 12}. We remark that for both algorithms, all the trials with 𝜂 = 2 × 𝜂* fail to converge, as implied by Theorem 3.3. As expected, the runtimes of both algorithms with fixed 𝜂 were nearly proportional to 1/((𝜂/𝜂*) · (2 − 𝜂/𝜂*)). The best 𝜂 is approximately 𝜂*. We conclude that our implementations closely approximate the oracle condition (8) and that the proposed approach works as the theory implies.
The proposed approach with the 𝜂-adaptation mechanism succeeded in converging toward the global min–max saddle point. Comparing the runtime under 𝜂-adaptation with that under the best fixed 𝜂 = 𝜂*, adapting 𝜂 costs at most three times more 𝑓-calls in the median case for both Adversarial-CMA-ES and Adversarial-SLSQP. Some trials required a few times more runtime than the median case. However, considering the difficulty of tuning 𝜂 in advance, we conclude that the 𝜂-adaptation mechanism is a promising way to waive the need for tuning 𝜂 in advance.
Figures 2a and 2b show the runtime of the proposed approaches with and without 𝜂-adaptation for varying 𝑏 and for varying 𝑎/𝑐. For the fixed-𝜂 case, we set 𝜂 = 𝜂*. It may be observed that the runtimes in terms of the number of iterations are proportional to 1 + 𝑏²/(𝑎 · 𝑐), as expected from Theorem 3.3. Moreover, the number of iterations was largely the same for all algorithms, as they all approximate (8) with 𝜖 ≪ 1. In contrast, the number of 𝑓-calls differed between the two algorithms. This is because Adversarial-CMA-ES is expected to spend approximately 5 · (𝜏es · ℓ + 𝜏′es) = 275 𝑓-calls per oracle call, whereas Adversarial-SLSQP spends 𝜏slsqp = 5 𝑓-calls. We remark that if one of the CMA-ES instances in Adversarial-CMA-ES (i.e., either M𝑥 or M𝑦) is replaced with SLSQP, the number of 𝑓-calls will be approximately halved. Therefore, it is advisable to use SLSQP, or another first-order approach, as an approximate minimization oracle if ∇𝑓 is available and cheap to compute. Figure 2c shows the scaling of the runtime with respect to the dimension 𝑛 = 𝑚. The number of iterations did
Fig. 2. The number of iterations and the number of 𝑓-calls until 𝐺(𝑥, 𝑦) ⩽ 10⁻⁵ is reached on 𝑓1. The solid lines indicate the median and the shaded areas indicate the 10–90 percentile ranges. Dashed lines are proportional to 1 + 𝑏²/(𝑎 · 𝑐).
not depend on the search space dimension. The number of 𝑓 -calls was also constant over varying
𝑛 = 𝑚 for Adversarial-SLSQP. However, it was proportional to 𝑛 + 𝑚 for Adversarial-CMA-ES.3
This is because the runtime of (1+1)-CMA-ES is proportional to the dimension, and iterations must
be run proportional to the search space dimension to approximate (8).
Table 1. Definition of the test functions 𝑓(𝑥, 𝑦) and their worst-case variable 𝑦̂(𝑥) = argmax𝑦∈Y 𝑓(𝑥, 𝑦)

𝑓2(𝑥, 𝑦) = ½∥𝑥∥² + (1/𝑚)(Σᵢ₌₁ᵐ 𝑥ᵢ)(Σⱼ₌₁ⁿ 𝑦ⱼ) − ½∥𝑦∥²,   𝑦̂(𝑥) = ((1/𝑚) Σᵢ₌₁ᵐ 𝑥ᵢ) · 1
𝑓3(𝑥, 𝑦) = ½ min(∥𝑥∥², ∥𝑥 − 4 · 1∥²) + (1/𝑚)(Σᵢ₌₁ᵐ 𝑥ᵢ)(Σⱼ₌₁ⁿ 𝑦ⱼ) − ½∥𝑦∥²,   𝑦̂(𝑥) = ((1/𝑚) Σᵢ₌₁ᵐ 𝑥ᵢ) · 1
𝑓4(𝑥, 𝑦) = ½∥𝑥∥² + (1/𝑚)(Σᵢ₌₁ᵐ 𝑥ᵢ)(Σⱼ₌₁ⁿ 𝑦ⱼ) − ¼∥𝑦∥⁴,   𝑦̂(𝑥) = ((1/(𝑚 · 𝑛)) Σᵢ₌₁ᵐ 𝑥ᵢ)^(1/3) · 1
𝑓5(𝑥, 𝑦) = (1/𝑚)∥𝑥∥ · Σⱼ₌₁ⁿ 𝑦ⱼ,   𝑦̂(𝑥) = 5 · 1
𝑓6(𝑥, 𝑦) = ((1/𝑚) Σᵢ₌₁ᵐ 𝑥ᵢ)² · Σⱼ₌₁ⁿ 𝑦ⱼ − ½∥𝑦∥²,   𝑦̂(𝑥) = ((1/𝑚) Σᵢ₌₁ᵐ 𝑥ᵢ)² · 1
𝑓7(𝑥, 𝑦) = ½∥𝑥∥² + (1/𝑚)(Σᵢ₌₁ᵐ 𝑥ᵢ)(Σⱼ₌₁ⁿ 𝑦ⱼ),   𝑦̂(𝑥) = 5 · 1 if Σᵢ₌₁ᵐ 𝑥ᵢ ⩾ 0, and −1 · 1 if Σᵢ₌₁ᵐ 𝑥ᵢ < 0
by 𝑓(𝑇X(𝑥), 𝑇Y(𝑦)), where 𝑇X and 𝑇Y map each coordinate to 𝑈 − |mod(𝑥 − 𝐿, 2(𝑈 − 𝐿)) − (𝑈 − 𝐿)|, where 𝑈 = 5 and 𝐿 = −1 denote the upper and lower bounds of each coordinate. The output of (1+1)-CMA-ES (M𝑥 and M𝑦) is repaired into the feasible domain by applying 𝑇X and 𝑇Y. We compare
the results with those of the naive baseline approach, referred to as CMA-ES(𝑁 𝑦 ). We sampled
𝑁 𝑦 = 10 or 100 points uniform-randomly in Y, and they were denoted as 𝑦𝑘 for 𝑘 = 1, . . . , 𝑁 𝑦 . The
approximate worst-case objective was defined as 𝐹𝑁𝑦(𝑥) = max1⩽𝑘⩽𝑁𝑦 𝑓(𝑥, 𝑦𝑘). Then, we minimized 𝐹𝑁𝑦 with (1+1)-CMA-ES (Algorithm 2) using the mirroring boundary constraint handling. These algorithms
are run 10 times with different initial solutions. We also compared two coevolutionary approaches,
MMDE [Qiu et al. 2018] and COEVA [Al-Dujaili et al. 2019]. These approaches were implemented
based on the Python code provided by the authors of [Al-Dujaili et al. 2019].
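A sketch of this boundary handling and of the baseline's approximate worst-case objective (function names are ours; the construction follows the description above):

import numpy as np

L_b, U_b = -1.0, 5.0   # lower/upper bounds of each coordinate (L and U above)

def mirror(z):
    # T(z) = U - |mod(z - L, 2(U - L)) - (U - L)|, applied coordinate-wise;
    # maps any point back into [L, U] by reflection at the boundaries.
    return U_b - np.abs(np.mod(z - L_b, 2 * (U_b - L_b)) - (U_b - L_b))

def F_Ny(f, x, ys):
    # Baseline CMA-ES(N_y): approximate worst case over N_y fixed samples.
    return max(f(x, y_k) for y_k in ys)

ys = [np.random.uniform(L_b, U_b, size=20) for _ in range(100)]  # N_y = 100, n = 20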
Figure 3 shows the results of 10 independent trials of Adversarial-CMA-ES, CMA-ES(𝑁𝑦 = 10), CMA-ES(𝑁𝑦 = 100), MMDE, and COEVA. Adversarial-CMA-ES succeeds in converging to the global min–max saddle point on 𝑓2, 𝑓3, and 𝑓6. The functions 𝑓2 and 𝑓3 are locally strongly convex–concave functions, and Adversarial-CMA-ES performed well on them, as expected. The existing coevolutionary approaches, as well as CMA-ES(𝑁𝑦), failed to converge on these problems. The benchmark problems used to evaluate the performance of existing coevolutionary approaches [Branke and Rosenbusch 2008; Qiu et al. 2018; Zhou and Zhang 2010] are rather low-dimensional problems (𝑚 ⩽ 2 and 𝑛 ⩽ 2). These approaches do not work well on higher-dimensional problems and perform worse than the simple baseline, CMA-ES(𝑁𝑦). CMA-ES(𝑁𝑦) tends to converge to the global optimal point on 𝑓5. This is because the optimal 𝑥* is optimal for the approximate worst-case function provided that there exists a 𝑦 among the 𝑁𝑦 samples such that Σᵢ₌₁ⁿ 𝑦ᵢ > 0 holds. In contrast, no approach succeeded in converging
toward the global optimum of the worst-case function on 𝑓4 , 𝑓6 , and 𝑓7 . From these results, we
conclude that local strong convexity–concavity is an important factor for the convergence of
Adversarial-CMA-ES. These results reveal the limitations of Adversarial-CMA-ES and the difficulty
of locating the solution to the worst-case objective if it is not a min–max saddle point.
Fig. 3. Results of 10 independent runs of Adversarial-CMA-ES with and without sampling distribution
(denoted as Adv-CMA-ES(P) and Adv-CMA-ES(no P), respectively), CMA-ES(𝑁 𝑦 = 10), CMA-ES(𝑁 𝑦 = 100),
MMDE, and COEVA. The search space dimension is 𝑚 = 50 and 𝑛 = 20 for all cases.
the vessel MV ESSO OSAKA (Figure 4a). The state variables 𝑠 = (𝑋, 𝑢, 𝑌 , 𝑣𝑚 ,𝜓, 𝑟 ) ∈ R6 and the
control signal 𝑎 = (𝛿, 𝑛 p, 𝑛 BT, 𝑛 ST ) ∈ R4 were as described in Figure 4b. The controller 𝑢𝑥 : 𝑠 ↦→ 𝑎
was modeled by a neural network with 𝑚 = 99 dimensional parameter vector 𝑥. The objective
function 𝑓 (𝑥, 𝑦) measures the distance between the target position and the final position of the
subject ship after a pre-defined control time using the controller 𝑢𝑥, penalized by the risk of collision with the berth, under an uncertainty parameter 𝑦 ∈ Y described below. The details of the
controller and the objective function are explained in Appendix B.
The wind conditions 𝑦 (𝐴) and the model coefficients 𝑦 (𝐵) with respect to the wind forces are
treated as the uncertain factors 𝑦 = (𝑦 (𝐴) , 𝑦 (𝐵) ). The following three situations are considered. (A)
The state equation model is accurately modeled, but the wind conditions are uncertain. In this
situation, the uncertainty parameters 𝑦 (𝐴) ∈ Y𝐴 represent the wind velocity 𝑈𝑇 [m/s] and the wind
direction 𝛾𝑇 [rad], and their feasible values are in Y𝐴 = [0, 0.5] × [0, 2𝜋]. The model coefficients 𝑦(𝐵) are set to the same values as in [Miyauchi et al. 2021b], denoted by 𝑦est(𝐵). (B) Wind conditions are
known, but the state equation model is uncertain. The coefficients in the state equation model for
the effect of the wind force were derived in [Fujiwara et al. 1998] using regression of wind tunnel
experiment data, and we consider them to be relatively inaccurate. The uncertainty parameters
𝑦(𝐵) consist of 10 coefficients for the wind force. The feasible domain Y𝐵 is constructed to include the coefficient used in [Miyauchi et al. 2021b], that is, 𝑦est(𝐵) ∈ Y𝐵. For each variable, the feasible
values are defined by the interval. The interval of the 𝑖th component of 𝑦(𝐵), denoted by [𝑦est(𝐵)]𝑖, is set to [0.9 · [𝑦est(𝐵)]𝑖, 1.1 · [𝑦est(𝐵)]𝑖] for all 𝑖 = 1, …, 10. The other model coefficients are set to the same values as [Miyauchi et al. 2021b] and the wind condition is set to 𝑦est(𝐴) = (1.5𝜋, 0.5), meaning that the velocity of wind blowing orthogonally from the sea to the berth is 0.5 [m/s]. (C) Wind conditions are unknown, and the model coefficients are uncertain. In this situation, 𝑦 is composed of the uncertainty parameters 𝑦(𝐴) and 𝑦(𝐵), and the feasible values are set to Y𝐶 = Y𝐴 × Y𝐵.

Fig. 4. (a) The subject ship, MV ESSO OSAKA. (b) Coordinate system, state variables, and control signals.
4 CMA-ES(𝑁𝑦 = 10), MMDE, and COEVA tested in Section 5.2 were omitted from the comparison based on our preliminary experiments. The worst-case performance of CMA-ES(𝑁𝑦 = 10) was worse than that of CMA-ES(𝑁𝑦 = 100) on our problems. The worst-case performance of MMDE and COEVA was not competitive with the other approaches, as demonstrated in Figure 3.
For Adversarial-CMA-ES, we used the restart strategy proposed in Algorithm 1. The output
of Adversarial-CMA-ES follows Algorithm 1. For CMA-ES(𝑁 𝑦 ), when the termination condition
𝜎 < 𝜎¯min was satisfied, the candidate solution was recorded and the algorithm was re-started until
it exhausted the 𝑓 -call budget. Note that 𝑦𝑘 (𝑘 = 1, . . . , 𝑁 𝑦 ) were not resampled. The output of CMA-
ES(𝑁 𝑦 ) is determined as follows: Let {𝑥 1, . . . , 𝑥 𝑟 } be the set of recorded candidate solutions and the
solution obtained at the end of the run. We then selected 𝑥 = argmin𝑖=1,...,𝑟 max𝑘=1,...,𝑁 𝑦 𝑓 (𝑥 𝑖 , 𝑦𝑘 ) as
the output of CMA-ES(𝑁 𝑦 ).
The obtained solutions were evaluated as follows. Because the ground truth worst-case objective
function value 𝐹 (𝑥) = max𝑦 ∈Y 𝑓 (𝑥, 𝑦) for a given 𝑥 is unknown, we perform numerical optimization
to approximate 𝐹 (𝑥). We ran (1+1)-CMA-ES for 500 × 𝑛 iterations to obtain a local maximal point 𝑦
of 𝑓 (𝑥, 𝑦). As the objective is expected to have multiple local optima, we repeat it 100 times with
different initial search points 𝑦. The initialization of (1+1)-CMA-ES is as described above.
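A sketch of this evaluation protocol (local_maximize is a hypothetical wrapper around a (1+1)-CMA-ES run; the sampling bounds follow the respective uncertainty set):

import numpy as np

def approx_worst_case(f, x, lo, hi, n, restarts=100, iters_per_dim=500):
    # Approximate F(x) = max_y f(x, y) by restarted local maximization.
    # `local_maximize` is a hypothetical wrapper that runs (1+1)-CMA-ES
    # for iters_per_dim * n iterations from the given initial point.
    best = -np.inf
    for _ in range(restarts):
        y0 = np.random.uniform(lo, hi, size=n)   # random initial point in Y
        y = local_maximize(lambda y_: f(x, y_), y0, max_iter=iters_per_dim * n)
        best = max(best, f(x, y))
    return best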
Fig. 5. Performance of the controllers obtained in 20 independent trials of (1+1)-CMA-ES on 𝑦 = (𝑦no(𝐴), 𝑦est(𝐵)) and 𝑦 = (𝑦est(𝐴), 𝑦est(𝐵)); CMA-ES(𝑁𝑦 = 100) on Y𝐴, Y𝐵, Y𝐶; and Adversarial-CMA-ES on Y𝐴, Y𝐵, Y𝐶, denoted by cmaA, cmaB, maxA, maxB, maxC, advA, advB, and advC, respectively. The panels show (a) 𝑓(𝑥, (𝑦no(𝐴), 𝑦est(𝐵))), (b) max𝑦(𝐴)∈Y𝐴 𝑓(𝑥, (𝑦(𝐴), 𝑦est(𝐵))), (c) 𝑓(𝑥, (𝑦est(𝐴), 𝑦est(𝐵))), (d) max𝑦(𝐵)∈Y𝐵 𝑓(𝑥, (𝑦est(𝐴), 𝑦(𝐵))), and (e) max𝑦∈Y𝐶 𝑓(𝑥, 𝑦). Each box indicates the lower quartile 𝑄1 and the upper quartile 𝑄3, with the line indicating the median 𝑄2. The lower and upper whiskers are the lowest datum above 𝑄1 − 1.5(𝑄3 − 𝑄1) and the highest datum below 𝑄3 + 1.5(𝑄3 − 𝑄1).
treating the worst case in Y𝐴 . The results may be improved by running the optimization process
longer and performing more restarts to locate more local min–max saddle points.
7 CONCLUSION
We have proposed a framework for saddle point optimization with approximate minimization oracles. Our theoretical analysis has shown the conditions on the learning rate under which the approach converges linearly (i.e., geometrically) toward the min–max saddle point on strongly convex–concave functions. Numerical evaluation has shown the tightness of the theoretical results. We have also proposed a learning rate adaptation mechanism for practical use. Numerical evaluation on convex–concave quadratic problems has demonstrated that the proposed approach with the learning rate adaptation successfully converges linearly toward the min–max saddle point, spending no more than three times as many 𝑓-calls as the approach with the best tuned fixed learning rate. Comparison with other baseline approaches on several test problems revealed the limitations of existing coevolutionary approaches, as well as of the proposed approach, on problems whose optimal solution is not a min–max saddle point. The application of the proposed approach to a robust berthing control task demonstrated its usefulness, and the results imply the importance of considering modeling errors to achieve a reliable and safe solution.
We conclude the present work with some suggestions for possible avenues for future research.
The main limitation of the proposed approach as a solver to (1) is that it may fail to converge
to a local minimal solution of the worst-case objective max𝑦 ∈Y 𝑓 (𝑥, 𝑦) if it is not a strict local
min–max saddle point of 𝑓 . Such failure cases were observed in Figure 3, not only for the proposed
approach but also for existing coevolutionary approaches. Addressing this difficulty is an important
direction for future work. For the GDA approach (2), Liang and Stokes [2019] have shown that the GDA fails to converge to the optimal solution on a bilinear function 𝑓(𝑥, 𝑦) = 𝑥ᵀ𝐶𝑦, whereas some improved gradient-based approaches [Daskalakis et al. 2018; Mescheder et al. 2017; Yadav et al. 2018] converge successfully. We expect that these gradient-based approaches would be useful in improving the approach proposed here. The other limitation is that the best possible runtime Ω(1/𝛾*) in (13) grows as the interaction term, more precisely 𝛽𝐺/𝛼𝐻, increases. We intend to address this limitation in future work.
The main limitation of our theoretical result (Theorem 3.3) is that approximate minimization
oracles are required to satisfy Assumption 3.2. In practice, it is often impossible to guarantee
Assumption 3.2 as we do not know the global minimum/maximum of the objective functions. For
the design of Adversarial-CMA-ES and Adversarial-SLSQP, we expect that it is approximately
satisfied by running a linearly convergent approximate minimization oracle until a fixed number
of improvements are observed. See Section 4.3 for details. In such cases, we have condition (8) not
with a constant 𝜖 but rather with a possibly stochastic and time-dependent sequence {𝜖𝑡 }, which is
not covered by Theorem 3.3. Dealing with such 𝜖𝑡 will enlarge the scope of Theorem 3.3 and bridge
the gap between theory and practice.5
The uncertainty parameter is typically constrained to a bounded set Y ⊂ R𝑛 in practice; however, the effect of Y on the convergence rate has not been investigated in this work. Our theoretical result was developed for the unconstrained situation, and the proof does not immediately generalize to the constrained situation. The effect of the bound Y on the convergence rate is to be investigated theoretically and empirically.
5 Another possible approach is to replace the condition ℎ(𝑧̃) − min𝑧∈Z ℎ(𝑧) ⩽ 𝜖 · (ℎ(𝑧̄) − min𝑧∈Z ℎ(𝑧)) in Assumption 3.2 with the condition ∥∇ℎ(𝑧̃)∥ < 𝜖 · ∥∇ℎ(𝑧̄)∥ under the additional assumption that the objective function is continuously differentiable. An advantage of this approach is that this condition can be approximately verified by estimating the gradients of the objective function [Nesterov and Spokoiny 2017].
The experimental results on the robust berthing control task have demonstrated the usefulness
of the proposed approach and the importance of considering model uncertainties. Simultaneously,
they revealed the difficulty of obtaining a robust solution with satisfactory utility. Regarding the
wind condition uncertainty, it is possible to decompose Y𝐴 into disjoint subsets (e.g., based on the
wind direction), train the robust feedback controller for each subset, and switch the controller based
on the wind condition measured at the time of operation. Such an approach is not available for the
uncertainty in the model coefficients. To improve the worst-case performance, it is important to
reduce the set of uncertain parameter values Y as much as possible. In our experiments, we defined
the interval for each uncertain coefficient to form Y, but the corner cases may be unrealistic and degrade the worst-case performance unnecessarily. Designing a more intelligent Y remains a very important task for practical applications.
ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers for their valuable comments and
suggestions. This work was partially supported by JSPS KAKENHI Grant Numbers 19H04179 and 19K04858.
REFERENCES
Martin A Abkowitz. 1980. Measurement of hydrodynamic characteristics from ship maneuvering trials by system identifica-
tion. In Transactions of Society of Naval Architects and Marine Engineers 88. 283–318.
Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, and Thomas Hofmann. 2019. Local Saddle Point Optimization: A
Curvature Exploitation Approach. In International Conference on Artificial Intelligence and Statistics. 486–495.
Youhei Akimoto. 2021. Saddle Point Optimization with Approximate Minimization Oracle. In Proceedings of the Genetic and
Evolutionary Computation Conference (GECCO ’21). 493–501.
Youhei Akimoto and Nikolaus Hansen. 2020. Diagonal acceleration for covariance matrix adaptation evolution strategies.
Evolutionary computation 28, 3 (2020), 405–435.
Abdullah Al-Dujaili, Shashank Srikant, Erik Hemberg, and Una-May O’Reilly. 2019. On the application of Danskin’s theorem
to derivative-free minimax problems. In AIP Conference Proceedings, Vol. 2070. 20–26.
Motoki Araki, Hamid Sadat-Hosseini, Yugo Sanada, Kenji Tanimoto, Naoya Umeda, and Frederick Stern. 2012. Estimating
maneuvering coefficients using system identification methods with experimental, system-based, and CFD free-running
trial data. Ocean Engineering 51 (2012), 63–84.
Dirk V. Arnold and Nikolaus Hansen. 2010. Active Covariance Matrix Adaptation for the (1+1)-CMA-ES. In Proceedings of
the 12th Annual Conference on Genetic and Evolutionary Computation (GECCO ’10). 385–392.
Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. 2010a. Nonconvex Robust Optimization for Problems with
Constraints. INFORMS Journal on Computing 22, 1 (2010), 44–58.
Dimitris Bertsimas, Omid Nohadani, and Kwong Meng Teo. 2010b. Robust Nonconvex Optimization for Simulation based
Problems. Operations Research 58, 1 (2010), 161–178.
Ilija Bogunovic, Jonathan Scarlett, Stefanie Jegelka, and Volkan Cevher. 2018. Adversarially Robust Optimization with
Gaussian Processes. In Advances in Neural Information Processing Systems. 5760–5770.
Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
Jürgen Branke and Johanna Rosenbusch. 2008. New Approaches to Coevolutionary Worst-Case Optimization. In International
Conference on Parallel Problem Solving from Nature. 144–153.
Ashish Cherukuri, Bahman Gharesifard, and Jorge Cortés. 2017. Saddle-Point Dynamics: Conditions for Asymptotic Stability
of Saddle Points. SIAM Journal on Control and Optimization 55, 1 (2017), 486–511.
Andrew R. Conn and Luis N. Vicente. 2012. Bilevel Derivative-Free Optimization and Its Application to Robust Optimization.
Optimization Methods Software 27, 3 (2012), 561–577.
Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. 2018. Training GANs with Optimism. In
International Conference on Learning Representations.
Oswaldo de Oliveira. 2013. The Implicit and Inverse Function Theorems: Easy Proofs. Real Analysis Exchange 39, 1 (2013),
207–218.
Luc Devroye. 1972. The compound random search. In International Symposium on Systems Engineering and Analysis.
195–110.
Peter I Frazier. 2018. A Tutorial on Bayesian Optimization. arXiv:1807.02811 (2018).
Toshifumi Fujiwara, Michio Ueno, and Tadashi Nimura. 1998. Estimation of Wind Forces and Moments acting on Ships.
Journal of the Society of Naval Architects of Japan 1998 (1998), 77–90. Issue 183.
Gauthier Gidel, Tony Jebara, and Simon Lacoste-Julien. 2017. Frank-Wolfe Algorithms for Saddle Point Problems. In
International Conference on Artificial Intelligence and Statistics. 362–371.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua
Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680.
Nikolaus Hansen and Anne Auger. 2014. Principled design of continuous stochastic search: From theory to practice. In
Theory and principled methods for the design of metaheuristics. Springer, 145–180.
Nikolaus Hansen, Sibylle D Müller, and Petros Koumoutsakos. 2003. Reducing the time complexity of the derandomized
evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation 11, 1 (2003), 1–18.
Nikolaus Hansen and Andreas Ostermeier. 2001. Completely derandomized self-adaptation in evolution strategies. Evolu-
tionary computation 9, 2 (2001), 159–195.
Mikkel T. Jensen. 2004. A New Look at Solving Minimax Problems with Coevolutionary Genetic Algorithms. Kluwer Academic
Publishers, 369–384.
Hamed Karimi, Julie Nutini, and Mark Schmidt. 2016. Linear Convergence of Gradient and Proximal-Gradient Methods
Under the Polyak-Łojasiewicz Condition. In Joint European Conference on Machine Learning and Knowledge Discovery in
Databases. 795–811.
Stefan Kern, Sibylle D. Müller, Nikolaus Hansen, Dirk Büche, Jiri Ocenasek, and Petros Koumoutsakos. 2004. Learning
Probability Distributions in Continuous Evolutionary Algorithms - a Comparative Review. Natural Computing 3, 1 (2004),
77–112.
Dieter Kraft. 1988. A software package for sequential quadratic programming. Technical Report. DFVLR-FB 88-28, DLR
German Aerospace Center — Institute for Flight Mechanics, Koln, Germany.
Tengyuan Liang and James Stokes. 2019. Interaction matters: A note on non-asymptotic local convergence of generative
adversarial networks. In International Conference on Artificial Intelligence and Statistics. 907–915.
Sijia Liu, Songtao Lu, Xiangyi Chen, Yao Feng, Kaidi Xu, Abdullah Al-Dujaili, Mingyi Hong, and Una-May O’Reilly. 2020.
Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks.
In International Conference on Machine Learning. 2307–2318.
Atsuo Maki, Youhei Akimoto, and Naoya Umeda. 2021. Application of optimal control theory based on the evolution
strategy (CMA-ES) to automatic berthing (part: 2). Journal of Marine Science and Technology 26 (2021), 835–845.
Atsuo Maki, Naoki Sakamoto, Youhei Akimoto, Hiroyuki Nishikawa, and Naoya Umeda. 2020. Application of optimal
control theory based on the evolution strategy (CMA-ES) to automatic berthing. Journal of Marine Science and Technology
25 (2020), 221–233.
Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. 2017. The Numerics of GANs. In Advances in Neural Information
Processing Systems. 1823–1833.
Ministry of Land, Infrastructure, Transport and Tourism. 2020. White paper on land, infrastructure, transport and tourism in Japan. https://www.mlit.go.jp/en/statistics/white-paper-mlit-index.html
Yoshiki Miyauchi, Atsuo Maki, Naoya Umeda, Dimas M. Rachman, and Youhei Akimoto. 2021a. System Parameter Exploration
of Ship Maneuvering Model for Automatic Docking / Berthing using CMA-ES. arXiv:2111.06124 (2021).
Yoshiki Miyauchi, Ryohei Sawada, Youhei Akimoto, Naoya Umeda, and Atsuo Maki. 2021b. Optimization on Planning
of Trajectory and Control of Autonomous Berthing and Unberthing for the Realistic Port Geometry. arXiv:2106.02459
(2021).
Daiki Morinaga and Youhei Akimoto. 2019. Generalized drift analysis in continuous domain: linear convergence of (1+1)-ES
on strongly convex functions with Lipschitz continuous gradients. In Foundations of Genetic Algorithms. 13–24.
Daiki Morinaga, Kazuto Fukuchi, Jun Sakuma, and Youhei Akimoto. 2021. Convergence Rate of the (1+1)-Evolution Strategy
with Success-Based Step-Size Adaptation on Convex Quadratic Function. In Proceedings of the Genetic and Evolutionary
Computation Conference (GECCO ’21). 1169–1177.
Vaishnavh Nagarajan and J. Zico Kolter. 2017. Gradient Descent GAN Optimization is Locally Stable. In Advances in Neural
Information Processing Systems. 5591–5600.
Yurii Nesterov and Vladimir Spokoiny. 2017. Random Gradient-Free Minimization of Convex Functions. Foundations of
Computational Mathematics 17, 2 (2017), 527–566.
Maher Nouiehed, Maziar Sanjabi, Tianjian Huang, Jason D Lee, and Meisam Razaviyayn. 2019. Solving a Class of Non-
Convex Min-Max Games Using Iterative First Order Methods. In Advances in Neural Information Processing Systems.
14934–14942.
Victor Picheny, Mickael Binois, and Abderrahmane Habbal. 2019. A Bayesian optimization approach to find Nash equilibria.
Journal of Global Optimization 73, 1 (2019), 171–192.
Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. 2017. Robust Adversarial Reinforcement Learning. In
International Conference on Machine Learning. 2817–2826.
Xin Qiu, Jian-Xin Xu, Yinghao Xu, and Kay Chen Tan. 2018. A New Differential Evolution Algorithm for Minimax
Optimization in Robust Design. IEEE Transactions on Cybernetics 48, 5 (2018), 1355–1368.
Ingo Rechenberg. 1973. Evolutionsstrategie: Optimierung technisher Systeme nach Prinzipien der biologischen Evolution.
Frommann-Holzboog.
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, and Xi Chen. 2016. Improved
Techniques for Training GANs. In Advances in Neural Information Processing Systems. 2234–2242.
Michael A. Schumer and Kenneth Steiglitz. 1968. Adaptive step size random search. Automatic Control, IEEE Transactions on
13 (1968), 270–276.
Hiroaki Shioya, Yusuke Iwasawa, and Yutaka Matsuo. 2018. Extending Robust Adversarial Reinforcement Learning Consid-
ering Adaptation and Diversity. In International Conference on Learning Representations, Workshop Track Proceedings.
Kouki Wakita, Atsuo Maki, Naoya Umeda, Yoshiki Miyauchi, Tohga Shimoji, Dimas M. Rachman, and Youhei Akimoto.
2021. On Neural Network Identification for Low-Speed Ship Maneuvering Model. arXiv:2111.06120 (2021).
Abhay Yadav, Sohil Shah, Zheng Xu, David Jacobs, and Tom Goldstein. 2018. Stabilizing Adversarial Nets With Prediction
Methods. In International Conference on Learning Representations.
Aimin Zhou and Qingfu Zhang. 2010. A Surrogate-Assisted Evolutionary Algorithm for Minimax Optimization. In IEEE
Congress on Evolutionary Computation. 1–7.
A PROOFS
A.1 Proof of Proposition 2.3
Proof. Assume that (𝑥 ∗, 𝑦 ∗ ) is a local min–max saddle point of 𝑓 . Then, by definition, there
exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝑓 (𝑥, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦) holds for any
(𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . Let (𝑥, 𝑦) ∈ E𝑥 × E 𝑦 . Then, 𝐺𝑥 (𝑥, 𝑦 ∗ ) = 𝑓 (𝑥, 𝑦 ∗ ) − min𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦 ∗ ) ⩾ 𝑓 (𝑥 ∗, 𝑦 ∗ ) −
min𝑥 ′ ∈X 𝑓 (𝑥 ′, 𝑦 ∗ ) = 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, 𝑦) = max𝑦′ ∈Y 𝑓 (𝑥 ∗, 𝑦 ′) − 𝑓 (𝑥 ∗, 𝑦) ⩾ max𝑦′ ∈Y 𝑓 (𝑥 ∗, 𝑦 ′) −
𝑓 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ). This implies that 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (𝑥, 𝑦 ∗ ) and
𝐺 𝑦 (𝑥 ∗, 𝑦), respectively.
Conversely, assume that 𝑥 ∗ and 𝑦 ∗ are local minimal points of 𝐺𝑥 (𝑥, 𝑦 ∗ ) and 𝐺 𝑦 (𝑥 ∗, 𝑦), respec-
tively. Then, there exists a neighborhood E𝑥 × E 𝑦 of (𝑥 ∗, 𝑦 ∗ ) such that 𝐺𝑥 (𝑥, 𝑦 ∗ ) ⩾ 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ )
and 𝐺𝑦(𝑥∗, 𝑦) ⩾ 𝐺𝑦(𝑥∗, 𝑦∗) for any (𝑥, 𝑦) ∈ E𝑥 × E𝑦. These inequalities read 𝑓(𝑥, 𝑦∗) ⩾ 𝑓(𝑥∗, 𝑦∗) and
𝑓(𝑥∗, 𝑦∗) ⩾ 𝑓(𝑥∗, 𝑦), which implies that (𝑥∗, 𝑦∗) is a local min–max saddle point of 𝑓.
If (𝑥 ∗, 𝑦 ∗ ) is the global min–max saddle point of 𝑓 , then (𝑥 ∗, 𝑦 ∗ ) is a local minimal point of 𝐺.
Moreover, we have 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) = 0, implying that it is the global minimal point of 𝐺.
Conversely, if (𝑥 ∗, 𝑦 ∗ ) is the global minimal point of 𝐺, then it is a local min–max saddle point.
Moreover, because the global minimum of 𝐺 is zero, we have 𝐺𝑥 (𝑥 ∗, 𝑦 ∗ ) = 𝐺 𝑦 (𝑥 ∗, 𝑦 ∗ ) = 0. Then, we
can take E𝑥 = X and E 𝑦 = Y in the above proof, which implies that (𝑥 ∗, 𝑦 ∗ ) is the global min–max
saddle point. □
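Proposition 2.3 can also be checked numerically on a toy problem. The following minimal sketch is our illustration, not part of the paper; it assumes NumPy and the strictly convex–concave function 𝑓(𝑥, 𝑦) = 𝑥² + 𝑥𝑦 − 𝑦², whose saddle point is the origin. It evaluates the suboptimality gaps 𝐺𝑥(·, 𝑦∗) and 𝐺𝑦(𝑥∗, ·) on a grid and confirms that both are minimized exactly at the saddle point.

```python
import numpy as np

# Toy strictly convex-concave objective with saddle point at (0, 0).
f = lambda x, y: x**2 + x * y - y**2

# Discretize X and Y; the inner min/max below are taken over these grids.
xs = np.linspace(-1.0, 1.0, 201)
ys = np.linspace(-1.0, 1.0, 201)
x_star, y_star = 0.0, 0.0

# G_x(x, y*) = f(x, y*) - min_{x'} f(x', y*): suboptimality gap in x.
Gx = f(xs, y_star) - f(xs, y_star).min()
# G_y(x*, y) = max_{y'} f(x*, y') - f(x*, y): suboptimality gap in y.
Gy = f(x_star, ys).max() - f(x_star, ys)

# Both gaps are nonnegative and vanish exactly at the saddle point,
# in line with Proposition 2.3.
assert Gx.min() >= 0 and Gy.min() >= 0
print(xs[Gx.argmin()], ys[Gy.argmin()])  # -> 0.0 0.0
```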
A.2 Proof of Lemma 2.6
Proof. Differentiating the definitions of 𝐺𝑥 and 𝐺𝑦 and using the optimality conditions (∇𝑥 𝑓)(𝑥̂(𝑦), 𝑦) = 0 and (∇𝑦 𝑓)(𝑥, 𝑦̂(𝑥)) = 0, we obtain
\[
\begin{aligned}
\nabla_x G_x(x, y) &= (\nabla_x f)(x, y), &
\nabla_y G_x(x, y) &= (\nabla_y f)(x, y) - (\nabla_y f)(\hat{x}(y), y), \\
\nabla_x G_y(x, y) &= (\nabla_x f)(x, \hat{y}(x)) - (\nabla_x f)(x, y), &
\nabla_y G_y(x, y) &= -(\nabla_y f)(x, y).
\end{aligned} \tag{18}
\]
Moreover, we have
\[
\nabla^2 G_x(x, y) = \begin{pmatrix}
H_{x,x}(x, y) & H_{x,y}(x, y) \\
H_{y,x}(x, y) & H_{y,y}(x, y) - H_{y,y}(\hat{x}(y), y) - [J_{\hat{x}}(\hat{x}(y), y)]^{\mathrm{T}} H_{x,y}(\hat{x}(y), y)
\end{pmatrix},
\]
\[
\nabla^2 G_y(x, y) = \begin{pmatrix}
-H_{x,x}(x, y) + H_{x,x}(x, \hat{y}(x)) + [J_{\hat{y}}(x, \hat{y}(x))]^{\mathrm{T}} H_{y,x}(x, \hat{y}(x)) & -H_{x,y}(x, y) \\
-H_{y,x}(x, y) & -H_{y,y}(x, y)
\end{pmatrix}.
\]
In light of Proposition 2.5 and the symmetry 𝐻𝑥,𝑦 = 𝐻𝑦,𝑥ᵀ, we have [𝐽𝑥̂(𝑥̂(𝑦), 𝑦)]ᵀ = −𝐻𝑦,𝑥(𝑥̂(𝑦), 𝑦)(𝐻𝑥,𝑥(𝑥̂(𝑦), 𝑦))⁻¹ and [𝐽𝑦̂(𝑥, 𝑦̂(𝑥))]ᵀ = −𝐻𝑥,𝑦(𝑥, 𝑦̂(𝑥))(𝐻𝑦,𝑦(𝑥, 𝑦̂(𝑥)))⁻¹. Then, because ∇²𝐺 = ∇²(𝐺𝑥 + 𝐺𝑦) = ∇²𝐺𝑥 + ∇²𝐺𝑦, we obtain
\[
\nabla^2 G(x, y) = \begin{pmatrix}
G_{x,x}(x, \hat{y}(x)) & 0 \\
0 & G_{y,y}(\hat{x}(y), y)
\end{pmatrix}.
\]
The symmetry of 𝐺𝑥,𝑥 and 𝐺𝑦,𝑦 is clear from their definitions. The positivity of 𝐺𝑥,𝑥 and 𝐺𝑦,𝑦 follows from 𝐻𝑥,𝑥 ≻ 0, −𝐻𝑦,𝑦 ≻ 0, 𝐻𝑥,𝑦(−𝐻𝑦,𝑦)⁻¹𝐻𝑦,𝑥 ≽ 𝜎min(𝐻𝑥,𝑦)²/𝜎max(−𝐻𝑦,𝑦) ≻ 0, and 𝐻𝑦,𝑥(𝐻𝑥,𝑥)⁻¹𝐻𝑥,𝑦 ≽ 𝜎min(𝐻𝑥,𝑦)²/𝜎max(𝐻𝑥,𝑥) ≻ 0. This completes the proof. □
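The block-diagonal structure of ∇²𝐺 can be verified numerically as well. Below is a small sketch of ours, not taken from the paper; it assumes the scalar convex–concave quadratic 𝑓(𝑥, 𝑦) = (𝑎/2)𝑥² + 𝑏𝑥𝑦 − (𝑐/2)𝑦², for which 𝑥̂(𝑦) = −(𝑏/𝑎)𝑦 and 𝑦̂(𝑥) = (𝑏/𝑐)𝑥 hold in closed form. Finite-differencing 𝐺 recovers diag(𝑎 + 𝑏²/𝑐, 𝑐 + 𝑏²/𝑎) with a vanishing off-diagonal entry, matching 𝐺𝑥,𝑥(𝑥, 𝑦̂(𝑥)) and 𝐺𝑦,𝑦(𝑥̂(𝑦), 𝑦) above.

```python
a, b, c = 2.0, 1.0, 3.0  # H_xx = a > 0, H_xy = b, -H_yy = c > 0

def G(x, y):
    # G = G_x + G_y = f(x, yhat(x)) - f(xhat(y), y) for the quadratic
    # f(x, y) = (a/2) x^2 + b x y - (c/2) y^2, with closed-form best
    # responses xhat(y) = -(b/a) y and yhat(x) = (b/c) x.
    f = lambda u, v: 0.5 * a * u**2 + b * u * v - 0.5 * c * v**2
    return f(x, (b / c) * x) - f(-(b / a) * y, y)

# Central finite differences for the 2x2 Hessian of G at a generic point.
h, x0, y0 = 1e-4, 0.3, -0.7
Gxx = (G(x0 + h, y0) - 2 * G(x0, y0) + G(x0 - h, y0)) / h**2
Gyy = (G(x0, y0 + h) - 2 * G(x0, y0) + G(x0, y0 - h)) / h**2
Gxy = (G(x0 + h, y0 + h) - G(x0 + h, y0 - h)
       - G(x0 - h, y0 + h) + G(x0 - h, y0 - h)) / (4 * h**2)

print(Gxx, "==", a + b**2 / c)  # top-left block:  H_xx + H_xy(-H_yy)^{-1}H_yx
print(Gyy, "==", c + b**2 / a)  # bottom-right block
print(Gxy)                      # off-diagonal block: ~0
```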
First, we evaluate the first term on the right-most side of (27). Note first that we have ∇𝐺(𝑥𝑡, 𝑦𝑡)ᵀ𝑣 = ∇𝑥𝐺(𝑥𝑡, 𝑦𝑡)ᵀ𝑣𝑥 + ∇𝑦𝐺(𝑥𝑡, 𝑦𝑡)ᵀ𝑣𝑦. In light of (18), we have ∇𝑥𝐺(𝑥𝑡, 𝑦𝑡) = (∇𝑥 𝑓)(𝑥𝑡, 𝑦̂(𝑥𝑡)) = (∇𝑥 𝑓)(𝑥̄(0), 𝑦̄(1)). Noting that (∇𝑥 𝑓)(𝑥̄(1), 𝑦̄(0)) = (∇𝑥 𝑓)(𝑥̂(𝑦𝑡), 𝑦𝑡) = 0, by the mean value theorem, we obtain
\[
\nabla_x G(x_t, y_t)^{\mathrm{T}} v_x = -w_x^{\mathrm{T}} \cdot \int_0^1 H_{x,x}(\bar{x}(1-\tau), \bar{y}(\tau)) \,\mathrm{d}\tau \cdot v_x + w_y^{\mathrm{T}} \cdot \int_0^1 H_{y,x}(\bar{x}(1-\tau), \bar{y}(\tau)) \,\mathrm{d}\tau \cdot v_x. \tag{28}
\]
Analogously, we obtain
\[
\nabla_y G(x_t, y_t)^{\mathrm{T}} v_y = w_y^{\mathrm{T}} \cdot \int_0^1 H_{y,y}(\bar{x}(1-\tau), \bar{y}(\tau)) \,\mathrm{d}\tau \cdot v_y - w_x^{\mathrm{T}} \cdot \int_0^1 H_{x,y}(\bar{x}(1-\tau), \bar{y}(\tau)) \,\mathrm{d}\tau \cdot v_y, \tag{29}
\]
where, here and below, we sometimes drop the argument (𝑥̄(1 − 𝜏), 𝑦̄(𝜏)) from 𝐻𝑥,𝑥, 𝐻𝑥,𝑦, 𝐻𝑦,𝑥, and 𝐻𝑦,𝑦 for compact expressions.
We aim to bound each term of (27) and (30). The second term of (27) is bounded by using
conditions (3) and (4) of the theorem statement as well as the fact that ∇²𝐺 is block-diagonal
(Lemma 2.6) as
\[
\alpha_G \, \|v\|_{\mathrm{diag}(H_{x,x}^{*},\, -H_{y,y}^{*})}^{2} \;\leqslant\; v^{\mathrm{T}} \, \nabla^2 G(\bar{x}(s), \bar{y}(s)) \, v \;\leqslant\; \beta_G \, \|v\|_{\mathrm{diag}(H_{x,x}^{*},\, -H_{y,y}^{*})}^{2}. \tag{31}
\]
The first term of (30) is bounded by using conditions (1) and (2) of the theorem statement as
\[
-\beta_H \, \|w\|_{\mathrm{diag}(H_{x,x}^{*},\, -H_{y,y}^{*})}^{2} \;\leqslant\; -\begin{pmatrix} w_x \\ w_y \end{pmatrix}^{\!\mathrm{T}} \begin{pmatrix} H_{x,x} & 0 \\ 0 & -H_{y,y} \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} \;\leqslant\; -\alpha_H \, \|w\|_{\mathrm{diag}(H_{x,x}^{*},\, -H_{y,y}^{*})}^{2}. \tag{32}
\]
\[
\cdots \,\cdot\, \sigma_{\max}\!\left( \begin{pmatrix}
\sqrt{(H_{x,x}^{*})^{-1}} \, H_{x,x} \, \sqrt{(H_{x,x}^{*})^{-1}} &
-\sqrt{(H_{x,x}^{*})^{-1}} \, H_{x,y} \, \sqrt{(-H_{y,y}^{*})^{-1}} \\
\sqrt{(-H_{y,y}^{*})^{-1}} \, H_{y,x} \, \sqrt{(H_{x,x}^{*})^{-1}} &
\sqrt{(-H_{y,y}^{*})^{-1}} \, (-H_{y,y}) \, \sqrt{(-H_{y,y}^{*})^{-1}}
\end{pmatrix} \right), \tag{33}
\]
where the greatest singular value is bounded by using conditions (1)–(4) in the theorem statement
as
\[
\sigma_{\max}\!\left( \begin{pmatrix}
\sqrt{(H_{x,x}^{*})^{-1}} \, H_{x,x} \, \sqrt{(H_{x,x}^{*})^{-1}} &
-\sqrt{(H_{x,x}^{*})^{-1}} \, H_{x,y} \, \sqrt{(-H_{y,y}^{*})^{-1}} \\
\sqrt{(-H_{y,y}^{*})^{-1}} \, H_{y,x} \, \sqrt{(H_{x,x}^{*})^{-1}} &
\sqrt{(-H_{y,y}^{*})^{-1}} \, (-H_{y,y}) \, \sqrt{(-H_{y,y}^{*})^{-1}}
\end{pmatrix} \right) \;\leqslant\; \beta_H \cdot \beta_G^{1/2} / \alpha_H^{1/2}. \tag{34}
\]
Equations (19) to (24), (27) and (30) to (34) lead to
\[
G(x_{t+1}, y_{t+1}) - G(x_t, y_t) \leqslant \left( -2\eta \, \frac{\alpha_H}{\beta_H} \left( 1 - \frac{\beta_H^2}{\alpha_H^2} \sqrt{\frac{\beta_G}{\alpha_H}} \cdot \epsilon \right) + \eta^2 (1 + \epsilon)^2 \sqrt{\frac{\beta_G}{\alpha_H}} \right) \cdot G(x_t, y_t). \tag{35}
\]
Here, the right-hand side is 𝛾 · 𝐺(𝑥𝑡, 𝑦𝑡) with 𝛾 defined in (10). Hence, 𝐺(𝑥𝑡+1, 𝑦𝑡+1) ⩽ (1 + 𝛾) · 𝐺(𝑥𝑡, 𝑦𝑡).
Because log(1 + 𝛾) < 𝛾 for all 𝛾 ∈ (−1, 0), we obtain log(𝐺(𝑥𝑡+1, 𝑦𝑡+1)) − log(𝐺(𝑥𝑡, 𝑦𝑡)) < 𝛾.
Because log(𝐺(𝑥𝑡, 𝑦𝑡)) − log(𝐺(𝑥0, 𝑦0)) < 𝛾 · 𝑡, the minimal 𝑡 such that log(𝐺(𝑥𝑡, 𝑦𝑡)) − log(𝐺(𝑥0, 𝑦0)) ⩽
log(𝜁) is no greater than ⌈(1/|𝛾|) log(1/𝜁)⌉ = 𝑇𝜁. Similarly, Equations (19) to (22), (25) to (27) and (30)
to (34) lead to
\[
G(x_{t+1}, y_{t+1}) - G(x_t, y_t) \geqslant \left( -2\eta \, \frac{\beta_H}{\alpha_H} \left( 1 + \sqrt{\frac{\beta_G}{\alpha_H}} \cdot \epsilon \right) + \eta^2 \, \frac{\alpha_G}{\beta_H} \left( 1 - \sqrt{\frac{\beta_H}{\alpha_H}} \cdot \epsilon \right)^{\!2} \right) G(x_t, y_t). \tag{36}
\]
The right-hand side of Equation (36) is positive if 𝜂 > 2 · 𝜂̄. This completes the proof. □
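To make the iteration bound above concrete: given a per-step contraction factor 1 + 𝛾 with 𝛾 ∈ (−1, 0), the bound 𝑇𝜁 = ⌈(1/|𝛾|) log(1/𝜁)⌉ can be evaluated directly. The snippet below is a sketch of ours with arbitrary example values for 𝛾 and 𝜁, not code from the paper.

```python
import math

def iteration_bound(gamma, zeta):
    """Steps needed so that G(x_t, y_t)/G(x_0, y_0) <= zeta, given the
    per-step contraction G(x_{t+1}, y_{t+1}) <= (1 + gamma) G(x_t, y_t)
    with gamma in (-1, 0)."""
    assert -1.0 < gamma < 0.0 and 0.0 < zeta < 1.0
    return math.ceil(math.log(1.0 / zeta) / abs(gamma))

# E.g., a 1% per-step decrease of the gap reaches a 10^-6 reduction
# within ceil(log(10^6) / 0.01) = 1382 steps.
print(iteration_bound(-0.01, 1e-6))  # -> 1382
```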
for 𝑢𝑥. It is modeled as
\[
\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y) = \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} C\big(s_{t \in [0, t_{\max}]}, a_{t \in [0, t_{\max}]}\big)
\quad \text{subject to} \quad
s_t = s_0 + \int_0^t \phi(s_\tau, a_\tau; y) \,\mathrm{d}\tau
\;\;\text{and}\;\;
a_t = u_x\big(s_{\lfloor t/\mathrm{d}t \rfloor \cdot \mathrm{d}t}\big),
\]
where d𝑡 [seconds] is the control time span, that is, the control signal 𝑎𝑡 changes every d𝑡, and 𝑠 0 is
the initial state.
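In other words, the simulator integrates the state equation while the feedback controller is queried only on the sampling grid and its output is held constant for d𝑡 seconds (zero-order hold). The following forward-Euler rollout is a minimal sketch of ours, not the simulator used in the paper; `phi`, `controller`, and the step sizes are hypothetical stand-ins for 𝜙(·, ·; 𝑦), 𝑢𝑥, and the integration settings.

```python
import numpy as np

def rollout(controller, phi, s0, y, t_max=200.0, dt_control=10.0, dt_sim=0.1):
    """Simulate s_t = s_0 + int_0^t phi(s_tau, a_tau; y) dtau under the
    zero-order-hold feedback a_t = u_x(s_{floor(t/dt)*dt})."""
    s = np.asarray(s0, dtype=float)
    states, actions = [s.copy()], []
    hold = int(round(dt_control / dt_sim))  # simulation steps per control step
    a = None
    for k in range(int(round(t_max / dt_sim))):
        if k % hold == 0:
            a = controller(s)  # resample the control signal every dt seconds
        s = s + dt_sim * phi(s, a, y)  # forward-Euler integration step
        states.append(s.copy())
        actions.append(a)
    return np.array(states), np.array(actions)
```

Any higher-order integrator can be substituted for the Euler step; it is used here only to keep the sketch short.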
We define the cost of the trajectory as
\[
C\big(s_{t \in [0, t_{\max}]}, a_{t \in [0, t_{\max}]}\big) = C_1 + w \cdot \big(C_2 + \mathbb{I}\{C_2 > 0\}\big),
\]
where 𝑤 > 0 is the hyperparameter that determines the trade-off between utility and safety,
\[
C_1 = \frac{1}{6} \sum_{i=1}^{6} \big(s_{t_{\max}, i} - s_{\mathrm{fin}, i}\big)^2
\]
evaluates the deviation of the final ship state from the target state 𝑠 fin , and
\[
C_2 = \frac{1}{4} \sum_{i=1}^{4} \int_0^{t_{\max}} \mathrm{dist}\big(P_{\tau, i}, C_{\mathrm{berth}}\big) \,\mathrm{d}\tau
\]
measures the collision risk, where 𝑃𝜏,1, . . . , 𝑃𝜏,4 represent the coordinates of the four vertices of
the rectangle surrounding the ship at time 𝜏 and dist(𝑃, 𝐶 berth ) measures the distance from a point
𝑃 to the closest point on the berth boundary. Refer to [Maki et al. 2021, 2020] for the definitions of
𝐶 1 and 𝐶 2 .
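For illustration, the cost can be assembled from a state history as in the sketch below. This is our own illustration, not the implementation of [Maki et al. 2021, 2020]; `dist_to_berth` and the vertex array `P` are hypothetical placeholders for dist(·, 𝐶berth) and 𝑃𝜏,1, . . . , 𝑃𝜏,4, and we interpret dist as the penetration depth beyond 𝐶berth, zero on the safe side, so that 𝐶2 = 0 for collision-free trajectories, consistent with the use of I{𝐶2 > 0} as a collision indicator.

```python
import numpy as np

def trajectory_cost(states, s_fin, P, dist_to_berth, dt_sim, w=10.0):
    """C = C1 + w * (C2 + 1{C2 > 0}).

    states: (T, 6) state history; s_fin: target state of shape (6,);
    P: (T, 4, 2) ship-rectangle vertices P_{tau,1..4} per time step;
    dist_to_berth: maps vertex positions to penetration depths (0 if safe)."""
    # C1: mean squared deviation of the final state from the target.
    c1 = np.mean((states[-1] - np.asarray(s_fin)) ** 2)
    # C2: time integral (rectangle rule) of the berth-penetration depth,
    # averaged over the four rectangle vertices.
    c2 = np.mean([np.sum(dist_to_berth(P[:, i])) * dt_sim for i in range(4)])
    # Any collision (C2 > 0) costs at least w, hence a total cost below w
    # certifies a collision-free trajectory.
    return c1 + w * (c2 + float(c2 > 0.0))
```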
Following [Maki et al. 2021], we set 𝑡max = 200 [seconds] and d𝑡 = 10 [seconds]. The initial state
is 𝑠0 = (15.0, 0.01, 6.0, 0.0, 𝜋, 0.0), and the target state is 𝑠fin = (3.0, 0.0, 9.5, 0.0, 𝜋, 0.0). The boundary
of the berth is 𝐶berth = {𝑌 = 9.994625}. The trade-off coefficient is set to 𝑤 = 10. Consequently, a cost
𝑓(𝑥, 𝑦) < 10 implies that the controller 𝑢𝑥 produces a trajectory without collision with the berth
under the uncertainty parameter 𝑦: any collision yields 𝐶2 > 0 and thus contributes at least 𝑤 · I{𝐶2 > 0} = 10 to the cost.
Differences from Previous Works. Our problem formulation mostly follows previous studies [Maki
et al. 2021, 2020], with three differences. First, we optimize a feedback controller, whereas [Maki
et al. 2021, 2020] directly optimize the control signal for each time period as well as the total
control time, which we believe is less suitable for obtaining robust control. Second, we modify the
objective function. The previous studies include a term penalizing the control time because they
formulate the problem as minimization of the control time; because we do not optimize the control
time, this term is excluded from our objective function. Moreover, for better collision avoidance,
we replaced 𝑤 · 𝐶2 with 𝑤 · (𝐶2 + I{𝐶2 > 0}). Third, following [Miyauchi et al. 2021b], we implement
thrusters to realize robust control under external disturbances and adopt the state equation model
proposed in [Miyauchi et al. 2021b].
Fig. 6. Trajectories of the best controller obtained by CMA-ES(𝑦^(𝐴)_no). (Panels: berthing trajectories and the time series of 𝑢 [m/s], 𝑣𝑚 [m/s], 𝑟 [degree/s], and the actuator signals in [degree] and [rps] (𝑛, 𝑛BT, 𝑛ST) over 0–200 s.)
Figure 7 shows the trajectories observed for the best controller obtained by Adversarial-CMA-ES on Y𝐴, which is the controller
optimized under the worst wind condition 𝑦^(𝐴) ∈ Y𝐴 with 𝑦^(𝐵) = 𝑦^(𝐵)_est. For Figures 6 and 7, the
left figure is the trajectory under 𝑦 = (𝑦^(𝐴)_no, 𝑦^(𝐵)_est) and the right figure is the trajectory under the
worst wind condition 𝑦^(𝐴) ∈ Y𝐴 with 𝑦^(𝐵) = 𝑦^(𝐵)_est. Figure 8 shows the trajectories observed for the
best controller obtained by CMA-ES(𝑦^(𝐵)_est), which is the controller optimized under 𝑦 = (𝑦^(𝐴)_est, 𝑦^(𝐵)_est).
Figure 9 shows the trajectories observed for the best controller obtained by Adversarial-CMA-ES
on Y𝐵, which is the controller optimized under the worst model parameter 𝑦^(𝐵) ∈ Y𝐵 with wind
condition 𝑦^(𝐴) = 𝑦^(𝐴)_est. For Figures 8 and 9, the left figure is the trajectory under 𝑦 = (𝑦^(𝐴)_est, 𝑦^(𝐵)_est)
and the right figure is the trajectory under the worst model parameter 𝑦^(𝐵) ∈ Y𝐵 with 𝑦^(𝐴) = 𝑦^(𝐴)_est.
Fig. 7. Trajectories of the best controller obtained by Adversarial-CMA-ES on Y𝐴. (Panels: berthing trajectories and the time series of 𝑢 [m/s], 𝑣𝑚 [m/s], 𝑟 [degree/s], and the actuator signals in [degree] and [rps] (𝑛, 𝑛BT, 𝑛ST) over 0–200 s.)
Fig. 8. Trajectories of the best controller obtained by CMA-ES(𝑦^(𝐵)_est). (Panels: berthing trajectories and the time series of 𝑢 [m/s], 𝑣𝑚 [m/s], 𝑟 [degree/s], and the actuator signals in [degree] and [rps] (𝑛, 𝑛BT, 𝑛ST) over 0–200 s.)
Fig. 9. Trajectories of the best controller obtained by Adversarial-CMA-ES on Y𝐵. (Panels: berthing trajectories and the time series of 𝑢 [m/s], 𝑣𝑚 [m/s], 𝑟 [degree/s], and the actuator signals in [degree] and [rps] (𝑛, 𝑛BT, 𝑛ST) over 0–200 s.)