
Frank-Wolfe method for vector optimization with a portfolio optimization application

W. Chen$^{a}$, X.M. Yang$^{b}$ and Y. Zhao$^{c}$


arXiv:2109.11296v1 [math.OC] 23 Sep 2021

$^{a}$College of Mathematics, Sichuan University, Chengdu, China; $^{b}$School of Mathematical
Sciences, Chongqing Normal University, Chongqing, China; $^{c}$College of Mathematics and
Statistics, Chongqing Jiaotong University, Chongqing, China

ARTICLE HISTORY
Compiled September 24, 2021

ABSTRACT
In this paper, we propose an extension of the classical Frank-Wolfe method for
solving constrained vector optimization problems with respect to a partial order
induced by a closed, convex and pointed cone with nonempty interior. In the proposed
method, the construction of the auxiliary subproblem is based on the well-known
oriented distance function. Two stepsize strategies, Armijo line
search and adaptive stepsize, are used. It is shown that every accumulation point
of the generated sequences satisfies the first-order necessary optimality condition.
Moreover, under suitable convexity assumptions for the objective function, it is
proved that all accumulation points of any generated sequences are weakly efficient
points. We finally apply the proposed algorithms to a portfolio optimization problem
under bicriteria considerations.

KEYWORDS
vector optimization; Frank-Wolfe method; stationary point; convergence; portfolio
optimization

1. Introduction

Vector optimization problems arise, for example, in functional analysis, multiobjective
programming, multicriteria decision making, statistics, approximation theory, cooperative
game theory, etc. (see [1]). In this class of problems, we seek to minimize several
objectives with respect to a partial order induced by a closed, convex and pointed cone
C with nonempty interior.
A particular case of such problems, very important in practical applications, is
when $C := \mathbb{R}^m_+$, where $\mathbb{R}^m_+$ is the nonnegative orthant of $\mathbb{R}^m$. This case corresponds
to multicriteria or multiobjective optimization. To solve multiobjective optimization
problems, one of the most popular strategies is the so-called scalarization method,
whose core idea is to convert the target multiobjective optimization problem into a scalar
optimization problem (see [2,3]) and then solve the transformed scalar optimization
problem by classical optimization methods. However, the main disadvantage
of scalarization methods is that they need additional, appropriately chosen
parameters in the transformation process, and this requires insight into

CONTACT W. Chen. Email: [email protected]


the problem structure which may not be available in general. Another new type of
strategy is to extend classical optimization methods to multiobjective versions. For
example, Fliege and Svaiter [4] proposed a suitable extension of the classical steepest
descent method for multiobjective optimization. There are two key features of their
method at each iteration: (i) a descent direction is obtained by solving an auxiliary and
non-parametric quadratic scalar subproblem; (ii) Armijo line search is used to find a
point that dominates the current one along this direction. Following the research works
of Fliege and Svaiter [4], in recent years, several classical numerical iterative methods
(e.g. Newton method, quasi-Newton method, projected gradient method, proximal
gradient method, trust region method, conditional gradient method, etc.) for solving
scalar optimization problems have been extended to solve multiobjective optimiza-
tion problems (see for example [5–11] and references therein). Note that, in [8], the
authors presented a rigorous and comprehensive survey on multiobjective versions of
the steepest descent method, the projected gradient method and the Newton method.
Compared with the methods summarized in [8], the conditional gradient method
presented in [11] for constrained multiobjective optimization problems only needs to solve
a linear subproblem over a compact convex set at every iteration.
To extend the methods for multiobjective optimization presented in [8], the authors
of [12–17] gave, in finite-dimensional spaces, extensions of the steepest descent
method, the projected gradient method and the Newton method for vector optimization
problems with respect to a general partial order rather than the one induced by the nonnegative orthant.
It is noteworthy that the subproblems given in [13–17] have more general forms, and
their constructions are based on the well-known gauge function. In addition, by virtue
of the general order cone C, vector versions of the proximal point method [18], the
nonmonotone gradient algorithm [19] and the Hager-Zhang conjugate gradient method
[20] are introduced to solve vector optimization problems. In infinite-dimensional set-
tings, there are also several methods for solving vector optimization problems (see
[21–27] and references therein). For example, Chuong and Yao [25] presented exact
and inexact steepest descent methods of vector optimization problems for a map from
a finite dimensional Hilbert space to a Banach space, which generalizes the works in [4].
Very recently, Boţ and Grad [27] have proposed two forward-backward proximal point
type algorithms with inertial/memory effects for finding weakly efficient solutions to
a vector optimization problem.
The goal of this paper is to present a new method for vector optimization problems.
In this setting, the partial order is induced by a closed, convex and pointed cone C
with nonempty interior in a finite-dimensional space. Our method is in line with
the methods presented in [13–17] in that we seek to extend the Frank-Wolfe
method for scalar optimization to vector optimization. At each iteration of
our method, the descent direction is the difference between an optimal solution of
an auxiliary subproblem, defined by the well-known oriented distance function, and
the current iterate. We consider two stepsize strategies: Armijo
line search and adaptive stepsize. Under some reasonable conditions, including a vector
version of the descent lemma and C-boundedness, we establish convergence results for
the Frank-Wolfe method with both stepsize strategies, namely the stationarity
of accumulation points of the sequences generated by our method. Finally, we apply
the method to a bicriteria portfolio optimization problem so as to produce an optimal
portfolio strategy for investors.
The outline of this paper is as follows. Section 2 presents some preliminaries and
notation. Section 3 explains the vector optimization problem and a necessary
condition for optimality. In Section 4, the Frank-Wolfe method with Armijo line search
(see Algorithm 1) and with adaptive stepsize (see Algorithm 3) for vector optimization
problems is introduced and convergence results for the generated sequences are
obtained. An application to a bicriteria portfolio optimization problem is presented in
Section 5. Finally, in Section 6, we draw some conclusions.

2. Preliminaries

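For a nonempty set $X \subset \mathbb{R}^m$, the interior and boundary of $X$ are respectively
denoted by $\operatorname{int}(X)$ and $\operatorname{bd}(X)$. Let $C \subset \mathbb{R}^m$ be a closed, convex and pointed cone with
nonempty interior. For any $y_1, y_2 \in \mathbb{R}^m$, the partial order $\preceq$ in $\mathbb{R}^m$ induced by $C$ is
defined as

\[ y_1 \preceq y_2 \iff y_2 - y_1 \in C, \]

and the partial order $\prec$ in $\mathbb{R}^m$ induced by $\operatorname{int}(C)$ is defined as

\[ y_1 \prec y_2 \iff y_2 - y_1 \in \operatorname{int}(C). \]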

We now recall the concept of oriented distance function (also called assigned dis-
tance function or Hiriart-Urruty function), which was proposed by Hiriart-Urruty [28]
to investigate optimality conditions of nonsmooth optimization problems from the ge-
ometric point of view. The oriented distance function has been extensively used in
several works, such as scalarization for vector optimization [29,30], optimality condi-
tions for vector optimization [31], optimality conditions for set-valued optimization
problems [32], etc. Herein, we consider the oriented distance function in Rm .

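Definition 2.1. [28] Let $A$ be a subset of $\mathbb{R}^m$. The function $\Delta_A : \mathbb{R}^m \to \mathbb{R} \cup \{\pm\infty\}$,
defined by

\[ \Delta_A(y) := d_A(y) - d_{\mathbb{R}^m \setminus A}(y), \quad \forall y \in \mathbb{R}^m, \tag{1} \]

is called the oriented distance function, where $d_A(y) := \inf\{\|y - a\| : a \in A\}$ stands
for the distance function from $y \in \mathbb{R}^m$ to the set $A$ and $\|\cdot\|$ denotes the norm in $\mathbb{R}^m$.

Note that

\[ \Delta_A(y) = \begin{cases} d_A(y), & \text{if } y \in \mathbb{R}^m \setminus A, \\ -d_{\mathbb{R}^m \setminus A}(y), & \text{if } y \in A. \end{cases} \]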

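We give the following Example 2.2 to illustrate the function $\Delta_A$.

Example 2.2. (i) If we consider the norm $\|y\|_2 := \left(\sum_{i=1}^{m} y_i^2\right)^{1/2}$ in $\mathbb{R}^m$ and $A :=
\{y \in \mathbb{R}^m : \|y\|_2 \le 1\}$, then $\Delta_A(y) = \|y\|_2 - 1$.
(ii) If $\mathbb{R}^m$ is endowed with the norm $\|y\|_\infty := \max_{1 \le i \le m} |y_i|$ and $A := -\mathbb{R}^m_+$, then
$\Delta_A(y) = \max_{1 \le i \le m} y_i$.
(iii) Consider the norm $\|y\|_2$ in $\mathbb{R}^2$ and the cone $C := \{y = (y_1, y_2) \in \mathbb{R}^2 :
y_1 + y_2 \ge 0,\ y_2 \ge 0\}$. Let $A := -C$ and

\[ \begin{aligned}
B_1 &:= \{y = (y_1, y_2) \in \mathbb{R}^2 : y_1 \le 0,\ y_2 > 0\}, \\
B_2 &:= \{y = (y_1, y_2) \in \mathbb{R}^2 : y_1 - y_2 \le 0,\ y_1 > 0\}, \\
B_3 &:= \{y = (y_1, y_2) \in \mathbb{R}^2 : y_1 - y_2 > 0,\ y_1 + y_2 > 0\}, \\
B_4 &:= \{y = (y_1, y_2) \in \mathbb{R}^2 : y_1 - y_2 > 0,\ y_1 + y_2 \le 0\}, \\
B_5 &:= \{y = (y_1, y_2) \in \mathbb{R}^2 : y_1 - y_2 \le 0,\ y_2 \le 0\}.
\end{aligned} \]

Clearly, $A = B_4 \cup B_5$ and $\mathbb{R}^2 \setminus A = B_1 \cup B_2 \cup B_3$. By a direct calculation, we have

\[ d_A(y) = \begin{cases} y_2, & \text{if } y \in B_1, \\ \|y\|_2, & \text{if } y \in B_2, \\ \dfrac{|y_1 + y_2|}{\sqrt{2}}, & \text{if } y \in B_3 \end{cases} \]

and

\[ d_{\mathbb{R}^2 \setminus A}(y) = \begin{cases} \dfrac{|y_1 + y_2|}{\sqrt{2}}, & \text{if } y \in B_4, \\ |y_2|, & \text{if } y \in B_5. \end{cases} \]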
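Example 2.2(i)–(ii) can be checked numerically. The following Python sketch evaluates $\Delta_A$ directly from Definition 2.1 by computing $d_A$ and $d_{\mathbb{R}^m \setminus A}$ separately. The function names are hypothetical, and the closed-form distances used inside each function are assumptions that hold only for these two specific sets and norms.

```python
import math

def delta_unit_ball(y):
    """Oriented distance to A = {y : ||y||_2 <= 1} under the Euclidean norm."""
    r = math.sqrt(sum(v * v for v in y))
    d_A = max(r - 1.0, 0.0)            # distance from y to the ball
    d_complement = max(1.0 - r, 0.0)   # distance from y to the outside of the ball
    return d_A - d_complement

def delta_neg_orthant(y):
    """Oriented distance to A = -R^m_+ under the max norm."""
    top = max(y)
    d_A = max(top, 0.0)                # push every positive entry down to 0
    d_complement = max(-top, 0.0)      # lift the largest entry up to 0
    return d_A - d_complement

print(delta_unit_ball([3.0, 4.0]))      # equals ||y||_2 - 1
print(delta_neg_orthant([-2.0, -0.5]))  # equals max_i y_i
```

In both cases the two distances are never simultaneously nonzero, which is the case distinction stated after Definition 2.1.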

In this paper, for our purposes, let A := −C in Definition 2.1. For the sake of
convenience, we let

\[ \varphi_C(y) := \Delta_{-C}(y), \quad \forall y \in \mathbb{R}^m. \tag{2} \]

According to [29, Proposition 3.2] and the fact that C is a closed, convex and pointed
cone with nonempty interior, we have immediately the following properties related to
ϕC .

Lemma 2.3. [29] Let ϕC (·) be defined in (2). Then the following statements hold:
(i) ϕC is real valued and 1-Lipschitzian;
(ii) ϕC (y) < 0 for any y ∈ −int(C), ϕC (y) = 0 for any y ∈ bd(−C), and ϕC (y) > 0
for any y ∈ int(Rm \(−C));
(iii) ϕC is convex;
(iv) ϕC is positively homogeneous;
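(v) For all $y_1, y_2 \in \mathbb{R}^m$,

\[ \varphi_C(y_1 + y_2) \le \varphi_C(y_1) + \varphi_C(y_2), \qquad \varphi_C(y_1) - \varphi_C(y_2) \le \varphi_C(y_1 - y_2); \]

(vi) Let $y_1, y_2 \in \mathbb{R}^m$. Then

\[ y_1 \prec y_2 \Rightarrow \varphi_C(y_1) < \varphi_C(y_2), \qquad y_1 \preceq y_2 \Rightarrow \varphi_C(y_1) \le \varphi_C(y_2). \]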
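As a quick sanity check of Lemma 2.3, one may take the special case $C = \mathbb{R}^m_+$ with the max norm, where $\varphi_C(y) = \max_i y_i$ by Example 2.2(ii), and test properties (iv)–(vi) on random vectors. The sketch below is illustrative only: the function names, sample sizes and tolerance are assumptions, and a passing check is of course not a proof.

```python
import random

def phi(y):
    """phi_C for C = R^m_+ with the max norm (Example 2.2(ii)): the largest entry."""
    return max(y)

def check_lemma_2_3(trials=1000, m=3, seed=0, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        y1 = [rng.uniform(-5, 5) for _ in range(m)]
        y2 = [rng.uniform(-5, 5) for _ in range(m)]
        t = rng.uniform(0.1, 3.0)
        # (iv) positive homogeneity: phi(t * y) = t * phi(y) for t > 0
        if abs(phi([t * a for a in y1]) - t * phi(y1)) > tol:
            return False
        # (v) subadditivity: phi(y1 + y2) <= phi(y1) + phi(y2)
        if phi([a + b for a, b in zip(y1, y2)]) > phi(y1) + phi(y2) + tol:
            return False
        # (vi) monotonicity: y1 <= y1 + shift componentwise whenever shift lies in C
        shift = [abs(b) for b in y2]
        if phi(y1) > phi([a + s for a, s in zip(y1, shift)]) + tol:
            return False
    return True

print(check_lemma_2_3())
```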

Definition 2.4. [33] For a nonempty set $M \subset \mathbb{R}^n$, the diameter of $M$ is defined as

\[ \operatorname{diam}(M) := \sup_{x, y \in M} \|x - y\|_2. \]

Remark 1. If $M$ is a compact set, then $\operatorname{diam}(M) = \max_{x, y \in M} \|x - y\|_2$ and it is a
finite number.

Definition 2.5. [34] A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is called C-convex on $\mathbb{R}^n$ if, for all
$x, y \in \mathbb{R}^n$ and all $\lambda \in [0, 1]$,

\[ F(\lambda x + (1 - \lambda) y) \preceq \lambda F(x) + (1 - \lambda) F(y). \]

Let $F : \mathbb{R}^n \to \mathbb{R}^m$ be a vector-valued function with $F = (f_1, f_2, \ldots, f_m)^\top$, where
the superscript $\top$ denotes the transpose. We say that $F$ is continuously differentiable
if each $f_i$, $i \in I := \{1, 2, \ldots, m\}$, is continuously differentiable. Now, let $F$ be a
continuously differentiable function. Given $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, the Jacobian of
$F$ at $x$, denoted by $JF(x)$, is a matrix of order $m \times n$ whose entries are defined by

\[ (JF(x))_{i,j} = \frac{\partial f_i}{\partial x_j}(x), \]

where $i \in I$ and $j \in \{1, 2, \ldots, n\}$. We may represent it by

\[ JF(x) := [\nabla f_1(x)\ \nabla f_2(x)\ \ldots\ \nabla f_m(x)]^\top, \quad x \in \mathbb{R}^n. \]

Lemma 2.6. [34] Assume that the function $F : \mathbb{R}^n \to \mathbb{R}^m$ is C-convex on $\mathbb{R}^n$ and
continuously differentiable at $x \in \mathbb{R}^n$. Then

\[ JF(x)(y - x) \preceq F(y) - F(x), \quad \forall y \in \mathbb{R}^n. \]

3. The vector optimization problem

In this paper, we consider the following constrained vector optimization problem with
respect to the partial order induced by $C$:

\[ \min\nolimits_C\ F(x) = (f_1(x), f_2(x), \ldots, f_m(x))^\top \quad \text{s.t.} \quad x \in \Omega, \tag{3} \]

where $F = (f_1, f_2, \ldots, f_m)^\top : \Omega \to \mathbb{R}^m$ and $\Omega \subset \mathbb{R}^n$ is the domain of $F$, which
is assumed to be nonempty, compact and convex. From now on, unless explicitly
mentioned, we always assume that $F$ is continuously differentiable.

Definition 3.1. [34] A point $x \in \Omega$ is called a weakly efficient solution of problem (3)
if there exists no $x^* \in \Omega$ such that $F(x^*) \prec F(x)$.

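A necessary, but not sufficient, first-order optimality condition for problem (3) at
$\hat{x} \in \Omega$ is

\[ JF(\hat{x})(\Omega - \hat{x}) \cap (-\operatorname{int}(C)) = \emptyset, \tag{4} \]

where $JF(\hat{x})(\Omega - \hat{x}) := \{JF(\hat{x})(s - \hat{x}) : s \in \Omega\}$ and

\[ JF(\hat{x})(s - \hat{x}) = (\langle \nabla f_1(\hat{x}), s - \hat{x} \rangle, \langle \nabla f_2(\hat{x}), s - \hat{x} \rangle, \ldots, \langle \nabla f_m(\hat{x}), s - \hat{x} \rangle)^\top. \]

Obviously, (4) is equivalent to $JF(\hat{x})(s - \hat{x}) \notin -\operatorname{int}(C)$ for any $s \in \Omega$.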

Definition 3.2. A point $\hat{x} \in \Omega$ satisfying (4) is called a stationary point of problem (3).

Remark 2. From Lemma 2.3(ii), we can obtain another equivalent characterization
of a stationary point $\hat{x}$ of problem (3), i.e.,

\[ \varphi_C(JF(\hat{x})(s - \hat{x})) \ge 0, \quad \forall s \in \Omega. \]

Remark 3. (i) If $m = 1$ and $C := \mathbb{R}^1_+$, then we retrieve the classical stationarity
condition for constrained scalar optimization problems, i.e., $\langle \nabla f_1(\hat{x}), s - \hat{x} \rangle \ge 0$
for all $s \in \Omega$.
(ii) When $C := \mathbb{R}^m_+$, Definition 3.2 is the same as the notion presented in [11, pp.
744].

Remark 4. Note that if $\hat{x} \in \Omega$ is not a stationary point of problem (3), then there
exists $\hat{s} \in \Omega$ such that $JF(\hat{x})(\hat{s} - \hat{x}) \in -\operatorname{int}(C)$, i.e., $\varphi_C(JF(\hat{x})(\hat{s} - \hat{x})) < 0$ by
Lemma 2.3(ii). In this case, as analyzed in [16, pp. 665], we can assert that $\hat{s} - \hat{x}$ is a
descent direction for $F$.
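The descent property in Remark 4 can be outlined through a first-order expansion. The following is an informal sketch, assuming only that $F$ is continuously differentiable and writing $d := \hat{s} - \hat{x}$ (a shorthand introduced here); it is not a substitute for the argument in [16].

```latex
\[
  F(\hat{x} + t d) = F(\hat{x}) + t\, JF(\hat{x}) d + o(t), \qquad t \downarrow 0 .
\]
% Since JF(\hat{x}) d lies in the open set -int(C), the perturbed vector
% JF(\hat{x}) d + o(t)/t also lies in -int(C) for all sufficiently small t > 0.
% Multiplying by t > 0 keeps it in the cone -int(C), hence
\[
  F(\hat{x} + t d) - F(\hat{x}) \in -\operatorname{int}(C),
  \quad\text{i.e.}\quad
  F(\hat{x} + t d) \prec F(\hat{x}) \quad \text{for all small } t > 0 .
\]
```

Moreover, $\hat{x} + t d = (1 - t)\hat{x} + t\hat{s} \in \Omega$ for $t \in [0, 1]$ by the convexity of $\Omega$, so such steps remain feasible.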

We conclude this section by giving the relation between stationary point and weakly
efficient solution. The proof of this property can be similarly analyzed from [13, pp.
410] and we omit the process here.

Theorem 3.3. (i) If $\hat{x} \in \Omega$ is a weakly efficient solution of problem (3), then $\hat{x}$
is a stationary point.
(ii) If $F$ is C-convex on $\Omega$ and $\hat{x} \in \Omega$ is a stationary point of problem (3), then $\hat{x}$ is
a weakly efficient solution.

4. Frank-Wolfe method for vector optimization

In this section, we propose an extension of the classical Frank-Wolfe method described
in [35, pp. 378] to solve problem (3). First we state and verify some results that
allow us to introduce two types of algorithms. Then, under some additional assumptions,
it is proved that all accumulation points of any generated sequence are weakly
efficient solutions.
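For a given $x \in \Omega$, we introduce a useful auxiliary function $\psi_x : \Omega \to \mathbb{R}$ defined by

\[ \psi_x(s) := \varphi_C(JF(x)(s - x)), \quad s \in \Omega. \tag{5} \]

Remark 5. If we consider the norm $\|\cdot\|_\infty$ in $\mathbb{R}^m$ and $C := \mathbb{R}^m_+$, then from Example
2.2(ii), it holds that

\[ \psi_x(s) = \max_{i \in I} \langle \nabla f_i(x), s - x \rangle, \quad \forall s \in \Omega. \]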

For $x \in \Omega$, in order to obtain a descent direction for $F$ at $x$, we need to consider
the following auxiliary scalar optimization problem

\[ \min_{s \in \Omega} \psi_x(s). \tag{6} \]

Notice that, by Lemma 2.3(iii), $\psi_x$ defined in (5) is a convex function, being the
composition of the convex function $\varphi_C$ with an affine map.
This, combined with the fact that $\Omega$ is a nonempty, compact and convex set, gives
that problem (6) admits an optimal solution (possibly not unique) on $\Omega$. We denote
an optimal solution of problem (6) by $s(x)$, i.e.,

\[ s(x) \in \operatorname*{argmin}_{s \in \Omega} \psi_x(s), \tag{7} \]

and the optimal value of problem (6) is denoted by $v(x)$, i.e.,

\[ v(x) := \psi_x(s(x)). \tag{8} \]
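In the scalar special case $m = 1$, $C = \mathbb{R}_+$ of Remark 3(i), and assuming in addition that $\Omega$ is the unit simplex, problem (6) is a linear program whose minimum is attained at a vertex, so $s(x)$ and $v(x)$ can be read off directly. The sketch below illustrates this; the function name and the example gradient are hypothetical.

```python
def simplex_subproblem_scalar(grad, x):
    """Solve min_{s in unit simplex} <grad, s - x> for a single objective.

    A linear function over the unit simplex attains its minimum at a vertex,
    so s(x) is the unit vector belonging to a smallest gradient entry.
    """
    j = min(range(len(grad)), key=lambda i: grad[i])
    s = [1.0 if i == j else 0.0 for i in range(len(grad))]
    v = grad[j] - sum(g * xi for g, xi in zip(grad, x))  # v(x) = <grad, s - x>
    return s, v

grad = [0.3, -0.2, 0.1]   # hypothetical gradient of f_1 at x
x = [0.2, 0.5, 0.3]       # a point of the unit simplex
s, v = simplex_subproblem_scalar(grad, x)
print(s, v)               # v is nonpositive because s = x is feasible in (6)
```

At a vertex that already minimizes the linear function, the returned value is zero, which is the scalar stationarity condition of Remark 3(i).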

According to Remark 4, we formally give the search direction for the objective
function F at x.

Definition 4.1. For any given point x ∈ Ω, the search direction of the Frank-Wolfe
method for F at x is defined as

d(x) := s(x) − x, (9)

where s(x) is given by (7).

The following property gives a characterization of stationarity in terms of $v(\cdot)$, which
is crucial for the convergence analysis and the stopping criteria of our algorithm.

Proposition 4.2. Let $v : \Omega \to \mathbb{R}$ be defined in (8). Then, the following statements
hold:
(i) $v(x) \le 0$ for every $x \in \Omega$;
(ii) $x \in \Omega$ is a stationary point of problem (3) if and only if $v(x) = 0$.

Proof. (i) Since x ∈ Ω, it follows from (7) and (8) that v(x) = mins∈Ω ψx (s) ≤
ψx (x) = ϕC (JF (x)(x − x)) = ϕC (0). Besides, ϕC (0) = 0 by Lemma 2.3(ii). Thus,
v(x) ≤ 0.
(ii) Necessity. Suppose that x ∈ Ω is a stationary point of problem (3). Then, it
follows from Remark 2 that ϕC (JF (x)(s − x)) ≥ 0 for any s ∈ Ω. By (7), we have
s(x) ∈ Ω. Hence, v(x) = ϕC (JF (x)(s(x) − x)) ≥ 0. This, combined with (i), yields
that v(x) = 0.
Sufficiency. Let v(x) = 0. According to (8), we obtain 0 = v(x) ≤ ψx (s) =
ϕC (JF (x)(s − x)) for all s ∈ Ω, which implies that x is a stationary point of problem
(3).

Remark 6. It is obvious from Proposition 4.2 that $x$ is not a stationary point of
problem (3) if and only if $v(x) < 0$.

Proposition 4.3. Let v : Ω → R be defined in (8). Then, v is continuous on Ω.

Proof. Take $x \in \Omega$ and let $\{x^k\}$ be a sequence in $\Omega$ such that $\lim_{k\to\infty} x^k = x$. In order
to obtain the continuity of $v$ on $\Omega$, it is sufficient to prove that $\lim_{k\to\infty} v(x^k) = v(x)$,
i.e.,

\[ \limsup_{k\to\infty} v(x^k) \le v(x) \le \liminf_{k\to\infty} v(x^k). \tag{10} \]

Since $s(x) \in \Omega$, using (7) and (8), we obtain for all $k$,

\[ v(x^k) = \varphi_C(JF(x^k)(s(x^k) - x^k)) \le \varphi_C(JF(x^k)(s(x) - x^k)). \tag{11} \]

Since $F$ is continuously differentiable and $\varphi_C$ is continuous by Lemma
2.3(i), taking $\limsup_{k\to\infty}$ on both sides of (11), we have

\[ \limsup_{k\to\infty} v(x^k) \le \varphi_C(JF(x)(s(x) - x)) = v(x). \tag{12} \]

Let us show that $v(x) \le \liminf_{k\to\infty} v(x^k)$. Obviously, we have

\[ \begin{aligned}
v(x) &= \min\{\psi_x(s) : s \in \Omega\} \\
&\le \psi_x(s(x^k)) \\
&= \varphi_C(JF(x)(s(x^k) - x)) \\
&= \varphi_C(JF(x)(s(x^k) - x^k + x^k - x)) \\
&= \varphi_C(JF(x)(s(x^k) - x^k) + JF(x)(x^k - x)) \\
&\le \varphi_C(JF(x)(s(x^k) - x^k)) + \varphi_C(JF(x)(x^k - x)),
\end{aligned} \tag{13} \]

where the last inequality follows from Lemma 2.3(v). Since $\varphi_C(JF(x)(x^k - x)) \to \varphi_C(0) = 0$,
taking $\liminf_{k\to\infty}$ in (13), we get

\[ \begin{aligned}
v(x) &\le \liminf_{k\to\infty} \varphi_C(JF(x)(s(x^k) - x^k)) \\
&= \liminf_{k\to\infty} \big(v(x^k) + \varphi_C(JF(x)(s(x^k) - x^k)) - \varphi_C(JF(x^k)(s(x^k) - x^k))\big) \\
&\le \liminf_{k\to\infty} \big(v(x^k) + \|JF(x)(s(x^k) - x^k) - JF(x^k)(s(x^k) - x^k)\|_2\big) \\
&= \liminf_{k\to\infty} \big(v(x^k) + \|(JF(x) - JF(x^k))(s(x^k) - x^k)\|_2\big) \\
&\le \liminf_{k\to\infty} \big(v(x^k) + \|JF(x) - JF(x^k)\|_2 \|s(x^k) - x^k\|_2\big),
\end{aligned} \tag{14} \]

where the penultimate inequality follows from Lemma 2.3(i). Since $s(x^k), x^k \in \Omega$, it
follows from Remark 1 that $\|s(x^k) - x^k\|_2 \le \operatorname{diam}(\Omega) < \infty$. This, combined with the
continuous differentiability of $F$ and (14), gives $v(x) \le \liminf_{k\to\infty} v(x^k)$.
Altogether, (10) holds. Consequently, $v$ is continuous on $\Omega$.

4.1. The Frank-Wolfe method with line search


In this section, we present the proposed algorithm with line search. In order to
compute the stepsize $t > 0$ of our algorithm, we use an Armijo rule. Let $\beta \in (0, 1)$ be
a preset constant. The condition to accept $t$ is given by

\[ F(x + t(s(x) - x)) \preceq F(x) + t\beta JF(x)(s(x) - x). \tag{15} \]

We begin with $t = 1$ and, whenever (15) does not hold, we update

\[ t := \tau t, \]

where $\tau \in (0, 1)$. The following lemma demonstrates the finiteness of this procedure
in view of the fact that (15) holds strictly for $t > 0$ small enough.

Lemma 4.4. Let $s(x)$ be defined in (7) and $JF(x)(s(x) - x) \prec 0$. If $\beta \in (0, 1)$, then
there exists some $\hat{t} > 0$ such that

\[ F(x + t(s(x) - x)) \prec F(x) + t\beta JF(x)(s(x) - x) \]

for any $t \in (0, \hat{t}]$.

Proof. It is similar to the proof of [13, Proposition 2.1].


Based on the previous discussions, we give the Frank-Wolfe algorithm with Armijo
line search (see Algorithm 1) for problem (3). At the $k$-th iteration, we solve problem
(6) with $x = x^k$. Let us call $s^k := s(x^k)$ and $v^k := v(x^k)$ the optimal solution and
optimal value of problem (6) at the $k$-th iteration, respectively. The descent direction at
the $k$-th iteration is computed by $d^k := d(x^k) = s^k - x^k$. If $v^k \ne 0$, then we can use $d^k$
with an Armijo line search technique to look for a new point $x^{k+1}$ which dominates
$x^k$.
Algorithm 1 Frank-Wolfe algorithm with Armijo line search
Input: $x^0 \in \Omega$
for $k = 0, 1, \ldots$ do
    $s^k \leftarrow \operatorname{argmin}_{s \in \Omega} \psi_{x^k}(s)$
    $v^k \leftarrow \psi_{x^k}(s^k)$
    $d^k \leftarrow s^k - x^k$
    if $v^k = 0$ then
        return stationary point $x^k$
    end if
    $t_k \leftarrow$ Armijo line search($x^k$, $d^k$, $JF(x^k)$)
    $x^{k+1} \leftarrow x^k + t_k d^k$
end for

Here we describe a vector adaptation of the Armijo line search:

Algorithm 2 Armijo line search
Input: $x^k \in \Omega$, $d^k \in \mathbb{R}^n$, $JF(x^k)$, $\beta \in (0, 1)$, $\tau \in (0, 1)$
$t \leftarrow 1$
while $F(x^k + t d^k) \not\preceq F(x^k) + t\beta JF(x^k) d^k$ do
    $t \leftarrow \tau t$
end while
$t_k \leftarrow t$
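For $C = \mathbb{R}^m_+$, the acceptance test (15) is a componentwise inequality, so Algorithm 2 reduces to ordinary backtracking applied to all components at once. The following is a minimal sketch under that assumption; the function signature, the default values of $\beta$ and $\tau$, the cap on the number of reductions, and the toy objective are illustrative choices that do not appear in the scheme above.

```python
def armijo_line_search(F, x, d, Jd, beta=0.5, tau=0.5, max_backtracks=60):
    """Backtracking for the order cone C = R^m_+ (componentwise order).

    F  : callable returning the list of objective values at a point
    Jd : the vector JF(x) d, assumed to have only negative entries
    """
    Fx = F(x)
    t = 1.0
    for _ in range(max_backtracks):
        trial = F([xi + t * di for xi, di in zip(x, d)])
        # Condition (15): F(x + t d) <= F(x) + t * beta * JF(x) d in every component.
        if all(ft <= fx + t * beta * jd for ft, fx, jd in zip(trial, Fx, Jd)):
            return t
        t *= tau
    return t  # safeguard (an assumption): stop after max_backtracks reductions

def toy_objective(x):
    """Hypothetical bi-objective function on {x >= 0, x1 + x2 = 1}."""
    return [x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + 4.0 * x[1] ** 2]

x = [0.0, 1.0]
d = [1.0, -1.0]        # s - x with s = (1, 0)
Jd = [-2.0, -10.0]     # JF(x) d, computed by hand from the gradients at x
print(armijo_line_search(toy_objective, x, d, Jd))
```

In this toy run the full step $t = 1$ is rejected by the first component and the halved step is accepted, which is the behaviour Lemma 4.4 guarantees after finitely many reductions.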

Observe that Algorithm 1 either ends with a stationary point in a finite number of
iterations or produces an infinite sequence of nonstationary points. From now on, we
suppose that Algorithm 1 generates an infinite sequence $\{x^k\}$ of nonstationary points.
First, the simple fact that the proposed algorithm generates feasible sequences is given
below.

Theorem 4.5. Let $\{x^k\}$ be a sequence produced by Algorithm 1. Then, $x^k \in \Omega$ for all
$k$.

Proof. We proceed by induction. From Algorithm 1, we have $x^0 \in \Omega$.
Assume that $x^k \in \Omega$ for some $k \ge 0$. We shall prove that $x^{k+1} \in \Omega$. It is easy to see that
$s^k \in \Omega$ from Algorithm 1. By the convexity of $\Omega$ and $t_k \in (0, 1]$, we have $x^{k+1} = x^k + t_k d^k =
x^k + t_k(s^k - x^k) = t_k s^k + (1 - t_k) x^k \in \Omega$.

We present some properties of the iterates generated by Algorithm 1.

Proposition 4.6. For all $k$, we have
(i) $v^k < 0$;
(ii) $F(x^{k+1}) \prec F(x^k)$;
(iii) $\sum_{i=0}^{k} t_i |v^i| \le \dfrac{\varphi_C(F(x^0)) - \varphi_C(F(x^{k+1}))}{\beta}$.

Proof. (i) From the assumption that an infinite sequence $\{x^k\}$ is generated by Algorithm 1
and Remark 6, we have $v^k < 0$ for all $k$.
(ii) By Theorem 4.5, $x^{k+1} \in \Omega$ for all $k$. From (15), we have

\[ F(x^{k+1}) \preceq F(x^k) + t_k \beta JF(x^k)(s^k - x^k). \]

From the nonstationarity of $x^k$ and Remark 4, we have $JF(x^k)(s^k - x^k) \prec 0$. Therefore,
the above inequality implies that $F(x^{k+1}) \prec F(x^k)$.
(iii) For any $i$, we have

\[ \begin{aligned}
\varphi_C(F(x^{i+1})) &\le \varphi_C(F(x^i) + t_i \beta JF(x^i)(s^i - x^i)) \\
&\le \varphi_C(F(x^i)) + \varphi_C(t_i \beta JF(x^i)(s^i - x^i)) \\
&= \varphi_C(F(x^i)) + t_i \beta \varphi_C(JF(x^i)(s^i - x^i)) \\
&= \varphi_C(F(x^i)) + t_i \beta v^i,
\end{aligned} \tag{16} \]

where the first inequality holds in view of (15) and Lemma 2.3(vi), the second inequality
follows from Lemma 2.3(v), and the first equality is due to Lemma 2.3(iv). According
to (16) and (i), we have

\[ t_i |v^i| = -t_i v^i \le \frac{\varphi_C(F(x^i)) - \varphi_C(F(x^{i+1}))}{\beta}. \tag{17} \]

Therefore, summing (17) from $i = 0$ to $i = k$, the result is immediately obtained.

Theorem 4.7. Let {xk } be a sequence produced by Algorithm 1. Then, every accu-
mulation point of {xk } is a stationary point of problem (3).

Proof. Let $\hat{x} \in \Omega$ be an accumulation point of the sequence $\{x^k\}$. Then, there exists a
subsequence $\{x^{k_j}\}$ of $\{x^k\}$ such that

\[ \lim_{j\to\infty} x^{k_j} = \hat{x}. \tag{18} \]

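From Proposition 4.3 and (18), we have $v(x^{k_j}) \to v(\hat{x})$ as $j \to \infty$. Hence, in view of
Proposition 4.2(ii), it is sufficient to show that $v(\hat{x}) = 0$.
Let $k := k_j$ in Proposition 4.6(iii). Then

\[ \sum_{i=0}^{k_j} t_i |v(x^i)| \le \frac{\varphi_C(F(x^0)) - \varphi_C(F(x^{k_j+1}))}{\beta}. \]

Since $\varphi_C \circ F$ is continuous on the compact set $\Omega$, the right-hand side is bounded.
Taking $\lim_{j\to\infty}$ on both sides of the above inequality, we get $\sum_{i=0}^{\infty} t_i |v(x^i)| < \infty$, which
implies that $\lim_{k\to\infty} t_k v(x^k) = 0$, and in particular,

\[ \lim_{j\to\infty} t_{k_j} v(x^{k_j}) = 0. \tag{19} \]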

Since $t_k \in (0, 1]$ for all $k$, we have the following two alternatives:

\[ \text{(a)}\ \limsup_{j\to\infty} t_{k_j} > 0 \qquad \text{or} \qquad \text{(b)}\ \limsup_{j\to\infty} t_{k_j} = 0. \tag{20} \]

We first suppose that (20)(a) holds. Then, there exists a subsequence $\{t_{k_{j_i}}\}$ of $\{t_{k_j}\}$
converging to some $\hat{t} > 0$. From (18), we have $\lim_{i\to\infty} x^{k_{j_i}} = \hat{x}$. Thus, (19) implies
that $\lim_{i\to\infty} t_{k_{j_i}} v(x^{k_{j_i}}) = 0$, and furthermore, $\lim_{i\to\infty} v(x^{k_{j_i}}) = 0$. This, combined
with Proposition 4.3, gives that $0 = \lim_{i\to\infty} v(x^{k_{j_i}}) = v(\hat{x})$.
We now consider (20)(b). Clearly, $x^{k_j}, s(x^{k_j}) \in \Omega$. From the compactness of $\Omega$,
Remark 1 and (9), we have

\[ \|d(x^{k_j})\|_2 = \|s(x^{k_j}) - x^{k_j}\|_2 \le \operatorname{diam}(\Omega) < \infty, \]

i.e., the sequence $\{d(x^{k_j})\}$ is bounded. Now, we take subsequences $\{x^{k_{j_i}}\}$, $\{d(x^{k_{j_i}})\}$ and
$\{t_{k_{j_i}}\}$ converging to $\hat{x}$, $d(\hat{x})$ and $0$, respectively. By (8) and Proposition 4.6(i), we get

\[ \varphi_C(JF(x^{k_{j_i}}) d(x^{k_{j_i}})) = v(x^{k_{j_i}}) < 0. \]

Taking $\lim_{i\to\infty}$ on both sides of the above relation, together with Proposition
4.3, we have

\[ v(\hat{x}) \le 0. \tag{21} \]

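Take some fixed but arbitrary $l \in \mathbb{N}$, where $\mathbb{N}$ denotes the set of natural numbers. From
$\lim_{i\to\infty} t_{k_{j_i}} = 0$, we have $t_{k_{j_i}} < \tau^l$ for $i$ large enough. This shows that the Armijo
condition is not satisfied at $x^{k_{j_i}}$ for $t = \tau^l$, that is,

\[ F(x^{k_{j_i}} + \tau^l d(x^{k_{j_i}})) \not\preceq F(x^{k_{j_i}}) + \tau^l \beta JF(x^{k_{j_i}}) d(x^{k_{j_i}}), \]

or, equivalently,

\[ F(x^{k_{j_i}} + \tau^l d(x^{k_{j_i}})) - F(x^{k_{j_i}}) - \tau^l \beta JF(x^{k_{j_i}}) d(x^{k_{j_i}}) \notin -C, \]

which means that

\[ F(x^{k_{j_i}} + \tau^l d(x^{k_{j_i}})) - F(x^{k_{j_i}}) - \tau^l \beta JF(x^{k_{j_i}}) d(x^{k_{j_i}}) \in \mathbb{R}^m \setminus (-C) = \operatorname{int}(\mathbb{R}^m \setminus (-C)), \tag{22} \]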
where the equality holds in view of the closedness of $C$. By (22) and Lemma 2.3(ii),
we have

\[ \varphi_C\big(F(x^{k_{j_i}} + \tau^l d(x^{k_{j_i}})) - F(x^{k_{j_i}}) - \tau^l \beta JF(x^{k_{j_i}}) d(x^{k_{j_i}})\big) > 0. \tag{23} \]

By the continuous differentiability of $F$ and the continuity of $\varphi_C$, taking $\lim_{i\to\infty}$ in
(23), we obtain

\[ \varphi_C\big(F(\hat{x} + \tau^l d(\hat{x})) - F(\hat{x}) - \tau^l \beta JF(\hat{x}) d(\hat{x})\big) \ge 0, \quad \forall l \in \mathbb{N}. \tag{24} \]

According to (24), Lemma 4.4 and Lemma 2.3(ii), we obtain $JF(\hat{x}) d(\hat{x}) \not\prec 0$, i.e.,
$JF(\hat{x}) d(\hat{x}) \in \mathbb{R}^m \setminus (-\operatorname{int}(C))$. Thus, we have $v(\hat{x}) = \varphi_C(JF(\hat{x}) d(\hat{x})) \ge 0$ from Lemma
2.3(ii). This, combined with (21), yields that $v(\hat{x}) = 0$.

It follows from Theorems 3.3 and 4.7 that the following result holds.

Theorem 4.8. If $F$ is C-convex on $\Omega$, then every accumulation point of a sequence produced by
Algorithm 1 is a weakly efficient solution of problem (3).

4.2. The Frank-Wolfe method with adaptive stepsize


In this section, it is always assumed that the gradient of each component $f_i$ of the objective
function $F$ is Lipschitz continuous with constant $L_i > 0$ for $i \in I$, and we set $L :=
\max_{i \in I} L_i$. In the sequel, we let

\[ e = (e_1, e_2, \ldots, e_m)^\top \in \operatorname{int}(C). \]

We first present an important property which is essential for the convergence
analysis.

Lemma 4.9. Suppose that $\frac{L}{2}\|\cdot\|_2^2\, e - F(\cdot)$ is C-convex on $\Omega$. Then, for any $x, y \in \Omega$,

\[ F(y) - F(x) \preceq JF(x)(y - x) + \frac{L}{2}\|y - x\|_2^2\, e. \tag{25} \]

Proof. Let $G(\cdot) := \frac{L}{2}\|\cdot\|_2^2\, e - F(\cdot)$. Since $G(\cdot)$ is C-convex on $\Omega$, it follows from Lemma
2.6 that

\[ JG(x)(y - x) \preceq G(y) - G(x). \tag{26} \]

For $JG(x)(y - x)$ in (26), by a simple calculation, we have

\[ JG(x)(y - x) = L\langle x, y \rangle e - L\|x\|_2^2\, e - JF(x)(y - x). \tag{27} \]

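From the definition of $G(\cdot)$, (26) and (27), we have

\[ \begin{aligned}
F(y) - F(x) &\preceq JF(x)(y - x) + \frac{L}{2}\|y\|_2^2\, e - \frac{L}{2}\|x\|_2^2\, e - L\langle x, y \rangle e + L\|x\|_2^2\, e \\
&= JF(x)(y - x) + \frac{L}{2}\big(\|y\|_2^2 + \|x\|_2^2 - 2\langle x, y \rangle\big) e \\
&= JF(x)(y - x) + \frac{L}{2}\|y - x\|_2^2\, e,
\end{aligned} \]

and the proof is complete.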

Remark 7. Here we call the property the vector version of the classical descent lemma
with respect to the order cone C. Actually, the setting of the condition in Lemma 4.9
is inspired by the works of Bauschke et al. [36].

Remark 8. Lemma 4.9 implies that, for any $x, y \in \Omega$,

\[ \varphi_C(F(y)) - \varphi_C(F(x)) \le \psi_x(y) + \frac{L}{2}\|y - x\|_2^2\, \varphi_C(e). \tag{28} \]

Indeed, from (25) and Lemma 2.3(iv)–(vi), we have

\[ \begin{aligned}
\varphi_C(F(y) - F(x)) &\le \varphi_C\Big(JF(x)(y - x) + \frac{L}{2}\|y - x\|_2^2\, e\Big) \\
&\le \varphi_C(JF(x)(y - x)) + \varphi_C\Big(\frac{L}{2}\|y - x\|_2^2\, e\Big) \\
&= \varphi_C(JF(x)(y - x)) + \frac{L}{2}\|y - x\|_2^2\, \varphi_C(e).
\end{aligned} \tag{29} \]

Moreover, according to the second inequality in Lemma 2.3(v), it holds that

\[ \varphi_C(F(y)) - \varphi_C(F(x)) \le \varphi_C(F(y) - F(x)). \tag{30} \]

Therefore, (28) follows immediately from (29), (30) and (5).
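Inequality (25) can be probed numerically in the setting $C = \mathbb{R}^2_+$, $e = (1, 1)^\top$, where it is a componentwise inequality. The sketch below uses a small bi-objective function of the same shape as a linear-plus-quadratic pair; the matrix `V`, the vector `U`, and the choice of $L$ as twice the largest absolute row sum of `V` (a valid but not tight upper bound on the Lipschitz constant of $x \mapsto 2Vx$) are all assumptions made for illustration.

```python
import random

V = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical symmetric matrix
U = [0.3, 0.7]                 # hypothetical linear coefficients

def F(x):
    lin = -sum(u * xi for u, xi in zip(U, x))
    quad = sum(x[i] * V[i][j] * x[j] for i in range(2) for j in range(2))
    return [lin, quad]

def JF_times(x, d):
    """Return JF(x) d for the function F above."""
    g2 = [2.0 * sum(V[i][j] * x[j] for j in range(2)) for i in range(2)]
    return [-sum(u * di for u, di in zip(U, d)),
            sum(g * di for g, di in zip(g2, d))]

# Twice the largest absolute row sum of V bounds the spectral norm of 2V.
L = 2.0 * max(sum(abs(a) for a in row) for row in V)

def descent_lemma_holds(x, y, tol=1e-12):
    d = [b - a for a, b in zip(x, y)]
    bound = 0.5 * L * sum(di * di for di in d)   # (L/2) ||y - x||_2^2 with e = (1, 1)
    lhs = [fy - fx for fy, fx in zip(F(y), F(x))]
    rhs = [jd + bound for jd in JF_times(x, d)]
    return all(l <= r + tol for l, r in zip(lhs, rhs))

rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(200)]
print(all(descent_lemma_holds(p, q) for p in points for q in points))
```

With the max norm, $\varphi_C(e) = \max_i e_i = 1$ for this choice of $e$, so the requirement $\varphi_C(e) < 2$ used later in this section is met.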

Let us now give the Frank-Wolfe algorithm with adaptive stepsize (see Algorithm
3) for solving problem (3).

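Algorithm 3 Frank-Wolfe algorithm with adaptive stepsize
Input: $x^0 \in \Omega$
for $k = 0, 1, \ldots$ do
    $s^k \leftarrow \operatorname{argmin}_{s \in \Omega} \psi_{x^k}(s)$
    $v^k \leftarrow \psi_{x^k}(s^k)$
    $d^k \leftarrow s^k - x^k$
    if $v^k = 0$ then
        return stationary point $x^k$
    end if
    $t_k \leftarrow \min\left\{1, -\dfrac{v^k}{L\|d^k\|_2^2}\right\}$
    $x^{k+1} \leftarrow x^k + t_k d^k$
end for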
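The stepsize rule of Algorithm 3 is a one-line formula. The sketch below evaluates it together with the quantity $t v + \frac{L}{2} t^2 \|d\|_2^2\, \varphi_C(e)$, which bounds the change of $\varphi_C \circ F$ along the step by (28). The function names are hypothetical, and the default $\varphi_C(e) = 1$ assumes $C = \mathbb{R}^m_+$, the max norm and $e = (1, \ldots, 1)^\top$.

```python
def adaptive_stepsize(v, d, L):
    """Stepsize of Algorithm 3: t = min{1, -v / (L * ||d||_2^2)}.

    v : optimal value of subproblem (6), assumed negative
    d : search direction s - x, assumed nonzero
    L : common Lipschitz constant of the gradients
    """
    norm_sq = sum(di * di for di in d)
    return min(1.0, -v / (L * norm_sq))

def decrease_bound(v, d, L, phi_e=1.0):
    """Upper bound t*v + (L/2) * t^2 * ||d||_2^2 * phi_C(e) on the change of phi_C(F)."""
    t = adaptive_stepsize(v, d, L)
    norm_sq = sum(di * di for di in d)
    return t * v + 0.5 * L * t * t * norm_sq * phi_e

print(adaptive_stepsize(-0.5, [1.0, -1.0], 4.0))    # short step
print(adaptive_stepsize(-10.0, [1.0, -1.0], 4.0))   # capped at 1
print(decrease_bound(-0.5, [1.0, -1.0], 4.0))       # negative: guaranteed decrease
```

The rule needs no function evaluations, at the price of requiring a valid constant $L$ in advance.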

Likewise, Algorithm 3 either terminates with a stationary point in a finite number
of iterations or generates an infinite sequence. We suppose in the sequel that
Algorithm 3 produces an infinite sequence $\{x^k\}$ of nonstationary points. Clearly, it
follows from Remark 6 that $v^k < 0$ for all $k$.

Lemma 4.10. Suppose that $\frac{L}{2}\|\cdot\|_2^2\, e - F(\cdot)$ is C-convex on $\Omega$, $\varphi_C(e) < 2$ and $\{x^k\}$ is
a sequence produced by Algorithm 3. Then, for all $k$, it holds that

\[ \varphi_C(F(x^{k+1})) - \varphi_C(F(x^k)) \le \frac{\varphi_C(e) - 2}{2} \min\left\{\frac{(v^k)^2}{L(\operatorname{diam}(\Omega))^2}, -v^k\right\}. \tag{31} \]

Proof. Let $x^{k+1} = x^k + t_k d^k$, where $d^k = s^k - x^k$ and

\[ t_k = \min\left\{1, -\frac{v^k}{L\|d^k\|_2^2}\right\}. \tag{32} \]

Since $\frac{L}{2}\|\cdot\|_2^2\, e - F(\cdot)$ is C-convex on $\Omega$, by (28) invoked with $x = x^k$ and $y = x^{k+1}$,
we have

\[ \begin{aligned}
\varphi_C(F(x^{k+1})) - \varphi_C(F(x^k)) &\le \psi_{x^k}(x^k + t_k(s^k - x^k)) + \frac{L}{2} t_k^2 \|d^k\|_2^2\, \varphi_C(e) \\
&= t_k \psi_{x^k}(s^k) + \frac{L}{2} t_k^2 \|d^k\|_2^2\, \varphi_C(e) \\
&= t_k v^k + \frac{L}{2} t_k^2 \|d^k\|_2^2\, \varphi_C(e),
\end{aligned} \tag{33} \]

where the first equality holds in view of (5) and Lemma 2.3(iv). According to (32), there are two cases.
Case 1. Let $t_k = 1$. This, combined with (32), gives that

\[ L\|d^k\|_2^2 \le -v^k. \tag{34} \]

By (33) and (34), we obtain

\[ \varphi_C(F(x^{k+1})) - \varphi_C(F(x^k)) \le \frac{2 - \varphi_C(e)}{2} v^k. \tag{35} \]

Case 2. Let $t_k = -\dfrac{v^k}{L\|d^k\|_2^2}$. From Remark 1, we get $\|d^k\|_2 = \|s^k - x^k\|_2 \le \operatorname{diam}(\Omega)$.
This, together with (33) and $\varphi_C(e) < 2$, yields that

\[ \varphi_C(F(x^{k+1})) - \varphi_C(F(x^k)) \le \frac{\varphi_C(e) - 2}{2} \frac{(v^k)^2}{L\|d^k\|_2^2} \le \frac{\varphi_C(e) - 2}{2} \frac{(v^k)^2}{L(\operatorname{diam}(\Omega))^2}. \tag{36} \]

Therefore, (31) follows directly from (35) and (36).

To present our convergence analysis for Algorithm 3, we need the following assumption.

Assumption A. The sequence $\{F(x^k)\}$ is C-bounded from below, i.e., there exists
$\bar{F} \in \mathbb{R}^m$ such that $\bar{F} \preceq F(x^k)$ for all $k$.

Remark 9. C-boundedness is a generalization of boundedness for scalar-valued
functions. It has been extensively used in convergence proofs for gradient-based
methods for solving vector optimization problems (see [6,14,15,19]).

Theorem 4.11. Suppose that $\frac{L}{2}\|\cdot\|_2^2\, e - F(\cdot)$ is C-convex on $\Omega$, $\varphi_C(e) < 2$ and $\{x^k\}$ is
a sequence produced by Algorithm 3. If Assumption A holds, then every accumulation
point of $\{x^k\}$ is a stationary point of problem (3).

Proof. From (31), $\varphi_C(e) < 2$ and $v^k < 0$, we have for all $k$,

\[ \varphi_C(F(x^{k+1})) - \varphi_C(F(x^k)) \le \frac{\varphi_C(e) - 2}{2} \min\left\{\frac{(v^k)^2}{L(\operatorname{diam}(\Omega))^2}, -v^k\right\} < 0, \tag{37} \]

i.e., $\varphi_C(F(x^{k+1})) < \varphi_C(F(x^k))$, which implies that $\{\varphi_C(F(x^k))\}$ is decreasing.
Since $\{F(x^k)\}$ is C-bounded from below (say by $\bar{F}$), i.e., $\bar{F} \preceq F(x^k)$ for all $k$,
it follows from Lemma 2.3(vi) that $\varphi_C(\bar{F}) \le \varphi_C(F(x^k))$ for all $k$. Therefore,
the sequence $\{\varphi_C(F(x^k))\}$ is convergent. This means that

\[ \lim_{k\to\infty} \big(\varphi_C(F(x^{k+1})) - \varphi_C(F(x^k))\big) = 0. \tag{38} \]

Taking $\lim_{k\to\infty}$ in (37), and then combining with (38), we have

\[ \lim_{k\to\infty} v(x^k) = 0. \tag{39} \]

From Proposition 4.3, Proposition 4.2(ii) and (39), we obtain that each accumulation
point of $\{x^k\}$ is a stationary point of problem (3).

Remark 10. If the C-convexity of $F$ is additionally required in Theorem 4.11,
then it follows from Theorems 3.3 and 4.11 that every accumulation point of a sequence produced
by Algorithm 3 is a weakly efficient solution of problem (3).

Remark 11. It is noteworthy that the extended Frank-Wolfe methods for vector optimization
problems presented in Algorithms 1 and 3 are conceptual and theoretical
schemes rather than implementable algorithms. Similar issues also appear in the literature
(see, e.g., [12,13,15,17–20,22,24–27]). Therefore, the computational efficiency of
the method on a real-world optimization problem depends essentially on the
structure of the minimization subproblem (6) at every iteration.
For example, when the norm $\|\cdot\|_\infty$ is used and $C := \mathbb{R}^m_+$, the objective function $\psi_x$
in (6) has the simple form shown in Remark 5, and then the descent direction can
be easily computed numerically. We consider this issue in the following
section.
5. Numerical experiments for portfolio optimization

In this section, we present an application of the proposed methods to a portfolio
optimization problem. The algorithms were implemented in Python and run
on a Lenovo computer with an Intel(R) Core(TM) i5-8250U processor (1.60 GHz) and 4.0
GB of RAM.
Consider the following bicriteria optimization problem

\[ \min\nolimits_{\mathbb{R}^2_+}\ F(x) = \begin{pmatrix} -x^\top u \\ x^\top V x \end{pmatrix} \quad \text{s.t.} \quad x \in \Omega, \tag{40} \]

where $u \in \mathbb{R}^n$, $V \in \mathbb{R}^{n \times n}$ is a symmetric positive semidefinite matrix and

\[ \Omega = \left\{ x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = 1,\ x_i \ge 0,\ i = 1, 2, \ldots, n \right\}. \]

Problem (40) is a well-known portfolio optimization problem, which plays
a critical role in determining portfolio strategies for investors. The decision variable
$x = (x_1, x_2, \ldots, x_n)$ of problem (40) stands for the asset weight vector, where $x_i$,
$i = 1, 2, \ldots, n$, is the weight of asset $i$ in the portfolio. The vector $u$ collects the return rates
of the assets, and the variance-covariance matrix $V = (\sigma_{ij})_{n \times n}$ collects the variances
and covariances of the individual assets, where $\sigma_{ii}$ is the variance of asset $i$ and $\sigma_{ij}$ is the
covariance between asset $i$ and asset $j$. The first objective function is the negative
of the expected return (which is to be maximized, and is therefore minimized with a leading
minus sign), and the second one is the variance of the portfolio, which quantifies
the risk associated with the considered portfolio.
Herein, we use the real data presented in [37], which cover five stocks: IBM,
Microsoft, Apple, Quest Diagnostics and Bank of America. The expected return and
variance of each stock in the portfolio were calculated from historical stock prices
and dividend payments from February 1, 2002 to February 1, 2007. Thus, in problem
(40), n = 5,

u = (0.004, 0.00513, 0.04085, 0.01006, 0.01236)

and V is set as follows:


 
\[
V = \begin{pmatrix}
0.006461 & 0.002983 & 0.00235487 & 0.00235487 & 0.00096889 \\
0.002983 & 0.0039 & 0.00095937 & -0.0001987 & 0.00063459 \\
0.00235487 & 0.00095937 & 0.01267778 & 0.00135712 & 0.00134481 \\
0.00235487 & -0.0001987 & 0.00135712 & 0.00559836 & 0.00041942 \\
0.00096889 & 0.00063459 & 0.00134481 & 0.00041942 & 0.0016229
\end{pmatrix}.
\]
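
As a minimal sketch, the data above and the two objectives of problem (40) could be set up in NumPy as follows. The function names `F`, `jacobian` and `in_simplex` are illustrative choices, not taken from the authors' implementation; the gradients used are $\nabla f_1(x) = -u$ and $\nabla f_2(x) = 2Vx$, which holds because $V$ is symmetric.

```python
import numpy as np

# Data copied from the text: expected returns u and covariance matrix V.
u = np.array([0.004, 0.00513, 0.04085, 0.01006, 0.01236])
V = np.array([
    [0.006461,   0.002983,   0.00235487,  0.00235487, 0.00096889],
    [0.002983,   0.0039,     0.00095937, -0.0001987,  0.00063459],
    [0.00235487, 0.00095937, 0.01267778,  0.00135712, 0.00134481],
    [0.00235487, -0.0001987, 0.00135712,  0.00559836, 0.00041942],
    [0.00096889, 0.00063459, 0.00134481,  0.00041942, 0.0016229],
])


def F(x):
    """Objective vector (-x^T u, x^T V x) of problem (40)."""
    return np.array([-x @ u, x @ V @ x])


def jacobian(x):
    """2 x n Jacobian of F: row 0 is -u, row 1 is 2 V x (V is symmetric)."""
    return np.vstack([-u, 2.0 * V @ x])


def in_simplex(x, tol=1e-12):
    """Check membership in the feasible set Omega up to a tolerance."""
    return bool(np.all(x >= -tol) and abs(x.sum() - 1.0) <= tol)


# Hypothetical starting point: the equal-weight portfolio.
x0 = np.full(5, 0.2)
```

For the equal-weight portfolio `x0`, the first component of `F(x0)` is the negative of the mean of `u`.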

For practical tractability, we take the norm $\|\cdot\|_\infty$ in $\mathbb{R}^2$ and $C := \mathbb{R}^2_+$.
From Remark 5 and (6), at $x \in \Omega$, we need to solve the following scalar optimization
problem

\[
\min_{s \in \Omega} \max_{i=1,2} \langle \nabla f_i(x), s - x \rangle. \tag{41}
\]

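Clearly, problem (41) is nondifferentiable. However, it can be equivalently
transformed into the following differentiable form

\[
\begin{array}{ll}
\min & \gamma \\
\text{s.t.} & \gamma \ge \langle \nabla f_i(x), s - x \rangle, \quad i = 1, 2, \\
 & s \in \Omega.
\end{array} \tag{42}
\]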
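
The reformulation (42) is a linear program in the stacked variable $(s, \gamma)$. The sketch below shows one way it might be passed to SciPy's `linprog`. The function name `fw_subproblem`, the variable ordering and the choice of the `"highs"` method are assumptions made for illustration, not details reported in the paper.

```python
import numpy as np
from scipy.optimize import linprog


def fw_subproblem(J, x):
    """Solve min_{s in simplex} max_i <J[i], s - x> via the LP (42).

    J : (m, n) array whose rows are the gradients of f_i at x.
    Returns (s, v): a minimizer s and the optimal value v = gamma.
    """
    m, n = J.shape
    # Decision vector z = (s_1, ..., s_n, gamma); the cost selects gamma.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    # gamma >= <J[i], s - x>  is rewritten as  <J[i], s> - gamma <= <J[i], x>.
    A_ub = np.hstack([J, -np.ones((m, 1))])
    b_ub = J @ x
    # Simplex constraint: the components of s sum to one.
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    # s >= 0 componentwise, gamma is free.
    bounds = [(0.0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[-1]
```

Since $s = x$ is feasible with $\gamma = 0$, the returned value `v` is never positive (up to solver tolerance), and `v = 0` indicates that `x` satisfies the first-order condition.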

Observe that problem (42) is a linear programming problem. Therefore, in our
experiment the optimal solution of problem (42) is obtained with the linprog routine
of the optimize module (SciPy) in Python. Moreover, the constraint set is the
unit simplex. To obtain a set of weakly efficient solutions, we sample 50 initial
points uniformly at random on the simplex. The stopping criterion in Algorithms
1 and 3 is set as $|v^k| \le \epsilon := 10^{-5}$. Algorithms 1 and 3 were each run 50 times
from the same initial points, and every run terminated at a point satisfying
the stopping criterion. The solutions obtained
by Algorithms 1 and 3 are displayed in Figure 1. The number of iterations
and the CPU time in seconds (each on the “y” axis) for each initial
point (50 in total, on the “x” axis) are reported in Figure 2(a) and (b), respectively.
Note that, in Figure 2(a)–(b), the red and blue dotted lines denote the
averages of iterations and CPU time obtained by Algorithms 1 and 3, respectively,
over the 50 instances (the specific values are presented in Table 1).
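
The experimental loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: uniform sampling on the simplex is done with a Dirichlet(1, ..., 1) draw, the line search is a standard componentwise Armijo backtracking with a hypothetical parameter `beta = 1e-4`, and the names `subproblem` and `frank_wolfe` are made up here. The exact line search rule of Algorithm 1 may differ.

```python
import numpy as np
from scipy.optimize import linprog

u = np.array([0.004, 0.00513, 0.04085, 0.01006, 0.01236])
V = np.array([
    [0.006461,   0.002983,   0.00235487,  0.00235487, 0.00096889],
    [0.002983,   0.0039,     0.00095937, -0.0001987,  0.00063459],
    [0.00235487, 0.00095937, 0.01267778,  0.00135712, 0.00134481],
    [0.00235487, -0.0001987, 0.00135712,  0.00559836, 0.00041942],
    [0.00096889, 0.00063459, 0.00134481,  0.00041942, 0.0016229],
])


def F(x):
    return np.array([-x @ u, x @ V @ x])


def jac(x):
    return np.vstack([-u, 2.0 * V @ x])


def subproblem(x):
    """LP (42): returns (s, v) with v = min_s max_i <grad f_i(x), s - x>."""
    J, n = jac(x), x.size
    c = np.r_[np.zeros(n), 1.0]
    A_ub = np.hstack([J, -np.ones((2, 1))])
    A_eq = np.r_[np.ones(n), 0.0][None, :]
    res = linprog(c, A_ub=A_ub, b_ub=J @ x, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)], method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[-1]


def frank_wolfe(x, eps=1e-5, beta=1e-4, max_iter=200):
    """Frank-Wolfe iteration with Armijo backtracking (assumed rule)."""
    for k in range(max_iter):
        s, v = subproblem(x)
        if abs(v) <= eps:          # stopping criterion |v^k| <= eps
            return x, k
        d = s - x
        Fx, Jd = F(x), jac(x) @ d
        t = 1.0
        # Halve t until F(x + t d) <= F(x) + beta * t * J d componentwise.
        while t > 1e-12 and np.any(F(x + t * d) > Fx + beta * t * Jd):
            t *= 0.5
        x = x + t * d
    return x, max_iter


# 50 starting points drawn uniformly from the unit simplex.
rng = np.random.default_rng(0)
X0 = rng.dirichlet(np.ones(5), size=50)
```

Calling `frank_wolfe(x0)` for each row `x0` of `X0` returns the final point and the iteration count. The iteration counts need not match those in Table 1, since the line search rule and the sampled points are assumptions.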

Figure 1. The optimization results of the five stocks.

Table 1. Average CPU time and number of iterations.

                                 Algorithm 1    Algorithm 3
Average CPU time (s)             0.024          0.500
Average number of iterations     7.460          169.540

Figure 1 shows some possible optimal portfolio points on the return-risk tradeoff.

Figure 2. Iterations (a) and CPU time (b) of Algorithms 1 and 3.

As we have seen, the expected return increases with the risk. From Figure 2 and
Table 1, we observe that Algorithm 1 with Armijo line search takes fewer iterations
and less CPU time than Algorithm 3 with adaptive stepsize for the same initial points.
A plausible explanation, based on the experimental data, is that the
stepsize t in Algorithm 3 changes very little from one iteration to the next, which
leads to only a small improvement of the objective function F per iteration and
hence to additional cost.

6. Conclusions

In this paper, we have extended the classical Frank-Wolfe method to solve constrained
vector optimization problems with respect to a closed, convex and pointed cone with
nonempty interior. A key point is that we construct an auxiliary subproblem via the
well-known oriented distance function. Under reasonable assumptions, we prove that
accumulation points of the sequences generated by the proposed algorithms with two
different strategies of stepsizes are stationary. Applications to portfolio optimization
under bicriteria considerations are given.
In recent years, convergence rate analyses of some gradient-based methods for
vector optimization problems have been established in the setting where the partial
order in $\mathbb{R}^m$ is induced by the nonnegative orthant (see [5,11,38,39]). Moreover,
there are some convergence rate results in the case of a general cone order (see [16,19]).
In this paper we have not analyzed the convergence rate of the proposed methods.
Investigating this issue is an interesting topic for future research.

References

[1] Jahn J. Vector optimization: theory, applications and extensions. 2nd ed. Berlin: Springer;
2011.
[2] Miettinen K. Nonlinear multiobjective optimization. Boston, MA: Kluwer; 1999.
[3] Eichfelder G. Adaptive scalarization methods in multiobjective optimization. Berlin:
Springer; 2008.

[4] Fliege J, Svaiter BF. Steepest descent methods for multicriteria optimization. Math
Methods Oper Res. 2000;51: 479–494.
[5] Fliege J, Graña Drummond LM, Svaiter BF. Newton’s method for multiobjective
optimization. SIAM J Optim. 2009;20(2): 602–626.
[6] Qu SJ, Goh M, Chan FTS. Quasi-Newton methods for solving multiobjective optimiza-
tion. Oper Res Lett. 2011;39(5): 397–399.
[7] Qu SJ, Goh M, Liang B. Trust region methods for solving multiobjective optimisation.
Optim Methods Softw. 2013;28(4): 796–811.
[8] Fukuda EH, Graña Drummond LM. A survey on multiobjective descent methods.
Pesquisa Oper. 2014;34(3): 585–620.
[9] Tanabe H, Fukuda EH, Yamashita N. Proximal gradient methods for multiobjective op-
timization and their applications. Comput Optim Appl. 2019;72(2): 339–361.
[10] Wang JH, Hu YH, Wai Yu CK, Li C, Yang XQ. Extended Newton methods for multi-
objective optimization: majorizing function technique and convergence analysis. SIAM J
Optim. 2019;29(3): 2388–2421.
[11] Assunção PB, Ferreira OP, Prudente LF. Conditional gradient method for multiobjective
optimization. Comput Optim Appl. 2021;78(3): 741–768.
[12] Graña Drummond LM, Iusem AN. A projected gradient method for vector optimization
problems. Comput Optim Appl. 2004;28(1): 5–29.
[13] Graña Drummond LM, Svaiter BF. A steepest descent method for vector optimization
problems. J Comput Appl Math. 2005;175: 395–414.
[14] Fukuda EH, Graña Drummond LM. On the convergence of the projected gradient method
for vector optimization. Optimization. 2011;60(8–9): 1009–1021.
[15] Fukuda EH, Graña Drummond LM. Inexact projected gradient method for vector opti-
mization. Comput Optim Appl. 2013;54(3): 473–493.
[16] Graña Drummond LM, Raupp FMP, Svaiter BF. A quadratically convergent Newton
method for vector optimization. Optimization. 2014;63(5): 661–677.
[17] Lu F, Chen CR. Newton-like methods for solving vector optimization problems. Appl
Anal. 2014;93(8): 1567–1586.
[18] Qu SJ, Goh M, Ji Y, Souza RD. A new algorithm for linearly constrained c-convex vector
optimization with a supply chain network risk application. Eur J Oper Res. 2015;247(2):
359–365.
[19] Qu SJ, Ji Y, Jiang JL, Zhang QP. Nonmonotone gradient methods for vector optimization
with a portfolio optimization application. Eur J Oper Res. 2017;263(2): 356–366.
[20] Gonçalves MLN, Prudente LF. On the extension of the Hager–Zhang conjugate gradient
method for vector optimization. Comput Optim Appl. 2020;76(3): 889–916.
[21] Bonnel H, Iusem AN, Svaiter BF. Proximal methods in vector optimization. SIAM J
Optim. 2005;15(4): 953–970.
[22] Ceng LC, Yao JC. Approximate proximal methods in vector optimization. Eur J Oper
Res. 2007;183(1): 1–19.
[23] Chen Z, Huang HQ, Zhao KQ. Approximate generalized proximal-type method for convex
vector optimization problem in Banach spaces. Comput Math Appl. 2009;57(7): 1196–
1203.
[24] Chuong TD. Tikhonov-type regularization method for efficient solutions in vector opti-
mization. J Comput Appl Math. 2010;234(3): 761–766.
[25] Chuong TD, Yao JC. Steepest descent methods for critical points in vector optimization
problems. Appl Anal. 2012;91(10): 1811–1829.
[26] Chuong TD. Newton-like methods for efficient solutions in vector optimization. Comput
Optim Appl. 2013;54: 495–516.
[27] Boţ RI, Grad S-M. Inertial forward backward methods for solving vector optimization
problems. Optimization. 2018;67(7): 959–974.
[28] Hiriart-Urruty JB. Tangent cone, generalized gradients and mathematical programming
in Banach spaces. Math Oper Res. 1979;4: 79–97.
[29] Zaffaroni A. Degrees of efficiency and degrees of minimality. SIAM J Control Optim.
2003;42: 1071–1086.
[30] Miglierina E, Molho E, Rocca M. Well-posedness and scalarization in vector optimization.
J Optim Theory Appl. 2005;126(2): 391–409.
[31] Gao Y, Hou SH, Yang XM. Existence and optimality conditions for approximate solutions
to vector optimization problems. J Optim Theory Appl. 2012;152(1): 97–120.
[32] Zhou ZA, Chen W, Yang XM. Scalarizations and optimality of constrained set-valued
optimization using improvement sets and image space analysis. J Optim Theory Appl.
2019;183: 944–962.
[33] Muscat J. Functional analysis: an introduction to metric spaces, Hilbert Spaces, and
Banach Algebras. Springer; 2014.
[34] Luc TD. Theory of vector optimization. Lecture notes in economics and mathematical
systems, vol. 319. Berlin: Springer; 1989.
[35] Beck A. First-order methods in optimization. MOS-SIAM Series on Optimization.
Philadelphia: SIAM; 2017.
[36] Bauschke HH, Bolte J, Teboulle M. A descent lemma beyond Lipschitz gradient
continuity: first-order methods revisited and applications. Math Oper Res. 2017;42(2): 330–348.
[37] Duan YC. A multi-objective approach to portfolio optimization. Undergraduate Math J.
2007;8(1): 1–18.
[38] Fliege J, Vaz AIF, Vicente LN. Complexity of gradient descent for multiobjective
optimization. Optim Methods Softw. 2019;34(5): 949–959.
[39] Tanabe H, Fukuda EH, Yamashita N. Convergence rates analysis of multiobjective prox-
imal gradient methods. arXiv preprint arXiv:2010.08217, 2020.
