DC Programming and DCA: Thirty Years of Developments
https://doi.org/10.1007/s10107-018-1235-y
Abstract The year 2015 marks the 30th birthday of DC (Difference of Convex func-
tions) programming and DCA (DC Algorithms) which constitute the backbone of
nonconvex programming and global optimization. In this article we offer a short sur-
vey on thirty years of developments of these theoretical and algorithmic tools. The
survey comprises three parts. In the first part we present a brief history of the field,
while in the second we summarize the state-of-the-art results and recent advances. We
focus on main theoretical results and DCA solvers for important classes of difficult
nonconvex optimization problems, and then give an overview of real-world applica-
tions whose solution methods are based on DCA. The third part is devoted to new
trends and important open issues, as well as suggestions for future developments.
1 Introduction
Nonconvex (differentiable/nondifferentiable) programming and global optimization
have seen dramatic developments around the world during the past three decades.
The relation in (ii) marks the passage from convex optimization to nonconvex opti-
mization and also indicates that DC(X ) constitutes a minimal realistic extension of
Γo (X ).
DC programming and DCA (DC Algorithms) constitute the backbone of nonconvex
programming and global optimization. DCA was first introduced by Pham Dinh Tao in 1985, especially for the standard DC program (B), as a natural and logical extension of his previous works on convex maximization since 1974, and was later extended to the general DC program of the form (C). Crucial developments and improvements of DC programming and DCA, from both theoretical and computational viewpoints, have been carried out since 1994 through the joint works of the authors of this paper and their coworkers, and these tools have now become classic and increasingly popular. As a continuous approach,
DC programming and DCA were successfully applied to combinatorial optimization
as well as many classes of hard nonconvex programs (see Sect. 3.2). These theoretical
and algorithmic tools have been further enriched, from both theoretical and algorithmic points of view, through their many applications by researchers and practitioners around the world to model and solve nonconvex programs arising in many fields of applied sciences (refer to Sect. 3.4).
The year 2015 marks the 30th birthday of DC programming and DCA. On this
occasion, it is timely to offer here a short survey on thirty years of developments of
this field.
To begin, let us briefly introduce DC programming and DCA.
Let X be the Euclidean space R^n equipped with the canonical inner product ⟨·, ·⟩ and the associated Euclidean norm ‖·‖. The dual space of X, denoted by Y, can be identified with X itself. DC programming and DCA address the problem of minimizing a function f which is a difference of convex functions on the whole space X. Generally speaking, a so-called standard DC program takes the form
α = inf{ f (x) := g(x) − h(x) : x ∈ X },  (Pdc)
where g and h belong to Γo(X), the set of proper lower semi-continuous convex functions on X.
while ∂ϕ stands for the usual (or exact) subdifferential of ϕ at x (i.e. ε = 0 in (1)).
DC programming investigates the structure of DC(X ), DC duality and local and
global optimality conditions for DC programs. The complexity of DC programs clearly
lies in the distinction between local and global solutions and, consequently, the lack
of verifiable global optimality conditions. Necessary local optimality conditions for
the primal DC program (Pdc) were developed as follows (by symmetry, those relating to the dual DC program (Ddc) are trivially deduced).
Critical and strongly critical points A point x∗ is a critical point of (Pdc) (or of f = g − h) if ∂g(x∗) ∩ ∂h(x∗) ≠ ∅, or equivalently 0 ∈ ∂g(x∗) − ∂h(x∗), while it is called a strongly critical point of (Pdc) (or of f = g − h) if ∅ ≠ ∂h(x∗) ⊂ ∂g(x∗).
The notion of DC criticality is close to Clarke stationarity/Fréchet criticality in the sense that the Clarke/Fréchet subdifferentials ∂^C f and ∂^F f of f = g − h verify ∂^C f (x) ⊂ [∂g(x) − ∂h(x)] and ∂^F f (x) ⊂ [∂g(x) − ∂h(x)], with equality under technical assumptions. Hence Clarke stationarity of x∗, i.e., 0 ∈ ∂^C f (x∗), or its Fréchet stationarity, say 0 ∈ ∂^F f (x∗), implies DC criticality of x∗. We have an equivalence between these notions if the related equality holds in the corresponding inclusion.
DC strong criticality and d(directional)-stationarity for DC programs DC critical-
ity and strong criticality depend on DC decompositions g −h of DC objective function
f = g − h. And, since a DC function has infinitely many DC decompositions, the
following question is well-founded: is there a stationarity notion for DC programs that
is defined from the function f itself without going through a DC decomposition of
f ? And, in case of a positive answer, what is its relation with the notions of criticality specific to DC programming and DCA? This question was studied in our
earlier works in 1985 and 1987 (see [232,233] and references quoted therein).
Very recently, the authors of [1] studied d-stationarity and optimality in DC programming and advised researchers to be careful with DC criticality notions. We
fully agree with their consideration, since a thorough mastery of these theoretical
and algorithmic tools would undoubtedly permit one to discover the hidden faces of DC
programming and DCA. Needless to say, the commonly used stationarity remains the
d-stationarity related to the DC objective function f .
For the sake of completeness, let us recall and complete the major results concerning
these different criticalities. They rely on the main results in convex analysis [247]
related to the subdifferential and the directional derivative of a convex function and
the support function of a convex set in X . Let ϕ : X → R∪{+∞} be a proper function
on X and x ∈ dom ϕ. The directional derivative ϕ′(x; ·) of ϕ at x is defined by ϕ′(x; d) := lim_{t↓0} [ϕ(x + td) − ϕ(x)]/t.
If ϕ is convex, it becomes ϕ′(x; d) = inf_{t>0} [ϕ(x + td) − ϕ(x)]/t.
Proof The proof of this theorem can be easily deduced from the specific structure of a DC function and the main results concerning the subdifferential and the directional derivative of convex functions. It is omitted here. One observes that strong criticality
is generally equivalent to directional stationarity. This justifies the well-foundedness
of DC programming and DCA.
DCA thus performs a double linearization with the help of the subgradients of h and g∗, and DCA can also be viewed as an iterative primal-dual subgradient method.
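To make this concrete, here is a minimal generic sketch of the basic (simplified) DCA scheme y^k ∈ ∂h(x^k), x^{k+1} ∈ ∂g∗(y^k); the two callables, the tolerance and the iteration cap are illustrative placeholders supplied by the user, not part of the original description.

```python
import numpy as np

def dca(subgrad_h, solve_convex_subproblem, x0, max_iter=500, tol=1e-8):
    """Minimal sketch of the basic (simplified) DCA for minimizing f = g - h.

    subgrad_h(x)                 -> some y in the subdifferential of h at x
    solve_convex_subproblem(y)   -> a minimizer of the convex problem min_x { g(x) - <y, x> },
                                    i.e. a point of the subdifferential of g* at y
    Both routines are problem-specific and must be supplied by the user.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = subgrad_h(x)                     # linearize the concave part -h at x^k
        x_new = solve_convex_subproblem(y)   # minimize the resulting convex majorant
        if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x)):
            return x_new                     # successive iterates coincide: a critical point is reached
        x = x_new
    return x
```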
Convergence properties of DCA and its theoretical basis are described in [130,223]. Here it is worthwhile to mention the following properties:
(i) DCA is a descent method without line search (the sequence {g(x k ) − h(x k )}
is decreasing) but with global convergence (i.e. it converges from an arbitrary
starting point).
(ii) If g(x k+1 ) − h(x k+1 ) = g(x k ) − h(x k ), then x k is a critical point of g − h. In
this case, DCA terminates at the kth iteration.
(iii) If the optimal value α of problem (Pdc ) is finite and the infinite sequence {x k } is
bounded, then every limit point x ∗ of this sequence is a critical point of g − h.
(iv) DCA has a linear convergence for general DC programs. In DC programming
with subanalytic data, the whole sequence {x k } generated by DCA converges and
DCA’s convergence rate is established.
(v) In polyhedral DC programs, the sequence {x k } contains finitely many elements
and DCA converges to a critical point x ∗ after a finite number of iterations.
Especially, if h is polyhedral convex and h is differentiable at x ∗ , then x ∗ is a
local minimizer of (Pdc ).
(vi) This very simple DCA scheme hides the extrapolating character of DCA. Indeed,
we show that, at the limit, the primal (dual) solution x ∗ (resp. y ∗ ) computed by
DCA is also a global solution of the DC program obtained from (Pdc ) (resp.
(Ddc )) by replacing the function h (resp. g ∗ ) with the supremum supk≥1 h k (resp.
supk≥1 (g ∗ )k ) of all the affine minorizations h k (resp. (g ∗ )k ) of h (resp. g ∗ ) gen-
erated by DCA. These DC programs are closer to (Pdc ) and (Ddc ) than (Pk )
and (Dk ) respectively, because the function supk≥1 h k (resp. supk≥1 (g ∗ )k ) better
approximates the function h (resp. g ∗ ) than h k (resp. (g ∗ )k ). Moreover if supk≥1 h k
(resp. supk≥1 (g ∗ )k ) coincides with h (resp. g ∗ ) at an optimal solution of (Pdc )
(resp. (Ddc )), then x ∗ and y ∗ are also primal and dual optimal solutions respec-
tively. This property, due to the characterization of proper lower semi-continuous
convex functions mentioned above (being supremums of a family of affine func-
tions defined by the range of ∂h), makes the convex approximations used in DCA the closest to the DC component h on X and, needless to say, considerably enhances DCA with respect to other Successive Convex Approximation (SCA) approaches.
These original and distinctive features explain in part the effective convergence
of suitably customized DCA, with a reasonable choice of a starting point, towards
global optimal solutions of DC programs. In practice, DCA quite often converges
to global optimal solutions. The globality of DCA may be assessed either when
the optimal values are known a priori, or through global optimization techniques (in small dimension, of course), the most popular among them remaining Branch-and-Bound (BB) and cutting plane techniques [84,121,122,166,224].
The basic DCA described above, also named the simplified DCA in [223] (or
standard DCA), computes critical points of f = g − h while the complete DCA
provides strongly critical points of f = g − h (see [223] for its description and its
convergence).
Approximate DCA The basic DCA scheme requires the computations of y k ∈
∂h(x k ); x k+1 ∈ ∂g ∗ (y k ). However, these computations need not be exact: one can use an approximate DCA scheme of the form y k ∈ ∂εk h(x k ); x k+1 ∈
∂εk g ∗ (y k ). It has been proved in [295] that the approximate DCA still converges to a
critical point as εk ↓ 0.
DCA with successive DC decomposition In the basic DCA scheme, the DC decom-
position f = g − h is fixed before starting DCA and the majorization f k of f at
iteration k is defined by f_k(x) := g(x) − h_k(x). Obviously, the closer f_k is to f, the faster DCA converges and the better DCA could be. Hence, to increase the speed of convergence of DCA, one can modify the DC decomposition of f iteratively, hoping to get a more suitable f_k. More precisely, at each iteration k one considers the so-called “successive DC decomposition” of f, namely f (x) := g^k(x) − h^k(x), and takes f_k(x) := g^k(x) − (h^k)_k(x), where (h^k)_k is the affine minorization of h^k at x^k. The sequence { f (x^k)} is still decreasing, since we always have f (x^k) = f_k(x^k) ≥ f_k(x^{k+1}) ≥ f (x^{k+1}). This procedure of updating DC decompositions can be used as long as the sequence { f (x^k)} decreases quickly. Once the value f (x^k) improves only slowly, one fixes the DC decomposition and applies the basic DCA until its convergence. The above convergence properties of DCA are then guaranteed.
General DCA Recently, a natural extension of DC programming and DCA for model-
ing and solving general DC programs with DC constraints (the problem (C) above) was
developed in [91]. Two resulting approaches consist in reformulating those programs
as standard DC programs in order to use standard DCAs for their solutions.
DC programming and DCA can be viewed as an elegant extension of convex
analysis/convex programming, sufficiently broad to cover most real-world nonconvex
programs, but not too large in order to be able to use the powerful arsenal of modern
convex analysis/convex programming (DCA works on convex functions g and h and
their conjugates but not on the DC function f itself). This philosophy leads to the nice
To our knowledge, the general principle behind the MM method was first introduced
by the numerical analysts Ortega and Rheinboldt [206] in their studies related to line
search methods in a one-dimensional setting, followed by the work of de Leeuw [168] for
multidimensional scaling problems (MDS). But the acronym MM appeared for the first
time in the work of Hunter and Lange [60]. It is worth noting that in [168], de Leeuw
formulated the metric MDS problem as the maximization of a ratio of two gauges and
then applied the convex maximization algorithm proposed earlier in [215,216]. Later,
de Leeuw’s algorithm was interpreted in [60] as an MM based method. Like DCA,
the MM is not an algorithm but a philosophy, because it gives a way to construct
algorithms. More precisely, for solving the minimization problem
min{ f (x) : x ∈ X ⊂ Rn },
is also known in the literature, more recently, under the name SCA, even if the term “approximation” is used instead of “majorization”.
Without going into details, it is worth mentioning the main differences and advan-
tages of DCA compared to MM method.
(i) Whilst MM proposes a general idea to majorize the objective function f , how to
construct such a majorization for a given function f is still a very hard question. Meanwhile, DCA gives the simplest and closest convex surrogate function of f (see property (vi) of the basic DCA scheme). So, the major benefit of
DCA versus MM is that the DC structure offers an effective and suitable convex
surrogate.
(ii) DCA works on DC components g and h which are convex, but not the function f
itself. While the MM method majorizes iteratively the whole objective function
f , DCA approximates only one part of f : it majorizes the concave part −h and keeps the convex part g of f . Hence it is more likely that the majorization g(x) − h_k(x) in DCA is better than θ(x, x^k) in MM, especially as we have the flexibility of DCA related to the freedom in the choice of g and h. For example, compared with the first-order Taylor approximation of f (Böhning and Lindsay 1988), one usual way to construct the surrogate function in the MM method, DCA gives a better approximation. In addition, working with the convex functions g and h (in the context of DCA), by taking advantage of the powerful tools of convex analysis, is by far easier and more attractive than working with the nonconvex function f (in the context of MM).
(iii) One of the key criteria in judging majorizing functions in the MM (as well as DCA)
is their ease in solving subproblems. For this purpose, the usual surrogate functions
θ of MM based methods are smooth, so these methods turn a nonsmooth problem
into a smooth problem. Whereas, DCA works, in the same way, on nonsmooth and
smooth optimization problems by using subgradients of convex functions. Again,
this is a considerable benefit of DCA in nonsmooth optimization.
(iv) Thanks to its attractive features, namely working on the convex functions g and h (so that fundamental convex analysis tools such as conjugacy, the subdifferential, criticality, d-stationarity, strong convexity, etc., can be used properly) and the flexibility of DC decompositions, DCA enjoys, for several classes of problems, deeper and more interesting convergence properties: for example, the finite convergence for polyhedral DC programs, and the convergence of the series Σ_k ‖x^{k+1} − x^k‖² when g or h is strongly convex (one can always obtain this strong convexity via regularization techniques [130,223]).
Related works using the MM/SCA principle in the literature lead to the following
observations: the numerical methods proposed in these works for a concrete problem ultimately resort, directly or not, to DCA. Even if the DC structure of the problem under consideration is hidden, with the usual choices of surrogate functions the MM/SCA methods reduce to DCA versions. In other words, the nonconvex functions that admit these surrogates are in fact DC, and the proposed MM methods are nothing else but DCA versions. This justifies once again the above classification of realistic nonconvex
problems (they contain the three classes of DC programs). From a numerical point of view, first highlighting the DC structure of the problem and then directly investigating DCA seems to be simpler and more effective/efficient than MM, since one can exploit the effect of the DC decomposition to design other DCA schemes. Hence, an important
question is how to recognize a DC function.
On the other hand, the following questions deserve deeper reflections if one would
like to move beyond DC programming and DCA with the help of MM approaches:
– Exhibit DC programs for which there is an MM version that is not a DCA. In this case, compare this MM method to DCA from both algorithmic and computational points of view.
– Find out classes of nonsmooth nonconvex programs which are not DC but can be
solved by MM.
Finally, let us consider the convex surrogate functions commonly used in the literature, which are linear upper bounds, quadratic upper bounds and proximal upper bounds. We will show that, in these cases, the resulting SCA based algorithms are DCA versions. To simplify the presentation, we consider the following optimization problem (in the literature, the objective function can be the sum of f and a convex function g; one then uses the SCA principle for f and keeps g):
min{ f (x) : x ∈ X }. (3)
(i) Linear upper bounds. When f is differentiable and concave, the usual surrogate is
θ(x, x^k) = f (x^k) + ⟨∇ f (x^k), x − x^k⟩, (4)
and the corresponding SCA iteration is
x^{k+1} ∈ arg min{ θ(x, x^k) : x ∈ X }. (5)
This algorithm is also known in the literature under the name “Successive Linear
Approximation”. It is clear that f is a DC function with the following natural DC
decomposition: f (x) = g(x) − h(x) with g(x) = 0, h(x) = − f (x) and the corre-
sponding DC program is
min{ g(x) − h(x) : x ∈ X }. (6)
Applying DCA to (6) amounts to computing, at each iteration k,
y^k = −∇ f (x^k), x^{k+1} ∈ arg min{ ⟨−y^k, x⟩ : x ∈ X } = arg min{ ⟨∇ f (x^k), x⟩ : x ∈ X },
which is nothing else than the SCA iteration (5) after removing the constant terms.
Note that when f is nonsmooth we can replace ∇ f (x k ) by a subgradient of f at x k .
(ii) Quadratic upper bounds of smooth functions. When f is twice differentiable, the
following upper bound is usual:
θ(x, x^k) = f (x^k) + ⟨∇ f (x^k), x − x^k⟩ + (1/2)(x − x^k)ᵀH (x − x^k), (7)
where H is a symmetric positive semidefinite matrix such that H − ∇² f (x) is positive semidefinite on X. Then f (x) = g(x) − h(x) with g(x) := (1/2)⟨x, H x⟩ and h(x) := (1/2)⟨x, H x⟩ − f (x) is a DC decomposition, since the two components g and h are convex. Applying DCA on (3) with this DC decomposition amounts to computing, at each iteration k,
y^k = H x^k − ∇ f (x^k), x^{k+1} ∈ arg min{ (1/2)⟨x, H x⟩ − ⟨H x^k − ∇ f (x^k), x⟩ : x ∈ X }, (8)
which is exactly the SCA iteration (3), (7) after removing the constant terms.
Note that (7) is also referred to as the proximal gradient approximation when H = ρI,
where I denotes the identity matrix.
(iii) Proximal upper bounds of convex functions. To obtain a strongly convex surrogate of f, the proximal upper bound of f, namely θ(x, x^k) = f (x) + (ρ/2)‖x − x^k‖², is often used; it leads to the well-known proximal point algorithm in convex programming. We will
see below that this algorithm is a version of DCA.
The concave–convex procedure (CCCP) was first proposed in 2003 [329] for con-
structing discrete time dynamical systems that can be guaranteed to decrease almost
any global optimization/energy function. Under the assumptions that the objective
function f is twice differentiable (and then it can be rewritten as the sum of a convex
part f vex (x) and a concave part f cav (x)) and that the feasible set is defined by linear
constraints, each iteration of the CCCP procedure approximates the concave part by
its tangent and minimizes the resulting convex function:
x^{k+1} ∈ arg min{ f_vex(x) + ⟨x, ∇ f_cav(x^k)⟩ : x ∈ C }.
Obviously, CCCP is nothing else than DCA for smooth optimization. It is observed
that many researchers/practitioners are aware of this fact, but still continue to use the name CCCP even when applying it to nonsmooth DC programs!
where P(w|x, v; Θ^k) = exp( Σ_{i=1}^D Θ_i^k m_i(x, v, w) ) / Σ_{w′} exp( Σ_{i=1}^D Θ_i^k m_i(x, v, w′) ).
– M-step: Compute Θ k+1 by maximizing Q(Θ, Θ k ).
The reader can verify that this EM algorithm is exactly DCA applied on the problem
(9) (in its equivalent form min{−L (Θ)}), with the following DC decomposition of
the negative log-likelihood −L (Θ) = g − h:
g(Θ) = Σ_{x∈X} log Σ_{v,w} exp( Σ_{i=1}^D Θ_i m_i(x, v, w) ),
h(Θ) = Σ_{x∈X} log Σ_{w} exp( Σ_{i=1}^D Θ_i m_i(x, v, w) ).
where f and C are convex. First, taking the DC decomposition f (x) = g(x) − h(x) with g = f and h = 0 and applying DCA on the resulting DC program, we obtain the DCA iteration y^k = 0, x^{k+1} ∈ arg min{ f (x) : x ∈ C }. Hence DCA performs exactly one iteration, which consists of solving (11). It turns out
that this DCA corresponds to any standard convex solver used for solving (11).
Now, consider other DC decompositions f (x) = g(x) − h(x) (one says that f is a “false” DC function). DCA applied on the resulting DC programs can differ from standard convex algorithms for solving (11), but it also yields a global solution to (11). Moreover, thanks to the effect of the DC decomposition, it can happen that the corresponding DCA schemes are better than standard convex algorithms.
which is nothing else but the proximal point algorithm for convex program (12).
Now let λ be a positive number such that the function h(x) := (λ/2)‖x‖² − f (x) is convex, and let g(x) := χ_C(x) + (λ/2)‖x‖². Hence g − h is a DC decomposition of χ_C + f, and applying DCA on (12) with this DC decomposition we have
y^k = λx^k − η^k, η^k ∈ ∂ f (x^k); x^{k+1} = P_C( x^k − (1/λ)η^k ), (13)
where PC denotes the orthogonal projection mapping. One recognizes in (13) the
Goldstein-Levitin-Polyak projection algorithm [238].
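As a small sketch of iteration (13) (with user-supplied placeholders subgrad_f and project_C, and λ assumed large enough for h(x) = (λ/2)‖x‖² − f(x) to be convex), one DCA step is exactly a projected (sub)gradient step:

```python
import numpy as np

def dca_projected_subgradient(subgrad_f, project_C, x0, lam, max_iter=1000, tol=1e-8):
    """DCA with g = chi_C + (lam/2)||x||^2 and h = (lam/2)||x||^2 - f, cf. iteration (13).
    subgrad_f(x) returns a subgradient of f at x; project_C(z) is the orthogonal
    projection onto the convex set C. Both are user-supplied."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        eta = subgrad_f(x)                    # eta^k in the subdifferential of f at x^k
        x_new = project_C(x - eta / lam)      # x^{k+1} = P_C(x^k - (1/lam) eta^k)
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```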
ISTA was first introduced in [22] and later developed in [32] and [13]. The most
general case was considered in [13]:
f (x) = g(x) − h(x), where g(x) = g_1(x) + (L/2)‖x‖², h(x) = (L/2)‖x‖² − g_2(x). (16)
Similarly, the ISTA with backtracking step-size (when the Lipschitz constant L is not
always known or computable) is a version of DCA with successive DC decomposition.
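For concreteness, here is a small sketch of plain ISTA for ℓ1-regularized least squares, read through the DC decomposition (16) with g_1 = λ‖·‖_1 and g_2(x) = ½‖Φx − y‖²; the data Φ, y and the parameter λ are illustrative and not taken from the paper.

```python
import numpy as np

def ista_lasso(Phi, y, lam, n_iters=500):
    """ISTA for min_x 0.5*||Phi @ x - y||^2 + lam*||x||_1, i.e. DCA with decomposition (16):
    g = lam*||.||_1 + (L/2)||.||^2 and h = (L/2)||.||^2 - 0.5*||Phi @ . - y||^2."""
    L = np.linalg.norm(Phi, 2) ** 2                          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        z = x - Phi.T @ (Phi @ x - y) / L                    # gradient step = linearization of h at x^k
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding = prox of (lam/L)*||.||_1
    return x

# example usage on random data:
# rng = np.random.default_rng(0); Phi = rng.normal(size=(40, 100)); y = rng.normal(size=40)
# x_hat = ista_lasso(Phi, y, lam=0.1)
```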
We just saw that several standard convex/nonconvex approaches are DCA versions.
Meanwhile, the power and creative freedom offered by DC programming and DCA,
in particular the flexibility in the choice of the DC decomposition, allow one to design other versions of DCA that are more efficient than the standard methods.
2 Milestones
We state below the most significant steps in the development of DC programming and
DCA.
• 1974: before DC programming and DCA, convex maximization The works of
Pham Dinh Tao on the computation of bound-norms of matrices (i.e., maximizing
a semi-norm over the unit ball of a norm) during the period 1974–1985 ([215–219]
and references therein) paved the way for the introduction of DC programming
and DCA.
• 1985: the birth of DC programming and DCA The previous works on convex
maximization were extended in a natural and logical way to DC programming.
DC programming and DCA were introduced in 1985 by Pham Dinh Tao [232] in
a preliminary form. From 1994, these tools were extensively developed through various joint works of Le Thi Hoai An and Pham Dinh Tao, and they have been increasingly popularized since 2002.
• 1994: DCA for solving the trust region subproblem Since its birth, DCA has
been developed in some works, but significant results did not have time to emerge during the period 1986–1992. The first important results appeared in the thesis work of Le Thi Hoai An, 1992–1994 [82]. Among them, the most interesting one concerns DCA for solving the trust region subproblem (i.e., the ball constrained quadratic programming problem), which plays a key role in the so-called trust region method for nonlinear optimization. The proposed DCA is a very simple, fast and scalable algorithm which, in practice, rarely fails to reach a global minimum. It has attracted the attention of the nonlinear optimization community and is known to be one of the best algorithms for this problem. This work was first published in a short version in [222]; its complete version [224] marks a crucial step in the development of DCA and was a prelude to the development and the popularization of DCA. Several DCA based algorithms in both local and
global approaches for linear or convex quadratic constrained nonconvex quadratic
programming problems have been motivated by/based on this work.
• 1996: DCA for optimizing over the efficient set In [145], DCA was investigated,
for the first time, for minimizing a convex or concave or quadratic function f
over the efficient set of a multiple objective program. The original problem was reformulated as a DC program with the help of a penalty function, and exact penalty was proved when f is concave. This paper constitutes the starting point of a more general result on exact penalty in DC programming in 1999 and motivated the
use of DCA for other classes of difficult nonconvex programs. Later developments
of DCA for optimizing over the efficient set were done in [148,150].
• 1997: DCA for nonconvex quadratic programs We developed in [121] sev-
eral DCA versions corresponding to different DC decompositions and studied
local optimality conditions and convergence properties of the resulting DCAs.
The combined DCA-BB (Branch and Bound) algorithm using DC relaxation for a
quadratic function first appeared in [121]. Afterward, specific DCA and the com-
bined DCA-BB for bound constrained quadratic programs have been proposed in
[122]. These works established the foundations to later developments of DCA on
linear/quadratic integer programming. The year 1997 was also marked by the sem-
inal paper “DC programming and DCA: Theory, Algorithm, Applications” which
was devoted to a thorough study and the state-of-the-art of DC programming and
DCA [223].
• 1998: introduction of Ellipsoidal bisection in BB algorithms for nonconvex
quadratic programs Motivated by the efficiency of DCA for ball constrained
quadratic programs, we introduced in [146], for the first time in global optimiza-
tion, the ellipsoidal bisection technique in a BB scheme for nonconvex quadratic
programs. This technique enjoys a double advantage: we do not require that the
polytope K is given explicitly by a system of linear inequalities and/or equali-
ties, and the inexpensive ellipsoidal constrained quadratic programs are the main
subproblems in the BB scheme.
the objective function or constraints) appeared in [189] for feature selection in SVM (Support Vector Machine). Since then, DCA has been widely developed for sparse optimization and its applications to variable selection in classification,
compressed sensing and finance via two approaches: DC approximation (see [144]
and references therein) and DC reformulation via exact penalty [99]. These two
DC approaches cover all existing algorithms in nonconvex approaches for sparse
optimization. The year 2005 was also marked by the state-of-the-art paper [130]
which completes and extends the seminal work on DC programming and DCA in
[223].
• 2013: DCA for general DC programs and recent advances Hitherto, DCA had been investigated for standard DC programs (minimizing a DC function over a convex set). However, numerous real-world problems deal with DC constraints. In
[91], we present a natural extension of DC programming and DCA for modeling
and solving general DC programs with DC constraints. Two resulting approaches
consist in reformulating those programs as standard DC programs in order to
use standard DCAs for their solutions. Some other hot topics, such as the convergence rate of DCA for DC programs with subanalytic data, and exact penalty and error bounds in DC programming, including mixed integer DC programming, were presented in [93,225]. In particular, the convergence study of the whole standard DCA sequence is a difficult problem. To date, and only for standard DC programs with subanalytic data, one has to use a sophisticated nonsmooth version of the Lojasiewicz inequality to establish, for the first time, this convergence result, with a rate depending on the Lojasiewicz exponent of the objective function.
• Today … DC programming and DCA were the subject of several hundred articles
in high-ranked scientific journals and high-level international conferences,
as well as various international research projects, and were the methodological
basis of more than 50 PhD theses. About 100 invited symposia/sessions dedicated
to DC programming and DCA were presented in many international conferences.
The ever-growing number of works using DC programming and DCA proves their
power and their key role in nonconvex programming/global optimization and many
areas of applications.
We will summarize the key results which constitute the state-of-the-art of DC Pro-
gramming and DCA.
DC functions have many important properties, derived from the 1950s in [3] and [52]. However, one had to wait until the mid-1980s, when the class of DC functions was introduced into optimization, for DC programming to appear.
The general theoretical results about DC Programming and DCA were developed
in [223] where we have established the DC duality, local optimality conditions, the
The trust region subproblem (TRSP) is the minimization of a quadratic function over an
Euclidean ball. It has crucial importance in numerical analysis and nonlinear optimiza-
tion, and enjoys distinctive features: (i) It is the basic subproblem for the Trust-Region
Method (consisting of solving a sequence of (TRSP)) which can be considered as an
improved Newton type method and is recognized to be among the most robust, stable
and efficient methods in nonlinear programming (see [31] where a chapter is devoted
to DC programming and DCA for solving (TRSP)). (ii) TRSP is one of the rare nonconvex programs possessing verifiable global optimality conditions (quite close to the KKT conditions); moreover, TRSP has at most one local nonglobal solution. (iii) The set of KKT points of TRSP is contained in at most 2m + 2 disjoint subsets on which the objective function takes the same value, where m is the number of distinct negative eigenvalues of the symmetric matrix defining the quadratic function of TRSP. These properties should
promote inexpensive local descent methods which can perform finitely many restart-
ings to converge to global solutions of TRSP. In [222,224], we investigated DCA to
solve TRSP. Thanks to the particular structure of the trust-region subproblem, the DCA
is very simple: it consists of computing, at each iteration, the projection of a point onto a
Euclidean ball, which is explicit and requires only matrix-vector products. In practice,
DCA converges to the global solution of TRSP. The inexpensive implicitly restarted
Lanczos method of Sorensen is used to check the optimality of solutions provided
by the DCA. When a nonglobal solution is found, a simple numerical procedure is
introduced both to find a feasible point having a smaller objective value and to restart
the DCA at this point. It is shown that in the nonconvex case, the DCA converges
to the global solution of TRSP, using only matrix-vector products and requiring at
most 2m + 2 restarts. Numerical simulations establish the robustness and efficiency of
the DCA compared to standard related methods, especially for large-scale problems.
This paper provides positive answers concerning (TRSP) in an efficient and elegant
way. It has been very much appreciated by the international optimization community and has popularized DC programming and DCA. It is also worth noting that TRSP plays an
important role in BB algorithms using ellipsoidal outer approximation techniques for
lower bounding and this method has been applied successfully in many global algo-
rithms (see [84,122,146]). Later, a careful study on the behavior of DCA sequences
for TRSP was presented in [153,290,291].
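To give the flavor of this scheme, here is a minimal sketch (ours, not the code of [222,224]) of DCA for TRSP, min{ ½⟨x, Ax⟩ + ⟨b, x⟩ : ‖x‖ ≤ r }, with g = (ρ/2)‖·‖² plus the indicator of the ball and h = (ρ/2)‖·‖² − f, where ρ bounds the largest eigenvalue of A; each iteration reduces to one matrix-vector product and one explicit projection onto the ball.

```python
import numpy as np

def dca_trsp(A, b, r, x0=None, rho=None, max_iter=1000, tol=1e-10):
    """Sketch of DCA for the trust region subproblem min {0.5<x,Ax> + <b,x> : ||x|| <= r}."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    if rho is None:
        rho = np.linalg.norm(A, 2) + 1e-8        # any upper bound on the largest eigenvalue of A
    for _ in range(max_iter):
        y = rho * x - (A @ x + b)                # y^k = grad h(x^k): one matrix-vector product
        z = y / rho                              # unconstrained minimizer of (rho/2)||x||^2 - <y, x>
        norm_z = np.linalg.norm(z)
        x_new = z if norm_z <= r else (r / norm_z) * z   # explicit projection onto the ball
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```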
DCA was investigated for the first time in [124] for linearly constrained quadratic programming with binary variables (called BQP for short)
α = min{ f (x) := (1/2)⟨x, Qx⟩ + ⟨q, x⟩ : Ax ≤ b, x ∈ {0, 1}^n }, (BQP)
α = min{ f (x) := (1/2)⟨x, (Q − λ̄I )x⟩ + ⟨q + (λ̄/2)e, x⟩ : x ∈ K , p(x) ≤ 0 },
with λ = λ̄ + t. Its solution set is thus contained in the vertex set of K . DCA developed
previously for NQP can be then applied to (CCQP). With a suitable DC decomposition
for the concave quadratic minimization, DCA enjoys several advantages: (i) DCA only
requires solving one linear program at each iteration; (ii) If x r ∈ {0, 1}n , then x k ∈
{0, 1}n for all k ≥ r ; (iii) DCA has finite convergence. Exploiting these advantages
of DCA we proposed the combined DCA-BB for BQP in which DC relaxation was
used for lower bounding and DCA was applied to (CCQP) to get a good feasible
(integer) solution. The second advantage makes our method efficient at finding an integer ε-solution of (CCQP) in the large-scale setting, as we show hereafter. These DCA and BB-DCA approaches have been used for solving several problems in diverse domains of application, including scheduling [114,161,192], supply chain [119,200], transport [186], network optimization [116,117,250,266,269,270], cryptography [100], and finance [50,106,158], where the number of integer variables goes up to 1,700,000 [266], and the above advantages of DCA have been confirmed by numerous numerical results compared with the CPLEX solver [61]. It turns out in these experiments that (i) DCA always provides an integer solution and converges after a small number of iterations (about 4); (ii) For the problems that CPLEX can (globally) solve, the gap between DCA and CPLEX is
very small: to cite a few, ε = 1.6 × 10⁻² in [269], ε = 5.7 × 10⁻³ in [117], ε = 6.7 × 10⁻⁴ in [158], and ε = 0 in [50,266,270]. In other cases, the gap between DCA and the first lower bound (given by the BB algorithm or simply by solving the first linear relaxation problem) varies between 1.5 and 10% [114,119,200] (note, however, that nothing says the solution given by DCA is not global in these cases); moreover, in most cases, DCA is much faster than CPLEX, the gain ratio going up to 1278 times (for the minimum M-dominating set problem [250] with 5,000 integer variables); (iii)
Thanks to the efficiency of DCA, the combined BB-DCA is faster than the BB and,
more importantly, it can handle problems with larger dimension for which the BB
cannot.
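To illustrate advantages (i) and (ii) above on a toy instance, here is a hedged sketch (ours, not the implementation of the cited works) of DCA for a concave quadratic program over a polytope with 0–1 bounds, using g = χ_K and h = −f; each iteration solves a single linear program, here delegated to scipy.

```python
import numpy as np
from scipy.optimize import linprog

def dca_concave_qp(Qbar, qbar, A_ub, b_ub, x0, max_iter=100, tol=1e-9):
    """DCA sketch for min {0.5<x, Qbar x> + <qbar, x> : A_ub x <= b_ub, 0 <= x <= 1}
    with Qbar negative semidefinite (a concave objective), cf. the (CCQP) reformulation.
    Each iteration minimizes the linearized objective <grad f(x^k), x> over the polytope."""
    x = np.asarray(x0, dtype=float)
    bounds = [(0.0, 1.0)] * len(x)
    for _ in range(max_iter):
        c = Qbar @ x + qbar                                  # gradient of f at x^k
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        x_new = res.x                                        # a vertex of the polytope
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```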
Besides the combined BB-DCA, another global approach, called DCA-CUT (intro-
duced first in [199] which uses the solution of DCA to construct the cutting plane),
was successfully developed to solve several real-world problems (see for example
[185,266]). In particular, for the Single Straddle Carrier Routing Problem in Port
Container Terminals [186], DCA-CUT gives the exact optimal solution in a very short
time (less than 25 seconds) for problems with up to 4,900 integer variables, while the running time of CPLEX is up to 177 times longer.
Later, in [229] the SDP relaxation was investigated in the combined DCA-BB
method. This technique seems to be efficient for finding a good starting point of
DCA. In all cases, DCA gave a very good binary solution; most of them are ε-optimal solutions of BQP (with ε ≤ 1%) or even exact optimal ones, while the number of DCA restarts is surprisingly small, equal to 1 in most cases.
In [151], the authors considered a class of nonlinear bilevel programs in which the
objective function in the first level is a DC function, and the second level is a quadratic
program. DCA and the combined DCA-BB were investigated for solving the relaxed
problem obtained from the original one by replacing the solution set of the quadratic
program in the second level by its KKT point set. The last problem is described by
KT(x) = {(y, λ) : Px + Qy + q + Eᵀλ = 0, Dx + Ey + b ≤ 0, λᵀ(Dx + Ey + b) = 0, λ ≥ 0}
on random data with large dimensions show that DCA provides good approximate optimal solutions (the gap between the objective value of DCA and the lower bound given by the BB scheme is about 8% for randomly generated data), and the combined BB-DCA is efficient. An interesting real-world application in portfolio selection was considered, for which DCA gives in all test problems (real data) an ε-optimal solution with ε < 2%. Note that these approaches can also be applied to the case where the upper objective function is DC, because exact penalty holds in this case.
DCA and the combined DCA-BB were developed in [145,148,150] for a widely
studied class of the so-called Bilevel Multiple Objective Programming (BMOP) in
which the upper objective is a real valued function while the lower level problem is a
multiple objective linear program:
min{ F(x) : x ∈ E(K) }, (BMOP)
where C is an (r × n) matrix, E(K) is the solution set (called the efficient set) of
the problem max{C x : x ∈ K } and K is a bounded polyhedral convex set in Rn .
This class of problems has many applications in multiple objective decision making
[328] and was intensively studied in the literature. Most of the existing algorithms deal
with the case where F is a linear function. Numerical solutions by global approaches
are still difficult when r ≥ 4. So, it is interesting to investigate local approaches and
combine these two approaches for solving efficiently large-scale problems.
Two DC formulations of (BMOP) were proposed, one in the decision space Rn
[145,150], and another in the criteria space Rr (when F is concave, [148]). The
key idea is to introduce suitable penalty functions for representing the efficient (or
more generally, the weakly efficient) sets and then use exact penalty techniques in
DC programming. DCA was then investigated for solving resulting DC programs.
Unlike existing approaches, DCA can solve these problems with a large number of
criteria. For globalizing DCA, several combined DCA-BB algorithms were proposed
in [145,150], where the branching procedure was efficiently performed in the criteria
space via simplicial subdivision while bounding procedures were suitably developed
for each case, when F is a convex/concave/quadratic function [145] and when F is
linear [150]. The numerical results in [145,150] showed that DCA is reliable in finding
a good efficient point and/or a global minimizer to problem (BMOP), although, in
general, the starting point of DCA is not in E(K ). More precisely DCA found an ε-
optimal solution with ε < 1 in 85% of cases (128/150 problems), and a global minimizer in 76% of cases (114/150 problems). In particular, when r ≥ 5, DCA gave the global solutions in 43 out of the 50 cases. Hence it is interesting to use DCA for computing upper bounds in BB methods; as for the BB scheme, it is useful to obtain the initial point and to prove the globality of DCA. In this way, the combined DCA-BB can solve
problems of larger dimension.
In [148], a BB method using simplicial subdivision and two tight affine minorants
for bounding was proposed for the case where F is a linear function. This algorithm
works very well when p ≤ 4. Nevertheless, as in any global method for optimizing
over the efficient and/or weakly efficient sets, it is very difficult to solve the problem
when p ≥ 5. DCA and the combined DCA-BB are really required to handle large-scale
problems.
The simplest and most widely studied complementarity problem is the Linear Comple-
mentarity Problem (LCP), which is one of the fundamental problems of mathematical
programming. It consists of finding a vector x ∈ Rn \ {0} such that
⟨Ax + b, x⟩ = 0, Ax + b ≥ 0, x ≥ 0,
for a given real (n ×n) matrix A and a vector b ∈ Rn . The LCP is known to be NP-hard,
even if the underlying symmetric matrix has only one negative eigenvalue. Although
the LCP is not explicitly an optimization problem, there exist in the literature various
optimization models and methods for solving it. In general, the resulting optimization
problem of LCP is nonconvex, and thus solving such a problem by global approaches
is very difficult in the large-scale setting.
DC programming and DCA were carefully investigated in [83,132] for LCP (a
follow-up to [132] was done recently in [62]). When A is symmetric, solving the LCP
is equivalent to finding a KKT point of the quadratic program
min{ f (x) := (1/2)⟨x, Ax⟩ + ⟨b, x⟩ : x ≥ 0 }
which can be done by a very inexpensive DCA scheme. If A is asymmetric, the LCP
becomes much more difficult than in the symmetric case. We proposed four optimization formulations of the LCP: a linearly constrained quadratic program, a concave minimization problem with separated variables, a simply constrained (nonnegativity of the variables) quadratic program, and a bilinear program, whose optimal value is equal to zero when the LCP has a solution. This property is interesting for using local approaches and iterative algorithms such as DCA: the known optimal value can be used as a stopping rule for iterative algorithms and for checking the globality of the obtained solution. These four optimization problems are all DC programs with quite simple structure, which are quite suitable for applying DCA. Based on an appropriate DC decomposition for each of the four proposed optimization models, we developed simple DCA schemes which consist of solving successive linear programs, or successive convex quadratic programs, or quite simply computing the projection of points onto R²ⁿ₊ (or onto Rⁿ₊ in the symmetric case). Numerical experiments on several test problems given in [41]
and some previous works (see [132]) proved the efficiency and the rapidity as well as
the scalability of the proposed approaches.
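For the symmetric case mentioned above, here is a minimal sketch (with our own choice of DC decomposition g = (ρ/2)‖·‖² plus the indicator of the nonnegative orthant and h = (ρ/2)‖·‖² − f, ρ bounding the largest eigenvalue of A) of the projection-based DCA scheme that seeks a KKT point of the quadratic program:

```python
import numpy as np

def dca_symmetric_lcp(A, b, x0=None, rho=None, max_iter=5000, tol=1e-10):
    """DCA sketch for min {0.5<x, A x> + <b, x> : x >= 0} with A symmetric;
    every iteration projects a gradient-type step onto the nonnegative orthant."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    if rho is None:
        rho = np.linalg.norm(A, 2) + 1e-8                # upper bound on the largest eigenvalue of A
    for _ in range(max_iter):
        x_new = np.maximum(0.0, x - (A @ x + b) / rho)   # project x^k - (1/rho)*grad f(x^k) onto x >= 0
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```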
The Eigenvalue Complementarity Problem (EiCP) consists of finding a real number λ and a vector x ∈ R^n \ {0} such that
w = (λB − A)x, w ≥ 0, x ≥ 0, wᵀx = 0.
The EiCP is an extension of the classical eigenvalue problem that has been widely
studied [64,240]. Solving the EiCP is in general an NP-hard problem since determining
the feasibility of EiCP is already proved to be an NP-complete problem [64].
DC programming and DCA were investigated for both Symmetric Eigenvalue Com-
plementarity Problem (SEiCP) [107] (EiCP when A and B are both symmetric matrices),
and Asymmetric Eigenvalue Complementarity Problem AEiCP [203] (EiCP when A
is an asymmetric matrix and B is a positive definite matrix). Among several equiva-
lent formulations of SEiCP in the literature, the three most interesting formulations for
the use of DCA were considered in [107]: solving SEiCP amounts to finding a KKT
point of these optimization problems. Thus, applying DCA on these problems one
gets solutions to SEiCP (this result is no longer valid when the matrix A is asymmet-
ric). Three DCA schemes were developed for minimizing the generalized Rayleigh
quotient on the standard n − 1 simplex, minimizing a difference of logarithmic func-
tions on the standard n − 1 simplex, and minimizing a quadratic function with one
quadratic constraint and simple constraints. The nice effect of DC decompositions
was well exploited, especially in the two DCA schemes whose main work per iter-
ation relies on the computation of a projection of a point onto the standard n − 1
simplex. Computational experiments showed that DCA is quite efficient and in most
of the cases DCA outperforms Spectral Projected Gradient Algorithm (SPGA) [65],
one among the best existing algorithms for EiCP.
As for AEiCP, similar to the LCP, there are nonlinear programming (NLP) formula-
tions having interesting properties which are suitable to the use of DCA [203]: if their
optimal value is equal to zero, then their optimal solutions correspond to solutions of
the AEiCP. Three NLP formulations were introduced; they were reformulated as DC programs by penalizing the nonlinear constraints (including the complementarity constraints).
Two DCA schemes were proposed for two resulting DC programs, one is minimizing
a sum of a convex quadratic function and three nonconvex polynomial functions over
a polytope, the other is minimizing a sum of a convex quadratic function, a noncon-
vex polynomial function and a nonconvex fractional function over a polytope. Two
initialization strategies for DCA using SDP (Semi Definite Programming) and convex
quadratic programming were investigated. Some numerical simulations for randomly
generated problems and real-world problems showed a good performance of DCA.
DCA almost always yields a global optimal solution with zero optimal value of NLP
formulations (solution of EiCP) within a short computational time and is especially
efficient for relatively large-scale problems. These remarkable results outperformed
the Enumerative method, Branch-and-bound method, and BARON/GAMS presented
in [65].
Another class of complementarity problems is the Quadratic Eigenvalue Comple-
mentarity Problem (QEiCP) which consists of finding a real number λ and a vector
x ∈ Rn \{0} such that
w = λ²Ax + λBx + C x, xᵀw = 0, x ≥ 0, w ≥ 0,
where A, B, C ∈ Rn×n are given matrices. The problem QEiCP and its applications
were first introduced in [257]. At present, only a few algorithms are available in the literature.
DCA was developed in [202] for QEiCP where the authors reformulated QEiCP
as minimizing a nonconvex polynomial function subject to linear constraints. DCA
applied on QEiCP requires solving a convex quadratic program over a polyhedral
convex set at each iteration which can be efficiently solved by many quadratic pro-
gramming solvers such as CPLEX of IBM [61].
Sparse optimization or optimization involving the zero norm has many applications in
various domains, and draws increased attention from many researchers in recent years.
The ℓ0-norm on R^n, denoted ‖·‖0, is defined by ‖x‖0 := |{i = 1, . . . , n : x_i ≠ 0}|, where |S| is the cardinality of the set S. Formally, a sparse optimization problem takes the form
inf{ f (x, y) + λ‖x‖0 : (x, y) ∈ K ⊂ R^n × R^m }, (18)
where the function f corresponds to a given criterion and λ is a positive number that
makes the trade-off between the criterion f and the sparsity of x, or the cardinality-constrained counterpart
inf{ f (x, y) : (x, y) ∈ K ⊂ R^n × R^m, ‖x‖0 ≤ k }. (19)
During the last two decades, research has been very active on models and methods for optimization involving the zero-norm. Works can be divided into three categories according to the
way to treat the zero-norm: convex approximation, nonconvex approximation, and
nonconvex exact reformulation.
Nonconvex approximation approaches were extensively developed to solve (18),
most of them were in the context of machine learning and image analysis (feature
selection in classification, sparse regressions, and compressed sensing, see Sect. 3.4.2
(j) below). A variety of sparsity-inducing penalty functions were proposed to approx-
imate the 0 term (see the related references in [144]). Using these approximations,
several algorithms have been developed for resulting optimization problems. In the
recent seminal paper [144], nonconvex approximation approaches for sparse opti-
mization were studied with a unifying point of view in DC programming framework.
Considering a common DC approximation of the zero-norm including all standard
sparse inducing penalty functions, the authors studied the consistency between global minimizers (resp. local minimizers) of the approximate and original problems and showed
that, in several cases, some global minimizers (resp. local minimizers) of the approxi-
mate problem are also those of the original problem. Based on exact penalty techniques
in DC programming, stronger results for some particular approximations were proved,
namely, the approximate problem, with suitable parameters, is equivalent to the orig-
inal problem. The efficiency of several sparse inducing penalty functions was fully
analyzed. Four DCA schemes were developed that cover all standard algorithms in
nonconvex sparse approximation approaches as special versions; they can be viewed as ℓ1-perturbed/reweighted-ℓ1 algorithms. This paper offered a unifying nonconvex approximation approach, with solid theoretical tools as well as efficient algorithms based on DC programming and DCA, to tackle the zero-norm and sparse optimization. As an application, the proposed methods were implemented for the feature selection in SVM problem, and empirical comparative numerical experiments on the proposed algorithms with various approximation functions were performed.
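As a toy illustration of this machinery (our own sketch, in the spirit of, but not identical to, the schemes of [144]), take the concave approximation ‖x‖0 ≈ Σ_i (1 − exp(−θ|x_i|)) and split each term as λθ|t| minus a convex part; one DCA iteration then solves an ℓ1-perturbed convex subproblem, here delegated to a user-supplied solver:

```python
import numpy as np

def dca_sparse_exp(solve_l1_perturbed, x0, lam=1.0, theta=5.0, max_iter=50, tol=1e-6):
    """DCA sketch for min f(x) + lam * sum_i (1 - exp(-theta*|x_i|)) with f convex (e.g. a loss),
    using the split lam*(1 - exp(-theta*|t|)) = lam*theta*|t| - phi(t),
    phi(t) = lam*(theta*|t| - 1 + exp(-theta*|t|)), which is convex.

    solve_l1_perturbed(y) must return a solution of the convex subproblem
        min_x  f(x) + lam*theta*||x||_1 - <y, x>.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # y^k: componentwise subgradient of sum_i phi(x_i) at x^k
        y = lam * theta * (1.0 - np.exp(-theta * np.abs(x))) * np.sign(x)
        x_new = solve_l1_perturbed(y)
        if np.linalg.norm(x_new - x) <= tol * (1.0 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```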
There are few works in nonconvex exact reformulation approaches which consist of
reformulating equivalently a sparse optimization problem as a continuous nonconvex
program. Besides the linear program with equilibrium constraints (LPEC) formulation
for (18) when f is linear [181], works in these approaches deal with DC program-
ming and DCA. They were developed in [106] (for Portfolio selection problem with
cardinality constraint), [280] (for the general classes (18) and (19)), [281] (for Sparse
Eigenvalue problem of the form max{xᵀAx : xᵀx = 1, ‖x‖0 ≤ k}), and in [99] for
(18) and application to feature selection in SVM. The key idea is using the penalty
techniques related to the ℓ0-norm to reformulate (18) and (19) as DC programs that can be treated by DC programming and DCA. We suppose that K is bounded in the variable x, i.e., K ⊂ ∏_{i=1}^n [a_i, b_i] × R^m, where a_i, b_i ∈ R are such that a_i ≤ 0 < b_i for i = 1, . . . , n. Then (18) and (19) can be reformulated as
respectively.
In recent years, derivative-free global optimization methods based on RBFs (Radial
Basis Functions) have been proposed by several authors. The main idea is that, in each
iteration of direct search, an RBF model with as many points as possible is constructed
and then it is minimized in the intersection of the feasible set with a trust region whose
radius is proportional to the direct-search step size. In this framework, the authors
of [163] addressed the global optimization of functions subject to bound and linear
constraints and investigated two DCA schemes for the resulting subproblems which
consist of the minimization of the RBF models subject to simple bounds on the vari-
ables. The best proposed DCA was compared to the MATLAB fmincon solver and
proved to be competitive (in the sense that DCA obtains similar objective function
values using fewer objective function evaluations). Extensive numerical results were
reported; they confirmed the utility of the RBF in driving the overall direct-search
algorithm for a better objective function value using fewer objective function evalua-
tions.
Besides the above mentioned DCA solvers, during the last decade, the DCA principle
has been used under different guises in several works. For a complete survey, it would be useful to discuss these related approaches, which include various DCA versions.
(b) Parallel SCA for nonsmooth nonconvex optimization [244]. The considered
optimization problem is
min{ f (x) + Σ_{i=1}^n g_i(x_i) : x = (x_1, . . . , x_n) ∈ X_1 × · · · × X_n }, (23)
where Xi and f j satisfy the following conditions: (i) Xi is closed and convex;
(ii) f j (x) is continuously differentiable on X = X1 × . . . × Xn ; (iii) ∇xi f j (x)
is Lipschitz continuous on X ; (iv) V (x) is coercive with respect to X . Jacobi
SCA is a distributed version of SCA. Denote x−i = (x j ) j≠i and Ci ⊂ { j :
f j (., x−i ) is convex on Xi , for all x−i }. At each iteration k, for all i = 1, . . . , n,
the exact Jacobi SCA Algorithm approximates V (x) with respect to xi by a convex
surrogate function θCi (xi , x k ) and computes
x̂_{C_i}(x^k, τ_i) = arg min{ θ_{C_i}(x_i, x^k) : x_i ∈ X_i }, (25)
with C−i and Hi (x k ) being, respectively, the complement of Ci in {1, ..., N } and a
uniformly positive definite matrix, and τi > 0. And the next feasible vector x k+1 is
computed by
where x̂_C(x^k, τ) = ( x̂_{C_i}(x^k, τ_i) )_{i=1}^n . We observe that if Σ_{j∈C_i} f_j(x_i, x_{−i}^k) only depends on x_i, i.e., Σ_{j∈C_i} f_j(x_i, x_{−i}) = f̄_i(x_i), and γ = 1, then Jacobi SCA can be interpreted
where f is continuous and X is a closed convex set. SUM [243] is similar to SCA, but
SUM does not require the convexity of θ (x, x k ). Instead, it must satisfy the following
conditions: (i) θ(y, y) = f (y) ∀y ∈ X; (ii) θ(x, y) ≥ f (x) ∀x, y ∈ X; (iii) θ′(y, y; d) = f ′(y; d) ∀d with y + d ∈ X; (iv) θ(x, y) is continuous in (x, y); where f ′(y; d) is the directional derivative of f at the point y in the direction d, and θ′(y, y; d) is the directional derivative of θ(·, y) at y in the direction d.
Even if SUM does not require the convexity of θ , in practice, commonly used
approximations in SUM based algorithms are convex, more precisely linear/quadratic
or proximal upper bounds. Hence, as mentioned in Sect. 1, the resulting SUM algo-
rithms are DCA versions.
BSUM is a block version of SUM. More precisely, at each iteration k, BSUM chooses
the block i = (k mod n) + 1, and solves the following problem
where θi (xi , x k−1 ) is an approximation of f (x) which satisfies the following assump-
tions (i) θi (yi , y) = f (y) ∀y ∈ X ; (ii) θi (xi , y) ≥ f (y1 , . . . , yi−1 , xi , yi+1 , . . . , yn )
∀x_i ∈ X_i, y ∈ X; (iii) θ_i′(y_i, y; d_i) = f ′(y; d) ∀d = (0, . . . , d_i, . . . , 0) with y_i + d_i ∈ X_i; (iv) θ_i(x_i, y) is continuous in (x_i, y) for all i.
In [243], the authors provided convergence properties of BSUM by assuming that
the subproblem (32) has a unique solution for all x k−1 . Observe that BSUM is a block
version of SUM and with the above usual majorization BSUM is a block version
of DCA. Note that no SUM/BSUM scheme has been implemented in the case where the subproblems are nonconvex. The reason seems natural, as globally solving the sequence of nonconvex subproblems is still difficult.
The problem (33) considered here is min{ f (x) := g(x) − h(x) }, where g and h are proper, convex, and lower semi-continuous. PPA iteratively computes
w k ∈ ∂h(x k ) and
x^{k+1} = arg min_x { g(x) + (1/(2c_k)) ‖x − x^k − c_k w^k‖² }. (34)
It is easy to verify that PPA is nothing else but DCA applied on (33) with the following
successive DC decomposition
f (x) = g_k(x) − h_k(x), g_k(x) = g(x) + (1/(2c_k))‖x‖², h_k(x) = h(x) + (1/(2c_k))‖x‖². (35)
Note that in the convex case (h = 0), PPA reduces to classical PPA proposed in
[182,248]. In [66], the convexity of g is replaced by the convexity of g(x) + (1/(2c_k))‖x‖². Hence the PPA of [66] is actually a version of DCA with successive DC decomposition. The work [10] did not require convexity of g but assumed that x^{k+1} can be computed. PPA
in [10] can be regarded as a DCA-type algorithm (say, approximating h by an affine
function and then solving the resulting problem).
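Under this reading of (33)–(35), here is a minimal sketch of PPA as DCA with successive DC decomposition; prox_g and subgrad_h are user-supplied placeholders for the proximal mapping of g and a subgradient oracle of h.

```python
import numpy as np

def ppa_dc(prox_g, subgrad_h, x0, c=1.0, max_iter=500, tol=1e-8):
    """PPA for min g(x) - h(x) with g, h convex, i.e. DCA with the successive decomposition (35):
    g_k = g + (1/(2c))||.||^2, h_k = h + (1/(2c))||.||^2.

    prox_g(z, c)  -> argmin_x { g(x) + (1/(2c)) * ||x - z||^2 }
    subgrad_h(x)  -> some w in the subdifferential of h at x
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        w = subgrad_h(x)               # w^k: linearization of h at x^k
        x_new = prox_g(x + c * w, c)   # iteration (34) written with the proximal mapping of g
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```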
In [11], the authors proposed the FBSA (Forward-Backward Splitting Algorithm) for
solving the following problem:
Suppose that we can determine a number t^k such that g_k(x) = g_1(x) + (t^k/2)‖x‖² and h_k(x) = (t^k/2)‖x‖² − g_2(x) are convex functions. Then f is a DC function and
(37) is nothing else but DCA applied on (36) with the successive DC decomposition
f = gk − hk . For any real number t k , FBSA is a DCA-type algorithm with successive
DC decomposition.
Note that if g1 is a DC function, FBSA reduces to the GIST (General Iterative
Shrinkage and Thresholding) algorithm proposed in [47]. [30] considered a special
case where both g1 and g2 are convex. On the other hand, Fixed Point Iteration Algo-
rithm [51] is a forward-backward splitting algorithm with constant step-size for solving
the problem min_x { ‖x‖_1 + f (x) }, where f is differentiable and convex. Hence, the Fixed
Point Iteration Algorithm is actually a DCA scheme.
In [241], the authors proposed a so-called DC-PN algorithm for the following problem:
and find a step size t^k via line search and update x^{k+1} = x^k + t^k Δx^k, where Δx^k = x̄^k − x^k.
If one takes t^k = 1, then x^{k+1} = x̄^k and it is not difficult to verify that DC-PN is
actually a DCA with the successive DC decomposition F(x) = gk (x) − hk (x), where
gk (x) and hk (x) are convex functions given by
g_k(x) = h_1(x) + (1/2)xᵀH^k x, h_k(x) = (1/2)xᵀH^k x − f_1(x) + f_2(x) + h_2(x).
Note that if f 2 = h 2 = 0, DC-PN reduces to proximal Newton-type methods in
convex optimization [167].
SGD (Stochastic Gradient Descent) addresses the expected risk minimization problem (40), $\min_{x \in \mathbb{R}^d} \mathbb{E}_{\xi}[f(\xi, x)]$, where $f(\xi, x)$ is a differentiable, convex function. The expected risk function cannot be minimized directly because the underlying distribution of $\xi$ is unknown. However, it is possible to compute an approximation by using $n$ realizations $\xi_1, \ldots, \xi_n$ of $\xi$ with $n$ being large enough. The approximation of problem (40) is
$$\min_{x \in \mathbb{R}^d}\ \tilde{f}(x) = \frac{1}{n} \sum_{i=1}^{n} f(\xi_i, x). \quad (41)$$
At each iteration $k$, SGD randomly chooses a realization $\xi_{i_k}$ and computes $x^{k+1}$ by the following rule:
$$x^{k+1} = x^k - \alpha_k \nabla f(\xi_{i_k}, x^k), \quad (42)$$
where $\alpha_k$ is the learning rate. SGD can be interpreted as a stochastic version of DCA developed in [235]. Indeed, the problem (41) can be expressed as a DC program with
$$g(x) = \frac{\rho}{2}\|x\|^2, \qquad h(x) = \frac{\rho}{2}\|x\|^2 - \frac{1}{n}\sum_{i=1}^{n} f(\xi_i, x) := \frac{1}{n}\sum_{i=1}^{n} h_i(x),$$
with $\rho > 0$ such that $\frac{\rho}{2}\|x\|^2 - f(\xi_i, x)$ is convex. At each iteration $k$, stochastic DCA consists of randomly choosing $\xi_{i_k}$ and computing a stochastic approximation $y^{i_k}$ of $\nabla h(x^k)$ by $y^{i_k} = \nabla h_{i_k}(x^k) = \rho x^k - \nabla f(\xi_{i_k}, x^k)$, and then computing $x^{k+1}$ by
$$x^{k+1} \in \arg\min_x \Big\{ g(x) - \langle y^{i_k}, x\rangle \Big\} \;\Longleftrightarrow\; x^{k+1} = x^k - \frac{1}{\rho}\,\nabla f(\xi_{i_k}, x^k).$$
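To make this scheme concrete, here is a minimal Python sketch of the stochastic DCA iteration above (an illustration under our own naming, not the implementation of [235]; $\rho$ must be taken large enough for the convexity condition on $\frac{\rho}{2}\|x\|^2 - f(\xi_i, x)$ to hold):

```python
import numpy as np

def stochastic_dca(grad_f, xis, x0, rho, n_iters=1000, seed=0):
    # Stochastic DCA for min_x (1/n) sum_i f(xi_i, x) with the DC decomposition
    # g(x) = rho/2 ||x||^2, h(x) = rho/2 ||x||^2 - (1/n) sum_i f(xi_i, x).
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        xi = xis[rng.integers(len(xis))]   # draw one realization xi_{i_k}
        y = rho * x - grad_f(xi, x)        # stochastic estimate of a gradient of h at x^k
        x = y / rho                        # x^{k+1} = argmin_x { rho/2 ||x||^2 - <y, x> }
        # equivalently x^{k+1} = x^k - (1/rho) grad f(xi_{i_k}, x^k): SGD with rate 1/rho
    return x

# Toy usage with least-squares terms f((a, b), x) = 0.5 * (a @ x - b)**2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 3)), rng.normal(size=50)
data = list(zip(A, b))
grad = lambda xi, x: (xi[0] @ x - xi[1]) * xi[0]
x_hat = stochastic_dca(grad, data, np.zeros(3), rho=10.0, n_iters=2000)
```

Each pass through the loop performs exactly the two DCA steps written above, so the sketch also makes visible why SGD with learning rate $1/\rho$ is recovered.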
DC programming and DCA were extensively developed for several challenging classes
of problems in computational biology—Multiple Sequence Alignment, Molecular
conformation, Protein fold recognition, and Phylogenetic tree reconstruction.
(a) Multiple sequence alignment (MSA)
It is well-known that the MSA problem can be transformed into the NP-complete
Maximum Weight Trace problem (MWT in short), as a natural formulation of merging
partial alignments to form multiple alignments. Le Thi et al. [139] proposed a constraint
generation method based on DCA for the MSA using the 0–1 linear formulation of the
MWT. This method solves, at each iteration, one 0–1 linear program for which DCA
is used. Numerical simulation experiments showed the superiority of this approach
compared to some standard methods. In [6], two new formulations were proposed
for the MSA: a new 0–1 quadratic programming formulation whose numbers of variables and constraints are much smaller than those of the MWT, and a compact
linear 0–1 formulation of the MWT. The authors proposed to use DCA for these new
formulations.
(b) Molecular conformation
In recent years there has been very active research on molecular conformation, one of the major subjects of computational biology. When the molecular conformation is determined by distances between pairs of atoms in the molecule, it is tackled by solving the distance geometry problem, which has many applications in various domains. DC programming and DCA were extensively studied
for solving both the exact distance geometry problem (finding $n$ points $x^1, \ldots, x^n \in \mathbb{R}^p$ such that $\|x^i - x^j\| = \delta_{ij}$, $(i,j) \in S$, where $S$ is a subset of the point pairs and $\delta_{ij}$ with $(i,j) \in S$ is the given distance between $x^i$ and $x^j$) in [128] and the general distance geometry problem (finding $x^1, \ldots, x^n \in \mathbb{R}^p$ such that $l_{ij} \le \|x^i - x^j\| \le u_{ij}$, $(i,j) \in S$, where $l_{ij}$ and $u_{ij}$ are lower and upper bounds of the distance between $x^i$ and $x^j$, respectively) in [85,123,129,135], with tests on real datasets of molecular conformation. It has been shown in these works that DCA can be well exploited to
obtain efficient algorithms for both exact and general large-scale distance geometry
problems. Key issues of DCA were carefully studied and several techniques were
proposed to exploit the nice effect of DC decompositions (the Lagrangian duality
without gap relative to DC programming, the regularization techniques) and of starting
points (using the shortest paths between all pairs of atoms to generate the complete
dissimilarity matrix and the spanning trees procedure and the smoothing technique).
Many numerical simulations of the molecular optimization problems with up to 12,567
variables (4189 atoms) reported in [85,123,128] proved the practical usefulness of
the nonstandard nonsmooth reformulations, the globality of the computed solutions, and the robustness and efficiency of DCAs. Note that, so far, existing optimization methods consider only problems with medium-sized molecules (777 atoms).
Molecular conformation can also be determined by minimizing an energy function whose global minimizer corresponds to the molecular structure. In this context, the Lennard-Jones and Morse potential energies are the most widely used. DCAs were carefully studied for minimizing the Lennard-Jones potential energy [133] and the Morse potential energy [134]; both are very difficult and challenging global optimization problems.
(c) Phylogenetic tree reconstruction
In [35] the authors proved that the phylogenetic tree reconstruction, a fundamental
task in evolutionary biology, is a DC program and proposed a cutting plane method
for computing the branch lengths x of the tree. This algorithm works on small datasets
(5 sequences). In [87], a DCA scheme was developed which works well on large datasets.
(d) Protein fold recognition
Enhanced protein fold recognition through a novel data integration approach was
proposed in [325].
For a complete review on DC programming and DCA for computational biology
(resp. distance geometry problems) the reader is referred to [88] (resp. [135]).
One can gain further insight into the applications of DCA in biology through machine learning techniques. In this framework, DCA was investigated for gene selection in cancer classification [120] and for network-based penalized regression with application to genomic data [68], to name a few.
Machine Learning and Data Mining (MLDM in short) represent a mine of optimization
problems that are almost all DC programs for which appropriate solution methods
should use DC programming and DCA. During the last decade DC programming
and DCA have been successfully applied to modeling and solving many problems
in MLDM. As mentioned above, the three methods Expectation-Maximization (EM) [33], Successive Linear Approximation (SLA) [20] and Convex–Concave Procedure (CCCP) [329], which were for a certain period better known to data miners unaware of the state of the art in optimization, are particular cases of DCA. Moreover, these three methods were stated without proof of convergence and relate only to differentiable functions. Since then, this paternity has been acknowledged by leading experts in the field in their publications. We give below a brief overview of the problems in MLDM which were
solved by DCA.
(a) Clustering
Clustering is a fundamental problem in unsupervised learning and has many applications in various domains. DCA has been extensively investigated in numerous works for clustering, in particular for the minimum sum-of-squares clustering (MSSC) problem.
A very simple and inexpensive DCA scheme where all computations are explicit was
proposed in [89]. This work is the basis of several DCAs developed for clustering data
streams [273–275], clustering massive data sets [273,276]. In [98], a Gaussian kernel
version of the MSSC problem was considered for which an efficient DCA scheme was
investigated. This algorithm outperforms the DCA for the MSSC in several numer-
ical test problems. An alternative mixed integer formulation of the MSSC problem
was considered in [98], where it is reformulated as a DC program via new results
on exact penalty technique in DC programming [143]. Fuzzy clustering via the Fuzzy
c-Means (FCM) model was studied in [75,95,97] with three DC formulations and cor-
responding DCA schemes. Different variants of weighted clustering were considered
in [80,103,195,273] for which DCAs were efficiently investigated. DC Programming and DCA have also succeeded in multilevel hierarchical clustering and its applications in multicast [75,142], and in block clustering [78].
A common advantage of DCAs for the above mentioned clustering problems (except
for the mixed 0–1 formulation of the MSSC) is that, with suitable DC decomposi-
tions, they are all simple and inexpensive: they require only matrix-vector products.
The numerical results demonstrated that the proposed algorithms are more efficient
and more robust than related existing algorithms. DC Programming and DCA have
also been successful in [207] for clustering with an unknown number of clusters via sparse optimization formulations (for which DCA was applied to the approximation
problems), and for maximum margin clustering in [301,330] where the problem was
formulated as a general DC program (including DC constraints).
(b) Modularity maximization and communities detection
The modularity maximization, one of the most used methods for detecting communi-
ties in a network, can be formulated as a 0–1 indefinite quadratic program. Interestingly,
in [112] the authors showed that this hard discrete optimization problem is equivalent
to the minimization of a concave quadratic function over a product of simplices. A
fast and scalable DCA scheme was developed for the latter problem; it requires only matrix-vector products at each iteration. Numerical experiments in [112] indicated that
DCA can handle large size networks having up to 4,194,304 nodes and 30,359,198
edges and outperforms reference algorithms.
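To illustrate the type of computation involved, here is a generic sketch (our own, not the specific scheme of [112]) of DCA for minimizing a concave quadratic $-\frac{1}{2}x^{T}Ax$ (with $A \succeq 0$) over a product of simplices: the only work per iteration is the matrix-vector product $A x^k$, followed by an argmax per block.

```python
import numpy as np

def dca_concave_quadratic_over_simplices(A, blocks, x0, n_iters=100):
    # DCA for min_{x in C} -(1/2) x^T A x, C a product of simplices, written as
    # f = chi_C - h with h(x) = (1/2) x^T A x convex (A assumed positive semidefinite).
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        y = A @ x                                # y^k = grad h(x^k): one matrix-vector product
        x_new = np.zeros_like(x)
        for idx in blocks:                       # argmax_{x in C} <y^k, x> splits per simplex:
            x_new[idx[np.argmax(y[idx])]] = 1.0  # put all the mass on the best coordinate
        if np.allclose(x_new, x):
            break                                # fixed point of DCA reached
        x = x_new
    return x

# Toy usage: blocks = [np.arange(0, 3), np.arange(3, 6)] defines two simplices;
# x0 can be taken uniform on each simplex.
```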
(c) Self-organizing maps (SOM)
In [110] DC programming and DCA were developed for the popular Batch SOM
model which is a nonsmooth, nonconvex program. With an elegant matrix formulation
and a natural DC decomposition, the resulting DCA requires only sums and scalar
multiplications of vectors, which are simple and very inexpensive. A training version of DCA with an efficient cooling schedule was investigated. This approach is fast and
scalable, is promising for the large-scale setting, and outperforms the standard Batch SOM method.
(d) Nonnegative matrix factorization (NMF) and dictionary learning
The NMF problem (which consists of approximating a given nonnegative matrix $A \in \mathbb{R}^{m \times n}$ by the product of two low-rank nonnegative matrices $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$)
has many applications such as text mining, image processing, spectral data analysis,
etc. In [165], the NMF problem and several NMF variants were considered. Two DCAs
were developed. The first combined the alternating framework and DCA, while the second applied DCA directly to the whole NMF problem. The efficiency of the proposed
approaches was empirically demonstrated on both real-world and synthetic datasets.
It turns out that DCAs compete favorably with the five state-of-the-art algorithms.
In [213,293] the authors highlighted two other DC reformulations of the NMF prob-
lem and proposed to use DCA for solving them. A more difficult NMF variant which
requires U to be a 0–1 matrix and V to have stochastic columns was considered in [260],
where the authors used a penalty technique in DC programming to tackle the 0–1 con-
straint. A problem related to the NMF is dictionary learning. DCA-based methods
were developed in [295,297] with encouraging results in image denoising application.
(e) One-class Support Vector Machine (OC-SVM) and Anomaly detection
One of the most successful approaches for anomaly/novelty/outliers detection is the OC-
SVM which tries to separate the high density regions by a hyperplane. In [261] the
authors extended the OC-SVM approach to thresholded estimates of likelihood ratios
and reformulated it as a DC program by changing variables, and proposed a DCA
scheme which guaranteed the globality of solutions. Another extension of the OC-
SVM including latent dependency was proposed in [46] for which DCA was developed.
The work [34] followed the idea of the support vector data description (SVDD) but employed the $\ell_0$-norm to measure the classification error. The logarithmic approximation of the $\ell_0$-norm was used and DCA was efficiently investigated for the resulting problem. Also
based on the SVDD, the work [298] used the ramp loss for measuring errors and developed a
DCA based algorithm for solving the resulting problem.
(f) Robust support vector machines
Despite its wide success, the classical SVM is sensitive to the presence of outliers
and yields poor generalization performance due to the unboundedness of the hinge
loss function. To overcome these drawbacks, many works proposed to replace an unbounded loss by a bounded DC ramp loss and to apply DCA to the resulting problem.
Various ramp losses were proposed for robust SVM in [29,42], [259] (ψ-learning loss),
[314] (the truncated hinge loss), for robust multiclass SVM in [173,174,314], and for
some variants of robust SVM including the ramp loss linear programming SVM [59],
the ramp loss least squares SVM [171], the ramp loss nonparallel SVM [172], the
robust support vector regression [305,306,336] and the outcome weighted learning
[25]. DCAs were successfully investigated for the resulting problems in these works.
(g) Spherical separation
Another group of binary classification methods is based on a spherical separator, where one separates two sets of points by means of a hypersphere. Several DCA-based algorithms have been proposed for this problem in [8,9] and [101]. It turns out that the DCA
specified while [258,321,340] deal with DCA based methods for simultaneous feature
grouping and selection.
Sparse regression and compressed sensing DCA based methods were developed for sparse logistic regression [318], sparse linear regression [307] and sparse quantile regression [262,315], and for compressed sensing, which refers to techniques for efficiently acquiring and reconstructing signals via the resolution of underdetermined linear systems. Compressed sensing concerns sparse signal representation, sparse signal recovery and sparse dictionary learning, which can be formulated as sparse optimization problems. DC approximation approaches and DCA were widely used for solving these problems in the context of compressed sensing in [43,118,196] (using usual sparsity-inducing functions) and in [36,176,177,323] via new regularizations using the ratio or the difference of the $\ell_1$ and $\ell_2$ norms.
It has been pointed out in [144] that popular algorithms for sparse regression and
compressed sensing, including Focal Underdetermined System Solver (FOCUSS)
[48], Local Quadratic Approximation (LQA) [37], Adaptive Lasso [342], reweighted $\ell_1$ [21], Iteratively Reweighted Least Squares (IRLS) [23], Local Linear Approxima-
tion (LLA) [343], are particular cases of DCA.
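To give one concrete instance of this connection (our own illustration with the logarithmic approximation of the $\ell_0$-norm; see [144] for the general treatment), consider
$$\min_x\ F(x) := f(x) + \lambda \sum_{i=1}^{n} \log(\varepsilon + |x_i|), \qquad \varepsilon > 0.$$
Since $t \mapsto \log(\varepsilon + t)$ is concave on $[0, \infty)$, linearizing it at $t = |x_i^k|$ and minimizing the resulting convex majorant of $F$ gives
$$x^{k+1} \in \arg\min_x\ f(x) + \lambda \sum_{i=1}^{n} \frac{|x_i|}{\varepsilon + |x_i^k|},$$
which is exactly one iteration of the reweighted $\ell_1$ scheme of [21] and corresponds to a DCA-type step for a suitable DC decomposition of $F$.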
Sparse eigenvalue problem Generally speaking, this problem aims to find an eigenvector $x$ of an $(n \times n)$ symmetric matrix $A$ such that $\|x\|_0 \le k$, where $k \in \mathbb{N}$, $1 \le k < n$. DC approximation approaches and DCA were investigated for the case where $A \succeq 0$ (resp. $A$ is indefinite) in [263] (resp. in [283]), while DCA based on the exact penalty technique was developed in [281].
Low-rank matrix estimation Low-rank matrix estimation is the central part of some important problems such as matrix completion [44] and robust principal component analysis [264]. Since the rank of a matrix $X$ is equal to the $\ell_0$-norm of its vector of singular values $x_\sigma$, the low-rank matrix estimation problem can be addressed similarly to minimization problems involving the $\ell_0$-norm. DCA based methods were developed in [264] via the capped-$\ell_1$ approximation and in [44] via the $\ell_p$-norm ($0 < p < 1$) and logarithm approximations.
Sparse covariance matrix estimation In [16], using the lasso penalty on the entries
of the covariance matrix, the authors proposed a DCA based method for estimating
a sparse covariance matrix on the basis of sample vectors drawn from a multivariate
normal distribution. A more efficient DCA based algorithm with a more suitable
DC decomposition was developed in [236] for the same problem. The problem of
estimating sparse precision matrices from data with missing values was studied in [278]
where the authors used the lasso penalty and developed DCA for the corresponding
maximum likelihood problem. Experiments show that the DCA is superior to the
Expectation Maximization algorithm in terms of convergence speed.
(k) Learning under uncertain data
Data uncertainty is common in real-world applications due to various reasons and
must be handled carefully. Recently, the idea of using robust optimization [14], a
novel and useful approach to deal with uncertain data, has attracted more interest
from researchers. Robust optimization models result in solving a min-max optimiza-
tion problem which is often nonconvex and therefore difficult to solve. Based on robust optimization models, DC programming and DCA were recently developed for feature selection for linear SVMs under uncertain data in [164] and for robust clustering of uncertain data in [295,296]. These works showed that DC programming and DCA
are very powerful tools to address learning problems under uncertain data.
Optimization plays a key role in communication systems (CS) since most of the issues
of this domain are related to optimization problems. Over the last decade, nonconvex optimization has become an indispensable and powerful tool for the analysis and design of CS. As an innovative approach to nonconvex programming, DC programming and
DCA are increasingly used by researchers/practitioners in this field. A careful review
on DC programming and DCA in CS can be found in [137]. Here, due to the page
limit, we only provide an (incomplete) list of problems in CS and network optimiza-
tion solved by DCA. In terms of optimization, they can be partitioned into four classes
of DC programs.
3.4.5 Security
(a) Cryptography
For building high-quality S-boxes which are a key component of modern crypto-
systems, the authors of [77] studied the construction of highly nonlinear balanced
Boolean functions and proposed a deterministic optimization model which is the min-
(b) General DC
Globally solving the Value-at-Risk [228]; Value-at-Risk constrained optimization
[190,311].
(c) Linearly/quadratically constrained quadratic programming with mixed 0–1 vari-
ables
Portfolio selection under downside risk measures and cardinality constraints [106];
robust investment strategies with discrete asset choice constraints [50]; continuous
min-max problem for single period portfolio selection with discrete constraints [158].
(d) DC programs with mixed 0–1 variables/integer variables
Long-short portfolio optimization under cardinality constraints [104]; minimizing the
transaction costs of portfolio selection [231]; discrete portfolio optimization under
concave transaction costs [227].
(e) Bilevel programming problems: portfolio selection [152].
3.4.7 Transportation
Several large-scale problems in the area of transportation were modeled as 0–1 (or
mixed 0–1) linear programming problems and successfully solved by DCA and by combined DCA-global methods (BB, and DCA-CUT, which is a new cut based on a solution
given by DCA). They include the multimodal transport problem [108]; the car pooling
problem [266,267]; the scheduling of lifting vehicle in an automated port container
terminal [81]; the single straddle carrier routing problem in port container terminals
[186,188]; the two-dimensional packing problem [187] and the orienteering problem
[268].
The sensor network localization with uncertainties in anchor positions and the opti-
mization of traffic signals in networks considering rerouting were formulated as a
standard DC program and solved by DCA in [313] and [285], respectively. Two much more difficult problems, namely the nonlinear UAV task assignment problem under uncertainty and the planning of a multisensor multizone search for a target, were studied in [113] and [109], respectively. They were formulated as 0–1 or mixed 0–1 DC programs
and efficiently solved by DCA via exact penalty techniques in DC programming.
The earliness tardiness scheduling problems [114,115] and the minimization of pre-
ventive maintenance cost with unequal release dates and tardiness penalties, under
real-time and resource constraints [161,284] were formulated as mixed 0–1 linear
programs and favorably solved by DCA. A more difficult problem, namely optimiz-
ing a multi-stage production/inventory system, was studied in [160]. It was formulated
as a mixed integer linear program for which DCA and the combined DCA-BB were
developed.
4 Recent advances
Continuing the work in [141,143], we have recently been interested in exact penalty techniques in DC programming which leverage the DC structure of nonconvex constraints. In [92], we consider error bounds for inequality systems and exact penalization for constrained optimization problems. We first investigate the relationships between error bounds and exact penalization. Second, we establish new error bounds for inequality systems of concave functions and of nonconvex quadratic functions over polyhedral convex sets. These results serve as theoretical tools for penalization methods in nonconvex programming, especially in DC programming. For the general DC programs discussed below, these techniques allow us to reformulate them equivalently as standard DC programs; for instance, minimizing a DC function over a polyhedral convex set with additional nonconvex quadratic constraints, which is a hard and important problem with many applications. A deeper research direction in progress (see [225]) is to consider error bounds for systems of finitely many DC inequalities.
In [225] we considered the class of (NP-hard) mixed integer DC programs (44) and proposed five DC penalty functions with explicit DC decompositions.
The general convergence result of DCA mentioned above says that every convergent subsequence of the sequence $\{x^k\}$ (resp. $\{y^k\}$) converges to a generalized KKT point of $(P_{dc})$ (resp. $(D_{dc})$). From a theoretical point of view, convergence rate analysis of DCA is a key open issue. Some elegant results on the convergence of the whole sequences $\{x^k\}$ and $\{y^k\}$, as well as their convergence rates when the objective functions and the constraints are subanalytic, have been recently presented in [225].
The recent results on exact penalty in mixed integer DC programming can be exploited
to solve many classes of mixed integer DC programming problems by DCA, see e.g.
[227].
Today, the number of works applying DCA for new practical problems in various
areas is growing constantly. To name a few, reinforcement learning [237], Markov
Decision Process [56], extreme learning [320], face recognition based on Haar fea-
tures [308], sparse and low-rank recovery under simplex constraints [169], linear
feature transform and enhancement of classification on deep neural network [324],
penalized regression-based clustering [312], optimisation over the non-dominated
set of a multiobjective optimisation problem [175], continuous equilibrium network
design problem via mathematical programming with complementarity constraints
(MPCC) [197], finding a Nash equilibrium in polymatrix game of three players
(hexamatrix game) [205], absolute fused lasso and its application to Genome-
Wide association studies [322], full-rate general rank beamforming in single-group
multicasting networks using non-orthogonal STBC [277], power allocation in mul-
ticasting relay network [162], secure cooperative communications under secrecy
outage constraint [239], solving Brugnano-Casulli piecewise linear systems [220],
relaxed Mumford-Shah image segmentation [211], etc. Among the numerous works within the DC programming framework that appeared during the last year, we can cite a few, e.g. [1,58,111,138,157,183,194,208,209,236,245].
SIP refers to optimization problems having infinitely many constraints. In recent years
several methods have been developed for convex SIP (i.e., when the objective function
f is convex). Traditionally, they can be classified into three categories: the exchange
methods, discretization methods and methods based on local reduction. All these methods fall within the nonconvex programming framework (they consider a sequence of optimization problems with finitely many constraints), because checking the feasibility of a solution to SIP amounts to globally solving a nonconvex program. When f is nonconvex the corresponding SIP is much more difficult, and solving nonconvex SIP is a challenge for the optimization community. DCA can be investigated for DC SIP problems, that is, SIP with a DC objective function f = g − h. Our methodology can be based on two approaches: (i) if g is a convex function of the variable x, then one can use the idea of traditional methods for SIP in which standard DCA schemes should be iter-
atively applied; (ii) if g is a DC function of the variable x, then one can use the idea of traditional methods for SIP in which general DCA schemes should be iteratively applied.
It would be wrong to think that using DCA for efficiently solving a practical prob-
lem is a simple procedure. Indeed, the general DCA scheme is more a philosophy
than an algorithm. There is not only one DCA but infinitely many DCAs for a
considered problem. Despite the bright successes obtained by researchers and practi-
tioners in the use of DC programming and DCA for modeling and solving nonconvex
and global optimization problems, their works have not exploited the full power
and creative freedom offered by these tools: their proposed DCAs, although more efficient than existing methods, can be further improved to better handle large-scale problems. The design of an efficient DCA for a real-world problem is an art
which should be based on theoretical tools and on the special structure of the problem. Key issues
that should be studied while developing DCA for a DC program are the follow-
ing.
(a) Efficiency (the convergence speed and the quality of computed solutions):
6 Conclusion
We have given a short survey (with an incomplete list of references) on the thirty years of developments of DC Programming and DCA, which have had remarkable scientific impacts on many fields of applied sciences. Due to the page limit, we gave only a very brief outline of the philosophy, the key properties and the history of DCA; for a complete study of DC programming and DCA the reader is referred to the two papers [130,223], and recent advances in this field can be found in [225]. Among the state-of-the-art results, we focused on the presentation of DCA solvers for important classes of difficult nonconvex programs and for real-world applications. The reader can see that DCA solvers are available for a large class of problems (one can say most of the problems) appearing in optimization (continuous or discrete) and operational research. Hence the list of references we offered could be useful in the study and use of solution methods for such problems. On the other hand, the list of DCA solvers available for solving real-world problems, classified area by area, should allow readers to find the material of most interest to them in their specific applications. The list of references shows that many classes of problems in various domains of application can be solved by DCA. In addition, the analysis of related works within the nonconvex programming framework provides the reader with a unified DC programming and DCA point of view on solution methods for realistic convex/nonconvex programs.
It is certain that developments of nonconvex programming and global optimization
via DC Programming and DCA for modeling and solving real-world nonconvex opti-
mization problems (in many branches of applied sciences) will intensify further in the years to come, and for a long time, because Nonconvex Programming and Global Optimization are
endless.
Acknowledgements The authors are grateful to Dr. Vo Xuan Thanh for sending us some references on
DCA solvers for real-world applications, and the two anonymous reviewers as well as Professor Jong-Shi
Pang for their constructive comments that greatly improved the manuscript, in particular one of the reviewers for providing us with some references on related DCA methods in Sect. 3.3.
References
1. Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and
sparsity. SIAM J. Optim. 27(3), 1637–1665 (2017)
2. Akoa, F.B.: Combining DC algorithms (DCAs) and decomposition techniques for the training of
nonpositive-semidefinite kernels. IEEE Trans. Neural Netw. 19(11), 1854–1872 (2008)
3. Alexandroff, A.: On functions representable as a difference of convex functions. Doklady Akad. Nauk SSSR (N.S.) 72, 613–616 (1950). [English translation: Siberian Elektron. Mat. Izv. 9, 360–376 (2012)]
4. Alvarado, A., Scutari, G., Pang, J.S.: A new decomposition method for multiuser dc-programming
and its applications. IEEE Trans. Signal Process. 62(11), 2984–2998 (2014)
5. Argyriou, A., Hauser, R., Micchelli, C.A., Pontil, M.: A DC-programming algorithm for kernel
selection. In: ICML 2006, pp. 41–48. ACM (2006)
6. Arthanari, T.S., Le Thi, H.A.: New formulations of the multiple sequence alignment problem. Optim.
Lett. 5(1), 27–40 (2011)
7. Astorino, A., Fuduli, A.: Semisupervised spherical separation. Appl. Math. Model. 39(20), 6351–
6358 (2015)
8. Astorino, A., Fuduli, A., Gaudioso, M.: DC models for spherical separation. J. Global Optim. 48(4),
657–669 (2010)
9. Astorino, A., Fuduli, A., Gaudioso, M.: Margin maximization in spherical separation. Comput. Optim.
Appl. 53(2), 301–322 (2012)
10. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involv-
ing analytic features. Math. Program. 116(1), 5–16 (2009)
11. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame
problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods.
Math. Program. 137(1), 91–129 (2013)
12. Bačák, M., Borwein, J.M.: On difference convexity of locally lipschitz functions. Optimization 60(8–
9), 961–978 (2011)
13. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems.
SIAM J. Imaging Sci. 2(1), 183–202 (2009)
14. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton Series in Applied Math-
ematics. Princeton University Press, Princeton (2009)
15. Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program.
129(2), 163–195 (2011)
16. Bien, J., Tibshirani, R.J.: Sparse estimation of a covariance matrix. Biometrika 98(4), 807–820 (2011)
17. Bottou, L.: Online learning and stochastic approximations. In: On-line Learning in Neural Networks, pp. 9–42. Cambridge University Press, New York, NY, USA (1998)
18. Bouallagui, S.: Techniques d’optimisation déterministe et stochastique pour la résolution de prob-
lèmes difficiles en cryptologie. Ph.D. thesis, INSA de Rouen (2010)
19. Bouallagui, S., Le Thi, H.A., Pham Dinh, T.: Design of highly nonlinear balanced boolean functions
using an hybridation of DCA and simulated annealing algorithm. In: Modelling, Computation and
Optimization in Information Systems and Management Sciences, Communications in Computer and
Information Science, vol. 14, pp. 579–588. Springer, Berlin, Heidelberg (2008)
20. Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector
machines. ICML 1998, 82–90 (1998)
21. Candes, E.J., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted $\ell_1$ minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
22. Chambolle, A., Vore, R.A.D., Lee, N.Y., Lucier, B.J.: Nonlinear wavelet image processing: variational
problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process.
7(3), 319–335 (1998)
23. Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3869–3872 (2008)
24. Che, E., Tuan, H.D., Nguyen, H.H.: Joint optimization of cooperative beamforming and relay assign-
ment in multi-user wireless relay networks. IEEE Trans. Wirel. Commun. 13(10), 5481–5495 (2014)
25. Chen, G., Zeng, D., Kosorok, M.R.: Personalized dose finding using outcome weighted learning. J.
Am. Stat. Assoc. 111(516), 1509–1521 (2016)
26. Cheng, Y., Pesavento, M.: Joint optimization of source power allocation and distributed relay beam-
forming in multiuser peer-to-peer relay networks. IEEE Trans. Signal Process. 60(6), 2962–2973
(2012)
27. Cheung, P.M., Kwok, J.T.: A regularization framework for multiple-instance learning. In: ICML 2006,
pp. 193–200. ACM, New York, NY, USA (2006)
28. Collobert, R., Sinz, F., Weston, J., Bottou, L.: Large scale transductive SVMs. J. Mach. Learn. Res.
7, 1687–1712 (2006)
29. Collobert, R., Sinz, F., Weston, J., Bottou, L.: Trading convexity for scalability. In: ICML 2006, pp.
201–208 (2006)
30. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale
Model. Simul. 4(4), 1168–1200 (2005)
31. Conn, A., Gould, N., Toint, P.: Trust Region Methods. SIAM, Philadelphia (2000)
32. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse prob-
lems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
33. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM
algorithm. J. R. Stat. Soc. B Methodol. 39(1), 1–38 (1977)
34. El Azami, M., Lartizien, C., Canu, S.: Robust outlier detection with L0-SVDD. In: European Sym-
posium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN
2014, pp. 389–394 (2014)
35. Ellis, S.E., Nayakkankuppam, M.V.: Phylogenetic analysis via DC programming. Preprint (2003)
36. Esser, E., Lou, Y., Xin, J.: A method for finding structured sparse solutions to nonnegative least
squares problems with applications. SIAM J. Imaging Sci. 6(4), 2010–2046 (2013)
37. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J.
Am. Stat. Assoc. 96(456), 1348–1360 (2001)
38. Fastrich, B., Paterlini, S., Winker, P.: Constructing optimal sparse portfolios using regularization
methods. CMS 12(3), 417–434 (2015)
39. Fawzi, A., Davies, M., Frossard, P.: Dictionary learning for fast classification based on soft-
thresholding. Int. J. Comput. Vis. 114(2), 306–321 (2015)
40. Feng, D., Yu, G., Yuan-Wu, Y., Li, G.Y., Feng, G., Li, S.: Mode switching for energy-efficient device-
to-device communications in cellular networks. IEEE Trans. Wirel. Commun. 14(12), 6993–7003
(2015)
41. Floudas, C.A., Pardalos, P.M., Adjiman, C., Esposito, W.R., Gümüs, Z.H., Harding, S.T., Klepeis,
J.L., Meyer, C.A., Schweiger, C.A.: Handbook of test problems in local and global optimization. In:
Nonconvex Optimization and Its Applications, vol. 33. Springer, USA (1999)
42. Gasso, G., Pappaioannou, A., Spivak, M., Bottou, L.: Batch and online learning algorithms for non-
convex Neyman–Pearson classification. ACM Trans. Intell. Syst. Technol. 2(3), 28:1–28:19 (2011)
43. Gasso, G., Rakotomamonjy, A., Canu, S.: Recovering sparse signals with a certain family of noncon-
vex penalties and DC programming. IEEE Trans. Signal Process. 57(12), 4686–4698 (2009)
44. Geng, J., Wang, L., Wang, Y.: A non-convex algorithm framework based on DC programming and
DCA for matrix completion. Numer. Algorithms 68(4), 903–921 (2015)
45. Gholami, M.R., Gezici, S., Strom, E.G.: A concave–convex procedure for TDOA based positioning.
IEEE Commun. Lett. 17(4), 765–768 (2013)
46. Göernitz, N., Braun, M., Kloft, M.: Hidden Markov anomaly detection. In: Proceedings of the 32nd
International Conference on Machine Learning, vol. 37, pp. 1833–1842. JMLR: W&CP (2015)
47. Gong, P., Zhang, C., Lu, Z., Huang, J.Z., Ye, J.: A general iterative shrinkage and thresholding
algorithm for non-convex regularized optimization problems. In: Proceedings of the 30th International
Conference on International Conference on Machine Learning, ICML’13, vol. 28, pp. II-37–II-45
(2013)
48. Gorodnitsky, I.F., Rao, B.D.: Sparse signal reconstructions from limited data using FOCUSS: a re-
weighted minimum norm algorithm. IEEE Trans. Signal Process. 45(3), 600–616 (1997)
49. Guan, G., Gray, A.: Sparse high-dimensional fractional-norm support vector machine via DC pro-
gramming. Comput. Stat. Data Anal. 67, 136–148 (2013)
50. Gülpinar, N., Le Thi, H.A., Moeini, M.: Robust investment strategies with discrete asset choice
constraints using DC programming and DCA. Optimization 59(1), 45–62 (2010)
51. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for $\ell_1$-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
52. Hartman, P.: On functions representable as a difference of convex functions. Pac. J. Math. 9(3),
707–713 (1959)
53. Heinkenschloss, M.: On the solution of a two ball trust region subproblem. Math. Program. 64(1–3),
249–276 (1994)
54. Hiriart-Urruty, J.B.: From Convex Optimization to Nonconvex Optimization. Part I Necessary and
Sufficient Conditions for Global Optimality, pp. 219–239. Springer, Boston (1989)
55. Ho, V.T.: Advanced machine learning techniques based on DC programming and DCA. Ph.D. thesis,
University of Lorraine (2017)
56. Ho, V.T., Le Thi, H.A.: Solving an infinite-horizon discounted Markov decision process by DC
programming and DCA. In: Nguyen, T.B., van Do, T., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced
Computational Methods for Knowledge Engineering: ICCSAMA 2016, Proceedings, Part I, pp. 43–
55. Springer, Berlin (2016)
57. Ho, V.T., Le Thi, H.A., Bui, D.C.: Online DC optimization for online binary linear classification.
In: Nguyen, T.N., Trawiński, B., Fujita, H., Hong, T.P. (eds.) Intelligent Information and Database
Systems: ACIIDS 2016, Proceedings, Part II, pp. 661–670. Springer, Berlin (2016)
58. Hong, M., Razaviyayn, M., Luo, Z.Q., Pang, J.S.: A unified algorithmic framework for block-
structured optimization involving big data: with applications in machine learning and signal
processing. IEEE Signal Process. Mag. 33(1), 57–77 (2016)
59. Huang, X., Shi, L., Suykens, J.: Ramp loss linear programming support vector machine. J. Mach.
Learn. Res. 15(1), 2185–2211 (2014)
60. Hunter, D.R., Lange, K.: Rejoinder to discussion of optimization transfer using surrogate objective
functions. Comput. Graph. Stat. 9, 52–59 (2000)
61. IBM: CPLEX Optimizer. https://www.ibm.com/analytics/data-science/prescriptive-analytics/cplex-optimizer
62. Jara-Moroni, F., Pang, J.S., Waechter, A.: A study of the difference-of-convex approach for solving
linear programs with complementarity constraints. Math. Program. Ser. B (2018, to appear)
63. Jeong, S., Simeone, O., Haimovich, A., Kang, J.: Optimal fronthaul quantization for cloud radio
positioning. IEEE Trans. Veh. Technol. 65(4), 2763–2768 (2016)
64. Júdice, J.J., Sherali, H.D., Ribeiro, I.M.: The eigenvalue complementarity problem. Comput. Optim.
Appl. 37(2), 139–156 (2007)
65. Júdice, J.J., Sherali, H.D., Ribeiro, I.M., Rosa, S.S.: On the asymmetric eigenvalue complementarity
problem. Optim. Methods Softw. 24(4–5), 549–568 (2009)
66. Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Global Optim.
13(4), 389–406 (1998)
67. Khalaf, W., Astorino, A., D’Alessandro, P., Gaudioso, M.: A DC optimization-based clustering tech-
nique for edge detection. Optim. Lett. 11(3), 627–640 (2017)
68. Kim, S., Pan, W., Shen, X.: Network-based penalized regression with application to genomic data.
Biometrics 69(3), 582–93 (2013)
69. Krause, N., Singer, Y.: Leveraging the margin more carefully. In: Proceedings of the twenty-first
international conference on Machine learning ICML 2004, p. 63 (2004)
70. Krummenacher, G., Ong, C.S., Buhmann, J.: Ellipsoidal multiple instance learning. In: Dasgupta, S.,
Mcallester, D. (eds.) ICML 2013, JMLR: W&CP, vol. 28, pp. 73–81 (2013)
71. Kuang, Q., Speidel, J., Droste, H.: Joint base-station association, channel assignment, beamforming
and power control in heterogeneous networks. In: IEEE 75th Vehicular Technology Conference (VTC
Spring), pp. 1–5 (2012)
72. Kwon, S., Ahn, J., Jang, W., Lee, S., Kim, Y.: A doubly sparse approach for group variable selection.
Ann. Inst. Stat. Math. 69(5), 997–1025 (2017)
73. Laporte, L., Flamary, R., Canu, S., Déjean, S., Mothe, J.: Nonconvex regularizations for feature
selection in ranking with sparse SVM. IEEE Trans. Neural Netw. Learn. 25(6), 1118–1130 (2014)
74. Le, A.V., Le Thi, H.A., Nguyen, M.C., Zidna, A.: Network intrusion detection based on multi-class
support vector machine. In: Nguyen, N.T., Hoang, K., Jedrzejowicz, P. (eds.) Computational Collective
Intelligence. Technologies and Applications: ICCCI 2012, Proceedings, Part I, pp. 536–543. Springer,
Berlin (2012)
75. Le, H.M.: Modélisation et optimisation non convexe basées sur la programmation DC et DCA pour
la résolution de certaines classes des problémes en fouille de données et cryptologie. Ph.D. thesis,
Université Paul Verlaine-Metz (2007)
76. Le, H.M., Le Thi, H.A., Nguyen, M.C.: Sparse semi-supervised support vector machines by DC
programming and DCA. Neurocomputing 153, 62–76 (2015)
77. Le, H.M., Le Thi, H.A., Pham Dinh, T., Bouvry, P.: A combined DCA: GA for constructing highly
nonlinear balanced boolean functions in cryptography. J. Global Optim. 47(4), 597–613 (2010)
78. Le, H.M., Le Thi, H.A., Pham Dinh, T., Huynh, V.N.: Block clustering based on Difference of Convex
functions (DC) programming and DC algorithms. Neural Comput. 25(10), 2776–2807 (2013)
79. Le, H.M., Nguyen, T.B.T., Ta, M.T., Le Thi, H.A.: Image segmentation via feature weighted fuzzy
clustering by a DCA based algorithm. In: Advanced Computational Methods for Knowledge Engi-
neering, Studies in Computational Intelligence, vol. 479, pp. 53–63. Springer (2013)
80. Le, H.M., Ta, M.T.: DC programming and DCA for solving minimum sum-of-squares clustering
using weighted dissimilarity measures. In: Transactions on Computational Intelligence XIII, LNCS,
vol. 8342, pp. 113–131. Springer, Berlin, Heidelberg (2014)
81. Le, H.M., Yassine, A., Moussi, R.: DCA for solving the scheduling of lifting vehicle in an automated
port container terminal. Comput. Manag. Sci. 9(2), 273–286 (2012)
82. Le Thi, H.A.: Analyse numérique des algorithmes de l’optimisation DC. Approches locale et globale.
Codes et simulations numériques en grande dimension. Applications. Ph.D. thesis, Université de
Rouen (1994)
83. Le Thi, H.A.: Contribution à l'optimisation non convexe et l'optimisation globale: Théorie, Algorithmes et Applications. Habilitation à Diriger des Recherches, National Institute for Applied
Sciences, Rouen (1997)
84. Le Thi, H.A.: An efficient algorithm for globally minimizing a quadratic function under convex
quadratic constraints. Math. Program. 87(3), 401–426 (2000)
85. Le Thi, H.A.: Solving large scale molecular distance geometry problems by a smoothing technique
via the Gaussian transform and D.C. programming. J. Global Optim. 27(4), 375–397 (2003)
86. Le Thi, H.A.: DCA collaborative for clustering. University of Lorraine, Tech. rep. (2013)
87. Le Thi, H.A.: Phylogenetic tree reconstruction by a DCA based algorithm. Tech. rep., LITA, University
of Lorraine (2013)
88. Le Thi, H.A.: DC programming and DCA for challenging problems in bioinformatics and computa-
tional biology. In: Adamatzky, A. (ed.) Automata, Universality, Computation, Emergence, Complexity
and Computation, vol. 12, pp. 383–414. Springer, Berlin (2015)
89. Le Thi, H.A., Belghiti, M.T., Pham Dinh, T.: A new efficient algorithm based on DC programming
and DCA for clustering. J. Global Optim. 37(4), 593–608 (2007)
90. Le Thi, H.A., Ho, V.T.: Online learning based on Online DCA (2016, Submitted)
91. Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC programming and DCA for general DC programs.
In: van Do, T., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge
Engineering, pp. 15–35. Springer, Berlin (2014)
92. Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: Error bounds via exact penalization with applications to
concave and quadratic systems. J. Optim. Theory Appl. 171(1), 228–250 (2016)
93. Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: Convergence analysis of DCA with subanalytic data. J.
Optim. Theory Appl. (2018)
94. Le Thi, H.A., Huynh, V.N., Pham Dinh, T., Vaz, A.I.F., Vicente, L.N.: Globally convergent DC
trust-region methods. J. Global Optim. 59(2), 209–225 (2014)
95. Le Thi, H.A., Le, H.M., Nguyen, T.P., Pham Dinh, T.: Noisy image segmentation by a robust clustering
algorithm based on DC programming and DCA. In: Proceedings of the 8th Industrial Conference on
Advances in Data Mining, ICDM’08, pp. 72–86. Springer (2008)
96. Le Thi, H.A., Le, H.M., Nguyen, V.V., Pham Dinh, T.: A DC programming approach for feature
selection in support vector machines learning. Adv. Data Anal. Classif. 2(3), 259–278 (2008)
97. Le Thi, H.A., Le, H.M., Pham Dinh, T.: Fuzzy clustering based on nonconvex optimisation approaches
using difference of convex (DC) functions algorithms. Adv. Data Anal. Classif. 1(2), 85–104 (2007)
98. Le Thi, H.A., Le, H.M., Pham Dinh, T.: New and efficient DCA based algorithms for minimum
sum-of-squares clustering. Pattern Recognit. 47(1), 388–401 (2014)
99. Le Thi, H.A., Le, H.M., Pham Dinh, T.: Feature selection in machine learning: an exact penalty
approach using a difference of convex function algorithm. Mach. Learn. 101(1–3), 163–186 (2015)
100. Le Thi, H.A., Le, H.M., Pham Dinh, T., Bouvry, P.: Solving the perceptron problem by deterministic
optimization approach based on DC programming and DCA. In: INDIN 2009, Cardiff. IEEE (2009)
101. Le Thi, H.A., Le, H.M., Pham Dinh, T., Huynh, V.N.: Binary classification via spherical separator by
DC programming and DCA. J. Global Optim. 56(4), 1393–1407 (2013)
102. Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: Stochastic DCA for the large-sum of non-convex
functions problem and its application to group variable selection in classification. In: Proceedings of
the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11
August 2017, pp. 3394–3403 (2017)
103. Le Thi, H.A., Le, M.T., Nguyen, T.B.T.: A novel approach to automated cell counting based on a
difference of convex functions algorithm (DCA). In: Computational Collective Intelligence. Tech-
nologies and Applications, LNCS, vol. 8083, pp. 336–345. Springer, Berlin, Heidelberg (2013)
104. Le Thi, H.A., Moeini, M.: Long-short portfolio optimization under cardinality constraints by differ-
ence of convex functions algorithm. J. Optim. Theory Appl. 161(1), 199–224 (2014)
105. Le Thi, H.A., Moeini, M., Pham Dinh, T.: DC programming approach for portfolio optimization
under step increasing transaction costs. Optimization 58(3), 267–289 (2009)
106. Le Thi, H.A., Moeini, M., Pham Dinh, T.: Portfolio selection under downside risk measures and
cardinality constraints based on DC programming and DCA. Comput. Manag. Sci. 6(4), 459–475
(2009)
107. Le Thi, H.A., Moeini, M., Pham Dinh, T., Joaquim, J.: A DC programming approach for solving the
symmetric eigenvalue complementarity problem. Comput. Optim. Appl. 51(3), 1097–1117 (2012)
108. Le Thi, H.A., Ndiaye, B.M., Pham Dinh, T.: Solving a multimodal transport problem by DCA. In:
IEEE International Conference on Research, Innovation and Vision for the Future, pp. 49–56 (2008)
109. Le Thi, H.A., Nguyen, D.M., Pham Dinh, T.: A DC programming approach for planning a multisensor
multizone search for a target. Comput. Oper. Res. 41, 231–239 (2014)
110. Le Thi, H.A., Nguyen, M.C.: Self-organizing maps by difference of convex functions optimization.
Data Min. Knowl. Disc. 28(5–6), 1336–1365 (2014)
111. Le Thi, H.A., Nguyen, M.C.: DCA based algorithms for feature selection in multi-class support vector
machine. Ann. Oper. Res. 249(1), 273–300 (2017)
112. Le Thi, H.A., Nguyen, M.C., Pham Dinh, T.: A DC programming approach for finding communities
in networks. Neural Comput. 26(12), 2827–2854 (2014)
113. Le Thi, H.A., Nguyen, Q.T.: A robust approach for nonlinear UAV task assignment problem under
uncertainty. Transactions on Computational Collective Intelligence II. LNCS, vol. 6450, pp. 147–159.
Springer, Berlin, Heidelberg (2010)
114. Le Thi, H.A., Nguyen, Q.T., Nguyen, H.T., Pham Dinh, T.: Solving the earliness tardiness scheduling
problem by DC programming and DCA. Math. Balk. 23(3–4), 271–288 (2009)
115. Le Thi, H.A., Nguyen, Q.T., Nguyen, H.T., Pham Dinh, T.: A time-indexed formulation of earliness
tardiness scheduling via DC programming and DCA. In: International Multiconference on Computer
Science and Information Technology IMCSIT'09, pp. 779–784 (2009)
116. Le Thi, H.A., Nguyen, Q.T., Phan, K.T., Pham Dinh, T.: Energy minimization-based cross-layer
design in wireless networks. In: Proceedings of the 2008 High Performance Computing & Simulation
Conference (HPCS 2008), pp. 283–289 (2008)
117. Le Thi, H.A., Nguyen, Q.T., Phan, K.T., Pham Dinh, T.: DC programming and DCA based cross-layer
optimization in multi-hop TDMA networks. Intelligent Information and Database Systems. LNCS,
vol. 7803, pp. 398–408. Springer, Berlin, Heidelberg (2013)
118. Le Thi, H.A., Nguyen, T.B.T., Le, H.M.: Sparse signal recovery by difference of convex functions algorithms. In: Intelligent Information and Database Systems. LNCS, vol. 7803, pp. 387–397.
Springer, Berlin, Heidelberg (2013)
119. Le Thi, H.A., Nguyen, T.P., Pham Dinh, T.: A continuous DC programming approach to the strategic
supply chain design problem from qualified partner set. Eur. J. Oper. Res. 183(3), 1001–1012 (2007)
120. Le Thi, H.A., Nguyen, V.V., Ouchani, S.: Gene selection for cancer classification using DCA. J. Front.
Comput. Sci. Technol. 3(6), 612–620 (2009)
121. Le Thi, H.A., Pham Dinh, T.: Solving a class of linearly constrained indefinite quadratic problems
by D.C. algorithms. J. Global Optim. 11(3), 253–285 (1997)
122. Le Thi, H.A., Pham Dinh, T.: A branch-and-bound method via D.C. optimization algorithm and
ellipsoidal technique for box constrained nonconvex quadratic programming problems. J. Global
Optim. 13(2), 171–206 (1998)
123. Le Thi, H.A., Pham Dinh, T.: D.C. programming approach for large-scale molecular optimization
via the general distance geometry problem. In: Floudas, C.A., Pardalos, P.M. (eds.) Optimization
in Computational Chemistry and Molecular Biology: Local and Global Approaches, Nonconvex
Optimization and Its Applications, vol. 40, pp. 301–339. Springer, New York (2000)
124. Le Thi, H.A., Pham Dinh, T.: A continuous approach for globally solving linearly constrained
quadratic zero-one programming problems. Optimization 50(1–2), 93–120 (2001)
125. Le Thi, H.A., Pham Dinh, T.: D.C. optimization approaches via Markov models for restoration of
signal (1-D) and (2-D). In: Hadjisavvas, N., Pardalos, P. (eds.) Advances in Convex Analysis and
Global Optimization, pp. 303–317. Kluwer, Dordrecht (2001)
126. Le Thi, H.A., Pham Dinh, T.: D.C. programming approach to the multidimensional scaling problem.
In: Migdalas, A., Pardalos, P.M., Värbrand, P. (eds.) From Local to Global Optimization, pp. 231–276.
Springer, New York (2001)
127. Le Thi, H.A., Pham Dinh, T.: D.C. programming approach for multicommodity network optimization
problems with step increasing cost functions. J. Global Optim. 22(1), 205–232 (2002)
128. Le Thi, H.A., Pham Dinh, T.: Large scale molecular optimization from distance matrices by a D.C.
optimization approach. SIAM J. Optim. 14(1), 77–114 (2003)
129. Le Thi, H.A., Pham Dinh, T.: A new algorithm for solving large scale molecular distance geometry
problems. In: Di Pillo, G., Murli, A. (eds.) High Performance Algorithms and Software for Nonlinear
Optimization. Applied Optimization, vol. 82, pp. 285–302. Springer, New York (2003)
130. Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) programming and DCA
revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133(1–4),
23–48 (2005)
131. Le Thi, H.A., Pham Dinh, T.: A continuous approach for the concave cost supply problem via DC
programming and DCA. Discrete Appl. Math. 156(3), 325–338 (2008)
132. Le Thi, H.A., Pham Dinh, T.: On solving linear complemetarity problems by DC programming and
DCA. Comput. Optim. Appl. 50(3), 507–524 (2011)
133. Le Thi, H.A., Pham Dinh, T.: A two phases DCA based algorithm for solving the Lennard–Jones
problem. Tech. rep., LITA, University of Metz (2011)
134. Le Thi, H.A., Pham Dinh, T.: Minimizing the morse potential energy function by a DC programming
approach. Tech. rep., LITA, University of Lorraine (2012)
135. Le Thi, H.A., Pham Dinh, T.: DC programming approaches for distance geometry problems. In:
Mucherino, A., Lavor, C., Liberti, L., Maculan, N. (eds.) Distance Geometry: Theory, Methods, and
Applications, pp. 225–290. Springer, New York (2013)
136. Le Thi, H.A., Pham Dinh, T.: Network utility maximisation: A DC programming approach for Sig-
moidal utility function. In: International Conference on Advanced Technologies for Communications
(ATC’13), pp. 50–54 (2013)
137. Le Thi, H.A., Pham Dinh, T.: DC programming in communication systems: challenging problems
and methods. Vietnam J. Comput. Sci. 1(1), 15–28 (2014)
138. Le Thi, H.A., Pham Dinh, T.: Difference of convex functions algorithms (DCA) for image restoration
via a Markov random field model. Optim. Eng. 18(4), 873–906 (2017)
139. Le Thi, H.A., Pham Dinh, T., Belghiti, M.: DCA based algorithms for multiple sequence alignment
(MSA). Cent. Eur. J. Oper. Res. 22(3), 501–524 (2014)
140. Le Thi, H.A., Pham Dinh, T., Bouallagui, S.: Cryptanalysis of an identification scheme based on
the perceptron problem using a hybridization of deterministic optimization and genetic algorithm.
In: Proceedings of the 2009 International Conference on Security & Management, SAM 2009, pp.
117–123. CSREA Press (2009)
141. Le Thi, H.A., Pham Dinh, T., Huynh, V.N.: Exact penalty techniques in DC programming. Tech. rep,
National Institute for Applied Sciences, Rouen (2005)
142. Le Thi, H.A., Pham Dinh, T., Huynh, V.N.: Optimization based DC programming and DCA for
hierarchical clustering. Eur. J. Oper. Res. 183(3), 1067–1085 (2007)
143. Le Thi, H.A., Pham Dinh, T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J.
Global Optim. 52(3), 509–535 (2012)
144. Le Thi, H.A., Pham Dinh, T., Le, H.M., Vo, X.T.: DC approximation approaches for sparse optimiza-
tion. Eur. J. Oper. Res. 244(1), 26–46 (2015)
145. Le Thi, H.A., Pham Dinh, T., Muu, L.D.: Numerical solution for optimization over the efficient set
by D.C. optimization algorithm. Oper. Res. Lett. 19(3), 117–128 (1996)
146. Le Thi, H.A., Pham Dinh, T., Muu, L.D.: A combined D.C. optimization-ellipsoidal branch-and-
bound algorithm for solving nonconvex quadratic programming problems. J. Comb. Optim. 2(1),
9–28 (1998)
147. Le Thi, H.A., Pham Dinh, T., Muu, L.D.: Exact penalty in DC programming. Vietnam J. Math. 27(2),
169–179 (1999)
148. Le Thi, H.A., Pham Dinh, T., Muu, L.D.: Simplicially constrained D.C. optimization over the efficient
and weakly efficient sets. J. Optim. Theory Appl. 117(3), 503–521 (2003)
149. Le Thi, H.A., Pham Dinh, T., Thiao, M.: Efficient approaches for $\ell_2$-$\ell_0$ regularization and applications to feature selection in SVM. Appl. Intell. 45(2), 549–565 (2016)
150. Le Thi, H.A., Pham Dinh, T., Thoai, N.V.: Combination between global and local methods for solving
an optimization problem over the efficient set. Eur. J. Oper. Res. 142(2), 258–270 (2002)
151. Le Thi, H.A., Pham Dinh, T., Thoai, N.V., Nguyen Canh, N.: D.C. optimization techniques for solving
a class of nonlinear bilevel programs. J. Global Optim. 44(3), 313–337 (2009)
152. Le Thi, H.A., Pham Dinh, T., Tran, D.Q.: A DC programming approach for a class of bilevel pro-
gramming problems and its application in portfolio selection. NACO Numer. Algebra Control Optim.
2(1), 167–185 (2012)
153. Le Thi, H.A., Pham Dinh, T., Yen, N.D.: Behavior of DCA sequences for solving the trust-region
subproblem. J. Global Optim. 53, 317–329 (2012)
154. Le Thi, H.A., Phan, D.N.: DC programming and DCA for sparse optimal scoring problem. Neuro-
computing 186, 170–181 (2016)
155. Le Thi, H.A., Phan, D.N.: Efficient nonconvex group variable selection and application to group
sparse optimal scoring (2017, Submitted)
156. Le Thi, H.A., Phan, D.N.: DC programming and DCA for sparse Fisher linear discriminant analysis.
Neural Comput. Appl. 28(9), 2809–2822 (2017)
157. Le Thi, H.A., Ta, A.S., Pham Dinh, T.: An efficient DCA based algorithm for power control in large
scale wireless networks. Appl. Math. Comput. 318, 215–226 (2018)
158. Le Thi, H.A., Tran, D.Q.: Solving continuous min max problem for single period portfolio selection
with discrete constraints by DCA. Optimization 61(8), 1025–1038 (2012)
159. Le Thi, H.A., Tran, D.Q.: New and efficient algorithms for transfer prices and inventory holding
policies in two-enterprise supply chains. J. Global Optim. 60(1), 5–24 (2014)
160. Le Thi, H.A., Tran, D.Q.: Optimizing a multi-stage production/inventory system by DC programming
based approaches. Comput. Optim. Appl. 57(2), 441–468 (2014)
161. Le Thi, H.A., Tran, Q.D., Adjallah, K.H.: A difference of convex functions algorithm for optimal
scheduling and real-time assignment of preventive maintenance jobs on parallel processors. J. Ind.
Manag. Optim. 10(1), 243–258 (2014)
162. Le Thi, H.A., Tran, T.T., Pham Dinh, T., Gély, A.: DC programming and DCA for transmit beamform-
ing and power allocation in multicasting relay network. In: Nguyen, T.B., van Do, T., Le Thi, H.A.,
Nguyen, N.T. (eds.) Advanced Computational Methods for Knowledge Engineering: ICCSAMA
2016, Proceedings, Part I, pp. 29–41. Springer, New York (2016)
163. Le Thi, H.A., Vaz, A.I.F., Vicente, L.N.: Optimizing radial basis functions by D.C. programming and
its use in direct search for global derivative-free optimization. TOP 20(1), 190–214 (2012)
164. Le Thi, H.A., Vo, X.T., Pham Dinh, T.: Feature selection for linear SVMs under uncertain data: robust
optimization based on difference of convex functions algorithms. Neural Netw. 59, 36–50 (2014)
165. Le Thi, H.A., Vo, X.T., Pham Dinh, T.: Efficient nonnegative matrix factorization by DC programming
and DCA. Neural Comput. 28(6), 1163–1216 (2016)
166. Le Thi, H.A.: DC Programming and DCA: http://www.lita.univ-lorraine.fr/~lethi/index.php/en/research/dc-programming-and-dca.html (Homepage) (2005)
167. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite func-
tions. SIAM J. Optim. 24(3), 1420–1443 (2014)
168. de Leeuw, J.: Applications of convex analysis to multidimensional scaling. In: Barra, J.R., Brodeau,
F., Romier, G., Van Cutsem, B. (eds.) Recent Developments in Statistics, pp. 133–146. North Holland,
Amsterdam (1977)
169. Li, P., Rangapuram, S.S., Slawski, M.: Methods for sparse and low-rank recovery under simplex
constraints. arXiv:1605.00507 (2016)
170. Li, Z., Lou, Y., Zeng, T.: Variational multiplicative noise removal by DC programming. J. Sci. Comput.
68(3), 1200–1216 (2016)
171. Liu, D., Shi, Y., Tian, Y., Huang, X.: Ramp loss least squares support vector machine. J. Comput. Sci.
14, 61–68 (2016)
172. Liu, D., Tian, Y., Shi, Y.: Ramp loss nonparallel support vector machine for pattern classification.
Knowl. Based Syst. 85, 224–233 (2015)
173. Liu, Y., Shen, X.: Multicategory ψ-learning. J. Am. Stat. Assoc. 101, 500–509 (2006)
174. Liu, Y., Shen, X., Doss, H.: Multicategory ψ-learning and support vector machine: computational
tools. J. Comput. Graph. Stat. 14, 219–236 (2005)
175. Liu, Z.: Non-dominated set of a multi-objective optimisation problem. Ph.D. thesis, Lancaster Uni-
versity (2016)
176. Lou, Y., Osher, S., Xin, J.: Computational aspects of constrained L1–L2 minimization for compres-
sive sensing. In: Le Thi, H.A., Pham Dinh, T., Nguyen, N.T. (eds.) Modelling, Computation and
Optimization in Information Systems and Management Sciences, pp. 169–180. Springer, New York
(2015)
177. Lou, Y., Yin, P., He, Q., Xin, J.: Computing sparse representation in a highly coherent dictionary
based on difference of L1 and L2. J. Sci. Comput. 64(1), 178–196 (2015)
178. Lou, Y., Yin, P., Xin, J.: Point source super-resolution via non-convex l1 based methods. J. Sci.
Comput. 68(3), 1082–1100 (2016)
179. Lou, Y., Zeng, T., Osher, S., Xin, J.: A weighted difference of anisotropic and isotropic total variation
model for image processing. SIAM J. Imaging Sci. 8(3), 1798–1823 (2015)
180. Mahey, P., Phong, T.Q., Luna, H.P.L.: Separable convexification and DCA techniques for capacity
and flow assignment problems. RAIRO Oper. Res. 35, 269–281 (2001)
181. Mangasarian, O.L.: Machine learning via polyhedral concave minimization. In: Fischer, H., Ried-
mueller, B., Schaeffler, S. (eds.) Applied Mathematics and Parallel Computing—Festschrift for Klaus
Ritter, pp. 175–188. Physica-Verlag, Germany (1996)
182. Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique 4(R3), 154–158 (1970)
183. Mokhtari, A., Koppel, A., Scutari, G., Ribeiro, A.: Large-scale nonconvex stochastic optimization
by doubly stochastic successive convex approximation. In: 2017 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 4701–4705 (2017)
184. Mu, P., Hu, X., Wang, B., Li, Z.: Secrecy rate maximization with uncoordinated cooperative jamming
by single-antenna helpers under secrecy outage probability constraint. IEEE Commun. Lett. 19(12),
2174–2177 (2015)
185. Ndiaye, B.M.: Simulation et optimisation DC dans les réseaux de transport combinés : codes à usage
industriel. Ph.D. thesis, INSA de Rouen (2007)
186. Ndiaye, B.M., Le Thi, H.A., Pham Dinh, T.: Single straddle carrier routing problem in port container
terminals: Mathematical model and solving approaches. Int. J. Intell. Inf. Database Syst. 6(6), 532–
554 (2012)
187. Ndiaye, B.M., Le Thi, H.A., Pham Dinh, T., Niu, Y.: DC programming and DCA for large-scale
two-dimensional packing problems. In: Intelligent Information and Database Systems. LNCS, vol.
7197, pp. 321–330. Springer, Berlin Heidelberg (2012)
188. Ndiaye, B.M., Pham Dinh, T., Le Thi, H.A.: Single straddle carrier routing problem in port container
terminals: Mathematical model and solving approaches. In: Le Thi, H.A., Bouvry, P., Pham Dinh, T.
(eds.) Modelling, Computation and Optimization in Information Systems and Management Sciences,
pp. 21–31 (2008)
189. Neumann, J., Schnörr, C., Steidl, G.: Combined SVM-based feature selection and classification.
Mach. Learn. 61(1–3), 129–150 (2005)
190. Nguyen, D.M.: The DC programming and the cross-entropy method for some classes of problems
in finance, assignment and search theory. Ph.D. thesis, INSA de Rouen (2012)
191. Nguyen, M.C.: La programmation DC et DCA pour certaines classes de problèmes en apprentissage
et fouille de données. Ph.D. thesis, University of Lorraine (2014)
192. Nguyen, Q.T.: Approches locales et globales basées sur la programmation DC et DCA pour des
problèmes combinatoires en variables mixtes 0-1 : applications à la planification opérationnelle.
Ph.D. thesis, Université Paul Verlaine-Metz (2010)
193. Nguyen, Q.T., Le Thi, H.A.: Solving an inventory routing problem in supply chain by DC program-
ming and DCA. In: Intelligent Information and Database Systems. LNCS, vol. 6592, pp. 432–441.
Springer, Berlin Heidelberg (2011)
194. Nguyen, T.A., Nguyen, M.N.: Convergence analysis of a proximal point algorithm for minimizing
differences of functions. Optimization 66(1), 129–147 (2017)
195. Nguyen, T.B.T.: La programmation DC et DCA en analyse d’image : acquisition comprimée, seg-
mentation et restauration. Ph.D. thesis, University of Lorraine (2014)
196. Nguyen, T.B.T., Le Thi, H.A., Le, H.M., Vo, X.T.: DC approximation approach for ℓ0-minimization
in compressed sensing. In: Le Thi, H.A., Nguyen, N.T., van Do, T. (eds.) Advanced Computational
Methods for Knowledge Engineering, pp. 37–48. Springer, New York (2015)
197. Nguyen, T.M.T., Le Thi, H.A.: A DC programming approach to the continuous equilibrium network
design problem. In: Nguyen, T.B., van Do, T., Le Thi, H.A., Nguyen, N.T. (eds.) Advanced Com-
putational Methods for Knowledge Engineering: ICCSAMA 2016, Proceedings, Part I, pp. 3–16.
Springer, New York (2016)
198. Nguyen, T.P.: Techniques d’optimisation en traitement d’image et vision par ordinateur et en transport
logistique. Ph.D. thesis, Université Paul Verlaine-Metz (2007)
199. Nguyen, V.V.: Méthodes exactes pour l’optimisation DC polyédrale en variables mixtes 0-1 basées
sur DCA et des nouvelles coupes. Ph.D. thesis, INSA de Rouen (2006)
200. Nguyen Canh, N., Le Thi, H.A., Pham Dinh, T.: A branch and bound algorithm based on DC program-
ming and DCA for strategic capacity planning in supply chain design for a new market opportunity.
In: Operations Research Proceedings 2006, pp. 515–520. Springer, Berlin Heidelberg (2007)
201. Nguyen Canh, N., Pham, T.H., Tran, V.H.: DC programming and DCA approach for resource allo-
cation optimization in OFDMA/TDD wireless networks. In: Le Thi, H.A., Nguyen, N.T., van Do,
T. (eds.) Advanced Computational Methods for Knowledge Engineering, pp. 49–56. Springer, New
York (2015)
202. Niu, Y.S., Júdice, J., Le Thi, H.A., Pham Dinh, T.: Solving the quadratic eigenvalue complementarity
problem by DC programming. In: Le Thi, H.A., Pham Dinh, T., Nguyen, N.T. (eds.) Modelling,
Computation and Optimization in Information Systems and Management Sciences, pp. 203–214.
Springer, New York (2015)
203. Niu, Y.S., Pham Dinh, T., Le Thi, H.A., Judice, J.: Efficient DC programming approaches for asym-
metric eigenvalue complementarity problem. Optim. Methods Softw. 28(4), 812–829 (2013)
204. Ong, C.S., Le Thi, H.A.: Learning sparse classifiers with difference of convex functions algorithms.
Optim. Methods Softw. 28(4), 830–854 (2013)
205. Orlov, A., Strekalovsky, A.: On a local search for hexamatrix games. In: Kononov, A., Bykadorov, I., Khamisov, O., Davydov, I., Kononova, P. (eds.) DOOR 2016, pp. 477–488 (2016)
206. Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables, pp. 253–255. Academic Press, New York (1970)
207. Pan, W., Shen, X., Liu, B.: Cluster analysis: unsupervised learning via supervised learning with a
non-convex penalty. J. Mach. Learn. Res. 14(1), 1865–1889 (2013)
208. Pang, J.S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs.
Math. Oper. Res. 42(1), 95–118 (2017)
209. Pang, J.S., Tao, M.: Decomposition methods for computing directional stationary solutions of a class
of non-smooth non-convex optimization problems. SIAM J. Optim. (2017, submitted)
210. Parida, P., Das, S.S.: Power allocation in OFDM based NOMA systems: a DC programming approach.
In: 2014 IEEE Globecom Workshops (GC Wkshps), pp. 1026–1031. IEEE (2014)
211. Park, F., Lou, Y., Xin, J.: A weighted difference of anisotropic and isotropic total variation for relaxed
Mumford-Shah image segmentation. In: IEEE ICIP 2016, pp. 4314–4318 (2016)
212. Park, S.H., Simeone, O., Sahin, O., Shamai, S.: Multihop backhaul compression for the uplink of
cloud radio access networks. IEEE Trans. Veh. Technol. 65(5), 3185–3199 (2016)
213. Pham, V.N.: Programmation DC et DCA pour l’optimisation non convexe/optimisation globale en
variables mixtes entières : Codes et Applications. Ph.D. thesis, INSA de Rouen (2013)
214. Pham, V.N., Le Thi, H.A., Pham Dinh, T.: A DC programming framework for portfolio selection by
minimizing the transaction costs. In: Advanced Computational Methods for Knowledge Engineering,
Studies in Computational Intelligence, vol. 479, pp. 31–40. Springer International Publishing (2013)
215. Pham Dinh, T.: Éléments homoduaux d'une matrice A relatifs à un couple de normes (φ, ψ). Applications au calcul de sφψ(A). Séminaire d'Analyse Numérique, Grenoble (1975)
216. Pham Dinh, T.: Calcul du maximum d’une forme quadratique définie positive sur la boule unité de la
norme du maximum. Séminaire d'Analyse Numérique, Grenoble (1976)
217. Pham Dinh, T.: Contribution à la théorie de normes et ses applications à l’analyse numérique. Uni-
versité Joseph Fourier, Grenoble, Thèse de doctorat d'état ès sciences (1981)
218. Pham Dinh, T.: Algorithmes de calcul du maximum des formes quadratiques sur la boule unité de la
norme du maximum. Numer. Math. 45(3), 377–401 (1984)
219. Pham Dinh, T.: Convergence of a subgradient method for computing the bound norm of matrices.
Linear Algebra Appl. 62, 163–182 (1984)
220. Pham Dinh, T., Ho, V.T., Le Thi, H.A.: DC programming and DCA for solving Brugnano-Casulli
piecewise linear systems. Comput. Oper. Res. 87(Supplement C), 196–204 (2017)
221. Pham Dinh, T., Le Thi, H.A.: Lagrangian stability and global optimality in nonconvex quadratic
minimization over Euclidean balls and spheres. J. Convex Anal. 2(1–2), 263–276 (1995)
222. Pham Dinh, T., Le Thi, H.A.: Difference of convex function optimization algorithms (DCA) for
globally minimizing nonconvex quadratic forms on Euclidean balls and spheres. Oper. Res. Lett.
19(5), 207–216 (1996)
223. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithm and
applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
224. Pham Dinh, T., Le Thi, H.A.: D.C. optimization algorithms for solving the trust region subproblem.
SIAM J. Optim. 8(2), 476–505 (1998)
225. Pham Dinh, T., Le Thi, H.A.: Recent advances in DC programming and DCA. In: Transactions on
Computational Intelligence XIII. LNCS, vol. 8342, pp. 1–37. Springer, Berlin Heidelberg (2014)
226. Pham Dinh, T., Le Thi, H.A., Akoa, F.: Combining DCA and interior point techniques for large-scale
nonconvex quadratic programming. Optim. Methods Softw. 23(4), 609–629 (2008)
227. Pham Dinh, T., Le Thi, H.A., Pham, V.N., Niu, Y.S.: DC programming approaches for discrete
portfolio optimization under concave transaction costs. Optim. Lett. 10(2), 1–22 (2016)
228. Pham Dinh, T., Nguyen Canh, N., Le Thi, H.A.: DC programming and DCA for globally solving the
value-at-risk. Comput. Manag. Sci. 6(4), 477–501 (2009)
229. Pham Dinh, T., Nguyen Canh, N., Le Thi, H.A.: An efficient combination of DCA and B&B using
DC/SDP relaxation for globally solving binary quadratic programs. J. Global Optim. 48(4), 595–632
(2010)
230. Pham Dinh, T., Niu, Y.S.: An efficient DC programming approach for portfolio decision with higher
moments. Comput. Optim. Appl. 50(3), 525–554 (2011)
231. Pham Dinh, T., Pham, V.N., Le Thi, H.A.: DC programming and DCA for portfolio optimization
with linear and fixed transaction costs. In: Intelligent Information and Database Systems, LNCS, vol.
8398, pp. 392–402. Springer International Publishing (2014)
232. Pham Dinh, T., Souad, E.B.: Algorithms for solving a class of nonconvex optimization problems.
Methods of subgradients. In: Hiriart-Urruty, J.B. (ed.) Fermat Days 85: Mathematics for Optimization,
North-Holland Mathematics Studies, vol. 129, pp. 249–271. North-Holland, Amsterdam (1986)
233. Pham Dinh, T., Souad, E.B.: Duality in D.C. (difference of convex functions) optimization. Sub-
gradient methods. In: Trends in Mathematical Optimization, International Series of Numerical
Mathematics, vol. 84, pp. 276–294. Birkhäuser, Basel (1988)
234. Phan, A.H., Tuan, H.D., Kha, H.H.: D.C. iterations for SINR maximin multicasting in cognitive radio.
In: 6th International Conference on Signal Processing and Communication Systems (ICSPCS 2012),
pp. 1–5 (2012)
235. Phan, D.N.: DCA based algorithms for learning with sparsity in high dimensional setting and stochas-
tical learning. Ph.D. thesis, University of Lorraine (2016)
236. Phan, D.N., Le Thi, H.A., Pham Dinh, T.: Sparse covariance matrix estimation by DCA-based algo-
rithms. Neural Comput. 29(11), 3040–3077 (2017)
237. Piot, B., Geist, M., Pietquin, O.: Difference of convex functions programming for reinforcement
learning. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.)
Advances in Neural Information Processing Systems, vol. 27, pp. 2519–2527. Curran Associates,
Red Hook (2014)
238. Polyak, B.T.: Introduction to Optimization. Optimization Software, Inc., Publications Division, New York (1987)
239. Poulakis, M.I., Vassaki, S., Panagopoulos, A.D.: Secure cooperative communications under secrecy
outage constraint: a DC programming approach. IEEE Wirel. Commun. Lett. 5(3), 332–335 (2016)
240. Queiroz, M., Júdice, J., Humes, C.: The symmetric eigenvalue complementarity problem. Math.
Comput. 73(248), 1849–1863 (2004)
241. Rakotomamonjy, A., Flamary, R., Gasso, G.: DC proximal newton for nonconvex optimization prob-
lems. IEEE Trans. Neural Netw. Learn. Syst. 27(3), 636–647 (2016)
242. Razaviyayn, M.: Successive convex approximation: analysis and applications. Ph.D. thesis, University
of Minnesota (2014)
243. Razaviyayn, M., Hong, M., Luo, Z.Q.: A unified convergence analysis of block successive minimiza-
tion methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)
244. Razaviyayn, M., Hong, M., Luo, Z.Q., Pang, J.S.: Parallel successive convex approximation for
nonsmooth nonconvex optimization. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.,
Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 1440–1448.
Curran Associates, Red Hook (2014)
245. Razaviyayn, M., Sanjabi, M., Luo, Z.Q.: A stochastic successive minimization method for nonsmooth
nonconvex optimization with applications to transceiver design in wireless communication networks.
Math. Program. 157(2), 515–545 (2016)
246. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
247. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
248. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim.
14(5), 877–898 (1976)
249. Schad, A., Law, K.L., Pesavento, M.: Rank-two beamforming and power allocation in multicasting
relay networks. IEEE Trans. Signal Process. 63(13), 3435–3447 (2015)
250. Schleich, J., Le Thi, H.A., Bouvry, P.: Solving the minimum m-dominating set problem by a continuous
optimization approach based on DC programming and DCA. J. Comb. Optim. 24(4), 397–412 (2012)
251. Schnörr, C.: Signal and image approximation with level-set constraints. Computing 81(2), 137–160
(2007)
252. Schüle, T., Schnörr, C., Weber, S., Hornegger, J.: Discrete tomography by convex-concave regular-
ization and D.C. programming. Discrete Appl. Math. 151(1–3), 229–243 (2005)
253. Schüle, T., Weber, S., Schnörr, C.: Adaptive reconstruction of discrete-valued objects from few pro-
jections. Electron. Notes Discrete Math. 20, 365–384 (2005)
254. Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex
optimization-part I: theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2017)
255. Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for
constrained nonconvex optimization-part II: applications in communications and machine learning.
IEEE Trans. Signal Process. 65(8), 1945–1960 (2017)
256. Scutari, G., Facchinei, F., Song, P., Palomar, D.P., Pang, J.S.: Decomposition by partial linearization:
parallel optimization of multi-agent systems. IEEE Trans. Signal Process. 62(3), 641–656 (2014)
257. Seeger, A.: Quadratic eigenvalue problems under conic constraints. SIAM J. Matrix Anal. Appl. 32(3),
700–721 (2011)
258. Shen, X., Huang, H.C.: Simultaneous supervised clustering and feature selection over a graph.
Biometrika 99(4), 899–914 (2012)
259. Shen, X., Tseng, G.C., Zhang, X., Wong, W.H.: On ψ learning. J. Am. Stat. Assoc. 98, 724–734
(2003)
260. Slawski, M., Hein, M., Lutsik, P.: Matrix factorization with binary components. In: Burges, C.J.C.,
Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information
Processing Systems, vol. 26, pp. 3210–3218. Curran Associates, Red Hook (2013)
261. Smola, A.J., Song, L., Teo, C.H.: Relative novelty detection. In: Proceedings of the 12th International
Conference on Artificial Intelligence and Statistics, vol. 5. JMLR W&CP 5, pp. 536–543 (2009)
262. Song, Y., Lin, L., Jian, L.: Robust check loss-based variable selection of high-dimensional single-index
varying-coefficient model. Commun. Nonlinear Sci. 36, 109–128 (2016)
263. Sriperumbudur, B.K., Torres, D.A., Lanckriet, G.R.G.: Sparse eigen methods by D.C. programming.
In: ICML’07, pp. 831–838. ACM, New York, NY, USA (2007)
264. Sun, Q., Xiang, S., Ye, J.: Robust principal component analysis via capped norms. In: Proceedings
of the 19th ACM SIGKDD, KDD’13, pp. 311–319. ACM (2013)
265. Sun, W., Sampaio, J.B., Candido, R.M.: Proximal point algorithm for minimization of DC function.
J. Comput. Math. 21, 451–462 (2003)
266. Ta, A.S.: Programmation DC et DCA pour la résolution de certaines classes des problèmes dans les
systèmes de transport et de communication. Ph.D. thesis, INSA de Rouen (2012)
267. Ta, A.S., Le Thi, H.A., Arnould, G., Khadraoui, D., Pham Dinh, T.: Solving car pooling problem
using DCA. In: Global Information Infrastructure Symposium (GIIS 2011), pp. 1–6 (2011)
268. Ta, A.S., Le Thi, H.A., Ha, T.S.: Solving relaxation orienteering problem using DCA-CUT. In: Le Thi,
H.A., Pham Dinh, T., Nguyen, N.T. (eds.) Modelling, Computation and Optimization in Information
Systems and Management Sciences, pp. 191–202. Springer, New York (2015)
269. Ta, A.S., Le Thi, H.A., Khadraoui, D., Pham Dinh, T.: Solving multicast QoS routing problem in the
context V2I communication services using DCA. In: IEEE/ACIS 9th International Conference on
Computer and Information Science (ICIS), 2010, pp. 471–476 (2010)
270. Ta, A.S., Le Thi, H.A., Khadraoui, D., Pham Dinh, T.: Solving QoS routing problems by DCA.
In: Intelligent Information and Database Systems. LNCS, vol. 5991, pp. 460–470. Springer, Berlin
Heidelberg (2010)
271. Ta, A.S., Le Thi, H.A., Khadraoui, D., Pham Dinh, T.: Solving partitioning-hub location-routing
problem using DCA. J. Ind. Manag. Optim. 8(1), 87–102 (2012)
272. Ta, A.S., Pham Dinh, T., Le Thi, H.A., Khadraoui, D.: Solving many to many multicast QoS rout-
ing problem using DCA and proximal decomposition technique. In: International Conference on
Computing, Networking and Communications (ICNC 2012), pp. 809–814 (2012)
273. Ta, M.T.: Techniques d’optimisation non convexe basée sur la programmation DC et DCA et méthodes
évolutives pour la classification non supervisée. Ph.D. thesis, University of Lorraine (2014)
274. Ta, M.T., Le Thi, H.A., Boudjeloud-Assala, L.: Clustering data stream by a sub-window approach
using DCA. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition, pp.
279–292. Springer, Berlin (2012)
275. Ta, M.T., Le Thi, H.A., Boudjeloud-Assala, L.: Clustering data streams over sliding windows by DCA.
In: Nguyen, T.N., van Do, T., le Thi, A.H. (eds.) Advanced Computational Methods for Knowledge
Engineering, pp. 65–75. Springer, Heidelberg (2013)
276. Ta, M.T., Le Thi, H.A., Boudjeloud-Assala, L.: An efficient clustering method for massive dataset
based on DC programming and DCA approach. In: Lee, M., Hirose, A., Hou, Z.G., Kil, R.M. (eds.)
ICONIP 2013, Part II, LNCS, vol. 8227, pp. 538–545. Springer, Berlin Heidelberg (2013)
277. Taleb, D., Liu, Y., Pesavento, M.: Full-rate general rank beamforming in single-group multicasting
networks using non-orthogonal STBC. In: 24th EUSIPCO, pp. 2365–2369 (2016)
278. Thai, J., Hunter, T., Akametalu, A.K., Tomlin, C.J., Bayen, A.M.: Inverse covariance estimation from
data with missing values using the concave-convex procedure. In: 53rd IEEE Conference on Decision
and Control, pp. 5736–5742 (2014)
279. Thanh, P.N., Bostel, N., Péton, O.: A DC programming heuristic applied to the logistics network
design problem. Int. J. Prod. Econ. 135(1), 94–105 (2012)
280. Thiao, M., Pham Dinh, T., Le Thi, H.A.: DC programming approach for a class of nonconvex programs involving ℓ0 norm. In: Modelling, Computation and Optimization in Information Systems and Management Sciences, Communications in Computer and Information Science, vol. 14, pp. 348–357. Springer, Berlin Heidelberg (2008)
281. Thiao, M., Pham Dinh, T., Le Thi, H.A.: A DC programming approach for sparse eigenvalue problem.
In: Fürnkranz, J., Joachims, T. (eds.) Proceedings ICML-10, pp. 1063–1070. Omnipress (2010)
282. Tian, X., Gasso, G., Canu, S.: A multiple kernel framework for inductive semi-supervised SVM
learning. Neurocomputing 90, 46–58 (2012)
283. Torres, D.A., Turnbull, D., Sriperumbudur, B.K., Barrington, L., Lanckriet, G.R.G.: Finding musically
meaningful words by sparse CCA. In: NIPS Workshop on Music, the Brain and Cognition (2007)
284. Tran, D.Q., Le Thi, H.A., Adjallah, K.H.: DCA for minimizing the cost and tardiness of preventive
maintenance tasks under real-time allocation constraint. In: Nguyen, N.T., Le, M.T., Swiatek, J.
(eds.) Intelligent Information and Database Systems, LNCS, vol. 5991, pp. 410–419. Springer, Berlin
Heidelberg (2010)
285. Tran, D.Q., Nguyen, B.T.P., Nguyen, Q.T.: A new approach for optimizing traffic signals in networks
considering rerouting. In: Modelling, Computation and Optimization in Information Systems and
Management Sciences, Advances in Intelligent Systems and Computing, vol. 359, pp. 143–154.
Springer International Publishing (2015)
286. Tran, T.T., Le Thi, H.A., Pham Dinh, T.: DC programming and DCA for a novel resource alloca-
tion problem in emerging area of cooperative physical layer security. In: Advanced Computational
Methods for Knowledge Engineering, Advances in Intelligent Systems and Computing, vol. 358, pp. 57–68. Springer (2015)
287. Tran, T.T., Le Thi, H.A., Pham Dinh, T.: DC programming and DCA for enhancing physical layer
security via cooperative jamming. Comput. Oper. Res. 87(Supplement C), 235–244 (2017)
288. Tran, T.T., Tuan, N.N., Le Thi, H.A., Gély, A.: DC programming and DCA for enhancing physical
layer security via relay beamforming strategies. In: Nguyen, N.T., Trawiński, B., Fujita, H., Hong,
T.P. (eds.) ACIIDS 2016, Part II, LNAI 9622, pp. 640–650. Springer, Berlin Heidelberg (2016)
289. Tsiligkaridis, T., Marcheret, E., Goel, V.: A difference of convex functions approach to large-scale
log-linear model estimation. IEEE Trans. Audio Speech Lang. Process. 21(11), 2255–2266 (2013)
290. Tuan, H.N.: Convergence rate of the Pham Dinh-Le Thi algorithm for the trust-region subproblem.
J. Optim. Theory Appl. 154(3), 904–915 (2012)
291. Tuan, H.N., Yen, N.D.: Convergence of Pham Dinh-Le Thi’s algorithm for the trust-region subprob-
lem. J. Global Optim. 55(2), 337–347 (2013)
292. Vanderbei, R.J.: LOQO: an interior point code for quadratic programming. Optim. Methods Softw.
11(1–4), 451–484 (1999)
293. Vasiloglou, N., Gray, A.G., Anderson, D.V.: Non-negative matrix factorization, convexity and isom-
etry. In: Proceedings of the 2009 SIAM ICDM, chap. 57, pp. 673–684 (2009)
294. Vavasis, S.A.: Nonlinear Optimization: Complexity Issues. Oxford University Press, Oxford (1991)
295. Vo, X.T.: Learning with sparsity and uncertainty by difference of convex functions optimization.
Ph.D. thesis, University of Lorraine (2015)
296. Vo, X.T., Le Thi, H.A., Pham Dinh, T.: Robust optimization for clustering. In: ACIIDS 2016, Part II, LNCS, vol. 9622, pp. 1–10. Springer, Berlin Heidelberg (2016)
297. Vo, X.T., Le Thi, H.A., Pham Dinh, T., Nguyen, T.B.T.: DC programming and DCA for dictio-
nary learning. In: Computational Collective Intelligence, LNCS, vol. 9329, pp. 295–304. Springer
International Publishing (2015)
298. Vo, X.T., Tran, B., Le Thi, H.A., Pham Dinh, T.: Ramp loss support vector data description. In: Proceedings of the 9th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2017), Kanazawa, Japan, 3–5 April 2017. Lecture Notes in Computer Science. Springer (2017, to appear)
299. Vucic, N., Shi, S., Schubert, M.: DC programming approach for resource allocation in wireless
networks. In: Proceedings of the 8th International Symposium on Modeling and Optimization in
Mobile, Ad Hoc and Wireless Networks (WiOpt 2010), pp. 380–386 (2010)
300. Wang, D., Chen, W., Han, Z.: Energy efficient secure communication over decode-and-forward relay
channels. IEEE Trans. Commun. 63(3), 892–905 (2015)
301. Wang, F., Zhao, B., Zhang, C.: Linear time maximum margin clustering. IEEE Trans. Neural Netw.
21(2), 319–332 (2010)
302. Wang, J., Shen, X.: Large margin semi-supervised learning. J. Mach. Learn. Res. 8, 1867–1891 (2007)
303. Wang, J., Shen, X., Pan, W.: On transductive support vector machines. In: Prediction and Discovery,
Contemporary Mathematics 443, pp. 7–19. American Mathematical Society (2007)
304. Wang, J., Shen, X., Pan, W.: On efficient large margin semisupervised learning: method and theory.
J. Mach. Learn. Res. 10, 719–742 (2009)
305. Wang, K., Zhong, P., Zhao, Y.: Training robust support vector regression via D.C. program. J. Inf.
Comput. Sci. 7(12), 2385–2394 (2010)
306. Wang, K., Zhu, W., Zhong, P.: Robust support vector regression with generalized loss function and
applications. Neural Process. Lett. 41(1), 89–106 (2015)
307. Wang, L., Kim, Y., Li, R.: Calibrating nonconvex penalized regression in ultra-high dimension. Ann.
Stat. 41(5), 2505–2536 (2013)
308. Wang, Y., Xia, X.: An effective ℓ0-SVM classifier for face recognition based on Haar features. Adv.
Nat. Sci. 9(1), 1–4 (2016)
309. Weber, S., Schüle, T., Schnörr, C.: Prior learning and convex–concave regularization of binary tomog-
raphy. Electron. Notes Discrete Math. 20, 313–327 (2005)
310. Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero-norm with linear models and
kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
311. Wozabal, D.: Value-at-risk optimization using the difference of convex algorithm. OR Spectrum
34(4), 861–883 (2012)
312. Wu, C., Kwon, S., Shen, X., Pan, W.: A new algorithm and theory for penalized regression-based
clustering. J. Mach. Learn. Res. 17, 1–25 (2016)
313. Wu, C., Li, C., Long, Q.: A DC programming approach for sensor network localization with uncer-
tainties in anchor positions. J. Ind. Manag. Optim. 10(3), 817–826 (2014)
314. Wu, Y., Liu, Y.: Robust truncated hinge loss support vector machines. J. Am. Stat. Assoc. 102(479),
974–983 (2007)
315. Wu, Y., Liu, Y.: Variable selection in quantile regression. Stat. Sin. 19, 801–817 (2009)
316. Xiang, S., Shen, X., Ye, J.: Efficient nonconvex sparse group feature selection via continuous and
discrete optimization. Artif. Intell. 224, 28–50 (2015)
317. Yang, L., Ju, R.: A DC programming approach for feature selection in the minimax probability
machine. Int. J. Comput. Intell. Syst. 7(1), 12–24 (2014)
318. Yang, L., Qian, Y.: A sparse logistic regression framework by difference of convex functions pro-
gramming. Appl. Intell. 45(2), 241–254 (2016)
319. Yang, L., Wang, L.: A class of semi-supervised support vector machines by DC programming. Adv.
Data Anal. Classif. 7(4), 417–433 (2013)
320. Yang, L., Zhang, S.: A sparse extreme learning machine framework by continuous optimization
algorithms and its application in pattern recognition. Eng. Appl. Artif. Intell. 53, 176–189 (2016)
321. Yang, S., Yuan, L., Lai, Y.C., Shen, X., Wonka, P., Ye, J.: Feature grouping and selection over an
undirected graph. In: ACM SIGKDD, pp. 922–930 (2012)
322. Yang, T., Liu, J., Gong, P., Zhang, R., Shen, X., Ye, J.: Absolute fused lasso and its application to
genome-wide association studies. In: Proceedings of the 22nd ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining, KDD’16, pp. 1955–1964. ACM (2016)
323. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of ℓ1−ℓ2 for compressed sensing. SIAM J. Sci. Comput.
37(1), 536–563 (2015)
324. Yin, P., Xin, J., Qi, Y.: Linear feature transform and enhancement of classification on deep neural
network (2016, Submitted)
325. Ying, Y., Huang, K., Campbell, C.: Enhanced protein fold recognition through a novel data integration
approach. BMC Bioinform. 10(1), 1–18 (2009)
326. You, S., Chen, L., Liu, Y.E.: Convex-concave procedure for weighted sum-rate maximization in a
MIMO interference network. In: IEEE GLOBECOM 2014, pp. 4060–4065 (2014)
327. Yu, C.N.J., Joachims, T.: Learning structural SVMs with latent variables. In: ICML’09, pp. 1169–
1176. ACM, New York, NY, USA (2009)
328. Yu, P.L.: Multiple-Criteria Decision Making: Concepts, Techniques, and Extensions. In: Mathematical
Concepts and Methods in Science and Engineering, vol. 30. Springer, USA (1985)
329. Yuille, A.L., Rangarajan, A.: The concave–convex procedure. Neural Comput. 15(4), 915–936 (2003)
330. Zhang, K., Tsang, I.W., Kwok, J.T.: Maximum margin clustering made practical. IEEE Trans. Neural
Netw. 20(4), 583–596 (2009)
331. Zhang, P., Tian, Y., Zhang, Z., Li, A., Zhu, X.: Select objective functions for multiple criteria pro-
gramming classification. In: 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT'08), vol. 3, pp. 420–423 (2008)
332. Zhang, X., Wu, Y., Wang, L., Li, R.: Variable selection for support vector machines in moderately
high dimensions. J. R. Stat. Soc. B 78(1), 53–76 (2016)
333. Zhao, Z., Sun, L., Yu, S., Liu, H., Ye, J.: Multiclass probabilistic kernel discriminant analysis. In:
Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09, pp. 1363–
1368. Morgan Kaufmann (2009)
334. Zheng, G.: Joint beamforming optimization and power control for full-duplex MIMO two-way relay
channel. IEEE Trans. Signal Process. 63(3), 555–566 (2015)
335. Zheng, G., Krikidis, I., Li, J., Petropulu, A.P., Ottersten, B.: Improving physical layer secrecy using
full-duplex jamming receivers. IEEE Trans. Signal Process. 61(20), 4962–4974 (2013)
336. Zhong, P.: Training robust support vector regression with smooth non-convex loss function. Optim.
Methods Softw. 27(6), 1039–1058 (2012)
337. Zhong, Y., Aghezzaf, E.H.: Combining DC-programming and steepest-descent to solve the single-
vehicle inventory routing problem. Comput. Ind. Eng. 61(2), 313–321 (2011)
338. Zhou, Y., Zhu, Y., Xue, Z.: Enhanced MIMOME wiretap channel via adopting full-duplex MIMO
radios. In: 2014 IEEE Global Communications Conference, pp. 3320–3325. IEEE (2014)
339. Zhou, Z.H., Zhang, M.L., Huang, S.J., Li, Y.F.: Multi-instance multi-label learning. Artif. Intell.
176(1), 2291–2320 (2012)
340. Zhu, Y., Shen, X., Pan, W.: Simultaneous grouping pursuit and feature selection over an undirected
graph. J. Am. Stat. Assoc. 108(502), 713–725 (2013)
341. Zisler, M., Petra, S., Schnörr, C., Schnörr, C.: Discrete tomography by continuous multilabeling sub-
ject to projection constraints. In: Proceedings of the 38th German Conference on Pattern Recognition
(2016)
342. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429
(2006)
343. Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat.
36(4), 1509–1533 (2008)