Institute of Computer Science
Academy of Sciences of the Czech Republic
Pod Vodárenskou věží 2, 182 07 Prague 8
phone: +420 2 688 42 44, fax: +420 2 858 57 89, e-mail: [email protected]
January 2018
Abstract:
This contribution contains the description and investigation of four numerical methods for solving generalized
minimax problems, which consist in the minimization of functions that are compositions of special smooth
convex functions with maxima of smooth functions (the most important problem of this type is the sum of
maxima of smooth functions). Section 1 is introductory. In Section 2, we study recursive quadratic programming
methods. This section also contains the description of the dual method for solving the corresponding quadratic
programming problems. Section 3 is devoted to primal interior point methods which use solutions of nonlinear
equations for obtaining minimax vectors. Section 4 contains an investigation of smoothing methods based on
exponential smoothing terms. Section 5 contains a short description of primal-dual interior point methods
based on the transformation of generalized minimax problems to general nonlinear programming problems. Finally,
the last section contains results of numerical experiments.
Keywords:
Numerical optimization, nonlinear approximation, nonsmooth optimization, generalized minimax problems,
recursive quadratic programming methods, interior point methods, smoothing methods, algorithms, numerical
experiments.
Contents
1 Generalized minimax problems
4 Smoothing methods
4.1 Basic properties
4.2 Global convergence
4.3 Special cases
6 Numerical experiments
References
1 Generalized minimax problems
In many practical problems we need to minimize functions that contain absolute values or pointwise maxima
of smooth functions. Such functions are nonsmooth but they often have a special structure enabling
the use of special methods that are more efficient than methods for minimization of general nonsmooth
functions. The classical minimax problem, where F (x) = max1≤k≤m fk (x), or problems where the function
to be minimized is a nonsmooth norm, e.g. F (x) = ∥f (x)∥∞ , F (x) = ∥f+ (x)∥∞ , F (x) = ∥f (x)∥1 ,
F (x) = ∥f+ (x)∥1 with f (x) = [f1 (x), . . . , fm (x)]T and f+ (x) = [max(f1 (x), 0), . . . , max(fm (x), 0)]T , are
typical examples. Such functions can be considered as special cases of more general functions, so it is
possible to formulate more general theories and construct more general numerical methods. One possibility
for generalization of the classical minimax problem consists in the use of the function

F(x) = \max_{1 \le k \le \bar{k}} p_k^T f(x), \qquad (1)

where p_k \in R^m, 1 \le k \le \bar{k}, are given vectors.

Remark 1.

(a) Setting p_k = e_k, where e_k is the k-th column of a unit matrix, and \bar{k} = m, we obtain F(x) = \max_{1 \le k \le m} f_k(x) (the classical minimax).

(b) Setting p_k = e_k, p_{m+k} = -e_k and \bar{k} = 2m, we obtain F(x) = \max_{1 \le k \le m} \max(f_k(x), -f_k(x)) = ∥f(x)∥_\infty.
Remark 2. Since the mapping f(x) is continuously differentiable, the function (1) is Lipschitz. Thus, if
the point x \in R^n is a local minimum of F(x), then 0 \in \partial F(x) holds [31, Theorem 3.2.5]. According to
[31, Theorem 3.2.13], one has

\partial F(x) = (\nabla f(x))^T \mathrm{conv}\{p_k : k \in \bar{I}(x)\},

where \bar{I}(x) = \{k \in \{1, \ldots, \bar{k}\} : p_k^T f(x) = F(x)\}. Thus, if the point x \in R^n is a local minimum of F(x),
then multipliers λ_k \ge 0, 1 \le k \le \bar{k}, exist such that λ_k (p_k^T f(x) - F(x)) = 0, 1 \le k \le \bar{k},

\sum_{k=1}^{\bar{k}} λ_k = 1 \quad \text{and} \quad \sum_{k=1}^{\bar{k}} λ_k J(x)^T p_k = 0,

where J(x) is the Jacobian matrix of the mapping f(x).
Remark 3. It is clear that a minimum of function (1) is a solution of a nonlinear programming problem
consisting in minimization of a function F̃ : R^{n+1} \to R, where F̃(x, z) = z, on the set

\{(x, z) \in R^{n+1} : c_k(x, z) = p_k^T f(x) - z \le 0, \ 1 \le k \le \bar{k}\}.

Obviously, a_k = \nabla c_k(x, z) = (p_k^T J(x), -1), 1 \le k \le \bar{k}, and \tilde{g} = \nabla F̃(x, z) = (0, 1), so the necessary KKT
conditions can be written in the form

\begin{bmatrix} 0 \\ 1 \end{bmatrix} + \sum_{k=1}^{\bar{k}} λ_k \begin{bmatrix} J^T(x) p_k \\ -1 \end{bmatrix} = 0, \qquad λ_k \ge 0, \qquad λ_k (p_k^T f(x) - z) = 0,

where λ_k \ge 0 are the Lagrange multipliers and z = F(x). Thus, we obtain the same
necessary conditions for an extremum as in Remark 2.
From the examples given in Remark 1 it follows that composite nondifferentiable functions of form (1) are not
suitable for representing the functions F(x) = ∥f(x)∥_1 and F(x) = ∥f_+(x)∥_1, because in this case the
expression on the right-hand side of (1) contains 2^m elements with vectors p_k, 1 \le k \le 2^m. In the
subsequent considerations, we will choose a somewhat different approach. We will consider generalized
minimax functions established in [6] and [26].
Definition 1. We say that F : R^n \to R is a generalized minimax function if

F(x) = h(F_1(x), \ldots, F_m(x)), \qquad F_k(x) = \max_{1 \le l \le m_k} f_{kl}(x), \quad 1 \le k \le m, \qquad (2)
Remark 5. Since the functions F_k(x), 1 \le k \le m, are regular [31, Theorem 3.2.13], the function h(z) is
continuously differentiable, and h_k = \partial h(z)/\partial z_k > 0, one can write [31, Theorem 3.2.9]

\partial F(x) = \mathrm{conv} \sum_{k=1}^{m} h_k \partial F_k(x) = \sum_{k=1}^{m} h_k \partial F_k(x) = \sum_{k=1}^{m} h_k \, \mathrm{conv}\{g_{kl} : l \in \bar{I}_k(x)\},

where \bar{I}_k(x) = \{1 \le l \le m_k : f_{kl}(x) = F_k(x)\}.
Remark 6. Unconstrained minimization of function (2) is equivalent to the nonlinear programming prob-
lem
minimize F̃ (x, z) = h(z) subject to fkl (x) ≤ zk , 1 ≤ k ≤ m, 1 ≤ l ≤ mk . (7)
Condition (3) is sufficient for satisfying the equalities z_k = F_k(x), 1 \le k \le m, at the minimum point. Denote
by a_{kl}(x, z) the gradients of the functions c_{kl}(x, z) = f_{kl}(x) - z_k. Obviously, a_{kl}(x, z) = (g_{kl}(x), -e_k), 1 \le k \le m,
1 \le l \le m_k, where g_{kl}(x) is the gradient of f_{kl}(x) at x and e_k is the k-th column of a unit matrix of order
m. Thus, the necessary first-order (KKT) conditions have the form

g(x, u) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl} = 0, \qquad \sum_{l=1}^{m_k} u_{kl} = h_k, \qquad h_k = \frac{\partial h(z)}{\partial z_k}, \quad 1 \le k \le m, \qquad (8)
Remark 8. Minimization of the sum of absolute values

F(x) = \sum_{k=1}^{m} |f_k(x)| = \sum_{k=1}^{m} \max(f_k^+(x), f_k^-(x)), \qquad f_k^+(x) = f_k(x), \quad f_k^-(x) = -f_k(x) \qquad (14)

leads to a nonlinear programming problem whose necessary KKT conditions contain the complementarity conditions

u_k^+ \ge 0, \quad z_k - f_k(x) \ge 0, \quad u_k^+ (z_k - f_k(x)) = 0, \quad 1 \le k \le m, \qquad (17)
u_k^- \ge 0, \quad z_k + f_k(x) \ge 0, \quad u_k^- (z_k + f_k(x)) = 0, \quad 1 \le k \le m. \qquad (18)

If we set u_k = u_k^+ - u_k^- and use the equality u_k^+ + u_k^- = 1, we obtain u_k^+ = (1 + u_k)/2, u_k^- = (1 - u_k)/2. From
the conditions u_k^+ \ge 0, u_k^- \ge 0 the inequalities -1 \le u_k \le 1, or |u_k| \le 1, follow. The condition u_k^+ + u_k^- = 1
implies that the numbers u_k^+, u_k^- cannot be simultaneously zero, so either z_k = f_k(x) or z_k = -f_k(x), that
is, z_k = |f_k(x)|. If f_k(x) \ne 0, it cannot simultaneously hold that z_k = f_k(x) and z_k = -f_k(x), so the numbers
u_k^+, u_k^- cannot be simultaneously nonzero. Then either u_k = u_k^+ = 1 and z_k = f_k(x), or u_k = -u_k^- = -1
and z_k = -f_k(x), that is, u_k = f_k(x)/|f_k(x)|. Thus, the necessary KKT conditions have the form

\sum_{k=1}^{m} g_k(x) u_k = 0, \qquad z_k = |f_k(x)|, \qquad |u_k| \le 1, \quad \text{and} \quad u_k = \frac{f_k(x)}{|f_k(x)|} \ \text{if} \ |f_k(x)| > 0. \qquad (19)
Remark 9. Minimization of the sum of absolute values can also be reformulated so that more slack
variables are used. We obtain the problem

\text{minimize} \quad \tilde{F}(x, z) = \sum_{k=1}^{m} (z_k^+ + z_k^-) \quad \text{subject to} \quad f_k(x) = z_k^+ - z_k^-, \quad z_k^+ \ge 0, \quad z_k^- \ge 0, \qquad (20)

where 1 \le k \le m. This problem contains m general equality constraints and 2m simple bounds for 2m
slack variables.
In the subsequent considerations, we will restrict ourselves to functions of the form (4), the sums of
maxima, which include most cases important for applications. In this case, it holds that

h(z) = \sum_{k=1}^{m} z_k, \qquad \nabla h(z) = \tilde{e}, \qquad \nabla^2 h(z) = 0, \qquad (21)

where \tilde{e} \in R^m is a vector with unit elements. The case when h(z) is a general function satisfying
Assumption X2 is studied in [26]. For simplicity, we will often use the notation vec(a, b) instead of
[a^T, b^T]^T \in R^{n+m}.
2 Recursive quadratic programming methods
2.1 Basic properties
Suppose the function h(z) is of form (21). In this case the necessary KKT conditions are of form (8)–(9),
where ∂h(z)/∂zk = 1, 1 ≤ k ≤ m. If we linearize these conditions in a neighborhood of a point x ∈ Rn ,
we can write for d ∈ Rn
\sum_{k=1}^{m} \sum_{l=1}^{m_k} (g_{kl}(x) + G_{kl}(x) d) u_{kl} = 0, \qquad \sum_{l=1}^{m_k} u_{kl} = 1, \quad 1 \le k \le m,

u_{kl} \ge 0, \qquad f_{kl}(x) + g_{kl}^T(x) d - z_k \le 0, \qquad u_{kl} (f_{kl}(x) + g_{kl}^T(x) d - z_k) = 0, \quad 1 \le k \le m, \ 1 \le l \le m_k.
But these are the necessary KKT conditions for solving a quadratic programming problem: minimize a
quadratic function
Q(d, z) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d, \qquad G = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl} \qquad (22)

on the set

C = \{(d, z) \in R^{n+m} : f_{kl}(x) + g_{kl}^T(x) d \le z_k, \ 1 \le k \le m, \ 1 \le l \le m_k\}. \qquad (23)
Note that coefficients ukl , 1 ≤ k ≤ m, 1 ≤ l ≤ mk , in (22) are old Lagrange multipliers. New Lagrange
multipliers along with new values of variables zk , 1 ≤ k ≤ m, are determined by solving quadratic
programming problem (22)–(23).
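As an illustration, the quadratic programming subproblem (22)–(23) can be assembled and solved with a generic convex solver. The following Python sketch uses the cvxpy modelling library; the names G, fk, and Ak are assumed inputs (the Lagrangian Hessian approximation, the vectors of values f_{kl}(x), and the matrices of gradients g_{kl}(x)), and the returned dual values play the role of the new Lagrange multipliers u_{kl}.

# Minimal sketch of subproblem (22)-(23), assuming G is positive definite.
# Hypothetical inputs: G (n x n), fk[k] (values f_{kl}(x), l = 1..m_k),
# Ak[k] (n x m_k matrix whose columns are the gradients g_{kl}(x)).
import numpy as np
import cvxpy as cp

def qp_subproblem(G, fk, Ak):
    n, m = G.shape[0], len(fk)
    d = cp.Variable(n)
    z = cp.Variable(m)
    objective = cp.Minimize(cp.sum(z) + 0.5 * cp.quad_form(d, G))
    constraints = [fk[k] + Ak[k].T @ d <= z[k] * np.ones(len(fk[k]))
                   for k in range(m)]                       # cf. (23)
    cp.Problem(objective, constraints).solve()
    u = [constraints[k].dual_value for k in range(m)]       # new multipliers
    return d.value, z.value, u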
For simplification, we will omit the argument x and use the notation

f_k = \begin{bmatrix} f_{k1}(x) \\ \vdots \\ f_{km_k}(x) \end{bmatrix}, \qquad u_k = \begin{bmatrix} u_{k1} \\ \vdots \\ u_{km_k} \end{bmatrix}, \qquad \tilde{e}_k = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad A_k = [g_{k1}(x), \ldots, g_{km_k}(x)].

Problem (22)–(23) will be written in the form

\text{minimize} \quad Q(d, z) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d \quad \text{subject to} \quad f_k + A_k^T d \le z_k \tilde{e}_k, \quad 1 \le k \le m, \qquad (24)
Quadratic programming problem (24) is convex, so there exists a dual problem stated in [32, Theorem 12.14]. We will use the notation

f = \begin{bmatrix} f_1 \\ \vdots \\ f_m \end{bmatrix}, \qquad u = \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix}, \qquad v = \begin{bmatrix} v_1 \\ \vdots \\ v_m \end{bmatrix}, \qquad w = \begin{bmatrix} w_1 \\ \vdots \\ w_m \end{bmatrix}, \qquad z = \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix}, \qquad A = [A_1, \ldots, A_m].
Theorem 1. Consider quadratic programming problem (24) with positive definite matrix G (so this problem
is convex). Then the dual problem can be written in the form
\text{minimize} \quad \tilde{Q}(u) = \frac{1}{2} u^T A^T H A u - f^T u \quad \text{subject to} \quad u_k \ge 0, \quad \tilde{e}_k^T u_k = 1, \quad 1 \le k \le m, \qquad (28)
where H = G−1 . Problem (28) is convex as well and the dual problem to this problem is primal problem
(24). If the pair (vec(d, z), u) is a KKT pair of the primal problem, then the pair (u, vec(v, w)), where
vk = −(ATk d + fk − zk ẽk ) and wk = zk , 1 ≤ k ≤ m, is a KKT pair of the dual problem. If the pair
(u, vec(v, w)) is a KKT pair of the dual problem, then the pair (vec(d, z), u), where d = −HAu and
zk = wk , 1 ≤ k ≤ m, is a KKT pair of the primal problem and ATk d + fk − zk ẽk = −vk , 1 ≤ k ≤ m.
Proof. The Lagrange function of primal problem (24) has the form

L(d, z, u) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d + \sum_{k=1}^{m} u_k^T (f_k + A_k^T d - z_k \tilde{e}_k). \qquad (29)

Stationarity of L(d, z, u) with respect to d and z gives

d = -H \sum_{k=1}^{m} A_k u_k = -H A u, \qquad \tilde{e}_k^T u_k = 1, \quad 1 \le k \le m. \qquad (31)

Substituting these relations into (29), we obtain

L(d, z, u) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d + u^T (f + A^T d) - \sum_{k=1}^{m} z_k = \frac{1}{2} u^T A^T H A u + u^T f - u^T A^T H A u = -\frac{1}{2} u^T A^T H A u + f^T u,
so maximization of the Lagrange function L(d, z, u) is equivalent to minimization of the function Q̃(u).
Since the matrix H is positive definite, problem (28) is convex and we can set up the dual problem
consisting in maximizing the Lagrange function
\tilde{L}(u, v, w) = \frac{1}{2} u^T A^T H A u - f^T u - \sum_{k=1}^{m} v_k^T u_k + \sum_{k=1}^{m} w_k (\tilde{e}_k^T u_k - 1) \qquad (32)
and
uk ≥ 0, ẽTk uk = 1, vk ≥ 0, vkT uk = 0, 1 ≤ k ≤ m.
If we set d = -H A u and substitute this relation into (33), we obtain

w_k \tilde{e}_k - v_k = f_k + A_k^T d, \quad 1 \le k \le m,
which along with uTk vk = 0 and uTk ẽk = 1 gives wk = uTk (fk + ATk d). Thus, it holds wk = zk , 1 ≤ k ≤ m,
by (27). If we substitute these equalities along with d = −HAu into (32), we can write
\tilde{L}(u, v, w) = \frac{1}{2} d^T G d - \sum_{k=1}^{m} f_k^T u_k + \sum_{k=1}^{m} (f_k + A_k^T d - z_k \tilde{e}_k)^T u_k = \frac{1}{2} d^T G d + \sum_{k=1}^{m} u_k^T A_k^T d - \sum_{k=1}^{m} z_k
= \frac{1}{2} d^T G d - d^T G d - \sum_{k=1}^{m} z_k = -\left( \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d \right),
so maximization of the Lagrange function L̃(u, v, w) is equivalent to minimization of the function Q(d, z).
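The correspondence between the primal and dual solutions described in Theorem 1 is easy to exercise numerically. The following sketch, with assumed inputs H = G^{-1}, the matrix A = [A_1, . . . , A_m], the vectors f_k, and the column index sets of the blocks, recovers d, z_k, and v_k from a dual solution u.

# Sketch of the primal-dual relations of Theorem 1: d = -HAu, z_k = w_k,
# v_k = -(A_k^T d + f_k - z_k e_k). Input names are hypothetical.
import numpy as np

def primal_from_dual(H, A, fk, blocks, u):
    d = -H @ (A @ u)                          # cf. (31)
    z, v = [], []
    for k, idx in enumerate(blocks):
        Ak, uk = A[:, idx], u[idx]
        zk = uk @ (fk[k] + Ak.T @ d)          # w_k = u_k^T (f_k + A_k^T d)
        z.append(zk)
        v.append(-(Ak.T @ d + fk[k] - zk))    # dual multipliers v_k
    return d, np.array(z), v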
The following theorem, which is a generalization of a similar theorem given in [15], shows that the
solution of quadratic programming problem (22)–(23) is a descent direction for the objective function
F (x).
Theorem 2. Let Assumption X3 be satisfied and let vectors d \in R^n, z \in R^m be a solution of quadratic
programming problem (22)–(23) with the positive definite matrix G and a corresponding vector of Lagrange
multipliers u. If d = 0, then the pair (vec(x, z), u) is a KKT pair of problem (7). If d \ne 0, then
F'(x, d) = d^T g(x, u) < 0, where F'(x, d) is the directional derivative of function (4) along the vector d at the
point x and g(x, u) is the vector given by (8). If κ(G) \le 1/ε_0^2, where κ(G) is the spectral condition number of
G, then d^T g(x, u) \le -ε_0 ∥d∥ ∥g(x, u)∥, and for an arbitrary number 0 < ε_1 < 1/2 there exists a steplength
0 < \bar{α} \le 1 such that

F(x + α d) - F(x) \le ε_1 α \, d^T g(x, u) \qquad (37)

if 0 < α \le \bar{α}.
Proof.
(a) If d = 0, then conditions (25)–(26) are equivalent to conditions (8)–(9). Thus, if a pair (vec(0, z), u)
is a KKT pair of problem (24), then the pair (vec(x, z), u) is a KKT pair of problem (7).
(b) Function (4) is a sum of maxima of differentiable functions, so it is regular by [31, Theorem 3.2.13]
and there exists a directional derivative
F'(x, d) = \lim_{α \downarrow 0} \frac{F(x + α d) - F(x)}{α} = \sum_{k=1}^{m} \lim_{α \downarrow 0} \frac{F_k(x + α d) - F_k(x)}{α}.

Let 0 < α \le 1 and l_k be indices such that f_{kl_k}(x + α d) = F_k(x + α d), 1 \le k \le m. Then by
Assumption X3 it holds that

f_{kl_k}(x + α d) \le f_{kl_k}(x) + α g_{kl_k}^T(x) d + \frac{1}{2} α^2 \overline{G} ∥d∥^2, \quad 1 \le k \le m.

Using the inequality 0 < α \le 1 and relations (25)–(26), we obtain

f_{kl_k} + α g_{kl_k}^T d \le f_{kl_k} + α (z_k - f_{kl_k}) = α z_k + (1 - α) f_{kl_k} \le α z_k + (1 - α) F_k
= F_k + α (z_k - F_k) = F_k + α u_k^T (z_k \tilde{e}_k - F_k \tilde{e}_k) \le F_k + α u_k^T (z_k \tilde{e}_k - f_k)
= F_k + α u_k^T (z_k \tilde{e}_k - f_k - A_k^T d) + α u_k^T A_k^T d = F_k + α u_k^T A_k^T d.
Thus, we can write

F'(x, d) = \sum_{k=1}^{m} \lim_{α \downarrow 0} \frac{F_k(x + α d) - F_k(x)}{α} \le \sum_{k=1}^{m} u_k^T A_k^T d = d^T g(x, u).

Since G d = -A u = -g(x, u), see (25), and the matrix G is positive definite, we have F'(x, d) =
d^T g(x, u) = -d^T G d < 0.

(c) If κ(G) \le 1/ε_0^2, then d^T g(x, u) \le -ε_0 ∥d∥ ∥g(x, u)∥, see [32, Section 3.2]. Since d = -G^{-1} g(x, u), it
holds that ∥d∥ \le ∥g(x, u)∥ / \underline{G}, which along with the previous inequality gives

∥d∥^2 \le \frac{1}{\underline{G}} ∥d∥ ∥g(x, u)∥ \le -\frac{1}{ε_0 \underline{G}} d^T g(x, u).
\frac{F(x + α d) - F(x)}{α} = \sum_{k=1}^{m} \frac{F_k(x + α d) - F_k(x)}{α} \le \sum_{k=1}^{m} \left( d^T A_k u_k + \frac{1}{2} α \overline{G} ∥d∥^2 \right)
= d^T g(x, u) + \frac{m}{2} α \overline{G} ∥d∥^2 \le \left( 1 - α \frac{m \overline{G}}{2 ε_0 \underline{G}} \right) d^T g(x, u),

so (37) holds if

1 - α \frac{m \overline{G}}{2 ε_0 \underline{G}} \ge ε_1 \quad \Rightarrow \quad α \le \frac{2 ε_0 (1 - ε_1) \underline{G}}{m \overline{G}} \stackrel{\Delta}{=} \bar{α}.
Remark 11. A number 0 < α \le 1 satisfying (37) can be determined using the Armijo steplength selection
[32, Section 3.1]. Then α is the first member satisfying (37) of the sequence α_j, j \in N, such that α_1 = 1 and
\underline{β} α_j \le α_{j+1} \le \overline{β} α_j, where 0 < \underline{β} \le \overline{β} < 1. At most int(\log \bar{α} / \log \overline{β} + 1) steps are used, where int(t) is the
largest integer such that int(t) \le t, and α \ge \underline{β} \bar{α} holds. Substituting this inequality into (37), we obtain

F(x + α d) - F(x) \le ε_1 \underline{β} \bar{α} \, d^T g(x, u) \le -ε_0 ε_1 \underline{β} \bar{α} ∥d∥ ∥g(x, u)∥ \le -\frac{ε_0 ε_1 \underline{β} \bar{α}}{\overline{G}} ∥g(x, u)∥^2 \stackrel{\Delta}{=} -c ∥g(x, u)∥^2. \qquad (39)
Note that by (31) we have

d = -H A u, \qquad (42)

which after substituting into (41) gives

\tilde{Q}(u) = \frac{1}{2} d^T G d - f^T u. \qquad (43)
The solution of primal problem (40) can be obtained by an efficient method for solving dual problem
(41), as described in [19]. Let K \subset I = \{1, \ldots, m\} be a set of indices such that u_k = 0 if k \notin K
and v_k = 0 if k \in K, where -v_k = f_k + g_k^T d - z, k \in K, are the values of the constraints of the primal
problem (v_k, k \in K, are the Lagrange multipliers of the dual problem). To simplify the notation, we denote
by u = [u_k, k \in K], v = [v_k, k \in K] the vectors whose elements are the Lagrange multipliers with indices belonging
to K, so the dimensions of these vectors are equal to the number of indices in K (we use similar notation for the
vectors f, \tilde{e} and for the columns of the matrix A). Note that multipliers u_k and v_k, k \notin K, exist, but they are not
elements of the vectors u and v. Using (31), (34), and \tilde{e}^T u = 1, we can determine the elements u_k, k \in K, of
the vector u and the variable z. Setting v_k = 0, k \in K, we obtain

A^T H A u = f - \tilde{e} z, \qquad (44)

so

u = (A^T H A)^{-1} (f - \tilde{e} z), \qquad (45)

and since

\tilde{e}^T u = \tilde{e}^T (A^T H A)^{-1} (f - \tilde{e} z) = 1,

we obtain

z = \frac{\tilde{e}^T (A^T H A)^{-1} f - 1}{\tilde{e}^T (A^T H A)^{-1} \tilde{e}}. \qquad (46)
Definition 2. The set of indices K ⊂ I such that uk = 0, k ̸∈ K, and vk = 0, k ∈ K, is called the set of
active constraint indices (active set for short) of the primal problem. If uk ≥ 0, k ∈ K, then we say that
K is an acceptable active set of the primal problem.
Remark 12. Formulas (45)–(46) cannot be used if A has linearly dependent columns. This may happen even
if the Jacobian matrix [A^T, -\tilde{e}] has full rank. In order not to investigate this singular case separately, we
will use the matrices

\tilde{A} = \begin{bmatrix} A \\ -\tilde{e}^T \end{bmatrix}, \qquad \tilde{H} = \begin{bmatrix} H & 0 \\ 0 & μ \end{bmatrix}, \qquad (47)

where μ > 0. Then \tilde{A}^T \tilde{H} \tilde{A} = A^T H A + μ \tilde{e} \tilde{e}^T, so by (44) it holds that

u = C f - (z - μ) p, \qquad z = μ + \frac{p^T f - 1}{p^T \tilde{e}}, \qquad (49)

where C = (\tilde{A}^T \tilde{H} \tilde{A})^{-1} and p = C \tilde{e}. The value μ > 0 should be comparable with the elements of the matrix H.
The choice μ = 1 is usually suitable.
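A minimal Python sketch of formulas (47)–(49), assuming the active columns of A, the corresponding values f, the matrix H, and the regularization parameter μ are available as numpy arrays, reads as follows.

# Sketch of (47)-(49): multipliers u and variable z for a given active set.
import numpy as np

def active_set_solution(H, A, f, mu=1.0):
    e = np.ones(A.shape[1])
    C = np.linalg.inv(A.T @ H @ A + mu * np.outer(e, e))  # (A~^T H~ A~)^{-1}
    p = C @ e
    z = mu + (p @ f - 1.0) / (p @ e)          # cf. (49)
    u = C @ f - (z - mu) * p                  # cf. (49)
    return u, z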
The active constraint method for solving dual problem (41) introduced in [19] is based on generating a
sequence of acceptable active sets of primal problem (40). An initial acceptable active set is determined
by choosing an arbitrary index k \in I and setting K = \{k\}, so u_k = 1 and u_l = 0 if l \in I and l \ne k.
At each step we first test whether the necessary (in the convex case also sufficient) KKT conditions are satisfied. If
v_l < 0 for some index l \notin K, then we try to remove the active constraint of the dual problem by considering
the set K^+ = K \cup \{l\}. However, this set need not be acceptable (it may hold that u_k < 0 for some index k \in K^+).
Therefore, we need to remove some active constraints of the primal problem in advance, that is, to construct
an acceptable set \bar{K} \subset K such that the set K^+ = \bar{K} \cup \{l\} is acceptable as well. For this reason we will
change the constraint of the primal problem with index l into -v_l(λ) = A_l^T d + f_l - z + (1 - λ) v_l \le 0
(the parameter λ is introduced as an argument), so -v_l(0) = A_l^T d + f_l - z + v_l = 0 and u_k(0) \ge 0 if k \in K \cup \{l\}.
In the subsequent considerations we will use the notation a_l = g_l.
Lemma 1. Let K be an acceptable active set of primal problem (40) and v_l < 0. Suppose that the vector
\tilde{a}_l = [a_l^T, -1]^T is not a linear combination of the columns of the matrix \tilde{A}, and denote p = C \tilde{e}, q_l = C \tilde{A}^T \tilde{H} \tilde{a}_l,
β_l = 1 - \tilde{e}^T q_l, γ_l = β_l / \tilde{e}^T p, and δ_l = \tilde{a}_l^T (\tilde{H} - \tilde{H} \tilde{A} C \tilde{A}^T \tilde{H}) \tilde{a}_l = \tilde{a}_l^T \tilde{H} (\tilde{a}_l - \tilde{A} q_l), where C = (\tilde{A}^T \tilde{H} \tilde{A})^{-1}, so
δ_l > 0. Then

u(λ) = u(0) - α (q_l + γ_l p), \qquad u_l(λ) = u_l(0) + α, \qquad z(λ) = z(0) + α γ_l, \qquad (50)
The inverse matrix of this system can be expressed by means of the inverse matrix C = (\tilde{A}^T \tilde{H} \tilde{A})^{-1}, which gives

\begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix} = -\begin{bmatrix} C + q_l q_l^T / δ_l & -q_l / δ_l \\ -q_l^T / δ_l & 1 / δ_l \end{bmatrix} \begin{bmatrix} (z(λ) - z(0)) \tilde{e} \\ λ v_l + (z(λ) - z(0)) \end{bmatrix}
= -\begin{bmatrix} \left( p - \frac{β_l}{δ_l} q_l \right)(z(λ) - z(0)) - \frac{λ v_l}{δ_l} q_l \\ \frac{β_l}{δ_l}(z(λ) - z(0)) + \frac{λ v_l}{δ_l} \end{bmatrix}. \qquad (52)
that is,

z(λ) - z(0) = -\frac{β_l}{δ_l} \, \frac{δ_l}{δ_l \tilde{e}^T p + β_l^2} \, λ v_l = -γ_l \frac{λ v_l}{δ_l + β_l γ_l} = α γ_l, \qquad (54)

which is the last equality in (50). Substituting (54) into (52) and performing formal rearrangements, we
obtain the remaining equalities in (50).
Remark 13. Using (47) we obtain

\begin{bmatrix} \tilde{A}^T \tilde{H} \tilde{A} & \tilde{A}^T \tilde{H} \tilde{a}_l \\ \tilde{a}_l^T \tilde{H} \tilde{A} & \tilde{a}_l^T \tilde{H} \tilde{a}_l \end{bmatrix} = \begin{bmatrix} A^T H A & A^T H a_l \\ a_l^T H A & a_l^T H a_l \end{bmatrix} + μ \begin{bmatrix} \tilde{e} \tilde{e}^T & \tilde{e} \\ \tilde{e}^T & 1 \end{bmatrix}, \qquad (55)

so

q_l = C \tilde{A}^T \tilde{H} \tilde{a}_l = C A^T H a_l + μ p, \qquad (56)
δ_l = \tilde{a}_l^T \tilde{H} (\tilde{a}_l - \tilde{A} q_l) = a_l^T H (a_l - A q_l) + μ β_l. \qquad (57)

Note that β_l γ_l + δ_l = β_l^2 / \tilde{e}^T C \tilde{e} + δ_l = 0 if and only if β_l = γ_l = δ_l = 0.
Proof.
(a) Using (42) and (50) we can write

A^T H A q_l = \tilde{A}^T \tilde{H} \tilde{A} q_l - μ \tilde{e} \tilde{e}^T q_l = A^T H a_l + μ \tilde{e} - μ \tilde{e} \tilde{e}^T q_l = A^T H a_l + μ β_l \tilde{e}, \qquad A^T H A p = \tilde{A}^T \tilde{H} \tilde{A} p - μ \tilde{e} \tilde{e}^T p = \tilde{e} - μ \tilde{e} \tilde{e}^T p,
(because ẽT u(λ) + ul (λ) = 1 for λ ≥ 0). Since z(λ) − z(0) = αγl , using (60)–(61) we can write
(b) Using (51), (53), and (55) we obtain

(d(λ) - d(0))^T G d(0) = \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix}^T \begin{bmatrix} A^T H A & A^T H a_l \\ a_l^T H A & a_l^T H a_l \end{bmatrix} \begin{bmatrix} u(0) \\ u_l(0) \end{bmatrix} = \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix}^T \begin{bmatrix} f - z(0) \tilde{e} \\ f_l + v_l - z(0) \end{bmatrix}
= (u(λ) - u(0))^T f + (u_l(λ) - u_l(0)) f_l + (u_l(λ) - u_l(0)) v_l,
α_1 = -\frac{v_l}{β_l γ_l + δ_l}, \qquad α_2 = \frac{u_j(0)}{e_j^T (q_l + γ_l p)} \stackrel{\Delta}{=} \min_{k \in \tilde{I}} \frac{u_k(0)}{e_k^T (q_l + γ_l p)}, \qquad (62)
Algorithm 1. Dual method of active constraints
Step 1 Choose an arbitrary index 1 ≤ l ≤ m (e.g. l = 1) and a number μ (e.g. μ = 1). Set K := {l},
u := [1], ẽ := [1], A := [a_l], R := [a_l^T H a_l + μ]^{1/2}. Compute the number z := f_l − a_l^T H a_l. Set v_l := 0
and u_k := 0 for k ∉ K.
Step 2 Compute a vector d := −HAu (formula (42)), set vk := z − (aTk d + fk ) for k ̸∈ K and determine
an index l ̸∈ K such that vl = mink̸∈K vk . If vl ≥ 0, terminate the computation (a pair (d, z) ∈ Rn+1
is a solution of primal problem (40) and a vector u is a solution of dual problem (41)).
Step 3 Determine the vector p by solving the system of equations RT Rp = ẽ and the vector ql by solving
the system of equations RT Rql = ÃT H̃ãl . Set βl := 1 − ẽT ql , γl := βl /ẽT p, δl := ãTl H̃(ãl − Ãql )
(Remark 16). Compute numbers α1 , α2 defined in Remark 14 and set α := min(α1 , α2 ). If α = ∞,
terminate the computation (the primal problem has no feasible solution and the dual problem has no
optimal solution). If α < ∞, set u := u − α(ql + γl p), ul := ul + α, z := z + αγl , vl := (1 − α/α1 )vl .
Step 4 If α = α1 , set K := K ∪{l}, u := u+ , f := f + , ẽ := ẽ+ , A := A+ , R := R+ , where u+ = [uT , ul ]T ,
f + = [f T , fl ]T , ẽ+ = [ẽT , 1]T and A+ , R+ are matrices defined in Remark 16. Go to Step 2.
Step 5 If α ̸= α1 , set K := K \ {j}, u := u− , f := f − , ẽ := ẽ− , A := A− , R := R− , where j ∈ K is an
index determined by (62), vectors u− , f − , ẽ− result from vectors u, f , ẽ by removing the element
with index j and A− , R− are matrices defined in Remark 16. Go to Step 3.
Remark 16. The vector q_l and the number δ_l used in Step 3 of Algorithm 1 can be computed by solving
the two systems of equations R^T r_l = \tilde{A}^T \tilde{H} \tilde{a}_l = A^T H a_l + μ \tilde{e} and R q_l = r_l with triangular matrices R^T
and R, and setting δ_l = ρ_l^2, where ρ_l^2 = a_l^T H a_l - r_l^T r_l. Then, in Step 4 it holds that

A^+ = [A, a_l], \qquad R^+ = \begin{bmatrix} R & r_l \\ 0 & ρ_l \end{bmatrix}.

In Step 5 we determine a permutation matrix Π such that A Π = [A^-, a_j] and R Π is an upper Hessenberg
matrix. Furthermore, we determine an orthogonal matrix Q such that the matrix Q R Π is upper triangular.
Then

Q R Π = \begin{bmatrix} R^- & r_j \\ 0 & ρ_j \end{bmatrix}

holds. The derivation of these relations can be found in [19].
Theorem 3. After a finite number of steps of Algorithm 1, either a solution of problems (40) and (41) is
found or the fact that these problems have no solution is detected.
Proof. Algorithm 1 generates a sequence of active sets Kji , ji ∈ N , of the primal problem, where the set
Kj1 , j1 = 1, is acceptable. Suppose that the set Kji , ji ∈ N , is acceptable. By Remark 15 and Lemma 3,
after at most m steps, we either find out that problems (40) and (41) have no solution or obtain an
acceptable set Kji+1 where ji+1 − ji ≤ m and where Q(dji+1 , zji+1 ) > Q(dji , zji ). Thus, the sets Kji+1 and
Kji are different and since the number of different subsets of the set {1, . . . , m} is finite, the computation
must terminate after a finite number of steps.
3 Primal interior point methods

A barrier function, depending on a barrier parameter 0 < μ ≤ \bar{μ} < ∞, is successively minimized on R^n (without any constraints), where
μ → 0. Applying this approach to problem (7), we obtain the barrier function
B_μ(x, z) = h(z) + μ \sum_{k=1}^{m} \sum_{l=1}^{m_k} φ(z_k - f_{kl}(x)), \qquad 0 < μ \le \bar{μ}, \qquad (63)
The logarithmic barrier

φ(t) = -\log t \qquad (66)

is most frequently used. It satisfies Assumption B1 with c = 1 and τ = ∞, but it is not bounded from below
since \log t → ∞ for t → ∞. For that reason, barriers bounded from below are sometimes used, e.g. the function

φ(t) = \log(t^{-1} + τ^{-1}) = -\log \frac{t τ}{t + τ}, \qquad (67)

which is bounded from below by the number \underline{φ} = -\log τ, or a function
which is bounded from below by the number \underline{φ} = c = -\log τ - 3. The coefficients a, b, c are chosen so that
the function φ(t) as well as its first and second derivatives are continuous at t = τ. All these barriers satisfy
Assumption B1 [26] (the proof of this statement is trivial for logarithmic barrier (66)).

Even though barriers (67)–(69), which are bounded from below, have more advantageous theoretical properties (Assumption X1a can be replaced with the weaker Assumption X1b), algorithms using the logarithmic barrier
(66) are usually more efficient. Therefore, we will only deal with methods using the logarithmic barrier
φ(t) = -\log t in the subsequent considerations.
Further, we will denote by g_{kl}(x) and G_{kl}(x) the gradients and Hessian matrices of the functions f_{kl}(x), 1 ≤ k ≤ m,
1 ≤ l ≤ m_k, and set

u_{kl}(x, z) = \frac{μ}{z_k - f_{kl}(x)} \ge 0, \qquad v_{kl}(x, z) = \frac{μ}{(z_k - f_{kl}(x))^2} = \frac{1}{μ} u_{kl}^2(x, z) \ge 0, \qquad (71)

u_k(x, z) = \begin{bmatrix} u_{k1}(x, z) \\ \vdots \\ u_{km_k}(x, z) \end{bmatrix}, \qquad v_k(x, z) = \begin{bmatrix} v_{k1}(x, z) \\ \vdots \\ v_{km_k}(x, z) \end{bmatrix}, \qquad \tilde{e}_k = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}.
Denoting by g(x, z) the gradient of the function B_μ(x, z) with respect to x and γ_k(x, z) = \partial B_μ(x, z)/\partial z_k, the necessary
conditions for an extremum of barrier function (63) can be written in the form

g(x, z) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl}(x, z) = \sum_{k=1}^{m} A_k(x) u_k(x, z) = 0, \qquad (72)

γ_k(x, z) = 1 - \sum_{l=1}^{m_k} u_{kl}(x, z) = 1 - \tilde{e}_k^T u_k(x, z) = 0, \quad 1 \le k \le m, \qquad (73)
where Ak (x) = [gk1 (x), . . . , gkmk (x)], which is a system of n + m nonlinear equations for unknown vectors
x and z. These equations can be solved by the Newton method. In this case, the second derivatives of the
Lagrange function (which are the first derivatives of expressions (72) and (73)) are computed. Denoting
G(x, z) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl}(x, z), \qquad (74)

we obtain

\frac{\partial g(x, z)}{\partial z_k} = -\sum_{l=1}^{m_k} g_{kl}(x) v_{kl}(x, z) = -A_k(x) v_k(x, z), \qquad (76)

\frac{\partial γ_k(x, z)}{\partial x} = -\sum_{l=1}^{m_k} v_{kl}(x, z) g_{kl}^T(x) = -v_k^T(x, z) A_k^T(x), \qquad (77)

\frac{\partial γ_k(x, z)}{\partial z_k} = \sum_{l=1}^{m_k} v_{kl}(x, z) = \tilde{e}_k^T v_k(x, z). \qquad (78)
Using these formulas, we obtain a system of linear equations describing a step of the Newton method:

\begin{bmatrix} W(x, z) & -A_1(x) v_1(x, z) & \ldots & -A_m(x) v_m(x, z) \\ -v_1^T(x, z) A_1^T(x) & \tilde{e}_1^T v_1(x, z) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -v_m^T(x, z) A_m^T(x) & 0 & \ldots & \tilde{e}_m^T v_m(x, z) \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta z_1 \\ \vdots \\ \Delta z_m \end{bmatrix} = -\begin{bmatrix} g(x, z) \\ γ_1(x, z) \\ \vdots \\ γ_m(x, z) \end{bmatrix}, \qquad (79)

where

W(x, z) = G(x, z) + \sum_{k=1}^{m} A_k(x) V_k(x, z) A_k^T(x). \qquad (80)

Setting

C(x, z) = [A_1(x) v_1(x, z), \ldots, A_m(x) v_m(x, z)], \qquad D(x, z) = \mathrm{diag}(\tilde{e}_1^T v_1(x, z), \ldots, \tilde{e}_m^T v_m(x, z))
and γ(x, z) = [γ1 (x, z), . . . , γm (x, z)]T , a step of the Newton method can be written in the form
\begin{bmatrix} W(x, z) & -C(x, z) \\ -C^T(x, z) & D(x, z) \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta z \end{bmatrix} = -\begin{bmatrix} g(x, z) \\ γ(x, z) \end{bmatrix}. \qquad (81)
The diagonal matrix D(x, z) is positive definite since it has positive diagonal elements.
Remark 18. If the number m is small (as in case of a classical minimax problem, where m = 1), we will
use the expression
\begin{bmatrix} W & -C \\ -C^T & D \end{bmatrix}^{-1} = \begin{bmatrix} W^{-1} - W^{-1} C (C^T W^{-1} C - D)^{-1} C^T W^{-1} & -W^{-1} C (C^T W^{-1} C - D)^{-1} \\ -(C^T W^{-1} C - D)^{-1} C^T W^{-1} & -(C^T W^{-1} C - D)^{-1} \end{bmatrix}.
We suppose that the matrix W is regular (otherwise, it can be regularized e.g. by the Gill-Murray decom-
position [11]). Then, a solution of system of equations (81) can be computed by
∆z = (C T W −1 C − D)−1 (C T W −1 g + γ), (82)
∆x = W −1 (C∆z − g). (83)
In this case, a large matrix W of order n, which is sparse if G(x, z) is sparse, and a small dense matrix
C T W −1 C − D of order m are decomposed.
Remark 19. If the numbers m_k, 1 ≤ k ≤ m, are small (as in the case of a sum of absolute values, where
m_k = 2, 1 ≤ k ≤ m), the matrix W(x, z) - C(x, z) D^{-1}(x, z) C^T(x, z) is sparse. Thus, we can use the
expression

\begin{bmatrix} W & -C \\ -C^T & D \end{bmatrix}^{-1} = \begin{bmatrix} (W - C D^{-1} C^T)^{-1} & (W - C D^{-1} C^T)^{-1} C D^{-1} \\ D^{-1} C^T (W - C D^{-1} C^T)^{-1} & D^{-1} + D^{-1} C^T (W - C D^{-1} C^T)^{-1} C D^{-1} \end{bmatrix}.

Then, a solution of the system of equations (81) can be computed by

\Delta x = -(W - C D^{-1} C^T)^{-1} (g + C D^{-1} γ), \qquad (84)
\Delta z = D^{-1} (C^T \Delta x - γ). \qquad (85)

In this case, a large matrix W - C D^{-1} C^T of order n, which is usually sparse if G(x, z) is sparse, is
decomposed. The inversion of the diagonal matrix D of order m causes no problems.
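The elimination described in Remark 19 translates directly into code. The following sketch, assuming W, C, the diagonal of D, and the right-hand sides g, γ are given as numpy arrays, computes the Newton step by formulas (84)–(85).

# Sketch of the Newton step (84)-(85) via the Schur complement W - C D^{-1} C^T.
import numpy as np

def newton_step(W, C, Ddiag, g, gamma):
    S = W - (C / Ddiag) @ C.T                            # W - C D^{-1} C^T
    dx = -np.linalg.solve(S, g + C @ (gamma / Ddiag))    # (84)
    dz = (C.T @ dx - gamma) / Ddiag                      # (85)
    return dx, dz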
During iterative determination of a minimax vector we know a value of the parameter µ and vectors
x ∈ Rn , z ∈ Rm such that zk > Fk (x), 1 ≤ k ≤ m. Using formulas (82)–(83) or (84)–(85) we determine
direction vectors ∆x, ∆z. Then, we choose a steplength α so that
Bµ (x + α∆x, z + α∆z) < Bµ (x, z) (86)
and zk + α∆zk > Fk (x + α∆x), 1 ≤ k ≤ m. Finally, we set x+ = x + α∆x, z+ = z + α∆z and determine a
new value µ+ < µ. If the matrix of system of equations (81) is positive definite, inequality (86) is satisfied
for a sufficiently small value of the steplength α.
Theorem 4. Let the matrix G(x, z) given by (74) be positive definite. Then the matrix of system of
equations (81) is positive definite.
Proof. The matrix of system of equations (81) is positive definite if and only if the matrix D and its Schur
complement W − CD−1 C T are positive definite [8, Theorem 2.5.6]. The matrix D is positive definite since
it has positive diagonal elements. Further, it holds
W - C D^{-1} C^T = G + \sum_{k=1}^{m} \left( A_k V_k A_k^T - A_k V_k \tilde{e}_k (\tilde{e}_k^T V_k \tilde{e}_k)^{-1} (A_k V_k \tilde{e}_k)^T \right),
3.3 Direct determination of a minimax vector
Now we will show how to solve system of equations (72)–(73) by direct determination of a minimax vector
using two-level optimization
z(x; μ) = \arg\min_{z \in R^m} B_μ(x, z), \qquad (87)

Problem (87) serves for the determination of an optimal vector z(x; μ) ∈ R^m. Let B̃_μ(z) = B_μ(x, z) for a fixed
chosen vector x ∈ R^n. The function B̃_μ(z) is strictly convex (as a function of the vector z), since it is a
sum of the convex function (21) and the strictly convex functions -μ \log(z_k - f_{kl}(x)), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k. A
minimum of the function B̃_μ(z) is its stationary point, so it is a solution of the system of equations (73) with
Lagrange multipliers (71). The following theorem shows that this solution exists and is unique.
Theorem 5. The function B̃_μ(z) : (F(x), ∞) → R has a unique stationary point, which is its global
minimum. This stationary point is characterized by the system of equations γ(x, z) = 0, or

1 - \tilde{e}_k^T u_k = 1 - \sum_{l=1}^{m_k} \frac{μ}{z_k - f_{kl}(x)} = 0, \quad 1 \le k \le m, \qquad (89)

whose solution satisfies

F_k(x) + μ < z_k(x; μ) < F_k(x) + m_k μ \qquad (90)

for 1 ≤ k ≤ m.
Proof. Definition 1 implies f_{kl}(x) ≤ F_k(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, where equality occurs for at least
one index l.

(a) If (89) holds, then we can write

1 = \sum_{l=1}^{m_k} \frac{μ}{z_k - f_{kl}(x)} > \frac{μ}{z_k - F_k(x)} \ \Leftrightarrow \ z_k - F_k(x) > μ, \qquad
1 = \sum_{l=1}^{m_k} \frac{μ}{z_k - f_{kl}(x)} < \frac{m_k μ}{z_k - F_k(x)} \ \Leftrightarrow \ z_k - F_k(x) < m_k μ,

so any stationary point satisfies the localization inequalities (90).

(b) Since

γ_k(x, F_k + μ) = 1 - \sum_{l=1}^{m_k} \frac{μ}{μ + F_k(x) - f_{kl}(x)} < 1 - \frac{μ}{μ} = 0, \qquad
γ_k(x, F_k + m_k μ) = 1 - \sum_{l=1}^{m_k} \frac{μ}{m_k μ + F_k(x) - f_{kl}(x)} > 1 - \frac{m_k μ}{m_k μ} = 0,

and the function γ_k(x, z_k) is continuous and increasing for F_k(x) + μ < z_k < F_k(x) + m_k μ by
(78), the equation γ_k(x, z_k) = 0 has a unique solution in this interval. Since the function B̃_μ(z) is
convex, this solution corresponds to its global minimum.
System (89) is a system of m scalar equations with localization inequalities (90). These scalar equations
can be efficiently solved by robust methods described e.g. in [16] and [17] (details are stated in [25]).
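For illustration, one scalar equation of system (89) can be solved by plain bisection on the localization interval (90); this is only a simple stand-in for the robust methods of [16] and [17]. The input fkl below is an assumed array holding the values f_{kl}(x), l = 1, . . . , m_k.

# Bisection sketch for (89): gamma_k(z) = 1 - sum_l mu/(z - f_kl) = 0,
# with gamma_k continuous and increasing on (F_k + mu, F_k + m_k mu).
import numpy as np

def solve_zk(fkl, mu, tol=1e-12, itmax=200):
    fkl = np.asarray(fkl, dtype=float)
    Fk = fkl.max()
    lo, hi = Fk + mu, Fk + len(fkl) * mu      # localization interval (90)
    gamma = lambda z: 1.0 - np.sum(mu / (z - fkl))
    for _ in range(itmax):
        mid = 0.5 * (lo + hi)
        if gamma(mid) < 0.0:
            lo = mid                          # root lies above mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, abs(hi)):
            break
    zk = 0.5 * (lo + hi)
    return zk, mu / (zk - fkl)                # z_k(x; mu) and multipliers (71)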
Suppose that z = z(x; μ) and denote

B(x; μ) = \sum_{k=1}^{m} z_k(x; μ) - μ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x)). \qquad (91)

Theorem 6. It holds that

\nabla B(x; μ) = \sum_{k=1}^{m} A_k(x) u_k(x; μ), \qquad (92)

\nabla^2 B(x; μ) = W(x; μ) - C(x; μ) D^{-1}(x; μ) C^T(x; μ), \qquad (93)

where W(x; μ) = W(x, z(x; μ)), G(x; μ) = G(x, z(x; μ)), C(x; μ) = C(x, z(x; μ)), D(x; μ) = D(x, z(x; μ))
and U_k(x; μ) = U_k(x, z(x; μ)), V_k(x; μ) = V_k(x, z(x; μ)) = U_k^2(x; μ)/μ, 1 ≤ k ≤ m. A solution of the equation

\nabla^2 B(x; μ) \Delta x = -\nabla B(x; μ) \qquad (94)

is identical with the vector ∆x given by (84), where z = z(x; μ) (so γ(x, z(x; μ)) = 0).
Proof. Differentiating barrier function (91) and using (73), we obtain

\nabla B(x; μ) = \sum_{k=1}^{m} \frac{\partial z_k(x; μ)}{\partial x} - \sum_{k=1}^{m} \sum_{l=1}^{m_k} u_{kl}(x; μ) \left( \frac{\partial z_k(x; μ)}{\partial x} - \frac{\partial f_{kl}(x)}{\partial x} \right)
= \sum_{k=1}^{m} \left( 1 - \sum_{l=1}^{m_k} u_{kl}(x; μ) \right) \frac{\partial z_k(x; μ)}{\partial x} + \sum_{k=1}^{m} \sum_{l=1}^{m_k} u_{kl}(x; μ) \frac{\partial f_{kl}(x)}{\partial x}
= \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl}(x; μ) = \sum_{k=1}^{m} A_k(x) u_k(x; μ),

where

u_{kl}(x; μ) = \frac{μ}{z_k(x; μ) - f_{kl}(x)}, \quad 1 \le k \le m, \ 1 \le l \le m_k. \qquad (95)
Formula (93) can be obtained by additional differentiation of relations (73) and (92) using (95). A simpler
way is based on using (84). Since (73) implies γ(x, z(x; μ)) = 0, we can substitute γ = 0 into (84), which
yields the equation

\left( W(x, z) - C(x, z) D^{-1}(x, z) C^T(x, z) \right) \Delta x = -g(x, z),

where z = z(x; μ), which confirms the validity of formulas (93) and (94) (details can be found in [25]).
Remark 20. To determine the inverse of the Hessian matrix, one can use the Woodbury formula [8, Theorem 12.1.4].
If the matrix ∇²B(x; μ) is not positive definite, it can be replaced by a matrix L L^T = ∇²B(x; μ) + E
obtained by the Gill–Murray decomposition [11]. Note that it is more advantageous to use the system of linear
equations (81) instead of (94) for the determination of a direction vector ∆x, because the system of nonlinear
equations (89) is solved with a prescribed finite precision, and thus the vector γ(x, z), defined by (73), need not
be zero.
From

V_k(x; μ) = \frac{1}{μ} U_k^2(x; μ), \qquad u_k(x; μ) \ge 0, \qquad \tilde{e}_k^T u_k(x; μ) = 1, \quad 1 \le k \le m,

it follows that ∥V_k(x; μ)∥ → ∞ if μ → 0, so Hessian matrix (93) may be ill-conditioned if the value μ is
very small. For this reason, we use a lower bound \underline{μ} > 0 for μ.

Theorem 7. Let Assumption X3 be satisfied and let μ ≥ \underline{μ} > 0. If the matrix G(x; μ) is uniformly positive
definite (i.e., there exists a constant \underline{G} such that v^T G(x; μ) v ≥ \underline{G} ∥v∥^2), then there exists a number \bar{κ} ≥ 1 such
that κ(∇²B(x; μ)) ≤ \bar{κ}.
Proof.

(a) Using (71), (93), and Assumption X3, we obtain

∥\nabla^2 B(x; μ)∥ \le ∥G(x; μ)∥ + \left\| \sum_{k=1}^{m} A_k(x) V_k(x; μ) A_k^T(x) \right\|
\le \sum_{k=1}^{m} \sum_{l=1}^{m_k} \left( ∥G_{kl}(x) u_{kl}(x; μ)∥ + \frac{1}{μ} u_{kl}^2(x; μ) ∥g_{kl}(x) g_{kl}^T(x)∥ \right)
\le \frac{m}{μ} \left( \bar{μ} \overline{G} + \overline{g}^2 \right) \stackrel{\Delta}{=} \frac{\bar{c}}{μ} \le \frac{\bar{c}}{\underline{μ}} \qquad (97)
Remark 21. If there exists a number \bar{κ} > 0 such that κ(∇²B(x_i; μ_i)) ≤ \bar{κ}, i ∈ N, the direction vector
∆x_i, given by solving the system of equations ∇²B(x_i; μ_i) ∆x_i = -∇B(x_i; μ_i), satisfies condition (99),
so there exists a number c > 0 such that inequality (101) holds (see [32, Section 3.2]).
If Assumption X3 is not satisfied, then only (∆x_i)^T g(x_i; μ_i) < 0 holds (because the matrix ∇²B(x; μ) is
positive definite by Theorem 4).
3.4 Implementation
Remark 22. In (80), it is assumed that G(x, z) is the Hessian matrix of the Lagrange function. Direct
computation of the matrix G(x; µ) = G(x, z(x; µ)) is usually difficult (one can use automatic differentiation
as described in [14]). Thus, various approximations G ≈ G(x; µ) are mostly used.
• If the problem is separable (i.e. f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, are functions of a small number
n_{kl} = O(1) of variables), one can set, as in [13],

G = \sum_{k=1}^{m} \sum_{l=1}^{m_k} Z_{kl} \hat{G}_{kl} Z_{kl}^T u_{kl}(x, z),

where the reduced Hessian matrices \hat{G}_{kl} are updated using the reduced vectors \hat{d}_{kl} = Z_{kl}^T (x_+ - x) and
\hat{y}_{kl} = Z_{kl}^T (g_{kl}(x_+) - g_{kl}(x)).
Remark 23. The matrix G ≈ G(x; μ) obtained by the approach stated in Remark 22 can be ill-conditioned,
so condition (99) (with a chosen value ε_0 > 0) may not be satisfied. In this case it is possible to restart
the iteration process and set G = I. Then \underline{G} = 1 and \overline{G} = 1 in (97) and (98), so there is a higher probability
that condition (99) is fulfilled. If the choice G = I does not satisfy (99), we set ∆x = -g(x; μ) (the steepest
descent direction).
An update of μ is an important part of interior point methods. Above all, μ → 0 must hold, which is
the main property of interior point methods. Moreover, rounding errors may cause z_k(x; μ) = F_k(x)
when the value μ is small (because F_k(x) < z_k(x; μ) ≤ F_k(x) + m_k μ and F_k(x) + m_k μ → F_k(x) if μ → 0),
which leads to a breakdown (division by z_k(x; μ) - F_k(x) = 0) when computing 1/(z_k(x; μ) - F_k(x)).
Therefore, we need to use a lower bound \underline{μ} for the barrier parameter (e.g. \underline{μ} = 10^{-8} when computing in
double precision).

The efficiency of interior point methods also depends on the way the value of the barrier parameter is decreased.
The following heuristic procedures, where g(x_i; μ_i) = A(x_i) u(x_i; μ_i) and \underline{g} is a suitable constant, have proved
successful in practice.
Procedure A
Phase 1 If ∥g(x_i; μ_i)∥ ≥ \underline{g}, then μ_{i+1} = μ_i (the value of the barrier parameter is unchanged).
Phase 2 If ∥g(x_i; μ_i)∥ < \underline{g}, then

μ_{i+1} = \max\left( \tilde{μ}_{i+1}, \underline{μ}, 10 ε_M |F(x_{i+1})| \right), \qquad (103)

Procedure B
Phase 1 If ∥g(x_i; μ_i)∥^2 ≥ ϑ μ_i, then μ_{i+1} = μ_i (the value of the barrier parameter is unchanged).
Phase 2 If ∥g(x_i; μ_i)∥^2 < ϑ μ_i, then

μ_{i+1} = \max(\underline{μ}, ∥g(x_i; μ_i)∥^2). \qquad (105)

The values \underline{μ} = 10^{-8} and ϑ = 0.1 are usually used.
The choice of \underline{g} in Procedure A is not critical. We can set \underline{g} = ∞, but a lower value is sometimes more
advantageous. Formula (104) requires several comments. The first argument of the minimum controls
the speed of decrease of the barrier parameter value, which is linear (a geometric sequence) for small i
(the term λμ_i) and sublinear (a harmonic sequence) for large i (the term μ_i/(σμ_i + 1)). Thus, the second
argument, ensuring that the value μ is small in a neighborhood of a desired solution, is mainly important for
large i. This situation may appear if the gradient norm ∥g(x_i; μ_i)∥ is small even if x_i is far from a solution.
The idea of Procedure B proceeds from the fact that the barrier function B(x; μ) should be minimized with
sufficient precision for a given value of the parameter μ.
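Procedure B, which is fully specified above, can be stated in a few lines. The following is a minimal sketch, with g denoting the vector g(x_i; μ_i) and mu_min standing for the lower bound \underline{μ}.

# Sketch of Procedure B: keep mu while the squared gradient norm is large
# relative to mu, otherwise decrease mu, never below mu_min.
import numpy as np

def update_mu_B(mu, g, mu_min=1e-8, theta=0.1):
    gnorm2 = float(np.dot(g, g))
    if gnorm2 >= theta * mu:          # phase 1: mu unchanged
        return mu
    return max(mu_min, gnorm2)        # phase 2, cf. (105)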
The considerations up to now are summarized in the following algorithm which supposes that the matrix
A(x) is sparse. If it is dense, the algorithm is simplified because there is no symbolic decomposition.
Algorithm 2. Primal interior point method
Data. A tolerance for the gradient norm of the Lagrange function ε > 0. A precision for the determination
of a minimax vector δ > 0. Bounds for the barrier parameter 0 < \underline{μ} < \bar{μ}. Coefficients for the decrease of
the barrier parameter 0 < λ < 1, σ > 1 (or 0 < ϑ < 1). A tolerance for a uniform descent ε_0 > 0. A
tolerance for the steplength selection ε_1 > 0. A maximum steplength ∆ > 0.
Input. A sparsity pattern of the matrix A(x) = [A_1(x), . . . , A_m(x)]. A starting point x ∈ R^n.
Step 1 Initiation. Choose μ ≤ \bar{μ}. Determine a sparse structure of the matrix W = W(x; μ) from
the sparse structure of the matrix A(x) and perform a symbolic decomposition of the matrix W
(described in [2, Section 1.7.4]). Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , values Fk (x) =
max1≤l≤mk fkl (x), 1 ≤ k ≤ m, and the value of objective function (4). Set r = 0 (restart indicator).
Step 2 Termination. Solve nonlinear equations (89) with precision δ to obtain a minimax variable
z(x; μ) and a vector of Lagrange multipliers u(x; μ). Determine a matrix A = A(x) and a vector
g = g(x; μ) = A(x)u(x; μ). If μ ≤ \underline{μ} and ∥g∥ ≤ ε, terminate the computation.
Step 3 Hessian matrix approximation. Set G = G(x; µ) or compute an approximation G of the Hes-
sian matrix G(x; µ) using gradient differences or using quasi-Newton updates (Remark 22).
Step 4 Direction determination. Determine a matrix ∇2 B(x; µ) by (93) and a vector ∆x by solving
equations (94) with the right-hand side defined by (92).
Step 5 Restart. If r = 0 and (99) does not hold (where s = ∆x), set G = I, r = 1 and go to Step 4. If
r = 1 and (99) does not hold, set ∆x = −g. Set r = 0.
Step 6 Steplength selection. Determine a steplength α > 0 satisfying inequalities (100) (for a barrier
function B(x; µ) defined by (91)) and α ≤ ∆/∥∆x∥. Note that nonlinear equations (89) are solved
at the point x + α∆x. Set x := x + α∆x. Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , values
Fk (x) = max1≤l≤mk fkl (x), 1 ≤ k ≤ m, and the value of objective function (4).
Step 7 Barrier parameter update. Determine a new value of the barrier parameter μ ≥ \underline{μ} using Procedure A or Procedure B. Go to Step 2.

The values ε = 10^{-6}, δ = 10^{-6}, \underline{μ} = 10^{-8}, \bar{μ} = 1, λ = 0.85, σ = 100, ϑ = 0.1, ε_0 = 10^{-8}, ε_1 = 10^{-4},
and ∆ = 1000 were used in our numerical experiments.
Differentiating the function

B(x; μ) = \sum_{k=1}^{m} z_k(x; μ) - μ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x)) \qquad (106)

with respect to μ and using (89), we obtain

\frac{\partial B(x; μ)}{\partial μ} = \sum_{k=1}^{m} \frac{\partial z_k(x; μ)}{\partial μ} - \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x)) - \sum_{k=1}^{m} \sum_{l=1}^{m_k} \frac{μ}{z_k(x; μ) - f_{kl}(x)} \frac{\partial z_k(x; μ)}{\partial μ}
= \sum_{k=1}^{m} \frac{\partial z_k(x; μ)}{\partial μ} \left( 1 - \sum_{l=1}^{m_k} \frac{μ}{z_k(x; μ) - f_{kl}(x)} \right) - \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x))
= -\sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x)).
Lemma 5. Let Assumption X1a be satisfied. Let xi and µi , i ∈ N , be the sequences generated by Algo-
rithm 2. Then the sequences B(xi ; µi ), z(xi ; µi ), and F (xi ), i ∈ N , are bounded. Moreover, there exists a
constant L ≥ 0 such that for i ∈ N it holds
B(xi+1 ; µi+1 ) ≤ B(xi+1 ; µi ) + L(µi − µi+1 ). (107)
Proof.

(a) We first prove boundedness from below. Using (106) and Assumption X1a, one can write

B(x; μ) - \underline{F} = \sum_{k=1}^{m} (z_k(x; μ) - \underline{F}) - μ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; μ) - f_{kl}(x)) \ge \sum_{k=1}^{m} \left( z_k(x; μ) - \underline{F} - m_k μ \log(z_k(x; μ) - \underline{F}) \right).

A convex function ψ(t) = t - mμ \log(t) has a unique minimum at the point t = mμ because ψ'(mμ) =
1 - mμ/mμ = 0. Thus, it holds that

B(x; μ) \ge \underline{F} + \sum_{k=1}^{m} (m_k μ - m_k μ \log(m_k μ)) \ge \underline{F} + \sum_{k=1}^{m} \min(0, m_k μ (1 - \log(m_k μ))) \ge \underline{F} + \sum_{k=1}^{m} \min(0, m_k \bar{μ} (1 - \log(2 m_k \bar{μ}))) \stackrel{\Delta}{=} \underline{B}.

Boundedness from below of the sequences z(x_i; μ_i) and F(x_i), i ∈ N, follows from inequalities (90) and
Assumption X1a.

(b) Now we prove boundedness from above. Similarly as in (a) we can write

B(x; μ) - \underline{F} \ge \sum_{k=1}^{m} \frac{z_k(x; μ) - \underline{F}}{2} + \sum_{k=1}^{m} \left( \frac{z_k(x; μ) - \underline{F}}{2} - m_k μ \log(z_k(x; μ) - \underline{F}) \right).

A convex function t/2 - mμ \log(t) has a unique minimum at the point t = 2mμ. Thus, it holds that

B(x; μ) \ge \sum_{k=1}^{m} \frac{z_k(x; μ) - \underline{F}}{2} + \underline{F} + \sum_{k=1}^{m} \min(0, m_k μ (1 - \log(2 m_k μ))) \ge \sum_{k=1}^{m} \frac{z_k(x; μ) - \underline{F}}{2} + \underline{B},

or

\sum_{k=1}^{m} (z_k(x; μ) - \underline{F}) \le 2 (B(x; μ) - \underline{B}). \qquad (108)

Using the mean value theorem and Lemma 4, we obtain

B(x_{i+1}; μ_{i+1}) - B(x_{i+1}; μ_i) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x_{i+1}; \tilde{μ}_i) - f_{kl}(x_{i+1}))(μ_i - μ_{i+1}) \le \sum_{k=1}^{m} m_k \log(z_k(x_{i+1}; μ_i) - \underline{F})(μ_i - μ_{i+1}), \qquad (109)

where 0 < μ_{i+1} \le \tilde{μ}_i \le μ_i. Since \log(t) \le t/e (where e = \exp(1)) for t > 0, we can write, using
inequalities (108), (109), and (90),

B(x_{i+1}; μ_{i+1}) - \underline{B} \le B(x_{i+1}; μ_i) - \underline{B} + e^{-1} \sum_{k=1}^{m} m_k (z_k(x_{i+1}; μ_i) - \underline{F})(μ_i - μ_{i+1})
\le B(x_{i+1}; μ_i) - \underline{B} + 2 e^{-1} m (B(x_{i+1}; μ_i) - \underline{B})(μ_i - μ_{i+1})
= (1 + λ δ_i)(B(x_{i+1}; μ_i) - \underline{B}) \le (1 + λ δ_i)(B(x_i; μ_i) - \underline{B}),
where λ = 2m/e and δ_i = μ_i - μ_{i+1}. Therefore,

B(x_{i+1}; μ_{i+1}) - \underline{B} \le \prod_{j=1}^{i} (1 + λ δ_j)(B(x_1; μ_1) - \underline{B}) \le \prod_{i=1}^{\infty} (1 + λ δ_i)(B(x_1; μ_1) - \underline{B}) \qquad (110)

and since

\sum_{i=1}^{\infty} λ δ_i = λ \sum_{i=1}^{\infty} (μ_i - μ_{i+1}) = λ \left( μ_1 - \lim_{i \to \infty} μ_i \right) \le λ \bar{μ},

the expression on the right-hand side of (110) is finite. Thus, the sequence B(x_i; μ_i), i ∈ N, is
bounded from above and the sequences z(x_i; μ_i) and F(x_i), i ∈ N, are bounded from above as well
by (108) and (90).

(c) Finally, we prove formula (107). Using (109) and (90) we obtain

B(x_{i+1}; μ_{i+1}) - B(x_{i+1}; μ_i) \le \sum_{k=1}^{m} m_k \log(z_k(x_{i+1}; μ_i) - \underline{F})(μ_i - μ_{i+1}) \le \sum_{k=1}^{m} m_k \log(F_k(x_{i+1}) + m_k μ_i - \underline{F})(μ_i - μ_{i+1}) \le \sum_{k=1}^{m} m_k \log(\overline{F} + m_k \bar{μ} - \underline{F})(μ_i - μ_{i+1}) \stackrel{\Delta}{=} L (μ_i - μ_{i+1})

(the existence of a constant \overline{F} follows from the boundedness of the sequence F(x_i), i ∈ N), which together
with (102) gives B(x_{i+1}; μ_{i+1}) ≤ B(x_i; μ_i) + L (μ_i - μ_{i+1}), i ∈ N. Thus, it holds that

B(x_i; μ_i) \le B(x_1; μ_1) + L (μ_1 - μ_i) \le B(x_1; μ_1) + L \bar{μ} \stackrel{\Delta}{=} \overline{B}, \quad i ∈ N. \qquad (111)

The upper bounds \overline{g} and \overline{G} are not used in Lemma 5, so Assumption X3 need not be satisfied. Thus,
there exists an upper bound \overline{F} (independent of \overline{g} and \overline{G}) such that F(x_i) ≤ \overline{F} for all i ∈ N. This upper
bound can be used in the definition of the set D_F(\overline{F}) in Assumption X3.

Lemma 6. Let Assumption X3 and the assumptions of Lemma 5 be satisfied. Then, if Procedure A
or Procedure B is used for the update of the parameter μ, the values μ_i, i ∈ N, form a nonincreasing sequence such
that μ_i → 0.
Proof. The value of parameter µ is unchanged in the first phase of Procedure A or Procedure B. Since a
function B(x; µ) is continuous, bounded from below by Lemma 5, and since inequality (101) is satisfied
(with µi = µ), it holds ∥g(xi ; µ)∥ → 0 if phase 1 contains an infinite number of subsequent iterative steps
[32, Section 3.2]. Thus, there exists a step (with index i) belonging to the first phase such that either
∥g(xi ; µ)∥ < g in Procedure A or ∥g(xi ; µ)∥2 < ϑµ in Procedure B. However, this is in contradiction with
the definition of the first phase. Thus, there exists an infinite number of steps belonging to the second
phase, where the value of parameter µ is decreased so that µi → 0.
Theorem 8. Let the assumptions of Lemma 6 be satisfied. Consider a sequence x_i, i ∈ N, generated by
Algorithm 2, where δ = ε = \underline{μ} = 0. Then

\lim_{i \to \infty} \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x_i) u_{kl}(x_i; μ_i) = 0, \qquad \sum_{l=1}^{m_k} u_{kl}(x_i; μ_i) = 1,

Proof.

(a) The equalities \tilde{e}_k^T u_k(x_i; μ_i) = 1, 1 ≤ k ≤ m, are satisfied by (89) because δ = 0. The inequalities z_k(x_i; μ_i) -
f_{kl}(x_i) ≥ 0 and u_{kl}(x_i; μ_i) ≥ 0 follow from formulas (90) and statement (95).

(b) Relations (101) and (107) yield

B(x_{i+1}; μ_{i+1}) - B(x_i; μ_i) = (B(x_{i+1}; μ_{i+1}) - B(x_{i+1}; μ_i)) + (B(x_{i+1}; μ_i) - B(x_i; μ_i)) \le L (μ_i - μ_{i+1}) - c ∥g(x_i; μ_i)∥^2

and since \lim_{i \to \infty} μ_i = 0 (Lemma 6), we can write by (111) that

\underline{B} \le \lim_{i \to \infty} B(x_{i+1}; μ_{i+1}) \le B(x_1; μ_1) + L \sum_{i=1}^{\infty} (μ_i - μ_{i+1}) - c \sum_{i=1}^{\infty} ∥g(x_i; μ_i)∥^2 \le B(x_1; μ_1) + L \bar{μ} - c \sum_{i=1}^{\infty} ∥g(x_i; μ_i)∥^2 = \overline{B} - c \sum_{i=1}^{\infty} ∥g(x_i; μ_i)∥^2.

Thus, it holds that

\sum_{i=1}^{\infty} ∥g(x_i; μ_i)∥^2 \le \frac{1}{c} (\overline{B} - \underline{B}) < \infty,

which gives g(x_i; μ_i) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x_i) u_{kl}(x_i; μ_i) \to 0.

(c) Let the indices 1 ≤ k ≤ m and 1 ≤ l ≤ m_k be chosen arbitrarily. Using (95) and Lemma 6 we obtain

u_{kl}(x_i; μ_i)(z_k(x_i; μ_i) - f_{kl}(x_i)) = \frac{μ_i}{z_k(x_i; μ_i) - f_{kl}(x_i)} (z_k(x_i; μ_i) - f_{kl}(x_i)) = μ_i \to 0.
Corollary 1. Let the assumptions of Theorem 8 be satisfied. Then, every cluster point x ∈ Rn of a
sequence xi , i ∈ N , satisfies necessary KKT conditions (8)-(9) where z and u (with elements zk and ukl ,
1 ≤ k ≤ m, 1 ≤ l ≤ mk ) are cluster points of sequences z(xi ; µi ) and u(xi ; µi ), i ∈ N .
Now we will suppose that the values δ, ε, and \underline{μ} are nonzero and show how precise the solution of the
system of KKT equations is after the termination of the computation.
Theorem 9. Let the assumptions of Lemma 6 be satisfied. Consider a sequence xi , i ∈ N , generated by
Algorithm 2. Then, if the values δ > 0, ε > 0, and µ > 0 are chosen arbitrarily, there exists an index i ≥ 1
such that
∥g(x_i; μ_i)∥ \le ε, \qquad \left| 1 - \sum_{l=1}^{m_k} u_{kl}(x_i; μ_i) \right| \le δ,
3.6 Special cases
The simplest and most widely considered generalized minimax problem is the classical minimax
problem (10), when m = 1 in (4) (in this case we write m, z, u, v, U, V instead of m_1, z_1, u_1, v_1, U_1, V_1).
For solving the classical minimax problem one can use Algorithm 2, where a major part of the computation is
greatly simplified. The system of equations (79) is of order n + 1 and has the form

\begin{bmatrix} G(x, z) + A(x) V(x, z) A^T(x) & -A(x) V(x, z) \tilde{e} \\ -\tilde{e}^T V(x, z) A^T(x) & \tilde{e}^T V(x, z) \tilde{e} \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta z \end{bmatrix} = -\begin{bmatrix} g(x, z) \\ γ(x, z) \end{bmatrix}, \qquad (112)

where g(x, z) = A(x) u(x, z), γ(x, z) = 1 - \tilde{e}^T u(x, z), V(x, z) = U^2(x, z)/μ = \mathrm{diag}(u_1^2(x, z), \ldots, u_m^2(x, z))/μ,
and u_k(x, z) = μ/(z - f_k(x)), 1 ≤ k ≤ m. The system of equations (89) reduces to the single nonlinear equation

1 - \tilde{e}^T u(x, z) = 1 - \sum_{k=1}^{m} \frac{μ}{z - f_k(x)} = 0, \qquad (113)

whose solution z(x; μ) lies in the interval F(x) + μ ≤ z(x; μ) ≤ F(x) + mμ. Finding this solution by the robust
methods from [16], [17] is not difficult. The barrier function has the form

B(x; μ) = z(x; μ) - μ \sum_{k=1}^{m} \log(z(x; μ) - f_k(x)) \qquad (114)

Here W(x, z) = G(x, z) + A(x) V(x, z) A^T(x), c(x, z) = A(x) V(x, z) \tilde{e}, and δ(x, z) = \tilde{e}^T V(x, z) \tilde{e}.
Since

\begin{bmatrix} W & -c \\ -c^T & δ \end{bmatrix}^{-1} = \begin{bmatrix} W^{-1} - W^{-1} c \, ω^{-1} c^T W^{-1} & -W^{-1} c \, ω^{-1} \\ -ω^{-1} c^T W^{-1} & -ω^{-1} \end{bmatrix},

where ω = c^T W^{-1} c - δ, we can write

\begin{bmatrix} \Delta x \\ \Delta z \end{bmatrix} = -\begin{bmatrix} W & -c \\ -c^T & δ \end{bmatrix}^{-1} \begin{bmatrix} g \\ γ \end{bmatrix} = \begin{bmatrix} W^{-1} (c \, \Delta z - g) \\ \Delta z \end{bmatrix},

where

\Delta z = ω^{-1} (c^T W^{-1} g + γ).
The matrix W is sparse if the matrix A(x) has sparse columns. If the matrix W is not positive definite,
we can use the Gill-Murray decomposition
W + E = LLT , (115)
where E is a positive semidefinite diagonal matrix. Then we solve the equations

L L^T p = g, \qquad L L^T q = c \qquad (116)

and set

\Delta z = \frac{c^T p + γ}{c^T q - δ}, \qquad \Delta x = q \, \Delta z - p. \qquad (117)
If we solve the classical minimax problem, Algorithm 2 must be somewhat modified. In Step 2, we solve
only equation (113) instead of the system of equations (89). In Step 4, we determine a vector ∆x by
solving equations (116) and using relations (117). In Step 6, we use the barrier function (114) (nonlinear
equation (113) must be solved at the point x + α∆x).
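For the classical minimax case, the direction computation (115)–(117) is a small linear-algebra exercise. The sketch below uses a plain Cholesky factorization in place of the Gill–Murray decomposition (i.e. it assumes W is already positive definite); W, c, delta, g, and gamma are assumed inputs holding the quantities defined above.

# Sketch of (115)-(117): direction (dx, dz) for the classical minimax problem.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def minimax_direction(W, c, delta, g, gamma):
    L = cho_factor(W)                          # stands in for W + E = L L^T
    p = cho_solve(L, g)                        # (116): (W + E) p = g
    q = cho_solve(L, c)                        # (116): (W + E) q = c
    dz = (c @ p + gamma) / (c @ q - delta)     # (117)
    dx = q * dz - p                            # (117)
    return dx, dz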
Minimization of a sum of absolute values, i.e., minimization of the function

F(x) = \sum_{k=1}^{m} |f_k(x)| = \sum_{k=1}^{m} \max(f_k^+(x), f_k^-(x)), \qquad f_k^+(x) = f_k(x), \quad f_k^-(x) = -f_k(x),

is another important generalized minimax problem. In this case, the barrier function has the form

B_μ(x, z) = \sum_{k=1}^{m} z_k - μ \sum_{k=1}^{m} \log(z_k - f_k^+(x)) - μ \sum_{k=1}^{m} \log(z_k - f_k^-(x))
= \sum_{k=1}^{m} z_k - μ \sum_{k=1}^{m} \log(z_k - f_k(x)) - μ \sum_{k=1}^{m} \log(z_k + f_k(x))
= \sum_{k=1}^{m} z_k - μ \sum_{k=1}^{m} \log(z_k^2 - f_k^2(x)), \qquad (118)

where z_k > |f_k(x)|, 1 ≤ k ≤ m. Differentiating B_μ(x, z) with respect to x and z, we obtain the necessary
conditions for an extremum

\sum_{k=1}^{m} \frac{2 μ f_k(x)}{z_k^2 - f_k^2(x)} g_k(x) = \sum_{k=1}^{m} u_k(x, z_k) g_k(x) = 0, \qquad u_k(x, z_k) = \frac{2 μ f_k(x)}{z_k^2 - f_k^2(x)} \qquad (119)

and

1 - \frac{2 μ z_k}{z_k^2 - f_k^2(x)} = 1 - u_k(x, z_k) \frac{z_k}{f_k(x)} = 0 \quad \Rightarrow \quad u_k(x, z_k) = \frac{f_k(x)}{z_k}, \quad 1 \le k \le m. \qquad (120)

Denoting A(x) = [g_1(x), . . . , g_m(x)],

f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix}, \qquad z = \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix}, \qquad u(x, z) = \begin{bmatrix} u_1(x, z_1) \\ \vdots \\ u_m(x, z_m) \end{bmatrix}, \qquad (121)

Let B̃_μ(z) = B_μ(x, z) for a fixed chosen vector x ∈ R^n. The function B̃_μ(z) is convex for z_k > |f_k(x)|,
1 ≤ k ≤ m, because it is a sum of convex functions. Thus, a stationary point of B̃_μ(z) exists and it is its
global minimum. Differentiating B̃_μ(z) with respect to z, we obtain the quadratic equations

\frac{2 μ z_k(x; μ)}{z_k^2(x; μ) - f_k^2(x)} = 1 \quad \Leftrightarrow \quad z_k^2(x; μ) - f_k^2(x) = 2 μ z_k(x; μ), \quad 1 \le k \le m, \qquad (123)
defining its unique stationary point, which has the solution

z_k(x; μ) = μ + \sqrt{μ^2 + f_k^2(x)}, \quad 1 \le k \le m, \qquad (124)

(the second solutions of the quadratic equations (123) do not satisfy the condition z_k > |f_k(x)|, so the obtained
vector z does not belong to the domain of B̃_μ(z)). Using (120) and (124) we obtain

u_k(x; μ) = u_k(x, z(x; μ)) = \frac{f_k(x)}{z_k(x; μ)} = \frac{f_k(x)}{μ + \sqrt{μ^2 + f_k^2(x)}}, \quad 1 \le k \le m, \qquad (125)

and

B(x; μ) = B_μ(x, z(x; μ)) = \sum_{k=1}^{m} z_k(x; μ) - μ \sum_{k=1}^{m} \log(z_k^2(x; μ) - f_k^2(x))
= \sum_{k=1}^{m} z_k(x; μ) - μ \sum_{k=1}^{m} \log(2 μ z_k(x; μ))
= \sum_{k=1}^{m} \left[ z_k(x; μ) - μ \log(z_k(x; μ)) \right] - μ m \log(2 μ). \qquad (126)
Theorem 10. It holds that

\nabla B(x; μ) = A(x) u(x; μ) \qquad (127)

and

\nabla^2 B(x; μ) = W(x; μ) = G(x; μ) + A(x) V(x; μ) A^T(x), \qquad (128)

where

G(x; μ) = \sum_{k=1}^{m} G_k(x) u_k(x; μ), \qquad (129)

G_k(x) are the Hessian matrices of the functions f_k(x), 1 ≤ k ≤ m, V(x; μ) = \mathrm{diag}(v_1(x; μ), . . . , v_m(x; μ)), and

v_k(x; μ) = \frac{2 μ}{z_k^2(x; μ) + f_k^2(x)}, \quad 1 \le k \le m. \qquad (130)
Proof. Differentiating (126) and using (123) and (119), we can write

\nabla B(x; μ) = \sum_{k=1}^{m} \nabla z_k(x; μ) - 2 μ \sum_{k=1}^{m} \frac{z_k(x; μ) \nabla z_k(x; μ) - f_k(x) g_k(x)}{z_k^2(x; μ) - f_k^2(x)}
= \sum_{k=1}^{m} \left( 1 - \frac{2 μ z_k(x; μ)}{z_k^2(x; μ) - f_k^2(x)} \right) \nabla z_k(x; μ) + \sum_{k=1}^{m} \frac{2 μ f_k(x)}{z_k^2(x; μ) - f_k^2(x)} g_k(x)
= \sum_{k=1}^{m} u_k(x; μ) g_k(x) = A(x) u(x; μ).

for 1 ≤ k ≤ m. Thus, by (125), (131), (123), and (127) it holds that

\nabla u_k(x; μ) = \nabla \left( \frac{f_k(x)}{z_k(x; μ)} \right) = \frac{z_k(x; μ) g_k(x) - f_k(x) \nabla z_k(x; μ)}{z_k^2(x; μ)}
= \left( 1 - \frac{2 f_k^2(x)}{z_k^2(x; μ) + f_k^2(x)} \right) \frac{g_k(x)}{z_k(x; μ)} = \frac{z_k^2(x; μ) - f_k^2(x)}{z_k^2(x; μ) + f_k^2(x)} \frac{g_k(x)}{z_k(x; μ)}
= \frac{2 μ}{z_k^2(x; μ) + f_k^2(x)} g_k(x) = v_k(x; μ) g_k(x).

Differentiating (127) and using the previous expression, we obtain

\nabla^2 B(x; μ) = \nabla \sum_{k=1}^{m} u_k(x; μ) g_k(x) = \sum_{k=1}^{m} u_k(x; μ) G_k(x) + \sum_{k=1}^{m} \nabla u_k(x; μ) g_k^T(x)
= \sum_{k=1}^{m} u_k(x; μ) G_k(x) + \sum_{k=1}^{m} v_k(x; μ) g_k(x) g_k^T(x),

because |u_k(x; μ)| ≤ 1, 1 ≤ k ≤ m, holds by (125). Since the matrix V(x; μ) is diagonal, we can write by
(130) that

∥V(x; μ)∥ = \max_{1 \le k \le m} |v_k(x; μ)| = \max_{1 \le k \le m} \frac{2 μ}{z_k^2(x; μ) + f_k^2(x)}. \qquad (133)

Using (123) and (124) we obtain

z_k^2(x; μ) + f_k^2(x) = 2 μ z_k(x; μ) = 2 μ \left( μ + \sqrt{μ^2 + f_k^2(x)} \right) \ge 4 μ^2

for 1 ≤ k ≤ m, which after substitution into (133) gives ∥V(x; μ)∥ ≤ 1/(2μ). Thus, the inequality

∥\nabla^2 B(x; μ)∥ \le \frac{\bar{c}}{μ} \le \frac{\bar{c}}{\underline{μ}}, \qquad (134)

where \bar{c} = m(\bar{μ} \overline{G} + \overline{g}^2/2), is satisfied.
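All quantities needed by the interior point method for the sum of absolute values are explicit, so they can be evaluated without solving any nonlinear equation. The following sketch, with assumed inputs f (the values f_k(x)), A (the matrix with columns g_k(x)), G (the approximation (129)), and μ > 0, returns B(x; μ), its gradient, and its Hessian according to (124)–(130).

# Sketch of the sum-of-absolute-values case: z_k (124), u_k (125), v_k (130),
# B(x; mu) (126), gradient A u (127) and Hessian G + A V A^T (128).
import numpy as np

def abs_sum_barrier(f, A, G, mu):
    z = mu + np.sqrt(mu**2 + f**2)                                   # (124)
    u = f / z                                                        # (125)
    v = 2.0 * mu / (z**2 + f**2)                                     # (130)
    B = np.sum(z - mu * np.log(z)) - mu * len(f) * np.log(2.0 * mu)  # (126)
    grad = A @ u                                                     # (127)
    hess = G + (A * v) @ A.T                                         # (128)
    return B, grad, hess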
A slightly modified Algorithm 2 can be used for the minimization of a sum of absolute values. However,
problems of this type are characterized by ill-conditioning of the matrix ∇²B(x; μ). Thus, it is more convenient
to use trust region methods [22]. In this case, a direction vector ∆x is determined by approximate
minimization of the quadratic function

Q(\Delta x) = \frac{1}{2} (\Delta x)^T \nabla^2 B(x; μ) \Delta x + g^T(x; μ) \Delta x

on the set ∥∆x∥ ≤ ∆, where ∆ is the trust region radius. The direction vector ∆x serves for the determination of
a new approximation of the solution x_+. Denoting by ρ(∆x) the ratio of the actual and the predicted decrease of
the barrier function, we set x_+ = x if ρ(∆x) < \underline{ρ} or x_+ = x + ∆x if ρ(∆x) ≥ \underline{ρ}. The trust region radius is updated so that
β_1 ∆ ≤ ∆_+ ≤ β_2 ∆ if ρ(∆x) < \underline{ρ} or ∆_+ ≥ ∆ if ρ(∆x) ≥ \underline{ρ}, where 0 < β_1 ≤ β_2 < 1. More details can be found
in [5] and [22].
4 Smoothing methods
4.1 Basic properties
Similarly as in Section 2.1, we will restrict ourselves to sums of maxima, where the function h : R^m → R
is the sum of its arguments, so (4) holds. Smoothing methods for the minimization of sums of maxima replace
function (4) by the smoothing function

S(x; μ) = \sum_{k=1}^{m} S_k(x; μ), \qquad (135)
where

S_k(x; μ) = μ \log \sum_{l=1}^{m_k} \exp\left( \frac{f_{kl}(x)}{μ} \right) = F_k(x) + μ \log \sum_{l=1}^{m_k} \exp\left( \frac{f_{kl}(x) - F_k(x)}{μ} \right), \qquad (136)

which satisfies

F_k(x) \le S_k(x; μ) \le F_k(x) + μ \log m_k, \quad 1 \le k \le m \quad \Rightarrow \quad F(x) \le S(x; μ) \le F(x) + μ \sum_{k=1}^{m} \log m_k, \qquad (137)
so S(x; µ) → F (x) if µ → 0.
Remark 24. Similarly as in Section 3.2, we will denote by g_{kl}(x) and G_{kl}(x) the gradients and Hessian
matrices of the functions f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, and set

u_k(x; μ) = \begin{bmatrix} u_{k1}(x; μ) \\ \vdots \\ u_{km_k}(x; μ) \end{bmatrix}, \qquad \tilde{e}_k = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix},

where

u_{kl}(x; μ) = \frac{\exp(f_{kl}(x)/μ)}{\sum_{l=1}^{m_k} \exp(f_{kl}(x)/μ)} = \frac{\exp((f_{kl}(x) - F_k(x))/μ)}{\sum_{l=1}^{m_k} \exp((f_{kl}(x) - F_k(x))/μ)}. \qquad (138)
Thus, it holds that u_{kl}(x; μ) ≥ 0, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, and

\tilde{e}_k^T u_k(x; μ) = \sum_{l=1}^{m_k} u_{kl}(x; μ) = 1. \qquad (139)
Further, we denote Ak (x) = JkT (x) = [gk1 (x), . . . , gkmk (x)] and Uk (x; µ) = diag(uk1 (x; µ), . . . , ukmk (x; µ))
for 1 ≤ k ≤ m.
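Formulas (136) and (138) are best evaluated with the shifted exponentials exp((f_{kl} − F_k)/μ), which avoids overflow for small μ. A minimal sketch for one group k, assuming fkl holds the values f_{kl}(x), follows.

# Sketch of S_k(x; mu) (136) and the smoothing multipliers u_k (138),
# computed in the numerically safe shifted form.
import numpy as np

def smoothing_group(fkl, mu):
    fkl = np.asarray(fkl, dtype=float)
    Fk = fkl.max()
    w = np.exp((fkl - Fk) / mu)          # all exponents are <= 0
    Sk = Fk + mu * np.log(np.sum(w))     # (136)
    uk = w / np.sum(w)                   # (138); nonnegative, sums to one
    return Sk, uk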
Theorem 11. Consider smoothing function (135). Then

\nabla S(x; μ) = g(x; μ) \qquad (140)

and

\nabla^2 S(x; μ) = G(x; μ) + \frac{1}{μ} \sum_{k=1}^{m} A_k(x) U_k(x; μ) A_k^T(x) - \frac{1}{μ} \sum_{k=1}^{m} A_k(x) u_k(x; μ) (A_k(x) u_k(x; μ))^T
= G(x; μ) + \frac{1}{μ} A(x) U(x; μ) A^T(x) - \frac{1}{μ} C(x; μ) C^T(x; μ), \qquad (141)

where g(x; μ) = \sum_{k=1}^{m} A_k(x) u_k(x; μ) = A(x) u(x; μ) and

G(x; μ) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl}(x; μ), \qquad A(x) = [A_1(x), \ldots, A_m(x)],

U(x; μ) = \mathrm{diag}(U_1(x; μ), \ldots, U_m(x; μ)), \qquad C(x; μ) = [A_1(x) u_1(x; μ), \ldots, A_m(x) u_m(x; μ)].
Proof. Obviously,

\nabla S(x; μ) = \sum_{k=1}^{m} \nabla S_k(x; μ), \qquad \nabla^2 S(x; μ) = \sum_{k=1}^{m} \nabla^2 S_k(x; μ).

Differentiating functions (136) and using (138), we obtain

\nabla S_k(x; μ) = \frac{μ}{\sum_{l=1}^{m_k} \exp(f_{kl}(x)/μ)} \sum_{l=1}^{m_k} \frac{1}{μ} \exp(f_{kl}(x)/μ) g_{kl}(x) = \sum_{l=1}^{m_k} g_{kl}(x) u_{kl}(x; μ) = A_k(x) u_k(x; μ). \qquad (142)

Differentiating (138) with respect to x, we obtain

\nabla u_{kl}(x; μ) = \frac{1}{μ} u_{kl}(x; μ) g_{kl}(x) - \frac{1}{μ} u_{kl}(x; μ) \sum_{j=1}^{m_k} u_{kj}(x; μ) g_{kj}(x). \qquad (143)
Remark 25. Note that using (141) and the Schwarz inequality we obtain

v^T \nabla^2 S(x; μ) v = v^T G(x; μ) v + \frac{1}{μ} \sum_{k=1}^{m} \left( v^T A_k(x) U_k(x; μ) A_k^T(x) v - \frac{(v^T A_k(x) U_k(x; μ) \tilde{e}_k)^2}{\tilde{e}_k^T U_k(x; μ) \tilde{e}_k} \right) \ge v^T G(x; μ) v,

because \tilde{e}_k^T U_k(x; μ) \tilde{e}_k = \tilde{e}_k^T u_k(x; μ) = 1, so the Hessian matrix ∇²S(x; μ) is positive definite if the matrix
G(x; μ) is positive definite.
Using Theorem 11, a step of the Newton method can be written in the form x_+ = x + α∆x, where ∆x is a
solution of the equation ∇²S(x; μ)∆x = -g(x; μ), or

\left( W(x; μ) - \frac{1}{μ} C(x; μ) C^T(x; μ) \right) \Delta x = -g(x; μ), \qquad (144)

where

W(x; μ) = G(x; μ) + \frac{1}{μ} A(x) U(x; μ) A^T(x), \qquad g(x; μ) = A(x) u(x; μ). \qquad (145)

The matrix W in (145) has the same structure as the matrix W in (93) and, by Theorem 11, smoothing
function (135) has similar properties to barrier function (91). Thus, one can use an algorithm that is
analogous to Algorithm 2 and the considerations stated in Remark 21, where S(x; μ) and ∇²S(x; μ) are used
instead of B(x; μ) and ∇²B(x; μ).
Algorithm 3. Smoothing method
Step 1 Initiation. Choose μ ≤ \bar{μ}. Determine a sparse structure of the matrix W = W(x; μ) from
the sparse structure of the matrix A(x) and perform a symbolic decomposition of the matrix W
(described in [2, Section 1.7.4]). Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , values Fk (x) =
max1≤l≤mk fkl (x), 1 ≤ k ≤ m, and the value of objective function (4). Set r = 0 (restart indicator).
Step 2 Termination. Determine a vector of smoothing multipliers u(x; μ) by (138). Determine a matrix
A = A(x) and a vector g = g(x; μ) = A(x)u(x; μ). If μ ≤ \underline{μ} and ∥g∥ ≤ ε, terminate the computation.
Step 3 Hessian matrix approximation. Set G = G(x; µ) or compute an approximation G of the Hes-
sian matrix G(x; µ) using gradient differences or using quasi-Newton updates (Remark 22).
Step 4 Direction determination. Determine a matrix W by (145) and a vector ∆x by (144) using the
Gill-Murray decomposition of a matrix W .
Step 5 Restart. If r = 0 and (99) does not hold (where s = ∆x), set G = I, r = 1 and go to Step 4. If
r = 1 and (99) does not hold, set ∆x = −g. Set r = 0.
Step 6 Steplength selection. Determine a steplength α > 0 satisfying inequalities (100) (for the smoothing
function S(x; μ)) and α ≤ ∆/∥∆x∥. Set x := x + α∆x. Compute values f_{kl}(x), 1 ≤ k ≤ m,
1 ≤ l ≤ m_k, values F_k(x) = max_{1≤l≤m_k} f_{kl}(x), 1 ≤ k ≤ m, and the value of the objective function
(4).
Step 7 Smoothing parameter update. Determine a new value of the smoothing parameter µ ≥ µ
using Procedure A or Procedure B. Go to Step 2.
Algorithm 3 differs from Algorithm 2 in that a nonlinear equation ẽT u(x; µ) = 1 need not be solved
in Step 2 (because (139) follows from (138)), equations (144)–(145) instead of (116)–(117) are used in
Step 4, and a barrier function B(x; µ) is replaced with a smoothing function S(x; µ) in Step 6. Note
that the parameter µ in (135) has different meaning than the same parameter in (91), so we could use
another procedure for its update in Step 7. However, it turns out that using Procedure A or
Procedure B is very efficient. On the other hand, it must be noted that using exponential functions in
Algorithm 3 has certain disadvantages. Computation of the values of exponential functions is more time
consuming than performing standard arithmetic operations and underflow may also happen (i.e. replacing
nonzero values by zero values) if the value of a parameter µ is very small.
The values ε = 10^{-6}, \underline{μ} = 10^{-6}, \bar{μ} = 1, λ = 0.85, σ = 100, ϑ = 0.1, ε_0 = 10^{-8}, ε_1 = 10^{-4}, and
∆ = 1000 were used in our numerical experiments.
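The Newton step of the smoothing method can be sketched directly from (141) and (144)–(145). In the following illustration, G, A, the stacked multipliers u from (138), the column index sets of the groups, and μ are assumed inputs.

# Sketch of the smoothing Newton step: H = W - C C^T / mu with
# W = G + A U A^T / mu (145), C = [A_1 u_1, ..., A_m u_m], cf. (141), (144).
import numpy as np

def smoothing_direction(G, A, u, blocks, mu):
    W = G + (A * u) @ A.T / mu
    C = np.column_stack([A[:, idx] @ u[idx] for idx in blocks])
    g = C.sum(axis=1)                       # g(x; mu) = A u = sum_k A_k u_k
    H = W - C @ C.T / mu
    dx = -np.linalg.solve(H, g)
    return dx, g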
and
∑
mk ∑ mk ′
∂Sk (x; µ) l=1∑φkl (x; µ) exp φkl (x; µ)
= log exp φkl (x; µ) + µ mk
∂µ l=1 exp φkl (x; µ)
l=1
∑
mk ∑
mk
= log exp φkl (x; µ) − φkl (x; µ)ukl (x; µ) ≥ 0 (150)
l=1 l=1
34
because φkl (x; µ) ≤ 0, ukl (x; µ) ≥ 0, 1 ≤ k ≤ m, and φkl (x; µ) = 0 holds for at least one index. Thus,
functions Sk (x; µ), 1 ≤ k ≤ m, are nondecreasing. Differentiating (138) with respect to µ we obtain
    ∂ukl(x; µ)/∂µ = −(1/µ) φkl(x; µ) exp φkl(x; µ) / Σ_{l=1}^{mk} exp φkl(x; µ)
                    + (1/µ) ( exp φkl(x; µ) / Σ_{l=1}^{mk} exp φkl(x; µ) ) ( Σ_{l=1}^{mk} φkl(x; µ) exp φkl(x; µ) / Σ_{l=1}^{mk} exp φkl(x; µ) )

                  = −(1/µ) φkl(x; µ) ukl(x; µ) + (1/µ) ukl(x; µ) Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ).        (151)
Differentiating (150) with respect to µ and using equations (139) and (151) we can write

    ∂²Sk(x; µ)/∂µ² = −(1/µ) Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ) + (1/µ) Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ) − Σ_{l=1}^{mk} φkl(x; µ) ∂ukl(x; µ)/∂µ

                   = −Σ_{l=1}^{mk} φkl(x; µ) ∂ukl(x; µ)/∂µ

                   = (1/µ) ( ( Σ_{l=1}^{mk} φ²kl(x; µ) ukl(x; µ) ) ( Σ_{l=1}^{mk} ukl(x; µ) ) − ( Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ) )² ) ≥ 0
because

    ( Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ) )² = ( Σ_{l=1}^{mk} φkl(x; µ) √ukl(x; µ) · √ukl(x; µ) )² ≤ ( Σ_{l=1}^{mk} φ²kl(x; µ) ukl(x; µ) ) ( Σ_{l=1}^{mk} ukl(x; µ) )
holds by the Schwarz inequality. Thus, functions Sk(x; µ), 1 ≤ k ≤ m, are convex functions of the parameter µ, so their derivatives ∂Sk(x; µ)/∂µ are nondecreasing. Obviously, it holds

    lim_{µ→0} ∂Sk(x; µ)/∂µ = lim_{µ→0} log Σ_{l=1}^{mk} exp φkl(x; µ) − lim_{µ→0} Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ)

                           = log m̂k − (1/m̂k) lim_{µ→0} Σ_{l=1}^{mk} φkl(x; µ) exp φkl(x; µ) = log m̂k,

where m̂k denotes the number of indices l with fkl(x) = Fk(x), because φkl(x; µ) = 0 if fkl(x) = Fk(x) and limµ→0 φkl(x; µ) = −∞, limµ→0 φkl(x; µ) exp φkl(x; µ) = 0 if fkl(x) < Fk(x). Similarly, it holds
    lim_{µ→∞} ∂Sk(x; µ)/∂µ = lim_{µ→∞} log Σ_{l=1}^{mk} exp φkl(x; µ) − lim_{µ→∞} Σ_{l=1}^{mk} φkl(x; µ) ukl(x; µ) = log mk

because limµ→∞ φkl(x; µ) = 0 and limµ→∞ |ukl(x; µ)| ≤ 1 for 1 ≤ k ≤ m, 1 ≤ l ≤ mk.
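The monotonicity and convexity of Sk(x; µ) with respect to µ, as well as the two limits of ∂Sk(x; µ)/∂µ, can be checked numerically. The sketch below, written under the same assumed log-sum-exp form of Sk as above and with fixed function values, approximates the derivative by central differences; the chosen values and step size are illustrative.

    import numpy as np

    def S_k(f, mu):
        # Sk(x; mu) = Fk + mu*log(sum_l exp((fl - Fk)/mu)) for fixed values fl = fkl(x)
        f = np.asarray(f, dtype=float)
        F = f.max()
        return F + mu * np.log(np.exp((f - F) / mu).sum())

    f = np.array([2.0, 2.0, 1.5, 0.0])     # mk = 4 values, two of them attain the maximum
    dS = lambda mu, h=1e-6: (S_k(f, mu + h) - S_k(f, mu - h)) / (2.0 * h)

    derivatives = [dS(mu) for mu in (1e-3, 1e-2, 1e-1, 1.0, 10.0, 1e3)]
    print(derivatives)                     # nondecreasing in mu (convexity of Sk in mu)
    print(np.log(2.0), derivatives[0])     # dSk/dmu -> log(number of maximal values) as mu -> 0
    print(np.log(4.0), derivatives[-1])    # dSk/dmu -> log mk as mu -> infinity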
Lemma 8. Let Assumptions X1b and X3 be satisfied. Then the values µi, i ∈ N, generated by Algorithm 3, form a nonincreasing sequence such that µi → 0.

Proof. Lemma 8 is a direct consequence of Lemma 6, because the same procedures for the update of the parameter µ are used and (146) holds.
Theorem 12. Let the assumptions of Lemma 8 be satisfied. Consider a sequence xi, i ∈ N, generated by Algorithm 3, where ε = µ̲ = 0. Then

    lim_{i→∞} Σ_{k=1}^{m} Σ_{l=1}^{mk} ukl(xi; µi) gkl(xi) = 0,        Σ_{l=1}^{mk} ukl(xi; µi) = 1,

and

    Fk(xi) − fkl(xi) ≥ 0,    ukl(xi; µi) ≥ 0,    lim_{i→∞} ukl(xi; µi)(Fk(xi) − fkl(xi)) = 0
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk .
Proof.
(a) Equations ẽTk uk (xi ; µi ) = 1 for 1 ≤ k ≤ m follow from (139). Inequalities Fk (xi ) − fkl (xi ) ≥ 0 and
ukl (xi ; µi ) ≥ 0 for 1 ≤ k ≤ m and 1 ≤ l ≤ mk follow from (4) and (138).
(b) Since Sk (x; µ) are nondecreasing functions of the parameter µ by Lemma 7 and (146) holds, we can
write
    F̲ ≤ Σ_{k=1}^{m} Fk(xi+1) ≤ S(xi+1; µi+1) ≤ S(xi+1; µi) ≤ S(xi; µi) − c∥g(xi; µi)∥²
      ≤ S(x1; µ1) − c Σ_{j=1}^{i} ∥g(xj; µj)∥²,

where F̲ = Σ_{k=1}^{m} F̲k and F̲k, 1 ≤ k ≤ m, are lower bounds from Assumption X1b. Thus, it holds

    F̲ ≤ lim_{i→∞} S(xi+1; µi+1) ≤ S(x1; µ1) − c Σ_{i=1}^{∞} ∥g(xi; µi)∥²,

or

    Σ_{i=1}^{∞} ∥g(xi; µi)∥² ≤ (1/c) (S(x1; µ1) − F̲),
so ∥g(xi ; µi )∥ → 0, which together with inequalities 0 ≤ ukl (xi ; µi ) ≤ 1, 1 ≤ k ≤ m, 1 ≤ l ≤ mk ,
gives limi→∞ ukl (xi ; µi )gkl (xi ) = 0.
(c) Let indices 1 ≤ k ≤ m and 1 ≤ l ≤ mk be chosen arbitrarily. Using (138) we get
    0 ≤ ukl(xi; µi)(Fk(xi) − fkl(xi)) = −µi φkl(xi; µi) exp φkl(xi; µi) / Σ_{l=1}^{mk} exp φkl(xi; µi)
      ≤ −µi φkl(xi; µi) exp φkl(xi; µi) ≤ µi/e,

where φkl(xi; µi), 1 ≤ k ≤ m, 1 ≤ l ≤ mk, are functions used in the proof of Lemma 7, because

    Σ_{l=1}^{mk} exp φkl(xi; µi) ≥ 1
and the function t exp t attains its minimal value −1/e at the point t = −1. Since µi → 0, we obtain
ukl (xi ; µi )(Fk (xi ) − fkl (xi )) → 0.
Corollary 2. Let the assumptions of Theorem 12 be satisfied. Then every cluster point x ∈ Rn of a
sequence xi , i ∈ N , satisfies the necessary KKT conditions (5)–(6), where u (with elements uk , 1 ≤ k ≤ m)
is a cluster point of a sequence u(xi ; µi ), i ∈ N .
Now we suppose that the values ε and µ̲ are nonzero and show how precise a solution of the system of KKT equations is obtained after Algorithm 3 terminates.
Theorem 13. Let the assumptions of Theorem 8 be satisfied and let xi , i ∈ N , be a sequence generated
by Algorithm 3. Then, if the values ε > 0 and µ̲ > 0 are chosen arbitrarily, there exists an index i ≥ 1 such that

    ∥g(xi; µi)∥ ≤ ε,        ẽk^T uk(xi; µi) = 1,    1 ≤ k ≤ m,

and

    Fk(xi) − fkl(xi) ≥ 0,    ukl(xi; µi) ≥ 0,    ukl(xi; µi)(Fk(xi) − fkl(xi)) ≤ µ̲/e
for all 1 ≤ k ≤ m and 1 ≤ l ≤ mk .
Proof. Equalities ẽk^T uk(xi; µi) = 1, 1 ≤ k ≤ m, follow from (139). Inequalities Fk(xi) − fkl(xi) ≥ 0 and ukl(xi; µi) ≥ 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk, follow from (10) and (138). Since µi → 0 holds by Lemma 8 and ∥g(xi; µi)∥ → 0 holds by Theorem 12, there exists an index i ≥ 1 such that µi ≤ µ̲ and ∥g(xi; µi)∥ ≤ ε. By (138), as in the proof of Theorem 12, one can write

    ukl(xi; µi)(Fk(xi) − fkl(xi)) ≤ −µi φkl(xi; µi) exp φkl(xi; µi) ≤ µi/e ≤ µ̲/e
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk .
For the classical minimax problem, the Newton equation can be written in the form

    (W(x; µ) − (1/µ) g(x; µ) g^T(x; µ)) ∆x = −g(x; µ),        (152)

where

    W(x; µ) = G(x; µ) + (1/µ) A(x) U(x; µ) A^T(x),    g(x; µ) = A(x) u(x; µ).        (153)

Since

    (W − (1/µ) g g^T)^{-1} = W^{-1} + (W^{-1} g g^T W^{-1}) / (µ − g^T W^{-1} g)

holds by the Sherman-Morrison formula, the solution of the system of equations (152) can be written in the form

    ∆x = µ / (g^T W^{-1} g − µ) · W^{-1} g.        (154)
If a matrix W is not positive definite, it may be replaced with a matrix LLT = W + E obtained by the
Gill-Murray decomposition described in [11]. Then, we solve an equation
    L L^T p = g        (155)

and set

    ∆x = µ / (g^T p − µ) · p.        (156)
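A minimal sketch of the computation (155)–(156): assuming W is available as a dense symmetric positive definite matrix, a plain Cholesky factorization stands in for the Gill-Murray decomposition L L^T = W + E, and ∆x is obtained from the rank-one correction. The data in the illustrative call are random and serve only to exercise the formulas; in Algorithm 3, W, g, and µ come from (153).

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def minimax_direction(W, g, mu):
        # Solve L L^T p = g and set dx = mu / (g^T p - mu) * p, cf. (155)-(156).
        # A plain Cholesky factorization replaces the Gill-Murray decomposition here.
        c, low = cho_factor(W)
        p = cho_solve((c, low), g)
        return mu / (g.dot(p) - mu) * p

    # illustrative data only
    rng = np.random.default_rng(0)
    B = rng.standard_normal((5, 5))
    W = B @ B.T + 5.0 * np.eye(5)          # symmetric positive definite
    g = rng.standard_normal(5)
    print(minimax_direction(W, g, mu=1e-2))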
Minimization of a sum of absolute values, i.e., minimization of the function
    F(x) = Σ_{k=1}^{m} |fk(x)| = Σ_{k=1}^{m} max(fk+(x), fk−(x)),    fk+(x) = fk(x),    fk−(x) = −fk(x),
is another important generalized minimax problem. In this case, a smoothing function has the form
    S(x; µ) = F(x) + µ Σ_{k=1}^{m} log ( exp(−(|fk(x)| − fk+(x))/µ) + exp(−(|fk(x)| − fk−(x))/µ) )

            = Σ_{k=1}^{m} |fk(x)| + µ Σ_{k=1}^{m} log ( 1 + exp(−2|fk(x)|/µ) )
because fk+ (x) = |fk (x)| if fk (x) ≥ 0 and fk− (x) = |fk (x)| if fk (x) ≤ 0, and by Theorem 11 we have
    ∇S(x; µ) = Σ_{k=1}^{m} (gk+ uk+ + gk− uk−) = Σ_{k=1}^{m} gk (uk+ − uk−) = Σ_{k=1}^{m} gk uk = g(x; µ),

    ∇²S(x; µ) = Σ_{k=1}^{m} Gk (uk+ − uk−) + (1/µ) Σ_{k=1}^{m} gk gk^T (uk+ + uk−) − (1/µ) Σ_{k=1}^{m} gk gk^T (uk+ − uk−)²
              = G(x; µ) + (1/µ) Σ_{k=1}^{m} gk gk^T (1 − uk²)

(because uk+ + uk− = 1), where gk = gk(x) and

    uk = uk+ − uk− = ( exp(−(|fk(x)| − fk+(x))/µ) − exp(−(|fk(x)| − fk−(x))/µ) ) / ( exp(−(|fk(x)| − fk+(x))/µ) + exp(−(|fk(x)| − fk−(x))/µ) )
                   = ( (1 − exp(−2|fk(x)|/µ)) / (1 + exp(−2|fk(x)|/µ)) ) sign(fk(x)),

    1 − uk² = 4 exp(−2|fk(x)|/µ) / ( 1 + exp(−2|fk(x)|/µ) )²,
where µ → 0. Necessary conditions for an extremum of problem (159) have the form
    g(x, u) = Σ_{k=1}^{m} Σ_{l=1}^{mk} gkl(x) ukl = 0,

    1 − Σ_{l=1}^{mk} ukl = 0,    1 ≤ k ≤ m,

    ukl skl − µ = 0,    1 ≤ k ≤ m,  1 ≤ l ≤ mk,

    fkl(x) + skl − zk = 0,    1 ≤ k ≤ m,  1 ≤ l ≤ mk,
which is n+m+2m̄ equations for n+m+2m̄ unknowns (vectors x, z = [zk ], s = [skl ], u = [ukl ], 1 ≤ k ≤ m,
1 ≤ l ≤ mk ), where m̄ = m1 + · · · + mm . Denote A(x) = [A1 (x), . . . , Am (x)], f = [fkl ], S = diag(skl ),
U = diag(ukl ), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and
    E = diag(ẽ1, ẽ2, . . . , ẽm)  (a block-diagonal matrix with columns ẽ1, . . . , ẽm),    ẽ = [ẽ1^T, ẽ2^T, . . . , ẽm^T]^T,    z = [z1, z2, . . . , zm]^T
(matrices Ak (x), vectors ẽk , and numbers zk , 1 ≤ k ≤ m, are defined in Section 3.2). Applying the
Newton method to this system of nonlinear equations, we obtain a system of linear equations for the increments (direction vectors) ∆x, ∆z, ∆s, ∆u. After arrangement and elimination we obtain linear system (163), where c(x̃) = f(x) − Ez. This system of equations is more advantageous than systems (94) and (144) in that its matrix does not depend on the barrier parameter µ, so it is not necessary to use a lower bound µ̲. On the other hand, system (163) has dimension n + m + m̄, while systems (94) and (144) have dimension n. It would be possible to eliminate the vector ∆u, so the resulting system (164), where M = U^{-1}S, would have dimension n + m (i.e., n + 1 for classical minimax problems). Nevertheless,
as follows from the equation ukl skl = µ, either ukl → 0 or skl → 0 if µ → 0, so some elements of a matrix
M −1 may tend to infinity, which increases the condition number of system (164). Conversely, the solution
of equation (163) is easier if the elements of a matrix M are small (if M = 0, we obtain the saddle point
system, which can be solved by efficient iterative methods [1], [29]). Therefore, it is advantageous to split the constraints into active ones with skl ≤ ε̃ ukl (we denote active quantities by ĉ(x̃), Â(x̃), ŝ, ∆ŝ, Ŝ, û, ∆û, Û, M̂ = Û^{-1}Ŝ) and inactive ones with skl > ε̃ ukl (we denote inactive quantities by č(x̃), Ǎ(x̃), š, ∆š, Š, ǔ, ∆ǔ, Ǔ, M̌ = Ǔ^{-1}Š). Eliminating inactive equations from (163) we obtain
    ∆ǔ = M̌^{-1}(č(x̃) + Ǎ^T(x̃) ∆x̃) + µ Š^{-1} ẽ        (165)

and

    [ Ĝ(x̃, u)    Â(x̃) ] [ ∆x̃ ]       [ ĝ(x̃, u)           ]
    [ Â^T(x̃)     −M̂   ] [ ∆û  ]  =  − [ ĉ(x̃) + µ Û^{-1} ẽ ] ,        (166)

where

    Ĝ(x̃, u) = G(x̃, u) + Ǎ(x̃) M̌^{-1} Ǎ^T(x̃),
    ĝ(x̃, u) = g(x̃, u) + Ǎ(x̃) (M̌^{-1} č(x̃) + µ Š^{-1} ẽ),
and M̂ = Û −1 Ŝ is a diagonal matrix of order m̂, where 0 ≤ m̂ ≤ m̄ is the number of active constraints.
Substituting (165) into (160) we can write
    ∆ŝ = −M̂(û + ∆û) + µ Û^{-1} ẽ,    ∆š = −(č + Ǎ^T ∆x̃ + š).        (167)
The matrix of the linear system (166) is symmetric, but indefinite, so its Choleski decomposition cannot be
determined. In this case, we use either dense [3] or sparse [7] Bunch-Parlett decomposition for solving this
system. System (166) (especially if it is large and sparse) can be efficiently solved by iterative conjugate
gradient method with indefinite preconditioner [20]. If the vectors ∆x̃ and ∆û are solutions of system
(166), we determine vector ∆ǔ by (165) and vectors ∆ŝ, ∆š by (167).
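The following sketch illustrates one dense way of forming and solving system (166); LAPACK's symmetric indefinite (Bunch-Kaufman) solver, called through scipy.linalg.solve with assume_a='sym', is used as a stand-in for the Bunch-Parlett decomposition of [3], and all inputs are assumed to be given as dense arrays (M̂ and Û as vectors of diagonal elements).

    import numpy as np
    from scipy.linalg import solve

    def solve_active_system(G_hat, A_hat, M_hat, g_hat, c_hat, u_hat, mu):
        # Assemble and solve system (166):
        #   [ G_hat    A_hat  ] [dx]      [ g_hat            ]
        #   [ A_hat^T  -M_hat ] [du]  = - [ c_hat + mu/u_hat ]
        # M_hat and u_hat are the diagonals of M^ = U^{-1}S^ and U^, given as vectors.
        n = G_hat.shape[0]
        K = np.block([[G_hat, A_hat],
                      [A_hat.T, -np.diag(M_hat)]])
        rhs = -np.concatenate([g_hat, c_hat + mu / u_hat])
        sol = solve(K, rhs, assume_a='sym')   # symmetric indefinite (Bunch-Kaufman) solver
        return sol[:n], sol[n:]               # increments for x~ and u^

For large sparse problems, the alternative described in the text is the conjugate gradient method with the indefinite preconditioner (174).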
Having vectors ∆x̃, ∆s, ∆u, we need to determine a steplength α > 0 and set
x̃+ = x̃ + α∆x̃, s+ = s(α), u+ = u(α), (168)
where s(α) and u(α) are vector functions such that s(α) > 0, s′ (0) = ∆s and u(α) > 0, u′ (0) = ∆u. This
step is not trivial, because we need to decrease both the value of the barrier function B̃µ (x̃, s) = Bµ (x, z, s)
and the norm of constraints ∥c(x̃)∥, and also to assure positivity of vectors s and u. We can do this in
several different ways: using either the augmented Lagrange function [20], [21] or a bi-criterial filter [10],
[37] or a special algorithm [12], [18]. In this section, we confine our attention to the augmented Lagrange
function which has (for problem (157)) the form
    P(α) = B̃µ(x̃ + α∆x̃, s(α)) + (u + ∆u)^T (c(x̃ + α∆x̃) + s(α)) + (σ/2) ∥c(x̃ + α∆x̃) + s(α)∥²,        (169)
where σ ≥ 0 is a penalty parameter. The following theorem, whose proof is given in [20], holds.
Theorem 14. Let s > 0, u > 0 and let vectors ∆x̃, ∆û be solutions of the linear system
    [ Ĝ(x̃, u)    Â(x̃) ] [ ∆x̃ ]     [ ĝ(x̃, u)           ]     [ r ]
    [ Â^T(x̃)     −M̂   ] [ ∆û  ]  +  [ ĉ(x̃) + µ Û^{-1} ẽ ]  =  [ r̂ ] ,        (170)

where r and r̂ are residual vectors, and let vectors ∆ǔ and ∆s be determined by (165) and (167). Then

    P′(0) = −(∆x̃)^T G̃(x̃, u) ∆x̃ − (∆s)^T M^{-1} ∆s − σ∥c(x̃) + s∥² + (∆x̃)^T r + σ(ĉ(x̃) + ŝ)^T r̂.        (171)

If

    σ > − ( (∆x̃)^T G̃(x̃, u) ∆x̃ + (∆s)^T M^{-1} ∆s ) / ∥c(x̃) + s∥²        (172)

and if system (166) is solved in such a way that

    (∆x̃)^T r + σ(ĉ(x̃) + ŝ)^T r̂ < (∆x̃)^T G̃(x̃, u) ∆x̃ + (∆s)^T M^{-1} ∆s + σ∥c(x̃) + s∥²,        (173)

then P′(0) < 0.
then P ′ (0) < 0.
Inequality (173) is significant only if linear system (166) is solved iteratively and residual vectors r and
r̂ are nonzero. If these vectors are zero, then (173) follows immediately from (172). Inequality (172) serves
for determination of a penalty parameter, which should be as small as possible. If the matrix G̃(x̃, u) is
positive semidefinite, then the right-hand side of (172) is negative and we can choose σ = 0.
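As a small illustration of inequality (172), the sketch below returns σ = 0 whenever the quadratic terms are nonnegative and otherwise the smallest admissible value enlarged by a safety factor; the factor 1.1 and the safeguard against a zero denominator are assumptions, not values prescribed by the text.

    import numpy as np

    def penalty_parameter(dx, G_tilde, ds, M, c_plus_s, safety=1.1, tiny=1e-16):
        # Choose sigma >= 0 so that (172) holds:
        #   sigma > -(dx^T G~ dx + ds^T M^{-1} ds) / ||c + s||^2.
        # M is the diagonal of M = U^{-1}S, given as a vector.
        quad = dx @ G_tilde @ dx + ds @ (ds / M)
        if quad >= 0.0:
            return 0.0                        # right-hand side of (172) is not positive
        return safety * (-quad) / max(c_plus_s @ c_plus_s, tiny)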
5.2 Implementation
The algorithm of the primal-dual interior point method consists of four basic parts: determination of the
matrix G(x, u) or its approximation, solving linear system (166), a steplength selection, and an update of
the barrier parameter µ. The matrix G(x, u) has form (74), so its approximation can be determined in one of the ways introduced in Remark 22.
The linear system (166), obtained by determination and subsequent elimination of inactive constraints
in the way described in the previous subsection, is solved either directly using the Bunch-Parlett decom-
position or iteratively by the conjugate gradient method with the indefinite preconditioner
    C = [ D̂          Â(x̃) ]
        [ Â^T(x̃)     −M̂   ] ,        (174)
where D̂ is a positive definite diagonal matrix that approximates matrix Ĝ(x̃, u). An iterative process is
terminated if the residual vectors satisfy condition (173) together with an accuracy condition specified by a prescribed precision 0 < ω < 1. The directional derivative P′(0) of the function (169), given by (171), should be negative. There are two possibilities for achieving this requirement. We either determine the value σ ≥ 0 satisfying inequality (172), which implies P′(0) < 0 if (173) holds (Theorem 14), or set σ = 0 and ignore inequality (173). If P′(0) ≥ 0, we determine a diagonal matrix D̃ with elements
    D̃jj = Γ̲ ∥g̃∥/10        if |G̃jj| < Γ̲ ∥g̃∥/10,
    D̃jj = |G̃jj|           if Γ̲ ∥g̃∥/10 ≤ |G̃jj| ≤ Γ̄ ∥g̃∥/10,        (175)
    D̃jj = Γ̄ ∥g̃∥/10        if Γ̄ ∥g̃∥/10 < |G̃jj|,

for 1 ≤ j ≤ n + m, where g̃ = g̃(x̃, u) and 0 < Γ̲ < Γ̄, set G̃(x̃, u) = D̃ and restart the iterative process by
solving new linear system (166).
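A short sketch of the restart matrix (175) as reconstructed above: the diagonal of G̃ is clamped to the interval [Γ̲∥g̃∥/10, Γ̄∥g̃∥/10]; the particular values of Γ̲ and Γ̄ used below are illustrative assumptions.

    import numpy as np

    def restart_diagonal(G_tilde_diag, g_tilde, gamma_lo=1e-2, gamma_hi=1e2):
        # Diagonal of D~ from (175): clamp |G~_jj| to [gamma_lo, gamma_hi]*||g~||/10.
        scale = np.linalg.norm(g_tilde) / 10.0
        return np.clip(np.abs(G_tilde_diag), gamma_lo * scale, gamma_hi * scale)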
We use functions s(α) = [skl(α)], u(α) = [ukl(α)], where skl(α) = skl + αskl ∆skl, ukl(α) = ukl + αukl ∆ukl and

    αskl = α                          if ∆skl ≥ 0,
    αskl = min(α, −γ skl/∆skl)        if ∆skl < 0,
    αukl = α                          if ∆ukl ≥ 0,
    αukl = min(α, −γ ukl/∆ukl)        if ∆ukl < 0,
when choosing a steplength using the augmented Lagrange function. A parameter 0 < γ < 1 (usually
γ = 0.99) assures the positivity of vectors s+ and u+ in (168). A parameter α > 0 is chosen to satisfy the
inequality P (α) − P (0) ≤ ε1 αP ′ (0), which is possible because P ′ (0) < 0 and a function P (α) is continuous.
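The componentwise steplength bounds above translate directly into code. The sketch below computes αskl and αukl for all components at once and returns s(α) and u(α); the Armijo-type test P(α) − P(0) ≤ ε1 αP′(0) that determines α itself is not repeated here.

    import numpy as np

    def componentwise_steplength(alpha, v, dv, gamma=0.99):
        # alpha_v = alpha where dv >= 0, alpha_v = min(alpha, -gamma*v/dv) where dv < 0,
        # which keeps v + alpha_v*dv strictly positive.
        alpha_v = np.full_like(v, alpha, dtype=float)
        neg = dv < 0
        alpha_v[neg] = np.minimum(alpha, -gamma * v[neg] / dv[neg])
        return alpha_v

    def updated_s_u(alpha, s, ds, u, du, gamma=0.99):
        # s(alpha) and u(alpha) as defined above
        s_new = s + componentwise_steplength(alpha, s, ds, gamma) * ds
        u_new = u + componentwise_steplength(alpha, u, du, gamma) * du
        return s_new, u_new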
After finishing the iterative step, a barrier parameter µ should be updated. There exist several heuristic
procedures for this purpose. The following procedure proposed in [36] seems to be very efficient.
Procedure C
Compute the centrality measure

    ϱ = m̄ min_{k,l}(skl ukl) / (s^T u),

where m̄ = m1 + · · · + mm and 1 ≤ k ≤ m, 1 ≤ l ≤ mk. Compute the value

    λ = 0.1 [ min( 0.05 (1 − ϱ)/ϱ, 2 ) ]³.
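A sketch of the quantities computed in Procedure C: the centrality measure ϱ and the coefficient λ follow the formulas above, while the final line, which turns λ into a new barrier parameter through the average complementarity s^T u/m̄, is an assumption modeled on the rule of [36] rather than a formula taken from this procedure.

    import numpy as np

    def procedure_c_quantities(s, u):
        # Centrality measure and coefficient of Procedure C; s and u have length m_bar.
        m_bar = s.size
        comp = s * u
        rho = m_bar * comp.min() / comp.sum()
        lam = 0.1 * min(0.05 * (1.0 - rho) / rho, 2.0) ** 3
        # Assumed final step in the spirit of [36] (not taken from the text):
        mu_new = lam * comp.sum() / m_bar
        return rho, lam, mu_new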
Step 2 Termination. Determine a matrix Ã(x̃) and a vector g̃(x̃, u) = Ã(x̃)u by (162). If the KKT conditions ∥g̃(x̃, u)∥ ≤ ε, ∥c(x̃) + s∥ ≤ ε, and s^T u ≤ ε are satisfied, terminate the computation.
Step 3 Hessian matrix approximation. Set G = G(x, u) or compute an approximation G of the Hes-
sian matrix G(x, u) using gradient differences or utilizing quasi-Newton updates (Remark 22). De-
termine a parameter σ ≥ 0 by (172) or set σ = 0. Split the constraints into active if ŝkl ≤ ε̃ûkl and
inactive if škl > ε̃ǔkl .
Step 4 Direction determination. Determine the matrix G̃ = G̃(x̃, u) by (162) (where the Hessian ma-
trix G(x, u) is replaced with its approximation G). Determine vectors ∆x̃ and ∆û by solving linear
system (166), a vector ∆ǔ by (165), and a vector ∆s by (167). Linear system (166) is solved ei-
ther directly using the Bunch-Parlett decomposition (we carry out both the symbolic and the numeric
decompositions in this step) or iteratively by the conjugate gradient method with indefinite precondi-
tioner (174). Compute the derivative of the augmented Lagrange function by formula (171).
Step 5 Restart. If P ′ (0) ≥ 0, determine a diagonal matrix D̃ by (175), set G̃ = D̃, σ = 0, and go to
Step 4.
Step 6 Steplength selection. Determine a steplength parameter α > 0 satisfying inequalities P (α) −
P (0) ≤ ε1 αP ′ (0) and α ≤ ∆/∥∆x∥. Determine new vectors x̃ := x̃ + α∆x̃, s := s(α), u := u(α) by
(168). Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and set ckl (x̃) = fkl (x) − zk , 1 ≤ k ≤ m,
1 ≤ l ≤ mk . Compute the value of the barrier function B̃µ (x̃, s).
Step 7 Barrier parameter update. Determine a new value of the barrier parameter µ ≥ µ using Pro-
cedure C. Go to Step 2.
The values ε = 10⁻⁶, ε̃ = 0.1, δ = 0.1, ω = 0.9, γ = 0.99, µ = 1, ε1 = 10⁻⁴, and ∆ = 1000 were used in our numerical experiments.
6 Numerical experiments
The methods studied in this contribution were tested using two collections of test problems, TEST14 and TEST15, described in [30], which are parts of the UFO system [28] and can be downloaded from the web page www.cs.cas.cz/luksan/test.html. Both of these collections contain 22 problems with functions
fk (x), 1 ≤ k ≤ m, x ∈ Rn , where n is an input parameter and m ≥ n depends on n (we have used the
values n = 100 and n = 1000 for numerical experiments). Functions fk (x), 1 ≤ k ≤ m, have a sparse
structure (the Jacobian matrix of a mapping f (x) is sparse), so sparse matrix decompositions can be
used for solving linear equation systems. Since the method described in Section 2.2 does not exploit the sparsity of the quadratic programming problem, it is not comparable with the other methods on the test problems used. Therefore, it is not presented in the test results.
The tested methods, whose results are reported in Tables 1-5, are denoted by seven letters. The first
pair of letters distinguishes the line search methods LS from the trust region methods TR (trust region
methods are used only for minimization of the l1 norm). The second pair of letters gives the problem type:
either a classical minimax MX (when a function F (x) has form (10) or F (x) = ∥f (x)∥∞ holds) or a sum
of absolute values SA (when F(x) = ∥f(x)∥1 holds). Further, two letters specify the method used: PI denotes the primal interior point methods of Section 3, SM the smoothing methods of Section 4, and DI the primal-dual interior point methods of Section 5. The final letter indicates the procedure (A, B, or C) used for updating the smoothing or barrier parameter. The results lead to the following conclusions:
• For l1 approximation, it is usually more advantageous to use the trust region methods TR than the
line search methods LS.
• The smoothing methods are less efficient than the primal interior point methods. For testing the smoothing methods, we had to use the value µ̲ = 10⁻⁶, while the primal interior point methods work well with the smaller value µ̲ = 10⁻⁸, which gives more precise results.
• The primal-dual interior point methods are slower than the primal interior point methods, mainly because the system of equations (166) is indefinite, so we cannot use the Choleski (or the Gill-Murray [11]) decomposition. If the matrix of linear system (166) is large and sparse, we can use the Bunch-Parlett decomposition [7]. In this case, a large fill-in by new nonzero elements may appear (and thus the operational memory may be exhausted or the computational time may grow considerably). Alternatively, we can use the iterative conjugate gradient method with an indefinite preconditioner [29]; however, ill-conditioned systems can require a large number of iterations and thus a large computational time.
• It cannot be uniquely decided whether Procedure A is better than Procedure B. The Newton methods
usually work better with Procedure A while the variable metric methods are more efficient with
Procedure B.
• The variable metric methods are usually faster because it is not necessary to determine the elements
of the Hessian matrix of the Lagrange function by gradient differences. The Newton methods seem
to be more robust (especially in case of l1 approximation).
References
[1] M.Benzi, G.H.Golub, J.Liesen: Numerical solution of saddle point problems. Acta Numerica 14
(2005) 1-137.
[2] A.Björck: Numerical Methods in Matrix Computations. Springer, New York, 2015.
[3] J.R.Bunch, B.N.Parlett: Direct methods for solving symmetric indefinite systems of linear equations.
SIAM J. Numerical Analysis 8 (1971) 639-655.
[4] T.F.Coleman, J.J.Moré: Estimation of sparse Hessian matrices and graph coloring problems. Math-
ematical Programming 28 (1984) 243-270.
[5] A.R.Conn, N.I.M.Gould, P.L.Toint: Trust-region Methods. SIAM, Philadelphia, 2000.
[6] G. Di Pillo, L. Grippo, S. Lucidi: Smooth Transformation of the generalized minimax problem. J. of
Optimization Theory and Applications 95 (1997) 1-24.
[7] I.S.Duff, N.I.M.Gould, J.K.Reid, K.Turner: The factorization of sparse symmetric indefinite matrices. IMA Journal of Numerical Analysis 11 (1991) 181-204.
[8] M.Fiedler: Special Matrices and Their Applications in Numerical Mathematics. Dover Publications,
New York, 2008.
[9] R.Fletcher: Practical methods of optimization. Wiley, New York, 1987.
[10] R.Fletcher, S.Leyffer: Nonlinear programming without a penalty function. Mathematical Programming 91 (2002) 239-269.
[11] P.E.Gill, W.Murray: Newton type methods for unconstrained and linearly constrained optimization.
Mathematical Programming 7 (1974) 311-350.
[12] N.I.M.Gould, P.L.Toint: Nonlinear programming without a penalty function or a filter. Mathematical
Programming 122 (2010) 155-196.
[13] A.Griewank, P.L.Toint: Partitioned variable metric updates for large-scale structured optimization problems. Numerische Mathematik 39 (1982) 119-137.
[14] A.Griewank, A.Walther: Evaluating Derivatives. SIAM, Philadelphia, 2008.
[15] S.P.Han: Variable metric methods for minimizing a class of nondifferentiable functions. Math. Pro-
gramming 20 (1981) 1-13.
[16] D. Le: Three new rapidly convergent algorithms for finding a zero of a function. SIAM Journal on Scientific and Statistical Computing 6 (1985) 193-208.
[17] D. Le: An efficient derivative-free method for solving nonlinear equations. ACM Transactions on
Mathematical Software 11 (1985) 250-262.
[18] X.Liu, Y.Yuan: A sequential quadratic programming method without a penalty function or a filter
for nonlinear equality constrained optimization. SIAM J. Optimization 21 (2011) 545-571.
[19] L.Lukšan: Dual method for solving a special problem of quadratic programming as a subproblem at
linearly constrained nonlinear minimax approximation. Kybernetika 20 (1984) 445-457.
[20] L.Lukšan, C.Matonoha, J.Vlček: Interior-point method for non-linear non-convex optimization. Nu-
merical Linear Algebra with Applications 11 (2004) 431-453.
[21] L.Lukšan, C.Matonoha, J.Vlček: Interior-point method for large-scale nonlinear programming. Op-
timization Methods and Software 20 (2005) 569-582.
[22] L.Lukšan, C.Matonoha, J.Vlček: Trust-region interior-point method for large sparse l1 optimization.
Optimization Methods and Software 22 (2007) 737-753.
[23] L.Lukšan, C.Matonoha, J.Vlček: On Lagrange multipliers of trust-region subproblems. BIT Numerical Mathematics 48 (2008) 763-768.
[24] L.Lukšan, C.Matonoha, J.Vlček: Algorithm 896: LSA: Algorithms for Large-Scale Optimization.
ACM Transactions on Mathematical Software 36 (2009) No. 3.
[25] L. Lukšan, C. Matonoha, J. Vlček: Primal interior-point method for large sparse minimax optimiza-
tion. Kybernetika 45 (2009) 841-864.
[26] L. Lukšan, C. Matonoha, J. Vlček: Primal interior-point method for minimization of generalized
minimax functions. Kybernetika 46 (2010) 697-721.
[27] L.Lukšan, E.Spedicato: Variable metric methods for unconstrained optimization and nonlinear least
squares. Journal of Computational and Applied Mathematics 124 (2000) 61-93.
[28] L.Lukšan, M.Tůma, C.Matonoha, J.Vlček, N.Ramešová, M.Šiška, J.Hartman: UFO 2017. Inter-
active System for Universal Functional Optimization. Technical Report V-1252. Prague, ICS AS CR
2017.
[29] L.Lukšan, J.Vlček: Indefinitely preconditioned inexact Newton method for large sparse equality
constrained nonlinear programming problems. Numerical Linear Algebra with Applications 5 (1998)
219-247.
[30] L.Lukšan, J.Vlček: Sparse and partially separable test problems for unconstrained and equality constrained optimization. Technical Report V-767. Prague, ICS AS CR 1998.
[33] M.J.D.Powell: On the global convergence of trust region algorithms for unconstrained minimization.
Mathematical Programming 29 (1984) 297-303.
[34] P.L.Toint: On sparse and symmetric matrix updating subject to a linear equation. Mathematics of
Computation 31 (1977) 954-961.
[35] M.Tůma: A note on direct methods for approximations of sparse Hessian matrices. Applications of Mathematics 33 (1988) 171-176.
[36] R.J.Vanderbei, D.F.Shanno: An interior point algorithm for nonconvex nonlinear programming. Computational Optimization and Applications 13 (1999) 231-252.
[37] A.Wächter, L.Biegler: Line search filter methods for nonlinear programming: Motivation and global convergence. SIAM Journal on Optimization 16 (2005) 1-31.
[38] Y.Xiao, B.Yu: A truncated aggregate smoothing Newton method for minimax problems. Applied
Mathematics and Computation 216 (2010) 1868-1879.
Newton methods: n=100 Variable metric methods: n=100
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 2232 7265 11575 0.74 4 - 2849 5078 2821 0.32 2 -
LSMXPI-B 2184 5262 9570 0.60 1 - 1567 2899 1589 0.24 1 -
LSMXSM-A 3454 11682 21398 1.29 5 - 4444 12505 4465 1.03 - -
LSMXSM-B 10241 36891 56399 4.15 3 - 8861 32056 8881 2.21 1 1
LSMXDI-C 1386 2847 14578 0.90 2 - 2627 5373 2627 0.96 3 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 1386 3735 7488 5.58 4 - 3237 12929 3258 5.91 6 -
LSMXPI-B 3153 6885 12989 9.03 4 - 1522 3287 1544 2.68 5 -
LSMXSM-A 10284 30783 82334 54.38 7 - 4221 9519 4242 8.00 8 -
LSMXSM-B 18279 61180 142767 87.76 6 - 13618 54655 13639 45.10 9 1
LSMXDI-C 3796 6677 48204 49.95 6 - 2371 5548 2371 18.89 3 -
Newton methods: n=100 Variable metric methods: n=100
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 2098 2469 10852 0.57 1 - 22710 22903 22731 1.74 1 1
TRSAPI-B 2286 2771 11353 0.56 1 - 22311 22476 22332 1.62 1 1
LSSAPI-A 1647 5545 8795 0.63 5 - 12265 23579 12287 1.37 2 1
LSSAPI-B 1957 7779 10121 0.67 6 - 4695 6217 10608 0.67 3 -
TRSASM-A 2373 2868 19688 0.73 1 - 22668 22918 22689 2.34 2 1
TRSASM-B 3487 4382 28467 1.12 1 - 22022 22244 22044 1.90 2 1
LSSASM-A 1677 4505 16079 0.74 3 - 20025 27369 20047 2.83 4 -
LSSASM-B 2389 8085 23366 1.18 4 - 5656 11637 5678 1.02 2 -
LSSADI-C 4704 13012 33937 4.16 7 1 6547 7012 6547 9.18 8 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 7570 8955 30013 12.54 3 - 24896 25307 24916 16.22 5 1
TRSAPI-B 8555 9913 36282 18.11 6 - 25013 25492 25033 16.64 5 1
LSSAPI-A 7592 19621 46100 15.39 4 - 22277 36610 22298 19.09 7 1
LSSAPI-B 9067 35463 56292 19.14 6 - 16650 35262 16672 14.47 6 1
TRSASM-A 7922 9453 49104 12.66 2 - 26358 26966 26378 26.44 4 1
TRSASM-B 9559 11358 58418 16.39 7 - 24283 24911 24303 17.79 6 1
LSSASM-A 5696 13534 41347 15.28 4 - 20020 30736 20042 23.05 5 1
LSSASM-B 8517 30736 57878 23.60 6 - 18664 28886 18686 18.65 5 1
LSSADI-C 6758 11011 47960 94.78 11 1 13123 14610 13124 295.46 8 2