
Institute of Computer Science

Academy of Sciences of the Czech Republic

Numerical solution of generalized minimax problems

L. Lukšan, C. Matonoha, J. Vlček

Technical report No. 1255

January 2018

Pod Vodárenskou věží 2, 182 07 Prague 8
phone: +420 2 688 42 44, fax: +420 2 858 57 89
e-mail: [email protected]
Abstract:

This contribution contains the description and investigation of four numerical methods for solving generalized
minimax problems, which consist in the minimization of functions that are compositions of special smooth
convex functions with maxima of smooth functions (the most important problem of this type is the sum of
maxima of smooth functions). Section 1 is introductory. In Section 2, we study recursive quadratic programming
methods. This section also contains the description of the dual method for solving the corresponding quadratic
programming problems. Section 3 is devoted to primal interior point methods, which use solutions of nonlinear
equations for obtaining minimax vectors. Section 4 contains an investigation of smoothing methods based on
exponential smoothing terms. Section 5 contains a short description of primal-dual interior point methods
based on the transformation of generalized minimax problems to general nonlinear programming problems. Finally,
the last section contains results of numerical experiments.

Keywords:
Numerical optimization, nonlinear approximation, nonsmooth optimization, generalized minimax problems,
recursive quadratic programming methods, interior point methods, smoothing methods, algorithms, numerical
experiments.
Contents

1 Generalized minimax problems                              2

2 Recursive quadratic programming methods                   6
  2.1 Basic properties                                      6
  2.2 Solving special quadratic programming problems        9

3 Primal interior point methods                            14
  3.1 Barriers and barrier functions                       14
  3.2 Iterative determination of a minimax vector          15
  3.3 Direct determination of a minimax vector             18
  3.4 Implementation                                       21
  3.5 Global convergence                                   23
  3.6 Special cases                                        27

4 Smoothing methods                                        31
  4.1 Basic properties                                     31
  4.2 Global convergence                                   34
  4.3 Special cases                                        37

5 Primal-dual interior point methods                       38
  5.1 Basic properties                                     38
  5.2 Implementation                                       41

6 Numerical experiments                                    42

References                                                 44

1 Generalized minimax problems
In many practical problems we need to minimize functions that contain absolute values or pointwise maxima
of smooth functions. Such functions are nonsmooth but they often have a special structure enabling
the use of special methods that are more efficient than methods for minimization of general nonsmooth
functions. The classical minimax problem, where F(x) = \max_{1 \le k \le m} f_k(x), or problems where the function
to be minimized is a nonsmooth norm, e.g. F(x) = ‖f(x)‖_∞, F(x) = ‖f_+(x)‖_∞, F(x) = ‖f(x)‖_1,
F(x) = ‖f_+(x)‖_1 with f(x) = [f_1(x), ..., f_m(x)]^T and f_+(x) = [\max(f_1(x), 0), ..., \max(f_m(x), 0)]^T, are
typical examples. Such functions can be considered as special cases of more general functions, so it is
possible to formulate more general theories and construct more general numerical methods. One possibility
for generalization of the classical minimax problem consists in the use of the function

    F(x) = \max_{1 \le k \le \bar{k}} p_k^T f(x),        (1)

where p_k ∈ R^m, 1 ≤ k ≤ \bar{k}, and f : R^n → R^m is a smooth mapping. This function is a special case of
composite nonsmooth functions of the form F(x) = f_0(x) + \max_{1 \le k \le \bar{k}} (p_k^T f(x) + b_k), where f_0 : R^n → R
is a continuously differentiable function [9, Section 14.1].
Remark 1. We can express all of the above mentioned minimax problems and nonsmooth norms in form (1).

(a) Setting p_k = e_k, where e_k is the k-th column of a unit matrix, and \bar{k} = m, we obtain F(x) =
\max_{1 \le k \le m} f_k(x) (the classical minimax).

(b) Setting p_k = e_k, p_{m+k} = -e_k, and \bar{k} = 2m, we obtain F(x) = \max_{1 \le k \le m} \max(f_k(x), -f_k(x)) =
‖f(x)‖_∞.

(c) Setting p_k = e_k, p_{m+1} = 0, and \bar{k} = m + 1, we obtain F(x) = \max(\max_{1 \le k \le m} f_k(x), 0) = ‖f_+(x)‖_∞.

(d) If \bar{k} = 2^m and p_k, 1 ≤ k ≤ 2^m, are mutually different vectors whose elements are either 1 or -1, we
can write F(x) = \sum_{k=1}^{m} \max(f_k(x), -f_k(x)) = ‖f(x)‖_1.

(e) If \bar{k} = 2^m and p_k, 1 ≤ k ≤ 2^m, are mutually different vectors whose elements are either 1 or 0, we
can write F(x) = \sum_{k=1}^{m} \max(f_k(x), 0) = ‖f_+(x)‖_1.

Remark 2. Since the mapping f(x) is continuously differentiable, the function (1) is Lipschitz. Thus, if
the point x ∈ R^n is a local minimum of F(x), then 0 ∈ ∂F(x) holds [31, Theorem 3.2.5]. According to
[31, Theorem 3.2.13], one has

    ∂F(x) = (∇f(x))^T \mathrm{conv}\{ p_k : k ∈ \bar{I}(x) \},

where \bar{I}(x) = \{k ∈ \{1, ..., \bar{k}\} : p_k^T f(x) = F(x)\}. Thus, if the point x ∈ R^n is a local minimum of F(x),
then multipliers λ_k ≥ 0, 1 ≤ k ≤ \bar{k}, exist such that λ_k (p_k^T f(x) - F(x)) = 0, 1 ≤ k ≤ \bar{k},

    \sum_{k=1}^{\bar{k}} λ_k = 1   and   \sum_{k=1}^{\bar{k}} λ_k J(x)^T p_k = 0,

where J(x) is the Jacobian matrix of the mapping f(x).

Remark 3. It is clear that a minimum of function (1) is a solution of a nonlinear programming problem
consisting in minimization of a function F̃ : R^{n+1} → R, where F̃(x, z) = z, on the set

    C = \{(x, z) ∈ R^{n+1} : p_k^T f(x) ≤ z, 1 ≤ k ≤ \bar{k}\}.

Obviously, a_k = ∇c_k(x, z) = (p_k^T J(x), -1), 1 ≤ k ≤ \bar{k}, and g = ∇F̃(x, z) = (0, 1), so the necessary KKT
conditions can be written in the form

    \begin{bmatrix} 0 \\ 1 \end{bmatrix} + \sum_{k=1}^{\bar{k}} λ_k \begin{bmatrix} J^T(x) p_k \\ -1 \end{bmatrix} = 0,

λ_k (p_k^T f(x) - z) = 0, where λ_k ≥ 0 are the Lagrange multipliers and z = F(x). Thus, we obtain the same
necessary conditions for an extremum as in Remark 2.
From the examples given in Remark 1 it follows that composite nondifferentiable functions are not
suitable for representation of the functions F(x) = ‖f(x)‖_1 and F(x) = ‖f_+(x)‖_1 because in this case the
expression on the right-hand side of (1) contains 2^m elements with vectors p_k, 1 ≤ k ≤ 2^m. In the
subsequent considerations, we will choose a somewhat different approach. We will consider generalized
minimax functions established in [6] and [26].
Definition 1. We say that F : R^n → R is a generalized minimax function if

    F(x) = h(F_1(x), ..., F_m(x)),   F_k(x) = \max_{1 \le l \le m_k} f_{kl}(x),   1 ≤ k ≤ m,        (2)

where h : R^m → R and f_{kl} : R^n → R, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, are smooth functions satisfying the
following assumptions.

Assumption X1a. Functions f_{kl}, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, are bounded from below on R^n, so that there
exists a constant \underline{F} ∈ R such that f_{kl}(x) ≥ \underline{F}, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, for all x ∈ R^n.

Assumption X1b. Functions F_k, 1 ≤ k ≤ m, are bounded from below on R^n, so that there exist constants
\underline{F}_k ∈ R such that F_k(x) ≥ \underline{F}_k, 1 ≤ k ≤ m, for all x ∈ R^n.

Assumption X2. The function h is twice continuously differentiable and convex, satisfying

    0 < \underline{h}_k ≤ ∂h(z)/∂z_k ≤ \overline{h}_k,   1 ≤ k ≤ m,        (3)

for every z ∈ Z = \{z ∈ R^m : z_k ≥ \underline{F}_k, 1 ≤ k ≤ m\} (the vector z ∈ R^m is called the minimax vector).

Assumption X3. Functions f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, are twice continuously differentiable on the
convex hull of the level set

    D_F(\overline{F}) = \{x ∈ R^n : F_k(x) ≤ \overline{F}, 1 ≤ k ≤ m\}

for a sufficiently large upper bound \overline{F}, and consequently constants \overline{g} and \overline{G} exist such that ‖g_{kl}(x)‖ ≤ \overline{g}
and ‖G_{kl}(x)‖ ≤ \overline{G} for all 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, and x ∈ \mathrm{conv}\, D_F(\overline{F}), where g_{kl}(x) = ∇f_{kl}(x) and
G_{kl}(x) = ∇²f_{kl}(x).
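To make Definition 1 concrete, the following minimal Python sketch evaluates F(x) = h(F_1(x), ..., F_m(x)) for a user-supplied smooth h and groups of smooth functions f_{kl}. The names `generalized_minimax`, `f_groups`, and `h_sum` are ours, not from the report; the example instantiates the ℓ1-norm case discussed in Remark 4(2) below.

```python
import numpy as np

def generalized_minimax(h, f_groups, x):
    """Evaluate F(x) = h(F_1(x), ..., F_m(x)) with F_k(x) = max_l f_kl(x).

    h        : callable mapping a length-m vector z to a scalar
    f_groups : list of lists; f_groups[k] contains the callables f_kl of group k
    x        : point at which F is evaluated
    """
    z = np.array([max(f(x) for f in group) for group in f_groups])
    return h(z), z

# Example: F(x) = ||f(x)||_1 with f(x) = (x_1 - 1, x_2 + 2), expressed as a sum
# of maxima with m_k = 2: F_k(x) = max(f_k(x), -f_k(x)).
f1 = lambda x: x[0] - 1.0
f2 = lambda x: x[1] + 2.0
groups = [[f1, lambda x: -f1(x)], [f2, lambda x: -f2(x)]]
h_sum = lambda z: float(np.sum(z))      # h(z) = z_1 + ... + z_m

F, z = generalized_minimax(h_sum, groups, np.array([0.0, 0.0]))
print(F, z)   # 3.0, [1. 2.]
```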
Remark 4. The conditions imposed on the function h(z) are relatively strong, but many important nonsmooth functions satisfy them.

(1) Let h : R → R be the identity mapping, so h(z) = z and h'(z) = 1 > 0. Then, setting m = 1 and m_1 = \bar{k}
with f_{1l}(x) = p_l^T f(x), we obtain F(x) = h(F_1(x)) = F_1(x) = \max_{1 \le l \le \bar{k}} p_l^T f(x), i.e., composite nonsmooth
function (1), and therefore the functions F(x) = \max_{1 \le k \le m} f_k(x), F(x) = ‖f(x)‖_∞, F(x) = ‖f_+(x)‖_∞.

(2) Let h : R^m → R, where h(z) = z_1 + ··· + z_m, so ∂h(z)/∂z_k = 1 > 0, 1 ≤ k ≤ m. Then function (2)
has the form

    F(x) = \sum_{k=1}^{m} F_k(x) = \sum_{k=1}^{m} \max_{1 \le l \le m_k} f_{kl}(x)        (4)

(the sum of maxima). If m_k = 2 and F_k(x) = \max(f_k(x), -f_k(x)), we obtain the function F(x) =
‖f(x)‖_1. If m_k = 2 and F_k(x) = \max(f_k(x), 0), we obtain the function F(x) = ‖f_+(x)‖_1. It follows
that the expression of the functions F(x) = ‖f(x)‖_1 and F(x) = ‖f_+(x)‖_1 by (2) contains only m
summands and each summand is a maximum of two function values. Thus, this approach is much
more economical than the use of the formulas stated in Remark 1 (d)-(e).

Remark 5. Since the functions F_k(x), 1 ≤ k ≤ m, are regular [31, Theorem 3.2.13], the function h(z) is
continuously differentiable, and h_k = ∂h(z)/∂z_k > 0, one can write [31, Theorem 3.2.9]

    ∂F(x) = \mathrm{conv} \sum_{k=1}^{m} h_k ∂F_k(x) = \sum_{k=1}^{m} h_k ∂F_k(x) = \sum_{k=1}^{m} h_k \mathrm{conv}\{ g_{kl} : l ∈ \bar{I}_k(x) \},

where \bar{I}_k(x) = \{l : 1 ≤ l ≤ m_k, f_{kl}(x) = F_k(x)\}. Thus, one has

    ∂F(x) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} h_k λ_{kl} g_{kl},

where for 1 ≤ k ≤ m it holds that λ_{kl} ≥ 0, λ_{kl}(F_k(x) - f_{kl}(x)) = 0, 1 ≤ l ≤ m_k, and \sum_{l=1}^{m_k} λ_{kl} = 1. Setting
u_{kl} = h_k λ_{kl}, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, we can write

    ∂F(x) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} u_{kl} g_{kl},

where for 1 ≤ k ≤ m it holds that u_{kl} ≥ 0, u_{kl}(F_k(x) - f_{kl}(x)) = 0, 1 ≤ l ≤ m_k, and \sum_{l=1}^{m_k} u_{kl} = h_k. If a
point x ∈ R^n is a minimum of the function F(x), then 0 ∈ ∂F(x), so there exist multipliers u_{kl}, 1 ≤ k ≤ m,
1 ≤ l ≤ m_k, such that

    \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl} = 0,   \sum_{l=1}^{m_k} u_{kl} = h_k,   h_k = \frac{∂h(z)}{∂z_k},   1 ≤ k ≤ m,        (5)

    u_{kl} ≥ 0,   F_k - f_{kl}(x) ≥ 0,   u_{kl}(F_k - f_{kl}(x)) = 0,   1 ≤ k ≤ m,  1 ≤ l ≤ m_k.        (6)

Remark 6. Unconstrained minimization of function (2) is equivalent to the nonlinear programming problem

    minimize F̃(x, z) = h(z)   subject to   f_{kl}(x) ≤ z_k,   1 ≤ k ≤ m,  1 ≤ l ≤ m_k.        (7)

Condition (3) is sufficient for satisfying the equalities z_k = F_k(x), 1 ≤ k ≤ m, at the minimum point. Denote
by a_{kl}(x, z) the gradients of the constraint functions c_{kl}(x, z) = f_{kl}(x) - z_k. Obviously, a_{kl}(x, z) = (g_{kl}(x), -e_k),
1 ≤ k ≤ m, 1 ≤ l ≤ m_k, where g_{kl}(x) is the gradient of f_{kl}(x) at x and e_k is the k-th column of a unit matrix
of order m. Thus, the necessary first-order (KKT) conditions have the form

    g(x, u) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl} = 0,   \sum_{l=1}^{m_k} u_{kl} = h_k,   h_k = \frac{∂h(z)}{∂z_k},   1 ≤ k ≤ m,        (8)

    u_{kl} ≥ 0,   z_k - f_{kl}(x) ≥ 0,   u_{kl}(z_k - f_{kl}(x)) = 0,   1 ≤ k ≤ m,  1 ≤ l ≤ m_k,        (9)

where u_{kl} are the Lagrange multipliers and z_k = F_k(x). So we obtain the same necessary conditions for an
extremum as in Remark 5.
Remark 7. A classical minimax problem

    F(x) = \max_{1 \le k \le m} f_k(x)        (10)

can be replaced with an equivalent nonlinear programming problem

    minimize F̃(x, z) = z   subject to   f_k(x) ≤ z,   1 ≤ k ≤ m,        (11)

and the necessary KKT conditions have the form

    \sum_{k=1}^{m} g_k(x) u_k = 0,   \sum_{k=1}^{m} u_k = 1,        (12)

    u_k ≥ 0,   z - f_k(x) ≥ 0,   u_k(z - f_k(x)) = 0,   1 ≤ k ≤ m.        (13)

Remark 8. Minimization of the sum of absolute values

    F(x) = \sum_{k=1}^{m} |f_k(x)| = \sum_{k=1}^{m} \max(f_k^+(x), f_k^-(x)),   f_k^+(x) = f_k(x),   f_k^-(x) = -f_k(x)        (14)

can be replaced with an equivalent nonlinear programming problem

    minimize F̃(x, z) = \sum_{k=1}^{m} z_k   subject to   -z_k ≤ f_k(x) ≤ z_k        (15)

(there are two constraints c_k^-(x) = z_k - f_k(x) ≥ 0 and c_k^+(x) = z_k + f_k(x) ≥ 0 for each index 1 ≤ k ≤ m)
and the necessary KKT conditions have the form

    \sum_{k=1}^{m} g_k(x)(u_k^+ - u_k^-) = 0,   u_k^+ + u_k^- = 1,   1 ≤ k ≤ m,        (16)

    u_k^+ ≥ 0,   z_k - f_k(x) ≥ 0,   u_k^+(z_k - f_k(x)) = 0,   1 ≤ k ≤ m,        (17)

    u_k^- ≥ 0,   z_k + f_k(x) ≥ 0,   u_k^-(z_k + f_k(x)) = 0,   1 ≤ k ≤ m.        (18)

If we set u_k = u_k^+ - u_k^- and use the equality u_k^+ + u_k^- = 1, we obtain u_k^+ = (1 + u_k)/2, u_k^- = (1 - u_k)/2. From
the conditions u_k^+ ≥ 0, u_k^- ≥ 0 the inequalities -1 ≤ u_k ≤ 1, or |u_k| ≤ 1, follow. The condition u_k^+ + u_k^- = 1
implies that the numbers u_k^+, u_k^- cannot be simultaneously zero, so either z_k = f_k(x) or z_k = -f_k(x), that
is z_k = |f_k(x)|. If f_k(x) ≠ 0, it cannot simultaneously hold that z_k = f_k(x) and z_k = -f_k(x), so the numbers
u_k^+, u_k^- cannot be simultaneously nonzero. Then either u_k = u_k^+ = 1 and z_k = f_k(x) or u_k = -u_k^- = -1
and z_k = -f_k(x), that is u_k = f_k(x)/|f_k(x)|. Thus, the necessary KKT conditions have the form

    \sum_{k=1}^{m} g_k(x) u_k = 0,   z_k = |f_k(x)|,   |u_k| ≤ 1,   and   u_k = \frac{f_k(x)}{|f_k(x)|}  if  |f_k(x)| > 0.        (19)

Remark 9. Minimization of the sum of absolute values can also be reformulated so that more slack
variables are used. We obtain the problem

    minimize F̃(x, z) = \sum_{k=1}^{m} (z_k^+ + z_k^-)   subject to   f_k(x) = z_k^+ - z_k^-,   z_k^+ ≥ 0,   z_k^- ≥ 0,        (20)

where 1 ≤ k ≤ m. This problem contains m general equality constraints and 2m simple bounds for 2m
slack variables.

In the subsequent considerations, we will restrict ourselves to functions of the form (4), the sums of
maxima, which include most cases important for applications. In this case, it holds that

    h(z) = \sum_{k=1}^{m} z_k,   ∇h(z) = ẽ,   ∇²h(z) = 0,        (21)

where ẽ ∈ R^m is a vector with unit elements. The case when h(z) is a general function satisfying
Assumption X2 is studied in [26]. For simplicity, we will often use the notation vec(a, b) instead of
[a^T, b^T]^T ∈ R^{n+m}.
2 Recursive quadratic programming methods

2.1 Basic properties

Suppose the function h(z) is of form (21). In this case the necessary KKT conditions are of form (8)–(9),
where ∂h(z)/∂z_k = 1, 1 ≤ k ≤ m. If we linearize these conditions in a neighborhood of a point x ∈ R^n,
we can write for d ∈ R^n

    \sum_{k=1}^{m} \sum_{l=1}^{m_k} (g_{kl}(x) + G_{kl}(x) d) u_{kl} = 0,   \sum_{l=1}^{m_k} u_{kl} = 1,   1 ≤ k ≤ m,

    u_{kl} ≥ 0,   f_{kl}(x) + g_{kl}^T(x) d - z_k ≤ 0,   u_{kl}(f_{kl}(x) + g_{kl}^T(x) d - z_k) = 0,   1 ≤ k ≤ m,  1 ≤ l ≤ m_k.

But these are the necessary KKT conditions for solving a quadratic programming problem: minimize the
quadratic function

    Q(d, z) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d,   G = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl}        (22)

on the set

    C = \{(d, z) ∈ R^{n+m} : f_{kl}(x) + g_{kl}^T(x) d ≤ z_k,  1 ≤ k ≤ m,  1 ≤ l ≤ m_k\}.        (23)

Note that the coefficients u_{kl}, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, in (22) are old Lagrange multipliers. New Lagrange
multipliers along with new values of the variables z_k, 1 ≤ k ≤ m, are determined by solving quadratic
programming problem (22)–(23).
For simplification, we will omit the argument x and use the notation

    f_k = [f_{k1}(x), ..., f_{km_k}(x)]^T,   u_k = [u_{k1}, ..., u_{km_k}]^T,   ẽ_k = [1, ..., 1]^T,

and A_k = [g_{k1}(x), ..., g_{km_k}(x)]. Problem (22)–(23) will be written in the form

    minimize Q(d, z) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d   subject to   f_k + A_k^T d ≤ z_k ẽ_k,   1 ≤ k ≤ m,        (24)

from where the necessary KKT conditions

    G d + \sum_{k=1}^{m} A_k u_k = 0,   ẽ_k^T u_k = 1,   1 ≤ k ≤ m,        (25)

    u_k ≥ 0,   f_k + A_k^T d - z_k ẽ_k ≤ 0,   u_k^T (f_k + A_k^T d - z_k ẽ_k) = 0,   1 ≤ k ≤ m,        (26)

follow. Note that from (25)–(26) we have

    z_k = u_k^T (f_k + A_k^T d),   1 ≤ k ≤ m.        (27)

Quadratic programming problem (24) is convex, so there exists a dual problem stated in [32, Theorem
12.14]. When deriving the dual quadratic programming problem, we will use the notation

    f = [f_1^T, ..., f_m^T]^T,   u = [u_1^T, ..., u_m^T]^T,   v = [v_1^T, ..., v_m^T]^T,   w = [w_1, ..., w_m]^T,   z = [z_1, ..., z_m]^T,

and A = [A_1, ..., A_m]. Obviously, f ∈ R^{\bar{m}}, u ∈ R^{\bar{m}}, v ∈ R^{\bar{m}}, w ∈ R^m, and z ∈ R^m, where \bar{m} = \sum_{k=1}^{m} m_k.
The following theorem states the dual problem for the special quadratic programming problem (24).

Theorem 1. Consider quadratic programming problem (24) with a positive definite matrix G (so this problem
is convex). Then the dual problem can be written in the form

    minimize Q̃(u) = \frac{1}{2} u^T A^T H A u - f^T u   subject to   u_k ≥ 0,   ẽ_k^T u_k = 1,   1 ≤ k ≤ m,        (28)

where H = G^{-1}. Problem (28) is convex as well and the dual problem to this problem is primal problem
(24). If the pair (vec(d, z), u) is a KKT pair of the primal problem, then the pair (u, vec(v, w)), where
v_k = -(A_k^T d + f_k - z_k ẽ_k) and w_k = z_k, 1 ≤ k ≤ m, is a KKT pair of the dual problem. If the pair
(u, vec(v, w)) is a KKT pair of the dual problem, then the pair (vec(d, z), u), where d = -H A u and
z_k = w_k, 1 ≤ k ≤ m, is a KKT pair of the primal problem and A_k^T d + f_k - z_k ẽ_k = -v_k, 1 ≤ k ≤ m.

Proof. The Lagrange function of problem (24) has the form

    L(d, z, u) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d + \sum_{k=1}^{m} u_k^T (f_k + A_k^T d - z_k ẽ_k)        (29)

and its gradient is

    g(d, z, u) = \begin{bmatrix} G d + \sum_{k=1}^{m} A_k u_k \\ 1 - ẽ_1^T u_1 \\ \vdots \\ 1 - ẽ_m^T u_m \end{bmatrix}.        (30)

By [32, Theorem 12.14], the dual problem consists in maximizing the Lagrange function L(d, z, u) on the
set of constraints u ≥ 0 and g(d, z, u) = 0. Substituting (30) into the equation g(d, z, u) = 0, we obtain

    d = -H \sum_{k=1}^{m} A_k u_k = -H A u,   ẽ_k^T u_k = 1,   1 ≤ k ≤ m,        (31)

which after substituting into (29) gives

    L(d, z, u) = \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d + u^T (f + A^T d) - \sum_{k=1}^{m} z_k = \frac{1}{2} u^T A^T H A u + u^T f - u^T A^T H A u
               = -\frac{1}{2} u^T A^T H A u + f^T u,

so maximization of the Lagrange function L(d, z, u) is equivalent to minimization of the function Q̃(u).
Since the matrix H is positive definite, problem (28) is convex and we can set up the dual problem
consisting in maximizing the Lagrange function

    L̃(u, v, w) = \frac{1}{2} u^T A^T H A u - f^T u - \sum_{k=1}^{m} v_k^T u_k + \sum_{k=1}^{m} w_k (ẽ_k^T u_k - 1)        (32)

on the set of constraints g̃(u, v, w) = 0, that is,

    A_k^T H A u - f_k - v_k + w_k ẽ_k = 0,   1 ≤ k ≤ m,        (33)

and

    u_k ≥ 0,   ẽ_k^T u_k = 1,   v_k ≥ 0,   v_k^T u_k = 0,   1 ≤ k ≤ m.

If we set d = -H A u and substitute this relation into (33), we obtain

    -v_k = f_k + A_k^T d - w_k ẽ_k,   1 ≤ k ≤ m,        (34)

which along with u_k^T v_k = 0 and u_k^T ẽ_k = 1 gives w_k = u_k^T (f_k + A_k^T d). Thus, it holds that w_k = z_k, 1 ≤ k ≤ m,
by (27). If we substitute these equalities along with d = -H A u into (32), we can write

    L̃(u, v, w) = \frac{1}{2} d^T G d - \sum_{k=1}^{m} f_k^T u_k + \sum_{k=1}^{m} (f_k + A_k^T d - z_k ẽ_k)^T u_k = \frac{1}{2} d^T G d + \sum_{k=1}^{m} u_k^T A_k^T d - \sum_{k=1}^{m} z_k
               = \frac{1}{2} d^T G d - d^T G d - \sum_{k=1}^{m} z_k = -\left( \sum_{k=1}^{m} z_k + \frac{1}{2} d^T G d \right),

so maximization of the Lagrange function L̃(u, v, w) is equivalent to minimization of the function Q(d, z).

Remark 10. Note that by (28) and (31), it holds that

    Q̃(u) = \frac{1}{2} d^T G d - f^T u,        (35)

so

    Q(d, z) - Q̃(u) = \sum_{k=1}^{m} z_k + f^T u.        (36)

The following theorem, which is a generalization of a similar theorem given in [15], shows that the
solution of quadratic programming problem (22)–(23) is a descent direction for the objective function
F(x).

Theorem 2. Let Assumption X3 be satisfied and let the vectors d ∈ R^n, z ∈ R^m be a solution of quadratic
programming problem (22)–(23) with a positive definite matrix G and a corresponding vector of Lagrange
multipliers u ∈ R^{\bar{m}}. If d = 0, then the pair (vec(x, z), u) is a KKT pair of problem (7). If d ≠ 0, then
F'(x, d) = d^T g(x, u) < 0, where F'(x, d) is the directional derivative of function (4) along the vector d at the
point x and g(x, u) is the vector given by (8). If κ(G) ≤ 1/ε_0², where κ(G) is the spectral condition number of
G, then d^T g(x, u) ≤ -ε_0 ‖d‖ ‖g(x, u)‖ and for an arbitrary number 0 < ε_1 < 1/2 there exists a steplength
bound 0 < \bar{α} ≤ 1 such that

    F(x + α d) - F(x) ≤ ε_1 α d^T g(x, u)        (37)

if 0 < α ≤ \bar{α}.
Proof.
(a) If d = 0, then conditions (25)–(26) are equivalent to conditions (8)–(9). Thus, if a pair (vec(0, z), u)
is a KKT pair of problem (24), then the pair (vec(x, z), u) is a KKT pair of problem (7).

(b) Function (4) is a sum of maxima of differentiable functions, so it is regular by [31, Theorem 3.2.13]
and there exists a directional derivative

    F'(x, d) = \lim_{α↓0} \frac{F(x + αd) - F(x)}{α} = \sum_{k=1}^{m} \lim_{α↓0} \frac{F_k(x + αd) - F_k(x)}{α}.

Let 0 < α ≤ 1 and let l_k be indices such that f_{kl_k}(x + αd) = F_k(x + αd), 1 ≤ k ≤ m. Then by
Assumption X3 it holds that

    f_{kl_k}(x + αd) ≤ f_{kl_k}(x) + α g_{kl_k}^T(x) d + \frac{1}{2} α² \overline{G} ‖d‖²,   1 ≤ k ≤ m.

Using the inequality 0 < α ≤ 1 and relations (25)–(26), we obtain

    f_{kl_k} + α g_{kl_k}^T d ≤ f_{kl_k} + α(z_k - f_{kl_k}) = α z_k + (1 - α) f_{kl_k} ≤ α z_k + (1 - α) F_k
        = F_k + α(z_k - F_k) = F_k + α u_k^T (z_k ẽ_k - F_k ẽ_k) ≤ F_k + α u_k^T (z_k ẽ_k - f_k)
        = F_k + α u_k^T (z_k ẽ_k - f_k - A_k^T d) + α u_k^T A_k^T d = F_k + α u_k^T A_k^T d.

Thus, we can write

    \frac{F_k(x + αd) - F_k(x)}{α} = \frac{f_{kl_k}(x + αd) - F_k(x)}{α} ≤ d^T A_k u_k + \frac{1}{2} α \overline{G} ‖d‖²,        (38)

so

    F'(x, d) = \sum_{k=1}^{m} \lim_{α↓0} \frac{F_k(x + αd) - F_k(x)}{α} ≤ \sum_{k=1}^{m} d^T A_k u_k = d^T A u = d^T g(x, u).

Since G d = -A u = -g(x, u), see (25), and the matrix G is positive definite, we have F'(x, d) =
d^T g(x, u) = -d^T G d < 0.

(c) If κ(G) ≤ 1/ε_0², then d^T g(x, u) ≤ -ε_0 ‖d‖ ‖g(x, u)‖, see [32, Section 3.2]. Since d = -G^{-1} g(x, u), it
holds that ‖d‖ ≤ ‖g(x, u)‖/\underline{G}, which along with the previous inequality gives

    ‖d‖² ≤ \frac{1}{\underline{G}} ‖d‖ ‖g(x, u)‖ ≤ -\frac{1}{ε_0 \underline{G}} d^T g(x, u).

Using (38) we obtain

    \frac{F(x + αd) - F(x)}{α} = \sum_{k=1}^{m} \frac{F_k(x + αd) - F_k(x)}{α} ≤ \sum_{k=1}^{m} \left( d^T A_k u_k + \frac{1}{2} α \overline{G} ‖d‖² \right)
        = d^T g(x, u) + \frac{m}{2} α \overline{G} ‖d‖² ≤ \left( 1 - α \frac{m \overline{G}}{2 ε_0 \underline{G}} \right) d^T g(x, u),

so (37) holds if

    1 - α \frac{m \overline{G}}{2 ε_0 \underline{G}} ≥ ε_1   ⟹   α ≤ \frac{2 ε_0 (1 - ε_1) \underline{G}}{m \overline{G}} \overset{Δ}{=} \bar{α}.

Remark 11. A number 0 < α ≤ 1 satisfying (37) can be determined using the Armijo steplength selection
[32, Section 3.1]. Then α is the first term satisfying (37) in a sequence α_j, j ∈ N, such that α_1 = 1 and
\underline{β} α_j ≤ α_{j+1} ≤ \overline{β} α_j, where 0 < \underline{β} ≤ \overline{β} < 1. At most int(\log \bar{α} / \log \overline{β} + 1) steps are used, where int(t) is the
largest integer such that int(t) ≤ t, and then α ≥ \underline{β} \bar{α}. Substituting this inequality into (37) we obtain

    F(x + αd) - F(x) ≤ ε_1 \underline{β} \bar{α} d^T g(x, u) ≤ -ε_0 ε_1 \underline{β} \bar{α} ‖d‖ ‖g(x, u)‖ ≤ -\frac{ε_0 ε_1 \underline{β} \bar{α}}{\overline{G}} ‖g(x, u)‖² = -c ‖g(x, u)‖².        (39)
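The Armijo selection from Remark 11 can be sketched as a generic backtracking loop; the function and slope arguments, the contraction factor `beta`, and the safeguard `alpha_min` below are illustrative choices of ours, not values prescribed by the report.

```python
def armijo_step(F, x, d, slope, eps1=1e-4, beta=0.5, alpha_min=1e-12):
    """Backtracking (Armijo) steplength selection for a descent direction d.

    F     : objective function, here the sum of maxima F(x)
    slope : directional derivative estimate d^T g(x, u) < 0
    Returns the first alpha = beta**j satisfying
    F(x + alpha*d) - F(x) <= eps1 * alpha * slope   (cf. (37)).
    """
    alpha, F0 = 1.0, F(x)
    while F(x + alpha * d) - F0 > eps1 * alpha * slope:
        alpha *= beta                 # contract the step
        if alpha < alpha_min:         # safeguard against an endless loop
            break
    return alpha
```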

2.2 Solving special quadratic programming problems

In this section we will deal with dual methods for solving quadratic programming problems of form (22)–
(23). We restrict ourselves to classical minimax problems. Theoretical considerations are practically
the same in the case of the sum of maxima, but a formal description of the algorithms is significantly more
complicated. Thus, we will consider the primal problem

    minimize Q(d, z) = z + \frac{1}{2} d^T G d   subject to   f_k + g_k^T d ≤ z,   1 ≤ k ≤ m,        (40)

and the dual problem

    minimize Q̃(u) = \frac{1}{2} u^T A^T H A u - f^T u   subject to   u ≥ 0,   ẽ^T u = 1.        (41)

Note that by (31) we have

    d = -H A u,        (42)

which after substituting into (41) gives

    Q̃(u) = \frac{1}{2} d^T G d - f^T u.        (43)

The solution of primal problem (40) can be obtained by an efficient method for solving dual problem
(41), as described in [19]. Let K ⊂ I = \{1, ..., m\} be a set of indices such that u_k = 0 if k ∉ K
and v_k = 0 if k ∈ K, where -v_k = f_k + g_k^T d - z, k ∈ K, are the values of the constraints of the primal
problem (v_k, k ∈ K, are the Lagrange multipliers of the dual problem). To simplify notation, we denote by
u = [u_k, k ∈ K], v = [v_k, k ∈ K] the vectors whose elements are Lagrange multipliers with indices belonging
to K, so the dimensions of these vectors are equal to the number of indices in K (similar notation is used for
the vector f, the vector ẽ, and the columns of the matrix A). Note that multipliers u_k and v_k, k ∉ K, exist, but
they are not elements of the vectors u and v. Using (31), (34), and ẽ^T u = 1, we can determine the elements u_k,
k ∈ K, of the vector u and the variable z. Setting v_k = 0, k ∈ K, we obtain

    v = -A^T d - f + ẽ z = A^T H A u - f + ẽ z = 0,        (44)

so

    u = (A^T H A)^{-1} (f - ẽ z),        (45)

and since

    ẽ^T u = ẽ^T (A^T H A)^{-1} (f - ẽ z) = 1,

we obtain

    z = \frac{ẽ^T (A^T H A)^{-1} f - 1}{ẽ^T (A^T H A)^{-1} ẽ}.        (46)
Definition 2. The set of indices K ⊂ I such that u_k = 0, k ∉ K, and v_k = 0, k ∈ K, is called the set of
active constraint indices (active set for short) of the primal problem. If u_k ≥ 0, k ∈ K, then we say that
K is an acceptable active set of the primal problem.

Remark 12. Formulas (45)–(46) cannot be used if A has linearly dependent columns. This may happen even
if the Jacobian matrix [A^T, -ẽ] has full rank. In order not to investigate this singular case separately, we
will use the matrices

    Ã = \begin{bmatrix} A \\ -ẽ^T \end{bmatrix},   H̃ = \begin{bmatrix} H & 0 \\ 0 & µ \end{bmatrix},        (47)

where µ > 0. Then Ã^T H̃ Ã = A^T H A + µ ẽ ẽ^T, so by (44) it holds that

    Ã^T H̃ Ã u = f - (z - µ) ẽ        (48)

and formulas (45)–(46) can be written in the form

    u = C f - (z - µ) p,   z = µ + \frac{p^T f - 1}{p^T ẽ},        (49)

where C = (Ã^T H̃ Ã)^{-1} and p = C ẽ. The value µ > 0 should be comparable with the elements of the matrix H.
The choice µ = 1 is usually suitable.
The active constraint method for solving dual problem (41) introduced in [19] is based on generating a
sequence of acceptable active sets of primal problem (40). An initial acceptable active set is determined
by choosing an arbitrary index k ∈ I and setting K = \{k\}, so u_k = 1 and u_l = 0 if l ∈ I and l ≠ k.
At each step we first test whether the necessary (in the convex case also sufficient) KKT conditions are satisfied. If
v_l < 0 for some index l ∉ K, then we try to remove the active constraint of the dual problem by considering
the set K⁺ = K ∪ \{l\}. However, this set need not be acceptable (it may hold that u_k < 0 for some index k ∈ K⁺).
Therefore, we need to remove some active constraints of the primal problem in advance, that is, to construct
an acceptable set K̄ ⊂ K such that the set K⁺ = K̄ ∪ \{l\} is acceptable as well. For this reason we will
change the constraint of the primal problem with index l into -v_l(λ) = a_l^T d + f_l - z + (1 - λ) v_l ≤ 0
(the parameter λ is introduced as an argument), so -v_l(0) = a_l^T d + f_l - z + v_l = 0 and u_k(0) ≥ 0 if k ∈ K ∪ \{l\}.
In the subsequent considerations we will use the notation a_l = g_l.

Lemma 1. Let K be an acceptable active set of primal problem (40) and v_l < 0. Suppose that the vector
ã_l = [a_l^T, -1]^T is not a linear combination of the columns of the matrix Ã and denote p = C ẽ, q_l = C Ã^T H̃ ã_l,
β_l = 1 - ẽ^T q_l, γ_l = β_l / ẽ^T p, and δ_l = ã_l^T (H̃ - H̃ Ã C Ã^T H̃) ã_l = ã_l^T H̃ (ã_l - Ã q_l), where C = (Ã^T H̃ Ã)^{-1}, so
δ_l > 0. Then

    u(λ) = u(0) - α (q_l + γ_l p),   u_l(λ) = u_l(0) + α,   z(λ) = z(0) + α γ_l,        (50)

where α = -λ v_l / (β_l γ_l + δ_l).


Proof. Using relation (48) augmented by the equation with index l, one can write

    \begin{bmatrix} Ã^T H̃ Ã & Ã^T H̃ ã_l \\ ã_l^T H̃ Ã & ã_l^T H̃ ã_l \end{bmatrix} \begin{bmatrix} u(λ) \\ u_l(λ) \end{bmatrix} = \begin{bmatrix} f - (z(λ) - µ) ẽ \\ f_l + (1 - λ) v_l - (z(λ) - µ) \end{bmatrix}.        (51)

Subtracting the equations for u(0) and u_l(0) we obtain

    \begin{bmatrix} Ã^T H̃ Ã & Ã^T H̃ ã_l \\ ã_l^T H̃ Ã & ã_l^T H̃ ã_l \end{bmatrix} \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix} = -\begin{bmatrix} (z(λ) - z(0)) ẽ \\ λ v_l + (z(λ) - z(0)) \end{bmatrix}.

The inverse of the matrix of this system can be expressed using the matrix C = (Ã^T H̃ Ã)^{-1}, which gives

    \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix} = -\begin{bmatrix} C + \frac{q_l q_l^T}{δ_l} & -\frac{q_l}{δ_l} \\ -\frac{q_l^T}{δ_l} & \frac{1}{δ_l} \end{bmatrix} \begin{bmatrix} (z(λ) - z(0)) ẽ \\ λ v_l + (z(λ) - z(0)) \end{bmatrix}
        = -\begin{bmatrix} \left(p - \frac{β_l}{δ_l} q_l\right)(z(λ) - z(0)) - \frac{λ v_l}{δ_l} q_l \\ \frac{β_l}{δ_l} (z(λ) - z(0)) + \frac{λ v_l}{δ_l} \end{bmatrix}.        (52)

Since ẽ^T u(λ) + u_l(λ) = 1 for λ ≥ 0, we can write

    ẽ^T (u(λ) - u(0)) + (u_l(λ) - u_l(0)) = 0,        (53)

which along with (52) gives

    [ẽ^T, 1] \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix} = -\left( ẽ^T p + \frac{β_l^2}{δ_l} \right)(z(λ) - z(0)) - \frac{β_l}{δ_l} λ v_l = 0,

that is,

    z(λ) - z(0) = -\frac{β_l λ v_l}{δ_l ẽ^T p + β_l^2} = -γ_l \frac{λ v_l}{δ_l + β_l γ_l} = α γ_l,        (54)

which is the last equality in (50). Substituting (54) into (52) and performing formal rearrangements we
obtain the remaining equalities in (50).
Remark 13. Using (47) we obtain

    \begin{bmatrix} Ã^T H̃ Ã & Ã^T H̃ ã_l \\ ã_l^T H̃ Ã & ã_l^T H̃ ã_l \end{bmatrix} = \begin{bmatrix} A^T H A & A^T H a_l \\ a_l^T H A & a_l^T H a_l \end{bmatrix} + µ \begin{bmatrix} ẽ ẽ^T & ẽ \\ ẽ^T & 1 \end{bmatrix},        (55)

so

    q_l = C Ã^T H̃ ã_l = C A^T H a_l + µ p,        (56)
    δ_l = ã_l^T H̃ (ã_l - Ã q_l) = a_l^T H (a_l - A q_l) + µ β_l.        (57)

Note that β_l γ_l + δ_l = β_l^2 / ẽ^T C ẽ + δ_l = 0 if and only if β_l = γ_l = δ_l = 0.

Lemma 2. Let the assumptions of Lemma 1 be satisfied. Then

    Q(d(λ), z(λ)) = Q(d(0), z(0)) + \frac{1}{2} α (β_l γ_l + δ_l)(u_l(λ) + u_l(0)),        (58)

    Q̃(u(λ), u_l(λ)) = Q̃(u(0), u_l(0)) + \frac{1}{2} α^2 (β_l γ_l + δ_l) + α v_l.        (59)

Proof.
(a) Using (42) and (50) we can write

    d(λ) - d(0) = -H A (u(λ) - u(0)) - H a_l (u_l(λ) - u_l(0)) = α H (A (q_l + γ_l p) - a_l),
    d(0) = -H (A u(0) + a_l u_l(0)),

and by (55)–(56) it holds that

    A^T H A q_l = Ã^T H̃ Ã q_l - µ ẽ ẽ^T q_l = A^T H a_l + µ ẽ - µ ẽ ẽ^T q_l = A^T H a_l + µ β_l ẽ,
    A^T H A p = Ã^T H̃ Ã p - µ ẽ ẽ^T p = ẽ - µ ẽ ẽ^T p,

because Ã^T H̃ Ã = C^{-1}. Using these equalities and formulas (56)–(57) we obtain

    A^T H (A (q_l + γ_l p) - a_l) = A^T H a_l + µ β_l ẽ + γ_l ẽ - µ γ_l ẽ ẽ^T p - A^T H a_l = γ_l ẽ,
    -a_l^T H (A (q_l + γ_l p) - a_l) = δ_l - µ β_l - γ_l a_l^T H A C ẽ = δ_l - µ β_l - γ_l ẽ^T q_l + µ γ_l ẽ^T p = δ_l - γ_l ẽ^T q_l,

which after substitution gives

    (d(λ) - d(0))^T G d(0) = -α (A (q_l + γ_l p) - a_l)^T H (A u(0) + a_l u_l(0))
        = -α γ_l ẽ^T u(0) + α (δ_l - γ_l ẽ^T q_l) u_l(0)
        = -α γ_l (1 - u_l(0)) + α (δ_l - γ_l (1 - β_l)) u_l(0)
        = -α γ_l + α (δ_l + β_l γ_l) u_l(0),        (60)

    (d(λ) - d(0))^T G (d(λ) - d(0)) = α² (A (q_l + γ_l p) - a_l)^T H (A (q_l + γ_l p) - a_l)
        = α² γ_l ẽ^T (q_l + γ_l p) + α² (δ_l - γ_l ẽ^T q_l)
        = α² (δ_l + β_l γ_l)        (61)

(because ẽ^T u(λ) + u_l(λ) = 1 for λ ≥ 0). Since z(λ) - z(0) = α γ_l, using (60)–(61) we can write

    Q(d(λ), z(λ)) = Q(d(0), z(0)) + z(λ) - z(0) + (d(λ) - d(0))^T G d(0) + \frac{1}{2} (d(λ) - d(0))^T G (d(λ) - d(0))
        = Q(d(0), z(0)) + α γ_l - α γ_l + α (δ_l + β_l γ_l) u_l(0) + \frac{1}{2} α² (δ_l + β_l γ_l)
        = Q(d(0), z(0)) + \frac{1}{2} α (δ_l + β_l γ_l)(u_l(λ) + u_l(0)),

because α = u_l(λ) - u_l(0).

(b) Using (51), (53), and (55) we obtain

    (d(λ) - d(0))^T G d(0) = \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix}^T \begin{bmatrix} A^T H A & A^T H a_l \\ a_l^T H A & a_l^T H a_l \end{bmatrix} \begin{bmatrix} u(0) \\ u_l(0) \end{bmatrix}
        = \begin{bmatrix} u(λ) - u(0) \\ u_l(λ) - u_l(0) \end{bmatrix}^T \begin{bmatrix} f - z(0) ẽ \\ f_l + v_l - z(0) \end{bmatrix}
        = (u(λ) - u(0))^T f + (u_l(λ) - u_l(0)) f_l + (u_l(λ) - u_l(0)) v_l,

which along with (43) and (50) gives

    Q̃(u(λ), u_l(λ)) = Q̃(u(0), u_l(0)) + (d(λ) - d(0))^T G d(0) + \frac{1}{2} (d(λ) - d(0))^T G (d(λ) - d(0))
        - f^T (u(λ) - u(0)) - f_l (u_l(λ) - u_l(0))
        = Q̃(u(0), u_l(0)) + \frac{1}{2} (d(λ) - d(0))^T G (d(λ) - d(0)) + α v_l,

so (59) holds by (61).

Remark 14. Denote Ĩ = \{k ∈ I ∩ K : e_k^T (q_l + γ_l p) > 0\} and set

    α_1 = -\frac{v_l}{β_l γ_l + δ_l},   α_2 = \frac{u_j(0)}{e_j^T (q_l + γ_l p)} \overset{Δ}{=} \min_{k ∈ Ĩ} \frac{u_k(0)}{e_k^T (q_l + γ_l p)},        (62)

where α_1 = ∞ if β_l γ_l + δ_l = 0 and α_2 = ∞ if Ĩ = ∅. Let α = \min(α_1, α_2). Three cases can occur.

(1) If α = α_1 = ∞, the dual problem has no optimal solution because β_l γ_l + δ_l = 0 and v_l < 0, so
Q̃(u(α), u_l(α)) → -∞ by (59) if α → ∞. The primal problem has no feasible solution in this case.

(2) If α = α_1 < ∞, so β_l γ_l + δ_l > 0, we can set K⁺ = K ∪ \{l\}. This corresponds to adding the constraint
with index l into the set of active constraints of the primal problem.

(3) If α = α_2 < α_1, we need to remove the constraint with index j (formula (62)) from the set of active
constraints of the primal problem.

In cases (2) and (3), it is necessary to update the representation of the linear variety defined by the new active
constraints.

Remark 15. If α = α_2 < α_1, we cannot add index l into the set K. In this case, we need to remove index
j from the set K and create a set K̄_1 = K̄_0 \setminus \{j\}, where K̄_0 = K, which is possible because u_j = 0. At
the same time, we need to multiply the value v_l by 1 - α/α_1 (if α_1 = ∞, then the value v_l is unchanged).
Performing these adjustments, we can try to add index l into the set K̄_1. Repeating this procedure we
obtain a sequence of sets K = K̄_0 ⊃ K̄_1 ⊃ ··· ⊃ K̄_p. Since the number of constraints of the primal problem
is finite, there exists a set K̄ = K̄_p, where p ≥ 0, such that either α = ∞ (no solution of problem
(40) exists) or α = α_1, so the set K⁺ = K̄ ∪ \{l\} is an acceptable active set of the primal problem.
Lemma 3. Let a solution of problem (40) exist and let K, K⁺ = K̄ ∪ \{l\} be the sets mentioned in Remark 15.
Let d and d⁺ be the direction vectors given by (42), where the vectors u and u⁺ correspond to the acceptable sets
K and K⁺. Then Q(d⁺, z⁺) > Q(d, z).

Proof. Denote by d̄_i, 0 ≤ i ≤ p, the direction vectors given by (42), where the vectors ū_i, 0 ≤ i ≤ p, correspond
to the sets K̄_i, 0 ≤ i ≤ p. Then Q(d̄_i, z̄_i) ≥ Q(d̄_{i-1}, z̄_{i-1}), 1 ≤ i ≤ p, holds by Lemma 2. Since α = α_1
holds in the last step determined by the set K̄ = K̄_p, so that β_l γ_l + δ_l > 0 and α > 0, and since ū_l(0) ≥ 0, we
can write Q(d⁺, z⁺) > Q(d̄, z̄) = Q(d̄_p, z̄_p) ≥ Q(d̄_0, z̄_0) = Q(d, z).

Algorithm 1. Dual method of active constraints

Step 1 Choose an arbitrary index 1 ≤ l ≤ m (e.g. l = 1) and a number µ (e.g. µ = 1). Set K := \{l\},
u := [1], ẽ := [1], A := [a_l], R := [a_l^T H a_l + µ]^{1/2}. Compute the number z := f_l - a_l^T H a_l. Set v_l := 0
and u_k := 0 for k ∉ K.

Step 2 Compute the vector d := -H A u (formula (42)), set v_k := z - (a_k^T d + f_k) for k ∉ K and determine
an index l ∉ K such that v_l = \min_{k ∉ K} v_k. If v_l ≥ 0, terminate the computation (the pair (d, z) ∈ R^{n+1}
is a solution of primal problem (40) and the vector u is a solution of dual problem (41)).

Step 3 Determine the vector p by solving the system of equations R^T R p = ẽ and the vector q_l by solving
the system of equations R^T R q_l = Ã^T H̃ ã_l. Set β_l := 1 - ẽ^T q_l, γ_l := β_l / ẽ^T p, δ_l := ã_l^T H̃ (ã_l - Ã q_l)
(Remark 16). Compute the numbers α_1, α_2 defined in Remark 14 and set α := \min(α_1, α_2). If α = ∞,
terminate the computation (the primal problem has no feasible solution and the dual problem has no
optimal solution). If α < ∞, set u := u - α (q_l + γ_l p), u_l := u_l + α, z := z + α γ_l, v_l := (1 - α/α_1) v_l.

Step 4 If α = α_1, set K := K ∪ \{l\}, u := u⁺, f := f⁺, ẽ := ẽ⁺, A := A⁺, R := R⁺, where u⁺ = [u^T, u_l]^T,
f⁺ = [f^T, f_l]^T, ẽ⁺ = [ẽ^T, 1]^T, and A⁺, R⁺ are the matrices defined in Remark 16. Go to Step 2.

Step 5 If α ≠ α_1, set K := K \setminus \{j\}, u := u⁻, f := f⁻, ẽ := ẽ⁻, A := A⁻, R := R⁻, where j ∈ K is the
index determined by (62), the vectors u⁻, f⁻, ẽ⁻ result from the vectors u, f, ẽ by removing the element
with index j, and A⁻, R⁻ are the matrices defined in Remark 16. Go to Step 3.

Remark 16. The vector q_l and the number δ_l used in Step 3 of Algorithm 1 can be computed so that we
solve two systems of equations R^T r_l = Ã^T H̃ ã_l = A^T H a_l + µ ẽ and R q_l = r_l with triangular matrices R^T
and R and set δ_l = ρ_l², where ρ_l² = a_l^T H a_l + µ - r_l^T r_l. Then, in Step 4 it holds that

    A⁺ = [A, a_l],   R⁺ = \begin{bmatrix} R & r_l \\ 0 & ρ_l \end{bmatrix}.

In Step 5 we determine a permutation matrix Π such that A Π = [A⁻, a_j] and R Π is an upper Hessenberg
matrix. Furthermore, we determine an orthogonal matrix Q such that the matrix Q R Π is upper triangular.
Then

    Q R Π = \begin{bmatrix} R⁻ & r_j \\ 0 & ρ_j \end{bmatrix}

holds. The derivation of these relations can be found in [19].
Theorem 3. After a finite number of steps of Algorithm 1, either a solution of problems (40) and (41) is
found or the fact that these problems have no solution is detected.
Proof. Algorithm 1 generates a sequence of active sets K_{j_i}, j_i ∈ N, of the primal problem, where the set
K_{j_1}, j_1 = 1, is acceptable. Suppose that the set K_{j_i}, j_i ∈ N, is acceptable. By Remark 15 and Lemma 3,
after at most m steps, we either find out that problems (40) and (41) have no solution or obtain an
acceptable set K_{j_{i+1}}, where j_{i+1} - j_i ≤ m and where Q(d_{j_{i+1}}, z_{j_{i+1}}) > Q(d_{j_i}, z_{j_i}). Thus, the sets K_{j_{i+1}} and
K_{j_i} are different and since the number of different subsets of the set \{1, ..., m\} is finite, the computation
must terminate after a finite number of steps.
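For a small dense classical minimax problem, the dual problem (41) can also be handed to a general NLP solver as a cross-check of Algorithm 1. The sketch below uses scipy's SLSQP on the constraint set \{u ≥ 0, ẽ^T u = 1\} and recovers d = -HAu by (42); the data A, H, f are random placeholders, and this is an illustration only, not the dual active set method of [19].

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m = 5, 8
A = rng.standard_normal((n, m))            # columns a_k = g_k(x)
f = rng.standard_normal(m)                 # values f_k(x)
H = np.eye(n)                              # H = G^{-1}; here G = I

M = A.T @ H @ A                            # dual Hessian A^T H A

def q_tilde(u):                            # dual objective (41)
    return 0.5 * u @ M @ u - f @ u

def q_tilde_grad(u):
    return M @ u - f

u0 = np.full(m, 1.0 / m)                   # start at the centre of the simplex
res = minimize(q_tilde, u0, jac=q_tilde_grad, method="SLSQP",
               bounds=[(0.0, None)] * m,
               constraints=[{"type": "eq", "fun": lambda u: u.sum() - 1.0}])

u = res.x
d = -H @ A @ u                             # primal direction by (42)
z = np.max(f + A.T @ d)                    # primal minimax variable
print(u.round(3), d.round(3), z.round(3))
```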

3 Primal interior point methods

3.1 Barriers and barrier functions

Primal interior point methods for inequality constrained minimization problems are based on adding a barrier
term containing the constraint functions to the minimized function. The resulting barrier function, depending
on a barrier parameter 0 < µ ≤ \overline{µ} < ∞, is successively minimized on R^n (without any constraints), where
µ → 0. Applying this approach to problem (7), we obtain a barrier function

    B_µ(x, z) = h(z) + µ \sum_{k=1}^{m} \sum_{l=1}^{m_k} φ(z_k - f_{kl}(x)),   0 < µ ≤ \overline{µ},        (63)

where φ : (0, ∞) → R is a barrier which satisfies the following assumption.

Assumption B1. The function φ(t), t ∈ (0, ∞), is twice continuously differentiable, decreasing, and strictly
convex, with \lim_{t→0} φ(t) = ∞. The function φ'(t) is increasing and strictly concave with \lim_{t→∞} φ'(t) = 0.
For t ∈ (0, ∞) it holds that -t φ'(t) ≤ 1, t² φ''(t) ≤ 1. There exist numbers τ > 0 and c > 0 such that for t < τ
it holds that

    -t φ'(t) ≥ c        (64)

and

    φ'(t) φ'''(t) - φ''(t)² > 0.        (65)
Remark 17. The logarithmic barrier function

    φ(t) = \log t^{-1} = -\log t        (66)

is most frequently used. It satisfies Assumption B1 with c = 1 and τ = ∞ but it is not bounded from below
since -\log t → -∞ for t → ∞. For that reason, barriers bounded from below are sometimes used, e.g. the
function

    φ(t) = \log(t^{-1} + τ^{-1}) = -\log \frac{t τ}{t + τ},        (67)

which is bounded from below by the number \underline{φ} = -\log τ, or the function

    φ(t) = -\log t,   0 < t ≤ τ,        φ(t) = a t^{-2} + b t^{-1} + c,   t ≥ τ,        (68)

which is bounded from below by the number \underline{φ} = c = -\log τ - 3/2, or the function

    φ(t) = -\log t,   0 < t ≤ τ,        φ(t) = a t^{-1} + b t^{-1/2} + c,   t ≥ τ,        (69)

which is bounded from below by the number \underline{φ} = c = -\log τ - 3. The coefficients a, b, c are chosen so that
the function φ(t) as well as its first and second derivatives are continuous at t = τ. All these barriers satisfy
Assumption B1 [26] (the proof of this statement is trivial for logarithmic barrier (66)).

Even if the barriers (67)–(69), bounded from below, have more advantageous theoretical properties (Assumption X1a can be replaced with the weaker Assumption X1b), algorithms using logarithmic barrier
(66) are usually more efficient. Therefore, we will only deal with methods using the logarithmic barrier
φ(t) = -\log t in the subsequent considerations.
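For the sum-of-maxima case, the logarithmic barrier of (63) with φ(t) = -\log t (written out as (70) in the next subsection) can be evaluated directly. The sketch below, with names of our own choosing, computes B_µ(x, z) and the multiplier estimates u_{kl} = µ/(z_k - f_{kl}(x)) of (71), assuming z_k > F_k(x).

```python
import numpy as np

def log_barrier(f_groups, x, z, mu):
    """Evaluate B_mu(x, z) = sum_k z_k - mu * sum_{k,l} log(z_k - f_kl(x))
    for the sum-of-maxima case h(z) = z_1 + ... + z_m; requires z_k > F_k(x).
    Also returns the multipliers u_kl = mu / (z_k - f_kl(x))."""
    value = float(np.sum(z))
    u = []
    for k, group in enumerate(f_groups):
        slacks = np.array([z[k] - f(x) for f in group])
        if np.any(slacks <= 0.0):
            return np.inf, None            # outside the barrier domain
        value -= mu * np.sum(np.log(slacks))
        u.append(mu / slacks)
    return value, u
```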

3.2 Iterative determination of a minimax vector

Suppose the function h(z) is of form (21). Using the logarithmic barrier φ(t) = -\log t, function (63) can
be written as

    B_µ(x, z) = \sum_{k=1}^{m} z_k - µ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k - f_{kl}(x)),   0 < µ ≤ \overline{µ}.        (70)

Further, we will denote by g_{kl}(x) and G_{kl}(x) the gradients and Hessian matrices of the functions f_{kl}(x),
1 ≤ k ≤ m, 1 ≤ l ≤ m_k, and set

    u_{kl}(x, z) = \frac{µ}{z_k - f_{kl}(x)} ≥ 0,   v_{kl}(x, z) = \frac{µ}{(z_k - f_{kl}(x))²} = \frac{1}{µ} u_{kl}²(x, z) ≥ 0,        (71)

    u_k(x, z) = [u_{k1}(x, z), ..., u_{km_k}(x, z)]^T,   v_k(x, z) = [v_{k1}(x, z), ..., v_{km_k}(x, z)]^T,   ẽ_k = [1, ..., 1]^T.
Denoting by g(x, z) the gradient of the function B_µ(x, z) with respect to x and by γ_k(x, z) = ∂B_µ(x, z)/∂z_k, the
necessary conditions for an extremum of barrier function (63) can be written in the form

    g(x, z) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl}(x, z) = \sum_{k=1}^{m} A_k(x) u_k(x, z) = 0,        (72)

    γ_k(x, z) = 1 - \sum_{l=1}^{m_k} u_{kl}(x, z) = 1 - ẽ_k^T u_k(x, z) = 0,   1 ≤ k ≤ m,        (73)

where A_k(x) = [g_{k1}(x), ..., g_{km_k}(x)], which is a system of n + m nonlinear equations for the unknown vectors
x and z. These equations can be solved by the Newton method. In this case, the second derivatives of the
Lagrange function (which are the first derivatives of expressions (72) and (73)) are computed. Denoting by

    G(x, z) = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl}(x, z)        (74)

the Hessian matrix of the Lagrange function and setting

    U_k(x, z) = diag(u_{k1}(x, z), ..., u_{km_k}(x, z)),
    V_k(x, z) = diag(v_{k1}(x, z), ..., v_{km_k}(x, z)) = \frac{1}{µ} U_k²(x, z),

we can write

    \frac{∂g(x, z)}{∂x} = \sum_{k=1}^{m} \sum_{l=1}^{m_k} G_{kl}(x) u_{kl}(x, z) + \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) v_{kl}(x, z) g_{kl}^T(x)
                        = G(x, z) + \sum_{k=1}^{m} A_k(x) V_k(x, z) A_k^T(x),        (75)

    \frac{∂g(x, z)}{∂z_k} = -\sum_{l=1}^{m_k} g_{kl}(x) v_{kl}(x, z) = -A_k(x) v_k(x, z),        (76)

    \frac{∂γ_k(x, z)}{∂x} = -\sum_{l=1}^{m_k} v_{kl}(x, z) g_{kl}^T(x) = -v_k^T(x, z) A_k^T(x),        (77)

    \frac{∂γ_k(x, z)}{∂z_k} = \sum_{l=1}^{m_k} v_{kl}(x, z) = ẽ_k^T v_k(x, z).        (78)

Using these formulas we obtain a system of linear equations describing a step of the Newton method:

    \begin{bmatrix} W(x, z) & -A_1(x) v_1(x, z) & \cdots & -A_m(x) v_m(x, z) \\ -v_1^T(x, z) A_1^T(x) & ẽ_1^T v_1(x, z) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -v_m^T(x, z) A_m^T(x) & 0 & \cdots & ẽ_m^T v_m(x, z) \end{bmatrix} \begin{bmatrix} Δx \\ Δz_1 \\ \vdots \\ Δz_m \end{bmatrix} = -\begin{bmatrix} g(x, z) \\ γ_1(x, z) \\ \vdots \\ γ_m(x, z) \end{bmatrix},        (79)

where

    W(x, z) = G(x, z) + \sum_{k=1}^{m} A_k(x) V_k(x, z) A_k^T(x).        (80)

Setting

    C(x, z) = [A_1(x) v_1(x, z), ..., A_m(x) v_m(x, z)],   D(x, z) = diag(ẽ_1^T v_1(x, z), ..., ẽ_m^T v_m(x, z))

and γ(x, z) = [γ_1(x, z), ..., γ_m(x, z)]^T, a step of the Newton method can be written in the form

    \begin{bmatrix} W(x, z) & -C(x, z) \\ -C^T(x, z) & D(x, z) \end{bmatrix} \begin{bmatrix} Δx \\ Δz \end{bmatrix} = -\begin{bmatrix} g(x, z) \\ γ(x, z) \end{bmatrix}.        (81)

The diagonal matrix D(x, z) is positive definite since it has positive diagonal elements.
Remark 18. If the number m is small (as in the case of a classical minimax problem, where m = 1), we will
use the expression

    \begin{bmatrix} W & -C \\ -C^T & D \end{bmatrix}^{-1} = \begin{bmatrix} W^{-1} - W^{-1} C (C^T W^{-1} C - D)^{-1} C^T W^{-1} & -W^{-1} C (C^T W^{-1} C - D)^{-1} \\ -(C^T W^{-1} C - D)^{-1} C^T W^{-1} & -(C^T W^{-1} C - D)^{-1} \end{bmatrix}.

We suppose that the matrix W is regular (otherwise, it can be regularized e.g. by the Gill–Murray decomposition [11]). Then, a solution of the system of equations (81) can be computed by

    Δz = (C^T W^{-1} C - D)^{-1} (C^T W^{-1} g + γ),        (82)
    Δx = W^{-1} (C Δz - g).        (83)

In this case, a large matrix W of order n, which is sparse if G(x, z) is sparse, and a small dense matrix
C^T W^{-1} C - D of order m are decomposed.

Remark 19. If the numbers m_k, 1 ≤ k ≤ m, are small (as in the case of a sum of absolute values, where
m_k = 2, 1 ≤ k ≤ m), the matrix W(x, z) - C(x, z) D^{-1}(x, z) C^T(x, z) is sparse. Thus, we can use the
expression

    \begin{bmatrix} W & -C \\ -C^T & D \end{bmatrix}^{-1} = \begin{bmatrix} (W - C D^{-1} C^T)^{-1} & (W - C D^{-1} C^T)^{-1} C D^{-1} \\ D^{-1} C^T (W - C D^{-1} C^T)^{-1} & D^{-1} + D^{-1} C^T (W - C D^{-1} C^T)^{-1} C D^{-1} \end{bmatrix}.

Then, a solution of the system of equations (81) can be computed by

    Δx = -(W - C D^{-1} C^T)^{-1} (g + C D^{-1} γ),        (84)
    Δz = D^{-1} (C^T Δx - γ).        (85)

In this case, a large matrix W - C D^{-1} C^T of order n, which is usually sparse if G(x, z) is sparse, is
decomposed. The inversion of the diagonal matrix D of order m presents no problem.
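A dense-algebra sketch of the elimination (84)–(85) from Remark 19 follows; all names are ours, and in practice a sparse Cholesky factorization of W - CD^{-1}C^T would replace the dense solve.

```python
import numpy as np

def newton_direction(W, C, D_diag, g, gamma):
    """Solve the Newton system (81) by block elimination (84)-(85).

    W      : n x n matrix W(x, z)
    C      : n x m matrix C(x, z)
    D_diag : length-m vector with the (positive) diagonal of D(x, z)
    g, gamma : right-hand sides g(x, z) and gamma(x, z)
    """
    CD = C / D_diag                       # C D^{-1} (column-wise scaling)
    S = W - CD @ C.T                      # Schur complement W - C D^{-1} C^T
    dx = -np.linalg.solve(S, g + CD @ gamma)     # (84)
    dz = (C.T @ dx - gamma) / D_diag             # (85)
    return dx, dz
```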
During iterative determination of a minimax vector we know a value of the parameter µ and vectors
x ∈ Rn , z ∈ Rm such that zk > Fk (x), 1 ≤ k ≤ m. Using formulas (82)–(83) or (84)–(85) we determine
direction vectors ∆x, ∆z. Then, we choose a steplength α so that
Bµ (x + α∆x, z + α∆z) < Bµ (x, z) (86)
and zk + α∆zk > Fk (x + α∆x), 1 ≤ k ≤ m. Finally, we set x+ = x + α∆x, z+ = z + α∆z and determine a
new value µ+ < µ. If the matrix of system of equations (81) is positive definite, inequality (86) is satisfied
for a sufficiently small value of the steplength α.
Theorem 4. Let the matrix G(x, z) given by (74) be positive definite. Then the matrix of the system of
equations (81) is positive definite.

Proof. The matrix of the system of equations (81) is positive definite if and only if the matrix D and its Schur
complement W - C D^{-1} C^T are positive definite [8, Theorem 2.5.6]. The matrix D is positive definite since
it has positive diagonal elements. Further, it holds that

    W - C D^{-1} C^T = G + \sum_{k=1}^{m} \left( A_k V_k A_k^T - A_k V_k ẽ_k (ẽ_k^T V_k ẽ_k)^{-1} (A_k V_k ẽ_k)^T \right),

the matrices A_k V_k A_k^T - A_k V_k ẽ_k (ẽ_k^T V_k ẽ_k)^{-1} (A_k V_k ẽ_k)^T, 1 ≤ k ≤ m, are positive semidefinite due to the
Schwarz inequality, and the matrix G is positive definite by the assumption.

3.3 Direct determination of a minimax vector

Now we will show how to solve the system of equations (72)–(73) by direct determination of a minimax vector
using two-level optimization

    z(x; µ) = \arg\min_{z ∈ R^m} B_µ(x, z),        (87)

    x^* = \arg\min_{x ∈ R^n} B(x; µ),   B(x; µ) = B_µ(x, z(x; µ)).        (88)

Problem (87) serves for determination of an optimal vector z(x; µ) ∈ R^m. Let B̃_µ(z) = B_µ(x, z) for a fixed
chosen vector x ∈ R^n. The function B̃_µ(z) is strictly convex (as a function of the vector z), since it is a
sum of convex function (21) and strictly convex functions -µ \log(z_k - f_{kl}(x)), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k. A
minimum of the function B̃_µ(z) is its stationary point, so it is a solution of system of equations (73) with
Lagrange multipliers (71). The following theorem shows that this solution exists and is unique.

Theorem 5. The function B̃_µ(z) : (F(x), ∞) → R has a unique stationary point, which is its global
minimum. This stationary point is characterized by the system of equations γ(x, z) = 0, or

    1 - ẽ_k^T u_k = 1 - \sum_{l=1}^{m_k} \frac{µ}{z_k - f_{kl}(x)} = 0,   1 ≤ k ≤ m,        (89)

which has a unique solution z(x; µ) ∈ Z ⊂ R^m such that

    F_k(x) < F_k(x) + µ < z_k(x; µ) < F_k(x) + m_k µ        (90)

for 1 ≤ k ≤ m.
for 1 ≤ k ≤ m.
Proof. Definition 1 implies f_{kl}(x) ≤ F_k(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, where the equality occurs for at least
one index l.

(a) If (89) holds, then we can write

    1 = \sum_{l=1}^{m_k} \frac{µ}{z_k - f_{kl}(x)} > \frac{µ}{z_k - F_k(x)}   ⇔   z_k - F_k(x) > µ,

    1 = \sum_{l=1}^{m_k} \frac{µ}{z_k - f_{kl}(x)} < \frac{m_k µ}{z_k - F_k(x)}   ⇔   z_k - F_k(x) < m_k µ,

which proves inequalities (90).

(b) Since

    γ_k(x, F_k(x) + µ) = 1 - \sum_{l=1}^{m_k} \frac{µ}{µ + F_k(x) - f_{kl}(x)} < 1 - \frac{µ}{µ} = 0,

    γ_k(x, F_k(x) + m_k µ) = 1 - \sum_{l=1}^{m_k} \frac{µ}{m_k µ + F_k(x) - f_{kl}(x)} > 1 - \frac{m_k µ}{m_k µ} = 0,

and the function γ_k(x, z_k) is continuous and increasing in F_k(x) + µ < z_k < F_k(x) + m_k µ by
(78), the equation γ_k(x, z_k) = 0 has a unique solution in this interval. Since the function B̃_µ(z) is
convex, this solution corresponds to its global minimum.

System (89) is a system of m scalar equations with localization inequalities (90). These scalar equations
can be efficiently solved by robust methods described e.g. in [16] and [17] (details are stated in [25]).
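Each scalar equation in (89) can be solved, for instance, by a safeguarded Newton iteration started inside the localization interval (90); the sketch below is one possible realization of ours, not the particular methods of [16], [17], or [25].

```python
import numpy as np

def solve_zk(fvals, mu, tol=1e-12, itmax=50):
    """Solve 1 - sum_l mu/(z - f_l) = 0 for z, where fvals = (f_k1(x), ..., f_km_k(x)).
    By (90), the root lies in (F_k + mu, F_k + m_k*mu) with F_k = max(fvals)."""
    fvals = np.asarray(fvals, dtype=float)
    Fk, mk = fvals.max(), fvals.size
    lo, hi = Fk + mu, Fk + mk * mu
    z = 0.5 * (lo + hi)                       # start inside the interval
    for _ in range(itmax):
        r = 1.0 - np.sum(mu / (z - fvals))    # gamma_k(x, z), increasing in z
        if abs(r) <= tol:
            break
        if r > 0.0:
            hi = z                            # root lies to the left of z
        else:
            lo = z                            # root lies to the right of z
        dr = np.sum(mu / (z - fvals) ** 2)    # derivative (78), positive
        z = z - r / dr                        # Newton step
        if not (lo < z < hi):                 # safeguard: bisect the bracket
            z = 0.5 * (lo + hi)
    return z
```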
Suppose that z = z(x; µ) and denote

    B(x; µ) = \sum_{k=1}^{m} z_k(x; µ) - µ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; µ) - f_{kl}(x)).        (91)

To find a minimum of B_µ(x, z) in R^{n+m}, it suffices to minimize B(x; µ) in R^n.


Theorem 6. Consider barrier function (91). Then

    ∇B(x; µ) = \sum_{k=1}^{m} A_k(x) u_k(x; µ),        (92)

    ∇²B(x; µ) = W(x; µ) - C(x; µ) D^{-1}(x; µ) C^T(x; µ)
              = G(x; µ) + \sum_{k=1}^{m} A_k(x) V_k(x; µ) A_k^T(x) - \sum_{k=1}^{m} \frac{A_k(x) V_k(x; µ) ẽ_k ẽ_k^T V_k(x; µ) A_k^T(x)}{ẽ_k^T V_k(x; µ) ẽ_k},        (93)

where W(x; µ) = W(x, z(x; µ)), G(x; µ) = G(x, z(x; µ)), C(x; µ) = C(x, z(x; µ)), D(x; µ) = D(x, z(x; µ)),
and U_k(x; µ) = U_k(x, z(x; µ)), V_k(x; µ) = V_k(x, z(x; µ)) = U_k²(x; µ)/µ, 1 ≤ k ≤ m. A solution of the equation

    ∇²B(x; µ) Δx = -∇B(x; µ)        (94)

is identical with the vector Δx given by (84), where z = z(x; µ) (so that γ(x, z(x; µ)) = 0).
Proof. Differentiating barrier function (91) and using (73) we obtain

    ∇B(x; µ) = \sum_{k=1}^{m} \frac{∂z_k(x; µ)}{∂x} - \sum_{k=1}^{m} \sum_{l=1}^{m_k} u_{kl}(x; µ) \left( \frac{∂z_k(x; µ)}{∂x} - \frac{∂f_{kl}(x)}{∂x} \right)
             = \sum_{k=1}^{m} \frac{∂z_k(x; µ)}{∂x} \left( 1 - \sum_{l=1}^{m_k} u_{kl}(x; µ) \right) + \sum_{k=1}^{m} \sum_{l=1}^{m_k} \frac{∂f_{kl}(x)}{∂x} u_{kl}(x; µ)
             = \sum_{k=1}^{m} \sum_{l=1}^{m_k} g_{kl}(x) u_{kl}(x; µ) = \sum_{k=1}^{m} A_k(x) u_k(x; µ),

where

    u_{kl}(x; µ) = \frac{µ}{z_k(x; µ) - f_{kl}(x)},   1 ≤ k ≤ m,   1 ≤ l ≤ m_k.        (95)

Formula (93) can be obtained by additional differentiation of relations (73) and (92) using (95). A simpler
way is based on using (84). Since (73) implies γ(x, z(x; µ)) = 0, we can substitute γ = 0 into (84), which
yields the equation

    (W(x, z) - C(x, z) D^{-1}(x, z) C^T(x, z)) Δx = -g(x, z),

where z = z(x; µ), which confirms the validity of formulas (93) and (94) (details can be found in [25]).
Remark 20. To determine an inverse of the Hessian matrix, one can use the Woodbury formula [8, Theorem
12.1.4], which gives

    (∇²B(x; µ))^{-1} = W^{-1}(x; µ) - W^{-1}(x; µ) C(x; µ) (C^T(x; µ) W^{-1}(x; µ) C(x; µ) - D(x; µ))^{-1} C^T(x; µ) W^{-1}(x; µ).        (96)

If the matrix ∇²B(x; µ) is not positive definite, it can be replaced by the matrix L L^T = ∇²B(x; µ) + E,
obtained by the Gill–Murray decomposition [11]. Note that it is more advantageous to use the system of linear
equations (81) instead of (94) for determination of the direction vector Δx, because the system of nonlinear
equations (89) is solved with prescribed finite precision, and thus the vector γ(x, z), defined by (73), need not
be zero.

From

    V_k(x; µ) = \frac{1}{µ} U_k²(x; µ),   u_k(x; µ) ≥ 0,   ẽ_k^T u_k(x; µ) = 1,   1 ≤ k ≤ m,

it follows that ‖V_k(x; µ)‖ → ∞ if µ → 0, so Hessian matrix (93) may be ill-conditioned if the value µ is
very small. For this reason, we use a lower bound \underline{µ} > 0 for µ.
Theorem 7. Let Assumption X3 be satisfied and let µ ≥ \underline{µ} > 0. If the matrix G(x; µ) is uniformly positive
definite (i.e. there exists a constant \underline{G} such that v^T G(x; µ) v ≥ \underline{G} ‖v‖²), there exists a number \overline{κ} ≥ 1 such
that κ(∇²B(x; µ)) ≤ \overline{κ}.

Proof.
(a) Using (71), (93), and Assumption X3, we obtain

    ‖∇²B(x; µ)‖ ≤ \left\| G(x; µ) + \sum_{k=1}^{m} A_k(x) V_k(x; µ) A_k^T(x) \right\|
               ≤ \sum_{k=1}^{m} \sum_{l=1}^{m_k} \left( ‖G_{kl}(x) u_{kl}(x; µ)‖ + \frac{1}{µ} u_{kl}²(x; µ) ‖g_{kl}(x) g_{kl}^T(x)‖ \right)
               ≤ \frac{\overline{m}}{µ} (µ \overline{G} + \overline{g}²) \overset{Δ}{=} \frac{c}{µ} ≤ \frac{c}{\underline{µ}},        (97)

because 0 ≤ u_{kl}(x; µ) ≤ ẽ_k^T u_k(x; µ) = 1, 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, by (89).

(b) As in the proof of Theorem 4, for an arbitrary vector v ∈ R^n it holds that

    v^T ∇²B(x; µ) v = v^T (W(x; µ) - C(x; µ) D^{-1}(x; µ) C^T(x; µ)) v
        = v^T G(x; µ) v + \sum_{k=1}^{m} \left( v^T A_k(x) V_k(x; µ) A_k^T(x) v - \frac{v^T A_k(x) V_k(x; µ) ẽ_k ẽ_k^T V_k(x; µ) A_k^T(x) v}{ẽ_k^T V_k(x; µ) ẽ_k} \right)
        ≥ v^T G(x; µ) v ≥ \underline{G} ‖v‖²,

so \underline{λ}(∇²B(x; µ)) ≥ \underline{G}.

(c) Since (a) implies \overline{λ}(∇²B(x; µ)) = ‖∇²B(x; µ)‖ ≤ c/\underline{µ}, using (b) we can write

    κ(∇²B(x; µ)) = \frac{\overline{λ}(∇²B(x; µ))}{\underline{λ}(∇²B(x; µ))} ≤ \frac{c}{\underline{µ}\, \underline{G}} \overset{Δ}{=} \overline{κ}.        (98)

Remark 21. If there exists a number \overline{κ} > 0 such that κ(∇²B(x_i; µ_i)) ≤ \overline{κ}, i ∈ N, the direction vector
Δx_i, given by solving the system of equations ∇²B(x_i; µ_i) Δx_i = -∇B(x_i; µ_i), satisfies the condition

    (Δx_i)^T g(x_i; µ_i) ≤ -ε_0 ‖Δx_i‖ ‖g(x_i; µ_i)‖,   i ∈ N,        (99)

where ε_0 = 1/\sqrt{\overline{κ}}. Then, for arbitrary numbers 0 < ε_1 ≤ ε_2 < 1 one can find a steplength parameter
α_i > 0 such that for x_{i+1} = x_i + α_i Δx_i it holds that

    ε_1 ≤ \frac{B(x_{i+1}; µ_i) - B(x_i; µ_i)}{α_i (Δx_i)^T g(x_i; µ_i)} ≤ ε_2,        (100)

so there exists a number c > 0 such that (see [32, Section 3.2])

    B(x_{i+1}; µ_i) - B(x_i; µ_i) ≤ -c ‖g(x_i; µ_i)‖²,   i ∈ N.        (101)

If Assumption X3 is not satisfied, then only (Δx_i)^T g(x_i; µ_i) < 0 holds (because the matrix ∇²B(x; µ) is
positive definite by Theorem 4) and

    B(x_{i+1}; µ_i) - B(x_i; µ_i) ≤ 0,   i ∈ N.        (102)

3.4 Implementation

Remark 22. In (80), it is assumed that G(x, z) is the Hessian matrix of the Lagrange function. Direct
computation of the matrix G(x; µ) = G(x, z(x; µ)) is usually difficult (one can use automatic differentiation
as described in [14]). Thus, various approximations G ≈ G(x; µ) are mostly used.

• The matrix G ≈ G(x; µ) can be determined using differences

    G w_j = \frac{A(x + δ w_j) u(x; µ) - A(x) u(x; µ)}{δ},   1 ≤ j ≤ k.

The vectors w_j, 1 ≤ j ≤ k, are chosen so that their number is as small as possible [4], [35].

• The matrix G ≈ G(x; µ) can be determined using variable metric methods [27] (a sketch is given after
Remark 23). The vectors

    d = x_+ - x,   y = A(x_+) u(x_+; µ) - A(x) u(x_+; µ)

are used for an update of G.

• If the problem is separable (i.e. f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, are functions of a small number
n_{kl} = O(1) of variables), one can set, as in [13],

    G = \sum_{k=1}^{m} \sum_{l=1}^{m_k} Z_{kl} Ĝ_{kl} Z_{kl}^T u_{kl}(x, z),

where the reduced Hessian matrices Ĝ_{kl} are updated using the reduced vectors d̂_{kl} = Z_{kl}^T (x_+ - x) and
ŷ_{kl} = Z_{kl}^T (g_{kl}(x_+) - g_{kl}(x)).

Remark 23. The matrix G ≈ G(x; µ) obtained by the approaches stated in Remark 22 can be ill-conditioned,
so condition (99) (with a chosen value ε_0 > 0) may not be satisfied. In this case it is possible to restart
the iteration process and set G = I. Then \underline{G} = 1 and \overline{G} = 1 in (97) and (98), so there is a higher probability
of fulfilment of condition (99). If the choice G = I does not satisfy (99), we set Δx = -g(x; µ) (the steepest
descent direction).
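A minimal sketch of the variable metric option from Remark 22: a BFGS update of G from the vectors d = x_+ - x and y = A(x_+) u(x_+; µ) - A(x) u(x_+; µ). The Powell damping used below is one common safeguard of our own choosing, not a device prescribed by the report.

```python
import numpy as np

def bfgs_update(G, d, y, damping=0.2):
    """Update a symmetric positive definite approximation G of the Lagrangian
    Hessian using d = x_plus - x and y = A(x_plus) u - A(x) u (Remark 22).
    Powell damping keeps G positive definite when d^T y is too small."""
    Gd = G @ d
    dGd = d @ Gd
    dy = d @ y
    if dy < damping * dGd:                 # damp y toward G d
        theta = (1.0 - damping) * dGd / (dGd - dy)
        y = theta * y + (1.0 - theta) * Gd
        dy = d @ y
    return G - np.outer(Gd, Gd) / dGd + np.outer(y, y) / dy
```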

An update of µ is an important part of interior point methods. Above all, µ → 0 must hold, which is
the main property of interior point methods. Moreover, rounding errors may cause that z_k(x; µ) = F_k(x)
when the value µ is small (because F_k(x) < z_k(x; µ) ≤ F_k(x) + m_k µ and F_k(x) + m_k µ → F_k(x) if µ → 0),
which leads to a breakdown (division by z_k(x; µ) - F_k(x) = 0) when computing 1/(z_k(x; µ) - F_k(x)).
Therefore, we need to use a lower bound \underline{µ} for the barrier parameter (e.g. \underline{µ} = 10^{-8} when computing in
double precision).

The efficiency of interior point methods also depends on the way of decreasing the value of the barrier
parameter. The following heuristic procedures have proved successful in practice, where g(x_i; µ_i) = A(x_i) u(x_i; µ_i)
and \overline{g} is a suitable constant.

Procedure A
Phase 1 If ‖g(x_i; µ_i)‖ ≥ \overline{g}, then µ_{i+1} = µ_i (the value of the barrier parameter is unchanged).
Phase 2 If ‖g(x_i; µ_i)‖ < \overline{g}, then

    µ_{i+1} = \max\left( µ̃_{i+1}, \underline{µ}, 10 ε_M |F(x_{i+1})| \right),        (103)

where F(x_{i+1}) = F_1(x_{i+1}) + ··· + F_m(x_{i+1}), ε_M is the machine precision, and

    µ̃_{i+1} = \min\left[ \max(λ µ_i, µ_i/(σ µ_i + 1)), \max(‖g(x_i; µ_i)‖², 10^{-2k}) \right].        (104)

The values \underline{µ} = 10^{-8}, λ = 0.85, and σ = 100 are usually used.

Procedure B
Phase 1 If ‖g(x_i; µ_i)‖² ≥ ϑ µ_i, then µ_{i+1} = µ_i (the value of the barrier parameter is unchanged).
Phase 2 If ‖g(x_i; µ_i)‖² < ϑ µ_i, then

    µ_{i+1} = \max(\underline{µ}, ‖g(x_i; µ_i)‖²).        (105)

The values \underline{µ} = 10^{-8} and ϑ = 0.1 are usually used.

The choice of \overline{g} in Procedure A is not critical. We can set \overline{g} = ∞ but a lower value is sometimes more
advantageous. Formula (104) requires several comments. The first argument of the minimum controls
the decreasing speed of the value of the barrier parameter, which is linear (a geometric sequence) for small i
(the term λ µ_i) and sublinear (a harmonic sequence) for large i (the term µ_i/(σ µ_i + 1)). Thus, the second
argument, ensuring that the value µ is small in a neighborhood of a desired solution, is mainly important for
large i. This situation may appear if the gradient norm ‖g(x_i; µ_i)‖ is small even if x_i is far from a solution.
The idea of Procedure B proceeds from the fact that the barrier function B(x; µ) should be minimized with
sufficient precision for a given value of the parameter µ.
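Procedure B, for instance, can be coded in a few lines; the function name and the argument layout below are ours.

```python
def update_mu_procedure_B(mu, grad_norm, mu_lower=1e-8, theta=0.1):
    """Barrier parameter update, Procedure B: keep mu while the barrier
    subproblem is not yet solved accurately enough, otherwise decrease it."""
    if grad_norm ** 2 >= theta * mu:
        return mu                              # Phase 1: value unchanged
    return max(mu_lower, grad_norm ** 2)       # Phase 2: formula (105)
```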
The considerations up to now are summarized in the following algorithm, which supposes that the matrix
A(x) is sparse. If it is dense, the algorithm is simplified because there is no symbolic decomposition.

Algorithm 2. Primal interior point method

Data. A tolerance ε > 0 for the gradient norm of the Lagrange function. A precision δ > 0 for the determination
of a minimax vector. Bounds 0 < \underline{µ} < \overline{µ} for the barrier parameter. Coefficients 0 < λ < 1, σ > 1 (or
0 < ϑ < 1) for the decrease of the barrier parameter. A tolerance ε_0 > 0 for a uniform descent. A
tolerance ε_1 > 0 for the steplength selection. A maximum steplength Δ > 0.

Input. A sparsity pattern of the matrix A(x) = [A_1(x), ..., A_m(x)]. A starting point x ∈ R^n.

Step 1 Initiation. Choose µ ≤ \overline{µ}. Determine the sparse structure of the matrix W = W(x; µ) from
the sparse structure of the matrix A(x) and perform a symbolic decomposition of the matrix W
(described in [2, Section 1.7.4]). Compute the values f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, the values F_k(x) =
\max_{1≤l≤m_k} f_{kl}(x), 1 ≤ k ≤ m, and the value of objective function (4). Set r = 0 (restart indicator).

Step 2 Termination. Solve the nonlinear equations (89) with precision δ to obtain a minimax vector
z(x; µ) and a vector of Lagrange multipliers u(x; µ). Determine the matrix A = A(x) and the vector
g = g(x; µ) = A(x) u(x; µ). If µ ≤ \underline{µ} and ‖g‖ ≤ ε, terminate the computation.

Step 3 Hessian matrix approximation. Set G = G(x; µ) or compute an approximation G of the Hessian
matrix G(x; µ) using gradient differences or quasi-Newton updates (Remark 22).

Step 4 Direction determination. Determine the matrix ∇²B(x; µ) by (93) and the vector Δx by solving
equations (94) with the right-hand side defined by (92).

Step 5 Restart. If r = 0 and (99) does not hold, set G = I, r = 1 and go to Step 4. If r = 1 and (99)
does not hold, set Δx = -g. Set r = 0.

Step 6 Steplength selection. Determine a steplength α > 0 satisfying inequalities (100) (for the barrier
function B(x; µ) defined by (91)) and α ≤ Δ/‖Δx‖. Note that the nonlinear equations (89) are solved
at the point x + α Δx. Set x := x + α Δx. Compute the values f_{kl}(x), 1 ≤ k ≤ m, 1 ≤ l ≤ m_k, the values
F_k(x) = \max_{1≤l≤m_k} f_{kl}(x), 1 ≤ k ≤ m, and the value of objective function (4).

Step 7 Barrier parameter update. Determine a new value of the barrier parameter µ ≥ \underline{µ} using Procedure A or Procedure B. Go to Step 2.

The values ε = 10^{-6}, δ = 10^{-6}, \underline{µ} = 10^{-8}, \overline{µ} = 1, λ = 0.85, σ = 100, ϑ = 0.1, ε_0 = 10^{-8}, ε_1 = 10^{-4},
and Δ = 1000 were used in our numerical experiments.
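Putting the pieces together, one simplified, dense iteration loop in the spirit of Algorithm 2 might look as follows. The helpers `solve_zk`, `log_barrier`, `newton_direction`, `armijo_step`, and `update_mu_procedure_B` refer to the sketches given earlier; the exact Hessian G is replaced by the identity, the steplength test (100) is replaced by Armijo backtracking, and sparsity handling, restarts, and the maximum steplength are omitted, so this is an illustrative skeleton, not the implementation used for the experiments in Section 6.

```python
import numpy as np

def primal_ip_sketch(f_groups, grads, x, mu=1.0, mu_lower=1e-8, eps=1e-6, itmax=100):
    """Simplified driver in the spirit of Algorithm 2 (dense case, G = I)."""
    for _ in range(itmax):
        # Step 2: minimax vector z(x; mu) and multipliers u by (89), (71)
        z = np.array([solve_zk([f(x) for f in grp], mu) for grp in f_groups])
        A, u, v = [], [], []
        for k, grp in enumerate(f_groups):
            slack = np.array([z[k] - f(x) for f in grp])
            A.append(np.column_stack([g(x) for g in grads[k]]))
            u.append(mu / slack)
            v.append(mu / slack ** 2)
        g = sum(Ak @ uk for Ak, uk in zip(A, u))
        if mu <= mu_lower and np.linalg.norm(g) <= eps:
            break
        # Steps 3-4: Newton-like direction with the crude model Hessian G = I
        n = x.size
        W = np.eye(n) + sum(Ak * vk @ Ak.T for Ak, vk in zip(A, v))
        C = np.column_stack([Ak @ vk for Ak, vk in zip(A, v)])
        D = np.array([vk.sum() for vk in v])
        gamma = np.array([1.0 - uk.sum() for uk in u])
        dx, _ = newton_direction(W, C, D, g, gamma)
        # Step 6: steplength on B(x; mu) by backtracking
        B = lambda xx: log_barrier(f_groups, xx,
              np.array([solve_zk([f(xx) for f in grp], mu) for grp in f_groups]), mu)[0]
        alpha = armijo_step(B, x, dx, float(dx @ g))
        x = x + alpha * dx
        # Step 7: barrier parameter update (Procedure B)
        mu = update_mu_procedure_B(mu, np.linalg.norm(g), mu_lower)
    return x
```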

3.5 Global convergence

Now we prove the global convergence of the method realized by Algorithm 2.

Lemma 4. Let the vectors z_k(x; µ), 1 ≤ k ≤ m, be solutions of equations (89). Then

    \frac{∂z_k(x; µ)}{∂µ} > 0,   1 ≤ k ≤ m,        \frac{∂B(x; µ)}{∂µ} = -\sum_{k=1}^{m} \sum_{l=1}^{m_k} \log(z_k(x; µ) - f_{kl}(x)).

Proof. Differentiating (89) with respect to µ, one can write for 1 ≤ k ≤ m



− Σ_{l=1}^{mk} 1/(zk(x;µ) − fkl(x)) + Σ_{l=1}^{mk} µ/(zk(x;µ) − fkl(x))² · ∂zk(x;µ)/∂µ = 0,

which after multiplication by µ together with (71) and (89) gives

∂zk(x;µ)/∂µ = ( Σ_{l=1}^{mk} µ²/(zk(x;µ) − fkl(x))² )⁻¹ = ( Σ_{l=1}^{mk} u²kl(x;µ) )⁻¹ > 0.

Differentiating a function

B(x;µ) = Σ_{k=1}^m zk(x;µ) − µ Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(x;µ) − fkl(x))    (106)

and using (89) we obtain

∂B(x;µ)/∂µ = Σ_{k=1}^m ∂zk(x;µ)/∂µ − Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(x;µ) − fkl(x)) − Σ_{k=1}^m Σ_{l=1}^{mk} µ/(zk(x;µ) − fkl(x)) · ∂zk(x;µ)/∂µ
           = Σ_{k=1}^m ∂zk(x;µ)/∂µ ( 1 − Σ_{l=1}^{mk} µ/(zk(x;µ) − fkl(x)) ) − Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(x;µ) − fkl(x))
           = − Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(x;µ) − fkl(x)).

Lemma 5. Let Assumption X1a be satisfied. Let xi and µi , i ∈ N , be the sequences generated by Algo-
rithm 2. Then the sequences B(xi ; µi ), z(xi ; µi ), and F (xi ), i ∈ N , are bounded. Moreover, there exists a
constant L ≥ 0 such that for i ∈ N it holds
B(xi+1 ; µi+1 ) ≤ B(xi+1 ; µi ) + L(µi − µi+1 ). (107)

Proof.
(a) We first prove boundedness from below. Using (106) and Assumption X1a, one can write

B(x;µ) − F = Σ_{k=1}^m zk(x;µ) − F − µ Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(x;µ) − fkl(x))
           ≥ Σ_{k=1}^m ( zk(x;µ) − F − mkµ log(zk(x;µ) − F) ).

A convex function ψ(t) = t − mkµ log(t) has a unique minimum at the point t = mkµ because ψ′(mkµ) =
1 − mkµ/(mkµ) = 0. Thus, it holds

B(x;µ) ≥ F + Σ_{k=1}^m ( mkµ − mkµ log(mkµ) ) ≥ F + Σ_{k=1}^m min(0, mkµ(1 − log(mkµ)))
       ≥ F + Σ_{k=1}^m min(0, mkµ(1 − log(2mkµ))) = B.

Boundedness from below of sequences z(xi ; µi ) and F (xi ), i ∈ N , follows from inequalities (90) and
Assumption X1a.
(b) Now we prove boundedness from above. Similarly as in (a) we can write
B(x;µ) − F ≥ Σ_{k=1}^m (zk(x;µ) − F)/2 + Σ_{k=1}^m ( (zk(x;µ) − F)/2 − mkµ log(zk(x;µ) − F) ).

A convex function t/2 − mkµ log(t) has a unique minimum at the point t = 2mkµ. Thus, it holds

B(x;µ) ≥ Σ_{k=1}^m (zk(x;µ) − F)/2 + F + Σ_{k=1}^m min(0, mkµ(1 − log(2mkµ))) = Σ_{k=1}^m (zk(x;µ) − F)/2 + B
or
Σ_{k=1}^m (zk(x;µ) − F) ≤ 2(B(x;µ) − B).    (108)
Using the mean value theorem and Lemma 4, we obtain

B(xi+1;µi+1) − B(xi+1;µi) = Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(xi+1;µ̃i) − fkl(xi+1)) (µi − µi+1)
                          ≤ Σ_{k=1}^m Σ_{l=1}^{mk} log(zk(xi+1;µi) − fkl(xi+1)) (µi − µi+1)
                          ≤ Σ_{k=1}^m mk log(zk(xi+1;µi) − F) (µi − µi+1),    (109)

where 0 < µi+1 ≤ µ̃i ≤ µi . Since log(t) ≤ t/e (where e = exp(1)) for t > 0, we can write using
inequalities (108), (109), (90)

m
B(xi+1 ; µi+1 ) − B ≤ B(xi+1 ; µi ) − B + mk log(zk (xi+1 ; µi ) − F )(µi − µi+1 )
k=1

m
−1
≤ B(xi+1 ; µi ) − B + e mk (zk (xi+1 ; µi ) − F )(µi − µi+1 )
k=1
−1
≤ B(xi+1 ; µi ) − B + 2e m(B(xi+1 ; µi ) − B)(µi − µi+1 )
= (1 + λδi )(B(xi+1 ; µi ) − B) ≤ (1 + λδi )(B(xi ; µi ) − B),

where λ = 2m/e and δi = µi − µi+1 . Therefore,

B(xi+1;µi+1) − B ≤ ∏_{j=1}^{i} (1 + λδj) (B(x1;µ1) − B) ≤ ∏_{i=1}^{∞} (1 + λδi) (B(x1;µ1) − B)    (110)

and since

Σ_{i=1}^{∞} λδi = λ Σ_{i=1}^{∞} (µi − µi+1) = λ(µ1 − lim_{i→∞} µi) ≤ λµ,
the expression on the right-hand side of (110) is finite. Thus, the sequence B(xi ; µi ), i ∈ N , is
bounded from above and the sequences z(xi ; µi ) and F (xi ), i ∈ N , are bounded from above as well
by (108) and (90).
(c) Finally, we prove formula (107). Using (109) and (90) we obtain

B(xi+1;µi+1) − B(xi+1;µi) ≤ Σ_{k=1}^m mk log(zk(xi+1;µi) − F) (µi − µi+1)
                          ≤ Σ_{k=1}^m mk log(Fk(xi+1) + mkµi − F) (µi − µi+1)
                          ≤ Σ_{k=1}^m mk log(F + mkµ − F) (µi − µi+1) = L(µi − µi+1)

(the existence of a constant F follows from boundedness of a sequence F (xi ), i ∈ N ), which together
with (102) gives B(xi+1 ; µi+1 ) ≤ B(xi ; µi ) + L(µi − µi+1 ), i ∈ N . Thus, it holds

B(xi ; µi ) ≤ B(x1 ; µ1 ) + L(µ1 − µi ) ≤ B(x1 ; µ1 ) + Lµ = B, i ∈ N. (111)

The upper bounds g and G are not used in Lemma 5, so Assumption X3 need not be satisfied. Thus,
there exists an upper bound F (independent of g and G) such that F(xi) ≤ F for all i ∈ N. This upper
bound can be used in the definition of the set DF(F) in Assumption X3.
Lemma 6. Let Assumption X3 and the assumptions of Lemma 5 be satisfied. Then, if we use Procedure A
or Procedure B for the update of the parameter µ, the values µi, i ∈ N, form a nonincreasing sequence such
that µi → 0.
Proof. The value of parameter µ is unchanged in the first phase of Procedure A or Procedure B. Since a
function B(x; µ) is continuous, bounded from below by Lemma 5, and since inequality (101) is satisfied
(with µi = µ), it holds ∥g(xi ; µ)∥ → 0 if phase 1 contains an infinite number of subsequent iterative steps
[32, Section 3.2]. Thus, there exists a step (with index i) belonging to the first phase such that either
∥g(xi ; µ)∥ < g in Procedure A or ∥g(xi ; µ)∥2 < ϑµ in Procedure B. However, this is in contradiction with
the definition of the first phase. Thus, there exists an infinite number of steps belonging to the second
phase, where the value of parameter µ is decreased so that µi → 0.
Theorem 8. Let assumptions of Lemma 6 be satisfied. Consider a sequence xi , i ∈ N , generated by
Algorithm 2, where δ = ε = µ = 0. Then

lim_{i→∞} Σ_{k=1}^m Σ_{l=1}^{mk} gkl(xi) ukl(xi;µi) = 0,      Σ_{l=1}^{mk} ukl(xi;µi) = 1,
zk(xi;µi) − fkl(xi) ≥ 0,      ukl(xi;µi) ≥ 0,
lim_{i→∞} ukl(xi;µi) (zk(xi;µi) − fkl(xi)) = 0
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk.

Proof.
(a) Equalities ẽTk uk (xi ; µi ) = 1, 1 ≤ k ≤ m, are satisfied by (89) because δ = 0. Inequalities zk (xi ; µi ) −
fkl (xi ) ≥ 0 and ukl (xi ; µi ) ≥ 0 follow from formulas (90) and statement (95).
(b) Relations (101) and (107) yield
B(xi+1 ; µi+1 ) − B(xi ; µi ) = (B(xi+1 ; µi+1 ) − B(xi+1 ; µi )) + (B(xi+1 ; µi ) − B(xi ; µi ))
≤ L (µi − µi+1 ) − c ∥g(xi ; µi )∥2
and since limi→∞ µi = 0 (Lemma 6), we can write by (111) that

B ≤ lim_{i→∞} B(xi+1;µi+1) ≤ B(x1;µ1) + L Σ_{i=1}^{∞} (µi − µi+1) − c Σ_{i=1}^{∞} ∥g(xi;µi)∥²
  ≤ B(x1;µ1) + Lµ − c Σ_{i=1}^{∞} ∥g(xi;µi)∥² = B − c Σ_{i=1}^{∞} ∥g(xi;µi)∥².

Thus, it holds

Σ_{i=1}^{∞} ∥g(xi;µi)∥² ≤ (B − B)/c < ∞,
which gives g(xi;µi) = Σ_{k=1}^m Σ_{l=1}^{mk} gkl(xi) ukl(xi;µi) → 0.
(c) Let indices 1 ≤ k ≤ m and 1 ≤ l ≤ mk be chosen arbitrarily. Using (95) and Lemma 6 we obtain
ukl(xi;µi)(zk(xi;µi) − fkl(xi)) = µi/(zk(xi;µi) − fkl(xi)) · (zk(xi;µi) − fkl(xi)) = µi → 0.

Corollary 1. Let the assumptions of Theorem 8 be satisfied. Then, every cluster point x ∈ Rn of a
sequence xi , i ∈ N , satisfies necessary KKT conditions (8)-(9) where z and u (with elements zk and ukl ,
1 ≤ k ≤ m, 1 ≤ l ≤ mk ) are cluster points of sequences z(xi ; µi ) and u(xi ; µi ), i ∈ N .
Now we suppose that the values δ, ε, and µ are nonzero and show how precise the solution of the
system of KKT equations is after the computation terminates.
Theorem 9. Let the assumptions of Lemma 6 be satisfied. Consider a sequence xi , i ∈ N , generated by
Algorithm 2. Then, if the values δ > 0, ε > 0, and µ > 0 are chosen arbitrarily, there exists an index i ≥ 1
such that

∥g(xi;µi)∥ ≤ ε,      |1 − Σ_{l=1}^{mk} ukl(xi;µi)| ≤ δ,
zk(xi;µi) − fkl(xi) ≥ 0,      ukl(xi;µi) ≥ 0,
ukl(xi;µi)(zk(xi;µi) − fkl(xi)) ≤ µ
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk.
Proof. Inequality |1 − ẽTk uk (xi ; µi )| ≤ δ follows immediately from the fact that the equation ẽTk uk (xi ; µi ) =
1, 1 ≤ k ≤ m, is solved with precision δ. Inequalities zk (xi ; µi ) − fkl (xi ) ≥ 0, ukl (xi ; µi ) ≥ 0 follow
from formulas (90) and statement (95) as in the proof of Theorem 8. Since µi → 0 and g(xi ; µi ) → 0 by
Lemma 6 and Theorem 8, there exists an index i ≥ 1 such that µi ≤ µ and ∥g(xi ; µi )∥ ≤ ε. Using (95) we
obtain
ukl(xi;µi)(zk(xi;µi) − fkl(xi)) = µi/(zk(xi;µi) − fkl(xi)) · (zk(xi;µi) − fkl(xi)) = µi ≤ µ.

3.6 Special cases
Both the simplest and most widely considered generalized minimax problem is the classical minimax
problem (10), when m = 1 in (4) (in this case we write m, z, u, v, U , V instead of m1 , z1 , u1 , v1 , U1 , V1 ).
For solving a classical minimax problem one can use Algorithm 2, where a major part of computation is
very simplified. System of equations (79) is of order n + 1 and has the form
[ G(x,z) + A(x)V(x,z)Aᵀ(x)   −A(x)V(x,z)ẽ ] [ ∆x ]       [ g(x,z) ]
[ −ẽᵀV(x,z)Aᵀ(x)              ẽᵀV(x,z)ẽ   ] [ ∆z ]  = −  [ γ(x,z) ],    (112)

where g(x,z) = A(x)u(x,z), γ(x,z) = 1 − ẽᵀu(x,z), V(x,z) = U²(x,z)/µ = diag(u²1(x,z), ..., u²m(x,z))/µ,
and uk(x,z) = µ/(z − fk(x)), 1 ≤ k ≤ m. System of equations (89) is reduced to one nonlinear equation


1 − ẽᵀu(x,z) = 1 − Σ_{k=1}^m µ/(z − fk(x)) = 0,    (113)

whose solution z(x; µ) lies in the interval F (x) + µ ≤ z(x; µ) ≤ F (x) + mµ. To find this solution by robust
methods from [16], [17] is not difficult. A barrier function has the form


B(x;µ) = z(x;µ) − µ Σ_{k=1}^m log(z(x;µ) − fk(x))    (114)

with

∇B(x;µ) = A(x)u(x;µ),
∇²B(x;µ) = G(x;µ) + A(x)V(x;µ)Aᵀ(x) − A(x)V(x;µ)ẽ ẽᵀV(x;µ)Aᵀ(x) / (ẽᵀV(x;µ)ẽ).

If we write system (112) in the form


[ W(x,z)    −c(x,z) ] [ ∆x ]       [ g(x,z) ]
[ −cᵀ(x,z)   δ(x,z) ] [ ∆z ]  = −  [ γ(x,z) ],

where W (x, z) = G(x, z) + A(x)V (x, z)AT (x), c(x, z) = A(x)V (x, z)ẽ and δ(x, z) = ẽT V (x, z)ẽ, then

∇²B(x;µ) = W(x;µ) − c(x;µ)cᵀ(x;µ)/δ(x;µ).

Since
[ W    −c ]⁻¹     [ W⁻¹ − W⁻¹c ω⁻¹cᵀW⁻¹    −W⁻¹c ω⁻¹ ]
[ −cᵀ   δ ]    =  [ −ω⁻¹cᵀW⁻¹               −ω⁻¹     ],
where ω = cᵀW⁻¹c − δ, we can write
[ ∆x ]       [ W    −c ]⁻¹ [ g ]     [ W⁻¹(c ∆z − g) ]
[ ∆z ]  = −  [ −cᵀ   δ ]   [ γ ]  =  [ ∆z            ],

where
∆z = ω −1 (cT W −1 g + γ).
The matrix W is sparse if the matrix A(x) has sparse columns. If the matrix W is not positive definite,
we can use the Gill-Murray decomposition

W + E = LLT , (115)

where E is a positive semidefinite diagonal matrix. Then we solve the equations

LLT p = g, LLT q = c (116)

and set
∆z = (cᵀp + γ)/(cᵀq − δ),      ∆x = q ∆z − p.    (117)
If we solve the classical minimax problem, Algorithm 2 must be somewhat modified. In Step 2, we solve
only equation (113) instead of the system of equations (89). In Step 4, we determine a vector ∆x by
solving equations (116) and using relations (117). In Step 6, we use the barrier function (114) (nonlinear
equation (113) must be solved at the point x + α∆x).
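This computation is easy to prototype. The following Python sketch (the helper name and data layout are assumptions, not part of the report) solves the scalar equation (113) for z(x;µ) by Brent's method on the interval [F(x) + µ, F(x) + mµ] and then computes ∆z and ∆x from (116)-(117); a plain Cholesky factorization stands in for the Gill-Murray decomposition (115), so W is assumed positive definite.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import brentq

def minimax_direction(f, A, G, mu):
    """One direction computation of the primal interior point method for the
    classical minimax problem.
    f  -- values f_k(x), shape (m,)
    A  -- gradients g_k(x) as columns, shape (n, m)
    G  -- (approximate) Hessian G(x;mu), shape (n, n), assumed to make W positive definite
    mu -- barrier parameter"""
    m = f.size
    F = f.max()
    # equation (113): 1 - sum_k mu/(z - f_k) = 0, root lies in [F + mu, F + m*mu]
    phi = lambda z: 1.0 - np.sum(mu / (z - f))
    z = brentq(phi, F + mu, F + m * mu)      # assumes m >= 2; for m = 1 simply z = F + mu
    u = mu / (z - f)                         # multipliers u_k(x,z), their sum is close to 1
    v = u * u / mu                           # diagonal of V(x,z) = U^2(x,z)/mu
    g = A @ u                                # g(x,z) = A(x) u(x,z)
    gamma = 1.0 - u.sum()
    W = G + (A * v) @ A.T                    # W = G + A V A^T
    c = A @ v                                # c = A V e
    delta = v.sum()                          # delta = e^T V e
    L = cho_factor(W)                        # stands in for (115) with E = 0
    p = cho_solve(L, g)                      # equations (116)
    q = cho_solve(L, c)
    dz = (c @ p + gamma) / (c @ q - delta)   # equations (117)
    dx = q * dz - p
    return dx, dz, z, u
```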
Minimization of a sum of absolute values, i.e., minimization of the function

F(x) = Σ_{k=1}^m |fk(x)| = Σ_{k=1}^m max(fk⁺(x), fk⁻(x)),      fk⁺(x) = fk(x),  fk⁻(x) = −fk(x),

is another important generalized minimax problem. In this case, a barrier function has the form

Bµ(x,z) = Σ_{k=1}^m zk − µ Σ_{k=1}^m log(zk − fk⁺(x)) − µ Σ_{k=1}^m log(zk − fk⁻(x))
        = Σ_{k=1}^m zk − µ Σ_{k=1}^m log(zk − fk(x)) − µ Σ_{k=1}^m log(zk + fk(x))
        = Σ_{k=1}^m zk − µ Σ_{k=1}^m log(zk² − fk²(x)),    (118)

where zk > |fk (x)|, 1 ≤ k ≤ m. Differentiating Bµ (x, z) with respect to x and z we obtain the necessary
conditions for an extremum

Σ_{k=1}^m 2µfk(x)/(zk² − fk²(x)) gk(x) = Σ_{k=1}^m uk(x,zk) gk(x) = 0,      uk(x,zk) = 2µfk(x)/(zk² − fk²(x))    (119)
and
1 − 2µzk/(zk² − fk²(x)) = 1 − uk(x,zk) zk/fk(x) = 0   ⇒   uk(x,zk) = fk(x)/zk,    1 ≤ k ≤ m.    (120)
Denoting A(x) = [g1 (x), . . . , gm (x)],
     
f(x) = [f1(x), ..., fm(x)]ᵀ,      z = [z1, ..., zm]ᵀ,      u(x,z) = [u1(x,z1), ..., um(x,zm)]ᵀ    (121)

and Z = diag(z1 , . . . , zm ), one can write

A(x)u(x, z) = 0, u(x, z) = Z −1 f (x). (122)

Let B̃µ (z) = Bµ (x, z) for a fixed chosen vector x ∈ Rn . The function B̃µ (z) is convex for zk > |fk (x)|,
1 ≤ k ≤ m, because it is a sum of convex functions. Thus, a stationary point of B̃µ (z) exists and it is its
global minimum. Differentiating B̃µ (z) with respect to z we obtain quadratic equations

2µzk(x;µ)/(zk²(x;µ) − fk²(x)) = 1   ⇔   zk²(x;µ) − fk²(x) = 2µzk(x;µ),    1 ≤ k ≤ m,    (123)

defining its unique stationary point, which has the solution
zk(x;µ) = µ + √(µ² + fk²(x)),    1 ≤ k ≤ m,    (124)

(the second solutions of quadratic equations (123) do not satisfy the condition zk > |fk (x)|, so the obtained
vector z does not belong to a domain of B̃µ (z)). Using (120) and (124) we obtain

uk(x;µ) = uk(x, z(x;µ)) = fk(x)/zk(x;µ) = fk(x)/(µ + √(µ² + fk²(x))),    1 ≤ k ≤ m,    (125)
and
and

B(x;µ) = B(x, z(x;µ)) = Σ_{k=1}^m zk(x;µ) − µ Σ_{k=1}^m log(zk²(x;µ) − fk²(x))
       = Σ_{k=1}^m zk(x;µ) − µ Σ_{k=1}^m log(2µ zk(x;µ))
       = Σ_{k=1}^m [ zk(x;µ) − µ log(zk(x;µ)) ] − µm log(2µ).    (126)

Theorem 10. Consider barrier function (126). Then

∇B(x; µ) = A(x)u(x; µ) (127)

and
∇2 B(x; µ) = W (x; µ) = G(x; µ) + A(x)V (x; µ)AT (x), (128)
where

G(x;µ) = Σ_{k=1}^m Gk(x) uk(x;µ),    (129)
Gk(x) are the Hessian matrices of functions fk(x), 1 ≤ k ≤ m, V(x;µ) = diag(v1(x;µ), ..., vm(x;µ)), and
vk(x;µ) = 2µ/(zk²(x;µ) + fk²(x)),    1 ≤ k ≤ m.    (130)

Proof. Differentiating (126) and using (123) and (119) we can write

∇B(x;µ) = Σ_{k=1}^m ∇zk(x;µ) − 2µ Σ_{k=1}^m (zk(x;µ)∇zk(x;µ) − fk(x)gk(x))/(zk²(x;µ) − fk²(x))
        = Σ_{k=1}^m ( 1 − 2µzk(x;µ)/(zk²(x;µ) − fk²(x)) ) ∇zk(x;µ) + Σ_{k=1}^m 2µfk(x)gk(x)/(zk²(x;µ) − fk²(x))
        = Σ_{k=1}^m uk(x;µ) gk(x) = A(x)u(x;µ).

Differentiating (123) we obtain


∇zk(x;µ)/(zk²(x;µ) − fk²(x)) − 2zk(x;µ)(zk(x;µ)∇zk(x;µ) − fk(x)gk(x))/(zk²(x;µ) − fk²(x))² = 0
for 1 ≤ k ≤ m, which after rearrangement gives
∇zk(x;µ) = 2zk(x;µ)fk(x)gk(x)/(zk²(x;µ) + fk²(x))    (131)

for 1 ≤ k ≤ m. Thus, by (125), (131), (123), and (127) it holds
∇uk(x;µ) = ∇( fk(x)/zk(x;µ) ) = (zk(x;µ)gk(x) − fk(x)∇zk(x;µ))/zk²(x;µ)
         = ( 1 − 2fk²(x)/(zk²(x;µ) + fk²(x)) ) gk(x)/zk(x;µ) = (zk²(x;µ) − fk²(x))/(zk²(x;µ) + fk²(x)) · gk(x)/zk(x;µ)
         = 2µ/(zk²(x;µ) + fk²(x)) gk(x) = vk(x;µ) gk(x).
Differentiating (127) and using the previous expression we obtain

∇²B(x;µ) = ∇( Σ_{k=1}^m uk(x;µ) gk(x) ) = Σ_{k=1}^m uk(x;µ) Gk(x) + Σ_{k=1}^m ∇uk(x;µ) gkᵀ(x)
         = Σ_{k=1}^m uk(x;µ) Gk(x) + Σ_{k=1}^m vk(x;µ) gk(x) gkᵀ(x),
which is equation (128).


A vector ∆x ∈ Rn is determined by solving the equation
∇2 B(x; µ)∆x = −g(x; µ), (132)
where g(x; µ) = ∇B(x; µ) ̸= 0. From (132) it follows
(∆x)T g(x; µ) = −(∆x)T ∇2 B(x; µ)∆x = −(∆x)T G(x; µ)∆x − (∆x)T A(x)V (x; µ)AT (x)∆x
≤ −(∆x)T G(x; µ)∆x,
so if a matrix G(x;µ) is positive definite, the matrix ∇²B(x;µ) is positive definite as well (since the diagonal
matrix V(x;µ) is positive definite by (130)) and (∆x)ᵀg(x;µ) < 0 holds (the direction vector ∆x is a descent
direction for the function B(x;µ)).
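For the sum of absolute values the barrier quantities are available in closed form, which makes the Newton direction cheap to assemble. The following Python sketch (the helper name is an assumption) evaluates zk, uk, vk from (124), (125), (130) and builds ∇B(x;µ) and ∇²B(x;µ) according to (127)-(128).

```python
import numpy as np

def l1_barrier_derivatives(f, A, G, mu):
    """Gradient and Hessian of the barrier function (126) for F(x) = sum_k |f_k(x)|.
    f  -- values f_k(x), shape (m,)
    A  -- gradients g_k(x) as columns, shape (n, m)
    G  -- the matrix G(x;mu) from (129), or its approximation, shape (n, n)
    mu -- barrier parameter"""
    z = mu + np.sqrt(mu**2 + f**2)       # minimax variables, equation (124)
    u = f / z                            # multipliers, equation (125), |u_k| <= 1
    v = 2.0 * mu / (z**2 + f**2)         # diagonal of V(x;mu), equation (130)
    grad = A @ u                         # (127): grad B = A(x) u(x;mu)
    hess = G + (A * v) @ A.T             # (128): hess B = G + A V A^T
    return grad, hess
```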
By (130), a norm of a matrix V (x; µ) is bounded from above if the numbers fk2 (x), 1 ≤ k ≤ m, are
sufficiently positive. If fk2 (x) tends to zero faster than µ, then the element vk (x; µ) may tend to infinity
and a matrix ∇2 B(x; µ) (given by (128)) may be ill-conditioned. However, if Assumption X3 is satisfied
and if 0 < µ ≤ µ ≤ µ holds, one can write

∥∇²B(x;µ)∥ = ∥G(x;µ) + A(x)V(x;µ)Aᵀ(x)∥
           ≤ ∥Σ_{k=1}^m uk(x;µ) Gk(x)∥ + ∥Σ_{k=1}^m vk(x;µ) gk(x)gkᵀ(x)∥
           ≤ mG + mg² ∥V(x;µ)∥
because |uk (x; µ)| ≤ 1, 1 ≤ k ≤ m, holds by (125). Since a matrix V (x; µ) is diagonal, we can write by
(130) that
∥V(x;µ)∥ = max_{1≤k≤m} |vk(x;µ)| = max_{1≤k≤m} 2µ/(zk²(x;µ) + fk²(x)).    (133)
Using (123) and (124) we obtain
zk²(x;µ) + fk²(x) ≥ zk²(x;µ) − fk²(x) = 2µzk(x;µ) = 2µ(µ + √(µ² + fk²(x))) ≥ 4µ²
for 1 ≤ k ≤ m, which after substitution into (133) gives ∥V(x;µ)∥ ≤ 1/(2µ). Thus, the inequality
∥∇²B(x;µ)∥ ≤ c/µ ≤ c/µ̲    (134)

where c = m(µ G + g 2 /2) is satisfied.
A slightly modified Algorithm 2 can be used for minimization of a sum of absolute values. However, the
problems of this type are characterized by ill-conditioning of the matrix ∇²B(x;µ). Thus, it is more convenient
to use trust region methods [22]. In this case, a direction vector ∆x is determined by approximate
minimization of a quadratic function
Q(∆x) = ½ (∆x)ᵀ∇²B(x;µ)∆x + gᵀ(x;µ)∆x
on the set ∥∆x∥ ≤ ∆, where ∆ is a trust region radius. A direction vector ∆x serves for determination of
a new approximation of the solution x+ . Denoting

ρ(∆x) = (B(x + ∆x;µ) − B(x;µ)) / Q(∆x),

we set x⁺ = x if ρ(∆x) < ρ or x⁺ = x + ∆x if ρ(∆x) ≥ ρ. The trust region radius is updated so that
β1∥∆x∥ ≤ ∆⁺ ≤ β2∥∆x∥ if ρ(∆x) < ρ or ∆⁺ ≥ ∆ if ρ(∆x) ≥ ρ, where 0 < β1 ≤ β2 < 1. More details can be found
in [5] and [22].
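A minimal sketch of this acceptance test and radius update follows; the numerical values ρ = 0.1, β1 = 0.25, β2 = 0.75 and the growth factor are illustrative assumptions, not values prescribed by the report.

```python
import numpy as np

def trust_region_update(x, dx, B, Q, delta, rho_min=0.1, beta1=0.25, beta2=0.75):
    """One acceptance test of a trust region step for the barrier function B.
    B  -- callable returning B(x; mu) for the current fixed mu
    Q  -- value of the quadratic model Q(dx) (negative for a useful step)
    dx -- approximate minimizer of Q on the ball ||dx|| <= delta"""
    rho = (B(x + dx) - B(x)) / Q                 # actual versus predicted decrease
    if rho < rho_min:
        x_new = x                                # reject the step
        delta_new = 0.5 * (beta1 + beta2) * np.linalg.norm(dx)   # shrink the radius
    else:
        x_new = x + dx                           # accept the step
        delta_new = max(delta, 2.0 * np.linalg.norm(dx))         # allow the radius to grow
    return x_new, delta_new
```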

4 Smoothing methods
4.1 Basic properties
Similarly as in Section 2.1 we will restrict ourselves to sums of maxima, where a function h : Rn → Rm
is a sum of its arguments, so (4) holds. Smoothing methods for minimization of sums of maxima replace
function (4) by a smoothing function


S(x;µ) = Σ_{k=1}^m Sk(x;µ),    (135)

where
Sk(x;µ) = µ log Σ_{l=1}^{mk} exp(fkl(x)/µ) = Fk(x) + µ log Σ_{l=1}^{mk} exp((fkl(x) − Fk(x))/µ),    (136)

depending on a smoothing parameter 0 < µ ≤ µ, which is successively minimized on Rn with µ → 0.


Since fkl(x) ≤ Fk(x), 1 ≤ l ≤ mk, and equality arises for at least one index, at least one exponential
function on the right-hand side of (136) has the value 1, so the logarithm is nonnegative. Thus, it holds


Fk(x) ≤ Sk(x;µ) ≤ Fk(x) + µ log mk,   1 ≤ k ≤ m   ⇒   F(x) ≤ S(x;µ) ≤ F(x) + µ Σ_{k=1}^m log mk,    (137)

so S(x; µ) → F (x) if µ → 0.

Remark 24. Similarly as in Section 3.2 we will denote gkl (x) and Gkl (x) the gradients and Hessian
matrices of functions fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and
   
uk(x;µ) = [uk1(x;µ), ..., ukmk(x;µ)]ᵀ,      ẽk = [1, ..., 1]ᵀ,
where
ukl(x;µ) = exp(fkl(x)/µ) / Σ_{l=1}^{mk} exp(fkl(x)/µ) = exp((fkl(x) − Fk(x))/µ) / Σ_{l=1}^{mk} exp((fkl(x) − Fk(x))/µ).    (138)

Thus, it holds ukl (x; µ) ≥ 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and

ẽkᵀ uk(x;µ) = Σ_{l=1}^{mk} ukl(x;µ) = 1.    (139)

Further, we denote Ak (x) = JkT (x) = [gk1 (x), . . . , gkmk (x)] and Uk (x; µ) = diag(uk1 (x; µ), . . . , ukmk (x; µ))
for 1 ≤ k ≤ m.
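The second forms in (136) and (138) are the numerically stable way to evaluate the smoothing function and its multipliers, since all exponents are nonpositive. A minimal sketch (the helper name is an assumption):

```python
import numpy as np

def smoothing_values(fk, mu):
    """S_k(x;mu) and weights u_kl(x;mu) for one group of functions.
    fk -- values f_kl(x), shape (m_k,);  mu -- smoothing parameter."""
    Fk = fk.max()
    e = np.exp((fk - Fk) / mu)        # all exponents <= 0, no overflow; equations (136) and (138)
    Sk = Fk + mu * np.log(e.sum())    # satisfies F_k(x) <= S_k <= F_k(x) + mu*log(m_k), see (137)
    uk = e / e.sum()                  # u_kl >= 0, sum equals 1, equation (139)
    return Sk, uk
```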
Theorem 11. Consider smoothing function (135). Then
∇S(x; µ) = g(x; µ) (140)
and
∇²S(x;µ) = G(x;µ) + (1/µ) Σ_{k=1}^m Ak(x)Uk(x;µ)Akᵀ(x) − (1/µ) Σ_{k=1}^m Ak(x)uk(x;µ)(Ak(x)uk(x;µ))ᵀ
         = G(x;µ) + (1/µ) A(x)U(x;µ)Aᵀ(x) − (1/µ) C(x;µ)Cᵀ(x;µ),    (141)
where g(x;µ) = Σ_{k=1}^m Ak(x)uk(x;µ) = A(x)u(x;µ) and
G(x;µ) = Σ_{k=1}^m Σ_{l=1}^{mk} Gkl(x)ukl(x;µ),      A(x) = [A1(x), ..., Am(x)],
U(x;µ) = diag(U1(x;µ), ..., Um(x;µ)),      C(x;µ) = [A1(x)u1(x;µ), ..., Am(x)um(x;µ)].
Proof. Obviously,
∇S(x;µ) = Σ_{k=1}^m ∇Sk(x;µ),      ∇²S(x;µ) = Σ_{k=1}^m ∇²Sk(x;µ).
Differentiating functions (136) and using (138) we obtain

∇Sk(x;µ) = µ / Σ_{l=1}^{mk} exp(fkl(x)/µ) · Σ_{l=1}^{mk} (1/µ) exp(fkl(x)/µ) gkl(x)
         = Σ_{l=1}^{mk} gkl(x) ukl(x;µ) = Ak(x) uk(x;µ).    (142)

Adding up these expressions yields (140). Further, it holds

∇ukl(x;µ) = (1/µ) exp(fkl(x)/µ) gkl(x) / Σ_{l=1}^{mk} exp(fkl(x)/µ) − exp(fkl(x)/µ) / ( Σ_{l=1}^{mk} exp(fkl(x)/µ) )² · (1/µ) Σ_{l=1}^{mk} exp(fkl(x)/µ) gkl(x)
          = (1/µ) ukl(x;µ) gkl(x) − (1/µ) ukl(x;µ) Σ_{l=1}^{mk} ukl(x;µ) gkl(x).    (143)

Differentiating (142) and using (143) we obtain



∇²Sk(x;µ) = Σ_{l=1}^{mk} Gkl(x) ukl(x;µ) + Σ_{l=1}^{mk} gkl(x) ∇uklᵀ(x;µ)
          = Gk(x;µ) + (1/µ) Σ_{l=1}^{mk} gkl(x) ukl(x;µ) gklᵀ(x) − (1/µ) ( Σ_{l=1}^{mk} gkl(x) ukl(x;µ) ) ( Σ_{l=1}^{mk} gkl(x) ukl(x;µ) )ᵀ
          = Gk(x;µ) + (1/µ) Ak(x) Uk(x;µ) Akᵀ(x) − (1/µ) Ak(x) uk(x;µ) (Ak(x) uk(x;µ))ᵀ,
where Gk(x;µ) = Σ_{l=1}^{mk} Gkl(x) ukl(x;µ). Adding up these expressions yields (141).

Remark 25. Note that using (141) and the Schwarz inequality we obtain
vᵀ∇²S(x;µ)v = vᵀG(x;µ)v + (1/µ) Σ_{k=1}^m ( vᵀAk(x)Uk(x;µ)Akᵀ(x)v − (vᵀAk(x)Uk(x;µ)ẽk)² / (ẽkᵀUk(x;µ)ẽk) )
            ≥ vᵀG(x;µ)v

because ẽTk Uk (x; µ)ẽk = ẽTk uk (x; µ) = 1, so the Hessian matrix ∇2 S(x; µ) is positive definite if the matrix
G(x; µ) is positive definite.

Using Theorem 11, a step of the Newton method can be written in the form x+ = x + α∆x where

∇2 S(x; µ)∆x = −∇S(x; µ),

or
( W(x;µ) − (1/µ) C(x;µ)Cᵀ(x;µ) ) ∆x = −g(x;µ),    (144)
where
W(x;µ) = G(x;µ) + (1/µ) A(x)U(x;µ)Aᵀ(x),      g(x;µ) = A(x)u(x;µ).    (145)
A matrix W in (145) has the same structure as a matrix W in (93) and, by Theorem 11, smoothing
function (135) has similar properties as barrier function (91). Thus, one can use an algorithm that is
analogous to Algorithm 2 and considerations stated in Remark 21, where S(x; µ) and ∇2 S(x; µ) are used
instead of B(x; µ) and ∇2 B(x; µ). It means that

S(xi+1 ; µi ) − S(xi ; µi ) ≤ −c∥g(xi ; µi )∥2 ∀i ∈ N (146)

if Assumption X3 is satisfied and

S(xi+1 ; µi ) − S(xi ; µi ) ≤ 0 ∀i ∈ N (147)

in remaining cases.

Algorithm 3. Smoothing method


Data. A tolerance for the gradient norm of the smoothing function ε > 0. Bounds for a smoothing
parameter 0 < µ < µ. Coefficients for decrease of a smoothing parameter 0 < λ < 1, σ > 1 (or
0 < ϑ < 1). A tolerance for a uniform descent ε0 > 0. A tolerance for a steplength selection ε1 > 0.
A maximum steplength ∆ > 0.
Input. A sparsity pattern of the matrix A(x) = [A1 (x), . . . , Am (x)]. A starting point x ∈ Rn .

Step 1 Initiation. Choose µ ≤ µ. Determine a sparse structure of the matrix W = W (x; µ) from
the sparse structure of the matrix A(x) and perform a symbolic decomposition of the matrix W
(described in [2, Section 1.7.4]). Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , values Fk (x) =
max1≤l≤mk fkl (x), 1 ≤ k ≤ m, and the value of objective function (4). Set r = 0 (restart indicator).

Step 2 Termination. Determine a vector of smoothing multipliers u(x; µ) by (138). Determine a matrix
A = A(x) and a vector g = g(x; µ) = A(x)u(x; µ). If µ ≤ µ and ∥g∥ ≤ ε, terminate the computation.
Step 3 Hessian matrix approximation. Set G = G(x; µ) or compute an approximation G of the Hes-
sian matrix G(x; µ) using gradient differences or using quasi-Newton updates (Remark 22).
Step 4 Direction determination. Determine a matrix W by (145) and a vector ∆x by (144) using the
Gill-Murray decomposition of a matrix W .

Step 5 Restart. If r = 0 and (99) does not hold (where s = ∆x), set G = I, r = 1 and go to Step 4. If
r = 1 and (99) does not hold, set ∆x = −g. Set r = 0.
Step 6 Steplength selection. Determine a steplength α > 0 satisfying inequalities (100) (for a smooth-
ing function S(x; µ)) and α ≤ ∆/∥∆x∥. Set x := x + α∆x. Compute values fkl (x), 1 ≤ k ≤ m,
1 ≤ l ≤ mk , values Fk (x) = max1≤l≤mk fkl (x), 1 ≤ k ≤ m, and the value of the objective function
(4).
Step 7 Smoothing parameter update. Determine a new value of the smoothing parameter µ ≥ µ
using Procedure A or Procedure B. Go to Step 2.

Algorithm 3 differs from Algorithm 2 in that a nonlinear equation ẽT u(x; µ) = 1 need not be solved
in Step 2 (because (139) follows from (138)), equations (144)–(145) instead of (116)–(117) are used in
Step 4, and a barrier function B(x; µ) is replaced with a smoothing function S(x; µ) in Step 6. Note
that the parameter µ in (135) has a different meaning than the same parameter in (91), so we could use
another procedure for its update in Step 7. However, it turns out that using Procedure A or
Procedure B is very efficient. On the other hand, it must be noted that using exponential functions in
Algorithm 3 has certain disadvantages. Computation of the values of exponential functions is more time
consuming than performing standard arithmetic operations and underflow may also happen (i.e. replacing
nonzero values by zero values) if the value of a parameter µ is very small.
The values ε = 10−6 , µ = 10−6 , µ = 1, λ = 0.85, σ = 100, ϑ = 0.1, ε0 = 10−8 , ε1 = 10−4 , and
∆ = 1000 were used in our numerical experiments.

4.2 Global convergence


Now we prove the global convergence of the smoothing method realized by Algorithm 3.
Lemma 7. Choose a fixed vector x ∈ Rn . Then functions Sk (x; µ) : (0, ∞) → R, 1 ≤ k ≤ m, are
nondecreasing convex functions of µ > 0 and
0 ≤ log m̂k ≤ ∂Sk(x;µ)/∂µ ≤ log mk,    (148)
where m̂k is the number of active functions (those with fkl(x) = Fk(x)) and
∂Sk(x;µ)/∂µ = log Σ_{l=1}^{mk} exp((fkl(x) − Fk(x))/µ) − Σ_{l=1}^{mk} ((fkl(x) − Fk(x))/µ) ukl(x;µ).    (149)

Proof. Denoting φkl(x;µ) = (fkl(x) − Fk(x))/µ ≤ 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk, so that
φ′kl(x;µ) = ∂φkl(x;µ)/∂µ = −φkl(x;µ)/µ ≥ 0,
we can write by (136) that

Sk(x;µ) = Fk(x) + µ log Σ_{l=1}^{mk} exp φkl(x;µ)

and

∂Sk(x;µ)/∂µ = log Σ_{l=1}^{mk} exp φkl(x;µ) + µ Σ_{l=1}^{mk} φ′kl(x;µ) exp φkl(x;µ) / Σ_{l=1}^{mk} exp φkl(x;µ)
            = log Σ_{l=1}^{mk} exp φkl(x;µ) − Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) ≥ 0    (150)

because φkl (x; µ) ≤ 0, ukl (x; µ) ≥ 0, 1 ≤ k ≤ m, and φkl (x; µ) = 0 holds for at least one index. Thus,
functions Sk (x; µ), 1 ≤ k ≤ m, are nondecreasing. Differentiating (138) with respect to µ we obtain
∂ukl(x;µ)/∂µ = −(1/µ) φkl(x;µ) exp φkl(x;µ) / Σ_{l=1}^{mk} exp φkl(x;µ) + (1/µ) ukl(x;µ) Σ_{l=1}^{mk} φkl(x;µ) exp φkl(x;µ) / Σ_{l=1}^{mk} exp φkl(x;µ)
             = −(1/µ) φkl(x;µ) ukl(x;µ) + (1/µ) ukl(x;µ) Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ).    (151)

Differentiating (150) with respect to µ and using equations (139) and (151) we can write
∂²Sk(x;µ)/∂µ² = −(1/µ) Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) + (1/µ) Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) − Σ_{l=1}^{mk} φkl(x;µ) ∂ukl(x;µ)/∂µ
              = − Σ_{l=1}^{mk} φkl(x;µ) ∂ukl(x;µ)/∂µ
              = (1/µ) [ ( Σ_{l=1}^{mk} φ²kl(x;µ) ukl(x;µ) ) ( Σ_{l=1}^{mk} ukl(x;µ) ) − ( Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) )² ] ≥ 0
because
( Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) )² = ( Σ_{l=1}^{mk} φkl(x;µ) √ukl(x;µ) √ukl(x;µ) )² ≤ Σ_{l=1}^{mk} φ²kl(x;µ) ukl(x;µ) Σ_{l=1}^{mk} ukl(x;µ)

holds by the Schwarz inequality. Thus, functions Sk (x; µ), 1 ≤ k ≤ m, are convex, so their derivatives
∂Sk (x; µ)/∂µ are nondecreasing. Obviously, it holds
lim_{µ→0} ∂Sk(x;µ)/∂µ = lim_{µ→0} log Σ_{l=1}^{mk} exp φkl(x;µ) − lim_{µ→0} Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ)
                      = log m̂k − lim_{µ→0} (1/m̂k) Σ_{l=1}^{mk} φkl(x;µ) exp φkl(x;µ) = log m̂k
mk µ→0
l=1

because φkl (x; µ) = 0 if fkl (x) = Fk (x) and limµ→0 φkl (x; µ) = −∞, limµ→0 φkl (x; µ) exp φkl (x; µ) = 0 if
fkl (x) < Fk (x). Similarly, it holds
lim_{µ→∞} ∂Sk(x;µ)/∂µ = lim_{µ→∞} log Σ_{l=1}^{mk} exp φkl(x;µ) − lim_{µ→∞} Σ_{l=1}^{mk} φkl(x;µ) ukl(x;µ) = log mk

because limµ→∞ φkl (x; µ) = 0 and limµ→∞ |ukl (x; µ)| ≤ 1 for 1 ≤ k ≤ m.
Lemma 8. Let Assumptions X1b and X3 be satisfied. Then the values µi , i ∈ N , generated by Algorithm 3,
create a nonincreasing sequence such that µi → 0.
Proof. Lemma 8 is a direct consequence of Lemma 6 because the same procedures for an update of a
parameter µ are used and (146) holds.
Theorem 12. Let the assumptions of Lemma 8 be satisfied. Consider a sequence xi, i ∈ N, generated by
Algorithm 3, where ε = µ = 0. Then
lim_{i→∞} Σ_{k=1}^m Σ_{l=1}^{mk} ukl(xi;µi) gkl(xi) = 0,      Σ_{l=1}^{mk} ukl(xi;µi) = 1
and
Fk(xi) − fkl(xi) ≥ 0,      ukl(xi;µi) ≥ 0,      lim_{i→∞} ukl(xi;µi)(Fk(xi) − fkl(xi)) = 0
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk.

Proof.
(a) Equations ẽTk uk (xi ; µi ) = 1 for 1 ≤ k ≤ m follow from (139). Inequalities Fk (xi ) − fkl (xi ) ≥ 0 and
ukl (xi ; µi ) ≥ 0 for 1 ≤ k ≤ m and 1 ≤ l ≤ mk follow from (4) and (138).
(b) Since Sk (x; µ) are nondecreasing functions of the parameter µ by Lemma 7 and (146) holds, we can
write

F ≤ Σ_{k=1}^m Fk(xi+1) ≤ S(xi+1;µi+1) ≤ S(xi+1;µi) ≤ S(xi;µi) − c∥g(xi;µi)∥² ≤ S(x1;µ1) − c Σ_{j=1}^{i} ∥g(xj;µj)∥²,
where F = Σ_{k=1}^m Fk and Fk, 1 ≤ k ≤ m, are lower bounds from Assumption X1b. Thus, it holds
F ≤ lim_{i→∞} S(xi+1;µi+1) ≤ S(x1;µ1) − c Σ_{i=1}^{∞} ∥g(xi;µi)∥²,
or
Σ_{i=1}^{∞} ∥g(xi;µi)∥² ≤ (S(x1;µ1) − F)/c,
so ∥g(xi ; µi )∥ → 0, which together with inequalities 0 ≤ ukl (xi ; µi ) ≤ 1, 1 ≤ k ≤ m, 1 ≤ l ≤ mk ,
gives limi→∞ ukl (xi ; µi )gkl (xi ) = 0.
(c) Let indices 1 ≤ k ≤ m and 1 ≤ l ≤ mk be chosen arbitrarily. Using (138) we get
0 ≤ ukl(xi;µi)(Fk(xi) − fkl(xi)) = −µi φkl(xi;µi) exp φkl(xi;µi) / Σ_{l=1}^{mk} exp φkl(xi;µi)
  ≤ −µi φkl(xi;µi) exp φkl(xi;µi) ≤ µi/e,
where φkl(xi;µi), 1 ≤ k ≤ m, 1 ≤ l ≤ mk, are the functions used in the proof of Lemma 7, because
Σ_{l=1}^{mk} exp φkl(xi;µi) ≥ 1

and the function t exp t attains its minimal value −1/e at the point t = −1. Since µi → 0, we obtain
ukl (xi ; µi )(Fk (xi ) − fkl (xi )) → 0.

Corollary 2. Let the assumptions of Theorem 12 be satisfied. Then every cluster point x ∈ Rn of a
sequence xi , i ∈ N , satisfies the necessary KKT conditions (5)–(6), where u (with elements uk , 1 ≤ k ≤ m)
is a cluster point of a sequence u(xi ; µi ), i ∈ N .
Now we suppose that the values ε and µ are nonzero and show how precise the solution of the system
of KKT equations is after Algorithm 3 terminates.
Theorem 13. Let the assumptions of Theorem 8 be satisfied and let xi , i ∈ N , be a sequence generated
by Algorithm 3. Then, if the values ε > 0 and µ > 0 are chosen arbitrarily, there exists an index i ≥ 1
such that
∥g(xi ; µi )∥ ≤ ε, ẽTk uk (xi ; µi ) = 1, 1 ≤ k ≤ m,
and
Fk(xi) − fkl(xi) ≥ 0,      ukl(xi;µi) ≥ 0,      ukl(xi;µi)(Fk(xi) − fkl(xi)) ≤ µ/e
for all 1 ≤ k ≤ m and 1 ≤ l ≤ mk .

Proof. Equalities ẽTk uk (xi ; µi ) = 1, 1 ≤ k ≤ m, follow from (139). Inequalities Fk (xi ) − fkl (xi ) ≥ 0 and
ukl (xi ; µi ) ≥ 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk , follow from (10) and (138). Since µi → 0 holds by Lemma 8 and
∥g(xi ; µi )∥ → 0 holds by Theorem 12, there exists an index i ≥ 1 such that µi ≤ µ and ∥g(xi ; µi )∥ ≤ ε.
By (138), as in the proof of Theorem 12, one can write
ukl(xi;µi)(Fk(xi) − fkl(xi)) ≤ −µi φkl(xi;µi) exp φkl(xi;µi) ≤ µi/e ≤ µ/e
for 1 ≤ k ≤ m and 1 ≤ l ≤ mk .

4.3 Special cases


Both the simplest and most widely considered generalized minimax problem is the classical minimax
problem (10), when m = 1 in (4) (in this case we write m and z instead of m1 and z1 ). For solving a
classical minimax problem one can use Algorithm 3, where a major part of computation is very simplified.
A step of the Newton method can be written in the form x+ = x + α∆x where

∇2 S(x; µ)∆x = −∇S(x; µ),

or
( W(x;µ) − (1/µ) g(x;µ)gᵀ(x;µ) ) ∆x = −g(x;µ),    (152)
where
W(x;µ) = G(x;µ) + (1/µ) A(x)U(x;µ)Aᵀ(x),      g(x;µ) = A(x)u(x;µ).    (153)
Since
( W − (1/µ) ggᵀ )⁻¹ = W⁻¹ + W⁻¹ggᵀW⁻¹ / (µ − gᵀW⁻¹g)
holds by the Sherman-Morrison formula, the solution of system of equations (152) can be written in the
form
∆x = µ/(gᵀW⁻¹g − µ) W⁻¹g.    (154)
If a matrix W is not positive definite, it may be replaced with a matrix LLT = W + E obtained by the
Gill-Murray decomposition described in [11]. Then, we solve an equation

LLT p = g (155)

and set
∆x = µ/(gᵀp − µ) p.    (156)
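A compact sketch of this Newton step for the smoothed classical minimax problem follows (the helper name is an assumption); a plain Cholesky factorization stands in for the Gill-Murray decomposition, so W is assumed positive definite.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def smoothing_newton_step(f, A, G, mu):
    """Newton direction (152)-(156) for S(x;mu) with a single group of maxima.
    f -- values f_l(x), shape (m,);  A -- gradients as columns (n, m);
    G -- the matrix G(x;mu) or its approximation."""
    e = np.exp((f - f.max()) / mu)
    u = e / e.sum()                          # smoothing multipliers, equation (138)
    g = A @ u                                # g(x;mu) = A(x) u(x;mu)
    W = G + (A * u) @ A.T / mu               # W = G + (1/mu) A U A^T, equation (153)
    p = cho_solve(cho_factor(W), g)          # equation (155) with E = 0
    dx = mu / (g @ p - mu) * p               # equation (156)
    return dx
```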
Minimization of a sum of absolute values, i.e., minimization of the function

F(x) = Σ_{k=1}^m |fk(x)| = Σ_{k=1}^m max(fk⁺(x), fk⁻(x)),      fk⁺(x) = fk(x),  fk⁻(x) = −fk(x),
k=1 k=1

is another important generalized minimax problem. In this case, a smoothing function has the form
S(x;µ) = F(x) + µ Σ_{k=1}^m log( exp(−(|fk(x)| − fk⁺(x))/µ) + exp(−(|fk(x)| − fk⁻(x))/µ) )
       = Σ_{k=1}^m |fk(x)| + µ Σ_{k=1}^m log( 1 + exp(−2|fk(x)|/µ) )

because fk+ (x) = |fk (x)| if fk (x) ≥ 0 and fk− (x) = |fk (x)| if fk (x) ≤ 0, and by Theorem 11 we have

∇S(x;µ) = Σ_{k=1}^m (gk⁺ uk⁺ + gk⁻ uk⁻) = Σ_{k=1}^m gk (uk⁺ − uk⁻) = Σ_{k=1}^m gk uk = g(x;µ),

∇²S(x;µ) = Σ_{k=1}^m Gk (uk⁺ − uk⁻) + (1/µ) Σ_{k=1}^m gk gkᵀ (uk⁺ + uk⁻) − (1/µ) Σ_{k=1}^m gk gkᵀ (uk⁺ − uk⁻)² = G(x;µ) + (1/µ) Σ_{k=1}^m gk gkᵀ (1 − uk²)

(because uk⁺ + uk⁻ = 1), where gk = gk(x) and

uk = uk⁺ − uk⁻ = ( exp(−(|fk(x)| − fk⁺(x))/µ) − exp(−(|fk(x)| − fk⁻(x))/µ) ) / ( exp(−(|fk(x)| − fk⁺(x))/µ) + exp(−(|fk(x)| − fk⁻(x))/µ) )
               = ( 1 − exp(−2|fk(x)|/µ) ) / ( 1 + exp(−2|fk(x)|/µ) ) sign(fk(x)),

1 − uk² = 4 exp(−2|fk(x)|/µ) / ( 1 + exp(−2|fk(x)|/µ) )²,

and where sign(fk(x)) is the sign of the function fk(x).
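These expressions can be evaluated without overflow because the exponent −2|fk(x)|/µ is always nonpositive. A minimal sketch (the helper name is an assumption):

```python
import numpy as np

def l1_smoothing_terms(f, mu):
    """Smoothing multipliers u_k and weights 1 - u_k^2 for F(x) = sum_k |f_k(x)|."""
    t = np.exp(-2.0 * np.abs(f) / mu)        # lies in (0, 1], no overflow for small mu
    u = (1.0 - t) / (1.0 + t) * np.sign(f)   # u_k = u_k^+ - u_k^-
    w = 4.0 * t / (1.0 + t)**2               # 1 - u_k^2, diagonal weights in the Hessian term
    return u, w
```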

5 Primal-dual interior point methods


5.1 Basic properties
Primal interior point methods for solving nonlinear programming problems profit from the simplicity of
obtaining and keeping a point in the interior of the feasible set (for generalized minimax problems, it
suffices to set zk > Fk (x), 1 ≤ k ≤ m). Minimization of a barrier function without constraints and a direct
computation of multipliers ukl , 1 ≤ k ≤ m, 1 ≤ l ≤ mk , are basic features of these methods. Primal-
dual interior point methods are intended for solving general nonlinear programming problems, where it
is usually impossible to assure validity of constraints. These methods guarantee feasibility of points by
adding slack variables, which appear in a barrier term added to the objective function. Positivity of the
slack variables is assured algorithmically (by a steplength selection). Minimization of a barrier function
with equality constraints and an iterative computation of the Lagrange multipliers (dual variables) are the
main features of primal-dual interior point methods.
Consider function (4). As is mentioned in the introduction, minimization of this function is equivalent
to the nonlinear programming problem

minimize Σ_{k=1}^m zk   subject to   fkl(x) ≤ zk,   1 ≤ k ≤ m,  1 ≤ l ≤ mk.    (157)

Using slack variables skl > 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and a barrier function



Bµ(x,z,s) = Σ_{k=1}^m zk − µ Σ_{k=1}^m Σ_{l=1}^{mk} log(skl),    (158)

a solving of problem (157) can be transformed to a successive solving of problems

minimize Bµ (x, z, s) subject to fkl (x) + skl − zk = 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk , (159)

where µ → 0. Necessary conditions for an extremum of problem (159) have the form

m ∑
mk
g(x, u) = gkl (x)ukl = 0,
k=1 l=1

38

mk
1− ukl = 0, 1 ≤ k ≤ m,
l=1

ukl skl − µ = 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk ,
fkl (x) + skl − zk = 0, 1 ≤ k ≤ m, 1 ≤ l ≤ mk ,
which is n+m+2m̄ equations for n+m+2m̄ unknowns (vectors x, z = [zk ], s = [skl ], u = [ukl ], 1 ≤ k ≤ m,
1 ≤ l ≤ mk ), where m̄ = m1 + · · · + mm . Denote A(x) = [A1 (x), . . . , Am (x)], f = [fkl ], S = diag(skl ),
U = diag(ukl ), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and
     
    [ ẽ1   0  ...  0  ]         [ ẽ1 ]         [ z1 ]
E = [ 0   ẽ2  ...  0  ],    ẽ = [ ẽ2 ],    z = [ z2 ]
    [ ..  ..  ...  .. ]         [ .. ]         [ .. ]
    [ 0    0  ...  ẽm ]         [ ẽm ]         [ zm ]

(matrices Ak (x), vectors ẽk , and numbers zk , 1 ≤ k ≤ m, are defined in Section 3.2). Applying the
Newton method to this system of nonlinear equations, we obtain a system of linear equations for increments
(direction vectors) ∆x, ∆z, ∆s, ∆u. After arrangement and elimination

∆s = −U −1 S(u + ∆u) + µS −1 ẽ, (160)

this system has the form


    
[ G(x,u)    0     A(x)  ] [ ∆x ]       [ g(x,u)            ]
[ 0         0     −Eᵀ   ] [ ∆z ]  = −  [ ẽ − Eᵀu           ],    (161)
[ Aᵀ(x)    −E    −U⁻¹S  ] [ ∆u ]       [ f(x) − Ez + µU⁻¹ẽ ]
where G(x,u) = Σ_{k=1}^m Σ_{l=1}^{mk} Gkl(x) ukl. The vector ẽ in the equation ẽ − Eᵀu = 0 has unit elements, but its
dimension is different from the dimension of the vector ẽ in (160).
problem (substituting z = Fk (x) = max1≤l≤mk fkl (x) we would obtain a nonsmooth problem whose
solution is much more difficult). Therefore, we need to deal with a general nonlinear programming problem.
To simplify subsequent considerations, we use the notation
x̃ = [x; z],    g̃(x̃,u) = [g(x,u); ẽ − Eᵀu],    G̃(x̃,u) = [G(x,u) 0; 0 0],    Ã(x̃) = [A(x); −Eᵀ]    (162)

and write (161) in the form

[ G̃(x̃,u)    Ã(x̃)  ] [ ∆x̃ ]       [ g̃(x̃,u)        ]
[ Ãᵀ(x̃)    −U⁻¹S  ] [ ∆u  ]  = −  [ c(x̃) + µU⁻¹ẽ ],    (163)

where c(x̃) = f (x) − Ez. This system of equations is more advantageous against systems (94) and (144) in
that its matrix does not depend on the barrier parameter µ, so it is not necessary to use a lower bound µ.
On the other hand, system (163) has a dimension n+m+ m̄, while systems (94) and (144) have dimensions
n. It would be possible to eliminate the vector ∆u, so the resulting system

(G̃(x̃, u) + Ã(x̃)M −1 ÃT (x̃))∆x̃ = −(g̃(x̃, u) + Ã(x̃)(M −1 c(x̃) + µS −1 ẽ)), (164)

where M = U −1 S, would have dimension n + m (i.e., n + 1 for classical minimax problems). Nevertheless,
as follows from the equation ukl skl = µ, either ukl → 0 or skl → 0 if µ → 0, so some elements of a matrix
M −1 may tend to infinity, which increases the condition number of system (164). Conversely, the solution
of equation (163) is easier if the elements of a matrix M are small (if M = 0, we obtain the saddle point
system, which can be solved by efficient iterative methods [1], [29]). Therefore, it is advantageous to split
the constraints to active with skl ≤ ε̃ukl (we denote active quantities by ĉ(x̃), Â(x̃), ŝ, ∆ŝ, Ŝ, û, ∆û, Û ,

M̂ = Û −1 Ŝ) and inactive with skl > ε̃ukl (we denote inactive quantities by č(x̃), Ǎ(x̃), š, ∆š, Š, ǔ, ∆ǔ,
Ǔ , M̌ = Ǔ −1 Š). Eliminating inactive equations from (163) we obtain
∆ǔ = M̌ −1 (č(x̃) + Ǎ(x̃)T ∆x̃) + µŠ −1 e (165)
and
[ Ĝ(x̃,u)    Â(x̃) ] [ ∆x̃ ]       [ ĝ(x̃,u)        ]
[ Âᵀ(x̃)    −M̂    ] [ ∆û  ]  = −  [ ĉ(x̃) + µÛ⁻¹ẽ ],    (166)
where
Ĝ(x̃,u) = G̃(x̃,u) + Ǎ(x̃)M̌⁻¹Ǎᵀ(x̃),
ĝ(x̃,u) = g̃(x̃,u) + Ǎ(x̃)(M̌⁻¹č(x̃) + µŠ⁻¹ẽ),
and M̂ = Û −1 Ŝ is a diagonal matrix of order m̂, where 0 ≤ m̂ ≤ m̄ is the number of active constraints.
Substituting (165) into (160) we can write
∆ŝ = −M̂ (û + ∆û) + µÛ −1 ẽ, ∆š = −(č + ǍT ∆x̃ + š). (167)
The matrix of the linear system (166) is symmetric, but indefinite, so its Choleski decomposition cannot be
determined. In this case, we use either dense [3] or sparse [7] Bunch-Parlett decomposition for solving this
system. System (166) (especially if it is large and sparse) can be efficiently solved by iterative conjugate
gradient method with indefinite preconditioner [20]. If the vectors ∆x̃ and ∆û are solutions of system
(166), we determine vector ∆ǔ by (165) and vectors ∆ŝ, ∆š by (167).
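The following sketch assembles the reduced system (166) for the active part and recovers the remaining increments from (165) and (167). It only illustrates the data flow: a dense general solver stands in for the Bunch-Parlett or preconditioned conjugate gradient solvers mentioned above, and all names are assumptions.

```python
import numpy as np

def primal_dual_step(G_t, A_t, g_t, c, s, u, mu, eps_act=0.1):
    """One direction computation of the primal-dual interior point method.
    G_t, A_t, g_t -- the blocks Gtilde, Atilde, gtilde from (162)
    c             -- constraint values c(xtilde) = f(x) - Ez
    s, u          -- slack variables and multipliers (all positive)"""
    nt = G_t.shape[0]
    act = s <= eps_act * u                     # split into active and inactive constraints
    Ah, Ac = A_t[:, act], A_t[:, ~act]
    Mh = s[act] / u[act]                       # active diagonal of M = U^-1 S
    Mc = s[~act] / u[~act]                     # inactive diagonal of M
    Gh = G_t + (Ac / Mc) @ Ac.T                # eliminate inactive constraints
    gh = g_t + Ac @ (c[~act] / Mc + mu / s[~act])
    K = np.block([[Gh, Ah], [Ah.T, -np.diag(Mh)]])     # matrix of system (166)
    rhs = -np.concatenate([gh, c[act] + mu / u[act]])
    sol = np.linalg.solve(K, rhs)              # stand-in for Bunch-Parlett / CG
    dxt, du_act = sol[:nt], sol[nt:]
    du = np.empty_like(u)
    du[act] = du_act
    du[~act] = (c[~act] + Ac.T @ dxt) / Mc + mu / s[~act]       # equation (165)
    ds = np.empty_like(s)
    ds[act] = -Mh * (u[act] + du_act) + mu / u[act]             # equation (167)
    ds[~act] = -(c[~act] + Ac.T @ dxt + s[~act])
    return dxt, ds, du
```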
Having vectors ∆x̃, ∆s, ∆u, we need to determine a steplength α > 0 and set
x̃+ = x̃ + α∆x̃, s+ = s(α), u+ = u(α), (168)
where s(α) and u(α) are vector functions such that s(α) > 0, s′ (0) = ∆s and u(α) > 0, u′ (0) = ∆u. This
step is not trivial, because we need to decrease both the value of the barrier function B̃µ (x̃, s) = Bµ (x, z, s)
and the norm of constraints ∥c(x̃)∥, and also to assure positivity of vectors s and u. We can do this in
several different ways: using either the augmented Lagrange function [20], [21] or a bi-criterial filter [10],
[37] or a special algorithm [12], [18]. In this section, we confine our attention to the augmented Lagrange
function which has (for problem (157)) the form
P(α) = B̃µ(x̃ + α∆x̃, s(α)) + (u + ∆u)ᵀ(c(x̃ + α∆x̃) + s(α)) + (σ/2) ∥c(x̃ + α∆x̃) + s(α)∥²,    (169)
where σ ≥ 0 is a penalty parameter. The following theorem, whose proof is given in [20], holds.
Theorem 14. Let s > 0, u > 0 and let vectors ∆x̃, ∆û be solutions of the linear system
[ Ĝ(x̃,u)    Â(x̃) ] [ ∆x̃ ]     [ ĝ(x̃,u)        ]     [ r ]
[ Âᵀ(x̃)    −M̂    ] [ ∆û  ]  +  [ ĉ(x̃) + µÛ⁻¹ẽ ]  =  [ r̂ ],    (170)
where r and r̂ are residual vectors, and let vectors ∆ǔ and ∆s be determined by (165) and (167). Then
P ′ (0) = −(∆x̃)T G̃(x̃, u)∆x̃ − (∆s)T M −1 ∆s − σ∥c(x̃) + s∥2 + (∆x̃)T r + σ(ĉ(x̃) + ŝ)T r̂. (171)
If
σ > − ( (∆x̃)ᵀG̃(x̃,u)∆x̃ + (∆s)ᵀM⁻¹∆s ) / ∥c(x̃) + s∥²    (172)
and if system (166) is solved in such a way that
(∆x̃)T r + σ(ĉ(x̃) + ŝ)T r̂ < (∆x̃)T G̃(x̃, u)∆x̃ + (∆s)T M −1 ∆s + σ(∥c(x̃) + s∥2 ), (173)
then P ′ (0) < 0.
Inequality (173) is significant only if linear system (166) is solved iteratively and residual vectors r and
r̂ are nonzero. If these vectors are zero, then (173) follows immediately from (172). Inequality (172) serves
for determination of a penalty parameter, which should be as small as possible. If the matrix G̃(x̃, u) is
positive semidefinite, then the right-hand side of (172) is negative and we can choose σ = 0.

5.2 Implementation
The algorithm of the primal-dual interior point method consists of four basic parts: determination of the
matrix G(x, u) or its approximation, solving linear system (166), a steplength selection, and an update of
the barrier parameter µ. The matrix G(x, u) has form (74), so its approximation can be determined in the
one of the ways introduced in Remark 22.
The linear system (166), obtained by determination and subsequent elimination of inactive constraints
in the way described in the previous subsection, is solved either directly using the Bunch-Parlett decom-
position or iteratively by the conjugate gradient method with the indefinite preconditioner
C = [ D̂         Â(x̃) ]
    [ Âᵀ(x̃)    −M̂    ],    (174)

where D̂ is a positive definite diagonal matrix that approximates matrix Ĝ(x̃, u). An iterative process is
terminated if residual vectors satisfy conditions (173) and

∥r∥ ≤ ω∥g̃(x̃, u)∥, ∥r̂∥ ≤ ω min(∥ĉ(x̃) + µÛ −1 ẽ∥, ∥ĉ(x̃) + ŝ∥),

where 0 < ω < 1 is a prescribed precision. The directional derivative P ′ (0) given by (169) should be
negative. There are two possibilities how this requirement can be achieved. We either determine the value
σ ≥ 0 satisfying inequality (172), which implies P ′ (0) < 0 if (173) holds (Theorem 14), or set σ = 0 and
ignore inequality (173). If P ′ (0) ≥ 0, we determine a diagonal matrix D̃ with elements

D̃jj = Γ̲ ∥g̃∥/10     if |G̃jj| < Γ̲ ∥g̃∥/10,
D̃jj = |G̃jj|        if Γ̲ ∥g̃∥/10 ≤ |G̃jj| ≤ Γ̄ ∥g̃∥/10,    (175)
D̃jj = Γ̄ ∥g̃∥/10     if Γ̄ ∥g̃∥/10 < |G̃jj|,
for 1 ≤ j ≤ n + m, where g̃ = g̃(x̃,u) and 0 < Γ̲ < Γ̄, set G̃(x̃,u) = D̃ and restart the iterative process by
solving new linear system (166).
We use functions s(α) = [skl(α)], u(α) = [ukl(α)], where skl(α) = skl + αskl ∆skl, ukl(α) = ukl + αukl ∆ukl and
αskl = α                        if ∆skl ≥ 0,
αskl = min(α, −γ skl/∆skl)      if ∆skl < 0,
αukl = α                        if ∆ukl ≥ 0,
αukl = min(α, −γ ukl/∆ukl)      if ∆ukl < 0,

when choosing a steplength using the augmented Lagrange function. A parameter 0 < γ < 1 (usually
γ = 0.99) assures the positivity of vectors s+ and u+ in (168). A parameter α > 0 is chosen to satisfy the
inequality P (α) − P (0) ≤ ε1 αP ′ (0), which is possible because P ′ (0) < 0 and a function P (α) is continuous.
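A direct transcription of these componentwise steplength caps (with γ = 0.99, as stated above) might look as follows; the helper name is an assumption.

```python
import numpy as np

def capped_steps(alpha, s, ds, u, du, gamma=0.99):
    """Componentwise steplengths keeping s(alpha) > 0 and u(alpha) > 0."""
    def caps(v, dv):
        a = np.full_like(v, alpha)
        neg = dv < 0
        a[neg] = np.minimum(alpha, -gamma * v[neg] / dv[neg])   # cap only decreasing components
        return a
    return s + caps(s, ds) * ds, u + caps(u, du) * du
```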
After finishing the iterative step, a barrier parameter µ should be updated. There exist several heuristic
procedures for this purpose. The following procedure proposed in [36] seems to be very efficient.

Procedure C
Compute the centrality measure
ϱ = m̄ min_{k,l}(skl ukl) / (sᵀu),

where m̄ = m1 + · · · + mm and 1 ≤ k ≤ m, 1 ≤ l ≤ mk . Compute the value
λ = 0.1 min( 0.05 (1 − ϱ)/ϱ, 2 )³

and set µ = λsT u/m̄.
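Procedure C is straightforward to implement; the sketch below follows the formulas above (the helper name is an assumption).

```python
import numpy as np

def update_barrier_parameter(s, u):
    """Vanderbei-Shanno update of the barrier parameter mu (Procedure C)."""
    m_bar = s.size
    rho = m_bar * np.min(s * u) / (s @ u)              # centrality measure
    lam = 0.1 * min(0.05 * (1.0 - rho) / rho, 2.0) ** 3
    return lam * (s @ u) / m_bar
```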


Algorithm 4. Primal-dual interior point method
Data. A tolerance for the barrier function gradient norm ε > 0. A parameter for determination of active
constraints ε̃ > 0. A parameter for initiation of the slack variables and the Lagrange multipliers δ > 0.
An initial value of the barrier parameter µ > 0. Precision for the direction vectors determination
0 ≤ ω < 1. A parameter for the steplength selection 0 < γ < 1. A tolerance for the steplength
selection ε1 > 0. A maximum steplength ∆ > 0.
Input. A sparsity pattern of the matrix A(x) = [A1 (x), . . . , Am (x)]. A starting point x ∈ Rn .
Step 1 Initiation. Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and set Fk (x) = max1≤l≤mk fkl ,
zk = Fk (x) + δ, 1 ≤ k ≤ m. Compute values ckl (x̃) = fkl (x) − zk , 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and set
skl = −ckl (x̃), ukl = δ. Set µ = µ and compute the value of the barrier function B̃µ (x̃, s).

Step 2 Termination. Determine a matrix Ã(x̃) and a vector g̃(x̃,u) = Ã(x̃)u by (162). If the KKT
conditions ∥g̃(x̃,u)∥ ≤ ε, ∥c(x̃) + s∥ ≤ ε, and sᵀu ≤ ε are satisfied, terminate the computation.
Step 3 Hessian matrix approximation. Set G = G(x, u) or compute an approximation G of the Hes-
sian matrix G(x, u) using gradient differences or utilizing quasi-Newton updates (Remark 22). De-
termine a parameter σ ≥ 0 by (172) or set σ = 0. Split the constraints into active if ŝkl ≤ ε̃ûkl and
inactive if škl > ε̃ǔkl .
Step 4 Direction determination. Determine the matrix G̃ = G̃(x̃, u) by (162) (where the Hessian ma-
trix G(x, u) is replaced with its approximation G). Determine vectors ∆x̃ and ∆û by solving linear
system (166), a vector ∆ǔ by (165), and a vector ∆s by (167). Linear system (166) is solved ei-
ther directly using the Bunch-Parlett decomposition (we carry out both the symbolic and the numeric
decompositions in this step) or iteratively by the conjugate gradient method with indefinite precondi-
tioner (174). Compute the derivative of the augmented Lagrange function by formula (171).
Step 5 Restart. If P ′ (0) ≥ 0, determine a diagonal matrix D̃ by (175), set G̃ = D̃, σ = 0, and go to
Step 4.
Step 6 Steplength selection. Determine a steplength parameter α > 0 satisfying inequalities P (α) −
P (0) ≤ ε1 αP ′ (0) and α ≤ ∆/∥∆x∥. Determine new vectors x̃ := x̃ + α∆x̃, s := s(α), u := u(α) by
(168). Compute values fkl (x), 1 ≤ k ≤ m, 1 ≤ l ≤ mk , and set ckl (x̃) = fkl (x) − zk , 1 ≤ k ≤ m,
1 ≤ l ≤ mk . Compute the value of the barrier function B̃µ (x̃, s).
Step 7 Barrier parameter update. Determine a new value of the barrier parameter µ ≥ µ using Pro-
cedure C. Go to Step 2.

The values ε = 10−6 , ε̃ = 0.1, δ = 0.1, ω = 0.9, γ = 0.99, µ = 1, ε1 = 10−4 , and ∆ = 1000 were used
in our numerical experiments.

6 Numerical experiments
The methods studied in this contribution were tested by using two collections of test problems TEST14
and TEST15 described in [30], which are the parts of the UFO system [28] and can be downloaded from the
web page www.cs.cas.cz/luksan/test.html. Both these collections contain 22 problems with functions

fk (x), 1 ≤ k ≤ m, x ∈ Rn , where n is an input parameter and m ≥ n depends on n (we have used the
values n = 100 and n = 1000 for numerical experiments). Functions fk (x), 1 ≤ k ≤ m, have a sparse
structure (the Jacobian matrix of a mapping f (x) is sparse), so sparse matrix decompositions can be
used for solving linear equation systems. Since the method described in Section 2.2 does not exploit the sparsity
of the quadratic programming problem, it is not competitive with the other methods on the test problems used.
Therefore, it is not presented in the test results.
The tested methods, whose results are reported in Tables 1-5, are denoted by seven letters. The first
pair of letters distinguishes the line search methods LS from the trust region methods TR (trust region
methods are used only for minimization of the l1 norm). The second pair of letters gives the problem type:
either a classical minimax MX (when a function F (x) has form (10) or F (x) = ∥f (x)∥∞ holds) or a sum
of absolute values SA (when F (x) = ∥f (x)∥1 holds). Further two letters specify the method used:

PI - the primal interior point method (Section 3),


SM - the smoothing method (Section 4),

DI - the primal-dual interior point method (Section 5).


The last letter denotes the procedure for updating a barrier parameter µ (procedures A and B are described
in Section 3.4 and procedure C is described in Section 5.2).
The columns of all tables correspond to two classes of methods. The Newton methods use approxima-
tions of the Hessian matrices of the Lagrange function obtained by gradient differences [4] and variable
metric methods update approximations of the Hessian matrices of the partial functions by the methods
belonging to the Broyden family [13] (Remark 22).
The tables contain total numbers of iterations NIT, function evaluations NFV, gradient evaluations NFG,
and also the total computational time, the number of problems with the value ∆ decreased and the number
of failures (the number of unsolved problems). The decrease of the maximum steplength ∆ is used for
three reasons. First, too large steps may lead to overflows if arguments of functions (roots, logarithms,
exponentials) lie outside their definition domains. Second, the change of ∆ can affect the finding of a
suitable (usually global) minimum. Finally, it can prevent the iterates from reaching a region in which the
objective function behaves badly, which could lead to a loss of convergence. The number of times the steplength has
decreased is in some sense a symptom of robustness (a lower number corresponds to higher robustness).
Several conclusions can be done from the results stated in these tables.

• For l1 approximation, it is usually more advantageous to use the trust region methods TR than the
line search methods LS.

• The smoothing methods are less efficient than the primal interior point methods. For testing the
smoothing methods, we had to use the value µ = 10−6 , while the primal interior methods work well
with the smaller value µ = 10−8 , which gives more precise results.
• The primal-dual interior point methods are slower than the primal interior point methods, especially
for the reason that system of equations (166) is indefinite, so we cannot use the Choleski (or the
Gill-Murray [11]) decomposition. If the matrix of linear system (166) is large and sparse, we can use
the Bunch-Parlett decomposition [7]. In this case, a large fill-in of new nonzero elements (and thus an
overflow of the operational memory or a large increase of the computational time) may appear. In
this case, we can also use the iterative conjugate gradient method with an indefinite preconditioner
[29], however, the ill-conditioned systems can require a large number of iterations and thus a large
computational time.
• It cannot be uniquely decided whether Procedure A is better than Procedure B. The Newton methods
usually work better with Procedure A while the variable metric methods are more efficient with
Procedure B.

• The variable metric methods are usually faster because it is not necessary to determine the elements
of the Hessian matrix of the Lagrange function by gradient differences. The Newton methods seem
to be more robust (especially in case of l1 approximation).

References
[1] M.Benzi, G.H.Golub, J.Liesen: Numerical solution of saddle point problems. Acta Numerica 14
(2005) 1-137.
[2] A.Björck: Numerical Methods in Matrix Computations. Springer, New York, 2015.
[3] J.R.Bunch, B.N.Parlett: Direct methods for solving symmetric indefinite systems of linear equations.
SIAM J. Numerical Analysis 8 (1971) 639-655.
[4] T.F.Coleman, J.J.Moré: Estimation of sparse Hessian matrices and graph coloring problems. Math-
ematical Programming 28 (1984) 243-270.
[5] A.R.Conn, N.I.M.Gould, P.L.Toint: Trust-region Methods. SIAM, Philadelphia, 2000.
[6] G. Di Pillo, L. Grippo, S. Lucidi: Smooth Transformation of the generalized minimax problem. J. of
Optimization Theory and Applications 95 (1997) 1-24.
[7] I.S.Duff, N.I.M.Gould, J.K.Reid, K.Turner: The factorization of sparse symmetric indefinite matrices.
IMA Journal of Numerical Analysis 11 (1991) 181-204.
[8] M.Fiedler: Special Matrices and Their Applications in Numerical Mathematics. Dover Publications,
New York, 2008.
[9] R.Fletcher: Practical methods of optimization. Wiley, New York, 1987.
[10] R.Fletcher, S.Leyffer: Nonlinear programming without a penalty function. Mathematical Program-
ming 91 (2002) 239-269.
[11] P.E.Gill, W.Murray: Newton type methods for unconstrained and linearly constrained optimization.
Mathematical Programming 7 (1974) 311-350.
[12] N.I.M.Gould, P.L.Toint: Nonlinear programming without a penalty function or a filter. Mathematical
Programming 122 (2010) 155-196.
[13] A.Griewank, P.L.Toint: Partitioned variable metric updates for large-scale structured optimization
problems. Numerische Mathematik 39 (1982) 119-137.
[14] A.Griewank, A.Walther: Evaluating Derivatives. SIAM, Philadelphia, 2008.
[15] S.P.Han: Variable metric methods for minimizing a class of nondifferentiable functions. Math. Pro-
gramming 20 (1981) 1-13.
[16] D. Le: Three new rapidly convergent algorithms for finding a zero of a function. SIAM J. on Scientific
and Statistical Computations 6 (1985) 193-208.
[17] D. Le: An efficient derivative-free method for solving nonlinear equations. ACM Transactions on
Mathematical Software 11 (1985) 250-262.
[18] X.Liu, Y.Yuan: A sequential quadratic programming method without a penalty function or a filter
for nonlinear equality constrained optimization. SIAM J. Optimization 21 (2011) 545-571.
[19] L.Lukšan: Dual method for solving a special problem of quadratic programming as a subproblem at
linearly constrained nonlinear minimax approximation. Kybernetika 20 (1984) 445-457.

[20] L.Lukšan, C.Matonoha, J.Vlček: Interior-point method for non-linear non-convex optimization. Nu-
merical Linear Algebra with Applications 11 (2004) 431-453.
[21] L.Lukšan, C.Matonoha, J.Vlček: Interior-point method for large-scale nonlinear programming. Op-
timization Methods and Software 20 (2005) 569-582.
[22] L.Lukšan, C.Matonoha, J.Vlček: Trust-region interior-point method for large sparse l1 optimization.
Optimization Methods and Software 22 (2007) 737-753.
[23] L.Lukšan, C.Matonoha, J.Vlček: On Lagrange multipliers of trust-region subproblems. BIT Numer-
ical Analysis 48 (2008a) 763-768.
[24] L.Lukšan, C.Matonoha, J.Vlček: Algorithm 896: LSA: Algorithms for Large-Scale Optimization.
ACM Transactions on Mathematical Software 36 (2009) No. 3.
[25] L. Lukšan, C. Matonoha, J. Vlček: Primal interior-point method for large sparse minimax optimiza-
tion. Kybernetika 45 (2009) 841-864.
[26] L. Lukšan, C. Matonoha, J. Vlček: Primal interior-point method for minimization of generalized
minimax functions. Kybernetika 46 (2010) 697-721.

[27] L.Lukšan, E.Spedicato: Variable metric methods for unconstrained optimization and nonlinear least
squares. Journal of Computational and Applied Mathematics 124 (2000) 61-93.

[28] L.Lukšan, M.Tůma, C.Matonoha, J.Vlček, N.Ramešová, M.Šiška, J.Hartman: UFO 2017. Inter-
active System for Universal Functional Optimization. Technical Report V-1252. Prague, ICS AS CR
2017.
[29] L.Lukšan, J.Vlček: Indefinitely preconditioned inexact Newton method for large sparse equality
constrained nonlinear programming problems. Numerical Linear Algebra with Applications 5 (1998)
219-247.

[30] L.Lukšan, J.Vlček: Sparse and partially separable test problems for unconstrained and equality
constrained optimization. Technical Report V-767. Prague, ICS AS CR 1998.

[31] M.M.Mäkelä, P.Neittaanmäki: Nonsmooth Optimization. World Scientific, London 1992.


[32] J.Nocedal, S.J.Wright: Numerical optimization. Springer-Verlag, New York, 2006.

[33] M.J.D.Powell: On the global convergence of trust region algorithms for unconstrained minimization.
Mathematical Programming 29 (1984) 297-303.

[34] P.L.Toint: On sparse and symmetric matrix updating subject to a linear equation. Mathematics of
Computation 31 (1977) 954-961.

[35] M.Tůma: A note on direct methods for approximations of sparse Hessian matrices. Applications of
Mathematics 33 (1988) 171-176.

[36] J.Vanderbei, D.F.Shanno: An interior point algorithm for nonconvex nonlinear programming. Com-
putational Optimization and Applications 13 (1999) 231-252.

[37] A.Wächter, L.Biegler: Line search filter methods for nonlinear programming: Motivation and global
convergence. SIAM Journal on Optimization 16 (2005) 1-31.

[38] Y.Xiao, B.Yu: A truncated aggregate smoothing Newton method for minimax problems. Applied
Mathematics and Computation 216 (2010) 1868-1879.

Newton methods: n=100 Variable metric methods: n=100
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 2232 7265 11575 0.74 4 - 2849 5078 2821 0.32 2 -
LSMXPI-B 2184 5262 9570 0.60 1 - 1567 2899 1589 0.24 1 -
LSMXSM-A 3454 11682 21398 1.29 5 - 4444 12505 4465 1.03 - -
LSMXSM-B 10241 36891 56399 4.15 3 - 8861 32056 8881 2.21 1 1
LSMXDI-C 1386 2847 14578 0.90 2 - 2627 5373 2627 0.96 3 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 1386 3735 7488 5.58 4 - 3237 12929 3258 5.91 6 -
LSMXPI-B 3153 6885 12989 9.03 4 - 1522 3287 1544 2.68 5 -
LSMXSM-A 10284 30783 82334 54.38 7 - 4221 9519 4242 8.00 8 -
LSMXSM-B 18279 61180 142767 87.76 6 - 13618 54655 13639 45.10 9 1
LSMXDI-C 3796 6677 48204 49.95 6 - 2371 5548 2371 18.89 3 -

Table 1: TEST14 (minimization of maxima) – 22 problems

Newton methods: n=100 Variable metric methods: n=100


Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 2194 5789 10553 0.67 3 - 2890 5049 2912 0.48 1 -
LSMXPI-B 6767 17901 39544 3.79 4 - 1764 3914 1786 0.37 2 -
LSMXSM-A 3500 9926 23568 1.79 7 - 8455 23644 8476 4.69 4 -
LSMXSM-B 15858 48313 92486 8.33 5 - 9546 34376 9566 2.59 9 1
LSMXDI-C 1371 2901 11580 1.12 8 - 2467 5130 2467 1.59 3 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 4110 14633 20299 18.89 4 - 1549 2636 1571 2.51 3 -
LSMXPI-B 6711 31618 29939 30.73 7 - 1992 6403 2013 4.96 4 -
LSMXSM-A 9994 24333 88481 67.45 11 - 6164 15545 6185 29.37 8 -
LSMXSM-B 23948 84127 182604 149.63 8 - 24027 92477 24048 132.08 8 1
LSMXDI-C 3528 9084 26206 49.78 12 - 1932 2845 1932 18.73 5 -

Table 2: TEST14 (l∞ approximation) – 22 problems

Newton methods: n=100 Variable metric methods: n=100


Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 15525 20272 55506 4.41 1 - 6497 8204 6518 1.37 3 -
LSMXPI-B 7483 17999 27934 3.27 5 - 1764 7598 2488 0.74 2 -
LSMXSM-A 17574 29780 105531 11.09 4 - 9879 15305 9900 5.95 - -
LSMXSM-B 13446 29249 81938 6.80 9 1 9546 34376 9566 2.59 3 -
LSMXDI-C 980 1402 7356 0.79 1 - 1179 1837 1179 1.06 2 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
LSMXPI-A 10325 15139 32422 39.30 6 - 6484 9904 6502 13.77 2 -
LSMXPI-B 14836 30724 46864 68.70 10 - 7388 15875 7409 19.98 3 -
LSMXSM-A 11722 24882 69643 61.65 10 - 6659 12824 6681 41.55 8 -
LSMXSM-B 13994 31404 86335 78.45 9 1 15125 25984 15147 61.62 10 -
LSMXDI-C 1408 2406 10121 15.63 6 - 2228 3505 2228 35.13 10 -

Table 3: TEST15 (l∞ approximation) – 22 problems

Newton methods: n=100 Variable metric methods: n=100
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 2098 2469 10852 0.57 1 - 22710 22903 22731 1.74 1 1
TRSAPI-B 2286 2771 11353 0.56 1 - 22311 22476 22332 1.62 1 1
LSSAPI-A 1647 5545 8795 0.63 5 - 12265 23579 12287 1.37 2 1
LSSAPI-B 1957 7779 10121 0.67 6 - 4695 6217 10608 0.67 3 -
TRSASM-A 2373 2868 19688 0.73 1 - 22668 22918 22689 2.34 2 1
TRSASM-B 3487 4382 28467 1.12 1 - 22022 22244 22044 1.90 2 1
LSSASM-A 1677 4505 16079 0.74 3 - 20025 27369 20047 2.83 4 -
LSSASM-B 2389 8085 23366 1.18 4 - 5656 11637 5678 1.02 2 -
LSSADI-C 4704 13012 33937 4.16 7 1 6547 7012 6547 9.18 8 -
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 7570 8955 30013 12.54 3 - 24896 25307 24916 16.22 5 1
TRSAPI-B 8555 9913 36282 18.11 6 - 25013 25492 25033 16.64 5 1
LSSAPI-A 7592 19621 46100 15.39 4 - 22277 36610 22298 19.09 7 1
LSSAPI-B 9067 35463 56292 19.14 6 - 16650 35262 16672 14.47 6 1
TRSASM-A 7922 9453 49104 12.66 2 - 26358 26966 26378 26.44 4 1
TRSASM-B 9559 11358 58418 16.39 7 - 24283 24911 24303 17.79 6 1
LSSASM-A 5696 13534 41347 15.28 4 - 20020 30736 20042 23.05 5 1
LSSASM-B 8517 30736 57878 23.60 6 - 18664 28886 18686 18.65 5 1
LSSADI-C 6758 11011 47960 94.78 11 1 13123 14610 13124 295.46 8 2

Table 4: TEST14 (l1 approximation) – 22 problems

Newton methods: n=100 Variable metric methods: n=100


Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 15882 16327 54025 3.79 6 - 43038 43435 43054 8.40 8 1
TRSAPI-B 3977 4646 14592 1.15 8 - 6526 6958 6548 0.70 7 1
LSSAPI-A 15760 21846 58082 4.24 8 - 39469 58157 39486 6.28 4 1
LSSAPI-B 4592 17050 17778 1.46 5 - 5932 25035 5952 1.48 6 1
TRSASM-A 6310 7210 38094 2.16 6 - 46219 46759 46237 11.68 7 1
TRSASM-B 4452 5340 26841 1.22 5 - 5240 5821 5409 0.99 5 1
LSSASM-A 10098 14801 610511 3.54 5 - 9162 28421 9184 3.65 6 1
LSSASM-B 4528 14477 290379 2.94 8 - 3507 9036 3528 1.27 6 -
LSSADI-C 877 1373 6026 0.84 3 - 15528 15712 15529 14.49 5 1
Newton methods: n=1000 Variable metric methods: n=1000
Method NIT NFV NFG Time ∆ Fail NIT NFV NFG Time ∆ Fail
TRSAPI-A 14828 16249 85433 29.89 7 - 23758 24120 23778 33.22 9 1
TRSAPI-B 8048 9003 39532 17.45 8 - 10488 11044 10506 11.15 9 1
LSSAPI-A 18519 39319 70951 61.04 5 - 27308 44808 27327 36.64 4 1
LSSAPI-B 12405 57969 43189 55.06 7 - 12712 32179 12731 21.48 8 1
TRSASM-A 11496 13335 63214 26.78 8 1 16345 16754 16362 27.76 8 1
TRSASM-B 9564 11006 53413 24.10 5 - 6993 7525 7011 8.23 6 1
LSSASM-A 19317 32966 113121 62.65 8 - 22264 42908 22284 62.46 7 1
LSSASM-B 14331 33572 86739 57.56 6 - 12898 42479 12919 47.05 7 1
LSSADI-C 2093 3681 12616 20.01 3 1 23957 28000 23960 186.92 5 3

Table 5: TEST15 (l1 approximation) – 22 problems

