
CHAPTER 4

Downloaded from https://academic.oup.com/book/53915/chapter/422193716 by OUP site access user on 12 May 2024


Penalty Methods, Barrier Methods and
Augmented Lagrangians

We now turn to study some general optimization methods upon which we can later base the design of model decomposition algorithms (see Chapter 1). These general methods for constrained nonlinear optimization are the penalty methods that employ penalty or barrier functions, giving rise to so-called exterior penalty methods or interior penalty methods, respectively. These methods are closely related to the primal-dual Lagrangian algorithmic scheme and to the method of augmented Lagrangians (sometimes referred to as the multiplier method), which we also discuss here. Such algorithms are not only considered most efficacious in solving large-scale constrained optimization problems, but are also suitable problem modifiers in our model decomposition algorithms (Chapter 7). Barrier-function methods are also helpful when studying the primal-dual path-following algorithm for linear programming (see Chapter 8). This algorithm is developed using a logarithmic barrier function. Finally, there is a close relationship between the method of augmented Lagrangians and the proximal minimization algorithms (see Chapter 3), which should not be overlooked.

Section 4.1 examines penalty functions and their use in penalty methods, wherein the optimal solution of the problem is approached from outside the feasible set (exterior penalty). In Section 4.2 we describe barrier functions and their use in approaching the optimal solution from within the feasible set. Section 4.3 explains the basics of duality theory and the primal-dual algorithmic framework. Section 4.4 gives an account of augmented Lagrangian methods and how they are related to the proximal minimization approach developed earlier (see Chapter 3). Notes and References are given in Section 4.5.

4.1 Penalty Methods


Penalty methods for constrained nonlinear optimization are based on the following idea: a penalty function is defined that imposes a penalty for constraint violations by raising the value of the objective function that has to be minimized. The penalty is greater for points that are farther away from the feasible set and is equal to zero at feasible points that satisfy all constraints of the problem. The optimal solution to the constrained problem is obtained as the limit of solutions of a sequence of unconstrained penalized problems.

Parallel Optimization, Yair Censor, Oxford University Press (1997), © 1997 by Oxford University Press, Inc., DOI: 10.1093/9780195100624.003.0004

To elaborate this idea we consider the following optimization problem

Minimize F(x)    (4.1)
s.t. x ∈ Q,    (4.2)

where F : ℝ^n → ℝ is a continuous function and Q ⊆ ℝ^n is the feasible set.


Definition 4.1.1 (Penalty Functions) Given a nonempty set Q ⊆ ℝ^n, a function p : ℝ^n → ℝ is called an (exterior) penalty function with respect to Q if the following hold: (i) p(x) is continuous, (ii) p(x) ≥ 0, for all x ∈ ℝ^n, and (iii) p(x) = 0 if and only if x ∈ Q.

Example 4.1.1 When Q = {x ∈ ℝ^n | g_i(x) ≤ 0, i = 1, 2, ..., m} describes the feasible set by inequality constraints, where g_i : ℝ^n → ℝ, a commonly used penalty function is the quadratic penalty function:

p(x) = ½ ‖g⁺(x)‖² = ½ Σ_{i=1}^m (max(0, g_i(x)))².    (4.3)
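The quadratic penalty of (4.3) is straightforward to implement. The following Python sketch (the constraint functions and test point here are illustrative, not from the text) evaluates p(x) for a feasible set described by inequality constraints:

```python
# Quadratic penalty p(x) = (1/2) * sum_i max(0, g_i(x))^2 for the set
# Q = {x : g_i(x) <= 0}. The constraints below are illustrative examples.

def quadratic_penalty(x, constraints):
    """Return p(x); zero iff every g_i(x) <= 0, positive otherwise."""
    return 0.5 * sum(max(0.0, g(x)) ** 2 for g in constraints)

# Example feasible set: x1 >= 1 and x2 >= 0, written as g_i(x) <= 0.
gs = [lambda x: 1.0 - x[0],   # g_1(x) = 1 - x1 <= 0
      lambda x: -x[1]]        # g_2(x) = -x2 <= 0

print(quadratic_penalty([2.0, 1.0], gs))  # feasible point -> 0.0
print(quadratic_penalty([0.0, 1.0], gs))  # violates g_1 by 1 -> 0.5
```

Note that p vanishes on all of Q, as required by property (iii) of Definition 4.1.1, so it carries no information about F inside the feasible set.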

Using a penalty function we can associate with (4.1)-(4.2) an unconstrained penalized problem

Minimize_{x ∈ ℝ^n} (F(x) + c p(x)),    (4.4)

with a positive penalty parameter c > 0. If the penalized problem (4.4) yields the exact solution of the original problem (4.1)-(4.2) for a finite value of the parameter c, then we call p(x) an exact penalty function for the original problem. If this is not the case, then the penalty method for solving (4.1)-(4.2) works as follows.

Algorithm 4.1.1 The Penalty Algorithmic Scheme

Step 0: (Initialization.) Let {c_ν}_{ν=0}^∞ be a monotone sequence of penalty parameters such that c_ν > 0 and c_{ν+1} > c_ν, for all ν ≥ 0, and lim_{ν→∞} c_ν = +∞.
Step 1: (Iterative step.) For every ν ≥ 0, solve the unconstrained penalized optimization problem and set

x^ν = argmin_{x ∈ ℝ^n} (F(x) + c_ν p(x)).    (4.5)
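As a concrete illustration of Algorithm 4.1.1 (an invented toy problem, not from the text), the sketch below applies the scheme to minimize x² subject to x ≥ 1, using the quadratic penalty and a crude ternary search as the inner unconstrained solver; the penalized minimizer c/(c+2) approaches the solution x* = 1 from outside the feasible set:

```python
# Penalty algorithmic scheme on: minimize F(x) = x^2  s.t.  g(x) = 1 - x <= 0.
# Illustrative toy problem; the exact solution is x* = 1 with F(x*) = 1.

def ternary_min(f, lo, hi, iters=200):
    """Crude minimizer for a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

F = lambda x: x * x
p = lambda x: 0.5 * max(0.0, 1.0 - x) ** 2        # quadratic penalty (4.3)

x = 0.0
for c in [1.0, 10.0, 100.0, 1000.0]:              # increasing penalty parameters
    x = ternary_min(lambda t: F(t) + c * p(t), -10.0, 10.0)
    # each iterate c/(c+2) lies outside Q, i.e. x < 1 (exterior method)

print(round(x, 3))
```

For c = 1000 the computed minimizer is 1000/1002 ≈ 0.998, illustrating both the convergence p(x^ν) → 0 and the approach to x* from the infeasible side.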

The following convergence result gives conditions, on the original problem, under which any accumulation point of a sequence {x^ν}_{ν=0}^∞, generated by Algorithm 4.1.1, is a solution of the original problem.

Theorem 4.1.1 Let p(x) be a penalty function for problem (4.1)-(4.2), in which F(x) is a continuous function, Q is a closed set, and one of the following conditions holds:
(i) lim_{‖x‖→∞} F(x) = +∞,
(ii) Q is bounded and lim_{‖x‖→∞} p(x) = +∞.
Then any sequence {x^ν}_{ν=0}^∞, generated by the penalty algorithmic scheme (Algorithm 4.1.1), has at least one accumulation point, every such point is an optimal solution of problem (4.1)-(4.2), and lim_{ν→∞} p(x^ν) = 0.
Proof The Weierstrass theorem, which says that a real continuous function on a compact set in ℝ^n attains its minimum on the set, guarantees the existence of an optimal solution x* ∈ Q of the original problem. This means that F(x) ≥ F(x*), for all x ∈ Q. This is obvious if condition (ii) holds because then Q is closed and bounded. If condition (i) holds then we take any x̃ ∈ Q and deduce that there exists M > 0, such that ‖x‖ > M implies F(x) > F(x̃). Thus, solving the problem (4.1)-(4.2) reduces to the minimization of F(x) over the set Q ∩ {x ∈ ℝ^n | ‖x‖ ≤ M}, which is compact, and the Weierstrass theorem applies. Next we see that from the nonnegativity of the penalty function and the penalty parameters, from the strict monotone increase of the latter, and from (4.5) we have, for all ν ≥ 0,

F(x^{ν+1}) + c_{ν+1} p(x^{ν+1}) ≥ F(x^{ν+1}) + c_ν p(x^{ν+1}) ≥ F(x^ν) + c_ν p(x^ν).    (4.6)

Also, from the nature of x^ν and x^{ν+1} as minimizers of the penalized problems at the respective iterations, we have, for all ν ≥ 0,

F(x^{ν+1}) + c_{ν+1} p(x^{ν+1}) ≤ F(x^ν) + c_{ν+1} p(x^ν).

The last two inequalities yield (c_{ν+1} − c_ν) p(x^{ν+1}) ≤ (c_{ν+1} − c_ν) p(x^ν), which shows that, for all ν ≥ 0,

p(x^{ν+1}) ≤ p(x^ν),    (4.7)

because {c_ν}_{ν=0}^∞ is monotonically increasing. The optimality of x* ∈ Q makes p(x*) = 0 and it follows that, for all ν ≥ 0,

F(x^ν) ≤ F(x^ν) + c_ν p(x^ν) ≤ F(x*) + c_ν p(x*) = F(x*).    (4.8)

Any sequence {x^ν}_{ν=0}^∞ generated by the penalty method must be bounded, because otherwise it would contradict (4.8) if condition (i) holds, or it would contradict p(x^ν) ≤ p(x^0), for all ν ≥ 0, which follows from (4.7), if condition (ii) holds. This proves the existence of an accumulation point x̄, and we let {x^ν}_{ν∈K}, with K ⊆ N₀ = {0, 1, 2, ...}, be a subsequence converging to x̄. From continuity of F and from (4.8) we obtain lim_{ν∈K} F(x^ν) = F(x̄) ≤ F(x*). The sequence {F(x^ν) + c_ν p(x^ν)}_{ν=0}^∞ is, by (4.6), monotonically increasing and, by (4.8), bounded from above; thus

lim_{ν→∞} (F(x^ν) + c_ν p(x^ν)) = q* ≤ F(x*),

which shows that lim_{ν∈K} c_ν p(x^ν) = q* − F(x̄). Therefore, since the sequence {c_ν} is unbounded, we must have, by (4.7), lim_{ν∈K} p(x^ν) = 0, which ensures that p(x̄) = 0, by continuity of p(x). This means that x̄ ∈ Q and since we showed above that F(x̄) ≤ F(x*), it follows that F(x̄) = F(x*) and thus x̄ is an optimal solution. ■

This theorem shows that the penalty algorithmic scheme is an exterior penalty method: the iterates have a decreasing level of infeasibility, if we measure the level of infeasibility by the values of the penalty function (see (4.7)), and the function values {F(x^ν)}_{ν=0}^∞ approach the minimal value F(x*) from below. In contrast to this behavior, the barrier methods (see below) approach an optimal point from the interior of the feasible set.

4.2 Barrier Methods


Barrier methods for the solution of constrained nonlinear optimization problems are similar to penalty methods in that they replace the original problem by a sequence of unconstrained problems whose objective functions are augmented modifications of the original F(x). The main difference, however, lies in the nature of the function added to F(x). Here this function builds a barrier along the boundary of the feasible set Q, which prevents the iterates produced by the algorithm from approaching the boundary while searching for a minimum inside Q. We must assume here that the set Q in problem (4.1)-(4.2) is such that its interior int Q is not empty and each point on the boundary is the limit of a sequence of points of its interior. Such sets are referred to as robust.

Definition 4.2.1 (Barrier Functions) Given a robust set Q ⊆ ℝ^n, a function q : ℝ^n → ℝ is called a barrier function with respect to Q (also called an interior penalty function) if the following hold: (i) q(x) is continuous on int Q, (ii) q(x) > 0, for all x ∈ int Q, and (iii) lim q(x) = +∞, as x tends to any boundary point of Q.

Example 4.2.1 When Q is as in Example 4.1.1 and we assume that the functions g_i are continuous on ℝ^n, for all i = 1, 2, ..., m, and that the set Q is robust and that int Q = {x ∈ ℝ^n | g_i(x) < 0, i = 1, 2, ..., m}, then the function

q(x) = − Σ_{i=1}^m 1/g_i(x)

is a barrier function for Q.



This time we associate with the original constrained problem a modified constrained problem with an objective function augmented by a barrier function, namely,

Minimize F(x) + c q(x)    (4.9)
s.t. x ∈ int Q,    (4.10)

with a positive barrier parameter c > 0. Although this is a constrained problem, it can still be solved by a search technique for unconstrained problems because, if such a search is initialized at a point in int Q, then all points obtained by a search technique for the solution of the problem min_{x ∈ ℝ^n} (F(x) + c q(x)) will stay in int Q. The barrier method consists of solving a sequence of such modified problems with an ever-decreasing sequence c_ν → 0 of barrier parameters, as described by the following algorithmic scheme.

Algorithm 4.2.1 The Barrier Algorithmic Scheme

Step 0: (Initialization.) Let {c_ν}_{ν=0}^∞ be a monotone sequence of barrier parameters such that c_ν > 0 and c_{ν+1} < c_ν, for all ν ≥ 0, and lim_{ν→∞} c_ν = 0.
Step 1: (Iterative step.) For every ν ≥ 0, solve the modified problem and set

x^ν = argmin_{x ∈ int Q} (F(x) + c_ν q(x)).    (4.11)
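For illustration (again an invented toy problem, not from the text), the same problem minimize x², x ≥ 1, can be handled by Algorithm 4.2.1 with the inverse barrier of Example 4.2.1, q(x) = −1/g(x) = 1/(x − 1); in contrast to the penalty scheme, the iterates now approach x* = 1 from inside the feasible set:

```python
# Barrier algorithmic scheme on: minimize F(x) = x^2  s.t.  g(x) = 1 - x <= 0,
# using the inverse barrier q(x) = -1/g(x) = 1/(x - 1), finite only for x > 1.

def ternary_min(f, lo, hi, iters=200):
    """Crude minimizer for a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

F = lambda x: x * x
q = lambda x: 1.0 / (x - 1.0)                      # barrier: +inf as x -> 1+

x = 2.0
for c in [1.0, 0.1, 0.01, 1e-4, 1e-6]:             # decreasing barrier parameters
    x = ternary_min(lambda t: F(t) + c * q(t), 1.0 + 1e-12, 10.0)
    assert x > 1.0                                  # iterates stay interior

print(round(x, 3))
```

Each iterate satisfies x^ν > 1 strictly, and F(x^ν) decreases toward F(x*) = 1 as c_ν → 0.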
The next theorem shows that any accumulation point of a sequence {x^ν}_{ν=0}^∞ generated by this algorithm is a solution of the original problem.

Theorem 4.2.1 Let q(x) be a barrier function for problem (4.1)-(4.2) with a nonempty, closed and robust feasible set Q, and with F(x) a continuous function, and assume that one of the following conditions holds:
(i) lim_{‖x‖→∞} F(x) = +∞,
(ii) Q is bounded.
Then any sequence {x^ν}_{ν=0}^∞ generated by the barrier algorithmic scheme (Algorithm 4.2.1) has at least one accumulation point, every such point is an optimal solution of problem (4.1)-(4.2), and lim_{ν→∞} c_ν q(x^ν) = 0.
Proof Using the same argument as in the proof of Theorem 4.1.1, the existence of x* ∈ Q, which is optimal for the original problem, is guaranteed. Now we can write, for every ν ≥ 0,

F(x*) ≤ F(x^ν) ≤ F(x^ν) + c_ν q(x^ν),    (4.12)

and use the continuity of F and the robustness of Q to conclude that, for every ε > 0, there exists a point x̂ ∈ int Q such that F(x̂) < F(x*) + ε. Therefore, for every ν ≥ 0, we get

F(x*) + ε + c_ν q(x̂) > F(x̂) + c_ν q(x̂) ≥ F(x^ν) + c_ν q(x^ν),

which shows that lim_{ν→∞} (F(x^ν) + c_ν q(x^ν)) ≤ F(x*) + ε. Combining this with the limiting behavior of (4.12) we get

lim_{ν→∞} (F(x^ν) + c_ν q(x^ν)) = F(x*) = lim_{ν→∞} F(x^ν)    (4.13)

and lim_{ν→∞} c_ν q(x^ν) = 0. The sequence {x^ν}_{ν=0}^∞ must be bounded because otherwise it would contradict (4.13) if condition (i) holds or, if condition (ii) holds, this follows from (4.11) since the interior of a bounded set is also bounded. Let x̄ be an accumulation point (guaranteed to exist since {x^ν}_{ν=0}^∞ is bounded) and assume that lim_{ν∈K} x^ν = x̄, for some infinite subset K ⊆ N₀. Then continuity of F yields F(x̄) = F(x*) and, since Q is closed, x̄ ∈ Q. Thus, every accumulation point of {x^ν}_{ν=0}^∞ is a minimum of the original problem. ■

It is easy to show, following arguments similar to those in the proof of Theorem 4.1.1, that in the barrier algorithmic scheme we get, for all ν ≥ 0,

F(x^{ν+1}) + c_{ν+1} q(x^{ν+1}) ≤ F(x^ν) + c_ν q(x^ν),
q(x^ν) ≤ q(x^{ν+1}),
F(x^{ν+1}) ≤ F(x^ν),

showing that the values of the original objective function monotonically decrease during the iterations of the barrier method.
Example 4.2.2 (The Logarithmic Barrier Function) For a set Q as described in Example 4.2.1, the function q(x) = − Σ_{i=1}^m log(−g_i(x)) is a barrier function. Observe the relation of this with Burg's entropy function defined in (6.149); see also Section 6.9.2.

4.3 The Primal-Dual Algorithmic Scheme


Duality, Lagrange multipliers, and primal-dual algorithms for constrained optimization play a prominent role in optimization theory. In this section we explain their basic ideas in order to use them in conjunction with penalty function methods. The combination of primal-dual algorithms with penalty function methods gives rise to the method of augmented Lagrangians, the latter of which is used in the development of model decompositions (see Chapters 7 and 8). For simplicity of presentation we deal here only with equality constrained problems of the form

Minimize F(x)    (4.14)
s.t. h_i(x) = 0, i = 1, 2, ..., m,    (4.15)
x ∈ Q,    (4.16)

where F : ℝ^n → ℝ and h_i : ℝ^n → ℝ, for all i = 1, 2, ..., m, are given functions and Q ⊆ ℝ^n is a given subset. The Lagrangian of this problem is defined as

L(x, π) = F(x) + Σ_{i=1}^m π_i h_i(x),    (4.17)

for every x ∈ ℝ^n and every π = (π_i) ∈ ℝ^m. The dual function associated with this problem is

g(π) = min_{x ∈ Q} L(x, π),    (4.18)

and the unconstrained optimization problem

Maximize g(π)    (4.19)
s.t. π ∈ ℝ^m,    (4.20)

is the dual problem to (4.14)-(4.16). The known local duality theorem relates the solutions x* and π*, of the primal (original) problem and its dual problem, respectively, for the case when Q = ℝ^n, and we present it here without proof.
The following terms are used in the theorem. A point x* is a local extremum (minimum or maximum) point of an optimization problem if it is a feasible point, i.e., fulfills the constraints of the problem, and there exists a neighborhood of the point where all values of the objective function F(x) are not smaller than (or, for maximization problems, not greater than) F(x*). A Lagrange multiplier vector π* for problem (4.14)-(4.15) is a vector that satisfies the first-order necessary local optimality conditions at a local extremum point x*, i.e.,

∇F(x*) + Σ_{i=1}^m π_i* ∇h_i(x*) = 0.    (4.21)

A point x* satisfying the constraints h_i(x*) = 0, i = 1, 2, ..., m, is said to be a regular point of these constraints if the gradient vectors {∇h_i(x*)}_{i=1}^m are linearly independent. The Hessian of a real-valued, twice continuously differentiable function u : ℝ^n → ℝ, denoted by ∇²u(x), is an n × n matrix whose (i, j)th entry is (∇²u(x))_{ij} = ∂²u(x)/∂x_i∂x_j and, since ∂²u/∂x_i∂x_j = ∂²u/∂x_j∂x_i, it is symmetric. The Hessian of the Lagrangian (4.17), with respect to x, at the point x* is

L* = ∇²L(x*) = ∇²F(x*) + Σ_{i=1}^m π_i* ∇²h_i(x*),    (4.22)

where ∇²F and ∇²h_i are the Hessian matrices of the respective functions. The local duality theorem requires that this matrix be positive definite, i.e., that ⟨x, L*x⟩ > 0, for every x ∈ ℝ^n such that x ≠ 0.
Theorem 4.3.1 (Local Duality Theorem) Suppose that x* is a local minimum point of the optimization problem (4.14)-(4.15) with a minimal value F(x*) = r* and a Lagrange multiplier vector π*. Suppose also that x* is a regular point of the constraints (4.15) and that the corresponding Hessian of the Lagrangian L* = ∇²L(x*) is positive definite. Then the dual problem (4.19)-(4.20) has a local maximum at π* with a maximal value g(π*) = r*, and x* is the point corresponding to π* in (4.18).
A similar result can be obtained for problems having inequality constraints in addition to the equality constraints. If appropriate convexity assumptions are added, then the local extrema (mentioned in the theorem) can be replaced by global extrema. Finally, it is not necessary to include all the constraints of a problem in the definition of the dual functional g(π). Local duality can be defined with respect to any subset of functional constraints; this is called partial duality. For example, the constraints h_i(x) = 0 might be separated into two subsets, easy and hard constraints. The hard ones can be dualized, i.e., removed from the constraint set and incorporated into the Lagrangian, while the easy ones remain as constraints.

The primal-dual approach for constrained optimization problems aims, in view of this duality theorem, at solving the dual unconstrained problem (4.19)-(4.20). This is done by an iterative scheme which alternates between the minimization of the Lagrangian in (4.18) and the application of a steepest ascent iteration to the dual problem.

Algorithm 4.3.1 The Primal-Dual Algorithmic Scheme

Step 0: (Initialization.) π⁰ ∈ ℝ^m is arbitrary.
Step 1: (Iterative step.) Given π^ν, for some ν ≥ 0,
(i) Solve the minimization problem

x^ν = argmin_{x ∈ Q} L(x, π^ν).    (4.23)

(ii) Do a steepest ascent step to calculate π^{ν+1} via

π_i^{ν+1} = π_i^ν + c_ν h_i(x^ν), i = 1, 2, ..., m,    (4.24)

where {c_ν}_{ν=0}^∞ is a nondecreasing sequence of positive numbers, called stepsizes, and return to the beginning of the iterative step.
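To make the scheme concrete, here is a sketch (an illustrative equality-constrained toy problem, not from the text) of Algorithm 4.3.1 for minimize x₁² + x₂² s.t. x₁ + x₂ = 2, where step (i) has the closed-form solution x = −(π/2)(1, 1):

```python
# Primal-dual scheme on: minimize x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 2 = 0.
# Optimal primal solution: x* = (1, 1); optimal multiplier: pi* = -2.

def lagrangian_argmin(pi):
    """Step (i): minimize L(x, pi) = x1^2 + x2^2 + pi*(x1 + x2 - 2) over R^2.
    Setting the gradient to zero gives x1 = x2 = -pi/2."""
    return (-pi / 2.0, -pi / 2.0)

pi, c = 0.0, 0.5                                   # initial multiplier, stepsize
for _ in range(100):
    x = lagrangian_argmin(pi)                      # step (i), eq. (4.23)
    h = x[0] + x[1] - 2.0                          # constraint value h(x^nu)
    pi = pi + c * h                                # step (ii), eq. (4.24)

print(round(x[0], 4), round(x[1], 4), round(pi, 4))
```

The multiplier recursion here is π ← 0.5π − 1, a contraction with fixed point π* = −2, so the scheme converges geometrically for this toy instance.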
The general scheme of Algorithm 4.3.1 can give rise to various specific algorithms, depending on the particular algorithm employed in (4.23) and the choice of stepsizes in (4.24). Although a detailed study of general primal-dual algorithms is beyond the scope of this book, we will examine in detail the convergence of certain specific primal-dual algorithms (see Chapter 6, in particular, the discussion in Section 6.3).

From the structure of Algorithm 4.3.1, we can assess some of its drawbacks. First, the positive definiteness of L* (referred to as local convexity) is necessary for Theorem 4.3.1. Without it the dual problem (4.19)-(4.20) is not well-defined and the iteration (4.24) is not meaningful. Second, slow convergence of the iterates in (4.24) usually necessitates many repeated returns to the minimization in (4.23), thus limiting the usefulness of the whole algorithm to problems whose structure makes these minimizations efficient. This happens in several important areas of applications when the original problem is separable. This means (see Definition 1.3.1) that the vector x ∈ ℝ^n can be partitioned into q subvectors x^l ∈ ℝ^{n_l}, l = 1, 2, ..., q, such that Σ_{l=1}^q n_l = n, and the objective function and the constraints separate into sums of functions in the following form:

Minimize Σ_{l=1}^q F_l(x^l)    (4.25)
s.t. Σ_{l=1}^q h_l(x^l) = 0.    (4.26)

In this formulation, h_l : ℝ^{n_l} → ℝ^m, and the equations (4.26) represent the original equality constraints h_i(x) = 0, i = 1, 2, ..., m. When inequality constraints are present in the original problem, i.e., g_i(x) ≤ 0, i = 1, 2, ..., p, then in the separable formulation they take the form Σ_{l=1}^q g_l(x^l) ≤ 0, where g_l : ℝ^{n_l} → ℝ^p. For such separable problems the minimization involved in (4.23) decomposes into small subproblems. This class of problems is later treated in detail (see Chapter 7).
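The decomposition of step (4.23) can be sketched on a small invented instance: minimize x₁² + (x₂ − 3)² subject to x₁ + x₂ = 2, written separably with h₁(x₁) = x₁ and h₂(x₂) = x₂ − 2, so each block's Lagrangian minimization is solved independently:

```python
# Decomposed solve of step (4.23) for a separable toy problem (invented):
# minimize F_1(x1) + F_2(x2) = x1^2 + (x2 - 3)^2  s.t.  h_1(x1) + h_2(x2) = 0,
# with h_1(x1) = x1 and h_2(x2) = x2 - 2  (i.e. x1 + x2 = 2).

def sub1(pi):   # block 1: min_t t^2 + pi*t           ->  t = -pi/2
    return -pi / 2.0

def sub2(pi):   # block 2: min_t (t - 3)^2 + pi*(t-2) ->  t = 3 - pi/2
    return 3.0 - pi / 2.0

pi = 0.0
for _ in range(200):                 # dual ascent as in Algorithm 4.3.1
    x1, x2 = sub1(pi), sub2(pi)      # (4.23) splits into 2 small subproblems
    pi = pi + 0.4 * (x1 + x2 - 2.0)  # ascent on the single coupling constraint

print(round(x1, 3), round(x2, 3), round(pi, 3))
```

The two subproblems never communicate directly; only the multiplier of the coupling constraint is shared, which is what makes such problems attractive for parallel decomposition. Here the iterates converge to x* = (−0.5, 2.5) with π* = 1.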

4.4 Augmented Lagrangian Methods

Augmented Lagrangian methods, also known as multiplier methods, are a class of very effective general methods for constrained nonlinear optimization. They combine the basic idea of penalty function methods (see Section 4.1) with the primal-dual algorithmic approach, based on classic Lagrange duality theory (see above). Penalty methods suffer from slow convergence and numerical instabilities that result from the need to increase the penalty parameters c_ν in equation (4.5) in order to achieve feasibility. Classical Lagrangian methods of the primal-dual type have the drawbacks explained in the previous section. But augmented Lagrangian methods, when properly employed and under mild technical conditions, can be made to behave better than both.

We look again at the constrained nonlinear optimization problem (4.14)-(4.16), assuming now that F : ℝ^n → ℝ is a convex function and that Q ⊆ ℝ^n is a nonempty polyhedral set, i.e., it can be expressed as the intersection of finitely many half-spaces. The basic idea of augmented Lagrangian methods is to replace first this original problem by an equivalent problem

Minimize F(x) + (1/c) f(c h(x))    (4.27)
s.t. h(x) = 0,    (4.28)
x ∈ Q,    (4.29)

where h : ℝ^n → ℝ^m is the vector of functions h = (h_i); thus (4.28) represents the same equality constraints as (4.15). For this problem to be equivalent to the original problem, the function f must have the property that f(0) = 0, and the scalar parameter c is positive. The Lagrangian of this equivalent problem, called the augmented Lagrangian, has the form

L_c(x, π) = F(x) + ⟨π, h(x)⟩ + (1/c) f(c h(x)),    (4.30)

and the dual problem is

Maximize g_c(π)    (4.31)
s.t. π ∈ ℝ^m,    (4.32)

where the dual function is

g_c(π) = min_{x ∈ Q} L_c(x, π).    (4.33)

Thus, keeping the original constraints (4.15) as (4.28) in the equivalent problem, in addition to having them built into (4.27), really means that a penalty term of the form (1/c) f(c h(x)) has been added to the Lagrangian rather than to the original objective function F(x). To generate an algorithmic scheme an iterative process is used, which alternates between the minimization of the augmented Lagrangian (4.30) and the application of an ascent iteration to the dual problem.
The method was originally proposed with the function f(z) = ½‖z‖² and was later extended to functions of the form f(z) = Σ_{i=1}^m φ(z_i), where φ : ℝ → ℝ belongs to the class of penalty functions P_e defined as follows.

Definition 4.4.1 (The Class P_e of Penalty Functions) The function φ : ℝ → ℝ belongs to the class of penalty functions P_e if it has the following properties:
(i) φ is continuously differentiable and strictly convex on ℝ.
(ii) φ(0) = 0 and (dφ/dt)(0) = 0.
(iii) lim_{t→−∞} (dφ/dt)(t) = −∞ and lim_{t→+∞} (dφ/dt)(t) = +∞.

Examples of functions in this class are:
φ(t) = ½t², this is the classic case with f(z) = ½‖z‖².
φ(t) = p⁻¹|t|^p, for p > 1.
φ(t) = p⁻¹|t|^p + ½t², for p > 1.
φ(t) = cosh(t) − 1, where cosh(t) is the hyperbolic cosine function.
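One can spot-check the defining properties of P_e numerically; the sketch below verifies (ii), samples convexity, and checks the derivative growth required by (iii) for φ(t) = cosh(t) − 1, whose derivative sinh(t) indeed tends to ±∞:

```python
import math

# Numerical spot-check of the P_e properties for phi(t) = cosh(t) - 1.
phi  = lambda t: math.cosh(t) - 1.0
dphi = lambda t: math.sinh(t)            # derivative of phi

assert phi(0.0) == 0.0 and dphi(0.0) == 0.0          # property (ii)
for t in [-2.0, -0.5, 0.5, 2.0]:                     # convexity samples:
    assert phi(t) > phi(0.0) + dphi(0.0) * t         # phi lies above its tangent
assert dphi(50.0) > 1e20 and dphi(-50.0) < -1e20     # property (iii) growth
print("phi passes these P_e checks")
```

Such finite checks do not prove membership in P_e, of course; they only illustrate what the three properties demand of φ.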
The next proposition expresses the gradient of the dual function in terms of the constraint functions. We will assume that x* is a local solution of problem (4.27)-(4.29) and a regular point of the constraints (4.28). Denote by x(π) the unique solution of (4.33) in the neighborhood of x*.

Proposition 4.4.1 If all involved functions F(x), h_i(x), i = 1, 2, ..., m, and x(π) are continuously differentiable, then

∇g_c(π) = h(x(π)).    (4.34)

Proof From (4.33) and the definition of x(π) we have

g_c(π) = F(x(π)) + ⟨π, h(x(π))⟩ + (1/c) f(c h(x(π))).    (4.35)

From this we obtain, by the chain rule,

∂g_c/∂π_i = Σ_{j=1}^n (∂x_j/∂π_i) [ ∂F/∂x_j + Σ_{k=1}^m π_k (∂h_k/∂x_j) + Σ_{k=1}^m (∂f/∂z_k)(c h(x(π))) (∂h_k/∂x_j) ] + h_i(x(π)),    (4.36)

where all derivatives of F and h_k are evaluated at x(π). Denoting by ∇h(x) the n × m matrix whose ith column consists of the gradient ∇h_i(x), and by ∇x(π) the m × n matrix whose jth column consists of the gradient ∇x_j(π), we may rewrite this as

∇g_c(π) = ∇x(π) [ ∇F(x(π)) + ∇h(x(π)) π + ∇h(x(π)) ∇f(c h(x(π))) ] + h(x(π)).    (4.37)

The vector which multiplies the matrix ∇x(π) is nothing but the gradient with respect to x of the augmented Lagrangian, ∇_x L_c(x(π), π), which is zero by the definition of x(π); thus we obtain (4.34). ■
Let φ ∈ P_e be any function of the class of penalty functions given in Definition 4.4.1. In order to formulate the augmented Lagrangian algorithmic scheme for this class of functions we need to define, for each i = 1, 2, ..., m, the scalar quantities

β_i(π, c) = ∫₀¹ (d²φ/dt²)(α c h_i(x(π))) dα,    (4.38)

where the function under the integral sign is the second derivative of φ at the point t = α c h_i(x(π)). Let B(π, c) = diag(β₁(π, c), β₂(π, c), ..., β_m(π, c)) be the diagonal m × m matrix with the β_i's on its diagonal. If φ(t) = ½t² we get β_i(π, c) = 1 for all i, and B(π, c) is the identity matrix. In this case, the augmented Lagrangian algorithmic scheme, given next, becomes the classic quadratic method of multipliers.

Algorithm 4.4.1 The Augmented Lagrangian Algorithmic Scheme

Step 0: (Initialization.) π⁰ ∈ ℝ^m is arbitrary.
Step 1: (Iterative step.) Given π^ν, for some ν ≥ 0,
(i) Solve the minimization problem

x^ν = argmin_{x ∈ Q} L_{c_ν}(x, π^ν).    (4.39)

(ii) Do an ascent step to calculate π^{ν+1} via

π^{ν+1} = π^ν + c_ν B(π^ν, c_ν) ∇g_{c_ν}(π^ν),    (4.40)

where {c_ν}_{ν=0}^∞ is an appropriately chosen sequence of positive parameters, and return to the beginning of the iterative step.

This is quite a general scheme which can be made specific according to the method chosen for minimization of the augmented Lagrangian in (4.39) and the choice of the sequence {c_ν}_{ν=0}^∞. When calculating the entries of B(π^ν, c_ν) according to (4.38) we should remember that, by definition, x(π^ν) = x^ν. The iteration formula (4.40) is a deflected-gradient type iteration whose behavior depends on {c_ν}_{ν=0}^∞ and on the matrix B(π^ν, c_ν), which deflects the direction of the gradient ∇g_{c_ν}(π^ν). For functions φ ∈ P_e, the ascent nature of (4.40) can be guaranteed. This formula can be simplified in the following way. By using a first-order Taylor expansion formula for the function dφ/dt around the point t = 0, and using the fact that (dφ/dt)(0) = 0, we get

(dφ/dt)(t) = t ∫₀¹ (d²φ/dt²)(αt) dα.    (4.41)

This means that (dφ/dt)(c h_i(x(π))) = β_i(π, c) c h_i(x(π)), and so the iteration (4.40) is actually identical with

π_i^{ν+1} = π_i^ν + (dφ/dt)(c_ν h_i(x^ν)), i = 1, 2, ..., m.    (4.42)

Finally, it is worth noting that with constant parameters c_ν = c > 0, for all ν ≥ 0, the sequence of matrices {B(π^ν, c)}_{ν=0}^∞ tends to the identity matrix, as x(π^ν) → x*, assuming that (d²φ/dt²)(0) = 1, thus making (4.40) a fixed-stepsize, steepest ascent iteration as π^ν tends to π*, the optimal solution of the dual problem.
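With φ(t) = ½t², iteration (4.42) is the classic quadratic method of multipliers. A sketch on the toy problem minimize x₁² + x₂² s.t. x₁ + x₂ = 2 (illustrative, not from the text), where the inner minimization (4.39) has a closed form:

```python
# Quadratic method of multipliers (phi(t) = t^2/2) on:
#   minimize x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 2 = 0.
# Augmented Lagrangian: L_c = x1^2 + x2^2 + pi*h + (c/2)*h^2.

def aug_lagrangian_argmin(pi, c):
    """By symmetry of the strictly convex objective, x1 = x2 = t; solving
    d/dt [2t^2 + pi*(2t - 2) + (c/2)*(2t - 2)^2] = 0 gives
    t = (2c - pi) / (2 + 2c)."""
    t = (2.0 * c - pi) / (2.0 + 2.0 * c)
    return (t, t)

pi, c = 0.0, 1.0                               # fixed penalty parameter c_nu = 1
for _ in range(60):
    x = aug_lagrangian_argmin(pi, c)           # step (i), eq. (4.39)
    h = x[0] + x[1] - 2.0
    pi = pi + c * h                            # step (ii) in the form (4.42)

print(round(x[0], 4), round(x[1], 4), round(pi, 4))
```

Unlike the pure penalty scheme, the multiplier update lets the iterates converge to x* = (1, 1) and π* = −2 with the penalty parameter held fixed, which is precisely the numerical advantage discussed above.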
We wish now to analyze the convergence of Algorithm 4.4.1 when applied to the original problem (4.27)-(4.29) for the case of linear equality constraints, i.e., h(x) = Ax − b, for some given m × n matrix A and given vector b ∈ ℝ^m. It turns out that this algorithm is then closely related to the proximal minimization algorithm (Algorithm 3.1.2) that we studied earlier (see Chapter 3). Specifically, the augmented Lagrangian algorithm is then equivalent to the proximal minimization algorithm applied to the dual problem (4.31)-(4.32). Let us assume here that the minimum in (4.39) is always uniquely attained, and introduce the new vector variable z = Ax − b. Then

min_{x ∈ Q} L_{c_ν}(x, π^ν) = min_{x ∈ Q} ( F(x) + ⟨π^ν, Ax − b⟩ + (1/c_ν) f(c_ν(Ax − b)) ),    (4.43)

and we may write

(x^ν, z^ν) = argmin_{z = Ax − b, x ∈ Q, z ∈ ℝ^m} ( F(x) + ⟨π^ν, z⟩ + (1/c_ν) f(c_ν z) ),    (4.44)

where z^ν = Ax^ν − b. This constrained optimization problem has a dual function

g̃(μ) = min_{x ∈ Q, z ∈ ℝ^m} ( F(x) + ⟨π^ν, z⟩ + (1/c_ν) f(c_ν z) + ⟨μ, Ax − b − z⟩ ),    (4.45)

and it decomposes into two minimizations:


g̃(μ) = min_{x ∈ Q} g̃₁(x, μ) + min_{z ∈ ℝ^m} g̃₂(z, μ),    (4.46)

where
g̃₁(x, μ) = F(x) + ⟨μ, Ax − b⟩,    (4.47)

and
g̃₂(z, μ) = ⟨π^ν − μ, z⟩ + (1/c_ν) f(c_ν z).    (4.48)

Applying a duality theorem under the appropriate conditions (such as Theorem 4.3.1) we are guaranteed that an optimal dual solution μ* exists, i.e., μ* = argmax_{μ ∈ ℝ^m} g̃(μ). The first right-hand side summand of (4.46),

min_{x ∈ Q} g̃₁(x, μ) = g(μ) = min_{x ∈ Q} ( F(x) + ⟨μ, Ax − b⟩ ),    (4.49)

is precisely the dual function (4.18) of the original problem (4.27)-(4.29). The unconstrained minimum in the second summand of (4.46), when μ = μ*, is attained at z^ν, for which

μ_i* = π_i^ν + (dφ/dt)(c_ν z_i^ν), for i = 1, 2, ..., m,    (4.50)

as follows from solving the system ∂g̃₂/∂z_i (z, μ*) = 0, i = 1, 2, ..., m. From (4.34) with h(x) = Ax − b, along with (4.42) and (4.50), we then obtain that μ* = π^{ν+1}. With similar calculations we know that the minimum of

g̃₂(z, μ) = ⟨π^ν − μ, z⟩ + (1/c_ν) Σ_{i=1}^m φ(c_ν z_i)    (4.51)

is attained at a point z* for which

μ_i − π_i^ν = (dφ/dt)(c_ν z_i*), i = 1, 2, ..., m.    (4.52)

In order to obtain the minimal value of (4.51), i.e., to calculate min_z g̃₂(z, μ), we must extract z* from (4.52) and substitute it into (4.51). This cannot be done unless we specify φ(t); taking, for example, φ(t) = ½t², we obtain

min_{z ∈ ℝ^m} g̃₂(z, μ) = −(1/(2c_ν)) ‖μ − π^ν‖².    (4.53)

Combining this with (4.46)-(4.49) and with the fact that μ* = π^{ν+1}, we
74 Penalty Methods, Barrier Methods and Augmented Lagrangians Chap. 4

conclude that

= argmax g(/z) = 7r"+1 = argmax (^(g) - || n - tt" ||2), (4.54)

Downloaded from https://academic.oup.com/book/53915/chapter/422193716 by OUP site access user on 12 May 2024


which is precisely the quadratic proximal minimization algorithm (i.e., Al­
gorithm 3.1.2 with /(x) = | || x ||2) applied to the dual problem (4.19)-
(4.20) of the original problem. Whether or not this equivalence is true—or
under what conditions it could be true—for general Bregman functions, is
still an open problem. Another specific instance in which a similar equiva­
lence holds is between the exponential method of multipliers and the PMD
algorithm (Algorithm 3.1.2) with an entropic distance function (see Section
4.5 for references).
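The equivalence (4.54) is easy to confirm numerically on an equality-constrained quadratic program. The sketch below (our own illustration with invented data, taking $\Omega = \mathbb{R}^n$ and $\varphi(t) = \frac{1}{2}t^2$) performs one multiplier-method step and, independently, one quadratic proximal step on the dual function $g$, and checks that the two produce the same multiplier.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, c = 5, 2, 2.0
Q = np.diag(rng.uniform(1.0, 3.0, n))   # F(x) = 0.5 x'Qx + q'x
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
pi = rng.standard_normal(m)             # current multiplier pi^nu

# One multiplier-method step with phi(t) = t^2/2: minimize the quadratic
# augmented Lagrangian F(x) + <pi, Ax-b> + (c/2)||Ax-b||^2, then update pi.
x_plus = np.linalg.solve(Q + c * A.T @ A, -(q + A.T @ pi) + c * A.T @ b)
pi_next = pi + c * (A @ x_plus - b)

# One quadratic proximal step on the dual: maximize g(mu) - ||mu - pi||^2/(2c),
# where g(mu) = min_x F(x) + <mu, Ax-b> has gradient A x(mu) - b with
# x(mu) = -Q^{-1}(q + A' mu).  Setting the prox gradient to zero yields a
# linear system in mu.
M = np.eye(m) / c + A @ np.linalg.solve(Q, A.T)
rhs = pi / c - A @ np.linalg.solve(Q, q) - b
mu_prox = np.linalg.solve(M, rhs)

# the two updates coincide, as (4.54) asserts
assert np.allclose(pi_next, mu_prox, atol=1e-8)
```

The design point worth noticing is that neither update needs the other: the multiplier step works only with the augmented Lagrangian in $x$, while the proximal step works only with the dual function in $\mu$, yet they land on the same point.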

4.5 Notes and References


Penalty and barrier methods and augmented Lagrangians are important classes of methods for constrained nonlinear optimization; see, for example, Gill et al. (1989) for a survey, and Luenberger (1984) for a textbook treatment.
4.1 A thorough understanding of the properties of penalty functions is important for the study of augmented Lagrangian methods. Consult the survey paper by Bertsekas (1976) or the general remarks in the introduction to his book (1982). Fiacco and McCormick (1968) pioneered the use of quadratic penalty functions in sequential unconstrained minimization; see also Polak (1971). The suggestions to study constrained problems via related unconstrained problems seem to originate in the work of Courant (1943) and Frisch (1955). Penalty methods are discussed in many advanced books in the field, see, e.g., Fletcher (1987) or Minoux (1986), whose presentations provide the foundation for our discussion. A recent unified treatment and review of exact penalty functions and methods appear in Burke (1991).
4.2 Frisch (1955) proposed the logarithmic barrier function. Carroll (1961) proposed the function presented in Example 4.2.1. Most books mentioned above contain material on barrier function methods; see also the discussion in Gill, Murray, and Wright (1981) which is geared toward practical applications of the methods.
4.3 We limited the scope of our discussion of this central subject in constrained optimization to the basics of sequential Lagrangian minimization. For further details on duality theory in convex analysis see Rockafellar (1970) or Hiriart-Urruty and Lemarechal (1993). Luenberger (1984) or Bazaraa and Shetty (1979) can be used as introductions. For a review see Rockafellar (1976c).
4.4 Augmented Lagrangian methods (also referred to as multiplier methods) originated with the works of Hestenes (1969) and Powell (1969); see also Hestenes (1975). Extended treatments of this subject can be found in almost every modern advanced book in the field. An extensive treatise is Bertsekas (1982) where one can also find a thorough treatment of nonquadratic penalty functions and an extensive list of references to the contributions made to this field. The exponential multiplier method, originally proposed by Kort and Bertsekas (1972), has been recently analyzed by Tseng and Bertsekas (1993) who showed its equivalence with the entropic version of the PMD algorithm (Algorithm 3.1.2). Teboulle (1992b) also constructed generalized augmented Lagrangians by using either Bregman distances or Csiszar distances (see Chapter 2).
Part II



ALGORITHMS

The road to wisdom?—Well it’s plain and simple to express:


Err
and err
and err again
but less
and less
and less.
Piet Hein, The Road to Wisdom, 1966
