Lagrange Multiplier Approach to Variational Problems and Applications
Advances in Design and Control
SIAM’s Advances in Design and Control series consists of texts and monographs dealing with all
areas of design and control and their applications. Topics of interest include shape optimization,
multidisciplinary design, trajectory optimization, feedback, and optimal control. The series focuses
on the mathematical and computational aspects of engineering design and control that are usable
in a wide variety of scientific and engineering disciplines.
Editor-in-Chief
Ralph C. Smith, North Carolina State University
Editorial Board
Athanasios C. Antoulas, Rice University
Siva Banda, Air Force Research Laboratory
Belinda A. Batten, Oregon State University
John Betts, The Boeing Company
Stephen L. Campbell, North Carolina State University
Eugene M. Cliff, Virginia Polytechnic Institute and State University
Michel C. Delfour, University of Montreal
Max D. Gunzburger, Florida State University
J. William Helton, University of California, San Diego
Arthur J. Krener, University of California, Davis
Kirsten Morris, University of Waterloo
Richard Murray, California Institute of Technology
Ekkehard Sachs, University of Trier
Series Volumes
Ito, Kazufumi and Kunisch, Karl, Lagrange Multiplier Approach to Variational Problems and
Applications
Xue, Dingyü, Chen, YangQuan, and Atherton, Derek P., Linear Feedback Control: Analysis and
Design with MATLAB
Hanson, Floyd B., Applied Stochastic Processes and Control for Jump-Diffusions: Modeling,
Analysis, and Computation
Michiels, Wim and Niculescu, Silviu-Iulian, Stability and Stabilization of Time-Delay Systems:
An Eigenvalue-Based Approach
Ioannou, Petros and Fidan, Barış, Adaptive Control Tutorial
Bhaya, Amit and Kaszkurewicz, Eugenius, Control Perspectives on Numerical Algorithms and
Matrix Problems
Robinett III, Rush D., Wilson, David G., Eisler, G. Richard, and Hurtado, John E., Applied Dynamic
Programming for Optimization of Dynamical Systems
Huang, J., Nonlinear Output Regulation: Theory and Applications
Haslinger, J. and Mäkinen, R. A. E., Introduction to Shape Optimization: Theory, Approximation,
and Computation
Antoulas, Athanasios C., Approximation of Large-Scale Dynamical Systems
Gunzburger, Max D., Perspectives in Flow Control and Optimization
Delfour, M. C. and Zolésio, J.-P., Shapes and Geometries: Analysis, Differential Calculus, and
Optimization
Betts, John T., Practical Methods for Optimal Control Using Nonlinear Programming
El Ghaoui, Laurent and Niculescu, Silviu-Iulian, eds., Advances in Linear Matrix Inequality Methods
in Control
Helton, J. William and James, Matthew R., Extending H∞ Control to Nonlinear Systems: Control
of Nonlinear Systems to Achieve Performance Objectives
Lagrange Multiplier Approach to Variational Problems and Applications
Kazufumi Ito
North Carolina State University
Raleigh, North Carolina
Karl Kunisch
University of Graz
Graz, Austria
All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission
of the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of trademark
is intended.
Ito, Kazufumi.
Lagrange multiplier approach to variational problems and applications / Kazufumi Ito,
Karl Kunisch.
p. cm. -- (Advances in design and control ; 15)
Includes bibliographical references and index.
ISBN 978-0-898716-49-8 (pbk. : alk. paper)
1. Linear complementarity problem. 2. Variational inequalities (Mathematics).
3. Multipliers (Mathematical analysis). 4. Lagrangian functions. 5. Mathematical
optimization. I. Kunisch, K. (Karl), 1952– II. Title.
QA402.5.I89 2008
519.3--dc22
2008061103
To our families:
Contents
Preface xi
2 Sensitivity Analysis 27
2.1 Generalities 27
2.2 Implicit function theorem 31
2.3 Stability results 34
2.4 Lipschitz continuity 45
2.5 Differentiability 53
2.6 Application to optimal control of an ordinary differential equation 62
Bibliography 327
Index 339
Preface
The objective of this monograph is the treatment of a general class of nonlinear variational
problems of the form
   min f(y, u) over y ∈ Y, u ∈ U,
   subject to e(y, u) = 0, g(y, u) ∈ K,    (0.0.1)
where f : Y ×U → R denotes the cost functional, and e : Y ×U → W and g : Y ×U → Z
are functionals used to describe equality and inequality constraints. Here Y , U , W , and Z
are Banach spaces and K is a closed convex set in Z. A special choice for K is simple box
constraints
K = {z ∈ Z : φ ≤ z ≤ ψ}, (0.0.2)
where Z is a lattice with ordering ≤, and φ, ψ are elements in Z. Theoretical issues which
will be treated include the existence of minimizers, optimality conditions, Lagrange mul-
tiplier theory, sufficient optimality considerations, and sensitivity analysis of the solutions
to (0.0.1) with respect to perturbations in the problem data. These topics will be covered
mainly in the first part of this monograph. The second part focuses on selected computa-
tional methods for solving the constrained minimization problem (0.0.1). The final chapter
is devoted to the characterization of shape gradients for optimization problems constrained
by partial differential equations.
Problems which fit into the framework of (0.0.1) are quite general and arise in ap-
plication areas that were intensively investigated during the last half century. They include
optimal control problems, structural optimization, inverse and parameter estimation prob-
lems, contact and friction problems, problems in image reconstruction and mathematical
finance, and others. The variable y is often referred to as the state variable and u as the
control or design parameter. The relationship between the variables y and u is described by
e. It typically represents a differential or a functional equation. If e can be used to express
the variable y as a function of u, i.e., y = Φ(u), then (0.0.1) reduces to
   min f(Φ(u), u) over u ∈ U, subject to g(Φ(u), u) ∈ K.    (0.0.3)
This is referred to as the reduced formulation of (0.0.1). In the more general case when y
and u are both independent variables linked by the equation constraint e(y, u) = 0 it will
be convenient at times to introduce x = (y, u) in X = Y × U and to express (0.0.1) as
where 0 < a < ā < ∞ are lower and upper bounds for the “control” variable a. In this
example, given a ∈ U , the state y ∈ Y can be uniquely determined by solving the elliptic
boundary value problem (0.0.5) for y, and we obtain a problem of type (0.0.3).
The general approach that we follow in this monograph for the analytical as well as the
numerical treatment of (0.0.1) is based on Lagrange multiplier theory. Let us subsequently
suppose that Y, U , and W are Hilbert spaces and discuss the case Z = U and g(y, u) = u,
i.e., the constraint u ∈ K. We assume that f and e are C¹ and denote by f_y, f_u the Fréchet derivatives of f with respect to y and u, respectively. Let W∗ be the dual space of W and let the duality pairing be denoted by ⟨·, ·⟩_{W∗,W}. The analogous notation is used for Z and Z∗. We form the Lagrange functional
   L(y, u, λ) = f(y, u) + ⟨λ, e(y, u)⟩_{W∗,W},
where λ ∈ W∗ is the Lagrange multiplier associated with the equality constraint e(y, u) = 0
which, for the present discussion, is supposed to exist. It will be shown that a minimizing pair (y, u) satisfies
   e(y, u) = 0,
   ⟨f_u(y, u) + e_u(y, u)∗λ, v − u⟩_{U∗,U} ≥ 0 for all v ∈ K,    (0.0.8)
   f_y(y, u) + e_y(y, u)∗λ = 0.
In the case K = Z the second equation of (0.0.8) results in the equality f_u(y, u) + e_u(y, u)∗λ = 0. In this case a first possibility for solving the system (0.0.8) for the unknowns (y, u, λ) is the use of a direct equation solver of Newton type, for example. Here the Lagrange multiplier λ is treated as an independent variable just like y and u. Alternatively, for (y, u) satisfying e(y, u) = 0 and λ satisfying f_y(y, u) + e_y(y, u)∗λ = 0, the gradient of J(u) of (0.0.4) can be evaluated as
   ∇J(u) = f_u(y, u) + e_u(y, u)∗λ.
Thus the combined step of determining y ∈ Y for given u ∈ K such that e(y, u) = 0 and finding λ ∈ W∗ satisfying f_y(y, u) + e_y(y, u)∗λ = 0 provides a practical means of evaluating the gradient of J.
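As an illustration, consider the finite-dimensional model case in which the state equation is the linear system e(y, u) = Ay + Bu − b = 0 and the cost is quadratic. The following sketch (with illustrative, made-up data) evaluates ∇J(u) by one state solve and one adjoint solve and checks the result against a difference quotient.

```python
# Minimal sketch of adjoint-based gradient evaluation; all data are illustrative.
# State equation e(y, u) = A y + B u - b = 0; cost f(y, u) = 0.5|y-z|^2 + 0.5*alpha*|u|^2.
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 5, 3, 1e-2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible state operator
B = rng.standard_normal((n, m))
b = rng.standard_normal(n)
z = rng.standard_normal(n)

def solve_state(u):
    return np.linalg.solve(A, b - B @ u)            # e(y, u) = 0

def J(u):
    y = solve_state(u)
    return 0.5 * (y - z) @ (y - z) + 0.5 * alpha * u @ u

def gradient(u):
    y = solve_state(u)
    lam = np.linalg.solve(A.T, -(y - z))            # adjoint: f_y + e_y^T lam = 0
    return alpha * u + B.T @ lam                    # grad J(u) = f_u + e_u^T lam

u = rng.standard_normal(m)
v = rng.standard_normal(m)
eps = 1e-6
fd = (J(u + eps * v) - J(u - eps * v)) / (2 * eps)  # difference quotient
print(abs(fd - gradient(u) @ v))                    # should be O(eps^2)
```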
The duality method for solving (0.0.1) requires minimizing L(y, u, λ_k) over (y, u) ∈ Y × K and updating λ by means of
   λ_{k+1} = λ_k + α_k J e(y_k, u_k),    (0.0.15)
where
   (y_k, u_k) = argmin_{y∈Y, u∈K} L(y, u, λ_k),
α_k > 0 is an appropriately chosen step size, and J denotes the Riesz mapping from W onto W∗. It can be argued that e(y_k, u_k) is the gradient of the dual functional d(λ) = inf_{y∈Y, u∈K} L(y, u, λ) at λ_k and that (0.0.15) is in turn a steepest ascent method for the maximization of d(λ) under appropriate conditions. Such
methods are called primal-dual methods. Despite the fact that the method can be justified
only under fairly restrictive convexity assumptions on L(y, u, λ) with respect to (y, u),
it provides an elegant use of Lagrange multipliers and is a basis for so-called augmented
Lagrangian methods.
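For an equality-constrained quadratic model problem the primal-dual ascent takes a particularly simple form, since the inner minimization can be carried out exactly. The following sketch, with illustrative data, implements the update (0.0.15) in Rᵐ, where the Riesz mapping J is the identity.

```python
# Sketch of the primal-dual (Uzawa-type) ascent (0.0.15); data are illustrative.
import numpy as np

Q = np.diag([2.0, 1.0, 4.0])        # positive definite => L(., lam) strictly convex
c = np.array([1.0, -2.0, 0.5])
E = np.array([[1.0, 1.0, 1.0]])     # constraint e(x) = E x - d = 0
d = np.array([1.0])

lam = np.zeros(1)
alpha = 0.5                          # step size of the ascent on the dual functional
for k in range(200):
    x = np.linalg.solve(Q, c - E.T @ lam)   # exact argmin of L(x, lam_k)
    lam = lam + alpha * (E @ x - d)          # lam_{k+1} = lam_k + alpha * e(x_k)

print(E @ x - d)   # constraint residual tends to 0
```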
Augmented Lagrangian methods with K = Z are based on the following problem, which is equivalent to (0.0.1):
   min_{y∈Y, u∈K} f(y, u) + (c/2)|e(y, u)|²_W
   subject to e(y, u) = 0.    (0.0.16)
Under rather mild conditions the quadratic term enhances the local convexity of L(y, u, λ) in the variables (y, u) for sufficiently large c > 0. It helps the convergence of direct solvers based on the necessary optimality conditions (0.0.8). To carry this a step further we introduce the augmented Lagrangian functional
   L_c(y, u, λ) = f(y, u) + ⟨e(y, u), λ⟩ + (c/2)|e(y, u)|²_W.    (0.0.17)
The first order augmented Lagrangian method is the primal-dual method applied to (0.0.17), i.e.,
   (y_k, u_k) = argmin_{y∈Y, u∈K} L_c(y, u, λ_k),
   λ_{k+1} = λ_k + c J e(y_k, u_k).    (0.0.18)
Its advantage over the penalty method is attributed to the fact that local convergence of
(yk , uk ) to a minimizer (y, u) of (0.0.1) holds for all sufficiently large and fixed c > 0,
without requiring that c → ∞. As we noted, Lc (y, u, λ) has local convexity properties un-
der well-studied assumptions, and the primal-dual viewpoint is applicable to the multiplier
update. The iterates (yk , uk , λk ) converge linearly to the triple (y, u, λ) satisfying the first
order necessary optimality conditions, and convergence can improve as c > 0 increases.
Due to these attractive characteristics and properties, the method of multipliers and its sub-
sequent Newton-like variants have been recognized as a powerful method for minimization
problems with equality constraints. They constitute an important part of this book.
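As a small illustration of (0.0.18), the following sketch applies the first order augmented Lagrangian iteration to a two-dimensional problem with a single nonlinear equality constraint. The data are illustrative, the inner minimization is delegated to a quasi-Newton solver, and c is kept fixed — no c → ∞ is needed.

```python
# Sketch of the first-order augmented Lagrangian iteration (0.0.18); illustrative data.
import numpy as np
from scipy.optimize import minimize

def f(x):  return 0.5 * np.sum((x - np.array([2.0, 1.0]))**2)
def e(x):  return np.array([x[0]**2 + x[1] - 1.0])    # equality constraint e(x) = 0

c = 10.0                                              # fixed penalty parameter
lam = np.zeros(1)
x = np.zeros(2)
for k in range(20):
    Lc = lambda x: f(x) + lam @ e(x) + 0.5 * c * e(x) @ e(x)
    x = minimize(Lc, x, method="BFGS").x              # inner minimization of L_c
    lam = lam + c * e(x)                              # multiplier update
print(x, lam, e(x))                                   # e(x) -> 0, lam -> multiplier
```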
In (0.0.16) and (0.0.18) the constraint u ∈ K remained an explicit constraint and was not augmented. To describe a possibility for augmenting inequalities we return to the general form g(y, u) ∈ K and consider inequality constraints with finite rank, i.e., Z = Rᵖ and K = {z ∈ Rᵖ : z_i ≤ 0, 1 ≤ i ≤ p}. Then under appropriate conditions the formulation
   min_{y∈Y, u∈U, q∈Rᵖ} L_c(y, u, λ) + (μ, g(y, u) − q) + (c/2)|g(y, u) − q|²_{Rᵖ}
   subject to q ≤ 0    (0.0.19)
is equivalent to (0.0.1). Here μ ∈ Rᵖ is the Lagrange multiplier associated with the inequality constraint g(y, u) ≤ 0. Minimizing the functional in (0.0.19) over q ≤ 0 results in the augmented Lagrangian functional
   L_c(y, u, λ, μ) = L_c(y, u, λ) + (1/(2c)) ( |max(0, μ + c g(y, u))|²_{Rᵖ} − |μ|²_{Rᵖ} ),    (0.0.20)
where equality and finite rank inequality constraints are augmented. The corresponding augmented Lagrangian method is
   (y_k, u_k) = argmin_{y∈Y, u∈U} L_c(y, u, λ_k, μ_k),
   λ_{k+1} = λ_k + c J e(y_k, u_k),
   μ_{k+1} = max(0, μ_k + c g(y_k, u_k)).
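For a single scalar inequality constraint, the functional (0.0.20) and the max-form multiplier update can be sketched as follows; the example data are illustrative.

```python
# Sketch of the multiplier update derived from (0.0.20) for one inequality g(x) <= 0.
import numpy as np
from scipy.optimize import minimize

def f(x):  return 0.5 * np.sum((x - np.array([2.0, 0.0]))**2)
def g(x):  return x[0] + x[1] - 1.0                   # constraint g(x) <= 0

c, mu = 10.0, 0.0
x = np.zeros(2)
for k in range(20):
    # augmented Lagrangian (0.0.20); max(0, .)**2 is C^1, so BFGS is applicable
    Lc = lambda x: f(x) + (max(0.0, mu + c * g(x))**2 - mu**2) / (2.0 * c)
    x = minimize(Lc, x, method="BFGS").x
    mu = max(0.0, mu + c * g(x))                      # update from the max formula
print(x, mu, g(x))   # complementarity: mu >= 0 and mu * g(x) ~ 0
```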
functional is differentiable with respect to the control and that an optimality principle can
be derived. In terms of shape optimization problems this means that the cost functional is
shape differentiable while the state is not differentiable with respect to the shape.
In summary, Lagrange multiplier theory provides a tool for the analysis of general
constrained optimization problems with cost functionals which are not necessarily C¹ and
with state equations which are in some sense singular. It also leads to a theoretical basis
for developing efficient and powerful iterative methods for solving such problems. The
purpose of this monograph is to provide a rather thorough analysis of Lagrange multiplier
theory and to show its impact on the development of numerical algorithms for problems
which are posed in a function space setting.
Let us give a short description of the book for those readers who do not intend to
read it by consecutive chapters. Chapter 1 provides a variety of tools to establish existence
of Lagrange multipliers and is called upon in all the following chapters. Here, as in other
chapters, we do not attempt to give the most general results, nor do we strive for covering the
complete literature. Chapter 2 is devoted to the sensitivity analysis of abstract constrained
nonlinear programming problems, and it essentially stands by itself. This chapter is of great
importance, addressing continuity, Lipschitz continuity, and differentiability of the solutions
to optimization and optimal control problems with respect to parameters that appear in the
problem formulation. Such results are not only of theoretical but also of practical importance.
The sensitivity equations have been a starting point for the development of algorithmic
concepts for decades. Nevertheless, readers who are not interested in this topic at first may
skip this chapter without missing technical results which might be needed for later chapters.
Chapters 3, 5, and 6 form a unit which is devoted to smooth optimization problems.
Chapter 3 covers first order augmented Lagrangian methods for optimization problems with
equality and inequality constraints. Here, as in the remainder of the book, the equality constraints that we have in mind typically represent partial differential equations. In fact,
during the period in which this monograph was written, the terminology “PDE-constrained
optimization” emerged. Inverse problems formulated as regularized least squares problems
and optimal control problems for (partial) differential equations are primary examples for
the theories that are discussed here. Chapters 5 and 6 are devoted to second order iter-
ative solution methods for equality-constrained problems. Again the equality constraints
represent partial differential equations. This naturally gives rise to the following situation:
The variables with respect to which the optimization is carried out can be classified into
two groups. One group contains the state variables of the differential equations and the
other group consists of variables which represent the control or input variables for optimal
control problems, or coefficients in parameter estimation problems. If the state variables
are considered as functions of the independent controls, inputs, or coefficients, and the cost
functional in the optimization problem is only considered as a functional of the latter, then
this is referred to as the reduced formulation. Applying a second order method to the
reduced functional we arrive at the Newton method for optimization problems with partial
differential equations as constraints. If both state and control variables are kept as inde-
pendent variables and the optimality system involving primal and adjoint variables, which
are the Lagrange multipliers corresponding to the PDE-constraints, is derived, we arrive at
the sequential quadratic programming (SQP) technique: It essentially consists of applying
a Newton algorithm to the first order necessary optimality conditions. The Newton method
for the reduced formulation and the SQP technique are the focus of Chapter 5. Chapter 6
i i
i i
ItoKunisc
i i
2008/6/12
page xvii
i i
Preface xvii
is devoted to second order augmented Lagrangian techniques which are closely related, as
we shall see, to SQP methods. Here the equation constraint is augmented in a penalty term,
which has the effect of locally convexifying the optimization problem. Since augmented
Lagrangians also involve Lagrange multipliers, there is, however, no necessity to let the
penalty parameter tend to infinity and, in fact, we do not suggest doing so.
A second larger unit is formed by Chapters 4, 7, 8, and 9. Nonsmoothness, primal-dual
active set strategy, and semismooth Newton methods are the keywords which characterize
the contents of these chapters. Chapter 4 is essentially a recapitulation of concepts from convex
analysis in a format that is used in the remaining chapters. A key result is the formulation
of differential inclusions which arise in optimality systems by means of nondifferentiable
equations which are derived from Yosida–Moreau approximations and which will serve
as the basis for the primal-dual active set strategy. Chapter 7 is devoted to the primal-
dual active set strategy and its global convergence properties for unilaterally and bilaterally
constrained problems. The local analysis of the primal-dual active set strategy is achieved
in the framework of semismooth Newton methods in Chapter 8. It contains the notion of
Newton derivative and establishes local superlinear convergence of the Newton method
for problems which do not satisfy the classical sufficient conditions for local quadratic
convergence. Two important classes of applications of semismooth Newton methods are
considered in Chapter 9: image restoration and deconvolution problems regularized by the
bounded variation (BV) functional and friction and contact problems in elasticity.
Chapter 10 is devoted to a Lagrangian treatment of parabolic variational inequali-
ties in unbounded domains as they arise in the Black–Scholes equation, for example. It
contains the use of monotone techniques for analyzing parabolic systems without relying
on compactness assumptions in a Gelfand-triple framework. In Chapter 11 we provide
a calculus for obtaining the shape derivative of the cost functional in shape optimization
problems which bypasses the need for using the shape derivative of the state variables of the
partial differential equations. It makes use of the expansion technique that is proposed in
Chapters 1 and 5 for weakly singular optimal control problems, and of the assumption that
an appropriately defined adjoint equation admits a solution. This provides a versatile tech-
nique for evaluating the shape derivative of the cost functional using Lagrange multiplier
techniques.
There are many additional topics which would fit under the title of this monograph
which, however, we chose not to include. In particular, issues of discretization, convergence,
and rate of convergence are not discussed. Here the issue of proper discretization of adjoint
equations consistent with the discretization of the primal equation and the consistent time
integration of the adjoint equations must be mentioned. We do not enter into the discussion
of whether to discretize an infinite-dimensional nonlinear programming problem first and
then to decide on an iterative algorithm to solve the finite-dimensional problems, or the other
way around, consisting of devising an optimization algorithm for the infinite-dimensional
problem which is subsequently discretized. It is desirable to choose a discretization and
an iterative optimization strategy in such a manner that these two approaches commute.
Discontinuous Galerkin methods are well suited for this purpose; see, e.g., [BeMeVe].
Another important area which is not in the focus of this monograph is the efficient solution
of those large scale linear systems which arise in optimization algorithms. We refer the
reader to, e.g., [BGHW, BGHKW], and the literature cited there. The solution of large
scale time-dependent optimal control problems involving the coupled system of primal and
adjoint equations, which need to be solved in opposite directions with respect to time, still
offers a significant challenge, despite the advances that were made with multiple shooting,
receding horizon, and time-domain decomposition techniques. From the point of view of
optimization theory there are several further topics into which one could expand. These
include globalization strategies, like trust region methods, exact penalty methods, quasi-
Newton methods, and a more abstract Lagrange multiplier theory than that presented in
Chapter 1.
As a final comment we stress that for a full treatment of a variational problem in func-
tion spaces, both its infinite-dimensional analysis as well as its proper discretization and the
relation between the two are indispensable. Proceeding from an infinite-dimensional prob-
lem directly to its discretization without such a treatment, important issues can be missed. For
instance discretization without a well-posed analysis may result in the use of inappropriate
inner products, which may lead to unnecessary ill-conditioning, which entails unneces-
sary preconditioning. Inconsiderate discretization may also result in the loss of structural
properties, as for instance symmetry properties.
Chapter 1
Existence of Lagrange Multipliers

In this chapter we consider constrained minimization problems of the form
   min f(x)
   subject to x ∈ C and g(x) ∈ K,    (1.1.1)
With a subset A of X we associate the cone
   A⁺ = {x∗ ∈ X∗ : ⟨x∗, a⟩ ≤ 0 for all a ∈ A},
where ⟨·, ·⟩ (= ⟨·, ·⟩_{X∗,X}) denotes the duality pairing between X∗ and X. Further, for x ∈ X the conical hull of C − {x} is given by
   C(x) = {λ(c − x) : λ ≥ 0, c ∈ C}.
   f′(x∗) + λ∗ ∘ g′(x∗) ∈ −C(x∗)⁺.
If C = X, this condition becomes
   f′(x∗) + λ∗ ∘ g′(x∗) = 0.
For the existence analysis of Lagrange multipliers one makes use of the following tangent cones which approximate the feasible set M at x∗:
   T(M, x∗) = { x ∈ X : x = lim_{n→∞} (1/t_n)(x_n − x∗), t_n → 0⁺, x_n ∈ M }.
The cone T(M, x∗) is called the sequential tangent cone (or Bouligand cone) and L(M, x∗) the linearizing cone of M at x∗.
Proof. Let x ∈ T(M, x∗). Then there exist sequences {x_n}_{n=1}^∞ in M and {t_n} with lim_{n→∞} t_n = 0 such that x = lim_{n→∞} (1/t_n)(x_n − x∗). By the definition of Fréchet differentiability there exists a sequence {r_n} such that
   f′(x∗)x = lim_{n→∞} (1/t_n) f′(x∗)(x_n − x∗) = lim_{n→∞} (1/t_n)(f(x_n) − f(x∗) + r_n)
and lim_{n→∞} (1/|x_n − x∗|) r_n = 0. By optimality of x∗ we have
   f′(x∗)x ≥ lim_{n→∞} |x| (1/|x_n − x∗|) r_n = 0.
We now briefly outline the main steps involved in the abstract approach to prove the existence of Lagrange multipliers. For this purpose we assume that C = X, K = {0}, so that the set of feasible points is described by M = {x : g(x) = 0}. We assume that g′(x∗) : X → Z is surjective. Then by Lyusternik's theorem [Ja, We] we have
   T(M, x∗) = {x ∈ X : g′(x∗)x = 0}.
Consider the set
   B = {(f′(x∗)x + r, g′(x∗)x) : r ≥ 0, x ∈ X} ⊂ R × Z.
Observe that (0, 0) ∈ B and that due to Proposition 1.2 and (1.1.3) the origin (0, 0) is a boundary point of B. Since g′(x∗) is surjective, B has nonempty interior and hence by the Eidelheit separation theorem, which we shall recall below, there exists a closed hyperplane in R × Z which supports B at (0, 0), i.e., there exists (α, λ∗) ≠ 0 in R × Z∗ such that
   α(f′(x∗)x + r) + ⟨λ∗, g′(x∗)x⟩ ≥ 0 for all (r, x) ∈ R⁺ × X.
Theorem 1.3. Let K₁ and K₂ be nontrivial, convex, and disjoint subsets of a normed linear space X. If K₁ is closed and K₂ is compact, then there exists a closed hyperplane strictly separating K₁ and K₂, i.e., there exist x∗ ∈ X∗, β ∈ R, and ε > 0 such that
   ⟨x∗, x⟩ ≤ β − ε for all x ∈ K₁ and ⟨x∗, x⟩ ≥ β + ε for all x ∈ K₂.
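In R² the strictly separating functional of Theorem 1.3 can be computed numerically from a pair of closest points of K₁ and K₂, for instance by alternating projections. The sets and data in the following sketch are illustrative.

```python
# Sketch: strict separation of a closed half-plane and a compact ball in R^2.
import numpy as np

def proj_K1(x):                      # K1: closed half-plane {x : x[0] <= 0}
    return np.array([min(x[0], 0.0), x[1]])

def proj_K2(x):                      # K2: closed ball of radius 1 about (3, 1), compact
    c = np.array([3.0, 1.0]); d = x - c
    return c + d / max(1.0, float(np.linalg.norm(d)))

x = np.zeros(2)
for _ in range(200):                 # alternating projections find closest points
    y = proj_K2(x)
    x = proj_K1(y)

xstar = y - x                        # x*: normal of a separating hyperplane
beta = 0.5 * xstar @ (x + y)         # offset halfway between the closest points
eps = 0.5 * xstar @ xstar            # strict separation margin
print(xstar @ x <= beta - eps + 1e-9, xstar @ y >= beta + eps - 1e-9)
```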
Let us give a brief description of the sections of this chapter. Section 1.2 contains the
open mapping theorem already alluded to above. Regularity properties which guarantee
the existence of a Lagrange multiplier in general nonlinear optimal programming problems
in infinite dimensions are analyzed in section 1.3 and applications to parameter estimation
and optimal control problems are described in section 1.4. Section 1.5 is devoted to the
derivation of a first order optimality system for a class of weakly singular optimal control
problems and to several applications which are covered by this class. Finally in section 1.6
several further alternatives for deriving optimality systems are presented in an informal
manner.
Theorem 1.4. Let x̄ ∈ C and ȳ ∈ K, where K is a closed convex set, and let T ∈ L(X, Z). Then the following statements are equivalent:
(a) Z = T C(x̄) − K(ȳ);
(b) 0 ∈ int (T((C − x̄) ∩ X₁) − (K − ȳ) ∩ Z₁),
where X₁ and Z₁ denote the closed unit balls of X and Z.
Proof. Clearly (b) implies (a). To prove the converse we first show that
   C(x̄) = ∪_{n∈N} n((C − x̄) ∩ X₁).    (1.2.1)
The inclusion ⊃ is obvious and hence it suffices to prove that C(x̄) is contained in the set on the right-hand side of (1.2.1). Let x ∈ C(x̄). Then x = λy with λ ≥ 0 and y ∈ C − x̄. Without loss of generality we can assume that λ > 0 and |y| ≥ 1. Convexity of C implies that (1/|y|) y ∈ (C − x̄) ∩ X₁. Let n ∈ N be such that n ≥ λ|y|. Then x = (λ|y|) (1/|y|) y ∈ n((C − x̄) ∩ X₁), and (1.2.1) is verified.
For α > 0 let A_α = α(T((C − x̄) ∩ X₁) − (K − ȳ) ∩ Z₁), and let Ā_α denote its closure. We will show that 0 ∈ int Ā₁. In fact, from (1.2.1) and the analogous equality with C replaced by K it follows from (a) that
   Z = ∪_{n∈N} A_n = ∪_{n∈N} Ā_n.    (1.2.2)
Thus the complete space Z is the countable union of the closed sets Ā_n. By the Baire category theorem there exists some m ∈ N such that int Ā_m ≠ ∅. Let a ∈ int Ā_m. By (1.2.2) there exists k ∈ N with −a ∈ Ā_k. Using Ā_α = αĀ₁ this implies that −(m/k)a ∈ Ā_m. It follows that the half-open interval {λ(−(m/k)a) + (1 − λ)a : 0 ≤ λ < 1} belongs to int Ā_m, and consequently 0 ∈ int Ā_m = m int Ā₁. Thus we have
   0 ∈ int Ā₁.    (1.2.3)
Hence for some ρ > 0
   Z_ρ ⊂ Ā_{1/2} ⊂ A_{1/2} + Z_{ρ/2},    (1.2.4)
where Z_ρ denotes the closed ball of radius ρ in Z. Let y ∈ Z_ρ. Applying (1.2.4) repeatedly one obtains x_i ∈ (C − x̄) ∩ X₁, y_i ∈ (K − ȳ) ∩ Z₁, and r_n ∈ Z_{ρ/2ⁿ} such that y = T u_n − v_n + r_n, with u_n and v_n the partial sums below; for n = 2, for instance,
   y = T((1/2)x₁ + (1/2)²x₂) − ((1/2)y₁ + (1/2)²y₂) + r₂.
Here
   u_n = (1/2)x₁ + ··· + (1/2)ⁿxₙ + (1 − Σ_{i=1}ⁿ (1/2)^i) · 0 ∈ (C − x̄) ∩ X₁
and
   |u_n − u_{n+m}| ≤ Σ_{i=n+1}^{n+m} (1/2)^i ≤ (1/2)ⁿ.
Consequently {u_n}_{n=1}^∞ is a Cauchy sequence and there exists some x ∈ (C − x̄) ∩ X₁ such that lim_{n→∞} u_n = x. Similarly there exists v ∈ (K − ȳ) ∩ Z₁ such that lim_{n→∞} v_n = v. Moreover lim_{n→∞} r_n = 0 and continuity of T implies that y = T x − v.
Definition 1.5. The element x̄ ∈ M is called regular (or, alternatively, x̄ satisfies the regular point condition) if
   0 ∈ int {g(x̄) + g′(x̄)(c − x̄) − k : c ∈ C, k ∈ K}.    (1.3.1)
Finally, (1.3.3) implies (1.3.1) by Theorem 1.4. Thus the three conditions (1.3.1)–(1.3.3) are equivalent. Condition (1.3.1) is used in the work of Robinson (see, e.g., [Ro2]) on the stability of solutions to systems of inequalities, and (1.3.3) is the regularity condition employed in [MaZo, ZoKu] to guarantee the existence of Lagrange multipliers.
Remark 1.3.1. If C = X, then the so-called Slater condition g′(x̄)h ∈ int K(g(x̄)) for some h ∈ X implies (1.3.1). In case C = X = Rⁿ, Z = Rᵐ, K = {y ∈ Rᵐ : y_i = 0 for i = 1, …, k, and y_i ≤ 0 for i = k + 1, …, m}, g = (g₁, …, g_k, g_{k+1}, …, g_m), the constraint g(x) ∈ K amounts to k equality and m − k inequality constraints, and the regularity condition (1.3.3) becomes
   the gradients {g′_i(x̄)}_{i=1}^k are linearly independent, and
   there exists x ∈ X such that    (1.3.4)
      g′_i(x̄)x = 0 for i = 1, …, k,
      g′_i(x̄)x < 0 for i = k + 1, …, m with g_i(x̄) = 0.
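In this finite-dimensional setting (1.3.4) can be tested numerically: linear independence of the equality gradients is a rank test, and the existence of the direction x can be posed as a small linear program. The gradient matrices in the following sketch are illustrative placeholders.

```python
# Sketch: numerical check of the finite-dimensional regularity condition (1.3.4).
import numpy as np
from scipy.optimize import linprog

Ge = np.array([[1.0, 0.0, 0.0]])          # rows: gradients g_i'(xbar), i = 1, ..., k
Ga = np.array([[0.0, 1.0, 1.0]])          # rows: gradients of active inequalities

print(np.linalg.matrix_rank(Ge) == Ge.shape[0])   # linear independence of {g_i'(xbar)}

n = Ge.shape[1]
# variables (x, t): maximize t subject to Ge x = 0, Ga x + t <= 0, |x_i| <= 1, t <= 1
res = linprog(
    c=np.r_[np.zeros(n), -1.0],
    A_ub=np.hstack([Ga, np.ones((Ga.shape[0], 1))]),
    b_ub=np.zeros(Ga.shape[0]),
    A_eq=np.hstack([Ge, np.zeros((Ge.shape[0], 1))]),
    b_eq=np.zeros(Ge.shape[0]),
    bounds=[(-1, 1)] * n + [(None, 1)],
)
print("(1.3.4) holds" if res.status == 0 and res.x[-1] > 1e-9 else "regularity fails")
```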
Theorem 1.6. If the solution x ∗ of (1.1.1) is regular, then there exists an associated Lagrange
multiplier λ∗ ∈ Z ∗ .
This is a convex cone containing the origin (0, 0) ∈ R × Z. Due to (1.3.5) the origin is a boundary point of B. By the regular point condition and Theorem 1.4 with T = g′(x∗) and ȳ = g(x∗) there exists ρ > 0 such that
1.4 Applications
This section is devoted to a discussion of examples in which the general existence result of the previous section is applicable, as well as to other examples in which it fails. Throughout, Ω denotes a bounded domain in Rⁿ with Lipschitz continuous boundary ∂Ω. We use standard Hilbert space notation as introduced in [Ad], for example.
Differently from the previous sections we shall use J to denote the cost functional and f for inhomogeneities in equality constraints representing partial differential equations. Moreover the constraints g(x) ∈ K from the previous sections will be split into equality constraints, denoted by e(x) = 0, and inequality constraints, g(x) ∈ K, with K a nontrivial cone.
Example 1.7. Consider the unilateral obstacle problem
   min J(y) = (1/2) ∫_Ω |∇y|² dx − ∫_Ω f y dx
   over y ∈ H₀¹(Ω) and y(x) ≤ ψ(x) for a.e. x ∈ Ω,    (1.4.1)
where f ∈ L²(Ω) and ψ ∈ H₀¹(Ω). This is a special case of (1.1.1) with X = Z = H₀¹(Ω), K = {ϕ ∈ H₀¹(Ω) : ϕ(x) ≤ 0 for a.e. x ∈ Ω}, C = X, and g(y) = y − ψ. It is well known and simple to argue that (1.4.1) admits a unique solution y∗ ∈ H₀¹(Ω). Moreover, since g′ is the identity, every element of X satisfies the regular point condition. Hence by Theorem 1.6 there exists λ∗ ∈ H₀¹(Ω)∗ such that
   ⟨λ∗, ϕ⟩_{H⁻¹,H₀¹} ≤ 0 for all ϕ ∈ K,
   ⟨λ∗, y∗ − ψ⟩_{H⁻¹,H₀¹} = 0,
   −Δy∗ + λ∗ = f in H⁻¹,
where H⁻¹ = H₀¹(Ω)∗. Under well-known regularity assumptions on ψ and ∂Ω, cf. [Fr, IK1], for example, λ∗ ∈ L²(Ω). This extra smoothness of the Lagrange multiplier does not follow from the general multiplier theory of the previous section.
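A one-dimensional discretization makes the structure of (1.4.1) and its multiplier visible. The following sketch (with illustrative data) solves the discrete obstacle problem by projected Gauss–Seidel and recovers the discrete multiplier λ = f + y″, which should be nonnegative and vanish where the constraint is inactive.

```python
# Sketch: 1-D obstacle problem -y'' = f with y <= psi, solved by projected Gauss-Seidel.
import numpy as np

n, f_val = 99, 25.0                      # interior grid points on (0, 1), constant load
h = 1.0 / (n + 1)
f = f_val * np.ones(n)
psi = 0.05 * np.ones(n)                  # obstacle: y <= psi
y = np.zeros(n)

for sweep in range(5000):                # projected Gauss-Seidel sweeps
    for i in range(n):
        yl = y[i - 1] if i > 0 else 0.0
        yr = y[i + 1] if i < n - 1 else 0.0
        y[i] = min(psi[i], 0.5 * (yl + yr + h * h * f[i]))

ypad = np.r_[0.0, y, 0.0]
lam = f - (2.0 * ypad[1:-1] - ypad[:-2] - ypad[2:]) / h**2   # lambda = f + y''
print(lam.min(), np.abs(lam * (y - psi)).max())              # lambda >= 0, lam*(y-psi) ~ 0
```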
The following result is useful to establish the regular point condition in diverse optimal control and parameter estimation problems. We require some notation. Let L be a closed convex cone in the real Hilbert space U which induces an ordering denoted according to u ≥ 0 if u ∈ L. Let q∗ ∈ U∗ be a nontrivial bounded linear functional on U, with π : U → ker q∗ the orthogonal projection. For ϕ ∈ U, μ ∈ R, and γ ∈ R⁺ define
   g : U → Z := U × R × R
by
   g(u) = (u − ϕ, |u|² − γ², q∗(u) − μ),
where | · | denotes the norm in U, and put K = L × R⁻ × {0} ⊂ Z.

Proposition 1.8. Assume that [ker q∗]^⊥ ∩ L ≠ {0} and let h₀ ∈ [ker q∗]^⊥ ∩ L with q∗(h₀) = 1. If q∗(L) ⊂ R⁺, q∗(ϕ) < μ, and |π(ϕ)|² + μ²|h₀|² < γ², then the set
   M = {u ∈ U : g(u) ∈ K}
is nonempty and each of its elements satisfies the regular point condition.
Proof. Note that U = ker q∗ ⊕ span{h₀} and that the orthogonal projection onto [ker q∗]^⊥ is given by u ↦ q∗(u)h₀ for u ∈ U. To show that M is nonempty, we define
   û = ϕ + (μ − q∗(ϕ))h₀.
Observe that
   û − ϕ = (μ − q∗(ϕ))h₀ ∈ L,
   |û|² = |π(ϕ)|² + μ²|h₀|² < γ²,
   q∗(û) − μ = 0,
so that g(û) ∈ K and û ∈ M.
This will imply the claim. Observe that by the choice of δ we find μ − q∗(ϕ) − q∗(ũ) + s̃ ≥ 0. Hence l ∈ L is defined by
   l = (μ − q∗(ϕ) − q∗(ũ) + s̃)h₀ ∈ L.
Choosing h₁ = π(ϕ − u + ũ) we obtain equality in the first coordinate of (1.4.5). For the second coordinate in (1.4.5) we observe that
   |u|² − γ² + 2(u, h₁ + s̃h₀)
     = |πu|² + μ²|h₀|² − γ² + 2(u, π(ϕ − u + ũ) + s̃h₀)
     ≤ |πu|² + μ²|h₀|² − γ² + |πu|² + |πϕ|² − 2|πu|² + 2γ|ũ| + s̃²|h₀|²
     ≤ μ²|h₀|² − γ² + |πϕ|² + 2δγ(1 + |h₀|²) ≤ −δ
by the definition of δ. Hence there exists r⁻ ∈ R⁻ such that equality holds in the second coordinate of (1.4.5).
Example 1.9. We consider a least squares formulation for the estimation of the potential c in
   −Δy + cy = f in Ω,
   y = 0 on ∂Ω.    (1.4.6)
The Lax–Milgram theorem implies the existence of a variational solution y(c) ∈ H₀¹(Ω) for every c ∈ M. Here we use the fact that H¹(Ω) ⊂ L⁴(Ω) and cy ∈ L^{4/3}(Ω) for c ∈ L²(Ω), y ∈ H¹(Ω), for n ≤ 4. Proposition 1.8 with L = {c ∈ L²(Ω) : c(x) ≥ 0} and
   g(c) = (c, |c|² − γ²)
is applicable and implies that every element of M is a regular point. Using subsequential weak limit arguments one can argue the existence of a solution c∗ to (1.4.7). To apply Theorem 1.6, Fréchet differentiability of c ↦ J(c) at c∗ must be verified. This will follow from Fréchet differentiability of c ↦ y(c) at every c ∈ M, which in turn can be argued by means of the implicit function theorem, for example. The Fréchet derivative of c ↦ y(c) at c ∈ M in direction h ∈ L²(Ω), denoted by y′(c)h, satisfies
   −Δ(y′(c)h) + c y′(c)h = −h y(c) in Ω,
   y′(c)h = 0 on ∂Ω.    (1.4.8)
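The sensitivity equation (1.4.8) is easy to verify numerically in one space dimension. The sketch below (with illustrative data) discretizes −y″ + cy = f by finite differences, solves (1.4.8) for the directional derivative, and compares it with a difference quotient.

```python
# Sketch: finite-difference check of the sensitivity equation (1.4.8) in 1-D.
import numpy as np

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
lap = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
       - np.diag(np.ones(n - 1), -1)) / h**2            # -d^2/dx^2, y(0) = y(1) = 0
f = np.ones(n)

def solve_state(c):                                     # -y'' + c y = f
    return np.linalg.solve(lap + np.diag(c), f)

c = 1.0 + x                                             # a nonnegative potential
hdir = np.sin(np.pi * x)                                # direction h
y = solve_state(c)
dy = np.linalg.solve(lap + np.diag(c), -hdir * y)       # (1.4.8): -w'' + c w = -h y(c)
eps = 1e-5
fd = (solve_state(c + eps * hdir) - y) / eps            # difference quotient
print(np.max(np.abs(fd - dy)))                          # O(eps)
```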
Example 1.10. We revisit Example 1.9, but this time the state variable is considered as an independent variable. This results in the following problem with equality and inequality constraints:
   min J(y, c) = (1/2)|y − z|² + (α/2)|c|²
   subject to e(y, c) = 0 and g(c) ∈ L × R⁻,    (1.4.12)
where g is as in Example 1.9 and e : H₀¹(Ω) × L²(Ω) → H⁻¹(Ω) is defined by
   e(y, c) = −Δy + cy − f.
Note that cy ∈ L^{4/3}(Ω) ⊂ H⁻¹(Ω) for c ∈ L²(Ω), y ∈ H¹(Ω) if n ≤ 4.
Thus the first order optimality condition derived in Example 1.9 above and the present
one coincide, and the adjoint variable of Example 1.9 becomes the Lagrange multiplier of
the equality constraint e(y, c) = 0.
Example 1.11. We consider a least squares formulation for the estimation of the diffusion coefficient a in
   −(a y_x)_x + cy = f in Ω = (0, 1),
   y(0) = y(1) = 0,    (1.4.13)
where f ∈ L²(0, 1) and c ∈ L²(0, 1), c ≥ 0, are fixed. We assume that a is known outside
of an interval I = (β, γ ) with 0 < β < γ < 1 and that it coincides there with the values
of a known function ν ∈ H 1 (0, 1), which satisfies min {ν(x) : x ∈ (0, 1)} > 0. The set of
admissible parameters is then defined by
   M = { a ∈ H¹(I) : a ≥ ν, |a − â|_{H¹(I)} ≤ γ, a(β) = ν(β), a(γ) = ν(γ), and ∫_β^γ a dx = m }.
Here â ∈ H¹(I) is a fixed reference parameter with â(β) = ν(β), â(γ) = ν(γ), and m = ∫_β^γ â dx. Recall that H¹(I) ⊂ C(Ī) and hence a(β) and a(γ) are well defined. The coefficient a appearing in (1.4.13) coincides with ν on the complement of I and with some element of M on I. Consider the least squares formulation
   min (1/2)|y(a) − z|²_{L²(0,1)} + (α/2)|a|²_{H¹(I)}
   subject to a ∈ M and y(a) the solution to (1.4.13).    (1.4.14)
We apply Proposition 1.8 with U = H₀¹(I), endowed with the norm (|u|²_{L²(I)} + |u_x|²_{L²(I)})^{1/2}, L = {a ∈ U : a ≥ 0}, q∗(a) = ∫_β^γ a dx, ϕ = ν − â ∈ U, and μ = 0. The mapping g : U → U × R × R is given by
   g(a) = (a − ϕ, |a|²_{H¹(I)} − γ², q∗(a)).
Let us assume that ∫_β^γ ν dx < m and |ν − â|_{H¹(I)} < γ. Then we have q∗(L) ⊂ R⁺, q∗(ϕ) < m − ∫_β^γ â dx = μ, and
   |πϕ|_{H¹(I)} = |π(ν − â)|_{H¹(I)} ≤ |ν − â|_{H¹(I)} < γ.
Next we present examples where the general existence result of Section 1.3 is not applicable. Different techniques, to be developed in the following sections, will guarantee, however, the existence of a Lagrange multiplier. In these examples we shall again use e for equality and g for inequality constraints.

Example 1.12. Consider a finite-dimensional minimization problem in X = R³ of the form min f(x) subject to e(x) = 0, where
   e(x₁, x₂, x₃) = ( x₁ − x₂² − x₃, x₂² − x₃² ).
At a solution x∗ at which e′(x∗) fails to be surjective, the existence of a multiplier λ∗ satisfying
   f′(x∗) + e′(x∗)∗λ∗ = 0 in X∗
does not follow from Theorem 1.6.
Example 1.13. Here we consider optimal control of the equation with elliptic nonlinearity (Bratu problem)
   −Δy + exp(y) = u in Ω,
   ∂y/∂n = 0 on Γ and y = 0 on ∂Ω \ Γ,    (1.4.16)
where Ω is a bounded domain in Rⁿ, n ≤ 3, with smooth boundary ∂Ω and Γ is a connected, strict subset of ∂Ω. Let H¹_Γ(Ω) = {y ∈ H¹(Ω) : y = 0 on ∂Ω \ Γ}. The following lemma establishes the concept of solution to (1.4.16). Its proof is given at the end of this section.

Lemma 1.14. For every u ∈ L²(Ω), equation (1.4.16) has a unique solution y = y(u) ∈ H¹_Γ(Ω) ∩ L∞(Ω) and there exists a constant C such that
   |y(u)|_{H¹_Γ ∩ L∞} ≤ C(1 + |u|_{L²(Ω)})    (1.4.17)
and
   |y(u₁) − y(u₂)|_{H¹_Γ ∩ L∞} ≤ C|u₁ − u₂|_{L²(Ω)} for all u_i ∈ L²(Ω), i = 1, 2.    (1.4.18)
The control component of any minimizing sequence {(y_n, u_n)} for J is bounded. Together with Lemma 1.14 it is therefore simple to argue that (1.4.19) admits a solution (y∗, u∗) ∈ Y × U. Clearly J and e are continuously Fréchet differentiable. If (y∗, u∗) were a regular point, this would require surjectivity of e′(y∗, u∗) : X → H¹_Γ(Ω)∗. For (δy, δu) ∈ X, e′(y∗, u∗)(δy, δu) is the functional defined by
   ⟨e′(y∗, u∗)(δy, δu), v⟩ = (∇δy, ∇v) + (exp(y∗)δy − δu, v), v ∈ H¹_Γ(Ω).
Since −Δ + exp(y∗)· is an isomorphism from H¹_Γ(Ω) to H¹_Γ(Ω)∗ and H¹_Γ(Ω) is not contained in L∞(Ω) if n > 1, it follows that e′(y∗, u∗) : X → H¹_Γ(Ω)∗ is not surjective if n > 1.
Example 1.15. We consider the least squares problem for the estimation of the vector-valued convection coefficient u in
   −Δy + u · ∇y = f in Ω,
   y = 0 on ∂Ω, div u = 0,    (1.4.20)
from data z ∈ L²(Ω). Here Ω is a bounded domain in Rⁿ, n ≤ 3, with smooth boundary, and f ∈ L²(Ω). To cast the problem in abstract form we choose U = {u ∈ L^{2n}(Ω) : div u = 0}, X = (H₀¹(Ω) ∩ L∞(Ω)) × U, Z = H⁻¹(Ω), K = {0}, and define e : X → Z as the mapping which assigns to (y, u) ∈ X the functional
   v ↦ (∇y, ∇v) + (u · ∇y, v) − (f, v), v ∈ H₀¹(Ω).
Note that e(y, u) is not well defined for (y, u) ∈ H₀¹(Ω) × U, since u · ∇y ∈ L¹(Ω) only. The regularized least squares problem is given by
   min J(y, u) = (1/2)|y − z|²_{L²(Ω)} + (α/2)|u|²_{L^{2n}(Ω)}
   subject to e(y, u) = 0, div u = 0, (y, u) ∈ X.    (1.4.21)
Observe that (u · ∇y, y) = (∇ · (uy), y) = 0. Using this fact and techniques similar to those of the proof of Lemma 1.14 (compare [Tr, Section 2.3] and [IK14, IK15]) it can be shown that for every u ∈ U there exists a unique solution y = y(u) ∈ X of (1.4.20), and for every bounded subset B of U there exists a constant k(B) such that |y(u)|_X ≤ k(B) for all u ∈ B.
Note that e_y(y∗, u∗) is well defined on H₀¹(Ω) ∩ L∞(Ω). But as a consequence of the bilinear term u∗ · ∇δy, which is only in L¹(Ω), e_y(y∗, u∗) is not defined on H₀¹(Ω). The operator e′(y∗, u∗) from X to H⁻¹(Ω) is not surjective if n > 1, and hence (y∗, u∗) does not satisfy the regular point condition.
We now turn to the proof of Lemma 1.14, for which we require the following result from [Tr].

Proof of Lemma 1.14. Let us first argue the existence of a solution y = y(u) ∈ H¹_Γ(Ω) of
   (∇y, ∇v) + (e^y, v)_{H¹_Γ(Ω)∗, H¹_Γ(Ω)} = (u, v) for all v ∈ H¹_Γ(Ω).    (1.4.23)
Since (exp(s₁) − exp(s₂))(s₁ − s₂) ≥ 0 for s₁, s₂ ∈ R, it follows that −Δφ + exp(φ) − γI defines a maximal monotone operator from a subset of H¹_Γ(Ω) to H¹_Γ(Ω)∗ for some γ > 0 [Ba], and hence (1.4.23) has a unique variational solution y = y(u) ∈ H¹_Γ(Ω) for every u ∈ L²(Ω); see [IK15] for details. Testing differences of (1.4.23) with y₁ − y₂ and using the monotonicity of the exponential yields
   |y(u₁) − y(u₂)|_{H¹_Γ(Ω)} ≤ C|u₁ − u₂|_{L²(Ω)}.    (1.4.24)
Throughout the remainder of the proof C will denote a generic constant, independent of u ∈ L²(Ω). To verify (1.4.17) it remains to obtain an L∞(Ω) bound for y = y(u).
Here we used the assumption that n ≤ 3. Employing this estimate in (1.4.25) implies that
   |∇y_k|_{L²} ≤ C|Ω_k|^{1/3}|u|_{L²},    (1.4.26)
where y_k = (y − k)⁺ and Ω_k = {x ∈ Ω : y_k(x) > 0}. We denote by h and k arbitrary real numbers satisfying 0 < k < h < ∞, and we find
   |y_k|⁴_{L⁴} = ∫_{Ω_k} (y − k)⁴ > ∫_{Ω_h} (y − k)⁴ ≥ |Ω_h|(h − k)⁴,
so that, combining the Sobolev embedding H¹ ⊂ L⁴ with (1.4.26),
   |Ω_h| ≤ ( Ĉ|u|⁴_{L²} / (h − k)⁴ ) |Ω_k|^{4/3},    (1.4.27)
where the constant Ĉ is independent of h, k, and u. It will be shown that Lemma 1.16 is applicable to (1.4.27) with ϕ(k) = |Ω_k|, β = 4/3, and K = Ĉ|u|⁴_{L²}. The conditions on k₁ and h₁ can easily be satisfied. In fact, in our case k₁ = 0, h₁ = ∞, and k̂ = Ĉ^{1/4}|u|_{L²} 2^{β/(β−1)} |Ω₀|^{(β−1)/4}. The condition k₁ + k̂ < h₁ is satisfied since
   k̂ = Ĉ^{1/4}|u|_{L²} 2^{β/(β−1)} |Ω₀|^{(β−1)/4} ≤ Ĉ^{1/4}|u|_{L²} 2^{β/(β−1)} |Ω|^{(β−1)/4} < ∞.
We conclude that |Ω_{k̂}| = 0 and hence y ≤ k̂ a.e. in Ω. A uniform lower bound on y can be obtained in an analogous manner by considering y_k = (−(k + y))⁺. We leave the details to the reader. This concludes the proof of (1.4.17). To verify (1.4.18), the H¹ estimate for y(u₁) − y(u₂) is already clear from (1.4.24) and it remains to verify the L∞(Ω) estimate. Let us set y_i = y(u_i), z = y₁ − y₂, z_k = (z − k)⁺, and Ω_k = {x ∈ Ω : z_k > 0} for k ∈ (0, ∞). We obtain
   |∇z_k|²_{L²} = (∇z, ∇z_k) = (u₁ − u₂, z_k) − (e^{y₁} − e^{y₂}, z_k) ≤ (u₁ − u₂, z_k).
Proceeding as above with y and y_k replaced by z and z_k, the desired pointwise upper bound for y₁ − y₂ is obtained. For the lower bound we define z_k = (−(k + z))⁺ for k ∈ (0, ∞) and Ω_k = {x ∈ Ω : z_k > 0} = {x : k + y₁(x) < y₂(x)}. It follows that
   |∇z_k|²_{L²} = −(∇(y₁ − y₂), ∇z_k) = (e^{y₁} − e^{y₂}, z_k) − (u₁ − u₂, z_k) ≤ −(u₁ − u₂, z_k).
From this inequality we obtain the desired uniform pointwise lower bound on y₁ − y₂.
(H4) For every u ∈ D and y(·) as in (H3), e is directionally differentiable at every element of {(y∗ + s(y(t) − y∗), u∗ + st(u − u∗)) : s ∈ [0, 1], t ∈ [0, t_u]} in all directions (ỹ, ũ) ∈ Y₁ × U and
   lim_{t→0⁺} (1/t) ⟨ ∫₀¹ [e′(y∗ + s(y(t) − y∗), u∗ + stv) − e′(y∗, u∗)](y(t) − y∗, tv) ds, λ∗ ⟩_{W,W∗} = 0,
where v = u − u∗.
Note that (H4) is satisfied if (H3) holds with Y replaced by Y₁ and if e, as a mapping from Y₁ × U to W, is Fréchet differentiable with locally Lipschitzian derivative.
Our assumptions require neither surjectivity of e′(y∗, u∗) : Y₁ × U → W, which is required by (1.3.1), nor that e′(y∗, u∗) be well defined on all of Y × U. Below we shall give several examples which illustrate the applicability of the assumptions. For now, we argue that if they are satisfied, then a Lagrange multiplier with respect to the equality constraint exists.
Theorem 1.17. Let (y∗, u∗) be a local solution of (1.5.1) and assume that (H1)–(H4) hold. Then
   e(y∗, u∗) = 0 in W (primal equation),
   G∗λ∗ + J_y(y∗, u∗) = 0 in Y∗ (adjoint equation),    (1.5.3)
   ⟨e_u(y∗, u∗)∗λ∗ + J_u(y∗, u∗), u − u∗⟩_{U∗,U} ≥ 0 for all u ∈ C (optimality).

Proof. Let u ∈ D, set v = u − u∗, choose t_u according to (H3), and assume that t ∈ (0, t_u]. (H2) implies the existence of a solution λ∗ to the adjoint equation. Observe that by (H3), (H4), (1.5.4), and the fact that u∗ is a local solution to (1.5.1),
   ⟨J_u(y∗, u∗) + e_u(y∗, u∗)∗λ∗, u − u∗⟩_{U∗,U} ≥ 0 for all u ∈ C.
U ∗ ,U ≥ 0 for all u ∈ C.
i i
i i
ItoKunisc
i i
2008/6/12
page 19
i i
Note that with (y₁, y₂, u) = (x₁, x₂, x₃) this problem coincides with Example 1.12. We recall that e′(y∗, u∗) is not surjective and the theory of Section 1.3 assuring the existence of a Lagrange multiplier is therefore not applicable. However,
   G∗ = [ 1 0 ; 0 0 ],
and the adjoint equation
   G∗ (λ₁, λ₂)ᵀ = (0, 0)ᵀ
has infinitely many solutions. Thus (H1), (H2) are satisfied. As for (H3), note that (y₁(u), y₂(u)) = (u + u², u).
Example 1.19. We consider the optimal control problem with distributed control
   min J(y, u) = (1/2)|y − z|²_{L²(Ω)} + (α/2)|u|²_{L²(Ω)}    (1.5.6)
subject to
   −Δy + exp(y) = u in Ω,
   ∂y/∂n = 0 on Γ,    (1.5.7)
   y = 0 on ∂Ω \ Γ,
where y(t) is the solution to (1.5.7) with u = u∗ + tv, v ∈ U. Note that λ∗ ∈ L∞(Ω) and that {|y(t)|_{L∞} : t ∈ [0, 1]} is bounded by Lemma 1.14. Moreover |y(t) − y∗|_{Y₁} ≤ c t|v|_{L²} for a constant c independent of t ∈ [0, 1], and thus the pointwise local Lipschitz property of the exponential function implies that the limit in (1.5.8) is zero. (H4) now easily follows.
The considerations of this example remain correct for cost functionals J that are much
more general than the one in (1.5.6). In fact, it suffices that J is weakly lower semicontinuous
from Y × U to R and radially unbounded with respect to u, i.e., lim_{n→∞} |u_n|_{L²} = ∞ implies that lim sup_{n→∞} inf_{y∈Y} J(y, u_n) = ∞. This guarantees existence of a solution (y∗, u∗).
The general regularity assumptions and (H1)–(H4) are satisfied if J : Y × U → R is
continuous and Fréchet differentiable in a neighborhood of (y ∗ , u∗ ) with locally Lipschitz
continuous derivative.
Example 1.20. Next we consider the corresponding boundary control problem
   min J(y, u) = (1/2)|y − z|²_{L²(Ω)} + (α/2)|u|²_{H^s(Γ)}    (1.5.9)
subject to
   −Δy + exp y = f in Ω,
   ∂y/∂n = u on Γ,    (1.5.10)
   y = 0 on ∂Ω \ Γ,
Equation (1.5.10) has a unique solution y = y(u) ∈ H¹_Γ(Ω) ∩ L∞(Ω) for every u ∈ H^s(Γ), and there exists a constant c such that
   |y|_{H¹_Γ ∩ L∞} ≤ c(|u|_{H^s(Γ)} + c) for all u ∈ H^s(Γ).    (1.5.11)
Moreover, c can be chosen such that
   |y(u₁) − y(u₂)|_{H¹_Γ(Ω) ∩ L∞} ≤ c|u₁ − u₂|_{H^s(Γ)} for all u_i ∈ H^s(Γ), i = 1, 2.    (1.5.12)
Example 1.22. We reconsider (1.4.21) from Example 1.15 as a special case of (1.5.1). For this purpose we set Y = H₀¹(Ω), Y₁ = H₀¹(Ω) ∩ L∞(Ω), W = H⁻¹(Ω), and e as in Example 1.15. Let (y∗, u∗) ∈ Y₁ × U denote a solution of (1.4.21). Clearly e is Fréchet differentiable at (y∗, u∗) and the partial derivative G = e_y(y∗, u∗) ∈ L(Y₁, W) is the functional characterized by
   ⟨e_y(y∗, u∗)δy, v⟩_{W,W∗} = (∇δy, ∇v) + (u∗ · ∇δy, v), v ∈ W∗ = H₀¹(Ω).
As a consequence of the quadratic term u∗ · ∇δy, which is only in L¹(Ω), G is not defined on all of Y = H₀¹(Ω). As an operator from Y₁ to W, the operator G is not surjective. Considered as an operator with domain in Y, its adjoint is given by
   G∗w = −Δw − ∇ · (u∗w).
The domain of G∗ contains Y₁ and hence G∗ is densely defined. Moreover its range contains L²(Ω) and thus (H1) as well as (H2) are satisfied. Let U(u∗) ⊂ U be a bounded neighborhood of u∗. Since for every u ∈ U(u∗)
   (∇(y(u) − y∗), ∇v) − ((y(u) − y∗)u, ∇v) = ((u − u∗)y∗, ∇v) for all v ∈ H₀¹(Ω),
it follows that there exists a constant k > 0 such that
   |y(u) − y∗|_{H¹} ≤ k|u − u∗|_{L^{2n}} for all u ∈ U(u∗),    (1.5.13)
and (H3) follows. The validity of (H4) is a consequence of (1.5.13) and the fact that λ∗ is the unique variational solution in H₀¹(Ω) to
   −Δλ∗ − ∇ · (u∗λ∗) = −(y∗ − z)
and hence an element of L∞(Ω).
Remark 1.5.1. Comparing Examples 1.19 and 1.20 with Example 1.22, we observe that the linearization e′(y, u), with (y, u) ∈ Y₁ × U, is well defined on Y × U for Examples 1.19 and 1.20, but it is only defined with domain strictly contained in Y × U for Example 1.22. For none of these examples is e defined on all of Y × U.
Example 1.23. Here we consider the nonlinear optimal control problem with nonlinearity of blowup type:
   min (1/2)|∇(y − z)|²_{L²(Ω)} + (α/2)|u|²_{L²(Γ)} subject to
   −Δy − exp y = f in Ω,
   ∂y/∂n = u on Γ,    (1.5.14)
   y = 0 on ∂Ω \ Γ,
The methodology utilized to consider this example can also be applied to Examples 1.19 and 1.20 provided that Ω is restricted to being two-dimensional. This is essential for (1.5.15) to hold. For Example 1.23 it is essential that the cost functional is radially unbounded with respect to the H¹_Γ(Ω)-norm in the y-component, in order to guarantee that minimizing sequences are bounded. For Examples 1.19 and 1.20 the a priori bound on the y-component of minimizing sequences can be obtained through the state equation.
where y = y(t, x), T > 0, and y₀ is a given initial condition. With (1.6.5) we associate the optimal control problem
   min J(y, u) = (1/6)|y − z|⁶_{L⁶(Q)} + (α/2)|u|²_{L²(Q)}
   subject to u ∈ C, y ∈ H^{1,2}(Q) and (1.6.5),    (1.6.6)
where α > 0, z ∈ L⁶(Q), C is a closed convex set in L²(Q), and H^{1,2}(Q) = {ϕ ∈ L²(Q) : ϕ_t, ϕ_{x_i}, ϕ_{x_i x_j} ∈ L²(Q)}. Realizing the state equation by a penalty term leads to
   min J_ε(y, u) = J(y, u) + (1/ε)|y_t − Δy − y³ − u|²_{L²(Q)}
   subject to u ∈ C, y ∈ H^{1,2}(Q), y|_Σ = 0, y(0, ·) = y₀.    (1.6.7)
It can be shown that for every ε > 0 there exists a solution (y_ε, u_ε) to (1.6.7). Considering the behavior of the pair (y_ε, u_ε) as ε → 0⁺, the question of convergence to a particular solution (y∗, u∗) of (1.6.6) again arises. As before this can be realized by means of adaptation terms:
   min J_ε(y, u) + (1/2)|y − y∗|²_{L²(Q)} + (1/2)|u − u∗|²_{L²(Q)}
   subject to u ∈ C, y ∈ H^{1,2}(Q), y|_Σ = 0, y(0, ·) = y₀.    (1.6.8)
and
   ∫_Q (αu_ε − λ_ε)(u − u_ε) dQ + ∫_Q (u_ε − u∗)(u − u_ε) dQ ≥ 0 for all u ∈ C,    (1.6.11)
where λ_ε must be interpreted as a weak solution in L²(Q) of (1.6.10) (i.e., the inner product with a smooth test function is taken, and partial integration bringing all differentiations onto the test function is carried out). It can be verified that u_ε → u∗ in L²(Q), y_ε ⇀ y∗ weakly in H^{1,2}(Q), and that there exists λ∗ ∈ W^{2,1;6/5}, the weak limit of λ_ε, such that the following optimality system for (1.6.6) is satisfied:
   y_t − Δy − y³ = u in Q,
   −λ_t − Δλ − 3y²λ = −(y − z)⁵ in Q,
   ∫_Q (αu − λ)(v − u) dQ ≥ 0 for all v ∈ C,    (1.6.12)
   y = λ = 0 on Σ,
   y(0, ·) = y₀, λ(T, ·) = 0 in Ω,
where W^{2,1;6/5} = {ϕ ∈ L^{6/5}(Q) : ϕ_t, ϕ_{x_i}, ϕ_{x_i x_j} ∈ L^{6/5}(Q)}; see [Lio1, Chapter I.3].
Chapter 2
Sensitivity Analysis
2.1 Generalities
In this chapter we discuss the sensitivity of solutions to parameterized mathematical pro-
gramming problems of the form
   min f(x, p) over x ∈ C
   subject to e(x, p) = 0, g(x, p) ≤ 0, and ℓ(x, p) ∈ K,    (2.1.1)
where C is a closed convex set in X, and further e : X × P → W represents an equality constraint, g : X × P → Rᵐ a finite-dimensional inequality constraint, and ℓ : X × P → Z an infinite-dimensional affine constraint. The cost f as well as the equality and inequality constraints are allowed to depend on a parameter p ∈ P. Unless otherwise specified, P is a normed linear space, X, W, Z are real Hilbert spaces, and K is a closed convex cone with vertex at 0 in Z. Such problems arise in inverse problems, control problems, and variational inequalities, and some examples will be discussed in Section 2.6.
The cone K induces a natural ordering ≤ by means of z₁ ≤ z₂ if z₁ − z₂ ∈ K. We denote by K⁺ the dual cone given by
   K⁺ = {z ∈ Z∗ : ⟨z, z̃⟩ ≤ 0 for all z̃ ∈ K},
where ⟨·, ·⟩ denotes the duality pairing between Z and Z∗. Suppose that x₀ is the solution to (2.1.1) for the nominal parameter p = p₀. The objective of this section is to investigate
• continuity and Lipschitz continuity of the solution mapping
Due to the local nature of the analysis, f, e, and g need only be defined in a neighborhood of (x₀, p₀). We assume throughout this chapter that they are twice continuously Fréchet differentiable with respect to x and that their first and second derivatives are continuous in a neighborhood of (x₀, p₀). It is assumed that ℓ is affine in x for every p ∈ P and that its first derivative with respect to x is continuous in a neighborhood of p₀. In order to ensure the existence of a Lagrange multiplier at x₀ we assume that
   0 ∈ int { (e(x₀, p₀) + e′(x₀, p₀)(c − x₀), g(x₀, p₀) + g′(x₀, p₀)(c − x₀) + r, ℓ(x₀, p₀) + ℓ′(x₀, p₀)(c − x₀) − k) : c ∈ C, r ∈ R^{m,+}, k ∈ K },    (2.1.2)
where the interior is taken in W × Rᵐ × Z and primes denote the Fréchet derivatives with respect to x. We define the Lagrange functional L : X × P × W∗ × Rᵐ × Z∗ → R by
   L(x, p, λ, μ, η) = f(x, p) + ⟨λ, e(x, p)⟩ + ⟨μ, g(x, p)⟩ + ⟨η, ℓ(x, p)⟩.
With (2.1.2) holding, it follows from Theorem 1.6 that there exists a Lagrange multiplier (λ₀, μ₀, η₀) ∈ W∗ × R^{m,+} × K⁺ such that
   ⟨L′(x₀, p₀, λ₀, μ₀, η₀), c − x₀⟩ ≥ 0 for all c ∈ C,
   e(x₀, p₀) = 0,
   ⟨μ₀, g(x₀, p₀)⟩ = 0, g(x₀, p₀) ≤ 0, μ₀ ≥ 0,    (2.1.3)
   ⟨η₀, ℓ(x₀, p₀)⟩ = 0, ℓ(x₀, p₀) ∈ K, η₀ ∈ K⁺.
To express (2.1.3) in a more compact form we recall that the subdifferential ∂ψ_C of the indicator function
   ψ_C(x) = 0 if x ∈ C, ∞ if x ∉ C,
is given, for x ∈ C, by
   ∂ψ_C(x) = {x∗ ∈ X∗ : ⟨x∗, c − x⟩ ≤ 0 for all c ∈ C}.
The set ∂ψ_C(x) is also referred to as the normal cone to C at x. For convenience we also specify
   ∂ψ_{K⁺}(η) = {z ∈ Z : ⟨z∗ − η, z⟩_{Z∗,Z} ≤ 0 for all z∗ ∈ K⁺} if η ∈ K⁺, and ∂ψ_{K⁺}(η) = ∅ if η ∉ K⁺.
In fact this follows from the following characterization of the normal cone to K + .
Proof. Assume that z ∈ ∂ψ_{K⁺}(η). Then ∂ψ_{K⁺}(η) is nonempty, η ∈ K⁺, and
   ⟨z∗ − η, z⟩ ≤ 0 for all z∗ ∈ K⁺.    (2.1.5)
Without loss of generality one can assume that the coordinates of the inequality constraints g(x, p₀) ≤ 0 and the associated Lagrange multiplier μ₀ are arranged so that μ₀ = (μ₀⁺, μ₀⁰, μ₀⁻) and g = (g⁺, g⁰, g⁻), with g⁺ : X → R^{m₁}, g⁰ : X → R^{m₂}, g⁻ : X → R^{m₃}, m = m₁ + m₂ + m₃, and
   g⁺(x₀, p₀) = 0, μ₀⁺ > 0,
   g⁰(x₀, p₀) = 0, μ₀⁰ = 0,    (2.1.6)
   g⁻(x₀, p₀) < 0, μ₀⁻ = 0.
We further define
   G⁺ = (g⁺)′(x₀, p₀), G⁰ = (g⁰)′(x₀, p₀),
and, with E = e′(x₀, p₀),
   E⁺ = (E, G⁺) : X → W × R^{m₁}.
Note that the coordinates denoted by superscript + behave locally like equality constraints. Define E(z) : X × R → (W × R^{m₁}) × R^{m₂} × Z for z ∈ Z by
   E(z) = ( E⁺ 0 ; G⁰ 0 ; L z ),
where L = ℓ′(x₀, p₀). Condition (H2) is a second order sufficient optimality condition for (2.1.1) at x₀, and (H3) implies that there exists a neighborhood O of ℓ(x₀, p₀) in Z such that E(z) is surjective for all z ∈ O.
The chapter is organized as follows. In Sections 2.2 and 2.3 we discuss the basic results
for establishing Lipschitz continuity of the solution mapping. The implicit function theory
for the generalized equation of the form (2.1.4) is discussed in Section 2.2. In Section 2.3
stability results for solutions to mathematical programming problems including (2.1.1) are
addressed. Sufficient conditions for stability of the solutions are closely related to suffi-
cient optimality conditions for local minimizers. Section 2.3 therefore also contains first
and second order sufficient conditions for local optimality. Sections 2.4 and 2.5 are de-
voted to establishing Lipschitz continuity and directional differentiability of the solution
mapping. For the latter we employ the assumption of polyhedricity of a closed convex
cone K which, together with appropriate additional conditions, implies that the directional
derivative (ẋ, λ̇, μ̇, η̇) in direction q satisfies
⎧
⎪
⎪ L p (x0 , p0 , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗+ μ̇+ + G∗0 μ̇0 + L∗ η̇,
⎨
−ep+ (x0 , p0 )q − E+ ẋ,
0∈ (2.1.7)
⎪
⎪ −gp0 (x0 , p0 )q − G0 ẋ + ∂ψRm2 ,+ (μ̇0 ),
⎩
−p (x0 , p0 )q − Lẋ + ∂ψK̂ + (η̇),
where e+ = ge+ and K̂ + is the dual cone of K̂ = ∪λ>0 λ(K − (x0 , p0 ))∩[η0 ]⊥ . Section 2.6
contains some applications to optimal control of ordinary differential equations. This chapter
is strongly influenced by the work of S. M. Robinson and also by that of W. Alt.
In this section we consider the generalized equation
   0 ∈ F(x, p) + ∂ψ_C(x),    (2.2.1)
where F : X × P → X∗, with X∗ the strong dual space of X, and C is a closed convex set in X. We assume that X and P are normed linear spaces unless specified otherwise. Suppose x₀ ∈ X is a solution to (2.2.1) for a reference parameter p = p₀ ∈ P. Let F(·, p₀) be Fréchet differentiable at x₀ and define the linearized form by
   T x = F(x₀, p₀) + F′(x₀, p₀)(x − x₀) + ∂ψ_C(x).

Definition 2.2 (Strong Regularity). The generalized equation (2.2.1) is called strongly regular at x₀ with associated Lipschitz constant ρ if there exist neighborhoods V of 0 and U of x₀ such that (T⁻¹V) ∩ U, the intersection of U with the restriction of T⁻¹ to V, is single valued and Lipschitz continuous from V to U with modulus ρ.

Note that if C = X, the strong regularity assumption coincides with F′(x₀, p₀)⁻¹ being a bounded linear operator, which is the common condition for the implicit function theorem. The following result is taken from Robinson's work [Ro3].
Theorem 2.3. Suppose that F is Fréchet differentiable with respect to x in a neighborhood of (x₀, p₀), that F and F′ are continuous at (x₀, p₀), and that (2.2.1) is strongly regular at x₀ with associated Lipschitz constant ρ. Then for every ε > 0 there exist neighborhoods N of p₀ and U of x₀ such that for each p ∈ N the equation 0 ∈ F(x, p) + ∂ψ_C(x) has a unique solution x = x(p) in U, and
   |x(p) − x(q)| ≤ (ρ + ε)|F(x(q), p) − F(x(q), q)| for all p, q ∈ N.    (2.2.3)

Proof. For ε > 0 we choose a δ > 0 such that ρδ < ε/(ρ + ε). By strong regularity, there exist neighborhoods V of 0 and U of x₀ such that (T⁻¹V) ∩ U is single valued and Lipschitz continuous from V to U with modulus ρ. Let N be a neighborhood of p₀, and shrink U if necessary, such that for p ∈ N and x ∈ U
   |F′(x, p) − F′(x₀, p₀)| ≤ δ
and h(x, p) := F(x₀, p₀) + F′(x₀, p₀)(x − x₀) − F(x, p) ∈ V, and set Φ_p(x) = (T⁻¹V ∩ U)(h(x, p)).
Note that x ∈ U ∩ Φ_p(x) if and only if x ∈ U and (2.2.1) holds. We show that Φ_p is a strict contraction and that Φ_p maps U into itself. For x₁, x₂ ∈ U we find, using h′(x, p) = F′(x₀, p₀) − F′(x, p), that
   |Φ_p(x₁) − Φ_p(x₂)| ≤ ρ|h(x₁, p) − h(x₂, p)| ≤ ρδ|x₁ − x₂|.
By the Banach fixed point theorem Φ_p has a unique fixed point x(p) in U and for each x ∈ U we have
   |x(p) − x| ≤ (1 − ρδ)⁻¹|Φ_p(x) − x|.    (2.2.5)
It follows from our earlier observation that x(p) is the unique solution to (2.2.1) in U. To verify (2.2.3) with p, q ∈ N we use (2.2.5) with x = x(q) and obtain
   |x(p) − x(q)| ≤ (1 − ρδ)⁻¹|Φ_p(x(q)) − x(q)|,
and hence
   |x(p) − x(q)| ≤ ρ(1 − ρδ)⁻¹|F(x(q), p) − F(x(q), q)|.
Observing that ρ(1 − ρδ)⁻¹ ≤ ρ + ε, the desired estimate (2.2.3) follows.
Corollary 2.4. Assume in addition to the conditions of Theorem 2.3 that P is a normed linear space and that for some ν > 0
   |F(x, p) − F(x, q)| ≤ ν|p − q| for all x ∈ U and p, q ∈ N.
Then |x(p) − x(q)| ≤ (ρ + ε)ν|p − q| for all p, q ∈ N.

It should be remarked that the condition of strong regularity is the weakest possible condition which can be imposed on the values of a function F and its derivative at a point x₀,
so that for each perturbation satisfying the hypothesis of Theorem 2.3, a function x(·) will exist having the properties stated in Theorem 2.3. To see this, one has only to consider a function F : X → X∗ which is Fréchet differentiable at x₀ and satisfies
   0 ∈ F(x₀) + ∂ψ_C(x₀).
Let P be a neighborhood of the origin in X∗, and let
   F(x, p) = F(x₀, p₀) + F′(x₀, p₀)(x − x₀) − p
with p₀ = 0. Choose ε > 0. If there exist neighborhoods N and V and a function x(·) satisfying the properties stated in Theorem 2.3, then with
   T x = F(x₀, p₀) + F′(x₀, p₀)(x − x₀) + ∂ψ_C(x)
we see that the restriction of T⁻¹ to V is a single-valued, Lipschitz continuous function. Therefore the generalized equation 0 ∈ F(x, p₀) + ∂ψ_C(x) is strongly regular at x₀.
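A convenient way to experiment with generalized equations of the form (2.2.1) is the equivalent fixed point characterization x = P_C(x − σF(x)) for closed convex C in Hilbert space, where P_C denotes the metric projection. The following sketch (with illustrative data) solves an affine, strongly monotone instance on a box by projected fixed point iteration.

```python
# Sketch: solving 0 in F(x) + dpsi_C(x) via x = P_C(x - sigma F(x)); illustrative data.
import numpy as np

M = np.array([[2.0, 0.5], [0.5, 1.0]])   # F(x) = M x + q with M positive definite
q = np.array([-2.0, 1.0])
lo, hi = np.zeros(2), np.ones(2)         # C = [0, 1]^2

def P_C(x):                              # projection onto the box C
    return np.clip(x, lo, hi)

x, sigma = np.zeros(2), 0.4
for _ in range(500):
    x = P_C(x - sigma * (M @ x + q))     # a contraction for small sigma > 0

residual = np.linalg.norm(x - P_C(x - (M @ x + q)))
print(x, residual)                       # residual ~ 0: x solves the inclusion
```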
One of the important consequences of Theorem 2.3 is the following theorem on
parametric sensitivity analysis.
Theorem 2.5. Under the conditions of Theorem 2.3 and Corollary 2.4 there exists for each ε > 0 a function α(p) : N → R⁺ satisfying lim_{p→p₀} α(p) = 0, such that
   |x(p) − Φ_p(x₀)| ≤ α(p)|p − p₀|,
where Φ_p(x₀) is the unique solution in U of the linear generalized equation
   0 ∈ F(x₀, p) + F′(x₀, p₀)(x − x₀) + ∂ψ_C(x).    (2.2.6)

Proof. From the proof of Theorem 2.3 it follows that x(p) = Φ_p(x(p)) and thus by strong regularity
   |x(p) − Φ_p(x₀)| ≤ ρ|h(x(p), p) − h(x₀, p)|
     ≤ ρ ∫₀¹ |h′(x₀ + θ(x(p) − x₀), p)| dθ |x(p) − x₀|
     ≤ ρν(ρ + ε)|p − p₀| ∫₀¹ |h′(x₀ + θ(x(p) − x₀), p)| dθ.
Since h′(x, p) = F′(x₀, p₀) − F′(x, p) and F′ is continuous,
   ∫₀¹ |h′(x₀ + θ(x(p) − x₀), p)| dθ → 0
as p → p₀, which proves the claim.

In the case C = X the generalized equation (2.2.1) reduces to the nonlinear equation F(x, p) = 0 and
   Φ_p(x₀) = x₀ + F′(x₀, p₀)⁻¹(F(x₀, p₀) − F(x₀, p)).
i i
i i
ItoKunisc
i i
2008/6/12
page 34
i i
Thus, Theorem 2.5 shows that if F (x0 , ·) is Fréchet differentiable at p0 , then so is x(p) and
∂F
x (p0 ) = −F (x0 , p0 )−1 (x0 , p0 ). (2.2.7)
∂p
In many applications, one might find (2.2.6) significantly easier to solve than (2.2.1). In fact,
the necessary optimality condition (2.1.3) is of the form (2.2.1), and (2.2.6) corresponds to
a quadratic programming problem, as we will discuss in Section 2.4. Thus Theorem 2.5 can
provide a relatively cheap way to find a good approximation to x(p) for p near p0 .
This class of problems contains (2.1.1) as a special case. To facilitate the analysis of (2.3.1)
let (p) denote the set of feasible points, i.e.,
For a given p0 ∈ P and x0 ∈ (p0 ) a local minimizer of (2.3.1), define the set r (p) by
where r > 0 and B̄(x0 , r) is the closure of the open ball with radius r at x0 in X. We further
define the value function μr (p) by
i i
i i
ItoKunisc
i i
2008/6/12
page 35
i i
Theorem 2.6 (Generalized Inverse Function Theorem). Let X and Y be normed linear
spaces and F a closed convex set-valued function from X to Y . Let y0 ∈ F (x0 ) and suppose
that for some η > 0,
y0 + η BY ⊂ F (x0 + BX ).
Then for x ∈ x0 + BX and any y ∈ y0 + int (η BY ),
Proof. Let x ∈ x0 + BX and y − y0 ∈ int (η BY ). For y ∈ F (x) the claim is clearly satisfied
/ F (x). Let δ > 0 be arbitrary and choose yδ ∈ F (x) such that
and therefore we assume y ∈
y = y + (α − )|y − yδ |−1 (y − yδ ).
y ∈ y0 + int (ηBY ).
x ∈ x0 + BX with y ∈ F (x ).
For 1
1+(α−)|y−yδ |−1
∈ (0, 1), we find, using convexity of F and the fact that yδ ∈ F (x),
i i
i i
ItoKunisc
i i
2008/6/12
page 36
i i
The proof of the stability result below relies on the following set-valued fixed point lemma.
For a metric space (X , d) let F(X ) denote the collection of all nonempty closed subsets
of X .
Lemma 2.7. Let (X , d) be a complete metric space and let : X → F(X ) be a set-valued
mapping. Suppose that there exists x0 ∈ X and constants r > 0, α ∈ (0, 1), ∈ (0, r) such
that for all x1 , x2 in the closed ball B̄(x0 , r) the sets (x1 ), (x2 ) are nonempty,
and
Then there exists x̄ ∈ B̄(x0 , r) such that x̄ ∈ (x̄) and d(x0 , x̄) ≤ (1−α)−1 d(x0 , (x0 ))+.
If moreover
for all nonempty closed subsets S1 and S2 of B̄(x0 , r), then the sequence Sk = (Sk−1 ),
with S0 = {x0 }, converges in the Hausdorff metric to a set S̄ satisfying S̄ = (S̄).
Proof. Let δ = α(x0 , (x0 )) and set γ = α −1 (1 − α). By induction one proves the
existence of a sequence xk+1 ∈ (xk ), k = 0, 1, . . . , such that xk ∈ B̄(x0 , r) and
k+1 (2.3.7)
d(xk+ü1 , x0 ) ≤ (1 − α k+1 )(r − ) + γ (1 − 2−i )α i .
i=1
α )(r − ), and (2.3.7) holds for all k. For k ≥ 0 and for every m ≥ 1 we have
k+1
m−1
d(xk+m , xk ) ≤ d(xk+i+1 , xk+i )
i=0
m−1
≤ α k+i δ + (1 − 2−(k+i+1) )γ α k+i+1
i=0
≤ α k (1 − α)−1 (δ + γ α),
i i
i i
ItoKunisc
i i
2008/6/12
page 37
i i
and consequently {xk } is a Cauchy sequence in B̄(xo , r). Since X is complete there exists
x̄ ∈ B̄(x0 , r) with limk→∞ xk = x̄ and d(x̄, x0 ) ≤ 1−α δ
+ . To verify the fixed point
property, note that for every k we have
Theorem 2.8 (Stability). Let x0 ∈ (p0 ) be a regular point. Then for every > 0 there
exist neighborhoods U of x0 , N of p0 such that for p ∈ N the set (p) is nonempty and
for each x ∈ U ∩ C
Proof. Let F be the closed convex set-valued mapping from X into Y given by
g (x0 , p0 )(x − x0 ) + g(x0 , p0 ) − K for x ∈ C,
F (x) =
{} for x ∈
/ C.
The regular point condition implies that there exists an η > 0 such that
η BY ⊂ F (x0 + BX ).
Here we use Theorem 1.4 and the discussion below Definition 1.5. For δ > 0 and r > 0 let
ρ = (η − 2δ r)−1 (1 + r) where we assume that
Let
h(x, p) = g(x0 , p0 ) + g (x0 , p0 )(x − x0 ) − g(x, p)
i i
i i
ItoKunisc
i i
2008/6/12
page 38
i i
and choose a neighborhood of N of p0 and a closed ball B̄(x0 , r) of radius r about x0 such
that for x ∈ U = C ∩ B̄(x0 , r) and p ∈ N
|g (x, p) − g (x0 , p0 )| ≤ δ
and
|g(x0 , p) − g(x0 , p0 )| ≤ δ r.
As in the proof of Theorem 2.3 it can be shown that
Note that x ∈ p (x) implies that g(x, p) ∈ K. We argue that Lemma 2.7 is applicable
with = p and X = C endowed with the metric induced by X. For every x ∈ U the set
p (x) is a closed convex subset of X. (Hence, if X is a reflexive Banach space, then the
metric projection is well defined and Lemma 2.7 will be applicable with = 0.) For every
x1 , x2 ∈ U we have by Theorem 2.6
Hence all conditions of Lemma 2.7 are satisfied with α = 2ρδ and = r(1−2α)
1−α
. Con-
sequently there exists a Cauchy sequence {xk } in C such that xk+1 ∈ p (xk ) and x̄ =
limk→∞ xk ∈ C satisfies x̄ ∈ p (x̄). Moreover, if Sk = p (Sk−1 ) with S0 = {x0 }, then
dH (Sm , Sk ) → 0 as m ≥ k → ∞ and thus S̄ = limk→∞ Sk satisfies S̄ = p (S̄) and
S̄ ⊂ p . Let
i i
i i
ItoKunisc
i i
2008/6/12
page 39
i i
with M = η−1 .
for (x, p) ∈ U × N. Then there exist constants r > 0 and Lr and a neighborhood Ñ of p0
such that
|μr (p) − μr (p0 )| ≤ Lr δ(p, p0 )
for all p ∈ Ñ.
Proof. Since x0 is a local minimizer of (2.3.1) at p0 there exists s > 0 such that
Let Cs = C ∩ B̄(x0 , s). By Theorem 1.4 and the discussion following Definition 1.5 the
regular point condition (2.3.3) also holds with C replaced by Cs . Applying Theorem 2.8
with C = Cs and = 1 there exist neighborhoods U1 of x0 and N1 of p0 such that s (p)
is nonempty and
i i
i i
ItoKunisc
i i
2008/6/12
page 40
i i
and thus f (x0 , p0 ) ≤ f (x̄, p0 ). From (2.3.11) we deduce that |f (x, p) − f (x̄, p0 )| ≤
Lr δ(p, p0 ), and hence
for each x ∈ r (p). The desired estimate follows from (2.3.15) and (2.3.17).
In the above proof we followed the work of Alt [Alt3]. Next, we establish Hölder
continuity of the local minimizers x(p) of (2.3.1). Theorem 2.9 provides Lipschitz conti-
nuity of the value function under the regular point condition. The following example shows
that the local solutions to the perturbed problems may behave rather irregularly.
i i
i i
ItoKunisc
i i
2008/6/12
page 41
i i
problem does not depend on x2 and all (0, x2 ) with x2 ≥ 0 are optimal for p = p0 . On the
other hand if we let f (x1 , x2 , p) = x1 + px2 + 12 (x2 − 1)2 , then for r ≥ 1 and |p| ≤ 1 the
solution to (2.3.18) is given by
x(p) = (0, 1 − p).
Thus the distance |x(p) − x0 | is continuous in p.
This example suggests that some kind of local invertibility should be assumed on
f in order to ensure continuity of the solutions. We define a condition closely related to
sufficient optimality conditions:
There exist κ > 0, β ≥ 1, γ > 0, and neighborhoods Û of x0 and N̂ of p0 such that
f (x, p) ≥ f (x0 , p0 ) + κ |x − x0 |β − γ δ(p, p0 ) (2.3.19)
for all p ∈ N̂ and x ∈ (p) ∩ Û .
The following result shows that this condition is implied by sufficient optimality
conditions at x0 .
Theorem 2.11. Assume that x0 is regular, that (2.3.11)–(2.3.12) hold, and that there exist
constants κ > 0, β ≥ 1 and neighborhood Ũ of x0 such that
f (x, p0 ) ≥ f (x0 , p0 ) + κ |x − x0 |β for all x ∈ (p0 ) ∩ Ũ . (2.3.20)
Then condition (2.3.19) holds.
≥ f (x0 , p0 ) + κ |x − x0 |β − γ δ(p, p0 ),
i i
i i
ItoKunisc
i i
2008/6/12
page 42
i i
Condition (2.3.20) can be verified under the following second order sufficient opti-
mality condition due to Maurer and Zowe [MaZo] .
Theorem 2.12. Assume that K is a closed convex cone in Y and that x0 ∈ (p0 ) is a
regular point, and let L(x, λ) = f (x, p0 ) + λ, g(x, p0 )
Y ∗ ,Y be the Lagrangian for (2.3.1)
at x0 with Lagrange multiplier λ0 . If there exist constants ω > 0, β̄ > 0 such that
where
S = L((p0 ), x0 ) ∩ {h ∈ X : λ0 , g (x0 , p0 )h
Y ∗ ,Y ≥ −β̄ |h|},
then there exist κ > 0 and a neighborhood Ũ of x0 such that (2.3.20) holds with β = 2.
For convenience we recall the definition of the linearizing cone: L((p0 ), x0 ) =
{x ∈ C(x0 ) : g (x0 )x ∈ K(g(x0 ))}, where K(g(x0 )) = {λ(y − g(x0 )) : y ∈ K, λ ≥ 0}.
Proof. All quantities are evaluated at p0 and this dependence will therefore be suppressed.
First we show that every x ∈ = (p0 ) can be represented as
x − x0 = h(x) + z(x) with h(x) ∈ L(, x0 ) and |z(x)| = o(|x − x0 |), (2.3.22)
where o(s)
s
→ 0 as s → 0+ . Let x ∈ and expand g at x0 :
By the generalized open mapping theorem (see Theorem 1.4), there exist α > 0 and k(x) ∈
K(g(x0 )) such that for some z(x) ∈ α |r(x, x0 )| (BX ∩ C(x0 ))
If we put h(x) = x − x0 + z(x), then h(x) ∈ C(x0 ), |z(x)| = o(|x − x0 |), and
1
f (x) ≥ f (x) + λ0 , g(x)
= L(x, λ0 ) = L(x0 , λ0 ) + L (x − x0 , x − x0 ) + r(x − x0 ),
2
(2.3.23)
where |r(x −x0 )| = o(|x −x0 |2 ), and we used λ0 , g(x)
≤ 0 for x ∈ , and L (x0 , λ0 ) = 0.
Now put B = L (x0 , λ0 ) and S = L() ∩ {h ∈ X : λ0 , g (x0 )h
≥ −β̄ |h|} in Lemma
2.13. It implies the existence of δ0 > 0 and γ > 0 such that
i i
i i
ItoKunisc
i i
2008/6/12
page 43
i i
Choose δ ∈ (0, 1) satisfying δ/(1 − δ) < γ and ρ > 0 such that |z(x)| ≤ δ|x − x0 | for
|x − x0 | ≤ ρ. Then for |x − x0 | ≤ ρ
= − λ0 , g (x0 )h(x)
− λ0 , g (x0 )z(x)
+ r̄(x − x0 )
≥ β̄|h(x)| + r1 (x − x0 ),
Proof. Let b = B and choose γ > 0 such that δ = ω − 2bγ − bγ 2 > 0. Then for all
z ∈ X and h ∈ S satisfying |z| ≤ γ |h|
i i
i i
ItoKunisc
i i
2008/6/12
page 44
i i
B(h + z, h + z) ≥ δ0 |h + z|2
with δ0 = δ/(1 + γ )2 .
Remark 2.3.1. An example which illustrates the benefit of including the set {h ∈ X :
λ0 , g (x0 , p0 )h
Y ∗ ,Y ≥ −β̄ |h|} into the definition of the set S on which the Hessian must
be positive definite is given by the one-dimensional example with
In this case = R − , x0 = 0, f (x0 ) = 0, f (x0 ) = −1, and f (x0 ) = 0. We further find that
L(, x0 ) = K and that the Lagrange multiplier is given by λ0 = 1. Consequently for β̄ ∈
(0, 1) we have that S = L(, x0 ) ∩ {h ∈ R : λ0 g (x0 )h ≥ −β̄ |h|} = {0}. Hence (2.3.21) is
trivially satisfied. Let us note that f is bounded below as f (x) ≥ f (x0 ) + max(|x|, 2|x|2 )
for x ∈ K. A similar argument does not hold if f is replaced by f (x) = −x 3 . In this
case x0 = 0, λ0 = 0, f (x0 ) = 0, S = K, and (2.3.21) is not satisfied. Note that in this
case we have lack of strict complementarity; i.e., the constraint is active and simultaneously
λ0 = 0.
Theorem 2.14. Suppose that x0 ∈ (p0 ) is a regular point and that for some β > 0
x − x0 = h(x) + z(x)
where r1 (x, x0 ) = f (xo )z(x) + r(x, xo ) and |r1 (x, x0 )| = o(|x − x0 |). Choose a neighbor-
) of x0 such that |z(x)| ≤ 1 |x − x0 | and |r1 (x, x0 )| ≤ β |x − x0 | for x ∈ (p0 ) ∩ U
hood U ).
2 4
Then |h(x)| ≤ |x − x0 | and f (x) − f (x0 ) ≥ |x − x0 | for x ∈ U .
1 β )
2 4
Note that for C = X we have f (x0 ) + g(x0 , p0 )∗ λ0 = 0 and the second order
condition of Theorem 2.12 involves the set of directions h ∈ L((p0 ), x0 ) for which the
first order condition is violated.
Following [Alt3] we now show that (2.3.19) implies stability of local minimizers.
i i
i i
ItoKunisc
i i
2008/6/12
page 45
i i
Theorem 2.15. Suppose that the local solution x0 ∈ (p0 ) is a regular point and that
(2.3.11), (2.3.12), and (2.3.19) are satisfied. Then there exist real numbers r > 0 and
L > 0 and a neighborhood N of p0 such that the value function μr is Lipschitz continuous
at p0 and for each p ∈ N the following statements hold:
(a) For every sequence {xn } in r (p) with the property that limn→∞ f (xn , p) =
μr (p) it follows that xn ∈ B(x0 , r) for all sufficiently large n.
(b) If there exists a point x(p) ∈ r (p) with μr (p) = f (x(p), p), then x(p) ∈
B(x0 , r), i.e., x(p) is a local minimizer of (2.3.1) and
1
|x(p) − x0 | ≤ L δ(p, p0 ) β .
for all p ∈ Ñ, where Lr is determined in Theorem 2.9. Thus μr is Lipschitz continuous at
p0 by Theorem 2.9. Let p ∈ Ñ be arbitrary and let {xn } in (p) be a sequence satisfying
limn→∞ f (xn , p) = μr (p). By (2.3.19) and Lipschitz continuity of μr
This estimate together with (2.3.26) imply |xn − x0 | < r for all sufficiently large n. This
proves (a).
Similarly, for x(p) ∈ r (p) satisfying μr (p) = f (x(p), p) we have
i i
i i
ItoKunisc
i i
2008/6/12
page 46
i i
Theorem 2.16. Assume (H1)–(H4) hold at the local solution x0 of (2.1.1). Then there
exist neighborhoods N = N (p0 ) of p0 and U = U (x0 , λ0 , μ0 , η0 ) of (x0 , λ0 , μ0 , η0 ) and a
constant M such that for all p ∈ N there exists a unique (xp , λp , μp , ηp ) ∈ U satisfying
⎧
⎪
⎪ L (x, p, λ, μ, η),
⎨
−e(x, p),
0∈ (2.4.1)
⎪
⎪ −g(x, p) + ∂ψRm,+ (μ),
⎩
−(x, p) + ∂ψK + (η),
and
2 (2.4.3)
subject to Ex = b̃ in Ỹ and Gx − c̃ ∈ K̃ ⊂ Z̃,
where Ỹ , Z̃ are Banach spaces, E ∈ L(H, Ỹ ), G ∈ L(H, Z̃), and K̃ is a closed convex
cone in Z̃. If
(a) A·, ·
is a symmetric bounded quadratic form on H ×H and there exists an ω > 0
such that Ax, x
≥ ω |x|2H for all x ∈ ker (E), and
(b) for all (b̃, c̃) ∈ Ỹ × Z̃
Ax + ã, v − x
≥ 0 for all v ∈ S̃(b̃, c̃).
Moreover, if x is a regular point, then there exists (λ̃, η̃) ∈ Ỹ ∗ × Z̃ ∗ such that
⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞
ã A E∗ G∗ x 0
0 ∈ ⎝ b̃ ⎠ + ⎝ −E 0 0 ⎠ ⎝ λ̃ ⎠ + ⎝ 0 ⎠.
c̃ −G 0 0 η̃ ∂ψK̃ + (η̃)
Proof. Since K̃ is closed and convex, S̃(b̃, c̃) is closed and convex as well. Since S̃(b̃, c̃)
is nonempty, there exists a unique w ∈ range E ∗ such that Ew = b̃. Moreover, every
x ∈ S̃(b̃, c̃) can be expressed as x = w + y, where y ∈ ker (E). From (a) it follows that
J is coercive and bounded below on S̃(b̃, c̃). Hence there exists a bounded minimizing
sequence {xn } in S̃(b̃, c̃) such that limn→∞ J (xn ) → inf x∈S̃(b̃,c̃) J (x). Boundedness of
i i
i i
ItoKunisc
i i
2008/6/12
page 47
i i
xn and weak sequential closedness of S̃(b̃, c̃) imply the existence of a subsequence of xn
that converges weakly to some x in S̃(b̃, c̃). Since J is weakly lower semicontinuous,
J (x) ≤ lim inf J (xn )n→∞ . Hence x minimizes J over S̃(b̃, c̃).
For v ∈ S̃(b̃, c̃) and t ∈ (0, 1) we have x + t (v − x) ∈ S̃(b̃, c̃). Thus,
t2
0 ≤ J (x + t (v − x)) − J (x) = t Ax + ã, v − x
+ A(v − x), v − x
2
and letting t → 0+ we have
Ax + ã, v − x
≥ 0 for all v ∈ S̃(b̃, c̃).
The last assertion follows from Theorem 1.6.
Proof of Theorem 2.16. The proof of the first assertion of the theorem is based on the implicit
function theorem of Robinson for generalized equations; see Theorem 2.3. It requires us to
verify the strong regularity condition for the linearized form of (2.4.1) which is given by
⎧
⎪
⎪ L (x0 , p0 , λ0 , μ0 , η0 ) + A(x − x0 ) + E ∗ (λ − λ0 ) + G∗ (μ − μ0 ) + L∗ (η − η0 ),
⎨
−e(x0 , p0 ) − E(x − x0 ),
0∈
⎪
⎪ −g(x 0 , p0 ) − G(x − x0 ) + ∂ψR m,+ (μ),
⎩
−(x0 , p0 ) − L(x − x0 ) + ∂ψK + (η),
where the operators A, E, G, and L are defined in the introduction to this chapter.
If we define the multivalued operator T from X × W × Rm × Z to itself by
⎛ ⎞ ⎛ ⎞⎛ ⎞
x A E ∗ G ∗ L∗ x
⎜ λ ⎟ ⎜ −E 0 0 0 ⎟ ⎜ ⎟
T ⎜ ⎟ ⎜ ⎟⎜ λ ⎟
⎝ μ ⎠ = ⎝ −G 0 0 0 ⎠ ⎝ μ ⎠
η −L 0 0 0 η
⎛ ⎞ ⎛ ⎞
f (x0 , p0 ) − Ax0 0
⎜ Ex0 ⎟ ⎜ 0 ⎟
+⎜ ⎟ ⎜
⎝ −g(x0 , p0 ) + Gx0 ⎠ + ⎝ ∂ψRm,+ (μ) ⎠ ,
⎟
η −L 0 0 0 η
⎛ ⎞ ⎛ ⎞
f (x0 , p0 ) − Ax0 0
⎜ E+ x0 ⎟ ⎜ 0 ⎟
+⎜ ⎝
⎟+⎜
⎠ ⎝
⎟
0 ⎠.
G 0 x0 ∂ψR 2 (μ )
m ,+
i i
i i
ItoKunisc
i i
2008/6/12
page 48
i i
where
To solve (2.4.4) we introduce the quadratic optimization problem with linear con-
straints:
1
min Ax, x
+ a, x
2 (2.4.6)
subject to E+ x = b, G0 x ≤ c, and Lx − d ∈ K.
We verify the conditions in Lemma 2.17 for (2.4.6) with Ỹ = W × Rm1 , Z̃ = Rm2 × Z,
E = E+ , G = (G0 , L), and K̃ = Rm2 ,− × K. Let S be the feasible set of (2.4.6):
S(α, β, γ , δ) = {x ∈ X : E+ x = b, G0 x ≤ c, Lx − d ∈ K},
where the relationship between (α, β, γ , δ) and (a, b, c, d) is given by (2.4.5). Clearly,
x0 ∈ S(0) and moreover x0 is regular point. From Theorem 2.8 it follows that there exists
a neighborhood V of 0 such that for all (α, β, δ, γ ) ∈ V the set S(α, β, γ , δ) is nonempty.
Lemma 2.17 implies the existence of a unique solution x to (2.4.6) for every (α, β, δ, γ ) ∈ V .
By (H1)–(H2) and Theorems 2.12, 2.11, and 2.14 the solution x is Hölder continuous with
exponent 12 with respect to (α, β, γ , δ) ∈ V . Next we show that x = x(α, β, γ , δ) is a
regular point for (2.4.6), i.e.,
⎧⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ 0 E+ 0 ⎬
0 ∈ int ⎝ G0 x − c ⎠ + ⎝ G0 ⎠ (X − x) + ⎝ Rm2 ,+ ⎠ .
⎩ ⎭
Lx − d L −K
This is equivalent to
⎧⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ β E+ 0 ⎬
0 ∈ int ⎝ γ ⎠ + ⎝ G0 ⎠ (X − x0 ) + ⎝ Rm2 ,+ ⎠ .
⎩ ⎭
(x0 , p0 ) + δ L −K
i i
i i
ItoKunisc
i i
2008/6/12
page 49
i i
Since x0 is a regular point, there exists a neighborhood Ṽ ⊂ V of 0 such that this inclusion
holds for all (α, β, γ , δ) ∈ Ṽ . Hence the existence of a solutions to (2.4.4) follows from
Lemma 2.17.
Uniqueness. To argue uniqueness, assume that (xi , λ̃i , μ0i , ηi ), i = 1, 2, are solutions to
(2.4.4). It follows that
E+ (x1 − x2 ) = 0,
L(x1 − x2 ), η1 − η2
≥ 0.
η1 − η2 , L(x − x0 ) + (x0 , p0 ) + δ
= 0,
or equivalently
E ∗ (z)(λ̃1 − λ̃2 , μ01 − μ02 + η1 − η2 ) = 0,
where z = L(x − x0 ) + (x0 , p0 ) + δ and E is defined below (2.1.6). Due to continuous
dependence of x on (α, β, γ , δ) we can assure that z ∈ O for all (α, β, γ , δ) ∈ Ṽ . Hence
(H3) implies that (λ̃1 − λ̃2 , μ01 − μ02 , η1 − η2 ) = 0.
Continuity. By (H3) the operator E ∗ ((x0 , p0 )) has closed range and injective. Hence there
exists > 0 such that
|E ∗ ((x0 , p0 ))| ≥ ||
for all = (λ̃, μ0 , η). Since E ∗ ((x0 , p0 )) − E ∗ (z) ≤ |z − (x0 , p0 )|, there exists a
neighborhood of 0 in X × (W × Rm1 ) × Rm2 × Z, again denoted by Ṽ , such that
|E ∗ (zδ )| ≥ || (2.4.7)
2
for all (α, β, γ , δ) ∈ Ṽ , where zδ = (x0 , p0 ) + Lx − Lx0 + δ. Any solution (x, λ̃, μ0 , η)
satisfies
E ∗ (zδ )(λ̃, μ0 , η) = (−Ax − a, 0).
It thus follows from (2.4.7) that there exists a constant k1 such that for all (α, β, γ , δ) ∈ Ṽ
the solution (x, λ̃, μ̃, η) satisfies
i i
i i
ItoKunisc
i i
2008/6/12
page 50
i i
Henceforth let (αi , βi , γi , δi ) ∈ Ṽ and let (xi , λ̃i , μ0i , ηi ) denote the corresponding solution
to (2.4.4) for i = 1, 2. Then
⎛ ⎞
λ̃1 − λ̃2
α1 − α2 + A(x2 − x1 )
E ∗ (zδ1 ) ⎝ μ01 − μ02 ⎠ = .
η2 , L(x2 − x1 ) + δ2 − δ1
η1 − η 2
|(λ̃1 − λ̃2 , μ01 − μ02 , η1 − η2 )|(W ×Rm1 )×Rm2 ×Z ≤ k2 (|α1 − α2 | + |δ1 − δ2 | + |x1 − x2 |). (2.4.8)
We also find
μ01 − μ02 , G0 (x1 − x2 )
= μ01 , c1 − c2
+ μ01 , c2 − G0 x2
(2.4.10)
− μ02 , G0 x1 − c1
− μ02 , c1 − c2
≥ μ01 − μ02 , c1 − c2
and similarly
η1 − η2 , L(x1 − x2 )
≥ η1 − η2 , d1 − d2
. (2.4.11)
E+ (x1 − x2 ) = E+ w = b1 − b2 .
Thus
κ |v|2 ≤ Av, v
= A(x1 − x2 ), x1 − x2
− 2 Av, w
− Aw, w
≤ − λ̃1 − λ̃2 , b1 − b2
− μ01 − μ02 , c1 − c2
− η1 − η2 , d1 − d2
− a1 − a2 , v + w
− 2 Av, w
− Aw, w
.
It thus follows from (2.4.8), (2.4.12) that there exists a constant k4 such that
|x1 − x2 | ≤ k4 |(α1 − α2 , β1 − β2 , γ1 − γ2 , δ1 − δ2 )|
for all (αi , βi , γi , δi ) ∈ Ṽ . We apply (2.4.8) once again to obtain Lipschitz continuity of
(x, λ̃, μ0 , η) with respect to (α, β, γ , δ) in a neighborhood of the origin. Consequently T
i i
i i
ItoKunisc
i i
2008/6/12
page 51
i i
Local solution. We show that there exists a neighborhood Ñ of p0 such that for p ∈ Ñ the
second order sufficient optimality (2.3.21) is satisfied at x(p) so that x(p) is a local solution
of (2.1.1) by Theorem 2.12. Due to (H2) and the smoothness properties of f, e, g we can
assume that
κ 2
L (x(p), p, λ(p), η(p))(h, h) ≥ |h| for all h ∈ ker E+ (2.4.13)
2
if p ∈ N (p0 ). Let us define Ep = (e (x(p), p), g+
(x(p), p)) for p ∈ N (p0 ). Due to
surjectivity of Ep0 and smoothness properties of e, g there exists a neighborhood Ñ ⊂ N (p0 )
of p0 such that Ep is surjective for all p ∈ Ṽ . Lemma 2.13 implies the existence of δ0 >
and γ > 0 such that
for all h ∈ ker E+ and z ∈ X satisfying |z| ≤ γ |h|. The orthogonal projection onto ker Ep
is given by Pker Ep = I − Ep∗ (Ep Ep∗ )−1 Ep . We can select Ñ so that
γ
|Ep∗ (Ep Ep∗ )−1 Ep − Ep∗0 (Ep0 Ep∗0 )−1 Ep0 | ≤
1+γ
and, since L((p), p) ⊂ ker Ep , the second order sufficient optimality (2.3.21) is satisfied
at x(p).
The first part of the proof of Theorem 2.16 contains a Lipschitz continuity result for
linear complementarity problems. In the following corollary we reconsider this result in
a modified form which will be used for the convergence analysis of sequential quadratic
programming problems.
Throughout the following discussion p0 in (2.1.1) (with C = X) is fixed and therefore
its notation is suppressed. For (x, λ, μ) ∈ X × W ∗ × Rm let
i.e., the equality and finite rank inequality constraints are realized in the Lagrangian term.
For (x, λ, μ) ∈ X×W ∗ ×Rm , (x̄, λ̄, μ̄) ∈ X×W ∗ ×Rm , and (a, b, c, d) ∈ X×W ×Rn ×Z
i i
i i
ItoKunisc
i i
2008/6/12
page 52
i i
consider
⎧
⎪
⎪ min 12 L (x̄, λ̄, μ̄)(x − x̄, x − x̄) + f (x̄) + a, x − x̄
,
⎪
⎪
⎪
⎪
⎪
⎪
⎨ e(x̄) + e (x̄)(x − x̄) = b,
(2.4.14)
⎪
⎪
⎪
⎪ g(x̄) + g (x̄)(x − x̄) ≤ c,
⎪
⎪
⎪
⎪
⎩
(x̄) + L(x − x̄) − d ∈ K.
Note that ⎛
⎞ ⎛ ⎞
a 0
⎜ b ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ ⎜
⎝ c ⎠ + G(x̄, λ̄, μ̄)(x, λ, μ, η) + ⎝
⎟ (2.4.15)
∂ψRm,+ (μ) ⎠
d ∂ψK + (η)
is the first order optimality system for (2.4.14). The following modification of (H2) will be
used
(H2) there exists κ > 0 such that L (x0 , λ0 , μ0 ) x, x
X ≥ κ|x|2X for all x ∈ ker(E+ ).
Corollary 2.18. Assume that (H1), (H2), , (H3) hold at a local solution of (2.1.1) and that
the second derivatives of f , e, and g are Lipschitz continuous in a neighborhood of x0 .
Then there exist neighborhoods U (ξ0 ) of ξ0 = (x0 , λ0 , μ0 , η0 ) in X × W ∗ × Rm × Z ∗ ,
Û (x0 , λ0 , μ0 ) of (x0 , λ0 , μ0 ), and V of the origin in X × W × Rm × Z and a constant K̃,
such that for all (x̄, λ̄, μ̄) ∈ Û (x0 , λ0 , μ̄0 )) and q = (a, b, c, d) ∈ V , there exists a unique
solution ξ = ξ(x̄, λ̄, μ̄, q) = (x, λ, μ, η) ∈ U (ξ0 ) of (2.4.15), and x is a local solution of
(2.4.14). Moreover, for every pair q1 , q2 ∈ V and (x̄1 , λ̄1 , μ̄1 ), (x̄2 , λ̄2 , μ̄2 ) ∈ Û (x0 , λ0 , μ̄0 )
we have
|ξ(x̄1 , λ̄1 , μ̄1 , q1 ) − ξ(x̄2 , λ̄2 , μ̄2 , q2 )| ≤ K̃(|(x̄1 , λ̄1 , μ̄1 ) − (x̄2 , λ̄2 , μ̄2 )| + |q1 − q2 |).
Proof. From the first part of the proof of Theorem 2.16 it follows that
⎛ ⎞ ⎛ ⎞
a 0
⎜ b ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ ⎜
⎝ c ⎠ + G(x0 , λ0 , μ0 )(x, λ, μ, η) + ⎝ ∂ψRm,+ (μ)
⎟
⎠
d ∂ψK + (η)
i i
i i
ItoKunisc
i i
2008/6/12
page 53
i i
2.5. Differentiability 53
2.5 Differentiability
In this section we discuss the differentiability properties of the solution ξ(p) = (x(p), λ(p),
μ(p), η(p)) of the optimality system (2.1.3) with respect to p. Throughout it is assumed
that p ∈ N so that (2.1.3) has the unique solutions ξ(p) in U , where N and U are specified
in Theorem 2.16. We assume that C = X.
Definition 2.19. A function φ from P to a normed linear space X̃ is said to have a directional
derivative at p0 ∈ P if
φ(p0 + t q) − φ(p0 )
lim
t→0+ t
exists for all q ∈ P .
i i
i i
ItoKunisc
i i
2008/6/12
page 54
i i
and ξ(p0 ) which, without loss of generality, we again denote by N and U , such that (2.5.1)
admits a unique solution ξ̂ (p) = (x̂(p), λ̂(p), μ̂(p), η̂(p)) in U for every p ∈ N . Moreover
we have
|ξ(p) − ξ̂ (p)| ≤ α(p) |p − p0 |, (2.5.2)
where P denotes the metric projection onto C and [z − P z]⊥ stands for the orthogonal
complement of the subspace spanned by x − P z ∈ H . Moreover C is called polyhedric if
C is polyhedric at every z ∈ H .
(H5) e(x0 , ·), (x0 , ·), f (x0 , ·), e (x0 , ·), g (x0 , ·), and (x0 , ·) are directionally differen-
tiable at p0 .
(H6) K is polyhedric at (x0 , p0 ) + η0 .
(H7) There exists ν > 0 such that Ax, x
≥ ν |x|2 for all x ∈ ker E.
(H8) EL : X → W × Z is surjective.
Since every element z ∈ Z can be decomposed uniquely as z = z1 + z2 with z1 = PK z and
z2 = PK + z and z1 , z2
= 0 (see [Zar]), (H.6) is equivalent to
Recall the decomposition of the inequality constraint with finite rank and the nota-
tion that was introduced in (2.1.6). Due to the complementarity condition and continuous
dependency of ξ(p) on p one can assume that
for p ∈ N. This also holds with (x(p), μ(p)) replaced by (x̂(p), μ̂(p)).
i i
i i
ItoKunisc
i i
2008/6/12
page 55
i i
2.5. Differentiability 55
Theorem 2.21. Let (H1)–(H5) hold and let (ẋ, λ̇, μ̇, η̇) denote a weak cluster point of
ξ(p0 +t q)−ξ(p0 )
t
as t → 0+ . Then (ẋ, λ̇, μ̇, η̇) satisfies
⎧
⎪
⎪ Lp (x0 , po , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗ μ̇ + L∗ η̇,
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪ −ep+ (x0 , p0 )q − E+ ẋ,
⎪
⎪
⎨
0∈ −gp0 (x0 , p0 ) − G0 ẋ + ∂ψRm2 ,+ (μ̇0 ), (2.5.4)
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪ μ̇− ,
⎪
⎪
⎪
⎪
⎩
η̇, (x0 , p0 )
+ η0 , Lẋ + p (x0 , p0 )q
.
Proof. As described above, we can restrict ourselves to a weak cluster point of ξ̂ (p0 +t q))−
t
ξ̂ (p0 )
.
We put pn = p0 + tn q. By (2.5.1)
0 = L (x0 , pn , λ0 , μ0 , η0 ) − L (x0 , p0 , λ0 , μ0 , η0 )
Dividing these two equations by tn > 0 and letting tn → 0+ we obtain the first two equations
in (2.5.4). For the third equation note that μ̇0 ≥ 0 since μ̂0 (pn ) ≥ 0 and μ0 (p0 ) = 0. Since
g 0 (x0 , p0 ) = 0 we have from (2.5.1)
for all z ∈ Rm2 ,+ . Dividing this inequality by tn > 0 and letting tn → 0+ we obtain the
third inclusion. The fourth equation is obvious. For the last equation we recall that
Thus,
η̂(pn ) − η0 , (x0 , pn )
+ η̂(pn ), L(x̂(pn ) − x0 )
+ η0 , (x0 , pn ) − (x0 , p0 )
= 0,
Corollary 2.22 (Sensitivity of Cost). Let (H.1)–(H.4) hold and assume that e, g, are
continuously differentiable in the sense of Fréchet at (x0 , p0 ). Then the Gâteaux derivative
i i
i i
ItoKunisc
i i
2008/6/12
page 56
i i
V (p0 ) = Lp (x0 , p0 , λ0 , μ0 , η0 )
= fp (x0 , p0 ) + λ0 , ep (x0 , p0 )
+ μ0 , gp (x0 , p0 )
+ η0 , p (x0 , p0 )
.
For q ∈ P let ξ(t) = t −1 (x(p0 + t q) − x(p0 )), t > 0. Then there exists a subsequence tn
such that limn→∞ tn = 0 and ξ(tn ) → (ẋ, λ̇, μ̇, η̇). Now, let pn = p0 + tn q and
Thus,
V (pn )−V (p0 )
tn
= L (x0 , p0 , λ0 , μ0 , η0 ), ẋ
+ L(x0 , p0 , λ0 , μ0 , η0 )p , q
+ λ̇, e(x0 , p0 )
+ μ̇, g(x0 , p0 )
+ η̇, (x0 , p0 )
+ O(tn ).
η0 , Lẋ + p (x0 , p0 )q
= 0. (2.5.5)
V (pn ) − V (p0 )
lim = Lp (x0 , p0 , λ0 , μ0 , η0 ), q
.
n→∞ tn
Since the limit does not depend on the sequence {tn } the desired result follows.
i i
i i
ItoKunisc
i i
2008/6/12
page 57
i i
2.5. Differentiability 57
In order to prove strong differentiability of x(p) we use a result due to Haraux [Har]
on directional differentiability of the metric projection onto a closed convex set.
Theorem 2.23. Let C be a closed convex set in a Hilbert space H with metric projection
P from H onto C. Let ψ be an H -valued function and assume that C is polyhedric at
ψ(0). Set
and denote by PK̂(ψ(0)) the projection onto K̂(ψ(0)). If there exists a sequence tn such that
limn→∞ tn → 0+ and limn→∞ ψ(tn )−ψ(0)tn
=: ψ̇ exists in H , then
P ψ(tn ) − P ψ(0)
lim = PK̂(ψ(0)) ψ̇.
tn →0+ tn
This implies
ψ(tn ) − ψ(0)
tn + ψ(0) − tn γ (tn ) − P ψ(0), −tn γ (tn ) ≤ 0
tn
and hence
ψ(tn ) − ψ(0)
tn2 γ (tn ), γ (tn ) − ≤ tn γ (tn ), ψ(0) − P ψ(0)
tn (2.5.6)
= P ψ(tn ) − P ψ(0), ψ(0) − P ψ(0)
≤ 0
for all n. Since the norm is weakly lower semicontinuous, (2.5.6) implies that
γ , γ − ψ̇
≤ 0. (2.5.7)
γ , ψ(0) − P ψ(0)
= 0,
i i
i i
ItoKunisc
i i
2008/6/12
page 58
i i
Let w ∈ ∪λ>0 λ(C − P ψ(0)) ∩ [ψ(0) − P ψ(0)]⊥ be arbitrary. Then there exist λ > 0
and u ∈ C such that w = λ (u − P ψ(0)) and ψ(0) − P ψ(0), u − P ψ(0)
= 0. Since
tn
and therefore
ψ(tn ) − ψ(0)
− γ , u − P ψ(0) ≤ δn , u − P ψ(0)
+ ψ(0) − P ψ(0), δn
+ tn M
tn
for some constant M. Letting tn → 0+ we have
ψ̇ − γ , u − P ψ(0)
≤ 0.
ψ̇ − γ , w
≤ 0 for all w ∈ K̂(ψ(0)). (2.5.8)
γ − ψ̇, γ − w
≤ 0 for all w ∈ K̂(ψ(0)),
ψ̇ − γ , γ
= 0.
Hence
ψ(tn ) − ψ(0)
ψ̇, γ
= |γ | ≤ lim inf |γ (tn )| ≤ lim sup
2 2
, γ (tn ) = ψ̇, γ
,
tn
which implies that lim |γ (tn )|2 = |γ |2 and completes the proof.
Theorem 2.24 (Sensitivity Equation). Let (H1)–(H8) hold. Then the solution mapping
p → ξ(p) = (x(p), λ(p), μ(p), η(p)) is directionally differentiable at p0 , and the direc-
tional derivative (ẋ, λ̇, μ̇, η̇) at p0 in direction q ∈ P satisfies
⎧
⎪
⎪ Lp (x0 , p0 , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗+ μ̇+ G∗0 μ̇0 + L∗ η̇,
⎨
−ep+ (x0 , p0 )q − E+ ẋ,
0∈ (2.5.9)
⎪
⎪ −gp0 (x0 , p0 )q − G0 ẋ + ∂ψRm,+ (μ̇0 ),
⎩
−p (x0 , p0 )q − Lẋ + ∂ψK̂ + (η̇).
i i
i i
ItoKunisc
i i
2008/6/12
page 59
i i
2.5. Differentiability 59
Proof. Let {tn } be a sequence of real numbers with limn→∞ tn = 0+ and w−lim tn−1 (ξ(p0 +
tn q) − ξ(p0 )) = (ẋ, λ̇, μ̇, η̇). Then (ẋ, λ̇, μ̇, η̇) is also a weak cluster point of w −
lim tn−1 (ξ̂ (p0 + tn q) − ξ(p0 )). The proof is now given in several steps.
We first show that x̂(tn )−x(0)
tn
converges strongly in X as n → ∞ and (2.5.9) holds for
all weak cluster points. Let pn = p0 + tn q and define
Due to (H5) and (H8) there exists ẇ ∈ X such that lim tn−1 (w(tn ) − w(0)) = ẇ. If we
define y(tn ) = x̂(tn ) − w(tn ), then using Lemma 2.7 one verifies that y(tn ) ∈ C, where
C = {x ∈ X : Ex = 0 and Lx ∈ K}.
Observe that
E ∗ λ̂(tn ) + L∗ η̂(tn ), c − yn (tn )
≤ 0 for all c ∈ C
and thus
then ker E is a Hilbert space by (H7). Let AP = Pker E A Pker E and define
ψ(tn ) = −A−1 ∗
P Pker E (φ(tn ) + Aw(tn ) + G μ(tn )) ∈ ker E.
i i
i i
ItoKunisc
i i
2008/6/12
page 60
i i
Thus y(tn ) = PC ψ(tn ), where PC denotes the metric projection in ker E onto PC with
respect to the inner product ((·, ·)). For any h ∈ ker E we find
= h, −f (x0 , p0 ) − G∗ μ0 − E ∗ λ0
= h, L∗ η0
,
and therefore
[ψ(0) − PC ψ(0)]⊥ = {h : η0 , L h
= 0}. (2.5.12)
Here the orthogonal complement is taken with respect to the ((·, ·))—the inner product
on ker E. Moreover we have L(y(0) = (x0 , p0 ). Hence from Lemma 2.25 below we
have ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) − PC ψ(0)]⊥ = ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) −
PC ψ(0)]⊥ , i.e., C is polyhedric with respect to ψ(0). Theorem 2.23 therefore implies
that limtn →o+ (tn−1 )(y(tn ) − y(0)) = : ẏ exists and satisfies
ˆ
((ψ̇ − ẏ, v − ẏ)) ≥ 0 for all v ∈ C(ψ(0)). (2.5.13)
Moreover limtn →0 (tn−1 )(x(tn ) − x(0)) exists and will be denoted by ẋ.
To verify that (ẋ, λ̇, μ̇, η̇) satisfies (2.5.9) it suffices to prove the last inclusion. From
(2.5.13) we have
−Lp (x0 , p0 , λ0 , μ0 , η0 )q − A ẋ − G∗ μ̇, v − ẏ
≤ 0
L∗ η̇, v − ẏ
= η̇, L v − p (x0 , p0 )q
≤ 0 (2.5.14)
ˆ
for all v ∈ C(ψ(0)) = ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) − PC ψ(0)]⊥ . Here we used the
facts that φ̇ = Lp (x0 , λ0 , μ0 , ηo )q and L ẇ = −p (x0 , p0 )q. From (2.5.12), (H8), and
L y(0) = (x0 , p0 ) we have
ˆ
L C(ψ(0)) = ∪λ>0 λ(K − (x0 , p0 ) ∩ [η0 ]⊥ = : K̂
η̇, v − L ẋ − p(x0 , p0 )q
≤ 0 for all v ∈ K̂. (2.5.15)
This further implies p (x0 , p0 )q + L ẋ ∈ K̂, and from (2.5.15) we deduce η̇ ∈ K̂ + . Conse-
quently η̇ ∈ ∂ ψK̂ (p (x0 , p0 )q +L ẋ), which is equivalent to p(xo , p0 )q +L ẋ ∈ ∂ ψK̂ + (η̇)
and implies the last inclusion in (2.5.9).
i i
i i
ItoKunisc
i i
2008/6/12
page 61
i i
2.5. Differentiability 61
and hence (H3) implies (λ̇1 , μ̇1 , η̇1 ) = (λ̇2 , μ̇2 , η̇2 ). Consequently ξ̂ (t)−ξ(0)
t
has a unique
weak limit as t → 0+ . Finally we show that the unique weak limit is also strong. For the
x-component this was already verified. Note that from (2.5.1)
⎛˜ ⎞ ⎛ ⎞
λ̂(t) − λ̃0 −L (x0 , p0 + t q, λ0 , μ0 , η0 ) − A(x̂(t) − x0 )
E ∗ ((x0 , p0 )) ⎝ μ̂0 − μ00 ⎠ = ⎝ ⎠.
η̃(t) − η0 η̂(t), −(x0 , p + t q) + (x0 , p0 ) − L(x(t) ˆ − x0 )
Dividing this equation by t and noticing that the right-hand side converges strongly as
t → 0+ , it follows from (H3) that limt→0+ t −1 (λ̃(t) − λ0 , μ̃(t) − μ0 , η̃(t) − η0 ) converges
strongly as well.
Lemma 2.25. Assume that (H6) and (H8) hold and let y(0) ∈ C be such that L y(0) =
(x0 , p0 ). Then we have
∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
= 0}
(2.5.16)
= ∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
= 0}.
Proof. It suffices to verify that the set on the right-hand side of (2.5.16) is contained in the
set on the left. For this purpose let y ∈ ∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
=
0} and decompose y = y1 + y2 with y1 ∈ ker( EL ) and y2 ∈ (ker( EL ))⊥ . We need
only consider y2 . Since L y2 ∈ ∪λ→0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ and therefore L y2 ∈
∪λ>0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ by (H6), hence there exist sequences {λn } and {kn } with
λn > 0, kn ∈ K, kn , η0
= 0, and lim λn (kn − (x0 , p0 )) = L y2 . Let us define
cn = c̃n +y(0), where c̃n is the unique element in ( EL )∗ satisfying ( EL )c̃n = ( kn −L0 y(0) ). Then
we have E(cn −y(0)) = 0, η0 , L cn
= η0 , kn
= 0 and hence the sequence λn (cn −y(0)) is
contained in the set on the left-hand side of (2.5.16). Moreover limn→∞ ( EL )(λn (cn −y(0)) =
( EL )y2 , and since both λn (cn − y(0)) and y2 are contained in the range of ( EL )∗ we find that
limn→∞ λn (cn − y(0)) = y2 .
i i
i i
ItoKunisc
i i
2008/6/12
page 62
i i
subject to
|y(T ) − y d | ≤ δ,
1
g(y, u, p) = |y(T ) − y d |2 − δ 2 ,
2
(y, u, p) = u − z.
The linearizations E, G, and L are
L(y0 , u0 , p0 )(h, v) = v.
i i
i i
ItoKunisc
i i
2008/6/12
page 63
i i
(A2) A(·) and B(·) are Lipschitz continuous and directionally differentiable at α0 .
(A3) (y, u) → f (y, u, α) is twice continuously differentiable in a neighborhood of (y0 ,
u0 , α0 ) ∈ HL1 × L2 × Rr .
(A4) There exists > 0 such that for the Hessian with respect to (y, u)
With (A3) holding, the regularity requirements made at the beginning of the chapter
are satisfied. The regular point condition (H1) will be satisfied if for every (φ, ρ, ψ) ∈
L2 × R × L2 there exist (h, v) ∈ HL1 × L2 and (r + , r, k) ∈ R+ × R × K such that
⎧
⎪
⎪ ḣ − A(α0 )h − B(α0 )v = φ,
⎪
⎪
⎨
y0 (T ) − y0d , h(T )
+ r + + r g(y0 , u0 , p0 ) = ρ, (2.6.2)
⎪
⎪
⎪
⎪
⎩
v − k + r(u0 − z0 ) = ψ.
v = ψ + k + r(z0 − u0 ),
t
h(t) = ρ1 (t) + eA(α0 )(t−s) B(α0 )(k + r(z0 − u0 ))dt,
0
t
where ρ1 (t) = 0 e A(α0 )(t−s)
(B(α0 )ψ +φ)ds. The second equation in (2.6.2) is equivalent to
T
y0 (T ) − y0α , eA(α0 )(T −s) B(α0 )(k + r(z0 − u0 ))ds + r + + r g(y0 , u0 , p0 )
0
= ρ − ρ1 (T ).
If g(y0 , u0 , p0 ) = 0, then, using (A6), the desired solution for (2.6.2) is obtained by setting
r + = k = 0 and
T −1
A(α0 )(T −s)
r = y0 (t) − yo ,
d
e B(α0 )(z0 − u0 )ds (ρ − ρ1 (T )).
0
If g(y0 , u0 , p0 ) < 0 and ρ−ρ1 ≥ 0, then the choice r + = ρ−ρ1 , r = k = 0 gives the desired
solution. Finally if g(y0 , u0 , p0 ) < 0 and ρ − ρ1 < 0, then r = (ρ − ρ1 )g(y0 , u0 , p0 ) > 0,
i i
i i
ItoKunisc
i i
2008/6/12
page 64
i i
r + = 0, and k = r(u0 − u) ∈ K give a solution for (2.6.2). Thus the regular point condition
holds and implies the existence of a Lagrange multiplier (λ0 , μ0 , η0 ) ∈ L2 × R × L2 . For
the Lagrangian functional
T
L(y, u, p0 , λ0 , μ0 , η0 ) = fˆ(y, u, α0 )dt + λ0 , ẏ − A(α0 )y − B(α0 )u − h0
μ0
+ (|y(T ) − y0d | − δ 2 |) − η0 , u − z0
,
2
the Hessian with respect to (y, u) at (y0 , u0 ) in directions (h, v) ∈ HL1 × L2 satisfies
L (y0 , u0 , p0 , λ0 , μ0 , η0 )((h, v), (h, v))
T
(h(t), v(t))fˆ (y0 , u0 , α0 )(h(t), v(t))T dt + μ0 |h(T )|2 ≥ |v|2L2
=
0
by (A4). Let k1 > 0 be chosen such that |h|HL1 ≤ k1 |v|L2 for all (h, v) ∈ HL1 × L2 in ker E.
Then there exists a constant k2 > 0 such that
L (y0 , u0 , p0 , λ0 , μ0 , η0 )((h, v), (h, v)) ≥ k2 (|h|2H 1 + |v|1L2 )
L
( y0 (T ) − y0d , ρ1 (T )
− ρ),
and (H3) holds. Conditions (H4) and (H5) follow from (A2) and (A5). The cone of a.e.
nonpositive functions in L2 is polyhedric [Har] and hence (H6) holds. Finally (H8) is simple
to check. We note that (A5) and (A6) are not needed if (2.6.1) is considered without terminal
constraint.
i i
i i
ItoKunisc
i i
2008/6/12
page 65
i i
Chapter 3
3.1 Generalities
This chapter is devoted to first order augmented Lagrangian methods for constrained prob-
lems of the type
-
min f (x) over x ∈ X
(3.1.1)
subject to e(x) = 0, g(x) ≤ 0, (x) ∈ K,
65
i i
i i
ItoKunisc
i i
2008/6/12
page 66
i i
augmented Lagrangian method can be considered as a hybrid method combining the output-
matching and equation error methods. In the context of parameter estimation or, equally
well, of optimal control problems, the first order augmented Lagrangian method lends itself
to vectorization and parallelization in a natural way; see [KuTa].
It will be convenient throughout this section to identify the Hilbert spaces X, W , and
Z with their duals. As a consequence the Lagrange multiplier associated to the equality
constraint e(x) = 0 is sought in W and that for (x) ∈ K in Z.
Let us now fix those assumptions which are assumed throughout this chapter: There
exists x ∗ ∈ X and (λ∗ , μ∗ , η∗ ) ∈ W × Rp × Z such that
-
f, e, and g are twice continuously Fréchet differentiable
(3.1.2)
in a neighborhood of x ∗ ,
and
-
e(x ∗ ) = 0, μ∗ , g(x ∗ ) Rm = 0, μ∗ ≥ 0, g(x ∗ ) ≤ 0,
∗ (3.1.4)
η , (x ∗ ) Z = 0, η∗ ∈ K + , (x ∗ ) ∈ K.
i i
i i
ItoKunisc
i i
2008/6/12
page 67
i i
⎧
⎪
⎪L (x ∗ , λ∗ , μ∗ , η∗ )(h, h) ≥ γ |h|2X for all h ∈ C,
⎨
where C = {h ∈ X : e (x ∗ )h = 0, gi (x ∗ )h ≤ 0 (3.1.6)
⎪
⎪
⎩
for i ∈ I1 , gi (x ∗ )h = 0 for i ∈ I2 }.
In case X is finite-dimensional, (3.1.6) is well known to imply that x ∗ is a strict local solution
to (3.1.1); see, e.g., [Be, p. 71]. For the infinite-dimensional case this will be proved in
Theorem 3.4 below.
Section 3.2 is devoted to augmentability of problem (3.1.1). This means that a func-
tional is associated to (3.1.1) with the property that if x ∗ is a local minimizer of (3.1.1), then
it is also a minimizer of that functional where the essential constraints are eliminated and
at most simple affine constraints remain. Here this is achieved on the basis of Lagrangian
functionals and augmentability is obtained without the use of a strict complementarity as-
sumption. Section 3.3 contains the foundations for the first order augmented Lagrangian
algorithm based on a duality framework. The convergence analysis for the first order algo-
rithm is given in Section 3.4. Section 3.5 contains an application to parameter estimation
problems.
c c
min fc (x, u) = f (x) + |e(x)|2W + |g(x) + u|2Rm over (x, u) ∈ X × Rm
2 2 (3.2.1)
subject to e(x) = 0, g(x) + u = 0, u ≥ 0, (x) ∈ K,
where c > 0 will be appropriately chosen. Observe that x ∗ is a local solution to (3.1.1)
if and only if (x ∗ , u∗ ) = (x ∗ , −g(x ∗ )) is a (local) solution to (3.2.1). Moreover, x ∗ is
stationary for (3.1.1) with Lagrange multiplier (λ∗ , μ∗ , η∗ ) if and only if (x ∗ , −g(x ∗ )) is
stationary for (3.2.1) with Lagrange multiplier (λ∗ , μ∗ , μ∗ , η∗ ), i.e., (3.1.3)–(3.1.4) hold
with g(x ∗ ) = −u∗ . Associated to (3.2.1) we introduce the functional
c c
Lc (x, u, λ, μ, η) = L(x, λ, μ, η) + |e(x)|2W + |g(x) + u|2Rm . (3.2.2)
2 2
i i
i i
ItoKunisc
i i
2008/6/12
page 68
i i
combines the linearized equality constraint and the active inequality constraints, which
act like equalities in the sense that the associated Lagrange multipliers are positive. The
essential technical result which will imply augmentability of the constraints in (3.1.1) is
given next.
Proposition 3.1. If (3.1.2)–(3.1.6) hold, then there exist constants c̄ > 0 and τ ∈ (0, γ ]
such that
m1
∗ ∗ ∗ ∗
H ((h, k̂), (h, k̂)) := L (x , λ , μ , η )(h, h) + c|Eh| + c | li , h X + kˆi |2
2
i=1
≥ τ (|h|2 + |k̂R2 m1 |)
To verify (3.2.5) let ni = Pker e li , where Pker e denotes the projection onto ker e (x ∗ ). Choose
I˜2 ⊂ I2 such that {ni }i∈I˜ are linearly independent and span{ni }i∈I˜ = span{ni }i∈I2 . Possibly
2 2
1 +m̃2
after reindexing we have {ni }i∈I˜2 = {ni }m
m1 +1 . We show that ker E = ker Ẽ. Let h ∈ ker E.
Then e (x ∗ )h = 0. For every i ∈ I˜2 let li = ni + li⊥ , where li⊥ ∈ (ker e (x ∗ ))⊥ . Then
i i
i i
ItoKunisc
i i
2008/6/12
page 69
i i
ni , h X = li , h X = 0, i ∈ I˜2 , and therefore h ∈ ker Ẽ. The converse, ker Ẽ ⊂ ker E, is
proved similarly. To verify surjectivity! of Ẽ, observe that the adjoint Ẽ ∗ : W ×R! → X of
m̃2
To exclude trivial cases we assume throughout that I˜1 and I˜2 are nonempty.
Step 2. Here we characterize the polar cone
C1∗ = h ∈ ker E : h, h̃ X ≤ 0 for all h̃ ∈ C1
Denoting by P and P ∗ the canonical projections in ker E onto C1 and C1∗ , respectively,
every element h ∈ ker E can be uniquely expressed as h = h1 + h2 with h1 , h2 X = 0 and
h1 = P h ∈ C1 , h2 = P ∗ h ∈ C1∗ (see [Zar, p. 256]). Moreover
.
C1 = Ci with Ci = {h ∈ ker E : ni , h X ≤ 0}
i∈I˜1
and
. ∗
C1∗ = Ci = co Ci∗ = co Ci∗
m̃1
|h| =2
αi n i , h X ≤ αi ni , h X ,
i=1 I1+
i i
i i
ItoKunisc
i i
2008/6/12
page 70
i i
where I1+ = {i ∈ I˜1 : ni , h X > 0 and αi > 0}. Consequently
⎛ ⎞⎛ ⎞1/2 ⎛ ⎞1/2
−1/2
|h|2 ≤ ⎝ αi2 ⎠ ⎝ ni , h ⎠ ≤ K1 |h| ⎝ ni , h ⎠
2 2
X X
I1+ Ii+ I1+
and
2
K1 |h|2 ≤ ni , h X
for every h ∈ C1∗ . (3.2.7)
I1+
+ L (x ∗ , λ∗ , μ∗ , η∗ )(h2 + h3 , h2 + h3 )
c1 2 2
+ c|Eh3 |2 + ni , h2 X + k̂i − c1 li , h3 X
2 + +
I1 I1
c2 2
+ |k̂i |2 − c2 li , h X
2 + +
I1 \I1 I1 \I1
for all 0 < c1 , c2 ≤ c to be chosen below. Here we used (a + b)2 ≥ 12 a 2 − b2 and the
fact that li , h2 = ni , h2 for i ∈ I1 . To estimate Eh3 from below we make use of the
fact that Ẽ is an isomorphism from (ker E)⊥ to W × Rm̃2 . Hence there
/ exists K2 > / 0 such
that K2 |h3 | ≤ |Ẽh3 | ≤ |Eh3 | for all h3 ∈ (ker E)⊥ . Setting L = /L (x ∗ , λ∗ , μ∗ )/ , c3 =
min(c1 , c2 ), and c4 = max(c1 , c2 ) we obtain
H ((h, k̂), (h, k̂)) = γ |h1 |2 − 2L|h1 |(|h2 | + |h3 |) − L(|h2 | + |h3 |)2
c1 2 c3 2
+ cK22 |h3 |2 + ni , h2 X + k̂
2 + 2 I i
I1 1
2 2 2
− 3c4 li , h3 X − 3c2 li , h1 X + li , h2 X ,
I1 I1 −I1+
where we used the fact that ni , h2 X
≥ 0 for i ∈ I1+ .
i i
i i
ItoKunisc
i i
2008/6/12
page 71
i i
!
Setting |l|2 = I1 |li |2 and using (3.2.7) we find
c1 c3 2
H ((h, k̂), (h, k̂)) ≥ γ |h1 |2 + K1 |h2 |2 + cK22 |h3 |2 + k̂
2 2 I i
1
We now choose the constants in the order c2 , c1 , and c̄ such that the coefficients of
|h1 |2 , |h2 |2 , and |h3 |2 , with c replaced by c̄, are positive. This implies the claim for
every c ≥ c̄.
Corollary 3.2. Let E ∈ L(X, W ) be surjective and let A ∈ L(X) be self-adjoint and
coercive on ker E; i.e., there exists γ > 0 such that Ax, x ≥ γ |x|2 for
all x ∈ ker E.
Then there exist constants τ > 0 and c̄ > 0 such that (A + cE ∗ E)x, x X ≥ τ |x|2 for all
x ∈ X and c ≥ c̄.
Proposition 3.3. Let (3.1.2)–(3.1.6) hold. Then there exists σ > 0 such that
Lc̄ (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((h, k), (h, k)) + μ∗i ki ≥ σ (|h|2 + |k 2 |)
i∈I2
for all h ∈ X with |h| ≤ μ̃/(2c̄ supi∈I2 |li |), μ̃ = minI2 μ∗i and all k ∈ Rm with ki ≥ 0 for
i ∈ I1 ∪ I2 , and c̄ is as introduced in Proposition 3.1.
!
Proof. Let A = Lc̄ (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((h, k), (h, k)) + i∈I2 μ∗i ki . Then from (3.2.3) and
the definition of H
∗
A = H ((h, k̂), (h, k̂)) + c̄
li , h + ki
2 + c̄ |ki |2 + 2 li , h X ki + μi ki ,
X
I3 I2 I2
i i
i i
ItoKunisc
i i
2008/6/12
page 72
i i
where k̂ denotes the first m1 coordinates of the vector k. By Proposition 3.1 we have for
every ε > 1
2
A ≥ τ (|h|2 + |k̂|2Rm1 ) + c̄ li , h X (1 − ε 2 ) + |ki |2 (1 − ε −2 )
I3
+ c̄ |ki | − 2c̄
2
li , h
ki + μ∗i ki
X
I2 I2 I2
−2
≥ τ (|h| +2
|k̂|2Rm1 ) + c̄(1 − ε ) |ki |2
I2 ∪I3
+ c̄(1 − ε )|h|
2 2
|li | − 2c̄
2
li , h
ki + μ∗i ki .
X
I3 I2 I2
!
Now choose ε > 1 such that τ
2
= (ε 2 − 1)τ I3 |li |2 . Then using the constraint for h we
find that
1
A≥ τ (|h|2 + |k̂|2Rm1 ) + c̄(1 − ε −2 ) ki2 + μ̃ − 2c̄ sup |li ||h| ki
2 I ∪I I2 I
2 3 2
1
≥ τ (|h|2 + |k̂|2Rm1 ) + c̄(1 − ε −2 ) ki2 .
2 I ∪I 2 3
This proves the claim for c = c̄. For arbitrary c ≥ c̄ the assertion follows from the form of
Lc (x ∗ , u∗ , λ∗ , μ∗ ).
Theorem 3.4. Assume that (3.1.2)–(3.1.6) hold. Then there exist constants σ̄ > 0, c̄ > 0
and a neighborhood U (x ∗ , −g(x ∗ )) = U (x ∗ , u∗ ) of (x ∗ , u∗ ) such that
Lc (x, u, λ∗ , μ∗ , η∗ ) + μ∗ , u Rm
c c
= f (x) + λ∗ , e(x) W + μ∗ , g(x) + u Rm + η∗ , (x) Z + |e(x)|2W + |g(x) + u|2Rm
2 2
≥ f (x ∗ ) + σ̄ (|x − x ∗ |2 + |u − u∗ |2 )
(3.2.9)
where (Lc )x and (Lc )u denote the partial derivatives of Lc with respect to x and u. Conse-
quently
Lc (x ∗ , u∗ , λ∗ , μ∗ , η∗ ) + μ∗ , u∗ Rm = f (x ∗ ) + μ∗ , u − u∗ Rm
+ Lc (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((x − x ∗ , u − u∗ ), (x − x ∗ , u − u∗ )) + o(|x − x ∗ |2 + |u − u∗ |2 ).
i i
i i
ItoKunisc
i i
2008/6/12
page 73
i i
The definitions of I1 and I3 imply that μ∗i = 0 for i ∈ I1 ∪ I3 . Thus Proposition 3.3 implies
the existence of a neighborhood U (x ∗ , u∗ ) of (x ∗ , u∗ ) and a constant σ̄ > 0 such that (3.2.9)
holds for all (x, u) ∈ U (x ∗ , u∗ ) with ui ≥ 0 for i ∈ I1 ∪ I2 .
Corollary 3.5. Assume that (3.1.2)–(3.1.6) hold. Then there exists a neighborhood U (x ∗ )
of x ∗ such that
f (x) ≥ f (x ∗ ) + σ̄ |x − x ∗ |2
for all x ∈ U (x ∗ ) satisfying e(x) = 0, g(x) ≤ 0, and (x) ∈ K.
Theorem 3.4 can be used as the basis to define augmented cost functionals in terms
of x only with the property that x ∗ is a uniform strict local unconstrained minimum. The
two choices we present differ in the way in which the inequality constraints are treated. The
first choice uses the classical cutoff penalty functional
g̃(x) = max(g(x), 0), (3.2.10)
and the second is the Bertsekas penalty functional
μ
ĝ(x, μ, c) = max g(x), − , (3.2.11)
c
where μ ∈ Rm and c > 0. In each of the two cases the max operation acts coordinatewise. To
motivate the choice (3.2.11) we use the Lagrangian Lc (x, u, λ, μ, η) introduced in (3.2.2)
for (3.2.1) and consider
⎧
⎨min Lc (x, u, λ, μ, η) = fc (x, u) + λ, e(x) W
(3.2.12)
⎩ + μ, g(x) + u Rm + η, (x) Z over x ∈ X, u ≥ 0.
Carrying out the constraint minimization with respect to u with x and μ fixed results in
μ
u = u(x, μ) = max 0, − g(x) + ,
c
i i
i i
ItoKunisc
i i
2008/6/12
page 74
i i
and consequently
μ
g(x) + u(x, μ) = max g(x), − = ĝ(x, μ, c). (3.2.13)
c
Proof. Setting
u = max(−g(x), 0) (3.2.14)
we have g(x) + u = g̃(x) and u ≥ 0. Recall the definition of U = U (x ∗ , −g(x ∗ )) from
Theorem 3.4. Determine a neighborhood U (x ∗ ) such that x ∈ U (x ∗ ) implies (x, −g(x)) ∈
U and gi (x) ≤ 0 if gi (x ∗ ) < 0. It is simple to argue that x ∈ U (x ∗ ) implies (x, u) ∈ U,
where u is defined in (3.2.14). The claim now follows from Theorem 3.4.
Corollary 3.7. Let (3.1.2)–(3.1.6) hold and let r > 0. Then there exist constants δ > 0 and
c̃ = c̃(r) ≥ c̄ such that
c c
f (x)+ λ∗ , e(x) X + μ∗ , ĝ(x, μ, c) Rm + η∗ , (x) Z + |e(x)|2W + |ĝ(x, μ, c)|2
2 2
≥ f (x ∗ ) + σ̄ |x − x ∗ |2
Proof. Let ε > 0 be such that gi (x ∗ ) ≤ −ε for all i ∈ I3 and such that |x − x ∗ | < ε
and |u − u∗ | < ε implies (x, u) ∈ U (x ∗ , u∗ ), where U (x ∗ , u∗ ) is given in Theorem 3.4.
Determine δ ∈ (0, ε) such that |x − x ∗ | ≤ δ implies |g(x) − g(x ∗ )| < 2ε and choose c̃ ≥ 2rε .
For c ≥ c̃ and μ ∈ Br+ we define
μ
u = max 0, − + g(x) . (3.2.15)
c
Then g(x) + u = ĝ(x) and u ≥ 0. To verify the claim it suffices to show that x ∈ Bδ
and μ ∈ Br+ imply |u − u∗ | < ε, where u is defined in (3.2.15). If i ∈ I1 ∪ I2 , then
|ui − u∗i | ≤ |gi (x ∗ ) − gi (x)| + μci . For i ∈ I3 we have μci + gi (x) ≤ μci + |gi (x ∗ ) − gi (x)| +
gi (x ∗ ) < rc + 2ε − ε < 0 and consequently |ui − u∗i | = |gi (x ∗ ) − gi (x) − μci |. Summing
over i = 1, . . . , m we find |u − u∗ | ≤ 2|g(x ∗ ) − g(x)|2 + c22 |μ|2 < ε2 .
i i
i i
ItoKunisc
i i
2008/6/12
page 75
i i
and updating λk as a maximizer of the dual problem associated to (3.1.1), which will
be defined below. The first order augmented Lagrangian method provides a systematic
technique for the multiplier update. Its convergence analysis will be given in the next
section.
Let x ∗ be a local solution of (3.1.1) with (3.1.2)–(3.1.6) holding. The algorithm will
∗
require startup values (λ0 , μ0 ) ∈ W × Rm + for the Lagrange multipliers. We set r = |μ | +
∗ 2 ∗ 2 1/2
(|λ0 −λ | +|μ0 −μ | ) and choose ε and c̃ = c̃(r, ε) ≥ c̄ as in the proof of Corollary 3.6.
Recall that ε was chosen such that gi (x ∗ ) ≤ −ε for all indices i in the set of inactive indices I3
and such that |x −x ∗ | < ε, |u−u∗ | < ε imply (x, u) ∈ U (x ∗ , −g(x ∗ )) with U (x ∗ , −g(x ∗ ))
given in Theorem 3.4. Finally, let {cn }∞ n=1 be a monotonically nondecreasing sequence of
penalty parameters with c1 > c̃, and set σn = cn − c̄. It is not required that limn→∞ cn = ∞,
as would be the case for penalty methods.
Algorithm ALM.
1. Choose (λ0 , μ0 ) ∈ W × Rm
+.
2. Let xn be a solution to
i i
i i
ItoKunisc
i i
2008/6/12
page 76
i i
where
Ln (x) = f (x) + λn−1 , e(x) W + μn−1 , ĝ(x, μn−1 , cn ) Rm
cn cn
+ |e(x)|2W + |ĝ(x, μn−1 , cn )|2 .
2 2
Observe that μn = max(μn−1 + σn g(xn ), μn−1 (1 − σcnn )) and consequently μn ∈ Rn+ for
each n = 1, 2, . . . . Existence of solutions to (Pn ) will be discussed in Section 3.4. It will be
shown that (Pn ) has a solution in the interior of Bδ (see Corollary 3.6) for all n sufficiently
large, or for all n if c1 is sufficiently large, and that these solutions converge to x ∗ , while
(λn , μn ) converges to (λ∗ , μ∗ ).
To motivate the Lagrange multiplier update in step 3 of the algorithm and to justify
calling this a first order algorithm, we consider problem (3.2.1) without the infinite rank in-
equality constraint. Introducing Lagrange multipliers for the equality constraints e(x) = 0
and g(x) + u = 0 we obtained the augmented Lagrange functional Lc in (3.2.12). Carry-
ing out the minimization with respect to u and utilizing (3.2.13) suggests introducing the
modified augmented Lagrangian functional
c c
L̂c (x, λ, μ) = f (x) + λ, e(x) W + μ, ĝ(x, μ, c) Rm + |e(x)|2W + |ĝ(x, μ, c)|2Rm .
2 2
(3.3.1)
Since L̂c (x ∗ , λ∗ , μ∗ ) = f (x ∗ ) we find
L̂c (x ∗ , λ, μ) ≤ L̂c (x ∗ , λ∗ , μ∗ ) ≤ L̂c (x, λ∗ , μ∗ ) for all x ∈ Bδ and μ ≥ 0, (3.3.2)
∗ ∗
whenever c ≥ c̃. The first inequality follows from the fact that e(x ) = ĝ(x , μ) = 0 for
μ ≥ 0 and the second one from Corollary 3.6. From the saddle point property (3.3.2) for
L̂c (x, λ, μ) we deduce that
sup inf L̂c (x, λ, μ) ≤ L̂c (x ∗ , λ∗ , μ∗ ) ≤ inf sup L̂c (x, λ, μ). (3.3.3)
λ,μ≥0 x x λ,μ
admits a unique solution x(λ, μ) for every (λ, μ) ∈ U (λ∗ , μ∗ ) and as a consequence of
Theorem 2.24 (λ, μ) → x(λ, μ) is differentiable. Consequently the locally defined dual
functional
d(λ, μ) = min∗ L̂c (x, λ, μ)
x∈U (x )
i i
i i
ItoKunisc
i i
2008/6/12
page 77
i i
Thus the multiplier updates in step 3 of Algorithm ALM are steepest ascent directions for
the dual problem
sup d(λ, μ).
λ,μ≥0
Remark 3.3.1. In this chapter we identify X and W with their dual spaces and consider
e (x)∗ as an operator from W to X. If, alternatively, e (x)∗ is considered as an operator
from W ∗ to X ∗ , then the Lagrange multiplier is taken as an element of W ∗ and the Lagrange
functional is defined as L(x, λ̃) = f (x) + λ̃, e(x)
W ∗ ,W . The relation between λ and λ̃
is given by I λ = λ̃, with I the canonical isomorphism from W to W ∗ . If, for example,
W = H −1 (), then W ∗ = H01 () and I = (−)−1 . The Lagrange multiplier update in
step 3 of Algorithm ALM is then given by λn = λn−1 + σn I e(xn ).
i i
i i
ItoKunisc
i i
2008/6/12
page 78
i i
for all x ∈ Bδ , μ ∈ Br+ , and c ≥ c̃. We also assume that Bδ is contained in the region of
applicability of (3.4.1) and that {cn }∞ n=1 is a monotonically nondecreasing sequence with
c̃ < c1 . Then we have the following convergence properties of Algorithm ALM from
arbitrary initializations (λ0 , μ0 ) ∈ W × Rm+ in the case that suboptimal solutions are chosen
in step 2.
Theorem 3.8. Assume that (3.1.2)–(3.1.5), (3.4.1) hold and that x̃n ∈ Bδ satisfy
If (λn , μn ) are defined as in step 3 of Algorithm ALM with xn replaced by x̃n , then for n ≥ 1
we have with σn = cn − c̄
1
σ̄ |x̃n − x ∗ |2 + (|λn − λ∗ |2 + |μn − μ∗ |2 )
2σn
1
≤ (|λn−1 − λ∗ |2 + |μn−1 − μ∗ |2 ). (3.4.3)
2σn
This implies that μn ∈ Br+ for all n ≥ 1 and
1
|x̃n − x ∗ |2 ≤ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ) (3.4.4)
2σ̄ σn
and
∞
1
σn |x̃n − x ∗ |2 ≤ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ). (3.4.5)
n=1
2σ̄
Proof. We proceed by induction and assume that the claim has been verified up to n−1. For
n = 1 the result can be obtained by the general arguments given below. For the induction
step we observe that (3.4.3), which is assumed to hold up to n − 1, implies
Consequently μn−1 ∈ Br+ and (3.4.1) with μ = μn−1 is applicable. Using the fact that
i i
i i
ItoKunisc
i i
2008/6/12
page 79
i i
1
m
∗ ∗
Ln (x ) = f (x ) + {[max(0, μn−1,i + cn gi (x ∗ ))]2 − μ2n−1,i } ≤ f (x ∗ ). (3.4.6)
2cn i=1
1
L̂(x̃n , μn−1 ) + (|λn − λ∗ |2 − |λn−1 − λ∗ |2 )
2σn
1
+ (|μn − μ∗ |2 − |μn−1 − μ∗ |2 ) ≤ Ln (x̃n ) ≤ f (x ∗ ).
2σn
1 1 1 1
σ̄ |x̃n − x ∗ |2 + |λn − λ∗ |2 + |μn − μ∗ |2 ≤ |λn−1 − λ∗ |2 + |μn−1 − μ∗ |2 ,
2σn 2σn 2σn 2σn
which establishes (3.4.3). Estimates (3.4.4) and (3.4.5) follow from (3.4.3).
Further (PnC ) denotes problem (Pn ) with the additional constraint that x is contained in the
closed ball Bδ . We refer to xn as the solution to (PnC ) if Ln (xn ) ≤ Ln (x) for all x ∈ Bδ .
Proposition 3.9. If (3.1.2)–(3.1.5), (3.4.1), and (3.4.7) hold, then (PnC ) admits a solution
xn for every n = 1, 2, . . . . Moreover, there exists n0 such that every solution xn of (PnC )
i i
i i
ItoKunisc
i i
2008/6/12
page 80
i i
Thus for n0 or c sufficiently large the solutions to (PnC ) are local solutions of the
unconstrained problems (Pn ).
Proof. Let {xni }i∈N be a minimizing sequence for (PnC ). There exists a weakly convergent
subsequence {xnik }k∈N with weak limit xn ∈ Bδ . Condition (3.4.7) implies weak lower
semicontinuity of Ln and hence Ln (xn ) ≤ lim inf ik Ln (xnik ) and xn is a solution of (PnC ).
In particular (3.4.2) holds with x̃n replaced by xn for n = 1, 2, . . . . Consequently
limn xn = x ∗ and xn ∈ int Bδ for all n sufficiently large. Alternatively, if c11−c̄ (|λ0 − λ∗ |2 +
|μ0 − μ∗ |2 ) is sufficiently small, then xn ∈ int Bδ for all n = 1, 2, . . . by (3.4.4).
For the solutions xn obtained in Proposition 3.9 the conclusions of Theorem 3.8
hold, in particular limn xn = x ∗ and the associated Lagrange multipliers {λn } and {μn } are
bounded.
To investigate convergence of the Lagrange multipliers we introduce M : X →
W × Rm1 +m2 × Z defined by
M = (e (x ∗ ), gac
(x ∗ ), (x ∗ )),
where gac denotes the first m1 + m2 coordinates of g. The following additional hypothesis
will be needed:
We point out that (3.4.9) is more restrictive than the regular point condition in Def-
inition 1.5 of Chapter 1. Below we set μn,ac = col(μn,1 , . . . , μn,m1 +m2 ) and μ∗ac =
col(μ∗1 , . . . , μ∗m1 +m2 ). Without loss of generality we shall assume that σ̄ ≤ 1.
Theorem 3.10. Let (3.1.2)–(3.1.5), (3.4.1), and (3.4.8)–(3.4.9) hold and let xn be a solution
of (Pn ) in int Bδ , ñ ≥ 1. Then there exists a constant K independent of n such that
K
|λn − λ∗ | + |μn − μ∗ | + |ηn − η∗ | ≤ √ (|λn−1 − λ∗ | + |μn−1 − μ∗ |) for all n ≥ 1.
σ̄ σn
Here ηn denotes the Lagrange multiplier associated with the constraint ∈ K in (Pn ).
i i
i i
ItoKunisc
i i
2008/6/12
page 81
i i
Proof. Optimality of xn and x ∗ for (Pn ) and (3.1.1) (see (3.1.3)) imply
f (xn ), h + λn−1 + cn e(xn ), e (xn )h + max(0, μn−1 + cn g(xn )), g (xn )h
+ ηn , (xn )h = 0
and
∗ ∗ ∗ ∗ ∗ ∗ ∗
f (x ), h + λ , e (x )h + μ , g (x )h + η , (x )h = 0
for all h ∈ X. If we set λ̃n = λn−1 + cn e(xn ) and μ̃n = max(0, μn−1 + cn g(xn )), we obtain
the following equation in X:
e (x ∗ )∗ (λ̃n − λ∗ ) + gac
(x ∗ )∗ (μ̃n,ac − μ∗ac ) + (x ∗ )∗ (ηn − η∗ ) = f (x ∗ ) − f (xn )
+ (e (x ∗ )∗ − e (xn )∗ )λ̃n + (gac
(x ∗ )∗ − gac
(xn )∗ )μn,ac .
Here we used the fact that μ̃n,i = 0 for i ∈ I3 by the choice of δ and c̃ (see the proof of
Corollary 3.7), and we set μn,ac = col (μn,1 , . . . , μn,m1 +m2 ). Note that M ∗ : W ×Rm1 +m2 →
X is given by M ∗ = col (e (x ∗ )∗ , gac
(x ∗ )∗ ). Hence we find
MM ∗ (λ̃n − λ∗ , μ̃n,ac − μ∗ac , ηn − η∗ ) = M f˜ (x ∗ ) − f˜ (xn )
+ (e (x ∗ )∗ − e (xn )∗ )λ̃n + (gac
(x ∗ )∗ − gac
(xn )∗ )μ̃n,ac . (3.4.10)
c̄ n−1
μni = μ for i ∈ I3 , n ≥ 1. (3.4.13)
cn i
From (3.4.3), with x̃n replaced by xn , (3.4.12), and (3.4.13), and using σ̄ ≤ 1 the theorem
follows.
for all n ≥ 1.
i i
i i
ItoKunisc
i i
2008/6/12
page 82
i i
where PM = M(M ∗ M)−1 M ∗ denotes the orthogonal projection of W × Rm1 +m2 × Z onto
the range of M which is closed, since M ∗ is surjective. Since xn → x ∗ in X this implies
convergence of PM (λn , μn,ac , ηn ) to PM (λ∗ , μ∗ac , η∗ ).
Remark 3.4.2. If a constraint of the type x ∈ C, with C a closed convex set in X, appears in
(3.1.1), then Theorem 3.8 and Proposition 3.9 remain valid if Algorithm ALM is modified
such that in (Pn ) the functional Ln (x) is minimized over x ∈ C and x̃n appearing in (3.4.2)
satisfies x̃n ∈ C. The stationarity condition (3.1.3) has to be replaced by
f (x ∗ )(x − x ∗ ) + λ∗ , e (x ∗ )(x − x ∗ ) W + μ∗ , g (x ∗ )(x − x ∗ ) Rm ≥ 0 (3.1.3 )
This is a special case of (3.1.1) with X = Q × H01 (), W = H −1 (), g = 0, and f the
cost functional in (3.5.2). The affine constraint a(x) ≥ β can be considered as an explicit
constraint as discussed in Remark 3.4.2 by setting C = {a ∈ Q : a(x) ≥ β}. Denote the
solution to (3.5.1) as a function of a by y(a). To guarantee the existence of a solution for
(3.5.2) we require the following assumption:
-
either N is a norm on Q
(3.5.3)
or inf {|y(a) − z|H 1 : a is constant } < |z|H 1 .
i i
i i
ItoKunisc
i i
2008/6/12
page 83
i i
Proof. Let {(an , yn )}∞n=1 denote a minimizing sequence for (3.5.2). If N is a norm, then
a subsequence argument implies the existence of a solution to (3.5.2). Otherwise we de-
compose an as an = an1 + an2 with an2 ∈ Q/R and an1 ∈ (Q/R)⊥ . Since {(an , yn )}∞ n=1 is a
minimizing sequence, {N (an2 )}∞ n=1 and consequently {|a 2 ∞ ∞
n | L } n=1 are bounded. We have
an1 |∇yn |2 dx = − an2 |∇yn |2 dx + fyn dx.
|z| ≤ lim |y(an ) − z|2H 1 () + βN 2 (an ) ≤ inf {|y(a) − z|H 1 : a = constant} < |z|H 1 ,
n→∞
Using Theorem 1.6 one argues the existence of Lagrange multipliers (λ∗ , η∗ ) ∈
−1
H () × Q∗ such that the Lagrangian
1 α
L(a, y, λ∗ , η∗ ) = |y − z|2H 1 () + N 2 (a) + λ∗ , e(a, y)
H01 ,H −1 + η∗ , β − a
Q∗ ,Q
2 0 2
is stationary at (a ∗ , y ∗ ), and η∗ , β − a ∗
Q∗ ,Q = 0, η∗ , h
Q∗ ,Q ≥ 0 for all h ≥ 0. Hence
(3.1.2)–(3.1.5) hold. Here (3.1.3)–(3.1.4) must be modified to include the affine inequality
constraint with infinite-dimensional image space. We turn to the augmentability condition
(3.4.1) and introduce the augmented Lagrangian
c
Lc (a, y, λ∗ , η∗ ) = L(a, y, λ∗ , η∗ ) + |e(a, y)|2H −1
2
for c ≥ 0.
Henceforth we assume that (3.5.2) admits a solution (a0 , y0 ) for α = 0. Such a solution
exists if z is attainable, i.e., if there exists a0 ∈ Q, a0 ≥ β, such that y(a0 ) = y and in this
case y0 = y(a0 ). Alternatively, if the set of coefficients is further constrained by a norm
bound |a|Q ≤ γ , for example, then existence of a solution to (3.5.2) with α = 0 is also
guaranteed.
This proposition implies that (3.4.1) is satisfied and that Theorem 3.8 and Proposi-
tion 3.9 are applicable with Bδ = {a ∈ Q : a(x) ≥ β, |a|Q ≤ γ } × H01 ().
i i
i i
ItoKunisc
i i
2008/6/12
page 84
i i
Lc (a, y, λ∗ , η∗ ) − Lc (a ∗ , y ∗ , λ∗ , η∗ ) = ∇L(a ∗ , y ∗ , λ∗ , η∗ )
1 α c
− h∇λ∗ , ∇v L2 + |v|2H 1 + N 2 (h) + |e(a, h)|2H −1 ,
2 0 2 2
where h = a − a ∗ , v = y − y ∗ , and ∇L denotes the gradient of L with respect to (a, y).
First order optimality implies that
Lc (a, y, λ∗ , η∗ ) − Lc (a ∗ , y ∗ , λ∗ , η∗ ) ≥ − h∇λ∗ , ∇v L2
1 α c
+ |v|2H 1 + N 2 (h) + |e(a, h)|2H −1 =: D.
2 0 2 2
Introduce P = ∇ · (−)−1 ∇ and note that P can be extended to an orthogonal projection
on L2 ()n . We find that
|e(a, y)|2H −1 = P (a∇y − a ∗ ∇y ∗ ), a∇y − a ∗ ∇y ∗ L2
= P (a∇v + h∇y ∗ ), h∇y ∗ + a∇v L2
1
≥ P (h∇y ∗ ), h∇y ∗ L2 − P (a∇v), a∇v L2
2
1
≥ P (h∇y ∗ ), h∇y ∗ L2 − |a|2L∞ |v|2H 1
2 0
1
≥ P (h∇y ∗ ), h∇y ∗ L2 − γ 2 k12 |v|2H 1 ,
2 0
∞
where k1 denotes the embedding constant of Q into L (). Henceforth we consider the case
that ker N = span{1} and use the decomposition introduced in the proof of Lemma 3.12.
We have
1
|e(a, y)|2 ≥ |h1 |2 |y ∗ |2H 1 − |h1 | |h2 |L∞ |y ∗ |2H 1 − γ 2 k12 |v|H01 .
2 0 0
∞
Since Q embeds continuously into L () and N is a norm equivalent to the norm on Q/R,
there exists a constant k2 such that |h2 |L∞ ≤ k2 N (h) for all h = h1 + h2 ∈ Q. Consequently
1
|e(a, y)|2 ≥ |h1 |2 |y ∗ |2H 1 − k2 |h1 | N (h2 ) |y ∗ |2H 1 − γ 2 k12 |v|H01 (3.5.4)
2 0 0
for all (a, y) ∈ C×H01 (). Next note that Ly (a ∗ , u∗ ; λ∗ , η∗ ) = 0 implies that −∇(a ∗ ∇λ∗ ) =
−(y ∗ − y) in H −1 (), and hence
h∇λ∗ , ∇v 2
≤ 1 (|h1 | + k2 N (h2 ))|y ∗ − z|H 1 |v|H 1 . (3.5.5)
L β 0 0
i i
i i
ItoKunisc
i i
2008/6/12
page 85
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 86
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 87
i i
Chapter 4
Augmented Lagrangian
Methods for Nonsmooth,
Convex Optimization
4.1 Introduction
The class of optimization problems that motivates this chapter is given by
where X, H are real Hilbert spaces, C is a closed convex subset of X, and ∈ L(X, H ).
In this chapter we identify H with its dual space and we distinguish between X and its
dual X ∗ . Further f : X → R is a continuously differentiable, convex function, and ϕ :
H → (−∞, ∞] is a proper, lower semicontinuous, convex but not necessarily differentiable
function. This problem class encompasses a wide variety of optimization problems including
variational inequalities of the first and second kind [Glo]. Our formulation here is slightly
more general than the one in [Glo, EkTe], since we allow the additional constraint x ∈ C.
For example, one can formulate regularized inverse scattering problems in the form
(4.1.1):
2
μ 1
min |∇u| + g |∇u| dx +
2
k(x, y)u(y) dy − z(x)
dx (4.1.2)
2 2
over u ∈ H 1 () and u ≥ 0, where k(x, y) denotes a scattering kernel. The problem
consists in recovering the original image u defined on a domain from scattered and
noisy data z ∈ L2 (). Here μ > 0 and g > 0 are fixed and should be adjusted to the
statistics of the noise. If μ = 0, then this problem is equivalentto the image enhancement
algorithm in [ROF] based on minimization of the BV-seminorm |∇u| ds. In this example
ϕ(v) = g |v| ds, which is nondifferentiable. Several additional applications are treated
at the end of this chapter.
We develop a Lagrange multiplier theory to deal with the nonsmoothness of ϕ. To
briefly explain the approach let x, λ ∈ H and c > 0, and define the family of generalized
Yosida–Moreau approximations ϕc (x, λ) by
c
ϕc (x, λ) = inf ϕ(x − u) + (λ, u)H + |u|2H . (4.1.3)
u∈H 2
87
i i
i i
ItoKunisc
i i
2008/6/12
page 88
i i
This is equivalent to an augmented Lagrangian approach. In fact, note that (4.1.1) is equiv-
alent to
min f (x) + ϕ(x − u)
(4.1.4)
subject to x ∈ C and u = 0 in H.
Treating the equality constraint u = 0 in (4.1.4) by the augmented Lagrangian method
results in the minimization problem
c 2
min f (x) + ϕ(x − u) + (λ, u)H + |u| , (4.1.5)
x∈C,u∈H 2 H
f (xc ) + ∗ λc , x − xc
X∗ ,X ≥ 0 for all x ∈ C,
λc = ϕc (xc , λc ).
It will further be shown that under appropriate conditions the pair (xc , λc ) ∈ C × H has a
(strong-weak) cluster point (x̄, λ̄) as c → ∞ such that x̄ ∈ C is the minimizer of (4.1.1)
and that λ̄ ∈ H is a Lagrange multiplier in the sense that
f (x̄) + ∗ λ̄, x − x̄
X∗ ,X ≥ 0 for all x ∈ C (4.1.7)
System (4.1.7)–(4.1.8) for the pair (x̄, λ̄) is a necessary and sufficient optimality condition
for problem (4.1.1). We analyze iterative algorithms of Uzawa and augmented Lagrangian
type for finding the optimal pair (x̄, λ̄) and present a convergence analysis. It will be shown
that condition (4.1.8) is equivalent to the complementarity condition λ̄ ∈ ∂ϕ(x̄). Thus
the frequently employed differential inclusion λ̄ ∈ ∂ϕ(x̄) is replaced by the nonlinear
equation (4.1.8).
This chapter is organized as follows. In Sections 4.2–4.3 we present the basic convex
analysis and duality theory in Banach spaces. Section 4.4 is devoted to the generalized
Yosida–Moreau approximation which is the basis for the remainder of the chapter. Condi-
tions for the existence of Lagrange multiplier for (4.1.1) and optimality systems are derived
in Section 4.5. Section 4.6 is devoted to the augmented Lagrangian algorithm. Convergence
of both the augmented Lagrangian and the Uzawa methods are proved. Section 4.7 contains
a large number of concrete applications.
i i
i i
ItoKunisc
i i
2008/6/12
page 89
i i
for all sequences {xn } converging weakly to x. Further F is w.l.s.c. if it is w.l.s.c. at all
x ∈ X.
(4) The subset D(F ) = {x ∈ X : F (x) < ∞} of X is called the effective domain
of F .
(5) The epigraph of F is defined by epi(F ) = {(x, c) ∈ X × R : F (x) ≤ c}.
Lemma 4.2. A functional F : X → (−∞, ∞] is l.s.c. if and only if its epigraph is closed.
Proof. The lemma follows from the fact that epi (F ) is closed if and only if (xn , cn ) ∈
epi(F ) → (x, c) in X × R implies F (x) ≤ c.
Lemma 4.3. A functional F : X → (−∞, ∞] is l.s.c. if and only if its level sets Sc = {x ∈
X : F (x) ≤ c} are closed for all c ∈ R.
Proof. Assume that F is l.s.c. Let c > 0 and let {xn } be a sequence in Sc with limit x. Then
F (x) ≤ lim inf x→∞ F (xn ) ≤ c and hence x ∈ Sc and Sc is closed. Conversely, assume
that Sc is closed for all c > 0, and let {xn } be a sequence converging to x in X. Choose a
subsequence {xnk } with
lim F (xnk ) = lim inf F (xn ),
k→∞ n→∞
and suppose that F (x) > lim inf n→∞ F (xn ). Then there exists c̄ ∈ R such that
and there exists an index m such that F (xnk ) < c̄ for all k ≥ m. Since Sc̄ is closed x ∈ Sc̄ ,
which is a contradiction to c̄ < F (x).
Lemma 4.4. Assume that all level sets of the functional F : X → (−∞, ∞] are convex.
Then F is l.s.c. if and only if it is w.l.s.c. on X.
i i
i i
ItoKunisc
i i
2008/6/12
page 90
i i
Proof. Assume that F is l.s.c. Suppose that {xn } converges weakly to x̄ in X, and let {xnk }
be a subsequence such that d = lim inf F (xn ) = limk→∞ F (xnk ). By Lemma 4.3 the sets
{x : F (x) ≤ d + } are closed for every > 0. By assumption they are also convex. Hence
by Mazur’s lemma they are also weakly (sequentially) closed. Hence F (x̄) ≤ d + for
every > 0. Since is arbitrary, we have that F (x̄) ≤ d and F is w.l.s.c. The converse
implication is obvious.
Note that for a convex function all associated level sets are convex, but not vice
versa.
Theorem 4.5. Let F be a proper, l.s.c., convex functional on X. Then F is bounded below
by an affine functional; i.e., there exist x ∗ ∈ X∗ and c ∈ R such that
F (x) ≥ x ∗ , x
X∗ ,X + c for all x ∈ X.
Proof. Let x0 ∈ X and choose β ∈ R such that F (x0 ) > β. Since epi(F ) is a closed convex
subset of the product space X × R, it follows from the separation theorem for convex
sets [EkTu] that there exists a closed hyperplane H ⊂ X × R given by
H = {(x, r) ∈ X × R : x0∗ , x
+ ar = α} with x0∗ ∈ X∗ , a ∈ R, α ∈ R,
such that
x0∗ , x0
+ aβ < α < x0∗ , x
+ ar for all (x, r) ∈ epi(F ).
Setting x = x0 and r = F (x0 ), we have
x0∗ , x0
+ aβ < α < x0∗ , x0
+ aF (x0 )
and thus a (F (x0 ) − β) > 0. If F (x0 ) < ∞, then a > 0 and thus
α 1
− x0∗ , x
≤ r for all (x, r) ∈ epi(F ),
a a
α 1
β< − x0∗ , x0
< F (x0 ).
a a
Hence, b(x) = αa − a1 x0∗ , x
is a continuous affine function on X such that b ≤ F and
−x ∗
the first claim is established with c = αa and x ∗ = a 0 . Moreover β < b(x0 ) < F (x0 ).
Therefore
F (x0 ) = sup b(x0 ) = x ∗ , x0
+ c :
(4.2.1)
x ∗ ∈ X∗ , c ∈ R, b(x) ≤ F (x) for all x ∈ X .
x ∗ , x
+ c + θ (α − x0∗ , x
) < F (x)
i i
i i
ItoKunisc
i i
2008/6/12
page 91
i i
x ∗ , x
+ c + θ (α − x0∗ , x
) > β,
Theorem 4.6. If F : X → (−∞, ∞] is convex and bounded above on an open set U , then
F is continuous on U .
Proof. We choose M ∈ R such that F (x) ≤ M − 1 for all x ∈ U . Let x̂ be any element in
U . Since U is open there exists a δ > 0 such that the open ball {x ∈ X : |x − x̂| < δ} is
contained in U . For any ∈ (0, 1), let θ = M−F
(x̂)
. Then for x ∈ X satisfying |x − x̂| < θ δ
we have
x − x̂
|x − x̂|
θ + x̂ − x̂
= θ
< δ.
Hence x−x̂
θ
+ x̂ ∈ U . By convexity of F
x − x̂
F (x) ≤ (1 − θ )F (x̂) + θ F + x̂ < (1 − θ)F (x̂) + θ M,
θ
and thus
Similarly, + x̂ ∈ U and
x̂−x
θ
θ x̂ − x 1 θM 1
F (x̂) ≤ F + x̂ + F (x) < + F (x),
1+θ θ 1+θ 1+θ 1+θ
which implies
F (x) − F (x̂) > −θ(M − F (x̂)) = −.
Therefore |F (x) − F (x̂)| < if |x − x̂| < θ δ and F is continuous in U .
Proof. By Theorem 4.5 and by assumption there exist constants M and m such that
i i
i i
ItoKunisc
i i
2008/6/12
page 92
i i
and hence
2
F (x) − F (x̂) ≤ θ(M − F (x̂)) ≤ (M − m)|x − x̂|.
δ
Similarly, x̂−x
θ
+ x̂ ∈ U and
θ x̂ − x 1 θM 1
F (x̂) ≤ F + x̂ + F (x) ≤ + F (x),
1+θ θ 1+θ 1+θ 1+θ
which implies
−θ (M − m) ≤ −θ(M − F (x̂)) ≤ F (x) − F (x̂)
and therefore
2 δ
|F (x) − F (x̂)| ≤ (M − m)|x − x̂| for |x − x̂| ≤ .
θ M −m
F ∗ (x ∗ ) = sup { x ∗ , x
X∗ ,X − F (x) : x ∈ X}
F ∗ (x ∗ ) = sup { x ∗ , x
− F (x)} ≤ sup { x ∗ , x
− x ∗ , x
+ c} = c,
x∈X x∈X
and hence F ∗ is not identically ∞. Note also that D(F ) = ∅ implies that F ∗ ≡ −∞ and
conversely, if F ∗ (x ∗ ) = −∞ for some x ∗ ∈ X, then D(F ) is empty.
Example 4.10. If F (x) is the indicator function of the closed unit ball of X, i.e., F (x) = 0
for |x| ≤ 1 and F (x) = ∞ otherwise, then F ∗ (x ∗ ) = |x ∗ |. In fact, F ∗ (x ∗ ) = sup{ x ∗ , x
:
|x| ≤ 1} = |x ∗ |.
F (x0 ) = sup { x ∗ , x0
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X}
c(x ∗ ) = inf {c ∈ R : c ≥ x ∗ , x
− F (x) for all x ∈ X}
i i
i i
ItoKunisc
i i
2008/6/12
page 93
i i
F (x0 ) = sup { x ∗ , x0
− F ∗ (x ∗ )}.
x ∗ ∈X ∗
Theorem 4.13. For every functional F : X → (−∞, ∞] the conjugate F ∗ is convex and
l.s.c. on X∗ .
F ∗ (x ∗ ) = sup { x ∗ , x
− F (x) : x ∈ D(F )}.
Theorem 4.14. For every F : X → (−∞, ∞] the biconjugate F ∗∗ is convex, l.s.c., and
F ∗∗ ≤ F . Moreover for each x̄ ∈ X
F ∗∗ (x̄) = sup { x ∗ , x̄
−c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
−c ≤ F (x) for all x ∈ X}. (4.2.2)
Proof. If there is no continuous affine functional which is everywhere less than F , then
F ∗ ≡ ∞ and hence F ∗∗ ≡ −∞. In fact, if there exist c ∈ R and x ∗ ∈ X∗ with c ≥ F ∗ (x ∗ ),
then x ∗ , ·
− c is everywhere less than F , which is impossible. The claims readily follow
in this case.
i i
i i
ItoKunisc
i i
2008/6/12
page 94
i i
Otherwise, assume that there exists a continuous affine functional everywhere less
than F . Then D(F ∗ ) is nonempty and F ∗∗ (x) > −∞ for all x ∈ X. If F ∗ (x ∗ ) = −∞
for some x ∗ ∈ X∗ , then F ∗∗ ≡ ∞, F ≡ ∞, and the claims follow. In the remaining case
F ∗ (x ∗ ) is finite for all x ∗ and we have
F ∗∗ (x) = sup{ x ∗ , x
− F ∗ (x ∗ ) : x ∗ ∈ X∗ , F ∗ (x ∗ ) finite}.
For every x ∗ ∈ D(F ∗ ) we have
x ∗ , x
− F ∗ (x ∗ ) ≤ F (x) for all x ∈ X,
hence F ∗∗ ≤ F and
F ∗∗ (x̄) ≤ sup{ x ∗ , x̄
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X}.
Let x ∗ and c be such that x ∗ , x
− c ≤ F (x) for all x ∈ X. Then
x ∗ , x
− c ≤ x ∗ , x
− F (x ∗ ) for all x ∈ X.
Therefore
sup{ x ∗ , x̄
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X} ≤ F ∗∗ (x̄),
and (4.2.2) follows. Moreover F ∗∗ is the pointwise supremum of a family of continuous
affine functionals, and it follows as in the proof of Theorem 4.13 that F ∗ is convex and
l.s.c.
For the proof of this result we refer the reader to [EkTu, p. 104], for example.
Theorem 4.17. Let F : X → (−∞, ∞] and assume that ∂F (x) = ∅ for some x ∈ X.
Then F (x) = F ∗∗ (x).
Proof. Since ∂F (x) = ∅ there exists a continuous affine functional ≤ F with (x) =
F (x). Due to Theorem 4.14 we have ≤ F ∗∗ ≤ F . It follows that (x) = F ∗∗ (x) = F (x),
as desired.
i i
i i
ItoKunisc
i i
2008/6/12
page 95
i i
4.2.2 Subdifferential
In this short section we summarize some important results on subdifferentials of functionals
F : X → (−∞, ∞]. While F is not assumed to be convex, the applications that we have
in mind are to convex functionals.
If ∂F (x) is nonempty, then F is called subdifferentiable at x and the set of all points where
F is subdifferentiable is denoted by D(∂F ).
Example 4.19. Let F be Gâteaux differentiable at x, i.e., there exists w ∗ ∈ X∗ such that
F (x + t v) − F (x)
lim = w ∗ , v
for all v ∈ X,
t→0+ t
and w∗ ∈ X∗ is called the Gâteaux derivative of F at x. It is denoted by F (x). If in addition
F is convex, then F is subdifferentiable at x and ∂F (x) = {F (x)}. Indeed, for v = y − x
F (x + t (y − x)) − F (x)
≤ F (y) − F (x), where 0 < t < 1.
t
As t → 0+ we obtain
F (x), y − x
≤ F (y) − F (x) for all y ∈ X,
and thus F (x) ∈ ∂F (x). On the other hand, if w ∗ ∈ ∂F (x), we find for y ∈ X and t > 0
F (x + t y) − F (x)
≥ w ∗ , y
.
t
Taking the limit t → 0+ , we obtain
F (x) − w ∗ , y
≥ 0 for all y ∈ X.
Example 4.20. For F (x) = 12 |x|2 , x ∈ X, we will show that ∂F (x) = F(x), where
F : X → X∗ denotes the duality mapping. In fact, if x ∗ ∈ F(x), then
1
x ∗ , x − y
= |x|2 − x ∗ , y
≥ (|x|2 − |y|2 ) for all y ∈ X.
2
i i
i i
ItoKunisc
i i
2008/6/12
page 96
i i
1
(|y|2 − |x|2 ) ≥ x ∗ , y − x
for all y ∈ X. (4.2.4)
2
We let y = t x, 0 < t < 1, and obtain
1+t
|x|2 ≤ x ∗ , x
2
and thus |x|2 ≤ x ∗ , x
. Similarly, if t > 1, then |x|2 ≥ x ∗ , x
and therefore |x|2 = x ∗ , x
1 t2
t x ∗ , u
≤ (|x + t u|2 − |x|2 ) ≤ t |u||x| + |u|2 ,
2 2
which implies x ∗ , u
≤ |u||x|. Hence |x ∗ | ≤ |x| and we obtain |x|2 = |x ∗ |2 = x ∗ , x
Example 4.21. Let K be a closed convex subset of X and let ψK be the indicator function
of K, i.e.,
0 if x ∈ K,
ψK (x) =
∞ otherwise.
∂ψK (x) = {x ∗ ∈ X∗ : x ∗ , y − x
≤ 0 for all y ∈ K}
and ∂ψK (x) = ∅ if x ∈ / X. Thus D(ψK ) = D(∂ψK ) = K and ∂ψK (x) = {0} for each
interior point of K. Moreover, for x ∈ K, ∂ψK (x) coincides with the definition of the
normal cone to K at x.
x ∗ , x
− F (x) ≤ x ∗ , x̄
− F (x̄) (4.2.5)
F ∗∗ (x̄) ≤ x ∗ , x̄
− F ∗ (x ∗ ).
i i
i i
ItoKunisc
i i
2008/6/12
page 97
i i
By the definition of F ∗∗ ,
F ∗∗ (x̄) ≥ x ∗ , x̄
− F ∗ (x ∗ ).
Hence
F ∗ (x ∗ ) + F ∗∗ (x̄) = x ∗ , x̄
.
F (x̄) + F ∗ (x ∗ ) = F ∗ (x ∗ ) + F ∗∗ (x̄) = x ∗ , x̄
Proposition 4.23. For F : X → (−∞, ∞] the set ∂F (x) is closed and convex for every
x ∈ X.
Proof. If F (x) = ∞, then ∂F (x) = ∅. Henceforth let F (x) < ∞. For every x ∗ ∈ X ∗ we
have F ∗ (x ∗ ) ≥ x ∗ , x
− F (x) and hence by Theorem 4.22
∂F (x) = {x ∗ ∈ X∗ : F ∗ (x ∗ ) − x ∗ , x
≤ −F (x)}.
Theorem 4.24. If the convex function F is continuous at x̄, then ∂F (x̄) is not empty.
Proof. Since F is continuous at x, there exists for every > 0 an open neighborhood U
of x̄ such that
F (x) ≤ F (x̄) + , x ∈ U .
Then U × (F (x̄) + , ∞) is an open set in X × R, contained in epi F . Hence (epi F )o , the
relative interior of epi F , is nonempty. Since F is convex, epi F is convex and (epi F )o
is convex. Note that (x̄, F (x̄)) is a boundary point of epi F . Hence by the Hahn–Banach
separation theorem, there exists a closed hyperplane S = {(x, a) ∈ X × R : x ∗ , x
+ α a =
β} for nontrivial (x ∗ , α) ∈ X ∗ × R and β ∈ R such that
x ∗ , x
+ α a > β for all (x, a) ∈ (epi F )o ,
(4.2.6)
x ∗ , x̄
+ α F (x̄) = β.
{(x̃, ã) ∈ X × R : x ∗ , x̃
+ α ã < β}
i i
i i
ItoKunisc
i i
2008/6/12
page 98
i i
is a neighborhood of (x, a) and contains an element of (epi ϕ)o , which contradicts (4.2.6).
Therefore
x ∗ , x
+ α a ≥ β for all (x, a) ∈ epi F. (4.2.7)
Suppose α = 0. For any u ∈ U there is an a ∈ R such that F (u) ≤ a. Then from (4.2.7)
x ∗ , u
= x ∗ , u
+ α a ≥ β
and thus
x ∗ , u − x̄
≥ 0 for all u ∈ U .
Choose δ > 0 such that |u − x̄| ≤ δ implies u ∈ U . For any nonzero element x ∈ X let
t = |x|
δ
. Then |(tx + x̄) − x̄| = |tx| = δ so that tx + x̄ ∈ U . Hence
x ∗ , x
= x ∗ , (tx + x̄) − x̄
/t ≥ 0.
Similarly, −t x + x̄ ∈ U and
x ∗ , x
= x ∗ , (−tx + x̄) − x̄
/(−t) ≤ 0.
Thus, x ∗ , x
, x ∗ = 0, which is a contradiction. Therefore α is nonzero. From (4.2.6),
(4.2.7) we have for a > F (x̄) that α(a − F (x̄)) > 0 and hence α > 0. Employing (4.2.6),
(4.2.7) again implies that
∗
x
− , x − x̄ + F (x̄) ≤ F (x)
α
∗
for all x ∈ X and therefore − xα ∈ ∂F (x̄).
the primal problem, where X is a real Banach space and F : X → (−∞, ∞] is a proper
l.s.c. convex function. We have the following result for the existence of a minimizer.
Theorem 4.25. Let X be reflexive and let F be a l.s.c. proper convex functional defined on
X satisfying
lim F (x) = ∞. (4.3.1)
|x|→∞
Proof. Let η = inf {F (x) : x ∈ X} and let {xn } be a minimizing sequence such that
limn→∞ F (xn ) = η. Condition (4.3.1) implies that {xn } is bounded in X. Since X is
i i
i i
ItoKunisc
i i
2008/6/12
page 99
i i
reflexive, there exists a subsequence that converges weakly to some x̄ in X and it follows
from Lemma 4.4 that F (x̄) = η.
In this section we analyze the relationship between the primal problem (P ) and its
dual (P ∗ ). Throughout we assume that h(y) > −∞ for all y ∈ Y .
x ∗ , x
+ y ∗ , y
− (x, y) ≤ ∗ (x ∗ , y ∗ ).
Thus
0 = 0, x
+ y ∗ , 0
≤ F (x) + ∗ (0, y ∗ )
for all x ∈ X and y ∗ ∈ Y ∗ . Therefore
i i
i i
ItoKunisc
i i
2008/6/12
page 100
i i
≤ θ (x1 , y1 ) + (1 − θ) (x2 , y2 ) ≤ θ a1 + (1 − θ) a2 = c,
which is a contradiction. Hence h is convex.
Proof.
h∗ (y ∗ ) = sup ( y ∗ , y
− h(y)) = sup ( y ∗ , y
− inf (x, y))
y∈Y y∈Y x∈X
= sup sup ( y ∗ , y
− (x, y)) = sup sup ( 0, x
+ y ∗ , y
− (x, y))
y∈Y x∈X y∈Y x∈X
Proof. Since F is proper, h(0) = inf x∈X F (x) < ∞. Since h is convex by Lemma 4.28, it
follows from Theorem 4.15 that h(0) = h∗∗ (0). Thus by Lemma 4.29
sup (P ∗ ) = sup (−∗ (0, y ∗ )) = sup ( y ∗ , 0
− h∗ (y ∗ ))
y ∗ ∈Y ∗ y ∗ ∈Y ∗
Theorem 4.31. If h is subdifferentiable at 0, then inf (P ) = sup (P ∗ ) and ∂h(0) is the set
of solutions of (P ∗ ).
= sup ( y ∗ , 0
− h∗ (y ∗ )) = h∗∗ (0).
y ∗ ∈Y ∗
i i
i i
ItoKunisc
i i
2008/6/12
page 101
i i
By Theorem 4.22
if and only if ȳ ∗ ∈ ∂h∗∗ (0). Since h∗∗∗ = h∗ by Theorem 4.16, we have ȳ ∗ ∈ ∂h∗∗ (y ∗ ) if
and only if −h∗ (ȳ ∗ ) = h∗∗ (0). Consequently ȳ ∗ solves (P ∗ ) if and only if y ∗ ∈ ∂h∗∗ (0).
Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗ (0) by Theorem 4.17. Therefore ∂h(0) is the set of
all solutions of (P ∗ ) and (P ∗ ) has at least one solution. Let y ∗ ∈ ∂h(0). Then
y ∗ , x
+ h(0) ≤ h(x)
Corollary 4.32. If there exists an x̄ ∈ X such that (x̄, ·) is finite and continuous at 0, then
h is continuous on an open neighborhood U of 0 and h = h∗∗ . Moreover,
inf (P ) = sup (P ∗ )
Proof. First show that h is continuous. Clearly, (x̄, ·) is bounded above on an open
neighborhood U of 0. Since for all y ∈ Y
= sup{ x ∗ , x
− f (x) + sup[ y ∗ , y
− ϕ(x + y)]},
x∈X y∈Y
where
sup[ y ∗ , y
− ϕ(x + y)] = sup[ y ∗ , x + y
− ϕ(x + y) − y ∗ , x
]
y∈Y y∈Y
= sup[ y ∗ , z
− ϕ(z)] − y ∗ , x
= ϕ ∗ (y ∗ ) − y ∗ , x
.
z∈Y
i i
i i
ItoKunisc
i i
2008/6/12
page 102
i i
Thus
∗ (x ∗ , y ∗ ) = sup{ x ∗ , x
− y ∗ , x
− f (x) + ϕ ∗ (y ∗ )}
x∈X
= sup{ x ∗ − ∗ y ∗ , x
− f (x) + ϕ ∗ (y ∗ )} = f ∗ (x ∗ − y ∗ ) + ϕ ∗ (y ∗ ).
x∈X
Theorem 4.34. For any x̄ ∈ X and ȳ ∗ ∈ Y ∗ , the following statements are equivalent.
By Theorem 4.22
−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.5)
ȳ ∗ ∈ ∂ϕ(x̄).
The functional L : X × Y ∗ → (−∞, ∞] defined by
−L(x, y ∗ ) = sup { y ∗ , y
− (x, y)} (4.3.6)
y∈Y
i i
i i
ItoKunisc
i i
2008/6/12
page 103
i i
∗ (x ∗ , y ∗ ) = sup { x ∗ , x
+ y ∗ , y
− (x, y)}
x∈X, y∈Y
= sup x ∗ , x
+ sup { y ∗ , y
− (x, y)}} = sup ( x ∗ , x
− L(x, y ∗ )).
x∈X y∈Y x∈X
Thus
−∗ (0, y ∗ ) = inf L(x, y ∗ ) (4.3.7)
x∈X
If is a convex l.s.c. function that is finite at (x, y), then for the biconjugate of x : y →
(x, y) in y we have x (y)∗∗ = (x, y) and
(x, y) = ∗∗ ∗ ∗ ∗
x (x, y) = sup { y , y
− x (y )}
y ∗ ∈Y ∗
= sup { y ∗ , y
+ L(x, y ∗ )}.
y ∗ ∈Y ∗
Hence
(x, 0) = sup L(x, y ∗ ) (4.3.8)
y ∗ ∈Y ∗
Theorem 4.35 (Saddle Point). Assume that is a convex l.s.c. function that is finite at
(x̄, ȳ ∗ ). Then the following are equivalent.
(1) (x̄, ȳ ∗ ) ∈ X × Y ∗ is a saddle point of L, i.e.,
i i
i i
ItoKunisc
i i
2008/6/12
page 104
i i
Theorem 4.35 implies that no duality gap between (P ) and (P ∗ ) is equivalent to the
saddle point property of the pair (x̄, ȳ ∗ ).
For Example 4.33 we have
L(x, y ∗ ) = f (x) + y ∗ , x
− ϕ(y ∗ ). (4.3.10)
−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.11)
x̄ ∈ ∂ϕ ∗ (x̄).
ȳ ∗ ∈ ∂ϕ(x̄)
and (4.3.11) is equivalent to (4.3.5), if X is reflexive. Thus the necessary optimality sys-
tem for
min F (x) = f (x) + ϕ(x)
x∈X
is given by
−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.12)
ȳ ∗ ∈ ∂ϕ ∗ (x̄).
y1 − y2 , x1 − x2
≥ 0 for all x1 , x2 ∈ D(A) and y1 ∈ Ax1 , y2 ∈ Ax2
i i
i i
ItoKunisc
i i
2008/6/12
page 105
i i
Let ϕ be an l.s.c. convex function on X. For x1∗ ∈ ∂ϕ(x1 ) and x2∗ ∈ ∂ϕ(x2 ),
Theorem 4.37 (Minty–Browder). Assume that X and X∗ are reflexive and strictly convex
Banach spaces and let F : X → X∗ denote the duality mapping. Then a monotone operator
A is maximal monotone if and only if R(λ F + A) = X∗ for all λ > 0 (or, equivalently, for
some λ > 0).
Theorem 4.38 (Rockafeller). Let X be a real Banach space. If ϕ is an l.s.c. proper convex
functional on X, then ∂ϕ is a maximal monotone operator from X into X∗ .
Proof. We prove the theorem for the case that X is reflexive . The general case is considered
in [Roc2]. ByAsplund’s renorming theorem we can assume that after choosing an equivalent
norm, X and X∗ are strictly convex. Using the Minty–Browder theorem it suffices to prove
that R(F + ∂ϕ) = X∗ . For x0∗ ∈ X∗ we must show that the equation x0∗ ∈ F x + ∂ϕ(x)
has at least a solution x0 . Note that F x is single valued due to the fact that X ∗ is strictly
convex. Define the proper convex functional f : X → (−∞, ∞] by
1 2
f (x) = |x| + ϕ(x) − x0∗ , x
.
2 X
Since f is l.s.c. and f (x) → ∞ as |x| → ∞, there exists x0 ∈ D(f ) such that f (x0 ) ≤
f (x) for all x ∈ X. The subdifferential of the mapping x → 12 |x| 2 is given by the monotone
operator F . Hence we find
Taking the limit t → 0+ and using the fact that x → F (x) is continuous from X endowed
with the norm topology to X ∗ endowed with the weak∗ topology, we obtain
Throughout the remainder of this section let H denote a real Hilbert space which
is identified with its dual H ∗ . Further let A denote a maximal monotone operator in
H × H . Recall that A is necessarily densely defined and closed. Moreover the resolvent
i i
i i
ItoKunisc
i i
2008/6/12
page 106
i i
Jμ = (I + μ A)−1 , with μ > 0, is a contraction defined on all of H ; see [Bre, ItKa, Paz].
Moreover
|Jμ x − x| → 0 as μ → 0+ for each x ∈ H. (4.4.1)
The Yosida approximation Aμ of A is defined by
1
Aμ x = (x − Jμ x).
μ
The operator Aμ is single valued, monotone, everywhere defined, Lipschitz continuous with
Lipschitz constant μ1 and Aμ x ∈ AJμ x for all x ∈ H .
Let ϕ be an l.s.c., proper, and convex function on H . Throughout the remainder of
this chapter let A denote the maximal monotone operator ∂ϕ on the Hilbert space H = H ∗ .
For x, λ ∈ H and c > 0 define the functional ϕc (x, λ) by
c
ϕc (x, λ) = inf ϕ(x − u) + (λ, u)H + |u|2H . (4.4.2)
u∈H 2
Then ϕc (x, λ) is a smooth approximation of ϕ in the following sense.
i i
i i
ItoKunisc
i i
2008/6/12
page 107
i i
ϕ(ξ ) − ϕ(v0 ) ≥ c (y − v0 , ξ − v0 ).
λ
uc (x, λ) = x − v0 = x − J 1 x+
c c
attains the minimum in (4.4.2). Note that this argument also implies that A is maximal.
For x1 , x2 ∈ X and 0 < t < 1
1 2
ϕc ((1 − t)x1 + tx2 , λ) = ψ((1 − t)v1 + tv2 ) − |λ| ,
2c
where yi = xi + λc and vi = J 1 yi for i = 1, 2. Hence the convexity of x → ϕc (x, λ)
c
follows from the one of ψ.
Next, we show that ∂ϕc (x, λ) = A 1 (x + λc ). For x̂ ∈ H , let ŷ = x̂ + λc ∈ H and
c
v̂ = J 1 ŷ. Then, we have
c
c c
ϕ(v̂) + |v̂ − y|2 ≥ ϕ(v0 ) + |v0 − y|2
2 2
and
c c
ϕ(v0 ) + |v0 − ŷ|2 ≥ ϕ(v̂) + |v̂ − ŷ|2 .
2 2
Thus,
c c
(|v0 − ŷ|2 − |v0 − y|2 ) ≥ ϕc (x̂, λ) − ϕc (x, λ) ≥ (|v̂ − ŷ|2 − |v̂ − y|2 ). (4.4.6)
2 2
Since |v̂ − v0 | → 0 as |x̂ − x| → 0, it follows from (4.4.6) that
|ϕc (x̂, λ) − ϕc (x, λ) − c(y − v0 ), x̂ − x |
→0
|x̂ − x|
i i
i i
ItoKunisc
i i
2008/6/12
page 108
i i
The dual representation of ϕc (x, λ) is derived next. Define the functional (v, y) on
H × H by
c
(v, y) = ϕ(v) + |v − (ŷ + y)|2 ,
2
where we set ŷ = x + λc . Consider the family of primal problems
If h(y) is the value functional of (Py ), i.e., h(y) = inf v∈H (v, y), then from (4.4.3) we
have
1
ϕc (x, λ) = h(0) − |λ|2 . (4.4.7)
2c
From the proof of Theorem 4.39 it follows that h(y) is continuously Fréchet differentiable
with h (0) = ϕc (x, λ). It thus follows from Theorem 4.31 that inf (P0 ) = max (P ∗ ) and
h (0) is the solution of (P ∗ ).
This leads to the following theorem.
1 ∗2
= ϕ ∗ (y ∗ + v ∗ ) − (y ∗ , ŷ) + |y | .
2c
i i
i i
ItoKunisc
i i
2008/6/12
page 109
i i
Hence,
∗ ∗ ∗ ∗ ∗ 1 ∗2
h(0) = sup {− (0, y )} = sup (y , ŷ) − ϕ (y ) − |y |
y ∗ ∈H y ∗ ∈H 2c
which implies (4.4.8) from (4.4.7) and since ŷ = x + λc . By Theorem 4.31 the maximum
of y ∗ → y ∗ , x
− ϕ ∗ (y ∗ ) − 2c1 |y ∗ − λ|2 is attained at the unique point h (0) = λc (x, λ)
that is given by (4.4.9).
Theorem 4.41.
ϕ(x) ≥ ϕc (x, λ) ≥ λ, x
− ϕ ∗ (λ) = ϕ(x)
for every c > 0. Thus, λ ∈ H attains the supremum in (4.4.8) and by Theorem 4.40
we have λ = ϕc (x, λ). Conversely, if λ ∈ H satisfies λ = ϕc (x, λ) for some c > 0,
then uc (x, λ) = 0 by Theorem 4.40. Hence it follows from Theorem 4.39, (4.4.2), and
Theorem 4.40 that
ϕ(x) = ϕc (x, λ) = λ, x
− ϕ ∗ (λ),
i i
i i
ItoKunisc
i i
2008/6/12
page 110
i i
for all x ∈ X. Due to Theorem 4.24 there exists y0∗ ∈ D(A(y0 )) = ∂ϕ(y0 ) such that
Hence, lim f (x) + ϕ(x) → ∞ as |x|X → ∞ and it follows from Theorem 4.25 and (A2)
that there exists a unique minimizer x̄ ∈ C of (4.1.1).
f (x̄), x − x̄
X∗ ,X + ϕ(x) − ϕ(x̄) ≥ 0 for all x ∈ C. (4.5.3)
Proof. Assume that x̄ is the minimizer of (4.1.1). Then for x ∈ C and 0 < t < 1 we have
x̄ + t (x − x̄) ∈ C and
Since
ϕ(((1 − t)x̄ + tx)) − ϕ(x̄) ≤ t (ϕ(x) − ϕ(x̄)),
we obtain
t −1 (f (x̄ + t (x − x̄)) − f (x̄)) + ϕ(x) − ϕ(x̄) ≥ 0.
Taking the limit t → 0+ , we have (4.5.3) for all x ∈ C.
Conversely, assume that x̄ ∈ C satisfies (4.5.3). Then, from (4.5.1),
i i
i i
ItoKunisc
i i
2008/6/12
page 111
i i
for c > 0 and λ ∈ H . From Theorem 4.39 it follows that x → ϕc (x, λ) is convex,
continuously differentiable, and bounded below by − 2c1 |λ|2H . Thus, for λ ∈ H , (4.5.4) has
a unique solution xc ∈ C and xc ∈ C is the solution of (4.5.4) if and only if xc satisfies
f (xc ), x − xc
+ ϕc (xc , λ), (x − xc ) H ≥ 0 for all x ∈ C. (4.5.5)
It follows from Theorems 4.39 and 4.40 that
1
ϕc (xc , λ) = A 1 xc + λ = λc ∈ H. (4.5.6)
c c
We have the following result.
Proof. To verify (1), assume that xc → x̄ and let λ̄ be a weak cluster point of λc in H .
Then, from (4.5.5)–(4.5.6), we can conclude that (x̄, λ̄) ∈ C × H satisfies
f (x̄), x − x̄
+ λ̄, (x − x̄) H ≥ 0
for all x ∈ C. It follows from Theorems 4.39 and 4.40 that
1
λc , vc H − |λc − λ|2 = ϕc (xc , λ) + ϕ ∗ (λc )
2c
λ 1
≥ ϕ J 1 vc + − |λ|2 + ϕ ∗ (λc ).
c c 2c H
Since J 1 (vc + λc ) → v̄ = x̄ and ϕ and ϕ ∗ are l.s.c., letting c → ∞, we obtain
c
λ̄, v̄ H ≥ ϕ(v̄) + ϕ ∗ (λ̄),
which implies that λ̄ ∈ ∂ϕ(v̄) by Theorem 4.22. Hence, (x̄, λ̄) ∈ C × H satisfies (4.5.7).
Conversely,
suppose that (x̄, λ̄) ∈ C × H satisfies (4.5.7). Then ϕ(x) − ϕ(x̄) ≥
λ̄, (x − x̄) H for all x ∈ C. Thus the inequality in (4.5.7) implies (4.5.3). It then follows
from Theorem 4.42 that x̄ minimizes (4.1.1).
For (2) we note that from (4.5.5) we have
f (xc ), x − xc
+ ϕc (x, λ) − ϕc (xc , λ) ≥ 0 for all x ∈ C.
Thus
f (xc ), x̄ − xc
+ ϕc (x̄, λ) − ϕc (xc , λ) ≥ 0.
i i
i i
ItoKunisc
i i
2008/6/12
page 112
i i
Lemma 4.44. (1) If D(ϕ) = H and bounded above, on bounded sets, then ∂ϕ(xc ) is
nonempty and |∂ϕ(xc )|H is uniformly bounded for c ≥ 1.
(2) If ϕ = χK with K a closed convex set in H and xc ∈ K for all c > 0, then λ̃c
can be chosen to be 0 for all c > 0.
i i
i i
ItoKunisc
i i
2008/6/12
page 113
i i
Theorem 4.7 that ϕ is Lipschitz continuous in the open set B = {v ∈ H : |v − v̄| <
(M + 1) }, where v̄ = x̄. Let L denote the Lipschitz constant. By Theorem 4.24
∂ϕ(vc ) is nonempty for c ≥ 1. Let λ̃c ∈ ∂ϕ(vc ) for c ≥ 1. Hence
L |v − vc | ≥ ϕ(v) − ϕ(vc ) ≥ λ̃c , v − vc
Theorem 4.45 (Complementarity). Assume that there exists a pair (x̄, λ̄) ∈ C × H that
satisfies (4.5.7). Then the complementarity condition λ̄ ∈ ∂ϕ(x̄) can equivalently be
expressed as
λ̄ = ϕc (x̄, λ̄) (4.5.10)
and x̄ is the unique solution of
min f (x) + ϕc (x, λ̄) (4.5.11)
x∈C
Proof. The first claim follows directly from Theorem 4.41. From (4.5.5) we conclude that
x̂ is a minimizer of (4.5.11) if and only if x̂ ∈ C satisfies
f (x̂), x − x̂
X∗ ,X + ϕc (x̂, λ̄), (x − x̂) H ≥ 0 for all x ∈ C.
From (4.5.7) and (4.5.10) it follows that x̄ ∈ C satisfies this inequality as well and hence
x̂ = x̄.
Theorems 4.43 and 4.45 imply that if a pair (x̄, λ̄) ∈ C × H satisfies
⎧
⎨ f (x̄), x − x̄
+ λ̄, (x − x̄) ≥ 0 for all x ∈ C,
(4.5.12)
⎩
λ̄ = ϕc (x̄, λ̄)
for some c > 0, then x̄ is the minimizer of (4.1.1). Conversely, if x̄ is a minimizer of (4.1.1)
and
∂(ϕ ◦ + ψC )(x̄) = ∗ ∂ϕ(x̄) + ∂ψC (x̄), (4.5.13)
then there exists a λ̄ ∈ H such that the pair (x̄, λ̄) satisfies (4.5.12) for all c > 0. Here ψC
denotes the indicator function of the set C. In fact it follows from (4.5.7) that −f (x̄) ∈
∂(ϕ ◦ + ψC )(x̄) and by (4.5.13)
−f (x̄) ∈ ∗ ∂ϕ(x̄) + ∂ψC (x̄) = ∗ ∂ϕ(x̄) + NC (x̄),
where NC (x̄) = {z ∈ X ∗ : z, x − x̄
≤ 0 for all x ∈ C}. This implies that there exists
some λ̄ ∈ ∂ϕ(x̄) such that (4.5.7) holds and also (4.5.12) for the pair (x̄, λ̄). Condition
(4.5.13) holds, for example, if there exists x ∈ int (C) and ϕ is continuous and finite at x
(see, e.g., Propositions 12 and 13 in Section 3.2 of [EkTu]).
i i
i i
ItoKunisc
i i
2008/6/12
page 114
i i
The following theorem discusses the equivalence between the existence of a Lagrange
multiplier λ̄ and uniform boundedness of λc .
f (xc ) + ∗ λc , x − xc
≥ 0 for all x ∈ C,
which by assumption implies that |xc |X is uniformly bounded. Since is compact it follows
that any weakly convergent sequence λc → λ∗ in H satisfies ∗ λc → ∗ λ̄ strongly in X ∗ .
Again, from (A2) we have
for any c, ĉ > 0. Hence {xc } is a Cauchy sequence in X and thus there exists x̄ ∈ X such
that |xc − x̄|X → 0. The rest of the proof for the “only if” part is identical to the one of the
first part of Theorem 4.43. It will be shown in Theorem 4.49 that
σ 1 1
|xc − x̄|2X + |λc − λ̄|2H ≤ |λ − λ̄|2H
2 2c 2c
if (4.5.7) holds. This implies the “if” part.
i i
i i
ItoKunisc
i i
2008/6/12
page 115
i i
for λ ∈ H . The update is then a steepest ascent step for dc (λ). In fact it will be shown in
the following lemma that (4.6.1) attains the minimum at a unique minimizer x(λ) ∈ C and
that dc is continuously Fréchet differentiable with F -derivative
where uc (x, λ) is defined in Theorem 4.39. Thus the steepest ascent step is given by
λk+1 = λk + c u(x(λk ), λk ), which by Theorem 4.40 coincides with the update given in
Step 3.
Lemma 4.47. For λ ∈ H and c > 0 the infimum in (4.6.1) is attained at a unique minimizer
x(λ) ∈ C and the mapping λ ∈ H → x(λ) ∈ X is Lipschitz continuous with Lipschitz
constant σ −1 . Moreover, the dual functional dc is continuously Fréchet differentiable with
F-derivative
dc (λ) = u(x(λ), λ),
where uc (v, λ) is defined in Theorem 4.39.
+ ϕc (x(μ), μ) − ϕc (x(λ), λ), (x(μ) − x(λ)) ≤ 0.
i i
i i
ItoKunisc
i i
2008/6/12
page 116
i i
Since A 1 is monotone and Lipschitz continuous with Lipschitz constant c, this inequality
c
yields
σ |x(μ) − x(λ)|X ≤ |μ − λ|H ,
which shows the first assertion.
Step 2. We show that for every v ∈ H the functional λ ∈ H → ϕc (v, λ) is Fréchet
∂
differentiable with F -derivative ∂λ ϕc (v, λ) given by u(v, λ). Since (4.4.2) can be equiva-
lently written as (4.4.3), it follows from the proof of Theorem 4.39 that λ ∈ H → ϕc (v, λ)
is Fréchet differentiable and
∂ 1 λ
ϕc (v, λ) = (λ + c u(v, λ)) − = u(v, λ).
∂λ c c
Step 3. To argue differentiability of λ → dc (λ) note that it follows from (4.6.2) that
for λ, μ ∈ H
dc (μ) − dc (λ) = f (x(μ)) − f (x(λ)) + ϕc (x(μ), λ) − ϕc (x(λ), λ)
1
1
≤ |μ − λ|
A 1 x(μ) + 1 (λ + t (μ − λ)) − A 1 x(λ) + λ + t (λ − μ)
dt
c
c c c c
0
1 1
1
≤ |μ − λ| (c|x(μ) − x(λ)| + 2t |μ − λ|) dt ≤ + |μ − λ|2 .
c 0 σ c
i i
i i
ItoKunisc
i i
2008/6/12
page 117
i i
Hence (4.6.3)–(4.6.4) and Step 2 imply that λ ∈ H → dc (λ) is Fréchet differentiable with
F -derivative given by u(x(λ), λ), where u(x(λ), λ) is Lipschitz continuous in λ.
Theorem 4.48. Assume that (A1)–(A3) hold and that there exists λ̄ ∈ ∂ϕ(x̄) such that
(4.5.7) is satisfied. Then the sequence (xk , λk ) is well defined and satisfies
σ 1 1
|xk − x̄|2X + |λk+1 − λ̄|2H ≤ |λk − λ̄|2H (4.6.5)
2 2c 2c
and
∞
σ 1
|xk − x̄|2X ≤ |λ1 − λ̄|2H , (4.6.6)
k=1
2 2c
Proof. It follows from Theorem 4.45 that λ̄ = ϕc (v̄, λ̄), where v̄ = x̄. Next, we establish
(4.6.5). From (4.5.5) and Step 3
f (xk ), x̄ − xk
+ λk+1 , (x̄ − xk ) ≥ 0
and from (4.5.7)
f (x̄), xk − x̄
+ λ̄, (xk − x̄) ≥ 0.
Adding these two inequalities, we obtain
f (xk ) − f (x̄), xk − x̄
+ λk+1 − λ̄, (xk − x̄) ≤ 0. (4.6.7)
From Theorems 4.39 and 4.40
λk λ̄
λk+1 − λ̄ = A 1 vk + − A1 v̄ + ,
c c c c
where vk = xk and v̄ = x̄. From the definition of Aμ we have
Aμ v − Aμ v̂, v − v̂
= μ |Aμ v − Aμ v̂|2 + Aμ v − Aμ v̂, Jμ v − Jμ v̂
≥ μ |Aμ v − Aμ v̂|2
for μ > 0 and v, v̂ ∈ H , since Aμ v ∈ AJμ v and A is monotone. Thus,
λk λ̄
(λk+1 − λ̄, vk − v̄) = λk+1 − λ̄, vk + − v̄ +
c c
1 1 1
− (λk+1 − λ̄, λk − λ̄) ≥ |λk+1 − λ̄|2 − (λk+1 − λ̄, λk − λ̄)
c c c
1 1
≥ |λk+1 − λ̄|2 − |λk − λ̄|2 .
2c 2c
Hence, (4.6.5) follows from (A2) and (4.6.7). Summing up (4.6.5) with respect to k, we
obtain (4.6.6).
i i
i i
ItoKunisc
i i
2008/6/12
page 118
i i
where xk ∈ C solves
f (xk ) + ∗ λk , x − xk
≥ 0 for all x ∈ C. (4.6.9)
It will be shown that the Uzawa method is conditionally convergent in the sense that there
exists 0 < c < c̄ such that it converges for c ∈ [c, c̄]. On the other hand, the augmented
Lagrangian method can be written as
λk+1 = ϕc (xk , λk ),
where xk ∈ C satisfies
f (xk ) + ∗ λk+1 , x − xk
≥ 0 for all x ∈ C.
Note that the Uzawa method is explicit with respect to λ while the augmented Lagrangian
method is implicit and converges unconditionally (see Theorem 4.48) with respect to c.
Theorem 4.49 (Uzawa Algorithm). Assume that (A1)–(A3) hold and that there exists
λ̄ ∈ ∂ϕ(x̄) such that (4.5.7) is satisfied. Then there exists c̄ such that for the sequence
(xk , λk ) generated by (4.6.8)–(4.6.9) we have |xk − x̄|X → 0 as k → ∞ if 0 < c ≤ c̄.
Proof. As shown in the proof of Theorem 4.48 it follows from (4.5.7) and (4.6.9) that
f (xk ) − f (x̄), xk − x̄
+ λk − λ̄, (xk − x̄)
≤ 0. (4.6.10)
Since
λk λ̄
λk+1 − λ̄ = A 1 vk + −A 1 v̄ + ,
c c c c
where vk = xk , and since v̄ = x̄ and A 1 is Lipschitz continuous with Lipschitz con-
c
stant c,
|λk+1 − λ̄| ≤ |λk − λ̄ + c (vk − v̄)|2
2
i i
i i
ItoKunisc
i i
2008/6/12
page 119
i i
4.7 Applications
In this section we discuss applications of the results in Sections 4.5 and 4.6. In many cases
the conjugate functional ϕ ∗ is given by
ϕ ∗ (v) = ψK ∗ (v),
where K ∗ is a closed convex set in H and ψS is the indicator function of a set S, i.e.,
⎧
⎨ 0 if x ∈ S,
ψS (x) =
⎩
∞ if x ∈ S.
Then it follows from Theorem 4.40 that for v, λ ∈ H
1 1
ϕc (v, λ) = sup − |y ∗ − (λ + c v)|2H + (|λ + c v|2H − |λ|2H ). (4.7.1)
y ∗ ∈K ∗ 2c 2c
Hence the supremum is attained at λc (v, λ) = ProjK ∗ (λ + c v), where ProjK ∗ (φ) denotes
the projection of φ ∈ H onto K ∗ . This implies that the update in Step 3 of the augmented
Lagrangian algorithm is given by
λk+1 = λk + c xk ,
which coincides with the first order augmented Lagrangian update for equality constraints
discussed in Chapter 3.
Example 4.51. If ϕ(v) = ψK (v), where K is a closed convex cone with vertex at the origin
in H , then ϕ ∗ = ψK + , where K + = {w ∈ H : w, v
H ≤ 0 for all v ∈ K} is the dual cone of
K. In particular, if K = {v ∈ L2 () : v ≤ 0 a.e.}, then K + = {w ∈ L2 () : w ≥ 0 a.e.}.
Thus for the inequality constraint in H = L2 () the update (4.7.2) becomes
where the max operation is defined pointwise in . Here L2 () denotes the space of scalar-
valued square integrable functions over a domain in Rn . For K = {v ∈ Rn : vi ≤
0 for all i} the update (4.7.2) is given by λk+1 = max(0, λk + cxk ), where the maximum
is taken coordinatewise. It coincides with the first order augmented Lagrangian update for
finite rank inequality constraints in Chapter 3.
i i
i i
ItoKunisc
i i
2008/6/12
page 120
i i
Example 4.52. If ϕ(v) = |v|H , then ϕ ∗ = ψB , where B is the closed unit ball in H . Thus
the update (4.7.2) becomes
λ k + c k xk
λk+1 = .
max(1, |λ + c xk |)
φ+ψ y∗ − λ ψ − φ
x− − ∈ ∂(| · |)(y ∗ )
2 c 2
a.e. in . Thus it follows from Theorem 4.40 that the complementarity condition (4.1.8) is
given by
and the Lagrange multiplier update in Step 3 of the augmented Lagrangian method is
where the max and min operations are defined pointwise a.e. in .
Note that with obvious modifications, x in Sections 4.5 and 4.6 can be replaced by
the affine function of the form x + a with a ∈ H .
where is a bounded open set in R2 with Lipschitz boundary, g and μ are positive constants,
and f˜ ∈ L2 (). For a discussion of (4.7.3) we refer the reader to [Glo, GLT] and the
references therein. In the context of the general theory of Section 4.5 we choose
i i
i i
ItoKunisc
i i
2008/6/12
page 121
i i
and
3
ϕ(v1 , v2 ) = v12 + v22 dx.
Conditions (A1)–(A3) are clearly satisfied. Since dom(ϕ) = H it follows from Theo-
rem 4.45 and Lemma 4.44 that there exists λ̄ such that (4.5.7) holds. Moreover, it is not
difficult to show that
ϕ ∗ (v) = ψK ∗ (v),
where K ∗ is given by
Hence it follows from (4.7.2) that Steps 2 and 3 in the augmented Lagrangian method are
given by
where
⎧
⎪
⎪ λ + c ∇uk on Ak = {x : |λk (x) + c ∇uk (x)|R2 ≤ 1},
⎨ k
λk+1 = λk + c ∇uk (4.7.5)
⎪
⎪
⎩ on \ Ak .
|λk + c ∇uk |R2
in the strong form. Here K : L2 () → L2 () denotes the convolution operator defined by
(Ku)(x) = k(x, s)u(s) ds.
i i
i i
ItoKunisc
i i
2008/6/12
page 122
i i
where Ĉ is the closed convex set defined by C = {v ∈ H : |v|R2 ≤ 1 a.e. in }. Then
ϕ ∗ (w) = |w|R2 dx.
a.e. in . It thus follows from Theorem 4.40 that the Lagrange multiplier update is given by
k
λk + c ∇uk λ + c ∇uk
λk+1 = c max 0, | |−1
c |λk + c ∇uk |
a.e. in . The existence of Lagrange multiplier λ̄ ∈ L∞ () for the inequality constraint in
(4.7.6) is shown in [Bre2] for f˜ = 1. In general, existence is still an open problem.
i i
i i
ItoKunisc
i i
2008/6/12
page 123
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 124
i i
We note that (4.5.4) has a unique solution uc ∈ H01 (). From (4.5.5)–(4.5.6) we deduce
that uc satisfies (4.7.13). Since λ̂ ∈ L2 () it follows that λc (uc , λ̂) ∈ L2 () and thus
uc ∈ H 2 (). Let η = sup(0, uc − ψ). Then η ∈ H01 (). Hence, we have
For x ∈ assume that uc (x) ≥ ψ(x). If λ̂(x) > 0, then −(ψ + f˜)+λc = c (uc −ψ) ≥ 0.
If λ̂(x) = 0, then −(ψ + f˜) + λc ≥ c (uc − ψ) ≥ 0. If λ̂(x) < 0, then −(ψ + f˜) + λc ≥
−(ψ − φ) + c (ψ − φ) ≥ 0 for c ≥ c0 . Thus, we have (−(ψ + f˜) + λc , η) ≥ 0 and
|∇η|2 = 0. This implies that η = 0 and uc ≤ ψ a.e. in . Similarly, we can prove that
uc ≥ φ a.e. in by choosing the test function η = inf (0, uc − φ) ∈ H01 (). Moreover, it
follows from (4.7.14) that |λc | = |λc (uc , λ̂)| ≤ |λ̂| a.e. in and |λc |L2 is uniformly bounded.
Thus, there exists a weakly convergent subsequence (uĉ , λĉ ) (ū, λ̄) in H 2 () × L2 ().
Moreover the subsequence uĉ converges strongly to ū in H01 () and, as shown in the proof
of Theorem 4.46,
−ū + λ̄ = f˜ and λ̄ ∈ ∂ϕ(ū). (4.7.15)
Hence, it follows from Theorem 4.43 that ū minimizes (4.7.7). Since the solution to (4.7.7)
is unique it follows from (4.7.15) that λ̄ ∈ L2 () is unique. The theorem now follows from
Theorem 4.45.
From Theorem 4.54 it follows that Steps 2 and 3 in the augmented Lagrangian method
for the two-sided constraint are given by
⎧
⎨ λk + c (uk − ψ) if λk + c (uk − ψ) > 0,
−uk + λk+1 = f˜, λk+1 = λk + c (uk − φ) if λk + c (uk − φ) < 0,
⎩
0 otherwise.
i i
i i
ItoKunisc
i i
2008/6/12
page 125
i i
which is a simplified version of a contact problem arising in elasticity theory. In this case
we choose
λk+1 = max(0, λk − c uk ) on .
where is a bounded open domain of R2 with sufficiently smooth boundary . In this case
we choose
i i
i i
ItoKunisc
i i
2008/6/12
page 126
i i
4.7.7 L1 -fitting
for interpolation of noisy data z ∈ L2 () by minimizing the L1 -norm of the error u − z
over u ∈ H01 (). Again μ > 0 is fixed but should be adjusted to the statistics of noise. The
analysis of this problem is very similar to that of a friction problem and Steps 2 and 3 in the
augmented Lagrangian method are given by
−μ uk + λk+1 = 0,
⎧
⎨ λk + c (uk − z) on k = {x : |λk (x) + c (uk (x) − z)| ≤ 1},
λk+1 = λk + c (uk − z)
⎩ on \ k .
|λk + c (uk − z)|
where A ∈ Rn×n , B ∈ Rn×m , and x0 ∈ Rn are fixed. In this case we formulate the problem
as in the form (4.1.1) by choosing
where K = {v ∈ H : |v(t)| ≤ 1 a.e. in (0, T )}. C is the closed affine space defined by
t
C = (x, u) ∈ X : x(t) − x0 − Ax(s) + Bu(s) ds = 0 .
0
i i
i i
ItoKunisc
i i
2008/6/12
page 127
i i
is surjective it follows from the Lagrange multiplier theory that there exists a unique La-
grange multiplier μc ∈ L2 (0, T ; Rn ) such that the Fréchet derivative of L with respect to
(x, u) satisfies L (xc , uc , μc )(h, v) = 0 for all (h, v) ∈ X. Hence we obtain
T 1 t 2
h(t) − Ah(s) ds, μc (t) + (x(t), h(t)) dt = 0
0 0
i i
i i
ItoKunisc
i i
2008/6/12
page 128
i i
and thus
⎧
⎪
⎪ |B T pc (t)| if |B T pc (t)| ≤ 1,
⎨
|uc (t)| = t ∈ [0, T ]. (4.7.22)
⎪
⎪ |B T pc (t)| − 1
⎩ 1+ if |B pc (t)| ≥ 1,
T
c+1
Note that
−B T pc (t)
uc (t) = ,
1 + ηc (t)|uc (t)|−1
(4.7.23)
c
where ηc (t) = c max(0, |uc (t)| − 1) = max(0, |B T pc (t)| − 1).
1+c
−B T p̄(t) −B T p̄(t)
η̄(t) = max(0, |B T p̄(t)| − 1) and û(t) = = (4.7.24)
1 + η̄(t) max(1, |B T p̄(t)|)
which is equivalent to (4.5.7). Now, from Theorem 4.43 we deduce that (x̄, ū) is the solution
to (4.7.19). The last equality in (4.7.25) can be verified by separately considering the cases
|ū(t)| < 1 and |ū(t)| = 1. It follows from (4.7.20) that Steps 2 and 3 in the augmented
Lagrangian method are given by
d
xk (t) = Axk (t) + Buk (t), xk (0) = x0 ,
dt
d
pk (t) + AT pk (t) + xk = 0, pk (T ) = 0,
dt
λk
λ k + c uk
uk (t) + λk+1 (t) = −B T pk (t), where λk+1 = c max 0,
uk +
− 1
c |λk + c uk |
for t ∈ [0, T ].
i i
i i
ItoKunisc
i i
2008/6/12
page 129
i i
Chapter 5
In this chapter we discuss the Newton method for the equality-constrained optimal control
problem
min J (y, u) subject to e(y, u) = 0, (P )
where J : Y × U → R and e : Y × U → W , and Y, U , and W are Hilbert spaces. The
focus is on problems where for given u there exists a solution y(u) to e(y, u) = 0, which is
typical for optimal control problems. We shall refer to
as the reduced cost functional. In the first section we give necessary and sufficient optimality
conditions based on Taylor series expansion arguments. Section 5.2 is devoted to Newton’s
method to solve (P ) and sufficient conditions for quadratic convergence are given. Sec-
tion 5.3 contains a discussion of SQP (sequential quadratic programming) and reduced SQP
techniques. We do not provide a convergence analysis here, since this can be obtained as
a special case from the results on second order augmented Lagrangians in Chapter 6. The
results are specialized to a class of optimal control problems for the Navier–Stokes equation
in Section 5.4. Section 5.5 is devoted to the Newton method for weakly singular problems
as introduced in Chapter 1.
5.1 Preliminaries
(I) Necessary Optimality Condition
The first objective is to characterize the derivative of Jˆ without recourse to the derivative
of the state y with respect to the control u. This is in the same spirit as Theorem 1.17 but
obtained under the less restrictive assumption of this chapter. We assume that
(C1) (y, u) ∈ Y × U satisfies e(y, u) = 0 and there exists a neighborhood V (y) × V (u) of
(y, u) on which J and e are C 1 with Lipschitz continuous first derivatives such that
for every v ∈ V (u) there exists a unique y(v) ∈ V (y) satisfying e(y(v), v) = 0.
129
i i
i i
ItoKunisc
i i
2008/6/12
page 130
i i
Theorem 5.1. Assume that the pair (y, u) satisfies (C1) and that there exists λ ∈ W ∗ such
that
(C2) ey (y, u)∗ λ + Jy (y, u) = 0, and
(C3) limt→0+ 1t |y(u + td) − y|2Y = 0 for all d ∈ U .
Since J is Lipschitz continuous in V (y) × V (u), it follows from (5.1.2) and conditions
(C2)–(C3) that
If ey (y, u) : Y → W is bijective, then (5.1.1) also follows from the implicit function
theory. In fact, by the implicit function theorem there exists a neighborhood V (y) × V (u)
of (y, u) such that e(y, v) = 0 has a unique solution y(v) ∈ V (y) for v ∈ V (u) and
y(·) : V (u) ⊂ U → Y is C 1 with
i i
i i
ItoKunisc
i i
2008/6/12
page 131
i i
Thus
or equivalently
L (y ∗ , u∗ , λ∗ ) = 0, e(y ∗ , u∗ ) = 0,
where, as in the previous chapter, prime with L denotes the derivative with respect to (y, u).
and
we find by summing (5.1.5) and (5.1.6) with y = y(u) and using (5.1.4)
Based on this identity, we have the following local sufficient optimality conditions.
• If E(u) := E1 (y(u) − y ∗ , u − y ∗ ) + E2 (y(u) − y ∗ , u − u∗ ) ≥ 0 for all u ∈ V (u∗ ),
then u∗ is a local minimizer of (P ). If E(u) > 0 for all u ∈ V (u∗ ) with u = u∗ , then u∗ is
a strict local minimizer.
• Assume that J is locally uniformly convex in the sense that for some α > 0
i i
i i
ItoKunisc
i i
2008/6/12
page 132
i i
and e (x ∗ ) : Y × U → W is surjective, then there exist σ0 > 0 and ρ > 0 such that
for all y(u) ∈ V (y ∗ )×V (u∗ ) with |(y(u), u)−(y(u∗ ), u∗ )| < ρ. In fact, from Lemma 2.13,
Chapter 2, with S = ker e (x ∗ ) there exist γ > 0, σ0 > 0 such that
L (x ∗ , λ∗ )(x − x ∗ , x − x ∗ ) ≥ σ0 |x − x ∗ |2
(5.1.10)
for all x ∈ V (y ∗ ) × V (u∗ ) with |xker⊥ | ≤ γ |xker |,
where we decomposed x − x ∗ as
Since 0 = e(x) − e(x ∗ ) = e (x ∗ )(x − x ∗ ) + η(x − x ∗ ), where |η(x − x ∗ )|W = o(|x − x ∗ |),
it follows that for δ ∈ (0, 1+γ
γ
) there exists ρ > 0 such that
Consequently
δ
|xker⊥ | ≤ |xker | < γ |xker | if |x − x ∗ | < ρ,
1−δ
and hence (5.1.9) follows from (5.1.7) and (5.1.10).
We refer the reader to [CaTr, RoTr] and the literature cited therein for detailed in-
vestigations on the topic of second order sufficient optimality. The aim of these results is
to establish second order sufficient optimality conditions which are close to second order
necessary conditions.
i i
i i
ItoKunisc
i i
2008/6/12
page 133
i i
Let (y ∗ , u∗ ) denote a solution to (P ), and assume that (C1) holds for (y ∗ , u∗ ) and that (C2),
(C3) hold for all (y(u), u) ∈ V (y ∗ ) × V (u∗ ). In addition it is assumed that J and e are
C 2 in V (y ∗ ) × V (u∗ ) with Lipschitz continuous second derivatives. From Theorem 5.1 the
first derivative of Jˆ(u) is given by
admits a solution (μ1 , μ2 ) ∈ L(U, Y ) × L(U, W ∗ ). Note that (5.2.5) is an operator equa-
tion in L(U, Y ∗ ) × L(U, W ). It consists of the linearized primal equation for μ1 and the
adjoint operator equation with right-hand side −ey (y, u)∗ μ2 − Lyu (y, u, λ) ∈ L(U, Y ∗ )
for μ2 .
Using (5.24) and the adjoint operator in the form Ly (y, u, λ) = 0 and e(y, u) = 0,
we find
Jˆ (u) = Lyu (y, u, λ)μ1 + eu (y, u)∗ μ2 + Luu (y, u, λ), (5.2.6)
where (μ1 , μ2 ) satisfy (5.2.5). In fact, since e and J are required to have Lipschitz con-
tinuous second derivatives in V (y ∗ ) × V (u∗ ) we obtain the following relationships in W
and Y ∗ :
i i
i i
ItoKunisc
i i
2008/6/12
page 134
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 135
i i
Newton Method.
(i) Initialization: Choose u0 ∈ V (u∗ ), solve
and set k = 0
(ii) Newton step: Solve for (δy, δu, δλ) ∈ Y × U × W ∗
⎛ ⎞ ⎛ ⎞
δy 0
L (yk , uk , λk ) e (yk , uk )∗ ⎝ ⎠ + ⎝ eu (yk , uk )∗ λk + Ju (yk , uk ) ⎠ = 0.
δu
e (yk , uk ) 0
δλ 0
The following theorem provides sufficient conditions for local quadratic convergence.
Here we assume that ey (x ∗ ) : Y → W is a bijection at a local solution x ∗ = (y ∗ , u∗ ) of (P ).
Then there corresponds a unique Lagrange multiplier λ∗ .
i i
i i
ItoKunisc
i i
2008/6/12
page 136
i i
where ey (y(u), u)∗ λ(u) = −Jy (y(u), u). Thus for ρ = min(r, K̂r ) the inequalities in
(5.2.11) hold with x = (y(u), u) and λ = λ(u), provided that |u − u∗ | < ρ. Let S(u) =
T (y(u), u)∗ L (y(u), u, λ(u))T (y(u), u). Then there exists μ > 0 such that
Let |u0 − u∗ | < min( μκ , ρ) and, proceeding by induction, assume that |uk − u∗ | ≤ |u0 − u∗ |.
Then
S(uk )(uk+1 − u∗ ) = S(uk )(uk − u∗ ) − Jˆ (uk ) + Jˆ (u∗ )
1 (5.2.12)
= (S(uk + s(u∗ − uk )) − S(uk ))(u∗ − uk ) ds
0
and hence
2μ μ
|uk+1 − u∗ | ≤ |uk − u∗ |2 ≤ |u0 − u∗ | |u0 − u∗ | ≤ |u0 − u∗ | < ρ.
κ 2 κ
Remark 5.2.2. Note that the reduced Hessian S = T ∗ L T is a Schur complement of the
linear system (5.2.10). That is, if we eliminate δy and δλ by
i i
i i
ItoKunisc
i i
2008/6/12
page 137
i i
Further let
denote the Lagrangian and let λ∗ be a Lagrange multiplier at the solution x ∗ . Then a
necessary first order optimality condition is given by
L (y ∗ , u∗ , λ∗ ) = 0, e(x ∗ ) = 0, (5.3.2)
where σ > 0. Differently from the Newton method described in the previous section,
in the SQP method both y and u are considered as independent variables related by the
equality constraint e(y, u) = 0 which is realized by a Lagrangian term. The SQP method
then consists essentially in a Newton method applied to the necessary optimality condition
(5.3.2) to iteratively solve for (y ∗ , u∗ , λ∗ ). This results in determining updates from the
linear system
⎛ ⎞ ⎛ ⎞
δy ey (yk , uk )∗ λk + Jy (yk , uk )
L (yk , uk , λk )e (yk , uk )∗ ⎝
δu ⎠ + ⎝ eu (yk , uk )∗ λk + Ju (yk , uk ) ⎠ = 0.
e (yk , uk ) 0
δλ e(yk , uk )
(5.3.4)
i i
i i
ItoKunisc
i i
2008/6/12
page 138
i i
If the values for yk and λk are obtained by means of a feasibility step as in (iv) of the Newton
algorithm, then the first and the last components on the right-hand side of (5.3.4) are 0 and
we arrive at the Newton algorithm. This will be further investigated in Section 5.5 for
weakly singular problems. Note that for affine constraints the iterates, except possibly the
initialization, are feasible by construction, and hence e(xk ) = 0.
Let us note that given an iterate xk = (yk+1 , uk+1 ) near x ∗ the SQP step for (5.3.1) is
also obtained as the necessary optimality to the quadratic subproblem
⎧
⎪
⎨min L (xk , λk ), h
+
1
L (xk , λk )(h, h)
2 (5.3.5)
⎪
⎩subject to e (x )h + e(x ) = 0, h ∈ X,
k k
then
T (x ∗ )∗ J (x ∗ ) = 0 (5.3.6)
and
T (x ∗ )∗ L (x ∗ , λ∗ )T (x ∗ ) ≥ σ I, (5.3.7)
respectively.
Referring to (5.3.5), let qk ∈ ker e (xk )⊥ ⊂ X satisfy e (xk )qk + e(xk ) = 0. Then
hk ∈ X can be expressed as
hk = qk + T (xk )w.
Thus, (5.3.5) is reduced to
(5.3.8)
1
+ T (xk )w, L (xk , λk )T (xk )w
,
2
i i
i i
ItoKunisc
i i
2008/6/12
page 139
i i
Therefore the full SQP step is decomposed into a minimization step in H , a restoration step
to the linearized equation
e (xk )q + e(xk ) = 0,
and an update of the Lagrange multiplier according to
If e (xk ) ∈ L(X, W ) admits a right inverse R(xk ) ∈ L(W, X) satisfying e (xk )R(xk ) = IW ,
then qk = −R(xk )e(xk ) in (5.3.9) and
hk = −Rk e(xk ) − Tk Tk∗ L (xk , λk )Tk )−1 Tk∗ (J (xk ) − L (xk , λk )Rk e(xk ) , (5.3.11)
where Tk = T (xk ) and Rk = R(xk ) and we used that T ∗ w = 0 for w ∈ range e (x)∗ . Note
that a right inverse to e (x) for x in a neighborhood of x ∗ exists if e (x ∗ ) is surjective and
x → e (x) is continuous.
An alternative to deriving the update wk is given by differentiating
d
(T (x ∗ )∗ J (x ∗ )) = T (x ∗ )∗ L (x ∗ , λ∗ ).
dx
This representation holds in general only at the solution x ∗ . But if we use its structure for
an update of the form hk = qk + T (xk )wk in a Newton step to (5.3.6), we arrive at
Tk∗ L (xk , λk )hk = Tk∗ L (xk , λk )(Tk wk − R(xk )e(xk )) = −Tk∗ J (xk ),
This equation for the update of the control coincides with the Schur complement form of
the Newton update given in (5.2.13), since
i i
i i
ItoKunisc
i i
2008/6/12
page 140
i i
where we used the fact that in the Newton method ey∗ (xk )λk + Jy (xk ) = 0. The updates for
the state and the adjoint state differ, however, for the Newton and the reduced SQP methods.
A second distinguishing feature for reduced SQP methods is that often the reduced
Hessians Tk∗ L (xk , λk )Tk are approximated by invertible operators Bk ∈ L(H ) suggested
by secant methods. Thus a reduced SQP step has the form xk+1 = xk + hRED k , where
−1 ∗
k = −Rk e(xk ) − Tk Bk Tk J (xk ).
hRED (5.3.13)
For the reduced SQP method the step hRED k in general depends on the specific choice of
the null-space representation and the right inverse. In the full SQP method the step hk in
(5.3.11) is invariant with respect to these choices. A third distinguishing feature of reduced
SQP methods is the choice of the Lagrange multiplier, which is required to specify the
update of Bk . The λ-update is typically not derived from the first equation in (5.3.4). From
the first order condition we have
J (x ∗ ), R(x ∗ )h
+ λ∗ , e (x ∗ )R(x ∗ )h
= J (x ∗ ), R(x ∗ )h
+ λ∗ , h
= 0
for all h ∈ W.
Other choices for the Lagrange multiplier update are possible. For convergence proofs of
reduced SQP methods this update is required to be locally Lipschitz continuous; see [Kup,
KuSa].
In the case of problem (P ), a vector (δy, δu) ∈ X lies in the null-space of e (x) if
ey (x)δy + eu (x)δu = 0.
This suggests choosing H = U and using the following definitions for the null-space
representation and the right inverse:
(iv) update λ.
i i
i i
ItoKunisc
i i
2008/6/12
page 141
i i
The update of the Lagrange multiplier λ is needed for the approximation Bk to the reduced
Hessian T ∗ (xk )L (xk , λk )T (xk ). A popular choice is given by the BFGS-update formula
1 1
Bk+1 = Bk + zk , ·
zk − Bk δuk , ·
Bk δuk ,
zk , δuk
δuk , Bk δuk
where
zk = T ∗ (xk )L xk + T (xk )δuk , λk − J (xk ).
Each SQP step requires at least three linear systems solves, one in W ∗ for the evaluation of
T (xk )∗ J (xk ), one in U for δu, and another one in Y for δy. The update of the BFGS formula
requires one additional system solve. The typical convergence behavior that can be proved
for reduced SQP methods with BFGS update is local two-step superlinear convergence
[Kup, KuSa, KSa],
|xk+1 − x ∗ |
lim =0
k→∞ |xk−1 − x ∗ |
provided that
B0 − T (x ∗ )∗ L (x ∗ , λ∗ )T (x ∗ ) is compact. (5.3.16)
Here B0 denotes the initialization to the reduced Hessian approximation, and (x ∗ , λ∗ ) is a
solution to (5.1.4). If e is C 1 at x ∗ and ey (x ∗ ) is continuously invertible and compact, then
(5.3.16) holds for the choice B0 = Luu (x ∗ , λ∗ ). In [HiK2], condition (5.3.16) is analyzed
for a class of optimal control problems related to the stationary Navier–Stokes equations.
We recall that condition (5.3.16) is related to an analogous condition in the context of solving
nonlinear equations F (x) = 0 in infinite-dimensional Hilbert spaces by means of secant
methods, in particular the Broyden method. The initialization of the Jacobian must be a
compact perturbation of the linearization of F at the solution to ascertain local Q-superlinear
convergence [Grie, Sa].
If we deal with finite-dimensional problems with e a mapping from RN into RM , with
M < N , then a common null-space representation is provided by the QR decomposition.
Let H = RN−M and
R1 (x)
e (x) = Q(x)R(x) = (Q1 (x) Q2 (x))
T
,
0
i i
i i
ItoKunisc
i i
2008/6/12
page 142
i i
In particular, the term L (xk , λk )Rk e(xk ) is again deleted from the expression in (5.3.10)
and the Lagrange multiplier update is chosen according to (5.3.14). However, differently
from (5.3.13), the reduced Hessian is not approximated in (5.3.17). Here we do not indicate
the iteration index k, and we assume that ey (x) : Y → W is bijective and that (5.3.7) holds.
Then (q, w, δλ) is a solution to (5.3.17) if and only if (δy, δu, δλ) is a solution to the
system
⎛ ⎞⎛ ⎞ ⎛ ⎞
0 0 ey (x)∗ δy Ly (x, λ)
⎝ 0 T (x)∗ L (x, λ)T (x) eu (x)∗ ⎠ ⎝ δu ⎠ = − ⎝ Lu (x, λ) ⎠ , (5.3.18)
ey (x) eu (x) 0 δλ e(x)
with w = δu, q = −R(x)e(x), and (δy, δu) = q + T (x)δu. In fact, R ∗ (x) is a left inverse
to e (x)∗ , and from the first equation in (5.3.18) we have
System (5.3.18) should be compared to (5.3.4). We note that the reduced SQP step is
equivalent to a block tridiagonal system, where the “Luu ” element in the system matrix is
replaced by the reduced Hessian while the elements corresponding to “Lyy ” and “Lyu ” are
zero. The system matrix of (5.3.18) can be used advantageously as preconditioning for
iteratively solving (5.3.4). In fact, let us denote the system matrices in (5.3.4) and (5.3.18)
by S and Sred , respectively, and consider the iteration
−1
δzn+1 = δzn − Sred S δzk + col (Ly (x, λ), Lu (x, λ), e(x)) . (5.3.19)
−1
In [IKSG] it was proved that the iteration matrix I − Sred S is nilpotent of degree 3. Hence
the iteration (5.3.19) converges in 3 steps to the solution of (5.3.4) with (xk , λk ) = (x, λ).
i i
i i
ItoKunisc
i i
2008/6/12
page 143
i i
1 1
((−A0 ) 2 v, (−A0 ) 2 w)H for all v, w ∈ W . Since A and A0 coincide on dom A0 , and
dom A0 is dense in V , we shall not distinguish between A and A0 as operators in L(V , V ∗ ).
In (5.4.1), moreover, Ũ is the Hilbert space of controls, B ∈ L(Ũ , V ∗ ), and y0 ∈ H . We as-
sume that h ∈ C 1 (Ũ , R) with Lipschitz continuous first derivative and ∈ C 1 (V , R), with
Lipschitz continuous on bounded subsets of H . The nonlinearity F is supposed to satisfy
⎧
⎪
⎪ F : V → V ∗ is continuously differentiable and there exists
⎪
⎪
⎪
⎪
⎪ a constant c > 0 such that for every ε > 0
⎪
⎪
⎪
⎪
⎪
⎪
⎨ F (y) − F (ȳ), y − ȳ
V ∗ ,V ≤ ε|y − ȳ|2V + εc (|ȳ|2V + |y|2H |y|2V )|y − ȳ|2H ,
(H1)
⎪
⎪ F (ȳ)y, y
V ,V ≤ ε|y|V + ε |y|H |ȳ|V (1 + |ȳ|H ),
2 c 2 2 2
⎪ ∗
⎪
⎪
⎪
⎪
⎪
1 1 1 1
⎪
⎪ |(F (y) − F (ȳ))v|V ∗ ≤ c|y − ȳ|H2 |y − ȳ|V2 |v|H2 |v|V2
⎪
⎪
⎪
⎩
for all y, ȳ, and v in V .
⎧
⎪
⎪ For any u ∈ L2 (0, T ; Ũ ) there exists a unique weak
⎪
⎪
⎪
⎪
⎪
⎪ solution y = y(u) in
⎪
⎪
⎪
⎪
⎪
⎪ W (0, T ) = L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ) satisfying
⎪
⎪
⎨
(H2) dtd y(t), ψ
V ∗ ,V = A0 y(t) + F (y(t)) + Bu(t), ψ
V ∗ ,V
⎪
⎪
⎪
⎪
⎪ for all ψ ∈ V , and y(0) = y0 .
⎪
⎪
⎪
⎪
⎪
⎪
⎪
⎪ Moreover {y(u) : |u|L2 (0,T ;Ũ ) ≤ r} is bounded in
⎪
⎪
⎪
⎩
W (0, T ) for each r > 0.
With these preliminaries the dynamical system in (5.4.1) can be considered with values
in V ∗ . The conditions on A0 and F are motivated by the two-dimensional incompressible
i i
i i
ItoKunisc
i i
2008/6/12
page 144
i i
∇y = 0 in (0, T ] × , (5.4.2)
⎪
⎪
⎩
y(0, ·) = y0 in ,
where n is the outer unit normal vector to ∂, let denote the Laplacian in H , and let P F
denote the orthogonal projection of L2 ()2 onto the closed subspace H . Then A0 = νP
1
is the Stokes operator in H . It is a self-adjoint operator with domain dom((−A0 ) 2 ) = V and
It is well defined due to (5.4.4). Moreover (H1) follows from (5.4.3) and (5.4.4). From the
theory of variational solutions to the Navier–Stokes equations it follows that there exists
a constant C such that for all y0 ∈ H and u ∈ L2 (0, T ; Ũ ) there exists a unique solution
y ∈ W (0, T ) such that
|y|C(0,T ;H ) + |y|W (0,T ) ≤ C(|y0 |H + |u|L2 (0,T ;Ũ ) + |y0 |2H + |u|2L2 (0,T ;Ũ ) ),
where |y|W (0,T ) = |y|L2 (0,T ;V ) +| dtd y|L2 (0,T ;V ∗ ) and we recall that C([0, T ]; H ) is embedded
continuously into W (0, T ); see, e.g., [LiMa, Tem]. Assuming the existence of a local solu-
tion to (5.4.1) we shall present in the following subsections first and second order optimality
conditions, as well as the steps which are necessary to realize the Newton algorithm.
Control and especially optimal control has received a considerable amount of attention
in the literature. Here we only mention a few [Be, FGH, Gu, HiK2] and refer the reader to
further references given there.
i i
i i
ItoKunisc
i i
2008/6/12
page 145
i i
W = L2 (0, T ; V ∗ ) × H, U = L2 (0, T ; Ũ ),
T
J (y, u) = 0 (y(t)) + h(u(t)) dt,
e(y, u) = (yt − A0 y − F (y) − Bu, y(0)).
t → a(t; φ, ψ) = − A0 φ, ψ
V ∗ ,V − F (y ∗ (t))φ, ψ
V ∗ ,V
satisfies c
a(t; φ, φ) ≥ |φ|2V − ε|φ|2V − |φ|2V |y ∗ (t)|2V (1 + |y ∗ |2C(0,T ;H ) ),
ε
there exists a constant c̄ > 0 such that
1 2
a(t; φ, φ) ≥ |φ| − c̄|y ∗ (t)|2V |φ|2H for t ∈ (0, T ),
2 V
where |y ∗ |2V ∈ L1 (0, T ). Consequently, the adjoint equation
-
− dtd p ∗ (t) = A0 p ∗ (t) + F (y ∗ (t))∗ p ∗ (t) + (y ∗ (t)),
(5.4.5)
p∗ (T ) = 0
i i
i i
ItoKunisc
i i
2008/6/12
page 146
i i
d ∗ 2
|p (t)|H − C|y ∗ (t)|2V |p ∗ (t)|2H + |p ∗ (t)|2V ≤ | (y ∗ (t))|2V ∗ .
−
dt
T
Multiplying by exp(− t ρ̄(s)ds), where ρ̄(s) = C|y ∗ (s)|2V , we find
T s T s
∗ ∗ ∗
|p (t)|2H + |p (s)|2V exp ρ̄(τ )dτ ds ≤ | (y (s))|2V ∗ exp ρ̄(τ )dτ ds
t t t t
and (5.4.6) follows. This implies (C2) with λ∗ = (p ∗ , p∗ (0)). For any y ∈ W (0, T ) we
have
d d
|y(t)|2H = 2 y(t), y(t) for t ∈ (0, T ). (5.4.7)
dt dt V ∗ ,V
Let u ∈ V (u∗ ) and denote by y = y(u) ∈ V (y ∗ ) the solution to the dynamical system in
(5.4.1). From (5.4.7) we have
1 d
|y(t) − y ∗ (t)|2H + |y(t) − y ∗ (t)|2V
2 dt
= F (y(t)) − F (y ∗ (t)), y(t) − y ∗ (t)
V ∗ ,V + Bu(t) − Bu∗ (t), y(t) − y ∗ (t)
V ∗ ,V .
By (H2) the set {y(u) : u ∈ V (u∗ )} is bounded in W (0, T ). Hence by (H1) there exists a
constant C independent of u ∈ V (u∗ ) and t ∈ [0, T ] such that
d
|y(t) − y ∗ (t)|2H + |y(t) − y ∗ (t)|2V ≤ C(ρ(t)|y(t) − y ∗ (t)|2H + |u(t) − u∗ (t)|2Ũ ),
dt
T
where 0 ρ(τ )dτ is bounded independent of u ∈ V (u∗ ). Utilizing the equations satisfied
by y(u) and y(u∗ ) it follows that
for a constant Ĉ independent of u ∈ V (u∗ ). Hence (C3) follows and Theorem 5.1 implies
the optimality system
d ∗
y (t) = A0 y ∗ (t) + F (y ∗ (t)) + Bu∗ (t), y(0) = y0 ,
dt
d
− p ∗ (t) = A0 (y ∗ (t)) + F (y ∗ (t))∗ p ∗ (t) + (y ∗ (t)), p ∗ (T ) = 0,
dt
B ∗ p ∗ (t) + h (u∗ (t)) = 0.
i i
i i
ItoKunisc
i i
2008/6/12
page 147
i i
Moreover,
i i
i i
ItoKunisc
i i
2008/6/12
page 148
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 149
i i
The updates (y +δy, u+δu, λ+δλ) will not necessarily remain in Y1 ×U ×dom(ey (y, u)∗ )
since it is not assumed that e is surjective from Y1 × U to W . However, the feasibility steps
consisting in solving the primal equation
e(y, u+ ) = 0 (5.5.1)
+
for y = y and the adjoint equation
ey (y + , u+ )∗ λ + Jy (y + , u+ ) = 0 (5.5.2)
for the dual variable λ = λ+ will guarantee that (y + , λ+ ) ∈ Y1 ×Z1 holds. Here u+ = u+δu
and Z1 ⊂ dom (ey (y + , u+ )∗ ) denotes a Banach space densely embedded into W ∗ , with
λ∗ ∈ Z1 . Since Y1 and Z1 are contained in Y and W ∗ , the feasibility steps (5.5.1)–(5.5.2)
can also be considered as smoothing steps. Thus we obtain the Newton method for the
singular case.
Algorithm
• Initialization: Choose u0 ∈ V (u∗ ), solve
e(y, u0 ) = 0, ey (y0 , u0 )∗ λ + Jy (y0 , u0 ) = 0 for (y0 , λ0 ) ∈ Y1 × Z1 ,
and set k = 0.
• Newton step: Solve for (δy, δu, δλ) ∈ Y × U × W ∗
⎛ ⎞⎛ ⎞ ⎛ ⎞
L (yk , uk , λk ) e (yk , uk )∗ δy 0
⎝ ⎠ ⎝δu⎠ = − ⎝eu (yk , uk )∗ λk + Ju (yk , uk )⎠ .
e (yk , uk ) 0 δλ 0
Remark 5.5.1. The algorithm is essentially the Newton method of Section 5.2. Because of
the feasible step, the first and the last components on the right-hand side of the SQP step are
zero, and Newton’s method and the SQP method coincide. Let us point out that the SQP
iteration may not be well defined without the feasibility step since the updates (y + , λ+ ) may
only be in Y × W ∗ , while e and ey (y + , u+ )∗ are not necessarily well defined on Y × U and
W ∗.
We next specify the assumptions which justify the above derivation and under which
well-posedness and convergence of the algorithm can be proved. Thus let (y ∗ , u∗ , λ∗ ) ∈
Y1 × U × Z1 be a solution to (5.1.4), or equivalently to (1.5.3), and let
V (y ∗ ) × V (u∗ ) × V (λ∗ ) ⊂ Y1 × U × Z1
be a convex bounded neighborhood of the solution triple (y ∗ , u∗ , λ∗ ).
i i
i i
ItoKunisc
i i
2008/6/12
page 150
i i
(H5) (a) For every u ∈ V (u∗ ) there exists a unique solution y = y(u) ∈ V (y ∗ ) of
e(y, u) = 0. Moreover, there exists M > 0 such that |y(u) − y ∗ |Y1 ≤ M|u − u∗ |U .
(b) For every (y, u) ∈ V (y ∗ ) × V (u∗ ) there exists a unique solution λ = λ(y, u) ∈
V (λ∗ ) of ey (y, u)∗ λ + Jy (y, u) = 0 and |λ(y, u) − λ∗ |Z1 ≤ M|(y, u) − (y ∗ , u∗ )|Y1 ×U .
(H9) For every (y, u) ∈ V (y ∗ ) × V (u∗ ) ⊂ Y1 × U the operator e (y, u) can be extended
as continuous linear operator from Y × U to W , and the mapping (y, u) → e (y, u)
from V (y ∗ ) × V (u∗ ) ⊂ Y1 × U → L(Y × U, W ) is continuous and (y, u, λ) →
(λ, e (y, u)(·, ·)) from V (y ∗ )×V (u∗ )×V (λ∗ ) ⊂ Y1 ×U ×Z1 → L(Y ×U, Y ∗ ×U ∗ )
is continuous.
L (y ∗ , u∗ , λ∗ )v, v
Y ∗ ×U ∗ ,Y ×U ≥ κ |v|2Y ×U
Condition (H5) requires well-posedness of the primal and the adjoint equation in
Y1 , respectively, Z1 . The adjoint equations arise from linearization of e at elements of
Y1 ×U . Condition (H6) requires smoothness of J . In (H7) and (H8) the necessary regularity
requirements for e as mapping on Y1 × U and in Y × U are specified. From (H5) it follows
that the initialization as well as the feasibility step are well defined provided that uk ∈ V (u∗ ).
As a consequence the derivatives of J and e that are required for defining the Newton step
are taken at elements (yk , uk , λk ) ∈ Y1 × U × Z1 .
For x = (y, u, λ) ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ ) let A(x) denote the operator
L (y, u, λ) e (y, u)∗
A(x) = .
e (y, u) 0
Conditions (H6) and (H9) guarantee that the operator A(x) ∈ L(Y ×U ×W ∗ , Y ∗ ×U ∗ ×W )
for x ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ ) . Conditions (H6), (H9)–(H11) imply that
i i
i i
ItoKunisc
i i
2008/6/12
page 151
i i
(H12) there exist a neighborhood V (x ∗ ) ⊂ V (y ∗ ) × V (u∗ ) × V (λ∗ ) and M1 , such that for
every x = (y, u, λ) ∈ V (x ∗ ) and δw ∈ Y ∗ × U ∗ × W
A(x)δx = δw
|δx|Y ×U ×W ∗ ≤ M1 |δw|Y ∗ ×U ∗ ×W .
with F : Y1 × U × Z1 → Y ∗ × U ∗ × W defined by
F (y, u, λ) = −(ey (y, u)∗ λ + Jy (y, u), eu (y, u)∗ λ + Ju (y, u), e(y, u)).
Due to the smoothing step the first and third coordinates of F are 0 at xk . By (H5) there
exists M > 0 such that
and
where (yk , λk ) are determined by the feasibility step. Let us assume that xk ∈ V (x ∗ ). Then
it follows from (H6)–(H8) that, possibly after increasing M, it can be chosen such that for
every xk ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ )
1
M ∗
= |(A(xk + s(x ∗ − xk )) − A(xk ))(x ∗ − xk )|Y ∗ ×U ∗ ×W ds ≤ |x − xk |2Y1 ×U ×Z1 .
0 2
i i
i i
ItoKunisc
i i
2008/6/12
page 152
i i
Moreover,
M
|A(xk )(xk + δx − x ∗ )| = |F (x ∗ ) − F (xk ) − F (xk )(x ∗ − xk )| ≤ |xk − x ∗ |2Y1 ×U ×Z1 .
2
Consequently, by (H12)
MM1
|uk+1 − u∗ |U ≤ |xk+1 − x ∗ |Y ×U ×W ∗ ≤ |xk − x ∗ |2Y1 ×U ×Z1 , (5.5.6)
2
provided that xk ∈ V (x ∗ ). The proof will be completed by an induction argument with
respect to k. Let r be such that 2M 5 M1 r < 1 and that |x − x ∗ | < 2M 2 r implies x ∈√V (x ∗ ).
Assume that |u0 − u∗ | ≤ r. Then |y0 − y ∗ |Y1 ≤ Mr by (5.5.4) and |λ0 − λ∗ |Z1 ≤ 2M 2 r
by (5.5.5). It follows that
|x0 − x ∗ |Y1 ×U ×Z1 ≤ 2M 2 |u0 − u∗ |2U ≤ 2M 2 r
and hence x0 ∈ V (x ∗ ). Let |xk − x ∗ |Y1 ×U ×Z1 ≤ 2M 2 r. Then from (5.5.6)
|uk+1 − u∗ |U = 2M 5 M1 r 2 ≤ r. (5.5.7)
Consequently (5.5.4)–(5.5.6) are applicable and imply
|xk+1 − x ∗ |Y1 ×U ×Z1 ≤ 4M 2 |uk+1 − u∗ |2U ≤ M 3 M1 |xk − x ∗ |2Y1 ×U ×Z1 . (5.5.8)
It follows that
|xk+1 − x ∗ |Y1 ×U ×Z1 ≤ 4M 7 M1 |uk − u∗ |2U ,
which implies (5.5.3) with K = 4M 7 M1 . From (5.5.7)–(5.5.8) finally |xk+1 −x ∗ |Y1 ×U ×Z1 ≤
2M 2 r.
Let us return now to some of the examples of Chapter 1, Section 1.5, and discuss the
applicability of conditions (H5)–(H11).
Example 1.19 revisited. Condition (H5)(a) is a direct consequence of Lemma 1.14. Con-
dition (H5)(b) corresponds to
(∇λ, ∇φ) + (ey λ, φ) + (y − z, φ) = 0 for all φ ∈ Y, (5.5.9)
given y ∈ Y1 . Let Z1 = Y1 = Y ∩ L∞ (). It follows from [Tr, Chapter 2.3] and the proof
of Lemma 1.14 that there exists a unique solution λ = λ(y) ∈ Z1 to (5.5.9). Moreover, if
y ∈ V (y ∗ ) and w = λ(y) − λ(y ∗ ), then w ∈ Z1 satisfies
∗ ∗
(∇w, ∇φ) + (ey w, φ) + ((ey − ey )λ, φ) + (y − y ∗ , φ) = 0 for all φ ∈ Y.
From the proof of Lemma 4.1 it follows that there exists M > 0 such that
|λ(y) − λ(y ∗ )|Z1 ≤ M |y − y ∗ |Y1 for all y ∈ V (y ∗ )
and thus (H5)(b) is satisfied. It is simple to argue the validity of (H6)–(H9). Note that
e (y ∗ , u∗ ) is surjective from Y × U to W and thus (H11) is satisfied. As for (H10), this
condition is equivalent to the requirement that
∗
|δy|2L2 () + (λ∗ ey , (δy)2 ) + β |δu|2U ≥ κ(|δy|2Y + |δu|2U )
i i
i i
ItoKunisc
i i
2008/6/12
page 153
i i
where λ∗ is the solution to (5.3.11) with y = y ∗ . Then there exists k̄ > 0 such that
|δy|2Y ≤ k̄|δu|2
for all (δy, δu) satisfying (5.5.10). It follows that (H10) holds if |(λ∗ )− |L∞ is sufficiently
small. This is the case, for example, in the case of small residue problems, in the sense that
|y ∗ − z|2L2 () is small enough. If z ≥ y ∗ , then the weak maximum principle [Tr] is applied
to (5.5.9) and gives λ∗ ≥ 0.
With quite analogous arguments it can be shown that (H5)–(H11) also hold for Ex-
ample 1.20 of Chapter 1.
Example 1.22 revisited. The constraint and adjoint equations are given by
and
and
From (1.4.22) it follows that (H5) holds if y ∗ and λ∗ are in W 1,∞ . Conditions (H6)–
(H8) are easily verified. Here we address only the closed range property of e (y, u), with
(y, u) ∈ Y1 × U . For u ∈ Cn∞ () with div u = 0 surjectivity follows from the Lax–
Milgram lemma, and a density argument asserts surjectivity for every u ∈ U . We turn to
(H12) and assume that u ∈ Cn∞ (), div u = 0 first. Then e (y, u) ∈ L(V × U, H −1 ())
and e (y, u)(δy, δu) = 0 can be expressed as
Hence
for all (δy, δu) ∈ ker e (y, u). Henceforth we assume that
i i
i i
ItoKunisc
i i
2008/6/12
page 154
i i
for all (y, λ) ∈ V (y ∗ ) × V (λ∗ ). For (δy, δu) ∈ Y × U and (y, λ) ∈ V (y ∗ ) × V (λ∗ ) we
have by (5.5.12) and (5.5.13)
This estimate, together with (5.5.12), implies that L (y, u, λ) is coercive on ker e (y, u)
and hence A(x)δx = δw admits a unique solution in Y × U × Y for every δw. To estimate
δx in terms of δw, we note that the last equation in the system A(x)δx = δw implies that
and consequently
|δλ|Y ≤ c|δy| + |λ|L∞ () |δu|U + |w1 |H −1 , (5.5.15)
where c is the embedding constant of H01 () 2
into L (). Moreover, using (5.5.14) we find
L (y, u, λ)((δy, δu), (δy, δu)) ≥ |δy|2L2 + β|δu|2U − 2|λ|L∞ |δy|Y |δu|U
≥ |δy|2L2 + β|δu|2U − 2|λ|L∞ |y|L∞ () |δu|2U − 2|λ|L∞ |w3 |H −1 |δu|U (5.5.16)
≥ |δy|2 + κ|δu|2U − 2|λ|L∞ |w3 |H −1 |δu|U .
|δy|2 + κ|δu|2 ≤ (2|λ|L∞ |δu|U + |δλ|Y )|w3 |H −1 + |w1 |H −1 |δy|Y + |w2 |U |δu|U . (5.5.17)
Inequalities (5.5.14), (5.5.15), and (5.5.17) imply the existence of a constant M1 such that
for all (y, λ) ∈ V (y ∗ ) × V (λ∗ ) and every b ∈ Cn∞ (), with div b = 0. A density argument
with respect to u implies that A(x)δx = δw admits a solution for all x ∈ V (y ∗ )×U ×V (λ∗ )
and that (5.5.18) holds for all such x.
i i
i i
ItoKunisc
i i
2008/6/12
page 155
i i
Chapter 6
Augmented
Lagrangian-SQP
Methods
6.1 Generalities
This chapter is devoted to second order augmented Lagrangian methods for optimization
problems with equality constraints of the type
-
min f (x) over x ∈ X
(6.1.1)
subject to e(x) = 0,
155
i i
i i
ItoKunisc
i i
2008/6/12
page 156
i i
in (6.1.2) are considered in Section 6.3. Applications to optimal control problems will be
given in Section 6.4. Section 6.5 is devoted to short discussions of miscellaneous topics
including reduced SQP methods and mesh independence. In Section 6.6 we give a short
description of related literature.
and
e (x ∗ ) is surjective. (6.2.3)
The Lagrangian functional associated with (6.2.1) is denoted by L : X × W → R and it is
given by
L(x, λ) = f (x) + λ, e(x) W .
With (6.2.3) holding there exists a Lagrange multiplier λ∗ ∈ W such that the following first
order necessary optimality condition is satisfied:
L (x ∗ , λ∗ ) = 0, e(x ∗ ) = 0. (6.2.4)
We shall also make use of the following second order sufficient optimality condition:
-
there exists κ > 0 such that
(6.2.5)
L (x ∗ , λ∗ )(h, h) ≥ κ |h|2X for all h ∈ ker e (x ∗ ).
Here L (x ∗ , λ∗ ) denotes the bilinear form characterizing the second Fréchet derivative of
L with respect to x at (x ∗ , λ∗ ). For any c > 0 the augmented Lagrangian functional
Lc : X × W → R is defined by
c
Lc (x, λ) = f (x) + λ, e(x) W + |e(x)|2W .
2
We note that the necessary optimality condition implies
Lc (x ∗ , λ∗ ) = 0 e(x ∗ ) = 0 for all c ≥ 0. (6.2.6)
i i
i i
ItoKunisc
i i
2008/6/12
page 157
i i
Lemma 6.1. Let (6.2.3) and (6.2.5) hold. Then there exists a neighborhood V (x ∗ , λ∗ ) of
(x ∗ , λ∗ ), c̄ > 0 and σ̄ > 0 such that
Proof. Corollary 3.2 and conditions (6.2.3) and (6.2.5) imply the existence of σ̄ > 0 and
c̄ > 0 such that
Lemma 6.1 implies in particular that x → Lc (x, λ∗ ) can be bounded from below by
a quadratic function. This fact is referred to as augmentability of (6.2.1) at (x ∗ , λ∗ ).
Lemma 6.2. Let (6.2.3) and (6.2.5) hold. Then there exist σ̄ > 0, c̄ > 0, and a
neighborhood Ṽ (x ∗ ) of x ∗ such that
2
Lc (x, λ∗ ) ≥ Lc (x ∗ , λ∗ ) + σ̄
x − x ∗
X for all x ∈ Ṽ (x ∗ ) and c ≥ c̄. (6.2.7)
Proof. Due to Taylor’s theorem, Lemma 6.1, and (6.2.6) we find for x ∈ V (x ∗ )
1
Lc (x, λ∗ ) = Lc (x ∗ , λ∗ ) + Lc (x ∗ , λ∗ )(x − x ∗ , x − x ∗ ) + o(|x − x ∗ |2X )
2
σ̄
2
2
≥ Lc (x ∗ , λ∗ ) + x − x ∗
X + o(
x − x ∗
X ).
2
The claim follows from this estimate.
Alternatively, (6.2.8) can be used to define the λ-update only, whereas the x-update is
calculated by a different technique. We shall demonstrate next that (6.2.8) can be solved
for λ̂ without recourse to x̂.
For (x, λ) ∈ V (x ∗ , λ∗ ) and c ≥ c̄ we define B(x, λ) ∈ L(W ) by
i i
i i
ItoKunisc
i i
2008/6/12
page 158
i i
Here L(W ) denotes the space of bounded linear operators from W into itself. Note that
B(x, λ) is invertible. In fact, there exists a constant k > 0 such that
B(x, y)y, y W = Lc (x, λ)−1 e (x)∗ y, e (x)∗ y X
2
≥ k
e (x)∗ y
X
for all y ∈ W . Since e (x)∗ is injective and has closed range, there exists k̂ such that
∗
2
e (x) y
≥ k̂ |y|2 for all y ∈ W,
X W
and by the Lax–Milgram theorem continuous invertibility of B(x, λ) follows, provided that
(x, λ) ∈ V (x ∗ , λ∗ ) and c ≥ c̄. Premultiplying the first equation in (6.2.8) by e (x)Lc (x, λ)−1
we obtain
-
λ̂= λ + B(x, λ)−1 e(x) − e (x)Lc (x, λ)−1 Lc (x, λ) ,
(6.2.10)
x̂= x − Lc (x, λ)−1 Lc (x, λ̂).
Whenever the dependence of (x̂, λ̂) on (x, λ, c) is important, (x̂(x, λ, c), λ̂(x, λ, c)) will
be written in place of (x̂, λ̂). If, for fixed λ, x = x(λ) is chosen as a local solution to
We point out that (6.2.12) can be interpreted as a second order update to the Lagrange
variable. To acknowledge this, let dc denote the dual functional associated with Lc , i.e.,
dc (λ) = min Lc (x, λ) subject to {x :
x − x ∗
≤ }
for some > 0. Then the first and second derivatives of dc with respect to λ satisfy
∇λ dc (λ) = e(x(λ))
and
and
Lc (x, λ) = L0 (x, λ + ce(x)) + c e (x)(·), e (x)(·) W .
i i
i i
ItoKunisc
i i
2008/6/12
page 159
i i
Using the second equation e (x)(x̂ −x) = −e(x) in the first equation of (6.2.13) we arrive at
λ̃ = λ + ce(x),
(ii) solving
L0 (x, λ̃) e (x)∗ x̂ − x L0 (x, λ̃)
=−
e (x) 0 λ̂ − λ̃ e(x)
With (6.2.3) and (6.2.5) holding there exist a constant κ > 0 and a neighborhood U (x ∗ , λ∗ ) ⊂
V (x ∗ , λ∗ ) of (x ∗ , λ∗ ) such that
Lemma 6.3. Assume that (6.2.3) and (6.2.5) hold. Then there exists a constant K > 0
such that for any (x, λ) ∈ U (x ∗ , λ∗ ) the solution (x̂, λ̂) of
x̂ − x L (x, λ)
M(x, λ) =−
λ̂ − λ e(x)
satisfies
2
(x̂, λ̂) − (x ∗ , λ∗ )
≤ K
(x, λ) − (x ∗ , λ∗ )
X×W . (6.2.16)
X×W
i i
i i
ItoKunisc
i i
2008/6/12
page 160
i i
and consequently
x̂ − x ∗ L (x, λ) − L (x ∗ , λ∗ ) x − x∗
M(x, λ) = − + M(x, λ) .
λ̂ − λ∗ e(x) − e(x ∗ ) λ − λ∗
This equality further implies
15 6x − x ∗
x̂ − x ∗ ∗ ∗
M(x, λ) = M(x, λ) − M(tx + (1 − t)x , tλ + (1 − t)λ ) dt.
λ̂ − λ∗ 0 λ − λ∗
The regularity properties of f and e imply that (x, λ) → M(x, λ) is Lipschitz continuous
on U (x ∗ , λ∗ ) for a Lipschitz constant γ > 0. Thus we obtain
∗
γ
2
M(x, λ) x̂ − x
≤
(x, λ) − (x ∗ , λ∗ )
X×W ,
∗
λ̂ − λ X×W
2
and by (6.2.15)
γ κ
2
(x̂, λ̂) − (x ∗ , λ∗ )
≤ (x, λ) − (x ∗ , λ∗ )
X×W ,
X×W 2
which implies the claim.
We now describe three algorithms and analyze their convergence. They are all based
on (6.2.14) and differ only in the choice of x. Recall that if x is a solution to (6.2.11),
then (6.2.12) and hence (6.2.14) provide a second order update to the Lagrange multiplier.
Solving (6.2.11) implies extra computational cost. In the results which follow we show that
as a consequence a larger region of attraction with respect to the initial condition and an
improved rate of convergence factor is obtained, compared to methods which solve (6.2.11)
only approximately or skip it all together. As a first choice x in (6.2.14) is determined by
solving
The second choice is to take x only as an appropriate suboptimal solution to this optimization
problem, and the third choice is to simply choose x as the solution x̂ of the previous iteration
of (6.2.14).
Algorithm 6.1.
(i) Choose λ0 ∈ W, c ∈ (c̄, ∞) and set σ = c − c̄, n = 0.
(ii) Determine x̃ as a solution of
i i
i i
ItoKunisc
i i
2008/6/12
page 161
i i
The existence of a solution to (Paux ) is guaranteed if, for example, the following
conditions on f and e hold:
⎧
⎪
⎨f : X → R is weakly lower semicontinuous,
e : X → W maps weakly convergent sequences (6.2.17)
⎪
⎩
to weakly convergent sequences.
Under the conditions of Theorem 6.4 below it follows that the solutions x̃ of (Paux )
satisfy x̃ ∈ V (x ∗ ).
K̂
2
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤ λn − λ∗
W (6.2.18)
X×W c − c̄
Proof. Let η̂ be the largest radius for a ball centered at (x ∗ , λ∗ ) and contained in U (x ∗ , λ∗ ),
and let γ be a Lipschitz constant for f and e on U (x ∗ , λ∗ ). Further let E = e (x ∗ ) and
note that (EE ∗ )−1 E ∈ L(X) as a consequence of (6.2.3). We define
The proof will be given by induction on n. The case n = 0 follows from the general
arguments given below. For the induction step we assume that
λi − λ∗
≤
λi−1 − λ∗
for i = 1, . . . , n. (6.2.20)
W W
i i
i i
ItoKunisc
i i
2008/6/12
page 162
i i
In view of Lemma 6.2 and the fact that x̃ is chosen in V (x ∗ )(⊂ Ṽ (x ∗ )) we find
2 1
2
1
2
σ̄
x̃ − x ∗
X +
λ̃ − λ∗
≤ λn − λ∗
W . (6.2.21)
2σ W 2σ
3
This implies in particular that |x̃ − x ∗ | ≤ 2σM̄σ̄ |λ0 − λ∗ | < η̂ and hence x̃ ∈ V (x ∗ ). The
necessary optimality conditions for (6.2.1) and (Paux ) with x̃ ∈ V (x ∗ ) are given by
f (x ∗ ) + E ∗ λ∗ = 0
and
f (x̃) + e (x̃)∗ (λn + ce(x̃)) = 0.
Subtracting these two equations and defining λ̄ = λn + ce(x̃) give
and consequently
λ∗ − λ̄ = (EE ∗ )−1 E f (x̃) − f (x ∗ ) + (e (x̃)∗ − e (x ∗ )∗ )(λ̄) .
We obtain
|λ∗ − λ̄|W ≤ 2γ (1 + |λ̄|W )(EE ∗ )−1 E
x̃ − x ∗
X . (6.2.22)
To estimate λ̄ observe that due to (6.2.20) and (6.2.21)
λ̄
≤
λ̃ − λ̄
+
λ̃
= c̄
e(x̃) − e(x ∗ )
+
λ∗
+
λ̃ − λ∗
W W W
W W
W
c̄γ
≤ c̄γ
x̃ − x ∗
W +
λ∗
W +
λn − λ∗
W ≤ 1 + √
λ n − λ ∗
W
2σ σ̄
∗
+
λ
W ≤ μ. (6.2.23)
i i
i i
ItoKunisc
i i
2008/6/12
page 163
i i
This implies that Lemma 6.3 is applicable with (x, λ) = (x̃, λ̃). We find
K M̄
2
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤ λn − λ∗
X , (6.2.24)
X×W 2σ σ̄
and (6.2.18) is proved with K̂ = K2σ̄M̄ . From (6.2.20), (6.2.24), and the definition of η we
also obtain
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤
λn − λ∗
< η.
X×W W
Algorithm 6.2. This coincides with Algorithm 6.1 except for (i) and (ii), which are re-
placed by
(i) Choose λ0 ∈ W, c ∈ (c̄, ∞), and σ ∈ (0, c − c̄] and set n = 0.
(ii) Determine x̃ ∈ V (x ∗ ) such that
Lc (x̃, λn ) ≤ Lc (x ∗ , λn ) = f (x ∗ ).
Theorem 6.5. Let (6.2.3) and (6.2.5) hold. If |λ0 − λ∗ |W is sufficiently small, then
Algorithm 6.2 is well defined and
1
2
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤ K 1 + λn − λ∗
W (6.2.25)
X×W σ σ̄
for all n = 0, 1, . . . . Here xn+1 stands for x̂ of step (iv) of Algorithm 6.1 and K, independent
of c, is given in (6.2.16).
where a = 1 + 1
σ σ̄
. The proof is based on an induction argument. If
λ0 − λ∗
< η,
and (6.2.25) holds with n = 0. This will follow from the general arguments given below.
Assuming that |(xn , λn ) − (x ∗ , λ∗ )|X×W < η, we show that Algorithm 6.2 is well defined
for n + 1, that
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
< η,
X×W
i i
i i
ItoKunisc
i i
2008/6/12
page 164
i i
and that (6.2.25) holds. As in the proof of Theorem 6.4 one argues that (6.2.21) holds and
consequently
2
∗
1
∗
(x̃, λ̃) − (x , λ )
≤ 1+
λn − λ∗
2 . (6.2.27)
X×W 2σ σ̄ W
This implies that |(x̃, λ̃∗ ) − (x ∗ , λ∗ )|X×W < η̂ and hence Lemma 6.3 is applicable with
(x, λ) = (x̃, λ̃) and (iv) of Algorithm 6.2 is well defined. Combining (6.2.16) with (6.2.27)
we find
1
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤K 1+
λn − λ∗
2 ,
X×W 2σ σ̄ W
Remark 6.2.1. In the proof of Theorem 6.4 as well as in that of Theorem 6.5 we utilize
(6.2.7) from Lemma 6.2. Conditions (6.2.3) and (6.2.5) are sufficient conditions for (6.2.7)
to hold for c ≥ c̄ > 0. If (6.2.7) can be shown to hold for all c ≥ 0, then c̄ can be chosen
equal to 0 and σ = c is admissible in (i) of Algorithm 6.2.
In the third algorithm we delete the second step of Algorithms 6.1 and 6.2 and directly
iterate (6.2.14).
Algorithm 6.3.
(i) Choose (x0 , λ0 ) ∈ X × W, c ≥ 0 and put n = 0.
(ii) Set λ̃ = λn + ce(xn ).
(iii) Solve for (x̂, λ̂):
x̂ − xn L0 (xn , λ̃)
M(xn , λ̃) =− .
λ̂ − λ̃ e(xn )
Theorem 6.6. Let (6.2.3) and (6.2.5) hold. If max(1, c) |(x0 , λ0 ) − (x ∗ , λ∗ )|X×W is
sufficiently small, then Algorithm 6.3 is well defined and
2
(xn+1 , λn+1 ) − (x ∗ , λ∗ )
≤ K̃
(xn , λn ) − (x ∗ , λ∗ )
X×W
X×W
i i
i i
ItoKunisc
i i
2008/6/12
page 165
i i
where a = max 2, 1 + 2c2 γ 2 and K is defined in Lemma 6.3.
Let us assume that
(x0 , λ0 ) − (x ∗ , λ∗ )
< η.
X×W
Again we proceed by induction and the case n = 0 follows from the general arguments
given below. Let us assume that |(xn , λn ) − (x ∗ , λ∗ )|X×W < η. Then
2
2
2
2
(xn , λ̃) − (x ∗ , λ∗ )
≤
xn − x ∗
X + 2c2
e(xn ) − e(x ∗ )
W + 2
λn − λ∗
W
X×W
2
2
2
≤
xn − x ∗
X + 2c2 γ 2
xn − x ∗
X + 2
λn − λ∗
W
2
≤ a
(xn , λn ) − (x ∗ , λ∗ )
X×W < η̂2 ,
Remark 6.2.2. (i) If c is set equal to 0 in Algorithm 6.3, then we obtain the well-known
SQP algorithm for the equality-constrained problem (6.2.1). It is well known to have a
second order convergence rate which also follows from Theorem 6.6 since K̃ is finite for
c = 0.
(ii) Theorem 6.6 suggests that in case (Paux ) is completely skipped in the second
order augmented Lagrangian update the penalty parameter may have a negative effect on
the region of attraction and on the convergence rate estimate. Our numerical experience,
without additional globalization techniques, indicates that moderate values of c do not
impede the behavior of the algorithm when compared to c = 0, which results in the SQP
algorithm. Choosing c > 0 may actually enlarge the region of attraction when compared to
c = 0. For parameter estimation problems c > 0 is useful, because in this way Algorithm
6.3 becomes a hybrid algorithm combining the output least squares and the equation error
formulations [IK9].
i i
i i
ItoKunisc
i i
2008/6/12
page 166
i i
We further define G+ = g + (x ∗ ) , G0 = g 0 (x ∗ ) ,
E
E+ = : X → W × R m1 ,
G+
and for z ∈ Z we define the operator E(z) : X × R → (W × Rm1 ) × Rm2 × Z by
⎛ ⎞
E+ 0
E(z) = ⎝ G0 0 ⎠ .
L z
The following additional assumptions are required:
there exists κ > 0 such that L (x ∗ , λ∗ , μ∗ ) (x, x) ≥ κ|x|2X for all x ∈ ker(E+ ) (6.3.4)
and
E((x ∗ )) is surjective. (6.3.5)
The SQP method for (6.2.1) with elimination of the equality and finite rank inequality
constraints is given next. We set
L(x, λ, μ) = f (x) + λ, e(x) W + μ, g(x) Rm .
i i
i i
ItoKunisc
i i
2008/6/12
page 167
i i
Algorithm 6.4.
(i) Choose (x0 , λ0 , μ0 ) ∈ X × W × Rm and set n = 0.
(ii) Solve for (xn+1 , λn+1 , μn+1 ):
⎧
⎪
⎪ min 12 L (xn , λn , μn )(x − xn , x − xn ) + f (xn )(x − xn ),
⎪
⎪
⎨
e(xn ) + e (xn )(x − xn ) = 0, (Paux )
⎪
⎪
⎪
⎪
⎩
g(xn ) + g (xn )(x − xn ) ≤ 0, (xn ) + L(x − xn ) ∈ K,
where (λn+1 , μn+1 ) are the Lagrange multipliers associated to the equality and in-
equality constraints.
(iii) Set n = n + 1 and goto (ii).
Since (6.3.3)–(6.3.5) imply (H1), (H )2), (H3) of Chapter 2, there exist by Corol-
lary 2.18 neighborhoods Û (x , λ , μ ) and U (x ∗ ) such that the auxiliary problem (Paux )
∗ ∗ ∗
of Algorithm 6.4 admits a unique local solution xn+1 in U (x ∗ ) provided that (xn , λn , μn ) ∈
Û (x ∗ , λ∗ , μ∗ ) = Û (x ∗ )×Û (λ∗ , μ∗ ). To obtain a convergence rate estimate for xn , Lagrange
multipliers to the constraints in (Paux ) are introduced. Since the regular point condition is
stable with respect to perturbations in x ∗ , we can assume that Û (x ∗ ) is chosen sufficiently
small such that for xn ∈ Û (x ∗ ) there exist (λn+1 , μn+1 , ηn+1 ) ∈ W × Rm × Z such that
⎛ ⎞
0
⎜ 0 ⎟
0 ∈ G(xn , λn , μn )(xn+1 , λn+1 , μn+1 , ηn+1 ) + ⎜ ⎟
⎝ ∂ ψRm,+ (μn+1 ) ⎠ , (6.3.6)
∂K + (ηn+1 )
where
G(xn , λn , μn )(x, λ, μ, η)
⎛ ⎞
L (xn , λn , μn )(x − xn ) + f (xn ) + e (xn )∗ λ + g (xn )∗ μ + L∗ η
⎜ −e(xn ) − e (xn )(x − xn ) ⎟
=⎜ ⎝ −g(xn ) − g (xn )(x − xn )
⎟.
⎠
−(xn ) − L(x − xn )
Theorem 6.7. Assume that (6.3.2) – (6.3.5) are satisfied at (x ∗ , λ∗ , μ∗ ) and that |(x0 , λ0 , μ0 )−
(x ∗ , λ∗ , μ∗ )| is sufficiently small. Then there exists K̄ > 0 such that
|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )| ≤ K̄|(xn , λn , μn ) − (x ∗ , λ∗ , μ∗ )|2
for all n = 1, 2, . . . .
i i
i i
ItoKunisc
i i
2008/6/12
page 168
i i
or equivalently
⎛ ⎞ ⎛ ⎞
an 0
⎜bn ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ + G(xn , λn , μn )(x ∗ , λ∗ , μ∗ , η∗ ) + ⎜
⎝ cn ⎠ ⎝
⎟, (6.3.7)
∂ ψRm,+ (μ ) ⎠
∗
dn ∂k+ (η∗ )
Without loss of generality we may assume that the first and second derivatives of f, e, and
g are Lipschitz continuous in Û (x ∗ ). It follows that there exists L̃ such that
Let K̃ be determined from Corollary 2.18 and let B((x ∗ , λ∗ , μ∗ ), r) denote a ball in Û (x ∗ )×
Û (λ∗ , μ∗ ) with center (x ∗ , λ∗ , μ∗ ) and radius r, where r K̃ L̃ < 1.
Proceeding by induction, assume that (xn , λn , μn ) ∈ B((x ∗ , λ∗ , μ∗ ), r). Then from
Corollary 2.18, with (x̄1 , λ̄1 , μ̄1 ) = (x̄2 , λ̄2 , μ̄2 ) = (xn , λn , μn ), and (6.3.6) – (6.3.8) we
find
This estimate implies that (xn+1 , λn+1 , μn+1 ) ∈ B((x ∗ , λ∗ , μ∗ ), r), as well as the desired
local quadratic convergence of Algorithm 6.4.
Remark 6.3.1. Let L(x, λ) = f (x) + λ, e(x) W and consider Algorithm 6.4 with
L (xn , λn , μn ) replaced by L (xn , λn ); i.e., only the equality constraints are eliminated.
If (6.3.4) is satisfied with L (x ∗ , λ∗ , μ∗ ) replaced by L (x ∗ , λ∗ ), then Theorem 6.7 holds
for the resulting algorithm with the triple (x, λ, μ) replaced by (x, λ).
Next we consider
min f (x) subject to
(6.3.9)
e(x) = 0, x ∈ C,
i i
i i
ItoKunisc
i i
2008/6/12
page 169
i i
where f and e are as in (6.3.1) and C is a closed convex set in X. Let x ∗ denote a local
solution of (6.3.9) and assume that
f and e are twice continuously Fréchet differentiable,
(6.3.10)
f and e are Lipschitz continuous in a neighborhood of x ∗ ,
0 ∈ int e (x ∗ )(C − x ∗ ), (6.3.11)
and
there exists a constant κ > 0 such that L (x ∗ , λ∗ ) (x, x) ≥ κ|x|2X
(6.3.12)
for all x ∈ ker e (x ∗ ).
To solve (6.3.9) we consider the SQP algorithm with elimination of the equality constraint.
Algorithm 6.5.
where
L (xn , λn )(x − xn ) + f (xn )
G(xn , λn )(x, λ) = .
−e(xn ) − e (xn )(x − xn )
i i
i i
ItoKunisc
i i
2008/6/12
page 170
i i
Remark 6.3.2. From the discussion in the first part of this section it follows that (6.3.14)
holds for problem (6.3.1) with C = {x : g(x) ≤ 0, (x) ∈ K}, provided that g is convex.
Condition (6.3.14) was analyzed for optimal control problems in several papers; see, for
instance, [AlMa, Tro].
or equivalently
an ∂ψC (x ∗ )
0∈ + G(xn , λn )(x ∗ , λ∗ ) + , (6.3.15)
bn 0
where
an = f (x ∗ ) − f (xn ) + (e (xn )∗ − e (x ∗ )∗ )λ∗ − L (xn , λn )(x ∗ − xn ),
bn = e(xn ) + e (xn )(x ∗ − xn ) − e(x ∗ ).
In view of (6.3.10) we may assume that Û (x ∗ ) is chosen sufficiently small such that f and
e are Lipschitz continuous on Û (x ∗ ). Consequently there exists L̃ such that
Let B((x ∗ , λ∗ ), r) denote a ball with radius r < (K̃ L̃)−1 and center (x ∗ , λ∗ ) contained
in U (x ∗ , λ∗ ) and in Û (x ∗ ) × Û (λ∗ ), and assume that (xn , λn ) ∈ B((x ∗ , λ∗ ), r). Then by
(6.3.13)–(6.3.16) we have
In Section 6.2 the combination of the second order method (6.2.8) with first order
augmented Lagrangian updates was analyzed for equality-constrained problems. This ap-
proach can also be taken for problems with inequality constraints. We present the analogue
of Theorem 6.4. Let
c c
Lc (x, λ, μ) = f (x) + λ, e(x) W + μ, ĝ(x, μ, c) Rm + |e(x)|2W + |ĝ(x, μ, c)|2Rm ,
2 2
where ĝ(x, μ, c) = max(g(x), − μc ), as defined in Chapter 3. Below c̃ ≥ c̄ denote the
constants and Bδ the closed ball of radius δ around x ∗ that were utilized in Corollary 3.7.
i i
i i
ItoKunisc
i i
2008/6/12
page 171
i i
Algorithm 6.6.
(iv) Solve (Paux ) of Algorithm 6.4 with (xn , λn , μn ) = (x̃, λ̃, μ̃) for (xn+1 , λn+1 , ηn+1 ).
Theorem 6.9. Assume that (3.4.7), (3.4.9) of Chapter 3 and (6.3.2)–(6.3.4) of this chapter
hold at (x ∗ , λ∗ , μ∗ ). If c−1 c̄ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ) is sufficiently small, then Algorithm
6.6 is well defined and
Proof. The assumptions guarantee that Corollary 3.7, Proposition 3.9, and Theorem 3.10 of
Chapter 3 and Theorem 6.7 of the present chapter are applicable. The proof is now similar to
that of Theorem 6.4, with Theorem 6.7 replacing Lemma 6.3 to estimate the step of (Paux ).
Let η̂ be the radius of the largest ball U centered at (x ∗ , λ∗ , μ∗ ) such that Theorem 6.7 is
applicable and that the x-coordinates of elements in U are also in Bδ . Let
√ κ 2 (c − c̄) 2(c − c̄)
η = min η̂ κ c − c̄, , where κ 2 = ,
K̄ 1 + 2K 2
We proceed by induction with respect to n. The case n = 0 is simple and we assume that
i i
i i
ItoKunisc
i i
2008/6/12
page 172
i i
This estimate implies that x̃ ∈ int Bδ and that Theorem 6.7 with (xn , λn , μn ) replaced by
(x̃, λ̃, μ̃) is applicable. It implies
K̄
|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )| ≤ |(λn , μn ) − (λ∗ , μ∗ )|2
κ 2 (c − c̄)
≤ |(λn , μn ) − (λ∗ , μ∗ )|2 .
6.4 Applications
6.4.1 An introductory example
Consider the optimal control problem
⎧
⎪
⎪ min 12 Q |y − z|2 dxdt + β2 |u|2U ,
⎨
yt = y + g(y) + Bu in Q,
(6.4.1)
⎪
⎪ = 0 on ,
y
⎩
y(0, ·) = ϕ on ,
Here B ∈ L(U, Y ), where U is the Hilbert space of controls, and the choice for Y can be
or
for example. Here L2 (H01 ()) is an abbreviation for L2 (0, T ; H01 ()). The former choice
corresponds to variational formulations, the latter to strong formulations of the partial dif-
ferential equation and both require regularity assumptions for g. Matching choices for
W are
i i
i i
ItoKunisc
i i
2008/6/12
page 173
i i
where denotes the Laplacian with Dirichlet boundary conditions. The augmented La-
grangian SQP step as described below (6.2.14) is given by
and
⎛ ⎞⎛ ⎞
I + g (y)−1 λ̃ 0 ( ∂t∂ + + g (y))−1 δy
⎝ 0 βI B ∗ −1 ⎠ ⎝ δu ⎠
−1 ∂ −1
(−) ( ∂t − − g (y)) B 0 δλ
⎛ ⎞
y − z − (( ∂t∂ + + g (y))(−)−1 λ̃
= −⎝ βu + B ∗ −1 λ̃ ⎠,
−1
(−) (yt − y − g(y) − Bu)
with δy(0, ·) = δλ(T , ·) = 0. Inspection of the previous system (e.g., for the sake of
symmetrization) suggests introducing the new variable = −−1 λ.
This results in an update for given by
Let us also point out that if we had immediately discretized (6.4.1), then the differences
between topologies tend to get lost and (−)−1 in (6.4.2) may have been forgotten. On the
discretized level the effect of (−)−1 can be restored by preconditioning.
Let us now return to the system (6.4.3). Due to its size it will—after discretization—
not be solved by a direct method but rather by iteration based on conjugate gradients.
The question of choice for preconditioners arises. The following two choices for block
preconditioners were successful in our tests [KaK]:
⎛ ⎞−1 ⎛ ⎞
0 0 P∗ 0 0 P −1
⎝ 0 βI 0 ⎠ =⎝ 0 β −1 I 0 ⎠ (6.4.4)
P 0 0 P −∗ 0 0
and
⎛ ⎞−1 ⎛ ⎞
I 0 P∗ 0 0 P −1
⎝ 0 βI 0 ⎠ =⎝ 0 β −1 I 0 ⎠, (6.4.5)
P 0 0 P −∗ 0 −∗ −1
P P ,
i i
i i
ItoKunisc
i i
2008/6/12
page 174
i i
where fˆ ∈ L2 () and uf ∈ L2 (2 ) are fixed and u ∈ L2 (1 ) will be the control variable.
Here is a bounded domain in R2 with C 1,1 boundary or is convex. For the case of
higher-dimensional domains we refer the reader to [IK10]. The boundary ∂ is assumed
to consist of two disjoint sets 1 , 2 , each of which is connected, or possibly consisting of
finitely many connected components, with = ∂ = 1 ∪ 2 , and 2 possibly empty.
Further it is assumed that g ∈ C 2 (R) and that g(H 1 ()) ⊂ L1+ε () for some ε > 0.
Equation (6.4.6) is understood in the variational sense, i.e.,
(6.4.8)
⎩
subject to (y, u) ∈ H () × L (1 ) a solution of (6.4.6).
1 2
Here C is a bounded linear (observation) operator from H 1 () to a Hilbert space Z, and
yd ∈ Z and α > 0 are fixed.
To express (6.4.8) in the form (6.2.1) of Section 6.2 we introduce
ẽ : H 1 () × L2 (1 ) → H 1 ()∗
with
ẽ(y, u), ϕ
(H 1 )∗ ,H 1 = (∇y, ∇ϕ) + (g(y) − fˆ, ϕ) − (ũ, ϕ)
i i
i i
ItoKunisc
i i
2008/6/12
page 175
i i
and
e : H 1 () × L2 () → H 1 ()
by
e = N ẽ,
X = H 1 () × L2 (1 ), Y = H 1 (),
If (6.2.3) is satisfied, i.e., if e (x ∗ ) is surjective, then there exists a unique Lagrange multiplier
λ∗ ∈ H 1 () associated to x ∗ such that
i i
i i
ItoKunisc
i i
2008/6/12
page 176
i i
τ1 λ∗ = αu∗ on 1 . (6.4.12)
1 α
L(y, u, λ) = |Cy − yd |2Z + |u|2L2 (1 ) + ∇λ, ∇y
2 2
+ λ, g(y) − fˆ − λ, ũ .
and
Lu (y ∗ , u∗ , λ∗ )(h) = α u∗ , h 1 − λ∗ , h 1 .
We now aim for a priori estimates for λ∗ . We define B : H 1 () → H 1 ()∗ as the
differential operator given by the left-hand side of (6.4.11); i.e., Bv = ϕ is characterized as
the solution to
∇v, ∇ψ + g (y ∗ )v, ψ = ϕ, ψ
(H 1 )∗ ,H 1 for all ψ ∈ H 1 ().
g (y ∗ ) ≥ β a.e. on
for some β > 0. With (h2) holding, B is an isomorphism from H 1 () onto H 1 ()∗ .
Moreover, (h2) implies surjectivity of e (x ∗ ).
i i
i i
ItoKunisc
i i
2008/6/12
page 177
i i
(ii) If moreover (h2) is satisfied and C ∗ (Cy ∗ − yd ) ∈ L2 (), then there exists a
constant K(y ∗ ) such that
Proof. Due to surjectivity of e (x ∗ ) we have (e (x ∗ )e (x ∗ )∗ )−1 ∈ L(H 1 ()) and thus (i)
follows from (6.4.10). Let us turn to (ii). Due to (h2) and (6.4.11) there exists a constant
Ky ∗ such that
To obtain the desired H 2 () estimate for λ∗ one applies the well-known H 2 a priori estimate
for Neumann problems to
−λ∗ + λ∗ = w in ,
∂λ∗
= 0 in ∂
∂n
with w = λ∗ − g (y ∗ )λ∗ − C ∗ (Cy ∗ − yd ). This gives
|λ∗ |H 2 ≤ K |λ∗ |L2 + |g (y ∗ )λ∗ |L2 + |C ∗ (Cy ∗ − yd )|L2
Proof. It suffices to observe that by Sobolev’s embedding theorem there exists a constant
Ke such that
| λ∗ , g (y ∗ )v 2 | ≤ Ke |λ∗ |H 1 |g (y ∗ )|L1+ |v|2H 1 (6.4.15)
We turn now to an analysis of the second order sufficient optimality condition (6.2.5)
∗ optimal
for the control problem (6.4.8). In view of (6.4.14) the crucial term is given by
λ , g (y ∗ )v 2 . Two types of results will be given. The first will rely on | λ∗ , g (y ∗ )v 2 |
i i
i i
ItoKunisc
i i
2008/6/12
page 178
i i
being sufficiently small. This can be achieved by guaranteeing that λ∗ or, in view of (6.4.11),
Cy ∗ − yd is small. We refer to this case as small residual problems. The second class of
assumptions rests on guaranteeing that λ∗ g (y ∗ ) ≥ 0 a.e. on .
In the statement of the following theorem we use Ke from (6.4.15) and K(x ∗ ), K(y ∗ )
from Lemma 6.11. Further B −1 denotes the norm of B −1 as an operator from H 1 ()∗ to
H 1 ().
where k̃e is the embedding constant of H 2 () into L∞ (), then (6.2.5) is satisfied.
(iii) If
where τ1 is the norm of the trace operator from H 1 () onto L2 (1 ) and Ky ∗ is defined
in (6.4.13), then (6.2.5) is satisfied.
where in the last estimate we used Lemma 6.11 (i). The claim now follows from (6.4.16).
We observe that in this case L (y ∗ , u∗ , λ∗ ) is positive definite on all of X, not only on
ker e (x ∗ ).
(ii) By (6.4.9) and (h2) we obtain
Here τ1 denotes the norm of the trace operator from H 1 () onto L2 (1 ). Hence by
Lemma 6.11 (ii) and (6.4.19) we find for every (v, h) ∈ ker e (x ∗ )
L (v ∗ , u∗ , λ∗ )((v, h), (v, h)) = |v|2L2 () + α|h|2L2 (1 ) − λ∗ , g (y ∗ )v 2 L2 ()
≥ |v|2L2 () + α|h|2L2 (1 ) − |λ∗ |L∞ () |g (y ∗ )|L∞ () |v|2L2 ()
5 6
≥ 1 − k̃e K(y ∗ )|g (y ∗ )|L∞ () |y ∗ − yd |L2 () |v|2L2 ()
α 1
+ |h|2L2 (1 ) + |v| 2
.
B −1 2 τ1 2 H ()
1
2
i i
i i
ItoKunisc
i i
2008/6/12
page 179
i i
Due to (6.4.17) the expression in brackets is nonnegative and the result follows.
(iii) In this case C can be a boundary observation operator, for example. As in (i) we
find
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + α|h|2L2 (1 ) − Ke |λ∗ |H 1 () |g |L1+ () |v|2H 1 ()
≥ |Cv|2Z + α|h|2L2 (1 ) − Ky ∗ |C ∗ (Cy ∗ − yd )|(H 1 )∗ |g |L () |v|2H 1 () ,
where (6.4.17) was used. This estimate and (6.4.19) imply that for (v, h) ∈ ker e (x ∗ )
α
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + |h|2L2 (1 )
1 2 2
α ∗ ∗
+ − K y ∗ |C (Cy − y )| 1
d (H ) ∗ |g | L () |v|H 1 () .
1+
2
2B −1 τ1
Theorem 6.14. Assume that (6.2.3), (h1), (h3), and (h4) hold and that
(a) Z = H 1 () and C = id,
or
(b) (h2) is satisfied.
Then (6.2.5) holds.
In the case that (a) holds the conclusion is obvious and L (v ∗ , u∗ , λ∗ ) is positive not only
on ker e (x ∗ ) but on all of X. In case (b) we use (6.4.19) to conclude that
α 1
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + |h|2L2 (1 ) + |v| 2
B −1 2 τ1 2 H ()
1
2
i i
i i
ItoKunisc
i i
2008/6/12
page 180
i i
Since g (y ) ≥ 0 it follows from (i) that |∇ϕ|2L2 () = 0 and λ∗ ≥ 0. Together with (iii)
∗
we find λ∗ g (y ∗ ) ≥ 0 a.e. on . If the inequalities in (i) and (iii) are reversed, we take
ϕ = sup(0, λ∗ ).
−y + y 3 − y = fˆ in ,
∂y
= u on (6.4.20)
∂n
and the associated optimal control problem
⎧
⎨ min 12 |y − yd |2 dx + α2 u2 dx
(6.4.21)
⎩
subject to (y, u) ∈ H 1 () × L2 () a solution of (6.4.20).
Proof. We first argue that the set of admissible pairs (y, u) ∈ H 1 () × L2 () for (6.4.21)
is not empty. For this purpose we consider
−y + y 3 − y = fˆ in ,
(6.4.22)
y = 0 on ∂.
i i
i i
ItoKunisc
i i
2008/6/12
page 181
i i
Since y ∈ H01 (), Poincaré’s inequality implies the existence of a constant Ĉ independent
of y = y(tˆ) such that
|y(tˆ)|H01 ≤ Ĉ(|fˆ|L2 + ||).
Proof. We follow [GHS]. Since e = N ẽ with N defined below (6.4.8) it suffices to show
that E : = ẽ(y, u) : X → H 1 ()∗ is surjective, where E is characterized by
E(v, h), w
H 1,∗ ,H 1 = ∇v, ∇w + q v, w − h, τ w ,
i i
i i
ItoKunisc
i i
2008/6/12
page 182
i i
By the Fredholm alternative either E0 , and hence E, is surjective or there exists a finite-
dimensional kernel of E0 spanned by the basis {vi }N i=1 in H () and E0 v = z, for z ∈
1
∗
H () , is solvable if and only if z, vi
H 1,∗ ,H 1 = 0 for all i = 1, . . . , N. In the latter case
1
vi , i = 1, . . . , N, are nontrivial
on by Lemma 6.19 below. Without loss of generality
we may assume that vi , vj = δij for i, j = 1, . . . , N. For z ∈ H 1 ()∗ we define ẑ ∈
!
H 1 ()∗ by ẑ, w
H 1,∗ ,H 1 = z, w
H 1,∗ ,H 1 + h, w , where h = − N i=1 z, vi
H 1,∗ ,H 1 τ vi ∈
L (). Then ẑ, vi
H 1,∗ ,H 1 = 0 for all i = 1, . . . , N and hence there exists v ∈ H 1 () such
2
that E0 v = ẑ or equivalently
∇v, ∇w + q v, w = z, w
H 1,∗ ,H 1 for all v ∈ H 1 ().
Consequently E is surjective.
and v = 0 on , then v = 0 in .
and
let q̃ denote the extension
by 0 of q. Since v = 0 on = ∂ we have ṽ ∈ H 1 () and
˜ by (6.4.24). This, together with the fact
∇ ṽ, ∇w ˜ + q̃ ṽ, w ˜ = 0 for all w ∈ H01 ()
that ṽ = 0 on an open nonempty subset of , ˜ implies that ṽ = 0 in
˜ and v = 0 in ; see
[Geo].
Let us discuss the validity of the conditions (hi) for the present problem. Sobolev’s
embedding theorem and
imply that (h1), (h3), and (h4) hold. The location of the equilibria of g suggests that for
h = 0, yd ≥ 1 implies 1 ≤ y ∗ ≤ yd and similarly that yd ≤ −1 implies yd ≤ y ∗ ≤ −1.
This was confirmed in numerical experiments [IK10]. In these cases g (y ∗ ) ≥ β > 0 and
(i)–(iii) of Theorem 6.15 hold.
The numerical results in [IK10] are based on Algorithm 6.3. While it does not contain
a strategy on the choice of c, it was observed that c > 0 and of moderate size is superior
(with respect to region of attraction for convergence) to c = 0, which corresponds to the
SQP method without line search. For example, for the choice yd = 1 and the initialization
y0 = −2, the iterates yn have to pass through the stable equilibrium −1 and the unstable
equilibrium 0 to reach the desired state yd . This can be accomplished with Algorithm 6.3
with c > 0, without globalization strategy, but not with c = 0. Similar comments apply to
the case of Neumann boundary controls.
i i
i i
ItoKunisc
i i
2008/6/12
page 183
i i
−y − y 3 = h in ,
∂y
= u on , (6.4.25)
∂n
where yd ∈ H 1 (). If (6.4.25) admits at least one feasible pair (y, u), then it is simple to
argue that (6.4.26) has a solution x ∗ = (y ∗ , u∗ ). We refer the reader to [Lio2, Chapter 3] for
existence results in the case that the cost functional is of the form |y −yd |rLr () +α|g|2L2 () for
appropriately chosen r > 2. The existence of a Lagrange multiplier is assured in the same
manner as in Example 6.6. Clearly (h1) and (h3) are satisfied. For yd = const ≥ 12 we
observed numerically that 0 ≤ y ∗ ≤ yd , λ∗ < 0, which in view of (h4) and Theorem 6.14
explains the second order convergence rate that is observed numerically.
i i
i i
ItoKunisc
i i
2008/6/12
page 184
i i
Condition 6.5.1. The approximations (Zh , Rh ) of the space Z are uniformly bounded; i.e.,
there exists a constant cR > 0 independent of h satisfying
Rh L(Z,Zh ) ≤ cR .
Condition 6.5.2. For every h problem (6.5.1) has a local solution xh∗ . The mappings fh
and eh are twice continuously Fréchet differentiable in neighborhoods Ṽh∗ of xh∗ , and the
operators eh are Lipschitz continuous on Ṽh∗ with a uniform Lipschitz constant ξe > 0
independent of h.
Condition 6.5.3. For every h there exists a nonempty open and convex set V̂h∗ ⊆ Ṽh∗ such
that a uniform Babuška–Brezzi condition is satisfied on V̂h∗ ; i.e., there exists a constant
β > 0 independent of h satisfying
eh (xh )qh , wh W
inf sup ≥β for all xh ∈ V̂h∗ .
wh ∈Wh qh ∈Xh qh X wh W
The Babuška–Brezzi condition implies that the operators eh (xh ) are surjective on V̂h∗ .
Hence, if Condition 6.5.3 holds, there exists for every h > 0 a Lagrange multiplier λ∗h ∈ Wh
such that (xh∗ , λ∗h ) solves (6.5.2).
Condition 6.5.4. There exist a scalar r > 0 independent of h and a neighborhood V (λ∗h )
of λ∗h for every
∗h ∗such that for Vh∗ = V̂h∗ × V (λ∗h )
∗
(i) B (xh λh ); r ⊆ Vh and
(ii) a uniform second order sufficient optimality condition is satisfied on Vh∗ ; i.e., there
exists a constant κ̄ > 0 such that for all (xh , λh ) ∈ Vh∗
Lh (xh , λh )(qh )2 ≥ κ̄ qh 2X
for all qh ∈ ker eh (xh ).
We define
L (x, λ)
F (x, λ) = .
e(x)
For every h and for all (xh , λh ) ∈ Vh∗ we introduce approximations of F and M by
Lh (xh , λh )
Fh (xh , λh ) = ,
eh (xh )
Lh (xh , λh ) eh (xh )
Mh (xh , λh ) = .
eh (xh ) 0
By Conditions 6.5.3 and 6.5.4 there exists a bound η which may depend on β and κ ∗ but is
independent of h so that
Mh−1 (xh , λh )L(Zh ) ≤ η (6.5.3)
i i
i i
ItoKunisc
i i
2008/6/12
page 185
i i
for all (xh , λh ) ∈ Vh∗ . We require the following consistency properties of Fh and Mh , where
V (x ∗ , λ∗ ) = V (x ∗ ) × V (λ∗ ) denotes the neighborhood of the solution x ∗ of (6.2.1) and the
associated Lagrange multiplier λ∗ defined above (6.2.8).
Condition 6.5.5. For each h we have Rh (V (x ∗ , λ∗ )) ⊆ Vh∗ , and the approximations of the
operators F and M are consistent on V (x ∗ ) and V (x ∗ , λ∗ ), respectively, i.e.,
and
/ /
/ q q /
lim / M (R (x, λ))R − R M(x, λ) / =0
h→0 / w /Z
h h h h
w
for all (q, w) ∈ Z and (x, λ) ∈ V (x ∗ , λ∗ ). Moreover, for every h let the operator Mh be
Lipschitz continuous on Vh∗ with a uniform Lipschitz constant ξM > 0 independent of h.
Theorem 6.21. Let c (xh0 , λ0h ) − (xh∗ , λ∗h )Z be sufficiently small for all h and let (6.2.3),
(6.2.5), and Conditions 6.5.2–6.5.4 hold. Then we have the following:
(a) Algorithm 6.7 is well defined and
∗ ∗ ∗ ∗ 2
h ) − (xh , λh )Z ≤ C (xh , λh ) − (xh , λh )Z ,
(xhn+1 , λn+1 n n
i i
i i
ItoKunisc
i i
2008/6/12
page 186
i i
for all n, where (xhn , λnh ) and (x n , λn ) are the nth iterates of the finite- and infinite-dimensional
methods, respectively.
holds.
Corollary 6.23. Let the hypotheses of Theorem 6.21 hold and let {(Zh(n) , Rh(n) )} be a
sequence of approximations with limn→∞ h(n) = 0. Then we have
lim (xh(n)
n
, λnh(n) ) − Rh(n) (x ∗ , λ∗ )Z = 0.
n→∞
We point out that both n(ε) and nh (ε) depend on the startup values (x 0 , λ0 ) and (xh0 , λ0h ) of
the infinite- and finite-dimensional methods.
Theorem 6.24. Under the assumptions of Theorem 6.21 there exists for each ε > 0 a
constant hε > 0 such that
The proofs of the results of this section as well as numerical examples which demon-
strate that mesh-independence is also observed numerically can be found in [KVo1, Vol].
6.6 Comments
Second order augmented Lagrangian methods as discussed in Section 6.2 were treated
in [Be] for finite-dimensional and in [IK9, IK10] for infinite-dimensional problems. The
history for SQP methods is a long one; see [Han, Pow1] and [Sto, StTa], for exam-
ple, and the references therein for the analysis of finite-dimensional problems. Infinite-
dimensional problems are considered in [Alt5], for example. Mesh-independence of aug-
mented Lagrangian-SQP methods was analyzed in [Vol]. This paper also contains many
references on mesh-independence for SQP and Newton methods. Methods which replace
equality- and inequality-constrained optimization problems by a linear quadratic approx-
imation with respect to the equality constraints and the cost functional, while leaving the
i i
i i
ItoKunisc
i i
2008/6/12
page 187
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 188
i i
i i
i i
ItoKunisc
i i
2008/6/12
page 189
i i
Chapter 7
The Primal-Dual
Active Set Method
This chapter is devoted to the primal-dual active set strategy for variational problems with
simple constraints. This is an efficient method for solving the optimality systems arising
from quadratic programming problems with unilateral or bilateral affine constraints and it is
equally well applicable to certain complementarity problems. The algorithm and some of its
basic properties are described in Section 7.1. In the ensuing sections sufficient conditions
for its convergence with arbitrary initialization and without globalization are presented
for a variety of different classes of problems. Sections 7.2 and 7.3 are devoted to the
finite-dimensional case where the system matrix is an M-matrix or has a cone-preserving
property, respectively. Operators which are of diagonally dominant type are considered in
Sections 7.4 and 7.5 for unilateral, respectively, bilateral problems. In Section 7.6 nonlinear
optimal control problems with control constraints are investigated.
189
i i
i i
ItoKunisc
i i
2008/6/12
page 190
i i
Ex = b, (7.1.2)
⎪ subject to d
y = y, y(0, ·) = y0 ˆ
y = 0 on ∂ \ ,
⎪
⎪ dt
⎪
⎪
⎪
⎪
⎩ ˆ
ν · ∇y(t) = Bu(t) in , u(t) ≤ ψ,
i i
i i
ItoKunisc
i i
2008/6/12
page 191
i i
exists a unique solution y = y(u) ∈ Y = L2 (0, T ; H 1 ()) ∩ H 1 (0, T ; L2 ()) to the initial
boundary value problem arising as the equality constraints. Let T ∈ L(U, Y ) denote the
solution operator given by y = T (u). Setting
A = αI + T ∗ T , a = T ∗ yd , and G = I,
−y = u, y = 0 on ∂, y≤ψ
for y ∈ H01 () and u ∈ L2 (). This problem can be formulated as (7.1.1) with X =
(H 2 () ∩ H01 ()) × L2 () and G the inclusion of X into L2 (). The solution, however,
will not satisfy a regular point condition, and the Lagrange multiplier associated with the
state constraint y ≤ ψ is only a measure in general. This class of problems will be discussed
in Section 8.6.
Let us return to (7.1.2). If the active set
A = {μ + c(Gx − ψ) > 0}
Ax + E ∗ λ + G∗ μ = a,
Ex = b,
Gx = ψ in A and μ = 0 in Ac .
Here and below we frequently use the notation {f > 0} to stand for {x : f (x) > 0} if
f ∈ L2 () and {i : fi > 0} if f ∈ Rn . The active set at the solution, however, is unknown.
i i
i i
ItoKunisc
i i
2008/6/12
page 192
i i
as a prediction strategy. Based on the current primal-dual pair (x, μ) the updates for the
active and inactive sets are determined by
Ex k+1 = b,
It will be shown in Section 8.4 that the above algorithm can be interpreted as a semismooth
Newton method for solving (7.1.2). This will allow the local convergence analysis of the
algorithm. In this chapter we concentrate on its global convergence, i.e., convergence for
arbitrary initializations and without the necessity for a line search. In case A is positive
definite, a sufficient condition for the existence of solutions to the auxiliary systems of step
(iii) of the algorithm is given by surjectivity of E and surjectivity G : N (E) → Z. In fact
in this case (iii) is the necessary optimality condition for
⎧
⎪ 1
⎪
⎨min Ax, x X − a, x X
x∈X 2
⎪
⎪
⎩
subject to Ex = b, Gx = ψ on Ak .
In the remainder of this chapter we focus on the reduced form of (7.1.2) given by
where A ∈ L(Z).
We now derive sufficient conditions which allow us to transform (7.1.2) into (7.1.3).
In a first step we assume that
i i
i i
ItoKunisc
i i
2008/6/12
page 193
i i
Note that (7.1.4) implies that G : N (E) → Z is surjective. If not, then there exists a
nonzero z ∈ Z such that (z, Gx)Z = (G∗ z, x)X = 0 for all x ∈ ker E. If we let x = G∗ z,
then |x|2 = 0 and z = 0, since G∗ is injective. Let PE denote the orthogonal projection in
X onto ker E. Then (7.1.2) is equivalent to
with A = PE APE and x = x̂ + x̄ ∈ ker E + (ker E)⊥ . The first of the above equations is
equivalent to the system
A11 = (GG∗ )−1 GAG∗ (GG∗ )−1 , A12 = (GG∗ )−1 GAPG , A22 = PG APG
and
and E ∗ λ = (I − PE )(a − A(x̂ + x̄)) are equivalent to (7.1.2) where x = x̂ + x̄, with
x̂ = x1 + x2 ∈ ker E, x2 ∈ ker E ∩ ker G, x1 ∈ ker E ∩ (ker G)⊥ , y = Gx1 .
Note that the system matrix in (7.1.6) is positive definite if A restricted to ker E is
positive definite.
Let us now further assume that
i i
i i
ItoKunisc
i i
2008/6/12
page 194
i i
and (7.1.7), which is of the desired form (7.1.3). In the finite-dimensional case, (7.1.3)
admits a unique solution for every a ∈ Rn if and only if A is a P -matrix; see [BePl,
Theorem 10.2.15.]. Recall that A is called a P -matrix if all its principal minors are positive.
In view of the fact that the reduction of (7.1.6) to (7.1.3) was achieved by taking the Schur
complement with respect to A22 it is also worthwhile to recall that the Schur complement
of a P -matrix (resp., M-matrix) is again a P -matrix (resp., M-matrix); see [BePl, p. 292].
For further reference it will be convenient to specify the primal-dual active set algo-
rithm for the reduced system (7.1.3).
Ax k+1 + μk+1 = a,
We use (7.1.3) rather than (7.1.6) for the convergence analysis for the reason of
avoiding additional notation. All convergence results that follow equally well apply to
(7.1.6) where it is understood that the coordinates corresponding to the variable x2 are
treated as inactive throughout the algorithm and the corresponding Lagrange multipliers are
set and updated by 0.
In the following subsections convergence will be proved under various different con-
ditions. These conditions will imply also the existence of a solution to the subsystems in
step (iii) of the algorithm as well as the existence of a unique solution to (7.1.3).
For ˜ ⊂ {1, . . . , n}, respectively,
˜ ⊂ , let R˜ denote the restriction operator to
˜ ∗
. Then R˜ is the extension-by-zero operator to ˜ c . For any A let I be its complement in
R , respectively, , and denote
n
∗
AA = RA ARA , AA,I = RA ARI∗ ,
and analogously for δμA and δμI . From (iii) above we have
i i
i i
ItoKunisc
i i
2008/6/12
page 195
i i
The following properties for k = 1, 2, . . . follow from steps (ii) and (iii):
μk (x k − ψ) = 0, μk + (x k − ψ) > 0 on Ak ,
x k − ψ ≥ 0, μk ≥ 0 on Ak , x k ≤ ψ, μk ≤ 0 on Ik , (7.1.10)
δxAk ≤ 0, δμIk ≥ 0,
Remark 7.1.1. From (7.1.10) it follows that Ak = Ak+1 implies that the solution is found,
i.e., (xk , μk ) = (x ∗ , μ∗ ). In numerical practice it was observed that Ak = Ak+1 can be
used as a stopping criterion; see [HiIK, IK20, IK22].
Remark 7.1.2. The primal-dual active set strategy can be interpreted as a prediction strategy
which, on the basis of (x k , μk ), predicts the true active and inactive sets for (7.1.3), i.e.,
the sets
To further pursue this point we assume that the systems in (7.1.3) and step (iii) of the
algorithm admit solutions, and we define the following partitioning of the index set at
iteration level k:
IG = Ik ∩ I ∗ , IB = Ik ∩ A∗ , AG = Ak ∩ A∗ , AB = Ak ∩ I ∗ .
The sets I_G, A_G give a good prediction and the sets I_B, A_B give a bad prediction. Let us denote x̃ = x^{k+1} − x*, μ̃ = μ^{k+1} − μ*, and we denote by G(x^k, μ^k) the system matrix for step (iii) of the algorithm:

$$G(x^k, \mu^k) = \begin{pmatrix} A_{\mathcal I_k} & A_{\mathcal I_k\mathcal A_k} & I_{\mathcal I_k} & 0\\ A_{\mathcal A_k\mathcal I_k} & A_{\mathcal A_k} & 0 & I_{\mathcal A_k}\\ 0 & 0 & I_{\mathcal I_k} & 0\\ 0 & -cI_{\mathcal A_k} & 0 & 0 \end{pmatrix}.$$

Then we have the identity

$$G(x^k, \mu^k)\begin{pmatrix} \tilde x_{\mathcal I_k}\\ \tilde x_{\mathcal A_k}\\ \tilde\mu_{\mathcal I_k}\\ \tilde\mu_{\mathcal A_k} \end{pmatrix} = -\mathrm{col}\bigl(0_{\mathcal I_k},\, 0_{\mathcal A_k},\, 0_{I_G},\, \mu^*_{I_B},\, 0_{A_G},\, c(\psi - x^*)_{A_B}\bigr). \tag{7.1.11}$$
Here we assumed that the components of the equation μ − max{0, μ + c(x − ψ)} = 0 are
ordered as (IG , IB , AG , AB ). Since x k ≥ ψ on Ak and μk ≤ 0 on Ik , we have
By the definition
Thus if the incorrectly predicted sets are small in the sense that

$$|(x^k - x^*)_{A_B}| + |(\mu^k - \mu^*)_{I_B}| \le \frac{1}{2\kappa - 1}\bigl(|(x^k - x^*)_{A_B^c}| + |(\mu^k - \mu^*)_{I_B^c}|\bigr),$$

where A_Bᶜ, I_Bᶜ denote the complements of the index sets A_B, I_B, respectively, then

$$|x^{k+1} - x^*| + |\mu^{k+1} - \mu^*| \le \tfrac12\bigl(|x^k - x^*| + |\mu^k - \mu^*|\bigr),$$

and convergence follows.
(iii) If x* < ψ and μ⁰ + c(x⁰ − ψ) ≤ 0 (e.g., x⁰ = ψ, μ⁰ = 0), then the algorithm converges in one step. In fact, in this case A_B = I_B = ∅.
Theorem 7.4. Assume that A is an M-matrix. Then xk → x ∗ for arbitrary initial data.
Moreover x ∗ ≤ x k+1 ≤ x k for all k ≥ 1, x k ≤ ψ for all k ≥ 2, and there exists k0 such that
μk ≥ 0 for all k ≥ k0 .
it follows that δxIk ≤ 0. Together with δxAk ≤ 0, which follows from the third equation
in (7.1.10), this implies that x k+1 ≤ x k for k ≥ 1. Next we show that x k is feasible for
k ≥ 2. Due to monotonicity of x k with respect to k it suffices to show this for k = 2. For i
such that (x 1 − ψ)i > 0 we have μ1i = 0 by (7.1.10) and hence μ1i + c (x 1 − ψ)i > 0 and
i ∈ A1 . Since x 2 = ψ on A1 and x 2 ≤ x 1 it follows that x 2 ≤ ψ.
To verify that x* ≤ x^k for k ≥ 1, note that

and consequently

$$A_{\mathcal I_{k-1}}\bigl(x^k_{\mathcal I_{k-1}} - x^*_{\mathcal I_{k-1}}\bigr) = \mu^*_{\mathcal I_{k-1}} + A_{\mathcal I_{k-1}\mathcal A_{k-1}}\bigl(x^*_{\mathcal A_{k-1}} - \psi_{\mathcal A_{k-1}}\bigr).$$
For i ∈ I_k̄ we have μ^{k̄+1}_i = 0 and μ^{k̄+1}_i + c(x^{k̄+1}_i − ψ_i) ≤ 0, since x^{k+1}_i ≤ ψ_i for k ≥ 1. Consequently i ∈ I_{k̄+1} and by induction i ∈ I_k for all k ≥ k̄ + 1. Thus, whenever a coordinate of μ^k becomes negative at iteration k̄, it is zero from iteration k̄+1 onwards, and the corresponding primal coordinate is feasible. Due to the finite-dimensionality of Rⁿ it follows that there exists k₀ such that μ^k ≥ 0 for all k ≥ k₀.
Monotonicity of x^k and x* ≤ x^k ≤ ψ for k ≥ 2 imply the existence of x̄ such that lim x^k = x̄ ≤ ψ. Since μ^k = a − Ax^k ≥ 0 for all k ≥ k₀, there exists μ̄ such that lim μ^k = μ̄ ≥ 0. Together with the complementarity property μ̄(x̄ − ψ) = 0, which is a consequence of the first equation in (7.1.10), it follows that (x̄, μ̄) = (x*, μ*).
The third condition in the above theorem motivates the terminology cone sum preserving. If A is an M-matrix, then the conditions of Theorem 7.5 are satisfied. The proof will reveal that $M(x^k) = \sum_{i=1}^n x^k_i$ is a merit function.
Proof. From (7.1.9) and the fact that x k+1 = ψ on Ak we have for k = 1, 2, . . .
$$\sum_{i=1}^n (x^{k+1}_i - x^k_i) = -\sum_{i\in\mathcal A_k}(x^k_i - \psi_i) + \sum_{i\in\mathcal I_k}\bigl(A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}(x^k-\psi)_{\mathcal A_k}\bigr)_i + \sum_{i\in\mathcal I_k}\bigl(A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k}\bigr)_i. \tag{7.3.2}$$

Since $x^k_{\mathcal A_k} \ge \psi_{\mathcal A_k}$ it follows that

$$\sum_{i=1}^n (x^{k+1}_i - x^k_i) \le \bigl(\|(A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k})^+\|_1 - 1\bigr)\,|x^k - \psi|_{1,\mathcal A_k} + \sum_{i\in\mathcal I_k}\bigl(A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k}\bigr)_i < 0 \tag{7.3.3}$$

$$x^k \mapsto M(x^k) = \sum_{i=1}^n x^k_i$$
acts as a merit function for the algorithm. Since there are only finitely many possible choices
for active/inactive sets, there exists an iteration index k̄ such that Ik̄ = Ik̄+1 . In this case
(x k̄+1 , μk̄+1 ) is a solution to (7.1.3). In fact, in view of (iii) of the algorithm it suffices to show
that x k̄+1 and μk̄+1 are feasible. This follows from the fact that due to Ik̄ = Ik̄+1 we have
$c(x^{k̄+1}_i - \psi_i) = \mu^{k̄+1}_i + c(x^{k̄+1}_i - \psi_i) \le 0$ for $i \in \mathcal I_{k̄}$ and $\mu^{k̄+1}_i + c(x^{k̄+1}_i - \psi_i) = \mu^{k̄+1}_i > 0$ for $i \in \mathcal A_{k̄}$. From (7.1.10) we deduce $\mu^{k̄+1}(x^{k̄+1} - \psi) = 0$, and hence the complementarity conditions hold and the algorithm converges in finitely many steps.
conditions hold and the algorithm converges in finitely many steps.
A perturbation result. We now discuss the primal-dual active set strategy for the case
where the matrix A can be expressed as an additive perturbation of an M-matrix.
where $B_{\mathcal I\mathcal A}(K) = M_{\mathcal I}^{-1}K_{\mathcal I\mathcal A} - M_{\mathcal I}^{-1}K_{\mathcal I}(M+K)_{\mathcal I}^{-1}A_{\mathcal I\mathcal A}$. Assume that K is chosen such that ρ < 1/2 and σ < 1. For every subset $\mathcal I \in S$ the inverse of $A_{\mathcal I}$ exists and can be expressed as

$$A_{\mathcal I}^{-1} = \Bigl(I_{\mathcal I} + \sum_{i=1}^{\infty}(-M_{\mathcal I}^{-1}K_{\mathcal I})^i\Bigr)M_{\mathcal I}^{-1}.$$
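As a quick sanity check of this series representation, the truncated Neumann sum can be compared with a directly computed inverse; the 2×2 matrices below are illustrative assumptions chosen so that the spectral radius of M_I⁻¹K_I is small.

```python
import numpy as np

# Illustrative M-matrix M_I and small perturbation K_I (assumptions).
M_I = np.array([[2.0, -1.0], [-1.0, 2.0]])
K_I = np.array([[0.05, 0.02], [0.0, 0.04]])
A_I = M_I + K_I

# A_I^{-1} = (I + sum_{i>=1} (-M_I^{-1} K_I)^i) M_I^{-1}, valid when the
# spectral radius of M_I^{-1} K_I is < 1 (here rho < 1/2 as assumed above).
Minv = np.linalg.inv(M_I)
B = -Minv @ K_I
series, term = np.eye(2), np.eye(2)
for i in range(1, 60):          # truncated Neumann series
    term = term @ B
    series += term
approx = series @ Minv
print(np.max(np.abs(approx - np.linalg.inv(A_I))))  # ~ machine precision
```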
Consequently the algorithm is well defined. Proceeding as in the proof of Theorem 7.5 we
arrive at
$$\sum_{i=1}^n (x^{k+1}_i - x^k_i) = -\sum_{i\in\mathcal A_k}(x^k_i - \psi_i) + \sum_{i\in\mathcal I_k}\bigl(A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}(x^k-\psi)_{\mathcal A_k}\bigr)_i + \sum_{i\in\mathcal I_k}\bigl(A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k}\bigr)_i,$$

where $\mu^k_i \le 0$ for $i\in\mathcal I_k$ and $x^k_i \ge \psi_i$ for $i\in\mathcal A_k$. Below we drop the index k with $\mathcal I_k$ and $\mathcal A_k$. Note that $A_{\mathcal I}^{-1}A_{\mathcal I\mathcal A} \le M_{\mathcal I}^{-1}K_{\mathcal I\mathcal A} - M_{\mathcal I}^{-1}K_{\mathcal I}(M+K)_{\mathcal I}^{-1}A_{\mathcal I\mathcal A} = B_{\mathcal I\mathcal A}(K)$. Here we used $(M+K)_{\mathcal I}^{-1} - M_{\mathcal I}^{-1} = -M_{\mathcal I}^{-1}K_{\mathcal I}(M+K)_{\mathcal I}^{-1}$ and $M_{\mathcal I}^{-1}M_{\mathcal I\mathcal A} \le 0$. This implies

$$\sum_{i=1}^n (x^{k+1}_i - x^k_i) \le -\sum_{i\in\mathcal A}(x^k_i - \psi_i) + \sum_{i\in\mathcal I}\bigl(B_{\mathcal I\mathcal A}(K)(x^k-\psi)_{\mathcal A}\bigr)_i + \sum_{i\in\mathcal I}\bigl(A_{\mathcal I}^{-1}\mu^k_{\mathcal I}\bigr)_i. \tag{7.3.5}$$
We estimate

$$\sum_{i\in\mathcal I}\bigl(A_{\mathcal I}^{-1}\mu^k_{\mathcal I}\bigr)_i = \sum_{i\in\mathcal I}\Bigl(M_{\mathcal I}^{-1}\mu^k_{\mathcal I} + \sum_{j=1}^{\infty}(-M_{\mathcal I}^{-1}K_{\mathcal I})^j M_{\mathcal I}^{-1}\mu^k_{\mathcal I}\Bigr)_i \le -|M_{\mathcal I}^{-1}\mu^k_{\mathcal I}|_1 + \sum_{j=1}^{\infty}\rho^j\,|M_{\mathcal I}^{-1}\mu^k_{\mathcal I}|_1$$
$$= (\alpha - 1)\,|M_{\mathcal I}^{-1}\mu^k_{\mathcal I}|_1 + \Bigl(\frac{1}{1-\rho} - (\alpha + 1)\Bigr)|M_{\mathcal I}^{-1}\mu^k_{\mathcal I}|_1 = (\alpha - 1)\,|M_{\mathcal I}^{-1}\mu^k_{\mathcal I}|_1,$$

where we set α = ρ/(1 − ρ) ∈ (0, 1) by (7.3.4). This estimate, together with (7.3.4) and (7.3.5), implies that
$$\sum_{i=1}^n (x^{k+1}_i - x^k_i) \le (\sigma - 1)\,|x^k_{\mathcal A_k} - \psi_{\mathcal A_k}|_1 + (\alpha - 1)\,|M_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k}|_1.$$
Now it can be verified in the same manner as in the proof of Theorem 7.5 that $x^k \mapsto M(x^k) = \sum_{i=1}^n x^k_i$ is a merit function for the algorithm, and convergence of (x^k, μ^k) to a solution (x*, μ*) follows. If there are two solutions to (7.1.3), then their difference y satisfies yᵀAy ≤ 0, and hence y = 0 and uniqueness follows.
Observe that the M-matrix property is not stable under arbitrarily small perturbations
since off-diagonal elements may become positive. Theorem 7.6 guarantees that convergence
of the primal-dual active set strategy for arbitrary initial data is preserved for sufficiently
small perturbations K of an M-matrix. Therefore, Theorem 7.6 is also of interest in con-
nection with numerical implementations of the primal-dual active set algorithm.
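To illustrate this robustness, the following sketch perturbs a tridiagonal M-matrix by a dense positive matrix K — so that A = M + K is no longer an M-matrix — and runs the active set iteration; all data are illustrative assumptions, and the loop is the same one sketched after (7.1.3).

```python
import numpy as np

# M-matrix plus a perturbation K with positive off-diagonal entries (illustrative).
n = 30
M = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K = 0.02 * np.ones((n, n))        # destroys the M-matrix sign pattern
A = M + K
rng = np.random.default_rng(1)
a, psi = rng.normal(size=n), np.zeros(n)

x, mu = np.zeros(n), np.zeros(n)
for k in range(100):
    act = mu + x - psi > 0        # c = 1
    ina = ~act
    x = psi.copy()
    x[ina] = np.linalg.solve(A[np.ix_(ina, ina)],
                             a[ina] - A[np.ix_(ina, act)] @ psi[act])
    mu = a - A @ x
    mu[ina] = 0.0
    if np.all((mu + x - psi > 0) == act):   # A_{k+1} = A_k: done
        break
print(k, np.min(mu[act]) if act.any() else 0.0)
```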
with β > 0 acts as a merit functional for the primal-dual algorithm. Here we set φ⁺ = max(φ, 0) and φ⁻ = −min(φ, 0). The natural norm associated to this merit functional is the L¹(Ω)-norm and consequently we assume that

$$A \in L(L^1(\Omega)), \quad a \in L^1(\Omega), \quad\text{and}\quad \psi \in L^1(\Omega). \tag{7.4.3}$$

The analysis of this section can also be used to obtain convergence in the L^p(Ω)-norm for any p ∈ (1, ∞) if the norms in the integrands of M are replaced by |·|_p-norms and the L¹(Ω)-norms below are replaced by L^p(Ω)-norms as well.
The results also apply for Z = Rn . In this case the integrals in (7.4.2) are replaced
by sums over the respective index sets.
We assume that there exist constants ρ_i, i = 1, …, 5, such that for all partitions A and I of Ω and for all φ_A ≥ 0 in L²(A) and φ_I ≥ 0 in L²(I)

$$|[A_{\mathcal I}^{-1}\varphi_{\mathcal I}]^-| \le \rho_1|\varphi_{\mathcal I}|, \qquad |[A_{\mathcal I}^{-1}A_{\mathcal I\mathcal A}\varphi_{\mathcal A}]^+| \le \rho_2|\varphi_{\mathcal A}| \tag{7.4.4}$$

and

$$|[A_{\mathcal A}\varphi_{\mathcal A}]^-| \le \rho_3|\varphi_{\mathcal A}|, \qquad |[A_{\mathcal A\mathcal I}A_{\mathcal I}^{-1}\varphi_{\mathcal I}]^-| \le \rho_4|\varphi_{\mathcal I}|, \qquad |[A_{\mathcal A\mathcal I}A_{\mathcal I}^{-1}A_{\mathcal I\mathcal A}\varphi_{\mathcal A}]^+| \le \rho_5|\varphi_{\mathcal A}|. \tag{7.4.5}$$

Here |·| denotes the L¹(Ω)-norm. Assumption (7.4.4) requires in particular the existence of $A_{\mathcal I}^{-1}$. By a Schur complement argument with respect to the sets $\mathcal I_k$ and $\mathcal A_k$ this implies existence of a solution to the linear systems in step (iii) of the algorithm for every k.
Theorem 7.7. If (7.4.3), (7.4.4), (7.4.5) hold and $\rho = \max(\beta\rho_1 + \rho_2,\ \rho_3/\beta + \rho_4 + \rho_5/\beta) < 1$, then M is a merit function for the primal-dual algorithm of the reduced system and $\lim_{k\to\infty}(x^k, \mu^k) = (x^*, \mu^*)$ in L¹(Ω) × L¹(Ω), with (x*, μ*) a solution to (7.4.1).
Proof. For every k ≥ 1 we have (x^{k+1} − ψ)⁺ ≤ (x^{k+1} − x^k)⁺ on I_k and (μ^{k+1})⁻ = (δμ)⁻ on A_k. Therefore

$$M(x^{k+1}, \mu^{k+1}) \le \max\Bigl(\beta\int_{\mathcal I_k}(\delta x_{\mathcal I_k})^+\,dx,\ \int_{\mathcal A_k}(\delta\mu_{\mathcal A_k})^-\,dx\Bigr). \tag{7.4.6}$$
$$\delta x_{\mathcal I_k} = -A_{\mathcal I_k}^{-1}(-\mu^k_{\mathcal I_k}) + A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}(-\delta x_{\mathcal A_k}),$$

Thus, if ρ < 1, then M is a merit functional. Furthermore M(x^{k+1}, μ^{k+1}) ≤ ρᵏ M(x¹, μ¹). Together with (7.4.7), (7.4.8), and (7.1.9) it follows that (x^k, μ^k) is a Cauchy sequence. Hence there exists (x*, μ*) such that lim_{k→∞}(x^k, μ^k) = (x*, μ*) and Ax* + μ* = a, μ*(x* − ψ) = 0 a.e. in Ω. Since (x^k − ψ)⁺ → (x* − ψ)⁺ as k → ∞ and lim_{k→∞}(x^{k+1} − ψ)⁺ = 0, it follows that x* ≤ ψ. Similarly, one argues that μ* ≥ 0. Thus (x*, μ*) is a solution to (7.4.1).
Concerning the uniqueness of the solution to (7.4.1), assume that A ∈ L(L²(Ω)) and that (Ay, y)_{L²(Ω)} > 0 for all y ≠ 0. Assume further that (x*, μ*) and (x̂, μ̂) are solutions to (7.4.1) with x̂ − x* ∈ L²(Ω). Then (x̂ − x*, A(x̂ − x*))_{L²(Ω)} ≤ 0 and therefore x̂ − x* = 0.
Remark 7.4.1. In the finite-dimensional case the integrals in the definition of M must be replaced by sums over the active/inactive index sets. If A is an M-matrix, then ρ₁ = ρ₂ = ρ₅ = 0 and ρ < 1 if ρ₃/β + ρ₄ < 1. This is the case if A is diagonally dominant in the sense that ρ₄ < 1 and β is chosen sufficiently large. If these conditions are met, then ρ < 1 is stable under perturbations of A.
$$Ax + E^*\lambda + G^*\mu = a, \qquad Ex = b, \tag{7.5.1}$$

where max as well as min are interpreted as pointwise a.e. operations if Z = L²(Ω) and coordinatewise for Z = Rⁿ.

$$Ex^{k+1} = b, \qquad Gx^{k+1} = \psi \ \text{in } \mathcal A^+_k, \qquad \mu^{k+1} = 0 \ \text{in } \mathcal I_k, \qquad Gx^{k+1} = \varphi \ \text{in } \mathcal A^-_k.$$
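A minimal finite-dimensional sketch of this bilateral iteration (with E absent and G = I, so that the constraint is ϕ ≤ x ≤ ψ) reads as follows; the matrix and data are again illustrative assumptions.

```python
import numpy as np

def pdas_bilateral(A, a, phi, psi, c=1.0, max_iter=100):
    """Bilateral primal-dual active set sketch: solve A x + mu = a with
    phi <= x <= psi, mu = max(0, mu + c(x - psi)) + min(0, mu + c(x - phi))."""
    n = len(a)
    x, mu = np.zeros(n), np.zeros(n)
    for k in range(max_iter):
        ap = mu + c * (x - psi) > 0           # active above: x = psi
        am = mu + c * (x - phi) < 0           # active below: x = phi
        ina = ~(ap | am)
        x = np.where(ap, psi, np.where(am, phi, 0.0))
        x[ina] = np.linalg.solve(A[np.ix_(ina, ina)],
                                 a[ina] - A[np.ix_(ina, ~ina)] @ x[~ina])
        mu = a - A @ x
        mu[ina] = 0.0
        if np.array_equal(mu + c * (x - psi) > 0, ap) and \
           np.array_equal(mu + c * (x - phi) < 0, am):
            break                             # active sets repeat: converged
    return x, mu, k

n = 40
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
a = np.linspace(-1.0, 1.0, n)
x, mu, its = pdas_bilateral(A, a, phi=-0.05 * np.ones(n), psi=0.05 * np.ones(n))
print(its, x.min() >= -0.05 - 1e-12, x.max() <= 0.05 + 1e-12)
```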
where A is a bounded operator on Z. The algorithm for this reduced system is obtained
from the above algorithm by replacing G by I and deleting the terms involving E and E ∗ .
As in the unilateral case, if one does not carry out the reduction step from (7.1.6) to (7.5.2),
then the coordinates corresponding to x2 are treated as inactive ones in the algorithm. We
henceforth concentrate on the infinite-dimensional case and give sufficient conditions for
$$M(x^{k+1}, \mu^{k+1}) = \max\Bigl(\int_{\mathcal I_k}\bigl((x^{k+1} - \psi)^+ + (x^{k+1} - \varphi)^-\bigr)\,dx,\ \int_{\mathcal A^+_k}(\mu^{k+1})^-\,dx + \int_{\mathcal A^-_k}(\mu^{k+1})^+\,dx\Bigr) \tag{7.5.3}$$
to act as a merit function for the algorithm applied to the reduced system. In the finite-
dimensional case the integrals must be replaced by sums over the respective index sets. We
note that (iii) with G = I implies the complementarity property
As in the previous section the merit function involves L¹-norms and accordingly we aim for convergence in L¹(Ω). We henceforth assume that

Below ‖·‖ denotes the norm of operators in L(L¹(Ω)). The following conditions will be used: There exist constants ρ_i, i = 1, …, 5, such that for arbitrary partitions A ∪ I = Ω we have

$$\|A_{\mathcal I}^{-1}\| \le \rho_1, \qquad \|A_{\mathcal I}^{-1}A_{\mathcal I\mathcal A}\| \le \rho_2 \tag{7.5.6}$$

and

$$\|A_{\mathcal A} - cI\| \le \rho_3, \qquad \|A_{\mathcal A\mathcal I}A_{\mathcal I}^{-1}\| \le \rho_4, \qquad \|A_{\mathcal A\mathcal I}A_{\mathcal I}^{-1}A_{\mathcal I\mathcal A}\| \le \rho_5. \tag{7.5.7}$$
Theorem 7.8. If (7.5.5), (7.5.6), (7.5.7) hold and ρ < 1, then M is a merit function for the primal-dual algorithm of the reduced system (7.5.2) and lim_{k→∞}(x^k, μ^k) = (x*, μ*) in L¹(Ω) × L¹(Ω), with (x*, μ*) a solution to (7.5.2).
$$A_{\mathcal I_k}\,\delta x_{\mathcal I_k} + A_{\mathcal I_k\mathcal A^+_k}\,\delta x_{\mathcal A^+_k} + A_{\mathcal I_k\mathcal A^-_k}\,\delta x_{\mathcal A^-_k} - \mu^k_{\mathcal I_k} = 0, \tag{7.5.8}$$

with

$$\mu^k_{\mathcal A^+_k} \begin{cases} > 0 &\text{on } \mathcal A^+_{k-1}\cap\mathcal A^+_k,\\ = 0 &\text{on } \mathcal I_{k-1}\cap\mathcal A^+_k,\\ > c(\psi-\varphi) &\text{on } \mathcal A^-_{k-1}\cap\mathcal A^+_k, \end{cases} \tag{7.5.9}$$

$$\mu^k_{\mathcal I_k} \begin{cases} \in [c(\varphi-\psi), 0) &\text{on } \mathcal A^+_{k-1}\cap\mathcal I_k,\\ = 0 &\text{on } \mathcal I_{k-1}\cap\mathcal I_k,\\ \in (0, c(\psi-\varphi)] &\text{on } \mathcal A^-_{k-1}\cap\mathcal I_k, \end{cases} \tag{7.5.10}$$

$$\mu^k_{\mathcal A^-_k} \begin{cases} < c(\varphi-\psi) &\text{on } \mathcal A^+_{k-1}\cap\mathcal A^-_k,\\ = 0 &\text{on } \mathcal I_{k-1}\cap\mathcal A^-_k,\\ < 0 &\text{on } \mathcal A^-_{k-1}\cap\mathcal A^-_k, \end{cases} \tag{7.5.11}$$

$$\delta x_{\mathcal A^+_k} \begin{cases} = 0 &\text{on } \mathcal A^+_{k-1}\cap\mathcal A^+_k,\\ < 0 &\text{on } \mathcal I_{k-1}\cap\mathcal A^+_k,\\ = \psi - \varphi < \mu^k/c &\text{on } \mathcal A^-_{k-1}\cap\mathcal A^+_k, \end{cases} \tag{7.5.12}$$

$$\delta x_{\mathcal A^-_k} \begin{cases} = \varphi - \psi > \mu^k/c &\text{on } \mathcal A^+_{k-1}\cap\mathcal A^-_k,\\ > 0 &\text{on } \mathcal I_{k-1}\cap\mathcal A^-_k,\\ = 0 &\text{on } \mathcal A^-_{k-1}\cap\mathcal A^-_k. \end{cases} \tag{7.5.13}$$

From (7.5.4)

$$(x^{k+1}_{\mathcal I_k} - \psi_{\mathcal I_k})^+ \le (\delta x_{\mathcal I_k})^+ \quad\text{and}\quad (x^{k+1}_{\mathcal I_k} - \varphi_{\mathcal I_k})^- \le (\delta x_{\mathcal I_k})^-. \tag{7.5.14}$$
$$\le \rho_1\bigl(|(\mu^k_{\mathcal I_k\cap\mathcal A^+_{k-1}})^-| + |(\mu^k_{\mathcal I_k\cap\mathcal A^-_{k-1}})^+|\bigr) + \rho_2\Bigl(|(x^k-\psi)^+_{\mathcal A^+_k\cap\mathcal I_{k-1}}| + \tfrac1c|(\mu^k_{\mathcal A^+_k\cap\mathcal A^-_{k-1}})^+| + |(x^k-\varphi)^-_{\mathcal A^-_k\cap\mathcal I_{k-1}}| + \tfrac1c|(\mu^k_{\mathcal A^-_k\cap\mathcal A^+_{k-1}})^-|\Bigr).$$
This implies

$$|\delta x_{\mathcal I_k}| \le 2\max\Bigl(\rho_1,\ \rho_2,\ \frac{\rho_2}{c}\Bigr)\,M(x^k, \mu^k). \tag{7.5.16}$$

From (7.5.8) further

$$\mu^{k+1}_{\mathcal A_k} - (\mu^k_{\mathcal A_k} - c\,\delta x_{\mathcal A_k}) = g, \qquad \mu^k_{\mathcal A^+_k} - c\,\delta x_{\mathcal A^+_k} \ge 0. \tag{7.5.17}$$

Consequently,

$$|(\mu^{k+1}_{\mathcal A^+_k})^-| + |(\mu^{k+1}_{\mathcal A^-_k})^+| \le |g_{\mathcal A_k}| \le (\rho_3 + \rho_5)\,|\delta x_{\mathcal A_k}| + \rho_4\,|\mu_{\mathcal I_k}| \le 2\max\Bigl(\rho_4,\ \rho_3 + \rho_5,\ \frac{\rho_3 + \rho_5}{c}\Bigr)\,M(x^k, \mu^k). \tag{7.5.18}$$

By (7.5.15), (7.5.16), and (7.5.18)

$$M(x^{k+1}, \mu^{k+1}) \le 2\max\Bigl(\max\bigl(\rho_1, \rho_2, \tfrac{\rho_2}{c}\bigr),\ \max\bigl(\rho_4, \rho_3 + \rho_5, \tfrac{\rho_3 + \rho_5}{c}\bigr)\Bigr)\,M(x^k, \mu^k).$$
It follows that M(x^{k+1}, μ^{k+1}) ≤ ρᵏ M(x¹, μ¹), and if ρ < 1, then M(x^k, μ^k) → 0 as k → ∞. From the estimates leading to (7.5.16) it follows that x^k is a Cauchy sequence. Moreover μ^k is a Cauchy sequence by (7.5.8). Hence there exists (x*, μ*) such that lim_{k→∞}(x^k, μ^k) = (x*, μ*). By Lebesgue's bounded convergence theorem and since M(x^k, μ^k) → 0, it follows that ϕ ≤ x* ≤ ψ. Clearly Ax* + μ* = a and (x* − ψ)(x* − ϕ)μ* = 0 by (7.5.4). This last equation implies that μ* = 0 on I* = {ϕ < x* < ψ}. It remains to show that μ* ≥ 0 on A*,⁺ = {x* = ψ} and μ* ≤ 0 on A*,⁻ = {x* = ϕ}. Let s ∈ A*,⁺ be such that x^k(s) and μ^k(s) converge. Then μ*(s) ≥ 0. If not, then μ*(s) < 0 and there exists k̄ such that μ^k(s) + c(x^k(s) − ψ(s)) ≤ μ*(s)/2 < 0 for all k ≥ k̄. Then s ∈ I_k and μ^{k+1}(s) = 0 for k ≥ k̄, contradicting μ*(s) < 0. Analogously one shows that μ* ≤ 0 on A*,⁻.
Conditions (7.5.6) and (7.5.7) are satisfied for additive perturbations of the operator
cI , for example. This can be deduced from the following result.
Theorem 7.9. Assume that A = cI + K with K ∈ L(L1 ()) and K < c and that (7.5.5),
(7.5.6), (7.5.7) are satisfied. If
K ρ3 + ρ5
ρ̄ = 2 max max ρ1 , ρ2 , max(ρ3 + ρ5 , ρ4 ), < 1,
c c
then the conclusions of the previous theorem are valid.
Proof. We follow the proof of Theorem 7.8 and eliminate the overestimate (7.5.14). Let $P = \{x^{k+1}_{\mathcal I_k} - \psi > 0\}\cap\mathcal I_k$. We find

$$x^{k+1} - \psi \ \begin{cases} \le \delta x_{P\cap\mathcal I_{k-1}} &\text{on } P\cap\mathcal I_{k-1},\\ = \delta x_{P\cap\mathcal A^+_{k-1}} &\text{on } P\cap\mathcal A^+_{k-1},\\ = \delta x_{P\cap\mathcal A^-_{k-1}} + (\varphi - \psi)_{P\cap\mathcal A^-_{k-1}} &\text{on } P\cap\mathcal A^-_{k-1}. \end{cases}$$

$$\int_P (x^{k+1} - \psi) \le \int_P A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k} - \int_P A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}\,\delta x_{\mathcal A_k} + \int_{P\cap\mathcal A^-_{k-1}}(\varphi - \psi)$$
$$= \frac1c\int_P \mu^k_{\mathcal I_k} + \int_{P\cap\mathcal A^-_{k-1}}(\varphi - \psi) - \frac1c\int_P K_{\mathcal I_k}A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k} - \int_P A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}\,\delta x_{\mathcal A_k}$$
$$\le -\frac1c\int_P K_{\mathcal I_k}A_{\mathcal I_k}^{-1}\mu^k_{\mathcal I_k} - \int_P A_{\mathcal I_k}^{-1}A_{\mathcal I_k\mathcal A_k}\,\delta x_{\mathcal A_k},$$

and hence

$$\int_{\mathcal I_k}(x^{k+1} - \psi)^+ \le \frac{\|K\|}{c}\,\rho_1\,|\mu^k_{\mathcal I_k}| + \rho_2\,|\delta x_{\mathcal A_k}|.$$

An analogous estimate can be obtained for $\int_{\mathcal I_k}(x^{k+1} - \varphi)^-$ and we find

$$\int_{\mathcal I_k}(x^{k+1} - \psi)^+ + \int_{\mathcal I_k}(x^{k+1} - \varphi)^- \le \frac{\|K\|}{c}\,\rho_1\,|\mu^k_{\mathcal I_k}| + \rho_2\,|\delta x_{\mathcal A_k}|.$$
will allow us to obtain improved sufficient conditions for global convergence of the primal-dual active set algorithm. The primary motivation for such problems are optimal control problems, and we therefore slightly change the notation to that which is more common in optimal control.

Let U and Y be Hilbert spaces with U = L²(Ω), where Ω is a bounded measurable set in R^d, and let T : U → Y be a possibly nonlinear, continuously differentiable, injective mapping with Fréchet derivative denoted by T′. Further let ϕ, ψ ∈ U with ϕ < ψ a.e. in Ω. For α > 0 and z ∈ Y consider

$$\min_{\varphi\le u\le\psi}\ J(u) = \tfrac12|T(u) - z|_Y^2 + \tfrac{\alpha}{2}|u|_U^2. \tag{7.6.1}$$
where (u, μ) ∈ U × U, and max as well as min are interpreted as pointwise a.e. operations. If the upper or lower constraints are not present, we can set ψ = ∞ or ϕ = −∞.
$$\min_{\varphi\le u\le\psi}\ \tfrac12|y - z|_Y^2 + \tfrac{\alpha}{2}|u|_U^2 \quad\text{subject to}\quad (\nabla y, \nabla v) + (y, v) = (u, v) \ \text{for all } v\in H^1(\Omega), \tag{7.6.3}$$

where z ∈ L²(Ω̃). We set Y = L²(Ω̃) and U = L²(Ω). Define Lu = y with L : L²(Ω) → L²(Ω) as the solution operator to the inhomogeneous Neumann boundary value problem (7.6.3) and set T = R_Ω̃ L : L²(Ω) → L²(Ω̃), where R_Ω̃ : L²(Ω) → L²(Ω̃) denotes the canonical restriction operator. Then T* : L²(Ω̃) → L²(Ω) is given by T* = L* E_Ω̃, where E_Ω̃ denotes the extension-by-zero operator and p = L*w̃ solves

$$(\nabla p, \nabla w) + (p, w) = (\tilde w, w) \ \text{for all } w\in H^1(\Omega). \tag{7.6.4}$$

The fact that L and L* are adjoint to each other follows by setting v = p in (7.6.3) and w = y in (7.6.4).
We next specify the primal-dual active set algorithm for (7.6.1). The iteration index
is denoted by k and an initial choice (u0 , μ0 ) is assumed to be available.
$$u^{k+1} = \psi \ \text{on } \mathcal A^+_k, \qquad u^{k+1} = \varphi \ \text{on } \mathcal A^-_k, \qquad \mu^{k+1} = 0 \ \text{on } \mathcal I_k,$$

and

Note that the equations for (u^{k+1}, μ^{k+1}) in step (ii) of the algorithm constitute the necessary optimality condition for the auxiliary problem

$$\begin{cases} \min\ \tfrac12|T(u) - z|_Y^2 + \tfrac{\alpha}{2}|u|_U^2 \ \text{ over } u\in U\\[2pt] \text{subject to } u = \psi \ \text{on } \mathcal A^+_k,\ u = \varphi \ \text{on } \mathcal A^-_k. \end{cases} \tag{7.6.6}$$
The analysis in this section relies on the fact that (7.6.2) can be equivalently expressed as

$$\begin{cases} y = T(u),\\[2pt] \lambda = -T'(u)^*(y - z),\\[2pt] \alpha u - \lambda + \mu = 0,\\[2pt] \mu = \max(0, \mu + (u - \psi)) + \min(0, \mu + (u - \varphi)), \end{cases} \tag{7.6.7}$$

where λ = −T′(u)*(T(u) − z) is referred to as the adjoint state. Analogously, for u^{k+1} ∈ U, setting y^{k+1} = T(u^{k+1}), λ^{k+1} = −T′(u^{k+1})*(T(u^{k+1}) − z), (7.6.5) can equivalently be expressed as

$$\begin{cases} y^{k+1} = T(u^{k+1}), \ \text{ where } u^{k+1} = \begin{cases} \psi &\text{on } \mathcal A^+_k,\\ \tfrac1\alpha\lambda^{k+1} &\text{on } \mathcal I_k,\\ \varphi &\text{on } \mathcal A^-_k, \end{cases}\\[4pt] \lambda^{k+1} = -T'(u^{k+1})^*(y^{k+1} - z),\\[2pt] \alpha u^{k+1} - \lambda^{k+1} + \mu^{k+1} = 0. \end{cases} \tag{7.6.8}$$
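For a linear operator T the iteration (7.6.8) can be realized directly, since on the inactive set u^{k+1} = λ^{k+1}/α combined with λ^{k+1} = −T*(Tu^{k+1} − z) yields a linear system for u^{k+1} on I_k. The sketch below implements this for a matrix T; the choice c = α in the active set prediction, the random injective T, and the bounds are assumptions made only for illustration.

```python
import numpy as np

def pdas_control(T, z, alpha, phi, psi, max_iter=100):
    """Sketch of iteration (7.6.8) for linear T: on the inactive set
    u = lambda/alpha with lambda = -T^T (T u - z); u = psi / phi on the
    active sets, which are updated via mu + alpha (u - bound) (c = alpha)."""
    n = T.shape[1]
    TtT, Ttz = T.T @ T, T.T @ z
    u, mu = np.zeros(n), np.zeros(n)
    for k in range(max_iter):
        ap = mu + alpha * (u - psi) > 0
        am = mu + alpha * (u - phi) < 0
        ina = ~(ap | am)
        u = np.where(ap, psi, np.where(am, phi, 0.0))
        # alpha u_I + (T^T T u)_I = (T^T z)_I on the inactive set
        lhs = alpha * np.eye(ina.sum()) + TtT[np.ix_(ina, ina)]
        rhs = Ttz[ina] - TtT[np.ix_(ina, ~ina)] @ u[~ina]
        u[ina] = np.linalg.solve(lhs, rhs)
        lam = -T.T @ (T @ u - z)
        mu = lam - alpha * u                 # from alpha u - lambda + mu = 0
        if np.array_equal(mu + alpha * (u - psi) > 0, ap) and \
           np.array_equal(mu + alpha * (u - phi) < 0, am):
            break
    return u, mu, k

rng = np.random.default_rng(2)
T = rng.normal(size=(60, 40)) / 8.0          # illustrative injective operator
z = rng.normal(size=60)
u, mu, its = pdas_control(T, z, alpha=0.5,
                          phi=-0.2 * np.ones(40), psi=0.2 * np.ones(40))
print(its, float(u.min()), float(u.max()))
```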
In what follows we give conditions which guarantee convergence of the primal-dual
active set strategy for linear and certain nonlinear operators T from arbitrary initial data.
The convergence proof is based on an appropriately defined functional which decays when
evaluated along the iterates of the algorithm. An a priori estimate for the adjoint variable λ in (7.6.8) will play an essential role.

To specify the condition alluded to in the above let us consider two consecutive iterates of the algorithm. For every k = 1, 2, …, the sets A⁺_k, A⁻_k, and I_k give a mutually disjoint decomposition of Ω. According to (i) and (ii) in the form (7.6.8) we find

$$u^{k+1} - u^k = \begin{cases} R^k_{\mathcal A^+} &\text{on } \mathcal A^+_k,\\[2pt] \tfrac1\alpha(\lambda^{k+1} - \lambda^k) + R^k_{\mathcal I} &\text{on } \mathcal I_k,\\[2pt] R^k_{\mathcal A^-} &\text{on } \mathcal A^-_k, \end{cases} \tag{7.6.9}$$
Sufficient conditions for (7.6.13) will be given at the end of this section. The convergence proof will be based on the following merit functional M : U × U → R given by

$$M(u, \mu) = \alpha^2\int_\Omega\bigl(|(u - \psi)^+|^2 + |(\varphi - u)^+|^2\bigr)\,dx + \int_{\mathcal A^+(u)}|\mu^-|^2\,dx + \int_{\mathcal A^-(u)}|\mu^+|^2\,dx,$$

where A⁺(u) = {x : u ≥ ψ} and A⁻(u) = {x : u ≤ ϕ}. Note that the iterates (u^k, μ^k) ∈ U × U satisfy

$$\mu^k(u^k - \psi)(\varphi - u^k)(x) = 0 \quad\text{for a.e. } x\in\Omega, \tag{7.6.14}$$

and hence at most one of the integrands of M(u^k, μ^k) can be strictly positive at x ∈ Ω.
Theorem 7.13. Assume that (7.6.13) holds for the iterates of the primal-dual active set
strategy. Then M(uk+1 , μk+1 ) ≤ α −2 ρ 2 M(uk , μk ) for every k = 1, . . . . Moreover there
exist (u∗ , μ∗ ) ∈ U ×U , such that limk→∞ (uk , μk ) = (u∗ , μ∗ ) and (u∗ , μ∗ ) satisfies (7.6.2).
$$\mu^{k+1} = \lambda^{k+1} - \alpha\psi \ \text{on } \mathcal A^+_k, \qquad u^{k+1} = \tfrac1\alpha\lambda^{k+1} \ \text{on } \mathcal I_k, \qquad \mu^{k+1} = \lambda^{k+1} - \alpha\varphi \ \text{on } \mathcal A^-_k.$$

Using step (ii) of the algorithm in the form of (7.6.8) implies that

$$\mu^{k+1} = \lambda^{k+1} - \lambda^k + \lambda^k - \alpha\psi = \lambda^{k+1} - \lambda^k + \begin{cases} \mu^k > 0 &\text{on } \mathcal A^+_{k-1}\cap\mathcal A^+_k,\\ \alpha(u^k - \psi) > 0 &\text{on } \mathcal I_{k-1}\cap\mathcal A^+_k,\\ \alpha u^k + \mu^k - \alpha\psi \ge 0 &\text{on } \mathcal A^-_{k-1}\cap\mathcal A^+_k, \end{cases}$$

and therefore

$$|\mu^{k+1,-}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad\text{for } x\in\mathcal A^+_k. \tag{7.6.15}$$

Analogously one derives

Moreover

$$u^{k+1} - \psi = \tfrac1\alpha(\lambda^{k+1} - \lambda^k + \lambda^k) - \psi = \tfrac1\alpha(\lambda^{k+1} - \lambda^k) + \begin{cases} \tfrac1\alpha\mu^k \le 0 &\text{on } \mathcal A^+_{k-1}\cap\mathcal I_k,\\ u^k - \psi \le 0 &\text{on } \mathcal I_{k-1}\cap\mathcal I_k,\\ \tfrac1\alpha\mu^k + u^k - \psi \le 0 &\text{on } \mathcal A^-_{k-1}\cap\mathcal I_k, \end{cases}$$
and

$$|\mu^{k+1,+}(x)| \le |\lambda^{k+1}(x) - \lambda^k(x)| \quad\text{for } x\in\mathcal A^-(u^{k+1}). \tag{7.6.21}$$

Combining (7.6.19)–(7.6.21) implies that

$$M(u^{k+1}, \mu^{k+1}) \le \int_\Omega |\lambda^{k+1}(x) - \lambda^k(x)|^2\,dx. \tag{7.6.22}$$

and hence

$$\mu^{k+1} = \max(0, \lambda^k - \alpha\psi) + \min(0, \lambda^k - \alpha\varphi) + (\lambda^{k+1} - \lambda^k)\,\chi_{\mathcal A^+_k\cup\mathcal A^-_k}.$$

Since lim_{k→∞}(λ^{k+1} − λ^k) = 0 and lim_{k→∞} λ^k exists, it follows that there exists μ* ∈ U such that lim_{k→∞} μ^k = μ*, and

$$\mu^* = \max(0, \lambda^* - \alpha\psi) + \min(0, \lambda^* - \alpha\varphi). \tag{7.6.25}$$

From the last equation in (7.6.8) it follows that there exists u* such that lim_{k→∞} u^k = u* and αu* − λ* + μ* = 0. Combined with (7.6.25) the pair (u*, μ*) satisfies the complementarity condition given by the second equation in (7.6.2). Passing to the limit with respect to k in (7.6.5) we obtain that the first equation in (7.6.2) is satisfied by (u*, μ*).
We turn to the discussion of (7.6.13) and consider the linear case first.
Proof. From (7.6.8) and (7.6.9) we have, with δu = u^{k+1} − u^k, δy = y^{k+1} − y^k, and δλ = λ^{k+1} − λ^k,

$$\begin{cases} \delta u = R^k + \tfrac1\alpha\,\delta\lambda\,\chi_{\mathcal I_k},\\[2pt] T'\,\delta u = \delta y,\\[2pt] T'^*\,\delta y + \delta\lambda = 0. \end{cases} \tag{7.6.26}$$
We now turn to a particular case when (7.6.13) holds for a nonlinear operator T. Let Ω be a bounded domain in Rⁿ, n = 2 or 3, with smooth boundary ∂Ω. Further let φ : R → R be a monotone mapping with locally Lipschitzian derivative, satisfying φ(0) = 0, and such that the substitution operator determined by φ maps H¹(Ω) into L²(Ω). We choose U = Y = L²(Ω) and define T(u) = y as the solution operator to

$$-\Delta y + \phi(y) = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega, \tag{7.6.27}$$

where Δ denotes the Laplacian. The adjoint variable λ is the solution to

$$-\Delta\lambda + \phi'(y)\lambda = -(y - z) \ \text{in } \Omega, \qquad \lambda = 0 \ \text{on } \partial\Omega. \tag{7.6.28}$$

Let (u⁰, μ⁰) be an arbitrary initialization and let Ũ = {u^k : k = 1, 2, …} denote the set of iterates generated by the primal-dual active set algorithm. Since these iterates are solutions to the auxiliary problems (7.6.6), it follows that for every ᾱ > 0 the set Ũ is bounded in L²(Ω) uniformly with respect to α ≥ ᾱ.

By monotone operator theory and regularity theory of elliptic partial differential equations it follows that the sets of primal states {y^k = y(u^k) : k = 1, 2, …} and adjoint states {λ^k = λ(y(u^k)) : k = 1, 2, …} are bounded subsets of L^∞(Ω); see [Tr]. Let C denote this bound and let L_C denote the Lipschitz constant of φ′ on the ball B_C(0) with center 0 and radius C in R. Denote by H¹₀(Ω) = {u ∈ H¹(Ω) : u = 0 on ∂Ω} the Hilbert space endowed with norm |∇u|_{L²} and let κ stand for the embedding constant from H¹₀(Ω) into L²(Ω).

Proposition 7.15. Assume that $0 < \dfrac{(1 + C L_C)\,\kappa^4}{\alpha - (1 + C L_C)\,\kappa^4} < 1$, where α ≥ ᾱ. Then (7.6.13) holds for the mapping T determined by the solution operator to (7.6.27).
both Laplacians with homogeneous Dirichlet boundary conditions. Taking the inner product of (7.6.29) with y^{k+1} − y^k we have, using monotonicity of φ,

$$|y^{k+1} - y^k|_1 \le \frac{\kappa^2}{\alpha}\,|\lambda^{k+1} - \lambda^k|_1 + |R^k|_{-1}, \tag{7.6.31}$$

where |·|₁ and |·|₋₁ denote the norms in H¹₀(Ω) and H⁻¹(Ω), respectively. Note that φ′(y^{k+1}) ≥ 0. Hence from (7.6.30) we find

$$|\lambda^{k+1} - \lambda^k|_1^2 \le (1 + C L_C)\,\kappa^2\,|y^{k+1} - y^k|_1\,|\lambda^{k+1} - \lambda^k|_1.$$

Thus,

$$|\lambda^{k+1} - \lambda^k|_1 \le (1 + C L_C)\,\kappa^2\,|y^{k+1} - y^k|_1,$$

and hence from (7.6.31)

$$|y^{k+1} - y^k|_1 \le \frac{\alpha}{\alpha - (1 + C L_C)\,\kappa^4}\,|R^k|_{-1}.$$

It thus follows that

$$|\lambda^{k+1} - \lambda^k|_{L^2} \le \frac{\alpha\,(1 + C L_C)\,\kappa^4}{\alpha - (1 + C L_C)\,\kappa^4}\,|R^k|_{L^2}.$$

This implies (7.6.13) with

$$\rho = \frac{(1 + C L_C)\,\kappa^4}{\alpha - (1 + C L_C)\,\kappa^4}.$$
Chapter 8
Semismooth Newton
Methods I
8.1 Introduction
In this chapter we study semismooth Newton methods for solving nonlinear nonsmooth
equations. These investigations are motivated by complementarity problems, variational
inequalities, and optimal control problems with control or state constraints, for example.
The operator equation for which we desire to find a solution is typically Lipschitz continuous
but not C 1 regular. We shall also establish the relationship between the semismooth Newton
method and the primal-dual active set method that was discussed in Chapter 7.
Since semismooth Newton methods are not widely known even for finite-dimensional
problems, we consider the finite-dimensional case before we turn to problems in infinite
dimensions. In fact, these two cases are distinctly different. In finite dimensions we have
Rademacher’s theorem, which states that every locally Lipschitz continuous function is
differentiable almost everywhere. This result has no counterpart for functions between
infinite-dimensional function spaces.
As an example consider the nonlinear complementarity problem

$$g(x) \le 0, \quad x \le \psi, \quad\text{and}\quad (g(x), x - \psi)_{\mathbb R^n} = 0,$$

where g : Rⁿ → Rⁿ and ψ ∈ Rⁿ. It can be expressed equivalently as the problem of finding
and we denote by ∂F(x) the generalized derivative at x introduced by Clarke [Cla], i.e.,

The reason for using ∂_B F rather than ∂F in (8.1.4) is the following: For the convergence analysis we shall require that all V ∈ ∂_B F(x*) are nonsingular, where x* is the sought solution to F(x) = 0. This is more readily satisfied for ∂_B F than for ∂F, as can be seen from F(x) = |x|, for example. In this case 0 ∈ ∂F(0) but 0 ∉ ∂_B F(0).

We also introduce the coordinatewise operation

$$\partial_b F(x) = \otimes_{i=1}^m \partial_B F_i(x),$$

where F_i is the ith coordinate of F. From the definition of ∂_B F(x) it follows that ∂_B F(x) ⊂ ∂_b F(x). For F given in (8.1.1) we have

$$\partial_B F(x) = \partial_b F(x)$$
then the generalized Newton method reduces to the primal-dual active set method.

Local convergence of {x^k} to x*, a solution of F(x) = 0, is based on the following concepts. The generalized Jacobians V^k ∈ ∂_B F(x^k) are selected so that their inverses (V^k)⁻¹ are uniformly bounded and so that they satisfy the condition

$$|F(x^* + h) - F(x^*) - V h| = o(|h|), \tag{8.1.5}$$

Thus, there exists a neighborhood B(x*, ρ) of x* such that if x⁰ ∈ B(x*, ρ), then x^k ∈ B(x*, ρ) and x^k converges to x* superlinearly. This discussion will be made rigorous for F mapping between finite-dimensional spaces, as above, as well as for the infinite-dimensional case. For the finite-dimensional case we shall rely on the notion of semismooth functions.
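As a concrete finite-dimensional instance, the sketch below applies the iteration (8.1.4) to the reformulation F(x) = g(x) + max(0, −g(x) + x − ψ) of the complementarity problem above (the same form reappears in Section 8.2.5), selecting V^k ∈ ∂_B F(x^k) row by row; the affine g and the data are illustrative assumptions.

```python
import numpy as np

def ssn(g, dg, psi, x0, tol=1e-12, max_iter=50):
    """Semismooth Newton x^{k+1} = x^k - V_k^{-1} F(x^k) for
    F(x) = g(x) + max(0, -g(x) + x - psi), with V_k in del_B F(x^k)."""
    x = x0.copy()
    for k in range(max_iter):
        gx = g(x)
        F = gx + np.maximum(0.0, -gx + x - psi)
        if np.linalg.norm(F) < tol:
            break
        act = -gx + x - psi > 0                  # where the max branch is active
        V = dg(x).copy()                         # rows of g'(x) on the inactive set
        V[act, :] = np.eye(len(x))[act, :]       # rows of F_i = x_i - psi_i
        x = x - np.linalg.solve(V, F)
    return x, k

# illustrative nonlinearity: g(x) = A x - b with an M-matrix A (assumption)
n = 30
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.linspace(-1, 1, n)
x, its = ssn(lambda x: A @ x - b, lambda x: A, np.zeros(n), np.zeros(n))
print(its, np.all(A @ x - b <= 1e-10), np.all(x <= 1e-10))
```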
$$F(x^k) + F'(x^k; d^k) = 0, \tag{8.1.8}$$

where F′(x; d) denotes the directional derivative of F at x in direction d. Note that it may be a nontrivial task to solve (8.1.8) for d^k. This method was investigated in [Pan1, Pan2]. We shall return to (8.1.8) in the context of globalization of the method specified by (8.1.4).
In Section 8.2 we present the finite-dimensional theory for Newton’s method for
semismooth functions. Section 8.3 is devoted to the discussion of Newton differentia-
bility and solving nonsmooth equations in Banach spaces. In Section 8.4 we exhibit the
relationship between the primal-dual active set method and semismooth Newton methods.
Section 8.5 is devoted to a class of nonlinear complementarity problems. In Section 8.6
we discuss applications where, for different reasons, semismooth Newton methods are not
directly applicable but rather a regularization is necessary as, for instance, in the case of
state-constrained optimal control problems.
Theorem 8.2. (i) Suppose that F : R^m → R^n is locally Lipschitz continuous and directionally differentiable at x. Then F′(x; ·) is Lipschitz continuous and for every h there exists a V ∈ ∂F(x) such that

$$F'(x; h) = V h. \tag{8.2.1}$$

(ii) If F is locally Lipschitz continuous, then the following statements are equivalent.

(1) F is semismooth at x.

(2) F is directionally differentiable at x and for every V ∈ ∂F(x + h),

$$V h - F'(x; h) = o(|h|) \quad\text{as } h\to 0.$$

(3) $\displaystyle\lim_{x+h\in D_F,\ |h|\to 0}\frac{F'(x + h; h) - F'(x; h)}{|h|} = 0.$
for all x, y ∈ R^m (see [Cla, p. 72]), there exist a sequence {t_k} with t_k → 0⁺ and V_k ∈ co ∂F([x, x + t_k h]) such that

$$F'(x; h) = \lim_{k\to\infty} V_k h.$$

Since F is locally Lipschitz, the sequence {V_k} is bounded, and there exists a subsequence of V_k, denoted by the same symbol, such that V_k → V. Moreover ∂F is closed at x; i.e., x_i → x and Z_i → Z, with Z_i ∈ ∂F(x_i), imply that Z ∈ ∂F(x) [Cla, p. 70], and hence V ∈ ∂F(x). Thus F′(x; h) = Vh, as desired.

(ii) We turn to verify the equivalence of (1)–(3).

(1) → (2): First we show that F′(x; h) exists and that

$$\frac{F(x + t_i h) - F(x)}{t_i} \in \mathrm{co}\,\bigl(\partial F([x, x + t_i h])\bigr)\,h.$$
The Carathéodory theorem implies that for each i there exist t_i^k ∈ [0, t_i], λ_i^k ∈ [0, 1] with $\sum_{k=0}^n \lambda_i^k = 1$, and V_i^k ∈ ∂F(x + t_i^k h), where k = 0, …, n, such that

$$\frac{F(x + t_i h) - F(x)}{t_i} = \sum_{k=0}^n \lambda_i^k V_i^k h.$$

Next we prove that the limit in (8.2.4) is uniform for all h with |h| = 1. This implies (2). If the claimed uniform convergence in (8.2.4) does not hold, then there exist ε > 0, sequences {h_k} in R^m with |h_k| = 1, {t_k} with t_k → 0⁺, and V_k ∈ ∂F(x + t_k h_k) such that

$$|V_k h_k - F'(x; h_k)| \ge \varepsilon$$
$$|V h - F'(x + h; h)| \le 5\varepsilon\,|h|.$$

$$|V^{-1}| \le C. \tag{8.2.7}$$

Proof. First, we claim that there exist a neighborhood N of x and a constant C such that for all y ∈ D_F ∩ N, ∇F(y) is nonsingular and

$$|(\nabla F(y))^{-1}| \le C. \tag{8.2.8}$$

If this claim is not true, then there exists a sequence y^k → x, y^k ∈ D_F, such that either all ∇F(y^k) are singular or |(∇F(y^k))⁻¹| → ∞. Since F is locally Lipschitz the set {∇F(y^k) : k = 1, …} is bounded. Thus there exists a subsequence of ∇F(y^k) that converges to some V. Then V must be singular and V ∈ ∂_B F(x). This contradicts the assumption, and hence there exists a neighborhood N of x such that (8.2.8) holds for all y ∈ D_F ∩ N. Moreover (8.2.7) follows from (8.1.2), (8.2.8), and continuity of the norm.
Proof. From Theorem 8.3 there exist a neighborhood N of x ∗ and a constant C such that
|V −1 | ≤ C for all V ∈ ∂B F (x) with x ∈ N . Thus, if x ∈ N , then it follows from
$$\le \varepsilon(|x - x^*|)\,|x - x^*|.$$

This implies the first claim. Let x̂ = x − V⁻¹F(x). Since F is B-differentiable at x* we have

$$|F(\hat x)| \le |F'(x^*; \hat x - x^*)| + \varepsilon(|\hat x - x^*|)\,|\hat x - x^*|.$$

From the first part of the theorem we obtain, for a possibly redefined function ε,

$$|F(\hat x)| \le (L + \varepsilon)\,|x - x^*|,$$

where ε = ε(|x − x*|) and L is the Lipschitz constant of F at x*. Since

$$|x - x^*| \le |\hat x - x| + |\hat x - x^*|$$
We are now prepared for the local superlinear convergence result that was announced
in the introduction of this chapter.
8.2.2 Globalization
We now discuss globalization of the semismooth Newton method (8.1.4). For this purpose
we define the merit function θ by

$$\theta(x) = |F(x)|^2.$$

Throughout we assume that F : Rⁿ → Rⁿ is locally Lipschitz continuous and B-differentiable and that the following assumptions (8.2.9)–(8.2.11) hold:

$$S = \{x\in\mathbb R^n : |F(x)| \le |F(x^0)|\} \ \text{is bounded}. \tag{8.2.9}$$

Algorithm G. Let β, γ ∈ (0, 1) and σ ∈ (0, σ̄). Choose x⁰ ∈ Rⁿ and set k = 0. Given x^k with F(x^k) ≠ 0:
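The itemized steps of Algorithm G do not survive in full here, so the following sketch should be read as a hedged reconstruction from the surrounding analysis: a semismooth Newton trial step h^k is accepted when |F(x^k + h^k)| < γ|F(x^k)|; otherwise a descent direction d^k ∈ Φ(x^k) is used with backtracking α_k = β^{m_k} on θ. The precise acceptance and line-search tests below are assumptions modeled on the estimates used in the convergence proof that follows, not the authors' exact statement.

```python
import numpy as np

def algorithm_g(F, newton_dir, descent_dir, x0, beta=0.5, gamma=0.5,
                sigma=0.1, tol=1e-12, max_iter=200):
    """Hedged reconstruction of Algorithm G: try the semismooth Newton step;
    if it does not reduce |F| by the factor gamma, fall back to a descent
    direction with Armijo-type backtracking on theta(x) = |F(x)|^2."""
    x = x0.copy()
    theta = lambda y: float(np.dot(F(y), F(y)))
    for k in range(max_iter):
        if np.sqrt(theta(x)) < tol:
            break
        h = newton_dir(x)                    # e.g. solve V h = -F(x), V in del_B F(x)
        if np.linalg.norm(F(x + h)) < gamma * np.linalg.norm(F(x)):
            x = x + h                        # first alternative: full Newton step
            continue
        d = descent_dir(x)                   # d in Phi(x), a descent direction
        m, t = 0, 1.0
        while theta(x + t * d) - theta(x) > -sigma * t * theta(x) and m < 60:
            m, t = m + 1, t * beta           # backtracking: alpha_k = beta^m
        x = x + t * d
    return x, k
```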
Proof. (a) First we prove that for each x ∈ S such that θ(x) ≠ 0 and d satisfying θ′(x; d) ≤ −σ̄θ(x), there exists a τ̄ > 0 such that

If this is not the case, then there exists a sequence τ_n → 0⁺ such that

Since σ < σ̄, this shows θ(x) = 0, which contradicts the assumption θ(x) ≠ 0. Hence for each level k at which d^k ∈ Φ(x^k) is chosen according to the second alternative in Algorithm G, there exist m_k < ∞ and α_k > 0 such that |F(x^{k+1})| < |F(x^k)|. By construction the iterates therefore satisfy |F(x^{k+1})| < |F(x^k)| for each k ≥ 0.

Assume first that lim sup α_k > 0. If the first alternative in Algorithm G with α_k = 1 occurs infinitely many times, then using the fact that γ < 1 we find that lim_{k→∞} θ(x^k) = 0. Otherwise, for all k sufficiently large

$$0 \le \theta(x^{k+1}) \le (1 - \sigma\alpha_k)\,\theta(x^k) \le \theta(x^k).$$
By (8.2.9), (8.2.10) the sequence {(x^k, d^k)} is bounded. Let {(x^k, d^k)}_{k∈K} be any convergent subsequence with limit (x*, d). Note that

$$\frac{\theta(x^k + \tau_k d^k) - \theta(x^k)}{\tau_k} = \frac{\theta(x^k + \tau_k d) - \theta(x^k)}{\tau_k} + \frac{\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)}{\tau_k},$$

where

$$\lim_{k\in K,\,k\to\infty}\frac{\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)}{\tau_k} = 0,$$

since θ is locally Lipschitz continuous. Since d^k ∈ Φ(x^k) for all k ∈ K it follows from (8.2.11) that θ°(x*; d) ≤ −σ̄θ(x*). Then from (8.2.13) and (8.2.11) we find

$$-\sigma\,\theta(x^*) \le \limsup_{k\in K,\ k\to\infty}\frac{\theta(x^k + \tau_k d^k) - \theta(x^k)}{\tau_k} \le \theta^o(x^*; d) \le -\bar\sigma\,\theta(x^*). \tag{8.2.14}$$

It follows that (σ̄ − σ)θ(x*) ≤ 0 and thus θ(x*) = 0.
(b) Since F is B-differentiable at x* there exists a δ > 0 such that for |x − x*| ≤ δ

$$|F(x) - F(x^*) - F'(x^*; x - x^*)| \le \frac{1}{2c}\,|x - x^*|.$$

Thus,

$$|F'(x^*; x - x^*)| \le |F(x)| + |F(x) - F(x^*) - F'(x^*; x - x^*)| \le |F(x)| + \frac{1}{2c}\,|x - x^*|.$$

From (8.2.12)

$$|x - x^*| \le c\,|F'(x^*; x - x^*)| \le c\,|F(x)| + \tfrac12\,|x - x^*|$$

and thus

$$|x - x^*| \le 2c\,|F(x)| \quad\text{if } |x - x^*| \le \delta.$$

Given ε ∈ (0, δ) define the set

$$N(x^*, \varepsilon) = \Bigl\{x\in\mathbb R^n : |x - x^*| \le \varepsilon,\ |F(x)| \le \frac{\varepsilon}{2c + b}\Bigr\}.$$

$$\le 2c\,|F(x^{\bar k})| + b\,|F(x^{\bar k})| = (2c + b)\,|F(x^{\bar k})| \le \varepsilon.$$

Hence x^{k̄+1} ∈ N(x*, ε). By induction, x^k ∈ N(x*, ε) for all k ≥ k̄ and thus the sequence x^k converges to x*.
(c) Since lim_{k→∞} x^k = x*, the iterates x^k of Algorithm G enter into the region of attraction for Theorem 8.5. Moreover, referring to the proof of Lemma 8.4, for any γ ∈ (0, 1) there exists k_γ such that the iterates according to (8.1.4) satisfy |F(x^{k+1})| ≤ γ|F(x^k)| for k ≥ k_γ. Hence these iterates coincide with those of Algorithm G for k ≥ k_γ, and superlinear convergence follows.

Remark 8.2.1. (i) We point out that the requirement that the graph satisfies the closure property (8.2.11) is used in the proof of Theorem 8.6 only for the case that lim sup_{k→∞} α_k = 0.

(ii) For part (a) of Theorem 8.6 the condition |h^k| ≤ b|F(x^k)| in the first alternative of Algorithm G is not required, and |d| ≤ b|F(x)| for all x ∈ S used in alternative (ii) can be replaced by requiring that the directions are uniformly bounded. These conditions are used in the proof of Theorem 8.6(b).

(iii) Since h ↦ F′(x*; h) is positively homogeneous, and since we consider the finite-dimensional case here, one can easily argue that (8.2.12) is equivalent to F′(x*; h) ≠ 0 for all h ≠ 0, which is called BD-regularity in [Qi].
(iii) $(F(\bar x), F^o(\bar x; \bar d)) \le \limsup_{x\to\bar x,\ d\to\bar d}(F(x), G(x; d))$ for all $x\to\bar x$, $d\to\bar d$ with $x, \bar x\in S$.

For the special case of optimization subject to box constraints a quasi-directional derivative will be constructed in Section 8.2.5. In the remainder of this section we consider the relationship between well-known choices for descent directions (see, e.g., [HPR, Pan1]) and the concept of quasi-directional derivative of Definition 8.7, and we assume that F is
and if

$$F(x) + F'(x; d) = 0 \tag{8.2.16}$$

admits a solution d for each x ∈ S, then a first choice for the direction is given by the solution d to (8.2.16), i.e., Φ(x) = d; see, e.g., [Pan1]. By (8.2.15) we have |d| ≤ b̄|F(x)|. Moreover

and therefore the inequalities in (8.2.10) hold with b = b̄ and σ̄ = 2. This choice of Φ, however, does not satisfy (8.2.11) in general, unless additional conditions are imposed on the problem data; see Section 8.2.5 below.

(b) Generalized Bouligand direction. As a second choice (see [HPR, Pan1]), we assume that G is a quasi-directional derivative of F on S, that

$$\theta^o(\bar x, \bar d) \le 2\limsup_{x\to\bar x,\ d\to\bar d}(F(x), G(x; d)) = -2\lim_{x\to\bar x}|F(x)|^2 = -2\theta(\bar x).$$
We refer the reader to Section 8.2.5 for the construction of G for specific applications.
(c) Generalized gradient direction. The following choice was discussed in [HPR].
Here d is chosen as the solution to
Then, d → J (d) is coercive, bounded below, and continuous. Thus there exists an optimal
solution d to (8.2.19).
By Lemma 8.8 the optimal value of the cost J in (8.2.19) is given by −η|d|². If this value is negative, then any solution to (8.2.19) provides a decay for θ. The optimal value of the cost is 0 if and only if d = 0 is the optimal solution. In this case Lemma 8.8 implies that x is a stationary point in the sense that (F(x), G(x; h)) ≥ 0 for all h ∈ Rⁿ.

Let us now turn to the discussion of condition (8.2.10) for the direction given by the solution d to (8.2.19), i.e., Φ(x) = d. We assume (8.2.17) and that (8.2.18) admits a solution for every x ∈ S. Since J(0) = 0, we have 2(F(x), G(x; d)) ≤ −η|d|² and therefore

$$\eta|d|^2 \le -2(F(x), G(x; d)) \le 2|F(x)|\,|G(x; d)| \le 2L\,|d|\,|F(x)|.$$

Thus,

$$|d| \le \frac{2L}{\eta}\,|F(x)|,$$

and the second condition in (8.2.10) holds. Turning to the first condition let d̂ satisfy F(x) + G(x; d̂) = 0. Then using Lemma 8.8(a) and (8.2.17) we find, at a solution d to (8.2.19),
To argue (8.2.11) for this choice of Φ, let x_k → x̄, d_k → d̄, with d_k = Φ(x_k), x_k ∈ S, and choose d̂_k such that F(x_k) + G(x_k; d̂_k) = 0. Then

$$\tfrac12\,\theta^o(\bar x; \bar d) \le \limsup_{k\to\infty}(F(x_k), G(x_k; d_k))$$

Gauss–Newton algorithm. Let β, γ ∈ (0, 1), α > 0, and σ ∈ (0, 2η). Choose x⁰ ∈ Rⁿ and set k = 0. Given x^k with F(x^k) ≠ 0:

(i) If there exists V_k ∈ ∂_B F(x^k) such that

$$h^k = -(\alpha|F(x^k)|\,I + V_k^T V_k)^{-1}V_k^T F(x^k) \tag{8.2.22}$$

with |h^k| ≤ b|F(x^k)| satisfies |F(x^k + h^k)| < γ|F(x^k)|, let d^k = h^k and set x^{k+1} = x^k + d^k, α_k = 1, and m_k = 0.

(ii) Otherwise, let d^k be defined by (8.2.19) with x = x^k. Stop if d^k = 0 or F(x^k) = 0. Otherwise, set α_k = β^{m_k}, where m_k is the first positive integer m for which

$$\theta(x^k + \beta^m d^k) - \theta(x^k) \le -\sigma\beta^m|d^k|^2, \tag{8.2.23}$$

and set x^{k+1} = x^k + α_k d^k.
If the Gauss–Newton algorithm terminates after finitely many steps with index k̄, then
either F (x k̄ ) = 0 or d k̄ = 0 in the second alternative of the algorithm. In this case x k̄ is a
stationary point of θ in the sense that (F (x k̄ ), G(x k̄ ; d)) ≥ 0 for all d by Lemma 8.8.
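Step (i) of the algorithm is a Tikhonov-regularized Gauss–Newton step; in code it is one regularized least-squares solve. The demonstration below iterates this step on a smooth affine residual, so F, the generalized Jacobian V = F′, and α = 1 are all illustrative assumptions.

```python
import numpy as np

def gauss_newton_step(F_val, V, alpha=1.0):
    """Regularized Gauss-Newton direction (8.2.22):
    h = -(alpha |F| I + V^T V)^{-1} V^T F, with alpha > 0."""
    n = V.shape[1]
    M = alpha * np.linalg.norm(F_val) * np.eye(n) + V.T @ V
    return -np.linalg.solve(M, V.T @ F_val)

# usage on an illustrative smooth residual F(x) = A x - b (assumption):
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x = np.zeros(2)
for k in range(25):
    F_val = A @ x - b
    x = x + gauss_newton_step(F_val, A)          # V = F'(x) = A here
print(x, np.linalg.norm(A @ x - b))
```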
We next discuss global convergence when the Gauss–Newton algorithm takes infinitely many steps.

Proof. (a) The algorithm guarantees that |F(x^{k+1})| < |F(x^k)| for all k. Due to (8.2.9) the sequence {x^k} is bounded. In case the first alternative is taken we have from (8.2.24) with h = 0

$$\alpha|d^k|^2 \le |F(x^k)|.$$

In case of the second alternative Lemma 8.8(a) and (8.2.20) imply that

$$\lim_{k\to\infty}\alpha_k|d^k|^2 = 0.$$

If lim_{k→∞} d^k ≠ 0, then there exists an index set K such that lim_{k∈K, k→∞} |d^k| ≠ 0 and consequently lim_{k∈K, k→∞} α_k = 0. For τ_k = β^{m_k−1} we have

$$-\sigma|d^k|^2 \le \frac{1}{\tau_k}\bigl(\theta(x^k + \tau_k d^k) - \theta(x^k)\bigr). \tag{8.2.27}$$

Let K̂ ⊂ K be such that {x^k}_{k∈K̂}, {d^k}_{k∈K̂} are convergent subsequences with limits x* and d. Note that

$$\frac{1}{\tau_k}\bigl(\theta(x^k + \tau_k d^k) - \theta(x^k)\bigr) = \frac{1}{\tau_k}\bigl(\theta(x^k + \tau_k d) - \theta(x^k)\bigr) + \frac{1}{\tau_k}\bigl(\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)\bigr) \tag{8.2.28}$$
with

$$\lim_{k\in\hat K,\,k\to\infty}\frac{1}{\tau_k}\bigl(\theta(x^k + \tau_k d^k) - \theta(x^k + \tau_k d)\bigr) = 0,$$

Since σ ∈ (0, 2η) this implies that d = 0, which contradicts our assumption. Consequently lim_{k→∞} d^k = 0. From Lemma 8.8(a) we have

for every α > 0 and h ∈ Rⁿ. Passing to the limit with respect to k and dividing by α we obtain

$$2(F(x^*), G(x^*; h)) + \eta\alpha|h|^2 \ge 0,$$

and (8.2.25) follows by letting α → 0⁺.

(b) The proof is identical to the one of Theorem 8.6(b).

(c) From Theorem 8.3 there exists a bounded neighborhood N ⊂ {x : |F(x)| ≤ |F(x⁰)|} of x* and a constant C such that for all x ∈ N and V ∈ ∂_B F(x) we have |V⁻¹| ≤ C. Consequently there exists M > 0 such that for all x ∈ N we have

$$\bigl|\bigl(\alpha|F(x)|I + V(x)^T V(x)\bigr)^{-1}V(x)^T - V(x)^{-1}\bigr| \le \alpha|F(x)|\,\bigl|\bigl(\alpha|F(x)|I + V(x)^T V(x)\bigr)^{-1}\bigr|\,\bigl|V(x)^{-1}\bigr| \le M|F(x)| \le M|F(x^0)|. \tag{8.2.29}$$

Let $h = -\bigl(\alpha|F(x)|I + V(x)^T V(x)\bigr)^{-1}V(x)^T F(x)$. Then by Lemma 8.4, possibly after shrinking N, we have for all x ∈ N

where we used (8.2.29) and denoted by L̄ the Lipschitz constant of F on the bounded set {x : |x| ≤ M|F(x⁰)|²} ∪ {x − V(x)⁻¹F(x) : x ∈ N} ∪ N.
Since x k → x ∗ , the last estimate implies the existence of an index k̄ such that x k ∈ N and
where g ∈ C 1 (Rn , Rn ) and φ < ψ. This corresponds to the bilateral constraint case and
can equivalently be expressed as
see Section 4.7, Example 4.53. Clearly θ(x) = |F (x)|2 is locally Lipschitz continuous and
directionally differentiable, and hence B-differentiable.
Define

$$I^3 = \{-g(x) + x - \phi = 0\}.$$

We obtain

$$F'(x; d) = \begin{cases} d &\text{on } A^+\cup A^-,\\[2pt] g'(x)d &\text{on } I^2,\\[2pt] \max(g'(x)d, d) &\text{on } I^1,\\[2pt] \min(g'(x)d, d) &\text{on } I^3, \end{cases}$$

$$d + x - \psi = 0 \ \text{on } A^+, \qquad d + x - \phi = 0 \ \text{on } A^-, \qquad g'(x)d + g(x) = 0 \ \text{on } I^2,$$
To verify this claim we consider F(x) = g(x) + max(0, −g(x) + x − ψ). The general case then follows with minor modifications. We find

$$\theta^o(x, d) = \limsup_{y\to x,\ t\to 0^+}\frac{2}{t}\,(F(x), F(y + td) - F(y))$$
$$= \limsup_{y\to x,\ t\to 0^+}\frac{2}{t}\,\bigl(F(x),\ \max(g'(x)(y + td),\, y + td - \hat\psi) - \max(g'(x)y,\, y - \hat\psi)\bigr), \tag{8.2.33}$$

$$Ay = y - \hat\psi + r,$$
$$G(x; d) = -F$$

is equivalent to

$$d_1 = -F_1, \qquad d_2 = -A_{22}^{-1}A_{23}d_3 + A_{22}^{-1}(-F_2 + A_{21}F_1),$$

and

$$\begin{cases} M d_3 + \tilde\mu = -w - F_3,\\[2pt] \tilde\mu = \max(0, \tilde\mu + d_3 + F_3) &\text{on } \{F(x) > 0\},\\[2pt] \tilde\mu = \min(0, \tilde\mu + d_3 + F_3) &\text{on } \{F(x) \le 0\}, \end{cases} \tag{8.2.35}$$

where

Assume that A is symmetric positive definite for every x ∈ S. Then every Schur complement of A is positive definite as well, and (8.2.35) admits a unique solution (d₃, μ̃). It follows that G(x; d) = −F admits a unique solution d for every x ∈ S and F ∈ Rⁿ. Consequently (8.2.18) is satisfied. Since g′(x) is positive definite for every x ∈ S and S is closed and bounded by (8.2.9), and hence compact, it follows that g′(x) and M are uniformly positive definite with respect to x ∈ S. This implies that there exists b̄ such that |d| ≤ b̄|F| for some b̄ independent of x ∈ S. Consequently (8.2.17) holds.

We remark that (iii) of Definition 8.7 is not satisfied for the choice G(x; d) = F′(x; d) unless g(x)_i = (x − ψ)_i and g(x)_i = (x − φ)_i for all i and x ∈ S.
Note that differently from the finite-dimensional case we do not require Lipschitz
continuity of F as part of the definition of semismoothness.
i.e., F is semismooth.
Since h ∈ X with |h| = 1 was arbitrary this implies that F is directionally differentiable at x and

$$F'(x; h) = \lim_{t\to 0^+} G(x + th)h.$$

$$\lim_{t\to 0^+}\Bigl(\frac{F(x + tv) - F(x)}{t} - F'(x; v)\Bigr) = 0 \quad\text{and the limit is uniform in } |v|_X = 1.$$

Here we use positive homogeneity of the directional derivative F′(x; h) with respect to the second variable.

If F is B-differentiable at x, then it is differentiable and from (1) we have

$$\lim_{t\to 0}\frac{F(x + tv) - F(x)}{t} = F'(x; v) = \lim_{t\to 0} G(x + tv)v.$$

The Bouligand property and the equivalence stated above imply that the limit lim_{t→0} G(x + tv)v exists uniformly in |v| = 1. The converse easily follows as well.
We first argue that there exists a measurable selection V : R → R such that V(t) ∈ ∂ψ(t) for a.e. t ∈ R. Since ∂ψ(s) is a nonempty closed set in R for every s ∈ R (see [Cla, p. 70]), it suffices to prove that the multivalued function s ↦ ∂ψ(s) is measurable; i.e., for every compact set C ⊂ R the preimage P_C = {t ∈ R : ∂ψ(t) ∩ C ≠ ∅} is measurable. The measurable selection theorem (see [Cla, p. 111]) then ensures the existence of the desired measurable selection V. To verify measurability of s ↦ ∂ψ(s) let C be a compact set and let {t_k} be a convergent sequence in P_C with limit t*. Choose v_k ∈ ∂ψ(t_k) ∩ C for k = 1, 2, …. By compactness of C there exists a convergent subsequence, denoted by the same symbol, with limit v* ∈ C. Since t_k → t*, upper semicontinuity of ∂ψ at t* (see [Cla, p. 70]) implies the existence of a sequence ṽ_k ∈ ∂ψ(t*) such that lim_{k→∞}(ṽ_k − v_k) = 0. Consequently lim_{k→∞} ṽ_k = v* and by closedness of ∂ψ(t*) we have v* ∈ ∂ψ(t*) ∩ C. Thus P_C is closed and therefore measurable.
Associated to ψ with the properties specified above, we define for 1 ≤ p ≤ q ≤ ∞ the substitution operator F : L^q(Ω) → L^p(Ω) by

$$F(x)(s) = \psi(x(s)) \quad\text{for a.e. } s\in\Omega,$$

where x ∈ L^q(Ω), and with Ω a bounded domain in Rⁿ.
case for every a.e. convergent subsequence of h_k, and since {h_k} was arbitrary, we find that

$$|h^{-1}D(x, h)|_{L^{\hat p}} \to 0 \quad\text{for every } 1 \le \hat p < \infty \ \text{as } |h|_{L^q(\Omega)} \to 0. \tag{8.3.5}$$

By the Hölder inequality we obtain

$$|D(x, h)|_{L^p} \le |h^{-1}D(x, h)|_{L^r}\,|h|_{L^q},$$

where r = qp/(q − p) if q < ∞, and r = p if q = ∞. Using (8.3.5) this implies Newton differentiability of F at x.

To verify that F is semismooth at x ∈ L^q(Ω), we shall show that

$$\frac{|V(x + h)h - \psi'(x; h)|_{L^p}}{|h|_{L^q}} \to 0 \quad\text{as } |h|_{L^q} \to 0. \tag{8.3.6}$$

Let D̄ : R × R → R be given by D̄(s, v) = |V(s + v)v − ψ′(s; v)|. Then by Theorem 8.2(3) and Lipschitz continuity of ψ we have

$$\lim_{v\to 0} v^{-1}\bar D(s, v) = 0 \quad\text{and}\quad \bar D(s, v) \le 2L|v| \ \text{ for all } (s, v)\in\mathbb R^2.$$
The proof of (8.3.6) can now be completed in the same manner as that for Newton dif-
ferentiability, by using Lebesgue’s bounded convergence theorem. Semismoothness of
F : Lq () → Lp () follows from (8.3.5), (8.3.6), and Lemma 8.11 (2). The class of
mappings F of this example was treated in [Ulb].
Example 8.13. Let X be a Hilbert space. Then the norm functional F(x) = |x| is Newton differentiable. In fact, let $G(x + h)h = \bigl(\tfrac{x+h}{|x+h|}, h\bigr)_X$ and G(0)h = (λ, h)_X for some λ ∈ X. Then

$$|h|^{-1}\,|F(x + h) - F(x) - G(x + h)h| = |h|^{-1}\,\Bigl|\frac{(2(x + h), h)_X - |h|^2}{|x| + |x + h|} - \frac{(x + h, h)_X}{|x + h|}\Bigr| \to 0$$

as h → 0. Hence F is Newton differentiable on X. Moreover F is semismooth.
Example 8.14. Let F : L^q(Ω) → L^p(Ω) denote the pointwise max operation F(x) = max(0, x) and define G_m by

$$G_m(x)(s) = \begin{cases} 0 &\text{if } x(s) < 0,\\ \delta &\text{if } x(s) = 0,\\ 1 &\text{if } x(s) > 0, \end{cases}$$

where δ ∈ [0, 1]. It follows from Example 8.12 that F is semismooth from L^q(Ω) into L^p(Ω) provided that 1 ≤ p < q ≤ ∞, with G_m as N-derivative. It can also be argued that G_m is an N-derivative for F for any choice of δ ∈ R (see [HiIK]). If p = q, then F is directionally differentiable at every x ∈ L^q(Ω). In fact, for h ∈ L^q(Ω) define

$$F'(x; h)(s) = \begin{cases} 0 &\text{if } x(s) < 0 \ \text{or } x(s) = 0,\ h(s) \le 0,\\ h(s) &\text{if } x(s) > 0 \ \text{or } x(s) = 0,\ h(s) \ge 0. \end{cases}$$

Then we have

$$\Bigl|\frac{F(x(s) + t\,h(s)) - F(x(s))}{t} - F'(x; h)(s)\Bigr| \le 2|h(s)|$$

and

$$\lim_{t\to 0^+}\Bigl|\frac{F(x(s) + t\,h(s)) - F(x(s))}{t} - F'(x; h)(s)\Bigr| = 0 \quad\text{for a.e. } s\in\Omega.$$

By the Lebesgue dominated convergence theorem

$$\lim_{t\to 0^+}\Bigl|\frac{F(x + t\,h) - F(x)}{t} - F'(x; h)\Bigr|_{L^p} = 0,$$
Thus,

$$\lim_{n\to\infty}\frac{|F(x + h_n) - F(x) - G_m(x + h_n)h_n|_{L^p}}{|h_n|_{L^p}} = \Bigl(\frac{1}{p+1}\Bigr)^{1/p} \ne 0,$$

and hence condition (A) is not satisfied at x for any p ∈ [1, ∞). To consider the case p = ∞ we choose Ω = (0, 1) and show that (A) is not satisfied at x(s) = s. For this purpose define for n = 2, …

$$h_n(s) = \begin{cases} -(1 + \tfrac1n)s &\text{on } (0, \tfrac1n],\\[2pt] (1 + \tfrac1n)s - \tfrac2n(1 + \tfrac1n) &\text{on } (\tfrac1n, \tfrac2n],\\[2pt] 0 &\text{on } (\tfrac2n, 1]. \end{cases}$$
$$= \lim_{n\to\infty}\frac{n^2}{n+1}\,|x|_{L^\infty(E_n)} \ge \lim_{n\to\infty}\frac{n}{n+1} = 1,$$

where

$$\Bigl\|\int_0^1 H(x + \theta h)\,d\theta - H(x + h)\Bigr\| \to 0 \quad\text{as } |h|_X \to 0, \tag{8.3.8}$$
Theorem 8.16. Suppose that x* is a solution to F(x) = 0 and that F is Newton differentiable at x* with N-derivative G. If G(x) is nonsingular for all x ∈ N(x*) and {‖G(x)⁻¹‖ : x ∈ N(x*)} is bounded, then the Newton iteration x^{k+1} = x^k − G(x^k)⁻¹F(x^k) converges superlinearly to x*, provided that |x⁰ − x*| is sufficiently small.
Theorem 8.17. Suppose that F is continuous and directionally differentiable on the closed sphere S = B(x⁰, r). Assume also the existence of bounded operators G(·) ∈ L(X, Z) and constants β, γ ∈ R⁺ such that

$$\|G(x)^{-1}\| \le \beta, \qquad |G(x)(y - x) - F'(x; y - x)| \le \gamma\,|y - x|,$$
Since

$$|x^{k+1} - x^0| \le \sum_{j=0}^{k}|x^{j+1} - x^j| \le \sum_{j=0}^{k} r\alpha^j(1 - \alpha) \le r,$$

we have x^{k+1} ∈ S and by induction x^k ∈ S for all k. For each m > n

$$|x^m - x^n| \le \sum_{j=n}^{m-1}|x^{j+1} - x^j| \le \sum_{j=n}^{m-1} r\alpha^j(1 - \alpha) \le r\alpha^n.$$

$$\le \beta\,|F(y^*) - F(x^*) - F'(x^*; y^* - x^*)| + \beta\,|G(x^*)(y^* - x^*) - F'(x^*; y^* - x^*)| \le \alpha\,|y^* - x^*|.$$

This implies y* = x* and hence x* is the unique solution to F(x) = 0 in S. Finally

$$|x^m - x^k| \le \sum_{j=k}^{m-1}|x^{j+1} - x^j| \le \sum_{j=1}^{m-k}\alpha^j\,|x^k - x^{k-1}| \le \frac{\alpha}{1-\alpha}\,|x^k - x^{k-1}|$$
where A ∈ L(L²(Ω)), with Ω a bounded domain in Rⁿ, c > 0, and a ∈ L^p(Ω), ψ ∈ L^p(Ω) for some p > 2. Recall that the second equation in (8.4.1) is equivalent to

$$x \le \psi, \quad \mu \ge 0, \quad (\mu, x - \psi)_{L^2(\Omega)} = 0, \tag{8.4.2}$$

where the inequalities and the max operation are understood in the pointwise a.e. sense. In Example 7.1 of Chapter 7 it was shown that such problems arise in the context of constrained optimal control problems with A of the form
$$A = \alpha I + B^*(-\Delta)^{-2}B, \tag{8.4.3}$$

where α > 0, B is a bounded operator, and Δ denotes the Laplacian with homogeneous Dirichlet boundary conditions. This suggests and justifies our assumption that

We shall show that the primal-dual active set method discussed in Chapter 7 is equivalent to the semismooth Newton method applied to

$$0 = F(x, \mu) = \begin{cases} Ax - a + \mu,\\[2pt] \mu - \max(0, \mu + c(x - \psi)). \end{cases} \tag{8.4.5}$$
and we recall that the max function is Newton differentiable from L^p(Ω) to L²(Ω) if p > 2, with G as an N-derivative. A Newton step applied to the second equation in (8.4.5) results in
This coincides with the primal-dual active set strategy of Section 7.1. To analyze its local
convergence properties note that under assumption (8.4.4), equation (8.4.5) with c = α is
equivalent to the reduced problem
Setting μk+1 = a − Ax k+1 the iterates of (8.4.9) coincide with those of (8.4.7), provided
that the initialization for the reduced iteration (8.4.9) is chosen such that μ0 = a − Ax 0 .
This follows from the fact that μk + α(x k − ψ) = −Cx k + a − αψ.
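The equivalence can be observed numerically. With C = A − αI (so that −Cx + a − αψ equals μ + α(x − ψ) when μ = a − Ax), a semismooth Newton iteration on the reduced equation (8.4.9) looks as follows; the operator A of the form αI plus a positive multiple of a discrete Laplacian, and the data, are illustrative assumptions.

```python
import numpy as np

# Semismooth Newton for F(x) = A x - a + max(0, -C x + a - alpha*psi),
# with C = A - alpha I, so -C x + a - alpha psi = mu + alpha (x - psi)
# whenever mu = a - A x.  Data below are illustrative assumptions.
n, alpha = 40, 1.0
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = alpha * np.eye(n) + 0.2 * L
C = A - alpha * np.eye(n)
rng = np.random.default_rng(3)
a, psi = rng.normal(size=n), np.zeros(n)

x = np.zeros(n)
for k in range(50):
    w = -C @ x + a - alpha * psi
    F = A @ x - a + np.maximum(0.0, w)
    if np.linalg.norm(F) < 1e-12:
        break
    G = A.copy()                   # N-derivative: A - G_m(w) C
    G[w > 0, :] -= C[w > 0, :]
    x = x - np.linalg.solve(G, F)
mu = a - A @ x
print(k, np.all(x <= psi + 1e-10), np.all(mu >= -1e-10))
```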
For any partition Ω = A ∪ I into measurable sets A and I let R_I : L²(Ω) → L²(I) denote the canonical restriction operator and R_I* : L²(I) → L²(Ω) its adjoint. Further set A_I = R_I A R_I*.

Proposition 8.18. Assume that (8.4.1) with ψ and a in L^p(Ω), p > 2, admits a solution (x*, μ*) ∈ L²(Ω) × L²(Ω) and that (8.4.4) holds. If moreover

$$\{A_{\mathcal I}^{-1} : \Omega = \mathcal I\cup\mathcal A\} \ \text{is uniformly bounded in } L(L^2(\mathcal I)), \tag{8.4.10}$$

then the iterates (x^k, μ^k) defined by (8.4.7) converge superlinearly in L²(Ω) × L²(Ω) to (x*, μ*), provided that |x* − x⁰| is sufficiently small and μ⁰ = a − Ax⁰. Moreover (x*, μ*) ∈ L^p(Ω) × L^p(Ω).
Proof. First note that if (8.4.1) admits a solution (x*, μ*) for some c > 0, then it admits the same solution with c = α, and x* is also a solution of (8.4.8). By (8.4.4) this implies that x* ∈ L^p(Ω), and the first equation in (8.4.1) implies that μ* ∈ L^p(Ω). By Lemma 8.15 and Example 8.14 the mapping F̃x = Ax − a + max(0, −Cx + a − αψ) is Newton differentiable from L²(Ω) into itself and G̃(x) = A − G(−Cx + a − αψ)C is an N-derivative. By (8.4.10) the family of operators {G̃(x)⁻¹ : x ∈ L²(Ω)} is uniformly bounded in L(L²(Ω)), and hence by Theorem 8.16 the iterates x^k converge superlinearly to x*, provided that |x⁰ − x*| is sufficiently small. Superlinear convergence of μ^k to μ* follows from (8.4.5) and (8.4.7).
Note that (8.4.10) is satisfied for operators of the form given in (8.4.3).
The characterization of inequalities by means of max and min operations in complementarity systems such as (8.4.2) is one of several possibilities. Another frequently used complementarity function is the Fischer–Burmeister function $\sqrt{(\psi - x)^2 + \mu^2} - (\psi - x) - \mu$. Numerical experiments appear to indicate that max-based complementarity functions can be more efficient numerically than the Fischer–Burmeister function; see, e.g., [Kan].

As shown in Section 7.5 a class of optimization problems with bilateral constraints ϕ ≤ x ≤ ψ can be expressed as

$$Ax - a + \mu = 0, \qquad \mu = \max(0, \mu + c(x - \psi)) + \min(0, \mu + c(x - \varphi)) \tag{8.4.11}$$

or, equivalently, as

$$Ax - a + \max(0, -Cx + a - \alpha\psi) + \min(0, -Cx + a - \alpha\varphi) = 0, \tag{8.4.12}$$

where c = α and ϕ < ψ with a, ϕ, and ψ in L^p(Ω).

$$x^{k+1} = \psi \ \text{in } \mathcal A^+_k = \{s : \mu^k(s) + c(x^k(s) - \psi(s)) > 0\},$$
$$x^{k+1} = \varphi \ \text{in } \mathcal A^-_k = \{s : \mu^k(s) + c(x^k(s) - \varphi(s)) < 0\}. \tag{8.4.13}$$
In the bilateral case, the role of c influences the likelihood of switches from being active-above to active-below in one iteration. If, for example, s ∈ A⁺_k, then μ^{k+1}(s) + c(x^{k+1}(s) − ϕ(s)) = μ^{k+1}(s) + c(ψ(s) − ϕ(s)), which is less likely to be negative — and hence a switch to s ∈ A⁻_{k+1} is less likely — if c is large.

For c = α, iteration (8.4.13) is equivalent to applying the semismooth Newton method to (8.4.12) with G as N-derivative for the max function and

$$\tilde G(x)(s) = \begin{cases} 0 &\text{if } x(s) \ge 0,\\ 1 &\text{if } x(s) < 0 \end{cases}$$

as N-derivative for the min function. Under the assumptions of Proposition 8.18 local superlinear convergence of the iteration (8.4.13) can be shown.
We shall employ semismooth Newton methods for solving F (x, μ) = 0. Let G(x) be as
defined in Section 8.4.
Applying a primal-dual active set strategy to the second equation in (8.5.2) results in

$$g(x^{k+1}) + \mu^{k+1} = 0,$$
provided that the initialization for (8.5.4) is chosen such that μ0 = −g(x 0 ). Note that (8.5.5)
can be considered as a partial semismooth Newton iteration. Separating the generalized
differentiation of the max operation from that of the linearization of the nonlinearity g was
investigated numerically in [dReKu, GrVo].
where z^k = −g(x^k) + c(x^k − ψ). Taking the inner product in L²(Ω) with x^{k+1} − x*, using (8.5.7) to estimate the left-hand side from below, and Example 8.14 and Lipschitz continuity of x ↦ g(x) + c(x − ψ) from L²(Ω) to L^p(Ω) to bound the right-hand side from above, we find

$$\min(c, \alpha)\,|x^{k+1} - x^*|_{L^2(\Omega)} \le o(|x^k - x^*|_{L^2(\Omega)}).$$

Finally (8.5.2) and (8.5.4) imply that

$$\mu^{k+1} - \mu^* = g(x^*) - g(x^{k+1}),$$

and hence superlinear convergence of μ^k to μ* follows from superlinear convergence of x^k to x*.
Remark 8.5.1. Note that, differently from the linear case considered in Section 8.4, if we apply a full semismooth Newton step to (8.5.2) rather than to the reduced equation (8.5.5), then the resulting algorithm differs in the update of the active/inactive sets. Applying a semismooth Newton step to (8.5.2) results in A_k = {s : μ^k(s) + c(x^k(s) − ψ(s)) > 0} = {s : −g(x^{k−1})(s) − g′(x^{k−1})(x^k − x^{k−1})(s) + c(x^k(s) − ψ(s)) > 0}.

$$\{g'(x)_{\mathcal I}^{-1} \in L(L^2(\mathcal I)) : x \in U,\ \Omega = \mathcal A\cup\mathcal I\} \ \text{is uniformly bounded}.$$

Proof. By Lemma 8.15 and Example 8.14 the mapping x ↦ max(0, −g(x) + c(x − ψ)) is Newton differentiable from L²(Ω) into itself, and G(−g(x) + c(x − ψ))(−g′(x) + cI) is an N-derivative. Moreover g′(x) + G(−g(x) + c(x − ψ))(−g′(x) + cI) is invertible in L(L²(Ω)) with uniformly bounded inverses for x ∈ U. Setting

this follows from the fact that for given f ∈ L²(Ω) the solution to the equation

is given by

$$c\,h_{\mathcal A} = f_{\mathcal A} \quad\text{and}\quad h_{\mathcal I} = g'_{\mathcal I}(x)^{-1}\Bigl(f_{\mathcal I} - \frac1c\,\chi_{\mathcal I}\,g'(x)f_{\mathcal A}\Bigr).$$
$$a(y, v - y) - \langle f, v - y\rangle \ge 0 \quad\text{for all } v\in C. \tag{8.6.1}$$

The existence of solutions to (8.6.1) is established in Theorem 8.26. The solution to (8.6.1) is unique. In fact, if ỹ ∈ C is a solution to (8.6.1), we have

$$a(y, \tilde y - y) - \langle f, \tilde y - y\rangle \ge 0, \qquad a(\tilde y, y - \tilde y) - \langle f, y - \tilde y\rangle \ge 0.$$

$$\min\ \tfrac12\,a(y, y) - \langle f, y\rangle \quad\text{over } y\in C. \tag{8.6.2}$$

Let us define A ∈ L(V, V*) by

$$\langle Ay, v\rangle_{V^*\times V} = a(y, v) \quad\text{for } y, v\in V.$$

$$Ay + \mu = f, \quad y \le \psi, \quad \mu \in C^+, \quad \langle\mu, y - \psi\rangle_{V^*, V} = 0, \tag{8.6.3}$$
Example 8.21. For the obstacle problem in its simplest form we choose V = H¹₀(Ω) and $a(v, w) = \int_\Omega \nabla v\cdot\nabla w\,dx$. In general, the unique solution (y, μ) is in L²(Ω) × H⁻¹(Ω). If ∂Ω and ψ are sufficiently regular, then (y, μ) ∈ (H¹₀(Ω) ∩ H²(Ω)) × L²(Ω) and (8.6.3) is equivalent to (8.6.4).
Example 8.22. Consider the state-constrained optimal control problem with β > 0 and ȳ ∈ L²(Ω)

$$\min_{u\in L^2(\Omega)}\ \tfrac12|y - \bar y|^2_{L^2(\Omega)} + \tfrac{\beta}{2}|u|^2_{L^2(\Omega)} \quad\text{subject to } Ey = u \ \text{and } y\in C, \tag{8.6.5}$$

where E is a closed linear operator in L²(Ω). We assume that E⁻¹ exists and set V = dom(E), where dom(E) is endowed with the graph norm of E. For E = −Δ with homogeneous Dirichlet boundary conditions, we have dom E = H¹₀(Ω) ∩ H²(Ω). The necessary and sufficient optimality condition for (8.6.5) is given by

$$Ey = u, \quad y\in C, \quad \beta u = p, \qquad \langle E^*p + y - \bar y,\ v - y\rangle_{V^*\times V} \ge 0 \ \text{ for all } v\in C, \tag{8.6.6}$$

It can also be written in the form (8.6.3) with μ ∈ V*, but differently from Example 8.21, μ ∉ L²(Ω) in general; see [BeK3].
In the case of mixed constraints, i.e., when pointwise constraints on the controls and
the state are present, it is natural to treat the control constraints with the active set methods
presented in Chapter 7 and to relax the state constraints by the technique explained in this
section.
Thus the obstacle and the state-constrained optimal control problem differ in that for
the former, under weak regularity conditions on the problem data, the Lagrange multiplier
is in L2 () and complementarity can be expressed in the form of (8.6.4), whereas for the
latter this is not the case. Before we can address the applicability of the primal-dual active
set strategy or semismooth Newton methods to such problems we also need to consider the
candidates for the iterates. In the case of the obstacle problem these are given formally by
$$a(y^{k+1}, v) + (\mu^{k+1}, v)_{L^2(\Omega)} = (f, v)_{L^2(\Omega)} \quad\text{for all } v\in H^1_0(\Omega),$$
for which the Lagrange multiplier μ^{k+1} is not in L²(Ω), and hence an update of A_k as in (8.6.7) is not feasible. Alternatively, considering the possibility of applying a semismooth Newton approach, we observe that the reduced form of (8.6.4) is given by

This is a relaxation of the second equation in (8.6.4) with α as a continuation parameter. Note that an update for μ based on (8.6.8) results in μ ∈ L²(Ω). Equation (8.6.8) is equivalent to

$$\mu = \max\Bigl(0,\ \frac{c\alpha}{1-\alpha}\,(y - \psi)\Bigr), \tag{8.6.9}$$

with cα/(1 − α) ranging in (0, ∞) for α ∈ (0, 1). We shall use a generalization of (8.6.8) and introduce an additional shift parameter μ̄ ∈ L²(Ω) into (8.6.9). Moreover we replace cα/(1 − α) by c and arrive at

This is exactly the same as the generalized Yosida–Moreau approximation for inequality constraints discussed in Chapter 4.7 and is related to augmented Lagrangian methods as described in Chapter 4.6. Utilizing this regularization in (8.6.4) results in

$$\begin{cases} Ay + \mu = f,\\[2pt] \mu = \max(0, \bar\mu + c(y - \psi)). \end{cases} \tag{8.6.11}$$
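A small sketch indicates how (8.6.11) is solved in practice: for fixed c the system is a semismooth equation in y alone, Ay + max(0, μ̄ + c(y − ψ)) = f, to which a Newton step with the diagonal N-derivative of the max term can be applied. The 1D Dirichlet Laplacian, the load f, the obstacle ψ, the shift μ̄ = 0, and the value of c are illustrative assumptions.

```python
import numpy as np

# Semismooth Newton for the regularized obstacle problem (8.6.11):
# A y + max(0, mubar + c (y - psi)) = f, with A a 1D Dirichlet Laplacian.
n, c = 99, 1e6
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = 50.0 * np.ones(n)
psi = 0.01 * np.ones(n)
mubar = np.zeros(n)

y = np.zeros(n)
for k in range(50):
    w = mubar + c * (y - psi)
    r = A @ y + np.maximum(0.0, w) - f
    if np.linalg.norm(r) < 1e-10:
        break
    J = A + c * np.diag((w > 0).astype(float))   # N-derivative of the max term
    y = y - np.linalg.solve(J, r)
mu = np.maximum(0.0, mubar + c * (y - psi))
print(k, float((y - psi).max()), float(mu.max()))  # violation ~ mu_max / c
```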
a(y, v) + (μ̄ + c(y − ψ), χ_{A_k} v)_{L2(Ω)} = (f, v)_{L2(Ω)} for all v ∈ V.

(iv) Set

μ^{k+1} = 0 on I_k,  μ^{k+1} = μ̄ + c(y^{k+1} − ψ) on A_k.
The iterates μ^k as assigned in step (iv) are not necessary for the algorithm, but they will be useful in the convergence analysis. The practical relevance of μ̄ is given by the fact that for certain problems a proper choice can guarantee feasibility of the iterates, i.e., y^k ≤ ψ, as will be shown next. Assume that

μ̄ ≥ (f − Aψ)⁺.

Testing the iteration equation with φ = (y − ψ)⁺ ≥ 0 gives

a(y − ψ, φ) = ⟨f, φ⟩ − a(ψ, φ) − (μ̄, φ) ≤ 0,

which, under the assumption of Theorem 8.24 below, implies (y − ψ)⁺ = 0 and hence y ≤ ψ.
Theorem 8.24. If a(φ, φ⁺) ≤ 0, for φ ∈ V, implies that φ⁺ = 0, then the iterates of the semismooth Newton algorithm with regularization satisfy y^{k+1} ≤ y^k for all k ≥ 1.

We have

a(y^k, φ) − ⟨f, φ⟩ + (μ̄ + c(y^k − ψ), φ χ_{A_k})
= (μ̄ + c(y^k − ψ), φ χ_{A_k ∩ I_{k−1}}) − (μ̄ + c(y^k − ψ), φ χ_{I_k ∩ A_{k−1}}) ≥ 0.
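To make the iteration above concrete, the following is a minimal sketch of the regularized primal-dual active set method applied to a one-dimensional finite-difference discretization of the obstacle problem of Example 8.21, here with the constraint y ≤ ψ. The grid, the data f, the obstacle ψ, and the parameters c and μ̄ below are illustrative assumptions, not values taken from the text.

import numpy as np

n = 200                               # interior grid points (assumption)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2   # discrete -Laplacian on H_0^1
f = 50.0 * np.ones(n)                 # right-hand side (assumption)
psi = 0.05 + 0.3 * np.abs(x - 0.5)    # obstacle, constraint y <= psi
c = 1e6                               # regularization parameter in (8.6.10)
mu_bar = np.zeros(n)                  # shift parameter mu_bar (assumption)

y = np.linalg.solve(A, f)             # unconstrained initial guess
mu = np.zeros(n)
active_old = None
for k in range(50):
    active = (mu_bar + c * (y - psi)) > 0          # active set A_k
    if active_old is not None and np.array_equal(active, active_old):
        break                                      # active set settled: stop
    # step (iii): a(y, v) + (mu_bar + c(y - psi), chi_A v) = (f, v)
    chi = np.diag(active.astype(float))
    y = np.linalg.solve(A + c * chi, f - chi @ (mu_bar - c * psi))
    # step (iv): mu = 0 on I_k, mu_bar + c(y - psi) on A_k
    mu = np.where(active, mu_bar + c * (y - psi), 0.0)
    active_old = active

print("max(y - psi) =", float(np.max(y - psi)))    # feasibility up to O(1/c)

With μ̄ = 0 this is the plain Yosida–Moreau regularization; choosing μ̄ ≥ (f − Aψ)⁺ as above makes the iterates feasible in this sketch.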
in Examples 8.21 and 8.22 and in the discussion of the iterative step of the semismooth Newton algorithm, does not exist. If the finite-dimensional problems arise from discretization of continuous ones, then the regularity of the Lagrange multipliers, however, certainly influences the convergence and approximation properties. It will typically lead to mesh-size-dependent behavior of the semismooth Newton algorithm.

We turn to the convergence of the semismooth Newton algorithm with regularization. Recall that A⁻¹ ∈ L(V∗, V). Below we shall also denote the restriction of A⁻¹ to L2(Ω) by the same symbol.
Theorem 8.25. Assume that μ̄ − cψ ∈ Lp(Ω) and A⁻¹ ∈ L(L2(Ω), Lp(Ω)) for some p > 2. If μ⁰ ∈ L2(Ω) and |μ⁰ − μ_c|_{L2(Ω)} is sufficiently small, then (y^k, μ^k) → (y_c, μ_c) superlinearly in V × L2(Ω).

By Example 8.14 and Lemma 8.15 the mapping F is Newton differentiable, with N-derivative given in terms of the operator G defined in (8.4.6). The proof will be completed by showing that G̃(μ) has uniformly bounded inverses in L(L2(Ω)) for μ ∈ L2(Ω). We define the active set A and the inactive set I. Further, let R_A : L2(Ω) → L2(A) and R_I : L2(Ω) → L2(I) denote the restriction operators to A and I. Their adjoints R_A∗ : L2(A) → L2(Ω) and R_I∗ : L2(I) → L2(Ω) are the extension-by-zero operators from A and I to Ω, respectively. The mapping (R_A, R_I) : L2(Ω) → L2(A) × L2(I) determines an isometric isomorphism, and every μ ∈ L2(Ω) can uniquely be expressed as (R_A μ, R_I μ). The operator G̃(μ) can equivalently be expressed as

G̃(μ) = ( I_A  0 ; 0  I_I ) + c ( R_A A⁻¹ R_A∗  R_A A⁻¹ R_I∗ ; 0  0 ),

where I_A and I_I denote the identity operators on L2(A) and L2(I). Let (g_A, g_I) ∈ L2(A) × L2(I) be arbitrary and consider the equation
(δμ)_A + c R_A A⁻¹ R_A∗ (δμ)_A = g_A − c R_A A⁻¹ R_I∗ g_I. (8.6.14)

The Lax–Milgram theorem and nonnegativity of A⁻¹ imply the existence of a unique solution (δμ)_A to (8.6.14), and consequently (8.6.13) has a unique solution for every (g_A, g_I) and every μ. Moreover, these solutions are uniformly bounded with respect to μ ∈ L2 since (δμ)_I = g_I and

|(δμ)_A|_{L2(A)} ≤ |g_A|_{L2(Ω)} + c ‖A⁻¹‖_{L(L2(Ω))} |g_I|_{L2(I)}.
Theorem 8.26. The solution (y_c, μ_c) ∈ V × X of the regularized problem (8.6.11) converges to the solution (y∗, μ∗) ∈ V × V∗ of (8.6.3) in the sense that y_c → y∗ strongly in V and μ_c → μ∗ strongly in V∗ as c → ∞.

Since μ_c ≥ 0, for y ∈ C

(μ_c, y_c − y) = (μ_c, y_c − ψ − (y − ψ)) ≥ (c/2)|(y_c − ψ)⁺|²_X − (1/2c)|μ̄|²_X.

Letting v = y_c − y in (8.6.15),

a(y_c, y_c − y) + (c/2)|(y_c − ψ)⁺|²_X ≤ ⟨f, y_c − y⟩ + (1/2c)|μ̄|²_X. (8.6.16)

Since a is coercive, this implies that

ν|y_c|²_V + (c/2)|(y_c − ψ)⁺|²_X is bounded uniformly in c.

Thus there exist a subsequence of y_c, denoted by the same symbol, and y∗ ∈ V such that y_c → y∗ weakly in V. For all φ ≥ 0
which implies y∗ − ψ ≤ 0 and thus y∗ ∈ C. Since φ → √a(φ, φ) defines an equivalent norm on V and norms are weakly lower semicontinuous, letting c → ∞ in (8.6.16) we obtain

a(y∗, y∗ − y) ≤ ⟨f, y∗ − y⟩ for all y ∈ C.
Chapter 9

Semismooth Newton Methods II: Applications
In the previous chapter semismooth Newton methods in function spaces were investigated. It was demonstrated that in certain cases the semismooth Newton method is equivalent to the primal-dual active set method. The application to nonlinear complementarity problems was discussed, and the necessity of introducing regularization in cases where the Lagrange multiplier associated to the inequality condition has low regularity was demonstrated. In this chapter applications of semismooth Newton methods to nondifferentiable variational problems in function spaces will be treated. They concern image restoration problems regularized by bounded variation functionals in Section 9.1 and frictional contact problems in elasticity in Section 9.2.
We shall make use of the Fenchel duality theorem, which we recall for further reference; see, e.g., Section 4.3 and [BaPe, EkTe] for details. Let V and Y be Banach spaces with topological duals V∗ and Y∗, respectively. Further, let Λ ∈ L(V, Y) and let F : V → R ∪ {∞}, G : Y → R ∪ {∞} be convex, proper, and l.s.c. functionals such that there exists v₀ ∈ V with F(v₀) < ∞, G(Λv₀) < ∞ and G is continuous at Λv₀. Then

inf_{v∈V} {F(v) + G(Λv)} = sup_{q∈Y∗} {−F∗(−Λ∗q) − G∗(q)}, (9.0.1)

where Λ∗ ∈ L(Y∗, V∗) is the adjoint of Λ. See, e.g., Section 4.3, Theorem 4.30 and Example 4.33 with ϕ(v, w) = F(v) + G(Λv + w). The convex conjugates F∗ : V∗ → R ∪ {∞} and G∗ : Y∗ → R ∪ {∞} of F and G, respectively, are defined by

F∗(v∗) = sup_{v∈V} {⟨v∗, v⟩_{V∗,V} − F(v)},

and analogously for G∗. The conditions imposed on F and G guarantee that the dual problem, i.e., the problem on the right-hand side of (9.0.1), admits a solution. Furthermore, v̄ ∈ V and q̄ ∈ Y∗ are solutions to the two optimization problems in (9.0.1) if and only if the extremality conditions

−Λ∗q̄ ∈ ∂F(v̄),
q̄ ∈ ∂G(Λv̄) (9.0.2)

hold, where ∂F denotes the subdifferential of F.
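As a small numerical sanity check on (9.0.1), the following one-dimensional computation compares the primal and dual values for V = Y = R, Λ = identity, F(v) = β|v|, and G(y) = ½(y − b)²; the data β and b are illustrative assumptions.

import numpy as np

beta, b = 1.0, 2.3

# Primal: inf_v beta*|v| + 0.5*(v - b)^2, solved by soft thresholding of b
v_bar = np.sign(b) * max(abs(b) - beta, 0.0)
primal = beta * abs(v_bar) + 0.5 * (v_bar - b) ** 2

# Conjugates: F*(q) = indicator of [-beta, beta], G*(q) = 0.5*q^2 + b*q.
# Dual: sup_{|q| <= beta} -G*(q); the maximizer is -b clipped to the box.
q_bar = np.clip(-b, -beta, beta)
dual = -(0.5 * q_bar ** 2 + b * q_bar)

print(primal, dual)                    # the two values coincide
# Extremality (9.0.2): q_bar in dG(v_bar) reads q_bar = v_bar - b here
assert abs(q_bar - (v_bar - b)) < 1e-12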
is finite. Here | · |∞ denotes the supremum norm on R². It is well known that BV(Ω) ⊂ L2(Ω) for Ω ⊂ R² (see [Giu]) and that u → |u|_{L2} + ∫_Ω |Du| defines a norm on BV(Ω).

If K = identity and α = 0, then (9.1.1) is the well-known image restoration problem with BV-regularization. It consists of recovering the true image u from the noisy image f. BV-regularization is known to be preferable to regularization by ∫_Ω |∇u|² dx, for example, due to its ability to preserve edges in the original image during the reconstruction process. Since the pioneering work in [ROF], the literature on (9.1.1) has grown tremendously. We give some selected references [AcVo, CKP, ChLi, ChK, DoSa, GeYa, IK12] and refer the reader to the monograph [Vog] for additional ones.

Despite its favorable properties for the reconstruction of images, especially images with blocky structure, problem (9.1.1) poses some severe difficulties. On the analytical level these are related to the fact that (9.1.1) is posed in a nonreflexive Banach space, the dual of which is difficult to characterize [Giu, IK18]; on the numerical level the optimality system related to (9.1.1) consists of a nonlinear partial differential equation, which is not directly amenable to numerical implementations.

Following [HinK1] we shall show that the predual of (9.1.1) is a bilaterally constrained optimization problem in a Hilbert space, for which the primal-dual active set strategy can advantageously be applied and which can be analyzed as a semismooth Newton method. We require some facts from vector-valued function spaces, which we summarize next. Let IL2(Ω) = L2(Ω) × L2(Ω) be endowed with the Hilbert space inner product structure and norm. If the context suggests to do so, then we shall distinguish between vector fields v ∈ IL2(Ω) and scalar functions v ∈ L2(Ω) by using an arrow on top of the letter. Analogously we set IH10(Ω) = H01(Ω) × H01(Ω). We set L20(Ω) = {v ∈ L2(Ω) : ∫_Ω v dx = 0} and H0(div) = {v ∈ IL2(Ω) : div v ∈ L2(Ω), v · n = 0 on ∂Ω}, where n is the outer normal to ∂Ω. The space H0(div) is endowed with the norm |v|²_{H0(div)} = |v|²_{IL2(Ω)} + |div v|²_{L2}. Further, we put H0(div 0) = {v ∈ H0(div) : div v = 0 a.e. in Ω}. It is well known that
with

H0(div 0)⊥ = {v ∈ grad H1(Ω) : div v ∈ L2(Ω), v · n = 0 on ∂Ω},

and div : H0(div 0)⊥ ⊂ H0(div) → L20(Ω) is a homeomorphism. In fact, it is injective by construction, and for every f ∈ L20(Ω) there exists, by the Lax–Milgram lemma, ϕ ∈ H1(Ω) such that

div ∇ϕ = f in Ω,  ∇ϕ · n = 0 on ∂Ω,

with ∇ϕ ∈ H0(div 0)⊥. Hence, by the closed mapping theorem we have

div ∈ L(H0(div 0)⊥, L20(Ω)).

Finally, let P_div and P_div⊥ denote the orthogonal projections in IL2(Ω) onto H0(div 0) and grad H1(Ω), respectively. Note that the restrictions of P_div and P_div⊥ to H0(div) coincide with the orthogonal projections in H0(div) onto H0(div 0) and H0(div 0)⊥.

Let 1 denote the two-dimensional vector field with 1 in both coordinates, set B = αI + K∗K, and consider

min ½ |div p + K∗f|²_B over p ∈ H0(div), such that −β1 ≤ p ≤ β1, (9.1.4)

where for v ∈ L2(Ω) we put |v|²_B = (v, B⁻¹v)_{L2}. It is straightforward to argue that (9.1.4) admits a solution.
Theorem 9.1. The Fenchel dual to (9.1.4) is given by (9.1.1), and the solutions u∗ of (9.1.1) and p∗ of (9.1.4) are related by

Bu∗ = div p∗ + K∗f, (9.1.5)

⟨(−div)∗u∗, p − p∗⟩_{H0(div)∗,H0(div)} ≤ 0 for all p ∈ H0(div) with −β1 ≤ p ≤ β1. (9.1.6)
Alternatively (9.1.4) can be considered as the predual of the original problem (9.1.1).
Proof. We apply Fenchel duality as recalled at the beginning of the chapter with V = H0(div), Y = Y∗ = L2(Ω), Λ = −div, G : Y → R given by G(v) = ½|v − K∗f|²_B, and F : V → R defined by F(p) = I_{[−β1, β1]}(p), where

I_{[−β1, β1]}(p) = 0 if −β1 ≤ p(x) ≤ β1 for a.e. x ∈ Ω, and ∞ otherwise.

The convex conjugate G∗ : L2(Ω) → R of G is given by

G∗(v) = ½|Kv + f|² + (α/2)|v|² − ½|f|².

Further, the conjugate F∗ : H0(div)∗ → R of F is given by

F∗(q) = sup_{p∈S1} ⟨q, p⟩_{H0(div)∗,H0(div)} for q ∈ H0(div)∗, (9.1.7)

where S1 = {p ∈ H0(div) : −β1 ≤ p ≤ β1}.
The set S2 is dense in S1 in the topology of H0(div). In fact, let p be an arbitrary element of S1. Since (D(Ω̄))² is dense in H0(div) (see, e.g., [GiRa, p. 26]), there exists a sequence p_n ∈ (D(Ω̄))² converging in H0(div) to p. Let P denote the canonical projection in H0(div) onto the closed convex subset S1 and note that, since p ∈ S1,

|p − P p_n|_{H0(div)} ≤ |p − p_n|_{H0(div)} + |p_n − P p_n|_{H0(div)} ≤ 2|p − p_n|_{H0(div)} → 0 for n → ∞.

Hence lim_{n→∞} |p − P p_n|_{H0(div)} = 0 and S2 is dense in S1. Returning to (9.1.7) we have for v ∈ L2(Ω) and (−div)∗ ∈ L(L2(Ω), V∗) an expression which can be +∞. By the definition of the functions of bounded variation it is finite if and only if v ∈ BV(Ω) (see [Giu, p. 3]) and

F∗((−div)∗v) = β ∫_Ω |Dv| < ∞ for v ∈ BV(Ω).

The extremality conditions give

⟨(−div)∗u∗, p − p∗⟩_{H0(div)∗,H0(div)} ≤ 0 for all p ∈ S1

and

Bu∗ = div p∗ + K∗f.

Corollary 9.2. Let p∗ ∈ H0(div) be a solution to (9.1.4). Then there exists λ∗ ∈ H0(div)∗ such that (9.1.8) and (9.1.9) hold for all p ∈ H0(div) with −β1 ≤ p ≤ β1.

Proof. Apply div∗B⁻¹ to (9.1.5) and set λ∗ = −div∗u∗ ∈ H0(div)∗ to obtain (9.1.8). For this choice of λ∗, (9.1.9) follows from (9.1.6).
Theorem 9.3. The family {(p_c, λ_c)}_{c>0} converges weakly in H0(div) × IH10(Ω)∗ to the unique solution (p∗, λ∗) ∈ H0(div) × H0(div)∗ of the optimality system associated to (9.1.10), given by (9.1.13)–(9.1.14).

Proof. The proof is related to that of Theorem 8.26. The variational form of (9.1.13) is given by

(div p∗, div v)_B + (K∗f, div v)_B + γ(p∗, v) + ⟨λ∗, v⟩_{H0(div)∗,H0(div)} = 0 (9.1.15)

for all v ∈ H0(div). To verify uniqueness, let us suppose that (p_i, λ_i) ∈ H0(div) × H0(div)∗, i = 1, 2, are two solution pairs to (9.1.13), (9.1.14). For δp = p₂ − p₁, δλ = λ₂ − λ₁ we have

(B⁻¹ div δp, div v) + γ(δp, v) + ⟨δλ, v⟩_{H0(div)∗,H0(div)} = 0 (9.1.16)
Combining (9.1.18) and (9.1.19) we can assert the existence of (p∗, λ∗) ∈ H0(div) × IH10(Ω)∗ such that for a subsequence, denoted by the same symbol,

(p_c, λ_c) ⇀ (p∗, λ∗) weakly in H0(div) × IH10(Ω)∗. (9.1.20)
(1/c)(∇p_c, ∇v) + (div p_c, div v)_B + (K∗f, div v)_B + γ(p_c, v) + (λ_c, v) = 0

for all v ∈ IH10(Ω). Passing to the limit c → ∞ and using (9.1.18) and (9.1.20), we obtain (9.1.21). Since IH10(Ω) is dense in H0(div) and p∗ ∈ H0(div), we have that (9.1.21) holds for all v ∈ H0(div). Consequently λ∗ can be identified with an element of H0(div)∗ and ⟨·, ·⟩_{IH10(Ω)∗,IH10(Ω)} in (9.1.21) can be replaced by ⟨·, ·⟩_{H0(div)∗,H0(div)}. We next verify that p∗ is feasible. For this purpose note that

(λ_c, p − p_c) = (max(0, c(p_c − β1)) + min(0, c(p_c + β1)), p − p_c) ≤ 0 (9.1.22)

for all −β1 ≤ p ≤ β1. From (9.1.11) we have

(1/c)|∇p_c|² + |div p_c + K∗f|²_B + γ|p_c|² + (1/c)|λ_c|² ≤ |K∗f|²_B. (9.1.23)

Consequently, (1/c)|λ_c|² ≤ |K∗f|²_B for all c > 0. Note that, as a consequence,

|max(0, p_c − β1)|²_{IL2(Ω)} → 0 and |min(0, p_c + β1)|²_{IL2(Ω)} → 0 as c → ∞. (9.1.24)

Recall that p_c ⇀ p∗ weakly in IL2(Ω). Weak lower semicontinuity of the convex functional p → |max(0, p − β1)|²_{IL2(Ω)} and (9.1.24) imply that

∫_Ω |max(0, p∗ − β1)|² dx ≤ lim inf_{c→∞} ∫_Ω |max(0, p_c − β1)|² dx = 0.

Together with the analogous estimate for the lower bound, p∗ is feasible, and (9.1.22) yields

⟨λ_c, p∗ − p_c⟩_{H0(div)∗,H0(div)} ≤ 0 for all c > 0. (9.1.25)
−(1/c)Δp_c + λ_c ⇀ μ∗ weakly in H0(div)∗,

and consequently also in IH10(Ω)∗. Moreover, {(1/√c)|∇p_c|}_{c≥1} is bounded, and hence

−(1/c)Δp_c ⇀ 0 weakly in IH10(Ω)∗

as c → ∞. Since λ_c ⇀ λ∗ weakly in IH10(Ω)∗ it follows that

⟨λ∗ − μ∗, v⟩_{IH10(Ω)∗,IH10(Ω)} = 0 for all v ∈ IH10(Ω).

Remark 9.1.1. For γ = 0 problems (9.1.10) and (9.1.11) admit a solution, which, however, is not unique. From the proof of Theorem 9.3 it follows that {(p_c, λ_c)}_{c>0} contains a weak accumulation point in H0(div) × IH10(Ω)∗ and every weak accumulation point is a solution of (9.1.10).
Algorithm.

(1) Choose p₀ ∈ IH10(Ω) and set k = 0.

(2) Set, for i = 1, 2,

A+,i_{k+1} = {x : (p^i_k − β)(x) > 0},
A−,i_{k+1} = {x : (p^i_k + β)(x) < 0},
I^i_{k+1} = Ω \ (A+,i_{k+1} ∪ A−,i_{k+1}).

The superscript i, i = 1, 2, refers to the respective component. We note that (9.1.27) admits a solution p_{k+1} ∈ IH10(Ω). Step (4) is included for the sake of the analysis of the algorithm. Let C : IH10(Ω) → H−1(Ω) × H−1(Ω) stand for the operator

C = −(1/c)Δ − ∇B⁻¹ div + γ id.

It is a homeomorphism for every c > 0 and allows us to express (9.1.12) as

Cp − ∇B⁻¹K∗f + c max(0, p − β1) + c min(0, p + β1) = 0, (9.1.28)
where we drop the index in the notation for p_c. For ϕ ∈ L2(Ω) we define

Dmax(0, ϕ)(x) = 1 if ϕ(x) > 0, and 0 if ϕ(x) ≤ 0, (9.1.29)

and

Dmin(0, ϕ)(x) = 1 if ϕ(x) < 0, and 0 if ϕ(x) ≥ 0. (9.1.30)
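Written out for arrays, the Newton derivatives (9.1.29)–(9.1.30) are simply indicator functions; the following is a direct transcription.

import numpy as np

def D_max0(phi):            # (9.1.29): Newton derivative of max(0, .)
    return (phi > 0).astype(float)

def D_min0(phi):            # (9.1.30): Newton derivative of min(0, .)
    return (phi < 0).astype(float)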
Using (9.1.29), (9.1.30) as Newton derivatives for the max and min operations in (9.1.28), the semismooth Newton step can be expressed as

Cp_{k+1} + c(p_{k+1} − β1)χ_{A+_{k+1}} + c(p_{k+1} + β1)χ_{A−_{k+1}} − ∇B⁻¹K∗f = 0. (9.1.31)

The iteration of the algorithm can also be expressed with respect to the variable λ rather than p. For this purpose we define

F(λ) = λ − c max(0, C⁻¹(∇f̂ − λ) − β1) − c min(0, C⁻¹(∇f̂ − λ) + β1), (9.1.33)

which coincides with (9.1.32). Therefore the semismooth Newton iterations for the algorithm and for F(λ) = 0 coincide, provided that the initializations are related by Cp₀ − ∇f̂ + λ₀ = 0. The mapping F is Newton differentiable, i.e., for every λ ∈ IL2(Ω)

|F(λ + h) − F(λ) − DF(λ + h)h|_{IL2(Ω)} = o(|h|_{IL2(Ω)}) (9.1.34)

for |h|_{IL2(Ω)} → 0; see Example 8.14. Here D denotes the Newton derivative of F defined by means of (9.1.29) and (9.1.30). For (9.1.34) to hold, the smoothing property of C⁻¹ in the sense of an embedding from IL2(Ω) into ILp(Ω) for some p > 2 is essential. The following result now follows from Theorem 8.16.

Theorem 9.4. If |λ_c − λ₀|_{IL2(Ω)} is sufficiently small, then the iterates {(p_k, λ_k)}^∞_{k=1} of the algorithm converge superlinearly in IH10(Ω) × IL2(Ω) to the solution (p_c, λ_c) of (9.1.11).
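The following is a minimal sketch of the corresponding primal-dual active set iteration for a one-dimensional discrete analogue of the predual problem (9.1.4) with K = identity; the diffusive (1/c)-term is omitted and only a small γ > 0 is kept, so the sketch illustrates the active set mechanics rather than the function-space algorithm. The discrete difference operator D plays the role of −div∗, and all data are assumptions.

import numpy as np

n = 100
t = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(0)
f = (t > 0.5).astype(float) + 0.05 * rng.standard_normal(n)  # noisy step
D = np.diff(np.eye(n), axis=0)          # forward differences, shape (n-1, n)
beta, gamma, c = 0.1, 1e-8, 1.0
H = D @ D.T + gamma * np.eye(n - 1)     # Hessian of the discrete dual problem
g = D @ f

p, lam = np.zeros(n - 1), np.zeros(n - 1)
Ap_old = Am_old = None
for k in range(100):
    Ap = lam + c * (p - beta) > 0       # upper active set
    Am = lam + c * (p + beta) < 0       # lower active set
    if Ap_old is not None and np.array_equal(Ap, Ap_old) \
            and np.array_equal(Am, Am_old):
        break                           # active sets settled: stop
    I = ~(Ap | Am)
    p = np.where(Ap, beta, np.where(Am, -beta, 0.0))
    if I.any():                         # solve on the inactive set
        p[I] = np.linalg.solve(H[np.ix_(I, I)],
                               g[I] - H[np.ix_(I, ~I)] @ p[~I])
    lam = g - H @ p                     # multiplier on the active sets
    lam[I] = 0.0
    Ap_old, Am_old = Ap, Am

u = f - D.T @ p    # denoised signal via the discrete analogue of (9.1.5), B = I

In exact arithmetic the loop terminates as soon as the active sets repeat, which for this box-constrained quadratic problem happens after finitely many steps.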
Let us compare the algorithm of this section to the general framework of augmented Lagrangians presented in Section 4.6 for nonsmooth problems. We again introduce in (9.1.10) a diffusive regularization and realize the inequality constraints by a generalized Yosida–Moreau approximation. This suggests considering the Lagrangian L_c : IH10(Ω) × IL2(Ω) → R defined by

L_c(p, λ) = (1/2c̄)|∇p|² + ½|div p + K∗f|²_B + (γ/2)|p|² + φ_c(p, λ), (9.1.35)

where φ_c is the generalized Yosida–Moreau approximation of the indicator function φ of the set {p ∈ IL2(Ω) : −β1 ≤ p ≤ β1}, and c > 0, c̄ > 0. Here we choose c differently from c̄, since in the limit of the augmented Lagrangian iteration the constraint −β1 ≤ p ≤ β1 is satisfied for any fixed c > 0. We have

φ_c(p, λ) = inf_{q∈IL2(Ω)} { φ(p − q) + (λ, q)_{IL2(Ω)} + (c/2)|q|²_{IL2(Ω)} }
= (1/2c)|max(0, λ + c(p − β1))|²_{IL2(Ω)} + (1/2c)|min(0, λ + c(p + β1))|²_{IL2(Ω)} − (1/2c)|λ|²_{IL2(Ω)}.

The auxiliary problems in step 2 of the augmented Lagrangian method of Section 4.6 with L_c given in (9.1.35) coincide with (9.1.11) except for the shift by λ_k in the max/min operations. Each of these auxiliary problems can efficiently be solved by the semismooth Newton algorithm presented in this section if p_k ∓ β1 is replaced by λ_k + c(p_k ∓ β1). Conversely, one can think of introducing augmented Lagrangian steps into the algorithm of this section, with the goal of avoiding possible ill-conditioning as c → ∞. For numerical experience with the algorithmic concepts of this section we refer the reader to [HinK1].
into three disjoint parts, namely, the Dirichlet part Γ_d, the part Γ_n with prescribed surface load h ∈ L2(Γ_n) := (L2(Γ_n))², and the part Γ_c, where contact and friction with a rigid foundation may occur. For simplicity we assume that Γ̄_c ∩ Γ̄_d = ∅ to avoid working with the space H^{1/2}_{00}(Γ_c). We are interested in the deformation y = (y₁, y₂)ᵀ of the elastic body, which is also subject to a given body force f ∈ L2(Ω) := (L2(Ω))². The gap between the elastic body and the rigid foundation is d := τ_N d ≥ 0, where d ∈ H1(Ω) := (H1(Ω))² and τ_N y denotes the normal component of the trace along Γ_c. As usual in linear elasticity, the linearized strain tensor is

ε(y) = ½(∇y + (∇y)ᵀ).

Using Hooke's law for the stress-strain relation, the linearized stress tensor σ(y) = Cε(y) is obtained, where λ and μ are the Lamé parameters. These parameters are given by

λ = Eν/((1 + ν)(1 − 2ν)) and μ = E/(2(1 + ν)),

with Young's modulus E > 0 and the Poisson ratio ν ∈ (0, 0.5). Above, C denotes the fourth order isotropic material tensor for linear elasticity.
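In code, the conversion from (E, ν) to the Lamé parameters reads as follows; the numerical values used in the call are those of Example 9.11 below.

def lame_parameters(E: float, nu: float) -> tuple:
    """Return (lambda, mu) for linear isotropic elasticity."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam, mu

lam, mu = lame_parameters(E=10000.0, nu=0.45)   # values of Example 9.11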
The Signorini problem with Coulomb friction is then given as follows:
Y := {v ∈ H1(Ω) : τv = 0 a.e. on Γ_d},
K := {v ∈ Y : τ_N v ≤ 0 a.e. on Γ_c}.

We define the symmetric bilinear form a(·, ·) on Y × Y and the linear form L(·) on Y by

a(y, z) := ∫_Ω (σy) : (εz) dx,  L(y) := ∫_Ω f·y dx + ∫_{Γ_n} h·τy dx.

Here, for λ ∈ H^{−1/2}(Γ_c), the condition λ ≥ 0 means that ⟨λ, h⟩_{Γ_c} ≥ 0 for all h ∈ H^{1/2}(Γ_c) with h ≥ 0. Since the friction coefficient F is assumed to be uniformly Lipschitz continuous, it is a factor on H^{1/2}(Γ_c), i.e., the mapping

λ ∈ H^{1/2}(Γ_c) → Fλ ∈ H^{1/2}(Γ_c)

is well defined and bounded; see [Gri, p. 21]. By duality it follows that F is a factor on H^{−1/2}(Γ_c) as well. Consequently, the nondifferentiable functional

j(y) := ∫_{Γ_c} Fg|τ_T y| dx

is well defined on Y. After these preparations we can state the contact problem with given friction as

min_{y∈d+K} J(y) := ½ a(y, y) − L(y) + j(y). (P)

Due to the Korn inequality, the functional J(·) is uniformly convex; further, it is l.s.c. This implies that (P) and equivalently (9.2.2) admit a unique solution y∗ ∈ d + K.

To derive the dual problem corresponding to (P), we apply the Fenchel calculus to the mappings F : Y → R and G : V × H^{1/2}(Γ_c) → R given by

F(y) := −L(y) if y ∈ d + K, and ∞ else;  G(q, ν) := ½ ∫_Ω q : Cq dx + ∫_{Γ_c} Fg|ν| dx,
where

V = {p ∈ (L2(Ω))^{2×2} : p₁₂ = p₂₁}.

Furthermore, Λ ∈ L(Y, V × H^{1/2}(Γ_c)) is given by

Λy := (Λ₁y, Λ₂y) = (εy, τ_T y),

which allows us to express (P) as

min_{y∈Y} {F(y) + G(Λy)}.

Endowing V × H^{1/2}(Γ_c) with the usual product norm, F and G satisfy the conditions for the Fenchel duality theorem, which was recalled in the preamble of this chapter. For the convex conjugate one derives that F∗(−Λ∗(p, μ)) equals +∞ unless

−Div p = f,  p · n = h in L2(Γ_n),  and  p_T + μ = 0 in H^{−1/2}(Γ_c), (9.2.3)

where p_T = (nᵀp) · t ∈ H^{−1/2}(Γ_c). Further, one obtains that

F∗(−Λ∗(p, μ)) = −⟨p_N, d⟩_{Γ_c} if (9.2.3) holds and p_N ≤ 0 in H^{−1/2}(Γ_c), and +∞ otherwise.

Lemma 9.5. The solution y∗ ∈ d + K of (P) and the solution (p∗, μ∗) of (P∗) are related by σy∗ = p∗ and by the existence of λ∗ ∈ H^{−1/2}(Γ_c) such that

a(y∗, z) − L(z) + ⟨μ∗, τ_T z⟩_{Γ_c} + ⟨λ∗, τ_N z⟩_{Γ_c} = 0 for all z ∈ Y, (9.2.4a)
⟨λ∗, τ_N z⟩_{Γ_c} ≤ 0 for all z ∈ K, (9.2.4b)
⟨λ∗, τ_N y∗ − d⟩_{Γ_c} = 0, (9.2.4c)
⟨Fg, |ν|⟩_{Γ_c} − ⟨μ∗, ν⟩_{Γ_c} ≥ 0 for all ν ∈ H^{1/2}(Γ_c), (9.2.4d)
⟨Fg, |τ_T y∗|⟩_{Γ_c} − ⟨μ∗, τ_T y∗⟩_{Γ_c} = 0. (9.2.4e)
The dual variable μ∗ corresponding to the nondifferentiability of the primal functional J(·) has the mechanical interpretation μ∗ = −σ_T y∗. Using Green's theorem in (9.2.4a) one finds

λ∗ = −σ_N y∗, (9.2.6)

i.e., λ∗ is the negative stress in normal direction.

We now briefly comment on the case that the given friction g is more regular, namely, g ∈ L2(Γ_c). In this case we can define G on the larger set V × L2(Γ_c). One can verify that the assumptions for the Fenchel duality theorem hold, and thus obtain higher regularity for the dual variable μ corresponding to the nondifferentiability of the cost functional in (P∗), in particular μ ∈ L2(Γ_c). This implies that the dual problem can be written as follows:

sup −½ ∫_Ω C⁻¹p : p dx + ⟨p_N, d⟩_{Γ_c}
over (p, μ) ∈ V × L2(Γ_c), subject to (9.2.3), p_N ≤ 0 in H^{−1/2}(Γ_c), and |μ| ≤ Fg a.e. on Γ_c. (9.2.7)

Utilizing the relation p = σy and (9.2.6), one can transform (9.2.7) into

min ½ a(y_{λ,μ}, y_{λ,μ}) + ⟨λ, d⟩_{Γ_c}
over (λ, μ) ∈ H^{−1/2}(Γ_c) × L2(Γ_c), λ ≥ 0 in H^{−1/2}(Γ_c), |μ| ≤ Fg a.e. on Γ_c,
where y_{λ,μ} satisfies
a(y_{λ,μ}, z) − L(z) + ⟨λ, τ_N z⟩_{Γ_c} + (μ, τ_T z)_{Γ_c} = 0 for all z ∈ Y. (9.2.8)

Problem (9.2.8) is an equivalent form of the dual problem (9.2.7), now written in the variables λ and μ. The primal variable y_{λ,μ} appears only as an auxiliary variable determined from λ and μ. Since g ∈ L2(Γ_c), the extremality conditions corresponding to (P) and (9.2.7) can also be given more explicitly. First, (9.2.4d) is equivalent to

|μ∗| ≤ Fg a.e. on Γ_c, (9.2.4d′)

and a brief computation shows that (9.2.4e) is equivalent to

τ_T y∗ = 0, or τ_T y∗ ≠ 0 and μ∗ = Fg τ_T y∗/|τ_T y∗|. (9.2.4e′)
We now introduce and analyze a regularized version of the contact problem with given friction that allows the application of the semismooth Newton method. In what follows we assume that g ∈ L2(Γ_c). We start our consideration with a regularized version of the dual problem (9.2.7) written in the form (9.2.8). Let γ₁, γ₂ > 0, λ̂ ∈ L2(Γ_c), λ̂ ≥ 0, and μ̂ ∈ L2(Γ_c), and define the functional J_{γ1,γ2} : L2(Γ_c) × L2(Γ_c) → R by

J_{γ1,γ2}(λ, μ) := ½ a(y_{λ,μ}, y_{λ,μ}) + (λ, d)_{Γ_c} + (1/2γ₁)‖λ − λ̂‖²_{Γ_c} + (1/2γ₂)‖μ − μ̂‖²_{Γ_c} − (1/2γ₁)‖λ̂‖²_{Γ_c} − (1/2γ₂)‖μ̂‖²_{Γ_c}.

Obviously, the last two terms in the definition of J_{γ1,γ2} are constants and can thus be neglected in the optimization problem (P∗_{γ1,γ2}). However, they are introduced with regard to the primal problem corresponding to (P∗_{γ1,γ2}), which we turn to next. We define the functional J_{γ1,γ2} : Y → R by

J_{γ1,γ2}(y) := ½ a(y, y) − L(y) + (1/2γ₁)‖max(0, λ̂ + γ₁(τ_N y − d))‖²_{Γ_c} + (1/γ₂) ∫_{Γ_c} Fg h(τ_T y(x), μ̂(x)) dx. (9.2.10)

This can be verified similarly as for the original problem using Fenchel duality theory; see [Sta1] for details. Clearly, both (P_{γ1,γ2}) and (P∗_{γ1,γ2}) admit unique solutions y_{γ1,γ2} and (λ_{γ1,γ2}, μ_{γ1,γ2}), respectively. Note that the regularization turns the primal problem into the unconstrained minimization of a continuously differentiable functional, while the corresponding dual problem is still a constrained minimization of a quadratic functional. To shorten notation we henceforth mark all variables of the regularized problems only by the
index γ instead of γ1,γ2. It can be shown that the extremality conditions relating (P_{γ1,γ2}) and (P∗_{γ1,γ2}) are given by (9.2.12) for any σ > 0. Here, ξ_γ is the Lagrange multiplier associated to the constraint |μ| ≤ Fg in (P∗_{γ1,γ2}). By setting σ = γ₂⁻¹, ξ_γ can be eliminated from (9.2.12c), which results in (9.2.13). While (9.2.13) and (9.2.12c) are equivalent, they will motivate slightly different active set algorithms due to the parameter σ in (9.2.12c).

Next we investigate the convergence of the primal variable y_γ as well as the dual variables (λ_γ, μ_γ) as the regularization parameters γ₁, γ₂ tend to infinity. For this purpose we denote by y∗ the solution of (P) and by (λ∗, μ∗) the solution to (P∗).

Proof. The proof of this theorem is related to that of Theorem 8.26 in Chapter 8 and can be found in [KuSt].

An active set algorithm for solving (9.2.12) is presented next. The interpretation as a generalized Newton method is discussed later. In the following we drop the index γ.

Algorithm SSN.

A^{k+1}_c = {x ∈ Γ_c : λ̂ + γ₁(τ_N y^k − d) > 0},
I^{k+1}_c = Γ_c \ A^{k+1}_c,
A^{k+1}_{f,−} = {x ∈ Γ_c : ξ^k + σ(μ^k + Fg) < 0},
A^{k+1}_{f,+} = {x ∈ Γ_c : ξ^k + σ(μ^k − Fg) > 0},
I^{k+1}_f = Γ_c \ (A^{k+1}_{f,−} ∪ A^{k+1}_{f,+}).

3. If k ≥ 1, A^{k+1}_c = A^k_c, A^{k+1}_{f,−} = A^k_{f,−}, and A^{k+1}_{f,+} = A^k_{f,+}: stop. Else
4. Solve for (y^{k+1}, λ^{k+1}, μ^{k+1}).

5. Set

ξ^{k+1} := τ_T y^{k+1} + γ₂⁻¹(μ̂ + Fg) on A^{k+1}_{f,−},
ξ^{k+1} := τ_T y^{k+1} + γ₂⁻¹(μ̂ − Fg) on A^{k+1}_{f,+},
ξ^{k+1} := 0 on I^{k+1}_f.

The minimization of J_{γ1,γ2} as defined in (9.2.10) clearly has a unique solution. If the algorithm stops in step 3, then y^k is the solution to the primal problem (P_{γ1,γ2}) and (λ^k, μ^k) solves the dual problem (P∗_{γ1,γ2}).

Provided that we choose σ = γ₂⁻¹, the above algorithm can be interpreted as a semismooth Newton method in infinite-dimensional spaces. To show this assertion, we consider a reduced system instead of (9.2.12). Thereby, as in the dual problem (P∗_{γ1,γ2}), the primal variable y only acts as an auxiliary variable that is calculated from the dual variables (λ, μ). We introduce the mapping F : L2(Γ_c) × L2(Γ_c) → L2(Γ_c) × L2(Γ_c) by

F(λ, μ) = ( λ − max(0, λ̂ + γ₁(τ_N y_{λ,μ} − d)),
γ₂ τ_T y_{λ,μ} + μ̂ − μ − max(0, γ₂ τ_T y_{λ,μ} + μ̂ − Fg) − min(0, γ₂ τ_T y_{λ,μ} + μ̂ + Fg) ). (9.2.14)
Since the mapping Ψ involves a norm gap under the max functional, it is Newton differentiable (see Example 8.14), and thus the first component of F is Newton differentiable. A similar observation holds for the second component as well, and thus the whole mapping F is Newton differentiable. Hence, we can apply the semismooth Newton method to the equation F(λ, μ) = 0. Calculating the explicit form of the Newton step leads to Algorithm SSN with σ = γ₂⁻¹.
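A sketch of how the nonsmooth residual (9.2.14) could be evaluated in a discretized setting is given below. The solver for y_{λ,μ}, the trace operators tau_N and tau_T, and all data arrays are stand-ins (assumptions): in an actual implementation they would come from a finite element discretization of the state equation in (9.2.8).

import numpy as np

def residual(lam, mu, solve_y, tauN, tauT, lam_hat, mu_hat, g_hat,
             gamma1, gamma2, d):
    """Evaluate F(lam, mu) of (9.2.14); g_hat stands for F*g on Gamma_c."""
    y = solve_y(lam, mu)                  # y_{lambda,mu} from (9.2.8)
    r1 = lam - np.maximum(0.0, lam_hat + gamma1 * (tauN(y) - d))
    t = gamma2 * tauT(y) + mu_hat
    r2 = t - mu - np.maximum(0.0, t - g_hat) - np.minimum(0.0, t + g_hat)
    return r1, r2

A semismooth Newton code would drive (r1, r2) to zero; the active sets of Algorithm SSN are exactly the sets on which the max and min branches in this residual are engaged.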
Theorem 9.7. Suppose that there exists a constant g₀ > 0 with Fg ≥ g₀, and further suppose that σ ≥ γ₂⁻¹ and that ‖λ⁰ − λ_γ‖_{Γ_c}, ‖μ⁰ − μ_γ‖_{Γ_c} are sufficiently small. Then the iterates (λ^k, ξ^k, μ^k, y^k) of Algorithm SSN converge superlinearly to (λ_γ, ξ_γ, μ_γ, y_γ) in L2(Γ_c) × L2(Γ_c) × L2(Γ_c) × Y.

Proof. The proof consists of two steps. First we prove the assertion for σ = γ₂⁻¹, and then we utilize this result for the general case σ ≥ γ₂⁻¹.

Step 1. For σ = γ₂⁻¹ Algorithm SSN is a semismooth Newton method for the equation F(λ, μ) = 0 (F as defined in (9.2.14)). We already argued Newton differentiability of F. To apply Theorem 8.16, it remains to show that the generalized derivatives have uniformly bounded inverses, which can be achieved similarly as in the proof of Theorem 8.25 in Chapter 8. Clearly, the superlinear convergence of (λ^k, μ^k) carries over to the variables ξ^k and y^k.

Step 2. For σ > γ₂⁻¹ we cannot use the above argument directly. Nevertheless, one can prove superlinear convergence of the iterates by showing that in a neighborhood of the solution the iterates of Algorithm SSN with σ > γ₂⁻¹ coincide with those of Algorithm SSN with σ = γ₂⁻¹. The argument for this fact exploits the smoothing properties of the Neumann-to-Dirichlet mapping for the elasticity equation. First, we again consider the case σ = γ₂⁻¹. Clearly, for all k ≥ 1 we have λ^k − λ^{k−1} ∈ L2(Γ_c) and μ^k − μ^{k−1} ∈ L2(Γ_c). The corresponding difference y^k − y^{k−1} of the primal variables satisfies (9.2.16). From regularity results for elliptic variational equalities it follows that there exists a constant C > 0 such that

‖τ_T y^k − τ_T y^{k−1}‖_{C⁰(Γ_c)} ≤ C (‖λ^k − λ^{k−1}‖_{Γ_c} + ‖μ^k − μ^{k−1}‖_{Γ_c}). (9.2.17)

Next we show that

A^k_{f,−} ∩ A^{k+1}_{f,+} = A^k_{f,+} ∩ A^{k+1}_{f,−} = ∅ (9.2.18)

provided that ‖λ⁰ − λ_γ‖_{Γ_c} and ‖μ⁰ − μ_γ‖_{Γ_c} are sufficiently small. If B := A^k_{f,−} ∩ A^{k+1}_{f,+} ≠ ∅, then it follows that τ_T y^{k−1} + γ₂⁻¹(μ̂ + Fg) < 0 and τ_T y^k + γ₂⁻¹(μ̂ − Fg) > 0 on B, which implies that τ_T y^k − τ_T y^{k−1} > 2γ₂⁻¹Fg ≥ 2γ₂⁻¹g₀ > 0 on B. This contradicts (9.2.17) provided that ‖λ⁰ − λ_γ‖_{Γ_c} and ‖μ⁰ − μ_γ‖_{Γ_c} are sufficiently small. Analogously, one can show that A^k_{f,+} ∩ A^{k+1}_{f,−} = ∅.

We now choose an arbitrary σ ≥ γ₂⁻¹ and assume that (9.2.18) holds for Algorithm SSN if σ = γ₂⁻¹. Then we can argue that in a neighborhood of the solution the iterates of Algorithm SSN are independent of σ ≥ γ₂⁻¹. To verify this assertion we separately consider
the sets I^k_f, A^k_{f,−}, and A^k_{f,+}. On I^k_f we have that ξ^k = 0, and thus σ has no influence when determining the new active and inactive sets. On the set A^k_{f,−} we have μ^k = −Fg. Here, we consider two types of sets. First, sets where ξ^k < 0 belong to A^{k+1}_{f,−} for the next iteration independently of σ. And, second, if ξ^k ≥ 0, we use that this transition cannot occur for σ ≥ γ₂⁻¹, since it is already ruled out by (9.2.18) for σ = γ₂⁻¹. On the set A^k_{f,+} one argues analogously.

This shows that in a neighborhood of the solution the iterates are the same for all σ ≥ γ₂⁻¹, and thus the superlinear convergence result from Step 1 carries over to the general case σ ≥ γ₂⁻¹, which ends the proof.

Aside from the assumption that ‖λ⁰ − λ_γ‖_{Γ_c} and ‖μ⁰ − μ_γ‖_{Γ_c} are sufficiently small, σ controls the probability that points are moved from the lower active set to the upper, or vice versa, in one iteration. Smaller values for σ make it more likely that points belong to A^k_{f,−} ∩ A^{k+1}_{f,+} or A^k_{f,+} ∩ A^{k+1}_{f,−}. In the numerical realization of Algorithm SSN it turns out that choosing small values for σ may not be optimal, since this may lead to the situation that points which are active with respect to the upper bound become active with respect to the lower bound in the next iteration, and vice versa. This in turn may lead to cycling of the iterates. Such undesired behavior can be overcome by choosing larger values for σ. If the active set strategy is based on (9.2.13), one cannot take advantage of a parameter which helps keep points from changing from A_{f,+} to A_{f,−}, or vice versa, in one iteration.
Remark 9.2.1. So far we have not remarked on the choice of λ̂ and μ̂. One possibility is to choose them according to first order augmented Lagrangian updates. In practice this will result in carrying out some steps of Algorithm SSN and then updating (λ̂, μ̂) to the current values of (λ^k, μ^k).

The existence of a solution to the Signorini problem with Coulomb friction can be approached by a fixed point idea. We define the cone of nonnegative functionals over H^{1/2}(Γ_c) as

H^{−1/2}_+(Γ_c) := {ξ ∈ H^{−1/2}(Γ_c) : ⟨ξ, η⟩_{Γ_c} ≥ 0 for all η ∈ H^{1/2}(Γ_c), η ≥ 0}

and consider the mapping Φ : H^{−1/2}_+(Γ_c) → H^{−1/2}_+(Γ_c) defined by Φ(g) := λ_g, where λ_g is the unique multiplier for the contact condition in (9.2.4) for the problem with given friction g. Property (9.2.4b) implies that Φ is well defined. With (9.2.6) in mind, y ∈ Y is called a weak solution of the Signorini problem with Coulomb friction if its negative normal
boundary stress −σ_N(y) is a fixed point of the mapping Φ. In general, such a fixed point for the mapping Φ does not exist unless F is sufficiently small; see, e.g., [EcJa, Has, HHNL].

The regularization for the Signorini contact problem with Coulomb friction that we consider here corresponds to the regularization in (P∗_{γ1,γ2}) and reflects the fact that the Lagrange multiplier for the contact condition relates to the negative stress in the normal direction. It is given by

a(y, z − y) + (max(0, λ̂ + γ₁(τ_N y − d)), τ_N(z − y))_{Γ_c} − L(z − y)
+ (1/γ₂) ∫_{Γ_c} F max(0, λ̂ + γ₁(τ_N y − d)) (h(τ_T z, μ̂) − h(τ_T y, μ̂)) dx ≥ 0 (9.2.19)

for all z ∈ Y, with h(·, ·) as defined in (9.2.11). Existence for (9.2.19) is obtained by means of a fixed point argument for the regularized Tresca friction problem. For this purpose we set L2₊(Γ_c) := {ξ ∈ L2(Γ_c) : ξ ≥ 0 a.e.} and define the mapping Φ_γ : L2₊(Γ_c) → L2₊(Γ_c) by Φ_γ(g) := λ_γ = max(0, λ̂ + γ₁(τ_N y_γ − d)), with y_γ the unique solution of the regularized contact problem with friction g ∈ L2₊(Γ_c). In a first step, Lipschitz continuity of the mapping from L2₊(Γ_c) to Y which assigns to a given friction g ∈ L2₊(Γ_c) the corresponding solution y_γ of (P_{γ1,γ2}) is investigated.

For the proof we refer the reader to [Sta1]. We next address properties of the mapping Φ_γ.

From Lemma 9.8 it is known that this solution mapping is Lipschitz continuous. The mapping Ψ consists of the linear trace mapping from Y into H^{1/2}(Γ_c) and the compact embedding of this space
for all ξ, ξ̃ ∈ L2(Γ_c), the mapping ϒ is Lipschitz continuous with constant γ₁. This implies that Φ_γ is Lipschitz continuous with constant

L = (c₁c₂γ₁/κ) ‖F‖∞, (9.2.23)

where c₁, c₂ are constants from trace theorems. Concerning compactness, the composition of Ψ and the solution mapping is compact. From (9.2.22) it then follows that Φ_γ is compact. This ends the proof.

We can now show that the regularized contact problem with Coulomb friction has a solution.

Theorem 9.10. The mapping Φ_γ admits at least one fixed point, i.e., the regularized Coulomb friction problem (9.2.19) admits a solution. If ‖F‖∞ is such that L as defined in (9.2.23) is smaller than 1, the solution is unique.

Hence, the Leray–Schauder theorem guarantees the existence of a solution to the regularized Coulomb friction problem. Uniqueness of the solution holds if F is such that L is smaller than 1, since in this case Φ_γ is a contraction.
In the following algorithm the fixed point approach is combined with an augmented Lagrangian concept to solve (9.2.19).

Algorithm ALM-FP.

1. Initialize (λ̂⁰, μ̂⁰) ∈ L2(Γ_c) × L2(Γ_c) and g⁰ ∈ L2(Γ_c), m := 0.

2. Choose γ₁^m, γ₂^m > 0 and determine the solution (λ^m, μ^m) to problem (P∗_{γ1,γ2}) with given friction g^m and λ̂ := λ̂^m, μ̂ := μ̂^m.

3. Update g^{m+1} := λ^m, λ̂^{m+1} := λ^m, μ̂^{m+1} := μ^m, and m := m + 1. Unless an appropriate stopping criterion is met, go to step 2.
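A skeleton of this outer loop might look as follows. The inner solver solve_dual_ssn, which stands for Algorithm SSN applied to (P∗_{γ1,γ2}), and the initializations are assumptions; the stopping criterion mirrors the relative tolerance used in Example 9.11 below.

import numpy as np

def alm_fp(solve_dual_ssn, lam_hat0, mu_hat0, g0, gamma1, gamma2,
           tol=1e-7, max_outer=50):
    """Fixed point / augmented Lagrangian loop of Algorithm ALM-FP."""
    lam_hat, mu_hat, g = lam_hat0, mu_hat0, g0
    for m in range(max_outer):
        # step 2: Tresca problem with given friction g and shifts
        lam, mu = solve_dual_ssn(g, lam_hat, mu_hat, gamma1, gamma2)
        # step 3: update friction and shift parameters
        g_new, lam_hat, mu_hat = lam, lam, mu
        if np.linalg.norm(g_new - g) <= tol * np.linalg.norm(g_new):
            return lam, mu
        g = g_new
    return lam, mu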
The auxiliary problems (P∗_{γ1,γ2}) in step 2 can be solved by Algorithm SSN. Numerical experiments with this algorithm are given in [KuSt]. For the following brief discussion of the above algorithm, we assume that the mapping Φ admits a fixed point λ∗ in L2(Γ_c); such a fixed point satisfies the extremality system of the unregularized problem.
Provided that Algorithm ALM-FP has a limit point, it also satisfies this system of equations; i.e., the limit satisfies the original, nonregularized contact problem with Coulomb friction. We end the section with a numerical example taken from [KuSt].

Example 9.11. We solve the contact problem with Tresca as well as with Coulomb friction using Algorithms SSN and ALM-FP, respectively. We choose Ω = [0, 3] × [0, 1], the gap function d = max(0.0015, 0.003(x₁ − 1.5)² + 0.001), and E = 10000, ν = 0.45, and f = 0. The boundary of possible contact and friction is Γ_c := [0, 3] × {0}, and we assume traction-free boundary conditions on Γ_n := ([0, 3] × {1}) ∪ ({0} × [0, 0.2]) ∪ ({3} × [0, 0.2]). On Γ_d := ({0} × [0.2, 1]) ∪ ({3} × [0.2, 1]) we prescribe the deformation as follows:

τy = (0.003(1 − x₂), −0.004)ᵀ on {0} × [0.2, 1],
τy = (−0.003(1 − x₂), −0.004)ᵀ on {3} × [0.2, 1].
Figure 9.1. Deformed mesh for g ≡ 1; gray tones visualize the elastic shear energy density.
Figure 9.2. Upper row, left: multiplier λ∗ (solid), rigid foundation (multiplied by 5·10³, dotted), and normal displacement τ_N y∗ (multiplied by 5·10³, dashed). Lower row, left: dual variable μ∗ (solid) with bounds ±Fλ∗ (dotted) and tangential displacement τ_T y∗ (multiplied by 5·10³, dashed) for F = 2. Middle column: same as left, but with F = 5. Right column: same as first column, but with F = 10.
Close to the points (0, 0.2) and (3, 0.2), i.e., the points where the boundary conditions change from Neumann to Dirichlet, we observe a stress concentration due to a local singularity of the solution. Further, we also observe a (small) stress concentration close to the points where the rigid foundation has its kinks.

We turn to the Coulomb friction problem and investigate the performance of Algorithm ALM-FP. In Figure 9.2 the normal and the tangential displacements with corresponding multipliers for F = 2, 5, 10 are depicted. One observes that the friction coefficient significantly influences the deformation. For instance, in the case F = 2 the elastic body is in contact with the foundation in the interval [1.4, 1.6], but it is not for F = 5 and F = 10. These large values of F may, however, be physically of little relevance. Algorithm ALM-FP requires overall between 20 and 25 linear solves to stop with |g^m − g^{m−1}|_{Γ_c}/|g^m|_{Γ_c} ≤ 10⁻⁷. For further information on the numerical performance we refer the reader to [Sta1].
Chapter 10

Parabolic Variational Inequalities

In this chapter we discuss the Lagrange multiplier approach to parabolic variational inequalities in the Hilbert space H = L2(Ω), which are of the type

⟨(d/dt)y∗(t) + Ay∗(t) − f(t), y − y∗(t)⟩ ≥ 0 for all y with y − ψ ∈ C,  y∗(t) − ψ ∈ C,
y∗(0) = y₀, (10.0.1)

where C = {y ∈ H : y ≥ 0}. An important example arises in the pricing of American options, which leads to a complementarity system (10.0.2) posed for a.e. (t, S) ∈ (0, T) × (0, ∞), where ⊥ denotes complementarity, i.e., a ≥ 0 ⊥ b ≥ ψ if a ≥ 0, b ≥ ψ, and a(b − ψ) = 0. In (10.0.2) the reward function is ψ(S) = (K − S)⁺ for the put option and ψ(S) = (S − K)⁺ for the call option. Here S ≥ 0 denotes the price of the share and v the value of the option, r > 0 is the interest rate, σ > 0 is the volatility of the market, and K is the strike price. Further, T is the maturity date. The integral operator B is defined by

Bv(S) = −λ ∫₀^∞ ( (z − 1)S v_S + (v(t, S) − v(t, zS)) ) dν(z).

Note that (10.0.2) is a backward equation with respect to the time variable. Setting y(t, S) = v(T − t, S) we arrive at (10.0.1), and (10.0.2) has the following interpretation
[Kou] in mathematical finance. The price process S_t is governed by Itô's stochastic differential equation, and

v(t, S) = sup_τ E^{t,x}[e^{−r(τ−t)} ψ(S_τ)] over all stopping times τ ≤ T. (10.0.3)

It will be shown that for the put case, (0, ∞) can be replaced by Ω = (S̄, ∞) with a certain S̄ > 0. Thus, (10.0.2) can be formulated as (10.0.1) by defining a bounded bilinear form a on V × V by

a(v, φ) = ∫_{S̄}^∞ ( ((σ²/2)S²v_S + (r − σ²)Sv)φ_S + (2r − σ²)vφ − (Bv)φ ) dS (10.0.4)

for v, φ ∈ V, where V is defined by

V = {φ ∈ H : φ is absolutely continuous on (S̄, ∞), ∫_{S̄}^∞ S²|φ_S|² dS < ∞, φ(S) → 0 as S → ∞, and φ(S̄) = 0},

equipped with

|φ|²_V = ∫_{S̄}^∞ (S²|φ_S|² + |φ|²) dS.

Now

a(v, φ) ≤ (σ²/2)|v|_V|φ|_V + |r − σ²| |v|_H|φ|_V + |2r − σ²| |v|_H|φ|_H

and

a(v, v) ≥ (σ²/2)|v|²_V + (2r − (3/2)σ²)|v|²_H − |r − σ²| |v|_V|v|_H
≥ (σ²/4)|v|²_V + (2r − (3/2)σ² − (r − σ²)²/σ²)|v|²_H.

Note that if Av ∈ L2(Ω), then

(−Bv, v⁺) ≥ (−Bv⁺, v⁺).
Thus,

Av = −(σ²/2)S²v_SS + rSv_S − rv + Bv

satisfies

⟨Av, v⁺⟩ ≥ ⟨Av⁺, v⁺⟩.
Motivated by this example we make the following assumptions. Let X be a Hilbert space that is continuously embedded into H, and let V be a separable closed linear subspace of X, endowed with the induced norm and dense in H. Assume that

ψ ∈ X,  f ∈ L2(0, T; V∗),

and that

φ⁺ = sup(0, φ) ∈ V for all φ ∈ V.

The following assumptions will be used for the operator A.

(1) A ∈ L(X, V∗), i.e., there exists M̄ such that

|⟨Ay, φ⟩_{V∗×V}| ≤ M̄ |y| |φ| for all y ∈ X and φ ∈ V,

and dom(A) = {y ∈ X : Ay ∈ H} ⊂ X.

(2) There exist ω > 0 and ρ ∈ R such that

⟨Aφ, φ⟩ ≥ ω|φ|²_V − ρ|φ|²_H for all φ ∈ V.

(4) There exists λ̄ ∈ H with λ̄ ≤ 0 such that

⟨λ̄ + Aψ − f(t), φ⟩ ≤ 0 for all φ ∈ V with φ ≥ 0 and a.e. t ∈ (0, T).
Assumptions (1)–(5) apply to (10.0.2) and to second order elliptic differential operators. Assumption (6) applies to the biharmonic operator Δ² and to self-adjoint operators. For the biharmonic operator and for systems of equations such as the elasticity system, for instance, the monotone property (3) does not hold.

In this chapter we discuss (10.0.1) without assuming that V is embedded compactly into H. In the latter case, one can use the Aubin lemma, which states that W(0, T) = L2(0, T; V) ∩ H1(0, T; V∗) is compactly embedded into L2(0, T; H). This ensures that the weak limit of certain approximating sequences defines the solution; see, e.g., [GLT, IK23]. Instead, our analysis uses the monotone trick for variational inequalities. From, e.g., [Tan, p. 151], we recall that W(0, T) embeds continuously into C([0, T]; H).

We commence with the definitions of strong and weak solutions to (10.0.1).

Moreover, in case y∗ is a strong solution we have y∗ ∈ L2(δ, T; dom(A)) for every δ > 0. Further, λ∗ ∈ L2(δ, T; H) and (10.0.1) can equivalently be written as a variational inequality in the form

(d/dt)y∗(t) + Ay∗(t) + λ∗(t) = f(t),  y∗(0) = y₀,
λ∗(t) ≤ 0,  y∗(t) ≥ ψ,  (y∗(t) − ψ, λ∗(t))_H = 0 for a.e. t > 0. (10.0.6)
Proof. Existence and uniqueness of the solution to (10.1.1) follow with monotone techniques; see [ItKa, Lio3], for instance. Define 𝒜 : V → V∗ by

𝒜φ = Aφ + min(−λ̄, cφ).

Then (10.1.1) can equivalently be expressed as

(d/dt)v + 𝒜v = f − λ̄ − Aψ ∈ L2(0, T; V∗), (10.1.2)

with v = y_c − ψ and v(0) = y₀ − ψ ∈ H. We note that 𝒜 is hemicontinuous, i.e., s → ⟨𝒜(φ₁ + sφ₂), φ₃⟩ is continuous from R to R for all φ_i ∈ V, i = 1, …, 3, and

|𝒜φ|_{V∗} ≤ |Aφ|_{V∗} + c|φ|_H for all φ ∈ V,
⟨𝒜φ₁ − 𝒜φ₂, φ₁ − φ₂⟩ ≥ ω|φ₁ − φ₂|²_V − ρ|φ₁ − φ₂|²_H for all φ₁, φ₂ ∈ V,
⟨𝒜φ, φ⟩ ≥ ω|φ|²_V − ρ|φ|²_H for all φ ∈ V.
Theorem 10.4. (1) If in addition to the assumptions of Proposition 10.3 assumptions (3)–(4) hold and y₀ − ψ ∈ C, then y_c(t) − ψ ∈ C and y_c(t) ≥ y_ĉ(t) for ĉ ≥ c. Moreover, y_c − ψ → y∗ − ψ strongly in L2(0, T; V) and weakly in H1(0, T; V∗) as c → ∞, where y∗ is the unique solution of (10.0.1) in the sense that y∗ − ψ ∈ K, (10.0.6) is satisfied with λ∗ ∈ L2(0, T; H), and the estimate

½ e^{−2ρt}|y_c(t) − y∗(t)|²_H + ∫_0^t e^{−2ρs} ω|y_c(s) − y∗(s)|²_V ds ≤ (1/c) ∫_0^t e^{−2ρs}|λ̄|² ds → 0

holds for t ∈ [0, T]. If in addition assumption (7) is satisfied and λ̄ ∈ L∞(Ω), then

|y_c(t) − y∗(t)|_{L∞} ≤ (1/c)|λ̄|_{L∞}.

(2) If assumptions (1)–(4) hold, y₀ − ψ ∈ C, and f ∈ L2(0, T; H), then y∗ is the unique strong solution to (10.0.1).

where

⟨A(y_c − ψ), φ⟩ ≥ ⟨Aφ, φ⟩ ≥ ω|φ|²_V − ρ|φ|²_H

and, by (4),

⟨Aψ − f(t) + λ̄, φ⟩ ≥ 0.

Thus,

½ (d/dt)|φ|²_H ≤ ρ|φ|²_H
and consequently

e^{−2ρt}|φ|²_H ≤ |φ(0)|²_H = 0. (10.1.4)

Since

0 ≥ λ_c = min(0, λ̄ + c(y_c − ψ)) ≥ λ̄,

we have

|λ_c(t)|_H ≤ |λ̄|_H

for all t ∈ [0, T]. From (10.1.3) we deduce that {y_c} is bounded in L2(0, T; V). By assumption (1) and again by (10.1.3) it follows that {Ay_c} and {(d/dt)y_c} are bounded in L2(0, T; V∗). Thus, there exist λ∗ ∈ L2(0, T; H) satisfying λ∗ ≤ 0 a.e. and y∗ satisfying y∗ − ψ ∈ K such that for a subsequence, denoted again by c,

λ_c → λ∗ weakly in L2(0, T; H),
Ay_c → Ay∗ and (d/dt)y_c → (d/dt)y∗ weakly in L2(0, T; V∗) (10.1.5)

as c → ∞. Taking the limit in (10.1.3) implies that

(d/dt)y∗ + Ay∗ − f = −λ∗,  y∗(0) = y₀, (10.1.6)

with equality in the differential equation holding in the sense of L2(0, T; V∗).

For φ = −(y_c − y_ĉ)⁻ with c ≤ ĉ we deduce from (10.1.3) that

⟨(d/dt)(y_c − y_ĉ) + A(y_c − y_ĉ), φ⟩ + (λ_c − λ_ĉ, φ) = 0,

where

(λ_c − λ_ĉ, φ) = (min(0, λ̄ + c(y_ĉ − ψ)) − min(0, λ̄ + ĉ(y_ĉ − ψ)) + min(0, λ̄ + c(y_c − ψ)) − min(0, λ̄ + c(y_ĉ − ψ)), φ) ≥ 0,

since y_ĉ ≥ ψ. Hence, using the same arguments as those leading to (10.1.4), we have |φ(t)|_H = 0 and thus

y_c ≥ y_ĉ for c ≤ ĉ.

By the Lebesgue dominated convergence theorem and the theorem of Beppo Levi, y_c → y∗ strongly in L2(0, T; H) and pointwise a.e. in (0, T) × Ω. Since

0 ≥ ∫_0^T (λ_c, y_c − ψ)_H dt ≥ −(1/c) ∫_0^T |λ̄|²_H dt → 0

as c → ∞, we have

∫_0^T (λ∗, y∗ − ψ) dt = 0.
That is, (y∗, λ∗) satisfies (10.0.6), where the first equation is satisfied in the sense of L2(0, T; V∗). Suppose that y ∈ K satisfies (10.0.1). Then it follows that

½ (d/dt)|y∗(t) − y(t)|²_H + ⟨A(y∗(t) − y(t)), y∗(t) − y(t)⟩ ≤ 0

and thus e^{−2ρt}|y∗(t) − y(t)|²_H ≤ |y₀ − y(0)|²_H. This implies that y∗ is the unique solution to (10.0.1) in K and that the whole family {(y_c, λ_c)} converges in the sense specified in (10.1.5). From (10.0.1) and (10.1.1),

⟨(d/dt)y∗(t) + Ay∗(t) − f(t), y_c(t) − y∗(t)⟩ ≥ 0,
⟨(d/dt)y_c(t) + Ay_c(t) − f(t), y∗(t) − y_c(t)⟩ ≥ (λ_c, y_c − ψ)_H.

Adding these inequalities and using the coercivity of A yields

(d/dt)(½ e^{−2ρt}|y_c(t) − y∗(t)|²_H) + e^{−2ρt} ω|y_c(t) − y∗(t)|²_V ≤ (1/c) e^{−2ρt}|λ̄|²,

which implies the first estimate and in particular that y_c → y∗ strongly in L2(0, T; V).

Suppose next that in addition λ̄ ∈ L∞(Ω). Let k ∈ R⁺ and φ = (y_c − y∗ − k)⁺. By assumption φ ∈ V. From (10.0.6) and (10.1.1),

⟨(d/dt)y∗ + Ay∗ − f, φ⟩ ≥ 0

and

⟨(d/dt)y_c + Ay_c + λ_c − f, φ⟩ = 0.

If k ≥ (1/c)|λ̄|_{L∞}, then
(2) Now suppose that f ∈ L2(0, T; H) and that assumptions (1)–(4) hold. Consider (10.1.3) in the form

(d/dt)y_c + Ay_c = f − λ_c,  y_c(0) = y₀. (10.1.7)

We decompose y_c = y_{c,i} + y_h, where y_{c,i} and y_h are the solutions to (10.1.7) with initial condition and forcing function set to zero, respectively. Note that {λ_c} is bounded in L2(0, T; H) uniformly with respect to c. Hence by the following lemma {Ay_{c,i}} and {(d/dt)y_{c,i}} are bounded in L2(0, T; H) uniformly for c > 0. Moreover, Ay_h ∈ L2(δ, T; H) and (d/dt)y_h ∈ L2(δ, T; H) for every δ > 0. Thus y_c is bounded in H1(δ, T; H) ∩ L2(δ, T; dom(A)) and converges weakly in H1(δ, T; H) ∩ L2(δ, T; dom(A)) to y∗ as c → ∞.

Lemma 10.5. Under the assumptions of the previous theorem, −A generates an analytic semigroup on H. If (d/dt)x + Ax = g ∈ L2(0, T; H) with x(0) = 0, then (d/dt)x(t) and Ax(t) ∈ L2(0, T; H), and

|Ax|_{L2(0,T;H)} ≤ k̄ |g|_{L2(0,T;H)},

Proof. Let B = A − ρI. Further, for u ∈ dom(A) and λ ∈ C with Re λ ≥ 0 set ḡ = λu + Bu. Then, since

Re λ (u, u)_H + ⟨Bu, u⟩ ≤ |ḡ|_H|u|_H,

we obtain

|u|_H = |(λI + B)⁻¹ḡ|_H ≤ (1 + M̄/ω)(1/|λ|)|ḡ|_H. (10.1.9)

It thus follows from [ItKa, Paz, Tan] that −B, and hence −A, generate analytic semigroups on H, related by e^{−Bt} = e^{(ρ−A)t}.

For g ∈ L2(0, ∞; H) with e^{ρ·}g ∈ L2(0, ∞; H) consider

(d/dt)x + Ax = g with x(0) = 0.

This is related to

(d/dt)z + Bz = g_ρ := g e^{ρ·} with z(0) = 0 (10.1.10)
To allow δ = 0 for the strong solutions in the previous theorem we let ȳ denote the solution to

(d/dt)ȳ + Aȳ = 0,  ȳ(0) = y₀,

and we consider

(d/dt)(y_c − ȳ) + A(y_c − ȳ) = f − λ_c,  y_c(0) − ȳ(0) = 0.

Next we turn to verifying existence under a different set of assumptions which, in particular, does not involve the monotonicity assumption (3). For λ̄ = 0 in (10.1.1), let ŷ_c denote the corresponding solution, i.e.,

(d/dt)ŷ_c + Aŷ_c + c min(0, ŷ_c − ψ) = f,  c > 0, (10.1.11)

which exists by Theorem 10.4 (1).
With assumptions (5) and (6) holding, we have that ŷ_c ∈ H1(0, T; H) and

⟨(d/dt)ŷ_c + Aŷ_c + c min(0, ŷ_c − ψ) − f(t), (d/dt)ŷ_c⟩ = 0

for a.e. t ∈ (0, T). Then

|(d/dt)ŷ_c|²_H + (d/dt)( a_s(ŷ_c − ψ̄, ŷ_c − ψ̄) + c|(ŷ_c − ψ)⁻|²_H ) ≤ 2(M²|ŷ_c − ψ̄|²_V + |Aψ̄ − f|²_H)

and hence

a_s(ŷ_c(t) − ψ̄, ŷ_c(t) − ψ̄) + c|(ŷ_c(t) − ψ)⁻|²_H + ∫_0^t |(d/ds)ŷ_c|²_H ds
≤ a_s(y₀ − ψ̄, y₀ − ψ̄) + ∫_0^t 2(M²|ŷ_c(s) − ψ̄|²_V + |Aψ̄ − f(s)|²_H) ds, (10.1.13)

where we used the fact that y₀ − ψ ∈ C. It thus follows from (10.1.12)–(10.1.13) that

|ŷ_c(t) − ψ̄|²_V + c|(ŷ_c(t) − ψ)⁻|²_H + ∫_0^t |(d/ds)ŷ_c|²_H ds
≤ K ( |y₀ − ψ̄|²_V + |ψ − ψ̄|²_V + ∫_0^t |Aψ̄ − f(s)|²_H ds ) (10.1.14)
Since |(ŷ_c − ψ)⁻|²_H ≤ K/c → 0 by (10.1.14), we have

∫_0^T (y∗(t) − ψ, φ)_H dt ≥ 0 for all φ ∈ L2(0, T; H) with φ(t) ∈ C. (10.1.15)
and thus y(t) = y∗(t). Hence the solution to (10.1.18) is unique. Integrating (10.1.16) over (τ, t) with 0 ≤ τ < t ≤ T we obtain, with the arguments that lead to (10.1.18),

∫_τ^t e^{−2ρs} ⟨(d/ds)y∗(s) + Ay∗(s) − f(s), y(s) − y∗(s)⟩ ds ≥ 0, (10.1.19)

and thus y∗ satisfies (10.0.1).

To argue that ŷ_c − ψ → y∗ − ψ strongly in L2(0, T; V) ∩ C(0, T; H), note that λ̂_c = c min(0, ŷ_c − ψ) converges weakly in L2(0, T; V∗) to λ∗. From (10.1.11) and (10.0.5) we have

½ (d/dt)|ŷ_c − y∗|² + ⟨A(ŷ_c − y∗), ŷ_c − y∗⟩ = ⟨λ∗ − λ̂_c, ŷ_c − y∗⟩
≤ ⟨λ∗ − λ̂_c, ŷ_c − ψ + ψ − y∗⟩ ≤ ⟨λ∗, ŷ_c − ψ⟩ + ⟨λ̂_c, y∗ − ψ⟩ =: η_c,

so that

½ (d/dt)|ŷ_c − y∗|²_H + ω|ŷ_c − y∗|²_V − ρ|ŷ_c − y∗|²_H ≤ η_c,

and hence

(d/dt)(e^{−2ρt}|ŷ_c − y∗|²_H) + ω e^{−2ρt}|ŷ_c − y∗|²_V ≤ 2e^{−2ρt}η_c,
Using the identity ab = ½a² + ½b² − ½(a − b)², we find

lim inf_{h→0} (−1/h) ∫_τ^t e^{−2ρs} a_s(y∗(s) − ψ̄, y∗(s − h) − y∗(s)) ds
≥ 2ρ ∫_τ^t e^{−2ρs} a_s(y∗(s) − ψ̄, y∗(s) − ψ̄) ds
+ ½ e^{−2ρt} a_s(y∗(t) − ψ̄, y∗(t) − ψ̄) − ½ e^{−2ρτ} a_s(y∗(τ) − ψ̄, y∗(τ) − ψ̄).

This estimate, together with assumption (6) and the fact that {(y∗(· − h) − y∗)/(−h)}_{h>0} is weakly bounded in L2(0, T; H), allows us to pass to the limit in (10.1.20) to obtain

e^{−2ρt} a_s(y∗(t) − ψ̄, y∗(t) − ψ̄) − e^{−2ρτ} a_s(y∗(τ) − ψ̄, y∗(τ) − ψ̄) + ∫_τ^t e^{−2ρs} |(d/ds)y∗(s)|²_H ds
≤ M ∫_τ^t |y∗(s) − ψ̄|_V |(d/ds)y∗(s)|_H ds + ∫_τ^t e^{−2ρs} |Aψ̄ − f(s)|_H |(d/ds)y∗(s)|_H ds.

Consequently, we have

e^{−2ρt} a_s(y∗(t) − ψ̄, y∗(t) − ψ̄) ≤ e^{−2ρτ} a_s(y∗(τ) − ψ̄, y∗(τ) − ψ̄) + ∫_τ^t e^{−2ρs} (M²|y∗(s) − ψ̄|²_V + |Aψ̄ − f(s)|²_H) ds

and therefore

a_s(y∗(τ) − ψ̄, y∗(τ) − ψ̄) + |y∗(τ) − ψ̄|²_H ≥ lim sup_{t→τ} ( a_s(y∗(t) − ψ̄, y∗(t) − ψ̄) + |y∗(t) − ψ̄|²_H ).

Since a_s(φ, φ) + |φ|²_H defines an equivalent norm on the Hilbert space V, it follows that |y∗(t) − y∗(τ)|_V → 0 as t ↓ τ. Hence y∗ is right continuous.

Now, in addition, assumptions (3)–(4) are supposed to hold. Let

λ̂_c(t) = c min(0, ŷ_c(t) − ψ).

Then for c ≤ ĉ and φ = (ŷ_c − ŷ_ĉ)⁺,

(λ̂_c − λ̂_ĉ, φ) = ( (c − ĉ) min(0, ŷ_c − ψ) + ĉ(min(0, ŷ_c − ψ) − min(0, ŷ_ĉ − ψ)), φ ) ≥ 0.
Hence, using the arguments leading to (10.1.4), we have ŷ_c ≤ ŷ_ĉ for c ≤ ĉ. Then ŷ_c(t) → y∗(t) strongly in H and pointwise a.e. in Ω.

10.2 Regularity

In this section we discuss additional regularity of the solution y∗ to (10.0.1) under the assumptions of Theorem 10.4 (3). For h > 0 we have, suppressing the superscript "∗",

(d/dt)((y(t + h) − y(t))/h) + A((y(t + h) − y(t))/h) + (λ(t + h) − λ(t))/h = (f(t + h) − f(t))/h.

From (10.0.5), (λ(t + h) − λ(t), y(t + h) − y(t)) ≥ 0, and thus

((d/dt)((y(t + h) − y(t))/h), (y(t + h) − y(t))/h) + ⟨A((y(t + h) − y(t))/h), (y(t + h) − y(t))/h⟩
≤ ⟨(f(t + h) − f(t))/h, (y(t + h) − y(t))/h⟩.

Hence, by the coercivity of A and Young's inequality,

½ (d/dt)|(y(t + h) − y(t))/h|²_H + ω|(y(t + h) − y(t))/h|²_V
≤ ρ|(y(t + h) − y(t))/h|²_H + (ω/2)|(y(t + h) − y(t))/h|²_V + (1/2ω)|(f(t + h) − f(t))/h|²_{V∗}.
Integrating in time,

|(y(t + h) − y(t))/h|²_H + ω ∫_0^t |(y(s + h) − y(s))/h|²_V ds
≤ e^{2ρt} ( |(y(h) − y(0))/h|²_H + (1/ω) ∫_0^t |(f(s + h) − f(s))/h|²_{V∗} ds ).

The conclusion of Theorem 10.9 remains correct under the assumptions of the first part of Theorem 10.7, i.e., under assumptions (1)–(2) and (5)–(6), y₀ − ψ ∈ C ∩ V, and f ∈ L2(0, T; H) ∩ H1(0, T; V∗),

and therefore

|y(q₁)(T) − y(q₂)(T)|²_H + ω ∫_0^T |y(q₁) − y(q₂)|²_V dt ≤ (1/ω) ∫_0^T |(A(q₂) − A(q₁))y(q₂)|²_{V∗} dt = e(q₁, q₂)²,
where

e(q₁, q₂)² ≤ (κ/ω)|q₁ − q₂|²_U |y(q₂)|²_{L2(0,T;V)},

and where W^α = [H, D]_α is the interpolation space between D and H (see, e.g., [Fat, Chapter 8]) and C̄ is an embedding constant. If L∞(Ω) ⊂ W^α, with α ∈ (α₀, 1) for some α₀, then Hölder continuity of q ∈ Ũ → y(q)(T) ∈ L∞(Ω) follows.

Next we prove Lipschitz continuity of q ∈ Ũ → y(q) ∈ L∞(0, T; L∞(Ω)). Some prerequisites are established first. We assume that A(q) generates an analytic semigroup S(t) = S(t; q) on H for every q ∈ Ũ [Paz]. Then for each q ∈ Ũ there exists M such that

‖A^α S(t)‖ ≤ M/t^α for all t > 0, (10.3.2)

where A^α denotes the fractional powers of A, with α ∈ (0, 1). We assume that M is independent of q ∈ Ũ. We shall further assume that

dom(A^{1/2}) = V (10.3.3)

for all q ∈ Ũ, which is the case for a large class of second order elliptic differential operators; see, e.g., [Fat, Chapter 8]. We assume that

V ⊂ Lr(Ω)

for some r > 2. For

p = 2r/(r − 2) ∈ (2, ∞),

we shall utilize the assumption

|A(q₁)^{−1/2}(A(q₁) − A(q₂))y|_{Lp(Ω)} ≤ κ̄ |q₁ − q₂|_U |A^α(q₂)y|_H (10.3.6)

for some α ∈ (0, 1) and all q₁, q₂ ∈ Ũ, y ∈ D. This assumption is applicable, for example, if the parameter enters as a constant into the leading differential term of A(q) or if it enters into the lower order terms.
Theorem 10.10. Let A(q) generate an analytic semigroup for every q ∈ Ũ, and let assumptions (1)–(5) and (7) hold. If further (10.3.3)–(10.3.6) are satisfied and f ∈ L∞(0, T; H), y₀ ∈ C ∩ D, then q → y(q) is Lipschitz continuous from Ũ to L∞(0, T; L∞(Ω)).

Proof. (1) Let f¹, f² satisfy

A^{−1/2}f^i ∈ L∞(0, T; Lp(Ω)), i = 1, 2,

and let y¹, y² denote the corresponding strong solutions to (10.0.1). For k > 0 let

φ_k = max(0, y¹ − y² − k)

and

Ω_k = {x ∈ Ω : φ_k > 0},

for all z₁, z₂ ∈ H and k > 0. Thus, it follows from (10.0.5) and Theorem 10.4 that for φ_k = (y¹ − y² − k)⁺ ∈ V

⟨(d/dt)(y¹ − y²), φ_k⟩ + ⟨A(y¹ − y²), φ_k⟩ ≤ (f¹ − f², φ_k) (10.3.7)

with

C = ( ∫_0^T |A^{−1/2}f|²_{L2} dt )^{1/2}.
We denote by h and k arbitrary real numbers satisfying 0 < k < h < ∞, and we find for r > 2

|φ_k|^r_{Lr} ≥ ∫_{Ω_k} |φ − k|^r dx > ∫_{Ω_h} |φ − k|^r dx ≥ |Ω_h| |h − k|^r, (10.3.10)

so that

≤ (ω̄ M̄ C^{1−β}/(ω|h − k|)) ( ∫_0^T |A^{−1/2}f|²_{Lp} |Ω_k|^{(p−2)/p} dt )^{β/2}.

For 1/P + 1/Q = 1 this implies that

( ∫_0^T |Ω_h|^{2/r} dt )^{1/2} ≤ (ω̄ M̄ C^{1−β}/(ω|h − k|)) ( ∫_0^T |A^{−1/2}f|^{2P}_{Lp} dt )^{β/(2P)} ( ∫_0^T |Ω_k|^{Q(p−2)/p} dt )^{β/(2Q)},

and consequently

( ∫_0^T |Ω_h|^{2/r} dt )^{1/2} ≤ (K/|h − k|) ( ∫_0^T |Ω_k|^{2/r} dt )^{β/2}. (10.3.11)
ϕ(k₁) ≤ M̄ ω̄ C/(ω k₁).

The same estimate can also be obtained in the case that C ≤ k₁, and consequently k₁ + k̂ ≤ Λk₁, where Λ = 1 + 2^{β/(β−1)}(ω̄M̄/ω)^β. Hence we obtain y¹ − y² ≤ Λk₁ a.e. in (0, T) × Ω. Analogously, a uniform lower bound for y¹ − y² is obtained by using φ_k = min(0, y¹ − y² − k) ≤ 0.

(2) We use the estimate of step (1) to obtain Lipschitz continuous dependence of the solution on the parameter q. Let q₁, q₂ ∈ Ũ with corresponding solutions y(q₁) and y(q₂). Since

(d/dt)y(q₂) + A(q₁)y(q₂) + (A(q₂) − A(q₁))y(q₂) + λ(q₂) = f(t),

y(q₂) is the solution to (10.0.1) with A = A(q₁) and f̃(t) = f − (A(q₂) − A(q₁))y(q₂) ∈ L2(0, T; H). Hence we can apply the estimate of (1) with A = A(q₁) and f¹ − f² = (A(q₂) − A(q₁))y(q₂) and obtain

|y¹ − y²|_{L∞(0,T;L∞(Ω))} ≤ sup_{t∈(0,T)} |A(q₁)^{−1/2}(A(q₁) − A(q₂))y(t; q₂)|_{Lp(Ω)}.

Together with (10.3.6) this yields

|y¹ − y²|_{L∞(0,T;L∞(Ω))} ≤ κ̄ |q₁ − q₂|_U sup_{t∈(0,T)} |A^α(q₂)y(t; q₂)|_H. (10.3.13)

To estimate A^α(q₂)y(t; q₂), recall from Theorem 10.4 that λ̄ ≤ λ(t; q) ≤ 0 and thus {f − λ(q) : q ∈ Ũ} is uniformly bounded in L∞(0, T; H). From (10.0.6) we have that

A(q₂)^α y(t; q₂) = A(q₂)^α S(t; q₂)y₀ + ∫_0^t A(q₂)^α S(t − s; q₂)(f(s) − λ(s; q₂)) ds ∈ L∞(0, T; H).
Theorem 10.11. Assume that y₀ ∈ H and f ∈ L2(0, T; V∗) and that assumptions (1)–(2) hold. Then there exists a unique solution {y^k}^N_{k=1} to (10.4.1).

Proof. For c > 0 consider the regularized equations

(y^k_c − y^{k−1})/h + Ay^k_c + c min(0, y^k_c − ψ) − f^k = 0. (10.4.2)

Since y ∈ H → c min(0, y − ψ) is Lipschitz continuous and monotone, the operator B : V → V∗ defined by

B(y) = y/h + Ay + c min(0, y − ψ)

is coercive, monotone, and continuous for all h > 0. Hence, by the theory of maximal monotone operators, (10.4.2) admits a unique solution; cf., e.g., [ItKa, Chapter I.5], [Ba, Chapter II.1]. For each c > 0 and k = 1, …, N we find

(1/2h)( |y^k_c − ψ|²_H − |y^{k−1} − ψ|²_H + |y^k_c − y^{k−1}|²_H ) + ⟨A(y^k_c − ψ) + Aψ − f^k, y^k_c − ψ⟩ + c|(y^k_c − ψ)⁻|²_H = 0.

Thus the families |y^k_c − ψ|²_V and c|(y^k_c − ψ)⁻|²_H are bounded in c > 0, and there exists a subsequence of {y^k_c − ψ} that converges to some y^k − ψ weakly in V as c → ∞.
Moreover,

(−(y^k_c − ψ)⁻, y − y^k_c) = (−(y^k_c − ψ)⁻, y − ψ − (y^k_c − ψ)) ≤ 0 for all y − ψ ∈ C (10.4.3)

and

lim inf_{c→∞} ⟨Ay^k_c, y^k_c − y⟩ = lim inf_{c→∞} ( ⟨A(y^k_c − ψ), y^k_c − ψ⟩ + ⟨A(y^k_c − ψ), ψ − y⟩ + ⟨Aψ, y^k_c − ψ⟩ )
≥ ⟨A(y^k − ψ), y^k − ψ⟩ + ⟨A(y^k − ψ), ψ − y⟩ + ⟨Aψ, y^k − ψ⟩ = ⟨Ay^k, y^k − y⟩. (10.4.4)

For two solutions {y^k} and {ỹ^k} of (10.4.1) we find

(1/2h)|y^k − ỹ^k|²_H + ⟨A(y^k − ỹ^k), y^k − ỹ^k⟩ ≤ (1/2h)|y^{k−1} − ỹ^{k−1}|²_H.

Since y⁰ = ỹ⁰ = y₀, this implies that y^k = ỹ^k for all k ≥ 1. We define the interpolations

y_h^{(1)} = y^k + ((t − kh)/h)(y^{k+1} − y^k),  y_h^{(2)} = y^{k+1}  on (kh, (k + 1)h] (10.4.5)

for k = 0, …, N − 1.
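As an illustration of the scheme (10.4.1)–(10.4.2), the following is a minimal sketch for a one-dimensional diffusion operator: at each time step the piecewise linear regularized equation is solved by a small active set (semismooth Newton) loop. The operator A, the obstacle ψ, the data f, and all parameters below are illustrative assumptions.

import numpy as np

n, T, N, c = 100, 1.0, 50, 1e6
h = T / N
dx = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
psi = np.zeros(n)                       # constraint y >= psi
f = -5.0 * np.ones(n)                   # forcing that pushes onto psi
y = 0.1 * np.ones(n)                    # y0 with y0 - psi in C

M = np.eye(n) / h + A
for k in range(N):
    rhs0 = f + y / h
    yk = y.copy()
    for _ in range(30):                 # active set loop for (10.4.2)
        active = yk - psi < 0           # where the min term is engaged
        J = M + c * np.diag(active.astype(float))
        yk = np.linalg.solve(J, rhs0 + c * active * psi)
        if np.array_equal(yk - psi < 0, active):
            break                       # active set settled: step done
    y = yk

print("min violation:", float(np.min(y - psi)))   # O(1/c) below psi

For each fixed step the equation is piecewise linear in y, so the inner semismooth Newton loop terminates after finitely many active set updates.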
Theorem 10.12. Suppose that the assumptions of Theorem 10.11 hold. Then there exists a unique weak solution y∗ of (10.0.1). Moreover, t → y∗(t) ∈ H is right continuous, y∗ ∈ B(0, T; H), and y_h^{(2)} − ψ → y∗ − ψ strongly in L2(0, T; V).
From the above estimates it follows that there exist subsequences of y_h^{(1)}, y_h^{(2)} (denoted by the same symbols) and y*(t) ∈ L²(0, T; V) such that the convergences in (10.4.7) hold. Note that

$$\frac{d}{dt}\, y_h^{(1)} = \frac{y^{k+1} - y^k}{h} \quad \text{on } (kh,\,(k+1)h].$$

Thus, we have from (10.4.1) for every y ∈ K

$$\Big\langle \frac{d}{dt}\, y + A y_h^{(2)} - f_h,\; y - y_h^{(2)} \Big\rangle + \Big\langle \frac{d}{dt}\, y_h^{(1)} - \frac{d}{dt}\, y,\; y - y_h^{(2)} \Big\rangle \ge 0 \tag{10.4.8}$$

a.e. in (0, T). Here

$$\Big\langle \frac{d}{dt}\big( y_h^{(1)} - y \big),\; y - y_h^{(2)} \Big\rangle = \Big\langle \frac{d}{dt}\, y_h^{(1)} - \frac{d}{dt}\, y,\; y - y_h^{(1)} \Big\rangle + \Big\langle \frac{d}{dt}\, y_h^{(1)} - \frac{d}{dt}\, y,\; y_h^{(1)} - y_h^{(2)} \Big\rangle \tag{10.4.9}$$

with

$$\int_0^T \Big\langle \frac{d}{dt}\, y_h^{(1)} - \frac{d}{dt}\, y,\; y - y_h^{(1)} \Big\rangle\, dt \le \frac12\, |y(0) - y_0|_H^2 \tag{10.4.10}$$

and

$$\int_0^T \Big\langle \frac{d}{dt}\, y_h^{(1)},\; y_h^{(1)} - y_h^{(2)} \Big\rangle\, dt = -\frac12 \sum_{k=1}^N |y^k - y^{k-1}|_H^2. \tag{10.4.11}$$

Since

$$\int_0^T \big\langle A y^*,\; y^* - y \big\rangle\, dt \le \liminf_{h\to 0^+} \int_0^T \big\langle A y_h^{(2)},\; y_h^{(2)} - y \big\rangle\, dt,$$
which can be argued as in (10.4.4), it follows from (10.4.7)–(10.4.11) that every weak cluster point y* of y_h^{(2)} satisfies

$$\int_0^T \Big\langle \frac{d}{dt}\, y + A y^* - f,\; y - y^* \Big\rangle\, dt + \frac12\, |y(0) - y_0|_H^2 \ge 0 \tag{10.4.12}$$

for all y ∈ K and a.e. t ∈ (0, T). Hence y* ∈ L²(0, T; V) is a weak solution of (10.0.1) and y* ∈ B(0, T; H). Moreover, from (10.4.6)

$$|y^*(t) - \psi|_H^2 \le |y^*(\tau) - \psi|_H^2 + \frac{1}{\omega} \int_\tau^t |A\psi - f(s)|_{V^*}^2\, ds,$$

and

$$\cdots \ge \frac12 \sum_{k=1}^N |y^k - y^{k-1}|_H^2 + \int_0^T \big\langle A(y^* - y_h^{(2)}),\; y^* - y_h^{(2)} \big\rangle\, dt \to 0$$

as h → 0⁺.
Corollary 10.13. Let y = y(y₀, f) denote the weak solution to (10.0.1), given y₀ ∈ H and f ∈ L²(0, T; V*). Then for all t ∈ [0, T]

$$|y(y_0,f)(t) - y(\tilde y_0,\tilde f)(t)|_H^2 + \omega \int_0^t |y(y_0,f) - y(\tilde y_0,\tilde f)|_V^2\, ds \le |y_0 - \tilde y_0|_H^2 + \frac{1}{\omega} \int_0^t |f - \tilde f|_{V^*}^2\, ds.$$

Proof. Let y^k and ỹ^k be the solutions to (10.4.1) corresponding to (y₀, f) and (ỹ₀, f̃), respectively.
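The mechanism behind the estimate can be sketched formally in continuous time (the actual proof proceeds through the discretization (10.4.1)): testing the difference of the two evolutions with y − ỹ and using the coercivity ⟨Av, v⟩ ≥ ω|v|²_V gives

$$\frac12 \frac{d}{dt}\,|y - \tilde y|_H^2 + \omega\, |y - \tilde y|_V^2 \le \big\langle f - \tilde f,\; y - \tilde y \big\rangle \le \frac{1}{2\omega}\,|f - \tilde f|_{V^*}^2 + \frac{\omega}{2}\,|y - \tilde y|_V^2,$$

and integrating over (0, t) yields the claim after absorbing the last term into the left-hand side.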
Hence $\int_0^T |(y_c - \psi)^-|_H^2\, dt \to 0$ as c → ∞ and {y_c − ψ}_{c≥1} is bounded in L²(0, T; V). Using the same arguments as in the proof of Theorem 10.7, there exist y* and a subsequence of {y_c − ψ}_{c≥1} that converges weakly to y* − ψ ∈ L²(0, T; V), and y* − ψ ≥ 0 a.e. in (0, T) × Ω. For y(t) ∈ K

$$\int_0^T \Big[ \Big\langle \frac{d}{dt}\, y(t) - \frac{d}{dt}\big( y(t) - y_c \big),\; y(t) - y_c(t) \Big\rangle + \big\langle A y_c(t) - f(t),\; y(t) - y_c(t) \big\rangle + \big( \min(0,\; \bar\lambda + c\,(y_c - \psi)),\; y(t) - \psi - (y_c - \psi) \big) \Big]\, dt = 0,$$

where

$$-\int_0^T \Big\langle \frac{d}{dt}\big( y(t) - y_c \big),\; y(t) - y_c(t) \Big\rangle\, dt = \frac12 \big( |y(0) - y_0|_H^2 - |y(T) - y_c(T)|_H^2 \big), \tag{10.4.16}$$

$$\big( \min(0,\; \bar\lambda + c\,(y_c - \psi)),\; y(t) - \psi - (y_c - \psi) \big) \le \frac{1}{2c}\, |\bar\lambda|_H^2. \tag{10.4.17}$$

Hence, we have

$$\int_0^T \Big[ \Big\langle \frac{d}{dt}\, y(t),\; y(t) - y_c(t) \Big\rangle + \big\langle A y_c(t) - f(t),\; y(t) - y_c(t) \big\rangle \Big]\, dt + \frac12\, |y(0) - y_0|_H^2 \ge \frac12\, |y(T) - y_c(T)|_H^2 - \frac{1}{2c} \int_0^T |\bar\lambda|_H^2\, ds.$$
Proof. Assume that y^{k−1} ≥ ỹ^{k−1}. For φ = −(y_c^k − ỹ_c^k)⁻ it follows from (10.4.2) that

$$\Big\langle \frac{y_c^k - \tilde y_c^k - (y^{k-1} - \tilde y^{k-1})}{h},\; \varphi \Big\rangle + \big\langle A(y_c^k - \tilde y_c^k) - (f^k - \tilde f^k),\; \varphi \big\rangle - c\, \big( (y_c^k - \psi)^- - (\tilde y_c^k - \psi)^-,\; \varphi \big) = 0.$$

Since

$$-\Big\langle \frac{y^{k-1} - \tilde y^{k-1}}{h},\; \varphi \Big\rangle - \big\langle f^k - \tilde f^k,\; \varphi \big\rangle - c\, \big( (y_c^k - \psi)^- - (\tilde y_c^k - \psi)^-,\; \varphi \big) \ge 0,$$

we obtain

$$\frac{|\varphi|_H^2}{h} + \big\langle A\varphi,\; \varphi \big\rangle \le 0,$$

and thus y_c^k − ỹ_c^k ≥ 0 for sufficiently small h > 0. From the proof of Theorem 10.11 it follows that we can take the limit with respect to c and obtain y^k − ỹ^k ≥ 0. By induction this holds for all k ≥ 0. The first claim of the theorem now follows from (10.4.2) and Theorem 10.12. The second claim follows analogously.
Corollary 10.16. Let assumptions (1)–(3) hold and suppose that the stationary variational inequality

$$\big\langle A y - f,\; \varphi - y \big\rangle \ge 0 \quad \text{for all } \varphi - \psi \in C \tag{10.5.1}$$

with f ∈ V* has a solution y − ψ ∈ C. Then if y(0) = ψ and f(t) = f, we have y(t) ↑ ŷ, where ŷ is the minimum solution to (10.5.1).

Proof. Suppose ȳ is a solution to (10.5.1). Since ȳ(t) := ȳ, t ≥ 0, is also the unique solution to (10.0.1) with y₀ = ȳ ≥ ψ, it follows from the comparison result above that y(t) ≤ ȳ for all t ∈ [0, T]. On the other hand, it follows from Theorem 10.4 (2) that

$$\Big( y(\tau + 1) - y(\tau) + \int_\tau^{\tau+1} \big( A y(s) - f \big)\, ds,\; \varphi - y(\tau) \Big) \ge 0 \quad \text{for all } \varphi - \psi \in C. \tag{10.5.2}$$
Corollary 10.17 (Perturbation). Let assumptions (1)–(3) hold, and let ψ¹, ψ² ∈ H and f ∈ L²(0, T; V*). Denote by y_c¹ and y_c² the solutions to (10.1.1) with y₀ equal to ψ¹ and ψ², respectively, and let y¹ and y² be the corresponding weak solutions to (10.0.1). Assume that (φ − γ)⁺ ∈ V for any γ ∈ ℝ⁺ and φ ∈ V, and that ⟨A1, (φ − γ)⁺⟩ ≥ 0. Then for α = max(0, sup_{x∈Ω}(ψ¹ − ψ²)) and β = min(0, inf_{x∈Ω}(ψ¹ − ψ²)) we have

$$\beta \le y_c^1 - y_c^2 \le \alpha, \qquad \beta \le y^1 - y^2 \le \alpha.$$

Proof. As in the proof of Theorem 10.4 it follows that φ = (y_c¹ − y_c² − α)⁺ satisfies

$$\frac12 \frac{d}{dt}\, |\varphi|_H^2 \le \rho\, |\varphi|_H^2.$$

Since φ(0) = 0, this implies that φ(t) = 0, t ≥ 0, and thus y_c¹ − y_c² ≤ α a.e. on (0, T) × Ω. Similarly, letting φ = (y_c¹ − y_c² − β)⁻ we obtain y_c¹ − y_c² ≥ β a.e. on (0, T) × Ω. Since from Corollary 10.14, y_c¹ → y¹ and y_c² → y² weakly in L²(0, T; H) for c → ∞, we obtain the desired estimates.
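The perturbation bounds are easy to observe numerically. The following self-contained sketch (our illustration, with arbitrary choices of grid, force f, and shift δ) runs the penalized scheme (10.4.2) for two obstacles differing by a constant δ > 0, starting from y₀ = ψ, so that α = δ and β = 0:

```python
import numpy as np

def solve_obstacle(psi, y0, f, T=1.0, N=50, c=1e6):
    """Penalized implicit Euler (10.4.2) for y_t - y_xx + c*min(0, y - psi) = f
    on (0, 1) with homogeneous Dirichlet data; returns the final time step."""
    n = psi.size
    h, dx = T / N, 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
    y = y0.copy()
    for _ in range(N):
        active = np.zeros(n, dtype=bool)
        for _ in range(50):          # active-set iteration, as in the sketch above
            M = np.eye(n) / h + A + c * np.diag(active.astype(float))
            ynew = np.linalg.solve(M, y / h + f + c * active * psi)
            if np.array_equal(ynew < psi, active):
                break
            active = ynew < psi
        y = ynew
    return y

x = np.linspace(0.0, 1.0, 101)[1:-1]
f = -8.0 * np.ones_like(x)           # pushes the state onto the obstacle
psi1 = -0.05 - 0.4 * (x - 0.5)**2
delta = 0.03                         # psi2 = psi1 - delta, so alpha = delta, beta = 0
y1 = solve_obstacle(psi1, psi1.copy(), f)
y2 = solve_obstacle(psi1 - delta, psi1 - delta, f)
print((y1 - y2).min(), (y1 - y2).max())  # expected in [0, delta] up to penalty error
```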
Chapter 11

Shape Optimization
for example. The most commonly used approach relies on differentiating the reduced functional Ĵ(Ω) = J(y(Ω), Ω, Γ(Ω)) using the chain rule. As a consequence shape differentiability of y with respect to variations of the domain is essential in this method. In an alternative approach the partial differential equation is realized in a Lagrangian formulation; see [DeZo], for example.

The method for computing the shape derivative that we describe here is quite different and elementary. In short, it can be described as follows. First we embed e(y, Ω) = 0 into an equation posed on a fixed domain Ω₀ by a coordinate transformation, which is called the method of mapping. Then we combine the Lagrange multiplier method, which realizes the constraint e(y, Ω) = 0, with the shape derivative of functionals to calculate the shape derivative of Ĵ(Ω). In this process, differentiability of the state with respect to the geometric quantities is not used. In fact, we require only Hölder continuity with exponent greater than or equal to 1/2 of y with respect to the geometric data. We refer the reader to [IKPe2] for an example in which the reduced cost functional is shape differentiable whereas the state variable of the constraining partial differential equation is not.
For comparison we briefly discuss an example using the "chain rule" approach. But first we require some notions from shape calculus, which we introduce only formally here. Let Ω be a reference domain in ℝ^d, and let h : U → ℝ^d, with Ω̄ ⊂ U, denote a mapping describing perturbations of Ω by means of

$$\Omega_t = F_t(\Omega).$$

For a family z_t of functions defined on Ω_t the shape derivative z′ and the material derivative ż are given by

$$z'(x) = \lim_{t\to 0} \frac1t\,\big( z_t(x) - z_0(x) \big), \qquad \dot z(x) = \lim_{t\to 0} \frac1t\,\big( z_t(F_t(x)) - z_0(x) \big) \quad \text{for } x \in \Omega,$$

and they are related by

$$z' = \dot z - \nabla z \cdot h.$$

For a functional Ω → J(Ω) the shape derivative at Ω with respect to the perturbation h is defined as

$$J'(\Omega)h = \lim_{t\to 0} \frac1t\,\big( J(\Omega_t) - J(\Omega) \big).$$
Consider now the cost functional

$$\min\; J(y, \Omega, \Gamma) = \frac12 \int_\Gamma y^2\, d\Gamma \tag{11.1.2}$$
subject to the constraint e(y, Ω) = 0 which is given by the mixed boundary value problem

$$-\Delta y = f \quad \text{in } \Omega, \tag{11.1.3}$$
$$y = 0 \quad \text{on } \Gamma_0, \tag{11.1.4}$$
$$\frac{\partial y}{\partial n} = g \quad \text{on } \Gamma. \tag{11.1.5}$$
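For orientation, the weak form of (11.1.3)–(11.1.5) reads as follows (a standard reformulation; here $H^1_{\Gamma_0}(\Omega)$ denotes the subspace of $H^1(\Omega)$ with zero trace on the Dirichlet part $\Gamma_0$):

$$\big\langle e(y, \Omega),\, \psi \big\rangle = \int_\Omega \nabla y \cdot \nabla\psi\, dx - \int_\Omega f\,\psi\, dx - \int_\Gamma g\,\psi\, ds = 0 \quad \text{for all } \psi \in H^1_{\Gamma_0}(\Omega).$$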
Here the boundary of the domain Ω is the disjoint union of a fixed part Γ₀ and the unknown part Γ, and f and g are given functions. A formal differentiation leads to the shape derivative of the reduced cost functional

$$\hat J'(\Omega)h = \int_\Gamma y\, y'\, d\Gamma + \frac12 \int_\Gamma \Big( \frac{\partial y^2}{\partial n} + \kappa\, y^2 \Big)\, h \cdot n\, d\Gamma, \tag{11.1.6}$$

where y′ denotes the shape derivative of the solution y of (11.1.3) at Ω with respect to a deformation field h, and κ stands for the curvature of Γ. For a thorough discussion of the details we refer the reader to [DeZo, SoZo]. Differentiating formally the constraint e(y, Ω) = 0 with respect to the domain, one obtains that y′ satisfies

$$-\Delta y' = 0 \quad \text{in } \Omega, \qquad y' = 0 \quad \text{on } \Gamma_0, \tag{11.1.7}$$
$$\frac{\partial y'}{\partial n} = \mathrm{div}_\Gamma\big( h\cdot n\, \nabla_\Gamma y \big) + \Big( f + \frac{\partial g}{\partial n} + \kappa\, g \Big)\, h \cdot n \quad \text{on } \Gamma,$$

where div_Γ, ∇_Γ stand for the tangential divergence and tangential gradient, respectively, on the boundary Γ. Introducing a suitably defined adjoint variable and using (11.1.7), the first term on the right-hand side of (11.1.6) can be manipulated in such a way that Ĵ′(Ω)h can be represented in the form required by the Zolesio–Hadamard structure theorem (see [DeZo])

$$\hat J'(\Omega)h = \int_\Gamma G\; h \cdot n\, d\Gamma.$$
We emphasize that the kernel G does not involve the shape derivative y′ anymore. Although y′ is only an intermediate quantity, a rigorous analysis requires justifying the formal steps in the preceding discussion. In addition one has to verify that the solution of (11.1.7) actually is the shape derivative of y in the sense of the definition in, e.g., [SoZo]. Furthermore, since the trace of y′ on Γ₀ is used in (11.1.7) one needs y′ ∈ H¹(Ω). However, y ∈ H²(Ω) is not sufficient to allow for an interpretation of the Neumann condition in (11.1.7) in H^{−1/2}(Γ). Hence y′ ∈ H¹(Ω) requires more regularity of the solution y of (11.1.3) than H²(Ω). In the approach of this chapter we utilize only y ∈ H²(Ω) for the characterization of the shape derivative of Ĵ(Ω). We return to this example in Section 11.3.
In Section 11.2 we present the proposed general framework to compute the shape
derivative for (11.1.1). Section 11.3 contains applications to shape optimization constrained
by linear elliptic systems, inverse interface problems, the Bernoulli problem, and shape
optimization for the Navier–Stokes equations.
(i) Ω = D,
(ii) Ω = U,
(iii) Ω = U \ D̄.

Note that

$$\partial\Omega = (\partial\Omega \cap \Gamma)\;\dot\cup\;(\partial\Omega \setminus \Gamma) \subset \Gamma\;\dot\cup\;\partial U. \tag{11.2.3}$$

Thus the boundary ∂Ω for the cases (i)–(iii) is given by

(i) ∂Ω = Γ ∪ ∅ = Γ,
(ii) ∂Ω = ∅ ∪ ∂U = ∂U,
(iii) ∂Ω = Γ ∪ ∂U.
$$F_t = \mathrm{id} + t\, h. \tag{11.2.4}$$

Then there exists τ > 0 such that F_t(U) = U and F_t is a diffeomorphism for |t| < τ. Defining the perturbed domains

$$\Omega_t = F_t(\Omega)$$

and the perturbed manifolds as

$$\Gamma_t = F_t(\Gamma),$$

it follows that Γ_t is of class C^{1,1} and Ω̄_t ⊂ U for |t| < τ. Note that since h|_{∂U} = 0 the boundary of U remains fixed as t varies, and hence by (11.2.3) the decomposition of ∂Ω_t is obtained from that of ∂Ω by replacing Γ with Γ_t.
Alternatively to (11.2.4) the perturbations could be described as the flow determined by the initial value problem

$$\frac{d}{dt}\,\chi(t) = h(\chi(t)), \qquad \chi(0;\, x) = x,$$

with F_t(x) = χ(t; x), i.e., by the velocity method.
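The two descriptions agree to first order in t. The following small sketch (our illustration; the field h below is an arbitrary smooth choice) integrates the flow with a Runge–Kutta scheme and compares it with the perturbation of identity:

```python
import numpy as np

def h(x):
    # A smooth velocity field (illustrative choice).
    return np.array([np.sin(x[1]), 0.5 * np.cos(x[0])])

def flow(x, t, steps=1000):
    """Integrate dchi/dt = h(chi), chi(0) = x, by classical RK4."""
    chi, dt = np.asarray(x, dtype=float), t / steps
    for _ in range(steps):
        k1 = h(chi)
        k2 = h(chi + 0.5 * dt * k1)
        k3 = h(chi + 0.5 * dt * k2)
        k4 = h(chi + dt * k3)
        chi = chi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return chi

x0, t = np.array([0.3, -0.2]), 1e-2
print(flow(x0, t))        # velocity method: F_t(x) = chi(t; x)
print(x0 + t * h(x0))     # perturbation of identity: (id + t h)(x)
# The two positions differ only by O(t^2).
```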
Let Ĵ(Ω_t) be the functional defined by Ĵ(Ω_t) = J(y_t, Ω_t, Γ_t), where y_t satisfies the constraint

$$e(y_t, \Omega_t) = 0. \tag{11.2.5}$$

The shape derivative of Ĵ at Ω is

$$\hat J'(\Omega)h = \lim_{t\to 0} \frac1t\,\big( \hat J(\Omega_t) - \hat J(\Omega) \big).$$

The functional Ĵ is called shape differentiable at Ω if Ĵ′(Ω)h exists for all h ∈ C^{1,1}(U, ℝ^d) and defines a continuous linear functional on C^{1,1}(Ū, ℝ^d). Using the method of mappings one transforms the perturbed state constraint (11.2.5) to the fixed domain Ω. For this purpose define

$$y^t = y_t \circ F_t.$$

We suppress the dependence of ẽ on h, because h will denote a fixed vector field throughout. Because F₀ = id one obtains y⁰ = y.
We axiomatize the above description and impose the following assumptions on ẽ, respectively, e. Here y^t ∈ X denotes a solution to the transformed equation

$$\tilde e(y^t, t) = 0 \quad \text{in } X^*. \tag{11.2.6}$$

(H2) There exists 0 < τ₀ ≤ τ such that for |t| < τ₀ there exists a unique solution y^t ∈ X to ẽ(y^t, t) = 0 and

$$\lim_{t\to 0} \frac{|y^t - y^0|_X}{t^{1/2}} = 0.$$
for every ψ ∈ X, where y^t and y are the solutions of (11.2.6) and (11.2.2), respectively. The perturbed state is recovered as

$$y_t = y^t \circ F_t^{-1}.$$

Condition (H5) implies that j₁(y) ∈ L²(Ω), j₁′(y) ∈ L²(Ω)^l, j₂(y) ∈ L²(Γ), j₂′(y) ∈ L²(Γ)^l, and j₃(y) ∈ L²(∂U), j₃′(y) ∈ L²(∂U)^l for y ∈ X. Hence the cost functional J(y, Ω, Γ) is well defined for every y ∈ X.
$$\int_{\Gamma_t} h_t\, d\Gamma_t = \int_\Gamma (h_t \circ F_t)\, \det DF_t\, \big| (DF_t)^{-T} n \big|\, d\Gamma.$$
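Numerically, the tangential Jacobian det DF_t |(DF_t)^{-T} n| is easy to test against a direct arclength computation; the sketch below (our illustration, with an arbitrary smooth field h whose Jacobian Dh is coded by hand) does this for Γ the unit circle, which corresponds to taking h_t ≡ 1 in the formula above:

```python
import numpy as np

t = 0.05
def h(x):                      # deformation field (illustrative choice)
    return np.array([0.2 * np.sin(x[1]), 0.1 * x[0]])
def Dh(x):                     # its Jacobian, computed by hand
    return np.array([[0.0, 0.2 * np.cos(x[1])],
                     [0.1, 0.0]])

theta = np.linspace(0.0, 2.0 * np.pi, 2001)
gamma = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # Gamma = unit circle
dtheta = theta[1] - theta[0]

# Left: |Gamma_t| computed directly from the parametrization of F_t(Gamma).
gamma_t = gamma + t * np.array([h(x) for x in gamma])
lhs = np.sum(np.linalg.norm(np.diff(gamma_t, axis=0), axis=1))

# Right: integral over Gamma of the tangential Jacobian det DF_t |(DF_t)^{-T} n|.
rhs = 0.0
for x in gamma[:-1]:
    DF = np.eye(2) + t * Dh(x)
    n = x                      # outer unit normal of the unit circle
    rhs += np.linalg.det(DF) * np.linalg.norm(np.linalg.solve(DF.T, n)) * dtheta
print(lhs, rhs)                # the two values agree up to quadrature error
```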
Theorem 11.3. Assume that (H1)–(H5) hold, that F satisfies (11.2.8), and that the adjoint equation

$$\big\langle e_y(y,\Omega)\psi,\; p \big\rangle_{X^*\times X} - (j_1'(y), \psi)_\Omega - (j_2'(y), \psi)_\Gamma - (j_3'(y), \psi)_{\partial U\setminus\Gamma} = 0, \quad \psi \in X, \tag{11.2.10}$$
admits a unique solution p ∈ X, where y is the solution to (11.2.2). Then the shape derivative of Ĵ at Ω in the direction h exists and is given by

$$\hat J'(\Omega)h = -\frac{d}{dt}\big\langle \tilde e(y, t),\; p \big\rangle_{X^*\times X}\Big|_{t=0} + \int_\Omega j_1(y)\, \mathrm{div}\, h\, dx + \int_\Gamma j_2(y)\, \mathrm{div}\, h\, ds. \tag{11.2.11}$$
where c > 0 does not depend on t. Employing the adjoint state p one obtains
$$\begin{aligned} (j_1'(y),\, y^t - y)_\Omega + (j_2'(y),\, y^t - y)_\Gamma &+ (j_3'(y),\, y^t - y)_{\partial U} = \big\langle e_y(y,\Omega)(y^t - y),\; p \big\rangle_{X^*\times X} \\ &= -\big\langle e(y^t, \Omega) - e(y, \Omega) - e_y(y, \Omega)(y^t - y),\; p \big\rangle_{X^*\times X} \\ &\quad - \big\langle \tilde e(y^t, t) - \tilde e(y, t) - e(y^t, \Omega) + e(y, \Omega),\; p \big\rangle_{X^*\times X} \\ &\quad - \big\langle \tilde e(y, t) - \tilde e(y, 0),\; p \big\rangle_{X^*\times X}, \end{aligned} \tag{11.2.15}$$
where we used (11.2.12). We estimate the ten additive terms on the right-hand side of
(11.2.13). Terms one, five, and nine converge to zero by (11.2.14) and (H2). Terms two
and six converge to 0 by (11.2.8) and (H2). For terms four and eight one uses (11.2.8).
The claim (11.2.11) now follows by passing to the limit in terms three, seven, and ten using
(11.2.15), (H3), (H2), (H4), and (H1).
To check (H2) in specific applications the following result will be useful. It relies on the assumption that

(H6) the linearized equation
$$\big\langle e_y(y,\Omega)\,\delta y,\; \psi \big\rangle_{X^*\times X} = \big\langle f,\; \psi \big\rangle_{X^*\times X}, \quad \psi \in X,$$
admits a unique solution δy ∈ X for every f ∈ X*.
Note that this condition is more stringent than the assumption of solvability of the
adjoint equation in Theorem 11.3, which requires solvability only for a specific right-hand
side.
Proposition 11.4. Assume that (11.2.2) admits a unique solution y and that (H6) is satisfied. Then (H2) holds.

Proof. Assumption (H6) implies that ẽ_y(y, 0) is bijective. The claim follows from the implicit function theorem.
Lemma 11.5 (see [DeZo]). (1) Let f ∈ C(I, W^{1,1}(U)) and assume that f_t(0) exists in L¹(U). Then

$$\frac{d}{dt} \int_{\Omega_t} f(t, x)\, dx\,\Big|_{t=0} = \int_\Omega f_t(0, x)\, dx + \int_{\partial\Omega} f(0, x)\; h \cdot n\, ds.$$

(2) Let f ∈ C(I, W^{2,1}(U)) and assume that f_t(0) exists in W^{1,1}(U). Then

$$\frac{d}{dt} \int_{\Gamma_t} f(t, x)\, ds\,\Big|_{t=0} = \int_\Gamma f_t(0, x)\, ds + \int_\Gamma \Big( \frac{\partial}{\partial n} f(0, s) + \kappa\, f(0, s) \Big)\, h \cdot n\, ds.$$
The first part of the lemma is valid also for domains with Lipschitz continuous
boundary.
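Part (1) is easy to test numerically. The sketch below (our illustration; the disc, the field h(x) = x, and the integrand are arbitrary choices) takes Ω the unit disc and a t-independent integrand, so that f_t(0) = 0 and the formula reduces to the boundary term, with h·n = 1 on the unit circle:

```python
import numpy as np

def f(x1, x2):
    return x1**2 + 2.0              # t-independent integrand, so f_t(0) = 0

def volume_integral(R, nr=400, nth=400):
    """Midpoint-rule integral of f over the disc of radius R (polar coordinates)."""
    r = (np.arange(nr) + 0.5) * R / nr
    th = (np.arange(nth) + 0.5) * 2.0 * np.pi / nth
    rr, tt = np.meshgrid(r, th)
    return np.sum(f(rr * np.cos(tt), rr * np.sin(tt)) * rr) * (R / nr) * (2 * np.pi / nth)

# With F_t = (1 + t) id, Omega_t is the disc of radius 1 + t.
dt = 1e-4
lhs = (volume_integral(1.0 + dt) - volume_integral(1.0 - dt)) / (2.0 * dt)

th = np.linspace(0.0, 2.0 * np.pi, 100001)
rhs = np.trapz(f(np.cos(th), np.sin(th)), th)   # boundary term, h·n = 1
print(lhs, rhs)                                 # both approximate 5*pi
```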
i i
i i
ItoKunisc
i i
2008/6/12
page 314
i i
In the examples below f(t) will typically be given by expressions of the form y ∘ F_t^{-1}, for which

$$\frac{d}{dt}\big( y \circ F_t^{-1} \big)\Big|_{t=0} = -(Dy)\, h.$$

As a consequence we note that $\frac{d}{dt}\,\partial_i\big( y \circ F_t^{-1} \big)\big|_{t=0}$ exists in L²(U) and is given by

$$\frac{d}{dt}\,\partial_i\big( y \circ F_t^{-1} \big)\Big|_{t=0} = -\partial_i\big( Dy\, h \big), \quad i = 1, \ldots, d.$$
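The first identity is simply the chain rule combined with $\frac{\partial}{\partial t} F_t^{-1}\big|_{t=0} = -h$, which follows by differentiating $F_t \circ F_t^{-1} = \mathrm{id}$ with $F_t = \mathrm{id} + t h$:

$$\frac{d}{dt}\big( y \circ F_t^{-1} \big)(x)\Big|_{t=0} = Dy(x)\,\frac{\partial}{\partial t} F_t^{-1}(x)\Big|_{t=0} = -Dy(x)\, h(x).$$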
In the next section ∇y stands for (Dy)T , where y is either a scalar- or vector-valued function.
To enhance readability we use two symbols for the inner product in Rd , (x, y), respectively,
x · y. The latter will be utilized only in the case of nested inner products.
11.3 Examples

Throughout this section it is assumed that (H5) is satisfied and that the regularity assumptions of Section 11.2 for D, Γ, and U hold. If J does not depend on Γ, we write J(y, Ω) in place of J(y, Ω, Γ).
The first example is the cost functional $J(y, \Omega) = \int_\Omega j_1(y)\, dx$ subject to the state equation in weak form

$$(\mu \nabla y, \nabla \psi)_\Omega = (f, \psi)_\Omega \quad \text{for all } \psi \in X, \tag{11.3.1}$$

where X = H₀¹(Ω), f ∈ H¹(U), and μ ∈ C¹(Ū, ℝ^{d×d}) such that μ(x) is symmetric and uniformly positive definite. Here Ω = D and Γ = ∂Ω. Thus e(y, Ω) : X → X* is given by

$$\big\langle e(y, \Omega),\; \psi \big\rangle_{X^*\times X} = (\mu\nabla y, \nabla\psi)_\Omega - (f, \psi)_\Omega.$$
Hence (H4) follows from differentiability of μ, (11.2.8), and (H2). In view of Theorem 11.3 we have to compute $\frac{d}{dt}\langle \tilde e(y,t), p\rangle_{X^*\times X}\big|_{t=0}$, for which we use the representation on Ω_t in (11.3.2). Recall that the solution y of (11.3.1) as well as the adjoint state p, defined by (11.3.3), belong to H²(Ω) ∩ H₀¹(Ω). Since Γ ∈ C^{1,1} (actually Lipschitz continuity of the boundary would suffice), y as well as p can be extended to functions in H²(U), which we again denote by the same symbol. Therefore by Lemmas 11.5 and 11.6

$$\begin{aligned} \frac{d}{dt}\big\langle \tilde e(y, t),\; p \big\rangle_{X^*\times X}\Big|_{t=0} &= \frac{d}{dt}\Big[ \int_{\Omega_t} \big( \mu\nabla(y\circ F_t^{-1}),\, \nabla(p\circ F_t^{-1}) \big)\, dx_t - \int_{\Omega_t} f\; p\circ F_t^{-1}\, dx_t \Big]_{t=0} \\ &= \int_\Gamma (\mu\nabla y, \nabla p)\,(h, n)\, ds + \int_\Omega \Big[ \big( \mu\nabla(-\nabla y\cdot h),\, \nabla p \big) + \big( \mu\nabla y,\, \nabla(-\nabla p\cdot h) \big) + f\,(\nabla p, h) \Big]\, dx. \end{aligned} \tag{11.3.4}$$

Note that ∇y · h as well as ∇p · h do not belong to H₀¹(Ω) but they are elements of H¹(Ω). Therefore Green's theorem implies

$$\begin{aligned} \int_\Omega \Big[ \big( \mu\nabla(-\nabla y\cdot h),\, \nabla p \big) &+ \big( \mu\nabla y,\, \nabla(-\nabla p\cdot h) \big) + f\,(\nabla p, h) \Big]\, dx \\ &= \int_\Omega \mathrm{div}(\mu\nabla p)\,(\nabla y, h)\, dx - \int_\Gamma (\mu\nabla p, n)(\nabla y, h)\, ds \\ &\quad + \int_\Omega \big( \mathrm{div}(\mu\nabla y) + f \big)\,(\nabla p, h)\, dx - \int_\Gamma (\mu\nabla y, n)(\nabla p, h)\, ds \\ &= -\int_\Omega j_1'(y)\,(\nabla y, h)\, dx - 2\int_\Gamma \frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(\mu n, n)\,(h, n)\, ds. \end{aligned} \tag{11.3.5}$$
Above we used the strong form of (11.3.1) and (11.3.3) in L²(Ω) as well as the identities

$$(\mu\nabla y, n) = (\mu n, n)\,\frac{\partial y}{\partial n}, \qquad (\nabla y, h) = \frac{\partial y}{\partial n}\,(h, n)$$

(together with the ones with y and p interchanged) which follow from y, p ∈ H₀¹(Ω). Applying Theorem 11.3 results in

$$\begin{aligned} \hat J'(\Omega)h &= -\frac{d}{dt}\big\langle \tilde e(y, t),\; p \big\rangle_{X^*\times X}\Big|_{t=0} + \int_\Omega j_1(y)\, \mathrm{div}\, h\, dx \\ &= \int_\Gamma \frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(\mu n, n)\,(h, n)\, ds + \int_\Omega \big( j_1'(y)(\nabla y, h) + j_1(y)\, \mathrm{div}\, h \big)\, dx \\ &= \int_\Gamma \frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(\mu n, n)\,(h, n)\, ds + \int_\Omega \mathrm{div}\big( j_1(y)\, h \big)\, dx = \int_\Gamma \Big( \frac{\partial y}{\partial n}\frac{\partial p}{\partial n}\,(\mu n, n) + j_1(y) \Big)\,(h, n)\, ds. \end{aligned}$$
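With the gradient in the Hadamard form Ĵ′(Ω)h = ∫_Γ G (h, n) ds, choosing h = −G n on Γ gives Ĵ′(Ω)h = −∫_Γ G² ds ≤ 0, i.e., a descent direction. A minimal sketch of the corresponding boundary update for a polygonal boundary (our illustration; the nodes, normals, and nodal kernel values are assumed to come from a finite element computation):

```python
import numpy as np

def shape_descent_step(nodes, normals, G, ds, alpha):
    """One explicit shape-gradient step: move each boundary node against the
    kernel G, i.e. take h = -G n, so that J'(Omega)h = -sum(G**2 * ds) <= 0.

    nodes   : (m, 2) boundary points
    normals : (m, 2) unit outer normals
    G       : (m,)   nodal values of the shape-gradient kernel
    ds      : (m,)   boundary quadrature weights
    alpha   : step size
    """
    predicted_decrease = -np.sum(G**2 * ds)   # J'(Omega)h for h = -G n
    return nodes - alpha * G[:, None] * normals, predicted_decrease
```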
Let Γ ⊂ U denote an interior boundary which represents the interface between Ω⁻ and Ω⁺. The inverse problem consists of identifying the unknown interface Γ from measurements z which are taken on the boundary ∂U. This can be formulated as

$$\min\; J(y, \Gamma) \equiv \int_{\partial U} (y - z)^2\, ds \tag{11.3.6}$$

subject to

$$-\mathrm{div}(\mu\nabla y) = 0 \ \ \text{in } \Omega^- \cup \Omega^+, \qquad [y] = 0, \quad \Big[ \mu\,\frac{\partial y}{\partial n} \Big] = 0 \ \ \text{on } \Gamma, \qquad \frac{\partial y}{\partial n} = g \ \ \text{on } \partial U, \tag{11.3.7}$$
where g ∈ H^{1/2}(∂U), z ∈ L²(∂U), with ∫_{∂U} g = ∫_{∂U} z = 0, [v] = v⁺ − v⁻ on Γ, and n^{+/−} standing for the unit outer normals to Ω^{+/−}. The conductivity μ is given by

$$\mu(x) = \begin{cases} \mu^-, & x \in \Omega^-, \\ \mu^+, & x \in \Omega^+, \end{cases}$$

for some positive constants μ⁻ and μ⁺. In the context of the general framework of Section 11.2 we have j₁ = j₂ = 0 and j₃ = (y − z)². Clearly (11.3.7) admits a unique solution y ∈ H¹(U) with ∫_{∂U} y = 0. Its restrictions to Ω⁺ and Ω⁻ will be denoted by y⁺ and y⁻, respectively. It turns out that the regularity of y^± is better than the one of y. Choose an open set H with Ω̄⁻ ⊂ H ⊂ H̄ ⊂ U.
In this setting

$$\big\langle e(y, \Gamma),\; \psi \big\rangle_{X^*\times X} = (\mu\nabla y, \nabla\psi)_U - (g, \psi)_{\partial U},$$

respectively,

$$\begin{aligned} \big\langle \tilde e(y, t),\; \psi \big\rangle_{X^*\times X} &= (\mu_t\, A_t\nabla y,\; A_t\nabla\psi\, I_t)_U - (g, \psi)_{\partial U} \\ &= \big( \mu^+\nabla(y\circ F_t^{-1}),\, \nabla(\psi\circ F_t^{-1}) \big)_{\Omega_t^+} + \big( \mu^-\nabla(y\circ F_t^{-1}),\, \nabla(\psi\circ F_t^{-1}) \big)_{\Omega_t^-} - (g, \psi)_{\partial U}. \end{aligned}$$

Note that the boundary term is not affected by the transformation F_t since the deformation field h vanishes on ∂U. The adjoint state is given by

$$-\mathrm{div}(\mu\nabla p) = 0 \ \ \text{in } \Omega^- \cup \Omega^+, \qquad [p] = 0, \quad \Big[ \mu\,\frac{\partial p}{\partial n} \Big] = 0 \ \ \text{on } \Gamma, \qquad \frac{\partial p}{\partial n} = 2(y - z) \ \ \text{on } \partial U. \tag{11.3.9}$$
$$\begin{aligned} \frac{d}{dt}\big\langle \tilde e(y, t),\; p \big\rangle_{X^*\times X}\Big|_{t=0} &= \int_{\partial\Omega^+} (\mu^+\nabla y^+, \nabla p^+)\,(h, n^+)\, ds - \int_{\Omega^+} \big( \mu^+\nabla(\nabla y^+\cdot h),\, \nabla p^+ \big)\, dx \\ &\quad - \int_{\Omega^+} \big( \mu^+\nabla y^+,\, \nabla(\nabla p^+\cdot h) \big)\, dx + \int_{\partial\Omega^-} (\mu^-\nabla y^-, \nabla p^-)\,(h, n^-)\, ds \\ &\quad - \int_{\Omega^-} \big( \mu^-\nabla(\nabla y^-\cdot h),\, \nabla p^- \big)\, dx - \int_{\Omega^-} \big( \mu^-\nabla y^-,\, \nabla(\nabla p^-\cdot h) \big)\, dx \\ &= \int_\Gamma \big[ \mu\,(\nabla y, \nabla p) \big]\,(h, n^+)\, ds - \int_{\Omega^+} \mu^+ \Big[ \big( \nabla(\nabla y^+\cdot h),\, \nabla p^+ \big) + \big( \nabla y^+,\, \nabla(\nabla p^+\cdot h) \big) \Big]\, dx \\ &\quad - \int_{\Omega^-} \mu^- \Big[ \big( \nabla(\nabla y^-\cdot h),\, \nabla p^- \big) + \big( \nabla y^-,\, \nabla(\nabla p^-\cdot h) \big) \Big]\, dx. \end{aligned}$$

Applying Green's formula as in Example 11.3.1 (observe that (∇y, h), (∇p, h) ∉ H¹(U)) together with (11.3.9) results in

$$\begin{aligned} -\int_{\Omega^+} \big( \mu^+\nabla(\nabla y^+\cdot h),\, \nabla p^+ \big)\, dx &- \int_{\Omega^-} \big( \mu^-\nabla(\nabla y^-\cdot h),\, \nabla p^- \big)\, dx \\ &= \int_{\Omega^+} \mathrm{div}(\mu^+\nabla p^+)\,(\nabla y^+, h)\, dx + \int_{\Omega^-} \mathrm{div}(\mu^-\nabla p^-)\,(\nabla y^-, h)\, dx \\ &\quad - \int_{\partial\Omega^+} (\mu^+\nabla p^+, n^+)(\nabla y^+, h)\, ds - \int_{\partial\Omega^-} (\mu^-\nabla p^-, n^-)(\nabla y^-, h)\, ds \\ &= -\int_\Gamma \Big[ \mu\,\frac{\partial p}{\partial n}\,(\nabla y, h) \Big]\, ds. \end{aligned}$$

The identity

$$[ab] = [a]\, b^+ + a^-\,[b] = a^+\,[b] + [a]\, b^-$$

implies [ab] = 0 if [a] = [b] = 0.
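Indeed, writing [a] = a⁺ − a⁻,

$$[ab] = a^+ b^+ - a^- b^- = (a^+ - a^-)\, b^+ + a^-\,(b^+ - b^-) = [a]\, b^+ + a^-\,[b],$$

and the second representation follows by exchanging the roles of the one-sided traces; both show that [ab] vanishes whenever [a] = [b] = 0.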
(2) a(x, ·, ·) defines a bilinear form on ℝ^{d×d} × ℝ^{d×d} which is uniformly bounded in x ∈ Ū,

where e(y) = ½(∇y + (∇y)ᵀ) and λ, μ are the positive Lamé coefficients. In this case a is symmetric, and (11.3.12) admits a unique solution in X ∩ H²(Ω)^l for every f ∈ L²(Ω)^l and g ∈ H^{1/2}(∂U)^l; see, e.g., [Ci].

The method of mapping suggests defining

$$\begin{aligned} \big\langle \tilde e(y, t),\; \psi \big\rangle_{X^*\times X} &= \int_\Omega \big[ a(F_t(x),\, A_t\nabla y,\, A_t\nabla\psi) - (f^t, \psi) \big]\, I_t\, dx - \int_\Gamma (g^t, \psi)\, w_t\, ds \\ &= \int_{\Omega_t} a\big( x,\, \nabla(y\circ F_t^{-1}),\, \nabla(\psi\circ F_t^{-1}) \big)\, dx - \int_{\Omega_t} (f,\, \psi\circ F_t^{-1})\, dx - \int_{\Gamma_t} (g,\, \psi\circ F_t^{-1})\, ds. \end{aligned} \tag{11.3.13}$$
that Proposition 11.4 is applicable. All these properties are satisfied for the linear elasticity case. Assumptions (H1)–(H4) can then be argued as in Section 11.3.1. Employing Lemma 11.5 we obtain

$$\begin{aligned} \frac{d}{dt}\big\langle \tilde e(y, t),\; p \big\rangle_{X^*\times X}\Big|_{t=0} &= -\int_\Omega \big[ a(x,\, \nabla(\nabla y^T h),\, \nabla p) + a(x,\, \nabla y,\, \nabla(\nabla p^T h)) \big]\, dx \\ &\quad + \int_\Gamma a(x, \nabla y, \nabla p)\,(h, n)\, ds + \int_\Omega (f,\, \nabla p^T h)\, dx - \int_\Gamma (f, p)\,(h, n)\, ds \\ &\quad + \int_\Gamma (g,\, \nabla p^T h)\, ds - \int_\Gamma \Big( \frac{\partial}{\partial n}(g, p) + \kappa\,(g, p) \Big)\,(h, n)\, ds, \end{aligned}$$

which implies

$$\begin{aligned} \hat J'(\Omega)h &= \int_\Omega j_1'(y)\,\nabla y^T h\, dx + \int_\Omega j_1(y)\, \mathrm{div}\, h\, dx + \int_\Gamma j_2'(y)\,\nabla y^T h\, ds + \int_\Gamma j_2(y)\, \mathrm{div}\, h\, ds \\ &\quad + \int_\Gamma \Big( -a(x, \nabla y, \nabla p) + (f, p) + \frac{\partial}{\partial n}(g, p) + \kappa\,(g, p) \Big)\,(h, n)\, ds. \end{aligned}$$

For the third and fourth terms the tangential Green's formula (see, e.g., [DeZo]) yields

$$\int_\Gamma j_2'(y)\,\nabla y^T h\, ds + \int_\Gamma j_2(y)\, \mathrm{div}\, h\, ds = \int_\Gamma \Big( \frac{\partial}{\partial n}\, j_2(y) + \kappa\, j_2(y) \Big)\,(h, n)\, ds.$$

The first and second terms can be combined using the Stokes theorem. Summarizing we finally obtain

$$\hat J'(\Omega)h = \int_\Gamma \Big( -a(x, \nabla y, \nabla p) + (f, p) + j_1(y) + \frac{\partial}{\partial n}\big( j_2(y) + (g, p) \big) + \kappa\big( j_2(y) + (g, p) \big) \Big)\,(h, n)\, ds. \tag{11.3.15}$$

This example also comprises the shape optimization problem of Bernoulli type:

$$\min\; J(y, \Omega, \Gamma) \equiv \min \int_\Gamma y^2\, ds.$$
Considering (11.3.17) on a perturbed domain Ω_t and mapping the equation back to the reference domain yields the form of ẽ(y, t). Concerning the transformation of the divergence we note that for ψ_t ∈ H₀¹(Ω_t)^d and ψ^t = ψ_t ∘ F_t ∈ H₀¹(Ω)^d, one obtains

$$\mathrm{div}\, \psi_t = \big( D\psi_i^t\, A_t^T e_i \big)\circ F_t^{-1} = \big( (A_t)_i\, \nabla\psi_i^t \big)\circ F_t^{-1},$$
where e_i stands for the ith canonical basis vector in ℝ^d and (A_t)_i denotes the ith row of A_t = (DF_t)^{-T}. We follow the convention to sum over indices which occur at least twice in a term. Thus one obtains

$$\big\langle \tilde e\big( (y^t, p^t),\, t \big),\; (\psi, \chi) \big\rangle_{X^*\times X} = \nu\,(I_t A_t\nabla y^t,\; A_t\nabla\psi) + \big( (y^t\cdot A_t\nabla)\, y^t,\; I_t\,\psi \big) - \big( p^t,\; I_t\,(A_t)_k \nabla\psi_k \big) - (f^t I_t,\; \psi) + \big( I_t\,(A_t)_k \nabla y_k^t,\; \chi \big) = 0.$$

In this setting the adjoint equation (11.2.10) for the pair (λ, q) amounts to

$$\nu(\nabla\psi, \nabla\lambda) + \big( (\psi\cdot\nabla)y + (y\cdot\nabla)\psi,\; \lambda \big) - (\chi,\, \mathrm{div}\,\lambda) + (\mathrm{div}\,\psi,\, q) = (j_1'(y), \psi). \tag{11.3.18}$$

Moreover, since div y = 0, integration by parts yields

$$\big( (y\cdot\nabla)\psi,\; \lambda \big) + \big( \psi,\; (y\cdot\nabla)\lambda \big) = \int_\Gamma (\psi\cdot\lambda)\,(y\cdot n)\, ds,$$

which holds for all ψ ∈ H¹(Ω)^d. As a consequence the adjoint equation can be interpreted in strong form.
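The boundary identity is the componentwise divergence theorem together with div y = 0 (summation over repeated indices):

$$\big( (y\cdot\nabla)\psi,\; \lambda \big) = \int_\Omega y_j\,\partial_j\psi_i\,\lambda_i\, dx = \int_\Gamma (\psi\cdot\lambda)\,(y\cdot n)\, ds - \int_\Omega \psi_i\,\partial_j\big( y_j\,\lambda_i \big)\, dx = \int_\Gamma (\psi\cdot\lambda)\,(y\cdot n)\, ds - \big( \psi,\; (y\cdot\nabla)\lambda \big),$$

since $\partial_j y_j = \mathrm{div}\, y = 0$.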
To verify conditions (H1)–(H4) we introduce the continuous trilinear form c : H₀¹(Ω)^d × H₀¹(Ω)^d × H₀¹(Ω)^d → ℝ by c(y, v, w) = ((y·∇)v, w) and assume a smallness condition on the data involving the constants
given by

$$N = \sup_{y,v,w \in H_0^1} \frac{c(y, v, w)}{|y|_{H_0^1}\, |v|_{H_0^1}\, |w|_{H_0^1}} \qquad \text{and} \qquad M = \sup_{v \in H_0^1} \frac{c(v, v, y)}{|v|_{H_0^1}^2}.$$
$$\begin{aligned} \frac{d}{dt}\big\langle \tilde e'\big( (y, p),\, t \big),\; (\lambda, q) \big\rangle_{X^*\times X}\Big|_{t=0} &= \nu(\nabla y, \nabla\psi_\lambda) + \big( (y\cdot\nabla)y,\; \psi_\lambda \big) - (p,\, \mathrm{div}\,\psi_\lambda) - (f, \psi_\lambda) \\ &\quad + \nu(\nabla\psi_y, \nabla\lambda) + \big( (\psi_y\cdot\nabla)y + (y\cdot\nabla)\psi_y,\; \lambda \big) \\ &\quad + (\mathrm{div}\,\psi_y,\; q) + \int_\Gamma \nu\,(\nabla y, \nabla\lambda)\,(h, n)\, ds, \end{aligned}$$

where

$$\psi_y = -(\nabla y)^T h, \qquad \psi_\lambda = -(\nabla\lambda)^T h.$$

Note that ψ_y, ψ_λ ∈ H¹(Ω)^d but not in H₀¹(Ω)^d. Green's formula, together with (11.3.16), (11.3.20), entails

$$\begin{aligned} \frac{d}{dt}\big\langle \tilde e'\big( (y, p),\, t \big),\; (\lambda, q) \big\rangle_{X^*\times X}\Big|_{t=0} &= \big( -\nu\Delta y + (y\cdot\nabla)y + \nabla p - f,\; \psi_\lambda \big) + \int_\Gamma \nu\Big( \frac{\partial y}{\partial n},\; \psi_\lambda \Big)\, ds - \int_\Gamma p\,(\psi_\lambda, n)\, ds \\ &\quad + \big( \psi_y,\; -\nu\Delta\lambda + (\nabla y)\lambda - (y\cdot\nabla)\lambda - \nabla q \big) + \int_\Gamma \nu\Big( \frac{\partial\lambda}{\partial n},\; \psi_y \Big)\, ds. \end{aligned}$$
∂n
i i
i i
ItoKunisc
i i
2008/6/12
page 327
i i
Bibliography
[BoK] A. Borzì and K. Kunisch, The numerical solution of the steady state solid
fuel ignition model and its optimal control, SIAM J. Sci. Comput. 22(2000),
263–284.
[ChLi] A. Chambolle and P.-L. Lions, Image recovery via total bounded variation
minimization and related problems, Numer. Math. 76(1997), 167–188.
[ChZo] Z. Chen and J. Zou, Finite element methods and their convergence for elliptic
and parabolic interface problems, Numer. Math. 79(1998), 175–202.
[Cla] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley and Sons,
New York, 1983.
[CNQ] X. Chen, Z. Nashed, and L. Qi, Smoothing methods and semismooth methods
for nondifferentiable operator equations, SIAM J. Numer. Anal. 38(2000),
1200–1216.
[CoKu] F. Colonius and K. Kunisch, Output least squares stability for parameter estimation in two point boundary value problems, J. Reine Angew. Math. 370(1986), 1–29.
[DeZo] M. C. Delfour and J.-P. Zolésio, Shapes and Geometries: Analysis, Differential
Calculus, and Optimization, SIAM, Philadelphia, 2001.
[DoSa] D. C. Dobson and F. Santosa, Recovery of blocky images from noisy and blurred
data, SIAM J. Appl. Math. 56(1996), 1181–1198.
[EcJa] C. Eck and J. Jarusek, Existence results for the static contact problem with
Coulomb friction, Math. Models Methods Appl. Sci. 8(1998), 445–468.
[EkTe] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North–
Holland, Amsterdam, 1976.
[Fr] A. Friedman, Variational Principles and Free Boundary Value Problems, John
Wiley and Sons, New York, 1982.
[GeYa] D. Geman and C. Yang, Nonlinear image recovery with half-quadratic regu-
larization, IEEE Trans. Image Process. 4(1995), 932–945.
[GiRa] V. Girault and P.-A. Raviart, Finite Element Methods for Navier–Stokes Equa-
tions, Springer-Verlag, Berlin, 1986.
[GrVo] R. Griesse and S. Volkwein, A primal-dual active set strategy for optimal
boundary control of a nonlinear reaction-diffusion system, SIAM J. Control
Optim. 44(2005), 467–494.
[Han] S.-P. Han, Superlinearly convergent variable metric algorithms for general
nonlinear programming problems, Math. Programming 11(1976), 263–282.
[HaPaRa] S.-P. Han, J.-S. Pang, and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Math. Oper. Res. 17(1992), 586–607.
[IK6] K. Ito and K. Kunisch, On the choice of the regularization parameter in non-
linear inverse problems, SIAM J. Optim. 2(1992), 376–404.
[IK7] K. Ito and K. Kunisch, Sensitivity measures for the estimation of parameters in 1-D elliptic boundary value problems, J. Math. Systems Estim. Control 6(1996), 195–218.
[IK8] K. Ito and K. Kunisch, Maximizing robustness in nonlinear illposed inverse
problems, SIAM J. Control Optim. 33(1995), 643–666.
[IK9] K. Ito and K. Kunisch, Augmented Lagrangian-SQP-methods in Hilbert spaces
and application to control in the coefficients problems, SIAM J. Optim.
6(1996), 96–125.
[IK10] K. Ito and K. Kunisch, Augmented Lagrangian-SQP methods for nonlinear
optimal control problems of tracking type, SIAM J. Control Optim. 34(1996),
874–891.
[IK11] K. Ito and K. Kunisch, Augmented Lagrangian methods for nonsmooth convex
optimization in Hilbert spaces, Nonlinear Anal. 41(2000), 573–589.
[IK12] K. Ito and K. Kunisch, An active set strategy based on the augmented Lagrangian formulation for image restoration, M2AN Math. Model. Numer. Anal. 33(1999), 1–21.
[IK13] K. Ito and K. Kunisch, Augmented Lagrangian formulation of nonsmooth, convex optimization in Hilbert spaces, in Control of Partial Differential Equations, E. Casas, ed., Lecture Notes in Pure and Appl. Math. 174, Marcel Dekker, New York, 1995, 107–117.
[IK14] K. Ito and K. Kunisch, Estimation of the convection coefficient in elliptic
equations, Inverse Problems 13(1997), 995–1013.
[IK15] K. Ito and K. Kunisch, Newton’s method for a class of weakly singular optimal
control problems, SIAM J. Optim. 10(1999), 896–916.
[IK16] K. Ito and K. Kunisch, Optimal control of elliptic variational inequalities,
Appl. Math. Optim. 41(2000), 343–364.
[IK17] K. Ito and K. Kunisch, Optimal control of the solid fuel ignition model with
H 1 -cost, SIAM J. Control Optim. 40(2002), 1455–1472.
[IK18] K. Ito and K. Kunisch, BV-type regularization methods for convoluted objects
with edge, flat and grey scales, Inverse Problems 16(2000), 909–928.
[IK19] K. Ito and K. Kunisch, Optimal control, in Encyclopedia of Electrical and
Electronic Engineering, J. G. Webster, ed., 15, John Wiley and Sons, New
York, 1999, 364–379.
[IK20] K. Ito and K. Kunisch, Semi-smooth Newton methods for variational inequalities of the first kind, M2AN Math. Model. Numer. Anal. 37(2003), 41–62.
[IK21] K. Ito and K. Kunisch, The primal-dual active set method for nonlinear optimal
control problems with bilateral constraints, SIAM J. Control Optim. 43(2004),
357–376.
[IK23] K. Ito and K. Kunisch, Parabolic variational inequalities: The Lagrange mul-
tiplier approach, J. Math. Pures Appl. (9) 85(2006), 415–449.
[IKPe] K. Ito, K. Kunisch, and G. Peichl, On the regularization and the numerical
treatment of the inf-sup condition for saddle point problems, Comput. Appl.
Math. 21(2002), 245–274.
[ItKa] K. Ito and F. Kappel, Evolution Equations and Approximations, World Scien-
tific, River Edge, NJ, 2002.
[JaSa] H. Jäger and E. W. Sachs, Global convergence of inexact reduced SQP methods,
Optim. Methods Softw. 7(1997), 83–110.
[KaK] A. Kauffmann and K. Kunisch, Optimal control of the solid fuel ignition model,
ESAIM Proc. 8(2000), 65–76.
[KRe] K. Kunisch and F. Rendl, An infeasible active set method for quadratic prob-
lems with simple bounds, SIAM J. Optim. 14(2003), 35–52.
[KRo] K. Kunisch and A. Rösch, Primal-dual active set strategy for a general class
of optimal control problems, to appear in SIAM J. Optim.
[KSa] K. Kunisch and E. W. Sachs, Reduced SQP methods for parameter identifica-
tion problems, SIAM J. Numer. Anal. 29(1992), 1793–1820.
[KuSt] K. Kunisch and G. Stadler, Generalized Newton methods for the 2D-Signorini contact problem with friction in function space, M2AN Math. Model. Numer. Anal. 39(2005), 827–854.
[KuTa] K. Kunisch and X-. Tai, Sequential and parallel splitting methods for bilinear
control problems in Hilbert spaces, SIAM J. Numer. Anal. 34(1997), 91–118.
[Lio3] J.-L. Lions, Quelques Méthodes de Résolution des Problèmes aux Limites Non Linéaires, Dunod, Paris, 1969.
[MaZo] H. Maurer and J. Zowe, First and second-order necessary and sufficient opti-
mality conditions for infinite-dimensional programming problems, Math. Pro-
gramming 16(1979), 98–110.
[MuSi] F. Murat and J. Simon, Sur le contrôle par un domaine géométrique, Rapport 76015, Université Pierre et Marie Curie, Paris, 1976.
[Pan1] J. S. Pang, Newton’s method for B-differentiable equations, Math. Oper. Res.
15(1990), 311–341.
[PoTr] V. T. Polak and N. Y. Tret'yakov, The method of penalty estimates for conditional extremum problems, Žurnal Vyčislitel'noĭ Matematiki i Matematičeskoĭ Fiziki 13(1973), 34–46.
[Qi] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equa-
tions, Math. Oper. Res. 18(1993), 227–244.
[Rao] M. Raous, Quasistatic Signorini problem with Coulomb friction and coupling to adhesion, in New Developments in Contact Problems, P. Wriggers and P. Panagiotopoulos, eds., CISM Courses and Lectures 384, Springer-Verlag, New York, 1999, 101–178.
[Ro2] S. M. Robinson, Stability theory for systems of inequalities, Part II: Differen-
tiable nonlinear systems, SIAM J. Numer. Anal. 13(1976), 497–513.
[ROF] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise
removal algorithms, Phys. D 60 (1992), 259–268.
[Te] R. Temam, Navier–Stokes Equations: Theory and Numerical Analysis, North–Holland, Amsterdam, 1979.
[Tro] F. Troeltzsch, An SQP method for the optimal control of a nonlinear heat
equation, Control Cybernet. 23(1994), 267–288.