
Lagrange Multiplier Approach to Variational Problems and Applications
Advances in Design and Control
SIAM’s Advances in Design and Control series consists of texts and monographs dealing with all
areas of design and control and their applications. Topics of interest include shape optimization,
multidisciplinary design, trajectory optimization, feedback, and optimal control. The series focuses
on the mathematical and computational aspects of engineering design and control that are usable
in a wide variety of scientific and engineering disciplines.

Editor-in-Chief
Ralph C. Smith, North Carolina State University

Editorial Board
Athanasios C. Antoulas, Rice University
Siva Banda, Air Force Research Laboratory
Belinda A. Batten, Oregon State University
John Betts, The Boeing Company
Stephen L. Campbell, North Carolina State University
Eugene M. Cliff, Virginia Polytechnic Institute and State University
Michel C. Delfour, University of Montreal
Max D. Gunzburger, Florida State University
J. William Helton, University of California, San Diego
Arthur J. Krener, University of California, Davis
Kirsten Morris, University of Waterloo
Richard Murray, California Institute of Technology
Ekkehard Sachs, University of Trier

Series Volumes
Ito, Kazufumi and Kunisch, Karl, Lagrange Multiplier Approach to Variational Problems and
Applications
Xue, Dingyü, Chen, YangQuan, and Atherton, Derek P., Linear Feedback Control: Analysis and
Design with MATLAB
Hanson, Floyd B., Applied Stochastic Processes and Control for Jump-Diffusions: Modeling,
Analysis, and Computation
Michiels, Wim and Niculescu, Silviu-Iulian, Stability and Stabilization of Time-Delay Systems:
An Eigenvalue-Based Approach
Ioannou, Petros and Fidan, Barış, Adaptive Control Tutorial
Bhaya, Amit and Kaszkurewicz, Eugenius, Control Perspectives on Numerical Algorithms and
Matrix Problems
Robinett III, Rush D., Wilson, David G., Eisler, G. Richard, and Hurtado, John E., Applied Dynamic
Programming for Optimization of Dynamical Systems
Huang, J., Nonlinear Output Regulation: Theory and Applications
Haslinger, J. and Mäkinen, R. A. E., Introduction to Shape Optimization: Theory, Approximation,
and Computation
Antoulas, Athanasios C., Approximation of Large-Scale Dynamical Systems
Gunzburger, Max D., Perspectives in Flow Control and Optimization
Delfour, M. C. and Zolésio, J.-P., Shapes and Geometries: Analysis, Differential Calculus, and
Optimization
Betts, John T., Practical Methods for Optimal Control Using Nonlinear Programming
El Ghaoui, Laurent and Niculescu, Silviu-Iulian, eds., Advances in Linear Matrix Inequality Methods
in Control
Helton, J. William and James, Matthew R., Extending H∞ Control to Nonlinear Systems: Control
of Nonlinear Systems to Achieve Performance Objectives
Lagrange Multiplier Approach to Variational Problems and Applications

Kazufumi Ito
North Carolina State University
Raleigh, North Carolina

Karl Kunisch
University of Graz
Graz, Austria

Society for Industrial and Applied Mathematics


Philadelphia
Copyright © 2008 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission
of the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

Trademarked names may be used in this book without the inclusion of a trademark
symbol. These names are used in an editorial context only; no infringement of trademark
is intended.

Library of Congress Cataloging-in-Publication Data

Ito, Kazufumi.
Lagrange multiplier approach to variational problems and applications / Kazufumi Ito,
Karl Kunisch.
p. cm. -- (Advances in design and control ; 15)
Includes bibliographical references and index.
ISBN 978-0-898716-49-8 (pbk. : alk. paper)
1. Linear complementarity problem. 2. Variational inequalities (Mathematics).
3. Multipliers (Mathematical analysis). 4. Lagrangian functions. 5. Mathematical
optimization. I. Kunisch, K. (Karl), 1952– II. Title.

QA402.5.I89 2008
519.3--dc22
2008061103

SIAM is a registered trademark.
To our families:

Junko, Yuka, and Satoru

Brigitte, Katharina, Elisabeth, and Anna



Contents

Preface xi

1 Existence of Lagrange Multipliers 1


1.1 Problem statement and generalities . . . . . . . . . . . . . . . . . . . . . . 1
1.2 A generalized open mapping theorem . . . . . . . . . . . . . . . . . . . . . 3
1.3 Regularity and existence of Lagrange multipliers . . . . . . . . . . . . . . 5
1.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Weakly singular problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Approximation, penalty, and adapted penalty techniques . . . . . . . . . . . 23
1.6.1 Approximation techniques . . . . . . . . . . . . . . . . . . . . . . . 23
1.6.2 Penalty techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 Sensitivity Analysis 27
2.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Implicit function theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Stability results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Lipschitz continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5 Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.6 Application to optimal control of an ordinary differential equation . . . . . 62

3 First Order Augmented Lagrangians for Equality and


Finite Rank Inequality Constraints 65
3.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2 Augmentability and sufficient optimality . . . . . . . . . . . . . . . . . . . 67
3.3 The first order augmented Lagrangian algorithm . . . . . . . . . . . . . . . 75
3.4 Convergence of Algorithm ALM . . . . . . . . . . . . . . . . . . . . . . . 78
3.5 Application to a parameter estimation problem . . . . . . . . . . . . . . . . 82

4 Augmented Lagrangian Methods for Nonsmooth, Convex Optimization 87


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.2 Convex analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.1 Conjugate and biconjugate functionals . . . . . . . . . . . . . . . . 92
4.2.2 Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3 Fenchel duality theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98


4.4 Generalized Yosida–Moreau approximation . . . . . . . . . . . . . . . . . 104


4.5 Optimality systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.6 Augmented Lagrangian method . . . . . . . . . . . . . . . . . . . . . . . . 114
4.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.7.1 Bingham flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.7.2 Image restoration . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.7.3 Elastoplastic problem . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.7.4 Obstacle problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.7.5 Signorini problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.7.6 Friction problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.7.7 L1 -fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.7.8 Control problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

5 Newton and SQP Methods 129


5.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.2 Newton method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.3 SQP and reduced SQP methods . . . . . . . . . . . . . . . . . . . . . . . . 137
5.4 Optimal control of the Navier–Stokes equations . . . . . . . . . . . . . . . 143
5.4.1 Necessary optimality condition . . . . . . . . . . . . . . . . . . . . 145
5.4.2 Sufficient optimality condition . . . . . . . . . . . . . . . . . . . . 147
5.4.3 Newton’s method for (5.4.1) . . . . . . . . . . . . . . . . . . . . . 147
5.5 Newton method for the weakly singular case . . . . . . . . . . . . . . . . . 148

6 Augmented Lagrangian-SQP Methods 155


6.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.2 Equality-constrained problems . . . . . . . . . . . . . . . . . . . . . . . . 156
6.3 Partial elimination of constraints . . . . . . . . . . . . . . . . . . . . . . . 165
6.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.4.1 An introductory example . . . . . . . . . . . . . . . . . . . . . . . 172
6.4.2 A class of nonlinear elliptic optimal control problems . . . . . . . . 174
6.5 Approximation and mesh-independence . . . . . . . . . . . . . . . . . . . 183
6.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

7 The Primal-Dual Active Set Method 189


7.1 Introduction and basic properties . . . . . . . . . . . . . . . . . . . . . . . 189
7.2 Monotone class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.3 Cone sum preserving class . . . . . . . . . . . . . . . . . . . . . . . . . . 197
7.4 Diagonally dominated class . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.5 Bilateral constraints, diagonally dominated class . . . . . . . . . . . . . . 202
7.6 Nonlinear control problems with bilateral constraints . . . . . . . . . . . . 206

8 Semismooth Newton Methods I 215


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
8.2 Semismooth functions in finite dimensions . . . . . . . . . . . . . . . . . . 217
8.2.1 Basic concepts and the semismooth Newton algorithm . . . . . . . . 217
8.2.2 Globalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222


8.2.3 Descent directions . . . . . . . . . . . . . . . . . . . . . . . . . . . 225


8.2.4 A Gauss–Newton algorithm . . . . . . . . . . . . . . . . . . . . . . 228
8.2.5 A nonlinear complementarity problem . . . . . . . . . . . . . . . . 231
8.3 Semismooth functions in infinite-dimensional spaces . . . . . . . . . . . . 234
8.4 The primal-dual active set method as a semismooth Newton method . . . . 240
8.5 Semismooth Newton methods for a class of nonlinear complementarity
problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
8.6 Semismooth Newton methods and regularization . . . . . . . . . . . . . . 246

9 Semismooth Newton Methods II: Applications 253


9.1 BV-based image restoration problems . . . . . . . . . . . . . . . . . . . . 254
9.2 Friction and contact problems in elasticity . . . . . . . . . . . . . . . . . . 263
9.2.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
9.2.2 Contact problem with Tresca friction . . . . . . . . . . . . . . . . . 265
9.2.3 Contact problem with Coulomb friction . . . . . . . . . . . . . . . . 272

10 Parabolic Variational Inequalities 277


10.1 Strong solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
10.2 Regularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
10.3 Continuity of q → y(q) ∈ L∞(Ω) . . . . . . . . . . . . . . . . . . . . 292
10.4 Difference schemes and weak solutions . . . . . . . . . . . . . . . . . . . 297
10.5 Monotone property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

11 Shape Optimization 305


11.1 Problem statement and generalities . . . . . . . . . . . . . . . . . . . . . . 305
11.2 Shape derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
11.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
11.3.1 Elliptic Dirichlet boundary value problem . . . . . . . . . . . . . . 314
11.3.2 Inverse interface problem . . . . . . . . . . . . . . . . . . . . . . . 316
11.3.3 Elliptic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
11.3.4 Navier–Stokes system . . . . . . . . . . . . . . . . . . . . . . . . . 323

Bibliography 327

Index 339


Preface

The objective of this monograph is the treatment of a general class of nonlinear variational
problems of the form
min_{y∈Y, u∈U} f(y, u)
subject to e(y, u) = 0, g(y, u) ∈ K,    (0.0.1)
where f : Y × U → R denotes the cost functional, and e : Y × U → W and g : Y × U → Z
are mappings used to describe equality and inequality constraints. Here Y, U, W, and Z
are Banach spaces and K is a closed convex set in Z. A special choice for K is simple box
constraints
K = {z ∈ Z : φ ≤ z ≤ ψ}, (0.0.2)
where Z is a lattice with ordering ≤, and φ, ψ are elements in Z. Theoretical issues which
will be treated include the existence of minimizers, optimality conditions, Lagrange mul-
tiplier theory, sufficient optimality considerations, and sensitivity analysis of the solutions
to (0.0.1) with respect to perturbations in the problem data. These topics will be covered
mainly in the first part of this monograph. The second part focuses on selected computa-
tional methods for solving the constrained minimization problem (0.0.1). The final chapter
is devoted to the characterization of shape gradients for optimization problems constrained
by partial differential equations.
Problems which fit into the framework of (0.0.1) are quite general and arise in ap-
plication areas that were intensively investigated during the last half century. They include
optimal control problems, structural optimization, inverse and parameter estimation prob-
lems, contact and friction problems, problems in image reconstruction and mathematical
finance, and others. The variable y is often referred to as the state variable and u as the
control or design parameter. The relationship between the variables y and u is described by
e. It typically represents a differential or a functional equation. If e can be used to express
the variable y as a function of u, i.e., y = Φ(u), then (0.0.1) reduces to

min_{u∈U} J(u) = f(Φ(u), u) subject to g(Φ(u), u) ∈ K.    (0.0.3)

This is referred to as the reduced formulation of (0.0.1). In the more general case when y
and u are both independent variables linked by the equation constraint e(y, u) = 0 it will
be convenient at times to introduce x = (y, u) in X = Y × U and to express (0.0.1) as

min_{x∈X} f(x) subject to e(x) = 0, g(x) ∈ K.    (0.0.4)


From a computational perspective it can be advantageous to treat y and u as independent
variables even though a representation of y in terms of u is available.
As an example consider the inverse problem of determining the diffusion parameter
a in
−∇ · (a∇y) = f in Ω, y|∂Ω = 0    (0.0.5)

from the measurement yobs, where Ω is a bounded open set in R² and f ∈ H⁻¹(Ω). A
nonlinear least squares approach to this problem can be formulated as (0.0.1) by choosing
y ∈ Y = H₀¹(Ω), a ∈ U = H¹(Ω) ∩ L∞(Ω), W = H⁻¹(Ω), Z = L²(Ω) and considering

min ∫_Ω (|y − yobs|² + β|∇a|²) dx
subject to (0.0.5) and a̲ ≤ a ≤ ā,    (0.0.6)

where 0 < a̲ < ā < ∞ are lower and upper bounds for the “control” variable a. In this
example, given a ∈ U, the state y ∈ Y can be uniquely determined by solving the elliptic
boundary value problem (0.0.5) for y, and we obtain a problem of type (0.0.3).
The general approach that we follow in this monograph for the analytical as well as the
numerical treatment of (0.0.1) is based on Lagrange multiplier theory. Let us subsequently
suppose that Y, U , and W are Hilbert spaces and discuss the case Z = U and g(y, u) = u,
i.e., the constraint u ∈ K. We assume that f and e are C 1 and denote by fy , fu the Fréchet
derivatives of f with respect to y and u, respectively. Let W ∗ be the dual space of W and
let the duality product be denoted by ⟨·, ·⟩_{W∗,W}. The analogous notation is used for Z and
Z∗. We form the Lagrange functional

L(y, u, λ) = f(y, u) + ⟨e(y, u), λ⟩_{W,W∗},    (0.0.7)

where λ ∈ W ∗ is the Lagrange multiplier associated with the equality constraint e(y, u) = 0
which, for the present discussion, is supposed to exist. It will be shown that a minimizing
pair (y, u) satisfies

Ly(y, u, λ) = fy(y, u) + ey(y, u)∗λ = 0,
⟨fu(y, u) + eu(y, u)∗λ, v − u⟩_{Z∗,Z} ≥ 0 for all v ∈ K,    (0.0.8)
e(y, u) = 0 and u ∈ K.

In the case K = Z the second equation of (0.0.8) results in the equality fu(y, u) +
eu(y, u)∗λ = 0. In this case a first possibility for solving the system (0.0.8) for the unknowns
(y, u, λ) is the use of a direct equation solver of Newton type, for example. Here the
Lagrange multiplier λ is treated as an independent variable just like y and u. Alternatively,
for (y, u) satisfying e(y, u) = 0 and λ satisfying fy(y, u) + ey(y, u)∗λ = 0 the gradient of
J(u) of (0.0.3) can be evaluated as

J′(u) = fu(y, u) + eu(y, u)∗λ.    (0.0.9)

Thus the combined step of determining y ∈ Y for given u ∈ K such that e(y, u) = 0
and finding λ ∈ W∗ satisfying fy(y, u) + ey(y, u)∗λ = 0 for (y, u) ∈ Y × U provides a
possibility for evaluating the gradient of J at u. If in addition u has to satisfy a constraint
of the form u ∈ K, the projected gradient is obtained by projecting fu(y, u) + eu(y, u)∗λ
onto K, and (projected) gradient-based iterative methods can be employed to solve (0.0.1).
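The forward/backward gradient step just described is easy to mirror in code. The following sketch is ours, not the authors': a finite-dimensional model with a linear state equation e(y, u) = Ay − Bu and quadratic cost is assumed. It evaluates J′(u) by one forward solve and one adjoint solve, and uses it in a projected gradient iteration for the box K = {u ≥ 0}.

```python
import numpy as np

# Illustrative finite-dimensional model (an assumption of this sketch, not the
# text's data): e(y, u) = A y - B u = 0, f(y, u) = 0.5|y - y_d|^2 + 0.5*alpha*|u|^2.
rng = np.random.default_rng(0)
n, m, alpha = 20, 5, 1e-2
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # state operator (generically invertible)
B = rng.standard_normal((n, m))                     # control operator
y_d = rng.standard_normal(n)                        # desired state

def gradient(u):
    y = np.linalg.solve(A, B @ u)             # forward step: solve e(y, u) = 0
    lam = np.linalg.solve(A.T, -(y - y_d))    # backward step: f_y + e_y^* lam = 0
    return alpha * u - B.T @ lam              # J'(u) = f_u + e_u^* lam with e_u = -B

# Projected gradient iteration for the box K = {u >= 0}.
S = np.linalg.solve(A, B)                       # reduced map u -> y
step = 1.0 / (alpha + np.linalg.norm(S, 2)**2)  # safe step from a Hessian bound
u = np.zeros(m)
for _ in range(500):
    u = np.maximum(0.0, u - step * gradient(u))
```

Note how the adjoint solve reuses the factorable operator A; this is exactly the forward/backward structure described above.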
In optimal control of differential equations, the multiplier λ is called the adjoint state
and the second equation in (0.0.8) is called the optimality condition. Further the system
(0.0.8) coincides with the celebrated Pontryagin maximum principle. The steps in the
procedure explained above for obtaining the gradient are referred to as the forward equation
step for the state equation (the third equation in (0.0.8)) and the backward equation step for
the adjoint equation (the first equation). This terminology is motivated by time-dependent
problems, where the third equation is an initial value problem and the first one is a problem
with terminal time boundary condition. Lastly, if K is of type (0.0.2), then the second
equation of (0.0.8) can be written as the complementarity condition
fu (y, u) + eu (y, u)∗ λ + η = 0,
(0.0.10)
η = max(0, η + (u − ψ)) + min(0, η + (u − φ)).
Due to the nondifferentiability of the max and min operations classical Newton methods
are not directly applicable to system (0.0.10). Active set methods provide a very efficient
alternative. They turn out to be equivalent to semismooth Newton methods in function
spaces. If appropriate structural conditions are met, including that f be quadratic and
e affine, then such techniques are globally convergent. Moreover, under more general
conditions they exhibit local superlinear convergence. Due to the practical relevance of
problems with box constraints, a significant part of this monograph is devoted to the analysis
of these methods.
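For a finite-dimensional, strictly convex model problem the resulting active set iteration is only a few lines long. In the sketch below (our illustration; H, b, and the bounds are assumed data) the system (0.0.10) is solved for min ½uᵀHu − bᵀu over φ ≤ u ≤ ψ by updating the active sets predicted from η + c(u − ψ) and η + c(u − φ):

```python
import numpy as np

def pdas_box_qp(H, b, phi, psi, c=1.0, maxit=50):
    # Primal-dual active set sketch for min 0.5 u'Hu - b'u over phi <= u <= psi,
    # i.e., the complementarity system H u - b + eta = 0,
    # eta = max(0, eta + c(u - psi)) + min(0, eta + c(u - phi)).
    n = len(b)
    u, eta = np.clip(np.zeros(n), phi, psi), np.zeros(n)
    for _ in range(maxit):
        up = eta + c * (u - psi) > 0          # predicted upper active set
        lo = eta + c * (u - phi) < 0          # predicted lower active set
        inact = ~(up | lo)
        u = np.where(up, psi, np.where(lo, phi, 0.0))
        I, Aset = np.where(inact)[0], np.where(~inact)[0]
        if I.size:                            # eta = 0 on the inactive set
            u[I] = np.linalg.solve(H[np.ix_(I, I)], b[I] - H[np.ix_(I, Aset)] @ u[Aset])
        eta = np.where(inact, 0.0, b - H @ u) # so that H u - b + eta = 0 everywhere
        res = eta - np.maximum(0, eta + c * (u - psi)) - np.minimum(0, eta + c * (u - phi))
        if np.max(np.abs(res)) < 1e-12:       # (0.0.10) satisfied: done
            break
    return u, eta
```

At the solution the residual of (0.0.10) vanishes; the global and local convergence theory behind such iterations is the subject of Chapters 7 and 8.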
A frequently employed alternative to the multiplier approach is given by the penalty
method. To explain the procedure consider the sequence of minimization problems
min_{y∈Y, u∈K} f(y, u) + (ck/2)|e(y, u)|²_W    (0.0.11)
for an increasing sequence of penalty parameters ck. That is, the equality constraint is
eliminated through the quadratic penalty function. This requires solving the unconstrained
minimization problem (0.0.11) over Y × K for a sequence {ck} tending to infinity. Under
mild assumptions it can be shown that the sequence of minimizers (yk , uk ) determined by
(0.0.11) converges to a minimizer of (0.0.1) as ck → ∞. Moreover, (yk , uk ) satisfies
fy(yk, uk) + ey(yk, uk)∗(ck e(yk, uk)) = 0,
⟨fu(yk, uk) + eu(yk, uk)∗(ck e(yk, uk)), v − uk⟩_{Z∗,Z} ≥ 0 for all v ∈ K.    (0.0.12)
Comparing (0.0.12) with (0.0.8) it is natural to ask whether ck e(yk , uk ) tends to a Lagrange
multiplier associated to e(y, u) = 0 as ck → ∞. Indeed this can be shown under suitable
conditions. Despite disadvantages due to slow convergence and possible ill-conditioning for
solving (0.0.12) with large values of ck , the penalty method is widely accepted in practice.
This is due, in part, to the simplicity of the approach and the availability of powerful algo-
rithms for solving (0.0.12) if K = Z, when the inequality in (0.0.12) becomes an equality.
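A minimal numerical illustration of (0.0.11)–(0.0.12), on an assumed toy problem with K = Z (so the variational inequality becomes an equality), shows both the mechanism and the multiplier estimate ck e(yk, uk); scipy's general-purpose minimizer stands in for the inner solver:

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance (assumed): min f(x) = x1^2 + 2 x2^2 subject to e(x) = x1 + x2 - 1 = 0.
f = lambda x: x[0]**2 + 2 * x[1]**2
e = lambda x: x[0] + x[1] - 1.0

x = np.zeros(2)
for c in [1.0, 10.0, 100.0, 1000.0]:       # increasing penalty parameters c_k
    x = minimize(lambda x, c=c: f(x) + 0.5 * c * e(x)**2, x).x  # warm start
    lam_est = c * e(x)                      # c_k e(x_k): Lagrange multiplier estimate
# minimizers tend to x* = (2/3, 1/3) and lam_est -> -4/3 as c_k grows
```

The ill-conditioning alluded to above is visible here too: the Hessian of the penalized functional blows up with c, which is what augmented Lagrangian methods avoid.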
A third methodology is given by duality techniques. They are based on the introduction
of the dual functional

d(λ) = inf_{y∈Y, u∈K} L(y, u, λ)    (0.0.13)


and the duality property

sup_{λ∈W∗} d(λ) = inf_{y∈Y, u∈K} f(y, u) subject to e(y, u) = 0.    (0.0.14)

The duality method for solving (0.0.1) requires minimizing L(y, u, λk ) over (y, u) ∈ Y ×K
and updating λ by means of

λk+1 = λk + αk J e(yk , uk ), (0.0.15)

where
(yk, uk) = argmin_{y∈Y, u∈K} L(y, u, λk),

αk > 0 is an appropriately chosen step size and J denotes the Riesz mapping from W onto
W ∗ . It can be argued that e(yk , uk ) is the gradient of d(λ) at λk and (0.0.15) is in turn a
steepest ascent method for the maximization of d(λ) under appropriate conditions. Such
methods are called primal-dual methods. Despite the fact that the method can be justified
only under fairly restrictive convexity assumptions on L(y, u, λ) with respect to (y, u),
it provides an elegant use of Lagrange multipliers and is a basis for so-called augmented
Lagrangian methods.
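On a toy strictly convex problem (assumed here, not from the text) the update (0.0.15) is transparent: the inner minimization has a closed form and e(yk, uk) is exactly the ascent direction for d:

```python
# Dual ascent (Uzawa-type) sketch for min 0.5 y^2 + 0.5 u^2 s.t. y + u - 1 = 0.
# Here d(lam) = inf L = -lam^2 - lam, maximized at lam = -1/2 with y = u = 1/2.
lam, alpha = 0.0, 0.4
for _ in range(50):
    y = u = -lam                      # (y_k, u_k) = argmin L(., ., lam_k)
    lam += alpha * (y + u - 1.0)      # lam_{k+1} = lam_k + alpha_k e(y_k, u_k)
# lam -> -1/2 and (y, u) -> (1/2, 1/2)
```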
Augmented Lagrangian methods with K = Z are based on the following problem
which is equivalent to (0.0.1):
min_{y∈Y, u∈K} f(y, u) + (c/2)|e(y, u)|²_W
subject to e(y, u) = 0.    (0.0.16)

Under rather mild conditions the quadratic term enhances the local convexity of L(y, u, λ)
in the variables (y, u) for sufficiently large c > 0. It helps the convergence of direct
solvers based on the necessary optimality conditions (0.0.8). To carry this a step further we
introduce the augmented Lagrangian functional

Lc(y, u, λ) = f(y, u) + ⟨e(y, u), λ⟩ + (c/2)|e(y, u)|²_W.    (0.0.17)
The first order augmented Lagrangian method is the primal-dual method applied to (0.0.17),
i.e.,
(yk, uk) = argmin_{y∈Y, u∈K} Lc(y, u, λk),
λk+1 = λk + c J e(yk, uk).    (0.0.18)
Its advantage over the penalty method is attributed to the fact that local convergence of
(yk , uk ) to a minimizer (y, u) of (0.0.1) holds for all sufficiently large and fixed c > 0,
without requiring that c → ∞. As we noted, Lc (y, u, λ) has local convexity properties un-
der well-studied assumptions, and the primal-dual viewpoint is applicable to the multiplier
update. The iterates (yk , uk , λk ) converge linearly to the triple (y, u, λ) satisfying the first
order necessary optimality conditions, and convergence can improve as c > 0 increases.
Due to these attractive characteristics and properties, the method of multipliers and its sub-
sequent Newton-like variants have been recognized as a powerful method for minimization
problems with equality constraints. They constitute an important part of this book.
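Continuing the toy equality-constrained example used for the penalty method above, the first order augmented Lagrangian iteration (0.0.18) converges with a single fixed value c = 10, so no inner ill-conditioning from c → ∞ is incurred (again an illustrative sketch, with scipy as the inner solver):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2 * x[1]**2        # same toy problem as above
e = lambda x: x[0] + x[1] - 1.0
c, lam, x = 10.0, 0.0, np.zeros(2)
for _ in range(20):
    Lc = lambda x: f(x) + lam * e(x) + 0.5 * c * e(x)**2   # cf. (0.0.17)
    x = minimize(Lc, x).x                  # (y_k, u_k) = argmin L_c(., lam_k)
    lam += c * e(x)                        # multiplier update from (0.0.18)
# x -> (2/3, 1/3) and lam -> -4/3 linearly, with c held fixed
```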


In (0.0.16) and (0.0.18) the constraint u ∈ K remained an explicit constraint and was
not augmented. To describe a possibility for augmenting inequalities we return to the general
form g(y, u) ∈ K and consider inequality constraints with finite rank, i.e., Z = Rp and
K = {z ∈ Rp : zi ≤ 0, 1 ≤ i ≤ p}. Then under appropriate conditions the formulation
min_{y∈Y, u∈U, q∈Rᵖ} Lc(y, u, λ) + (μ, g(y, u) − q) + (c/2)|g(y, u) − q|²_Rᵖ
subject to q ≤ 0    (0.0.19)

is equivalent to (0.0.1). Here μ ∈ Rp is the Lagrange variable associated with the inequality
constraint g(y, u) ≤ 0. Minimizing the functional in (0.0.19) over q ≤ 0 results in the
augmented Lagrangian functional
Lc(y, u, λ, μ) = Lc(y, u, λ) + (1/(2c)) |max(0, μ + c g(y, u))|²_Rᵖ − (1/(2c)) |μ|²_Rᵖ,    (0.0.20)
where equality and finite rank inequality constraints are augmented. The corresponding
augmented Lagrangian method is

(yk, uk) = argmin_{y∈Y, u∈U} Lc(y, u, λk, μk),
λk+1 = λk + c J e(yk, uk),    (0.0.21)
μk+1 = max(0, μk + c g(yk, uk)).
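The same loop handles a finite rank inequality through (0.0.20); the only changes are the max-term in the functional and the clipped multiplier update (a toy instance is assumed, with the constraint written as g(x) ≤ 0 and no equality part):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2 * x[1]**2
g = lambda x: 1.0 - x[0] - x[1]            # inequality g(x) <= 0, i.e., p = 1
c, mu, x = 10.0, 0.0, np.zeros(2)
for _ in range(20):
    # inequality part of the augmented Lagrangian (0.0.20)
    Lc = lambda x: f(x) + (max(0.0, mu + c * g(x))**2 - mu**2) / (2.0 * c)
    x = minimize(Lc, x).x
    mu = max(0.0, mu + c * g(x))           # multiplier update from (0.0.21)
# the constraint is active here: x -> (2/3, 1/3) and mu -> 4/3
```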

In the discussion of box-constrained problems we already pointed out the relevance of
numerical methods for nonsmooth problems; see (0.0.10). In many applied variational
problems, for example in mechanics, fluid flow, or image analysis, nonsmooth cost functionals
arise. Consider as a special case the simplified friction problem

min f(y) = ∫_Ω ( (1/2)(|∇y|² + |y|²) − f̃ y ) dx + ∫_Γ g |y| ds over y ∈ H¹(Ω),    (0.0.22)

where Ω is a bounded domain with boundary Γ. This is an unconstrained minimization
problem with X = H¹(Ω) and no control variables. Since the functional is not continuously
differentiable, the necessary optimality condition fy (y) = 0 is not applicable. However,
with the aid of a generalized Lagrange multiplier theory a necessary optimality condition
can be written as
−Δy + y = f̃ in Ω,  ∂y/∂ν = g λ on Γ,    (0.0.23)
|λ(x)| ≤ 1 and λ(x)y(x) = |y(x)| a.e. on Γ.

From a Lagrange multiplier perspective, problems with L¹-type cost functionals as in
(0.0.22) and box-constrained problems are dual to each other. So it comes as no surprise that
again semismooth Newton methods provide an efficient technique for solving (0.0.22) or
(0.0.23). We shall analyze them and provide a theoretical basis for their numerical efficiency.
This monograph also contains the analysis of optimization problems constrained by
partial differential equations which are singular in the sense that the state variable cannot
be differentiated with respect to the control. This, however, does not preclude that the cost
functional is differentiable with respect to the control and that an optimality principle can
be derived. In terms of shape optimization problems this means that the cost functional is
shape differentiable while the state is not differentiable with respect to the shape.
In summary, Lagrange multiplier theory provides a tool for the analysis of general
constrained optimization problems with cost functionals which are not necessarily C 1 and
with state equations which are in some sense singular. It also leads to a theoretical basis
for developing efficient and powerful iterative methods for solving such problems. The
purpose of this monograph is to provide a rather thorough analysis of Lagrange multiplier
theory and to show its impact on the development of numerical algorithms for problems
which are posed in a function space setting.
Let us give a short description of the book for those readers who do not intend to
read it by consecutive chapters. Chapter 1 provides a variety of tools to establish existence
of Lagrange multipliers and is called upon in all the following chapters. Here, as in other
chapters, we do not attempt to give the most general results, nor do we strive for covering the
complete literature. Chapter 2 is devoted to the sensitivity analysis of abstract constrained
nonlinear programming problems and it essentially stands for itself. This chapter is of great
importance, addressing continuity, Lipschitz continuity, and differentiability of the solutions
to optimization and optimal control problems with respect to parameters that appear in the
problem formulation. Such results are not only of theoretical but also of practical importance.
The sensitivity equations have been a starting point for the development of algorithmic
concepts for decades. Nevertheless, readers who are not interested in this topic at first may
skip this chapter without missing technical results which might be needed for later chapters.
Chapters 3, 5, and 6 form a unit which is devoted to smooth optimization problems.
Chapter 3 covers first order augmented Lagrangian methods for optimization problems with
equality and inequality constraints. Here as in the remainder of the book, the equality
constraints that we have in mind typically represent partial differential equations. In fact,
during the period in which this monograph was written, the terminology “PDE-constrained
optimization” emerged. Inverse problems formulated as regularized least squares problems
and optimal control problems for (partial) differential equations are primary examples for
the theories that are discussed here. Chapters 5 and 6 are devoted to second order iter-
ative solution methods for equality-constrained problems. Again the equality constraints
represent partial differential equations. This naturally gives rise to the following situation:
The variables with respect to which the optimization is carried out can be classified into
two groups. One group contains the state variables of the differential equations and the
other group consists of variables which represent the control or input variables for optimal
control problems, or coefficients in parameter estimation problems. If the state variables
are considered as functions of the independent controls, inputs, or coefficients, and the cost
functional in the optimization problem is only considered as a functional of the latter, then
this is referred to as the reduced formulation. Applying a second order method to the
reduced functional we arrive at the Newton method for optimization problems with partial
differential equations as constraints. If both state and control variables are kept as inde-
pendent variables and the optimality system involving primal and adjoint variables, which
are the Lagrange multipliers corresponding to the PDE-constraints, is derived, we arrive at
the sequential quadratic programming (SQP) technique: It essentially consists of applying
a Newton algorithm to the first order necessary optimality conditions. The Newton method
for the reduced formulation and the SQP technique are the focus of Chapter 5. Chapter 6
is devoted to second order augmented Lagrangian techniques which are closely related, as
we shall see, to SQP methods. Here the equation constraint is augmented in a penalty term,
which has the effect of locally convexifying the optimization problem. Since augmented
Lagrangians also involve Lagrange multipliers, there is, however, no necessity to let the
penalty parameter tend to infinity and, in fact, we do not suggest doing so.
A second larger unit is formed by Chapters 4, 7, 8, and 9. Nonsmoothness, primal-dual
active set strategy, and semismooth Newton methods are the keywords which characterize
the contents of these chapters. Chapter 4 is essentially a recapture of concepts from convex
analysis in a format that is used in the remaining chapters. A key result is the formulation
of differential inclusions which arise in optimality systems by means of nondifferentiable
equations which are derived from Yosida–Moreau approximations and which will serve
as the basis for the primal-dual active set strategy. Chapter 7 is devoted to the primal-
dual active set strategy and its global convergence properties for unilaterally and bilaterally
constrained problems. The local analysis of the primal-dual active set strategy is achieved
in the framework of semismooth Newton methods in Chapter 8. It contains the notion of
Newton derivative and establishes local superlinear convergence of the Newton method
for problems which do not satisfy the classical sufficient conditions for local quadratic
convergence. Two important classes of applications of semismooth Newton methods are
considered in Chapter 9: image restoration and deconvolution problems regularized by the
bounded variation (BV) functional and friction and contact problems in elasticity.
Chapter 10 is devoted to a Lagrangian treatment of parabolic variational inequali-
ties in unbounded domains as they arise in the Black–Scholes equation, for example. It
contains the use of monotone techniques for analyzing parabolic systems without relying
on compactness assumptions in a Gelfand-triple framework. In Chapter 11 we provide
a calculus for obtaining the shape derivative of the cost functional in shape optimization
problems which bypasses the need for using the shape derivative of the state variables of the
partial differential equations. It makes use of the expansion technique that is proposed in
Chapters 1 and 5 for weakly singular optimal control problems, and of the assumption that
an appropriately defined adjoint equation admits a solution. This provides a versatile tech-
nique for evaluating the shape derivative of the cost functional using Lagrange multiplier
techniques.
There are many additional topics which would fit under the title of this monograph
which, however, we chose not to include. In particular, issues of discretization, convergence,
and rate of convergence are not discussed. Here the issue of proper discretization of adjoint
equations consistent with the discretization of the primal equation and the consistent time
integration of the adjoint equations must be mentioned. We do not enter into the discussion
of whether to discretize an infinite-dimensional nonlinear programming problem first and
then to decide on an iterative algorithm to solve the finite-dimensional problems, or the other
way around, consisting of devising an optimization algorithm for the infinite-dimensional
problem which is subsequently discretized. It is desirable to choose a discretization and
an iterative optimization strategy in such a manner that these two approaches commute.
Discontinuous Galerkin methods are well suited for this purpose; see, e.g., [BeMeVe].
Another important area which is not in the focus of this monograph is the efficient solution
of those large scale linear systems which arise in optimization algorithms. We refer the
reader to, e.g., [BGHW, BGHKW], and the literature cited there. The solution of large
scale time-dependent optimal control problems involving the coupled system of primal and
adjoint equations, which need to be solved in opposite directions with respect to time, still
offers a significant challenge, despite the advances that were made with multiple shooting,
receding horizon, and time-domain decomposition techniques. From the point of view of
optimization theory there are several topics as well into which one could expand. These
include globalization strategies, like trust region methods, exact penalty methods, quasi-
Newton methods, and a more abstract Lagrange multiplier theory than that presented in
Chapter 1.
As a final comment we stress that for a full treatment of a variational problem in func-
tion spaces, both its infinite-dimensional analysis as well as its proper discretization and the
relation between the two are indispensable. Proceeding from an infinite-dimensional prob-
lem directly to its discretization without such a treatment, important issues can be missed. For
instance discretization without a well-posed analysis may result in the use of inappropriate
inner products, which may lead to unnecessary ill-conditioning, which entails unneces-
sary preconditioning. Inconsiderate discretization may also result in the loss of structural
properties, as for instance symmetry properties.


Chapter 1

Existence of Lagrange Multipliers

1.1 Problem statement and generalities


We consider the constrained optimization problem

min f (x)
(1.1.1)
subject to x ∈ C and g(x) ∈ K,

where f is a real-valued functional defined on a real Banach space X; further C is a closed
convex subset of X, g is a mapping from X into the real Banach space Z, and K is a closed
convex cone in Z. Throughout the monograph it is understood that the vertex of a cone
coincides with the origin. Further, in this chapter it is assumed that (1.1.1) admits a solution
x ∗ . Here we refer to x ∗ as solution if it is feasible, i.e., x ∗ satisfies the constraints in (1.1.1),
and f has a local minimum at x ∗ . It is assumed that

f is Fréchet differentiable at x ∗ , and
(1.1.2)
g is continuously Fréchet differentiable at x ∗

with Fréchet derivatives denoted by f′(x∗) and g′(x∗), respectively. The set of feasible
points for (1.1.1) will be denoted by M = {x ∈ C : g(x) ∈ K}. Moreover, the following
notation will be used. The topological duals of X and Z are denoted by X ∗ and Z ∗ . For the
subset A of X the polar or dual cone is defined by

A⁺ = {x∗ ∈ X∗ : ⟨x∗, a⟩ ≤ 0 for all a ∈ A},

where ⟨·, ·⟩ (= ⟨·, ·⟩_{X∗,X}) denotes the duality pairing between X∗ and X. Further for x ∈ X
the conical hull of C \ {x} is given by

C(x) = {λ(c − x) : c ∈ C, λ ≥ 0},

and K(z) with z ∈ Z is defined analogously. The Lagrange functional L : X × Z∗ → R
associated to (1.1.1) is defined by

L(x, λ) = f(x) + ⟨λ, g(x)⟩_{Z∗,Z}.


Definition 1.1. An element λ∗ ∈ Z∗ is called a Lagrange multiplier for (1.1.1) at the
solution x∗ if λ∗ ∈ K⁺, ⟨λ∗, g(x∗)⟩ = 0, and

f′(x∗) + λ∗ ◦ g′(x∗) ∈ −C(x∗)⁺.

Lagrange multipliers are of fundamental importance for expressing necessary as well
as sufficient optimality conditions for (1.1.1) and for the development of algorithms to
solve (1.1.1). If C = X, then the inclusion in Definition 1.1 results in the equation

f′(x∗) + λ∗ ◦ g′(x∗) = 0.

For the existence analysis of Lagrange multipliers one makes use of the following
tangent cones which approximate the feasible set M at x∗:

T(M, x∗) = { x ∈ X : x = lim_{n→∞} (1/tn)(xn − x∗), tn → 0⁺, xn ∈ M },
L(M, x∗) = { x ∈ X : x ∈ C(x∗) and g′(x∗)x ∈ K(g(x∗)) }.


The cone T (M, x ∗ ) is called the sequential tangent cone (or Bouligand cone) and L(M, x ∗ )
the linearizing cone of M at x ∗ .

Proposition 1.2. If x∗ is a solution to (1.1.1), then f′(x∗)x ≥ 0 for all x ∈ T(M, x∗).

Proof. Let x ∈ T(M, x∗). Then there exist sequences {xn} in M and {tn} with
limn→∞ tn = 0 such that x = limn→∞ (1/tn)(xn − x∗). By the definition of Fréchet
differentiability there exists a sequence {rn} in R such that

f′(x∗)x = lim_{n→∞} (1/tn) f′(x∗)(xn − x∗) = lim_{n→∞} (1/tn)(f(xn) − f(x∗) + rn)

and lim_{n→∞} rn/|xn − x∗| = 0. By optimality of x∗ we have

f′(x∗)x ≥ lim_{n→∞} (|x|/|xn − x∗|) rn = 0

and the result follows.

We now briefly outline the main steps involved in the abstract approach to prove the
existence of Lagrange multipliers. For this purpose we assume that C = X, K = {0}
so that the set of feasible points is described by M = {x : g(x) = 0}. We assume that
g′(x∗) : X → Z is surjective. Then by Lyusternik's theorem [Ja, We] we have

L(M, x∗) ⊂ T(M, x∗).    (1.1.3)

Consider the convex cone

B = {(f′(x∗)x + r, g′(x∗)x) : r ≥ 0, x ∈ X} ⊂ R × Z.


Observe that (0, 0) ∈ B and that due to Proposition 1.2 and (1.1.3) the origin (0, 0) is a
boundary point of B. Since g′(x∗) is surjective, B has nonempty interior and hence by the
Eidelheit separation theorem that we shall recall below there exists a closed hyperplane in
R × Z which supports B at (0, 0), i.e., there exists 0 ≠ (α, λ∗) ∈ R × Z∗ such that

α(f′(x∗)x + r) + ⟨λ∗, g′(x∗)x⟩ ≥ 0 for all (r, x) ∈ R⁺ × X.

Setting x = 0 we have α ≥ 0. If α = 0, then λ∗ ◦ g′(x∗) = 0, which would imply
λ∗ = 0, which is impossible. Consequently α > 0 and without loss of generality we
can assume α = 1. Hence λ∗ is a Lagrange multiplier. For the general case of equality
and inequality constraints the application of Lyusternik's theorem is replaced by stability
results for solutions of systems of inequalities. The existence of a separating hyperplane
will follow from a generalized open mapping theorem, which is covered in the following
section.
The separation theorem announced above is presented next.

Theorem 1.3. Let K1 and K2 be nontrivial, convex, and disjoint subsets of a normed linear
space X. If K1 is closed and K2 is compact, then there exists a closed hyperplane strictly
separating K1 and K2, i.e., there exist x∗ ∈ X∗, β ∈ R, and ε > 0 such that

⟨x∗, x⟩ ≤ β − ε for all x ∈ K1 and ⟨x∗, x⟩ ≥ β + ε for all x ∈ K2.

If K1 is open and K2 is arbitrary, then these inequalities still hold with ε = 0.

Let us give a brief description of the sections of this chapter. Section 1.2 contains the
open mapping theorem already alluded to above. Regularity properties which guarantee
the existence of a Lagrange multiplier in general nonlinear optimal programming problems
in infinite dimensions are analyzed in section 1.3 and applications to parameter estimation
and optimal control problems are described in section 1.4. Section 1.5 is devoted to the
derivation of a first order optimality system for a class of weakly singular optimal control
problems and to several applications which are covered by this class. Finally in section 1.6
several further alternatives for deriving optimality systems are presented in an informal
manner.

1.2 A generalized open mapping theorem


Throughout this section T denotes a bounded linear operator from X to Z. As in the previous
section C stands for a closed convex set in X, and K stands for a closed convex cone in
Z. For Theorem 1.4 below it is sufficient that K is a closed convex set. For ρ > 0 we set
Xρ = {x : |x| ≤ ρ} and Zρ is defined analogously. Recall that by the open mapping theorem,
surjectivity of T implies that T maps open sets of X onto open sets of Z. As a consequence
T X1 contains Zρ for some ρ > 0. The following theorem generalizes this result.

Theorem 1.4. Let x̄ ∈ C and ȳ ∈ K, where K is a closed convex set. Then the following
statements are equivalent:
(a) Z = T C(x̄) − K(ȳ),


(b) there exists ρ > 0 such that Zρ ⊂ T((C − {x̄}) ∩ X1) − (K − {ȳ}) ∩ Z1,

where C(x̄) = {λ(x − x̄) : λ ≥ 0, x ∈ C} and K(ȳ) = {k − λȳ : λ ≥ 0, k ∈ K}.

Proof. Clearly (b) implies (a). To prove the converse we first show that

C(x̄) = ⋃_{n∈N} n((C − x̄) ∩ X1).    (1.2.1)

The inclusion ⊃ is obvious and hence it suffices to prove that C(x̄) is contained in the set
on the right-hand side of (1.2.1). Let x ∈ C(x̄). Then x = λy with λ ≥ 0 and y ∈ C − x̄.
Without loss of generality we can assume that λ > 0 and |y| > 1. Convexity of C implies
that (1/|y|)y ∈ (C − x̄) ∩ X1. Let n ∈ N be such that n ≥ λ|y|. Then x = (λ|y|)(1/|y|)y ∈
n((C − x̄) ∩ X1), and (1.2.1) is verified.
For α > 0 let Aα = α(T((C − x̄) ∩ X1) − (K − ȳ) ∩ Z1). We will show that 0 ∈ int Ā1.
In fact, from (1.2.1) and the analogous equality with C replaced by K it follows from (a)
that

Z = ⋃_{n∈N} nT((C − x̄) ∩ X1) − ⋃_{n∈N} n((K − ȳ) ∩ Z1).

This implies that

Z = ⋃_{n∈N} An = ⋃_{n∈N} Ān.    (1.2.2)

Thus the complete space Z is the countable union of the closed sets Ān. By the Baire category
theorem there exists some m ∈ N such that int Ām ≠ ∅. Let a ∈ int Ām. By (1.2.2) there
exists k ∈ N with −a ∈ Āk. Using Āα = αĀ1 this implies that −(m/k)a ∈ Ām. It follows that
the half-open interval {λ(−(m/k)a) + (1 − λ)a : 0 ≤ λ < 1} belongs to int Ām and consequently
0 ∈ int Ām = m int Ā1. Thus we have

0 ∈ int Ā1.    (1.2.3)

Hence for some ρ > 0

Zρ ⊂ (1/2)Ā1 = Ā_{1/2} ⊂ A_{1/2} + Z_{ρ/2},    (1.2.4)

and consequently for all i = 0, 1, . . .

Z_{(1/2)^i ρ} = (1/2)^i Zρ ⊂ (1/2)^i (A_{1/2} + Z_{ρ/2}) = A_{(1/2)^{i+1}} + Z_{(1/2)^{i+1} ρ}.    (1.2.5)

Now choose y ∈ Zρ. By (1.2.4) there exist x1 ∈ (C − x̄) ∩ X1, y1 ∈ (K − ȳ) ∩ Z1,
and r1 ∈ Z_{ρ/2} such that y = T((1/2)x1) − (1/2)y1 + r1.
Applying (1.2.5) with i = 1 to r1 implies the existence of x2 ∈ (C − x̄) ∩ X1,
y2 ∈ (K − ȳ) ∩ Z1, and r2 ∈ Z_{(1/2)²ρ} such that

y = T((1/2)x1 + (1/2)²x2) − ((1/2)y1 + (1/2)²y2) + r2.


Repeated application of this argument implies that y = T un − vn + rn, where

un = Σ_{i=1}^n (1/2)^i xi with xi ∈ (C − x̄) ∩ X1,
vn = Σ_{i=1}^n (1/2)^i yi with yi ∈ (K − ȳ) ∩ Z1,

and rn ∈ Z_{(1/2)^n ρ}. Observe that

un = (1/2)x1 + · · · + (1/2)^n xn + (1 − Σ_{i=1}^n (1/2)^i) · 0 ∈ (C − x̄) ∩ X1

and

|un − un+m| ≤ Σ_{i=n+1}^{n+m} (1/2)^i ≤ (1/2)^n.

Consequently {un} is a Cauchy sequence and there exists some x ∈ (C − x̄) ∩ X1 such
that limn→∞ un = x. Similarly there exists v ∈ (K − ȳ) ∩ Z1 such that limn→∞ vn = v.
Moreover limn→∞ rn = 0 and continuity of T implies that y = T x − v.

1.3 Regularity and existence of Lagrange multipliers


In this section existence of a Lagrange multiplier for the optimization problem (1.1.1) will
be established using a regular point condition which was proposed in the publications of
Robinson for sensitivity analysis of generalized equations and later by Kurcyusz, Zowe,
and Maurer for the existence of Lagrange multipliers.

Definition 1.5. The element x̄ ∈ M is called regular (or alternatively x̄ satisfies the regular
point condition) if

0 ∈ int {g′(x̄)(C − x̄) − K + g(x̄)}.    (1.3.1)

If (1.3.1) holds, then clearly

0 ∈ int {g′(x̄)C(x̄) − K(g(x̄))}.    (1.3.2)

Note that g′(x̄)C(x̄) − K(g(x̄)) is a cone. Consequently (1.3.2) implies that

g′(x̄)C(x̄) − K(g(x̄)) = Z.    (1.3.3)

Finally (1.3.3) implies (1.3.1) by Theorem 1.4. Thus, the three conditions (1.3.1)–(1.3.3)
are equivalent. Condition (1.3.1) is used in the work of Robinson (see, e.g., [Ro2]) on
the stability of solutions to systems of inequalities, and (1.3.3) is the regularity condition
employed in [MaZo, ZoKu] to guarantee the existence of Lagrange multipliers.


Remark 1.3.1. If C = X, then the so-called Slater condition g′(x̄)h ∈ int K(g(x̄)) for
some h ∈ X implies (1.3.1). In case C = X = Rⁿ, Z = Rᵐ, K = {y ∈ Rᵐ : yi = 0
for i = 1, . . . , k, and yi ≤ 0 for i = k + 1, . . . , m}, g = (g1, . . . , gk, gk+1, . . . , gm),
the constraint g(x) ∈ K amounts to k equality and m − k inequality constraints, and the
regularity condition (1.3.3) becomes

the gradients {gi′(x̄)}, i = 1, . . . , k, are linearly independent and
there exists x ∈ X such that
gi′(x̄)x = 0 for i = 1, . . . , k,    (1.3.4)
gi′(x̄)x < 0 for i = k + 1, . . . , m with gi(x̄) = 0.

This regularity condition is referred to as the Mangasarian–Fromowitz constraint qualification.
Other variants of conditions which imply existence of Lagrange multipliers are named
after F. John, Karush–Kuhn–Tucker (KKT), and Arrow, Hurwicz, Uzawa.

Theorem 1.6. If the solution x ∗ of (1.1.1) is regular, then there exists an associated Lagrange
multiplier λ∗ ∈ Z ∗ .

Proof. By Proposition 1.2 we have

f′(x∗)x ≥ 0 for all x ∈ T(M, x∗).

From Theorem 1 and Corollary 2 in [We], which can be viewed as generalizations of
Lyusternik's theorem, it follows that L(M, x∗) ⊂ T(M, x∗) and hence

f′(x∗)x ≥ 0 for all x ∈ L(M, x∗).    (1.3.5)

We define the set

B = {(f′(x∗)x + r, g′(x∗)x − y) : r ≥ 0, x ∈ C(x∗), y ∈ K(g(x∗))} ⊂ R × Z.

This is a convex cone containing the origin (0, 0) ∈ R × Z. Due to (1.3.5) the origin is a
boundary point of B. By the regular point condition and Theorem 1.4 with T = g′(x∗) and
ȳ = g(x∗) there exists ρ > 0 such that

{(α, y) : α ≥ max{f′(x∗)x : x ∈ (C − x∗) ∩ X1} and y ∈ Zρ} ⊂ B,

and hence int B ≠ ∅. Consequently there exists a hyperplane in R × Z which supports B
at (0, 0), i.e., there exists (α, λ∗) ≠ (0, 0) ∈ R × Z∗ such that

α(f′(x∗)x + r) + λ∗(g′(x∗)x − y) ≥ 0 for all x ∈ C(x∗), y ∈ K(g(x∗)), r ≥ 0.    (1.3.6)

Setting (r, x) = (0, 0) this implies that ⟨λ∗, y⟩ ≤ 0 for all y ∈ K(g(x∗)) and consequently

λ∗ ∈ K⁺ and λ∗ g(x∗) = 0.    (1.3.7)


The choice (x, y) = (0, 0) in (1.3.6) implies α ≥ 0. If α = 0, then due to regularity of x∗
we have λ∗ = 0, which is impossible. Consequently α > 0, and without loss of generality
we can assume that α = 1. Finally with (r, y) = (0, 0), inequality (1.3.6) implies

f′(x∗) + λ∗ g′(x∗) ∈ −C(x∗)⁺.

This concludes the proof.

We close this subsection with two archetypical problems.

Equality-constrained problem. We consider

min f(x)
subject to g(x) = 0.    (1.3.8)

Let x∗ denote a local solution at which (1.1.2) is satisfied and such that g′(x∗) : X → Z is
surjective. Then there exists λ∗ ∈ Z∗ such that the KKT condition

f′(x∗) + λ∗ g′(x∗) = 0 in X∗

holds. Surjectivity of g′(x∗) is of course only a sufficient, not a necessary, condition for
existence of a Lagrange multiplier. This topic is taken up in Example 1.12 and below.
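As noted in the preface, one can also solve the KKT system directly with a Newton-type solver, treating λ as an unknown alongside x. A toy sketch (the problem data are our assumption; for a quadratic cost and affine g a single step suffices):

```python
import numpy as np

# Newton iteration on the KKT system F(x, lam) = (f'(x) + lam g'(x), g(x)) = 0
# for an assumed toy instance: f(x) = x1^2 + 2 x2^2, g(x) = x1 + x2 - 1 (Z = R).
def F(v):
    x1, x2, lam = v
    return np.array([2 * x1 + lam, 4 * x2 + lam, x1 + x2 - 1.0])

def dF(v):
    return np.array([[2.0, 0.0, 1.0],
                     [0.0, 4.0, 1.0],
                     [1.0, 1.0, 0.0]])   # bordered (KKT) matrix

v = np.zeros(3)
for _ in range(10):
    v = v - np.linalg.solve(dF(v), F(v))
# v -> (2/3, 1/3, -4/3): the solution and the Lagrange multiplier jointly
```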

Inequality-constrained problem. This is the inequality-constrained problem with the
so-called unilateral box, or simple constraints. We suppose that Z = L²(Ω) with Ω a
domain in Rⁿ and consider

min f(x)
subject to g(x(s)) ≤ 0 for a.e. s ∈ Ω.    (1.3.9)

Setting C = X and K = {v ∈ L²(Ω) : v ≤ 0}, this problem becomes a special case
of (1.1.1). Let x∗ denote a local solution at which (1.1.2) is satisfied and such that g′(x∗) :
X → Z is surjective. Then there exists λ∗ ∈ L²(Ω) such that the KKT conditions

f′(x∗) + λ∗ g′(x∗) = 0 in X∗,
λ∗(s) ≥ 0, g(x∗(s)) ≤ 0, λ∗(s)g(x∗(s)) = 0 for a.e. s ∈ Ω    (1.3.10)

hold. Again surjectivity of g′(x∗) is a sufficient, but not a necessary, condition for (1.3.10)
to hold.
For later use let us consider the case where the functionals f′(x∗) ∈ X∗ and λ∗ ∈ Z∗
are represented by duality pairings, i.e.,

f′(x∗)(x) = ⟨JX f′(x∗), x⟩_{X∗,X},  λ∗(z) = ⟨JZ λ∗, z⟩_{Z∗,Z}

for x ∈ X and z ∈ Z. Here JX is the canonical isomorphism from L(X, R) into the
space of representations of continuous linear functionals on X with respect to the ⟨·, ·⟩_{X∗,X}
duality pairing, and JZ is defined analogously. In finite dimensions these isomorphisms are
transpositions. With this notation f′(x∗) + λ∗ g′(x∗) = 0 can be expressed as

JX f′(x∗) + g′(x∗)∗ JZ λ∗ = 0.
Henceforth the notation of the isomorphisms will be omitted.


1.4 Applications
This section is devoted to a discussion of examples in which the general existence result of
the previous section is applicable, as well as to other examples in which it fails. Throughout
Ω denotes a bounded domain in Rⁿ with Lipschitz continuous boundary ∂Ω. We use standard
Hilbert space notation as introduced in [Ad], for example.
Differently from the previous sections we shall use J to denote the cost functional
and f for inhomogeneities in equality constraints representing partial differential equations.
Moreover the constraints g(x) ∈ K from the previous sections will be split into equality
constraints, denoted by e(x) = 0, and inequality constraints, g(x) ∈ K, with K a nontrivial
cone.
Example 1.7. Consider the unilateral obstacle problem

min J(y) = (1/2) ∫_Ω |∇y|² dx − ∫_Ω f y dx
over y ∈ H₀¹(Ω) and y(x) ≤ ψ(x) for a.e. x ∈ Ω,    (1.4.1)

where f ∈ L²(Ω) and ψ ∈ H₀¹(Ω). This is a special case of (1.1.1) with X = Z =
H₀¹(Ω), K = {ϕ ∈ H₀¹(Ω) : ϕ(x) ≤ 0 for a.e. x ∈ Ω}, C = X, and g(y) = y − ψ. It is well
known and simple to argue that (1.4.1) admits a unique solution y∗ ∈ H₀¹(Ω). Moreover,
since g′ is the identity, every element of X satisfies the regular point condition. Hence by
Theorem 1.6 there exists λ∗ ∈ H₀¹(Ω)∗ such that

⟨λ∗, ϕ⟩_{H⁻¹,H₀¹} ≤ 0 for all ϕ ∈ K,
⟨λ∗, y∗ − ψ⟩_{H⁻¹,H₀¹} = 0,
−Δy∗ + λ∗ = f in H⁻¹,

where H⁻¹ = H₀¹(Ω)∗. Under well-known regularity assumptions on ψ and ∂Ω, cf. [Fr,
IK1], for example, λ∗ ∈ L²(Ω). This extra smoothness of the Lagrange multiplier does not
follow from the general multiplier theory of the previous section.
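To see the multiplier concretely, the following sketch computes y∗ and λ∗ with a primal-dual active set iteration. It is our illustration, not part of the text: a uniform-grid finite difference discretization of (1.4.1) on Ω = (0, 1) and the particular data f and ψ are assumptions. At convergence λ∗ ≥ 0 is supported on the contact set {y∗ = ψ} and −Δy∗ + λ∗ = f holds discretely.

```python
import numpy as np

n, h = 99, 1.0 / 100
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2        # discrete -Laplacian, zero BC
x = np.linspace(h, 1.0 - h, n)
fvec = 10.0 * np.ones(n)                          # right-hand side f (assumed)
psi = 0.05 + 0.1 * (x - 0.5)**2                   # obstacle psi (assumed)

y, lam, c = np.zeros(n), np.zeros(n), 1.0
for _ in range(30):
    contact = lam + c * (y - psi) > 0             # predicted contact set {y = psi}
    I, Ct = np.where(~contact)[0], np.where(contact)[0]
    y = np.where(contact, psi, 0.0)
    if I.size:                                    # off contact: lam = 0 and A y = f
        y[I] = np.linalg.solve(A[np.ix_(I, I)], fvec[I] - A[np.ix_(I, Ct)] @ y[Ct])
    lam = np.where(contact, fvec - A @ y, 0.0)    # on contact: lam = f + Delta y
# at convergence: y <= psi, lam >= 0, lam * (y - psi) = 0, and A y + lam = f
```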
The following result is useful to establish the regular point condition in diverse optimal
control and parameter estimation problems. We require some notation. Let L be a closed
convex cone in the real Hilbert space U which induces an ordering denoted according to
u ≥ 0 if u ∈ L. Let q∗ ∈ U∗ be a nontrivial bounded linear functional on U, with
π : U → ker q∗ the orthogonal projection. For ϕ ∈ U, μ ∈ R, and γ ∈ R⁺ define

g : U → Z := U × R × R

by

g(u) = (u − ϕ, |u|² − γ², q∗(u) − μ),

where | · | denotes the norm in U, and put K = L × R⁻ × {0} ⊂ Z.

Proposition 1.8. Assume that [ker q∗]⊥ ∩ L ≠ {0} and let h0 ∈ [ker q∗]⊥ ∩ L with
q∗(h0) = 1. If q∗(L) ⊂ R⁺, q∗(ϕ) < μ, and |π(ϕ)|² + μ²|h0|² < γ², then the set

M = {u ∈ U : g(u) ∈ K}


is nonempty and every element u ∈ M is regular, i.e.,

0 ∈ int {g(u) + g′(u)U − K} for each u ∈ M.    (1.4.2)

Proof. Note that U = ker q∗ ⊕ span{h0} and that the orthogonal projection onto [ker q∗]⊥
is given by q∗(u)h0 for u ∈ U. To show that M is nonempty, we define

û = ϕ + (μ − q∗(ϕ))h0.

Observe that

û − ϕ = (μ − q∗(ϕ))h0 ∈ L,
|û|² = |π(ϕ)|² + μ²|h0|² < γ²,

and q∗(û) = μ. Thus û ∈ M. Next let u ∈ M be arbitrary. We have to verify that

0 ∈ int {(u − ϕ + h − L, |u|² − γ² + 2(u, h) − R⁻, q∗(h)) : h ∈ U} ⊂ Z,    (1.4.3)

where (·, ·) denotes the inner product in U. Put

δ = min{ (γ² − |π(ϕ)|² − μ²|h0|²) / (1 + 2γ + 2γ|h0|), (μ − q∗(ϕ)) / (|q∗| + 1) },

and define B = {(ũ, r̃, s̃) ∈ Z : |(ũ, r̃, s̃)|_Z ≤ δ}. Without loss of generality we endow
the product space with the supremum norm in this proof. We shall show that

B ⊂ {(u − ϕ + h1 + σh0 − L, |u|² − γ² + 2(u, h1 + σh0) − R⁻, σ q∗(h0)) :
h1 ∈ ker q∗, σ ∈ R},    (1.4.4)

which implies (1.4.3). Let (ũ, r̃, s̃) ∈ B be chosen arbitrarily. We put σ = s̃ and verify that
there exists (h1, l, r⁻) ∈ ker q∗ × L × R⁻ such that

(u − ϕ + h1 + s̃h0 − l, |u|² − γ² + 2(u, h1 + s̃h0) − r⁻) = (ũ, r̃).    (1.4.5)

This will imply the claim. Observe that by the choice of δ we find μ − q∗(ϕ) −
q∗(ũ) + s̃ ≥ 0. Hence l is defined by

l = (μ − q∗(ϕ) − q∗(ũ) + s̃)h0 ∈ L.

Choosing h1 = π(ϕ − u + ũ) we obtain equality in the first coordinate of (1.4.5). For
the second coordinate in (1.4.5) we observe that

|u|² − γ² + 2(u, h1 + s̃h0)
= |πu|² + μ²|h0|² − γ² + 2(u, π(ϕ − u + ũ) + s̃h0)
≤ |πu|² + μ²|h0|² − γ² + |πu|² + |πϕ|² − 2|πu|² + 2γ(|ũ| + |s̃||h0|)
≤ μ²|h0|² − γ² + |πϕ|² + 2δγ(1 + |h0|) ≤ −δ

by the definition of δ. Hence there exists r⁻ ∈ R⁻ such that equality holds in the second
coordinate of (1.4.5).


Remark 1.4.1. If the constraint q∗(u) = μ is not present in the definition of M, then the
conclusion of Proposition 1.8 holds provided that |ϕ| < γ.

Example 1.9. We consider a least squares formulation for the estimation of the potential c in

−Δy + cy = f in Ω,
y = 0 on ∂Ω    (1.4.6)

from an observation z ∈ L²(Ω), where f ∈ L²(Ω) and Ω ⊂ Rⁿ, with n ≤ 4. For this
purpose we set

M = {c ∈ L²(Ω) : c(x) ≥ 0, |c| ≤ γ}

for some γ > 0 and consider

min J(c) = (1/2)|y(c) − z|² + (α/2)|c|²
subject to c ∈ M and y(c) solution to (1.4.6).    (1.4.7)

The Lax–Milgram theorem implies the existence of a variational solution y(c) ∈ H₀¹(Ω)
for every c ∈ M. Here we use the fact that H¹(Ω) ⊂ L⁴(Ω) and cy ∈ L^{4/3}(Ω) for
c ∈ L²(Ω), y ∈ H¹(Ω), for n ≤ 4. Proposition 1.8 with L = {c ∈ L²(Ω) : c(x) ≥ 0} and

g(c) = (c, |c|² − γ²)

is applicable and implies that every element of M is a regular point. Using subsequential
weak limit arguments one can argue the existence of a solution c∗ to (1.4.7). To apply
Theorem 1.6 Fréchet differentiability of c → J(c) at c∗ must be verified. This will follow
from Fréchet differentiability of c → y(c) at every c ∈ M, which in turn can be argued by
means of the implicit function theorem, for example. The Fréchet derivative of c → y(c)
at c ∈ M in direction h ∈ L²(Ω), denoted by y′(c)h, satisfies

−Δ(y′(c)h) + c y′(c)h = −h y(c) in Ω,
y′(c)h = 0 on ∂Ω.    (1.4.8)

With these preliminaries we obtain a Lagrange multiplier (μ1 , μ2 ) ∈ −L × R+ such


that the following first order necessary optimality system holds:
⎧ ∗ 

⎪ y(c ) − z, y (c∗ )h L2 + α (c∗ , h) − (μ1 , h) + 2μ2 (c∗ , h) = 0


  for all h ∈ L2 (),
(1.4.9)
−(μ1 , c ) + μ2 c − γ = 0,
∗ ∗ 2




(μ1 , μ2 ) ∈ L × R+ .
The first equation in (1.4.9) can be simplified by introducing p(c∗ ) as the variational solution
to the adjoint equation given by

− p + c∗ p = − (y(c∗ ) − z) in ,
(1.4.10)
p=0 on ∂.

i i

i i
ItoKunisc
i i
2008/6/12
page 11
i i

1.4. Applications 11

Using (1.4.8) and (1.4.10) in (1.4.9) we obtain the optimality system


⎧ ∗

⎪ p(c ) y(c∗ ) + αc∗ − μ1 + 2μ2 c∗ = 0,
⎨ 2
(μ1 , μ2 ) ∈ −L × R+ , c∗ , c∗ − γ 2 ∈ L × R− , (1.4.11)
⎪  

⎩(μ1 , c∗ ) = 0, μ2 ( c∗ 2 − γ 2 = 0.

Example 1.10. We revisit Example 1.9 but this time the state variable is considered as an
independent variable. This results in the following problem with equality and inequality
constraints: ⎧
⎪ 1 α
⎨min J (y, c) = |y − z|2 + |c|2
2 2 (1.4.12)

⎩ −
subject to e(y, c) = 0 and g(c) ∈ L × R ,
where g is as in Example 1.9 and e : H01 () × L2 () → H −1 () is defined by
e(y, c) = − y + cy − f.
Note that cy ∈ L () ⊂ H −1 () for c ∈ L2 (), y ∈ H 1 () if n ≤ 4. Let
4
3

(c∗ , y ∗ ) ∈ L2 () × H01 () denote a solution of (1.4.12). Fréchet differentiability of J, e,


and g is obvious for this formulation with y as independent variable. The regular point
condition now has to be established for the constraint
 
e
∈ K = {0} × L × R− ⊂ H −1 () × L2 () × R.
g
The second and third components are treated as in Example 1.9 by means of Propo-
sition 1.8 and the first coordinate is incorporated by an existence argument for (1.4.6). The
details are left to the reader. Consequently there exists a Lagrange multiplier (p, μ1 μ2 ) ∈
H01 () × L × R+ such that the derivative of
L(y, c, p, μ1 , μ2 )
 
2
= J (y, c) + e(y, c), p
H −1 ,H01 + (μ1 , −c) + μ2 c∗ − γ 2

with respect to (y, c) is zero at (y ∗ , c∗ , p, μ1 , μ2 ). This implies that p is the variational


solution in H01 () to

− p + c∗ p = −(y ∗ − z) in ,
p=0 on ∂
and
py ∗ + αc∗ − μ1 + 2μ2 c∗ = 0.
In addition, of course,
 
2
(μ1 , μ2 ) ∈ −L × R+ , c∗ , c∗ − γ 2 ∈ L × R− ,
 
2
(μ1 , c∗ ) = 0, μ2 c∗ − γ = 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 12
i i

12 Chapter 1. Existence of Lagrange Multipliers

Thus the first order optimality condition derived in Example 1.9 above and the present
one coincide, and the adjoint variable of Example 1.9 becomes the Lagrange multiplier of
the equality constraint e(y, c) = 0.
Example 1.11. We consider a least squares formulation for the estimation of the diffusion
coefficient a in

−(ayx )x + cy = f in  = (0, 1),
(1.4.13)
y(0) = y(1) = 0,

where f ∈ L2 (0, 1) and c ∈ L2 (), c ≥ 0, are fixed. We assume that a is known outside
of an interval I = (β, γ ) with 0 < β < γ < 1 and that it coincides there with the values
of a known function ν ∈ H 1 (0, 1), which satisfies min {ν(x) : x ∈ (0, 1)} > 0. The set of
admissible parameters is then defined by

M = a ∈ H 1 (I ) : a ≥ ν, a − â H 1 (I ) ≤ γ , a(β) = ν(β),
 γ 
a(γ ) = ν(γ ), and a dx = m .
β

Here â ∈ H 1 (I ) is a fixed reference parameter with â(β) = ν(β), â(γ ) = ν(γ ), and
m = β â dx. Recall that H 1 (I ) ⊂ C(I¯) and hence a(β) and a(γ ) are well defined. The
γ

coefficient a appearing in (1.4.13) coincides with ν on the complement of I and with some
element of M on I . Consider the least squares formulation
⎧ 2 2
⎨min 12 y(a) − z L2 (0,1) + α2 a H 1 (I )
(1.4.14)

subject to a ∈ M and y(a) solution to (1.4.13),

where z ∈ L2 (0, 1) is given. It is simple to argue the existence of a solution a ∗ to (1.4.14).


Note that minimizing over M implies that the mean of the coefficient a is known. This,
together with some additional technical assumptions, implies that the solution a ∗ is (locally)
stable with respect to perturbations of z [CoKu]. To argue the existence of Lagrange mul-
tipliers associated to the constraints defining M one considers the set of shifted parameters
M̃ = {a − â : a ∈ M}, i.e.,
  γ 


M̃ = a ∈ H (I ) : a ≥ ν − â, a H 1 (I ) ≤ γ ,
1
a dx = 0, a(β) = a(γ ) = 0 .
β

We apply Proposition 1.8 with U = and (|u|2L2 (I ) + |ux |2L2 (I ) )1/2 as norm,
H01 (I ),

L = {a ∈ U : a ≥ 0}, q ∗ (a) = β a dx, ϕ = ν − â ∈ U , and μ = 0. The mapping
γ

g : U → U × R × R is given by
 2 
g(a) = a − ϕ, a H 1 (I ) − γ 2 , q ∗ (a) .

Let us assume that β ν dx < m and ν − â H 1 (I ) < γ . Then we have q ∗ (L) ⊂

R+ , q ∗ (ϕ) < m − β âdx = μ, and

πϕ 1 = π(ν − â) 1 ≤ ν − â 1 < γ .
H (I ) H (I ) H (I )

i i

i i
ItoKunisc
i i
2008/6/12
page 13
i i

1.4. Applications 13

It remains to ascertain that [ker q ∗ ]⊥ ∩ L is nonempty. Let ψ be the unique solution of



−ψxx + ψ = 1 on (β, γ ),
ψ(β) = ψ(γ ) = 0.

By the maximum principle ψ ∈ L. A short calculation shows that ψ ∈ [ker q ∗ ]⊥ and


hence h0 := q ∗ (ψ)−1 ψ satisfies h0 ∈ [ker q ∗ ]⊥ ∩ L and q ∗ (h0 ) = 1.

Next we present examples where the general existence result of Section 1.3 is not
applicable. Different techniques to be developed in the following sections will guarantee,
however, the existence of a Lagrange multiplier. In these examples we shall again use e for
equality and g for inequality constraints.

Example 1.12. Consider the problem



min x12 + x32
(1.4.15)
subject to e(x1 , x2 , x3 ) = 0,

where
 
x1 − x22 − x3
e(x1 , x2 , x3 ) = .
x22 − x32

Here X = R3 , Z = R2 , e of Section 1.3 is e here, and K = {0}. Note that x ∗ = (0, 0, 0) is a


solution of (1.4.15) and that ∇e(x ∗ ) is not surjective. Hence x ∗ is not a regular point for the
constraint e(x) = 0. Nevertheless (0, λ2 ) with λ2 ⊂ R arbitrary is a Lagrange multiplier
for the constraint in (1.4.15).
Note that in case C = X and K = {0}, the last requirement in Definition 1.1 becomes

f (x ∗ ) + e (x ∗ )∗ λ∗ = 0 in X ∗ ,

where e (x ∗ )∗ : Z ∗ → X ∗ is the adjoint of e (x ∗ ). Hence if e (x ∗ ) is not surjective and if it


has closed range, then ker e (x ∗ )∗ is not empty and the Lagrange multiplier is not unique.

Example 1.13. Here we consider optimal control of the equation with elliptic nonlinearity
(Bratu problem) 
− y + exp(y) = u in ,
(1.4.16)
∂y
∂n
= 0 on and y = 0 on ∂ \ ,
where  is a bounded domain in Rn , n ≤ 3, with smooth boundary ∂ and is a connected,
strict subset of ∂. Let H 1 () = {y ∈ H 1 () : y = 0 on ∂ \ }. The following lemma
establishes the concept of solution to (1.4.16). Its proof is given at the end of this section.

Lemma 1.14. For every u ∈ L2 () the variational problem

(∇y, ∇v) + exp y, v


H 1 ()∗ ,H 1 () = (u, v) for v ∈ H 1 ()

has a unique solution y = y(u) ∈ H 1 () and there exists a constant C such that

|y|H 1 ∩L∞ ≤ C(|u|L2 () + C) for all u ∈ L2 () (1.4.17)

i i

i i
ItoKunisc
i i
2008/6/12
page 14
i i

14 Chapter 1. Existence of Lagrange Multipliers

and

|y(u1 ) − y(u2 )|H 1 ∩L∞ ≤ C|u1 − u2 |L2 () for all ui ∈ L2 (), i = 1, 2. (1.4.18)

The optimal control problem is given by


⎧ 2 2

⎪ min J (y, u) = 21 |y − z L2 () + α2 |u L2 ()



 
⎪ subject to (y, u) ∈ H 1 () ∩ L∞ () × L2 () (1.4.19)




and (y, u) solution to (1.4.17),

where z ∈ L2 (). To consider this problem as a special case of (1.1.1) we set X =


Y × L2 (), where Y = H 1 () ∩ L∞ (), Z = H 1 ()∗ , K = {0}, and define e : X → Z ∗
as the mapping which assigns to (y, u) ∈ Y × U the functional

v → (∇y, ∇v) + (exp(y), v) − (u, v).

The control component of any minimizing sequence {(yn , un )} for J is bounded. Together
with Lemma 1.14 it is therefore simple to argue that (1.4.19) admits a solution (y ∗ , u∗ ) ∈
Y × U . Clearly J and e are continuously Fréchet differentiable. If (y ∗ , u∗ ) was a regular
point, then this would require surjectivity of e (y ∗ , u∗ ) : X → H 1 ()∗ . For (δy, δu) ∈
X, e (y ∗ , u∗ )(δy, δu) is the functional defined by

v → (∇δy, ∇v) + (exp(y ∗ )δy − δu, v), v ∈ H 1 ().

Since − +exp(y ∗ ) is an isomorphism from H 1 () to H 1 ()∗ and H 1 () is not contained
in L∞ (), if n > 1, it follows that e (y ∗ , u∗ ) : X → H 1 ()∗ is not surjective if n > 1.

Example 1.15. We consider the least squares problem for the estimation of the vector-
valued convection coefficient u in

− y + u · ∇y = f in ,
(1.4.20)
y = 0 on ∂, div u = 0,

from data z ∈ L2 (). Here  is a bounded domain in Rn , n ≤ 3, with smooth boundary and

f ∈ L 2 (). To cast the problem
 in abstract form we choose U = u ∈ L2n () : div u = 0 ,
X = H01 () ∩ L∞ () ×U, Z = H −1 (), K = {0} and define e : X → Z as the mapping
which assigns to (y, u) ∈ X the functional

v → (∇y, ∇v) − (uy, ∇v) − (f, v) for v ∈ H01 ().

Note that e(y, u) is not well defined for (y, u) ∈ H01 () × U , since u · ∇y ∈ L1 () only.
The regularized least squares problem is given by

⎨min J (y, u) = 2 |y − z|L2 () + 2 |u|L2n ()
1 2 α 2

(1.4.21)

subject to e(y, u) = 0, div u = 0, (y, u) ∈ X.

i i

i i
ItoKunisc
i i
2008/6/12
page 15
i i

1.4. Applications 15

Observe that (u · ∇y, y) = (∇ · (uy), y) = 0. Using this fact and techniques similar to
those of the proof of Lemma 1.14 (compare [Tr, Section 2.3] and [IK14, IK15]) it can be
shown that for every u ∈ U there exists a unique solution y = y(u) ∈ X of (1.4.20) and for
every bounded subset B of U there exists a constant k(B) such that

|y(u)|X ≤ k(B)|u|L2n for all u ∈ B. (1.4.22)

Extracting a subsequence of a minimizing sequence to (1.4.21) it is simple to argue the


existence of a solution (y ∗ , u∗ ) ∈ X to (1.4.21). Clearly e is Fréchet differentiable at
(y ∗ , u∗ ) and for δy ∈ H01 () ∩ L∞ (), ey (y ∗ , u∗ )δy ∈ H −1 is the functional defined by
 
ey (y ∗ , u∗ )δy, v H −1 ,H 1 = (∇δy, ∇v) + (u∗ · ∇δy, v) with v ∈ H01 ().
0

Note that ey (y ∗ , u∗ ) is well defined on H01 () ∩ L∞ (). But as a consequence of the
bilinear term u∗ · ∇δy which is only in L1 (), ey (y ∗ , u∗ ) is not defined on H01 (). The
operator e (y ∗ , u∗ ) from X to H −1 () is not surjective if n > 1, and hence (y ∗ , u∗ ) does
not satisfy the regular point condition.

We now turn to the proof of Lemma 1.14 for which we require the following result
from [Tr].

Lemma 1.16. Let ϕ : (k1 , h1 ) → R be a nonnegative, nonincreasing function and suppose


that there are positive constants r, K, and β, with β > 1, such that

ϕ(h) ≤ K(h − k)−r ϕ(k)β for k1 < k < h < h1 .


1 β β−1
If k̂ := K r 2 β−1 ϕ(k1 ) r satisfies k1 + k̂ < h1 , then ϕ(k1 + k̂) = 0.

Proof of Lemma 1.14. Let us first argue the existence of a solution y = y(u) ∈ of

(∇y, ∇v) + (ey , v)H 1 ()∗ ,H 1 () = (u, v) for all v ∈ H 1 (). (1.4.23)

Since (exp (s1 ) − exp (s2 ))(s1 − s2 ) ≥ 0 for s1 , s2 ∈ R, it follows that − φ + exp (φ) −
γ I defines a maximal monotone operator from a subset of H 1 () to H 1 ()∗ for some
γ > 0 [Ba], and hence (1.4.23) has a unique variational solution y = y(u) ∈ H 1 () for
every u ∈ L2 (); see [IK15] for details. Since

|∇(y(u1 ) − y(u2 ))|2 ≤ (u1 − u2 , y(u1 ) − y(u2 ))

and ∂ \ is nonempty, there exists an embedding constant C such that

|y(u1 ) − y(u2 )|H 1 () ≤ C |u1 − u2 |L2 () (1.4.24)

for u1 , u2 in L2 (), and

|y(u)|H 1 ≤ C(|u|L2 () + C) for all u ∈ L2 ().

Throughout the remainder of the proof C will denote a generic constant, independent
of u ∈ L2 (). To verify (1.4.17) it remains to obtain an L∞ () bound for y = y(u). The

i i

i i
ItoKunisc
i i
2008/6/12
page 16
i i

16 Chapter 1. Existence of Lagrange Multipliers

proof is based on a generalization of well-known L∞ () estimates due to Stampacchia and


Miranda [Tr] for linear variational problems to the nonlinear problem (1.4.23). Let us aim
first for a pointwise (a.e.) upper bound for y. For k ∈ (0, ∞) we set yk = (y − k)+ and
k = {x ∈  : yk > 0}. Note that yk ∈ H 1 () and yk ≥ 0. Using (1.4.23) we find
(∇yk , ∇yk ) = (∇y, ∇yk ) = (u, yk ) − (ey , yk ) ≤ (u, yk ),
and hence
|∇yk |2L2 ≤ |u|L2 |yk |L2 . (1.4.25)
By Hölder’s inequality and a well-known embedding result
  12
1 1
|yk |L2 = yk2 ≤ |yk |L6 |k | 3 ≤ C|∇yk |L2 |k | 3 .
k

Here we used the assumption that n ≤ 3. Employing this estimate in (1.4.25) implies that
1
|∇yk |L2 ≤ C|k | 3 |u|L2 . (1.4.26)
We denote by h and k arbitrary real numbers satisfying 0 < k < h < ∞, and we find
 
|yk |L4 =
4
(y − k) >
4
(y − k)4 ≥ |h |(h − k)4 ,
k h

which, combined with (1.4.26), gives

|h | ≤ Ĉ(h − k)−4 |k | 3 |u|4L2 ,


4
(1.4.27)

where the constant Ĉ is independent of h, k, and u. It will be shown that Lemma 1.16 is
applicable to (1.4.27) with ϕ(k) = |k |, β = 43 and K = Ĉ|u|4L2 . The conditions on k1 and
1 β β−1
h1 can easily be satisfied. In fact, in our case k1 = 0, h1 = ∞, and k̂ = Ĉ 4 |u|L2 2 β−1 |0 | 4 .
The condition k1 + k̂ < h1 is satisfied since
1 β β−1 1 β β−1
k̂ = Ĉ 4 |u|L2 2 β−1 |0 | 4 < Ĉ 4 |u|L2 2 β−1 || 4 < ∞.
We conclude that |k̂ | = 0 and hence y ≤ k̂ a.e. in . A uniform lower bound on
y can be obtained in an analogous manner by considering yk = (−(k + y))+ . We leave
the details to the reader. This concludes the proof of (1.4.17). To verify (1.4.18) the H 1
estimate for y(u1 ) − y(u2 ) is already clear from (1.4.24) and it remains to verify the L∞ ()
estimate. Let us set yi = y(ui ), z = y1 − y2 , zk = (z − k)+ , and k = {x ∈  : zk > 0}
for k ∈ (0, ∞). We obtain
|∇zk |2L2 = (∇z, ∇zk ) = (u1 − u2 , zk ) − (ey1 − ey2 , zk ) ≤ (u1 − u2 , zk ).
Proceeding as above with y and yk replaced by z and zk the desired pointwise upper bound
for y1 − y2 is obtained. For the lower bound we define zk = (−(k + z))+ for k ∈ (0, ∞)
and k = {x ∈  : zk > 0} = {x : k + y1 (x) < y2 (x)}. It follows that
|∇zk |2L2 = −(∇(y1 − y2 ), ∇zk ) = (ey1 − ey2 , zk ) − (u1 − u2 , zk ) ≤ −(u1 − u2 , zk ).
From this inequality we obtain the desired uniform pointwise lower bound on
y1 − y2 .

i i

i i
ItoKunisc
i i
2008/6/12
page 17
i i

1.5. Weakly singular problems 17

1.5 Weakly singular problems


We consider the optimization problem with equality and inequality constraints of the type

min J (y, u)
(1.5.1)
subject to e(y, u) = 0, u ∈ C ⊂ U,
with J : Y × U → R, e : Y1 × U → W , where Y, U, W are Hilbert spaces and Y1 is a
Banach space that is densely embedded in Y . Further C is a closed convex subset of U .
Below W ∗ will denote the dual of W . Let (y ∗ , u∗ ) denote a local solution to (1.5.1) and let
V (y ∗ ) × V (u∗ ) ⊂ Y1 × U denote a neighborhood of (y ∗ , u∗ ) such that J (y ∗ , u∗ ) ≤ J (y, u)
for all (y, u) ∈ V (y ∗ ) × (V (u∗ ) ∩ C) satisfying e(y, u) = 0. It is assumed throughout that
J is Fréchet differentiable in a neighborhood in the Y × U topology of (y ∗ , u∗ ) and that
the Fréchet derivative is locally Lipschitz continuous. Further e is assumed to be Fréchet
differentiable at (y ∗ , u∗ ) with Fréchet derivative
e (y ∗ , u∗ )(δy, δu) = ey (y ∗ , u∗ )δy + eu (y ∗ , u∗ )δu.
In particular ey (y ∗ , u∗ ) ∈ L(Y1 , W ). Since Y1 is dense in Y , one may consider ey (y ∗ , u∗ )
as densely defined linear operator with domain in Y . To distinguish this operator from
ey (y ∗ , u∗ ) defined on Y1 we shall denote it in this section by
G : D(G) ⊂ Y → W
and we assume that
(H1) G∗ : D(G∗ ) ⊂ W ∗ → Y ∗ is densely defined.
Then necessarily G is closable [Ka, p. 168]. Its closure will be denoted by the same symbol.
In addition the following assumptions will be required:
(H2) Jy (y ∗ , u∗ ) ∈ Rg G∗ .
Condition (H2) is a regularity assumption. It implies the existence of a solution λ∗ ∈ D(G∗ )
to the adjoint equation
G∗ λ + Jy (y ∗ , u∗ ) = 0,
which is a Lagrange multiplier associated to the constraint e(y, u) = 0.
(H3) There exists a dense subset D of C with the following property: For every u ∈ D
there exists tu > 0 such that for all t ∈ [0, tu ] there exists y(t) ∈ Y1 satisfying
e(y(t), u∗ + t (u − u∗ )) = 0 and
1
lim+ |y(t) − y ∗ |2Y = 0. (1.5.2)
t→0 t

(H4) For every u ∈ D and y(·) as in (H3), e is directionally differentiable at every element
of {(y ∗ + s(y(t) − y ∗ ), u∗ + st (u − u∗ )) : s ∈ [0, 1], t ∈ [0, tu ]} in all directions
(ỹ, ũ) ∈ Y1 × U and
 1 
1 ∗ ∗ ∗ ∗ ∗ ∗ ∗
lim [e (y +s(y(t)−y ), u +stv)−e (y , u )](y(t)−y , tv)ds, λ = 0,
t→0+ t 0 W,W ∗

where v = u − u∗ .

i i

i i
ItoKunisc
i i
2008/6/12
page 18
i i

18 Chapter 1. Existence of Lagrange Multipliers

Note that (H4) is satisfied if (H3) holds with Y replaced by Y1 and if e : Y1 → W is Fréchet
differentiable with locally Lipschitzian derivative.
Our assumptions do not require surjectivity of e (y ∗ , u∗ ) : Y1 × U  → W which is
required by (1.3.1) nor that e (y ∗ , u∗ ) is well defined on all of Y × U . Below we shall give
several examples which illustrate the applicability of the assumptions. For now, we argue
that if they are satisfied, then a Lagrange multiplier with respect to the equality constraint
exists.

Theorem 1.17. Let (y ∗ , u∗ ) be a local solution of (1.5.1) and assume that (H1)–(H4) hold.
Then
⎧ ∗ ∗

⎨e(y ,u ) = 0 in W (primal equation),
∗ ∗ ∗ ∗ ∗
G λ + J y (y , u ) = 0  in Y (adjoint equation),

⎩ eu (y , u ) λ + Ju (y , u ), u − u
∗ ∗ ∗ ∗ ∗ ∗ ∗
≥ 0 for all u ∈ C (optimality).
∗ U ,U
(1.5.3)

The system of equations (1.5.3) is referred to as an optimality system.

Proof. Let u ∈ D, set v = u − u∗ , choose tu according to (H3), and assume that t ∈ (0, tu ].
Due to (H3) and (H4)

0 = e(y(t), u∗ + tv) − e(y ∗ , u∗ ) = G(y(t) − y ∗ ) + t eu (y ∗ , u∗ ) v


 1 (1.5.4)
+ [e (y ∗ + s(y(t) − y ∗ ), u∗ + stv) − e (y ∗ , u∗ )](y(t) − y ∗ , tv)ds.
0

(H2) implies the existence of a solution λ∗ to the adjoint equation. Observe that by (1.5.4)
and the fact that u∗ is a local solution to (1.5.1)

0 ≤ J (y(t), u∗ + tv) − J (y ∗ , u∗ ) = J (y ∗ , u∗ )(y(t) − y ∗ , tv)


 1
 ∗
+ J (y + s(y(t) − y ∗ ), u∗ + stv) − J (y ∗ , u∗ ) (y(t) − y ∗ , tv)ds
0
+ G∗ λ∗ , y(t) − y ∗
Y ∗ ,Y + t eu (y ∗ , u∗ )v, λ∗
W,W ∗
 1 
 ∗
+ e (y + s(y(t) − y ∗ ), u∗ + stv) − e (y ∗ , u∗ ) (y(t) − y ∗ , tv), λ∗ ds.
0 W,W ∗

By the second equation in (1.5.3), local Lipschitz continuous differentiability of J , (H3),


(H4), and the fact that J (y ∗ , u∗ )(y(t) − y ∗ , tv) = Jy (y ∗ , u∗ )(y(t) − y ∗ ) + tJu (y ∗ , u∗ )v
we obtain
1
0 ≤ lim+ J (y(t), u∗ + tv) − J (y ∗ , u∗ )) = Ju (y ∗ , u∗ )(u∗ ) + eu (y ∗ , u∗ )∗ λ∗ , u − u∗
U ∗ ,U .
t→0 t

Since u is arbitrary in D and D is dense in C it follows that

Ju (y ∗ , u∗ )(u∗ ) + eu (y ∗ , u∗ )∗ λ∗ , u − u∗
U ∗ ,U ≥ 0 for all u ∈ C.

This ends the proof.

i i

i i
ItoKunisc
i i
2008/6/12
page 19
i i

1.5. Weakly singular problems 19

We next give several examples which demonstrate the applicability of hypotheses


(H1)–(H4) and the necessity to allow for two spaces Y1 and Y with Y1  Y . The typical
situation that we have in mind is Y1 = Y ∩ L∞ () with Y a Hilbertian function space
over .

Example 1.18. Consider first the finite-dimensional equality-constrained optimization


problem

⎨min y12 + u2 ,
y1 − y22 = u, (1.5.5)
⎩ 3
y2 = u2

for which (y ∗ , u∗ ) = (0, 0, 0) is the solution. Here Y = R2 , U = R1 , W = R1 , and


 
y − y2 − u
e(y, u) = 1 3 2 2 .
y2 − u

Note that with (y1 , y2 , u) = (x1 , x2 , x3 ) this problem coincides with Example 1.12. We
recall that e (y ∗ , u∗ ) is not surjective and the theory of Section 1.3 assuring
  the existence
of a Lagrange multiplier is therefore not applicable. However, G∗ = 10 00 and the adjoint
equation
   
λ 0
G∗ 1 =
λ2 0
 
has infinitely many solutions. Thus (H1), (H2) are satisfied. As for (H3) note that yy12 (u) =
 u+u 34  (u)

2 defines a solution branch to e(y, u) = 0 for which (H3) is satisfied. It is simple to


u3
verify (H4). Hence (1.5.5) is an example for an optimization problem where all hypotheses
(H1)–(H4) are satisfied and Theorem 1.17 is applicable.

Example 1.19. We consider the optimal control problems with distributed control
1 α
min J (y, u) = |y − z|2L2 () + |u|2L2 () (1.5.6)
2 2
subject to

⎨− y + exp(y) = u in ,
∂y
= 0 on , (1.5.7)
⎩ ∂n
y = 0 on ∂ \ ,

where α > 0, z ∈ L2 (),  is a bounded domain in Rn , n ≤ 3, with smooth boundary ∂,


and is a connected, strict subset of ∂. The concept of solution to (1.5.7) is the variational
one of Lemma 1.14. To consider this problem in the general setting of this section the control
variable u is chosen in U = L2 (). We set Y = H 1 () = {y ∈ H 1 () : y = 0 on ∂\ },
Y1 = H 1 () ∩ L∞ (), and W = (H 1 ())∗ . Moreover e : Y1 × U → W is defined by
assigning to (y, u) ∈ Y1 × U the functional on W ∗ given by

v → (∇y, ∇v) + (exp y, v) − (u, v) for v ∈ H 1 ().

i i

i i
ItoKunisc
i i
2008/6/12
page 20
i i

20 Chapter 1. Existence of Lagrange Multipliers

Let (y ∗ , u∗ ) ∈ Y1 denote a solution to (1.5.6), (1.5.7). For (y, u) ∈ Y1 × L2 () the


Fréchet derivative ey (y, u)δy of e with respect to y in direction δy ∈ H 1 () is given by the
functional v → (∇δy, ∇v) + ((exp y)δy, v). Clearly ey (y, u) : Y → W is symmetric and
(H1), (H2) are satisfied. (H3) is a direct consequence of Lemma 1.14. To verify (H4) note
that e (y, u)(δy, δu) is the functional defined by

v → (∇δy, ∇v) + (exp(y)δy, v) − (δu, v) for v ∈ H 1 ().

If (y, u) ∈ Y1 × U , then e (y, u) is well defined on Y × U . (H4) requires us to consider


 1 
1
lim (exp(y ∗ + s(y(t) − y ∗ )) − exp y ∗ )(y(t) − y ∗ )λ∗ dxds, (1.5.8)
t→0+ t 0 

where y(t) is the solution to (1.5.7) with u = u∗ + tv, v ∈ U . Note that λ∗ ∈ L∞ and that
{|y(t)|L∞ : t ∈ [0, 1]} is bounded by Lemma 1.14. Moreover |y(t) − y ∗ |Y1 ≤ c t|v|L2 for a
constant c independent of t ∈ [0, 1], and thus the pointwise local Lipschitz property of the
exponential function implies that the limit in (1.5.8) is zero. (H4) now easily follows.
The considerations of this example remain correct for cost functionals J that are much
more general than the one in (1.5.6). In fact, it suffices that J is weakly lower semicontinuous
from Y ×U to R and radially unbounded with respect to u, i.e., limn→∞ |un |L2 = ∞ implies
that lim supn→∞ inf y∈Y J (y, un ) = ∞. This guarantees existence of a solution (y ∗ , u∗ ).
The general regularity assumptions and (H1)–(H4) are satisfied if J : Y × U → R is
continuous and Fréchet differentiable in a neighborhood of (y ∗ , u∗ ) with locally Lipschitz
continuous derivative.

Example 1.20. Consider the nonlinear optimal boundary control problem

1 α
min J (y, u) = |y − z|2L2 () + |u|2H s ( ) (1.5.9)
2 2
subject to

⎨− y + exp y = f in ,
∂y
= u on , (1.5.10)
⎩ ∂n
y = 0 on ∂ \ ,

where α > 0, z ∈ L2 (), f ∈ L∞ are fixed,  is a bounded domain in Rn , with smooth


boundary ∂, and is a nonempty, connected, strict subset of ∂. Further s is a real
number strictly larger than n−3
2
if n ≥ 3, and s = 0 if n < 3. Differently from Example 1.19
the dimension n of  is now allowed to be arbitrary. This example can be treated within the
general framework of this book by setting Y, Y1 , W as in Example 1.19. The control space
U is chosen to be H s ( ). To verify (H1)–(H4) one proceeds as in Example 1.19 by utilizing
the following lemma. Its proof is given in [IK15] and is similar to that of Lemma 1.14.

Lemma 1.21. The variational problem

(∇y, ∇v) + (exp y, v) = (f, v) + (u, v) , v ∈ H 1 (),

i i

i i
ItoKunisc
i i
2008/6/12
page 21
i i

1.5. Weakly singular problems 21

has a unique solution y = y(u) ∈ H 1 () ∩ L∞ () for every u ∈ H s ( ), and there exists
a constant c such that
|y|H 1 ∩L∞ ≤ c(|u|H s ( ) + c) for all u ∈ H s ( ). (1.5.11)
Moreover, c can be chosen such that
|y(u1 ) − y(u2 )|H 1 ()∩L∞ ≤ c|u1 − u2 |H s for all ui ∈ H s ( ), i = 1, 2. (1.5.12)

Example 1.22. We reconsider (1.4.21) from Example 1.15 as a special case of (1.5.1). For
this purpose we set Y = H01 (), Y1 = H01 () ∩ L∞ (), W = H −1 (), and e as in
Example 1.15. Let (y ∗ , u∗ ) ∈ Y1 × U denote a solution of (1.4.21). Clearly e is Fréchet
differentiable at (y ∗ , u∗ ) and the partial derivative G = ey (y ∗ , u∗ ) ∈ L(Y1 , W ) is the
functional characterized by
ey (y ∗ , u∗ )δy, v
W,W ∗ = (∇δy, ∇v) + (u∗ · ∇δy, v), v ∈ W ∗ = H01 ().
As a consequence of the quadratic term u∗ · ∇δy which is only in L1 (), G is not defined
on all of Y = H01 (). As an operator from Y1 to W , the operator G is not surjective.
Considered as an operator with domain in Y , its adjoint is given by
G∗ w = − w − ∇ · (u∗ w).
The domain of G∗ contains Y1 and hence G∗ is densely defined. Moreover its range con-
tains L2 () and thus (H1) as well as (H2) are satisfied. Let U (u∗ ) ⊂ U be a bounded
neighborhood of u∗ . Since for every u ∈ U (u∗ )
(∇(y(u) − y ∗ ), ∇v) − (u(y(u) − y ∗ ), ∇v) = ((u − u∗ )y ∗ , ∇v) for all v ∈ H01 (),
it follows that there exists a constant k > 0 such that
|y(u) − y ∗ |H 1 ≤ k|u − u∗ |L2n for all u ∈ U (u∗ ), (1.5.13)
and (H3) follows. The validity of (H4) is a consequence of (1.5.13) and the fact that λ∗ is
the unique variational solution in H01 () to
− λ∗ − ∇ · (u∗ λ∗ ) = −(y ∗ − z)
and hence an element of L∞ ().

Remark 1.5.1. Comparing Examples 1.19 and 1.20 with Example 1.22 we observe that the
linearization e (y, u), with (y, u) ∈ Y1 × U , is well defined on Y × U for Examples 1.19
and 1.20 but it is only defined with domain strictly contained in Y × U for Example 1.22.
For none of these examples is e defined on all of Y × U .

Example 1.23. Here we consider the nonlinear optimal control problem with nonlinearity
of blowup type:

⎪ min 12 |∇(y − z)|2L2 () + α2 |u|2L2 ( ) subject to

⎨ 2
− y − exp y = f in ,
(1.5.14)

⎩ ∂n = u on ,
∂y

y = 0 on ∂ \ ,

i i

i i
ItoKunisc
i i
2008/6/12
page 22
i i

22 Chapter 1. Existence of Lagrange Multipliers

where α > 0, z ∈ H 1 (), f ∈ L2 (),  is a smooth bounded domain in R2 , and ⊂ ∂


is a connected strict subset of ∂. Since  is assumed to be a two-dimensional domain we
have the following property of the exponential function: for every p ∈ [1, ∞),
{| exp y|Lp : y ∈ B} is bounded (1.5.15)
provided that B is a bounded subset of H01 () [GT, p. 155]. The variational form of the
boundary value problem in (1.5.14) is given by
(∇y, ∇v) − (exp y, v) = (f, v) + (u, v) for all v ∈ H 1 (), (1.5.16)
where H 1 () is defined in Example 1.19. To argue existence of a solution to (1.5.14) let
{(yn , un )} be a minimizing sequence with weak limit (y ∗ , u∗ ) ∈ H01 () × L2 (). Due to
(1.5.15) and the radial unboundedness of the cost functional with respect to the H 1 () ×
L2 ( )-norm the set {| exp yn |Lp : n ∈ N} is bounded for every p ∈ [1, ∞) and {| exp yn |W 1,p :
n ∈ N} is bounded for every p ∈ [1, 2). Since W 1,p () is compactly embedded in
L2 () for every p ∈ (1, 2) it follows that for a subsequence of {yn }, denoted by the
same symbol, lim exp(yn ) = exp y ∗ in L2 (). It is now simple to argue that (y ∗ , u∗ ) is a
solution to (1.5.14). Let us discuss then the validity of (H1)–(H4) with Y = Y1 = H 1 (),
W = (H 1 ())∗ , with the obvious choice for J , and with e : Y × U → W the mapping
assigning to (y, u) ∈ Y × U the functional v → (∇y, ∇v) − (exp y, v) − (f, v) − (u, v)
for v ∈ H 1 (). From (1.5.15) it follows that e is well defined and its Fréchet derivative at
(y, u) in direction (δy, δu) is characterized by
(e (y, u)(δy, δu), v) = (∇δy, ∇v) − (exp(y)δy, v) − (δu, v) for v ∈ H 1 ().
The operator G = ey (y ∗ , u∗ ) can be expressed as
G(δy) = − δy − exp(y ∗ )δy.
Note that G ∈ L(Y, W ∗ ), and G is self-adjoint with compact resolvent. In particular, (H1)
is satisfied. The spectrum of G consists of eigenvalues only. It will be assumed that
0 is not an eigenvalue of G. (1.5.17)
Due to the regularity assumption for z (note that it would suffice to have z ∈ (H 1 )∗ ),
(1.5.17) implies that (H2) holds. To argue the validity of (H3) and (H4) one can rely on the
implicit function theorem. Let B be a bounded open neighborhood of y ∗ in H 1 (). Using
(1.5.15) one argues the existence of a constant κ > 0 such that
| exp y − exp ȳ|L4 ≤ κ|y − ȳ|H 1 for all y, ȳ ∈ B.
It follows that e is continuous on B × U and its partial derivative ey (y, u) is Lipschitz
continuous with respect to (y, u) ∈ B × U . The implicit function theorem implies the
existence of a neighborhood U (u∗ ) of u∗ such that for every u ∈ U (u∗ ) there exists a solution
y(u∗ ) of (1.5.16) depending continuously on u. Since e(y, u) is Lipschitz continuous with
respect to u it follows, moreover, that there exists L > 0 such that
|y(u) − y ∗ |H 1 ≤ L|u − u∗ |L2 ( ) for all u ∈ U (u∗ ).
(H3) and (H4) are a direct consequence of this estimate.

i i

i i
ItoKunisc
i i
2008/6/12
page 23
i i

1.6. Approximation, penalty, and adapted penalty techniques 23

The methodology utilized to consider this example can also be applied to Examples 1.19 and
1.20 provided that  is restricted to being two-dimensional. This is essential for (1.5.15) to
hold. For Example 1.23 it is essential for the cost functional to be radially unbounded with
respect to the H 1 ()-norm for the y-component to guarantee that minimizing sequences are
bounded. For Examples 1.19 and 1.20 the a priori bound on the y-component of minimizing
sequences can be obtained through the state equation.

1.6 Approximation, penalty, and adapted penalty


techniques
As in the previous section we consider problems of type

min J (y, u)
(1.6.1)
subject to e(y, u) = 0, u ∈ C ⊂ U.
We describe techniques that in specific situations can be more powerful than the
general theory of Section 1.4 to obtain optimality systems of type
⎧ ∗ ∗

⎪ e(y , u ) = 0,



ey∗ (y ∗ , u∗ )λ∗ + Jy (y ∗ , u∗ ) = 0, (1.6.2)



⎪ ∗ ∗ ∗ ∗ 

eu (y , u )λ + Ju (y ∗ , u∗ ), u − u∗ U ∗ ,U ≥ 0 for all u ∈ C,
where (y ∗ , u∗ ) denotes a solution to (1.6.1). We shall proceed formally here, referring the
reader to [Ba, Chapter 3.2.2] and [Lio1] for details.

1.6.1 Approximation techniques


If J and/or e are not smooth, then one can aim for obtaining an optimality system of the form
(1.6.2) by introducing families of smooth operators eε and smooth functionals Jε , ε > 0,
and replacing (1.6.1) by

min Jε (y, u)
(1.6.3)
subject to eε (y, u) = 0, and u ∈ C.
Let (yε , uε ), ε > 0, denote solutions to (1.6.3). Under relatively mild conditions it can be
argued that {(yε , uε )}ε>0 contains accumulation points and that they are solutions of (1.6.2).
But they need not coincide with (y ∗ , u∗ ). To overcome this difficulty a penalty term is added
to the cost in (1.6.3), resulting in the adapted penalty formulation

⎪ 1
⎨min Jε (y, u) + | u − u∗ |2
2 (1.6.4)


subject to eε (y, u) = 0 and u ∈ C.
It can be shown that under appropriate assumptions (yε , uε ) converges to (y ∗ , u∗ ) and that
there exists λ∗ such that (1.6.2) holds. This approach can be used for optimal control of
semilinear elliptic equations, variational inequalities [Ba, BaK1, BaK2], and state-dependent
parameter estimation problems [BKR], for example.

i i

i i
ItoKunisc
i i
2008/6/12
page 24
i i

24 Chapter 1. Existence of Lagrange Multipliers

1.6.2 Penalty techniques


This technique is well suited for systems with multiple steady states or finite explosion
property. The procedure is explained by means of an example. We consider the nonlinear
parabolic control system


⎪ yt − y − y 3 = u in Q = (0, T ) × ,



y = 0 on  = (0, T ) × ∂, (1.6.5)





y(0, ·) = y0 in ,

where y = y(t, x), T > 0, and y0 is a given initial condition. With (1.6.5) we associate the
optimal control problem

⎪ 1 α
⎨min J (y, u) = |y − z|6L6 (Q) + |u|2L2 (Q)
6 2 (1.6.6)


subject to u ∈ C, y ∈ H (Q) and (1.6.5),
1,2


where α > 0, z ∈ L6 (Q),  C is a closed convex set in L2 (Q), and H 1,2 (Q) = ϕ ∈ L2 (Q) :
ϕt , ϕxi , ϕxi xj ∈ L2 (Q) . Realizing the state equation by a penalty term leads to

⎪ 1 2
⎨min Jε (y, u) = J (y, u) + yt − y − y 3 − u L2 (Q)
ε (1.6.7)


subject to u ∈ C, y ∈ H 1,2 (Q), y| = 0, y(0, ·) = y0 .

It can be shown that for every ε > 0 there exists a solution (yε , uε ) to (1.6.7). Considering the
behavior of the pair (yε , uε ) as ε → 0+ the question of convergence to a particular solution
(y ∗ , u∗ ) of (1.6.6) again arises. As before this can be realized by means of adaptation terms:

⎪ 1 2 1 2
⎨min Jε (y, u) + y − y ∗ L2 (Q) + u − u∗ L2 (Q)
2 2 (1.6.8)


subject to u ∈ C, y ∈ H (Q), y| = 0, y(0, ·) = y0 .
1,2

Let us denote solutions to (1.6.8) by (y ε , uε ). It is fairly straightforward to derive


optimality conditions for (1.6.8): Setting
1 ε 
λε = − yt − y ε − (y ε )3 − uε ∈ L2 (Q) (1.6.9)
ε
we find
⎧ ε ∗
⎪λt − λ − 3(y ) λε = −(y − z) − (y − y )
⎪ ε ε 2 ε 5 ε
in Q,



λε = 0 on , (1.6.10)




⎩ ε
λ (T , ·) = 0 in Q

i i

i i
ItoKunisc
i i
2008/6/12
page 25
i i

1.6. Approximation, penalty, and adapted penalty techniques 25

and
 
(αuε − λε )(u − uε )dQ + (uε − u∗ )(u − uε )dQ ≥ 0 for all u ∈ C, (1.6.11)
Q Q

where λε must be interpreted as a weak solution in L2 (Q) of (1.6.10) (i.e., the inner product
with smooth test function is taken, and partial integration bringing all differentiations on
the test function is carried out). It can be verified that uε → u∗ in L2 (Q), y ε  y ∗ weakly
in H 1,2 (Q) and that there exists λ∗ ∈ W 2,1;6/5 , the weak limit of λε , such that the following
optimality system for (1.6.6) is satisfied:

⎪yt − y − y = u in Q,
⎪ 3






⎨−λt − λ − 3y 2 λ = −(y − z)5 in Q,
! (1.6.12)



⎪ y = λ = 0 on ,





y(0, ·) = y 0 , λ(T , ·) = 0 in ,
 
where W 2,1;6/5 = ϕ ∈ L6/5 (Q) : ϕt , ϕxi , ϕxi xj ∈ L6/5 (Q) ; see [Lio1, Chapter I.3].

i i

i i
ItoKunisc
i i
2008/6/12
page 26
i i

i i

i i
ItoKunisc
i i
2008/6/12
page 27
i i

Chapter 2

Sensitivity Analysis

2.1 Generalities
In this chapter we discuss the sensitivity of solutions to parameterized mathematical pro-
gramming problems of the form
min f (x, p) subject to
x∈C
(2.1.1)
e(x, p) = 0, g(x, p) ≤ 0, and (x, p) ∈ K,
where C is a closed convex set in X, and further e : X × P → W represents an equality
constraint, g : X × P → Rm a finite-dimensional inequality constraint, and  : X × P → Z
an infinite-dimensional affine constraint. The cost f as well as the equality and inequality
constraints are allowed to depend on a parameter p ∈ P . Unless otherwise specified P is
a normed linear space, X, W, Z are real Hilbert spaces, and K is a closed convex cone
with vertex at 0 in Z. Such problems arise in inverse, control problems, and variational
inequalities, and some examples will be discussed in Section 2.6.
The cone K induces a natural ordering ≤ by means of z1 ≤ z2 if z1 − z2 ∈ K. We
denote by K + the dual cone given by

K + = {z ∈ Z ∗ : z, z̃
≤ 0 for all z̃ ∈ K},

where ·, ·
denotes the duality pairing between Z and Z ∗ . Suppose that x0 is the solution
to (2.1.1) for the nominal parameter p = p0 . The objective of this section is to investigate
• continuity and Lipschitz continuity of the solution mapping

p ∈ P → (x(p), λ(p), μ(p), η(p)) ∈ X × W ∗ × Rm,+ × K + ,

where for p in a neighborhood of p0 , x(p) ∈ X denotes the local solution of (2.1.1)


in a neighborhood of x0 , and λ, μ, η are the Lagrange multipliers associated with
constraints described by e, g,  in (2.1.1);
• differentiability of the minimum value function F (p) = f (x(p), p) at p0 ; and

27

i i

i i
ItoKunisc
i i
2008/6/12
page 28
i i

28 Chapter 2. Sensitivity Analysis

• directional differentiability of the solution mapping.

Due to the local nature of the analysis, f , e, and g need only be defined in a neigh-
borhood of (x0 , p0 ). We assume throughout this chapter that they are twice continuously
Fréchet differentiable with respect to x and that their first and second derivatives are con-
tinuous in a neighborhood of (x0 , p0 ). It is assumed that  is affine in x for every p ∈ P
and that the first derivative  (p0 ) with respect to x is continuous in a neighborhood of p0 .
In order to ensure the existence of a Lagrange multiplier at x0 we assume that

(H1) x0 is a regular point, i.e.,


⎧⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ e (x0 , p0 ) 0 0 ⎬
0 ∈ int ⎝ g (x0 , p0 ) ⎠ (C − x0 ) + ⎝ Rm+
⎠ + ⎝ g(x0 , p0 ) ⎠ , (2.1.2)
⎩ ⎭
 (p0 ) −K (x0 , p0 )

where the interior is taken in W × Rm × Z and primes denote the Fréchet derivatives with
respect to x. We define the Lagrange functional L : X × P × W ∗ × Rm × Z ∗ → R by

L(x, p, λ, μ, η) = f (x, p) + λ, e(x, p)


+ μ, g(x, p)
Rm + η, (x, p)
.

With (2.1.2) holding, it follows from Theorem 1.6 that there exists a Lagrange multiplier
(λ0 , μ0 , η0 ) ∈ W ∗ × Rm,+ × K + such that

L (x0 , p0 , λ0 , μ0 , η0 ), c − x0
≥ 0 for all c ∈ C,

e(x0 , p0 ) = 0,
(2.1.3)
μ0 , g(x0 , p0 )
= 0, g(x0 , p0 ) ≤ 0, μ0 ≥ 0,

η0 , (x0 , p0 )
= 0, (x0 , p0 ) ∈ K, η0 ∈ K + .

To express (2.1.3) in a more compact form we recall that the subdifferential ∂ψC of the
indicator function

0 if x ∈ C,
ψC (x) =
∞ if x ∈
/C

of a closed convex set C in a Banach space X is given by



{y ∈ X∗ : y, c − x
X∗ ,X ≤ 0 for all c ∈ C} if x ∈ C,
∂ψC (x) =
{} if x ∈
/ C.

The set ∂ψC (x) is also referred to as the normal cone to C at x. For convenience we also
specify


{z ∈ Z : z∗ − η, z
X∗ ,X ≤ 0 for all z∗ ∈ K + } if η ∈ K + ,
∂ψK + (η) =
{} / K +.
if η ∈

i i

i i
ItoKunisc
i i
2008/6/12
page 29
i i

2.1. Generalities 29

Note that (2.1.3) is equivalent to




⎪ L (x0 , p0 , λ0 , μ0 , η0 ) + ∂ψC (x0 ),

e(x0 , p0 ),
0∈ (2.1.4)

⎪ −g(x 0 , p0 ) + ∂ψRm,+ (μ0 ),

−(x0 , p0 ) + ∂ψK + (η0 ).

In fact this follows from the following characterization of the normal cone to K + .

Lemma 2.1. Suppose that K is a closed convex cone in Z. Then

z ∈ ∂ψK + (η) if and only if z ∈ K, η ∈ K + , and η, z


= 0.

Proof. Assume that z ∈ ∂ψK + (η). Then ∂ψK + (η) is nonempty, η ∈ K + , and

z∗ − η, z
≤ 0 for all z∗ ∈ K + . (2.1.5)

Hence with z∗ = 2η this implies η, z


≤ 0 and for z∗ = η2 we have η, z
≥ 0, and
therefore η, z
= 0. From (2.1.5) it follows that z∗ , z
≤ 0 for all z∗ ∈ K + , and hence the
geometric form of the Hahn–Banach theorem implies that z ∈ K.
Conversely, if z ∈ K, η ∈ K + , and η, z
= 0, then z∗ − η, z
≤ 0 for all z∗ ∈ Z ∗
and thus z ∈ ∂ψK + (η).

Let A : X → X∗ be the operator representation of the Hessian L (x0 , p0 , λ0 , μ0 , η0 )


of L such that
Ax1 , x2
= L (x0 , p0 , λ0 , μ0 , η0 )(x1 , x2 )
and define operators E : X → W , G : X → Rm , and L : X → Z by

E = e (x0 , p0 ), G = g (x0 , p0 ), L =  (p0 ).

Without loss of generality one can assume that the coordinates of the inequality con-
straints g(x, p0 ) ≤ 0 and the associated Lagrange multiplier μ0 are arranged so that
μ0 = (μ+ − + −
0 , μ0 , μ0 ) and g = (g , g , g ), with g
0 0 +
: X → R m 1 , g 0 : X → R m2 ,

g : X → R , m = m1 + m2 + m3 , and
m3

g + (x0 , p0 ) = 0, μ+ 0 > 0,
g 0 (x0 , p0 ) = 0, μ00 = 0, (2.1.6)
g − (x0 , p0 ) < 0, μ− 0 = 0.

We further define

G+ = g + (x0 , p0 ) , G0 = g 0 (x0 , p0 ) ,

and  
E
E+ = : X → W × R m1 .
G+

i i

i i
ItoKunisc
i i
2008/6/12
page 30
i i

30 Chapter 2. Sensitivity Analysis

Note that the coordinates denoted by superscripts + behave locally like equality
constraints. Define E(z) : X × R → (W × Rm1 ) × Rm2 × Z for z ∈ Z by
⎛ ⎞
E+ 0
E(z) = ⎝ G0 0 ⎠.
L z.

The adjoint E ∗ (z) is given by


 
E+∗ G∗0 L∗
E ∗ (z) = .
0 0 ·, z
Z

The following conditions will be used.


(H2) There exists a κ > 0 such that Ax, x
X∗ ,X ≥ κ |x|2X for all x ∈ ker(E+ ).
(H3) E(z) is surjective at z = (x0 , p0 ).
(H4) There exists a neighborhood Ũ × Ñ of (x0 , p0 ) and a positive constant ν such that

|f (x, p) − f (x, q)| + |e(x, p) − e(x, q)|W + |g(x, p) − g(x, q)|Rm

+ |(x, p) − (x, q)|Z ≤ ν |p − q|P

for all (x, p) and (x, q) in Ũ × Ñ .

Condition (H2) is a second order sufficient optimality condition for (2.1.1) at x0 and (H3)
implies that there exists a neighborhood O of (x0 , p0 ) in Z such that E(z) is surjective for
all z ∈ O.
The chapter is organized as follows. In Sections 2.2 and 2.3 we discuss the basic results
for establishing Lipschitz continuity of the solution mapping. The implicit function theory
for the generalized equation of the form (2.1.4) is discussed in Section 2.2. In Section 2.3
stability results for solutions to mathematical programming problems including (2.1.1) are
addressed. Sufficient conditions for stability of the solutions are closely related to suffi-
cient optimality conditions for local minimizers. Section 2.3 therefore also contains first
and second order sufficient conditions for local optimality. Sections 2.4 and 2.5 are de-
voted to establishing Lipschitz continuity and directional differentiability of the solution
mapping. For the latter we employ the assumption of polyhedricity of a closed convex
cone K which, together with appropriate additional conditions, implies that the directional
derivative (ẋ, λ̇, μ̇, η̇) in direction q satisfies


⎪ L p (x0 , p0 , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗+ μ̇+ + G∗0 μ̇0 + L∗ η̇,

−ep+ (x0 , p0 )q − E+ ẋ,
0∈ (2.1.7)

⎪ −gp0 (x0 , p0 )q − G0 ẋ + ∂ψRm2 ,+ (μ̇0 ),

−p (x0 , p0 )q − Lẋ + ∂ψK̂ + (η̇),
 
where e+ = ge+ and K̂ + is the dual cone of K̂ = ∪λ>0 λ(K − (x0 , p0 ))∩[η0 ]⊥ . Section 2.6
contains some applications to optimal control of ordinary differential equations. This chapter
is strongly influenced by the work of S. M. Robinson and also by that of W. Alt.

i i

i i
ItoKunisc
i i
2008/6/12
page 31
i i

2.2. Implicit function theorem 31

2.2 Implicit function theorem


In this section we discuss an implicit function theorem for parameterized generalized equa-
tions of the form

0 ∈ F (x, p) + ∂ψC (x), (2.2.1)

where F : X × P → X∗ , with X ∗ the strong dual space of X, and C is a closed convex set in
X. We assume that X and P are normed linear spaces unless specified otherwise. Suppose
x0 ∈ X is a solution to (2.2.1) for a reference parameter p = p0 ∈ P . Let F (x, p0 ) be
Fréchet differentiable at x0 and define the linearized form by

T x = F (x0 , p0 ) + F (x0 , p0 )(x − x0 ) + ∂ψC (x). (2.2.2)

Definition 2.2 (Strong Regularity). The generalized equation (2.2.1) is called strongly
regular at x0 with associated Lipschitz constant ρ if there exist neighborhoods of V of 0
and U of x0 in X such that (T −1 V ) ∩ U , the intersection of U with the restriction of T −1
to V , is single valued and Lipschitz continuous from V to U with modulus ρ.

Note that if C = X, the strong regularity assumption coincides with F (x0 , p0 )−1
being a bounded linear operator, which is the common condition for the implicit function
theorem. The following result is taken from Robinson’s work [Ro3].

Theorem 2.3 (Generalized Implicit Function Theorem). Let P be a topological space,


and assume that F is Fréchet differentiable with respect to x and both F and F are
continuous at (x0 , p0 ). If (2.2.1) is strongly regular at x0 with associated Lipschitz constant
ρ, then for every  > 0 there exists neighborhoods N of p0 and U of x0 , and a single-
valued function x : N → U such that for every p ∈ N , x(p) is the unique solution in U
of (2.2.1). Furthermore, for each p, q ∈ N

|x(p) − x(q)|X ≤ (ρ + ) |F (p, x(q)) − F (q, x(q))|X∗ . (2.2.3)

Proof. For  > 0 we choose a δ > 0 such that ρδ < /(ρ + ). By strong regularity, there
exists neighborhoods V of 0 and U of x0 in X such that (T −1 V ) ∩ U is single valued and
Lipschitz continuous from V to U with modulus ρ. Let

h(x, p) = F (x0 , p0 ) + F (x0 , p0 )(x − x0 ) − F (x, p)

and choose a neighborhood N of p0 and a closed ball U of radius r about x0 contained in


U so that for each p ∈ N and x ∈ U we have for h(x, p) ∈ V

|F (x, p) − F (x0 , p0 )| ≤ δ

and

ρ |F (x0 , p) − F (x0 , p0 )| ≤ (1 − ρδ)r.

i i

i i
ItoKunisc
i i
2008/6/12
page 32
i i

32 Chapter 2. Sensitivity Analysis

For any p ∈ N define an operator p from U → U by

p (x) = T −1 (h(x, p)) ∩ U. (2.2.4)

Note that x ∈ U ∩ p (x) if and only if x ∈ U and (2.2.1) holds. We show that p is a
strict contraction and p maps U into itself. For x1 , x2 ∈ U we find, using h (x, p) =
F (x0 , p0 ) − F (x, p), that

|p (x1 ) − p (x2 )| ≤ ρ |h(x1 , p) − h(x2 , p)|


1
= ρ| 0 h (x1 + t (x1 − x2 ), p)(x1 − x2 ) dt| ≤ ρδ |x1 − x2 |,

where ρ δ < 1. Since x0 = T −1 (0) ∩ U we have

|p (x0 ) − x0 | ≤ ρ |h(x0 , p)| = ρ |F (x0 , p) − F (x0 , p0 )| ≤ (1 − ρδ)r,

and thus for x ∈ U

|p (x) − x0 | ≤ |p (x) − p (x0 )| + |p (x0 ) − x0 | ≤ ρδ |x − x0 | + (1 − ρδ)r ≤ r.

By the Banach fixed point theorem p has a unique fixed point x(p) in U and for each
x ∈ U we have
|x(p) − x| ≤ (1 − ρδ)−1 |p (x) − x|. (2.2.5)
It follows from our earlier observation that x(p) is the unique solution to (2.2.1) in U . To
verify (2.2.3) with p, q ∈ N we use (2.2.5) with x = x(q) and obtain

|x(p) − x(q)| ≤ (1 − ρδ)−1 |p (x(q)) − x(q)|.

Since x(q) = q (x(q)) we find

|p (x(q)) − x(q)| ≤ ρ |h(x(q), p) − h(x(q), q)| = ρ |F (x(q), p) − F (x(q), q)|,

and hence
|x(p) − x(q)| ≤ ρ(1 − ρδ)−1 |F (x(q), p) − F (x(q), q)|.
Observing that ρ(1 − ρδ)−1 ≤ ρ +  the desired estimate (2.2.3) follows.

As a consequence of the previous theorem, Lipschitz continuity of F (x, p) in p


implies local Lipschitz continuity of x(p) at p0 .

Corollary 2.4. Assume in addition to the conditions of Theorem 2.3 that P is a normed
linear space and that for some ν > 0

|F (x, p) − F (x, q)| ≤ ν |p − q|

for p, q ∈ N and x ∈ U . Then x(p) is Lipschitz on N with modulus ν (ρ + ).

It should be remarked that the condition of strong regularity is the weakest possible
condition which can be imposed on the values of a function F and its derivative at a point x0 ,

i i

i i
ItoKunisc
i i
2008/6/12
page 33
i i

2.2. Implicit function theorem 33

so that for each perturbation satisfying the hypothesis of Theorem 2.3, a function x(·) will
exist having the properties stated in Theorem 2.3. To see this, one has only to consider a
function F : X → X ∗ which is Fréchet differentiable at x0 and satisfies
0 ∈ F (x0 ) + ∂ψC (x0 ).
Let P be a neighborhood of the origin in X ∗ , and let
F (x, p) = F (x0 , p0 ) + F (x0 , p0 )(x − x0 ) − p
with p0 = 0. Choose  > 0. If there exist neighborhoods N and V and a function x(·)
satisfying the properties stated in Theorem 2.3, then with
T x = F (x0 , p0 ) + F (x0 , p0 )(x − x0 ) + ∂ψC (x)
we see that the restriction to N of T −1 V is a singled-valued, Lipschitz continuous function.
Therefore the generalized equation 0 ∈ F (x, p0 ) + ∂ψC (x) is strongly regular at x0 .
One of the important consequences of Theorem 2.3 is the following theorem on
parametric sensitivity analysis.

Theorem 2.5. Under the conditions of Theorem 2.3 and Corollary 2.4 there exists for each
 > 0 a function α (p) : N → R+ satisfying limp→p0 α (p) = 0, such that
|x(p) − p (x0 )| ≤ α (p) |p − p0 |,
where p (x0 ) is the unique solution in U of the linear generalized equation
0 ∈ F (x0 , p) + F (x0 , p0 )(x − x0 ) + ∂ψC (x). (2.2.6)

Proof. From the proof of Theorem 2.3 it follows that x(p) = p (x(p)) and thus by strong
regularity
|x(p) − p (x0 )| ≤ ρ |h(x(p), p) − h(x0 , p)|
 1


≤ ρ h (x0 + θ (x(p) − x0 )) dθ |x(p) − x0 |

0
 1


≤ ρν(ρ + )|p − p0 | h (x0 + θ (x(p) − x0 )) dθ .

0
Since h (x, p) = F (x0 , p0 ) − F (x, p) and F is continuous,

 1

e (x0 + θ (x(p) − x0 )) dθ → 0


0

as p → p0 , which completes the proof.

In the case C = X the generalized equation (2.2.1) reduces to the nonlinear equation
F (x, p) = 0 and
p (x0 ) = F (x0 , p0 )−1 (F (x0 , p0 ) − F (x0 , p)).

i i

i i
ItoKunisc
i i
2008/6/12
page 34
i i

34 Chapter 2. Sensitivity Analysis

Thus, Theorem 2.5 shows that if F (x0 , ·) is Fréchet differentiable at p0 , then so is x(p) and
∂F
x (p0 ) = −F (x0 , p0 )−1 (x0 , p0 ). (2.2.7)
∂p
In many applications, one might find (2.2.6) significantly easier to solve than (2.2.1). In fact,
the necessary optimality condition (2.1.3) is of the form (2.2.1), and (2.2.6) corresponds to
a quadratic programming problem, as we will discuss in Section 2.4. Thus Theorem 2.5 can
provide a relatively cheap way to find a good approximation to x(p) for p near p0 .

2.3 Stability results


Throughout this section, let X, Y be real Banach spaces and let C be closed convex subsets
of X. Concerning K it suffices in this section to assume that it is a closed convex set in Y ,
unless it is specifically assumed that it is a closed convex cone in Y . For p in the metric
space P with metric δ(·, ·) consider the parameterized minimization problem

min f (x, p) subject to x ∈ C, g(x, p) ∈ K. (2.3.1)

This class of problems contains (2.1.1) as a special case. To facilitate the analysis of (2.3.1)
let (p) denote the set of feasible points, i.e.,

(p) = {x ∈ C : g(x, p) ∈ K}.

For a given p0 ∈ P and x0 ∈ (p0 ) a local minimizer of (2.3.1), define the set r (p) by

r (p) = (p) ∩ B̄(x0 , r),

where r > 0 and B̄(x0 , r) is the closure of the open ball with radius r at x0 in X. We further
define the value function μr (p) by

μr (p) = inf {f (x, p)|x ∈ r (p)}. (2.3.2)

Throughout this section it is assumed that f : X × P → R and g : X × P → Y are


continuously Fréchet differentiable with respect to x and that their first derivatives are
continuous in a neighborhood of (x0 , p0 ). Recall from Definition 1.5 that x0 ∈ (p0 ) is a
regular point if
0 ∈ int {g(x0 , p0 ) + gx (x0 , p0 )(C − x0 ) − K}. (2.3.3)
We use a generalized inverse function theorem for set-valued functions due to Robin-
son [Ro3] for the stability analysis of (2.3.1).
For a nonempty set S in X and x ∈ X we define

d(x, S) = inf {|y − x| : y ∈ S},

and d(x, S) = ∞ if S = ∅. For nonempty sets S and S̄ in X the Hausdorff distance is


defined by
 
dH (S, S̄) = max sup{d(x, S̄) : x ∈ S}, sup{d(y, S) : y ∈ S̄} .

i i

i i
ItoKunisc
i i
2008/6/12
page 35
i i

2.3. Stability results 35

We further set BX = {x ∈ X : |x| < 1}. For a set-valued function F : X → Y we define


the preimage of F by F −1 (y) = {x ∈ X : y ∈ F (x)}. A set-valued function is called closed
and convex if its graph is closed and convex.

Theorem 2.6 (Generalized Inverse Function Theorem). Let X and Y be normed linear
spaces and F a closed convex set-valued function from X to Y . Let y0 ∈ F (x0 ) and suppose
that for some η > 0,
y0 + η BY ⊂ F (x0 + BX ).
Then for x ∈ x0 + BX and any y ∈ y0 + int (η BY ),

d(x, F −1 (y) ∩ (x0 + BX )) ≤ (η − |y − y0 |)−1 (1 + |x − x0 |)d(y, F (x)).

Proof. Let x ∈ x0 + BX and y − y0 ∈ int (η BY ). For y ∈ F (x) the claim is clearly satisfied
/ F (x). Let δ > 0 be arbitrary and choose yδ ∈ F (x) such that
and therefore we assume y ∈

0 < |yδ − y| < d(y, F (x)) + δ. (2.3.4)

Let α = η − |y − y0 | > 0. Choose any  ∈ (0, α) and set

y = y + (α − )|y − yδ |−1 (y − yδ ).

Then |y − y0 | ≤ |y − y0 | + (α − ) < η, and

y ∈ y0 + int (ηBY ).

Thus by assumption there exists

x ∈ x0 + BX with y ∈ F (x ).

For 1
1+(α−)|y−yδ |−1
∈ (0, 1), we find, using convexity of F and the fact that yδ ∈ F (x),

y = (1 − λ)yδ + λ y ∈ (1 − λ)F (x) + λ F (x ) ⊂ F ((1 − λ)x + λ x ). (2.3.5)

Since x and x are contained in x0 + BX we have (1 − λ)x + λ x ∈ x0 + BX . Together


with (2.3.5) this implies

d(x, F −1 (y) ∩ (x0 + BX )) ≤ |x − (1 − λ)x + λ x | = λ |x − x |.

However, |x − x | ≤ |x − x0 | + |x0 − x | ≤ |x − x0 | + 1 and


|y − yδ | |y − yδ |
λ= < .
|y − yδ | + α −  α−
Hence by (2.3.4)

d(x, F −1 (y) ∩ (x0 + BX )) ≤ (η − |y − y0 | − )−1 (1 + |x − x0 |) (d(y, F (x)) + δ),

and letting  → 0+ , δ → 0+ the claim follows.

i i

i i
ItoKunisc
i i
2008/6/12
page 36
i i

36 Chapter 2. Sensitivity Analysis

The proof of the stability result below relies on the following set-valued fixed point lemma.
For a metric space (X , d) let F(X ) denote the collection of all nonempty closed subsets
of X .

Lemma 2.7. Let (X , d) be a complete metric space and let  : X → F(X ) be a set-valued
mapping. Suppose that there exists x0 ∈ X and constants r > 0, α ∈ (0, 1),  ∈ (0, r) such
that for all x1 , x2 in the closed ball B̄(x0 , r) the sets (x1 ), (x2 ) are nonempty,

dH ((x1 ), (x2 )) ≤ α d(x1 , x2 ),

and

d(x0 , (x0 )) ≤ (1 − d)(r − ).

Then there exists x̄ ∈ B̄(x0 , r) such that x̄ ∈ (x̄) and d(x0 , x̄) ≤ (1−α)−1 d(x0 , (x0 ))+.
If moreover

dH ((S1 ), (S2 )) ≤ α dH (S1 , S2 ) (2.3.6)

for all nonempty closed subsets S1 and S2 of B̄(x0 , r), then the sequence Sk = (Sk−1 ),
with S0 = {x0 }, converges in the Hausdorff metric to a set S̄ satisfying S̄ = (S̄).

Proof. Let δ = α(x0 , (x0 )) and set γ = α −1 (1 − α). By induction one proves the
existence of a sequence xk+1 ∈ (xk ), k = 0, 1, . . . , such that xk ∈ B̄(x0 , r) and

d(xk+1 , xk ) ≤ α k δ + (1 − 2−(k+1) )γ α k+1 ,


k+1 (2.3.7)
d(xk+ü1 , x0 ) ≤ (1 − α k+1 )(r − ) + γ (1 − 2−i )α i .
i=1

In fact, choose x1 ∈ (x0 ) with d(x0 , x1 ) ≤ δ + γ2 α ≤ (1 − α)(r − ) + γ2 α ≤ r.


Hence x1 ∈ B̄(x0 , r) and (2.3.7) holds for k = 0. Now suppose that (2.3.7) holds with k
replaced by k − 1. Then d(x0 , xk ) < r and hence (xk ) is nonempty and there exists xk+1 ∈
(xk ) satisfying d(xk+1 , xk ) ≤ d((xk ), xk )+2−(k+1) γ α k+1 . Consequently d(xk+1 , xk ) ≤
dH ((xk ), (xk−1 )) + 2−(k+1) γ α k+1 ≤ α d(xk , xk−1 ) + 2−(k+1) !γk+1α
k+1
≤ α k δ + (1 −
−(k+1)
2 )γ α , and d(xk+1 , x0 ) ≤ d(xk+1 , xk ) + d(xk , x0 ) ≤ γ i=1 (1 − 2−i )α i + (1 −
k+1

α )(r − ), and (2.3.7) holds for all k. For k ≥ 0 and for every m ≥ 1 we have
k+1


m−1
d(xk+m , xk ) ≤ d(xk+i+1 , xk+i )
i=0


m−1
 
≤ α k+i δ + (1 − 2−(k+i+1) )γ α k+i+1
i=0

≤ α k (1 − α)−1 (δ + γ α),

i i

i i
ItoKunisc
i i
2008/6/12
page 37
i i

2.3. Stability results 37

and consequently {xk } is a Cauchy sequence in B̄(xo , r). Since X is complete there exists
x̄ ∈ B̄(x0 , r) with limk→∞ xk = x̄ and d(x̄, x0 ) ≤ 1−α δ
+ . To verify the fixed point
property, note that for every k we have

d(x̄, (x̄)) ≤ d(x̄, xk+1 ) + d(xk+1 , (x̄))

≤ d(x̄, xk+1 ) + dH ((xk ), (x̄)) (2.3.8)

≤ d(x̄, xk+1 ) + α d(xk , x̄).

Passing to the limit with respect to k implies that x̄ ∈ (x̄).


We now assume that  also satisfies (2.3.6). Since the set of closed subsets of B̄(xo , r)
is complete with respect to the Hausdorff metric, the existence of a set S̄ with lim Sk = S̄
in the Hausdorff metric follows. To argue invariance of S̄ under  note that

dH (S̄, (S̄)) = lim dH (Sk , (S̄))


k→∞

= lim dH ((Sk−1 ), (S̄)) ≤ lim dH (Sk−1 , S̄) = 0


k→∞ k→∞

and hence S̄ = (S̄).

The following result is an important consequence of Theorem 2.6.

Theorem 2.8 (Stability). Let x0 ∈ (p0 ) be a regular point. Then for every  > 0 there
exist neighborhoods U of x0 , N of p0 such that for p ∈ N the set (p) is nonempty and
for each x ∈ U ∩ C

d(x, (p)) ≤ (M + ) d(0, {g(x, p) − K : x ∈ C}), (2.3.9)

where a constant M > 0 is independent of  > 0.

Proof. Let F be the closed convex set-valued mapping from X into Y given by

g (x0 , p0 )(x − x0 ) + g(x0 , p0 ) − K for x ∈ C,
F (x) =
{} for x ∈
/ C.

The regular point condition implies that there exists an η > 0 such that

η BY ⊂ F (x0 + BX ).

Here we use Theorem 1.4 and the discussion below Definition 1.5. For δ > 0 and r > 0 let
ρ = (η − 2δ r)−1 (1 + r) where we assume that

2δ r < η and 2ρδ < 1.

Let
h(x, p) = g(x0 , p0 ) + g (x0 , p0 )(x − x0 ) − g(x, p)

i i

i i
ItoKunisc
i i
2008/6/12
page 38
i i

38 Chapter 2. Sensitivity Analysis

and choose a neighborhood of N of p0 and a closed ball B̄(x0 , r) of radius r about x0 such
that for x ∈ U = C ∩ B̄(x0 , r) and p ∈ N

|g (x, p) − g (x0 , p0 )| ≤ δ

and
|g(x0 , p) − g(x0 , p0 )| ≤ δ r.
As in the proof of Theorem 2.3 it can be shown that

|h(x1 , p) − h(x2 , p)| ≤ δ|x1 − x2 |

for all x1 , x2 ∈ U and p ∈ N . In particular, if x ∈ U and p ∈ N , then

|h(x, p)| ≤ |h(x0 , p0 )| + |h(x, p) − h(x0 , p0 )| ≤ δ r + δ r < η.

For any p ∈ N define the set-valued closed convex mapping p from U → C by

p (x) = F −1 (h(x, p)). (2.3.10)

Note that x ∈ p (x) implies that g(x, p) ∈ K. We argue that Lemma 2.7 is applicable
with  = p and X = C endowed with the metric induced by X. For every x ∈ U the set
p (x) is a closed convex subset of X. (Hence, if X is a reflexive Banach space, then the
metric projection is well defined and Lemma 2.7 will be applicable with  = 0.) For every
x1 , x2 ∈ U we have by Theorem 2.6

dH (p (x1 ), p (x2 ))| ≤ ρ |h(x1 , p) − h(x2 , p)| < ρδ |x1 − x2 |,

where we used that h(xi , p) ∈ F (xi ) for i = 1, 2. Moreover

|p (x0 ) − x0 | ≤ ρ |g(x0 , p) − g(x0 , p0 )| ≤ ρδr,

and for every pair of sets S1 and S2 in U we have

dH (p (S1 ), p (S2 )) ≤ ρ δ dH (S1 , S2 ).

Hence all conditions of Lemma 2.7 are satisfied with α = 2ρδ and  = r(1−2α)
1−α
. Con-
sequently there exists a Cauchy sequence {xk } in C such that xk+1 ∈ p (xk ) and x̄ =
limk→∞ xk ∈ C satisfies x̄ ∈ p (x̄). Moreover, if Sk = p (Sk−1 ) with S0 = {x0 }, then
dH (Sm , Sk ) → 0 as m ≥ k → ∞ and thus S̄ = limk→∞ Sk satisfies S̄ = p (S̄) and
S̄ ⊂ p . Let

(x, ỹ) ∈ M = {(x, ỹ) ∈ U × Y : g(x0 , p0 ) + g (x0 , p0 )(x − x0 ) + ỹ ∈ K}.

For all x̄ ∈ S̄ we have by Theorem 2.6

d(x, F −1 (h(x̄, p)) ≤ ρ d(h(x̄, p), F (x)) ≤ ρ |h(x̄, p) + ỹ|

≤ ρ (|g(x, p) − g(x0 , p0 ) − g (x0 , p0 )(x − x0 ) − ỹ| + |h(x̄, p) − h(x, p)| ).

i i

i i
ItoKunisc
i i
2008/6/12
page 39
i i

2.3. Stability results 39

Since |h(x, p) − h(x̄, p)| ≤ δ |x − x̄|,


ρ
d(x, S̄) ≤ |g(x, p) − g(x0 , p0 ) − g (x0 , p0 )(x − x0 ) − ỹ|
1 − ρδ
for all (x, ỹ) ∈ M. Now, for x ∈ C and y ∈ g(x, p) − K we let ỹ = g(x, p) − g(x0 , p0 ) −
g (x0 , p0 )(x − x0 ) − y. Then (x, ỹ) ∈ M and hence
ρ
d(x, (p)) ≤ d(x, S̄) ≤ |y|.
1 − ρδ

This holds for all y ∈ g(x, p) − K. Since 1−ρδ


ρ
= η−δ(1+3r)
1+r
, we can select for every  > 0
−1
neighborhoods N = N and U = U such that 1−ρδ ≤ η + . This implies the claim
ρ

with M = η−1 .

Theorem 2.9 (Lipschitz Continuity of Value Function). Let x0 ∈ (p0 ) be a local


minimizer. Assume moreover that x0 is regular and that there exists a neighborhood U × N
of (x0 , p0 ) such that

|f (x, p) − f (x̃, p0 )| ≤ Lf (|x − x̃| + δ(p, p0 )) (2.3.11)

for x, x̃ ∈ U and p ∈ N and

|g(x, p) − g(x, p0 )| ≤ Lg δ(p, p0 ) (2.3.12)

for (x, p) ∈ U × N. Then there exist constants r > 0 and Lr and a neighborhood Ñ of p0
such that
|μr (p) − μr (p0 )| ≤ Lr δ(p, p0 )
for all p ∈ Ñ.

Proof. Since x0 is a local minimizer of (2.3.1) at p0 there exists s > 0 such that

f (x0 , p0 ) ≤ f (x, p0 ) for all x ∈ s (p0 ).

Let Cs = C ∩ B̄(x0 , s). By Theorem 1.4 and the discussion following Definition 1.5 the
regular point condition (2.3.3) also holds with C replaced by Cs . Applying Theorem 2.8
with C = Cs and  = 1 there exist neighborhoods U1 of x0 and N1 of p0 such that s (p)
is nonempty and

d(x, s (p)) ≤ (M + 1) d(0, {g(x, p) − K : x ∈ Cs }) (2.3.13)

for each x ∈ U1 and p ∈ N1 .


We now choose r > 0 and a neighborhood Ñ of p0 such that

r ≤ s, B̄(x0 , r) ⊂ U1 ∩ U, Ñ ⊂ N1 ∩ N, and 2(M + 1)Lg δ(p, p0 ) < r for all p ∈ Ñ .


(2.3.14)
By (2.3.13) and (2.3.12) we obtain for all p ∈ Ñ

d(x0 , s (p)) ≤ (M + 1) |g(x0 , p) − g(x0 , p0 )| ≤ (M + 1)Lg δ(p, p0 ).

i i

i i
ItoKunisc
i i
2008/6/12
page 40
i i

40 Chapter 2. Sensitivity Analysis

Thus there exists a x(p) ∈ s (p) such that

|x(p) − x0 | ≤ 2(M + 1)Lg δ(p, p0 ) < r

and therefore x(p) ∈ r (p) and f (x(p), p) ≥ μr (p). From (2.3.11)

|f (x(p), p) − f (x0 , p0 )| ≤ Lr δ(p, p0 ),

where Lr = Lf (2(M + 1)Lg + 1). Combining these inequalities, we have

μr (p) ≤ f (x(p), p) ≤ f (x0 , p0 ) + Lr δ(p, p0 ) = μr (p0 ) + Lr δ(p, p0 ) (2.3.15)

for all p ∈ Ñ.


Conversely, let p ∈ Ñ and x ∈ r (p). From (2.3.13) and (2.3.12)

d(x, s (p0 )) ≤ (M + 1) |g(x, p) − g(x0 , p0 )|.

Hence there exists x̄ ∈ s (p0 ) such that

|x − x̄| ≤ 2(M + 1)Lg δ(p, p0 ) < r, (2.3.16)

and thus f (x0 , p0 ) ≤ f (x̄, p0 ). From (2.3.11) we deduce that |f (x, p) − f (x̄, p0 )| ≤
Lr δ(p, p0 ), and hence

μr (p0 ) = f (x0 , p0 ) ≤ f (x̄, p0 ) ≤ f (x, p) + Lr δ(p, p0 ) (2.3.17)

for each x ∈ r (p). The desired estimate follows from (2.3.15) and (2.3.17).

In the above proof we followed the work of Alt [Alt3]. Next, we establish Hölder
continuity of the local minimizers x(p) of (2.3.1). Theorem 2.9 provides Lipschitz conti-
nuity of the value function under the regular point condition. The following example shows
that the local solutions to the perturbed problems may behave rather irregularly.

Example 2.10. Let X = Y = R 2 , P = R, C = R 2 , K = {(y1 , y2 ) : y1 ≥ 0, y2 ≥ 0},


f (x1 , x2 , p) = x1 + p x2 , and g(x1 , x2 , p) = (x1 , x2 ). For p0 = 0 a local solution to (2.3.1)
is given by x0 = (0, 1) and μr (p0 ) = 0. Now let r > 0 be arbitrary. Then for p  = p0 the
perturbed problem

min f (x, p), x ∈ r (p) (2.3.18)

has a unique solution



(0, max(1 − r, 0)) if p > 0,
x(p) =
(0, 1 + r) if p < 0
and

p max(1 − r, 0) if p > 0,
μr (p) =
p(1 + r) if p < 0.
This shows that μr (p) is Lipschitz continuous but the distance |x(p) − x0 | is not continuous
with respect to p at p0 = 0. The reason for this behavior of x(p) is that the unperturbed

i i

i i
ItoKunisc
i i
2008/6/12
page 41
i i

2.3. Stability results 41

problem does not depend on x2 and all (0, x2 ) with x2 ≥ 0 are optimal for p = p0 . On the
other hand if we let f (x1 , x2 , p) = x1 + px2 + 12 (x2 − 1)2 , then for r ≥ 1 and |p| ≤ 1 the
solution to (2.3.18) is given by
x(p) = (0, 1 − p).
Thus the distance |x(p) − x0 | is continuous in p.
This example suggests that some kind of local invertibility should be assumed on
f in order to ensure continuity of the solutions. We define a condition closely related to
sufficient optimality conditions:
There exist κ > 0, β ≥ 1, γ > 0, and neighborhoods Û of x0 and N̂ of p0 such that
f (x, p) ≥ f (x0 , p0 ) + κ |x − x0 |β − γ δ(p, p0 ) (2.3.19)
for all p ∈ N̂ and x ∈ (p) ∩ Û .
The following result shows that this condition is implied by sufficient optimality
conditions at x0 .

Theorem 2.11. Assume that x0 is regular, that (2.3.11)–(2.3.12) hold, and that there exist
constants κ > 0, β ≥ 1 and neighborhood Ũ of x0 such that
f (x, p0 ) ≥ f (x0 , p0 ) + κ |x − x0 |β for all x ∈ (p0 ) ∩ Ũ . (2.3.20)
Then condition (2.3.19) holds.

Proof. We select r, s > 0 and a neighborhood N1 of p0 as in (2.3.14) in the proof of Theorem


2.9. We can assume that B̄(x0 , s) ∈ Ũ . Set Û = B̄(x0 , r) and choose x ∈ r (p), p ∈ N .
From (2.3.16) there exists an x̄ ∈ s (p0 ) such that

|x − x̄| ≤ 2(M + 1)Lg δ(p, p0 ),


and due to (2.3.17)
f (x, p) ≥ f (x̄, p0 ) − Lr δ(p, p0 ).
By (2.3.20)
f (x̄, p0 ) ≥ f (x0 , p0 ) + κ̃ |x̄ − x0 |β .
Note that
|x − x0 |β ≤ (|x̄ − x0 | + |x̄ − x|)β

≤ |x̄ − x0 |β + β(|x̄ − x0 | + |x̄ − x|)β−1 |x − x̄| ≤ |x̄ − x0 |β + β(3s)β−1 |x − x̄|.


Thus,
f (x, p) ≥ f (x0 , p0 ) + κ |x − x0 |β − Lr δ(p, p0 )

≥ f (x0 , p0 ) + κ |x − x0 |β − γ δ(p, p0 ),

where γ = Lr + κβ(3s)β−1 2(M + 1)Lg and (2.3.19) follows with N̂ = Ñ .

i i

i i
ItoKunisc
i i
2008/6/12
page 42
i i

42 Chapter 2. Sensitivity Analysis

Condition (2.3.20) can be verified under the following second order sufficient opti-
mality condition due to Maurer and Zowe [MaZo] .

Theorem 2.12. Assume that K is a closed convex cone in Y and that x0 ∈ (p0 ) is a
regular point, and let L(x, λ) = f (x, p0 ) + λ, g(x, p0 )
Y ∗ ,Y be the Lagrangian for (2.3.1)
at x0 with Lagrange multiplier λ0 . If there exist constants ω > 0, β̄ > 0 such that

L (x0 , λ0 , p0 )(h, h) ≥ ω |h|2 for all h ∈ S, (2.3.21)

where

S = L((p0 ), x0 ) ∩ {h ∈ X : λ0 , g (x0 , p0 )h
Y ∗ ,Y ≥ −β̄ |h|},

then there exist κ > 0 and a neighborhood Ũ of x0 such that (2.3.20) holds with β = 2.
For convenience we recall the definition of the linearizing cone: L((p0 ), x0 ) =
{x ∈ C(x0 ) : g (x0 )x ∈ K(g(x0 ))}, where K(g(x0 )) = {λ(y − g(x0 )) : y ∈ K, λ ≥ 0}.

Proof. All quantities are evaluated at p0 and this dependence will therefore be suppressed.
First we show that every x ∈  = (p0 ) can be represented as

x − x0 = h(x) + z(x) with h(x) ∈ L(, x0 ) and |z(x)| = o(|x − x0 |), (2.3.22)

where o(s)
s
→ 0 as s → 0+ . Let x ∈  and expand g at x0 :

g(x) − g(x0 ) = g (x0 )(x − x0 ) + r(x, x0 ) with |r(x, x0 )| = o(|x − x0 |).

By the generalized open mapping theorem (see Theorem 1.4), there exist α > 0 and k(x) ∈
K(g(x0 )) such that for some z(x) ∈ α |r(x, x0 )| (BX ∩ C(x0 ))

r(x, x0 ) = g (x0 )z(x) − k(x).

If we put h(x) = x − x0 + z(x), then h(x) ∈ C(x0 ), |z(x)| = o(|x − x0 |), and

g (x0 )h(x) = g (x0 )(x − x0 ) + r(x, x0 ) + k(x)

= g(x) − g(x0 ) + k(x) ∈ K − g(x0 ) + k(x) ⊂ K(g(x0 )),

i.e., h(x) ∈ L(, x0 ). Next, for x ∈ 

1
f (x) ≥ f (x) + λ0 , g(x)
= L(x, λ0 ) = L(x0 , λ0 ) + L (x − x0 , x − x0 ) + r(x − x0 ),
2
(2.3.23)
where |r(x −x0 )| = o(|x −x0 |2 ), and we used λ0 , g(x)
≤ 0 for x ∈ , and L (x0 , λ0 ) = 0.
Now put B = L (x0 , λ0 ) and S = L() ∩ {h ∈ X : λ0 , g (x0 )h
≥ −β̄ |h|} in Lemma
2.13. It implies the existence of δ0 > 0 and γ > 0 such that

L (x0 , λ0 )(x − x0 , x − x0 ) ≥ δ0 |x − x0 |2 for all x − x0 = h(x) + z(x) ∈  − x0


with |z(x)| ≤ γ |h(x)| and λ0 , g (x0 )h
≥ −β̄ |h|.

i i

i i
ItoKunisc
i i
2008/6/12
page 43
i i

2.3. Stability results 43

Choose δ ∈ (0, 1) satisfying δ/(1 − δ) < γ and ρ > 0 such that |z(x)| ≤ δ|x − x0 | for
|x − x0 | ≤ ρ. Then for |x − x0 | ≤ ρ

|h(x)| ≥ |x − x0 | − |z(x)| ≥ (1 − δ)|x − x0 |


and thus
δ
|z(x)| ≤ |h(x)| < γ |h(x)|. (2.3.24)
1−δ
Hence
L (x0 , λ0 )(x − x0 , x − x0 ) ≥ δ0 |x − x0 |2 for all x − x0 = h(x) + z(x) ∈  − x0
with |x − x0 | ≤ ρ and λ0 , g (x0 )h(x)
≥ −β̄ |h(x)|.

By (2.3.23) there exists ρ̄ ∈ (0, ρ] such that

f (x) ≥ f (x0 ) + δ40 |x − x0 |2 for all x − x0 = h(x) + z(x) ∈  − x0


(2.3.25)
with |x − x0 | ≤ ρ and λ0 , g (x0 )h(x)
≥ −β̄ |h(x)|.

For the case x − x0 = h(x) + z(x) ∈  − x0 with |x − x0 | ≤ ρ̄ and λ0 , g (x0 )h(x)


<
−β̄ |h(x)| we find

f (x) − f (x0 ) = f (x0 )(x − x0 ) + r̄(x − x0 )

= − λ0 , g (x0 )h(x)
− λ0 , g (x0 )z(x)
+ r̄(x − x0 )

≥ β̄|h(x)| + r1 (x − x0 ),

where r1 (x − x0 ) = − λ0 , g (x0 )z(x)


+ r̄(x − x0 ) and r1 (x − x0 ) = o(|x − x0 |). Together
with (2.3.24) this implies

f (x) ≥ f (x0 ) + β̄(1 − δ)|x − x0 | + r1 (x − x0 ) for x − x0 = h(x) + z(x) ∈  − x0


with |x − x0 | ≤ ρ̄ and λ0 , g (x0 )h(x)
< −β̄ |h(x)|.

Combined with (2.3.25) we arrive at the desired conclusion.

Lemma 2.13. Let S be a subset of a normed linear space X and let B : X × X → R be a


bounded symmetric quadratic form satisfying for some ω > 0

B(h, h) ≥ ω |h|2 for all h ∈ S.

Then there exist δ0 > 0 and γ > 0 such that

B(h + z, h + z) ≥ δ0 |h + z| 2 for all h ∈ S, z ∈ X with |z| ≤ γ |h|.

Proof. Let b = B and choose γ > 0 such that δ = ω − 2bγ − bγ 2 > 0. Then for all
z ∈ X and h ∈ S satisfying |z| ≤ γ |h|

B(h + z, h + z) ≥ ω |h|2 − 2b|h||z| − b|z|2 ≥ δ |h|2 .

i i

i i
ItoKunisc
i i
2008/6/12
page 44
i i

44 Chapter 2. Sensitivity Analysis

Since |h + z| ≤ |h| + |z| ≤ (1 + γ ) |h| we get

B(h + z, h + z) ≥ δ0 |h + z|2

with δ0 = δ/(1 + γ )2 .

Remark 2.3.1. An example which illustrates the benefit of including the set {h ∈ X :
λ0 , g (x0 , p0 )h
Y ∗ ,Y ≥ −β̄ |h|} into the definition of the set S on which the Hessian must
be positive definite is given by the one-dimensional example with

f (x) = −x 3 − x, g(x) = x, K = R− , and C = R.

In this case  = R − , x0 = 0, f (x0 ) = 0, f (x0 ) = −1, and f (x0 ) = 0. We further find that
L(, x0 ) = K and that the Lagrange multiplier is given by λ0 = 1. Consequently for β̄ ∈
(0, 1) we have that S = L(, x0 ) ∩ {h ∈ R : λ0 g (x0 )h ≥ −β̄ |h|} = {0}. Hence (2.3.21) is
trivially satisfied. Let us note that f is bounded below as f (x) ≥ f (x0 ) + max(|x|, 2|x|2 )
for x ∈ K. A similar argument does not hold if f is replaced by f (x) = −x 3 . In this
case x0 = 0, λ0 = 0, f (x0 ) = 0, S = K, and (2.3.21) is not satisfied. Note that in this
case we have lack of strict complementarity; i.e., the constraint is active and simultaneously
λ0 = 0.

Theorem 2.14. Suppose that x0 ∈ (p0 ) is a regular point and that for some β > 0

f (x0 )h ≥ β|h| for all h ∈ L((p0 ), x0 ).


) of x0 such that
Then there exists α > 0 and a neighborhood U
).
f (x) ≥ f (x0 ) + α|x − x0 | for all x ∈ (p0 ) ∩ U

Proof. For any x ∈ (p0 ) we use the representation

x − x0 = h(x) + z(x)

from (2.3.22) in the proof of Theorem 2.12. By expansion we have

f (x) − f (x0 ) = f (x0 )h(x) + f (x0 )z(x) + r(x, x0 )

with |r(x, x0 )| = o(|x − xo |). The first order condition implies

f (x) − f (x0 ) ≥ β|h(x)| + r1 (x, x0 ),

where r1 (x, x0 ) = f (xo )z(x) + r(x, xo ) and |r1 (x, x0 )| = o(|x − x0 |). Choose a neighbor-
) of x0 such that |z(x)| ≤ 1 |x − x0 | and |r1 (x, x0 )| ≤ β |x − x0 | for x ∈ (p0 ) ∩ U
hood U ).
2 4
Then |h(x)| ≤ |x − x0 | and f (x) − f (x0 ) ≥ |x − x0 | for x ∈ U .
1 β )
2 4

Note that for C = X we have f (x0 ) + g(x0 , p0 )∗ λ0 = 0 and the second order
condition of Theorem 2.12 involves the set of directions h ∈ L((p0 ), x0 ) for which the
first order condition is violated.
Following [Alt3] we now show that (2.3.19) implies stability of local minimizers.

i i

i i
ItoKunisc
i i
2008/6/12
page 45
i i

2.4. Lipschitz continuity 45

Theorem 2.15. Suppose that the local solution x0 ∈ (p0 ) is a regular point and that
(2.3.11), (2.3.12), and (2.3.19) are satisfied. Then there exist real numbers r > 0 and
L > 0 and a neighborhood N of p0 such that the value function μr is Lipschitz continuous
at p0 and for each p ∈ N the following statements hold:
(a) For every sequence {xn } in r (p) with the property that limn→∞ f (xn , p) =
μr (p) it follows that xn ∈ B(x0 , r) for all sufficiently large n.
(b) If there exists a point x(p) ∈ r (p) with μr (p) = f (x(p), p), then x(p) ∈
B(x0 , r), i.e., x(p) is a local minimizer of (2.3.1) and
1
|x(p) − x0 | ≤ L δ(p, p0 ) β .

Proof. We choose r > 0 and a neighborhood Ñ of p0 as in (2.3.14) in the proof of Theorem


2.9. Without loss of generality we can assume that Ñ × B̄(x0 , r) ⊂ N̂ × Û with N̂ , Û
determined from (2.3.19) and that

κ r β > (Lr + γ )δ(p, p0 ) (2.3.26)

for all p ∈ Ñ, where Lr is determined in Theorem 2.9. Thus μr is Lipschitz continuous at
p0 by Theorem 2.9. Let p ∈ Ñ be arbitrary and let {xn } in (p) be a sequence satisfying
limn→∞ f (xn , p) = μr (p). By (2.3.19) and Lipschitz continuity of μr

κ |xn − x0 |β ≤ |f (xn , p) − f (x0 , p0 )| + γ δ(p, p0 )

≤ |f (xn , p) − μr (p)| + |μr (p) − μr (p0 )| + γ δ(p, p0 )

≤ |f (xn , p) − μr (p)| + (Lr + γ ) δ(p, p0 ).

This estimate together with (2.3.26) imply |xn − x0 | < r for all sufficiently large n. This
proves (a).
Similarly, for x(p) ∈ r (p) satisfying μr (p) = f (x(p), p) we have

κ |x(p) − x0 |β ≤ (Lr + γ ) δ(p, p0 ).

Thus, |x(p) − x0 | < r,


  β1
Lr + γ
|x(p) − x0 | ≤ δ(p, p0 )
κ
and (b) follows.

2.4 Lipschitz continuity


In this section we investigate Lipschitz continuous dependence of the solution and the
Lagrange multipliers with respect to the parameter p in (2.1.1). Throughout this section
and the next we assume that C = X and that the requirements on the function spaces
specified below (2.1.1) are met.

i i

i i
ItoKunisc
i i
2008/6/12
page 46
i i

46 Chapter 2. Sensitivity Analysis

Theorem 2.16. Assume (H1)–(H4) hold at the local solution x0 of (2.1.1). Then there
exist neighborhoods N = N (p0 ) of p0 and U = U (x0 , λ0 , μ0 , η0 ) of (x0 , λ0 , μ0 , η0 ) and a
constant M such that for all p ∈ N there exists a unique (xp , λp , μp , ηp ) ∈ U satisfying


⎪ L (x, p, λ, μ, η),

−e(x, p),
0∈ (2.4.1)

⎪ −g(x, p) + ∂ψRm,+ (μ),

−(x, p) + ∂ψK + (η),

and

|(x(p), λ(p), μ(p), η(p)) − (x(q), λ(q), μ(q), η(q))| ≤ M |p − q| (2.4.2)

for all p, q ∈ N. Moreover, there exists a neighborhood Ñ ⊂ N of p0 such that x(p) is a


local solution of (2.1.1) if p ∈ Ñ .

For the proof we shall employ the following lemma.

Lemma 2.17. Consider the quadratic programming problem in a Hilbert space H


1
min J (x) = Ax, x
+ ã, x

2 (2.4.3)
subject to Ex = b̃ in Ỹ and Gx − c̃ ∈ K̃ ⊂ Z̃,

where Ỹ , Z̃ are Banach spaces, E ∈ L(H, Ỹ ), G ∈ L(H, Z̃), and K̃ is a closed convex
cone in Z̃. If
(a) A·, ·
is a symmetric bounded quadratic form on H ×H and there exists an ω > 0
such that Ax, x
≥ ω |x|2H for all x ∈ ker (E), and
(b) for all (b̃, c̃) ∈ Ỹ × Z̃

S̃(b̃, c̃) = {x ∈ H : Ex = b̃, Gx − c̃ ∈ K} is nonempty,

then there exists a unique solution to (2.4.3) satisfying

Ax + ã, v − x
≥ 0 for all v ∈ S̃(b̃, c̃).

Moreover, if x is a regular point, then there exists (λ̃, η̃) ∈ Ỹ ∗ × Z̃ ∗ such that
⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞
ã A E∗ G∗ x 0
0 ∈ ⎝ b̃ ⎠ + ⎝ −E 0 0 ⎠ ⎝ λ̃ ⎠ + ⎝ 0 ⎠.
c̃ −G 0 0 η̃ ∂ψK̃ + (η̃)

Proof. Since K̃ is closed and convex, S̃(b̃, c̃) is closed and convex as well. Since S̃(b̃, c̃)
is nonempty, there exists a unique w ∈ range E ∗ such that Ew = b̃. Moreover, every
x ∈ S̃(b̃, c̃) can be expressed as x = w + y, where y ∈ ker (E). From (a) it follows that
J is coercive and bounded below on S̃(b̃, c̃). Hence there exists a bounded minimizing
sequence {xn } in S̃(b̃, c̃) such that limn→∞ J (xn ) → inf x∈S̃(b̃,c̃) J (x). Boundedness of

i i

i i
ItoKunisc
i i
2008/6/12
page 47
i i

2.4. Lipschitz continuity 47

xn and weak sequential closedness of S̃(b̃, c̃) imply the existence of a subsequence of xn
that converges weakly to some x in S̃(b̃, c̃). Since J is weakly lower semicontinuous,
J (x) ≤ lim inf J (xn )n→∞ . Hence x minimizes J over S̃(b̃, c̃).
For v ∈ S̃(b̃, c̃) and t ∈ (0, 1) we have x + t (v − x) ∈ S̃(b̃, c̃). Thus,
t2
0 ≤ J (x + t (v − x)) − J (x) = t Ax + ã, v − x
+ A(v − x), v − x

2
and letting t → 0+ we have
Ax + ã, v − x
≥ 0 for all v ∈ S̃(b̃, c̃).
The last assertion follows from Theorem 1.6.

Proof of Theorem 2.16. The proof of the first assertion of the theorem is based on the implicit
function theorem of Robinson for generalized equations; see Theorem 2.3. It requires us to
verify the strong regularity condition for the linearized form of (2.4.1) which is given by


⎪ L (x0 , p0 , λ0 , μ0 , η0 ) + A(x − x0 ) + E ∗ (λ − λ0 ) + G∗ (μ − μ0 ) + L∗ (η − η0 ),

−e(x0 , p0 ) − E(x − x0 ),
0∈

⎪ −g(x 0 , p0 ) − G(x − x0 ) + ∂ψR m,+ (μ),

−(x0 , p0 ) − L(x − x0 ) + ∂ψK + (η),
where the operators A, E, G, and L are defined in the introduction to this chapter.
If we define the multivalued operator T from X × W × Rm × Z to itself by
⎛ ⎞ ⎛ ⎞⎛ ⎞
x A E ∗ G ∗ L∗ x
⎜ λ ⎟ ⎜ −E 0 0 0 ⎟ ⎜ ⎟
T ⎜ ⎟ ⎜ ⎟⎜ λ ⎟
⎝ μ ⎠ = ⎝ −G 0 0 0 ⎠ ⎝ μ ⎠
η −L 0 0 0 η
⎛ ⎞ ⎛ ⎞
f (x0 , p0 ) − Ax0 0
⎜ Ex0 ⎟ ⎜ 0 ⎟
+⎜ ⎟ ⎜
⎝ −g(x0 , p0 ) + Gx0 ⎠ + ⎝ ∂ψRm,+ (μ) ⎠ ,

−(x0 , p0 ) + Lx0 ∂ψK + (η)


then the linearization can be expressed as
0 ∈ T (x, λ, μ, η).
Since the constraint associated with g − (x, p0 ) ≤ 0 is inactive in a neighborhood of x0 , strong
regularity of T at (x0 , λ0 , μ0 , η0 ) easily follows from strong regularity of the multivalued
mapping T from X × (W × Rm1 ) × Rm2 × Z into itself at (x0 , (λ̃0 , μ+ 0
0 ), μ0 , η0 ) defined by
⎛ ⎞ ⎛ ⎞⎛ ⎞
x A E+∗ G∗0 L∗ x
⎜ λ̃ ⎟ ⎜ −E+ 0 0 0 ⎟ ⎜ ⎟
T⎜ ⎟ ⎜ ⎟ ⎜ λ̃ ⎟
⎝ μ0 ⎠ = ⎝ −G0 0 0 0 ⎠ ⎝ μ ⎠
0

η −L 0 0 0 η
⎛ ⎞ ⎛ ⎞
f (x0 , p0 ) − Ax0 0
⎜ E+ x0 ⎟ ⎜ 0 ⎟
+⎜ ⎝
⎟+⎜
⎠ ⎝

0 ⎠.
G 0 x0 ∂ψR 2 (μ )
m ,+

−(x0 , p0 ) + Lx0 ∂ψK + (η)

i i

i i
ItoKunisc
i i
2008/6/12
page 48
i i

48 Chapter 2. Sensitivity Analysis

Strong regularity of T requires us to show that there exist neighborhoods of V̂ of 0 and Û of


(x0 , (λ̃0 , μ+
0 ), μ0 , η0 ) in X × (W × R ) × R
0 m1 m2
× Z such that T −1 (V̂ ) ∩ Û is single valued
and Lipschitz continuous from V̂ to Û .
We now divide the proof in several steps.

Existence. We show the existence of a solution to

(α, β, γ , δ) ∈ T (x, λ̃, μ0 , η)

for (α, β, γ , δ) ∈ V̂ . Observe that this is equivalent to


⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞
a A E+∗ G∗0 L∗ x 0
⎜ b ⎟ ⎜ −E+ 0 0 0 ⎟ ⎜ λ̃ ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ ⎜
⎝ c ⎠ + ⎝ −G0
⎟⎜ ⎟+⎜ ⎟, (2.4.4)
0 0 0 ⎠ ⎝ μ0 ⎠ ⎝ ∂ψRm2 ,+ (μ0 ) ⎠
d −L 0 0 0 η ∂ψK + (η)

where

(a, b, c, d) = f (x0 , p0 ) − Ax0 − α, E+ x0 − β, G0 x0 − γ , −(x0 , p0 ) + Lx0 − δ). (2.4.5)

To solve (2.4.4) we introduce the quadratic optimization problem with linear con-
straints:
1
min Ax, x
+ a, x

2 (2.4.6)
subject to E+ x = b, G0 x ≤ c, and Lx − d ∈ K.

We verify the conditions in Lemma 2.17 for (2.4.6) with Ỹ = W × Rm1 , Z̃ = Rm2 × Z,
E = E+ , G = (G0 , L), and K̃ = Rm2 ,− × K. Let S be the feasible set of (2.4.6):

S(α, β, γ , δ) = {x ∈ X : E+ x = b, G0 x ≤ c, Lx − d ∈ K},

where the relationship between (α, β, γ , δ) and (a, b, c, d) is given by (2.4.5). Clearly,
x0 ∈ S(0) and moreover x0 is regular point. From Theorem 2.8 it follows that there exists
a neighborhood V of 0 such that for all (α, β, δ, γ ) ∈ V the set S(α, β, γ , δ) is nonempty.
Lemma 2.17 implies the existence of a unique solution x to (2.4.6) for every (α, β, δ, γ ) ∈ V .
By (H1)–(H2) and Theorems 2.12, 2.11, and 2.14 the solution x is Hölder continuous with
exponent 12 with respect to (α, β, γ , δ) ∈ V . Next we show that x = x(α, β, γ , δ) is a
regular point for (2.4.6), i.e.,
⎧⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ 0 E+ 0 ⎬
0 ∈ int ⎝ G0 x − c ⎠ + ⎝ G0 ⎠ (X − x) + ⎝ Rm2 ,+ ⎠ .
⎩ ⎭
Lx − d L −K

This is equivalent to
⎧⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ β E+ 0 ⎬
0 ∈ int ⎝ γ ⎠ + ⎝ G0 ⎠ (X − x0 ) + ⎝ Rm2 ,+ ⎠ .
⎩ ⎭
(x0 , p0 ) + δ L −K

i i

i i
ItoKunisc
i i
2008/6/12
page 49
i i

2.4. Lipschitz continuity 49

Since x0 is a regular point, there exists a neighborhood Ṽ ⊂ V of 0 such that this inclusion
holds for all (α, β, γ , δ) ∈ Ṽ . Hence the existence of a solutions to (2.4.4) follows from
Lemma 2.17.

Uniqueness. To argue uniqueness, assume that (xi , λ̃i , μ0i , ηi ), i = 1, 2, are solutions to
(2.4.4). It follows that

A(x1 − x2 ) + E+∗ (λ̃1 − λ̃2 ) + G∗0 (μ01 − μ02 ) + L∗ (η1 − η2 ), x1 − x2


≤ 0,

E+ (x1 − x2 ) = 0,

G0 (x1 − x2 ), μ01 − μ02


≥ 0,

L(x1 − x2 ), η1 − η2
≥ 0.

Using (H2) we find

κ |x1 −x2 |2 ≤ A(x1 −x2 ), x1 −x2


≤ − G∗0 (μ01 −μ02 ), x1 −x2
−(L∗ (η1 −η2 ), x1 −x2 ) ≤ 0,

which implies that x1 = x2 = x. To show uniqueness of the remaining components, we


observe that
E+∗ (λ̃1 − λ̃2 ) + G∗0 (μ01 − μ02 ) + L∗ (η1 − η2 ) = 0,

η1 − η2 , L(x − x0 ) + (x0 , p0 ) + δ
= 0,
or equivalently
E ∗ (z)(λ̃1 − λ̃2 , μ01 − μ02 + η1 − η2 ) = 0,
where z = L(x − x0 ) + (x0 , p0 ) + δ and E is defined below (2.1.6). Due to continuous
dependence of x on (α, β, γ , δ) we can assure that z ∈ O for all (α, β, γ , δ) ∈ Ṽ . Hence
(H3) implies that (λ̃1 − λ̃2 , μ01 − μ02 , η1 − η2 ) = 0.

Continuity. By (H3) the operator E ∗ ((x0 , p0 )) has closed range and injective. Hence there
exists  > 0 such that
|E ∗ ((x0 , p0 ))| ≥  ||
for all  = (λ̃, μ0 , η). Since E ∗ ((x0 , p0 )) − E ∗ (z) ≤ |z − (x0 , p0 )|, there exists a
neighborhood of 0 in X × (W × Rm1 ) × Rm2 × Z, again denoted by Ṽ , such that

|E ∗ (zδ )| ≥ || (2.4.7)
2
for all (α, β, γ , δ) ∈ Ṽ , where zδ = (x0 , p0 ) + Lx − Lx0 + δ. Any solution (x, λ̃, μ0 , η)
satisfies
E ∗ (zδ )(λ̃, μ0 , η) = (−Ax − a, 0).
It thus follows from (2.4.7) that there exists a constant k1 such that for all (α, β, γ , δ) ∈ Ṽ
the solution (x, λ̃, μ̃, η) satisfies

|(x, λ̃, μ0 , η)|X×(W ×Rm1 )×Rm2 ×Z ≤ k1 .

i i

i i
ItoKunisc
i i
2008/6/12
page 50
i i

50 Chapter 2. Sensitivity Analysis

Henceforth let (αi , βi , γi , δi ) ∈ Ṽ and let (xi , λ̃i , μ0i , ηi ) denote the corresponding solution
to (2.4.4) for i = 1, 2. Then
⎛ ⎞
λ̃1 − λ̃2  
α1 − α2 + A(x2 − x1 )
E ∗ (zδ1 ) ⎝ μ01 − μ02 ⎠ = .
η2 , L(x2 − x1 ) + δ2 − δ1

η1 − η 2

By (2.4.7) this implies that there exists a constant k2 such that

|(λ̃1 − λ̃2 , μ01 − μ02 , η1 − η2 )|(W ×Rm1 )×Rm2 ×Z ≤ k2 (|α1 − α2 | + |δ1 − δ2 | + |x1 − x2 |). (2.4.8)

Now, from the first equation in (2.4.4) we have

A(x1 − x2 ) + E+∗ (λ̃1 − λ̃2 ) + G∗0 (μ01 − μ02 ) + L∗ (η1 − η2 ) + a1 − a2 , x1 − x2


≤ 0. (2.4.9)

We also find
μ01 − μ02 , G0 (x1 − x2 )
= μ01 , c1 − c2
+ μ01 , c2 − G0 x2

(2.4.10)
− μ02 , G0 x1 − c1
− μ02 , c1 − c2
≥ μ01 − μ02 , c1 − c2

and similarly

η1 − η2 , L(x1 − x2 )
≥ η1 − η2 , d1 − d2
. (2.4.11)

Representing x1 − x2 = v + w with v ∈ ker (R+ ) and w ∈ range (E+∗ ), one obtains

E+ (x1 − x2 ) = E+ w = b1 − b2 .

Thus

|w| ≤ k3 |b1 − b2 | (2.4.12)

for some k3 , and from (H2) and (2.4.9)–(2.4.11)

κ |v|2 ≤ Av, v
= A(x1 − x2 ), x1 − x2
− 2 Av, w
− Aw, w

≤ − λ̃1 − λ̃2 , b1 − b2
− μ01 − μ02 , c1 − c2
− η1 − η2 , d1 − d2

− a1 − a2 , v + w
− 2 Av, w
− Aw, w
.

Let  = (λ̃1 − λ̃2 , μ01 − μ02 , η1 − η2 ). Then

κ |v|2 ≤ |||(β1 − β2 , γ1 − γ2 , δ1 − δ2 )| + |α1 − α2 |(|v| + |w|) + A(2|v||w| + |w|2 ).

It thus follows from (2.4.8), (2.4.12) that there exists a constant k4 such that

|x1 − x2 | ≤ k4 |(α1 − α2 , β1 − β2 , γ1 − γ2 , δ1 − δ2 )|

for all (αi , βi , γi , δi ) ∈ Ṽ . We apply (2.4.8) once again to obtain Lipschitz continuity of
(x, λ̃, μ0 , η) with respect to (α, β, γ , δ) in a neighborhood of the origin. Consequently T

i i

i i
ItoKunisc
i i
2008/6/12
page 51
i i

2.4. Lipschitz continuity 51

is strongly regular at (x0 , λ0 , μ0 , η0 ), Robinson’s theorem is applicable, and (2.4.1), (2.4.2)


follow.

Local solution. We show that there exists a neighborhood Ñ of p0 such that for p ∈ Ñ the
second order sufficient optimality (2.3.21) is satisfied at x(p) so that x(p) is a local solution
of (2.1.1) by Theorem 2.12. Due to (H2) and the smoothness properties of f, e, g we can
assume that
κ 2
L (x(p), p, λ(p), η(p))(h, h) ≥ |h| for all h ∈ ker E+ (2.4.13)
2
if p ∈ N (p0 ). Let us define Ep = (e (x(p), p), g+
(x(p), p)) for p ∈ N (p0 ). Due to
surjectivity of Ep0 and smoothness properties of e, g there exists a neighborhood Ñ ⊂ N (p0 )
of p0 such that Ep is surjective for all p ∈ Ṽ . Lemma 2.13 implies the existence of δ0 >
and γ > 0 such that

L (x(p), p, λ(p), η(p))(h + z, h + z) ≥ δ0 |h + z|2

for all h ∈ ker E+ and z ∈ X satisfying |z| ≤ γ |h|. The orthogonal projection onto ker Ep
is given by Pker Ep = I − Ep∗ (Ep Ep∗ )−1 Ep . We can select Ñ so that

γ
|Ep∗ (Ep Ep∗ )−1 Ep − Ep∗0 (Ep0 Ep∗0 )−1 Ep0 | ≤
1+γ

for all p ∈ Ṽ . For x ∈ ker Ep , we have x = h + z with h ∈ ker E+ and z ∈ (ker E+ )⊥


and |x|2 = |h|2 + |z|2 . Thus,
γ
|z| ≤ |Ep∗ (Ep Ep∗ )−1 Ep x − Ep∗0 (Ep0 Ep∗0 )−1 Ep0 x| ≤ (|h| + |z|)
1+γ

and hence |z| ≤ γ |h|. From (2.4.13) this implies

L (x(p), p, λ(p), η(p)) ≥ δ0 |x|2 for all x ∈ ker Ep

and, since L((p), p) ⊂ ker Ep , the second order sufficient optimality (2.3.21) is satisfied
at x(p).

The first part of the proof of Theorem 2.16 contains a Lipschitz continuity result for
linear complementarity problems. In the following corollary we reconsider this result in
a modified form which will be used for the convergence analysis of sequential quadratic
programming problems.
Throughout the following discussion p0 in (2.1.1) (with C = X) is fixed and therefore
its notation is suppressed. For (x, λ, μ) ∈ X × W ∗ × Rm let

L(x, λ, μ) = f (x) + λ, e(x)


+ μ, g(x)
;

i.e., the equality and finite rank inequality constraints are realized in the Lagrangian term.
For (x, λ, μ) ∈ X×W ∗ ×Rm , (x̄, λ̄, μ̄) ∈ X×W ∗ ×Rm , and (a, b, c, d) ∈ X×W ×Rn ×Z

i i

i i
ItoKunisc
i i
2008/6/12
page 52
i i

52 Chapter 2. Sensitivity Analysis

consider


⎪ min 12 L (x̄, λ̄, μ̄)(x − x̄, x − x̄) + f (x̄) + a, x − x̄
,






⎨ e(x̄) + e (x̄)(x − x̄) = b,
(2.4.14)



⎪ g(x̄) + g (x̄)(x − x̄) ≤ c,





(x̄) + L(x − x̄) − d ∈ K.

Let (x̄, λ̄, μ̄) ∈ X × W ∗ × Rm and define the operator G from X × W ∗ × Rm × Z ∗ to


X × W × Rm × Z by

G(x̄, λ̄, μ̄)(x, λ, μ, η)


⎛ ⎞ ⎛ ⎞
f (x̄) L (x̄, λ̄, μ̄)(x − x̄) + e (x̄)∗ λ + g (x̄)∗ μ + L∗ η
⎜ −e(x̄) ⎟ ⎜ −e (x̄)(x − x̄) ⎟
=⎜ ⎝−g(x̄)⎠ ⎝
⎟+⎜ ⎟.

−g (x̄)(x − x̄)
−(x̄) −L(x − x̄)

Note that ⎛
⎞ ⎛ ⎞
a 0
⎜ b ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ ⎜
⎝ c ⎠ + G(x̄, λ̄, μ̄)(x, λ, μ, η) + ⎝
⎟ (2.4.15)
∂ψRm,+ (μ) ⎠
d ∂ψK + (η)
is the first order optimality system for (2.4.14). The following modification of (H2) will be
used

(H2) there exists κ > 0 such that L (x0 , λ0 , μ0 ) x, x
X ≥ κ|x|2X for all x ∈ ker(E+ ).

Corollary 2.18. Assume that (H1), (H2), , (H3) hold at a local solution of (2.1.1) and that
the second derivatives of f , e, and g are Lipschitz continuous in a neighborhood of x0 .
Then there exist neighborhoods U (ξ0 ) of ξ0 = (x0 , λ0 , μ0 , η0 ) in X × W ∗ × Rm × Z ∗ ,
Û (x0 , λ0 , μ0 ) of (x0 , λ0 , μ0 ), and V of the origin in X × W × Rm × Z and a constant K̃,
such that for all (x̄, λ̄, μ̄) ∈ Û (x0 , λ0 , μ̄0 )) and q = (a, b, c, d) ∈ V , there exists a unique
solution ξ = ξ(x̄, λ̄, μ̄, q) = (x, λ, μ, η) ∈ U (ξ0 ) of (2.4.15), and x is a local solution of
(2.4.14). Moreover, for every pair q1 , q2 ∈ V and (x̄1 , λ̄1 , μ̄1 ), (x̄2 , λ̄2 , μ̄2 ) ∈ Û (x0 , λ0 , μ̄0 )
we have

|ξ(x̄1 , λ̄1 , μ̄1 , q1 ) − ξ(x̄2 , λ̄2 , μ̄2 , q2 )| ≤ K̃(|(x̄1 , λ̄1 , μ̄1 ) − (x̄2 , λ̄2 , μ̄2 )| + |q1 − q2 |).

Proof. From the first part of the proof of Theorem 2.16 it follows that
⎛ ⎞ ⎛ ⎞
a 0
⎜ b ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ ⎜
⎝ c ⎠ + G(x0 , λ0 , μ0 )(x, λ, μ, η) + ⎝ ∂ψRm,+ (μ)


d ∂ψK + (η)

i i

i i
ItoKunisc
i i
2008/6/12
page 53
i i

2.5. Differentiability 53

admits a unique solution ξ = (x, λ, μ, η) which depends on (a, b, c, d) Lipschitz contin-


uously provided that (a, b, c, d) is sufficiently small. The proof of the corollary can be
completed by observing that the estimates in the proof of Theorem 2.16 also hold uniformly
if (x0 , λ0 , μ0 ) is replaced by (x̄, λ̄, μ̄) in a sufficiently small neighborhood Û (x0 , λ0 , μ0 )
of (x0 , λ0 , μ0 ), and (a, b, c, d) is in a sufficiently small neighborhood V of the origin in
X × W × Rm × Z.
As an alternative to reconsidering the proof of Theorem 2.16 one can apply Theorem
2.16 to (2.1.1) with P = (X × W ∗ × Rm ) × (X × W × Rm × Z), p = (x, λ, μ, a, b, c, d),
p0 = (x0 , λ0 , μ0 , 0), and f, e, g,  replaced by
f˜(x, x̄, λ̄, μ̄, a) = 12 L (x̄, λ̄, μ̄)(x − x̄, x − x̄) + f (x̄) + a, x − x̄
,

ẽ(x; x̄, b) = e(x̄) + e (x̄)(x − x̄) − b,

g̃(x; x̄, c) = g(x̄) + g (x̄)(x − x̄) − c,

˜ x̄, d) = (x̄) + L(x − x̄) − d.


(x;
Clearly (H1)–(H3) hold for ẽ, g̃, ˜ at (x0 , p0 ) with p0 = (x0 , λ0 , μ0 , 0). Lipschitz continuity
of f˜, ẽ, g̃, ˜ with respect to (x̄, λ̄, μ̄, a, b, c, d) is a consequence of the general regularity
requirements on f, e, g,  made at the beginning of the chapter and the assumption that
the second derivatives of f and e are Lipschitz continuous in a neighborhood of x0 . This
˜
implies (H4) for ẽ, f˜, g̃, .

2.5 Differentiability
In this section we discuss the differentiability properties of the solution ξ(p) = (x(p), λ(p),
μ(p), η(p)) of the optimality system (2.1.3) with respect to p. Throughout it is assumed
that p ∈ N so that (2.1.3) has the unique solutions ξ(p) in U , where N and U are specified
in Theorem 2.16. We assume that C = X.

Definition 2.19. A function φ from P to a normed linear space X̃ is said to have a directional
derivative at p0 ∈ P if
φ(p0 + t q) − φ(p0 )
lim
t→0+ t
exists for all q ∈ P .

Consider the linear generalized equation


⎧ ∗ ∗ ∗
⎪ L (x0 , p, λ0 , μ0 , η0 ) + A(x − x0 ) + E (λ − λ0 ) + G (μ − μ0 ) + L (η − η0 ),


−e(x0 , p) − E(x − x0 ),
0∈

⎪ −g(x 0 , p) − G(x − x0 ) + ∂ψRm,+ (μ),

−(x0 , p) − L(x − x0 ) + ∂ψK + (η).
(2.5.1)
In the proof of Theorem 2.16 strong regularity of (2.1.4) at ξ(0) = (x0 , λ0 , μ0 , η0 ) was
established. Therefore it follows from Theorem 2.5 that there exist neighborhoods of p0

i i

i i
ItoKunisc
i i
2008/6/12
page 54
i i

54 Chapter 2. Sensitivity Analysis

and ξ(p0 ) which, without loss of generality, we again denote by N and U , such that (2.5.1)
admits a unique solution ξ̂ (p) = (x̂(p), λ̂(p), μ̂(p), η̂(p)) in U for every p ∈ N . Moreover
we have
|ξ(p) − ξ̂ (p)| ≤ α(p) |p − p0 |, (2.5.2)

where α(p) : N → R+ satisfies α(p) → 0 as p → p0 . Thus if ξ̂ has a directional derivative


at p0 , then so does ξ(p) and the directional derivatives coincide. We henceforth concentrate
on the analysis of directional differentiability of ξ̂ (p). From (2.5.2) and Lipschitz continuity
if p → ξ(p) at p0 , it follows that for every q ∈ P

ξ̂ (p + t q) − ξ̂ (p )
0 0
lim sup < ∞.
t>0 t

Hence there exist weak cluster points of ξ̂ (p0 +t q)−


t
ξ̂ (p0 )
as t → 0+ . Note that these weak
ξ(p0 +t q)−ξ(p0 )
cluster points coincide with those of t
as t → 0+ . We will show that under
appropriate conditions, these weak cluster points are strong limit points and we derive
equations that they satisfy. The following definition and additional hypotheses will be used.

Definition 2.20. A closed convex set C of a Hilbert space H is called polyhedric at z ∈ H if

∪λ>0 λ(C − P z) ∩ [z − P z]⊥ = ∪λ>0 λ(C − P z) ∩ [z − P z]⊥ ,

where P denotes the metric projection onto C and [z − P z]⊥ stands for the orthogonal
complement of the subspace spanned by x − P z ∈ H . Moreover C is called polyhedric if
C is polyhedric at every z ∈ H .
(H5) e(x0 , ·), (x0 , ·), f (x0 , ·), e (x0 , ·), g (x0 , ·), and  (x0 , ·) are directionally differen-
tiable at p0 .
(H6) K is polyhedric at (x0 , p0 ) + η0 .
(H7) There exists ν > 0 such that Ax, x
≥ ν |x|2 for all x ∈ ker E.
 
(H8) EL : X → W × Z is surjective.
Since every element z ∈ Z can be decomposed uniquely as z = z1 + z2 with z1 = PK z and
z2 = PK + z and z1 , z2
= 0 (see [Zar]), (H.6) is equivalent to

∪λ>0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ = ∪λ>0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ .

Recall the decomposition of the inequality constraint with finite rank and the nota-
tion that was introduced in (2.1.6). Due to the complementarity condition and continuous
dependency of ξ(p) on p one can assume that

g + (x(p), p) = 0, g − (x(p), p) < 0, μ+ (p) > 0, μ− (p) = 0 (2.5.3)

for p ∈ N. This also holds with (x(p), μ(p)) replaced by (x̂(p), μ̂(p)).

i i

i i
ItoKunisc
i i
2008/6/12
page 55
i i

2.5. Differentiability 55

Theorem 2.21. Let (H1)–(H5) hold and let (ẋ, λ̇, μ̇, η̇) denote a weak cluster point of
ξ(p0 +t q)−ξ(p0 )
t
as t → 0+ . Then (ẋ, λ̇, μ̇, η̇) satisfies


⎪ Lp (x0 , po , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗ μ̇ + L∗ η̇,







⎪ −ep+ (x0 , p0 )q − E+ ẋ,



0∈ −gp0 (x0 , p0 ) − G0 ẋ + ∂ψRm2 ,+ (μ̇0 ), (2.5.4)







⎪ μ̇− ,





η̇, (x0 , p0 )
+ η0 , Lẋ + p (x0 , p0 )q
.

Proof. As described above, we can restrict ourselves to a weak cluster point of ξ̂ (p0 +t q))−
t
ξ̂ (p0 )
.
We put pn = p0 + tn q. By (2.5.1)

0 = L (x0 , pn , λ0 , μ0 , η0 ) − L (x0 , p0 , λ0 , μ0 , η0 )

+A(x̂(pn ) − x0 ) + E ∗ (λ̂(pn ) − λ0 ) + G∗ (μ̂(pn ) − μ0 ) + L∗ (η̂(pn ) − η0 ) = 0,

0 = −e(x0 , pn ) + e(x0 , p0 ) − E(x̂(pn ) − x0 ) = 0.

Dividing these two equations by tn > 0 and letting tn → 0+ we obtain the first two equations
in (2.5.4). For the third equation note that μ̇0 ≥ 0 since μ̂0 (pn ) ≥ 0 and μ0 (p0 ) = 0. Since
g 0 (x0 , p0 ) = 0 we have from (2.5.1)

g 0 (x0 , pn ) − g 0 (x0 , p0 ) + G0 (x̂(pn ) − x0 ), z − (μ̂0 (pn ) − μ0 (p0 )


≥ 0

for all z ∈ Rm2 ,+ . Dividing this inequality by tn > 0 and letting tn → 0+ we obtain the
third inclusion. The fourth equation is obvious. For the last equation we recall that

η̂(pn ), (x0 , pn ) + L(x̂(pn ) − x0 )


= η0 , (x0 , p0 )
= 0.

Thus,

η̂(pn ) − η0 , (x0 , pn )
+ η̂(pn ), L(x̂(pn ) − x0 )
+ η0 , (x0 , pn ) − (x0 , p0 )
= 0,

which implies the last equality.

Define the value function V (p)

V (p) = inf {f (x, p) : e(x, p) = 0, g(x, p) ≤ 0, (x, p) ∈ K}.


x∈c

Then we have the following corollary.

Corollary 2.22 (Sensitivity of Cost). Let (H.1)–(H.4) hold and assume that e, g,  are
continuously differentiable in the sense of Fréchet at (x0 , p0 ). Then the Gâteaux derivative

i i

i i
ItoKunisc
i i
2008/6/12
page 56
i i

56 Chapter 2. Sensitivity Analysis

of the value function μ at p0 exists and is given by

V (p0 ) = Lp (x0 , p0 , λ0 , μ0 , η0 )

= fp (x0 , p0 ) + λ0 , ep (x0 , p0 )
+ μ0 , gp (x0 , p0 )
+ η0 , p (x0 , p0 )
.

Proof. Note that

V (p) = f (x(p), p) = L(x(p), p, λ(p), μ(p), η(p)).

For q ∈ P let ξ(t) = t −1 (x(p0 + t q) − x(p0 )), t > 0. Then there exists a subsequence tn
such that limn→∞ tn = 0 and ξ(tn ) → (ẋ, λ̇, μ̇, η̇). Now, let pn = p0 + tn q and

V (pn ) − V (p0 ) = L(x(pn ), pn , λ(pn ), μ(pn ), η(pn ))

−L(x(pn ), p0 , λ(pn ), μ(pn ), η(pn )) + L(x(pn ), p0 , λ(pn ), μ(pn ), η(pn ))

−L(x(pn ), p0 , λ(p0 ), μ(p0 ), η(p0 )) + L(x(pn ), p0 , λ(p0 ), μ(p0 ), η(p0 ))

−L(x(p0 ), p0 , λ(p0 ), μ(p0 ), η(p0 )).

Thus,
V (pn )−V (p0 )
tn
= L (x0 , p0 , λ0 , μ0 , η0 ), ẋ
+ L(x0 , p0 , λ0 , μ0 , η0 )p , q

+ λ̇, e(x0 , p0 )
+ μ̇, g(x0 , p0 )
+ η̇, (x0 , p0 )
+ O(tn ).

Clearly λ̇, e(x0 , p0 )


= 0 and considering separately the cases according to (2.5.3)
we find μ̇, g(x0 , p0 )
= 0 as well. Below we verify that

η0 , Lẋ + p (x0 , p0 )q
= 0. (2.5.5)

From Theorem 2.21 this implies that η̇, (x0 , p0 )


= 0. To verify (2.5.5) note that from
(2.5.1) we have

η̂(tn ), (x0 , p0 ) − (x0 , pn ) − L(x̂(tn ) − x0 )


≤ 0

and consequently η0 , p (x0 , p0 )q + L ẋ


≥ 0. Moreover

η0 , (x0 , pn ) − (x0 , p0 ) + L(x̂(tn ) − x0 )


≤ 0

and therefore η0 , p (x0 , p0 )q + L ẋ


≤ 0 and (2.5.5) holds. It follows that

V (pn ) − V (p0 )
lim = Lp (x0 , p0 , λ0 , μ0 , η0 ), q
.
n→∞ tn

Since the limit does not depend on the sequence {tn } the desired result follows.

i i

i i
ItoKunisc
i i
2008/6/12
page 57
i i

2.5. Differentiability 57

In order to prove strong differentiability of x(p) we use a result due to Haraux [Har]
on directional differentiability of the metric projection onto a closed convex set.

Theorem 2.23. Let C be a closed convex set in a Hilbert space H with metric projection
P from H onto C. Let ψ be an H -valued function and assume that C is polyhedric at
ψ(0). Set

K̂(ψ(0)) = ∪λ>0 λ(C − P ψ(0)) ∩ [ψ(0) − P ψ(0)]⊥ ,

and denote by PK̂(ψ(0)) the projection onto K̂(ψ(0)). If there exists a sequence tn such that
limn→∞ tn → 0+ and limn→∞ ψ(tn )−ψ(0)tn
=: ψ̇ exists in H , then

P ψ(tn ) − P ψ(0)
lim = PK̂(ψ(0)) ψ̇.
tn →0+ tn

Proof. Let γ (t) = P ψ(t)−P


t
ψ(0)
. Since the metric projection onto a closed convex set in a
Hilbert space is a contraction, γ (tn ) has a weak limit which is denoted by γ . Since

ψ(tn ) − P ψ(tn ), P ψ(0) − P ψ(tn )


≤ 0

and P ψ(tn ) = tn γ (tn ) + P ψ(0), we obtain

ψ(tn ) − tn γ (tn ) − P ψ(0), −tn γ (tn )


≤ 0.

This implies
 
ψ(tn ) − ψ(0)
tn + ψ(0) − tn γ (tn ) − P ψ(0), −tn γ (tn ) ≤ 0
tn
and hence
 
ψ(tn ) − ψ(0)
tn2 γ (tn ), γ (tn ) − ≤ tn γ (tn ), ψ(0) − P ψ(0)

tn (2.5.6)
= P ψ(tn ) − P ψ(0), ψ(0) − P ψ(0)
≤ 0

for all n. Since the norm is weakly lower semicontinuous, (2.5.6) implies that

γ , γ − ψ̇
≤ 0. (2.5.7)

Employing (2.5.6) again we have


 
ψ(tn ) − ψ(0)
0 ≥ γ tn ), ψ(0) − P ψ(0)
≥ tn γ (tn ), γ (tn ) −
tn
for tn > 0. Since {γ (tn )} is bounded, this implies

γ , ψ(0) − P ψ(0)
= 0,

and thus γ ∈ K̂(ψ(0)).

i i

i i
ItoKunisc
i i
2008/6/12
page 58
i i

58 Chapter 2. Sensitivity Analysis

Let w ∈ ∪λ>0 λ(C − P ψ(0)) ∩ [ψ(0) − P ψ(0)]⊥ be arbitrary. Then there exist λ > 0
and u ∈ C such that w = λ (u − P ψ(0)) and ψ(0) − P ψ(0), u − P ψ(0)
= 0. Since

ψ(tn ) − tn γ (tn ) − P ψ(0), u − P ψ(0) − tn γ (tn )


≤ 0,

we find for δn = γ (tn ) − γ that


 
ψ(tn ) − ψ(0)
tn + ψ(0) − P ψ(0) − tn γ − tn δn , u − P ψ(0) − tn γ − tn δn ≤ 0.
tn
Thus,
 
ψ(tn ) − ψ(0)
tn − tn γ − tn δn , u − P ψ(0) − tn γ − tn δn ≤ ψ(0) − P ψ(0), tn δn

tn
and therefore
 
ψ(tn ) − ψ(0)
− γ , u − P ψ(0) ≤ δn , u − P ψ(0)
+ ψ(0) − P ψ(0), δn
+ tn M
tn
for some constant M. Letting tn → 0+ we have

ψ̇ − γ , u − P ψ(0)
≤ 0.

Since C is polyhedric at ψ(0) it follows that

ψ̇ − γ , w
≤ 0 for all w ∈ K̂(ψ(0)). (2.5.8)

Combining (2.5.7) and (2.5.8) one obtains

γ − ψ̇, γ − w
≤ 0 for all w ∈ K̂(ψ(0)),

and thus γ = PK̂(ψ(0)) ψ̇


To show strong convergence of γ (tn ) observe that from (2.5.7) and (2.5.8)

ψ̇ − γ , γ
= 0.

Hence
 
ψ(tn ) − ψ(0)
ψ̇, γ
= |γ | ≤ lim inf |γ (tn )| ≤ lim sup
2 2
, γ (tn ) = ψ̇, γ
,
tn

which implies that lim |γ (tn )|2 = |γ |2 and completes the proof.

Theorem 2.24 (Sensitivity Equation). Let (H1)–(H8) hold. Then the solution mapping
p → ξ(p) = (x(p), λ(p), μ(p), η(p)) is directionally differentiable at p0 , and the direc-
tional derivative (ẋ, λ̇, μ̇, η̇) at p0 in direction q ∈ P satisfies


⎪ Lp (x0 , p0 , λ0 , μ0 , η0 )q + Aẋ + E ∗ λ̇ + G∗+ μ̇+ G∗0 μ̇0 + L∗ η̇,

−ep+ (x0 , p0 )q − E+ ẋ,
0∈ (2.5.9)

⎪ −gp0 (x0 , p0 )q − G0 ẋ + ∂ψRm,+ (μ̇0 ),

−p (x0 , p0 )q − Lẋ + ∂ψK̂ + (η̇).

i i

i i
ItoKunisc
i i
2008/6/12
page 59
i i

2.5. Differentiability 59

Proof. Let {tn } be a sequence of real numbers with limn→∞ tn = 0+ and w−lim tn−1 (ξ(p0 +
tn q) − ξ(p0 )) = (ẋ, λ̇, μ̇, η̇). Then (ẋ, λ̇, μ̇, η̇) is also a weak cluster point of w −
lim tn−1 (ξ̂ (p0 + tn q) − ξ(p0 )). The proof is now given in several steps.
We first show that x̂(tn )−x(0)
tn
converges strongly in X as n → ∞ and (2.5.9) holds for
all weak cluster points. Let pn = p0 + tn q and define

φ(tn ) = L (x0 , pn , λ0 , μ0 , η0 ) + f (x0 , p0 ) − Ax0 .

From (2.5.1) we have




⎪ L (x0 , pn , λ0 , μ0 , η0 ) + f (x0 , p0 ) + A(x̂(tn ) − x0 ) + E ∗ (λ̂(tn ))


⎨ +G∗ (μ̂(tn )) + L∗ (η̂(tn )),
0∈ −e(x0 , pn ) − E(x̂(tn ) − x0 ), (2.5.10)


⎪ −g(x0 , pn ) − G(x̂(tn ) − x0 ) + ∂ψRm,+ (μ̂(tn )),


−(x0 , pn ) − L(x̂(tn ) − x0 ) + ∂ψK + (η̂(tn )).

By (H8) there exists a unique w(tn ) ∈ range (E ∗ , L∗ ) such that


⎛ ⎞ ⎛ ⎞
E −e(x0 , pn ) + Ex0
⎝ ⎠ w(tn ) = ⎝ ⎠.
L −(x0 , pn ) + Lx0

Due to (H5) and (H8) there exists ẇ ∈ X such that lim tn−1 (w(tn ) − w(0)) = ẇ. If we
define y(tn ) = x̂(tn ) − w(tn ), then using Lemma 2.7 one verifies that y(tn ) ∈ C, where

C = {x ∈ X : Ex = 0 and Lx ∈ K}.

Observe that
E ∗ λ̂(tn ) + L∗ η̂(tn ), c − yn (tn )
≤ 0 for all c ∈ C

and thus

Ayn (t) + φ(tn ) + Aw(tn ) + G∗ μ̂(tn ), c − y(tn )


≥ 0 for all c ∈ C. (2.5.11)

If we equip ker E with the inner product

((x, y)) = Ax, y


for x, y ∈ ker E,

then ker E is a Hilbert space by (H7). Let AP = Pker E A Pker E and define

ψ(tn ) = −A−1 ∗
P Pker E (φ(tn ) + Aw(tn ) + G μ(tn )) ∈ ker E.

Then limtn →0+ ψ(tn )−ψ(0)


tn
=: ψ̇ exists, and (2.5.11) is equivalent to

((y(tn ) − ψ(tn ), c − y(tn ))) ≥ 0 for all c ∈ C.

i i

i i
ItoKunisc
i i
2008/6/12
page 60
i i

60 Chapter 2. Sensitivity Analysis

Thus y(tn ) = PC ψ(tn ), where PC denotes the metric projection in ker E onto PC with
respect to the inner product ((·, ·)). For any h ∈ ker E we find

((h, ψ(0) − PC ψ(0) )) = h, −f (x0 , p0 ) + A xo − A w(0) − G∗ μ0 − A y(0)

= h, −f (x0 , p0 ) − G∗ μ0 − E ∗ λ0
= h, L∗ η0
,

and therefore
[ψ(0) − PC ψ(0)]⊥ = {h : η0 , L h
= 0}. (2.5.12)

Here the orthogonal complement is taken with respect to the ((·, ·))—the inner product
on ker E. Moreover we have L(y(0) = (x0 , p0 ). Hence from Lemma 2.25 below we
have ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) − PC ψ(0)]⊥ = ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) −
PC ψ(0)]⊥ , i.e., C is polyhedric with respect to ψ(0). Theorem 2.23 therefore implies
that limtn →o+ (tn−1 )(y(tn ) − y(0)) = : ẏ exists and satisfies

ˆ
((ψ̇ − ẏ, v − ẏ)) ≥ 0 for all v ∈ C(ψ(0)). (2.5.13)

Moreover limtn →0 (tn−1 )(x(tn ) − x(0)) exists and will be denoted by ẋ.
To verify that (ẋ, λ̇, μ̇, η̇) satisfies (2.5.9) it suffices to prove the last inclusion. From
(2.5.13) we have

−L p (x0 , p0 , λ0 , μ0 , η0 )q − A ẋ − G∗ μ̇, v − ẏ
≤ 0

and from the first equation of (2.5.4)

L∗ η̇, v − ẏ
= η̇, L v − p (x0 , p0 )q
≤ 0 (2.5.14)

ˆ
for all v ∈ C(ψ(0)) = ∪λ>0 λ(C − PC ψ(0)) ∩ [ψ(0) − PC ψ(0)]⊥ . Here we used the

facts that φ̇ = Lp (x0 , λ0 , μ0 , ηo )q and L ẇ = −p (x0 , p0 )q. From (2.5.12), (H8), and
L y(0) = (x0 , p0 ) we have

ˆ
L C(ψ(0)) = ∪λ>0 λ(K − (x0 , p0 ) ∩ [η0 ]⊥ = : K̂

and therefore by (2.5.14)

η̇, v − L ẋ −  p(x0 , p0 )q
≤ 0 for all v ∈ K̂. (2.5.15)

Recall from (2.5.5) that


η0 , p (xo , p0 )q + L ẋ
= 0.

This further implies p (x0 , p0 )q + L ẋ ∈ K̂, and from (2.5.15) we deduce η̇ ∈ K̂ + . Conse-
quently η̇ ∈ ∂ ψK̂ (p (x0 , p0 )q +L ẋ), which is equivalent to  p(xo , p0 )q +L ẋ ∈ ∂ ψK̂ + (η̇)
and implies the last inclusion in (2.5.9).

i i

i i
ItoKunisc
i i
2008/6/12
page 61
i i

2.5. Differentiability 61

Next, we show that the weak cluster points of limt→0+ ξ̂ (t)−ξ(0) t


are unique. Let
(ẋi , λ̇i , μ̇i , η̇i ), i = 1, 2, be two weak cluster points. Then by (2.5.9)
0 = A(ẋ1 − ẋ2 ), ẋ1 − ẋ2
+ μ̇1 − μ̇2 , G(ẋ1 − ẋ2 )
+ η̇1 − η̇2 , L(ẋ1 − ẋ2 )

= A(ẋ1 − ẋ2 ), ẋ1 − ẋ2


+ μ̇01 , −gp0 (x0 , p0 )q − G0 ẋ2
+ μ̇02 , −gp0 (x0 , p0 )q − G0 ẋ1

+ η̇1 , −p (x0 , p0 )q − Lẋ2


+ η̇2 , −p (x0 , p0 )q − Lẋ1

≥ A(ẋ1 − ẋ2 ), ẋ1 − ẋ2


,
and by (H2) this implies ẋ1 = ẋ2 . From (2.5.9) we have
⎛ ˙ ⎞
λ̃1 − λ̃˙ 2
E ∗ ((x0 , p0 )) ⎝ μ̇01 − μ̇02 ⎠ = 0
η̇1 − η̇2

and hence (H3) implies (λ̇1 , μ̇1 , η̇1 ) = (λ̇2 , μ̇2 , η̇2 ). Consequently ξ̂ (t)−ξ(0)
t
has a unique
weak limit as t → 0+ . Finally we show that the unique weak limit is also strong. For the
x-component this was already verified. Note that from (2.5.1)
⎛˜ ⎞ ⎛ ⎞
λ̂(t) − λ̃0 −L (x0 , p0 + t q, λ0 , μ0 , η0 ) − A(x̂(t) − x0 )
E ∗ ((x0 , p0 )) ⎝ μ̂0 − μ00 ⎠ = ⎝ ⎠.
η̃(t) − η0 η̂(t), −(x0 , p + t q) + (x0 , p0 ) − L(x(t) ˆ − x0 )

Dividing this equation by t and noticing that the right-hand side converges strongly as
t → 0+ , it follows from (H3) that limt→0+ t −1 (λ̃(t) − λ0 , μ̃(t) − μ0 , η̃(t) − η0 ) converges
strongly as well.

Lemma 2.25. Assume that (H6) and (H8) hold and let y(0) ∈ C be such that L y(0) =
(x0 , p0 ). Then we have
∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
= 0}
(2.5.16)
= ∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
= 0}.

Proof. It suffices to verify that the set on the right-hand side of (2.5.16) is contained in the
set on the left. For this purpose let y ∈ ∪λ>0 λ(C − y(0)) ∩ {h ∈ ker E : η0 , L h
=
0} and decompose y = y1 + y2 with y1 ∈ ker( EL ) and y2 ∈ (ker( EL ))⊥ . We need
only consider y2 . Since L y2 ∈ ∪λ→0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ and therefore L y2 ∈
∪λ>0 λ(K − (x0 , p0 )) ∩ [η0 ]⊥ by (H6), hence there exist sequences {λn } and {kn } with
λn > 0, kn ∈ K, kn , η0
= 0, and lim λn (kn − (x0 , p0 )) = L y2 . Let us define
cn = c̃n +y(0), where c̃n is the unique element in ( EL )∗ satisfying ( EL )c̃n = ( kn −L0 y(0) ). Then
we have E(cn −y(0)) = 0, η0 , L cn
= η0 , kn
= 0 and hence the sequence λn (cn −y(0)) is
contained in the set on the left-hand side of (2.5.16). Moreover limn→∞ ( EL )(λn (cn −y(0)) =
( EL )y2 , and since both λn (cn − y(0)) and y2 are contained in the range of ( EL )∗ we find that
limn→∞ λn (cn − y(0)) = y2 .

i i

i i
ItoKunisc
i i
2008/6/12
page 62
i i

62 Chapter 2. Sensitivity Analysis

2.6 Application to optimal control of an


ordinary differential equation
We consider optimal control of the linear ordinary differential equation with a terminal
constraint  T
min fˆ(y(t), u(t), α)dt
0

subject to

ẏ(t) = A(α)y(t) + B(α)u(t) + h(t) on (0, T ),


(2.6.1)
y(0) = 0,

|y(T ) − y d | ≤ δ,

u≤z a.e. on (0, T ),


where T > 0, y(t) ∈ Rn , u(t) ∈ Rn , α ∈ Rr , y d ∈ Rn , δ > 0, z ∈ L2 , A(α) ∈ Rn×n ,
B(α) ∈ Rn×m , fˆ : Rn ×Rm ×Rr → R, and h ∈ L2 (0, T ). All function spaces of this section
are considered on the interval (0, T ) with Euclidean image space of appropriate dimension.
The perturbation parameter is given by p = (α, y d , h, z) ∈ P = Rr × Rn × L2 × L2 with
reference parameter p0 = (α0 , y0d , h0 , z0 ). To identify (Pp ) as a special case of (2.1.1) we
set
X = HL1 × L2 , W = L2 , Z = L2 , K = {φ ∈ L2 : φ ≤ 0 a.e.},
where HL1 = {φ ∈ H 1 : φ(0) = 0}, x = (y, u), and
 T
f (y, u, p) = fˆ(y, u, α)dt,
0

e(y, u, p) = ẏ − A(α)y − B(α)u − h,

1 
g(y, u, p) = |y(T ) − y d |2 − δ 2 ,
2

(y, u, p) = u − z.
The linearizations E, G, and L are

E(y0 , u0 , p0 )(h, v) = ḣ − A(α0 )h − B(α0 )v,

G(y0 , u0 , p0 )(h, v) = y0 (T ) − y0d , h(T )


,

L(y0 , u0 , p0 )(h, v) = v.

The following assumptions will imply conditions (H1)–(H8) of this chapter.

(A1) There exists a solution (y0 , u0 ) of (2.6.1).

i i

i i
ItoKunisc
i i
2008/6/12
page 63
i i

2.6. Application to optimal control of an ODE 63

(A2) A(·) and B(·) are Lipschitz continuous and directionally differentiable at α0 .
(A3) (y, u) → f (y, u, α) is twice continuously differentiable in a neighborhood of (y0 ,
u0 , α0 ) ∈ HL1 × L2 × Rr .
(A4) There exists  > 0 such that for the Hessian with respect to (y, u)

(h, v) fˆ (y0 , u0 , α0 ) (h, v)T ≥ |v|2


for all (h, v) ∈ Rn × Rm .


(A5) There exist a neighborhood V̂ of (y0 , u0 , α0 ) ∈ HL1 × L2 × Rr and a constant κ
such that |f (y, u, α1 ) − f (y, u, α2 )| ≤ κ|α1 − α2 | for all (y, u, αi ) ∈ V̂ , i = 1, 2,
and f (y0 , u0 , ·) is directionally differentiable at α0 , where the prime denotes the
derivative with respect to (y, u).
T
(A6) y0 (T ) − y0d , 0 eA(α0 )(T −s) B(α0 ) (z0 − u0 )ds
 = 0.

With (A3) holding, the regularity requirements made at the beginning of the chapter
are satisfied. The regular point condition (H1) will be satisfied if for every (φ, ρ, ψ) ∈
L2 × R × L2 there exist (h, v) ∈ HL1 × L2 and (r + , r, k) ∈ R+ × R × K such that


⎪ ḣ − A(α0 )h − B(α0 )v = φ,



y0 (T ) − y0d , h(T )
+ r + + r g(y0 , u0 , p0 ) = ρ, (2.6.2)





v − k + r(u0 − z0 ) = ψ.

From the first and third equations we have

v = ψ + k + r(z0 − u0 ),
 t
h(t) = ρ1 (t) + eA(α0 )(t−s) B(α0 )(k + r(z0 − u0 ))dt,
0
t
where ρ1 (t) = 0 e A(α0 )(t−s)
(B(α0 )ψ +φ)ds. The second equation in (2.6.2) is equivalent to
  T 
y0 (T ) − y0α , eA(α0 )(T −s) B(α0 )(k + r(z0 − u0 ))ds + r + + r g(y0 , u0 , p0 )
0

= ρ − ρ1 (T ).

If g(y0 , u0 , p0 ) = 0, then, using (A6), the desired solution for (2.6.2) is obtained by setting
r + = k = 0 and
  T −1
A(α0 )(T −s)
r = y0 (t) − yo ,
d
e B(α0 )(z0 − u0 )ds (ρ − ρ1 (T )).
0

If g(y0 , u0 , p0 ) < 0 and ρ−ρ1 ≥ 0, then the choice r + = ρ−ρ1 , r = k = 0 gives the desired
solution. Finally if g(y0 , u0 , p0 ) < 0 and ρ − ρ1 < 0, then r = (ρ − ρ1 )g(y0 , u0 , p0 ) > 0,

i i

i i
ItoKunisc
i i
2008/6/12
page 64
i i

64 Chapter 2. Sensitivity Analysis

r + = 0, and k = r(u0 − u) ∈ K give a solution for (2.6.2). Thus the regular point condition
holds and implies the existence of a Lagrange multiplier (λ0 , μ0 , η0 ) ∈ L2 × R × L2 . For
the Lagrangian functional
 T
L(y, u, p0 , λ0 , μ0 , η0 ) = fˆ(y, u, α0 )dt + λ0 , ẏ − A(α0 )y − B(α0 )u − h0

μ0
+ (|y(T ) − y0d | − δ 2 |) − η0 , u − z0
,
2
the Hessian with respect to (y, u) at (y0 , u0 ) in directions (h, v) ∈ HL1 × L2 satisfies

L (y0 , u0 , p0 , λ0 , μ0 , η0 )((h, v), (h, v))
 T
(h(t), v(t))fˆ (y0 , u0 , α0 )(h(t), v(t))T dt + μ0 |h(T )|2 ≥ |v|2L2

=
0

by (A4). Let k1 > 0 be chosen such that |h|HL1 ≤ k1 |v|L2 for all (h, v) ∈ HL1 × L2 in ker E.
Then there exists a constant k2 > 0 such that

L (y0 , u0 , p0 , λ0 , μ0 , η0 )((h, v), (h, v)) ≥ k2 (|h|2H 1 + |v|1L2 )
L

for all (h, v) ∈ ker E and (H2), (H7) hold.


We turn to the verification of (H3). The case g(y0 , u0 , α0 ) < 0 is simple and we
therefore consider the case g(y0 , u0 , α0 ) = 0. For arbitrary (φ, ρ, ψ) ∈ L2 × R × L2
existence of a solution (h, v, r) ∈ H11 × L2 × R to


⎪ ḣ − A(α0 )h − B(α0 )v = φ,



y0 (T ) − y0d , h(T )
= ρ, (2.6.3)





v + r(z0 − u0 ) = ψ
must be shown. From the first and third equations
v = ψ − r(z0 − u0 )
and
 t
h(t) = ρ1 (t) − r eA(α0 )(t−s) B(α0 )(z0 − u0 )ds.
0
From the second equation of (2.6.3) and (H7) we find
  T −1
A(α0 )(T −s)
r = y0 (T ) − y0 ,
d
e B(α0 )(z0 − u0 )ds
0

( y0 (T ) − y0d , ρ1 (T )
− ρ),
and (H3) holds. Conditions (H4) and (H5) follow from (A2) and (A5). The cone of a.e.
nonpositive functions in L2 is polyhedric [Har] and hence (H6) holds. Finally (H8) is simple
to check. We note that (A5) and (A6) are not needed if (2.6.1) is considered without terminal
constraint.

i i

i i
ItoKunisc
i i
2008/6/12
page 65
i i

Chapter 3

First Order Augmented


Lagrangians for Equality
and Finite Rank Inequality
Constraints

3.1 Generalities
This chapter is devoted to first order augmented Lagrangian methods for constrained prob-
lems of the type
-
min f (x) over x ∈ X
(3.1.1)
subject to e(x) = 0, g(x) ≤ 0, (x) ∈ K,

where f : X → R, e : X → W, g : X → Rm ,  : X → Z with X, W , and Z real Hilbert


spaces, K a closed convex cone with vertex at 0 in Z, and Rm endowed with the natural
ordering x ≤ 0 if xi ≤ 0 for i = 1, . . . , m. A sequence of unconstrained problems with the
property that their solutions converge to the solution of (3.1.1) will be formulated. As we
shall see, first order augmented Lagrangian techniques are related to Lagrangian or duality
methods, as well as to the penalty technique. They arise from considering the Lagrangian
functional for (3.1.1) and adding a proper penalty term. Differently from pure penalty
methods, however, the penalty parameter is not taken to infinity. Rather the penalty term with
fixed weight is used to enhance convexity of the Lagrangian functional in the neighborhood
of a local minimum and to speed up convergence of pure Lagrangian methods. Affine
inequality constraints with infinite-dimensional image space can be added to the problem
formulation (3.1.1). Such constraints, however, are not augmented in this chapter. Central
to the analysis is the augmentability property, which is also of interest in its own right. It has
received a considerable amount of attention for finite-dimensional optimization problems;
see, e.g., [Hes1]. Augmentability of (3.1.1) means that a properly defined penalty term
involving the constraints in (3.1.1) can be added to the Lagrangian functional for (3.1.1)
such that the resulting augmented Lagrangian functional has the property that its Hessian
is positive definite on all of X under minimal assumptions on (3.1.1). We shall present
a convergence theory for the first order augmented Lagrangian method applied to (3.1.1)
without strict complementarity assumption. This means that we do not assume that μ∗i = 0
implies gi (x ∗ ) < 0, where x ∗ denotes a local solution of (3.1.1) and μ∗i is the ith coordinate
of the Lagrange multiplier associated to the inequality constraint g(x) ≤ 0. As an application
we shall discuss the fact that in the context of parameter estimation problems the first order

65

i i

i i
ItoKunisc
i i
2008/6/12
page 66
i i

66 Chapter 3. First Order Augmented Lagrangians

augmented Lagrangian method can be considered as a hybrid method combining the output-
matching and equation error methods. In the context of parameter estimation or, equally
well, of optimal control problems, the first order augmented Lagrangian method lends itself
to vectorization and parallelization in a natural way; see [KuTa].
It will be convenient throughout this section to identify the Hilbert spaces X, W , and
Z with their duals. As a consequence the Lagrange multiplier associated to the equality
constraint e(x) = 0 is sought in W and that for (x) ∈ K in Z.
Let us now fix those assumptions which are assumed throughout this chapter: There
exists x ∗ ∈ X and (λ∗ , μ∗ , η∗ ) ∈ W × Rp × Z such that
-
f, e, and g are twice continuously Fréchet differentiable
(3.1.2)
in a neighborhood of x ∗ ,

x ∗ is a stationary point of (3.1.1) with Lagrange multiplier (λ∗ , μ∗ ), i.e.,


-      
(f (x ∗ ), h)X + λ∗ , e (x ∗ )h W + μ∗ , g (x ∗ )h Rm + η∗ ,  (x ∗ )h Z = 0
(3.1.3)
for all h ∈ X,

and
-  
e(x ∗ ) = 0, μ∗ , g(x ∗ ) Rm = 0, μ∗ ≥ 0, g(x ∗ ) ≤ 0,
 ∗  (3.1.4)
η , (x ∗ ) Z = 0, η∗ ∈ K + , (x ∗ ) ∈ K.

Moreover we assume that


e (x ∗ ) : X → W is surjective. (3.1.5)
   
Above ·, · W and ·, · Z denote the inner products in W and Z, respectively, and
 
·, · Rm is the usual inner product in Rm . If x ∗ is a local solution of (3.1.1), if further (3.1.2)
holds and x ∗ is a regular point in the sense of Definition 1.5, then there exists a Lagrange
multiplier (λ∗ , μ∗ , η∗ ) such that (3.1.3), (3.1.4) hold.
Introducing the Lagrange functional L : X × W × Rm × Z → R by
     
L(x, λ, μ, η) = f (x) + λ, e(x) W + μ, g(x) Rm + η, (x) Z ,
(3.1.3) can be expressed as
L (x ∗ , λ∗ , μ∗ , η∗ )h = 0 for all h ∈ X.
Here and below the prime denotes differentiation with respect to x. With (3.1.2) holding,
L (x ∗ , λ∗ , μ∗ , η∗ ) : X × X → R exists as symmetric bilinear form given by
   
L (x ∗ , λ∗ , μ∗ , η∗ )(h, k) = f (x ∗ )(h, k) + λ∗ , e (x ∗ )(h, k) W + μ∗ , g (x ∗ )(h, k) Rm
for all (h, k) ∈ X×X. The index set of the coordinates of the finite rank inequality constraints
is decomposed as
I1 = {i : gi (x ∗ ) = 0, μ∗i = 0},
I2 = {i : gi (x ∗ ) = 0, μ∗i < 0},
I3 = {i : gi (x ∗ ) < 0}.

i i

i i
ItoKunisc
i i
2008/6/12
page 67
i i

3.2. Augmentability and sufficient optimality 67

Clearly I1 ∪ I2 ∪ I3 = {1, . . . , m}. If I1 is empty, then we say that strict complementarity


holds. Let us denote the cardinality of Ii by mi . Without loss of generality we assume
that mi ≥ 1 for i = 1, 2, 3 and that I1 = 1, . . . , m1 , I2 = m1 + 1, . . . , m1 + m2 , and
I3 = m1 + m2 + 1, . . . , m.
The following second order sufficient optimality condition will be of central
importance:
There exists a constant γ > 0 such that



⎪L (x ∗ , λ∗ , μ∗ , η∗ )(h, h) ≥ γ |h|2X for all h ∈ C,

where C = {h ∈ X : e (x ∗ )h = 0, gi (x ∗ )h ≤ 0 (3.1.6)



for i ∈ I1 , gi (x ∗ )h = 0 for i ∈ I2 }.

In case X is finite-dimensional, (3.1.6) is well known to imply that x ∗ is a strict local solution
to (3.1.1); see, e.g., [Be, p. 71]. For the infinite-dimensional case this will be proved in
Theorem 3.4 below.
Section 3.2 is devoted to augmentability of problem (3.1.1). This means that a func-
tional is associated to (3.1.1) with the property that if x ∗ is a local minimizer of (3.1.1), then
it is also a minimizer of that functional where the essential constraints are eliminated and
at most simple affine constraints remain. Here this is achieved on the basis of Lagrangian
functionals and augmentability is obtained without the use of a strict complementarity as-
sumption. Section 3.3 contains the foundations for the first order augmented Lagrangian
algorithm based on a duality framework. The convergence analysis for the first order algo-
rithm is given in Section 3.4. Section 3.5 contains an application to parameter estimation
problems.

3.2 Augmentability and sufficient optimality


We consider a modification of (3.1.1) where the equality and finite rank inequality constraints
of (3.1.1) are augmented:

c c
min fc (x, u) = f (x) + |e(x)|2W + |g(x) + u|2Rm over (x, u) ∈ X × Rm
2 2 (3.2.1)
subject to e(x) = 0, g(x) + u = 0, u ≥ 0, (x) ∈ K,

where c > 0 will be appropriately chosen. Observe that x ∗ is a local solution to (3.1.1)
if and only if (x ∗ , u∗ ) = (x ∗ , −g(x ∗ )) is a (local) solution to (3.2.1). Moreover, x ∗ is
stationary for (3.1.1) with Lagrange multiplier (λ∗ , μ∗ , η∗ ) if and only if (x ∗ , −g(x ∗ )) is
stationary for (3.2.1) with Lagrange multiplier (λ∗ , μ∗ , μ∗ , η∗ ), i.e., (3.1.3)–(3.1.4) hold
with g(x ∗ ) = −u∗ . Associated to (3.2.1) we introduce the functional

c c
Lc (x, u, λ, μ, η) = L(x, λ, μ, η) + |e(x)|2W + |g(x) + u|2Rm . (3.2.2)
2 2

i i

i i
ItoKunisc
i i
2008/6/12
page 68
i i

68 Chapter 3. First Order Augmented Lagrangians

Its Hessian L c at (x ∗ , u∗ ) in direction ((h, k), (h, k)) ∈ (X × Rm ) is given by

L c (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((h, k), (h, k))


= L (x ∗ , λ∗ , μ∗ , η∗ )(h, h) + c|e (x ∗ )h|2 + c|g (x ∗ )h + k|2

= L (x ∗ , λ∗ , μ∗ , η∗ )(h, h) + c|e (x ∗ )h|2 + c |g (x ∗ )h|2 (3.2.3)
i∈I2
 
+c |gi (x ∗ )h + ki |2 + c (|ki |2 + 2ki gi (x ∗ )h).
i∈I1 ∪I3 i∈I2

By the Riesz representation


  theorem there exists for every  i ∈ {1, . . . , m} a unique li ∈ X
such that gi (x ∗ )h = li , h X for every h ∈ X, where ·, · X denotes the inner product in X.
The operator E : X → W × Rm2 , defined by
     
Eh = e (x ∗ )h, lm1 +1 , h X , . . . , lm1 +m2 , h X ,

combines the linearized equality constraint and the active inequality constraints, which
act like equalities in the sense that the associated Lagrange multipliers are positive. The
essential technical result which will imply augmentability of the constraints in (3.1.1) is
given next.

Proposition 3.1. If (3.1.2)–(3.1.6) hold, then there exist constants c̄ > 0 and τ ∈ (0, γ ]
such that

m1
 
∗ ∗ ∗ ∗
H ((h, k̂), (h, k̂)) := L (x , λ , μ , η )(h, h) + c|Eh| + c | li , h X + kˆi |2
2

i=1

≥ τ (|h|2 + |k̂R2 m1 |)

for all c ≥ c̄, h ∈ X, and k̂ ∈ Rm 1


+ .

Proof. Step 1. An equivalent characterization of C is derived by eliminating linear depen-


dencies in the definition of C. The cone in (3.1.6) can be equivalently expressed as
 
C = {h ∈ X : Eh = 0 and li , h X ≤ 0 for all i ∈ I1 }. (3.2.4)

We next show that




⎪ there exist m̃2 ≤ m2 and vectors ni , i = m1 + 1, . . . , m1 + m̃2 ,

⎨ such that Ẽ : X → Y × Rm̃2 defined by
      (3.2.5)

⎪ Ẽh = e (x ∗ )h, nm1 +1 , h X , . . . , nm1 +m̃2 , h X


is surjective and ker E = ker Ẽ.

To verify (3.2.5) let ni = Pker e li , where Pker e denotes the projection onto ker e (x ∗ ). Choose
I˜2 ⊂ I2 such that {ni }i∈I˜ are linearly independent and span{ni }i∈I˜ = span{ni }i∈I2 . Possibly
2 2
1 +m̃2
after reindexing we have {ni }i∈I˜2 = {ni }m
m1 +1 . We show that ker E = ker Ẽ. Let h ∈ ker E.
Then e (x ∗ )h = 0. For every i ∈ I˜2 let li = ni + li⊥ , where li⊥ ∈ (ker e (x ∗ ))⊥ . Then

i i

i i
ItoKunisc
i i
2008/6/12
page 69
i i

3.2. Augmentability and sufficient optimality 69

   
ni , h X = li , h X = 0, i ∈ I˜2 , and therefore h ∈ ker Ẽ. The converse, ker Ẽ ⊂ ker E, is
proved similarly. To verify surjectivity! of Ẽ, observe that the adjoint Ẽ ∗ : W ×R! → X of
m̃2

Ẽ is given by Ẽ ∗ (w, r) = e (x ∗ )∗ w + i∈I˜2 ri ni . If Ẽ ∗ (w, r) = 0, then, using i∈I˜2 ri ni ∈


!
ker e (x ∗ ) and e (x ∗ )∗ w ∈ (ker e (x ∗ ))⊥ , we have i∈I˜2 ri ni = 0 and e (x ∗ )∗ w = 0.
Therefore (w, r) = 0, ker Ẽ ∗ = 0, and range Ẽ = W × Rm2 , by the closed range theorem.
Thus (3.2.5) holds.
Finally we consider the set of vectors {ni }i∈I1 with ni = PẼ li . After possible rear-
rangement of indices there exists a subset I˜1 = {1, . . . , m̃1 } ⊂ I1 such that {ni }i∈I˜1 are
linearly independent in ker E. The cone C can therefore equivalently be expressed as
   
C = h ∈ X : h ∈ ker E and ni , h X ≤ 0 for all i ∈ I˜1 .

To exclude trivial cases we assume throughout that I˜1 and I˜2 are nonempty.
Step 2. Here we characterize the polar cone
   
C1∗ = h ∈ ker E : h, h̃ X ≤ 0 for all h̃ ∈ C1

of the closed convex cone


   
C1 = h ∈ ker E : ni , h X ≤ 0 for all i ∈ I˜1 .

Denoting by P and P ∗ the canonical projections in ker E onto C1 and C1∗ , respectively,

every element h ∈ ker E can be uniquely expressed as h = h1 + h2 with h1 , h2 X = 0 and
h1 = P h ∈ C1 , h2 = P ∗ h ∈ C1∗ (see [Zar, p. 256]). Moreover
.  
C1 = Ci with Ci = {h ∈ ker E : ni , h X ≤ 0}
i∈I˜1

and
. ∗  
C1∗ = Ci = co Ci∗ = co Ci∗

(see [Zar]), with co denoting convex closure. It follows that



m̃1 
C1∗ = αi ni : (α1 , . . . , αm̃1 ) ∈ m̃1
R+ . (3.2.6)
i=1

Step 3. Let K1 be chosen such that


m̃ 2

m̃1  1

K1 |αi | ≤
2
αi n i

i=1 i=1
!m̃1
for every (α1 , . . . , αm̃1 ) ∈ Rm̃1 . For arbitrary h = i=1 αi ni ∈ C1∗ we have


m̃1
   
|h| =2
αi n i , h X ≤ αi ni , h X ,
i=1 I1+

i i

i i
ItoKunisc
i i
2008/6/12
page 70
i i

70 Chapter 3. First Order Augmented Lagrangians

 
where I1+ = {i ∈ I˜1 : ni , h X > 0 and αi > 0}. Consequently
⎛ ⎞⎛ ⎞1/2 ⎛ ⎞1/2
   −1/2
  
|h|2 ≤ ⎝ αi2 ⎠ ⎝ ni , h ⎠ ≤ K1 |h| ⎝ ni , h ⎠
2 2
X X
I1+ Ii+ I1+

and
 2
K1 |h|2 ≤ ni , h X
for every h ∈ C1∗ . (3.2.7)
I1+

Step 4. Every h ∈ X can be uniquely decomposed into mutually orthogonal elements


as h = h1 + h2 + h3 , where h1 ∈ C1 ⊂ ker E, h2 ∈ C1∗ ⊂ ker E, and h3 ∈ ker E ⊥ .
!m̃1
By Step 2 there exists a vector (α1 , . . . , αm̃1 ) ∈ Rm̃1 such that h2 = i=1 α n . Note
    !m̃i1 i 
that ni , h1 X ≤ 0 for all i ∈ I˜1 . Therefore, using the fact that h1 , h2 X = i=1 αi h 1 ,

ni X = 0, it follows that
-
if αi > 0 for some i = 1, . . . , m̃1 , then
      (3.2.8)
ni , h1 X = PẼ li , h1 X = li , h1 X = 0.
 
We shall use Step 3 with I1+ = {i ∈ I˜1 : ni , h2 X > 0 and αi > 0}. Let k̂ ∈ Rm 1
+ and c > 0.
Then by (3.1.6) and (3.2.8) we have
H ((h, k̂), (h, k̂)) = L (x ∗ , λ∗ , μ∗ , η∗ )(h, h) + c|Eh3 |2
       
+c li , h3 + li , h2 + k̂i 2 + c li , h + k̂i 2
X X X
I1+ I1 \I1+

∗ ∗ ∗ ∗
≥ γ |h1 | + 2L (x , λ , μ , η )(h1 , h2 + h3 )
2


+ L (x ∗ , λ∗ , μ∗ , η∗ )(h2 + h3 , h2 + h3 )
c1    2  2
+ c|Eh3 |2 + ni , h2 X + k̂i − c1 li , h3 X
2 + +
I1 I1
c2   2
+ |k̂i |2 − c2 li , h X
2 + +
I1 \I1 I1 \I1

for all 0 < c1 , c2 ≤ c to be chosen below. Here we used (a + b)2 ≥ 12 a 2 − b2 and the
   
fact that li , h2 = ni , h2 for i ∈ I1 . To estimate Eh3 from below we make use of the
fact that Ẽ is an isomorphism from (ker E)⊥ to W × Rm̃2 . Hence there
/ exists K2 > / 0 such
that K2 |h3 | ≤ |Ẽh3 | ≤ |Eh3 | for all h3 ∈ (ker E)⊥ . Setting L = /L (x ∗ , λ∗ , μ∗ )/ , c3 =
min(c1 , c2 ), and c4 = max(c1 , c2 ) we obtain
H ((h, k̂), (h, k̂)) = γ |h1 |2 − 2L|h1 |(|h2 | + |h3 |) − L(|h2 | + |h3 |)2
c1   2 c3  2
+ cK22 |h3 |2 + ni , h2 X + k̂
2 + 2 I i
I1 1
 2   2  2 
− 3c4 li , h3 X − 3c2 li , h1 X + li , h2 X ,
I1 I1 −I1+
 
where we used the fact that ni , h2 X
≥ 0 for i ∈ I1+ .

i i

i i
ItoKunisc
i i
2008/6/12
page 71
i i

3.2. Augmentability and sufficient optimality 71

!
Setting |l|2 = I1 |li |2 and using (3.2.7) we find
c1 c3  2
H ((h, k̂), (h, k̂)) ≥ γ |h1 |2 + K1 |h2 |2 + cK22 |h3 |2 + k̂
2 2 I i
1

− 2L|h1 |(|h2 | + |h3 |) − 2L(|h2 | + |h3 |2 )


2

− 3c2 |l|2 |h1 |2 − 3c2 |l|2 |h2 |2 − 3c4 |l|2 |h3 |2


c1 c3  2
≥ γ |h1 |2 + K1 |h2 |2 + cK22 |h3 |2 + k̂
2 2 I i
1
 γ 1/2  γ −1/2  γ 1/2  γ −1/2
− 2L |h1 ||h2 | − 2L |h1 ||h3 |
4 4 4 4
− 2L|h2 |2 − 2L|h3 |2 − 3c2 |l|2 |h1 |2 − 3c2 |l|2 |h2 |2 − 3c4 |l|2 |h3 |2
 
γ
≥ |h1 | γ − − 3c2 |l|
2 2
2
 
c 1 4 2
+ |h2 | 2
K1 − L − 2L − 3c2 |l| 2
2 γ
 
4 c3  2
+ |h3 | cK2 − L − 2L − 3c4 |l| +
2 2 2 2
k̂ .
γ 2 I i
1

We now choose the constants in the order c2 , c1 , and c̄ such that the coefficients of
|h1 |2 , |h2 |2 , and |h3 |2 , with c replaced by c̄, are positive. This implies the claim for
every c ≥ c̄.

It is worthwhile to point out the following corollary to Proposition 3.1 which is


well known in finite-dimensional spaces.

Corollary 3.2. Let E ∈ L(X, W ) be surjective and let A ∈ L(X) be self-adjoint and
coercive on ker E; i.e., there exists γ > 0 such that Ax, x ≥ γ |x|2 for
 all x ∈ ker E.
Then there exist constants τ > 0 and c̄ > 0 such that (A + cE ∗ E)x, x X ≥ τ |x|2 for all
x ∈ X and c ≥ c̄.

This follows from Proposition 3.1 with A = L (x ∗ ).

Proposition 3.3. Let (3.1.2)–(3.1.6) hold. Then there exists σ > 0 such that

L c̄ (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((h, k), (h, k)) + μ∗i ki ≥ σ (|h|2 + |k 2 |)
i∈I2

for all h ∈ X with |h| ≤ μ̃/(2c̄ supi∈I2 |li |), μ̃ = minI2 μ∗i and all k ∈ Rm with ki ≥ 0 for
i ∈ I1 ∪ I2 , and c̄ is as introduced in Proposition 3.1.
!
Proof. Let A = L c̄ (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((h, k), (h, k)) + i∈I2 μ∗i ki . Then from (3.2.3) and
the definition of H
        ∗
A = H ((h, k̂), (h, k̂)) + c̄ li , h + ki 2 + c̄ |ki |2 + 2 li , h X ki + μi ki ,
X
I3 I2 I2

i i

i i
ItoKunisc
i i
2008/6/12
page 72
i i

72 Chapter 3. First Order Augmented Lagrangians

where k̂ denotes the first m1 coordinates of the vector k. By Proposition 3.1 we have for
every ε > 1
  2 
A ≥ τ (|h|2 + |k̂|2Rm1 ) + c̄ li , h X (1 − ε 2 ) + |ki |2 (1 − ε −2 )
I3
    
+ c̄ |ki | − 2c̄
2 li , h ki + μ∗i ki
X
I2 I2 I2

−2
≥ τ (|h| +2
|k̂|2Rm1 ) + c̄(1 − ε ) |ki |2
I2 ∪I3
    
+ c̄(1 − ε )|h|
2 2
|li | − 2c̄
2 li , h ki + μ∗i ki .
X
I3 I2 I2
!
Now choose ε > 1 such that τ
2
= (ε 2 − 1)τ I3 |li |2 . Then using the constraint for h we
find that
1   
A≥ τ (|h|2 + |k̂|2Rm1 ) + c̄(1 − ε −2 ) ki2 + μ̃ − 2c̄ sup |li ||h| ki
2 I ∪I I2 I
2 3 2

1 
≥ τ (|h|2 + |k̂|2Rm1 ) + c̄(1 − ε −2 ) ki2 .
2 I ∪I 2 3

This proves the claim for c = c̄. For arbitrary c ≥ c̄ the assertion follows from the form of
L c (x ∗ , u∗ , λ∗ , μ∗ ).

Theorem 3.4. Assume that (3.1.2)–(3.1.6) hold. Then there exist constants σ̄ > 0, c̄ > 0
and a neighborhood U (x ∗ , −g(x ∗ )) = U (x ∗ , u∗ ) of (x ∗ , u∗ ) such that
 
Lc (x, u, λ∗ , μ∗ , η∗ ) + μ∗ , u Rm
      c c
= f (x) + λ∗ , e(x) W + μ∗ , g(x) + u Rm + η∗ , (x) Z + |e(x)|2W + |g(x) + u|2Rm
2 2
≥ f (x ∗ ) + σ̄ (|x − x ∗ |2 + |u − u∗ |2 )
(3.2.9)

for all c ≥ c̄ and (x, u) ∈ U (x ∗ , u∗ ) with ui ≥ 0 for i ∈ I1 ∪ I2 .


   
Proof. By (3.1.4) we have μ∗ , u∗ = − μ∗ , g(x ∗ ) = 0. Moreover
 
Lc (x ∗ , u∗ , λ∗ , μ∗ , η∗ ) + μ∗ , u∗ Rm = f (x ∗ ),
(Lc )x (x ∗ , u∗ , λ∗ , μ∗ , η∗ ) = (Lc )u (x ∗ , u∗ , λ∗ , μ∗ , η∗ ) = 0,

where (Lc )x and (Lc )u denote the partial derivatives of Lc with respect to x and u. Conse-
quently
   
Lc (x ∗ , u∗ , λ∗ , μ∗ , η∗ ) + μ∗ , u∗ Rm = f (x ∗ ) + μ∗ , u − u∗ Rm
+ L c (x ∗ , u∗ , λ∗ , μ∗ , η∗ )((x − x ∗ , u − u∗ ), (x − x ∗ , u − u∗ )) + o(|x − x ∗ |2 + |u − u∗ |2 ).

i i

i i
ItoKunisc
i i
2008/6/12
page 73
i i

3.2. Augmentability and sufficient optimality 73

The definitions of I1 and I3 imply that μ∗i = 0 for i ∈ I1 ∪ I3 . Thus Proposition 3.3 implies
the existence of a neighborhood U (x ∗ , u∗ ) of (x ∗ , u∗ ) and a constant σ̄ > 0 such that (3.2.9)
holds for all (x, u) ∈ U (x ∗ , u∗ ) with ui ≥ 0 for i ∈ I1 ∪ I2 .

The conclusion of Theorem 3.4 is referred to as augmentability of problem   (3.1.1).


This means that a functional, which in our case is Lc (x, u, λ∗ , μ∗ , η∗ ) + μ∗ , u∗ Rm , can
be found with the property that it has (x ∗ , u∗ ) as a strict local minimizer under the simple
constraint u ≥ 0, while the explicit nonlinear constraints are eliminated. Let us note that if
(3.1.6) holds with C replaced by the larger set
Cˆ = {h ∈ X : e (x ∗ )h = 0, gi (x ∗ )h ≤ 0 for i ∈ I1 ∪ I2 },
 
then the conclusion of Theorem 3.4 holds with Lc (x, u, λ∗ , μ∗ , η∗ ) + μ∗ , u∗ Rm replaced
by Lc (x, u, λ∗ , μ∗ , η∗ ). In case X is finite-dimensional, augmentability is analyzed in detail
in [Hes1], for example. The proof of augmentability in [Hes1] depends in essential manner
on compactness of the unit ball. In [Be, p. 161 ff ], the augmentability analysis relies on the
strict complementarity assumption, i.e., I1 = ∅ is assumed.
We obtain, as immediate consequence of Theorem 3.4, that (3.1.6) provides a second
order sufficient optimality condition.

Corollary 3.5. Assume that (3.1.2)–(3.1.6) hold. Then there exists a neighborhood U (x ∗ )
of x ∗ such that
f (x) ≥ f (x ∗ ) + σ̄ |x − x ∗ |2
for all x ∈ U (x ∗ ) satisfying e(x) = 0, g(x) ≤ 0, and (x) ∈ K.

Theorem 3.4 can be used as the basis to define augmented cost functionals in terms
of x only with the property that x ∗ is a uniform strict local unconstrained minimum. The
two choices we present differ in the way in which the inequality constraints are treated. The
first choice uses the classical cutoff penalty functional
g̃(x) = max(g(x), 0), (3.2.10)
and the second is the Bertsekas penalty functional
 
μ
ĝ(x, μ, c) = max g(x), − , (3.2.11)
c
where μ ∈ Rm and c > 0. In each of the two cases the max operation acts coordinatewise. To
motivate the choice (3.2.11) we use the Lagrangian Lc (x, u, λ, μ, η) introduced in (3.2.2)
for (3.2.1) and consider
⎧  
⎨min Lc (x, u, λ, μ, η) = fc (x, u) + λ, e(x) W
    (3.2.12)
⎩ + μ, g(x) + u Rm + η, (x) Z over x ∈ X, u ≥ 0.

Carrying out the constraint minimization with respect to u with x and μ fixed results in
  
μ
u = u(x, μ) = max 0, − g(x) + ,
c

i i

i i
ItoKunisc
i i
2008/6/12
page 74
i i

74 Chapter 3. First Order Augmented Lagrangians

and consequently
 
μ
g(x) + u(x, μ) = max g(x), − = ĝ(x, μ, c). (3.2.13)
c

Corollary 3.6. Let (3.1.2)–(3.1.6) hold. Then there exists a neighborhood U (x ∗ ) of x ∗


such that
      c c
f (x)+ λ∗ , e(x) W + μ∗ , g̃(x) Rm + η∗ , (x) Z + |e(x)|2W + |g̃(x)|2
2 2
≥ f (x ∗ ) + σ̄ |x − x ∗ |2

for all x ∈ U (x ∗ ), and c ≥ c̄, where g̃(x) = max(g(x), 0).

Proof. Setting
u = max(−g(x), 0) (3.2.14)
we have g(x) + u = g̃(x) and u ≥ 0. Recall the definition of U = U (x ∗ , −g(x ∗ )) from
Theorem 3.4. Determine a neighborhood U (x ∗ ) such that x ∈ U (x ∗ ) implies (x, −g(x)) ∈
U and gi (x) ≤ 0 if gi (x ∗ ) < 0. It is simple to argue that x ∈ U (x ∗ ) implies (x, u) ∈ U,
where u is defined in (3.2.14). The claim now follows from Theorem 3.4.

Corollary 3.7. Let (3.1.2)–(3.1.6) hold and let r > 0. Then there exist constants δ > 0 and
c̃ = c̃(r) ≥ c̄ such that
      c c
f (x)+ λ∗ , e(x) X + μ∗ , ĝ(x, μ, c) Rm + η∗ , (x) Z + |e(x)|2W + |ĝ(x, μ, c)|2
2 2
≥ f (x ∗ ) + σ̄ |x − x ∗ |2

for all c ≥ c̃, x ∈ Bδ = {x ∈ X : |x − x ∗ | ≤ δ}, and μ ∈ Br+ = {μ ∈ Rm


+ : |μ| ≤ r},
where ĝ(x, μ, c) = max(g(x), − μc ).

Proof. Let ε > 0 be such that gi (x ∗ ) ≤ −ε for all i ∈ I3 and such that |x − x ∗ | < ε
and |u − u∗ | < ε implies (x, u) ∈ U (x ∗ , u∗ ), where U (x ∗ , u∗ ) is given in Theorem 3.4.
Determine δ ∈ (0, ε) such that |x − x ∗ | ≤ δ implies |g(x) − g(x ∗ )| < 2ε and choose c̃ ≥ 2rε .
For c ≥ c̃ and μ ∈ Br+ we define
  
μ
u = max 0, − + g(x) . (3.2.15)
c
Then g(x) + u = ĝ(x) and u ≥ 0. To verify the claim it suffices to show that x ∈ Bδ
and μ ∈ Br+ imply |u − u∗ | < ε, where u is defined in (3.2.15). If i ∈ I1 ∪ I2 , then
|ui − u∗i | ≤ |gi (x ∗ ) − gi (x)| + μci . For i ∈ I3 we have μci + gi (x) ≤ μci + |gi (x ∗ ) − gi (x)| +
gi (x ∗ ) < rc + 2ε − ε < 0 and consequently |ui − u∗i | = |gi (x ∗ ) − gi (x) − μci |. Summing
over i = 1, . . . , m we find |u − u∗ | ≤ 2|g(x ∗ ) − g(x)|2 + c22 |μ|2 < ε2 .

Remark 3.2.1. Note that


  c
x → μ, ĝ(x, μ, c) + |ĝ(x, μ, c)|2
2

i i

i i
ItoKunisc
i i
2008/6/12
page 75
i i

3.3. The first order augmented Lagrangian algorithm 75

is C 1 if g is C 1 . This follows from the identity


  c 1  
μ, ĝ(x, μ, c) Rm
+ |ĝ(x, μ, c)|2Rm = | max(0, μ + cg(x))|2Rm − |μ|2Rm . (3.2.16)
2 2c
Here, as throughout, the norm on Rm is the Euclidean one. On the other hand,
  c
x → μ, g̃(x, μ, c) + |g̃(x, μ, c)|2
2
is not C 1 .

3.3 The first order augmented Lagrangian algorithm


Here we describe the first order augmented Lagrangian algorithm which is a hybrid method
combining the penalty technique and a Lagrangian method. Considering for a moment
equality constraints only, the penalty method for (3.1.1) consists in minimizing
ck
f (x) + |e(x)|2
2
for a sequence of penalty parameters ck tending to ∞. The Lagrangian method relies on
minimizing

f (x) + (λk , e(x))

and updating λk as a maximizer of the dual problem associated to (3.1.1), which will
be defined below. The first order augmented Lagrangian method provides a systematic
technique for the multiplier update. Its convergence analysis will be given in the next
section.
Let x ∗ be a local solution of (3.1.1) with (3.1.2)–(3.1.6) holding. The algorithm will

require startup values (λ0 , μ0 ) ∈ W × Rm + for the Lagrange multipliers. We set r = |μ | +
∗ 2 ∗ 2 1/2
(|λ0 −λ | +|μ0 −μ | ) and choose ε and c̃ = c̃(r, ε) ≥ c̄ as in the proof of Corollary 3.6.
Recall that ε was chosen such that gi (x ∗ ) ≤ −ε for all indices i in the set of inactive indices I3
and such that |x −x ∗ | < ε, |u−u∗ | < ε imply (x, u) ∈ U (x ∗ , −g(x ∗ )) with U (x ∗ , −g(x ∗ ))
given in Theorem 3.4. Finally, let {cn }∞ n=1 be a monotonically nondecreasing sequence of
penalty parameters with c1 > c̃, and set σn = cn − c̄. It is not required that limn→∞ cn = ∞,
as would be the case for penalty methods.

Algorithm ALM.
1. Choose (λ0 , μ0 ) ∈ W × Rm
+.

2. Let xn be a solution to

min Ln (x) over x ∈ X with (x) ∈ K. (Pn )

3. Update the Lagrange multipliers

λn = λn−1 + σn e(xn ), μn = μn−1 + σn ĝ(xn , μn−1 , cn ),

i i

i i
ItoKunisc
i i
2008/6/12
page 76
i i

76 Chapter 3. First Order Augmented Lagrangians

where
   
Ln (x) = f (x) + λn−1 , e(x) W + μn−1 , ĝ(x, μn−1 , cn ) Rm

cn cn
+ |e(x)|2W + |ĝ(x, μn−1 , cn )|2 .
2 2
Observe that μn = max(μn−1 + σn g(xn ), μn−1 (1 − σcnn )) and consequently μn ∈ Rn+ for
each n = 1, 2, . . . . Existence of solutions to (Pn ) will be discussed in Section 3.4. It will be
shown that (Pn ) has a solution in the interior of Bδ (see Corollary 3.6) for all n sufficiently
large, or for all n if c1 is sufficiently large, and that these solutions converge to x ∗ , while
(λn , μn ) converges to (λ∗ , μ∗ ).
To motivate the Lagrange multiplier update in step 3 of the algorithm and to justify
calling this a first order algorithm, we consider problem (3.2.1) without the infinite rank in-
equality constraint. Introducing Lagrange multipliers for the equality constraints e(x) = 0
and g(x) + u = 0 we obtained the augmented Lagrange functional Lc in (3.2.12). Carry-
ing out the minimization with respect to u and utilizing (3.2.13) suggests introducing the
modified augmented Lagrangian functional
    c c
L̂c (x, λ, μ) = f (x) + λ, e(x) W + μ, ĝ(x, μ, c) Rm + |e(x)|2W + |ĝ(x, μ, c)|2Rm .
2 2
(3.3.1)
Since L̂c (x ∗ , λ∗ , μ∗ ) = f (x ∗ ) we find
L̂c (x ∗ , λ, μ) ≤ L̂c (x ∗ , λ∗ , μ∗ ) ≤ L̂c (x, λ∗ , μ∗ ) for all x ∈ Bδ and μ ≥ 0, (3.3.2)
∗ ∗
whenever c ≥ c̃. The first inequality follows from the fact that e(x ) = ĝ(x , μ) = 0 for
μ ≥ 0 and the second one from Corollary 3.6. From the saddle point property (3.3.2) for
L̂c (x, λ, μ) we deduce that
sup inf L̂c (x, λ, μ) ≤ L̂c (x ∗ , λ∗ , μ∗ ) ≤ inf sup L̂c (x, λ, μ). (3.3.3)
λ,μ≥0 x x λ,μ

For the following discussion we assume strict complementarity, i.e., I1 = ∅. Then


x → L̂c (x, λ, μ) is twice continuously differentiable for (x, λ, μ) in a neighborhood of
(x ∗ , λ∗ , μ∗ ) and
L̂ c (x ∗ , λ∗ , μ∗ )(h, h) = L̂ (x ∗ , λ∗ , μ∗ )(h, h) + c|Eh|2W for h ∈ X.
Thus by (3.1.6) and Corollary 3.2, L̂ c (x ∗ , λ∗ , μ∗ ) is positive definite on X for all c suffi-
ciently large. Moreover (x, λ, μ) → L̂ c (x, λ, μ) is continuous in a neighborhood U (x ∗ ) ×
U (λ∗ , μ∗ ) of (x ∗ , λ∗ , μ∗ ) and L̂ c (x, λ, μ) ≥ τ̄ |x|2 for τ̄ > 0 independent of x ∈ X and
(λ, μ) ∈ W × Rm . It follows that
min L̂ c (x, λ, μ)
x∈U (x ∗ )

admits a unique solution x(λ, μ) for every (λ, μ) ∈ U (λ∗ , μ∗ ) and as a consequence of
Theorem 2.24 (λ, μ) → x(λ, μ) is differentiable. Consequently the locally defined dual
functional
d(λ, μ) = min∗ L̂ c (x, λ, μ)
x∈U (x )

i i

i i
ItoKunisc
i i
2008/6/12
page 77
i i

3.3. The first order augmented Lagrangian algorithm 77

is differentiable for (λ, μ) in a neighborhood of (λ∗ , μ∗ ). For the gradient ∇d at (λ, μ) in


direction (δλ , δμ ) we obtain using (3.2.16)
 
∇d(λ, μ)(δλ , δμ ) = f (x) + e (x)∗ λ + g (x)∗ max(0, μ + cg(x)), xλ (δλ ) + xμ (δμ )
   
+ δλ , e(x) + δμ , ĝ(x, μ, c) ,

where x = x(λ, μ).


Utilizing the first order optimality condition this implies that the gradient of d(λ, μ)
is given by
∇d(λ, μ) = (e(x(λ, μ)), ĝ(x(λ, μ), μ, c)) ∈ W × Rm . (3.3.4)

Thus the multiplier updates in step 3 of Algorithm ALM are steepest ascent directions for
the dual problem
sup d(λ, μ).
λ,μ≥0

In view of (3.3.3) the first order augmented Lagrangian algorithm is an iterative


algorithm combining minimization in the primal direction in step 2 and maximization in the
dual direction on every level of the iteration.

Remark 3.3.1. In this chapter we identify X and W with their dual spaces and consider
e (x)∗ as an operator from W to X. If, alternatively, e (x)∗ is considered as an operator
from W ∗ to X ∗ , then the Lagrange multiplier is taken as an element of W ∗ and the Lagrange
functional is defined as L(x, λ̃) = f (x) + λ̃, e(x)
W ∗ ,W . The relation between λ and λ̃
is given by I λ = λ̃, with I the canonical isomorphism from W to W ∗ . If, for example,
W = H −1 (), then W ∗ = H01 () and I = (− )−1 . The Lagrange multiplier update in
step 3 of Algorithm ALM is then given by λn = λn−1 + σn I e(xn ).

As already mentioned the augmented Lagrangian algorithm ALM is also a hybrid


method combining Lagrange multiplier and penalty methods. Solving (Pn ) without the
penalty terms we obtain the former, for λn = 0 and cn satisfying limn→∞ cn = ∞ we obtain
the latter. The choice of cn for Algorithm ALM in practice is an important one. While no
general purpose techniques appear to be available, guidelines for the choice include choosing
them large enough such that the augmented Lagrangian has positive definite Hessian in the
sense of Proposition 3.1. Large cn will improve the convergence estimates, as we shall see
in the following section, and at the same time they can be the cause for ill-conditioning of
the auxiliary problems (Pn ). In fact, for large cn the Hessian of the cost functional Ln (x)
can have eigenvalues on significantly different scales. For further discussion on the choice
of the parameter c we refer the reader to [Be].
Let us make a few historical comments on the first order augmented Lagrangian
algorithm. It was originated by Hestenes [Hes2] and Powell [Pow] and extended among
others by Bertsekas [Be] and in [PoTi, Roc2]. The infinite-dimensional case has received less
attention. In this respect we refer the reader to the work of Polyak and Tret’yakov [PoTr] and
Fortin and Glowinski [FoGl]. Both consider the case of augmenting equality constraints
only. In [FoGl] augmented Lagrangian methods are also developed systematically for
solving nonlinear partial differential equations.

i i

i i
ItoKunisc
i i
2008/6/12
page 78
i i

78 Chapter 3. First Order Augmented Lagrangians

3.4 Convergence of Algorithm ALM


We shall use the following notation:
      c̄ c̄
L̂c (x, μ) = f (x) + λ∗ , e(x) + μ∗ , ĝ(x, μ, c) + η∗ , (x) + |e(x)|2 + |ĝ(x, μ, c)|2 .
2 2
Further we set r = |μ∗ | + (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 )1/2 . Corollary 3.7 then guarantees the
existence of δ > 0 and c̃ = c̃(r) such that

L̂c (x, μ) − f (x ∗ ) ≥ σ̄ |x − x ∗ |2 (3.4.1)

for all x ∈ Bδ , μ ∈ Br+ , and c ≥ c̃. We also assume that Bδ is contained in the region of
applicability of (3.4.1) and that {cn }∞ n=1 is a monotonically nondecreasing sequence with
c̃ < c1 . Then we have the following convergence properties of Algorithm ALM from
arbitrary initializations (λ0 , μ0 ) ∈ W × Rm+ in the case that suboptimal solutions are chosen
in step 2.

Theorem 3.8. Assume that (3.1.2)–(3.1.5), (3.4.1) hold and that x̃n ∈ Bδ satisfy

Ln (x̃n ) ≤ Ln (x ∗ ) for each n = 1, . . . . (3.4.2)

If (λn , μn ) are defined as in step 3 of Algorithm ALM with xn replaced by x̃n , then for n ≥ 1
we have with σn = cn − c̄

1
σ̄ |x̃n − x ∗ |2 + (|λn − λ∗ |2 + |μn − μ∗ |2 )
2σn
1
≤ (|λn−1 − λ∗ |2 + |μn−1 − μ∗ |2 ). (3.4.3)
2σn
This implies that μn ∈ Br+ for all n ≥ 1 and
1
|x̃n − x ∗ |2 ≤ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ) (3.4.4)
2σ̄ σn
and

 1
σn |x̃n − x ∗ |2 ≤ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ). (3.4.5)
n=1
2σ̄

Proof. We proceed by induction and assume that the claim has been verified up to n−1. For
n = 1 the result can be obtained by the general arguments given below. For the induction
step we observe that (3.4.3), which is assumed to hold up to n − 1, implies

|μn−1 | ≤ |μ∗ | + (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 )1/2 = r.

Consequently μn−1 ∈ Br+ and (3.4.1) with μ = μn−1 is applicable. Using the fact that

[max(0, μn−1,i + cn gi (x ∗ ))]2 − μ2n−1,i ≤ 0

i i

i i
ItoKunisc
i i
2008/6/12
page 79
i i

3.4. Convergence of Algorithm ALM 79

and (3.2.16) we find

1 
m
∗ ∗
Ln (x ) = f (x ) + {[max(0, μn−1,i + cn gi (x ∗ ))]2 − μ2n−1,i } ≤ f (x ∗ ). (3.4.6)
2cn i=1

Next we rearrange terms in Ln (x̃n ) and obtain


     
Ln (x̃n ) = f (x̃n ) + λ∗ , e(x̃n ) + λn−1 − λ∗ , e(x̃n ) + μ∗ , ĝ(x̃n , μn−1 , cn )
  c̄ 1
+ μn−1 − μ∗ , ĝ(x̃n , μn−1 , cn ) + |e(x̃n )|2 + (cn − c̄)|e(x̃n )|2
2 2
c̄ 1
+ |ĝ(x̃n , μn−1 , cn )|2 + (cn − c̄)|ĝ(x̃n , μn−1 , cn )|2
2 2
1
= L̂(x̃n , μn−1 ) + (|λn − λ∗ |2 − |λn−1 − λ∗ |2 )
2σn
1  
+ (|μn − μ∗ |2 − |μn−1 − μ∗ |2 ) − η∗ , (x̃n ) Z .
2σn
 
Since η∗ ∈ K + and (x̃n ) ∈ K we have η∗ , (x̃n ) ≤ 0. This fact together with the above
equality, (3.4.2), and (3.4.6) implies

1
L̂(x̃n , μn−1 ) + (|λn − λ∗ |2 − |λn−1 − λ∗ |2 )
2σn
1
+ (|μn − μ∗ |2 − |μn−1 − μ∗ |2 ) ≤ Ln (x̃n ) ≤ f (x ∗ ).
2σn

Finally (3.4.1) implies that

1 1 1 1
σ̄ |x̃n − x ∗ |2 + |λn − λ∗ |2 + |μn − μ∗ |2 ≤ |λn−1 − λ∗ |2 + |μn−1 − μ∗ |2 ,
2σn 2σn 2σn 2σn

which establishes (3.4.3). Estimates (3.4.4) and (3.4.5) follow from (3.4.3).

The following conditions will be utilized to guarantee existence of local solutions to


the auxiliary problems (Pn ):


⎪ f : X → R and gi : X → R, i = 1, . . . , m, are weakly lower

⎨semicontinuous,
(3.4.7)
⎪e : X → W maps weakly convergent sequences to



weakly convergent sequences.

Further (PnC ) denotes problem (Pn ) with the additional constraint that x is contained in the
closed ball Bδ . We refer to xn as the solution to (PnC ) if Ln (xn ) ≤ Ln (x) for all x ∈ Bδ .

Proposition 3.9. If (3.1.2)–(3.1.5), (3.4.1), and (3.4.7) hold, then (PnC ) admits a solution
xn for every n = 1, 2, . . . . Moreover, there exists n0 such that every solution xn of (PnC )

i i

i i
ItoKunisc
i i
2008/6/12
page 80
i i

80 Chapter 3. First Order Augmented Lagrangians

satisfies xn ∈ int Bδ if n ≥ n0 . If c11−c̄ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ) is sufficiently small, then


every solution xn of (PnC ) is in int Bδ for every n = 1, 2, . . . .

Thus for n0 or c sufficiently large the solutions to (PnC ) are local solutions of the
unconstrained problems (Pn ).

Proof. Let {xni }i∈N be a minimizing sequence for (PnC ). There exists a weakly convergent
subsequence {xnik }k∈N with weak limit xn ∈ Bδ . Condition (3.4.7) implies weak lower
semicontinuity of Ln and hence Ln (xn ) ≤ lim inf ik Ln (xnik ) and xn is a solution of (PnC ).
In particular (3.4.2) holds with x̃n replaced by xn for n = 1, 2, . . . . Consequently
limn xn = x ∗ and xn ∈ int Bδ for all n sufficiently large. Alternatively, if c11−c̄ (|λ0 − λ∗ |2 +
|μ0 − μ∗ |2 ) is sufficiently small, then xn ∈ int Bδ for all n = 1, 2, . . . by (3.4.4).

For the solutions xn obtained in Proposition 3.9 the conclusions of Theorem 3.8
hold, in particular limn xn = x ∗ and the associated Lagrange multipliers {λn } and {μn } are
bounded.
To investigate convergence of the Lagrange multipliers we introduce M : X →
W × Rm1 +m2 × Z defined by

M = (e (x ∗ ), gac

(x ∗ ),  (x ∗ )),

where gac denotes the first m1 + m2 coordinates of g. The following additional hypothesis
will be needed:

There exists a constant κ > 0 such that


/ ∗ /
/f (x ) − f (x)/ ≤ κ|x ∗ − x|,
/ ∗ /
/e (x ) − e (x)/ ≤ κ|x ∗ − x|, (3.4.8)
/ ∗ /
/g (x ) − g (x)/ ≤ κ|x ∗ − x|

for all x ∈ Bδ , and


M is surjective. (3.4.9)

We point out that (3.4.9) is more restrictive than the regular point condition in Def-
inition 1.5 of Chapter 1. Below we set μn,ac = col(μn,1 , . . . , μn,m1 +m2 ) and μ∗ac =
col(μ∗1 , . . . , μ∗m1 +m2 ). Without loss of generality we shall assume that σ̄ ≤ 1.

Theorem 3.10. Let (3.1.2)–(3.1.5), (3.4.1), and (3.4.8)–(3.4.9) hold and let xn be a solution
of (Pn ) in int Bδ , ñ ≥ 1. Then there exists a constant K independent of n such that

K
|λn − λ∗ | + |μn − μ∗ | + |ηn − η∗ | ≤ √ (|λn−1 − λ∗ | + |μn−1 − μ∗ |) for all n ≥ 1.
σ̄ σn

Here ηn denotes the Lagrange multiplier associated with the constraint  ∈ K in (Pn ).

Sufficient conditions for xn ∈ Bδ were given in Proposition 3.9 above.

i i

i i
ItoKunisc
i i
2008/6/12
page 81
i i

3.4. Convergence of Algorithm ALM 81

Proof. Optimality of xn and x ∗ for (Pn ) and (3.1.1) (see (3.1.3)) imply
     
f (xn ), h + λn−1 + cn e(xn ), e (xn )h + max(0, μn−1 + cn g(xn )), g (xn )h
 
+ ηn ,  (xn )h = 0
and
 ∗   ∗ ∗   ∗ ∗   ∗ ∗ 
f (x ), h + λ , e (x )h + μ , g (x )h + η ,  (x )h = 0

for all h ∈ X. If we set λ̃n = λn−1 + cn e(xn ) and μ̃n = max(0, μn−1 + cn g(xn )), we obtain
the following equation in X:

e (x ∗ )∗ (λ̃n − λ∗ ) + gac

(x ∗ )∗ (μ̃n,ac − μ∗ac ) +  (x ∗ )∗ (ηn − η∗ ) = f (x ∗ ) − f (xn )
+ (e (x ∗ )∗ − e (xn )∗ )λ̃n + (gac

(x ∗ )∗ − gac

(xn )∗ )μn,ac .
Here we used the fact that μ̃n,i = 0 for i ∈ I3 by the choice of δ and c̃ (see the proof of
Corollary 3.7), and we set μn,ac = col (μn,1 , . . . , μn,m1 +m2 ). Note that M ∗ : W ×Rm1 +m2 →
X is given by M ∗ = col (e (x ∗ )∗ , gac

(x ∗ )∗ ). Hence we find

MM ∗ (λ̃n − λ∗ , μ̃n,ac − μ∗ac , ηn − η∗ ) = M f˜ (x ∗ ) − f˜ (xn )

+ (e (x ∗ )∗ − e (xn )∗ )λ̃n + (gac

(x ∗ )∗ − gac

(xn )∗ )μ̃n,ac . (3.4.10)

Since λ̃n − λ∗ = c̄e(xn ) and μ̃n − μn = c̄ĝ(xn , μn−1 , cn ) we have

|λ̃n − λn | = c̄|e(xn ) − e(x ∗ )|,


(3.4.11)
|μ̃n,ac − μn,ac | ≤ c̄|gac (xn ) − gac (x ∗ )|.
Thus by (3.4.8) and (3.4.3), (3.4.4) of Theorem 3.8 the sequences {λn } and {μn } are uniformly
bounded. Consequently by (3.4.9) and (3.4.10) there exists a constant K̄ such that |λ̃n −
λ∗ | + |μ̃n,ac − μ∗ac | ≤ K̄|xn − x ∗ |, and further by (3.4.11) a constant K can be chosen
such that
|λn − λ∗ | + |μnac − μ∗ac | + |ηn − η∗ | ≤ K̄|xn − x ∗ | for all n = 1, 2, . . . . (3.4.12)
μn−1,i
The choice of δ implies that gi (xn ) ≤ cn
for all i ∈ I3 and therefore

c̄ n−1
μni = μ for i ∈ I3 , n ≥ 1. (3.4.13)
cn i
From (3.4.3), with x̃n replaced by xn , (3.4.12), and (3.4.13), and using σ̄ ≤ 1 the theorem
follows.

Corollary 3.11. Under the assumptions of Theorem 3.10 we have


  n
∗ ∗ ∗ K n0 1
|λn − λ | + |μn − μ | + |ηn − η | ≤ √ √ (|λ0 − λ∗ | + |μ0 − μ∗ |)
σ̄ i=1
σi

for all n ≥ 1.

i i

i i
ItoKunisc
i i
2008/6/12
page 82
i i

82 Chapter 3. First Order Augmented Lagrangians

Remark 3.4.1. If the surjectivity requirement for M is replaced by assuming that M ∗ is


surjective, then the proof of Theorem 3.10 implies the existence of a constant K such that

|PM (λn − λ∗ , μn,ac − μ∗ac , ηn − η∗ )|W ×Rm1 +m2 ×Z ≤ K|xn − x ∗ |,

where PM = M(M ∗ M)−1 M ∗ denotes the orthogonal projection of W × Rm1 +m2 × Z onto
the range of M which is closed, since M ∗ is surjective. Since xn → x ∗ in X this implies
convergence of PM (λn , μn,ac , ηn ) to PM (λ∗ , μ∗ac , η∗ ).

Remark 3.4.2. If a constraint of the type x ∈ C, with C a closed convex set in X, appears in
(3.1.1), then Theorem 3.8 and Proposition 3.9 remain valid if Algorithm ALM is modified
such that in (Pn ) the functional Ln (x) is minimized over x ∈ C and x̃n appearing in (3.4.2)
satisfies x̃n ∈ C. The stationarity condition (3.1.3) has to be replaced by
   
f (x ∗ )(x − x ∗ ) + λ∗ , e (x ∗ )(x − x ∗ ) W + μ∗ , g (x ∗ )(x − x ∗ ) Rm ≥ 0 (3.1.3 )

for all x ∈ C, in this case.

3.5 Application to a parameter estimation problem


We consider a least squares formulation for the estimation of the coefficient a in
-
−div (a grad y) = f in ,
(3.5.1)
y = 0 in ∂,

where  is a bounded domain in Rn , with Lipschitz continuous boundary ∂ if n ≥ 2, and


f ∈ H −1 (), f  = 0. Let Q be a Hilbert space that embeds compactly in L∞ () and let
N : Q → R denote a seminorm  on  Q with ker(N ) ⊂ span{1} and such that N (a) defines
a norm on Q/R = {a ∈ Q : 1, a Q = 0} that is equivalent to the norm induced by Q. For
z ∈ H01 (), α > 0, and β > 0 we consider
-
min 12 |y − z| 2H 1 () + α2 N 2 (a)
0 (3.5.2)
subject to e(a, y) = 0, a(x) ≥ β,

where e : Q × H01 () → H −1 () is given by

e(a, y) = div (a grad y) + f.

This is a special case of (3.1.1) with X = Q × H01 (), W = H −1 (), g = 0, and f the
cost functional in (3.5.2). The affine constraint a(x) ≥ β can be considered as an explicit
constraint as discussed in Remark 3.4.2 by setting C = {a ∈ Q : a(x) ≥ β}. Denote the
solution to (3.5.1) as a function of a by y(a). To guarantee the existence of a solution for
(3.5.2) we require the following assumption:
-
either N is a norm on Q
(3.5.3)
or inf {|y(a) − z|H 1 : a is constant } < |z|H 1 .

i i

i i
ItoKunisc
i i
2008/6/12
page 83
i i

3.5. Application to a parameter estimation problem 83

Lemma 3.12. If (3.5.3) holds, then there exists a solution (a ∗ , y ∗ ) of (3.5.2).

Proof. Let {(an , yn )}∞n=1 denote a minimizing sequence for (3.5.2). If N is a norm, then
a subsequence argument implies the existence of a solution to (3.5.2). Otherwise we de-
compose an as an = an1 + an2 with an2 ∈ Q/R and an1 ∈ (Q/R)⊥ . Since {(an , yn )}∞ n=1 is a
minimizing sequence, {N (an2 )}∞ n=1 and consequently {|a 2 ∞ ∞
n | L } n=1 are bounded. We have
  
an1 |∇yn |2 dx = − an2 |∇yn |2 dx + fyn dx.
  

Since a ≥ β implies a ≥ β, it follows that


1
{an1 |∇yn |2L2 }∞
n=1 , is bounded. If an1 → ∞, then
|∇yn | = |∇y(an )| → 0 for n → ∞. In this case

|z| ≤ lim |y(an ) − z|2H 1 () + βN 2 (an ) ≤ inf {|y(a) − z|H 1 : a = constant} < |z|H 1 ,
n→∞

which is impossible. Therefore {an1 }∞ ∞


n=1 and as a consequence {an }n=1 are bounded, and a
subsequence of {(an , yn )} converges weakly to a solution of (3.5.2).

Using Theorem 1.6 one argues the existence of Lagrange multipliers (λ∗ , η∗ ) ∈
−1
H () × Q∗ such that the Lagrangian
1 α
L(a, y, λ∗ , η∗ ) = |y − z|2H 1 () + N 2 (a) + λ∗ , e(a, y)
H01 ,H −1 + η∗ , β − a
Q∗ ,Q
2 0 2
is stationary at (a ∗ , y ∗ ), and η∗ , β − a ∗
Q∗ ,Q = 0, η∗ , h
Q∗ ,Q ≥ 0 for all h ≥ 0. Hence
(3.1.2)–(3.1.5) hold. Here (3.1.3)–(3.1.4) must be modified to include the affine inequality
constraint with infinite-dimensional image space. We turn to the augmentability condition
(3.4.1) and introduce the augmented Lagrangian
c
Lc (a, y, λ∗ , η∗ ) = L(a, y, λ∗ , η∗ ) + |e(a, y)|2H −1
2
for c ≥ 0.
Henceforth we assume that (3.5.2) admits a solution (a0 , y0 ) for α = 0. Such a solution
exists if z is attainable, i.e., if there exists a0 ∈ Q, a0 ≥ β, such that y(a0 ) = y and in this
case y0 = y(a0 ). Alternatively, if the set of coefficients is further constrained by a norm
bound |a|Q ≤ γ , for example, then existence of a solution to (3.5.2) with α = 0 is also
guaranteed.

Proposition 3.13. Let (a0 , y0 ) denote a solution to (3.5.2) with α = 0, let (a ∗ , y ∗ ) be a


solution for α > 0, and choose γ ≥ |a ∗ |Q . Then, if |y0 − z|H 1 is sufficiently small, there
exist positive constants c0 and σ such that
1 ∗ α
Lc (a, u; λ∗ , η∗ ) ≥ |y − z|2H 1 + N 2 (a ∗ ) + σ (|a − a ∗ |2Q + |y − y ∗ |2H 1 )
2 2 0

for all c ≥ c0 and (a, y) ∈ C × H01 () with |a|Q ≤ γ .

This proposition implies that (3.4.1) is satisfied and that Theorem 3.8 and Proposi-
tion 3.9 are applicable with Bδ = {a ∈ Q : a(x) ≥ β, |a|Q ≤ γ } × H01 ().

i i

i i
ItoKunisc
i i
2008/6/12
page 84
i i

84 Chapter 3. First Order Augmented Lagrangians

Proof. For (a, y) ∈ C × H01 () we have

Lc (a, y, λ∗ , η∗ ) − Lc (a ∗ , y ∗ , λ∗ , η∗ ) = ∇L(a ∗ , y ∗ , λ∗ , η∗ )
  1 α c
− h∇λ∗ , ∇v L2 + |v|2H 1 + N 2 (h) + |e(a, h)|2H −1 ,
2 0 2 2
where h = a − a ∗ , v = y − y ∗ , and ∇L denotes the gradient of L with respect to (a, y).
First order optimality implies that
 
Lc (a, y, λ∗ , η∗ ) − Lc (a ∗ , y ∗ , λ∗ , η∗ ) ≥ − h∇λ∗ , ∇v L2
1 α c
+ |v|2H 1 + N 2 (h) + |e(a, h)|2H −1 =: D.
2 0 2 2
Introduce P = ∇ · (− )−1 ∇ and note that P can be extended to an orthogonal projection
on L2 ()n . We find that
 
|e(a, y)|2H −1 = P (a∇y − a ∗ ∇y ∗ ), a∇y − a ∗ ∇y ∗ L2
 
= P (a∇v + h∇y ∗ ), h∇y ∗ + a∇v L2
1   
≥ P (h∇y ∗ ), h∇y ∗ L2 − P (a∇v), a∇v L2
2
1 
≥ P (h∇y ∗ ), h∇y ∗ L2 − |a|2L∞ |v|2H 1
2 0

1 
≥ P (h∇y ∗ ), h∇y ∗ L2 − γ 2 k12 |v|2H 1 ,
2 0


where k1 denotes the embedding constant of Q into L (). Henceforth we consider the case
that ker N = span{1} and use the decomposition introduced in the proof of Lemma 3.12.
We have
1
|e(a, y)|2 ≥ |h1 |2 |y ∗ |2H 1 − |h1 | |h2 |L∞ |y ∗ |2H 1 − γ 2 k12 |v|H01 .
2 0 0


Since Q embeds continuously into L () and N is a norm equivalent to the norm on Q/R,
there exists a constant k2 such that |h2 |L∞ ≤ k2 N (h) for all h = h1 + h2 ∈ Q. Consequently
1
|e(a, y)|2 ≥ |h1 |2 |y ∗ |2H 1 − k2 |h1 | N (h2 ) |y ∗ |2H 1 − γ 2 k12 |v|H01 (3.5.4)
2 0 0

for all (a, y) ∈ C×H01 (). Next note that Ly (a ∗ , u∗ ; λ∗ , η∗ ) = 0 implies that −∇(a ∗ ∇λ∗ ) =
− (y ∗ − y) in H −1 (), and hence
 
h∇λ∗ , ∇v 2 ≤ 1 (|h1 | + k2 N (h2 ))|y ∗ − z|H 1 |v|H 1 . (3.5.5)
L β 0 0

Setting c = δα with δ > 0 and observing that |f |H −1 ≤ k3 |y ∗ |H 1 for a constant k3 depending


only on β,
1 1
D= (1 − δαγ 2 k12 )|v|2H 1 − (|h1 | + k2 N (h2 ))|y ∗ − z||v|H01
2 0 β
 
α δ ∗ 2 k2
+ N (h2 ) + |y |H 1 |h1 | − 2 δ|h1 |N (h2 ) |f |H −1 .
2 2 2
2 2 0 k3

i i

i i
ItoKunisc
i i
2008/6/12
page 85
i i

3.5. Application to a parameter estimation problem 85

There exist δ > 0 and k4 > 0 such that


1
D ≥ k4 (|v|2H 1 + |h1 |2 + N 2 (h2 )) − (|h1 | + k2 N (h2 )) |y ∗ − z|H01 |v|H01 .
0 β
Since |h1 | + N (h2 ) defines an equivalent norm on Q, the claim follows with c0 = δα.

It is worthwhile to comment on the auxiliary problems of Algorithm ALM. These


involve the minimization of
1 α cn
Ln (a, y) = |y −z|2H 1 + N 2 (a)+ λn−1 , A(a)y +f
H01 ,H −1 + |A(a)y +f |2H −1 (3.5.6)
2 0 2 2
over (a, y) ∈ C × H01 (), where A(a)y = div (a grad y). The resulting optimality condi-
tions are given by
α 2
(N (an )) (a − an )
2
+ (−∇λn−1 ∇ yn + cn ∇ −1 (A(an )yn + f ) ∇yn , a − an )Ln ≥ 0 (3.5.7)
for all a ≥ an and
− (yn − z) + A(an )λn−1 + A(an )(− )−1 (A(an )yn + f ) = 0. (3.5.8)
The Lagrange multiplier is updated according to
λn = λn−1 + σn (− )−1 (A(an )yn + f ). (3.5.9)
To simplify (3.5.7), (3.5.8) for numerical realization one can proceed iteratively solv-
ing (3.5.7) with yn replaced by yn−1 for an and then (3.5.8) for yn . A good initialization for
the variable λ is given by λ0 = 0. In fact, for small residue problems |y ∗ − z|H01 is small
and then ∂y∂
L(a ∗ , y ∗ , λ∗ , η∗ ) = 0 implies that λ∗ is small. Setting λ0 = 0 and y0 = z the
coefficient a1 is determined from
α 2 C1
min N (a) + |A(a)z + f |2H −1 . (3.5.10)
a≥β 2 2
Thus the first step of the augmented Lagrangian algorithm for estimating a in (3.5.1) involves
a regularized equation error step. Recall that the equation error method for parameter
estimation problems consists in solving the hyperbolic equation −div (a grad z) = f for
a. This estimate is improved during the subsequent iterations. The first order augmented
Lagrangian method can therefore be considered as a hybrid method combining the output
least squares and the equation error approach to parameter estimation.
Let us close this section with some brief comments. If the H 1 least squares criterion is
replaced by an L2 -criterion, then the first negative Laplacian in (3.5.8) must be replaced by
the identity operator. For the H 1 -criterion the same number of differentiations are applied
to y in the least squares and the equation error term in (3.5.8). If one would approach the
problem of estimating a in (3.5.1) by first discretizing and then applying an optimization
strategy, the discrete analogue of (− )−1 can be interpreted as preconditioning. We assumed
that Q embeds compactly in L∞ (). This is the case, for example, if Q = H 1 () when
n = 1, or Q = H 2 () when n = 2 or 3, or if Q ⊂ L∞ () is finite-dimensional.

i i

i i
ItoKunisc
i i
2008/6/12
page 86
i i

i i

i i
ItoKunisc
i i
2008/6/12
page 87
i i

Chapter 4

Augmented Lagrangian
Methods for Nonsmooth,
Convex Optimization

4.1 Introduction
The class of optimization problems that motivates this chapter is given by

min f (x) + ϕ(x) over x ∈ C, (4.1.1)

where X, H are real Hilbert spaces, C is a closed convex subset of X, and  ∈ L(X, H ).
In this chapter we identify H with its dual space and we distinguish between X and its
dual X ∗ . Further f : X → R is a continuously differentiable, convex function, and ϕ :
H → (−∞, ∞] is a proper, lower semicontinuous, convex but not necessarily differentiable
function. This problem class encompasses a wide variety of optimization problems including
variational inequalities of the first and second kind [Glo]. Our formulation here is slightly
more general than the one in [Glo, EkTe], since we allow the additional constraint x ∈ C.
For example, one can formulate regularized inverse scattering problems in the form
(4.1.1):
   2
μ 1
min |∇u| + g |∇u| dx +
2
k(x, y)u(y) dy − z(x) dx (4.1.2)
 2 2  

over u ∈ H 1 () and u ≥ 0, where k(x, y) denotes a scattering kernel. The problem
consists in recovering the original image u defined on a domain  from scattered and
noisy data z ∈ L2 (). Here μ > 0 and g > 0 are fixed and should be adjusted to the
statistics of the noise. If μ = 0, then this problem is equivalentto the image enhancement
algorithm in [ROF] based on minimization of the BV-seminorm  |∇u| ds. In this example
ϕ(v) = g  |v| ds, which is nondifferentiable. Several additional applications are treated
at the end of this chapter.
We develop a Lagrange multiplier theory to deal with the nonsmoothness of ϕ. To
briefly explain the approach let x, λ ∈ H and c > 0, and define the family of generalized
Yosida–Moreau approximations ϕc (x, λ) by
 c 
ϕc (x, λ) = inf ϕ(x − u) + (λ, u)H + |u|2H . (4.1.3)
u∈H 2

87

i i

i i
ItoKunisc
i i
2008/6/12
page 88
i i

88 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

This is equivalent to an augmented Lagrangian approach. In fact, note that (4.1.1) is equiv-
alent to
min f (x) + ϕ(x − u)
(4.1.4)
subject to x ∈ C and u = 0 in H.
Treating the equality constraint u = 0 in (4.1.4) by the augmented Lagrangian method
results in the minimization problem
c 2
min f (x) + ϕ(x − u) + (λ, u)H + |u| , (4.1.5)
x∈C,u∈H 2 H

where λ ∈ H is a multiplier and c is a positive scalar penalty parameter. Equivalently,


problem (4.1.5) is written as

min Lc (x, λ) = f (x) + ϕc (x, λ). (4.1.6)


x∈C

It will be shown that ϕc (u, λ) is continuously Fréchet differentiable with respect to u ∈ H .


Moreover, if xc ∈ C denotes the solution to (4.1.6), then it satisfies

f (xc ) + ∗ λc , x − xc
X∗ ,X ≥ 0 for all x ∈ C,

λc = ϕc (xc , λc ).

It will further be shown that under appropriate conditions the pair (xc , λc ) ∈ C × H has a
(strong-weak) cluster point (x̄, λ̄) as c → ∞ such that x̄ ∈ C is the minimizer of (4.1.1)
and that λ̄ ∈ H is a Lagrange multiplier in the sense that

f (x̄) + ∗ λ̄, x − x̄
X∗ ,X ≥ 0 for all x ∈ C (4.1.7)

with the complementarity condition

λ̄ = ϕc (x̄, λ̄) for each c > 0. (4.1.8)

System (4.1.7)–(4.1.8) for the pair (x̄, λ̄) is a necessary and sufficient optimality condition
for problem (4.1.1). We analyze iterative algorithms of Uzawa and augmented Lagrangian
type for finding the optimal pair (x̄, λ̄) and present a convergence analysis. It will be shown
that condition (4.1.8) is equivalent to the complementarity condition λ̄ ∈ ∂ϕ(x̄). Thus
the frequently employed differential inclusion λ̄ ∈ ∂ϕ(x̄) is replaced by the nonlinear
equation (4.1.8).
This chapter is organized as follows. In Sections 4.2–4.3 we present the basic convex
analysis and duality theory in Banach spaces. Section 4.4 is devoted to the generalized
Yosida–Moreau approximation which is the basis for the remainder of the chapter. Condi-
tions for the existence of Lagrange multiplier for (4.1.1) and optimality systems are derived
in Section 4.5. Section 4.6 is devoted to the augmented Lagrangian algorithm. Convergence
of both the augmented Lagrangian and the Uzawa methods are proved. Section 4.7 contains
a large number of concrete applications.

i i

i i
ItoKunisc
i i
2008/6/12
page 89
i i

4.2. Convex analysis 89

4.2 Convex analysis


In this section we present standard results from convex analysis and duality theory in Banach
spaces following, in part, [BaPe, EkTu]. Throughout X denotes a real Banach space.

Definition 4.1. (1) A functional F : X → (−∞, ∞] is called convex if

F ((1 − λ) x1 + λ x2 ) ≤ (1 − λ) F (x1 ) + λ F (x2 )

for all x1 , x2 ∈ X and 0 ≤ λ ≤ 1. It is called proper if it is not identically ∞.


(2) A functional F : X → (−∞, ∞] is said to be lower semicontinuous (l.s.c.) at
x ∈ X if
F (x) ≤ lim inf F (y).
y→x

A functional F is l.s.c. if it is l.s.c. at all x ∈ X.


(3) A functional F : X → (−∞, ∞] is called weakly lower semicontinuous (w.l.s.c.)
at x if
F (x) ≤ lim inf F (xn )
n→∞

for all sequences {xn } converging weakly to x. Further F is w.l.s.c. if it is w.l.s.c. at all
x ∈ X.
(4) The subset D(F ) = {x ∈ X : F (x) < ∞} of X is called the effective domain
of F .
(5) The epigraph of F is defined by epi(F ) = {(x, c) ∈ X × R : F (x) ≤ c}.

Lemma 4.2. A functional F : X → (−∞, ∞] is l.s.c. if and only if its epigraph is closed.

Proof. The lemma follows from the fact that epi (F ) is closed if and only if (xn , cn ) ∈
epi(F ) → (x, c) in X × R implies F (x) ≤ c.

Lemma 4.3. A functional F : X → (−∞, ∞] is l.s.c. if and only if its level sets Sc = {x ∈
X : F (x) ≤ c} are closed for all c ∈ R.

Proof. Assume that F is l.s.c. Let c > 0 and let {xn } be a sequence in Sc with limit x. Then
F (x) ≤ lim inf x→∞ F (xn ) ≤ c and hence x ∈ Sc and Sc is closed. Conversely, assume
that Sc is closed for all c > 0, and let {xn } be a sequence converging to x in X. Choose a
subsequence {xnk } with
lim F (xnk ) = lim inf F (xn ),
k→∞ n→∞

and suppose that F (x) > lim inf n→∞ F (xn ). Then there exists c̄ ∈ R such that

lim inf F (xn ) < c̄ < F (x),


n→∞

and there exists an index m such that F (xnk ) < c̄ for all k ≥ m. Since Sc̄ is closed x ∈ Sc̄ ,
which is a contradiction to c̄ < F (x).

Lemma 4.4. Assume that all level sets of the functional F : X → (−∞, ∞] are convex.
Then F is l.s.c. if and only if it is w.l.s.c. on X.

i i

i i
ItoKunisc
i i
2008/6/12
page 90
i i

90 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Proof. Assume that F is l.s.c. Suppose that {xn } converges weakly to x̄ in X, and let {xnk }
be a subsequence such that d = lim inf F (xn ) = limk→∞ F (xnk ). By Lemma 4.3 the sets
{x : F (x) ≤ d + } are closed for every  > 0. By assumption they are also convex. Hence
by Mazur’s lemma they are also weakly (sequentially) closed. Hence F (x̄) ≤ d +  for
every  > 0. Since  is arbitrary, we have that F (x̄) ≤ d and F is w.l.s.c. The converse
implication is obvious.
Note that for a convex function all associated level sets are convex, but not vice
versa.

Theorem 4.5. Let F be a proper, l.s.c., convex functional on X. Then F is bounded below
by an affine functional; i.e., there exist x ∗ ∈ X∗ and c ∈ R such that

F (x) ≥ x ∗ , x
X∗ ,X + c for all x ∈ X.

Moreover, F is the pointwise supremum of a family of such continuous affine functionals.

Proof. Let x0 ∈ X and choose β ∈ R such that F (x0 ) > β. Since epi(F ) is a closed convex
subset of the product space X × R, it follows from the separation theorem for convex
sets [EkTu] that there exists a closed hyperplane H ⊂ X × R given by

H = {(x, r) ∈ X × R : x0∗ , x
+ ar = α} with x0∗ ∈ X∗ , a ∈ R, α ∈ R,

such that
x0∗ , x0
+ aβ < α < x0∗ , x
+ ar for all (x, r) ∈ epi(F ).
Setting x = x0 and r = F (x0 ), we have

x0∗ , x0
+ aβ < α < x0∗ , x0
+ aF (x0 )

and thus a (F (x0 ) − β) > 0. If F (x0 ) < ∞, then a > 0 and thus
α 1
− x0∗ , x
≤ r for all (x, r) ∈ epi(F ),
a a
α 1
β< − x0∗ , x0
< F (x0 ).
a a
Hence, b(x) = αa − a1 x0∗ , x
is a continuous affine function on X such that b ≤ F and
−x ∗
the first claim is established with c = αa and x ∗ = a 0 . Moreover β < b(x0 ) < F (x0 ).
Therefore

F (x0 ) = sup b(x0 ) = x ∗ , x0
+ c :
 (4.2.1)
x ∗ ∈ X∗ , c ∈ R, b(x) ≤ F (x) for all x ∈ X .

If F (x0 ) = ∞, either a > 0 (thus we proceed as above) or a = 0. In the latter case


α − x0∗ , x0
> 0 and α − x0∗ , x
< 0 on D(F ). Since F is proper there exists an affine
function b(x) = x ∗ , x
+ c such that b ≤ F . Thus

x ∗ , x
+ c + θ (α − x0∗ , x
) < F (x)

i i

i i
ItoKunisc
i i
2008/6/12
page 91
i i

4.2. Convex analysis 91

for all x and θ > 0. Choosing θ > 0 large enough so that

x ∗ , x
+ c + θ (α − x0∗ , x
) > β,

we have that b(x) = x ∗ , x


+ c + θ (α − x0∗ , x
) is a continuous affine function on X with
b ≤ F and β < b(x0 ). Therefore (4.2.1) holds at x0 with F (x0 ) = ∞ as well.

Theorem 4.6. If F : X → (−∞, ∞] is convex and bounded above on an open set U , then
F is continuous on U .

Proof. We choose M ∈ R such that F (x) ≤ M − 1 for all x ∈ U . Let x̂ be any element in
U . Since U is open there exists a δ > 0 such that the open ball {x ∈ X : |x − x̂| < δ} is
contained in U . For any  ∈ (0, 1), let θ = M−F
(x̂)
. Then for x ∈ X satisfying |x − x̂| < θ δ
we have
x − x̂ |x − x̂|

θ + x̂ − x̂ = θ
< δ.

Hence x−x̂
θ
+ x̂ ∈ U . By convexity of F
 
x − x̂
F (x) ≤ (1 − θ )F (x̂) + θ F + x̂ < (1 − θ)F (x̂) + θ M,
θ
and thus

F (x) − F (x̂) < θ (M − F (x̂)) = .

Similarly, + x̂ ∈ U and
x̂−x
θ
 
θ x̂ − x 1 θM 1
F (x̂) ≤ F + x̂ + F (x) < + F (x),
1+θ θ 1+θ 1+θ 1+θ
which implies
F (x) − F (x̂) > −θ(M − F (x̂)) = −.
Therefore |F (x) − F (x̂)| <  if |x − x̂| < θ δ and F is continuous in U .

Theorem 4.7. If F : X → (−∞, ∞] is a proper, l.s.c., convex functional on X, and F is


bounded above on a convex, open δ-neighborhood U of a bounded, convex set C, then F is
Lipschitz continuous on C.

Proof. By Theorem 4.5 and by assumption there exist constants M and m such that

m ≤ F (x) ≤ M for all x ∈ U.

Let x and x̂ be in C with |x − x̂| ≤ M−m δ


, x  = x̂, and set θ = 2|x−θ
x̂|
. Without loss of
generality we assume that M−m < 1 so that θ ∈ (0, 1). Then y = θ + x̂ ∈ U since x̂ ∈ U
2 x−x̂

and |y − x̂| = x−θ x̂| ≤ 2δ . Due to convexity of F we have


  
x − x̂
F (x) ≤ (1 − θ )F (x̂) + θ F + x̂ ≤ (1 − θ)F (x̂) + θ M
θ

i i

i i
ItoKunisc
i i
2008/6/12
page 92
i i

92 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

and hence
2
F (x) − F (x̂) ≤ θ(M − F (x̂)) ≤ (M − m)|x − x̂|.
δ
Similarly, x̂−x
θ
+ x̂ ∈ U and
 
θ x̂ − x 1 θM 1
F (x̂) ≤ F + x̂ + F (x) ≤ + F (x),
1+θ θ 1+θ 1+θ 1+θ

which implies
−θ (M − m) ≤ −θ(M − F (x̂)) ≤ F (x) − F (x̂)
and therefore
2 δ
|F (x) − F (x̂)| ≤ (M − m)|x − x̂| for |x − x̂| ≤ .
θ M −m

Since C is bounded and convex, Lipschitz continuity for all x, x̂ ∈ C follows.

4.2.1 Conjugate and biconjugate functionals


Definition 4.8. The functional F ∗ : X∗ → [−∞, ∞] defined by

F ∗ (x ∗ ) = sup { x ∗ , x
X∗ ,X − F (x) : x ∈ X}

is called the conjugate of F .

If F is bounded below by an affine functional x ∗ , ·


− c, then

F ∗ (x ∗ ) = sup { x ∗ , x
− F (x)} ≤ sup { x ∗ , x
− x ∗ , x
+ c} = c,
x∈X x∈X

and hence F ∗ is not identically ∞. Note also that D(F ) = ∅ implies that F ∗ ≡ −∞ and
conversely, if F ∗ (x ∗ ) = −∞ for some x ∗ ∈ X, then D(F ) is empty.

Example 4.9. If F (x) = 1


p
|x|p , x ∈ R, then F ∗ (x ∗ ) = 1 ∗ q
q
|x | for 1 < p < ∞ and
1
p
+ q1 = 1.

Example 4.10. If F (x) is the indicator function of the closed unit ball of X, i.e., F (x) = 0
for |x| ≤ 1 and F (x) = ∞ otherwise, then F ∗ (x ∗ ) = |x ∗ |. In fact, F ∗ (x ∗ ) = sup{ x ∗ , x
:
|x| ≤ 1} = |x ∗ |.

If F is a proper, l.s.c., convex functional on X, then by Theorem 4.5

F (x0 ) = sup { x ∗ , x0
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X}

for every x0 ∈ X. For x ∗ ∈ X define

c(x ∗ ) = inf {c ∈ R : c ≥ x ∗ , x
− F (x) for all x ∈ X}

i i

i i
ItoKunisc
i i
2008/6/12
page 93
i i

4.2. Convex analysis 93

and observe that


F (x0 ) = sup { x ∗ , x0
− c(x ∗ )}.
x ∗ ∈X ∗

But c(x ) = supx∈X { x , x


− F (x)} = F ∗ (x ∗ ) and hence
∗ ∗

F (x0 ) = sup { x ∗ , x0
− F ∗ (x ∗ )}.
x ∗ ∈X ∗

This suggests the following definition.

Definition 4.11. For F : X → (−∞, ∞] the biconjugate functional F ∗∗ : X → (−∞, ∞]


is defined by
F ∗∗ (x) = sup { x ∗ , x
− F ∗ (x ∗ )}.
x ∗ ∈X ∗

Above we proved the following result.

Theorem 4.12. If F is a proper, l.s.c., convex functional, then F = F ∗∗ .

It is simple to argue that (F ∗ )∗ = F ∗∗ if X is reflexive. Next we prove some important


results on conjugate functionals.

Theorem 4.13. For every functional F : X → (−∞, ∞] the conjugate F ∗ is convex and
l.s.c. on X∗ .

Proof. If F is identically ∞, then F ∗ is identically −∞. Otherwise D(F ) is nonempty and


F (x ∗ ) > −∞ for all x ∗ ∈ X∗ , and

F ∗ (x ∗ ) = sup { x ∗ , x
− F (x) : x ∈ D(F )}.

Thus F ∗ is the pointwise supremum of the family of continuous affine functionals x ∗ →


x ∗ , x
− F (x) for x ∈ D(F ). This implies that F ∗ is convex. Moreover {x ∗ ∈ X∗ :
x ∗ , x
− F (x) ≤ c} is closed for each c ∈ R and x ∈ D(F ). Consequently
.
{x ∗ ∈ X∗ : F ∗ (x ∗ ) ≤ c} = {x ∗ ∈ X∗ : x ∗ , x
− F (x) ≤ c}
x∈D(F )

is closed for each c ∈ R. Hence F ∗ is l.s.c. by Lemma 4.3.

Theorem 4.14. For every F : X → (−∞, ∞] the biconjugate F ∗∗ is convex, l.s.c., and
F ∗∗ ≤ F . Moreover for each x̄ ∈ X

F ∗∗ (x̄) = sup { x ∗ , x̄
−c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
−c ≤ F (x) for all x ∈ X}. (4.2.2)

Proof. If there is no continuous affine functional which is everywhere less than F , then
F ∗ ≡ ∞ and hence F ∗∗ ≡ −∞. In fact, if there exist c ∈ R and x ∗ ∈ X∗ with c ≥ F ∗ (x ∗ ),
then x ∗ , ·
− c is everywhere less than F , which is impossible. The claims readily follow
in this case.

i i

i i
ItoKunisc
i i
2008/6/12
page 94
i i

94 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Otherwise, assume that there exists a continuous affine functional everywhere less
than F . Then D(F ∗ ) is nonempty and F ∗∗ (x) > −∞ for all x ∈ X. If F ∗ (x ∗ ) = −∞
for some x ∗ ∈ X∗ , then F ∗∗ ≡ ∞, F ≡ ∞, and the claims follow. In the remaining case
F ∗ (x ∗ ) is finite for all x ∗ and we have
F ∗∗ (x) = sup{ x ∗ , x
− F ∗ (x ∗ ) : x ∗ ∈ X∗ , F ∗ (x ∗ ) finite}.
For every x ∗ ∈ D(F ∗ ) we have
x ∗ , x
− F ∗ (x ∗ ) ≤ F (x) for all x ∈ X,
hence F ∗∗ ≤ F and
F ∗∗ (x̄) ≤ sup{ x ∗ , x̄
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X}.
Let x ∗ and c be such that x ∗ , x
− c ≤ F (x) for all x ∈ X. Then
x ∗ , x
− c ≤ x ∗ , x
− F (x ∗ ) for all x ∈ X.
Therefore
sup{ x ∗ , x̄
− c : x ∗ ∈ X∗ , c ∈ R, x ∗ , x
− c ≤ F (x) for all x ∈ X} ≤ F ∗∗ (x̄),
and (4.2.2) follows. Moreover F ∗∗ is the pointwise supremum of a family of continuous
affine functionals, and it follows as in the proof of Theorem 4.13 that F ∗ is convex and
l.s.c.

Theorem 4.15. If F : X → (−∞, ∞] is a convex functional which is finite and l.s.c. at x,


then F (x) = F ∗∗ (x).

For the proof of this result we refer the reader to [EkTu, p. 104], for example.

Theorem 4.16. For every F : X → (−∞, ∞], we have F ∗ = F ∗∗∗ .

Proof. Since F ∗∗ ≤ F due to Theorem 4.14, F ∗ ≤ F ∗∗∗ by the definition of conjugate


functions. The definition of F ∗∗ implies that
x ∗ , x
− F ∗∗ (x) ≤ F (x ∗ ) (4.2.3)
∗∗ ∗ ∗ ∗ ∗ ∗∗
if F (x) and F (x ) are finite. If F (x ) = ∞ or F (x) = ∞, inequality (4.2.3) holds
as well. If F ∗ (x ∗ ) = −∞, we have that F and F ∗ are identically ∞. If F ∗∗ (x) = −∞,
then F ∗ is identically ∞, and (4.2.3) holds for all (x, x ∗ ) ∈ X × X ∗ . Thus,
F ∗∗∗ (x) = sup{ x ∗ , x
− F ∗∗ (x)} ≤ F (x ∗ )
x∈X
∗ ∗ ∗∗∗ ∗
for all x ∈ X . Hence F =F .

Theorem 4.17. Let F : X → (−∞, ∞] and assume that ∂F (x)  = ∅ for some x ∈ X.
Then F (x) = F ∗∗ (x).

Proof. Since ∂F (x)  = ∅ there exists a continuous affine functional  ≤ F with (x) =
F (x). Due to Theorem 4.14 we have  ≤ F ∗∗ ≤ F . It follows that (x) = F ∗∗ (x) = F (x),
as desired.

i i

i i
ItoKunisc
i i
2008/6/12
page 95
i i

4.2. Convex analysis 95

4.2.2 Subdifferential
In this short section we summarize some important results on subdifferentials of functionals
F : X → (−∞, ∞]. While F is not assumed to be convex, the applications that we have
in mind are to convex functionals.

Definition 4.18. Let F : X → (−∞, ∞]. The subdifferential of F at x is the (possibly


empty) set

∂F (x) = {x ∗ ∈ X∗ : F (y) − F (x) ≥ x ∗ , y − x


for all y ∈ X}.

If ∂F (x) is nonempty, then F is called subdifferentiable at x and the set of all points where
F is subdifferentiable is denoted by D(∂F ).

As a first observation, we note that x0 = argminx∈X F (x) if and only if 0 ∈ ∂F (x0 ).


In fact, these two statements are equivalent to

F (x) ≥ F (x0 ) + (0, x − x0 ) for all x ∈ X.

Example 4.19. Let F be Gâteaux differentiable at x, i.e., there exists w ∗ ∈ X∗ such that

F (x + t v) − F (x)
lim = w ∗ , v
for all v ∈ X,
t→0+ t
and w∗ ∈ X∗ is called the Gâteaux derivative of F at x. It is denoted by F (x). If in addition
F is convex, then F is subdifferentiable at x and ∂F (x) = {F (x)}. Indeed, for v = y − x

F (x + t (y − x)) − F (x)
≤ F (y) − F (x), where 0 < t < 1.
t
As t → 0+ we obtain

F (x), y − x
≤ F (y) − F (x) for all y ∈ X,

and thus F (x) ∈ ∂F (x). On the other hand, if w ∗ ∈ ∂F (x), we find for y ∈ X and t > 0

F (x + t y) − F (x)
≥ w ∗ , y
.
t
Taking the limit t → 0+ , we obtain

F (x) − w ∗ , y
≥ 0 for all y ∈ X.

This implies that w∗ = F (x).

Example 4.20. For F (x) = 12 |x|2 , x ∈ X, we will show that ∂F (x) = F(x), where
F : X → X∗ denotes the duality mapping. In fact, if x ∗ ∈ F(x), then

1
x ∗ , x − y
= |x|2 − x ∗ , y
≥ (|x|2 − |y|2 ) for all y ∈ X.
2

i i

i i
ItoKunisc
i i
2008/6/12
page 96
i i

96 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Thus x ∗ ∈ ∂ϕ(x). Conversely, if x ∗ ∈ ∂ϕ(x), then

1
(|y|2 − |x|2 ) ≥ x ∗ , y − x
for all y ∈ X. (4.2.4)
2
We let y = t x, 0 < t < 1, and obtain

1+t
|x|2 ≤ x ∗ , x

2
and thus |x|2 ≤ x ∗ , x
. Similarly, if t > 1, then |x|2 ≥ x ∗ , x
and therefore |x|2 = x ∗ , x

and |x ∗ | ≥ |x|. On the other hand, setting y = x + t u, t > 0, in (4.2.4), we have

1 t2
t x ∗ , u
≤ (|x + t u|2 − |x|2 ) ≤ t |u||x| + |u|2 ,
2 2
which implies x ∗ , u
≤ |u||x|. Hence |x ∗ | ≤ |x| and we obtain |x|2 = |x ∗ |2 = x ∗ , x

Example 4.21. Let K be a closed convex subset of X and let ψK be the indicator function
of K, i.e.,

0 if x ∈ K,
ψK (x) =
∞ otherwise.

Obviously, ψK is convex and l.s.c. on X. By definition we have for x ∈ K

∂ψK (x) = {x ∗ ∈ X∗ : x ∗ , y − x
≤ 0 for all y ∈ K}

and ∂ψK (x) = ∅ if x ∈ / X. Thus D(ψK ) = D(∂ψK ) = K and ∂ψK (x) = {0} for each
interior point of K. Moreover, for x ∈ K, ∂ψK (x) coincides with the definition of the
normal cone to K at x.

Theorem 4.22. Let F : X → (−∞, ∞].


(1) We have

x ∗ ∈ ∂F (x̄) if and only if F (x̄) + F ∗ (x ∗ ) = x ∗ , x̄


.

(2) Assume that X is reflexive. Then x ∗ ∈ ∂F (x̄) implies that x̄ ∈ ∂F ∗ (x ∗ ). If


moreover F is convex and l.s.c., then x ∗ ∈ ∂F (x̄) if and only if x̄ ∈ ∂F ∗ (x ∗ ).

Proof. (1) Note that x ∗ ∈ ∂F (x̄) if and only if

x ∗ , x
− F (x) ≤ x ∗ , x̄
− F (x̄) (4.2.5)

for all x ∈ X. By the definition of F ∗ this implies F ∗ (x ∗ ) = x ∗ , x̄


− F (x̄). Conversely,
if F (x ∗ ) + F (x̄) = x ∗ , x̄
, then (4.2.5) holds for all x ∈ X.
(2) Since F ∗∗ ≤ F by Theorem 4.14, it follows from (1) that

F ∗∗ (x̄) ≤ x ∗ , x̄
− F ∗ (x ∗ ).

i i

i i
ItoKunisc
i i
2008/6/12
page 97
i i

4.2. Convex analysis 97

By the definition of F ∗∗ ,

F ∗∗ (x̄) ≥ x ∗ , x̄
− F ∗ (x ∗ ).

Hence

F ∗ (x ∗ ) + F ∗∗ (x̄) = x ∗ , x̄
.

Since X is reflexive, F ∗∗ = (F ∗ )∗ . Applying (1) to F ∗ we have x̄ ∈ ∂F ∗ (x ∗ ). If in addition


F is convex and l.s.c., it follows from Theorem 4.12 that F ∗∗ = F . Thus if x̄ ∈ ∂F ∗ (x ∗ ),
then

F (x̄) + F ∗ (x ∗ ) = F ∗ (x ∗ ) + F ∗∗ (x̄) = x ∗ , x̄

by applying (1) to F ∗ . Therefore x ∗ ∈ ∂F (x̄) again by (1).

Proposition 4.23. For F : X → (−∞, ∞] the set ∂F (x) is closed and convex for every
x ∈ X.

Proof. If F (x) = ∞, then ∂F (x) = ∅. Henceforth let F (x) < ∞. For every x ∗ ∈ X ∗ we
have F ∗ (x ∗ ) ≥ x ∗ , x
− F (x) and hence by Theorem 4.22

∂F (x) = {x ∗ ∈ X∗ : F ∗ (x ∗ ) − x ∗ , x
≤ −F (x)}.

By Theorem 4.13 the functional x ∗ → F ∗ (x ∗ ) − x ∗ , x


is convex and l.s.c. The claim
follows from Lemma 4.3.

Theorem 4.24. If the convex function F is continuous at x̄, then ∂F (x̄) is not empty.

Proof. Since F is continuous at x, there exists for every  > 0 an open neighborhood U
of x̄ such that
F (x) ≤ F (x̄) + , x ∈ U .
Then U × (F (x̄) + , ∞) is an open set in X × R, contained in epi F . Hence (epi F )o , the
relative interior of epi F , is nonempty. Since F is convex, epi F is convex and (epi F )o
is convex. Note that (x̄, F (x̄)) is a boundary point of epi F . Hence by the Hahn–Banach
separation theorem, there exists a closed hyperplane S = {(x, a) ∈ X × R : x ∗ , x
+ α a =
β} for nontrivial (x ∗ , α) ∈ X ∗ × R and β ∈ R such that

x ∗ , x
+ α a > β for all (x, a) ∈ (epi F )o ,
(4.2.6)
x ∗ , x̄
+ α F (x̄) = β.

Since (epi F )o = epi F , every neighborhood of (x, a) ∈ epi F contains an element


of (epi ϕ)o . Suppose x ∗ , x
+ α a < β. Then

{(x̃, ã) ∈ X × R : x ∗ , x̃
+ α ã < β}

i i

i i
ItoKunisc
i i
2008/6/12
page 98
i i

98 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

is a neighborhood of (x, a) and contains an element of (epi ϕ)o , which contradicts (4.2.6).
Therefore
x ∗ , x
+ α a ≥ β for all (x, a) ∈ epi F. (4.2.7)
Suppose α = 0. For any u ∈ U there is an a ∈ R such that F (u) ≤ a. Then from (4.2.7)
x ∗ , u
= x ∗ , u
+ α a ≥ β
and thus
x ∗ , u − x̄
≥ 0 for all u ∈ U .
Choose δ > 0 such that |u − x̄| ≤ δ implies u ∈ U . For any nonzero element x ∈ X let
t = |x|
δ
. Then |(tx + x̄) − x̄| = |tx| = δ so that tx + x̄ ∈ U . Hence

x ∗ , x
= x ∗ , (tx + x̄) − x̄
/t ≥ 0.
Similarly, −t x + x̄ ∈ U and
x ∗ , x
= x ∗ , (−tx + x̄) − x̄
/(−t) ≤ 0.
Thus, x ∗ , x
, x ∗ = 0, which is a contradiction. Therefore α is nonzero. From (4.2.6),
(4.2.7) we have for a > F (x̄) that α(a − F (x̄)) > 0 and hence α > 0. Employing (4.2.6),
(4.2.7) again implies that
 ∗ 
x
− , x − x̄ + F (x̄) ≤ F (x)
α

for all x ∈ X and therefore − xα ∈ ∂F (x̄).

4.3 Fenchel duality theory


In this section we discuss some elements of duality theory. We call
inf F (x) (P )
x∈X

the primal problem, where X is a real Banach space and F : X → (−∞, ∞] is a proper
l.s.c. convex function. We have the following result for the existence of a minimizer.

Theorem 4.25. Let X be reflexive and let F be a l.s.c. proper convex functional defined on
X satisfying
lim F (x) = ∞. (4.3.1)
|x|→∞

Then there exists an x̄ ∈ X such that

F (x̄) = inf {F (x) : x ∈ X}.

Proof. Let η = inf {F (x) : x ∈ X} and let {xn } be a minimizing sequence such that
limn→∞ F (xn ) = η. Condition (4.3.1) implies that {xn } is bounded in X. Since X is

i i

i i
ItoKunisc
i i
2008/6/12
page 99
i i

4.3. Fenchel duality theory 99

reflexive, there exists a subsequence that converges weakly to some x̄ in X and it follows
from Lemma 4.4 that F (x̄) = η.

We embed (P ) into a family of perturbed problems

inf (x, y), (Py )


x∈X

where y ∈ Y is an embedding variable, Y is a Banach space, and  : X × Y → (−∞, ∞]


is a proper l.s.c. convex function with (x, 0) = F (x). Thus (P0 ) = (P ). For example, in
terms of (4.1.1) we let
F (x) = f (x) + ϕ(x) (4.3.2)
and with Y = H
(x, y) = f (x) + ϕ(x + y). (4.3.3)

Definition 4.26. The dual problem of (P ) with respect to  is defined by

sup (−∗ (0, y ∗ )). (P ∗ )


y ∗ ∈Y ∗

The value function of (Py ) is defined by

h(y) = inf (x, y), y ∈ Y.


x∈X

In this section we analyze the relationship between the primal problem (P ) and its
dual (P ∗ ). Throughout we assume that h(y) > −∞ for all y ∈ Y .

Theorem 4.27. sup (P ∗ ) ≤ inf (P ).

Proof. For any (x, y) ∈ X × Y and (x ∗ , y ∗ ) ∈ X∗ × Y ∗ we have

x ∗ , x
+ y ∗ , y
− (x, y) ≤ ∗ (x ∗ , y ∗ ).

Thus
0 = 0, x
+ y ∗ , 0
≤ F (x) + ∗ (0, y ∗ )
for all x ∈ X and y ∗ ∈ Y ∗ . Therefore

sup (−∗ (0, y ∗ )) = sup (P ∗ ) ≤ inf (P ) = inf F (x).


y ∗ ∈Y ∗ x∈X

Lemma 4.28. h is convex.

Proof. The proof is established by contradiction. Suppose there exist y1 , y2 ∈ Y and


θ ∈ (0, 1) such that

θ h(y1 ) + (1 − θ ) h(y2 ) < h(θ y1 + (1 − θ) y2 ).

i i

i i
ItoKunisc
i i
2008/6/12
page 100
i i

100 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Then there exist c and  > 0 such that


θ h(y1 ) + (1 − θ ) h(y2 ) < c −  < c < h(θ y1 + (1 − θ) y2 ).
Set a1 = h(y1 ) + θ

and
c − θ a1 c −  − θ h(y1 )
a2 = = > h(y2 ).
1−θ 1−θ
By definition of h there exist x1 , x2 ∈ X such that
h(y1 ) ≤ (x1 , y1 ) ≤ a1 and h(y2 ) ≤ (x2 , y2 ) ≤ a2 .
Thus
h(θ y1 + (1 − θ ) y2 ) ≤ (θ x1 + (1 − θ) x2 , θ y1 + (1 − θ) y2 )

≤ θ (x1 , y1 ) + (1 − θ) (x2 , y2 ) ≤ θ a1 + (1 − θ) a2 = c,
which is a contradiction. Hence h is convex.

Lemma 4.29. For all y ∗ ∈ Y ∗ , h∗ (y ∗ ) = ∗ (0, y ∗ ).

Proof.
h∗ (y ∗ ) = sup ( y ∗ , y
− h(y)) = sup ( y ∗ , y
− inf (x, y))
y∈Y y∈Y x∈X

= sup sup ( y ∗ , y
− (x, y)) = sup sup ( 0, x
+ y ∗ , y
− (x, y))
y∈Y x∈X y∈Y x∈X

= sup ( (0, y ∗ ), (x, y)


− (x, y)) = ∗ (0, y ∗ ).
(x,y)∈X×Y

Theorem 4.30. If h is l.s.c. at 0, then inf (P ) = sup (P ∗ ).

Proof. Since F is proper, h(0) = inf x∈X F (x) < ∞. Since h is convex by Lemma 4.28, it
follows from Theorem 4.15 that h(0) = h∗∗ (0). Thus by Lemma 4.29
sup (P ∗ ) = sup (−∗ (0, y ∗ )) = sup ( y ∗ , 0
− h∗ (y ∗ ))
y ∗ ∈Y ∗ y ∗ ∈Y ∗

= h∗∗ (0) = h(0) = inf (P ).

Theorem 4.31. If h is subdifferentiable at 0, then inf (P ) = sup (P ∗ ) and ∂h(0) is the set
of solutions of (P ∗ ).

Proof. By Lemma 4.29 we have that ȳ ∗ solves (P ∗ ) if and only if


−h∗ (ȳ ∗ ) = −∗ (0, ȳ ∗ ) = sup (−∗ (0, y ∗ ))
y ∗ ∈Y ∗

= sup ( y ∗ , 0
− h∗ (y ∗ )) = h∗∗ (0).
y ∗ ∈Y ∗

i i

i i
ItoKunisc
i i
2008/6/12
page 101
i i

4.3. Fenchel duality theory 101

By Theorem 4.22

h∗∗ (0) + h∗∗∗ (ȳ ∗ ) = ȳ ∗ , 0


= 0

if and only if ȳ ∗ ∈ ∂h∗∗ (0). Since h∗∗∗ = h∗ by Theorem 4.16, we have ȳ ∗ ∈ ∂h∗∗ (y ∗ ) if
and only if −h∗ (ȳ ∗ ) = h∗∗ (0). Consequently ȳ ∗ solves (P ∗ ) if and only if y ∗ ∈ ∂h∗∗ (0).
Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗ (0) by Theorem 4.17. Therefore ∂h(0) is the set of
all solutions of (P ∗ ) and (P ∗ ) has at least one solution. Let y ∗ ∈ ∂h(0). Then

y ∗ , x
+ h(0) ≤ h(x)

for all x ∈ X. If {xn } is a sequence in X such that xn → 0, then

lim inf h(xn ) ≥ lim y ∗ , xn


+ h(0) = h(0)
n→∞

and h is l.s.c. at 0. By Theorem 4.30 inf (P ) = sup (P ∗ ).

Corollary 4.32. If there exists an x̄ ∈ X such that (x̄, ·) is finite and continuous at 0, then
h is continuous on an open neighborhood U of 0 and h = h∗∗ . Moreover,

inf (P ) = sup (P ∗ )

and ∂h(0) is the set of solutions of (P ∗ ).

Proof. First show that h is continuous. Clearly, (x̄, ·) is bounded above on an open
neighborhood U of 0. Since for all y ∈ Y

h(y) ≤ (x̄, y),

h is bounded above on U . Since h is convex by Lemma 4.28 it is continuous by Theorem 4.6.


Hence h = h∗∗ by Theorem 4.15. Moreover h is subdifferentiable at 0 by Theorem 4.24.
The conclusion then follows from Theorem 4.31.
Example 4.33. Consider the case (4.3.2)–(4.3.3), i.e.,

(x, y) = f (x) + ϕ(x + y),

where f : X → (−∞, ∞], ϕ : Y → (−∞, ∞] are l.s.c. and convex and  : X → Y is a


continuous linear operator. Let us calculate the conjugate of :
∗ (x ∗ , y ∗ ) = sup sup{ x ∗ , x
+ y ∗ , y
− (x, y)}
x∈X y∈Y

= sup{ x ∗ , x
− f (x) + sup[ y ∗ , y
− ϕ(x + y)]},
x∈X y∈Y

where
sup[ y ∗ , y
− ϕ(x + y)] = sup[ y ∗ , x + y
− ϕ(x + y) − y ∗ , x
]
y∈Y y∈Y

= sup[ y ∗ , z
− ϕ(z)] − y ∗ , x
= ϕ ∗ (y ∗ ) − y ∗ , x
.
z∈Y

i i

i i
ItoKunisc
i i
2008/6/12
page 102
i i

102 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Thus

∗ (x ∗ , y ∗ ) = sup{ x ∗ , x
− y ∗ , x
− f (x) + ϕ ∗ (y ∗ )}
x∈X

= sup{ x ∗ − ∗ y ∗ , x
− f (x) + ϕ ∗ (y ∗ )} = f ∗ (x ∗ − y ∗ ) + ϕ ∗ (y ∗ ).
x∈X

Theorem 4.34. For any x̄ ∈ X and ȳ ∗ ∈ Y ∗ , the following statements are equivalent.

(1) x̄ solves (P ), ȳ ∗ solves (P ∗ ), and min (P ) = max (P ∗ ).

(2) (x̄, 0) + ∗ (0, ȳ ∗ ) = 0.

(3) (0, ȳ ∗ ) ∈ ∂(x̄, 0).

Proof. Clearly (1) implies (2). If (2) holds, then

(x̄, 0) = F (x̄) ≥ inf (P ) ≥ sup (P ∗ ) ≥ −∗ (0, ȳ ∗ )

by Theorem 4.27. Thus

(x̄, 0) = min (P ) = max (P ∗ ) = −∗ (0, ȳ ∗ ).

Therefore (2) implies (1). Since (0, ȳ ∗ ), (x̄, 0)


= 0, equivalence of (2) and (3) follows by
Theorem 4.22.

Any solution y ∗ of (P ∗ ) is called a Lagrange multiplier associated with . For


Example 4.33 the optimality condition implies

0 = (x̄, 0) + ∗ (0, ȳ ∗ ) = f (x̄) + f ∗ (−∗ ȳ ∗ ) + ϕ(x̄) + ϕ(ȳ ∗ )


(4.3.4)
= [f (x̄) + f ∗ (−∗ ȳ ∗ ) − −∗ ȳ ∗ , x̄
] + [ϕ(x̄) + ϕ ∗ (ȳ ∗ ) − ȳ ∗ , x̄)].

Since each expression of (4.3.4) in square brackets is nonnegative, it follows that

f (x̄) + f ∗ (−∗ ȳ ∗ ) − −∗ ȳ ∗ , x̄


= 0,

ϕ(x̄) + ϕ ∗ (ȳ ∗ ) − ȳ ∗ , x̄) = 0.

By Theorem 4.22
−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.5)
ȳ ∗ ∈ ∂ϕ(x̄).
The functional L : X × Y ∗ → (−∞, ∞] defined by

−L(x, y ∗ ) = sup { y ∗ , y
− (x, y)} (4.3.6)
y∈Y

i i

i i
ItoKunisc
i i
2008/6/12
page 103
i i

4.3. Fenchel duality theory 103

is called the Lagrangian. Note that

∗ (x ∗ , y ∗ ) = sup { x ∗ , x
+ y ∗ , y
− (x, y)}
x∈X, y∈Y

= sup x ∗ , x
+ sup { y ∗ , y
− (x, y)}} = sup ( x ∗ , x
− L(x, y ∗ )).
x∈X y∈Y x∈X

Thus
−∗ (0, y ∗ ) = inf L(x, y ∗ ) (4.3.7)
x∈X

and therefore the dual problem (P ∗ ) is equivalent to

sup inf L(x, y ∗ ).


y ∗ ∈Y ∗ x∈X

If  is a convex l.s.c. function that is finite at (x, y), then for the biconjugate of x : y →
(x, y) in y we have x (y)∗∗ = (x, y) and

(x, y) = ∗∗ ∗ ∗ ∗
x (x, y) = sup { y , y
− x (y )}
y ∗ ∈Y ∗

= sup { y ∗ , y
+ L(x, y ∗ )}.
y ∗ ∈Y ∗

Hence
(x, 0) = sup L(x, y ∗ ) (4.3.8)
y ∗ ∈Y ∗

and the primal problem (P ) is equivalently written as

inf sup L(x, y ∗ ).


x∈X y ∗ ∈Y

Thus by means of the Lagrangian L, the problems (P ) and (P ∗ ) can be formulated as


min-max problems, and by Theorem 4.27

sup inf L(x, y ∗ ) ≤ inf sup L(x, y ∗ ).


y∗ x x y∗

Theorem 4.35 (Saddle Point). Assume that  is a convex l.s.c. function that is finite at
(x̄, ȳ ∗ ). Then the following are equivalent.
(1) (x̄, ȳ ∗ ) ∈ X × Y ∗ is a saddle point of L, i.e.,

L(x̄, y ∗ ) ≤ L(x̄, ȳ ∗ ) ≤ L(x, ȳ ∗ ) for all x ∈ X, y ∗ ∈ Y ∗ . (4.3.9)

(2) x̄ solves (P ), ȳ ∗ solves (P ∗ ), and min (P ) = max (P ∗ ).

Proof. Suppose (1) holds. From (4.3.7) and (4.3.9)

L(x̄, ȳ ∗ ) = inf L(x, ȳ ∗ ) = −∗ (0, ȳ ∗ )


x∈X

i i

i i
ItoKunisc
i i
2008/6/12
page 104
i i

104 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

and from (4.3.8) and (4.3.9)

L(x̄, ȳ ∗ ) = sup L(x̄, y ∗ ) = (x̄, 0).


y ∗ ∈Y ∗

Thus, ∗ (x̄, 0) + (0, ȳ ∗ ) = 0 and (2) follows from Theorem 4.34.


Conversely, if (2) holds, then from (4.3.7) and (4.3.8)

−∗ (0, ȳ ∗ ) = inf L(x, ȳ ∗ ) ≤ L(x̄, ȳ ∗ ),


x∈X

(x̄, 0) = sup L(x̄, y ∗ ) ≥ L(x̄, ȳ ∗ ).


y ∗ ∈Y ∗

Consequently −∗ (0, ȳ ∗ ) = (x̄, 0) by Theorem 4.34 and (4.3.9) holds.

Theorem 4.35 implies that no duality gap between (P ) and (P ∗ ) is equivalent to the
saddle point property of the pair (x̄, ȳ ∗ ).
For Example 4.33 we have

L(x, y ∗ ) = f (x) + y ∗ , x
− ϕ(y ∗ ). (4.3.10)

If (x̄, ȳ ∗ ) is a saddle point, then from (4.3.9)

−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.11)
x̄ ∈ ∂ϕ ∗ (x̄).

It follows from Theorem 4.22 that the second equation is equivalent to

ȳ ∗ ∈ ∂ϕ(x̄)

and (4.3.11) is equivalent to (4.3.5), if X is reflexive. Thus the necessary optimality sys-
tem for
min F (x) = f (x) + ϕ(x)
x∈X

is given by
−∗ ȳ ∗ ∈ ∂f (x̄),
(4.3.12)
ȳ ∗ ∈ ∂ϕ ∗ (x̄).

4.4 Generalized Yosida–Moreau approximation


Definition 4.36 (Monotone Operator). Let A be a graph in X × X ∗ .
(1) A is monotone if

y1 − y2 , x1 − x2
≥ 0 for all x1 , x2 ∈ D(A) and y1 ∈ Ax1 , y2 ∈ Ax2

(2) A is maximal monotone if any monotone extension of A coincides with A.

i i

i i
ItoKunisc
i i
2008/6/12
page 105
i i

4.4. Generalized Yosida–Moreau approximation 105

Let ϕ be an l.s.c. convex function on X. For x1∗ ∈ ∂ϕ(x1 ) and x2∗ ∈ ∂ϕ(x2 ),

ϕ(x1 ) − ϕ(x2 ) ≤ x2∗ , x1 − x2


,

ϕ(x2 ) − ϕ(x1 ) ≤ x1∗ , x2 − x1


.

It follows that x1∗ − x2∗ , x1 − x2


≥ 0. Hence ∂ϕ is a monotone in X × X∗ . The following
theorem characterizes maximal monotone operators by a range condition [ItKa].

Theorem 4.37 (Minty–Browder). Assume that X and X∗ are reflexive and strictly convex
Banach spaces and let F : X → X∗ denote the duality mapping. Then a monotone operator
A is maximal monotone if and only if R(λ F + A) = X∗ for all λ > 0 (or, equivalently, for
some λ > 0).

Theorem 4.38 (Rockafeller). Let X be a real Banach space. If ϕ is an l.s.c. proper convex
functional on X, then ∂ϕ is a maximal monotone operator from X into X∗ .

Proof. We prove the theorem for the case that X is reflexive . The general case is considered
in [Roc2]. ByAsplund’s renorming theorem we can assume that after choosing an equivalent
norm, X and X∗ are strictly convex. Using the Minty–Browder theorem it suffices to prove
that R(F + ∂ϕ) = X∗ . For x0∗ ∈ X∗ we must show that the equation x0∗ ∈ F x + ∂ϕ(x)
has at least a solution x0 . Note that F x is single valued due to the fact that X ∗ is strictly
convex. Define the proper convex functional f : X → (−∞, ∞] by

1 2
f (x) = |x| + ϕ(x) − x0∗ , x
.
2 X
Since f is l.s.c. and f (x) → ∞ as |x| → ∞, there exists x0 ∈ D(f ) such that f (x0 ) ≤
f (x) for all x ∈ X. The subdifferential of the mapping x → 12 |x| 2 is given by the monotone
operator F . Hence we find

ϕ(x) − ϕ(x0 ) ≥ x0∗ , x − x0


− F (x), x − x0
for all x ∈ X.

Setting x(t) = x0 + t (u − x0 ) with u ∈ X and using convexity of ϕ we have

ϕ(u) − ϕ(x0 ) ≥ x0∗ , u − x0


− F (x(t)), u − x0
.

Taking the limit t → 0+ and using the fact that x → F (x) is continuous from X endowed
with the norm topology to X ∗ endowed with the weak∗ topology, we obtain

ϕ(u) − ϕ(x0 ) ≥ x0∗ , u − x0


− F (x0 ), u − x0
,

which implies that x0∗ − F (x0 ) ∈ ∂ϕ(x0 ).

Throughout the remainder of this section let H denote a real Hilbert space which
is identified with its dual H ∗ . Further let A denote a maximal monotone operator in
H × H . Recall that A is necessarily densely defined and closed. Moreover the resolvent

i i

i i
ItoKunisc
i i
2008/6/12
page 106
i i

106 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Jμ = (I + μ A)−1 , with μ > 0, is a contraction defined on all of H ; see [Bre, ItKa, Paz].
Moreover
|Jμ x − x| → 0 as μ → 0+ for each x ∈ H. (4.4.1)
The Yosida approximation Aμ of A is defined by
1
Aμ x = (x − Jμ x).
μ
The operator Aμ is single valued, monotone, everywhere defined, Lipschitz continuous with
Lipschitz constant μ1 and Aμ x ∈ AJμ x for all x ∈ H .
Let ϕ be an l.s.c., proper, and convex function on H . Throughout the remainder of
this chapter let A denote the maximal monotone operator ∂ϕ on the Hilbert space H = H ∗ .
For x, λ ∈ H and c > 0 define the functional ϕc (x, λ) by
 
c
ϕc (x, λ) = inf ϕ(x − u) + (λ, u)H + |u|2H . (4.4.2)
u∈H 2
Then ϕc (x, λ) is a smooth approximation of ϕ in the following sense.

Theorem 4.39. For x, λ ∈ H the infimum in (4.4.2) is attained at a unique point


 
λ
uc (x, λ) = x − J 1 x + .
c c
ϕc (x, λ) is convex, Lipschitz continuously Fréchet differentiable in x and
 
λ
ϕc (x, λ) = λ + c uc (x, λ) = A 1 x + .
c c
Moreover, limc→∞ ϕc (x, λ) = ϕ(x) and
  
λ 1
ϕ J1 x + − |λ|2 ≤ ϕc (x, λ) ≤ ϕ(x) for every x, λ ∈ H.
c c 2c H

Proof. First we observe that (4.4.2) is equivalent to


 2 
c λ 1
ϕc (x, λ) = inf ϕ(v) + x + − v − |λ|2 , (4.4.3)
v∈H 2 c 2c

where v = x − u. Note that v → ψ(v) = ϕ(v) + 2c |v − y|2 , with y = x + λc , is convex


and l.s.c. for every y ∈ H . Since ϕ is proper, D(A)  = ∅, and thus for x0 ∈ D(A), and
x0∗ ∈ Ax0 we have

ϕ(x) ≥ ϕ(x0 ) + (x0∗ , x − x0 ) for all x ∈ H.

Thus, lim ψ(v) = ∞ as |v| → ∞. Hence there exists a unique minimizer v0 ∈ H of ψ.


Let ξ ∈ H and define η = (1 − t)v0 + tξ for 0 < t < 1. Since ϕ is convex we have

ϕ(η) − ϕ(v0 ) ≤ t (ϕ(ξ ) − ϕ(v0 )). (4.4.4)

i i

i i
ItoKunisc
i i
2008/6/12
page 107
i i

4.4. Generalized Yosida–Moreau approximation 107

Moreover, since ψ(v0 ) ≤ ψ(η),


c
ϕ(η) − ϕ(v0 ) ≥ (|v0 − y|2 − |(1 − t)v0 + tξ − y|2 )
2
(4.4.5)
t 2c
= tc (v0 − ξ, v0 − y) − |v0 − ξ |2 .
2
From (4.4.4) and (4.4.5) we obtain, taking the limit t → 0+ ,

ϕ(ξ ) − ϕ(v0 ) ≥ c (y − v0 , ξ − v0 ).

Since ξ ∈ H is arbitrary this implies that y − v0 ∈ 1


c
Av0 . Thus v0 = J 1 y and
c

 
λ
uc (x, λ) = x − v0 = x − J 1 x+
c c

attains the minimum in (4.4.2). Note that this argument also implies that A is maximal.
For x1 , x2 ∈ X and 0 < t < 1
1 2
ϕc ((1 − t)x1 + tx2 , λ) = ψ((1 − t)v1 + tv2 ) − |λ| ,
2c
where yi = xi + λc and vi = J 1 yi for i = 1, 2. Hence the convexity of x → ϕc (x, λ)
c
follows from the one of ψ.
Next, we show that ∂ϕc (x, λ) = A 1 (x + λc ). For x̂ ∈ H , let ŷ = x̂ + λc ∈ H and
c
v̂ = J 1 ŷ. Then, we have
c

c c
ϕ(v̂) + |v̂ − y|2 ≥ ϕ(v0 ) + |v0 − y|2
2 2
and
c c
ϕ(v0 ) + |v0 − ŷ|2 ≥ ϕ(v̂) + |v̂ − ŷ|2 .
2 2
Thus,
c c
(|v0 − ŷ|2 − |v0 − y|2 ) ≥ ϕc (x̂, λ) − ϕc (x, λ) ≥ (|v̂ − ŷ|2 − |v̂ − y|2 ). (4.4.6)
2 2
Since |v̂ − v0 | → 0 as |x̂ − x| → 0, it follows from (4.4.6) that
 
|ϕc (x̂, λ) − ϕc (x, λ) − c(y − v0 ), x̂ − x |
→0
|x̂ − x|

as |x̂ − x| → 0. Hence x → ϕc (x, λ) is Fréchet differentiable with F -derivative


 
λ
c(y − v0 ) = λ + c uc (x, λ) = A 1 x + .
c c

i i

i i
ItoKunisc
i i
2008/6/12
page 108
i i

108 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

The dual representation of ϕc (x, λ) is derived next. Define the functional (v, y) on
H × H by
c
(v, y) = ϕ(v) + |v − (ŷ + y)|2 ,
2
where we set ŷ = x + λc . Consider the family of primal problems

inf (v, y) for each y ∈ H (Py )


v∈H

and the dual problem


sup {−∗ (0, y ∗ )}. (P ∗ )
y ∗ ∈H

If h(y) is the value functional of (Py ), i.e., h(y) = inf v∈H (v, y), then from (4.4.3) we
have
1
ϕc (x, λ) = h(0) − |λ|2 . (4.4.7)
2c
From the proof of Theorem 4.39 it follows that h(y) is continuously Fréchet differentiable
with h (0) = ϕc (x, λ). It thus follows from Theorem 4.31 that inf (P0 ) = max (P ∗ ) and
h (0) is the solution of (P ∗ ).
This leads to the following theorem.

Theorem 4.40. For x, λ ∈ H


 
∗ ∗1 ∗ ∗
ϕc (x, λ) = sup (x, y )H − ϕ (y ) − |y − λ|H ,
2
(4.4.8)
y ∗ ∈H 2c

where the supremum is attained at a unique point λc (x, λ) and we have


 
λ
λc (x, λ) = λ + c uc (x, λ) = ϕc (x, λ) = A 1 x + . (4.4.9)
c c

Proof. For (v ∗ , y ∗ ) ∈ H × H we have by definition


 
c
∗ (v ∗ , y ∗ ) = sup sup (v ∗ , v) + (y ∗ , y) − ϕ(v) − |v − (ŷ + y)|2
v∈H y∈H 2

= sup (v ∗ , v) + (y ∗ , v − ŷ) − ϕ(v)
v∈H
1 2
  c
+ sup − y ∗ , v − (ŷ + y) − |v − (ŷ + y)|2
y∈H 2
 
1 ∗2
= sup (y ∗ + v ∗ , v) − ϕ(v) − (y ∗ , ŷ) + |y |
v∈H 2c

1 ∗2
= ϕ ∗ (y ∗ + v ∗ ) − (y ∗ , ŷ) + |y | .
2c

i i

i i
ItoKunisc
i i
2008/6/12
page 109
i i

4.5. Optimality systems 109

Hence,
 
∗ ∗ ∗ ∗ ∗ 1 ∗2
h(0) = sup {− (0, y )} = sup (y , ŷ) − ϕ (y ) − |y |
y ∗ ∈H y ∗ ∈H 2c

which implies (4.4.8) from (4.4.7) and since ŷ = x + λc . By Theorem 4.31 the maximum
of y ∗ → y ∗ , x
− ϕ ∗ (y ∗ ) − 2c1 |y ∗ − λ|2 is attained at the unique point h (0) = λc (x, λ)
that is given by (4.4.9).

The following theorem provides an equivalent characterization of λ ∈ ∂ϕ(x). This


results will be used in the following section to replace the differential inclusion, which
relates the primal and adjoint variable by means of a nonlinear equation.

Theorem 4.41.

(1) If λ ∈ ∂ϕ(x) for x, λ ∈ H , then λ = ϕc (x, λ) for all c > 0.

(2) Conversely, if λ = ϕc (x, λ) for some c > 0, then λ ∈ ∂ϕ(x).

Proof. If λ ∈ ∂ϕ(x), then by Theorems 4.39, 4.40, and 4.22

ϕ(x) ≥ ϕc (x, λ) ≥ λ, x
− ϕ ∗ (λ) = ϕ(x)

for every c > 0. Thus, λ ∈ H attains the supremum in (4.4.8) and by Theorem 4.40
we have λ = ϕc (x, λ). Conversely, if λ ∈ H satisfies λ = ϕc (x, λ) for some c > 0,
then uc (x, λ) = 0 by Theorem 4.40. Hence it follows from Theorem 4.39, (4.4.2), and
Theorem 4.40 that

ϕ(x) = ϕc (x, λ) = λ, x
− ϕ ∗ (λ),

and thus λ ∈ ∂ϕ(x) by Theorem 4.22.

4.5 Optimality systems


In this section we derive first order optimality systems for (4.1.1) based on Lagrange mul-
tipliers. In what follows we assume that f : X → R and ϕ : H → (−∞, ∞] are
l.s.c., convex functions, that f is continuously differentiable, and that ϕ is proper. Further
 ∈ L(X, H ) and C is a closed convex set in X. In addition we require that

(A1) f, ϕ are bounded below by zero on K,


(A2) f (x1 ) − f (x2 ), x1 − x2
X∗ ,X ≥ σ |x1 − x2 |2X

for some σ > 0 independent of x1 , x2 ∈ C, and

(A3) ϕ(x0 ) < ∞ and ϕ is continuous at x0 for some x0 ∈ C.

i i

i i
ItoKunisc
i i
2008/6/12
page 110
i i

110 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

As a consequence of (A2) we have

f (x) − f (x0 ) − f (x0 ), x − x0


X∗ ,X
 1 (4.5.1)
σ
= f (x0 + t (x − x0 )) − f (x0 ), x − x0
X∗ ,X dt ≥ |x − x0 |2X
0 2

for all x ∈ X. Due to Theorem 4.24 there exists y0∗ ∈ D(A(y0 )) = ∂ϕ(y0 ) such that

ϕ(x) − ϕ(x0 ) ≥ (y0∗ , x − x0 )H for x0 ∈ H. (4.5.2)

Hence, lim f (x) + ϕ(x) → ∞ as |x|X → ∞ and it follows from Theorem 4.25 and (A2)
that there exists a unique minimizer x̄ ∈ C of (4.1.1).

Theorem 4.42 (Optimality). A necessary and sufficient condition for x̄ ∈ C to be the


minimizer of (4.1.1) is given by

f (x̄), x − x̄
X∗ ,X + ϕ(x) − ϕ(x̄) ≥ 0 for all x ∈ C. (4.5.3)

Proof. Assume that x̄ is the minimizer of (4.1.1). Then for x ∈ C and 0 < t < 1 we have
x̄ + t (x − x̄) ∈ C and

f (x̄ + t (x − x̄)) + ϕ((x̄ + t (x − x̄)) ≥ f (x̄) + ϕ(x̄).

Since
ϕ(((1 − t)x̄ + tx)) − ϕ(x̄) ≤ t (ϕ(x) − ϕ(x̄)),
we obtain
t −1 (f (x̄ + t (x − x̄)) − f (x̄)) + ϕ(x) − ϕ(x̄) ≥ 0.
Taking the limit t → 0+ , we have (4.5.3) for all x ∈ C.
Conversely, assume that x̄ ∈ C satisfies (4.5.3). Then, from (4.5.1),

f (x) + ϕ(x) − (f (x̄) + ϕ(x̄))

= f (x) − f (x̄) − f (x̄), x − x̄


+ f (x̄), x − x̄
+ ϕ(x) − ϕ(x̄)
σ
≥ |x − x̄|2X
2
for all x ∈ C. Thus, x̄ is a minimizer of (4.1.1).

Next, ϕ is split from the optimality system in (4.5.3) by means of an appropriately


defined Lagrange multiplier. For this purpose we consider the regularized minimization
problems:

min f (x) + ϕc (x, λ) over x ∈ C (4.5.4)

i i

i i
ItoKunisc
i i
2008/6/12
page 111
i i

4.5. Optimality systems 111

for c > 0 and λ ∈ H . From Theorem 4.39 it follows that x → ϕc (x, λ) is convex,
continuously differentiable, and bounded below by − 2c1 |λ|2H . Thus, for λ ∈ H , (4.5.4) has
a unique solution xc ∈ C and xc ∈ C is the solution of (4.5.4) if and only if xc satisfies
 
f (xc ), x − xc
+ ϕc (xc , λ), (x − xc ) H ≥ 0 for all x ∈ C. (4.5.5)
It follows from Theorems 4.39 and 4.40 that
 
1
ϕc (xc , λ) = A 1 xc + λ = λc ∈ H. (4.5.6)
c c
We have the following result.

Theorem 4.43 (Lagrange Multipliers). (1) Suppose that xc converges strongly to x̄ in X


as c → ∞ and that {λc }c≥1 has a weak cluster point. Then for each weak cluster point
λ̄ ∈ H of {λc }c≥1
λ̄ ∈ ∂ϕ(x̄) and
  (4.5.7)
f (x̄), x − x̄
X∗ ,X + λ̄, (x − x̄) H ≥ 0 for all x ∈ C.

Conversely, if (x̄, λ̄) ∈ C × H satisfies (4.5.7), then x̄ minimizes (4.1.1).


(2) Assume that there exists λ̃c ∈ ∂ϕ(xc ) for c ≥ 1 such that {|λ̃c |H }, c ≥ 1, is
bounded. Then, xc converges strongly to x̄ in X as c → ∞.

Proof. To verify (1), assume that xc → x̄ and let λ̄ be a weak cluster point of λc in H .
Then, from (4.5.5)–(4.5.6), we can conclude that (x̄, λ̄) ∈ C × H satisfies
 
f (x̄), x − x̄
+ λ̄, (x − x̄) H ≥ 0
for all x ∈ C. It follows from Theorems 4.39 and 4.40 that
  1
λc , vc H − |λc − λ|2 = ϕc (xc , λ) + ϕ ∗ (λc )
  2c 
λ 1
≥ ϕ J 1 vc + − |λ|2 + ϕ ∗ (λc ).
c c 2c H
Since J 1 (vc + λc ) → v̄ = x̄ and ϕ and ϕ ∗ are l.s.c., letting c → ∞, we obtain
c
 
λ̄, v̄ H ≥ ϕ(v̄) + ϕ ∗ (λ̄),

which implies that λ̄ ∈ ∂ϕ(v̄) by Theorem 4.22. Hence, (x̄, λ̄) ∈ C × H satisfies (4.5.7).
 Conversely,
 suppose that (x̄, λ̄) ∈ C × H satisfies (4.5.7). Then ϕ(x) − ϕ(x̄) ≥
λ̄, (x − x̄) H for all x ∈ C. Thus the inequality in (4.5.7) implies (4.5.3). It then follows
from Theorem 4.42 that x̄ minimizes (4.1.1).
For (2) we note that from (4.5.5) we have
f (xc ), x − xc
+ ϕc (x, λ) − ϕc (xc , λ) ≥ 0 for all x ∈ C.
Thus
f (xc ), x̄ − xc
+ ϕc (x̄, λ) − ϕc (xc , λ) ≥ 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 112
i i

112 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Also, from (4.5.3)


f (x̄), xc − x̄
+ ϕ(xc ) − ϕ(x̄) ≥ 0.
Adding these inequalities, we obtain
f (x̄) − f (xc ), x̄ − xc
+ ϕc (xc , λ) − ϕ(xc )
(4.5.8)
≤ ϕc (x̄, λ) − ϕ(x̄) ≤ 0.
By (4.4.8)
1 1
ϕ(vc ) − |λ̃c − λ|2H = vc , λ̃c
H − ϕ ∗ (λ̃c ) − |λ̃c − λ|2H ≤ ϕc (vc , λ), (4.5.9)
2c 2c
where vc = xc and hence
1
ϕ(vc ) − ϕc (vc , λ) ≤ |λ̃c − λ|2H .
2c
Thus, from (4.5.8) and (A2)
σ 1
|xc − x ∗ |2X ≤ |λ̃c − λ|2H
2 2c
and since {|λ̃c |H }c≥1 is bounded, |xc − x ∗ |X → 0 as c → ∞.

The following lemma addresses the condition in Theorem 4.43 (2).

Lemma 4.44. (1) If D(ϕ) = H and bounded above, on bounded sets, then ∂ϕ(xc ) is
nonempty and |∂ϕ(xc )|H is uniformly bounded for c ≥ 1.
(2) If ϕ = χK with K a closed convex set in H and xc ∈ K for all c > 0, then λ̃c
can be chosen to be 0 for all c > 0.

Proof. (1) By definition of xc and by Theorem 4.39


f (xc ) + ϕc (vc , λ) ≤ f (x̄) + ϕ(x̄)
and
  
λ 1
f (xc ) + ϕ J 1 vc + ≤ f (x̄) + ϕ(x̄) + |λ|2 ,
c c 2c H
where vc = xc . Since ϕ is bounded below by 0, there exists a constant K1 > 0 independent
of c ≥ 1 such that
f (xc ) − f (x̄) − f (x̄), xc − x̄
≤ K1 + f (x̄)|xc − x̄|X .
By (4.5.1)
σ
|xc − x̄|2X ≤ K1 + f (x̄)|xc − x̄|X .
2
Thus there exists a constant M such that |xc − x̄|X ≤ M for c ≥ 1. Since ϕ is everywhere
defined, convex, l.s.c., and assumed to be bounded above on bounded sets, it follows from

i i

i i
ItoKunisc
i i
2008/6/12
page 113
i i

4.5. Optimality systems 113

Theorem 4.7 that ϕ is Lipschitz continuous in the open set B = {v ∈ H : |v − v̄| <
(M + 1) }, where v̄ = x̄. Let L denote the Lipschitz constant. By Theorem 4.24
∂ϕ(vc ) is nonempty for c ≥ 1. Let λ̃c ∈ ∂ϕ(vc ) for c ≥ 1. Hence
 
L |v − vc | ≥ ϕ(v) − ϕ(vc ) ≥ λ̃c , v − vc

for all v ∈ B. Since vc ∈ B we have |λ̃c | ≤ L, and the claim follows.


(2) In this case ∂ϕ(vc ) is given by the normal cone NC (vc ) = {w ∈ H : (w, vc −v)H ≥
0 for all v ∈ C} and thus 0 ∈ ∂ϕ(vc ) for all c > 0.

Theorem 4.45 (Complementarity). Assume that there exists a pair (x̄, λ̄) ∈ C × H that
satisfies (4.5.7). Then the complementarity condition λ̄ ∈ ∂ϕ(x̄) can equivalently be
expressed as
λ̄ = ϕc (x̄, λ̄) (4.5.10)
and x̄ is the unique solution of
min f (x) + ϕc (x, λ̄) (4.5.11)
x∈C

for every c > 0.

Proof. The first claim follows directly from Theorem 4.41. From (4.5.5) we conclude that
x̂ is a minimizer of (4.5.11) if and only if x̂ ∈ C satisfies
 
f (x̂), x − x̂
X∗ ,X + ϕc (x̂, λ̄), (x − x̂) H ≥ 0 for all x ∈ C.
From (4.5.7) and (4.5.10) it follows that x̄ ∈ C satisfies this inequality as well and hence
x̂ = x̄.

Theorems 4.43 and 4.45 imply that if a pair (x̄, λ̄) ∈ C × H satisfies
⎧  
⎨ f (x̄), x − x̄
+ λ̄, (x − x̄) ≥ 0 for all x ∈ C,
(4.5.12)

λ̄ = ϕc (x̄, λ̄)
for some c > 0, then x̄ is the minimizer of (4.1.1). Conversely, if x̄ is a minimizer of (4.1.1)
and
∂(ϕ ◦  + ψC )(x̄) = ∗ ∂ϕ(x̄) + ∂ψC (x̄), (4.5.13)

then there exists a λ̄ ∈ H such that the pair (x̄, λ̄) satisfies (4.5.12) for all c > 0. Here ψC
denotes the indicator function of the set C. In fact it follows from (4.5.7) that −f (x̄) ∈
∂(ϕ ◦  + ψC )(x̄) and by (4.5.13)
−f (x̄) ∈ ∗ ∂ϕ(x̄) + ∂ψC (x̄) = ∗ ∂ϕ(x̄) + NC (x̄),
where NC (x̄) = {z ∈ X ∗ : z, x − x̄
≤ 0 for all x ∈ C}. This implies that there exists
some λ̄ ∈ ∂ϕ(x̄) such that (4.5.7) holds and also (4.5.12) for the pair (x̄, λ̄). Condition
(4.5.13) holds, for example, if there exists x ∈ int (C) and ϕ is continuous and finite at x
(see, e.g., Propositions 12 and 13 in Section 3.2 of [EkTu]).

i i

i i
ItoKunisc
i i
2008/6/12
page 114
i i

114 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

The following theorem discusses the equivalence between the existence of a Lagrange
multiplier λ̄ and uniform boundedness of λc .

Theorem 4.46. Suppose  is compact and let λ ∈ H be fixed. Then λc = ϕc (vc , λ) is


uniformly bounded in c ≥ 1 if and only if there exists λ∗ ∈ ∂ϕ(x̄) such that (4.5.7) holds.
In either case there exists a subsequence (xĉ , λĉ ) ∈ X × H such that xĉ → x̄ strongly in X
and λĉ → λ∗ weakly in H where (x̄, λ̄) ∈ C × H satisfies (4.5.7).

Proof. Assume that |λc |H is uniformly bounded. From (4.5.5)

f (xc ) + ∗ λc , x − xc
≥ 0 for all x ∈ C,

where λc = ϕc (xc , λ). From (A2) it follows that

σ |xc − x̄|2X ≤ f (xc ) − f (x̄), xc − x̄


≤ f (x̄) − ∗ λc , xc − x̄

≤ (|λc | + f (x̄)) |xc − x̄|,

which by assumption implies that |xc |X is uniformly bounded. Since  is compact it follows
that any weakly convergent sequence λc → λ∗ in H satisfies ∗ λc → ∗ λ̄ strongly in X ∗ .
Again, from (A2) we have

σ |xc − xĉ |X ≤ f (xc ) − f (xĉ ), xc − xĉ


≤ |∗ λc − ∗ λĉ |X∗

for any c, ĉ > 0. Hence {xc } is a Cauchy sequence in X and thus there exists x̄ ∈ X such
that |xc − x̄|X → 0. The rest of the proof for the “only if” part is identical to the one of the
first part of Theorem 4.43. It will be shown in Theorem 4.49 that

σ 1 1
|xc − x̄|2X + |λc − λ̄|2H ≤ |λ − λ̄|2H
2 2c 2c
if (4.5.7) holds. This implies the “if” part.

4.6 Augmented Lagrangian method


In this section we discuss the augmented Lagrangian method for (4.1.1). Throughout we
assume that (A1)–(A3) of Section 4.5 hold and we set Lc (x, λ) = f (x) + φc (x, λ).
Augmented Lagrangian Method.
Step 1: Choose a starting value λ1 ∈ H a positive number c and set k = 1.
Step 2: Given λk ∈ H find xk ∈ C by

Lc (xk , λk ) = min Lc (x, λk ) over x ∈ C.

Step 3: Update λk by λk+1 = ϕc (xk , λk ).


Step 4: If the convergence criterion is not satisfied, then set k = k + 1 and go to Step 2.

i i

i i
ItoKunisc
i i
2008/6/12
page 115
i i

4.6. Augmented Lagrangian method 115

Before proving convergence of the augmented Lagrangian method we motivate the


update of the Lagrange multiplier in Step 3. First, it follows from (4.5.12) that it is a fixed
point iteration for the necessary optimality condition in the dual variable λ. To give the
second motivation we define the dual functional dc : H → R by

dc (λ) = inf f (x) + ϕc (x, λ) over x ∈ C (4.6.1)

for λ ∈ H . The update is then a steepest ascent step for dc (λ). In fact it will be shown in
the following lemma that (4.6.1) attains the minimum at a unique minimizer x(λ) ∈ C and
that dc is continuously Fréchet differentiable with F -derivative

dc (λ) = u(x(λ), λ),

where uc (x, λ) is defined in Theorem 4.39. Thus the steepest ascent step is given by
λk+1 = λk + c u(x(λk ), λk ), which by Theorem 4.40 coincides with the update given in
Step 3.

Lemma 4.47. For λ ∈ H and c > 0 the infimum in (4.6.1) is attained at a unique minimizer
x(λ) ∈ C and the mapping λ ∈ H → x(λ) ∈ X is Lipschitz continuous with Lipschitz
constant σ −1 . Moreover, the dual functional dc is continuously Fréchet differentiable with
F-derivative
dc (λ) = u(x(λ), λ),
where uc (v, λ) is defined in Theorem 4.39.

Proof. The proof is given in three steps.


Step 1. Since f (x) + ϕc (x, λ) is convex and l.s.c. and since (4.5.1) holds, there
exists a unique x(λ) that attains its minimum over K. To establish Lipschitz continuity of
λ → x(λ) we note that x(λ) satisfies the necessary optimality condition
 
f (x(λ)), x − x(λ)
+ ϕc (x(λ), λ), (x − x(λ)) ≥ 0 for all x ∈ C. (4.6.2)

Using this inequality at λ, μ ∈ H , we obtain

f (x(μ)) − f (x(λ)), x(μ) − x(λ)

 
+ ϕc (x(μ), μ) − ϕc (x(λ), λ), (x(μ) − x(λ)) ≤ 0.

By (A2) and (4.5.6) this implies that


     
μ λ
σ |x(μ) − x(λ)|X + A 1 x(μ) +
2
− A 1 x(λ) + , (x(μ) − x(λ)) ≤ 0,
c c c c
and thus
     
μ μ
σ |x(μ) − x(λ)|2X + A 1 x(μ) + − A 1 x(λ) + , (x(μ) − x(λ))
c c c c
     
λ μ
≤ A 1 x(λ) + − A 1 x(λ) + , (x(μ) − x(λ)) .
c c c c

i i

i i
ItoKunisc
i i
2008/6/12
page 116
i i

116 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Since A 1 is monotone and Lipschitz continuous with Lipschitz constant c, this inequality
c
yields
σ |x(μ) − x(λ)|X ≤ |μ − λ|H ,
which shows the first assertion.
Step 2. We show that for every v ∈ H the functional λ ∈ H → ϕc (v, λ) is Fréchet

differentiable with F -derivative ∂λ ϕc (v, λ) given by u(v, λ). Since (4.4.2) can be equiva-
lently written as (4.4.3), it follows from the proof of Theorem 4.39 that λ ∈ H → ϕc (v, λ)
is Fréchet differentiable and
∂ 1 λ
ϕc (v, λ) = (λ + c u(v, λ)) − = u(v, λ).
∂λ c c
Step 3. To argue differentiability of λ → dc (λ) note that it follows from (4.6.2) that
for λ, μ ∈ H
dc (μ) − dc (λ) = f (x(μ)) − f (x(λ)) + ϕc (x(μ), λ) − ϕc (x(λ), λ)

+ϕc (x(μ), μ) − ϕc (x(μ), λ)


 
≥ f (x(λ)), x(μ) − x(λ)
+ ϕc (x(λ), λ), (x(μ) − x(λ)) (4.6.3)

+ϕc (x(μ), μ) − ϕc (x(μ), λ)

≥ ϕc (x(μ), μ) − ϕc (x(μ), λ).


Similarly, we have
dc (μ) − dc (λ) = f (x(μ)) − f (x(λ)) + ϕc (x(μ), μ) − ϕc (x(λ), μ)

+ϕc (x(λ), μ) − ϕc (x(λ), λ)

≤ − f (x(μ)), x(λ) − x(μ)


+ ϕc (x(μ), μ), (x(λ) − x(μ))
(4.6.4)

+ϕc (x(λ), μ) − ϕc (x(λ), λ)

≤ ϕc (x(λ), μ) − ϕc (x(λ), λ).


It follows from Step 2 and Theorem 4.39 that
|ϕc (x(μ), μ) − ϕc (x(μ), λ) − (u(x(λ)λ), μ − λ)|
 1


≤ (u(x(μ), λ + t (μ − λ)) − u(x(λ), λ), μ − λ) dt
0

 1    
1
≤ |μ − λ| A 1 x(μ) + 1 (λ + t (μ − λ)) − A 1 x(λ) + λ + t (λ − μ) dt
c c c c c
0

  
1 1
 1
≤ |μ − λ| (c|x(μ) − x(λ)| + 2t |μ − λ|) dt ≤ + |μ − λ|2 .
c 0 σ c

i i

i i
ItoKunisc
i i
2008/6/12
page 117
i i

4.6. Augmented Lagrangian method 117

Hence (4.6.3)–(4.6.4) and Step 2 imply that λ ∈ H → dc (λ) is Fréchet differentiable with
F -derivative given by u(x(λ), λ), where u(x(λ), λ) is Lipschitz continuous in λ.

The following theorem asserts convergence of the augmented Lagrangian method.

Theorem 4.48. Assume that (A1)–(A3) hold and that there exists λ̄ ∈ ∂ϕ(x̄) such that
(4.5.7) is satisfied. Then the sequence (xk , λk ) is well defined and satisfies
σ 1 1
|xk − x̄|2X + |λk+1 − λ̄|2H ≤ |λk − λ̄|2H (4.6.5)
2 2c 2c
and
∞
σ 1
|xk − x̄|2X ≤ |λ1 − λ̄|2H , (4.6.6)
k=1
2 2c

which implies that |xk − x̄|X → 0 as k → ∞.

Proof. It follows from Theorem 4.45 that λ̄ = ϕc (v̄, λ̄), where v̄ = x̄. Next, we establish
(4.6.5). From (4.5.5) and Step 3
 
f (xk ), x̄ − xk
+ λk+1 , (x̄ − xk ) ≥ 0
and from (4.5.7)  
f (x̄), xk − x̄
+ λ̄, (xk − x̄) ≥ 0.
Adding these two inequalities, we obtain
 
f (xk ) − f (x̄), xk − x̄
+ λk+1 − λ̄, (xk − x̄) ≤ 0. (4.6.7)
From Theorems 4.39 and 4.40
   
λk λ̄
λk+1 − λ̄ = A 1 vk + − A1 v̄ + ,
c c c c
where vk = xk and v̄ = x̄. From the definition of Aμ we have
Aμ v − Aμ v̂, v − v̂
= μ |Aμ v − Aμ v̂|2 + Aμ v − Aμ v̂, Jμ v − Jμ v̂
≥ μ |Aμ v − Aμ v̂|2
for μ > 0 and v, v̂ ∈ H , since Aμ v ∈ AJμ v and A is monotone. Thus,
  
λk λ̄
(λk+1 − λ̄, vk − v̄) = λk+1 − λ̄, vk + − v̄ +
c c

1 1 1
− (λk+1 − λ̄, λk − λ̄) ≥ |λk+1 − λ̄|2 − (λk+1 − λ̄, λk − λ̄)
c c c
1 1
≥ |λk+1 − λ̄|2 − |λk − λ̄|2 .
2c 2c
Hence, (4.6.5) follows from (A2) and (4.6.7). Summing up (4.6.5) with respect to k, we
obtain (4.6.6).

i i

i i
ItoKunisc
i i
2008/6/12
page 118
i i

118 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

The duality (Uzawa) method is an iterative method for λk ∈ H . It is given by

λk+1 = ϕc (xk , λk ), (4.6.8)

where xk ∈ C solves

f (xk ) + ∗ λk , x − xk
≥ 0 for all x ∈ C. (4.6.9)

It will be shown that the Uzawa method is conditionally convergent in the sense that there
exists 0 < c < c̄ such that it converges for c ∈ [c, c̄]. On the other hand, the augmented
Lagrangian method can be written as

λk+1 = ϕc (xk , λk ),

where xk ∈ C satisfies

f (xk ) + ∗ λk+1 , x − xk
≥ 0 for all x ∈ C.

Note that the Uzawa method is explicit with respect to λ while the augmented Lagrangian
method is implicit and converges unconditionally (see Theorem 4.48) with respect to c.

Theorem 4.49 (Uzawa Algorithm). Assume that (A1)–(A3) hold and that there exists
λ̄ ∈ ∂ϕ(x̄) such that (4.5.7) is satisfied. Then there exists c̄ such that for the sequence
(xk , λk ) generated by (4.6.8)–(4.6.9) we have |xk − x̄|X → 0 as k → ∞ if 0 < c ≤ c̄.

Proof. As shown in the proof of Theorem 4.48 it follows from (4.5.7) and (4.6.9) that

f (xk ) − f (x̄), xk − x̄
+ λk − λ̄, (xk − x̄)
≤ 0. (4.6.10)

Since    
λk λ̄
λk+1 − λ̄ = A 1 vk + −A 1 v̄ + ,
c c c c
where vk = xk , and since v̄ = x̄ and A 1 is Lipschitz continuous with Lipschitz con-
c
stant c,
|λk+1 − λ̄| ≤ |λk − λ̄ + c (vk − v̄)|2
2

= |λk − λ̄|2 + 2c λk − λ̄, vk − v̄)


+ c2 |vk − v̄|2

≤ |λk − λ̄|2 − (2cσ − c2 2 )|xk − x̄|2 ,


where (4.6.10) was used for the last inequality. Choose c̄ < 2σ ()−1 . Then β =
2cσ − c2 2 > 0 if c ∈ (0, c̄] and


β |xk − x̄|2 ≤ |λ0 − λ̄|2 .
k=0

Consequently |xk − x̄| → 0 as k → ∞.

i i

i i
ItoKunisc
i i
2008/6/12
page 119
i i

4.7. Applications 119

4.7 Applications
In this section we discuss applications of the results in Sections 4.5 and 4.6. In many cases
the conjugate functional ϕ ∗ is given by

ϕ ∗ (v) = ψK ∗ (v),

where K ∗ is a closed convex set in H and ψS is the indicator function of a set S, i.e.,

⎨ 0 if x ∈ S,
ψS (x) =

∞ if x  ∈ S.
Then it follows from Theorem 4.40 that for v, λ ∈ H
 
1 1
ϕc (v, λ) = sup − |y ∗ − (λ + c v)|2H + (|λ + c v|2H − |λ|2H ). (4.7.1)
y ∗ ∈K ∗ 2c 2c

Hence the supremum is attained at λc (v, λ) = ProjK ∗ (λ + c v), where ProjK ∗ (φ) denotes
the projection of φ ∈ H onto K ∗ . This implies that the update in Step 3 of the augmented
Lagrangian algorithm is given by

λk+1 = ProjK ∗ (λk + c xk ). (4.7.2)

Example 4.50. For the equality constraint

ϕ(v) = ψK (v) with K = {0}

and ϕ ∗ (w) = 0 for w ∈ H . Thus


1
ϕc (v, λ) = (|λ + c v|2H − |λ|2H )
2c
and

λk+1 = λk + c xk ,

which coincides with the first order augmented Lagrangian update for equality constraints
discussed in Chapter 3.

Example 4.51. If ϕ(v) = ψK (v), where K is a closed convex cone with vertex at the origin
in H , then ϕ ∗ = ψK + , where K + = {w ∈ H : w, v
H ≤ 0 for all v ∈ K} is the dual cone of
K. In particular, if K = {v ∈ L2 () : v ≤ 0 a.e.}, then K + = {w ∈ L2 () : w ≥ 0 a.e.}.
Thus for the inequality constraint in H = L2 () the update (4.7.2) becomes

λk+1 = max(0, λk + c xk ),

where the max operation is defined pointwise in . Here L2 () denotes the space of scalar-
valued square integrable functions over a domain  in Rn . For K = {v ∈ Rn : vi ≤
0 for all i} the update (4.7.2) is given by λk+1 = max(0, λk + cxk ), where the maximum
is taken coordinatewise. It coincides with the first order augmented Lagrangian update for
finite rank inequality constraints in Chapter 3.

i i

i i
ItoKunisc
i i
2008/6/12
page 120
i i

120 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Example 4.52. If ϕ(v) = |v|H , then ϕ ∗ = ψB , where B is the closed unit ball in H . Thus
the update (4.7.2) becomes

λ k + c  k xk
λk+1 = .
max(1, |λ + c xk |)

Example 4.53. We consider the bilateral inequality constraint φ ≤ v ≤ ψ in L2 (). We set


 
ψ −φ φ+ψ ψ −φ
K = v ∈ L () : −
2
≤v− ≤
2 2 2

and ϕ = ψK . It follows that


   
∗ φ+ψ ψ −φ
ϕ (w) = w, + , |w| .
2 L2 2 L2

From Theorem 4.40 we conclude that the expression x, w


− φ ∗ (w) − 1
2c
|w − λ|2 is
maximized with respect to w at the element y ∗ satisfying

φ+ψ y∗ − λ ψ − φ
x− − ∈ ∂(| · |)(y ∗ )
2 c 2
a.e. in . Thus it follows from Theorem 4.40 that the complementarity condition (4.1.8) is
given by

λ̄ = max(0, λ̄ + c (x̄ − ψ)) + min(0, λ̄ + c (x̄ − φ)),

and the Lagrange multiplier update in Step 3 of the augmented Lagrangian method is

λk+1 = max(0, λk + c (xk − ψ)) + min(0, λk + c (xk − φ)),

where the max and min operations are defined pointwise a.e. in .

Note that with obvious modifications, x in Sections 4.5 and 4.6 can be replaced by
the affine function of the form x + a with a ∈ H .

4.7.1 Bingham flow


We consider the problem
   
μ ˜
min |∇u| − f u dx + g |∇u| dx
2
over u ∈ H01 (), (4.7.3)
 2 

where  is a bounded open set in R2 with Lipschitz boundary, g and μ are positive constants,
and f˜ ∈ L2 (). For a discussion of (4.7.3) we refer the reader to [Glo, GLT] and the
references therein. In the context of the general theory of Section 4.5 we choose

X = H01 (), H = L2 () × L2 (), and  = g grad,

i i

i i
ItoKunisc
i i
2008/6/12
page 121
i i

4.7. Applications 121

K = X, and define f : X → R and ϕ : H → R by


  
μ ˜
f (u) = |∇u| − f u dx
2
 2

and
 3
ϕ(v1 , v2 ) = v12 + v22 dx.


Conditions (A1)–(A3) are clearly satisfied. Since dom(ϕ) = H it follows from Theo-
rem 4.45 and Lemma 4.44 that there exists λ̄ such that (4.5.7) holds. Moreover, it is not
difficult to show that
ϕ ∗ (v) = ψK ∗ (v),
where K ∗ is given by

K ∗ = {v ∈ H : |v(x)|R2 ≤ 1 a.e. in }.

Hence it follows from (4.7.2) that Steps 2 and 3 in the augmented Lagrangian method are
given by

−μ uk − g div λk+1 = f˜, (4.7.4)

where


⎪ λ + c ∇uk on Ak = {x : |λk (x) + c ∇uk (x)|R2 ≤ 1},
⎨ k
λk+1 = λk + c ∇uk (4.7.5)


⎩ on  \ Ak .
|λk + c ∇uk |R2

4.7.2 Image restoration


For the image restoration problem introduced in (4.1.2) the analysis is similar to the one for
the Bingham flow and Steps 2 and 3 in the augmented Lagrangian method are given by
∂u
−μ uk + K∗ (Kuk − z) + g div λk+1 = 0, μ + g λk+1 = 0 on ,
∂n
where


⎪ λ + c ∇uk on Ak = {x : |λk (x) + c ∇uk (x)|R2 ≤ 1},
⎨ k
λk+1 = λk + c ∇uk


⎩ on  \ Ak
|λk + c ∇uk |R2

in the strong form. Here K : L2 () → L2 () denotes the convolution operator defined by

(Ku)(x) = k(x, s)u(s) ds.


i i

i i
ItoKunisc
i i
2008/6/12
page 122
i i

122 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

4.7.3 Elastoplastic problem


We consider the problem
  
β ˜
min |∇u| − f u dx
2
over u ∈ H01 ()
 2 (4.7.6)
subject to |∇u| ≤ 1 a.e. in ,

where β is a positive constant,  is a bounded domain in R2 , and f˜ ∈ L2 (). In the context


of the general theory we choose

X = H01 (), H = L2 () × L2 (), and  = ∇,

K = X, and define f : X → R and ϕ : H → R by


  
β
f (u) = |∇u|2 − f˜ u dx and ϕ = ψĈ ,
 2

where Ĉ is the closed convex set defined by C = {v ∈ H : |v|R2 ≤ 1 a.e. in }. Then

ϕ ∗ (w) = |w|R2 dx.


The maximum of (4.4.8), i.e., ∇u, w


− φ ∗ (w) − 1
2c
|w − λ| with respect to w ∈ L2 () ×
L2 (), is attained at y ∗ such that

λ + c ∇u = y ∗ + cμ, where |μ| ≤ 1 and μ · y ∗ = |y ∗ |

a.e. in . It thus follows from Theorem 4.40 that the Lagrange multiplier update is given by
  k
λk + c ∇uk λ + c ∇uk
λk+1 = c max 0, | |−1
c |λk + c ∇uk |

a.e. in . The existence of Lagrange multiplier λ̄ ∈ L∞ () for the inequality constraint in
(4.7.6) is shown in [Bre2] for f˜ = 1. In general, existence is still an open problem.

4.7.4 Obstacle problem


We consider the problem
  
1
min |∇u|2 − f˜ u dx over u ∈ H01 ()
 2 (4.7.7)
subject to φ ≤ u ≤ ψ a.e. in .

In the context of the general theory we choose

X = H01 (), H = L2 (), and  = the natural injection,

i i

i i
ItoKunisc
i i
2008/6/12
page 123
i i

4.7. Applications 123

C = X, and define f : X → R and ϕ : H → R by


  
1 ˜
f (u) = |∇u| − f u dx and
2
ϕ(v) = ψK ,
 2

where K is the closed convex set defined by K = {v ∈ H : φ ≤ v ≤ ψ a.e. in }. For


the one-sided constraint u ≤ ψ (i.e., φ = −∞) it is shown in [IK4], for example, that there
exists a unique λ̄ ∈ ∂ϕ(ū) such that (4.5.7) is satisfied provided that ψ ∈ H 1 (), ψ| ≥ 0,
and sup(0, f˜ + ψ) ∈ L2 (). In fact, the optimal pair (ū, λ̄) ∈ (H 2 ∩ H01 ) × L2 satisfies
− ū + λ̄ = f˜,
(4.7.8)
λ̄ = max(0, λ̄ + c (ū − ψ)).
In this case Steps 2 and 3 in the augmented Lagrangian method is given by
− uk + λk+1 = f˜,

λk+1 = max(0, λk + c (uk − ψ)).


For the two-sided constraint case we assume that φ, ψ ∈ H 1 () satisfy
φ≤0≤ψ on , max(0, ψ + f˜), min(0, φ + f˜) ∈ L2 (), (4.7.9)
S1 = {x ∈  : ψ + f˜ > 0} ∩ S2 = {x ∈  : φ + f˜ < 0} = ∅, (4.7.10)
and that there exists a c0 > 0 such that
− (ψ − φ) + c0 (ψ − φ) ≥ 0 a.e. in . (4.7.11)
Let λ̂ ∈ H be defined by

⎨ ψ(x) + f˜(x), x ∈ S1 ,
λ̂(x) = φ(x) + f˜(x), x ∈ S2 , (4.7.12)

0 otherwise.
Employing the regularization procedure of (4.5.4) we find the following theorem.

Theorem 4.54. Assume that φ, ψ ∈ H 1 () satisfy (4.7.9)–(4.7.12) where λ̂ is defined in


(4.7.12). Then (4.5.5) is given by

⎨ λ̂ + c (uc − ψ) if λ̂ + c (uc − ψ) > 0,
− uc + λc = f˜, λc = λ̂ + c (uc − φ) if λ̂ + c (uc − φ) < 0, (4.7.13)

0 otherwise,

and φ ≤ uc ≤ ψ, |λc | ≤ |λ̂| a.e. in  for c ≥ c0 . Moreover, as c → ∞, uc  u∗ weakly


in H 2 () and λc  λ̄ weakly in L2 () where ū is the solution of (4.7.7) and (ū, λ̄) satisfies
the necessary and sufficient optimality condition

⎨ λ̄ + c (ū − ψ) if λ̄ + c (ū − ψ) > 0,
− ū + λ̄ = f˜, λ̄ = λ̄ + c (ū − φ) if λ̄ + c (ū − φ) < 0,

0 otherwise.

i i

i i
ItoKunisc
i i
2008/6/12
page 124
i i

124 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

Proof. From Theorem 4.40 it follows that



⎨ λ̂ + c (u − ψ) if λ̂ + c (u − ψ) > 0,
λc (u, λ̂) = λ̂ + c (u − φ) if λ̂ + c (u − φ) < 0, (4.7.14)

0 otherwise.

We note that (4.5.4) has a unique solution uc ∈ H01 (). From (4.5.5)–(4.5.6) we deduce
that uc satisfies (4.7.13). Since λ̂ ∈ L2 () it follows that λc (uc , λ̂) ∈ L2 () and thus
uc ∈ H 2 (). Let η = sup(0, uc − ψ). Then η ∈ H01 (). Hence, we have

(∇(uc − ψ), ∇η) + (−( ψ + f˜) + λc , η) = 0.

For x ∈  assume that uc (x) ≥ ψ(x). If λ̂(x) > 0, then −( ψ + f˜)+λc = c (uc −ψ) ≥ 0.
If λ̂(x) = 0, then −( ψ + f˜) + λc ≥ c (uc − ψ) ≥ 0. If λ̂(x) < 0, then −( ψ + f˜) + λc ≥
− (ψ − φ) + c (ψ − φ) ≥ 0 for c ≥ c0 . Thus, we have (−( ψ + f˜) + λc , η) ≥ 0 and
|∇η|2 = 0. This implies that η = 0 and uc ≤ ψ a.e. in . Similarly, we can prove that
uc ≥ φ a.e. in  by choosing the test function η = inf (0, uc − φ) ∈ H01 (). Moreover, it
follows from (4.7.14) that |λc | = |λc (uc , λ̂)| ≤ |λ̂| a.e. in  and |λc |L2 is uniformly bounded.
Thus, there exists a weakly convergent subsequence (uĉ , λĉ )  (ū, λ̄) in H 2 () × L2 ().
Moreover the subsequence uĉ converges strongly to ū in H01 () and, as shown in the proof
of Theorem 4.46,
− ū + λ̄ = f˜ and λ̄ ∈ ∂ϕ(ū). (4.7.15)
Hence, it follows from Theorem 4.43 that ū minimizes (4.7.7). Since the solution to (4.7.7)
is unique it follows from (4.7.15) that λ̄ ∈ L2 () is unique. The theorem now follows from
Theorem 4.45.
From Theorem 4.54 it follows that Steps 2 and 3 in the augmented Lagrangian method
for the two-sided constraint are given by

⎨ λk + c (uk − ψ) if λk + c (uk − ψ) > 0,
− uk + λk+1 = f˜, λk+1 = λk + c (uk − φ) if λk + c (uk − φ) < 0,

0 otherwise.

Note that the last equality can be equivalently expressed as

λk+1 = max(0, λk + c (uk − ψ)) + min(0, λk + c (uk − φ)).

4.7.5 Signorini problem


Consider the Signorini problem

1
min (|∇u|2 + |u|2 ) − f˜ u dx over u ∈ H 1 ()
 2 (4.7.16)
subject to u ≥ 0 on the boundary ,

i i

i i
ItoKunisc
i i
2008/6/12
page 125
i i

4.7. Applications 125

which is a simplified version of a contact problem arising in elasticity theory. In this case
we choose

X = H 1 (), H = L2 ( ), and  = the trace operator on boundary ,

C = X, and define f : X → R and ϕ : H → R by



1
f (u) = (|∇u|2 + |u|2 ) − f˜ u dx and ϕ(v) = ψK ,
 2

where K is the closed convex set defined by K = {v ∈ L2 () : v ≥ 0 a.e. in }. If


f˜ ∈ L2 (), then the unique minimizer ū ∈ H 2 () [Bre2] and it is shown in [Glo] that
for λ̄ = ∂n

ū, the pair (ū, λ̄) satisfies (4.5.7). Note that range() is dense in H and  is
compact. It thus follows from Theorems 4.43 and 4.46 that (4.5.7) has a unique solution
and that uc → ū strongly in X and λc → λ̄ weakly in H . Steps 2 and 3 in the augmented
Lagrangian method are given by

− uk + uk = f˜, uk − λk+1 = 0 on ,
∂n

λk+1 = max(0, λk − c uk ) on .

4.7.6 Friction problem


Consider a simplified friction problem from elasticity theory
 
1
min (|∇u|2 + |u|2 ) − f˜ u dx + g |u| ds over u ∈ H 1 (), (4.7.17)
 2

where  is a bounded open domain of R2 with sufficiently smooth boundary . In this case
we choose

X = H 1 (), H = L2 ( ), and  = the trace operator on boundary ,

C = X, and define f : X → R and ϕ : H → R by


 
1
f (u) = (|∇u|2 + |u|2 ) − f˜ u dx and ϕ(v) = g |v| ds.
 2

Note that ϕ = ψK ∗ , where K ∗ = {v ∈ H : |v(x)| ≤ 1 a.e. in }. Since dom(ϕ) = H


it follows from Theorem 4.43 and Lemma 4.44 that there exists λ̄ such that (4.5.7) holds.
From (4.7.2) Steps 2 and 3 in the augmented Lagrangian method are given by

− uk + uk = f˜, uk + λk+1 = 0 on ,
∂n

⎨ λk + c u k on k = {x ∈ : |λk + c uk | ≤ 1},
λk+1 = λk + c uk
⎩ on \ k .
|λk + c uk |

i i

i i
ItoKunisc
i i
2008/6/12
page 126
i i

126 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

4.7.7 L1 -fitting

Consider the minimization problem


 
μ
min |u − z| dx + |∇u|2 ds over u ∈ H01 () (4.7.18)
 2 

for interpolation of noisy data z ∈ L2 () by minimizing the L1 -norm of the error u − z
over u ∈ H01 (). Again μ > 0 is fixed but should be adjusted to the statistics of noise. The
analysis of this problem is very similar to that of a friction problem and Steps 2 and 3 in the
augmented Lagrangian method are given by

−μ uk + λk+1 = 0,

⎨ λk + c (uk − z) on k = {x : |λk (x) + c (uk (x) − z)| ≤ 1},
λk+1 = λk + c (uk − z)
⎩ on  \ k .
|λk + c (uk − z)|

4.7.8 Control problem

Consider the optimal control problem


 T
1
min |x(t)|2 + |u(t)|2 dt over (x, u) ∈ L2 (0, T ; Rn ) × L2 (0, T ; Rm )
2 0
(4.7.19)
 t
subject to x(t) − x0 − Ax(s) + Bu(s) ds = 0 and |u(t)| ≤ 1 a.e.,
0

where A ∈ Rn×n , B ∈ Rn×m , and x0 ∈ Rn are fixed. In this case we formulate the problem
as in the form (4.1.1) by choosing

X = L2 (0, T ; Rn )×L2 (0, T ; Rm ), H = L2 (0, T ; Rm ), and (x, u) = u for (x, u) ∈ X,

and define f : X → R and ϕ : H → by


 T
1
f (x, u) = (|x(t)|2 + |u(t)|2 ) dt and ϕ(v) = ψK (v),
2 0

where K = {v ∈ H : |v(t)| ≤ 1 a.e. in (0, T )}. C is the closed affine space defined by
  t 
C = (x, u) ∈ X : x(t) − x0 − Ax(s) + Bu(s) ds = 0 .
0

It follows from Theorems 4.39 and 4.40 that


   
λ v + λc λ λ+cv
λc = c v + −
= c max 0, v + − 1 (4.7.20)
c max(1, |v + c |)
λ c |λ + c v|

i i

i i
ItoKunisc
i i
2008/6/12
page 127
i i

4.7. Applications 127

for v, λ ∈ H . For c > 0 the regularized problem

fc (x, u) = f (x, u) + ϕc (u, 0) over (x, u) ∈ C

has a unique solution (xc , uc ) ∈ C. Define the Lagrangian


 T  t 
L(x, u, μ) = fc (x, u) + x(t) − x0 − Ax(s) + Bu(s) ds, μ(t) dt.
0 0 Rn

Since the mapping F : X → L2 (0, T ; Rn ) defined by


 t
F (x, u) = x(t) − Ax(s) + Bu(s) ds, t ∈ (0, T ),
0

is surjective it follows from the Lagrange multiplier theory that there exists a unique La-
grange multiplier μc ∈ L2 (0, T ; Rn ) such that the Fréchet derivative of L with respect to
(x, u) satisfies L (xc , uc , μc )(h, v) = 0 for all (h, v) ∈ X. Hence we obtain
 T 1  t  2
h(t) − Ah(s) ds, μc (t) + (x(t), h(t)) dt = 0
0 0

for all h ∈ L2 (0, T ; Rn ) and


 T 1   t 2
uc
u(t) + c max(0, |uc | − 1) , v(t) − Bv(s) ds, μc (t) dt = 0
0 |uc | 0
T
for all v ∈ L2 (0, T ; Rm ). It is not difficult to show that if we define pc (t) = − t μc (s) ds,
then (xc , uc , pc ) satisfies
d
pc (t) + AT pc (t) + xc (t) = 0, pc (T ) = 0,
dt
(4.7.21)
uc (t)
uc (t) + c max(0, |uc (t)| − 1) = −B T pc (t).
|uc (t)|
Let (x̄, ū) be a feasible point of (4.7.19); i.e., (x̃, ũ) ∈ C and ũ ∈ K. Since fc (xc , uc ) ≤
fc (x̃, ũ) = f (x̃, ũ) and ϕc (uc , 0) ≥ 0 it follows that |(xc , uc )|X is uniformly bounded in
c > 0. Thus, there exists a subsequence of (xc , uc ) that converges weakly to (x̄, ū) ∈ X as
c → ∞. We shall show that (x̄, ū) is the solution of (4.7.19). First, since F (xc , uc ) = x0 , xc
converges strongly to x̄ in L2 (0, T ; Rn ) and weakly in H 1 (0, T ; Rn ) and (x̄, ū) ∈ C. From
the first equation of (4.7.21) it follows that pc converges strongly to p̄ in H 1 (0, T ; Rn ),
where p̄ satisfies
d
p̄(t) + AT p̄(t) + x̄(t) = 0, p̄(T ) = 0.
dt
Next, from the second equation of (4.7.21) we have

|uc (t)| + c max(0, |uc (t)| − 1) = |B T pc (t)|

i i

i i
ItoKunisc
i i
2008/6/12
page 128
i i

128 Chapter 4. Augmented Lagrangian Methods for Nonsmooth, Convex Optimization

and thus


⎪ |B T pc (t)| if |B T pc (t)| ≤ 1,

|uc (t)| = t ∈ [0, T ]. (4.7.22)

⎪ |B T pc (t)| − 1
⎩ 1+ if |B pc (t)| ≥ 1,
T
c+1
Note that
−B T pc (t)
uc (t) = ,
1 + ηc (t)|uc (t)|−1
(4.7.23)
c
where ηc (t) = c max(0, |uc (t)| − 1) = max(0, |B T pc (t)| − 1).
1+c

Define the functions η̄, û ∈ H 1 (0, T ) by

−B T p̄(t) −B T p̄(t)
η̄(t) = max(0, |B T p̄(t)| − 1) and û(t) = = (4.7.24)
1 + η̄(t) max(1, |B T p̄(t)|)

for t ∈ [0, T ]. Then


|B T p̄(t)|
|û(t)| = .
max(1, |B T p̄(t)|)
Since B T pc → B T p̄ in C(0, T ) we have |uc | → |û| in C(0, T ) by (4.7.22) and it follows
from (4.7.23)–(4.7.24) that ηc → η̄ and uc → û in C(0, T ). Since uc → ū weakly in
L2 (0, T ; Rm ) we have ū = û. Therefore, we conclude that uc → ū in C(0, T ) and if
λ̄(t) = η̄(t) ū(t), then (x̄, ū, p̄, λ̄) satisfies (x̄, ū) ∈ C, ū ∈ C and
d
p̄(t) + AT p̄(t) + x̄ = 0, p̄(T ) = 0,
dt (4.7.25)
ū(t) + λ̄ = −B T p̄(t), λ̄(t) = λc (ū(t), λ̄(t)) = ϕc (ū, λ̄),

which is equivalent to (4.5.7). Now, from Theorem 4.43 we deduce that (x̄, ū) is the solution
to (4.7.19). The last equality in (4.7.25) can be verified by separately considering the cases
|ū(t)| < 1 and |ū(t)| = 1. It follows from (4.7.20) that Steps 2 and 3 in the augmented
Lagrangian method are given by
d
xk (t) = Axk (t) + Buk (t), xk (0) = x0 ,
dt
d
pk (t) + AT pk (t) + xk = 0, pk (T ) = 0,
dt
 
λk λ k + c uk
uk (t) + λk+1 (t) = −B T pk (t), where λk+1 = c max 0, uk + − 1
c |λk + c uk |

for t ∈ [0, T ].

i i

i i
ItoKunisc
i i
2008/6/12
page 129
i i

Chapter 5

Newton and SQP Methods

In this chapter we discuss the Newton method for the equality-constrained optimal control
problem
min J (y, u) subject to e(y, u) = 0, (P )
where J : Y × U → R and e : Y × U → W , and Y, U , and W are Hilbert spaces. The
focus is on problems where for given u there exists a solution y(u) to e(y, u) = 0, which is
typical for optimal control problems. We shall refer to

Jˆ(u) = J (y(u), u) (5.0.1)

as the reduced cost functional. In the first section we give necessary and sufficient optimality
conditions based on Taylor series expansion arguments. Section 5.2 is devoted to Newton’s
method to solve (P ) and sufficient conditions for quadratic convergence are given. Sec-
tion 5.3 contains a discussion of SQP (sequential quadratic programming) and reduced SQP
techniques. We do not provide a convergence analysis here, since this can be obtained as
a special case from the results on second order augmented Lagrangians in Chapter 6. The
results are specialized to a class of optimal control problems for the Navier–Stokes equation
in Section 5.4. Section 5.5 is devoted to the Newton method for weakly singular problems
as introduced in Chapter 1.

5.1 Preliminaries
(I) Necessary Optimality Condition
The first objective is to characterize the derivative of Jˆ without recourse to the derivative
of the state y with respect to the control u. This is in the same spirit as Theorem 1.17 but
obtained under the less restrictive assumption of this chapter. We assume that

(C1) (y, u) ∈ Y × U satisfies e(y, u) = 0 and there exists a neighborhood V (y) × V (u) of
(y, u) on which J and e are C 1 with Lipschitz continuous first derivatives such that
for every v ∈ V (u) there exists a unique y(v) ∈ V (y) satisfying e(y(v), v) = 0.

129

i i

i i
ItoKunisc
i i
2008/6/12
page 130
i i

130 Chapter 5. Newton and SQP Methods

Theorem 5.1. Assume that the pair (y, u) satisfies (C1) and that there exists λ ∈ W ∗ such
that
(C2) ey (y, u)∗ λ + Jy (y, u) = 0, and
(C3) limt→0+ 1t |y(u + td) − y|2Y = 0 for all d ∈ U .

Then the Gâteaux derivative Jˆ (u) exists and

Jˆ (u) = eu (y, u)∗ λ + Ju (y, u). (5.1.1)

Proof. For d ∈ U and t sufficiently small let v = y(u + td) − y. Then


 1
 
ˆ ˆ
J (u+td)− J (u) = J (y +sv, u+std)(v, td)−J (y, u)(v, td) ds +J (y, u)(v, td).
0
(5.1.2)
Similarly,
0 = λ, e(v, u + td) − e(y, u)
W ∗ ,W
  
1  
= λ, e (y + sv, u + std)(v, td) − e (y, u)(v, td) ds + λ, e (y, u)(v, td)
,

0
(5.1.3)
and hence by Lipschitz continuity of e in V (y) × V (u) and (C3) we have
1
lim+ λ, ey (y, u)v
= − λ, eu (y, u)d
.
t→0 t

Since J is Lipschitz continuous in V (y) × V (u), it follows from (5.1.2) and conditions
(C2)–(C3) that

Jˆ(u + td) − Jˆ(u) 1


lim+ = − lim+ λ, ey (y, u)v
+ Ju (y, u)d
t→0 t t→0 t

= (eu (y, u)∗ λ + Ju (y, u))d.


Defining the Lagrangian L : Y × U × W ∗ → R by

L(y, u, λ) = J (y, u) + λ, e(y, u)


,

(5.1.1) can be written as

Jˆ (u) = Lu (y, u, λ).

If ey (y, u) : Y → W is bijective, then (5.1.1) also follows from the implicit function
theory. In fact, by the implicit function theorem there exists a neighborhood V (y) × V (u)
of (y, u) such that e(y, v) = 0 has a unique solution y(v) ∈ V (y) for v ∈ V (u) and
y(·) : V (u) ⊂ U → Y is C 1 with

ey (y, v)y (v) + eu (y, v) = 0, v ∈ V (u).

i i

i i
ItoKunisc
i i
2008/6/12
page 131
i i

5.1. Preliminaries 131

Thus

Jˆ (u)v = Jy (y, u)y (u)v + Ju (y, u)v = eu (y, u)∗ λ, v


+ Ju (y, u)v.

If conditions (C1)–(C3) hold at a local minimizer (y ∗ , u∗ ) of (P ), it follows from


Theorem 5.1 that the first order necessary optimality condition for (P ) is given by

⎪ ey (y ∗ , u∗ )∗ λ∗ + Jy (y ∗ , u∗ ) = 0,


eu (y ∗ , u∗ )∗ λ∗ + Ju (y ∗ , u∗ ) = 0, (5.1.4)



e(y ∗ , u∗ ) = 0,

or equivalently

L (y ∗ , u∗ , λ∗ ) = 0, e(y ∗ , u∗ ) = 0,

where, as in the previous chapter, prime with L denotes the derivative with respect to (y, u).

(II) Sufficient Optimality Condition


Assume that (y ∗ , u∗ , λ∗ ) ∈ Y1 × U × W ∗ satisfies (C1) and that the necessary optimality
condition (5.1.4) holds. For (y, u) ∈ V (y ∗ ) × V (u∗ ) define the quadratic approximation to
J (y, u) and λ, e(y, u)
by

E1 (y −y ∗ , u−u∗ ) = J (y, u)−J (y ∗ , u∗ )−Ju (y ∗ , u∗ )(u−u∗ )−Jy (y ∗ , u∗ )(y −y ∗ ) (5.1.5)

and

E2 (y − y ∗ , u − u∗ ) = λ∗ , e(y, u) − e(y ∗ , u∗ ) − ey (y ∗ , u∗ )(y − y ∗ ) − eu (y ∗ , u∗ )(u − u∗ )


.
(5.1.6)
Since for y = y(u) with u ∈ V (u∗ )

E2 (y(u) − y ∗ , u − u∗ ) = − λ∗ , ey (y ∗ , u∗ )(y − y ∗ ) + eu (y ∗ , u∗ )(u − u∗ )


,

we find by summing (5.1.5) and (5.1.6) with y = y(u) and using (5.1.4)

Jˆ(u) − Jˆ(u∗ ) = J (y(u), u) − J (y ∗ , u∗ ) = E1 (y(u) − y ∗ , u − u∗ ) + E2 (y(u) − y ∗ , u − u∗ ).

Based on this identity, we have the following local sufficient optimality conditions.
• If E(u) := E1 (y(u) − y ∗ , u − y ∗ ) + E2 (y(u) − y ∗ , u − u∗ ) ≥ 0 for all u ∈ V (u∗ ),
then u∗ is a local minimizer of (P ). If E(u) > 0 for all u ∈ V (u∗ ) with u  = u∗ , then u∗ is
a strict local minimizer.
• Assume that J is locally uniformly convex in the sense that for some α > 0

E1 (y(u) − y ∗ , u − u∗ ) ≥ α(|y(u) − y ∗ |2Y + |u − u∗ |2U ) for all u ∈ V (u∗ ),

and let β ≥ 0 be the size of the nonlinearity of e, i.e.,

|e(y, u) − e(y ∗ , u∗ ) − ey (y ∗ , u∗ )(y − y ∗ ) − eu (y ∗ , u∗ )(u − u∗ )| ≤ β(|y − y ∗ |2Y + |u − u∗ |2U )

i i

i i
ItoKunisc
i i
2008/6/12
page 132
i i

132 Chapter 5. Newton and SQP Methods

for all (y, u) ∈ V (y ∗ ) × V (u∗ ). Then

E(u) ≥ (α − β|λ∗ |W ∗ ) (|y(u) − y ∗ |2Y + |u − u∗ |2U ) for all u ∈ V (u∗ ).

Thus, u∗ is a strict local minimizer of (P ) if α − β|λ∗ |W ∗ > 0.


• Note that

E(u) = L(x, λ∗ ) − L(x ∗ , λ∗ ) − L (x ∗ , λ∗ )(x − x ∗ ),

where x = (y, u), x ∗ = (y ∗ , u∗ ), and y ∗ = y(u∗ ). If L is C 2 with respect to x = (y, u) in


V (y ∗ ) × V (u∗ ) with second derivative with respect to x denoted by L , we have

E(u) = L (x ∗ , λ∗ )(x − x ∗ , x − x ∗ ) + r(x − x ∗ ), (5.1.7)

where |r(x − x ∗ )| = o(|x − x ∗ |2 ). Thus, if there exists σ > 0 such that

L (x ∗ , λ∗ )(δx, δx) ≥ σ |δx|2 for all δx ∈ ker e (x ∗ ), (5.1.8)

and e (x ∗ ) : Y × U → W is surjective, then there exist σ0 > 0 and ρ > 0 such that

Jˆ(u) − Jˆ(u∗ ) = E(u) ≥ σ0 (|y(u) − y(u∗ ))|2 + |u − u∗ |2 ) (5.1.9)

for all y(u) ∈ V (y ∗ )×V (u∗ ) with |(y(u), u)−(y(u∗ ), u∗ )| < ρ. In fact, from Lemma 2.13,
Chapter 2, with S = ker e (x ∗ ) there exist γ > 0, σ0 > 0 such that

L (x ∗ , λ∗ )(x − x ∗ , x − x ∗ ) ≥ σ0 |x − x ∗ |2
(5.1.10)
for all x ∈ V (y ∗ ) × V (u∗ ) with |xker⊥ | ≤ γ |xker |,

where we decomposed x − x ∗ as

x − x ∗ = xker + xker⊥ ∈ ker e (x ∗ ) + (ker e (x ∗ ))⊥ .

Since 0 = e(x) − e(x ∗ ) = e (x ∗ )(x − x ∗ ) + η(x − x ∗ ), where |η(x − x ∗ )|W = o(|x − x ∗ |),
it follows that for δ ∈ (0, 1+γ
γ
) there exists ρ > 0 such that

|xker⊥ | ≤ δ|x − x ∗ | if |x − x ∗ | < ρ.

Consequently
δ
|xker⊥ | ≤ |xker | < γ |xker | if |x − x ∗ | < ρ,
1−δ
and hence (5.1.9) follows from (5.1.7) and (5.1.10).
We refer the reader to [CaTr, RoTr] and the literature cited therein for detailed in-
vestigations on the topic of second order sufficient optimality. The aim of these results is
to establish second order sufficient optimality conditions which are close to second order
necessary conditions.

i i

i i
ItoKunisc
i i
2008/6/12
page 133
i i

5.2. Newton method 133

5.2 Newton method


In this section we describe the Newton method applied to the reduced form (5.0.1) of (P ).
That is, let y(u) denote a solution to e(y, u) = 0. Then the constrained optimization problem
is transformed to the unconstrained problem for u in U :

min Jˆ(u) = J (y(u), u). (5.2.1)


u∈U

Let (y ∗ , u∗ ) denote a solution to (P ), and assume that (C1) holds for (y ∗ , u∗ ) and that (C2),
(C3) hold for all (y(u), u) ∈ V (y ∗ ) × V (u∗ ). In addition it is assumed that J and e are
C 2 in V (y ∗ ) × V (u∗ ) with Lipschitz continuous second derivatives. From Theorem 5.1 the
first derivative of Jˆ(u) is given by

Jˆ (u) = eu (y, u)∗ λ + Ju (y, u), (5.2.2)

where u ∈ V (u∗ ), y = y(u), and λ = λ(u) satisfy

ey (y, u)∗ λ = −Jy (y, u). (5.2.3)

From (5.2.2) it follows that

Jˆ (u) = Lu (y(u), u, λ(u)) for u ∈ V (u∗ ). (5.2.4)

We henceforth assume that for every u ∈ V (u∗ ) and y = y(u), λ = λ(u),


⎛ ⎞⎛ ⎞ ⎛ ⎞
Lyy (y, u, λ) ey (y, u)∗ μ1 Lyu (y, u, λ)
⎝ ⎠⎝ ⎠+⎝ ⎠=0 (5.2.5)
ey (y, u) 0 μ2 eu (y, u)

admits a solution (μ1 , μ2 ) ∈ L(U, Y ) × L(U, W ∗ ). Note that (5.2.5) is an operator equa-
tion in L(U, Y ∗ ) × L(U, W ). It consists of the linearized primal equation for μ1 and the
adjoint operator equation with right-hand side −ey (y, u)∗ μ2 − Lyu (y, u, λ) ∈ L(U, Y ∗ )
for μ2 .
Using (5.24) and the adjoint operator in the form Ly (y, u, λ) = 0 and e(y, u) = 0,
we find
Jˆ (u) = Lyu (y, u, λ)μ1 + eu (y, u)∗ μ2 + Luu (y, u, λ), (5.2.6)

where (μ1 , μ2 ) satisfy (5.2.5). In fact, since e and J are required to have Lipschitz con-
tinuous second derivatives in V (y ∗ ) × V (u∗ ) we obtain the following relationships in W
and Y ∗ :

ey (y, u)v + eu (y, u)(td) = o(|t|),


(5.2.7)
Lyy (y, u, λ)v + ey (y, u)∗ w + Lyu (y, u, λ)(td) = o(|t|),

i i

i i
ItoKunisc
i i
2008/6/12
page 134
i i

134 Chapter 5. Newton and SQP Methods

where (y, u) ∈ V (y ∗ ) × V (u∗ ), d ∈ U , and v = y(u + td) − y, w = λ(u + td) − λ. By


(5.2.4), (5.2.5), and (5.2.7) we find
Jˆ (u + td) − Jˆ (u) = Luy (y, u, λ)v + Luu (y, u, λ)(td) + eu (y, u)∗ w + o(|t|)
= −Lyy (y, u, λ)(μ1 , v) − ey (y, u)v, μ2
W,L(U,W ∗ ) + Luu (y, u, λ)(td)
− ey (y, u)μ1 , w
L(U,W ∗ ),W + o(|t|)
= ey (y, u)μ1 , w
L(U,W ),W ∗ + Lyu (y, u, λ)(td, μ1 ) + eu∗ (y, u)μ2 , td
L(U,U ∗ ),U
+Luu (y, u, λ)(td) − ey (y, u)μ1 , w
L(U,W ),W ∗ + o(|t|).
Dividing by t and letting t → 0 we obtain (5.2.6).
From (5.2.5) we deduce
μ1 = −(ey (y, u))−1 eu (y, u),

μ2 = −(ey (y, u)∗ )−1 (Lyy μ1 + Lyu ),

where L is evaluated at (y, u, λ). Hence the second derivative of Jˆ is given by


Jˆ (u) = Luy μ1 − eu (y, u)∗ (ey (y, u)∗ )−1 (Lyy μ1 + Lyu ) + Luu
(5.2.8)
= T (y, u)∗ L (y, u, λ)T (y, u),
with
⎛ ⎞
−ey (y, u)−1 eu (y, u)
T (y, u) = ⎝ ⎠,
I
where T ∈ L(U, Y × U ). After this preparation the Newton equation
Jˆ (u)δu + Jˆ (u) = 0 (5.2.9)
can be expressed as
⎛ ⎞
  0
δy
T (y, u)∗ L (y, u, λ) + T (y, u)∗ ⎝ ⎠ = 0,
δu
(eu (y, u)∗ λ + Ju (y, u)

ey (y, u)δy + eu (y, u)δu = 0.


By the definition of T (y, u) and the closed range theorem we have
ker(T (y, u)∗ ) = range(T (y, u))⊥ = ker(e (y, u))⊥ = range(e (y, u)∗ )
provided that range(e (y, u)), and hence range(e (y, u))∗ , is closed. As a consequence the
Newton update can be expressed as
⎛ ⎞ ⎛ ⎞
  δy 0
L (y, u, λ) e (y, u)∗ ⎝
δu ⎠ + ⎝ eu (y, u)∗ λ + Ju (y, u) ⎠ = 0, (5.2.10)
e (y, u) 0
δλ 0

i i

i i
ItoKunisc
i i
2008/6/12
page 135
i i

5.2. Newton method 135

where (δy, δu, δλ) ∈ Y × U × W ∗ and the equality is understood in Y ∗ × U ∗ × W .


We now state the Newton iteration for (5.2.1).

Newton Method.
(i) Initialization: Choose u0 ∈ V (u∗ ), solve

e(y, u0 ) = 0, ey (y, u0 )∗ λ + Jy (y, u0 ) = 0 for (y0 , λ0 ),

and set k = 0
(ii) Newton step: Solve for (δy, δu, δλ) ∈ Y × U × W ∗
⎛   ⎞ ⎛ ⎞
  δy 0
L (yk , uk , λk ) e (yk , uk )∗ ⎝ ⎠ + ⎝ eu (yk , uk )∗ λk + Ju (yk , uk ) ⎠ = 0.
δu
e (yk , uk ) 0
δλ 0

(iii) Update uk+1 = uk + δu.


(iv) Feasibility step: Solve for (yk+1 , λk+1 ) ∈ Y × W ∗

e(y, uk+1 ) = 0, ey (yk+1 , uk+1 )∗ λ + Jy (y, uk+1 ) = 0.

(v) Stop, or set k = k + 1, and goto (ii).

The following theorem provides sufficient conditions for local quadratic convergence.
Here we assume that ey (x ∗ ) : Y → W is a bijection at a local solution x ∗ = (y ∗ , u∗ ) of (P ).
Then there corresponds a unique Lagrange multiplier λ∗ .

Theorem 5.2. Let x ∗ = (y ∗ , u∗ ) be a local solution to (P ) and let V (y ∗ ) × V (u∗ ) denote


a neighborhood in which J and e are C 2 with Lipschitz continuous second derivatives. Let
us further assume that ey (x ∗ ) : Y → W is a bijection and that there exists κ > 0 such that

L (x ∗ , λ∗ )(h, h) ≥ κ|h|2 for all h ∈ ker e (x ∗ ).

Then, if |u0 − u∗ | is sufficiently small, |(yk+1 , uk+1 , λk+1 ) − (y ∗ , u∗ , λ∗ )| ≤ K|uk − u∗ |2


for a constant K independent of k = 0, 1, . . . .

Proof. Since ey (x ∗ ) : Y → W is a homeomorphism and x → ey (x) is continuous from


X → L(Y, X) , there exist r > 0 and γ > 0 such that

⎪ ey (x)−1 L(W,Y ) ≤ γ for all x ∈ Br ,


ey (x)−∗ L(Y ∗ ,W ∗ ) ≤ γ for all x ∈ Br , (5.2.11)



T ∗ (x)L (x, λ)T (x)u, u
≥ 2 |u|2 for all x ∈ Br , |λ − λ∗ | < r,
κ

where Br = {(y, u) ∈ Y × U : |y − y ∗ | < r, |u − u∗ | < r}. These estimates imply that,


possibly after decreasing r > 0, there exists some K̂ > 0 such that

|y(u) − y ∗ |Y ≤ K̂|u − u∗ |U and |λ(u) − λ∗ |W ∗ ≤ K̂|u − u∗ |U for all |u − u∗ | < r,

i i

i i
ItoKunisc
i i
2008/6/12
page 136
i i

136 Chapter 5. Newton and SQP Methods

where ey (y(u), u)∗ λ(u) = −Jy (y(u), u). Thus for ρ = min(r, K̂r ) the inequalities in
(5.2.11) hold with x = (y(u), u) and λ = λ(u), provided that |u − u∗ | < ρ. Let S(u) =
T (y(u), u)∗ L (y(u), u, λ(u))T (y(u), u). Then there exists μ > 0 such that

|S(u) − S(u∗ )| ≤ μ|u − u∗ | for |u − u∗ | < ρ.

Let |u0 − u∗ | < min( μκ , ρ) and, proceeding by induction, assume that |uk − u∗ | ≤ |u0 − u∗ |.
Then
S(uk )(uk+1 − u∗ ) = S(uk )(uk − u∗ ) − Jˆ (uk ) + Jˆ (u∗ )
 1 (5.2.12)
= (S(uk + s(u∗ − uk )) − S(uk ))(u∗ − uk ) ds
0

and hence
2μ μ
|uk+1 − u∗ | ≤ |uk − u∗ |2 ≤ |u0 − u∗ | |u0 − u∗ | ≤ |u0 − u∗ | < ρ.
κ 2 κ

It further follows that |y(uk+1 ) − y ∗ |Y ≤ K̂μ


κ
|uk − u∗ |2 and |y(uk+1 ) − y ∗ |Y ≤ K̂|uk+1 −
u∗ | < K̂ρ ≤ r. In a similar manner we have |λ(uk+1 ) − λ∗ |W ∗ ≤ K̂μ
κ
|uk − u∗ |2 and
|λ(uk+1 ) − λ∗ |W ∗ < r. The claim follows from these estimates.

Remark 5.2.1. If |λ∗ |W ∗ is small, then it is suggested to approximate the Hessian of


L (y, u, λ) by
L (y, u, λ) ∼ J (y, u)
and the reduced Hessian T ∗ L T by

Jˆ (u) ∼ T (y, u)∗ J (y, u)T (y, u).

Under the assumptions of Theorem 5.2, T ∗ J T is positive definite in a neighborhood of


the solution, provided that λ∗ is sufficiently small. Moreover, if λ∗ = 0, then the Newton
method converges superlinearly to (y ∗ , u∗ ), if it converges. Indeed we can follow the proof
of Theorem 5.2 and replace (5.2.12) by

S0 (uk )(uk+1 − u∗ ) = S(uk )(uk − u∗ ) − Jˆ (uk ) + Jˆ (u∗ )


−T ∗ (y(uk ), uk )(e (xk )∗ (λk − λ∗ ))T (y(uk ), uk )(uk − u∗ ),

where S0 (u) = T ∗ (y(u), u) J (y(u), u) T (y(u), u).

Remark 5.2.2. Note that the reduced Hessian S = T ∗ L T is a Schur complement of the
linear system (5.2.10). That is, if we eliminate δy and δλ by

δy = −ey−1 eu δu, δλ = −(ey∗ )−1 (Lyy δy + Lyu δu),

we obtain (5.2.9) in the form

Sδu + Lu (y, u, λ) = 0. (5.2.13)

i i

i i
ItoKunisc
i i
2008/6/12
page 137
i i

5.3. SQP and reduced SQP methods 137

Remark 5.2.3. If (5.2.13) is solved by an iterative method based on Krylov subspaces,


then this uses Sr for given r ∈ U and requires that we perform the forward solution
4 = −ey−1 (y, u)eu (y, u)r,
δy

the adjoint solution


4 = −(ey (y, u)∗ )−1 (Lyy δy
δλ 4 + Lyu r),

and evaluation of the expression


4 + Luu r + eu (y, u)∗ δ4
Sr = Luy δy λ.

In applications where the discretization of U is much smaller than that of Y × U × W ∗ this


procedure may offer a significant reduction in storage and execution time over solving the
saddle point problem (5.2.10).

5.3 SQP and reduced SQP methods


In this section we describe the SQP (sequential quadratic programing) and reduced SQP
methods without entering into technical details. Throughout this section we set x = (y, u) ∈
X = Y × U . Let x ∗ denote a local solution to
-
min J (x) over x ∈ X
(5.3.1)
subject to e(x) = 0.

Further let

L(x, λ) = J (x) + λ, e(x)

denote the Lagrangian and let λ∗ be a Lagrange multiplier at the solution x ∗ . Then a
necessary first order optimality condition is given by

L (y ∗ , u∗ , λ∗ ) = 0, e(x ∗ ) = 0, (5.3.2)

and a second order sufficient optimality condition by

L (x ∗ , λ∗ )(h, h) ≥ σ |h|2X for all h ∈ ker e (x ∗ ), (5.3.3)

where σ > 0. Differently from the Newton method described in the previous section,
in the SQP method both y and u are considered as independent variables related by the
equality constraint e(y, u) = 0 which is realized by a Lagrangian term. The SQP method
then consists essentially in a Newton method applied to the necessary optimality condition
(5.3.2) to iteratively solve for (y ∗ , u∗ , λ∗ ). This results in determining updates from the
linear system
⎛ ⎞ ⎛ ⎞
  δy ey (yk , uk )∗ λk + Jy (yk , uk )
L (yk , uk , λk )e (yk , uk )∗ ⎝
δu ⎠ + ⎝ eu (yk , uk )∗ λk + Ju (yk , uk ) ⎠ = 0.
e (yk , uk ) 0
δλ e(yk , uk )
(5.3.4)

i i

i i
ItoKunisc
i i
2008/6/12
page 138
i i

138 Chapter 5. Newton and SQP Methods

If the values for yk and λk are obtained by means of a feasibility step as in (iv) of the Newton
algorithm, then the first and the last components on the right-hand side of (5.3.4) are 0 and
we arrive at the Newton algorithm. This will be further investigated in Section 5.5 for
weakly singular problems. Note that for affine constraints the iterates, except possibly the
initialization, are feasible by construction, and hence e(xk ) = 0.
Let us note that given an iterate xk = (yk+1 , uk+1 ) near x ∗ the SQP step for (5.3.1) is
also obtained as the necessary optimality to the quadratic subproblem


⎨min L (xk , λk ), h
+
1
L (xk , λk )(h, h)
2 (5.3.5)

⎩subject to e (x )h + e(x ) = 0, h ∈ X,
k k

for hk = (δyk , δuk ) and setting xk+1 = xk + hk .


If (5.3.1) admits a solution (x ∗ , λ∗ ) and J and e are C 2 in a neighborhood of x ∗ with
Lipschitz continuous second derivatives, and if further e (x ∗ ) is surjective and there exists
κ > 0 such that

L (x ∗ , λ∗ )(h, h) ≥ κ|h|2X for all h ∈ ker e (x ∗ ),

then

|(xk+1 , λk+1 ) − (x ∗ , λ∗ )|2 ≤ K|(xk , λk ) − (x ∗ , λ∗ )|

for a constant K independent of k, provided that |(x0 , λ0 ) − (x ∗ , λ∗ )| is sufficiently small.


Thus the SQP method is locally quadratically convergent. The proof is quite similar to the
one of Theorem 6.4.
Assume next that there exists a null-space representation of e (x) for all x in a neigh-
borhood of x ∗ ; i.e., there exist a Hilbert space H and an isomorphism T (x) from H onto
ker e (x), where ker e (x ∗ ) is endowed with the norm of X. Then (5.3.2) and (5.3.3) can be
expressed as

T (x ∗ )∗ J (x ∗ ) = 0 (5.3.6)

and

T (x ∗ )∗ L (x ∗ , λ∗ )T (x ∗ ) ≥ σ I, (5.3.7)

respectively.
Referring to (5.3.5), let qk ∈ ker e (xk )⊥ ⊂ X satisfy e (xk )qk + e(xk ) = 0. Then
hk ∈ X can be expressed as
hk = qk + T (xk )w.
Thus, (5.3.5) is reduced to

min L (xk , λk ), T (xk )w


+ T (xk )w, L (xk , λk )qk

(5.3.8)
1
+ T (xk )w, L (xk , λk )T (xk )w
,
2

i i

i i
ItoKunisc
i i
2008/6/12
page 139
i i

5.3. SQP and reduced SQP methods 139

and the solution hk to (5.3.5) can be expressed as

hk = qk + T (xk )wk , (5.3.9)

where wk is a solution to the unconstrained problem (5.3.8) in H , given by



T ∗ (xk )L (xk , λk )T (xk )wk = −T ∗ (xk )(J (xk ) − L (xk , λk )Rk (xk )e(xk )). (5.3.10)

Therefore the full SQP step is decomposed into a minimization step in H , a restoration step
to the linearized equation
e (xk )q + e(xk ) = 0,
and an update of the Lagrange multiplier according to

e (xk )∗ δλ = −e (xk )∗ λk − Jy (xk ) − L (xk , λk )hk .

If e (xk ) ∈ L(X, W ) admits a right inverse R(xk ) ∈ L(W, X) satisfying e (xk )R(xk ) = IW ,
then qk = −R(xk )e(xk ) in (5.3.9) and
 
hk = −Rk e(xk ) − Tk Tk∗ L (xk , λk )Tk )−1 Tk∗ (J (xk ) − L (xk , λk )Rk e(xk ) , (5.3.11)

where Tk = T (xk ) and Rk = R(xk ) and we used that T ∗ w = 0 for w ∈ range e (x)∗ . Note
that a right inverse to e (x) for x in a neighborhood of x ∗ exists if e (x ∗ ) is surjective and
x → e (x) is continuous.
An alternative to deriving the update wk is given by differentiating

T (x)∗ J (x) = T (x)∗ L (x, λ)

with respect to x, evaluating at x = x ∗ , λ = λ∗ , and using (5.3.2):

d
(T (x ∗ )∗ J (x ∗ )) = T (x ∗ )∗ L (x ∗ , λ∗ ).
dx
This representation holds in general only at the solution x ∗ . But if we use its structure for
an update of the form hk = qk + T (xk )wk in a Newton step to (5.3.6), we arrive at

Tk∗ L (xk , λk )hk = Tk∗ L (xk , λk )(Tk wk − R(xk )e(xk )) = −Tk∗ J (xk ),

which results in an update as in (5.3.10) above.



In the reduced SQP approach the term L (xk , λk )Rk e(xk ) is deleted from the expres-
sion in (5.3.11). Recall that for Newton’s method this term vanishes since e(xk ) = 0 at each
iteration level. This results in

Tk∗ L (xk , λk )T (xk )wk = −Tk∗ J (xk ). (5.3.12)

This equation for the update of the control coincides with the Schur complement form of
the Newton update given in (5.2.13), since

Lu (xk , λk ) = eu∗ (xk )λk + Ju (xk ) = Tk∗ J (xk ),

i i

i i
ItoKunisc
i i
2008/6/12
page 140
i i

140 Chapter 5. Newton and SQP Methods

where we used the fact that in the Newton method ey∗ (xk )λk + Jy (xk ) = 0. The updates for
the state and the adjoint state differ, however, for the Newton and the reduced SQP methods.
A second distinguishing feature for reduced SQP methods is that often the reduced
Hessians Tk∗ L (xk , λk )Tk are approximated by invertible operators Bk ∈ L(H ) suggested
by secant methods. Thus a reduced SQP step has the form xk+1 = xk + hRED k , where

−1 ∗
k = −Rk e(xk ) − Tk Bk Tk J (xk ).
hRED (5.3.13)

For the reduced SQP method the step hRED k in general depends on the specific choice of
the null-space representation and the right inverse. In the full SQP method the step hk in
(5.3.11) is invariant with respect to these choices. A third distinguishing feature of reduced
SQP methods is the choice of the Lagrange multiplier, which is required to specify the
update of Bk . The λ-update is typically not derived from the first equation in (5.3.4). From
the first order condition we have

J (x ∗ ), R(x ∗ )h
+ λ∗ , e (x ∗ )R(x ∗ )h
= J (x ∗ ), R(x ∗ )h
+ λ∗ , h
= 0
for all h ∈ W.

This suggests the λ-update

λ+ = −R(x)∗ J (x). (5.3.14)

Other choices for the Lagrange multiplier update are possible. For convergence proofs of
reduced SQP methods this update is required to be locally Lipschitz continuous; see [Kup,
KuSa].
In the case of problem (P ), a vector (δy, δu) ∈ X lies in the null-space of e (x) if

ey (x)δy + eu (x)δu = 0.

Assuming that ey (x) : Y → W is invertible, we have

ker e (x) = {(δy, δu) : δy = −ey (x)−1 eu (x) δu}.

This suggests choosing H = U and using the following definitions for the null-space
representation and the right inverse:

T (x) = (−ey (x)−1 eu (x), I ), R(x) = (ey (x)−1 , 0). (5.3.15)

In this case a reduced SQP step can be decomposed as follows:

(i) solve Bk δu = −T (xk )∗ J (xk ),

(ii) solve ey (xk )δy = −eu (xk )δu − e(xk ),

(iii) set xk+1 = xk + (δy, δu),

(iv) update λ.

i i

i i
ItoKunisc
i i
2008/6/12
page 141
i i

5.3. SQP and reduced SQP methods 141

The update of the Lagrange multiplier λ is needed for the approximation Bk to the reduced
Hessian T ∗ (xk )L (xk , λk )T (xk ). A popular choice is given by the BFGS-update formula
1 1
Bk+1 = Bk + zk , ·
zk − Bk δuk , ·
Bk δuk ,
zk , δuk
δuk , Bk δuk

where
 
zk = T ∗ (xk )L xk + T (xk )δuk , λk − J (xk ).
Each SQP step requires at least three linear systems solves, one in W ∗ for the evaluation of
T (xk )∗ J (xk ), one in U for δu, and another one in Y for δy. The update of the BFGS formula
requires one additional system solve. The typical convergence behavior that can be proved
for reduced SQP methods with BFGS update is local two-step superlinear convergence
[Kup, KuSa, KSa],
|xk+1 − x ∗ |
lim =0
k→∞ |xk−1 − x ∗ |

provided that
B0 − T (x ∗ )∗ L (x ∗ , λ∗ )T (x ∗ ) is compact. (5.3.16)
Here B0 denotes the initialization to the reduced Hessian approximation, and (x ∗ , λ∗ ) is a
solution to (5.1.4). If e is C 1 at x ∗ and ey (x ∗ ) is continuously invertible and compact, then
(5.3.16) holds for the choice B0 = Luu (x ∗ , λ∗ ). In [HiK2], condition (5.3.16) is analyzed
for a class of optimal control problems related to the stationary Navier–Stokes equations.
We recall that condition (5.3.16) is related to an analogous condition in the context of solving
nonlinear equations F (x) = 0 in infinite-dimensional Hilbert spaces by means of secant
methods, in particular the Broyden method. The initialization of the Jacobian must be a
compact perturbation of the linearization of F at the solution to ascertain local Q-superlinear
convergence [Grie, Sa].
If we deal with finite-dimensional problems with e a mapping from RN into RM , with
M < N , then a common null-space representation is provided by the QR decomposition.
Let H = RN−M and
 
R1 (x)
e (x) = Q(x)R(x) = (Q1 (x) Q2 (x))
T
,
0

where Q(x) is orthogonal, Q1 (x) ∈ RN ×M , Q2 (x) ∈ RN ×(N−M) , and R1 (x) ∈ RM×M is an


upper triangular matrix. Then the null-space representation and the right inverse to e (x)
are given by
T (x) = Q2 (x) ∈ RN×(N −M) and R(x) = Q1 (x)(R1T (x))−1 ∈ RN ×M .
In the finite-dimensional setting one often works with the “least squares solution” for the
λ-update, i.e., with the solution to
min |J (x) + e (x)∗ λ|,
λ

which results in λ(x)+ = −R1−1 (x)QT1 (x)J (x).

i i

i i
ItoKunisc
i i
2008/6/12
page 142
i i

142 Chapter 5. Newton and SQP Methods

To derive another interesting feature of a variant of a reduced SQP method we recapture


the derivation from above for (P ) with T and R chosen as in (5.3.15), as

⎪ T (x)∗ L (x, λ)T (x)w = −T (x)∗ L (x, λ),


h = (δy, δu) = q + T (x)w, where q − R(x)e(x) = 0, (5.3.17)



λ + δλ = −R(x)∗ J (x).


In particular, the term L (xk , λk )Rk e(xk ) is again deleted from the expression in (5.3.10)
and the Lagrange multiplier update is chosen according to (5.3.14). However, differently
from (5.3.13), the reduced Hessian is not approximated in (5.3.17). Here we do not indicate
the iteration index k, and we assume that ey (x) : Y → W is bijective and that (5.3.7) holds.
Then (q, w, δλ) is a solution to (5.3.17) if and only if (δy, δu, δλ) is a solution to the
system
⎛ ⎞⎛ ⎞ ⎛ ⎞
0 0 ey (x)∗ δy Ly (x, λ)
⎝ 0 T (x)∗ L (x, λ)T (x) eu (x)∗ ⎠ ⎝ δu ⎠ = − ⎝ Lu (x, λ) ⎠ , (5.3.18)
ey (x) eu (x) 0 δλ e(x)

with w = δu, q = −R(x)e(x), and (δy, δu) = q + T (x)δu. In fact, R ∗ (x) is a left inverse
to e (x)∗ , and from the first equation in (5.3.18) we have

δλ = −(ey (x)∗ )−1 (Jy (x) + ey∗ (x)λ) = −R(x)∗ J (x) + λ.

From the second equation

T (x)∗ L (x, λ)T (x)δu = −Lu (x, λ) − eu (x)∗ δλ


= −Lu (x, λ) + eu (x)∗ ey (x)−∗ Ly (x, λ) = −T (x)∗ L (x, λ) = −T (x)∗ J (x).

The third equation is equivalent to

(δy, δu) = −R(x)e(x) + T (x)δu.

System (5.3.18) should be compared to (5.3.4). We note that the reduced SQP step is
equivalent to a block tridiagonal system, where the “Luu ” element in the system matrix is
replaced by the reduced Hessian while the elements corresponding to “Lyy ” and “Lyu ” are
zero. The system matrix of (5.3.18) can be used advantageously as preconditioning for
iteratively solving (5.3.4). In fact, let us denote the system matrices in (5.3.4) and (5.3.18)
by S and Sred , respectively, and consider the iteration

−1
 
δzn+1 = δzn − Sred S δzk + col (Ly (x, λ), Lu (x, λ), e(x)) . (5.3.19)

−1
In [IKSG] it was proved that the iteration matrix I − Sred S is nilpotent of degree 3. Hence
the iteration (5.3.19) converges in 3 steps to the solution of (5.3.4) with (xk , λk ) = (x, λ).

i i

i i
ItoKunisc
i i
2008/6/12
page 143
i i

5.4. Optimal control of the Navier–Stokes equations 143

5.4 Optimal control of the Navier–Stokes equations


In this section we consider the optimal control problem
⎧ T  

⎪ min J (y, u) = 0 (y(t)) + h(u(t)) dt




⎨ over u ∈ U = L2 (0, T ; Ũ ) subject to
(5.4.1)

⎪ d
= A0 y(t) + F (y(t)) + Bu(t) for t ∈ (0, T ],

⎪ y(t)


dt

y(0) = y0 ,

where A0 is a densely defined, self-adjoint operator in a Hilbert space H which satisfies


(−A0 φ, φ)H ≥ α|φ|2H for some α > 0 independent of φ ∈ domA0 . We set V =
1 1
dom((−A0 ) 2 ) endowed with |φ|V = |(−A0 ) 2 φ|H as norm. The V -coercive form (v, w) →
((−A0 ) 2 v, (−A0 ) 2 w) also defines an operator A ∈ L(V , V ∗ ) satisfying −Av, w
V ∗ ,V =
1 1

1 1
((−A0 ) 2 v, (−A0 ) 2 w)H for all v, w ∈ W . Since A and A0 coincide on dom A0 , and
dom A0 is dense in V , we shall not distinguish between A and A0 as operators in L(V , V ∗ ).
In (5.4.1), moreover, Ũ is the Hilbert space of controls, B ∈ L(Ũ , V ∗ ), and y0 ∈ H . We as-
sume that h ∈ C 1 (Ũ , R) with Lipschitz continuous first derivative and  ∈ C 1 (V , R), with
 Lipschitz continuous on bounded subsets of H . The nonlinearity F is supposed to satisfy


⎪ F : V → V ∗ is continuously differentiable and there exists




⎪ a constant c > 0 such that for every ε > 0






⎨ F (y) − F (ȳ), y − ȳ
V ∗ ,V ≤ ε|y − ȳ|2V + εc (|ȳ|2V + |y|2H |y|2V )|y − ȳ|2H ,
(H1)

⎪ F (ȳ)y, y
V ,V ≤ ε|y|V + ε |y|H |ȳ|V (1 + |ȳ|H ),
2 c 2 2 2
⎪ ∗





1 1 1 1

⎪ |(F (y) − F (ȳ))v|V ∗ ≤ c|y − ȳ|H2 |y − ȳ|V2 |v|H2 |v|V2




for all y, ȳ, and v in V .



⎪ For any u ∈ L2 (0, T ; Ũ ) there exists a unique weak





⎪ solution y = y(u) in





⎪ W (0, T ) = L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ) satisfying



(H2) dtd y(t), ψ
V ∗ ,V = A0 y(t) + F (y(t)) + Bu(t), ψ
V ∗ ,V




⎪ for all ψ ∈ V , and y(0) = y0 .







⎪ Moreover {y(u) : |u|L2 (0,T ;Ũ ) ≤ r} is bounded in




W (0, T ) for each r > 0.

With these preliminaries the dynamical system in (5.4.1) can be considered with values
in V ∗ . The conditions on A0 and F are motivated by the two-dimensional incompressible

i i

i i
ItoKunisc
i i
2008/6/12
page 144
i i

144 Chapter 5. Newton and SQP Methods

Navier–Stokes equations with distributed control given by




d
y + (y · ∇)y + ∇p = ν∇y + Bu in (0, T ] × ,

⎨ dt

∇y = 0 in (0, T ] × , (5.4.2)



y(0, ·) = y0 in ,

where  is a bounded domain with Lipschitz continuous boundary ∂, y = y(t, x) ∈ R2 is


the the velocity field, p = p(t, x) ∈ R is the pressure, and ν is the normalized viscosity. Let

V = {φ ∈ H01 ()2 : ∇ · φ = 0}, H = {φ ∈ L2 ()2 : ∇ · φ = 0, n · φ = 0 on δ},

where n is the outer unit normal vector to ∂, let denote the Laplacian in H , and let P F
denote the orthogonal projection of L2 ()2 onto the closed subspace H . Then A0 = νP
1
is the Stokes operator in H . It is a self-adjoint operator with domain dom((−A0 ) 2 ) = V and

−(A0 φ, ψ)H = ν(∇φ, ∇ψ)L2 for all φ ∈ dom(A0 ), ψ ∈ V .

If ∂ is sufficiently smooth, e.g., ∂ is C 2 , then dom(A0 ) = H 2 ()2 ∩ V . The nonlinearity


in (5.4.2) satisfies the following properties:
 
(u · ∇v) w dx = (u · ∇w) v dx, (5.4.3)
 
 1 1 1 1
(u · ∇v) w dx ≤ c|u|H2 |u|V2 |v|V |w|H2 |w|V2 (5.4.4)


for a constant c independent of u, v, w ∈ V ; see, e.g., [Te]. In terms of the general


formulation (5.4.1) the nonlinearity F : V → V ∗ is given by

(F (φ), v)V ∗ ,V = − (φ · ∇)φ v dx.


It is well defined due to (5.4.4). Moreover (H1) follows from (5.4.3) and (5.4.4). From the
theory of variational solutions to the Navier–Stokes equations it follows that there exists
a constant C such that for all y0 ∈ H and u ∈ L2 (0, T ; Ũ ) there exists a unique solution
y ∈ W (0, T ) such that

|y|C(0,T ;H ) + |y|W (0,T ) ≤ C(|y0 |H + |u|L2 (0,T ;Ũ ) + |y0 |2H + |u|2L2 (0,T ;Ũ ) ),

where |y|W (0,T ) = |y|L2 (0,T ;V ) +| dtd y|L2 (0,T ;V ∗ ) and we recall that C([0, T ]; H ) is embedded
continuously into W (0, T ); see, e.g., [LiMa, Tem]. Assuming the existence of a local solu-
tion to (5.4.1) we shall present in the following subsections first and second order optimality
conditions, as well as the steps which are necessary to realize the Newton algorithm.
Control and especially optimal control has received a considerable amount of attention
in the literature. Here we only mention a few [Be, FGH, Gu, HiK2] and refer the reader to
further references given there.

i i

i i
ItoKunisc
i i
2008/6/12
page 145
i i

5.4. Optimal control of the Navier–Stokes equations 145

5.4.1 Necessary optimality condition


Let x ∗ = (y ∗ , u∗ ) denote a local solution to (5.4.1). We shall derive a first order optimality
condition by verifying the assumptions of Theorem 5.1 and applying (5.1.4). In the context
of Section 5.1 we set
Y = W (0, T ) = L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ),

W = L2 (0, T ; V ∗ ) × H, U = L2 (0, T ; Ũ ),
T  
J (y, u) = 0 (y(t)) + h(u(t)) dt,
e(y, u) = (yt − A0 y − F (y) − Bu, y(0)).

To verify (C1)–(C3) with (y, u) = (y ∗ , u∗ ), let V (y ∗ ) × V (u∗ ) denote a bounded neighbor-


hood of (y ∗ , u∗ ). Let V (y ∗ ) be chosen such that y(u) ∈ V (y ∗ ) for every u ∈ V (u∗ ). This
is possible by (H2). Since Y = W (0, T ) is embedded continuously into C([0, T ]; H ), the
continuity assumptions for  and h imply the continuity requirements J in (C1). Note that
e (y, u) : Y × U → W is given by

e (y, u)(δy, δu) = ((δy)t − A0 δy − F (y)δy − Bδu, δy(0)),

and global Lipschitz continuity of (y, u) → e (y, u) from V (y ∗ ) × V (u∗ ) ⊂ Y × U to


L(Y × U, W ) follows from the last condition in (H1). Solvability of e(y, u) = 0 with
respect to y for given u follows from (H2) and hence (C1) holds. Since, by (H1), the
bounded bilinear form a(t; ·, ·) : V × V → R defined by

t → a(t; φ, ψ) = − A0 φ, ψ
V ∗ ,V − F (y ∗ (t))φ, ψ
V ∗ ,V

satisfies c
a(t; φ, φ) ≥ |φ|2V − ε|φ|2V − |φ|2V |y ∗ (t)|2V (1 + |y ∗ |2C(0,T ;H ) ),
ε
there exists a constant c̄ > 0 such that
1 2
a(t; φ, φ) ≥ |φ| − c̄|y ∗ (t)|2V |φ|2H for t ∈ (0, T ),
2 V
where |y ∗ |2V ∈ L1 (0, T ). Consequently, the adjoint equation
-
− dtd p ∗ (t) = A0 p ∗ (t) + F (y ∗ (t))∗ p ∗ (t) +  (y ∗ (t)),
(5.4.5)
p∗ (T ) = 0

admits a unique solution p∗ ∈ W (0, T ), with


 T T
 T
∗ ∗ C 0 |y ∗ (s)|2V ds
|p (t)|H +
2
|p (s)|V ds ≤ e
2
| (y ∗ (s))|2V ∗ ds (5.4.6)
t t

for a constant C independent of t ∈ [0, T ]. In fact,


1 d ∗ 2
− |p (t)|H + |p ∗ (t)|2V ≤ | F (y ∗ (t))p ∗ (t), p∗ (t)
V ∗ ,V | + | (y ∗ (t))|V ∗ |p ∗ (t)|V .
2 dt

i i

i i
ItoKunisc
i i
2008/6/12
page 146
i i

146 Chapter 5. Newton and SQP Methods

Hence by (H1) there exists a constant C such that

d ∗ 2
|p (t)|H − C|y ∗ (t)|2V |p ∗ (t)|2H + |p ∗ (t)|2V ≤ | (y ∗ (t))|2V ∗ .

dt
T
Multiplying by exp(− t ρ̄(s)ds), where ρ̄(s) = C|y ∗ (s)|2V , we find
 T  s  T  s
∗ ∗ ∗
|p (t)|2H + |p (s)|2V exp ρ̄(τ )dτ ds ≤ | (y (s))|2V ∗ exp ρ̄(τ )dτ ds
t t t t

and (5.4.6) follows. This implies (C2) with λ∗ = (p ∗ , p∗ (0)). For any y ∈ W (0, T ) we
have  
d d
|y(t)|2H = 2 y(t), y(t) for t ∈ (0, T ). (5.4.7)
dt dt V ∗ ,V

Let u ∈ V (u∗ ) and denote by y = y(u) ∈ V (y ∗ ) the solution to the dynamical system in
(5.4.1). From (5.4.7) we have

1 d
|y(t) − y ∗ (t)|2H + |y(t) − y ∗ (t)|2V
2 dt
= F (y(t)) − F (y ∗ (t)), y(t) − y ∗ (t)
V ∗ ,V + Bu(t) − Bu∗ (t), y(t) − y ∗ (t)
V ∗ ,V .

By (H2) the set {y(u) : u ∈ V (u∗ )} is bounded in W (0, T ). Hence by (H1) there exists a
constant C independent of u ∈ V (u∗ ) and t ∈ [0, T ] such that

d
|y(t) − y ∗ (t)|2H + |y(t) − y ∗ (t)|2V ≤ C(ρ(t)|y(t) − y ∗ (t)|2H + |u(t) − u∗ (t)|2Ũ ),
dt

where ρ(t) = |y(t)|2V + |y ∗ (t)|2V . By Gronwall’s inequality this implies that


 t   T  t
∗ ∗
|y(t) − y (t)|2H + |y(t) − y (t)|2V ds ≤ C exp C ρ(τ )dτ |u(s) − u∗ (s)|2 ds,
0 0 0

T
where 0 ρ(τ )dτ is bounded independent of u ∈ V (u∗ ). Utilizing the equations satisfied
by y(u) and y(u∗ ) it follows that

|y(u) − y(u∗ )|W (0,T ) ≤ Ĉ|u − u∗ |L2 (0,T ;Ũ ) (5.4.8)

for a constant Ĉ independent of u ∈ V (u∗ ). Hence (C3) follows and Theorem 5.1 implies
the optimality system

d ∗
y (t) = A0 y ∗ (t) + F (y ∗ (t)) + Bu∗ (t), y(0) = y0 ,
dt
d
− p ∗ (t) = A0 (y ∗ (t)) + F (y ∗ (t))∗ p ∗ (t) +  (y ∗ (t)), p ∗ (T ) = 0,
dt
B ∗ p ∗ (t) + h (u∗ (t)) = 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 147
i i

5.4. Optimal control of the Navier–Stokes equations 147

5.4.2 Sufficient optimality condition


We assume J to be of the form
 
1 T β T
J (y, u) = (Q(y(t) − ȳ(t)), y(t) − ȳ(t))H dt + |u(t)|2Ũ dt,
2 0 2 0

where Q is symmetric and positive semidefinite on H , ȳ ∈ L2 (0, T ; H ), and β > 0. We


shall verify the requirements in (II) of Section 5.1. By (5.4.8) it follows that

E1 (y(u) − y ∗ , u − u∗ ) ≥ α(|y(u) − y ∗ |2W (0,T ) + |u − u∗ |2L2 (0,T ;Ũ ) )


(5.4.9)
for all u ∈ V (u∗ ).

Moreover,

|e(y(u), u) − e(y ∗ , u∗ ) − e (y ∗ , u∗ )(y(u) − y ∗ , u − u∗ )|L2 (0,T ;V ∗ )×H


= |F (y(u)) − F (y ∗ ) − F (y ∗ )(y(u) − y ∗ )|L2 (0,T ,V ∗ )
 T  12
∗ ∗ ∗
= |F (y) − F (y ) − F (y )(y(u) − y )|2V ∗
0
 T  1 2  12
≤c s|y(t) − y ∗ (t)|H |y(t) − y ∗ (t)|V ds dt
0 0
c
≤ √ |y − y ∗ |C(0,T ;H ) |y − y ∗ |L2 (0,T ;V ) ,
2
and therefore
|e(y(u), u) − e(y ∗ , u∗ ) − e (y ∗ , u∗ )(y(u) − y ∗ )|L2 (0,T ;V ∗ )×H
(5.4.10)
≤ C̄|y(u) − y ∗ |2W (0,T )

for a constant C̄ independent of u ∈ V (u∗ ). From (5.4.6)


 
C ∗
|p∗ |L2 (0,T ;V ) ≤ exp |y |L2 (0,T ;V ) |Q(y ∗ − ȳ)|L2 (0,T ;V ∗ ) . (5.4.11)
2
Combining (5.4.9)–(5.4.11) implies that if |Q(y ∗ − ȳ)|L2 (O,T ,V ∗ ) is sufficiently small, then
(y ∗ , u∗ ) is a strict local minimum.

5.4.3 Newton’s method for (5.4.1)


Here we specify the steps necessary to carry out one Newton iteration for the optimal control
problem related to the Navier–Stokes equation. Let u ∈ L2 (0, T ; Ũ ) and let y = y(u) ∈
W (0, T ) denote the associated velocity field. For δu ∈ L2 (0, T ; Ũ ) let δy, λ, and μ in
W (0, T ) denote the solutions to the sensitivity equation
d
δy = A0 δy + F (y)δy + Bδu, δy(0) = 0, (5.4.12)
dt

i i

i i
ItoKunisc
i i
2008/6/12
page 148
i i

148 Chapter 5. Newton and SQP Methods

the adjoint equation


d
− λ = A0 λ + F (y)∗ λ +  (y), λ(T ) = 0, (5.4.13)
dt
and the adjoint equation arising in the Hessian
d
− μ = A0 μ + F (y)∗ μ + (F (y)∗ λ)δy +  (y)δy, μ(T ) = 0. (5.4.14)
dt
Then the operators characterizing the Newton step
Jˆ (u) δu = −Jˆ (u) (5.4.15)
can be expressed by
Jˆ (u) = h (u) + B ∗ λ
and
Jˆ (u)δu = h (u)δu + B ∗ μ.
Let us also note that
− F (y)δy, ψ
V ∗ ,V = b(y, δy, ψ) + b(δy, y, ψ),
− F (y)∗ λ, ψ
= b(y, ψ, λ) + b(ψ, y, λ),
− (F (y)∗ λ) δy, ψ
= b(δy, ψ, λ) + b(ψ, δy, λ)

for all ψ ∈ V , where b(u, v, w) =  (u · ∇)v w dx. The evaluation of the gradient Jˆ (u)
requires one forward in time solve for y(u) and one adjoint solve for λ. Each evaluation of
the Hessian necessitates one forward solve of the sensitivity equation for δy and one solve
of the second adjoint equation (5.4.14) for μ.
If the approximation as in Remark 5.2.1 is used, then this results in not including the
term (F (y)∗ λ)δy in the second adjoint equation and each evaluation requires us to solve
the linear time-varying Hamiltonian system
d
y = A(t) δy + B δu, δy(0) = 0,
dt
d
− μ = A(t)∗ μ +  (y)δy, μ(T ) = 0,
dt
where A(t) = A0 + F (y(t)).

5.5 Newton method for the weakly singular case


In this section we discuss the Newton method for the weakly singular case as introduced
in Section 1.5, and we utilize the notation given there. In particular J : Y × U → R, e :
Y1 × U → W , with ey (y, u) considered as an operator in Y . Assuming that (y, u, λ) ∈
Y1 × U × dom(ey (y, u)∗ ), the SQP step for (δy, δu, δλ) is given by
⎛ ⎞⎛ ⎞ ⎛ ⎞
L (y, u, λ) e (y, u)∗ δy ey (y, u)∗ λ + Jy (y, u)
⎝ ⎠ ⎝δu⎠ = − ⎝eu (y, u)∗ λ + Ju (y, u)⎠ .

e (y, u) 0 δλ e(y, u)

i i

i i
ItoKunisc
i i
2008/6/12
page 149
i i

5.5. Newton method for the weakly singular case 149

The updates (y +δy, u+δu, λ+δλ) will not necessarily remain in Y1 ×U ×dom(ey (y, u)∗ )
since it is not assumed that e is surjective from Y1 × U to W . However, the feasibility steps
consisting in solving the primal equation
e(y, u+ ) = 0 (5.5.1)
+
for y = y and the adjoint equation
ey (y + , u+ )∗ λ + Jy (y + , u+ ) = 0 (5.5.2)
for the dual variable λ = λ+ will guarantee that (y + , λ+ ) ∈ Y1 ×Z1 holds. Here u+ = u+δu
and Z1 ⊂ dom (ey (y + , u+ )∗ ) denotes a Banach space densely embedded into W ∗ , with
λ∗ ∈ Z1 . Since Y1 and Z1 are contained in Y and W ∗ , the feasibility steps (5.5.1)–(5.5.2)
can also be considered as smoothing steps. Thus we obtain the Newton method for the
singular case.

Algorithm
• Initialization: Choose u0 ∈ V (u∗ ), solve
e(y, u0 ) = 0, ey (y0 , u0 )∗ λ + Jy (y0 , u0 ) = 0 for (y0 , λ0 ) ∈ Y1 × Z1 ,
and set k = 0.
• Newton step: Solve for (δy, δu, δλ) ∈ Y × U × W ∗
⎛ ⎞⎛ ⎞ ⎛ ⎞
L (yk , uk , λk ) e (yk , uk )∗ δy 0
⎝ ⎠ ⎝δu⎠ = − ⎝eu (yk , uk )∗ λk + Ju (yk , uk )⎠ .

e (yk , uk ) 0 δλ 0

• Newton update: uk+1 = uk + δu.


• Feasibility step: Solve for (yk+1 , λk+1 ) ∈ Y1 × Z1 :
e(y, uk+1 ) = 0, ey (yk+1 , uk+1 )∗ λ + Jy (yk+1 , uk+1 ) = 0.

• Stop, or set k = k + 1, and goto the Newton step.

Remark 5.5.1. The algorithm is essentially the Newton method of Section 5.2. Because of
the feasible step, the first and the last components on the right-hand side of the SQP step are
zero, and Newton’s method and the SQP method coincide. Let us point out that the SQP
iteration may not be well defined without the feasibility step since the updates (y + , λ+ ) may
only be in Y × W ∗ , while e and ey (y + , u+ )∗ are not necessarily well defined on Y × U and
W ∗.

We next specify the assumptions which justify the above derivation and under which
well-posedness and convergence of the algorithm can be proved. Thus let (y ∗ , u∗ , λ∗ ) ∈
Y1 × U × Z1 be a solution to (5.1.4), or equivalently to (1.5.3), and let
V (y ∗ ) × V (u∗ ) × V (λ∗ ) ⊂ Y1 × U × Z1
be a convex bounded neighborhood of the solution triple (y ∗ , u∗ , λ∗ ).

i i

i i
ItoKunisc
i i
2008/6/12
page 150
i i

150 Chapter 5. Newton and SQP Methods

(H5) (a) For every u ∈ V (u∗ ) there exists a unique solution y = y(u) ∈ V (y ∗ ) of
e(y, u) = 0. Moreover, there exists M > 0 such that |y(u) − y ∗ |Y1 ≤ M|u − u∗ |U .

(b) For every (y, u) ∈ V (y ∗ ) × V (u∗ ) there exists a unique solution λ = λ(y, u) ∈
V (λ∗ ) of ey (y, u)∗ λ + Jy (y, u) = 0 and |λ(y, u) − λ∗ |Z1 ≤ M|(y, u) − (y ∗ , u∗ )|Y1 ×U .

(H6) J is twice continuously Fréchet differentiable on Y × U with the second derivative


locally Lipschitz continuous.

(H7) The operator e : V (y ∗ ) × V (u∗ ) ⊂ Y1 × U → W is Fréchet differentiable with


Lipschitz continuous Fréchet derivative e (y, u) ∈ L(Y1 × U, W ). Moreover, for
each (y, u) ∈ V (y ∗ ) × V (u∗ ) the operator e (y, u) with domain in Y × U has closed
range.

(H8) For every λ ∈ V (λ∗ ) the mapping (y, u) → λ, e(y, u)


W ∗ ,W from V (y ∗ ) × V (u∗ ) to
R is twice Fréchet differentiable and the mapping (y, u, λ) → λ, e (y, u)(·, ·)
W ∗ ,W
from V (y ∗ ) × V (u∗ ) × V (λ∗ ) ⊂ Y1 × U × Z1 → L(Y1 × U, Y ∗ × U ∗ ) is Lips-
chitz continuous. Moreover, for each (y, u, λ) ∈ Y1 × U × Z1 , the bilinear form
λ, e (y, u)(·, ·)
W ∗ ,W can be extended as a continuous bilinear form on (Y × U )2 .

(H9) For every (y, u) ∈ V (y ∗ ) × V (u∗ ) ⊂ Y1 × U the operator e (y, u) can be extended
as continuous linear operator from Y × U to W , and the mapping (y, u) → e (y, u)
from V (y ∗ ) × V (u∗ ) ⊂ Y1 × U → L(Y × U, W ) is continuous and (y, u, λ) →
(λ, e (y, u)(·, ·)) from V (y ∗ )×V (u∗ )×V (λ∗ ) ⊂ Y1 ×U ×Z1 → L(Y ×U, Y ∗ ×U ∗ )
is continuous.

(H10) There exists κ > 0 such that

L (y ∗ , u∗ , λ∗ )v, v
Y ∗ ×U ∗ ,Y ×U ≥ κ |v|2Y ×U

for all v ∈ ker e (y ∗ , u∗ ) ⊂ Y × U .

(H11) e (y ∗ , u∗ ) ∈ L(Y × U, W ) is surjective.

Condition (H5) requires well-posedness of the primal and the adjoint equation in
Y1 , respectively, Z1 . The adjoint equations arise from linearization of e at elements of
Y1 ×U . Condition (H6) requires smoothness of J . In (H7) and (H8) the necessary regularity
requirements for e as mapping on Y1 × U and in Y × U are specified. From (H5) it follows
that the initialization as well as the feasibility step are well defined provided that uk ∈ V (u∗ ).
As a consequence the derivatives of J and e that are required for defining the Newton step
are taken at elements (yk , uk , λk ) ∈ Y1 × U × Z1 .
For x = (y, u, λ) ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ ) let A(x) denote the operator
 
L (y, u, λ) e (y, u)∗
A(x) = .
e (y, u) 0

Conditions (H6) and (H9) guarantee that the operator A(x) ∈ L(Y ×U ×W ∗ , Y ∗ ×U ∗ ×W )
for x ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ ) . Conditions (H6), (H9)–(H11) imply that

i i

i i
ItoKunisc
i i
2008/6/12
page 151
i i

5.5. Newton method for the weakly singular case 151

(H12) there exist a neighborhood V (x ∗ ) ⊂ V (y ∗ ) × V (u∗ ) × V (λ∗ ) and M1 , such that for
every x = (y, u, λ) ∈ V (x ∗ ) and δw ∈ Y ∗ × U ∗ × W

A(x)δx = δw

admits a unique solution δx ∈ Y × U × W ∗ satisfying

|δx|Y ×U ×W ∗ ≤ M1 |δw|Y ∗ ×U ∗ ×W .

Theorem 5.3. If (H5)–(H8) and (H12) hold at a solution (y ∗ , u∗ , λ∗ ) ∈ Y1 × U × Z1 of


(5.1.4) and |u0 − u∗ |U is sufficiently small, then the iterates of the algorithm are well defined
and they satisfy

|(yk+1 , uk+1 , λk+1 ) − (y ∗ , u∗ , λ∗ )|Y1 ×U ×Z1 ≤ K |uk − u∗ |2U (5.5.3)

for a constant K independent of k.

Proof. Well-posedness of the algorithm in a neighborhood of (y ∗ , u∗ , λ∗ ) follows from


(H5) and (H12). To prove convergence let us denote by x ∗ the triple (y ∗ , u∗ , λ∗ ), and
similarly δx = (δy, δu, δλ) and xk = (yk , uk , λk ). Without loss of generality we assume
that min(M, M1 ) ≥ 1. The Newton step of the algorithm can be expressed as

A(xk )δx = −F (xk ),

with F : Y1 × U × Z1 → Y ∗ × U ∗ × W defined by

F (y, u, λ) = −(ey (y, u)∗ λ + Jy (y, u), eu (y, u)∗ λ + Ju (y, u), e(y, u)).

Due to the smoothing step the first and third coordinates of F are 0 at xk . By (H5) there
exists M > 0 such that

|yk − y ∗ |Y1 ≤ M |uk − u∗ |U if uk ∈ V (u∗ ) (5.5.4)

and

|λk − λ∗ |Z1 ≤ M |(yk , uk ) − (y ∗ , u∗ )|Y1 ×U if (yk , uk ) ∈ V (y ∗ ) × V (u∗ ), (5.5.5)

where (yk , λk ) are determined by the feasibility step. Let us assume that xk ∈ V (x ∗ ). Then
it follows from (H6)–(H8) that, possibly after increasing M, it can be chosen such that for
every xk ∈ V (y ∗ ) × V (u∗ ) × V (λ∗ )

|F (x ∗ ) − F (xk ) − F (xk )(x ∗ − xk )|Y ∗ ×U ∗ ×W


 1
= |(F (xk + s(x ∗ − xk )) − F (xk ))(x ∗ − xk )|Y ∗ ×U ∗ ×W ds
0

 1
M ∗
= |(A(xk + s(x ∗ − xk )) − A(xk ))(x ∗ − xk )|Y ∗ ×U ∗ ×W ds ≤ |x − xk |2Y1 ×U ×Z1 .
0 2

i i

i i
ItoKunisc
i i
2008/6/12
page 152
i i

152 Chapter 5. Newton and SQP Methods

Moreover,
M
|A(xk )(xk + δx − x ∗ )| = |F (x ∗ ) − F (xk ) − F (xk )(x ∗ − xk )| ≤ |xk − x ∗ |2Y1 ×U ×Z1 .
2
Consequently, by (H12)
MM1
|uk+1 − u∗ |U ≤ |xk+1 − x ∗ |Y ×U ×W ∗ ≤ |xk − x ∗ |2Y1 ×U ×Z1 , (5.5.6)
2
provided that xk ∈ V (x ∗ ). The proof will be completed by an induction argument with
respect to k. Let r be such that 2M 5 M1 r < 1 and that |x − x ∗ | < 2M 2 r implies x ∈√V (x ∗ ).
Assume that |u0 − u∗ | ≤ r. Then |y0 − y ∗ |Y1 ≤ Mr by (5.5.4) and |λ0 − λ∗ |Z1 ≤ 2M 2 r
by (5.5.5). It follows that
|x0 − x ∗ |Y1 ×U ×Z1 ≤ 2M 2 |u0 − u∗ |2U ≤ 2M 2 r
and hence x0 ∈ V (x ∗ ). Let |xk − x ∗ |Y1 ×U ×Z1 ≤ 2M 2 r. Then from (5.5.6)
|uk+1 − u∗ |U = 2M 5 M1 r 2 ≤ r. (5.5.7)
Consequently (5.5.4)–(5.5.6) are applicable and imply
|xk+1 − x ∗ |Y1 ×U ×Z1 ≤ 4M 2 |uk+1 − u∗ |2U ≤ M 3 M1 |xk − x ∗ |2Y1 ×U ×Z1 . (5.5.8)
It follows that
|xk+1 − x ∗ |Y1 ×U ×Z1 ≤ 4M 7 M1 |uk − u∗ |2U ,
which implies (5.5.3) with K = 4M 7 M1 . From (5.5.7)–(5.5.8) finally |xk+1 −x ∗ |Y1 ×U ×Z1 ≤
2M 2 r.

Let us return now to some of the examples of Chapter 1, Section 1.5, and discuss the
applicability of conditions (H5)–(H11).
Example 1.19 revisited. Condition (H5)(a) is a direct consequence of Lemma 1.14. Con-
dition (H5)(b) corresponds to
(∇λ, ∇φ) + (ey λ, φ) + (y − z, φ) = 0 for all φ ∈ Y, (5.5.9)
given y ∈ Y1 . Let Z1 = Y1 = Y ∩ L∞ (). It follows from [Tr, Chapter 2.3] and the proof
of Lemma 1.14 that there exists a unique solution λ = λ(y) ∈ Z1 to (5.5.9). Moreover, if
y ∈ V (y ∗ ) and w = λ(y) − λ(y ∗ ), then w ∈ Z1 satisfies
∗ ∗
(∇w, ∇φ) + (ey w, φ) + ((ey − ey )λ, φ) + (y − y ∗ , φ) = 0 for all φ ∈ Y.
From the proof of Lemma 4.1 it follows that there exists M > 0 such that
|λ(y) − λ(y ∗ )|Z1 ≤ M |y − y ∗ |Y1 for all y ∈ V (y ∗ )
and thus (H5)(b) is satisfied. It is simple to argue the validity of (H6)–(H9). Note that
e (y ∗ , u∗ ) is surjective from Y × U to W and thus (H11) is satisfied. As for (H10), this
condition is equivalent to the requirement that

|δy|2L2 () + (λ∗ ey , (δy)2 ) + β |δu|2U ≥ κ(|δy|2Y + |δu|2U )

i i

i i
ItoKunisc
i i
2008/6/12
page 153
i i

5.5. Newton method for the weakly singular case 153

for all (δy, δu) ∈ Y × U satisfying



(∇δy, ∇φ) + (ey δy, φ) − (δu, φ) = 0 for all φ ∈ Y, (5.5.10)

where λ∗ is the solution to (5.3.11) with y = y ∗ . Then there exists k̄ > 0 such that

|δy|2Y ≤ k̄|δu|2

for all (δy, δu) satisfying (5.5.10). It follows that (H10) holds if |(λ∗ )− |L∞ is sufficiently
small. This is the case, for example, in the case of small residue problems, in the sense that
|y ∗ − z|2L2 () is small enough. If z ≥ y ∗ , then the weak maximum principle [Tr] is applied
to (5.5.9) and gives λ∗ ≥ 0.
With quite analogous arguments it can be shown that (H5)–(H11) also hold for Ex-
ample 1.20 of Chapter 1.
Example 1.22 revisited. The constraint and adjoint equations are given by

(∇y, ∇φ) − (u y, ∇φ) = (f, φ) for all φ ∈ H01 ()

and

(∇λ, ∇φ) + (u λ, ∇φ) + (y − z, φ) = 0 for all φ ∈ Y = H01 (), (5.5.11)

where div u = 0, f ∈ L2 (), and z ∈ L2 (). As already discussed in Examples 1.15


and 1.22 they admit unique solutions in Y1 = H01 () ∩ L∞ () for u ∈ U = L2n (). Let
w1 = y(u) − y(u∗ ) and w2 = λ(y, u) − λ(y ∗ , u∗ ). Then

(∇w1 , ∇φ) − (u w1 , ∇φ) = ((u − u∗ )y ∗ , ∇φ)

and

(∇w2 , ∇φ) + (u w2 , ∇φ) = −((u − u∗ ) λ∗ , ∇φ) − (y − y ∗ , φ) for all φ ∈ Y.

From (1.4.22) it follows that (H5) holds if y ∗ and λ∗ are in W 1,∞ . Conditions (H6)–
(H8) are easily verified. Here we address only the closed range property of e (y, u), with
(y, u) ∈ Y1 × U . For u ∈ Cn∞ () with div u = 0 surjectivity follows from the Lax–
Milgram lemma, and a density argument asserts surjectivity for every u ∈ U . We turn to
(H12) and assume that u ∈ Cn∞ (), div u = 0 first. Then e (y, u) ∈ L(V × U, H −1 ())
and e (y, u)(δy, δu) = 0 can be expressed as

(∇δy, ∇φ) − (δu y, ∇φ) − (u δy, ∇φ) = 0 for all φ ∈ Y.

Hence

|δy|Y ≤ |y|L∞ |δu|U (5.5.12)

for all (δy, δu) ∈ ker e (y, u). Henceforth we assume that

β − 2|y ∗ |L∞ |λ∗ |L∞ > 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 154
i i

154 Chapter 5. Newton and SQP Methods

Then there exists a neighborhood V (y ∗ ) × V (λ∗ ) of (y ∗ , λ∗ ) and κ > 0 such that

β − 2|y|L∞ () |λ|L∞ ≥ κ (5.5.13)

for all (y, λ) ∈ V (y ∗ ) × V (λ∗ ). For (δy, δu) ∈ Y × U and (y, λ) ∈ V (y ∗ ) × V (λ∗ ) we
have by (5.5.12) and (5.5.13)

L (y, u, λ)((δy, δu), (δy, δu)) = |δy|2L2 () + β|δu|2U − 2(δu∇δy, λ)


≥ β|δu|2 − 2|λ|L∞ |∇δy|Y |δu|U ≥ κ|δu|2U .

This estimate, together with (5.5.12), implies that L (y, u, λ) is coercive on ker e (y, u)
and hence A(x)δx = δw admits a unique solution in Y × U × Y for every δw. To estimate
δx in terms of δw, we note that the last equation in the system A(x)δx = δw implies that

|δy|Y ≤ |y|L∞ |δu|U + |w3 |H −1 . (5.5.14)

Similarly, we find from the first equation in the system that

|δλ|2Y ≤ |δy||δλ| + |λ|L∞ |δλ|Y |δu|U + |w1 |H −1 |δλ|Y

and consequently
|δλ|Y ≤ c|δy| + |λ|L∞ () |δu|U + |w1 |H −1 , (5.5.15)
where c is the embedding constant of H01 () 2
into L (). Moreover, using (5.5.14) we find

L (y, u, λ)((δy, δu), (δy, δu)) ≥ |δy|2L2 + β|δu|2U − 2|λ|L∞ |δy|Y |δu|U
≥ |δy|2L2 + β|δu|2U − 2|λ|L∞ |y|L∞ () |δu|2U − 2|λ|L∞ |w3 |H −1 |δu|U (5.5.16)
≥ |δy|2 + κ|δu|2U − 2|λ|L∞ |w3 |H −1 |δu|U .

From (5.5.15), (5.5.16) and utilizing A(x)δx = δw we obtain

|δy|2 + κ|δu|2 ≤ (2|λ|L∞ |δu|U + |δλ|Y )|w3 |H −1 + |w1 |H −1 |δy|Y + |w2 |U |δu|U . (5.5.17)

Inequalities (5.5.14), (5.5.15), and (5.5.17) imply the existence of a constant M1 such that

|δx|Y ×U ×Y ≤ M1 |δw|H −1 ×U ×H −1 (5.5.18)

for all (y, λ) ∈ V (y ∗ ) × V (λ∗ ) and every b ∈ Cn∞ (), with div b = 0. A density argument
with respect to u implies that A(x)δx = δw admits a solution for all x ∈ V (y ∗ )×U ×V (λ∗ )
and that (5.5.18) holds for all such x.

i i

i i
ItoKunisc
i i
2008/6/12
page 155
i i

Chapter 6

Augmented
Lagrangian-SQP
Methods

6.1 Generalities
This chapter is devoted to second order augmented Lagrangian methods for optimization
problems with equality constraints of the type
-
min f (x) over x ∈ X
(6.1.1)
subject to e(x) = 0,

and to problems with equality constraints as well as additional constraints


-
min f (x) over x ∈ C
(6.1.2)
subject to e(x) = 0,

where f : X → R, e : X → W , with X and W real Hilbert spaces and C a closed convex


set in X. We shall show that for equality constraints it is possible to replace the first order
Lagrangian update that was developed in Chapter 4 by a second order one. It will become
evident that second order augmented Lagrangian methods are closely related to SQP meth-
ods. For equality-constrained problems the SQP method also coincides with the Newton
method applied to the first order optimality conditions. Just like the Newton method, the
SQP method and second order augmented Lagrangian methods are convergent with second
order convergence rate if the initialization is sufficiently close to a local solution of (6.1.1)
and if appropriate additional regularity conditions are satisfied. In case the initialization is
not in the region of attraction, globalization techniques such as line searches or trust region
techniques may be necessary. We shall not focus on such methods within this monograph.
However, the penalty term, which, together with the Lagrangian term, characterizes the
augmented Lagrangian method, also has a globalization effect. This will become evident
from the analytical as well as the numerical results of this chapter. Let us stress that for the
concepts that are analyzed in this chapter we do not advocate choosing c large.
As in Chapter 3, throughout this chapter we identify the dual of the space W with itself,
and we consider e (x)∗ as an operator from W to X. In Section 6.2 we present the second
order augmented Lagrangian method for (6.1.1). Problems with additional constraints as

155

i i

i i
ItoKunisc
i i
2008/6/12
page 156
i i

156 Chapter 6. Augmented Lagrangian-SQP Methods

in (6.1.2) are considered in Section 6.3. Applications to optimal control problems will be
given in Section 6.4. Section 6.5 is devoted to short discussions of miscellaneous topics
including reduced SQP methods and mesh independence. In Section 6.6 we give a short
description of related literature.

6.2 Equality-constrained problems


In this section we consider -
min f (x) over x ∈ X
(6.2.1)
subject to e(x) = 0,
f : X → R, e : X → W , with X and W real Hilbert spaces. Let x ∗ be a local solution of
(6.2.1). As before derivatives as well as partial derivatives with respect to the variable x
will be denoted by primes. We shall not distinguish by notation between the functional f
in the dual X ∗ of X and its Riesz representation in X. As in Chapter 4 we shall identify the
topological duals of W and Z with themselves.
It is assumed throughout that


⎨f and e are twice continuously Fréchet differentiable
with Lipschitz continuous second derivatives (6.2.2)


in a convex neighborhood of V (x ∗ ) of x ∗

and
e (x ∗ ) is surjective. (6.2.3)
The Lagrangian functional associated with (6.2.1) is denoted by L : X × W → R and it is
given by
 
L(x, λ) = f (x) + λ, e(x) W .
With (6.2.3) holding there exists a Lagrange multiplier λ∗ ∈ W such that the following first
order necessary optimality condition is satisfied:
L (x ∗ , λ∗ ) = 0, e(x ∗ ) = 0. (6.2.4)
We shall also make use of the following second order sufficient optimality condition:
-
there exists κ > 0 such that
(6.2.5)
L (x ∗ , λ∗ )(h, h) ≥ κ |h|2X for all h ∈ ker e (x ∗ ).

Here L (x ∗ , λ∗ ) denotes the bilinear form characterizing the second Fréchet derivative of
L with respect to x at (x ∗ , λ∗ ). For any c > 0 the augmented Lagrangian functional
Lc : X × W → R is defined by
  c
Lc (x, λ) = f (x) + λ, e(x) W + |e(x)|2W .
2
We note that the necessary optimality condition implies
L c (x ∗ , λ∗ ) = 0 e(x ∗ ) = 0 for all c ≥ 0. (6.2.6)

i i

i i
ItoKunisc
i i
2008/6/12
page 157
i i

6.2. Equality-constrained problems 157

Lemma 6.1. Let (6.2.3) and (6.2.5) hold. Then there exists a neighborhood V (x ∗ , λ∗ ) of
(x ∗ , λ∗ ), c̄ > 0 and σ̄ > 0 such that

L c (x, λ)(h, h) ≥ σ̄ |h|2X for all h ∈ X, (x, λ) ∈ V (x ∗ , λ∗ ), and c ≥ c̄.

Proof. Corollary 3.2 and conditions (6.2.3) and (6.2.5) imply the existence of σ̄ > 0 and
c̄ > 0 such that

L c (x ∗ , λ∗ )(h, h) ≥ 2σ̄ |h|2X for all h ∈ X and c ≥ c̄.

Due to continuity of (x, λ) → L c (x, λ) the conclusion of the lemma follows.

Lemma 6.1 implies in particular that x → Lc (x, λ∗ ) can be bounded from below by
a quadratic function. This fact is referred to as augmentability of (6.2.1) at (x ∗ , λ∗ ).

Lemma 6.2. Let (6.2.3) and (6.2.5) hold. Then there exist σ̄ > 0, c̄ > 0, and a
neighborhood Ṽ (x ∗ ) of x ∗ such that
2
Lc (x, λ∗ ) ≥ Lc (x ∗ , λ∗ ) + σ̄ x − x ∗ X for all x ∈ Ṽ (x ∗ ) and c ≥ c̄. (6.2.7)

Proof. Due to Taylor’s theorem, Lemma 6.1, and (6.2.6) we find for x ∈ V (x ∗ )

1
Lc (x, λ∗ ) = Lc (x ∗ , λ∗ ) + L c (x ∗ , λ∗ )(x − x ∗ , x − x ∗ ) + o(|x − x ∗ |2X )
2

σ̄ 2 2
≥ Lc (x ∗ , λ∗ ) + x − x ∗ X + o( x − x ∗ X ).
2
The claim follows from this estimate.

Without loss of generality we may assume that the neighborhoods V (x ∗ ) and Ṽ (x ∗ )


of (6.2.2) and Lemma 6.2 coincide and that V (x ∗ , λ∗ ) of Lemma 6.1 equals V (x ∗ ) × V (λ∗ ).
Due to (6.2.2) we can further assume that e (x) is surjective for all x ∈ V (x ∗ ).
To iteratively determine (x ∗ , λ∗ ) one can apply Newton’s method to (6.2.6). Given a
current iterate (x, λ) the next iterate (x̂, λ̂) is the solution to
    
L c (x, λ) e (x)∗ x̂ − x L c (x, λ)
=− . (6.2.8)
e (x) 0 λ̂ − λ e(x)

Alternatively, (6.2.8) can be used to define the λ-update only, whereas the x-update is
calculated by a different technique. We shall demonstrate next that (6.2.8) can be solved
for λ̂ without recourse to x̂.
For (x, λ) ∈ V (x ∗ , λ∗ ) and c ≥ c̄ we define B(x, λ) ∈ L(W ) by

B(x, λ) = e (x)L c (x, λ)−1 e (x)∗ . (6.2.9)

i i

i i
ItoKunisc
i i
2008/6/12
page 158
i i

158 Chapter 6. Augmented Lagrangian-SQP Methods

Here L(W ) denotes the space of bounded linear operators from W into itself. Note that
B(x, λ) is invertible. In fact, there exists a constant k > 0 such that
   
B(x, y)y, y W = L c (x, λ)−1 e (x)∗ y, e (x)∗ y X
2
≥ k e (x)∗ y X

for all y ∈ W . Since e (x)∗ is injective and has closed range, there exists k̂ such that
∗ 2
e (x) y ≥ k̂ |y|2 for all y ∈ W,
X W

and by the Lax–Milgram theorem continuous invertibility of B(x, λ) follows, provided that
(x, λ) ∈ V (x ∗ , λ∗ ) and c ≥ c̄. Premultiplying the first equation in (6.2.8) by e (x)L c (x, λ)−1
we obtain
- 
λ̂= λ + B(x, λ)−1 e(x) − e (x)L c (x, λ)−1 L c (x, λ) ,
(6.2.10)
x̂= x − L c (x, λ)−1 L c (x, λ̂).

Whenever the dependence of (x̂, λ̂) on (x, λ, c) is important, (x̂(x, λ, c), λ̂(x, λ, c)) will
be written in place of (x̂, λ̂). If, for fixed λ, x = x(λ) is chosen as a local solution to

min Lc (x, λ) subject to x ∈ X, (6.2.11)

then L c (x(λ), λ) = 0 and

λ̂(x(λ), λ, c) = λ + B(x(λ), λ)−1 e(x(λ)). (6.2.12)

We point out that (6.2.12) can be interpreted as a second order update to the Lagrange
variable. To acknowledge this, let dc denote the dual functional associated with Lc , i.e.,

dc (λ) = min Lc (x, λ) subject to {x : x − x ∗ ≤ }

for some  > 0. Then the first and second derivatives of dc with respect to λ satisfy

∇λ dc (λ) = e(x(λ))

and

∇λ2 dc (λ) = −B(x(λ), λ),

and (6.2.12) presents a Newton step for maximizing dc .


Returning to (6.2.8) we note that its blockwise solution given in (6.2.10) requires
setting up and inverting L c (x, λ). Following an argument due to Bertsekas [Be] we next
argue that L c (x, λ) can be avoided during the iteration. One requires only that L 0 (x, λ) =
L (x, λ). In fact, we find

L c (x, λ) = L 0 (x, λ + ce(x))

and
 
L c (x, λ) = L 0 (x, λ + ce(x)) + c e (x)(·), e (x)(·) W .

i i

i i
ItoKunisc
i i
2008/6/12
page 159
i i

6.2. Equality-constrained problems 159

Consequently (6.2.8) can be expressed as


  
L0 (x, λ + ce(x)) + ce (x)∗ e (x) e (x)∗ x̂ − x
e (x) 0 λ̂ − λ
 
L0 (x, λ + ce(x))
=− . (6.2.13)
e(x)

Using the second equation e (x)(x̂ −x) = −e(x) in the first equation of (6.2.13) we arrive at

L 0 (x, λ + ce(x))(x̂ − x) − ce (x)∗ e(x) + e (x ∗ )(λ̂ − λ) = −L 0 (x, λ + ce(x)),

and hence (6.2.8) is equivalent to


  
L0 (x, λ + ce(x)) e (x)∗ x̂ − x
e (x) 0 λ̂ − (λ + ce(x))
 
L0 (x, λ + ce(x))
=− . (6.2.14)
e(x)

Solving (6.2.8) is thus equivalent to


(i) carrying out the first order multiplier iteration

λ̃ = λ + ce(x),

(ii) solving
    
L 0 (x, λ̃) e (x)∗ x̂ − x L 0 (x, λ̃)
=−
e (x) 0 λ̂ − λ̃ e(x)

for (x̂, λ̂).


It will be convenient to introduce the matrix of operators
 
L0 (x, λ) e (x)∗
M(x, λ) = .
e (x) 0

With (6.2.3) and (6.2.5) holding there exist a constant κ > 0 and a neighborhood U (x ∗ , λ∗ ) ⊂
V (x ∗ , λ∗ ) of (x ∗ , λ∗ ) such that

 M −1 (x, λ) L(X×W ) ≤ κ for all (x, λ) ∈ U (x ∗ , λ∗ ). (6.2.15)

Lemma 6.3. Assume that (6.2.3) and (6.2.5) hold. Then there exists a constant K > 0
such that for any (x, λ) ∈ U (x ∗ , λ∗ ) the solution (x̂, λ̂) of
   
x̂ − x L (x, λ)
M(x, λ) =−
λ̂ − λ e(x)

satisfies
2
(x̂, λ̂) − (x ∗ , λ∗ ) ≤ K (x, λ) − (x ∗ , λ∗ ) X×W . (6.2.16)
X×W

i i

i i
ItoKunisc
i i
2008/6/12
page 160
i i

160 Chapter 6. Augmented Lagrangian-SQP Methods

Proof. Note that


   
x̂ − x L (x, λ) − L (x ∗ , λ∗ )
M(x, λ) =−
λ̂ − λ e(x) − e(x ∗ )

and consequently
     
x̂ − x ∗ L (x, λ) − L (x ∗ , λ∗ ) x − x∗
M(x, λ) = − + M(x, λ) .
λ̂ − λ∗ e(x) − e(x ∗ ) λ − λ∗
This equality further implies
   15 6x − x ∗ 
x̂ − x ∗ ∗ ∗
M(x, λ) = M(x, λ) − M(tx + (1 − t)x , tλ + (1 − t)λ ) dt.
λ̂ − λ∗ 0 λ − λ∗
The regularity properties of f and e imply that (x, λ) → M(x, λ) is Lipschitz continuous
on U (x ∗ , λ∗ ) for a Lipschitz constant γ > 0. Thus we obtain
 ∗ 
γ 2
M(x, λ) x̂ − x ≤ (x, λ) − (x ∗ , λ∗ ) X×W ,

λ̂ − λ X×W 2

and by (6.2.15)
γ κ 2

(x̂, λ̂) − (x ∗ , λ∗ ) ≤ (x, λ) − (x ∗ , λ∗ ) X×W ,
X×W 2
which implies the claim.

We now describe three algorithms and analyze their convergence. They are all based
on (6.2.14) and differ only in the choice of x. Recall that if x is a solution to (6.2.11),
then (6.2.12) and hence (6.2.14) provide a second order update to the Lagrange multiplier.
Solving (6.2.11) implies extra computational cost. In the results which follow we show that
as a consequence a larger region of attraction with respect to the initial condition and an
improved rate of convergence factor is obtained, compared to methods which solve (6.2.11)
only approximately or skip it all together. As a first choice x in (6.2.14) is determined by
solving

min Lc (x, λn ) subject to x ∈ V (x ∗ ).

The second choice is to take x only as an appropriate suboptimal solution to this optimization
problem, and the third choice is to simply choose x as the solution x̂ of the previous iteration
of (6.2.14).

Algorithm 6.1.
(i) Choose λ0 ∈ W, c ∈ (c̄, ∞) and set σ = c − c̄, n = 0.
(ii) Determine x̃ as a solution of

min Lc (x, λn ) subject to x ∈ V (x ∗ ). (Paux )

i i

i i
ItoKunisc
i i
2008/6/12
page 161
i i

6.2. Equality-constrained problems 161

(iii) Set λ̃ = λn + σ e(x̃).

(iv) Solve for (x̂, λ̂):


   
x̂ − x̃ L0 (x̃, λ̃)
M(x̃, λ̃) =− .
λ̂ − λ̃ e(x̃)

(v) Set λn+1 = λ̂, n = n + 1, and goto (ii).

The existence of a solution to (Paux ) is guaranteed if, for example, the following
conditions on f and e hold:


⎨f : X → R is weakly lower semicontinuous,
e : X → W maps weakly convergent sequences (6.2.17)


to weakly convergent sequences.

Under the conditions of Theorem 6.4 below it follows that the solutions x̃ of (Paux )
satisfy x̃ ∈ V (x ∗ ).

Theorem 6.4. If (6.2.3), (6.2.5), and (6.2.17) hold and 1


c−c̄
|λ0 − λ∗ |2 is sufficiently small,
then Algorithm 6.1 is well defined and

K̂ 2
(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ λn − λ∗ W (6.2.18)
X×W c − c̄

for a constant K̂ independent of c and n = 0, 1, . . . .

Proof. Let η̂ be the largest radius for a ball centered at (x ∗ , λ∗ ) and contained in U (x ∗ , λ∗ ),
and let γ be a Lipschitz constant for f and e on U (x ∗ , λ∗ ). Further let E = e (x ∗ ) and
note that (EE ∗ )−1 E ∈ L(X) as a consequence of (6.2.3). We define

M̄ = 1 + 2(c̄γ )2 + 8γ 2 (1 + μ)2  (EE ∗ )−1 E 2 ,


 
where μ = 1 + √c̄γ
2σ σ̄
|λ0 − λ∗ |W + |λ∗ |W , and we put
 7 
2σ σ̄ 2σ σ̄
η = min η̂ , , (6.2.19)
M̄ K M̄

where K is given in Lemma 6.3. Let us assume that



λ0 − λ∗ < η.
W

The proof will be given by induction on n. The case n = 0 follows from the general
arguments given below. For the induction step we assume that

λi − λ∗ ≤ λi−1 − λ∗ for i = 1, . . . , n. (6.2.20)
W W

i i

i i
ItoKunisc
i i
2008/6/12
page 162
i i

162 Chapter 6. Augmented Lagrangian-SQP Methods

Using (ii) of Algorithm 6.1 we have


   
f (x ∗ ) ≥ Lc (x̃, λn ) = f (x̃) + λ∗ , e(x̃) W + λn − λ∗ , e(x̃) W
1 1
+ c̄ |e(x̃)|2W + (c − c̄) |e(x̃)|2W
2 2
  σ
≥ Lc̄ (x̃, λ∗ ) + λn − λ∗ , e(x̃) W + |e(x̃)|2W
2
1  
= Lc̄ (x̃, λ∗ ) + λn + σ e(x̃) − λ∗ 2 − λn − λ∗ 2 .
2σ W W

In view of Lemma 6.2 and the fact that x̃ is chosen in V (x ∗ )(⊂ Ṽ (x ∗ )) we find
2 1 2
1 2
σ̄ x̃ − x ∗ X + λ̃ − λ∗ ≤ λn − λ∗ W . (6.2.21)
2σ W 2σ
3
This implies in particular that |x̃ − x ∗ | ≤ 2σM̄σ̄ |λ0 − λ∗ | < η̂ and hence x̃ ∈ V (x ∗ ). The
necessary optimality conditions for (6.2.1) and (Paux ) with x̃ ∈ V (x ∗ ) are given by

f (x ∗ ) + E ∗ λ∗ = 0

and
f (x̃) + e (x̃)∗ (λn + ce(x̃)) = 0.
Subtracting these two equations and defining λ̄ = λn + ce(x̃) give

E ∗ (λ∗ − λ̄) = f (x̃) − f (x ∗ ) + (e (x̃)∗ − e (x ∗ )∗ )(λ̄)

and consequently

λ∗ − λ̄ = (EE ∗ )−1 E f (x̃) − f (x ∗ ) + (e (x̃)∗ − e (x ∗ )∗ )(λ̄) .

We obtain
|λ∗ − λ̄|W ≤ 2γ (1 + |λ̄|W )(EE ∗ )−1 E x̃ − x ∗ X . (6.2.22)
To estimate λ̄ observe that due to (6.2.20) and (6.2.21)

λ̄ ≤ λ̃ − λ̄ + λ̃ = c̄ e(x̃) − e(x ∗ ) + λ∗ + λ̃ − λ∗
W W W
W W
  W
c̄γ
≤ c̄γ x̃ − x ∗ W + λ∗ W + λn − λ∗ W ≤ 1 + √ λ n − λ ∗
W
2σ σ̄

+ λ W ≤ μ. (6.2.23)

Using (6.2.20)–(6.2.23) we find

|(x̃, λ̃) − (x ∗ , λ∗ )|2X×W ≤ |x̃ − x ∗ |2X + 2|λ̃ − λ̄|2W + 2|λ̄ − λ∗ |2W


= |x̃ − x ∗ |2X + 2(c̄γ )2 |x̃ − x ∗ |2X + 8γ 2 (1 + μ)2 (EE ∗ )−1 E2 |x̃ − x ∗ |2X
M̄ 2 M̄ 2
= λn − λ∗ X < η ≤ η̂2 .
2σ σ̄ 2σ σ̄

i i

i i
ItoKunisc
i i
2008/6/12
page 163
i i

6.2. Equality-constrained problems 163

This implies that Lemma 6.3 is applicable with (x, λ) = (x̃, λ̃). We find

K M̄ 2
(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ λn − λ∗ X , (6.2.24)
X×W 2σ σ̄

and (6.2.18) is proved with K̂ = K2σ̄M̄ . From (6.2.20), (6.2.24), and the definition of η we
also obtain

(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ λn − λ∗ < η.
X×W W

This implies (6.2.20) for i = n + 1 and the proof is finished.

Algorithm 6.2. This coincides with Algorithm 6.1 except for (i) and (ii), which are re-
placed by
(i) Choose λ0 ∈ W, c ∈ (c̄, ∞), and σ ∈ (0, c − c̄] and set n = 0.
(ii) Determine x̃ ∈ V (x ∗ ) such that

Lc (x̃, λn ) ≤ Lc (x ∗ , λn ) = f (x ∗ ).

Theorem 6.5. Let (6.2.3) and (6.2.5) hold. If |λ0 − λ∗ |W is sufficiently small, then
Algorithm 6.2 is well defined and
 
1 2
(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ K 1 + λn − λ∗ W (6.2.25)
X×W σ σ̄

for all n = 0, 1, . . . . Here xn+1 stands for x̂ of step (iv) of Algorithm 6.1 and K, independent
of c, is given in (6.2.16).

Proof. Let η̂ be defined as in the proof of Theorem 6.4. We define


 
η̂ 1
η = min √ , , (6.2.26)
a Ka

where a = 1 + 1
σ σ̄
. The proof is based on an induction argument. If

λ0 − λ∗ < η,

then the first iterate of Algorithm 6.2 is well defined,



(x1 , λ1 ) − (x ∗ , λ∗ ) <η
X×W

and (6.2.25) holds with n = 0. This will follow from the general arguments given below.
Assuming that |(xn , λn ) − (x ∗ , λ∗ )|X×W < η, we show that Algorithm 6.2 is well defined
for n + 1, that

(xn+1 , λn+1 ) − (x ∗ , λ∗ ) < η,
X×W

i i

i i
ItoKunisc
i i
2008/6/12
page 164
i i

164 Chapter 6. Augmented Lagrangian-SQP Methods

and that (6.2.25) holds. As in the proof of Theorem 6.4 one argues that (6.2.21) holds and
consequently
2  
∗ 1

(x̃, λ̃) − (x , λ ) ≤ 1+ λn − λ∗ 2 . (6.2.27)
X×W 2σ σ̄ W

This implies that |(x̃, λ̃∗ ) − (x ∗ , λ∗ )|X×W < η̂ and hence Lemma 6.3 is applicable with
(x, λ) = (x̃, λ̃) and (iv) of Algorithm 6.2 is well defined. Combining (6.2.16) with (6.2.27)
we find
 
1
(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤K 1+ λn − λ∗ 2 ,
X×W 2σ σ̄ W

which is (6.2.25). By the definition of η we further obtain



(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ λn − λ∗ W < η.
X×W

Remark 6.2.1. In the proof of Theorem 6.4 as well as in that of Theorem 6.5 we utilize
(6.2.7) from Lemma 6.2. Conditions (6.2.3) and (6.2.5) are sufficient conditions for (6.2.7)
to hold for c ≥ c̄ > 0. If (6.2.7) can be shown to hold for all c ≥ 0, then c̄ can be chosen
equal to 0 and σ = c is admissible in (i ) of Algorithm 6.2.

In the third algorithm we delete the second step of Algorithms 6.1 and 6.2 and directly
iterate (6.2.14).

Algorithm 6.3.
(i) Choose (x0 , λ0 ) ∈ X × W, c ≥ 0 and put n = 0.
(ii) Set λ̃ = λn + ce(xn ).
(iii) Solve for (x̂, λ̂):
   
x̂ − xn L 0 (xn , λ̃)
M(xn , λ̃) =− .
λ̂ − λ̃ e(xn )

(iv) Set (xn+1 , λn+1 ) = (x̂, λ̂), n = n + 1, and goto (ii).

Theorem 6.6. Let (6.2.3) and (6.2.5) hold. If max(1, c) |(x0 , λ0 ) − (x ∗ , λ∗ )|X×W is
sufficiently small, then Algorithm 6.3 is well defined and
2
(xn+1 , λn+1 ) − (x ∗ , λ∗ ) ≤ K̃ (xn , λn ) − (x ∗ , λ∗ ) X×W
X×W

for a constant K̃ independent of n = 0, 1, 2, . . . , but depending on c.

Proof. Let η̂ and γ be defined as in the proof of Theorem 6.4. We introduce


 
η̂ 1
η = min √ , , (6.2.28)
a Ka

i i

i i
ItoKunisc
i i
2008/6/12
page 165
i i

6.3. Partial elimination of constraints 165

 
where a = max 2, 1 + 2c2 γ 2 and K is defined in Lemma 6.3.
Let us assume that

(x0 , λ0 ) − (x ∗ , λ∗ ) < η.
X×W

Again we proceed by induction and the case n = 0 follows from the general arguments
given below. Let us assume that |(xn , λn ) − (x ∗ , λ∗ )|X×W < η. Then
2 2 2 2

(xn , λ̃) − (x ∗ , λ∗ ) ≤ xn − x ∗ X + 2c2 e(xn ) − e(x ∗ ) W + 2 λn − λ∗ W
X×W
2 2 2
≤ xn − x ∗ X + 2c2 γ 2 xn − x ∗ X + 2 λn − λ∗ W
2
≤ a (xn , λn ) − (x ∗ , λ∗ ) X×W < η̂2 ,

and thus Lemma 6.3 is applicable with (x, λ) = (xn , λ̃).

Remark 6.2.2. (i) If c is set equal to 0 in Algorithm 6.3, then we obtain the well-known
SQP algorithm for the equality-constrained problem (6.2.1). It is well known to have a
second order convergence rate which also follows from Theorem 6.6 since K̃ is finite for
c = 0.
(ii) Theorem 6.6 suggests that in case (Paux ) is completely skipped in the second
order augmented Lagrangian update the penalty parameter may have a negative effect on
the region of attraction and on the convergence rate estimate. Our numerical experience,
without additional globalization techniques, indicates that moderate values of c do not
impede the behavior of the algorithm when compared to c = 0, which results in the SQP
algorithm. Choosing c > 0 may actually enlarge the region of attraction when compared to
c = 0. For parameter estimation problems c > 0 is useful, because in this way Algorithm
6.3 becomes a hybrid algorithm combining the output least squares and the equation error
formulations [IK9].

6.3 Partial elimination of constraints


Here we consider


⎨ min f (x) subject to
(6.3.1)


e(x) = 0, g(x) ≤ 0, and (x) ∈ K,

where f : X → R, e : X → W, g : X → Rm ,  : X → Z, where X, W , and Z are


real Hilbert spaces, K is a closed convex cone in Z, and  is an affine constraint. In the
remainder of this chapter we identify the dual of Z with itself. Let x ∗ be a local solution of
(6.3.1) and assume that

⎨ f, e, and g are twice continuously Fréchet differentiable
with second derivatives Lipschitz continuous (6.3.2)

in a neighborhood of x ∗ .

i i

i i
ItoKunisc
i i
2008/6/12
page 166
i i

166 Chapter 6. Augmented Lagrangian-SQP Methods

The objective of this section is to approximate (6.3.1) by a sequence of problems


with quadratic cost and affine constraints. For this purpose we utilize the Lagrangian
L : X × W × Rm × Z → R associated with (6.3.1) given by
     
L(x, λ, μ, η) = f (x) + λ, e(x) W + μ, g(x) Rm + η, (x) Z .
We shall require the following assumption (cf. Chapter 3):
x ∗ is a regular point, i.e., (6.3.3)
⎧⎛ ∗ ⎞ ⎛ ⎞ ⎛ ⎞⎫
⎨ e (x ) 0 0 ⎬
0 ∈ int ⎝ g (x ∗ ) ⎠ X + ⎝ Rm +
⎠ + ⎝ g(x ∗ ) ⎠ ,
⎩ ⎭
L −K (x ∗ )
where the interior is taken in W × Rm × Z. With (6.3.3) holding there exists a Lagrange
multiplier (λ∗ , μ∗ , η∗ ) ∈ W × Rm,+ × K + such that
⎧ ∗ ∗ ∗ ∗

⎪ L (x , λ , μ , η ),

−e(x ∗ ),
0∈

⎪ −g(x ∗ ) + ∂ ψ Rm,+ (μ∗ ),

−(x ∗ ) + ∂ ψK + (η∗ ).
As in Section 3.1, we assume that the coordinates of the inequalities g(x) ≤ 0 are arranged
so that μ∗ = (μ∗,+ , μ∗,0 , μ∗,− ) and g = (g + , g 0 , g − ) with g + : X → Rm1 , g 0 : Rm2 , g − :
X → Rm3 , m = m1 + m2 + m3 , and
g + (x ∗ ) = 0, μ∗,+ > 0,
g 0 (x ∗ ) = 0, μ∗,0 = 0,
g − (x ∗ ) < 0, μ∗,− = 0.

We further define G+ = g + (x ∗ ) , G0 = g 0 (x ∗ ) ,
 
E
E+ = : X → W × R m1 ,
G+
and for z ∈ Z we define the operator E(z) : X × R → (W × Rm1 ) × Rm2 × Z by
⎛ ⎞
E+ 0
E(z) = ⎝ G0 0 ⎠ .
L z
The following additional assumptions are required:

there exists κ > 0 such that L (x ∗ , λ∗ , μ∗ ) (x, x) ≥ κ|x|2X for all x ∈ ker(E+ ) (6.3.4)
and
E((x ∗ )) is surjective. (6.3.5)
The SQP method for (6.2.1) with elimination of the equality and finite rank inequality
constraints is given next. We set
   
L(x, λ, μ) = f (x) + λ, e(x) W + μ, g(x) Rm .

i i

i i
ItoKunisc
i i
2008/6/12
page 167
i i

6.3. Partial elimination of constraints 167

Algorithm 6.4.
(i) Choose (x0 , λ0 , μ0 ) ∈ X × W × Rm and set n = 0.
(ii) Solve for (xn+1 , λn+1 , μn+1 ):


⎪ min 12 L (xn , λn , μn )(x − xn , x − xn ) + f (xn )(x − xn ),



e(xn ) + e (xn )(x − xn ) = 0, (Paux )





g(xn ) + g (xn )(x − xn ) ≤ 0, (xn ) + L(x − xn ) ∈ K,

where (λn+1 , μn+1 ) are the Lagrange multipliers associated to the equality and in-
equality constraints.
(iii) Set n = n + 1 and goto (ii).
Since (6.3.3)–(6.3.5) imply (H1), (H )2), (H3) of Chapter 2, there exist by Corol-
lary 2.18 neighborhoods Û (x , λ , μ ) and U (x ∗ ) such that the auxiliary problem (Paux )
∗ ∗ ∗

of Algorithm 6.4 admits a unique local solution xn+1 in U (x ∗ ) provided that (xn , λn , μn ) ∈
Û (x ∗ , λ∗ , μ∗ ) = Û (x ∗ )×Û (λ∗ , μ∗ ). To obtain a convergence rate estimate for xn , Lagrange
multipliers to the constraints in (Paux ) are introduced. Since the regular point condition is
stable with respect to perturbations in x ∗ , we can assume that Û (x ∗ ) is chosen sufficiently
small such that for xn ∈ Û (x ∗ ) there exist (λn+1 , μn+1 , ηn+1 ) ∈ W × Rm × Z such that
⎛ ⎞
0
⎜ 0 ⎟
0 ∈ G(xn , λn , μn )(xn+1 , λn+1 , μn+1 , ηn+1 ) + ⎜ ⎟
⎝ ∂ ψRm,+ (μn+1 ) ⎠ , (6.3.6)
∂K + (ηn+1 )
where
G(xn , λn , μn )(x, λ, μ, η)
⎛ ⎞
L (xn , λn , μn )(x − xn ) + f (xn ) + e (xn )∗ λ + g (xn )∗ μ + L∗ η
⎜ −e(xn ) − e (xn )(x − xn ) ⎟
=⎜ ⎝ −g(xn ) − g (xn )(x − xn )
⎟.

−(xn ) − L(x − xn )

Theorem 6.7. Assume that (6.3.2) – (6.3.5) are satisfied at (x ∗ , λ∗ , μ∗ ) and that |(x0 , λ0 , μ0 )−
(x ∗ , λ∗ , μ∗ )| is sufficiently small. Then there exists K̄ > 0 such that
|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )| ≤ K̄|(xn , λn , μn ) − (x ∗ , λ∗ , μ∗ )|2
for all n = 1, 2, . . . .

Proof. The optimality system for (6.3.1) can be expressed by


⎛ ⎞
0
⎜ 0 ⎟
0 ∈ G(x , λ , μ )(x , λ , μ , η ) + ⎜
∗ ∗ ∗ ∗ ∗ ∗ ∗
⎝ ∂ ψRm,+ (μ∗ )
⎟,

∂ ψK + (η∗ )

i i

i i
ItoKunisc
i i
2008/6/12
page 168
i i

168 Chapter 6. Augmented Lagrangian-SQP Methods

or equivalently
⎛ ⎞ ⎛ ⎞
an 0
⎜bn ⎟ ⎜ 0 ⎟
0∈⎜ ⎟ + G(xn , λn , μn )(x ∗ , λ∗ , μ∗ , η∗ ) + ⎜
⎝ cn ⎠ ⎝
⎟, (6.3.7)
∂ ψRm,+ (μ ) ⎠

dn ∂k+ (η∗ )

where (an , bn , cn , dn ) = (a(xn , λn , μn ), b(xn ), c(xn ), d(xn )) and

a(x, λ, μ) = f (x ∗ ) − f (x) + (e (x ∗ )∗ − e (x)∗ )λ∗



+ (g (x ∗ )∗ − g (x)∗ )μ∗ − L (x, λ, μ)(x ∗ − x),
b(x) = e(x) + e (x)(x ∗ − x) − e(x ∗ ),
c(x) = g(x) + g (x)(x ∗ − x) − g(x ∗ ),
d(x) = 0.

Without loss of generality we may assume that the first and second derivatives of f, e, and
g are Lipschitz continuous in Û (x ∗ ). It follows that there exists L̃ such that

(|a(x, λ)|2 + |b(x)|2 + |c(x)|2 )1/2 ≤ L̃(|x − x ∗ |2 + |λ − λ∗ |2 + |μ − μ∗ |2 )


(6.3.8)
for all (x, λ, μ) ∈ Û (x ∗ ) × Û (λ∗ , μ∗ ).

Let K̃ be determined from Corollary 2.18 and let B((x ∗ , λ∗ , μ∗ ), r) denote a ball in Û (x ∗ )×
Û (λ∗ , μ∗ ) with center (x ∗ , λ∗ , μ∗ ) and radius r, where r K̃ L̃ < 1.
Proceeding by induction, assume that (xn , λn , μn ) ∈ B((x ∗ , λ∗ , μ∗ ), r). Then from
Corollary 2.18, with (x̄1 , λ̄1 , μ̄1 ) = (x̄2 , λ̄2 , μ̄2 ) = (xn , λn , μn ), and (6.3.6) – (6.3.8) we
find

|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )|


≤ K̃|(an , bn , cn )|X×W ×Rm ≤ K̃ L̃(|xn − x ∗ |2 + |λn − λ∗ |2 + |μn − μ∗ |2 )
≤ (K̃ L̃ r)r < r.

This estimate implies that (xn+1 , λn+1 , μn+1 ) ∈ B((x ∗ , λ∗ , μ∗ ), r), as well as the desired
local quadratic convergence of Algorithm 6.4.
 
Remark 6.3.1. Let L(x, λ) = f (x) + λ, e(x) W and consider Algorithm 6.4 with

L (xn , λn , μn ) replaced by L (xn , λn ); i.e., only the equality constraints are eliminated.

If (6.3.4) is satisfied with L (x ∗ , λ∗ , μ∗ ) replaced by L (x ∗ , λ∗ ), then Theorem 6.7 holds
for the resulting algorithm with the triple (x, λ, μ) replaced by (x, λ).

Next we consider

min f (x) subject to
(6.3.9)
e(x) = 0, x ∈ C,

i i

i i
ItoKunisc
i i
2008/6/12
page 169
i i

6.3. Partial elimination of constraints 169

where f and e are as in (6.3.1) and C is a closed convex set in X. Let x ∗ denote a local
solution of (6.3.9) and assume that

f and e are twice continuously Fréchet differentiable,
(6.3.10)
f and e are Lipschitz continuous in a neighborhood of x ∗ ,
0 ∈ int e (x ∗ )(C − x ∗ ), (6.3.11)

and

there exists a constant κ > 0 such that L (x ∗ , λ∗ ) (x, x) ≥ κ|x|2X
(6.3.12)
for all x ∈ ker e (x ∗ ).

To solve (6.3.9) we consider the SQP algorithm with elimination of the equality constraint.

Algorithm 6.5.

(i) Choose (x0 , λ0 ) ∈ X × W ∗ and set n = 0.


(ii) Solve for (xn+1 , λn+1 ):

1

⎨min 2 L (xn , λn )(x − xn , x − xn ) + f (xn )(x − xn ),
(Paux )


e(xn ) + e (xn )(x − xn ) = 0, x ∈ C,

where λn+1 is the Lagrange multiplier associated to the equality constraint.


(iii) Set n = n + 1 and goto (ii).

The optimality condition for (Paux ) in Algorithm 6.5 is given by


 
∂ψC (xn+1 )
0 ∈ G(xn , λn )(xn+1 , λn+1 ) + , (6.3.13)
0

where


L (xn , λn )(x − xn ) + f (xn )
G(xn , λn )(x, λ) = .
−e(xn ) − e (xn )(x − xn )

Due to (6.3.11) and (6.3.12) there exists a neighborhood U (x ∗ , λ∗ ) of (x ∗ , λ∗ ) such that


(6.3.13) admits a solution (xn+1 , λn+1 ) and xn+1 is the solution of (Paux ) of Algorithm 6.5,
provided that (xn , λn ) ∈ U (x ∗ , λ∗ ). We also require the following condition:


⎪ There exist neighborhoods Û (x ∗ ) × Û (λ∗ ) of (x ∗ , λ∗ ) and



⎪ V of the origin in X × W and a constant K̃ such that



⎨ for all q1 , q2 ∈ V and (x̄, λ̄) ∈ Û (x ∗ ) × Û (λ∗ ) there exists
a unique solution (6.3.14)



⎪ (x, λ) = (x( x̄, λ̄, q), λ( x̄, λ̄, q ), λ(x̄, λ̄, q )) ∈ Û (x ∗
) × Û (λ ∗
) of


1 1

⎪ 0 ∈ q + G( x̄, λ̄)(x, λ) + ∂ψ (x) and


1 C
|(x(x̄, λ̄, q1 ), λ(x̄, λ̄, q1 )) − (x(x̄, λ̄, q2 ), λ(x̄, λ̄, q2 ))| ≤ K̃|q1 − q2 |.

i i

i i
ItoKunisc
i i
2008/6/12
page 170
i i

170 Chapter 6. Augmented Lagrangian-SQP Methods

Remark 6.3.2. From the discussion in the first part of this section it follows that (6.3.14)
holds for problem (6.3.1) with C = {x : g(x) ≤ 0, (x) ∈ K}, provided that g is convex.
Condition (6.3.14) was analyzed for optimal control problems in several papers; see, for
instance, [AlMa, Tro].

Theorem 6.8. Suppose that (6.3.10)–(6.3.13) hold at a local solution x ∗ of (6.3.9). If


|(x0 , λ0 ) − (x ∗ , λ∗ )| is sufficiently small, then the iterates (xn , λn ) converge quadratically
to (x ∗ , λ∗ ).

Proof. The optimality system for (6.3.9) can be expressed as


 
∂ψC (x ∗ )
0 ∈ G(x ∗ , λ∗ )(x ∗ , λ∗ ) + ,
0

or equivalently
   
an ∂ψC (x ∗ )
0∈ + G(xn , λn )(x ∗ , λ∗ ) + , (6.3.15)
bn 0

where

an = f (x ∗ ) − f (xn ) + (e (xn )∗ − e (x ∗ )∗ )λ∗ − L (xn , λn )(x ∗ − xn ),
bn = e(xn ) + e (xn )(x ∗ − xn ) − e(x ∗ ).

In view of (6.3.10) we may assume that Û (x ∗ ) is chosen sufficiently small such that f and

e are Lipschitz continuous on Û (x ∗ ). Consequently there exists L̃ such that

|(an , bn )| ≤ L̃(|xn − x ∗ |2 + |λn − λ∗ |2 ), provided that xn ∈ Û (x ∗ ). (6.3.16)

Let B((x ∗ , λ∗ ), r) denote a ball with radius r < (K̃ L̃)−1 and center (x ∗ , λ∗ ) contained
in U (x ∗ , λ∗ ) and in Û (x ∗ ) × Û (λ∗ ), and assume that (xn , λn ) ∈ B((x ∗ , λ∗ ), r). Then by
(6.3.13)–(6.3.16) we have

|(xn+1 , λn+1 ) − (x ∗ , λ∗ )| ≤ K̃ L̃ |(xn , λn ) − (x ∗ , λ∗ )|2 < |(xn , λn ) − (x ∗ , λ∗ )| < r.

The proof now follows by an induction argument.

In Section 6.2 the combination of the second order method (6.2.8) with first order
augmented Lagrangian updates was analyzed for equality-constrained problems. This ap-
proach can also be taken for problems with inequality constraints. We present the analogue
of Theorem 6.4. Let
    c c
Lc (x, λ, μ) = f (x) + λ, e(x) W + μ, ĝ(x, μ, c) Rm + |e(x)|2W + |ĝ(x, μ, c)|2Rm ,
2 2
where ĝ(x, μ, c) = max(g(x), − μc ), as defined in Chapter 3. Below c̃ ≥ c̄ denote the
constants and Bδ the closed ball of radius δ around x ∗ that were utilized in Corollary 3.7.

i i

i i
ItoKunisc
i i
2008/6/12
page 171
i i

6.3. Partial elimination of constraints 171

Algorithm 6.6.

(i) Choose (λ0 , μ0 ) ∈ W × Rm , c ∈ [c̃, ∞) and set n = 0.

(ii) Determine x̃ as the solution to

min Lc (x, λn , μn ) subject to x ∈ Bδ , (x) ∈ K. 1


(Paux )

(iii) Update λ̃ = λn + (c − c̄)e(x̃), μ̃ = μn + (c − c̄)ĝ(x̃, μn , c).

(iv) Solve (Paux ) of Algorithm 6.4 with (xn , λn , μn ) = (x̃, λ̃, μ̃) for (xn+1 , λn+1 , ηn+1 ).

(v) Set n = n + 1 and goto (ii).

Theorem 6.9. Assume that (3.4.7), (3.4.9) of Chapter 3 and (6.3.2)–(6.3.4) of this chapter
hold at (x ∗ , λ∗ , μ∗ ). If c−1 c̄ (|λ0 − λ∗ |2 + |μ0 − μ∗ |2 ) is sufficiently small, then Algorithm
6.6 is well defined and

|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )|X×W ×Rm ×Z



≤ (|λn − λ∗ |2W + |μn − μ∗ |2Rm )
c − c̄

for a constant K̂ independent of n = 0, 1, 2, . . . .

Proof. The assumptions guarantee that Corollary 3.7, Proposition 3.9, and Theorem 3.10 of
Chapter 3 and Theorem 6.7 of the present chapter are applicable. The proof is now similar to
that of Theorem 6.4, with Theorem 6.7 replacing Lemma 6.3 to estimate the step of (Paux ).
Let η̂ be the radius of the largest ball U centered at (x ∗ , λ∗ , μ∗ ) such that Theorem 6.7 is
applicable and that the x-coordinates of elements in U are also in Bδ . Let
 
√ κ 2 (c − c̄) 2(c − c̄)
η = min η̂ κ c − c̄, , where κ 2 = ,
K̄ 1 + 2K 2

with K defined in Theorem 3.10 and K̄ in Theorem 6.7,

|(λ0 , μ0 ) − (λ∗ , μ∗ )|W ×Rm < η.

We proceed by induction with respect to n. The case n = 0 is simple and we assume that

|(λi , μi ) − (λ∗ , μ∗ )| ≤ |(λi−1 , μi−1 ) − (λ∗ , μ∗ )| for i = 1, . . . , n. (6.3.17)

From Theorems 3.8 and 3.10 we have


1
|(x̃ − x ∗ , λ̃ − λ∗ , μ̃ − μ∗ )| ≤ √ | (λn , μn ) − (λ∗ , μ∗ )|
κ c − c̄
(6.3.18)
η
< √ ≤ η̂.
κ c − c̄

i i

i i
ItoKunisc
i i
2008/6/12
page 172
i i

172 Chapter 6. Augmented Lagrangian-SQP Methods

This estimate implies that x̃ ∈ int Bδ and that Theorem 6.7 with (xn , λn , μn ) replaced by
(x̃, λ̃, μ̃) is applicable. It implies

|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )| ≤ K̄|(x̃, λ̃, μ̃) − (x ∗ , λ∗ , μ∗ )|2 , (6.3.19)

and combined with (6.3.18)


|(xn+1 , λn+1 , μn+1 , ηn+1 ) − (x ∗ , λ∗ , μ∗ , η∗ )| ≤ |(λn , μn ) − (λ∗ , μ∗ )|2
κ 2 (c − c̄)
≤ |(λn , μn ) − (λ∗ , μ∗ )|2 .

This implies (6.3.17) with i = n + 1 as well as the claim of the theorem.

6.4 Applications
6.4.1 An introductory example
Consider the optimal control problem
⎧ 

⎪ min 12 Q |y − z|2 dxdt + β2 |u|2U ,

yt = y + g(y) + Bu in Q,
(6.4.1)

⎪ = 0 on ,
y

y(0, ·) = ϕ on ,

where Q = (0, T ) ×  and  = (0, T ) × ∂. Define e : X = Y × U → W by

e(y, u) = yt − y − g(y) − Bu.

Here B ∈ L(U, Y ), where U is the Hilbert space of controls, and the choice for Y can be

Y = {y ∈ L2 (H01 ()) : yt ∈ L2 (H −1 ())}

or

Y = L2 (H01 () ∩ H 2 ()) ∩ W 1,2 (L2 ()),

for example. Here L2 (H01 ()) is an abbreviation for L2 (0, T ; H01 ()). The former choice
corresponds to variational formulations, the latter to strong formulations of the partial dif-
ferential equation and both require regularity assumptions for g. Matching choices for
W are

W = L2 (H −1 ()) and W = L2 (L2 ()).

We proceed with Y = {y ∈ L2 (H01 ()) : yt ∈ L2 (H −1 ())}. The Lagrangian is given by

L(y, u, λ) = J (y, u) + λ, e(y, u)


L2 (H −1 )
 T
= J (y, u) + λ, (− )−1 (yt − y − g(y) − Bu)
H −1 ,H01 dt,
0

i i

i i
ItoKunisc
i i
2008/6/12
page 173
i i

6.4. Applications 173

where denotes the Laplacian with Dirichlet boundary conditions. The augmented La-
grangian SQP step as described below (6.2.14) is given by

λ̃ = λ + c(yt − y − g(y) − Bu)

and
⎛ ⎞⎛ ⎞
I + g (y) −1 λ̃ 0 ( ∂t∂ + + g (y)) −1 δy
⎝ 0 βI B ∗ −1 ⎠ ⎝ δu ⎠
−1 ∂ −1
(− ) ( ∂t − − g (y)) B 0 δλ
⎛ ⎞
y − z − (( ∂t∂ + + g (y))(− )−1 λ̃
= −⎝ βu + B ∗ −1 λ̃ ⎠,
−1
(− ) (yt − y − g(y) − Bu)

with δy(0, ·) = δλ(T , ·) = 0. Inspection of the previous system (e.g., for the sake of
symmetrization) suggests introducing the new variable  = − −1 λ.
This results in an update for  given by

˜ =  + c(− )−1 (yt − y − g(y) − Bu),


 (6.4.2)

and the transformed system has the form


⎛ ⎞⎛ ⎞
˜
I − g (y) 0 −( ∂t∂ + + g (y)) δy
⎝ 0 βI −B ∗ ⎠ ⎝ δu ⎠
( ∂t∂ − − g (y)) −B 0 δ
⎛ ⎞
y − z − ( ∂t∂ + + g (y))˜
= −⎝ βu − B ∗ ˜ ⎠. (6.4.3)
yt − y − g(y) − Bu

Let us also point out that if we had immediately discretized (6.4.1), then the differences
between topologies tend to get lost and (− )−1 in (6.4.2) may have been forgotten. On the
discretized level the effect of (− )−1 can be restored by preconditioning.
Let us now return to the system (6.4.3). Due to its size it will—after discretization—
not be solved by a direct method but rather by iteration based on conjugate gradients.
The question of choice for preconditioners arises. The following two choices for block
preconditioners were successful in our tests [KaK]:
⎛ ⎞−1 ⎛ ⎞
0 0 P∗ 0 0 P −1
⎝ 0 βI 0 ⎠ =⎝ 0 β −1 I 0 ⎠ (6.4.4)
P 0 0 P −∗ 0 0

and
⎛ ⎞−1 ⎛ ⎞
I 0 P∗ 0 0 P −1
⎝ 0 βI 0 ⎠ =⎝ 0 β −1 I 0 ⎠, (6.4.5)
P 0 0 P −∗ 0 −∗ −1
P P ,

i i

i i
ItoKunisc
i i
2008/6/12
page 174
i i

174 Chapter 6. Augmented Lagrangian-SQP Methods

where P = ∂t − . Note that (6.4.4) requires 2, whereas (6.4.5) needs 4, parabolic


solves per iteration. Numerically, it appears that (6.4.4) is preferable to (6.4.5). For further
investigations on preconditioning of SQP-like systems we refer the reader to [BaSa]. This
may still be extended by the results of Battermann and Sachs.

6.4.2 A class of nonlinear elliptic optimal control problems


The general framework of the previous section will be applied to optimal control problems
governed by partial differential equations of the type

ˆ
⎨ − y + g(y) = f
⎪ in ,
∂y
= u on 1 , (6.4.6)


∂n
∂y
∂n
= uf on 2 ,

where fˆ ∈ L2 () and uf ∈ L2 ( 2 ) are fixed and u ∈ L2 ( 1 ) will be the control variable.
Here  is a bounded domain in R2 with C 1,1 boundary or  is convex. For the case of
higher-dimensional domains we refer the reader to [IK10]. The boundary ∂ is assumed
to consist of two disjoint sets 1 , 2 , each of which is connected, or possibly consisting of
finitely many connected components, with = ∂ = 1 ∪ 2 , and 2 possibly empty.
Further it is assumed that g ∈ C 2 (R) and that g(H 1 ()) ⊂ L1+ε () for some ε > 0.
Equation (6.4.6) is understood in the variational sense, i.e.,

(∇y, ∇ϕ) + (g(y), ϕ) = (fˆ, ϕ) + (ũ, ϕ) (6.4.7)

for all ϕ ∈ H 1 (), where 


u on 1 ,
ũ =
uf on 2 ,
(·, ·) denotes the L2 -inner product on , and (·, ·) stands for duality pairing between
functions in Lp () and Lq () with p −1 + q −1 = 1 for appropriate choices of p. We recall
that H 1 () is continuously embedded into Lp () for every p ≥ 1 if n = 2.
In (6.4.7) we should more precisely write ũ, τ ϕ
instead of ũ, ϕ
, with τ the
zero order trace operator on . However, we shall frequently suppress this notation. We
refer to y as a solution of (6.4.6) if (6.4.7) holds. The optimal control problem is given by

⎨ min 2 |Cy − yd |2Z + 2 |u|2L2 ( 1 )
1 α

(6.4.8)

subject to (y, u) ∈ H () × L ( 1 ) a solution of (6.4.6).
1 2

Here C is a bounded linear (observation) operator from H 1 () to a Hilbert space Z, and
yd ∈ Z and α > 0 are fixed.
To express (6.4.8) in the form (6.2.1) of Section 6.2 we introduce
ẽ : H 1 () × L2 ( 1 ) → H 1 ()∗
with
ẽ(y, u), ϕ
(H 1 )∗ ,H 1 = (∇y, ∇ϕ) + (g(y) − fˆ, ϕ) − (ũ, ϕ)

i i

i i
ItoKunisc
i i
2008/6/12
page 175
i i

6.4. Applications 175

and

e : H 1 () × L2 ( ) → H 1 ()

by

e = N ẽ,

where N : H 1 ()∗ → H 1 () is the Neumann solution operator associated with

(∇v, ∇ϕ) + (v, ϕ) = (h, ϕ) for all ϕ ∈ H 1 (),

where h ∈ H 1 ()∗ . In the context of Section 6.2 we set

X = H 1 () × L2 ( 1 ), Y = H 1 (),

with x = (y, u) ∈ X, and


1 α
f (x) = f (y, u) = |Cy − yd |2Z + |u|2L2 ( 1 ) .
2 2
We assume that (6.4.8) admits a solution x ∗ = (y ∗ , u∗ ). The regularity requirements (6.2.2)
of Section 6.2 are clearly met by the mapping g. Those for e are implied by
y → g(y) is twice continuously differentiable from H 1 () to L1+ε ()
(h0) for some ε > 0 with Lipschitz continuous second derivative in a
neighborhood of y ∗ .
We shall also require the following hypothesis:
(h1) g (y ∗ ) ∈ L2+ε () for some ε > 0.
With (h1) holding, g (y ∗ )ϕ ∈ L2 () for every ϕ ∈ H 1 (). It is simple to argue that
     
ker e (x ∗ ) = {(v, h) : ∇v, ∇ϕ  + g (y ∗ )v, ϕ  = h, ϕ 1 for all ϕ ∈ H 1 ()},

i.e., (v, h) ∈ ker e (x ∗ ) if and only if (v, h) is a variational solution of




⎪ − v + g (y ∗ )v = 0 in ,



∂v
= h on 1 , (6.4.9)


∂n


⎩ ∂v
∂n
= 0 on 2 .

If (6.2.3) is satisfied, i.e., if e (x ∗ ) is surjective, then there exists a unique Lagrange multiplier
λ∗ ∈ H 1 () associated to x ∗ such that

e (x ∗ )∗ λ∗ + (N C ∗ (Cy ∗ − yd ), αu∗ ) = 0 in H 1 () × L2 ( 1 ), (6.4.10)

where e (x ∗ )∗ : H 1 () → H 1 () × L2 ( 1 ) denotes the adjoint of e (x ∗ ) and C ∗ : Z →


H 1 ()∗ stands for the adjoint of C : H 1 () → Z with Z a pivot space. More precisely
we have the following proposition.

i i

i i
ItoKunisc
i i
2008/6/12
page 176
i i

176 Chapter 6. Augmented Lagrangian-SQP Methods

Proposition 6.10 (Necessary Condition). If x ∗ is a solution to (6.4.8), e (x ∗ ) is surjective,


and (h1) holds, then λ∗ is a variational solution of
-
− λ∗ + g (y ∗ )λ∗ = −C ∗ (Cy ∗ − yd ) in ,
∂λ∗ (6.4.11)
= 0 on ,
∂n
i.e., (∇λ∗ , ∇ϕ) + (g (y ∗ )λ∗ , ϕ) + (Cy ∗ − yd , Cϕ)Z = 0 for all ϕ ∈ H 1 () and

τ 1 λ∗ = αu∗ on 1 . (6.4.12)

Proof. The Lagrangian associated with (6.4.8) can be expressed by

1 α  
L(y, u, λ) = |Cy − yd |2Z + |u|2L2 ( 1 ) + ∇λ, ∇y 
2 2
   
+ λ, g(y) − fˆ  − λ, ũ .

For every (v, h) ∈ H 1 () × L2 ( 1 ) we find


     
Ly (y ∗ , u∗ , λ∗ )(v) = Cy ∗ − yd , Cv Z + ∇λ∗ , ∇v  + λ∗ , g (y ∗ )v 

and
   
Lu (y ∗ , u∗ , λ∗ )(h) = α u∗ , h 1 − λ∗ , h 1 .

Thus the claim follows.

We now aim for a priori estimates for λ∗ . We define B : H 1 () → H 1 ()∗ as the
differential operator given by the left-hand side of (6.4.11); i.e., Bv = ϕ is characterized as
the solution to
   
∇v, ∇ψ  + g (y ∗ )v, ψ  = ϕ, ψ
(H 1 )∗ ,H 1 for all ψ ∈ H 1 ().

We shall use the following hypothesis:


(h2) 0 is not an eigenvalue of B.
Note that (h2) holds, for example, if

g (y ∗ ) ≥ β a.e. on 

for some β > 0. With (h2) holding, B is an isomorphism from H 1 () onto H 1 ()∗ .
Moreover, (h2) implies surjectivity of e (x ∗ ).

Lemma 6.11. Let the conditions of Proposition 6.10 hold.


(i) There exists a constant K(x ∗ ) such that

|λ∗ |H 1 ≤ K(x ∗ )|(N C ∗ (Cy ∗ − yd ), αu∗ )|X .

i i

i i
ItoKunisc
i i
2008/6/12
page 177
i i

6.4. Applications 177

(ii) If moreover (h2) is satisfied and C ∗ (Cy ∗ − yd ) ∈ L2 (), then there exists a
constant K(y ∗ ) such that

|λ∗ |H 2 ≤ K(y ∗ )|C ∗ (Cy ∗ − yd )|L2 () .

Proof. Due to surjectivity of e (x ∗ ) we have (e (x ∗ )e (x ∗ )∗ )−1 ∈ L(H 1 ()) and thus (i)
follows from (6.4.10). Let us turn to (ii). Due to (h2) and (6.4.11) there exists a constant
Ky ∗ such that

|λ∗ |H 1 ≤ Ky ∗ |C ∗ (Cy ∗ − yd )|(H 1 )∗ . (6.4.13)

To obtain the desired H 2 () estimate for λ∗ one applies the well-known H 2 a priori estimate
for Neumann problems to

− λ∗ + λ∗ = w in ,
∂λ∗
= 0 in ∂
∂n
with w = λ∗ − g (y ∗ )λ∗ − C ∗ (Cy ∗ − yd ). This gives
 
|λ∗ |H 2 ≤ K |λ∗ |L2 + |g (y ∗ )λ∗ |L2 + |C ∗ (Cy ∗ − yd )|L2

for a constant K (depending on  but independent of y ∗ ). Since

|g (y ∗ )λ∗ |L2 ≤ |g (y ∗ )|L1+ |λ∗ |H 1 ,

the desired result follows from (6.4.13).

To calculate the second Fréchet derivative we shall use


(h3) g (y ∗ ) ∈ L1+ε () for some ε > 0.

Proposition 6.12. Let (6.2.3), (h1), and (h3) hold. Then


 
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) = |Cv|2Z + α|h|2L2 ( 1 ) + λ∗ , g (y ∗ )v 2  (6.4.14)

for all (v, h) ∈ X.

Proof. It suffices to observe that by Sobolev’s embedding theorem there exists a constant
Ke such that
 
| λ∗ , g (y ∗ )v 2  | ≤ Ke |λ∗ |H 1 |g (y ∗ )|L1+ |v|2H 1 (6.4.15)

for all v ∈ H 1 ().

We turn now to an analysis of the second order sufficient optimality condition (6.2.5)
 ∗ optimal
for the  control problem (6.4.8). In view of (6.4.14) the crucial term is given by
λ , g (y ∗ )v 2  . Two types of results will be given. The first will rely on | λ∗ , g (y ∗ )v 2  |

i i

i i
ItoKunisc
i i
2008/6/12
page 178
i i

178 Chapter 6. Augmented Lagrangian-SQP Methods

being sufficiently small. This can be achieved by guaranteeing that λ∗ or, in view of (6.4.11),
Cy ∗ − yd is small. We refer to this case as small residual problems. The second class of
assumptions rests on guaranteeing that λ∗ g (y ∗ ) ≥ 0 a.e. on .
In the statement of the following theorem we use Ke from (6.4.15) and K(x ∗ ), K(y ∗ )
from Lemma 6.11. Further B −1  denotes the norm of B −1 as an operator from H 1 ()∗ to
H 1 ().

Theorem 6.13. Let (6.2.3), (h1)–(h3) hold.


(i) If Z = H 1 (), C = id, and

Ke K(x ∗ )|(y ∗ − yd , αg ∗ )|X |g (y ∗ )|Lq < 1, (6.4.16)

then the second order sufficient optimality condition (6.2.5) holds.


(ii) If Z = L2 (), C is the injection of H 1 () into L2 (), and

k̃e K(y ∗ )|y ∗ − yd |L2 () |g (y ∗ )|L∞ () ≤ 1, (6.4.17)

where k̃e is the embedding constant of H 2 () into L∞ (), then (6.2.5) is satisfied.
(iii) If

2B −1 τ 1 Ky ∗ |C ∗ (Cy ∗ − yd )|(H 1 )∗ < α, (6.4.18)

where τ 1  is the norm of the trace operator from H 1 () onto L2 ( 1 ) and Ky ∗ is defined
in (6.4.13), then (6.2.5) is satisfied.

Proof. (i) By (6.4.14) and (6.4.15) we have for every (v, h) ∈ X

L (y ∗ , u∗ , λ∗ )((v, h), (v, h))


≥ |v|2H 1 () + α|h|2L2 ( 1 ) − Ke |λ∗ |H 1 () |g (y ∗ )|L1+ () |v|2H 1 ()
 
≥ 1 − Ke K(x ∗ )|(y ∗ − yd , αu∗ )|X |g (y ∗ )|L1+ () |v|2H 1 + α|h|2L2 ( 1 ) ,

where in the last estimate we used Lemma 6.11 (i). The claim now follows from (6.4.16).
We observe that in this case L (y ∗ , u∗ , λ∗ ) is positive definite on all of X, not only on
ker e (x ∗ ).
(ii) By (6.4.9) and (h2) we obtain

|v|H 1 () ≤ B −1 τ 1 |h|L2 ( 1 ) for all (v, h) ∈ ker e (x ∗ ). (6.4.19)

Here τ 1  denotes the norm of the trace operator from H 1 () onto L2 ( 1 ). Hence by
Lemma 6.11 (ii) and (6.4.19) we find for every (v, h) ∈ ker e (x ∗ )
 
L (v ∗ , u∗ , λ∗ )((v, h), (v, h)) = |v|2L2 () + α|h|2L2 ( 1 ) − λ∗ , g (y ∗ )v 2 L2 ()
≥ |v|2L2 () + α|h|2L2 ( 1 ) − |λ∗ |L∞ () |g (y ∗ )|L∞ () |v|2L2 ()
5 6
≥ 1 − k̃e K(y ∗ )|g (y ∗ )|L∞ () |y ∗ − yd |L2 () |v|2L2 ()
 
α 1
+ |h|2L2 ( 1 ) + |v| 2
.
B −1 2 τ 1 2 H ()
1
2

i i

i i
ItoKunisc
i i
2008/6/12
page 179
i i

6.4. Applications 179

Due to (6.4.17) the expression in brackets is nonnegative and the result follows.
(iii) In this case C can be a boundary observation operator, for example. As in (i) we
find

L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + α|h|2L2 ( 1 ) − Ke |λ∗ |H 1 () |g |L1+ () |v|2H 1 ()
≥ |Cv|2Z + α|h|2L2 ( 1 ) − Ky ∗ |C ∗ (Cy ∗ − yd )|(H 1 )∗ |g |L () |v|2H 1 () ,

where (6.4.17) was used. This estimate and (6.4.19) imply that for (v, h) ∈ ker e (x ∗ )
α
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + |h|2L2 ( 1 )
1 2 2
α ∗ ∗
+ − K y ∗ |C (Cy − y )| 1
d (H ) ∗ |g | L () |v|H 1 () .
1+
2
2B −1 τ 1 

The desired result follows from (6.4.18).

In view of (6.4.16)–(6.4.18) and the fact that |y ∗ − yd | is decreasing with α → 0+ ,


the question arises, whether decreasing α results in the fact that the second order sufficient
optimality condition. This is nontrivial since the term |y ∗ − yd | in (6.4.16)–(6.4.18) is
multiplied by factors which depend on x ∗ and hence on α itself. We refer the reader to
[IK10] for a treatment of this issue.
In Theorem 6.13 the second order optimality condition was guaranteed by small
residue conditions. Alternatively one can proceed by assuming that
(h4) λ∗ g (y ∗ ) ≥ 0 a.e. on .

Theorem 6.14. Assume that (6.2.3), (h1), (h3), and (h4) hold and that
(a) Z = H 1 () and C = id,
or
(b) (h2) is satisfied.
Then (6.2.5) holds.

Proof. By Proposition 6.12 and (h4) we find for all (v, h) ∈ X

L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + α|h|2L2 ( 1 ) .

In the case that (a) holds the conclusion is obvious and L (v ∗ , u∗ , λ∗ ) is positive not only
on ker e (x ∗ ) but on all of X. In case (b) we use (6.4.19) to conclude that
 
α 1
L (y ∗ , u∗ , λ∗ )((v, h), (v, h)) ≥ |Cv|2Z + |h|2L2 ( 1 ) + |v| 2
B −1 2 τ 1 2 H ()
1
2

for all (v, h) ∈ ker e (x ∗ ).

Next we give a sufficient condition for (h4).

i i

i i
ItoKunisc
i i
2008/6/12
page 180
i i

180 Chapter 6. Augmented Lagrangian-SQP Methods

Theorem 6.15. Let (6.2.3), (h1) hold and assume that


(i) C ∗ (yd − Cy ∗ ), ψ
(H 1 )∗ ,H 1 ≥ 0 for all ψ ∈ H 1 () with ψ ≥ 0, a.e.,
(ii) g (y ∗ ) ≥ 0 a.e. on , and
(iii) g (y ∗ ) ≥ 0 a.e. on .
Then (h4) holds. The conclusion remains correct if the inequalities in (i) and (iii) are
reversed.

Proof. Set ϕ = inf(0, λ∗ ) ∈ H 1 () in (6.4.11). Then we have



|∇ϕ|2 dx + (g (y ∗ )ϕ, ϕ) + c∗ (Cy ∗ − yd ), ϕ
(H 1 )∗ ×H 1 = 0.


Since g (y ) ≥ 0 it follows from (i) that |∇ϕ|2L2 () = 0 and λ∗ ≥ 0. Together with (iii)

we find λ∗ g (y ∗ ) ≥ 0 a.e. on . If the inequalities in (i) and (iii) are reversed, we take
ϕ = sup(0, λ∗ ).

Example 6.16. We consider

− y + y 3 − y = fˆ in ,
∂y
= u on (6.4.20)
∂n
and the associated optimal control problem
⎧  
⎨ min 12  |y − yd |2 dx + α2 u2 dx
(6.4.21)

subject to (y, u) ∈ H 1 () × L2 ( ) a solution of (6.4.20).

In the context of the problem (6.4.8) we set 1 = = ∂, 2 = ∅, and

Z = L2 (), C : H 1 () → L2 () canonical injection, g(t) = t 3 − t.

Equation (6.4.20) represents a simplified Ginzburg–Landau model for superconductivity


with y denoting the wave function, which is valid in the absence of internal magnetic fields
(see [Tin, Chapters 1, 4]). Both (6.4.20) and − y + y 3 + y = h̃ are of interest in this
context, but here we concentrate on (6.4.20) since it has three equilibria, ±1 and 0, of which
±1 are stable and 0 is unstable.

Proposition 6.17. Problem (6.4.21) admits a solution (y ∗ , u∗ ) ∈ H 1 () × L1 ( ).

Proof. We first argue that the set of admissible pairs (y, u) ∈ H 1 () × L2 ( ) for (6.4.21)
is not empty. For this purpose we consider

− y + y 3 − y = fˆ in ,
(6.4.22)
y = 0 on ∂.

i i

i i
ItoKunisc
i i
2008/6/12
page 181
i i

6.4. Applications 181

Let T : L6 () → L6 () be the operator defined by


T (y) = (− )−1 (ĥ + y − y 3 ),
where denotes the Laplacian with Dirichlet boundary conditions. Clearly T is completely
continuous and (I − T )y = 0 implies that y is a solution to (6.4.22) with y ∈ H 2 () ∩
H01 (). By a variant of Schauder’s fixed point theorem due to Schäfer [Dei, p. 61], either
(I −t T )y = 0 admits a solution for every t ∈ [0, 1] or (I −t T )y = 0 admits an unbounded
set of solutions for some tˆ ∈ (0, 1). Assume that the latter is the case and let y = y(tˆ)
satisfy (I − tˆ T )y = 0. Then −(tˆ)−1 y + y 3 − y = fˆ, and hence
   
tˆ−1 |∇y|2 dx + y 4 dx = y 2 dx + fˆ y dx
  
   (6.4.23)
≤ y 2 dx + y 2 dx + |y| |fˆ|dx.
|y(x)|>2 |y(x)|≤2 

This implies that


 
tˆ−1 |∇y|2 dx ≤ 4 || + |y| |fˆ|dx.
 

Since y ∈ H01 (), Poincaré’s inequality implies the existence of a constant Ĉ independent
of y = y(tˆ) such that
|y(tˆ)|H01 ≤ Ĉ(|fˆ|L2 + ||).

Consequently the set of solutions y(tˆ) to (I − tˆ T )y = 0 is necessarily bounded in L6 (),


and (6.4.22) admits a solution y ∈ H 2 () ∩ H01 (). Setting u = ∂u ∂y
∈ H 1/2 ( ) ⊂ L2 ( )
we find the existence of an admissible pair for (6.4.21).
Let (yn , un ) ∈ H 1 () × L2 ( ) be a minimizing sequence for (6.4.21). Due to the
choice of the cost functional the sequence {un }∞ 2
n=1 is bounded in L ( ). Proceeding as in
(6.4.23) it is simple to argue that {yn }∞
n=1 is bounded in H 1
(). Standard weak convergence
arguments then imply that every weak subsequential limit of {(yn , un )} is a solution to
(6.4.21).

Proposition 6.18. For every (y, u) ∈ X = H 1 ()×L2 ( ) the linearization e (y, u) : X →


X is surjective.

Proof. We follow [GHS]. Since e = N ẽ with N defined below (6.4.8) it suffices to show
that E : = ẽ(y, u) : X → H 1 ()∗ is surjective, where E is characterized by
     
E(v, h), w
H 1,∗ ,H 1 = ∇v, ∇w  + q v, w  − h, τ w ,

with q = 3y 2 − 1 ∈ H 1 (). Consider the operator E0 : H 1 () → H 1 ()∗ defined by


   
E0 (v), w
H 1,∗ ,H 1 = ∇v, ∇w  + q v, w  ,
and observe that E0 = − + I + (q − 1)I ; i.e., E0 is a compact perturbation by the operator
(q − 1)I ∈ L(H 1 (), H 1 ()∗ ) of the homeomorphism (− + I ) ∈ L(H 1 (), H 1 ()∗ ).

i i

i i
ItoKunisc
i i
2008/6/12
page 182
i i

182 Chapter 6. Augmented Lagrangian-SQP Methods

By the Fredholm alternative either E0 , and hence E, is surjective or there exists a finite-
dimensional kernel of E0 spanned by the basis {vi }N i=1 in H () and E0 v = z, for z ∈
1

H () , is solvable if and only if z, vi
H 1,∗ ,H 1 = 0 for all i = 1, . . . , N. In the latter case
1

vi , i = 1, . . . , N, are nontrivial
 on by Lemma 6.19 below. Without loss of generality
we may assume that vi , vj = δij for i, j = 1, . . . , N. For z ∈ H 1 ()∗ we define ẑ ∈
  !
H 1 ()∗ by ẑ, w
H 1,∗ ,H 1 = z, w
H 1,∗ ,H 1 + h, w , where h = − N i=1 z, vi
H 1,∗ ,H 1 τ vi ∈
L ( ). Then ẑ, vi
H 1,∗ ,H 1 = 0 for all i = 1, . . . , N and hence there exists v ∈ H 1 () such
2

that E0 v = ẑ or equivalently
   
∇v, ∇w  + q v, w  = z, w
H 1,∗ ,H 1 for all v ∈ H 1 ().

Consequently E is surjective.

Lemma 6.19. If for q ∈ L2 () the function v ∈ H 1 () satisfies


   
∇v, ∇w  + q v, w  = 0 for all v ∈ H 1 () (6.4.24)

and v = 0 on , then v = 0 in .

˜ be a bounded domain with smooth boundary strictly containing . Define


Proof. Let 
-
v in ,
ṽ =
˜ \ ,
0 in 

and
 let q̃ denote the extension
 by 0 of q. Since v = 0 on = ∂ we have ṽ ∈ H 1 () and
˜ by (6.4.24). This, together with the fact
∇ ṽ, ∇w ˜ + q̃ ṽ, w ˜ = 0 for all w ∈ H01 ()
that ṽ = 0 on an open nonempty subset of , ˜ implies that ṽ = 0 in 
˜ and v = 0 in ; see
[Geo].

Let us discuss the validity of the conditions (hi) for the present problem. Sobolev’s
embedding theorem and

g (t) = 3t 2 − 1 and g (t) = 6t

imply that (h1), (h3), and (h4) hold. The location of the equilibria of g suggests that for
h = 0, yd ≥ 1 implies 1 ≤ y ∗ ≤ yd and similarly that yd ≤ −1 implies yd ≤ y ∗ ≤ −1.
This was confirmed in numerical experiments [IK10]. In these cases g (y ∗ ) ≥ β > 0 and
(i)–(iii) of Theorem 6.15 hold.
The numerical results in [IK10] are based on Algorithm 6.3. While it does not contain
a strategy on the choice of c, it was observed that c > 0 and of moderate size is superior
(with respect to region of attraction for convergence) to c = 0, which corresponds to the
SQP method without line search. For example, for the choice yd = 1 and the initialization
y0 = −2, the iterates yn have to pass through the stable equilibrium −1 and the unstable
equilibrium 0 to reach the desired state yd . This can be accomplished with Algorithm 6.3
with c > 0, without globalization strategy, but not with c = 0. Similar comments apply to
the case of Neumann boundary controls.

i i

i i
ItoKunisc
i i
2008/6/12
page 183
i i

6.5. Approximation and mesh-independence 183

Example 6.20. This is the singular system

− y − y 3 = h in ,
∂y
= u on , (6.4.25)
∂n

and the associated optimal control problem


⎧ 
⎨ min 2 |y − yd |2H 1 () +
1 α
2 u2 ds
(6.4.26)

subject to (y, u) ∈ H 1 () × L2 ( 1 ) a solution of (6.4.25),

where yd ∈ H 1 (). If (6.4.25) admits at least one feasible pair (y, u), then it is simple to
argue that (6.4.26) has a solution x ∗ = (y ∗ , u∗ ). We refer the reader to [Lio2, Chapter 3] for
existence results in the case that the cost functional is of the form |y −yd |rLr () +α|g|2L2 ( ) for
appropriately chosen r > 2. The existence of a Lagrange multiplier is assured in the same
manner as in Example 6.6. Clearly (h1) and (h3) are satisfied. For yd = const ≥ 12 we
observed numerically that 0 ≤ y ∗ ≤ yd , λ∗ < 0, which in view of (h4) and Theorem 6.14
explains the second order convergence rate that is observed numerically.

6.5 Approximation and mesh-independence


This section is devoted to a brief description of some aspects concerning approximation
of Algorithm 6.3. For this purpose let h denote a discretization parameter tending to zero
and, for each h, let Xh and Wh be finite-dimensional subspaces of X and W , respectively.
In this section we set Z = X × W and Zh = Xh × Wh . The finite-dimensional spaces
Xh and Wh are endowed with the inner products and norms induced by X and W . We
introduce surjective restriction operators RhX ∈ L(X, Xh ) and RhW ∈ L(W, Wh ) and define
Rh = (RhX , RhW ). For each h, let fh : Xh → R and eh : Xh → Wh denote discretizations of
the cost functional f and the constraint e. Together with the infinite-dimensional problem
(6.2.1) we consider the finite-dimensional discrete problems
-
min fh (xh ) over x ∈ Xh
(6.5.1)
subject to eh (x) = 0.

The Lagrangians for (6.5.1) are given by


 
Lh (xh , λh ) = fh (xh ) + eh (xh ), λh W ,

and the approximations of (6.2.6) are

L h (xh , λh + ceh (xh )) = 0 , eh (xh ) = 0, (6.5.2)

which are first order necessary optimality conditions for (6.5.1).

i i

i i
ItoKunisc
i i
2008/6/12
page 184
i i

184 Chapter 6. Augmented Lagrangian-SQP Methods

We require the following assumptions.

Condition 6.5.1. The approximations (Zh , Rh ) of the space Z are uniformly bounded; i.e.,
there exists a constant cR > 0 independent of h satisfying
Rh L(Z,Zh ) ≤ cR .

Condition 6.5.2. For every h problem (6.5.1) has a local solution xh∗ . The mappings fh
and eh are twice continuously Fréchet differentiable in neighborhoods Ṽh∗ of xh∗ , and the
operators eh are Lipschitz continuous on Ṽh∗ with a uniform Lipschitz constant ξe > 0
independent of h.

Condition 6.5.3. For every h there exists a nonempty open and convex set V̂h∗ ⊆ Ṽh∗ such
that a uniform Babuška–Brezzi condition is satisfied on V̂h∗ ; i.e., there exists a constant
β > 0 independent of h satisfying
 
eh (xh )qh , wh W
inf sup ≥β for all xh ∈ V̂h∗ .
wh ∈Wh qh ∈Xh qh X wh W

The Babuška–Brezzi condition implies that the operators eh (xh ) are surjective on V̂h∗ .
Hence, if Condition 6.5.3 holds, there exists for every h > 0 a Lagrange multiplier λ∗h ∈ Wh
such that (xh∗ , λ∗h ) solves (6.5.2).

Condition 6.5.4. There exist a scalar r > 0 independent of h and a neighborhood V (λ∗h )
of λ∗h for every
 ∗h ∗such that for Vh∗ = V̂h∗ × V (λ∗h )

(i) B (xh λh ); r ⊆ Vh and
(ii) a uniform second order sufficient optimality condition is satisfied on Vh∗ ; i.e., there
exists a constant κ̄ > 0 such that for all (xh , λh ) ∈ Vh∗
L h (xh , λh )(qh )2 ≥ κ̄ qh 2X
for all qh ∈ ker eh (xh ).

We define
 
L (x, λ)
F (x, λ) = .
e(x)
For every h and for all (xh , λh ) ∈ Vh∗ we introduce approximations of F and M by
 
Lh (xh , λh )
Fh (xh , λh ) = ,
eh (xh )
 
Lh (xh , λh ) eh (xh )
Mh (xh , λh ) = .
eh (xh ) 0
By Conditions 6.5.3 and 6.5.4 there exists a bound η which may depend on β and κ ∗ but is
independent of h so that
Mh−1 (xh , λh )L(Zh ) ≤ η (6.5.3)

i i

i i
ItoKunisc
i i
2008/6/12
page 185
i i

6.5. Approximation and mesh-independence 185

for all (xh , λh ) ∈ Vh∗ . We require the following consistency properties of Fh and Mh , where
V (x ∗ , λ∗ ) = V (x ∗ ) × V (λ∗ ) denotes the neighborhood of the solution x ∗ of (6.2.1) and the
associated Lagrange multiplier λ∗ defined above (6.2.8).

Condition 6.5.5. For each h we have Rh (V (x ∗ , λ∗ )) ⊆ Vh∗ , and the approximations of the
operators F and M are consistent on V (x ∗ ) and V (x ∗ , λ∗ ), respectively, i.e.,

lim Fh (Rh (x, λ)) − Rh F (x, λ)Z = 0


h→0

and
/    /
/ q q /
lim / M (R (x, λ))R − R M(x, λ) / =0
h→0 / w /Z
h h h h
w

for all (q, w) ∈ Z and (x, λ) ∈ V (x ∗ , λ∗ ). Moreover, for every h let the operator Mh be
Lipschitz continuous on Vh∗ with a uniform Lipschitz constant ξM > 0 independent of h.

Now we consider the following approximation of Algorithm 6.3.


Algorithm 6.7.

(i) Choose (xh0 , λ0h ) ∈ Vh∗ , c ≥ 0 and put n = 0.

(ii) Set λ̃nh = λnh + c eh (xhn ).

(iii) Solve for (x̂h , λ̂h ):


   
x̂h − xhn L h (xhn , λ̃nh )
Mh (xhn , λ̃nh ) =− .
λ̂h − λ̃nh eh (xhn )

h ) = (x̂h , λ̂h ), n = n + 1, and goto (ii).


(iv) Set (xhn+1 , λn+1

Theorem 6.21. Let c (xh0 , λ0h ) − (xh∗ , λ∗h )Z be sufficiently small for all h and let (6.2.3),
(6.2.5), and Conditions 6.5.2–6.5.4 hold. Then we have the following:
(a) Algorithm 6.7 is well defined and
∗ ∗ ∗ ∗ 2
h ) − (xh , λh )Z ≤ C (xh , λh ) − (xh , λh )Z ,
(xhn+1 , λn+1 n n

where C = 12 η ξM max(2, 1 + 2c2 ξe2 ) is independent of h.


(b) Let (x 0 , λ0 ) be a startup value of Algorithm 6.3 such that c (x 0 , λ0 ) − (x ∗ , λ∗ )Z
is sufficiently small and assume that
/ /
lim /(xh0 , λ0h ) − Rh (x 0 , λ0 )/Z = 0. (6.5.4)
h→0

If in addition Conditions 6.5.1 and 6.5.5 are satisfied, we obtain


/ /
lim /(xhn , λnh ) − Rh (x n , λn )/Z = 0
h→0

i i

i i
ItoKunisc
i i
2008/6/12
page 186
i i

186 Chapter 6. Augmented Lagrangian-SQP Methods

for all n, where (xhn , λnh ) and (x n , λn ) are the nth iterates of the finite- and infinite-dimensional
methods, respectively.

Corollary 6.22. Under the hypotheses of Theorem 6.21

lim (xh∗ , λ∗h ) − Rh (x ∗ , λ∗ )Z = 0


h→0

holds.

Corollary 6.23. Let the hypotheses of Theorem 6.21 hold and let {(Zh(n) , Rh(n) )} be a
sequence of approximations with limn→∞ h(n) = 0. Then we have

lim (xh(n)
n
, λnh(n) ) − Rh(n) (x ∗ , λ∗ )Z = 0.
n→∞

We turn to a brief account of mesh-independence. This is an important feature of iter-


ative approximation schemes for infinite-dimensional problems. It asserts that the number
of iterations to reach a specified approximation quality ε > 0 is independent of the mesh
size. We require the following notation:
 /  / 
n(ε) = min n0 | for n ≥ n0 : /F x n , λn + ce(x n ) /Z < ε ,
 /  / 
nh (ε) = min n0 | for n ≥ n0 : /Fh x n , λn + ceh (x n ) / < ε .
h h h Z

We point out that both n(ε) and nh (ε) depend on the startup values (x 0 , λ0 ) and (xh0 , λ0h ) of
the infinite- and finite-dimensional methods.

Theorem 6.24. Under the assumptions of Theorem 6.21 there exists for each ε > 0 a
constant hε > 0 such that

n(ε) − 1 ≤ nh (ε) ≤ n(ε) for h ∈ (0, hε ] .

The proofs of the results of this section as well as numerical examples which demon-
strate that mesh-independence is also observed numerically can be found in [KVo1, Vol].

6.6 Comments
Second order augmented Lagrangian methods as discussed in Section 6.2 were treated
in [Be] for finite-dimensional and in [IK9, IK10] for infinite-dimensional problems. The
history for SQP methods is a long one; see [Han, Pow1] and [Sto, StTa], for exam-
ple, and the references therein for the analysis of finite-dimensional problems. Infinite-
dimensional problems are considered in [Alt5], for example. Mesh-independence of aug-
mented Lagrangian-SQP methods was analyzed in [Vol]. This paper also contains many
references on mesh-independence for SQP and Newton methods. Methods which replace
equality- and inequality-constrained optimization problems by a linear quadratic approx-
imation with respect to the equality constraints and the cost functional, while leaving the

i i

i i
ItoKunisc
i i
2008/6/12
page 187
i i

6.6. Comments 187

equality constraints as explicit constraints, are frequently referred to as Lagrange–Newton


methods in the literature. We quote [Alt4, AlMa, Don, Tro] and the references therein.
Reduced SQP methods in infinite-dimensional spaces were analyzed in [JaSa, KSa, Kup]
among others. For optimization problems with equality and simple inequality constraints the
SQP method can advantageously be combined with projection methods; see [Hei], for ex-
ample. We have not considered the topic of approximating the Hessian by secant updates.
In part this is due to the fact that in the context of optimal control of partial differential
equations the structure of the nonlinearities is such that the continuous problem may allow
a rather straightforward characterization of the gradient and Hessian, provided, of course,
that it is sufficiently regular. Secant methods for SQP methods are considered in, e.g.,
[Sto, KuSa, Kup, StTa].

i i

i i
ItoKunisc
i i
2008/6/12
page 188
i i

i i

i i
ItoKunisc
i i
2008/6/12
page 189
i i

Chapter 7

The Primal-Dual
Active Set Method

This chapter is devoted to the primal-dual active set strategy for variational problems with
simple constraints. This is an efficient method for solving the optimality systems arising
from quadratic programming problems with unilateral or bilateral affine constraints and it is
equally well applicable to certain complementarity problems. The algorithm and some of its
basic properties are described in Section 7.1. In the ensuing sections sufficient conditions
for its convergence with arbitrary initialization and without globalization are presented
for a variety of different classes of problems. Sections 7.2 and 7.3 are devoted to the
finite-dimensional case where the system matrix is an M-matrix or has a cone-preserving
property, respectively. Operators which are of diagonally dominant type are considered in
Sections 7.4 and 7.5 for unilateral, respectively, bilateral problems. In Section 7.6 nonlinear
optimal control problems with control constraints are investigated.

7.1 Introduction and basic properties


Here we investigate the primal-dual active method. Let us consider the quadratic program-
ming problem

⎪ 1   

⎨min Ax, x X − a, x X
x∈X 2
(7.1.1)



subject to Ex = b, Gx ≤ ψ,

where A ∈ L(X) is a self-adjoint operator in the real Hilbert space X, E ∈ L(X, W ),


G ∈ L(X, Z), with W a real Hilbert space, and Z = Rn or Z = L2 (), with  a domain
in Rd , endowed with the usual Hilbert space structure and the natural ordering. We assume
that (7.1.1) admits a unique solution denoted by x. If x is a regular point in the sense of
Definition 1.5 (with C = X and g(x) = (Ex − b, Gx − ψ)), then there exists a Lagrange

189

i i

i i
ItoKunisc
i i
2008/6/12
page 190
i i

190 Chapter 7. The Primal-Dual Active Set Method

multiplier (λ, μ) ∈ W × Z such that


Ax + E ∗ λ + G∗ μ = a,

Ex = b, (7.1.2)

μ = max(0, μ + c (Gx − ψ)),


where c > 0 is a fixed constant and max is interpreted pointwise a.e. in  if Z = L2 () and
coordinatewise if Z = Rn . The third equation in (7.1.2) constitutes the complementarity
condition associated with the inequality constraint in (7.1.1), as discussed in Examples 4.51
and 4.53. Note that the auxiliary problems in the augmented Lagrangian-SQP methods of
Chapter 6 (see for instance (ii) of Algorithm 6.4 or (ii) of Algorithm 6.6) take the form
of (7.1.1).
The primal-dual active set method that will be analyzed in this chapter is an efficient
technique for solving (7.1.2). While (7.1.2) is derived from (7.1.1) with A self-adjoint, this
assumption is not essential in the remainder of this chapter, and we therefore drop it unless
it is explicitly specified.
We next give two examples of constrained optimal control problems which are special
cases of (7.1.1).
Example 7.1. Let X = L2 (), ˆ where  ˆ ⊂  is the control domain and consider the
optimal control problem
⎧ 
⎪ 1 α
⎪min
⎨ |y − yd |2 dx + |u|2X
u∈X 2  2



subject to − y = Bu, y = 0 on ∂ and u ≤ ψ,
ˆ and B ∈ L(X, L2 ()) is the extension-by-zero
where α > 0, yd ∈ L2 (), ψ ∈ L2 (),
ˆ to . This problem can be formulated as (7.1.1) without equality constraint
operator 
(E = 0) by setting
A = αI + B ∗ (− )−2 B, a = B ∗ (− )−1 ȳ, and G = I,
where denotes the Laplacian with homogeneous Dirichlet boundary conditions.
Example 7.2. Here we consider optimal control of the heat equation with Neumann bound-
ˆ where ˆ is a subset of the boundary ∂ of :
ary control and set, X = L2 (0, T ; L2 ( )),
⎧  T


1 α
|y − yd |2 dx dt + |u|2U

⎪ min

⎪ u∈X 2 0 2




⎪ subject to d
y = y, y(0, ·) = y0 ˆ
y = 0 on ∂ \ ,

⎪ dt




⎩ ˆ
ν · ∇y(t) = Bu(t) in , u(t) ≤ ψ,

where α > 0, yd ∈ L2 (0, T ; L2 ()), y0 ∈ L2 (), ˆ is a subset of the boundary ∂, ν is


ˆ and B is the extension-by-zero operator ˆ to ∂. For u ∈ X there
the outer normal to ,

i i

i i
ItoKunisc
i i
2008/6/12
page 191
i i

7.1. Introduction and basic properties 191

exists a unique solution y = y(u) ∈ Y = L2 (0, T ; H 1 ()) ∩ H 1 (0, T ; L2 ()) to the initial
boundary value problem arising as the equality constraints. Let T ∈ L(U, Y ) denote the
solution operator given by y = T (u). Setting

A = αI + T ∗ T , a = T ∗ yd , and G = I,

this optimal control problem is equivalent to (7.1.1).

In these examples A is bounded on X. More specifically it is a compact perturbation


of a multiple of the identity. There are important problems which are not covered by (7.1.1).
For the obstacle problem, considered in Section 4.7.4, the choice X = L2 () results in the
unbounded operator A = − , and hence it is not covered by (7.1.1). If X is chosen as
H01 () and G as the inclusion of H01 () into L2 (), then the solution does not satisfy the
regular point condition. While special techniques allow one to guarantee the existence of
a Lagrange multiplier μ associated with the constraint y ≤ ψ in L2 (), the natural space
for the Lagrange multiplier is H −1 (). This suggests combining the primal-dual active set
method with a regularization method, which will be discussed in a more general context in
Section 8.6. Discretization of the obstacle problem by central finite differences leads to a
system matrix A on X = Rn that is symmetric positive definite and is an M-matrix. This
important class of finite-dimensional variational problems will be discussed in Section 7.2.
Another important class of problems are state-constrained optimal control problem,
for example,

1 α
min |y − ȳ|2 dx + |u|2L2 ()
2  2
subject to

− y = u, y = 0 on ∂, y≤ψ

for y ∈ H01 () and u ∈ L2 (). This problem can be formulated as (7.1.1) with X =
(H 2 () ∩ H01 ()) × L2 () and G the inclusion of X into L2 (). The solution, however,
will not satisfy a regular point condition, and the Lagrange multiplier associated with the
state constraint y ≤ ψ is only a measure in general. This class of problems will be discussed
in Section 8.6.
Let us return to (7.1.2). If the active set

A = {μ + c(Gx − ψ) > 0}

is known, then the linear system reduces to

Ax + E ∗ λ + G∗ μ = a,

Ex = b,

Gx = ψ in A and μ = 0 in Ac .

Here and below we frequently use the notation {f > 0} to stand for {x : f (x) > 0} if
f ∈ L2 () and {i : fi > 0} if f ∈ Rn . The active set at the solution, however, is unknown.

i i

i i
ItoKunisc
i i
2008/6/12
page 192
i i

192 Chapter 7. The Primal-Dual Active Set Method

The primal-dual active set method uses the complementarity condition

μ = max(0, μ + c (Gx − ψ))

as a prediction strategy. Based on the current primal-dual pair (x, μ) the updates for the
active and inactive sets are determined by

I = {μ + c(Gx − ψ) ≤ 0} and A = {μ + c(Gx − ψ) > 0}.

This leads to the following Newton-like method.

Primal-Dual Active Set Method.

(i) Initialize x 0 , μ0 . Set k = 0.


(ii) Set Ik = {μk + c(Gx k − ψ) ≤ 0}, Ak = {μk + c(Gx k − ψ) > 0}.
(iii) Solve for (x k+1 , λk+1 , μk+1 ):
+
Ax k+1 + E ∗ λk 1
+ G∗ μk+1 = a,

Ex k+1 = b,

Gx k+1 = ψ in Ak and μk+1 = 0 in Ik .

(iv) Stop, or set k = k + 1, and return to (ii).

It will be shown in Section 8.4 that the above algorithm can be interpreted as a semismooth
Newton method for solving (7.1.2). This will allow the local convergence analysis of the
algorithm. In this chapter we concentrate on its global convergence, i.e., convergence for
arbitrary initializations and without the necessity for a line search. In case A is positive
definite, a sufficient condition for the existence of solutions to the auxiliary systems of step
(iii) of the algorithm is given by surjectivity of E and surjectivity G : N (E) → Z. In fact
in this case (iii) is the necessary optimality condition for

⎪ 1   

⎨min Ax, x X − a, x X
x∈X 2



subject to Ex = b, Gx = ψ on Ak .

In the remainder of this chapter we focus on the reduced form of (7.1.2) given by

Ax + μ = a, μ = max(0, μ + c (x − ψ)), (7.1.3)

where A ∈ L(Z).
We now derive sufficient conditions which allow us to transform (7.1.2) into (7.1.3).
In a first step we assume that

G is surjective, range(G∗ ) ⊂ kerE, E x̄ = b for some x̄ ∈ (ker E)⊥ . (7.1.4)

i i

i i
ItoKunisc
i i
2008/6/12
page 193
i i

7.1. Introduction and basic properties 193

Note that (7.1.4) implies that G : N (E) → Z is surjective. If not, then there exists a
nonzero z ∈ Z such that (z, Gx)Z = (G∗ z, x)X = 0 for all x ∈ ker E. If we let x = G∗ z,
then |x|2 = 0 and z = 0, since G∗ is injective. Let PE denote the orthogonal projection in
X onto ker E. Then (7.1.2) is equivalent to

Ax̂ + G∗ μ = PE (a − Ax̄), μ = max(0, μ + c (Gx̂ − (ψ − Gx̄),

E ∗ λ = (I − PE )(a − A(PE x̂ + x̄)),

with A = PE APE and x = x̂ + x̄ ∈ ker E + (ker E)⊥ . The first of the above equations is
equivalent to the system

(I − PG )A((I − PG ) x̂ + PG x̂) + G∗ μ = (I − PG )PE (a − A x̄),


(7.1.5)
PG A((I − PG ) x̂ + PG x̂) = PG PE (a − A x̄),

where PG = I − G∗ (G G∗ )−1 G is the orthogonal projection in ker E ⊂ X onto ker G.


Since G∗ is injective the first equation in (7.1.5) is equivalent to

(G G∗ )−1 G A(G∗ (G G∗ )−1 y + x2 ) + μ = (G G∗ )−1 G PE (a − A x̄),

where y = G x1 for x1 ∈ ker E ∩ (ker G)⊥ , x2 ∈ ker G, and x̂ = x1 + x2 . Let

A11 = (GG∗ )−1 GAG∗ (GG∗ )−1 , A12 = (GG∗ )−1 GAPG , A22 = PG APG

and

a1 = (GG∗ )−1 GPE (a − Ax̄), a2 = PG PE (a − Ax̄).

Then (7.1.5) is equivalent to the following equation in Z × (ker E ∩ ker G):


      
A11 A12 y μ a1
+ = . (7.1.6)
A21 A22 x2 0 a2

Equation (7.1.6) together with

μ = max(0, μ + c(y − (ψ − Gx̄))) (7.1.7)

and E ∗ λ = (I − PE )(a − A(x̂ + x̄)) are equivalent to (7.1.2) where x = x̂ + x̄, with
x̂ = x1 + x2 ∈ ker E, x2 ∈ ker E ∩ ker G, x1 ∈ ker E ∩ (ker G)⊥ , y = Gx1 .
Note that the system matrix in (7.1.6) is positive definite if A restricted to ker E is
positive definite.
Let us now further assume that

A22 is nonsingular. (7.1.8)

Then (7.1.6), (7.1.7) are equivalent to

(A11 − A12 A−1 ∗ −1


22 A12 )y + μ = a1 − A12 A22 a2

i i

i i
ItoKunisc
i i
2008/6/12
page 194
i i

194 Chapter 7. The Primal-Dual Active Set Method

and (7.1.7), which is of the desired form (7.1.3). In the finite-dimensional case, (7.1.3)
admits a unique solution for every a ∈ Rn if and only if A is a P -matrix; see [BePl,
Theorem 10.2.15.]. Recall that A is called a P -matrix if all its principal minors are positive.
In view of the fact that the reduction of (7.1.6) to (7.1.3) was achieved by taking the Schur
complement with respect to A22 it is also worthwhile to recall that the Schur complement
of a P -matrix (resp., M-matrix) is again a P -matrix (resp., M-matrix); see [BePl, p. 292].
For further reference it will be convenient to specify the primal-dual active set algo-
rithm for the reduced system (7.1.3).

Primal-Dual Active Set Method for Reduced System.

(i) Initialize x 0 , μ0 . Set k = 0.

(ii) Set Ik = {μk + c(x k − ψ) ≤ 0}, Ak = {μk + c(x k − ψ) > 0}.

(iii) Solve for (x k+1 , μk+1 ):

Ax k+1 + μk+1 = a,

x k+1 = ψ in Ak and μk+1 = 0 in Ik .

(iv) Stop, or set k = k + 1, and return to (ii).

We use (7.1.3) rather than (7.1.6) for the convergence analysis for the reason of
avoiding additional notation. All convergence results that follow equally well apply to
(7.1.6) where it is understood that the coordinates corresponding to the variable x2 are
treated as inactive throughout the algorithm and the corresponding Lagrange multipliers are
set and updated by 0.
In the following subsections convergence will be proved under various different con-
ditions. These conditions will imply also the existence of a solution to the subsystems in
step (iii) of the algorithm as well as the existence of a unique solution to (7.1.3).
For  ˜ ⊂ {1, . . . , n}, respectively, 
˜ ⊂ , let R˜ denote the restriction operator to
˜ ∗
. Then R˜ is the extension-by-zero operator to  ˜ c . For any A let I be its complement in
R , respectively, , and denote
n


AA = RA ARA , AA,I = RA ARI∗ ,

and analogously for AI and AI,A , and

δxA = RA (x k+1 − x k ), δxI = RI (x k+1 − x k ),

and analogously for δμA and δμI . From (iii) above we have

AAk δxAk + AAk ,Ik δxIk + δμAk = 0,


(7.1.9)
AIk δxIk + AIk ,Ak δxAk − μkIk = 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 195
i i

7.1. Introduction and basic properties 195

The following properties for k = 1, 2, . . . follow from steps (ii) and (iii):
μk (x k − ψ) = 0, μk + (x k − ψ) > 0 on Ak ,

x k − ψ ≥ 0, μk ≥ 0 on Ak , x k ≤ ψ, μk ≤ 0 on Ik , (7.1.10)

δxAk ≤ 0, δμIk ≥ 0,

all statements holding coordinatewise if Z = Rn and pointwise a.e. for Z = L2 ().

Remark 7.1.1. From (7.1.10) it follows that Ak = Ak+1 implies that the solution is found,
i.e., (xk , μk ) = (x ∗ , μ∗ ). In numerical practice it was observed that Ak = Ak+1 can be
used as a stopping criterion; see [HiIK, IK20, IK22].

Remark 7.1.2. The primal-dual active set strategy can be interpreted as a prediction strategy
which, on the basis of (x k , μk ), predicts the true active and inactive sets for (7.1.3), i.e.,
the sets

A∗ = {μ∗ + c(x ∗ − ψ) > 0} and I ∗ = (A∗ )c .

To further pursue this point we assume that the systems in (7.1.3) and step (iii) of the
algorithm admit solutions, and we define the following partitioning of the index set at
iteration level k:

IG = Ik ∩ I ∗ , IB = Ik ∩ A∗ , AG = Ak ∩ A∗ , AB = Ak ∩ I ∗ .

The sets IG , AG give a good prediction and the sets IB , AB give a bad prediction. Let
us denote x = x k+1 − x ∗ , μ = μk+1 − μ∗ , and we denote by G((x k , μk )) the system
matrix for step (iii) of the algorithm:
⎛ ⎞
AIk AIk Ak IIk 0
⎜AAk Ik AAk 0 IAk ⎟
G(xk , μk ) = ⎜
⎝ 0
⎟.
0 IIk 0 ⎠
0 −cIAk 0 0
Then we have the identity
⎛ ⎞
xIk
⎜ xAk ⎟  
G(x k , μk ) ⎜ ⎟ ∗ ∗
⎝ μIk ⎠ = −col 0Ik , 0Ak , 0IG , μIB , 0AG , c(ψ − x )AB . (7.1.11)
μAk
Here we assumed that the components of the equation μ − max{0, μ + c(x − ψ)} = 0 are
ordered as (IG , IB , AG , AB ). Since x k ≥ ψ on Ak and μk ≤ 0 on Ik , we have

|ψ − x ∗ |AB ≤ |x k − x ∗ |AB and |μ∗ |IB ≤ |μk − μ∗ |IB . (7.1.12)

By the definition

yAG = 0, xAB = (ψ − x ∗ )AB , μIG = 0, μIB = −μ∗IB . (7.1.13)

i i

i i
ItoKunisc
i i
2008/6/12
page 196
i i

196 Chapter 7. The Primal-Dual Active Set Method

On the basis of (7.1.11)–(7.1.13) we can draw the following conclusions.


(i) If x k → x ∗ , then, in the finite-dimensional case, there exists an index k̄ such that
IB = AB = ∅ for all k ≥ k̄. Consequently the convergence occurs in finitely many steps.
(ii) By (7.1.11)–(7.1.12) there exists a constant κ ≥ 1 independent of k such that
 
| x| + | μ| ≤ κ |(x k − x ∗ )AB | + |(μk − μ∗ )IB | .

Thus if the incorrectly predicted sets are small in the sense that
1
|(x k − x ∗ )AB | + |(μk − μ∗ )IB | ≤ (|(x k − x ∗ )AB,c | + |(μk − μ∗ )IB,c |),
2κ − 1
where AB,c , IB,c denote the complement of the indices AB , IB , respectively, then
1
|x k+1 − x ∗ | + |μk+1 − μ∗ | ≤ (|x k − y ∗ | + |μk − μ∗ |),
2
and convergence follows.
(iii) If x ∗ < ψ and μ0 + c (x 0 − ψ) ≤ 0 (e.g., y 0 = ψ, μ0 = 0), then the algorithm
converges in one step. In fact, in this case AB = IB = ∅.

7.2 Monotone class


In this section we assume that Z = Rn and that A is an M-matrix, i.e., it is nonsingular, its
nondiagonal elements are nonnegative, and A−1 ≥ 0. Then there exists a unique solution
(x ∗ , μ∗ ) to (7.1.3).
Example 7.3. Consider the discretized obstacle problem on the square (0, 1) × (0, 1). Let
h = N1 and let ui,j denote the approximation of the solution at the nodes (ih, j h), 0 ≤ i, j ≤
N . Using the central difference approximation results in a finite-dimensional variational
inequality satisfying
4ui,j − ui+1,j − ui−1,j − ui,j +1 − ui,j −1
+ μi,j = fi,j ,
h2

μi,j = max(0, μi,j + c (ui,j − ψi,j )

for 1 ≤ i, j ≤ N − 1, and u0,j = uN,j = ui,0 = ui,N = 0. The resulting matrix A in


2
R(N−1) is an M-matrix.
The following theorem asserts the convergence of the iterates of the primal-dual active
set method for (7.1.3).

Theorem 7.4. Assume that A is an M-matrix. Then xk → x ∗ for arbitrary initial data.
Moreover x ∗ ≤ x k+1 ≤ x k for all k ≥ 1, x k ≤ ψ for all k ≥ 2, and there exists k0 such that
μk ≥ 0 for all k ≥ k0 .

Proof. Since A is an M-matrix we have A−1 −1


I ≥ 0 and AI AI,A ≤ 0 for every index
partition of {1, . . . , n} into I and A. Since δxIk = −AIk AIk Ak δxAk − A−1
−1
Ik δμIk by (7.1.9)

i i

i i
ItoKunisc
i i
2008/6/12
page 197
i i

7.3. Cone sum preserving class 197

it follows that δxIk ≤ 0. Together with δxAk ≤ 0, which follows from the third equation
in (7.1.10), this implies that x k+1 ≤ x k for k ≥ 1. Next we show that x k is feasible for
k ≥ 2. Due to monotonicity of x k with respect to k it suffices to show this for k = 2. For i
such that (x 1 − ψ)i > 0 we have μ1i = 0 by (7.1.10) and hence μ1i + c (x 1 − ψ)i > 0 and
i ∈ A1 . Since x 2 = ψ on A1 and x 2 ≤ x 1 it follows that x 2 ≤ ψ.
To verify that x ∗ ≤ x k for k ≥ 1, note that

aIk−1 = μ∗Ik−1 + AIk−1 xI∗k−1 + AIk−1 Ak−1 xA



k−1

= AIk−1 xIk k−1 + AIk−1 Ak−1 ψAk−1 ,

and consequently
   ∗ 
AIk−1 xIk k−1 − xI∗k−1 = μ∗Ik−1 + AIk−1 Ak−1 xA k−1
− ψAk−1 .

Since μ∗Ik−1 ≥ 0 and xA ∗


k−1
≤ ψAk−1 , the M-matrix properties of A imply that xIk k−1 ≥ xI∗k−1
and consequently x k ≥ x ∗ for all k ≥ 1.
Turning to the feasibility of μk assume that for a pair of indices (k̄, i), k̄ ≥ 1, we have
μi < 0. Then necessarily i ∈ Ak̄−1 , xik̄ = ψi , and μk̄i + c(xik̄ − ψi ) < 0. It follows that

i ∈ Ik̄ , μk̄+1
i = 0, and μk̄+1
i + c(xik̄+1 − ψi ) ≤ 0, since xik+1 ≤ ψi , k ≥ 1. Consequently
i ∈ Ik̄+1 and by induction i ∈ Ik for all k ≥ k̄ + 1. Thus, whenever a coordinate of μk
becomes negative at iteration k̄, it is zero from iteration k̄+1 onwards, and the corresponding
primal coordinate is feasible. Due to finite-dimensionality of Rn it follows that there exists
ko such that μk ≥ 0 for all k ≥ ko .
Monotonicity of x k and x ∗ ≤ x k ≤ ψ for k ≥ 2 imply the existence of x̄ such
that lim x k = x̄ ≤ ψ. Since μk = Ax k + a ≥ 0 for all k ≥ ko , there exists μ̄ such
that lim μk = μ̄ ≥ 0. Together with the complementarity property μ̄(x̄ − ψ), which is a
consequence of the first equation in (7.1.10), it follows that (x̄, μ̄) = (x ∗ , μ∗ ).

7.3 Cone sum preserving class


For rectangular matrices B ∈ Rn×m we denote by  · 1 the subordinate matrix norm when
both Rn and Rm are endowed with the 1-norms. Moreover, B+ denotes the n × m matrix
containing the positive parts of the elements of B. Recall that a square matrix is called a
P -matrix if all its principle minors are positive.

of the index set {1, . . . , n} into


Theorem 7.5. If A is a P -matrix and for every partitioning!
disjoint subsets I and A we have (A−1
I A ) 
IA + 1 < 1 and −1
i∈I (AI xI )i > 0 for xI ≥ 0

with xI  = 0, then limk→∞ x = x .
k

The third condition in the above theorem motivates the terminology cone sum pre-
serving. If A is an M-matrix,
! then the conditions of Theorem 7.5 are satisfied. The proof
will reveal that M(x k ) = ni=1 xik is a merit function.

Proof. From (7.1.9) and the fact that x k+1 = ψ on Ak we have for k = 1, 2, . . .

(x k+1 − x k )Ik = A−1 −1 k


Ik AIk Ak (x − ψ)Ak + AIk μIk
k

i i

i i
ItoKunisc
i i
2008/6/12
page 198
i i

198 Chapter 7. The Primal-Dual Active Set Method

and upon summation over the inactive indices


   
(xik+1 − xik ) = A−1
Ik AIk Ak (x − ψ)Ak i +
k
(A−1 k
Ik μIk )i . (7.3.1)
i∈Ik i∈Ik i∈Ik

Using again that x k+1 = ψ on Ak , this implies that


n   
(xik+1 −xik ) = − (xik −ψi )+ (A−1
Ik AIk Ak (x −ψ)Ak )i +
k
(A−1 k
Ik μIk )i . (7.3.2)
i=1 i∈Ak i∈Ik i∈Ik

k
Since xA k
≥ ψAk it follows that


n 
(xik+1 − xik ) ≤ ((A−1
Ik AIk Ak )+ 1 − 1) |x − ψ|1,Ak +
k
(A−1 k
Ik μIk )i < 0 (7.3.3)
i=1 i∈Ik

unless x k is the solution to (7.1.3). In fact, if |x k − ψ|1,Ak , then x k ≤ ψ on Ak and μk ≥ 0


on Ak . If moreover μIk = 0, then μk ≥ 0 and x k ≤ ψ on . Together with the first
equation in (7.1.10), this implies that {(x k , μk )} satisfies the complementarity conditions.
It also satisfies Ax k + μk = a and hence (x k , μk ) is a solution to (7.1.3). Consequently


n
x k → M(x k ) = xik
i=1

acts as a merit function for the algorithm. Since there are only finitely many possible choices
for active/inactive sets, there exists an iteration index k̄ such that Ik̄ = Ik̄+1 . In this case
(x k̄+1 , μk̄+1 ) is a solution to (7.1.3). In fact, in view of (iii) of the algorithm it suffices to show
that x k̄+1 and μk̄+1 are feasible. This follows from the fact that due to Ik̄ = Ik̄+1 we have
c(xik̄+1 − ψi ) = μk̄+1 i + c(xik̄+1 − ψi ) ≤ 0 for i ∈ Ik̄ and μk̄+1i + c(xik̄+1 − ψi ) = μk̄+1
i >0
for i ∈ Ak̄ . From (7.1.10) we deduce μ (x k̄+1 k̄+1
− ψ) = 0, and hence the complementarity
conditions hold and the algorithm converges in finitely many steps.

A perturbation result. We now discuss the primal-dual active set strategy for the case
where the matrix A can be expressed as an additive perturbation of an M-matrix.

Theorem 7.6. Assume that A = M + K with M an M-matrix and K an n × n matrix.


If K1 is sufficiently small, then the primal-dual active set algorithm is well defined and
limk→∞ (x k , μk ) = (x ∗ , μ∗ ), where (x ∗ , μ∗ ) is a solution to (7.1.3). If y T Ay > 0 for y  = 0,
then the solution to (7.1.3) is unique.

Proof. As a consequence of the assumption that M is an M-matrix all principal submatrices


of M are M-matrices as well [BePl]. Let S denote the set of all subsets of {1, . . . , n} and
A its complement. Define

ρ = sup MI−1 KI 1 and σ = sup BIA (K)1 , (7.3.4)


I∈S I∈S

i i

i i
ItoKunisc
i i
2008/6/12
page 199
i i

7.3. Cone sum preserving class 199

where BIA (K) = MI−1 KIA − MI−1 KI (M + K)−1 I AIA . Assume that K is chosen such that
ρ < 12 and σ < 1. For every subset I ∈ S the inverse of AI exists and can be expressed as

  
A−1
I = (II + −MI−1 KI )i MI−1 .
i=1

Consequently the algorithm is well defined. Proceeding as in the proof of Theorem 7.5 we
arrive at

n    
(xik+1 − xik ) = − (xik − ψi ) + A−1
Ik AIk Ak (x − ψ)Ak
k
i
+ (A−1 k
Ik μIk )i ,
i=1 i∈Ak i∈Ik i∈Ik

where μki ≤ 0 for i ∈ Ik and xik ≥ ψi for i ∈ Ak . Below we drop the index k with Ik and
Ak . Note that A−1 −1 −1 −1
I AIA ≤ MI KIA − MI KI (M + K)I AIA = BIA (K). Here we used
−1 −1 −1 −1 −1
(M + K)I − MI = −MI KI (M + K)I and MI MIA ≤ 0. This implies

n    
(xik+1 − xik ) ≤ − (xik − ψi ) + BIA (K)(x k − ψ)A i + (A−1 k
I μI )i .
i=1 i∈A i∈I i∈I
(7.3.5)
We estimate
⎛ ⎞
  ∞

(A−1
I μ I )i =
k ⎝MI−1 μkI + (−MI−1 KI )j MI−1 μkI ⎠
i∈I i∈I j =1
i



≤ −|MI−1 μkI |1 + ρ i |MI−1 μkI |1
j =1

 
1
= (α − 1)|MI−1 μkI |1 + − (α + 1) |MI−1 μkI |1 = (α − 1)|MI−1 μkI |1 ,
1−ρ
where we set α = ρ
1−ρ
∈ (0, 1) by (7.3.4). This estimate, together with (7.3.4) and (7.3.5),
implies that

n
(xik+1 − xik ) ≤ (σ − 1)|xA
k
k
− ψAk |1 + (α − 1)|MI−1
k
μkIk |1 .
i=1

Now it can! be verified in the same manner as in the proof of Theorem 7.5 that x k →
M(x ) = ni=1 xik is a merit function for the algorithm and convergence of (x k , μk ) to
k

a solution (x ∗ , μ∗ ) follows. If there are two solutions to (7.1.3), then their difference y
satisfies y t Ay ≤ 0 and hence y = 0 and uniqueness follows.

Observe that the M-matrix property is not stable under arbitrarily small perturbations
since off-diagonal elements may become positive. Theorem 7.6 guarantees that convergence
of the primal-dual active set strategy for arbitrary initial data is preserved for sufficiently
small perturbations K of an M-matrix. Therefore, Theorem 7.6 is also of interest in con-
nection with numerical implementations of the primal-dual active set algorithm.

i i

i i
ItoKunisc
i i
2008/6/12
page 200
i i

200 Chapter 7. The Primal-Dual Active Set Method

7.4 Diagonally dominated class


We again consider the reduced problem (7.1.3),
Ax + μ = a, μ = max(0, μ + c (x − ψ)), (7.4.1)
but differently from Sections 7.2 and 7.3 we admit also the infinite-dimensional case. Suf-
ficient conditions related to diagonal dominance of A will be given which imply that
   
+ k+1 −
M(x , μ ) = max β
k+1 k+1
|(x k+1
− ψ) | dx, |(μ ) | dx (7.4.2)
Ik Ak

with β > 0 acts as a merit functional for the primal-dual algorithm. Here we set φ + =
max(φ, 0) and φ − = −min(φ, 0). The natural norm associated to this merit functional is
the L1 ()-norm and consequently we assume that
A ∈ L(L1 ()), a ∈ L1 (), and ψ ∈ L1 (). (7.4.3)
The analysis of this section can also be used to obtain convergence in the Lp ()-norm for
any p ∈ (1, ∞) if the norms in the integrands of M are replaced by | · |p -norms and the
L1 ()-norms below are replaced by Lp ()-norms as well.
The results also apply for Z = Rn . In this case the integrals in (7.4.2) are replaced
by sums over the respective index sets.
We assume that there exist constants ρi , i = 1, . . . , 5, such that for all partitions A
and I of  and for all φA ≥ 0 in L2 (A) and φI ≥ 0 in L2 (I)
|[A−1 −
I φI ] | ≤ ρ1 |φI |,
(7.4.4)
|[A−1 +
I AIA φA ] | ≤ ρ2 |φA |

and
|[AA φA ]− | ≤ ρ3 |φA |,

|[AAI A−1 −
I φI ] | ≤ ρ4 |φI |, (7.4.5)

|[AAI A−1 +
I AIA φA ] | ≤ ρ5 |φA |.

Here | · | denotes the L1 ()-norm. Assumption (7.4.4) requires in particular the existence
of A−1
I . By a Schur complement argument with respect to the sets Ik and Ak this implies
existence of a solution to the linear systems in step (iii) of the algorithm for every k.

Theorem 7.7. If (7.4.3), (7.4.4), (7.4.5) hold and ρ = max(β ρ1 + ρ2 , ρβ3 + ρ4 + ρβ5 ) <
1, then M is a merit function for the primal-dual algorithm of the reduced system and
limk→∞ (x k , μk ) = (x ∗ , μ∗ ) in L1 () × L1 (), with (x ∗ , μ∗ ) a solution to (7.4.1).

Proof. For every k ≥ 1 we have (x k+1 − ψ)+ ≤ (x k+1 − x k )+ on Ik and (μk+1 )− = (δμ)−
on Ak . Therefore
   
M(x k+1 , μk+1 ) ≤ max β (δxIk )+ , (δμAk )− . (7.4.6)
Ik Ak

i i

i i
ItoKunisc
i i
2008/6/12
page 201
i i

7.4. Diagonally dominated class 201

From (7.1.9) we deduce that

δxIk = −A−1 −1
Ik (−μIk ) + AIk AIk Ak (−δxAk ),
k

with μkIk ≤ 0 and δxAk ≤ 0. By (7.4.4) therefore

|(δxIk )+ | ≤ ρ1 |μkIk | + ρ2 |δxAk |


 
= ρ1 |(μkIk )− | + ρ2 (xk − ψ)+
Ik ∩Ak−1 Ak ∩Ik−1 (7.4.7)
 
ρ2
≤ ρ1 + M(x k , μk ).
β
Similarly, by (7.1.9)

δμAk = AAk (−δxAk ) + AAk Ik A−1 −1


Ik (−μIk ) − AAk Ik AIk AIk Ak (−δxAk ).
k

Since δxAk ≤ 0 and μkIk ≤ 0, we find by (7.4.5)


 
− ρ3 + ρ5
|(δμAk ) | ≤ ρ3 |δxAk | + ρ4 |μkIk | + ρ5 |δxAk | ≤ + ρ4 M(x k , ν k ), (7.4.8)
β
and therefore
 
ρ3 + ρ5
M(x k+1 k+1
,μ ) ≤ max βρ1 + ρ2 , + ρ4 M(x k , μk ) = ρ M(x k , μk ).
β

Thus, if ρ < 1, then M is a merit functional. Furthermore M(x k+1 , μk+1 ) ≤ ρ k M(x 1 , μ1 ).
Together with (7.4.7), (7.4.8), and (7.1.9) it follows that (x k , μk ) is a Cauchy sequence.
Hence there exists (x ∗ , μ∗ ) such that limk→∞ (x k , μk ) = (x ∗ , μ∗ ) and Ax ∗ + μ∗ =
a, μ∗ (x ∗ −ψ) = 0 a.e. in . Since (x k −ψ)+ → (x ∗ −ψ)+ as k → ∞ and limk→∞  (x k+1 −
ψ)+ = 0, it follows that x ∗ ≤ ψ. Similarly, one argues that μ∗ ≥ 0. Thus (x ∗ , μ∗ ) is a
solution to (7.4.1).

Concerning the uniqueness of the solution to (7.4.1), assume that A ∈ L(L2 ()) and
that (Ay, y)L2 () > 0 for all y  = 0. Assume further that (x ∗ , μ∗ ) and (x̂, μ̂) are solutions to
(7.4.1) with x̂ − x ∗ ∈ L2 (). Then (x̂ − x ∗ , A(x̂ − x ∗ ))L2 () ≤ 0 and therefore x̂ − x ∗ = 0.

Remark 7.4.1. In the finite-dimensional case the integrals in the definition of M must be
replaced by sums over the active/inactive index sets. If A is an M-matrix, then ρ1 = ρ2 =
ρ5 = 0 and ρ < 1 if ρβ3 + ρ4 < 1. This is the case if A is diagonally dominant in the sense
that ρ4 < 1 and β is chosen sufficiently large. If these conditions are met, then ρ < 1 is
stable under perturbations of A.

Remark 7.4.2. Consider the infinite-dimensional case with A = αI + K, where α > 0,


K ∈ L(L1 ()), and Kφ ≥ 0 for all φ ≥ 0. This is the case for the operators in Examples 7.1
and 7.2, as can be argued by using the maximum principle. Let K denote the norm of

i i

i i
ItoKunisc
i i
2008/6/12
page 202
i i

202 Chapter 7. The Primal-Dual Active Set Method

K in K ∈ L(L1 ()). For K < α and any I ⊂  we have A−1 I = α II − α K I A I


1 1 −1
K
and hence ρ1 ≤ α(α−K) . Moreover ρ3 = 0. The conditions involving ρ2 , ρ4 , and ρ5 are
K K2 K2
satisfied with ρ2 = α−K
, ρ4 = α(α−K)
, and ρ5 = α−K
.

7.5 Bilateral constraints, diagonally dominated class


Consider the quadratic programming with the bilateral constraints
1   
min Ax, x X − a, x X
2
subject to
Ex = b, ϕ ≤ Gx ≤ ψ
with conditions on A, E, and G as in (7.1.1). From Example 4.53 of Section 4.7, we recall
that the necessary optimality condition for this problem is given by

Ax + E ∗ λ + G∗ μ = a,

Ex = b, (7.5.1)

μ = max(0, μ + c(Gx − ψ)) + min(0, μ + c(Gx − ϕ)),

where max as well as min are interpreted as pointwise a.e. operations if Z = L2 () and
coordinatewise for Z = Rn .

Primal-Dual Active Set Algorithm.


(i) Initialize x 0 , μ0 . Set k = 0.

(ii) Given (x k , μk ), set


A+k = {μ + c(Gx − ψ) > 0},
k k

Ik = {μk + c(Gx k − ψ) ≤ 0 ≤ μk + c(Gx k − ϕ)},


A−k = {μ + c(Gx − ϕ) < 0}.
k k

(iii) Solve for (x k+1 , λk+1 , μk+1 ):

Ax k+1 + E ∗ λk+1 + G∗ μk+1 = a,

Ex k+1 = b,

Gx k+1 = ψ in A+
k, μk+1 = 0 in Ik , and Gx k+1 = ϕ in A−
k.

(iv) Stop, or set k = k + 1, and return to (ii).


We assume the existence of a solution to the auxiliary systems in step (iii). In case
A is positive definite a sufficient condition for existence is given by surjectivity of E and

i i

i i
ItoKunisc
i i
2008/6/12
page 203
i i

7.5. Bilateral constraints, diagonally dominated class 203

surjectivity of G : N (E) → Z. As in the unilateral case we shall consider a transformed


version of (7.5.1). If (7.1.4) and (7.1.8) are satisfied, then (7.5.1) can be transformed into
the equivalent system

Ax + μ = a, μ = max(0, μ + c (x − ψ)) + min(0, μ + c (x − ϕ)), (7.5.2)

where A is a bounded operator on Z. The algorithm for this reduced system is obtained
from the above algorithm by replacing G by I and deleting the terms involving E and E ∗ .
As in the unilateral case, if one does not carry out the reduction step from (7.1.6) to (7.5.2),
then the coordinates corresponding to x2 are treated as inactive ones in the algorithm. We
henceforth concentrate on the infinite-dimensional case and give sufficient conditions for

M(x k+1 , μk+1 ) = max ((x k+1 − ψ)+ + (x k+1 − ϕ)− ) dx,
Ik

   (7.5.3)
k+1 − k+1 +
(μ ) dx + (μ ) dx
A+
k A−
k

to act as a merit function for the algorithm applied to the reduced system. In the finite-
dimensional case the integrals must be replaced by sums over the respective index sets. We
note that (iii) with G = I implies the complementarity property

(x k − ψ)(x k − ϕ)μk = 0 a.e. in . (7.5.4)

As in the previous section the merit function involves L1 -norms and accordingly we
aim for convergence in L1 (). We henceforth assume that

A ∈ L(L1 ()), a ∈ L1 (), ψ and ϕ ∈ L1 (). (7.5.5)

Below  ·  denotes the norm of operators in L(L1 ()). The following conditions will be
used: There exist constants ρi , i = 1, . . . , 5, such that for arbitrary partitions A ∪ I = 
we have
A−1
I  ≤ ρ1 ,
(7.5.6)
A−1
I AIA  ≤ ρ2

and
AAk − c I  ≤ ρ3 ,

AAI A−1
I  ≤ ρ4 , (7.5.7)

AAI A−1
I AIA  ≤ ρ5 .

We further set ρ = 2 max( max(ρ1 , ρ2 , ρc2 ), max(ρ3 + ρ5 , ρ4 ), ρ3 +ρ


c
5
).

Theorem 7.8. If (7.5.5), (7.5.6), (7.5.7) hold and ρ < 1, then M is a merit function for
the primal-dual algorithm of the reduced system (7.5.2) and limk→∞ (x k , μk ) = (x ∗ , μ∗ ) in
L1 () × L1 (), with (x ∗ , μ∗ ) a solution to (7.5.2).

i i

i i
ItoKunisc
i i
2008/6/12
page 204
i i

204 Chapter 7. The Primal-Dual Active Set Method

Proof. For δx = x k+1 − x k and δμ = μk+1 − μk we have

AA+k δxA+k + AA+k Ik δxIk + AA+k A−k δxA−k + δμA+k = 0,

AIk δxIk + AIk A+k δxA+k + AIk A−k δxA−k − μkIk = 0, (7.5.8)

AA−k δxA−k + AA−k Ik δxIk + AA−k A+k δxA+k + δμA−k = 0

with

⎨ >0 on A+ +
k−1 ∩ Ak ,
μkA+ =0 on Ik−1 ∩ A+k, (7.5.9)
k ⎩
> c (ψ − ϕ) on A−
k−1 ∩ A +
k,

⎨ [c(ϕ − ψ), 0) on A+
k−1 ∩ Ik ,
μkIk ∈ =0 on Ik−1 ∩ Ik , (7.5.10)

(0, c(ψ − ϕ)] on A−
k−1 ∩ Ik ,

⎨ < c (ϕ − ψ) on A+ −
k−1 ∩ Ak ,
μkA− =0 on Ik−1 ∩ A−
k, (7.5.11)
k ⎩
<0 on A− −
k−1 ∩ Ak ,

⎨ =0 on A+ +
k−1 ∩ Ak ,
δxAk + <0 on Ik−1 ∩ A+
k, (7.5.12)
⎩ −
Ak−1 ∩ A+
k
= ψ − ϕ < μc on k,

A+ −
k
⎨ = ϕ − ψ > μc on k−1 ∩ Ak ,
δxA−k >0 on Ik−1 ∩ A−
k,
(7.5.13)

=0 on A−
k−1 ∩ A −
k.
From (7.5.4)

(xIk+1
k
− ψIk )+ ≤ (δxIk )+ and (xIk+1
k
− ϕIk )− ≤ (δxIk )− . (7.5.14)

This implies that


   
k+1 − k+1 +
M(x k+1
,μ k+1
) ≤ max |δxIk | , (μ ) + (μ ) . (7.5.15)
Ik A+
k A−
k

From (7.5.8), (7.5.6), (7.5.10), (7.5.12), and (7.5.13) we have

|δxIk | ≤ ρ1 |μkIk | + ρ2 |δxAk |


   
≤ ρ1 |(μkI + )− | + |(μkI − )+ | + ρ2 |δxA+k | + |δxA−k |
k ∩Ak−1 k ∩Ak−1

  
≤ ρ1 |(μkI + )− | + |(μkI − )+ | + ρ2 |(x k − ψ)+
A+ ∩I
| + 1c |(μkA+ ∩A− )+ |
k ∩Ak−1 k ∩Ak−1 k k−1 k k−1


+|(x k − ϕ)−
A− ∩I
| + 1c |(μkA− ∩A+ )− | .
k k−1 k k−1

i i

i i
ItoKunisc
i i
2008/6/12
page 205
i i

7.5. Bilateral constraints, diagonally dominated class 205

This implies
 
ρ2
|δxIk | ≤ 2 max ρ1 , ρ2 , M(x k , μk ). (7.5.16)
c
From (7.5.8) further

Ak − (μAk − cδxAk ) = g,
μk+1 k
(7.5.17)

where g = (cI − AδxAk )δxAk − AAk Ik A−1


Ak AIk Ak δxAk . By (7.5.9), (7.5.12), we have

μkA+ − c δxA+k ≥ 0.
k

Similarly, by (7.5.11), (7.5.13)


μkA− − c δxA−k ≤ 0.
k

Consequently,
|(μk+1
A+
)− | + |(μk+1
A−
)+ | ≤ |gAk | ≤ (ρ3 + ρ5 )|δxAk | + ρ4 |μIk |
k k

  (7.5.18)
ρ3 + ρ5
≤ 2 max ρ4 , ρ3 + ρ5 , M(x k , μk ).
c
By (7.5.15), (7.5.16), and (7.5.18)
    
ρ2 ρ3 + ρ5
M(x , μ ) ≤ 2 max max ρ1 , ρ2 ,
k+1 k+1
, max ρ4 , ρ3 + ρ5 , M(x k , μk ).
c c
It follows that M(x k+1 , μk+1 ) ≤ ρ k M(x 1 , μ1 ) and if ρ < 1, then M(x k , μk ) → 0 as k →
∞. From the estimates leading to (7.5.16) it follows that x k is a Cauchy sequence. Moreover
μk is a Cauchy sequence by (7.5.8). Hence there exist (x ∗ , μ∗ ) such that limk→∞ (x k , μk ) =
(x ∗ , μ∗ ). By Lebesgue’s bounded convergence theorem and since M(x k , μk ) → 0, it
follows that ϕ ≤ x ∗ ≤ ψ. Clearly Ax ∗ + μ∗ = a and (x ∗ − ψ)(x ∗ − ϕ)μ∗ = 0 by (7.5.4).
This last equation implies that μ∗ = 0 on I ∗ = {ϕ < x ∗ < ψ}. It remains to show that
μ∗ ≥ 0 on A∗,+ = {x ∗ = ψ} and μ∗ ≤ 0 on A∗,− = {x ∗ = ϕ}. Let s ∈ A∗,+ be such that
x k (s) and μk (s) converge. Then μ∗∗(s) ≥ 0. If not, then μ∗ (s) < 0 and there exists k̄ such
that μk (s) + c(x k (s) − ψ(s)) ≤ μ 2(s) < 0 for all k ≥ k̄. Then s ∈ I k and μk+1 = 0 for
k ≥ k̄, contradicting μ∗ (s) < 0. Analogously one shows that μ∗ ≤ 0 on A∗,− .

Conditions (7.5.6) and (7.5.7) are satisfied for additive perturbations of the operator
cI , for example. This can be deduced from the following result.

Theorem 7.9. Assume that A = cI + K with K ∈ L(L1 ()) and K < c and that (7.5.5),
(7.5.6), (7.5.7) are satisfied. If
   
K ρ3 + ρ5
ρ̄ = 2 max max ρ1 , ρ2 , max(ρ3 + ρ5 , ρ4 ), < 1,
c c
then the conclusions of the previous theorem are valid.

i i

i i
ItoKunisc
i i
2008/6/12
page 206
i i

206 Chapter 7. The Primal-Dual Active Set Method

Proof. We follow the proof of Theorem 7.8 and eliminate the overestimate (7.5.14). Let
P = {xIk+1
k
− ψ > 0} ∩ Ik . We find

⎨ ≤ δxP ∩Ik−1 P ∩ Ik−1 ,
k
⎪ on
x k+1
−ψ = δxP ∩A+
k
on P ∩ A+ k−1 ,

⎩ = δx
k−1

k
P ∩A−
+ (ϕ − ψ) −
P ∩Ak−1 on P ∩ A k−1 .
k−1

This estimate, together with A−1 = 1c I − 1c KA−1 , implies that


  
(x k+1 − ψ)+ ≤ δx k + ϕ−ψ
Ik P P ∩A−
k−1

  
≤ A−1
Ik μIk −
k
A−1
Ik AIk Ak δxAk + ϕ−ψ
P P P ∩A−
k−1

   
1 1
= μkIk + ϕ−ψ − KIk A−1 k
Ik μIk − A−1
Ik AIk Ak δxAk
c P −
P ∩Ak−1 c P P

 
1
≤− KIk A−1
Ik μIk −
k
A−1
Ik AIk Ak δxAk ,
c P P

and hence

K
(x k+1 − ψ)+ ≤
ρ1 |μkIk | + ρ2 |δxAk |.
Ik c

An analogous estimate can be obtained for Ik (x k+1 − ϕ)− and we find
 
+ K
(x k+1
− ψ) (x k+1 − ϕ)− ≤ ρ1 |μkIk | + ρ2 |δxAk |.
Ik Ik c

We can now proceed as in the proof of Theorem 7.8.

Example 7.10. We apply Theorem 7.9, A = I + K ∈ L(L1 ()). By Neumann series


γ2
arguments we find ρ̄ = 2 max( 1−γ
γ
, max(γ + 1−γ γ
, 1−γ )) = 1−γ
γ
, where γ = K, and
ρ̄ < 1 if K < 3 . If A = I + K is replaced by A = cI + K, then ρ̄ < 1 if γ < 2c+1
1 c
, in
c2
case c ≥ 1, and ρ̄ < 1 if c+2
, in case c ≤ 1.

Example 7.11. Here we consider the finite-dimensional case A = I + K ∈ Rn×n , where


Rn is endowed with the 1 -norm. Again Theorem 7.9 is applicable and ρ̄ < 1 if K < 13 ,
where  ·  denotes the matrix-norm subordinate to the 1 -norm of Rn . Recall that this norm
is given by the maximum over the column sums of the absolute values of the matrix.

7.6 Nonlinear control problems with bilateral constraints


In this section we consider a special case of bilaterally constrained problems that were
already investigated in the previous section, where the operator A is of the form T ∗ T . This

i i

i i
ItoKunisc
i i
2008/6/12
page 207
i i

7.6. Nonlinear control problems with bilateral constraints 207

will allow us to obtain improved sufficient conditions for global convergence of the primal-
dual active set algorithm. The primary motivation for such problems are optimal control
problems, and we therefore slightly change the notation to that which is more common in
optimal control.
Let U and Y be Hilbert spaces with U = L2 (), where  is a bounded measurable
set in Rd , and let T : U → Y be a, possibly nonlinear, continuously differentiable, injective,
mapping with Fréchet derivative denoted by T . Further let ϕ, ψ ∈ U with ϕ < ψ a.e. in
. For α > 0 and z ∈ Y consider

1 α
min J (u) = |T (u) − z|2Y + |u|2U . (7.6.1)
ϕ≤u≤ψ 2 2

The necessary optimality condition for (7.6.1) is given by



αu + T (u)∗ (T (u) − z) + μ = 0,
(7.6.2)
μ = max(0, μ + α(u − ψ)) + min(0, μ + α(u − ϕ)),

where (u, μ) ∈ U × U , and max as well as min are interpreted as pointwise a.e. operations.
If the upper or lower constraint are not present, we can set ψ = ∞ or ϕ = −∞.

Example 7.12. Let  ⊂ Rn be a bounded domain in Rn with Lipschitz continuous bound-


) ⊂ , )
ary , let  ⊂ be measurable subsets, and consider the optimal control problem

1 α
min |y − z|2Y + | u|2U subject to
ϕ≤u≤ψ 2 2

(∇y, ∇v) + (y, v) = (u, v)) for all v ∈ H 1 (), (7.6.3)
˜ We set Y = L2 ()
where z ∈ L2 (). ) and U = L2 ()
). Define L u = y with L : L2 ()
) →
2
L () as the solution operator to the inhomogeneous Neumann boundary value problem
) L : L () ) where R 2 )
(7.6.3) and set T = R 2
) → L2 (), ) : L () → L ()
2
denotes the
∗ 2 ) 2 )
canonical restriction operator. Then T : L () → L ( ) is given by

T ∗ = R) L∗ E
)

with R) the restriction operator to )


and E ) to .
) the extension-by-zero operator from 
Further the adjoint L∗ : L2 () → L2 ( ) of L is given by L∗ w̃ = τ p, where τ is the
Dirichlet trace operator from H 1 () to L2 ( ) and p is the solution to

(∇p, ∇w) + (p, w) = (w̃, w) for all w ∈ H 1 (). (7.6.4)

The fact that L and L∗ are adjoint to each other follows by setting v = p in (7.6.3) and
w = y in (7.6.4).

We next specify the primal-dual active set algorithm for (7.6.1). The iteration index
is denoted by k and an initial choice (u0 , μ0 ) is assumed to be available.

i i

i i
ItoKunisc
i i
2008/6/12
page 208
i i

208 Chapter 7. The Primal-Dual Active Set Method

Primal-Dual Active Set Algorithm.

(i) Given (uk , μk ), determine


A+k = {μ + α(u − ψ) > 0},
k k

Ik = {μ + α(u − ψ) ≤ 0 ≤ μk + α(uk − ϕ)},


k k

A−k = {μ + α(u − ϕ) < 0}.


k k

(ii) Determine (uk+1 , μk+1 ) from

uk+1 = ψ on A+
k, u
k+1
= ϕ on A−
k, μ
k+1
= 0 on Ik ,

and

α uk+1 + T (uk+1 )∗ (T (uk+1 ) − z) + μk+1 = 0. (7.6.5)

Note that the equations for (uk+1 , μk+1 ) in step (ii) of the algorithm constitute the
necessary optimality condition for the auxiliary problem

⎨ min 12 |T (u) − z|2Y + α2 |u|2U over u ∈ U
(7.6.6)

subject to u = ψ on A+ k , u = ϕ on A −
k .

The analysis in this section relies on the fact that (7.6.2) can be equivalently expressed as


⎪ y = T (u),






⎨ λ = −T (u)∗ (y − z),
(7.6.7)



⎪ α u − λ + μ = 0,





μ = max(0, μ + (u − ψ)) + min(0, μ + (u − ϕ)),

where λ = −T (u)∗ (T (u)−z) is referred to as the adjoint state. Analogously, for uk+1 ∈ U ,
setting y k+1 = T (uk+1 ), λk+1 = −T (uk+1 )∗ (T (uk+1 ) − z), (7.6.5) can equivalently be
expressed as ⎧ k+1
⎪ y
⎪ = T (uk+1 ), where

⎪ ⎧



⎪ ⎨ ψ on A+ k,



⎨ u k+1
= 1 k+1
λ on Ik,
⎩ α
ϕ on A− k, (7.6.8)



⎪ k+1

⎪ λ = −T (uk+1 )∗ (y k+1 − z),






α uk+1 − λk+1 + μk+1 = 0.
In what follows we give conditions which guarantee convergence of the primal-dual
active set strategy for linear and certain nonlinear operators T from arbitrary initial data.
The convergence proof is based on an appropriately defined functional which decays when

i i

i i
ItoKunisc
i i
2008/6/12
page 209
i i

7.6. Nonlinear control problems with bilateral constraints 209

evaluated along the iterates of the algorithm. An a priori estimate for the adjoint variable λ
in (7.6.8) will play an essential role.
To specify the condition alluded to in the above let us consider two consecutive iterates
of the algorithm. For every k = 1, 2, . . . , the sets A+ −
k , Ak , and Ik give a mutually disjoint
decomposition of . According to (i) and (ii) in the form (7.6.8) we find
⎧ k

⎪ RA+ on A+k,



uk+1 − uk = 1
(λk+1 − λk ) + RIk on Ik , (7.6.9)


α


⎩ k
RA− on A−k,

where the residual R k is given by



⎨ 0 on A+ +
k−1 ∩ Ak ,
+
RAk
+ = ψ − α1 λk = ψ − uk < 0 on Ik−1 ∩ Ak , (7.6.10)

ψ − ϕ < α1 μk on A− +
k−1 ∩ Ak ,
⎧ 1 k
⎨ α μ = α1 λk − ψ ≤ 0 on A+k−1 ∩ Ik ,
RI =
k
0 on Ik−1 ∩ Ik , (7.6.11)
⎩ 1 k
α
μ = α1 λk − ϕ ≥ 0 on A−k−1 ∩ Ik ,

⎨ ϕ − ψ > α1 μk on A+ −
k−1 ∩ Ak ,
RA− =
k
ϕ − α1 λk = ϕ − uk > 0 on Ik−1 ∩ A−k, (7.6.12)

0 on A− −
k−1 ∩ Ak .

Here R k denotes the function defined on  whose restrictions to A+ −


k , Ik , Ak coincide with
k k k
RA+ , RI , and RA− .
We shall utilize the following a priori estimate:

⎨ There exists ρ < α such that
(7.6.13)
⎩ k+1
|λ − λk |U < ρ |R k |U for every k = 1, 2, . . . .

Sufficient conditions for (7.6.13) will be given at the end of this section. The convergence
proof will be based on the following merit functional M : U × U → R given by
  
M(u, μ) = α 2 (|(u − ψ)+ |2 + |(ϕ − u)+ |2 ) dx + |μ− |2 dx + |μ+ |2 dx,
 A+ (u) A− (u)

where A+ (u) = {x : u ≥ ψ} and A− (u) = {x : u ≤ ϕ}. Note that the iterates (uk , μk ) ∈
U × U satisfy
μk (uk − ψ)(ϕ − uk )(x) = 0 for a.e. x ∈ , (7.6.14)
and hence at most one of the integrands of M(uk , μk ) can be strictly positive at x ∈ .

Theorem 7.13. Assume that (7.6.13) holds for the iterates of the primal-dual active set
strategy. Then M(uk+1 , μk+1 ) ≤ α −2 ρ 2 M(uk , μk ) for every k = 1, . . . . Moreover there
exist (u∗ , μ∗ ) ∈ U ×U , such that limk→∞ (uk , μk ) = (u∗ , μ∗ ) and (u∗ , μ∗ ) satisfies (7.6.2).

i i

i i
ItoKunisc
i i
2008/6/12
page 210
i i

210 Chapter 7. The Primal-Dual Active Set Method

Proof. From (7.6.8) we have

μk+1 = λk+1 − αψ on A+
k,

1 k+1
uk+1 = λ on Ik ,
α

μk+1 = λk+1 − αϕ on A−
k.

Using step (ii) of the algorithm in the form of (7.6.8) implies that
⎧ k
⎨ μ >0 on A+ +
k−1 ∩ Ak ,
+
μk+1 = λk+1 − λk + λk − αψ = λk+1 − λk + α(uk − ψ) > 0 on Ik−1 ∩ Ak ,

αuk + μk − αψ ≥ 0 on A− +
k−1 ∩ Ak ,

and therefore
|μk+1,− (x)| ≤ |λk+1 (x) − λk (x)| for x ∈ A+
k. (7.6.15)
Analogously one derives

|μk+1,+ (x)| ≤ |λk+1 (x) − λk (x)| for x ∈ A−


k. (7.6.16)

Moreover
1 k+1
uk+1 − ψ = (λ − λk + λk ) − ψ
α ⎧ 1 k
⎨ αμ ≤ 0 on A+k−1 ∩ Ik ,
1 k+1
= (λ − λk ) + uk − ψ ≤ 0 on Ik−1 ∩ Ik ,
α ⎩ 1 k
α
μ +u−ψ ≤0 on A−k−1 ∩ Ik ,

which implies that


1 k+1
|(uk+1 − ψ)+ (x)| ≤ |λ (x) − λk (x)| for x ∈ Ik . (7.6.17)
α
Analogously one derives that
1 k+1
|(ϕ − uk+1 )+ (x)| ≤ |λ (x) − λk (x)| for x ∈ Ik . (7.6.18)
α
Due to (ii) of the algorithm we have that

(uk+1 − ψ)+ = (ϕ − uk+1 )+ = 0 on A+ −


k ∪ Ak ,

which, together with (7.6.17)–(7.6.18), implies that


1 k+1
|(uk+1 − ψ)+ (x)| + |(ϕ − uk+1 )+ (x)| ≤ |λ (x) − λk (x)| for x ∈ . (7.6.19)
α
From (7.6.14)–(7.6.16) and since ϕ < ψ a.e. on  we find

|μk+1,− (x)| ≤ |λk+1 (x) − λk (x)| for x ∈ A+ (uk+1 ) (7.6.20)

i i

i i
ItoKunisc
i i
2008/6/12
page 211
i i

7.6. Nonlinear control problems with bilateral constraints 211

and
|μk+1,+ (x)| ≤ |λk+1 (x) − λk (x)| for x ∈ A− (uk+1 ). (7.6.21)
Combining (7.6.19)–(7.6.21) implies that

M(uk+1 , μk+1 ) ≤ |λk+1 (x) − λk (x)|2 dx. (7.6.22)


Since (7.6.13) is supposed to hold we have


M(uk+1 , μk+1 ) ≤ ρ 2 |R k |2U .
Moreover, from the definition of R k we deduce that
|R k |2U ≤ α −2 M(uk , μk ), (7.6.23)
and consequently
M(uk+1 , μk+1 ) ≤ α −2 ρ 2 M(uk , μk ) for k = 1, 2, . . . . (7.6.24)
From (7.6.13), (7.6.23), (7.6.24) it follows that |λk+1 − λk |U ≤ ( αρ )k ρ|R 0 |U . Thus there
exists λ∗ ∈ U such that limk→∞ λk = λ∗ .
Note that for k ≥ 1
A+
k = {x : λ (x) > α ψ(x)},
k
Ik = {x : α ϕ(x) ≤ λk (x) ≤ α ψ(x)},
A−
k = {x : λ (x) < α ϕ(x)},
k

and hence
μk+1 = max(0, λk − α ψ) + min(0, λk − α ϕ) + (λk+1 − λk )χA+k ∪A−k .

Since limk→∞ (λk+1 − λk ) = 0 and limk→∞ λk exists, it follows that there exists μ∗ ∈ U
such that limk→∞ μk = μ∗ , and
μ∗ = max(0, λ∗ − α ψ) + min(0, λ∗ − α ϕ). (7.6.25)
From the last equation in (7.6.8) it follows that there exists u∗ such that limk→∞ uk = u∗ and
α u∗ −λ∗ +μ∗ = 0. Combined with (7.6.25) the triple (u∗ , μ∗ ) satisfies the complementarity
condition given by the second equation in (7.6.2). Passing to the limit with respect to k in
(7.6.5) we obtain that the first equation in (7.6.2) is satisfied by (u∗ , μ∗ ).

We turn to the discussion of (7.6.13) and consider the linear case first.

Proposition 7.14. If T is linear and T 2L(U,Y ) < α, then (7.6.13) holds.

Proof. From (7.6.8) and (7.6.9) we have, with δu = uk+1 − uk , δy = y k+1 − y k , and
δλ = λk+1 − λk , ⎧
⎪ δu = R + α δλ χIk ,
⎪ k 1



T δu = δy, (7.6.26)




⎩ ∗
T δy + δλ = 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 212
i i

212 Chapter 7. The Primal-Dual Active Set Method

Taking the inner product in L2 with δλ in the first equation we arrive at


(δu, δλ) ≤ (R k , δλ),
and taking the inner product with δu in the third equation,
|δy|2 + (δλ, δu) = 0.
Combining these two relations implies that
|δy|2 ≤ (R k , δλ).
Utilizing the third equation in (7.6.26) in this last inequality and the fact that the norms
of T and T ∗ coincide, we have |δλ| ≤ T ∗ 2 |R k |, from which the desired estimate
follows.

We now turn to a particular case when (7.6.13) holds for a nonlinear operator T . Let
 =  be a bounded domain in Rn , n = 2 or 3, with smooth boundary ∂ . Further
let φ : R → R be a monotone mapping with locally Lipschitzian derivative, satisfying
φ(0) = 0, and such that the substitution operator determined by φ maps H 1 () into L2 ().
We choose U = Y = L2 () and define T (u) = y as the solution operator to

− y + φ(y) = u in ,
(7.6.27)
y=0 on ∂,
where denotes the Laplacian. The adjoint variable λ is the solution to

λ + φ (y)λ = −(y − z) in ,
(7.6.28)
λ=0 on ∂.
) = {uk : k = 1, 2, . . .} denote the set of
Let (u0 , μ0 ) be an arbitrary initialization and let U
iterates generated by the primal-dual active set algorithm. Since these iterates are solutions
to the auxiliary problems (7.6.6), it follows that for every ᾱ > 0 the set U ) is bounded in
L () uniformly with respect to α ≥ ᾱ.
2

By monotone operator theory and regularity theory of elliptic partial differential equa-
tions it follows that the set of primal states {y k = y(uk ) : k = 1, 2, . . .} and adjoint states
{λk = λ(y(uk )) : k = 1, 2, . . .} are bounded subsets of L∞ (); see [Tr]. Let C denote this
bound and let LC denote the Lipschitz constant of φ on the ball BC (0) with center 0 and
radius C in R. Denote by H01 () = {u ∈ H 1 () : u = 0 on ∂} the Hilbert space endowed
with norm |∇u|L2 and let κ stand for the embedding constant from H01 () into L2 ().
4
(1+C LC ) κ
Proposition 7.15. Assume that 0 < α−(1+C LC ) κ 4
< 1, where α ≥ ᾱ. Then (7.6.13) holds
for the mapping T determined by the solution operator to (7.6.27).

Proof. From (7.6.8) we have


1 k+1
− (y k+1 − y k ) + φ(y k+1 ) − φ(y k ) = (λ − λk )χI k+1 + R k , (7.6.29)
α

− (λk+1 − λk ) + φ (y k+1 )(λk+1 − λk ) + (φ (y k+1 ) − φ (y k ))λk + y k+1 − y k = 0,


(7.6.30)

i i

i i
ItoKunisc
i i
2008/6/12
page 213
i i

7.6. Nonlinear control problems with bilateral constraints 213

both Laplacians with homogeneous Dirichlet boundary conditions. Taking the inner product
of (7.6.29) with y k+1 − y k we have, using monotonicity of φ,

κ2
|y k+1 − y k |1 ≤ |(λk+1 − λk )|1 + |R k |−1 , (7.6.31)
α
where | · |1 and | · |−1 denote the norms in H01 () and H −1 (), respectively. Note that
φ (y k+1 ) ≥ 0. Hence from (7.6.30) we find

|λk+1 − λk |21 ≤ C LC |y k+1 − y k |L2 |λk+1 − λk |L2 + |(y k+1 − y k , λk+1 − λk )|

≤ (1 + C LC ) κ 2 |y k+1 − y k |1 |λk+1 − λk |1 .

Thus,
|λk+1 − λk |1 ≤ (1 + C LC ) κ 2 |y k+1 − y k |1
and hence from (7.6.31)
α
|y k+1 − y k |1 ≤ |R k |−1 .
α − (1 + C LC ) κ 4
It thus follows that
α (1 + C LC ) κ 4
|λk+1 − λk |L2 ≤ |R k |L2 .
α − (1 + C LC ) κ 4
This implies (7.6.13) with

(1 + C LC ) κ 4
ρ = .
α − (1 + C LC ) κ 4

i i

i i
ItoKunisc
i i
2008/6/12
page 214
i i

i i

i i
ItoKunisc
i i
2008/6/12
page 215
i i

Chapter 8

Semismooth Newton
Methods I

8.1 Introduction
In this chapter we study semismooth Newton methods for solving nonlinear nonsmooth
equations. These investigations are motivated by complementarity problems, variational
inequalities, and optimal control problems with control or state constraints, for example.
The operator equation for which we desire to find a solution is typically Lipschitz continuous
but not C 1 regular. We shall also establish the relationship between the semismooth Newton
method and the primal-dual active set method that was discussed in Chapter 7.
Since semismooth Newton methods are not widely known even for finite-dimensional
problems, we consider the finite-dimensional case before we turn to problems in infinite
dimensions. In fact, these two cases are distinctly different. In finite dimensions we have
Rademacher’s theorem, which states that every locally Lipschitz continuous function is
differentiable almost everywhere. This result has no counterpart for functions between
infinite-dimensional function spaces.
As an example consider the nonlinear complementarity problem
g(x) ≤ 0, x ≤ ψ, and (g(x), x − ψ)Rn = 0,
where g : R → R and ψ ∈ R . It can be expressed equivalently as the problem of finding
n n n

a root to the following equation:


F (x) = g(x) + max(0, −g(x) + x − ψ) = max(g(x), x − ψ) = 0, (8.1.1)
where the max operation just like the inequalities must be interpreted componentwise. Note
that F is a locally Lipschitz continuous function if g is locally Lipschitzian, but it is not C 1
if g is C 1 . A function is called locally Lipschitz continuous if it is Lipschitz continuous on
every bounded subset of its domain.
Let us introduce some of the key concepts for the finite-dimensional case. For a
locally Lipschitz continuous function F : Rm → Rn let DF denote the set of points at
which F is differentiable. For x ∈ Rm we define ∂B F (x) as
 
∂B F (x) = J : J = lim ∇F (xi ) , (8.1.2)
xi →x, xi ∈DF

215

i i

i i
ItoKunisc
i i
2008/6/12
page 216
i i

216 Chapter 8. Semismooth Newton Methods I

and we denote by ∂F (x) the generalized derivative at x introduced by Clarke [Cla], i.e.,

∂F (x) = co ∂B F (x), (8.1.3)

where co stands for the convex hull.


A generalized Newton iteration for solving the nonlinear equation F (x) = 0 with
F : Rn → Rn can be defined by

x k+1 = x k − Vk−1 F (x k ), where Vk ∈ ∂B F (x k ). (8.1.4)

The reason for using ∂B F rather than ∂F in (8.1.4) is the following: For the convergence
analysis we shall require that all V ∈ ∂B F (x ∗ ) are nonsingular, where x ∗ is the thought
solution to F (x) = 0. This is more readily satisfied for ∂B F than for ∂F , as can be seen
from F (x) = |x|, for example. In this case 0 ∈ ∂F (0) but 0 ∈/ ∂B F .
We also introduce the coordinatewise operation

∂b F (x) = ⊗m
i=1 ∂B Fi (x),

where Fi is the ith coordinate of F . From the definition of ∂B F (x) it follows that ∂B F (x) ⊂
∂b F (x). For F given in (8.1.1) we have

∂B F (x) = ∂b F (x)

if −g (x) + I is surjective. Moreover, in Section 8.4 it will be shown that if we select



⎨ d if − g(x) + x − ψ > 0,
V (x) d =

g (x)d if − g(x) + x − ψ ≤ 0,

then the generalized Newton method reduces to the primal-dual active set method.
Local convergence of {x k } to x ∗ , a solution of F (x) = 0, is based on the following
concepts. The generalized Jacobians V k ∈ ∂B F (xk ) are selected so that their inverses
V (xk )−1 are uniformly bounded and that they satisfy the condition

|F (x ∗ + h) − F (x ∗ ) − V h| = o(|h|), (8.1.5)

where V = V (x ∗ + h) ∈ ∂B F (x ∗ + h), for h in a neighborhood of x ∗ . Then from (8.1.5)


with h = x k − x ∗ and Vk = V (x k ) we have

|x k+1 − x ∗ | = |Vk−1 (F (x k ) − F (x ∗ ) − Vk (x k − x ∗ ))| = o(|x k − x ∗ |). (8.1.6)

Thus, there exists a neighborhood B(x ∗ , ρ) of x ∗ such that if x 0 ∈ B(x ∗ , ρ), then x k ∈
B(x ∗ , ρ) and x k converges to x ∗ superlinearly. This discussion will be made rigorous for F
mapping between finite-dimensional spaces, as above, as well as for the infinite-dimensional
case. For the finite-dimensional case we shall rely on the notion of semismooth functions.

Definition 8.1. F : Rm → Rn is called semismooth at x if F is locally Lipschitz at x and

lim V h exists for all h ∈ Rm . (8.1.7)


V ∈∂F (x+t h ), h →h, t→0+

i i

i i
ItoKunisc
i i
2008/6/12
page 217
i i

8.2. Semismooth functions in finite dimensions 217

Semismoothness was originally introduced by Miflin [Mif] for scalar-valued func-


tions. Convex functions and real-valued C 1 functions are examples for such semismooth
functions. Definition 8.1 is due to Qi and Sun [Qi, QiSu]. It will be shown that if F is
semismooth at x ∗ , condition (8.1.5) holds. Due to the fact that the notion of Clarke derivative
is not available in infinite dimensions, Definition 8.1 does not allow a direct generalization
to the infinite-dimensional case. Rather, the notion of Newton differentiability related to
the property expressed in (8.1.5) was developed in [CNQ, HiIK]. Alternatively, in [Ulb]
the infinite-dimensional case of mappings into Lp spaces was treated by considering the
superposition of mappings F = !(G), with G a C 1 mapping into Lr and ! the substitu-
tion operator (!y)(s) = ψ(y(s)) for y ∈ Lr , where ψ is a semismooth function between
finite-dimensional spaces.
As an alternative to (8.1.4) the increment d k for a generalized Newton method x k+1 =
x + d k can be defined as the solution to
k

F (x k ) + F (x k ; d) = 0, (8.1.8)

where F (x; d) denotes the directional derivative of F at x in direction d. Note that it may
be a nontrivial task to solve (8.1.8) for d k . This method was investigated in [Pan1, Pan2].
We shall return to (8.1.8) in the context of globalization of the method specified by (8.1.4).
In Section 8.2 we present the finite-dimensional theory for Newton’s method for
semismooth functions. Section 8.3 is devoted to the discussion of Newton differentia-
bility and solving nonsmooth equations in Banach spaces. In Section 8.4 we exhibit the
relationship between the primal-dual active set method and semismooth Newton methods.
Section 8.5 is devoted to a class of nonlinear complementarity problems. In Section 8.6
we discuss applications where, for different reasons, semismooth Newton methods are not
directly applicable but rather a regularization is necessary as, for instance, in the case of
state-constrained optimal control problems.

8.2 Semismooth functions in finite dimensions


8.2.1 Basic concepts and the semismooth Newton algorithm
In this section we discuss properties of semismooth functions and analyze convergence of the
generalized Newton method (8.1.4). We follow quite closely the work by Qi and Sun [QiSu].
First we describe the relationship of semismoothness to directional differentiability. A
function F : Rm → Rn is called directionally differentiable at x ∈ Rm if
F (x + t h) − F (x)
lim = F (x; h)
t→0+ t
exists for all h ∈ Rd . Further F is called Bouligand-differentiable (B-differentiable) [Ro4]
at x if it is directionally differentiable at x and
F (x + h) − F (x) − F (x; h)
lim = 0.
h→0 |h|
A locally Lipschitzian function F is B-differentiable at x if and only if it is directionally
differentiable at x; see [Sha] and the references therein.

i i

i i
ItoKunisc
i i
2008/6/12
page 218
i i

218 Chapter 8. Semismooth Newton Methods I

Theorem 8.2. (i) Suppose that F : Rm → Rn is locally Lipschitz continuous and direction-
ally differentiable at x. Then F (x; ·) is Lipschitz continuous and for every h, there exists
a V ∈ ∂F (x) such that

F (x; h) = V h. (8.2.1)

(ii) If F is locally Lipschitz continuous, then the following statements are equivalent.
(1) F is semismooth at x.
(2) F is directionally differentiable at x and for every V ∈ ∂F (x + h),

V h − F (x; h) = o(|h|) as h → 0.
F (x+h;h)−F (x;h)
(3) limx+h∈ DF , |h|→0 |h|
= 0.

Proof. (i) For h, h ∈ Rm we have



F (x + th) − F (x + th )

|F (x; h) − F (x; h )| = lim+
t→0 t ≤ L |h − h |, (8.2.2)

where L is the Lipschitz constant of F in a neighborhood of x. Thus F (x; ·) is Lipschitz


continuous at x. Since

F (y) − F (x) ∈ co ∂F ([x, y])(y − x) (8.2.3)

for all x, y ∈ Rm (see [Cla, p. 72]), there exist a sequence {tk }, with tk → 0+ , and Vk ∈
co ∂F ([x, x + tk h])(y − x) such that

F (x; h) = lim Vk h.
k→∞

Since F is locally Lipschitz, the sequence {Vk } is bounded, and there exists a subsequence
of Vk , denoted by the same symbol, such that Vk → V . Moreover ∂F is closed at x; i.e.,
xi → x and Zi → Z, with Zi ∈ ∂F (xi ), imply that Z ∈ ∂F (x) [Cla, p. 70], and hence
V ∈ ∂F (x). Thus F (x; h) = V h, as desired.
(ii) We turn to verify the equivalence of (1)–(3).
(1) → (2): First we show that F (x; h) exists and that

F (x; h) = lim V h. (8.2.4)


V ∈∂F (x+th), t→0+

Since F is locally Lipschitz, { F (x+th)−F


t
(x)
: t → 0} is bounded. Thus there exists a sequence
ti → 0+ and  ∈ Rn such that
F (x + ti h) − F (x)
lim = .
i→∞ ti
We argue that  equals the limit in (8.2.4). By (8.2.3)

F (x + ti h) − F (x)
∈ co (∂F ([x, x + ti h])) h = co (∂F ([x, x + ti h] h).
ti

i i

i i
ItoKunisc
i i
2008/6/12
page 219
i i

8.2. Semismooth functions in finite dimensions 219

The
!n Carathéodory theorem implies that for each i there exist tik ∈ [0, ti ], λki ∈ [0, 1] with
k=0 λi = 1, and Vi ∈ ∂F (x + ti h), where k = 0, . . . , n, such that
k k k

F (x + ti h) − F (x)  k k
n
= λi Vi h.
ti k=0

By passing to subsequences such that lim λki → λk , for k = 0, . . . , n, we have



n   n
= lim λki lim Vik h = λk lim Vh = lim V h.
i→∞ i→∞ V ∈∂F (x+th), t→0+ V ∈∂F (x+th), t→0+
k=0 k=0

Next we prove that the limit in (8.2.4) is uniform for all h with |h| = 1. This implies (2).
If the claimed uniform convergence in (8.2.4) does not hold, then there exists  > 0, and
sequences {hk } in Rm with |hk | = 1, {tk } with tk → 0+ , and Vk ∈ ∂F (x + tk hk ) such that

|Vk hk − F (x; hk )| ≥ 2.

Passing to a subsequence we may assume that hk converges to some h ∈ Rm . By Lipschitz


continuity of F (x; h) with respect to h we have

|Vk hk − F (x; h)| ≥ 

for all k sufficiently large. This contradicts the semismoothness of F at x.


(2) → (1): Suppose that F is not semismooth at x. Then there exist  > 0, h ∈ Rm ,
and sequences {hk } in Rm and {tk }, and h ∈ Rm satisfying hk → h, tk → 0+ , and Vk ∈
∂F (x + tk hk ) such that
|Vk hk − F (x; h)| ≥ 2.
Since F (x; ·) is Lipschitz continuous,

|Vk hk − F (x; hk )| ≥ 

for sufficiently large k. This contradicts assumption (2).


(2) → (3) follows from (8.2.1).
(3) → (2): For arbitrary  > 0 there exists δ > 0 such that for all h with |h| < δ and
x + h ∈ DF
|F (x + h; h) − F (x; h)| ≤  |h|. (8.2.5)
Let V ∈ ∂F (x + h) and |h| < with h  = 0. By (8.1.3)
δ
2
 
V h ∈ co lim F (x + h ; h) .
h →h, x+h ∈DF
!
By the Carathéodory theorem, there exist λk ≥ 0 with nk=0 λk = 1 and hk , k = 1, . . . ,
satisfying x + hk ∈ DF and |hk − h| ≤ min( 2δ , |h|
L
, |h|), where L is the Lipschitz constant
of F near x, such that

 n

V h − λ k
F (x + h k
; h) ≤ |h|. (8.2.6)

k=0

i i

i i
ItoKunisc
i i
2008/6/12
page 220
i i

220 Chapter 8. Semismooth Newton Methods I

From (8.2.5) and (8.2.2) we find


n
 k
λ F (x + h k
; h) − F
(x; h)

k=0

n

≤ λk |F (x + hk ; h) − F (x + hk ; hk )| + |F (x + hk ; hk ) − F (x; hk )|
k=0 
+|F (x; hk ) − F (x; h)|

n
≤ λk (2L|hk − h| +  |hk |) ≤ 4 |h|.
k=0

It thus follows from (8.2.6) that

|V h − F (x + h; h)| ≤ 5 |h|.

Since  > 0 is arbitrary, (2) follows.

Theorem 8.3. Let F : Rn → Rn be locally Lipschitz continuous, let x ∈ Rn , and suppose


that all V ∈ ∂B F (x) are nonsingular. Then there exist a neighborhood N of x and a constant
C such that for all y ∈ N and V ∈ ∂B F (y)

|V −1 | ≤ C. (8.2.7)

Proof. First, we claim that there exist a neighborhood N of x and a constant C such that
for all y ∈ DF ∩ N , ∇F (y) is nonsingular and

|∇F (y)−1 | ≤ C. (8.2.8)

If this claim is not true, then there exists a sequence y k → x, y k ∈ DF , such that either
all ∇F (y k ) are singular or |(∇F (y k ))−1 | → ∞. Since F is locally Lipschitz the set
{∇F (y k ) : k = 1, . . .} is bounded. Thus there exists a subsequence of ∇F (y k ) that
converges to some V . Then V must be singular and V ∈ ∂B F (x). This contradicts the
assumption and there exists a neighborhood N of x such that (8.2.8) holds for all y ∈ DF ∩N .
Moreover (8.2.7) follows from (8.1.2), (8.2.8), and continuity of the norm.

Lemma 8.4. Suppose that F : Rn → Rn is semismooth at a solution x ∗ of F (x) = 0


and that all V ∈ ∂B F (x ∗ ) are nonsingular. Then there exist a neighborhood N of x ∗ and
 : R+ → R+ with limt→0+ (t) = 0 monotonically such that

|x − V −1 F (x) − x ∗ | ≤ (|x − x ∗ |)|x − x ∗ |,

|F (x − V −1 F (x))| ≤ (|x − x ∗ |)|F (x)|

for all V ∈ ∂B F (x) and x ∈ N .

Proof. From Theorem 8.3 there exist a neighborhood N of x ∗ and a constant C such that
|V −1 | ≤ C for all V ∈ ∂B F (x) with x ∈ N . Thus, if x ∈ N , then it follows from

i i

i i
ItoKunisc
i i
2008/6/12
page 221
i i

8.2. Semismooth functions in finite dimensions 221

Theorem 8.2 (2) and B-differentiability of F at x ∗ , which is implied by semismoothness of


F at x ∗ , that
|x − V −1 F (x) − x ∗ | ≤ |V −1 ||F (x) − F (x ∗ ) − V (x − x ∗ )|
 
≤ |V −1 | |F (x) − F (x ∗ ) − F (x ∗ ; x − x ∗ )| + |F (x ∗ ; x − x ∗ ) − V (x − x ∗ )|

≤ (|x − x ∗ |)|x − x ∗ |.
This implies the first claim. Let x̂ = x − V −1 F (x). Since F is B-differentiable at x ∗
we have
|F (x̂)| ≤ |F (x ∗ ; x̂ − x ∗ )| + (|x̂ − x ∗ |)|x̂ − x ∗ |.
From the first part of the theorem we obtain for a possibly redefined function 
|F (x̂)| ≤ (L + )  |x − x ∗ |,
where  = (|x − x ∗ |) and L is the Lipschitz constant of F at x ∗ . Since
|x − x ∗ | ≤ |x̂ − x| + |x̂ − x ∗ |

≤ |V −1 F (x)| + |x̂ − x ∗ | ≤ C|F (x)| +  |x − x ∗ |,


we find
C
|x − x ∗ | ≤ |F (x)|
1−
for all x sufficiently close to x ∗ and hence
C(L + )
|F (x̂)| ≤ |F (x)|.
1−
This implies the second claim.

We are now prepared for the local superlinear convergence result that was announced
in the introduction of this chapter.

Theorem 8.5 (Superlinear Convergence). Suppose that F : Rn → Rn is semismooth at a


solution x ∗ of F (x) = 0 and that all V ∈ ∂B F (x ∗ ) are nonsingular. Then the iterates
x k+1 = x k − Vk−1 F (x k ), Vk ∈ ∂B F (x k ),
for k = 0, 1, . . . are well defined and converge to x ∗ superlinearly if x 0 is chosen sufficiently
close to x ∗ . Moreover, |F (xk )| decreases superlinearly to 0.

Proof. We proceed by induction and suppose that x 0 ∈ N and (|x 0 − x ∗ |) ≤ 1 with N, 


defined in Lemma 8.4. Without loss of generality we may assume that N is a ball in Rm .
To verify the induction step suppose that x k ∈ N . Then
|x k+1 − x ∗ | ≤ (|x k − x ∗ |)|x k − x ∗ | ≤ |x 0 − x ∗ |.
This implies that x k+1 ∈ N and superlinear convergence of x k → x ∗ . Superlinear conver-
gence of F (x k ) to 0 easily follows from the second part of Lemma 8.4.

i i

i i
ItoKunisc
i i
2008/6/12
page 222
i i

222 Chapter 8. Semismooth Newton Methods I

8.2.2 Globalization

We now discuss globalization of the semismooth Newton method (8.1.4). For this purpose
we define the merit function θ by
θ(x) = |F (x)|2 .
Throughout we assume that F : Rn → Rn is locally Lipschitz continuous and B-
differentiable and that the following assumptions (8.2.9)–(8.2.11) hold:
S = {x ∈ Rn : |F (x)| ≤ |F (x 0 )|} is bounded. (8.2.9)

There exist σ̄ , b > 0 and a graph  from S to the nonempty


subsets of Rn such that (8.2.10)
θ (x; d) ≤ −σ̄ θ (x) and |d| ≤ b |F (x)| for all d ∈ (x) and x ∈ S.

Moreover  has the following closure property: xk → x̄ and


(8.2.11)
dk → d̄ with xk ∈ S and dk ∈ (xk ) imply θ o (x̄; d̄) ≤ −σ̄ θ(x̄).
Here θ o (x; d) denotes the Clarke generalized directional derivative of θ at x in direction d
(see [Cla]) which is defined by
θ(y + t d) − θ(y)
θ o (x; d) = lim sup .
y→x, t→0+ t
It is assumed that b > C is an arbitrarily large parameter. It serves the purpose that the
iterates {dk } are uniformly bounded. With reference to the closure property of  given in
(8.2.11), note that it is not required that d̄ ∈ (x̄). Conditions (8.2.10) and (8.2.11) will be
discussed in Section 8.2.3 below.
Following [HaPaRa, Qi] we next investigate a globalization strategy for the general-
ized Newton iteration (8.1.4) and introduce the following algorithm.

Algorithm G.
Let β, γ ∈ (0, 1) and σ ∈ (0, σ̄ ). Choose x 0 ∈ Rn and set k = 0. Given x k with
F (x )  = 0. Then
k

(i) If there exists a solution hk to


Vk hk = −F (x k )
with |hk | ≤ b|F (x k )|, and if further
|F (x k + hk )| < γ |F (x k )| ,
set d k = hk , x k+1 = x k + d k , αk = 1, and mk = 0.
(ii) Otherwise choose d k ∈ (x k ) and let αk = β mk , where mk is the first positive
integer m for which
θ (x k + β m d k ) − θ(x k ) ≤ −σβ m θ(x k ).

i i

i i
ItoKunisc
i i
2008/6/12
page 223
i i

8.2. Semismooth functions in finite dimensions 223

Finally set x k+1 = x k + αk d k .

In (ii) the increment d k = hk from (i) can be chosen if θ (x k ; hk ) ≤ −σ θ(x k ).

Theorem 8.6. Suppose that F : Rn → Rn is locally Lipschitz and B-differentiable.


(a) Assume that (8.2.9)–(8.2.11) hold. Then the sequence {x k } generated by Algo-
rithm G is bounded, it satisfies |F (x k+1 )| < |F (x k )| for all k ≥ 0, and each accumulation
point x ∗ of {x k } satisfies F (x ∗ ) = 0.
(b) If moreover for one such accumulation point

|h| ≤ c |F (x ∗ ; h)| for all h ∈ Rn , (8.2.12)

then the sequence x k converges to x ∗ .


(c) If in addition to the above assumptions F is semismooth at x ∗ and all V ∈ ∂B F (x ∗ )
are nonsingular, then x k converges to x ∗ superlinearly.

Proof. (a) First we prove that for each x ∈ S such that θ(x)  = 0 and d satisfying θ (x; d) ≤
−σ̄ θ (x), there exists a τ̄ > 0 such that

θ (x + τ d) − θ(x) ≤ −σ τ θ(x) for all τ ∈ [0, τ̄ ].

If this is not the case, then there exists a sequence τn → 0+ such that

θ (x + τn d) − θ(x) > −σ τn θ(x).

Dividing both sides by τn and letting n → ∞, we have by (8.2.10)

−σ̄ θ (x) ≥ θ (x; d) ≥ −σ θ (x).

Since σ < σ̄ , this shows θ (x) = 0, which contradicts the assumption θ(x)  = 0. Hence
for each level k at which d k ∈ (x k ) is chosen according to the second alternative in
Algorithm G, there exists mk < ∞ and αk > 0 such that |F (x k+1 )| < |F (x k )|. By
construction the iterates therefore satisfy |F (x k+1 )| < |F (x k )| for each k ≥ 0.
Assume first that lim sup αk > 0. If the first alternative in Algorithm G with αk = 1
occurs infinitely many times, then using the fact that γ < 1 we find that limk→0 θ(x k ) = 0.
Otherwise, for all k sufficiently large

0 ≤ θ (x k+1 ) ≤ (1 − σ αk )θ (x k ) ≤ θ(x k ).

Thus θ (x k ) is monotonically decreasing, bounded below by 0, and hence convergent.


Therefore limk→∞ (θ (x k+1 ) − θ (x k )) = 0 and consequently limk→∞ αk θ(x k ) = 0. Thus
lim sup αk > 0 implies that limk→∞ θ(x k ) = 0. Hence each accumulation point x ∗ of {x k }
satisfies F (x ∗ ) = 0. Note that the existence of an accumulation point follows from (8.2.9).
If on the other hand lim sup αk = 0, then lim mk → ∞. By the definition of mk , for
τk := β mk −1 , we have τk → 0 and

θ (x k + τk d k ) − θ(x k ) > −σ τk θ(x k ). (8.2.13)

i i

i i
ItoKunisc
i i
2008/6/12
page 224
i i

224 Chapter 8. Semismooth Newton Methods I

By (8.2.9), (8.2.10) the sequence {(x k , d k )} is bounded. Let {(x k , d k )}k∈K be any convergent
subsequence with limit (x ∗ , d). Note that
θ (x k + τk d k ) − θ (x k ) θ (x k + τk d) − θ(x k ) θ(x k + τk d k ) − θ(x k + τk d)
= + ,
τk τk τk
where
θ (x k + τk d k ) − θ(x k + τk d)
lim → 0,
k∈K,k→∞ τk
since θ is locally Lipschitz continuous. Since d k ∈ (x k ) for all k ∈ K it follows from
(8.2.11) that θ (x ∗ ; d) ≤ −σ̄ θ (x ∗ ). Then from (8.2.13) and (8.2.11) we find
θ (x k + τk d k ) − θ(x k )
−σ θ (x ∗ ) ≤ lim sup ≤ θ o (x ∗ ; d) ≤ −σ̄ θ(x ∗ ). (8.2.14)
k∈K, k→∞ τk
It follows that (σ̄ − σ ) θ (x ∗ ) ≤ 0 and thus θ(x ∗ ) = 0.
(b) Since F is B-differentiable at x ∗ there exists a δ > 0 such that for |x − x ∗ | ≤ δ
1
|F (x) − F (x ∗ ) − F (x ∗ ; x − x ∗ )| ≤ |x − x ∗ |.
2c
Thus,
|F (x ∗ ; x − x ∗ )| ≤ |F (x)| + |F (x) − F (x ∗ ) − F (x ∗ , x − x ∗ )|

1
≤ |F (x)| + |x − x ∗ |.
2c
From (8.2.12)
1
|x − x ∗ | ≤ c |F (x ∗ ; x − x ∗ )| ≤ c |F (x)| + |x − x ∗ |
2
and thus
|x − x ∗ | ≤ 2c |F (x)| if |x − x ∗ | ≤ δ.
Given  ∈ (0, δ) define the set
 

N (x , ) = x ∈ Rn : |x − x ∗ | ≤ , |F (x)| ≤

.
2c + b

Since x ∗ is an accumulation point of {x k }, there exists an index k̄ such that x k̄ ∈ N (x ∗ , ).


Since |d k | ≤ b|F (x k )| for all k we have

|x k̄+1 − x ∗ | ≤ |x k̄ − x ∗ + αk̄ d k̄ | ≤ |x k̄ − x ∗ | + αk̄ |d k̄ |

≤ 2c |F (x k̄ )| + b |F (x k̄ )| = (2c + b) |F (x k̄ )| ≤ .

Hence x k̄+1 ∈ N (x ∗ , ). By induction, x k ∈ N (x ∗ , ) for all k ≥ k̄ and thus the sequence
x k converges to x ∗ .

i i

i i
ItoKunisc
i i
2008/6/12
page 225
i i

8.2. Semismooth functions in finite dimensions 225

(c) Since limk→∞ x k = x ∗ the iterates x k of Algorithm G enter into the region of
attraction for Theorem 8.5. Moreover, referring to the proof of Lemma 8.4, for any γ ∈ (0, 1)
there exists kγ such that the iterates according to (8.1.4) satisfy |F (x k+1 )| ≤ γ |F (x k )| for
k ≥ kγ . Hence these iterates coincide with those of the Algorithm G for k ≥ kγ , and
superlinear convergence follows.

Remark 8.2.1. (i) We point out that the requirement that the graph  satisfies the closure
property (8.2.11) is used in the proof of Theorem 8.6 only for the case that lim supk→∞ αk = 0.
(ii) For part (a) of Theorem 8.6 the condition |hk | ≤ b|F (x k )| in the first alternative
of Algorithm G is not required and |d| ≤ b|F (x)| for all x ∈ S used in alternative (ii) can
be replaced by requiring that the directions are uniformly bounded. These conditions are
used in the proof of Theorem 8.6(b).
(iii) Since h → F (x ∗ ; h) is positively homogeneous, since we consider the finite-
dimensional case here, one can easily argue that (8.2.12) is equivalent to F (x ∗ ; h)  = 0 for
all h  = 0, which is called BD-regularity in [Qi].

8.2.3 Descent directions


We turn to a discussion of conditions (8.2.10) and (8.2.11) required for the descent direc-
tions d.
For the Clarke generalized directional derivative θ o (x, d) we have by local Lipschitz
continuity of F at x that
2(F (x), (F (y + t d) − F (y))
θ o (x, d) = lim sup
y→x,t→0+ t

and there exists F o : Rn × Rn → Rn such that

θ o (x, d) = 2(F (x), F o (x; d)) for (x, d) ∈ Rn × Rn .

We introduce the notion of quasi-directional derivative.

Definition 8.7. Let F : Rn → Rn be directionally differentiable. Then G : S × Rn → Rn


is called a quasi-directional derivative of F on S ⊂ Rn if

(i) (F (x), F (x; d)) ≤ (F (x), G(x; d)),

(ii) G(x; td) = tG(x; d) for all d ∈ Rn , x ∈ S, and t ≥ 0,

(iii) (F (x̄), F o (x̄; d̄)) ≤ lim supx→x̄, d→d̄ (F (x), G(x; d)) for all x → x̄, d → d̄ with
x, x̄ ∈ S.

For the special case of optimization subject to box constraints a quasi-directional derivative
will be constructed in Section 8.2.5. In the remainder of this section we consider the
relationship between well-known choices for descent directions (see, e.g., [HPR, Pan1])
and the concept of quasi-directional derivative of Definition 8.7, and we assume that F is

i i

i i
ItoKunisc
i i
2008/6/12
page 226
i i

226 Chapter 8. Semismooth Newton Methods I

a locally Lipschitz continuous and directionally differentiable function on S where S refers


to the set defined in (8.2.9).
(a) Bouligand direction. If there exists b̄ such that

|h| ≤ b̄ |F (x; h)| for all x ∈ S, h ∈ Rn (8.2.15)

and if
F (x) + F (x; d) = 0 (8.2.16)
admits a solution d for each x ∈ S, then a first choice for the direction is given by the
solution d to (8.2.16), i.e., (x) = d; see, e.g., [Pan1]. By (8.2.15) we have |d| ≤ b̄|F (x)|.
Moreover

θ (x, d) = 2 (F (x; d), F (x)) = −2 θ(x),

and therefore the inequalities in (8.2.10) hold with b = b̄ and σ̄ = 2. For this choice,
however,  does not satisfy (8.2.11), in general, unless additional conditions are imposed
on the problem data; see Section 8.2.5 below.
(b) Generalized Bouligand direction. As a second choice (see [HPR, Pan1]), we
assume that G is a quasi-directional derivative of F on S, that

|h| ≤ b̄ |G(x; h)| for all x ∈ S, h ∈ Rn (8.2.17)

holds, and that


F (x) + G(x; d) = 0 (8.2.18)
admits a solution d which is used as the descent direction for each x ∈ S. We set (x) = d.
Then one argues as for the first choice that the inequalities in (8.2.10) hold with b = b̄
and σ̄ = 2. Moreover  satisfies (8.2.11), since for any (x, d) → (x̄, d̄) in S × R with
(x) = d we have

θ o (x̄, d̄) ≤ 2 lim sup (F (x), G(x; d)) = −2 lim |F (x)|2 = −2θ(x̄).
x→x̄
x→x̄, d→d̄

We refer the reader to Section 8.2.5 for the construction of G for specific applications.
(c) Generalized gradient direction. The following choice was discussed in [HPR].
Here d is chosen as the solution to

min J (x, d) = 2(F (x), G(x; d)) + η |d|2 , (8.2.19)


d

where η > 0 and x ∈ S. Assume that for some L > 0



⎨ h → G(x; h) is continuous and
(8.2.20)

|G(x; h)| ≤ L |h| for all x ∈ S, h ∈ Rn .

Then, d → J (d) is coercive, bounded below, and continuous. Thus there exists an optimal
solution d to (8.2.19).

i i

i i
ItoKunisc
i i
2008/6/12
page 227
i i

8.2. Semismooth functions in finite dimensions 227

Lemma 8.8. Assume that F : Rn → Rn is Lipschitz continuous, directionally differentiable


and that G is a quasi-directional derivative of S satisfying (8.2.20).
(a) If d is an optimal solution to (8.2.19), then
(F (x), G(x; d)) = −η |d|2 .
(b) If d = 0 is an optimal solution to (8.2.19), then (F (x), G(x; h)) ≥ 0 for all
h ∈ Rn .

Proof. (a) Let d be an optimal solution to (8.2.19) and consider


min 2(F (x), G(x; αd)) + α 2 η |d|2 .
α≥0

Then α = 1 is optimal and differentiating with respect to α, we have


(F (x), G(x; d)) + η |d|2 = 0.
(b) If d = 0 is an optimal solution, then the optimal value of (8.2.19) is zero. It
follows that for all h ∈ Rn and α ≥ 0
0 ≤ 2α(F (x), G(x; h)) + α 2 η|h|2 .
Dividing by α > 0 and letting α → 0+ we obtain the claim.

By Lemma 8.8 the optimal value of the cost J in (8.2.19) is given by −η|d|2 . If this
value is negative, then any solution to (8.2.19) provides a decay for θ . The optimal value
of the cost is 0 if and only if d = 0 is the optimal solution. In this case Lemma 8.8 implies
that x is a stationary point in the sense that (F (x), G(x; h)) ≥ 0 for all h ∈ Rn .
Let us now turn to the discussion of condition (8.2.10) for the direction given by
the solution d to (8.2.19), i.e., (x) = d. We assume (8.2.17) and that (8.2.18) admits
a solution for every x ∈ S. Since J (0) = 0, we have 2(F (x), G(x; d)) ≤ −η |d|2 and
therefore
η |d|2 ≤ −2(F (x), G(x; d)) ≤ 2|F (x)| |G(x; d)| ≤ 2L |d| |F (x)|.
Thus,
2L
|d| ≤ |F (x)|,
η
and the second condition in (8.2.10) holds. Turning to the first condition let d̂ satisfy
F (x) + G(x; d̂) = 0. Then using Lemma 8.8(a) and (8.2.17) we find, at a solution d to
(8.2.19),

J (x, d) = (F (x), G(x; d)) = −η |d|2 ≤ 2 (G(x; d̂), F (x)) + η |d̂|2


(8.2.21)
≤ −2 |F (x)|2 + ηb̄2 |F (x)|2 = −(2 − ηb̄2 ) θ (x).
Since G is a quasi-directional derivative of F on S, we have θ (x; d) ≤ 2(F (x), G(x; d)) ≤
−2(2 − ηb̄2 ) θ (x) and thus the direction d defined by (8.2.19) satisfies the first condition in
(8.2.10) with σ̄ = 2(2 − ηb̄2 ), provided that η < b̄22 .

i i

i i
ItoKunisc
i i
2008/6/12
page 228
i i

228 Chapter 8. Semismooth Newton Methods I

To argue (8.2.11) for this choice of , let xk → x̄, dk → d̄, with dk = (xk ), xk ∈ S,
and choose d̂k such that F (xk ) + G(xk ; d̂k ) = 0. Then
1 o
θ (x̄; d̄) ≤ lim sup(F (xk ), G(xk ; dk ))
2 k→∞

≤ lim sup(2(F (xk ), G(xk ; dk )) + η|dk |2 )


k→∞

≤ lim sup(2(F (xk ), G(xk ; d̂k )) + η|d̂k |2 )


k→∞

≤ −2|F (x̄)|2 + η b̄2 lim |F (xk )|2 = −(2 − ηb̄2 )θ (x̄),


k→∞
2
and thus (8.2.11) holds if η < b̄2
.

8.2.4 A Gauss–Newton algorithm


In this section we admit the case that the minimum of θ is not attained with value 0,
which typically arises in nonlinear least squares problems, and the case where the equation
Vk d = −F (x k ) or F (x k ) + G(x k ; d) = 0, which would provide candidates for search
directions in (ii) of Algorithm G, does not have a solution. For these cases we consider the
following Gauss–Newton method.

Gauss–Newton algorithm.
Let β, γ ∈ (0, 1), α > 0, and σ ∈ (0, 2η). Choose x 0 ∈ Rn and set k = 0. Given x k
with F (x k )  = 0. Then
(i) If there exists Vk ∈ ∂B F (x k ) such that
hk = −(α |F (x k )| I + VkT Vk )−1 VkT F (x k ) (8.2.22)
with |hk | ≤ b |F (x k )| satisfies |F (x k +hk )| < γ |F (x k )|, let d k = hk and set x k+1 = x k +d k ,
α k = 1, and mk = 0.
(ii) Otherwise, let d k be defined by (8.2.19) with x = x k . Stop if d k = 0 or F (x k ) = 0.
Otherwise, set αk = β mk where mk is the first positive integer m for which
θ (x k + β m d k ) − θ(x k ) ≤ −σβ m |d k |2 , (8.2.23)
and set x k+1 = x k + α k d k .

Note that (8.2.22) gives the solution to


1 α
min |F (x k ) + Vk h|2 + |F (x k )| |h|2 . (8.2.24)
h 2 2

If the Gauss–Newton algorithm terminates after finitely many steps with index k̄, then
either F (x k̄ ) = 0 or d k̄ = 0 in the second alternative of the algorithm. In this case x k̄ is a
stationary point of θ in the sense that (F (x k̄ ), G(x k̄ ; d)) ≥ 0 for all d by Lemma 8.8.

i i

i i
ItoKunisc
i i
2008/6/12
page 229
i i

8.2. Semismooth functions in finite dimensions 229

We next discuss global convergence when the Gauss–Newton algorithm takes in-
finitely many steps.

Theorem 8.9. Suppose that F : Rn → Rn is locally Lipschitz and B-differentiable, (8.2.9),


(8.2.11), and (8.2.20) hold, and G is a quasi-directional derivative of F on S.
(a) If {x k } is an infinite sequence generated by the Gauss–Newton algorithm, then
{x } is bounded and |F (x k+1 )| < |F (x k )| for all k. If the first alternative occurs infinitely
k

often, then all accumulation points x ∗ satisfy F (x ∗ ) = 0. Otherwise, limk→∞ d k = 0 and

(F (x ∗ ), G(x; h)) ≥ 0 for all h ∈ Rn . (8.2.25)

(b) If for some accumulation point F (x ∗ ) = 0 and

|h| ≤ c |F (x ∗ ; h)| for all h ∈ Rn , (8.2.26)

then the sequence x k converges to x ∗ .


(c) If in addition to the assumptions in (a) and (b), F is semismooth at x ∗ and all
V ∈ ∂B F (x ∗ ) are nonsingular, then x k converges to x ∗ superlinearly.

Proof. (a) The algorithm guarantees that |F (x k+1 )| < |F (x k )| for all k. Due to (8.2.9)
the sequence {x k } is bounded. In case the first alternative is taken we have from (8.2.24)
with h = 0
α|d k |2 ≤ |F (x k )|.
In case of the second alternative Lemma 8.8(a) and (8.2.20) imply that

η|d k |2 ≤ |F (x k )||G(x k ; d k )| ≤ L|F (x k )||d k |.

Consequently the sequence {d k } is bounded.


If the first alternative of the Gauss–Newton algorithm occurs infinitely often, then
limk→∞ |F (x k )| = 0 and every accumulation point x ∗ of {x k } satisfies F (x ∗ ) = 0.
Let us turn to the case when eventually only the second alternative occurs. Since
|F (x k )| is monotonically decreasing and bounded below, the sequence F (x k+1 ) − F (x k ) is
convergent and hence by (8.2.23)

lim αk |d k |2 = 0.
k→∞

If limk→∞ d k  = 0, then there exists an index set K such that limk∈K, k→∞ |d k |  = 0 and
consequently limk∈K, k→∞ αk = 0. For τk = β mk −1 we have
1
−σ |d k |2 ≤ (θ (x k + τ k d k ) − θ(x k )). (8.2.27)
τk
Let K̂ ⊂ K be such that {x k }k∈K̂ , {d k }k∈K̂ are convergent subsequences with limits x ∗ and
d. Note that
1 1
(θ (x k + τ k d k ) − θ (x k )) = k (θ (x k + τ k d) − θ(x k ))
τk τ (8.2.28)
1
+ k (θ (x k + τ k d k ) − θ(x k + τ k d))
τ

i i

i i
ItoKunisc
i i
2008/6/12
page 230
i i

230 Chapter 8. Semismooth Newton Methods I

with
1
lim k
(θ (x k + τ k d k ) − θ(x k + τ k d)) = 0,
k∈K̂,k→∞ τ

since θ is locally Lipschitz continuous. By Lemma 8.8(a) we have (F (x k ), G(x k ; d k )) =


−2η|d k | for all k ∈ K̂. Since G is assumed to be a quasi-directional derivative we have
θ o (x ∗ , d) = (F (x ∗ ), F o (x ∗ ; d)) ≤ −2η|d|. Passing to the limit in (8.2.27), utilizing
(8.2.28), we find
1
−σ |d|2 ≤ lim sup k
(θ (x k + τ k d k ) − θ(x k )) = θ o (x ∗ , d) ≤ −2η|d|2 .
k∈K̂, k→∞
τ

Since σ ∈ (0, 2η) this implies that d = 0, which contradicts our assumption. Consequently
limk→∞ d k = 0. From Lemma 8.8(a) we have

2α(F (x k ), G(x k ; h)) + ηα 2 |h|2 = J (x k , αh) ≥ J (k k ; d k ) = −η|d k |2

for every α > 0 and h ∈ Rn . Passing to the limit with respect to k and dividing by α we
obtain
2 (F (x ∗ ), G(x ∗ ; h)) + ηα|h|2 ≥ 0,
and (8.2.25) follows by letting α → 0+ .
(b) The proof is identical to the one of Theorem 8.6(b).
(c) From Theorem 8.3 there exists a bounded neighborhood N ⊂ {x : |F (x)| ≤
|F (x 0 )|} of x ∗ and a constant C such that for all x ∈ N and V ∈ ∂B F (x) we have
|V −1 | ≤ C. Consequently there exists M > 0 such that for all x ∈ N we have
 
α|F (x)|I + V (x)T V (x) −1 V (x)T − V (x)−1
 −1 (8.2.29)
≤ α|F (x)| α|F (x)|I + V (x)T V (x) ≤ M|F (x)| ≤ M|F (x 0 )|.
 −1
Let h = − α|F (x)|I + V (x)T V (x) V (x)T F (x). Then by Lemma 8.4, possibly after
shrinking N , we have for all x ∈ N

|x + h − x ∗ | = |x − V (x)−1 F (x) − x ∗ + h + V (x)−1 F (x)|


(8.2.30)
≤ (|x − x ∗ |)|x − x ∗ | + M|F (x)|2 .
Moreover
 
|F (x + h)| ≤ |F x − V (x)−1 F (x) |
 
+|F [ V (x)−1 − (α|F (x)|I + V (x)T V (x))−1 V (x)T F (x) ]|
 
≤ L̄(|x − x ∗ |)|F (x)| + M L̄2 |F (x)|2 ≤ L̄ (|x − x ∗ |) + M L̄2 |x − x ∗ | |F (x)|,

where we used (8.2.29) and denoted by L̄ the Lipschitz constant of F on the bounded set
{x : |x| ≤ M|F (x0 )|2 } ∪ {x − V (x)−1 F (x) : x ∈ N } ∪ N .

i i

i i
ItoKunisc
i i
2008/6/12
page 231
i i

8.2. Semismooth functions in finite dimensions 231

Since x k → x ∗ , the last estimate implies the existence of an index k̄ such that x k ∈ N and

|F (x k + hk )| ≤ γ |F (x k )| for all k ≥ k̄,


 −1
where hk = − α|F (x k )|I + V (x k )T V (x k ) V (x k )T F (x k ). Thus the first alternative in
the Gauss–Newton algorithm is chosen for all k ≥ k̄, and superlinear convergence follows
from (8.2.30) with (x, h) = (x k , hk ), where we also use that |F (x)| ≤ L̄|x| for x ∈ N .

8.2.5 A nonlinear complementarity problem


Consider the complementarity problem

⎨ φ ≤ x ≤ ψ, g(x)i (x − ψ)i (x − φ)i = 0 for all i,

g(x) ≤ 0 for x ≥ ψ and g(x) ≥ 0 for x ≤ φ,

where g ∈ C 1 (Rn , Rn ) and φ < ψ. This corresponds to the bilateral constraint case and
can equivalently be expressed as

F (x) = g(x) + max(0, −g(x) + x − ψ) + min(0, −g(x) + x − φ) = 0;

see Section 4.7, Example 4.53. Clearly θ(x) = |F (x)|2 is locally Lipschitz continuous and
directionally differentiable, and hence B-differentiable.
Define

A+ = {−g(x) + x − ψ > 0}, A− = {−g(x) + x − φ < 0},

I 1 = {−g(x) + x − ψ = 0}, I 2 = {x − ψ < g(x) < x − φ}, and

I 3 = {−g(x) + x − φ = 0}.

We obtain


⎪ d on A+ ∪ A− ,






⎨ g (x)d on I 2 ,

F (x; d) =



⎪ max(g (x)d, d) on I 1 ,





min(g (x)d, d) on I 3 ,

and the Bouligand direction (8.2.16) is as the solution to

d + x − ψ = 0 on A+ , d + x − φ = 0 on A− , g (x)d + g(x) = 0 on I 2 ,

max(g (x)d, d) + F (x) = 0 on I 1 , min(g (x)d, c d) + F (x) = 0 on I 3 .


(8.2.31)

i i

i i
ItoKunisc
i i
2008/6/12
page 232
i i

232 Chapter 8. Semismooth Newton Methods I

Further if g (x) − I is surjective and g ∈ C 2 (Rn , Rn ), then F o (x, d) satisfies




⎪ d on A+ ∪ A− ,






⎨ g (x)d on I 2 ,
F (x; d) =
o
(8.2.32)



⎪ max(g (x)d, d) on {F (x) > 0} ∪ (I 1 ∪ I 3 ),





min(g (x)d, d) on {F (x) ≤ 0} ∪ (I 1 ∪ I 3 ).

To verify this claim we consider F (x) = g(x) + max(0, −g(x) + x − ψ). The general case
then follows with minor modifications. We find
2
θ o (x, d) = lim sup (F (x), F (y + td) − F (y))
y→x,t→0+ t
2
= lim sup (F (x), max(g (x)(y + td), y + td − ψ̂) − max(g (x)y, y − ψ̂)),
y→x,t→0+ t
(8.2.33)

where ψ̂ = ψ + g(x) − g (x)x. For d ∈ Rn we define r ∈ Rn by

ri > 0 on S1 = {i ∈ I 1 : Fi (x) > 0, (Ad)i > di } ∪ {i ∈ I 1 : Fi (x) ≤ 0, (Ad)i < di },


ri < 0 on S2 = {i ∈ I 1 : Fi (x) > 0, (Ad)i ≤ di } ∪ {i ∈ I 1 : Fi (x) ≤ 0, (Ad)i ≥ di },

and r arbitrary otherwise. Here we set A = g (x). Let y ∈ Rn be such that

Ay = y − ψ̂ + r,

and choose t = ty such that

(Ay + tAd)i > (y + td − ψ̂)i for i ∈ S1 ,


(Ay + tAd)i < (y + td − ψ̂)i for i ∈ S2

for t ∈ (0, ty ). Passing to the limit in (8.2.33) we arrive at (8.2.32).


For arbitrary δ > 0 we claim that the quasi-directional derivative G, defined by


⎪ d on Aδ = {−g(x) + x − ψ > δ} ∩ {−g(x) + x − φ < −δ},





⎪ g (x)d on Iδ = {x − ψ + δ ≤ g(x) ≤ x − φ − δ},


G(x; d) = and otherwise (8.2.34)





⎪ max(g (x)d, d) if {F (x) > 0},




min(g (x)d, d) if {F (x) ≤ 0},

is a quasi-directional derivative for F . In fact, G is clearly positively homogeneous of


degree 1, and
(F (x), F (x, d)) ≤ (F (x), G(x, d)),

i i

i i
ItoKunisc
i i
2008/6/12
page 233
i i

8.2. Semismooth functions in finite dimensions 233

which shows (i) of Definition 8.7. Moreover we have

(F (x), F o (x, d)) ≤ (F (x), G(x, d)).

If |x − x̄| sufficiently small, then

i ∈ Iδ (x) implies that i ∈ I 2 (x̄),

i ∈ Aδ (x) implies that i ∈ A(x̄).

Thus, (iii) holds by the definition of G.


To argue that (8.2.18) has a solution, let Aδ ∪ Iδ ∪ (Aδ ∩ Iδ )c be a pairwise disjoint
partition of the index {i = 1, . . . , n}. Set A = g (x), for x ∈ S, and decompose A according
to the partition of the index set
⎛ ⎞
A11 A12 A13
A = ⎝ A21 A22 A23 ⎠ .
A31 A32 A33

Setting d = (d1 , d2 , d3 )T and F = (F1 , F2 , F3 )T , the equation

G(x; d) = −F

is equivalent to

d1 = −F1 ,

d2 = −A−1 −1
22 A23 d3 + A22 (−F2 + A21 F1 ),

and


⎪ Md3 + μ̃ = −w − F3 ,



μ̃ = max(0, μ̃ + d3 + F3 ) on {F (x) > 0}, (8.2.35)





μ̃ = min(0, μ̃ + d3 + F3 ) on {F (x) ≤ 0},

where

w = −A31 F1 + A32 A−1


22 (−F2 + A21 F1 ) and M = A33 − A32 A−1
22 A23 .

Assume that A is symmetric positive definite for every x ∈ S. Then every Schur complement
of A is positive definite as well, and (8.2.35) admits a unique solution (d3 , μ̃). It follows
that G(x; d) = −F admits a unique solution d for every x ∈ S and F ∈ Rn . Consequently
(8.2.18) is satisfied. Since g (x) is positive definite for every x ∈ S and S is closed and
bounded by (8.2.9), and hence compact, it follows that g (x) and M are uniformly positive
definite with respect to x ∈ S. This implies that there exists b̄ such that |d| ≤ b̄|F | for some
b̄ independent of x ∈ S. Consequently (8.2.17) holds.
We remark that (iii) of Definition 8.7 is not satisfied for the choice G(x; d) = F (x, d)
unless g(x)i  = (x − ψ)i and g(x)i  = (x − φ)i for all i and x ∈ S.

i i

i i
ItoKunisc
i i
2008/6/12
page 234
i i

234 Chapter 8. Semismooth Newton Methods I

8.3 Semismooth functions in infinite-dimensional spaces


In infinite-dimensional spaces notions of generalized derivatives for functions which are not
C 1 cannot rely on Rademacher’s theorem. Here, instead, we shall mainly utilize a concept
of generalized derivative that is sufficient to guarantee superlinear convergence of Newton’s
method. We refer the reader to the discussion in Section 8.1, especially (8.1.5) and (8.1.6).
This notion of differentiability is called Newton derivative and will be defined below. We
refer the reader to [HiIK, CNQ, Ulb] for further discussions of the topics covered in this
section.
Let X, Z be real Banach spaces and let D ⊂ X be an open set.

Definition 8.10. (1) F : D ⊂ X → Z is called Newton differentiable at x if there exist an


open neighborhood N (x) ⊂ D and mappings G : N (x) → L(X, Z) such that
|F (x + h) − F (x) − G(x + h)h|Z
lim = 0. (A)
|h|→0 |h|X
The family {G(s) : s ∈ N (x)} is called an N -derivative of F at x.
(2) F is called semismooth at x if it is Newton differentiable at x and

lim G(x + t h)h exists uniformly in |h| = 1.


t→0+

(3) F is directionally differentiable at x ∈ D if


F (x + t h) − F (x)
lim =: F (x; h)
t→0+ t
exists for all h ∈ X.
(5) F is B-differentiable at x ∈ D if F is directionally differentiable at x and
F (x + h) − F (x) − F (x; h)
lim = 0.
|h|→0 |h|X

Note that differently from the finite-dimensional case we do not require Lipschitz
continuity of F as part of the definition of semismoothness.

Lemma 8.11. Suppose that F : D ⊂ X → Z is Newton differentiable at x ∈ D with


N-derivative G.
(1) F is directionally differentiable at x if and only if

lim G(x + t h)h exists for all h ∈ X. (8.3.1)


t→0+

In this case F (x; h) = limt→0+ G(x + th)h for all h ∈ X.


(2) F is B-differentiable at x if and only if

lim G(x + t h)h exists uniformly in |h| = 1, (8.3.2)


t→0+

i.e., F is semismooth.

i i

i i
ItoKunisc
i i
2008/6/12
page 235
i i

8.3. Semismooth functions in infinite-dimensional spaces 235

Proof. (1) If (8.3.1) holds for h ∈ X with |h| = 1, then


F (x + th) − F (x)
lim = lim+ G(x + th)h.
t→0+ t t→0

Since h ∈ X with |h| = 1 was arbitrary this implies that F is directionally differentiable at
x and
F (x; h) = lim+ G(x + th)h.
t→0

Similarly the converse holds.


(2) If F is directionally differentiable at x, then F is B-differentiable at x, i.e.,
F (x + h) − F (x) − F (x; h)
lim = 0 if and only if
|h|→0 |h|X

F (x + tv) − F (x)
lim − F (x; v) = 0 and the limit is uniform in |v|X = 1.
t→0+ t
Here we use positive homogeneity of the directional derivative F (x; h) with respect to the
second variable.
If F is B-differentiable at x, then it is differentiable and from (1) we have
F (x + tv) − F (x)
lim = F (x; v) = lim G(x + tv)v.
t→0 t t→0

The Bouligand property and the equivalence stated above imply that the limt→0 G(x + tv)v
exists uniformly in |v| = 1. The converse easily follows as well.

Example 8.12. Let ψ : R → R be semismooth at every x ∈ R in the sense of Definition 8.1


and globally Lipschitz, i.e., there exists a constant L such that

|ψ(s) − ψ(t)| ≤ L|s − t| for all s, t ∈ R.

We first argue that there exists a measurable selection V : R → R such that V (t) ∈ ∂ψ(t)
for a.e. t ∈ R. Since ∂ψ(s) is a nonempty closed set in R for every s ∈ R (see [Cla,
p. 70]), it suffices to prove that the multivalued function s → ∂ψ is measurable; i.e., for
every compact set C ⊂ R the preimage PC = {t ∈ R : ∂ψ(t) ∩ C  = ∅} is measurable. The
measurable selection theorem (see [Cla, p. 111]) then ensures the existence of the desired
measurable selection V . To verify measurability of s → ∂ψ(s) let C be a compact set
and let {tk } be a convergent sequence in PC with limit t ∗ . Choose vk ∈ ∂ψ(tk ) ∩ C for
k = 1, 2, . . . . By compactness of C there exists a convergent subsequence, denoted by the
same symbol, with limit v ∗ ∈ C. Since tk → t ∗ , upper semicontinuity of ∂ψ at t ∗ (see [Cla,
p. 70]) implies the existence of a sequence ṽk ∈ ∂ψ(t ∗ ) such that limk→∞ (ṽk − vk ) = 0.
Consequently limk→∞ ṽk = v ∗ and by closedness of ∂ψ(t ∗ ) we have v ∗ ∈ ∂ψ(t ∗ ) ∩ C.
Thus PC is closed and therefore measurable.
Associated to ψ with the properties specified above, we define for 1 ≤ p ≤ q ≤ ∞
the substitution operator

F : Lq () → Lp ()

i i

i i
ItoKunisc
i i
2008/6/12
page 236
i i

236 Chapter 8. Semismooth Newton Methods I

by
F (x)(s) = ψ(x(s)) for a.e. s ∈ ,
where x ∈ L (), and with  a bounded domain in Rn .
q

We now verify that F is Newton differentiable on R if 1 ≤ p < q ≤ ∞ and that any


measurable selection V of ∂ψ provides an N -derivative. Let D : R × R → R be given by
D(s, v) = |ψ(s + v) − ψ(s) − V (s + v)v|.
By Theorem 8.2 we have
lim v −1 D(s, v) = 0 for every s ∈ R. (8.3.3)
v→0

Moreover global Lipschitz continuity of ψ implies that


D(s, v) ≤ 2L|v| for all (s, v) ∈ R2 . (8.3.4)
Let x ∈ Lq () and let {hk } be a sequence in Lq () converging to 0. Then there exists a sub-
sequence {hk } converging to 0 a.e. in . By (8.3.3) this implies that hk (s)−1 D(x(s), hk (s))
→ 0 for a.e. s ∈ . By (8.3.4) Lebesgue’s bounded convergence theorem is applicable
and implies that h−1
k D(x, hk ) converges to 0 in L for every 1 ≤ p̂ < ∞. Since this is the

case for every a.e. convergent subsequence of hk and since {hk } was arbitrary, we find that
|h−1 D(x, h)|Lp̂ → 0 for every 1 ≤ p̂ < ∞ as |h|Lq () → 0. (8.3.5)
By the Hölder inequality we obtain
|D(x, h)|Lp ≤ |h−1 D(x, h)|Lr |h|Lq ,
where r = q−p qp
if q < ∞, and r = p if q = ∞. Using (8.3.5) this implies Newton
differentiability of F at x.
To verify that F is semismooth at x ∈ Lq (), we shall show that
|V (x + h)h − ψ (x; h)|Lp
→ 0 as |h|Lq → 0. (8.3.6)
|h|Lq
Let D̄ : R × R → R be given by D̄(s, v) = |V (s + v)v − ψ (s; v)|. Then by Theorem 8.2
(3) and Lipschitz continuity of ψ we have
lim v −1 D̄(s, v) = 0 and D̄(s, v) ≤ 2L|v| for all (s, v) ∈ R2 .
v→0

The proof of (8.3.6) can now be completed in the same manner as that for Newton dif-
ferentiability, by using Lebesgue’s bounded convergence theorem. Semismoothness of
F : Lq () → Lp () follows from (8.3.5), (8.3.6), and Lemma 8.11 (2). The class of
mappings F of this example was treated in [Ulb].
Example 8.13. Let X be a Hilbert space. Then the norm functional F (x) = |x| is Newton
differentiable. In fact, let G(x + h)h = ( |x+h|
x+h
, h)X and G(0)h = (λ, h)X for some λ with
λ ∈ X. Then

(2(x + h), h)X − |h|2 (x + h, h)X
|h|−1 |F (x + h) − F (x) − G(x + h)h| = |h|−1 − →0
|x| + |x + h| |x + h|
as h → 0. Hence F is Newton differentiable on X. Moreover F is semismooth.

i i

i i
ItoKunisc
i i
2008/6/12
page 237
i i

8.3. Semismooth functions in infinite-dimensional spaces 237

Example 8.14. Let F : Lq () → Lp () denote the pointwise max operation F (x) =
max(0, x) and define Gm by

⎨ 0 if x(s) < 0,
Gm (x)(s) = δ if x(s) = 0,

1 if x(s) > 0,

where δ ∈ [0, 1]. It follows from Example 8.12 that F is semismooth from Lq () into
Lp () provided that 1 ≤ p < q ≤ ∞, with Gm as N -derivative. It can also be argued
that Gm is an N-derivative for F for any choice of δ ∈ R (see [HiIK]). If p = q, then F is
directionally differentiable at every x ∈ Lq (). In fact, for h ∈ Lq () define

0 if x(s) < 0 or x(s) = 0, h(s) ≤ 0,
F (x; h)(s) =
h(s) if x(s) > 0 or x(s) = 0, h(s) ≥ 0.

Then we have

F (x(s) + t h(s)) − F (x(s))
− F (x; h)(s) ≤ 2|h(s)|

t
and

F (x(s) + t h(s)) − F (x(s))
lim+ − F (x; h)(s) for a.e. s ∈ .
t→0 t
By the Lebesgue dominated convergence theorem

F (x + t h) − F (x)
lim+ − F (x; h) = 0,
t→0 t Lp

i.e., F (x; h) is the directional derivative of F at x.


However, F is not Newton differentiable with Gm as N-derivative in general. For
this purpose consider x = −|s| on  = (−1, 1) and choose hn (s) as n1 multiplied by the
p
characteristic function of the interval (− n1 , n1 ). Then |hn |Lp = np+1
2
and
 1  1  p+1
n 2 1
|F (x + hn ) − F (x) − Gm (x + hn )hn | ds =
p
|x(s)| =
p
.
−1 − n1 p+1 n

Thus,
  p1
|F (x + hn ) − F (x) − Gm (x + hn )hn |Lp 1
lim =  = 0,
n→∞ |hn |Lp p+1
and hence condition (A) is not satisfied at x for any p ∈ [1, ∞). To consider the case
p = ∞ we choose  = (0, 1) and show that (A) is not satisfied at x(s) = s. For this
purpose define for n = 2, . . .

⎨ −(1 + n )s
⎪ on (0, n1 ] ,
1

hn (s) = (1 + n1 )s − n2 (1 + n1 ) on ( n1 , n2 ] ,


0 on ( n2 , 1] .

i i

i i
ItoKunisc
i i
2008/6/12
page 238
i i

238 Chapter 8. Semismooth Newton Methods I

Observe that En = {s : x(s) + hn (s) < 0} ⊃ (0, n1 ]. Therefore


1
lim |max(0, x + hn ) − max(0, x) − Gm (x + hn )hn |L∞ (0,1)
n→∞ |hn |L∞ (0,1)

n2 n
= lim |x|L∞ (En ) ≥ lim = 1,
n→∞ n + 1 n→∞ n + 1

and hence (A) cannot be satisfied.

Lemma 8.15. Suppose that H : D ⊂ X → Y is continuously Fréchet differentiable at x ∈


D and φ : Y → Z is Newton differentiable at H (x) with N -derivative G. Then F = φ(H )
is Newton differentiable at x with N-derivative G(H (x + h))H (x + h) ∈ L(X, Z) for h
sufficiently small.

Proof. Let U be a convex neighborhood of x in D such that H ∈ L(X, Y ) is continuous in U


and H (U ) is contained in N (H (x)), with N (H (x)) defined according to N -differentiability
of φ at H (x). Let h ∈ X be such that x ∗ + h ∈ U and note that
 1
H (x + h) = H (x) + H (x + θ h)h dθ, (8.3.7)
0

where
/ /
/ 1 /
/ H (x + θ h) dθ − H (x + h)/
/ /→0 as |h|X → 0, (8.3.8)
0

since H ∈ L(X, Y ) is continuous at x. By Newton differentiability of φ at H (x) and (8.3.7)


it follows that
 1
1
lim φ(H (x + h)) − φ(H (x)) − G(H (x + h)) H (x + θ h)h dθ = 0.

|h|X →0 |h|X 0 Z

From (8.3.8) we deduce that


1
lim φ(H (x + h)) − φ(H (x)) − G(H (x + h))H (x + h)h Z = 0.
|h|X →0 |h|X
This implies Newton differentiability of F = φ(H ) at x.

Theorem 8.16. Suppose that x ∗ is a solution to F (x) = 0 and that F is Newton differentiable
at x ∗ with N-derivative G. If G is nonsingular for all x ∈ N (x ∗ ) and {G(x)−1  : x ∈
N (x ∗ )} is bounded, then the Newton iteration

x k+1 = x k − G(x k )−1 F (x k )

converges superlinearly to x ∗ provided that |x 0 − x ∗ | is sufficiently small.

Proof. Note that the Newton iterates satisfy

|x k+1 − x ∗ | ≤ G(x k )−1  |F (x k ) − F (x ∗ ) − G(x k )(x k − x ∗ )| (8.3.9)

i i

i i
ItoKunisc
i i
2008/6/12
page 239
i i

8.3. Semismooth functions in infinite-dimensional spaces 239

if x k ∈ N (x ∗ ). Let B(x ∗ , r) denote a ball of radius r centered at x ∗ contained in N (x ∗ ) and


let M be such that G(x)−1  ≤ M for all x ∈ B(x ∗ , r). We apply (A) with x = x ∗ . Let
η ∈ (0, 1] be arbitrary. Then there exists ρ ∈ (0, r) such that
η 1
|F (x ∗ + h) − F (x ∗ ) − G(x ∗ + h)h| < |h| ≤ |h| (8.3.10)
M M
for all |h| < ρ. Consequently, if we choose x 0 such that |x 0 − x ∗ | < ρ, then by induction
from (8.3.9), (8.3.10) with h = x k − x ∗ we have |x k+1 − x ∗ | < ρ and in particular
x k+1 ∈ B(x ∗ , ρ). It follows that the iterates are well defined. Moreover, since η ∈ (0, 1] is
chosen arbitrarily, x k → x ∗ converges superlinearly.

The following theorem provides conditions which guarantee in an appropriate sense


global convergence of the Newton iteration [QiSu].

Theorem 8.17. Suppose that F is continuous and directionally differentiable on the closed
sphere S = B(x 0 , r). Assume also the existence of bounded operators G(·) ∈ L(X, Z) and
constants β, γ ∈ R+ such that
G(x)−1  ≤ β, |G(x)(y − x) − F (x; y − x)| ≤ γ |y − x|,

|F (y) − F (x) − F (x; y − x)| ≤ δ |y − x|


for all x, y ∈ S, where α = β(γ + δ) < 1, and
β |F (x 0 )| ≤ r(1 − α).
Then the iterates defined by
x k+1 = x k − G(x k )−1 F (x k ) for k = 0, 1, . . .
remain in S and converge to the unique solution x ∗ of F (x) = 0 in S. Moreover, we have
the error estimate
α
|x k − x ∗ | ≤ |x k − x k−1 |.
1−α

Proof. First note that


|x 1 − x 0 | ≤ |G(x 0 )−1 F (x 0 )| ≤ β |F (x 0 )| ≤ r(1 − α).
Thus x 1 ∈ S. Suppose that x 1 , . . . , x k ∈ S. Then
|x k+1 − x k | ≤ |G(x k )−1 F (x k )| ≤ β |F (x k )|

≤ β |F (x k ) − F (x k−1 ) − F (x k−1 ; x k − x k−1 )|

+β |G(x k−1 )(x k − x k−1 ) − F (x k−1 ; x k − x k−1 )|

≤ β(δ + γ ) |x k − x k−1 | = α |x k − x k−1 | ≤ α k |x 1 − x0 | ≤ r(1 − α)α k .


(8.3.11)

i i

i i
ItoKunisc
i i
2008/6/12
page 240
i i

240 Chapter 8. Semismooth Newton Methods I

Since

k 
k
|x k+1 − x 0 | ≤ |x j +1 − x j | ≤ rα j (1 − α) ≤ r,
j =0 j =0

we have r k+1
∈ S and by induction x ∈ S for all k. For each m > n
k


m−1 
m−1
|x m − x n | ≤ |x j +1 − x j | ≤ rα j (1 − α) ≤ rα n .
j =n j =n

Hence {x k } is a Cauchy sequence in S and there exists limk→∞ x k = x ∗ ∈ S. By locally


Lipschitz continuity of F and (8.3.11)
|F (x ∗ )| = lim |F (x k )| ≤ lim αβ −1 |x k − x k−1 | = 0,
i.e., F (x ∗ ) = 0. For y ∗ ∈ S satisfying F (y ∗ ) = 0 we find
|y ∗ − x ∗ | ≤ β |G(x ∗ )(y ∗ − x ∗ )|

≤ β |F (y ∗ ) − F (x ∗ ) − F (x ∗ ; y ∗ − x ∗ )|

+β |G(x ∗ )(y ∗ − x ∗ ) − F (x ∗ ; y ∗ − x ∗ )|

≤ α |y ∗ − x ∗ |.
This implies y ∗ = x ∗ and hence x ∗ is the unique solution to F (x) = 0 in S. Finally

m−1 
m−k
α
|x m − x k | ≤ |x j +1 − x j | ≤ α j |x k − x k−1 | ≤ |x k − x k−1 |
j =k j =1
1−α

implies the asserted error estimate by letting m → ∞.

8.4 The primal-dual active set method as a semismooth


Newton method
Let us consider the complementarity problem in the unknowns (x, μ) ∈ L2 () × L2 ()
Ax + μ = a,
(8.4.1)
μ = max(0, μ + c(x − ψ)),

where A ∈ L(L2 ()), with  a bounded domain in Rn , c > 0, and a ∈ Lp (), ψ ∈ Lp ()
for some p > 2. Recall that the second equation in (8.4.1) is equivalent to
x ≤ ψ, μ ≥ 0, (μ, x − ψ)L2 () = 0, (8.4.2)
where the inequalities and the max operation are understood in the pointwise a.e. sense. In
Example 7.1 of Chapter 7 it was shown that such problems arise in the context of constrained
optimal control problems with A of the form

i i

i i
ItoKunisc
i i
2008/6/12
page 241
i i

8.4. The primal-dual active set method as a semismooth Newton method 241

A = αI + B ∗ (− )−2 B, (8.4.3)
where α > 0, B is a bounded operator, and denotes the Laplacian with homogeneous
Dirichlet boundary conditions. This suggests and justifies our assumption that

A = αI + C, where C ∈ L(L2 (), Lp ()) with p > 2. (8.4.4)

We shall show that the primal-dual active set method discussed in Chapter 7 is equiv-
alent to the semismooth Newton method applied to

⎨ Ax − a + μ,
0 = F (x, μ) = (8.4.5)

μ − max(0, μ + c (x − ψ)).

For this purpose we define G : Lp () → L2 () by



⎨ 0 if x(s) ≤ 0,
G(x)(s) = (8.4.6)

1 if x(s) > 0,

and we recall that the max function is Newton differentiable from Lp () to L2 () if p > 2
with G as an N -derivative. A Newton step applied to the second equation in (8.4.4) results in

c (x k+1 − x k ) + c (x k − ψ) = 0 in Ak = {s : μk (s) + c (x k (s) − ψ(s)) > 0},

(μk+1 − μk ) + μk = 0 in Ik = {s : μk (s) + c (x k (s) − ψ(s)) ≤ 0}.

Hence a Newton step for (8.4.5) is given by




⎪ Ax k+1 − a + μk+1 = 0,



x k+1 = ψ in Ak , (8.4.7)




⎩ k+1
μ = 0 in Ik .

This coincides with the primal-dual active set strategy of Section 7.1. To analyze its local
convergence properties note that under assumption (8.4.4), equation (8.4.5) with c = α is
equivalent to the reduced problem

Ax − a + max(0, −Cx + a − αψ) = 0. (8.4.8)

Applying a semismooth Newton step to (8.4.8) results in

A(x k+1 − x k ) − G(−Cx k + a − αψ) C(x k+1 − x k )


(8.4.9)
+Ax k − a + max(0, −Cx k + a − αψ) = 0.

Setting μk+1 = a − Ax k+1 the iterates of (8.4.9) coincide with those of (8.4.7), provided
that the initialization for the reduced iteration (8.4.9) is chosen such that μ0 = a − Ax 0 .
This follows from the fact that μk + α(x k − ψ) = −Cx k + a − αψ.

i i

i i
ItoKunisc
i i
2008/6/12
page 242
i i

242 Chapter 8. Semismooth Newton Methods I

For any partition  = A ∪ I into measurable sets A and I let RI : L2 (I) → L2 (I)
denote the canonical restriction operator and RI∗ : L2 (I) → L2 (I) its adjoint. Further set
AI = RI A RI∗ .

Proposition 8.18. Assume that (8.4.1) with ψ and a in Lp (), p > 2, admits a solution
(x ∗ , μ∗ ) ∈ L2 () × L2 () and that (8.4.4) holds. If moreover
{A−1
I :  = I ∪ A} is uniformly bounded in L(L ()),
2
(8.4.10)
then the iterates (x k , μk ) defined by (8.4.7) converge superlinearly in L2 () × L2 ()
to (x ∗ , μ∗ ), provided that |x ∗ − x 0 | is sufficiently small and μ0 = a − Ax 0 . Moreover
(x ∗ , μ∗ ) ∈ Lp () × Lp ().

Proof. First note that if (8.4.1) admits a solution (x ∗ , μ∗ ) for some c > 0, then it admits the
same solution with c = α and x ∗ is also a solution of (8.4.8). By (8.4.4) this implies that
x ∗ ∈ Lp () and the first equation in (8.4.1) implies that μ∗ ∈ Lp (). By Lemma 8.15 and
Example 8.14 the mapping F̃ x = αAx−a+max(0, −Cx+a−αψ) is Newton differentiable
from L2 () into itself and G̃(x) = A − G(−Cx + a − αψ)C is an N -derivative. By
(8.4.10) the family of operators {G̃(x)−1 : x ∈ L2 ()} is uniformly bounded in L(L2 ()),
and hence by Theorem 8.16 the iterates x k converge superlinearly to x ∗ , provided that
|x 0 − x ∗ | is sufficiently small. Superlinear convergence of μk to μ∗ follows from (8.4.5)
and (8.4.7).

Note that (8.4.10) is satisfied for operators of the form given in (8.4.3).
The characterization of inequalities by means of max and min operations in comple-
mentarity systems such as (8.4.2) is one of several possibilities.
8 Another frequently used
complementarity function is the Fischer–Burmeister function (ψ − x)2 + μ2 −(ψ −x)−
μ. Numerical experiments appear to indicate that max-based complementarity functions
can be more efficient numerically than the Fischer–Burmeister function; see, e.g., [Kan].
As shown in Section 7.5 a class of optimization problems with bilateral constraints
ϕ ≤ x ≤ ψ can be expressed as
Ax − a + μ = 0, μ = max(0, μ + c (x − ψ) + min(0, μ + c (x − ϕ)) (8.4.11)
or, equivalently, as
Ax − a + max(0, −Cx + a − αψ) + min(0, −Cx + a − αψ) = 0, (8.4.12)
where c = α and ϕ < ψ with a, ϕ, and ψ in L (). p

The primal-dual active set strategy applied to (8.4.11) can be expressed as


Ax k+1 − a + μk+1 = 0,

x k+1 = ψ in A+
k = {s : μ (s) + c (x (s) − ψ(s)) > 0},
k k

μk+1 = 0 in Ik = {s : μk (s) + c (x k (s) − ϕ(s)) ≤ 0 ≤ μk (s) + c (x k (s) − ψ(s))},

x k+1 = ϕ in A−
k = {s : μ (s) + c (x (s) − ϕ(s)) < 0}.
k k

(8.4.13)

i i

i i
ItoKunisc
i i
2008/6/12
page 243
i i

8.5. Semismooth Newton methods for nonlinear complementarity problems 243

In the bilateral case, the role of c influences the likelihood of switches from being
active-above to active-below in one iteration. If, for example, s ∈ A+ k , then μ
k+1
(s) +
c (x (s) − ϕ(s)) = μ (s) + c (ψ(s) − ϕ(s)) is more likely to be negative and hence
k+1 k+1

s ∈ A− k+1 if c is large.
For c = α, iteration (8.4.13) is equivalent to applying the semismooth Newton method
to (8.4.12) with G as N -derivative for the max function and

⎨ 0 if x(s) ≥ 0,
G̃(x)(s) =

1 if x(s) < 0
as N-derivative for the min function. Under the assumptions of Proposition 8.18 local
superlinear convergence of the iteration (8.4.13) can be shown.

8.5 Semismooth Newton methods for a class of nonlinear


complementarity problems
In this section we consider nonlinear complementarity problems in the Hilbert space X =
L2 (). Let g be a continuous mapping from L2 () into itself with  a bounded domain in
Rn , and consider
g(x) + μ = 0, μ ≥ 0, x ≤ ψ, and (μ, x − ψ) = 0, (8.5.1)
where ψ ∈ L2 (). As discussed in Chapter 7, (8.5.1) can be expressed equivalently as

⎨ g(x) + μ,
0 = F (x, μ) = (8.5.2)

μ − max(0, μ + c (x − ψ)),
for any c > 0, where max denotes the pointwise max operation.
If  is a continuously differentiable functional on X, then (8.5.1) with g =  is the
necessary optimality condition for
min (x) subject to x ≤ ψ. (8.5.3)
x∈L2 ()

We shall employ semismooth Newton methods for solving F (x, μ) = 0. Let G(x) be as
defined in Section 8.4.
Applying a primal-dual active set strategy to the second equation in (8.5.2) results in
g(x k+1 ) + μk+1 = 0,

x k+1 = ψ in Ak = {s : μk (s) + c (x k (s) − ψ(s)) > 0}, (8.5.4)

μk+1 = 0 in Ik = {s : μk (s) + c (x k (s) − ψ(s)) ≤ 0}


for k = 0, 1, . . . . To investigate the convergence properties of (8.5.4) we note that (8.5.2)
is equivalent to
g(x) + max(0, −g(x) + c (x − ψ)) = 0, (8.5.5)

i i

i i
ItoKunisc
i i
2008/6/12
page 244
i i

244 Chapter 8. Semismooth Newton Methods I

and the iteration (8.5.4) is equivalent to the reduced iteration


 
g(x k+1 ) + G −g(x k ) + c (x k − ψ) −g(x k+1 ) + c (x k+1 − ψ)
 (8.5.6)
−(−g(x k ) + c (x k − ψ)) + max(0, −g(x k ) + c (x k − ψ)) = 0,

provided that the initialization for (8.5.4) is chosen such that μ0 = −g(x 0 ). Note that (8.5.5)
can be considered as a partial semismooth Newton iteration. Separating the generalized
differentiation of the max operation from that of the linearization of the nonlinearity g was
investigated numerically in [dReKu, GrVo].

Proposition 8.19. Assume that (8.5.2) admits a solution x ∗ , that x → g(x) − c (x − ψ) is


Lipschitz continuous from L2 () to Lp () in a neighborhood U of x ∗ for some c > 0 and
p > 2, and that there exists α > 0 such that

α|x − y|L2 (I) ≤ (g(x) − g(y))(s)(x − y)(s) ds
2
(8.5.7)
I

for all partitions  = A ∪ I and x, y ∈ L2 (). Then the iterates (x k , μk ) defined by


(8.5.4) converge superlinearly to (x ∗ , μ∗ ), provided that |x ∗ − x 0 | is sufficiently small and
μ0 = −g(x 0 ).

Proof. From (8.5.5) and (8.5.6) we have


g(x k+1 ) − g(x ∗ ) + G(zk )(−g(x k+1 ) + c(x k+1 − ψ) + g(x ∗ ) − c(x ∗ − ψ))
 
= G(zk ) −g(x k ) + c(x k − ψ) + g(x ∗ ) − c(x ∗ − ψ)

+ max(0, −g(x ∗ ) + c(x ∗ − ψ)) − max(0, −g(x k ) + c(x k − ψ)),

where zk = −g(x k ) + c(x k − ψ). Taking the inner product in L2 () with x k+1 − x ∗ , using
(8.5.7) to estimate the left-hand side from below, and Example 8.14 and Lipschitz continuity
of x → g(x) + c(x − ψ) from L2 () to Lp () to bound the right-hand side from above
we find
8
min(c, α)|x k+1 − x ∗ |L2 () ≤ o(|x k − x ∗ |L2 () ).
Finally (8.5.2) and (8.5.4) imply that
μk+1 − μ∗ = g(x ∗ ) − g(x k+1 ),
and hence superlinear convergence of μk to μ∗ follows from superlinear convergence of x k
to x ∗ .

A full semismooth Newton step applied to (8.5.5) is given by


  
g (x k )(x k+1 − x k ) + G −g(x k ) + c (x k − ψ) −g (x)(x k+1 − x k ) + c (x k+1 − x k )

+g(x k ) + max(0, −g(x k ) + c (x k − ψ)) = 0.


(8.5.8)

i i

i i
ItoKunisc
i i
2008/6/12
page 245
i i

8.5. Semismooth Newton methods for nonlinear complementarity problems 245

It can be equivalently expressed as

g (x k )(x k+1 − x k ) + g(x k ) + μk+1 = 0,

x k+1 = ψ in Ak = {s : −g(x k )(s) + c (x k (s) − ψ(s)) > 0}, (8.5.9)

μk+1 = 0 in Ik = {s : −g(x k )(s) + c (x k (s) − ψ(s)) ≤ 0}.

Remark 8.5.1. Note that, differently from the linear case considered in Section 8.4, if we
apply a full semismooth Newton step to (8.5.2) rather than to the reduced equation (8.5.5),
then the resulting algorithm differs in the update of the active/inactive sets. Applying a
semismooth Newton step to (8.5.2) results in Ak = {s : μk (s) + c (x k (s) − ψ(s)) > 0} =
{s : −g(x k−1 )(s) − g (x k−1 )(x k − x k−1 ) + c (x k (s) − ψ(s)) > 0}.

To investigate local convergence of (8.5.8) we denote, for any partition  = A ∪ I


into measurable sets I and , by RI : L2 () → L2 (I) the canonical restriction operator
and by RI∗ : L2 (I) → L2 () its adjoint. Further we set

g (x)I = RI g (x) RI∗ .

Proposition 8.20. Assume that (8.5.2) admits a solution x ∗ , that x → g(x) − c (x − ψ) is


a C 1 function from L2 () to Lp () in a neighborhood U of x ∗ for some c > 0 and p > 2,
and that

{g (x)−1
I ∈ L(L (I)) : x ∈ U,  = A ∪ I} is uniformly bounded.
2

Then the iterates x k defined by (8.5.8) converge superlinearly to x ∗ , provided that |x ∗ − x 0 |


is sufficiently small.

Proof. By Lemma 8.15 and Example 8.14 the mapping x → max(0, −g(x) + c(x − ψ))
is Newton differentiable from L2 () into itself and G(−g(x) + c(x − ψ))(−g (x) + cI )
is an N -derivative. Moreover g (x) + G(−g(x) + c(x − ψ))(−g + cI ) is invertible in
L(L2 ()) with uniformly bounded inverses for x ∈ U . Setting

z = −g(x) + c(x − ψ), A = {z > 0}, I =  \ A, hI = χI h, hA = χA h,

this follows from the fact that for given f ∈ L2 () the solution to the equation

g (x)h + G(z)(−g (x)h + ch) = f

is given by
 
1
chA = fA and hI = gI (x)−1
fI − χI g (x)fA .
c

From Theorem 8.16 we conclude that x k → x ∗ superlinearly, provided that |x ∗ − x 0 | is


sufficiently small.

i i

i i
ItoKunisc
i i
2008/6/12
page 246
i i

246 Chapter 8. Semismooth Newton Methods I

8.6 Semismooth Newton methods and regularization


In this section we discuss the case where the operator A of Section 8.4 is not a continuous
but only a closed operator in X = L2 (). Let V denote a real Hilbert space that is
densely and continuously embedded into X and let V ∗ denote its dual. For ψ ∈ V let
C = {y ∈ V : y ≤ ψ}.
We identify X∗ with X and thus we have the Gel’fand triple framework: V ⊂ X =
X ⊂ V ∗ . Let a : V × V → R be a bounded coercive bilinear form, i.e., for M, ν > 0

|a(y, v)| ≤ M |y|V |v|V for all y, v ∈ V ,

a(v, v) ≥ ν |v|2V for all v ∈ V .

For given f ∈ V ∗ consider the variational inequality for y ∈ C:

a(y, v − y) − f, v − y
≥ 0 for all v ∈ C. (8.6.1)

The existence of solutions to (8.6.1) is established in Theorem 8.26. The solution to (8.6.1)
is unique. In fact if ỹ ∈ C is a solution to (8.6.1), we have

a(y, ỹ − y) − f, ỹ − y
≥ 0,

a(ỹ, y − ỹ) − f, y − ỹ
≥ 0.

Summing these inequalities, we have a(y − ỹ, y − ỹ) ≤ 0 and thus ỹ = y.


If a is symmetric, then (8.6.1) is the necessary and sufficient optimality condition for
the constrained minimization problem in V

1
min a(y, y) − f, y
over y ∈ C. (8.6.2)
2
Let us define A ∈ L(V , V ∗ ) by

Ay, v
V ∗ ×V = a(y, v) for y, v ∈ V .

We note that (8.6.2) is equivalent to

Ay + μ = f, y ≤ ψ, μ ∈ C+, μ, y − ψ
V ∗ ,V = 0, (8.6.3)

where the first equality holds in V ∗ and C + = {μ ∈ V ∗ : μ, v


V ∗ ,V ≤ 0 for all v ≤ 0}.
If μ (or equivalently y) has extra regularity in the sense that μ ∈ L2 (), then (8.6.3) is
equivalent to

⎨ Ay + μ = f,
(8.6.4)

μ = max(0, μ + c(y − ψ))

for each c > 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 247
i i

8.6. Semismooth Newton methods and regularization 247

Example 8.21.
 For the obstacle problem in its simplest form we choose V = H01 () and
a(v, w) =  ∇v∇w dx. In general, the unique solution (y, μ) is in L2 () × H −1 (). If
∂ and ψ are sufficiently regular, then (y, μ) ∈ (H01 () ∩ H 2 ()) × L2 () and (8.6.3) is
equivalent to (8.6.4).

Example 8.22. Consider the state-constrained optimal control problem with β > 0 and
ȳ ∈ L2 ()
1 β
min |y − ȳ|2L2 () + |u|2L2 () subject to Ey = u and y ∈ C, (8.6.5)
u∈L2 () 2 2

where E is a closed linear operator in L2 (). We assume that E −1 exists and set V =
dom (E), where dom (E) is endowed with the graph norm of E. For E = − with
homogeneous Dirichlet boundary conditions, we have dom E = H01 () ∩ H 2 (). The
necessary and sufficient optimality condition for (8.6.5) is given by

Ey = u, y ∈ C, βu = p,
(8.6.6)
E ∗ p + y − ȳ, v − y
V ∗ ×V ≥ 0 for all v ∈ C,

where E ∗ : X → V ∗ denotes the conjugate of E as operator from V to X. This optimality


system can be written as (8.6.1) with f = ȳ and

a(v, w) = β (Ev, Ew)X + (v, w) for v, w ∈ V .

It can also be written in the form (8.6.3) with μ ∈ V ∗ , but differently from Example 8.21,
μ∈ / L2 () in general; see [BeK3].
In the case of mixed constraints, i.e., when pointwise constraints on the controls and
the state are present, it is natural to treat the control constraints with the active set methods
presented in Chapter 7 and to relax the state constraints by the technique explained in this
section.

Thus the obstacle and the state-constrained optimal control problem differ in that for
the former, under weak regularity conditions on the problem data, the Lagrange multiplier
is in L2 () and complementarity can be expressed in the form of (8.6.4), whereas for the
latter this is not the case. Before we can address the applicability of the primal-dual active
set strategy or semismooth Newton methods to such problems we also need to consider the
candidates for the iterates. In the case of the obstacle problem these are given formally by

a(y k+1 , v) + (μk+1 , v)L2 () = (f, v)L2 () for all v ∈ H01 (),

y k+1 = ψ in Ak = {x : μk + c (y k (x) − ψ(x)) > 0}, (8.6.7)

μk+1 = 0 in Ik = {x : μk + c (y k (x) − ψ(x)) ≤ 0}.

This is the optimality condition for



1
min |∇y|2 dx − (f, y)L2 () over y ∈ H01 () with y = ψ on Ak ,
2 

i i

i i
ItoKunisc
i i
2008/6/12
page 248
i i

248 Chapter 8. Semismooth Newton Methods I

for which the Lagrange μk+1 is not in L2 (), and hence an update of Ak as in (8.6.7) is
not feasible. Alternatively, considering the possibility of applying a semismooth Newton
approach we observe that the reduced form of (8.6.4) is given by

μ = max(0, μ + c(A−1 (f − μ) − ψ)),

so that no smoothing of μ under the max operation occurs.


In order to remedy these difficulties related to the regularity properties of the Lagrange
multiplier we consider a one-parameter family of regularized problems based on smoothing
of the complementarity condition given by

μ = α max(0, μ + c(y − ψ)), with 0 < α < 1. (8.6.8)

This is a relaxation of the second equation in (8.6.4) with α as a continuation parameter. Note
that an update for μ based on (8.6.8) results in μ ∈ L2 (). Equation (8.6.8) is equivalent to
 

μ = max 0, (y − ψ) , (8.6.9)
1−α

with cα/(1 − α) ranging in (0, ∞) for α ∈ (0, 1). We shall use a generalization of (8.6.8)
and introduce an additional shift parameter μ̄ ∈ L2 () into (8.6.9). Moreover we replace
cα/(1 − α) by c and arrive at

μ = max(0, μ̄ + c (y − ψ)), with c ∈ (0, ∞). (8.6.10)

This is exactly the same as the generalized Yosida–Moreau approximation for inequality
constraints discussed in Chapter 4.7 and is related to augmented Lagrangian methods as
described in Chapter 4.6. Utilizing this regularization in (8.6.4) results in
-
Ay + μ = f,
(8.6.11)
μ = max(0, μ̄ + c (y − ψ)).

Note that for each c > 0


y → max(0, μ̄ + c (y − ψ))
is Lipschitz continuous and monotone from X to X. Thus existence of a unique yc ∈ V
satisfying
Ay + max(0, μ̄ + c (y − ψ)) = f
follows by monotone operator techniques; see, e.g., the discussion above Theorem 4.43.
This implies the existence of a unique solution (yc , μc ) ∈ V × L2 () to (8.6.11). Conver-
gence as c → ∞ will be analyzed at the end of this section.
The semismooth Newton iteration for (8.6.11) is discussed next.

Semismooth Newton Algorithm with Regularization.


(i) Choose μ, c, y0 , set k = 0.
(ii) Set Ak = {x : (μ̄ + c (y k − ψ))(x) > 0}, Ik =  \ Ak .

i i

i i
ItoKunisc
i i
2008/6/12
page 249
i i

8.6. Semismooth Newton methods and regularization 249

(iii) Solve for y k+1 ∈ V :

a(y, v) + (μ̄ + c (y − ψ), χAk v)L2 () = (f, v)L2 () for all v ∈ V .

(iv) Set ⎧
⎨ 0 on Ik ,
μk+1 =

μ̄ + c (y k+1 − ψ) on Ak .

(v) Stop, or set k = k + 1, goto (ii).

The iterates μk as assigned in step (iv) are not necessary for the algorithm, but they
will be useful in the convergence analysis. The practical relevance of μ̄ is given by the
fact that for certain problems a proper choice can guarantee feasibility of the iterates, i.e.,
yk ≤ ψ, as will be shown next.

Theorem 8.23. Assume that a(φ, φ + ) ≤ 0, for φ ∈ V , implies that φ = 0. Then if

μ̄ ≥ (f − Aψ)+

in the sense that


(μ̄, φ) ≥ f, φ
− a(ψ, φ) (8.6.12)
for all φ ∈ C, the solution to (6.11) is feasible, i.e., yc ≤ ψ.

Proof. Since for φ = (yc − ψ)+


 
max 0, μ̄ + c(yc − ψ), φ ≥ (μ̄, φ),

it follows from (8.6.11) and (8.6.12) that

a(y − ψ, φ) = f, φ
− a(ψ, φ) − (μ̄, φ) ≤ 0.

This implies that φ = 0 and thus yc ≤ ψ.

Theorem 8.24. If a(φ, φ + ) ≤ 0, for φ ∈ V , implies that φ = 0, then the iterates of the
semismooth Newton algorithm with regularization satisfy y k+1 ≤ y k for all k ≥ 1.

Proof. Let φ = (y k+1 − y k )+ and observe that

a(y k+1 −y k , φ)+a(y k , φ)− f, φ


+ (μ̄ + c (y k −ψ), φ χAk ) + c(y k+1 −y k , φ χAk ) = 0.

We have
a(y k , φ) − f, φ
+ (μ̄ + c(y k − ψ), φ χAk )

= −(μ̄ + c(y k − ψ), φ χAk−1 ) + (μ̄ + c(y k − ψ), φ χAk )

= (μ̄ + c(y k − ψ), φ χAk ∩Ik−1 ) − (μ̄ + c(y k − ψ), φ χIk ∩Ak−1 ) ≥ 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 250
i i

250 Chapter 8. Semismooth Newton Methods I

This implies that

a(y k+1 − y k , φ) + c(y k+1 − y k , φ χAk ) ≤ 0;

thus, a(y k+1 − y k , φ) ≤ 0 and hence φ = 0.

Let us stress that in a finite-dimensional or discretized setting with L2 () replaced


by R the need for regularization is not apparent, since the lack of regularity, as exhibited
n

in Examples 8.21 and 8.22 and in the discussion of the iterative step of the semismooth
Newton algorithm, does not exist. If the finite-dimensional problems arise from discretiza-
tion of continuous ones, then the regularity of the Lagrange multipliers, however, certainly
influences the convergence and approximation properties. It will typically lead to mesh-
size-dependent behavior of the semismooth Newton algorithm.
We turn to convergence of the semismooth Newton algorithm with regularization.
Recall that A−1 ∈ L(V ∗ , V ). Below we shall also denote the restriction of A−1 to L2 ()
by the same symbol.

Theorem 8.25. Assume that μ̄ − cψ ∈ Lp () and A−1 ∈ L(L2 (), Lp ()) for some
p > 2. If μ0 ∈ L2 () and |μ0 − μc |L2 () is sufficiently small, then (y k , μk ) → (yc , μc )
superlinearly in V × L2 ().

Proof. First we show superlinear convergence of μk to μc by applying Theorem 8.16 to


F : L2 () → L2 () defined by

F (μ) = μ − max(0, μ̄ + c(A−1 (f − μ) − ψ)).

By Example 8.14 and Lemma 8.15 the mapping F is Newton differentiable with N -
derivative given by

G̃(μ) = I + c G(μ̄ + c(A−1 (f − μ) − ψ))A−1 ,

where G was defined in (8.4.6). The proof will be completed by showing that G̃(μ) has
uniformly bounded inverses in L(L2 ()) for μ ∈ L2 (). We define

A = {x : (μ̄ + c(A−1 (f − μ) − ψ))(x) > 0}, I =  \ A.

Further, let RA : L2 () → L2 (A) and RI : L2 () → L2 (I) denote the restriction opera-

tors to A and I. Their adjoints RA : L2 (A) → L2 () and RI∗ : L2 (I) → L2 () are the
extension-by-zero operators from A and I to , respectively. The mapping (RA , RI ) : L2 ()
→ L2 (A)×L2 (I) determines an isometric isomorphism and every μ ∈ L2 () can uniquely
be expressed as (RA μ, RI μ). The operator G̃(μ) can equivalently be expressed as
   ∗

IA 0 RA A−1 RA RA A−1 RI∗
G̃(μ) = +c ,
0 II 0 0

where IA and II denote the identity operators on L2 (A) and L2 (I). Let (gA , gI ) ∈
L2 (A) × L2 (I) be arbitrary and consider the equation

G̃(μ)((δμ)A , (δμ)I ) = (gA , gI ). (8.6.13)

i i

i i
ItoKunisc
i i
2008/6/12
page 251
i i

8.6. Semismooth Newton methods and regularization 251

Then necessarily (δμ)I = gI and (8.6.13) is equivalent to

(δμ)A + c RA A−1 RA

(δμ)A = gA − c RA A−1 RI∗ gI . (8.6.14)

The Lax–Milgram theorem and nonnegativity of A−1 imply the existence of a unique solu-
tion (δμ)A to (8.6.14) and consequently (8.6.13) has a unique solution for every (gA , gI )
and every μ. Moreover these solutions are uniformly bounded with respect to μ ∈ L2 since
(δμ)I = gI and

|δμA |L2 (A) ≤ |gA |L2 () + cA−1 L(L2 ()) |gI |L2 (I) .

This proves superlinear convergence μk → μc in L2 (). Superlinear convergence of y k to


yc in V follows from Ay k +μk = f and the fact that A : V ∗ → V is a homeomorphism.

Convergence of the solutions (yc , μc ) to (8.6.11) as c → ∞ is addressed next.

Theorem 8.26. The solution (yc , μc ) ∈ V ×X of the regularized problem (8.6.11) converges
to the solution (y ∗ , μ∗ ) ∈ V × V ∗ of (8.6.3) in the sense that yc → y ∗ strongly in V and μc
→ μ∗ strongly in V ∗ as c → ∞.

Proof. Recall that yc ∈ V satisfies


-
a(yc , v) + (μc , v) = f, v
for all v ∈ V ,
(8.6.15)
μc = max(0, μ̄ + c(yc − ψ)).

Since μc ≥ 0, for y ∈ C

c 1
(μc , yc − y) = (μc , yc − ψ − (y − ψ)) ≥ |(yc − ψ)+ |2X − |μ̄|2X .
2 2c
Letting v = yc − y in (8.6.15),

c 1
a(yc , yc − y) + |(yc − ψ)+ |2X ≤ f, yc − y
+ |μ̄|2X . (8.6.16)
2 2c
Since a is coercive, this implies that
c
ν |yc |2V + |(yc − ψ)+ |2X is bounded uniformly in c.
2
Thus there exist a subsequence yc , denoted by the same symbol, and y ∗ ∈ V such that
yc → y ∗ weakly in V . For all φ ≥ 0

((yc − ψ)+ , φ)X ≥ (yc − ψ, φ)X .

Since (yc − ψ)+ → 0 in X and yc → y ∗ weakly in X as c → ∞, this yields

(y ∗ − ψ, φ)X ≤ 0 for all φ ≥ 0

i i

i i
ItoKunisc
i i
2008/6/12
page 252
i i

252 Chapter 8. Semismooth Newton Methods I


which implies y ∗ − ψ ≤ 0 and thus y ∗ ∈ C. Since φ → a(φ, φ) defines an equivalent
norm on V and norms are w.l.s.c., letting c → 0 in (8.6.16) we obtain

a(y ∗ , y ∗ − y) ≤ f, y ∗ − y
for all y ∈ C

which implies that y ∗ is the solution to (8.6.1). Now, setting y = y ∗ in (8.6.16)


1
a(yc − y ∗ , yc − y ∗ ) + a(y ∗ , yc − y ∗ ) − f, yc − y ∗
≤ |μ̄|2 .
2c X
Since yc → y ∗ weakly in V , this implies that limc→∞ |yc − y ∗ |V = 0. Hence (yc , μc ) →
(y ∗ , μ∗ ) strongly in V × V ∗ .

i i

i i
ItoKunisc
i i
2008/6/12
page 253
i i

Chapter 9

Semismooth Newton
Methods II: Applications

In the previous chapter semismooth Newton methods in function spaces were investigated.
In was demonstrated that in certain cases the semismooth Newton method is equivalent to
the primal-dual active set method. The application to nonlinear complementarity problems
was discussed and the necessity of introducing regularization in cases where the Lagrange
multiplier associated to the inequality condition has low regularity was demonstrated. In
this chapter applications of semismooth Newton methods to nondifferentiable variational
problems in function spaces will be treated. They concern image restoration problems
regularized by bounded variation functionals in Section 9.1 and frictional contact problems
in elasticity in Section 9.2.
We shall make use of the Fenchel duality theorem which we recall for further reference;
see, e.g., Section 4.3 and [BaPe, EkTe] for details. Let V and Y be Banach spaces with
topological duals V and Y , respectively. Further, let  ∈ L(V , Y ) and let F : V −→
R ∪ {∞}, G : Y −→ R ∪ {∞} be convex, proper, and l.s.c. functionals such that there exists
v0 ∈ V with F(v0 ) < ∞, G(v0 ) < ∞ and G is continuous at v0 . Then
 
inf {F(v) + G(v)} = sup −F (− q) − G (q) , (9.0.1)
v∈V q∈Y

where  ∈ L(Y , V ) is the adjoint of . See, e.g., Section 4.3, Theorem 4.30 and
Example 4.33 with (v, w) = F(v) + G(v + w). The convex conjugates F : V −→
R ∪ {∞} and G : Y −→ R ∪ {∞} of F and G, respectively, are defined by
 
F (v ) = sup v, v
V ,V − F(v) ,
v∈V

and analogously for G . The conditions imposed on F and G guarantee that the dual problem,
i.e., the problem on the right-hand side of (9.0.1), admits a solution. Furthermore, v̄ ∈ V
and q̄ ∈ Y are solutions to the two optimization problems in (9.0.1) if and only if the
extremality conditions
− q̄ ∈ ∂F(ū),
(9.0.2)
q̄ ∈ ∂G(ū)
hold, where ∂F denotes the subdifferential of F.

253

i i

i i
ItoKunisc
i i
2008/6/12
page 254
i i

254 Chapter 9. Semismooth Newton Methods II: Applications

9.1 BV-based image restoration problems


In this section we consider the nondifferentiable optimization problem
-   
min 12  |Ku − f |2 dx + α2  |u|2 dx + β  |Du|
(9.1.1)
over u ∈ BV(),

where  is a simply connected domain in R2 with Lipschitz continuous boundary ∂,


f ∈ L2 (), β > 0, α ≥ 0 are given, and K ∈ L(L2 ()). We assume that K∗ K is invertible
or α > 0. Further BV() denotes the space of functions of bounded variation. A function
u is in BV() if the BV-seminorm defined by
  

|Du| = sup u div v : v ∈ (C0 ()) , |
2
v (x)|∞ ≤ 1
 

is finite. Here | · |∞ denotes the supremum norm on R2 . It is well known that BV() ⊂
L2 () for  ⊂ R2 (see [Giu]) and that u  → |u|L2 +  |Du| defines a norm on BV().
If K = identity and α = 0, then (9.1.1) is the well-known image restoration problem
with BV-regularization. It consists of recovering the true image u from the noisy image f .
BV-regularization is known to be preferable to regularization by  |∇u|2 dx, for example,
due to its ability to preserve edges in the original image during the reconstruction process.
Since the pioneering work in [ROF], the literature on (9.1.1) has grown tremendously. We
give some selected references [AcVo, CKP, ChLi, ChK, DoSa, GeYa, IK12] and refer the
reader to the monograph [Vog] for additional ones.
Despite its favorable properties for reconstruction of images, and especially images
with blocky structure, problem (9.1.1) poses some severe difficulties. On the analytical level
these are related to the fact that (9.1.1) is posed in a nonreflexive Banach space, the dual
of which is difficult to characterize [Giu, IK18], and on the numerical level the optimality
system related to (9.1.1) consists of a nonlinear partial differential equation, which is not
directly amenable to numerical implementations.
Following [HinK1] we shall show that the predual of (9.1.1) is a bilaterally constrained
optimization problem in a Hilbert space, for which the primal-dual active set strategy can
advantageously be applied and which can be analyzed as a semismooth Newton method.
We require some facts from vector-valued function spaces, which we summarize next. Let
IL2 () = L2 () × L2 () be endowed with the Hilbert space inner product structure and
norm. If the context suggests to do so, then we shall distinguish between vector fields v ∈
IL2 () and scalar functions v ∈ L2 () by using an arrow on top of the letter.  Analogously
we set IH10 () = H01 () × H01 (). We set L20 () = {v ∈ L2 () :  vdx = 0} and
H0 (div) = {v ∈ IL2 () : div v ∈ L2 (), v · n = 0 on ∂}, where n is the outer normal to
∂. The space H0 (div) is endowed with | v |2H0 (div) = |
v |2IL2 () + | div v|2L2 as norm. Further
we put H0 (div 0) = { v ∈ H0 (div) : div v = 0 a.e. in }. It is well known that

IL2 () = grad H 1 () ⊕ H0 (div 0); (9.1.2)

cf. [DaLi, p. 216], for example. Moreover,

H0 (div) = H0 (div 0)⊥ ⊕ H0 (div 0), (9.1.3)

i i

i i
ItoKunisc
i i
2008/6/12
page 255
i i

9.1. BV-based image restoration problems 255

with
H0 (div 0)⊥ = {
v ∈ grad H 1 () : div v ∈ L2 (), v · n = 0 on ∂},
and div : H0 (div 0)⊥ ⊂ H0 (div) → L20 () is a homeomorphism. In fact, it is injective by
construction and for every f ∈ L20 () there exists, by the Lax–Milgram lemma, ϕ ∈ H 1 ()
such that
div ∇ϕ = f in , ∇ϕ · n = 0 on ∂,
with ∇ϕ ∈ H0 (div 0)⊥ . Hence, by the closed mapping theorem we have
div ∈ L(H0 (div 0)⊥ , L20 ()).
Finally, let Pdiv and Pdiv⊥ denote the orthogonal projections in IL2 () onto H0 (div 0) and
grad H 1 (), respectively. Note that the restrictions of Pdiv and Pdiv⊥ to H0 (div 0) coincide
with the orthogonal projections in H0 (div) onto H0 (div 0) and H0 (div 0)⊥ .
Let 1 denote the two-dimensional vector field with 1 in both coordinates, set B =
αI + K∗ K, and consider
-
min 12 | div p + K∗ f |2B over p ∈ H0 (div),
(9.1.4)
such that − β 1 ≤ p ≤ β 1,


where for v ∈ L2 () we put |v|2B = (v, B −1 v)L2 . It is straightforward to argue that (9.1.4)
admits a solution.

Theorem 9.1. The Fenchel dual to (9.1.4) is given by (9.1.1) and the solutions u∗ of (9.1.1)
and p ∗ of (9.1.4) are related by
Bu∗ = div p ∗ + K∗ f, (9.1.5)
(− div)∗ u∗ , p − p ∗
H0 (div)∗ ,H0 (div) ≤ 0 for all p ∈ H0 (div), (9.1.6)

with −β 1 ≤ p ≤ β 1.


Alternatively (9.1.4) can be considered as the predual of the original problem (9.1.1).
Proof. We apply Fenchel duality as recalled at the beginning of the chapter with V =
H0 (div), Y = Y ∗ = L2 (),  = − div, G : Y → R given by G(v) = 12 |v − K∗ f |2B , and
F : V → R defined by F(p)  = I[−β 1,β
 1]  where
 (p),

0 if − β 1 ≤ p(x)
 ≤ β 1 for a.e. x ∈ ,
I[−β 1,β
 1]  =
 (p)
∞ otherwise.
The convex conjugate G ∗ : L2 () → R of G is given by
1 α 1
G ∗ (v) =
|Kv + f |2 + |v|2 − |f |2 .
2 2 2
∗ ∗
Further the conjugate F : H0 (div) → R of F is given by
F ∗ (
q ) = sup   H0 (div)∗ ,H0 (div)
q , p
for q ∈ H0 (div)∗ , (9.1.7)
 1
p∈S

i i

i i
ItoKunisc
i i
2008/6/12
page 256
i i

256 Chapter 9. Semismooth Newton Methods II: Applications

where S1 = {p ∈ H0 (div) : −β 1 ≤ p ≤ β 1}.


 Let us set

S2 = {p ∈ C01 () × C01 () : −β 1 ≤ p ≤ β 1}.




The set S2 is dense in the topology of H0 (div) in S1 . In fact, let p be an arbitrary element
of S1 . Since (D())2 is dense in H0 (div) (see, e.g., [GiRa, p. 26]), there exists a sequence
pn ∈ (D())2 converging in H0 (div) to p.
 Let P denote the canonical projection in H0 (div)
onto the closed convex subset S1 and note that, since p ∈ S1 ,

|p − P pn |H0 (div) ≤ |p − pn |H0 (div) + |pn − P pn |H0 (div)
≤ 2|p − pn |H0 (div) → 0 for n → ∞.

Hence limn→∞ |p − P pn |H0 (div) = 0 and S2 is dense in S1 . Returning to (9.1.7) we have
for v ∈ L2 () and (− div)∗ ∈ L(L2 (), V ∗ ),

F ∗ ((− div)∗ v) = sup (v, − div p),



 2
p∈S

which can be +∞. By the definition of the functions of bounded variation it is finite if and
only if v ∈ BV() (see [Giu, p. 3]) and

∗ ∗
F ((−div) v) = β |Dv| < ∞ for v ∈ BV().


The dual problem to (9.1.4) is found to be



1 α
min |Ku − f |2 + |u|2 + β |Du| over u ∈ BV().
2 2 

From (9.0.2) moreover we find

(− div)∗ u∗ , p − p ∗
H0 (div)∗ ,H0 (div) ≤ 0 for all p ∈ S1

and
Bu∗ = div p ∗ + K∗ f.

We obtain the following optimality system for (9.1.4).

 ∗ ∈ H0 (div)∗
Corollary 9.2. Let p ∗ ∈ H0 (div) be a solution to (9.1.4). Then there exists λ
such that

div∗ B −1 div p ∗ + div∗ B −1 K∗ f + λ  ∗ = 0, (9.1.8)


 ∗ , p − p ∗
H0 (div)∗ ,H0 (div) ≤ 0 for all p ∈ H0 (div),
λ (9.1.9)

with −β 1 ≤ p ≤ β 1.


Proof. Apply div∗ B −1 to (9.1.5) and set λ  ∗ = − div∗ u∗ ∈ H0 (div)∗ to obtain (9.1.8). For
 ∗ , (9.1.9) follows from (9.1.6).
this choice of λ

i i

i i
ItoKunisc
i i
2008/6/12
page 257
i i

9.1. BV-based image restoration problems 257

To guarantee uniqueness of the solutions we replace (9.1.4) by


-
min 12 | div p + K∗ f |2B + γ2 |p|
 2 over p ∈ H0 (div)
(9.1.10)
such that − β 1 ≤ p ≤ β 1

with γ > 0. Clearly (9.1.10) admits a unique solution.


Our goal now is to investigate the primal-dual active set strategy for (9.1.10). If
we were to simply apply this method to the first order necessary optimality condition for
(9.1.10), then the resulting nonlinear operator arising in the complementarity system is not
Newton differentiable; cf. Section 8.6. The numerical results which are obtained in this
way are competitive from the point of view of image reconstruction [HinK1], but due to
lack of Newton differentiability the iteration numbers are mesh-dependent. We therefore
approximate (9.1.10) by a sequence of more regular problems where the constraints −β 1 ≤
p ≤ β 1 are realized by penalty terms and an IH10 () smoothing guarantees superlinear
convergence of semismooth Newton methods applied to the first order optimality conditions
of the approximating problems:
-
 2 + 12 | div p + K∗ f |2B + γ2 |p|
min 2c1 |∇ p| 2
 2+
+ 2c1 | max(0, c(p − β 1))| 1  2 over p ∈ IH10 ().
| min(0, c(p
+ β 1))|
2c
(9.1.11)
Let pc denote the unique solution to (9.1.11). It satisfies the optimality condition
1
− pc − ∇B −1 div pc − ∇B −1 K∗ f + γ pc + λ  c = 0, (9.1.12a)
c
 + min(0, c(pc + β 1)).
 c = max(0, c(pc − β 1))
λ  (9.1.12b)

Next we address convergence as c → ∞.

Theorem 9.3. The family {(pc , λ  c )}c>0 converges weakly in H0 (div) × IH10 ()∗ to the
 ∗ ) ∈ H0 (div) × H0 (div)∗ of the optimality system associated to
unique solution (p ∗ , λ
(9.1.10) given by

div∗ B −1 div p ∗ + div∗ B −1 K∗ f + γ p ∗ + λ  ∗ = 0, (9.1.13)


 ∗ , p − p ∗
H0 (div)∗ ,H0 (div) ≤ 0 for all p ∈ H0 (div).
λ (9.1.14)

Moreover, the convergence of pc to p ∗ is strong in H0 (div).

Proof. The proof is related to that of Theorem 8.26. The variational form of (9.1.13) is
given by
 ∗ , v
H0 (div)∗ ,H0 (div) = 0
(div p ∗ , div v)B + (K∗ f, div v)B + γ (p ∗ , v) + λ (9.1.15)
 i ) ∈ H0 (div)×H0 (div)∗ ,
for all v ∈ H0 (div). To verify uniqueness, let us suppose that (pi , λ
i = 1, 2, are two solution pairs to (9.1.13), (9.1.14). For δ p = p2 − p1 , δ λ  =λ
2 − λ1
we have

(B −1 div δ p,
 div v) + γ (δ p,  v
H0 (div)∗ ,H0 (div) = 0
 v) + δ λ, (9.1.16)

i i

i i
ItoKunisc
i i
2008/6/12
page 258
i i

258 Chapter 9. Semismooth Newton Methods II: Applications

for all v ∈ H0 (div), and


 δ p

δ λ,  H0 (div)∗ ,H0 (div) ≥ 0.


With v = δ p in (9.1.16) we obtain
|B −1 div δ p|
 2 + γ |δ p|
2≤0
and hence p1 = p2 . From (9.1.15) we deduce that λ 1 = λ  2 . Thus uniqueness is established
and we can henceforth rely on subsequential arguments.
In the following computation we consider the coordinates λ  ic , i = 1, 2, of λ
 c . We
have for a.e. x ∈ 
 
 ic pci = max(0, c(pci − β)) + min(0, c(pci + β)) pci
λ

⎨ c(pci − β)pci if pci ≥ β,
= 0 if |pci | = β,

c(pci + β)pci if pci ≤ β.
It follows that
1 i 2
 ic , pci )IL2 () ≥
(λ  | 2
|λ for i = 1, 2,
c c IL ()
and consequently
1
 c , pc )IL2 () ≥
(λ  c |2 2
|λ for every c > 0. (9.1.17)
c IL ()

From (9.1.12) and (9.1.17) we deduce that


1
|∇ pc |2 + | div pc |2B + γ |pc |2 ≤ | div pc |B |K∗ f |B
c
and hence
1 1 1
|∇ pc |2 + | div pc |2B + γ |pc |2 ≤ |K∗ f |B . (9.1.18)
c 2 2
We further estimate
 c |IH1 ()∗ =
|λ sup  c , v
IH1 ()∗ ,IH1 ()
λ
0 0 0
|
v| 1 =1
IH0 ()
 
1
≤ sup |∇ pc ||∇ v| + | div pc |B | div v|B + |K∗ f |B | div v|B + γ |pc | |
v| .
|
v| 1 =1 c
IH0 ()

From (9.1.18) we deduce the existence of a constant K, independent of c ≥ 1, such that


 c |IH1 ()∗ ≤ K.
|λ (9.1.19)
0

 ∗ ) ∈ H0 (div) ×
Combining (9.1.18) and (9.1.19) we can assert the existence of (p ∗ , λ
1 ∗
IH0 () such that for a subsequence denoted by the same symbol

 c )  (p ∗ , λ
(pc , λ  ∗ ) weakly in H0 (div) × IH10 () . (9.1.20)

i i

i i
ItoKunisc
i i
2008/6/12
page 259
i i

9.1. BV-based image restoration problems 259

We recall the variational form of (9.1.12), i.e.,

1
 c , v) = 0
(∇ pc , ∇ v) + (div pc , div v)B + (K∗ f, div v)B + γ (pc , v) + (λ
c

for all v ∈ IH10 (). Passing to the limit c → ∞, using (9.1.18) and (9.1.20) we have

(div p ∗ , div v)B +(K∗ f, div v)B + γ (p ∗ , v)


(9.1.21)
 ∗ , v
IH1 ()∗ ,IH1 () = 0 for all v ∈ IH10 ().
+ λ 0 0

Since IH10 () is dense in H0 (div) and p ∗ ∈ H0 (div) we have that (9.1.21) holds for all v ∈
H0 (div). Consequently λ  ∗ can be identified with an element in H0 (div)∗ and ·, ·
IH1 ()∗ ,IH1 ()
0 0
in (9.1.21) can be replaced by ·, ·
H0 (div)∗ ,H0 (div) . We next verify that p ∗ is feasible. For
this purpose note that
 
 + min(0, c(pc + β 1),
 c , p − pc ) = max(0, c(pc − β 1))
(λ  p − pc ) ≤ 0 (9.1.22)

for all −β 1 ≤ p ≤ β 1.
 From (9.1.11) we have

1 1
 c |2 ≤ |K∗ f |2B .
|∇ pc |2 + | div pc + K∗ f |2B + γ |pc |2 + |λ (9.1.23)
c c
 c |2 ≤ |K∗ f |2B for all c > 0. Note that
Consequently, 1c |λ

1  2 2 + c| min(0, pc + β 1)|


 22
 c |2 2 = c| max(0, pc − β 1)|

c IL () IL () IL ()

and thus

 22 −
| max(0, (pc − β 1))|
c→∞
 22 −
−−→ 0 and | min(0, (pc + β 1))|
c→∞
−−→ 0. (9.1.24)
IL () IL ()

Recall that pc  p ∗ weakly in IL2 (). Weak lower semicontinuity of the convex functional
 IL2 () and (9.1.24) imply that
p → | max(0, p − β 1)|
 
∗  2 dx ≤ lim inf
| max(0, p − β 1)|  2 dx = 0.
| max(0, pc − β 1)|
 c→∞ 

Consequently, p ∗ ≤ β 1 and analogously one verifies that −β 1 ≤ p ∗ . In particular p ∗ is


feasible and from (9.1.22) we conclude that

 c , p ∗ − pc
H0 (div)∗ ,H0 (div) ≤ 0
λ for all c > 0. (9.1.25)

By optimality of pc for (9.1.11) we have


 
1 γ 1 γ 2
lim sup | div pc + K∗ f |2B + |pc |2 ≤ | div p + K∗ f |2B + |p|
 (9.1.26)
c→∞ 2 2 2 2

i i

i i
ItoKunisc
i i
2008/6/12
page 260
i i

260 Chapter 9. Semismooth Newton Methods II: Applications

for all p ∈ S2 = {p ∈ (C01 ())2 : −β 1 ≤ p ≤ β 1}. Density of S2 in S1 = {p ∈ H0 (div) :


−β 1 ≤ p ≤ β 1} in the norm of H0 (div) implies that (9.1.26) holds for all p ∈ S1 and
consequently
 
1 γ 1 γ
lim sup | div pc + K f |B + |pc | ≤ | div p ∗ + K∗ f |2B + |p ∗ |2
∗ 2 2
c→∞ 2 2 2 2
 
1 γ
≤ lim inf | div pc + K∗ f |2B + |pc |2 ,
c→∞ 2 2
where for the last inequality weak lower semicontinuity of norms is used. The above
inequalities, together with weak convergence of pc to p ∗ in H0 (div), imply strong conver-
gence of pc to p ∗ in H0 (div). Finally we aim at passing to the limit in (9.1.25). This is
impeded by the fact that we only established λ c  λ ∗ in IH10 ()∗ . Note from (9.1.12) that
{− c pc + λ
1  c }c≥1 is bounded in H0 (div). Hence there exists μ  ∗ ∈ H0 (div)∗ such that

1
c  μ
− pc + λ ∗ weakly in H0 (div)∗ ,
c

and consequently also in IH10 () . Moreover, { √1c |∇ pc |}c≥1 is bounded and hence

1 ∗
− pc  0 weakly in IH10 ()
c

c  λ
as c → ∞. Since λ  ∗ weakly in IH10 () it follows that

∗ − μ
λ  ∗ , v
IH10 ()∗ ,IH10 () = 0 for all v ∈ IH10 ().

Since both λ ∗ and μ


 ∗ are elements of H0 (div)∗ and since IH10 () is dense in H0 (div), it
 =μ
follows that λ ∗
 ∗ in H0 (div)∗ . For p ∈ S2 we have
 ∗ , p − p ∗
H0 (div)∗ ,H0 (div) = μ∗ , p − p ∗
H0 (div)∗ ,H0 (div)
λ
 
1

= lim − pc + λc , p − pc
c→∞ c H0 (div)∗ ,H0 (div)
 
1
= lim (∇ pc , ∇(p − pc )) + (λ  c , p − pc )
c→∞ c
 
1
≤ lim (∇ pc , ∇ p)
 + (λ c , p − pc ) ≤ 0
c→∞ c

by (9.1.22) and (9.1.23). Since S2 is dense in S1 we find


 ∗ , p − p ∗
H0 (div)∗ ,H0 (div) ≤ 0
λ for all p ∈ S1 .

Remark 9.1.1. For γ = 0 problems (9.1.10) and (9.1.11) admit a solution, which, however,
 c )}c>0 contains a weak
is not unique. From the proof of Theorem 9.3 it follows that {(pc , λ

accumulation point in H0 (div) × IH0 () and every weak accumulation point is a solution
1

of (9.1.10).

i i

i i
ItoKunisc
i i
2008/6/12
page 261
i i

9.1. BV-based image restoration problems 261

Semismooth Newton treatment of (9.1.11).


We turn to the algorithmic treatment of the infinite-dimensional problem (9.1.11) for which
we propose the following algorithm.

Algorithm.
(1) Choose p0 ∈ IH10 () and set k = 0.
(2) Set, for i = 1, 2,

A+,i 
k+1 = {x : (pki + β 1)(x) > 0},
A −,i 
= {x : (pki + β 1)(x) < 0},
k+1
Ik+1
i
=  \ (A+,i −,i
k+1 ∪ Ak+1 ).

(3) Solve for p ∈ IH10 () and set pk+1 = p where


1
(∇ p,  div v)B + (K∗ f, div v)B + γ (p,
 ∇ v) + (div p,  v)
c (9.1.27)
 A+ , v) + (c(p + β 1)χ
+ (c(p − β 1)χ  A− , v) = 0
k+1 k+1

for all v ∈ IH10 ().


(4) Set ⎧

⎪ 0 on Ik+1
i
,



 ik+1 = c(pk+1
i 
− β 1) on A+,i
λ k+1 ,




⎩ 
c(p i + β 1)
k+1 on A−,i
k+1
for i = 1, 2.
(5) Stop, or set k = k + 1, and goto (2).
Above χA+k+1 stands for

⎨ 1 if x ∈ A+,i
k+1 ,
i
χA + =

/ A+,i
k+1
0 if x ∈ k+1 ,

and analogously for A− k+1 . The superscript i, i = 1, 2, refers to the respective component.
We note that (9.1.27) admits a solution pk+1 ∈ IH10 (). Step (4) is included for the sake of
the analysis of the algorithm. Let C : IH10 () → H −1 () × H −1 () stand for the operator
1
C = − − ∇B −1 div +γ id.
c
It is a homeomorphism for every c > 0 and allows us to express (9.1.12) as
 + c min(0, p + β 1)
C p − ∇B −1 K∗ f + c max(0, p − β 1)  = 0, (9.1.28)

i i

i i
ItoKunisc
i i
2008/6/12
page 262
i i

262 Chapter 9. Semismooth Newton Methods II: Applications

where we drop the index in the notation for pc . For ϕ ∈ L2 () we define

1 if ϕ(x) > 0,
Dmax(0, ϕ)(x) = (9.1.29)
0 if ϕ(x) ≤ 0

and

1 if ϕ(x) < 0,
Dmin(0, ϕ)(x) = (9.1.30)
0 if ϕ(x) ≥ 0.

Using (9.1.29), (9.1.30) as Newton derivatives for the max and the min operations in (9.1.28)
the semismooth Newton step can be expressed as
 A+ + c(pk+1 + β 1)χ
C pk+1 + c(pk+1 − β 1)χ  A− − ∇B −1 K∗ f = 0, (9.1.31)
k+1 k+1

 k+1 from step (4) of the algorithm is given by


and λ
 A+ + c(pk+1 + β 1)χ
 k+1 = c(pk+1 − β 1)χ
λ  A− . (9.1.32)
k+1 k+1

 rather
The iteration of the algorithm can also be expressed with respect to the variable λ
 For this purpose we define
than p.

F (λ  − c max(0, C −1 (∇ fˆ − λ)
) = λ  − c min(0, C −1 (∇ fˆ − λ)
 − β 1) 
 + β 1), (9.1.33)

where we put fˆ = B −1 K∗ f . Setting pk = C −1 (∇ fˆ − λ


 k ), the semismooth Newton step
  
applied to F (λ) = 0 at λ = λk results in
 k+1 = c(C −1(∇ fˆ − λ
λ  A+ + c(C −1(∇ fˆ − λ
 k+1 ) − β 1)χ  A−
 k+1 ) + β 1)χ
k+1 k+1

which coincides with (9.1.32). Therefore the semismooth Newton iterations according to
 = 0 coincide, provided that the initializations are related by
the algorithm and that for F (λ)
ˆ 
C p0 − ∇ f + λ0 = 0. The mapping F is Newton differentiable, i.e., for every λ  ∈ IL2 ()

|F (λ  − F (λ)
 + h)  − DF (λ
 + h)h|  IL2 () )
 IL2 () = o(|h| (9.1.34)

 IL2 () → 0; see Example 8.14. Here D denotes the Newton derivative of F defined
for |h|
by means of (9.1.29) and (9.1.30). For (9.1.34) to hold, the smoothing property of C −1
in the sense of an embedding from IL2 () into ILp () for some p > 2 is essential. The
following result now follows from Theorem 8.16.

c − λ
Theorem 9.4. If |λ  0 |IL2 () is sufficiently small, then the iterates {(pk , λ
 k )}∞
k=1 of the
1 2 
algorithm converge superlinearly in IH0 () × IL () to the solution (pc , λc ) of (9.1.11).

In Theorem 9.3 convergence as c → ∞ is established and Theorem 9.4 asserts


 k ) for each fixed c. The combination of these two limit
convergence of the iterates (pk , λ
processes was addressed in [HinK2], where a path concept with respect to the variable c is
developed and the iterations with respect to k are controlled to remain in a neighborhood
around the path.

i i

i i
ItoKunisc
i i
2008/6/12
page 263
i i

9.2. Friction and contact problems in elasticity 263

Let us compare the algorithm of this section to the general framework of augmented
Lagrangians presented in Section 4.6 for nonsmooth problems. We again introduce in
(9.1.10) a diffusive regularization and realize the inequality constraints by a generalized
Yosida–Moreau approximation. This suggests considering the Lagrangian L(p,  μ)
 : IH10 ()
× IL () → R defined by
2

1 1 γ 2
Lc (p,  =
 λ)  2 + | div p + K∗ f |2B + |p|
|∇ p|  + φc (p, 
 λ), (9.1.35)
2c̄ 2 2
where φc is the generalized Yosida–Moreau approximation of the indicator function φ of the
set {p ∈ IL2 () : −β 1 ≤ p ≤ β 1},
 and c > 0, c̄ > 0. Here we choose c differently from
c̄ since in the limit of the augmented Lagrangian iteration the constraint −β 1 ≤ p ≤ β 1 is
satisfied for any fixed c > 0. We have
c 2
φc (p,  =
 λ) inf φ(p − q) + (μ,
 q)IL2 () + |
q| 2
q∈IL 2
() 2 IL ()

 ∈ IL2 (), which can equivalently be expressed as


for c > 0 and μ
1  
 2 2
 =
 λ)
φc (p, max 0, λ  + c(p − β 1)
2c IL ()

1  
 2 2 − 1 λ 2 2 .
+ min 0, λ + c(p + β 1)
2c IL () 2c IL ()
The auxiliary problems in step 2 of the augmented Lagrangian method of Section 4.6 with Lc
given in (9.1.35) coincide with (9.1.11) except for the shift by λ k in the max/min operations.
Each of these auxiliary problems can efficiently be solved by the semismooth Newton
algorithm presented in this section if pk ∓ β 1 is replaced by λk + c(pk ∓ β 1).
 Conversely,
one can think of introducing augmented Lagrangian steps to the algorithm of this section,
with the goal of avoiding possible ill-conditioning as c → ∞. For numerical experience
with the algorithmic concepts of this section we refer the reader to [HinK1].

9.2 Friction and contact problems in elasticity


9.2.1 Generalities
This chapter is concerned with the application of semismooth Newton methods to contact
problems with Tresca and Coulomb friction in two spatial dimensions. In contact problems,
also known as Signorini problems, one has to detect the contact zone between an elastic body
and a rigid foundation. At the contact boundary, frictional forces are often too large to be
neglected. Thus, besides the nonpenetration condition the frictional behavior in the contact
zone has to be taken into account. Modeling friction involves a convex, nondifferentiable
functional which, by means of Fenchel duality theory, can be related to a bilateral obstacle
problem for which the primal-dual active set concept can advantageously be applied.
We commence with a formulation of the Signorini contact problem with Coulomb
friction in two dimensions. A generalization to the three-dimensional case is given in
[HuStWo]. We consider an elastic body that occupies, in its initial configuration, the open
and bounded domain  ⊂ R2 with C 1,1 -boundary = ∂. Let this boundary be divided

i i

i i
ItoKunisc
i i
2008/6/12
page 264
i i

264 Chapter 9. Semismooth Newton Methods II: Applications

into three disjoint parts, namely, the Dirichlet part d , further the part n with prescribed
 2
surface load h ∈ L2 ( n ) := L2 ( n ) , and the part c , where contact and friction with a
rigid foundation may occur. For simplicity we assume that ¯ c ∩ ¯ d = ∅ to avoid working
1
with the space H002 ( c ). We are interested in the deformation y = (y1 , y2 )! of the elastic
 2
body which is also subject to a given body force f ∈ L2 () := L2 () . The gap between
 2
the elastic body and the rigid foundation is d := τN d ≥ 0, where d ∈ H1 () := H 1 ()
and τN y denotes the normal component of the trace along c . As usual in linear elasticity,
the linearized strain tensor is
1 
ε(y) = ∇y + (∇y)! .
2
Using Hooke’s law for the stress-strain relation, the linearized stress tensor

σ (y) := Cε(y) := λtr(ε(y))Id + 2με(y)

is obtained, where λ and μ are the Lamé parameters. These parameters are given by
   
λ = (Eν)/ (1 + ν)(1 − 2ν) and μ = E/ 2(1 + ν)

with Young’s modulus E > 0 and the Poisson ration ν ∈ (0, 0.5). Above, C denotes the
fourth order isotropic material tensor for linear elasticity.
The Signorini problem with Coulomb friction is then given as follows:

−Div σ (y) = f in , (9.2.1a)


τy = 0 on d , (9.2.1b)
σ (y)n = h on n , (9.2.1c)
τN y − d ≤ 0, σN (y) ≤ 0, (τN y − d)σN (y) = 0 on c , (9.2.1d)
|σT (y)| < F|σN (y)| on {x ∈ c : τT y = 0}, (9.2.1e)
|σN (y)|
σT (y) = −F τT y on {x ∈ c : τT y  = 0}, (9.2.1f )
|τT y|
where the sets {x ∈ c : τT y = 0} and {x ∈ c : τT y  = 0} are referred to as the
sticky and the sliding regions, respectively. Above, Div denotes the rowwise divergence
1  1 2
operator and τ : H1 () → H 2 ( ) := H 2 ( ) the zero order trace mapping. The
corresponding scalar-valued normal and tangential component mappings are denoted by
1
τN , τT : H1 () → H 2 ( c ); i.e., for y ∈ H1 () we have the splitting τ y = (τN y)n + (τT y)t
with n and t denoting the unit normal and tangential vector along c , respectively. Similarly,
using (9.2.1a) we can, following [KO], decompose the stress along the boundary, namely,
σ (y)n = σN (y)n + σT (y)t with mappings σN , σT : Y → H − 2 ( c ). Moreover, F : c → R
1

denotes the friction coefficient which is supposed to be uniformly Lipschitz continuous.


There are major mathematical difficulties inherent in the problem (9.2.1). For instance,
in general σN (y) in (9.2.1e), (9.2.1f) is not pointwise a.e. defined. Replacing the Coulomb
friction in the above model by Tresca friction means replacing |σN (y)| by a given friction
g. The resulting system can then be analyzed and the existence of a unique solution can be
proved. For a review we refer the reader to [Rao], for example.

i i

i i
ItoKunisc
i i
2008/6/12
page 265
i i

9.2. Friction and contact problems in elasticity 265

9.2.2 Contact problem with Tresca friction


We now give a variational statement of the Signorini problem with given friction. The set
of admissible deformations is defined as

Y := {v ∈ H1 () : τ v = 0 a.e. on d },

where H1 () := (H 1 ())2 . To incorporate the nonpenetration condition between the


elastic body and the rigid foundation we introduce the cone

K := {v ∈ Y : τN v ≤ 0 a.e. on c }.

We define the symmetric bilinear form a(· , ·) on Y × Y and the linear form L(·) on Y by
  
a(y, z) := (σ y) : (εz) dx, L(y) = f y dx + h τ y dx,
  n

where : denotes the sum of the componentwise products.


For the friction g we assume that g ∈ H − 2 ( c ) and g ≥ 0, i.e., g, h
c ≥ 0 for
1

1
all h ∈ H 2 ( c ) with h ≥ 0. Since the friction coefficient F is assumed to be uniformly
1
Lipschitz continuous, it is a factor on H 2 ( c ), i.e., the mapping
1 1
λ ∈ H 2 ( c )  → Fλ ∈ H 2 ( c )

is well defined and bounded; see [Gri, p. 21]. By duality it follows that F is a factor on
H − 2 ( c ) as well. Consequently, the nondifferentiable functional
1


j (y) := Fg|τT y| dx
c

is well defined on Y. After these preparations we can state the contact problem with given
friction as
1
min J (y) := a(y, y) − L(y) + j (y) (P)
y∈d+K 2

or, equivalently, as elliptic variational inequality [Glo]:


-
Find y ∈ d + K such that
(9.2.2)
a(y, z − y) + j (z) − j (y) ≥ L(z − y) for all z ∈ d + K.

Due to the Korn inequality, the functional J (·) is uniformly convex; further it is l.s.c. This
implies that (P) and equivalently (9.2.2) admit a unique solution y ∗ ∈ d + K.
To derive the dual problem corresponding to (P), we apply the Fenchel calculus to
1
the mappings F : Y → R and G : V × H 2 ( c ) → R given by
-  
−L(y) if y ∈ d + K, 1
F(y) := G(q, ν) := q : C q dx + Fg|ν| dx,
∞ else; 2  c

i i

i i
ItoKunisc
i i
2008/6/12
page 266
i i

266 Chapter 9. Semismooth Newton Methods II: Applications

where
V = {p ∈ (L2 ())2×2 : p12 = p21 }.
1
Furthermore,  ∈ L(Y, V × H 2 ( c )) is given by
y := (1 y, 2 y) = (εy, τT y),
which allows us to express (P) as
 
min F(y) + G(y) .
y∈Y
1
Endowing V × H 2 ( c ) with the usual product norm, F and G satisfy the conditions for
the Fenchel duality theorem, which was recalled in the preamble of this chapter. For the
convex conjugate one derives that F (− (p, μ)) equals +∞ unless
and pT + μ = 0 in H − 2 ( c ),
1
−Div p = f , p · n = h in L2 ( n ), (9.2.3)
− 12
where pT = (n! p) · t ∈ H ( c ). Further one obtains that
-
− pN , d
c if (9.2.3) and pN ≤ 0 in H − 2 ( c ) hold,
1

F (− (p, μ)) =


∞ else.
Evaluating the convex conjugate for G yields that
⎧ 
⎨ 1 C−1 p : p dx if Fg, |ν|
− ν, μ
≥ 0 for all ν ∈ H 12 ( ),
c c c
G (p, μ) = 2


∞ else.
Thus, following (9.0.1) we derive the dual problem corresponding to (P):

1
sup − C−1 p : p dx + pN , d
c . (P )
1
−2 2 
(p,μ)∈V×H ( c )
− 1
s.t. (9.2.3),p
 N ≤0 in H 2 ( c ),
and Fg, |ν| c − ν, μ
c ≥0
1
for all ν∈H 2 ( c ).

This problem is an inequality-constrained maximization problem of a quadratic func-


tional, while the primal problem (P) involves the minimization of a nondifferentiable func-
tional. Evaluating the extremality conditions (9.0.2) for the above problems one obtains the
following lemma.

Lemma 9.5. The solution y ∗ ∈ d + K of (P) and the solution (p ∗ , μ∗ ) of (P ) are related
by σ y ∗ = p∗ and by the existence of λ∗ ∈ H − 2 ( c ) such that
1

a(y ∗ , z) − L(z) + μ∗ , τT z
c + λ∗ , τN z
c = 0 for all z ∈ Y, (9.2.4a)
λ∗ , τN z
c ≤ 0 for all z ∈ K, (9.2.4b)
λ∗ , τN y ∗ − d
c = 0, (9.2.4c)
1
Fg, |ν|
c − μ∗ , ν
c ≥ 0 for all ν ∈ H ( c ), 2 (9.2.4d)
Fg, |τT y ∗ |
c − μ∗ , τT y ∗
c = 0. (9.2.4e)

i i

i i
ItoKunisc
i i
2008/6/12
page 267
i i

9.2. Friction and contact problems in elasticity 267

Proof. The extremality condition − (p ∗ , μ∗ ) ∈ ∂F(y ∗ ) results in y ∗ ∈ d + K and


(p∗ , ε(z − y ∗ )) − L(z − y ∗ ) + μ∗ , τT (z − y ∗ )
c ≥ 0 for all z ∈ d + K. (9.2.5)
∗ ∗ ∗ ∗ ∗
The condition (p , μ ) ∈ ∂G(y ) yields that p = σ y and (9.2.4d) and (9.2.4e). Intro-
ducing the multiplier λ∗ for the variational inequality (9.2.5) leads to (9.2.4a), (9.2.4b), and
(9.2.4c).

According to (9.2.3) the multiplier μ∗ ∈ H − 2 ( c ) corresponding to the nondifferen-


1

tiability of the primal functional J (·) has the mechanical interpretation μ∗ = −σT y ∗ . Using
Green’s theorem in (9.2.4a) one finds
λ∗ = −σN y ∗ , (9.2.6)
i.e., λ∗ is the negative stress in normal direction.
We now briefly comment on the case that the given friction g is more regular, namely,
g ∈ L2 ( c ). In this case we can define G on the larger set V × L2 ( c ). One can verify that
the assumptions for the Fenchel duality theorem hold, and thus obtain higher regularity for
the dual variable μ corresponding to the nondifferentiability of the cost functional in (P ),
in particular μ ∈ L2 ( c ). This implies that the dual problem can be written as follows:

1
sup − C−1 p : p dx + pN , d
c . (9.2.7)
2
(p,μ)∈V×L ( c ) 2 
1
s.t. (9.2.3),pN ≤0 in H − 2 ( c ),
and |μ|≤Fg a.e. on c .

Utilizing the relation p = σ y and (9.2.6), one can transform (9.2.7) into


⎪ −
1
a(y λ,μ , y λ,μ ) + λ, d
c ,

⎪ min

⎪ − 12
( c ) 2
⎨ (λ,μ)∈H ( −c )∈×L
2
1
λ≥0 in H 2 ( c ) (9.2.8)
⎪ |μ|≤Fg a.e. on c



⎪ where y λ,μ satisfies


a(y λ,μ , z) − L(z) + λ, τN z
c + (μ, τT z) c = 0 for all z ∈ Y.
Problem (9.2.8) is an equivalent form for the dual problem (9.2.7), now written in the
variables λ and μ. The primal variable y λ,μ appears only as an auxiliary variable determined
from λ and μ. Since g ∈ L2 ( c ), also the extremality conditions corresponding to (P) and
(9.2.7) can be given more explicitly. First, (9.2.4d) is equivalent to
|μ∗ | ≤ Fg a.e. on c , (9.2.4d )
and a brief computation shows that (9.2.4e) is equivalent to
-
τT y ∗ = 0 or
τ y∗ (9.2.4e )
τT y ∗  = 0 and μ∗ = Fg |τT y ∗ | .
T

Moreover (9.2.4d ) and (9.2.4e ) can equivalently be expressed as


σ τT y ∗ − max(0, σ τT y ∗ + μ∗ − Fg) − min(0, σ τT y ∗ + μ∗ + Fg) = 0 (9.2.9)
with arbitrary σ > 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 268
i i

268 Chapter 9. Semismooth Newton Methods II: Applications

We now introduce and analyze a regularized version of the contact problem with given
friction that allows the application of the semismooth Newton method. In what follows we
assume that g ∈ L2 ( c ). We start our consideration with a regularized version of the dual
problem (9.2.7) written in the form (9.2.8). Let γ1 , γ2 > 0, λ̂ ∈ L2 ( c ), λ̂ ≥ 0, and
μ̂ ∈ L2 ( c ), and define the functional Jγ1,γ2 : L2 ( c ) × L2 ( c ) −→ R by

1 1
Jγ1,γ2 (λ, μ) := a(y λ,μ , y λ,μ ) + (λ, d) c + λ − λ̂2 c
2 2γ1
1 1 1
+ μ − μ̂2 c − λ̂2 c − μ̂2 c ,
2γ2 2γ1 2γ2

where y λ,μ ∈ Y satisfies

a(y λ,μ , z) − L(z) + (λ, τN z) c + (μ, τT z) c = 0 for all z ∈ Y. (9.2.10)

The regularized dual problem with given friction is defined as

max −Jγ1,γ2 (λ, μ). (Pγ1,γ2 )


(λ,μ)∈L2 ( c )×L2 ( c )
λ≥0, |μ|≤Fg a.e. on c

Obviously, the last two terms in the definition of Jγ1,γ2 are constants and can thus be neglected
in the optimization problem (Pγ1,γ2 ). However, they are introduced with regard to the primal
problem corresponding to (Pγ1,γ2 ), which we turn to next. We define the functional Jγ1,γ2 :
Y → R by

1 1
Jγ1,γ2 (y) := a(y, y) − L(y) +  max(0, λ̂ + γ1 (τN y − d))2 c
2  2γ1
1
+ Fgh(τT y(x), μ̂(x)) dx,
γ2 c

where h(· , ·) : R × R −→ R is a local smoothing of the absolute value function, given by


⎧ 1

⎨ |γ2 x + α| − Fg if |γ2 x + α| ≥ Fg,
h(x, α) := 2 (9.2.11)


1
|γ2 x + α|2 if |γ2 x + α| < Fg.
2Fg

Then the primal problem corresponding to (Pγ1,γ2 ) is

min Jγ1,γ2 (y). (Pγ1,γ2 )


y∈Y

This can be verified similarly as for the original problem using Fenchel duality theory;
see [Sta1] for details. Clearly, both (Pγ1,γ2 ) and (Pγ1,γ2 ) admit unique solutions y γ1,γ2 and
(λγ1,γ2 , μγ1,γ2 ), respectively. Note that the regularization turns the primal problem into the
unconstrained minimization of a continuously differentiable functional, while the corre-
sponding dual problem is still a constrained minimization of a quadratic functional. To
shorten notation we henceforth mark all variables of the regularized problems only by the

i i

i i
ItoKunisc
i i
2008/6/12
page 269
i i

9.2. Friction and contact problems in elasticity 269

index γ instead of γ1,γ2 . It can be shown that the extremality conditions relating (Pγ1,γ2 ) and
(Pγ1,γ2 ) are

a(y γ , z) − L(z) + (μγ , τT z) c + (λγ , τN z) c = 0 for all z ∈ Y, (9.2.12a)


λγ − max(0, λ̂ + γ1 (τN y γ − d)) = 0 on c , (9.2.12b)
-
γ2 (ξγ − τT y γ ) + μγ − μ̂ = 0,
(9.2.12c)
ξγ − max(0, ξγ + σ (μγ − Fg)) − min(0, ξγ + σ (μγ + Fg)) = 0

for any σ > 0. Here, ξγ is the Lagrange multiplier associated to the constraint |μ| ≤ Fg in
(Pγ1,γ2 ). By setting σ = γ2−1 , ξγ can be eliminated from (9.2.12c), which results in

γ2 τT y γ + μ̂ − μγ − max(0, γ2 τT y γ + μ̂ − Fg) − min(0, γ2 τT y γ + μ̂ + Fg) = 0. (9.2.13)

While (9.2.13) and (9.2.12c) are equivalent, they will motivate slightly different active set
algorithms due to the parameter σ in (9.2.12c).
Next we investigate the convergence of the primal variable y γ as well as the dual
variables (λγ , μγ ) as the regularization parameters γ1 , γ2 tend to infinity. For this purpose
we denote by y ∗ the solution of (P) and by (λ∗ , μ∗ ) the solution to (P ).

Theorem 9.6. For all λ̂ ∈ L2 ( c ), λ̂ ≥ 0, μ̂ ∈ L2 ( c ), and g ∈ L2 ( c ), the primal


variable y γ converges to y ∗ strongly in Y and the dual variables (λγ , μγ ) converge to
(λ∗ , μ∗ ) weakly in H − 2 ( c ) × L2 ( c ) as γ1 → ∞ and γ2 → ∞.
1

Proof. The proof of this theorem is related to that of Theorem 8.26 in Chapter 8 and can be
found in [KuSt].

An active set algorithm for solving (9.2.12) is presented next. The interpretation as
generalized Newton method is discussed later. In the following we drop the index γ .

Algorithm SSN.

1. Initialize (λ0 , ξ 0 , μ0 , y 0 ) ∈ L2 ( c ) × L2 ( c ) × L2 ( c ) × Y, σ > 0 and set k := 0.

2. Determine the active and inactive sets

Ak+1
c = {x ∈ c : λ̂ + γ1 (τN y k − d) > 0},

Ick+1 = c \ Ak+1
c ,

Ak+1
f,− = {x ∈ c : ξ + σ (μ + Fg) < 0},
k k

f,+ = {x ∈ c : ξ + σ (μ − Fg) > 0},


Ak+1 k k

Ifk+1 = c \ (Ak+1
f,− ∪ Af,+ ).
k+1

3. If k ≥ 1, Ak+1
c = Akc , Ak+1
f,− = Af,− , and Af,+ = Af,+ stop. Else
k k+1 k

i i

i i
ItoKunisc
i i
2008/6/12
page 270
i i

270 Chapter 9. Semismooth Newton Methods II: Applications

4. Solve

a(y k+1 , z) − L(z) + (μk+1 , τT z) c + (λk+1 , τN z) c = 0 for all z ∈ Y,


λk+1 = 0 on Ick+1 , λk+1 = λ̂ + γ1 (τN y k+1 − d) on Ak+1
c ,

μk+1 − μ̂ − γ2 τT y k+1 = 0 on Ifk+1 ,

μk+1 = −Fg on Ak+1


f,− , μ
k+1
= Fg on Ak+1
f,+ .

5. Set


⎪ τ y k+1 + γ2−1 (μ̂ + Fg) on Ak+1
f,− ,
⎨ T
ξ k+1 := τT y k+1 + γ2−1 (μ̂ − Fg) on Ak+1
f,+ ,



0 on Ifk+1 ,

k := k + 1 and goto step 2.


Note that there exists a unique solution to the system in step 4, since it represents
the necessary and sufficient optimality conditions for the equality-constrained auxiliary
problem
min Jγ1 ,γ2 (λ, μ),
λ=0 on Ic
k+1
,
μ=−Fg on A μ=Fg on Af,+
k+1 k+1
f,− ,

with Jγ1 ,γ2 as defined in (9.2.10) that clearly has a unique solution. If the algorithm stops
at step 3 then y k is the solution to the primal problem (Pγ1,γ2 ) and (λk , μk ) solves the dual
problem (Pγ1,γ2 ).
Provided that we choose σ = γ2−1 , the above algorithm can be interpreted as a
semismooth Newton method in infinite-dimensional spaces. To show this assertion, we
consider a reduced system instead of (9.2.12). Thereby, as in the dual problem (Pγ1,γ2 ), the
primal variable y only acts as an auxiliary variable that is calculated from the dual variables
(λ, μ). We introduce the mapping F : L2 ( c ) × L2 ( c ) −→ L2 ( c ) × L2 ( c ) by
⎛ ⎞
λ − max(0, λ̂ + γ1 (τN y λ,μ − d))
⎜ ⎟
F (λ, μ) = ⎝ γ2 τT y + μ̂ − μ − max(0, γ2 τT y + μ̂ − Fg) · · · ⎠ , (9.2.14)
λ,μ λ,μ
· · · − min(0, γ2 τT y λ,μ + μ̂ + Fg)

where for given λ and μ we denote by y λ,μ the solution to

a(y, z) − L(z) + (μ, τT z) c + (λ, τN z) c = 0 for all z ∈ Y. (9.2.15)


1 1
For (λ, μ) ∈ L2 ( c ) × L2 ( c ) we have that τN y λ,μ , τT y λ,μ ∈ H 2 ( c ). Since H 2 ( c )
embeds continuously into Lp ( c ) for every p < ∞, we have the following composition of
the first component of F , for each 2 < p < ∞:
- "
L2 ( c ) × L2 ( c ) → Lp ( c ) → L2 ( c ),
(9.2.16)
(λ, μ) → τN y λ,μ  → max(0, λ̂ + γ1 (τN y λ,μ − d)).

i i

i i
ItoKunisc
i i
2008/6/12
page 271
i i

9.2. Friction and contact problems in elasticity 271

Since the mapping " involves a norm gap under the max functional, it is Newton differ-
entiable (see Example 8.14), and thus the first component of F is Newton differentiable.
A similar observation holds for the second component as well, and thus the whole map-
ping F is Newton differentiable. Hence, we can apply the semismooth Newton method
to the equation F (λ, μ) = 0. Calculating the explicit form of the Newton step leads to
Algorithm SSN with σ = γ2−1 .

Theorem 9.7. Suppose that there exists a constant g0 > 0 with Fg ≥ g0 , and further
suppose that σ ≥ γ2−1 and that λ0 − λγ  c , μ0 − μγ  c are sufficiently small. Then
the iterates (λk , ξ k , μk , y k ) of Algorithm SSN converge superlinearly to (λγ , ξγ , μγ , y γ )
in L2 ( c ) × L2 ( c ) × L2 ( c ) × Y.

Proof. The proof consists of two steps. First we prove the assertion for σ = γ2−1 and then
we utilize this result for the general case σ ≥ γ2−1 .
Step 1. For σ = γ2−1 Algorithm SSN is a semismooth Newton method for the equation
F (λ, μ) = 0 (F as defined in (9.2.14)). We already argued Newton differentiability of F .
To apply Theorem 8.16, it remains to show that the generalized derivatives have uniformly
bounded inverses, which can be achieved similarly as in the proof of Theorem 8.25 in
Chapter 8. Clearly, the superlinear convergence of (λk , μk ) carries over to the variables ξ k
and y k .
Step 2. For σ > γ2−1 we cannot use the above argument directly. Nevertheless, one
can prove superlinear convergence of the iterates by showing that in a neighborhood of the
solution the iterates of Algorithm SSN with σ > γ2−1 coincide with those of Algorithm
SSN with σ = γ2−1 . The argument for this fact exploits the smoothing properties of the
Neumann-to-Dirichlet mapping for the elasticity equation. First, we again consider the case
σ = γ2−1 . Clearly, for all k ≥ 1 we have λk − λk−1 ∈ L2 ( c ) and μk − μk−1 ∈ L2 ( c ).
The corresponding difference y k − y k−1 of the primal variables satisfies

a(y k − y k−1 , z) + (μk − μk−1 , τT z) c + (λk − λk−1 , τN z) c = 0 for all z ∈ Y.

From regularity results for elliptic variational equalities it follows that there exists a constant
C > 0 such that
 
τT y k − τT y k−1 C 0 ( c ) ≤ C λk − λk−1  c + μk − μk−1  c . (9.2.17)

We now show that (9.2.17) implies

Akf,− ∩ Ak+1
f,+ = Af,+ ∩ Af,− = ∅
k k+1
(9.2.18)

provided that λ0 −λγ  c and μ0 −μγ  c are sufficiently small. If B := Akf,− ∩Ak+1
f,+  = ∅,
−1 −1
then it follows that τT y k−1
+ γ2 (μ̂ + Fg) < 0 and τT y + γ2 (μ̂ − Fg) > 0 on B, which
k

implies that τT y k − τT y k−1 > 2γ2−1 Fg ≥ 2γ2−1 Fg0 > 0 on B. This contradicts (9.2.17)
provided that λ0 − λγ  c and μ0 − μγ  c are sufficiently small. Analogously, one can
show that Akf,+ ∩ Ak+1f,− = ∅.
We now choose an arbitrary σ ≥ γ2−1 and assume that (9.2.18) holds for Algorithm
SSN if σ = γ2−1 . Then we can argue that in a neighborhood of the solution the iterates of
Algorithm SSN are independent of σ ≥ γ2−1 . To verify this assertion we separately consider

i i

i i
ItoKunisc
i i
2008/6/12
page 272
i i

272 Chapter 9. Semismooth Newton Methods II: Applications

the sets Ifk , Akf,− , and Akf,+ . On Ifk we have that ξ k = 0 and thus σ has no influence when
determining the new active and inactive sets. On the set Akf,− we have μk = −Fg. Here,
we consider two types of sets. First, sets where ξ k < 0 belong to Ak+1 f,− for the next iteration
independently of σ . And, second, if ξ k ≥ 0, we use

ξ k + σ (μk − Fg) = ξ k − 2σ Fg.

Sets where ξ k − 2σ Fg ≤ 0 are transferred to Ifk+1 , and those where 0 < ξ k − 2σ Fg ≤


ξ k − 2γ2−1 Fg belong to Ak+1
f,+ for the next iteration. However, the case that x ∈ Af,− ∩ Af,+
k k+1

cannot occur for σ ≥ γ2−1 , since it is already ruled out by (9.2.18) for σ = γ2−1 . On the set
Akf,+ one argues analogously.
This shows that in a neighborhood of the solution the iterates are the same for all
σ ≥ γ2−1 , and thus the superlinear convergence result from Step 1 carries over to the
general case σ ≥ γ2−1 , which ends the proof.

Aside from the assumption that λ0 − λγ  c and μ0 − μγ  c are sufficiently small,
σ controls the probability that points are moved from the lower active set to the upper, or
vice versa, in one iteration. Smaller values for σ make it more likely that points belong to
Akf,− ∩ Ak+1
f,+ or Af,+ ∩ Af,− . In the numerical realization of Algorithm SSN it turns out
k k+1

that choosing small values for σ may not be optimal, since this may lead to the situation
that points which are active with respect to the upper bound become active with respect to
the lower bound in the next iteration, and vice versa. This in turn may lead to cycling of the
iterates. Such undesired behavior can be overcome by choosing larger values for σ . If the
active set strategy is based on (9.2.13), one cannot take advantage of a parameter which
helps avoid points from changing from Af,+ to Af,− , or vice versa, in one iteration.

Remark 9.2.1. So far we have not remarked on the choice of λ̄ and μ̄. One possibility is
to choose them according to first order augmented Lagrangian updates. In practice this will
result in carrying out some steps of Algorithm SSN and then updating (λ̄, μ̄) as the current
values of (λk , μk ).

9.2.3 Contact problem with Coulomb friction


We give a short description of how the contact problem with Coulomb friction can be
treated with the methods of the previous subsection. As pointed out earlier, one of the main
difficulties of such problems is related to the lack of regularity of σN (y) in (9.2.1). The
following formulation utilizes the contact problem with given friction g ∈ H − 2 ( c ) and a
1

1
fixed point idea. We define the cone of nonnegative functionals over H 2 ( c ) as
−1  
H+ 2 ( c ) := ξ ∈ H − 2 ( c ) : ξ, η
c ≥ 0 for all η ∈ H 2 ( c ), η ≥ 0
1 1

−1 −1
and consider the mapping ! : H+ 2 ( c ) → H+ 2 ( c ) defined by !(g) := λg , where
λg is the unique multiplier for the contact condition in (9.2.4) for the problem with given
friction g. Property (9.2.4b) implies that ! is well defined. With (9.2.6) in mind, y ∈ Y is
called a weak solution of the Signorini problem with Coulomb friction if its negative normal

i i

i i
ItoKunisc
i i
2008/6/12
page 273
i i

9.2. Friction and contact problems in elasticity 273

boundary stress −σN (y) is a fixed point of the mapping !. In general, such a fixed point for
the mapping ! does not exist unless F is sufficiently small; see, e.g., [EcJa, Has, HHNL].
The regularization for the Signorini contact problem with Coulomb friction that we
consider here corresponds to the regularization in (Pγ1,γ2 ) and reflects the fact that the La-
grangian for the contact condition relates to the negative stress in the normal direction. It
is given by
 
a(y, z − y) + max(0, λ̂ + γ1 (τN y − d)), τN (z − y) c − L(z − y)

1   (9.2.19)
+ F max(0, λ̂ + γ1 (τN y − d)) h(τT z, μ̂) − h(τT y, μ̂) dx ≥ 0
γ2 c

for all z ∈ Y, with h(· , ·) as defined in (9.2.11). Existence for (9.2.19) is obtained by means
of a fixed point argument for the regularized Tresca friction problem. For this purpose we
set L2+ ( c ) := {ξ ∈ L2 ( c ) : ξ ≥ 0 a.e.} and define the mapping !γ : L2+ ( c ) → L2+ ( c )
by !γ (g) := λγ = max(0, λ̂ + γ1 (τN y γ − d)) with y γ the unique solution of the regularized
contact problem with friction g ∈ L2+ ( c ). In a first step Lipschitz continuity of the mapping
γ : L2+ ( c ) → Y which assigns to a given friction g ∈ L2+ ( c ) the corresponding solution
y γ of (Pγ1,γ2 ) is investigated.

Lemma 9.8. For every γ1 , γ2 > 0 and λ̂ ∈ L2 ( c ), μ̂ ∈ L2 ( c ) the mapping γ defined


above is Lipschitz continuous with constant
FL∞( c ) c1
L= , (9.2.20)
κ
where F∞ denotes the essential supremum of F, κ the coercivity constant of a(· , ·), and c1
the continuity constant of the trace mapping from Y to L2 ( c ). In particular, the Lipschitz
constant L does not depend on the regularization parameters γ1 , γ2 .

For the proof we refer the reader to [Sta1]. We next address properties of the map-
ping !γ .

Lemma 9.9. For every γ1 , γ2 > 0 and λ̂ ∈ L2 ( c ), μ̂ ∈ L2 ( c ) the mapping !γ :


L2+ ( c ) → L2+ ( c ) is compact and Lipschitz continuous with constant

L= F∞ ,
κ
where c is a constant resulting from trace theorems.

Proof. We consider the following composition of mappings.


γ " ϒ
L2+ ( c ) −→ Y −→ L2 ( c ) −→ L2+ ( c ), (9.2.21)
g → y → τN y → max(0, λ̂ + γ1 (τN y − d)).

From Lemma 9.8 it is known that γ is Lipschitz continuous. The mapping " consists
1
of the linear trace mapping from Y into H 2 ( c ) and the compact embedding of this space

i i

i i
ItoKunisc
i i
2008/6/12
page 274
i i

274 Chapter 9. Semismooth Newton Methods II: Applications

into L2 ( c ). Therefore, it is compact and linear, in particular Lipschitz continuous with a


constant c2 > 0. Since

 max(0, λ̂ + γ1 (ξ − d)) − max(0, λ̂ + γ1 (ξ̃ − d)) c ≤ γ1 ξ − ξ̃  c (9.2.22)

for all ξ, ξ̃ ∈ L2 ( c ), the mapping ϒ is Lipschitz continuous with constant γ1 . This implies
that !γ is Lipschitz continuous with constant
c 1 c 2 γ1
L= F∞ , (9.2.23)
κ
where c1 , c2 are constants from trace theorems. Concerning compactness the composition
of " and γ is compact. From (9.2.22) it then follows that !γ is compact. This ends the
proof.

We can now show that the regularized contact problem with Coulomb friction has a
solution.

Theorem 9.10. The mapping !γ admits at least one fixed point, i.e., the regularized
Coulomb friction problem (9.2.19) admits a solution. If F∞ is such that L as defined in
(9.2.23) is smaller than 1, the solution is unique.

Proof. We apply the Leray–Schauder fixed point theorem to the mapping !γ : L2 ( c ) →


L2 ( c ). Using Lemma 9.9, it suffices to show that λ is bounded in L2 ( c ) independently
of g. This is clear taking into account the dual problem (Pγ1,γ2 ). Indeed,

min Jγ1 ,γ2 (λ, μ) ≤ min Jγ1 ,γ2 (λ, 0) < ∞.


λ≥0, |μ|≤Fg a.e. on c λ≥0

Hence, the Leray–Schauder theorem guarantees the existence of a solution to the regularized
Coulomb friction problem. Uniqueness of the solution holds if F is such that L is smaller
than 1, since in this case !γ is a contraction.

In the following algorithm the fixed point approach is combined with an augmented
Lagrangian concept to solve (9.2.19).

Algorithm ALM-FP.
1. Initialize (λ̂0 , μ̂0 ) ∈ L2 ( c ) × L2 ( c ) and g 0 ∈ L2 ( c ), m := 0.
2. Choose γ1m , γ2m > 0 and determine the solution (λm , μm ) to problem (Pγ1,γ2 ) with
given friction g m and λ̂ := λ̂m , μ̂ := μ̂m .
3. Update g m+1 := λm , λ̂m+1 := λm , μ̂m+1 := μm , and m := m + 1. Unless an
appropriate stopping criterion is met, goto step 2.

The auxiliary problems (Pγ1,γ2 ) in step (2) can be solved by Algorithm SSN. Numerical
experiments with this algorithm are given in [KuSt]. For the following brief discussion of
the above algorithm, we assume that the mapping ! admits a fixed point λ∗ in L2 ( c ). Then,

i i

i i
ItoKunisc
i i
2008/6/12
page 275
i i

9.2. Friction and contact problems in elasticity 275

with the variables y ∗ ∈ Y and μ∗ ∈ L2 ( c ) corresponding to λ∗ , we have for γ1 , γ2 > 0


that

a(y ∗ , z) − L(z) + (λ∗ , τN z) c + (μ∗ , τT z) c = 0 for all z ∈ Y,


λ∗ − max(0, λ∗ + γ1 (τN y ∗ − d)) = 0 on c ,
γ2 τT y ∗ − max(0, γ2 τT y ∗ + μ∗ − Fλ∗ ) − min(0, γ2 τT y ∗ + μ∗ + Fλ∗ ) = 0 on c .

Provided that Algorithm ALM-FP has a limit point, it also satisfies this system of equations;
i.e., the limit satisfies the original, nonregularized contact problem with Coulomb friction.
We end the section with a numerical example taken from [KuSt].

Example 9.11. We solve the contact problem with Tresca as well as with Coulomb friction
using Algorithms SSN
 and ALM-FP, respectively. We choose
  = [0, 3] × [0, 1], the gap
function d = max 0.0015, 0.003(x1 − 1.5)2 + 0.001 and E = 10000, ν = 0.45, and
f = 0. The boundary of possible contact and friction is c := [0, 3] × {0}, and we assume
traction-free boundary conditions on n := [0, 3] × {1} ∪ {0} × [0, 0.2] ∪ {3} × [0, 0.2].
On d := {0} × [0.2, 1] ∪ {3} × [0.2, 1] we prescribe the deformation as follows:
⎧  
⎪ 0.003(1 − x2 )

⎨ on {0} × [0.2, 1],
−0.004
τy =  

⎪ −0.003(1 − x2 )
⎩ on {3} × [0.2, 1].
−0.004

Further γ1 = γ2 = 108 , σ = 1, λ̄ = μ̄ = 0, and Algorithm SSN is initialized by


λ0 = μ0 = 0. Algorithm ALM-FP is initialized with the solution of the pure contact
problem and g 0 = 0, λ̂0 = μ̂0 = 0. The MATLAB code is based on [ACFK], which
that uses linear and bilinear finite elements for the discretization of the elasticity equations
without friction and contact.
The semismooth Newton method detects the solution for given friction g ≡ 1 and
F = 1 after 7 iterations. The corresponding deformed mesh and the elastic shear energy
density are shown in Figure 9.1. As expected, in a neighborhood of the points (0, 0.2) and

0.8

0.6

0.4

0.2

−0.2
−0.5 0 0.5 1 1.5 2 2.5 3 3.5

Figure 9.1. Deformed mesh for g ≡ 1; gray tones visualize the elastic shear
energy density.

i i

i i
ItoKunisc
i i
2008/6/12
page 276
i i

276 Chapter 9. Semismooth Newton Methods II: Applications

20 20 20

10 10 10

0 0 0

−10 −10 −10

−20 −20 −20

−30 −30 −30

−40 −40 −40


0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3
40 80 200

30 60 150

20 40 100

10 20 50

0 0 0

−10 −20 −50

−20 −40 −100

−30 −60 −150

−40 −80 −200


0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3 0 0.5 1 1.5 2 2.5 3

Figure 9.2. Upper row, left: multiplier λ∗ (solid), rigid foundation (multiplied by
5 · 10 , dotted), and normal displacement τN y ∗ (multiplied by 5 · 103 , dashed). Lower row,
3

left figure: dual variable μ∗ (solid) with bounds ±Fλ∗ (dotted) and tangential displacement
y ∗ (multiplied by 5 · 103 , dashed) for F = 2. Middle column: same as left, but with F = 5.
Right column: same as first column, but with F = 10.

(3, 0.2), i.e., the points where the boundary conditions change from Neumann to Dirichlet,
we observe a stress concentration due to a local singularity of the solution. Further, we also
observe a (small) stress concentration close to the points where the rigid foundation has the
kinks.
We turn to the Coulomb friction problem and investigate the performance of Algo-
rithm ALM-FP. In Figure 9.2 the normal and the tangential displacement with corresponding
multipliers for F = 2, 5, 10 are depicted. One observes that the friction coefficient signif-
icantly influences the deformation. For instance, in the case F = 2 the elastic body is in
contact with the foundation in the interval [1.4, 1.6], but it is not for F = 5 and F = 10.
These large values of F may, however, be physically of little relevance. Algorithm ALM-
|g m −g m−1 |
FP requires overall between 20 and 25 linear solves to stop with |gm | c ≤ 10−7 . For
c
further information on the numerical performance we refer the reader to [Sta1].

i i

i i
ItoKunisc
i i
2008/6/12
page 277
i i

Chapter 10

Parabolic Variational
Inequalities

In this section we discuss the Lagrange multiplier approach to parabolic variational inequal-
ities in the Hilbert space H = L2 () which are of the type
 
d ∗
y (t) + Ay ∗ (t) − f (t), y − y ∗ (t) ≥ 0, y ∗ (t) − ψ ∈ C,
dt (10.0.1)

y (0) = y0

for all y − ψ ∈ C, where the closed convex set C of H is defined by

C = {y ∈ H : y ≥ 0},

A is a closed operator on H ,  denotes an open domain in Rn , and y ≥ 0 is interpreted in


the pointwise a.e. sense.
We consider the Black–Scholes model for American options, which is a variational
inequality of the form
 2 
d σ 2
− v(t, S) − S vSS + rS vS − r v + Bv ≥ 0 ⊥ v(t, S) ≥ ψ(S),
dt 2 (10.0.2)
v(T , S) = ψ(S)

for a.e. (t, S) ∈ (0, T ) × (0, ∞), where ⊥ denotes complementarity, i.e., a ≥ 0 ⊥ b ≥ ψ
if a ≥ 0, b ≥ ψ, and a(b − ψ) = 0. In (10.0.2) the reward function ψ(S) = (K − S)+ for
the put option and ψ(S) = (S − K)+ for the call option. Here S ≥ 0 denotes the price, v
the value of the share, r > 0 is the interest rate, σ > 0 is the volatility of the market, and
K is the strike price. Further T is the maturity date. The integral operator B is defined by
 ∞
 
Bv(S) = −λ (z − 1)SvS + (v(t, S) − v(t, zS)) dν(z).
0

Note that (10.0.2) is a backward equation with respect to the time variable. Setting
y(t, S) = v(T − t, S) we arrive at (10.0.1), and (10.0.2) has the following interpretation

277

i i

i i
ItoKunisc
i i
2008/6/12
page 278
i i

278 Chapter 10. Parabolic Variational Inequalities

[Kou] in mathematical finance. The price process St is governed by the Ito’s stochastic
differential equation

dSt /St− = r dt + σ dBt + (Jt − 1) dπt ,

where Bt denotes a standard Brownian motion, πt is a counting Poisson process, Jt − 1 is


the magnitude of the jump ν, and λ ≥ 0 is the rate. The value function v is represented by

v(t, S) = sup E t,x [e−r(τ −t) ψ(Sτ )] over all stopping times τ ≤ T . (10.0.3)
τ

It will be shown that for the put case, (0, ∞) can be replaced by  = (S̄, ∞) with
certain S̄ > 0. Thus, (10.0.2) can be formulated as (10.0.1) by defining a bounded bilinear
form a on V × V by
 ∞ 2 
σ 2
a(v, φ) = S vS + (r − σ 2 )S v φS + (2r − σ 2 ) vφ − Bv φ) dS (10.0.4)
S̄ 2
for v, φ ∈ V , where V is defined by

V = φ ∈ H : φ is absolutely continuous on (S̄, ∞),

 ∞ 
S 2 |φS |2 dS < ∞ and φ(S) → 0 as S → ∞ and ψ(S̄) = 0

equipped with
 ∞
|φ|2V = (S 2 |φS |2 + |φ|2 ) dS.

Now
σ2
a(v, φ) ≤ |v|V |φ|V + |r − σ 2 ||v|H |φ|V + |2r − σ 2 ||v|H |φ|H
2
and
 
σ2 2 3
a(v, v) ≥ |v|V + 2r − σ 2 |v|2H − |r − σ 2 ||v|V |v|H
2 2
 
σ2 2 3 2 (r − σ 2 )2
≥ |v|V + 2r − σ − |v|2H .
4 2 σ2
Note that if Av ∈ L2 (), then

(Av, φ) = a(v, φ) for all φ ∈ V .


 S̄
Then, the solution v(T − t, S) satisfies (10.0.1) with f (S) = λ S
0 ψ(zS) dν(z). Let
v + = sup(0, v). Since v + ≥ v a.e. in ,

(−Bv, v + ) ≥ (−Bv + , v + ).

i i

i i
ItoKunisc
i i
2008/6/12
page 279
i i

279

Thus,  
σ2 2
Av = − S vSS + rS vS − r v + Bv
2
satisfies
Av, v +
≥ Av + , v +
.
Motivated by this example we make the following assumptions. Let X be a Hilbert
space that is continuously embedded into H , and let V be a separable closed linear subspace
of X endowed with the induced norm and dense in H . Assume that

ψ ∈ X, f ∈ L2 (0, T ; V ∗ ),

and that
φ + = sup(0, φ) ∈ V for all φ ∈ V .
The following assumptions will be used for the operator A.
(1) A ∈ L(X, V ∗ ), i.e., there exists M̄ such that

| Ay, φ
V ∗ ×V | ≤ M̄|y| |φ| for all y ∈ X and φ ∈ V ,

and A is closed in H with

dom(A) = {y ∈ X : Ay ∈ H } ⊂ X,

where dom(A) is a Hilbert space equipped with the graph norm.


(2) There exist ω > 0 and ρ ∈ R such that for all φ ∈ V

Aφ, φ
≥ ω|φ|2V − ρ|φ|2H .

(3) For all φ ∈ V ,


Aφ, φ +
≥ Aφ + , φ +
.
(4) There exists λ̄ ∈ H satisfying λ̄ ≤ 0 a.e. such that

λ̄ + Aψ − f (t), φ
≤ 0

for a.e. t and all φ ∈ V satisfying φ ≥ 0 a.e.


(5) There exists an ψ̄ ∈ dom(A) such that ψ̄ − ψ ∈ V ∩ C = C.
(6) Let as be the symmetric form on V × V defined by
1
as (y, φ) = ( Ay, φ
+ Aφ, y
)
2
for y, φ ∈ V and assume that the skew-symmetric form satisfies
1
| Ay, φ
− Aφ, y
| ≤ M|y|V |φ|H
2
for a constant M independent of y, φ ∈ V .
(7) (φ − γ )+ ∈ V for any γ ∈ R+ and φ ∈ V , and A1, (φ − γ )+
≥ 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 280
i i

280 Chapter 10. Parabolic Variational Inequalities

Assumptions (1)–(5) apply to (10.0.2) and to second order elliptic differential oper-
ators. Assumption (6) applies to the biharmonic operator 2 and to self-adjoint operators.
For the biharmonic operator and systems of equations as the elasticity system, for instance,
the monotone property (3) does not hold.
In this chapter we discuss (10.0.1) without assuming that V is embedded compactly
into H . In the latter case, one can use the Aubin lemma, which states that W (0, T ) =
L2 (0, T ; V )∩H 1 (0, T ; V ∗ ) is compactly embedded into L2 (0, T ; H ). This ensures that the
weak limit of certain approximating sequences defines the solution; see, e.g., [GLT, IK23].
Instead, our analysis uses the monotone trick for variational inequalities. From, e.g., [Tan,
p. 151], we recall that W (0, T ) embeds continuously into C([0, T ]; H ).
We commence with the definitions of strong and weak solutions to (10.0.1).

Definition 10.1 (Strong Solution). Given y0 − ψ ∈ C and f ∈ L2 (0, T ; H ), an X-valued


function y ∗ (t), with y ∗ − ψ ∈ L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ) is called a strong solution of
(10.0.1) if y ∗ (0) = y0 , y ∗ ∈ H 1 (δ, T ; H ) for every δ > 0, y ∗ (t, x) ≥ ψ(x) a.e. in
(0, T ) × , and for a.e. t ∈ (0, T ),
 
d ∗
y (t) + Ay ∗ (t) − f (t), y − y ∗ (t) ≥ 0
dt
for a.e. t ∈ (0, T ) and for all y − ψ ∈ C.

Defining λ∗ = − dtd y ∗ − Ay ∗ + f (t) ∈ L2 (0, T ; V ∗ ), we have in the distributional


sense that y ∗ satisfies

⎪ d ∗
⎨ y (t) + Ay ∗ (t) + λ∗ (t) = f (t), y ∗ (0) = y0 ,
dt (10.0.5)

⎩ ∗ ∗ ∗
λ (t), y − ψ
≤ 0 for all y − ψ ∈ C and λ , y − ψ
= 0.

Moreover, in case that y ∗ is a strong solution we have y ∗ ∈ L2 (δ, T ; dom(A)) for every
δ > 0. Further λ∗ ∈ L2 (δ, T ; H ) and (10.0.1) can equivalently be written as a variational
inequality in the form

⎪ d ∗
⎨ y (t) + Ay ∗ (t) + λ∗ (t) = f (t), y ∗ (0) = y0 ,
dt (10.0.6)

⎩ ∗ ∗ ∗ ∗
λ (t) ≤ 0, y (t) ≥ ψ, (y (t) − ψ, λ (t))H = 0 for a.e. t > 0.

Definition 10.2 (Weak Solution). Assume that y0 ∈ H and f ∈ L2 (0, T , V ∗ ). Then a


function y ∗ − ψ ∈ L2 (0, T ; V ) satisfying y ∗ (t, x) ≥ ψ(x) a.e. in (0, T ) ×  is called a
weak solution to (10.0.1) if
 T 1 2
d
y(t), y(t) − y ∗ (t)
+ Ay ∗ (t), y(t) − y ∗ (t)
− f (t), y(t) − y ∗ (t) dt
0 dt
1
+ |y(0) − y0 |2H ≥ 0
2
(10.0.7)

i i

i i
ItoKunisc
i i
2008/6/12
page 281
i i

10.1. Strong solutions 281

is satisfied for all y − ψ ∈ K, where


K = {y ∈ W (0, T ) : y(t, x) ≥ 0 a.e. in (0, T ) × } (10.0.8)
and
W (0, T ) = L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ).

Since for y ∗ and y in W (0, T )


 T 
d 1
y(t) − y (t), y(t) − y (t) dt = (|y(T ) − y ∗ (T )|2H − |y(0) − y0 |2H ),
∗ ∗
0 dt 2
it follows that a strong solution to (10.0.1) is also a weak solution.
Let us briefly outline this chapter. Section 10.1 is devoted to proving existence and
uniqueness of strong solutions. Extra regularity of solutions is obtained in Section 10.2.
Continuous dependence of the solution with respect to parameters in A is investigated in
Section 10.3. Section 10.4 focuses on weak solutions obtained as the limit of approximating
difference schemes. In Section 10.5 monotonic behavior of solutions with respect to initial
conditions and the forcing function is proved. We refer to [Sey] and the literature cited there
for an introduction to numerical aspects in mathematical finance.

10.1 Strong solutions


In this section we establish the existence of the strong solution to (10.0.1) under assumptions
(1)–(5) and (1)–(2), (5)–(6), respectively.
For λ̄ ∈ H satisfying λ̄ ≤ 0 we consider the regularized equations of the form
- d
y + Ayc + min(0, λ̄ + c (yc − ψ)) = f, c > 0,
dt c
(10.1.1)
yc (0) = y0 .

Proposition 10.3. If assumptions (1)–(2) hold and y0 ∈ H , f ∈ L2 (0, T ; V ∗ ), and λ̄ ∈ H ,


then (10.1.1) has a unique solution yc satisfying yc − ψ ∈ L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ).

Proof. Existence and uniqueness of the solution to (10.1.1) follow with monotone tech-
niques; see [ItKa, Lio3], for instance. Define A : V → V ∗ by
Aφ = Aφ + min(−λ̄, cφ).
Then (10.1.1) can equivalently be expressed as
d
v + Av = f − λ̄ − Aψ ∈ L2 (0, T ; V ∗ ), (10.1.2)
dt
with v = yc − ψ and v(0) = y0 − ψ ∈ H . We note that A is hemicontinuous, i.e.,
s → A(φ1 + sφ2 ), φ3
is continuous from R → R for all φi ∈ V , i = 1, . . . , 3, and
|Aφ|V ∗ ≤ |Aφ|V ∗ + c|φ|H for all φ ∈ V ,
Aφ1 − Aφ2 , φ1 − φ2
≥ ω|φ1 − φ2 |2V − ρ|φ1 − φ2 |2H for all φ1 , φ2 ∈ V ,
Aφ, φ
≥ ω|φ|2V − ρ|φ|2H for all φ ∈ V .

i i

i i
ItoKunisc
i i
2008/6/12
page 282
i i

282 Chapter 10. Parabolic Variational Inequalities

It follows that (10.1.2) admits a unique solution v ∈ L2 (0, T ; V ) ∩ H 1 (0, T ; V ∗ ) and


this gives the desired solution yc = v + ψ of (10.1.1); cf. [ItKa, Theorem 8.7], [Lio3,
Theorem II.1.2].

Theorem 10.4. (1) If in addition to the assumptions in Proposition 10.3 assumptions (3)–
(4) hold, y0 − ψ ∈ C, then yc (t) − ψ ∈ C and yc (t) ≥ yĉ (t) for ĉ ≥ c. Moreover,
yc − ψ → y ∗ − ψ strongly in L2 (0, T ; V ) and weakly in H 1 (0, T ; V ∗ ) as c → ∞, where
y ∗ is the unique solution of (10.0.1) in the sense that y ∗ − ψ ∈ K, (10.0.6) is satisfied with
λ∗ ∈ L2 (0, T ; H ), and the estimate
 t 
1 −2ρt ∗ −2ρs ∗ 1 t −2ρs 2
e |yc (t) − y (t)|H +
2
e ω |yc (s) − y (s)|V ds ≤
2
e |λ̄| ds → 0
2 0 c 0

for t ∈ [0, T ] holds. If in addition assumption (7) is satisfied and λ̄ ∈ L∞ (), then

1
|yc (t) − y ∗ (t)|L∞ ≤ |λ̄|L∞ .
c
(2) If assumptions (1)–(4) hold, y0 − ψ ∈ C, and f ∈ L2 (0, T ; H ), then y ∗ is the
unique strong solution to (10.0.1).

Proof. (1) From (10.1.1) it follows that yc satisfies


- d
y + A(yc − ψ) + λc = f − Aψ,
dt c
(10.1.3)
yc (0) = y0 ,

where

λc = min(0, λ̄ + c (yc − ψ)).


If y0 −ψ ∈ C, then yc (t)−ψ ∈ C. In fact, let φ = min(0, yc −ψ) = −(yc −ψ)− ∈ −C∩V .
Since λ̄ ≤ 0 it follows that
 
d
yc + A(yc − ψ), φ + Aψ − f + λ̄ + c φ, φ
= 0,
dt

where by assumptions (3) and (2)

A(yc − ψ), φ
≥ Aφ, φ
≥ ω|φ|2 − ρ|φ|2H

and by (4)

Aψ − f (t) + λ̄, φ
≥ 0.

Thus,
1 d
|φ|2 ≤ ρ |φ|2H
2 dt H

i i

i i
ItoKunisc
i i
2008/6/12
page 283
i i

10.1. Strong solutions 283

and consequently
e−2ρt |φ|2H ≤ |φ(0)|2H = 0. (10.1.4)
Since
0 ≥ λc = min(0, λ̄ + c (yc − ψ)) ≥ λ̄,
we have
|λc (t)|H ≤ |λ̄|H
for all t ∈ [0, T ]. From (10.1.3) we deduce that {yc } is bounded in L2 (0, T ; V ). By assump-
tion (1) and again by (10.1.3) it follows that {Ayc } and { dtd yc } are bounded in L2 (0, T ; V ∗ ).
Thus, there exist λ∗ ∈ L2 (0, T ; H ) satisfying λ∗ ≤ 0 a.e. and y ∗ satisfying y ∗ − ψ ∈ K,
such that for a subsequence denoted again by c,
λc → λ∗ weakly in L2 (0, T ; H ),
d d (10.1.5)
Ayc → Ay ∗ and yc → y ∗ weakly in L2 (0, T ; V ∗ )
dt dt
as c → ∞ . Taking the limit in (10.1.3) implies that
d ∗
y + Ay ∗ − f = −λ∗ , y ∗ (0) = y0 , (10.1.6)
dt
with equality in the differential equation holding in the sense of L2 (0, T ; V ∗ ).
For φ = −(yc − yĉ )− with c ≤ ĉ we deduce from (10.1.3) that
 
d
(yc − yĉ ) + A(yc − yĉ ), φ + (λc − λĉ , φ) = 0,
dt
where

(λc − λĉ , φ) = min(0, λ̄ + c(yĉ − ψ)) − min(0, λ̄ + ĉ(yĉ − ψ))

+ min(0, λ̄ + c(yc − ψ)) − min(0, λ̄ + c(yĉ − ψ)), φ ≥ 0,
since yĉ ≥ ψ. Hence, using the same arguments as those leading to (10.1.4), we have
|φ(t)|H = 0 and thus
yc ≥ yĉ for c ≤ ĉ.
By Lebesgue dominated convergence theorem and the theorem of Beppo Levi, yc → y ∗
strongly to L2 (0, T ; H ) and pointwise a.e. in (0, T ) × . Since
 T 
1 T 2
0≥ (λc , yc − ψ)H dt ≥ − |λ̄|H dt → 0
0 c 0
as c → ∞, we have
 T
(λ∗ , y ∗ − ψ) dt = 0.
0

i i

i i
ItoKunisc
i i
2008/6/12
page 284
i i

284 Chapter 10. Parabolic Variational Inequalities

That is, (y ∗ , λ∗ ) satisfies (10.0.6), where the first equation is satisfied in the sense of
L2 (0, T ; V ∗ ). Suppose that y ∈ K satisfies (10.0.1). Then it follows that
1 d ∗
|y (t) − y(t)|2H + A(y ∗ − y(t)), y ∗ (t) − y(t)
≤ 0
2 dt
and thus e−2ρt |y ∗ (t) − y(t)|2H ≤ |y0 − y(0)|2H . This implies that y ∗ is the unique solution
to (10.0.1) in K and that the whole family {(yc , λc )} converges in the sense specified in
(10.1.5). From (10.0.1) and (10.1.1)
 
d ∗ ∗ ∗
y (t) + Ay (t) − f (t), yc (t) − y (t) ≥ 0,
dt
 
d ∗
yc (t) + Ayc (t) − f (t), y (t) − yc (t) ≥ (λc , yc − ψ)H .
dt

Since yc ≥ ψ, we have (λc , yc − ψ) ≥ − 1c |λ̄|2H . Summing the above inequalities and


multiplying by e−2ρt give
1 d −2ρt
(e |yc (t) − y ∗ (t)|2H ) + e−2ρt A(yc (t) − y ∗ (t)), yc (t) − y ∗ (t)

2 dt
1
+ρ |yc (t) − y ∗ (t)|2H ≤ e−2ρt |λ̄|2 ,
c
which implies the first estimate and in particular that yy → y ∗ strongly in L2 (0, T ; V ).
Suppose next that in addition λ̄ ∈ L∞ (). Let k ∈ R+ and φ = (yc − y ∗ − k)+ . By
assumption φ ∈ V . From (10.0.6) and (10.1.1)
 
d ∗
y + Ay ∗ − f, φ ≥ 0
dt
and  
d
yc + Ayc + λc − f, φ = 0.
dt
If k ≥ 1c |λ̄|L∞ , then

(λc , φ) = (min(0, λ̄ + c (yc − ψ)), (yc − y ∗ − k)+ ) = 0,

where we use that yc ≥ y ∗ ≥ ψ. Hence, we obtain


 
d ∗ ∗
(yc − y − k) + A(yc − y − k) + A k, φ ≤ 0.
dt
By assumption (3) and since A1, φ
≥ 0,
1 d
|φ|2 + Aφ, φ
≤ 0,
2 dt
which implies the second estimate.

i i

i i
ItoKunisc
i i
2008/6/12
page 285
i i

10.1. Strong solutions 285

(2) Now suppose that f ∈ L2 (0, T ; H ) and that assumptions (1)–(4) hold. Consider
(10.1.3) in the form
- d
y + Ayc = f − λc ,
dt c
(10.1.7)
y(0) = y0 .

We decompose yc = yc,i + yh , where yc,i and yh are the solutions to (10.1.7) with initial
condition and forcing functions set to zero, respectively. Note that {λc } is bounded in
L2 (0, T ; H ) uniformly with respect to c. Hence by the following lemma {Ayc,i } and { dtd yc,i }
are bounded in L2 (0, T ; H ) uniformly for c > 0. Moreover, Ayh ∈ L2 (δ, T ; H ) and dtd yh ∈
L2 (δ, T ; H ) for every δ > 0. Thus yc is bounded in H 1 (δ, T ; H ) ∩ L2 (δ, T ; dom(A)) and
converges weakly in H 1 (δ, T ; H ) ∩ L2 (δ, T ; dom(A)) to y ∗ for c → ∞.

Lemma 10.5. Under the assumptions of the previous theorem −A generates an analytic
semigroup on H . If dtd x + Ax = g ∈ L2 (0, T ; H ), with x0 = 0, then dtd x(t) and Ax(t) ∈
L2 (0, T ; H ), and
|Ax|L2 (0,T ;H ) ≤ k̄ |g|L2 (0,T ;H ) ,

with k̄ independent of g ∈ L2 (0, T ; H ).

Proof. Let B = A−ρI . Further for u ∈ dom(A) and λ ∈ C with Reλ ≥ 0 set ḡ = λu+Bu.
Then, since
Re λ (u, u)H + Bu, u
≤ |ḡ|H |u|H ,

assumption (2) implies that


ω|u|2V ≤ |f |H |u|H . (10.1.8)

From assumption (1) and (10.1.8)


 

|λ| |u|2H ≤ |ḡ|H |u|H + M̄|u|2V ≤ 1 + |ḡ|H |u|H ,
ω

and thus
 
M̄ 1
|u|H = |(λI + B)−1 ḡ|H ≤ 1 + |ḡ|H . (10.1.9)
ω |λ|

It thus follows from [ItKa, Paz, Tan] that −B and hence −A generate analytic semigroups
on H related by e−B t = e (ρ−A) t .
For g ∈ L2 (0, ∞; H ) with eρ · g ∈ L2 (0, ∞; H ) consider

d
x + Ax = g with x(0) = 0.
dt
This is related to
d
z + Bz = gρ := geρ· with x(0) = 0 (10.1.10)
dt

i i

i i
ItoKunisc
i i
2008/6/12
page 286
i i

286 Chapter 10. Parabolic Variational Inequalities

by z(t) = eρ t x. Taking the Laplace transform of (10.1.10), we obtain


 ∞
λẑ + B ẑ = ĝρ , where ẑ = z0 e−λs z(s) ds,

and thus by (10.1.9)


 

|B ẑ|H ≤ |B(λI + B)−1 ĝρ |H ≤ 2 + |ĝρ |H .
ω
From the Fourier–Plancherel theorem we have
 ∞   ∞

|Az(t)|H dt ≤ 2 +
2
|gρ |2 dt.
0 ω 0

This implies that |Ax|L2 (0,T ;H ) ≤ eρ T (2 + M̄


ω
)|g|L2 (0,T ;H ) choosing g = 0 for
t ≥ T.

To allow δ = 0 for the strong solutions in the previous theorem we let ȳ denote the
solution to - d
dt
ȳ + Aȳ = 0,
ȳ(0) = y0 ,
and we consider
- d
dt
(yc − ȳ) + A(yc − ȳ) = f − λc ,

y(0) − ȳ(0) = 0.

Arguing as in (2) of Theorem 10.4 we obtain the following corollary.

Corollary 10.6. Under the assumptions of Theorem 10.4 we have y ∗ − ȳ ∈ H 1 (0, T ; H ) ∩


L2 (0, T ; dom(A)).

Next we turn to verify existence under a different set of assumptions which, in par-
ticular, does not involve the monotone assumption (3). For λ̄ = 0 in (10.1.1), let ŷc denote
the corresponding solution, i.e.,
d
ŷc + Aŷc + c min(0, c (yc − ψ)) = f, c > 0, (10.1.11)
dt
which exists by Theorem 10.4 (1).

Theorem 10.7. If assumptions (1)–(2) and (5)–(6) hold, y0 − ψ ∈ C ∩ V , and f ∈


L2 (0, T ; H ), then (10.0.1) has a unique, strong solution y ∗ (t) in H 1 (0, T ; H ) ∩
L2 (0, T ; dom(A)), and ŷc → y ∗ strongly in L2 (0, T ; V ) ∩ C(0, T ; H ) as c → ∞. More-
over t → y ∗ (t) ∈ V is right continuous. If in addition assumptions (3)–(4) hold, then
ŷc ≤ ŷĉ for c ≤ ĉ and ŷc (t) → y ∗ (t) strongly in H for each t ∈ [0, T ] and pointwise
a.e. in .

i i

i i
ItoKunisc
i i
2008/6/12
page 287
i i

10.1. Strong solutions 287

Proof. For λ̄ = 0 we have


1 d
|ŷc − ψ|2H + A(ŷc − ψ), ŷc − ψ
+ Aψ − f, ŷc − ψ
+ c |(ŷc − ψ)− |2 = 0.
2 dt
From assumptions (1)–(2) we have
 t
|ŷc (t) − ψ|2H + (ω |ŷc − ψ|2V + c |(ŷc − ψ)− |2H ) ds
0
 t 
1
≤ |y0 − ψ|H +
2
2ρ |ŷc − ψ|H + |Aψ − f |V ∗ ds
2 2
0 ω
and thus
 t
|ŷc (t) − ψ|2H + (ω |ŷc − ψ|2V + c |(ŷc − ψ)− |2H ) ds
 0
  (10.1.12)
1 t
≤ e2ρt |y0 − ψ|2H + |Aψ − f |2V ∗ ds .
ω 0

With assumptions (5) and (6) holding, we have that ŷc ∈ H 1 (0, T ; H ),
 
d d
ŷc + Aŷc + c min(0, ŷc − ψ) − f (t), ŷc = 0
dt dt
for a.e. t ∈ (0, T ). Then

d 2
ŷc + d (as (ŷc − ψ̄, ŷc − ψ̄) + c|(ŷc − ψ)− |2 ) ≤ 2(M 2 |ŷc − ψ̄|2 + |Aψ̄ − f |2 )
dt dt V H
H

and hence
 t
d 2
as (ŷc (t) − ψ̄, ŷc (t) − ψ̄) + c |(ŷc (t) − ψ)− |2H +
dt ŷc ds
 t 0 H (10.1.13)
≤ as (y0 − ψ̄, y0 − ψ̄) + 2(M |ŷc (s) − ψ̄|V + |Aψ̄ − f (s)|2H ) ds,
2 2
0

where we used the fact that y0 − ψ ∈ C. It thus follows from (10.1.12)–(10.1.13) that
 t
d 2
− 2
|ŷc (t) − ψ̄|V + c |(ŷc (t) − ψ) |H +
2
dt ŷc ds
 0 H
t  (10.1.14)
≤ K |y0 − ψ̄|V + |ψ − ψ̄|V + |Aψ̄ − f (s)|2H ds
0

for a constant K independent of c > 0 and t ∈ [0, T ].


Hence there exists a y ∗ such that y ∗ − ψ̄ ∈ H 1 (0, T ; H ) ∩ B(0, T ; V ) and on a
subsequence
d d
ŷc → y ∗ , Aŷc → Ay ∗
dt dt

i i

i i
ItoKunisc
i i
2008/6/12
page 288
i i

288 Chapter 10. Parabolic Variational Inequalities

weakly in L2 (0, T ; H ) and L2 (0, T ; V ∗ ), respectively. In particular this implies that yc →


y ∗ weakly in L2 (0, T ; H ). Above B(0, T ; V ) denotes the space of all everywhere bounded
measurable functions from [0, T ] to V . By assumption (5) we have that y ∗ −ψ ∈ B(0, T ; V )
as well.
Since  T  T
(ŷc − ψ, φ)H dt ≥ − ((ŷc − ψ)− , φ)H dt
0 0
T
for all φ ∈ L (0, T ; H ) with φ(t) ∈ C for a.e. t, and since limc→0 0 |(ŷc (t) − ψ)− |2H dt =
2

0, by (10.1.14) we have
 T
(y ∗ (t) − ψ, φ)H dt ≥ 0 for all φ ∈ L2 (0, T ; H ) with φ(t) ∈ C. (10.1.15)
0

This implies that y ∗ (t) − ψ ∈ C for a.e. t ∈ [0, T ]. For y − ψ ∈ C


 
−(c(ŷc (t) − ψ)− , y − ŷc (t)) = − c(ŷc (t) − ψ)− , y − ψ − (ŷc (t) − ψ) ≤ 0

for a.e. t ∈ (0, T ). It therefore follows from (10.1.11) that for y ∈ K


 
d
(yˆc (t) − y(t) + y(t)) + A(ŷc (t) − y(t)) + Ay(t) − f (t), y(t) − ŷc (t) ≥ 0.
ds
(10.1.16)
Hence  t  
d
e−2ρs (ŷc (s) − y(s)), ŷc (s) − y(s) ds
0 ds
 t  
d
+ e−2ρs y(s) + A(ŷc (s) − y(s)), ŷc (s) − y(s)
0 ds
 t
≤ e−2ρs Ay(s) − f (s), y(s) − ŷc (s)
ds.
0
For z ∈ W (0, T ) we have
 t    t
d 1
e−2ρs z(s), z(s) ds = (e−2ρt |z(t)|2H − |z(0)|2H ) + ρ e−2ρs |z(s)|2H ds.
0 ds 2 0
(10.1.17)
This, together with (10.1.16), implies for y ∈ K
1 −2ρt
(e |ŷc (t) − y(t)|2H − |y0 − y(0)|2H )
2
 t   
−2ρs d
+ e y(s) + A(ŷc (s) − y(s)), ŷc (s) − y(s) + ρ |ŷc (s) − y(s)|H ds
2
0 ds
 t
≤ e−2ρs Ay(s) − f (s), y(s) − ŷc (s)
ds
0 t
→ e−2ρs Ay − f (s), y(s) − y ∗ (s)
ds
0

i i

i i
ItoKunisc
i i
2008/6/12
page 289
i i

10.1. Strong solutions 289

as c → ∞. Since norms are w.l.s.c., we obtain


1 −2ρt ∗
(e |y (t) − y(t)|2H − |y0 − y(0)|2H )
2
 t   
−2ρs d ∗ ∗ ∗
+ e y(s) + A(y (s) − y(s)), y (s) − y(s) + ρ |y (s) − y(s)|H ds
2
0 ds
 t
≤ e−2ρs Ay(s) − f (s), y(s) − y ∗ (s)
ds,
0

or equivalently, using (10.1.17)


 t  
d ∗
e−2ρs y (s) + Ay ∗ (s) − f (s), y(s) − y ∗ (s) ds ≥ 0 (10.1.18)
0 ds

for all y ∈ K and t ∈ [0, T ]. If y(t) − ψ ∈ H 1 (0, T ; H ) ∩ B(0, T ; V ) also satisfies


(10.1.18), then it follows from (10.1.18) that
 t  
d
e−2ρs (y(s) − y ∗ (s)) + A(y(s) − y ∗ (s)), y(s) − y ∗ (s) ds ≤ 0.
0 ds
Using (10.1.17) this implies
 t
1 −2ρt
e |y(t)−y ∗ (t)|2H + e−2ρs ( A(y(s)−y ∗ (s)), y(s)−y ∗ (s)
+ρ |y ∗ (s)−y(s)|2H ) ds ≤ 0
2 0

and thus y(t) = y ∗ (t). Hence the solution to (10.1.18) is unique. Integrating (10.1.16) on
(τ, t) with 0 ≤ τ < t ≤ T we obtain with the arguments that lead to (10.1.18)
 t  
d ∗
e−2ρs y (s) + Ay ∗ (s) − f (s), y(s) − y ∗ (s) ds ≥ 0, (10.1.19)
τ ds
and thus y ∗ satisfies (10.0.1).
To argue that ŷc − ψ → y ∗ − ψ strongly in L2 (0, T ; V ) ∩ C(0, T ; H ), note that
λ̂c = c min(0, ŷc − ψ) converges weakly in L2 (0, T ; V ∗ ) to λ∗ . From (10.1.11) and
(10.0.5) we have
1 d
|ŷc − y ∗ |2 + A(ŷc − y ∗ ), ŷc − y ∗
= λ∗ − λ̂c , ŷc − y ∗

2 dt
≤ λ∗ − λ̂c , ŷc − ψ + ψ − y ∗
≤ λ∗ , ŷc − ψ
+ λ̂c , y ∗ − ψ
=: ηc ,

where |ηc |L1 (0,T ;R) → 0 for c → ∞. By assumption (2)

1 d
|ŷc − y ∗ |2H + ω|ŷc − y ∗ |2V − ρ|ŷc − y ∗ |2H ≤ ηc ,
2 dt
and hence
d −2ρt
[e |ŷc − y ∗ |2H ] + ω e−2ρt |ŷc − y ∗ |2V ≤ 2e−2ρt ηc ,
dt

i i

i i
ItoKunisc
i i
2008/6/12
page 290
i i

290 Chapter 10. Parabolic Variational Inequalities

which implies that


 t  t
e−2ρt |ŷc (t) − y ∗ (t)|2H + ω e−2ρs |ŷc (s) − y ∗ (s)|2V ds ≤ e−2ρs ηc (s) ds,
0 0

and the desired convergence of yc to y ∗ in L2 (0, T ; V ) ∩ C(0, T ; H ) follows.


It remains to argue right continuity of t → y ∗ (t) ∈ V . From (10.1.19) it follows that
 t   
−2ρs d ∗ ∗ y ∗ (s − h) − y ∗ (s)
e y (s) + Ay (s) − f (s) , ds ≤ 0, (10.1.20)
τ ds −h

where h > 0. Using a(b − a) = b −a


2 2

2
− 12 (a − b)2 we find
 t
1
lim inf e−2ρs as (y ∗ (s) − ψ̄, y ∗ (s − h) − y ∗ (s))
h→0 −h τ
 t
≥ 2ρ e−2ρs as (y ∗ (s) − ψ̄, y ∗ (s) − ψ̄) ds
τ
1 1
+ e−2ρt as (y ∗ (t) − ψ̄, y ∗ (t) − ψ̄) − e−2ρτ as (y ∗ (τ ) − ψ̄, y ∗ (τ ) − ψ̄).
2 2
∗ ∗
This estimate, together with assumption (6) and the fact that { y (·−h)−y−h
}h>0 is weakly
bounded in L2 (0, T ; H ), allows us to pass to the limit in (10.1.20) to obtain
e−2ρt as (y ∗ (t) − ψ̄, y ∗ (t) − ψ̄) − e−2ρτ as (y ∗ (τ ) − ψ̄, y ∗ (τ ) − ψ̄)
t 2
+ τ e−2ρs dtd y ∗ (s) H ds
  t
t d d ∗
≤M |y ∗ (s) − ψ̄|V y ∗ (s) ds + e−2ρs |Aψ̄ − f (s)|H y (s) ds.
dt
τ dt τ

Consequently, we have
e−2ρt as (y ∗ (t) − ψ̄, y ∗ (t) − ψ̄) ≤ e−2ρτ as (y ∗ (τ ) − ψ̄, y ∗ (τ ) − ψ̄)
 t
+ e−2ρs (M 2 |y ∗ (s) − ψ̄|2V + |Aψ̄ − f (s)|2H ) ds
τ

for 0 ≤ τ ≤ t ≤ T . This implies that

 
as (y ∗ (τ )−ψ̄, y ∗ (τ )−ψ̄)+|y ∗ (τ )−ψ̄|2H ≥ lim sup as (y ∗ (t)−ψ̄, y ∗ (t)−ψ̄)+|y ∗ (t)−ψ̄|2H .
t→τ

Since as (φ, φ)+|φ|2H defines an equivalent norm on the Hilbert space V , |y ∗ (t)−y ∗ (τ )|V →

0 as t ↓ τ . Hence y is right continuous.
Now, in addition assumptions (3)–(4) are supposed to hold. Let
λ̂c (t) = c min(0, ŷc (t) − ψ).
Then for c ≤ ĉ and φ = (ŷc − ŷĉ )+
 
(λ̂c − λ̂ĉ , φ) = (c − ĉ) min(0, ŷc − ψ) + ĉ(min(0, ŷc − ψ) − min(0, ŷĉ − ψ)), φ ≥ 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 291
i i

10.2. Regularity 291

Hence, using the arguments leading to (10.1.4), we have ŷc ≤ ŷĉ for c ≤ ĉ. Then ŷc (t) →
y ∗ (t) strongly in H and pointwise a.e. in . Moreover,

λ̂c (t) = c min(0, ŷc − ψ) → λ∗ weakly in L2 (0, T ; V ∗ ).

Summarizing Theorems 10.4 and 10.7 we have the following corollary.

Corollary 10.8. If assumptions (1)–(6) hold, y0 − ψ ∈ C ∩ V , and f ∈ L2 (0, T ; H ), then


for every t ∈ [0, T ]
ŷc (t) ≤ y ∗ (t) ≤ yc (t)
and

ŷc (t) ↑ y ∗ (t) and yc (t) ↓ y ∗ (t)

pointwise a.e., monotonically in  as c → ∞. Moreover, y ∗ − ψ ∈ H 1 (0, T ; H ) ∩


L2 (0, T ; dom(A)) ∩ L2 (0, T ; V ).

10.2 Regularity
In this section we discuss additional regularity of the solution y ∗ to (10.0.1) under the
assumptions of Theorem 10.4 (3). For h > 0 we have, suppressing the superscripts “∗ ”,
   
d (y(t + h) − y(t) y(t + h) − y(t) λ(t + h) − λ(t) f (t + h) − f (t)
+A + = .
dt h h h h

From (10.0.5)

(λ(t + h) − λ(t), y(t + h) − y(t)) = −(λ(t + h), y(t) − ψ) − (λ(t), y(t + h) − ψ) ≥ 0

and thus
   
d (y(t + h) − y(t) (y(t + h) − y(t)
,
dt h h
   
y(t + h) − y(t) y(t + h) − y(t)
+ A ,
h h
 
f (t + h) − f (t) y(t + h) − y(t)
≤ , .
h h

Multiplying this by t > 0 we obtain


 
d t y(t + h) − y(t) 2 tω y(t + h) − y(t) 2 1 (y(t + h) − y(t) 2
+ ≤ 2
dt 2 h
H 2 h V h


y(t + h) − y(t) 2 (f (t + h) − f (t) 2
+ρt + t .
h 2ω h ∗
H V

i i

i i
ItoKunisc
i i
2008/6/12
page 292
i i

292 Chapter 10. Parabolic Variational Inequalities

Integrating in time,
 t
y(t + h) − y(t) 2 y(s + h) − y(s) 2
ds
t
h +ω s
h
H 0 V

 t  
y(s + h) − y(s) 2 f (s + h) − f (s) 2
≤e 2ρt + s
h ω h ∗ ds,
0 H V

and letting h → 0+ , we obtain


 t  t  
d 2 d 2 d 2 s d 2

t y + ω
s y ds ≤ e 2ρt y + f ds. (10.2.1)
dt H dt V dt H ω dt
0 0 H

Hence we obtain the following theorem.

Theorem 10.9. Suppose that assumptions (1)–(5) hold and that y0 − ψ ∈ C ∩ V , f ∈


L2 (0, T ; H ) ∩ H 1 (0, T , V ∗ ). Then the strong solution satisfies (10.2.1) and thus y(t) ∈
dom (A) for all t > 0.

The conclusion of Theorem 10.9 remains correct under the assumptions of the first
part of Theorem 10.7, i.e., under assumptions (1)–(2) and (5)–(6), y0 − ψ ∈ C ∩ V , and
f ∈ L2 (0, T ; H ) ∩ H 1 (0, T , V ∗ ).

10.3 Continuity of q → y(q) ∈ L∞ ()


In this section we analyze the continuous dependence of the strong solution to (10.0.1) with
respect to parameters in the operator A. Let U denote the normed linear space of parameters
and let Ũ ⊂ U be a bounded subset such that for each q ∈ Ũ the operator A(q) satisfies
the assumptions (1)–(5) specified at the beginning of this chapter with ρ = 0. We assume
further that dom(A(q)) = D is independent of q ∈ Ũ and that
q ∈ Ũ → A(q) ∈ L(X, V ∗ )
is Lipschitz continuous with Lipschitz constant κ. Let y(q) be the solution to (10.0.1)
corresponding to A = A(q), q ∈ Ũ . Then for q1 , q2 ∈ Ũ , we have

d
(y(q1 ) − y(q2 )) + A(q1 )(y(q1 ) − y(q2 ))
dt

+ (A(q1 ) − A(q2 ))y(q2 ) , y(q1 ) − y(q2 ) ≤ 0,

and therefore
 T
|y(q1 )(T ) − y(q2 )(T )|2H + ω |y(q1 ) − y(q2 )|2V dt
0

1 T
≤ |(A(q2 ) − A(q1 ))y(q2 )|2V ∗ dt = e(q1 , q2 )2 ,
ω 0

i i

i i
ItoKunisc
i i
2008/6/12
page 293
i i

10.3. Continuity of q → y(q) ∈ L∞ () 293

where
κ
e(q1 , q2 )2 ≤ |q1 − q2 |2U |y(q2 )|2L2 (0,T ;V ) .
ω

Since from Theorem 10.9, y(q)(T ) ∈ D for q ∈ Ũ , it follows by interpolation that

|y(q1 )(T ) − y(q2 )(T )|W α ≤ C̄e(q1 , q2 )1−α (10.3.1)

where W α = [H, D]α is the interpolation space between D and H (see, e.g., [Fat, Chapter
8]), and C̄ is an embedding constant. If L∞ () ⊂ W α , with α ∈ (α0 , 1) for some α0 , then
Hölder’s continuity of q ∈ Ũ → y(q)(T ) ∈ L∞ () follows.
Next we prove Lipschitz continuity of q ∈ Ũ → y(q) ∈ L∞ (0, T ; L∞ ()). Some
prerequisites are established first. We assume that A(q) generates an analytic semigroup
S(t) = S(t; q) on H for every q ∈ Ũ [Paz]. Then for each q ∈ Ũ there exists M such that

M
Aα S(t) ≤ for all t > 0, (10.3.2)

where Aα denote the fractional powers of A, with α ∈ (0, 1). We assume that M is
independent of q ∈ Ũ . We shall further assume that
1
dom(A 2 ) = V (10.3.3)

for all q ∈ Ũ , which is the case for a large class of second order elliptic differential operators;
see, e.g., [Fat, Chapter 8]. We assume that

A(q)L(V ,V )∗ ≤ ω̄ for all q ∈ Ũ . (10.3.4)

Let r > 2 be such that

V ⊂ Lr (),

and let M̄ denote the embedding constant so that

|ζ |Lr () ≤ M̄|ζ |V for all ζ ∈ V . (10.3.5)

For
2r
p= ∈ (2, ∞),
r −2
we shall utilize the assumption

|A− 2 (q1 )(A(q1 ) − A(q2 ))y|Lp () ≤ κ̄|q1 − q2 |U |A(q2 )α y|H


1

(10.3.6)
for some α ∈ (0, 1) and all q1 , q2 ∈ Ũ , y ∈ D.

This assumption is applicable, for example, if the parameter enters as a constant into
the leading differential term of A(q) or if it enters into the lower order terms.

i i

i i
ItoKunisc
i i
2008/6/12
page 294
i i

294 Chapter 10. Parabolic Variational Inequalities

Theorem 10.10. Let A(q) generate an analytic semigroup for every q ∈ Ũ , and let
assumptions (1)–(5) and (7) hold. If further (10.3.3)–(10.3.6) are satisfied and f ∈
L∞ (0, T ; H ), y0 ∈ C ∩ D, then q → y(q) is Lipschitz continuous from Ũ → L∞ (0, T ;
L∞ ()).

Proof. (1) Let q ∈ Ũ and A = A(q). Let f 1 , f 2 ∈ L∞ (0, T ; H ) with

A− 2 f i ∈ L∞ (0, T ; Lp ()), i = 1, 2,
1

and let y 1 , y 2 denote the corresponding strong solutions to (10.0.1). For k > 0 let

φk = max(0, y 1 − y 2 − k)

and
k = {x ∈  : φk > 0}.

By assumption φk ∈ V and φk ≥ 0. Note that

(min(0, λ̄ + c (z1 − ψ)) − min(0, λ̄ + c (z2 − ψ)), (z1 − z2 − k)+ ) ≥ 0

for all z1 , z2 ∈ H and k > 0. Thus, it follows from (10.0.5) and Theorem 10.4 that for
φk = (y 1 − y 2 − k)+ ∈ V
 
d 1
(y − y ), φk + A(y 1 − y 2 ), φk
≤ (f 1 − f 2 , φk )
2
(10.3.7)
dt

for a.e. t ∈ (0, T ). By assumption we have

Aζ, (ζ − k)+ ) ≥ ω|(ζ − k)+ |2V

for ζ ∈ V and hence it follows from (10.3.7) that


 T  T  T
|A− 2 f |L2 (k ) |φk |V dt,
1
ω |φk |2V dt ≤ |(f, φk )| dt ≤ A(q)L(V .V ∗ )
0 0 0

where φk (0) = 0, f = f 1 − f 2 , and thus for any β > 1


 T  12  T  12
− 12
ω |φk |2V dt ≤ ω̄ |A f |2L2 (k ) dt
0 0
 T  β2
− 12
≤ ω̄ C 1−β
|A f |2L2 (k ) dt (10.3.8)
0

with
 T  12
− 12
C= |A f |2L2 dt .
0

i i

i i
ItoKunisc
i i
2008/6/12
page 295
i i

10.3. Continuity of q → y(q) ∈ L∞ () 295

For p > q = 2 we have


  q/p
|A− 2 f |q dx ≤ |A− 2 f |p
1 1
|k |(p−q)/p . (10.3.9)
k k

We denote by h and k arbitrary real numbers satisfying 0 < k < h < ∞ and we find
for r > 2
 
|φk |rLr ≥ |φ − k|r dx > |φ − k|r ds ≥ |h ||h − k|r . (10.3.10)
k h

It thus follows from (10.3.5) and (10.3.8)–(10.3.10) that for β > 1


 T  12  T  12  T  β2
2 M̄ ω̄M̄C 1−β − 21
|h | dt
r ≤ |φk |2V ≤ |A f |2L2 (k ) dt
0 |h − k| 0 ω|h − k| 0

 T  β2
ω̄M̄C 1−β − 12
p−2
≤ |A f |2Lp |k | p dt .
ω|h − k| 0

For 1
P
+ 1
Q
= 1 this implies that

 T  12  T  2Pβ  T  2Qβ
2 ω̄M̄C 1−β − 12
Q(p−2)
|h | dt
r ≤ |A f |2P
Lp dt |k | p dt .
0 ω|h − k| 0 0

For P = ∞ and Q = 1 this implies, using that p = 2r


r−2
,

 T  12  T  β2
2 K 2
|h | dt
r ≤ |k | dtr , (10.3.11)
0 |h − k| 0

|A− 2 f |L∞ (0,T ;Lp ()) .


1−β 1 β
where K = ω̄M̄C
ω
Now, we use the following fact [Tr]: Let ϕ : (k1 , h1 ) → R be a nonnegative,
nonincreasing function and suppose that there are positive constants K, s, and β > 1
such that
ϕ(h) ≤ K(h − k)−s ϕ(k)β for k1 < k < h < h1 .
1 β β−1
Then, if k̂ = K s 2 β−1 ϕ(k1 ) r satisfies k1 + k̂ < h1 , it follows that ϕ(k1 + k̂) = 0. Here
we set
 T  12
r
ϕ(k) = |k | dt 2

on (0, ∞), s = 1, β > 1, and

k1 = sup |A− 2 (f 1 (t) − f 2 (t))|Lp () .


1

t∈(0,T )

i i

i i
ItoKunisc
i i
2008/6/12
page 296
i i

296 Chapter 10. Parabolic Variational Inequalities

It then follows from (10.3.8)–(10.3.10), as in the computation below (10.3.10), that

M̄ ω̄C
ϕ(k1 ) ≤ .
ωk1

From the definition of k̂ we have in case C ≥ k1


 β  β
β ω̄M̄ β 1−β β ω̄M̄
k̂ ≤ 2 β−1 C 1−β k1 C β−1 k1 =2 β−1 k1 .
ω ω

The same estimate can also be obtained in the case that C ≤ k1 , and consequently k1 + k̂ ≤
β
 k1 , where  = 1 + 2 β−1 ( ω̄ωM̄ )β . Hence we obtain y 1 − y 2 ≤ k1 a.e. in (0, T ) × .
Analogously a uniform lower bound for y 1 − y 2 is obtained by using φk = min(0, y 1 −
y 2 − k) ≤ 0 and thus

|y 1 − y 2 |L∞ (0,T ;L∞ ()) ≤  sup |A− 2 (f 1 (t) − f 2 (t))|Lp () .


1
(10.3.12)
t∈(0,T )

(2) We use the estimate of step (1) to obtain Lipschitz continuous dependence of the
solution on the parameter q. Let q1 , q2 ∈ Ũ with corresponding solutions y(q1 ) and y(q2 ).
Since
d
y(q2 ) + A(q1 )y(q2 ) + (A(q2 ) − A(q1 ))y(q2 ) + λ(q2 ) = f (t),
dt

y(q2 ) is the solution to (10.0.1) with A = A(q1 ) and f˜(t) = f − (A(q2 ) − A(q1 ))y(q2 ) ∈
L2 (0, T ; H ). Hence we can apply the estimate of (1) with A = A(q1 ) and f 1 − f 2 =
(A(q2 ) − A(q1 ))y(q2 ) and obtain

|y 1 − y 2 |L∞ (0,T ;L∞ ()) ≤  sup |A(q1 )− 2 (A(q1 ) − A(q2 ))y(t; q2 )|Lp () .
1

t∈(0,T )

Utilizing (10.3.6) this implies that

|y 1 − y 2 |L∞ (0,T ;L∞ ()) ≤  κ̄ |q1 − q2 |U sup |Aα (q2 )y(t; q2 )|H . (10.3.13)
t∈(0,T )

To estimate Aα (q2 )y(t; q2 ) recall from Theorem 10.4 that λ̄ ≤ λ(t; q) ≤ 0 and thus {f −
λ(q) : q ∈ Ũ } is uniformly bounded in L∞ (0, T ; H ). From (10.0.6) we have that
 t
A(q2 )α y(t; q2 ) = A(q2 )α S(t; q2 )y0 + A(q2 )α S(t − s; q2 )(f (s) − λ(s; q2 )) ds
0
∈ L∞ (0, T ; H ).

From (10.3.2) and since y0 ∈ D it follows that {A(q2 )α y(q2 ) : q2 ∈ Ũ } is bounded in



L (0, T ; H ) as desired.

i i

i i
ItoKunisc
i i
2008/6/12
page 297
i i

10.4. Difference schemes and weak solutions 297

10.4 Difference schemes and weak solutions


In this section we establish existence of weak solutions to (10.0.1) based on finite difference
schemes (10.4.1). This difference approximation is also used to establish uniqueness of
the weak solution and for proving monotonicity properties of the solution in the following
section. For the sake of the simplicity of presentation we assume that ρ = 0. Consequently
Aφ, φ
= as (φ, φ) defines an equivalent norm on V . For h > 0 consider the discretized
(in time) variational inequality: Find y k − ψ ∈ C, k = 1, . . . , N, satisfying
 
y k − y k−1
+ Ay k − f k , y − y k ≥ 0, y 0 = y0 (10.4.1)
h

for all y − ψ ∈ C, where


 kh
1
f =
k
f (t) dt
h (k−1)h

and N h = T . Throughout this section we assume that y0 ∈ H and f ∈ L2 (0, T ; V ∗ ).

Theorem 10.11. Assume that y0 ∈ H and f ∈ L2 (0, T , V ∗ ) and that assumptions (1)–(2)
hold. Then there exists a unique solution {y k }N
k=1 to (10.4.1).

Proof. To establish existence of solutions to (10.4.1), we proceed by induction with respect


to k and assume that existence has been proven up to k − 1. To verify the induction step
consider the regularized problems

yck − y k−1
+ Ayck + c min(0, yck − ψ) − f k = 0. (10.4.2)
h
Since
y ∈ H → min c (0, y − ψ)
is Lipschitz continuous and monotone, the operator B : V → V ∗ defined by
y
B(y) = + Ay + c min(0, y − ψ)
h
is coercive, monotone, and continuous for all h > 0. Hence by the theory of maximal
monotone operators (10.4.2) admits a unique solution; cf., e.g., [ItKa, Chapter I.5], [Ba,
Chapter II.1]. For each c > 0 and k = 1, . . . , N we find

1
(|y k − ψ|2H − |y k−1 − ψ|2H + |yck − y k−1 |2H )
2h c

+ A(yck − ψ) + Aψ − f k , yck − ψ
+ c |(yck − ψ)− |2H = 0.

Thus the families |yck − ψ|2V and c |(yck − ψ)− |2H are bounded in c > 0 and there exists
a subsequence of {yck − ψ} that converges to some y k − ψ weakly in V as c → ∞. As

i i

i i
ItoKunisc
i i
2008/6/12
page 298
i i

298 Chapter 10. Parabolic Variational Inequalities

argued in the proof of Theorem 10.7 (cf. (10.1.15)), y k − ψ ∈ C and hence y k − ψ ∈ C.


Note that

(−(yck −ψ)− , y −yck ) = (−(yck −ψ)− , y −ψ −(yck −ψ)) ≤ 0 for all y −ψ ∈ C (10.4.3)

and

lim inf Ayck , yck − y

c→∞
 
= lim inf A(yck − ψ), yck − ψ
+ A(yck − ψ), ψ − y
+ Aψ, yck − ψ

c→∞
(10.4.4)
≥ A(y − ψ), y − ψ
+ A(y − ψ), ψ − y
+ Aψ, y − ψ

k k k k

= Ay k , y k − y
.

Passing to the limit in (10.4.2) utilizing (10.4.3) and (10.4.4) we obtain


 
y k − y k−1
+ Ay k − f k , y k − y
h
 k  
yc − y k−1 k
≤ lim inf , yc − y + Ayck , yck − y
− f k , yck − y
≤ 0,
c→∞ h

and hence y k satisfies (10.4.1).


To verify uniqueness, let ỹ k be another solution to (10.4.1). Then, from (10.4.1),
 
(y k − ỹ k−1 ) − (y k−1 − ỹ k−1 )
+ A(y − ỹ ), y − ỹ ≤ 0
k k k k
h

and thus
1 k 1 k−1
|y − ỹ k |2H + A(y k − ỹ k ), y k − ỹ k
≤ |y − ỹ k−1 |2H .
2h 2h
Since y 0 = ỹ 0 = y0 , this implies that y k = ỹ k for all k ≥ 1.

Next we discuss existence and uniqueness of weak solutions to (10.0.1) by passing to


the limit in the piecewise defined functions

t − k h k+1
yh(1) = y k + (y − y k ), yh(2) = y k+1 on (k h, (k + 1) h] (10.4.5)
h

for k = 0, . . . , N − 1.

Theorem 10.12. Suppose that the assumptions of Theorem 10.11 hold. Then there exists
a unique weak solution y ∗ of (10.0.1). Moreover t → y ∗ (t) ∈ H is right continuous,
y ∗ ∈ B(0, T ; H ), and yh(2) − ψ → y ∗ − ψ strongly in L2 (0; T ; V ).

i i

i i
ItoKunisc
i i
2008/6/12
page 299
i i

10.4. Difference schemes and weak solutions 299

Proof. Setting y = ψ in (10.4.1), we obtain


h
|y k − ψ|2H − |y k−1 − ψ|2H + |y k − y k−1 |2H + hω |y k − ψ|2V ≤ |Aψ − f k |2V ∗ .
ω
Thus,

m
1 
m
|y m − ψ|2H + (|y k − y k−1 |2H + ωh |y k − ψ|2V ) ≤ |y  − ψ|2H + |Aψ − f k |2V ∗ h
k=+1
ω k=+1
(10.4.6)
for all 0 ≤  < m ≤ N . Since

h  k
T N
|yh(1) − yh(2) |2H h ≤ |y − y k−1 |2H h → 0 as h → 0+ .
0 3 k=1

From the above estimates it follows that there exist subsequences of yh(1) , yh(2) (denoted by
the same symbols) and y ∗ (t) ∈ L2 (0, T ; V ) such that

yh(1) (t), yh(2) (t) → y ∗ (t) weakly in L2 (0, T ; V ) as h → 0+ . (10.4.7)

Note that
d (1) y k+1 − y k
yh = on (k h, (k + 1) h].
dt h
Thus, we have from (10.4.1) for every y ∈ K
   
d d (1) d
y + Ayh(2) − fh , y − yh(2) + yh − y, y − yh(2) ≥ 0 (10.4.8)
dt dt dt
a.e. in (0, T ). Here
     
d (1) d (1) d d (1) d
(yh − y), y − yh(2) = yh − y, y − yh(1) + yh − y, yh(1) − yh(2)
dt dt dt dt dt
(10.4.9)
with
 T 
d (1) d 1
yh − y, y − yh(1) dt ≤ |y(0) − y0 |2H (10.4.10)
0 dt dt 2
and
  
1  k
T N
d (1) (1)
y , yh − yh(2) dt = − |y − y k−1 |2H . (10.4.11)
0 dt h 2 k=1

Since
 T  T
Ay ∗ , y ∗ − y
dt ≤ lim inf
+
Ayh(2) , yh(2) − y
dt,
0 h→0 0

i i

i i
ItoKunisc
i i
2008/6/12
page 300
i i

300 Chapter 10. Parabolic Variational Inequalities

which can be argued as in (10.4.4), it follows from (10.4.7)–(10.4.11) that every weak cluster
point y ∗ of yh(2) satisfies
 T  
d 1
y + Ay ∗ − f, y − y ∗ dt + |y(0) − y0 |2H ≥ 0 (10.4.12)
0 dt 2

for all y ∈ K and a.e. t ∈ (0, T ). Hence y ∗ ∈ L2 (0, T ; V ) is a weak solution of (10.0.1)
and y ∗ ∈ B(0, T ; H ).
Moreover, from (10.4.6)

∗ ∗ 1 t
|y (t) − ψ|H ≤ |y (τ ) − ψ|H +
2 2
|Aψ − f (s)|2V ∗ ds
ω τ

for all 0 ≤ τ ≤ t ≤ T . Thus,

lim sup |y ∗ (t) − ψ|2H ≤ |y ∗ (τ ) − ψ|2H .


t↓τ

which implies that t → y ∗ (t) ∈ H is right continuous.


Let y ∗ be a weak solution. Setting y = yh(1) ∈ K in (10.4.12) and y = y ∗ (t) in
(10.4.8) we have
 T 
d (1) ∗ (1) ∗
y + Ay − f, yh − y dt ≥ 0, (10.4.13)
0 dt h
 T 
d (1) (2) ∗ (2)
y + Ayh − fh , y − yh dt ≥ 0, (10.4.14)
0 dt h
where we used that   
T
d (1)
(yh − y ∗ ), y ∗ − yh(2) dt ≤ 0
0 dt
from (10.4.9)–(10.4.11).
Summing up (10.4.13) and (10.4.14) and using (10.4.11) implies that
 T  
Ay ∗ , yh(1) − yh(2)
− f, yh(1) − y ∗
− fh , y ∗ − yh(2)
dt
0

 T
1  k
N
≥ |y − y |H +
k−1 2
A(y ∗ − yh(2) ), y ∗ − yh(2)
dt.
2 k=1 0

Letting h → 0+ we obtain 0 ≥ A(y ∗ (t) − ŷ(t)), y ∗ (t) − ŷ(t)


a.e. on (0, T ) for every
weak cluster point ŷ of yh(2) in L2 (0, T ; V ). This implies that the weak solution is unique
and that
 T
A(y ∗ − yh(2) ), y ∗ − yh(2)
dt → 0
0

as h → 0+ .

i i

i i
ItoKunisc
i i
2008/6/12
page 301
i i

10.4. Difference schemes and weak solutions 301

Corollary 10.13. Let y = y(y0 , y) denote the weak solution to (10.0.1), given y0 ∈ H and
f ∈ L2 (0, T ; V ∗ ). Then for all t ∈ [0, T ]
 T
˜
|y(y0 , f )(t) − y(ỹ0 , f )(t)|H + ω |y(y0 , f ) − y(ỹ0 , f˜)|2V ds
0
 t
1
≤ |y0 − ỹ0 |2H + |f − f˜|2V ∗ ds.
ω 0

Proof. Let y and ỹ be the solution to (10.4.1) corresponding to (y0 , f ) and (ỹ0 , f˜). It
k k

then follows from (10.4.1) that


 k 
(y − ỹ k ) − (y k−1 − ỹ k−1 ) ˜
+ A(y − ỹ ) − (f − f ), y − ỹ ≤ 0.
k k k k k k
h
Thus,
h k
|y k − ỹ k |2H + ωh |y k − ỹ k |2V ≤ |y k−1 − ỹ k−1 |2H + |f − f˜k |2V ∗ .
ω
Summing this in k, we have

m
1
m
|y −
m
ỹ m |2H +ω h |y −
k
ỹ k |2V ≤ |y0 − ỹ0 |2H + h |f k − f˜k |2V ∗ ,
k=1
ω k=1

which implies the desired estimate by letting h → 0+ .

Corollary 10.14. Let λ̄ ∈ H satisfy λ̄ ≤ 0 and let yc ∈ W (0, T ) be the solution to


d
yc (t) + Ayc (t) + min(0, λ̄ + c (yc − ψ)) = f. (10.4.15)
dt
Then yc → y ∗ weakly in L2 (0, T ; V ) and yc (T ) → y ∗ (T ) weakly in H as c → ∞, where
y ∗ is the unique weak solution to (10.0.1). In addition, if y ∗ ∈ W (0, T ), then
|yc − y ∗ |L2 (0,T ;V ) + |yc − y ∗ |L∞ (0,T ;H ) → 0
as c → ∞.

Proof. Note that


c 1
(min(0, λ̄ + c (yc − ψ)), yc − ψ) ≥ |(yc − ψ)− |2H − |λ̄|2H .
2 2c
Thus, we have
1 d 1
|yc − ψ|2H + A(y − ψ), yc − ψ
+ c |(yc − ψ)− |2H ≤ f − Aψ, yc − ψ
+ |λ̄|2
2 dt c
and
 t  t 
1 1
|yc (t) − ψ|2H + ω (|yc − ψ|2V + c |(yc − ψ)2H ) ds ≤ |f − Aψ|2V ∗ + |λ̄|2 ds.
0 ω 0 c

i i

i i
ItoKunisc
i i
2008/6/12
page 302
i i

302 Chapter 10. Parabolic Variational Inequalities

T
Hence 0 |yc − ψ|2H dt → 0 as c → 0 and {yc − ψ}c≥1 is bounded is L2 (0, T ; V ). Using
the same arguments as in the proof of Theorem 10.7, there exist y ∗ and a subsequence of
{yc − ψ}c≥1 that converges weakly to y ∗ − ψ ∈ L2 (0, T ; V ), and y ∗ − ψ ≥ 0 a.e. in
(0, T ) × . For y(t) ∈ K
 T 1 
d d
y(t) − (y(t) − yc ), y(t) − yc (t) + Ayc (t) − f (t), y(t) − yc (t)

0 dt dt
2
+(min(0, λ̄ + c (yc − ψ)), y(t) − ψ − (yc − ψ)) dt = 0,

where
 T 
d 1
− (y(t) − yc ), y(t) − yc (t) dt = (|y(0) − y0 |2H − |y(T ) − yc (T )|2 ), (10.4.16)
0 dt 2
1 2
(min(0, λ̄ + c (yc − ψ)), y(t) − ψ − (yc − ψ)) ≤ |λ̄| . (10.4.17)
2c H
Hence, we have
 T  
d 1
[ y(t), y(t) − yc (t) dt + Ayc (t) − f (t), y(t) − yc (t)
+ |y(0) − y0 |2H
0 dt 2
 T
1 1
≥ |y(T ) − yc (T )|2H − |λ̄|2H ds.
2 2c 0

Letting c → ∞, y ∗ satisfies (10.0.7) and thus y ∗ is the weak solution of (10.0.1).


Suppose that y ∗ ∈ W (0, T ). Then by (10.4.15),
 
d ∗ ∗ d ∗ ∗ ∗
(yc − y ) + A(yc − y ) + y + Ay − f, y − yc
dt dt

+(min(0, λ̄ + c (yc − ψ)), y ∗ (t) − ψ − (yc − ψ)) = 0.

From (10.4.16)–(10.4.17), and since yc → y ∗ weakly in L2 (0, T ; V ),


 t
1 ∗
|yc (t) − y (t)|H +
2
A(yc − y ∗ ), yc − y ∗
ds → 0,
2 0

and this convergence is uniform with respect to t ∈ [0, T ].

10.5 Monotone property


In this section we establish monotone properties of the weak solution to (10.0.1). As in the
previous section assumption (2) is used with ρ = 0.

Corollary 10.15. If assumptions (1)–(3) hold, then

y(y0 , f ) ≥ y(ỹ0 , f˜)

i i

i i
ItoKunisc
i i
2008/6/12
page 303
i i

10.5. Monotone property 303

provided that y0 ≥ ỹ0 and f ≥ f˜, with y0 , ỹ0 ∈ H and f, f˜ ∈ L2 (0, T ; V ∗ ). As a


consequence, if y0 = ψ and f (t) ≥ f (s) for a.e. t > s, then the weak solution satisfies
y(t) ≥ y(s) for t ≥ s.

Proof. Assume that y k−1 ≥ ỹ k−1 . For φ = −(yck − ỹck )− it follows from (10.4.2) that
 
yck − ỹck − (y k−1 − ỹ k−1 ) 
, φ + A(yck − ỹck ) − (f k − f˜k ), φ
− c (yck − ψ)−
h

−(ỹck − ψ)− , φ = 0.

Since
 k−1 
y − ỹ k−1  
− − (f k − f˜k ), φ − c (yck − ψ)− − (ỹck − ψ)− , φ ≥ 0,
h

we have from assumption (3) that

|φ|2H
+ Aφ, φ
≤ 0
h

and thus yck − ỹck ≥ 0 for sufficiently small h > 0. From the proof of Theorem 10.11 it
follows that we can take the limit with respect to c and obtain y k − ỹ k ≥ 0. By induction
this holds for all k ≥ 0. The first claim of the theorem now follows from (10.4.2) and
Theorem 10.12. The second one follows from

y(t; ψ, f (·)) = y(s; y(t − s; ψ, f ), f (· + t − s)) ≥ y(s; ψ, f (·)).

Corollary 10.16. Let assumptions (1)–(3) hold and suppose that the stationary variational
inequality

Ay − f, φ − y
≥ 0 for all φ − ψ ∈ C (10.5.1)

with f ∈ V ∗ has a solution y − ψ ∈ C. Then if y(0) = ψ and f (t) = f , we have y(t) ↑ ŷ,
where ŷ is the minimum solution to (10.5.1).

Proof. Suppose ȳ is a solution to (10.5.1). Since ȳ(t) := ȳ, t ≥ 0, is also the unique
solution to (10.0.1) with y0 = ȳ ≥ ψ, it follows from Corollary 10.16 that y(t) ≤ ȳ for all
t ∈ [0, T ]. On the other hand, it follows from Theorem 10.4 (2) that
  τ +1 
y(τ + 1) − y(τ ) + (Ay(s) − f ) ds, φ − y(τ ) ≥ 0 for all φ − ψ ∈ C (10.5.2)
τ

since y(s) ≥ y(τ ), s ≥ τ . By the Lebesgue monotone convergence theorem ŷ =


limτ →∞ y(τ ) ∈ C exists. Letting τ → ∞ in (10.5.2), we obtain that ŷ satisfies (10.5.1) and
thus ŷ ≤ ȳ.

i i

i i
ItoKunisc
i i
2008/6/12
page 304
i i

304 Chapter 10. Parabolic Variational Inequalities

Corollary 10.17 (Perturbation). Let assumptions (1)–(3) hold, and let ψ 1 , ψ 2 ∈ H and
f ∈ L2 (0, T ; V ∗ ). Denote by yc1 and yc2 the solutions to (10.1.1) with y0 equal to ψ 1 and
ψ 2 , respectively, and let y 1 and y 2 be the corresponding weak solutions to (10.0.1). Assume
that (φ − γ )+ ∈ V for any γ ∈ R+ , that φ ∈ V , and that A1, (φ − γ )+
≥ 0. Then for
α = max(0, supx∈ (ψ 1 − ψ 2 )) and β = min(0, inf x∈ (ψ 1 − ψ 2 )) we have

β ≤ yc1 − yc2 ≤ α,

β ≤ y 1 − y 2 ≤ α.

Proof. Note that

c(yc1 − yc2 − α) = λ̄ + c(yc1 − ψ 1 ) − (λ̄ + c(yc2 − ψ 2 )) − c(ψ 1 − ψ 2 − α).

Thus, an elementary calculation shows that

(min(0, λ̄ + c(yc1 − ψ 1 )) − min(0, λ̄ + c(yc2 − ψ 2 )), (yc1 − yc2 − α)+ ) ≥ 0.

As in the proof of Theorem 10.4 it follows that φ = (yc − yc2 − α)+ satisfies

1 d
|φ|2 ≤ ρ|φ|2H .
2 dt H
Since φ(0) = 0, this implies that φ(t) = 0, t ≥ 0, and thus yc1 − yc2 ≤ α a.e. on (0, T ) × .
Similarly, letting φ = (yc1 − yc2 − β)− we obtain yc1 − yc2 ≥ β a.e. on (0, T ) × . Since
from Corollary 10.14, yc1 → y 1 and yc2 → y 2 weakly in L2 (0, T ; H ) for c → ∞, we obtain
the desired estimates.

i i

i i
ItoKunisc
i i
2008/6/12
page 305
i i

Chapter 11

Shape Optimization

11.1 Problem statement and generalities


We apply the framework that we developed in Section 5.5 for weakly singular problems to
calculate the shape derivative of constrained optimization problems of the form
-
min J (y, , )
(11.1.1)
subject to e(y, ) = 0
over admissible domains  and manifolds . Here e denotes a partial differential equation
depending on the state variable y and the domain , and J stands for a cost functional
depending, besides y and , on , which constitutes the variable part of the boundary of .
We consider as an example the problem of determining an unknown interior domain
D ⊂ U in an elliptic equation
-
y = 0, in  = U \ D,
y = 0 on ∂D
from the Cauchy data
∂y
y = f, = g on the outer boundary ∂U.
∂n
This problem can be formulated as the shape optimization problem
⎧ 
⎪ min ∂U |y − f |2 ds


subject to



y = 0 in  and y = 0 on ∂D, ∂n ∂
y = g on ∂U,
and it is a special case by the theory to be presented in Section 11.2.
Shape sensitivity calculus has been widely analyzed in the past and is covered in the
well-known lecture notes [MuSi] and in the monographs [DeZo, HaMä, HaNe, SoZo, Zo],

305

i i

i i
ItoKunisc
i i
2008/6/12
page 306
i i

306 Chapter 11. Shape Optimization

for example. The most commonly used approach relies on differentiating the reduced
functional Jˆ() = J (y(), , ()) using the chain rule. As a consequence shape differ-
entiability of y with respect to variations of the domain are essential in this method. In an
alternative approach the partial differential equation is realized in a Lagrangian formulation;
see [DeZo], for example.
The method for computing the shape derivative that we describe here is quite different
and elementary. In short, it can be described as follows. First we embed e(y, ) = 0 to an
equation in a fixed domain 0 by a coordinate transformation which is called the method
of mapping. Then we combine the Lagrange multiplier method to realize the constraint
e(y, ) = 0, using the shape derivative of functionals to calculate the shape derivative of
Jˆ(). In this process, differentiability of the state with respect to the geometric quantities
is not used. In fact, we require only Hölder continuity with exponent greater than or equal
to 12 of y with respect to the geometric data. We refer the reader to [IKPe2] for an example
in which the reduced cost functional is shape differentiable whereas the state variable of the
constraining partial differential equation is not.
For comparison we briefly discuss an example using the “chain rule” approach. But
first we require some notions from shape calculus, which we introduce only formally here.
Let  be a reference domain in Rd , and let h : U → Rd , with  ¯ ⊂ U , denote a mapping
describing perturbations of  by means of

t = Ft (),

where Ft :  → Rd is a perturbation of the identity, given by Ft = id + th, for example.


Let {zt : t → R|t ≤ τ }, for some τ > 0, denote a family of mappings, and consider the
associated family zt = zt ◦ Ft :  → R which transports zt back to . Then the shape
derivative of {zt } at  in direction h is defined as

1
z (x) = lim (zt (x) − z0 (x)) for x ∈ ,
t→0 t

and the material derivative is given by

1
ż(x) = lim (zt (x) − z0 (x)) for x ∈ .
t→0 t

Under appropriate regularity conditions we have

z = ż − ∇z · F0 .

For a functional  → J () the shape derivative at  with respect to the perturbation h is
defined as
1 
J ()h = lim J (t ) − J () .
t→0 t
Consider now the cost functional

1
min J (y, , ) = y2d (11.1.2)
2

i i

i i
ItoKunisc
i i
2008/6/12
page 307
i i

11.1. Problem statement and generalities 307

subject to the constraint e(y, ) = 0 which is given by the mixed boundary value problem

− y = f in , (11.1.3)
y=0 on 0 , (11.1.4)
∂y
=g on . (11.1.5)
∂n

Here the boundary of the domain  is the disjoint union of a fixed part 0 and the unknown
part , and f and g are given functions. A formal differentiation leads to the shape derivative
of the reduced cost functional
   
1 ∂y 2
Jˆ ()h = yy d + + κy 2 h · n d , (11.1.6)
2 ∂n

where y denotes the shape derivative of the solution y of (11.1.3) at  with respect to
a deformation field h, and κ stands for the curvature of . For a thorough discussion of
the details we refer the reader to [DeZo, SoZo]. Differentiating formally the constraint
e(y, ) = 0 with respect to the domain, one obtains that y satisfies

− y = 0 in ,
y = 0 on 0 , (11.1.7)
 
∂y ∂g
= div (h · n ∇ y) + f + + κg h · n on ,
∂n ∂n

where div , ∇ stand for the tangential divergence and tangential gradient, respectively, on
the boundary . Introducing a suitably defined adjoint variable and using (11.1.7), the first
term on the right-hand side of (11.1.6) can be manipulated in such a way that Jˆ ()h can be
represented in the form required by the Zolesio–Hadamard structure theorem (see [DeZo])

Jˆ ()h = Gh · n d .

We emphasize that the kernel G does not involve the shape derivative y anymore. Although
y is only an intermediate quantity, a rigorous analysis requires justifying the formal steps in
the preceding discussion. In addition one has to verify that the solution of (11.1.7) actually
is the shape derivative of y in the sense of the definition in, e.g., [SoZo]. Furthermore, since
the trace of y on 0 is used in (11.1.7) one needs y ∈ H 1 (). However, y ∈ H 2 () is not
sufficient to allow for an interpretation of the Neumann condition in (11.1.7) in H −1/2 ( ).
Hence y ∈ H 1 () requires more regularity of the solution y of (11.1.3) than H 2 (). In
the approach of this chapter we utilize only y ∈ H 2 () for the characterization of the shape
derivative of Jˆ(). We return to this example in Section 11.3.
In Section 11.2 we present the proposed general framework to compute the shape
derivative for (11.1.1). Section 11.3 contains applications to shape optimization constrained
by linear elliptic systems, inverse interface problems, the Bernoulli problem, and shape
optimization for the Navier–Stokes equations.

i i

i i
ItoKunisc
i i
2008/6/12
page 308
i i

308 Chapter 11. Shape Optimization

11.2 Shape derivative


Consider the shape optimization problem
  
min J (y, , ) ≡ j1 (y) dx + j2 (y) ds + j3 (y)ds (11.2.1)
 ∂\

subject to the constraint


e(y, ) = 0, (11.2.2)
which represents a partial differential equation posed on the domain  ⊂ Rd with boundary
∂. We focus on sensitivity analysis of the reduced cost functional in (11.2.1)–(11.2.2)
with respect to .
To describe the admissible class of geometries, let U ⊂ Rd be a fixed bounded domain
with C 1,1 -boundary ∂U , or a convex domain with Lipschitzian boundary, and let D be a
domain with C 1,1 -boundary := ∂D satisfying D̄ ⊂ U . For the reference domain  we
admit any of the three cases

(i)  = D,

(ii)  = U ,

(iii)  = U \ D̄.

Note that
˙ (∂ \ ) ⊂ U ∪
∂ = (∂ ∩ ) ∪ ˙ ∂U. (11.2.3)
Thus the boundary ∂ for the cases (i)–(iii) is given by

(i) ∂ = ∪ ∅ = ,

(ii) ∂ = ∅ ∪ ∂U = ∂U ,

(iii) ∂ = ∪ ∂U.

To introduce the admissible class of perturbations let h ∈ C 1,1 (Ū ) with h = 0 on


∂U = 0 and define, for t ∈ R, the mappings Ft : U → Rd by the perturbation of identity

Ft = id + t h. (11.2.4)

Then there exists τ > 0 such that Ft (U ) = U and Ft is a diffeomorphism for |t| < τ .
Defining the perturbed domains
t = Ft ()
and the perturbed manifolds as
t = Ft ( ),
¯ t ⊂ U for |t| < τ . Note that since h|∂U = 0 the
it follows that t is of class C 1,1 and 
boundary of U remains fixed as t varies, and hence by (11.2.3)

(∂)t \ t = ∂ \ for |t| < τ.

i i

i i
ItoKunisc
i i
2008/6/12
page 309
i i

11.2. Shape derivative 309

Alternatively to (11.2.4) the perturbations could be described as the flow determined by the
initial value problem
d
χ (t) = h(χ (t)), χ (0; x) = x,
dt
with Ft (x) = χ (t; x), i.e., by the velocity method.
Let Jˆ(t ) be the functional defined by Jˆ(t ) = J (yt , t , t ), where yt satisfies the
constraint

e(yt , t ) = 0. (11.2.5)

The shape derivative of Jˆ at  in the direction of the deformation field h is defined as

1 
Jˆ ()h = lim Jˆ(t ) − Jˆ() .
t→0 t

The functional Jˆ is called shape differentiable at  if Jˆ ()h exists for all h ∈ C 1,1 (U, Rd )
and defines a continuous linear functional on C 1,1 (Ū , Rd ). Using the method of mappings
one transforms the perturbed state constraint (11.2.5) to the fixed domain . For this purpose
define
y t = yt ◦ Ft .

Then y t :  → Rl satisfies an equation on the reference domain , which we express as

ẽ(y t , t) = 0, |t| < τ. (11.2.6)

We suppress the dependence of ẽ on h, because h will denote a fixed vector field throughout.
Because F0 = id one obtains y 0 = y and

ẽ(y 0 , 0) = e(y, ). (11.2.7)

We axiomatize the above description and impose the following assumptions on ẽ,
respectively, e.

(H1) There is a Hilbert space X and a C 1 -function ẽ : X × (−τ, τ ) → X ∗ such that


e(yt , t ) = 0 is equivalent to

ẽ(y t , t) = 0 in X ∗ ,

with ẽ(y, 0) = e(y, ) for all y ∈ X.

(H2) There exists 0 < τ0 ≤ τ such that for |t| < τ0 there exists a unique solution y t ∈ X
to ẽ(y t , t) = 0 and
|y t − y 0 |X
lim = 0.
t→0 t 1/2

i i

i i
ItoKunisc
i i
2008/6/12
page 310
i i

310 Chapter 11. Shape Optimization

(H3) ey (y, ) ∈ L(X, X∗ ) satisfies

e(v, ) − e(y, ) − ey (y, )(v − y), ψ


X∗ ×X = O(|v − y|2X )

for every ψ ∈ X, where y, v ∈ X.


(H4) ẽ and e satisfy
1
lim ẽ(y t , t) − ẽ(y, t) − e(y t , ) + e(y, ), ψ
X∗ ×X = 0
t→0 t

for every ψ ∈ X, where y t and y are the solutions of (11.2.6) and (11.2.2),
respectively.

In applications (H4) typically results in an assumption on the regularity of the coefficients


in the partial differential equation and on the vector field h. We assume throughout that
X %→ L2 (, Rl ) and, in the case that j2 , j3 are nontrivial, that the elements of X admit traces
in L2 ( , Rl ), respectively, L2 (∂ \ , Rl ). Typically X will be a subspace of H 1 (, Rl )
for some l ∈ N.
With regards to the cost functional J we require

(H5) ji ∈ C 1,1 (Rl , R), i = 1, 2, 3.

As a consequence of (H1)–(H2) we infer that (11.2.5) has a unique solution yt which is


given by

yt = y t ◦ Ft−1 .

Condition (H5) implies that j1 (y) ∈ L2 (), j1 (y) ∈ L2 ()l , j2 (y) ∈ L2 ( ), j2 (y) ∈
L2 ( )l , and j3 (y) ∈ L2 (∂U ), j3 (y) ∈ L2 (∂U )l for y ∈ X. Hence the cost functional
J (y, , ) is well defined for every y ∈ X.

Lemma 11.1. There is a constant c > 0, such that

|ji (v) − ji (y) − ji (y)(v − y)|L1 ≤ c|v − y|2X

holds for all y, v ∈ X, i = 1, 2, 3.

Proof. For j1 the claim follows from



|j1 (v) − j1 (y) − j1 (y)(v − y)| dx

  1  
≤ |j1 y(x) + s(v(x) − y(x)) − j1 (y(x))| ds |v(x) − y(x)| dx
 0
L
≤ |v − y|2L2 ≤ c|v − y|2X ,
2
where L > 0 is the Lipschitz constant for j1 . The same argument is valid also for j2
and j3 .

i i

i i
ItoKunisc
i i
2008/6/12
page 311
i i

11.2. Shape derivative 311

Subsequently we use the following notation:


It = det DFt , At = (DFt )−T , wt = It |At n|,
where DFt is the Jacobian of Ft and n denotes the outer normal unit vector to . We
require additional regularity properties of the transformation Ft which we specify next. Let
I = [−τ0 , τ0 ] with τ0 ≤ τ sufficiently small.
F0 = id, t → Ft ∈ C(I, C 2 (Ū , Rd )),
t → Ft ∈ C 1 (I, C 1 (Ū , Rd )), t → Ft−1 ∈ C(I, C 1 (Ū , Rd )),
t → It ∈ C 1 (I, C(Ū )), t → At ∈ C(I, C(Ū , Rd×d )),
t → wt ∈ C(I, C( )), (11.2.8)
d d −1
Ft |t=0 = h, F |t=0 = −h,
dt dt t
d d d
DFt |t=0 = Dh, DFt−1 |t=0 = (At )T |t=0 = −Dh,
dt dt dt
d d
It |t=0 = div h, wt |t=0 = div h.
dt dt
The surface divergence div is defined by
div h = div h| − (Dh n) · n.
The properties (11.2.8) are easily verified if Ft is specified by perturbation of the identity
as in (11.2.4). As a consequence of (11.2.8) there exists α > 0 such that
It (x) ≥ α, x ∈ Ū . (11.2.9)
We furthermore recall the following transformation theorem where we already utilize
(11.2.9).

Lemma 11.2. (1) Let ϕt ∈ L1 (t ); then ϕt ◦Ft ∈ L1 () and


 
ϕt dxt = ϕt ◦Ft det DFt dx.
t 

(2) Let ht ∈ L ( t ); then ht ◦Ft ∈ L1 ( ) and


1

 
ht d t = ht ◦Ft det DFt |(DFt )−T n| d .
t

We now formulate the representation of the shape derivative of Jˆ at  in direction h.

Theorem 11.3. Assume that (H1)–(H5) hold, that F satisfies (11.2.8), and that the adjoint
equation
ey (y, )ψ, p
X∗ ×X − (j1 (y), ψ) − (j2 (y), ψ) − (j3 (y), ψ)\ = 0, ψ ∈ X,
(11.2.10)

i i

i i
ItoKunisc
i i
2008/6/12
page 312
i i

312 Chapter 11. Shape Optimization

admits a unique solution p ∈ X, where y is the solution to (11.2.2). Then the shape
derivative of Jˆ at  in the direction h exists and is given by
 
ˆ d
J ()h = − ẽ(y, t), p
X∗ ×X |t=0 + j1 (y) div h dx + j2 (y) div h ds. (11.2.11)
dt 

Proof. Referring to (H2) let y t , y ∈ X satisfy


ẽ(y t , t) = e(y, ) = 0 (11.2.12)
for |t| < τ0 . Then yt = y t ◦ Ft is the solution of (11.2.5). Utilizing Lemma 11.2 one
therefore obtains
 
1 ˆ 1   1  
(J (t ) − Jˆ()) = It j1 (y t ) − j1 (y) dx + wt j2 (y t ) − j2 (y) ds
t t  t

1
+ (j3 (y t ) − j3 (y))ds
t ∂U

1 
= It (j1 (y t ) − j1 (y) − j1 (y)(y t − y)) + (It − 1)j1 (y)(y t − y)
t 

+ j1 (y)(y t − y) + (It − 1)j1 (y) dx

1 
+ wt (j2 (y t ) − j2 (y) − j2 (y)(y t − y)) + (wt − 1)j2 (y)(y t − 1)
t

+ j2 (y)(y t − y) + (wt − 1)j2 (y) ds
 
1 1
+ (j3 (y t ) − j3 (y) − j3 (y)(y t − y))ds + j (y)(y t − y)ds.
t ∂U t ∂U 3
(11.2.13)
Lemma 11.1 and (11.2.8) result in the estimates


It (j1 (y t ) − j1 (y) − j (y)(y t − y)) dx ≤ c|y t − y|2 ,
1 X
 

wt (j2 (y t ) − j2 (y) − j (y)(y t − y)) ds ≤ c|y t − y|2 , (11.2.14)
2 X




(j3 (y t ) − j3 (y) − j3 (y)(y t − y))ds ≤ c|y t − y|2X ,

∂U

where c > 0 does not depend on t. Employing the adjoint state p one obtains
(j1 (y), y t − y) + (j2 (y), y t − y) + (j3 (y), y t − y)∂U = ey (y, )(y t − y), p
X∗ ×X
= − e(y t , ) − e(y, ) − ey (y, )(y t − y), p
X∗ ×X
− ẽ(y t , t) − ẽ(y, t) − e(y t , ) + e(y, ), p
X∗ ×X
− ẽ(y, t) − ẽ(y, 0), p
X∗ ×X ,
(11.2.15)

i i

i i
ItoKunisc
i i
2008/6/12
page 313
i i

11.2. Shape derivative 313

where we used (11.2.12). We estimate the ten additive terms on the right-hand side of
(11.2.13). Terms one, five, and nine converge to zero by (11.2.14) and (H2). Terms two
and six converge to 0 by (11.2.8) and (H2). For terms four and eight one uses (11.2.8).
The claim (11.2.11) now follows by passing to the limit in terms three, seven, and ten using
(11.2.15), (H3), (H2), (H4), and (H1).

To check (H2) in specific applications the following result will be useful. It relies on


⎪ the linearized equation

(H6) Ey (y, )δy, ψ
X∗ ×X = f, ψ
X∗ ×X , ψ ∈ X,



admits a unique solution δy ∈ X for every f ∈ X ∗ .

Note that this condition is more stringent than the assumption of solvability of the
adjoint equation in Theorem 11.3, which requires solvability only for a specific right-hand
side.

Proposition 11.4. Assume that (11.2.2) admits a unique solution y and that (H6) is satisfied.
Then (H2) holds.

Proof. Let y ∈ X be the unique solution of (11.2.2). In view of

ẽy (y, 0) = ey (y, )

Assumption (H6) implies that ẽy (y, 0) is bijective. The claim follows from the implicit
function theorem.

Computing the derivative dtd ẽ(y, t), p


X∗ ×X |t=0 in (11.2.11), and subsequently argu-
ing that Jˆ is in fact a shape derivative at , can be facilitated by transforming the equation
ẽ(y, t), p
= 0 back to e(y ◦ Ft−1 , t ), p ◦ Ft−1
= 0 together with the following well-
known differentiation rules of the functionals.

Lemma 11.5 (see [DeZo]). (1) Let f ∈ C(I, W 1,1 (U )) and assume that ft (0) exists in
L1 (U ). Then
  
d
f (t, x) dx|t=0 = ft (0, x) dx + f (0, x) h · n ds.
dt t 

(2) Let f ∈ C(I, W 2,1 (U )) and assume that ft (0) exists in W 1,1 (U ). Then
    
d ∂
f (t, x) ds|t=0 = ft (0, x) ds + f (0, s) + κf (0, s) h · n ds,
dt t ∂n

where κ stands for the additive curvature of .

The first part of the lemma is valid also for domains  with Lipschitz continuous
boundary.

i i

i i
ItoKunisc
i i
2008/6/12
page 314
i i

314 Chapter 11. Shape Optimization

In the examples below f (t) will be typically given by expressions of the form

μv ◦ Ft−1 , μ∂i (v ◦ Ft−1 )∂j (w ◦ Ft−1 ), v ◦ Ft−1 w ◦ Ft−1 ∂i (z ◦ Ft−1 ),

where μ ∈ H 1 (U ) and v, z, and w ∈ H 2 (U ) are extensions of elements in H 2 (). The


assumptions of Lemma 11.5 can be verified using the following result.

Lemma 11.6 (see [SoZo]). (1) If y ∈ Lp (U ), then t → y ◦ Ft−1 ∈ C(I, Lp (U )), 1 ≤


p < ∞.
(2) If y ∈ H 2 (U ), then t → y ◦ Ft−1 ∈ C(I, H 2 (U )).
(3) If y ∈ H 2 (U ), then dtd (y ◦ Ft−1 )|t=0 exists in H 1 (U ) and is given by

d
(y ◦ Ft−1 )|t=0 = −(Dy) h.
dt

 
As a consequence we note that d

dt i
(y ◦ Ft−1 ) |t=0 exists in L2 (U ) and is given by

d  
∂i (y ◦ Ft−1 ) |t=0 = −∂i (Dy h), i = 1, . . . , d.
dt

In the next section ∇y stands for (Dy)T , where y is either a scalar- or vector-valued function.
To enhance readability we use two symbols for the inner product in Rd , (x, y), respectively,
x · y. The latter will be utilized only in the case of nested inner products.

11.3 Examples
Throughout this section it is assumed that (H5) is satisfied and that the regularity assumptions
of Section 11.2 for D, , and U hold. If J does not depend on , we write J (y, ) in place
of J (y, , ).

11.3.1 Elliptic Dirichlet boundary value problem


As a first example we consider the volume functional

J (y, ) = j1 (y) dx


subject to the constraint


(μ∇y, ∇ψ) − (f, ψ) = 0, (11.3.1)

where X = H01 (), f ∈ H 1 (U ), and μ ∈ C 1 (Ū , Rd×d ) such that μ(x) is symmetric and
uniformly positive definite. Here  = D and = ∂. Thus e(y, ) : X → X∗ is given by

e(y, ), ψ
X∗ ×X = (μ∇y, ∇ψ) − (f, ψ) .

i i

i i
ItoKunisc
i i
2008/6/12
page 315
i i

11.3. Examples 315

The equation on the perturbed domain is determined by


 
e(yt , t ), ψt
Xt∗ ×Xt = (μ∇yt , ∇ψt ) dxt − f ψ dxt
t t
 
= (μt At ∇y t , At ∇ψ t )It dx − f t ψ t dx ≡ ẽ(y t , t), ψ t
X∗ ,X
 
(11.3.2)

for any ψt ∈ Xt , with y t = yt ◦ Ft , μt = μ◦Ft , f t = f ◦Ft , and Xt = H01 (t ). Here


we used that ∇yt = (At ∇y t )◦Ft−1 and Lemma 11.5. (H1) is a consequence of (11.2.8),
(11.3.2), and the smoothness of μ and f . Since (11.3.1) admits a unique solution and (H6)
holds, Proposition 11.4 implies (H2). Since ẽ is linear in y, assumption (H3) follows. For
the verification of (H4) observe that

ẽ(y t , t) − ẽ(y, t) − e(y t , ) + e(y, ), ψ


X∗ ×X
= ((μt It At − μ)∇(y t − y), At ∇ψ) + (μ∇(y t − y), (At − I )∇ψ).

Hence (H4) follows from differentiability of μ, (11.2.8), and (H2). In view of Theorem 11.3
we have to compute dtd ẽ(y, t), p
X∗ ×X |t=0 for which we use the representation on t in
(11.3.2). Recall that the solution y of (11.3.1) as well as the adjoint state p, defined by

(μ∇p, ∇ψ) = (j1 (y), ψ) , ψ ∈ H01 (), (11.3.3)

belong to H 2 () ∩ H01 (). Since  ∈ C 1,1 (actually Lipschitz continuity of the boundary
would suffice), y as well as p can be extended to functions in H 2 (U ), which we again
denote by the same symbol. Therefore by Lemmas 11.5 and 11.6
d
ẽ(y, t), p
X∗ ×X |t=0
dt  

d
= (μ∇(y ◦ Ft−1 ), ∇(p ◦ Ft−1 )) dxt − fp ◦ Ft−1 dxt |t=0
dt t t
 

= (μ∇y, ∇p) (h, n) ds + (μ∇(−∇y · h), ∇p)

  

+ μ∇y, ∇(−∇p · h) + f (∇p, h) dx. (11.3.4)

Note that ∇y · h as well as ∇p · h do not belong to H01 () but they are elements of H 1 ().
Therefore Green’s theorem implies

 
(μ∇(−∇y · h), ∇p) + (μ∇y, ∇(−∇p · h)) + f (∇p, h) dx

 
= div(μ∇p) (∇y, h) dx − (μ∇p, n)(∇y, h) ds


 (11.3.5)
+ (div(μ∇y) + f ) (∇p, h) dx − (μ∇y, n) (∇p, h) ds
 
∂y ∂p
= − j1 (y) (∇y, h) dx − 2 (μn, n) (h, n) ds.
 ∂n ∂n

i i

i i
ItoKunisc
i i
2008/6/12
page 316
i i

316 Chapter 11. Shape Optimization

Above we used the strong form of (11.3.1) and (11.3.3) in L2 () as well as the identities
∂y ∂y
(μ∇y, n) = (μn, n) , (∇y, h) = (h, n)
∂n ∂n
(together with the ones with y and p interchanged) which follow from y, p ∈ H01 ().
Applying Theorem 11.3 results in

ˆ d
J ()h = − ẽ(y, t), p
X∗ ,X |t=0 + j1 (y) div h dx
dt
 

∂y ∂p
= (μn, n) (h, n) ds + (j1 (y)(∇y, h) + j1 (y) div h) dx
∂n ∂n
 
∂y ∂p
= (μn, n) (h, n) ds + div(j1 (y)h) dx,
∂n ∂n 

and the Stokes theorem yields the final result,


  
∂y ∂p
J ()h = (μn, n) + j1 (y) (h, n) ds.
∂n ∂n

Remark 11.3.1. If we were to be content with a representation of the shape variation in


terms of volume integrals we could take the expression for dtd ẽ(y, t)|t=0 given in (11.3.4)
and bypass the use of Green’s theorem in (11.3.5). The regularity requirement on the domain
then results from y ∈ H 2 (), p ∈ H 2 (). In [Ber] the shape derivative in terms of the
volume integral is referred to as the weak shape derivative, whereas the final form in terms
of the boundary integrals is called the strong shape derivative.

11.3.2 Inverse interface problem


We consider an inverse interface problem which is motivated by electrical impedance to-
mography. Let U =  = (−1, 1) × (−1, 1) and ∂U = ∂. Further let the domain
D = − , with  ¯ − ⊂ U , represent the inhomogeneity of the conducting medium and set
+
 = U \ ¯ . We assume that − is a simply connected domain of class C 2 with bound-

ary which represents the interface between − and + . The inverse problem consists of
identifying the unknown interface from measurements z which are taken on the boundary
∂U . This can be formulated as

min J (y, ) ≡ (y − z)2 ds (11.3.6)
∂U

subject to the constraints

− div(μ∇y) = 0 in − ∪ + ,
1 2
∂y
[y] = 0, μ − = 0 on , (11.3.7)
∂n
∂y
=g on ∂U,
∂n

i i

i i
ItoKunisc
i i
2008/6/12
page 317
i i

11.3. Examples 317

 
where g ∈ H 1/2 (∂U ), z ∈ L2 (∂U ), with ∂U g = ∂U z = 0, [v] = v + − v − on , and
n+/− standing for the unit outer normals to +/− . The conductivity μ is given by
-
μ− , x ∈ − ,
μ(x) =
μ+ , x ∈ + ,

for some positive constants μ− and μ+ . In the context of the general framework of Section
11.2 we have j1 =  j2 = 0 and j3 = (y − z) . Clearly
2
(11.3.7) admits a unique solution
y ∈ H (U ) with ∂U y = 0. Its restrictions to + and − will be denoted by y + and y − ,
1

respectively. It turns out that the regularity of y ± is better than the one of y.

Proposition 11.7. Let  and ± be as described above. Then the solution y ∈ H 1 (U ) of


(11.3.7) satisfies
y ± ∈ H 2 (± ).

Proof. Let H be the smooth boundary of a domain H with

− ⊂ H ⊂ H ⊂ U.

Then y| H ∈ H 3/2 ( H ). The problem


-
−div(μ+ ∇yH ) = 0 in U \H ,
∂yH
∂n
=g on ∂U, yH = y| H on H

has a unique solution yH ∈ H 2 (U \H ) with y H = y + |H . Therefore,

b := y|∂U = yH |∂U ∈ H 3/2 (∂U ).

Then the solution y to (11.3.7) coincides with the solution to



⎨ −div(μ∇y) = 0 in − ∪ + ,
[y] = 0, [μ ∂n− ] = 0
∂y
on ,

y=b on ∂U.

We now argue that y ± ∈ H 2 (± ). Let yb ∈ H 2 (U ) denote the solution to



− yb = 0 in U,
yb = b on ∂U.

Define w ∈ H01 (U ) as the unique solution to the interface problem



⎨ − div(μ∇w) = 0
⎪ in ,
[w] = 0, [μ ∂n
∂w
− ] = −[μ ∂n− ]
∂yb
on , (11.3.8)


w=0 on ∂U.

i i

i i
ItoKunisc
i i
2008/6/12
page 318
i i

318 Chapter 11. Shape Optimization

Then yb ∈ H 2 () implies [μ ∂n −] ∈ H


∂yb 1/2
( ). By [ChZo], (11.3.8) has a unique solution
w ∈ H0 (U ) with the additional regularity w ± ∈ H 2 (± ). Consequently y = w + yb
1

satisfies y|∂ = g and y ± ∈ H 2 (± ), as desired. In an analogous way p ± ∈ H 2 (± ).

To consider the inverse problem (11.3.6),


 (11.3.7) within the general framework of
Section 11.2 we set X = {v ∈ H 1 (U ) : ∂U v = 0} and define

e(y, ), ψ
X∗ ×X = (μ∇y, ∇ψ)U − (g, ψ)∂U ,

respectively,

ẽ(y, t), ψ
X∗ ×X = (μt At ∇y, At ∇ψIt )U − (g, ψ)∂U
= (μ+ ∇(y ◦Ft−1 ), ∇(ψ ◦Ft−1 ))+t + (μ− ∇(y ◦Ft−1 ), ∇(ψ ◦Ft−1 ))−t − (g, ψ)∂U .

Note that the boundary term is not affected by the transformation Ft since the deformation
field h vanishes on ∂U . The adjoint state is given by

− div(μ∇p) = 0 in − ∪ + ,
1 2
∂p
[p] = 0, μ − = 0 on , (11.3.9)
∂n
∂p
= 2(y − z) on ∂U,
∂n
respectively,

(μ∇p, ∇ψ)U = 2(y − z, ψ)∂U for ψ ∈ X. (11.3.10)

Assumption (H4) requires us to consider


1
| ẽ(y t − y, t) − e(y t − y, ), ψ
|
t 
1
≤ |(μ+ It At ∇(y t − y), At ∇ψ) − (μ+ ∇(y t − y), ∇ψ)| dx
t +

1
+ |(μ− It At ∇(y t − y), At ∇ψ) − (μ+ ∇(y t − y), ∇ψ)| dx
t −
  
1
≤μ +
t (It At − I )∇(y − y), At ∇ψ dx
t
+


1
+μ +
(∇(y − y), t (At − I )∇ψ) dx
t
+
 


1
+μ −
t (It At − I )∇(y − y), At ∇ψ dx
t



1
+μ −
(∇(y − y), t (At − I )∇ψ) dx.
t
 −

The right-hand side of this inequality converges to 0 as t → 0+ by (11.2.8). The remaining


assumptions can be verified as in Section 11.3.1 for the Dirichlet problem and thus Theorem

i i

i i
ItoKunisc
i i
2008/6/12
page 319
i i

11.3. Examples 319

11.3 is applicable. By Proposition 11.7 the restrictions y ± = y|± , p ± = p|± satisfy y ± ,


p± ∈ H 2 (± ). Using Lemma 11.5 we find that

d
ẽ(y, t), p
X∗ ×X |t=0
dt  
= (μ+ ∇y + , ∇p+ )(h, n+ ) ds − (μ+ ∇(∇y + · h), ∇p + ) dx
∂+ +
 
+ + +
− (μ ∇y , ∇(∇p · h)) dx + (μ− ∇y − , ∇p− )(h, n− ) ds
+ ∂−
 
− (μ− ∇(∇y − · h), ∇p − ) dx − (μ− ∇y − , ∇(∇p − · h)) dx
− −
  
+
= [μ∇y, ∇p](h, n ) ds − μ (∇(∇y + · h), ∇p + ) + (∇y + , ∇(∇p + · h)) dx
+
+


− μ− (∇(∇y − · h), ∇p − ) + (∇y − , ∇(∇p − · h)) dx.


−

Applying Green’s formula as in Example 11.3.1 (observe that (∇y, h), (∇p, h) ∈ / H 1 (U ))
together with (11.3.9) results in
 
− (μ+ ∇(∇y + · h), ∇p + )) dx − (μ− ∇(∇y − · h), ∇p − )) dx
+ −
 
+ + +
= div(μ ∇p )(∇y , h) dx + div(μ− ∇p − )(∇y − , h) dx
+ −
 
− (μ+ ∇p + , n+ )(∇y + , h) ds − (μ− ∇p − , n− )(∇y − , h) ds
∂+ ∂−
 1 2
∂p
=− μ + (∇y, h) ds.
∂n

In the last step we utilize h = 0 on ∂U . Similarly we obtain


 
− (μ+ ∇y + , ∇(∇p + · h))) dx − (μ− ∇y − , ∇(∇p − · h))) dx
+ −
 1 2
∂y
=− μ + (∇p, h) ds.
∂n

Collecting terms results in


  1 2 1 2
∂p ∂y
Jˆ ()h = − [μ(∇y, ∇p)] (h, n+ ) ds+ μ (∇y, h) + μ (∇p, h) ds.
∂n+ ∂n+

The identity
[ab] = [a]b+ + a − [b] = a + [b] + [a]b−
implies
[ab] = 0 if [a] = [b] = 0.

i i

i i
ItoKunisc
i i
2008/6/12
page 320
i i

320 Chapter 11. Shape Optimization

Hence the transition conditions


1 2 1 2
∂y ∂y
μ + = = 0,
∂n ∂τ
1 2 1 2 (11.3.11)
∂p ∂p
μ + = = 0,
∂n ∂τ

where ∂τ
stands for the tangential derivative imply
1 2 1 2
∂p ∂p ∂y + ∂p ∂y
μ + (∇y, h) = μ + (h, n ) + μ (h, τ )
∂n ∂n ∂n+ ∂n+ ∂τ
1 2 1 2
∂p ∂y + ∂p ∂y
= μ + (h, n ) + μ (h, τ )
∂n ∂n+ ∂n+ ∂τ
1 2
∂p ∂y
= μ + (h, n+ ),
∂n ∂n+
and analogously 1 2 1 2
∂y ∂p ∂y
μ + (∇p, h) = μ + (h, n+ ),
∂n ∂n ∂n+
which entails
  1 2
∂p ∂y
Jˆ ()h = − +
[μ(∇y, ∇p)] (h, n ) ds + 2 μ + (h, n+ ) ds
∂n ∂n+
 1 2  1 2
∂y ∂p + ∂y ∂p
=− μ (h, n ) ds + μ + (h, n+ ) ds
∂τ ∂τ ∂n ∂n+
  1 2
∂y ∂p + ∂y ∂p
= − [μ] (h, n ) ds + μ + (h, n+ ) ds.
∂τ ∂τ ∂n ∂n+
In view of (11.3.11) this can be rearranged as
1 2
∂y ∂p ∂y ∂p
−[μ] + μ +
∂τ ∂τ ∂n ∂n+
∂y ∂p ∂y ∂p ∂y + ∂p + − ∂y

∂p −
= −μ+ + μ− + μ+ − μ
∂τ
 ∂τ ∂τ
 ∂τ + ∂n+ ∂n+ ∂n+ ∂n+
− − + 
∂y ∂p 1 ∂y ∂p ∂y ∂p
= −μ+ + +
∂τ ∂τ 2 ∂n+ ∂n+ ∂n+ ∂n+
  − 
∂y ∂p 1 ∂y ∂p + ∂y + ∂p −
+ μ− + +
∂τ ∂τ 2 ∂n+ ∂n+ ∂n+ ∂n+
1  
= − [μ] (∇y + , ∇p− ) + (∇y − , ∇p+ ) ,
2
which gives the representation

1  
ˆ
J ()h = − [μ] (∇y + , ∇p− ) + (∇y − , ∇p+ ) (h, n+ ) ds
2

= − [μ](∇y + , ∇p− ) (h, n+ ) ds.

i i

i i
ItoKunisc
i i
2008/6/12
page 321
i i

11.3. Examples 321

11.3.3 Elliptic systems


Here we consider a domain  = U \ D, where D̄ ⊂ U and the boundaries ∂U and = ∂D
are assumed to be C 2 regular.
We consider the optimization problem
 
min J (y, , ) ≡ j1 (y) dx + j2 (y) ds,


where y is the solution of the elliptic system


 
 
e(y, ), ψ
X∗ ×X = a(x, ∇y, ∇ψ) − (f, ψ) dx − (g, ψ) ds = 0 (11.3.12)


in X = {v ∈ H 1 ()l : v|∂U = 0}. Above ∇y stands for (Dy)T . We require that f ∈ H 1 (U )l


and that g is the trace of a given function G ∈ H 2 (U )l . Furthermore we assume that
a : Ū × Rd×d × Rd×d satisfies

(1) a(·, ξ, η) is continuously differentiable for every ξ , η ∈ Rd×d ,

(2) a(x, ·, ·) defines a bilinear form on Rd×d × Rd×d which is uniformly bounded in
x ∈ Ū ,

(3) a(x, ·, ·) is uniformly coercive for all x ∈ Ū .

In the case of linear elasticity a is given by

a(x, ∇y, ∇ψ) = λ tr e(y) tr e(ψ) + 2μ e(y) : e(ψ),

where e(y) = 12 (∇y + (∇y)T ), and λ, μ are the positive Lamé coefficients. In this case a
is symmetric, and (11.3.12) admits a unique solution in X ∩ H 2 ()l for every f ∈ L2 ()l
1
and g ∈ H 2 (∂U )l ; see, e.g., [Ci].
The method of mapping suggests defining
 
 
ẽ(y, t), ψ
X∗ ×X = a(Ft (x), At ∇y, At ∇ψ) − (f t , ψ) It dx − (g t , ψ)wt ds
 

 
= a(x, ∇(y ◦Ft−1 ), ∇(ψ ◦Ft−1 )) − (f, ψ ◦Ft−1 ) dx − (g, ψ ◦Ft−1 ) ds.
t t
(11.3.13)

The adjoint state is determined by the equation


 
 
ey (y, )ψ, p
X∗ ×X = a(x, ∇ψ, ∇p) − j1 (y)ψ dx − j2 (y)ψ ds = 0, (11.3.14)


ψ ∈ X. Under the regularity assumptions on a, (11.3.12) admits a unique solution in


X ∩ H 2 ()l and the adjoint equation admits a solution for any right-hand side in X ∗ so

i i

i i
ItoKunisc
i i
2008/6/12
page 322
i i

322 Chapter 11. Shape Optimization

that Proposition 11.4 is applicable. All these properties are satisfied for the linear elasticity
case. Assumptions (H1)–(H4) can then be argued as in Section 11.3.1.
Employing Lemma 11.5 we obtain

d  
ẽ(y, t), p
X∗ ×X |t=0 = − a(x, ∇(∇y T h), ∇p) + a(x, ∇y, ∇(∇p T h)) dx
dt
 
 
+ a(x, ∇y, ∇p) (h, n) ds + (f, ∇p h) dx − (f, p) (h, n) ds
T

   


+ (g, ∇p T h) ds − (g, p) + κ(g, p) (h, n) ds.
∂n

Since ∇y T h ∈ X and ∇p T h ∈ X, this expression can be simplified using (11.3.12) and


(11.3.14):
 
d
ẽ(y, t), p
X ×X |t=0 = − j1 (y)∇y h dx − j2 (y)∇y T h ds

T
dt
 
  
  ∂
+ a(x, ∇y, ∇p) − (f, p) (h, n) ds − (g, p) + κ(g, p) (h, n) ds,
∂n

which implies
 
Jˆ ()h = j1 (y)∇y T h dx + j1 (y) div h dx
 
+ j2 (y)∇y T h ds + j2 (y) div h ds

 ∂ 
+ −a(x, ∇y, ∇p) + (f, p) + (g, p) + κ(g, p) (h, n) ds.
∂n

For the third and fourth terms the tangential Green’s formula (see, e.g., [DeZo]) yields
    

j2 (y)∇y T h ds + j2 (y) div h ds = j2 (y) + κj2 (y) (h, n) ds.
∂n

The first and second terms can be combined using the Stokes theorem. Summarizing we
finally obtain


Jˆ ()h = −a(x, ∇y, ∇p) + (f, p) + j1 (y)

(11.3.15)
∂   
+ j2 (y) + (g, p) + κ(j2 (y) + (g, p)) (h, n) ds.
∂n

This example also comprises the shape optimization problem of Bernoulli type:

min J (y, , ) ≡ min y 2 ds,

i i

i i
ItoKunisc
i i
2008/6/12
page 323
i i

11.3. Examples 323

where y is the solution of the mixed boundary value problem


− y = f in ,
y =0 on ∂U,
∂y
=g on ,
∂n
which was analyzed with a similar approach in [IKP]. Here the boundary ∂ of the domain
 ⊂ R2 is the disjoint union of a fixed part ∂U and an unknown part both with nonempty
relative interior. Let the state space X be given by
X = {ϕ ∈ H 1 () : ϕ = 0 on ∂U }.
Then the Eulerian derivative of J is given by (11.3.15), which reduces to
  
ˆ ∂ 2
J ()h = −(∇y, ∇p) + fp + (y + gp) + κ(y + gp) (h, n) ds.
2
∂n
This result coincides with the representation obtained in [IKP]. The present derivation,
however, is considerably simpler due to a better arrangement of terms in the proof of
Theorem 11.3. It is straightforward to adapt the framework to shape optimization problems
associated with the exterior Bernoulli problem.

11.3.4 Navier–Stokes system


Consider the stationary Navier–Stokes equations
−ν y + (y · ∇)y + ∇p = f in ,
div y = 0 in , (11.3.16)
y=0 on ∂

on a bounded domain  ⊂ Rd , d = 2, 3, with ν > 0 and f ∈ H 1 (U ). In the context of the


general framework we set  = D with C 2 -boundary = ∂. The variational formulation
of (11.3.16) is given by
Find (y, p) ∈ X ≡ H01 ()d × L2 ()/R such that
   
e (y, p),  , (ψ, χ ) X∗ ×X ≡ ν(∇y, ∇ψ) + ((y · ∇)y, ψ)
(11.3.17)
− (p, div ψ) − (f, ψ) + (div y, χ ) = 0
holds for all (ψ, χ ) ∈ X. Let the cost functional J be given by

J (y, ) = j1 (y) dx.


Considering (11.3.17) on a perturbed domain t mapping the equation back to the reference
domain  yields the form of ẽ(y, t). Concerning the transformation of the divergence we
note that for ψt ∈ H01 (t )d and ψ t = ψt ◦ Ft ∈ H01 ()d , one obtains
div ψt = (Dψit ATt ei ) ◦ Ft−1 = ((At )i ∇ψt,i ) ◦ Ft−1 ,

i i

i i
ItoKunisc
i i
2008/6/12
page 324
i i

324 Chapter 11. Shape Optimization

where ei stands for the ith canonical basis vector in Rd and (At )i denotes the ith row of
At = (DFt )−T . We follow the convention to sum over indices which occur at least twice
in a term. Thus one obtains
  t t    
ẽ (y , p ), t , (ψ, χ ) X∗ ×X = ν(It At ∇y t , At ∇ψ) + (y t · At ∇)y t , It ψ 
   
− p t , It (At )k ∇ψk  − (f t It , ψ) + It (At )k ∇ykt , χ  = 0

for all (ψ, χ ) ∈ X.


The adjoint state (λ, q) ∈ X is given by the solution to
   
e (y, p),  (ψ, χ ), (λ, q) X∗ ×X = (j1 (y), ψ) ,

which amounts to
ν(∇ψ, ∇λ) + ((ψ · ∇)y + (y · ∇)ψ, λ)
(11.3.18)
− (χ , div λ) + (div ψ, q) = (j1 (y), ψ)

for all (ψ, χ ) ∈ X. Integrating by parts one obtains


 
((y · ∇)ψ, λ) = − ψ · λ div y dx − ψ · ((y · ∇)λ) dx
 

+ (ψ · λ) (y · n) ds = −(ψ, (y · ∇)λ)

because y ∈ H01 ()d and div y = 0. Therefore

((ψ · ∇)y + (y · ∇)ψ, λ) = (ψ, (∇y)λ − (y · ∇)λ ) (11.3.19)

holds for all ψ ∈ H 1 ()d . As a consequence the adjoint equation can be interpreted as

−ν λ + (∇y)λ − (y · ∇)λ − ∇q = j1 (y),


(11.3.20)
div λ = 0,

where the first equation holds in L2 ()d and


 the second one in L ().
2

For the evaluation of dt ẽ (y, p), t , (λ, q)


X∗ ×X |t=0 , (y, p), (λ, q) ∈ X being the
d

solution of (11.3.17), respectively, (11.3.18), we transform this expression back to t , which


gives
   
ẽ (y, p), t , (λ, q) X∗ ×X = ν(∇(y ◦ Ft−1 ), ∇(λ ◦ Ft−1 ))t
 
+ (y ◦ Ft−1 · ∇)y ◦ Ft−1 , λ ◦ Ft−1 t − (div(λ ◦ Ft−1 ), p ◦ Ft−1 )t
− (f, λ ◦ Ft−1 )t + (div(y ◦ Ft−1 ), q ◦ Ft−1 )t .

To verify conditions (H1)–(H4) we introduce the continuous trilinear form c : H01 ()d ×
H01 ()d × H01 ()d by c(y, v, w) = ((y · ∇)v, w) and assume that

ν 2 > N |f |H −1 and ν > M, (11.3.21)

i i

i i
ItoKunisc
i i
2008/6/12
page 325
i i

11.3. Examples 325

where
c(y, v, w) c(v, v, y)
N = supy,v,w∈H01 and M = supv∈H01 ,
|y|H01 |v|H01 |w|H01 |v|2H 1
0

with y the solution to (11.3.16). Condition (H1) is satisfied by construction. If ν is suffi-


ciently large so that the first inequality in (11.3.21) is satisfied, existence of a unique solution
(y, p) ∈ H01 ()d × L2 ()/R to (11.3.16) is guaranteed; see, e.g., [Te]. The second con-
dition in (11.3.21) ensures the bijectivity of the linearized operator e (y, p), and thus (H6)
holds. In particular this implies that (H2) holds and that the adjoint equation admits a unique
solution. To verify (H3) we consider for arbitrary (v, q) ∈ X and (ψ, χ ) ∈ X
e((v, q), ) − e((y, p), ) − e ((y, p), )((v, q) − (y, p)), (ψ, χ )
X∗ ,X
= ((v − y) · ∇(v − y), ψ) ≤ K|ψ|H01 () |v − y|2H 1 () ,
0

where K is an embedding constant, independent of (v, q) ∈ X and (ψ, χ ) ∈ X. Verifying


(H4) requires us to consider the quotient of the following expression with t and taking the
limit as t → 0:
ν[(It At ∇(y t − y), At ∇ψ) − (∇(y t − y), ∇ψ)]
+ [((y t · At ∇)y t , It ψ) − ((y t · ∇)y t , ψ) − ((y · At ∇)y, It ψ) + ((y · ∇)y, ψ)]
− [(It (At )k ∇ψk , pt − p) + (div ψ, p t − p)]
+ [(It (At )k ∇ (ykt − yk ), χ ) − (div(y t − y), χ )]
for (ψ, χ ) ∈ X. The first two terms in square brackets can be treated by analogous estimates
as in Sections 11.3.1 and 11.3.2. Noting that the third and fourth square brackets can be
estimated quite similarly to each other, we give the estimate for the last one:
((It − 1)(At )k ∇(ykt − yk ), χ ) + (((At )k − ek )∇(ykt − yk ), χ )
(ek ∇(ykt − yk ) − div(y t − y), χ ),
which, upon division by t, tends to 0 for t → 0.
In the following calculation we utilize that (y, p), (λ, q) ∈ H 2 ()d × H 1 (), which
is satisfied if is C 2 . Applying Lemma 11.5 results in
d   
ẽ (y, p), t , (λ, q) X∗ ×X |t=0
dt 
= ν(∇(−∇y T h), ∇λ) + ν(∇y, ∇(−∇λT h)) + ν (∇y, ∇λ) (h, n) ds
    

+ (−∇y h) · ∇ y, λ + (y · ∇)(−∇y h), λ 


T T


   
+ (y · ∇)y, −∇λ h  +T
(y · ∇)y, λ (h, n) ds


− (−∇p h, div λ) − (p, div(−∇λT h)) − p div λ (h, n) ds
T

− (f, −∇λT h) − f λ (h, n) ds




+ (div(−∇y h), q) + (div y, −∇q T h) + q div y (h, n) ds.
T

i i

i i
ItoKunisc
i i
2008/6/12
page 326
i i

326 Chapter 11. Shape Optimization

Since div y = div λ = 0 and y, λ ∈ H01 ()d , this expression simplifies to

d   
ẽ (y, p), t , (λ, q) X∗ ×X |t=0
dt
= ν(∇y, ∇ψλ ) + ((y · ∇)y, ψλ ) − (p, div ψλ ) − (f, ψλ )
+ ν(∇ψy , ∇λ) + ((ψy · ∇)y + (y · ∇)ψy , λ)

+ (div ψy , q) + ν (∇y, ∇λ) (h, n) ds,

where we have used the abbreviation

ψy = −(∇y)T h, ψλ = −(∇λ)T h.

Note that ψy , ψλ ∈ H 1 ()d but not in H01 ()d . Green’s formula, together with (11.3.16),
(11.3.20), entails
d   
ẽ (y, p), t , (λ, q) X∗ ×X |t=0
dt  
 
∂y
= (−ν y + (y · ∇y)y + ∇p − f, ψλ ) + ν , ψλ ds + p (ψλ , n) ds
∂n
  
∂λ
+ (ψy , −ν λ + (∇y)λ − (y · ∇)λ − ∇q) + ν , ψy ds
∂n
 

+ q (ψy , n) ds + ν (∇y, ∇λ) (h, n) ds



  

   
∂y ∂λ
=− ν , (∇λ)T h ds − p ((∇λ)T h, n) ds − ν , (∇y)T h ds
∂n ∂n
 

− q ((∇y)T h, n) ds + ν (∇y, ∇λ) (h, n) ds − (j1 (y), (∇y)T h)



       
∂y ∂λ ∂λ ∂y
=− ν , +p ,n + q ,n (h, n) ds − (j1 (y), (∇u)T h) .
∂n ∂n ∂n ∂n

Arguing as in Section 11.3.3 one eventually obtains by Theorem 11.3


       
∂y ∂λ ∂λ ∂y
Jˆ ()h = ν , +p ,n + q ,n (h, n) ds
∂n ∂n ∂n ∂n


+ (j1 (y) div h + j1 (y)∇y T h) dx
        
∂y ∂λ ∂λ ∂y
= ν , +p ,n + q , n + j1 (y) (h, n) ds.
∂n ∂n ∂n ∂n

i i

i i
ItoKunisc
i i
2008/6/12
page 327
i i

Bibliography

[ACFK] J. Albert, C. Carstensen, S. A. Funken, and R. Close, MATLAB implementation


of the finite element method in elasticity, Computing 69(2002), 239–263.
[AcVo] R. Acar and C. R. Vogel, Analysis of bounded variation penalty methods for
ill-posed problems, Inverse Problems 10(1994), 1217–1229.
[Ad] R. Adams, Sobolev Spaces, Academic Press, Boston, 1975.
[AlMa] W. Alt and K. Malanowski, The Lagrange-Newton method for nonlinear
optimal control problems, Comput. Optim. Appl. 2(1993), 77–100.
[Alt1] W. Alt, Stabilität mengenwertiger Abbildungen mit Anwendungen auf nicht-
lineare Optimierungsprobleme, Bayreuther Mathematische Schriften 3, 1979.
[Alt2] W. Alt, Lipschitzian perturbations in infinite optimization problems, in Math-
ematical Programming with Data Perturbations II, A.V. Fiacco, ed., Lecture
Notes in Pure and Appl. Math. 85, Marcel Dekker, New York, 1983, 7–21.
[Alt3] W. Alt, Stability of Solutions and the Lagrange-Newton Method for Nonlinear
Optimization and Optimal Control Problems, Habilitation thesis, Bayreuth,
1990.
[Alt4] W. Alt, The Lagrange-Newton method for infinite-dimensional optimization
problems, Numer. Funct. Anal. Optimiz. 11(1990), 201–224.
[Alt5] W. Alt, Sequential quadratic programming in Banach spaces, in Advances
in Optimization, W. Oettli and D. Pallaschke, eds., Lecture Notes in Econom.
and Math. Systems 382, Springer-Verlag, Berlin, 1992, 281–301.
[Ba] V. Barbu, Analysis and Control of Nonlinear Infinite Dimensional Systems,
Academic Press, Boston, 1993.
[BaHe] A. Battermann and M. Heinkenschloss, Preconditioners for Karush-Kuhn-
Tucker matrices arising in the optimal control of distributed systems, in Control
and Estimation of Distributed Parameter Systems, Vorau (1996), Internat. Ser.
Numer. Math. 126, Birkhäuser, Basel, 1998, 15–32.
[BaK1] V. Barbu and K. Kunisch, Identification of nonlinear elliptic equations, Appl.
Math. Optim. 33(1996), 139–167.

327

i i

i i
ItoKunisc
i i
2008/6/12
page 328
i i

328 Bibliography

[BaK2] V. Barbu and K. Kunisch, Identification of nonlinear parabolic equations,


Control Theory Adv. Tech. 10(1995), 1959–1980.
[BaKRi] V. Barbu, K. Kunisch, and W. Ring, Control and estimation of the boundary
heat transfer function in Stefan problems, RAIRO Modél Math. Anal. Numér.
30(1996), 1–40.
[BaPe] V. Barbu and Th. Percupanu, Convexity and Optimization in Banach Spaces,
Reidel, Dordrecht, 1986.
[BaSa] A. Battermann and E. Sachs, An indefinite preconditioner for KKT systems
arising in optimal control problems, in Fast Solution of Discretized Optimiza-
tion Problems, Berlin, 2000, Internat. Ser. Numer. Math. 138, Birkhäuser,
Basel, 2001, 1–18.
[Be] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods,
Academic Press, Paris, 1982.
[BeK1] M. Bergounioux and K. Kunisch, Augmented Lagrangian techniques for el-
liptic state constrained optimal control problems, SIAM J. Control Optim.
35(1997), 1524–1543.
[BeK2] M. Bergounioux and K. Kunisch, Primal-dual active set strategy for state
constrained optimal control problems, Comput. Optim. Appl. 22(2002), 193–
224.
[BeK3] M. Bergounioux and K. Kunisch, On the structure of the Lagrange multi-
plier for state-constrained optimal control problems, Systems Control Lett.
48(2002), 16–176.
[BeMeVe] R. Becker, D. Meidner, and B. Vexler, Efficient numerical solution of parabolic
optimization problems by finite element methods, Optim. Methods Softw.
22(2007), 813–833.
[BePl] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical
Sciences, Computer Science and Scientific Computing Series, Academic Press,
New York, 1979.
[Ber] M. Berggren, A Unified Discrete-Continuous Sensitivity Analysis Method for
Shape Optimization, Lecture at the Radon Institut, Linz, Austria, 2005.
[Bew] T. Bewley, Flow control: New challenges for a new renaissance, Progress in
Aerospace Sciences 37(2001), 24–58.
[BGHW] L. T. Biegler, O. Ghattas, M. Heinkenschloss, and B. van Bloemen Waanders,
eds., Large-Scale PDE-Constrained Optimization, Lecture Notes in Comput.
Sci. Eng. 30, Springer-Verlag, Berlin, 2003.
[BGHKW] L. T. Biegler, O. Ghattas, M. Heinkenschloss, D. Keyes, and B. van Bloe-
men Waanders, eds., Real-Time PDE-Constrained Optimization, SIAM,
Philadelphia, 2007.

i i

i i
ItoKunisc
i i
2008/6/12
page 329
i i

Bibliography 329

[BHHK] M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch, A comparison


of a Moreau–Yosida-based active set strategy and interior point methods for
constrained optimal control problems, SIAM J. Optim. 11(2000), 495–521.

[BIK] M. Bergounioux, K. Ito, and K. Kunisch, Primal-dual strategy for constrained


optimal control problems, SIAM J. Control Optim. 37(1999), 1176–1194.

[BKW] A. Borzì, K. Kunisch, and D. Y. Kwak, Accuracy and convergence properties of


the finite difference multigrid solution of an optimal control optimality system,
SIAM J. Control Optim. 41(2003), 1477–1497.

[BoK] A. Borzì and K. Kunisch, The numerical solution of the steady state solid
fuel ignition model and its optimal control, SIAM J. Sci. Comput. 22(2000),
263–284.

[BoKVa] A. Borzì, K. Kunisch, and M. Vanmaele, A multigrid approach to the optimal


control of solid fuel ignition problems, in Multigrid Methods VI, Lect. Notes
Comput. Sci. Eng. 14, Springer-Verlag, Berlin, 59–65.

[Bre] H. Brezis, Opérateurs Maximaux Monotones et Semi-groupes de Constraction


das le Espaces de Hilbert, North–Holland, Amsterdam, 1973.

[Bre2] H. Brezis, Problèmes unilatéraux, J. Math. Pures Appl. 51(1972), 1–168.

[CaTr] E. Casas and F. Tröltzsch, Second-order necessary and sufficient optimality


conditions for optimization problems and applications to control theory, SIAM
J. Optim. 13(2002), 406–431.

[ChK] G. Chavent and K. Kunisch, Regularization of linear least squares problems by


total bounded variation, ESAIM Control Optim. Calc. Var. 2(1997), 359–376.

[ChLi] A. Chambolle and P.-L. Lions, Image recovery via total bounded variation
minimization and related problems, Numer. Math. 76(1997), 167–188.

[ChZo] Z. Chen and J. Zou, Finite element methods and their convergence for elliptic
and parabolic interface problems, Numer. Math. 79(1998), 175–202.

[Ci] P. G. Ciarlet, Mathematical Elasticity, Vol. 1, North–Holland, Amsterdam,


1987.

[CKP] E. Casas, K. Kunisch, and C. Pola, Regularization by functions of bounded


variation and applications to image enhancement, Appl. Math. Optim.
40(1999), 229–258.

[Cla] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley and Sons,
New York, 1983.

[CNQ] X. Chen, Z. Nashed, and L. Qi, Smoothing methods and semismooth methods
for nondifferentiable operator equations, SIAM J. Numer. Anal. 38(2000),
1200–1216.

i i

i i
ItoKunisc
i i
2008/6/12
page 330
i i

330 Bibliography

[CoKu] F. Colonius and K. Kunisch, Output least squares stability for parameter esti-
mation in two point value problems, J. Reine Angew. Math. 370(1986), 1–29.

[DaLi] R. Dautray and J. L. Lions, Mathematical Analysis and Numerical Methods


for Science and Technology, Vol. 3, Springer-Verlag, Berlin, 1990.

[Dei] K. Deimling, Nonlinear Functional Analysis, Springer-Verlag, Berlin, 1985.

[DeZo] M. C. Delfour and J.-P. Zolésio, Shapes and Geometries: Analysis, Differential
Calculus, and Optimization, SIAM, Philadelphia, 2001.

[Don] A. L. Dontchev, Local analysis of a Newton-type method based on partial


elimination, in The Mathematics of Numerical Analysis, Lectures in Appl.
Math. 32, AMS, Providence, RI, 1996, 295–306.

[DoSa] D. C. Dobson and F. Santosa, Recovery of blocky images from noisy and blurred
data, SIAM J. Appl. Math. 56(1996), 1181–1198.

[dReKu] C. de los Reyes and K. Kunisch, A comparison of algorithms for control


constrained optimal control of the Burgers equation, Calcolo 41(2004), 203–
225.

[EcJa] C. Eck and J. Jarusek, Existence results for the static contact problem with
Coulomb friction, Math. Models Methods Appl. Sci. 8(1998), 445–468.

[EkTe] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North–
Holland, Amsterdam, 1976.

[EkTu] I. Ekeland and T. Turnbull, Infinite Dimensional Optimization and Convexity,


The University of Chicago Press, Chicago, 1983.

[Fat] H. O. Fattorini, Infinite Dimensional Optimization and Control Theory, Cam-


bridge University Press, Cambridge, 1999.

[FGH] A. V. Fursikov, M. D. Gunzburger, and L. S. Hou, Boundary value prob-


lems and optimal boundary control for the Navier–Stokes systems: The two-
dimensional case, SIAM J. Control Optim. 36(1998), 852–894.

[FoGl] M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications


to Numerical Solutions of Boundary Value Problems, North–Holland, Ams-
terdam, 1983.

[Fr] A. Friedman, Variational Principles and Free Boundary Value Problems, John
Wiley and Sons, New York, 1982.

[Geo] V. Georgescu, On the unique continuation property for Schrödinger Hamilto-


nians, Helv. Phys. Acta 52(1979), 655–670.

[GeYa] D. Geman and C. Yang, Nonlinear image recovery with half-quadratic regu-
larization, IEEE Trans. Image Process. 4(1995), 932–945.

i i

i i
ItoKunisc
i i
2008/6/12
page 331
i i

Bibliography 331

[GHS] M. D. Gunzburger, L. Hou, and T. P. Svobodny, Finite element approxima-


tions of optimal control problems associated with the scalar Ginzburg-Landau
equation, Comput. Math. Appl. 21(1991), 123–131.

[GiRa] V. Girault and P.-A. Raviart, Finite Element Methods for Navier–Stokes Equa-
tions, Springer-Verlag, Berlin, 1986.

[Giu] E. Giusti, Minimal Surfaces and Functions of Bounded Variation, Birkhäuser,


Boston, 1984.

[Glo] R. Glowinski, Numerical Methods for Nonlinear Variational Problems,


Springer-Verlag, Berlin, 1984.

[GLT] R. Glowinski, J. L. Lions, and R. Tremoliers, Numerical Analysis of Variational


Inequalities, North–Holland, Amsterdam, 1981.

[Gri] P. Grisvard, Elliptic Problems in Nonsmooth Domains, Monographs Stud.


Math. 24, Pitman, Boston, MA, 1985.

[Grie] A. Griewank, The local convergence of Broyden-like methods on Lipschitzian


problems in Hilbert spaces, SIAM J. Numer. Anal. 24(1987), 684–705.

[GrVo] R. Griesse and S. Volkwein, A primal-dual active set strategy for optimal
boundary control of a nonlinear reaction-diffusion system, SIAM J. Control
Optim. 44(2005), 467–494.

[Gu] M. D. Gunzburger, Perspectives of Flow Control and Optimization, SIAM,


Philadelphia, 2003.

[Han] S.-P. Han, Superlinearly convergent variable metric algorithms for general
nonlinear programming problems, Math. Programming 11(1976), 263–282.

[HaMä] J. Haslinger and R. A. E. Mäkinen, Introduction to Shape Optimization,


SIAM, Philadelphia, 2003.

[HaNe] J. Haslinger and P. Neittaanmäki, Finite Element Approximation for Optimal


Shape, Material and Topology Design, 2nd ed, Wiley, Chichester, 1996.

[HaPaRa] S.-H. Han, J.-S. Pang, and N. Rangaray, Glabally convergent Newton methods
for nonsmooth equations, Math. Oper. Res. 17(1992), 586–607.

[Har] A. Haraux, How to differentiate the projection on a closed convex set in


Hilbert space. Some applications to variational inequalities, J. Math. Soc.
Japan 29(1977), 615–631.

[Has] J. Haslinger, Approximation of the Signorini problem with friction, obeying


the Coulomb law, Math. Methods Appl. Sci. 5(1983), 422–437.

[Hei] M. Heinkenschloss, Projected sequential quadratic programming methods,


SIAM J. Optim. 6(1996), 373–417.

i i

i i
ItoKunisc
i i
2008/6/12
page 332
i i

332 Bibliography

[Hes1] M. R. Hestenes, Optimization Theory. The Finite Dimensional Case, John


Wiley and Sons, New York, 1975.
[Hes2] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl.
4(1968), 303–320.
[HHNL] I. Hlavacek, J. Haslinger, J. Necas, and J. Lovisek, Solution of Variational
Inequalities in Mechanics, Appl. Math. Sci. 66, Springer-Verlag, New York,
1988.
[HiIK] M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy
as a semismooth Newton method, SIAM J. Optim. 13(2002), 865–888.
[HiK1] M. Hinze and K. Kunisch, Three control methods for time-dependent fluid
flow, Flow Turbul. Combust. 65(2000), 273–298.
[HiK2] M. Hinze and K. Kunisch, Second order methods for optimal control of time-
dependent fluid flow, SIAM J. Control Optim. 40(2001), 925–946.
[HinK1] M. Hintermüller and K. Kunisch, Total bounded variation regularization as a
bilaterally constrained optimization problem, SIAM J. Appl. Math. 64(2004),
1311–1333.
[HinK2] M. Hintermüller and K. Kunisch, Feasible and noninterior path-following
in constrained minimization with low multiplier regularity, SIAM J. Control
Optim. 45(2006), 1198–1221.
[HPR] S.-P. Han, J.-S. Pang, and N. Rangaraj, Globally convergent Newton methods
for nonsmooth equations, Math. Oper. Res. 17(1992), 586–607.
[HuStWo] S. Hüeber, G. Stadler, and B. I. Wohlmuth, A primal-dual active set algorithm
for three-dimensional contact problems with Coulomb friction, SIAM J. Sci.
Comput. 30(2008), 572–596.
[IK1] K. Ito and K. Kunisch, The augmented Lagrangian method for equality and
inequality constraints in Hilbert space, Math. Programming 46(1990), 341–
360.
[IK2] K. Ito and K. Kunisch, The augmented Lagrangian method for parameteriza-
tion in elliptic systems, SIAM J. Control Optim. 28(1990), 113–136.
[IK3] K. Ito and K. Kunisch, The augmented Lagrangian method for estimating the
diffusion coefficient in an elliptic equation, in Proceedings of the 26th IEEE
Conference on Decision and Control, Los Angeles, 1987, 1400–1404.
[IK4] K. Ito and K. Kunisch, An augmented Lagrangian technique for variational
inequalities, Appl. Math. Optim. 21(1990), 223–241.
[IK5] K. Ito and K. Kunisch, Sensitivity analysis of solutions to optimization prob-
lems in Hilbert spaces with applications to optimal control and estimation, J.
Differential Equations 99(1992), 1–40.

i i

i i
ItoKunisc
i i
2008/6/12
page 333
i i

Bibliography 333

[IK6] K. Ito and K. Kunisch, On the choice of the regularization parameter in non-
linear inverse problems, SIAM J. Optim. 2(1992), 376–404.
[IK7] K. Ito and K. Kunisch, Sensitivity measures for the estimation of parameters
in 1-D elliptic boundary value problems, J. Math. Systems Estim., Control
6(1996), 195–218.
[IK8] K. Ito and K. Kunisch, Maximizing robustness in nonlinear illposed inverse
problems, SIAM J. Control Optim. 33(1995), 643–666.
[IK9] K. Ito and K. Kunisch, Augmented Lagrangian-SQP-methods in Hilbert spaces
and application to control in the coefficients problems, SIAM J. Optim.
6(1996), 96–125.
[IK10] K. Ito and K. Kunisch, Augmented Lagrangian-SQP methods for nonlinear
optimal control problems of tracking type, SIAM J. Control Optim. 34(1996),
874–891.
[IK11] K. Ito and K. Kunisch, Augmented Lagrangian methods for nonsmooth convex
optimization in Hilbert spaces, Nonlinear Anal. 41(2000), 573–589.
[IK12] K. Ito and K. Kunisch, An active set strategy based on the augmented
Lagrangian formulation for image restoration, MZAN Math. Model Nu-
mer. Anal. 33(1999), 1–21.
[IK13] K. Ito and K. Kunisch, Augmented Lagrangian formulation of nonsmooth,
convex optimization in Hilbert spaces, in Control of Partial Differential Equa-
tions, E. Casas, ed., Lecture Notes in Pure andAppl. Math. 174, Marcel Dekker,
New York, 1995, 107–117.
[IK14] K. Ito and K. Kunisch, Estimation of the convection coefficient in elliptic
equations, Inverse Problems 13(1997), 995–1013.
[IK15] K. Ito and K. Kunisch, Newton’s method for a class of weakly singular optimal
control problems, SIAM J. Optim. 10(1999), 896–916.
[IK16] K. Ito and K. Kunisch, Optimal control of elliptic variational inequalities,
Appl. Math. Optim. 41(2000), 343–364.
[IK17] K. Ito and K. Kunisch, Optimal control of the solid fuel ignition model with
H 1 -cost, SIAM J. Control Optim. 40(2002), 1455–1472.
[IK18] K. Ito and K. Kunisch, BV-type regularization methods for convoluted objects
with edge, flat and grey scales, Inverse Problems 16(2000), 909–928.
[IK19] K. Ito and K. Kunisch, Optimal control, in Encyclopedia of Electrical and
Electronic Engineering, J. G. Webster, ed., 15, John Wiley and Sons, New
York, 1999, 364–379.
[IK20] K. Ito and K. Kunisch, Semi-smooth Newton methods for variational inequal-
ities of the first kind, MZAN Math. Model. Numer. Anal. 37(2003), 41–62.

i i

i i
ItoKunisc
i i
2008/6/12
page 334
i i

334 Bibliography

[IK21] K. Ito and K. Kunisch, The primal-dual active set method for nonlinear optimal
control problems with bilateral constraints, SIAM J. Control Optim. 43(2004),
357–376.

[IK22] K. Ito and K. Kunisch, Semi-smooth Newton methods for state-constrained


optimal control problems, Systems Control Lett. 50(2003), 221–228.

[IK23] K. Ito and K. Kunisch, Parabolic variational inequalities: The Lagrange mul-
tiplier approach, J. Math. Pures Appl. (9) 85(2006), 415–449.

[IKK] K. Ito, M. Kroller, and K. Kunisch, A numerical study of an augmented La-


grangian method for the estimation of parameters in elliptic systems, SIAM
J. Sci. Statist. Comput. 12(1991), 884–910.

[IKP] K. Ito, K. Kunisch, and G. Peichl, Variational approach to shape derivatives


for a class of Bernoulli problems, J. Math. Anal. Appl. 314(2006), 126–149.

[IKPe] K. Ito, K. Kunisch, and G. Peichl, On the regularization and the numerical
treatment of the inf-sup condition for saddle point problems, Comput. Appl.
Math. 21(2002), 245–274.

[IKPe2] K. Ito, K. Kunisch, and G. Peichl, A variational approach to shape derivatives,


to appear in ESAIM Control Optim. Calc. Var.

[IKSG] K. Ito, K. Kunisch, V. Schulz, and I. Gherman, Approximate Nullspace Itera-


tions for KKT Systems in Model Based Optimization, preprint, University of
Trier, 2008.

[ItKa] K. Ito and F. Kappel, Evolution Equations and Approximations, World Scien-
tific, River Edge, NJ, 2002.

[Ja] J. Jahn, Introduction to the Theory of Nonlinear Optimization, Springer-Verlag,


Berlin, 1994.

[JaSa] H. Jäger and E. W. Sachs, Global convergence of inexact reduced SQP methods,
Optim. Methods Softw. 7(1997), 83–110.

[Ka] K. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin,


1980.

[KaK] A. Kauffmann and K. Kunisch, Optimal control of the solid fuel ignition model,
ESAIM Proc. 8(2000), 65–76.

[Kan] C. Kanzow, Inexact semismooth Newton methods for large-scale complemen-


tarity problems, Optim. Methods Softw. 19(2004), 309–325.

[KeSa] C. T. Kelley and E. W. Sachs, Solution of optimal control problems by a


pointwise projected Newton method, SIAM J. Control Optim. 33(1995), 1731–
1757.

i i

i i
ItoKunisc
i i
2008/6/12
page 335
i i

Bibliography 335

[KO] N. Kikuchi and J. T. Oden, Contact Problems in Elasticity: A Study of Vari-


ational Inequalities and Finite Element Methods, SIAM Stud. Appl. Math. 8,
SIAM, Philadelphia, 1988.

[Kou] S. G. Kou, A jump-diffusion model for option pricing, Management Sci.


48(2002), 1086–1101.

[KPa] K. Kunisch and X. Pan, Estimation of interfaces from boundary measurements,


SIAM J. Control Optim. 32(1994), 1643–1674.

[KPe] K. Kunisch and G. Peichl, Estimation of a temporally and spatially varying


diffusion coefficient in a parabolic system by an augmented Lagrangian tech-
nique, Numer. Math. 59(1991), 473–509.

[KRe] K. Kunisch and F. Rendl, An infeasible active set method for quadratic prob-
lems with simple bounds, SIAM J. Optim. 14(2003), 35–52.

[KRo] K. Kunisch and A. Rösch, Primal-dual active set strategy for a general class
of optimal control problems, to appear in SIAM J. Optim.

[KSa] K. Kunisch and E. W. Sachs, Reduced SQP methods for parameter identifica-
tion problems, SIAM J. Numer. Anal. 29(1992), 1793–1820.

[Kup] F.-S. Kupfer, An infinite-dimensional convergence theory for reduced SQP


methods in Hilbert space, SIAM J. Optim. 6(1996), 126–163.

[KuSa] F.-S. Kupfer and E. W. Sachs, Numerical solution of a nonlinear parabolic


control problem by a reduced SQP method, Comput. Optim. Appl. 1(1992),
113–135.

[KuSt] K. Kunisch and G. Stadler, Generalized Newton methods for the 2D-Signorini
contact problem with friction in function space, MZAN Math. Model. Numer.
Anal. 39(2005), 827–854.

[KuTa] K. Kunisch and X-. Tai, Sequential and parallel splitting methods for bilinear
control problems in Hilbert spaces, SIAM J. Numer. Anal. 34(1997), 91–118.

[KVo1] K. Kunisch and S. Volkwein, Augmented Lagrangian-SQP techniques and


their approximation, in Optimization Methods in Partial Differential Equa-
tions, Contemp. Math. 209, AMS, Providence, RI, 1997, 147–160.

[KVo2] K. Kunisch and S. Volkwein, Galerkin proper orthogonal decomposition meth-


ods for parabolic systems, Numer. Math. 90(2001), 117–148.

[KVo3] K. Kunisch and S. Volkwein, Galerkin proper orthogonal decomposition meth-


ods for a general equation in fluid dynamics, SIAM J. Numer. Anal. 40(2002),
492–515.

[LeMa] F. Lempio and H. Maurer, Differential stability in infinite-dimensional nonlin-


ear programming, Appl. Math. Optim. 6(1980), 139–152.

i i

i i
ItoKunisc
i i
2008/6/12
page 336
i i

336 Bibliography

[LiMa] J.-L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems


and Applications, Springer-Verlag, Berlin, 1972.

[Lio1] J.-L. Lions, Control of Distributed Singular Systems, Gauthier-Villars, Paris,


1985.

[Lio2] J.-L. Lions, Optimal Control of Systems Governed by Partial Differential


Equations, Springer-Verlag, Berlin, 1971.

[Lio3] J.-L. Lions, Quelques Methodes de Resolution de Problemes aux Limites Non
Lineares, Dunod, Paris, 1969.

[MaZo] H. Maurer and J. Zowe, First and second-order necessary and sufficient opti-
mality conditions for infinite-dimensional programming problems, Math. Pro-
gramming 16(1979), 98–110.

[Mif] R. Mifflin, Semismooth and semiconvex functions in constrained optimization,


SIAM J. Control Optim. 15(1977), 959–972.

[MuSi] F. Murat and J. Simon, Sur le controle par un domain geometrique, Rapport
76015, Universite Pierre et Marie Curie, Paris, 1976.

[Pan1] J. S. Pang, Newton’s method for B-differentiable equations, Math. Oper. Res.
15(1990), 311–341.

[Pan2] J. S. Pang, A B-differentiable equation-based, globally and locally quadrat-


ically convergent algorithm for nonlinear programs, complementarity and
variational inequality problems, Math. Programming 51(1991), 101–131.

[Paz] A. Pazy, Semi-groups of nonlinear contractions and their asymptotic behavior,


in Nonlinear Analysis and Mechanics: Heriot Watt Symposium, Vol III, R. J.
Knops, ed., Res. Notes in Math. 30, Pitman, Boston, MA, 1979, 36–134.

[PoTi] E. Polak andA. L. Tits, A globally convergent implementable multiplier method


with automatic limitation, Appl. Math. Optim. 6(1980), 335–360.

[PoTr] V. T. Polak and N. Y. Tret’yakov, The method of penalty estimates for condi-
tional extremum problems, Žurnal Vyčislitel’noı̌ Matematiki i Matematičeskoı̌
Fiziki 13(1973), 34–46.

[Pow] M. J. D. Powell, A method for nonlinear constraints in minimization problems,


in Optimization, R. Fletcher, ed., Academic Press, New York, 1968, 283–298.

[Pow1] M. J. D. Powell, The convergence of variable metric methods for nonlinearly


constrained optimization problems, Nonlinear Programming 3, O. L. Man-
gasarian, ed., Academic Press, New York, 1987, 27–63.

[Qi] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equa-
tions, Math. Oper. Res. 18(1993), 227–244.

i i

i i
ItoKunisc
i i
2008/6/12
page 337
i i

Bibliography 337

[QiSu] L. Qi and J. Sun, A nonsmooth version of Newton’s method, Math. Program-


ming 58(1993), 353–367.

[Rao] M. Raous, Quasistatic Signorini problem with Coulomb friction and coupling
to adhesion, in New Developments in Contact Problems, P. Wriggers and
Panagiotopoulos, eds., CISM Courses and Lectures 384, Springer-Verlag, New
York, 1999, 101–178.

[Ro1] S. M. Robinson, Regularity and stability for convex multivalued functions,


Math. Oper. Res. 1(1976), 130–143.

[Ro2] S. M. Robinson, Stability theory for systems of inequalities, Part II: Differen-
tiable nonlinear systems, SIAM J. Numer. Anal. 13(1976), 497–513.

[Ro3] S. M. Robinson, Strongly regular generalized equations, Math. of Oper. Res.


5(1980), 43–62.

[Ro4] S. M. Robinson, Local structure of feasible sets in nonlinear programming,


Part III: Stability and sensitivity, Math. Programming Stud. 30(1987), 45–66.

[Roc1] R. T. Rockafeller, Local boundedness of nonlinear monotone operators, Michi-


gan Math. J. 16(1969), 397–407.

[Roc2] R. T. Rockafeller, The multiplier method of Hestenes and Powell applied to


convex programming, J. Optim. Theory Appl. 12(1973), 34–46.

[ROF] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise
removal algorithms, Phys. D 60 (1992), 259–268.

[RoTr] A. Rösch and F. Tröltzsch, On regularity of solutions and Lagrange multipliers


of optimal control problems for semilinear equations with mixed pointwise
control-state constraints, SIAM J. Control Optim. 46(2007), 1098–1115.

[Sa] E. Sachs, Broyden’s method in Hilbert space, Math. Programming 35(1986),


71–82.

[Sey] R. Seydel, Tools for Computational Finance, Springer-Verlag, Berlin, 2002.

[Sha] A. Shapiro, On concepts of directional differentiability, J. Optim. Theory Appl.


13(1999), 477–487.

[SoZo] J. Sokolowski and J. P. Zolesio, Introduction to Shape Optimization, Springer-


Verlag, Berlin, 1991.

[Sta1] G. Stadler, Infinite-Dimensional Semi-Smooth Newton and Augmented La-


grangian Methods for Friction and Contact Problems in Elasticity, Ph.D.
thesis, University of Graz, 2004.

[Sta2] G. Stadler, Semismooth Newton and augmented Lagrangian methods for a


simplified problem friction, SIAM J. Optim. 15(2004), 39–62.

i i

i i
ItoKunisc
i i
2008/6/12
page 338
i i

338 Bibliography

[Sto] J. Stoer, Principles of sequential quadratic programming methods for solving


nonlinear programs, in Computation Mathematical Programming, K. Schitt-
kowski, ed., NATO Adv. Sci. Inst. Ser F Comput. Systems Sci. 15, Springer-
Verlag, Berlin, 1985, 165–207.

[StTa] J. Stoer and R. A. Tapia, The Local Convergence of Sequential Quadratic


Programming Methods, Technical Report 87–04, Rice University, 1987.

[Tan] H. Tanabe, Equations of Evolution, Pitman, London, 1979.

[Te] R. Temam, Navier Stokes Equations: Theory and Numerical Analysis, North–
Holland, Amsterdam, 1979.

[Tem] R. Temam, Infinite-Dimensional Dynamical Systems in Mechanics and


Physics, Springer-Verlag, Berlin, 1988.

[Tin] M. Tinkham, Introduction to Superconductivity, McGraw–Hill, New York,


1975.

[Tr] G. M. Troianiello, Elliptic Differential Equations and Obstacle Problems,


Plenum Press, New York, 1987.

[Tro] F. Troeltzsch, An SQP method for the optimal control of a nonlinear heat
equation, Control Cybernet. 23(1994), 267–288.

[Ulb] M. Ulbrich, Semismooth Newton methods for operator equations in function


spaces, SIAM J. Optim. 13(2003), 805–841.

[Vog] C. R. Vogel, Computational Methods for Inverse Problems, Frontiers Appl.


Math. 23, SIAM, Philadelphia, 2002.
[Vol] S. Volkwein, Mesh-independence for an augmented Lagrangian-SQP method
in Hilbert spaces, SIAM J. Control Optim. 38(2000), 767–785.
[We] J. Werner, Optimization—Theory and Applications, Vieweg, Braunschweig,
1984.
[Zar] E. M. Zarantonello, Projection on convex sets in Hilbert space and spectral
theory, in Contributions to Nonlinear Functional Analysis, E. H. Zarantonello,
ed., Academic Press, New York, 1971, 237–424.
[Zo] J. P. Zolesio, The material derivative (or speed) method for shape optimization,
in Optimization of Distributed Parameter Structures, Vol II, E. Haug and J. Cea,
eds., Sijthoff & Noordhoff, Alphen aan den Rijn, 1981, 1089–1151.
[ZoKu] J. Zowe and S. Kurcyusz, Regularity and stability for the mathematical pro-
gramming problem in Banach spaces, Appl. Math. Optim. 5(1979), 49–62.

i i

i i
ItoKunisc
i i
2008/6/12
page 339
i i

Index

adapted penalty, 23 dual cone, 1, 27


adjoint equation, 18 dual functional, 76, 115
American options, 277 dual problem, 77, 99
augmentability, 65, 68, 73 duality pairing, 1
augmented Lagrange functional, 76
augmented Lagrangian, 65, 263 effective domain, 89
augmented Lagrangian-SQP method, 155 Eidelheit separation theorem, 3
elastoplastic problem, 122
Babuška–Brezzi condition, 184 epigraph, 89
Bernoulli problem, 322 equality constraints, 3, 8
Bertsekas penalty functional, 73 equality-constrained problem, 7
BFGS-update formula, 141 extremality condition, 253
biconjugate functional, 93
bilateral constraints, 202 feasibility step, 149
Bingham flow, 120 Fenchel duality theorem, 253
Black–Scholes model, 277 first order augmented Lagrangian, 67, 75
Bouligand differentiability, 217, 234 first order necessary optimality condition,
Bouligand direction, 226 156
box constraints, 225 friction problem, 125, 263
Bratu problem, 13, 152
BV-regularization, 254 Gâteaux differentiable, 95
BV-seminorm, 87, 254 Gauss–Newton algorithm, 228
Gel’fand triple, 246
Clarke directional derivative, 222 generalized equation, 31, 33
coercive, 71 generalized implicit function theorem, 31
complementarity condition, 88, 113 generalized inverse function theorem, 35
complementarity problem, 51, 215, 240 globalization strategy, 222
conical hull, 1
conjugate functional, 92, 253 Hessian, 29
constrained optimal control, 190
contact problem, 263 image restoration, 121
convection estimation, 14, 153 implicit function theorem, 31
convex functional, 89 indicator function, 28, 92
Coulomb friction, 263 inequality constraints, 6
inequality-constrained problem, 7
descent direction, 225 inverse function theorem, 35
directional differentiability, 58, 217, 234 inverse interface problem, 316

339

i i

i i
ItoKunisc
i i
2008/6/12
page 340
i i

340 Index

Karush–Kuhn–Tucker condition, 6 partial semismooth Newton iteration, 244


penalty method, 75
L1 -fitting, 126 penalty techniques, 24
Lagrange functional, 28, 66 polar cone, 1, 69
Lagrange multiplier, 2, 28, 277 polyhedric, 54, 57
Lagrangian method, 75 preconditioning, 174
least squares, 10 primal equation, 18
linear elasticity, 263 primal-dual active set strategy, 189, 202,
linearizing cone, 2 240
lower semicontinuous, 89 proper functional, 89
M-matrix, 194, 196 quasi-directional derivative, 225
Mangasarian–Fromowitz constraint
qualification, 6 reduced formulation, xiv
material derivative, 306 reduced SQP method, 139
Maurer–Zowe optimality condition, 42 regular point, 28, 166
maximal monotone operator, 104 regular point condition, 5
mesh-independence, 183 restriction operator, 183
method of mapping, 306
metric projection, 57 saddle point, 103
monotone operator, 104 second order augmented Lagrangian, 155
second order sufficient optimality condi-
Navier–Stokes control, 143, 323 tion, 156
Newton differentiable, 234 semismooth function, 216
Newton method, 129, 133 semismooth Newton method, 192, 215,
nonlinear complementarity problem, 231, 241
243 sensitivity equation, 58
nonlinear elliptic optimal control prob- sequential tangent cone, 2
lem, 174 shape calculus, 305
normal cone, 28 shape derivative, 305, 306, 308
null-space representation, 138 shape optimization, 305
Signorini problem, 124, 263
obstacle problem, 8, 122, 247 Slater condition, 6
optimal boundary control, 20 SQP (sequential quadratic programing),
optimal control, 62, 126, 172 129, 137
optimal distributed control, 19 stability analysis, 34
optimality system, 18 state-constrained optimal control, 191, 247
stationary point, 66
P -matrix, 194 strict complementarity, 65, 67, 76
parabolic variational inequality, 277 strong regularity, 31
parameter estimation, 82 strong regularity condition, 47
parametric mathematical programming, 27 subdifferential, 28, 95
Hölder continuity of solutions, 45 sufficient optimality, 67
Lipschitz continuity of solutions, 46 sufficient optimality condition, 131
Lipschitz continuity of value func- superlinear convergence, 141, 221
tion, 39
sensitivity of value function, 55 Tresca friction, 263

i i

i i
ItoKunisc
i i
2008/6/12
page 341
i i

Index 341

Uzawa method, 118

value function, 39, 99


variational inequality, 246

weakly lower semicontinuous, 89


weakly singular problem, 148

Yosida–Moreau approximation, 87, 248

i i

i i
ItoKunisc
i i
2008/6/12
page 342
i i

i i

i i

You might also like