OPTIMIZATION AND CONTROL WITH APPLICATIONS
Applied Optimization
VOLUME 96
Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.
Donald W. Hearn
University of Florida, U.S.A.
OPTIMIZATION AND CONTROL
WITH APPLICATIONS
Edited by
LIQUN QI
The Hong Kong Polytechnic University, Hong Kong
KOK LAY TEO
The Hong Kong Polytechnic University, Hong Kong
XIAOQI YANG
The Hong Kong Polytechnic University, Hong Kong
Springer
Library of Congress Cataloging-in-Publication Data
A C.I.P. record for this book is available from the Library of Congress.
Contents
Preface
1
ON MINIMIZATION OF MAX-MIN FUNCTIONS
A.M. Bagirov and A.M. Rubinov
1 Introduction
2 Special Classes of Max-min Objective Functions
3 Discrete Max-min Functions
4 Optimization Problems with Max-min Constraints
5 Minimization of Continuous Maximum Functions
6 Concluding Remarks
References
2
A COMPARISON OF TWO APPROACHES TO SECOND-ORDER SUBDIFFERENTIABILITY
CONCEPTS WITH APPLICATION TO OPTIMALITY CONDITIONS
A. Eberhard and C.E.M. Pearce
1 Introduction
2 Preliminaries
3 Characterization of Supported Operators
4 Generalized Convexity and Proximal Subderivatives
5 Generalized Convexity and Subjets
6 Subjet, Contingent Cone Inclusions
7 Some Consequences for Optimality Conditions
8 Appendix
References
4
DUALITY FOR SEMI-DEFINITE AND SEMI-INFINITE PROGRAMMING WITH
EQUALITY CONSTRAINTS
S. J. Li, X. Q. Yang and K. L. Teo
1 Introduction and Preliminaries
2 Uniform Duality for Homogeneous (SDSIP)
3 Uniform Duality for Nonhomogeneous (SDSIP)
References
5
THE USE OF NONSMOOTH ANALYSIS AND OF DUALITY METHODS
FOR THE STUDY OF HAMILTON-JACOBI EQUATIONS
Jean-Paul Penot
1 Introduction
2 The Interest of Considering Extended Real-valued Functions
3 Solutions in the sense of Unilateral Analysis
4 Validity of Some Explicit Formulae
5 Uniqueness and Comparison Results
References
6
SOME CLASSES OF ABSTRACT CONVEX FUNCTIONS
A.M. Rubinov and A.P. Shveidel
1 Introduction
2 Sets P_r
3 Supremal Generators of the Sets P_h
4 L_k-subdifferentials
References
Appendix
References
8
AN ANALYSIS OF THE BARZILAI AND BORWEIN GRADIENT METHOD
FOR UNSYMMETRIC LINEAR EQUATIONS
Yu-Hong Dai, Li-Zhi Liao and Duan Li
1 Introduction
2 Case of Identical Eigenvalues
3 Properties of the Recurrence Relation (2.8)
4 Case of Different Eigenvalues
5 Properties of the Recurrence Relation (4.11)
6 Concluding Remarks
References
9
AN EXCHANGE ALGORITHM FOR MINIMIZING SUM-MIN FUNCTIONS
Alexei V. Demyanov
1 Introduction
2 Statement of the Problem
3 Equivalence of the Two Problems
4 Minimality Conditions
5 An Exchange Algorithm
6 An ε-Exchange Algorithm
7 An Application to One Clustering Problem
8 Conclusions
References
10
ON THE BARZILAI-BORWEIN METHOD
Roger Fletcher
1 Introduction
2 The BB Method for Quadratic Functions
3 The BB Method for Non-quadratic Functions
4 Discussion
5 Optimization with Box Constraints
References
11
THE MODIFIED SUBGRADIENT METHOD FOR EQUALITY CONSTRAINED
NONCONVEX OPTIMIZATION PROBLEMS
Rafail N. Gasimov and Nergiz A. Ismayilova
1 Introduction
2 Duality
3 Solving the Dual Problem
References
12
INEXACT RESTORATION METHODS FOR NONLINEAR PROGRAMMING:
ADVANCES AND PERSPECTIVES
José Mario Martínez and Elvio A. Pilotta
1 Introduction
2 Main Inexact Restoration Ideas
3 Definition of an IR Algorithm
4 AGP Optimality Condition
5 Order-Value Optimization
6 Bilevel Programming
7 Homotopy Methods
8 Conclusions
References
13
QUANTUM ALGORITHM FOR CONTINUOUS GLOBAL OPTIMIZATION
V. Protopopescu and J. Barhen
1 Global Optimization Problem
2 Grover's Quantum Algorithm
3 Solution of the Continuous Global Optimization Problem
4 Practical Implementation Considerations
References
14
SQP VERSUS SCP METHODS FOR NONLINEAR PROGRAMMING
Klaus Schittkowski and Christian Zillober
1 Introduction
2 A General Framework
3 SQP Methods
4 SCP Methods
5 Comparative Performance Evaluation
6 Some Academic and Commercial Applications
7 Conclusions
References
15
AN APPROXIMATION APPROACH FOR LINEAR PROGRAMMING IN MEASURE SPACE
C.F. Wen and S.Y. Wu
1 Introduction
2 Solvability of LPM
3 An Approximation Scheme For LPM
4 An Algorithm For (DELPM)
References
References
17
PROXIMAL-LIKE METHODS FOR CONVEX MINIMIZATION PROBLEMS
Christian Kanzow
1 Introduction
2 Proximal-like Methods
3 Numerical Results for Some Optimal Control Problems
4 Final Remarks
References
18
ANALYSIS OF TWO DIMENSIONAL NONCONVEX VARIATIONAL PROBLEMS
René Meziat
1 Introduction
2 The Method of Moments
3 Convex Envelopes
4 Problem Analysis
5 Discrete and Finite Model
6 Examples
7 Concluding Remarks
References
19
STABILITY OF EQUILIBRIUM POINTS OF PROJECTED DYNAMICAL SYSTEMS
Mauro Passacantando
1 Introduction
2 Variational and Dynamical Models
3 Stability Analysis
4 Special Cases
References
20
ON A QUASI-CONSISTENT APPROXIMATIONS APPROACH TO OPTI-
MIZATION PROBLEMS WITH TWO NUMERICAL PRECISION PARAMETERS
Olivier Pironneau and Elijah Polak
1 Introduction
2 An Algorithm Model
References
21
NUMERICAL SOLUTIONS OF OPTIMAL SWITCHING CONTROL PROBLEMS
T. Ruby and V. Rehbock
1 Introduction
2 Problem Formulation
3 Solution Strategy
4 Numerical Examples and Discussion
5 Conclusions
References
22
A SOLUTION TO HAMILTON-JACOBI EQUATION BY NEURAL NETWORKS
AND OPTIMAL STATE FEEDBACK CONTROL
Kiyotaka Shimizu
1 Introduction
2 Nonlinear Optimal Regulator And Hamilton-Jacobi Equation
3 Approximate Solution To Hamilton-Jacobi Equation And Optimal State Feed-
back Control Law
4 Improvement Of Learning Algorithm Of Neural Network
5 Simulation Results
6 Conclusions
References
23
H∞ CONTROL BASED ON STATE OBSERVER FOR DESCRIPTOR SYSTEMS
Wei Xing, Q.L. Zhang, W.Q. Liu and Qiyi Wang
1 Introduction
2 Preliminaries
3 Main Results
4 Conclusions
References
25
ON A GEOMETRIC LEMMA AND SET-VALUED VECTOR EQUILIBRIUM
PROBLEM
Shui-Hung Hou
1 Introduction
2 Preliminaries
3 A Variation of Fan's Geometric Lemma
4 Set-valued Vector Equilibrium Problem
References
26
EQUILIBRIUM PROBLEMS
Giovanna Idone and Antonino Maugeri
1 Introduction
2 A Model of Elastic-Plastic Torsion
References
27
GAP FUNCTIONS AND DESCENT METHODS FOR MINTY VARIATIONAL
INEQUALITY
Giandomenico Mastroeni
1 Introduction
2 A Gap Function Associated to Minty Variational Inequality
3 Exact and Inexact Descent Methods
4 Some Applications and Extensions of Minty Variational Inequality
5 Concluding Remarks
6 Appendix
References
28
A NEW CLASS OF PROXIMAL ALGORITHMS FOR THE NONLINEAR COM-
PLEMENTARITY PROBLEM
G.J.P. da Silva and P.R. Oliveira
1 Introduction
2 Preliminaries
3 Existence of Regularized Solutions
4 Algorithm and Convergence
5 Conclusions
References
Preface
We are very happy to see that this Workshop has become a new conference
series; the original Workshop is now regarded as OCA 2001. During August
18-22, 2002, The Second International Conference on Optimization and Con-
trol with Applications (OCA2002) was successfully held in Tunxi, China. The
Third International Conference on Optimization and Control with Applications
(OCA2003) will be held in Chongqing-Chengdu, China, during July 1-7, 2003.
Liqun Qi and Kok Lay Teo have continued to be the Directors of OCA 2002
and OCA 2003. We hope that OCA Series will continue to provide a forum
for international researchers and practitioners working in optimization, optimal
control and their applications to exchange information and ideas on the latest
development in these fields.
Elijah (Lucien) Polak was born August 11, 1931 in Bialystok, Poland. He is
a holocaust survivor and a veteran of the death camps at Dachau, Auschwitz,
Gross-Rosen, and Buchenwald. His father perished in the camps, but his mother
survived. After the War, he worked as an apprentice blacksmith in Poland and
a clothes salesman in France. In 1949, he and his mother migrated to Australia,
where, after an eight year interruption, he resumed his education, while working
various part time jobs.
Elijah Polak received the B.S. degree in Electrical Engineering, from the
University of Melbourne, Australia, in 1957 and the M.S. and Ph.D. degrees,
A. BOOKS
4. E. Polak and E. Wong, Notes for a First Course on Linear Systems, Van
Nostrand Reinhold Co. New York, 169 pages, 1970.
11. E. Polak, "Equivalence and Optimal Strategies for some Minimum Fuel
Discrete Systems," J. of the Franklin Inst., Vol. 277, No. 2, pp. 150-162,
February 1964.
12. E. Polak, "On the Evaluation of Optimal and Non-Optimal Control Strate-
gies," IEEE Trans. on Automatic Control, Vol. AC-9, No. 2, pp. 175-
176,1964.
13. M. D. Canon and E. Polak, "Analog Circuits for Energy and Fuel Optimal
Control of Linear Discrete Systems," University of California, Berkeley,
Electronics Research Laboratory, Tech. Memo. M-95, August 1964.
19. E. Polak and A. Larsen, Jr., "Some Sufficient Conditions for Continuous
Linear Programming Problems," Int'l J. Eng. Science, Vol. 4, No. 5, pp.
583-603, 1966.
22. E. Polak and J. P. Jacob, "On the Inverse of the Operator O(.) = A(.) +
(.)B," American Mathematical Monthly, Vol. 73, No. 4, Part I, pp.
388-390, April 1966.
27. E. Polak, "An Algorithm for Computing the Jordan Canonical Form of
a Matrix," University of California, Berkeley, Electronics Research Lab-
oratory, Memo. M-223, September 1967.
32. E. Polak, "On the Removal of Ill-Conditioning Effects in the Computation
of Optimal Controls," Automatica, Vol. 5, pp. 607-614, 1969.
34. E. Polak, "On Primal and Dual Methods for Solving Discrete Optimal
Control Problems," Proc. 2nd International Conference on Computing
Methods in Optimization Problems, San Remo, Italy, September 9-13,
1968. Published as: Computing Methods in Optimization Problems -2,
L. A. Zadeh, L. W. Neustadt and A. V. Balakrishnan, eds., pp. 317-331,
Academic Press, 1969.
37. E. Polak and M. Deparis, "An Algorithm for Minimum Energy," IEEE
Trans. on Automatic Control, Vol. AC-14, No. 4, pp. 367-378, 1969.
PUBLICATIONS OF ELIJAH POLAK
39. E. Polak and G. Meyer, "A Decomposition Algorithm for Solving a Class
of Optimal Control Problems," J. Mathematical Analysis & Applications,
Vol. 3, No. 1, pp. 118-140, 1970.
40. E. Polak, "On the use of models in the Synthesis of Optimization Al-
gorithms," Differential Games and Related Topics (Proceedings of the
International Summer School on Mathematical Models of Action and Re-
action, Varenna, Italy, June 15-27, 1970), H. Kuhn and G. Szego eds.,
North Holland, Amsterdam, pp. 263-279, 1971.
46. E. Polak, "A Survey of Methods of Feasible Directions for the Solution of
Optimal Control Problems," IEEE Transactions on Automatic Control,
Vol. AC-17, No. 5, pp. 591-597, 1972.
48. O. Pironneau and E. Polak, "A Dual Method for Optimal Control Prob-
lems with Initial and Final Boundary Constraints," SIAM J. Control, Vol.
11, No. 3, pp. 534-549, 1973.
49. R. Klessig and E. Polak, "An Adaptive Algorithm for Unconstrained Op-
timization with Applications to Optimal Control," SIAM J. Control, Vol.
11, No. 1, pp. 80-94, 1973.
51. R. Klessig and E. Polak, "A Method of Feasible Directions Using Func-
tion Approximations with Applications to Min Max Problems," J. Math.
Analysis and Applications, Vol. 41, No. 3, pp. 583-602, 1973.
52. E. Polak, "On the Use of Optimization Algorithms in the Design of Linear
Systems," University of California, Berkeley, Electronics Research Lab.
Memo. No. M377, 1973.
61. E. Polak and I. Teodoru, "Newton Derived Methods for Nonlinear Equa-
tions and Inequalities," Nonlinear Programming, O. L. Mangasarian, R.
R. Meyer and S. M. Robinson eds., Academic Press, N. Y., pp. 255-277,
1975.
Optimization Theory and Applications, Vol. 16, No. 3/4, pp. 303-325,
1975.
72. E. Polak and R. Trahan, "An Algorithm for Computer Aided Design
of Control Systems," Proc. IEEE Conference on Decision and Control,
1976.
80. D. Q. Mayne and E. Polak, "A Feasible Directions Algorithm for Optimal
Control Problems with Terminal Inequality Constraints," IEEE Transac-
tions on Automatic Control, Vol. AC-22, No. 5, pp. 741-751, 1977.
81. I. Teodoru Gross and E. Polak, "On the Global Stabilization of Quasi-
Newton Methods," Proc. ORSA/TIMS National Meeting, San Francisco,
May 9-11, 1977.
85. E. Polak and A. Sangiovanni Vincentelli, "An Algorithm for Design Cen-
tering, Tolerancing and Tuning," Proc. European Conference on Circuit
Theory and Design, Lausanne, Switzerland, Sept. 1978.
87. H. Mukai and E. Polak, "A Second Order Algorithm for Unconstrained
Optimization," J. Optimization Theory and Applications, Vol. 26, No.
4, 1978.
88. H. Mukai and E. Polak, "A Second Order Algorithm for the General Non-
linear Programming Problem," J. Optimization Theory and Applications,
Vol. 26, No. 4, 1978.
89. A. N. Payne and E. Polak, "An Interactive Method for Bi-Objective De-
cision Making," Proc. Second Lawrence Symposium on Systems and
Decision Sciences, Berkeley, Ca. Oct. 1978.
92. T. Glad and E. Polak, "A Multiplier Method with Automatic Limitation
of Penalty Growth," Mathematical Programming, Vol. 17, No. 2, pp.
140-156, 1979.
97. E. Polak and D. Q. Mayne, "On the Finite Solution of Nonlinear In-
equalities," IEEE Trans. on Automatic Control, Vol. AC-24, No. 3, pp.
443-445, 1979.
99. C. Gonzaga and E. Polak, "On Constraint Dropping Schemes and Opti-
mality Functions for a Class of Outer Approximations Algorithms," SIAM
J. Control and Optimization, Vol. 17, No. 4, pp. 477-493, 1979.
100. R. Trahan and E. Polak, "A Derivative Free Algorithm for a Class of
Infinitely Constrained Problems," IEEE Trans. on Automatic Control,
Vol. AC-25, No. 1, pp. 54-62, 1979.
101. C. Gonzaga, E. Polak and R. Trahan, "An Improved Algorithm for Opti-
mization Problems with Functional Inequality Constraints," IEEE Trans.
on Automatic Control, Vol. AC-25, No. 1, pp. 49-54, 1979.
110. E. Polak, "An Implementable Algorithm for the Optimal Design Cen-
tering, Tolerancing and Tuning Problem," Proc. Fourth International
Symposium on Computing Methods in Applied Sciences and Engineering,
Versailles, France, Dec. 10-14, 1979. Published as: Computing Methods
in Applied Science and Engineering, R. Glowinski, J. L. Lions, ed., North
Holland, Amsterdam, pp. 499-517, 1980.
111. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for Op-
timal Control Problems with Control and Terminal Equality Constraints,
Part 1," J. Optimization Theory and Applications, Vol. 32 No. 2, pp.
211-246, 1980.
112. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for Op-
timal Control Problems with Control and Terminal Equality Constraints,
116. E. Polak and D. Q. Mayne, "On the Solution of Singular Value Inequali-
ties," Proc. 20th IEEE Conference on Decision and Control, Albuquerque,
N.M., Dec. 10-12, 1980.
118. D. Q. Mayne, E. Polak and A. Voreadis, "A Cut Map Algorithm for De-
sign Problems with Tolerances," Proc. 20th IEEE Conference on Decision
and Control, Albuquerque, N.M., Dec. 10-12, 1980.
123. E. Polak and D. Q. Mayne, "A Robust Secant Method for Optimiza-
tion Problems with Inequality Constraints," J. Optimization Theory and
Applications, Vol. 33, No. 4, pp. 463-467, 1981.
124. E. Polak and D. Q. Mayne, "On the Solution of Singular Value Inequali-
ties over a Continuum of Frequencies" IEEE Transactions on Automatic
Control, Vol. AC-26, No. 3, pp. 690-695, 1981.
132. E. Polak, "An Implementable Algorithm for the Design Centering, Toler-
ancing and Tuning Problem," J. Optimization Theory and Applications,
Vol. 35, No. 3, 1981.
140. D. Q. Mayne and E. Polak, "Algorithms for the Design of Control Sys-
tems Subject to Singular Value Inequalities," Mathematical Programming
Studies, Vol. 18, pp. 112-134, 1982.
143. D. Q. Mayne, E. Polak and A. Voreadis, "A Cut Map Algorithm for
Design Problems with Tolerances," IEEE Trans. on Circuits and Systems,
Vol. CAS-29, No. 1, pp. 35-46, 1982.
147. D. Q. Mayne and E. Polak, "Algorithms for the Design of Control Sys-
tems Subject to Singular Value Inequalities," Mathematical Program-
ming Study 18, Algorithms and Theory in Filtering and Control, D. C.
Sorensen and R. J.-B. Wets, ed., North Holland, New York, pp. 112-135,
1982.
150. R. J. Balling, V. Ciampi, K. S. Pister and E. Polak, "Optimal Design of
Structures Subjected to Earthquake Loading," Proc. ASCE Convention,
Las Vegas, 1982.
153. E. Polak, D.Q. Mayne and Y. Wardi, "On the Extension of Constrained
Optimization Algorithms from Differentiable to Nondifferentiable Prob-
lems," SIAM J. Control and Optimization, Vol. 21, No. 2, pp. 179-204,
1983.
163. E. Polak, "A Modified Nyquist Stability Criterion for Use in Computer-
Aided Design," IEEE Trans. on Automatic Control, Vol. AC-29, No. 1,
pp. 91-93, 1984.
164. E. Polak and D.M. Stimler, "On the Design of Linear Control Systems
with Plant Uncertainty via Nondifferentiable Optimization," Proc. IX.
Triennial IFAC World Congress, Budapest, July 2-6, 1984.
173. E. Polak, S. Salcudean and D. Q. Mayne, "A Rationale for the Sequential
Optimal Redesign of Control Systems," Proc. 1985 ISCAS, pp. 835-838,
Kyoto, Japan, June 1985.
176. E. Polak and D. M. Stimler, "On the Efficient Formulation of the Optimal
Worst Case Control System Design Problem," University of California,
Electronics Research Laboratory Memo No. UCB/ERL M85/71, 21 Au-
gust 1985.
183. E. Polak and D. Q. Mayne, "Design of Multivariable Control Systems via
Semi-Infinite Optimization," Systems and Control Encyclopaedia, M. G.
Singh, editor, Pergamon Press, N.Y. 1987.
184. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for
Control Problems with State and Control Constraints," IEEE Trans. on
Control, Vol. AC-32, No. 5, pp. 380-388, 1987.
188. S. Daijavad, E. Polak, and R-S Tsay, "A Combined Deterministic and
Random Optimization Algorithm for the Placement of Macro-Cells," Proc.
MCNC International Workshop on Placement and Routing, Research Tri-
angle Park, NC, May 10-13, 1988.
192. E. Polak and E. J. Wiest, "Domain Rescaling Techniques for the Solution
of Affinely Parametrized Nondifferentiable Optimal Design Problems,"
Proc. 27th IEEE Conference on Decision and Control, Austin, Tx., Dec.
7-9, 1988.
195. E. Polak and S. Wuu, "On the Design of Stabilizing Compensators via
Semi-Infinite Optimization," IEEE Trans. on Control, Vol. 34, No. 2, pp.
196-200, 1989.
199. Y-P. Harn and E. Polak, "On the Design of Finite Dimensional Con-
trollers for Infinite Dimensional Feedback-Systems via Semi-Infinite Op-
timization," Proc. 27th IEEE Conference on Dec. and Contr., Austin,
Tx., Dec. 7-9, 1988. IEEE Trans. on Automatic Control, Vol. 35, No.
10, pp. 1135-1140, 1990.
201. E. Polak and E. J. Wiest, "A Variable Metric Technique for the Solution
of Affinely Parametrized Nondifferentiable Optimal Design Problems," J.
Optimization Theory and Applications, Vol. 66, No. 3, pp 391-414, 1990.
202. L. He and E. Polak, "An Optimal Diagonalization Strategy for the Solu-
tion of a Class of Optimal Design Problems," IEEE Trans. on Automatic
Control, Vol. 35, No. 3, pp. 258-267, 1990.
203. T. E. Baker and E. Polak, "An Algorithm for Optimal Slewing of Flexible
Structures," University of California, Electronics Research Laboratory,
Memo UCB/ERL M89/37, 11 April 1989, Revised, 4 June 1990.
207. Y-P. Harn and E. Polak, "On the Design of Finite Dimensional Con-
trollers for Infinite Dimensional Feedback-Systems via Semi-Infinite Op-
timization," IEEE Trans. on Automatic Control, Vol. 35, No. 10, pp.
1135-1140, 1990.
210. E. Polak and L. He, "A Unified Phase I-Phase II Method of Feasible
Directions for Semi-infinite Optimization," J. Optimization Theory and
Applications, Vol. 69, No. 1, pp. 83-107, 1991.
211. J. Higgins and E. Polak, "An ε-active Barrier Function Method for Solving
Minimax Problems," J. Applied Mathematics and Optimization, Vol. 23,
pp 275-297, 1991.
212. E. J. Wiest and E. Polak, "On the Rate of Convergence of Two Minimax
Algorithms," J. Optimization Theory and Applications, Vol. 71, No. 1, pp.
1-30, 1991.
215. C. Kirjner Neto and E. Polak, "A Secant Method Based on Cubic Inter-
polation for Solving One Dimensional Optimization Problems," Univer-
sity of California, Berkeley, Electronics Research Laboratory Memo No.
UCB/ERL M91/91, 15 October 1991.
216. E. Polak, J. Higgins and D. Q. Mayne, "A Barrier Function Method for
Minimax Problems," Mathematical Programming, Vol. 54, No. 2, pp.
155-176, 1992.
218. E. Polak and L. He, "Rate Preserving Discretization Strategies for Semi-
infinite Programming and Optimal Control," SIAM J. Control and Opti-
mization, Vol. 30, No. 3, pp. 548-572, 1992.
227. T. E. Baker and E. Polak, "On the Optimal Control of Systems Described
by Evolution Equations," SIAM J. Control and Optimization, Vol. 32,
No. 1, pp. 224-260, 1994.
231. C. Kirjner Neto and E. Polak, "On the Use of Consistent Approxima-
tions for the Optimal Design of Beams," SIAM Journal on Control and
Optimization, Vol. 34, No. 6, pp. 1891-1913, 1996.
233. A. Schwartz and E. Polak, "A Family of Projected Descent Methods for
Optimization Problems with Simple Bounds," J. Optimization Theory
and Applications, Vol. 92, No. 1, pp. 1-32, 1997.
237. E. Polak and L. Qi, "A Globally and Superlinearly Convergent Scheme
for Minimizing a Normal Merit Function", AMR 96/17, Applied Math-
ematics Report, University of New South Wales, 1996, and SIAM J. on
Optimization, Vol. 36, No. 3, pp. 1005-1019, 1998.
241. E. Polak and L. Qi, "Some Optimality Conditions for Minimax Problems
and Nonlinear Programs," Applied Mathematics Report AMR 98/14, Uni-
versity of New South Wales, 1998.
ECT," 1998 IEEE Nuclear Science Symposium and Medical Imaging Con-
ference Record, Toronto, Canada, November 9-14, 1998.
246. E. Polak, L. Qi, and D. Sun, "First-Order Algorithms for Generalized Fi-
nite and Semi-Infinite Min-Max Problems," Computational Optimization
and Applications, Vol. 13, No. 1-3, Kluwer Academic Publishers, pp. 137-161,
1999.
248. Geraldine Lemarchand, Olivier Pironneau, and Elijah Polak, "A Mesh
Refinement Method for Optimization with DDM," Proc. 13th Inter-
national Conference on Domain Decomposition Methods, Champfleuri,
Lyons, France, October 9-12, 2000.
255. E. Polak, "Smoothing Techniques for the Solution of Finite and Semi-
Infinite Min-Max-Min Problems," High Performance Algorithms and Soft-
ware for Nonlinear Optimization, G. Di Pillo and A. Murli, Editors,
Kluwer Academic Publishers B.V., 2002
258. J.O. Royset, E. Polak and A. Der Kiureghian, "FORM Analysis Using
Consistent Approximations," Proceedings of the 15th ASCE Engineering
Mechanics Conference, New York, NY, 2002.
259. E. Polak and J.O. Royset, "Algorithms with Adaptive Smoothing for
Finite Min-Max Problems," J. Optimization Theory and Applications,
submitted 2002.
260. E. Polak and J.O. Royset, "Algorithms for Finite and Semi-Infinite Min-
Max-Min Problems Using Adaptive Smoothing Techniques," J. Optimiza-
tion Theory and Applications, submitted 2002.
Key words: Max-min function, cutting angle method, discrete gradient method,
quasidifferential.
1 INTRODUCTION
Max-min functions form one of the classes of nonconvex and nonsmooth
functions that is both interesting and important for applications. There are
many practical tasks where the objective function and/or constraints belong
to this class. For example, optimization problems with max-min constraints
arise in different branches
of engineering such as the design of electronic circuits subject to a toleranc-
ing and tuning provision (see Bandler et al (1976); Liu et al (1992); Muller
(1976); Polak and Sangiovanni Vincentelli (1979); Polak (1981)), the design
of paths for robots in the presence of obstacles (Gilbert and Johnson (1985)),
the design of heat exchangers (Grossman and Sargent (1978); Halemane and
Grossman (1983); Ostrovsky et al (1994)) and chemical reactors (Halemane
and Grossman (1983); Ostrovsky et al (1994)), in the layout design of VLSI
circuits (Cheng et al (1992); Hochbaum (1993)), etc. Optimization problems
with max-min objective and constraint functions also arise when one tries to
design systems under uncertainty (see Bracken and McGill (1974); Grossman
and Sargent (1978); Halemane and Grossman (1983); Ierapetritou and Pis-
tikopoulos (1994); Ostrovsky et al (1994)).
In the paper Kirjner-Neto and Polak (1998), the authors consider opti-
mization problems with twice continuously differentiable objective functions
and max-min constraint functions. They convert this problem to a certain
problem of smooth optimization.
In this paper we consider different classes of unconstrained and constrained
minimization problems with max-min objective and/or constraint functions.
The paper consists of four parts. First, we investigate a special simple class of
max-min functions. We give an explicit description of all local minima and show
that even in such a simple case the number of local minimizers is very large.
We discuss the applicability of the discrete gradient method (see, for example,
Bagirov (1999a); Bagirov (1999b)) for finding a local minimizer of discrete max-
min functions without constraints in the second part.
The constrained minimization problems with max-min constraints are ex-
amined in the third part. We use a special penalization approach (see Rubinov
et al (2002)) for this purpose.
The unconstrained global minimization of some continuous maximum func-
tions by the cutting angle method (see, for example, Rubinov (2000)) is stud-
ied in the fourth part. If the number of internal variables in the continuous
maximum functions is large enough (more than 5), then the minimization prob-
lem for these functions cannot be solved by traditional discretization. The
application of the cutting angle method allows us to solve such problems with
a small number of external variables and a rather large number of internal
variables (up to 15 internal variables).
We provide results of numerical experiments, which allow us to conclude
that even for simple max-min functions the number of local minima can be
very large. This means that the problem of minimizing such functions is
quite complicated. However, the results of numerical experiments show that
the methods considered in this paper allow us to solve different kinds of
max-min minimization problems with up to 10 variables.
The paper is arranged as follows. In Section 2 we study the problem of
minimization of special max-min functions over the unit simplex. Section 3
is devoted to the problem of minimization of the discrete max-min functions.
Minimization problems with max-min constraints are studied in Section 4. The
problem of global minimization of the continuous maximum functions is dis-
cussed in Section 5. Section 6 concludes the paper.
A function of the form (2.1) is called the min-type function generated by $l$ (or
simply a min-type function).
Let $f$ be a function defined on $\mathbb{R}^n_+$. A function $f$ is called increasing if
$x \ge y$ implies that $f(x) \ge f(y)$. The restriction of the function $f$ to a ray
$R_y = \{\alpha y : \alpha > 0\}$ starting from zero and passing through $y$ is the function of
one variable
$$f_y(\alpha) = f(\alpha y). \qquad (2.2)$$
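For concreteness, a min-type function and the ray restriction (2.2) can be evaluated directly. The following sketch uses an invented vector $l$ and point $y$ for illustration only; for a min-type function the restriction to a ray is linear in $\alpha$, in line with the remark after (2.5).

```python
# Sketch with assumed example data: the min-type function l(x) = min_i l_i * x_i
# generated by a vector l with positive coordinates, and the ray restriction
# f_y(alpha) = f(alpha * y) of a function f, as in (2.2).

def min_type(l):
    """Return the min-type function x -> min_i l_i * x_i generated by l."""
    return lambda x: min(li * xi for li, xi in zip(l, x))

def ray_restriction(f, y):
    """Return the one-variable function f_y(alpha) = f(alpha * y)."""
    return lambda alpha: f([alpha * yi for yi in y])

l = [2.0, 3.0, 5.0]          # invented example vector
f = min_type(l)
y = [1.0, 1.0, 0.5]
f_y = ray_restriction(f, y)

print(f([1.0, 1.0, 1.0]))    # min(2, 3, 5) = 2.0
print(f_y(2.0))              # f(2y) = min(4, 6, 5) = 4.0, i.e. 2 * f(y)
```

Since $f$ here is positively homogeneous, $f_y(\alpha) = \alpha f(y)$, which is the linearity along rays used to characterize IPH functions.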
The following result is based on Theorem 2.1 (see Rubinov (2000) for details):
the function $f$ is IPH if and only if $f$ is a positively homogeneous ICAR function.
It follows from (2.5) that an ICAR function is IPH if the function $f_y$, defined
by (2.2), is linear for all $y \in \mathbb{R}^n_+$, $y \ne 0$.
The following result holds (see Rubinov (2000)):
$$l(x) = \min_{i \in I(l)} l_i x_i;$$
Proposition 2.2 Let $f$ be an IPH function and let $X \subset \operatorname{int} \mathbb{R}^n_+$ be a compact
set. Then for each $\varepsilon > 0$ there exists a finite set $\{x^1, \dots, x^j\} \subset X$ such that
the function
$$h(x) = \max_{k \le j} \min_{i \in I(l^k)} l^k_i x_i$$
where $f$ is an ICAR function and $X \subset \mathbb{R}^n_+$ is a compact set. The cutting angle
method for solving problem (2.7) has been proposed and studied in Rubinov
(2000) (p. 420). This method reduces problem (2.7) to a sequence of
auxiliary problems:
It follows from the following result (see Rubinov (2000) and references therein).
Then $g(y) = f(y)$ for all $y \in S$, and $g$ is an ICAR function if $p \ge 2K/m$,
where $m = \min_{y \in S} f(y)$ and $K$ is the Lipschitz constant of $f$ in the $\|\cdot\|_1$ norm:
Let $f$ be a Lipschitz function over $S$. Consider the function $f_d(x) = f(x) + d$,
where $d > 0$. Note that the Lipschitz constant of $f$ coincides with the Lipschitz
constant of $f_d$. Hence, for each $p > 0$ there exists $d > 0$ such that the extension
of $f_d$ given by (2.9) is an ICAR function. However, if $d$ is a very large number,
the function $g$ is "almost flat" and its minimization is a difficult task. On the
other hand, if $d$ is small enough, then $p$ is large and some computational
difficulties can appear. Thus, an appropriate choice of the number $d$ is an
important problem.
We shall consider the cutting angle method only for the minimization of ICAR
functions over $S$; it can then also be applied to the minimization of Lipschitz
functions defined on $S$. The main idea behind this method is to approximate
the objective function $g$ by a sequence of its saw-tooth underestimates $h_j$ of
the form (2.4). Assume that $h_j(x^k) = g(x^k)$ for some points $x^k$, $k = 1, \dots, j$,
and $h_j$ converges uniformly to $g$ as $j \to +\infty$. Then a global minimizer of $g$ can
be approximated by a solution of the problem:
A detailed description of the cutting angle method can be found in Rubinov
(2000). Here we discuss only methods for the solution of the auxiliary problem
(2.10). Some of them can be found in Rubinov (2000). The most applicable
method can be given if the objective function $g$ of the initial problem is IPH.
In this case the constants $c_k$ in (2.4) are equal to zero, hence $h_j$ is a positively
homogeneous function. Using this property of $h_j$ it is possible to prove that
all local minimizers of $h_j$ over $S$ are interior points of $S$, and then to give an
explicit expression for the local minima of $g$ over $S$ which are interior points
of $S$ (see Bagirov and Rubinov (2000) and also Rubinov (2000)). If $g$ is an
extension, defined by (2.9), of a Lipschitz function $f$ defined on $S$, then $g$ is
IPH if $p = 1$, so we need to choose a large number $d$ in order to apply the
cutting angle method. The question arises: is it possible to extend the results
obtained for the minimization of homogeneous functions of the form (2.6) to
general non-homogeneous functions of the form (2.4)? In the case of success we
can consider more flexible versions of the cutting angle method, which can lead
to better implementations.
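The overall loop described above can be sketched as follows. This is a simplified illustration under assumptions not in the text: the objective $g$ is an invented IPH example, the support vectors are built from the standard IPH rule $l^k_i = g(x^k)/x^k_i$, and the auxiliary problem (2.10) is solved here by crude random sampling rather than by the exact rules for local minimizers discussed in the chapter.

```python
import random

# Simplified cutting angle iteration for an (assumed) IPH objective over the
# unit simplex.  Each iterate x^k contributes a support vector with
# l^k_i = g(x^k) / x^k_i, so that h_j(x) = max_k min_i l^k_i x_i satisfies
# h_j(x^k) = g(x^k) and h_j <= g.  The minimization of h_j is approximated
# by random sampling, not solved exactly as in the chapter.

def g(x):
    # invented IPH example: max of two linear functions with positive coefficients
    return max(2*x[0] + x[1] + x[2], x[0] + 3*x[1] + 0.5*x[2])

def random_simplex_point(n):
    w = [random.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

def cutting_angle(g, n=3, iters=60, samples=100, seed=0):
    random.seed(seed)
    x = [1.0 / n] * n            # start at the barycenter (interior point)
    supports = []
    best = g(x)
    for _ in range(iters):
        supports.append([g(x) / xi for xi in x])   # new support vector
        # crude approximate minimizer of the saw-tooth underestimate h_j
        candidates = [random_simplex_point(n) for _ in range(samples)]
        x = min(candidates,
                key=lambda y: max(min(lk[i] * y[i] for i in range(n))
                                  for lk in supports))
        best = min(best, g(x))
    return best

print(cutting_angle(g))   # decreases toward the minimum of g over the simplex
```

For this particular $g$ the minimum over the simplex equals 1 (attained where the first coordinate vanishes), so the returned value lies between 1 and the starting value $g(1/3, 1/3, 1/3) = 1.5$.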
Our goal is to describe all local minimizers of $h$ over the unit simplex $S$. We
shall also indicate the structure of such minimizers. Let
$$\Phi_k(x) = \min_{i \in I(l^k)} l^k_i x_i - c_k, \qquad k = 1, \dots, j.$$
Then
$$h(x) = \max_{k \le j} \Phi_k(x).$$
We set
$$Q_k(x) = \{ i \in I(l^k) : l^k_i x_i - c_k = \Phi_k(x) \}. \qquad (2.14)$$
1) if $x$ is a local minimizer of $f$ over $S$ then $f'(x, u) \ge 0$ for all $u \in K(x, S)$;
2) if $f'(x, u) > 0$ for all $u \in K(x, S) \setminus \{0\}$ then $x$ is a local minimizer of $f$
over $S$.
Now we describe all local minimizers of the function $h$ defined by (2.11). The
function $h$ is bounded from below on $S$. Adding a sufficiently large positive
constant, we can assume without loss of generality that $h(x) > 0$ for all $x \in S$.
First we consider minimizers which belong to the relative interior $\operatorname{ri} S$ of the
set $S$:
$$\operatorname{ri} S = \{ x = (x_1, \dots, x_n) \in S : x_i > 0,\ i \in I \}.$$
Since
$$x_i = \frac{h(x) + c_{k_i}}{l^{k_i}_i},$$
we conclude that
$$\Phi_k(x) \le h(x)$$
for all $k \le j$. It follows from (2.21) that there exist indices $k \in R(x)$ and
$p \in Q_k(x)$ such that $h(x) = l^k_p x_p - c_k$. Using again (2.20) we conclude that
and
$$l^{k_m}_i x_i - c_{k_m} > \Phi_{k_m}(x) = h(x) \quad \text{for all } i \in I(l^{k_m}),\ i \ne m.$$
Hence
$$\frac{l^{k_m}_i}{l^{k_i}_i} > \frac{h(x) + c_{k_m}}{h(x) + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}),\ i \ne m.$$
We again consider the function (2.11) and Problem (2.12). In the next propo-
sition we will establish a sufficient condition for a local minimum in Problem
(2.12). We assume that the function (2.11) possesses the following property:
Proposition 2.5 Let a subset $\{l^{k_1}, \dots, l^{k_n}\}$ of the set $\{l^1, \dots, l^j\}$ enjoy the
following properties:
3)
$$\frac{l^{k_m}_i}{l^{k_i}_i} \ge \frac{d + c_{k_m}}{d + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}),\ i \ne m,$$
where
Thus $d > 0$.
Since $\bar x \in S$ it follows that $d + c_{k_i} \ge 0$ for all $i = 1, \dots, n$. Due to (2.23) we have
$$\min_{i \in I(l^k)} \left( l_i^k \bar x_i - c_k - d \right) \le 0$$
for all $k \le j$. The latter means that for any $k \le j$ there exists $i = i_k \in I(l^k)$
such that $l_{i_k}^k \bar x_{i_k} - c_k \le d$. Then
$$h(\bar x) = \max_{k \le j} \min_{i \in I(l^k)} \left( l_i^k \bar x_i - c_k \right) \le d.$$
Thus $h(\bar x) = d$. Clearly
$$\ge \min_{i \in Q_{k_m}(\bar x)} l_i^{k_m} u_i.$$
Thus
$$h'(\bar x, u) > 0 \quad \text{for all } u \in K(\bar x, S) \setminus \{0\},$$
Thus analogous results can be obtained for IPH functions from Propositions
2.4 and 2.5 when $c_k = 0$ for all $k$. This case has been studied in Bagirov
and Rubinov (2000).
Remark 2.1 For homogeneous max-min functions we have $d > 0$, and in this
case we do not need the assumption $\Phi_k(x) := \min_{i \in I(l^k)} l_i^k x_i > 0$ for all $k \le j$ and
$x \in S$.
The previous results and the following Proposition 2.6 allow one to describe
all local minimizers of a homogeneous max-min function over the unit simplex:
where
Let
$$\Phi_k(x) = \min_{i \in I(l^k)} l_i^k x_i - c_k, \quad x \in S,\ k = 1, \dots, 5.$$
We have
$$\Phi_5(x) = \frac{1}{25} \min(52x_1,\ 104x_2,\ 156x_3,\ 208x_4) + 10.$$
Consider the boundary point of S:
We have
$$\Phi_1(x^1) = 5.5, \quad \Phi_2(x^1) = 5.5, \quad \Phi_3(x^1) = 5.25, \quad \Phi_4(x^1) = 2, \quad \Phi_5(x^1) = 10.$$
Since
$$h(x^1) = \max_{k=1,\dots,5} \Phi_k(x^1)$$
it follows that
It is clear that $x^1$ is the global minimum of the function over the set S.
Moreover
$$\Phi_5(x) > \Phi_5(x^1)$$
for all $x \in \operatorname{ri} S$ and $\Phi_5(x) = \Phi_5(x^1)$ for all boundary points $x$. Since all the
functions $\Phi_k$, $k = 1, \dots, 5$, are continuous, there exists $\varepsilon > 0$ such that
where
$$\frac{l_i^{k_m}}{l_i^{k_i}} \ge \frac{h(x) + c_{k_m}}{h(x) + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}) \cap I(x),\ i \ne m.$$
Let $h'$ be the corresponding restriction of the function $h$. Clearly $h'$ is a function of
the form (2.11) and $x$ is a local minimizer of this function. Since $x \in \operatorname{ri} S'$, we
can apply Proposition 2.4. The result follows directly from this proposition. □
Propositions 2.4 and 2.5 allow us to propose the following algorithm for
computing the set of local minimizers of the function $h$ defined by (2.11) over
$\operatorname{ri} S$.
Step 5. If
$$\frac{l_i^{k_t}}{l_i^{k_i}} \ge \frac{d + c_{k_t}}{d + c_{k_i}} \quad \text{for all } t = 1, \dots, n,\ i \in I(l^{k_t}),\ i \ne t,$$
then go to Step 6; otherwise set $m = n$ and go to Step 2.
Step 7. If
Note that $G$ is the set of local minimizers of the function $h$ over the set $\operatorname{ri} S$.
where
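Assuming the construction suggested by Propositions 2.4 and 2.5 — a candidate $\bar x$ with $\bar x_i = (d + c_{k_i})/l_i^{k_i}$, the value $d$ fixed by $\sum_i \bar x_i = 1$, retained when $\bar x \in \operatorname{ri} S$ and the ratio condition of Step 5 holds — the enumeration over selections $(k_1, \dots, k_n)$ can be sketched as follows. All data here are hypothetical and this reading of the partially garbled conditions is our interpretation:

```python
import numpy as np
from itertools import product

# Hypothetical data: j = 2 min-type pieces on R^3, full index sets I(l^k).
L = np.array([[2.0, 3.0, 4.0],
              [5.0, 1.5, 2.0]])   # L[k, i] = l^k_i
c = np.array([0.2, 0.0])
j, n = L.shape

def candidates():
    """For each selection (k_1,...,k_n) of one piece per coordinate, solve
    sum_i (d + c_{k_i}) / l^{k_i}_i = 1 for d, form the candidate point
    x_i = (d + c_{k_i}) / l^{k_i}_i, and keep it if x lies in ri S and
    the sufficient ratio condition of Step 5 holds."""
    out = []
    for k in product(range(j), repeat=n):
        inv = 1.0 / L[list(k), np.arange(n)]           # 1 / l^{k_i}_i
        d = (1.0 - np.sum(c[list(k)] * inv)) / np.sum(inv)
        x = (d + c[list(k)]) * inv                     # sums to 1 by design
        if np.any(x <= 0):
            continue                                   # not in ri S
        ok = all(L[k[t], i] / L[k[i], i]
                 >= (d + c[k[t]]) / (d + c[k[i]]) - 1e-12
                 for t in range(n) for i in range(n) if i != t)
        if ok:
            out.append((x, d))
    return out

for x, d in candidates():
    print(np.round(x, 4), round(d, 4))
```

For each accepted candidate one can check numerically that $h(\bar x) = d$, in line with the computation $h(\bar x) = d$ in the proof above.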
The code has been written in Fortran 90 and the numerical experiments have
been carried out on a PC IBM Pentium III with an 800 MHz CPU.
For the description of the results of numerical experiments we use the fol-
lowing notation:
tl is the CPU time for the calculation of the set of all local minimizers;
Example 2.3
$$f_2(x) = \max_{1 \le i \le 20} \min_{1 \le j \le n} (a_{ij}, x),$$
We use points xk given in (2.30) and (2.31) for the construction of min-type
functions. Thus we describe all local minimizers of the following functions over
Results presented in Tables 2.2 and 2.3 show that the number of local mini-
mizers strongly depends on the original IPH function. The proposed algorithm
where
Numerical experiments have been carried out on a PC IBM Pentium III with
an 800 MHz CPU. All problems have been solved with the precision $\delta$, that is,
at the last point $x^k$:
$$f(x^k) - f^* \le \delta.$$
To carry out numerical experiments, unconstrained minimization problems with
the following max-min objective functions were considered:
Problem 1
$$f(x) = \max_{i \in I} \min_{j \in J} (a_{ij}, x),$$
$$a_{ij}^k = 1/(i + j + k - 2), \quad i \in I,\ j \in J,\ k = 1, \dots, n,$$
$$I = \{1, \dots, 10\}, \quad J = \{1, \dots, 10\}, \quad x^0 = (10, \dots, 10), \quad x^* = (0, \dots, 0), \quad f^* = 0.$$
Problem 2
$$a_{ij}^k = 1/(i + j + k - 2), \quad i \in I,\ j \in J,\ k = 1, \dots, n,$$
$$I = \{1, \dots, 10\}, \quad J = \{1, \dots, 10\}, \quad x^0 = (10, \dots, 10), \quad x^* = (0, \dots, 0), \quad f^* = 0.$$
Results presented in Table 3.1 show that the discrete gradient method can
be applied to solve minimization problems with discrete max-min objective
functions. This method allowed one to find the global solution to the problems
under consideration with reasonable computational effort.
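The Problem 1 objective above can be sketched directly; the dimension $n = 5$ below is an arbitrary choice for illustration (the chapter varies $n$):

```python
import numpy as np

n = 5                      # illustrative dimension
I = range(1, 11)           # I = {1,...,10}
J = range(1, 11)           # J = {1,...,10}

# a^k_{ij} = 1/(i + j + k - 2); a_{ij} is the vector over k = 1..n
A = np.array([[[1.0 / (i + j + k - 2) for k in range(1, n + 1)]
               for j in J] for i in I])

def f(x):
    """f(x) = max_{i in I} min_{j in J} (a_{ij}, x)."""
    inner = A @ x                 # inner[i, j] = (a_{ij}, x)
    return inner.min(axis=1).max()

x_star = np.zeros(n)
x0 = 10.0 * np.ones(n)
print(f(x_star), f(x0))           # f* = 0 is attained at x* = (0,...,0)
```

Since all coefficients $a_{ij}^k$ are positive and decrease in $i$ and $j$, at $x^0 = (10,\dots,10)$ the inner minimum is taken at $j = 10$ and the outer maximum at $i = 1$.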
$$\text{minimize } f(x) \quad \text{subject to } x \in X \subseteq \mathbb{R}^n,$$
$$g_i(x) \le 0, \quad i = 1, \dots, m, \qquad g_i(x) = 0, \quad i = m + 1, \dots, m + k.$$
$$\lim_{c \to +\infty} d_c = 0,$$
where $d_c$ is the least exact penalty parameter for the problem $(P_c)$:
$$\text{minimize } f_c(x) \quad \text{subject to } x \in X,\ f_1(x) \le 0.$$
The cutting angle method requires the evaluation of only one value of the objective
function at each iteration. This property is very beneficial if the evaluation of
the objective function is time-consuming. Due to this property we can
use the cutting angle method for solving some continuous min-max problems.
Some details of this approach can be found in Bagirov and Rubinov (2001b).
Let $Y \subset \mathbb{R}^m$ be a compact set and let $\varphi(x, y)$ be a continuous function
defined on the set $\mathbb{R}^n \times Y$. Consider the following continuous min-max problem:
find points $\bar x \in \mathbb{R}^n$ and $\bar y \in Y$ such that
where
We assume that the function $y \mapsto \varphi(x, y)$ is concave for each $x$; then the
evaluation of the values of the function $f$ can easily be done by methods of convex
programming. The cutting angle method can be applied to problems with a
small number of outer variables $x$; the number of inner variables $y$, however, can
be quite large. Numerical experiments with such functions were reported
in Bagirov and Rubinov (2001b) and we do not repeat them here.
6 CONCLUDING REMARKS
Acknowledgments
The authors are very thankful to Professor E. Polak for kind discussions of our
results during his visit to Australia and for providing us with his recent papers. We are
also grateful to an anonymous referee for helpful comments. This research was
supported by the Victorian Partnership for Advanced Computing.
References
Bagirov, A.M. (1999a). Minimization methods for one class of nonsmooth func-
tions and calculation of semi-equilibrium prices, In: Progress in Optimiza-
tion: Contributions from Australasia, Eberhard, A. et al. (eds.), Applied 0p-
timization, 30, Kluwer Academic Publishers, Dordrecht, 147-175.
Bagirov, A.M. (1999b). Derivative-free methods for unconstrained nonsmooth
optimization and its numerical analysis, Investigacao Operacional, 19,75-93.
Bagirov, A.M. and Rubinov, A.M. (2000). Global minimization of increasing
positively homogeneous functions over the unit simplex, Annals of Operations
Research, 98, 171-189.
Bagirov, A.M. and Rubinov, A.M. (2001a). Modified versions of the cutting angle
method, In: N. Hadjisavvas and P.M. Pardalos (eds.), Advances in Convex
Analysis and Global Optimization, Kluwer Academic Publishers, Dordrecht,
245-268.
Bagirov, A.M. and Rubinov, A.M. (2001b). Global optimization of marginal
functions with applications to economic equilibrium, Journal of Global Op-
timization, 20, 215-237.
Bandler, J.W., Liu, P.C. and Tromp, H. (1976). A nonlinear programming
approach to optimal design centering, tolerancing and tuning, IEEE Trans.
Circuits Systems, CAS-23, 155-165.
Bartels, S.G., Kuntz, L. and Scholtes, S. (1995). Continuous selections of linear
functions and nonsmooth critical point theory, Nonlinear Analysis, TMA,
24, 385-407.
Bracken, J. and McGill, J.T. (1974). Defense applications of mathematical programs
with optimization problems in the constraints, Oper. Res., 22, 1086-1096.
Cheng, C.K., Deng, X., Liao, Y.Z. and Yao, S.Z. (1992). Symbolic layout com-
paction under conditional design rules, IEEE Trans. Comput. Aided Design,
11, 475-486.
Demyanov, V.F. and Rubinov, A.M. (1986). Quasidifferential Calculus, Opti-
mization Software, New York.
Demyanov, V.F. and Rubinov, A.M. (1995). Constructive Nonsmooth Analysis,
Peter Lang, Frankfurt am Main.
Evtushenko, Yu. (1972). A numerical method for finding best guaranteed estimates,
USSR Computational Mathematics and Mathematical Physics, 12, 109-128.
Gilbert, E.G. and Johnson, D.W. (1985). Distance functions and their applica-
tion to robot path planning in the presence of obstacles, IEEE J. Robotics
Automat., RA-1, 21-30.
Grossman, I.E. and Sargent, R.W. (1978). Optimal design of chemical plants
with uncertain parameters, AIChE J., 24, 1-7.
Halemane, K.P. and Grossman, I.E. (1983). Optimal process design under un-
certainty, AIChE J., 29, 425-433.
Hochbaum, D. (1993). Complexity and algorithms for logical constraints with
applications to VLSI layout, compaction and clocking, Studies in Locational
Analysis, ISOLD VI Proceedings, 159-164.
Ierapetritou, M.G. and Pistikopoulos, E.N. (1994). Simultaneous incorporation
of flexibility and economic risk in operational planning under uncertainty,
Comput. Chem. Engrg., 18, 163-189.
Kirjner-Neto, C. and Polak, E. (1998). On the conversion of optimization problems
with max-min constraints to standard optimization problems, SIAM J.
on Optimization, 8(4), 887-915.
Liu, P.C., Chung, V.W. and Li, K.C. (1992). Circuit design with post-fabrication
tuning, in: Proc. 35th Midwest Symposium on Circuits and Systems, Wash-
ington, DC, IEEE, NY, 344-347.
Mifflin, R. (1976). Semismooth and semiconvex functions in constrained opti-
mization. SIAM Journal on Control and Optimization.
Muller, G. (1976). On computer-aided tuning of microwave filters, in: IEEE
Proc. International Symposium on Circuits and Systems, Munich, IEEE
Computer Society Press, Los Alamitos, CA, 722-725.
Ostrovsky, G.M., Volin, Y.M., Barit, E.I. and Senyavin, M.M. (1994). Flexibil-
ity analysis and optimization of chemical plants with uncertain parameters,
Comput. Chem. Engrg., 18, 755-767.
Polak, E. (1981). An implementable algorithm for the design centering, toler-
ancing and tuning problem, J. Optim. Theory Appl., 35, 45-67.
Polak, E. (1997). Optimization. Algorithms and Consistent Approximations.
Springer Verlag, New York.
Polak, E. (2003). Smoothing techniques for the solution of finite and semi-
infinite min-max-min problems, High Performance Algorithms and Software
for Nonlinear Optimization, G. Di Pillo and A. Murli (eds.), Kluwer Acad-
emic Publishers, to appear.
Polak, E. and Sangiovanni Vincentelli, A. (1979). Theoretical and computa-
tional aspects of the optimal design centering, tolerancing and tuning prob-
lem, IEEE Trans. Circuits and Systems, CAS-26, 795-813.
Rubinov, A.M. (2000). Abstract Convexity and Global Optimization, Kluwer
Academic Publishers, Dordrecht.
Rubinov, A.M., Yang, X.Q. and Bagirov, A.M. (2002). Nonlinear penalty functions
with a small penalty parameter, Optimization Methods and Software,
17(5), 931-964.
2
A COMPARISON OF TWO
APPROACHES TO SECOND-ORDER
SUBDIFFERENTIABILITY CONCEPTS
WITH APPLICATION TO OPTIMALITY
CONDITIONS
A. Eberhard
Department of Mathematics,
RMIT University, GPO Box 2476V,
Melbourne, Australia
and C. E. M. Pearce
Department of Mathematics,
University of Adelaide, North Terrace,
Adelaide, Australia
Abstract: The graphical derivative and the coderivative, when applied to the
proximal subdifferential, are in general not generated by a set of linear operators.
Nevertheless we find that in directions at which the subjet (or subhessian) is
supported, in a rank-1 sense, these supported operators interpolate
the contingent cone. Thus under a prox-regularity assumption we are able to
make a selection from the contingent graphical derivative in certain directions,
using the exposed facets of a convex set of symmetric matrices. This allows
us to make a comparison between some optimality conditions. A nonsmooth
formulation of a standard smooth mathematical programming problem is used
to derive a novel set of sufficient optimality conditions.
1 INTRODUCTION
In this paper we shall concern ourselves only with second-order subderivatives
that arise from one of two constructions. First, there is the use of the con-
tingent tangent cone to the graph of the proximal subdifferential and its polar
cone to generate the graph of the contingent graphical derivative and contingent
coderivative. The second one is the use of sub-Taylor expansions to construct
a set of symmetric matrices as replacements for Hessians, the so called subjet
(or the subhessian of Penot (1994a)) of viscosity solution theory of partial
differential equations (see Crandall et al. (1992)). To each of these constructions
may be associated a limiting counterpart which will also be considered at
times. As the first of the above constructions produces a possibly nonconvex
set of vectors in Rn and the latter a convex set of symmetric operators, it is
not clear how to compare them. One of the purposes of this paper is to begin
a comparative study of these notions for the class of prox-regular functions of
Poliquin and Rockafellar (1996). A second objective is to consider what this
study tells us regarding certain optimality conditions that can be framed using
these alternative notions. We do not take up discussion of the second-order
tangent cones and the associated second-order parabolic derivatives of func-
tions, which is beyond our present scope. We refer the reader to Bonnans,
Cominetti et al. (1999), Rockafellar and Wets (1998) and Penot (1994b) and
the bibliographies therein for recent accounts of the use of these concepts in
optimality conditions. Important earlier works include Ben-Tal (1980), Ben-Tal
et al. (1982), Ioffe (1990) and Ioffe (1991). Some results relating parabolic
derivatives to subjet-like notions may be found in Eberhard (2000).
Where possible we adhere to the terminology and notation of the book 'Variational
Analysis' by Rockafellar and Wets (1998), with the notable exception
of the proximal subdifferential. Henceforth we assume $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ to be lower
semi-continuous, minorized by a quadratic function, and $\bar x \in \operatorname{dom} f$.
Here $x' \to_f x$ means $x' \to x$ and $f(x') \to f(x)$. In general the proximal
subdifferential is only contained in the subdifferential. Next we define one
concept that forces equality.
As noted in many papers (see, for example, Poliquin and Rockafellar (1996),
Poliquin et al. (1996) and Rockafellar and Wets (1998)) prox-regular functions
constitute an important class of nonsmooth functions in the applications of
nonsmooth analysis to optimization and thus are currently undergoing intense
study. The proximal subdifferential, when applied to the indicator function
$\delta_S(x)$ of a set $S$ (defined to be zero if $x \in S$ and $+\infty$ otherwise), gives rise to
the proximal normal cone $N_S^p(\bar x) := \partial_p \delta_S(\bar x)$. This may in turn be used via
limiting processes to define the normal cone, for which $\partial \delta_S(x) := N_S(x)$. For
an arbitrary set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the normal cone to its graph
at a point $y \in F(x)$ gives rise to the coderivative mapping of Mordukhovich
(1976)-Mordukhovich (1994), Kruger et al. (1980) and later studies by Ioffe
(1986)-Ioffe (1989). We consider only the finite-dimensional case in this paper.
Here $T_{\operatorname{Graph} F}(x, y) := \limsup_{t \downarrow 0} \dfrac{\operatorname{Graph} F - (x, y)}{t}$ is the contingent tangent cone.
At this juncture the natural question arises as to whether there is any relationship
between $\partial^{2,-} f(x, y)$, $D(\partial_p f)(\bar x, \bar y)$ and $D^*(\partial_p f)(x, y)$. The answer to
this question is far from obvious, as on first inspection we note that $\partial^{2,-} f(x, y)$
is a set of operators while $D(\partial_p f)(\bar x, \bar y)$ and $D^*(\partial_p f)(x, y)$ are multifunctions
whose images are contained in $\mathbb{R}^n$. When $f \in C^2(\mathbb{R}^n)$ all notions coincide in
that
Here the equality for the coderivative utilizes the symmetry of the operator
$\nabla^2 f(\bar x)$, and the Pareto efficient subset of $\partial^{2,-} f(x, \nabla f(\bar x))$ is taken with respect
to the partial order induced by $P(n)$. Thus a possible relationship to
consider is whether $Q \in \partial^{2,-} f(x, y)$ is to have $Qw \in D^*(\partial_p f)(x, y)(w)$ or
$Qw \in D(\partial_p f)(x, y)(w)$. In Rockafellar et al. (1997) conditions are given (which
include prox-regularity of $f$ at $x$, $y$) under which we have the inclusion
When the polarity of the associated tangent and normal cones is taken into
account, one can see that this inclusion does imply a kind of symmetry property
for the elements of the contingent graphical derivative. This prompts one to
conjecture that some elements of $\{Qw \mid Q \in \partial^{2,-} f(x, \nabla f(\bar x))\}$ may also be
contained in $D(\partial_p f)(x, y)(w)$.
In this paper we extend the inclusion (1.3) by considering the relationship
of $\partial^{2,-} f(x, y)$ to $D(\partial_p f)(x, y)(w)$. To do so we need to extract the correct
operators from $\partial^{2,-} f(x, y)$. The geometry of $\partial^{2,-} f(x, y)$ is of critical importance
in this development. Let us now introduce some elements of this geometry
necessary to state our results. The Frobenius inner product $(Q, B) = \operatorname{tr} B^t Q$
induces the quadratic form $(Q, hh^t) = h^t Q h = (Qh, h)$ when applied to a
rank-one operator $hh^t$ ($= h \otimes h$ in tensor product notation). We call
It was established in Eberhard, Nyblom and Ralph (1998) that $E(A, w)$ may
be empty in some given directions. Despite this, we show in this paper that
for an arbitrary rank-one representer we have $E(A, w) \ne \emptyset$ for a dense subset
of $b_1(A) := \{w \mid q(A)(w) < +\infty\}$.
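The rank-one quadratic form mentioned above rests on the identity $(Q, hh^t) = h^t Q h$ for the Frobenius inner product, which is easy to confirm numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 4))
Q = (Q + Q.T) / 2                       # a symmetric operator in S(4)
h = rng.normal(size=4)

frob = np.trace(Q.T @ np.outer(h, h))   # (Q, h h^t) = tr((h h^t)^t Q)
quad = h @ Q @ h                        # h^t Q h
print(frob, quad)
```

The trace pairing against the rank-one matrix $hh^t$ therefore recovers exactly the quadratic form of $Q$ in the direction $h$.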
A multifunction $\Gamma : X \rightrightarrows Y$ is generated by a set of operators $A \subseteq S(n)$ if
$\Gamma(w) = Aw = \{Qw \mid Q \in A\}$. In Ioffe (1981) it is established that fans and
hence many derivative-like objects are, in general, not generated by a set of
linear operators. In this paper we investigate an alternative notion.
Then $(w, Qw) \in T_{\operatorname{Graph} \partial_p f}(x, y)$, the contingent cone to the graph of the proximal
subdifferential; that is, there exists a rank-1 selection of the graphical derivative.
Thus we are able to make a selection from these non-convex set-valued derivatives
using certain rank-1 exposed facets of a convex set of symmetric matrices.
The construction of such matrices is often possible and provides an
alternative path to the generation of elements within $D(\partial_p f)(x, y)(w)$ and
$D^*(\partial f)(\bar x, \bar y)(w)$. This is just the kind of selection which must be calculated
2 PRELIMINARIES
For a family of sets $\{C^v \mid v \in W\}$, $\limsup_{v \to w} C^v$ is defined as the set consisting of all cluster
points of sequences $\{u^n\}$ with $u^n \in C^{v^n}$ (for $n$ sufficiently large) and some
$v^n \to w$ as $n \to \infty$, while $\liminf_{v \to w} C^v$ consists of the points $u$ for which, for any
given sequence $v^n \to w$, there exists a convergent sequence $u^n$, with $u^n \in C^{v^n}$
(for $n$ sufficiently large) and with $u = \lim_{n \to \infty} u^n$. Clearly $\liminf_{v \to w} C^v \subseteq
\limsup_{v \to w} C^v$. When these coincide we say that $\{C^v \mid v \in W\}$ converges to $C$
and we write $C = \lim_{v \to w} C^v$.
Denote by $S(n)$ the set of all real $n \times n$ symmetric matrices and by $\mathbb{R}_+$ (respectively
$\overline{\mathbb{R}}$) the real interval $[0, +\infty)$ (respectively $(-\infty, +\infty]$). When $C$ is a
convex set in a vector space $X$, denote the recession directions of $C$ by $0^+ C =
\{x \in X \mid C + x \subseteq C\}$. When $C$ is not convex, we denote the horizontal directions
by $C^\infty := \{x \in X \mid \exists \rho_n \downarrow 0 \text{ and } c_n \in C \text{ such that } x = \lim_{n \to \infty} \rho_n c_n\}$.
The upper epi-limit $\text{e-ls}_{v \to w} f^v$ is the function having as its epigraph the inner
limit of the sets $\operatorname{epi} f^v$:
When these two are equal, the epi-limit $\text{e-lim}_{v \to w} f^v$ is said to exist. In this
case, $\{f^v\}_{v \in W}$ is said to epi-converge to $f$.
As $\text{e-li}_{v \to w} f^v(x) \le \text{e-ls}_{v \to w} f^v(x)$, epi-convergence of $f^v$ occurs
when $\text{e-ls}_{v \to w} f^v(x) \le f(x)$ and $f(x) \le \text{e-li}_{v \to w} f^v(x)$ for all $x$. The upper and
lower epi-limits of the sequence $f^v$ may also be defined via composite limits
(see Rockafellar and Wets (1998)). In particular
$$\text{e-li}_{v \to w} f^v(x) = \sup_{\delta > 0} \liminf_{v \to w} \inf_{x' \in B_\delta(x)} f^v(x').$$
As $-\partial^{2,-} f(x, -p) = \partial^{2,+} f(x, p)$ we study only the subjet. One may define
corresponding superjet quantities by reversing the inequality. It must be
stressed that these quantities may not exist everywhere, but $\partial^{2,-} f(x)$ and
$\partial_p f(x)$ are defined densely. If $f''_\downarrow(\bar x, p, h)$ is finite, then $f'_\downarrow(\bar x, h) = (p, h)$, where
the first-order lower epi-derivative is defined by
$$f'_\downarrow(\bar x, h) = \liminf_{t \downarrow 0,\ h' \to h} \frac{1}{t} \left( f(\bar x + t h') - f(\bar x) \right).$$
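For a simple nonsmooth function such as $f = |\cdot|$ the lower epi-derivative can be estimated by its defining difference quotient; for this particular $f$ the infimum over $h' \to h$ is attained at $h' = h$, so a plain one-sided difference suffices:

```python
def f(x):
    return abs(x)                # kink at the origin

def epi_deriv(f, x, h, t=1e-8):
    """Crude one-sided estimate of the lower epi-derivative
    f'(x; h) = liminf_{t -> 0, h' -> h} (f(x + t h') - f(x)) / t."""
    return (f(x + t * h) - f(x)) / t

# At the kink x = 0 the directional derivative is |h|: positively
# homogeneous in h, but nonlinear, so no gradient exists there.
print(epi_deriv(f, 0.0, 1.0), epi_deriv(f, 0.0, -1.0))
```

Both directions give the value 1, illustrating that $f'_\downarrow(0, h) = |h|$ is a sublinear function of the direction rather than a linear one.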
Another commonly used concept is that of the upper second-order epi-derivative
at $\bar x$ with respect to $z$ and $h$, which is given by
Using this tangent cone and the contingent tangent cone one may define
many derivative-like concepts using the multifunction $F(x) := f(x) + [0, +\infty)$
with graph $\operatorname{Graph} F = \operatorname{epi} f$. It is well known that the contingent cone to
$\operatorname{epi} f$ corresponds to the epigraph of the function $h \mapsto f'_\downarrow(x, h)$ (see Aubin et al.
(1990), Rockafellar (1989) and Rockafellar (1988)). Also proto-differentiability
of $\operatorname{epi} f$ corresponds to first-order epi-differentiability of $f$.
The normal cone to a set S is given by
This prompts the following definitions (see Eberhard, Nyblom and Ralph
(1998)).
$$A := \left\{ A \in S(n) \mid (A, uu^t) \le q(A)(u) \text{ for all } u \in \mathbb{R}^n \right\}.$$
4. The rank-1 (resp. $\varepsilon$-rank-1) supported points in the direction $u$ are given
respectively by
$$E(A, u) := \{ A \in A \mid (A, uu^t) = q(A)(u) \} \quad \text{and} \quad
E_\varepsilon(A, u) := \{ A \in A \mid (A, uu^t) \ge q(A)(u) - \varepsilon \}.$$
5. When $-P(n) \subseteq 0^+ A \subseteq S(n)$ we define the symmetric rank-one barrier
cone as $b_1(A) := \{ u \in \mathbb{R}^n \mid q(A)(u) < \infty \}$.
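For a finite (hypothetical) family of symmetric matrices the symmetric rank-one support $q(A)(u)$ and the supported sets $E(A, u)$, $E_\varepsilon(A, u)$ of the definitions above can be computed directly; in the text $A$ is a convex and generally unbounded set, so this is only a finite toy model:

```python
import numpy as np

# A finite stand-in for a rank-one representer (illustrative data only).
A = [np.diag([1.0, -2.0]),
     np.array([[0.0, 1.0], [1.0, 0.0]]),
     np.diag([-1.0, 3.0])]

def q(u):
    """Symmetric rank-one support q(A)(u) = sup_{Q in A} (Q, u u^t)."""
    return max(float(u @ Q @ u) for Q in A)

def E(u, eps=0.0):
    """(eps-)rank-1 supported points of A in the direction u."""
    return [Q for Q in A if u @ Q @ u >= q(u) - eps - 1e-12]

u = np.array([1.0, 0.0])
print(q(u), len(E(u)), len(E(u, eps=2.0)))
```

With a finite family the supremum is always attained, so $E(A, u) \ne \emptyset$ here; the possible emptiness of $E(A, u)$ discussed in the text arises only for unbounded convex representers.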
The subjet is always a closed convex set of matrices while $\partial^2 f(\bar x, p)$ may not
be convex, just as $\partial_p f(\bar x)$ is convex while $\partial f(\bar x)$ often is not. Eberhard, Nyblom
and Ralph (1998) observed that
$$\liminf_{t \downarrow 0,\ h' \to h} \frac{2}{t^2} \left( f(\bar x + t h') - f(\bar x) - t(p, h') \right) =: f''_s(\bar x, p, h).$$
Hence if we work with subjets we are in effect dealing with objects dual to the
lower, symmetric, second-order epi-derivative. The study of second-order di-
rectional derivatives as dual objects to some kind of second order subdifferential
was begun in J.-B. Hiriart-Urruty (1986), Seeger (1986) and Hiriart-Urruty
et al. (1989). These works studied the particular case where f is a convex
function.
In general we have for all $h$ (see Ioffe and Penot (1997)) that
$$q(\partial^2 f(\bar x, p))(h) = \sup\{ (Q, hh^t) \mid Q \in \partial^2 f(\bar x, p) \} \le f^{\uparrow\uparrow}(\bar x, p, h)$$
In this section we outline some important facts regarding the geometry of rank-1
sets which are relevant to later sections. We show that there is a dense set
of directions $h \in b_1(A)$ for which a rank-1 representer $A$ has $E(A, h) \ne \emptyset$. We
discuss also some results from previous papers relevant to the characterization
of rank-1 exposed operators, as these figure strongly in the later development
of the paper. One can completely characterise symmetric rank-one supports.
The following characterization may be found in Eberhard, Nyblom and Ralph
(1998). This result generalizes those of J.-B. Hiriart-Urruty (1986) and Seeger
(1986), which treat only the case when the symmetric rank-1 support happens
to be a convex function. The results of J.-B. Hiriart-Urruty (1986) and Seeger
(1986) are sufficient to study the second-order derivative notion for convex
functions (see, for example, Seeger (1992), Hiriart-Urruty et al. (1989) and
Seeger (1994)), while Theorem 3.1 below is applicable to a more general study
of second-order epi-derivatives. Important as these earlier works are, they do
not figure in the present development.
Theorem 3.1 Let $p : \mathbb{R}^n \to \overline{\mathbb{R}}$ be proper (that is, $p(u) \ne -\infty$ anywhere). For
$u, v \in \mathbb{R}^n$, define $q(u, v) = \infty$ if $u$ is not a positive scalar multiple of $v$ or vice
versa, and $q(\alpha u, u) = q(u, \alpha u) = \alpha p(u)$ for $\alpha \ge 0$.
Then $q$ is a symmetric rank-one support of a set $A \subseteq S(n)$ with $-P(n) \subseteq
0^+ A$ if and only if
2. $p$ is lower semicontinuous;
Even in finite dimensions there may not exist, in every direction, an operator
in a given rank-1 representer achieving the symmetric rank-one support. Thus
some caution is required in the subsequent development when using rank-1
exposed facets and operators.
As noted earlier we endow $S(n)$ with the Frobenius inner product $(Q, A) =
\operatorname{tr} A^t Q$. The natural norm induced by this inner product is the so-called
projective norm $\|A\|_{\text{proj}}$, given by
Theorem 3.2 Let $\dim S(n) = m$ $(= \tfrac{1}{2} n(n+1))$. Then if $A$ is a rank-one
representer which has $-P(n) \subseteq 0^+ A$, we have for any $Q \in P(n)$ that
$$S(A, Q) = \inf\left\{ \sum_{i=1}^{l} q(A)(u_i) \;\middle|\; \sum_{i=1}^{l} u_i u_i^t = Q \text{ for some } l \le m + 1 \right\}.$$
On the space $S(n)$ we may define the barrier cone to $A$ (as a convex subset)
by $b(A) := \{P \in S(n) \mid S(A, P) < \infty\}$.
Recall that an exposed point of $A$ is the unique maximizer $Q$ in $A$ of a linear
function $(\cdot, P)$. Recall also that $P(n)$ induces a natural ordering on the space
$S(n)$ with respect to which we may define a Pareto (or undominated) subset of
the rank-1 representer $A$. We quote next another result from Eberhard (2000)
that we require in this section.
that for all $v \in (u)^\perp$ with $\|v\| = 1$, we have $\sum_{i \in F} (u_i, v)^2 = (Q_\delta, vv^t)$ and so
$\sum_{i \in F} (u_i, v)^2 \le b^2$, as $(Q_\delta, vv^t) \le \|Q_\delta\| \|v\|^2 < b^2$. By a theorem of Hörmander
(see Holmes (1975)) we have for $M = \{\alpha u \mid \alpha \in \mathbb{R}\}$ that there exists $v \in
(u)^\perp \cap B_1(0)$ such that
Finally take $zz^t$ with $z = v\|u\|$ and note that $(u_i, z) = \alpha_i \|u\|$. We compose $zz^t$
with $Q = \sum_{i \in F} u_i u_i^t = uu^t + Q_\delta$ to get
Here the $\alpha_i$ are the accumulation points of the $\alpha$'s in Lemma 3.1. Thus
for any $\varepsilon > 0$ we have for $m \ge m_0$ that $\|u_i^m - \alpha_i u\| \le \alpha_i \varepsilon$ for all $i$ with $\alpha_i \ne 0$
and $\|u_i^m\| \le \varepsilon$ for all $i$ with $\alpha_i = 0$. For each $m$, the supporting point $A_m \in A$
has $S(A, P_m) = (A_m, P_m)$. By Corollary 3.1 there exists a representation
$P_m = \sum_{i=1}^{l} u_i^m (u_i^m)^t$ containing at least as many linearly independent $u_i^m$ as
the rank of $P_m$ but no more than $\dim S(n) + 1$, such that for each $i$
This is finite when the function $f$ is proper, lower semi-continuous and minorized
by a quadratic function $-\frac{r}{2} \|\cdot - y\|^2$ with $r > 0$ (see Rockafellar
and Wets (1998), Example 1.44). In this case each $f_\lambda$ is proper and lower semi-continuous.
The supremum of all parameters $r$ is called the prox-threshold
of $f$. It is well known that $f_\lambda$ is pointwise nondecreasing as $\lambda$ decreases
and epi-converges to $f$ as $\lambda \downarrow 0$ (see Rockafellar and Wets (1998), Example
1.44 and Proposition 7.4(d)). Note that even when $\operatorname{int} \operatorname{dom} f = \emptyset$ we have $f_\lambda$
finite-valued everywhere (for $\lambda$ sufficiently small) as long as $f$ is quadratically
minorized (or prox-bounded). Thus $\operatorname{dom} f_\lambda = \mathbb{R}^n$. Moreover $f_\lambda$ can be shown
to be locally Lipschitz continuous. In Eberhard, Nyblom and Ralph (1998) we
may find the following series of results.
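The listed properties of the Moreau envelope $f_\lambda$ can be observed numerically for $f = |\cdot|$, assuming the standard normalization $f_\lambda(x) = \inf_y \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$: the envelope is finite everywhere and pointwise nondecreasing as $\lambda$ decreases:

```python
import numpy as np

def f(y):
    return np.abs(y)             # a simple prox-bounded l.s.c. function

def moreau(f, x, lam, grid):
    """Moreau envelope f_lam(x) = min_y { f(y) + |y - x|^2 / (2 lam) },
    approximated by minimizing over a fine grid."""
    return np.min(f(grid) + (grid - x) ** 2 / (2 * lam))

grid = np.linspace(-5, 5, 200001)
x = 2.0
# closed form for f = |.|: f_lam(x) = |x| - lam/2 whenever |x| >= lam
vals = [moreau(f, x, lam, grid) for lam in (1.0, 0.5, 0.1)]
print(vals)

# pointwise nondecreasing with decreasing lam, and below f itself
assert vals[0] <= vals[1] <= vals[2] <= f(x)
```

The grid minimization is a blunt instrument, but it makes the epi-convergence $f_\lambda \uparrow f$ as $\lambda \downarrow 0$ visible at a single point.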
and so
$$q^*_{I + \lambda Q}(h) = \begin{cases}
-\infty & \text{if } I + \lambda Q \notin P(n), \\
q_{(I + \lambda Q)^+}(h) & \text{if } I + \lambda Q \in P(n) \setminus \operatorname{int} P(n) \text{ and } h \in \operatorname{Im}(I + \lambda Q), \\
q_{(I + \lambda Q)^{-1}}(h) & \text{if } I + \lambda Q \in \operatorname{int} P(n), \\
+\infty & \text{if } h \notin \operatorname{Im}(I + \lambda Q).
\end{cases}$$
Here $\operatorname{Im}(I + \lambda Q)$ denotes the image or range and $(I + \lambda Q)^+$ the Moore-Penrose
inverse. Thus $I + \lambda Q \in \operatorname{int} P(n)$ is a necessary and sufficient condition for
$(q_Q)_\lambda = q_{Q_\lambda}$, where
Such results have a long history when one notes that $Q_\lambda$ is constructed via a
'parallel sum' (see Mazure (1996), Anderson et al. (1969) and Seeger (1991)).
The following result is very useful in subsequent proofs (see Eberhard, Nyblom
and Ralph (1998)).
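For a convex quadratic $q_Q(x) = \frac{1}{2} x^t Q x$ the parallel-sum construction can be checked directly. Assuming the standard parallel-sum form $Q_\lambda = Q(I + \lambda Q)^{-1}$, the Moreau envelope of $q_Q$ is again a quadratic with matrix $Q_\lambda$, i.e. $(q_Q)_\lambda = q_{Q_\lambda}$:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
Q = B @ B.T + 0.1 * np.eye(3)   # positive definite, so I + lam*Q is too
lam = 0.5
I3 = np.eye(3)

# the parallel-sum matrix
Q_lam = Q @ np.linalg.inv(I3 + lam * Q)

# Moreau envelope of q_Q at h, via the closed-form minimizer:
# minimize (1/2) y^t Q y + |y - h|^2 / (2 lam)  =>  y* = (I + lam Q)^{-1} h
h = rng.normal(size=3)
y_star = np.linalg.solve(I3 + lam * Q, h)
env = 0.5 * y_star @ Q @ y_star + np.dot(y_star - h, y_star - h) / (2 * lam)
print(env, 0.5 * h @ Q_lam @ h)
```

Since $Q$ and $(I + \lambda Q)^{-1}$ commute, $Q_\lambda$ is symmetric, and the two printed numbers agree, which is the content of $(q_Q)_\lambda = q_{Q_\lambda}$ in the invertible case.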
The next result allows us to study $E(A, \cdot)$ via the more regular $E(A_\lambda, \cdot)$,
which implies $Q_\lambda \in E(A_\lambda, h_\lambda)$.
It is widely recognised that the differential information extracted from the functions
in such $\Phi$-subdifferentials provides information regarding certain subdifferentials
of nonsmooth analysis (see Rockafellar and Wets (1998)). This section
concerns itself with the problem of quantifying this relationship more formally.
Notions of abstract convexity were introduced in Janin (1973) and devel-
oped later in Balder (1977) and Dolecki et al. (1978). This approach has a
long history. Many earlier papers were concerned with the case of paraconvex/paraconcave
functions (or strong/weak convexity) (see Penot and Volle
(1988) and Vial (1983)). Essentially this corresponds to taking the class $\Phi$ to
consist of quadratics with a fixed maximum negative curvature. This restriction
is dispensed with in Dolecki et al. (1978) and greatly generalized via the
use of abstract 'dualities' (see Martinez-Legaz (1988) and Martinez-Legaz and
Singer (1995)). This approach is detailed in the text of Singer (1997), and it is
developed in a different direction and applied to many optimization
problems in Rubinov (2000).
The approach discussed above is more general than required here as we
require only the use of abstract conjugations in the spirit of Pallaschke and
Rolewicz (1998). This approach has been exploited in Eberhard, Nyblom and
Ralph (1998)-Eberhard (2000) to study the approximate subdifferential and
consequently the basic subdifferential along with certain second-order deriva-
tive concepts. It has long been recognised that abstract convexity gives infor-
mation about the subdifferentials of nonsmooth analysis. One of the contribu-
tions of Eberhard and Nyblom (1998) was to show that, in finite dimensions,
the study of the $\Phi_2$-subdifferential was equivalent to the study of the proximal
subdifferential for the class of lower semi-continuous, prox-bounded functions.
To establish this one must show how to extend the local inequalities (1.1) and
(1.2) to ones which hold globally. The reason for doing this is that generalized
conjugates require global suprema rather than local ones. We present a result
of this kind in this section but defer the long proof to an appendix.
As we are mainly concerned with sub-Taylor expansions, the class
Lemma 4.1 Suppose there exists a function $\omega(\cdot) : \mathbb{R}_+ \to \mathbb{R}$ with $\lim_{t \downarrow 0} \omega(t) = 0$
such that
belongs to $\partial_{J_2(\bar x)} f(\bar x)$.
Proof It follows from Lemma 4.1 that this may be achieved locally around $\bar x$.
To extend outside this neighbourhood we follow the proof of Eberhard, Nyblom
and Ralph (1998), Proposition 6, noting that we may begin with equation (4.1)
of Eberhard, Nyblom and Ralph (1998) replaced by (4.1) locally about $\bar x$, that
is, with $-r(\|\bar x - x\|)\|\bar x - x\|^2 \in C^2(\mathbb{R}^n)$ replacing the term $-\lambda\|x - \bar x\|^2$. The
argument of Eberhard, Nyblom and Ralph (1998), Proposition 6, now applies
and establishes the result.
In Eberhard, Nyblom and Ralph (1998) and Eberhard and Nyblom (1998) it is
shown that the proximal subdifferential of Rockafellar, Mordukhovich and Ioffe,
denoted by $\partial_p f(x)$, is equivalently characterised via $\partial_p f(x) = \{\nabla \varphi(u)|_{u=x} \mid \varphi \in
\partial_{\Phi_2} f(x)\} := \nabla \partial_{\Phi_2} f(x)$. This is a set of elements from $X^*$ rather than a set
of nonlinear functions defined on $X$. One may see that $\partial_p f(x) \subseteq \nabla \partial_\Phi f(x)$ for
any class $\Phi_2 \subseteq \Phi \subseteq C^2(\mathbb{R}^n)$. It should be noted that such $\Phi$-convex functions
are simply those lower semi-continuous functions which are bounded below by
some $\varphi \in \Phi$. Traditionally this has been termed $\Phi$-bounded (see Dolecki et al.
(1978)). If $\varphi \in \partial_{C^2} f(\bar x)$, then by taking $c > \rho(\nabla^2 \varphi(\bar x))$ (the spectral radius)
we have $\psi \in \partial_{\Phi_2} f(\bar x)$ with $\psi(x) := \varphi(\bar x) + (\nabla \varphi(\bar x), x - \bar x) - \frac{c}{2}\|x - \bar x\|^2$, since
locally around $\bar x$
We may 'globalize' this inequality using Proposition 2.2 of Eberhard and Nyblom
(1998). This globalization property for $\Phi_2$-bounded functions may be
generalized to the class $C^2(\mathbb{R}^n)$ as shown by Proposition 6 in Eberhard, Nyblom
and Ralph (1998). Thus we always have $\partial_p f(\bar x) = \nabla \partial_\Phi f(\bar x)$ for any class
$\Phi_2 \subseteq \Phi \subseteq C^2(\mathbb{R}^n)$.
If a function is not $-\infty$ anywhere and $f \not\equiv +\infty$ (that is, $f$ is proper), then
to be a supremum of functions it must be at least bounded below by one
such function, that is, $\Phi_2$-bounded. When one is interested only in the local
differentiability properties of the function, this assumption may be dropped by
setting $f(x) = +\infty$ for $x \notin B_\delta(\bar x)$. If $f$ is lower semi-continuous then $\delta > 0$
may be chosen so that the resultant function is actually bounded below by a
constant. This will not affect the local differentiability properties of $f$ at $\bar x$.
We use the notation $\varepsilon > 0$ to mean that $\varepsilon(t) > 0$ for all $t > 0$. As is usual,
$(\varepsilon + \lambda)(\cdot) := \varepsilon(\cdot) + \lambda(\cdot)$ for any $\varepsilon$ and $\lambda \in E$. We have deliberately left the
precise choice of $E$ open for the time being, but note that in Eberhard and Nyblom
(1998) it is shown that for the choice $E = \{\varepsilon t \mid \varepsilon > 0\}$ the quantities $\partial^\varepsilon_{\Phi_2} f(x)$
approximate the proximal subdifferential in that for all $\delta > 0$ and any $\varepsilon \ge 0$
we have (slightly abusing notation in suppressing the $t$ for $\varepsilon t \in E$)
In particular this implies that $\nabla \partial^\varepsilon_{\Phi_2} f(x)$ in effect estimates the closed sets
$\partial^-_\varepsilon f(\bar x) = \partial^- f(\bar x) + \varepsilon B_1(0)$, where $\partial^- f(\bar x) = \{z \mid (z, y) \le f'_\downarrow(\bar x; y) \text{ for all } y\}$ is
the lower Dini subdifferential. As $\varepsilon \downarrow 0$ both will approximate the closure of
the proximal subdifferential $\overline{\partial_p f(x)} = \overline{\nabla \partial_{\Phi_2} f(\bar x)}$. In order to drop the closure
operation we need the following concept.
With this in hand it is easy to extend Eberhard and Nyblom (1998) Propo-
sition 5.1 to the form we state next without proof.
where $-\omega(\|x - \bar{x}\|)$ induces a mapping $v(\cdot): \mathbb{R}_+ \to \mathbb{R}$. Now
apply Lemma 4.1 to obtain $r(\cdot): \mathbb{R}_+ \to \mathbb{R}_+$ with $r(\cdot)\|\cdot - \bar{x}\|^2 \in C^2(\mathbb{R})$ such
that
$$\varphi(x) := \langle p, x - \bar{x}\rangle + \tfrac{1}{2}\langle Q, (x - \bar{x})(x - \bar{x})^t\rangle - r(\|x - \bar{x}\|)\|x - \bar{x}\|^2 \in \partial_{J_2(\bar{x})} f(\bar{x}).$$
Then $(p, Q) \in \Delta\partial_{J_2(\bar{x})} f(\bar{x})$ as required. As the reverse inequality is always true,
the result $\nabla^2\partial_{J_2(\bar{x})} f(\bar{x}, p) = \partial^{2,-} f(\bar{x}, p)$ follows. Now apply this result to the
function $x \mapsto f(x) + \varepsilon\|x - \bar{x}\|^2$, noting that $\varepsilon\|\cdot - \bar{x}\|^2 \in J_2(\bar{x})$. By Lemma 4.2
we have $\partial_{J_2(\bar{x})}\big(f + \varepsilon\|\cdot - \bar{x}\|^2\big)(\bar{x}, p) = \partial_{J_2(\bar{x})} f(\bar{x}, p) + \varepsilon\|\cdot - \bar{x}\|^2$. We have via an
elementary argument that $\partial^{2,-}\big(f + \varepsilon\|\cdot - \bar{x}\|^2\big)(\bar{x}, p) = \partial^{2,-} f(\bar{x}, p) + 2\varepsilon I$. This
gives
Because the closure is required to relate $\nabla\partial_{C^2} f(x)$ and $\partial_p f(x)$, we can equate
$\nabla^2\partial_{C^2} f(\bar{x})$ and $\partial^{2,-} f(x)$ only under the assumption of $\mathcal{E}$-proximal regularity.
The next result is taken from Eberhard and Nyblom (1998).
Proof We sketch the proof, leaving the details to the reader. First note that
when $f$ is prox-regular at $\bar{x}$ it is also locally prox-regular, and so is $f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2$.
The addition of $\varepsilon\|\cdot - \bar{x}\|^2$ will only help to ensure that $T$, the $f$-attentive $\gamma$-localization
of $\partial\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x)$ (see Poliquin and Rockafellar (1996) or
Rockafellar and Wets (1998) for definitions), will have $T + rI$ monotone (use
Theorem 3.2 of Poliquin and Rockafellar (1996) and subdifferential calculus on the
sum). In addition note that an $f$-attentive $\eta$-neighbourhood of $\bar{x}$ is contained
in an $\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)$-attentive $(\eta + \varepsilon\eta)$-neighbourhood of $\bar{x}$. Thus in an $f$-attentive
neighbourhood (that is, $\|x - \bar{x}\| < \eta$ and $\|f(x) - f(\bar{x})\| < \eta$) of $\bar{x}$ we
have $\partial\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x) = \partial_p\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x)$ and so
Theorem 5.1 Suppose that $f$ is $\Phi$-convex and proper and $\Phi$ is a convex conic
subset of mappings from $X$ to $\mathbb{R}$. Then for any $v: X \to \mathbb{R}$ with $\frac{1}{t}\big(v(x + t(\cdot)) - v(x)\big) \in \Phi$
and $\varphi \in \partial_\Phi f(x)$ we have

Proof We apply the Fenchel equality using the facts that $\Phi$ is a convex conic
subset and the evaluation mapping $\psi \mapsto x(\psi) := \psi(x)$ is linear for $\psi \in \Phi$.
For $\varphi \in \partial_\Phi f(x)$ take the conjugate $\big(\tfrac{1}{2}\Delta_2 f(x, t, \varphi, \cdot)\big)^c(\psi) = \sup_h\big(\psi(h) - \tfrac{1}{2}\Delta_2 f(x, t, \varphi, h)\big)$,
where $\psi(h) := \frac{1}{t}\big(v(x + th) - v(x)\big)$. On using $f^c(\varphi) + f(x) = \varphi(x)$ we obtain that
$$= \frac{1}{t}\sup_h\big((\varphi + tv)(x + th) - f(x + th) - f^c(\varphi) - t\,v(x)\big)
= \frac{1}{t}\big(f^c(\varphi + tv) - f^c(\varphi)\big) - v(x).$$
Thus we have $\frac{1}{t}\big(v(x + t(\cdot)) - v(x)\big) \in \partial_\Phi\big(\tfrac{1}{2}\Delta_2 f(x, t, \varphi, \cdot)\big)(h)$ if and only if
this expression equals $\frac{1}{t}\big(v(x + th) - v(x)\big)$. Substituting $f^c(\varphi) + f(x) = \varphi(x)$
again and cancelling gives
$$\frac{1}{t}\big(f^c(\varphi + tv) + f(x + th) - \varphi(x + th)\big) - v(x) = \frac{v(x + th) - v(x)}{t},$$
which is equivalent to $f^c(\varphi + tv) + f(x + th) = (\varphi + tv)(x + th)$, or $\varphi + tv \in \partial_\Phi f(x + th)$.
This is true if and only if $(x + th, \varphi + tv) \in \operatorname{Graph}\partial f$, or
$$(h, v) \in \frac{\operatorname{Graph}\partial f - (x, \varphi)}{t},$$
which implies
$$\big(h, \nabla^2\varphi(x)h + \nabla v(x)\big) \in \frac{\operatorname{Graph}\partial_p f - (x, \nabla\varphi(x))}{t} + o(1)B_1.$$
We thus define $\Delta_2 f(x, t, \nabla\varphi, h) := \frac{2}{t^2}\big(f(x + th) - f(x) - t\langle\nabla\varphi(x), h\rangle\big)$ and
obtain
1. The condition
for all $t > 0$ sufficiently small implies, for all $\rho > 0$ and $w' \in X$, that
the corresponding quotient lies in
$$\frac{1}{t}\big(v(x + t(h' + \rho w')) - v(x)\big) - \frac{1}{t}\big(v(x + th) - v(x)\big).$$
This implies
64 OPTIMIZATION AND CONTROL WITH APPLICATIONS
or
Indeed in this case there exists a $\delta > 0$ such that for all $w' \in B_\delta(w)$ and $\rho < \delta$
we have
$$f''_+(x, \nabla v(x), h + \rho w') = f''_-(x, \nabla v(x), h + \rho w'). \qquad (5.7)$$
$$\liminf_{\rho\downarrow 0,\ w'\to w} \frac{1}{\rho}\Big(f''_+(x, \nabla v(x), h + \rho w') - f''_+(x, \nabla v(x), h)\Big) \qquad (5.8)$$
and assume once again this limit infimum is finite. We address three subcases.
The first two are when either $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), h + \rho_n w_n)$
for $n$ sufficiently large, or $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), -(h + \rho_n w_n))$
for all $n$ sufficiently large. With either we have, via the finiteness of (5.8), that
in the former case (5.6) holds for all $n$; in the latter, after renaming $h$ as
$-h$ and $w$ as $-w$, (5.6) then holds for $n$ sufficiently large. The third subcase
is when $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), \pm(h + \rho_n w_n))$ infinitely often.
We may reduce this to (5.6) on taking a subsequence along which a positive
sign is always chosen.
The only case remaining occurs when the limit infimum is not finite in which
case any arbitrarily chosen sequence will attain the limit infimum. Thus we may
assume that the limit infimum in (5.5) is attained by a subsequence along which
(5.6) holds for all n.
Now consider $\rho_n \downarrow 0$ and $w_n \to w$ achieving the limit infimum (5.9).
When this is infinite we have (5.10) holding trivially, since (5.4) implies the
quotient in (5.9) is bounded below by $\langle 2\nabla^2 v(x), ww^t\rangle$. Finiteness of the limit
implies finiteness of the limit (5.5). Arguing as before we may once again
assume that (5.6) holds along this sequence. Then when $\nabla v(x) = 0$ we have
by (5.7) and (5.4) that for arbitrary $w$
$$\liminf_{\rho\downarrow 0,\ w'\to w} \frac{1}{\rho}\Big(f''_+(x, \nabla v(x), h + \rho w') - f''_+(x, \nabla v(x), h) - \rho\langle 2\nabla^2 v(x)h, w\rangle\Big)$$
This section explores the relationship between subjets and coderivatives. This
is achieved by first obtaining results relating the subjet and the contingent
graphical derivative and then appealing to the results of Rockafellar et al.
(1997). This will allow us to explore the connection between the symmetric
operators used in subjets and the symmetry notions introduced in Rockafellar
et al. (1997) for coderivatives. We find once again that the crucial operators
are those exposed by rank-one supports. In Dolecki et al. (1978) it was shown
that a necessary and sufficient condition for a function $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ to be
$\Phi$-convex (for $\Phi \subseteq C^2(\mathbb{R}^n)$) is for $f$ to be lower semi-continuous and
minorized by at least one element of $\Phi$ (that is, $\Phi_2$-bounded or alternatively
prox-bounded).
Recall that a function $f$ belongs to $C^{1,1}(\mathbb{R}^n)$ when its gradient exists everywhere
and the gradient itself is a locally Lipschitz function.
$$\le \frac{1}{t_n}\big(v(x + t_n y) - v(x)\big) - \frac{1}{t_n}\big(v(x + t_n h_n) - v(x)\big) \qquad (6.1)$$
where $v(x + ty) := v(x) - t\,\varepsilon(ty)\|y\|^2$ (with $v(x)$ taken as an arbitrary fixed
value) is differentiable at $x$ with $\nabla v(x) = 0$ and $\varepsilon(\cdot): \mathbb{R}^n \to \mathbb{R}$ is as given in
Lemma 4.1. If in addition we assume that $f$ is locally Lipschitz and $v$ may be
chosen so that it is strictly differentiable at $x$, which may be achieved if $f$ is
$C^{1,1}(\mathbb{R}^n)$, then
and so $(h, \nabla^2 v(x)h) \in T_{\operatorname{Graph}\partial f}(x, \nabla\varphi(x))$, the contingent tangent cone to
$\operatorname{Graph}\partial f$, and
$$\nabla_y^2 v(x + ty) = \nabla_y^2\big(f(x + ty)\big) = t^2\,\nabla^2 f(x + ty).$$
Thus almost everywhere we have $\|\nabla_y^2 v(x + ty)\| \le t^2\big(2\|\nabla^2 f(x + ty)\|\big) := t^2 L$,
and so for fixed $t$ we have that $y \mapsto v(x + ty)$ is $C^{1,1}$ with a Lipschitz constant $t^2 L$.
Next observe that for fixed $t > 0$ we have by the chain rule that $\nabla_y v(x + ty) =
t\,\nabla_u v(u)\big|_{u = x + ty} = t\,\nabla v(x + ty)$. Thus
$$\limsup_{y\to th'} \frac{\varepsilon(y)\|y\|^2 - \varepsilon(th')\|th'\|^2}{\|y - th'\|}
= \limsup_{y'\to h'} \frac{t^2\big(g(y') - g(h')\big)}{\|ty' - th'\|}
= \limsup_{y'\to h'} \frac{t\big(g(y') - g(h')\big)}{\|y' - h'\|}.$$
To prove (6.4) consider the Clarke subdifferential of the function $y \mapsto tg(y)$. As
$\frac{1}{t}\big(v(x + ty) - v(x)\big) = -g(y)$ we have $tg(y) = -(v(x + ty) - v(x))$. The chain
rule implies $-t\,\partial v(x + th') = \partial(tg)(h')$. As $v$ is strictly differentiable at $x$, we
have $\partial v(x + th') \to \nabla v(x) = 0$ as $t \downarrow 0$. Thus
$$tg(y) - tg(h') = \langle z(t), y - h'\rangle,$$
where $z(t) \in \partial(tg)(y')$ for some $y'$ in the interval between $y$ and $h'$. Thus by
the upper semi-continuity of the Clarke subgradient
$$\limsup_{y\to h'} \frac{|tg(y) - tg(h')|}{\|y - h'\|}
\le \sup\big\{\|z\| \;\big|\; z \in \partial(tg)(y') \text{ for } y' = \lambda y + (1 - \lambda)th' \text{ and } \lambda \in (0, 1)\big\} = o(t).$$
Hence, where the little "$o$" notation is taken to mean that implied by (6.4), define the
function $k(y) := f(x + y)$ and put $\psi(y) := \langle\nabla\varphi(x), y - t_n h_n\rangle + \tfrac{1}{2}\langle\nabla^2\varphi(x), yy^t -
t_n^2 h_n(h_n)^t\rangle \in C^2(\mathbb{R}^n)$. Then by (6.5) we have
$$(h, \nabla^2\varphi(x)h) \in \frac{\operatorname{Graph}\partial f - (x, \nabla\varphi(x))}{t_n} + o(1)B_1,$$
which immediately implies $(h, \nabla^2\varphi(x)h) \in T_{\operatorname{Graph}\partial f}(\bar{x}, \nabla\varphi(x))$.
Remark 6.1 The construct of Lemma 4.1 ensures that we can always find
a strictly differentiable remainder term for the second-order subjet expansion.
Unfortunately this construction does not preserve the first equation in (6.3) and
so we are unable to use it here.
Set $L_\lambda(x, p) := (x + \lambda p, p): \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, which is clearly linear and invertible.
The following is a corollary.
Now use Proposition 3.2. If $Q \in \mathcal{E}\big(\partial^{2,-} f(0, 0), h\big)$, then for all $\lambda > 0$ sufficiently
small we have $Q_\lambda \in \mathcal{E}\big(\partial_\lambda^{2,-} f(0, 0), h_\lambda\big)$, where $h_\lambda = (I + \lambda Q)h \to h$
as $\lambda \to 0$. Observe that Theorem 3.6 implies $\partial_\lambda^{2,-} f(0, 0) = \partial^{2,-} f_\lambda(0, 0)$ and
so $Q_\lambda \in \mathcal{E}\big(\partial^{2,-} f_\lambda(0, 0), h_\lambda\big)$. For this $Q_\lambda$ to satisfy the inclusion (6.9), we
need only show that $(f_\lambda)''_+(0, 0, h_\lambda) = (f_\lambda)''_-(0, 0, h_\lambda)$. First we observe that
$(f_\lambda)''_-(0, 0, h_\lambda) \le (f_\lambda)''_+(0, 0, h_\lambda)$ always holds. To establish the reverse inequality,
use Proposition 3.2 again with $A = \partial^{2,-} f_\lambda(0, 0)$ to establish
The last inequality follows via direct calculation as follows. From (6.6) we have
$f_\lambda(0) = f(0)$ and
Now use (3.5) and a Neumann series (see for example Anderson et al. (1969))
to deduce that for $\lambda > 0$ sufficiently small
Remark 6.2 Under the assumptions of Corollary 6.1, we have for all h that
In the following we use the convention $\sup\emptyset = -\infty$ and $\inf\emptyset = +\infty$. Put
and
Proof The containment (6.12) follows immediately from Theorem 6.1 and Corollary
6.1, noting that $\partial_p f = \partial f$ locally. The inequality follows from $w^t Q w =
f''_+(\bar{x}, \bar{v}, w) = f''_-(\bar{x}, \bar{v}, w)$ for all $Q \in \mathcal{E}\big(\partial^{2,-} f(\bar{x}, \bar{v}), w\big)$.
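The Neumann-series step invoked above (inverting $I - \lambda Q$ for small $\lambda$ by summing powers of $\lambda Q$) can be checked numerically; the matrix below is an arbitrary illustration, not data from the text:

```python
import numpy as np

# Neumann series: for ||A|| < 1, (I - A)^{-1} = sum_{k >= 0} A^k.
# Here A = lam * Q with lam chosen small enough for convergence.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))
lam = 0.1 / np.linalg.norm(Q, 2)     # ensures ||lam * Q||_2 = 0.1 < 1
A = lam * Q

S = np.zeros_like(A)
term = np.eye(4)
for _ in range(60):                  # partial sums of the series
    S += term
    term = term @ A

exact = np.linalg.inv(np.eye(4) - A)
assert np.allclose(S, exact, atol=1e-10)
```

With $\|\lambda Q\| = 0.1$ the truncation error after 60 terms is of order $10^{-60}$, so the partial sum agrees with the exact inverse to machine precision.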
$$\langle Qw, h\rangle \le \langle p, h\rangle \quad \text{for all } Q \in \mathcal{E}\big(\partial^{2,-} f(x, v), h\big) \qquad (6.16)$$
for all $n$ and such that $\pm w_n \in \operatorname{dom} f''_-(x, v, \cdot)$, $\pm w_n \to \pm\bar{w}$, and for either the
plus or minus sign we have $f''_+(x, v, \pm w_n) = f''_-(x, v, \pm w_n) \to f''_-(x, v, \pm\bar{w}) =
f''_+(x, v, \pm\bar{w})$. We may need to take the minus sign if $f''_-(x, v, \bar{w}) = f''_-(x, v, -w) =
f''_+(x, v, w)$. By relabeling the vectors we can assume without loss of generality
that we may use $w_n \to \bar{w}$. The existence of such a sequence may be established
by invoking Theorem 10.2 of Rockafellar (1970) on a simplicial convex subset
$S$ of $\operatorname{rel\text{-}int}\operatorname{dom} f''_-(x, v, \cdot) \cup \{\bar{w}\} \subseteq b^1\big(\partial^{2,-} f(x, v)\big)$ containing $\bar{w}$ as a vertex.
Also, as $b^1\big(\partial^{2,-} f(x, v)\big)$ is a polyhedral convex set, we may contain any sequence
$w_n \in b^1\big(\partial^{2,-} f(x, v)\big)$ with $w_n \to \bar{w}$ in such a simplex. Theorem 10.2 of
the cited reference states that the convex function $f''_+(x, v, \cdot) + r\|\cdot\|^2$ is upper
semi-continuous relative to $S$. As the function is by construction lower semi-continuous
on all of $S$, it must also be continuous at $\pm\bar{w}$. It follows that
$f''_-(x, v, w_n) \ge \langle p, w_n\rangle$ for all $p \in D^*(\partial_p f)(x, v)(w_n)$. Thus for any convergent
sequence $(p_n, -w_n) \in \big(T_{\operatorname{Graph}\partial_p f}(\bar{x}, v)\big)^\circ$ converging to $(p, -\bar{w})$, we have
$$\lim_{n\to\infty} f''_-(x, v, w_n) = f''_-(x, v, \bar{w}) \ge \langle p, \bar{w}\rangle.$$
As the graph of $D^*(\partial_p f)(x, v)(\cdot)$ equals $\big(T_{\operatorname{Graph}\partial_p f}(\bar{x}, v)\big)^\circ$, a closed convex
cone, $p$ may be taken as an arbitrary element of $D^*(\partial_p f)(x, v)(\bar{w})$. Thus
(6.13) holds for all $w \in b^1\big(\partial^{2,-} f(x, v)\big)$ with $f''_-(x, v, w) = f''_+(x, v, w)$.
$$f''(\bar{x}, \bar{v}, w) = S\big(D(\partial f)(\bar{x}, \bar{v})(w)\big)(w) \le S\big(D^*(\partial f)(\bar{x}, \bar{v})(w), w\big). \qquad (6.18)$$
Proof The first equality in (6.18) follows from an application of Corollary 6.2
of Poliquin and Rockafellar (1996), which states that (6.19) holds for all $w$.
Also, without loss of generality we may translate $\bar{v}$ to zero and consider the inequality
for the function $g(\cdot) := f(\cdot) - \langle\bar{v}, \cdot\rangle + \frac{r}{2}\|\cdot - \bar{x}\|^2$. By the results of Rockafellar
and Wets (1998) and Poliquin and Rockafellar (1996) (see also Corollary
6.1 of Eberhard (2000)), under the current assumptions $h \mapsto g''(\bar{x}, 0, h)$ is convex.
Hence (6.19) implies a representation in which
the subgradient $\partial_p$ coincides with $\partial$, the usual one from convex analysis.
Now use the fact that the support function of the convex subdifferential
$\partial\big(\tfrac{1}{2}g''(\bar{x}, 0, \cdot)\big)(w)$ in the direction $w$ is equal to the one-sided radial directional
derivative, and on removing the translations (using some basic calculus) we obtain the
equality in (6.18) holding for all $w$. The final inequality is true for all $w$, as
shown by R. T. Rockafellar and D. Zagrodny in Rockafellar et al. (1997) and
stated in Theorem 6.1.
In this section we give a few examples showing that formulae (6.12) and (6.15)
can be used to obtain estimates of graphical derivatives and coderivatives.
These estimates provide a connection between classical optimality concepts
based on derivatives of smooth functions and those using graphical derivatives.
Another approach which is successful in achieving this goal is the study of op-
timality conditions for functions (see Yang and Jeyakumar (1992)) and
for convex composite functions (see Yang (1998)) both of which are particular
examples of prox-regular functions.
We shall apply these ideas to a nonsmooth penalization of the Lagrangian
associated with a standard smoothly constrained mathematical programming
problem. Indeed we do not have to assume a priori any regularity of the
constraint set but allow a condition to arise out of the construction of the rank-
1 exposed facet of the subjet of the penalized Lagrangian. In this way, what
appears to be a new and in some ways a more refined second-order sufficiency
condition for a strict local minimum is derived.
For unconstrained nonsmooth functions the optimality conditions we inves-
tigate, when using various second-order subdifferential objects, are as follows.
The necessary (sufficient) condition of the second kind holds at $\bar{x}$ when we have
We shall say nothing here about conditions of the fourth kind, leaving this
to a later paper. The conditions of the first kind are easier to study. This was
first done by Auslender (1984), Studniarski (1986) and later by Ward (1995),
Ward (1994). Some related results may be found in Eberhard (2000). In this
context the sufficient optimality condition of the first kind is equivalent to the
concept of a strict local minimum of order two (see Studniarski (1986) and
Ward (1995)).
It is immediate from definitions that when (7.1) holds we have $f''_+(\bar{x}, 0, h) \ge \beta > 0$
in all directions $h$.
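The quadratic growth behind a strict local minimum of order two can be seen on a simple hypothetical function (the choice $f(x) = \|x\|^2$ and $\beta = 2$ below are illustrative, not from the text):

```python
import numpy as np

# f(x) = ||x||^2 has a strict local minimum of order two at xbar = 0:
# f(xbar + t*h) - f(xbar) >= (beta/2)*t**2 for unit h, with beta = 2.
f = lambda x: float(np.dot(x, x))
xbar = np.zeros(3)
beta = 2.0

rng = np.random.default_rng(1)
for _ in range(100):
    h = rng.standard_normal(3)
    h /= np.linalg.norm(h)           # unit direction
    for t in (1e-3, 1e-2, 1e-1):
        growth = f(xbar + t * h) - f(xbar)
        assert growth >= 0.5 * beta * t**2 - 1e-12
```

Here the growth bound holds with equality, which is why $\beta = 2$ is the largest admissible constant for this $f$.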
Remark 7.3 When $\bar{x}$ is a strict local minimum of order two, then by Remark 7.1
the sufficient conditions of the first kind hold. Consider the case
when $f'(\bar{x}, h) \ge 0$ and $f'(\bar{x}, -h) > 0$ with $f$ prox-regular and subdifferentially
continuous. Now $f'(\bar{x}, -h) > 0$ implies the existence of a $\delta > 0$ such that
$f(\bar{x} + t(-h)) - f(\bar{x}) \ge \delta t$ for $t$ small, so we have
$f''_+(\bar{x}, 0, -h) = +\infty$ and $f''_+(\bar{x}, 0, h) = f''_-(\bar{x}, 0, h)$. Now invoke Corollary
6.1 to obtain
$$0 < f''(\bar{x}, 0, h) \le S\big(D(\partial f)(\bar{x}, 0)(h)\big)(h).$$
Thus there exists a $p \in D(\partial f)(\bar{x}, 0)(h)$ such that $\langle p, h\rangle > 0$. This is precisely
the sufficient condition of the second kind.
The sufficient conditions of the third kind were studied (for the prox-regular
subdifferentially continuous function) in Poliquin et al. (1998) in association
with the concept of the tilt stable local minimum.
Definition 7.4 A point $\bar{x}$ is said to give a tilt stable local minimum of the function
$f: \mathbb{R}^n \to \overline{\mathbb{R}}$ if $f(\bar{x})$ is finite and there exists $\delta > 0$ such that the mapping
$$M: v \mapsto \operatorname*{argmin}_{\|x - \bar{x}\| \le \delta}\big\{f(x) - f(\bar{x}) - \langle v, x - \bar{x}\rangle\big\}$$
is single-valued and Lipschitz continuous on some neighbourhood of $0$ with $M(0) = \bar{x}$.
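For a simple hypothetical quadratic the tilt mapping $M$ can be computed in closed form and is visibly single-valued and Lipschitz (the function and radius below are illustrative choices, not from the text):

```python
import numpy as np

# Hypothetical illustration: f(x) = x**2, xbar = 0, delta = 1.  Then
# M(v) = argmin_{|x| <= 1} { x**2 - v*x } = v/2 for |v| <= 2,
# a single-valued Lipschitz mapping with M(0) = xbar = 0.
def M(v, delta=1.0):
    xs = np.linspace(-delta, delta, 200001)   # fine grid on the ball
    return xs[np.argmin(xs**2 - v * xs)]

for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert abs(M(v) - v / 2) < 1e-4           # matches closed form v/2
assert abs(M(0.0)) < 1e-8                     # M(0) = xbar
```

Since $M(v) = v/2$ here, $M$ has Lipschitz constant $1/2$ near $0$, so the tilt stability of this minimum is explicit.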
Proof Using (6.19), there exists a $p \in D(\partial_p f)(\bar{x}, 0)(h')$ if and only if
$\partial_p\big(\tfrac{1}{2}f''_-(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$, which is only possible if $f''_-(\bar{x}, 0, h') < +\infty$. Indeed, by Corollary 6.1 of
Eberhard (2000) (see also the results of Rockafellar and Wets (1998) and
Poliquin and Rockafellar (1996)), under the current assumptions $h \mapsto
f''_+(\bar{x}, 0, h) + r\|h\|^2 = g''_-(\bar{x}, 0, h)$ (where $g(\cdot) := f(\cdot) + \frac{r}{2}\|\cdot - \bar{x}\|^2$ as in (6.21))
is convex and proper (see Rockafellar and Wets (1998), Theorem 13.40). Thus
$-\infty < f''_-(\bar{x}, 0, \cdot)$ and consequently the directional derivative (6.21) is also never
equal to $-\infty$ in any direction. Invoking Theorem 23.3 of Rockafellar (1970), we
have the convex analysis subdifferential $\partial\big(\tfrac{1}{2}g''(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$ and consequently
$\partial_p\big(\tfrac{1}{2}f''(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$. Thus by (6.19) we have $h' \in \operatorname{dom} D(\partial_p f)(\bar{x}, 0)(\cdot) :=
\{h \mid D(\partial_p f)(\bar{x}, 0)(h) \neq \emptyset\}$ and (7.5) holding.
As $f''_s(\bar{x}, 0, h) = \min\{f''(\bar{x}, 0, h), f''(\bar{x}, 0, -h)\} < +\infty$ if and only if $\pm h \in
\operatorname{dom} D(\partial_p f)(\bar{x}, 0)(\cdot)$, we have (7.6) holding. Finally note that $f''_-(\bar{x}, 0, h') <
+\infty$ implies $f'(\bar{x}, h') \le 0$, and we have the relations preceding (7.6) holding.
1. The necessary conditions of the second kind imply the necessary conditions
of the first kind, while the sufficient conditions of the first kind hold if
and only if the sufficient conditions of the second kind hold.
2. The sufficient conditions of the first and second kind hold if and only if
$\bar{x}$ is a strict local minimum of order two.
and so $0 \le (<)\ f''(\bar{x}, 0, h)$ if and only if $0 \le (<)\ S\big(D(\partial f)(\bar{x}, 0)(h)\big)(h)$. Suppose
the necessary (sufficient) conditions of the first kind hold. Then when $h \in
\operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$ we have $f''_-(\bar{x}, 0, h) < +\infty$ and hence $f'(\bar{x}, h) \le 0$, implying
$0 \le (<)\ f''_-(\bar{x}, 0, h)$. The existence of $p \in D(\partial f)(\bar{x}, 0)(h)$ with $0 < \langle p, h\rangle$ follows
from (7.7) when $0 < f''(\bar{x}, 0, h)$. When the necessary conditions of the second
kind hold, then there exists a $p \in D(\partial f)(\bar{x}, 0)(h)$ with $0 \le \langle p, h\rangle$, and (7.7)
implies $0 \le f''(\bar{x}, 0, h)$.
When the necessary (sufficient) conditions of the second kind hold, take $h$
such that $f'(\bar{x}, h) \le 0$. Suppose first that $f''_-(\bar{x}, 0, h) < +\infty$, in which case $h \in
\operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$. Then (7.7) along with the necessary (sufficient) conditions
of the second kind imply $0 \le (<)\ f''_-(\bar{x}, 0, h)$. Otherwise $f''_-(\bar{x}, 0, h) = +\infty > 0$,
as required in the necessary (sufficient) conditions of the first kind. As the sufficient
conditions of the first kind hold if and only if $\bar{x}$ is a strict local minimum
of order two, the sufficient conditions of the second kind are also equivalent to $\bar{x}$
being a strict local minimum of order two.
Finally suppose $\bar{x}$ is a tilt stable local minimum, and hence the sufficient
conditions of the third kind hold at $\bar{x}$. Then (6.11) of Theorem 6.1 implies, for all
$h \in \operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$ and $p \in D(\partial f)(\bar{x}, 0)(h)$, that $0 \le (<)\ \langle p, h\rangle$. Thus the
sufficient conditions of the second kind follow and $\bar{x}$ is a strict local minimum
of order two.
When $A(x)$ is finite, the convex hull of the left-hand side of (7.8) is contained
in the right-hand side of (7.8).
Then
where
$$\bar{R}(x, h, p) = \Big\{\lambda \in \mathbb{R}^{|A|}_+ \;\Big|\; \exists (t_n, h_n) \to (0^+, h) \text{ such that } \lambda_\alpha = 0
\text{ for all } \alpha \in \{i \mid i \notin A(x + t_n h_n) \text{ for } n \text{ sufficiently large}\}\Big\}$$
and
$$\emptyset \neq Q(x, h, \lambda) = \Big\{\lambda \in \mathbb{R}^{|A|}_+ \;\Big|\; \lambda_\alpha = 0 \text{ if } \alpha \notin \bar{A}(x, h)\Big\}.$$
Proof The first containment for $f(x) = \sup_{\alpha\in A} f_\alpha(x)$ follows from definitions,
in that when $(p_\alpha, Q_\alpha) \in \partial^{2,-} f_\alpha(x)$ and $\alpha \in A(x)$ we have
where the last term is clearly of small order when $A(x)$ is countably finite. This
implies $\sum_{\alpha\in A(x)} \lambda_\alpha(p_\alpha, Q_\alpha) \in \partial^{2,-} f(x)$. Next note that if $\alpha \in \bar{A}(x, h)$, then
by the lower semi-continuity of $f$ we have for $x_n = \bar{x} + t_n h_n$ that
and so $\alpha \in A(x)$. Observe that in this case, if $f_\alpha \in C^2(\mathbb{R}^n)$ and $A$ is finite, then
$f$ is locally Lipschitz, prox-regular and subdifferentially continuous (see Example
2.9 of Poliquin and Rockafellar (1996)). Also $f$ is regular (see Rockafellar
(1982) regarding lower-$C^2$ functions) and semi-smooth (see Mifflin (1977) regarding
suprema of semi-smooth functions). Hence by the results of Spingarn
(1981) we must have $\partial f$ submonotone and hence directionally continuous in
the sense that if $(t_n, h_n) \to (0^+, h)$ and $p_n \in \partial f(x + t_n h_n)$ with $p_n \to p$, then
$$\liminf_{n\to\infty} \frac{1}{t_n^2}\big(f(\bar{x} + t_n h_n) - f(x) - t_n\langle p, h_n\rangle\big) = \frac{1}{2}\langle Q, hh^t\rangle$$
$$\Big\{\sum_{\alpha\in A(x)} \lambda_\alpha \nabla^2 f_\alpha(x) \;\Big|\; \lambda \in \mathbb{R}^{|A|}_+ \text{ and } \sum_{\alpha\in A(x)} \lambda_\alpha = 1
\text{ with } p = \sum_{\alpha\in A(x)} \lambda_\alpha \nabla f_\alpha(x)\Big\} \qquad (7.13)$$
In particular
$$0 = \sum_{i\in\bar{A}(x,h)} \lambda_i \nabla f_i(x) \quad\text{for some}\quad \sum_{i\in\bar{A}(x,h)} \lambda_i = 1 \text{ with } \lambda_i \ge 0.$$
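The stationarity condition over the active set can be checked on a hypothetical two-piece max function (the pieces below are illustrative choices, not from the text):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with f1(x) = x and f2(x) = -x, i.e. f(x) = |x|.
# At the minimizer x = 0 both pieces are active, and the convex
# combination lambda = (1/2, 1/2) of the active gradients vanishes:
#   0 = 0.5 * f1'(0) + 0.5 * f2'(0) = 0.5 * 1 + 0.5 * (-1).
grads = np.array([1.0, -1.0])   # gradients of the active pieces at x = 0
lam = np.array([0.5, 0.5])      # multipliers: lam >= 0, sum(lam) = 1
assert abs(lam @ grads) < 1e-15
assert np.all(lam >= 0) and abs(lam.sum() - 1.0) < 1e-15
```

The same multipliers certify $0 \in \partial f(0) = \operatorname{co}\{\nabla f_1(0), \nabla f_2(0)\} = [-1, 1]$.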
Proof By Example 2.9 of Poliquin and Rockafellar (1996) and the discussion
of Rockafellar and Wets (1998), we have all the assumptions of Corollary 6.2
holding.
A3 all the constraints fi are bounded below on Rn (once again this can easily
be arranged via a reformulation).
4. If $(\bar{x}, \bar{z}) \in X \times \mathbb{R}^m_+$ is a saddle point (as given in (7.15)), then $\bar{x}$ is an
optimal solution for the problem (7.14).
Writing $U(x, d) = g \circ F(x)$ as such a composite, for any fixed $(d, r)$ we have
$x \mapsto U(x, d, r)$ prox-regular and subdifferentially
continuous (see Example 2.9 of Poliquin and Rockafellar (1996)),
and, as observed before, also Clarke regular (see Rockafellar and Wets
(1998)), semi-smooth (see Mifflin (1977)), and hence submonotone and
directionally upper semi-continuous (see Spingarn (1981) and Rockafellar
(1982)). It is a very well-behaved function indeed. We now derive the
necessary optimality condition associated with this Lagrangian penalty
method.
Then the following are sufficient conditions for a strict minimum of order
two at $\bar{x}$ for the problem (7.14) with $X = \mathbb{R}^n$. Suppose there exists $d \in \mathbb{R}^m_+$
satisfying:
$$\bar{A}(\bar{x}, h) := \big\{i \in A(\bar{x}) \;\big|\; \exists (t_n, h_n) \to (0^+, h) \text{ such that } i \in A(\bar{x} + t_n h_n) \text{ for all } n\big\} \qquad (7.17)$$
and there exists $(\lambda_0, \lambda_1, \ldots, \lambda_m) \ge 0$ with $\lambda_i = 0$ if $i \notin \bar{A}(\bar{x}, h)$ and with
$\lambda_0 = 0$ if $\langle\nabla_x L(\bar{x}, d), h\rangle < 0$, such that
$$0 < \Big\langle \lambda_0 \nabla_x^2 L(\bar{x}, d) + \sum_{i\in\bar{A}(\bar{x},h)} \lambda_i \nabla^2 f_i(\bar{x}),\ hh^t\Big\rangle. \qquad (7.18)$$
i€ii(~,h)
Remark 7.6 We note that if (7.18) holds, then for any $r_i > 0$ there exists
$\lambda_i \ge 0$ with $\sum_{i\in\bar{A}(\bar{x},h)} \lambda_i = 1$ such that
$$\lambda_0 = \Big(\sum_{i\in\bar{A}(\bar{x},h)} \tfrac{1}{r_i} + \lambda_0\Big)^{-1}
\qquad\text{and}\qquad
\lambda_i = \tfrac{1}{r_i}\Big(\sum_{i\in\bar{A}(\bar{x},h)} \tfrac{1}{r_i} + \lambda_0\Big)^{-1}$$
for which $U(\bar{x}, d, r) = f_0(\bar{x})$ and the maximum is attained for the indices $A(\bar{x}) \cup
\{0\}$. By assumption 2 we have $0 \in \partial U(\bar{x}, d, r)$ and so
$$U(x_n, d, r) \ge U(\bar{x}, d, r) + \frac{1}{2}\Big\langle \big(Q - o(1)I\big),
\Big(\frac{x_n - \bar{x}}{\|x_n - \bar{x}\|}\Big)\Big(\frac{x_n - \bar{x}}{\|x_n - \bar{x}\|}\Big)^t\Big\rangle\, \|x_n - \bar{x}\|^2.$$
This implies by (7.19) that for $n$ sufficiently large and some $c > 0$ we have
since $\sum_{i=1}^m d_i f_i(\bar{x}) \le 0 = \sum_{i=1}^m \bar{d}_i f_i(\bar{x})$, as $f_i(\bar{x}) \le 0$ for all $i = 1, \ldots, m$. Thus
to verify that $\bar{x}$ is indeed a strict local minimum of order two for (7.20) we need
only consider whether it is also one for $x \mapsto U(x, d, \bar{r})$ on $B_\delta(\bar{x})$ for some $\delta > 0$. Indeed,
if $\bar{x}$ is a strict local minimum of order two for $x \mapsto U(x, d, \bar{r})$ on $B_\delta(\bar{x})$, then for all
feasible $x' \in B_\delta(\bar{x})$ we have $U(x', d, \bar{r}) = \max\{L(x', d), \bar{r}_1 f_1(x'), \ldots, \bar{r}_m f_m(x')\} =
L(x', d) \le f_0(x')$. On recalling that $f_0(\bar{x}) = U(\bar{x}, d, \bar{r}) = 0$, we thus have that
$\bar{x}$ is also a strict local minimum of order two for the problem (7.14).
Next note that the first-order condition for this amounts to (7.22).
We now use sufficient conditions of the first kind in the form of (7.4). We need
to show that for all $h$ with $U'(\bar{x}, d, \bar{r}, h) \le 0$ and $h \in b^1\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0)\big)$,
for some $\varepsilon > 0$ we have the existence of a $Q \in \mathcal{E}_\varepsilon\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0), h\big)$ with
$0 < \langle Q, hh^t\rangle$. As (7.22) holds we have $U'(\bar{x}, d, \bar{r}, h) \ge 0$, and hence we need to
consider only those $h$ such that $U'(\bar{x}, d, \bar{r}, h) = 0$.
As $1 - \lambda_0 = \sum_{i\in\bar{A}(\bar{x},h)} \lambda_i$, we have by (7.23) that $\lambda_0 \nabla L(\bar{x}, d) + (1 - \lambda_0)p_1 = 0$
and hence, with $Q'$ satisfying (7.19), that $\langle Q', hh^t\rangle > 0$. When $h \in b^1\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0)\big)$,
there always exists a $Q \in (Q' + P(n)) \cap \mathcal{E}_\varepsilon\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0), h\big) \neq \emptyset$ such that
(7.19) holds, and so we have $\langle Q, hh^t\rangle > 0$; the desired conclusion follows from
(7.4).
8 APPENDIX
Lemma 4.1 claims that it is sufficient to consider only the class $J_2(0, \bar{x})$ generated
by
On reflection one can see that this corresponds to the second-order char-
acterization of subjets as provided in the first-order case by Proposition 1.2
(page 341) in Deville et al. (1993). We extract and extend to the second-order
level the relevant parts of this argument in the proof of Lemma 4.1.
Proof [Lemma 4.1] The only problem with $\omega$ in (4.1) is that it is not necessarily
$C^2(\mathbb{R}^n)$. As a $C^2(\mathbb{R}^n)$ bump function exists, the construction found in
Lemma 1.3 of Deville et al. (1993) (page 340) leads to a function $d: \mathbb{R}^n \to \mathbb{R}_+$,
which is well-defined as $\sum_{n=0}^{\infty} b(nx) \ge 1$. Without loss of generality we may assume
that $\omega(\cdot) \ge 0$. Now arguing as in Proposition 1.2 of Deville et al. (1993) (page
341) we define the required function via an integral, as in the proof of
Proposition 1.2 of Deville et al. (1993) (page 342). Thus we have
$$\nabla\psi(x) = p'_N(d(x))\nabla d(x) \quad\text{and}\quad
\nabla^2\psi(x) = p''_N(d(x))\nabla d(x)\nabla d(x)^t + p'_N(d(x))\nabla^2 d(x), \qquad (8.4)$$
for all small $x \neq 0$ (that is, $\varepsilon = O(\|x\|)$). Letting $M$ be a global bound for
$\|\nabla b(x)\|$ and $\|\nabla^2 b(x)\|$, we have, since $\nabla b(x)$ and $\nabla^2 b(x)$ are zero outside
the unit ball, corresponding bounds for $\nabla d$ and $\nabla^2 d$;
and since $\|x\| \le d(x) \le K\|x\|$ for $\|x\| \le 1$, it follows that $p'_N(d(x)) = o(\|x\|)$.
Hence, since $p''_N(d(x)) \to 0$ and $\nabla d(x)$ is bounded as $x \to 0$, we have
$d$ Lipschitz on the (compact) unit ball. Since $d(x) = 0$ outside the unit ball, $d$
is globally Lipschitz. Thus
References
Anderson W. Jr. and R. J. Duffin (1969), Series and Parallel Addition of Matrices,
J. Math. Anal. Appl., Vol. 26, pp. 576-594.
Andromonov M. (2001), Global Minimization of Some Classes of Generalized
Convex Functions, PhD Thesis, University of Ballarat, Australia.
Attouch H. (1984), Variational Convergence for Functions and Operators, Pitman
Adv. Publ. Prog., Boston-London-Melbourne.
Aubin J. P. and Frankowska H. (1990), Set Valued Analysis, Birkhauser.
Auslender A. (1984), Stability in Mathematical Programming with Nondifferentiable
Data, SIAM J. Control and Optimization, Vol. 22, pp. 239-254.
Balder E. J. (1977), An Extension of Duality Relations to Nonconvex Optimization
Problems, SIAM J. Control and Optim., Vol. 15, pp. 329-343.
Ben-Tal A. (1980) Second-Order and Related Extremality Conditions in Non-
linear Programming, J. Optimization Theory Applic. Vol. 31, pp. 143-165.
Ben-Tal A. and Zowe J. (1982), Necessary and Sufficient Conditions for a Class
of Nonsmooth Minimization Problems, Mathematical Programming Study
19, pp. 39-76.
Bonnans J . F., Cominetti R. and Shapiro A. (1999) Second Order Optimal-
ity Conditions Based on Parabolic Second Order Tangent Sets, SIAM J.
Optimization, Vol. 9, No. 2, pp. 466-492.
Crandall M., Ishii H. and Lions P.-L. (1992), User's Guide to Viscosity Solu-
tions of Second Order Partial Differential Equations, Bull. American Math.
Soc., Vol. 27, No. 1, pp. 1-67.
Cominetti R. and Penot J.-P. (1995), Tangent Sets to Unilateral Convex Sets, C.
R. Acad. Sci. Ser. I Math., 321, pp. 1631-1636.
1 INTRODUCTION
In this section, we introduce some concepts and obtain some basic properties
of augmented Lagrangians.
Let $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$ and $\varphi: \mathbb{R}^n \to \overline{\mathbb{R}}$ be an extended real-valued function.
Consider the primal problem
$$\inf_{x\in\mathbb{R}^n} \varphi(x).$$
GENERALIZED AUGMENTED LAGRANGIAN 103
where
$$\varphi(x) = \begin{cases} f(x), & \text{if } x \in X_0,\\ +\infty, & \text{otherwise,}\end{cases}$$
So considering the model (2.1) provides a unified approach to the usual con-
strained and unconstrained optimization problems.
A simple way to define the dualizing parametrization function for (CP) is:
$$f(x, u) = \begin{cases} f(x), & \text{if } x \in X_u,\\ +\infty, & \text{otherwise,}\end{cases}$$
where
In the sequel, we will use this dualizing parameterization function for (CP).
Definition 2.3 Consider the primal problem (2.1). Let $f$ be any dualizing
parameterization function for $\varphi$, and $\sigma$ be a generalized augmenting function.
(i) The generalized augmented Lagrangian (with parameter $r > 0$) $\bar{l}: \mathbb{R}^n \times
\mathbb{R}^m \times (0, +\infty) \to \overline{\mathbb{R}}$ is defined by
Remark 2.2 (i) By Definition 2.2, any augmenting function used in Definition
11.55 of Rockafellar et al. (1998) is a generalized augmenting function.
Thus, any augmented Lagrangian defined in Definition 11.55 of Rockafellar et
al. (1998) is also a generalized augmented Lagrangian.
(ii) If $\sigma$ is an augmenting function in the sense of Rockafellar et al. (1998),
then for any $\gamma > 0$, $\sigma^\gamma$ is a generalized augmenting function. In particular,
(a) let $\sigma(u) = \|u\|_1$; then for any $\gamma > 0$, $\sigma^\gamma(u) = \|u\|_1^\gamma$ is a generalized
augmenting function;
(b) take $\sigma(u) = \|u\|_\infty$; then $\sigma^\gamma(u) = \|u\|_\infty^\gamma$ is a generalized augmenting
function for any $\gamma > 0$;
(c) let $\sigma(u) = \sum_{j=1}^{m} |u_j|^\gamma$, where $\gamma > 0$; then $\sigma(u)$ is a generalized augmenting
function.
It is clear that none of these three classes (a), (b) and (c) of generalized
augmenting functions is convex when $\gamma \in (0, 1)$; namely, none of them is an
augmenting function.
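The failure of convexity for $\gamma \in (0, 1)$ is easy to verify at a single midpoint; the scalar instance below is a hypothetical illustration:

```python
# sigma(u) = |u| is convex, but sigma(u)**gamma with gamma = 1/2 is not:
# midpoint convexity fails between u = 0 and u = 1.
gamma = 0.5
s = lambda u: abs(u) ** gamma

mid = s(0.5 * 0 + 0.5 * 1)        # value at the midpoint: 0.5**0.5 ~ 0.707
chord = 0.5 * s(0) + 0.5 * s(1)   # chord value: 0.5
assert mid > chord                # the convexity inequality is violated
```

Since $\sqrt{0.5} \approx 0.707 > 0.5$, the graph of $|u|^{1/2}$ lies above the chord, so $\sigma^\gamma$ is indeed not an augmenting function for such $\gamma$.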
$+\infty$, otherwise,
where $v = (v_1, \ldots, v_{m_1})$. In particular, if $\sigma(u) = \frac{1}{2}\|u\|_2^2$, then the augmented
Lagrangian and the augmented Lagrangian dual problem are the classical augmented
Lagrangian and the classical augmented Lagrangian dual problem studied
in Rockafellar (1974) and Rockafellar (1993), respectively; if $\sigma(u) = \|u\|_1^\gamma$, $\gamma >
0$, and $m_1 = 0$ (i.e., (CP) does not have inequality constraints), then the augmented
Lagrangian for (CP) is
where
which is the optimal value of the standard perturbed problem of (CP) (see, e.g.,
Clarke (1983); Rosenberg (1984)). Denote by $M_{CP}$ the optimal value of (CP).
Then we have $p(0) = M_{CP}$.
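The perturbation function $p$ and the identity $p(0) = M_{CP}$ can be illustrated on a hypothetical one-dimensional instance (the problem data below are not from the text):

```python
import numpy as np

# Hypothetical instance of (CP): minimize f(x) = x**2 subject to the
# perturbed constraint x >= 1 - u.  The perturbation function is
# p(u) = inf{ x**2 : x >= 1 - u } = max(1 - u, 0)**2, and p(0) = M_CP.
def p(u):
    xs = np.linspace(-3.0, 3.0, 60001)
    return float((xs[xs >= 1.0 - u] ** 2).min())

assert abs(p(0.0) - 1.0) < 1e-3    # p(0) = M_CP = 1
assert abs(p(0.5) - 0.25) < 1e-3   # relaxing the constraint lowers p
assert p(2.0) < 1e-6               # constraint inactive: p = 0
```

Note that $p$ is nonincreasing in $u$ here, reflecting that enlarging the feasible set can only decrease the optimal value.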
Proposition 2.1 For any dualizing parameterization and any generalized aug-
menting function, we have
(i) the generalized augmented Lagrangian $\bar{l}(x, y, r)$ is concave, upper semi-continuous
in $(y, r)$ and nondecreasing in $r$;
(ii) weak duality holds:
3 STRONG DUALITY
The following strong duality result generalizes and improves Theorem 11.59 of
Rockafellar, et a1 (1998).
Theorem 3.1 (strong duality). Consider the primal problem (2.1) and its
generalized augmented Lagrangian dual problem (2.3). Assume that $\varphi$ is proper,
and that its dualizing parameterization function $f(x, u)$ is proper, lsc, and level-bounded
in $x$ locally uniformly in $u$. Suppose that there exists $(\bar{y}, \bar{r}) \in \mathbb{R}^m \times
(0, +\infty)$ such that
$$\inf\{\bar{l}(x, \bar{y}, \bar{r}) : x \in \mathbb{R}^n\} > -\infty.$$
In particular,
$$p(0) - \delta > q(\bar{y}, \bar{r}) = \inf_{x\in X} \bar{l}(x, \bar{y}, \bar{r}).$$
Let $0 < r_k \to +\infty$. Then
$$f(x_k, u_k) - \langle\bar{y}, u_k\rangle + \bar{r}\sigma(u_k) > -m_0, \quad \forall k.$$
Now (3.1) and (3.2) give us a bound
when $k$ is sufficiently large. This, combined with the fact that $u_k \to 0$ and the
fact that $f(x, u)$ is level-bounded in $x$ locally uniformly in $u$, implies that $\{x_k\}$
Remark 3.1 For the standard constrained optimization problem (CP), let the
generalized augmented Lagrangian be defined as in Remark 2.3. Further assume
the following conditions hold:
(i)
$$f(x) \ge m^*, \quad \forall x \in X, \qquad (3.5)$$
for some $m^* \in \mathbb{R}$;
(ii)
Then all the conditions of Theorem 3.1 hold. It follows that there exists
no duality gap between (CP) and its generalized augmented Lagrangian dual
problem.
Definition 4.1 (exact penalty representation) Consider the problem (2.1). Let
the generalized augmented Lagrangian $\bar{l}$ be defined as in Definition 2.3. A vector
$\bar{y} \in \mathbb{R}^m$ is said to support an exact penalty representation for the problem (2.1)
if there exists $\bar{r} > 0$ such that
$$p(0) = \inf_x \bar{l}(x, \bar{y}, r), \quad \forall r \ge \bar{r},$$
and
$$\operatorname*{argmin}_x \varphi(x) = \operatorname*{argmin}_x \bar{l}(x, \bar{y}, r), \quad \forall r \ge \bar{r}.$$
Consequently,
implying
$$p(0) \le p(u) - \langle\bar{y}, u\rangle + \bar{r}\sigma(u), \quad \forall u \in \mathbb{R}^m.$$
This proves (i).
It is evident from the proof of Theorem 11.61 in Rockafellar et al. (1998)
that (ii) is true.
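Exact penalization with $\bar{y} = 0$ can be illustrated numerically on a hypothetical equality-constrained problem (the instance and the threshold $\bar{r} = 2$ below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical (CP): minimize f(x) = x**2 subject to g(x) = x - 1 = 0,
# with optimal value p(0) = 1 at x* = 1.  With sigma(u) = |u| and ybar = 0
# the penalized objective is  l(x, 0, r) = x**2 + r*|x - 1|,
# which is exact for every r >= rbar = 2.
xs = np.linspace(-2.0, 3.0, 100001)
penalized = lambda r: xs**2 + r * np.abs(xs - 1.0)

for r in (2.0, 3.0, 10.0):
    vals = penalized(r)
    assert abs(xs[np.argmin(vals)] - 1.0) < 1e-3  # argmin matches (CP)
    assert abs(vals.min() - 1.0) < 1e-3           # inf matches p(0)

# Below the threshold the penalty is not exact: for r = 1 the minimizer
# of x**2 + |x - 1| is x = 0.5, with value 0.75 < p(0).
assert abs(xs[np.argmin(penalized(1.0))] - 0.5) < 1e-3
```

The jump in behaviour at $r = 2$ mirrors the theory: below the critical penalty parameter the unconstrained minimizer undershoots the constraint, while above it the two argmin sets coincide.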
and the solution set of (CP) is the same as that of the problem of minimizing
$f(x) + r\big[\sum_{j=1}^{m_1} g_j^+(x) + \sum_{j=m_1+1}^{m} |g_j(x)|\big]$ over $x \in X$ whenever $r \ge \bar{r}$,
if and only if there exist $\tilde{r} > 0$ and a neighborhood $W$ of $0 \in \mathbb{R}^m$ such that
Proof. (i) follows from Theorem 4.1 (i). We need only prove (ii). Assume
that (4.2) holds.
First we prove (4.1) by contradiction. Suppose, by the weak duality, that
there exists $0 < r_k \to +\infty$ with
Since $u_k \to 0$, we conclude that (4.4) contradicts (4.2). As a result, there exists
$\bar{r} > \tilde{r}$ such that (4.1) holds. Hence, for any $x^* \in \operatorname{argmin}_x \varphi(x)$, we have
whenever $r > \bar{r}$. Now we show that there exists $r^* > \bar{r} + 1 > 0$ such that
$$\operatorname*{argmin}_x \bar{l}(x, 0, r) \subseteq \operatorname*{argmin}_x \varphi(x), \quad \forall r > r^*.$$
Suppose to the contrary that there exist $\bar{r} + 1 \le r_k \uparrow +\infty$ and $x_k \in
\operatorname{argmin}_x \bar{l}(x, 0, r_k)$ such that $x_k \notin \operatorname{argmin}_x \varphi(x)$, $\forall k$. Then it follows that
$$f(x_k, \tilde{u}_k) + \tilde{r}\sigma(\tilde{u}_k) + (r_k - \tilde{r})\sigma(\tilde{u}_k) \le p(0). \qquad (4.9)$$
Remark 4.3 Comparing Theorems 4.1 and 4.2, the special case where $\bar{y} = 0$
supports an exact penalty representation requires weaker conditions; i.e., condition
(c) of (ii) in Theorem 4.1 is not needed.
Remark 4.4 Consider (CP) and its generalized augmented Lagrangian defined
in Remark 2.3. Suppose that $X_0 \neq \emptyset$ and (3.5) holds. Then $\bar{y} = 0$ supports
an exact penalty representation for (CP) in the framework of its generalized
augmented Lagrangian if and only if there exist $\tilde{r} > 0$ and a neighborhood $W$
of $0 \in \mathbb{R}^m$ such that
Example 4.1 Consider (CP). Let X* denote the set of optimal solutions of
(CP). Suppose that X₀ ≠ ∅ and (3.5) holds. Let the generalized augmenting
function be a(u) = ||u||^γ, γ > 0, and the generalized augmented Lagrangian be
defined as in Remark 2.3. It is easily computed that
and X* = X*_r for r ≥ r̄', where X*_r is the set of optimal solutions of the problem
of minimizing f(x) + r[Σ_{i=1}^{m₁} g_i⁺(x) + Σ_{j=m₁+1}^m |g_j(x)|]^γ over x ∈ X;
(ii) there exist r̄ > 0 and a neighborhood W of 0 ∈ R^m such that
5 CONCLUSIONS
References
Clarke, F.H. (1983), Optimization and Nonsmooth Analysis, John Wiley &
Sons, New York.
Huang, X.X. and Yang, X.Q. (2003), A Unified Augmented Lagrangian Ap-
proach to Duality and Exact Penalization, Mathematics of Operations Re-
search, to appear.
Luo, Z.Q., Pang, J.S. and Ralph, D. (1996), Mathematical Programs with Equi-
librium Constraints, Cambridge University Press, New York.
Luo, Z.Q. and Pang, J.S. (eds.) (2000), Error Bounds in Mathematical Pro-
gramming, Mathematical Programming, Ser. B., Vol. 88, No. 2.
Pang, J.S. (1997), Error bounds in mathematical programming, Mathematical
Programming, Vol. 79, pp. 299-332.
Rockafellar, R.T. (1974), Augmented Lagrange multiplier functions and duality
in nonconvex programming, SIAM Journal on Control and Optimization,
Vol. 12, pp. 268-285.
114 OPTIMIZATION AND CONTROL WITH APPLICATIONS
1 INTRODUCTION AND PRELIMINARIES
For the set W = {A(t) : t ∈ B}, sp(W) denotes the subspace generated by W,
i.e.,
sp(W) = { Σ_{t∈B} y(t)A(t) : y ∈ Λ_B }.
The standard inner product on S^n is C • X = tr(CX). The (SDSIP) problem is
inf C • X
s.t. A(t) • X = b(t), t ∈ B, (1.1)
X ⪰ 0.
Here B is a compact set in R, C and A(t) (t ∈ B) are all fixed matrices in S^n,
b(t) ∈ R (t ∈ B), and the unknown variable X also lies in S^n.
Obviously, the (SDSIP) problem includes the semi-definite programming problem
and the linear semi-infinite programming problem with equality constraints
as special cases. See Charnes et al. (1962) and Wolkowicz et al. (2000).
For the (SDSIP) problem, we introduce the Lagrangian dual problem (DSDSIP)
as follows:
When the parameter set B is finite, (SDSIP) and (DSDSIP) are a pair
of primal and dual (SDP). See Vandenberghe and Boyd (1996), Ramana et al.
(1997) and Wolkowicz et al. (2000).
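The weak duality underlying this primal-dual pairing can be checked numerically for a finite B. The sketch below (our toy data, not from the chapter) uses a single constraint matrix: for any primal-feasible X and dual-feasible (y, Z) with Z = C − Σ_t y(t)A(t) ⪰ 0, one has C • X − Σ_t y(t)b(t) = Z • X ≥ 0.

```python
# Hedged numeric illustration (the matrices C, A1, b1 are our own choices):
# weak duality C . X >= sum_t y(t) b(t) for a tiny SDP pair with B = {1}.
import numpy as np

def dot(A, B):
    """Standard inner product on S^n: A . B = tr(AB)."""
    return float(np.trace(A @ B))

C = np.array([[2.0, 0.0], [0.0, 1.0]])
A1 = np.eye(2)          # single constraint matrix
b1 = 1.0                # constraint: tr(X) = 1

X = np.diag([0.5, 0.5])         # primal feasible: psd, A1 . X = 1
y1 = 1.0                        # dual variable
Z = C - y1 * A1                 # dual slack, must be psd

assert abs(dot(A1, X) - b1) < 1e-12             # primal feasibility
assert np.all(np.linalg.eigvalsh(X) >= -1e-12)  # X psd
assert np.all(np.linalg.eigvalsh(Z) >= -1e-12)  # dual feasibility
assert dot(C, X) >= y1 * b1 - 1e-12             # weak duality: 1.5 >= 1.0
```

Here the primal value 1.5 exceeds the dual value 1.0 only because this X is not optimal; the primal optimum (the smallest eigenvalue of C) closes the gap.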
DUALITY FOR SEMI-DEFINITE AND SEMI-INFINITE PROGRAMMING 117
Proposition 1.1 Suppose that X and (y, Z) are feasible solutions for the (SDSIP)
problem and the (DSDSIP) problem, respectively. Then,
Then, we have
yields duality with respect to C ∈ S^n if exactly one of the following conditions
holds:
(iv) Both (SDSIP) and (DSDSIP) are consistent and have the same optimal
value, and the value is attained in (DSDSIP).
We say that (SDSIP) yields uniform duality if the constraint system (1.3) yields
duality for every C ∈ S^n.
In this paper, we first establish that uniform duality between the homogeneous
(SDSIP) and its Lagrangian-type dual problem is equivalent to the
closedness of a certain cone. With the aid of this result, we also obtain a
corresponding result for the nonhomogeneous (SDSIP).
A detailed study of uniform duality for (SDSIP) problems with inequality
constraints can be found in Li et al. (2002).
inf C • X
s.t. A(t) • X = 0, t ∈ B,
X ⪰ 0,
and (DSDSIP) becomes the following problem (DSDSIPh):
sup 0
Lemma 2.1 The problem (SDSIPh) is unbounded in value if and only if there
exists X* ⪰ 0 satisfying
A(t) • X* = 0, t ∈ B (2.3)
and
C • X* < 0. (2.4)
Proof. Suppose that there is X* ⪰ 0 such that (2.3) and (2.4) hold. Without
loss of generality, assume C • X* < −1. For each n we have A(t) • X^(n) = 0, t ∈ B,
and C • X^(n) < −n with X^(n) = nX*. Hence, (SDSIPh) is unbounded in value.
Conversely, the result follows from the definition of unboundedness. This
completes the proof. □
Remark 2.1 Since X = 0 is a feasible solution of (SDSIPh), (SDSIPh) is
always consistent. If the optimal value of (SDSIPh) is bounded below, the
optimal value of (SDSIPh) is zero. Thus, (ii) and (iii) in Definition 1.1 do not
happen.
Take any S ∈ sp(W) + K. Then, there exist V ∈ sp(W) and Q ∈ K such that
and
S • X* ≥ 0.
and
C • X* ≥ 0.
2.1. Now we show that clause (iv) of Definition 1.1 holds. If (DSDSIPh) is not
consistent for C, then C ∉ sp(W) + K = cl(sp(W) + K). By the definitions of sp(W)
and K, we have that sp(W) + K is a closed and convex cone in S^n. Thus, by
the separation theorem, there exists X* in S^n such that
Obviously,
A(t) ∈ sp(W), ∀ t ∈ B.
Therefore,
C • X* < 0 and A(t) • X* = 0, ∀ t ∈ B.
Thus, it remains to prove that X* ⪰ 0. Take any Q ∈ K and 0 ∈ sp(W).
We have
Q • X* ≥ 0, ∀ Q ∈ K. (2.5)
Thus, X* is a positive semidefinite matrix. This completes the proof. □
yields duality for any C̃ ∈ S^{n+1} if and only if sp(W̃) + K is a closed set.
Proof. The proof is similar to that of Theorem 2.1 and is omitted. □
We now establish duality for the nonhomogeneous constraint system (1.1)
of (SDSIP) by reformulating it as a homogeneous system of the form (2.1) and
applying Proposition 2.1. For any real number d ∈ R, we define:
inf C̃ • X̃
s.t. Ã(t) • X̃ = 0, t ∈ B, (3.1)
X̃ ⪰ 0.
sup 0
s.t. Σ_{t∈B} y(t)Ã(t) + Z = C̃, y ∈ Λ_B, (3.2)
Z ⪰ 0,
which is equivalent to the program
sup 0
Lemma 3.1 The nonhomogeneous constraint system (1.1) yields duality with
respect to C ∈ S^n if and only if, for every d ∈ R, the constraint system (3.1)
yields duality with respect to C̃ ∈ S^{n+1}.
Proof. Suppose that the constraint system (1.1) yields duality with respect to
C ∈ S^n and let d ∈ R. We will show that the constraint system (3.1) yields
duality with respect to C̃.
Since (SDSIPl) is a homogeneous system, by Remark 2.1, we need only consider
two cases.
Case one: if the dual problem (DSDSIPl) is consistent, then (iv) in Definition 1.1
holds.
By the homogeneity of the (SDSIPl) problem, we have that the optimal
value of the (DSDSIPl) problem is zero. It follows from Proposition 1.1 that the
(SDSIPl) problem is bounded in value. By Remark 2.1, we have that (iv) in
Definition 1.1 holds.
Case two: if the dual problem (DSDSIPl) is inconsistent, then (i) in Definition
1.1 holds. Namely, (SDSIPl) has a value of −∞ with respect to C̃.
Assume that the dual problem (DSDSIPl) is inconsistent. Note that (3.3)
and (3.5) are the constraint system of (DSDSIP). If (SDSIP) is consistent and
(3.3) and (3.5) hold, then, by the hypothesis that the constraint system
(1.1) yields duality with respect to C ∈ S^n, we have
Thus, it follows from the inconsistency of the dual problem (DSDSIPl) that
at least one of two conditions holds: (i) the constraint (3.3) does not hold; (ii)
the constraint (3.3) holds, but the constraint (3.4) does not hold. Namely,
Set
It follows that X^(n) is a positive semidefinite matrix. Thus, we have
that
If (ii) holds and (SDSIP) is inconsistent, then we have that (SDSIP) is inconsistent
and (DSDSIP) is consistent. Since (SDSIP) has duality with respect
to C, for any n, there exists a solution y ∈ Λ_B for (DSDSIP) with
Σ_{t∈B} y(t)b(t) > n. Then, the dual problem (DSDSIPl) is consistent for any
d ∈ R, which contradicts the assumption.
If (ii) holds and (SDSIP) is consistent, by (3.6), we have
and
Set
Then,
Thus, by Lemma 2.1, (SDSIPl) has value −∞. We have proved the necessity
of this lemma.
To prove the sufficiency of the lemma, we suppose that, for all d ∈ R and
C ∈ S^n, (SDSIPl) yields duality with respect to C̃. We need to prove that
(SDSIP) yields duality with respect to C.
If (SDSIP) is inconsistent, then zero is the only solution of (SDSIPl). By
the definition of duality, (DSDSIPl) is consistent for any d ∈ R. Take d = n.
Thus, there exists y^(n) ∈ Λ_B such that
and
C • X ≥ Σ_{t∈B} y(t)b(t).
A(t) • X = 0, t ∈ B, X ⪰ 0,
C • X < 0.
X̃ = [[X, ξ], [ξᵀ, x_{n+1}]], where ξ ∈ R^n. Obviously, X̃ is positive semidefinite
and satisfies:
Corollary 3.1 (SDSIP) yields uniform duality if and only if, for any d ∈ R
and C ∈ S^n, the constraint system (3.1) yields duality with respect to C̃ ∈ S^{n+1}.
Proof. By Lemma 3.1, (SDSIP) yields uniform duality if and only if, for
any d ∈ R and C ∈ S^n, the constraint system (3.1) yields duality with respect
to each C̃ ∈ S^{n+1}. By Proposition 2.1, for any d ∈ R and C ∈ S^n, the
constraint system (3.1) yields duality with respect to each C̃ ∈ S^{n+1} if and
only if sp(W̃) + K is a closed set. Then, the conclusion follows readily. □
Acknowledgments
This research is partially supported by the Research Committee of The Hong Kong
Polytechnic University and the National Natural Science Foundation of China.
References
1 INTRODUCTION
The origins of equations (H-J), (B) invite us to consider such data. These equations
arise when, for a given (x, t) ∈ X × IP, with IP the set of positive numbers,
one considers the Bolza problem:
(B) find V(x, t) :=
HAMILTON-JACOBI EQUATIONS 129
u(x, t) := inf_{y∈X} sup_{p∈X*} ( p·(x − y) + g(y) − tH(p) ),   (Lax-Oleinik)
where the conjugates are taken with respect to the pairs (p, r), (x, t) and where
the infimal convolution is taken with respect to the variable (x, t) (the notation
is unambiguous inasmuch as F and G are defined on X* × IR and X × IR respectively).
In fact, for t ∈ IR₊ one has F*(·, t) = (tH)* and G* = g* ∘ p_{X*}, where
p_{X*} is the first projection from X* × IR to X*.
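A concrete instance of the Lax formula can be checked numerically. In the sketch below (our toy instance, not from the text), H(p) = p²/2 gives (tH)*(z) = z²/(2t), and for g(x) = |x| the infimal convolution u(x, t) = inf_y ( |y| + (x − y)²/(2t) ) has the known closed form of the Huber function: x²/(2t) for |x| ≤ t and |x| − t/2 otherwise.

```python
# Numerical sanity check of the Lax formula for H(p) = p**2/2, g(x) = |x|
# (our example): grid-based infimal convolution versus the Huber closed form.

def u_lax(x, t, steps=20001, span=10.0):
    """Grid approximation of inf_y ( g(y) + (tH)*(x - y) )."""
    best = float("inf")
    for i in range(steps):
        y = -span + i * (2 * span) / (steps - 1)
        best = min(best, abs(y) + (x - y) ** 2 / (2 * t))
    return best

def u_closed(x, t):
    """Known closed form of the inf-convolution (Huber function)."""
    return x * x / (2 * t) if abs(x) <= t else abs(x) - t / 2

for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    assert abs(u_lax(x, 1.0) - u_closed(x, 1.0)) < 1e-3
```

Since g and H are both convex here, the Hopf and Lax solutions coincide, consistent with the coincidence criteria discussed below.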
In order to avoid the trivial case in which v is the constant function −∞,
we assume that
dom H ∩ dom g* ≠ ∅, (2.5)
while, to avoid the case in which the Lax solution u is an improper function, we assume
the condition
dom g ≠ ∅, dom H ≠ ∅, dom H* ≠ ∅. (2.6)
In Imbert and Volle (1999), Penot (2000) and Penot and Volle (2000) some
criteria for the coincidence of the Hopf and the Lax solutions are presented and
some consequences of this coincidence are drawn. In particular, in Penot and
Volle (2000) we introduced the use of the Attouch-Brézis type condition
which ensures that u = v when g and H are closed proper convex functions.
Simple examples show that, without convexity assumptions, u and v may differ.
The interchange of inf and sup in the explicit formulae above shows that
u ≥ v. A more precise comparison can be given. Without any assumption, for
t ∈ IP, one has
u(·, t) ≥ u(·, t)** := (g □ (tH)*)** = (g* + tH**)* ≥ (g* + tH)* = v(·, t), (2.8)
u** = (F* □ G)** = (F** + G*)* ≥ (F + G*)* = v. (2.9)
Under condition (2.10), which is milder than the condition H** = H, one has a close
connection between u and v.
Proposition 2.1 Under assumptions (2.5), (2.10), one has u(·, t)** = v(·, t),
u** = v. If moreover g is convex, then v = ū, the lower semicontinuous hull of
u.
Proof. Let us prove the second equality of the first assertion, the first one
being similar and simpler. In view of relation (2.9) it suffices to show that
(F** + G*)(p, q) = (F + G*)(p, q) for any (p, q) ∈ X* × R, or for any (p, q) ∈
dom g* × R since both sides are +∞ when p ∉ dom g*. Now F** is the indicator
function of the closed convex hull cl co(E) of E. Since cl co(E) = cl co(S(epi H)) =
S(cl co(epi H)) = S(epi H**), for p ∈ dom g* we have (F** + G*)(p, q) = g*(p) iff
(p, q) ∈ cl co(E) = S(epi H**) iff (F + G*)(p, q) = g*(p). □
3 SOLUTIONS IN THE SENSE OF UNILATERAL ANALYSIS
initial condition (B). In doing so, one can detect interesting properties of functions
which are good candidates for the equation but do not satisfy the initial
condition in a classical sense. When a function u satisfies both (H-J) and (B)
in an appropriate sense, we speak of a solution of the system (H-J)-(B).
The notion of viscosity solution (Crandall and Lions (1983), Crandall et al.
(1984)), which yielded existence and uniqueness results, introduced a crucial
one-sided viewpoint since, in this notion, equalities are replaced by inequalities.
A further turn in the direction of nonsmooth analysis occurred with Barron
and Jensen (1990), Frankowska (1987), Frankowska (1993) (see also Bardi and
Capuzzo-Dolcetta (1998), Clarke et al. (1998), Vinter (2000)), in which only
subdifferentials are involved. We retain this viewpoint here and we admit the
use of an arbitrary subdifferential. Although this concept is generally restricted
by some natural conditions, here we adopt a loose definition which encompasses
all known proposals: a subdifferential is just a triple (X, F, ∂) where X is a class
of Banach spaces, F(X) is a class of functions on the member X of X and ∂
is a mapping from F(X) × X into the family of subsets of X*, denoted by
(f, x) ↦ ∂f(x), with empty value at (f, x) when |f(x)| = ∞. The viscosity
subdifferential of f at x is the set of derivatives at x of functions φ of class C¹
such that f − φ attains its minimum at x. For most results, it suffices to require
that φ is Fréchet differentiable at x. This variant coincides with the notion of
Fréchet subdifferential defined for f ∈ R̄^X, x ∈ f⁻¹(R) by
∂f(x) = { x* ∈ X* : liminf_{||u||→0+} (1/||u||) [ f(x + u) − f(x) − ⟨x*, u⟩ ] ≥ 0 }
in view of the following simple lemma which shows that there is no misfit
between the two notions.
A similar result holds for the Hadamard (or contingent) subdifferential defined
for f ∈ R̄^X, x ∈ f⁻¹(R) by
∂f(x) = { x* ∈ X* : ∀ w ∈ X, liminf_{(t,v)→(0+,w)} (1/t) [ f(x + tv) − f(x) − ⟨x*, tv⟩ ] ≥ 0 }.
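The defining inequality of these subdifferentials can be probed in one dimension. The following sketch (our elementary example, not from the text) examines f(x) = |x| at 0, where a slope s satisfies the Fréchet difference-quotient inequality exactly when |s| ≤ 1.

```python
# A finite sketch of the Frechet subdifferential inequality (our example):
# for f(x) = |x| on R, slope s lies in the subdifferential at 0 iff the
# quotient ( f(u) - f(0) - s*u ) / |u| stays nonnegative as u -> 0.

def worst_quotient(f, x, s, radii):
    """min over small u != 0 of ( f(x+u) - f(x) - s*u ) / |u|."""
    quotients = []
    for h in radii:
        for u in (h, -h):
            quotients.append((f(x + u) - f(x) - s * u) / abs(u))
    return min(quotients)

f = abs
radii = [10.0 ** (-k) for k in range(1, 8)]

assert worst_quotient(f, 0.0, 0.5, radii) >= 0.0    # interior slope works
assert worst_quotient(f, 0.0, -1.0, radii) >= 0.0   # boundary slope -1 works
assert worst_quotient(f, 0.0, 1.5, radii) < 0.0     # 1.5 violates the bound
```

For this convex f the Fréchet and Hadamard subdifferentials agree and equal the convex-analysis subdifferential [−1, 1].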
Theorem 4.1 (a) The Hopf solution v is a Hadamard (hence a Fréchet) supersolution
of (H-J).
(b) Under assumptions (2.5), (2.10), the Hopf solution v is a Hadamard
solution of (H-J).
Proof. (a) If assumption (2.5) does not hold, one has v|_{X×IP} = −∞
and there is nothing to prove. If it holds, given t > 0, x ∈ X and (p, q) ∈
∂v(x, t), since v is convex, for each s ∈ R₊ one has
For s = 0, taking the supremum on w and using v(·, 0)* = (g* + ι_{dom H})** ≥ g*,
one gets
The last inequalities show that g*(p) < ∞; it follows that q + H(p) ≥ 0.
(b) For s > 0, using (2.5), (2.10) to note that g* + sH = g* + sH** is closed,
proper, convex, one gets
Since g*(p) < +∞ and since s can be arbitrarily large, one obtains −q ≥
H(p).
In order to get a similar property for the Lax solution, we use a coercivity
condition:
(C) liminf_{||x||→∞} H*(x)/||x|| > − liminf_{||x||→∞} g(x)/||x||.
Let us recall that an infimal convolution is said to be exact if the infimum is
attained. Under assumption (C), exactness occurs in (2.2) when X is reflexive
and g is weakly l.s.c.
Theorem 4.2 (a) When (C) holds, the Lax solution u is a Fréchet supersolution.
(b) If the inf-convolution in the definition of u is exact, then u is a Hadamard
supersolution.
(c) If g is convex, then u is a Hadamard supersolution.
(d) If H = H**, then u is a Hadamard subsolution.
Proof. Since assertions (b)-(d) are proved in Penot and Volle (2000) and elsewhere,
we just prove (a). Let (x, t) ∈ X × IP be fixed and let k be given by
k(w) = g(x − w) + tH*(t⁻¹w). Assumption (C) ensures that k is coercive (this
fact justifies the observation preceding the statement). Let B be a bounded
subset of X such that inf k(B) = inf k(X). For each s ∈ ]0, t[ let us pick z_s ∈ B
such that k(z_s) ≤ inf k(X) + s² = u(x, t) + s². Then, a short computation shows
that
u(x − st⁻¹z_s, t − s) ≤ u(x, t) + s² − sH*(t⁻¹z_s).
It follows that for each (p, q) ∈ ∂⁻u(x, t) one can find a function ε: R₊ → R₊
with limit 0 at 0 such that
q + H(p) ≥ liminf_{s→0+} ( q + p·t⁻¹z_s − H*(t⁻¹z_s) ) ≥ lim_{s→0+} (−s − ε(s)) = 0.
Let us turn to this important question, which has been at the core of the viscosity
method. There are several methods for such a question: partial differential
equations techniques (Bardi and Capuzzo-Dolcetta (1998), Barles (1994), Lions
(1982)...), invariance and viability for differential inclusions (Subbotin (1995),
Frankowska (1993), Plaskacz and Quincampoix (2000)...), nonsmooth analysis
results such as the Barron-Jensen striking touching theorem (Barron (1999),
Barron and Jensen (1990)), the fuzzy sum rule (Borwein and Zhu (1996),
Deville (1999), El Haddad and Deville (1996)), and multidirectional mean value
inequalities (Imbert (1999), Imbert and Volle (1999), Penot and Volle (2000)).
Let us note that the last two results are almost equivalent and are equivalent
in reflexive spaces.
In order to simplify our presentation of recent uniqueness results arising from
nonsmooth analysis, we assume in the sequel that X is reflexive and we use the
Fréchet subdifferential. In such a case, a fuzzy sum rule is satisfied and mean
value theorems are available.
Theorem 5.1 (Penot and Volle (2000), Th. 6.2). For any l.s.c. Fréchet subsolution
w of (H-J)-(B) one has w ≤ u, the Lax solution.
The next corollary has been obtained in Alvarez et al. (1999), Th. 2.1, under
the additional assumptions that X is finite dimensional, H is finite everywhere,
and for the subclass of solutions which are l.s.c. and bounded below by a
function of linear growth. It is proved in Imbert and Volle (1999) under the
additional condition that dom H* is open.
Corollary 5.1 Suppose X is reflexive, g and H are closed proper convex functions
and dom g* ⊂ dom H. Then the Hopf solution is the greatest l.s.c. Fréchet
subsolution of (H-J)-(B).
The use of the mean value inequality for a comparison result first appeared
in Imbert (1999) and Imbert and Volle (1999), Theorem 3.3, which assumes that H
is convex and globally Lipschitzian and that X is a Hilbert space. Let us note
that in our framework the mean value theorem is equivalent to the fuzzy sum
rule; the fuzzy sum rule has been used in Borwein and Zhu (1996), Borwein
and Zhu (1999), Deville (1999), El Haddad and Deville (1996) for a similar
purpose.
It is shown in Alvarez et al. (1999), Thms 2.1 and 2.5, that the growth
condition on H can be dropped when dim X < +∞.
Well-posedness in the sense of Hadamard requires that when the data (g, H)
are perturbed in a continuous way, the solution is perturbed in a continuous
way. Up to now, this question seems to have been studied essentially in the
sense of local uniform convergence. While this mode of convergence is well
suited to the finite dimensional case with finite data and solutions, it does not
fit our framework. Thus, in Penot (2000) and in Penot and Zalinescu (2001b)
this question is considered with respect to sublevel convergence and to epiconvergence
(and various other related convergences). These convergences are well
adapted to functions taking infinite values since they involve convergence of
epigraphs. They have a nice behavior with respect to duality. However, the
continuity of the operations involved in the explicit formulae requires technical
"qualification" assumptions (Penot and Zalinescu (2001a), Penot and Zalinescu
(2001b)).
We have not considered here the case where H depends on x; we refer to Rockafellar
and Wolenski (2000a), Rockafellar and Wolenski (2000b) for recent progress
on this question. We also discarded the case where H depends on u(x). In such a
case one can use operations similar to the infimal convolution, such as the
sublevel convolution, and quasiconvex dualities as introduced in Penot and
Volle (1987)-Penot and Volle (1990) (see also Martinez-Legaz (1988), Martinez-
Legaz (1988)). The papers Barron et al. (1996)-Barron et al. (1997) opened
the way and have been followed by Alvarez et al. (1999), Barron (1999), Volle
(1998), Volle (1997). A panorama of quasiconvex dualities is given in Penot
(2000), which invites one to look for the use of rare dualities, recalling the role
the Mendeleev table played in chemistry.
References
O. Alvarez, E.N. Barron and H. Ishii (1999), Hopf-Lax formulas for semicontinuous
data, Indiana Univ. Math. J. 48 (3), 993-1035.
O. Alvarez, S. Koike and I. Nakayama (2000), Uniqueness of lower semicontinuous
viscosity solutions for the minimum time problem, SIAM J. Control
Optim. 38 (2), 470-481.
H. Attouch (1984), Variational convergence for functions and operators, Pit-
man, Boston.
M. Bardi and I. Capuzzo-Dolcetta (1998), Optimal Control and Viscosity Solutions of
Hamilton-Jacobi-Bellman Equations, Birkhäuser, Basel.
G. Barles (1994), Solutions de viscosité des équations de Hamilton-Jacobi,
Springer, Berlin.
G. Barles and B. Perthame (1987), Discontinuous solutions of deterministic
optimal stopping time problems, Math. Modeling and Numer. Anal. 21, 557-
579.
E.N. Barron (1999), Viscosity solutions and analysis in L∞, in Nonlinear Analysis,
Differential Equations and Control, F.H. Clarke and R.J. Stern (eds.),
Kluwer, Dordrecht, pp. 1-60.
E.N. Barron and R. Jensen (1990), Semicontinuous viscosity solutions of Hamilton-
Jacobi equations with convex Hamiltonians, Comm. Partial Diff. Eq. 15,
1713-1742.
E.N. Barron, R. Jensen and W. Liu (1996), Hopf-Lax formula for u_t + H(u, Du) =
0, J. Differ. Eq. 126, 48-61.
E.N. Barron, R. Jensen and W. Liu (1997), Hopf-Lax formula for u_t + H(u, Du) =
0. II, Comm. Partial Diff. Eq. 22, 1141-1160.
J.M. Borwein and Q.J. Zhu (1996), Viscosity solutions and viscosity subderiva-
tives in smooth Banach spaces with applications to metric regularity, SIAM
J. Control Optim. 34, 1568-1591.
J.M. Borwein and Q.J. Zhu (1999), A survey of subdifferential calculus with
applications, Nonlinear Anal. Th. Methods Appl. 38, 687-773.
F.H. Clarke and Yu.S. Ledyaev (1994), Mean value inequalities in Hilbert
space, Trans. Amer. Math. Soc. 344, 307-324.
F.H. Clarke, Yu.S. Ledyaev, R.J. Stern and P.R. Wolenski (1998), Nonsmooth
analysis and control theory, Springer, New York.
M.G. Crandall, L.C. Evans and P.-L. Lions (1984), Some properties of viscosity
solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc. 282, 487-
502.
M.G. Crandall, H. Ishii and P.-L. Lions (1992), User's guide to viscosity solutions
of second order partial differential equations, Bull. Amer. Math.
Soc. 27, 1-67.
M.G. Crandall and P.-L. Lions (1983), Viscosity solutions to Hamilton-Jacobi
equations, Trans. Amer. Math. Soc. 277.
R. Deville (1999), Smooth variational principles and nonsmooth analysis in
Banach spaces, in Nonlinear Analysis, Differential Equations and Control,
F.H. Clarke and R.J. Stern (eds.), Kluwer, Dordrecht, 369-405.
E. El Haddad and R. Deville (1996), The viscosity subdifferential of the sum
of two functions in Banach spaces. I First order case, J . Convex Anal. 3,
295-308.
L.C. Evans (1998), Partial differential equations, Amer. Math. Soc., Providence.
H. Frankowska (1987), Equations d'Hamilton-Jacobi contingentes, C.R. Acad.
Sci. Paris Serie I 304, 295-298.
H. Frankowska (1993), Lower semicontinuous solutions of Hamilton-Jacobi-
Bellman equations, SIAM J . Control Optim. 31 (I), 257-272.
G.N. Galbraith (2000), Extended Hamilton-Jacobi characterization of value
functions in optimal control, SIAM J. Control Optim. 39 (I), 281-305.
C. Imbert (1999), Convex analysis techniques for Hopf-Lax' formulae in Hamilton-
Jacobi equations with lower semicontinuous initial data, preprint, Univ. P.
Sabatier, Toulouse, May.
C. Imbert and M. Volle (1999), First order Hamilton-Jacobi equations with
completely convex data, preprint, October.
A.D. Ioffe (1998), Fuzzy principles and characterization of trustworthiness, Set-
Val. Anal. 6, 265-276.
P.-L. Lions (1982), Generalized Solutions of Hamilton-Jacobi Equations, Pit-
man, London.
J.-E. Martinez-Legaz (1988), On lower subdifferentiable functions, in "Trends in
Mathematical Optimization", K.H. Hoffmann et al. eds, Birkhauser, Basel,
197-232.
J.-E. Martinez-Legaz (1988), Quasiconvex duality theory by generalized conju-
gation methods, Optimization, 19, 603-652.
J.-P. Penot (1997), Mean-value theorem with small subdifferentials, J . Opt. Th.
Appl. 94 (I), 209-221.
J.-P. Penot (2000), What is quasiconvex analysis? Optimization 47, 35-110.
J.-P. Penot and M. Volle (1987), Dualité de Fenchel et quasi-convexité, C.R.
Acad. Sc. Paris série I, 304 (13), 269-272.
J.-P. Penot and M. Volle (1988), Another duality scheme for quasiconvex problems,
in "Trends in Mathematical Optimization", K.H. Hoffmann et al. (eds.),
Birkhäuser, Basel, 259-275.
J.-P. Penot and M. Volle (1990), On quasi-convex duality, Math. Operat. Re-
search 15 (4), 597-625.
J.-P. Penot and M. Volle (2000), Hamilton-Jacobi equations under mild conti-
nuity and convexity assumptions, J. Nonlinear and Convex Anal. 1,177-199.
J.-P. Penot and M. Volle (1999), Convexity and generalized convexity meth-
ods for the study of Hamilton-Jacobi equations, Proc. Sixth Conference on
Generalized Convexity and Generalized Monotonicity, Samos, Sept. 1999, N.
1 INTRODUCTION
where supp(f, H) = {h ∈ H : h(x) ≤ f(x) (∀ x ∈ X)} is the support set of the
function f with respect to H.
A set H is called a supremal generator of a set F of functions f defined on
X if each f E F is abstract convex with respect to H. A supremal generator
H is a base (in a certain sense) of F , so some properties of H can be extended
to the entire set F. (See Rubinov (2000), Chapter 6 and references therein for
details.) If H is a "small" set then some of its properties can be verified by
direct calculation. Thus small supremal generators are very helpful in the
examination of some problems. This observation explains why a description of
small supremal generators for a given broad class of functions is one of the
main problems of abstract convexity. The reverse problem, namely to describe
abstract convex functions with respect to a given set H, is also very interesting.
If H consists of continuous functions then the set of H-convex functions is
contained in the set LSC_H of all lower semicontinuous functions f such that
f ≥ h for some function h ∈ H. (Here f ≥ h stands for f(x) ≥ h(x) for all
x ∈ X.) The set LSC_H is very large. In particular, if constants belong to H,
then LSC_H contains all lower semicontinuous functions that are bounded from
below. As it turns out, there are very small supremal generators of the very
large set LSC_H. These supremal generators can be described by means of the
so-called peaking property (see Pallaschke and Rolewicz (1997) and references
therein) or by a technique based on functions supporting Urysohn peaks (see
Rubinov (2000) and references therein).
We present two known examples of such generators (see Rubinov (2000) and
references therein).
1) Let X be a Hilbert space and H be the set of all quadratic functions h of the
form
h(x) = −a||x − x₀||² − c, x ∈ X,
Let k be a positive number. We shall study the set H_k of all functions h of the
form
h(x) = −a p^k(x − x₀) − c (x ∈ X) (1.3)
with x₀ ∈ X, c ∈ R and a > 0, and show that this set is a supremal generator
of the class of functions P_k, which depends only on k and does not depend on
p. The class P_k is very broad. It consists of all lower semicontinuous functions
f: X → R ∪ {+∞} such that liminf_{||x||→+∞} f(x)/||x||^k > −∞.
Consider the space X = IR^n and a sublinear function p defined on X such
that (1.2) holds. It is well known that there exists a set of linear functions
U_p such that p(x) = max_{l∈U_p}[l, x], where [l, x] stands for the inner product of
vectors l and x. Let H¹ be the set of functions defined by (1.3) for the given
function p and k = 1. Since H¹ is a supremal generator of P₁, it follows that
each function f ∈ P₁ can be represented in the following form:
where V(f) = {(a, c, x₀) : −a p(x − x₀) − c ≤ f(x) ∀ x ∈ X} and U_p does not
depend on f. Thus we have the following sup-min representation of an arbitrary
function f ∈ P₁ through affine functions:
Since the class P₁ does not depend on the choice of a sublinear function p with
γ_p > 0, it is interesting to consider functions p for which the corresponding set
U_p has the least possible cardinality. Clearly, this cardinality is greater than or
equal to n + 1, since for each function p(x) = max_{i=1,...,j}[l_i, x] with j ≤ n and
nonzero l_i we have γ_p := min_{||x||=1} p(x) ≤ 0. We discuss this question in detail
(see Example 3.3, Example 3.4 and Remark 3.1).
We shall also describe conditions which guarantee that the so-called abstract
convex subdifferentials are not empty. Recall the corresponding definitions (see
Rubinov (2000)). Let X be an arbitrary set. A set L of functions defined on
X is called a set of abstract linear functions if for every l ∈ L the functions
h_{l,c}(x) = l(x) − c do not belong to L for each c ≠ 0. The set H_L = {h_{l,c} :
l ∈ L, c ∈ IR} is called the set of L-affine functions. Let f be an H_L-convex
function. The set
2 SETS P_k
Let X be a normed space and k be a positive number. Denote by P_k the set
of lower semicontinuous functions f: X → R ∪ {+∞} such that f is bounded
from below on each ball and
liminf_{||x||→∞} f(x)/||x||^k > −∞. (2.1)
It follows from the definition of the class P_k that P_k = {f^k : f ∈ P₁}. We now
describe some subsets of the set P₁.
Then f is H-convex. To prove it, take a point x₀ ∈ X and an arbitrary ε > 0.
Let a be a number such that (3.3) holds. Consider the function h(x) = −a p^k(x −
x₀) + (f(x₀) − ε). Let (x₀, f(x₀) − ε) + K_a = K. We have
K = {(x, v) : x = x₀ + y, v = f(x₀) − ε + μ, μ ≤ −a p^k(y)}
  = {(x, v) : x = x₀ + y, v ≤ f(x₀) − ε − a p^k(y)}
  = {(x, v) : v ≤ f(x₀) − ε − a p^k(x − x₀)}
  = {(x, v) : v ≤ h(x)}.
Let x ∈ X and y = x − x₀. Since (x, f(x)) ∈ epi f it follows from (3.3) that
(x, f(x)) ∉ (x₀, f(x₀) − ε) + K_a, so f(x) > h(x). Since h(x₀) = f(x₀) − ε and
ε is an arbitrary positive number, we conclude that f(x₀) = sup{h(x₀) : h ∈
supp(f, H)}.
We now prove that (3.3) is valid. Assume on the contrary that there exist
x₀ ∈ X and ε > 0 such that for each positive integer n a pair (x_n, λ_n) can
be found such that (x_n, λ_n) ∈ epi f and (x_n, λ_n) ∈ (x₀, f(x₀) − ε) + K_n. Let
(y_n, μ_n) = (x_n, λ_n) − (x₀, f(x₀) − ε). It follows from the above that
The function f is bounded from below on each ball, therefore (3.6) implies
unboundedness of the sequence x_n. Since liminf_{||x||→+∞} f(x)/||x||^k > −∞, we
conclude that there exists c > 0 such that f(x_n) ≥ −c||x_n||^k for all sufficiently
large n. Hence
λ_n ≥ −c||x_n||^k (3.7)
for these n. On the other hand, applying (3.5) we deduce that
for all n. Due to (3.7) and (3.8), we have for all large enough n:
Theorem 3.1 Let P(H_k) be the set of all H_k-convex functions. Then
P(H_k) = P_k = LSC_{H_k}.
Proof: The result follows directly from Lemma 3.1, Lemma 3.2 and the obvious
inclusion P(H_k) ⊂ LSC_{H_k}. □
Example 3.1 Let X be a normed space. The set H¹ of all functions h defined
on X by h(x) = −a||x − x₀|| − c with x₀ ∈ X, a ≥ 0, c ∈ R is a supremal
generator of P₁. If P₁ ⊃ H ⊃ H¹, then H is also a supremal generator of P₁.
In particular the set of all concave Lipschitz functions is a supremal generator
of P₁.
Example 3.2 Let X be a Hilbert space and H² be the set of all functions h
defined on X by h(x) = −a||x||² + [l, x] − c with a > 0, l ∈ X, c ∈ R. (Here
[l, x] is the inner product of vectors l and x.) It is well known (see, for example,
Rubinov (2000)) that H² is a supremal generator of the set LSC_{H²}. Clearly
h ∈ H² if and only if there exist a point x₀ ∈ X and a number c' ∈ IR such
that h(x) = −a||x − x₀||² − c'. So Theorem 3.1 shows that P₂ coincides with
the set of all H²-convex functions.
Example 3.4 The following particular case of Example 3.3 is of special interest.
Let X be the n-dimensional space IR^n, I = {1, . . . , m}, and let l_i (i ∈ I) be
vectors such that their conic hull cone(l₁, . . . , l_m) := {Σ_{i∈I} λ_i l_i : λ_i ≥ 0 (i ∈ I)}
coincides with the entire space IR^n. Then the set
is bounded and contains a ball cB where B = {x : ||x|| ≤ 1} and c > 0. Let p_S
be the Minkowski gauge of S. We have
p_S(x) ≤ p_{cB}(x) = (1/c) p_B(x) = (1/c)||x||  for all x ∈ IR^n.
Thus the function p_S is finite. The set S is bounded, so there exists γ > 0 such
that S ⊂ (1/γ)B. We have
Remark 3.1 Consider a number $m$ with the following property: there exist $m$ vectors $l_1, \dots, l_m$ such that the sublinear function $p(x) = \max_{i=1,\dots,m}[l_i, x]$ ($x \in X$) is strictly positive for all $x \ne 0$. If $m \le n$ then this property does not hold for the function $p$. Indeed, the system $[l_i, x] = -1$, $i = 1, \dots, m$, has a solution for arbitrary nonzero vectors $l_1, \dots, l_m$. It follows from Example 3.4 that we can find corresponding vectors if $m = n + 1$. Thus the least number $m$ which possesses the mentioned property is equal to $n + 1$.
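The construction in Example 3.4 and Remark 3.1 can be checked numerically. The sketch below uses a made-up instance (not taken from the text) with $m = n + 1 = 3$ vectors in $\mathbb{R}^2$ whose conic hull is all of $\mathbb{R}^2$, and verifies that $p(x) = \max_i [l_i, x]$ is strictly positive for $x \ne 0$ and bounded above by a multiple of the norm:

```python
import math, random

# Hypothetical instance with m = n + 1 = 3 vectors whose conic hull is all
# of R^2: any x is a nonnegative combination of these three vectors.
L = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

def p(x):
    """Sublinear function p(x) = max_i [l_i, x] from Example 3.4."""
    return max(l[0] * x[0] + l[1] * x[1] for l in L)

# p is the Minkowski gauge of S = {x : p(x) <= 1}: strictly positive for
# x != 0 and bounded above by (1/c)||x|| (here c = 1/2 suffices).
random.seed(0)
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if x == (0.0, 0.0):
        continue
    assert p(x) > 0.0
    assert p(x) <= 2.0 * math.hypot(x[0], x[1])
```

If all three inner products were nonpositive, then $x_1 \le 0$, $x_2 \le 0$ and $x_1 + x_2 \ge 0$ would force $x = 0$, which is why strict positivity holds for every nonzero $x$.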
In this section we describe sufficient conditions which guarantee that the $L_k$-subdifferential is not empty. These conditions become also necessary for $k = 1$.
Recall the following well-known definition (see, for example, Burke (1991)).
A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is called calm at a point $x_0 \in \mathrm{dom}\, f$ if
Proof: 1) Assume that $f$ is calm of degree $k$, that is, (4.2) holds. Then there exist numbers $c_1$ and $d_1 > 0$ such that $f(x) - f(x_0) \ge c_1\|x - x_0\|$ if $\|x - x_0\| < d_1$. Since $\liminf_{\|x\|\to\infty} f(x)/\|x\|^k > -\infty$, it follows that there exist numbers $c_2$ and $d_2 > 0$ such that $f(x) \ge c_2\|x\|^k$ if $\|x\| > d_2$. Since
Recall (see, for example, Demyanov and Rubinov (1995)) that a function $f$ defined on a normed space $X$ is called subdifferentiable at a point $x \in X$ if there exists the directional derivative
$$f'_x(u) = \lim_{\alpha \to +0} \frac{1}{\alpha}\bigl(f(x + \alpha u) - f(x)\bigr)$$
for all $u \in X$ and $f'_x$ is a continuous sublinear function. Each convex function $f$ is subdifferentiable at a point $x \in \operatorname{int} \mathrm{dom}\, f$. A function $f$ is called quasidifferentiable (see, for example, Demyanov and Rubinov (1995)) at a point $x$ if the directional derivative $f'_x$ exists and can be represented as the difference of two continuous sublinear functions. If $f$ is the difference of two convex functions, then $f$ is quasidifferentiable.
The support set $\mathrm{supp}(r, X^*)$ of a continuous sublinear function $r : X \to \mathbb{R}$ with respect to the conjugate space $X^*$ will be denoted by $\partial r$. Note that $\partial r$ coincides with the subdifferential (in the sense of convex analysis) of the sublinear function $r$ at the point 0.
1) The $L_1$-subdifferential $\partial_{L_1} f(x_0)$ is not empty and contains a function $l_{1,c,x_0}$ with some $c > 0$.
2) Let $f$ be quasidifferentiable at the point $x_0$ with $f'_{x_0}(u) = r_1(u) - r_2(u)$, where $r_1, r_2$ are continuous sublinear functions, and let $l_{1,c,x_0} \in \partial_{L_1} f(x_0)$. Then $\partial r_2 \subset \partial r_1 + c\,\partial p$.
Proof: 1) The function $f$ is calm at the point $x_0$, so the result follows from Theorem 4.1.
2) Let $l(x) = -c\,p(x - x_0)$ be an $L_1$-subgradient of $f$ at the point $x_0$. Let $u \in X$ and $\alpha \ge 0$. Then
Thus
$$-c\,p(u) \le f'(x, u) = r_1(u) - r_2(u) \quad \text{for all } u \in X,$$
which leads to the inclusion $\partial r_2 \subset \partial r_1 + c\,\partial p$. $\Box$
Indeed, it follows from Proposition 4.1 that $0 \in \partial f'_{x_0} + c\,\partial p$, which is equivalent to (4.3).
Acknowledgments
This research has been supported by Australian Research Council Grant A69701407.
References
1 INTRODUCTION
Consider a set of $l$ data vectors $\{x_i, y_i\}$, with $y_i \in \{+1, -1\}$, $i = 1, \dots, l$, where $x_i$ is the $i$-th data vector that belongs to the binary class $y_i$. We seek the hyperplane that best separates the two classes with the widest margin. More specifically, the objective of training the SVM is to find the hyperplane (Burges (1998))
$$w \cdot \Phi(x) + b = 0, \qquad (2.1)$$
TRAINING DUAL-ν SUPPORT VECTOR MACHINES
subject to
to minimise
where $\rho$ is the position of the margins and $\nu$ is the error parameter to be defined later in the section. The function $\Phi$ is a mapping from the data space to the feature space, providing generalisation for decision functions that may not be linear in the training data. The problem is equivalent to maximising the margin $2/\|w\|$ while minimising the cost of the errors $\sum_i(\nu\rho - \xi_i)$, where $w$ is the normal vector and $b$ is the bias describing the hyperplane, and $\xi_i$ is the slack variable for classification errors. The margins are defined by $w \cdot x + b = \pm\rho$.
In the Dual-ν formulation, we introduce $\nu_+$ and $\nu_-$ as the error parameters of training for the positive and negative classes respectively, where
with
where $l_+$ and $l_-$ are the numbers of training points for the positive and negative classes respectively. Note that the original ν-SVM formulation by Schölkopf et al. (2000) can be derived from 2ν-SVM by letting $\nu_+ = \frac{\nu l}{2 l_+}$ and $\nu_- = \frac{\nu l}{2 l_-}$.
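A small sketch of the class-size biasing implied by this choice. The relation $\nu_\pm = \nu l/(2 l_\pm)$ used below is an assumption based on the ν-SVM literature rather than a quotation of the chapter's equations:

```python
# Assumed relation (from the nu-SVM literature, not quoted from the text):
# nu_plus = nu*l/(2*l_plus), nu_minus = nu*l/(2*l_minus), which gives each
# class an equal share nu*l/2 of the total error budget.
def two_nu(nu, l_plus, l_minus):
    l = l_plus + l_minus
    return nu * l / (2.0 * l_plus), nu * l / (2.0 * l_minus)

nu_p, nu_m = two_nu(0.2, 300, 700)
assert abs(nu_p * 300 - nu_m * 700) < 1e-9      # equal per-class budgets
assert abs(nu_p * 300 - 0.2 * 1000 / 2) < 1e-9  # each equals nu*l/2
```

For balanced classes ($l_+ = l_-$) both parameters collapse back to the single $\nu$ of the original formulation.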
The 2ν-SVM training problem can be formulated as a Wolfe dual Lagrangian problem (Chew et al. (2001a)). The Wolfe dual Lagrangian is given by
subject to
This property is required in the training process and will be shown in Section
3.3.
3 OPTIMISATION METHOD
and we desire the solution to the sub-problem $F_{BB}$. The solution to the problem is obtained when no working set $B$ can be found and updated that increases $F_{BB}$ while satisfying the constraints of the problem.
We can combine the two processes to get a simple and intuitive method of solving the 2ν-SVM training problem. At each iterative step, the objective function is maximised with respect to only two variables. That is, we decompose the problem into a sub-problem with a working set of two points $p$ and $q$. This decomposition simplifies the iterative step while still converging to the solution, owing to the convexity of the problem.
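The two-variable working-set idea can be sketched on a toy problem. The objective and names below are hypothetical (this is not the authors' exact 2ν-SVM sub-problem): we maximise a concave quadratic $F(a) = b^{T}a - \tfrac{1}{2}a^{T}Qa$ under box constraints and a fixed sum, moving only along $e_p - e_q$ so the sum constraint is preserved at every step:

```python
# Hypothetical objective (not the authors' exact 2nu-SVM sub-problem):
# maximise F(a) = b.a - 0.5 a.Q.a subject to 0 <= a_i <= C and sum(a) fixed,
# updating only a working set of two variables (p, q) per step.
def solve_pairwise(Q, b, C, a, sweeps=200):
    n = len(b)
    for _ in range(sweeps):
        # gradient of F: g_i = b_i - (Q a)_i
        g = [b[i] - sum(Q[i][j] * a[j] for j in range(n)) for i in range(n)]
        # cheap O(n) pair selection: largest gradient gap
        p = max(range(n), key=lambda i: g[i])
        q = min(range(n), key=lambda i: g[i])
        # optimal unconstrained step along d = e_p - e_q
        denom = Q[p][p] - 2.0 * Q[p][q] + Q[q][q]
        if denom <= 1e-12:
            break
        delta = (g[p] - g[q]) / denom
        # clip so both variables stay inside the box [0, C]
        delta = min(delta, C - a[p], a[q])
        if delta <= 1e-12:   # no admissible improvement for this pair
            break
        a[p] += delta
        a[q] -= delta
    return a

# with sum(a) fixed at 2, the maximiser of 3*a0 + a1 - a0^2 - a1^2
# over the line a0 + a1 = 2 is (1.5, 0.5)
a = solve_pairwise([[2.0, 0.0], [0.0, 2.0]], [3.0, 1.0], 10.0, [1.0, 1.0])
assert abs(a[0] - 1.5) < 1e-9 and abs(a[1] - 0.5) < 1e-9
```

Because the objective is concave, repeated two-variable steps of this kind reach the same maximum as solving the full problem at once, which is the property the decomposition relies on.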
With $p, q \in \{1, \dots, l\}$, (3.1) can be rewritten to extract the $\alpha_p^{(k)}$ and $\alpha_q^{(k)}$ components.
With some change $\delta_{pq}$ made on the decision variables, the change in $F_{pq}^{(k+1)}$ can be obtained using the substitutions
As only the $p$-th and $q$-th decision variables are updated, the change in the objective function at iteration $(k + 1)$ is shown in the Appendix to be
where
In each iterative step, the objective function is increased by the largest amount possible in changing $\alpha_p^{(k)}$ and $\alpha_q^{(k)}$. We seek the pair of decision variables to update in each step by searching for the maximum change in the objective function in updating the pair. The optimal $\delta_{pq}$ for each $(p, q)$ pair, denoted by $\delta^*_{pq}$,
resulting in
This in turn gives the maximum possible change in the objective function for the updated pair as
for $p, q \in \{1, \dots, l\}$.
Using (3.9) requires $O(l^2)$ operations to search for the $(p, q)$ pair. We can reduce the complexity by simplifying the search to
Although the new search criterion is a simplification of (3.9), due to the convexity of the objective function, the optimisation process still converges to the same maximum, albeit along a less optimal path, but has a search complexity of only $O(l)$, as it only searches for the maximum and minimum of $G_i^{(k)}$.
All kernel functions need to satisfy Mercer's condition (Vapnik (1995)). Mercer's condition states that the kernel is actually a dot product in some space, and therefore the denominator of (3.7) is positive for all valid kernel functions. Thus, it is clear that for the $(p, q)$ pair found using either (3.9) or (3.10), we obtain $\delta^*_{pq} \ge 0$. If $\delta^*_{pq} = 0$, the objective function is at its maximum and we have found the trained SVM. The iteration process continues as long as
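The positivity claim can be illustrated numerically. The snippet assumes the denominator of (3.7) takes the standard pairwise form $K_{pp} - 2K_{pq} + K_{qq}$ (an assumption, since (3.7) itself involves further details), which equals $\|\Phi(x_p) - \Phi(x_q)\|^2$ in the feature space; the kernel and data are made up:

```python
import math, random

def rbf(x, z, sigma=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

# K_pp - 2 K_pq + K_qq = ||Phi(x_p) - Phi(x_q)||^2 >= 0 for a Mercer kernel.
random.seed(1)
pts = [[random.gauss(0, 1) for _ in range(3)] for _ in range(30)]
for xp in pts:
    for xq in pts:
        denom = rbf(xp, xp) - 2.0 * rbf(xp, xq) + rbf(xq, xq)
        assert denom >= 0.0   # never negative for a valid kernel
```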
3.3 Constraints of 2ν-SVM
Proposition 3.2 If $\{\alpha_i^{(k)}\}$ satisfies (2.13), and $\{\alpha_i^{(k+1)}\}$ is updated using (3.4) with
$$y_p = y_q, \qquad (3.12)$$
Remark 3.1 Due to (3.12), the search for the update pair $(p, q)$ is divided into two parts, one for each class. The class which returns the higher increase in the objective function is selected for the update process.
Remark 3.2 Proposition 3.2 shows that the update process of the optimisation results in
Since the solution of the training problem has the property of (2.15), we need to initialise $\{\alpha_i^{(0)}\}$ such that
to enable the optimisation process to reach the solution.
Proposition 3.3 If $\{\alpha_i^{(k)}\}$ satisfies (2.11), and $\{\alpha_i^{(k+1)}\}$ is updated using (3.4) with
$$\delta_{pq} = \begin{cases} \dots, & \text{for } y_p = y_q = +1, \\ \dots, & \text{for } y_p = y_q = -1, \end{cases} \qquad (3.14)$$
Proof: It is clear that since $\delta^*_{pq}$ is always positive (3.11), the limiting constraints on $\delta_{pq}$ are
Using $\delta_{pq}$ as stated in (3.14) will meet the constraints of (3.15) and (3.16), and therefore (2.11). Thus, if $\{\alpha_i^{(k)}\}$ satisfies (2.11), then by induction, $\{\alpha_i^{(k+1)}\}$ satisfies (2.11) for any $k$.
Remark 3.3 The selection process requires $\delta_{pq} > 0$ for each iteration. From (3.15) and (3.16), it is clear that we need
From Propositions 3.2 and 3.3, the selection of the $(p, q)$ pair for each iteration therefore requires the search for (3.10) for each class, while satisfying (3.12), (3.17) and (3.18), to find $\delta_{pq} > 0$ with (3.14).
3.4 Algorithm
• Given
• Find the set of decision variables, $\{\alpha_i\}$, that maximises the objective function
subject to
• Define
  - the equations
  - if ($\Theta_+ \ge \Theta_-$)
    * then
    * else
  - $\alpha_q = \alpha_q + y_q \delta$
  - update $k = k + 1$.
• Terminate.
4 INITIALISATION TECHNIQUE
Consider the Lagrangian of 2ν-SVM (2.10). The $t$-th iteration of the Lagrangian (3.1) is rewritten to extract the $\alpha_r^{(t)}$ component of the function, for some $r \in \{1, \dots, l\}$, as
As only the $r$-th decision variable is updated, we can derive the change in $F_r^{(t+1)}$ using the substitutions
for some change $\delta_r$. The objective function at iteration $(t + 1)$ is shown in the Appendix to be
where
We use (4.3) to initialise the set of decision variables such that the variables satisfy constraints (2.11), (2.12) and (2.15), for the training of 2ν-SVMs, by selecting the optimal decision variable to update at each iterative step.
$$\Delta F_r^{(t+1)} = F^{(t+1)} - F^{(t)} = -C_r y_r G_r^{(t)} - \frac{1}{2}(C_r)^2 K_{rr}. \qquad (4.5)$$
The process for finding $r$ is therefore to find
In the search for $r$ at each iteration, (4.7) allows the exclusion of the class that has met the constraint. The selection process therefore searches for $r$ in a reduced set of training points,
Note that (2.15) is more stringent than (2.13), implying that meeting (2.15) is sufficient for (2.13). Consequently, we refine the update equation (4.2) to (4.10) to satisfy (4.8).
Proposition 4.1 Let $N^{(t)}$ be defined by (4.9). If the search for $r \in N^{(t)}$ uses (4.6) in the $\{\alpha_i\}$ initialisation, and the update of $\alpha_r^{(t)}$ at each iteration $t$ uses (4.10), the resulting set of $\alpha_i$ will satisfy (4.8) at the end of the initialisation process.
Since the initialisation process terminates when (4.7) is satisfied, it follows from (4.11) that (4.8) is satisfied. $\Box$
4.4 Algorithm
• Given
• Find the initial state of the set of decision variables, $\{\alpha_i\}$, for training such that
• Define
  - the equations
• While $N^{(t)} \ne \emptyset$,
  - find $r \in N^{(t)}$ with
• Terminate.
5 IMPLEMENTATION ISSUES
The training process involves the iterative maximisation of the objective function, by way of finding the maximum $\delta_{pq}^{(k+1)}$ for the positive class and the negative class. It is easy to see that finding the maximum $\delta_{pq}^{(k+1)}$ is basically finding the maximum and minimum of $G_i^{(k)}$. This is why the process has a complexity of only $O(l)$ instead of $O(l^2)$.
The calculation of $G_i^{(k)}$ can be reduced significantly by updating it at each iteration rather than recalculating it. Since only the $p$-th and $q$-th decision variables are changed, from (3.6) and using (3.4),
which has a complexity of $O(l)$ rather than $O(l^2)$ when updating $G_i^{(k+1)}$ for all $i \in \{1, \dots, l\}$. Note that the kernel functions required by the update are for
columns $p$ and $q$ of the kernel matrix. As $\alpha_p > 0$ and $\alpha_q > 0$ at iteration $(k)$ or $(k + 1)$, the columns will be cached by the caching strategy of Section 5.1.
We also have to consider the cost involved in initialising the $G_i$ cache, that is $G_i^{(k=0)}$. Assuming that the number of non-zero decision variables is $m$, calculating $G_i^{(k=0)}$ would require $O(ml)$ loops and kernel calculations. However, we will see in the next section that the initialisation method provides $G_i^{(k=0)}$ in its process as well.
The initialisation process seeks the minimum of $I_r^{(t+1)}$, which has one dynamic term $2C_i y_i G_i^{(t)}$ and one static term $(C_i)^2 K_{ii}$. The static term can be calculated at the start of the process, cached, and added to the dynamic term at each iteration.
As in the optimisation process, $G_i^{(t)}$ in the dynamic term can be updated at each iteration to reduce the computation requirements. From (4.4), and using (4.2) with $\delta_r = C_r$,
the initialisation process has provided the $G_i^{(k)}$ cache and kernel column cache needed in the optimisation process.
6 PERFORMANCE RESULTS
We tested the iterative method against the existing ad hoc method using three datasets. The ad hoc method initialises the first few decision variables until the constraints are met before the optimisation step. Each dataset is trained with the error parameters set to $\nu_+ = \nu_- = 0.2$, and using three different kernels: linear, polynomial, and radial basis function (RBF).
The first dataset is a face detection problem with 1,000 training points, with each point being an image. The dataset is trained with the linear, the polynomial (degree 5), and the RBF ($\sigma$ = 4,000) kernels. The second dataset is a radar image vehicle detection problem with 10,000 training points. The dataset is trained with the linear, the polynomial (degree 5), and the RBF ($\sigma$ = 256) kernels. The third dataset is a handwritten digit recognition problem (digit 7 against other digits) with 100,000 training points, and is trained with the linear, the polynomial (degree 4), and the RBF ($\sigma$ = 15) kernels. This dataset is also trained with $\nu_+ = \nu_- = 0.01$, using the polynomial and RBF kernels.
The 2ν-SVMs are trained on an Intel Pentium 4 (2 GHz) Linux machine, and the results are given in Table 6.1. The table shows that the iterative initialisation method provides improvements for most of the tests. The iterative initialisation method may increase the training time for small problems, but this is mainly due to the slow convergence of the optimisation process, as seen in the face detection problem using the polynomial kernel.
Figure 6.1 shows more clearly that while the computational time required for
initialisation is not reduced, there is a significant reduction in the optimisation
Table 6.1 Training times for the ad hoc and the iterative methods of initialising the
decision variables
Figure 6.1 Comparison of initialisation and optimisation times for the ad hoc and the
iterative methods of initialising the decision variables
time in some cases. These results clearly show that even with the more complex
iterative initialisation process, there are time improvements for most of the
tested cases.
7 CONCLUSIONS
2ν-SVM is a natural extension to SVM that allows different bounds for each of the binary classes, and compensation for the effects of uneven training class sizes. We have described the process for training 2ν-SVMs using an iterative process. The training process consists of the initialisation and the optimisation proper. Both use a similar technique in their iterative procedures.
Simulation and evaluations of the training process are continuing, and some results were reported in Chew et al. (2001b). The initialisation process has been found to reduce the training optimisation time, and does not incur significant costs in computing the decision variables.
In general, the optimisation process is expensive in terms of computing and memory utilisation, as well as the time it takes. By using caches and decomposition in the process presented in this work, the problems of high memory usage and redundant kernel calculations are overcome. Specifically, the memory utilisation complexity for caching the kernel calculations is reduced from $O(l^2)$ to $O(l)$.
The method presented has led to an efficient classifier implementation. It
can be implemented readily on desktop workstations, and possibly on high
performance embedded systems.
Acknowledgments
Appendix
$$\cdots - \frac{1}{2}\bigl(\alpha_r^{(t)}\bigr)^2 K_{rr} - \delta_r \alpha_r^{(t)} y_r G_r^{(t)} - \frac{1}{2}\delta_r^2 K_{rr}$$
Since n+ is the smallest integer satisfying the inequality, using (2.7) and (2.9),
The derivation is similar for the negative class, and therefore the total number
of iterations required for initialising the decision variables is
References
Abstract: The Barzilai and Borwein gradient method does not ensure descent in the objective function at each iteration, but performs better than the classical steepest descent method in practical computations. Combined with techniques such as nonmonotone line search, the method has found successful applications in unconstrained optimization, convex constrained optimization and stochastic optimization. In this paper, we give an analysis of the Barzilai and Borwein gradient method for unsymmetric linear equations with only two variables. Under mild conditions, we prove that the convergence rate of the Barzilai and Borwein gradient method is Q-superlinear if the coefficient matrix $A$ has two identical eigenvalues; if the eigenvalues of $A$ are different, then the convergence rate is R-superlinear.
1 INTRODUCTION
yielding
$$\alpha_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}. \qquad (1.4)$$
Compared with the classical steepest descent method, which dates back to Cauchy (1847), the Barzilai and Borwein gradient method often requires less computational work and speeds up the convergence greatly (see Akaike (1959); Fletcher (1990)). Theoretically, Raydan (1993) proved that the Barzilai and Borwein gradient method can always converge to the unique solution
$x^* = A^{-1}b$ of problem (1.1). If there are only two variables, Barzilai & Borwein (1988) established the R-superlinear convergence of the method; for convex quadratics of any dimension, Dai & Liao (2002) strengthened the analysis of Raydan (1993) and proved the R-linear convergence of the Barzilai and Borwein gradient method. A direct application of the Barzilai and Borwein method in chemistry can be found in Glunt (1993).
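The iteration just described can be sketched as follows. This is a minimal implementation of (1.2) with stepsize (1.4) for a small symmetric positive definite system, the classical setting of Raydan (1993); the matrix and right-hand side are made up for illustration:

```python
# Hypothetical names; a sketch of the BB iteration for A x = b (SPD case).
def bb_solve(A, b, x, iters=100):
    """x_{k+1} = x_k - alpha_k g_k with g_k = A x_k - b and
    alpha_k = s^T s / s^T y, s = x_k - x_{k-1}, y = g_k - g_{k-1} (eq. (1.4))."""
    n = len(b)
    mat = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    g = [gi - bi for gi, bi in zip(mat(x), b)]
    alpha = 1.0  # arbitrary first stepsize
    for _ in range(iters):
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = [gi - bi for gi, bi in zip(mat(x_new), b)]
        s = [u - v for u, v in zip(x_new, x)]
        yv = [u - v for u, v in zip(g_new, g)]
        sy = sum(u * v for u, v in zip(s, yv))
        if sy == 0.0:          # gradient vanished: x_new solves the system
            return x_new
        alpha = sum(u * u for u in s) / sy
        x, g = x_new, g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = bb_solve(A, b, [0.0, 0.0])
residual = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
assert max(abs(r) for r in residual) < 1e-8  # converges, per Raydan (1993)
```

Note that the residual norm is not forced to decrease at every step; the method is nonmonotone, which is exactly why the nonmonotone line search of Grippo et al. (1986) pairs naturally with it for general minimisation.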
To extend the Barzilai and Borwein gradient method to minimize a general
smooth function
$$\min f(x), \quad x \in \mathbb{R}^n, \qquad (1.5)$$
Raydan (1997) considered the use of the nonmonotone line search technique of Grippo et al. (1986) for the Barzilai and Borwein gradient method, which cannot ensure descent in the objective function at each iteration. The resulting
ANALYSIS O F BB METHOD F O R UNSYMMETRIC LINEAR EQUATIONS 185
one can similarly define the steepest descent method and the Barzilai and Borwein gradient method for problem (1.6). Friedlander et al. (1999) presented a generalization of the two methods for problem (1.6), and compared several different strategies for choosing the stepsize $\alpha_k$. To develop an efficient algorithm based on the Barzilai and Borwein gradient method for solving nonlinear equations, we present in this paper an analysis of the Barzilai and Borwein gradient method for the unsymmetric linear system (1.6), where $A \in \mathbb{R}^{n \times n}$ is nonsingular but not necessarily symmetric positive definite.
As shown in the coming sections, the analysis of the Barzilai and Borwein gradient method is difficult for the unsymmetric linear equation (1.6). In this paper, we assume that there are only two variables and that $A$ has two real eigenvalues. In addition, we assume that
The condition (1.8) does not imply the positive definiteness of $A$, which is required in the analysis of Barzilai & Borwein (1988), Dai & Liao (2002), and
Raydan (1993). Under the above assumptions, we prove that if the eigenvalues
of A are the same, the Barzilai and Borwein gradient method is Q-superlinearly
convergent (see Section 2). If A has different eigenvalues, the method converges
for almost all initial points and the convergence rate is R-superlinear (see Sec-
tion 4). The two results strongly depend on the analyses of two recurrence
relations, namely (2.8) and (4.11) (see Sections 3 and 5, respectively). Some
concluding remarks are drawn in Section 6.
Assume that $\lambda_1$ and $\lambda_2$ are the two nonzero real eigenvalues of $A$. Since the Barzilai and Borwein gradient method is invariant under orthogonal transformations, we assume without loss of generality that the matrix $A$ in (1.6) has the form
$$A = \begin{pmatrix} \lambda_1 & \delta \\ 0 & \lambda_2 \end{pmatrix}, \qquad (2.1)$$
where $\delta \in \mathbb{R}$. In this section, we will consider the case that $A$ has two equal eigenvalues, namely
$$\lambda_1 = \lambda_2 = \lambda. \qquad (2.2)$$
Assuming that
$$g_k^T A g_k \ne 0 \quad \text{for all } k \ge 0, \qquad (2.7)$$
we have by (2.4)-(2.5) that $g_k^{(1)} g_k^{(2)} \ne 0$ for all $k$. Thus $t_k$ is well defined and $t_k \ne 0$. The relation (2.7) also implies that $\delta \ne 0$, for otherwise the algorithm gives the solution in at most two steps. Furthermore, the division of (2.4) by (2.5) yields the recurrence relation
By Theorem 3.1, we know that there exists at most a zero measure set $S$ such that
$$\lim_{k\to\infty} |t_k| = +\infty \quad \text{for all } (t_1, t_2) \in \mathbb{R}^2 \setminus S. \qquad (2.9)$$
Then by (2.4), we can show that for most initial points, the Barzilai and Borwein gradient method converges globally and the convergence rate is Q-superlinear.
Theorem 2.1 Consider the linear equations (1.6), where $A \in \mathbb{R}^{2 \times 2}$ is nonsingular and has two identical real eigenvalues. Suppose that the Barzilai and Borwein gradient method (1.2) and (1.4) is used, satisfying (1.8). Then for all $(t_1, t_2) \in \mathbb{R}^2 \setminus S$, where $S \subset \mathbb{R}^2$ is some zero measure set in $\mathbb{R}^2$, we have that
$$\lim_{k\to\infty} g_k = 0.$$
Proof: By Theorem 3.1, we know that relation (2.9) holds for some zero measure set in $\mathbb{R}^2$. It follows from (2.6), (2.9), and (2.2) that
$$\lim_{k\to\infty} \frac{\delta \|g_{k-1}\|_2^2}{g_{k-1}^T A g_{k-1}} = \lim_{k\to\infty} \frac{\delta\bigl(1 + t_{k-1}^2\bigr)}{\lambda t_{k-1}^2 + \delta t_{k-1} + \lambda} = \frac{\delta}{\lambda}. \qquad (2.13)$$
Then it follows from (2.4) and the above relations that
Noting that $\|g_k\|_2^2 = \bigl(g_k^{(1)}\bigr)^2\bigl(1 + t_k^2\bigr)$, we then get by this, (2.14), and (2.6) that
$$\lim_{k\to\infty} \frac{\|g_{k+1}\|_2}{\|g_k\|_2} = \lim_{k\to\infty} \frac{\bigl|g_{k+1}^{(1)}\bigr|\sqrt{1 + t_{k+1}^2}}{\bigl|g_k^{(1)}\bigr|\sqrt{1 + t_k^2}} = 0.$$
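Theorem 2.1 can be probed numerically. The sketch below runs the BB iteration on a made-up $2 \times 2$ upper-triangular system with equal eigenvalues ($\lambda = 2$, $\delta = 0.5$, the form assumed for (2.1)) from a generic starting point; the residual collapses to machine precision, consistent with Q-superlinear convergence:

```python
# Made-up instance of the assumed form (2.1): equal eigenvalues, delta != 0.
A = [[2.0, 0.5], [0.0, 2.0]]          # lambda1 = lambda2 = 2, delta = 0.5
b = [1.0, 1.0]
x, alpha = [0.3, 0.7], 0.1            # generic start, arbitrary first stepsize
g = [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
     A[1][0] * x[0] + A[1][1] * x[1] - b[1]]
best = max(abs(v) for v in g)
for _ in range(200):
    xn = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    gn = [A[0][0] * xn[0] + A[0][1] * xn[1] - b[0],
          A[1][0] * xn[0] + A[1][1] * xn[1] - b[1]]
    s = [xn[0] - x[0], xn[1] - x[1]]
    yv = [gn[0] - g[0], gn[1] - g[1]]
    sy = s[0] * yv[0] + s[1] * yv[1]
    if sy == 0.0:
        break
    alpha = (s[0] * s[0] + s[1] * s[1]) / sy   # BB stepsize (1.4)
    x, g = xn, gn
    best = min(best, max(abs(v) for v in g))
# the residual collapses to machine precision from this generic start
assert best < 1e-8
```

For this particular matrix the stepsize is confined to $[1/2.25,\, 1/1.75]$, since $s^T A s = 2\|s\|^2 + 0.5\,s_1 s_2$, so the iteration is well defined at every step.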
Proof: The statements follow directly from (2.8) and the definition of $\{a_k\}$.
Proof: Relation (3.1) and $a_k > 0$ indicate that $\{a_k\}$ is monotonically increasing. Assume that
$$\lim_{k\to\infty} a_k = M < +\infty.$$
Then by this and (3.1), we get that
$$\lim_{k\to\infty} a_k \ge M + M^{-1}, \qquad (3.3)$$
Lemma 3.3 Consider the sequence $\{t_k\}$ that satisfies (2.8) and $t_k \ne 0$ for all $k$. Then there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that for all $(t_1, t_2) \in \mathbb{R}^2 \setminus S$, one of the following relations holds for some integer $\bar k$:
Proof: Assume that neither (3.4) nor (3.5) holds for all $k \ge 1$. Then there must exist an integer $\bar k$ such that either $t_{\bar k} > 0$, $t_{\bar k+1} < 0$ or
Since $t_k$ and $\bar t_k = -t_k$ satisfy the same recurrence relation by part (ii) of Lemma 3.2, we assume without loss of generality that (3.6) holds. Then by (3.6) and (2.8), we have that $t_{\bar k+2} = -t_{\bar k+1} - t_{\bar k+1}^{-1} > 0$. It follows that $t_{\bar k+3} < 0$, for otherwise (3.4) holds with $\bar k := \bar k + 1$. Similarly, we can prove that
$$t_{\bar k+4i+3} = -\bigl[t_{\bar k+4i+1} + \bigl(t_{\bar k+4i+2} + t_{\bar k+4i+2}^{-1}\bigr)\bigr]^{-1} < 0,$$
yielding
$$t_{\bar k+4i+1} + t_{\bar k+4i+1}^{-1} < t_{\bar k+4i+1} + \bigl|t_{\bar k+4i+2} + t_{\bar k+4i+2}^{-1}\bigr|^{-1}, \qquad (3.11)$$
which, with (3.9), implies that $t_{\bar k+4i+3} < 1 < t_{\bar k+4i+1}$. Since $t + t^{-1}$ is monotonically decreasing for $t \in (0, 1)$, we can conclude from (3.11) that
$$|t_{\bar k+4i+4}| > |t_{\bar k+4i+2}| + |t_{\bar k+4i+2}|^{-1}.$$
Similar to (3.15)-(3.17), we can prove the following three relations
$$|t_{\bar k+4i+6}| > |t_{\bar k+4i+4}| + |t_{\bar k+4i+4}|^{-1}.$$
By (3.17), (3.20), and Lemma 3.2, we get that
This together with (3.15), (3.16), (3.18), and (3.19) indicates that
$$\lim_{i\to\infty} |t_{\bar k+2i+1}| = 0.$$
$$t_{\bar k+4i+2} = t_{\bar k+4i+1}^{-1} + \bigl(1 - \psi_{\bar k+4i+1}\bigr)t_{\bar k+4i+1},$$
where $\psi_{\bar k+4i+1} \in (0, 1)$. Then it follows from (3.23) and (2.8) that
$$t_{\bar k+4i+3} = -\psi_{\bar k+4i+1} t_{\bar k+4i+1},$$
where
$$r_i = \bigl(2 - \psi_{\bar k+4i+1}\bigr)\psi_{\bar k+4i+1} t_{\bar k+4i+1}^2 + \cdots + \bigl(1 - \psi_{\bar k+4i+1}\bigr)t_{\bar k+4i+1}^2. \qquad (3.26)$$
From (3.19), (3.22), and (3.25), we see that
$$\lim_{i\to\infty} \psi_{\bar k+4i+1} = 1 \quad \text{and} \quad \lim_{i\to\infty} \bigl(\psi_{\bar k+4i+1} - 1\bigr)t_{\bar k+4i+1}^{-1} = 0. \qquad (3.27)$$
On the other hand, by (3.24) and the first part of (3.27), we see that
Noting that $\{-t_k\}$ and $\{t_k\}$ satisfy the same recurrence relation and replacing all $t_{\bar k+4i+j}$ with $-t_{\bar k+4i+j+2}$ in the previous discussions, we can similarly establish that
$$\lim_{i\to\infty} \bigl(-t_{\bar k+4i+5}\bigr)/\bigl(-t_{\bar k+4i+3}\bigr) = -1. \qquad (3.31)$$
Relations (3.30) and (3.31) imply that
$$\lim_{i\to\infty} t_{\bar k+4i+5}/t_{\bar k+4i+1} = 1. \qquad (3.32)$$
$$h_{\bar k+4i+1} = -2 t_{\bar k+4i+1} + O\bigl(t_{\bar k+4i+1}^2\bigr). \qquad (3.33)$$
Substituting (3.34) into (3.35) and comparing the resulting expression with (3.29) yields
$$h_{\bar k+4i+1} = -2 t_{\bar k+4i+1} + t_{\bar k+4i+1}^2 + O\bigl(t_{\bar k+4i+1}^3\bigr). \qquad (3.36)$$
$$h_{\bar k+4i+4} = -t_{\bar k+4i+3} + 2 t_{\bar k+4i+3}^3 - 6 t_{\bar k+4i+3}^4 + O\bigl(t_{\bar k+4i+3}^5\bigr). \qquad (3.38)$$
Substituting (3.37) into (3.38) and comparing it with (3.29), we then obtain
Theorem 3.1 Consider the sequence $\{t_k\}$ that satisfies (2.8). Then there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that
Proof: By Lemma 3.3, we know that there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that (3.4) or (3.5) holds for some $\bar k$ if $(t_1, t_2) \in \mathbb{R}^2 \setminus S$. Assume without loss of generality that (3.4) holds, for otherwise we may consider the sequence $\bar t_k = -t_k$. Then by part (i) of Lemma 3.1 and relation (3.4), we have that
Since $t + t^{-1} > 2$ for $t > 0$, it follows from (3.41) and (3.4) that $t_{\bar k+2} < -2$. Consequently,
By relations (3.41)-(3.43), we can similarly show that $t_{\bar k+6} > 0$, $t_{\bar k+7} > 0$ and $t_{\bar k+8} > 0$. The repetition of this procedure yields
$$t_{\bar k+6i+j} > 0 \quad \text{for all } i \ge 0 \text{ and } j = 0, 1, 2; \qquad t_{\bar k+6i+j} < 0 \quad \text{for all } i \ge 0 \text{ and } j = 3, 4, 5. \qquad (3.44)$$
$$\lim_{i\to\infty} |t_{\bar k+3i+2}| = +\infty. \qquad (3.48)$$
In fact, if (3.48) is not true, then there exists some constant $M > 1$ such that
$$|t_{\bar k+3i+2}| \le M \quad \text{for all } i \ge 0. \qquad (3.49)$$
Relation (3.46) implies that there exists some integer $\bar i > 0$ such that
$$|t_{\bar k+3i+3}| \ge M \quad \text{for all } i > \bar i. \qquad (3.50)$$
It follows from part (i) of Lemma 3.1, (3.44), (3.49), and (3.50) that for all $i > \bar i$,
and
$$|t_{\bar k+3i_l+2}| \ge 2M. \qquad (3.54)$$
Relation (3.48) implies that $\{i_l\}$ is an infinite set. By the choice of $\{i_l\}$, we have that
$$|t_{\bar k+3j+2}| < 2M \quad \text{for } j \in [i_l + 1,\, i_{l+1} - 1]. \qquad (3.55)$$
It follows from this, (3.50) and part (i) of Lemma 3.1 that
Since $M$ can be arbitrarily large, we know that (3.47) must hold. This together with (3.46) completes our proof.
In this section, we analyze the Barzilai and Borwein gradient method for unsymmetric linear equations (1.6), assuming that the coefficient matrix $A$, in the form of (2.1), has two different real eigenvalues, i.e.,
Denote
$$\gamma = \frac{\lambda_2 - \lambda_1}{\delta}, \qquad P = \begin{pmatrix} 1 & -\gamma^{-1} \\ 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Then the matrix $A$ can be written as $A = P^{-1} D P$.
Further, defining $u_k = P g_k$ and $v_k = P^{-T} g_k$, we can get by multiplying (2.3) with $P$ that
Assume that $u_k = \bigl(u_k^{(1)}, u_k^{(2)}\bigr)^T$ and $v_k = \bigl(v_k^{(1)}, v_k^{(2)}\bigr)^T$. The above relation indicates that
$$u_{k+1}^{(1)} = \frac{(\lambda_2 - \lambda_1)\, u_{k-1}^{(2)} v_{k-1}^{(2)}}{u_{k-1}^T D v_{k-1}}\, u_k^{(1)}, \qquad u_{k+1}^{(2)} = \frac{(\lambda_1 - \lambda_2)\, u_{k-1}^{(1)} v_{k-1}^{(1)}}{u_{k-1}^T D v_{k-1}}\, u_k^{(2)}. \qquad (4.4)$$
Under the condition (1.8), it is easy to show by (4.4) and the definitions of $u_k$ and $v_k$ that $u_k^{(1)} u_k^{(2)} \ne 0$ and $v_k^{(1)} v_k^{(2)} \ne 0$. Let $q_k$ be the ratio $u_k^{(1)}/u_k^{(2)}$. It follows from (4.4) that
On the other hand, by the definitions of $u_k$ and $v_k$, we have that
which yields
Substituting (4.7) into (4.5), we then obtain the following recurrence relation
from which one can establish the R-superlinear convergence result of the Barzilai and Borwein gradient method (see Barzilai & Borwein (1988)). In this paper, we assume that $\gamma \ne 0$.
For simplicity, we denote $\tau = \gamma^2$ and define the sequence
$$\lim_{i\to\infty} \frac{p_{\bar k+6(i+1)+j}}{p_{\bar k+6i+j}} = (1 + \tau)^{-2}, \quad \text{for } j = 4, 5,$$
From the above relations, we can show that if $A$ has two different eigenvalues, the Barzilai and Borwein gradient method converges globally and its convergence rate is R-superlinear.
Then for the Barzilai and Borwein gradient method (1.2) and (1.4), if (1.8) holds, we have that
$$\lim_{k\to\infty} g_k = 0. \qquad (4.17)$$
Further, the convergence rate is R-superlinear.
Thus by Theorems 5.1 and 5.2, we know that there exists some integer $\bar k$ such that the relations (4.12)-(4.15) hold. For any sufficiently small $\epsilon > 0$, let $\bar\epsilon \in (0, \epsilon)$ be another small number. For this $\bar\epsilon$, we know from (4.12) and (4.14) that there exists an integer $\bar i$ such that for all $i \ge \bar i$,
where $c > 0$ is some constant. Further, since $\bar\epsilon < \epsilon$, it follows from (4.21), (4.13), and (4.15) that there exists an integer $i_2$ such that for all $i \ge i_2$,
$$\bigl|u^{(1)}_{\bar k+6i+j}\bigr| \le c_1(1 + \tau - \epsilon)^{-2i}, \quad \text{for } j = 1, 2; \quad \dots \quad \text{for } j = 3; \quad \dots \quad \text{for } j = 4, 5, 6, \qquad (4.22)$$
$$\bigl|u^{(2)}_{\bar k+6i+j+1}\bigr| \le c_1(1 + \tau - \epsilon)^{-2i}, \quad \text{for } j = 4, 5; \quad \dots \quad \text{for } j = 1, 2, 3; \quad \dots \quad \text{for } j = 6. \qquad (4.23)$$
Thus for any $\epsilon > 0$, the following relation holds with some positive constants $c_1$ and $c_2$:
$$\bigl|u_k^{(j)}\bigr| \le c_1 c_2 (1 + \tau - \epsilon)^{-k^2}, \quad \text{for } j = 1, 2. \qquad (4.24)$$
The definition of $u_k$ implies that $g_k = P^{-1} u_k$; this and (4.24) give
$$\bigl(p_{\bar k+2i},\, p_{\bar k+2i+1}\bigr) = (-\bar m,\, \bar m), \quad \text{for some } \bar m > 0 \text{ and all } i \ge 0. \qquad (4.27)$$
By this relation, (4.18) and (4.19), we see that if $\lambda_1\lambda_2 > 0$, then the Barzilai and Borwein gradient method is linearly convergent; otherwise, if $\lambda_1\lambda_2 < 0$, the method need not converge. Here it is worthwhile pointing out that the case of (4.16) can be avoided if the first stepsize $\alpha_1$ is computed by an exact line search.
Since
$$\tau = \gamma^2 = (\lambda_2 - \lambda_1)^2/\delta^2, \qquad (4.28)$$
the value of $\tau$ can be regarded as a quantity that shows the degree to which the matrix $A$ is close to a symmetric matrix. Relation (4.25) indicates that the bigger $\tau$ is, namely the closer $A$ is to a symmetric matrix, the faster the Barzilai and Borwein gradient method converges to the solution.
In this section, we consider the sequence $\{p_k\}$ that satisfies (4.11). Assume that $p_k \ne 0, 1$ for all $k \ge 1$. To expedite our analyses, we introduce the functions
Proof: (i) follows from the definitions of $\{p_k\}$, $h(p)$ and $\phi(p)$. (ii) follows from (i). For (iii), noting that $p_k = (1 + \tau) r_k^{-1}$, we have that
Lemma 5.2 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 0, 1$ for all $k$. Then for any integer $\bar k$, the following cycle cannot occur:
$$\lim_{i\to\infty} p_{\bar k+4i+1} = 1, \qquad \lim_{i\to\infty} p_{\bar k+4i+2} = 0, \qquad (5.2)$$
By the definition of $\{p_k\}$ and the above relations, we get by direct calculation that
where
$$\lim_{i\to\infty} \theta_i = 0.$$
However, due to the relation between $\{r_k\}$ and $\{p_k\}$, similarly to (5.15) we have
Since the value on the right-hand side of (5.17) is not equal to that of (5.18) for any $\tau > 0$, we see that the two relations contradict each other. Therefore this lemma is true.
Lemma 5.3 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 0, 1$ for all $k$. Assume that
$$p_k > 0 \quad \text{for all large } k. \qquad (5.19)$$
Then there exists some index $\bar k$ such that one of the following relations holds:
By (5.23), (5.22), and (5.19), we can see that there exists some $\bar k$ such that
By part (iii) of Lemma 5.1, we assume without loss of generality that (5.24) holds, for otherwise consider the sequence $\{r_k\}$. Now we proceed by contradiction and assume that neither (5.20) nor (5.21) holds. Then by the definition of $\{p_k\}$ and the relation (5.24), it is easy to show that for all $i \ge 1$:
In the following, we will prove that the cycle (5.26) cannot occur infinitely. In fact, since $p_{\bar k+4i} \in (0, 1)$, we have by part (i) of Lemma 5.1 that
$$(1 + \tau)\phi(p^*) \ge p^*. \qquad (5.28)$$
It follows from the definition of $p_{\bar k+4i+4}$ and $p_{\bar k+4i+3} \in (0, 1)$ that
Using the fact that $\phi(p)$ is monotonically decreasing for $p \in (p^*, 1)$, we get from (5.33) that
$$p_{\bar k+4i+3} > p_{\bar k+4i-1}, \qquad (5.34)$$
for otherwise we have from the monotonicity of $\phi(p)$ and (5.26) that
$$(1 + \tau)\phi(p_{\bar k+4i+3})^{-1}\phi\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) \ge (1 + \tau)\phi(p_{\bar k+4i-1})^{-1}\phi\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) = (1 + \tau)\, h\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) > 1. \qquad (5.35)$$
Relations (5.34) and (5.26) indicate that $\lim_{i\to\infty} p_{\bar k+4i-1} = c_4 \in (0, 1]$. If $c_4 < 1$, then we have that
Further, by the definition of $\{p_k\}$, (5.36), and part (iii) of Lemma 5.1, we obtain
However, Lemma 5.2 shows that the cycle (5.36)-(5.37) cannot occur. Thus (5.26) cannot hold for all $k$ and this lemma is true.
Theorem 5.1 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 1$ for all $k$. Assume that relation (5.19) holds. Then there exists some index $\bar k$ such that (4.12)-(4.15) hold.
Proof: By Lemma 5.3, there exists an integer $\bar k$ such that (5.20) or (5.21) holds. By part (iii) of Lemma 5.1, we assume without loss of generality that (5.20) holds, for otherwise we consider the sequence $\{(1 + \tau)p_k^{-1}\}$. By (5.20) and part (i) of Lemma 5.1, it is easy to see that
Notice that
By (5.43) and part (iii) of Lemma 5.1, we can similarly prove that
Assume that (5.48) is false. Then there exists some constant $c_5 > 0$ such that
$$p_{\bar k+6i+2} \ge c_5 \quad \text{for all } i \ge 1. \qquad (5.49)$$
Noting that
$$h(p) > 1 + \tau \quad \text{for } p \in (0, 1), \qquad (5.50)$$
we have from this, part (ii) of Lemma 5.1, and (5.46) that
$$\limsup_{i\to\infty} \frac{p_{\bar k+6i+7}}{p_{\bar k+6i+1}} \le \frac{1 + \tau}{h(c_5)} < 1. \qquad (5.51)$$
The relation (5.51) implies the truth of (5.47), which contradicts (5.49). Thus
(5.48) is true. For any E E (0,0.5], we know from (5.46) and (5.48) that there
exists some integer i such that
and
h ( ~ ~ + 6 i + 55) h(e), for all i 2 i.
h(~k+6i+2)
By part (ii) of Lemma 5.1, (5.50), (5.53), (5.52), and e 5 0.5, it is clear that
Then for any i ∈ [ī + 1, î − 1], we have p_{k+6i+1} > ε, and this together with
part (ii) of Lemma 5.1, (5.50), (5.53), and (5.54) indicates that

    p_{k+6i+1} ≤ p_{k+6(i−1)+1} ≤ ··· ≤ p_{k+6(ī+1)+1} ≤ 2ε, for all i ∈ [ī + 1, î − 1],

and hence

    p_{k+6i+1} ≤ 2ε, for all i ≥ ī.    (5.57)
and

    d_1((1 + τ)p_k^{-1}) = d_1(p_k),
Relation (5.63), the definition of {p_k}, and part (i) of Lemma 5.1 indicate that
we have from the definitions of D, d_1, d_2, and part (i) of Lemma 5.1 that
Lemma 5.5 Consider the sequence {p_k} that satisfies (4.11) and p_k ≠ 0, 1.
Assume that there exists an infinite subsequence {k_i} such that p_{k_i} < 0. If
(5.68) holds, then there exists some index k such that at least one of the following
relations holds:

    p_k < 0,  p_{k+1} ∈ (0, 1).    (5.69)
We proceed by contradiction and assume that neither (5.69) nor (5.70) holds
for any k. By the definition of {p_k}, we see that if p_k < 0 and p_{k+1} < 0, then
p_{k+2} > 0. So there must exist some integer k such that
Define d_1, d_2, and D as in Lemma 5.4. Then by Lemma 5.4 and (5.72), we
know that D(p_{k+2i+2}, p_{k+2i+3}) is monotonically decreasing in i. If (5.68) holds,
one can strengthen the analysis in Lemma 5.4 and obtain a constant c_6 ∈ (0, 1),
depending only on p_k, such that

    D(p_{k+2i+2}, p_{k+2i+3}) ≤ c_6 D(p_{k+2i}, p_{k+2i+1}), for all i ≥ 0.
It follows that

    lim_{i→∞} D(p_{k+2i}, p_{k+2i+1}) = 0.    (5.74)

Since d_2(p_{k+2i}, p_{k+2i+1}) ≥ (1 + τ)^{-3}, we get from this, (5.74), and the definition of D that

    lim_{i→∞} d_1(p_{k+2i}, p_{k+2i+1}) = 0.

On the other hand, the definition of p_{k+2i+2} and (5.72) imply that

    p_{k+2i} p_{k+2i+2} = p_{k+2i+1}^2 h(p_{k+2i}) ∈ (1, (1 + τ)^2).
By (5.76), (5.77), and (5.58), we know that there exists some integer, which we
continue to denote by k, such that

    lim_{i→∞} p_{k+4i} = −∞,  lim_{i→∞} p_{k+4i+2} = 0.    (5.78)

Further, from (5.78), (5.72), and the definition of {p_k}, we can obtain

    lim_{i→∞} p_{k+4i+1} = 1,  lim_{i→∞} p_{k+4i+3} = 1 + τ.    (5.79)
However, Lemma 5.2 shows that the cycle (5.78)-(5.79) cannot occur. The
contradiction establishes the lemma.
Theorem 5.2 Consider the sequence {p_k} that satisfies (4.11) and p_k ≠ 0, 1
for all k. Assume that there exists an infinite subsequence {k_i} such that p_{k_i} < 0
and that (5.68) holds. Then there exists some integer k such that relations
(4.12)-(4.15) hold.
Proof: By Lemma 5.5, there exists an index k such that (5.69) or (5.70)
holds. By part (iii) of Lemma 5.1, we assume without loss of generality that
(5.69) holds. Then it follows by the definition of {p_k} that (5.80) holds,
where p_1^* is the same constant as in (5.41). Note that
By this relation, (5.80), and part (ii) of Lemma 5.1, we can get (5.87) and

    lim_{i→∞} p_{k+6i} = 0.

In fact, by part (ii) of Lemma 5.1, (5.87), (5.89), and (5.81), we can get

    liminf_{i→∞} p_{k+6i+3}/p_{k+6i+2} ≥ 1.
In a similar way, we can show (5.95). Thus, by part (ii) of Lemma 5.1, (5.87),
(5.89), (5.94), and (5.95), it follows that (5.96) holds. Then from this and part
(ii) of Lemma 5.1 we can get

    p_{k+6i+7}/p_{k+6i+1} ≤ h(p_{k+6i+4}) h(p_{k+6i+5}) / [h(p_{k+6i+1}) h(p_{k+6i+2})] ≤ (1 + τ)/h(c_7) < 1,    (5.99)

which implies the truth of (5.97). However, this contradicts (5.98). Thus (5.97)
is true. Similarly to (5.90) and (5.97), we can show (5.100). By part (ii) of
Lemma 5.1 and relations (5.87), (5.89), (5.90), (5.97), and (5.100), we know
that (4.12)-(4.15) hold with this index k.
6 CONCLUDING REMARKS
In this paper, we have analyzed the Barzilai and Borwein gradient
method for unsymmetric linear equations, assuming that the dimension is n =
2. Under mild assumptions, we have proved that the convergence rate of the
Barzilai and Borwein gradient method is Q-superlinear if the coefficient matrix
A has two identical eigenvalues; if the eigenvalues of A are different, then the
method converges for almost all starting points x_1 and x_2, and the convergence
rate is R-superlinear. These results depend strongly on the study of two
nonlinear recurrence relations, namely (2.8) and (4.11), which makes the analysis
difficult.
From the relations (2.13) and (4.25), we can see that the convergence of
the Barzilai and Borwein gradient method is related to the degree of symmetry of the
coefficient matrix A. If A is close to a symmetric matrix, then the method
converges rapidly; conversely, if the matrix A is markedly unsymmetric, then
the method converges slowly. The convergence of the Barzilai and Borwein
gradient method for unsymmetric linear equations is slower than that for
symmetric linear equations. In the symmetric case, the Barzilai and Borwein
gradient method gives the solution in at most two steps if A has two identical
eigenvalues; if the eigenvalues of A are different, we know from Barzilai & Borwein
(1988) that the R-superlinear convergence order of the method is √2 − ε,
where ε > 0 is any small number. In the unsymmetric case, however, relation
(4.25) indicates that the R-superlinear order of the method for unsymmetric
linear equations is only 1 if A has two different eigenvalues. Thus, to accelerate
the Barzilai and Borwein gradient method for unsymmetric linear equations, it
may be worthwhile to study transformations of an unsymmetric
matrix that improve its degree of symmetry.
This paper has made some effort to directly extend the convergence
result in Barzilai & Borwein (1988) for the Barzilai and Borwein gradient method
to unsymmetric linear equations. As pointed out by the referee, another
possibility is to apply the method to the symmetric (least squares) system

    AᵀAx = Aᵀb.    (6.1)
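As a rough illustration of the referee's suggestion, the sketch below (our own, with hypothetical names; it is not code from the paper) applies the BB gradient iteration to the normal-equation system AᵀAx = Aᵀb, using the steplength α = sᵀy/sᵀs and the update x ← x − g/α, where g = Aᵀ(Ax − b):

```python
import numpy as np

def bb_normal_equations(A, b, x0, tol=1e-10, max_iter=500):
    """BB gradient iteration applied to the least-squares system A^T A x = A^T b.

    A sketch only: the steplength alpha = s^T y / s^T s is a Rayleigh
    quotient of the (symmetric positive semidefinite) matrix A^T A.
    """
    x = x0.astype(float)
    g = A.T @ (A @ x - b)            # gradient of 0.5*||Ax - b||^2
    alpha = 1.0                      # arbitrary positive initial steplength
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - g / alpha        # gradient step with inverse BB steplength
        g_new = A.T @ (A @ x_new - b)
        s, y = x_new - x, g_new - g
        alpha = (s @ y) / (s @ s)    # BB steplength for the next iteration
        x, g = x_new, g_new
    return x
```

Since AᵀA is symmetric, the symmetric-case convergence theory applies directly, at the cost of squaring the condition number.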
Acknowledgments
This research was supported by the Chinese NSF grant (no. 19801033 and 10171104)
and Hong Kong Research Grants Council (no. CUHK 43921993). The authors would
like to thank the anonymous referee for his many valuable comments on this paper.
References
1 INTRODUCTION
where
A similar result for the best piece-wise polynomial approximation problem was
proved in Vershik, Malozemov, Pevnyi (1975).
For every j E J the function Fj(x) is called an elementary function, and
it is assumed that one is able to find a minimizer of Fj(x) (exactly or ap-
proximately). The problem of minimizing Fj(x) will also be referred to as
AN EXCHANGE ALGORITHM FOR SUM-MIN FUNCTIONS
If σ_i = ∅, then by definition c_i(x_i, ∅) = 0, i = 1, 2.

    F(x) ≤ F_Γ(x)  for all x ∈ S.
Definition 2.1 Every such partition (it depends upon x and, evidently, is
not unique) is called an x-proper partition of the set Ω.
Let T(Ω, x) denote the family of all x-proper partitions of the set Ω. Clearly,
the inclusion T(Ω, x) ⊂ T(Ω) holds.
For Γ = (σ_1, σ_2) ∈ T(Ω) let us introduce the function

    Φ(Γ) = inf_{x_1 ∈ R^{n_1}} c_1(x_1, σ_1) + inf_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2).

Hence,

    inf_{x ∈ S} F(x) ≤ min_{Γ ∈ T(Ω)} Φ(Γ).    (3.2)

On the other hand, for an x-proper partition Γ_0,

    F(x) ≥ inf_{x_1 ∈ R^{n_1}} c_1(x_1, σ_1) + inf_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2) = Φ(Γ_0) ≥ min_{Γ ∈ T(Ω)} Φ(Γ).
    min_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2) = c_2(x_2(σ_2), σ_2).    (3.5)

The point x(Γ) = (x_1(σ_1), x_2(σ_2)) is not unique if the minima in (3.4) or (3.5)
are attained at more than one point.
Remark 3.1 Theorem 3.1 implies that the problem of minimizing the function
F on S reduces to solving a finite number (precisely, |T(Ω)|)
of problems of minimizing functions of the form
4 MINIMALITY CONDITIONS
Then for the point x̄ = (x̄_1, x̄_2) ∈ S, by (4.3) and (2.2) one gets the following:

for every t_i ∈ Z_1(x*) there exists δ_i > 0 such that t_i ∈ Z_1(x) for any
x ∈ B(x*, δ_i);

for every t_j ∈ Z_2(x*) there exists ε_j > 0 such that t_j ∈ Z_2(x) for any
x ∈ B(x*, ε_j);

consequently, Z_1(x*) ⊂ Z_1(x̄) and Z_2(x*) ⊂ Z_2(x̄).
which contradicts the fact that Z_1(x̄) ∩ Z_2(x̄) = ∅. So t ∈ C(x*), and hence
A_1 ⊂ C(x*). Similarly we get A_2 ⊂ C(x*).

Now let us take an arbitrary disjoint partition (Z̄_1, Z̄_2) of the set C(x̄). For
the corresponding x̄-proper partition (σ_1, σ_2) ∈ T(Ω, x̄) we obtain (4.7).
It is easy to see from (4.4) that Z̄_1 ⊂ C(x̄) ⊂ C(x*) and
Z̄_2 ⊂ C(x̄) ⊂ C(x*); therefore (4.8) holds.

It follows from (4.7) and (4.8) that x* is a local minimizer of the function F. □
Definition 4.1 A point x* E S satisfying the conditions (4.1) and (4.2) is
called a stationary point.
Remark 4.1 The notion of a stationary point is closely related to the necessary
condition used. Since the conditions (4.1) and (4.2) are necessary conditions
for a global minimum, it is natural to expect that not every local minimizer
is a stationary point. Note that conditions (4.1) and (4.2) are of a nonlocal
nature.
5 AN EXCHANGE ALGORITHM
Let us suppose that for every Γ = (σ_1, σ_2) ∈ T(Ω) the corresponding infima
are attained, that is, there exists a point x(Γ) = (x_1(σ_1), x_2(σ_2)) ∈ S at which
they are achieved.

The following algorithm allows one to find a stationary point of the function
F (and if the φ_i, i = 1, 2, are continuous, then the resulting point will be a local
minimizer of F).
2. Let x^k = (x_1^k, x_2^k) ∈ S have already been found. Construct the sets
Z_1(x^k), Z_2(x^k), and C(x^k).

3. Check the conditions (4.1) and (4.2) for all Γ = (σ_1, σ_2) ∈ T(Ω, x^k).

4. If the conditions (4.1) and (4.2) are satisfied for all Γ ∈ T(Ω, x^k), then the
point x^k is stationary, and the process terminates.

5. Otherwise, find any Γ_k = (σ_1^k, σ_2^k) ∈ T(Ω, x^k) for which one of the
conditions (4.1), (4.2) is violated.
Clearly,

    F(x^{k+1}) < F(x^k).    (5.2)

As a result, a sequence {x^k} is constructed such that condition (5.2)
holds. Since every x-proper partition (σ_1^k, σ_2^k) may occur only once (due to
(5.2)), and taking into account the fact that |T(Ω)| is finite, one concludes
that the algorithm converges to a stationary point in a finite number of steps.
Remark 5.1 The algorithm described above may require at some steps the complete
enumeration of the set T(Ω, x^k), where |T(Ω, x^k)| = 2^{|C(x^k)|}. So, in practice, the algorithm
is effective if |C(x^k)| is not very large. Theoretically, the case of complete
enumeration of T(Ω) is possible (as for every algorithm of discrete mathematics).
6 AN ε-EXCHANGE ALGORITHM
In this section it is assumed that the φ_i are continuous. For every fixed point
x ∈ S and ε > 0 let us introduce the sets
Put

    σ_1 = Z_1(x) ∪ ω_1,  σ_2 = Z_2(x) ∪ ω_2,

where (ω_1, ω_2) is a disjoint partition of the set of ε-common points.
Definition 6.1 Every such partition (it depends upon x, ε and, evidently, is
not unique) is called an (x, ε)-proper partition of the set Ω.
Definition 6.2 Let all φ_i be continuous and ε > 0. A point x* ∈ S is called
an ε-local minimizer of the function F if for every (x*, ε)-proper partition
Γ = (σ_1, σ_2) ∈ T_ε(Ω, x*) of the set Ω, for the point x*(Γ) (that is, the local
minimizer of F delivered by the exchange algorithm with the initial point x(Γ))
the inequality (6.1) holds.
Remark 6.1 Since the φ_i are continuous, every stationary point is a local
minimizer; the converse is not true: a local minimizer is not necessarily a stationary
point.
Note that for Γ ∈ T(Ω, x*) the inequality (6.1) becomes an equality.

Let us describe an algorithm for finding ε-local minimizers. Let ε > 0 be
fixed.
Note that the φ_i are continuously differentiable and convex. Consider the following
clustering problem.

Problem CP: Find x* = (x_1^*, x_2^*) ∈ R^n × R^n such that

where

Take any Γ = (σ_1, σ_2) ∈ T(Ω, x); the functions c_i(x_i, σ_i) defined in (2.1) take
the form

    c_i(x_i, σ_i) = Σ_{t ∈ σ_i} ||t − x_i||²,  i = 1, 2.    (7.1)

Clearly,

    min_{x_i ∈ R^n} c_i(x_i, σ_i) = c_i(x_i(σ_i), σ_i),    (7.2)

where
Many allocation problems can be described by the above model, possibly with
slightly different performance functionals (for example, in ? the functions φ_i
have the form φ_i(t, x) = ||t − x_i||). We have chosen the quadratic functions
φ_i (see (7.1)) for reasons of simplicity (since then the auxiliary problems (see
(7.2), (7.3)) have explicit solutions), because our main intention here is to
demonstrate the algorithm.
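For the quadratic functions (7.1) the elementary problem (7.2)-(7.3) is solved explicitly by the centroid of each cluster. The following is a simplified sketch of one step under our own naming (not the authors' implementation); in particular, ties ("common points") are broken arbitrarily here, whereas the full exchange algorithm enumerates every proper partition of the common points:

```python
import numpy as np

def exchange_step(points, x1, x2):
    """Assign each t to its nearer centre, then replace each centre by the
    centroid of its cluster, the explicit minimizer of
    c_i(x_i, sigma_i) = sum_{t in sigma_i} ||t - x_i||^2 (cf. (7.1)-(7.3))."""
    d1 = np.sum((points - x1) ** 2, axis=1)
    d2 = np.sum((points - x2) ** 2, axis=1)
    sigma1 = points[d1 <= d2]        # ties go to sigma1 in this sketch
    sigma2 = points[d1 > d2]
    new_x1 = sigma1.mean(axis=0) if len(sigma1) else x1
    new_x2 = sigma2.mean(axis=0) if len(sigma2) else x2
    return new_x1, new_x2

def cluster(points, x1, x2, steps=50):
    """Iterate exchange steps until the centres stop moving."""
    for _ in range(steps):
        nx1, nx2 = exchange_step(points, x1, x2)
        if np.allclose(nx1, x1) and np.allclose(nx2, x2):
            break
        x1, x2 = nx1, nx2
    return x1, x2
```

Each step cannot increase the objective, so, as in Section 5, only finitely many distinct partitions can occur and the iteration terminates.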
Then, for every x_2 ∈ R^n such that ||x_2 − x_1^*|| > 2M, the point (x_1^*, x_2) is
stationary (that is, a local minimizer as well). Such local minimizers will be
called trivial stationary points, and we shall ignore them, looking for better ones.
7.1 Example 1

The function value is F(x*) = 498.4104. The point x* and the partition (σ_1, σ_2)
of the set Ω at x* are shown in Figure 7.3.

For the initial point x^0 = (x_1^0, x_2^0) with x_1^0 = (10, 10) and x_2^0 = (−10, −10),
the local minimizer x* = (x_1^*, x_2^*), where x_1^* = (1.9500, 2.9800) and
x_2^* = (−4.5833, 0.5417), is found in three steps, with F(x*) = 417.5478.
7.2 Example 2

Consider again the problem discussed in Example 1, and apply the ε-exchange
algorithm. As the initial point for the ε-exchange algorithm, let us choose one
of the local minimizers obtained in Example 1.

The results of numerical experiments for the ε-exchange algorithm with
different ε are presented in Table 7.2. The computations start from the local
minimizer x^0 = (x_1^0, x_2^0).
Figure 7.4  The first step of the ε-exchange algorithm with ε = 15 (six ε-common points).

The point x^{*1} is an ε-local minimizer for ε up to 4.

x^{*2} = (x_1^{*2}, x_2^{*2}).

x^{*3} = (x_1^{*3}, x_2^{*3}); the point x^{*3} is an ε-local minimizer for ε up to 10.

x^{*4} = (x_1^{*4}, x_2^{*4}); the point x^{*4} is an ε-local minimizer for ε up to 30.
However, at further steps the increase of ε affected the result: due to
deeper "ε-diving" we were able to pick up a better ε-local minimizer.
8 CONCLUSIONS
Thus, we have described two algorithms: the exchange algorithm for constructing
a stationary point, and the ε-exchange algorithm for finding a possibly better
minimizer. The ε-exchange algorithm allows one to "escape" from a local
minimum. These algorithms are conceptual (in the terminology of E. Polak;
see Polak (1971)), though in some cases (as demonstrated in Section 7) they
are directly applicable.
It may happen that the number of common (or ε-common) points is large.
In such a case it is useful to perform some preliminary aggregation of these
points, reducing their number to a reasonable quantity (as well as to reduce
or increase the value of ε). The aggregation idea was proposed by Prof. M.
Gaudioso.
Computationally implementable modifications of the above algorithms for
specific classes of functions will be reported elsewhere.
At each step of both algorithms, an elementary problem of minimizing a
function of the form F_Γ(x) is to be solved. The algorithms converge in a finite
number of steps to, at least, a local minimizer. If ε is sufficiently large, the
ε-exchange algorithm will produce a global minimizer; however, theoretically
it may require the complete enumeration (and solution) of all elementary
problems. The hope and expectation are that if we take a reasonable ε, then (at
least statistically) the price paid for a fairly good local minimizer will not be
too high.
The case m > 2 can be studied in a similar way. Analogous results and
algorithms can be formulated. The number of "elementary" problems becomes
m^{|Ω|} (cf. Remark 3.1) and, of course, all calculations are more complicated.
However, the exchange and ε-exchange algorithms can be constructed.
Acknowledgments
The author is thankful to an anonymous referee for his careful reading of the manu-
script, useful advice, remarks and suggestions.
References
ON THE BARZILAI-BORWEIN
METHOD
Roger Fletcher
Department of Mathematics
University of Dundee
Dundee DD1 4HN, Scotland, UK
Email: [email protected]
1 INTRODUCTION
where the number of variables n is very large, typically 10^6 or so. The case of
minimization subject to simple bounds is also considered later in the paper. A
related problem that is also studied is the solution of a nonlinear self-adjoint
elliptic system of equations, which is equivalent to the solution of the linear
system of equations
attention until the seminal paper of Raydan (1997). This paper introduces
a globalization strategy based on the non-monotone line search technique of
Grippo, Lampariello and Lucidi (1986), which enables global convergence of
the BB method to be established for non-quadratic functions. Of equal impor-
tance, a wide range of numerical experience is reported on problems of up to
10^4 variables, showing that the method compares reasonably well against the
Polak-Ribière and CONMIN techniques. Earlier papers by Glunt, Hayden and
Raydan (1993) and Glunt, Hayden and Raydan (1994) also report promising
numerical results on a distance matrix problem. The paper Glunt, Hayden and
Raydan (1994) reports on the possibilities for preconditioning the BB method,
and this theme is also taken up by Molina and Raydan (1996). Of particular
interest is the possibility of applying the BB method to box-constrained opti-
mization problems, and this is considered by Friedlander, Martinez and Raydan
(1995) (for quadratic functions) and by Birgin, Martinez and Raydan (2000).
The latter paper considers the BB method in the context of projection on to
a convex set. Another recent theoretical development has been the result that
the unmodified BB method is R-linearly convergent in the quadratic case (Dai
and Liao (1999)).
Despite all these advances, there is still much to be learned about the BB
method and its modifications. This paper reviews what is known about the
method, and advances some reasons that partially explain why the method is
competitive with CG based methods. The importance of maintaining the non-
monotonicity property of the basic method is stressed. It is argued that the
use of the line search technique of Grippo, Lampariello and Lucidi (1986) in
the manner proposed by Raydan (1997) may not be the best way of globalizing
the BB method, and some tentative alternatives are suggested. Some other
interesting observations about the distribution of the BB steplengths are also
made. Many open questions still remain about the BB method and its potential,
and these are discussed towards the end of the paper.
The theory and practice of line search methods for minimizing f(x) have been
well explored. In such a method, a search direction s^{(k)} is chosen at the start
of iteration k, and a step length θ_k is chosen to (approximately) minimize
f(x^{(k)} + θ s^{(k)}) with respect to θ. Then x^{(k+1)} = x^{(k)} + θ_k s^{(k)} is set. Usually
Initially, α_1 > 0 is arbitrary, and Barzilai and Borwein give two alternative
formulae, (2.2) and (2.3), for k > 1, where we denote y^{(k−1)} = γ^{(k)} − γ^{(k−1)}.
In fact, attention has largely been focussed on (2.2), and it is this formula
that is discussed here, although there seems to be some evidence that the
properties of (2.3) are not all that dissimilar.
In the rest of this section, we explore the properties of the BB method and
other gradient methods for minimizing a strictly convex quadratic function.
For the BB method, (2.2) can be expressed in the form (2.4)
and can be regarded as a Rayleigh quotient, calculated from the previous
gradient vector. This is in contrast to the classical steepest descent method, which is
equivalent to using a similar formula, but with y^{(k−1)} replaced by γ^{(k)}. Another
relevant property, possessed by all gradient methods, and also by the conjugate
gradient method, is the relation (2.5); that is to say, the total step lies in the
span of the so-called Krylov sequence. Also, for quadratic functions, the BB
method has been shown to converge (Raydan (1993)), and the convergence is
R-linear (Dai and Liao (1999)). However, the sequences {f(x^{(k)})} and
{||γ(x^{(k)})||_2} are non-monotonic, an explanation of which is given below, and no
realistic estimate of the R-linear rate is known. However, the case n = 2 is
special, and it is shown in Barzilai and Borwein (1988) that the rate of
convergence is R-superlinear.
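The unmodified BB iteration for a strictly convex quadratic can be sketched in a few lines (a sketch with names of our own choosing, not code from the paper); the recorded gradient norms display exactly the non-monotonic behaviour discussed below:

```python
import numpy as np

def bb_quadratic(A, b, x0, tol=1e-10, max_iter=1000):
    """Unmodified BB method for f(x) = 0.5 x^T A x - b^T x with A SPD.

    alpha is the steplength of formula (2.2): a Rayleigh quotient of A
    computed from the previous step, and the step is x <- x - gamma/alpha.
    """
    x = x0.astype(float)
    g = A @ x - b                   # gradient gamma^(k)
    alpha = 1.0                     # alpha_1 > 0 arbitrary
    history = []
    for _ in range(max_iter):
        history.append(np.linalg.norm(g))
        if history[-1] < tol:
            break
        x_new = x - g / alpha
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        alpha = (s @ y) / (s @ s)   # equals gamma^T A gamma / gamma^T gamma
        x, g = x_new, g_new
    return x, history
```

On a small diagonal test problem the list of gradient norms is typically non-monotone even though the iteration converges quickly.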
To analyse the convergence of any gradient method for a quadratic function,
we can assume without loss of generality that an orthogonal transformation is
made that transforms A to a diagonal matrix of eigenvalues diag(λ_i). Moreover,
if there are any eigenvalues of multiplicity m > 1, then we can choose the
corresponding eigenvectors so that g_i^{(1)} = 0 for at least m − 1 corresponding
indices of g^{(1)}. It follows from (2.1) and the properties of a quadratic function
that γ^{(k+1)} = γ^{(k)} − α_k^{-1} A γ^{(k)}, and hence, using A = diag(λ_i), that (2.6) holds.
It is clear from this recurrence that if g_i^{(k)} = 0 for any i and k = k', then this
property will persist for all k > k'. Thus, without any loss of generality, we
can assume that A has distinct eigenvalues;
then it follows from (2.4) and the extremal properties of the Rayleigh quotient
that
Thus, for the BB method, and assuming that α_1 is not equal to λ_1 or λ_n,
a simple inductive argument shows that (2.8) and (2.9) hold for all k > 1. It
follows, for example, that the BB method does not have the property of finite
termination.
From (2.6), it follows for any eigenvalue λ_i close to α_k that |g_i^{(k+1)}| << |g_i^{(k)}|.
It also follows that the values |g_1^{(k)}| are monotonically decreasing. However, if
on any iteration α_k < ½λ_n, then |g_n^{(k+1)}| > |g_n^{(k)}|, and if α_k is close to λ_1 then
the ratio |g_n^{(k+1)}/g_n^{(k)}| can approach λ_n/λ_1 − 1. Thus we see the potential for
non-monotonic behaviour in the sequences {f(x^{(k)})} and {||γ(x^{(k)})||_2}, and the
extent of the non-monotonicity depends in some way on the size of the condition
number of A. On the other hand, if α_k is close to λ_n then all the coefficients g_i
decrease in modulus, but the change in g_1 is negligible if the condition number
is large. Moreover, small values of α_k tend to diminish the components |g_i|
for small i, and hence enhance the relative contribution of components for large
i. This in turn leads to large values of α_k on a subsequent iteration, if the
step is calculated from (2.4). Thus, in the BB method, we see values of α_k
being selected from all parts of the interior of the spectrum, with no apparent
pattern, with jumps in the values of f(x^{(k)}) and ||γ(x^{(k)})||_2 occurring when α_k
is small.
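This spectral behaviour is easy to observe numerically. The sketch below (our own construction, not from the paper) iterates the recurrence g_i ← (1 − λ_i/α) g_i of (2.6), taking α as the Rayleigh quotient of the previous gradient as in (2.4), and records the steplengths; by the extremal property of the Rayleigh quotient every recorded α_k must lie in [λ_1, λ_n]:

```python
import numpy as np

def bb_alphas(lams, g0, iters=60):
    """Record the BB steplengths produced by the diagonalized gradient
    recurrence (2.6), with each alpha the Rayleigh quotient of the
    previous gradient (formula (2.4))."""
    g_old = g0.astype(float)
    alpha0 = (lams * g_old**2).sum() / (g_old**2).sum()  # start-up steplength
    g = (1.0 - lams / alpha0) * g_old
    alphas = [alpha0]
    for _ in range(iters):
        denom = (g_old**2).sum()
        if denom == 0.0 or (g**2).sum() == 0.0:
            break                    # an exact eigen-step terminated the run
        alpha = (lams * g_old**2).sum() / denom  # Rayleigh quotient of g_old
        alphas.append(alpha)
        g_old, g = g, (1.0 - lams / alpha) * g
    return np.array(alphas)
```

Plotting the returned values against the iteration count shows them wandering over the interior of the spectrum with no apparent pattern, as described above.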
There are a number of reasons that might lead one to doubt whether the
BB method could be effective in practice. Although a nice convergence proof is
given by Raydan (1993), we have to recognise the fact that although both the
CG and BB methods select iterates that satisfy the Krylov sequence property
(2.5), it is the CG method that gives the minimum possible value of f(x^{(k+1)}).
Likewise, the Minimum Residual (MR) method gives the minimum possible
value of ||γ(x^{(k+1)})||_2. Thus we must accept that the BB method is necessarily
inferior in regard to these measures in exact arithmetic, and there is limited
scope for the BB method to improve as regards elapsed time, for example. Also,
the possibility of non-monotonic behaviour of the BB method might seem to
give further reason to prefer the CG method.
To see just how inferior the BB method is, a large-scale test problem is
devised, based on the solution of an elliptic system of linear equations arising from
a 3D Laplacian on a box, discretized using a standard 7-point finite difference
scheme, with

T =

and

The problem Laplace1(a) has the centre of the Gaussian in the centre of
the box, giving the problem a high degree of symmetry. Also, the smaller value
of a gives a smoother solution. Hence this problem is easier to solve than
Laplace1(b).
The results for this problem are given in Table 2.1 below. The CG method is
coded as recommended by Reid (1971). Times are given in seconds, and double
precision Fortran is used on a SUN Ultra 10 at 440 MHz. The iteration is
terminated when ||γ^{(k)}||_2 is less than a small prescribed fraction of its initial value.

Table 2.1  Problem; BB: Time, Iterations; CG: Time, Iterations; MR: Time, Iterations.
We see from the table that there is little to choose between the CG and
MR methods, and that their elapsed time improves on the BB method by a factor of
over 3 for Laplace1(a) and a factor of over 2 for Laplace1(b). For comparison
purposes, the classical steepest descent method was manually terminated after
2000 iterations (1355 seconds), by which time it had only reduced the initial
gradient norm by a factor of 0.18, so that not even one significant figure of
improvement had been obtained. Thus we see that, although the performance of
the BB method does not quite match that of the CG method, it is able to solve the
problem in reasonable time, and significantly improves on the classical steepest
descent method.
Nonetheless, in view of the above, we might ask if there are any circumstances
under which the BB method might be worth considering as an alternative to
the CG method. The answer lies in the fact that the success of the CG iteration
depends very much on the search direction calculation s^{(k)} = −γ^{(k)} + β_k s^{(k−1)}
being consistent with data arising from a quadratic model. Any deviation
from the quadratic model can seriously degrade performance. To illustrate
that relatively small perturbations can cause this to happen, we repeat the
calculations of Table 2.1 using single precision arithmetic. The results are
displayed in Table 2.2.

We see that the CG and MR methods now take more than twice as many
iterations for Laplace1(a), with a similar, but not quite as bad, outcome for
Laplace1(b). The comparison in time is less marked, presumably because of
Table 2.2  Problem; BB: Time, Iterations; CG: Time, Iterations; MR: Time, Iterations
(among the entries are the iteration counts 645, 443, 523, and 448).
the cost savings associated with using single rather than double precision. For
the BB method a different picture emerges. For Laplace1(a), somewhat more
iterations are required, whereas for Laplace1(b), considerably fewer iterations
are required. Again the time comparison is improved by using single precision,
to such an extent that there is now little difference between the performance
of the BB and CG methods on the Laplace1(b) problem. My interpretation of
this is that the BB method is affected in a much more random way by round-off
errors, and small departures of γ^{(k)} and α_k from the values arising from a
quadratic problem are not necessarily detrimental.
This has implications for the likely success of the BB method in other con-
texts. For example, if f (x) is made up of a quadratic function plus a small
non-quadratic term, we might expect the BB method to still converge, and
show improved performance relative to the CG method. Another situation is
in the minimization of a quadratic function subject to simple bounds by an
active set or projection type of method. If the number of active constraints
changes, as is often the case, then it is usually not possible to continue to use
the standard CG formula for the search direction and yet preserve the termi-
nation and optimality properties. To do this it is necessary to restart using the
steepest descent direction when a new active set is obtained. Thus it is more
attractive to use the BB method in some way in this situation.
If the deviation of f (x) from a quadratic function is small then it may still be
possible to use the unmodified BB method successfully. However, in general
it is possible for the method to diverge. This is illustrated by using the test
    f(x^{(k)} + d) ≤ max_{max(k−M,1) ≤ j ≤ k} f^{(j)} + γ γ^{(k)T} d    (3.2)

is met, where d = −θγ^{(k)} is the displacement along the steepest descent
direction. This allows any point to be accepted if it improves sufficiently on the
largest of the M + 1 (or k, if k ≤ M) most recent function values. As usual,
γ > 0 is a small preset constant, and the integer M controls the amount of
non-monotonicity that is allowed. Raydan recommends the value M = 10 and
presents a lot of encouraging numerical evidence on test problems with up to
n = 10^4 variables. His results are competitive with CG methods, but he observes
some poorer results on ill-conditioned problems.
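A minimal sketch of this globalization (with hypothetical names; this is not Raydan's code) combines a BB step with the non-monotone Armijo test of (3.2), assuming α_k > 0 so that d = −γ^{(k)}/α_k is a descent direction:

```python
import numpy as np

def gll_accept(f_trial, f_hist, gTd, theta, gamma=1e-4, M=10):
    """Accept if f_trial improves sufficiently on the largest of the
    M+1 most recent function values (the test (3.2) in the text)."""
    return f_trial <= max(f_hist[-(M + 1):]) + gamma * theta * gTd

def nonmonotone_bb_step(f, grad, x, g, alpha, f_hist, gamma=1e-4, M=10):
    """One BB step safeguarded by the non-monotone backtracking of
    Grippo, Lampariello and Lucidi, as used by Raydan (1997)."""
    d = -g / alpha
    gTd = g @ d                      # negative when alpha > 0
    theta = 1.0
    while not gll_accept(f(x + theta * d), f_hist, gTd, theta, gamma, M):
        theta *= 0.5                 # try theta = 1, 1/2, 1/4, ...
        if theta < 1e-12:
            break
    x_new = x + theta * d
    return x_new, grad(x_new), f(x_new)
```

In the driving loop one appends each accepted f value to `f_hist` and updates α by the BB formula; because the test compares against the largest recent value, moderately non-monotone BB steps are usually accepted with θ = 1.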
To obtain more insight, a non-quadratic test problem of 10^6 variables is
derived, based on a 3D Laplacian, in which the objective function is

    ½ uᵀAu − βᵀu + a h² Σ_{ijk} e^{u_{ijk}},    (3.3)

which is not untypical of what might arise from a nonlinear partial differential
equation. This problem is referred to as Laplace2. The matrix A is that
defined in (2.10), and the vector β is chosen so that the minimizer u* of (3.3)
                          5 figures                      6 figures
    Method                Time   #ls   #f    #g     Time   #ls   #f    #g
    Polak-Ribière CG      20.6   445   697   684    ∞
    Limited mem. BFGS     35.4   315   711   669    ∞
    BB-Raydan M=10        29.0   274   1140  866    40.4   394   1595  1201
    Unmodified BB         14.4   -     487   487    16.7   -     572   572
    BB method (γ only)    8.8    -     -     487    10.3   -     -     572
with only about two function and gradient calls per line search. The BB
method (γ only), to be described below, gives the best performance. One reason
for the improvement of the unmodified BB method over the PR-CG method
might be the effect of non-quadratic terms degrading the performance of the CG
method. Another possibility is that the CG line search now requires additional
evaluations of the function and gradient to attain the required accuracy in
the line search. One would not like to draw any firm conclusions on the basis of
just one set of results, but these results do reinforce Raydan's conclusion that
the BB method, suitably modified, can match or even improve on the PR-CG
method.
Probably the most interesting outcome to emerge is the difference in
performance of the unmodified BB method and the BB-Raydan method. The
reasons for this are readily seen by examining the performance of the unmodified
method shown in Figure 3.1, where the difference f^{(k)} − f* is plotted on a log
scale against the number of iterations. A noticeable feature is the four occasions
on which a huge jump is seen in f^{(k)} − f* above the slowly varying part of the
graph. In particular, the jump around iteration 460 is over 10^5 in magnitude.
noticeable if the condition number is very large. We have seen that the value of
M = 10 fails to allow the very large spikes to be accepted, which, as we argued
above, is important for avoiding slow convergence in a gradient method.

Obvious suggestions to improve the performance of Raydan's modification
are first to choose much larger values of M, especially if the problem is likely
to be ill-conditioned. Another suggestion is to allow increases in f^{(k)} up to a
user-supplied value f̄ > f^{(1)} on early iterations. This is readily implemented by
defining f^{(k)} = f̄ for k < 1 and changing (3.2) so that the range of indices j is
k − M ≤ j ≤ k. These changes make it more likely that the non-monotonic steps
observed in Figure 3.1 are able to be accepted. On the other hand, although
the convergence proof presented by Raydan would still hold, this extra freedom
to accept 'bad' points might cause difficulties for very non-quadratic problems,
and further research on how best to devise a non-monotone line search is needed.
Another idea to speed up Raydan's method is based on the observation that
the unmodified BB method does not need to refer to values of the objective
function. These are only needed when the non-monotone line search based on
(3.2) is used. Therefore it is suggested that a non-monotone line search based
on ||γ|| is used. As in (3.2), an Armijo-type search is used along the line x^{(k)} + d,
where d = −θγ^{(k)}, using a sequence of values such as θ = 1, 1/2, 1/4, ....
Using an acceptance test such as (3.4) and a Taylor series for γ(x^{(k)} + d) about
x^{(k)}, we may obtain
where α is defined above. It follows from the Taylor series and the strict
convexity of f(x) that there exists a constant λ > 0 such that α ≥ λ, and
consequently that the test is passed
if θ is sufficiently small. Thus we can improve on the most recent value ‖y(k)‖²
in this case, and hence (3.4) holds a fortiori.
4 DISCUSSION
One thing that I think emerges from this review is just how little we understand
about the BB method. In the non-quadratic case, all the proofs of convergence
use standard ideas for convergence of the steepest descent method with a line
search, and so do not tell us much about the BB method itself; we shall therefore
restrict this discussion to the quadratic case. Here we have Raydan's ingenious
proof of convergence Raydan (1993), but this is a proof by contradiction and
does not explain, for example, why the method is significantly better than the
classical steepest descent method. For the latter method we have the much more
telling result of Akaike (1959) that the asymptotic rate of convergence is linear
and the rate constant in the worst case (under the assumptions of Section 2) is
(λn − λ1)/(λn + λ1). This exactly matches what is observed in practice. This
result is obtained by defining the vector p(k) by p_i(k) = (γ_i(k))²/γ(k)ᵀγ(k) in the
notation of Section 2. This vector acts like a probability distribution for γ(k),
and the sequence {p(k)} in the worst case oscillates between two accumulation
points e1 and en. Here the scalar product λᵀp(k) is just the Rayleigh
quotient calculated from γ(k) (like (2.4) but using γ(k) on the right-hand side).
A similar analysis is possible for the BB method, in which a superficially similar
BARZILAI-BORWEIN METHOD 251
the KKT matrix
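Akaike's worst-case behaviour is easy to reproduce numerically. The sketch below (my illustration, not part of the chapter) runs exact-line-search steepest descent on a two-dimensional quadratic started so that the gradient has equal components along the two eigendirections, which is precisely the worst case; the per-iteration reduction in f then equals the square of the rate constant (λn − λ1)/(λn + λ1).

```python
import numpy as np

# Worst-case asymptotic rate constant of steepest descent (Akaike (1959)):
lam1, lamn = 1.0, 100.0
rho = (lamn - lam1) / (lamn + lam1)

# Exact-line-search steepest descent on f(x) = 0.5 x'Ax, A = diag(lam1, lamn).
# Starting point chosen so g = Ax has equal components: the worst case, in
# which the gradient weights oscillate between the two eigendirections.
A = np.diag([lam1, lamn])
x = np.array([1.0 / lam1, 1.0 / lamn])
f = lambda x: 0.5 * x @ A @ x
ratios = []
for _ in range(50):
    fx = f(x)
    g = A @ x
    alpha = (g @ g) / (g @ A @ g)       # exact minimizing step length
    x = x - alpha * g
    ratios.append(f(x) / fx)
# In this worst case f is reduced by exactly rho**2 per iteration, so the
# error in the A-norm contracts by rho, matching Akaike's rate constant.
```

Changing the starting point shows how special the worst case is: for generic starts the early reduction ratios are far smaller than rho**2.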
Many methods have been suggested for solving optimization problems in which
the constraints are just the simple bounds
(see the references in Friedlander, Martinez and Raydan (1995) and Birgin,
Martinez and Raydan (2000) for a comprehensive review). Use of the BB
methodology is considered in two recent papers. That of Friedlander, Martinez
and Raydan (1995) is applicable only to quadratic functions and uses an active
set type strategy in which the iterates only leave the current face if the norm of
reduced gradient is sufficiently small. No numerical results are given, and to me
it seems preferable to be able to leave the current face at any time if the com-
ponents of the gradient vector have the appropriate sign. Such an approach is
allowed in the BB-like projected gradient method of Birgin, Martinez and Ray-
dan (2000). This method is applicable to the minimization of a non-quadratic
function on any closed convex set, although here we just consider the case of
box constraints for which the required projections are readily calculated.
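For box constraints the required projection is just a componentwise clip, and a BB-like projected gradient iteration in the spirit of Birgin, Martinez and Raydan (2000) can be sketched as below. This is a simplified illustration, not their Method 1 or Method 2; the names, the backtracking rule and the safeguards are my own assumptions.

```python
import numpy as np

def project_box(x, lo, hi):
    """Projection onto the box {x : lo <= x <= hi} (componentwise clip)."""
    return np.clip(x, lo, hi)

def spg_sketch(f, grad, x0, lo, hi, M=10, max_iter=200, gamma=1e-4):
    """Projected-gradient method with a BB step length and a non-monotone
    acceptance test on the largest of the M+1 most recent function values.
    A sketch only; step-length safeguards are omitted."""
    x = project_box(x0, lo, hi)
    g = grad(x)
    alpha = 1.0
    hist = [f(x)]
    for _ in range(max_iter):
        d = project_box(x - alpha * g, lo, hi) - x   # projected direction
        if np.linalg.norm(d) < 1e-10:
            break                                    # stationary for the box
        theta = 1.0
        while f(x + theta * d) > max(hist) + gamma * theta * (g @ d):
            theta *= 0.5                             # Armijo search on theta
        s = theta * d
        x_new = x + s
        y = grad(x_new) - g
        alpha = (s @ s) / (s @ y) if s @ y > 1e-12 else 1.0
        x, g = x_new, grad(x_new)
        hist = (hist + [f(x)])[-(M + 1):]
    return x
```

With the unconstrained minimizer outside the box, the iteration stops at the face of the box where the projected direction vanishes.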
Birgin, Martinez and Raydan give two methods, both of which use an
Armijo-type search on a parameter θ. Both methods use an acceptance test
similar to (3.2), which only requires sufficient improvement on the largest of the
M + 1 most recent function values in the iteration. In Method 1, the projection
References
Conn, A.R., Gould, N.I.M. and Toint, Ph.L., (1988, 1989), Global convergence
of a class of trust region algorithms for optimization with simple bounds,
SIAM J. Numerical Analysis, Vol. 25, pp. 433-460, and Vol. 26, pp. 764-767.
Dai, Y.H. and Liao, L.-Z., (1999), R-linear convergence of the Barzilai and
Borwein gradient method, Research report, (accepted by IMA J. Numerical
Analysis).
Dembo, R.S., Eisenstat, S.C. and Steihaug, T., (1982), Inexact Newton Meth-
ods, SIAM J. Numerical Analysis, Vol. 19, pp. 400-408.
Fletcher, R., (1990), Low storage methods for unconstrained optimization, Lec-
tures in Applied Mathematics (AMS) Vol. 26, pp. 165-179.
Fletcher, R. and Reeves, C.M., (1964), Function minimization by conjugate
gradients, Computer J. Vol. 7, pp. 149-154.
Friedlander, A., Martinez, J.M. and Raydan, M., (1995), A new method for
large-scale box constrained convex quadratic minimization problems, Opti-
mization Methods and Software, Vol. 5, pp. 57-74.
Friedlander, A., Martinez, J.M., Molina, B., and Raydan, M., (1999), Gradient
method with retards and generalizations, SIAM J. Numerical Analysis, Vol.
36, pp. 275-289.
Glunt, W., Hayden, T.L. and Raydan, M., (1993), Molecular conformations
from distance matrices, J. Comput. Chem., Vol. 14, pp. 114-120.
Glunt, W., Hayden, T.L. and Raydan, M., (1994), Preconditioners for Distance
Matrix Algorithms, J. Comput. Chem., Vol. 15, pp. 227-232.
Golub, G.H. and Van Loan, C.F., (1996), Matrix Computations, 3rd Edition,
The Johns Hopkins University Press, Baltimore.
Grippo, L., Lampariello, F. and Lucidi, S., (1986), A nonmonotone line search
technique for Newton's method, SIAM J. Numerical Analysis, Vol. 23, pp.
707-716.
Hestenes, M.R. and Stiefel, E.L., (1952), Methods of conjugate gradients for
solving linear systems, J. Res. Nat. Bur. Standards, Vol. 49, pp. 409-436.
Molina, B. and Raydan, M., (1996), Preconditioned Barzilai-Borwein method
for the numerical solution of partial differential equations, Numerical Algo-
rithms, Vol. 13, pp. 45-60.
Nocedal, J., (1980), Updating quasi-Newton matrices with limited storage,
Math. of Comp., Vol. 35, pp. 773-782.
1 INTRODUCTION
In recent years the duality of nonconvex optimization problems has attracted
intensive attention. There are several approaches for constructing dual prob-
lems, two of which are most popular: the augmented Lagrangian approach
(see, for example, Rockafellar (1993), Rockafellar and Wets (1998)) and the
nonlinear Lagrangian approach (see, for example, Rubinov (2000),
Rubinov et al (to appear), Yang and Huang (2001)). The construction of
dual problems, and moreover the zero duality gap property, are important
because the optimal value, and sometimes an optimal solution, of the origi-
nal constrained optimization problem can then be found by solving an uncon-
strained optimization problem. The main purpose of both augmented and
nonlinear Lagrangians is to reduce the problem at hand to a problem which is
easier to solve. Therefore the justification of theoretical studies in this field
lies in creating new numerical approaches and algorithms: for example, finding
a way to compute the multipliers that achieve the zero duality gap, or finding
a Lagrangian function that is numerically tractable.
The theory of augmented Lagrangians, represented as the sum of the or-
dinary Lagrangian and an augmenting function, has been well developed for
very general problems. Different augmented Lagrangians, including the sharp
Lagrangian, can be obtained by taking special augmenting functions (see,
for example, Rockafellar and Wets (1998)). In this paper we consider the non-
convex minimization problem with equality constraints and calculate the sharp
Lagrangian for this problem explicitly. By using the strong duality
results and the simplicity of the obtained Lagrangian, we modify the subgra-
dient method for solving the dual problem constructed. Note that subgradient
methods were first introduced in the middle 1960s; the works of Demyanov (1968),
Poljak (1969a), (1969b), (1970) and Shor (1985), (1995) were particularly
influential. The convergence rate of subgradient methods is discussed in Goffin
(1977); for further study of this subject see Bertsekas (1995) and Bazaraa et
al (1993). These methods were used for solving dual problems obtained by using
ordinary Lagrangians, or problems satisfying convexity conditions. However,
our main purpose is to find an optimal value and an optimal solution to a
nonconvex primal problem. We show that the dual function constructed in this
paper is always concave. Without solving any additional problem, we explicitly
calculate the subgradient of the dual function along which its value strongly
MODIFIED SUBGRADIENT METHOD 259
2 DUALITY
inf f0(x)
x ∈ X
subject to f(x) = 0,
where X ⊆ Rn, f(x) = (f1(x), f2(x), . . . , fm(x)) and fi : X → R, i =
0, 1, 2, . . . , m, are real-valued functions.
Let Φ : Rn × Rm → R̄ be a dualizing parametrization function defined as
where ‖·‖ is any norm in Rm and [y, u] = Σ_{i=1}^m yi ui. By using the definition of Φ,
we can calculate the augmented Lagrangian associated with (P) explicitly. For
every x ∈ X we have:
Lemma 2.1 For every u, y ∈ Rm, y ≠ 0, and for every r ∈ R+ there exists
c ∈ R+ such that c‖y‖ − [y, u] > r.
sup_{(u,c) ∈ Rm×R+} L(x, u, c) = f0(x) if x ∈ X0, and +∞ if x ∉ X0.
Hence,
This means that the value of a mathematical programming problem with equal-
ity constraints can be represented as (2.2), regardless of the properties the
original problem satisfies.
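The sharp Lagrangian L(x, u, c) = f0(x) + c‖f(x)‖ − [u, f(x)] and the representation (2.2) can be checked numerically. The sketch below is my own one-dimensional illustration, not from the chapter: on the feasible set, L coincides with f0 for every (u, c), while at an infeasible point L grows without bound as c increases.

```python
import numpy as np

def sharp_lagrangian(f0, f, x, u, c):
    """Sharp augmented Lagrangian L(x, u, c) = f0(x) + c*||f(x)|| - [u, f(x)]
    for the equality-constrained problem inf f0(x) subject to f(x) = 0."""
    fx = np.atleast_1d(f(x))
    return f0(x) + c * np.linalg.norm(fx) - u @ fx

# Toy problem: minimize x^2 subject to x - 1 = 0 (solution x = 1).
f0 = lambda x: x[0] ** 2
f = lambda x: np.array([x[0] - 1.0])

# At a feasible point f(x) = 0, so L = f0(x) for every (u, c):
x_feas = np.array([1.0])
assert sharp_lagrangian(f0, f, x_feas, np.array([5.0]), 100.0) == f0(x_feas)

# At an infeasible point the sup over (u, c) is +infinity: with u = 0,
# increasing c drives L above any bound, as in the case x not in X0 of (2.2).
x_inf = np.array([2.0])
vals = [sharp_lagrangian(f0, f, x_inf, np.array([0.0]), c) for c in (1, 10, 100)]
assert vals[0] < vals[1] < vals[2]
```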
Proofs of the following four theorems are analogous to the proofs of similar
theorems presented earlier for augmented Lagrangian functions with quadratic
or general augmenting functions. See, for example, Rockafellar (1993) and
Rockafellar and Wets (1998).
where p is a perturbation function defined by (??). When this holds, any a > c
will have the property that
Theorem 2.5 Let inf P = sup P* and suppose that for some (ū, c̄) ∈ Rm × R+
and x̄ ∈ X,

min_{x ∈ X} L(x, ū, c̄) = f0(x̄) + c̄ ‖f(x̄)‖ − [f(x̄), ū].   (2.3)

Then x̄ is a solution to (P) and (ū, c̄) is a solution to (P*) if and only if

f(x̄) = 0.   (2.4)
Sufficiency. Suppose to the contrary that (2.3) and (2.4) are satisfied but x̄
and (ū, c̄) are not solutions. Then there exists x̃ ∈ X0 such that f0(x̃) < f0(x̄).
Hence
We have described several properties of the dual function in the previous sec-
tion. In this section, we utilize these properties to modify the subgradient
method for maximizing the dual function H. Theorems 2.2, 2.3 and
2.4 give necessary and sufficient conditions for equality between
inf P and sup P*. Therefore, when the hypotheses of these theorems are satis-
fied, the maximization of the dual function H will give us the optimal value of
the primal problem.
We consider the dual problem
The assertion of the following theorem can be obtained from the known the-
orems on the subdifferentials of the continuous maximum and minimum func-
tions. See, for example, Polak (1997).
Let xk be any solution. If f(xk) = 0, then stop: by Theorem 2.5, (uk, ck) is
a solution to (P*) and xk is a solution to (P). Otherwise, go to Step 2.
2. Let
Proof: Let (uk, ck) ∈ Rm × R+, xk ∈ X(uk, ck), and let (uk+1, ck+1) be a new
iterate calculated from (3.1) for arbitrary positive scalar stepsizes sk and εk. Then,
by Theorem 3.1, the vector (−f(xk), ‖f(xk)‖) ∈ Rm × R+ is a subgradient of
the concave function H at (uk, ck), and by the definition of subgradients we have:
Now suppose that the last minimum is attained at some x̄ ∈ X. If f(x̄) were
zero, then by Theorem 2.5, the pair (uk, ck + εk ‖f(xk)‖) would be a solution
to the dual problem, and therefore
distance between the points generated by the algorithm and the solution to
the dual problem decreases at each iteration (cf. Bertsekas (1995), Proposition
6.3.1).
where the last inequality is a result of the inequalities c̄ − ck > 0, ‖f(xk)‖ > 0, and
0 < εk < sk. Now, by using the subgradient inequality
we obtain
and the theorem is proved. The inequality (3.3) can also be used to establish
the convergence theorem for the subgradient method.
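A minimal sketch of the modified subgradient iteration follows. The update rule used here (u moves along the subgradient component −f(xk), and c increases by (s + ε)‖f(xk)‖ with 0 < ε < s) is an assumption about the form of (3.1), suggested by the inequalities in the proof above; the grid-based inner minimization and all names are also my own. On the toy problem min x² subject to x − 1 = 0 over X = [−2, 2] the method stops at the feasible solution x* = 1.

```python
import numpy as np

def modified_subgradient(X_grid, u0=0.0, c0=0.0, s=0.5, eps=0.25, iters=50):
    """Sketch of the modified subgradient method for maximizing the dual
    function H(u, c) = min_{x in X} L(x, u, c), illustrated on
    min x**2 subject to x - 1 = 0 over X = [-2, 2] (solution x* = 1).
    The update is an assumed form of (3.1), with 0 < eps < s as in the
    convergence argument."""
    u, c = u0, c0
    xk = X_grid[0]
    for _ in range(iters):
        # Inner problem: minimize the sharp Lagrangian over X (here a grid).
        L = lambda x: x ** 2 + c * abs(x - 1.0) - u * (x - 1.0)
        xk = min(X_grid, key=L)
        fk = xk - 1.0                    # constraint value f(x_k)
        if abs(fk) < 1e-12:              # feasible: stop (cf. Theorem 2.5)
            break
        u = u - s * fk                   # step along the subgradient in u
        c = c + (s + eps) * abs(fk)      # strict increase of c
    return xk, u, c

grid = np.linspace(-2.0, 2.0, 4001)      # fine enough to contain x = 1
x_star, u, c = modified_subgradient(grid)
```

Starting from u0 = c0 = 0, the constraint violation shrinks geometrically and the iteration terminates at a feasible point in a handful of steps.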
Theorem 3.4 Assume that all conditions of Theorem 2.4 are satisfied. Let
(uk, ck) be any iterate of the subgradient method. Suppose that each new
where H
≤ ‖f(xk)‖²
It is obvious that the sequence {‖ū − uk‖² + |c̄ − ck|²} is bounded from be-
low (for example, by zero) and, by Theorem 3.3, it is decreasing. Thus,
{‖ū − uk‖² + |c̄ − ck|²} is a convergent sequence. Hence
Unfortunately, however, unless we know the dual optimal value H(ū, c̄),
which is rare, the range of stepsizes is unknown. In practice, one can use the
stepsize formula
subject to
f1(x) = x1² + x2² + x3² − 25 = 0,
and
xi ≥ 0, i = 1, 2, 3.
The result reported in Himmelblau (1972) is
‖f(x*)‖ = 2.9 × 10^(−. . .).
Through this implementation of the modified subgradient algorithm given
above, the result is obtained in a single iteration, starting with the initial
guesses x0 = 0, u0 = 0 and c0 = 0. The positivity constraints xi ≥ 0,
i = 1, 2, 3, are eliminated by defining new variables yi such that yi² = xi,
and then the problem is solved for these yi's. The obtained result is x* =
(3.5120790172, 0.2169913063, 3.5522127962), f0* = 961.7151721335. The con-
straint is satisfied with this x* as ‖f(x*)‖ = 9.2 × 10^(−11).
subject to
f(x) = (x1 − 1)² + (x2 − 1)² + (sin(x1 + x2) − 1)² − 1.5 ≤ 0.
The result reported in Khenkin (1976) is
f0(x) = 0.5(x1 + x2)² + 50(x2 − x1)² + x1² + |x3 − sin(x1 + x2)| → min
subject to
Note that only values in the range 0.7-1.2 for an estimate of H as an upper bound in
the formula for sk seem to give the correct answer. Outside this range,
the number of iterations increases.
f0(x, y) = [a, x] − 0.5[x, Qx] + by → min
Acknowledgments
The authors wish to thank Drs. Y. Kaya and R.S. Burachik for useful discussions
and comments.
References
Andramanov, M.Yu., Rubinov, A.M., and Glover, B.M. (1997), Cutting an-
gle method for minimizing increasing convex-along-rays functions, Research
Report 9717, SITMS, University of Ballarat, Australia.
Andramanov, M.Yu., Rubinov, A.M. and Glover, B.M. (1999), Cutting angle
methods in global optimization, Applied Mathematics Letters, Vol. 12, pp.
95-100.
Bazaraa, M.S., Sherali, H.D. and Shetty, C.M. (1993), Nonlinear Programming:
Theory and Algorithms, John Wiley & Sons, Inc., New York.
Bertsekas, D.P. (1995), Nonlinear Programming, Athena Scientific, Belmont,
Massachusetts.
Demyanov, V.F. (1968), Algorithms for some minimax problems, J. Computer
and System Sciences, Vol. 2, pp. 342-380.
Floudas, C.A., et al. (1999), Handbook of test problems in local and global op-
timization, Kluwer Academic Publishers, Dordrecht.
Abstract: Inexact Restoration methods have been introduced in the last few
years for solving nonlinear programming problems. These methods are related
to classical restoration algorithms but also have some remarkable differences.
They generate a sequence of generally infeasible iterates with intermediate iter-
ations that consist of inexactly restored points. The convergence theory allows
one to use arbitrary algorithms for performing the restoration. This feature is
appealing because it allows one to use the structure of the problem in quite op-
portunistic ways. Different Inexact Restoration algorithms are available. The
most recent ones use the trust-region approach. However, unlike the algorithms
based on sequential quadratic programming, the trust regions are centered not
in the current point but in the inexactly restored intermediate one. Global con-
vergence has been proved, based on merit functions of augmented Lagrangian
type. In this survey we point out some applications and we relate recent ad-
vances in the theory.
1 INTRODUCTION
[Figure: nonconvergence of the restoration from the trial point.]
The main drawback of feasible methods is that they tend to behave badly
in the presence of strong nonlinearities, usually represented by very curved
constraints. In these cases, it is not possible to perform large steps far from
the solution, because the nonlinearity forces the distance between consecutive
feasible iterates to be very short. If a large distance occurs in the tangent
space, the newtonian procedure for restoring feasibility might not converge,
and the tangent step must be decreased. See Figure 2.1. Short steps far from
the solution of an optimization problem are undesirable and, frequently, the
practical performance of an optimization method is linked to its ability to
leave quickly the zones where there is no chance of finding a solution. This
fact leads one to develop Inexact Restoration algorithms.
The convergence theory of Inexact Restoration methods is inspired by the
convergence theory of recent sequential quadratic programming (SQP) algo-
rithms. See Gomes et al (1999). The analogies between IR and the SQP
method presented in Gomes et al (1999) are:
2. in both methods the iteration is composed of two phases, the first related
to feasibility and the second to optimality;
However, there exist very important differences, which allow one to relate
IR to the classical feasible methods:
1. both in the restoration phase and in the optimality phase, IR deals with
the true function and constraints, while SQP deals with a model of both;
2. the trust region in SQP is centered at the current point; in IR the trust
region is centered at the restored point.
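The two-phase structure just described can be sketched schematically. The code below is not the algorithm of Martinez (2001); it is a stripped-down illustration in which the restoration phase uses Gauss-Newton steps on ‖C‖ (any restoration algorithm may be substituted here, which is the appealing feature noted above) and the optimality phase takes a damped gradient step in the first-order tangent space, with the trust region centered at the restored point.

```python
import numpy as np

def ir_iteration(f, grad_f, C, jac_C, x, delta=1.0, r=0.9):
    """One schematic Inexact Restoration iteration (an illustration only):
    (1) restoration phase: from x, find y with ||C(y)|| <= r*||C(x)||
        by Gauss-Newton steps on the constraints;
    (2) optimality phase: from y, take a damped gradient step for f
        projected on the tangent space, inside a trust region centered at y."""
    # --- restoration phase (arbitrary algorithms are allowed here) ---
    y = x.copy()
    while np.linalg.norm(C(y)) > r * np.linalg.norm(C(x)) + 1e-12:
        J = np.atleast_2d(jac_C(y))
        step, *_ = np.linalg.lstsq(J, -np.atleast_1d(C(y)), rcond=None)
        y = y + step                       # Gauss-Newton restoration step
    # --- optimality phase: tangent-space gradient step from y ---
    J = np.atleast_2d(jac_C(y))
    g = grad_f(y)
    # project -g onto the null space of J (first-order tangent set)
    d_tan = -(g - J.T @ (np.linalg.inv(J @ J.T) @ (J @ g)))
    if np.linalg.norm(d_tan) > delta:      # trust region centered at y
        d_tan *= delta / np.linalg.norm(d_tan)
    return y + 0.1 * d_tan                 # short damped tangent step
```

Repeating the iteration on a problem with a linear constraint drives the iterates to the constrained minimizer while staying (after the first restoration) on the feasible set.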
3 DEFINITION OF AN IR ALGORITHM
min f(x)
subject to C(x) = 0, x ∈ Ω
Algorithm 3.1
Step 1. Initialize penalty parameter.
Define
θ_k^min = min{1, θ_{k−1}, . . . , θ_{−1}},
θ_k^large = min{1, θ_k^min + . . .}
and
θ_{k,−1} = θ_k^large.
and
INEXACT RESTORATION METHODS FOR NONLINEAR PROGRAMMING 277
If this is not possible, replace rk by (rk + 1)/2 and repeat Step 2. (In Martinez
(2001) sufficient conditions are given for ensuring that this loop finishes.)
If

define

Step 2. Set t ← t_break^{k,i}.

Step 3. If

L(y^k + t d_tan, λ^k) ≤ L(y^k, λ^k) + 0.1 t ⟨∇L(y^k, λ^k), d_tan⟩,   (3.10)

and terminate. (Observe that the choice z^{k,i} = y^k + t d_tan is admissible but,
very likely, it is not the most efficient choice.)

Step 4. If (3.10) does not hold, choose t_new ∈ [0.1t, 0.9t], set t ← t_new and go to
Step 3.
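Steps 3 and 4 amount to a backtracking search on t with the sufficient-decrease test (3.10); halving t is one admissible choice of t_new in [0.1t, 0.9t]. A minimal sketch, with names of my own choosing:

```python
def backtracking_tangent_step(L, grad_L_dot_d, y, d_tan, t0=1.0):
    """Backtracking search implementing the test (3.10): accept t when
    L(y + t*d) <= L(y) + 0.1 * t * <grad L(y), d>; otherwise shrink t
    (here t <- 0.5*t, an admissible choice in [0.1*t, 0.9*t]) and retry,
    as in Steps 3 and 4. Assumes d_tan is a descent direction for L."""
    t = t0
    L0 = L(y)
    while L(y + t * d_tan) > L0 + 0.1 * t * grad_L_dot_d:
        t *= 0.5
    return t, y + t * d_tan
```

For a descent direction the loop always terminates, since the test is satisfied for all sufficiently small t.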
Under these conditions it can be proved that the algorithm is well defined.
We can also prove the convergence theorems stated below.
min ‖C(x)‖²
subject to l ≤ x ≤ u.
Theorem 3.2 If C(xk) → 0, there exists a limit point of the sequence that
satisfies the Fritz-John optimality conditions of nonlinear programming.
h(x) = 0
where, for simplicity, we leave to the reader the definition of the correct
dimensions for x, h, g. Given x ∈ Rn and the tolerance parameter η > 0, we
divide the inequality constraints into three groups:
(see Martinez and Svaiter (2000)). See Figures 4.1 and 4.2.
We say that a feasible point x* of (4.1) satisfies the AGP optimality condition
if there exists a sequence {xk} that converges to x* and such that g(xk) → 0.
The points x that satisfy the AGP optimality conditions are called "AGP
points". It has been proved in Martinez and Svaiter (2000) that the set of
local minimizers of a nonlinear programming problem is contained in the set of
AGP points and that this set is strictly contained in the set of Fritz-John points.
Therefore, the AGP optimality condition is stronger than the Fritz-John op-
timality conditions, traditionally used in algorithms. When equalities are not
present and the problem is convex, it can be proved that AGP is sufficient for
guaranteeing that a point is a minimizer (Martinez and Svaiter (2000)).
A careful analysis of the convergence proofs in Martinez and Pilotta (2000)
and Martinez (2001) shows that Inexact Restoration guarantees convergence to
points that satisfy the AGP optimality condition. This fact has interesting
consequences for applications, as we will see later.
5 ORDER-VALUE OPTIMIZATION
where
max f(x)
subject to x ∈ Ω
6 BILEVEL PROGRAMMING
min f(x,y)
subject to y solves a nonlinear programming problem
that depends on x (6.1)
and
Ordinary constraints.
In other words:
min f(x, y)
subject to y minimizes P(x, y) s.t. t(x, y) = 0, s(x, y) ≤ 0 (6.2)
and h(x, y) = 0, g(x, y) ≤ 0. (6.3)
(i) (x*, y*, v*, w*, λ*, z*) is a KKT point of (6.12), (6.13), (6.14);
Roughly speaking, the sufficient conditions are related to the convexity
of the lower-level problem, when the MPEC under consideration is a bilevel
programming problem.
As in the case of the OVO problem, no feasible point of an MPEC satisfies
the Mangasarian-Fromovitz constraint qualification and, consequently, all the
feasible points are Fritz-John points. So, it is mandatory to find stronger optimality
conditions in order to explain the practical behavior of algorithms. Fortunately,
it can also be proved that the set of AGP points is only a small part of the set
of Fritz-John points. The characterization of AGP points is given in Andreani
and Martinez (2001). Since IR converges to AGP points, we have an additional
reason for using IR in MPEC.
As in Nash-equilibrium problems (see Vicente and Calamai (1994)) the use
of the optimization structure instead of optimality conditions in the lower level
encourages the use of specific restoration algorithms.
7 HOMOTOPY METHODS
Homotopic methods are used when strictly local Newton-like methods for
solving (7.1) fail because a sufficiently good initial guess of the solution is not
available. Moreover, in some application areas, homotopy methods are the method
of choice. See Watson et al (1997). The homotopic idea consists in defining a map H(x, t)
such that H(x, 1) = F(x) and the system H(x, 0) = 0 is easy to solve. The so-
lution of H(x, t) = 0 is used as an initial guess for solving H(x, t') = 0, with t' > t.
In this way, the solution of the original problem is progressively approximated.
For many engineering problems, natural homotopies are suggested by the very
essence of the physical situation. In other cases, artificial homotopies can be
useful.
In general, the set of points (x, t) that satisfy H(x, t) = 0 defines a curve in
Rn+1. Homotopic methods are procedures to "track" this curve in such a way
that its "end point" (t = 1) can be safely reached. Since the intermediate points
of the curve (for which t < 1) are of no interest by themselves, it is not necessary
to compute them very accurately. So, an interesting theoretical problem with
practical relevance is to choose the accuracy to which intermediate points are
to be computed. If an intermediate point (x, t) is computed with high accuracy,
the tangent line to the curve that passes through this point can be efficiently
min (t − 1)²
subject to H(x, t) = 0, x ∈ Ω.
Therefore, the homotopic curve is the feasible set of (7.2). The nonlinear
programming problem (7.2) could be solved by any constrained optimization
method, but Inexact Restoration algorithms seem to be closely connected to
the classical predictor-corrector procedure used in the homotopic approach.
Moreover, they give theoretically justified answers to the accuracy problem.
The identification of the homotopy path with the feasible set of a nonlinear
programming problem allows one to use IR criteria to define the closeness of
corrected steps to the path. The solutions of (7.1) correspond exactly to those
of (7.2), so the identification proposed here is quite natural.
The correspondence between the feasibility phase of IR and the corrector
phase of predictor-corrector continuation methods is immediate. The IR tech-
nique provides a criterion for declaring convergence of the subalgorithm used
in the correction. The optimality phase of IR corresponds to the predictor
phase of continuation methods. The IR technique determines how long pre-
dictor steps can be taken and establishes a criterion for deciding whether they
should be accepted or not.
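The loose-then-tight accuracy policy discussed above can be sketched with a standard artificial convex homotopy H(x, t) = tF(x) + (1 − t)(x − x0). This homotopy, the tolerances and the step count are my own illustrative choices, not prescriptions of the chapter: intermediate points (the corrector phase) are solved only coarsely, and only the end point t = 1 is polished.

```python
def track_homotopy(F, dF, x0, n_steps=10, tol_mid=1e-3, tol_end=1e-12):
    """Predictor-corrector tracking of H(x, t) = t*F(x) + (1-t)*(x - x0) = 0.
    Points with t < 1 are of no interest by themselves, so they are corrected
    only loosely (tol_mid); the end point t = 1 is refined to tol_end."""
    x = x0
    for k in range(1, n_steps + 1):
        t = k / n_steps
        tol = tol_end if t == 1.0 else tol_mid
        H = lambda x: t * F(x) + (1 - t) * (x - x0)
        dH = lambda x: t * dF(x) + (1 - t)
        while abs(H(x)) > tol:          # Newton corrector at fixed t
            x = x - H(x) / dH(x)
    return x

# Solve F(x) = x**3 - 2 = 0, starting from the easy system at t = 0.
root = track_homotopy(lambda x: x ** 3 - 2.0, lambda x: 3 * x ** 2, x0=1.0)
```

The previous coarse solution serves as the predictor for the next value of t, so each corrector needs only a few Newton steps.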
8 CONCLUSIONS
4. the use of the Lagrangian in the optimality phase favors practical fast
convergence near the solution.
Acknowledgments
References
be designed to restart the search. The "hardness" of the GOP is well illustrated
by the following "golf course" example, for which the approach described above
seems powerless. Define the function f : [0, 1] → {0, 1} as follows:

f(x) = 1 for 0 ≤ x ≤ a − ε/2,
       0 for a − ε/2 < x < a + ε/2,   (1.1)
       1 for a + ε/2 ≤ x ≤ 1,

where a ∈ (ε/2, 1 − ε/2). To obtain the minimum of this function, one should
evaluate it within the ε-interval around the unknown number a. If this function
is defined like an oracle (i.e., if one does not know the position of the point a),
the probability of choosing an x within this interval is ε. For the n-dimensional
version of this oracle, the probability becomes εⁿ, and the query complexity of
the problem grows exponentially with n (the dimensionality curse). Of course,
this is an extreme case, for which knowledge about the derivatives (they are
all zero whenever defined!) would not help. This and related issues have been
deftly discussed by Wolpert and Macready in connection with their "No Free
Lunch" (NFL) theorem (Wolpert (1996)).
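The oracle (1.1) and the ε success probability of blind random querying are easy to reproduce; the constants below are arbitrary illustrations.

```python
import random

def golf_course(x, a, eps):
    """The 'golf course' oracle (1.1): 1 everywhere on [0, 1] except on an
    interval of width eps around the unknown point a, where it is 0."""
    return 0 if a - eps / 2 < x < a + eps / 2 else 1

# Random querying hits the hole with probability eps per call, so the
# expected number of oracle calls is 1/eps (and eps**(-n) in n dimensions).
random.seed(0)
a, eps = 0.3721, 1e-3
trials, hits = 200_000, 0
for _ in range(trials):
    if golf_course(random.random(), a, eps) == 0:
        hits += 1
# hits / trials is an empirical estimate of eps
```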
In the light of the previous example, it seems that without additional knowl-
edge about the structure of the function there is no hope to decide upon an
intelligent optimization strategy and one is left with either strategies that have
limited albeit efficient applicability or the exhaustive search option.
Thus, new approaches are needed to reduce the complexity of the problem
to manageable complexity. Recently, quantum computing has been hailed as
the possible solution to some of the computationally hard classical problems
(Nielsen (2000)). Indeed, Grover's (Grover (1997)) and Shor's (Shor (1994))
algorithms provide such solutions to the problems of finding a given element in
an unsorted set and the prime factorization of very large numbers, respectively.
Here we present a solution to the continuous GOP in polynomial time, by
developing a generalization of Grover's algorithm to continuous problems. This
generalization requires additional information on the objective function. In
many optimization problems, some of this additional information is available
(see below). While other required information may be more difficult to obtain
in practical applications, it is important to understand - from a theoretical
point of view - the role of the information in connection to the difficulty of the
problem, and to be able to assess a priori what various information is relevant
and for what. For instance, if the objective function were an analytic function,
the knowledge of all its derivatives at a given point would allow, in principle,
the "knowledge" of the function everywhere else in the domain of analyticity.
However, to actually find the global minimum, the function would still have
to be calculated everywhere! In other words, the (additional) knowledge of all
the derivatives at a given point cannot be efficiently used to locate the global
minimum, although in principle it is equivalent to the knowledge of the function
at all points. In fact, to locate the global minimum, both methods would require
exhaustive calculations.
2 GROVER'S QUANTUM ALGORITHM
by exchanging the level populations. The CNOT gate acts on four-dimensional
vectors in C^4. Obviously, some of these vectors can be represented as a tensor
product of two two-dimensional vectors; however, other vectors in C^4 cannot be
written in this form. These latter states are called entangled states and play
a crucial role in quantum algorithms (Nielsen (2000)). Quantum algorithms
are (i) intrinsically parallel and (ii) yield probabilistic results. These properties
reflect the facts that: (i) the wave function, ψ, is nonlocal and, in fact,
ubiquitous, and (ii) the quantity |ψ|² is interpreted as a probability density.
QUANTUM ALGORITHM 297
|w⟩ = (1/√N) Σᵢ |xᵢ⟩ = (1/√N)|x₁⟩ + (√(N−1)/√N)|x⊥⟩   (2.2)
In the second representation, the unit vectors orthogonal to |x₁⟩ are lumped
together in the unit vector |x⊥⟩, which formally reduces the problem to a
two-dimensional space and simplifies the presentation and interpretation of the
algorithm.
We note that the scalar product ⟨x₁|w⟩ = 1/√N =: cos β = sin α, where
β denotes the angle between the vectors |w⟩ and |x₁⟩ and α denotes its
complement, i.e. the angle between the vectors |w⟩ and |x⊥⟩.
The construction of the state |w⟩ can be done in log₂N = n steps by applying
n Hadamard gates to the initial zero state |0⟩ ⊗ · · · ⊗ |0⟩ (n times). In the
{|x₁⟩, |x⊥⟩} basis, we construct the operators:
(2.5)
which, in the compressed, two-dimensional representation of the problem, rep-
resents a rotation of the state vector by an angle 2α towards |x₁⟩. This
means that each application of the operator Q will increase the weight of the
unknown vector |x₁⟩ (which explains the name of the operator Q) and after
roughly (π/2 − α)/(2α) ≈ (π/4)√N applications the state vector will be es-
sentially parallel to |x₁⟩, whereupon a measurement of the state will yield the
result |x₁⟩ with a probability very close to unity. We mention that for (and
only for) N = 4, the result is obtained with certainty, after only one application.
In general, if one continues the application of Q, the state vector continues its
rotation and the weight of |x₁⟩ decreases; eventually, the evolution of the state
is cyclic, as prescribed by the unitary evolution. In the original, N-dimensional
representation, the operator I_{x₁} has the representation:
Using this representation, one can show explicitly that the algorithm can be
implemented as a sequence of local operations such as rotations, Hadamard
transforms, etc. (Grover (1997))
It is easy to check that if the oracle returns the same value for all the ele-
ments, i.e. there is no "special" element in the set E, the amplification operator
Q reduces to the identity operator I and, after the required number of applica-
tions, the measurement will return any of the states with the same probability,
namely 1/N. In other words, the algorithm behaves consistently.
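In the two-dimensional {|x₁⟩, |x⊥⟩} picture the whole algorithm reduces to tracking one angle, which makes the (π/4)√N iteration count, the N = 4 special case, and the over-rotation effect easy to verify numerically. The function name below is my own.

```python
import math

def grover_success_probability(N, k):
    """Probability of measuring the marked state after k Grover iterations,
    in the two-dimensional {|x1>, |x_perp>} picture: each application of Q
    rotates the state vector by 2*alpha towards |x1>, sin(alpha) = 1/sqrt(N),
    so the marked amplitude after k iterations is sin((2k+1)*alpha)."""
    alpha = math.asin(1.0 / math.sqrt(N))
    return math.sin((2 * k + 1) * alpha) ** 2

N = 1024
k_opt = round(math.pi / 4 * math.sqrt(N))   # ~ (pi/4) sqrt(N) applications
p = grover_success_probability(N, k_opt)    # very close to 1

# For N = 4, a single application succeeds with certainty:
assert abs(grover_success_probability(4, 1) - 1.0) < 1e-12
# Continuing past k_opt over-rotates and the success probability drops:
assert grover_success_probability(N, 2 * k_opt) < p
```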
Grover's search algorithm described in the previous section has been applied
to a discrete optimization problem, namely finding the minimum among an
unsorted set of N different objects. Dürr and Høyer (Dürr) adapted Grover's
original algorithm and solved this problem with probability strictly greater than
1/2, using O(√N) function evaluations (oracle invocations).
In this article, we map the continuous GOP to the Grover problem. Once
this is achieved, one can apply Grover's algorithm and obtain an almost
certain result with O(√N) function evaluations. However, the mapping of the
GOP to Grover's problem is not automatic, but requires additional information.
Before spelling out the required information, let us revisit the "pathological"
example (1.1). Without loss of generality, we can take ε = 1/N and divide the
segment [0, 1] into N equal intervals. By evaluating the function at the midpoint
of each of the N intervals, we obtain a discrete function that is equal to 1 in N − 1
points and equal to 0 in one point, which, up to an unessential transformation,
is equivalent to Grover's problem. Direct application of Grover's algorithm
yields the corresponding result. Of course, generalization to any dimensionality
d is trivial. Thus, a classically intractable problem becomes much easier within
the quantum computing framework. We shall return to this example after
discussing the general case.
Consider now a real function of d variables, f(x1, x2, . . . , xd). Without re-
stricting generality, we can assume that f is defined on [0, 1]^d and takes values
in [0, 1]. Assume now that: (i) there is a unique global minimum, which is
reached at zero; (ii) there are no local minima whose value is infinitesimally
close to zero; in other words, the values of the other minima are larger than
a constant δ > 0; and (iii) the size of the basin of attraction for the global
minimum measured at height δ is known; we shall denote it Δ.
Then our implementation paradigm is the following: (i) instead of f(·),
consider the transformation g(·) := (f(·))^{1/m}. For sufficiently large m, this
function will take values very close to one, except in the vicinity of the global
minimum, which will maintain its original value, namely zero. Of course, other
transformations can be used to achieve essentially the same result. We calcu-
late m such that δ^{1/m} = 1/2. To avoid technical complications that would not
change the tenor and conclusions of the argument, we assume that Δ = 1/M,
where M is a natural number, and divide the hypercube [0, 1]^d into small d-
dimensional hypercubes with sides Δ. At the midpoint of each of these hyper-
cubes, define the function h(x) := int[g(·) + 1/2] (here int denotes the integer
part). The function h(·) is defined on a discrete set of N points, N = M^d,
and takes only the values one and zero; by our choice of constants, the region on
which it takes the value zero is a hypercube with side Δ. Thus we have reduced
the problem to the Grover setting. Application of Grover's algorithm to the
function h(·) will result in a point that returns the value zero; by construction,
this point belongs to the basin of attraction of the global minimum. We return
then to the original function f (.) and apply the descent technique of choice
that will lead to the global minimum. If the basin of attraction of the global
minimum is narrow, the gradients of the function f (.) may reach very large val-
ues which may cause overshots. Once that phase of the algorithm is reached,
one can proceed to apply a scaling (dilation) transformation that maintains
the descent mode but moderates the gradients. On the other hand, as one
approaches the global minimum, the gradients become very small and certain
acceleration techniques based on non-Lipschitzian dynamics may be required
(Barhen (1996)); (Barhen (1997)). If the global minimum is attained a t the
boundary of the domain, the algorithm above will find it without additional
complications.
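To make the construction concrete, it can be simulated classically: the sketch below builds the binary function h on the grid of midpoints and lists the marked points a Grover search would be asked to find. The one-dimensional "golf course" function, the well location, and the values of M and m are invented for illustration; a real implementation would encode h as a quantum oracle rather than tabulate it.

```python
import numpy as np

def build_oracle(f, d, M, m):
    """Classically tabulate h(x) = int(g(x) + 1/2) with g = f**(1/m)
    on the midpoints of the M^d hypercubes of side 1/M.  h is 1 where
    g >= 1/2 and 0 near the global minimum: the "marked" states."""
    axes = [(np.arange(M) + 0.5) / M] * d          # midpoints per axis
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    g = np.array([f(x) ** (1.0 / m) for x in grid])
    h = np.floor(g + 0.5).astype(int)
    return grid, h

# hypothetical "golf course" in d = 1: f = 1 except in one narrow well
f = lambda x: 0.0 if abs(x[0] - 0.5312) < 0.05 else 1.0
grid, h = build_oracle(f, d=1, M=10, m=8)
marked = grid[h == 0]   # points a Grover search would return
```

With these (assumed) numbers, exactly one of the ten midpoints is marked, which is precisely the Grover setting described above.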
It is clear that, in general, the conditions imposed on the function f(·) are
rather strong sufficient conditions. However: (a) these conditions are both
satisfied and explicitly given for the academic "golf course" example (1), and
(b) while they do not help reduce the complexity of the classical descent/search
algorithm, they make a remarkable difference in the quantum framework.
In fact, assumption (i) is satisfied by a large class of important practical
problems, namely parameter identification problems encountered, e.g., in remote
sensing, pattern recognition, and, in general, inverse problems. In these problems
the absolute minimum, namely zero, is attained for the correct values of the parameters.
REFERENCES 301
Acknowledgments
This work was partially supported by the Materials Sciences and Engineering Division
Program of the DOE Office of Science under contract DE-AC05-00OR22725 with UT-
Battelle, LLC. We thank Drs. Robert Price and Iran Thomas from DOE for their
support. V. P. thanks Dr. Cassius D'Helon for an enlightening discussion on Ref. 3
and for a careful reading of the manuscript.
References
Dept. of Mathematics
University of Bayreuth
D-95440 Bayreuth
Germany
1 INTRODUCTION
The functions and gradients can be evaluated with sufficiently high precision.
are linearized. Second-order information about the Hessian of the Lagrangian
is updated by a quasi-Newton formula. The convex quadratic program must be
solved in each iteration step by an available black-box solver. For a review, see
for example Stoer (1985) and Spellucci (1993), or any textbook on nonlinear
programming.
Despite the success of SQP methods, another class of efficient optimization
algorithms was proposed, mainly by engineers, with its motivation found
in mechanical structural optimization. The first method is known under the
name CONLIN, or convex linearization, see Fleury and Braibant (1986) and
Fleury (1989), and is based on the observation that in some special cases,
typical structural constraints become linear in the inverse variables. Although
this special situation is rarely observed in practice, a suitable substitution by
inverse variables, depending on the sign of the corresponding partial derivatives,
with subsequent linearization can be expected to approximately linearize the constraints.
More general convex approximations were introduced by Svanberg (1987),
known under the name moving asymptotes (MMA). The goal is always to
construct convex and separable subproblems, for which efficient solvers are
available. Thus, we denote this class of methods by SCP, an abbreviation
for sequential convex programming. The resulting algorithms are very efficient
for mechanical engineering problems if there are many constraints, if a good
starting point is available, and if only a crude approximation of the optimal
solution needs to be computed because of certain side conditions, for example
calculation time or large round-off errors in the objective function and constraints.
In other words, SQP methods are based on local second-order approximations,
whereas SCP methods apply global approximations. Some
comparative numerical tests of both approaches are available for mechanical
structural optimization, see Schittkowski et al. (1994). The underlying finite
element formulation uses the software system MBB-LAGRANGE (Kneppe et
al. (1987)). However, we do not know of any direct comparisons of computer
codes of both methods for more general classes of test problems, particularly
for standard benchmark examples.
Thus, the purpose of the paper can be summarized as follows. First, we
outline a general framework for stabilizing the algorithms under consideration
by a line search. The merit function is the augmented Lagrangian function, where
the violation of constraints is penalized in the L2-norm. SQP and SCP methods
are introduced in the two subsequent sections, where we outline some common
ideas and some existing convergence results. Section 5 shows the results of
comparative numerical experiments based on the 306 test examples of the test
problem collections of Hock and Schittkowski (1981) and Schittkowski (1987a).
The two computer codes under investigation are the SQP subroutine NLPQLP
of Schittkowski (2001) and the SCP routine SCPIP of Zillober (2001c), Zillober
(2002). To give an impression of the convergence of the algorithms in the case of
structural design optimization, we repeat a few results of the comparative study
of Schittkowski et al. (1994). A couple of typical industrial applications and case
studies are found in Section 6, to show the complexity of modern optimization
problems, for which reliable and efficient software is needed.
2 A GENERAL FRAMEWORK
The fundamental tool for deriving optimality conditions and optimization al-
gorithms is the Lagrange function
Since we assume that (1.1) is nonconvex and nonlinear in general, the basic
idea is to replace (1.1) by a sequence of simpler problems. Starting from an
initial design vector xo E Rn and an initial multiplier estimate uo E R m , iter-
ates xk E Rn and uk E Rm are computed successively by solving subproblems
of the form
SQP VERSUS SCP METHODS 309
1. (2.3) is strictly convex and smooth, i.e., the functions f^k(x) and g_j^k(x) are
twice continuously differentiable, j = 1, ..., m.
3. The search direction (yk - xk, vk - uk) is a descent direction for an aug-
mented Lagrangian merit function introduced below.
and we set
so that the line search is well-defined,
with p > 0, which is satisfied for SQP and SCP methods, see Schittkowski
(1981a), Schittkowski (1983b) or Zillober (1993), respectively. A more general
framework is introduced in Schittkowski (1985a). For a more detailed discussion
of line search and different convergence aspects, see Ortega and Rheinboldt
(1970).
If the constraints of (2.3) become inconsistent, it is possible to introduce
an additional variable and to modify objective function and constraints, for
example in the simplest form
The penalty term ρ_k is added to the objective function to reduce the influence
of the additional variable y_{n+1} as much as possible. The index k indicates that
this parameter also needs to be updated during the algorithm. It is obvious
that (2.12) always possesses a feasible solution.
3 SQP METHODS
SQP methods are also introduced in the books of Papalambros and Wilde (2000)
and Edgar and Himmelblau (1988). Their excellent numerical performance was
tested and compared with other methods in Schittkowski (1980), Schittkowski
(1983a), and Hock and Schittkowski (1981), and for many years they have been
among the most frequently used algorithms for solving practical optimization problems.
The basic idea is to formulate and solve a quadratic programming subprob-
lem in each iteration which is obtained by linearizing the constraints and ap-
proximating the Lagrange function (2.1) quadratically.
To formulate the quadratic programming subproblem, we proceed from given
iterates x_k ∈ R^n, an approximation of the solution, u_k ∈ R^m, an approximation
of the multipliers, and B_k ∈ R^{n×n}, an approximation of the Hessian of the
Lagrange function in a certain sense. Then we obtain subproblem (2.3) by
defining
It is immediately seen that the requirements of the previous section for (2.3)
are satisfied. The key idea is to also approximate second-order information
in order to achieve fast final convergence. The update of the matrix B_k can be
performed by standard quasi-Newton techniques known from unconstrained
optimization. In most cases, the BFGS method is applied, see Powell (1978a),
Powell (1978b), or Stoer (1985). Starting from the identity or any other positive
definite matrix B_0, the difference vectors
w_k = x_{k+1} - x_k,   q_k = ∇_x L(x_{k+1}, u_k) - ∇_x L(x_k, u_k)
are used to update B_k by the BFGS formula
B_{k+1} = B_k + (q_k q_k^T)/(q_k^T w_k) - (B_k w_k w_k^T B_k)/(w_k^T B_k w_k).
The above formula yields a positive definite matrix B_{k+1} provided that B_k
is positive definite and q_k^T w_k > 0. A simple modification of Powell (1978a)
guarantees positive definite matrices even if the latter condition is violated.
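The damped update just mentioned can be sketched as follows. The constant 0.2 and the convex-combination rule follow the modification commonly attributed to Powell (1978a); the exact constants used inside the codes discussed here may differ.

```python
import numpy as np

def bfgs_update_damped(B, w, q, theta_min=0.2):
    """BFGS update of the Hessian approximation B with Powell's damping.

    w = x_{k+1} - x_k,  q = grad_x L(x_{k+1}, u_k) - grad_x L(x_k, u_k).
    If q^T w is too small, q is replaced by a convex combination with B w,
    so that the updated matrix stays positive definite."""
    Bw = B @ w
    wBw = w @ Bw
    qw = q @ w
    if qw < theta_min * wBw:                  # damping condition
        theta = (1.0 - theta_min) * wBw / (wBw - qw)
        q = theta * q + (1.0 - theta) * Bw    # damped difference vector
        qw = q @ w
    return B + np.outer(q, q) / qw - np.outer(Bw, Bw) / wBw
```

When no damping is active, the result satisfies the secant condition B_{k+1} w_k = q_k; with damping, positive definiteness is retained at the price of a perturbed secant condition.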
r_k = max{ 2 ||v_k - u_k||^2 / ((y_k - x_k)^T B_k (y_k - x_k)), r_{k-1} }
for the augmented Lagrangian function (2.7), see Schittkowski (1983a). Moreover,
we need an additional assumption concerning the choice of the matrix
B_k if we neglect the special update mechanism shown in (3.3). We require
that the eigenvalues of the matrices B_k remain bounded away from 0, i.e., that
(y_k - x_k)^T B_k (y_k - x_k) ≥ γ ||y_k - x_k||^2 for all k and a γ > 0. If the iteration
data {(x_k, u_k, B_k)} are bounded, then it can be shown that there is an accumulation
point of {(x_k, u_k)} satisfying the Karush-Kuhn-Tucker conditions (2.2)
for (1.1), see Schittkowski (1983a).
The statement is quite weak, but without any further information about
second derivatives, we cannot guarantee that the approximated point is indeed
a local minimizer. From the practical point of view, we need a finite stopping
criterion based on the optimality conditions for the subproblem, see (2.5), based
on a suitable tolerance ε > 0. For example, we could try to test the KKT
condition
There exists a large variety of extensions for solving large-scale
problems as well; see Gould and Toint (2000) for a review.
4 SCP METHODS
If s_i and c_i denote the sine and cosine values of the corresponding
angles of the trusses, i = 1, 2, the horizontal and vertical displacements are
given in the form
u(x) = |p| ( sin ψ_2 (c_1^2/x_2 + c_2^2/x_1) - cos ψ_2 (s_1 c_1/x_2 + s_2 c_2/x_1) ) / sin(ψ_2 - ψ_1).
If we assume now that our optimization problem consists of minimizing the
weight of the structure under some given upper bounds for these displacements,
we get nonlinear constraints that are linear in the reciprocal design variables.
Although this special situation is always found in the case of statically determinate
structures, it is rarely observed in practice. However, a suitable substitution
by inverse variables, depending on the sign of the corresponding partial
derivatives, with subsequent linearization can be expected to approximately
linearize the constraints.
For the CONLIN method, Nguyen et al. (1987) gave a convergence proof,
but only for the case that (1.1) consists of a concave objective function and
concave constraints, which is of minor practical interest. They also showed
that a generalization to non-concave constraints is not possible. More general
convex approximations were introduced by Svanberg (1987), known under the
name method of moving asymptotes (MMA). The goal is always to construct
nonlinear convex and separable subproblems, for which efficient solvers are
available. Using the flexibility of the asymptotes, which influence the curvature
of the approximations, it is possible to avoid the concavity assumption.
Given an iterate x_k, the model functions of (1.1), i.e., f and g_j, are approximated
by functions f^k and g_j^k at x_k, j = 1, ..., m. The basic idea is to linearize
f and g_j with respect to the transformed variables (U_i^k - x_i)^{-1} and (x_i - L_i^k)^{-1},
depending on the sign of the corresponding first partial derivative. U_i^k and L_i^k
are reasonable bounds and are adapted by the algorithm after each successful
step. Several other transformations have also been developed in the past.
The corresponding approximating functions that define subproblem (2.3)
are convex and separable in these transformed variables; in a similar way, the
g_j^k are defined. The coefficients α_{kj} and β_{kj}, j = 0, ...,
m, are chosen to satisfy the requirements of Section 2.1, i.e., that (2.3) is convex
and that (2.3) is a first-order approximation of (1.1) at x_k. By an appropriate
regularization of the objective function, strict convexity of f^k(x) is guaranteed,
see Zillober (2001a). As shown there, the search direction (y_k - x_k, v_k - u_k) is a
descent direction for the augmented Lagrangian merit function. If the adaption
rule for the parameters L_i^k and U_i^k fulfills the conditions that the absolute value
of their difference to the corresponding component of the current iterate
x_k is uniformly bounded away from 0, and that their absolute value is bounded,
global convergence can be shown for the SCP method if an update rule
for the penalty parameters r_k similar to (3.6) is applied.
The choice of the asymptotes L_i^k and U_i^k is crucial for the computational
behavior of the method. If additional lower bounds x_l and upper bounds x_u
on the variables are given, an efficient update scheme for the i-th coefficient,
i = 1, ..., n, and the k-th iteration step is given as follows:
Sparsity in the problem data can be exploited, see again Zillober et al.
(2002) for a series of numerical results for elliptic control problems.
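As an illustration of the moving-asymptote idea, the following sketch builds a convex, separable, first-order model of a function at x_k, with each variable entering through 1/(U_i - x_i) or 1/(x_i - L_i) according to the sign of the partial derivative. This is a simplified stand-in for Svanberg's actual MMA formulas, which contain further terms and safeguards; the coefficient choice below is an assumption made so that the model matches value and gradient at x_k.

```python
import numpy as np

def mma_approx(x_k, f_k, grad_k, L, U):
    """Convex separable model matching f(x_k) = f_k and grad f(x_k) = grad_k.

    Variables with positive derivative enter via 1/(U_i - x_i), the
    others via 1/(x_i - L_i); both kinds of terms are convex between
    the asymptotes L < x < U."""
    x_k = np.asarray(x_k, float)
    grad_k = np.asarray(grad_k, float)
    L = np.asarray(L, float)
    U = np.asarray(U, float)
    p = np.where(grad_k > 0, grad_k, 0.0) * (U - x_k) ** 2
    q = np.where(grad_k <= 0, -grad_k, 0.0) * (x_k - L) ** 2
    r = f_k - np.sum(p / (U - x_k)) - np.sum(q / (x_k - L))

    def model(x):
        x = np.asarray(x, float)
        return r + np.sum(p / (U - x)) + np.sum(q / (x - L))
    return model
```

Moving the asymptotes L and U closer to x_k increases the curvature of the model, which is exactly the flexibility used to avoid the concavity assumption mentioned above.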
Our numerical tests use all 306 academic and real-life test problems published
in Hock and Schittkowski (1981) and in Schittkowski (1987a). Some of them
are also available in the CUTE library, see Bongartz et al. (1995). The distribution
of the dimension parameter n, the number of variables, is shown in
Figure 5.1. We see, for example, that about 270 of the 306 test problems have no
more than 10 variables. In a similar way, the distribution of the number of
constraints is shown in Figure 5.2. The test problems also possess nonlinear
equality constraints and additional lower and upper bounds on the variables.
The two codes under consideration, NLPQLP and SCPIP, are able to solve
more general problems of the form
min f(x)
s.t. g_j(x) = 0, j = 1, ..., m_e,
     g_j(x) ≥ 0, j = m_e + 1, ..., m,
     x_l ≤ x ≤ x_u.
[Figures 5.1 and 5.2: distributions of the number of variables and of the number of constraints over the 306 test problems.]
First we need a criterion to decide whether the result of a test run is considered
a successful return or not. Let ε > 0 be a tolerance defining the
relative accuracy, x_k the final iterate of a test run, and x* the supposed exact
solution known from the two test problem collections. Then we call the output
a successful return if the relative error in the objective function is less than ε,
i.e.,
f(x_k) - f(x*) < ε |f(x*)|, if f(x*) ≠ 0, or f(x_k) < ε, if f(x*) = 0,
where ||·||_∞ denotes the maximum norm and g_j(x_k)^+ = max(0, g_j(x_k)).
We take into account that a code may return a solution with a better
function value than the known one, subject to the error tolerance of the allowed
constraint violation. However, there is still the possibility that an algorithm
terminates at a local solution different from the one known in advance. Thus,
we call a test run successful if the internal termination conditions are
satisfied subject to a reasonably small tolerance (IFAIL=0), and if, in addition
to the above decision, the constraint violation satisfies
r(x_k) < ε^2,
where r(x_k) denotes the maximum constraint violation.
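The success test can be sketched as follows. The function name and the exact inequalities are our interpretation of the criteria stated above, not code from the paper.

```python
def successful_return(f_xk, f_star, viol, eps=0.01, ifail=0):
    """Decide whether one test run counts as successful: regular
    termination (IFAIL=0), objective error below eps (relative if
    f(x*) != 0, absolute otherwise), and maximum constraint violation
    below eps**2."""
    if ifail != 0:
        return False
    if f_star != 0.0:
        ok_obj = f_xk - f_star < eps * abs(f_star)
    else:
        ok_obj = f_xk < eps
    return ok_obj and viol < eps ** 2
```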
For our numerical tests, we use ε = 0.01, i.e., we require a final accuracy of one
percent. Gradients are approximated by the fourth-order difference formula
∂f(x)/∂x_i ≈ ( 2 f(x - 2η_i e_i) - 16 f(x - η_i e_i) + 16 f(x + η_i e_i) - 2 f(x + 2η_i e_i) ) / (4! η_i)   (5.2)
where η_i = η max(10^{-5}, |x_i|) for a small step factor η, e_i is the i-th unit vector, and i = 1, ..., n.
In a similar way, derivatives of constraints are computed.
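Formula (5.2) can be sketched directly. The threshold 1e-5 and the step factor eta below are assumptions; the paper's exact constants may differ.

```python
import numpy as np

def gradient_4th_order(f, x, eta=1e-6):
    """Fourth-order central difference gradient following (5.2):
    df/dx_i ~ (2f(x-2he) - 16f(x-he) + 16f(x+he) - 2f(x+2he)) / (24 h)
    with a relative step h = eta * max(1e-5, |x_i|)."""
    x = np.asarray(x, float)
    g = np.empty_like(x)
    for i in range(x.size):
        h = eta * max(1e-5, abs(x[i]))
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (2.0 * f(x - 2*h*e) - 16.0 * f(x - h*e)
                + 16.0 * f(x + h*e) - 2.0 * f(x + 2*h*e)) / (24.0 * h)
    return g
```

Each gradient costs 4n function evaluations, which matches the "4 × NIT" accounting mentioned below.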
[Table: numerical results (NIT, NF) for NLPQLP and SCPIP.]
When evaluating NF, we count each single function call. However, function
evaluations needed for gradient approximations are not counted. Their average
number is 4 × NIT.
Many test problems are unconstrained or possess a highly nonlinear objective
function, preventing SCP methods from converging as fast as SQP methods. Moreover,
bounds are often set far away from the optimal solution, leading to initial
asymptotes too far away from the region of interest. Since SCP methods do
not possess fast local convergence properties, SCPIP needs more iterations and
function evaluations in the neighborhood of a solution.
The situation is different in mechanical structural optimization, where the
SCP methods were invented. In the numerical study of Schittkowski et al.
(1994), 79 finite element formulations of academic and practical problems were
collected based on the simulation package MBB-LAGRANGE, see Kneppe et
al. (1987). The maximum number of variables is 144, and the maximum number
of constraints is 1,020, not counting box constraints. NLPQL, see Schittkowski (1985b),
and MMA, former versions of NLPQLP and SCPIP, respectively, are among
the 11 optimization algorithms under consideration. To give an impression of
the behavior of SQP versus MMA, we repeat some results of Schittkowski et al.
(1994), see Table 5.2. We compare the performance with respect to the percentage
of successful test runs (NSUCC), the number of function calls (NF), and the number
of iterations (NIT), respectively.
For the evaluation of performance indices by the priority theory of Saaty,
see Schittkowski et al. (1994). The main difficulty is that the optimization
algorithms solve only a certain subset of test problems successfully, which differs
from code to code. Thus, mean values of a performance criterion are evaluated
only pairwise over the set of test problems successfully solved by two algorithms,
and then compared in the form of a matrix. The decision whether the result of a
test run is considered successful or not depends on a tolerance ε, which
is set to ε = 0.01 and ε = 0.00001, respectively.
The figures of Table 5.2 represent the scaled relative performance when comparing
the codes among each other. We conclude, for example, that for ε = 0.01,
NLPQL requires about twice as many gradient evaluations or iterations, respectively,
as MMA. When requiring a higher termination accuracy, however,
NLPQL needs only about 30% as many gradient calls. On the other hand,
NLPQL is a bit more reliable than MMA.
design of surface acoustic wave filters for signal processing, Bünner et al. (2002).
In some cases, the underlying simulation software is highly complex and has been
developed over many years; in others, NLPQL is used in a special way to solve
data fitting problems.
Typical applications of the SCP code SCPIP are
where x denotes the relative densities, u the displacement vector computed from
the linear system of equations K(x)u = f with positive definite stiffness matrix
K(x) and external load vector f. The relative densities and the elementary
stiffness matrices K_i define K(x) by
K(x) = Σ_i x_i^3 K_i.
V(x) is the volume of the structure, usually a linear function of the design
variables. x_l is a vector of small positive numbers to avoid singularities. The
power 3 in the state equation is found heuristically and is usually applied in
practice. Its role is to penalize intermediate values between the lower bound
and 1. Topology optimization problems easily lead to very large scale, highly
nonlinear optimization problems. Probably the simplest example is a half
beam, for our test discretized by 390 × 260 linear four-node square elements,
leading to 101,400 optimization variables. SCPIP computes the solution shown
in Figure 7.1 after 30 iterations.
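The penalized stiffness interpolation described above, with relative densities entering through the heuristic power 3, can be sketched as follows. Real topology optimization codes assemble sparse matrices from local element contributions; the dense toy version below only illustrates the interpolation itself.

```python
import numpy as np

def global_stiffness(x, K_elem, penal=3.0):
    """Assemble K(x) = sum_i x_i**penal * K_i from element densities x
    and (already expanded) elementary stiffness matrices K_elem.
    The cubic penalization pushes intermediate densities toward 0 or 1."""
    K = np.zeros_like(K_elem[0])
    for xi, Ki in zip(x, K_elem):
        K += xi ** penal * Ki
    return K
```

An element at half density contributes only 1/8 of its full stiffness, which is what makes intermediate densities unattractive to the optimizer.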
7 CONCLUSIONS
In this paper we have tried to describe SQP and SCP methods in a uniform way. In
both cases, convex subproblems are formulated, from which a suitable search
direction with respect to the design variables and the multipliers is computed.
A subsequent line search based on the augmented Lagrangian merit function
stabilizes the optimization algorithm and makes it possible to prove global convergence.
Starting from an arbitrary initial design, a stationary point satisfying the nec-
essary Karush-Kuhn-Tucker conditions is approximated.
However, both methods differ significantly in their underlying motivation.
SQP algorithms proceed from a local quadratic model with the goal of achieving
fast local convergence, superlinear in the case of quasi-Newton updates for
the Hessian of the Lagrangian. SCP methods, on the other hand, apply a global
convex and nonlinear approximation of the objective function and constraints,
based on linearized inverse variables and using first-order information only.
Numerical results based on standard small-scale benchmark problems and
on structural design optimization problems are included, together with a brief
summary of some industrial applications the authors are involved in.
Advantages and disadvantages of SQP versus SCP methods can be summa-
rized as follows.
SCP
  Advantages: tuned for solving structural mechanical optimization problems; no heredity of round-off errors in function and gradient calculations; excellent convergence speed in special situations; able to solve very large problems.
  Disadvantages: slow final convergence possible; not reliable without stabilization; less robust with respect to default tolerances.
References
Boderke P., Schittkowski K., Wolf M., Merkle H.P. (2000): Modeling of diffu-
sion and concurrent metabolism in cutaneous tissue, Journal of Theoretical
Biology, Vol. 204, No. 3, 393-407.
Birk J., Liepelt M., Schittkowski K., Vogel F. (1999): Computation of optimal
feed rates and operation intervals for tubular reactors, Journal of Process
Control, Vol. 9, 325-336.
Blatt M., Schittkowski K. (1998): Optimal Control of One-Dimensional Partial
Differential Equations Applied to Transdermal Diffusion of Substrates, in:
Optimization Techniques and Applications, L. Caccetta, K.L. Teo, P.F. Siew,
Y.H. Leung, L.S. Jennings, V. Rehbock eds., School of Mathematics and
Statistics, Curtin University of Technology, Perth, Australia, Vol. 1, 81 - 93.
Bongartz I., Conn A.R., Gould N., Toint Ph. (1995): CUTE: Constrained and
unconstrained testing environment, Transactions on Mathematical Software,
Vol. 21, No. 1, 123-160.
Bünner M., Schittkowski K., van de Braak G. (2002): Optimal design of sur-
face acoustic wave filters for signal processing by mixed integer nonlinear
programming, submitted for publication.
Edgar T.F., Himmelblau D.M. (1988): Optimization of Chemical Processes,
McGraw Hill.
Fleury C. (1989): An efficient dual optimizer based on convex approximation
concepts, Structural Optimization, Vol. 1, 81-89.
Fleury C., Braibant V. (1986): Structural Optimization - a new dual method
using mixed variables, International Journal for Numerical Methods in En-
gineering, Vol. 23, 409-428.
Zillober Ch. (2002): SCPIP - an efficient software tool for the solution of struc-
tural optimization problems, Structural and Multidisciplinary Optimization,
Vol. 24, No. 5, 362-371.
Zillober Ch., Vogel F. (2000a): Solving large scale structural optimization prob-
lems, in: Proceedings of the 2nd ASMO UK/ISSMO Conference on Engi-
neering Design Optimization, J. Sienz ed., University of Swansea, Wales,
273-280.
Zillober Ch., Vogel F. (2000b): Adaptive strategies for large scale optimization
problems in mechanical engineering, in: Recent Advances in Applied and
Theoretical Mathematics, N. Mastorakis ed., World Scientific and Engineer-
ing Society Press, 156-161.
Zillober Ch., Schittkowski K., Moritzen K. (2002): Very large scale optimization
by sequential convex programming, submitted for publication.
AN APPROXIMATION APPROACH FOR LINEAR PROGRAMMING
IN MEASURE SPACE
C.F. Wen and S.Y. Wu
1 INTRODUCTION
Let (E, F) and (Z, W) be two dual pairs of ordered vector spaces. Let E+ and
Z+ be the positive cones of E and Z, respectively, and E+* and Z+* the
polar cones of E+ and Z+, respectively. Given b* ∈ F, c ∈ Z, and a linear
map A : E → Z, the general linear programming problem and its dual can
be formulated as:
(LP) minimize (x, b*)
subject to Ax - c E Z+ and x E E+;
(1994) discuss LPM and DLPM (defined in this section) with constraint inequalities
on the relationships of measures to measures. They prove that under
certain conditions the LPM problem can be reformulated as a general capacity
problem as well as a linear semi-infinite programming problem. Therefore
LPM is a generalization of the general capacity problem and of the linear semi-infinite
programming problem. Lai and Wu (1994); Wen and Wu (2001) develop algorithms
for solving LPM when certain conditions are added to an LPM. In the
present paper, we develop an approximation scheme to solve LPM in section
3. This scheme is a discretization method. Basically, this approach finds a sequence
of optimal solutions of corresponding linear semi-infinite programs and
shows that the sequence of optimal solutions converges to an optimal solution
of LPM. In section 4, we give an algorithm to find the optimal solution of a linear
semi-infinite programming problem.
We now formulate a linear program for measure spaces (LPM). As in Lai
and Wu (1994), X and Y are compact Hausdorff spaces, and C(X) and M(X) are,
respectively, the space of continuous real-valued functions and the space of regular
Borel measures on X. We denote the totality of non-negative Borel measures
on X by M+(X) and the subset of C(X) consisting of non-negative functions
by C+(X). Given ν, ν* ∈ M(Y), φ ∈ C(X × Y), and h ∈ C(X), we know from
Lai and Wu (1994) that LPM can be formulated as follows:
Here B(Y) stands for the Borel field of Y. We define the bilinear functionals,
each denoted (·, ·), as follows:
Theorem 1.1 Suppose DLPM is consistent and -∞ < V(DLPM) < ∞. If
there exists a g* ∈ C+(Y) such that
Corollary 1.1 Suppose DLPM (or LPM) is consistent with finite value. If
h(x) > 0 or h(x) < 0 for all x ∈ X, then LPM has no duality gap.
APPROXIMATION APPROACH FOR LP IN MEASURE SPACE 335
2 SOLVABILITY OF LPM
In this section, we shall prove that under some simple conditions, there exists
a solution for LPM.
we obtain
V(LPM) = ∫_X h(x) dμ*(x).
for all (μ, μ̄) ∈ M(X) × M(Y) and (f, g) ∈ C(X) × C(Y). Then ELPM can be
rewritten as follows, where Λ denotes the associated linear map:
(ELPM)_k: minimize ((μ, μ̄), (h, 0))
subject to (Λ(μ, μ̄) - ν, g) = 0, ∀ g ∈ P_k,
and μ ∈ M+(X), μ̄ ∈ M+(Y).
are all represented by (·, ·). Then we have the following result:
Theorem 3.1 Suppose LPM is consistent with finite value. If the subprograms
(ELPM)_k satisfy the following conditions:
(1) For every k ∈ IN, (ELPM)_k is solvable with an optimal solution (μ_k*, μ̄_k*),
and
(a) ELPM is solvable.
and so the sequence {((μ_k*, μ̄_k*), (h, 0)) : k ∈ IN} is increasing and has an upper
bound. Hence
lim_{k→+∞} ∫_X h(x) dμ_k*(x) ≤ V(ELPM). (3.1)
By condition (2) and the Banach-Alaoglu theorem, there exist (μ*, μ̄*) ∈ M+(X) ×
M+(Y) and a subsequence {(μ_{k_j}*, μ̄_{k_j}*) : j ∈ IN} ⊆ {(μ_k*, μ̄_k*) : k ∈ IN} such that
(3.2) holds. We claim that
(μ*, μ̄*) ∈ F(ELPM). (3.3)
since (μ_{k_j}*, μ̄_{k_j}*) → (μ*, μ̄*) weakly and Λ is weakly continuous. On the other
hand, as g is in ∪_{k=1}^∞ span(P_k), there is a k such that g is in span(P_k).
Hence,
Letting j → ∞ in (3.5), and using (3.6), we obtain (3.4). Thus we have (3.3).
Therefore, combining (3.1), (3.2) and (3.3), we have
Corollary 3.1 Suppose LPM is consistent with finite value. If the given data
h and ν* satisfy the following conditions:
then
(b) For every k ∈ IN, let (μ_k*, μ̄_k*) be an optimal solution for (ELPM)_k. Then
Proof: (a) Given k ∈ IN. As in the proof of Theorem 2.1, we may assume that
there exists (μ_k^{(n)}, μ̄_k^{(n)}) ∈ F((ELPM)_k) such that
By the same argument as in the proof of Theorem 2.1, there exist μ_k* ∈ M+(X),
n* ∈ IN, and a subsequence {μ_{k,n_j} : j ∈ IN} ⊆ {μ_{k,n} : n ∈ IN} such that
||μ_{k,n_j}|| ≤ K̄ / min_{x∈X} h(x), ∀ n_j ≥ n*,
and
μ_{k,n_j} → μ_k* weakly as j → ∞, (3.10)
since (μ_{k,n_j}, μ̄_{k,n_j}) ∈ F((ELPM)_k) implies (Λ(μ_{k,n_j}, μ̄_{k,n_j}) - ν, g) = 0 for all
g ∈ P_k, and in particular (Λ(μ_{k,n_j}, μ̄_{k,n_j}) - ν, 1) = 0. Here
K̄ = max_{x∈X, y∈Y} φ(x, y) (||ν*|| - ν(Y)) / min_{x∈X} h(x)
is a fixed positive number.
llEY *EX
Therefore, by the Banach-Alaoglu theorem, there exist pi E M+(X) and a
subsequence {pkej: j E IN) C {pknj : j E IN) such that
-
pkrj + pi weakly as j + CQ. (3.11)
Now (3.12) combining with (3.8) also yields that ( p i , P i ) is an optimal solution
for (ELPM)" as
Also by the same argument as in the proof of Theorem 2.1, there exists a
subsequence {pij : j E IN) 5 {pi : k E IN) such that
is the adjoint operator of Λ_k. Hence the dual problem for (ELPM)_k is defined
as follows:
That is,
Note that (DELPM)_k is basically the same as CSIP, the continuous semi-
infinite program, except that the "minimization" in CSIP is replaced by "max-
imization" in (DELPM)_k. Moreover, for every k ∈ IN, there is no duality gap
for (ELPM)_k and (DELPM)_k under the condition that h(x) > 0, ∀ x ∈ X.
Hence,
and,
and
(DSIP_k): maximize ∫_T g(t) dμ(t)
such that ∫_T f_i(t) dμ(t) = c_i, i = 0, 1, 2, ..., k,
μ ∈ M+(T),
where M+(T) is the space of all nonnegative bounded regular Borel measures
on T. Note that (ELPM)_k and SIP_k have the same optimal solution, as well
as
V((DELPM)_k) = -V(SIP_k).
Given that δ > 0 is a prescribed small number, we state our algorithm in the
following steps:
Step 3 Find any t_{k_{n+1}} ∈ T such that φ_n(t_{k_{n+1}}) < -δ. If such a t_{k_{n+1}}
does not exist, stop and output X^n as a solution. Otherwise, set
T_{n+1} = E_n ∪ {t_{k_{n+1}}}.
Step 4 Update n ← n + 1, and go to Step 1.
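The exchange scheme above can be sketched on a hypothetical finite-dimensional instance. The toy problem below, min x1 + 0.5 x2 subject to x1 + t x2 ≥ t - t^2 for all t in [0, 1], is an invented stand-in for (SIP_k), and the naive vertex-enumeration LP solver replaces the paper's measure-theoretic machinery; only the iteration logic of Steps 1-4 is illustrated.

```python
import numpy as np
from itertools import combinations

def solve_lp2(c, A, b):
    """Solve min c.x s.t. A x >= b, x in R^2, by enumerating the
    intersections of constraint pairs (adequate for this tiny demo)."""
    best, best_x = np.inf, None
    for i, j in combinations(range(len(A)), 2):
        M = np.array([A[i], A[j]])
        if abs(np.linalg.det(M)) < 1e-12:
            continue
        x = np.linalg.solve(M, np.array([b[i], b[j]]))
        if np.all(A @ x >= b - 1e-9) and c @ x < best:
            best, best_x = c @ x, x
    return best_x

def exchange_sip(c, a, rhs, T0, delta=1e-3, max_iter=100):
    """Exchange iteration: solve the LP over the finite set T_n, find a
    point of [0, 1] violated by more than delta (Step 3), add it, and
    repeat (Step 4) until no such point remains."""
    T = list(T0)
    tt = np.linspace(0.0, 1.0, 1001)   # fine grid standing in for T
    x = None
    for _ in range(max_iter):
        A = np.array([a(t) for t in T])
        b = np.array([rhs(t) for t in T])
        x = solve_lp2(c, A, b)         # relaxed LP over T_n
        viol = np.array([a(t) @ x - rhs(t) for t in tt])
        k = int(np.argmin(viol))
        if viol[k] >= -delta:          # Step 3: nothing violated, stop
            break
        T.append(tt[k])                # enlarge T_n
    return x, T

c = np.array([1.0, 0.5])
x_opt, T_final = exchange_sip(c, lambda t: np.array([1.0, t]),
                              lambda t: t - t * t, [0.0, 1.0])
```

For this instance the optimal value is 0.25, attained at x = (0.25, 0) with the single contact point t = 0.5, and the LP values increase monotonically toward it, as Corollary 4.1 predicts.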
In the above algorithm, we make the following assumptions:
Proof: Since E_{n+1} - {t_{k_{n+1}}} ⊆ E_n, we have, by the definition (4.2) of α_n,
that α_n(t) > 0, ∀ t ∈ E_{n+1} - {t_{k_{n+1}}}. Hence, from the complementary slackness
theorem of linear programming, we get
which implies
APPROXIMATION APPROACH FOR LP IN MEASURE SPACE 347
If, in each iteration, there exists a δ > 0 such that ρ_n(t) > δ, ∀ t ∈ E_n, then,
by Theorem 4.1, we obtain
Corollary 4.1 Given any δ > 0, in each iteration, if there exists a δ > 0 such
that ρ_n(t) > δ, ∀ t ∈ E_n, then V(LP_k(T_{n+1})) > V(LP_k(T_n)).
Theorem 4.1 as well as Corollary 4.1 are fundamental results for the algorithm;
with them, we show that, under proper conditions, for any given δ > 0,
the proposed algorithm actually terminates in a finite number of iterations.
Proof: Suppose the scheme does not stop in a finite number of iterations. By
Corollary 4.1, we have
Then, by (4.1), φ_{n_r}(t_{k_{n_r+1}}) converges to φ(t*). Since φ_{n_r}(t_{k_{n_r+1}}) < -δ for
each r, we have
Hence, (4.4) cannot be true, and we have a contradiction. Therefore our claim
is valid and the proof is complete.
Under conditions (A3) and (A4), Theorem 4.2 assures that the proposed
scheme terminates in finitely many iterations, say n* iterations, with an optimal
solution
X n* = (x;',xn*1 ,x; * ,. . . ,$*).
In this case, xn*can be viewed as an approximate solution of (SIPk). The
next theorem tells us how good such an approximate solution can be.
Theorem 4.3 For any given δ > 0, if there exists x̄ = (x̄_0, x̄_1, x̄_2, ..., x̄_k) ∈
IR^{k+1} such that
Σ_{i=0}^{k} x̄_i f_i(t) ≥ 1, ∀ t ∈ T, (4.5)
then
By (4.5), we have
References
e-mail: [email protected]
1 INTRODUCTION
In recent papers Banks (2001), Banks and Dinesh (2000) have applied a se-
quence of linear time-varying approximations to find feedback controllers for
nonlinear systems. Thus, for the optimal control problem

min J = (1/2) x^T(t_f) F(x(t_f)) x(t_f) + (1/2) ∫_0^{t_f} { x^T(t) Q(x(t)) x(t) + u^T(t) R(x(t)) u(t) } dt     (1.1)

subject to the dynamics

ẋ(t) = A(x(t)) x(t) + B(x(t)) u(t),   x(0) = x_0,     (1.2)
for i ≥ 0, where
and
+ (1/2) ∫_0^{t_f} { x^[0]T(t) Q(x_0) x^[0](t) + u^[0]T(t) R(x_0) u^[0](t) } dt.
The approximations have been shown to converge under very mild conditions
(that each operator is locally Lipschitz) and to provide very effective control
in many examples. However, optimality has not been proved and, in fact, the
limit control is unlikely to be optimal in general (and indeed, there may be no
optimal control since the nonlinear systems considered are very general). Hence,
in this paper, we consider the full necessary equations derived from Pontryagin's
maximum principle and compare them with the original "approximate optimal
control" method proposed in Banks and Dinesh (2000).
OPTIMAL CONTROL OF NONLINEAR SYSTEMS 355
Let us first look at classical optimal control theory where we consider the linear
system
ẋ(t) = A(t)x(t) + B(t)u(t),   x(0) = x_0     (2.1)

with the finite-time cost functional

min J = (1/2) x^T(t_f) F x(t_f) + (1/2) ∫_0^{t_f} { x^T(t) Q(t) x(t) + u^T(t) R(t) u(t) } dt.     (2.2)
It is well known (see, for example, Banks (1986)) that from the maximum
principle, the solution to the linear-quadratic regulator problem is given by the
coupled two-point boundary value problem
( ẋ )   [  A(t)    −B(t)R^{-1}(t)B^T(t) ] ( x )
( λ̇ ) = [ −Q(t)   −A^T(t)               ] ( λ )     (2.3)
with
Assuming that λ(t) = P(t)x(t) for some positive-definite symmetric matrix
P(t), the necessary conditions are then satisfied by the Riccati equation

Ṗ(t) = −Q(t) − P(t)A(t) − A^T(t)P(t) + P(t)B(t)R^{-1}(t)B^T(t)P(t),   P(t_f) = F     (2.5)
yielding the linear optimal control law given by
we obtain
where

( u^T (∂R(x)/∂x_1) u, u^T (∂R(x)/∂x_2) u, . . . , u^T (∂R(x)/∂x_n) u )^T

and the analogous vector built from the terms x^T (∂Q(x)/∂x_i) x are vectors of
quadratic forms. Hence we obtain the equations
( ẋ )   [  Ā(x)    −B(x)R^{-1}(x)B^T(x) ] ( x )
( λ̇ ) = [ −Q̄(x)   −Ā^T(x)               ] ( λ )     (2.7)

in which Ā(x) and Q̄(x) collect A(x) and Q(x) together with the derivative
terms ((∂A(x)/∂x)x)^T, (∂/∂x)(R^{-1}(x)B^T(x)λ) and the quadratic-form vectors above.
where
or

( ẋ^[i](t) )   [  Ã(t)    −B̃(t)R̃^{-1}(t)B̃^T(t) ] ( x^[i](t) )
( λ̇^[i](t) ) = [ −Q̃(t)   −Ã^T(t)                ] ( λ^[i](t) )     (2.8)

where

Ã(t) = A(x^[i-1](t))
B̃(t) = B(x^[i-1](t))
R̃^{-1}(t) = R^{-1}(x^[i-1](t))
Q̃(t) = Q̄(x^[i-1](t), λ^[i-1](t))
with
We know that the two-point boundary value problem (2.3) with conditions
(2.4) represents the classical optimal control of a linear system with quadratic
cost (2.2) subject to the dynamics (2.1), the solution of which is given by the
Riccati equation (2.5) together with the optimal control law (2.6). Therefore,
since each approximating problem in (2.8) is linear-quadratic, the system (2.8)
with conditions (2.9), (2.10) and (2.11) is equivalent to the classical optimal
control of a linear system with quadratic cost
problem (2.7) for these necessary conditions with quadratic cost (1.1) subject
to the dynamics (1.2) is given by the approximate Riccati equation sequence
Similar to the "approximate optimal control" technique (see Banks and Dinesh
(2000)), optimization is carried out for each sequence on the system trajectory
for the "global optimal control" approach. In order to calculate the optimal
solution, it is necessary to solve the approximate matrix Riccati equation se-
quence (2.12) storing the values of P(t) from each sequence at every discrete
time-step. In practice this will be done in a computer and it will be necessary
to solve the Riccati equation using standard numerical integration procedures,
such that P : IR_+ → IR^{n×n} and P(t) now represents a time-varying matrix.
Here P^[1](t) is obtained by again solving (2.12) for i = 1, this time
replacing x^[i-1](t) and P^[i-1](t) with the previous approximations x^[0](t) and
P^[0](t), respectively. The subsequent approximations are obtained in a similar
way. Thus, in solving (2.12)-(2.14), we obtain a sequence of time-varying linear
equations where each sequence is solved as a standard numerical problem. For
each sequence, optimization has to be carried out at every numerical integra-
tion time-step, resulting in dynamic feedback control values. When both x^[i]
and P^[i] have converged on the kth sequence, on applying the control u^[k] of the
converged sequence to the true nonlinear system (1.2), the solution obtained
should be the same as that of the converged approximation. This is expected
since x^[i](t) = x(t) when x^[i](t) has converged.
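The iteration just described can be sketched in one dimension. This is a hedged illustration, not the authors' code: the scalar system ẋ = a(x)x + u with a(x) = −1 − x², the weights q = r = 1, the terminal weight, and the horizon are all assumed for the example. Each sweep solves a scalar Riccati equation of the form (2.5) backward along the previous trajectory x^[i-1](t), then regenerates the trajectory with the resulting time-varying gain.

```python
import numpy as np

# Scalar nonlinear system: xdot = a(x) x + u with a(x) = -1 - x**2 (illustrative).
a = lambda z: -1.0 - z**2
b, q, r, f_term = 1.0, 1.0, 1.0, 1.0
tf, dt = 2.0, 0.001
N = int(tf / dt)
x0 = 1.0

def closed_loop(x_prev):
    """One sweep: scalar Riccati backward with A(t) = a(x_prev(t)), then a
    forward pass of the frozen-coefficient system under u = -(b/r) p(t) x."""
    p = np.empty(N + 1)
    p[N] = f_term
    for k in range(N, 0, -1):
        pdot = -q - 2.0 * a(x_prev[k]) * p[k] + (b**2 / r) * p[k] ** 2
        p[k - 1] = p[k] - dt * pdot          # stepping backward in time
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        u = -(b / r) * p[k] * x[k]
        x[k + 1] = x[k] + dt * (a(x_prev[k]) * x[k] + b * u)
    return x

x_prev = np.full(N + 1, x0)   # x^[0]: the trajectory frozen at the initial state
diff = np.inf
for i in range(50):
    x_new = closed_loop(x_prev)
    diff = float(np.max(np.abs(x_new - x_prev)))
    x_prev = x_new
    if diff < 1e-9:           # both x^[i] and the gain have converged
        break
```

In line with the convergence discussion above, a few sweeps suffice for the successive trajectories to agree, after which the frozen-coefficient trajectory coincides with the closed-loop trajectory of the nonlinear system.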
3 EXAMPLE
Balancing an inverted pendulum on a motor-driven cart, as shown in Figure 3.1,
has become a popular controller design problem. The objective in the control
of this model is to move the cart to a specified position while maintaining the
pendulum vertical. The inverted pendulum is unstable, in that it may fall over
at any time in any direction unless a suitable control force u is applied, and it
is often used to test new controller designs. Here only a two-dimensional problem
is considered, in that the pendulum moves only in the plane of the page.
Let us now apply the theories of "approximate optimal control" and "global
optimal control" to the inverted pendulum system where the mathematical
model is given by the equations (see Ogata (1997))
where

sinc x_2 = 1 for x_2 = 0, and sinc x_2 = (sin x_2)/x_2 for x_2 ≠ 0.
For the "global optimal control" technique we also require the Jacobians of
A (x) and B (x), given by
where

∂a_32/∂x_2 = [mg/(M + m sin²x_2)²] { (M + m sin²x_2)[sin x_2 sinc x_2 − cos x_2 (∂/∂x_2)(sinc x_2)] + m sin(2x_2) cos x_2 sinc x_2 }

∂a_34/∂x_2 = [m r x_4/(M + m sin²x_2)²] { (M + m sin²x_2) cos x_2 − m sin x_2 sin(2x_2) }

∂a_42/∂x_2 = [(M + m)g/(r(M + m sin²x_2)²)] { (M + m sin²x_2)(∂/∂x_2)(sinc x_2) − m sin(2x_2) sinc x_2 }

∂a_44/∂x_2 = [m x_4/(M + m sin²x_2)²] { −(M + m sin²x_2) cos(2x_2) + (m/2) sin²(2x_2) }

∂a_34/∂x_4 = m r sin x_2/(M + m sin²x_2)

∂a_44/∂x_4 = −m sin(2x_2)/(2(M + m sin²x_2))

∂b_3/∂x_2 = −m sin(2x_2)/(M + m sin²x_2)²

∂b_4/∂x_2 = [1/(r(M + m sin²x_2)²)] { (M + m sin²x_2) sin x_2 + m sin(2x_2) cos x_2 }
with
(∂/∂x_2)(sinc x_2) = 0 for x_2 = 0, and (x_2 cos x_2 − sin x_2)/x_2² for x_2 ≠ 0,

which can easily be shown by differentiating sinc x_2 with respect to x_2 and
using L'Hôpital's rule.
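The derivative of sinc can also be checked numerically. A small sketch (note the unnormalized convention sinc z = sin z / z used in this chapter, which differs from numpy.sinc's π-normalized one):

```python
import numpy as np

def sinc(z):
    """Unnormalized sinc: sin(z)/z, with the removable singularity filled in."""
    safe = np.where(z == 0.0, 1.0, z)
    return np.where(z == 0.0, 1.0, np.sin(safe) / safe)

def dsinc(z):
    """d(sinc z)/dz = (z cos z - sin z)/z**2 for z != 0, and 0 at z = 0."""
    safe = np.where(z == 0.0, 1.0, z)
    return np.where(z == 0.0, 0.0, (safe * np.cos(safe) - np.sin(safe)) / safe**2)

# Central-difference check of the closed-form derivative at a few points.
pts = np.array([-2.0, -0.5, 0.3, 1.0, 2.5])
h = 1e-6
fd = (sinc(pts + h) - sinc(pts - h)) / (2.0 * h)
max_err = float(np.max(np.abs(fd - dsinc(pts))))
```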
By taking the mass of the cart, M = 1 kg, the mass of the pendulum arm, m =
0.1 kg, the distance from the pivot to the center of mass of the pendulum arm,
r = 0.5 m, and the acceleration due to gravity, g = 9.81 m/sec2, the state-
space model (3.2) representing the inverted pendulum system was simulated
using Euler's numerical integration technique with time-step 0.02 sec. The
performance matrices have been chosen independent of the system states such
that Q = F = diag(1, 1, 1, 1) and R = [1], so that the Jacobians of these
become zero. In simulating the inverted pendulum system (using a simulation
package such as MATLAB@) we only consider the regulator problem where
our objective is to drive all the system states to zero. We also assume that the
system starts from rest and simulate it for a given initial angle.
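The regulator setup above can be reproduced in a few lines. This sketch assumes the standard cart-pole equations in place of the elided model (3.2) and, as a simplification of the finite-horizon schemes in the text, uses a constant gain from the infinite-horizon algebraic Riccati equation; the state ordering (cart position, angle, cart velocity, angular velocity) is also an assumption.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed cart-pole equations standing in for the elided model (3.2);
# state z = (cart position, angle, cart velocity, angular velocity).
M, m, r, g = 1.0, 0.1, 0.5, 9.81

def dynamics(z, u):
    _, th, v, w = z
    s, c = np.sin(th), np.cos(th)
    den = M + m * s**2
    xacc = (u + m * s * (r * w**2 - g * c)) / den
    thacc = (-u * c - m * r * w**2 * s * c + (M + m) * g * s) / (r * den)
    return np.array([v, w, xacc, thacc])

# Linearization about the upright equilibrium and an LQR gain from the
# algebraic Riccati equation (a simplification of the finite-horizon scheme).
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, -m * g / M, 0.0, 0.0],
              [0.0, (M + m) * g / (M * r), 0.0, 0.0]])
B = np.array([[0.0], [0.0], [1.0 / M], [-1.0 / (M * r)]])
Q, R = np.eye(4), np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
K = (np.linalg.inv(R) @ B.T @ P).ravel()

# Euler simulation with the 0.02 sec time-step used in the text.
dt, steps = 0.02, 500
z = np.array([0.0, 0.3, 0.0, 0.0])   # initial angle 0.3 rad, starting from rest
for _ in range(steps):
    u = float(-K @ z)
    z = z + dt * dynamics(z, u)

final_angle = float(abs(z[1]))
```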
4 RESULTS
The "approximate optimal control" technique has been shown to provide very
effective control in that the pendulum is stabilizable from any given initial an-
gle. In fact, taking the initial horizon time t_f < 1.9 sec and proceeding with
the control of the approximating sequence from where it left off, by taking the
final state values as initial conditions and defining a new horizon time, the
pendulum can even be stabilized for its uncontrollable states (θ = ±π/2 rad).
This is because the "approximate optimal control" technique does not require
a stabilizability condition to be satisfied - it only requires Lipschitz continuity.
Although stability is not guaranteed in general, on a finite-time interval, it
has been achieved for the uncontrollable states of the inverted pendulum sys-
tem. A related "local freezing control" technique given in Banks and Mhana
(1992) (where optimization is carried out point-wise on the system trajectory
for the nonlinear system (1.2)) has been shown to stabilize the inverted pendu-
lum from any given initial angle except its uncontrollable states. The "global
optimal control" strategy, however, provides control that stabilizes the inverted
pendulum system for initial angles within the interval ±1.1 rad, beyond which
problems arise related to the convergence of the approximations for the nec-
essary conditions. This may be related to the solution being a discontinuous
feedback and a viscosity solution, causing the algorithm to eventually blow up.
Note also that the "global optimal control" strategy is harder to implement,
since it requires the Jacobians of A(x), B(x), F(x), Q(x), and R(x), and hence
takes a longer time to converge to the optimal solution.
Figures 4.1 and 4.2 illustrate and compare the converged solutions using
both techniques when the initial angle is set to θ = 0.75 rad and θ = 1.0 rad,
respectively.
Figure 4.1 Response of the States and the Control Input of the Inverted Pendulum System
when Subject to the Initial Angle θ = 0.75 rad Using "Approximate Optimal Control" and
"Global Optimal Control" Methods
Figure 4.2 Response of the States and the Control Input of the Inverted Pendulum System
when Subject to the Initial Angle θ = 1.0 rad Using "Approximate Optimal Control" and
"Global Optimal Control" Methods
systems converging to any of the local optima. Thus the proposed technique
still remains "approximate optimal control", which provides solutions close to
the optimal one, without the need to compute any Jacobians of the
system matrices and thus providing an easier implementation.
Table 4.1 Costs Associated with Each Optimization Technique Subject to Various Initial
Angles.
5 CONCLUSIONS
In this paper we have considered the full necessary conditions of a system with
nonlinear dynamics, derived from Pontryagin's maximum principle, and com-
pared them with the original "approximate optimal control" method. We have
thus considered a nonlinear optimization problem with nonlinear dynamics and
replaced it with a sequence of time-varying linear-quadratic regulator problems,
where we have made the argument that the set of approximating systems is
equivalent to the classical optimal control of a linear-quadratic regulator system
and hence can be solved classically giving the solution to the "global optimal
control", in the case where such a control exists. Even though the approx-
imating systems using the "approximate optimal control" strategy may not
converge to a global optimum of the nonlinear system, by considering a similar
approximation sequence (given by the necessary conditions of the maximum
principle), we have seen that the proposed method gives solutions very close to
the optimal one in many cases for the inverted pendulum system. The meth-
ods used here are very general and apply to a very wide range of nonlinear
systems. Future work will examine issues on discontinuity of the solution of
the Hamilton-Jacobi equation and viscosity solutions.
References
University of Würzburg
Institute of Applied Mathematics and Statistics
Am Hubland
97074 Würzburg
Germany
e-mail: [email protected]
Abstract: This paper gives a brief survey of some proximal-like methods for the
solution of convex minimization problems. Apart from the classical proximal-
point method, it gives an introduction to several proximal-like methods using
Bregman functions, φ-divergences etc., and discusses a couple of recent develop-
ments in this area. Some numerical results for optimal control problems are also
included in order to illustrate the numerical behaviour of these proximal-like
methods.
1 INTRODUCTION
This paper gives a brief survey of some proximal-like methods for the solution
of convex minimization problems. To this end, let f : IR^n → IR ∪ {+∞} be
a closed, proper, convex function, and consider the associated optimization
problem
min f(x),   x ∈ IR^n.     (1.1)

for some convex function f̃ : IR^n → IR and a closed, nonempty and convex set
X ⊆ IR^n can easily be transformed into a minimization problem of the form
(1.1) by defining

f(x) := f̃(x) if x ∈ X,   and   f(x) := +∞ if x ∉ X.
Hence, theoretically, there is no loss of generality by considering the uncon-
strained problem (1.1). In fact, many theoretical results can be obtained in
this (unifying) way for both unconstrained and constrained optimization prob-
lems. The interested reader is referred to the classical book by Rockafellar
(1970) for further details.
Despite the fact that extended-valued functions allow such a unified treat-
ment of both unconstrained and constrained problems, they are typically not
tractable from a numerical point of view. Therefore, numerical algorithms for
the solution of a problem like (1.1) have to take into account the constraints
explicitly or, at least, some of these constraints. This can be done quite ele-
gantly by so-called proximal-like methods. Similar to interior-point algorithms,
these methods generate strictly feasible iterates and are usually applied to the
problem with nonnegativity constraints
where, in the latter case, A ∈ IR^{n×m} and b ∈ IR^m are the given data.
PROXIMAL-LIKE METHODS 371
2 PROXIMAL-LIKE METHODS
for k = 0, 1, . . .; here, λ_k denotes a positive number. The objective function of
this subproblem is strictly convex since it is the sum of the original (convex)
objective function f and a strictly convex quadratic term. This term is usually
called the regularization term.
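For a strictly convex quadratic f, the subproblem (2.1) can be solved in closed form, which gives a compact sketch of the classical method; the data Qf, b and the constant step λ below are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 x^T Qf x - b^T x is closed, proper, convex; its minimizer solves
# Qf x = b.  The proximal subproblem
#     x^{k+1} = argmin_x  f(x) + (1/(2*lam)) * ||x - x^k||^2
# has the closed-form solution x^{k+1} = (Qf + I/lam)^{-1} (b + x^k / lam).
rng = np.random.default_rng(0)
C = rng.standard_normal((5, 5))
Qf = C.T @ C + 0.1 * np.eye(5)       # symmetric positive definite
b = rng.standard_normal(5)
x_star = np.linalg.solve(Qf, b)      # exact minimizer, for comparison

x = np.zeros(5)
lam = 1.0                            # constant lam_k, so sigma_k -> infinity
for _ in range(200):
    x = np.linalg.solve(Qf + np.eye(5) / lam, b + x / lam)

prox_err = float(np.linalg.norm(x - x_star))
```

With a constant λ the assumption of Theorem 2.1 below holds, and the iterates indeed converge linearly to the minimizer.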
Theorem 2.1 Let {x^k} and {λ_k} be two sequences generated by the classical
proximal-point method (2.1), define σ_k := Σ_{j=0}^k λ_j, let f_* := inf{f(x) | x ∈
IR^n} be the optimal value and S := {x^* ∈ IR^n | f(x^*) = f_*} be the solution set
of (1.1). Assume that σ_k → ∞. Then the following statements hold:

(a) The sequence of function values {f(x^k)} converges to the optimal value
f_*.

(b) If S ≠ ∅, then the entire sequence {x^k} converges to an element of S.
Theorem 2.1 states some very strong convergence properties under rather
weak conditions. In particular, it guarantees the convergence of the entire
sequence {x^k} even if the solution set S contains more than one element; in
fact, this statement also holds for an unbounded solution set S. Note that the
assumption σ_k → ∞ holds, for example, if the sequence {λ_k} is constant, i.e.,
if λ_k = λ for all k ∈ IN and some positive number λ.
We note that many variations of Theorem 2.1 are available in the litera-
ture. For example, it is not necessary to compute the exact minimizer of the
subproblems (2.1) at each step, see Rockafellar (1976) for some criteria under
which inexact solutions still provide similar global convergence properties. It
should be noted, however, that the criteria of inexactness in Rockafellar (1976)
are not implementable in general since they assume some knowledge regarding
the exact solution of (2.1). On the other hand, Solodov and Svaiter (1999)
recently gave a more constructive criterion in a slightly different framework.
Furthermore, some rate of convergence results can be shown for the classical
proximal-point method under a certain error bound condition, cf. Luque (1984).
This error bound condition holds, for example, for linearly constrained problems
due to Hoffman's error bound, see Hoffman (1952). Moreover, when applied to
linear programs, it is known that the classical proximal-point method has a
finite termination property, see Ferris (1991) for details.
The simple idea which is behind each proximal-like method for the solution of
convex minimization problems is, more or less, to replace the strictly convex quadratic
term in the regularized subproblem (2.1) by a more general strictly convex
function. Later on, we will see that this might be a very useful idea when
solving constrained problems.
There are quite a few different possibilities to replace the term (1/2)‖x − x^k‖²
by another strictly convex distance-like function. The one we discuss in this
subsection is defined by
ψ(x) := Σ_{i=1}^n (x_i log x_i − x_i) on S = IR^n_+   (convention: 0 log 0 = 0).
This function may be used in order to solve the constrained optimization prob-
lem (1.2) by generating a sequence {x^k} in such a way that x^{k+1} is a solution
of the subproblem
min f(x) + (1/λ_k) D_ψ(x, x^k),   x > 0
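For a linear objective f(x) = c^T x with c > 0, the entropy subproblem has a closed-form solution, which makes the key feature visible: the iterates stay strictly positive, so the constraint x ≥ 0 never needs an explicit projection. The data c, λ, and x^0 below are illustrative assumptions.

```python
import numpy as np

# Subproblem: x^{k+1} = argmin_{x>0} c^T x + (1/lam) D(x, x^k), with the
# Kullback-Leibler distance D(x, y) = sum_i x_i log(x_i/y_i) - x_i + y_i.
# Setting the gradient c + (1/lam) log(x/y) to zero gives the closed form
#   x^{k+1} = x^k * exp(-lam * c),
# a strictly positive multiplicative update.
c = np.array([1.0, 0.5, 2.0])
lam = 0.5
x = np.ones(3)                        # strictly feasible start x^0 > 0
values = []
for k in range(60):
    values.append(float(c @ x))
    x = x * np.exp(-lam * c)          # entropic proximal step

all_positive = bool(np.all(x > 0))
monotone = all(values[i + 1] < values[i] for i in range(len(values) - 1))
```

The objective values decrease monotonically toward the infimum 0, which lies on the boundary of the feasible set, while every iterate remains interior.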
Definition 2.2 Let Φ denote the class of closed, proper and convex functions
φ : IR → (−∞, +∞] with dom(φ) ⊆ [0, +∞) having the following properties:
The graphs of these three functions are shown in the following figure.
and this is precisely the Kullback-Leibler relative entropy function from (2.2).
Using any φ-divergence, we may try to solve the constrained minimization
problem (1.2) by generating a sequence {x^k} in such a way that x^{k+1} solves
the subproblem
min f(x) + (1/λ_k) d_φ(x, x^k),   x > 0
for k = 0, 1, . . ., where x^0 > 0 is any given starting point. For this method,
Teboulle (1997) shows that it has the same global convergence properties as
those mentioned in Theorem 2.1 for the classical proximal-point method. In
addition, Teboulle (1997) also allows inexact solutions of the subproblems (2.4).
The method may also be applied to the linearly constrained problem (1.3), and,
once again, superlinear convergence can be shown under a certain error bound
assumption, see Auslender and Haddou (1995) for details. Further references on
φ-divergences include Csiszár (1967); Eggermont (1990); Teboulle (1992); Iusem
et al. (1994); Iusem and Teboulle (1995).
where e_i denotes the i-th unit vector in IR^n. Hence, in each sum, we have a
factor which increases to infinity during the iteration process for all indices
i for which a constraint like x_i ≥ 0 is active at a solution. Consequently, we
get Hessian matrices which are very ill-conditioned.
In order to avoid this drawback, it is quite natural to modify the idea from
the previous subsection in the following way: Let Q, be the class of functions
from Definition 2.2 and set
The difference to the φ-divergence from the previous subsection is that we use
y_i² instead of y_i. Calculating the second-order derivative of the mapping d_φ
from (2.5) gives
for some constant v > 1. These functions are plotted in the following figure.
Now, let φ denote any of these functions and set (similar to (2.5))
The regularized method from Auslender et al. (1999a) then generates a se-
quence {x^k} by starting with a strictly feasible point x^0 for problem (1.3) and
by computing x^{k+1} as the solution of the subproblem
min f(x) + (1/λ_k) d_φ(g(x), g(x^k)),   g(x) > 0.     (2.7)
Theorem 2.2 Let {x^k} be a sequence generated by the above method. Assume
that the following assumptions are satisfied:
(A.1) There exist constants λ_max ≥ λ_min > 0 such that λ_k ∈ [λ_min, λ_max] for all
k ∈ IN.
The rank assumption (A.4) is satisfied, e.g., if A = I , i.e., if the feasible set
is the nonnegative orthant. Moreover, this rank condition can be assumed to
hold without loss of generality for linear programs if we view (1.3) as the dual
of a standard form linear program.
Consider again the linearly constrained minimization problem (1.3). The meth-
ods described in the previous subsections all assume, among other things, that
the interior of the feasible set is nonempty. Moreover, it is assumed that we can
find a strictly feasible point in order to start the algorithm. However, for linear
constraints, it is usually not easy to find such a starting point. Furthermore,
there do exist convex optimization problems which are solvable but whose in-
terior of the feasible region is empty. In this case, it is not possible to apply
one of the methods from the previous subsections.
Yamashita et al. (2001) therefore describe an infeasible proximal-like method
which can be started from an arbitrary point and which avoids the assumption
that the interior of the feasible set is nonempty. The idea behind the method
from Yamashita et al. (2001) is to enlarge the feasible set
Note that X ≠ ∅ then implies int(X_k) ≠ ∅. Hence we can apply the previous
method to the enlarged problem
The fact that the method from the previous subsection uses a quadratic penalty
term actually fits perfectly into our situation where we allow infeasible iterates.
Obviously, we can hope that we obtain a solution of the original problem
(1.3) by letting δ_k → 0. To be more precise, let us define

for some perturbation vector δ_k > 0. Then it has been shown in Yamashita et
al. (2001) that the statements of Theorem 2.2 remain true under a certain set
of assumptions. However, without going into the details, we stress that these
assumptions include the condition dom(f) ∩ {x | A^T x ≤ b} ≠ ∅ (in contrast to
(A.3) which assumes that the domain of f intersected with the interior of the
feasible set is nonempty). On the other hand, the main convergence result in
Yamashita et al. (2001) has to impose another condition which eventually guar-
antees that the iterates {x^k} generated by the infeasible proximal-like method
become feasible in the limit.
We consider two classes of optimal control problems. The first class contains
control constraints, the second one involves state constraints.
The class of control constrained problems is as follows: Let Ω ⊆ IR^n be an
open and bounded domain and consider the minimization problem
In order to deal with the state constrained problem (3.2), we use the same
discretization scheme as for the control constrained problem (3.1). In this case,
however, this results in a linearly constrained optimization problem of the form
(1.3), and it is in general not easy to find a strictly feasible starting point.
We begin with a word of caution: The numerical results presented in this and
the next subsection are not intended to show that proximal-point methods are
the best methods for solving the two classes of optimal control problems from
the previous subsection. The only thing we want to do is to provide a brief
comparison between some of the different proximal-like methods for the solution
of these problems in order to get some hints which proximal-like methods seem
to work best.
All methods were implemented in MATLAB and use the same parameter
setting whenever this was possible. The particular methods we consider in this
subsection for the solution of the control constrained problem (3.1) are the
proximal-like methods from Subsections 2.3 (φ-divergences) and 2.4 (quadratic
kernels with regularization term). The unconstrained minimization is always
carried out by applying Newton's method. For reasons explained earlier, we
took the function φ_2 in combination with the proximal-like method from Sub-
section 2.3, and the corresponding mapping φ̂_2 for the regularized method from
Subsection 2.4.
Table 3.1 contains the numerical results for Example 3.1 (a) for different sizes
of N (the dimension of the discretized problem is n = N²). This table contains
the cumulative number of inner iterations, i.e., we present the total number of
Newton steps and therefore the total number of linear system solves for each
test problem. Both methods seem to work reasonably well, and the number of
iterations is more or less independent from the mesh size. However, the number
of iterations using the p-divergence approach is significantly higher than the
number of iterations for the regularized approach. The resulting optimal control
and state for Example 3.1 (a) are given in Figures 3.1 and 3.2, respectively.
The observation is similar for Example 3.1 (b) as shown in Table 3.2, al-
though this time the number of iterations needed by the two methods is pretty
much the same. The resulting optimal control and states for these two examples
are given in Figures 3.3 and 3.4, respectively.
We also tested both methods on Example 3.1 (a) using smaller values of α.
Due to the quadratic penalty term in the regularized method, we do expect
a better behaviour for this method. This is indeed reflected by the numerical
results shown in Table 3.3.
Since a strictly feasible starting point is usually not at hand for state con-
strained problems, we only applied the infeasible proximal-like method from
Subsection 2.5 to the test problem from Example 3.2.
Table 3.3 Number of iterations for Example 3.1 (a) using different α (N = 30)
The results indicate that the infeasible method works quite well. Similar to
the results from the previous subsection, we can see from Table 3.4 that the
number of iterations is again (more or less) independent of the mesh size.
The resulting optimal control and state for Example 3.2 are shown in Figures
3.5 and 3.6, respectively.
4 FINAL REMARKS
References
Tseng, P., and Bertsekas, D.P. (1993), On the convergence of the exponen-
tial multiplier method for convex programming, Mathematical Programming,
Vol. 60, pp. 1-19.
Yamashita, N., Kanzow, C., Morimoto, T., and Fukushima, M. (2001), An in-
feasible interior proximal method for convex programming problems, Journal
of Nonlinear and Convex Analysis, Vol. 2, pp. 139-156.
DIMENSIONAL NONCONVEX
VARIATIONAL PROBLEMS
René Meziat
Departamento de Matemáticas
Universidad de Los Andes
Carrera 1 este No 18A-10
Bogotá, Colombia
[email protected]
and
OMEVA, Research Group on Optimization and Variational Methods
Departamento de Matemáticas
Universidad de Castilla La Mancha
13071, Ciudad Real, Spain
http://matematicas.uclm.es/omeva
Abstract: The purpose of this work is to carry out the analysis of two-
dimensional scalar variational problems by the method of moments. This
method is indeed shown to be useful for treating general cases in which the
Lagrangian is a separable polynomial in the derivative variables. In these cases,
it follows that the discretization of these problems can be reduced to a single
large scale semidefinite program.
1 INTRODUCTION
The classical theory of variational calculus does not provide any satisfactory
methods to analyze non-convex variational problems expressed in the form
which provides information about the limit behavior of the minimizing se-
quences of the functional I given in (1.1). Thus,
In the present work, we will study the particular case in which the Lagrangian
function f takes the polynomial separable form
min_μ Ī(μ) = ∫_0^1 ∫_R f(λ) dμ_x(λ) dx
with u′(x) = ∫_R λ dμ_x(λ)
u(0) = 0,  u(1) = a

can be recast as

min_m ∫_0^1 Σ_k c_k m_k(x) dx
with u′(x) = m_1(x)
u(0) = 0,  u(1) = a
where m_k(x) are the algebraic moments of the parametrized measures μ_x which
form the one-dimensional Young measure
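The equivalence behind this recast is elementary and can be sanity-checked numerically: for a polynomial f(λ) = Σ_k c_k λ^k and any discrete probability measure, ∫ f dμ equals Σ_k c_k m_k. The particular polynomial and two-point measure below are illustrative choices.

```python
import numpy as np

# f(l) = sum_k c_k l^k with c = (1, 0, -2, 0, 1), i.e. f(l) = (l^2 - 1)^2,
# and mu the two-point probability measure 0.3*delta_a + 0.7*delta_b.
c = np.array([1.0, 0.0, -2.0, 0.0, 1.0])
a_pt, b_pt, wa, wb = -1.5, 0.5, 0.3, 0.7

f = lambda l: np.polyval(c[::-1], l)             # polyval wants high->low order
integral = wa * f(a_pt) + wb * f(b_pt)           # int f d(mu) directly
moments = np.array([wa * a_pt**k + wb * b_pt**k for k in range(5)])
moment_sum = float(c @ moments)                  # sum_k c_k m_k
# both quantities equal 0.8625
```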
3 CONVEX ENVELOPES
where μ represents the family of all probability measures with mean t. In this
approach, every probability measure represents a convex combination of points
on the real line. Therefore, the measure
From this point of view, it is clear that the optimal measure μ̂ has a very precise
geometric meaning. Here we have assumed that μ̂ is supported in two points
at the most, because of Carathéodory's theorem in convex analysis.
Since f is a polynomial function in the form (2.1), every integral in (3.1)
can be written as
2D NON CONVEX VARIATIONAL PROBLEMS 397
where the values m_0, . . . , m_{2n} are the algebraic moments of the measure μ. So
we can express the convex envelope of f using the next semidefinite program

[ m_0   m_1     m_2     . . .  m_n     ]
[ m_1   m_2     m_3     . . .  m_{n+1} ]
[ . . .                                ]  ⪰ 0
[ m_n   m_{n+1} m_{n+2} . . .  m_{2n}  ]
where t_1 < t < t_2. Using these values in the expression (3.2), we obtain the
optimal measure μ̂. It is remarkable that only three moments are needed for
recovering the optimal measure μ̂. Finally, we conclude that Problem (3.1) and
Problem (3.3) are equivalent. For additional details see Pedregal et al. (2003).
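The moment characterization can be cross-checked against a direct grid computation of the convex envelope f** (the double Legendre-Fenchel transform); the example f(t) = (t² − 1)² and all grids below are illustrative assumptions. On [−1, 1] the envelope vanishes and at t = 0 the optimal measure is the two-point combination (1/2)δ_{−1} + (1/2)δ_{+1}, whose first three moments are (m_1, m_2, m_3) = (0, 1, 0).

```python
import numpy as np

# f(t) = (t**2 - 1)**2; its convex envelope f** vanishes on [-1, 1] (there the
# optimal measure is 0.5*delta_{-1} + 0.5*delta_{+1}) and agrees with f outside.
# f** is computed as the double discrete Legendre-Fenchel transform.
t = np.linspace(-3.0, 3.0, 2001)
s = np.linspace(-40.0, 40.0, 2001)   # slope grid; covers f'(t) for |t| <= 2.2
f = (t**2 - 1.0) ** 2

f_conj = np.max(s[:, None] * t[None, :] - f[None, :], axis=1)      # f*(s)
f_env = np.max(t[:, None] * s[None, :] - f_conj[None, :], axis=1)  # f**(t)

below = bool(np.all(f_env <= f + 1e-9))   # the biconjugate never exceeds f
gap_inside = float(np.max(np.abs(f_env[np.abs(t) <= 0.9])))
mask = (np.abs(t) >= 1.5) & (np.abs(t) <= 2.2)
gap_outside = float(np.max(np.abs((f_env - f)[mask])))
```

Up to grid discretization, the computed envelope is zero on the non-convexity region and matches f where f is convex, which is exactly what the semidefinite program above returns through the optimal moments.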
where μ represents the family of all probability measures supported in the plane
satisfying

[ 1     m_1     . . .  m_n     ]        [ 1     p_1     . . .  p_r     ]
[ m_1   m_2     . . .  m_{n+1} ]        [ p_1   p_2     . . .  p_{r+1} ]
[ . . .                        ] ⪰ 0,   [ . . .                        ] ⪰ 0     (3.5)
[ m_n   m_{n+1} . . .  m_{2n}  ]        [ p_r   p_{r+1} . . .  p_{2r}  ]
The optimal values m̂_1, . . . , m̂_{2n}, p̂_1, . . . , p̂_{2r} for problem (3.5) allow us to de-
termine the optimal probability measure μ̂ which satisfies (3.4).

From a practical point of view, μ̂ is the direct product of two independent
one-dimensional distributions μ̂_x and μ̂_y, so we have
in (1.4) and, respectively, μ̂_y represents the convex envelope of the second poly-
nomial
4 PROBLEM ANALYSIS
min_u I(u) = ∫∫_Ω f(∇u(x, y)) dx dy   s.t.   u|_∂Ω = g.     (4.1)
where the new sets of variables m and p must be characterized as the algebraic
moments of one-dimensional probability measures. In order to do so, we impose
the linear matrix inequalities

[ 1           m_1(x, y)     . . .  m_n(x, y)     ]
[ m_1(x, y)   m_2(x, y)     . . .  m_{n+1}(x, y) ]
[ m_2(x, y)   m_3(x, y)     . . .  m_{n+2}(x, y) ]  ⪰ 0
[ . . .                                          ]
[ m_n(x, y)   m_{n+1}(x, y) . . .  m_{2n}(x, y)  ]

together with the analogous Hankel matrix in the variables p_i(x, y), which
must also be positive semidefinite.
Here we will transform the optimization problem (4.3) subject to the con-
straints (4.4), into an equivalent discrete mathematical program. First, we
take a finite set of N points on the domain Ω indexed by k, that is
Next, for every discrete point (x_k, y_k) we take the algebraic moments

of the respective parametrized measure μ_{x_k, y_k}. Using the 2N × (n + r) variables
listed in (5.2), we can express the functional J in the discrete form:
The constraints (4.4) form a set of linear matrix inequalities for every point in
Ω; hence they should keep the same form for every point (x_k, y_k) in the mesh (5.1).
So we have a set of 2N linear matrix inequalities expressed as
in (4.3), we use the following fact: Given any Jordan curve C inside the domain
Ω, the restriction (5.5) implies

where (x_0, y_0) and (x_f, y_f) are the two endpoints of the curve C.
We shall select a finite collection of M curves C_l with l = 1, . . . , M which, in
some sense, sweep the whole domain Ω. It will suffice that each point (x_k, y_k)
on the mesh belongs to at least one curve C_l. In order to impose the boundary
conditions in (4.3), every curve C_l must link two boundary points of Ω. So we
obtain a new set of M constraints in the form
We can see that the optimization problem (4.3) can be transformed into a single
semidefinite program after discretization. Note that the objective function J_d in
(5.3) is a linear function of the variables in (5.2). Those variables are restricted
by the set of 2N linear matrix inequalities given in (5.4) and the set of M linear
equations given in (5.6). Thus, we have obtained a single, very large semidefinite
program.
6 EXAMPLES
To illustrate the method proposed in this work, we will analyze the non-convex
variational problem

min_{m,p} ∫_{[-1,1]²} ( 1 − 2 m_2(x, y) + m_4(x, y) + p_2(x, y) ) dx dy

under the constraints

( ∂u/∂x, ∂u/∂y ) = ( m_1(x, y), p_1(x, y) ),
[ m_{i+j}(x, y) ]_{i,j=0}^2 ⪰ 0,   [ p_{i+j}(x, y) ]_{i,j=0}^1 ⪰ 0,

and the boundary conditions u|_∂Ω = g(x) with Ω = [−1, 1]².     (6.2)
In order to perform the discretization of this problem, we use the straight
lines with slope 1 crossing the square [−1, 1]². With them we can impose the
boundary conditions in the finite model. After solving the discrete model, not
to be exposed here, we obtain the optimal moments for (6.2), and the Young
measure solution for the generalized problem (6.1).
For the three cases studied, we obtain the following optimal parametrized
7 CONCLUDING REMARKS
The major contribution of this work is that it paves the way for studying non-
convex variational problems of the form (4.1). Indeed, the direct method of the
calculus of variations does not provide any answer for them if the integrand f
is not convex. See Dacorogna (1989). In addition, in this work we propose a
method for solving generalized problems like (4.2) when the integrand f has the
separable form described in (1.4). In fact, to the best knowledge of the author,
there exist no other proposals to analyze this kind of generalized problem in two
dimensions.
An important remark about this work is that we have reduced the original non-convex variational problem (4.1) to the optimization problem (4.3). In addition, the reader should note that Problem (4.3) is a convex problem because the objective function is linear and the feasible set is convex. That is a remarkable qualitative difference, since a numerical implementation of Problem (4.1) may provide wrong answers when the search algorithm stops at local minima, whereas a good implementation of Problem (4.3) should yield the global minimum of the problem.
404 OPTIMIZATION AND CONTROL WITH APPLICATIONS
Since we can pose Problem (4.3) as a single large-scale semidefinite program, we can use existing software for solving non-convex variational problems of the form (4.1) whenever the integrand f has the separable form (1.4). This situation prompts further research on large-scale semidefinite programming especially suited for generalized problems of the form (4.3).
We should also stress that, although the original non-convex variational problem (4.1) may not have a solution, its new formulation (4.3) always has one. In general, this solution is unique and provides information about the existence of minimizers for Problem (4.1). If Problem (4.1) has a unique minimizer ū(x, y), then Problem (4.3) provides the moments of the Dirac measures supported on the gradient of ū. Conversely, if the optimal measures of Problem (4.3) are Dirac measures supported on a field s̄(x, y), then Problem (4.1) has a unique minimizer ū(x, y) which satisfies ∇ū(x, y) = s̄(x, y).
One fundamental question we feel it is important to raise is whether the discrete model (5.3) is an adequate representation of the convex problem (4.3). From an analytical point of view, we need to find a particular qualitative feature of the solution of Problem (4.3), namely the Dirac mass condition on all optimal measures. So we can hope that even rough numerical models can provide us with the right qualitative answer about the existence of minimizers for the non-convex variational problem (4.1). This has actually been observed in many numerical experiments.
It is also quite remarkable that we can obtain a numerical answer to an analytical question. Indeed, we are clarifying the existence of minimizers of one particular variational problem by means of a numerical procedure. This point is crucial because no analytical method exists which allows this question to be settled when we are coping with general non-convex variational problems.
On the other hand, we really need a fine numerical model because the solution of Problem (4.3) contains the information about the oscillatory behavior of minimizing sequences of the non-convex problem (4.1). In those cases where Problem (4.1) lacks a solution, minimizing sequences show similar oscillatory behavior linked with important features in the physical realm. For example, in
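The oscillation of minimizing sequences has a classic one-dimensional illustration (a Bolza-type example, not taken from this chapter): minimize the integral of ((u')² − 1)² + u² over [0, 1] with u(0) = u(1) = 0. Sawtooth functions with slopes ±1 drive the energy to zero, yet no function attains it.

```python
def tri(s):
    # periodic triangle wave: slopes +/-1, tri(0) = 0, peak value 1/2
    f = s - int(s)
    return min(f, 1.0 - f)

def energy(n, m=20000):
    # midpoint rule for J(u_n) = integral of ((u_n')^2 - 1)^2 + u_n^2 dx,
    # where u_n(x) = tri(n*x)/n is a sawtooth with n teeth and slopes +/-1,
    # so the gradient term vanishes almost everywhere and only u^2 remains
    h = 1.0 / m
    total = 0.0
    for k in range(m):
        x = (k + 0.5) * h
        u = tri(n * x) / n
        total += u * u * h
    return total

energies = [energy(n) for n in (1, 2, 4, 8)]   # ~ 1/(12 n^2): infimum is 0
```

The infimum 0 is not attained: the limit of the sawtooth profiles is u ≡ 0, whose gradient term equals 1. This is precisely the nonexistence/oscillation phenomenon that the moment formulation detects.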
REFERENCES 405
Acknowledgments
The author wishes to thank Serge Prudhomme and Juan C. Vera for their comments and suggestions on this paper.
References
1 INTRODUCTION
F is said to be monotone on K if
⟨F(x*), x − x*⟩ ≥ 0,  ∀x ∈ K;
then
It is easy to check that they also coincide with the solutions to SVI(F,K). Hence, we can analyse the stability of the solutions to SVI(F,K) also with respect to the dynamical system GPDS(F,K,a).
3 STABILITY ANALYSIS
We have seen that LPDS(F,K) and GPDS(F,K,a) have the same stationary
points, but, in general, their solutions are different. In this section we analyse
the stability of these equilibrium points, namely we wish to know the behaviour
of the solutions, of LPDS(F,K) and GPDS(F,K,a) respectively, which start near
an equilibrium point. Since we are mainly focused on the stability issue, we
can assume the property of existence and uniqueness of solutions to the Cauchy
problems corresponding to locally and globally projected dynamical systems.
In the following, B(x*, r) denotes the open ball with center x* and radius r. Now we recall some definitions on stability.
x* is called stable if for any ε > 0 there exists δ > 0 such that every solution x(t) with x(0) ∈ B(x*, δ) ∩ K satisfies x(t) ∈ B(x*, ε) for all t ≥ 0; x* is said to be asymptotically stable if x* is stable and lim_{t→+∞} x(t) = x* for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K; x* is said to be globally asymptotically stable if it is stable and lim_{t→+∞} x(t) = x* for every solution x(t) with x(0) ∈ K.
We recall also that x* is called a monotone attractor if there exists δ > 0 such that, for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K, the Euclidean distance between x(t) and x*, that is, ||x(t) − x*||, is a nonincreasing function of t; x* is said to be a strictly monotone attractor if ||x(t) − x*|| is decreasing to zero in t. Moreover, x* is a (strictly) global monotone attractor if the same properties hold for any solution x(t) such that x(0) ∈ K.
Finally, x* is a finite-time attractor if there is δ > 0 such that, for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K, there exists some T < +∞ such that x(t) = x* for all t ≥ T.
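These notions are easy to observe numerically. The sketch below is a hypothetical instance, not from the paper: K = [0, 1]², F(x) = x (strongly monotone) and stationary point x* = 0. It integrates LPDS(F,K) with a projected Euler scheme; the recorded Euclidean distance to x* is nonincreasing, which is the defining property of a monotone attractor.

```python
def proj_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box K = [lo, hi]^n
    return [min(max(xi, lo), hi) for xi in x]

def F(x):
    # hypothetical strongly monotone vector field with x* = (0, 0)
    return x

def lpds_step(x, h=0.05):
    # one projected (explicit) Euler step for the locally projected
    # dynamical system; the projection keeps the iterate inside K
    return proj_box([xi - h * fi for xi, fi in zip(x, F(x))])

x = [0.9, 0.4]
dists = []
for _ in range(200):
    dists.append(sum(xi * xi for xi in x) ** 0.5)  # distance to x* = 0
    x = lpds_step(x)
```

Along this particular trajectory the projection is never active (the iterates stay inside K), so the run only illustrates the monotone decrease of ||x(t) − x*||; boundary effects would require a field pointing out of K.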
We remark that under the assumptions of Theorem 3.2, the vector field F is the gradient map of a real convex function on K.
Now we go back to the monotone attractors. When the vector field F is continuous, there is a further connection between locally projected dynamical systems and variational inequalities: the global monotone attractors of LPDS(F,K) are equivalent to the solutions of the Minty variational inequality MVI(F,K) (see Pappalardo et al. (2002)).
We remark that Theorem 3.3 does not hold for globally projected dynamical systems: the following example (see Pappalardo et al. (2002)) shows that the solutions to MVI(F,K) are not necessarily monotone attractors of GPDS(F,K,a), even if the vector field F is continuous on K.
x* is globally exponentially stable if (3.1) holds for all solutions x(t) such that x(0) ∈ K.
The strong monotonicity and the Lipschitz continuity of F give the exponential stability of a stationary point of GPDS(F,K,a), provided that a is small enough (see Pappalardo et al. (2002)).
A result similar to Theorem 3.8 can also be proved for GPDS(F,K,a), provided that a is small enough (see Pappalardo et al. (2002)).
4 SPECIAL CASES
This section is devoted to the stability analysis in two special cases: when the
domain K is a convex polyhedron and when the vector field F is linear.
We remarked that the stability for a locally projected dynamical system is
generally different from that of a standard dynamical system; however, when K
is a convex polyhedron, many local stability properties for LPDS(F,K) follow on
that of a classical dynamical system in lower dimension, under suitable assump-
tion on the regularity of the stationary points of LPDS(F,K) (see Nagurney et
a1 (1995)).
We assume that K is specified by
is called the minimal face flow, and it is denoted by MFF(F,K,x*). Note that if F is locally Lipschitz continuous, then so is the right-hand side of MFF(F,K,x*); hence, for any z0 ∈ S(x*), there is a unique solution z0(t) to MFF(F,K,x*), defined in a neighborhood of 0, such that z0(0) = z0. Moreover, it is clear that 0 ∈ S(x*) is a stationary point of MFF(F,K,x*). The stability of 0 ∈ S(x*) for MFF(F,K,x*) assures the stability of x* for LPDS(F,K), under some regularity condition on x*, which we now introduce. Since x* solves the variational inequality SVI(F,K), we have
where ri N_K(x*) denotes the relative interior of N_K(x*). Note that any interior solution of SVI(F,K) is regular if we assume ri{0} = {0}; moreover, when x* is a solution of SVI(F,K) that lies on an (n−1)-dimensional face of K, it is regular if and only if F(x*) ≠ 0.
Now we show two stability results proved in Nagurney et al. (1995). First, a regular solution to SVI(F,K) has the strongest stability when it is an extreme point of K.
The stability results in the general case are summarized in the following
theorem.
assumption, the existence and uniqueness property for the solutions to the Cauchy problems associated with LPDS(F,K) and GPDS(F,K,a) holds for any closed convex domain K (see Dupuis et al. (1993) and Xia et al. (2000)).
We first remark that when the matrix A is positive definite, the local stability
properties obtained for LPDS(F,K), by Theorem 3.8, and for GPDS(F,K,a),
by Theorem 3.9, become global properties, as shown by the following result.
Proposition 4.2 Assume that K is a closed convex cone and F(x) = Ax. If A is a copositive matrix with respect to K, that is, ⟨x, Ax⟩ ≥ 0 for all x ∈ K, then x* = 0 is a global monotone attractor for LPDS(F,K). If A is strictly copositive with respect to K, that is, ⟨x, Ax⟩ > 0 for all x ∈ K, x ≠ 0, then x* = 0 is the unique stationary point for LPDS(F,K) and GPDS(F,K,a), and there exists an a_0 > 0 such that x* = 0 is a strictly global monotone attractor and globally exponentially stable for LPDS(F,K) and for GPDS(F,K,a), for any a < a_0.
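Copositivity with respect to K is strictly weaker than positive semidefiniteness, as a quick check with a hypothetical matrix shows: A = [[0, 1], [1, 0]] satisfies ⟨x, Ax⟩ = 2 x1 x2 ≥ 0 on the cone K = R²₊, yet it is indefinite on R².

```python
import random

A = [[0.0, 1.0], [1.0, 0.0]]

def quad(x):
    # <x, Ax> for the 2x2 matrix A above
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

rng = random.Random(1)
# copositive with respect to K = R_+^2: <x, Ax> >= 0 whenever x >= 0
copositive_on_K = all(quad([rng.random(), rng.random()]) >= 0.0
                      for _ in range(1000))
# ...but not positive semidefinite on all of R^2
indefinite = quad([1.0, -1.0]) < 0.0
```

So Proposition 4.2 applies to matrices A (and cones K) to which the usual positive (semi)definite theory does not.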
The stability analysis in the linear case is still open; future research might be carried out to study suitable conditions on the matrix A providing stability for any closed and convex domain. Also, it might be of interest to check whether, when F is an affine vector field and K = R^n_+, the classes of matrices needed for the study of existence and uniqueness of the solutions to the linear complementarity problem are sufficient to guarantee some stability properties.
References
1 INTRODUCTION
The research presented in this paper was motivated by optimal control problems
with ODE or PDE dynamics, whose discretized dynamics cannot be solved
explicitly. For example, consider a classical optimal control problem of the
form
where L_{∞,2}[0, 1] is a linear space whose elements are in L_∞[0, 1], but which uses the L_2[0, 1] norm, f(u) = F(x_u(1)), and x_u(t) ∈ R^n is the solution of the two-point boundary value problem
with the usual assumptions (see, e.g., Polak (1997), Ch. 4). If we discretize
the dynamics (1.2) by means of Euler's method, using a step-size 1/N, where
N > 0 is an integer, we get
where L_N is the space of functions taking values in R^m which are constant on the intervals [k/N, (k+1)/N), k = 0, 1, ..., N − 1, f_N(u) = F(x_u^N), and x_u^N is the solution of (1.3). Generally, (1.3) cannot be solved explicitly, and hence must be solved by some recursive technique, which we will call a "solver". Since only a finite number of iterations of the solver can be contemplated, in solving the problem (1.1) numerically we find ourselves dealing with two approximation parameters: N, which determines the Euler integration step-size, and, say, K, the number of iterations of the solver used to approximate x_u^N and hence also f_N(u). If we denote by x_{u,K}^N the result of K iterations of the solver in solving (1.3), we get a second-level approximating problem
where f_{N,K}(u) = F(x_{u,K}^N).
Note that while the function f_N(u) is continuously differentiable under standard assumptions, depending on the solver, the function f_{N,K}(u) may fail to
PROBLEMS WITH TWO NUMERICAL PRECISION PARAMETERS 425
be even continuous. Hence we may not assume that (1.5) is solvable by means of standard nonlinear programming type algorithms.
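The interplay of the two parameters can be mimicked with a deliberately simple instance (all of it hypothetical, not the paper's problem): scalar dynamics x' = −x + u(t), x(0) = 1, cost F(z) = z², implicit Euler with step 1/N, and a per-step fixed-point iteration truncated after K sweeps playing the role of the solver.

```python
def f_NK(u, N, K):
    # u[k] is the (piecewise constant) control on [k/N, (k+1)/N)
    h = 1.0 / N
    x = 1.0
    for k in range(N):
        # implicit Euler step: x_new = x + h * (-x_new + u[k]),
        # solved approximately by K fixed-point sweeps (the "solver")
        y = x
        for _ in range(K):
            y = x + h * (-y + u[k])
        x = y
    return x * x                      # f(u) = F(x_u(1)) with F(z) = z**2

N = 50
u = [0.0] * N
f_N = f_NK(u, N, 60)                  # K so large the solver has converged
errors = [abs(f_NK(u, N, K) - f_N) for K in (1, 2, 4, 8)]
```

For fixed N the solver error decays geometrically in K, while f_N itself carries the usual O(1/N) Euler discretization error; for small K the map u → f_{N,K}(u) inherits whatever roughness the solver has, which is exactly why no continuity can be assumed of it in general.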
An examination of the literature shows that efficient approaches to solving
infinite dimensional problems use "dynamic discretization," i.e., they start
out with low discretization precision and increase the precision progressively
as the computation proceeds. Referring to Polak (1997), we see that there are
essentially two distinct approaches to "dynamic" discretization, both of which
have been used only in situations with a single discretization parameter.
The first and oldest is that of algorithm implementation, see, e.g., Becker et al (2000); Betts et al (1998); Carter (1991); Carter (1993); Deuflhard (1974); Deuflhard (1975); Deuflhard (1991); Dunn et al (1983); Kelley et al (1991); Kelley et al (1999); Polak et al (1976); Mayne et al (1977); Sachs (1986). In this
approach, first one develops a conceptual algorithm for the original problem
and then a numerical implementation of this algorithm. In each iteration, the
numerical implementation adjusts the precision with which the function and
derivative values used by the conceptual algorithm are approximated so as to
ensure convergence to a stationary point of the original problem. When far
from a solution the approximate algorithms perform well at low precision, but
as a solution is approached, the demand for increased precision progressively
increases. Potentially, this approach is extendable to the case where two dis-
cretization parameters must be used.
The second, and more recent approach to dynamic discretization uses se-
quences of finite dimensional approximating problems, and is currently re-
stricted to problems with a single discretization parameter. It was formalized
in Polak (1993); Polak (1997), in the form of a theory of consistent approxi-
mations. Applications to optimal control are described in Schwartz (1996a);
Schwartz et al (1996), and a software package for optimal control, based on consistent approximations, can be obtained from Schwartz (1996b). Within this approach, an infinite dimensional problem, P, such as an optimal control problem with either ODE or PDE dynamics, is replaced by an infinite sequence of "nested", epi-converging finite dimensional problems {P_N}. Problem P is then solved by a recursive scheme which applies a nonlinear programming algorithm to problem P_N until a test is satisfied, at which point it proceeds to solve problem P_{N+1}, using the last point obtained for P_N as the initial point for the new calculation. In Polak (1997) we find a number of Algorithm Models
2 AN ALGORITHM MODEL
Finally, we assume that the exact evaluation of the functions f_N(·) and their gradients is not practical, and that an iterative "solver" must be used, with K iterations of the solver yielding an approximation f_{N,K}(u) to f_N(u), and, similarly, approximations ∇_K f_N(u) to ∇f_N(u) and θ_{N,K}(u) to θ_N(u). We will make no continuity assumptions on f_{N,K}(·), ∇_K f_N(·), or θ_{N,K}(·).
In response to the above assumptions, we will develop new algorithm models, with two precision parameters, for solving problems of the form P, by mimicking the one-precision-parameter Algorithm Model 3.3.17 in Polak (1997). Algorithm Model 3.3.17 in Polak (1997) assumes that the functions f_N(·), in
where λ(v) > 0 is the Armijo step-size. For convenience, we reproduce Algorithm Model 3.3.17 in Polak (1997) below.
Step 0. Set i = 0.
Step 1. Compute the smallest N_i, of the form 2^k N_{i−1}, k ∈ N, and v_{i+1} ∈ V_{N_i}, such that
v_{i+1} = A_{N_i}(v_i),  (2.5)
(i) for every bounded set B ⊂ V, there exist δ̄ < ∞ and a function Δ : N → R_+ such that lim_{N→∞} Δ(N) = 0, and for all N ∈ N, v ∈ V_N ∩ B,
|f_N(v) − f(v)| ≤ δ̄ Δ(N);  (2.7)
(ii) for every v* ∈ V such that θ(v*) ≠ 0, there exist ρ* > 0, δ* > 0, N* < ∞, such that
Making use of these definitions, we can now state the following scheme, based on the idea of algorithm implementation, for solving the problem P_N.
Step 0. Set i = 0.
Step 1. Compute the smallest K_i, of the form K_{i−1} + kK*, k ∈ N, and v_{i+1} ∈ V_N, such that
v_{i+1} = A_{N,K_i}(v_i),  (2.10)
and
f_{N,K_i}(v_{i+1}) − f_{N,K_i}(v_i) ≤ −σ φ(N, K_i)^ω.  (2.11)
(i) for every bounded set B ⊂ V_N, there exist κ̄ < ∞ and a function φ : N × N → R_+ such that lim_{K→∞} φ(N, K) = 0, and for all K ∈ N, v ∈ V_N ∩ B,
|f_{N,K}(v) − f_N(v)| ≤ κ̄ φ(N, K);  (2.12)
(ii) for every v* ∈ V_N such that θ_N(v*) ≠ 0, there exist ρ* > 0, δ* > 0, K* < ∞, such that
f_{N,K}(A_{N,K}(v)) − f_{N,K}(v) ≤ −δ*,  ∀v ∈ V_N ∩ B(v*, ρ*), ∀K ≥ K*.  (2.13)
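The inner loop of this scheme can be caricatured in a few lines (everything here is hypothetical: true objective f(v) = v², approximations f_K(v) = v²(1 + 2^(−K)) so that φ(K) = 2^(−K), σ = ω = 1, and a steepest-descent iteration map): K is raised in increments of K* until the sufficient-decrease test (2.11) holds, and the accepted precision automatically grows as v approaches the minimizer.

```python
def f_K(v, K):
    # approximate objective: true f(v) = v**2 with relative error 2**(-K)
    return v * v * (1.0 + 2.0 ** -K)

def A_K(v, K, lam=0.25):
    # iteration map: one steepest-descent step on the approximate gradient
    return v - lam * 2.0 * v * (1.0 + 2.0 ** -K)

sigma, omega, K_star = 1.0, 1.0, 2
v, K = 1.0, 1
ks = []
for _ in range(15):
    # Step 1: raise K until f_K(v+) - f_K(v) <= -sigma * phi(K)**omega
    while True:
        v_plus = A_K(v, K)
        if f_K(v_plus, K) - f_K(v, K) <= -sigma * (2.0 ** -K) ** omega:
            break
        K += K_star
    v = v_plus
    ks.append(K)
```

The recorded precision levels ks increase monotonically: near the start a coarse solver suffices, while close to the solution the test forces roughly two extra precision increments per iteration, which is the qualitative behavior the algorithm models are designed to exploit.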
balance the precision with which the problems P_N are solved versus the speed with which N is advanced.
At this point we must introduce some realistic assumptions. In particular, we assume that for every N, K ∈ N, we can construct an iteration map A_{N,K} : V_N → V_N, where K is the number of iterations of a solver.
(i) The function f(·) is continuous and bounded from below, and for all N ∈ N, the functions f_N(·) are continuous and bounded from below.
(ii) For every bounded set B ⊂ V, there exist δ̄ < ∞, a function K* : N → N, and functions φ : N × N → R_+, Δ : N → R_+ with the properties
lim_{N→∞} K*(N) = ∞,  (2.14)
lim_{K→∞} φ(N, K) = 0, ∀N ∈ N,  (2.15)
lim_{N→∞} φ(N, K_N) = 0, ∀K_N ≥ K*(N),  (2.16)
lim_{N→∞} Δ(N) = 0,  (2.17)
(iii) For every v* ∈ V such that θ(v*) < 0, there exist ρ* > 0, δ* > 0, N* > 0, K** < ∞, such that
Algorithm Model 2: Solves problem P.
Step 0. Set i = 0.
Step 4. If
f_{N_i,K_i}(A_{N_i,K_i}(v_i)) − f_{N_i,K_i}(v_i) > −Δ(N_i)^ω,  (2.23)
1/N_i, will be refined much faster when the Euler method is used for integration than when a Runge-Kutta method is used for integration. □
which shows that f_{N*,K*}(v_i) → −∞ as i → ∞. Now, it follows from (2.18) and (2.19) that for all i ≥ i_1,
(b) If f(·) is strictly convex, with bounded level sets, and {v_i}_{i=0}^∞ is a sequence constructed by Algorithm Model 2 in solving the problem P, then {v_i}_{i=0}^∞ converges to the unique solution of P.
f_{N_i,K_i}(A_{N_i,K_i}(v_i)) − f_{N_i,K_i}(v_i) ≤ −δ̂,  ∀v_i ∈ B(v̂, ρ̂), ∀N_i ≥ N̂.
Next we note that, in view of (2.18) and (2.19), for any v ∈ V_N,
Finally, let i_1 ≥ i_0 be such that N_i ≥ N̂ for all i ≥ i_1. Then, for the subsequence {v_{i_j}}_{j=0}^∞, with i_j ≥ i_1,
Hence we see that the sequence {f(v_i)}_{i=i_1}^∞ is monotone decreasing and, therefore, because f(·) is continuous, it must converge to f(v̂). Since this is contradicted by (2.31), our proof is complete.
(b) Since a strictly convex function with bounded level sets has exactly one stationary point, the desired result follows from (a) and the fact that {f(v_i)}_{i=i_1}^∞ is monotone decreasing. □
Remark 2.2 The following Algorithm Model differs from Algorithm Model 2 in two respects: first, the integer K is never reset and hence increases monotonically, and second, the test for increasing N is based on the magnitude of the approximate optimality function value. As a result, the proof of its convergence is substantially simpler than that for Algorithm Model 2. However, convergence can be established only for the diagonal subsequence {v_{i_j}}_{j=0}^∞ at which N_i is doubled. □
Algorithm Model 3: Solves problem P.
Data. N_0 ∈ N, v_0 ∈ V_{N_0}.
Step 3. If
(i) The optimality functions θ(·) and θ_N(·) are continuous for all N ∈ N.
(ii) For every bounded set B ⊂ V, there exist κ̄ < ∞, a function K* : N → N, and functions φ : N × N → R_+, Δ : N → R_+ satisfying (2.15)-(2.17), such that for all N ∈ N, v ∈ V_N ∩ B,
Theorem 2.4 Suppose that Assumptions and are satisfied and that {v_i*} is a sequence constructed by Algorithm Model 3 in solving the problem P.
(a) If {v_i*} is finite, then the sequence {v_i}_{i=0}^∞ has no accumulation points.
(b) If {v_i*} is infinite, then every accumulation point v* of {v_i*}_{i=0}^∞ satisfies θ(v*) = 0.
(c) If f(·) is strictly convex, with bounded level sets, and {v_i*}_{i=0}^∞ is a bounded sequence constructed by Algorithm Model 3 in solving the problem P, then it converges to the unique solution of P.
Proof. (a) Suppose that the sequence {v_i*} is finite and that the sequence {v_i}_{i=0}^∞ has an accumulation point v*. Then there exist an i_0, an N* < ∞, and an ε* > 0, such that for all i ≥ i_0, N_i = N*, ε_i = ε*, and θ_{N*,K_i}(v_i) < −ε*. But, in this case, for i ≥ i_0, the Inner Loop of Algorithm Model 2 is recognized as being of the form of Master Algorithm Model 1.2.36 in Polak (1997). It now follows from Theorem 1.2.37 in Polak (1997) that K_i → ∞ as i → ∞, and that θ_{N*}(v*) = 0. Next, it follows from (2.38), in Assumption , and the continuity of θ_{N*}(·), that for some infinite subsequence {v_{i_j}}, θ_{N*,K_{i_j}}(v_{i_j}) → θ_{N*}(v*) = 0, which shows that (2.36) could not be violated an infinite number of times, a contradiction.
(b) When the sequence {v_i*} is infinite, it follows directly from Assumption 2 and the test (2.36) that if v* is an accumulation point of {v_i*}, then θ(v*) = 0.
(c) When the function f(·) is strictly convex, with bounded level sets, it has a unique minimizer v* which is the only point in V satisfying θ(v*) = 0. Hence the desired result follows from (b). □
It can be shown that it has one and only one solution, which depends continuously upon the data a. Note that u is a nonlinear function of a.
The problem is discretized by the finite element method of degree one (Ciarlet (1977)) on triangles, combined with a domain decomposition strategy whose purpose is to have a finer mesh in desired regions without having to touch the rest of the domain. All linear systems are solved with the Gauss factorization method.
The airfoil is made of two parts, a main airfoil S_m and an auxiliary airfoil S_a, below and slightly behind the main one. To apply Domain Decomposition we need to partition the physical domain Ω as a union Ω_1 ∪ Ω_2 of two sub-domains with a non-empty intersection. This is done by surrounding the auxiliary airfoil by a domain Ω_2 outside S_m and with boundary ∂Ω_2 = Γ_2 ∪ S_a, and by taking
∫ (ω² u_2 v − ∇u_2 · ∇v) + ∫ (ωa(u_2 + u_a) − g) v = 0,  ∀v ∈ H¹(Ω_2)  (3.4)
where Π_h is the interpolation operator from one mesh to the other. The approximate solution after K iterations of the Schwarz algorithm is defined to be

u_{h,K} = u_h^i in Ω_i \ (Ω_1 ∩ Ω_2),  and  u_{h,K} = (u_h^1 + u_h^2)/2 in Ω_1 ∩ Ω_2.  (3.5)
However, for compatibility with the theory in this paper, we determine the mesh size h = 1/N and the number of Schwarz iterations K as required in Algorithm Model 2, and we do not use z^1 − z^2 to determine the number of Schwarz iterations.
The convergence of the Schwarz algorithm is known only for compatible meshes, i.e. meshes of Ω_1 and Ω_2 identical in Ω_1 ∩ Ω_2.
δf = 2 ∫ u δu  (3.6)
∇f(a) = −ω u p  (3.10)
where u_{h,K} and p_{h,K} are computed by K iterations of the Schwarz algorithm. Details of the validity of this calculus-of-variations computation can be found in Lions (1968).
for some C ∈ (0, ∞), which implies that we can set φ(h, K) = (1 − ρ)^K. Note that in this case φ(h, K) is actually independent of h. In view of this, we can take K*(h) = C ceil(1/h), where C > 0 is arbitrary.
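The claim that φ(h, K) = C(1 − ρ)^K with a rate independent of h has a transparent one-dimensional analogue (a sketch with exact subdomain solves, so no mesh enters at all): alternating Schwarz for u'' = 0 on [0, 1] with overlapping subdomains [0, b] and [a, 1] contracts the interface error by the fixed factor (a/b)(1 − b)/(1 − a) per sweep, here 4/9.

```python
def schwarz_sweep(v_b, a=0.4, b=0.6):
    # one alternating Schwarz sweep for u'' = 0 on [0,1], u(0)=0, u(1)=1:
    # solve on [0,b] with u(b)=v_b (the solution is linear), trace at a,
    v_a = v_b * a / b
    # then solve on [a,1] with u(a)=v_a, u(1)=1; return the new trace at b
    return v_a + (b - a) * (1.0 - v_a) / (1.0 - a)

v_b, errs = 0.0, []
for _ in range(10):
    errs.append(abs(v_b - 0.6))     # exact interface value: u(b) = b = 0.6
    v_b = schwarz_sweep(v_b)

ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
```

With compatible meshes a discretized version behaves the same way, which is what makes φ(h, K) usable as the solver-precision bound in the algorithm models above.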
(iv) It follows from the properties of the method of steepest descent that, given any v* ∈ V = L_2(0, 1) such that ∇f(v*) ≠ 0, there exist a ρ* > 0, a δ* > 0, a λ*, and an h* > 0, such that for all v ∈ V ∩ B(v*, ρ*), (i) ∇f_h(v) ≠ 0 and (ii)
where λ(v) is the exact step-size computed by the Steepest Descent Algorithm.
To show that there exist an h* > 0 and an N** < ∞, such that for all h < h*, N ≥ N**, and v ∈ V_h ∩ B(v*, ρ*),
f_{h,N}(v − λ(v)∇f_h(v)) − f_{h,N}(v) ≤ −δ*/2,  (3.18)
we make use of the facts that (a)
(b) By inspection, the bound functions Δ(h), φ(h, K), and K*(h) have the required properties.
4 CONCLUSIONS
Figure 3.2 ct versus distance to the leading edge on the two sides of each airfoil.
Figure 3.3 History of the convergence of the cost function for the coating problem. The method with mesh refinement and adapted Schwarz iteration number (green curve) is compared with a straight steepest descent method (red curve) and a steepest descent with mesh refinement only and DDM up to convergence (blue curve). The objective function increases whenever the mesh is refined.
with PDE dynamics having two precision parameters, the step size and an iter-
ation loop count in the solver. Our numerical results show that our algorithms
are effective. The numerical study was done using the method of steepest de-
scent but the models and the proofs are general and are likely to work also with
Newton methods, conjugate gradient methods, etc.
min_{u∈U} f(u)
P_N : min_{v∈V_N} f_N(v)  (5.2)
(c) We say that the problem-optimality function pairs {P_N, θ_N} are consistent approximations to the problem-optimality function pair {P, θ}, if the P_N epi-converge to P and, for every infinite sequence {u_N}, such that u_N ∈ U_N and u_N → u ∈ U, lim sup_{N→∞} θ_N(u_N) ≤ θ(u).⁴ □
The reason for introducing optimality functions into the definition of consistency of approximations is that it enables us to ensure that not only do global optimal solutions of the problems P_N converge to global optimal solutions of P, but also local optimal solutions converge to either local solutions or stationary points.
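Epi-convergence of approximating problems can be watched on a toy pair (entirely hypothetical: f(u) = (u − 1)² on U = [0, 2], and f_N(u) = f(u) + u²/N on the grid U_N of mesh 1/N): the grid minimizers of the P_N approach the minimizer of P at the rate of the mesh.

```python
def f(u):
    # limit problem P: minimize f over U = [0, 2]; unique minimizer u = 1
    return (u - 1.0) ** 2

def f_N(u, N):
    # approximating problem P_N: consistency error of size O(1/N)
    return (u - 1.0) ** 2 + u * u / N

def solve_PN(N):
    # U_N: uniform grid of mesh 1/N on [0, 2]; exhaustive minimization
    grid = [k / N for k in range(2 * N + 1)]
    return min(grid, key=lambda u: f_N(u, N))

solutions = [solve_PN(N) for N in (4, 16, 64, 256)]
gaps = [abs(s - 1.0) for s in solutions]   # distance to the minimizer of P
```

Here the grid minimizers happen to converge to the global minimizer; the point of the optimality-function machinery in the text is that, for nonconvex problems, such limits are at least guaranteed to be stationary points.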
Acknowledgments
This work was supported in part by the National Science Foundation under Grant
No. ECS-9900985 and by the Institut Universitaire de France.
Notes
1. Please refer to the Appendix or Polak (1997) for the definitions of optimality functions and consistent approximations.
2. The epigraphs of f_N, restricted to U_N, converge to the epigraph of f, restricted to U, in the Painlevé-Kuratowski sense.
3. When optimality functions are properly constructed, their zeros are standard stationary points; for examples see Polak (1997).
4. Note that this property ensures that the limit point of a converging sequence of approximate stationary points for the P_N must be a stationary point for P.
References
Becker, R., Kapp, H., and Rannacher, R. (2000), Adaptive finite element meth-
ods for optimal control of partial differential equations: basic concept, SIAM
J. Control and Optimization, Vol. 39, No. 1, pp. 113-132.
Bernardi, D., Hecht, F., Otsuka, K., Pironneau, O. (1999), freefem+, a finite element software to handle several meshes. Downloadable from
ftp://ftp.ann.jussieu.fr/pub/soft/pironneau/.
Cessenat M. (1998), Mathematical Methods in Electromagnetism, World Sci-
entific, River Edge, NJ.
Betts, J. T. and Huffman, W. P. (1998), Mesh refinement in direct transcription methods for optimal control, Optim. Control Appl., Vol. 19, pp. 1-21.
Carter, R. G. (1991), On the global convergence of trust region algorithms using
inexact gradient information, SIAM J. Numer. Anal., Vol. 28, pp. 251-265.
Carter, R. G. (1993), Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information, SIAM J. Sci. Comput., Vol. 14, No. 2, pp. 368-388.
Ciarlet, P.G. (1977), The Finite Element Method, Prentice Hall.
Deuflhard, P. (1974), A modified Newton method for the solution of ill-conditioned systems of nonlinear equations with application to multiple shooting, Numerische Mathematik, Vol. 22, No. 4, pp. 289-315.
Deuflhard, P. (1975), A relaxation strategy for the modified Newton method.
Optimization and Optimal Control, Proc. Conference on Optimization and
Optimal Control, Oberwolfach, West Germany, 17-23 Nov. 1974, Eds. Bu-
lirsch, R.; Oettli, W.; Stoer, J., Springer-Verlag, Berlin, p.59-73.
Deuflhard, P. (1991), Global inexact Newton methods for very large scale nonlinear problems, Impact of Computing in Science and Engineering, Vol. 3, pp. 366-393.
Dunn, J. C., and Sachs, E. W. (1983), The effect of perturbations on the convergence rates of optimization algorithms, Applied Math. and Optimization, Vol. 10, pp. 143-147.
Kelley, C. T. and Sachs, E. W. (1991), Fast algorithms for compact fixed point problems with inexact function evaluations, SIAM J. Sci. Statist. Comput., Vol. 12, pp. 725-742.
Kelley, C. T. and Sachs, E. W. (1999), A Trust Region Method for Parabolic
Boundary Control Problems, SIAM J. Optim., Vol. 9, pp. 1064-1081.
Lions, J.L. (1968), Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles, Dunod; Gauthier-Villars.
Mayne D. Q., and Polak E. (1977), A Feasible Directions Algorithm for Optimal
Control Problems with Terminal Inequality Constraints, IEEE Transactions
on Automatic Control, Vol. AC-22, No. 5, pp. 741-751.
Pironneau O., Polak E. (2002), Consistent Approximations and Approximate
Functions and Gradients In Optimal Control, J. SIAM Control and Opti-
mization, Vol 41, pp.487-510.
Polak E., and Mayne D. Q. (1976), An Algorithm for Optimization Problems
with Functional Inequality Constraints, IEEE Transactions on Automatic
Control, Vol. AC-21, No. 2.
Polak, E. (1993), On the use of consistent approximations in the solution of semi-infinite optimization and optimal control problems, Mathematical Programming, Series B, Vol. 62, No. 2, pp. 385-414.
Polak, E. (1997), Optimization: Algorithms and Consistent Approximations, Springer-Verlag, New York.
Sachs, E. (1986), Rates of convergence for adaptive Newton methods, JOTA, Vol. 48, No. 1, pp. 175-190.
Schwartz, A. L. (1996a), Theory and Implementation of Numerical Methods Based on Runge-Kutta Integration for Solving Optimal Control Problems, Ph.D. Dissertation, University of California, Berkeley.
Schwartz, A. L. (1996b), RIOTS The Most Powerful Optimal Control Problem
Solver. Available from http://www.accesscom.com/ adam/RIOTS/
Schwartz, A. L., and Polak, E. (1996), Consistent approximations for optimal control problems based on Runge-Kutta integration, SIAM Journal on Control and Optimization, Vol. 34, No. 4, pp. 1235-1269.
21 NUMERICAL SOLUTIONS OF
OPTIMAL SWITCHING CONTROL
PROBLEMS
T. Ruby and V. Rehbock
1 INTRODUCTION
numerical results, this can sometimes lead to better objective function values.
Note, though, that neither of these algorithms can guarantee a globally optimal solution, due to the combinatorial aspect of having to choose an optimal sequence of dynamical systems.
2 PROBLEM FORMULATION
Suppose that we have a total of M given dynamical systems, Ω1, Ω2, ..., ΩM, defined on the time horizon [0, T]. Each of these may be invoked over any subinterval of the time horizon. For i = 1, 2, ..., M, let the i-th candidate system be defined by a set of first order ordinary differential equations, i.e.
3 SOLUTION STRATEGY
i.e. the ordered sequence [1, 2, ..., M] is repeated N + 1 times within W_L. Then, we consider the following system on [0, T],
where T_L = [τ1, τ2, ..., τ_{L−1}] must, of course, satisfy
(b) The gradient of the cost functional with respect to T_L is not continuous (see Teo et al. (1991));
(c) When τ_{i−1} and τ_i coalesce, the number of decision variables changes.
To complete the application of the CPET to Problem (P_N), note that, in the new time scale, the system dynamics may be conveniently rewritten as
where we define x̃(s) = x(t(s)), f̃^j(s, x̃(s)) = f^j(t(s), x(t(s))), and t(s) is the solution of (3.5). Furthermore, the objective functional is transformed to
where g̃(s, x̃(s)) = g(t(s), x(t(s))). Finally, we define Problem (P̃_N): Given N, find a u ∈ U such that the cost functional (3.11) is minimized subject to the dynamics (3.5), (3.8)-(3.10) and subject to the constraint (3.7).
Note that Problems (P_N) and (P̃_N) are equivalent. In Problem (P̃_N), rather than finding τ1, τ2, ..., τ_{L−1}, we look for u ∈ U. All switching points of the original problem are mapped onto the set of integers in chronological order. Piecewise integration can now be performed easily, since all points of discontinuity of the dynamics in the s-domain are known and fixed. Moreover, u ∈ U is a piecewise constant function, and hence Problem (P̃_N) is readily solvable by the optimal control software MISER3 (see Jennings et al. (1991), Jennings et al. (2001)), which is an implementation of the control parametrization technique (see Teo et al. (1991)). Note that the piecewise integration of (3.8) is performed automatically in MISER3. The continuity constraints (3.10) are also satisfied automatically when executing the code in standard mode.
The solution of (3.5) yields t(s), so the state trajectory x(t) of the
original problem defined on [0, T] can be reconstructed easily. It is clear from
Assumption 1 that there exists an integer N such that an optimal solution to
Problem (P̃_N) is also an optimal solution of the original Problem (P). We
must note, though, that MISER3 uses a gradient based optimization approach
and we are therefore not guaranteed of finding a globally optimal solution to
Problem (P̃_N) and therefore Problem (P).
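The time-scaling idea behind the CPET can be sketched in a few lines. The following is an illustrative toy (not MISER3 and not the paper's problem): two hypothetical scalar subsystems, a fixed repeated switching sequence, and the durations spent in each subsystem as decision variables. In the scaled time s, subinterval [i-1, i] always hosts the i-th entry of the sequence, so the discontinuity points are known and fixed, and durations shrinking to zero simply remove a subinterval without changing the number of decision variables.

```python
# Illustrative CPET sketch: optimize switching durations for a fixed,
# repeated subsystem sequence.  All dynamics and costs are toy assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

T = 2.0                                    # time horizon
subsystems = [lambda x: x, lambda x: -x]   # two hypothetical scalar dynamics
sequence = [0, 1, 0, 1]                    # fixed ordering, repeated N+1 times

def cost(durations, x0=1.0):
    """Piecewise integration over unit s-intervals: dx/ds = theta_i * f_i(x)."""
    x = x0
    for theta, idx in zip(durations, sequence):
        f = subsystems[idx]
        sol = solve_ivp(lambda s, y: [theta * f(y[0])], (0.0, 1.0), [x])
        x = sol.y[0, -1]
    return x ** 2                          # terminal cost x(T)^2

n = len(sequence)
theta0 = np.full(n, T / n)                 # initial guess: equal durations
res = minimize(cost, theta0,
               bounds=[(0.0, T)] * n,
               constraints={'type': 'eq', 'fun': lambda th: th.sum() - T})
```

Here the gradient-based solver (SLSQP) works on a cost that is smooth in the durations, which is exactly the continuity property that motivates the CPET.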
We consider the numerical example given in Liu and Teo (2000). In this prob-
lem, there are 3 candidate dynamical systems and the time horizon is [0, 2]. We
have
The transformed problem may then be written as: Minimize the cost functional
We note from Table 4.1 that the proposed method will generate local solutions
depending on the choice of initial guess. For the first initial guess listed in Table
4.1, we obtain a locally optimal solution which involves one switching and is
identical to that produced in Liu and Teo (2000) (where at most one switching
OPTIMAL SWITCHING CONTROL PROBLEMS 457
                          Case 1       Case 2
Initial Sequence          {Ω1}         {Ω1, Ω2, Ω3, Ω1, Ω2, Ω3}
Initial Switching Times   no switch    0.1, 0.3, 0.8, 1.1, 1.5
Optimal Sequence          {Ω1, Ω2}     {Ω1, Ω3, Ω1, Ω2}
Optimal Switching Times   1.03463      0.501, 0.797, 0.990
Optimal Cost              0.003611     0.003204

Table 4.1  Result for N = 1.
was allowed). However, as the second line in the table shows, a different initial
guess produces quite a different solution with 3 switches and a significantly
lower cost. Virtually all other initial guesses we tried resulted in one of these
two solutions.
Next we tried increasing N to N = 3 in order to see if there are more
optimal solutions if we allow more switchings. Again, many initial guesses
were tested, with most of these leading to a solution with optimal switching
sequence {Ω1, Ω3, Ω1, Ω2, Ω1, Ω2} and corresponding switching times 0.50156,
0.82533, 0.91906, 0.97595, and 1.03862. The optimal cost value was 0.003170.
Again, we did obtain a slightly worse local optimal solution with one of the
initial guesses tested.
A further increase to N = 7 did not yield any other solutions with a lower
cost value, so the one obtained with N = 3 appears to be optimal, i.e. our
estimate for the optimal number of switches is 5.
5 CONCLUSIONS
References
Abstract: This paper is concerned with state feedback controller design us-
ing neural networks for the nonlinear optimal regulator problem. The nonlinear
optimal feedback control law can be synthesized by solving the Hamilton-Jacobi
equation with three-layered neural networks. The Hamilton-Jacobi equation
generates the value function by which the optimal feedback law is synthesized.
To obtain an approximate solution of the Hamilton-Jacobi equation, we solve
an optimization problem by the gradient method, which determines the connection
weights and thresholds in the neural networks. The gradient functions are calcu-
lated explicitly by the Lagrange multiplier method and used in the learning
algorithm of the networks. We also propose a device such that an approximate
solution to the Hamilton-Jacobi equation converges to the true value function.
The effectiveness of the proposed method was confirmed with simulations for
various plants.
1 INTRODUCTION
This paper is concerned with optimal state feedback control of nonlinear sys-
tems. To solve the nonlinear optimal regulator problem, we solve the Hamilton-
Jacobi equation using neural networks and then synthesize the optimal state
feedback control law with its approximate solution.
Most studies on optimal control of nonlinear systems have been made by appli-
cation of the calculus of variations. They are aimed at calculating the optimal
control input u⁰(t), t ∈ [0, t1], and the corresponding optimal trajectory x⁰(t),
t ∈ [0, t1], starting from an initial condition x(0). This yields the so-called
open-loop control, but a practically interesting matter is to obtain an optimal
state feedback control law u⁰(t) = α(x(t)), t ∈ [0, t1], which brings us a closed-loop
system.
As is well known, the optimal regulator is a typical control problem in which a
linear system and a quadratic performance functional are considered and the
Riccati equation plays an important role. The reason for attaching importance
to the optimal regulator is that it offers a systematic method to design a state
feedback control law, and consequently one can construct a closed-loop control
system.
In contrast, it is very hard to design such an optimal state feedback controller
for nonlinear systems. To synthesize the optimal state feedback control law
resulting in a stable closed-loop system, one must solve the Hamilton-Jacobi
partial differential equation (H-J equation).
However, the nonlinear optimal regulator is of restricted use since it is extremely
difficult to solve the H-J equation analytically. Hence we need to develop approxi-
mate solutions of the H-J equation.
In the past, several approaches to solving the H-J equation have been proposed.
With the Taylor series expansion, one can obtain an accurate approximate
solution around an operating point. However, it is difficult to approximate
uniformly over a broad range.
The principle of neural network approximation for the H-J equation is as explained
NONLINEAR OPTIMAL REGULATOR 463
subj. to  ẋ(t) = f(x(t), u(t)),   x(0) = x0
where x(t) ∈ R^n and u(t) ∈ R^r are the state vector and the control vector,
respectively.
We assume the following:
Assumption 2.1 System (2.1b) is stabilizable in the sense that for any initial
condition x(0) there exists a control law u(·) such that x(t) → 0 as t → ∞.
0 = min_u { q(x) + u^T R u + V_x(x) f(x, u) }    (2.2)
where H(x, u, V_x(x)) is the Hamiltonian function
holds for u satisfying (2.2). Therefore the optimal control must satisfy the
following partial differential equations.
ẋ = f(x) + G(x) u    (2.10)
From (2.6)
If we can obtain V(x) satisfying this equation, then the optimal state feedback
control law u(x) is given by (2.12).
where y ∈ R^n and z ∈ R^q are the output and the internal state of the neural
network, respectively, and W1 ∈ R^{q×n}, W2 ∈ R^{n×q}, θ ∈ R^q and a ∈ R^n are the
connection weight matrices, the threshold and the constant, respectively.
Further, σ : R^q → R^q denotes the sigmoid function, and we use hyperbolic
tangent functions as the sigmoid function σ(z), i.e.,
Here x^p denotes an element of the set A = {x^p | x^p ∈ Ω, p = 1, 2, ..., P}, where
Ω ⊂ R^n is a subregion of the state space and A is the discretized set of Ω.
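A minimal numpy sketch of the three-layer value-function network described above may help fix ideas. It assumes the structure V^N(x) = ||W2(σ(W1 x + θ) − σ(θ))||² that appears later in (4.8c)-(4.8d); subtracting σ(θ) enforces V(0) = 0, and the squared norm enforces V ≥ 0. All sizes and weight values here are illustrative assumptions, not data from the paper.

```python
# Three-layer tanh network for the value function (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n, q = 3, 20                      # state dimension and middle-layer size (assumed)
W1 = rng.uniform(-0.5, 0.5, (q, n))
W2 = rng.uniform(-0.5, 0.5, (n, q))
theta = rng.uniform(-0.5, 0.5, q)

def sigma(z):
    """Hyperbolic tangent sigmoid applied componentwise."""
    return np.tanh(z)

def value(x):
    z = W1 @ x + theta            # internal state z in R^q
    y = W2 @ (sigma(z) - sigma(theta))
    return float(y @ y)           # V(x) = y^T y >= 0, with V(0) = 0
```

Any gradient-based learning rule then only has to drive the H-J residual of this parametric family to zero on the lattice points x^p.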
The learning problem of the neural network is formulated as the following opti-
mization problem.
min_{W1, W2, θ}  E[W1, W2, θ] = Σ_{p=1}^P |e(x^p)|²
subj. to  z^p = W1 x^p + θ    (3.6b)
For learning of the network we need concrete expressions of the gradients of the per-
formance function with respect to the connection weights, etc., i.e., ∇_{W1} E[W1, W2, θ],
Then we have
Next, to calculate the gradients of |e(x)|², let us define the Lagrangian L with
Lagrange multipliers λ ∈ R^n, μ ∈ R^q, γ ∈ R^n.
By the chain rule of derivatives and the formulae¹ for gradients and the symmetry
of ∇σ(z), the partial derivatives of the Lagrangian L with respect to each variable
are calculated as follows.
where x ⊗ y and x · y denote the tensor product and the inner product of arrays
x ∈ X and y ∈ Y, respectively, and ∇²σ(z) ∈ R^{q×q×q} is the second order
derivative array. From (3.16)-(3.21), the variables λ, μ, γ are obtained as
In order to get the optimal feedback law satisfying (2.7)--(2.9) by solving the
optimal regulator problem (2.1), we must approximate the value function V(x)
and the state feedback control law u ( x ) with separate neural networks. Hence
we use for V(x) the same neural network used in the affine nonlinear case.
Note here that e2 ∈ R^r. For simplicity, letting W = {W1, W2, W3, W4} and
Θ = {θ1, θ2}, we define the performance function E[W, Θ] for learning as fol-
lows.
subj. to  z^p_1 = W1 x^p + θ1
then
Then, in a similar manner to the previous case [A], concrete expressions of the
gradients are obtained.
Calculating the partial derivatives of the Lagrangian L with respect to each variable
and noticing that ∇_{Wi}{|e1(x)|² + ||e2(x)||²} = ∇_{Wi} L and ∇_{θi}{|e1(x)|² +
||e2(x)||²} = ∇_{θi} L, we can obtain the gradients of the performance function (3.42)
with respect to the connection weight matrices Wi and thresholds θi as follows.
Using these gradients, we can execute the steepest descent method (α > 0)
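The learning loop above can be sketched end to end. For transparency this sketch takes the gradients by central finite differences rather than the paper's Lagrange-multiplier formulas, and it uses a toy problem of my own choosing (ẋ = u, cost ∫(x² + u²)dt, R = 1, whose true value function is V(x) = x²) so that the H-J residual e(x) = x² − 0.25 V_x² is easy to write down; a simple step-halving safeguard keeps the descent from overshooting.

```python
# Steepest descent on E = sum_p |e(x_p)|^2 for a toy scalar H-J residual.
import numpy as np

rng = np.random.default_rng(1)
q = 5
params = rng.uniform(-0.5, 0.5, 3 * q)       # flattened [W1, W2, theta]
grid = np.linspace(-1.0, 1.0, 21)            # discretized learning domain

def value_fn(p, x):
    W1, W2, th = p[:q], p[q:2 * q], p[2 * q:]
    y = W2 @ (np.tanh(W1 * x + th) - np.tanh(th))
    return y ** 2                            # scalar network output, V(0) = 0

def E(p):
    h, tot = 1e-5, 0.0
    for x in grid:
        Vx = (value_fn(p, x + h) - value_fn(p, x - h)) / (2 * h)
        e = x ** 2 - 0.25 * Vx ** 2          # residual for xdot = u, q(x) = x^2
        tot += e ** 2
    return tot

def grad(p, h=1e-6):
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p); d[i] = h
        g[i] = (E(p + d) - E(p - d)) / (2 * h)
    return g

alpha = 0.1
losses = [E(params)]
for _ in range(30):
    step = alpha * grad(params)
    while np.abs(step).max() > 1e-12 and E(params - step) > losses[-1]:
        step *= 0.5                          # halve the step if loss would rise
    params = params - step
    losses.append(E(params))
```

The analytic gradients of Section 3 would replace `grad` with exact expressions, which is what makes the method practical for larger networks.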
Since it does not necessarily follow that the partial differential equations (2.7)-(2.9)
possess a unique solution, an arbitrary solution of the H-J equation is not al-
ways the true value function. This difficulty is caused by the fact that the
H-J equation is only a necessary condition for optimality. In general it is very
hard to prove that an approximate solution of the H-J equation converges to
the true value function V(x). However, we can improve the possibility that
the approximate solution converges to the value function by making a device on the
learning of the networks.
It is not guaranteed that V^N(x) obtained from the learning problem (3.6) or
(3.43) coincides with the value function V(x) of the performance functional
(2.1c). In fact, it sometimes happens that V^N(x) ≠ V(x). This is caused by
the fact that, in general, the solution to the H-J equation is not unique. Accordingly, we
need a device of learning such that V^N(x) converges to the true value function
V(x).
For simplicity, let us assume there exists u = u⁰(x, V_x(x)) satisfying (2.9)
globally around (x, u) = (0, 0). Substitute this into (2.7) to get
The solution to the H-J equation (4.1) is not unique because the H-J equation
is only a necessary condition for optimality. The value function of problem
(2.1) certainly satisfies the H-J equation (4.1), but there may exist other
solutions. Let us denote the value function (2.4) by V⁰(x) and
distinguish it from any other solution V(x).
The minimum solution among the semi-positive definite solutions to the H-J equa-
tion (4.1) coincides with V⁰(x). But asymptotic stability of the closed-loop
system is not guaranteed by implementing the optimal control law u⁰(x, V⁰_x(x)).
If there exists a solution V(x) of the H-J equation such that ẋ = f(x, u⁰(x,
V_x(x))) becomes asymptotically stable, then we call it the stabilizing solution,
denoted by V⁻(x). The following lemma gives a condition under which V⁻(x) exists
uniquely [Kucera (1972), Schaft (1996)].
does not possess pure imaginary eigenvalues and {A, B} is stabilizable, where
A = f_x(0, 0), B = f_u(0, 0), 2Q = q_xx(0). Then the H-J equation possesses
the unique stabilizing solution V⁻(x).
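The eigenvalue condition in the lemma can be checked numerically for a given linearization. The matrices below are an illustrative stabilizable and observable example of my own (a double integrator), not data from the paper; A and B play the roles of f_x(0,0) and f_u(0,0), and Q, R the quadratic weights.

```python
# Check that the standard Hamiltonian matrix of the associated Riccati
# equation has no purely imaginary eigenvalues (illustrative example).
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (assumed example)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

Ham = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
                [-Q, -A.T]])
eigs = np.linalg.eigvals(Ham)
min_real = np.abs(eigs.real).min()        # > 0 means no imaginary-axis eigenvalues
```

When this minimum is strictly positive and {A, B} is stabilizable, the unique stabilizing solution of the quadratic approximation exists, which is what the lemma exploits.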
As this lemma holds, it is known from the uniqueness of the stabilizing solu-
tion that V⁻(x) becomes equal to the minimum performance function for the
stabilizing optimal control problem, that is,
min_u { ∫_0^∞ ( q(x) + u^T R u ) dt | ẋ = f(x, u), x(0) = x0, lim_{t→∞} x(t) = 0 }    (4.2)
At this time the stabilizing optimal control law is given by u⁰(x, V⁻_x(x)). Fur-
ther, it can easily be shown [Schaft (1996)] that V⁻(x) is the maximum solution
of the H-J equation.
Now, V⁰(x) was the minimum among the semi-positive definite solutions of the
H-J equation (4.1), while V⁻(x) is the maximum one. Hence if there exists
more than one semi-positive definite solution, then V⁰(x) ≠ V⁻(x). However, if
the semi-positive definite solution is unique, then V⁰(x) = V⁻(x).
The uniqueness of the semi-positive definite solution is guaranteed by assuming de-
tectability.
Assumption 4.1 It holds for the solutions of (2.1b) that lim_{t→∞} x(t) = 0
whenever lim_{t→∞} { q(x(t)) + u(t)^T R u(t) } = 0.
for the stabilizing solution V⁻(x) of the H-J equation (4.1). Further, since
V⁰(x) = V⁻(x), it holds that ∇²V⁰(0) = P⁻ also.
Now, from (4.2), V⁻(x) attains the minimum V⁻(0) = 0 and it holds that V⁻_x(0) = 0.
Meanwhile, V^N(x) satisfies V^N(0) = 0 and V^N_x(0) = 0. Thus, letting the relation
hold for V^N(x) as well as (4.4), we can make V^N(x) coincide with V⁻(x) =
V⁰(x) in the neighborhood of x = 0.
Here, we show the improvement of the learning algorithm for the affine nonlinear sys-
tem case. Note that we can apply the same approach to the general nonlinear
system case.
∇²V^N(0) can be calculated from (3.7) as
subj. to  z^p = W1 x^p + θ    (4.8b)
y(x^p) = W2 σ(z^p) − W2 σ(θ)    (4.8c)
V^N(x^p) = y(x^p)^T y(x^p)    (4.8d)
e(x^p) = q(x^p) + V^N_x(x^p) f(x^p)
  − (1/4) V^N_x(x^p) G(x^p) R^{-1} G(x^p)^T V^N_x(x^p)^T    (4.8e)
D = 2 W1^T ∇σ(θ) W2^T W2 ∇σ(θ) W1 − P⁻    (4.8f)
p = 1, 2, ..., P
5 SIMULATION RESULTS
We made computer simulations for the following example [Isidori (1989)]. Here,
let us consider a case where the method in [Goh (1993)] is difficult to apply
for learning.
min_u ∫_0^∞ ( x1² + x2² + x3² + u² ) dt
subj. to  ẋ1 = −x1 + e^{2x2} u
ẋ2 = 2 x1 x2 + sin(x2) + 0.5 u
ẋ3 = 2 x2
x(0) = x0
From (2.12) and (2.13), the H-J equation and the optimal control law u⁰(x, V_x(x))
become:
x1² + x2² + x3² − x1 V_{x1}(x) + (2 x1 x2 + sin x2) V_{x2}(x) + 2 x2 V_{x3}(x)
  − 0.25 { e^{4x2} V_{x1}²(x) + e^{2x2} V_{x1}(x) V_{x2}(x) + 0.25 V_{x2}²(x) } = 0    (5.2)
u⁰(x, V_x(x)) = −0.5 { e^{2x2} V_{x1}(x) + 0.5 V_{x2}(x) }
Since {A, b} and {√(2Q), A} are controllable and observable, respectively, the
assumption in Lemma 1 is satisfied. The Riccati equation for problem (5.4)
becomes
P A + A^T P − P b (2r)^{-1} b^T P + 2Q = 0,    (5.5)
and the stabilizing solution P⁻ is calculated as follows.
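The stabilizing solution P⁻ of (5.5) can be reproduced with SciPy. The matrices A and b below are the linearization of the example system at the origin, computed here rather than quoted from the paper; with q(x) = x1² + x2² + x3² and R = 1 we have 2Q = 2I and 2r = 2.

```python
# Stabilizing solution of the Riccati equation (5.5) via SciPy.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-1.0, 0.0, 0.0],    # Jacobian of the example dynamics at (0, 0)
              [0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0]])
b = np.array([[1.0], [0.5], [0.0]])

# solve_continuous_are solves A^T X + X A - X B R^-1 B^T X + Q = 0,
# which matches (5.5) with X = P, Q = 2I, and R = 2r = 2.
P_minus = solve_continuous_are(A, b, 2.0 * np.eye(3), np.array([[2.0]]))
residual = (P_minus @ A + A.T @ P_minus
            - P_minus @ b @ b.T @ P_minus / 2.0 + 2.0 * np.eye(3))
```

With V ≈ (1/2) x^T P⁻ x, the local feedback u = −(1/2) b^T P⁻ x then renders the closed-loop matrix A − (1/2) b b^T P⁻ asymptotically stable, consistent with the lemma.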
Since q(x) = x1² + x2² + x3² is positive definite, Assumption 4.1 is satisfied. Hence
the state variables become asymptotically stable under the optimal control law.
The number of middle-layer units of the network was taken to be 20. The learning domain
was set as Ω = {(x1, x2, x3) | −2 ≤ x1 ≤ 2, −1 ≤ x2 ≤ 1, −2 ≤ x3 ≤ 2} and was
discretized by an orthogonal lattice. The distance between adjoining lattice
points was set as Δx1 = 0.2, Δx2 = 0.1, Δx3 = 0.2. Initial values of W1, W2
and θ were given by random numbers between −0.5 and 0.5.
The results of optimal feedback control are presented in Figures 5.1 and 5.2 for
the initial state x(0) = (1.5, 1, 1.5). The value of the performance function
is 19.797. For comparison, optimal control in the open-loop style was computed
by a usual optimization algorithm [Shimizu (1994)] for the same initial
Figure 5.1 Optimal state feedback control by neural network (state variables)
Figure 5.2 Optimal state feedback control by neural network (control input)
Figure 5.3 Optimal control in open loop style (state variables)
Figure 5.4 Optimal control in open loop style (control input)
6 CONCLUSIONS
Notes
References
Beard, R.W., Saridis, G.N. and Wen, J.T. (1997), Galerkin Approximations
of the Generalized Hamilton-Jacobi-Bellman Equation, Automatica, Vol. 33,
No. 12, pp. 2159-2177.
Doya, K. (2000), Reinforcement Learning in Continuous Time and Space, Neural
Computation, Vol. 12, pp. 219-245.
Goh, C.J. (1993), On the Nonlinear Optimal Regulator Problem, Automatica,
Vol. 29, No. 3, pp. 751-756.
Isidori, A. (1989), Nonlinear Control Systems: An Introduction, Springer-Verlag.
Kucera, V. (1972), A Contribution to Matrix Quadratic Equations, IEEE Trans.
Automatic Control, pp. 344-347.
Lee, H.W.J., Teo, K.L. and Yan, W.Y. (1996), Nonlinear Optimal Feedback
Control Law for a Class of Nonlinear Systems, Neural Parallel & Scientific
Computations 4, pp. 157-178.
Lukes, D.L. (1969), Optimal Regulation of Nonlinear Dynamical Systems, SIAM
J. Control, Vol. 7, No. 1, pp. 75-100.
Saridis, G.N. and Balaram, J. (1986), Suboptimal Control for Nonlinear System,
Control Theory and Advanced Technology, Vol. 2, No. 3, pp. 547-562.
W.Q. Liu
School of Computing, Curtin University of Technology, WA 6102,
Australia. Email: [email protected]
1 INTRODUCTION
addition, we also investigate this result further when the orthogonal condition
defined in this paper is satisfied.
The paper is organized as follows: In section 2, we will present some pre-
liminary results. The main result will be given in section 3 and conclusions are
given in section 4.
2 PRELIMINARIES
2. The finite dynamic modes of the system (2.1) are the finite eigenvalues of
(E,A).
3. If all the finite dynamic modes lie in the open left half plane, then the
system (2.1) is said to be stable.
Lemma 2.1 Dai (1989) The regular pair (E, A) is impulse-free if and only if
Lemma 2.2 Dai (1989) The triple (E, A, C) is finite dynamics detectable and
impulse observable if and only if there exists a constant matrix L such that
(E, A + LC) is stable and impulse-free, or equivalently, admissible.
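Impulse-freeness of a regular pair (E, A) can be tested numerically through one standard equivalent criterion from Dai (1989): deg det(sE − A) = rank E. The sketch below recovers the degree by sampling det(sE − A) and fitting a polynomial; the 2×2 pairs are illustrative examples of my own.

```python
# Numerical impulse-freeness test: deg det(sE - A) == rank E.
import numpy as np

def charpoly_degree(E, A, tol=1e-8):
    n = E.shape[0]
    s = np.linspace(1.0, 2.0, n + 1)                 # n+1 sample points
    d = [np.linalg.det(si * E - A) for si in s]
    coeffs = np.polyfit(s, d, n)                     # highest power first
    nz = np.nonzero(np.abs(coeffs) > tol)[0]
    return n - nz[0] if nz.size else 0

E = np.array([[1.0, 0.0], [0.0, 0.0]])
A_free = np.array([[0.0, 1.0], [1.0, 1.0]])          # det(sE - A) = -s - 1
A_imp = np.array([[0.0, 1.0], [1.0, 0.0]])           # det(sE - A) = -1

rankE = np.linalg.matrix_rank(E)
```

Here (E, A_free) satisfies the criterion (degree 1 equals rank E) while (E, A_imp) does not, so the latter pair has impulsive modes.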
For the descriptor system (2.1), many researchers have devoted their efforts
to the output feedback H∞ control problems Takaba et al (1994); Masubuchi
et al (1997); Wang et al (1998). One important motivation is that the output
feedback can be realized easily. However, the corresponding solutions to H∞
control problems are much more complicated compared to the case of state
feedback control. Since the state of a system contains all the essential infor-
mation for the system, a controller based on state feedback can lead to more
effective control. In particular, the corresponding solution to the H∞ control
problems may become much more explicit Gao et al (1999); Wang et al (1998).
The problem with state feedback control is that not all the state variables are
available in practice, so we will choose to use a state observer in this paper
and investigate the corresponding state-feedback H∞ control problem for the
descriptor system within this framework. This problem lies between the output
feedback and static feedback control. As can be seen in the sequel, it is not a
trivial special case of output feedback control. By solving this problem for
singular systems, one can visualize the difference between descriptor
systems and normal systems more clearly.
As in the Full Information case Doyle et al (1989) for linear time-invariant
systems, it is also assumed here that the exogenous input signal w is always
available.
The next lemma gives a result on the design of state observer for singular
systems.
H∞ CONTROL BASED ON STATE OBSERVER FOR DESCRIPTOR SYSTEMS 485
Lemma 2.3 Dai (1989) Assume that (E, A, C2) is finite dynamics detectable.
Then the following dynamic system is a state observer for the system (2.1)
where the matrix L is such that (E, A + LC2) is stable and impulse-free.
In this paper, a controller based on the observer (2.2) is assumed to be in the
following form.
Remark 2.1 It should be noted that the state feedback controller given above
is different from the output feedback controller given by Masubuchi et al (1997).
In the controller (2.3), there are only two feedback parameters rather than
the four parameters present in (2.4). It can be seen later in this paper that the
results based on the controller (2.3) are much more explicit.
The next result is the basic result for H∞ control of singular systems. It
gives a necessary and sufficient condition for a singular system to be H∞ norm
bounded.
3 MAIN RESULTS
In order to consider the H∞ problem for the descriptor system (2.1), the fol-
lowing assumptions are made.
(A1) (E, A, C2) is finite dynamics detectable and impulse observable.
(A2) rank D12 = k.
The next lemma is an important result for the proof of our main result.
Lemma 3.1 Assume that all the following matrices have appropriate dimen-
sions. Let
Ψ(X : A, B, C) = A^T X + X^T A + C^T C + γ^{-2} X^T B B^T X.
Then
Ψ(X : A + B2 K, B1, C + D K) = Ψ(X : A, B1, B2, C, D)
  + { D [ K + (D^T D)^{-1} (B2^T X + D^T C) ] }^T { D [ K + (D^T D)^{-1} (B2^T X + D^T C) ] }
Proof
Ψ(X : A + B2 K, B1, C + D K) = Ψ(X : A, B1, B2, C, D) + Π
where
Π = K^T B2^T X + X^T B2 K + C^T D K + K^T D^T C + K^T D^T D K
    + (X^T B2 + C^T D)(D^T D)^{-1}(B2^T X + D^T C)
  = [ K^T + (X^T B2 + C^T D)(D^T D)^{-1} ](D^T D) K + K^T (B2^T X + D^T C)
    + (X^T B2 + C^T D)(D^T D)^{-1}(B2^T X + D^T C)
Therefore,
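The completion-of-squares identity of Lemma 3.1 can be checked numerically on random matrices; the dimensions and the value of γ below are arbitrary test choices, and the five-argument Ψ is built as the three-argument Ψ minus the completed-square correction, as in the lemma.

```python
# Numerical check of the completion-of-squares identity in Lemma 3.1.
import numpy as np

rng = np.random.default_rng(7)
n, m1, m2, p = 4, 2, 2, 3
gamma = 2.0
A = rng.standard_normal((n, n)); X = rng.standard_normal((n, n))
B1 = rng.standard_normal((n, m1)); B2 = rng.standard_normal((n, m2))
C = rng.standard_normal((p, n)); D = rng.standard_normal((p, m2))
K = rng.standard_normal((m2, n))

def Psi3(X, A, B, C):
    """Psi(X : A, B, C) = A^T X + X^T A + C^T C + gamma^-2 X^T B B^T X."""
    return A.T @ X + X.T @ A + C.T @ C + gamma ** -2 * X.T @ B @ B.T @ X

W_inv = np.linalg.inv(D.T @ D)
M = B2.T @ X + D.T @ C                       # recurring block B2^T X + D^T C
Psi5 = Psi3(X, A, B1, C) - M.T @ W_inv @ M   # five-argument Psi

lhs = Psi3(X, A + B2 @ K, B1, C + D @ K)
S = D @ (K + W_inv @ M)                      # D[K + (D^T D)^-1 (B2^T X + D^T C)]
rhs = Psi5 + S.T @ S
```

Agreement of `lhs` and `rhs` to machine precision confirms the algebra carried out in the proof above.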
Theorem 3.1 Suppose that assumptions (A1) and (A2) hold. Then for the
descriptor system (2.1), the following statements are equivalent.
(i) There exists a state-feedback controller (2.3) such that the resulting closed-
loop system is admissible and the H∞-norm of the transfer function matrix
T_zw(s) from w to z is strictly less than a prescribed positive number γ,
i.e.,
|| T_zw(s) ||_∞ < γ
with a constraint
E^T X = X^T E ≥ 0
When (ii) holds, the matrix K in the controller (2.3) can be constructed as
and L in the controller (2.3) satisfies the requirement that (E,A + LC2) is
admissible.
Let
Denote
C̄ = [ C1   C1 + D12 K ],
then the closed-loop system (3.5) can be written as
Ē ξ̇ = Ā ξ + B̄ w
z = C̄ ξ
where
From Lemma 2.4, there exists a solution X satisfying the following inequality
E^T X = X^T E ≥ 0
According to Lemma 3.1, one can obtain
With L and K chosen as above in the controller (2.3), the resulting closed-loop
system is the system (3.4) and its transfer function matrix from w to z is
In order to complete the proof, it suffices to prove that the system (3.4) is
admissible. Notice that
and (E,A + LC2) and (E, A + B 2 K ) are stable. Then the closed-loop system
is stable. Further,
Theorem 3.2 Suppose that assumptions (A1) and (A2') hold. Then the fol-
lowing statements are equivalent for system (2.1).
(i) There exists a controller (2.3) such that the resulting closed-loop system
is admissible and
|| T_zw(s) ||_∞ < γ
(ii) There exists a solution X satisfying the following GARI
with constraint
E^T X = X^T E ≥ 0.
When (ii) holds, the matrix K in the controller (2.3) can be constructed
as
K = −B2^T X    (3.12)
and L in the controller (2.3) is such that (E, A + LC2) is admissible.
In this section, the H∞ control problem based on state feedback is investi-
gated and a sufficient and necessary condition in terms of a GARI is obtained.
It should be noted that the GARI obtained here is much simpler than those
obtained in Masubuchi et al (1997) in two ways.
(i) Only one parameter is involved here. This greatly reduces the complex-
ity of solving the GARI.
(ii) Only one GARI is required.
This suggests that state feedback controller design is much easier if
the estimated state information is available. In this case, the algorithm for
solving the GARI with constraints proposed in Masubuchi et al (1997) can also
be significantly simplified. We will not discuss this simplification here since it
is technically trivial.
4 CONCLUSIONS
References
I. V. Konnov
1 INTRODUCTION
monotonicity are constant. Now we consider the case where the parameters (or
weights) are variable, thus extending the results from Konnov (2001); Allevi
et al (2001). Namely, we establish existence results for generalized vector variational
inequalities and for systems of generalized vector variational inequalities
in a topological vector space by employing new relative (pseudo)monotonicity
concepts for set-valued mappings.
Let I be the set of indexes {1, ..., m}. For each s ∈ I, let E_s be a real linear
topological space and U_s be a nonempty subset of E_s. Set
U = Π_{s∈I} U_s.
Let F be a real linear topological space with a partial order induced by a
convex, closed and solid cone C.
Set R^m_+ = { μ ∈ R^m | μ_i > 0, 1 ≤ i ≤ m }.
For each s ∈ I, let G_s : U → 2^{L(E_s, F)} be a mapping so that if we set
(u*_s)_{s∈I} ∈ U such that
Lemma 2.1 Suppose that the set U, defined by (2.1), is convex and that the
mapping G : U → 2^{L(E,F)}, defined by (2.2), is u-hemicontinuous. Then
DGVVI(I, U, G) implies GVVI(I, U, G).
Proposition 2.1 Suppose that the set U, defined by (2.1), is convex and that
the mapping G : U → 2^{L(E,F)}, defined by (2.2), is u-hemicontinuous and pseudo
(α, β)-monotone. Then GVVI(I, U, G), DGVVI(I, U, G), and SGVVI(I, U, G)
are equivalent.
Proposition 3.1 Suppose that the set U, defined by (2.1), is convex and that
γ∘G is u-hemicontinuous and pseudo (α, β)-monotone. Then GVVI(I, U, γ∘G)
is equivalent to SGVVI(I, U, G).
In what follows, we reserve the symbols α and β for the parameters associated
with relative (pseudo)monotonicity.
4 EXISTENCE RESULTS
In this section, in addition to the general assumptions, we shall suppose that the
space L(E, F) is topologized in such a manner that the mapping T : Z × U → F,
defined by T(g, u) = g(u), is continuous whenever Z is a compact subset of L(E, F).
As usual, for each set B ⊆ E, we denote by cl B its closure. First we establish
existence results for SGVVI(I, U, G).
Theorem 4.1 Let U be convex and compact. Suppose that G has nonempty
and compact values and is relatively pseudomonotone, and that α∘G is u-
hemicontinuous. Then SGVVI(I, U, G) is solvable.
and
B(v) = { u ∈ U | Σ_{s∈I} α_s(v) G_s(v)(v_s − u_s) ⊄ −int C }.
We divide the proof into the following three steps.
Let z be in the convex hull of any finite subset {v1, ..., vn} of K. Then z =
Σ_{j=1}^n μ_j v_j for some μ_j ≥ 0, j = 1, ..., n, with Σ_{j=1}^n μ_j = 1. If z ∉ ∪_{j=1}^n B(v_j), then for
all g_s ∈ G_s(z), s ∈ I, we have
DECOMPOSABLE GENERALIZED VECTOR VARIATIONAL INEQUALITIES 503
It follows that
(ii) ∩_{v∈U} A(v) ≠ ∅.
From the relative pseudomonotonicity of G it follows that B(v) ⊆ A(v); moreover, for
each v ∈ U, A(v) is closed. In fact, let {u^θ} be a net in A(v) such that u^θ
converges to ū ∈ U. Then, for each θ, there exist elements g_s^θ ∈ G_s(v), s ∈ I,
such that
g_s ∈ G_s(v) for each s ∈ I. It follows that g_s^θ(u_s^θ) → g_s(ū_s) for each s ∈ I and
we conclude that ū ∈ A(v), i.e. A(v) is closed. Therefore, cl B(v) ⊆ A(v) and (i)
now implies (ii).
Corollary 4.1 Suppose that G has nonempty and compact values and is rel-
atively pseudomonotone, and that α∘G is u-hemicontinuous. Suppose that U
is convex and closed and that there exist a compact subset V of E and a point
v̄ ∈ V ∩ U such that
Σ_{s∈I} β_s(u) G_s(u)(v̄_s − u_s) ⊆ −int C  for all u ∈ U\V.    (4.1)
Proof: In this case it suffices to follow the proof of Theorem 4.1 and observe
that cl B(v̄) ⊆ V under the above assumptions. Indeed, it follows that cl B(v̄) is
compact, hence the assertion of Step (i) will be true due to Proposition 3.2 as
well. □
The proof follows from Corollary 4.1 and the definition of pseudo (α, β)-
monotonicity.
By choosing different topologies, one can specify the above existence results for
less general classes of topological vector spaces. For example, in this section,
we specialize these results for a Banach space setting.
Namely, we suppose that E and F are real Banach spaces and that C is a
convex, closed and solid cone in F. We shall apply the weak topology in E,
the strong topology in F, and the strong operator topology in L(E, F). For
this reason, we need the concept of a completely continuous mapping.
Corollary 5.2 Let all the assumptions of Corollary 5.1 hold and let G be
pseudo (α, β)-monotone and u-hemicontinuous. Then GVVI(I, U, G) is solv-
able.
References
Allevi, E., Gnudi, A., and Konnov, I.V. (2001), Generalized vector variational
inequalities over product sets, Nonlinear Analysis: Theory, Methods & Applications.
1 INTRODUCTION
Suppose that there exists a subset Γ of A and a compact convex subset K of X
such that Γ is closed in X × Y and
Remark 1.1 Both Lemma 1.1 and 1.2 are special cases of Lemma 1.3 with
X = Y = K and Γ = {(x, x) : x ∈ X}.
In this paper, we first give a variation of the geometric lemma of Fan et al.,
and then, by applying the result, we prove an existence theorem of solutions for
the set-valued vector equilibrium problem.
2 PRELIMINARIES
Let us also mention the well-known fixed point theorem of Kakutani (1941).
Theorem 2.1 Let X be a nonempty compact convex subset of some finite di-
mensional space R^n and let the set-valued map F : X → 2^X have closed graph
and nonempty convex values. Then F has a fixed point x* ∈ F(x*).
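A toy one-dimensional illustration of Theorem 2.1 may be useful: the set-valued map F(x) = [max(0, 0.9 − x), min(1, 1.1 − x)] on X = [0, 1] (an arbitrary example of my own) has closed graph and nonempty convex interval values, so the theorem guarantees a fixed point x* ∈ F(x*), and a grid search locates the fixed-point set.

```python
# Grid search for fixed points x in F(x) of a convex-valued map on [0, 1].
import numpy as np

def F(x):
    """Interval-valued map with closed graph and convex values."""
    return (max(0.0, 0.9 - x), min(1.0, 1.1 - x))

grid = np.linspace(0.0, 1.0, 1001)
fixed = [x for x in grid if F(x)[0] <= x <= F(x)[1]]
```

For this map the fixed points fill the interval [0.45, 0.55]; Kakutani's theorem is what guarantees such a point exists before any search is attempted, and it is the tool used repeatedly in the proof below.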
Suppose that
Proof: Suppose to the contrary that for each x ∈ K, there exists y ∈ Y such
that x ∉ A⁻(y). Then, by (a), x ∉ cl A⁻(y') for some y' ∈ Y. Hence we have
K ⊆ ∪_{y∈Y} V_y, where V_y := {z ∈ X : z ∉ cl A⁻(y)}. By the compactness of
K, there exists a finite family {y1, ..., yn} of Y such that K ⊆ ∪_{i=1}^n V_{y_i}. Let
{β1, ..., βn} be a partition of unity on K subordinated to the finite covering
{V_{y_i} : i = 1, ..., n}, that is, β1, ..., βn are nonnegative real-valued continuous
functions on K such that each β_i vanishes on K\V_{y_i}, while Σ_{i=1}^n β_i(x) = 1 for
all x ∈ K. Let P := co {y1, ..., yn} ⊆ Y and define a continuous mapping
p : K → P by setting
Thus, for each i such that β_i(x) > 0, x lies in V_{y_i} ∩ K, so that (x, y_i) ∉ B, and
hence by (b) we have
On the other hand, by (c) there exists a finite subset {x1, ..., xn} of K such
that P ⊆ ∪_{i=1}^n Γ(x_i). Let A := co {x1, ..., xn} ⊆ K. Define a set-valued map
H : A → 2^A by H(x) := co {x_i : p(x) ∈ Γ(x_i)} for each x in A. Then each
H(x) is a nonempty closed convex subset of A. Moreover, H has a closed
GEOMETRIC LEMMA 513
graph in A × A. Indeed, let (v, w) ∈ A × A\Gr(H), i.e. w ∉ H(v). Then there
exists an open neighborhood V of w in A which is disjoint from H(v). Suppose
H(v) = co {x_i : i ∈ J} for some J ⊆ {1, ..., n}. Then p(v) ∉ Γ(x_j) for j ∉ J.
Therefore, by (d), U := p^{-1}( ∩_{j∉J} P\Γ(x_j) ) is an open neighborhood of v in
A. If z ∈ U, then p(z) ∉ Γ(x_j) for j ∉ J, and so H(z) ⊆ H(v). This implies
V ∩ H(z) = ∅ for all z ∈ U. Hence we have an open neighborhood U × V of
(v, w) which does not intersect the graph of H, that is, the graph Gr(H) is
closed.
Applying Kakutani's fixed point theorem, we have a point x̄ ∈ A such
that x̄ ∈ H(x̄). If H(x̄) = co {x_j : j ∈ J0} for some J0 ⊆ {1, ..., n}, then
p(x̄) ∈ Γ(x_j) ⊆ R(x_j) for every j ∈ J0, i.e. x_j ∈ R⁻[p(x̄)] for all j ∈ J0. Since
R⁻[p(x̄)] is convex by (e) and x̄ is a convex combination of {x_j : j ∈ J0}, we
have x̄ ∈ R⁻[p(x̄)]. And so (x̄, p(x̄)) ∈ Gr(R) ⊆ B, contradicting (3.1). The
theorem is proven.
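The partition-of-unity device used in the proof can be made concrete for a simple cover. Below, K = [0, 1] is covered by three open intervals (an assumed example), and the standard distance-quotient construction β_i(x) = dist(x, K\V_i) / Σ_j dist(x, K\V_j) yields continuous, nonnegative functions that vanish off their sets and sum to one.

```python
# Explicit partition of unity subordinated to a finite open cover of [0, 1].
import numpy as np

cover = [(-0.1, 0.5), (0.3, 0.8), (0.6, 1.1)]    # open intervals covering [0, 1]

def dist_to_complement(x, interval):
    a, b = interval
    # distance from x to the complement of (a, b); zero outside the interval
    return max(0.0, min(x - a, b - x))

def betas(x):
    d = np.array([dist_to_complement(x, I) for I in cover])
    return d / d.sum()                           # nonnegative, sums to one

xs = np.linspace(0.0, 1.0, 101)
vals = np.array([betas(x) for x in xs])
```

The mapping p in the proof is then simply p(x) = Σ_i β_i(x) y_i, a continuous map into the polytope co {y1, ..., yn}.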
Remark 3.1 The condition (b) in Theorem 3.1 may be replaced by the equiv-
alent condition (b') below.
to show that x belongs to one of the B⁻(y_i)'s. Assume to the contrary that
x ∉ B⁻(y_i) for all i = 1, ..., n. It follows from (b) that (x, λ1 y1 + ... + λn yn) ∉ B,
which is a contradiction. Therefore, x ∈ ∪_{i=1}^n B⁻(y_i) ∩ K. This establishes the
implication (b) ⇒ (b').
(b') ⇒ (b): let (x, y_i) ∉ B for i = 1, ..., n with x ∈ K and let z = λ1 y1 +
... + λn yn be an arbitrary convex combination of the y_i's. We need to show that
(x, z) ∉ B. Suppose otherwise that (x, z) ∈ B. Then x ∈ K ∩ B⁻(z), and so
by (b') we have x ∈ ∪_{i=1}^n B⁻(y_i) ∩ K. This implies that for some 1 ≤ j ≤ n,
(x, y_j) ∈ B, which is a contradiction. The implication (b') ⇒ (b) is established.
Thus, the theorem below is equivalent to Theorem 3.1.
Suppose that
Based on the result of Theorem 3.1, we are now ready to prove an existence
theorem for the set-valued vector equilibrium problem.
Proof: If we can show that all the conditions (a)-(e) of Theorem 3.1 are
satisfied, then we may invoke the theorem to conclude the existence of an
equilibrium point x0 ∈ K. To this end, we define A := {(x, y) ∈ X × Y :
Φ(x, y) ⊆ P} and B := A. It follows from (i) and (ii) that conditions (a) and (b)
in Theorem 3.1 are satisfied. Also, it is clear that (iii) implies Gr(R) ⊆ B ⊆ A,
(iv) implies (e), and (v) implies (d) of Theorem 3.1. It remains to show that
condition (c) of Theorem 3.1 is satisfied. Indeed, let Q be a polytope of Y. For
each y ∈ Q, by condition (vi) we know that there exists an open neighborhood
N(y) of y such that M(y) := ∩_{v∈N(y)} Γ⁻(v) ≠ ∅. Since Q is compact, there
exists a finite family {y1, ..., yr} ⊆ Y such that Q ⊆ ∪_{i=1}^r N(y_i). With each
M(y_i) ≠ ∅, we may take a point x_i ∈ M(y_i); consequently N(y_i) ⊆ Γ(x_i) and
so Q ⊆ ∪_{i=1}^r Γ(x_i). Thus, condition (c) is satisfied. We can now conclude from
Theorem 3.1 the existence of an equilibrium point x0 ∈ K. The theorem is
proven.
Remark 4.1 Condition (ii) is satisfied when for every x ∈ K the set-valued
map Φ(x, ·) : Y → 2^W has the following P-proper quasiconvexity property
(see Kuroiwa (1996)).
Acknowledgments
This work was supported by the Research Committee of The Hong Kong Polytechnic
University.
References
D.I.M.E.T.
Università di Reggio Calabria
Via Graziella, Loc. Feo di Vito
89100 Reggio Calabria - ITALIA
idone@ing.unirc.it
1 INTRODUCTION
Many equilibrium problems arising from various fields of science may be ex-
pressed, under general conditions, in a unified way:
[C(u)u − F(u, ∇u)][A(u, ∇u) − ψ] = 0
A(u, ∇u) ≤ ψ
u ≥ φ
u = φ on ∂Ω.   (1.1)
It is well known that the Variational Inequality theory, which in general expresses the equilibrium
conditions (1.1), provides a powerful methodology that in recent years
has been improved by studying the connections with the Separation Theory,
Gap Functions, the Lagrangean Theory and Duality, and many related computational procedures.
In the present paper we show that various models of elasto-plastic torsion
satisfy the structure (1.1) and that the equilibrium conditions (1.1) can be
expressed in terms of a Variational Inequality.
Let Ω be an open bounded Lipschitz domain with boundary Γ = ∂Ω; for
the sake of simplicity, we confine ourselves to the case Ω ⊂ ℝ².
Let K be the closed convex nonempty subset of H₀¹(Ω):
where:
(α, λ_1, λ_2) ∈ C* = {(α, λ_1, λ_2) : α, λ_1, λ_2 ∈ L²(Ω), α, λ_1, λ_2 ≥ 0 a.e. in Ω}.
Taking into account that the convex set K satisfies the constraint qualification
condition introduced in Borwein et al. (1991), namely that the "quasi-relative
interior of K is non empty", which replaces the standard Slater condition in the
infinite dimensional case, following Daniele (1999) (see also Maugeri (1998)) it
is possible to show the following Lemma:
Lemma 2.1 There exist (ᾱ, λ̄_1, λ̄_2) ∈ C* such that

∀v ∈ H₀¹(Ω), ∀(α, λ_1, λ_2) ∈ C*. Moreover, L(u, ᾱ, λ̄_1, λ̄_2) = 0.
⟨Lu, v⟩ = a(u, v) − ∫_Ω f v dx, ∀v ∈ H₀¹(Ω).
By means of Lemma 2.1, we can prove the following result:
EQUILIBRIUM PROBLEMS 523
Theorem 2.1 Let u be a solution to problem (2.2) and Lu the operator defined by (2.5). Then u fulfills the conditions:
From L(u, ᾱ, λ̄_1, λ̄_2) = 0 it follows:

⟨Lu − ᾱ − ∂λ̄_1/∂x_1 − ∂λ̄_2/∂x_2, v − u⟩ ≥ 0.
Taking into account (2.7), we find that the solution to the Variational In-
equality (2.2) verifies the conditions (2.6):
(see Brézis (1972), Chiadò et al. (1994), Lanchon (1969), Ting (1969) for similar
results).
This is a particular case of the general scheme (1.1).
In Brézis et al. (1977) the authors point out the convenience of studying
problems of the general scheme (1.1) in a convex set in which an upper
bound for the gradient of u is given:
This case is studied in Idone et al. (2002), in which a similar characterization of (1.1) is proved. In particular the convex set

K̄ = {v ∈ H₀¹(Ω) : v ≥ 0, v(x) ≤ δ(x)},  δ(x) = dist(x, ∂Ω),

is considered and, under further assumptions, the following characterization is
shown:

(L(u) − f − μ̄)(u(x) − δ(x)) = 0
L(u) − f − μ̄ ≤ 0
u(x) ≤ δ(x).
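A minimal numerical sketch of this obstacle-type characterization, in a one-dimensional analogue (not taken from the paper): we solve −u'' = f on Ω = (0, 1) with u = 0 on ∂Ω, under the constraint u(x) ≤ δ(x) = dist(x, ∂Ω), by projected Gauss-Seidel. The grid size, the load f and the tolerances are illustrative choices.

```python
# Illustrative 1-D obstacle problem: -u'' = f, u(0) = u(1) = 0, u <= delta.
n = 41                          # grid points, including the boundary
h = 1.0 / (n - 1)
f = 20.0                        # constant load (illustrative)
xs = [i * h for i in range(n)]
delta = [min(xi, 1.0 - xi) for xi in xs]   # obstacle: distance to boundary
u = [0.0] * n                   # homogeneous boundary values

for sweep in range(20000):
    change = 0.0
    for i in range(1, n - 1):
        # unconstrained Gauss-Seidel update for -u'' = f ...
        unew = 0.5 * (u[i - 1] + u[i + 1] + h * h * f)
        unew = min(unew, delta[i])         # ... projected onto u <= delta
        change = max(change, abs(unew - u[i]))
        u[i] = unew
    if change < 1e-12:
        break

# complementarity-type conditions of the characterization:
# u <= delta everywhere, and -u'' - f = 0 wherever the constraint is inactive
assert all(u[i] <= delta[i] + 1e-12 for i in range(n))
for i in range(1, n - 1):
    if u[i] < delta[i] - 1e-8:
        res = -(u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f
        assert abs(res) < 1e-6
```

The projection step enforces feasibility at every sweep, so the discrete equation holds exactly only on the inactive set, mirroring the complementarity condition above.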
It is also possible to prove

Lu − div(λ̄ grad u) = 0.
References
Borwein, J.M. and Lewis, A.S. (1991), Practical conditions for Fenchel duality
in Infinite Dimension, Pitman Research Notes in Mathematics Series, 252,
pp. 83-89.
Brézis, H. (1972), Multiplicateur de Lagrange en torsion élasto-plastique, Arch.
Rational Mech. Anal., 49, pp. 32-40.
Brézis, H. and Stampacchia, G. (1977), Remarks on some fourth order variational inequality, Ann. Scuola Norm. Sup. Pisa (4), pp. 363-371.
Brézis, H. (1972), Problèmes Unilatéraux, J. Math. Pures et Appl., 51, pp. 1-168.
Chiadò, V. and Percivale, D. (1994), Generalized Lagrange Multipliers in Elasto-plastic torsion, Journal of Differential Equations, 114, pp. 570-579.
Daniele, P. (1999), Lagrangean function for dynamic Variational Inequalities,
Rendiconti del Circolo Matematico di Palermo, 58, pp. 101-119.
Idone, G., Variational inequalities and application to a continuum model of
transportation network with capacity constraints, to appear.
Idone, G., Maugeri, A. and Vitanza, C. (2002), Equilibrium problems in Elastic-Plastic Torsion, Boundary Elements 24th, Brebbia C.A., Tadeu A., Popov
V. Eds., WIT Press, Southampton, Boston, pp. 611-616.
REFERENCES 527
1 INTRODUCTION
The gap function approach for Variational Inequalities (VI for short) has
made it possible to develop a wide class of descent methods for solving the classic VI
defined by the following problem:
find y* ∈ K s.t. ⟨F(y*), x − y*⟩ ≥ 0, ∀x ∈ K,

where F : ℝⁿ → ℝⁿ, and p : ℝⁿ → ℝ is a non-negative function on K
such that p(y) = 0 with y ∈ K if and only if y is a solution of VI. Therefore
solving a VI is equivalent to the (global) minimization of the gap function on
K.
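As a concrete illustration of this definition, the sketch below evaluates the classical gap function p(y) = max_{x∈K} ⟨F(y), y − x⟩ (often attributed to Auslender), which on a box K admits a closed-form maximizer. The operator F, the box bounds and the test points are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: the classical gap function on a box K = [lo, hi]^n.
def F(y):
    # illustrative strongly monotone affine operator; since F vanishes at
    # (0.3, 0.7), which lies in K, that point solves the VI
    return [y[0] - 0.3, y[1] - 0.7]

def gap(y, lo=0.0, hi=1.0):
    g = F(y)
    total = 0.0
    for yi, gi in zip(y, g):
        # max over x_i in [lo, hi] of gi * (yi - x_i):
        # take x_i = lo if gi > 0, else x_i = hi
        x_best = lo if gi > 0 else hi
        total += gi * (yi - x_best)
    return total

assert abs(gap([0.3, 0.7])) < 1e-12      # p vanishes at the VI solution
assert gap([0.9, 0.1]) > 0.0             # and is positive elsewhere on K
```

This p is non-negative on K and vanishes exactly at solutions, but it is in general nondifferentiable, which motivates the regularized gap functions discussed next.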
In recent years the efforts of scholars have been directed to the study
of differentiable gap functions, in order to simplify the computational aspects of
the problem. See Harker et al. (1990) for a survey on the theory and algorithms
developed for VI.
The problem of defining a continuously differentiable gap function was first
solved by Fukushima (1992), whose approach was generalized by Zhu et al.
(1994); they proved that

where x = x(t), t ≥ 0.
Moreover some algorithmic applications have been developed in the field of
bundle methods for solving VI (see e.g. Lemaréchal et al. (1995)).
In this paper, we will deepen the analysis of descent methods for VI*, initiated by Mastroeni (1999). In particular, we will define an inexact line-search
algorithm for the minimization of a gap function associated to the problem
VI*.
In Section 2 we will recall the main properties of the gap functions related
to VI*. In Section 3 we will develop an inexact descent method for VI*, in the
hypothesis of strong monotonicity of the operator F. Section 4 will be devoted
to a brief outline of the applications of the Minty Variational Inequality and to the
recently introduced extension to the vector case (Giannessi (1998)).
We recall the main notations and definitions that will be used in the sequel.
A function f : ℝⁿ → ℝ is said to be quasi-convex on the convex set K iff:

f(λx_1 + (1 − λ)x_2) ≤ max{f(x_1), f(x_2)}, ∀x_1, x_2 ∈ K, ∀λ ∈ [0, 1].   (1.1)

A function f : K → ℝ is said to be strictly quasi-convex iff strict inequality holds
in (1.1), for every x_1 ≠ x_2 and every λ ∈ (0, 1).
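A quick numeric illustration of these definitions: f(x) = √|x| is strictly quasi-convex on ℝ (hence in particular quasi-convex) although it is not convex. The sample grid below is an arbitrary illustrative choice.

```python
# Hedged numeric check of the (strict) quasi-convexity inequality (1.1)
# for f(x) = sqrt(|x|) on a sample grid of R.
import math

def f(x):
    return math.sqrt(abs(x))

pts = [i / 10.0 for i in range(-30, 31)]      # grid on [-3, 3]
lams = [j / 10.0 for j in range(1, 10)]       # lambda in (0, 1)

for x1 in pts:
    for x2 in pts:
        if x1 == x2:
            continue
        for lam in lams:
            z = lam * x1 + (1 - lam) * x2
            # strict form of (1.1) for x1 != x2 and lam in (0, 1)
            assert f(z) < max(f(x1), f(x2)) + 1e-12

# f is not convex: on the chord from 0 to 1, f(0.25) = 0.5 > 0.25
assert f(0.25) > 0.75 * f(0.0) + 0.25 * f(1.0) + 1e-6
```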
This last definition was given by Ponstein (1967). Different definitions
of strict quasi-convexity can be found in the literature (see e.g. Karamardian
(1967)); for a deeper analysis of this topic see Avriel et al. (1981) and references
therein.
A strictly quasi-convex function has the following properties (Thomson et
al. (1973)):
(i) f is quasi-convex on K,
In this section, we will briefly recall the main results concerning the gap function
theory for VI* (Mastroeni (1999)). Following the analysis developed for the
classic VI, we introduce the gap function associated to VI*.
H(x, x) = 0, ∀x ∈ K; (2.1)
Proposition 2.1 Let K be a convex set in ℝⁿ. Suppose that H : ℝⁿ × ℝⁿ → ℝ
is a non-negative, differentiable function on K that fulfils (2.1) and (2.2), and
that F : ℝⁿ → ℝⁿ is a differentiable and pseudomonotone operator on K.
Then

⟨∇_y[q(x*, x*) + H(x*, x*)], y − x*⟩ ≥ 0, ∀y ∈ K,
Since ∇_y q(x, y) = F(y) + ∇F(y)(y − x), then ∇_y q(x*, x*) = F(x*), which
implies that x* is a solution of VI. By the pseudomonotonicity of F, we obtain
that x* is also a solution of VI*.
Now suppose that x* is a solution of VI*. Since H(x, y) is non-negative, we
have that

⟨F(y), y − x*⟩ + H(x*, y) ≥ 0, ∀y ∈ K,

which is equivalent to the condition
h(x) = − inf_{y∈K} ψ(x, y)
Since ψ(x, y) is strictly quasi-convex with respect to y, there exists a
unique minimum point y(x) of problem (2.3). Applying Theorem 4.3.3 of
Bank et al. (1983) (see the Appendix), we obtain that y(x) is u.s.c. according
GAP FUNCTIONS 535
From the continuity of F, y(x) and ∇_x H, it follows that h′(x) is continuous at
x, so that h is continuously differentiable and
In the previous section we have shown that, under suitable assumptions on the
operator F and the function H, the gap function associated to the variational
inequality VI*:
is continuously differentiable on K.
This remarkable property allows us to define descent direction methods for
solving the problem

min_{x∈K} h(x).
3. F is a continuously differentiable operator on an open set A ⊃ K;

where ψ : ℝⁿ → ℝ is nonnegative, continuously differentiable and such that
ψ(0) = 0.
Lemma 3.1 Suppose that hypotheses 1-4 hold and, furthermore, ∇F(y) is
a positive definite matrix, ∀y ∈ K. Let y(x) be the solution of P(x). Then x*
is a solution of VI* iff x* = y(x*).
The next result proves that y(x) − x provides a descent direction for h at the
point x, when x ≠ x*.
Proposition 3.1 Suppose that hypotheses 1-4 hold and F is strongly monotone
on K (with modulus μ > 0). Let y(x) be the solution of the problem P(x) and
d(x) := y(x) − x. Then
The following exact line search algorithm has been proposed by Mastroeni
(1999):
Algorithm 1
Theorem 3.1 Suppose that hypotheses 1-4 hold and ∇F(y) is positive definite, ∀y ∈ K. Then, for any x_0 ∈ K, the sequence {x_k} defined by Algorithm 1
belongs to the set K and converges to the solution of the variational inequality
VI*.
Proof: Since ∇F(y) is positive definite ∀y ∈ K, and F is continuously differentiable, then F is a strictly monotone operator (Ortega et al. (1970), Theorem
5.4.3) and therefore both problems VI and VI* have the same unique solution.
The convexity of K implies that the sequence {x_k} ⊂ K since t_k ∈ [0, 1]. It
is proved in Proposition 2.2 that the function y(x) is continuous, which
implies the continuity of d(x). It is known (see e.g. Minoux (1986), Theorem
3.1) that the map
Algorithm 2
Step 3. Let d_k := y(x_k) − x_k. Select the smallest nonnegative integer m such
that

h(x_k) − h(x_k + β^m d_k) ≥ αβ^m ||d_k||²,

set α_k = β^m and x_{k+1} = x_k + α_k d_k.
If ||x_{k+1} − x_k|| < ε, then STOP; otherwise let k = k + 1 and go to Step 2.
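The steps of Algorithm 2 can be sketched on a one-dimensional instance in which the inner problem P(x) is solvable in closed form. Everything concrete here is an illustrative assumption: the set K = [0, 2], the strongly monotone operator F(y) = 2(y − 1) (whose unique VI/VI* solution is y* = 1), the term H(x, y) = (ALPHA/2)(y − x)², and the constants SIGMA (playing the role of the Armijo constant), BETA (the backtracking base) and EPS.

```python
# Hedged 1-D sketch of the inexact (Armijo-type) descent of Algorithm 2.
ALPHA, SIGMA, BETA, EPS = 1.0, 0.1, 0.5, 1e-8

def F(y):
    return 2.0 * (y - 1.0)

def psi(x, y):
    # the function whose infimum over K defines the gap function
    return F(y) * (y - x) + 0.5 * ALPHA * (y - x) ** 2

def y_of(x):
    # closed-form minimizer of psi(x, .) over K = [0, 2]
    y = (2.0 + (2.0 + ALPHA) * x) / (4.0 + ALPHA)
    return min(max(y, 0.0), 2.0)

def h(x):
    return -psi(x, y_of(x))    # gap function: h >= 0, h(y*) = 0

x = 0.0                        # starting point x0 in K
for k in range(1000):
    d = y_of(x) - x            # descent direction of Step 3
    m = 0                      # smallest m fulfilling the Armijo rule
    while h(x) - h(x + BETA ** m * d) < SIGMA * BETA ** m * d * d:
        m += 1
    x_new = x + BETA ** m * d
    if abs(x_new - x) < EPS:   # stopping test of Step 3
        x = x_new
        break
    x = x_new

assert abs(x - 1.0) < 1e-6     # iterates converge to the VI* solution
```

On this instance the Armijo test is satisfied at m = 0 in every iteration, so the method reduces to a fixed damped step; the backtracking loop matters when curvature varies along the path.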
Theorem 3.2 Suppose that hypotheses 1-4 hold, F is a strongly monotone
operator on K with modulus μ, α < μ/2, and {x_k} is the sequence defined in
Algorithm 2. Then, for any x_0 ∈ K, the sequence {x_k} belongs to the set
K and converges to the solution of the variational inequality VI*.
If α_k ≥ β^{m_0} > 0 for some m_0 and all k ≥ k̄ ∈ ℕ, then ||d(x_k)|| → 0, so that
y(x*) = x*.
Otherwise suppose that there exists a subsequence {α_{k_l}} ⊆ {α_k} such that
α_{k_l} → 0. By the line search rule we have that

Taking the limit in (3.6) for k → ∞, since α_{k_l} → 0 and h is continuously
differentiable, we obtain

−⟨∇h(x*), d*⟩ ≥ μ||d*||².
Besides the already mentioned equivalence with the classic VI, the Minty variational inequality enjoys some peculiar properties that justify the interest in the
development of this analysis. We will briefly recall some applications in the
field of optimization problems and in the theory of dynamical systems. Finally,
we will outline the recently introduced extension to the vector case (Giannessi
(1998)).
Consider the problem

min f(x), s.t. x ∈ K,   (4.1)
then x* is stable.
Giannessi (1998) has extended the analysis of VI* to the vector case and has
obtained a first order optimality condition for a Pareto solution of the vector
optimization problem:
(VVI*)
5 CONCLUDING REMARKS
We have shown that the gap function theory developed for the classic VI, introduced by Stampacchia, can be extended, under suitable further assumptions, to
the Minty Variational Inequality. These extensions concern not only the
theoretical point of view but also the algorithmic one: under strict or
strong monotonicity assumptions on the operator F, exact or inexact descent
methods, respectively, can be defined for VI* following the lines developed for
VI.
It would be of interest to analyse the relationships between the class of gap
functions associated to VI and the one associated to VI* in the hypothesis of
pseudomonotonicity of the operator F, which guarantees the equivalence of the
two problems. This might make it possible to define a resolution method based on the
simultaneous use of both gap functions related to VI and VI*.
6 APPENDIX
In this appendix we recall the main theorems that have been employed in the
proofs of the results stated in the present paper.
GAP FUNCTIONS 543
Theorem 6.1 (Bank et al. (1983)) is concerned with the continuity of the optimal solution map of a parametric optimization problem. Theorem 6.2 (Auslender (1976)) is a generalization of well-known results on the directional differentiability of extremal-value functions. Theorem 6.3 is the Zangwill convergence
theorem for a general algorithm formalized in the form of a multifunction.
Consider the following parametric optimization problem:

v(x) := inf {f(x, y) s.t. y ∈ M(x)},

where f : A × Y → ℝ and M : A → 2^Y. Let ψ : A → 2^Y be the optimal set
mapping.
We report the statement of Auslender (1976). We recall that a function h :
ℝᵖ → ℝ is said to be "directionally differentiable" at the point x* ∈ ℝᵖ in
the direction d iff the following limit exists and is finite:

h′(x*; d) := lim_{t→0⁺} [h(x* + td) − h(x*)] / t.
v(x) := inf_{y∈Y} f(x, y),

1. f is continuous on ℝᵖ × Y;

v′(x; d) = inf_{y∈ψ(x)} ⟨∇_x f(x, y), d⟩.

∇v(x*) = ∇_x f(x*, y(x*)).
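A numeric check of the extremal-value formula above, under the illustrative assumption (not from the paper) f(x, y) = x·y and Y = [−1, 1]: then v(x) = −|x|, the optimal set ψ(0) at x* = 0 is all of Y, and the formula gives v′(0; d) = inf_{y∈Y} y·d = −|d|, even though v is not differentiable at 0.

```python
# Hedged finite-difference check of v'(0; d) = -|d| for v(x) = -|x|,
# the value function of inf_{y in [-1,1]} x*y.
def v(x):
    # the infimum of x*y over y in [-1, 1] is attained at y = -sign(x)
    return -abs(x)

def dir_deriv(g, x, d, t=1e-8):
    # one-sided difference quotient approximating the limit above
    return (g(x + t * d) - g(x)) / t

for d in (1.0, -2.0, 0.5):
    assert abs(dir_deriv(v, 0.0, d) - (-abs(d))) < 1e-6
```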
A : X → 2^X.

1. x ∉ M implies z(y) < z(x), ∀y ∈ A(x),
References
1 INTRODUCTION
Note that the class of P_0 functions includes the class of monotone functions.
Applications of NCP can be found in many important fields such as mathematical programming, economics, engineering and mechanics (see, e.g., Cottle
et al. (1992); Harker and Pang (1990)).
There exist several methods for the solution of the complementarity problem.
In this paper we consider regularization methods, which are designed to
handle ill-posed problems. Very roughly speaking, an ill-posed problem may
be difficult to solve since small errors in the computations can lead to a totally
wrong solution.
For the class of P_0 functions, Facchinei and Kanzow (1999) considered the
Tikhonov regularization; this scheme consists of solving a sequence of complementarity problems NCP(F_k), where F_k(x) := F(x) + c_k x and c_k is a positive
parameter converging to 0. Yamashita et al. (1999) considered the proximal
point algorithm, proposed by Martinet (1970) and further studied by Rockafellar (1976). For the NCP(F), given the current point x^k, the proximal point
algorithm produces the next iterate by approximately solving the subproblem
NCP(F_k), where F_k(x) := F(x) + c_k(x − x^k) and c_k is a positive parameter
that does not necessarily converge to 0. In the case above, if F is a P_0 function,
then F_k is a P function, that is, for any x, y ∈ ℝⁿ with x ≠ y,

max_{1≤i≤n} (x_i − y_i)[(F_k)_i(x) − (F_k)_i(y)] > 0.
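The two regularization schemes just described can be contrasted on the one-dimensional NCP with F(x) = x − 1 (monotone, hence P_0), whose unique solution is x* = 1. Both subproblems admit closed-form solutions here; F, the parameter schedules and iteration counts are illustrative choices, not from the paper.

```python
# Hedged 1-D comparison of Tikhonov vs. proximal point regularization.
def F(x):
    return x - 1.0

# Tikhonov: solve NCP(F + c_k * id) exactly; here (1 + c_k) x = 1,
# and c_k -> 0 is required.
x_tik = None
for k in range(1, 30):
    c_k = 2.0 ** (-k)
    x_tik = 1.0 / (1.0 + c_k)
assert abs(x_tik - 1.0) < 1e-6

# Proximal point: solve NCP(F + c (. - x_k)) exactly at each step; here
# (1 + c) x = 1 + c x_k, and c need not tend to 0.
c, x_prox = 1.0, 0.0
for k in range(60):
    x_prox = (1.0 + c * x_prox) / (1.0 + c)
assert abs(x_prox - 1.0) < 1e-6

# In both cases the regularized map is a P function:
# (x - y)(F_k(x) - F_k(y)) = (1 + c)(x - y)^2 > 0 for x != y.
```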
2 PRELIMINARIES
In this section we review some basic definitions and properties which will be
used in the subsequent analysis.
We first restate the basic definition.
(b) (Moré and Rheinboldt (1973), Theorem 5.2) If F′(x) is a P-matrix for
every x ∈ ℝⁿ, then F is a P function;
The following theorem is a version of the mountain pass theorem (see Palais
and Terng (1988)) that will be used to establish a global convergence theorem
for the proposed algorithm.
A NEW CLASS OF PROXIMAL ALGORITHMS 553
m := min_{x∈∂S} f(x).

Assume further that there are two points a ∈ S and b ∉ S such that f(a) < m
and f(b) < m. Then there exists a point c ∈ ℝⁿ such that ∇f(c) = 0 and
f(c) ≥ m.
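The mountain pass statement above can be illustrated numerically with the double-well f(x) = (x² − 1)² and S = [−1.5, −0.5]; both f and S are illustrative choices, not taken from the paper.

```python
# Hedged 1-D illustration of the mountain pass theorem.
def f(x):
    return (x * x - 1.0) ** 2

def fprime(x):
    return 4.0 * x * (x * x - 1.0)

m = min(f(-1.5), f(-0.5))      # min of f on the boundary of S = [-1.5, -0.5]
a, b = -1.0, 1.0               # a in S, b outside S, both below the "pass"
assert f(a) < m and f(b) < m

c = 0.0                        # the critical point predicted by the theorem
assert fprime(c) == 0.0 and f(c) >= m
```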
NCP(F) using the Fischer-Burmeister function (see Fischer (1992)), φ :
ℝ² → ℝ, defined by

φ(a, b) = √(a² + b²) − a − b.

The most fundamental property of this function is that

φ(a, b) = 0 ⟺ a ≥ 0, b ≥ 0, ab = 0.

Ψ(x) = ½||Φ(x)||².
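The sketch below evaluates φ and the merit function Ψ, assuming (as is standard for this reformulation) that Φ applies φ componentwise to the pairs (x_i, F_i(x)); the affine map F and the test points are illustrative assumptions.

```python
# Hedged sketch of the Fischer-Burmeister function and its merit function.
import math

def phi(a, b):
    return math.sqrt(a * a + b * b) - a - b

def F(x):
    # illustrative monotone affine map; x* = (1, 0) solves NCP(F):
    # F(x*) = (0, 1), so x* >= 0, F(x*) >= 0, <x*, F(x*)> = 0
    return [x[0] - 1.0, x[1] + 1.0]

def Psi(x):
    # Psi(x) = (1/2) ||Phi(x)||^2 with Phi applied componentwise
    return 0.5 * sum(phi(xi, fi) ** 2 for xi, fi in zip(x, F(x)))

# phi(a, b) = 0 exactly when a >= 0, b >= 0 and a*b = 0
assert phi(2.0, 0.0) == 0.0 and phi(0.0, 3.0) == 0.0
assert phi(-1.0, 2.0) != 0.0 and phi(1.0, 1.0) != 0.0

assert Psi([1.0, 0.0]) < 1e-15          # zero at the NCP solution
assert Psi([0.5, 0.5]) > 0.0            # positive elsewhere
```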
For the regularized problem, we define the corresponding operator and the
corresponding merit function similarly as
and
z_j if j ∉ J.
where j is one of the indices for which the max is attained, that is independent of l. Since {y^l} is bounded, by continuity of F_j it follows
that {F_j(y^l)} is bounded. Therefore, z_j^l[F_j(y^l)] ≥ 0 implies
that {φ(z_j^l)} does not tend to −∞. This, in turn, implies z_j^l → ∞ ⟹
F_j(z^l) + c_k(x_j^k)^{−r}(z_j^l − x_j^k) → ∞. By Lemma 3.1 we have that
We are now in a position to prove the following existence and uniqueness result.
in turn, implies, also using Proposition 3.1, item 2, that the global minimum
is a stationary point of Ψ_k. However, F_k is a
P function; in particular, F_k itself is a P_0 function, so that the global minimum must be a solution of NCP(F_k), due to Proposition 3.1, item 3.
Step 2: Choose c_{k+1} ∈ (0, c_k) and δ_{k+1} ∈ (0, δ_k). Set k := k + 1, and go to
Step 1.
Algorithm 4.1 is well defined since, by Theorem 3.1, the NCP(F_k) has
a unique solution; therefore, as Ψ_k is continuous, given δ_k > 0, there exists
x^{k+1} ∈ ℝⁿ₊₊ in a neighborhood of the unique solution of the NCP(F_k) such
that Ψ_k(x^{k+1}) ≤ δ_k.
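A simplified one-dimensional run in the spirit of Algorithm 4.1: at each outer step we solve NCP(F_k) for the unweighted proximal regularization F_k(x) = F(x) + c_k(x − x_k) described in the introduction (a simplifying assumption), check the inexactness test Ψ_k(x^{k+1}) ≤ δ_k, and shrink c_k and δ_k. F(x) = x − 1 and all parameter schedules are illustrative choices.

```python
# Hedged 1-D run of a proximal scheme with Fischer-Burmeister stopping test.
import math

def F(x):
    return x - 1.0

def phi(a, b):                     # Fischer-Burmeister function
    return math.sqrt(a * a + b * b) - a - b

x_k, c_k, delta_k = 0.5, 1.0, 0.1
for k in range(60):
    # Step 1: solve NCP(F_k) -- here exactly, in closed form, since the
    # interior solution of (1 + c_k) x = 1 + c_k x_k stays positive
    x_next = (1.0 + c_k * x_k) / (1.0 + c_k)
    Fk_val = F(x_next) + c_k * (x_next - x_k)
    Psi_k = 0.5 * phi(x_next, Fk_val) ** 2
    assert Psi_k <= delta_k        # the inexactness test of Step 1
    # Step 2: shrink the parameters and iterate
    x_k, c_k, delta_k = x_next, 0.9 * c_k, 0.9 * delta_k

assert abs(x_k - 1.0) < 1e-6       # iterates approach the NCP solution x* = 1
```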
solution of NCP(F). The sequence {c_k} satisfies the following conditions:

(A) c_k(x^k)^{−r}(x^{k+1} − x^k) → 0 if {x^k} is bounded;
Remark 4.2 The conditions (A) and (B) can be verified if we define c_k =
for φ ∈ (0, 1). In this case c_k → 0.
Lemma 4.1 Suppose that condition (B) holds. Let S ⊂ ℝⁿ be an arbitrary
compact set. If {x^k} is unbounded, then for any ε > 0, there exists a sufficiently
large k_0 such that for all k ≥ k_0

where e = (1, . . . , 1)ᵀ ∈ ℝⁿ. Therefore, by condition (B) and the fact that S
is compact, we have ||F_k(x) − F(x)|| → 0, since r ≥ 1 and x ∈ S. Now, for any
a, b, c ∈ ℝ we have that
Applying the result above with a = x_i, b = F_i(x) and c = c_k(x_i^k)^{−r}(x_i − x_i^k),
we have that

|φ(x_i, (F_k)_i(x)) − φ(x_i, F_i(x))| ≤ 2|c_k(x_i^k)^{−r}(x_i − x_i^k)| → 0.
Since S is compact, F_k converges uniformly to F on S; furthermore, φ, F
and F_k are continuous, so we have that for all i
The following result is our main convergence theorem for Algorithm 4.1.
Theorem 4.1 Suppose that F is a P_0 function and assume that the solution
set S* of NCP(F) is nonempty and bounded.
Proof: First we show that {x^k} is bounded. Suppose that the sequence {x^k}
is not bounded. Then there exists a subsequence {x^k}_{k∈K} such that
||x^k|| → ∞
as k → ∞ with k ∈ K. Since S* is bounded, there exists a nonempty compact
set S ⊂ ℝⁿ such that S* ⊂ int(S) and x^k ∉ S for all k ∈ K sufficiently large.
If x* ∈ S*, then we have Ψ(x*) = 0. Let

Applying Lemma 4.1 with ε := σ/4, there exists some k_0 such that for all k ≥ k_0

and

m := min_{x∈∂S} Ψ_k(x) > 3σ/4.

Since Ψ_k(x^{k+1}) ≤ δ_k by Step 1 of Algorithm 4.1, there exists some k_1 such
that for all k ≥ k_1,

Ψ_k(x^{k+1}) < σ/4.
Next, we show that any accumulation point of {x^k} is a solution of NCP(F).
Since {x^k} is bounded, we have ||F_k(x^{k+1}) − F(x^{k+1})|| → 0 by condition
(A), and hence |Ψ_k(x^{k+1}) − Ψ(x^{k+1})| → 0. By Step 1 of the algorithm and the
assumption that δ_k → 0, we have Ψ_k(x^{k+1}) → 0. Consequently it holds
that Ψ(x^{k+1}) → 0, which means that every accumulation point of the sequence
{x^k} is a solution of NCP(F).
5 CONCLUSIONS
Acknowledgments
The author Da Silva thanks CAPES/PICDT/UFG for support. The author Oliveira
thanks CNPq for support.
References
Palais, R.S. and Terng, C.L. (1988), Critical point theory and submanifold geom-
etry, Lecture Note in Mathematics, 1353, Springer Verlag, Berlin.
Cottle, R.W., Pang, J.S. and Stone, R.E. (1992), The Linear Complementarity
Problem, Academic Press, New York.
Rapcsák, T. (1997), Smooth Nonlinear Optimization in Rn, Kluwer Academic
Publishers, Dordrecht, Netherlands.
Udriste, C. (1994), Convex Functions and Optimization Methods in Riemannian
Geometry, Kluwer Academic Publishers, Dordrecht, Netherlands.
Bayer, D.A. and Lagarias, J.C. (1989), The Nonlinear Geometry of Linear Pro-
gramming I, Affine and Projective Scaling Trajectories, Transactions of the
American Mathematical Society, Vol. 314, No 2, pp. 499-526.
Bayer, D.A. and Lagarias, J.C. (1989), The Nonlinear Geometry of Linear
Programming II, Legendre Transform Coordinates and Central Trajectories, Transactions of the American Mathematical Society, Vol. 314, No 2,
pp. 527-581.
Cruz Neto, J.X. and Oliveira, P.R. (1995), Geodesic Methods in Riemannian
Manifolds, Preprint, RT 95-10, PESC/COPPE - Federal University of Rio
de Janeiro, Brazil.
Cruz Neto, J.X., Lima, L.L. and Oliveira, P.R. (1998), Geodesic Algorithm in
Riemannian Manifold, Balkan Journal of Geometry and Applications, Vol.
3, No 2, pp. 89-100.
Ferreira, O.P. and Oliveira, P.R. (1998), Subgradient Algorithm on Riemannian
Manifold, Journal of Optimization Theory and Applications, Vol. 97, No 1,
pp. 93-104.
Ferreira, O.P. and Oliveira, P.R. (2002), Proximal Point Algorithm on Rie-
mannian Manifolds, Optimization, Vol. 51, No 2, pp. 257-270.
Gabay, D. (1982), Minimizing a Differentiable Function over a Differential Man-
ifold, Journal of Optimization Theory and Applications, Vol. 37, No 2, pp.
177-219.
Karmarkar, N. (1990), Riemannian Geometry Underlying Interior-Point Meth-
ods for Linear Programming, Contemporary Mathematics, Vol. 114, pp. 51-
75.
Karmarkar, N. (1984), A New Polynomial-Time Algorithm for Linear Program-
ming, Combinatorics, Vol. 4, pp. 373-395.
Nesterov, Y.E. and Todd, M. (2002), On the Riemannian Geometry Defined
by Self-Concordant Barriers and Interior-Point Methods, Preprint.
Dikin, I.I. (1967), Iterative Solution of Problems of Linear and Quadratic Programming, Soviet Mathematics Doklady, Vol. 8, pp. 647-675.
Eggermont, P.P.B. (1990), Multiplicative Iterative Algorithms for Convex Pro-
gramming, Linear Algebra and its Applications, Vol. 130, pp. 25-42.
Oliveira, G.L. and Oliveira, P.R. (2002), A New Class of Interior-Point Methods
for Optimization Under Positivity Constraints, TR PESC/COPPE-UFRJ,
preprint.
Pinto, A.W.M., Oliveira, P.R. and Cruz Neto, J.X. (2002), A New Class of Potential Affine Algorithms for Linear Convex Programming, TR PESC/COPPE-UFRJ, preprint.
Moré, J.J. and Rheinboldt, W.C. (1973), On P- and S-functions and related
classes of n-dimensional nonlinear mappings, Linear Algebra Appl., Vol. 6,
pp. 45-68.
Harker, P.T. and Pang, J.S. (1990), Finite dimensional variational inequality
and nonlinear complementarity problems: A survey of theory, algorithms and
applications, Mathematical Programming, Vol. 48, pp. 161-220.
Rockafellar, R.T. (1976), Monotone operators and the proximal point algo-
rithm, SIAM Journal on Control and Optimization, Vol. 14, pp. 877-898.
Martinet, B. (1970), Régularisation d'inéquations variationnelles par approximations successives, Revue Française d'Informatique et de Recherche Opérationnelle, Vol. 4, pp. 154-159.
Facchinei, F. (1998), Structural and stability properties of P_0 nonlinear complementarity problems, Mathematics of Operations Research, Vol. 23, pp.
735-745.
Facchinei, F. and Kanzow, C. (1999), Beyond Monotonicity in regularization
methods for nonlinear complementarity problems, SIAM Journal on Control
and Optimization, Vol. 37, pp. 1150-1161.
Yamashita, N., Imai, I. and Fukushima, M. (2001), The proximal point algorithm for the P_0 complementarity problem, Complementarity: Algorithms
and Extensions, Edited by Ferris, M.C., Mangasarian, O.L. and Pang, J.S.,
Kluwer Academic Publishers, pp. 361-379.
Kanzow, C. (1996), Global convergence properties of some iterative methods
for linear complementarity problems, SIAM Journal of Optimization, Vol. 6,
pp. 326-341.
Moré, J.J. (1974), Coercivity conditions in nonlinear complementarity problems,
SIAM Rev., Vol. 16, pp. 1-16.
Fischer, A. (1992), A special Newton-type optimization method, Optimization,
Vol. 24, pp. 269-284.