OPTIMIZATION AND CONTROL WITH APPLICATIONS
Applied Optimization
VOLUME 96
Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.
Donald W. Hearn
University of Florida, U.S.A.
OPTIMIZATION AND CONTROL
WITH APPLICATIONS
Edited by
LIQUN QI
The Hong Kong Polytechnic University, Hong Kong
KOK LAY TEO
The Hong Kong Polytechnic University, Hong Kong
XIAOQI YANG
The Hong Kong Polytechnic University, Hong Kong
Springer
Library of Congress Cataloging-in-Publication Data
A C.I.P. record for this book is available from the Library of Congress.
Contents
Preface
1
ON MINIMIZATION OF MAX-MIN FUNCTIONS
A.M. Bagirov and A.M. Rubinov
1 Introduction
2 Special Classes of Max-min Objective Functions
3 Discrete Max-min Functions
4 Optimization Problems with Max-min Constraints
5 Minimization of Continuous Maximum Functions
6 Concluding Remarks
References
2
A COMPARISON OF TWO APPROACHES TO SECOND-ORDER SUBDIFFERENTIABILITY
CONCEPTS WITH APPLICATION TO OPTIMALITY CONDITIONS
A. Eberhard and C.E.M. Pearce
1 Introduction
2 Preliminaries
3 Characterization of Supported Operators
4 Generalized Convexity and Proximal Subderivatives
5 Generalized Convexity and Subjets
6 Subjet, Contingent Cone Inclusions
7 Some Consequences for Optimality Conditions
8 Appendix
References
4
DUALITY FOR SEMI-DEFINITE AND SEMI-INFINITE PROGRAMMING WITH
EQUALITY CONSTRAINTS
S. J. Li, X. Q. Yang and K. L. Teo
1 Introduction and Preliminaries
2 Uniform Duality for Homogeneous (SDSIP)
3 Uniform Duality for Nonhomogeneous (SDSIP)
References
5
THE USE OF NONSMOOTH ANALYSIS AND OF DUALITY METHODS
FOR THE STUDY OF HAMILTON-JACOBI EQUATIONS
Jean-Paul Penot
1 Introduction
2 The Interest of Considering Extended Real-valued Functions
3 Solutions in the sense of Unilateral Analysis
4 Validity of Some Explicit Formulae
5 Uniqueness and Comparison Results
References
6
SOME CLASSES OF ABSTRACT CONVEX FUNCTIONS
A.M. Rubinov and A.P. Shveidel
1 Introduction
2 Sets P_r
3 Supremal Generators of the Sets P_h
4 L_k-subdifferentials
References
Appendix
References
8
AN ANALYSIS OF THE BARZILAI AND BORWEIN GRADIENT METHOD
FOR UNSYMMETRIC LINEAR EQUATIONS
Yu-Hong Dai, Li-Zhi Liao and Duan Li
1 Introduction
2 Case of Identical Eigenvalues
3 Properties of the Recurrence Relation (2.8)
4 Case of Different Eigenvalues
5 Properties of the Recurrence Relation (4.11)
6 Concluding Remarks
References
9
AN EXCHANGE ALGORITHM FOR MINIMIZING SUM-MIN FUNCTIONS
Alexei V. Demyanov
1 Introduction
2 Statement of the Problem
3 Equivalence of the Two Problems
4 Minimality Conditions
5 An Exchange Algorithm
6 An ε-Exchange Algorithm
7 An Application to One Clustering Problem
8 Conclusions
References
10
ON THE BARZILAI-BORWEIN METHOD
Roger Fletcher
1 Introduction
2 The BB Method for Quadratic Functions
3 The BB Method for Non-quadratic Functions
4 Discussion
5 Optimization with Box Constraints
References
11
THE MODIFIED SUBGRADIENT METHOD FOR EQUALITY CONSTRAINED
NONCONVEX OPTIMIZATION PROBLEMS
Rafail N. Gasimov and Nergiz A. Ismayilova
1 Introduction
2 Duality
3 Solving the Dual Problem
References
12
INEXACT RESTORATION METHODS FOR NONLINEAR PROGRAMMING:
ADVANCES AND PERSPECTIVES
José Mario Martínez and Elvio A. Pilotta
1 Introduction
2 Main Inexact Restoration Ideas
3 Definition of an IR Algorithm
4 AGP Optimality Condition
5 Order-Value Optimization
6 Bilevel Programming
7 Homotopy Methods
8 Conclusions
References
13
QUANTUM ALGORITHM FOR CONTINUOUS GLOBAL OPTIMIZATION
V. Protopopescu and J. Barhen
1 Global Optimization Problem
2 Grover's Quantum Algorithm
3 Solution of the Continuous Global Optimization Problem
4 Practical Implementation Considerations
References
14
SQP VERSUS SCP METHODS FOR NONLINEAR PROGRAMMING
Klaus Schittkowski and Christian Zillober
1 Introduction
2 A General Framework
3 SQP Methods
4 SCP Methods
5 Comparative Performance Evaluation
6 Some Academic and Commercial Applications
7 Conclusions
References
15
AN APPROXIMATION APPROACH FOR LINEAR PROGRAMMING IN MEASURE SPACE
C.F. Wen and S.Y. Wu
1 Introduction
2 Solvability of LPM
3 An Approximation Scheme For LPM
4 An Algorithm For (DELPM)
References
References
17
PROXIMAL-LIKE METHODS FOR CONVEX MINIMIZATION PROBLEMS
Christian Kanzow
1 Introduction
2 Proximal-like Methods
3 Numerical Results for Some Optimal Control Problems
4 Final Remarks
References
18
ANALYSIS OF TWO DIMENSIONAL NONCONVEX VARIATIONAL PROBLEMS
René Meziat
1 Introduction
2 The Method of Moments
3 Convex Envelopes
4 Problem Analysis
5 Discrete and Finite Model
6 Examples
7 Concluding Remarks
References
19
STABILITY OF EQUILIBRIUM POINTS OF PROJECTED DYNAMICAL SYSTEMS
Mauro Passacantando
1 Introduction
2 Variational and Dynamical Models
3 Stability Analysis
4 Special Cases
References
20
ON A QUASI-CONSISTENT APPROXIMATIONS APPROACH TO OPTI-
MIZATION PROBLEMS WITH TWO NUMERICAL PRECISION PARAMETERS
Olivier Pironneau and Elijah Polak
1 Introduction
2 An Algorithm Model
References
21
NUMERICAL SOLUTIONS OF OPTIMAL SWITCHING CONTROL PROBLEMS
T. Ruby and V. Rehbock
1 Introduction
2 Problem Formulation
3 Solution Strategy
4 Numerical Examples and Discussion
5 Conclusions
References
22
A SOLUTION TO HAMILTON-JACOBI EQUATION BY NEURAL NETWORKS
AND OPTIMAL STATE FEEDBACK CONTROL
Kiyotaka Shimizu
1 Introduction
2 Nonlinear Optimal Regulator And Hamilton-Jacobi Equation
3 Approximate Solution To Hamilton-Jacobi Equation And Optimal State Feed-
back Control Law
4 Improvement Of Learning Algorithm Of Neural Network
5 Simulation Results
6 Conclusions
References
23
H∞ CONTROL BASED ON STATE OBSERVER FOR DESCRIPTOR SYSTEMS
Wei Xing, Q.L. Zhang, W.Q. Liu and Qiyi Wang
1 Introduction
2 Preliminaries
3 Main Results
4 Conclusions
References
25
ON A GEOMETRIC LEMMA AND SET-VALUED VECTOR EQUILIBRIUM
PROBLEM
Shui-Hung Hou
1 Introduction
2 Preliminaries
3 A Variation of Fan's Geometric Lemma
4 Set-valued Vector Equilibrium Problem
References
26
EQUILIBRIUM PROBLEMS
Giovanna Idone and Antonino Maugeri
1 Introduction
2 A Model of Elastic-Plastic Torsion
References
27
GAP FUNCTIONS AND DESCENT METHODS FOR MINTY VARIATIONAL
INEQUALITY
Giandomenico Mastroeni
1 Introduction
2 A Gap Function Associated to Minty Variational Inequality
3 Exact and Inexact Descent Methods
4 Some Applications and Extensions of Minty Variational Inequality
5 Concluding Remarks
6 Appendix
References
28
A NEW CLASS OF PROXIMAL ALGORITHMS FOR THE NONLINEAR COM-
PLEMENTARITY PROBLEM
G.J.P. da Silva and P.R. Oliveira
1 Introduction
2 Preliminaries
3 Existence of Regularized Solutions
4 Algorithm and Convergence
5 Conclusions
References
Preface
We are very happy to see that this Workshop has become a new conference
series; the original Workshop is now regarded as OCA 2001. During August
18-22, 2002, The Second International Conference on Optimization and Con-
trol with Applications (OCA2002) was successfully held in Tunxi, China. The
Third International Conference on Optimization and Control with Applications
(OCA2003) will be held in Chongqing-Chengdu, China, during July 1-7, 2003.
Liqun Qi and Kok Lay Teo have continued to be the Directors of OCA 2002
and OCA 2003. We hope that OCA Series will continue to provide a forum
for international researchers and practitioners working in optimization, optimal
control and their applications to exchange information and ideas on the latest
development in these fields.
Elijah (Lucien) Polak was born August 11, 1931 in Bialystok, Poland. He is
a holocaust survivor and a veteran of the death camps at Dachau, Auschwitz,
Gross-Rosen, and Buchenwald. His father perished in the camps, but his mother
survived. After the War, he worked as an apprentice blacksmith in Poland and
a clothes salesman in France. In 1949, he and his mother migrated to Australia,
where, after an eight year interruption, he resumed his education, while working
various part time jobs.
Elijah Polak received the B.S. degree in Electrical Engineering, from the
University of Melbourne, Australia, in 1957 and the M.S. and Ph.D. degrees,
A. BOOKS
4. E. Polak and E. Wong, Notes for a First Course on Linear Systems, Van
Nostrand Reinhold Co. New York, 169 pages, 1970.
11. E. Polak, "Equivalence and Optimal Strategies for some Minimum Fuel
Discrete Systems," J. of the Franklin Inst., Vol. 277, No. 2, pp. 150-162,
February 1964.
12. E. Polak, "On the Evaluation of Optimal and Non-Optimal Control Strate-
gies," IEEE Trans. on Automatic Control, Vol. AC-9, No. 2, pp. 175-
176,1964.
13. M. D. Canon and E. Polak, "Analog Circuits for Energy and Fuel Optimal
Control of Linear Discrete Systems," University of California, Berkeley,
Electronics Research Laboratory, Tech. Memo. M-95, August 1964.
19. E. Polak and A. Larsen, Jr., "Some Sufficient Conditions for Continuous
Linear Programming Problems," Int'l J. Eng. Science, Vol. 4, No. 5, pp.
583-603, 1966.
22. E. Polak and J. P. Jacob, "On the Inverse of the Operator O(.) = A(.) +
(.)B," American Mathematical Monthly, Vol. 73, No. 4, Part I, pp.
388-390, April 1966.
27. E. Polak, "An Algorithm for Computing the Jordan Canonical Form of
a Matrix," University of California, Berkeley, Electronics Research Lab-
oratory, Memo. M-223, September 1967.
32. E. Polak, "On the Removal of Ill-Conditioning Effects in the Computation
of Optimal Controls," Automatica, Vol. 5, pp. 607-614, 1969.
34. E. Polak, "On Primal and Dual Methods for Solving Discrete Optimal
Control Problems," Proc. 2nd International Conference on Computing
Methods in Optimization Problems, San Remo, Italy, September 9-13,
1968. Published as: Computing Methods in Optimization Problems -2,
L. A. Zadeh, L. W. Neustadt and A. V. Balakrishnan, eds., pp. 317-331,
Academic Press, 1969.
37. E. Polak and M. Deparis, "An Algorithm for Minimum Energy," IEEE
Trans. on Automatic Control, Vol. AC-14, No. 4, pp. 367-378, 1969.
PUBLICATIONS OF ELIJAH POLAK
39. E. Polak and G. Meyer, "A Decomposition Algorithm for Solving a Class
of Optimal Control Problems," J. Mathematical Analysis & Applications,
Vol. 3, No. 1, pp. 118-140, 1970.
40. E. Polak, "On the use of models in the Synthesis of Optimization Al-
gorithms," Differential Games and Related Topics (Proceedings of the
International Summer School on Mathematical Models of Action and Re-
action, Varenna, Italy, June 15-27, 1970), H. Kuhn and G. Szego eds.,
North Holland, Amsterdam, pp. 263-279, 1971.
46. E. Polak, "A Survey of Methods of Feasible Directions for the Solution of
Optimal Control Problems," IEEE Transactions on Automatic Control,
Vol. AC-17, No. 5, pp. 591-597, 1972.
48. O. Pironneau and E. Polak, "A Dual Method for Optimal Control Prob-
lems with Initial and Final Boundary Constraints," SIAM J. Control, Vol.
11, No. 3, pp. 534-549, 1973.
49. R. Klessig and E. Polak, "An Adaptive Algorithm for Unconstrained Op-
timization with Applications to Optimal Control," SIAM J. Control, Vol.
11, No. 1, pp. 80-94, 1973.
51. R. Klessig and E. Polak, "A Method of Feasible Directions Using Func-
tion Approximations with Applications to Min Max Problems," J. Math.
Analysis and Applications, Vol. 41, No. 3, pp. 583-602, 1973.
52. E. Polak, "On the Use of Optimization Algorithms in the Design of Linear
Systems," University of California, Berkeley, Electronics Research Lab.
Memo. No. M377, 1973.
61. E. Polak and I. Teodoru, "Newton Derived Methods for Nonlinear Equa-
tions and Inequalities," Nonlinear Programming, O. L. Mangasarian, R.
R. Meyer and S. M. Robinson eds., Academic Press, N. Y., pp. 255-277,
1975.
Optimization Theory and Applications, Vol. 16, No. 3/4, pp. 303-325,
1975.
72. E. Polak and R. Trahan, "An Algorithm for Computer Aided Design
of Control Systems," Proc. IEEE Conference on Decision and Control,
1976.
80. D. Q. Mayne and E. Polak, "A Feasible Directions Algorithm for Optimal
Control Problems with Terminal Inequality Constraints," IEEE Transac-
tions on Automatic Control, Vol. AC-22, No. 5, pp. 741-751, 1977.
81. I. Teodoru Gross and E. Polak, "On the Global Stabilization of Quasi-
Newton Methods," Proc. ORSA/TIMS National Meeting, San Francisco,
May 9-11, 1977.
85. E. Polak and A. Sangiovanni Vincentelli, "An Algorithm for Design Cen-
tering, Tolerancing and Tuning," Proc. European Conference on Circuit
Theory and Design, Lausanne, Switzerland, Sept. 1978.
87. H. Mukai and E. Polak, "A Second Order Algorithm for Unconstrained
Optimization," J. Optimization Theory and Applications, Vol. 26, No.
4, 1978.
88. H. Mukai and E. Polak, "A Second Order Algorithm for the General Non-
linear Programming Problem," J. Optimization Theory and Applications,
Vol. 26, No. 4, 1978.
89. A. N. Payne and E. Polak, "An Interactive Method for Bi-Objective De-
cision Making," Proc. Second Lawrence Symposium on Systems and
Decision Sciences, Berkeley, Ca. Oct. 1978.
92. T. Glad and E. Polak, "A Multiplier Method with Automatic Limitation
of Penalty Growth," Mathematical Programming, Vol. 17, No. 2, pp.
140-156, 1979.
97. E. Polak and D. Q. Mayne, "On the Finite Solution of Nonlinear In-
equalities," IEEE Trans. on Automatic Control, Vol. AC-24, No. 3, pp.
443-445, 1979.
99. C. Gonzaga and E. Polak, "On Constraint Dropping Schemes and Opti-
mality Functions for a Class of Outer Approximations Algorithms," SIAM
J. Control and Optimization, Vol. 17, No. 4, pp. 477-493, 1979.
100. R. Trahan and E. Polak, "A Derivative Free Algorithm for a Class of
Infinitely Constrained Problems," IEEE Trans. on Automatic Control,
Vol. AC-25, No. 1, pp. 54-62, 1979.
101. C. Gonzaga, E. Polak and R. Trahan, "An Improved Algorithm for Opti-
mization Problems with Functional Inequality Constraints," IEEE Trans.
on Automatic Control, Vol. AC-25, No. 1, pp. 49-54, 1979.
110. E. Polak, "An Implementable Algorithm for the Optimal Design Cen-
tering, Tolerancing and Tuning Problem," Proc. Fourth International
Symposium on Computing Methods in Applied Sciences and Engineering,
Versailles, France, Dec. 10-14, 1979. Published as: Computing Methods
in Applied Science and Engineering, R. Glowinski, J. L. Lions, ed., North
Holland, Amsterdam, pp. 499-517, 1980.
111. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for Op-
timal Control Problems with Control and Terminal Equality Constraints,
Part 1," J. Optimization Theory and Applications, Vol. 32 No. 2, pp.
211-246, 1980.
112. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for Op-
timal Control Problems with Control and Terminal Equality Constraints,
116. E. Polak and D. Q. Mayne, "On the Solution of Singular Value Inequali-
ties," Proc. 20th IEEE Conference on Decision and Control, Albuquerque,
N.M., Dec. 10-12, 1980.
118. D. Q. Mayne, E. Polak and A. Voreadis, "A Cut Map Algorithm for De-
sign Problems with Tolerances," Proc. 20th IEEE Conference on Decision
and Control, Albuquerque, N.M., Dec. 10-12, 1980.
123. E. Polak and D. Q. Mayne, "A Robust Secant Method for Optimiza-
tion Problems with Inequality Constraints," J. Optimization Theory and
Applications, Vol. 33, No. 4, pp. 463-467, 1981.
124. E. Polak and D. Q. Mayne, "On the Solution of Singular Value Inequali-
ties over a Continuum of Frequencies" IEEE Transactions on Automatic
Control, Vol. AC-26, No. 3, pp. 690-695, 1981.
132. E. Polak, "An Implementable Algorithm for the Design Centering, Toler-
ancing and Tuning Problem," J. Optimization Theory and Applications,
Vol. 35, No. 3, 1981.
140. D. Q. Mayne and E. Polak, "Algorithms for the Design of Control Sys-
tems Subject to Singular Value Inequalities," Mathematical Programming
Studies, Vol. 18, pp. 112-134, 1982.
143. D. Q. Mayne, E. Polak and A. Voreadis, "A Cut Map Algorithm for
Design Problems with Tolerances," IEEE Trans. on Circuits and Systems,
Vol. CAS-29, No. 1, pp. 35-46, 1982.
147. D. Q. Mayne and E. Polak, "Algorithms for the Design of Control Sys-
tems Subject to Singular Value Inequalities," Mathematical Program-
ming Study 18, Algorithms and Theory in Filtering and Control, D. C.
Sorensen and R. J.-B. Wets, ed., North Holland, New York, pp. 112-135,
1982.
150. R. J. Balling, V. Ciampi, K. S. Pister and E. Polak, "Optimal Design of
Structures Subjected to Earthquake Loading," Proc. ASCE Convention,
Las Vegas, 1982.
153. E. Polak, D.Q. Mayne and Y. Wardi, "On the Extension of Constrained
Optimization Algorithms from Differentiable to Nondifferentiable Prob-
lems," SIAM J. Control and Optimization, Vol. 21, No. 2, pp. 179-204,
1983.
163. E. Polak, "A Modified Nyquist Stability Criterion for Use in Computer-
Aided Design," IEEE Trans. on Automatic Control, Vol. AC-29, No. 1,
pp. 91-93, 1984.
164. E. Polak and D.M. Stimler, "On the Design of Linear Control Systems
with Plant Uncertainty via Nondifferentiable Optimization," Proc. IX.
Triennial IFAC World Congress, Budapest, July 2-6, 1984.
173. E. Polak, S. Salcudean and D. Q. Mayne, "A Rationale for the Sequential
Optimal Redesign of Control Systems," Proc. 1985 ISCAS, pp. 835-838,
Kyoto, Japan, June 1985.
176. E. Polak and D. M. Stimler, "On the Efficient Formulation of the Optimal
Worst Case Control System Design Problem," University of California,
Electronics Research Laboratory Memo No. UCB/ERL M85/71, 21 Au-
gust 1985.
183. E. Polak and D. Q. Mayne, "Design of Multivariable Control Systems via
Semi-Infinite Optimization," Systems and Control Encyclopaedia, M. G.
Singh, editor, Pergamon Press, N.Y. 1987.
184. D. Q. Mayne and E. Polak "An Exact Penalty Function Algorithm for
Control Problems with State and Control Constraints," IEEE Trans. on
Control, Vol. AC-32, No. 5, pp. 380-388, 1987.
188. S. Daijavad, E. Polak, and R-S Tsay, "A Combined Deterministic and
Random Optimization Algorithm for the Placement of Macro-Cells," Proc.
MCNC International Workshop on Placement and Routing, Research Tri-
angle Park, NC, May 10-13, 1988.
192. E. Polak and E. J. Wiest, "Domain Rescaling Techniques for the Solution
of Affinely Parametrized Nondifferentiable Optimal Design Problems,"
Proc. 27th IEEE Conference on Decision and Control, Austin, Tx., Dec.
7-9, 1988.
195. E. Polak and S. Wuu, "On the Design of Stabilizing Compensators via
Semi-Infinite Optimization," IEEE Trans. on Control, Vol. 34, No. 2, pp.
196-200, 1989.
199. Y-P. Harn and E. Polak, "On the Design of Finite Dimensional Con-
trollers for Infinite Dimensional Feedback-Systems via Semi-Infinite Op-
timization," Proc. 27th IEEE Conference on Dec. and Contr., Austin,
Tx., Dec. 7-9, 1988. IEEE Trans. on Automatic Control, Vol. 35, No.
10, pp. 1135-1140, 1990.
201. E. Polak and E. J. Wiest, "A Variable Metric Technique for the Solution
of Affinely Parametrized Nondifferentiable Optimal Design Problems," J.
Optimization Theory and Applications, Vol. 66, No. 3, pp 391-414, 1990.
202. L. He and E. Polak, "An Optimal Diagonalization Strategy for the Solu-
tion of a Class of Optimal Design Problems," IEEE Trans. on Automatic
Control, Vol. 35, No. 3, pp. 258-267, 1990.
203. T. E. Baker and E. Polak, "An Algorithm for Optimal Slewing of Flexible
Structures," University of California, Electronics Research Laboratory,
Memo UCB/ERL M89/37, 11 April 1989, Revised, 4 June 1990.
207. Y-P. Harn and E. Polak, "On the Design of Finite Dimensional Con-
trollers for Infinite Dimensional Feedback-Systems via Semi-Infinite Op-
timization," IEEE Trans. on Automatic Control, Vol. 35, No. 10, pp.
1135-1140, 1990.
210. E. Polak and L. He, "A Unified Phase I-Phase II Method of Feasible
Directions for Semi-infinite Optimization," J. Optimization Theory and
Applications, Vol. 69, No. 1, pp. 83-107, 1991.
211. J. Higgins and E. Polak, "An ε-active Barrier Function Method for Solving
Minimax Problems," J. Applied Mathematics and Optimization, Vol. 23,
pp 275-297, 1991.
212. E. J. Wiest and E. Polak, "On the Rate of Convergence of Two Minimax
Algorithms," J. Optimization Theory and Applications, Vol. 71, No. 1, pp.
1-30, 1991.
215. C. Kirjner Neto and E. Polak, "A Secant Method Based on Cubic Inter-
polation for Solving One Dimensional Optimization Problems," Univer-
sity of California, Berkeley, Electronics Research Laboratory Memo No.
UCB/ERL M91/91, 15 October 1991.
216. E. Polak, J. Higgins and D. Q. Mayne, "A Barrier Function Method for
Minimax Problems," Mathematical Programming, Vol. 54, No. 2, pp.
155-176, 1992.
218. E. Polak and L. He, "Rate Preserving Discretization Strategies for Semi-
infinite Programming and Optimal Control," SIAM J. Control and Opti-
mization, Vol. 30, No. 3, pp. 548-572, 1992.
227. T. E. Baker and E. Polak, "On the Optimal Control of Systems Described
by Evolution Equations," SIAM J. Control and Optimization, Vol. 32,
No. 1, pp. 224-260, 1994.
231. C. Kirjner Neto and E. Polak, "On the Use of Consistent Approxima-
tions for the Optimal Design of Beams," SIAM Journal on Control and
Optimization, Vol. 34, No. 6, pp. 1891-1913, 1996.
233. A. Schwartz and E. Polak, "A Family of Projected Descent Methods for
Optimization Problems with Simple Bounds," J. Optimization Theory
and Applications, Vol. 92, No. 1, pp. 1-32, 1997.
237. E. Polak and L. Qi, "A Globally and Superlinearly Convergent Scheme
for Minimizing a Normal Merit Function", AMR 96/17, Applied Math-
ematics Report, University of New South Wales, 1996, and SIAM J. on
Optimization, Vol. 36, No. 3, pp. 1005-1019, 1998.
241. E. Polak and L. Qi, "Some Optimality Conditions for Minimax Problems
and Nonlinear Programs," Applied Mathematics Report AMR 98/14, Uni-
versity of New South Wales, 1998.
ECT," 1998 IEEE Nuclear Science Symposium and Medical Imaging Con-
ference Record, Toronto, Canada, November 9-14, 1998.
246. E. Polak, L. Qi, and D. Sun, "First-Order Algorithms for Generalized Fi-
nite and Semi-Infinite Min-Max Problems," Computational Optimization
and Applications, Vol. 13, No. 1-3, Kluwer Academic Publishers, pp. 137-161,
1999.
248. Geraldine Lemarchand, Olivier Pironneau, and Elijah Polak, "A Mesh
Refinement Method for Optimization with DDM," Proc. 13th Inter-
national Conference on Domain Decomposition Methods, Champfleuri,
Lyons, France, October 9-12, 2000.
255. E. Polak, "Smoothing Techniques for the Solution of Finite and Semi-
Infinite Min-Max-Min Problems," High Performance Algorithms and Soft-
ware for Nonlinear Optimization, G. Di Pillo and A. Murli, Editors,
Kluwer Academic Publishers B.V., 2002
258. J.O. Royset, E. Polak and A. Der Kiureghian, "FORM Analysis Using
Consistent Approximations," Proceedings of the 15th ASCE Engineering
Mechanics Conference, New York, NY, 2002.
259. E. Polak and J.O. Royset, "Algorithms with Adaptive Smoothing for
Finite Min-Max Problems," J. Optimization Theory and Applications,
submitted 2002.
260. E. Polak and J.O. Royset, "Algorithms for Finite and Semi-Infinite Min-
Max-Min Problems Using Adaptive Smoothing Techniques," J. Optimiza-
tion Theory and Applications, submitted 2002.
Key words: Max-min function, cutting angle method, discrete gradient method,
quasidifferential.
1 INTRODUCTION
Max-min functions form one of the classes of nonconvex and nonsmooth
functions that is both interesting and important for applications. There are
many practical tasks where the objective function and/or constraints belong
to this class. For example, optimization problems with max-min constraints
arise in different branches
of engineering such as the design of electronic circuits subject to a toleranc-
ing and tuning provision (see Bandler et al (1976); Liu et al (1992); Muller
(1976); Polak and Sangiovanni Vincentelli (1979); Polak (1981)), the design
of paths for robots in the presence of obstacles (Gilbert and Johnson (1985)),
the design of heat exchangers (Grossman and Sargent (1978); Halemane and
Grossman (1983); Ostrovsky et al (1994)) and chemical reactors (Halemane
and Grossman (1983); Ostrovsky et al (1994)), in the layout design of VLSI
circuits (Cheng et al (1992); Hochbaum (1993)), etc. Optimization problems
with max-min objective and constraint functions also arise when one tries to
design systems under uncertainty (see Bracken and McGill (1974); Grossman
and Sargent (1978); Halemane and Grossman (1983); Ierapetritou and Pis-
tikopoulos (1994); Ostrovsky et al (1994)).
In the paper Kirjner-Neto and Polak (1998), the authors consider opti-
mization problems with twice continuously differentiable objective functions
and max-min constraint functions. They convert this problem to a certain
problem of smooth optimization.
In this paper we consider different classes of unconstrained and constrained
minimization problems with max-min objective and/or constraint functions.
The paper consists of four parts. First, we investigate a special simple class of
max-min functions. We give an explicit description of all local minima and show
that even in such a simple case the number of local minimizers is very large.
We discuss the applicability of the discrete gradient method (see, for example,
Bagirov (1999a); Bagirov (1999b)) for finding a local minimizer of discrete max-
min functions without constraints in the second part.
The constrained minimization problems with max-min constraints are ex-
amined in the third part. We use a special penalization approach (see Rubinov
et al (2002)) for this purpose.
The unconstrained global minimization of some continuous maximum func-
tions by the cutting angle method (see, for example, Rubinov (2000)) is stud-
ied in the fourth part. If the number of internal variables in the continuous
maximum functions is large enough (more than 5), then the minimization prob-
lem for these functions cannot be solved by traditional discretization. The
application of the cutting angle method allows us to solve such problems with
a small number of external variables and a rather large number of internal
variables (up to 15 internal variables).
We provide results of numerical experiments, which allow us to conclude
that even for simple max-min functions the number of local minima can be
very large. This means that the problem of minimizing such functions is
quite complicated. However, the results of numerical experiments show that
the methods considered in this paper allow us to solve different kinds of
max-min minimization problems with up to 10 variables.
The paper is arranged as follows. In Section 2 we study the problem of
minimization of special max-min functions over the unit simplex. Section 3
is devoted to the problem of minimization of the discrete max-min functions.
Minimization problems with max-min constraints are studied in Section 4. The
problem of global minimization of the continuous maximum functions is dis-
cussed in Section 5. Section 6 concludes the paper.
A function of the form (2.1) is called the min-type function generated by $l$ (or
simply a min-type function).
Let $f$ be a function defined on $\mathbb{R}^n_+$. A function $f$ is called increasing if
$x \ge y$ implies that $f(x) \ge f(y)$. The restriction of the function $f$ to a ray
$R_y = \{\alpha y : \alpha > 0\}$ starting from zero and passing through $y$ is the function of
one variable
$$f_y(\alpha) = f(\alpha y). \qquad (2.2)$$
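For concreteness, a min-type function and the ray restriction (2.2) can be evaluated directly. The following sketch uses an invented vector $l$ and point $y$ for illustration only; for a min-type function the restriction to a ray is linear in $\alpha$, in line with the remark after (2.5).

```python
# Sketch with assumed example data: the min-type function l(x) = min_i l_i * x_i
# generated by a vector l with positive coordinates, and the ray restriction
# f_y(alpha) = f(alpha * y) of a function f, as in (2.2).

def min_type(l):
    """Return the min-type function x -> min_i l_i * x_i generated by l."""
    return lambda x: min(li * xi for li, xi in zip(l, x))

def ray_restriction(f, y):
    """Return the one-variable function f_y(alpha) = f(alpha * y)."""
    return lambda alpha: f([alpha * yi for yi in y])

l = [2.0, 3.0, 5.0]          # invented example vector
f = min_type(l)
y = [1.0, 1.0, 0.5]
f_y = ray_restriction(f, y)

print(f([1.0, 1.0, 1.0]))    # min(2, 3, 5) = 2.0
print(f_y(2.0))              # f(2y) = min(4, 6, 5) = 4.0, i.e. 2 * f(y)
```

Since $f$ here is positively homogeneous, $f_y(\alpha) = \alpha f(y)$, which is the linearity along rays used to characterize IPH functions.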
The following result is based on Theorem 2.1 (see Rubinov (2000) for details):
the function $f$ is IPH if and only if $f$ is a positively homogeneous ICAR function.
It follows from (2.5) that an ICAR function is IPH if the function $f_y$, defined
by (2.2), is linear for all $y \in \mathbb{R}^n_+$, $y \ne 0$.
The following result holds (see Rubinov (2000)):
$$l(x) = \min_{i \in I(l)} l_i x_i;$$
Proposition 2.2 Let $f$ be an IPH function and let $X \subset \operatorname{int} \mathbb{R}^n_+$ be a compact
set. Then for each $\varepsilon > 0$ there exists a finite set $\{x^1, \dots, x^j\} \subset X$ such that
the function
$$h(x) = \max_{k \le j} \min_{i \in I(l^k)} l^k_i x_i$$
where $f$ is an ICAR function and $X \subset \mathbb{R}^n_+$ is a compact set. The cutting angle
method for solving problem (2.7) has been proposed and studied in Rubinov
(2000) (p. 420). This method reduces problem (2.7) to a sequence of
auxiliary problems:
It follows from the following result (see Rubinov (2000) and references therein).
Then $g(y) = f(y)$ for all $y \in S$, and $g$ is an ICAR function if $p \ge 2K/m$,
where $m = \min_{y \in S} f(y)$ and $K$ is the Lipschitz constant of $f$ in the $\|\cdot\|_1$ norm:
Let $f$ be a Lipschitz function over $S$. Consider the function $f_d(x) = f(x) + d$,
where $d > 0$. Note that the Lipschitz constant of $f$ coincides with the Lipschitz
constant of $f_d$. Hence, for each $p > 0$ there exists $d > 0$ such that the extension
of $f_d$ given by (2.9) is an ICAR function. However, if $d$ is a very large number,
the function $g$ is "almost flat" and its minimization is a difficult task. On the
other hand, if $d$ is small enough, then $p$ is large and some computational
difficulties can appear. Thus, an appropriate choice of the number $d$ is an
important problem.
We shall consider the cutting angle method only for the minimization of ICAR
functions over $S$; it can then also be applied to the minimization of Lipschitz
functions defined on $S$. The main idea behind this method is to approximate
the objective function $g$ by a sequence of its saw-tooth underestimates $h_j$ of
the form (2.4). Assume that $h_j(x^k) = g(x^k)$ for some points $x^k$, $k = 1, \dots, j$,
and $h_j$ converges uniformly to $g$ as $j \to +\infty$. Then a global minimizer of $g$ can
be approximated by a solution of the problem:
A detailed description of the cutting angle method can be found in Rubinov
(2000). Here we discuss only methods for the solution of the auxiliary problem
(2.10). Some of them can be found in Rubinov (2000). The most applicable
method can be given if the objective function $g$ of the initial problem is IPH.
In this case the constants $c_k$ in (2.4) are equal to zero, hence $h_j$ is a positively
homogeneous function. Using this property of $h_j$ it is possible to prove that
all local minimizers of $h_j$ over $S$ are interior points of $S$, and then to give an
explicit expression for the local minima of $g$ over $S$ which are interior points
of $S$ (see Bagirov and Rubinov (2000) and also Rubinov (2000)). If $g$ is an
extension, defined by (2.9), of a Lipschitz function $f$ defined on $S$, then $g$ is
IPH if $p = 1$, so we need to choose a large number $d$ in order to apply the
cutting angle method. The question arises: is it possible to extend the results
obtained for the minimization of homogeneous functions of the form (2.6) to
general non-homogeneous functions of the form (2.4)? In the case of success we
can consider more flexible versions of the cutting angle method, which can lead
to better implementations.
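The overall loop described above can be sketched as follows. This is a simplified illustration under assumptions not in the text: the objective $g$ is an invented IPH example, the support vectors are built from the standard IPH rule $l^k_i = g(x^k)/x^k_i$, and the auxiliary problem (2.10) is solved here by crude random sampling rather than by the exact rules for local minimizers discussed in the chapter.

```python
import random

# Simplified cutting angle iteration for an (assumed) IPH objective over the
# unit simplex.  Each iterate x^k contributes a support vector with
# l^k_i = g(x^k) / x^k_i, so that h_j(x) = max_k min_i l^k_i x_i satisfies
# h_j(x^k) = g(x^k) and h_j <= g.  The minimization of h_j is approximated
# by random sampling, not solved exactly as in the chapter.

def g(x):
    # invented IPH example: max of two linear functions with positive coefficients
    return max(2*x[0] + x[1] + x[2], x[0] + 3*x[1] + 0.5*x[2])

def random_simplex_point(n):
    w = [random.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

def cutting_angle(g, n=3, iters=60, samples=100, seed=0):
    random.seed(seed)
    x = [1.0 / n] * n            # start at the barycenter (interior point)
    supports = []
    best = g(x)
    for _ in range(iters):
        supports.append([g(x) / xi for xi in x])   # new support vector
        # crude approximate minimizer of the saw-tooth underestimate h_j
        candidates = [random_simplex_point(n) for _ in range(samples)]
        x = min(candidates,
                key=lambda y: max(min(lk[i] * y[i] for i in range(n))
                                  for lk in supports))
        best = min(best, g(x))
    return best

print(cutting_angle(g))   # decreases toward the minimum of g over the simplex
```

For this particular $g$ the minimum over the simplex equals 1 (attained where the first coordinate vanishes), so the returned value lies between 1 and the starting value $g(1/3, 1/3, 1/3) = 1.5$.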
Our goal is to describe all local minimizers of $h$ over the unit simplex $S$. We
shall also indicate the structure of such minimizers. Let
$$\Phi_k(x) = \min_{i \in I(l^k)} l^k_i x_i - c_k, \qquad k = 1, \dots, j.$$
Then
$$h(x) = \max_{k \le j} \Phi_k(x).$$
We set
$$Q_k(x) = \{ i \in I(l^k) : l^k_i x_i - c_k = \Phi_k(x) \}. \qquad (2.14)$$
1) if $x$ is a local minimizer of $f$ over $S$ then $f'(x, u) \ge 0$ for all $u \in K(x, S)$;
2) if $f'(x, u) > 0$ for all $u \in K(x, S) \setminus \{0\}$ then $x$ is a local minimizer of $f$
over $S$.
Now we describe all local minimizers of the function $h$ defined by (2.11). The
function $h$ is bounded from below on $S$. Adding a sufficiently large positive
constant, we can assume without loss of generality that $h(x) > 0$ for all $x \in S$.
First we consider minimizers which belong to the relative interior $\operatorname{ri} S$ of the
set $S$:
$$\operatorname{ri} S = \{ x = (x_1, \dots, x_n) \in S : x_i > 0,\ i \in I \}.$$
Since
$$x_i = \frac{h(x) + c_{k_i}}{l^{k_i}_i},$$
we conclude that
$$\Phi_k(x) \le h(x)$$
for all $k \le j$. It follows from (2.21) that there exist indices $k \in R(x)$ and
$p \in Q_k(x)$ such that $h(x) = l^k_p x_p - c_k$. Using again (2.20) we conclude that
and
$$l^{k_m}_i x_i - c_{k_m} > \Phi_{k_m}(x) = h(x) \quad \text{for all } i \in I(l^{k_m}),\ i \ne m.$$
Hence
$$\frac{l^{k_m}_i}{l^{k_i}_i} > \frac{h(x) + c_{k_m}}{h(x) + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}),\ i \ne m.$$
We again consider the function (2.11) and Problem (2.12). In the next propo-
sition we will establish a sufficient condition for a local minimum in Problem
(2.12). We assume that the function (2.11) possesses the following property:
Proposition 2.5 Let a subset $\{l^{k_1}, \dots, l^{k_n}\}$ of the set $\{l^1, \dots, l^j\}$ enjoy the
following properties:
3)
$$\frac{l^{k_m}_i}{l^{k_i}_i} \ge \frac{d + c_{k_m}}{d + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}),\ i \ne m,$$
where
Thus $d > 0$.
Since $\bar x \in S$ it follows that $d + c_{k_i} \ge 0$ for all $i = 1, \dots, n$. Due to (2.23) we have
$$\min_{i \in I(l^k)} \left( l_i^k \bar x_i - c_k - d \right) \le 0$$
for all $k \le j$. The latter means that for any $k \le j$ there exists $i = i_k \in I(l^k)$
such that $l_{i_k}^k \bar x_{i_k} - c_k \le d$. Then
$$h(\bar x) = \max_{k \le j} \min_{i \in I(l^k)} \left( l_i^k \bar x_i - c_k \right) \le d.$$
Thus $h(\bar x) = d$. Clearly
$$\ge \min_{i \in Q_{k_m}(\bar x)} l_i^{k_m} u_i.$$
Thus
$$h'(\bar x, u) > 0 \quad \text{for all } u \in K(\bar x, S) \setminus \{0\},$$
Thus analogous results can be obtained for IPH functions from Propositions
2.4 and 2.5 when $c_k = 0$ for all $k$. This case has been studied in Bagirov
and Rubinov (2000).
Remark 2.1 For homogeneous max-min functions we have $d > 0$, and in this
case we do not need the assumption $\Phi_k(x) := \min_{i \in I(l^k)} l_i^k x_i > 0$ for all $k \le j$ and
$x \in S$.
The previous results and the following Proposition 2.6 allow one to describe
all local minimizers of a homogeneous max-min function over the unit simplex:
where
Let
$$\Phi_k(x) = \min_{i \in I(l^k)} l_i^k x_i - c_k, \quad x \in S,\ k = 1, \dots, 5.$$
We have
$$\Phi_5(x) = \frac{1}{25} \min(52x_1,\ 104x_2,\ 156x_3,\ 208x_4) + 10.$$
Consider the boundary point of S:
We have
$$\Phi_1(x^1) = 5.5, \quad \Phi_2(x^1) = 5.5, \quad \Phi_3(x^1) = 5.25, \quad \Phi_4(x^1) = 2, \quad \Phi_5(x^1) = 10.$$
Since
$$h(x^1) = \max_{k=1,\dots,5} \Phi_k(x^1)$$
it follows that
It is clear that $x^1$ is the global minimum of the function over the set S.
Moreover
$$\Phi_5(x) > \Phi_5(x^1)$$
for all $x \in \operatorname{ri} S$ and $\Phi_5(x) = \Phi_5(x^1)$ for all boundary points $x$. Since all the
functions $\Phi_k$, $k = 1, \dots, 5$, are continuous, there exists $\varepsilon > 0$ such that
where
$$\frac{l_i^{k_m}}{l_i^{k_i}} \ge \frac{h(x) + c_{k_m}}{h(x) + c_{k_i}} \quad \text{for all } i \in I(l^{k_m}) \cap I(x),\ i \ne m.$$
Let $h'$ be the corresponding restriction of the function $h$. Clearly $h'$ is a function of
the form (2.11) and $x$ is a local minimizer of this function. Since $x \in \operatorname{ri} S'$, we
can apply Proposition 2.4. The result follows directly from this proposition. □
Propositions 2.4 and 2.5 allow us to propose the following algorithm for
computing the set of local minimizers of the function $h$ defined by (2.11) over
$\operatorname{ri} S$.
Step 5. If
$$\frac{l_i^{k_t}}{l_i^{k_i}} \ge \frac{d + c_{k_t}}{d + c_{k_i}} \quad \text{for all } t = 1, \dots, n,\ i \in I(l^{k_t}),\ i \ne t,$$
then go to Step 6; otherwise set $m = n$ and go to Step 2.
Step 7. If
Note that $G$ is the set of local minimizers of the function $h$ over the set $\operatorname{ri} S$.
where
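Assuming the construction suggested by Propositions 2.4 and 2.5 — a candidate $\bar x$ with $\bar x_i = (d + c_{k_i})/l_i^{k_i}$, the value $d$ fixed by $\sum_i \bar x_i = 1$, retained when $\bar x \in \operatorname{ri} S$ and the ratio condition of Step 5 holds — the enumeration over selections $(k_1, \dots, k_n)$ can be sketched as follows. All data here are hypothetical and this reading of the partially garbled conditions is our interpretation:

```python
import numpy as np
from itertools import product

# Hypothetical data: j = 2 min-type pieces on R^3, full index sets I(l^k).
L = np.array([[2.0, 3.0, 4.0],
              [5.0, 1.5, 2.0]])   # L[k, i] = l^k_i
c = np.array([0.2, 0.0])
j, n = L.shape

def candidates():
    """For each selection (k_1,...,k_n) of one piece per coordinate, solve
    sum_i (d + c_{k_i}) / l^{k_i}_i = 1 for d, form the candidate point
    x_i = (d + c_{k_i}) / l^{k_i}_i, and keep it if x lies in ri S and
    the sufficient ratio condition of Step 5 holds."""
    out = []
    for k in product(range(j), repeat=n):
        inv = 1.0 / L[list(k), np.arange(n)]           # 1 / l^{k_i}_i
        d = (1.0 - np.sum(c[list(k)] * inv)) / np.sum(inv)
        x = (d + c[list(k)]) * inv                     # sums to 1 by design
        if np.any(x <= 0):
            continue                                   # not in ri S
        ok = all(L[k[t], i] / L[k[i], i]
                 >= (d + c[k[t]]) / (d + c[k[i]]) - 1e-12
                 for t in range(n) for i in range(n) if i != t)
        if ok:
            out.append((x, d))
    return out

for x, d in candidates():
    print(np.round(x, 4), round(d, 4))
```

For each accepted candidate one can check numerically that $h(\bar x) = d$, in line with the computation $h(\bar x) = d$ in the proof above.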
The code has been written in Fortran 90 and the numerical experiments have
been carried out on a PC IBM Pentium III with an 800 MHz CPU.
For the description of the results of numerical experiments we use the fol-
lowing notation:
tl is the CPU time for the calculation of the set of all local minimizers;
Example 2.3
$$f_2(x) = \max_{1 \le i \le 20} \min_{1 \le j \le n} (a_{ij}, x),$$
We use points xk given in (2.30) and (2.31) for the construction of min-type
functions. Thus we describe all local minimizers of the following functions over
Results presented in Tables 2.2 and 2.3 show that the number of local mini-
mizers strongly depends on the original IPH function. The proposed algorithm
where
Numerical experiments have been carried out on a PC IBM Pentium III with
an 800 MHz CPU. All problems have been solved with the precision $\delta$, that is,
at the last point $x^k$:
$$f(x^k) - f^* \le \delta.$$
To carry out numerical experiments, unconstrained minimization problems with
the following max-min objective functions were considered:
Problem 1
$$f(x) = \max_{i \in I} \min_{j \in J} (a_{ij}, x),$$
$$a_{ij}^k = 1/(i + j + k - 2), \quad i \in I,\ j \in J,\ k = 1, \dots, n,$$
$$I = \{1, \dots, 10\}, \quad J = \{1, \dots, 10\}, \quad x^0 = (10, \dots, 10), \quad x^* = (0, \dots, 0), \quad f^* = 0.$$
Problem 2
$$a_{ij}^k = 1/(i + j + k - 2), \quad i \in I,\ j \in J,\ k = 1, \dots, n,$$
$$I = \{1, \dots, 10\}, \quad J = \{1, \dots, 10\}, \quad x^0 = (10, \dots, 10), \quad x^* = (0, \dots, 0), \quad f^* = 0.$$
Results presented in Table 3.1 show that the discrete gradient method can
be applied to solve minimization problems with discrete max-min objective
functions. This method allowed one to find the global solution to the problems
under consideration with reasonable computational effort.
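The Problem 1 objective above can be sketched directly; the dimension $n = 5$ below is an arbitrary choice for illustration (the chapter varies $n$):

```python
import numpy as np

n = 5                      # illustrative dimension
I = range(1, 11)           # I = {1,...,10}
J = range(1, 11)           # J = {1,...,10}

# a^k_{ij} = 1/(i + j + k - 2); a_{ij} is the vector over k = 1..n
A = np.array([[[1.0 / (i + j + k - 2) for k in range(1, n + 1)]
               for j in J] for i in I])

def f(x):
    """f(x) = max_{i in I} min_{j in J} (a_{ij}, x)."""
    inner = A @ x                 # inner[i, j] = (a_{ij}, x)
    return inner.min(axis=1).max()

x_star = np.zeros(n)
x0 = 10.0 * np.ones(n)
print(f(x_star), f(x0))           # f* = 0 is attained at x* = (0,...,0)
```

Since all coefficients $a_{ij}^k$ are positive and decrease in $i$ and $j$, at $x^0 = (10,\dots,10)$ the inner minimum is taken at $j = 10$ and the outer maximum at $i = 1$.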
$$\text{minimize } f(x) \quad \text{subject to } x \in X \subseteq \mathbb{R}^n,$$
$$g_i(x) \le 0, \quad i = 1, \dots, m, \qquad g_i(x) = 0, \quad i = m + 1, \dots, m + k.$$
$$\lim_{c \to +\infty} d_c = 0,$$
where $d_c$ is the least exact penalty parameter for the problem $(P_c)$:
$$\text{minimize } f_c(x) \quad \text{subject to } x \in X,\ f_1(x) \le 0.$$
The cutting angle method requires the evaluation of only one value of the objective
function at each iteration. This property is very beneficial if the evaluation of
the objective function is time-consuming. Due to this property we can
use the cutting angle method for solving some continuous min-max problems.
Some details of this approach can be found in Bagirov and Rubinov (2001b).
Let $Y \subset \mathbb{R}^m$ be a compact set and let $\varphi(x, y)$ be a continuous function
defined on the set $\mathbb{R}^n \times Y$. Consider the following continuous min-max problem:
find points $\bar x \in \mathbb{R}^n$ and $\bar y \in Y$ such that
where
We assume that the function $y \mapsto \varphi(x, y)$ is concave for each $x$; then the
evaluation of the values of the function $f$ can easily be done by methods of convex
programming. The cutting angle method can be applied to problems with a
small number of outer variables $x$; the number of inner variables $y$, however, can
be quite large. Numerical experiments with such functions were reported
in Bagirov and Rubinov (2001b) and we do not repeat them here.
6 CONCLUDING REMARKS
Acknowledgments
The authors are very thankful to Professor E. Polak for kind discussions of our
results during his visit to Australia and for providing us with his recent papers. We are
also grateful to an anonymous referee for helpful comments. This research was
supported by the Victorian Partnership for Advanced Computing.
References
Bagirov, A.M. (1999a). Minimization methods for one class of nonsmooth func-
tions and calculation of semi-equilibrium prices, In: Progress in Optimiza-
tion: Contributions from Australasia, Eberhard, A. et al. (eds.), Applied 0p-
timization, 30, Kluwer Academic Publishers, Dordrecht, 147-175.
Bagirov, A.M. (1999b). Derivative-free methods for unconstrained nonsmooth
optimization and its numerical analysis, Investigacao Operacional, 19,75-93.
Bagirov, A.M. and Rubinov, A.M. (2000). Global minimization of increasing
positively homogeneous functions over the unit simplex, Annals of Operations
Research, 98, 171-189.
Bagirov, A.M. and Rubinov, A.M. (2001a). Modified versions of the cutting angle
method, In: N. Hadjisavvas and P.M. Pardalos (eds.), Advances in Convex
Analysis and Global Optimization, Kluwer Academic Publishers, Dordrecht,
245-268.
Bagirov, A.M. and Rubinov, A.M. (2001b). Global optimization of marginal
functions with applications to economic equilibrium, Journal of Global Op-
timization, 20, 215-237.
Bandler, J.W., Liu, P.C. and Tromp, H. (1976). A nonlinear programming
approach to optimal design centering, tolerancing and tuning, IEEE Trans.
Circuits Systems, CAS-23, 155-165.
Bartels, S.G., Kuntz, L. and Scholtes, S. (1995). Continuous selections of linear
functions and nonsmooth critical point theory, Nonlinear Analysis, TMA,
24, 385-407.
Bracken, J. and McGill, J.T. (1974). Defense applications of mathematical programs
with optimization problems in the constraints, Oper. Res., 22, 1086-1096.
Cheng, C.K., Deng, X., Liao, Y.Z. and Yao, S.Z. (1992). Symbolic layout com-
paction under conditional design rules, IEEE Trans. Comput. Aided Design,
11, 475-486.
Demyanov, V.F. and Rubinov, A.M. (1986). Quasidifferential Calculus, Opti-
mization Software, New York.
Demyanov, V.F. and Rubinov, A.M. (1995). Constructive Nonsmooth Analysis,
Peter Lang, Frankfurt am Main.
Evtushenko, Yu. (1972). A numerical method for finding best guaranteed estimates,
USSR Computational Mathematics and Mathematical Physics, 12, 109-128.
Gilbert, E.G. and Johnson, D.W. (1985). Distance functions and their applica-
tion to robot path planning in the presence of obstacles, IEEE J. Robotics
Automat., RA-1, 21-30.
Grossman, I.E. and Sargent, R.W. (1978). Optimal design of chemical plants
with uncertain parameters, AIChE J., 24, 1-7.
Halemane, K.P. and Grossman, I.E. (1983). Optimal process design under un-
certainty, AIChE J., 29, 425-433.
Hochbaum, D. (1993). Complexity and algorithms for logical constraints with
applications to VLSI layout, compaction and clocking, Studies in Locational
Analysis, ISOLD VI Proceedings, 159-164.
Ierapetritou, M.G. and Pistikopoulos, E.N. (1994). Simultaneous incorporation
of flexibility and economic risk in operational planning under uncertainty,
Comput. Chem. Engrg., 18, 163-189.
Kirjner-Neto, C. and Polak, E. (1998). On the conversion of optimization problems
with max-min constraints to standard optimization problems, SIAM J.
on Optimization, 8(4), 887-915.
Liu, P.C., Chung, V.W. and Li, K.C. (1992). Circuit design with post-fabrication
tuning, in: Proc. 35th Midwest Symposium on Circuits and Systems, Wash-
ington, DC, IEEE, NY, 344-347.
Mifflin, R. (1976). Semismooth and semiconvex functions in constrained opti-
mization. SIAM Journal on Control and Optimization.
Muller, G. (1976). On computer-aided tuning of microwave filters, in: IEEE
Proc. International Symposium on Circuits and Systems, Munich, IEEE
Computer Society Press, Los Alamitos, CA, 722-725.
Ostrovsky, G.M., Volin, Y.M., Barit, E.I. and Senyavin, M.M. (1994). Flexibil-
ity analysis and optimization of chemical plants with uncertain parameters,
Comput. Chem. Engrg., 18, 755-767.
Polak, E. (1981). An implementable algorithm for the design centering, toler-
ancing and tuning problem, J. Optim. Theory Appl., 35, 45-67.
Polak, E. (1997). Optimization. Algorithms and Consistent Approximations.
Springer Verlag, New York.
Polak, E. (2003). Smoothing techniques for the solution of finite and semi-
infinite min-max-min problems, High Performance Algorithms and Software
for Nonlinear Optimization, G. Di Pillo and A. Murli (eds.), Kluwer Acad-
emic Publishers, to appear.
Polak, E. and Sangiovanni Vincentelli, A. (1979). Theoretical and computa-
tional aspects of the optimal design centering, tolerancing and tuning prob-
lem, IEEE Trans. Circuits and Systems, CAS-26, 795-813.
Rubinov, A.M. (2000). Abstract Convexity and Global Optimization, Kluwer
Academic Publishers, Dordrecht.
Rubinov, A.M., Yang, X.Q. and Bagirov, A.M. (2002). Nonlinear penalty functions
with a small penalty parameter, Optimization Methods and Software,
17(5), 931-964.
2
A COMPARISON OF TWO
APPROACHES TO SECOND-ORDER
SUBDIFFERENTIABILITY CONCEPTS
WITH APPLICATION TO OPTIMALITY
CONDITIONS
A. Eberhard
Department of Mathematics,
RMIT University, GPO Box 2476V,
Melbourne, Australia
and C. E. M. Pearce
Department of Mathematics,
University of Adelaide, North Terrace,
Adelaide, Australia
Abstract: The graphical derivative and the coderivative, when applied to the
proximal subdifferential, are in general not generated by a set of linear operators.
Nevertheless we find that in directions at which the subjet (or subhessian) is
supported, in a rank-1 sense, these supported operators interpolate
the contingent cone. Thus under a prox-regularity assumption we are able to
make a selection from the contingent graphical derivative in certain directions,
using the exposed facets of a convex set of symmetric matrices. This allows
us to make a comparison between some optimality conditions. A nonsmooth
formulation of a standard smooth mathematical programming problem is used
to derive a novel set of sufficient optimality conditions.
1 INTRODUCTION
In this paper we shall concern ourselves only with second-order subderivatives
that arise from one of two constructions. First, there is the use of the con-
tingent tangent cone to the graph of the proximal subdifferential and its polar
cone to generate the graph of the contingent graphical derivative and contingent
coderivative. The second one is the use of sub-Taylor expansions to construct
a set of symmetric matrices as replacements for Hessians, the so called subjet
(or the subhessian of Penot (1994a)) of viscosity solution theory of partial
differential equations (see Crandall et al. (1992)). To each of these constructions
may be associated a limiting counterpart which will also be considered at
times. As the first of the above constructions produces a possibly nonconvex
set of vectors in Rn and the latter a convex set of symmetric operators, it is
not clear how to compare them. One of the purposes of this paper is to begin
a comparative study of these notions for the class of prox-regular functions of
Poliquin and Rockafellar (1996). A second objective is to consider what this
study tells us regarding certain optimality conditions that can be framed using
these alternative notions. We do not take up discussion of the second-order
tangent cones and the associated second-order parabolic derivatives of func-
tions, which is beyond our present scope. We refer the reader to Bonnans,
Cominetti et al. (1999), Rockafellar and Wets (1998) and Penot (1994b) and
the bibliographies therein for recent accounts of the use of these concepts in
optimality conditions. Important earlier works include Ben-Tal (1980), Ben-Tal
et al. (1982), Ioffe (1990) and Ioffe (1991). Some results relating parabolic
derivatives to subjet-like notions may be found in Eberhard (2000).
Where possible we adhere to the terminology and notation of the book 'Variational
Analysis' by Rockafellar and Wets (1998), with the notable exception
of the proximal subdifferential. Henceforth we assume $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ to be lower
semi-continuous, minorized by a quadratic function, and $\bar x \in \operatorname{dom} f$.
Here $x' \to_f x$ means $x' \to x$ and $f(x') \to f(x)$. In general the proximal
subdifferential is only contained in the subdifferential. Next we define one
concept that forces equality.
As noted in many papers (see, for example, Poliquin and Rockafellar (1996),
Poliquin et al. (1996) and Rockafellar and Wets (1998)) prox-regular functions
constitute an important class of nonsmooth functions in the applications of
nonsmooth analysis to optimization and thus are currently undergoing intense
study. The proximal subdifferential, when applied to the indicator function
$\delta_S(x)$ of a set $S$ (defined to be zero if $x \in S$ and $+\infty$ otherwise), gives rise to
the proximal normal cone $N_S^p(\bar x) := \partial_p \delta_S(\bar x)$. This may in turn be used via
limiting processes to define the normal cone, for which $\partial \delta_S(x) := N_S(x)$. For
an arbitrary set-valued mapping $F : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$, the normal cone to its graph
at a point $y \in F(x)$ gives rise to the coderivative mapping of Mordukhovich
(1976)-Mordukhovich (1994), Kruger et al. (1980) and later studies by Ioffe
(1986)-Ioffe (1989). We consider only the finite-dimensional case in this paper.
Here $T_{\operatorname{Graph} F}(x, y) := \limsup_{t \downarrow 0} \dfrac{\operatorname{Graph} F - (x, y)}{t}$ is the contingent tangent cone.
At this juncture the natural question arises as to whether there is any relationship
between $\partial^{2,-} f(x, y)$, $D(\partial_p f)(\bar x, \bar y)$ and $D^*(\partial_p f)(x, y)$. The answer to
this question is far from obvious, as on first inspection we note that $\partial^{2,-} f(x, y)$
is a set of operators while $D(\partial_p f)(\bar x, \bar y)$ and $D^*(\partial_p f)(x, y)$ are multifunctions
whose images are contained in $\mathbb{R}^n$. When $f \in C^2(\mathbb{R}^n)$ all notions coincide in
that
Here the equality for the coderivative utilizes the symmetry of the operator
$\nabla^2 f(\bar x)$, and the Pareto efficient subset of $\partial^{2,-} f(x, \nabla f(\bar x))$ is taken with respect
to the partial order induced by $P(n)$. Thus a possible relationship to
consider is whether $Q \in \partial^{2,-} f(x, y)$ is to have $Qw \in D^*(\partial_p f)(x, y)(w)$ or
$Qw \in D(\partial_p f)(x, y)(w)$. In Rockafellar et al. (1997) conditions are given (which
include prox-regularity of $f$ at $x$, $y$) under which we have the inclusion
When the polarity of the associated tangent and normal cones is taken into
account, one can see that this inclusion does imply a kind of symmetry property
for the elements of the contingent graphical derivative. This prompts one to
conjecture that some elements of $\{Qw \mid Q \in \partial^{2,-} f(x, \nabla f(\bar x))\}$ may also be
contained in $D(\partial_p f)(x, y)(w)$.
In this paper we extend the inclusion (1.3) by considering the relationship
of $\partial^{2,-} f(x, y)$ to $D(\partial_p f)(x, y)(w)$. To do so we need to extract the correct
operators from $\partial^{2,-} f(x, y)$. The geometry of $\partial^{2,-} f(x, y)$ is of critical importance
in this development. Let us now introduce some elements of this geometry
necessary to state our results. The Frobenius inner product $(Q, B) = \operatorname{tr} B^t Q$
induces the quadratic form $(Q, hh^t) = h^t Q h = (Qh, h)$ when applied to a
rank-one operator $hh^t$ ($= h \otimes h$ in tensor product notation). We call
It was established in Eberhard, Nyblom and Ralph (1998) that $E(A, w)$ may
be empty in some given directions. Despite this, we show in this paper that
for an arbitrary rank-one representer we have $E(A, w) \ne \emptyset$ for a dense subset
of $b_1(A) := \{w \mid q(A)(w) < +\infty\}$.
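The rank-one quadratic form mentioned above rests on the identity $(Q, hh^t) = h^t Q h$ for the Frobenius inner product, which is easy to confirm numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 4))
Q = (Q + Q.T) / 2                       # a symmetric operator in S(4)
h = rng.normal(size=4)

frob = np.trace(Q.T @ np.outer(h, h))   # (Q, h h^t) = tr((h h^t)^t Q)
quad = h @ Q @ h                        # h^t Q h
print(frob, quad)
```

The trace pairing against the rank-one matrix $hh^t$ therefore recovers exactly the quadratic form of $Q$ in the direction $h$.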
A multifunction $\Gamma : X \rightrightarrows Y$ is generated by a set of operators $A \subseteq S(n)$ if
$\Gamma(w) = Aw = \{Qw \mid Q \in A\}$. In Ioffe (1981) it is established that fans and
hence many derivative-like objects are, in general, not generated by a set of
linear operators. In this paper we investigate an alternative notion.
Then $(w, Qw) \in T_{\operatorname{Graph} \partial_p f}(x, y)$, the contingent cone to the graph of the proximal
subdifferential; that is, there exists a rank-1 selection of the graphical derivative.
Thus we are able to make a selection from these non-convex set-valued derivatives
using certain rank-1 exposed facets of a convex set of symmetric matrices.
The construction of such matrices is often possible and provides an
alternative path to the generation of elements within $D(\partial_p f)(x, y)(w)$ and
$D^*(\partial f)(\bar x, \bar y)(w)$. This is just the kind of selection which must be calculated
2 PRELIMINARIES
For a family of sets $\{C^v \mid v \in W\}$, $\limsup_{v \to w} C^v$ is defined as the set consisting of all cluster
points of sequences $\{u^n\}$ with $u^n \in C^{v^n}$ (for $n$ sufficiently large) and some
$v^n \to w$ as $n \to \infty$, while $\liminf_{v \to w} C^v$ consists of the points $u$ for which, for any
given sequence $v^n \to w$, there exists a convergent sequence $u^n$, with $u^n \in C^{v^n}$
(for $n$ sufficiently large) and with $u = \lim_{n \to \infty} u^n$. Clearly $\liminf_{v \to w} C^v \subseteq
\limsup_{v \to w} C^v$. When these coincide we say that $\{C^v \mid v \in W\}$ converges to $C$
and we write $C = \lim_{v \to w} C^v$.
Denote by $S(n)$ the set of all real $n \times n$ symmetric matrices and by $\mathbb{R}_+$ (respectively
$\overline{\mathbb{R}}$) the real interval $[0, +\infty)$ (respectively $(-\infty, +\infty]$). When $C$ is a
convex set in a vector space $X$, denote the recession directions of $C$ by $0^+ C =
\{x \in X \mid C + x \subseteq C\}$. When $C$ is not convex, we denote the horizontal directions
by $C^\infty := \{x \in X \mid \exists \rho_n \downarrow 0 \text{ and } c_n \in C \text{ such that } x = \lim_{n \to \infty} \rho_n c_n\}$.
The upper epi-limit $\text{e-ls}_{v \to w} f^v$ is the function having as its epigraph the inner
limit of the sets $\operatorname{epi} f^v$:
When these two are equal, the epi-limit $\text{e-lim}_{v \to w} f^v$ is said to exist. In this
case, $\{f^v\}_{v \in W}$ is said to epi-converge to $f$.
As $\text{e-li}_{v \to w} f^v(x) \le \text{e-ls}_{v \to w} f^v(x)$, epi-convergence of $f^v$ occurs
when $\text{e-ls}_{v \to w} f^v(x) \le f(x)$ and $f(x) \le \text{e-li}_{v \to w} f^v(x)$ for all $x$. The upper and
lower epi-limits of the sequence $f^v$ may also be defined via composite limits
(see Rockafellar and Wets (1998)). In particular
$$\text{e-li}_{v \to w} f^v(x) = \sup_{\delta > 0} \liminf_{v \to w} \inf_{x' \in B_\delta(x)} f^v(x').$$
As $-\partial^{2,-} f(x, -p) = \partial^{2,+} f(x, p)$ we study only the subjet. One may define
corresponding superjet quantities by reversing the inequality. It must be
stressed that these quantities may not exist everywhere, but $\partial^{2,-} f(x)$ and
$\partial_p f(x)$ are defined densely. If $f''_\downarrow(\bar x, p, h)$ is finite, then $f'_\downarrow(\bar x, h) = (p, h)$, where
the first-order lower epi-derivative is defined by
$$f'_\downarrow(\bar x, h) = \liminf_{t \downarrow 0,\ h' \to h} \frac{1}{t} \left( f(\bar x + t h') - f(\bar x) \right).$$
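For a simple nonsmooth function such as $f = |\cdot|$ the lower epi-derivative can be estimated by its defining difference quotient; for this particular $f$ the infimum over $h' \to h$ is attained at $h' = h$, so a plain one-sided difference suffices:

```python
def f(x):
    return abs(x)                # kink at the origin

def epi_deriv(f, x, h, t=1e-8):
    """Crude one-sided estimate of the lower epi-derivative
    f'(x; h) = liminf_{t -> 0, h' -> h} (f(x + t h') - f(x)) / t."""
    return (f(x + t * h) - f(x)) / t

# At the kink x = 0 the directional derivative is |h|: positively
# homogeneous in h, but nonlinear, so no gradient exists there.
print(epi_deriv(f, 0.0, 1.0), epi_deriv(f, 0.0, -1.0))
```

Both directions give the value 1, illustrating that $f'_\downarrow(0, h) = |h|$ is a sublinear function of the direction rather than a linear one.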
Another commonly used concept is that of the upper second-order epi-derivative
at $\bar x$ with respect to $z$ and $h$, which is given by
Using this tangent cone and the contingent tangent cone one may define
many derivative-like concepts using the multifunction $F(x) := f(x) + [0, +\infty)$
with graph $\operatorname{Graph} F = \operatorname{epi} f$. It is well known that the contingent cone to
$\operatorname{epi} f$ corresponds to the epigraph of the function $h \mapsto f'_\downarrow(x, h)$ (see Aubin et al.
(1990), Rockafellar (1989) and Rockafellar (1988)). Also proto-differentiability
of $\operatorname{epi} f$ corresponds to first-order epi-differentiability of $f$.
The normal cone to a set S is given by
This prompts the following definitions (see Eberhard, Nyblom and Ralph
(1998)).
$$A := \left\{ A \in S(n) \mid (A, uu^t) \le q(A)(u) \text{ for all } u \in \mathbb{R}^n \right\}.$$
4. The rank-1 (resp. $\varepsilon$-rank-1) supported points in the direction $u$ are given
respectively by
$$E(A, u) := \{ A \in A \mid (A, uu^t) = q(A)(u) \} \quad \text{and} \quad
E_\varepsilon(A, u) := \{ A \in A \mid (A, uu^t) \ge q(A)(u) - \varepsilon \}.$$
5. When $-P(n) \subseteq 0^+ A \subseteq S(n)$ we define the symmetric rank-one barrier
cone as $b_1(A) := \{ u \in \mathbb{R}^n \mid q(A)(u) < \infty \}$.
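For a finite (hypothetical) family of symmetric matrices the symmetric rank-one support $q(A)(u)$ and the supported sets $E(A, u)$, $E_\varepsilon(A, u)$ of the definitions above can be computed directly; in the text $A$ is a convex and generally unbounded set, so this is only a finite toy model:

```python
import numpy as np

# A finite stand-in for a rank-one representer (illustrative data only).
A = [np.diag([1.0, -2.0]),
     np.array([[0.0, 1.0], [1.0, 0.0]]),
     np.diag([-1.0, 3.0])]

def q(u):
    """Symmetric rank-one support q(A)(u) = sup_{Q in A} (Q, u u^t)."""
    return max(float(u @ Q @ u) for Q in A)

def E(u, eps=0.0):
    """(eps-)rank-1 supported points of A in the direction u."""
    return [Q for Q in A if u @ Q @ u >= q(u) - eps - 1e-12]

u = np.array([1.0, 0.0])
print(q(u), len(E(u)), len(E(u, eps=2.0)))
```

With a finite family the supremum is always attained, so $E(A, u) \ne \emptyset$ here; the possible emptiness of $E(A, u)$ discussed in the text arises only for unbounded convex representers.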
The subjet is always a closed convex set of matrices while $\partial^2 f(\bar x, p)$ may not
be convex, just as $\partial_p f(\bar x)$ is convex while $\partial f(\bar x)$ often is not. Eberhard, Nyblom
and Ralph (1998) observed that
$$\liminf_{t \downarrow 0,\ h' \to h} \frac{2}{t^2} \left( f(\bar x + t h') - f(\bar x) - t(p, h') \right) =: f''_s(\bar x, p, h).$$
Hence if we work with subjets we are in effect dealing with objects dual to the
lower, symmetric, second-order epi-derivative. The study of second-order di-
rectional derivatives as dual objects to some kind of second order subdifferential
was begun in J.-B. Hiriart-Urruty (1986), Seeger (1986) and Hiriart-Urruty
et al. (1989). These works studied the particular case where f is a convex
function.
In general we have for all $h$ (see Ioffe and Penot (1997)) that
$$q(\partial^2 f(\bar x, p))(h) = \sup\{ (Q, hh^t) \mid Q \in \partial^2 f(\bar x, p) \} \le f^{\uparrow\uparrow}(\bar x, p, h)$$
In this section we outline some important facts regarding the geometry of rank-1
sets which are relevant to later sections. We show that there is a dense set
of directions $h \in b_1(A)$ for which a rank-1 representer $A$ has $E(A, h) \ne \emptyset$. We
discuss also some results from previous papers relevant to the characterization
of rank-1 exposed operators, as these figure strongly in the later development
of the paper. One can completely characterise symmetric rank-one supports.
The following characterization may be found in Eberhard, Nyblom and Ralph
(1998). This result generalizes those of J.-B. Hiriart-Urruty (1986) and Seeger
(1986), which treat only the case when the symmetric rank-1 support happens
to be a convex function. The results of J.-B. Hiriart-Urruty (1986) and Seeger
(1986) are sufficient to study the second-order derivative notion for convex
functions (see, for example, Seeger (1992), Hiriart-Urruty et al. (1989) and
Seeger (1994)), while Theorem 3.1 below is applicable to a more general study
of second-order epi-derivatives. Important as these earlier works are, they do
not figure in the present development.
Theorem 3.1 Let $p : \mathbb{R}^n \to \overline{\mathbb{R}}$ be proper (that is, $p(u) \ne -\infty$ anywhere). For
$u, v \in \mathbb{R}^n$, define $q(u, v) = \infty$ if $u$ is not a positive scalar multiple of $v$ or vice
versa, and $q(\alpha u, u) = q(u, \alpha u) = \alpha p(u)$ for $\alpha \ge 0$.
Then $q$ is a symmetric rank-one support of a set $A \subseteq S(n)$ with $-P(n) \subseteq
0^+ A$ if and only if
2. $p$ is lower semicontinuous;
Even in finite dimensions there may not exist, in every direction, an operator
in a given rank-1 representer achieving the symmetric rank-one support. Thus
some caution is required in the subsequent development when using rank-1
exposed facets and operators.
As noted earlier we endow $S(n)$ with the Frobenius inner product $(Q, A) =
\operatorname{tr} A^t Q$. The natural norm induced by this inner product is the so-called
projective norm $\|A\|_{\text{proj}}$, given by
Theorem 3.2 Let $\dim S(n) = m$ $(= \tfrac{1}{2} n(n+1))$. Then if $A$ is a rank-one
representer which has $-P(n) \subseteq 0^+ A$, we have for any $Q \in P(n)$ that
$$S(A, Q) = \inf\left\{ \sum_{i=1}^{l} q(A)(u_i) \;\middle|\; \sum_{i=1}^{l} u_i u_i^t = Q \text{ for some } l \le m + 1 \right\}.$$
On the space $S(n)$ we may define the barrier cone to $A$ (as a convex subset)
by $b(A) := \{P \in S(n) \mid S(A, P) < \infty\}$.
Recall that an exposed point of $A$ is the unique maximizer $Q$ in $A$ of a linear
function $(\cdot, P)$. Recall also that $P(n)$ induces a natural ordering on the space
$S(n)$ with respect to which we may define a Pareto (or undominated) subset of
the rank-1 representer $A$. We quote next another result from Eberhard (2000)
that we require in this section.
that for all $v \in (u)^\perp$ with $\|v\| = 1$, we have $\sum_{i \in F} (u_i, v)^2 = (Q_\delta, vv^t)$ and so
$\sum_{i \in F} (u_i, v)^2 \le b^2$, as $(Q_\delta, vv^t) \le \|Q_\delta\| \|v\|^2 < b^2$. By a theorem of Hörmander
(see Holmes (1975)) we have for $M = \{\alpha u \mid \alpha \in \mathbb{R}\}$ that there exists $v \in
(u)^\perp \cap B_1(0)$ such that
Finally take $zz^t$ with $z = v\|u\|$ and note that $(u_i, z) = \alpha_i \|u\|$. We compose $zz^t$
with $Q = \sum_{i \in F} u_i u_i^t = uu^t + Q_\delta$ to get
Here the $\alpha_i$ are the accumulation points of the $\alpha$'s in Lemma 3.1. Thus
for any $\varepsilon > 0$ we have for $m \ge m_0$ that $\|u_i^m - \alpha_i u\| \le \alpha_i \varepsilon$ for all $i$ with $\alpha_i \ne 0$
and $\|u_i^m\| \le \varepsilon$ for all $i$ with $\alpha_i = 0$. For each $m$, the supporting point $A_m \in A$
has $S(A, P_m) = (A_m, P_m)$. By Corollary 3.1 there exists a representation
$P_m = \sum_{i=1}^{l} u_i^m (u_i^m)^t$ containing at least as many linearly independent $u_i^m$ as
the rank of $P_m$ but no more than $\dim S(n) + 1$, such that for each $i$
This is finite when the function $f$ is proper, lower semi-continuous and minorized
by a quadratic function $-\frac{r}{2} \|\cdot - y\|^2$ with $r > 0$ (see Rockafellar
and Wets (1998), Example 1.44). In this case each $f_\lambda$ is proper and lower semi-continuous.
The supremum of all parameters $r$ is called the prox-threshold
of $f$. It is well known that $f_\lambda$ is pointwise nondecreasing as $\lambda$ decreases
and epi-converges to $f$ as $\lambda \downarrow 0$ (see Rockafellar and Wets (1998), Example
1.44 and Proposition 7.4(d)). Note that even when $\operatorname{int} \operatorname{dom} f = \emptyset$ we have $f_\lambda$
finite-valued everywhere (for $\lambda$ sufficiently small) as long as $f$ is quadratically
minorized (or prox-bounded). Thus $\operatorname{dom} f_\lambda = \mathbb{R}^n$. Moreover $f_\lambda$ can be shown
to be locally Lipschitz continuous. In Eberhard, Nyblom and Ralph (1998) we
may find the following series of results.
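The listed properties of the Moreau envelope $f_\lambda$ can be observed numerically for $f = |\cdot|$, assuming the standard normalization $f_\lambda(x) = \inf_y \{ f(y) + \frac{1}{2\lambda}\|y - x\|^2 \}$: the envelope is finite everywhere and pointwise nondecreasing as $\lambda$ decreases:

```python
import numpy as np

def f(y):
    return np.abs(y)             # a simple prox-bounded l.s.c. function

def moreau(f, x, lam, grid):
    """Moreau envelope f_lam(x) = min_y { f(y) + |y - x|^2 / (2 lam) },
    approximated by minimizing over a fine grid."""
    return np.min(f(grid) + (grid - x) ** 2 / (2 * lam))

grid = np.linspace(-5, 5, 200001)
x = 2.0
# closed form for f = |.|: f_lam(x) = |x| - lam/2 whenever |x| >= lam
vals = [moreau(f, x, lam, grid) for lam in (1.0, 0.5, 0.1)]
print(vals)

# pointwise nondecreasing with decreasing lam, and below f itself
assert vals[0] <= vals[1] <= vals[2] <= f(x)
```

The grid minimization is a blunt instrument, but it makes the epi-convergence $f_\lambda \uparrow f$ as $\lambda \downarrow 0$ visible at a single point.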
and so
$$q^*_{I + \lambda Q}(h) = \begin{cases}
-\infty & \text{if } I + \lambda Q \notin P(n), \\
q_{(I + \lambda Q)^+}(h) & \text{if } I + \lambda Q \in P(n) \setminus \operatorname{int} P(n) \text{ and } h \in \operatorname{Im}(I + \lambda Q), \\
q_{(I + \lambda Q)^{-1}}(h) & \text{if } I + \lambda Q \in \operatorname{int} P(n), \\
+\infty & \text{if } h \notin \operatorname{Im}(I + \lambda Q).
\end{cases}$$
Here $\operatorname{Im}(I + \lambda Q)$ denotes the image or range and $(I + \lambda Q)^+$ the Moore-Penrose
inverse. Thus $I + \lambda Q \in \operatorname{int} P(n)$ is a necessary and sufficient condition for
$(q_Q)_\lambda = q_{Q_\lambda}$, where
Such results have a long history when one notes that $Q_\lambda$ is constructed via a
'parallel sum' (see Mazure (1996), Anderson et al. (1969) and Seeger (1991)).
The following result is very useful in subsequent proofs (see Eberhard, Nyblom
and Ralph (1998)).
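For a convex quadratic $q_Q(x) = \frac{1}{2} x^t Q x$ the parallel-sum construction can be checked directly. Assuming the standard parallel-sum form $Q_\lambda = Q(I + \lambda Q)^{-1}$, the Moreau envelope of $q_Q$ is again a quadratic with matrix $Q_\lambda$, i.e. $(q_Q)_\lambda = q_{Q_\lambda}$:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
Q = B @ B.T + 0.1 * np.eye(3)   # positive definite, so I + lam*Q is too
lam = 0.5
I3 = np.eye(3)

# the parallel-sum matrix
Q_lam = Q @ np.linalg.inv(I3 + lam * Q)

# Moreau envelope of q_Q at h, via the closed-form minimizer:
# minimize (1/2) y^t Q y + |y - h|^2 / (2 lam)  =>  y* = (I + lam Q)^{-1} h
h = rng.normal(size=3)
y_star = np.linalg.solve(I3 + lam * Q, h)
env = 0.5 * y_star @ Q @ y_star + np.dot(y_star - h, y_star - h) / (2 * lam)
print(env, 0.5 * h @ Q_lam @ h)
```

Since $Q$ and $(I + \lambda Q)^{-1}$ commute, $Q_\lambda$ is symmetric, and the two printed numbers agree, which is the content of $(q_Q)_\lambda = q_{Q_\lambda}$ in the invertible case.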
The next result allows us to study $E(A, \cdot)$ via the more regular $E(A_\lambda, \cdot)$,
which implies $Q_\lambda \in E(A_\lambda, h_\lambda)$.
It is widely recognised that the differential information extracted from the functions
in such $\Phi$-subdifferentials provides information regarding certain subdifferentials
of nonsmooth analysis (see Rockafellar and Wets (1998)). This section
concerns itself with the problem of quantifying this relationship more formally.
Notions of abstract convexity were introduced in Janin (1973) and devel-
oped later in Balder (1977) and Dolecki et al. (1978). This approach has a
long history. Many earlier papers were concerned with the case of paraconvex/paraconcave
functions (or strong/weak convexity) (see Penot and Volle
(1988) and Vial (1983)). Essentially this corresponds to taking the class $\Phi$ to
consist of quadratics with a fixed maximum negative curvature. This restriction
is dispensed with in Dolecki et al. (1978) and greatly generalized via the
use of abstract 'dualities' (see Martinez-Legaz (1988) and Martinez-Legaz and
Singer (1995)). This approach is detailed in the text of Singer (1997), and it is
developed in a different direction and applied to many optimization
problems in Rubinov (2000).
The approach discussed above is more general than required here as we
require only the use of abstract conjugations in the spirit of Pallaschke and
Rolewicz (1998). This approach has been exploited in Eberhard, Nyblom and
Ralph (1998)-Eberhard (2000) to study the approximate subdifferential and
consequently the basic subdifferential along with certain second-order deriva-
tive concepts. It has long been recognised that abstract convexity gives infor-
mation about the subdifferentials of nonsmooth analysis. One of the contribu-
tions of Eberhard and Nyblom (1998) was to show that, in finite dimensions,
the study of the $\Phi_2$-subdifferential was equivalent to the study of the proximal
subdifferential for the class of lower semi-continuous, prox-bounded functions.
To establish this one must show how to extend the local inequalities (1.1) and
(1.2) to ones which hold globally. The reason for doing this is that generalized
conjugates require global suprema rather than local ones. We present a result
of this kind in this section but defer the long proof to an appendix.
As we are mainly concerned with sub-Taylor expansions, the class
Lemma 4.1 Suppose there exists a function $\omega(\cdot) : \mathbb{R}_+ \to \mathbb{R}$ with $\lim_{t \downarrow 0} \omega(t) = 0$
such that
belongs to $\partial_{J_2(\bar x)} f(\bar x)$.
Proof It follows from Lemma 4.1 that this may be achieved locally around $\bar x$.
To extend outside this neighbourhood we follow the proof of Eberhard, Nyblom
and Ralph (1998), Proposition 6, noting that we may begin with equation (4.1)
of Eberhard, Nyblom and Ralph (1998) replaced by (4.1) locally about $\bar x$, that
is, with $-r(\|\bar x - x\|)\|\bar x - x\|^2 \in C^2(\mathbb{R}^n)$ replacing the term $-\lambda\|x - \bar x\|^2$. The
argument of Eberhard, Nyblom and Ralph (1998), Proposition 6, now applies
and establishes the result.
In Eberhard, Nyblom and Ralph (1998) and Eberhard and Nyblom (1998) it is
shown that the proximal subdifferential of Rockafellar, Mordukhovich and Ioffe,
denoted by $\partial_p f(x)$, is equivalently characterised via $\partial_p f(x) = \{\nabla \varphi(u)|_{u=x} \mid \varphi \in
\partial_{\Phi_2} f(x)\} := \nabla \partial_{\Phi_2} f(x)$. This is a set of elements from $X^*$ rather than a set
of nonlinear functions defined on $X$. One may see that $\partial_p f(x) \subseteq \nabla \partial_\Phi f(x)$ for
any class $\Phi_2 \subseteq \Phi \subseteq C^2(\mathbb{R}^n)$. It should be noted that such $\Phi$-convex functions
are simply those lower semi-continuous functions which are bounded below by
some $\varphi \in \Phi$. Traditionally this has been termed $\Phi$-bounded (see Dolecki et al.
(1978)). If $\varphi \in \partial_{C^2} f(\bar x)$, then by taking $c > \rho(\nabla^2 \varphi(\bar x))$ (the spectral radius)
we have $\psi \in \partial_{\Phi_2} f(\bar x)$ with $\psi(x) := \varphi(\bar x) + (\nabla \varphi(\bar x), x - \bar x) - \frac{c}{2}\|x - \bar x\|^2$, since
locally around $\bar x$
We may 'globalize' this inequality using Proposition 2.2 of Eberhard and Nyblom
(1998). This globalization property for $\Phi_2$-bounded functions may be
generalized to the class $C^2(\mathbb{R}^n)$ as shown by Proposition 6 in Eberhard, Nyblom
and Ralph (1998). Thus we always have $\partial_p f(\bar x) = \nabla \partial_\Phi f(\bar x)$ for any class
$\Phi_2 \subseteq \Phi \subseteq C^2(\mathbb{R}^n)$.
If a function is not $-\infty$ anywhere and $f \not\equiv +\infty$ (that is, $f$ is proper), then
to be a supremum of functions it must be at least bounded below by one
such function, that is, $\Phi_2$-bounded. When one is interested only in the local
differentiability properties of the function, this assumption may be dropped by
setting $f(x) = +\infty$ for $x \notin B_\delta(\bar x)$. If $f$ is lower semi-continuous then $\delta > 0$
may be chosen so that the resultant function is actually bounded below by a
constant. This will not affect the local differentiability properties of $f$ at $\bar x$.
We use the notation $\varepsilon > 0$ to mean that $\varepsilon(t) > 0$ for all $t > 0$. As is usual,
$(\varepsilon + \lambda)(\cdot) := \varepsilon(\cdot) + \lambda(\cdot)$ for any $\varepsilon$ and $\lambda \in E$. We have deliberately left the
precise choice of $E$ open for the time being, but note that in Eberhard and Nyblom
(1998) it is shown that for the choice $E = \{\varepsilon t \mid \varepsilon > 0\}$ the quantities $\partial^\varepsilon_{\Phi_2} f(x)$
approximate the proximal subdifferential in that for all $\delta > 0$ and any $\varepsilon \ge 0$
we have (slightly abusing notation in suppressing the $t$ for $\varepsilon t \in E$)
In particular this implies that $\nabla \partial^\varepsilon_{\Phi_2} f(x)$ in effect estimates the closed sets
$\partial^-_\varepsilon f(\bar x) = \partial^- f(\bar x) + \varepsilon B_1(0)$, where $\partial^- f(\bar x) = \{z \mid (z, y) \le f'_\downarrow(\bar x; y) \text{ for all } y\}$ is
the lower Dini subdifferential. As $\varepsilon \downarrow 0$ both will approximate the closure of
the proximal subdifferential $\overline{\partial_p f(x)} = \overline{\nabla \partial_{\Phi_2} f(\bar x)}$. In order to drop the closure
operation we need the following concept.
With this in hand it is easy to extend Eberhard and Nyblom (1998) Propo-
sition 5.1 to the form we state next without proof.
where $-\omega(\|x - \bar{x}\|)$ induces a mapping $v(\cdot): \mathbb{R}_+ \to \mathbb{R}$. Now
apply Lemma 4.1 to obtain $r(\cdot): \mathbb{R}_+ \to \mathbb{R}_+$ with $r(\cdot)\|\cdot - \bar{x}\|^2 \in C^2(\mathbb{R})$ such
that
$$\varphi(x) := \langle p, x - \bar{x}\rangle + \tfrac{1}{2}\langle Q, (x - \bar{x})(x - \bar{x})^t\rangle - r(\|x - \bar{x}\|)\|x - \bar{x}\|^2 \in \partial_{J_2(\bar{x})} f(\bar{x}).$$
Then $(p, Q) \in \Delta\partial_{J_2(\bar{x})} f(\bar{x})$ as required. As the reverse inequality is always true,
the result $\nabla^2\partial_{J_2(\bar{x})} f(\bar{x}, p) = \partial^{2,-} f(\bar{x}, p)$ follows. Now apply this result to the
function $x \mapsto f(x) + \varepsilon\|x - \bar{x}\|^2$, noting that $\varepsilon\|\cdot - \bar{x}\|^2 \in J_2(\bar{x})$. By Lemma 4.2
we have $\partial_{J_2(\bar{x})}\big(f + \varepsilon\|\cdot - \bar{x}\|^2\big)(\bar{x}, p) = \partial_{J_2(\bar{x})} f(\bar{x}, p) + \varepsilon\|\cdot - \bar{x}\|^2$. We have via an
elementary argument that $\partial^{2,-}\big(f + \varepsilon\|\cdot - \bar{x}\|^2\big)(\bar{x}, p) = \partial^{2,-} f(\bar{x}, p) + 2\varepsilon I$. This
gives
Because the closure is required to relate $\nabla\partial_{C^2} f(x)$ and $\partial_p f(x)$, we can equate
$\nabla^2\partial_{C^2} f(\bar{x})$ and $\partial^{2,-} f(x)$ only under the assumption of $\mathcal{E}$-proximal regularity.
The next result is taken from Eberhard and Nyblom (1998).
Proof We sketch the proof, leaving the details to the reader. First note that
when $f$ is prox-regular at $\bar{x}$ it is also locally prox-regular, and so is $f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2$.
The addition of $\varepsilon\|\cdot - \bar{x}\|^2$ will only help to ensure that $T$, the $f$-attentive $\gamma$-localization
of $\partial\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x)$ (see Poliquin and Rockafellar (1996) or
Rockafellar and Wets (1998) for definitions), will have $T + rI$ monotone (use
Theorem 3.2 of Poliquin and Rockafellar (1996) and subdifferential calculus on the
sum). In addition note that an $f$-attentive $\eta$-neighbourhood of $\bar{x}$ is contained
in an $\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)$-attentive $(\eta + \varepsilon\eta)$-neighbourhood of $\bar{x}$. Thus in an $f$-attentive
neighbourhood (that is, $\|x - \bar{x}\| < \eta$ and $\|f(x) - f(\bar{x})\| < \eta$) of $\bar{x}$ we
have $\partial\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x) = \partial_p\big(f(\cdot) + \varepsilon\|\cdot - \bar{x}\|^2\big)(x)$ and so
Theorem 5.1 Suppose that $f$ is $\Phi$-convex and proper and $\Phi$ is a convex conic
subset of mappings from $X$ to $\mathbb{R}$. Then for any $v: X \to \mathbb{R}$ with $\frac{1}{t}\big(v(x + t(\cdot)) - v(x)\big) \in \Phi$
and $\varphi \in \partial_\Phi f(x)$ we have

Proof We apply the Fenchel equality using the facts that $\Phi$ is a convex conic
subset and the evaluation mapping $\psi \mapsto x(\psi) := \psi(x)$ is linear for $\psi \in \Phi$.
For $\varphi \in \partial_\Phi f(x)$ take the conjugate $\big(\tfrac{1}{2}\Delta_2 f(x, t, \varphi, \cdot)\big)^c(\psi) = \sup_h\big(\psi(h) - \tfrac{1}{2}\Delta_2 f(x, t, \varphi, h)\big)$,
where $\psi(h) := \frac{1}{t}\big(v(x + th) - v(x)\big)$. On using $f^c(\varphi) + f(x) = \varphi(x)$ we obtain that
$$= \frac{1}{t}\sup_h\big((\varphi + tv)(x + th) - f(x + th) - f^c(\varphi) - t\,v(x)\big)
= \frac{1}{t}\big(f^c(\varphi + tv) - f^c(\varphi)\big) - v(x).$$
Thus we have $\frac{1}{t}\big(v(x + t(\cdot)) - v(x)\big) \in \partial_\Phi\big(\tfrac{1}{2}\Delta_2 f(x, t, \varphi, \cdot)\big)(h)$ if and only if
this expression equals $\frac{1}{t}\big(v(x + th) - v(x)\big)$. Substituting $f^c(\varphi) + f(x) = \varphi(x)$
again and cancelling gives
$$\frac{1}{t}\big(f^c(\varphi + tv) + f(x + th) - \varphi(x + th)\big) - v(x) = \frac{v(x + th) - v(x)}{t},$$
which is equivalent to $f^c(\varphi + tv) + f(x + th) = (\varphi + tv)(x + th)$, or $\varphi + tv \in \partial_\Phi f(x + th)$.
This is true if and only if $(x + th, \varphi + tv) \in \operatorname{Graph}\partial f$, or
$$(h, v) \in \frac{\operatorname{Graph}\partial f - (x, \varphi)}{t},$$
which implies
$$\big(h, \nabla^2\varphi(x)h + \nabla v(x)\big) \in \frac{\operatorname{Graph}\partial_p f - (x, \nabla\varphi(x))}{t} + o(1)B_1.$$
We thus define $\Delta_2 f(x, t, \nabla\varphi, h) := \frac{2}{t^2}\big(f(x + th) - f(x) - t\langle\nabla\varphi(x), h\rangle\big)$ and
obtain
1. The condition
for all $t > 0$ sufficiently small implies, for all $\rho > 0$ and $w' \in X$, that
the corresponding quotient lies in
$$\frac{1}{t}\big(v(x + t(h' + \rho w')) - v(x)\big) - \frac{1}{t}\big(v(x + th) - v(x)\big).$$
This implies
64 OPTIMIZATION AND CONTROL WITH APPLICATIONS
or
Indeed in this case there exists a $\delta > 0$ such that for all $w' \in B_\delta(w)$ and $\rho < \delta$
we have
$$f''_+(x, \nabla v(x), h + \rho w') = f''_-(x, \nabla v(x), h + \rho w'). \qquad (5.7)$$
$$\liminf_{\rho\downarrow 0,\ w'\to w} \frac{1}{\rho}\Big(f''_+(x, \nabla v(x), h + \rho w') - f''_+(x, \nabla v(x), h)\Big) \qquad (5.8)$$
and assume once again this limit infimum is finite. We address three subcases.
The first two are when either $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), h + \rho_n w_n)$
for $n$ sufficiently large, or $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), -(h + \rho_n w_n))$
for all $n$ sufficiently large. With either we have, via the finiteness of (5.8), that
in the former case (5.6) holds for all $n$; in the latter, after renaming $h$ as
$-h$ and $w$ as $-w$, (5.6) then holds for $n$ sufficiently large. The third subcase
is when $f''_+(x, \nabla v(x), h + \rho_n w_n) = f''_-(x, \nabla v(x), \pm(h + \rho_n w_n))$ infinitely often.
We may reduce this to (5.6) on taking a subsequence along which a positive
sign is always chosen.
The only case remaining occurs when the limit infimum is not finite in which
case any arbitrarily chosen sequence will attain the limit infimum. Thus we may
assume that the limit infimum in (5.5) is attained by a subsequence along which
(5.6) holds for all n.
Now consider $\rho_n \downarrow 0$ and $w_n \to w$ achieving the limit infimum (5.9).
When this is infinite we have (5.10) holding trivially, since (5.4) implies the
quotient in (5.9) is bounded below by $\langle 2\nabla^2 v(x), ww^t\rangle$. Finiteness of the limit
implies finiteness of the limit (5.5). Arguing as before we may once again
assume that (5.6) holds along this sequence. Then when $\nabla v(x) = 0$ we have
by (5.7) and (5.4) that for arbitrary $w$
$$\liminf_{\rho\downarrow 0,\ w'\to w} \frac{1}{\rho}\Big(f''_+(x, \nabla v(x), h + \rho w') - f''_+(x, \nabla v(x), h) - \rho\langle 2\nabla^2 v(x)h, w\rangle\Big)$$
This section explores the relationship between subjets and coderivatives. This
is achieved by first obtaining results relating the subjet and the contingent
graphical derivative and then appealing to the results of Rockafellar et al.
(1997). This will allow us to explore the connection between the symmetric
operators used in subjets and the symmetry notions introduced in Rockafellar
et al. (1997) for coderivatives. We find once again that the crucial operators
are those exposed by rank-one supports. In Dolecki et al. (1978) it was shown
that a necessary and sufficient condition for a function $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ to be
$\Phi$-convex (for $\Phi \subseteq C^2(\mathbb{R}^n)$) is for $f$ to be lower semi-continuous and
minorized by at least one element of $\Phi$ (that is, $\Phi_2$-bounded or alternatively
prox-bounded).
Recall that a function $f$ belongs to $C^{1,1}(\mathbb{R}^n)$ when its gradient exists everywhere
and the gradient itself is a locally Lipschitz function.
$$\le \frac{1}{t_n}\big(v(x + t_n y) - v(x)\big) - \frac{1}{t_n}\big(v(x + t_n h_n) - v(x)\big) \qquad (6.1)$$
where $v(x + ty) := v(x) - t\,\varepsilon(ty)\|y\|^2$ (with $v(x)$ taken as an arbitrary fixed
value) is differentiable at $x$ with $\nabla v(x) = 0$ and $\varepsilon(\cdot): \mathbb{R}^n \to \mathbb{R}$ is as given in
Lemma 4.1. If in addition we assume that $f$ is locally Lipschitz and $v$ may be
chosen so that it is strictly differentiable at $x$, which may be achieved if $f$ is
$C^{1,1}(\mathbb{R}^n)$, then
and so $(h, \nabla^2 v(x)h) \in T_{\operatorname{Graph}\partial f}(x, \nabla\varphi(x))$, the contingent tangent cone to
$\operatorname{Graph}\partial f$, and
$$\nabla_y^2 v(x + ty) = \nabla_y^2\big(f(x + ty)\big) = t^2\,\nabla^2 f(x + ty).$$
Thus almost everywhere we have $\|\nabla_y^2 v(x + ty)\| \le t^2\big(2\|\nabla^2 f(x + ty)\|\big) := t^2 L$,
and so for fixed $t$ we have that $y \mapsto v(x + ty)$ is $C^{1,1}$ with a Lipschitz constant $t^2 L$.
Next observe that for fixed $t > 0$ we have by the chain rule that $\nabla_y v(x + ty) =
t\,\nabla_u v(u)\big|_{u = x + ty} = t\,\nabla v(x + ty)$. Thus
$$\limsup_{y\to th'} \frac{\varepsilon(y)\|y\|^2 - \varepsilon(th')\|th'\|^2}{\|y - th'\|}
= \limsup_{y'\to h'} \frac{t^2\big(g(y') - g(h')\big)}{\|ty' - th'\|}
= \limsup_{y'\to h'} \frac{t\big(g(y') - g(h')\big)}{\|y' - h'\|}.$$
To prove (6.4) consider the Clarke subdifferential of the function $y \mapsto tg(y)$. As
$\frac{1}{t}\big(v(x + ty) - v(x)\big) = -g(y)$ we have $tg(y) = -(v(x + ty) - v(x))$. The chain
rule implies $-t\,\partial v(x + th') = \partial(tg)(h')$. As $v$ is strictly differentiable at $x$, we
have $\partial v(x + th') \to \nabla v(x) = 0$ as $t \downarrow 0$. Thus
$$tg(y) - tg(h') = \langle z(t), y - h'\rangle,$$
where $z(t) \in \partial(tg)(y')$ for some $y'$ in the interval between $y$ and $h'$. Thus by
the upper semi-continuity of the Clarke subgradient
$$\limsup_{y\to h'} \frac{|tg(y) - tg(h')|}{\|y - h'\|}
\le \sup\big\{\|z\| \;\big|\; z \in \partial(tg)(y') \text{ for } y' = \lambda y + (1 - \lambda)th' \text{ and } \lambda \in (0, 1)\big\} = o(t).$$
Hence, where the little "$o$" notation is taken to mean that implied by (6.4), define the
function $k(y) := f(x + y)$ and put $\psi(y) := \langle\nabla\varphi(x), y - t_n h_n\rangle + \tfrac{1}{2}\langle\nabla^2\varphi(x), yy^t -
t_n^2 h_n(h_n)^t\rangle \in C^2(\mathbb{R}^n)$. Then by (6.5) we have
$$(h, \nabla^2\varphi(x)h) \in \frac{\operatorname{Graph}\partial f - (x, \nabla\varphi(x))}{t_n} + o(1)B_1,$$
which immediately implies $(h, \nabla^2\varphi(x)h) \in T_{\operatorname{Graph}\partial f}(\bar{x}, \nabla\varphi(x))$.
Remark 6.1 The construct of Lemma 4.1 ensures that we can always find
a strictly differentiable remainder term for the second-order subjet expansion.
Unfortunately this construction does not preserve the first equation in (6.3) and
so we are unable to use it here.
Set $L_\lambda(x, p) := (x + \lambda p, p): \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, which is clearly linear and invertible.
The following is a corollary.
Now use Proposition 3.2. If $Q \in \mathcal{E}\big(\partial^{2,-} f(0, 0), h\big)$, then for all $\lambda > 0$ sufficiently
small we have $Q_\lambda \in \mathcal{E}\big(\partial_\lambda^{2,-} f(0, 0), h_\lambda\big)$, where $h_\lambda = (I + \lambda Q)h \to h$
as $\lambda \to 0$. Observe that Theorem 3.6 implies $\partial_\lambda^{2,-} f(0, 0) = \partial^{2,-} f_\lambda(0, 0)$ and
so $Q_\lambda \in \mathcal{E}\big(\partial^{2,-} f_\lambda(0, 0), h_\lambda\big)$. For this $Q_\lambda$ to satisfy the inclusion (6.9), we
need only show that $(f_\lambda)''_+(0, 0, h_\lambda) = (f_\lambda)''_-(0, 0, h_\lambda)$. First we observe that
$(f_\lambda)''_-(0, 0, h_\lambda) \le (f_\lambda)''_+(0, 0, h_\lambda)$ always holds. To establish the reverse inequality,
use Proposition 3.2 again with $A = \partial^{2,-} f_\lambda(0, 0)$ to establish
The last inequality follows via direct calculation as follows. From (6.6) we have
$f_\lambda(0) = f(0)$ and
Now use (3.5) and a Neumann series (see for example Anderson et al. (1969))
to deduce that for $\lambda > 0$ sufficiently small
Remark 6.2 Under the assumptions of Corollary 6.1, we have for all h that
In the following we use the convention $\sup\emptyset = -\infty$ and $\inf\emptyset = +\infty$. Put
and
Proof The containment (6.12) follows immediately from Theorem 6.1 and Corollary
6.1, noting that $\partial_p f = \partial f$ locally. The inequality follows from $w^t Q w =
f''_+(\bar{x}, \bar{v}, w) = f''_-(\bar{x}, \bar{v}, w)$ for all $Q \in \mathcal{E}\big(\partial^{2,-} f(\bar{x}, \bar{v}), w\big)$.
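The Neumann-series step invoked above (inverting $I - \lambda Q$ for small $\lambda$ by summing powers of $\lambda Q$) can be checked numerically; the matrix below is an arbitrary illustration, not data from the text:

```python
import numpy as np

# Neumann series: for ||A|| < 1, (I - A)^{-1} = sum_{k >= 0} A^k.
# Here A = lam * Q with lam chosen small enough for convergence.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))
lam = 0.1 / np.linalg.norm(Q, 2)     # ensures ||lam * Q||_2 = 0.1 < 1
A = lam * Q

S = np.zeros_like(A)
term = np.eye(4)
for _ in range(60):                  # partial sums of the series
    S += term
    term = term @ A

exact = np.linalg.inv(np.eye(4) - A)
assert np.allclose(S, exact, atol=1e-10)
```

With $\|\lambda Q\| = 0.1$ the truncation error after 60 terms is of order $10^{-60}$, so the partial sum agrees with the exact inverse to machine precision.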
$$\langle Qw, h\rangle \le \langle p, h\rangle \quad \text{for all } Q \in \mathcal{E}\big(\partial^{2,-} f(x, v), h\big) \qquad (6.16)$$
for all $n$ and such that $\pm w_n \in \operatorname{dom} f''_-(x, v, \cdot)$, $\pm w_n \to \pm\bar{w}$, and for either the
plus or minus sign we have $f''_+(x, v, \pm w_n) = f''_-(x, v, \pm w_n) \to f''_-(x, v, \pm\bar{w}) =
f''_+(x, v, \pm\bar{w})$. We may need to take the minus sign if $f''_-(x, v, \bar{w}) = f''_-(x, v, -w) =
f''_+(x, v, w)$. By relabeling the vectors we can assume without loss of generality
that we may use $w_n \to \bar{w}$. The existence of such a sequence may be established
by invoking Theorem 10.2 of Rockafellar (1970) on a simplicial convex subset
$S$ of $\operatorname{rel\text{-}int}\operatorname{dom} f''_-(x, v, \cdot) \cup \{\bar{w}\} \subseteq b^1\big(\partial^{2,-} f(x, v)\big)$ containing $\bar{w}$ as a vertex.
Also, as $b^1\big(\partial^{2,-} f(x, v)\big)$ is a polyhedral convex set, we may contain any sequence
$w_n \in b^1\big(\partial^{2,-} f(x, v)\big)$ with $w_n \to \bar{w}$ in such a simplex. Theorem 10.2 of
the cited reference states that the convex function $f''_+(x, v, \cdot) + r\|\cdot\|^2$ is upper
semi-continuous relative to $S$. As the function is by construction lower semi-continuous
on all of $S$, it must also be continuous at $\pm\bar{w}$. It follows that
$f''_-(x, v, w_n) \ge \langle p, w_n\rangle$ for all $p \in D^*(\partial_p f)(x, v)(w_n)$. Thus for any convergent
sequence $(p_n, -w_n) \in \big(T_{\operatorname{Graph}\partial_p f}(\bar{x}, v)\big)^\circ$ converging to $(p, -\bar{w})$, we have
$$\lim_{n\to\infty} f''_-(x, v, w_n) = f''_-(x, v, \bar{w}) \ge \langle p, \bar{w}\rangle.$$
As the graph of $D^*(\partial_p f)(x, v)(\cdot)$ equals $\big(T_{\operatorname{Graph}\partial_p f}(\bar{x}, v)\big)^\circ$, a closed convex
cone, $p$ may be taken as an arbitrary element of $D^*(\partial_p f)(x, v)(\bar{w})$. Thus
(6.13) holds for all $w \in b^1\big(\partial^{2,-} f(x, v)\big)$ with $f''_-(x, v, w) = f''_+(x, v, w)$.
$$f''(\bar{x}, \bar{v}, w) = S\big(D(\partial f)(\bar{x}, \bar{v})(w)\big)(w) \le S\big(D^*(\partial f)(\bar{x}, \bar{v})(w), w\big). \qquad (6.18)$$
Proof The first equality in (6.18) follows from an application of Corollary 6.2
of Poliquin and Rockafellar (1996), which states that (6.19) holds for all $w$.
Also, without loss of generality we may translate $\bar{v}$ to zero and consider the inequality
for the function $g(\cdot) := f(\cdot) - \langle\bar{v}, \cdot\rangle + \frac{r}{2}\|\cdot - \bar{x}\|^2$. By the results of Rockafellar
and Wets (1998) and Poliquin and Rockafellar (1996) (see also Corollary
6.1 of Eberhard (2000)), under the current assumptions $h \mapsto g''(\bar{x}, 0, h)$ is convex.
Hence (6.19) implies a representation in which
the subgradient $\partial_p$ coincides with $\partial$, the usual one from convex analysis.
Now use the fact that the support function of the convex subdifferential
$\partial\big(\tfrac{1}{2}g''(\bar{x}, 0, \cdot)\big)(w)$ in the direction $w$ is equal to the one-sided radial directional
derivative, and on removing the translations (using some basic calculus) we obtain the
equality in (6.18) holding for all $w$. The final inequality is true for all $w$, as
shown by R. T. Rockafellar and D. Zagrodny in Rockafellar et al. (1997) and
stated in Theorem 6.1.
In this section we give a few examples showing that formulae (6.12) and (6.15)
can be used to obtain estimates of graphical derivatives and coderivatives.
These estimates provide a connection between classical optimality concepts
based on derivatives of smooth functions and those using graphical derivatives.
Another approach which is successful in achieving this goal is the study of op-
timality conditions for functions (see Yang and Jeyakumar (1992)) and
for convex composite functions (see Yang (1998)) both of which are particular
examples of prox-regular functions.
We shall apply these ideas to a nonsmooth penalization of the Lagrangian
associated with a standard smoothly constrained mathematical programming
problem. Indeed we do not have to assume a priori any regularity of the
constraint set but allow a condition to arise out of the construction of the rank-
1 exposed facet of the subjet of the penalized Lagrangian. In this way, what
appears to be a new and in some ways a more refined second-order sufficiency
condition for a strict local minimum is derived.
For unconstrained nonsmooth functions the optimality conditions we inves-
tigate, when using various second-order subdifferential objects, are as follows.
The necessary (sufficient) condition of the second kind holds at $\bar{x}$ when we have
We shall say nothing here about conditions of the fourth kind, leaving this
to a later paper. The conditions of the first kind are easier to study. This was
first done by Auslender (1984), Studniarski (1986) and later by Ward (1995),
Ward (1994). Some related results may be found in Eberhard (2000). In this
context the sufficient optimality condition of the first kind is equivalent to the
concept of a strict local minimum of order two (see Studniarski (1986) and
Ward (1995)).
It is immediate from definitions that when (7.1) holds we have $f''_+(\bar{x}, 0, h) \ge \beta > 0$
in all directions $h$.
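The quadratic growth behind a strict local minimum of order two can be seen on a simple hypothetical function (the choice $f(x) = \|x\|^2$ and $\beta = 2$ below are illustrative, not from the text):

```python
import numpy as np

# f(x) = ||x||^2 has a strict local minimum of order two at xbar = 0:
# f(xbar + t*h) - f(xbar) >= (beta/2)*t**2 for unit h, with beta = 2.
f = lambda x: float(np.dot(x, x))
xbar = np.zeros(3)
beta = 2.0

rng = np.random.default_rng(1)
for _ in range(100):
    h = rng.standard_normal(3)
    h /= np.linalg.norm(h)           # unit direction
    for t in (1e-3, 1e-2, 1e-1):
        growth = f(xbar + t * h) - f(xbar)
        assert growth >= 0.5 * beta * t**2 - 1e-12
```

Here the growth bound holds with equality, which is why $\beta = 2$ is the largest admissible constant for this $f$.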
Remark 7.3 When $\bar{x}$ is a strict local minimum of order two, then by Remark 7.1
the sufficient conditions of the first kind hold. Consider the case
when $f'(\bar{x}, h) \ge 0$ and $f'(\bar{x}, -h) > 0$ with $f$ prox-regular and subdifferentially
continuous. Now $f'(\bar{x}, -h) > 0$ implies the existence of a $\delta > 0$ such that
$f(\bar{x} + t(-h)) - f(\bar{x}) \ge \delta t$ for $t$ small, so we have
$f''_+(\bar{x}, 0, -h) = +\infty$ and $f''_+(\bar{x}, 0, h) = f''_-(\bar{x}, 0, h)$. Now invoke Corollary
6.1 to obtain
$$0 < f''(\bar{x}, 0, h) \le S\big(D(\partial f)(\bar{x}, 0)(h)\big)(h).$$
Thus there exists a $p \in D(\partial f)(\bar{x}, 0)(h)$ such that $\langle p, h\rangle > 0$. This is precisely
the sufficient condition of the second kind.
The sufficient conditions of the third kind were studied (for the prox-regular
subdifferentially continuous function) in Poliquin et al. (1998) in association
with the concept of the tilt stable local minimum.
Definition 7.4 A point $\bar{x}$ is said to give a tilt stable local minimum of the function
$f: \mathbb{R}^n \to \overline{\mathbb{R}}$ if $f(\bar{x})$ is finite and there exists $\delta > 0$ such that the mapping
$$M: v \mapsto \operatorname*{argmin}_{\|x - \bar{x}\| \le \delta}\big\{f(x) - f(\bar{x}) - \langle v, x - \bar{x}\rangle\big\}$$
is single-valued and Lipschitz continuous on some neighbourhood of $0$ with $M(0) = \bar{x}$.
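For a simple hypothetical quadratic the tilt mapping $M$ can be computed in closed form and is visibly single-valued and Lipschitz (the function and radius below are illustrative choices, not from the text):

```python
import numpy as np

# Hypothetical illustration: f(x) = x**2, xbar = 0, delta = 1.  Then
# M(v) = argmin_{|x| <= 1} { x**2 - v*x } = v/2 for |v| <= 2,
# a single-valued Lipschitz mapping with M(0) = xbar = 0.
def M(v, delta=1.0):
    xs = np.linspace(-delta, delta, 200001)   # fine grid on the ball
    return xs[np.argmin(xs**2 - v * xs)]

for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert abs(M(v) - v / 2) < 1e-4           # matches closed form v/2
assert abs(M(0.0)) < 1e-8                     # M(0) = xbar
```

Since $M(v) = v/2$ here, $M$ has Lipschitz constant $1/2$ near $0$, so the tilt stability of this minimum is explicit.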
Proof Using (6.19), there exists a $p \in D(\partial_p f)(\bar{x}, 0)(h')$ if and only if
$\partial_p\big(\tfrac{1}{2}f''_-(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$, which is only possible if $f''_-(\bar{x}, 0, h') < +\infty$. Indeed, by Corollary 6.1 of
Eberhard (2000) (see also the results of Rockafellar and Wets (1998) and
Poliquin and Rockafellar (1996)), under the current assumptions $h \mapsto
f''_+(\bar{x}, 0, h) + r\|h\|^2 = g''_-(\bar{x}, 0, h)$ (where $g(\cdot) := f(\cdot) + \frac{r}{2}\|\cdot - \bar{x}\|^2$ as in (6.21))
is convex and proper (see Rockafellar and Wets (1998), Theorem 13.40). Thus
$-\infty < f''_-(\bar{x}, 0, \cdot)$ and consequently the directional derivative (6.21) is also never
equal to $-\infty$ in any direction. Invoking Theorem 23.3 of Rockafellar (1970), we
have the convex analysis subdifferential $\partial\big(\tfrac{1}{2}g''(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$ and consequently
$\partial_p\big(\tfrac{1}{2}f''(\bar{x}, 0, \cdot)\big)(h') \neq \emptyset$. Thus by (6.19) we have $h' \in \operatorname{dom} D(\partial_p f)(\bar{x}, 0)(\cdot) :=
\{h \mid D(\partial_p f)(\bar{x}, 0)(h) \neq \emptyset\}$ and (7.5) holding.
As $f''_s(\bar{x}, 0, h) = \min\{f''(\bar{x}, 0, h), f''(\bar{x}, 0, -h)\} < +\infty$ if and only if $\pm h \in
\operatorname{dom} D(\partial_p f)(\bar{x}, 0)(\cdot)$, we have (7.6) holding. Finally note that $f''_-(\bar{x}, 0, h') <
+\infty$ implies $f'(\bar{x}, h') \le 0$, and we have the relations preceding (7.6) holding.
1. The necessary conditions of the second kind imply the necessary conditions
of the first kind, while the sufficient conditions of the first kind hold if
and only if the sufficient conditions of the second kind hold.
2. The sufficient conditions of the first and second kind hold if and only if
$\bar{x}$ is a strict local minimum of order two.
and so $0 \le (<)\ f''(\bar{x}, 0, h)$ if and only if $0 \le (<)\ S\big(D(\partial f)(\bar{x}, 0)(h)\big)(h)$. Suppose
the necessary (sufficient) conditions of the first kind hold. Then when $h \in
\operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$ we have $f''_-(\bar{x}, 0, h) < +\infty$ and hence $f'(\bar{x}, h) \le 0$, implying
$0 \le (<)\ f''_-(\bar{x}, 0, h)$. The existence of $p \in D(\partial f)(\bar{x}, 0)(h)$ with $0 < \langle p, h\rangle$ follows
from (7.7) when $0 < f''(\bar{x}, 0, h)$. When the necessary conditions of the second
kind hold, then there exists a $p \in D(\partial f)(\bar{x}, 0)(h)$ with $0 \le \langle p, h\rangle$, and (7.7)
implies $0 \le f''(\bar{x}, 0, h)$.
When the necessary (sufficient) conditions of the second kind hold, take $h$
such that $f'(\bar{x}, h) \le 0$. Suppose first that $f''_-(\bar{x}, 0, h) < +\infty$, in which case $h \in
\operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$. Then (7.7) along with the necessary (sufficient) conditions
of the second kind imply $0 \le (<)\ f''_-(\bar{x}, 0, h)$. Otherwise $f''_-(\bar{x}, 0, h) = +\infty > 0$,
as required in the necessary (sufficient) conditions of the first kind. As the sufficient
conditions of the first kind hold if and only if $\bar{x}$ is a strict local minimum
of order two, the sufficient conditions of the second kind are also equivalent to $\bar{x}$
being a strict local minimum of order two.
Finally suppose $\bar{x}$ is a tilt stable local minimum, and hence the sufficient
conditions of the third kind hold at $\bar{x}$. Then (6.11) of Theorem 6.1 implies, for all
$h \in \operatorname{dom} D(\partial f)(\bar{x}, 0)(\cdot)$ and $p \in D(\partial f)(\bar{x}, 0)(h)$, that $0 \le (<)\ \langle p, h\rangle$. Thus the
sufficient conditions of the second kind follow and $\bar{x}$ is a strict local minimum
of order two.
When $A(x)$ is finite, the convex hull of the left-hand side of (7.8) is contained
in the right-hand side of (7.8).
Then
where
$$\bar{R}(x, h, p) = \Big\{\lambda \in \mathbb{R}^{|A|}_+ \;\Big|\; \exists (t_n, h_n) \to (0^+, h) \text{ such that } \lambda_\alpha = 0
\text{ for all } \alpha \in \{i \mid i \notin A(x + t_n h_n) \text{ for } n \text{ sufficiently large}\}\Big\}$$
and
$$\emptyset \neq Q(x, h, \lambda) = \Big\{\lambda \in \mathbb{R}^{|A|}_+ \;\Big|\; \lambda_\alpha = 0 \text{ if } \alpha \notin \bar{A}(x, h)\Big\}.$$
Proof The first containment for $f(x) = \sup_{\alpha\in A} f_\alpha(x)$ follows from definitions,
in that when $(p_\alpha, Q_\alpha) \in \partial^{2,-} f_\alpha(x)$ and $\alpha \in A(x)$ we have
where the last term is clearly of small order when $A(x)$ is countably finite. This
implies $\sum_{\alpha\in A(x)} \lambda_\alpha(p_\alpha, Q_\alpha) \in \partial^{2,-} f(x)$. Next note that if $\alpha \in \bar{A}(x, h)$, then
by the lower semi-continuity of $f$ we have for $x_n = \bar{x} + t_n h_n$ that
and so $\alpha \in A(x)$. Observe that in this case, if $f_\alpha \in C^2(\mathbb{R}^n)$ and $A$ is finite, then
$f$ is locally Lipschitz, prox-regular and subdifferentially continuous (see Example
2.9 of Poliquin and Rockafellar (1996)). Also $f$ is regular (see Rockafellar
(1982) regarding lower-$C^2$ functions) and semi-smooth (see Mifflin (1977) regarding
suprema of semi-smooth functions). Hence by the results of Spingarn
(1981) we must have $\partial f$ submonotone and hence directionally continuous in
the sense that if $(t_n, h_n) \to (0^+, h)$ and $p_n \in \partial f(x + t_n h_n)$ with $p_n \to p$, then
$$\liminf_{n\to\infty} \frac{1}{t_n^2}\big(f(\bar{x} + t_n h_n) - f(x) - t_n\langle p, h_n\rangle\big) = \frac{1}{2}\langle Q, hh^t\rangle$$
$$\Big\{\sum_{\alpha\in A(x)} \lambda_\alpha \nabla^2 f_\alpha(x) \;\Big|\; \lambda \in \mathbb{R}^{|A|}_+ \text{ and } \sum_{\alpha\in A(x)} \lambda_\alpha = 1
\text{ with } p = \sum_{\alpha\in A(x)} \lambda_\alpha \nabla f_\alpha(x)\Big\} \qquad (7.13)$$
In particular
$$0 = \sum_{i\in\bar{A}(x,h)} \lambda_i \nabla f_i(x) \quad\text{for some}\quad \sum_{i\in\bar{A}(x,h)} \lambda_i = 1 \text{ with } \lambda_i \ge 0.$$
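The stationarity condition over the active set can be checked on a hypothetical two-piece max function (the pieces below are illustrative choices, not from the text):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with f1(x) = x and f2(x) = -x, i.e. f(x) = |x|.
# At the minimizer x = 0 both pieces are active, and the convex
# combination lambda = (1/2, 1/2) of the active gradients vanishes:
#   0 = 0.5 * f1'(0) + 0.5 * f2'(0) = 0.5 * 1 + 0.5 * (-1).
grads = np.array([1.0, -1.0])   # gradients of the active pieces at x = 0
lam = np.array([0.5, 0.5])      # multipliers: lam >= 0, sum(lam) = 1
assert abs(lam @ grads) < 1e-15
assert np.all(lam >= 0) and abs(lam.sum() - 1.0) < 1e-15
```

The same multipliers certify $0 \in \partial f(0) = \operatorname{co}\{\nabla f_1(0), \nabla f_2(0)\} = [-1, 1]$.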
Proof By Example 2.9 of Poliquin and Rockafellar (1996) and the discussion
of Rockafellar and Wets (1998), we have all the assumptions of Corollary 6.2
holding.
A3 all the constraints fi are bounded below on Rn (once again this can easily
be arranged via a reformulation).
4. If $(\bar{x}, \bar{z}) \in X \times \mathbb{R}^m_+$ is a saddle point (as given in (7.15)), then $\bar{x}$ is an
optimal solution for the problem (7.14).
Writing $U(x, d) = g \circ F(x)$ as such a composite, for any fixed $(d, r)$ we have
$x \mapsto U(x, d, r)$ prox-regular and subdifferentially
continuous (see Example 2.9 of Poliquin and Rockafellar (1996)),
and, as observed before, also Clarke regular (see Rockafellar and Wets
(1998)), semi-smooth (see Mifflin (1977)), and hence submonotone and
directionally upper semi-continuous (see Spingarn (1981) and Rockafellar
(1982)). It is a very well-behaved function indeed. We now derive the
necessary optimality condition associated with this Lagrangian penalty
method.
Then the following are sufficient conditions for a strict minimum of order
two at $\bar{x}$ for the problem (7.14) with $X = \mathbb{R}^n$. Suppose there exists $d \in \mathbb{R}^m_+$
satisfying:
$$\bar{A}(\bar{x}, h) := \big\{i \in A(\bar{x}) \;\big|\; \exists (t_n, h_n) \to (0^+, h) \text{ such that } i \in A(\bar{x} + t_n h_n) \text{ for all } n\big\} \qquad (7.17)$$
and there exists $(\lambda_0, \lambda_1, \ldots, \lambda_m) \ge 0$ with $\lambda_i = 0$ if $i \notin \bar{A}(\bar{x}, h)$ and with
$\lambda_0 = 0$ if $\langle\nabla_x L(\bar{x}, d), h\rangle < 0$, such that
$$0 < \Big\langle \lambda_0 \nabla_x^2 L(\bar{x}, d) + \sum_{i\in\bar{A}(\bar{x},h)} \lambda_i \nabla^2 f_i(\bar{x}),\ hh^t\Big\rangle. \qquad (7.18)$$
i€ii(~,h)
Remark 7.6 We note that if (7.18) holds, then for any $r_i > 0$ there exists
$\lambda_i \ge 0$ with $\sum_{i\in\bar{A}(\bar{x},h)} \lambda_i = 1$ such that
$$\lambda_0 = \Big(\sum_{i\in\bar{A}(\bar{x},h)} \tfrac{1}{r_i} + \lambda_0\Big)^{-1}
\qquad\text{and}\qquad
\lambda_i = \tfrac{1}{r_i}\Big(\sum_{i\in\bar{A}(\bar{x},h)} \tfrac{1}{r_i} + \lambda_0\Big)^{-1}$$
for which $U(\bar{x}, d, r) = f_0(\bar{x})$ and the maximum is attained for the indices $A(\bar{x}) \cup
\{0\}$. By assumption 2 we have $0 \in \partial U(\bar{x}, d, r)$ and so
$$U(x_n, d, r) \ge U(\bar{x}, d, r) + \frac{1}{2}\Big\langle \big(Q - o(1)I\big),
\Big(\frac{x_n - \bar{x}}{\|x_n - \bar{x}\|}\Big)\Big(\frac{x_n - \bar{x}}{\|x_n - \bar{x}\|}\Big)^t\Big\rangle\, \|x_n - \bar{x}\|^2.$$
This implies by (7.19) that for $n$ sufficiently large and some $c > 0$ we have
since $\sum_{i=1}^m d_i f_i(\bar{x}) \le 0 = \sum_{i=1}^m \bar{d}_i f_i(\bar{x})$, as $f_i(\bar{x}) \le 0$ for all $i = 1, \ldots, m$. Thus
to verify that $\bar{x}$ is indeed a strict local minimum of order two for (7.20) we need
only consider whether it is also one for $x \mapsto U(x, d, \bar{r})$ on $B_\delta(\bar{x})$ for some $\delta > 0$. Indeed,
if $\bar{x}$ is a strict local minimum of order two for $x \mapsto U(x, d, \bar{r})$ on $B_\delta(\bar{x})$, then for all
feasible $x' \in B_\delta(\bar{x})$ we have $U(x', d, \bar{r}) = \max\{L(x', d), \bar{r}_1 f_1(x'), \ldots, \bar{r}_m f_m(x')\} =
L(x', d) \le f_0(x')$. On recalling that $f_0(\bar{x}) = U(\bar{x}, d, \bar{r}) = 0$, we thus have that
$\bar{x}$ is also a strict local minimum of order two for the problem (7.14).
Next note that the first-order condition for this amounts to (7.22).
We now use sufficient conditions of the first kind in the form of (7.4). We need
to show that for all $h$ with $U'(\bar{x}, d, \bar{r}, h) \le 0$ and $h \in b^1\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0)\big)$,
for some $\varepsilon > 0$ we have the existence of a $Q \in \mathcal{E}_\varepsilon\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0), h\big)$ with
$0 < \langle Q, hh^t\rangle$. As (7.22) holds we have $U'(\bar{x}, d, \bar{r}, h) \ge 0$, and hence we need to
consider only those $h$ such that $U'(\bar{x}, d, \bar{r}, h) = 0$.
As $1 - \lambda_0 = \sum_{i\in\bar{A}(\bar{x},h)} \lambda_i$, we have by (7.23) that $\lambda_0 \nabla L(\bar{x}, d) + (1 - \lambda_0)p_1 = 0$
and hence, with $Q'$ satisfying (7.19), that $\langle Q', hh^t\rangle > 0$. When $h \in b^1\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0)\big)$,
there always exists a $Q \in (Q' + P(n)) \cap \mathcal{E}_\varepsilon\big(\partial^{2,-}U(\bar{x}, d, \bar{r}, 0), h\big) \neq \emptyset$ such that
(7.19) holds, and so we have $\langle Q, hh^t\rangle > 0$; the desired conclusion follows from
(7.4).
8 APPENDIX
Lemma 4.1 claims that it is sufficient to consider only the class $J_2(0, \bar{x})$ generated
by
On reflection one can see that this corresponds to the second-order char-
acterization of subjets as provided in the first-order case by Proposition 1.2
(page 341) in Deville et al. (1993). We extract and extend to the second-order
level the relevant parts of this argument in the proof of Lemma 4.1.
Proof [Lemma 4.1] The only problem with $\omega$ in (4.1) is that it is not necessarily
$C^2(\mathbb{R}^n)$. As a $C^2(\mathbb{R}^n)$ bump function exists, the construction found in
Lemma 1.3 of Deville et al. (1993) (page 340) leads to a function $d: \mathbb{R}^n \to \mathbb{R}_+$,
which is well-defined as $\sum_{n=0}^{\infty} b(nx) \ge 1$. Without loss of generality we may assume
that $\omega(\cdot) \ge 0$. Now arguing as in Proposition 1.2 of Deville et al. (1993) (page
341) we define the required function via an integral, as in the proof of
Proposition 1.2 of Deville et al. (1993) (page 342). Thus we have
$$\nabla\psi(x) = p'_N(d(x))\nabla d(x) \quad\text{and}\quad
\nabla^2\psi(x) = p''_N(d(x))\nabla d(x)\nabla d(x)^t + p'_N(d(x))\nabla^2 d(x), \qquad (8.4)$$
for all small $x \neq 0$ (that is, $\varepsilon = O(\|x\|)$). Letting $M$ be a global bound for
$\|\nabla b(x)\|$ and $\|\nabla^2 b(x)\|$, we have, since $\nabla b(x)$ and $\nabla^2 b(x)$ are zero outside
the unit ball, corresponding bounds for $\nabla d$ and $\nabla^2 d$;
and since $\|x\| \le d(x) \le K\|x\|$ for $\|x\| \le 1$, it follows that $p'_N(d(x)) = o(\|x\|)$.
Hence, since $p''_N(d(x)) \to 0$ and $\nabla d(x)$ is bounded as $x \to 0$, we have
$d$ Lipschitz on the (compact) unit ball. Since $d(x) = 0$ outside the unit ball, $d$
is globally Lipschitz. Thus
References
Anderson W. Jr. and R. J. Duffin (1969), Series and Parallel Addition of Matrices,
J. Math. Anal. Appl., Vol. 26, pp. 576-594.
Andromonov M. (2001), Global Minimization of Some Classes of Generalized
Convex Functions, PhD Thesis, University of Ballarat, Australia.
Attouch H. (1984), Variational Convergence for Functions and Operators, Pitman
Adv. Publ. Prog., Boston-London-Melbourne.
Aubin J. P. and Frankowska H. (1990), Set Valued Analysis, Birkhauser.
Auslender A. (1984), Stability in Mathematical Programming with Nondifferentiable
Data, SIAM J. Control and Optimization, Vol. 22, pp. 239-254.
Balder E. J. (1977), An Extension of Duality Relations to Nonconvex Optimization
Problems, SIAM J. Control and Optim., Vol. 15, pp. 329-343.
Ben-Tal A. (1980) Second-Order and Related Extremality Conditions in Non-
linear Programming, J. Optimization Theory Applic. Vol. 31, pp. 143-165.
Ben-Tal A. and Zowe J. (1982), Necessary and Sufficient Conditions for a Class
of Nonsmooth Minimization Problems, Mathematical Programming Study
19, pp. 39-76.
Bonnans J . F., Cominetti R. and Shapiro A. (1999) Second Order Optimal-
ity Conditions Based on Parabolic Second Order Tangent Sets, SIAM J.
Optimization, Vol. 9, No. 2, pp. 466-492.
Crandall M., Ishii H. and Lions P.-L. (1992), User's Guide to Viscosity Solu-
tions of Second Order Partial Differential Equations, Bull. American Math.
Soc., Vol. 27, No. 1, pp. 1-67.
Cominetti R. and Penot J.-P. (1995), Tangent Sets to Unilateral Convex Sets, C.
R. Acad. Sci. Ser. I Math., 321, pp. 1631-1636.
1 INTRODUCTION
In this section, we introduce some concepts and obtain some basic properties
of augmented Lagrangians.
Let $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty, -\infty\}$ and $\varphi: \mathbb{R}^n \to \overline{\mathbb{R}}$ be an extended real-valued function.
Consider the primal problem
$$\inf_{x\in\mathbb{R}^n} \varphi(x).$$
GENERALIZED AUGMENTED LAGRANGIAN 103
where
$$\varphi(x) = \begin{cases} f(x), & \text{if } x \in X_0,\\ +\infty, & \text{otherwise,}\end{cases}$$
So considering the model (2.1) provides a unified approach to the usual con-
strained and unconstrained optimization problems.
A simple way to define the dualizing parametrization function for (CP) is:
$$f(x, u) = \begin{cases} f(x), & \text{if } x \in X_u,\\ +\infty, & \text{otherwise,}\end{cases}$$
where
In the sequel, we will use this dualizing parameterization function for (CP).
Definition 2.3 Consider the primal problem (2.1). Let $f$ be any dualizing
parameterization function for $\varphi$, and $\sigma$ be a generalized augmenting function.
(i) The generalized augmented Lagrangian (with parameter $r > 0$) $\bar{l}: \mathbb{R}^n \times
\mathbb{R}^m \times (0, +\infty) \to \overline{\mathbb{R}}$ is defined by
Remark 2.2 (i) By Definition 2.2, any augmenting function used in Definition
11.55 of Rockafellar et al. (1998) is a generalized augmenting function.
Thus, any augmented Lagrangian defined in Definition 11.55 of Rockafellar et
al. (1998) is also a generalized augmented Lagrangian.
(ii) If $\sigma$ is an augmenting function in the sense of Rockafellar et al. (1998),
then for any $\gamma > 0$, $\sigma^\gamma$ is a generalized augmenting function. In particular,
(a) let $\sigma(u) = \|u\|_1$; then for any $\gamma > 0$, $\sigma^\gamma(u) = \|u\|_1^\gamma$ is a generalized
augmenting function;
(b) take $\sigma(u) = \|u\|_\infty$; then $\sigma^\gamma(u) = \|u\|_\infty^\gamma$ is a generalized augmenting
function for any $\gamma > 0$;
(c) let $\sigma(u) = \sum_{j=1}^{m} |u_j|^\gamma$, where $\gamma > 0$; then $\sigma(u)$ is a generalized augmenting
function.
It is clear that none of these three classes (a), (b) and (c) of generalized
augmenting functions is convex when $\gamma \in (0, 1)$; namely, none of them is an
augmenting function.
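The failure of convexity for $\gamma \in (0, 1)$ is easy to verify at a single midpoint; the scalar instance below is a hypothetical illustration:

```python
# sigma(u) = |u| is convex, but sigma(u)**gamma with gamma = 1/2 is not:
# midpoint convexity fails between u = 0 and u = 1.
gamma = 0.5
s = lambda u: abs(u) ** gamma

mid = s(0.5 * 0 + 0.5 * 1)        # value at the midpoint: 0.5**0.5 ~ 0.707
chord = 0.5 * s(0) + 0.5 * s(1)   # chord value: 0.5
assert mid > chord                # the convexity inequality is violated
```

Since $\sqrt{0.5} \approx 0.707 > 0.5$, the graph of $|u|^{1/2}$ lies above the chord, so $\sigma^\gamma$ is indeed not an augmenting function for such $\gamma$.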
$+\infty$, otherwise,
where $v = (v_1, \ldots, v_{m_1})$. In particular, if $\sigma(u) = \frac{1}{2}\|u\|_2^2$, then the augmented
Lagrangian and the augmented Lagrangian dual problem are the classical augmented
Lagrangian and the classical augmented Lagrangian dual problem studied
in Rockafellar (1974) and Rockafellar (1993), respectively; if $\sigma(u) = \|u\|_1^\gamma$, $\gamma >
0$, and $m_1 = 0$ (i.e., (CP) does not have inequality constraints), then the augmented
Lagrangian for (CP) is
where
which is the optimal value of the standard perturbed problem of (CP) (see, e.g.,
Clarke (1983); Rosenberg (1984)). Denote by $M_{CP}$ the optimal value of (CP).
Then we have $p(0) = M_{CP}$.
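The perturbation function $p$ and the identity $p(0) = M_{CP}$ can be illustrated on a hypothetical one-dimensional instance (the problem data below are not from the text):

```python
import numpy as np

# Hypothetical instance of (CP): minimize f(x) = x**2 subject to the
# perturbed constraint x >= 1 - u.  The perturbation function is
# p(u) = inf{ x**2 : x >= 1 - u } = max(1 - u, 0)**2, and p(0) = M_CP.
def p(u):
    xs = np.linspace(-3.0, 3.0, 60001)
    return float((xs[xs >= 1.0 - u] ** 2).min())

assert abs(p(0.0) - 1.0) < 1e-3    # p(0) = M_CP = 1
assert abs(p(0.5) - 0.25) < 1e-3   # relaxing the constraint lowers p
assert p(2.0) < 1e-6               # constraint inactive: p = 0
```

Note that $p$ is nonincreasing in $u$ here, reflecting that enlarging the feasible set can only decrease the optimal value.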
Proposition 2.1 For any dualizing parameterization and any generalized aug-
menting function, we have
(i) the generalized augmented Lagrangian $\bar{l}(x, y, r)$ is concave, upper semi-continuous
in $(y, r)$ and nondecreasing in $r$;
(ii) weak duality holds:
3 STRONG DUALITY
The following strong duality result generalizes and improves Theorem 11.59 of
Rockafellar, et a1 (1998).
Theorem 3.1 (strong duality). Consider the primal problem (2.1) and its
generalized augmented Lagrangian dual problem (2.3). Assume that $\varphi$ is proper,
and that its dualizing parameterization function $f(x, u)$ is proper, lsc, and level-bounded
in $x$ locally uniformly in $u$. Suppose that there exists $(\bar{y}, \bar{r}) \in \mathbb{R}^m \times
(0, +\infty)$ such that
$$\inf\{\bar{l}(x, \bar{y}, \bar{r}) : x \in \mathbb{R}^n\} > -\infty.$$
In particular,
$$p(0) - \delta > q(\bar{y}, \bar{r}) = \inf_{x\in X} \bar{l}(x, \bar{y}, \bar{r}).$$
Let $0 < r_k \to +\infty$. Then
$$f(x_k, u_k) - \langle\bar{y}, u_k\rangle + \bar{r}\sigma(u_k) > -m_0, \quad \forall k.$$
Now (3.1) and (3.2) give us a bound
when $k$ is sufficiently large. This, combined with the fact that $u_k \to 0$ and the
fact that $f(x, u)$ is level-bounded in $x$ locally uniformly in $u$, implies that $\{x_k\}$
Remark 3.1 For the standard constrained optimization problem (CP), let the
generalized augmented Lagrangian be defined as in Remark 2.3. Further assume
the following conditions hold:
(i)
$$f(x) \ge m^*, \quad \forall x \in X, \qquad (3.5)$$
for some $m^* \in \mathbb{R}$;
(ii)
Then all the conditions of Theorem 3.1 hold. It follows that there exists
no duality gap between (CP) and its generalized augmented Lagrangian dual
problem.
Definition 4.1 (exact penalty representation) Consider the problem (2.1). Let
the generalized augmented Lagrangian $\bar{l}$ be defined as in Definition 2.3. A vector
$\bar{y} \in \mathbb{R}^m$ is said to support an exact penalty representation for the problem (2.1)
if there exists $\bar{r} > 0$ such that
$$p(0) = \inf_x \bar{l}(x, \bar{y}, r), \quad \forall r \ge \bar{r},$$
and
$$\operatorname*{argmin}_x \varphi(x) = \operatorname*{argmin}_x \bar{l}(x, \bar{y}, r), \quad \forall r \ge \bar{r}.$$
Consequently,
implying
$$p(0) \le p(u) - \langle\bar{y}, u\rangle + \bar{r}\sigma(u), \quad \forall u \in \mathbb{R}^m.$$
This proves (i).
It is evident from the proof of Theorem 11.61 in Rockafellar et al. (1998)
that (ii) is true.
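Exact penalization with $\bar{y} = 0$ can be illustrated numerically on a hypothetical equality-constrained problem (the instance and the threshold $\bar{r} = 2$ below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical (CP): minimize f(x) = x**2 subject to g(x) = x - 1 = 0,
# with optimal value p(0) = 1 at x* = 1.  With sigma(u) = |u| and ybar = 0
# the penalized objective is  l(x, 0, r) = x**2 + r*|x - 1|,
# which is exact for every r >= rbar = 2.
xs = np.linspace(-2.0, 3.0, 100001)
penalized = lambda r: xs**2 + r * np.abs(xs - 1.0)

for r in (2.0, 3.0, 10.0):
    vals = penalized(r)
    assert abs(xs[np.argmin(vals)] - 1.0) < 1e-3  # argmin matches (CP)
    assert abs(vals.min() - 1.0) < 1e-3           # inf matches p(0)

# Below the threshold the penalty is not exact: for r = 1 the minimizer
# of x**2 + |x - 1| is x = 0.5, with value 0.75 < p(0).
assert abs(xs[np.argmin(penalized(1.0))] - 0.5) < 1e-3
```

The jump in behaviour at $r = 2$ mirrors the theory: below the critical penalty parameter the unconstrained minimizer undershoots the constraint, while above it the two argmin sets coincide.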
and the solution set of (CP) is the same as that of the problem of minimizing
$f(x) + r\big[\sum_{j=1}^{m_1} g_j^+(x) + \sum_{j=m_1+1}^{m} |g_j(x)|\big]$ over $x \in X$ whenever $r \ge \bar{r}$,
if and only if there exist $\tilde{r} > 0$ and a neighborhood $W$ of $0 \in \mathbb{R}^m$ such that
Proof. (i) follows from Theorem 4.1 (i). We need only prove (ii). Assume
that (4.2) holds.
First we prove (4.1) by contradiction. Suppose, by the weak duality, that
there exists $0 < r_k \to +\infty$ with
Since $u_k \to 0$, we conclude that (4.4) contradicts (4.2). As a result, there exists
$\bar{r} > \tilde{r}$ such that (4.1) holds. Hence, for any $x^* \in \operatorname{argmin}_x \varphi(x)$, we have
whenever $r > \bar{r}$. Now we show that there exists $r^* > \bar{r} + 1 > 0$ such that
$$\operatorname*{argmin}_x \bar{l}(x, 0, r) \subseteq \operatorname*{argmin}_x \varphi(x), \quad \forall r > r^*.$$
Suppose to the contrary that there exist $\bar{r} + 1 \le r_k \uparrow +\infty$ and $x_k \in
\operatorname{argmin}_x \bar{l}(x, 0, r_k)$ such that $x_k \notin \operatorname{argmin}_x \varphi(x)$, $\forall k$. Then it follows that
$$f(x_k, \tilde{u}_k) + \tilde{r}\sigma(\tilde{u}_k) + (r_k - \tilde{r})\sigma(\tilde{u}_k) \le p(0). \qquad (4.9)$$
Remark 4.3 Comparing Theorems 4.1 and 4.2, the special case where $\bar{y} = 0$
supports an exact penalty representation requires weaker conditions; i.e., condition
(c) of (ii) in Theorem 4.1 is not needed.
Remark 4.4 Consider (CP) and its generalized augmented Lagrangian defined
in Remark 2.3. Suppose that $X_0 \neq \emptyset$ and (3.5) holds. Then $\bar{y} = 0$ supports
an exact penalty representation for (CP) in the framework of its generalized
augmented Lagrangian if and only if there exist $\tilde{r} > 0$ and a neighborhood $W$
of $0 \in \mathbb{R}^m$ such that
Example 4.1 Consider (CP). Let X* denote the set of optimal solutions of
(CP). Suppose that X₀ ≠ ∅ and (3.5) holds. Let the generalized augmenting
function be a(u) = ||u||^γ, γ > 0, and the generalized augmented Lagrangian be
defined as in Remark 2.3. It is easily computed that
and X* = X*_r for r ≥ r̄', where X*_r is the set of optimal solutions of the problem
of minimizing f(x) + r[Σ_{i=1}^{m₁} g_i⁺(x) + Σ_{j=m₁+1}^m |g_j(x)|]^γ over x ∈ X;
(ii) there exist r̄ > 0 and a neighborhood W of 0 ∈ R^m such that
5 CONCLUSIONS
References
Clarke, F.H. (1983), Optimization and Nonsmooth Analysis, John Wiley &
Sons, New York.
Huang, X.X. and Yang, X.Q. (2003), A Unified Augmented Lagrangian Ap-
proach to Duality and Exact Penalization, Mathematics of Operations Re-
search, to appear.
Luo, Z.Q., Pang, J.S. and Ralph, D. (1996), Mathematical Programs with Equi-
librium Constraints, Cambridge University Press, New York.
Luo, Z.Q. and Pang, J.S. (eds.) (2000), Error Bounds in Mathematical Pro-
gramming, Mathematical Programming, Ser. B., Vol. 88, No. 2.
Pang, J.S. (1997), Error bounds in mathematical programming, Mathematical
Programming, Vol. 79, pp. 299-332.
Rockafellar, R.T. (1974), Augmented Lagrange multiplier functions and duality
in nonconvex programming, SIAM Journal on Control and Optimization,
Vol. 12, pp. 268-285.
114 OPTIMIZATION AND CONTROL WITH APPLICATIONS
1 INTRODUCTION AND PRELIMINARIES
For the set W = {A(t) : t ∈ B}, sp(W) denotes the subspace generated by W,
i.e.,
sp(W) = { Σ_{t∈B} y(t)A(t) : y ∈ Λ_B }.
The standard inner product on S^n is C • X = tr(CX). The (SDSIP) problem is
inf C • X
s.t. A(t) • X = b(t), t ∈ B, (1.1)
X ⪰ 0.
Here B is a compact set in R, C and A(t) (t ∈ B) are all fixed matrices in S^n,
b(t) ∈ R (t ∈ B), and the unknown variable X also lies in S^n.
Obviously, the (SDSIP) problem includes the semi-definite programming problem
and the linear semi-infinite programming problem with equality constraints
as special cases. See Charnes et al. (1962) and Wolkowicz et al. (2000).
For the (SDSIP) problem, we introduce the Lagrangian dual problem (DSDSIP)
as follows:
When the parameter set B is finite, (SDSIP) and (DSDSIP) are a pair
of primal and dual (SDP). See Vandenberghe and Boyd (1996), Ramana et al.
(1997) and Wolkowicz et al. (2000).
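The weak duality underlying this primal-dual pairing can be checked numerically for a finite B. The sketch below (our toy data, not from the chapter) uses a single constraint matrix: for any primal-feasible X and dual-feasible (y, Z) with Z = C − Σ_t y(t)A(t) ⪰ 0, one has C • X − Σ_t y(t)b(t) = Z • X ≥ 0.

```python
# Hedged numeric illustration (the matrices C, A1, b1 are our own choices):
# weak duality C . X >= sum_t y(t) b(t) for a tiny SDP pair with B = {1}.
import numpy as np

def dot(A, B):
    """Standard inner product on S^n: A . B = tr(AB)."""
    return float(np.trace(A @ B))

C = np.array([[2.0, 0.0], [0.0, 1.0]])
A1 = np.eye(2)          # single constraint matrix
b1 = 1.0                # constraint: tr(X) = 1

X = np.diag([0.5, 0.5])         # primal feasible: psd, A1 . X = 1
y1 = 1.0                        # dual variable
Z = C - y1 * A1                 # dual slack, must be psd

assert abs(dot(A1, X) - b1) < 1e-12             # primal feasibility
assert np.all(np.linalg.eigvalsh(X) >= -1e-12)  # X psd
assert np.all(np.linalg.eigvalsh(Z) >= -1e-12)  # dual feasibility
assert dot(C, X) >= y1 * b1 - 1e-12             # weak duality: 1.5 >= 1.0
```

Here the primal value 1.5 exceeds the dual value 1.0 only because this X is not optimal; the primal optimum (the smallest eigenvalue of C) closes the gap.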
DUALITY FOR SEMI-DEFINITE AND SEMI-INFINITE PROGRAMMING 117
Proposition 1.1 Suppose that X and (y, Z) are feasible solutions for the (SDSIP)
problem and the (DSDSIP) problem, respectively. Then,
Then, we have
yields duality with respect to C ∈ S^n if exactly one of the following conditions
holds:
(iv) Both (SDSIP) and (DSDSIP) are consistent and have the same optimal
value, and the value is attained in (DSDSIP).
We say that (SDSIP) yields uniform duality if the constraint system (1.3) yields
duality for every C ∈ S^n.
In this paper, we first establish that uniform duality between the homogeneous
(SDSIP) and its Lagrangian-type dual problem is equivalent to the
closedness of a certain cone. With the aid of this result, we also obtain a
corresponding result for the nonhomogeneous (SDSIP).
A detailed study of uniform duality for (SDSIP) problems with inequality
constraints can be found in Li et al. (2002).
inf C • X
s.t. A(t) • X = 0, t ∈ B,
X ⪰ 0,
and (DSDSIP) becomes the following problem (DSDSIPh):
sup 0
Lemma 2.1 The problem (SDSIPh) is unbounded in value if and only if there
exists X* ⪰ 0 satisfying
A(t) • X* = 0, t ∈ B (2.3)
and
C • X* < 0. (2.4)
Proof. Suppose that there is X* ⪰ 0 such that (2.3) and (2.4) hold. Without
loss of generality, assume C • X* < −1. For each n we have A(t) • X^(n) = 0, t ∈ B,
and C • X^(n) < −n with X^(n) = nX*. Hence, (SDSIPh) is unbounded in value.
Conversely, the result follows from the definition of unboundedness. This
completes the proof. □
Remark 2.1 Since X = 0 is a feasible solution of (SDSIPh), (SDSIPh) is
always consistent. If the optimal value of (SDSIPh) is bounded below, the
optimal value of (SDSIPh) is zero. Thus, (ii) and (iii) in Definition 1.1 do not
happen.
Take any S ∈ sp(W) + K. Then, there exist V ∈ sp(W) and Q ∈ K such that
and
S • X* ≥ 0.
and
C • X* ≥ 0.
2.1. Now we show that clause (iv) of Definition 1.1 holds. If (DSDSIPh) is not
consistent for C, then C ∉ sp(W) + K = cl(sp(W) + K). By the definitions of sp(W)
and K, we have that sp(W) + K is a closed and convex cone in S^n. Thus, by
the separation theorem, there exists X* in S^n such that
Obviously,
A(t) ∈ sp(W), ∀ t ∈ B.
Therefore,
C • X* < 0 and A(t) • X* = 0, ∀ t ∈ B.
Thus, it remains to prove that X* ⪰ 0. Take any Q ∈ K and 0 ∈ sp(W).
We have
Q • X* ≥ 0, ∀ Q ∈ K. (2.5)
Thus, X* is a positive semidefinite matrix. This completes the proof. □
yields duality for any C̃ ∈ S^{n+1} if and only if sp(W̃) + K is a closed set.
Proof. The proof is similar to that of Theorem 2.1 and is omitted. □
We now establish duality for the nonhomogeneous constraint system (1.1)
of (SDSIP) by reformulating it as a homogeneous system of the form (2.1) and
applying Proposition 2.1. For any real number d ∈ R, we define:
inf C̃ • X̃
s.t. Ã(t) • X̃ = 0, t ∈ B, (3.1)
X̃ ⪰ 0.
sup 0
s.t. Σ_{t∈B} y(t)Ã(t) + Z = C̃, y ∈ Λ_B, (3.2)
Z ⪰ 0,
which is equivalent to the program
sup 0
Lemma 3.1 The nonhomogeneous constraint system (1.1) yields duality with
respect to C ∈ S^n if and only if, for every d ∈ R, the constraint system (3.1)
yields duality with respect to C̃ ∈ S^{n+1}.
Proof. Suppose that the constraint system (1.1) yields duality with respect to
C ∈ S^n and let d ∈ R. We will show that the constraint system (3.1) yields
duality with respect to C̃.
Since (SDSIPl) is a homogeneous system, by Remark 2.1, we need only consider
two cases.
Case one: if the dual problem (DSDSIPl) is consistent, then (iv) in Definition 1.1
holds.
By the homogeneity of the (SDSIPl) problem, we have that the optimal
value of the (DSDSIPl) problem is zero. It follows from Proposition 1.1 that the
(SDSIPl) problem is bounded in value. By Remark 2.1, we have that (iv) in
Definition 1.1 holds.
Case two: if the dual problem (DSDSIPl) is inconsistent, then (i) in Definition
1.1 holds. Namely, (SDSIPl) has a value of −∞ with respect to C̃.
Assume that the dual problem (DSDSIPl) is inconsistent. Note that (3.3)
and (3.5) are the constraint system of (DSDSIP). If (SDSIP) is consistent and
(3.3) and (3.5) hold, then, by the hypothesis that the constraint system
(1.1) yields duality with respect to C ∈ S^n, we have
Thus, it follows from the inconsistency of the dual problem (DSDSIPl) that
at least one of two conditions holds: (i) the constraint (3.3) does not hold; (ii)
the constraint (3.3) holds, but the constraint (3.4) does not hold. Namely,
Set
It follows that X^(n) is a positive semidefinite matrix. Thus, we have
that
If (ii) holds and (SDSIP) is inconsistent, then we have that (SDSIP) is inconsistent
and (DSDSIP) is consistent. Since (SDSIP) has duality with respect
to C, for any n, there exists a solution y ∈ Λ_B for (DSDSIP) with
Σ_{t∈B} y(t)b(t) > n. Then, the dual problem (DSDSIPl) is consistent for any
d ∈ R, which contradicts the assumption.
If (ii) holds and (SDSIP) is consistent, by (3.6), we have
and
Set
Then,
Thus, by Lemma 2.1, (SDSIPl) has value −∞. We have proved the necessity
of this lemma.
To prove the sufficiency of the lemma, we suppose that, for all d ∈ R and
C ∈ S^n, (SDSIPl) yields duality with respect to C̃. We need to prove that
(SDSIP) yields duality with respect to C.
If (SDSIP) is inconsistent, then zero is the only solution of (SDSIPl). By
the definition of duality, (DSDSIPl) is consistent for any d ∈ R. Take d = n.
Thus, there exists y^(n) ∈ Λ_B such that
and
C • X ≥ Σ_{t∈B} y(t)b(t).
A(t) • X = 0, t ∈ B, X ⪰ 0,
C • X < 0.
X̃ = [[X, ξ], [ξᵀ, x_{n+1}]], where ξ ∈ R^n. Obviously, X̃ is positive semidefinite
and satisfies:
Corollary 3.1 (SDSIP) yields uniform duality if and only if, for any d ∈ R
and C ∈ S^n, the constraint system (3.1) yields duality with respect to C̃ ∈ S^{n+1}.
Proof. By Lemma 3.1, (SDSIP) yields uniform duality if and only if, for
any d ∈ R and C ∈ S^n, the constraint system (3.1) yields duality with respect
to each C̃ ∈ S^{n+1}. By Proposition 2.1, for any d ∈ R and C ∈ S^n, the
constraint system (3.1) yields duality with respect to each C̃ ∈ S^{n+1} if and
only if sp(W̃) + K is a closed set. Then, the conclusion follows readily. □
Acknowledgments
This research is partially supported by the Research Committee of The Hong Kong
Polytechnic University and the National Natural Science Foundation of China.
References
1 INTRODUCTION
The origins of equations (H-J), (B) invite us to consider such data. These equations
arise when, for a given (x, t) ∈ X × IP, with IP the set of positive numbers,
one considers the Bolza problem:
(B) find V(x, t) :=
HAMILTON-JACOBI EQUATIONS 129
u(x, t) := inf_{y∈X} sup_{p∈X*} ( p·(x − y) + g(y) − tH(p) ),   (Lax-Oleinik)
where the conjugates are taken with respect to the pairs (p, r), (x, t) and where
the infimal convolution is taken with respect to the variable (x, t) (the notation
is unambiguous inasmuch as F and G are defined on X* × IR and X × IR respectively).
In fact, for t ∈ IR₊ one has F*(·, t) = (tH)* and G* = g* ∘ p_{X*}, where
p_{X*} is the first projection from X* × IR to X*.
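A concrete instance of the Lax formula can be checked numerically. In the sketch below (our toy instance, not from the text), H(p) = p²/2 gives (tH)*(z) = z²/(2t), and for g(x) = |x| the infimal convolution u(x, t) = inf_y ( |y| + (x − y)²/(2t) ) has the known closed form of the Huber function: x²/(2t) for |x| ≤ t and |x| − t/2 otherwise.

```python
# Numerical sanity check of the Lax formula for H(p) = p**2/2, g(x) = |x|
# (our example): grid-based infimal convolution versus the Huber closed form.

def u_lax(x, t, steps=20001, span=10.0):
    """Grid approximation of inf_y ( g(y) + (tH)*(x - y) )."""
    best = float("inf")
    for i in range(steps):
        y = -span + i * (2 * span) / (steps - 1)
        best = min(best, abs(y) + (x - y) ** 2 / (2 * t))
    return best

def u_closed(x, t):
    """Known closed form of the inf-convolution (Huber function)."""
    return x * x / (2 * t) if abs(x) <= t else abs(x) - t / 2

for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    assert abs(u_lax(x, 1.0) - u_closed(x, 1.0)) < 1e-3
```

Since g and H are both convex here, the Hopf and Lax solutions coincide, consistent with the coincidence criteria discussed below.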
In order to avoid the trivial case in which v is the constant function −∞,
we assume that
dom H ∩ dom g* ≠ ∅, (2.5)
while, to avoid the case in which the Lax solution u is an improper function, we assume
the condition
dom g ≠ ∅, dom H ≠ ∅, dom H* ≠ ∅. (2.6)
In Imbert and Volle (1999), Penot (2000) and Penot and Volle (2000) some
criteria for the coincidence of the Hopf and the Lax solutions are presented and
some consequences of this coincidence are drawn. In particular, in Penot and
Volle (2000) we introduced the use of the Attouch-Brézis type condition
which ensures that u = v when g and H are closed proper convex functions.
Simple examples show that, without convexity assumptions, u and v may differ.
The interchange of inf and sup in the explicit formulae above shows that
u ≥ v. A more precise comparison can be given. Without any assumption, for
t ∈ IP, one has
u(·, t) ≥ u(·, t)** := (g □ (tH)*)** = (g* + tH**)* ≥ (g* + tH)* = v(·, t), (2.8)
u** = (F* □ G)** = (F** + G*)* ≥ (F + G*)* = v. (2.9)
Under condition (2.10), which is milder than the condition H** = H, one has a close
connection between u and v.
Proposition 2.1 Under assumptions (2.5), (2.10), one has u(·, t)** = v(·, t),
u** = v. If moreover g is convex, then v = ū, the lower semicontinuous hull of
u.
Proof. Let us prove the second equality of the first assertion, the first one
being similar and simpler. In view of relation (2.9) it suffices to show that
(F** + G*)(p, q) = (F + G*)(p, q) for any (p, q) ∈ X* × R, or for any (p, q) ∈
dom g* × R since both sides are +∞ when p ∉ dom g*. Now F** is the indicator
function of the closed convex hull cl co(E) of E. Since cl co(E) = cl co(S(epi H)) =
S(cl co(epi H)) = S(epi H**), for p ∈ dom g* we have (F** + G*)(p, q) = g*(p) iff
(p, q) ∈ cl co(E) = S(epi H**) iff (F + G*)(p, q) = g*(p). □
3 SOLUTIONS IN THE SENSE OF UNILATERAL ANALYSIS
initial condition (B). In doing so, one can detect interesting properties of functions
which are good candidates for the equation but do not satisfy the initial
condition in a classical sense. When a function u satisfies both (H-J) and (B)
in an appropriate sense, we speak of a solution of the system (H-J)-(B).
The notion of viscosity solution (Crandall and Lions (1983), Crandall et al.
(1984)), which yielded existence and uniqueness results, introduced a crucial
one-sided viewpoint since, in this notion, equalities are replaced by inequalities.
A further turn in the direction of nonsmooth analysis occurred with Barron
and Jensen (1990), Frankowska (1987), Frankowska (1993) (see also Bardi and
Capuzzo-Dolcetta (1998), Clarke et al. (1998), Vinter (2000)), in which only
subdifferentials are involved. We retain this viewpoint here and we admit the
use of an arbitrary subdifferential. Although this concept is generally restricted
by some natural conditions, here we adopt a loose definition which encompasses
all known proposals: a subdifferential is just a triple (X, F, ∂) where X is a class
of Banach spaces, F(X) is a class of functions on the member X of X and ∂
is a mapping from F(X) × X into the family of subsets of X*, denoted by
(f, x) ↦ ∂f(x), with empty value at (f, x) when |f(x)| = ∞. The viscosity
subdifferential of f at x is the set of derivatives at x of functions φ of class C¹
such that f − φ attains its minimum at x. For most results, it suffices to require
that φ is Fréchet differentiable at x. This variant coincides with the notion of
Fréchet subdifferential defined for f ∈ R̄^X, x ∈ f⁻¹(R) by
∂f(x) = { x* ∈ X* : liminf_{||u||→0+} (1/||u||) [ f(x + u) − f(x) − ⟨x*, u⟩ ] ≥ 0 }
in view of the following simple lemma which shows that there is no misfit
between the two notions.
A similar result holds for the Hadamard (or contingent) subdifferential defined
for f ∈ R̄^X, x ∈ f⁻¹(R) by
∂f(x) = { x* ∈ X* : ∀ w ∈ X, liminf_{(t,v)→(0+,w)} (1/t) [ f(x + tv) − f(x) − ⟨x*, tv⟩ ] ≥ 0 }.
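The defining inequality of these subdifferentials can be probed in one dimension. The following sketch (our elementary example, not from the text) examines f(x) = |x| at 0, where a slope s satisfies the Fréchet difference-quotient inequality exactly when |s| ≤ 1.

```python
# A finite sketch of the Frechet subdifferential inequality (our example):
# for f(x) = |x| on R, slope s lies in the subdifferential at 0 iff the
# quotient ( f(u) - f(0) - s*u ) / |u| stays nonnegative as u -> 0.

def worst_quotient(f, x, s, radii):
    """min over small u != 0 of ( f(x+u) - f(x) - s*u ) / |u|."""
    quotients = []
    for h in radii:
        for u in (h, -h):
            quotients.append((f(x + u) - f(x) - s * u) / abs(u))
    return min(quotients)

f = abs
radii = [10.0 ** (-k) for k in range(1, 8)]

assert worst_quotient(f, 0.0, 0.5, radii) >= 0.0    # interior slope works
assert worst_quotient(f, 0.0, -1.0, radii) >= 0.0   # boundary slope -1 works
assert worst_quotient(f, 0.0, 1.5, radii) < 0.0     # 1.5 violates the bound
```

For this convex f the Fréchet and Hadamard subdifferentials agree and equal the convex-analysis subdifferential [−1, 1].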
Theorem 4.1 (a) The Hopf solution v is a Hadamard (hence a Fréchet) supersolution
of (H-J).
(b) Under assumptions (2.5), (2.10), the Hopf solution v is a Hadamard
solution of (H-J).
Proof. (a) If assumption (2.5) does not hold, one has v|_{X×IP} = −∞
and there is nothing to prove. If it holds, given t > 0, x ∈ X and (p, q) ∈
∂v(x, t), since v is convex, for each s ∈ R₊ one has
For s = 0, taking the supremum on w and using v(·, 0)* = (g* + ι_{dom H})** ≥ g*,
one gets
The last inequalities show that g*(p) < ∞; it follows that q + H(p) ≥ 0.
(b) For s > 0, using (2.5), (2.10) to note that g* + sH = g* + sH** is closed,
proper, convex, one gets
Since g*(p) < +∞ and since s can be arbitrarily large, one obtains −q ≥
H(p).
In order to get a similar property for the Lax solution, we use a coercivity
condition:
(C) liminf_{||x||→∞} H*(x)/||x|| > − liminf_{||x||→∞} g(x)/||x||.
Let us recall that an infimal convolution is said to be exact if the infimum is
attained. Under assumption (C), exactness occurs in (2.2) when X is reflexive
and g is weakly l.s.c.
Theorem 4.2 (a) When (C) holds, the Lax solution u is a Fréchet supersolution.
(b) If the inf-convolution in the definition of u is exact, then u is a Hadamard
supersolution.
(c) If g is convex, then u is a Hadamard supersolution.
(d) If H = H**, then u is a Hadamard subsolution.
Proof. Since assertions (b)-(d) are proved in Penot and Volle (2000) and elsewhere,
we just prove (a). Let (x, t) ∈ X × IP be fixed and let k be given by
k(w) = g(x − w) + tH*(t⁻¹w). Assumption (C) ensures that k is coercive (this
fact justifies the observation preceding the statement). Let B be a bounded
subset of X such that inf k(B) = inf k(X). For each s ∈ ]0, t[ let us pick z_s ∈ B
such that k(z_s) ≤ inf k(X) + s² = u(x, t) + s². Then, a short computation shows
that
u(x − st⁻¹z_s, t − s) ≤ u(x, t) + s² − sH*(t⁻¹z_s).
It follows that for each (p, q) ∈ ∂⁻u(x, t) one can find a function ε: R₊ → R₊
with limit 0 at 0 such that
q + H(p) ≥ liminf_{s→0+} ( q + p·t⁻¹z_s − H*(t⁻¹z_s) ) ≥ lim_{s→0+} (−s − ε(s)) = 0.
Let us turn to this important question, which has been at the core of the viscosity
method. There are several methods for such a question: partial differential
equations techniques (Bardi and Capuzzo-Dolcetta (1998), Barles (1994), Lions
(1982)...), invariance and viability for differential inclusions (Subbotin (1995),
Frankowska (1993), Plaskacz and Quincampoix (2000)...), nonsmooth analysis
results such as the Barron-Jensen striking touching theorem (Barron (1999),
Barron and Jensen (1990)), the fuzzy sum rule (Borwein and Zhu (1996),
Deville (1999), El Haddad and Deville (1996)), and multidirectional mean value
inequalities (Imbert (1999), Imbert and Volle (1999), Penot and Volle (2000)).
Let us note that the last two results are almost equivalent and are equivalent
in reflexive spaces.
In order to simplify our presentation of recent uniqueness results arising from
nonsmooth analysis, we assume in the sequel that X is reflexive and we use the
Fréchet subdifferential. In such a case, a fuzzy sum rule is satisfied and mean
value theorems are available.
Theorem 5.1 (Penot and Volle (2000), Th. 6.2). For any l.s.c. Fréchet subsolution
w of (H-J)-(B) one has w ≤ u, the Lax solution.
The next corollary has been obtained in Alvarez et al. (1999), Th. 2.1, under
the additional assumptions that X is finite dimensional, H is finite everywhere,
and for the subclass of solutions which are l.s.c. and bounded below by a
function of linear growth. It is proved in Imbert and Volle (1999) under the
additional condition that dom H* is open.
Corollary 5.1 Suppose X is reflexive, g and H are closed proper convex functions
and dom g* ⊂ dom H. Then the Hopf solution is the greatest l.s.c. Fréchet
subsolution of (H-J)-(B).
The use of the mean value inequality for a comparison result first appeared
in Imbert (1999) and Imbert and Volle (1999), Theorem 3.3, which assumes that H
is convex and globally Lipschitzian and that X is a Hilbert space. Let us note
that in our framework the mean value theorem is equivalent to the fuzzy sum
rule; the fuzzy sum rule has been used in Borwein and Zhu (1996), Borwein
and Zhu (1999), Deville (1999), El Haddad and Deville (1996) for a similar
purpose.
It is shown in Alvarez et al. (1999), Thms 2.1 and 2.5, that the growth
condition on H can be dropped when dim X < +∞.
Well-posedness in the sense of Hadamard requires that when the data (g, H)
are perturbed in a continuous way, the solution is perturbed in a continuous
way. Up to now, this question seems to have been studied essentially in the
sense of local uniform convergence. While this mode of convergence is well
suited to the finite dimensional case with finite data and solutions, it does not
fit our framework. Thus, in Penot (2000) and in Penot and Zalinescu (2001b)
this question is considered with respect to sublevel convergence and to epiconvergence
(and various other related convergences). These convergences are well
adapted to functions taking infinite values since they involve convergence of
epigraphs. They have a nice behavior with respect to duality. However, the
continuity of the operations involved in the explicit formulae requires technical
"qualification" assumptions (Penot and Zalinescu (2001a), Penot and Zalinescu
(2001b)).
We have not considered here the case where H depends on x; we refer to Rockafellar
and Wolenski (2000a), Rockafellar and Wolenski (2000b) for recent progress
on this question. We also discarded the case where H depends on u(x). In such a
case one can use operations similar to the infimal convolution, such as the
sublevel convolution, and quasiconvex dualities as introduced in Penot and
Volle (1987)-Penot and Volle (1990) (see also Martinez-Legaz (1988), Martinez-
Legaz (1988)). The papers Barron et al. (1996)-Barron et al. (1997) opened
the way and have been followed by Alvarez et al. (1999), Barron (1999), Volle
(1998), Volle (1997). A panorama of quasiconvex dualities is given in Penot
(2000), which invites one to look for the use of rare dualities, recalling the role
the Mendeleev table played in chemistry.
References
O. Alvarez, E.N. Barron and H. Ishii (1999), Hopf-Lax formulas for semicontinuous
data, Indiana Univ. Math. J. 48 (3), 993-1035.
O. Alvarez, S. Koike and I. Nakayama (2000), Uniqueness of lower semicontinuous
viscosity solutions for the minimum time problem, SIAM J. Control
Optim. 38 (2), 470-481.
H. Attouch (1984), Variational convergence for functions and operators, Pit-
man, Boston.
M. Bardi and I. Capuzzo-Dolcetta (1998), Optimal Control and Viscosity Solutions of
Hamilton-Jacobi-Bellman Equations, Birkhäuser, Basel.
G. Barles (1994), Solutions de viscosité des équations de Hamilton-Jacobi,
Springer, Berlin.
G. Barles and B. Perthame (1987), Discontinuous solutions of deterministic
optimal stopping time problems, Math. Modeling and Numer. Anal. 21, 557-
579.
E.N. Barron (1999), Viscosity solutions and analysis in L∞, in Nonlinear Analysis,
Differential Equations and Control, F.H. Clarke and R.J. Stern (eds.),
Kluwer, Dordrecht, pp. 1-60.
E.N. Barron and R. Jensen (1990), Semicontinuous viscosity solutions of Hamilton-
Jacobi equations with convex Hamiltonians, Comm. Partial Diff. Eq. 15,
1713-1742.
E.N. Barron, R. Jensen and W. Liu (1996), Hopf-Lax formula for u_t + H(u, Du) =
0, J. Differ. Eq. 126, 48-61.
E.N. Barron, R. Jensen and W. Liu (1997), Hopf-Lax formula for u_t + H(u, Du) =
0. II, Comm. Partial Diff. Eq. 22, 1141-1160.
J.M. Borwein and Q.J. Zhu (1996), Viscosity solutions and viscosity subderiva-
tives in smooth Banach spaces with applications to metric regularity, SIAM
J. Control Optim. 34, 1568-1591.
J.M. Borwein and Q.J. Zhu (1999), A survey of subdifferential calculus with
applications, Nonlinear Anal. Th. Methods Appl. 38, 687-773.
F.H. Clarke and Yu.S. Ledyaev (1994), Mean value inequalities in Hilbert
space, Trans. Amer. Math. Soc. 344, 307-324.
F.H. Clarke, Yu.S. Ledyaev, R.J. Stern and P.R. Wolenski (1998), Nonsmooth
analysis and control theory, Springer, New York.
M.G. Crandall, L.C. Evans and P.-L. Lions (1984), Some properties of viscosity
solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc. 282, 487-
502.
M.G. Crandall, H. Ishii and P.-L. Lions (1992), User's guide to viscosity solutions
of second order partial differential equations, Bull. Amer. Math.
Soc. 27, 1-67.
M.G. Crandall and P.-L. Lions (1983), Viscosity solutions to Hamilton-Jacobi
equations, Trans. Amer. Math. Soc. 277.
R. Deville (1999), Smooth variational principles and nonsmooth analysis in
Banach spaces, in Nonlinear Analysis, Differential Equations and Control,
F.H. Clarke and R.J. Stern (eds.), Kluwer, Dordrecht, 369-405.
E. El Haddad and R. Deville (1996), The viscosity subdifferential of the sum
of two functions in Banach spaces. I First order case, J . Convex Anal. 3,
295-308.
L.C. Evans (1998), Partial differential equations, Amer. Math. Soc., Providence.
H. Frankowska (1987), Equations d'Hamilton-Jacobi contingentes, C.R. Acad.
Sci. Paris Serie I 304, 295-298.
H. Frankowska (1993), Lower semicontinuous solutions of Hamilton-Jacobi-
Bellman equations, SIAM J . Control Optim. 31 (I), 257-272.
G.N. Galbraith (2000), Extended Hamilton-Jacobi characterization of value
functions in optimal control, SIAM J. Control Optim. 39 (I), 281-305.
C. Imbert (1999), Convex analysis techniques for Hopf-Lax' formulae in Hamilton-
Jacobi equations with lower semicontinuous initial data, preprint, Univ. P.
Sabatier, Toulouse, May.
C. Imbert and M. Volle (1999), First order Hamilton-Jacobi equations with
completely convex data, preprint, October.
A.D. Ioffe (1998), Fuzzy principles and characterization of trustworthiness, Set-
Val. Anal. 6, 265-276.
P.-L. Lions (1982), Generalized Solutions of Hamilton-Jacobi Equations, Pit-
man, London.
J.-E. Martinez-Legaz (1988), On lower subdifferentiable functions, in "Trends in
Mathematical Optimization", K.H. Hoffmann et al. eds, Birkhauser, Basel,
197-232.
J.-E. Martinez-Legaz (1988), Quasiconvex duality theory by generalized conju-
gation methods, Optimization, 19, 603-652.
J.-P. Penot (1997), Mean-value theorem with small subdifferentials, J . Opt. Th.
Appl. 94 (I), 209-221.
J.-P. Penot (2000), What is quasiconvex analysis? Optimization 47, 35-110.
J.-P. Penot and M. Volle (1987), Dualité de Fenchel et quasi-convexité, C.R.
Acad. Sc. Paris série I, 304 (13), 269-272.
J.-P. Penot and M. Volle (1988), Another duality scheme for quasiconvex problems,
in "Trends in Mathematical Optimization", K.H. Hoffmann et al. (eds.),
Birkhäuser, Basel, 259-275.
J.-P. Penot and M. Volle (1990), On quasi-convex duality, Math. Operat. Re-
search 15 (4), 597-625.
J.-P. Penot and M. Volle (2000), Hamilton-Jacobi equations under mild conti-
nuity and convexity assumptions, J. Nonlinear and Convex Anal. 1,177-199.
J.-P. Penot and M. Volle (1999), Convexity and generalized convexity meth-
ods for the study of Hamilton-Jacobi equations, Proc. Sixth Conference on
Generalized Convexity and Generalized Monotonicity, Samos, Sept. 1999, N.
1 INTRODUCTION
where supp(f, H) = {h ∈ H : h(x) ≤ f(x) (∀ x ∈ X)} is the support set of the
function f with respect to H.
A set H is called a supremal generator of a set F of functions f defined on
X if each f E F is abstract convex with respect to H. A supremal generator
H is a base (in a certain sense) of F , so some properties of H can be extended
to the entire set F. (See Rubinov (2000), Chapter 6 and references therein for
details.) If H is a "small" set then some of its properties can be verified by
direct calculation. Thus small supremal generators are very helpful in the
examination of some problems. This observation explains why a description of
small supremal generators for a given broad class of functions is one of the
main problems of abstract convexity. The reverse problem, namely to describe
abstract convex functions with respect to a given set H, is also very interesting.
If H consists of continuous functions then the set of H-convex functions is
contained in the set LSC_H of all lower semicontinuous functions f such that
f ≥ h for some function h ∈ H. (Here f ≥ h stands for f(x) ≥ h(x) for all
x ∈ X.) The set LSC_H is very large. In particular, if constants belong to H,
then LSC_H contains all lower semicontinuous functions that are bounded from
below. As it turns out, there are very small supremal generators of the very
large set LSC_H. These supremal generators can be described by means of the
so-called peaking property (see Pallaschke and Rolewicz (1997) and references
therein) or by a technique based on functions supporting Urysohn peaks (see
Rubinov (2000) and references therein).
We present two known examples of such generators (see Rubinov (2000) and
references therein).
1) Let X be a Hilbert space and H be the set of all quadratic functions h of the
form
h(x) = −a||x − x₀||² − c, x ∈ X,
Let k be a positive number. We shall study the set H_k of all functions h of the
form
h(x) = −a p^k(x − x₀) − c (x ∈ X) (1.3)
with x₀ ∈ X, c ∈ R and a > 0, and show that this set is a supremal generator
of the class of functions P_k, which depends only on k and does not depend on
p. The class P_k is very broad. It consists of all lower semicontinuous functions
f: X → R ∪ {+∞} such that liminf_{||x||→+∞} f(x)/||x||^k > −∞.
Consider the space X = IR^n and a sublinear function p defined on X such
that (1.2) holds. It is well known that there exists a set of linear functions
U_p such that p(x) = max_{l∈U_p}[l, x], where [l, x] stands for the inner product of
vectors l and x. Let H¹ be the set of functions defined by (1.3) for the given
function p and k = 1. Since H¹ is a supremal generator of P₁, it follows that
each function f ∈ P₁ can be represented in the following form:
where V(f) = {(a, c, x₀) : −a p(x − x₀) − c ≤ f(x) ∀ x ∈ X} and U_p does not
depend on f. Thus we have the following sup-min representation of an arbitrary
function f ∈ P₁ through affine functions:
Since the class P₁ does not depend on the choice of a sublinear function p with
γ_p > 0, it is interesting to consider functions p for which the corresponding set
U_p has the least possible cardinality. Clearly, this cardinality is greater than or
equal to n + 1, since for each function p(x) = max_{i=1,...,j}[l_i, x] with j ≤ n and
nonzero l_i we have γ_p := min_{||x||=1} p(x) ≤ 0. We discuss this question in detail
(see Example 3.3, Example 3.4 and Remark 3.1).
We shall also describe conditions which guarantee that the so-called abstract
convex subdifferentials are not empty. Recall the corresponding definitions (see
Rubinov (2000)). Let X be an arbitrary set. A set L of functions defined on
X is called a set of abstract linear functions if for every l ∈ L the functions
h_{l,c}(x) = l(x) − c do not belong to L for each c ≠ 0. The set H_L = {h_{l,c} :
l ∈ L, c ∈ IR} is called the set of L-affine functions. Let f be an H_L-convex
function. The set
2 SETS P_k
Let X be a normed space and k be a positive number. Denote by P_k the set
of lower semicontinuous functions f: X → R ∪ {+∞} such that f is bounded
from below on each ball and
liminf_{||x||→∞} f(x)/||x||^k > −∞. (2.1)
It follows from the definition of the class P_k that P_k = {f^k : f ∈ P₁}. We now
describe some subsets of the set P₁.
Then f is H-convex. To prove it, take a point x₀ ∈ X and an arbitrary ε > 0.
Let a be a number such that (3.3) holds. Consider the function h(x) = −a p^k(x −
x₀) + (f(x₀) − ε). Let (x₀, f(x₀) − ε) + K_a = K. We have
K = {(x, v) : x = x₀ + y, v = f(x₀) − ε + μ, μ ≤ −a p^k(y)}
  = {(x, v) : x = x₀ + y, v ≤ f(x₀) − ε − a p^k(y)}
  = {(x, v) : v ≤ f(x₀) − ε − a p^k(x − x₀)}
  = {(x, v) : v ≤ h(x)}.
Let x ∈ X and y = x − x₀. Since (x, f(x)) ∈ epi f it follows from (3.3) that
(x, f(x)) ∉ (x₀, f(x₀) − ε) + K_a, so f(x) > h(x). Since h(x₀) = f(x₀) − ε and
ε is an arbitrary positive number, we conclude that f(x₀) = sup{h(x₀) : h ∈
supp(f, H)}.
We now prove that (3.3) is valid. Assume on the contrary that there exist
x₀ ∈ X and ε > 0 such that for each positive integer n a pair (x_n, λ_n) can
be found such that (x_n, λ_n) ∈ epi f and (x_n, λ_n) ∈ (x₀, f(x₀) − ε) + K_n. Let
(y_n, μ_n) = (x_n, λ_n) − (x₀, f(x₀) − ε). It follows from the above that
The function f is bounded from below on each ball, therefore (3.6) implies
unboundedness of the sequence x_n. Since liminf_{||x||→+∞} f(x)/||x||^k > −∞, we
conclude that there exists c > 0 such that f(x_n) ≥ −c||x_n||^k for all sufficiently
large n. Hence
λ_n ≥ −c||x_n||^k (3.7)
for these n. On the other hand, applying (3.5) we deduce that
for all n. Due to (3.7) and (3.8), we have for all large enough n:
Theorem 3.1 Let P(H_k) be the set of all H_k-convex functions. Then
P(H_k) = P_k = LSC_{H_k}.
Proof: The result follows directly from Lemma 3.1, Lemma 3.2 and the obvious
inclusion P(H_k) ⊂ LSC_{H_k}. □
Example 3.1 Let X be a normed space. The set H¹ of all functions h defined
on X by h(x) = −a||x − x₀|| − c with x₀ ∈ X, a ≥ 0, c ∈ R is a supremal
generator of P₁. If P₁ ⊃ H ⊃ H¹, then H is also a supremal generator of P₁.
In particular the set of all concave Lipschitz functions is a supremal generator
of P₁.
Example 3.2 Let X be a Hilbert space and H² be the set of all functions h
defined on X by h(x) = −a||x||² + [l, x] − c with a > 0, l ∈ X, c ∈ R. (Here
[l, x] is the inner product of vectors l and x.) It is well known (see, for example,
Rubinov (2000)) that H² is a supremal generator of the set LSC_{H²}. Clearly
h ∈ H² if and only if there exist a point x₀ ∈ X and a number c' ∈ IR such
that h(x) = −a||x − x₀||² − c'. So Theorem 3.1 shows that P₂ coincides with
the set of all H²-convex functions.
Example 3.4 The following particular case of Example 3.3 is of special interest.
Let X be the n-dimensional space IR^n, I = {1, . . . , m}, and let l_i (i ∈ I) be
vectors such that their conic hull cone(l₁, . . . , l_m) := {Σ_{i∈I} λ_i l_i : λ_i ≥ 0 (i ∈ I)}
coincides with the entire space IR^n. Then the set
is bounded and contains a ball cB where B = {x : ||x|| ≤ 1} and c > 0. Let p_S
be the Minkowski gauge of S. We have
p_S(x) ≤ p_{cB}(x) = (1/c) p_B(x) = (1/c)||x||  for all x ∈ IR^n.
Thus the function p_S is finite. The set S is bounded, so there exists γ > 0 such
that S ⊂ (1/γ)B. We have
Remark 3.1 Consider a number $m$ with the following property: there exist $m$ vectors $l_1, \dots, l_m$ such that the sublinear function $p(x) = \max_{i=1,\dots,m}[l_i, x]$ ($x \in X$) is strictly positive for all $x \ne 0$. If $m \le n$ then this property does not hold for the function $p$. Indeed, the system $[l_i, x] = -1$, $i = 1, \dots, m$, has a solution for arbitrary nonzero vectors $l_1, \dots, l_m$. It follows from Example 3.4 that we can find corresponding vectors if $m = n + 1$. Thus the least number $m$ which possesses the mentioned property is equal to $n + 1$.
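The construction in Example 3.4 and Remark 3.1 can be checked numerically. The sketch below uses a made-up instance (not taken from the text) with $m = n + 1 = 3$ vectors in $\mathbb{R}^2$ whose conic hull is all of $\mathbb{R}^2$, and verifies that $p(x) = \max_i [l_i, x]$ is strictly positive for $x \ne 0$ and bounded above by a multiple of the norm:

```python
import math, random

# Hypothetical instance with m = n + 1 = 3 vectors whose conic hull is all
# of R^2: any x is a nonnegative combination of these three vectors.
L = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]

def p(x):
    """Sublinear function p(x) = max_i [l_i, x] from Example 3.4."""
    return max(l[0] * x[0] + l[1] * x[1] for l in L)

# p is the Minkowski gauge of S = {x : p(x) <= 1}: strictly positive for
# x != 0 and bounded above by (1/c)||x|| (here c = 1/2 suffices).
random.seed(0)
for _ in range(1000):
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    if x == (0.0, 0.0):
        continue
    assert p(x) > 0.0
    assert p(x) <= 2.0 * math.hypot(x[0], x[1])
```

If all three inner products were nonpositive, then $x_1 \le 0$, $x_2 \le 0$ and $x_1 + x_2 \ge 0$ would force $x = 0$, which is why strict positivity holds for every nonzero $x$.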
In this section we describe sufficient conditions which guarantee that the $L_k$-subdifferential is not empty. These conditions become also necessary for $k = 1$.
Recall the following well-known definition (see, for example, Burke (1991)).
A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is called calm at a point $x_0 \in \mathrm{dom}\, f$ if
Proof: 1) Assume that $f$ is calm of degree $k$, that is, (4.2) holds. Then there exist numbers $c_1$ and $d_1 > 0$ such that $f(x) - f(x_0) \ge c_1\|x - x_0\|$ if $\|x - x_0\| < d_1$. Since $\liminf_{\|x\|\to\infty} f(x)/\|x\|^k > -\infty$, it follows that there exist numbers $c_2$ and $d_2 > 0$ such that $f(x) \ge c_2\|x\|^k$ if $\|x\| > d_2$. Since
Recall (see, for example, Demyanov and Rubinov (1995)) that a function $f$ defined on a normed space $X$ is called subdifferentiable at a point $x \in X$ if there exists the directional derivative
$$f'_x(u) = \lim_{\alpha \to +0} \frac{1}{\alpha}\bigl(f(x + \alpha u) - f(x)\bigr)$$
for all $u \in X$ and $f'_x$ is a continuous sublinear function. Each convex function $f$ is subdifferentiable at a point $x \in \operatorname{int} \mathrm{dom}\, f$. A function $f$ is called quasidifferentiable (see, for example, Demyanov and Rubinov (1995)) at a point $x$ if the directional derivative $f'_x$ exists and can be represented as the difference of two continuous sublinear functions. If $f$ is the difference of two convex functions, then $f$ is quasidifferentiable.
The support set $\mathrm{supp}(r, X^*)$ of a continuous sublinear function $r : X \to \mathbb{R}$ with respect to the conjugate space $X^*$ will be denoted by $\partial r$. Note that $\partial r$ coincides with the subdifferential (in the sense of convex analysis) of the sublinear function $r$ at the point 0.
1) The $L_1$-subdifferential $\partial_{L_1} f(x_0)$ is not empty and contains a function $l_{1,c,x_0}$ with some $c > 0$.
2) Let $f$ be quasidifferentiable at the point $x_0$ with $f'_{x_0}(u) = r_1(u) - r_2(u)$, where $r_1, r_2$ are continuous sublinear functions, and let $l_{1,c,x_0} \in \partial_{L_1} f(x_0)$. Then $\partial r_2 \subset \partial r_1 + c\,\partial p$.
Proof: 1) The function $f$ is calm at the point $x_0$, so the result follows from Theorem 4.1.
2) Let $l(x) = -c\,p(x - x_0)$ be an $L_1$-subgradient of $f$ at the point $x_0$. Let $u \in X$ and $\alpha \ge 0$. Then
Thus
$$-c\,p(u) \le f'(x, u) = r_1(u) - r_2(u) \quad \text{for all } u \in X,$$
which leads to the inclusion $\partial r_2 \subset \partial r_1 + c\,\partial p$. $\Box$
Indeed, it follows from Proposition 4.1 that $0 \in \partial f'_{x_0} + c\,\partial p$, which is equivalent to (4.3).
Acknowledgments
This research has been supported by Australian Research Council Grant A69701407.
References
1 INTRODUCTION
Consider a set of $l$ data vectors $\{x_i, y_i\}$, with $y_i \in \{+1, -1\}$, $i = 1, \dots, l$, where $x_i$ is the $i$-th data vector that belongs to the binary class $y_i$. We seek the hyperplane that best separates the two classes with the widest margin. More specifically, the objective of training the SVM is to find the hyperplane (Burges (1998))
$$w \cdot \Phi(x) + b = 0, \qquad (2.1)$$
TRAINING DUAL-ν SUPPORT VECTOR MACHINES
subject to
to minimise
where $\rho$ is the position of the margins and $\nu$ is the error parameter to be defined later in the section. The function $\Phi$ is a mapping from the data space to the feature space, providing generalisation for decision functions that may not be linear in the training data. The problem is equivalent to maximising the margin $2/\|w\|$ while minimising the cost of the errors $\sum_i(\nu\rho - \xi_i)$, where $w$ is the normal vector and $b$ is the bias describing the hyperplane, and $\xi_i$ is the slack variable for classification errors. The margins are defined by $w \cdot x + b = \pm\rho$.
In the Dual-ν formulation, we introduce $\nu_+$ and $\nu_-$ as the error parameters of training for the positive and negative classes respectively, where
with
where $l_+$ and $l_-$ are the numbers of training points for the positive and negative classes respectively. Note that the original ν-SVM formulation by Schölkopf et al. (2000) can be derived from 2ν-SVM by letting $\nu_+ = \frac{\nu l}{2 l_+}$ and $\nu_- = \frac{\nu l}{2 l_-}$.
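A small sketch of the class-size biasing implied by this choice. The relation $\nu_\pm = \nu l/(2 l_\pm)$ used below is an assumption based on the ν-SVM literature rather than a quotation of the chapter's equations:

```python
# Assumed relation (from the nu-SVM literature, not quoted from the text):
# nu_plus = nu*l/(2*l_plus), nu_minus = nu*l/(2*l_minus), which gives each
# class an equal share nu*l/2 of the total error budget.
def two_nu(nu, l_plus, l_minus):
    l = l_plus + l_minus
    return nu * l / (2.0 * l_plus), nu * l / (2.0 * l_minus)

nu_p, nu_m = two_nu(0.2, 300, 700)
assert abs(nu_p * 300 - nu_m * 700) < 1e-9      # equal per-class budgets
assert abs(nu_p * 300 - 0.2 * 1000 / 2) < 1e-9  # each equals nu*l/2
```

For balanced classes ($l_+ = l_-$) both parameters collapse back to the single $\nu$ of the original formulation.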
The 2ν-SVM training problem can be formulated as a Wolfe dual Lagrangian problem (Chew et al. (2001a)). The Wolfe dual Lagrangian is given by
subject to
This property is required in the training process and will be shown in Section
3.3.
3 OPTIMISATION METHOD
and we desire the solution to the sub-problem $F_{BB}$. The solution to the problem is obtained when no working set $B$ can be found and updated that increases $F_{BB}$ while satisfying the constraints of the problem.
We can combine the two processes to get a simple and intuitive method of solving the 2ν-SVM training problem. At each iterative step, the objective function is maximised with respect to only two variables. That is, we decompose the problem into a sub-problem with a working set of two points $p$ and $q$. This decomposition simplifies the iterative step while still converging to the solution, owing to the convexity of the problem.
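The two-variable working-set idea can be sketched on a toy problem. The objective and names below are hypothetical (this is not the authors' exact 2ν-SVM sub-problem): we maximise a concave quadratic $F(a) = b^{T}a - \tfrac{1}{2}a^{T}Qa$ under box constraints and a fixed sum, moving only along $e_p - e_q$ so the sum constraint is preserved at every step:

```python
# Hypothetical objective (not the authors' exact 2nu-SVM sub-problem):
# maximise F(a) = b.a - 0.5 a.Q.a subject to 0 <= a_i <= C and sum(a) fixed,
# updating only a working set of two variables (p, q) per step.
def solve_pairwise(Q, b, C, a, sweeps=200):
    n = len(b)
    for _ in range(sweeps):
        # gradient of F: g_i = b_i - (Q a)_i
        g = [b[i] - sum(Q[i][j] * a[j] for j in range(n)) for i in range(n)]
        # cheap O(n) pair selection: largest gradient gap
        p = max(range(n), key=lambda i: g[i])
        q = min(range(n), key=lambda i: g[i])
        # optimal unconstrained step along d = e_p - e_q
        denom = Q[p][p] - 2.0 * Q[p][q] + Q[q][q]
        if denom <= 1e-12:
            break
        delta = (g[p] - g[q]) / denom
        # clip so both variables stay inside the box [0, C]
        delta = min(delta, C - a[p], a[q])
        if delta <= 1e-12:   # no admissible improvement for this pair
            break
        a[p] += delta
        a[q] -= delta
    return a

# with sum(a) fixed at 2, the maximiser of 3*a0 + a1 - a0^2 - a1^2
# over the line a0 + a1 = 2 is (1.5, 0.5)
a = solve_pairwise([[2.0, 0.0], [0.0, 2.0]], [3.0, 1.0], 10.0, [1.0, 1.0])
assert abs(a[0] - 1.5) < 1e-9 and abs(a[1] - 0.5) < 1e-9
```

Because the objective is concave, repeated two-variable steps of this kind reach the same maximum as solving the full problem at once, which is the property the decomposition relies on.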
With $p, q \in \{1, \dots, l\}$, (3.1) can be rewritten to extract the $\alpha_p^{(k)}$ and $\alpha_q^{(k)}$ components.
With some change $\delta_{pq}$ made on the decision variables, the change in $F_{pq}^{(k+1)}$ can be obtained using the substitutions
As only the $p$-th and $q$-th decision variables are updated, the change in the objective function at iteration $(k + 1)$ is shown in the Appendix to be
where
In each iterative step, the objective function is increased by the largest amount possible in changing $\alpha_p^{(k)}$ and $\alpha_q^{(k)}$. We seek the pair of decision variables to update in each step by searching for the maximum change in the objective function in updating the pair. The optimal $\delta_{pq}$ for each $(p, q)$ pair, denoted by $\delta^*_{pq}$,
resulting in
This in turn gives the maximum possible change in the objective function for the updated pair as
for $p, q \in \{1, \dots, l\}$.
Using (3.9) requires $O(l^2)$ operations to search for the $(p, q)$ pair. We can reduce the complexity by simplifying the search to
Although the new search criterion is a simplification of (3.9), due to the convexity of the objective function, the optimisation process still converges to the same maximum, albeit along a less optimal path, but has a search complexity of only $O(l)$, as it only searches for the maximum and minimum of $G_i^{(k)}$.
All kernel functions need to satisfy Mercer's condition (Vapnik (1995)). Mercer's condition states that the kernel is actually a dot product in some space, and therefore the denominator of (3.7) is positive for all valid kernel functions. Thus, it is clear that for the $(p, q)$ pair found using either (3.9) or (3.10), we obtain $\delta^*_{pq} \ge 0$. If $\delta^*_{pq} = 0$, the objective function is at its maximum and we have found the trained SVM. The iteration process continues as long as
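The positivity claim can be illustrated numerically. The snippet assumes the denominator of (3.7) takes the standard pairwise form $K_{pp} - 2K_{pq} + K_{qq}$ (an assumption, since (3.7) itself involves further details), which equals $\|\Phi(x_p) - \Phi(x_q)\|^2$ in the feature space; the kernel and data are made up:

```python
import math, random

def rbf(x, z, sigma=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

# K_pp - 2 K_pq + K_qq = ||Phi(x_p) - Phi(x_q)||^2 >= 0 for a Mercer kernel.
random.seed(1)
pts = [[random.gauss(0, 1) for _ in range(3)] for _ in range(30)]
for xp in pts:
    for xq in pts:
        denom = rbf(xp, xp) - 2.0 * rbf(xp, xq) + rbf(xq, xq)
        assert denom >= 0.0   # never negative for a valid kernel
```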
3.3 Constraints of 2ν-SVM
Proposition 3.2 If $\{\alpha_i^{(k)}\}$ satisfies (2.13), and $\{\alpha_i^{(k+1)}\}$ is updated using (3.4) with
$$y_p = y_q, \qquad (3.12)$$
Remark 3.1 Due to (3.12), the search for the update pair $(p, q)$ is divided into two parts, one for each class. The class which returns the higher increase in the objective function is selected for the update process.
Remark 3.2 Proposition 3.2 shows that the update process of the optimisation results in
Since the solution of the training problem has the property of (2.15), we need to initialise $\{\alpha_i^{(0)}\}$ such that
to enable the optimisation process to reach the solution.
Proposition 3.3 If $\{\alpha_i^{(k)}\}$ satisfies (2.11), and $\{\alpha_i^{(k+1)}\}$ is updated using (3.4) with
$$\delta_{pq} = \begin{cases} \dots, & \text{for } y_p = y_q = +1, \\ \dots, & \text{for } y_p = y_q = -1, \end{cases} \qquad (3.14)$$
Proof: It is clear that since $\delta^*_{pq}$ is always positive (3.11), the limiting constraints on $\delta_{pq}$ are
Using $\delta_{pq}$ as stated in (3.14) will meet the constraints of (3.15) and (3.16), and therefore (2.11). Thus, if $\{\alpha_i^{(k)}\}$ satisfies (2.11), then by induction, $\{\alpha_i^{(k+1)}\}$ satisfies (2.11) for any $k$.
Remark 3.3 The selection process requires $\delta_{pq} > 0$ for each iteration. From (3.15) and (3.16), it is clear that we need
From Propositions 3.2 and 3.3, the selection of the $(p, q)$ pair for each iteration therefore requires the search for (3.10) for each class, while satisfying (3.12), (3.17) and (3.18), to find $\delta_{pq} > 0$ with (3.14).
3.4 Algorithm
• Given
• Find the set of decision variables, $\{\alpha_i\}$, that maximises the objective function
subject to
• Define
  - the equations
  - if ($\Theta_+ \ge \Theta_-$)
    * then
    * else
  - $\alpha_q = \alpha_q + y_q \delta$
  - update $k = k + 1$.
• Terminate.
4 INITIALISATION TECHNIQUE
Consider the Lagrangian of 2ν-SVM (2.10). The $t$-th iteration of the Lagrangian (3.1) is rewritten to extract the $\alpha_r^{(t)}$ component of the function, for some $r \in \{1, \dots, l\}$, as
As only the $r$-th decision variable is updated, we can derive the change in $F_r^{(t+1)}$ using the substitutions
for some change $\delta_r$. The objective function at iteration $(t + 1)$ is shown in the Appendix to be
where
We use (4.3) to initialise the set of decision variables such that the variables satisfy constraints (2.11), (2.12) and (2.15), for the training of 2ν-SVMs, by selecting the optimal decision variable to update at each iterative step.
$$\Delta F_r^{(t+1)} = F^{(t+1)} - F^{(t)} = -C_r y_r G_r^{(t)} - \frac{1}{2}(C_r)^2 K_{rr}. \qquad (4.5)$$
The process for finding $r$ is therefore to find
In the search for $r$ at each iteration, (4.7) allows the exclusion of the class that has met the constraint. The selection process therefore searches for $r$ in a reduced set of training points,
Note that (2.15) is more stringent than (2.13), implying that meeting (2.15) is sufficient for (2.13). Consequently, we refine the update equation (4.2) to (4.10) to satisfy (4.8).
Proposition 4.1 Let $N^{(t)}$ be defined by (4.9). If the search for $r \in N^{(t)}$ uses (4.6) in the $\{\alpha_i\}$ initialisation, and the update of $\alpha_r^{(t)}$ at each iteration $t$ uses (4.10), the resulting set of $\alpha_i$ will satisfy (4.8) at the end of the initialisation process.
Since the initialisation process terminates when (4.7) is satisfied, it follows from (4.11) that (4.8) is satisfied. $\Box$
4.4 Algorithm
• Given
• Find the initial state of the set of decision variables, $\{\alpha_i\}$, for training such that
• Define
  - the equations
• While $N^{(t)} \ne \emptyset$,
  - find $r \in N^{(t)}$ with
• Terminate.
5 IMPLEMENTATION ISSUES
The training process involves the iterative maximisation of the objective function, by way of finding the maximum $\delta_{pq}^{(k+1)}$ for the positive class and the negative class. It is easy to see that finding the maximum $\delta_{pq}^{(k+1)}$ is basically finding the maximum and minimum of $G_i^{(k)}$. This is why the process has a complexity of only $O(l)$ instead of $O(l^2)$.
The calculation of $G_i^{(k)}$ can be reduced significantly by updating it at each iteration rather than recalculating it. Since only the $p$-th and $q$-th decision variables are changed, from (3.6) and using (3.4),
which has a complexity of $O(l)$ rather than $O(l^2)$ when updating $G_i^{(k+1)}$ for all $i \in \{1, \dots, l\}$. Note that the kernel functions required by the update are for
columns $p$ and $q$ of the kernel matrix. As $\alpha_p > 0$ and $\alpha_q > 0$ at iteration $(k)$ or $(k + 1)$, the columns will be cached by the caching strategy of Section 5.1.
We also have to consider the cost involved in initialising the $G_i$ cache, that is $G_i^{(k=0)}$. Assuming that the number of non-zero decision variables is $m$, calculating $G_i^{(k=0)}$ would require $O(ml)$ loops and kernel calculations. However, we will see in the next section that the initialisation method provides $G_i^{(k=0)}$ in its process as well.
The initialisation process seeks the minimum of $I_r^{(t+1)}$, which has one dynamic term $2C_i y_i G_i^{(t)}$ and one static term $(C_i)^2 K_{ii}$. The static term can be calculated at the start of the process, cached, and added to the dynamic term at each iteration.
As in the optimisation process, $G_i^{(t)}$ in the dynamic term can be updated at each iteration to reduce the computation requirements. From (4.4), and using (4.2) with $\delta_r = C_r$,
the initialisation process has provided the $G_i^{(k)}$ cache and kernel column cache needed in the optimisation process.
6 PERFORMANCE RESULTS
We tested the iterative method against the existing ad hoc method using three datasets. The ad hoc method initialises the first few decision variables until the constraints are met before the optimisation step. Each dataset is trained with the error parameters set to $\nu_+ = \nu_- = 0.2$, and using three different kernels: linear, polynomial, and radial basis function (RBF).
The first dataset is a face detection problem with 1,000 training points, with each point being an image. The dataset is trained with the linear, the polynomial (degree 5), and the RBF ($\sigma$ = 4,000) kernels. The second dataset is a radar image vehicle detection problem with 10,000 training points. The dataset is trained with the linear, the polynomial (degree 5), and the RBF ($\sigma$ = 256) kernels. The third dataset is a handwritten digit recognition problem (digit 7 against other digits) with 100,000 training points, and is trained with the linear, the polynomial (degree 4), and the RBF ($\sigma$ = 15) kernels. This dataset is also trained with $\nu_+ = \nu_- = 0.01$, using the polynomial and RBF kernels.
The 2ν-SVMs are trained on an Intel Pentium 4 (2 GHz) Linux machine, and the results are given in Table 6.1. The table shows that the iterative initialisation method provides improvements for most of the tests. The iterative initialisation method may increase the training time for small problems, but this is mainly due to the slow convergence of the optimisation process, as seen in the face detection problem using the polynomial kernel.
Figure 6.1 shows more clearly that while the computational time required for
initialisation is not reduced, there is a significant reduction in the optimisation
Table 6.1 Training times for the ad hoc and the iterative methods of initialising the
decision variables
Figure 6.1 Comparison of initialisation and optimisation times for the ad hoc and the
iterative methods of initialising the decision variables
time in some cases. These results clearly show that even with the more complex
iterative initialisation process, there are time improvements for most of the
tested cases.
7 CONCLUSIONS
2ν-SVM is a natural extension to SVM that allows different bounds for each of the binary classes, and compensation for the effects of uneven training class sizes. We have described the process for training 2ν-SVMs using an iterative process. The training process consists of the initialisation and the optimisation proper. Both use a similar technique in their iterative procedures.
Simulation and evaluations of the training process are continuing, and some results were reported in Chew et al. (2001b). The initialisation process has been found to reduce the training optimisation time, and does not incur significant costs in computing the decision variables.
In general, the optimisation process is expensive in terms of computing and memory utilisation, as well as the time it takes. By using caches and decomposition in the process presented in this work, the problems of high memory usage and redundant kernel calculations are overcome. Specifically, the memory utilisation complexity for caching the kernel calculations is reduced from $O(l^2)$ to $O(l)$.
The method presented has led to an efficient classifier implementation. It
can be implemented readily on desktop workstations, and possibly on high
performance embedded systems.
Acknowledgments
Appendix
$$\cdots - \frac{1}{2}\bigl(\alpha_r^{(t)}\bigr)^2 K_{rr} - \delta_r \alpha_r^{(t)} y_r G_r^{(t)} - \frac{1}{2}\delta_r^2 K_{rr}$$
Since n+ is the smallest integer satisfying the inequality, using (2.7) and (2.9),
The derivation is similar for the negative class, and therefore the total number
of iterations required for initialising the decision variables is
References
Abstract: The Barzilai and Borwein gradient method does not ensure descent in the objective function at each iteration, but performs better than the classical steepest descent method in practical computations. Combined with techniques such as nonmonotone line search, the method has found successful applications in unconstrained optimization, convex constrained optimization and stochastic optimization. In this paper, we give an analysis of the Barzilai and Borwein gradient method for unsymmetric linear equations with only two variables. Under mild conditions, we prove that the convergence rate of the Barzilai and Borwein gradient method is Q-superlinear if the coefficient matrix $A$ has two identical eigenvalues; if the eigenvalues of $A$ are different, then the convergence rate is R-superlinear.
1 INTRODUCTION
yielding
$$\alpha_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}. \qquad (1.4)$$
Compared with the classical steepest descent method, which dates back to Cauchy (1847), the Barzilai and Borwein gradient method often requires less computational work and speeds up the convergence greatly (see Akaike (1959); Fletcher (1990)). Theoretically, Raydan (1993) proved that the Barzilai and Borwein gradient method can always converge to the unique solution
$x^* = A^{-1}b$ of problem (1.1). If there are only two variables, Barzilai & Borwein (1988) established the R-superlinear convergence of the method; for convex quadratics of any dimension, Dai & Liao (2002) strengthened the analysis of Raydan (1993) and proved the R-linear convergence of the Barzilai and Borwein gradient method. A direct application of the Barzilai and Borwein method in chemistry can be found in Glunt (1993).
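The iteration just described can be sketched as follows. This is a minimal implementation of (1.2) with stepsize (1.4) for a small symmetric positive definite system, the classical setting of Raydan (1993); the matrix and right-hand side are made up for illustration:

```python
# Hypothetical names; a sketch of the BB iteration for A x = b (SPD case).
def bb_solve(A, b, x, iters=100):
    """x_{k+1} = x_k - alpha_k g_k with g_k = A x_k - b and
    alpha_k = s^T s / s^T y, s = x_k - x_{k-1}, y = g_k - g_{k-1} (eq. (1.4))."""
    n = len(b)
    mat = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    g = [gi - bi for gi, bi in zip(mat(x), b)]
    alpha = 1.0  # arbitrary first stepsize
    for _ in range(iters):
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = [gi - bi for gi, bi in zip(mat(x_new), b)]
        s = [u - v for u, v in zip(x_new, x)]
        yv = [u - v for u, v in zip(g_new, g)]
        sy = sum(u * v for u, v in zip(s, yv))
        if sy == 0.0:          # gradient vanished: x_new solves the system
            return x_new
        alpha = sum(u * u for u in s) / sy
        x, g = x_new, g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = bb_solve(A, b, [0.0, 0.0])
residual = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
assert max(abs(r) for r in residual) < 1e-8  # converges, per Raydan (1993)
```

Note that the residual norm is not forced to decrease at every step; the method is nonmonotone, which is exactly why the nonmonotone line search of Grippo et al. (1986) pairs naturally with it for general minimisation.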
To extend the Barzilai and Borwein gradient method to minimize a general
smooth function
$$\min f(x), \quad x \in \mathbb{R}^n, \qquad (1.5)$$
Raydan (1997) considered the use of the nonmonotone line search technique of Grippo et al. (1986) for the Barzilai and Borwein gradient method, which cannot ensure descent in the objective function at each iteration. The resulting
ANALYSIS O F BB METHOD F O R UNSYMMETRIC LINEAR EQUATIONS 185
one can similarly define the steepest descent method and the Barzilai and Borwein gradient method for problem (1.6). Friedlander et al. (1999) presented a generalization of the two methods for problem (1.6), and compared several different strategies for choosing the stepsize $\alpha_k$. To develop an efficient algorithm based on the Barzilai and Borwein gradient method for solving nonlinear equations, we present in this paper an analysis of the Barzilai and Borwein gradient method for the unsymmetric linear system (1.6), where $A \in \mathbb{R}^{n \times n}$ is nonsingular but not necessarily symmetric positive definite.
As shown in the coming sections, the analysis of the Barzilai and Borwein gradient method is difficult for the unsymmetric linear equation (1.6). In this paper, we assume that there are only two variables and that $A$ has two real eigenvalues. In addition, we assume that
The condition (1.8) does not imply the positive definiteness of $A$, which is required in the analysis of Barzilai & Borwein (1988), Dai & Liao (2002), and
Raydan (1993). Under the above assumptions, we prove that if the eigenvalues
of A are the same, the Barzilai and Borwein gradient method is Q-superlinearly
convergent (see Section 2). If A has different eigenvalues, the method converges
for almost all initial points and the convergence rate is R-superlinear (see Sec-
tion 4). The two results strongly depend on the analyses of two recurrence
relations, namely (2.8) and (4.11) (see Sections 3 and 5, respectively). Some
concluding remarks are drawn in Section 6.
Assume that $\lambda_1$ and $\lambda_2$ are the two nonzero real eigenvalues of $A$. Since the Barzilai and Borwein gradient method is invariant under orthogonal transformations, we assume without loss of generality that the matrix $A$ in (1.6) has the form
$$A = \begin{pmatrix} \lambda_1 & \delta \\ 0 & \lambda_2 \end{pmatrix}, \qquad (2.1)$$
where $\delta \in \mathbb{R}$. In this section, we will consider the case that $A$ has two equal eigenvalues, namely
$$\lambda_1 = \lambda_2 = \lambda. \qquad (2.2)$$
Assuming that
$$g_k^T A g_k \ne 0 \quad \text{for all } k \ge 0, \qquad (2.7)$$
we have by (2.4)-(2.5) that $g_k^{(1)} g_k^{(2)} \ne 0$ for all $k$. Thus $t_k$ is well defined and $t_k \ne 0$. The relation (2.7) also implies that $\delta \ne 0$, for otherwise the algorithm gives the solution in at most two steps. Furthermore, the division of (2.4) by (2.5) yields the recurrence relation
By Theorem 3.1, we know that there exists at most a zero measure set $S$ such that
$$\lim_{k\to\infty} |t_k| = +\infty \quad \text{for all } (t_1, t_2) \in \mathbb{R}^2 \setminus S. \qquad (2.9)$$
Then by (2.4), we can show that for most initial points, the Barzilai and Borwein gradient method converges globally and the convergence rate is Q-superlinear.
Theorem 2.1 Consider the linear equations (1.6), where $A \in \mathbb{R}^{2 \times 2}$ is nonsingular and has two identical real eigenvalues. Suppose that the Barzilai and Borwein gradient method (1.2) and (1.4) is used, satisfying (1.8). Then for all $(t_1, t_2) \in \mathbb{R}^2 \setminus S$, where $S \subset \mathbb{R}^2$ is some zero measure set in $\mathbb{R}^2$, we have that
$$\lim_{k\to\infty} g_k = 0.$$
Proof: By Theorem 3.1, we know that relation (2.9) holds for some zero measure set in $\mathbb{R}^2$. It follows from (2.6), (2.9), and (2.2) that
$$\lim_{k\to\infty} \frac{\delta \|g_{k-1}\|_2^2}{g_{k-1}^T A g_{k-1}} = \lim_{k\to\infty} \frac{\delta\bigl(1 + t_{k-1}^2\bigr)}{\lambda t_{k-1}^2 + \delta t_{k-1} + \lambda} = \frac{\delta}{\lambda}. \qquad (2.13)$$
Then it follows from (2.4) and the above relations that
Noting that $\|g_k\|_2^2 = \bigl(g_k^{(1)}\bigr)^2\bigl(1 + t_k^2\bigr)$, we then get by this, (2.14), and (2.6) that
$$\lim_{k\to\infty} \frac{\|g_{k+1}\|_2}{\|g_k\|_2} = \lim_{k\to\infty} \frac{\bigl|g_{k+1}^{(1)}\bigr|\sqrt{1 + t_{k+1}^2}}{\bigl|g_k^{(1)}\bigr|\sqrt{1 + t_k^2}} = 0.$$
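Theorem 2.1 can be probed numerically. The sketch below runs the BB iteration on a made-up $2 \times 2$ upper-triangular system with equal eigenvalues ($\lambda = 2$, $\delta = 0.5$, the form assumed for (2.1)) from a generic starting point; the residual collapses to machine precision, consistent with Q-superlinear convergence:

```python
# Made-up instance of the assumed form (2.1): equal eigenvalues, delta != 0.
A = [[2.0, 0.5], [0.0, 2.0]]          # lambda1 = lambda2 = 2, delta = 0.5
b = [1.0, 1.0]
x, alpha = [0.3, 0.7], 0.1            # generic start, arbitrary first stepsize
g = [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
     A[1][0] * x[0] + A[1][1] * x[1] - b[1]]
best = max(abs(v) for v in g)
for _ in range(200):
    xn = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    gn = [A[0][0] * xn[0] + A[0][1] * xn[1] - b[0],
          A[1][0] * xn[0] + A[1][1] * xn[1] - b[1]]
    s = [xn[0] - x[0], xn[1] - x[1]]
    yv = [gn[0] - g[0], gn[1] - g[1]]
    sy = s[0] * yv[0] + s[1] * yv[1]
    if sy == 0.0:
        break
    alpha = (s[0] * s[0] + s[1] * s[1]) / sy   # BB stepsize (1.4)
    x, g = xn, gn
    best = min(best, max(abs(v) for v in g))
# the residual collapses to machine precision from this generic start
assert best < 1e-8
```

For this particular matrix the stepsize is confined to $[1/2.25,\, 1/1.75]$, since $s^T A s = 2\|s\|^2 + 0.5\,s_1 s_2$, so the iteration is well defined at every step.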
Proof: The statements follow directly from (2.8) and the definition of $\{a_k\}$.
Proof: Relation (3.1) and $a_k > 0$ indicate that $\{a_k\}$ is monotonically increasing. Assume that
$$\lim_{k\to\infty} a_k = M < +\infty.$$
Then by this and (3.1), we get that
$$\lim_{k\to\infty} a_k \ge M + M^{-1}, \qquad (3.3)$$
Lemma 3.3 Consider the sequence $\{t_k\}$ that satisfies (2.8) and $t_k \ne 0$ for all $k$. Then there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that for all $(t_1, t_2) \in \mathbb{R}^2 \setminus S$, one of the following relations holds for some integer $\bar k$:
Proof: Assume that neither (3.4) nor (3.5) holds for all $k \ge 1$. Then there must exist an integer $\bar k$ such that either $t_{\bar k} > 0$, $t_{\bar k+1} < 0$ or
Since $t_k$ and $\bar t_k = -t_k$ satisfy the same recurrence relation by part (ii) of Lemma 3.2, we assume without loss of generality that (3.6) holds. Then by (3.6) and (2.8), we have that $t_{\bar k+2} = -t_{\bar k+1} - t_{\bar k+1}^{-1} > 0$. It follows that $t_{\bar k+3} < 0$, for otherwise (3.4) holds with $\bar k := \bar k + 1$. Similarly, we can prove that
$$t_{\bar k+4i+3} = -\bigl[t_{\bar k+4i+1} + \bigl(t_{\bar k+4i+2} + t_{\bar k+4i+2}^{-1}\bigr)\bigr]^{-1} < 0,$$
yielding
$$t_{\bar k+4i+1} + t_{\bar k+4i+1}^{-1} < t_{\bar k+4i+1} + \bigl|t_{\bar k+4i+2} + t_{\bar k+4i+2}^{-1}\bigr|^{-1}, \qquad (3.11)$$
which, with (3.9), implies that $t_{\bar k+4i+3} < 1 < t_{\bar k+4i+1}$. Since $t + t^{-1}$ is monotonically decreasing for $t \in (0, 1)$, we can conclude from (3.11) that
$$|t_{\bar k+4i+4}| > |t_{\bar k+4i+2}| + |t_{\bar k+4i+2}|^{-1}.$$
Similar to (3.15)-(3.17), we can prove the following three relations
$$|t_{\bar k+4i+6}| > |t_{\bar k+4i+4}| + |t_{\bar k+4i+4}|^{-1}.$$
By (3.17), (3.20), and Lemma 3.2, we get that
This together with (3.15), (3.16), (3.18), and (3.19) indicates that
$$\lim_{i\to\infty} |t_{\bar k+2i+1}| = 0.$$
$$t_{\bar k+4i+2} = t_{\bar k+4i+1}^{-1} + \bigl(1 - \psi_{\bar k+4i+1}\bigr)t_{\bar k+4i+1},$$
where $\psi_{\bar k+4i+1} \in (0, 1)$. Then it follows from (3.23) and (2.8) that
$$t_{\bar k+4i+3} = -\psi_{\bar k+4i+1} t_{\bar k+4i+1},$$
where
$$r_i = \bigl(2 - \psi_{\bar k+4i+1}\bigr)\psi_{\bar k+4i+1} t_{\bar k+4i+1}^2 + \cdots + \bigl(1 - \psi_{\bar k+4i+1}\bigr)t_{\bar k+4i+1}^2. \qquad (3.26)$$
From (3.19), (3.22), and (3.25), we see that
$$\lim_{i\to\infty} \psi_{\bar k+4i+1} = 1 \quad \text{and} \quad \lim_{i\to\infty} \bigl(\psi_{\bar k+4i+1} - 1\bigr)t_{\bar k+4i+1}^{-1} = 0. \qquad (3.27)$$
On the other hand, by (3.24) and the first part of (3.27), we see that
Noting that $\{-t_k\}$ and $\{t_k\}$ satisfy the same recurrence relation and replacing all $t_{\bar k+4i+j}$ with $-t_{\bar k+4i+j+2}$ in the previous discussions, we can similarly establish that
$$\lim_{i\to\infty} \bigl(-t_{\bar k+4i+5}\bigr)/\bigl(-t_{\bar k+4i+3}\bigr) = -1. \qquad (3.31)$$
Relations (3.30) and (3.31) imply that
$$\lim_{i\to\infty} t_{\bar k+4i+5}/t_{\bar k+4i+1} = 1. \qquad (3.32)$$
$$h_{\bar k+4i+1} = -2 t_{\bar k+4i+1} + O\bigl(t_{\bar k+4i+1}^2\bigr). \qquad (3.33)$$
Substituting (3.34) into (3.35) and comparing the resulting expression with (3.29) yields
$$h_{\bar k+4i+1} = -2 t_{\bar k+4i+1} + t_{\bar k+4i+1}^2 + O\bigl(t_{\bar k+4i+1}^3\bigr). \qquad (3.36)$$
$$h_{\bar k+4i+4} = -t_{\bar k+4i+3} + 2 t_{\bar k+4i+3}^3 - 6 t_{\bar k+4i+3}^4 + O\bigl(t_{\bar k+4i+3}^5\bigr). \qquad (3.38)$$
Substituting (3.37) into (3.38) and comparing it with (3.29), we then obtain
Theorem 3.1 Consider the sequence $\{t_k\}$ that satisfies (2.8). Then there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that
Proof: By Lemma 3.3, we know that there exists at most a zero measure set $S$ in $\mathbb{R}^2$ such that (3.4) or (3.5) holds for some $\bar k$ if $(t_1, t_2) \in \mathbb{R}^2 \setminus S$. Assume without loss of generality that (3.4) holds, for otherwise we may consider the sequence $\bar t_k = -t_k$. Then by part (i) of Lemma 3.1 and relation (3.4), we have that
Since $t + t^{-1} > 2$ for $t > 0$, it follows from (3.41) and (3.4) that $t_{\bar k+2} < -2$. Consequently,
By relations (3.41)-(3.43), we can similarly show that $t_{\bar k+6} > 0$, $t_{\bar k+7} > 0$ and $t_{\bar k+8} > 0$. The repetition of this procedure yields
$$t_{\bar k+6i+j} > 0 \quad \text{for all } i \ge 0 \text{ and } j = 0, 1, 2; \qquad t_{\bar k+6i+j} < 0 \quad \text{for all } i \ge 0 \text{ and } j = 3, 4, 5. \qquad (3.44)$$
$$\lim_{i\to\infty} |t_{\bar k+3i+2}| = +\infty. \qquad (3.48)$$
In fact, if (3.48) is not true, then there exists some constant $M > 1$ such that
$$|t_{\bar k+3i+2}| \le M \quad \text{for all } i \ge 0. \qquad (3.49)$$
Relation (3.46) implies that there exists some integer $\bar i > 0$ such that
$$|t_{\bar k+3i+3}| \ge M \quad \text{for all } i > \bar i. \qquad (3.50)$$
It follows from part (i) of Lemma 3.1, (3.44), (3.49), and (3.50) that for all $i > \bar i$,
and
$$|t_{\bar k+3i_l+2}| \ge 2M. \qquad (3.54)$$
Relation (3.48) implies that $\{i_l\}$ is an infinite set. By the choice of $\{i_l\}$, we have that
$$|t_{\bar k+3j+2}| < 2M \quad \text{for } j \in [i_l + 1,\, i_{l+1} - 1]. \qquad (3.55)$$
It follows from this, (3.50) and part (i) of Lemma 3.1 that
Since $M$ can be arbitrarily large, we know that (3.47) must hold. This together with (3.46) completes our proof.
In this section, we analyze the Barzilai and Borwein gradient method for unsymmetric linear equations (1.6), assuming that the coefficient matrix $A$, in the form of (2.1), has two different real eigenvalues, i.e.,
Denote
$$\gamma = \frac{\lambda_2 - \lambda_1}{\delta}, \qquad P = \begin{pmatrix} 1 & -\gamma^{-1} \\ 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$
Then the matrix $A$ can be written as $A = P^{-1} D P$.
Further, defining $u_k = P g_k$ and $v_k = P^{-T} g_k$, we can get by multiplying (2.3) with $P$ that
Assume that $u_k = \bigl(u_k^{(1)}, u_k^{(2)}\bigr)^T$ and $v_k = \bigl(v_k^{(1)}, v_k^{(2)}\bigr)^T$. The above relation indicates that
$$u_{k+1}^{(1)} = \frac{(\lambda_2 - \lambda_1)\, u_{k-1}^{(2)} v_{k-1}^{(2)}}{u_{k-1}^T D v_{k-1}}\, u_k^{(1)}, \qquad u_{k+1}^{(2)} = \frac{(\lambda_1 - \lambda_2)\, u_{k-1}^{(1)} v_{k-1}^{(1)}}{u_{k-1}^T D v_{k-1}}\, u_k^{(2)}. \qquad (4.4)$$
Under the condition (1.8), it is easy to show by (4.4) and the definitions of $u_k$ and $v_k$ that $u_k^{(1)} u_k^{(2)} \ne 0$ and $v_k^{(1)} v_k^{(2)} \ne 0$. Let $q_k$ be the ratio $u_k^{(1)}/u_k^{(2)}$. It follows from (4.4) that
On the other hand, by the definitions of $u_k$ and $v_k$, we have that
which yields
Substituting (4.7) into (4.5), we then obtain the following recurrence relation
from which one can establish the R-superlinear convergence result of the Barzilai and Borwein gradient method (see Barzilai & Borwein (1988)). In this paper, we assume that $\gamma \ne 0$.
For simplicity, we denote $\tau = \gamma^2$ and define the sequence
$$\lim_{i\to\infty} \frac{p_{\bar k+6(i+1)+j}}{p_{\bar k+6i+j}} = (1 + \tau)^{-2}, \quad \text{for } j = 4, 5,$$
From the above relations, we can show that if $A$ has two different eigenvalues, the Barzilai and Borwein gradient method converges globally and its convergence rate is R-superlinear.
Then for the Barzilai and Borwein gradient method (1.2) and (1.4), if (1.8) holds, we have that
$$\lim_{k\to\infty} g_k = 0. \qquad (4.17)$$
Further, the convergence rate is R-superlinear.
Thus by Theorems 5.1 and 5.2, we know that there exists some integer $\bar k$ such that the relations (4.12)-(4.15) hold. For any sufficiently small $\epsilon > 0$, let $\bar\epsilon \in (0, \epsilon)$ be another small number. For this $\bar\epsilon$, we know from (4.12) and (4.14) that there exists an integer $\bar i$ such that for all $i \ge \bar i$,
where $c > 0$ is some constant. Further, since $\bar\epsilon < \epsilon$, it follows from (4.21), (4.13), and (4.15) that there exists an integer $i_2$ such that for all $i \ge i_2$,
$$\bigl|u^{(1)}_{\bar k+6i+j}\bigr| \le c_1(1 + \tau - \epsilon)^{-2i}, \quad \text{for } j = 1, 2; \quad \dots \quad \text{for } j = 3; \quad \dots \quad \text{for } j = 4, 5, 6, \qquad (4.22)$$
$$\bigl|u^{(2)}_{\bar k+6i+j+1}\bigr| \le c_1(1 + \tau - \epsilon)^{-2i}, \quad \text{for } j = 4, 5; \quad \dots \quad \text{for } j = 1, 2, 3; \quad \dots \quad \text{for } j = 6. \qquad (4.23)$$
Thus for any $\epsilon > 0$, the following relation holds with some positive constants $c_1$ and $c_2$:
$$\bigl|u_k^{(j)}\bigr| \le c_1 c_2 (1 + \tau - \epsilon)^{-k^2}, \quad \text{for } j = 1, 2. \qquad (4.24)$$
The definition of $u_k$ implies that $g_k = P^{-1} u_k$; this and (4.24) give
$$\bigl(p_{\bar k+2i},\, p_{\bar k+2i+1}\bigr) = (-\bar m,\, \bar m), \quad \text{for some } \bar m > 0 \text{ and all } i \ge 0. \qquad (4.27)$$
By this relation, (4.18) and (4.19), we see that if $\lambda_1\lambda_2 > 0$, then the Barzilai and Borwein gradient method is linearly convergent; otherwise, if $\lambda_1\lambda_2 < 0$, the method need not converge. Here it is worthwhile pointing out that the case of (4.16) can be avoided if the first stepsize $\alpha_1$ is computed by an exact line search.
Since
$$\tau = \gamma^2 = (\lambda_2 - \lambda_1)^2/\delta^2, \qquad (4.28)$$
the value of $\tau$ can be regarded as a quantity that shows the degree to which the matrix $A$ is close to a symmetric matrix. Relation (4.25) indicates that the bigger $\tau$ is, namely the closer $A$ is to a symmetric matrix, the faster the Barzilai and Borwein gradient method converges to the solution.
In this section, we consider the sequence $\{p_k\}$ that satisfies (4.11). Assume that $p_k \ne 0, 1$ for all $k \ge 1$. To expedite our analyses, we introduce the functions
Proof: (i) follows from the definitions of $\{p_k\}$, $h(p)$ and $\phi(p)$. (ii) follows from (i). For (iii), noting that $p_k = (1 + \tau) r_k^{-1}$, we have that
Lemma 5.2 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 0, 1$ for all $k$. Then for any integer $\bar k$, the following cycle cannot occur:
$$\lim_{i\to\infty} p_{\bar k+4i+1} = 1, \qquad \lim_{i\to\infty} p_{\bar k+4i+2} = 0, \qquad (5.2)$$
By the definition of $\{p_k\}$ and the above relations, we get by direct calculation that
where
$$\lim_{i\to\infty} \theta_i = 0.$$
However, due to the relation between $\{r_k\}$ and $\{p_k\}$, similarly to (5.15) we have
Since the value on the right-hand side of (5.17) is not equal to that of (5.18) for any $\tau > 0$, we see that the two relations contradict each other. Therefore this lemma is true.
Lemma 5.3 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 0, 1$ for all $k$. Assume that
$$p_k > 0 \quad \text{for all large } k. \qquad (5.19)$$
Then there exists some index $\bar k$ such that one of the following relations holds:
By (5.23), (5.22), and (5.19), we can see that there exists some $\bar k$ such that
By part (iii) of Lemma 5.1, we assume without loss of generality that (5.24) holds, for otherwise consider the sequence $\{r_k\}$. Now we proceed by contradiction and assume that neither (5.20) nor (5.21) holds. Then by the definition of $\{p_k\}$ and the relation (5.24), it is easy to show that for all $i \ge 1$:
In the following, we will prove that the cycle (5.26) cannot occur infinitely. In fact, since $p_{\bar k+4i} \in (0, 1)$, we have by part (i) of Lemma 5.1 that
$$(1 + \tau)\phi(p^*) \ge p^*. \qquad (5.28)$$
It follows from the definition of $p_{\bar k+4i+4}$ and $p_{\bar k+4i+3} \in (0, 1)$ that
Using the fact that $\phi(p)$ is monotonically decreasing for $p \in (p^*, 1)$, we get from (5.33) that
$$p_{\bar k+4i+3} > p_{\bar k+4i-1}, \qquad (5.34)$$
for otherwise we have from the monotonicity of $\phi(p)$ and (5.26) that
$$(1 + \tau)\phi(p_{\bar k+4i+3})^{-1}\phi\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) \ge (1 + \tau)\phi(p_{\bar k+4i-1})^{-1}\phi\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) = (1 + \tau)\, h\bigl(\phi(p_{\bar k+4i-1})^{-1}\bigr) > 1. \qquad (5.35)$$
Relations (5.34) and (5.26) indicate that $\lim_{i\to\infty} p_{\bar k+4i-1} = c_4 \in (0, 1]$. If $c_4 < 1$, then we have that
Further, by the definition of $\{p_k\}$, (5.36), and part (iii) of Lemma 5.1, we obtain
However, Lemma 5.2 shows that the cycle (5.36)-(5.37) cannot occur. Thus (5.26) cannot hold for all $k$ and this lemma is true.
Theorem 5.1 Consider the sequence $\{p_k\}$ that satisfies (4.11) and $p_k \ne 1$ for all $k$. Assume that relation (5.19) holds. Then there exists some index $\bar k$ such that (4.12)-(4.15) hold.
Proof: By Lemma 5.3, there exists an integer $\bar k$ such that (5.20) or (5.21) holds. By part (iii) of Lemma 5.1, we assume without loss of generality that (5.20) holds, for otherwise we consider the sequence $\{(1 + \tau)p_k^{-1}\}$. By (5.20) and part (i) of Lemma 5.1, it is easy to see that
Notice that
By (5.43) and part (iii) of Lemma 5.1, we can similarly prove that
Assume that (5.48) is false. Then there exists some constant $c_5 > 0$ such that
$$p_{\bar k+6i+2} \ge c_5 \quad \text{for all } i \ge 1. \qquad (5.49)$$
Noting that
$$h(p) > 1 + \tau \quad \text{for } p \in (0, 1), \qquad (5.50)$$
we have from this, part (ii) of Lemma 5.1, and (5.46) that
$$\limsup_{i\to\infty} \frac{p_{\bar k+6i+7}}{p_{\bar k+6i+1}} \le \frac{1 + \tau}{h(c_5)} < 1. \qquad (5.51)$$
The relation (5.51) implies the truth of (5.47), which contradicts (5.49). Thus
(5.48) is true. For any E E (0,0.5], we know from (5.46) and (5.48) that there
exists some integer i such that
and
h ( ~ ~ + 6 i + 55) h(e), for all i 2 i.
h(~k+6i+2)
By part (ii) of Lemma 5.1, (5.50), (5.53), (5.52), and e 5 0.5, it is clear that
Then for any i ∈ [ī + 1, î − 1], we have p_{k+6i+1} > ε, and this together with
part (ii) of Lemma 5.1, (5.50), (5.53), and (5.54) indicates that

    p_{k+6i+1} ≤ p_{k+6(i−1)+1} ≤ ··· ≤ p_{k+6(ī+1)+1} ≤ 2ε, for all i ∈ [ī + 1, î − 1],

and hence

    p_{k+6i+1} ≤ 2ε, for all i ≥ ī.    (5.57)
and

    d_1((1 + τ)p_k^{-1}) = d_1(p_k),
Relation (5.63), the definition of {p_k}, and part (i) of Lemma 5.1 indicate that
we have from the definitions of D, d_1, d_2, and part (i) of Lemma 5.1 that
Lemma 5.5 Consider the sequence {p_k} that satisfies (4.11) and p_k ≠ 0, 1.
Assume that there exists an infinite subsequence {k_i} such that p_{k_i} < 0. If
(5.68) holds, then there exists some index k such that at least one of the following
relations holds:

    p_k < 0,  p_{k+1} ∈ (0, 1).    (5.69)
We proceed by contradiction and assume that neither (5.69) nor (5.70) holds
for any k. By the definition of {p_k}, we see that if p_k < 0 and p_{k+1} < 0, then
p_{k+2} > 0. So there must exist some integer k such that
Define d_1, d_2, and D as in Lemma 5.4. Then by Lemma 5.4 and (5.72), we
know that D(p_{k+2i+2}, p_{k+2i+3}) is monotonically decreasing in i. If (5.68) holds,
one can strengthen the analysis in Lemma 5.4 and obtain a constant c_6 ∈ (0, 1),
depending only on p_k, such that

    D(p_{k+2i+2}, p_{k+2i+3}) ≤ c_6 D(p_{k+2i}, p_{k+2i+1}), for all i ≥ 0.
It follows that

    lim_{i→∞} D(p_{k+2i}, p_{k+2i+1}) = 0.    (5.74)

Since d_2(p_{k+2i}, p_{k+2i+1}) ≥ (1 + τ)^{-3}, we get from this, (5.74), and the definition of D that

    lim_{i→∞} d_1(p_{k+2i}, p_{k+2i+1}) = 0.

On the other hand, the definition of p_{k+2i+2} and (5.72) imply that

    p_{k+2i} p_{k+2i+2} = p_{k+2i+1}^2 h(p_{k+2i}) ∈ (1, (1 + τ)^2).
By (5.76), (5.77), and (5.58), we know that there exists some integer, which we
continue to denote by k, such that

    lim_{i→∞} p_{k+4i} = −∞,  lim_{i→∞} p_{k+4i+2} = 0.    (5.78)

Further, from (5.78), (5.72), and the definition of {p_k}, we can obtain

    lim_{i→∞} p_{k+4i+1} = 1,  lim_{i→∞} p_{k+4i+3} = 1 + τ.    (5.79)
However, Lemma 5.2 shows that the cycle (5.78)-(5.79) cannot occur. The
contradiction establishes the lemma.
Theorem 5.2 Consider the sequence {p_k} that satisfies (4.11) and p_k ≠ 0, 1
for all k. Assume that there exists an infinite subsequence {k_i} such that p_{k_i} < 0
and that (5.68) holds. Then there exists some integer k such that relations
(4.12)-(4.15) hold.
Proof: By Lemma 5.5, there exists an index k such that (5.69) or (5.70)
holds. By part (iii) of Lemma 5.1, we assume without loss of generality that
(5.69) holds. Then it follows by the definition of {p_k} that (5.80) holds,
where p_1^* is the same constant as in (5.41). Note that
By this relation, (5.80), and part (ii) of Lemma 5.1, we can get (5.87) and

    lim_{i→∞} p_{k+6i} = 0.

In fact, by part (ii) of Lemma 5.1, (5.87), (5.89), and (5.81), we can get

    liminf_{i→∞} p_{k+6i+3}/p_{k+6i+2} ≥ 1.
In a similar way, we can show (5.95). Thus, by part (ii) of Lemma 5.1, (5.87),
(5.89), (5.94), and (5.95), it follows that (5.96) holds. Then from this and part
(ii) of Lemma 5.1 we can get

    p_{k+6i+7}/p_{k+6i+1} ≤ h(p_{k+6i+4}) h(p_{k+6i+5}) / [h(p_{k+6i+1}) h(p_{k+6i+2})] ≤ (1 + τ)/h(c_7) < 1,    (5.99)

which implies the truth of (5.97). However, this contradicts (5.98). Thus (5.97)
is true. Similarly to (5.90) and (5.97), we can show (5.100). By part (ii) of
Lemma 5.1 and relations (5.87), (5.89), (5.90), (5.97), and (5.100), we know
that (4.12)-(4.15) hold with this index k.
6 CONCLUDING REMARKS
In this paper, we have analyzed the Barzilai and Borwein gradient
method for unsymmetric linear equations, assuming that the dimension is n =
2. Under mild assumptions, we have proved that the convergence rate of the
Barzilai and Borwein gradient method is Q-superlinear if the coefficient matrix
A has two identical eigenvalues; if the eigenvalues of A are different, then the
method converges for almost all starting points x_1 and x_2, and the convergence
rate is R-superlinear. These results depend strongly on the study of two
nonlinear recurrence relations, namely (2.8) and (4.11), which makes the analysis
difficult.
From the relations (2.13) and (4.25), we can see that the convergence of
the Barzilai and Borwein gradient method is related to the degree of symmetry of the
coefficient matrix A. If A is close to a symmetric matrix, then the method
converges rapidly; conversely, if the matrix A is markedly unsymmetric, then
the method converges slowly. The convergence of the Barzilai and Borwein
gradient method for unsymmetric linear equations is slower than that for
symmetric linear equations. In the symmetric case, the Barzilai and Borwein
gradient method gives the solution in at most two steps if A has two identical
eigenvalues; if the eigenvalues of A are different, we know from Barzilai & Borwein
(1988) that the R-superlinear convergence order of the method is √2 − ε,
where ε > 0 is any small number. In the unsymmetric case, however, relation
(4.25) indicates that the R-superlinear order of the method for unsymmetric
linear equations is only 1 if A has two different eigenvalues. Thus, to accelerate
the Barzilai and Borwein gradient method for unsymmetric linear equations, it
may be worthwhile to study transformations of an unsymmetric
matrix that improve its degree of symmetry.
This paper has made some effort to directly extend the convergence
result in Barzilai & Borwein (1988) for the Barzilai and Borwein gradient method
to unsymmetric linear equations. As pointed out by the referee, another
possibility is to apply the method to the symmetric (least squares) system

    AᵀAx = Aᵀb.    (6.1)
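As a rough illustration of the referee's suggestion, the sketch below (our own, with hypothetical names; it is not code from the paper) applies the BB gradient iteration to the normal-equation system AᵀAx = Aᵀb, using the steplength α = sᵀy/sᵀs and the update x ← x − g/α, where g = Aᵀ(Ax − b):

```python
import numpy as np

def bb_normal_equations(A, b, x0, tol=1e-10, max_iter=500):
    """BB gradient iteration applied to the least-squares system A^T A x = A^T b.

    A sketch only: the steplength alpha = s^T y / s^T s is a Rayleigh
    quotient of the (symmetric positive semidefinite) matrix A^T A.
    """
    x = x0.astype(float)
    g = A.T @ (A @ x - b)            # gradient of 0.5*||Ax - b||^2
    alpha = 1.0                      # arbitrary positive initial steplength
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - g / alpha        # gradient step with inverse BB steplength
        g_new = A.T @ (A @ x_new - b)
        s, y = x_new - x, g_new - g
        alpha = (s @ y) / (s @ s)    # BB steplength for the next iteration
        x, g = x_new, g_new
    return x
```

Since AᵀA is symmetric, the symmetric-case convergence theory applies directly, at the cost of squaring the condition number.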
Acknowledgments
This research was supported by the Chinese NSF grant (no. 19801033 and 10171104)
and Hong Kong Research Grants Council (no. CUHK 43921993). The authors would
like to thank the anonymous referee for his many valuable comments on this paper.
References
1 INTRODUCTION
where
A similar result for the best piece-wise polynomial approximation problem was
proved in Vershik, Malozemov, Pevnyi (1975).
For every j E J the function Fj(x) is called an elementary function, and
it is assumed that one is able to find a minimizer of Fj(x) (exactly or ap-
proximately). The problem of minimizing Fj(x) will also be referred to as
AN EXCHANGE ALGORITHM FOR SUM-MIN FUNCTIONS
If σ_i = ∅, then by definition c_i(x_i, ∅) = 0, i = 1, 2.

    F(x) ≤ F_Γ(x)  for all x ∈ S.
Definition 2.1 Every such partition (it depends upon x and, evidently, is
not unique) is called an x-proper partition of the set Ω.
Let T(Ω, x) denote the family of all x-proper partitions of the set Ω. Clearly,
the inclusion T(Ω, x) ⊂ T(Ω) holds.
For Γ = (σ_1, σ_2) ∈ T(Ω) let us introduce the function

    Φ(Γ) = inf_{x_1 ∈ R^{n_1}} c_1(x_1, σ_1) + inf_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2).

Hence,

    inf_{x ∈ S} F(x) ≤ min_{Γ ∈ T(Ω)} Φ(Γ).    (3.2)

On the other hand, for an x-proper partition Γ_0,

    F(x) ≥ inf_{x_1 ∈ R^{n_1}} c_1(x_1, σ_1) + inf_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2) = Φ(Γ_0) ≥ min_{Γ ∈ T(Ω)} Φ(Γ).
    min_{x_2 ∈ R^{n_2}} c_2(x_2, σ_2) = c_2(x_2(σ_2), σ_2).    (3.5)

The point x(Γ) = (x_1(σ_1), x_2(σ_2)) is not unique if the minima in (3.4) or (3.5)
are attained at more than one point.
Remark 3.1 Theorem 3.1 implies that the problem of minimizing the function
F on S reduces to solving a finite number (precisely, |T(Ω)|)
of problems of minimizing functions of the form
4 MINIMALITY CONDITIONS
Then for the point x̄ = (x̄_1, x̄_2) ∈ S, by (4.3) and (2.2) one gets the following:

for every t_i ∈ Z_1(x*) there exists δ_i > 0 such that t_i ∈ Z_1(x) for any
x ∈ B(x*, δ_i);

for every t_j ∈ Z_2(x*) there exists ε_j > 0 such that t_j ∈ Z_2(x) for any
x ∈ B(x*, ε_j);

consequently, Z_1(x*) ⊂ Z_1(x̄) and Z_2(x*) ⊂ Z_2(x̄).
which contradicts the fact that Z_1(x̄) ∩ Z_2(x̄) = ∅. So t ∈ C(x*), and hence
A_1 ⊂ C(x*). Similarly we get A_2 ⊂ C(x*).

Now let us take an arbitrary disjoint partition (Z̄_1, Z̄_2) of the set C(x̄). For
the corresponding x̄-proper partition (σ_1, σ_2) ∈ T(Ω, x̄) we obtain (4.7).
It is easy to see from (4.4) that Z̄_1 ⊂ C(x̄) ⊂ C(x*) and
Z̄_2 ⊂ C(x̄) ⊂ C(x*); therefore (4.8) holds.

It follows from (4.7) and (4.8) that x* is a local minimizer of the function F. □
Definition 4.1 A point x* E S satisfying the conditions (4.1) and (4.2) is
called a stationary point.
Remark 4.1 The notion of a stationary point is closely related to the necessary
condition used. Since the conditions (4.1) and (4.2) are necessary conditions
for a global minimum, it is natural to expect that not every local minimizer
is a stationary point. Note that conditions (4.1) and (4.2) are of a nonlocal
nature.
5 AN EXCHANGE ALGORITHM
Let us suppose that for every Γ = (σ_1, σ_2) ∈ T(Ω) the corresponding infima
are attained, that is, there exists a point x(Γ) = (x_1(σ_1), x_2(σ_2)) ∈ S at which
they are achieved.

The following algorithm allows one to find a stationary point of the function
F (and if the φ_i, i = 1, 2, are continuous, then the resulting point will be a local
minimizer of F).
2. Let x^k = (x_1^k, x_2^k) ∈ S have already been found. Construct the sets
Z_1(x^k), Z_2(x^k), and C(x^k).

3. Check the conditions (4.1) and (4.2) for all Γ = (σ_1, σ_2) ∈ T(Ω, x^k).

4. If the conditions (4.1) and (4.2) are satisfied for all Γ ∈ T(Ω, x^k), then the
point x^k is stationary, and the process terminates.

5. Otherwise, find any Γ_k = (σ_1^k, σ_2^k) ∈ T(Ω, x^k) for which one of the
conditions (4.1), (4.2) is violated.
Clearly,

    F(x^{k+1}) < F(x^k).    (5.2)

As a result, a sequence {x^k} is constructed such that condition (5.2)
holds. Since every x-proper partition (σ_1^k, σ_2^k) may occur only once (due to
(5.2)), and taking into account the fact that |T(Ω)| is finite, one concludes
that the algorithm converges to a stationary point in a finite number of steps.
Remark 5.1 The algorithm described above may require at some steps the complete
enumeration of the set T(Ω, x^k), where |T(Ω, x^k)| = 2^{|C(x^k)|}. So, in practice, the algorithm
is effective if |C(x^k)| is not very large. Theoretically, the case of complete
enumeration of T(Ω) is possible (as for every algorithm of discrete mathematics).
6 AN ε-EXCHANGE ALGORITHM
In this section it is assumed that the φ_i are continuous. For every fixed point
x ∈ S and ε > 0 let us introduce the sets
Put

    σ_1 = Z_1(x) ∪ ω_1,  σ_2 = Z_2(x) ∪ ω_2,

where (ω_1, ω_2) is a disjoint partition of the set of ε-common points.
Definition 6.1 Every such partition (it depends upon x, ε and, evidently, is
not unique) is called an (x, ε)-proper partition of the set Ω.
Definition 6.2 Let all φ_i be continuous and ε > 0. A point x* ∈ S is called
an ε-local minimizer of the function F if for every (x*, ε)-proper partition
Γ = (σ_1, σ_2) ∈ T_ε(Ω, x*) of the set Ω, for the point x*(Γ) (that is, the local
minimizer of F delivered by the exchange algorithm with the initial point x(Γ))
the inequality (6.1) holds.
Remark 6.1 Since the φ_i are continuous, every stationary point is a local
minimizer; the converse is not true: a local minimizer is not necessarily a stationary
point.
Note that for Γ ∈ T(Ω, x*) the inequality (6.1) becomes an equality.

Let us describe an algorithm for finding ε-local minimizers. Let ε > 0 be
fixed.
Note that the φ_i are continuously differentiable and convex. Consider the following
clustering problem.

Problem CP: Find x* = (x_1^*, x_2^*) ∈ R^n × R^n such that

where

Take any Γ = (σ_1, σ_2) ∈ T(Ω, x); the functions c_i(x_i, σ_i) defined in (2.1) take
the form

    c_i(x_i, σ_i) = Σ_{t ∈ σ_i} ||t − x_i||²,  i = 1, 2.    (7.1)

Clearly,

    min_{x_i ∈ R^n} c_i(x_i, σ_i) = c_i(x_i(σ_i), σ_i),    (7.2)

where
Many allocation problems can be described by the above model, possibly with
slightly different performance functionals (for example, in ? the functions φ_i
have the form φ_i(t, x) = ||t − x_i||). We have chosen the quadratic functions
φ_i (see (7.1)) for reasons of simplicity (since then the auxiliary problems (see
(7.2), (7.3)) have explicit solutions), because our main intention here is to
demonstrate the algorithm.
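For the quadratic functions (7.1) the elementary problem (7.2)-(7.3) is solved explicitly by the centroid of each cluster. The following is a simplified sketch of one step under our own naming (not the authors' implementation); in particular, ties ("common points") are broken arbitrarily here, whereas the full exchange algorithm enumerates every proper partition of the common points:

```python
import numpy as np

def exchange_step(points, x1, x2):
    """Assign each t to its nearer centre, then replace each centre by the
    centroid of its cluster, the explicit minimizer of
    c_i(x_i, sigma_i) = sum_{t in sigma_i} ||t - x_i||^2 (cf. (7.1)-(7.3))."""
    d1 = np.sum((points - x1) ** 2, axis=1)
    d2 = np.sum((points - x2) ** 2, axis=1)
    sigma1 = points[d1 <= d2]        # ties go to sigma1 in this sketch
    sigma2 = points[d1 > d2]
    new_x1 = sigma1.mean(axis=0) if len(sigma1) else x1
    new_x2 = sigma2.mean(axis=0) if len(sigma2) else x2
    return new_x1, new_x2

def cluster(points, x1, x2, steps=50):
    """Iterate exchange steps until the centres stop moving."""
    for _ in range(steps):
        nx1, nx2 = exchange_step(points, x1, x2)
        if np.allclose(nx1, x1) and np.allclose(nx2, x2):
            break
        x1, x2 = nx1, nx2
    return x1, x2
```

Each step cannot increase the objective, so, as in Section 5, only finitely many distinct partitions can occur and the iteration terminates.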
Then, for every x_2 ∈ R^n such that ||x_2 − x_1^*|| > 2M, the point (x_1^*, x_2) is
stationary (that is, a local minimizer as well). Such local minimizers will be
called trivial stationary points, and we shall ignore them, looking for better ones.
7.1 Example 1

The function value is F(x*) = 498.4104. The point x* and the partition (σ_1, σ_2)
of the set Ω at x* are shown in Figure 7.3.

For the initial point x^0 = (x_1^0, x_2^0) with x_1^0 = (10, 10) and x_2^0 = (−10, −10),
the local minimizer x* = (x_1^*, x_2^*), where x_1^* = (1.9500, 2.9800) and
x_2^* = (−4.5833, 0.5417), is found in three steps, with F(x*) = 417.5478.
7.2 Example 2

Consider again the problem discussed in Example 1, and apply the ε-exchange
algorithm. As the initial point for the ε-exchange algorithm, let us choose one
of the local minimizers obtained in Example 1.

The results of numerical experiments for the ε-exchange algorithm with
different ε are presented in Table 7.2. The computations start from the local
minimizer x^0 = (x_1^0, x_2^0).
Figure 7.4  The first step of the ε-exchange algorithm with ε = 15 (six ε-common points).

The point x^{*1} is an ε-local minimizer for ε up to 4.

x^{*2} = (x_1^{*2}, x_2^{*2}).

x^{*3} = (x_1^{*3}, x_2^{*3}); the point x^{*3} is an ε-local minimizer for ε up to 10.

x^{*4} = (x_1^{*4}, x_2^{*4}); the point x^{*4} is an ε-local minimizer for ε up to 30.
However, at further steps the increase of ε affected the result: due to
deeper "ε-diving" we were able to pick up a better ε-local minimizer.
8 CONCLUSIONS
Thus, we have described two algorithms: the exchange algorithm for constructing
a stationary point, and the ε-exchange algorithm for finding a possibly better
minimizer. The ε-exchange algorithm allows one to "escape" from a local
minimum. These algorithms are conceptual (in the terminology of E. Polak;
see Polak (1971)), though in some cases (as demonstrated in Section 7) they
are directly applicable.
It may happen that the number of common (or ε-common) points is large.
In such a case it is useful to perform some preliminary aggregation of these
points, reducing their number to a reasonable quantity (as well as to reduce
or increase the value of ε). The aggregation idea was proposed by Prof. M.
Gaudioso.
Computationally implementable modifications of the above algorithms for
specific classes of functions will be reported elsewhere.
At each step of both algorithms, an elementary problem of minimizing a
function of the form F_Γ(x) is to be solved. The algorithms converge in a finite
number of steps to, at least, a local minimizer. If ε is sufficiently large, the
ε-exchange algorithm will produce a global minimizer; however, theoretically
it may require the complete enumeration (and solution) of all elementary
problems. The hope and expectation are that if we take a reasonable ε, then (at
least statistically) the price paid for a fairly good local minimizer will not be
too high.
The case m > 2 can be studied in a similar way. Analogous results and
algorithms can be formulated. The number of "elementary" problems becomes
m^{|Ω|} (cf. Remark 3.1) and, of course, all calculations are more complicated.
However, the exchange and ε-exchange algorithms can be constructed.
Acknowledgments
The author is thankful to an anonymous referee for his careful reading of the manu-
script, useful advice, remarks and suggestions.
References
ON THE BARZILAI-BORWEIN
METHOD
Roger Fletcher
Department of Mathematics
University of Dundee
Dundee DD1 4HN, Scotland, UK
Email: [email protected]
1 INTRODUCTION
where the number of variables n is very large, typically 10^6 or so. The case of
minimization subject to simple bounds is also considered later in the paper. A
related problem that is also studied is the solution of a nonlinear self-adjoint
elliptic system of equations, which is equivalent to the solution of the linear
system of equations
attention until the seminal paper of Raydan (1997). This paper introduces
a globalization strategy based on the non-monotone line search technique of
Grippo, Lampariello and Lucidi (1986), which enables global convergence of
the BB method to be established for non-quadratic functions. Of equal impor-
tance, a wide range of numerical experience is reported on problems of up to
10^4 variables, showing that the method compares reasonably well against the
Polak-Ribière and CONMIN techniques. Earlier papers by Glunt, Hayden and
Raydan (1993) and Glunt, Hayden and Raydan (1994) also report promising
numerical results on a distance matrix problem. The paper Glunt, Hayden and
Raydan (1994) reports on the possibilities for preconditioning the BB method,
and this theme is also taken up by Molina and Raydan (1996). Of particular
interest is the possibility of applying the BB method to box-constrained opti-
mization problems, and this is considered by Friedlander, Martinez and Raydan
(1995) (for quadratic functions) and by Birgin, Martinez and Raydan (2000).
The latter paper considers the BB method in the context of projection on to
a convex set. Another recent theoretical development has been the result that
the unmodified BB method is R-linearly convergent in the quadratic case (Dai
and Liao (1999)).
Despite all these advances, there is still much to be learned about the BB
method and its modifications. This paper reviews what is known about the
method, and advances some reasons that partially explain why the method is
competitive with CG based methods. The importance of maintaining the non-
monotonicity property of the basic method is stressed. It is argued that the
use of the line search technique of Grippo, Lampariello and Lucidi (1986) in
the manner proposed by Raydan (1997) may not be the best way of globalizing
the BB method, and some tentative alternatives are suggested. Some other
interesting observations about the distribution of the BB steplengths are also
made. Many open questions still remain about the BB method and its potential,
and these are discussed towards the end of the paper.
The theory and practice of line search methods for minimizing f(x) have been
well explored. In such a method, a search direction s^{(k)} is chosen at the start
of iteration k, and a step length θ_k is chosen to (approximately) minimize
f(x^{(k)} + θ s^{(k)}) with respect to θ. Then x^{(k+1)} = x^{(k)} + θ_k s^{(k)} is set. Usually
Initially, α_1 > 0 is arbitrary, and Barzilai and Borwein give two alternative
formulae, (2.2) and (2.3), for k > 1, where we denote y^{(k−1)} = γ^{(k)} − γ^{(k−1)}.
In fact, attention has largely been focussed on (2.2), and it is this formula
that is discussed here, although there seems to be some evidence that the
properties of (2.3) are not all that dissimilar.
In the rest of this section, we explore the properties of the BB method and
other gradient methods for minimizing a strictly convex quadratic function.
For the BB method, (2.2) can be expressed in the form (2.4)
and can be regarded as a Rayleigh quotient, calculated from the previous
gradient vector. This is in contrast to the classical steepest descent method, which is
equivalent to using a similar formula, but with y^{(k−1)} replaced by γ^{(k)}. Another
relevant property, possessed by all gradient methods, and also by the conjugate
gradient method, is the relation (2.5); that is to say, the total step lies in the
span of the so-called Krylov sequence. Also, for quadratic functions, the BB
method has been shown to converge (Raydan (1993)), and the convergence is
R-linear (Dai and Liao (1999)). However, the sequences {f(x^{(k)})} and
{||γ(x^{(k)})||_2} are non-monotonic, an explanation of which is given below, and no
realistic estimate of the R-linear rate is known. However, the case n = 2 is
special, and it is shown in Barzilai and Borwein (1988) that the rate of
convergence is R-superlinear.
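The unmodified BB iteration for a strictly convex quadratic can be sketched in a few lines (a sketch with names of our own choosing, not code from the paper); the recorded gradient norms display exactly the non-monotonic behaviour discussed below:

```python
import numpy as np

def bb_quadratic(A, b, x0, tol=1e-10, max_iter=1000):
    """Unmodified BB method for f(x) = 0.5 x^T A x - b^T x with A SPD.

    alpha is the steplength of formula (2.2): a Rayleigh quotient of A
    computed from the previous step, and the step is x <- x - gamma/alpha.
    """
    x = x0.astype(float)
    g = A @ x - b                   # gradient gamma^(k)
    alpha = 1.0                     # alpha_1 > 0 arbitrary
    history = []
    for _ in range(max_iter):
        history.append(np.linalg.norm(g))
        if history[-1] < tol:
            break
        x_new = x - g / alpha
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g
        alpha = (s @ y) / (s @ s)   # equals gamma^T A gamma / gamma^T gamma
        x, g = x_new, g_new
    return x, history
```

On a small diagonal test problem the list of gradient norms is typically non-monotone even though the iteration converges quickly.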
To analyse the convergence of any gradient method for a quadratic function,
we can assume without loss of generality that an orthogonal transformation is
made that transforms A to a diagonal matrix of eigenvalues diag(λ_i). Moreover,
if there are any eigenvalues of multiplicity m > 1, then we can choose the
corresponding eigenvectors so that g_i^{(1)} = 0 for at least m − 1 corresponding
indices of g^{(1)}. It follows from (2.1) and the properties of a quadratic function
that γ^{(k+1)} = γ^{(k)} − α_k^{-1} A γ^{(k)}, and hence, using A = diag(λ_i), that (2.6) holds.
It is clear from this recurrence that if g_i^{(k)} = 0 for any i and k = k', then this
property will persist for all k > k'. Thus, without any loss of generality, we
can assume that A has distinct eigenvalues;
then it follows from (2.4) and the extremal properties of the Rayleigh quotient
that
Thus, for the BB method, and assuming that α_1 is not equal to λ_1 or λ_n,
a simple inductive argument shows that (2.8) and (2.9) hold for all k > 1. It
follows, for example, that the BB method does not have the property of finite
termination.
From (2.6), it follows for any eigenvalue λ_i close to α_k that |g_i^{(k+1)}| << |g_i^{(k)}|.
It also follows that the values |g_1^{(k)}| are monotonically decreasing. However, if
on any iteration α_k < ½λ_n, then |g_n^{(k+1)}| > |g_n^{(k)}|, and if α_k is close to λ_1 then
the ratio |g_n^{(k+1)}/g_n^{(k)}| can approach λ_n/λ_1 − 1. Thus we see the potential for
non-monotonic behaviour in the sequences {f(x^{(k)})} and {||γ(x^{(k)})||_2}, and the
extent of the non-monotonicity depends in some way on the size of the condition
number of A. On the other hand, if α_k is close to λ_n then all the coefficients g_i
decrease in modulus, but the change in g_1 is negligible if the condition number
is large. Moreover, small values of α_k tend to diminish the components |g_i|
for small i, and hence enhance the relative contribution of components for large
i. This in turn leads to large values of α_k on a subsequent iteration, if the
step is calculated from (2.4). Thus, in the BB method, we see values of α_k
being selected from all parts of the interior of the spectrum, with no apparent
pattern, with jumps in the values of f(x^{(k)}) and ||γ(x^{(k)})||_2 occurring when α_k
is small.
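This spectral behaviour is easy to observe numerically. The sketch below (our own construction, not from the paper) iterates the recurrence g_i ← (1 − λ_i/α) g_i of (2.6), taking α as the Rayleigh quotient of the previous gradient as in (2.4), and records the steplengths; by the extremal property of the Rayleigh quotient every recorded α_k must lie in [λ_1, λ_n]:

```python
import numpy as np

def bb_alphas(lams, g0, iters=60):
    """Record the BB steplengths produced by the diagonalized gradient
    recurrence (2.6), with each alpha the Rayleigh quotient of the
    previous gradient (formula (2.4))."""
    g_old = g0.astype(float)
    alpha0 = (lams * g_old**2).sum() / (g_old**2).sum()  # start-up steplength
    g = (1.0 - lams / alpha0) * g_old
    alphas = [alpha0]
    for _ in range(iters):
        denom = (g_old**2).sum()
        if denom == 0.0 or (g**2).sum() == 0.0:
            break                    # an exact eigen-step terminated the run
        alpha = (lams * g_old**2).sum() / denom  # Rayleigh quotient of g_old
        alphas.append(alpha)
        g_old, g = g, (1.0 - lams / alpha) * g
    return np.array(alphas)
```

Plotting the returned values against the iteration count shows them wandering over the interior of the spectrum with no apparent pattern, as described above.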
There are a number of reasons that might lead one to doubt whether the
BB method could be effective in practice. Although a nice convergence proof is
given by Raydan (1993), we have to recognise the fact that although both the
CG and BB methods select iterates that satisfy the Krylov sequence property
(2.5), it is the CG method that gives the minimum possible value of f(x^{(k+1)}).
Likewise, the Minimum Residual (MR) method gives the minimum possible
value of ||γ(x^{(k+1)})||_2. Thus we must accept that the BB method is necessarily
inferior in regard to these measures in exact arithmetic, and there is limited
scope for the BB method to improve as regards elapsed time, for example. Also,
the possibility of non-monotonic behaviour of the BB method might seem to
give further reason to prefer the CG method.
To see just how inferior the BB method is, a large-scale test problem is
devised, based on the solution of an elliptic system of linear equations arising from
a 3D Laplacian on a box, discretized using a standard 7-point finite difference
scheme, with

T =

and

The problem Laplace1(a) has the centre of the Gaussian in the centre of
the box, giving the problem a high degree of symmetry. Also, the smaller value
of a gives a smoother solution. Hence this problem is easier to solve than
Laplace1(b).
The results for this problem are given in Table 2.1 below. The CG method is
coded as recommended by Reid (1971). Times are given in seconds, and double
precision Fortran is used on a SUN Ultra 10 at 440 MHz. The iteration is
terminated when ||γ^{(k)}||_2 is less than a small prescribed fraction of its initial value.

Table 2.1  Problem; BB: Time, Iterations; CG: Time, Iterations; MR: Time, Iterations.
We see from the table that there is little to choose between the CG and
MR methods, and that their elapsed time improves on the BB method by a factor of
over 3 for Laplace1(a) and a factor of over 2 for Laplace1(b). For comparison
purposes, the classical steepest descent method was manually terminated after
2000 iterations (1355 seconds), by which time it had only reduced the initial
gradient norm by a factor of 0.18, so that not even one significant figure of
improvement had been obtained. Thus we see that, although the performance of
the BB method does not quite match that of the CG method, it is able to solve the
problem in reasonable time, and significantly improves on the classical steepest
descent method.
Nonetheless, in view of the above, we might ask if there are any circumstances
under which the BB method might be worth considering as an alternative to
the CG method. The answer lies in the fact that the success of the CG iteration
depends very much on the search direction calculation s^{(k)} = −γ^{(k)} + β_k s^{(k−1)}
being consistent with data arising from a quadratic model. Any deviation
from the quadratic model can seriously degrade performance. To illustrate
that relatively small perturbations can cause this to happen, we repeat the
calculations of Table 2.1 using single precision arithmetic. The results are
displayed in Table 2.2.

We see that the CG and MR methods now take more than twice as many
iterations for Laplace1(a), with a similar, but not quite as bad, outcome for
Laplace1(b). The comparison in time is less marked, presumably because of
Table 2.2  Problem; BB: Time, Iterations; CG: Time, Iterations; MR: Time, Iterations
(among the entries are the iteration counts 645, 443, 523, and 448).
the cost savings associated with using single rather than double precision. For
the BB method a different picture emerges. For Laplace1(a), somewhat more
iterations are required, whereas for Laplace1(b), considerably fewer iterations
are required. Again the time comparison is improved by using single precision,
to such an extent that there is now little difference between the performance
of the BB and CG methods on the Laplace1(b) problem. My interpretation of
this is that the BB method is affected in a much more random way by round-off
errors, and small departures of γ^{(k)} and α_k from the values arising from a
quadratic problem are not necessarily detrimental.
This has implications for the likely success of the BB method in other con-
texts. For example, if f (x) is made up of a quadratic function plus a small
non-quadratic term, we might expect the BB method to still converge, and
show improved performance relative to the CG method. Another situation is
in the minimization of a quadratic function subject to simple bounds by an
active set or projection type of method. If the number of active constraints
changes, as is often the case, then it is usually not possible to continue to use
the standard CG formula for the search direction and yet preserve the termi-
nation and optimality properties. To do this it is necessary to restart using the
steepest descent direction when a new active set is obtained. Thus it is more
attractive to use the BB method in some way in this situation.
If the deviation of f (x) from a quadratic function is small then it may still be
possible to use the unmodified BB method successfully. However, in general
it is possible for the method to diverge. This is illustrated by using the test
    f(x^{(k)} + d) ≤ max_{max(k−M,1) ≤ j ≤ k} f^{(j)} + γ γ^{(k)T} d    (3.2)

is met, where d = −θγ^{(k)} is the displacement along the steepest descent
direction. This allows any point to be accepted if it improves sufficiently on the
largest of the M + 1 (or k, if k ≤ M) most recent function values. As usual,
γ > 0 is a small preset constant, and the integer M controls the amount of
non-monotonicity that is allowed. Raydan recommends the value M = 10 and
presents a lot of encouraging numerical evidence on test problems with up to
n = 10^4 variables. His results are competitive with CG methods, but he observes
some poorer results on ill-conditioned problems.
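A minimal sketch of this globalization (with hypothetical names; this is not Raydan's code) combines a BB step with the non-monotone Armijo test of (3.2), assuming α_k > 0 so that d = −γ^{(k)}/α_k is a descent direction:

```python
import numpy as np

def gll_accept(f_trial, f_hist, gTd, theta, gamma=1e-4, M=10):
    """Accept if f_trial improves sufficiently on the largest of the
    M+1 most recent function values (the test (3.2) in the text)."""
    return f_trial <= max(f_hist[-(M + 1):]) + gamma * theta * gTd

def nonmonotone_bb_step(f, grad, x, g, alpha, f_hist, gamma=1e-4, M=10):
    """One BB step safeguarded by the non-monotone backtracking of
    Grippo, Lampariello and Lucidi, as used by Raydan (1997)."""
    d = -g / alpha
    gTd = g @ d                      # negative when alpha > 0
    theta = 1.0
    while not gll_accept(f(x + theta * d), f_hist, gTd, theta, gamma, M):
        theta *= 0.5                 # try theta = 1, 1/2, 1/4, ...
        if theta < 1e-12:
            break
    x_new = x + theta * d
    return x_new, grad(x_new), f(x_new)
```

In the driving loop one appends each accepted f value to `f_hist` and updates α by the BB formula; because the test compares against the largest recent value, moderately non-monotone BB steps are usually accepted with θ = 1.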
To obtain more insight, a non-quadratic test problem of 10^6 variables is
derived, based on a 3D Laplacian, in which the objective function is

    ½ uᵀAu − βᵀu + a h² Σ_{ijk} e^{u_{ijk}},    (3.3)

which is not untypical of what might arise from a nonlinear partial differential
equation. This problem is referred to as Laplace2. The matrix A is that
defined in (2.10), and the vector β is chosen so that the minimizer u* of (3.3)
                          5 figures                      6 figures
    Method                Time   #ls   #f    #g     Time   #ls   #f    #g
    Polak-Ribière CG      20.6   445   697   684    ∞
    Limited mem. BFGS     35.4   315   711   669    ∞
    BB-Raydan M=10        29.0   274   1140  866    40.4   394   1595  1201
    Unmodified BB         14.4   -     487   487    16.7   -     572   572
    BB method (γ only)    8.8    -     -     487    10.3   -     -     572
with only about two function and gradient calls per line search. The BB
method (γ only), to be described below, gives the best performance. One reason
for the improvement of the unmodified BB method over the PR-CG method
might be the effect of non-quadratic terms degrading the performance of the CG
method. Another possibility is that the CG line search now requires additional
evaluations of the function and gradient to attain the required accuracy in
the line search. One would not like to draw any firm conclusions on the basis of
just one set of results, but these results do reinforce Raydan's conclusion that
the BB method, suitably modified, can match or even improve on the PR-CG
method.
Probably the most interesting outcome to emerge is the difference in
performance of the unmodified BB method and the BB-Raydan method. The
reasons for this are readily seen by examining the performance of the unmodified
method shown in Figure 3.1, where the difference f^{(k)} − f* is plotted on a log
scale against the number of iterations. A noticeable feature is the four occasions
on which a huge jump is seen in f^{(k)} − f* above the slowly varying part of the
graph. In particular, the jump around iteration 460 is over 10^5 in magnitude.
noticeable if the condition number is very large. We have seen that the value of
M = 10 fails to allow the very large spikes to be accepted, which, as we argued
above, is important for avoiding slow convergence in a gradient method.

Obvious suggestions to improve the performance of Raydan's modification
are first to choose much larger values of M, especially if the problem is likely
to be ill-conditioned. Another suggestion is to allow increases in f^{(k)} up to a
user-supplied value f̄ > f^{(1)} on early iterations. This is readily implemented by
defining f^{(k)} = f̄ for k < 1 and changing (3.2) so that the range of indices j is
k − M ≤ j ≤ k. These changes make it more likely that the non-monotonic steps
observed in Figure 3.1 are able to be accepted. On the other hand, although
the convergence proof presented by Raydan would still hold, this extra freedom
to accept 'bad' points might cause difficulties for very non-quadratic problems,
and further research on how best to devise a non-monotone line search is needed.
Another idea to speed up Raydan's method is based on the observation that
the unmodified BB method does not need to refer to values of the objective
function. These are only needed when the non-monotone line search based on
(3.2) is used. Therefore it is suggested that a non-monotone line search based
on ||γ|| is used. As in (3.2), an Armijo-type search is used along the line x^{(k)} + d,
where d = −θγ^{(k)}, using a sequence of values such as θ = 1, 1/2, 1/4, ....
Using an acceptance test such as (3.4) and a Taylor series for γ(x^{(k)} + d) about
x^{(k)}, we may obtain
where α is defined above. It follows from the Taylor series and the strict
convexity of f(x) that there exists a constant λ > 0 such that α ≥ λ, and
consequently that the test is passed
if θ is sufficiently small. Thus we can improve on the most recent value ‖y(k)‖²
in this case, and hence (3.4) holds a fortiori.
4 DISCUSSION
One thing that I think emerges from this review is just how little we understand
about the BB method. In the non-quadratic case, all the proofs of convergence
use standard ideas for convergence of the steepest descent method with a line
search, and so do not tell us much about the BB method itself; we shall therefore
restrict this discussion to the quadratic case. Here we have Raydan's ingenious
proof of convergence Raydan (1993), but this is a proof by contradiction and
does not explain, for example, why the method is significantly better than the
classical steepest descent method. For the latter method we have the much more
telling result of Akaike (1959) that the asymptotic rate of convergence is linear
and the rate constant in the worst case (under the assumptions of Section 2) is
(λn − λ1)/(λn + λ1). This exactly matches what is observed in practice. This
result is obtained by defining the vector p(k) by p_i(k) = (γ_i(k))²/γ(k)ᵀγ(k) in the
notation of Section 2. This vector acts like a probability distribution for γ(k),
and the sequence {p(k)} in the worst case oscillates between two accumulation
points e1 and en. Here the scalar product λᵀp(k) is just the Rayleigh
quotient calculated from γ(k) (like (2.4) but using γ(k) on the right-hand side).
A similar analysis is possible for the BB method, in which a superficially similar
BARZILAI-BORWEIN METHOD 251
the KKT matrix
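Akaike's worst-case behaviour is easy to reproduce numerically. The sketch below (my illustration, not part of the chapter) runs exact-line-search steepest descent on a two-dimensional quadratic started so that the gradient has equal components along the two eigendirections, which is precisely the worst case; the per-iteration reduction in f then equals the square of the rate constant (λn − λ1)/(λn + λ1).

```python
import numpy as np

# Worst-case asymptotic rate constant of steepest descent (Akaike (1959)):
lam1, lamn = 1.0, 100.0
rho = (lamn - lam1) / (lamn + lam1)

# Exact-line-search steepest descent on f(x) = 0.5 x'Ax, A = diag(lam1, lamn).
# Starting point chosen so g = Ax has equal components: the worst case, in
# which the gradient weights oscillate between the two eigendirections.
A = np.diag([lam1, lamn])
x = np.array([1.0 / lam1, 1.0 / lamn])
f = lambda x: 0.5 * x @ A @ x
ratios = []
for _ in range(50):
    fx = f(x)
    g = A @ x
    alpha = (g @ g) / (g @ A @ g)       # exact minimizing step length
    x = x - alpha * g
    ratios.append(f(x) / fx)
# In this worst case f is reduced by exactly rho**2 per iteration, so the
# error in the A-norm contracts by rho, matching Akaike's rate constant.
```

Changing the starting point shows how special the worst case is: for generic starts the early reduction ratios are far smaller than rho**2.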
Many methods have been suggested for solving optimization problems in which
the constraints are just the simple bounds
(see the references in Friedlander, Martinez and Raydan (1995) and Birgin,
Martinez and Raydan (2000) for a comprehensive review). Use of the BB
methodology is considered in two recent papers. That of Friedlander, Martinez
and Raydan (1995) is applicable only to quadratic functions and uses an active
set type strategy in which the iterates only leave the current face if the norm of
reduced gradient is sufficiently small. No numerical results are given, and to me
it seems preferable to be able to leave the current face at any time if the com-
ponents of the gradient vector have the appropriate sign. Such an approach is
allowed in the BB-like projected gradient method of Birgin, Martinez and Ray-
dan (2000). This method is applicable to the minimization of a non-quadratic
function on any closed convex set, although here we just consider the case of
box constraints for which the required projections are readily calculated.
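For box constraints the required projection is just a componentwise clip, and a BB-like projected gradient iteration in the spirit of Birgin, Martinez and Raydan (2000) can be sketched as below. This is a simplified illustration, not their Method 1 or Method 2; the names, the backtracking rule and the safeguards are my own assumptions.

```python
import numpy as np

def project_box(x, lo, hi):
    """Projection onto the box {x : lo <= x <= hi} (componentwise clip)."""
    return np.clip(x, lo, hi)

def spg_sketch(f, grad, x0, lo, hi, M=10, max_iter=200, gamma=1e-4):
    """Projected-gradient method with a BB step length and a non-monotone
    acceptance test on the largest of the M+1 most recent function values.
    A sketch only; step-length safeguards are omitted."""
    x = project_box(x0, lo, hi)
    g = grad(x)
    alpha = 1.0
    hist = [f(x)]
    for _ in range(max_iter):
        d = project_box(x - alpha * g, lo, hi) - x   # projected direction
        if np.linalg.norm(d) < 1e-10:
            break                                    # stationary for the box
        theta = 1.0
        while f(x + theta * d) > max(hist) + gamma * theta * (g @ d):
            theta *= 0.5                             # Armijo search on theta
        s = theta * d
        x_new = x + s
        y = grad(x_new) - g
        alpha = (s @ s) / (s @ y) if s @ y > 1e-12 else 1.0
        x, g = x_new, grad(x_new)
        hist = (hist + [f(x)])[-(M + 1):]
    return x
```

With the unconstrained minimizer outside the box, the iteration stops at the face of the box where the projected direction vanishes.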
Birgin, Martinez and Raydan give two methods, both of which use an
Armijo-type search on a parameter θ. Both methods use an acceptance test
similar to (3.2), which only requires sufficient improvement on the largest of the
M + 1 most recent function values in the iteration. In Method 1, the projection
References
Conn, A.R., Gould, N.I.M. and Toint, Ph.L., (1988, 1989), Global convergence
of a class of trust region algorithms for optimization with simple bounds,
SIAM J. Numerical Analysis, Vol. 25, pp. 433-460, and Vol. 26, pp. 764-767.
Dai, Y.H. and Liao, L.-Z., (1999), R-linear convergence of the Barzilai and
Borwein gradient method, Research report, (accepted by IMA J. Numerical
Analysis).
Dembo, R.S., Eisenstat, S.C. and Steihaug, T., (1982), Inexact Newton Meth-
ods, SIAM J. Numerical Analysis, Vol. 19, pp. 400-408.
Fletcher, R., (1990), Low storage methods for unconstrained optimization, Lec-
tures in Applied Mathematics (AMS) Vol. 26, pp. 165-179.
Fletcher, R. and Reeves, C.M., (1964), Function minimization by conjugate
gradients, Computer J. Vol. 7, pp. 149-154.
Friedlander, A., Martinez, J.M. and Raydan, M., (1995), A new method for
large-scale box constrained convex quadratic minimization problems, Opti-
mization Methods and Software, Vol. 5, pp. 57-74.
Friedlander, A., Martinez, J.M., Molina, B., and Raydan, M., (1999), Gradient
method with retards and generalizations, SIAM J. Numerical Analysis, Vol.
36, pp. 275-289.
Glunt, W., Hayden, T.L. and Raydan, M., (1993), Molecular conformations
from distance matrices, J. Comput. Chem., Vol. 14, pp. 114-120.
Glunt, W., Hayden, T.L. and Raydan, M., (1994), Preconditioners for Distance
Matrix Algorithms, J. Comput. Chem., Vol. 15, pp. 227-232.
Golub, G.H. and Van Loan, C.F., (1996), Matrix Computations, 3rd Edition,
The Johns Hopkins University Press, Baltimore.
Grippo, L., Lampariello, F. and Lucidi, S., (1986), A nonmonotone line search
technique for Newton's method, SIAM J. Numerical Analysis, Vol. 23, pp.
707-716.
Hestenes, M.R. and Stiefel, E.L., (1952), Methods of conjugate gradients for
solving linear systems, J. Res. Nat. Bur. Standards, Vol. 49, pp. 409-436.
Molina, B. and Raydan, M., (1996), Preconditioned Barzilai-Borwein method
for the numerical solution of partial differential equations, Numerical Algo-
rithms, Vol. 13, pp. 45-60.
Nocedal, J., (1980), Updating quasi-Newton matrices with limited storage,
Math. of Comp., Vol. 35, pp. 773-782.
1 INTRODUCTION
In recent years the duality of nonconvex optimization problems has attracted
intensive attention. There are several approaches for constructing dual prob-
lems, two of which are most popular: the augmented Lagrangian approach
(see, for example, Rockafellar (1993), Rockafellar and Wets (1998)) and the
nonlinear Lagrangian approach (see, for example, Rubinov (2000),
Rubinov et al (to appear), Yang and Huang (2001)). The construction of
dual problems, and moreover the zero duality gap property, are important
because the optimal value, and sometimes an optimal solution, of the origi-
nal constrained optimization problem can then be found by solving an uncon-
strained optimization problem. The main purpose of both augmented and
nonlinear Lagrangians is to reduce the problem at hand to a problem which is
easier to solve. Therefore the justification of theoretical studies in this field
lies in creating new numerical approaches and algorithms: for example, finding
a way to compute the multipliers that achieve the zero duality gap, or finding
a Lagrangian function that is numerically tractable.
The theory of augmented Lagrangians, represented as the sum of the or-
dinary Lagrangian and an augmenting function, has been well developed for
very general problems. Different augmented Lagrangians, including the sharp
Lagrangian, can be obtained by taking special augmenting functions (see,
for example, Rockafellar and Wets (1998)). In this paper we consider the non-
convex minimization problem with equality constraints and calculate the sharp
Lagrangian for this problem explicitly. By using the strong duality
results and the simplicity of the obtained Lagrangian, we modify the subgra-
dient method for solving the dual problem constructed. Note that subgradient
methods were first introduced in the middle 1960s; the works of Demyanov (1968),
Poljak (1969a), (1969b), (1970) and Shor (1985), (1995) were particularly
influential. The convergence rate of subgradient methods is discussed in Goffin
(1977); for further study of this subject see Bertsekas (1995) and Bazaraa et
al (1993). These methods were used for solving dual problems obtained by using
ordinary Lagrangians, or problems satisfying convexity conditions. However,
our main purpose is to find an optimal value and an optimal solution to a
nonconvex primal problem. We show that the dual function constructed in this
paper is always concave. Without solving any additional problem, we explicitly
calculate the subgradient of the dual function along which its value strongly
MODIFIED SUBGRADIENT METHOD 259
2 DUALITY
inf f0(x)
x ∈ X
subject to f(x) = 0,
where X ⊆ Rn, f(x) = (f1(x), f2(x), . . . , fm(x)) and fi : X → R, i =
0, 1, 2, . . . , m, are real-valued functions.
Let Φ : Rn × Rm → R̄ be a dualizing parametrization function defined as
where ‖·‖ is any norm in Rm and [y, u] = Σ_{i=1}^m yi ui. By using the definition of Φ,
we can calculate the augmented Lagrangian associated with (P) explicitly. For
every x ∈ X we have:
Lemma 2.1 For every u, y ∈ Rm, y ≠ 0, and for every r ∈ R+ there exists
c ∈ R+ such that c‖y‖ − [y, u] > r.
sup_{(u,c) ∈ Rm×R+} L(x, u, c) = f0(x) if x ∈ X0, and +∞ if x ∉ X0.
Hence,
This means that the value of a mathematical programming problem with equal-
ity constraints can be represented as (2.2), regardless of the properties the
original problem satisfies.
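The sharp Lagrangian L(x, u, c) = f0(x) + c‖f(x)‖ − [u, f(x)] and the representation (2.2) can be checked numerically. The sketch below is my own one-dimensional illustration, not from the chapter: on the feasible set, L coincides with f0 for every (u, c), while at an infeasible point L grows without bound as c increases.

```python
import numpy as np

def sharp_lagrangian(f0, f, x, u, c):
    """Sharp augmented Lagrangian L(x, u, c) = f0(x) + c*||f(x)|| - [u, f(x)]
    for the equality-constrained problem inf f0(x) subject to f(x) = 0."""
    fx = np.atleast_1d(f(x))
    return f0(x) + c * np.linalg.norm(fx) - u @ fx

# Toy problem: minimize x^2 subject to x - 1 = 0 (solution x = 1).
f0 = lambda x: x[0] ** 2
f = lambda x: np.array([x[0] - 1.0])

# At a feasible point f(x) = 0, so L = f0(x) for every (u, c):
x_feas = np.array([1.0])
assert sharp_lagrangian(f0, f, x_feas, np.array([5.0]), 100.0) == f0(x_feas)

# At an infeasible point the sup over (u, c) is +infinity: with u = 0,
# increasing c drives L above any bound, as in the case x not in X0 of (2.2).
x_inf = np.array([2.0])
vals = [sharp_lagrangian(f0, f, x_inf, np.array([0.0]), c) for c in (1, 10, 100)]
assert vals[0] < vals[1] < vals[2]
```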
Proofs of the following four theorems are analogous to the proofs of similar
theorems presented earlier for augmented Lagrangian functions with quadratic
or general augmenting functions. See, for example, Rockafellar (1993) and
Rockafellar and Wets (1998).
where p is a perturbation function defined by (??). When this holds, any a > c
will have the property that
Theorem 2.5 Let inf P = sup P* and suppose that for some (ū, c̄) ∈ Rm × R+
and x̄ ∈ X,

min_{x ∈ X} L(x, ū, c̄) = f0(x̄) + c̄ ‖f(x̄)‖ − [f(x̄), ū].   (2.3)

Then x̄ is a solution to (P) and (ū, c̄) is a solution to (P*) if and only if

f(x̄) = 0.   (2.4)
Sufficiency. Suppose to the contrary that (2.3) and (2.4) are satisfied but x̄
and (ū, c̄) are not solutions. Then there exists x̃ ∈ X0 such that f0(x̃) < f0(x̄).
Hence
We have described several properties of the dual function in the previous sec-
tion. In this section, we utilize these properties to modify the subgradient
method for maximizing the dual function H. Theorems 2.2, 2.3 and
2.4 give necessary and sufficient conditions for equality between
inf P and sup P*. Therefore, when the hypotheses of these theorems are satis-
fied, the maximization of the dual function H will give us the optimal value of
the primal problem.
We consider the dual problem
The assertion of the following theorem can be obtained from the known the-
orems on the subdifferentials of the continuous maximum and minimum func-
tions. See, for example, Polak (1997).
Let xk be any solution. If f(xk) = 0, then stop: by Theorem 2.5, (uk, ck) is
a solution to (P*) and xk is a solution to (P). Otherwise, go to Step 2.
2. Let
Proof: Let (uk, ck) ∈ Rm × R+, xk ∈ X(uk, ck), and let (uk+1, ck+1) be a new
iterate calculated from (3.1) for arbitrary positive scalar stepsizes sk and εk. Then,
by Theorem 3.1, the vector (−f(xk), ‖f(xk)‖) ∈ Rm × R+ is a subgradient of
the concave function H at (uk, ck), and by the definition of subgradients we have:
Now suppose that the last minimum is attained at some x̄ ∈ X. If f(x̄) were
zero, then by Theorem 2.5, the pair (uk, ck + εk ‖f(xk)‖) would be a solution
to the dual problem, and therefore
distance between the points generated by the algorithm and the solution to
the dual problem decreases at each iteration (cf. Bertsekas (1995), Proposition
6.3.1).
where the last inequality is a result of the inequalities c̄ − ck > 0, ‖f(xk)‖ > 0, and
0 < εk < sk. Now, by using the subgradient inequality
we obtain
and the theorem is proved. The inequality (3.3) can also be used to establish
the convergence theorem for the subgradient method.
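A minimal sketch of the modified subgradient iteration follows. The update rule used here (u moves along the subgradient component −f(xk), and c increases by (s + ε)‖f(xk)‖ with 0 < ε < s) is an assumption about the form of (3.1), suggested by the inequalities in the proof above; the grid-based inner minimization and all names are also my own. On the toy problem min x² subject to x − 1 = 0 over X = [−2, 2] the method stops at the feasible solution x* = 1.

```python
import numpy as np

def modified_subgradient(X_grid, u0=0.0, c0=0.0, s=0.5, eps=0.25, iters=50):
    """Sketch of the modified subgradient method for maximizing the dual
    function H(u, c) = min_{x in X} L(x, u, c), illustrated on
    min x**2 subject to x - 1 = 0 over X = [-2, 2] (solution x* = 1).
    The update is an assumed form of (3.1), with 0 < eps < s as in the
    convergence argument."""
    u, c = u0, c0
    xk = X_grid[0]
    for _ in range(iters):
        # Inner problem: minimize the sharp Lagrangian over X (here a grid).
        L = lambda x: x ** 2 + c * abs(x - 1.0) - u * (x - 1.0)
        xk = min(X_grid, key=L)
        fk = xk - 1.0                    # constraint value f(x_k)
        if abs(fk) < 1e-12:              # feasible: stop (cf. Theorem 2.5)
            break
        u = u - s * fk                   # step along the subgradient in u
        c = c + (s + eps) * abs(fk)      # strict increase of c
    return xk, u, c

grid = np.linspace(-2.0, 2.0, 4001)      # fine enough to contain x = 1
x_star, u, c = modified_subgradient(grid)
```

Starting from u0 = c0 = 0, the constraint violation shrinks geometrically and the iteration terminates at a feasible point in a handful of steps.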
Theorem 3.4 Assume that all conditions of Theorem 2.4 are satisfied. Let
(uk, ck) be any iterate of the subgradient method. Suppose that each new
where H
≤ ‖f(xk)‖²
It is obvious that the sequence {‖ū − uk‖² + |c̄ − ck|²} is bounded from be-
low (for example, by zero) and, by Theorem 3.3, it is decreasing. Thus,
{‖ū − uk‖² + |c̄ − ck|²} is a convergent sequence. Hence
Unfortunately, however, unless we know the dual optimal value H(ū, c̄),
which is rare, the range of stepsizes is unknown. In practice, one can use the
stepsize formula
subject to
f1(x) = x1² + x2² + x3² − 25 = 0,
and
xi ≥ 0, i = 1, 2, 3.
The result reported in Himmelblau (1972) is
‖f(x*)‖ = 2.9 × 10^(−. . .).
Through this implementation of the modified subgradient algorithm given
above, the result is obtained in a single iteration, starting with the initial
guesses x0 = 0, u0 = 0 and c0 = 0. The positivity constraints xi ≥ 0,
i = 1, 2, 3, are eliminated by defining new variables yi such that yi² = xi,
and then the problem is solved for these yi's. The obtained result is x* =
(3.5120790172, 0.2169913063, 3.5522127962), f0* = 961.7151721335. The con-
straint is satisfied with this x* as ‖f(x*)‖ = 9.2 × 10^(−11).
subject to
f(x) = (x1 − 1)² + (x2 − 1)² + (sin(x1 + x2) − 1)² − 1.5 ≤ 0.
The result reported in Khenkin (1976) is
f0(x) = 0.5(x1 + x2)² + 50(x2 − x1)² + x1² + |x3 − sin(x1 + x2)| → min
subject to
Note that only values in the range 0.7-1.2 for an estimate of H as an upper bound in
the formula for sk seem to give the correct answer. Outside this range,
the number of iterations increases.
f0(x, y) = [a, x] − 0.5[x, Qx] + by → min
Acknowledgments
The authors wish to thank Drs. Y. Kaya and R.S. Burachik for useful discussions
and comments.
References
Andramanov, M.Yu., Rubinov, A.M., and Glover, B.M. (1997), Cutting an-
gle method for minimizing increasing convex-along-rays functions, Research
Report 9717, SITMS, University of Ballarat, Australia.
Andramanov, M.Yu., Rubinov, A.M. and Glover, B.M. (1999), Cutting angle
methods in global optimization, Applied Mathematics Letters, Vol. 12, pp.
95-100.
Bazaraa, M.S., Sherali, H.D. and Shetty, C.M. (1993), Nonlinear Programming:
Theory and Algorithms, John Wiley & Sons, Inc., New York.
Bertsekas, D.P. (1995), Nonlinear Programming, Athena Scientific, Belmont,
Massachusetts.
Demyanov, V.F. (1968), Algorithms for some minimax problems, J. Computer
and System Sciences, Vol. 2, pp. 342-380.
Floudas, C.A., et al. (1999), Handbook of test problems in local and global op-
timization, Kluwer Academic Publishers, Dordrecht.
Abstract: Inexact Restoration methods have been introduced in the last few
years for solving nonlinear programming problems. These methods are related
to classical restoration algorithms but also have some remarkable differences.
They generate a sequence of generally infeasible iterates with intermediate iter-
ations that consist of inexactly restored points. The convergence theory allows
one to use arbitrary algorithms for performing the restoration. This feature is
appealing because it allows one to use the structure of the problem in quite op-
portunistic ways. Different Inexact Restoration algorithms are available. The
most recent ones use the trust-region approach. However, unlike the algorithms
based on sequential quadratic programming, the trust regions are centered not
in the current point but in the inexactly restored intermediate one. Global con-
vergence has been proved, based on merit functions of augmented Lagrangian
type. In this survey we point out some applications and we relate recent ad-
vances in the theory.
1 INTRODUCTION
[Figure: nonconvergence of the restoration from the trial point.]
The main drawback of feasible methods is that they tend to behave badly
in the presence of strong nonlinearities, usually represented by very curved
constraints. In these cases, it is not possible to perform large steps far from
the solution, because the nonlinearity forces the distance between consecutive
feasible iterates to be very short. If a large distance occurs in the tangent
space, the newtonian procedure for restoring feasibility might not converge,
and the tangent step must be decreased. See Figure 2.1. Short steps far from
the solution of an optimization problem are undesirable and, frequently, the
practical performance of an optimization method is linked to its ability to
leave quickly the zones where there is no chance of finding a solution. This
fact leads one to develop Inexact Restoration algorithms.
The convergence theory of Inexact Restoration methods is inspired by the
convergence theory of recent sequential quadratic programming (SQP) algo-
rithms. See Gomes et al (1999). The analogies between IR and the SQP
method presented in Gomes et al (1999) are:
2. in both methods the iteration is composed of two phases, the first related
to feasibility and the second to optimality;
However, there exist very important differences, which allow one to relate
IR to the classical feasible methods:
1. both in the restoration phase and in the optimality phase, IR deals with
the true function and constraints, while SQP deals with a model of both;
2. the trust region in SQP is centered at the current point; in IR the trust
region is centered at the restored point.
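The two-phase structure just described can be sketched schematically. The code below is not the algorithm of Martinez (2001); it is a stripped-down illustration in which the restoration phase uses Gauss-Newton steps on ‖C‖ (any restoration algorithm may be substituted here, which is the appealing feature noted above) and the optimality phase takes a damped gradient step in the first-order tangent space, with the trust region centered at the restored point.

```python
import numpy as np

def ir_iteration(f, grad_f, C, jac_C, x, delta=1.0, r=0.9):
    """One schematic Inexact Restoration iteration (an illustration only):
    (1) restoration phase: from x, find y with ||C(y)|| <= r*||C(x)||
        by Gauss-Newton steps on the constraints;
    (2) optimality phase: from y, take a damped gradient step for f
        projected on the tangent space, inside a trust region centered at y."""
    # --- restoration phase (arbitrary algorithms are allowed here) ---
    y = x.copy()
    while np.linalg.norm(C(y)) > r * np.linalg.norm(C(x)) + 1e-12:
        J = np.atleast_2d(jac_C(y))
        step, *_ = np.linalg.lstsq(J, -np.atleast_1d(C(y)), rcond=None)
        y = y + step                       # Gauss-Newton restoration step
    # --- optimality phase: tangent-space gradient step from y ---
    J = np.atleast_2d(jac_C(y))
    g = grad_f(y)
    # project -g onto the null space of J (first-order tangent set)
    d_tan = -(g - J.T @ (np.linalg.inv(J @ J.T) @ (J @ g)))
    if np.linalg.norm(d_tan) > delta:      # trust region centered at y
        d_tan *= delta / np.linalg.norm(d_tan)
    return y + 0.1 * d_tan                 # short damped tangent step
```

Repeating the iteration on a problem with a linear constraint drives the iterates to the constrained minimizer while staying (after the first restoration) on the feasible set.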
3 DEFINITION OF AN IR ALGORITHM
min f(x)
subject to C(x) = 0, x ∈ Ω
Algorithm 3.1
Step 1. Initialize penalty parameter.
Define
θ_k^min = min{1, θ_{k−1}, . . . , θ_{−1}},
θ_k^large = min{1, θ_k^min + . . .}
and
θ_{k,−1} = θ_k^large.
and
INEXACT RESTORATION METHODS FOR NONLINEAR PROGRAMMING 277
If this is not possible, replace rk by (rk + 1)/2 and repeat Step 2. (In Martinez
(2001) sufficient conditions are given for ensuring that this loop finishes.)
If

define

Step 2. Set t ← t_break^{k,i}.

Step 3. If

L(y^k + t d_tan, λ^k) ≤ L(y^k, λ^k) + 0.1 t ⟨∇L(y^k, λ^k), d_tan⟩,   (3.10)

and terminate. (Observe that the choice z^{k,i} = y^k + t d_tan is admissible but,
very likely, it is not the most efficient choice.)

Step 4. If (3.10) does not hold, choose t_new ∈ [0.1t, 0.9t], set t ← t_new and go to
Step 3.
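Steps 3 and 4 amount to a backtracking search on t with the sufficient-decrease test (3.10); halving t is one admissible choice of t_new in [0.1t, 0.9t]. A minimal sketch, with names of my own choosing:

```python
def backtracking_tangent_step(L, grad_L_dot_d, y, d_tan, t0=1.0):
    """Backtracking search implementing the test (3.10): accept t when
    L(y + t*d) <= L(y) + 0.1 * t * <grad L(y), d>; otherwise shrink t
    (here t <- 0.5*t, an admissible choice in [0.1*t, 0.9*t]) and retry,
    as in Steps 3 and 4. Assumes d_tan is a descent direction for L."""
    t = t0
    L0 = L(y)
    while L(y + t * d_tan) > L0 + 0.1 * t * grad_L_dot_d:
        t *= 0.5
    return t, y + t * d_tan
```

For a descent direction the loop always terminates, since the test is satisfied for all sufficiently small t.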
Under these conditions it can be proved that the algorithm is well defined.
We can also prove the convergence theorems stated below.
min ‖C(x)‖²
subject to l ≤ x ≤ u.
Theorem 3.2 If C(xk) → 0, there exists a limit point of the sequence that
satisfies the Fritz-John optimality conditions of nonlinear programming.
h(x) = 0
where, for simplicity, we leave to the reader the definition of the correct
dimensions for x, h, g. Given x ∈ Rn and the tolerance parameter η > 0, we
divide the inequality constraints into three groups:
(see Martinez and Svaiter (2000)). See Figures 4.1 and 4.2.
We say that a feasible point x* of (4.1) satisfies the AGP optimality condition
if there exists a sequence {xk} that converges to x* and such that g(xk) → 0.
The points x that satisfy the AGP optimality conditions are called "AGP
points". It has been proved in Martinez and Svaiter (2000) that the set of
local minimizers of a nonlinear programming problem is contained in the set of
AGP points and that this set is strictly contained in the set of Fritz-John points.
Therefore, the AGP optimality condition is stronger than the Fritz-John op-
timality conditions, traditionally used in algorithms. When equalities are not
present and the problem is convex, it can be proved that AGP is sufficient for
guaranteeing that a point is a minimizer (Martinez and Svaiter (2000)).
A careful analysis of the convergence proofs in Martinez and Pilotta (2000)
and Martinez (2001) shows that Inexact Restoration guarantees convergence to
points that satisfy the AGP optimality condition. This fact has interesting
consequences for applications, as we will see later.
5 ORDER-VALUE OPTIMIZATION
where
max f(x)
subject to x ∈ Ω
6 BILEVEL PROGRAMMING
min f(x,y)
subject to y solves a nonlinear programming problem
that depends on x (6.1)
and
Ordinary constraints.
In other words:
min f(x, y)
subject to y minimizes P(x, y) s.t. t(x, y) = 0, s(x, y) ≤ 0 (6.2)
and h(x, y) = 0, g(x, y) ≤ 0. (6.3)
(i) (x*, y*, v*, w*, λ*, z*) is a KKT point of (6.12), (6.13), (6.14);
Roughly speaking, the sufficient conditions are related to the convexity
of the lower-level problem, when the MPEC under consideration is a bilevel
programming problem.
As in the case of the OVO problem, no feasible point of an MPEC satisfies
the Mangasarian-Fromovitz constraint qualification and, consequently, all the
feasible points are Fritz-John points. So, it is mandatory to find stronger optimality
conditions in order to explain the practical behavior of algorithms. Fortunately,
it can also be proved that the set of AGP points is only a small part of the set
of Fritz-John points. The characterization of AGP points is given in Andreani
and Martinez (2001). Since IR converges to AGP points, we have an additional
reason for using IR in MPEC.
As in Nash-equilibrium problems (see Vicente and Calamai (1994)) the use
of the optimization structure instead of optimality conditions in the lower level
encourages the use of specific restoration algorithms.
7 HOMOTOPY METHODS
Homotopic methods are used when strictly local Newton-like methods for
solving (7.1) fail because a sufficiently good initial guess of the solution is not
available. Moreover, in some application areas, homotopy methods are the method
of choice. See Watson et al (1997). The homotopic idea consists in defining a map H(x, t)
such that H(x, 1) = F(x) and the system H(x, 0) = 0 is easy to solve. The so-
lution of H(x, t) = 0 is used as an initial guess for solving H(x, t') = 0, with t' > t.
In this way, the solution of the original problem is progressively approximated.
For many engineering problems, natural homotopies are suggested by the very
essence of the physical situation. In other cases, artificial homotopies can be
useful.
In general, the set of points (x, t) that satisfy H(x, t) = 0 defines a curve in
Rn+1. Homotopic methods are procedures to "track" this curve in such a way
that its "end point" (t = 1) can be safely reached. Since the intermediate points
of the curve (for which t < 1) are of no interest by themselves, it is not necessary
to compute them very accurately. So, an interesting theoretical problem with
practical relevance is to choose the accuracy to which intermediate points are
to be computed. If an intermediate point (x, t) is computed with high accuracy,
the tangent line to the curve that passes through this point can be efficiently
min (t − 1)²
subject to H(x, t) = 0, x ∈ Ω.
Therefore, the homotopic curve is the feasible set of (7.2). The nonlinear
programming problem (7.2) could be solved by any constrained optimization
method, but Inexact Restoration algorithms seem to be closely connected to
the classical predictor-corrector procedure used in the homotopic approach.
Moreover, they give theoretically justified answers to the accuracy problem.
The identification of the homotopy path with the feasible set of a nonlinear
programming problem allows one to use IR criteria to define the closeness of
corrected steps to the path. The solutions of (7.1) correspond exactly to those
of (7.2), so the identification proposed here is quite natural.
The correspondence between the feasibility phase of IR and the corrector
phase of predictor-corrector continuation methods is immediate. The IR tech-
nique provides a criterion for declaring convergence of the subalgorithm used
in the correction. The optimality phase of IR corresponds to the predictor
phase of continuation methods. The IR technique determines how long pre-
dictor steps can be taken and establishes a criterion for deciding whether they
should be accepted or not.
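The loose-then-tight accuracy policy discussed above can be sketched with a standard artificial convex homotopy H(x, t) = tF(x) + (1 − t)(x − x0). This homotopy, the tolerances and the step count are my own illustrative choices, not prescriptions of the chapter: intermediate points (the corrector phase) are solved only coarsely, and only the end point t = 1 is polished.

```python
def track_homotopy(F, dF, x0, n_steps=10, tol_mid=1e-3, tol_end=1e-12):
    """Predictor-corrector tracking of H(x, t) = t*F(x) + (1-t)*(x - x0) = 0.
    Points with t < 1 are of no interest by themselves, so they are corrected
    only loosely (tol_mid); the end point t = 1 is refined to tol_end."""
    x = x0
    for k in range(1, n_steps + 1):
        t = k / n_steps
        tol = tol_end if t == 1.0 else tol_mid
        H = lambda x: t * F(x) + (1 - t) * (x - x0)
        dH = lambda x: t * dF(x) + (1 - t)
        while abs(H(x)) > tol:          # Newton corrector at fixed t
            x = x - H(x) / dH(x)
    return x

# Solve F(x) = x**3 - 2 = 0, starting from the easy system at t = 0.
root = track_homotopy(lambda x: x ** 3 - 2.0, lambda x: 3 * x ** 2, x0=1.0)
```

The previous coarse solution serves as the predictor for the next value of t, so each corrector needs only a few Newton steps.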
8 CONCLUSIONS
4. the use of the Lagrangian in the optimality phase favors practical fast
convergence near the solution.
Acknowledgments
References
be designed to restart the search. The "hardness" of the GOP is well illustrated
by the following "golf course" example, for which the approach described above
seems powerless. Define the function f : [0, 1] → {0, 1} as follows:

f(x) = 1 for 0 ≤ x ≤ a − ε/2,
       0 for a − ε/2 < x < a + ε/2,   (1.1)
       1 for a + ε/2 ≤ x ≤ 1,

where a ∈ (ε/2, 1 − ε/2). To obtain the minimum of this function, one should
evaluate it within the ε-interval around the unknown number a. If this function
is defined like an oracle (i.e., if one does not know the position of the point a),
the probability of choosing an x within this interval is ε. For the n-dimensional
version of this oracle, the probability becomes εⁿ, and the query complexity of
the problem grows exponentially with n (the dimensionality curse). Of course,
this is an extreme case, for which knowledge about the derivatives (they are
all zero whenever defined!) would not help. This and related issues have been
deftly discussed by Wolpert and Macready in connection with their "No Free
Lunch" (NFL) theorem (Wolpert (1996)).
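The oracle (1.1) and the ε success probability of blind random querying are easy to reproduce; the constants below are arbitrary illustrations.

```python
import random

def golf_course(x, a, eps):
    """The 'golf course' oracle (1.1): 1 everywhere on [0, 1] except on an
    interval of width eps around the unknown point a, where it is 0."""
    return 0 if a - eps / 2 < x < a + eps / 2 else 1

# Random querying hits the hole with probability eps per call, so the
# expected number of oracle calls is 1/eps (and eps**(-n) in n dimensions).
random.seed(0)
a, eps = 0.3721, 1e-3
trials, hits = 200_000, 0
for _ in range(trials):
    if golf_course(random.random(), a, eps) == 0:
        hits += 1
# hits / trials is an empirical estimate of eps
```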
In the light of the previous example, it seems that without additional knowl-
edge about the structure of the function there is no hope to decide upon an
intelligent optimization strategy and one is left with either strategies that have
limited albeit efficient applicability or the exhaustive search option.
Thus, new approaches are needed to reduce the complexity of the problem
to manageable complexity. Recently, quantum computing has been hailed as
the possible solution to some of the computationally hard classical problems
(Nielsen (2000)). Indeed, Grover's (Grover (1997)) and Shor's (Shor (1994))
algorithms provide such solutions to the problems of finding a given element in
an unsorted set and the prime factorization of very large numbers, respectively.
Here we present a solution to the continuous GOP in polynomial time, by
developing a generalization of Grover's algorithm to continuous problems. This
generalization requires additional information on the objective function. In
many optimization problems, some of this additional information is available
(see below). While other required information may be more difficult to obtain
in practical applications, it is important to understand - from a theoretical
point of view - the role of the information in connection to the difficulty of the
problem, and to be able to assess a priori what various information is relevant
and for what. For instance, if the objective function were an analytic function,
the knowledge of all its derivatives at a given point would allow, in principle,
the "knowledge" of the function everywhere else in the domain of analyticity.
However, to actually find the global minimum, the function would still have
to be calculated everywhere! In other words, the (additional) knowledge of all
the derivatives at a given point cannot be efficiently used to locate the global
minimum, although in principle it is equivalent to the knowledge of the function
at all points. In fact, to locate the global minimum, both methods would require
exhaustive calculations.
2 GROVER'S QUANTUM ALGORITHM
by exchanging the level populations. The CNOT gate acts on four-dimensional
vectors in C^4. Obviously, some of these vectors can be represented as a tensor
product of two two-dimensional vectors; however, other vectors in C^4 cannot be
written in this form. These latter states are called entangled states and play
a crucial role in quantum algorithms (Nielsen (2000)). Quantum algorithms
are (i) intrinsically parallel and (ii) yield probabilistic results. These properties
reflect the facts that: (i) the wave function, ψ, is nonlocal and, in fact,
ubiquitous, and (ii) the quantity |ψ|² is interpreted as a probability density.
QUANTUM ALGORITHM 297
|w⟩ = (1/√N) Σᵢ |xᵢ⟩ = (1/√N)|x₁⟩ + (√(N−1)/√N)|x⊥⟩   (2.2)
In the second representation, the unit vectors orthogonal to |x₁⟩ are lumped
together in the unit vector |x⊥⟩, which formally reduces the problem to a
two-dimensional space and simplifies the presentation and interpretation of the
algorithm.
We note that the scalar product ⟨x₁|w⟩ = 1/√N =: cos β = sin α, where
β denotes the angle between the vectors |w⟩ and |x₁⟩ and α denotes its
complement, i.e. the angle between the vectors |w⟩ and |x⊥⟩.
The construction of the state |w⟩ can be done in log₂N = n steps by applying
n Hadamard gates to the initial zero state |0⟩ ⊗ · · · ⊗ |0⟩ (n times). In the
{|x₁⟩, |x⊥⟩} basis, we construct the operators:
(2.5)
which, in the compressed, two-dimensional representation of the problem, rep-
resents a rotation of the state vector by an angle 2α towards |x₁⟩. This
means that each application of the operator Q will increase the weight of the
unknown vector |x₁⟩ (which explains the name of the operator Q) and after
roughly (π/2 − α)/(2α) ≈ (π/4)√N applications the state vector will be es-
sentially parallel to |x₁⟩, whereupon a measurement of the state will yield the
result |x₁⟩ with a probability very close to unity. We mention that for (and
only for) N = 4, the result is obtained with certainty, after only one application.
In general, if one continues the application of Q, the state vector continues its
rotation and the weight of |x₁⟩ decreases; eventually, the evolution of the state
is cyclic, as prescribed by the unitary evolution. In the original, N-dimensional
representation, the operator I_{x₁} has the representation:
Using this representation, one can show explicitly that the algorithm can be
implemented as a sequence of local operations such as rotations, Hadamard
transforms, etc. (Grover (1997))
It is easy to check that if the oracle returns the same value for all the ele-
ments, i.e. there is no "special" element in the set E, the amplification operator
Q reduces to the identity operator I and, after the required number of applica-
tions, the measurement will return any of the states with the same probability,
namely 1/N. In other words, the algorithm behaves consistently.
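In the two-dimensional {|x₁⟩, |x⊥⟩} picture the whole algorithm reduces to tracking one angle, which makes the (π/4)√N iteration count, the N = 4 special case, and the over-rotation effect easy to verify numerically. The function name below is my own.

```python
import math

def grover_success_probability(N, k):
    """Probability of measuring the marked state after k Grover iterations,
    in the two-dimensional {|x1>, |x_perp>} picture: each application of Q
    rotates the state vector by 2*alpha towards |x1>, sin(alpha) = 1/sqrt(N),
    so the marked amplitude after k iterations is sin((2k+1)*alpha)."""
    alpha = math.asin(1.0 / math.sqrt(N))
    return math.sin((2 * k + 1) * alpha) ** 2

N = 1024
k_opt = round(math.pi / 4 * math.sqrt(N))   # ~ (pi/4) sqrt(N) applications
p = grover_success_probability(N, k_opt)    # very close to 1

# For N = 4, a single application succeeds with certainty:
assert abs(grover_success_probability(4, 1) - 1.0) < 1e-12
# Continuing past k_opt over-rotates and the success probability drops:
assert grover_success_probability(N, 2 * k_opt) < p
```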
Grover's search algorithm described in the previous section has been applied
to a discrete optimization problem, namely finding the minimum among an
unsorted set of N different objects. Dürr and Høyer (Dürr) adapted Grover's
original algorithm and solved this problem with probability strictly greater than
1/2, using O(√N) function evaluations (oracle invocations).
In this article, we map the continuous GOP to the Grover problem. Once
this is achieved, one can apply Grover's algorithm and obtain an almost
certain result with O(√N) function evaluations. However, the mapping of the
GOP to Grover's problem is not automatic, but requires additional information.
Before spelling out the required information, let us revisit the "pathological"
example (1.1). Without loss of generality, we can take ε = 1/N and divide the
segment [0, 1] into N equal intervals. By evaluating the function at the midpoint
of each of the N intervals, we obtain a discrete function that is equal to 1 in N − 1
points and equal to 0 in one point, which, up to an unessential transformation,
is equivalent to Grover's problem. Direct application of Grover's algorithm
yields the corresponding result. Of course, generalization to any dimensionality
d is trivial. Thus, a classically intractable problem becomes much easier within
the quantum computing framework. We shall return to this example after
discussing the general case.
Consider now a real function of d variables, f(x1, x2, . . . , xd). Without re-
stricting generality, we can assume that f is defined on [0, 1]^d and takes values
in [0, 1]. Assume now that: (i) there is a unique global minimum, which is
reached at zero; (ii) there are no local minima whose value is infinitesimally
close to zero; in other words, the values of the other minima are larger than
a constant δ > 0; and (iii) the size of the basin of attraction for the global
minimum measured at height δ is known; we shall denote it Δ.
Then our implementation paradigm is the following: (i) instead of f(·),
consider the transformation g(·) := (f(·))^{1/m}. For sufficiently large m, this
function will take values very close to one, except in the vicinity of the global
minimum, which will maintain its original value, namely zero. Of course, other
transformations can be used to achieve essentially the same result. We calcu-
late m such that δ^{1/m} = 1/2. To avoid technical complications that would not
change the tenor and conclusions of the argument, we assume that Δ = 1/M,
where M is a natural number, and divide the hypercube [0, 1]^d into small d-
dimensional hypercubes with sides Δ. At the midpoint of each of these hyper-
cubes, define the function h(x) := int[g(·) + 1/2] (here int denotes the integer
part). The function h(·) is defined on a discrete set of N points, N = M^d,
and takes only the values one and zero; by our choice of constants, the region on
which it takes the value zero is a hypercube with side Δ. Thus we have reduced
the problem to the Grover setting. Application of Grover's algorithm to the
function h(·) will result in a point that returns the value zero; by construction,
this point belongs to the basin of attraction of the global minimum. We return
then to the original function f (.) and apply the descent technique of choice
that will lead to the global minimum. If the basin of attraction of the global
minimum is narrow, the gradients of the function f (.) may reach very large val-
ues which may cause overshots. Once that phase of the algorithm is reached,
one can proceed to apply a scaling (dilation) transformation that maintains
the descent mode but moderates the gradients. On the other hand, as one
approaches the global minimum, the gradients become very small and certain
acceleration techniques based on non-Lipschitzian dynamics may be required
(Barhen (1996)); (Barhen (1997)). If the global minimum is attained a t the
boundary of the domain, the algorithm above will find it without additional
complications.
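To make the construction concrete, it can be simulated classically: the sketch below builds the binary function h on the grid of midpoints and lists the marked points a Grover search would be asked to find. The one-dimensional "golf course" function, the well location, and the values of M and m are invented for illustration; a real implementation would encode h as a quantum oracle rather than tabulate it.

```python
import numpy as np

def build_oracle(f, d, M, m):
    """Classically tabulate h(x) = int(g(x) + 1/2) with g = f**(1/m)
    on the midpoints of the M^d hypercubes of side 1/M.  h is 1 where
    g >= 1/2 and 0 near the global minimum: the "marked" states."""
    axes = [(np.arange(M) + 0.5) / M] * d          # midpoints per axis
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)
    g = np.array([f(x) ** (1.0 / m) for x in grid])
    h = np.floor(g + 0.5).astype(int)
    return grid, h

# hypothetical "golf course" in d = 1: f = 1 except in one narrow well
f = lambda x: 0.0 if abs(x[0] - 0.5312) < 0.05 else 1.0
grid, h = build_oracle(f, d=1, M=10, m=8)
marked = grid[h == 0]   # points a Grover search would return
```

With these (assumed) numbers, exactly one of the ten midpoints is marked, which is precisely the Grover setting described above.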
It is clear that, in general, the conditions imposed on the function f(·) are
rather strong sufficient conditions. However: (a) these conditions are both
satisfied and explicitly given for the academic "golf course" example (1), and
(b) while they do not help reduce the complexity of the classical descent/search
algorithm, they make a remarkable difference in the quantum framework.
In fact, assumption (i) is satisfied by a large class of important practical
problems, namely parameter identification problems encountered, e.g., in remote
sensing, pattern recognition, and, in general, inverse problems. In these problems
the absolute minimum, namely zero, is attained for the correct values of the parameters.
REFERENCES 301
Acknowledgments
This work was partially supported by the Materials Sciences and Engineering Division
Program of the DOE Office of Science under contract DE-AC05-00OR22725 with UT-
Battelle, LLC. We thank Drs. Robert Price and Iran Thomas from DOE for their
support. V. P. thanks Dr. Cassius D'Helon for an enlightening discussion on Ref. 3
and for a careful reading of the manuscript.
References
Dept. of Mathematics
University of Bayreuth
D-95440 Bayreuth
Germany
1 INTRODUCTION
The functions and gradients can be evaluated with sufficiently high precision.
are linearized. Second-order information about the Hessian of the Lagrangian
is updated by a quasi-Newton formula. The convex quadratic program must be
solved in each iteration step by an available black-box solver. For a review, see
for example Stoer (1985) and Spellucci (1993), or any textbook on nonlinear
programming.
Despite the success of SQP methods, another class of efficient optimization
algorithms was proposed, mainly by engineers, with its motivation found
in mechanical structural optimization. The first method is known under the
name CONLIN, or convex linearization, see Fleury and Braibant (1986) and
Fleury (1989), and is based on the observation that in some special cases,
typical structural constraints become linear in the inverse variables. Although
this special situation is rarely observed in practice, a suitable substitution by
inverse variables, depending on the sign of the corresponding partial derivatives,
with subsequent linearization can be expected to approximately linearize the constraints.
More general convex approximations were introduced by Svanberg (1987),
known under the name moving asymptotes (MMA). The goal is always to
construct convex and separable subproblems, for which efficient solvers are
available. Thus, we denote this class of methods by SCP, an abbreviation
for sequential convex programming. The resulting algorithms are very efficient
for mechanical engineering problems if there are many constraints, if a good
starting point is available, and if only a crude approximation of the optimal
solution needs to be computed because of certain side conditions, for example
calculation time or large round-off errors in the objective function and constraints.
In other words, SQP methods are based on local second-order approximations,
whereas SCP methods apply global approximations. Some
comparative numerical tests of both approaches are available for mechanical
structural optimization, see Schittkowski et al. (1994). The underlying finite
element formulation uses the software system MBB-LAGRANGE (Kneppe et
al. (1987)). However, we do not know of any direct comparisons of computer
codes of both methods for more general classes of test problems, particularly
for standard benchmark examples.
Thus, the purpose of the paper can be summarized as follows. First, we
outline a general framework for stabilizing the algorithms under consideration
by a line search. The merit function is the augmented Lagrangian function, where
the violation of constraints is penalized in the L2-norm. SQP and SCP methods
are introduced in the two subsequent sections, where we outline some common
ideas and some existing convergence results. Section 5 shows the results of
comparative numerical experiments based on the 306 test examples of the test
problem collections of Hock and Schittkowski (1981) and Schittkowski (1987a).
The two computer codes under investigation are the SQP subroutine NLPQLP
of Schittkowski (2001) and the SCP routine SCPIP of Zillober (2001c), Zillober
(2002). To give an impression of the convergence of the algorithms in the case of
structural design optimization, we repeat a few results of the comparative study
of Schittkowski et al. (1994). A couple of typical industrial applications and case
studies are found in Section 6, to show the complexity of modern optimization
problems, for which reliable and efficient software is needed.
2 A GENERAL FRAMEWORK
The fundamental tool for deriving optimality conditions and optimization al-
gorithms is the Lagrange function
Since we assume that (1.1) is nonconvex and nonlinear in general, the basic
idea is to replace (1.1) by a sequence of simpler problems. Starting from an
initial design vector xo E Rn and an initial multiplier estimate uo E R m , iter-
ates xk E Rn and uk E Rm are computed successively by solving subproblems
of the form
SQP VERSUS SCP METHODS 309
1. (2.3) is strictly convex and smooth, i.e., the functions f^k(x) and g_j^k(x) are
twice continuously differentiable, j = 1, ..., m.
3. The search direction (yk - xk, vk - uk) is a descent direction for an aug-
mented Lagrangian merit function introduced below.
and we set
so that the line search is well-defined,
with p > 0, which is satisfied for SQP and SCP methods, see Schittkowski
(1981a), Schittkowski (1983b) or Zillober (1993), respectively. A more general
framework is introduced in Schittkowski (1985a). For a more detailed discussion
of line search and different convergence aspects, see Ortega and Rheinboldt
(1970).
If the constraints of (2.3) become inconsistent, it is possible to introduce
an additional variable and to modify objective function and constraints, for
example in the simplest form
The penalty term ρ_k is added to the objective function to reduce the influence
of the additional variable y_{n+1} as much as possible. The index k indicates that
this parameter also needs to be updated during the algorithm. It is obvious
that (2.12) always possesses a feasible solution.
3 SQP METHODS
SQP methods are also introduced in the books of Papalambros and Wilde (2000)
and Edgar and Himmelblau (1988). Their excellent numerical performance was
tested and compared with other methods in Schittkowski (1980), Schittkowski
(1983a), and Hock and Schittkowski (1981), and for many years they have been
among the most frequently used algorithms for solving practical optimization problems.
The basic idea is to formulate and solve a quadratic programming subprob-
lem in each iteration which is obtained by linearizing the constraints and ap-
proximating the Lagrange function (2.1) quadratically.
To formulate the quadratic programming subproblem, we proceed from given
iterates x_k ∈ R^n, an approximation of the solution, u_k ∈ R^m, an approximation
of the multipliers, and B_k ∈ R^{n×n}, an approximation of the Hessian of the
Lagrange function in a certain sense. Then we obtain subproblem (2.3) by
defining
It is immediately seen that the requirements of the previous section for (2.3)
are satisfied. The key idea is to also approximate second-order information
in order to achieve fast final convergence. The update of the matrix B_k can be
performed by standard quasi-Newton techniques known from unconstrained
optimization. In most cases, the BFGS method is applied, see Powell (1978a),
Powell (1978b), or Stoer (1985). Starting from the identity or any other positive
definite matrix B_0, the difference vectors
w_k = x_{k+1} - x_k,   q_k = ∇_x L(x_{k+1}, u_k) - ∇_x L(x_k, u_k)
are used to update B_k by the BFGS formula
B_{k+1} = B_k + (q_k q_k^T)/(q_k^T w_k) - (B_k w_k w_k^T B_k)/(w_k^T B_k w_k).
The above formula yields a positive definite matrix B_{k+1} provided that B_k
is positive definite and q_k^T w_k > 0. A simple modification of Powell (1978a)
guarantees positive definite matrices even if the latter condition is violated.
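The damped update just mentioned can be sketched as follows. The constant 0.2 and the convex-combination rule follow the modification commonly attributed to Powell (1978a); the exact constants used inside the codes discussed here may differ.

```python
import numpy as np

def bfgs_update_damped(B, w, q, theta_min=0.2):
    """BFGS update of the Hessian approximation B with Powell's damping.

    w = x_{k+1} - x_k,  q = grad_x L(x_{k+1}, u_k) - grad_x L(x_k, u_k).
    If q^T w is too small, q is replaced by a convex combination with B w,
    so that the updated matrix stays positive definite."""
    Bw = B @ w
    wBw = w @ Bw
    qw = q @ w
    if qw < theta_min * wBw:                  # damping condition
        theta = (1.0 - theta_min) * wBw / (wBw - qw)
        q = theta * q + (1.0 - theta) * Bw    # damped difference vector
        qw = q @ w
    return B + np.outer(q, q) / qw - np.outer(Bw, Bw) / wBw
```

When no damping is active, the result satisfies the secant condition B_{k+1} w_k = q_k; with damping, positive definiteness is retained at the price of a perturbed secant condition.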
r_k = max{ 2 ||v_k - u_k||^2 / ((y_k - x_k)^T B_k (y_k - x_k)), r_{k-1} }
for the augmented Lagrangian function (2.7), see Schittkowski (1983a). Moreover,
we need an additional assumption concerning the choice of the matrix
B_k if we neglect the special update mechanism shown in (3.3). We require
that the eigenvalues of the matrices B_k remain bounded away from 0, i.e., that
(y_k - x_k)^T B_k (y_k - x_k) ≥ γ ||y_k - x_k||^2 for all k and a γ > 0. If the iteration
data {(x_k, u_k, B_k)} are bounded, then it can be shown that there is an accumulation
point of {(x_k, u_k)} satisfying the Karush-Kuhn-Tucker conditions (2.2)
for (1.1), see Schittkowski (1983a).
The statement is quite weak, but without any further information about
second derivatives, we cannot guarantee that the approximated point is indeed
a local minimizer. From the practical point of view, we need a finite stopping
criterion based on the optimality conditions for the subproblem, see (2.5), based
on a suitable tolerance ε > 0. For example, we could try to test the KKT
condition
There exists a large variety of extensions for solving large-scale
problems as well; see Gould and Toint (2000) for a review.
4 SCP METHODS
If s_i and c_i denote the sine and cosine values of the corresponding
angles of the trusses, i = 1, 2, the horizontal and vertical displacements are
given in the form
u(x) = |p| ( sin ψ_2 (c_1^2/x_2 + c_2^2/x_1) - cos ψ_2 (s_1 c_1/x_2 + s_2 c_2/x_1) ) / sin(ψ_2 - ψ_1).
If we assume now that our optimization problem consists of minimizing the
weight of the structure under some given upper bounds for these displacements,
we get nonlinear constraints that are linear in the reciprocal design variables.
Although this special situation is always found in the case of statically determinate
structures, it is rarely observed in practice. However, a suitable substitution
by inverse variables, depending on the sign of the corresponding partial
derivatives, with subsequent linearization can be expected to approximately
linearize the constraints.
For the CONLIN method, Nguyen et al. (1987) gave a convergence proof,
but only for the case that (1.1) consists of a concave objective function and
concave constraints, which is of minor practical interest. They also showed
that a generalization to non-concave constraints is not possible. More general
convex approximations were introduced by Svanberg (1987), known under the
name method of moving asymptotes (MMA). The goal is always to construct
nonlinear convex and separable subproblems, for which efficient solvers are
available. Using the flexibility of the asymptotes, which influence the curvature
of the approximations, it is possible to avoid the concavity assumption.
Given an iterate x_k, the model functions of (1.1), i.e., f and g_j, are approximated
by functions f^k and g_j^k at x_k, j = 1, ..., m. The basic idea is to linearize
f and g_j with respect to the transformed variables (U_i^k - x_i)^{-1} and (x_i - L_i^k)^{-1},
depending on the sign of the corresponding first partial derivative. U_i^k and L_i^k
are reasonable bounds and are adapted by the algorithm after each successful
step. Several other transformations have also been developed in the past.
The corresponding approximating functions that define subproblem (2.3)
are convex and separable in these transformed variables; in a similar way, the
g_j^k are defined. The coefficients α_{kj} and β_{kj}, j = 0, ...,
m, are chosen to satisfy the requirements of Section 2.1, i.e., that (2.3) is convex
and that (2.3) is a first-order approximation of (1.1) at x_k. By an appropriate
regularization of the objective function, strict convexity of f^k(x) is guaranteed,
see Zillober (2001a). As shown there, the search direction (y_k - x_k, v_k - u_k) is a
descent direction for the augmented Lagrangian merit function. If the adaption
rule for the parameters L_i^k and U_i^k fulfills the conditions that the absolute value
of their difference to the corresponding component of the current iterate
x_k is uniformly bounded away from 0, and that their absolute value is bounded,
global convergence can be shown for the SCP method if an update rule
for the penalty parameters r_k similar to (3.6) is applied.
The choice of the asymptotes L_i^k and U_i^k is crucial for the computational
behavior of the method. If additional lower bounds x_l and upper bounds x_u
on the variables are given, an efficient update scheme for the i-th coefficient,
i = 1, ..., n, and the k-th iteration step is given as follows:
Sparsity in the problem data can be exploited, see again Zillober et al.
(2002) for a series of numerical results for elliptic control problems.
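As an illustration of the moving-asymptote idea, the following sketch builds a convex, separable, first-order model of a function at x_k, with each variable entering through 1/(U_i - x_i) or 1/(x_i - L_i) according to the sign of the partial derivative. This is a simplified stand-in for Svanberg's actual MMA formulas, which contain further terms and safeguards; the coefficient choice below is an assumption made so that the model matches value and gradient at x_k.

```python
import numpy as np

def mma_approx(x_k, f_k, grad_k, L, U):
    """Convex separable model matching f(x_k) = f_k and grad f(x_k) = grad_k.

    Variables with positive derivative enter via 1/(U_i - x_i), the
    others via 1/(x_i - L_i); both kinds of terms are convex between
    the asymptotes L < x < U."""
    x_k = np.asarray(x_k, float)
    grad_k = np.asarray(grad_k, float)
    L = np.asarray(L, float)
    U = np.asarray(U, float)
    p = np.where(grad_k > 0, grad_k, 0.0) * (U - x_k) ** 2
    q = np.where(grad_k <= 0, -grad_k, 0.0) * (x_k - L) ** 2
    r = f_k - np.sum(p / (U - x_k)) - np.sum(q / (x_k - L))

    def model(x):
        x = np.asarray(x, float)
        return r + np.sum(p / (U - x)) + np.sum(q / (x - L))
    return model
```

Moving the asymptotes L and U closer to x_k increases the curvature of the model, which is exactly the flexibility used to avoid the concavity assumption mentioned above.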
Our numerical tests use all 306 academic and real-life test problems published
in Hock and Schittkowski (1981) and in Schittkowski (1987a). Some of them
are also available in the CUTE library, see Bongartz et al. (1995). The distribution
of the dimension parameter n, the number of variables, is shown in
Figure 5.1. We see, for example, that about 270 of the 306 test problems have no
more than 10 variables. In a similar way, the distribution of the number of
constraints is shown in Figure 5.2. The test problems also possess nonlinear
equality constraints and additional lower and upper bounds on the variables.
The two codes under consideration, NLPQLP and SCPIP, are able to solve
more general problems of the form
min f(x)
s.t. g_j(x) = 0, j = 1, ..., m_e,
     g_j(x) ≥ 0, j = m_e + 1, ..., m,
     x_l ≤ x ≤ x_u.
[Figures 5.1 and 5.2: distributions of the number of variables and of the number of constraints over the 306 test problems.]
First we need a criterion to decide whether the result of a test run is considered
a successful return or not. Let ε > 0 be a tolerance defining the
relative accuracy, x_k the final iterate of a test run, and x* the supposed exact
solution known from the two test problem collections. Then we call the output
a successful return if the relative error in the objective function is less than ε,
i.e.,
f(x_k) - f(x*) < ε |f(x*)|, if f(x*) ≠ 0, or f(x_k) < ε, if f(x*) = 0,
where ||·||_∞ denotes the maximum norm and g_j(x_k)^+ = max(0, g_j(x_k)).
We take into account that a code may return a solution with a better
function value than the known one, subject to the error tolerance of the allowed
constraint violation. However, there is still the possibility that an algorithm
terminates at a local solution different from the one known in advance. Thus,
we call a test run successful if the internal termination conditions are
satisfied subject to a reasonably small tolerance (IFAIL=0), and if, in addition
to the above decision, the constraint violation satisfies
r(x_k) < ε^2,
where r(x_k) denotes the maximum constraint violation.
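The success test can be sketched as follows. The function name and the exact inequalities are our interpretation of the criteria stated above, not code from the paper.

```python
def successful_return(f_xk, f_star, viol, eps=0.01, ifail=0):
    """Decide whether one test run counts as successful: regular
    termination (IFAIL=0), objective error below eps (relative if
    f(x*) != 0, absolute otherwise), and maximum constraint violation
    below eps**2."""
    if ifail != 0:
        return False
    if f_star != 0.0:
        ok_obj = f_xk - f_star < eps * abs(f_star)
    else:
        ok_obj = f_xk < eps
    return ok_obj and viol < eps ** 2
```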
For our numerical tests, we use ε = 0.01, i.e., we require a final accuracy of one
percent. Gradients are approximated by the fourth-order difference formula
∂f(x)/∂x_i ≈ ( 2 f(x - 2η_i e_i) - 16 f(x - η_i e_i) + 16 f(x + η_i e_i) - 2 f(x + 2η_i e_i) ) / (4! η_i)   (5.2)
where η_i = η max(10^{-5}, |x_i|) for a small step factor η, e_i is the i-th unit vector, and i = 1, ..., n.
In a similar way, derivatives of constraints are computed.
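Formula (5.2) can be sketched directly. The threshold 1e-5 and the step factor eta below are assumptions; the paper's exact constants may differ.

```python
import numpy as np

def gradient_4th_order(f, x, eta=1e-6):
    """Fourth-order central difference gradient following (5.2):
    df/dx_i ~ (2f(x-2he) - 16f(x-he) + 16f(x+he) - 2f(x+2he)) / (24 h)
    with a relative step h = eta * max(1e-5, |x_i|)."""
    x = np.asarray(x, float)
    g = np.empty_like(x)
    for i in range(x.size):
        h = eta * max(1e-5, abs(x[i]))
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (2.0 * f(x - 2*h*e) - 16.0 * f(x - h*e)
                + 16.0 * f(x + h*e) - 2.0 * f(x + 2*h*e)) / (24.0 * h)
    return g
```

Each gradient costs 4n function evaluations, which matches the "4 × NIT" accounting mentioned below.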
[Table: numerical results (NIT, NF) for NLPQLP and SCPIP.]
When evaluating NF, we count each single function call. However, function
evaluations needed for gradient approximations are not counted. Their average
number is 4 × NIT.
Many test problems are unconstrained or possess a highly nonlinear objective
function, preventing SCP methods from converging as fast as SQP methods. Moreover,
bounds are often set far away from the optimal solution, leading to initial
asymptotes too far away from the region of interest. Since SCP methods do
not possess fast local convergence properties, SCPIP needs more iterations and
function evaluations in the neighborhood of a solution.
The situation is different in mechanical structural optimization, where the
SCP methods were invented. In the numerical study of Schittkowski et al.
(1994), 79 finite element formulations of academic and practical problems were
collected based on the simulation package MBB-LAGRANGE, see Kneppe et
al. (1987). The maximum number of variables is 144, and the maximum number
of constraints is 1,020, not counting box constraints. NLPQL, see Schittkowski (1985b),
and MMA, former versions of NLPQLP and SCPIP, respectively, are among
the 11 optimization algorithms under consideration. To give an impression of
the behavior of SQP versus MMA, we repeat some results of Schittkowski et al.
(1994), see Table 5.2. We compare the performance with respect to the percentage
of successful test runs (NSUCC), the number of function calls (NF), and the number
of iterations (NIT), respectively.
For the evaluation of performance indices by the priority theory of Saaty,
see Schittkowski et al. (1994). The main difficulty is that the optimization
algorithms solve only a certain subset of test problems successfully, which differs
from code to code. Thus, mean values of a performance criterion are evaluated
only pairwise over the set of test problems successfully solved by two algorithms,
and then compared in the form of a matrix. The decision whether the result of a
test run is considered successful or not depends on a tolerance ε, which
is set to ε = 0.01 and ε = 0.00001, respectively.
The figures of Table 5.2 represent the scaled relative performance when comparing
the codes among each other. We conclude, for example, that for ε = 0.01,
NLPQL requires about twice as many gradient evaluations or iterations, respectively,
as MMA. When requiring a higher termination accuracy, however,
NLPQL needs only about 30% as many gradient calls. On the other hand,
NLPQL is a bit more reliable than MMA.
design of surface acoustic wave filters for signal processing, Bünner et al. (2002).
In some cases, the underlying simulation software is highly complex and has been
developed over many years; in others, NLPQL is used in a special way to solve
data fitting problems.
Typical applications of the SCP code SCPIP are
where x denotes the relative densities, u the displacement vector computed from
the linear system of equations K(x)u = f with positive definite stiffness matrix
K(x) and external load vector f. The relative densities and the elementary
stiffness matrices K_i define K(x) by
K(x) = Σ_i x_i^3 K_i.
V(x) is the volume of the structure, usually a linear function of the design
variables. x_l is a vector of small positive numbers to avoid singularities. The
power 3 in the state equation is found heuristically and is usually applied in
practice. Its role is to penalize intermediate values between the lower bound
and 1. Topology optimization problems easily lead to very large scale, highly
nonlinear optimization problems. Probably the simplest example is a half
beam, for our test discretized by 390 × 260 linear four-node square elements,
leading to 101,400 optimization variables. SCPIP computes the solution shown
in Figure 7.1 after 30 iterations.
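The penalized stiffness interpolation described above, with relative densities entering through the heuristic power 3, can be sketched as follows. Real topology optimization codes assemble sparse matrices from local element contributions; the dense toy version below only illustrates the interpolation itself.

```python
import numpy as np

def global_stiffness(x, K_elem, penal=3.0):
    """Assemble K(x) = sum_i x_i**penal * K_i from element densities x
    and (already expanded) elementary stiffness matrices K_elem.
    The cubic penalization pushes intermediate densities toward 0 or 1."""
    K = np.zeros_like(K_elem[0])
    for xi, Ki in zip(x, K_elem):
        K += xi ** penal * Ki
    return K
```

An element at half density contributes only 1/8 of its full stiffness, which is what makes intermediate densities unattractive to the optimizer.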
7 CONCLUSIONS
In this paper we have tried to describe SQP and SCP methods in a uniform way. In
both cases, convex subproblems are formulated, from which a suitable search
direction with respect to the design variables and the multipliers is computed.
A subsequent line search based on the augmented Lagrangian merit function
stabilizes the optimization algorithm and makes it possible to prove global convergence.
Starting from an arbitrary initial design, a stationary point satisfying the nec-
essary Karush-Kuhn-Tucker conditions is approximated.
However, both methods differ significantly in their underlying motivation.
SQP algorithms proceed from a local quadratic model with the goal of achieving
fast local convergence, superlinear in the case of quasi-Newton updates for
the Hessian of the Lagrangian. SCP methods, on the other hand, apply a global
convex and nonlinear approximation of the objective function and constraints,
based on linearized inverse variables and using first-order information only.
Numerical results based on standard small-scale benchmark problems and
on structural design optimization problems are included, together with a brief
summary of some industrial applications the authors are involved in.
Advantages and disadvantages of SQP versus SCP methods can be summa-
rized as follows.
SCP
  Advantages: tuned for solving structural mechanical optimization problems; no heredity of round-off errors in function and gradient calculations; excellent convergence speed in special situations; able to solve very large problems.
  Disadvantages: slow final convergence possible; not reliable without stabilization; less robust with respect to default tolerances.
References
Boderke P., Schittkowski K., Wolf M., Merkle H.P. (2000): Modeling of diffu-
sion and concurrent metabolism in cutaneous tissue, Journal of Theoretical
Biology, Vol. 204, No. 3, 393-407.
Birk J., Liepelt M., Schittkowski K., Vogel F. (1999): Computation of optimal
feed rates and operation intervals for tubular reactors, Journal of Process
Control, Vol. 9, 325-336.
Blatt M., Schittkowski K. (1998): Optimal Control of One-Dimensional Partial
Differential Equations Applied to Transdermal Diffusion of Substrates, in:
Optimization Techniques and Applications, L. Caccetta, K.L. Teo, P.F. Siew,
Y.H. Leung, L.S. Jennings, V. Rehbock eds., School of Mathematics and
Statistics, Curtin University of Technology, Perth, Australia, Vol. 1, 81 - 93.
Bongartz I., Conn A.R., Gould N., Toint Ph. (1995): CUTE: Constrained and
unconstrained testing environment, Transactions on Mathematical Software,
Vol. 21, No. 1, 123-160.
Bünner M., Schittkowski K., van de Braak G. (2002): Optimal design of sur-
face acoustic wave filters for signal processing by mixed integer nonlinear
programming, submitted for publication.
Edgar T.F., Himmelblau D.M. (1988): Optimization of Chemical Processes,
McGraw Hill.
Fleury C. (1989): An efficient dual optimizer based on convex approximation
concepts, Structural Optimization, Vol. 1, 81-89.
Fleury C., Braibant V. (1986): Structural Optimization - a new dual method
using mixed variables, International Journal for Numerical Methods in En-
gineering, Vol. 23, 409-428.
Zillober Ch. (2002): SCPIP - an efficient software tool for the solution of struc-
tural optimization problems, Structural and Multidisciplinary Optimization,
Vol. 24, No. 5, 362-371.
Zillober Ch., Vogel F. (2000a): Solving large scale structural optimization prob-
lems, in: Proceedings of the 2nd ASMO UK/ISSMO Conference on Engi-
neering Design Optimization, J. Sienz ed., University of Swansea, Wales,
273-280.
Zillober Ch., Vogel F. (2000b): Adaptive strategies for large scale optimization
problems in mechanical engineering, in: Recent Advances in Applied and
Theoretical Mathematics, N. Mastorakis ed., World Scientific and Engineer-
ing Society Press, 156-161.
Zillober Ch., Schittkowski K., Moritzen K. (2002): Very large scale optimization
by sequential convex programming, submitted for publication.
AN APPROXIMATION APPROACH FOR LINEAR PROGRAMMING
IN MEASURE SPACE
C.F. Wen and S.Y. Wu
1 INTRODUCTION
Let (E, F) and (Z, W) be two dual pairs of ordered vector spaces. Let E+ and
Z+ be the positive cones of E and Z, respectively, and E+* and Z+* the
polar cones of E+ and Z+, respectively. Given b* ∈ F, c ∈ Z, and a linear
map A : E → Z, the general linear programming problem and its dual can
be formulated as:
(LP) minimize (x, b*)
subject to Ax - c E Z+ and x E E+;
(1994) discuss LPM and DLPM (defined in this section) with constraint inequalities
on the relationships of measures to measures. They prove that under
certain conditions the LPM problem can be reformulated as a general capacity
problem as well as a linear semi-infinite programming problem. Therefore
LPM is a generalization of the general capacity problem and of the linear semi-infinite
programming problem. Lai and Wu (1994); Wen and Wu (2001) develop algorithms
for solving LPM when certain conditions are added to an LPM. In the
present paper, we develop an approximation scheme to solve LPM in section
3. This scheme is a discretization method. Basically, this approach finds a sequence
of optimal solutions of corresponding linear semi-infinite programs and
shows that the sequence of optimal solutions converges to an optimal solution
of LPM. In section 4, we give an algorithm to find the optimal solution of a linear
semi-infinite programming problem.
We now formulate a linear program for measure spaces (LPM). As in Lai
and Wu (1994), X and Y are compact Hausdorff spaces, and C(X) and M(X) are,
respectively, the space of continuous real-valued functions and the space of regular
Borel measures on X. We denote the totality of non-negative Borel measures
on X by M+(X) and the subset of C(X) consisting of non-negative functions
by C+(X). Given ν, ν* ∈ M(Y), φ ∈ C(X × Y), and h ∈ C(X), we know from
Lai and Wu (1994) that LPM can be formulated as follows:
Here B(Y) stands for the Borel field of Y. We define the bilinear functionals,
each denoted (·, ·), as follows:
Theorem 1.1 Suppose DLPM is consistent and -∞ < V(DLPM) < ∞. If
there exists a g* ∈ C+(Y) such that
Corollary 1.1 Suppose DLPM (or LPM) is consistent with finite value. If
h(x) > 0 or h(x) < 0 for all x ∈ X, then LPM has no duality gap.
APPROXIMATION APPROACH FOR LP IN MEASURE SPACE 335
2 SOLVABILITY OF LPM
In this section, we shall prove that under some simple conditions, there exists
a solution for LPM.
we obtain
V(LPM) = ∫_X h(x) dμ*(x).
for all (μ, μ̄) ∈ M(X) × M(Y) and (f, g) ∈ C(X) × C(Y). Then ELPM can be
rewritten as follows, where Λ denotes the associated linear map:
(ELPM)_k: minimize ((μ, μ̄), (h, 0))
subject to (Λ(μ, μ̄) - ν, g) = 0, ∀ g ∈ P_k,
and μ ∈ M+(X), μ̄ ∈ M+(Y).
are all represented by (·, ·). Then we have the following result:
Theorem 3.1 Suppose LPM is consistent with finite value. If the subprograms
(ELPM)_k satisfy the following conditions:
(1) For every k ∈ IN, (ELPM)_k is solvable with an optimal solution (μ_k*, μ̄_k*),
and
(a) ELPM is solvable.
and so the sequence {((μ_k*, μ̄_k*), (h, 0)) : k ∈ IN} is increasing and has an upper
bound. Hence
lim_{k→+∞} ∫_X h(x) dμ_k*(x) ≤ V(ELPM). (3.1)
By condition (2) and the Banach-Alaoglu theorem, there exist (μ*, μ̄*) ∈ M+(X) ×
M+(Y) and a subsequence {(μ_{k_j}*, μ̄_{k_j}*) : j ∈ IN} ⊆ {(μ_k*, μ̄_k*) : k ∈ IN} such that
(3.2) holds. We claim that
(μ*, μ̄*) ∈ F(ELPM). (3.3)
since (μ_{k_j}*, μ̄_{k_j}*) → (μ*, μ̄*) weakly and Λ is weakly continuous. On the other
hand, as g is in ∪_{k=1}^∞ span(P_k), there is a k such that g is in span(P_k).
Hence,
Letting j → ∞ in (3.5), and using (3.6), we obtain (3.4). Thus we have (3.3).
Therefore, combining (3.1), (3.2) and (3.3), we have
Corollary 3.1 Suppose LPM is consistent with finite value. If the given data
h and ν* satisfy the following conditions:
then
(b) For every k ∈ IN, let (μ_k*, μ̄_k*) be an optimal solution for (ELPM)_k. Then
Proof: (a) Given k ∈ IN. As in the proof of Theorem 2.1, we may assume that
there exists (μ_k^{(n)}, μ̄_k^{(n)}) ∈ F((ELPM)_k) such that
By the same argument as in the proof of Theorem 2.1, there exist μ_k* ∈ M+(X),
n* ∈ IN, and a subsequence {μ_{k,n_j} : j ∈ IN} ⊆ {μ_{k,n} : n ∈ IN} such that
||μ_{k,n_j}|| ≤ K̄ / min_{x∈X} h(x), ∀ n_j ≥ n*,
and
μ_{k,n_j} → μ_k* weakly as j → ∞, (3.10)
since (μ_{k,n_j}, μ̄_{k,n_j}) ∈ F((ELPM)_k) implies (Λ(μ_{k,n_j}, μ̄_{k,n_j}) - ν, g) = 0 for all
g ∈ P_k, and in particular (Λ(μ_{k,n_j}, μ̄_{k,n_j}) - ν, 1) = 0. Here
K̄ = max_{x∈X, y∈Y} φ(x, y) (||ν*|| - ν(Y)) / min_{x∈X} h(x)
is a fixed positive number.
llEY *EX
Therefore, by the Banach-Alaoglu theorem, there exist pi E M+(X) and a
subsequence {pkej: j E IN) C {pknj : j E IN) such that
-
pkrj + pi weakly as j + CQ. (3.11)
Now (3.12) combining with (3.8) also yields that ( p i , P i ) is an optimal solution
for (ELPM)" as
Also by the same argument as in the proof of Theorem 2.1, there exists a
subsequence {pij : j E IN) 5 {pi : k E IN) such that
is the adjoint operator of Λ_k. Hence the dual problem for (ELPM)_k is defined
as follows:
That is,
Note that (DELPM)_k is basically the same as CSIP, the continuous semi-
infinite program, except that the "minimization" in CSIP is replaced by "max-
imization" in (DELPM)_k. Moreover, for every k ∈ IN, there is no duality gap
for (ELPM)_k and (DELPM)_k under the condition that h(x) > 0, ∀ x ∈ X.
Hence,
and,
and
(DSIP_k): maximize ∫_T g(t) dμ(t)
such that ∫_T f_i(t) dμ(t) = c_i, i = 0, 1, 2, ..., k,
μ ∈ M+(T),
where M+(T) is the space of all nonnegative bounded regular Borel measures
on T. Note that (ELPM)_k and SIP_k have the same optimal solution, as well
as
V((DELPM)_k) = -V(SIP_k).
Given that δ > 0 is a prescribed small number, we state our algorithm in the
following steps:
Step 3 Find any t_{k_{n+1}} ∈ T such that φ_n(t_{k_{n+1}}) < -δ. If such a t_{k_{n+1}}
does not exist, stop and output X^n as a solution. Otherwise, set
T_{n+1} = E_n ∪ {t_{k_{n+1}}}.
Step 4 Update n ← n + 1, and go to Step 1.
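The exchange scheme above can be sketched on a hypothetical finite-dimensional instance. The toy problem below, min x1 + 0.5 x2 subject to x1 + t x2 ≥ t - t^2 for all t in [0, 1], is an invented stand-in for (SIP_k), and the naive vertex-enumeration LP solver replaces the paper's measure-theoretic machinery; only the iteration logic of Steps 1-4 is illustrated.

```python
import numpy as np
from itertools import combinations

def solve_lp2(c, A, b):
    """Solve min c.x s.t. A x >= b, x in R^2, by enumerating the
    intersections of constraint pairs (adequate for this tiny demo)."""
    best, best_x = np.inf, None
    for i, j in combinations(range(len(A)), 2):
        M = np.array([A[i], A[j]])
        if abs(np.linalg.det(M)) < 1e-12:
            continue
        x = np.linalg.solve(M, np.array([b[i], b[j]]))
        if np.all(A @ x >= b - 1e-9) and c @ x < best:
            best, best_x = c @ x, x
    return best_x

def exchange_sip(c, a, rhs, T0, delta=1e-3, max_iter=100):
    """Exchange iteration: solve the LP over the finite set T_n, find a
    point of [0, 1] violated by more than delta (Step 3), add it, and
    repeat (Step 4) until no such point remains."""
    T = list(T0)
    tt = np.linspace(0.0, 1.0, 1001)   # fine grid standing in for T
    x = None
    for _ in range(max_iter):
        A = np.array([a(t) for t in T])
        b = np.array([rhs(t) for t in T])
        x = solve_lp2(c, A, b)         # relaxed LP over T_n
        viol = np.array([a(t) @ x - rhs(t) for t in tt])
        k = int(np.argmin(viol))
        if viol[k] >= -delta:          # Step 3: nothing violated, stop
            break
        T.append(tt[k])                # enlarge T_n
    return x, T

c = np.array([1.0, 0.5])
x_opt, T_final = exchange_sip(c, lambda t: np.array([1.0, t]),
                              lambda t: t - t * t, [0.0, 1.0])
```

For this instance the optimal value is 0.25, attained at x = (0.25, 0) with the single contact point t = 0.5, and the LP values increase monotonically toward it, as Corollary 4.1 predicts.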
In the above algorithm, we make the following assumptions:
Proof: Since E_{n+1} - {t_{k_{n+1}}} ⊆ E_n, we have, by the definition (4.2) of α_n,
that α_n(t) > 0, ∀ t ∈ E_{n+1} - {t_{k_{n+1}}}. Hence, from the complementary slackness
theorem of linear programming, we get
which implies
APPROXIMATION APPROACH FOR LP IN MEASURE SPACE 347
If, in each iteration, there exists a δ > 0 such that ρ_n(t) > δ, ∀ t ∈ E_n, then,
by Theorem 4.1, we obtain
Corollary 4.1 Given any δ > 0, in each iteration, if there exists a δ > 0 such
that ρ_n(t) > δ, ∀ t ∈ E_n, then V(LP_k(T_{n+1})) > V(LP_k(T_n)).
Theorem 4.1 as well as Corollary 4.1 are fundamental results for the algorithm;
with them, we show that, under proper conditions, for any given δ > 0,
the proposed algorithm actually terminates in a finite number of iterations.
Proof: Suppose the scheme does not stop in a finite number of iterations. By
Corollary 4.1, we have
Then, by (4.1), φ_{n_r}(t_{k_{n_r+1}}) converges to φ(t*). Since φ_{n_r}(t_{k_{n_r+1}}) < -δ for
each r, we have
Hence, (4.4) cannot be true, and we have a contradiction. Therefore our claim
is valid and the proof is complete.
Under conditions (A3) and (A4), Theorem 4.2 assures that the proposed
scheme terminates in finitely many iterations, say n* iterations, with an optimal
solution
X n* = (x;',xn*1 ,x; * ,. . . ,$*).
In this case, xn*can be viewed as an approximate solution of (SIPk). The
next theorem tells us how good such an approximate solution can be.
Theorem 4.3 For any given δ > 0, if there exists x̄ = (x̄_0, x̄_1, x̄_2, ..., x̄_k) ∈
IR^{k+1} such that
Σ_{i=0}^{k} x̄_i f_i(t) ≥ 1, ∀ t ∈ T, (4.5)
then
By (4.5), we have
References
e-mail: [email protected]
1 INTRODUCTION
In recent papers Banks (2001), Banks and Dinesh (2000) have applied a se-
quence of linear time-varying approximations to find feedback controllers for
nonlinear systems. Thus, for the optimal control problem

min J = (1/2) x^T(t_f) F(x(t_f)) x(t_f) + (1/2) ∫_0^{t_f} { x^T(t) Q(x(t)) x(t) + u^T(t) R(x(t)) u(t) } dt     (1.1)

subject to the dynamics

ẋ(t) = A(x(t)) x(t) + B(x(t)) u(t),   x(0) = x_0,     (1.2)
for i ≥ 0, where
and
+ (1/2) ∫_0^{t_f} { x^[0]T(t) Q(x_0) x^[0](t) + u^[0]T(t) R(x_0) u^[0](t) } dt.
The approximations have been shown to converge under very mild conditions
(that each operator is locally Lipschitz) and to provide very effective control
in many examples. However, optimality has not been proved and, in fact, the
limit control is unlikely to be optimal in general (and indeed, there may be no
optimal control since the nonlinear systems considered are very general). Hence,
in this paper, we consider the full necessary equations derived from Pontryagin's
maximum principle and compare them with the original "approximate optimal
control" method proposed in Banks and Dinesh (2000).
OPTIMAL CONTROL OF NONLINEAR SYSTEMS 355
Let us first look at classical optimal control theory where we consider the linear
system
ẋ(t) = A(t)x(t) + B(t)u(t),   x(0) = x_0     (2.1)

with the finite-time cost functional

min J = (1/2) x^T(t_f) F x(t_f) + (1/2) ∫_0^{t_f} { x^T(t) Q(t) x(t) + u^T(t) R(t) u(t) } dt.     (2.2)
It is well known (see, for example, Banks (1986)) that from the maximum
principle, the solution to the linear-quadratic regulator problem is given by the
coupled two-point boundary value problem
( ẋ )   [  A(t)    −B(t)R^{-1}(t)B^T(t) ] ( x )
( λ̇ ) = [ −Q(t)   −A^T(t)               ] ( λ )     (2.3)
with
Assuming that λ(t) = P(t)x(t) for some positive-definite symmetric matrix
P(t), the necessary conditions are then satisfied by the Riccati equation

Ṗ(t) = −Q(t) − P(t)A(t) − A^T(t)P(t) + P(t)B(t)R^{-1}(t)B^T(t)P(t),   P(t_f) = F     (2.5)
yielding the linear optimal control law given by
we obtain
where

( u^T (∂R(x)/∂x_1) u, u^T (∂R(x)/∂x_2) u, . . . , u^T (∂R(x)/∂x_n) u )^T

and the analogous vector built from the terms x^T (∂Q(x)/∂x_i) x are vectors of
quadratic forms. Hence we obtain the equations
( ẋ )   [  Ā(x)    −B(x)R^{-1}(x)B^T(x) ] ( x )
( λ̇ ) = [ −Q̄(x)   −Ā^T(x)               ] ( λ )     (2.7)

in which Ā(x) and Q̄(x) collect A(x) and Q(x) together with the derivative
terms ((∂A(x)/∂x)x)^T, (∂/∂x)(R^{-1}(x)B^T(x)λ) and the quadratic-form vectors above.
where
or

( ẋ^[i](t) )   [  Ã(t)    −B̃(t)R̃^{-1}(t)B̃^T(t) ] ( x^[i](t) )
( λ̇^[i](t) ) = [ −Q̃(t)   −Ã^T(t)                ] ( λ^[i](t) )     (2.8)

where

Ã(t) = A(x^[i-1](t))
B̃(t) = B(x^[i-1](t))
R̃^{-1}(t) = R^{-1}(x^[i-1](t))
Q̃(t) = Q̄(x^[i-1](t), λ^[i-1](t))
with
We know that the two-point boundary value problem (2.3) with conditions
(2.4) represents the classical optimal control of a linear system with quadratic
cost (2.2) subject to the dynamics (2.1), the solution of which is given by the
Riccati equation (2.5) together with the optimal control law (2.6). Therefore,
since each approximating problem in (2.8) is linear-quadratic, the system (2.8)
with conditions (2.9), (2.10) and (2.11) is equivalent to the classical optimal
control of a linear system with quadratic cost
problem (2.7) for these necessary conditions with quadratic cost (1.1) subject
to the dynamics (1.2) is given by the approximate Riccati equation sequence
Similar to the "approximate optimal control" technique (see Banks and Dinesh
(2000)), optimization is carried out for each sequence on the system trajectory
for the "global optimal control" approach. In order to calculate the optimal
solution, it is necessary to solve the approximate matrix Riccati equation se-
quence (2.12) storing the values of P(t) from each sequence at every discrete
time-step. In practice this will be done in a computer and it will be necessary
to solve the Riccati equation using standard numerical integration procedures,
such that P : IR_+ → IR^{n×n} and P(t) now represents a time-varying matrix.
Here P^[1](t) is obtained by again solving (2.12) for i = 1, this time
replacing x^[i-1](t) and P^[i-1](t) with the previous approximations x^[0](t) and
P^[0](t), respectively. The subsequent approximations are obtained in a similar
way. Thus, in solving (2.12)-(2.14), we obtain a sequence of time-varying linear
equations where each sequence is solved as a standard numerical problem. For
each sequence, optimization has to be carried out at every numerical integra-
tion time-step, resulting in dynamic feedback control values. When both x^[i]
and P^[i] have converged on the kth sequence, on applying the control u^[k] of the
converged sequence to the true nonlinear system (1.2), the solution obtained
should be the same as that of the converged approximation. This is expected
since x^[i](t) = x(t) when x^[i](t) has converged.
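The iteration just described can be sketched in one dimension. This is a hedged illustration, not the authors' code: the scalar system ẋ = a(x)x + u with a(x) = −1 − x², the weights q = r = 1, the terminal weight, and the horizon are all assumed for the example. Each sweep solves a scalar Riccati equation of the form (2.5) backward along the previous trajectory x^[i-1](t), then regenerates the trajectory with the resulting time-varying gain.

```python
import numpy as np

# Scalar nonlinear system: xdot = a(x) x + u with a(x) = -1 - x**2 (illustrative).
a = lambda z: -1.0 - z**2
b, q, r, f_term = 1.0, 1.0, 1.0, 1.0
tf, dt = 2.0, 0.001
N = int(tf / dt)
x0 = 1.0

def closed_loop(x_prev):
    """One sweep: scalar Riccati backward with A(t) = a(x_prev(t)), then a
    forward pass of the frozen-coefficient system under u = -(b/r) p(t) x."""
    p = np.empty(N + 1)
    p[N] = f_term
    for k in range(N, 0, -1):
        pdot = -q - 2.0 * a(x_prev[k]) * p[k] + (b**2 / r) * p[k] ** 2
        p[k - 1] = p[k] - dt * pdot          # stepping backward in time
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        u = -(b / r) * p[k] * x[k]
        x[k + 1] = x[k] + dt * (a(x_prev[k]) * x[k] + b * u)
    return x

x_prev = np.full(N + 1, x0)   # x^[0]: the trajectory frozen at the initial state
diff = np.inf
for i in range(50):
    x_new = closed_loop(x_prev)
    diff = float(np.max(np.abs(x_new - x_prev)))
    x_prev = x_new
    if diff < 1e-9:           # both x^[i] and the gain have converged
        break
```

In line with the convergence discussion above, a few sweeps suffice for the successive trajectories to agree, after which the frozen-coefficient trajectory coincides with the closed-loop trajectory of the nonlinear system.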
3 EXAMPLE
Balancing an inverted pendulum on a motor-driven cart, as shown in Figure 3.1,
has become a popular controller design problem. The objective in the control
of this model is to move the cart to a specified position while maintaining the
pendulum vertical. The inverted pendulum is unstable, in that it may fall over
at any time in any direction unless a suitable control force u is applied, and it
is often used to test new controller designs. Here only a two-dimensional problem
is considered, in that the pendulum moves only in the plane of the page.
Let us now apply the theories of "approximate optimal control" and "global
optimal control" to the inverted pendulum system where the mathematical
model is given by the equations (see Ogata (1997))
where

sinc x_2 = 1 for x_2 = 0, and sinc x_2 = (sin x_2)/x_2 for x_2 ≠ 0.
For the "global optimal control" technique we also require the Jacobians of
A (x) and B (x), given by
where

∂a_32/∂x_2 = [mg/(M + m sin²x_2)²] { (M + m sin²x_2)[sin x_2 sinc x_2 − cos x_2 (∂/∂x_2)(sinc x_2)] + m sin(2x_2) cos x_2 sinc x_2 }

∂a_34/∂x_2 = [m r x_4/(M + m sin²x_2)²] { (M + m sin²x_2) cos x_2 − m sin x_2 sin(2x_2) }

∂a_42/∂x_2 = [(M + m)g/(r(M + m sin²x_2)²)] { (M + m sin²x_2)(∂/∂x_2)(sinc x_2) − m sin(2x_2) sinc x_2 }

∂a_44/∂x_2 = [m x_4/(M + m sin²x_2)²] { −(M + m sin²x_2) cos(2x_2) + (m/2) sin²(2x_2) }

∂a_34/∂x_4 = m r sin x_2/(M + m sin²x_2)

∂a_44/∂x_4 = −m sin(2x_2)/(2(M + m sin²x_2))

∂b_3/∂x_2 = −m sin(2x_2)/(M + m sin²x_2)²

∂b_4/∂x_2 = [1/(r(M + m sin²x_2)²)] { (M + m sin²x_2) sin x_2 + m sin(2x_2) cos x_2 }
with
(∂/∂x_2)(sinc x_2) = 0 for x_2 = 0, and (x_2 cos x_2 − sin x_2)/x_2² for x_2 ≠ 0,

which can easily be shown by differentiating sinc x_2 with respect to x_2 and
using L'Hôpital's rule.
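The derivative of sinc can also be checked numerically. A small sketch (note the unnormalized convention sinc z = sin z / z used in this chapter, which differs from numpy.sinc's π-normalized one):

```python
import numpy as np

def sinc(z):
    """Unnormalized sinc: sin(z)/z, with the removable singularity filled in."""
    safe = np.where(z == 0.0, 1.0, z)
    return np.where(z == 0.0, 1.0, np.sin(safe) / safe)

def dsinc(z):
    """d(sinc z)/dz = (z cos z - sin z)/z**2 for z != 0, and 0 at z = 0."""
    safe = np.where(z == 0.0, 1.0, z)
    return np.where(z == 0.0, 0.0, (safe * np.cos(safe) - np.sin(safe)) / safe**2)

# Central-difference check of the closed-form derivative at a few points.
pts = np.array([-2.0, -0.5, 0.3, 1.0, 2.5])
h = 1e-6
fd = (sinc(pts + h) - sinc(pts - h)) / (2.0 * h)
max_err = float(np.max(np.abs(fd - dsinc(pts))))
```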
By taking the mass of the cart, M = 1 kg, the mass of the pendulum arm, m =
0.1 kg, the distance from the pivot to the center of mass of the pendulum arm,
r = 0.5 m, and the acceleration due to gravity, g = 9.81 m/sec2, the state-
space model (3.2) representing the inverted pendulum system was simulated
using Euler's numerical integration technique with time-step 0.02 sec. The
performance matrices have been chosen independent of the system states such
that Q = F = diag(1, 1, 1, 1) and R = [1], so that the Jacobians of these
become zero. In simulating the inverted pendulum system (using a simulation
package such as MATLAB@) we only consider the regulator problem where
our objective is to drive all the system states to zero. We also assume that the
system starts from rest and simulate it for a given initial angle.
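The regulator setup above can be reproduced in a few lines. This sketch assumes the standard cart-pole equations in place of the elided model (3.2) and, as a simplification of the finite-horizon schemes in the text, uses a constant gain from the infinite-horizon algebraic Riccati equation; the state ordering (cart position, angle, cart velocity, angular velocity) is also an assumption.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed cart-pole equations standing in for the elided model (3.2);
# state z = (cart position, angle, cart velocity, angular velocity).
M, m, r, g = 1.0, 0.1, 0.5, 9.81

def dynamics(z, u):
    _, th, v, w = z
    s, c = np.sin(th), np.cos(th)
    den = M + m * s**2
    xacc = (u + m * s * (r * w**2 - g * c)) / den
    thacc = (-u * c - m * r * w**2 * s * c + (M + m) * g * s) / (r * den)
    return np.array([v, w, xacc, thacc])

# Linearization about the upright equilibrium and an LQR gain from the
# algebraic Riccati equation (a simplification of the finite-horizon scheme).
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, -m * g / M, 0.0, 0.0],
              [0.0, (M + m) * g / (M * r), 0.0, 0.0]])
B = np.array([[0.0], [0.0], [1.0 / M], [-1.0 / (M * r)]])
Q, R = np.eye(4), np.array([[1.0]])
P = solve_continuous_are(A, B, Q, R)
K = (np.linalg.inv(R) @ B.T @ P).ravel()

# Euler simulation with the 0.02 sec time-step used in the text.
dt, steps = 0.02, 500
z = np.array([0.0, 0.3, 0.0, 0.0])   # initial angle 0.3 rad, starting from rest
for _ in range(steps):
    u = float(-K @ z)
    z = z + dt * dynamics(z, u)

final_angle = float(abs(z[1]))
```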
4 RESULTS
The "approximate optimal control" technique has been shown to provide very
effective control in that the pendulum is stabilizable from any given initial an-
gle. In fact, taking the initial horizon time t_f < 1.9 sec and proceeding with
the control of the approximating sequence from where it left off, by taking the
final state values as initial conditions and defining a new horizon time, the
pendulum can even be stabilized for its uncontrollable states (θ = ±π/2 rad).
This is because the "approximate optimal control" technique does not require
a stabilizability condition to be satisfied - it only requires Lipschitz continuity.
Although stability is not guaranteed in general, on a finite-time interval, it
has been achieved for the uncontrollable states of the inverted pendulum sys-
tem. A related "local freezing control" technique given in Banks and Mhana
(1992) (where optimization is carried out point-wise on the system trajectory
for the nonlinear system (1.2)) has been shown to stabilize the inverted pendu-
lum from any given initial angle except its uncontrollable states. The "global
optimal control" strategy, however, provides control that stabilizes the inverted
pendulum system for initial angles within the interval ±1.1 rad, beyond which
problems arise related to the convergence of the approximations for the nec-
essary conditions. This may be related to the solution being a discontinuous
feedback and a viscosity solution, causing the algorithm to eventually blow up.
Note also that the "global optimal control" strategy is harder to implement,
since it requires the Jacobians of A(x), B(x), F(x), Q(x), and R(x), and hence
takes a longer time to converge to the optimal solution.
Figures 4.1 and 4.2 illustrate and compare the converged solutions using
both techniques when the initial angle is set to θ = 0.75 rad and θ = 1.0 rad,
respectively.
Figure 4.1 Response of the States and the Control Input of the Inverted Pendulum System
when Subject to the Initial Angle θ = 0.75 rad Using "Approximate Optimal Control" and
"Global Optimal Control" Methods
Figure 4.2 Response of the States and the Control Input of the Inverted Pendulum System
when Subject to the Initial Angle θ = 1.0 rad Using "Approximate Optimal Control" and
"Global Optimal Control" Methods
systems converging to any of the local optima. Thus the proposed technique
still remains "approximate optimal control", which provides solutions close to
the optimal one, without the need to compute any Jacobians of the
system matrices and thus providing an easier implementation.
Table 4.1 Costs Associated with Each Optimization Technique Subject to Various Initial
Angles.
5 CONCLUSIONS
In this paper we have considered the full necessary conditions of a system with
nonlinear dynamics, derived from Pontryagin's maximum principle, and com-
pared them with the original "approximate optimal control" method. We have
thus considered a nonlinear optimization problem with nonlinear dynamics and
replaced it with a sequence of time-varying linear-quadratic regulator problems,
where we have made the argument that the set of approximating systems is
equivalent to the classical optimal control of a linear-quadratic regulator system
and hence can be solved classically giving the solution to the "global optimal
control", in the case where such a control exists. Even though the approx-
imating systems using the "approximate optimal control" strategy may not
converge to a global optimum of the nonlinear system, by considering a similar
approximation sequence (given by the necessary conditions of the maximum
principle), we have seen that the proposed method gives solutions very close to
the optimal one in many cases for the inverted pendulum system. The meth-
ods used here are very general and apply to a very wide range of nonlinear
systems. Future work will examine issues on discontinuity of the solution of
the Hamilton-Jacobi equation and viscosity solutions.
References
University of Würzburg
Institute of Applied Mathematics and Statistics
Am Hubland
97074 Würzburg
Germany
e-mail: [email protected]
Abstract: This paper gives a brief survey of some proximal-like methods for the
solution of convex minimization problems. Apart from the classical proximal-
point method, it gives an introduction to several proximal-like methods using
Bregman functions, φ-divergences etc., and discusses a couple of recent develop-
ments in this area. Some numerical results for optimal control problems are also
included in order to illustrate the numerical behaviour of these proximal-like
methods.
1 INTRODUCTION
This paper gives a brief survey of some proximal-like methods for the solution
of convex minimization problems. To this end, let f : IR^n → IR ∪ {+∞} be
a closed, proper, convex function, and consider the associated optimization
problem
min f(x),   x ∈ IR^n.     (1.1)

for some convex function f̃ : IR^n → IR and a closed, nonempty and convex set
X ⊆ IR^n can easily be transformed into a minimization problem of the form
(1.1) by defining

f(x) := f̃(x) if x ∈ X,   and   f(x) := +∞ if x ∉ X.
Hence, theoretically, there is no loss of generality by considering the uncon-
strained problem (1.1). In fact, many theoretical results can be obtained in
this (unifying) way for both unconstrained and constrained optimization prob-
lems. The interested reader is referred to the classical book by Rockafellar
(1970) for further details.
Despite the fact that extended-valued functions allow such a unified treat-
ment of both unconstrained and constrained problems, they are typically not
tractable from a numerical point of view. Therefore, numerical algorithms for
the solution of a problem like (1.1) have to take into account the constraints
explicitly or, at least, some of these constraints. This can be done quite ele-
gantly by so-called proximal-like methods. Similar to interior-point algorithms,
these methods generate strictly feasible iterates and are usually applied to the
problem with nonnegativity constraints
where, in the latter case, A ∈ IR^{n×m} and b ∈ IR^m are the given data.
PROXIMAL-LIKE METHODS 371
2 PROXIMAL-LIKE METHODS
for k = 0, 1, . . .; here, λ_k denotes a positive number. The objective function of
this subproblem is strictly convex since it is the sum of the original (convex)
objective function f and a strictly convex quadratic term. This term is usually
called the regularization term.
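For a strictly convex quadratic f, the subproblem (2.1) can be solved in closed form, which gives a compact sketch of the classical method; the data Qf, b and the constant step λ below are illustrative assumptions.

```python
import numpy as np

# f(x) = 0.5 x^T Qf x - b^T x is closed, proper, convex; its minimizer solves
# Qf x = b.  The proximal subproblem
#     x^{k+1} = argmin_x  f(x) + (1/(2*lam)) * ||x - x^k||^2
# has the closed-form solution x^{k+1} = (Qf + I/lam)^{-1} (b + x^k / lam).
rng = np.random.default_rng(0)
C = rng.standard_normal((5, 5))
Qf = C.T @ C + 0.1 * np.eye(5)       # symmetric positive definite
b = rng.standard_normal(5)
x_star = np.linalg.solve(Qf, b)      # exact minimizer, for comparison

x = np.zeros(5)
lam = 1.0                            # constant lam_k, so sigma_k -> infinity
for _ in range(200):
    x = np.linalg.solve(Qf + np.eye(5) / lam, b + x / lam)

prox_err = float(np.linalg.norm(x - x_star))
```

With a constant λ the assumption of Theorem 2.1 below holds, and the iterates indeed converge linearly to the minimizer.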
Theorem 2.1 Let {x^k} and {λ_k} be two sequences generated by the classical
proximal-point method (2.1), define σ_k := Σ_{j=0}^k λ_j, let f_* := inf{f(x) | x ∈
IR^n} be the optimal value and S := {x^* ∈ IR^n | f(x^*) = f_*} be the solution set
of (1.1). Assume that σ_k → ∞. Then the following statements hold:

(a) The sequence of function values {f(x^k)} converges to the optimal value
f_*.

(b) If S ≠ ∅, then the entire sequence {x^k} converges to an element of S.
Theorem 2.1 states some very strong convergence properties under rather
weak conditions. In particular, it guarantees the convergence of the entire
sequence {x^k} even if the solution set S contains more than one element; in
fact, this statement also holds for an unbounded solution set S. Note that the
assumption σ_k → ∞ holds, for example, if the sequence {λ_k} is constant, i.e.,
if λ_k = λ for all k ∈ IN and some positive number λ.
We note that many variations of Theorem 2.1 are available in the litera-
ture. For example, it is not necessary to compute the exact minimizer of the
subproblems (2.1) at each step, see Rockafellar (1976) for some criteria under
which inexact solutions still provide similar global convergence properties. It
should be noted, however, that the criteria of inexactness in Rockafellar (1976)
are not implementable in general since they assume some knowledge regarding
the exact solution of (2.1). On the other hand, Solodov and Svaiter (1999)
recently gave a more constructive criterion in a slightly different framework.
Furthermore, some rate of convergence results can be shown for the classical
proximal-point method under a certain error bound condition, cf. Luque (1984).
This error bound condition holds, for example, for linearly constrained problems
due to Hoffman's error bound, see Hoffman (1952). Moreover, when applied to
linear programs, it is known that the classical proximal-point method has a
finite termination property, see Ferris (1991) for details.
The simple idea which is behind each proximal-like method for the solution of
convex minimization problems is, more or less, to replace the strictly convex quadratic
term in the regularized subproblem (2.1) by a more general strictly convex
function. Later on, we will see that this might be a very useful idea when
solving constrained problems.
There are quite a few different possibilities to replace the term (1/2)‖x − x^k‖²
by another strictly convex distance-like function. The one we discuss in this
subsection is defined by
ψ(x) := Σ_{i=1}^n (x_i log x_i − x_i) on S = IR^n_+   (convention: 0 log 0 = 0).
This function may be used in order to solve the constrained optimization prob-
lem (1.2) by generating a sequence {x^k} in such a way that x^{k+1} is a solution
of the subproblem
min f(x) + (1/λ_k) D_ψ(x, x^k),   x > 0
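For a linear objective f(x) = c^T x with c > 0, the entropy subproblem has a closed-form solution, which makes the key feature visible: the iterates stay strictly positive, so the constraint x ≥ 0 never needs an explicit projection. The data c, λ, and x^0 below are illustrative assumptions.

```python
import numpy as np

# Subproblem: x^{k+1} = argmin_{x>0} c^T x + (1/lam) D(x, x^k), with the
# Kullback-Leibler distance D(x, y) = sum_i x_i log(x_i/y_i) - x_i + y_i.
# Setting the gradient c + (1/lam) log(x/y) to zero gives the closed form
#   x^{k+1} = x^k * exp(-lam * c),
# a strictly positive multiplicative update.
c = np.array([1.0, 0.5, 2.0])
lam = 0.5
x = np.ones(3)                        # strictly feasible start x^0 > 0
values = []
for k in range(60):
    values.append(float(c @ x))
    x = x * np.exp(-lam * c)          # entropic proximal step

all_positive = bool(np.all(x > 0))
monotone = all(values[i + 1] < values[i] for i in range(len(values) - 1))
```

The objective values decrease monotonically toward the infimum 0, which lies on the boundary of the feasible set, while every iterate remains interior.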
Definition 2.2 Let Φ denote the class of closed, proper and convex functions
φ : IR → (−∞, +∞] with dom(φ) ⊆ [0, +∞) having the following properties:
The graphs of these three functions are shown in the following figure.
and this is precisely the Kullback-Leibler relative entropy function from (2.2).
Using any φ-divergence, we may try to solve the constrained minimization
problem (1.2) by generating a sequence {x^k} in such a way that x^{k+1} solves
the subproblem
min f(x) + (1/λ_k) d_φ(x, x^k),   x > 0
for k = 0, 1, . . ., where x^0 > 0 is any given starting point. For this method,
Teboulle (1997) shows that it has the same global convergence properties as
those mentioned in Theorem 2.1 for the classical proximal-point method. In
addition, Teboulle (1997) also allows inexact solutions of the subproblems (2.4).
The method may also be applied to the linearly constrained problem (1.3), and,
once again, superlinear convergence can be shown under a certain error bound
assumption, see Auslender and Haddou (1995) for details. Further references on
φ-divergences include Csiszár (1967); Eggermont (1990); Teboulle (1992); Iusem
et al. (1994); Iusem and Teboulle (1995).
where e_i denotes the i-th unit vector in IR^n. Hence, in each sum, we have a
factor which increases to infinity during the iteration process for all indices
i for which a constraint like x_i ≥ 0 is active at a solution. Consequently, we
get Hessian matrices which are very ill-conditioned.
In order to avoid this drawback, it is quite natural to modify the idea from
the previous subsection in the following way: Let Q, be the class of functions
from Definition 2.2 and set
The difference to the φ-divergence from the previous subsection is that we use
y_i² instead of y_i. Calculating the second-order derivative of the mapping d_φ
from (2.5) gives
for some constant v > 1. These functions are plotted in the following figure.
Now, let φ denote any of these functions and set (similar to (2.5))
The regularized method from Auslender et al. (1999a) then generates a se-
quence {x^k} by starting with a strictly feasible point x^0 for problem (1.3) and
by computing x^{k+1} as the solution of the subproblem
min f(x) + (1/λ_k) d_φ(g(x), g(x^k)),   g(x) > 0.     (2.7)
Theorem 2.2 Let {x^k} be a sequence generated by the above method. Assume
that the following assumptions are satisfied:
(A.1) There exist constants λ_max ≥ λ_min > 0 such that λ_k ∈ [λ_min, λ_max] for all
k ∈ IN.
The rank assumption (A.4) is satisfied, e.g., if A = I , i.e., if the feasible set
is the nonnegative orthant. Moreover, this rank condition can be assumed to
hold without loss of generality for linear programs if we view (1.3) as the dual
of a standard form linear program.
Consider again the linearly constrained minimization problem (1.3). The meth-
ods described in the previous subsections all assume, among other things, that
the interior of the feasible set is nonempty. Moreover, it is assumed that we can
find a strictly feasible point in order to start the algorithm. However, for linear
constraints, it is usually not easy to find such a starting point. Furthermore,
there do exist convex optimization problems which are solvable but whose in-
terior of the feasible region is empty. In this case, it is not possible to apply
one of the methods from the previous subsections.
Yamashita et al. (2001) therefore describe an infeasible proximal-like method
which can be started from an arbitrary point and which avoids the assumption
that the interior of the feasible set is nonempty. The idea behind the method
from Yamashita et al. (2001) is to enlarge the feasible set
Note that X ≠ ∅ then implies int(X_k) ≠ ∅. Hence we can apply the previous
method to the enlarged problem
The fact that the method from the previous subsection uses a quadratic penalty
term actually fits perfectly into our situation where we allow infeasible iterates.
Obviously, we can hope that we obtain a solution of the original problem
(1.3) by letting δ_k → 0. To be more precise, let us define

for some perturbation vector δ_k > 0. Then it has been shown in Yamashita et
al. (2001) that the statements of Theorem 2.2 remain true under a certain set
of assumptions. However, without going into the details, we stress that these
assumptions include the condition dom(f) ∩ {x | A^T x ≤ b} ≠ ∅ (in contrast to
(A.3) which assumes that the domain of f intersected with the interior of the
feasible set is nonempty). On the other hand, the main convergence result in
Yamashita et al. (2001) has to impose another condition which eventually guar-
antees that the iterates {x^k} generated by the infeasible proximal-like method
become feasible in the limit.
We consider two classes of optimal control problems. The first class contains
control constraints, the second one involves state constraints.
The class of control constrained problems is as follows: Let Ω ⊆ IR^n be an
open and bounded domain and consider the minimization problem
In order to deal with the state constrained problem (3.2), we use the same
discretization scheme as for the control constrained problem (3.1). In this case,
however, this results in a linearly constrained optimization problem of the form
(1.3), and it is in general not easy to find a strictly feasible starting point.
We begin with a word of caution: The numerical results presented in this and
the next subsection are not intended to show that proximal-point methods are
the best methods for solving the two classes of optimal control problems from
the previous subsection. The only thing we want to do is to provide a brief
comparison between some of the different proximal-like methods for the solution
of these problems in order to get some hints which proximal-like methods seem
to work best.
All methods were implemented in MATLAB and use the same parameter
setting whenever this was possible. The particular methods we consider in this
subsection for the solution of the control constrained problem (3.1) are the
proximal-like methods from Subsections 2.3 (φ-divergences) and 2.4 (quadratic
kernels with regularization term). The unconstrained minimization is always
carried out by applying Newton's method. For reasons explained earlier, we
took the function φ_2 in combination with the proximal-like method from Sub-
section 2.3, and the corresponding mapping φ̂_2 for the regularized method from
Subsection 2.4.
Table 3.1 contains the numerical results for Example 3.1 (a) for different sizes
of N (the dimension of the discretized problem is n = N²). This table contains
the cumulative number of inner iterations, i.e., we present the total number of
Newton steps and therefore the total number of linear system solves for each
test problem. Both methods seem to work reasonably well, and the number of
iterations is more or less independent from the mesh size. However, the number
of iterations using the p-divergence approach is significantly higher than the
number of iterations for the regularized approach. The resulting optimal control
and state for Example 3.1 (a) are given in Figures 3.1 and 3.2, respectively.
The observation is similar for Example 3.1 (b) as shown in Table 3.2, al-
though this time the number of iterations needed by the two methods is pretty
much the same. The resulting optimal control and states for these two examples
are given in Figures 3.3 and 3.4, respectively.
We also tested both methods on Example 3.1 (a) using smaller values of α.
Due to the quadratic penalty term in the regularized method, we do expect
a better behaviour for this method. This is indeed reflected by the numerical
results shown in Table 3.3.
Since a strictly feasible starting point is usually not at hand for state con-
strained problems, we only applied the infeasible proximal-like method from
Subsection 2.5 to the test problem from Example 3.2.
Table 3.3 Number of iterations for Example 3.1 (a) using different α (N = 30)
The results indicate that the infeasible method works quite well. Similar to
the results from the previous subsection, we can see from Table 3.4 that the
number of iterations is again (more or less) independent of the mesh size.
The resulting optimal control and state for Example 3.2 are shown in Figures
3.5 and 3.6, respectively.
4 FINAL REMARKS
References
Tseng, P., and Bertsekas, D.P. (1993), On the convergence of the exponen-
tial multiplier method for convex programming, Mathematical Programming,
Vol. 60, pp. 1-19.
Yamashita, N., Kanzow, C., Morimoto, T., and Fukushima, M. (2001), An in-
feasible interior proximal method for convex programming problems, Journal
of Nonlinear and Convex Analysis, Vol. 2, pp. 139-156.
DIMENSIONAL NONCONVEX
VARIATIONAL PROBLEMS
René Meziat
Departamento de Matemáticas
Universidad de Los Andes
Carrera 1 este No 18A-10
Bogotá, Colombia
[email protected]
and
OMEVA, Research Group on Optimization and Variational Methods
Departamento de Matemáticas
Universidad de Castilla La Mancha
13071, Ciudad Real, Spain
http://matematicas.uclm.es/omeva
Abstract: The purpose of this work is to carry out the analysis of two-
dimensional scalar variational problems by the method of moments. This
method is indeed shown to be useful for treating general cases in which the
Lagrangian is a separable polynomial in the derivative variables. In these cases,
it follows that the discretization of these problems can be reduced to a single
large scale semidefinite program.
1 INTRODUCTION
The classical theory of variational calculus does not provide any satisfactory
methods to analyze non-convex variational problems expressed in the form
which provides information about the limit behavior of the minimizing se-
quences of the functional I given in (1.1). Thus,
In the present work, we will study the particular case in which the Lagrangian
function f takes the polynomial separable form
min_μ Ī(μ) = ∫_0^1 ∫_R f(λ) dμ_x(λ) dx
with u′(x) = ∫_R λ dμ_x(λ)
u(0) = 0,  u(1) = a

can be recast as

min_m ∫_0^1 Σ_k c_k m_k(x) dx
with u′(x) = m_1(x)
u(0) = 0,  u(1) = a
where m_k(x) are the algebraic moments of the parametrized measures μ_x which
form the one-dimensional Young measure
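The equivalence behind this recast is elementary and can be sanity-checked numerically: for a polynomial f(λ) = Σ_k c_k λ^k and any discrete probability measure, ∫ f dμ equals Σ_k c_k m_k. The particular polynomial and two-point measure below are illustrative choices.

```python
import numpy as np

# f(l) = sum_k c_k l^k with c = (1, 0, -2, 0, 1), i.e. f(l) = (l^2 - 1)^2,
# and mu the two-point probability measure 0.3*delta_a + 0.7*delta_b.
c = np.array([1.0, 0.0, -2.0, 0.0, 1.0])
a_pt, b_pt, wa, wb = -1.5, 0.5, 0.3, 0.7

f = lambda l: np.polyval(c[::-1], l)             # polyval wants high->low order
integral = wa * f(a_pt) + wb * f(b_pt)           # int f d(mu) directly
moments = np.array([wa * a_pt**k + wb * b_pt**k for k in range(5)])
moment_sum = float(c @ moments)                  # sum_k c_k m_k
# both quantities equal 0.8625
```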
3 CONVEX ENVELOPES
where μ represents the family of all probability measures with mean t. In this
approach, every probability measure represents a convex combination of points
on the real line. Therefore, the measure
From this point of view, it is clear that the optimal measure μ̂ has a very precise
geometric meaning. Here we have assumed that μ̂ is supported in two points
at the most, because of Carathéodory's theorem in convex analysis.
Since f is a polynomial function in the form (2.1), every integral in (3.1)
can be written as
2D NON CONVEX VARIATIONAL PROBLEMS 397
where the values m_0, . . . , m_{2n} are the algebraic moments of the measure μ. So
we can express the convex envelope of f using the next semidefinite program

[ m_0   m_1     m_2     . . .  m_n     ]
[ m_1   m_2     m_3     . . .  m_{n+1} ]
[ . . .                                ]  ⪰ 0
[ m_n   m_{n+1} m_{n+2} . . .  m_{2n}  ]
where t_1 < t < t_2. Using these values in the expression (3.2), we obtain the
optimal measure μ̂. It is remarkable that only three moments are needed for
recovering the optimal measure μ̂. Finally, we conclude that Problem (3.1) and
Problem (3.3) are equivalent. For additional details see Pedregal et al. (2003).
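The moment characterization can be cross-checked against a direct grid computation of the convex envelope f** (the double Legendre-Fenchel transform); the example f(t) = (t² − 1)² and all grids below are illustrative assumptions. On [−1, 1] the envelope vanishes and at t = 0 the optimal measure is the two-point combination (1/2)δ_{−1} + (1/2)δ_{+1}, whose first three moments are (m_1, m_2, m_3) = (0, 1, 0).

```python
import numpy as np

# f(t) = (t**2 - 1)**2; its convex envelope f** vanishes on [-1, 1] (there the
# optimal measure is 0.5*delta_{-1} + 0.5*delta_{+1}) and agrees with f outside.
# f** is computed as the double discrete Legendre-Fenchel transform.
t = np.linspace(-3.0, 3.0, 2001)
s = np.linspace(-40.0, 40.0, 2001)   # slope grid; covers f'(t) for |t| <= 2.2
f = (t**2 - 1.0) ** 2

f_conj = np.max(s[:, None] * t[None, :] - f[None, :], axis=1)      # f*(s)
f_env = np.max(t[:, None] * s[None, :] - f_conj[None, :], axis=1)  # f**(t)

below = bool(np.all(f_env <= f + 1e-9))   # the biconjugate never exceeds f
gap_inside = float(np.max(np.abs(f_env[np.abs(t) <= 0.9])))
mask = (np.abs(t) >= 1.5) & (np.abs(t) <= 2.2)
gap_outside = float(np.max(np.abs((f_env - f)[mask])))
```

Up to grid discretization, the computed envelope is zero on the non-convexity region and matches f where f is convex, which is exactly what the semidefinite program above returns through the optimal moments.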
where μ represents the family of all probability measures supported in the plane
satisfying

[ 1     m_1     . . .  m_n     ]        [ 1     p_1     . . .  p_r     ]
[ m_1   m_2     . . .  m_{n+1} ]        [ p_1   p_2     . . .  p_{r+1} ]
[ . . .                        ] ⪰ 0,   [ . . .                        ] ⪰ 0     (3.5)
[ m_n   m_{n+1} . . .  m_{2n}  ]        [ p_r   p_{r+1} . . .  p_{2r}  ]
The optimal values m̂_1, . . . , m̂_{2n}, p̂_1, . . . , p̂_{2r} for problem (3.5) allow us to de-
termine the optimal probability measure μ̂ which satisfies (3.4).

From a practical point of view, μ̂ is the direct product of two independent
one-dimensional distributions μ̂_x and μ̂_y, so we have
in (1.4) and, respectively, μ̂_y represents the convex envelope of the second poly-
nomial
4 PROBLEM ANALYSIS
min_u I(u) = ∫∫_Ω f(∇u(x, y)) dx dy   s.t.   u|_∂Ω = g.     (4.1)
where the new sets of variables m and p must be characterized as the algebraic
moments of one-dimensional probability measures. In order to do so, we impose
the linear matrix inequalities

[ 1           m_1(x, y)     . . .  m_n(x, y)     ]
[ m_1(x, y)   m_2(x, y)     . . .  m_{n+1}(x, y) ]
[ m_2(x, y)   m_3(x, y)     . . .  m_{n+2}(x, y) ]  ⪰ 0
[ . . .                                          ]
[ m_n(x, y)   m_{n+1}(x, y) . . .  m_{2n}(x, y)  ]

together with the analogous Hankel matrix in the variables p_i(x, y), which
must also be positive semidefinite.
Here we will transform the optimization problem (4.3) subject to the con-
straints (4.4), into an equivalent discrete mathematical program. First, we
take a finite set of N points on the domain Ω indexed by k, that is
Next, for every discrete point (x_k, y_k) we take the algebraic moments

of the respective parametrized measure μ_{x_k, y_k}. Using the 2N × (n + r) variables
listed in (5.2), we can express the functional J in the discrete form:
The constraints (4.4) form a set of linear matrix inequalities for every point in
Ω; hence they should keep the same form for every point (x_k, y_k) in the mesh (5.1).
So we have a set of 2N linear matrix inequalities expressed as
in (4.3), we use the following fact: Given any Jordan curve C inside the domain
Ω, the restriction (5.5) implies

where (x_0, y_0) and (x_f, y_f) are the two endpoints of the curve C.
We shall select a finite collection of M curves C_l with l = 1, . . . , M which, in
some sense, sweep the whole domain Ω. It will suffice that each point (x_k, y_k)
on the mesh belongs to at least one curve C_l. In order to impose the boundary
conditions in (4.3), every curve C_l must link two boundary points of Ω. So we
obtain a new set of M constraints in the form
We can see that the optimization problem (4.3) can be transformed into a single
semidefinite program after discretization. Note that the objective function J_d in
(5.3) is a linear function of the variables in (5.2). Those variables are restricted
by the set of 2N linear matrix inequalities given in (5.4) and the set of M linear
equations given in (5.6). Thus, we have obtained a single, very large semidefinite
program.
6 EXAMPLES
To illustrate the method proposed in this work, we will analyze the non-convex
variational problem

min_{m,p} ∫_{[-1,1]²} ( 1 − 2 m_2(x, y) + m_4(x, y) + p_2(x, y) ) dx dy

under the constraints

( ∂u/∂x, ∂u/∂y ) = ( m_1(x, y), p_1(x, y) ),
[ m_{i+j}(x, y) ]_{i,j=0}^2 ⪰ 0,   [ p_{i+j}(x, y) ]_{i,j=0}^1 ⪰ 0,

and the boundary conditions u|_∂Ω = g(x) with Ω = [−1, 1]².     (6.2)
In order to perform the discretization of this problem, we use the straight
lines with slope 1 crossing the square [−1, 1]². With them we can impose the
boundary conditions in the finite model. After solving the discrete model, not
to be exposed here, we obtain the optimal moments for (6.2), and the Young
measure solution for the generalized problem (6.1).
For the three cases studied, we obtain the following optimal parametrized
7 CONCLUDING REMARKS
The major contribution of this work is that it paves the way for studying non-
convex variational problems of the form (4.1). Indeed, the direct method of the
calculus of variations does not provide any answer for them if the integrand f
is not convex. See Dacorogna (1989). In addition, in this work we propose a
method for solving generalized problems like (4.2) when the integrand f has the
separable form described in (1.4). In fact, to the best knowledge of the author,
there exist no other proposals to analyze this kind of generalized problem in two
dimensions.
An important remark about this work is that we have reduced the original non-convex variational problem (4.1) to the optimization problem (4.3). In addition, the reader should note that Problem (4.3) is a convex problem because the objective function is linear and the feasible set is convex. That is a remarkable qualitative difference, since a numerical implementation of Problem (4.1) may provide wrong answers when the search algorithm stops at local minima, whereas a good implementation of Problem (4.3) should yield the global minimum of the problem.
404 OPTIMIZATION AND CONTROL WITH APPLICATIONS
Since we can pose Problem (4.3) as a single large-scale semidefinite program, we can use existing software for solving non-convex variational problems of the form (4.1) whenever the integrand f has the separable form (1.4). This situation prompts further research on large-scale semidefinite programming especially suited for generalized problems of the form (4.3).
We should also stress that, although the original non-convex variational problem (4.1) may not have a solution, its new formulation (4.3) always has one. In general, this solution is unique and provides information about the existence of minimizers for Problem (4.1). If Problem (4.1) has a unique minimizer ū(x, y), then Problem (4.3) provides the moments of the Dirac measures supported on the gradient of ū. Conversely, if the optimal measures of Problem (4.3) are Dirac measures supported on a field s̄(x, y), then Problem (4.1) has a unique minimizer ū(x, y) which satisfies ∇ū(x, y) = s̄(x, y).
One fundamental question we feel it is important to raise is whether the discrete model (5.3) is an adequate representation of the convex problem (4.3). From an analytical point of view, we need to find a particular qualitative feature of the solution of Problem (4.3), namely the Dirac mass condition on all optimal measures. So we can hope that even rough numerical models can provide us with the right qualitative answer about the existence of minimizers for the non-convex variational problem (4.1). This has actually been observed in many numerical experiments.
It is also quite remarkable that we can obtain a numerical answer to an analytical question. Indeed, we are clarifying the existence of minimizers of one particular variational problem by means of a numerical procedure. This point is crucial because no analytical method exists which allows this question to be settled when we are coping with general non-convex variational problems.
On the other hand, we really need a fine numerical model because the solution of Problem (4.3) contains the information about the oscillatory behavior of minimizing sequences of the non-convex problem (4.1). In those cases where Problem (4.1) lacks a solution, minimizing sequences show similar oscillatory behavior linked with important features in the physical realm. For example, in
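The oscillation of minimizing sequences has a classic one-dimensional illustration (a Bolza-type example, not taken from this chapter): minimize the integral of ((u')² − 1)² + u² over [0, 1] with u(0) = u(1) = 0. Sawtooth functions with slopes ±1 drive the energy to zero, yet no function attains it.

```python
def tri(s):
    # periodic triangle wave: slopes +/-1, tri(0) = 0, peak value 1/2
    f = s - int(s)
    return min(f, 1.0 - f)

def energy(n, m=20000):
    # midpoint rule for J(u_n) = integral of ((u_n')^2 - 1)^2 + u_n^2 dx,
    # where u_n(x) = tri(n*x)/n is a sawtooth with n teeth and slopes +/-1,
    # so the gradient term vanishes almost everywhere and only u^2 remains
    h = 1.0 / m
    total = 0.0
    for k in range(m):
        x = (k + 0.5) * h
        u = tri(n * x) / n
        total += u * u * h
    return total

energies = [energy(n) for n in (1, 2, 4, 8)]   # ~ 1/(12 n^2): infimum is 0
```

The infimum 0 is not attained: the limit of the sawtooth profiles is u ≡ 0, whose gradient term equals 1. This is precisely the nonexistence/oscillation phenomenon that the moment formulation detects.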
REFERENCES 405
Acknowledgments
The author wishes to thank Serge Prudhomme and Juan C. Vera for their comments and suggestions on this paper.
References
1 INTRODUCTION
F is said to be monotone on K if
⟨F(x*), x − x*⟩ ≥ 0,  ∀x ∈ K;
then
It is easy to check that they also coincide with the solutions to SVI(F,K). Hence, we can analyse the stability of the solutions to SVI(F,K) also with respect to the dynamical system GPDS(F,K,a).
3 STABILITY ANALYSIS
We have seen that LPDS(F,K) and GPDS(F,K,a) have the same stationary
points, but, in general, their solutions are different. In this section we analyse
the stability of these equilibrium points, namely we wish to know the behaviour
of the solutions, of LPDS(F,K) and GPDS(F,K,a) respectively, which start near
an equilibrium point. Since we are mainly focused on the stability issue, we
can assume the property of existence and uniqueness of solutions to the Cauchy
problems corresponding to locally and globally projected dynamical systems.
In the following, B(x*, r) denotes the open ball with center x* and radius r. Now we recall some definitions on stability.
x* is called stable if for any ε > 0 there exists δ > 0 such that every solution x(t) with x(0) ∈ B(x*, δ) ∩ K satisfies x(t) ∈ B(x*, ε) for all t ≥ 0; x* is said to be asymptotically stable if x* is stable and lim_{t→+∞} x(t) = x* for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K; x* is said to be globally asymptotically stable if it is stable and lim_{t→+∞} x(t) = x* for every solution x(t) with x(0) ∈ K.
We recall also that x* is called a monotone attractor if there exists δ > 0 such that, for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K, the Euclidean distance between x(t) and x*, that is, ||x(t) − x*||, is a nonincreasing function of t; x* is said to be a strictly monotone attractor if ||x(t) − x*|| is decreasing to zero in t. Moreover, x* is a (strictly) global monotone attractor if the same properties hold for any solution x(t) such that x(0) ∈ K.
Finally, x* is a finite-time attractor if there is δ > 0 such that, for every solution x(t) with x(0) ∈ B(x*, δ) ∩ K, there exists some T < +∞ such that x(t) = x* for all t ≥ T.
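These notions are easy to observe numerically. The sketch below is a hypothetical instance, not from the paper: K = [0, 1]², F(x) = x (strongly monotone) and stationary point x* = 0. It integrates LPDS(F,K) with a projected Euler scheme; the recorded Euclidean distance to x* is nonincreasing, which is the defining property of a monotone attractor.

```python
def proj_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box K = [lo, hi]^n
    return [min(max(xi, lo), hi) for xi in x]

def F(x):
    # hypothetical strongly monotone vector field with x* = (0, 0)
    return x

def lpds_step(x, h=0.05):
    # one projected (explicit) Euler step for the locally projected
    # dynamical system; the projection keeps the iterate inside K
    return proj_box([xi - h * fi for xi, fi in zip(x, F(x))])

x = [0.9, 0.4]
dists = []
for _ in range(200):
    dists.append(sum(xi * xi for xi in x) ** 0.5)  # distance to x* = 0
    x = lpds_step(x)
```

Along this particular trajectory the projection is never active (the iterates stay inside K), so the run only illustrates the monotone decrease of ||x(t) − x*||; boundary effects would require a field pointing out of K.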
We remark that under the assumptions of Theorem 3.2, the vector field F is the gradient map of a real convex function on K.
Now we go back to the monotone attractors. When the vector field F is continuous, there is a further connection between locally projected dynamical systems and variational inequalities: the global monotone attractors of LPDS(F,K) are equivalent to the solutions of the Minty variational inequality MVI(F,K) (see Pappalardo et al. (2002)).
We remark that Theorem 3.3 does not hold for globally projected dynamical systems: the following example (see Pappalardo et al. (2002)) shows that the solutions to MVI(F,K) are not necessarily monotone attractors of GPDS(F,K,a), even if the vector field F is continuous on K.
x* is globally exponentially stable if (3.1) holds for all solutions x(t) such that x(0) ∈ K.
The strong monotonicity and the Lipschitz continuity of F give the exponential stability of a stationary point of GPDS(F,K,a), provided that a is small enough (see Pappalardo et al. (2002)).
A result similar to Theorem 3.8 can also be proved for GPDS(F,K,a), provided that a is small enough (see Pappalardo et al. (2002)).
4 SPECIAL CASES
This section is devoted to the stability analysis in two special cases: when the
domain K is a convex polyhedron and when the vector field F is linear.
We remarked that the stability for a locally projected dynamical system is
generally different from that of a standard dynamical system; however, when K
is a convex polyhedron, many local stability properties for LPDS(F,K) follow on
that of a classical dynamical system in lower dimension, under suitable assump-
tion on the regularity of the stationary points of LPDS(F,K) (see Nagurney et
a1 (1995)).
We assume that K is specified by
is called the minimal face flow, and it is denoted by MFF(F,K,x*). Note that if F is locally Lipschitz continuous, then so is the right-hand side of MFF(F,K,x*); hence, for any z0 ∈ S(x*), there is a unique solution z0(t) to MFF(F,K,x*), defined in a neighborhood of 0, such that z0(0) = z0. Moreover, it is clear that 0 ∈ S(x*) is a stationary point of MFF(F,K,x*). The stability of 0 ∈ S(x*) for MFF(F,K,x*) assures the stability of x* for LPDS(F,K), under some regularity condition on x*, which we now introduce. Since x* solves the variational inequality SVI(F,K), we have
where ri N_K(x*) denotes the relative interior of N_K(x*). Note that any interior solution of SVI(F,K) is regular if we assume ri{0} = {0}; moreover, when x* is a solution of SVI(F,K) that lies on an (n−1)-dimensional face of K, it is regular if and only if F(x*) ≠ 0.
Now we show two stability results proved in Nagurney et al. (1995). First, a regular solution to SVI(F,K) has the strongest stability when it is an extreme point of K.
The stability results in the general case are summarized in the following
theorem.
assumption, the existence and uniqueness property for the solutions to the Cauchy problems associated with LPDS(F,K) and GPDS(F,K,a) holds for any closed convex domain K (see Dupuis et al. (1993) and Xia et al. (2000)).
We first remark that when the matrix A is positive definite, the local stability
properties obtained for LPDS(F,K), by Theorem 3.8, and for GPDS(F,K,a),
by Theorem 3.9, become global properties, as shown by the following result.
Proposition 4.2 Assume that K is a closed convex cone and F(x) = Ax. If A is a copositive matrix with respect to K, that is, ⟨x, Ax⟩ ≥ 0 for all x ∈ K, then x* = 0 is a global monotone attractor for LPDS(F,K). If A is strictly copositive with respect to K, that is, ⟨x, Ax⟩ > 0 for all x ∈ K, x ≠ 0, then x* = 0 is the unique stationary point for LPDS(F,K) and GPDS(F,K,a), and there exists an a_0 > 0 such that x* = 0 is a strictly global monotone attractor and globally exponentially stable for LPDS(F,K) and for GPDS(F,K,a), for any a < a_0.
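Copositivity with respect to K is strictly weaker than positive semidefiniteness, as a quick check with a hypothetical matrix shows: A = [[0, 1], [1, 0]] satisfies ⟨x, Ax⟩ = 2 x1 x2 ≥ 0 on the cone K = R²₊, yet it is indefinite on R².

```python
import random

A = [[0.0, 1.0], [1.0, 0.0]]

def quad(x):
    # <x, Ax> for the 2x2 matrix A above
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

rng = random.Random(1)
# copositive with respect to K = R_+^2: <x, Ax> >= 0 whenever x >= 0
copositive_on_K = all(quad([rng.random(), rng.random()]) >= 0.0
                      for _ in range(1000))
# ...but not positive semidefinite on all of R^2
indefinite = quad([1.0, -1.0]) < 0.0
```

So Proposition 4.2 applies to matrices A (and cones K) to which the usual positive (semi)definite theory does not.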
The stability analysis in the linear case is still open; future research might be carried out to study suitable conditions on the matrix A providing stability for any closed and convex domain. Also, it might be of interest to check whether, when F is an affine vector field and K = R^n_+, the classes of matrices needed for the study of existence and uniqueness of the solutions to the linear complementarity problem are sufficient to guarantee some stability properties.
References
1 INTRODUCTION
The research presented in this paper was motivated by optimal control problems
with ODE or PDE dynamics, whose discretized dynamics cannot be solved
explicitly. For example, consider a classical optimal control problem of the
form
where L_{∞,2}[0, 1] is a linear space whose elements are in L_∞[0, 1], but which uses the L_2[0, 1] norm, f(u) = F(x_u(1)), and x_u(t) ∈ R^n is the solution of the two-point boundary value problem
with the usual assumptions (see, e.g., Polak (1997), Ch. 4). If we discretize
the dynamics (1.2) by means of Euler's method, using a step-size 1/N, where
N > 0 is an integer, we get
where L_N is the space of functions taking values in R^m which are constant on the intervals [k/N, (k+1)/N), k = 0, 1, ..., N − 1, f_N(u) = F(x_u^N), and x_u^N is the solution of (1.3). Generally, (1.3) cannot be solved explicitly, and hence must be solved by some recursive technique, which we will call a "solver". Since only a finite number of iterations of the solver can be contemplated, in solving the problem (1.1) numerically we find ourselves dealing with two approximation parameters: N, which determines the Euler integration step-size, and, say, K, the number of iterations of the solver used to approximate x_u^N and hence also f_N(u). If we denote by x_{u,K}^N the result of K iterations of the solver in solving (1.3), we get a second-level approximating problem
where f_{N,K}(u) = F(x_{u,K}^N).
Note that while the function f_N(u) is continuously differentiable under standard assumptions, depending on the solver, the function f_{N,K}(u) may fail to
PROBLEMS WITH TWO NUMERICAL PRECISION PARAMETERS 425
be even continuous. Hence we may not assume that (1.5) is solvable by means of standard nonlinear programming type algorithms.
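The interplay of the two parameters can be mimicked with a deliberately simple instance (all of it hypothetical, not the paper's problem): scalar dynamics x' = −x + u(t), x(0) = 1, cost F(z) = z², implicit Euler with step 1/N, and a per-step fixed-point iteration truncated after K sweeps playing the role of the solver.

```python
def f_NK(u, N, K):
    # u[k] is the (piecewise constant) control on [k/N, (k+1)/N)
    h = 1.0 / N
    x = 1.0
    for k in range(N):
        # implicit Euler step: x_new = x + h * (-x_new + u[k]),
        # solved approximately by K fixed-point sweeps (the "solver")
        y = x
        for _ in range(K):
            y = x + h * (-y + u[k])
        x = y
    return x * x                      # f(u) = F(x_u(1)) with F(z) = z**2

N = 50
u = [0.0] * N
f_N = f_NK(u, N, 60)                  # K so large the solver has converged
errors = [abs(f_NK(u, N, K) - f_N) for K in (1, 2, 4, 8)]
```

For fixed N the solver error decays geometrically in K, while f_N itself carries the usual O(1/N) Euler discretization error; for small K the map u → f_{N,K}(u) inherits whatever roughness the solver has, which is exactly why no continuity can be assumed of it in general.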
An examination of the literature shows that efficient approaches to solving
infinite dimensional problems use "dynamic discretization," i.e., they start
out with low discretization precision and increase the precision progressively
as the computation proceeds. Referring to Polak (1997), we see that there are
essentially two distinct approaches to "dynamic" discretization, both of which
have been used only in situations with a single discretization parameter.
The first and oldest is that of algorithm implementation, see, e.g., Becker et al (2000); Betts et al (1998); Carter (1991); Carter (1993); Deuflhard (1974); Deuflhard (1975); Deuflhard (1991); Dunn et al (1983); Kelley et al (1991); Kelley et al (1999); Polak et al (1976); Mayne et al (1977); Sachs (1986). In this
approach, first one develops a conceptual algorithm for the original problem
and then a numerical implementation of this algorithm. In each iteration, the
numerical implementation adjusts the precision with which the function and
derivative values used by the conceptual algorithm are approximated so as to
ensure convergence to a stationary point of the original problem. When far
from a solution the approximate algorithms perform well at low precision, but
as a solution is approached, the demand for increased precision progressively
increases. Potentially, this approach is extendable to the case where two dis-
cretization parameters must be used.
The second, and more recent approach to dynamic discretization uses se-
quences of finite dimensional approximating problems, and is currently re-
stricted to problems with a single discretization parameter. It was formalized
in Polak (1993); Polak (1997), in the form of a theory of consistent approxi-
mations. Applications to optimal control are described in Schwartz (1996a);
Schwartz et al (1996), and a software package for optimal control, based on consistent approximations, can be obtained from Schwartz (1996b). Within this approach, an infinite dimensional problem, P, such as an optimal control problem with either ODE or PDE dynamics, is replaced by an infinite sequence of "nested", epi-converging finite dimensional problems {P_N}. Problem P is then solved by a recursive scheme which applies a nonlinear programming algorithm to problem P_N until a test is satisfied, at which point it proceeds to solve problem P_{N+1}, using the last point obtained for P_N as the initial point for the new calculation. In Polak (1997) we find a number of Algorithm Models
2 AN ALGORITHM MODEL
Finally, we assume that the exact evaluation of the functions f_N(·) and their gradients is not practical, and that an iterative "solver" must be used, with K iterations of the solver yielding an approximation f_{N,K}(u) to f_N(u), and, similarly, approximations ∇_K f_N(u) to ∇f_N(u) and θ_{N,K}(u) to θ_N(u). We will make no continuity assumptions on f_{N,K}(·), ∇_K f_N(·), or θ_{N,K}(·).
In response to the above assumptions, we will develop new algorithm models, with two precision parameters, for solving problems of the form P, by mimicking the one-precision-parameter Algorithm Model 3.3.17 in Polak (1997). Algorithm Model 3.3.17 in Polak (1997) assumes that the functions f_N(·), in
where λ(v) > 0 is the Armijo step-size. For convenience, we reproduce Algorithm Model 3.3.17 in Polak (1997) below.
Step 0. Set i = 0.
Step 1. Compute the smallest N_i, of the form 2^k N_{i−1}, k ∈ N, and v_{i+1} ∈ V_{N_i}, such that
v_{i+1} = A_{N_i}(v_i),  (2.5)
(i) for every bounded set B ⊂ V, there exist δ̄ < ∞ and a function Δ : N → R_+ such that lim_{N→∞} Δ(N) = 0, and for all N ∈ N, v ∈ V_N ∩ B,
|f_N(v) − f(v)| ≤ δ̄ Δ(N);  (2.7)
(ii) for every v* ∈ V such that θ(v*) ≠ 0, there exist ρ* > 0, δ* > 0, N* < ∞, such that
Making use of these definitions, we can now state the following scheme, based on the idea of algorithm implementation, for solving the problem P_N.
Step 0. Set i = 0.
Step 1. Compute the smallest K_i, of the form K_{i−1} + kK*, k ∈ N, and v_{i+1} ∈ V_N, such that
v_{i+1} = A_{N,K_i}(v_i),  (2.10)
and
f_{N,K_i}(v_{i+1}) − f_{N,K_i}(v_i) ≤ −σ φ(N, K_i)^ω.  (2.11)
(i) for every bounded set B ⊂ V_N, there exist κ̄ < ∞ and a function φ : N × N → R_+ such that lim_{K→∞} φ(N, K) = 0, and for all K ∈ N, v ∈ V_N ∩ B,
|f_{N,K}(v) − f_N(v)| ≤ κ̄ φ(N, K);  (2.12)
(ii) for every v* ∈ V_N such that θ_N(v*) ≠ 0, there exist ρ* > 0, δ* > 0, K* < ∞, such that
f_{N,K}(A_{N,K}(v)) − f_{N,K}(v) ≤ −δ*,  ∀v ∈ V_N ∩ B(v*, ρ*), ∀K ≥ K*.  (2.13)
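The inner loop of this scheme can be caricatured in a few lines (everything here is hypothetical: true objective f(v) = v², approximations f_K(v) = v²(1 + 2^(−K)) so that φ(K) = 2^(−K), σ = ω = 1, and a steepest-descent iteration map): K is raised in increments of K* until the sufficient-decrease test (2.11) holds, and the accepted precision automatically grows as v approaches the minimizer.

```python
def f_K(v, K):
    # approximate objective: true f(v) = v**2 with relative error 2**(-K)
    return v * v * (1.0 + 2.0 ** -K)

def A_K(v, K, lam=0.25):
    # iteration map: one steepest-descent step on the approximate gradient
    return v - lam * 2.0 * v * (1.0 + 2.0 ** -K)

sigma, omega, K_star = 1.0, 1.0, 2
v, K = 1.0, 1
ks = []
for _ in range(15):
    # Step 1: raise K until f_K(v+) - f_K(v) <= -sigma * phi(K)**omega
    while True:
        v_plus = A_K(v, K)
        if f_K(v_plus, K) - f_K(v, K) <= -sigma * (2.0 ** -K) ** omega:
            break
        K += K_star
    v = v_plus
    ks.append(K)
```

The recorded precision levels ks increase monotonically: near the start a coarse solver suffices, while close to the solution the test forces roughly two extra precision increments per iteration, which is the qualitative behavior the algorithm models are designed to exploit.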
balance the precision with which the problems P_N are solved versus the speed with which N is advanced.
At this point we must introduce some realistic assumptions. In particular, we assume that for every N, K ∈ N, we can construct an iteration map A_{N,K} : V_N → V_N, where K is the number of iterations of a solver.
(i) The function f(·) is continuous and bounded from below, and for all N ∈ N, the functions f_N(·) are continuous and bounded from below.
(ii) For every bounded set B ⊂ V, there exist δ̄ < ∞, a function K* : N → N, and functions φ : N × N → R_+, Δ : N → R_+ with the properties
lim_{N→∞} K*(N) = ∞,  (2.14)
lim_{K→∞} φ(N, K) = 0, ∀N ∈ N,  (2.15)
lim_{N→∞} φ(N, K_N) = 0, ∀K_N ≥ K*(N),  (2.16)
lim_{N→∞} Δ(N) = 0,  (2.17)
(iii) For every v* ∈ V such that θ(v*) < 0, there exist ρ* > 0, δ* > 0, N* > 0, K** < ∞, such that
Algorithm Model 2: Solves problem P.
Step 0. Set i = 0.
Step 4. If
f_{N_i,K_i}(A_{N_i,K_i}(v_i)) − f_{N_i,K_i}(v_i) > −Δ(N_i)^ω,  (2.23)
1/N_i, will be refined much faster when the Euler method is used for integration than when a Runge-Kutta method is used for integration. □
which shows that f_{N*,K*}(v_i) → −∞ as i → ∞. Now, it follows from (2.18) and (2.19) that for all i ≥ i_1,
(b) If f(·) is strictly convex, with bounded level sets, and {v_i}_{i=0}^∞ is a sequence constructed by Algorithm Model 2 in solving the problem P, then {v_i}_{i=0}^∞ converges to the unique solution of P.
f_{N_i,K_i}(A_{N_i,K_i}(v_i)) − f_{N_i,K_i}(v_i) ≤ −δ̂,  ∀v_i ∈ B(v̂, ρ̂), ∀N_i ≥ N̂.
Next we note that, in view of (2.18) and (2.19), for any v ∈ V_N,
Finally, let i_1 ≥ i_0 be such that N_i ≥ N̂ for all i ≥ i_1. Then, for the subsequence {v_{i_j}}_{j=0}^∞, with i_j ≥ i_1,
Hence we see that the sequence {f(v_i)}_{i=i_1}^∞ is monotone decreasing and, therefore, because f(·) is continuous, it must converge to f(v̂). Since this is contradicted by (2.31), our proof is complete.
(b) Since a strictly convex function with bounded level sets has exactly one stationary point, the desired result follows from (a) and the fact that {f(v_i)}_{i=i_1}^∞ is monotone decreasing. □
Remark 2.2 The following Algorithm Model differs from Algorithm Model 2 in two respects: first, the integer K is never reset and hence increases monotonically, and second, the test for increasing N is based on the magnitude of the approximate optimality function value. As a result, the proof of its convergence is substantially simpler than that for Algorithm Model 2. However, convergence can be established only for the diagonal subsequence {v_{i_j}}_{j=0}^∞ at which N_i is doubled. □
Algorithm Model 3: Solves problem P.
Data. N_0 ∈ N, v_0 ∈ V_{N_0}.
Step 3. If
(i) The optimality functions θ(·) and θ_N(·) are continuous for all N ∈ N.
(ii) For every bounded set B ⊂ V, there exist κ̄ < ∞, a function K* : N → N, and functions φ : N × N → R_+, Δ : N → R_+ satisfying (2.15)-(2.17), such that for all N ∈ N, v ∈ V_N ∩ B,
Theorem 2.4 Suppose that Assumptions and are satisfied and that {v_i*} is a sequence constructed by Algorithm Model 3 in solving the problem P.
(a) If {v_i*} is finite, then the sequence {v_i}_{i=0}^∞ has no accumulation points.
(b) If {v_i*} is infinite, then every accumulation point v* of {v_i*}_{i=0}^∞ satisfies θ(v*) = 0.
(c) If f(·) is strictly convex, with bounded level sets, and {v_i*}_{i=0}^∞ is a bounded sequence constructed by Algorithm Model 3 in solving the problem P, then it converges to the unique solution of P.
Proof. (a) Suppose that the sequence {v_i*} is finite and that the sequence {v_i}_{i=0}^∞ has an accumulation point v*. Then there exist an i_0, an N* < ∞, and an ε* > 0, such that for all i ≥ i_0, N_i = N*, ε_i = ε*, and θ_{N*,K_i}(v_i) < −ε*. But, in this case, for i ≥ i_0, the Inner Loop of Algorithm Model 2 is recognized as being of the form of Master Algorithm Model 1.2.36 in Polak (1997). It now follows from Theorem 1.2.37 in Polak (1997) that K_i → ∞ as i → ∞, and that θ_{N*}(v*) = 0. Next, it follows from (2.38), in Assumption , and the continuity of θ_{N*}(·), that for some infinite subsequence {v_{i_j}}, θ_{N*,K_{i_j}}(v_{i_j}) → θ_{N*}(v*) = 0, which shows that (2.36) could not be violated an infinite number of times, a contradiction.
(b) When the sequence {v_i*} is infinite, it follows directly from Assumption 2 and the test (2.36) that if v* is an accumulation point of {v_i*}, then θ(v*) = 0.
(c) When the function f(·) is strictly convex, with bounded level sets, it has a unique minimizer v* which is the only point in V satisfying θ(v*) = 0. Hence the desired result follows from (b). □
It can be shown that it has one and only one solution, which depends continuously upon the data a. Note that u is a nonlinear function of a.
The problem is discretized by the finite element method of degree one (Ciarlet (1977)) on triangles, combined with a domain decomposition strategy whose purpose is to have a finer mesh in desired regions without having to touch the rest of the domain. All linear systems are solved with the Gauss factorization method.
The airfoil is made of two parts, a main airfoil S_m and an auxiliary airfoil S_a, below and slightly behind the main one. To apply Domain Decomposition we need to partition the physical domain Ω as a union Ω_1 ∪ Ω_2 of two sub-domains with a non-empty intersection. This is done by surrounding the auxiliary airfoil by a domain Ω_2 outside S_m and with boundary ∂Ω_2 = Γ_2 ∪ S_a, and by taking
∫ (ω² u_2 v − ∇u_2 · ∇v) + ∫ (ωa(u_2 + u_a) − g) v = 0,  ∀v ∈ H¹(Ω_2)  (3.4)
where Π_h is the interpolation operator from one mesh to the other. The approximate solution after K iterations of the Schwarz algorithm is defined to be

u_{h,K} = u_h^i in Ω_i \ (Ω_1 ∩ Ω_2),  and  u_{h,K} = (u_h^1 + u_h^2)/2 in Ω_1 ∩ Ω_2.  (3.5)
However, for compatibility with the theory in this paper, we determine the mesh size h = 1/N and the number of Schwarz iterations K as required in Algorithm Model 2, and we do not use z^1 − z^2 to determine the number of Schwarz iterations.
The convergence of the Schwarz algorithm is known only for compatible meshes, i.e. meshes of Ω_1 and Ω_2 identical in Ω_1 ∩ Ω_2.
δf = 2 ∫ u δu  (3.6)
∇f(a) = −ω u p  (3.10)
where u_{h,K} and p_{h,K} are computed by K iterations of the Schwarz algorithm. Details of the validity of this calculus-of-variations computation can be found in Lions (1968).
for some C ∈ (0, ∞), which implies that we can set φ(h, K) = (1 − ρ)^K. Note that in this case φ(h, K) is actually independent of h. In view of this, we can take K*(h) = C ceil(1/h), where C > 0 is arbitrary.
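The claim that φ(h, K) = C(1 − ρ)^K with a rate independent of h has a transparent one-dimensional analogue (a sketch with exact subdomain solves, so no mesh enters at all): alternating Schwarz for u'' = 0 on [0, 1] with overlapping subdomains [0, b] and [a, 1] contracts the interface error by the fixed factor (a/b)(1 − b)/(1 − a) per sweep, here 4/9.

```python
def schwarz_sweep(v_b, a=0.4, b=0.6):
    # one alternating Schwarz sweep for u'' = 0 on [0,1], u(0)=0, u(1)=1:
    # solve on [0,b] with u(b)=v_b (the solution is linear), trace at a,
    v_a = v_b * a / b
    # then solve on [a,1] with u(a)=v_a, u(1)=1; return the new trace at b
    return v_a + (b - a) * (1.0 - v_a) / (1.0 - a)

v_b, errs = 0.0, []
for _ in range(10):
    errs.append(abs(v_b - 0.6))     # exact interface value: u(b) = b = 0.6
    v_b = schwarz_sweep(v_b)

ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
```

With compatible meshes a discretized version behaves the same way, which is what makes φ(h, K) usable as the solver-precision bound in the algorithm models above.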
(iv) It follows from the properties of the method of steepest descent that, given any v* ∈ V = L_2(0, 1) such that ∇f(v*) ≠ 0, there exist a ρ* > 0, a δ* > 0, a λ*, and an h* > 0, such that for all v ∈ V ∩ B(v*, ρ*), (i) ∇f_h(v) ≠ 0 and (ii)
where λ(v) is the exact step-size computed by the Steepest Descent Algorithm.
To show that there exist an h* > 0 and an N** < ∞, such that for all h < h*, N ≥ N**, and v ∈ V_h ∩ B(v*, ρ*),
f_{h,N}(v − λ(v)∇f_h(v)) − f_{h,N}(v) ≤ −δ*/2,  (3.18)
we make use of the facts that (a)
(b) By inspection, the bound functions Δ(h), φ(h, K), and K*(h) have the required properties.
4 CONCLUSIONS
Figure 3.2 ct versus distance to the leading edge on the two sides of each airfoil.
Figure 3.3 History of the convergence of the cost function for the coating problem. The method with mesh refinement and adapted Schwarz iteration number (green curve) is compared with a straight steepest descent method (red curve) and a steepest descent with mesh refinement only and DDM up to convergence (blue curve). The objective function increases whenever the mesh is refined.
with PDE dynamics having two precision parameters, the step size and an iter-
ation loop count in the solver. Our numerical results show that our algorithms
are effective. The numerical study was done using the method of steepest de-
scent but the models and the proofs are general and are likely to work also with
Newton methods, conjugate gradient methods, etc.
min_{u∈U} f(u)
P_N : min_{v∈V_N} f_N(v)  (5.2)
(c) We say that the problem-optimality function pairs {P_N, θ_N} are consistent approximations to the problem-optimality function pair {P, θ}, if the P_N epi-converge to P and, for every infinite sequence {u_N}, such that u_N ∈ U_N and u_N → u ∈ U, lim sup_{N→∞} θ_N(u_N) ≤ θ(u).⁴ □
The reason for introducing optimality functions into the definition of consistency of approximations is that it enables us to ensure that not only do global optimal solutions of the problems P_N converge to global optimal solutions of P, but also local optimal solutions converge to either local solutions or stationary points.
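Epi-convergence of approximating problems can be watched on a toy pair (entirely hypothetical: f(u) = (u − 1)² on U = [0, 2], and f_N(u) = f(u) + u²/N on the grid U_N of mesh 1/N): the grid minimizers of the P_N approach the minimizer of P at the rate of the mesh.

```python
def f(u):
    # limit problem P: minimize f over U = [0, 2]; unique minimizer u = 1
    return (u - 1.0) ** 2

def f_N(u, N):
    # approximating problem P_N: consistency error of size O(1/N)
    return (u - 1.0) ** 2 + u * u / N

def solve_PN(N):
    # U_N: uniform grid of mesh 1/N on [0, 2]; exhaustive minimization
    grid = [k / N for k in range(2 * N + 1)]
    return min(grid, key=lambda u: f_N(u, N))

solutions = [solve_PN(N) for N in (4, 16, 64, 256)]
gaps = [abs(s - 1.0) for s in solutions]   # distance to the minimizer of P
```

Here the grid minimizers happen to converge to the global minimizer; the point of the optimality-function machinery in the text is that, for nonconvex problems, such limits are at least guaranteed to be stationary points.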
Acknowledgments
This work was supported in part by the National Science Foundation under Grant
No. ECS-9900985 and by the Institut Universitaire de France.
Notes
1. Please refer to the Appendix or Polak (1997) for the definitions of optimality functions and consistent approximations.
2. The epigraphs of f_N, restricted to U_N, converge to the epigraph of f, restricted to U, in the Painlevé-Kuratowski sense.
3. When optimality functions are properly constructed, their zeros are standard stationary points; for examples see Polak (1997).
4. Note that this property ensures that the limit point of a converging sequence of approximate stationary points for the P_N must be a stationary point for P.
References
Becker, R., Kapp, H., and Rannacher, R. (2000), Adaptive finite element meth-
ods for optimal control of partial differential equations: basic concept, SIAM
J. Control and Optimization, Vol. 39, No. 1, pp. 113-132.
Bernardi, D., Hecht, F., Otsuka, K., Pironneau, O. (1999), freefem+, a finite element software to handle several meshes. Downloadable from
ftp://ftp.ann.jussieu.fr/pub/soft/pironneau/.
Cessenat M. (1998), Mathematical Methods in Electromagnetism, World Sci-
entific, River Edge, NJ.
Betts, J. T. and Huffman, W. P. (1998), Mesh refinement in direct transcription methods for optimal control, Optim. Control Appl., Vol. 19, pp. 1-21.
Carter, R. G. (1991), On the global convergence of trust region algorithms using
inexact gradient information, SIAM J. Numer. Anal., Vol. 28, pp. 251-265.
Carter, R. G. (1993), Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information, SIAM J. Sci. Comput., Vol. 14, No. 2, pp. 368-388.
Ciarlet, P.G. (1977), The Finite Element Method, Prentice Hall.
Deuflhard, P. (1974), A modified Newton method for the solution of ill-conditioned systems of nonlinear equations with application to multiple shooting, Numerische Mathematik, Vol. 22, No. 4, pp. 289-315.
Deuflhard, P. (1975), A relaxation strategy for the modified Newton method.
Optimization and Optimal Control, Proc. Conference on Optimization and
Optimal Control, Oberwolfach, West Germany, 17-23 Nov. 1974, Eds. Bu-
lirsch, R.; Oettli, W.; Stoer, J., Springer-Verlag, Berlin, p.59-73.
Deuflhard, P. (1991), Global inexact Newton methods for very large scale nonlinear problems, Impact of Computing in Science and Engineering, Vol. 3, pp. 366-393.
Dunn, J. C., and Sachs, E. W. (1983), The effect of perturbations on the convergence rates of optimization algorithms, Applied Math. and Optimization, Vol. 10, pp. 143-147.
Kelley, C. T. and Sachs, E. W. (1991), Fast algorithms for compact fixed point problems with inexact function evaluations, SIAM J. Sci. Statist. Comput., Vol. 12, pp. 725-742.
Kelley, C. T. and Sachs, E. W. (1999), A Trust Region Method for Parabolic
Boundary Control Problems, SIAM J. Optim., Vol. 9, pp. 1064-1081.
Lions, J.L. (1968), Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles, Dunod; Gauthier-Villars.
Mayne D. Q., and Polak E. (1977), A Feasible Directions Algorithm for Optimal
Control Problems with Terminal Inequality Constraints, IEEE Transactions
on Automatic Control, Vol. AC-22, No. 5, pp. 741-751.
Pironneau O., Polak E. (2002), Consistent Approximations and Approximate
Functions and Gradients In Optimal Control, J. SIAM Control and Opti-
mization, Vol 41, pp.487-510.
Polak E., and Mayne D. Q. (1976), An Algorithm for Optimization Problems
with Functional Inequality Constraints, IEEE Transactions on Automatic
Control, Vol. AC-21, No. 2.
Polak, E. (1993), On the use of consistent approximations in the solution of semi-infinite optimization and optimal control problems, Mathematical Programming, Series B, Vol. 62, No. 2, pp. 385-414.
Polak, E. (1997), Optimization: Algorithms and Consistent Approximations, Springer-Verlag, New York.
Sachs, E. (1986), Rates of convergence for adaptive Newton methods, JOTA, Vol. 48, No. 1, pp. 175-190.
Schwartz, A. L. (1996a), Theory and Implementation of Numerical Methods Based on Runge-Kutta Integration for Solving Optimal Control Problems, Ph.D. Dissertation, University of California, Berkeley.
Schwartz, A. L. (1996b), RIOTS The Most Powerful Optimal Control Problem
Solver. Available from http://www.accesscom.com/ adam/RIOTS/
Schwartz, A. L., and Polak, E. (1996), Consistent approximations for optimal control problems based on Runge-Kutta integration, SIAM Journal on Control and Optimization, Vol. 34, No. 4, pp. 1235-1269.
21 NUMERICAL SOLUTIONS OF
OPTIMAL SWITCHING CONTROL
PROBLEMS
T. Ruby and V. Rehbock
1 INTRODUCTION
numerical results, this can sometimes lead to better objective function values.
Note, though, that neither of these algorithms can guarantee a globally optimal solution, due to the combinatorial aspect of having to choose an optimal sequence of dynamical systems.
2 PROBLEM FORMULATION
Suppose that we have a total of M given dynamical systems, Ω1, Ω2, ..., ΩM, defined on the time horizon [0, T]. Each of these may be invoked over any subinterval of the time horizon. For i = 1, 2, ..., M, let the i-th candidate system be defined by a set of first order ordinary differential equations, i.e.
3 SOLUTION STRATEGY
i.e. the ordered sequence [1, 2, ..., M] is repeated N + 1 times within W_L. Then, we consider the following system on [0, T],
where T_L = [τ1, τ2, ..., τ_{L−1}] must, of course, satisfy
(b) The gradient of the cost functional with respect to T_L is not continuous (see Teo et al. (1991));
(c) When τ_{i−1} and τ_i coalesce, the number of decision variables changes.
To complete the application of the CPET to Problem (P_N), note that, in the new time scale, the system dynamics may be conveniently rewritten as
where we define x̃(s) = x(t(s)), f̃^j(s, x̃(s)) = f^j(t(s), x(t(s))), and t(s) is the solution of (3.5). Furthermore, the objective functional is transformed to
where g̃(s, x̃(s)) = g(t(s), x(t(s))). Finally, we define Problem (P̃_N): Given N, find a u ∈ U such that the cost functional (3.11) is minimized subject to the dynamics (3.5), (3.8)-(3.10) and subject to the constraint (3.7).
Note that Problems (P_N) and (P̃_N) are equivalent. In Problem (P̃_N), rather than finding τ1, τ2, ..., τ_{L−1}, we look for u ∈ U. All switching points of the original problem are mapped onto the set of integers in chronological order. Piecewise integration can now be performed easily, since all points of discontinuity of the dynamics in the s-domain are known and fixed. Moreover, u ∈ U is a piecewise constant function, and hence Problem (P̃_N) is readily solvable by the optimal control software MISER3 (see Jennings et al. (1991), Jennings et al. (2001)), which is an implementation of the control parametrization technique (see Teo et al. (1991)). Note that the piecewise integration of (3.8) is performed automatically in MISER3. The continuity constraints (3.10) are also satisfied automatically when executing the code in standard mode.
The solution of (3.5) yields t(s), so the state trajectory x(t) of the
original problem defined on [0, T] can be reconstructed easily. It is clear from
Assumption 1 that there exists an integer N such that an optimal solution to
Problem (P̃_N) is also an optimal solution of the original Problem (P). We
must note, though, that MISER3 uses a gradient based optimization approach
and we are therefore not guaranteed of finding a globally optimal solution to
Problem (P̃_N) and therefore Problem (P).
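The time-scaling idea behind the CPET can be sketched in a few lines. The following is an illustrative toy (not MISER3 and not the paper's problem): two hypothetical scalar subsystems, a fixed repeated switching sequence, and the durations spent in each subsystem as decision variables. In the scaled time s, subinterval [i-1, i] always hosts the i-th entry of the sequence, so the discontinuity points are known and fixed, and durations shrinking to zero simply remove a subinterval without changing the number of decision variables.

```python
# Illustrative CPET sketch: optimize switching durations for a fixed,
# repeated subsystem sequence.  All dynamics and costs are toy assumptions.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

T = 2.0                                    # time horizon
subsystems = [lambda x: x, lambda x: -x]   # two hypothetical scalar dynamics
sequence = [0, 1, 0, 1]                    # fixed ordering, repeated N+1 times

def cost(durations, x0=1.0):
    """Piecewise integration over unit s-intervals: dx/ds = theta_i * f_i(x)."""
    x = x0
    for theta, idx in zip(durations, sequence):
        f = subsystems[idx]
        sol = solve_ivp(lambda s, y: [theta * f(y[0])], (0.0, 1.0), [x])
        x = sol.y[0, -1]
    return x ** 2                          # terminal cost x(T)^2

n = len(sequence)
theta0 = np.full(n, T / n)                 # initial guess: equal durations
res = minimize(cost, theta0,
               bounds=[(0.0, T)] * n,
               constraints={'type': 'eq', 'fun': lambda th: th.sum() - T})
```

Here the gradient-based solver (SLSQP) works on a cost that is smooth in the durations, which is exactly the continuity property that motivates the CPET.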
We consider the numerical example given in Liu and Teo (2000). In this prob-
lem, there are 3 candidate dynamical systems and the time horizon is [0, 2]. We
have
The transformed problem may then be written as: Minimize the cost functional
We note from Table 4.1 that the proposed method will generate local solutions
depending on the choice of initial guess. For the first initial guess listed in Table
4.1, we obtain a locally optimal solution which involves one switching and is
identical to that produced in Liu and Teo (2000) (where at most one switching
OPTIMAL SWITCHING CONTROL PROBLEMS 457
                          Case 1       Case 2
Initial Sequence          {Ω1}         {Ω1, Ω2, Ω3, Ω1, Ω2, Ω3}
Initial Switching Times   no switch    0.1, 0.3, 0.8, 1.1, 1.5
Optimal Sequence          {Ω1, Ω2}     {Ω1, Ω3, Ω1, Ω2}
Optimal Switching Times   1.03463      0.501, 0.797, 0.990
Optimal Cost              0.003611     0.003204

Table 4.1  Result for N = 1.
was allowed). However, as the second line in the table shows, a different initial
guess produces quite a different solution with 3 switches and a significantly
lower cost. Virtually all other initial guesses we tried resulted in one of these
two solutions.
Next we tried increasing N to N = 3 in order to see if there are more
optimal solutions if we allow more switchings. Again, many initial guesses
were tested, with most of these leading to a solution with optimal switching
sequence {Ω1, Ω3, Ω1, Ω2, Ω1, Ω2} and corresponding switching times 0.50156,
0.82533, 0.91906, 0.97595, and 1.03862. The optimal cost value was 0.003170.
Again, we did obtain a slightly worse local optimal solution with one of the
initial guesses tested.
A further increase to N = 7 did not yield any other solutions with a lower
cost value, so the one obtained with N = 3 appears to be optimal, i.e. our
estimate for the optimal number of switches is 5.
5 CONCLUSIONS
References
Abstract: This paper is concerned with state feedback controller design us-
ing neural networks for the nonlinear optimal regulator problem. The nonlinear
optimal feedback control law can be synthesized by solving the Hamilton-Jacobi
equation with three-layered neural networks. The Hamilton-Jacobi equation
generates the value function by which the optimal feedback law is synthesized.
To obtain an approximate solution of the Hamilton-Jacobi equation, we solve
an optimization problem by the gradient method, which determines the connection
weights and thresholds in the neural networks. The gradient functions are calcu-
lated explicitly by the Lagrange multiplier method and used in the learning
algorithm of the networks. We also propose a device such that an approximate
solution to the Hamilton-Jacobi equation converges to the true value function.
The effectiveness of the proposed method was confirmed with simulations for
various plants.
1 INTRODUCTION
This paper is concerned with optimal state feedback control of nonlinear sys-
tems. To solve the nonlinear optimal regulator problem, we solve the Hamilton-
Jacobi equation using neural networks and then synthesize the optimal state
feedback control law with its approximate solution.
Most studies on optimal control of nonlinear systems have been made by appli-
cation of the calculus of variations. They are aimed at calculating the optimal
control input u⁰(t), t ∈ [0, t1], and the corresponding optimal trajectory x⁰(t),
t ∈ [0, t1], starting from an initial condition x(0). This yields the so-called
open-loop control, but a practically interesting matter is to obtain an optimal
state feedback control law u⁰(t) = α(x(t)), t ∈ [0, t1], which brings us a closed-loop
system.
As is well known, the optimal regulator is a typical control problem in which a
linear system and a quadratic performance functional are considered and the
Riccati equation plays an important role. The reason for attaching importance
to the optimal regulator is that it offers a systematic method to design a state
feedback control law, and consequently one can construct a closed-loop control
system.
In contrast, it is very hard to design such an optimal state feedback controller
for nonlinear systems. To synthesize the optimal state feedback control law
resulting in a stable closed-loop system, one must solve the Hamilton-Jacobi
partial differential equation (H-J equation).
However, the nonlinear optimal regulator is of restricted use since it is extremely
difficult to solve the H-J equation analytically. Hence we need to develop approxi-
mate solutions of the H-J equation.
In the past, several approaches to solving the H-J equation have been proposed.
With the Taylor series expansion, one can obtain an accurate approximate
solution around an operating point. However, it is difficult to approximate
uniformly over a broad range.
The principle of neural network approximation for the H-J equation is as explained
NONLINEAR OPTIMAL REGULATOR 463
subj. to  ẋ(t) = f(x(t), u(t)),   x(0) = x0
where x(t) ∈ R^n and u(t) ∈ R^r are the state vector and the control vector,
respectively.
We assume the following:
Assumption 2.1 System (2.1b) is stabilizable in the sense that for any initial
condition x(0) there exists a control law u(·) such that x(t) → 0 as t → ∞.
0 = min_u { q(x) + u^T R u + V_x(x) f(x, u) }    (2.2)
where H(x, u, V_x(x)) is the Hamiltonian function
holds for u satisfying (2.2). Therefore the optimal control must satisfy the
following partial differential equations.
ẋ = f(x) + G(x) u    (2.10)
From (2.6)
If we can obtain V(x) satisfying this equation, then the optimal state feedback
control law u(x) is given by (2.12).
where y ∈ R^n and z ∈ R^q are the output and the internal state of the neural
network, respectively, and W1 ∈ R^{q×n}, W2 ∈ R^{n×q}, θ ∈ R^q and a ∈ R^n are the
connection weight matrices, the threshold and the constant, respectively.
Further, σ : R^q → R^q denotes the sigmoid function, and we use hyperbolic
tangent functions as the sigmoid function σ(z), i.e.,
Here x^p denotes an element of the set A = {x^p | x^p ∈ Ω, p = 1, 2, ..., P}, where
Ω ⊂ R^n is a subregion of the state space and A is the discretized set of Ω.
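A minimal numpy sketch of the three-layer value-function network described above may help fix ideas. It assumes the structure V^N(x) = ||W2(σ(W1 x + θ) − σ(θ))||² that appears later in (4.8c)-(4.8d); subtracting σ(θ) enforces V(0) = 0, and the squared norm enforces V ≥ 0. All sizes and weight values here are illustrative assumptions, not data from the paper.

```python
# Three-layer tanh network for the value function (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n, q = 3, 20                      # state dimension and middle-layer size (assumed)
W1 = rng.uniform(-0.5, 0.5, (q, n))
W2 = rng.uniform(-0.5, 0.5, (n, q))
theta = rng.uniform(-0.5, 0.5, q)

def sigma(z):
    """Hyperbolic tangent sigmoid applied componentwise."""
    return np.tanh(z)

def value(x):
    z = W1 @ x + theta            # internal state z in R^q
    y = W2 @ (sigma(z) - sigma(theta))
    return float(y @ y)           # V(x) = y^T y >= 0, with V(0) = 0
```

Any gradient-based learning rule then only has to drive the H-J residual of this parametric family to zero on the lattice points x^p.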
The learning problem of the neural network is formulated as the following opti-
mization problem.
min_{W1, W2, θ}  E[W1, W2, θ] = Σ_{p=1}^P |e(x^p)|²
subj. to  z^p = W1 x^p + θ    (3.6b)
For learning of the network we need concrete expressions of the gradients of the per-
formance function with respect to the connection weights, etc., i.e., ∇_{W1} E[W1, W2, θ],
Then we have
Next, to calculate the gradients of |e(x)|², let us define the Lagrangian L with
Lagrange multipliers λ ∈ R^n, μ ∈ R^q, γ ∈ R^n.
By the chain rule of derivatives and the formulae¹ for gradients and the symmetry
of ∇σ(z), the partial derivatives of the Lagrangian L with respect to each variable
are calculated as follows.
where x ⊗ y and x · y denote the tensor product and the inner product of arrays
x ∈ X and y ∈ Y, respectively, and ∇²σ(z) ∈ R^{q×q×q} is the second order
derivative array. From (3.16)-(3.21), the variables λ, μ, γ are obtained as
In order to get the optimal feedback law satisfying (2.7)--(2.9) by solving the
optimal regulator problem (2.1), we must approximate the value function V(x)
and the state feedback control law u ( x ) with separate neural networks. Hence
we use for V(x) the same neural network used in the affine nonlinear case.
Note here that e2 ∈ R^r. For simplicity, letting W = {W1, W2, W3, W4} and
Θ = {θ1, θ2}, we define the performance function E[W, Θ] for learning as fol-
lows.
subj. to  z^p_1 = W1 x^p + θ1
then
Then, in a similar manner to the previous case [A], concrete expressions of the
gradients are obtained.
Calculating the partial derivatives of the Lagrangian L with respect to each variable
and noticing that ∇_{Wi}{|e1(x)|² + ||e2(x)||²} = ∇_{Wi} L and ∇_{θi}{|e1(x)|² +
||e2(x)||²} = ∇_{θi} L, we can obtain the gradients of the performance function (3.42)
with respect to the connection weight matrices Wi and thresholds θi as follows.
Using these gradients, we can execute the steepest descent method (α > 0)
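The learning loop above can be sketched end to end. For transparency this sketch takes the gradients by central finite differences rather than the paper's Lagrange-multiplier formulas, and it uses a toy problem of my own choosing (ẋ = u, cost ∫(x² + u²)dt, R = 1, whose true value function is V(x) = x²) so that the H-J residual e(x) = x² − 0.25 V_x² is easy to write down; a simple step-halving safeguard keeps the descent from overshooting.

```python
# Steepest descent on E = sum_p |e(x_p)|^2 for a toy scalar H-J residual.
import numpy as np

rng = np.random.default_rng(1)
q = 5
params = rng.uniform(-0.5, 0.5, 3 * q)       # flattened [W1, W2, theta]
grid = np.linspace(-1.0, 1.0, 21)            # discretized learning domain

def value_fn(p, x):
    W1, W2, th = p[:q], p[q:2 * q], p[2 * q:]
    y = W2 @ (np.tanh(W1 * x + th) - np.tanh(th))
    return y ** 2                            # scalar network output, V(0) = 0

def E(p):
    h, tot = 1e-5, 0.0
    for x in grid:
        Vx = (value_fn(p, x + h) - value_fn(p, x - h)) / (2 * h)
        e = x ** 2 - 0.25 * Vx ** 2          # residual for xdot = u, q(x) = x^2
        tot += e ** 2
    return tot

def grad(p, h=1e-6):
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p); d[i] = h
        g[i] = (E(p + d) - E(p - d)) / (2 * h)
    return g

alpha = 0.1
losses = [E(params)]
for _ in range(30):
    step = alpha * grad(params)
    while np.abs(step).max() > 1e-12 and E(params - step) > losses[-1]:
        step *= 0.5                          # halve the step if loss would rise
    params = params - step
    losses.append(E(params))
```

The analytic gradients of Section 3 would replace `grad` with exact expressions, which is what makes the method practical for larger networks.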
Since it does not necessarily follow that the partial differential equations (2.7)-(2.9)
possess a unique solution, an arbitrary solution of the H-J equation is not al-
ways the true value function. This difficulty is caused by the fact that the
H-J equation is only a necessary condition for optimality. In general it is very
hard to prove that an approximate solution of the H-J equation converges to
the true value function V(x). However, we can improve the possibility that
the approximate solution converges to the value function by making a device on the
learning of the networks.
It is not guaranteed that V^N(x) obtained from the learning problem (3.6) or
(3.43) coincides with the value function V(x) of the performance functional
(2.1c). In fact, it sometimes happens that V^N(x) ≠ V(x). This is caused by
the fact that, in general, the solution to the H-J equation is not unique. Accordingly, we
need a device of learning such that V^N(x) converges to the true value function
V(x).
For simplicity, let us assume there exists u = u⁰(x, V_x(x)) satisfying (2.9)
globally around (x, u) = (0, 0). Substitute this into (2.7) to get
The solution to the H-J equation (4.1) is not unique because the H-J equation
is only a necessary condition for optimality. The value function of problem
(2.1) certainly satisfies the H-J equation (4.1), but there may exist other
solutions. Let us denote the value function (2.4) by V⁰(x) and
distinguish it from any other solution V(x).
The minimum solution among the semi-positive definite solutions to the H-J equa-
tion (4.1) coincides with V⁰(x). But asymptotic stability of the closed-loop
system is not guaranteed by implementing the optimal control law u⁰(x, V⁰_x(x)).
If there exists a solution V(x) of the H-J equation such that ẋ = f(x, u⁰(x,
V_x(x))) becomes asymptotically stable, then we call it the stabilizing solution,
denoted by V⁻(x). The following lemma gives a condition under which V⁻(x) exists
uniquely [Kucera (1972), Schaft (1996)].
does not possess pure imaginary eigenvalues and {A, B} is stabilizable, where
A = f_x(0, 0), B = f_u(0, 0), 2Q = q_xx(0). Then the H-J equation possesses
the unique stabilizing solution V⁻(x).
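The eigenvalue condition in the lemma can be checked numerically for a given linearization. The matrices below are an illustrative stabilizable and observable example of my own (a double integrator), not data from the paper; A and B play the roles of f_x(0,0) and f_u(0,0), and Q, R the quadratic weights.

```python
# Check that the standard Hamiltonian matrix of the associated Riccati
# equation has no purely imaginary eigenvalues (illustrative example).
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (assumed example)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

Ham = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
                [-Q, -A.T]])
eigs = np.linalg.eigvals(Ham)
min_real = np.abs(eigs.real).min()        # > 0 means no imaginary-axis eigenvalues
```

When this minimum is strictly positive and {A, B} is stabilizable, the unique stabilizing solution of the quadratic approximation exists, which is what the lemma exploits.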
As this lemma holds, it is known from the uniqueness of the stabilizing solu-
tion that V⁻(x) becomes equal to the minimum performance function for the
stabilizing optimal control problem, that is,
min_u { ∫_0^∞ ( q(x) + u^T R u ) dt | ẋ = f(x, u), x(0) = x0, lim_{t→∞} x(t) = 0 }    (4.2)
At this time the stabilizing optimal control law is given by u⁰(x, V⁻_x(x)). Fur-
ther, it can easily be shown [Schaft (1996)] that V⁻(x) is the maximum solution
of the H-J equation.
Now, V⁰(x) was the minimum among the semi-positive definite solutions of the
H-J equation (4.1), while V⁻(x) is the maximum one. Hence if there exists
more than one semi-positive definite solution, then V⁰(x) ≠ V⁻(x). However, if
the semi-positive definite solution is unique, then V⁰(x) = V⁻(x).
The uniqueness of the semi-positive definite solution is guaranteed by assuming de-
tectability.
Assumption 4.1 It holds for the solutions of (2.1b) that lim_{t→∞} x(t) = 0
whenever lim_{t→∞} { q(x(t)) + u(t)^T R u(t) } = 0.
for the stabilizing solution V⁻(x) of the H-J equation (4.1). Further, since
V⁰(x) = V⁻(x), it holds that ∇²V⁰(0) = P⁻ also.
Now, from (4.2), V⁻(x) attains the minimum V⁻(0) = 0 and it holds that V⁻_x(0) = 0.
Meanwhile, V^N(x) satisfies V^N(0) = 0 and V^N_x(0) = 0. Thus, letting the relation
hold for V^N(x) as well as (4.4), we can make V^N(x) coincide with V⁻(x) =
V⁰(x) in the neighborhood of x = 0.
Here, we show the improvement of the learning algorithm for the affine nonlinear sys-
tem case. Note that we can apply the same approach to the general nonlinear
system case.
∇²V^N(0) can be calculated from (3.7) as
subj. to  z^p = W1 x^p + θ    (4.8b)
y(x^p) = W2 σ(z^p) − W2 σ(θ)    (4.8c)
V^N(x^p) = y(x^p)^T y(x^p)    (4.8d)
e(x^p) = q(x^p) + V^N_x(x^p) f(x^p)
  − (1/4) V^N_x(x^p) G(x^p) R^{-1} G(x^p)^T V^N_x(x^p)^T    (4.8e)
D = 2 W1^T ∇σ(θ) W2^T W2 ∇σ(θ) W1 − P⁻    (4.8f)
p = 1, 2, ..., P
5 SIMULATION RESULTS
We made computer simulations for the following example [Isidori (1989)]. Here,
let us consider a case where the method in [Goh (1993)] is difficult to apply
for learning.
min_u ∫_0^∞ ( x1² + x2² + x3² + u² ) dt
subj. to  ẋ1 = −x1 + e^{2x2} u
ẋ2 = 2 x1 x2 + sin(x2) + 0.5 u
ẋ3 = 2 x2
x(0) = x0
From (2.12) and (2.13), the H-J equation and the optimal control law u⁰(x, V_x(x))
become:
x1² + x2² + x3² − x1 V_{x1}(x) + (2 x1 x2 + sin x2) V_{x2}(x) + 2 x2 V_{x3}(x)
  − 0.25 { e^{4x2} V_{x1}²(x) + e^{2x2} V_{x1}(x) V_{x2}(x) + 0.25 V_{x2}²(x) } = 0    (5.2)
u⁰(x, V_x(x)) = −0.5 { e^{2x2} V_{x1}(x) + 0.5 V_{x2}(x) }
Since {A, b} and {√(2Q), A} are controllable and observable, respectively, the
assumption in Lemma 1 is satisfied. The Riccati equation for problem (5.4)
becomes
P A + A^T P − P b (2r)^{-1} b^T P + 2Q = 0,    (5.5)
and the stabilizing solution P⁻ is calculated as follows.
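The stabilizing solution P⁻ of (5.5) can be reproduced with SciPy. The matrices A and b below are the linearization of the example system at the origin, computed here rather than quoted from the paper; with q(x) = x1² + x2² + x3² and R = 1 we have 2Q = 2I and 2r = 2.

```python
# Stabilizing solution of the Riccati equation (5.5) via SciPy.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-1.0, 0.0, 0.0],    # Jacobian of the example dynamics at (0, 0)
              [0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0]])
b = np.array([[1.0], [0.5], [0.0]])

# solve_continuous_are solves A^T X + X A - X B R^-1 B^T X + Q = 0,
# which matches (5.5) with X = P, Q = 2I, and R = 2r = 2.
P_minus = solve_continuous_are(A, b, 2.0 * np.eye(3), np.array([[2.0]]))
residual = (P_minus @ A + A.T @ P_minus
            - P_minus @ b @ b.T @ P_minus / 2.0 + 2.0 * np.eye(3))
```

With V ≈ (1/2) x^T P⁻ x, the local feedback u = −(1/2) b^T P⁻ x then renders the closed-loop matrix A − (1/2) b b^T P⁻ asymptotically stable, consistent with the lemma.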
Since q(x) = x1² + x2² + x3² is positive definite, Assumption 4.1 is satisfied. Hence
the state variables become asymptotically stable under the optimal control law.
The number of middle-layer units of the network was taken to be 20. The learning domain
was set as Ω = {(x1, x2, x3) | −2 ≤ x1 ≤ 2, −1 ≤ x2 ≤ 1, −2 ≤ x3 ≤ 2} and was
discretized by an orthogonal lattice. The distance between adjoining lattice
points was set as Δx1 = 0.2, Δx2 = 0.1, Δx3 = 0.2. Initial values of W1, W2
and θ were given by random numbers between −0.5 and 0.5.
The results of optimal feedback control are presented in Figures 5.1 and 5.2 for
the initial state x(0) = (1.5, 1, 1.5). The value of the performance function
is 19.797. For comparison, optimal control in the open-loop style was computed
by a usual optimization algorithm [Shimizu (1994)] for the same initial
Figure 5.1 Optimal state feedback control by neural network (state variables)
Figure 5.2 Optimal state feedback control by neural network (control input)
Figure 5.3 Optimal control in open loop style (state variables)
Figure 5.4 Optimal control in open loop style (control input)
6 CONCLUSIONS
Notes
References
Beard, R.W., Saridis, G.N. and Wen, J.T. (1997), Galerkin Approximations
of the Generalized Hamilton-Jacobi-Bellman Equation, Automatica, Vol. 33,
No. 12, pp. 2159-2177.
Doya, K. (2000), Reinforcement Learning in Continuous Time and Space, Neural
Computation, Vol. 12, pp. 219-245.
Goh, C.J. (1993), On the Nonlinear Optimal Regulator Problem, Automatica,
Vol. 29, No. 3, pp. 751-756.
Isidori, A. (1989), Nonlinear Control Systems: An Introduction, Springer-Verlag.
Kucera, V. (1972), A Contribution to Matrix Quadratic Equations, IEEE Trans.
Automatic Control, pp. 344-347.
Lee, H.W.J., Teo, K.L. and Yan, W.Y. (1996), Nonlinear Optimal Feedback
Control Law for a Class of Nonlinear Systems, Neural Parallel & Scientific
Computations 4, pp. 157-178.
Lukes, D.L. (1969), Optimal Regulation of Nonlinear Dynamical Systems, SIAM
J. Control, Vol. 7, No. 1, pp. 75-100.
Saridis, G.N. and Balaram, J. (1986), Suboptimal Control for Nonlinear System,
Control Theory and Advanced Technology, Vol. 2, No. 3, pp. 547-562.
W.Q. Liu
School of Computing, Curtin University of Technology, WA 6102,
Australia. Email: [email protected]
1 INTRODUCTION
addition, we also investigate this result further when the orthogonal condition
defined in this paper is satisfied.
The paper is organized as follows: In section 2, we will present some pre-
liminary results. The main result will be given in section 3 and conclusions are
given in section 4.
2 PRELIMINARIES
2. The finite dynamic modes of the system (2.1) are the finite eigenvalues of
(E,A).
3. If all the finite dynamic modes lie in the open left half plane, then the
system (2.1) is said to be stable.
Lemma 2.1 Dai (1989) The regular pair (E, A) is impulse-free if and only if
Lemma 2.2 Dai (1989) The triple (E, A, C) is finite dynamics detectable and
impulse observable if and only if there exists a constant matrix L such that
(E, A + LC) is stable and impulse-free, or equivalently, admissible.
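Impulse-freeness of a regular pair (E, A) can be tested numerically through one standard equivalent criterion from Dai (1989): deg det(sE − A) = rank E. The sketch below recovers the degree by sampling det(sE − A) and fitting a polynomial; the 2×2 pairs are illustrative examples of my own.

```python
# Numerical impulse-freeness test: deg det(sE - A) == rank E.
import numpy as np

def charpoly_degree(E, A, tol=1e-8):
    n = E.shape[0]
    s = np.linspace(1.0, 2.0, n + 1)                 # n+1 sample points
    d = [np.linalg.det(si * E - A) for si in s]
    coeffs = np.polyfit(s, d, n)                     # highest power first
    nz = np.nonzero(np.abs(coeffs) > tol)[0]
    return n - nz[0] if nz.size else 0

E = np.array([[1.0, 0.0], [0.0, 0.0]])
A_free = np.array([[0.0, 1.0], [1.0, 1.0]])          # det(sE - A) = -s - 1
A_imp = np.array([[0.0, 1.0], [1.0, 0.0]])           # det(sE - A) = -1

rankE = np.linalg.matrix_rank(E)
```

Here (E, A_free) satisfies the criterion (degree 1 equals rank E) while (E, A_imp) does not, so the latter pair has impulsive modes.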
For the descriptor system (2.1), many researchers have devoted their efforts
to the output feedback H∞ control problems Takaba et al (1994); Masubuchi
et al (1997); Wang et al (1998). One important motivation is that the output
feedback can be realized easily. However, the corresponding solutions to H∞
control problems are much more complicated compared to the case of state
feedback control. Since the state of a system contains all the essential infor-
mation for the system, a controller based on state feedback can lead to more
effective control. In particular, the corresponding solution to the H∞ control
problems may become much more explicit Gao et al (1999); Wang et al (1998).
The problem with state feedback control is that not all the state variables are
available in practice, so we will choose to use a state observer in this paper
and investigate the corresponding state-feedback H∞ control problem for the
descriptor system within this framework. This problem lies between the output
feedback and static feedback control. As can be seen in the sequel, it is not a
trivial special case of output feedback control. By solving this problem for
singular systems, one can visualize the difference between descriptor
systems and normal systems more clearly.
As in the Full Information case Doyle et al (1989) for linear time-invariant
systems, it is also assumed here that the exogenous input signal w is always
available.
The next lemma gives a result on the design of state observer for singular
systems.
H∞ CONTROL BASED ON STATE OBSERVER FOR DESCRIPTOR SYSTEMS 485
Lemma 2.3 Dai (1989) Assume that (E, A, C2) is finite dynamics detectable.
Then the following dynamic system is a state observer for the system (2.1)
where the matrix L is such that (E, A + LC2) is stable and impulse-free.
In this paper, a controller based on the observer (2.2) is assumed to be in the
following form.
Remark 2.1 It should be noted that the state feedback controller given above
is different from the output feedback controller given by Masubuchi et al (1997).
In the controller (2.3), there are only two feedback parameters rather than
the four parameters present in (2.4). It can be seen later in this paper that the
results based on the controller (2.3) are much more explicit.
The next result is the basic result for H∞ control of singular systems. It
gives a necessary and sufficient condition for a singular system to be H∞ norm
bounded.
3 MAIN RESULTS
In order to consider the H∞ problem for the descriptor system (2.1), the fol-
lowing assumptions are made.
(A1) (E, A, C2) is finite dynamics detectable and impulse observable.
(A2) rank D12 = k.
The next lemma is an important result for the proof of our main result.
Lemma 3.1 Assume that all the following matrices have appropriate dimen-
sions. Let
Ψ(X : A, B, C) = A^T X + X^T A + C^T C + γ^{-2} X^T B B^T X.
Then
Ψ(X : A + B2 K, B1, C + D K) = Ψ(X : A, B1, B2, C, D)
  + { D [ K + (D^T D)^{-1} (B2^T X + D^T C) ] }^T { D [ K + (D^T D)^{-1} (B2^T X + D^T C) ] }
Proof
Ψ(X : A + B2 K, B1, C + D K) = Ψ(X : A, B1, B2, C, D) + Π
where
Π = K^T B2^T X + X^T B2 K + C^T D K + K^T D^T C + K^T D^T D K
    + (X^T B2 + C^T D)(D^T D)^{-1}(B2^T X + D^T C)
  = [ K^T + (X^T B2 + C^T D)(D^T D)^{-1} ](D^T D) K + K^T (B2^T X + D^T C)
    + (X^T B2 + C^T D)(D^T D)^{-1}(B2^T X + D^T C)
Therefore,
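The completion-of-squares identity of Lemma 3.1 can be checked numerically on random matrices; the dimensions and the value of γ below are arbitrary test choices, and the five-argument Ψ is built as the three-argument Ψ minus the completed-square correction, as in the lemma.

```python
# Numerical check of the completion-of-squares identity in Lemma 3.1.
import numpy as np

rng = np.random.default_rng(7)
n, m1, m2, p = 4, 2, 2, 3
gamma = 2.0
A = rng.standard_normal((n, n)); X = rng.standard_normal((n, n))
B1 = rng.standard_normal((n, m1)); B2 = rng.standard_normal((n, m2))
C = rng.standard_normal((p, n)); D = rng.standard_normal((p, m2))
K = rng.standard_normal((m2, n))

def Psi3(X, A, B, C):
    """Psi(X : A, B, C) = A^T X + X^T A + C^T C + gamma^-2 X^T B B^T X."""
    return A.T @ X + X.T @ A + C.T @ C + gamma ** -2 * X.T @ B @ B.T @ X

W_inv = np.linalg.inv(D.T @ D)
M = B2.T @ X + D.T @ C                       # recurring block B2^T X + D^T C
Psi5 = Psi3(X, A, B1, C) - M.T @ W_inv @ M   # five-argument Psi

lhs = Psi3(X, A + B2 @ K, B1, C + D @ K)
S = D @ (K + W_inv @ M)                      # D[K + (D^T D)^-1 (B2^T X + D^T C)]
rhs = Psi5 + S.T @ S
```

Agreement of `lhs` and `rhs` to machine precision confirms the algebra carried out in the proof above.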
Theorem 3.1 Suppose that assumptions (A1) and (A2) hold. Then for the
descriptor system (2.1), the following statements are equivalent.
(i) There exists a state-feedback controller (2.3) such that the resulting closed-
loop system is admissible and the H∞-norm of the transfer function matrix
T_zw(s) from w to z is strictly less than a prescribed positive number γ,
i.e.,
|| T_zw(s) ||_∞ < γ
with a constraint
E^T X = X^T E ≥ 0
When (ii) holds, the matrix K in the controller (2.3) can be constructed as
and L in the controller (2.3) satisfies the requirement that (E,A + LC2) is
admissible.
Let
Denote
C̄ = [ C1   C1 + D12 K ],
then the closed-loop system (3.5) can be written as
Ē ξ̇ = Ā ξ + B̄ w
z = C̄ ξ
where
From Lemma 2.4, there exists a solution X satisfying the following inequality
E^T X = X^T E ≥ 0
According to Lemma 3.1, one can obtain
With L and K chosen as above in the controller (2.3), the resulting closed-loop
system is the system (3.4) and its transfer function matrix from w to z is
In order to complete the proof, it suffices to prove that the system (3.4) is
admissible. Notice that
and (E,A + LC2) and (E, A + B 2 K ) are stable. Then the closed-loop system
is stable. Further,
Theorem 3.2 Suppose that assumptions (A1) and (A2') hold. Then the fol-
lowing statements are equivalent for system (2.1).
(i) There exists a controller (2.3) such that the resulting closed-loop system
is admissible and
|| T_zw(s) ||_∞ < γ
(ii) There exists a solution X satisfying the following GARI
with constraint
E^T X = X^T E ≥ 0.
When (ii) holds, the matrix K in the controller (2.3) can be constructed
as
K = −B2^T X    (3.12)
and L in the controller (2.3) is such that (E, A + LC2) is admissible.
In this section, the H∞ control problem based on state feedback is investi-
gated and a sufficient and necessary condition in terms of a GARI is obtained.
It should be noted that the GARI obtained here is much simpler than those
obtained in Masubuchi et al (1997) in two ways.
(i) Only one parameter is involved here. This greatly reduces the complex-
ity of solving the GARI.
(ii) Only one GARI is required.
This suggests that state feedback controller design is much easier if
the estimated state information is available. In this case, the algorithm for
solving the GARI with constraints proposed in Masubuchi et al (1997) can also
be significantly simplified. We will not discuss this simplification here since it
is technically trivial.
4 CONCLUSIONS
References
I. V. Konnov
1 INTRODUCTION
monotonicity are constant. Now we consider the case where the parameters (or
weights) are variable, thus extending the results from Konnov (2001); Allevi
et al (2001). Namely, we establish existence results for generalized vector variational
inequalities and for systems of generalized vector variational inequalities
in a topological vector space by employing new relative (pseudo)monotonicity
concepts for set-valued mappings.
Let I be the set of indexes {1, ..., m}. For each s ∈ I, let E_s be a real linear
topological space and U_s be a nonempty subset of E_s. Set
U = Π_{s∈I} U_s.
Let F be a real linear topological space with a partial order induced by a
convex, closed and solid cone C.
Set R^m_+ = { μ ∈ R^m | μ_i > 0, 1 ≤ i ≤ m }.
For each s ∈ I, let G_s : U → 2^{L(E_s, F)} be a mapping so that if we set
(u*_s)_{s∈I} ∈ U such that
Lemma 2.1 Suppose that the set U, defined by (2.1), is convex and that the
mapping G : U → 2^{L(E,F)}, defined by (2.2), is u-hemicontinuous. Then
DGVVI(I, U, G) implies GVVI(I, U, G).
Proposition 2.1 Suppose that the set U, defined by (2.1), is convex and that
the mapping G : U → 2^{L(E,F)}, defined by (2.2), is u-hemicontinuous and pseudo
(α, β)-monotone. Then GVVI(I, U, G), DGVVI(I, U, G), and SGVVI(I, U, G)
are equivalent.
Proposition 3.1 Suppose that the set U, defined by (2.1), is convex and that
γ∘G is u-hemicontinuous and pseudo (α, β)-monotone. Then GVVI(I, U, γ∘G)
is equivalent to SGVVI(I, U, G).
In what follows, we reserve the symbols α and β for the parameters associated
with relative (pseudo)monotonicity.
4 EXISTENCE RESULTS
In this section, in addition to the general assumptions, we shall suppose that the
space L(E, F) is topologized in such a manner that the mapping T : Z × U → F,
defined by T(g, u) = g(u), is continuous whenever Z is a compact subset of L(E, F).
As usual, for each set B ⊆ E, we denote by cl B its closure. First we establish
existence results for SGVVI(I, U, G).
Theorem 4.1 Let U be convex and compact. Suppose that G has nonempty
and compact values and is relatively pseudomonotone, and that α∘G is u-
hemicontinuous. Then SGVVI(I, U, G) is solvable.
and
B(v) = { u ∈ U | Σ_{s∈I} α_s(v) G_s(v)(v_s − u_s) ⊄ −int C }.
We divide the proof into the following three steps.
Let z be in the convex hull of any finite subset {v1, ..., vn} of K. Then z =
Σ_{j=1}^n μ_j v_j for some μ_j ≥ 0, j = 1, ..., n, with Σ_{j=1}^n μ_j = 1. If z ∉ ∪_{j=1}^n B(v_j), then for
all g_s ∈ G_s(z), s ∈ I, we have
DECOMPOSABLE GENERALIZED VECTOR VARIATIONAL INEQUALITIES 503
It follows that
(ii) ∩_{v∈U} A(v) ≠ ∅.
From the relative pseudomonotonicity of G it follows that B(v) ⊆ A(v); moreover, for
each v ∈ U, A(v) is closed. In fact, let {u^θ} be a net in A(v) such that u^θ
converges to ū ∈ U. Then, for each θ, there exist elements g_s^θ ∈ G_s(v), s ∈ I,
such that
g_s ∈ G_s(v) for each s ∈ I. It follows that g_s^θ(u_s^θ) → g_s(ū_s) for each s ∈ I and
we conclude that ū ∈ A(v), i.e. A(v) is closed. Therefore, cl B(v) ⊆ A(v) and (i)
now implies (ii).
Corollary 4.1 Suppose that G has nonempty and compact values and is rel-
atively pseudomonotone, and that α∘G is u-hemicontinuous. Suppose that U
is convex and closed and that there exist a compact subset V of E and a point
v̄ ∈ V ∩ U such that
Σ_{s∈I} β_s(u) G_s(u)(v̄_s − u_s) ⊆ −int C  for all u ∈ U\V.    (4.1)
Proof: In this case it suffices to follow the proof of Theorem 4.1 and observe
that cl B(v̄) ⊆ V under the above assumptions. Indeed, it follows that cl B(v̄) is
compact, hence the assertion of Step (i) will be true due to Proposition 3.2 as
well. □
The proof follows from Corollary 4.1 and the definition of pseudo (α, β)-
monotonicity.
By choosing different topologies, one can specify the above existence results for
less general classes of topological vector spaces. For example, in this section,
we specialize these results for a Banach space setting.
Namely, we suppose that E and F are real Banach spaces and that C is a
convex, closed and solid cone in F. We shall apply the weak topology in E,
the strong topology in F, and the strong operator topology in L(E, F). For
this reason, we need the concept of a completely continuous mapping.
Corollary 5.2 Let all the assumptions of Corollary 5.1 hold and let G be
pseudo (α, β)-monotone and u-hemicontinuous. Then GVVI(I, U, G) is solv-
able.
References
Allevi, E., Gnudi, A., and Konnov, I.V. (2001), Generalized vector variational
inequalities over product sets, Nonlinear Analysis: Theory, Methods & Applications.
1 INTRODUCTION
Suppose that there exists a subset Γ of A and a compact convex subset K of X
such that Γ is closed in X × Y and
Remark 1.1 Both Lemma 1.1 and 1.2 are special cases of Lemma 1.3 with
X = Y = K and Γ = {(x, x) : x ∈ X}.
In this paper, we first give a variation of the geometric lemma of Fan et al.,
and then, by applying the result, we prove an existence theorem of solutions for
the set-valued vector equilibrium problem.
2 PRELIMINARIES
Let us also mention the well-known fixed point theorem of Kakutani (1941).
Theorem 2.1 Let X be a nonempty compact convex subset of some finite di-
mensional space R^n and let the set-valued map F : X → 2^X have closed graph
and nonempty convex values. Then F has a fixed point x* ∈ F(x*).
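A toy one-dimensional illustration of Theorem 2.1 may be useful: the set-valued map F(x) = [max(0, 0.9 − x), min(1, 1.1 − x)] on X = [0, 1] (an arbitrary example of my own) has closed graph and nonempty convex interval values, so the theorem guarantees a fixed point x* ∈ F(x*), and a grid search locates the fixed-point set.

```python
# Grid search for fixed points x in F(x) of a convex-valued map on [0, 1].
import numpy as np

def F(x):
    """Interval-valued map with closed graph and convex values."""
    return (max(0.0, 0.9 - x), min(1.0, 1.1 - x))

grid = np.linspace(0.0, 1.0, 1001)
fixed = [x for x in grid if F(x)[0] <= x <= F(x)[1]]
```

For this map the fixed points fill the interval [0.45, 0.55]; Kakutani's theorem is what guarantees such a point exists before any search is attempted, and it is the tool used repeatedly in the proof below.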
Suppose that
Proof: Suppose to the contrary that for each x ∈ K, there exists y ∈ Y such
that x ∉ A⁻(y). Then, by (a), x ∉ cl A⁻(y') for some y' ∈ Y. Hence we have
K ⊆ ∪_{y∈Y} V_y, where V_y := {z ∈ X : z ∉ cl A⁻(y)}. By the compactness of
K, there exists a finite family {y1, ..., yn} of Y such that K ⊆ ∪_{i=1}^n V_{y_i}. Let
{β1, ..., βn} be a partition of unity on K subordinated to the finite covering
{V_{y_i} : i = 1, ..., n}, that is, β1, ..., βn are nonnegative real-valued continuous
functions on K such that each β_i vanishes on K\V_{y_i}, while Σ_{i=1}^n β_i(x) = 1 for
all x ∈ K. Let P := co {y1, ..., yn} ⊆ Y and define a continuous mapping
p : K → P by setting
Thus, for each i such that β_i(x) > 0, x lies in V_{y_i} ∩ K, so that (x, y_i) ∉ B, and
hence by (b) we have
On the other hand, by (c) there exists a finite subset {x1, ..., xn} of K such
that P ⊆ ∪_{i=1}^n Γ(x_i). Let A := co {x1, ..., xn} ⊆ K. Define a set-valued map
H : A → 2^A by H(x) := co {x_i : p(x) ∈ Γ(x_i)} for each x in A. Then each
H(x) is a nonempty closed convex subset of A. Moreover, H has a closed
GEOMETRIC LEMMA 513
graph in A × A. Indeed, let (v, w) ∈ A × A\Gr(H), i.e. w ∉ H(v). Then there
exists an open neighborhood V of w in A which is disjoint from H(v). Suppose
H(v) = co {x_i : i ∈ J} for some J ⊆ {1, ..., n}. Then p(v) ∉ Γ(x_j) for j ∉ J.
Therefore, by (d), U := p^{-1}( ∩_{j∉J} P\Γ(x_j) ) is an open neighborhood of v in
A. If z ∈ U, then p(z) ∉ Γ(x_j) for j ∉ J, and so H(z) ⊆ H(v). This implies
V ∩ H(z) = ∅ for all z ∈ U. Hence we have an open neighborhood U × V of
(v, w) which does not intersect the graph of H, that is, the graph Gr(H) is
closed.
Applying Kakutani's fixed point theorem, we have a point x̄ ∈ A such
that x̄ ∈ H(x̄). If H(x̄) = co {x_j : j ∈ J0} for some J0 ⊆ {1, ..., n}, then
p(x̄) ∈ Γ(x_j) ⊆ R(x_j) for every j ∈ J0, i.e. x_j ∈ R⁻[p(x̄)] for all j ∈ J0. Since
R⁻[p(x̄)] is convex by (e) and x̄ is a convex combination of {x_j : j ∈ J0}, we
have x̄ ∈ R⁻[p(x̄)]. And so (x̄, p(x̄)) ∈ Gr(R) ⊆ B, contradicting (3.1). The
theorem is proven.
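The partition-of-unity device used in the proof can be made concrete for a simple cover. Below, K = [0, 1] is covered by three open intervals (an assumed example), and the standard distance-quotient construction β_i(x) = dist(x, K\V_i) / Σ_j dist(x, K\V_j) yields continuous, nonnegative functions that vanish off their sets and sum to one.

```python
# Explicit partition of unity subordinated to a finite open cover of [0, 1].
import numpy as np

cover = [(-0.1, 0.5), (0.3, 0.8), (0.6, 1.1)]    # open intervals covering [0, 1]

def dist_to_complement(x, interval):
    a, b = interval
    # distance from x to the complement of (a, b); zero outside the interval
    return max(0.0, min(x - a, b - x))

def betas(x):
    d = np.array([dist_to_complement(x, I) for I in cover])
    return d / d.sum()                           # nonnegative, sums to one

xs = np.linspace(0.0, 1.0, 101)
vals = np.array([betas(x) for x in xs])
```

The mapping p in the proof is then simply p(x) = Σ_i β_i(x) y_i, a continuous map into the polytope co {y1, ..., yn}.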
Remark 3.1 The condition (b) in Theorem 3.1 may be replaced by the equiv-
alent condition (b') below.
to show that x belongs to one of the B⁻(y_i)'s. Assume to the contrary that
x ∉ B⁻(y_i) for all i = 1, ..., n. It follows from (b) that (x, λ1 y1 + ... + λn yn) ∉ B,
which is a contradiction. Therefore, x ∈ ∪_{i=1}^n B⁻(y_i) ∩ K. This establishes the
implication (b) ⇒ (b').
(b') ⇒ (b): let (x, y_i) ∉ B for i = 1, ..., n with x ∈ K and let z = λ1 y1 +
... + λn yn be an arbitrary convex combination of the y_i's. We need to show that
(x, z) ∉ B. Suppose otherwise that (x, z) ∈ B. Then x ∈ K ∩ B⁻(z), and so
by (b') we have x ∈ ∪_{i=1}^n B⁻(y_i) ∩ K. This implies that for some 1 ≤ j ≤ n,
(x, y_j) ∈ B, which is a contradiction. The implication (b') ⇒ (b) is established.
Thus, the theorem below is equivalent to Theorem 3.1.
Suppose that
Based on the result of Theorem 3.1, we are now ready to prove an existence
theorem for the set-valued vector equilibrium problem.
Proof: If we can show that all the conditions (a)-(e) of Theorem 3.1 are
satisfied, then we may invoke the theorem to conclude the existence of an
equilibrium point x0 ∈ K. To this end, we define A := {(x, y) ∈ X × Y :
Φ(x, y) ⊆ P} and B := A. It follows from (i) and (ii) that conditions (a) and (b)
in Theorem 3.1 are satisfied. Also, it is clear that (iii) implies Gr(R) ⊆ B ⊆ A,
(iv) implies (e), and (v) implies (d) of Theorem 3.1. It remains to show that
condition (c) of Theorem 3.1 is satisfied. Indeed, let Q be a polytope of Y. For
each y ∈ Q, by condition (vi) we know that there exists an open neighborhood
N(y) of y such that M(y) := ∩_{v∈N(y)} Γ⁻(v) ≠ ∅. Since Q is compact, there
exists a finite family {y1, ..., yr} ⊆ Y such that Q ⊆ ∪_{i=1}^r N(y_i). With each
M(y_i) ≠ ∅, we may take a point x_i ∈ M(y_i); consequently N(y_i) ⊆ Γ(x_i) and
so Q ⊆ ∪_{i=1}^r Γ(x_i). Thus, condition (c) is satisfied. We can now conclude from
Theorem 3.1 the existence of an equilibrium point x0 ∈ K. The theorem is
proven.
Remark 4.1 Condition (ii) is satisfied when for every x ∈ K the set-valued
map Φ(x, ·) : Y → 2^W has the following P-proper quasiconvexity property
(see Kuroiwa (1996)).
Acknowledgments
This work was supported by the Research Committee of The Hong Kong Polytechnic
University.
References
D.I.M.E.T.
Università di Reggio Calabria
Via Graziella, Loc. Feo di Vito
89100 Reggio Calabria - ITALIA
idone@ing.unirc.it
1 INTRODUCTION
Many equilibrium problems arising from various fields of science may be ex-
pressed, under general conditions, in a unified way:
[C(u)u − F(u, ∇u)][A(u, ∇u) − ψ] = 0
A(u, ∇u) ≤ ψ
u ≥ φ
u = φ on ∂Ω.   (1.1)
It is well known that the Variational Inequality theory, which in general expresses the equilibrium
conditions (1.1), provides a powerful methodology that in recent years
has been improved by studying the connections with the Separation Theory,
Gap Functions, the Lagrangean Theory and Duality, and many related computational procedures.
In the present paper we show that various models of elasto-plastic torsion
satisfy the structure (1.1) and that the equilibrium conditions (1.1) can be
expressed in terms of a Variational Inequality.
Let Ω be an open bounded Lipschitz domain with boundary Γ = ∂Ω; for
the sake of simplicity, we confine ourselves to the case Ω ⊂ ℝ².
Let K be the closed convex nonempty subset of H₀¹(Ω):
where:
(α, λ_1, λ_2) ∈ C* = {(α, λ_1, λ_2) : α, λ_1, λ_2 ∈ L²(Ω), α, λ_1, λ_2 ≥ 0 a.e. in Ω}.
Taking into account that the convex set K satisfies the constraint qualification
condition introduced in Borwein et al. (1991), namely that the "quasi-relative
interior of K is non empty", which replaces the standard Slater condition in the
infinite dimensional case, following Daniele (1999) (see also Maugeri (1998)) it
is possible to show the following Lemma:
Lemma 2.1 There exist (ᾱ, λ̄_1, λ̄_2) ∈ C* such that

∀v ∈ H₀¹(Ω), ∀(α, λ_1, λ_2) ∈ C*. Moreover, L(u, ᾱ, λ̄_1, λ̄_2) = 0.
⟨Lu, v⟩ = a(u, v) − ∫_Ω f v dx, ∀v ∈ H₀¹(Ω).
By means of Lemma 2.1, we can prove the following result:
EQUILIBRIUM PROBLEMS 523
Theorem 2.1 Let u be a solution to problem (2.2) and Lu the operator defined by (2.5). Then u fulfills the conditions:
From L(u, ᾱ, λ̄_1, λ̄_2) = 0 it follows:

⟨Lu − ᾱ − ∂λ̄_1/∂x_1 − ∂λ̄_2/∂x_2, v − u⟩ ≥ 0.
Taking into account (2.7), we find that the solution to the Variational In-
equality (2.2) verifies the conditions (2.6):
(see Brézis (1972), Chiadò et al. (1994), Lanchon (1969), Ting (1969) for similar
results).
This is a particular case of the general scheme (1.1).
In Brézis et al. (1977) the authors point out the convenience of studying
problems of the general scheme (1.1) in a convex set in which an upper
bound for the gradient of u is given:
This case is studied in Idone et al. (2002), in which a similar characterization of (1.1) is proved. In particular the convex set

K̄ = {v ∈ H₀¹(Ω) : v ≥ 0, v(x) ≤ δ(x)},  δ(x) = dist(x, ∂Ω),

is considered and, under further assumptions, the following characterization is
shown:

(L(u) − f − μ̄)(u(x) − δ(x)) = 0
L(u) − f − μ̄ ≤ 0
u(x) ≤ δ(x).
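A minimal numerical sketch of this obstacle-type characterization, in a one-dimensional analogue (not taken from the paper): we solve −u'' = f on Ω = (0, 1) with u = 0 on ∂Ω, under the constraint u(x) ≤ δ(x) = dist(x, ∂Ω), by projected Gauss-Seidel. The grid size, the load f and the tolerances are illustrative choices.

```python
# Illustrative 1-D obstacle problem: -u'' = f, u(0) = u(1) = 0, u <= delta.
n = 41                          # grid points, including the boundary
h = 1.0 / (n - 1)
f = 20.0                        # constant load (illustrative)
xs = [i * h for i in range(n)]
delta = [min(xi, 1.0 - xi) for xi in xs]   # obstacle: distance to boundary
u = [0.0] * n                   # homogeneous boundary values

for sweep in range(20000):
    change = 0.0
    for i in range(1, n - 1):
        # unconstrained Gauss-Seidel update for -u'' = f ...
        unew = 0.5 * (u[i - 1] + u[i + 1] + h * h * f)
        unew = min(unew, delta[i])         # ... projected onto u <= delta
        change = max(change, abs(unew - u[i]))
        u[i] = unew
    if change < 1e-12:
        break

# complementarity-type conditions of the characterization:
# u <= delta everywhere, and -u'' - f = 0 wherever the constraint is inactive
assert all(u[i] <= delta[i] + 1e-12 for i in range(n))
for i in range(1, n - 1):
    if u[i] < delta[i] - 1e-8:
        res = -(u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f
        assert abs(res) < 1e-6
```

The projection step enforces feasibility at every sweep, so the discrete equation holds exactly only on the inactive set, mirroring the complementarity condition above.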
It is also possible to prove

Lu − div(λ̄ grad u) = 0.
References
Borwein, J.M. and Lewis, A.S. (1991), Practical conditions for Fenchel duality
in Infinite Dimension, Pitman Research Notes in Mathematics Series, 252,
pp. 83-89.
Brézis, H. (1972), Multiplicateur de Lagrange en torsion élasto-plastique, Arch.
Rational Mech. Anal., 49, pp. 32-40.
Brézis, H. and Stampacchia, G. (1977), Remarks on some fourth order variational inequality, Ann. Scuola Norm. Sup. Pisa (4), pp. 363-371.
Brézis, H. (1972), Problèmes Unilatéraux, J. Math. Pures et Appl., 51, pp. 1-168.
Chiadò, V. and Percivale, D. (1994), Generalized Lagrange Multipliers in Elasto-plastic torsion, Journal of Differential Equations, 114, pp. 570-579.
Daniele, P. (1999), Lagrangean function for dynamic Variational Inequalities,
Rendiconti del Circolo Matematico di Palermo, 58, pp. 101-119.
Idone, G., Variational inequalities and application to a continuum model of
transportation network with capacity constraints, to appear.
Idone, G., Maugeri, A. and Vitanza, C. (2002), Equilibrium problems in Elastic-Plastic Torsion, Boundary Elements 24th, Brebbia C.A., Tadeu A., Popov
V. Eds., WIT Press, Southampton, Boston, pp. 611-616.
REFERENCES 527
1 INTRODUCTION
The gap function approach for Variational Inequalities (VI for short) has
made it possible to develop a wide class of descent methods for solving the classic VI
defined by the following problem:
find y* ∈ K s.t. ⟨F(y*), x − y*⟩ ≥ 0, ∀x ∈ K,

where F : ℝⁿ → ℝⁿ, and p : ℝⁿ → ℝ is a non-negative function on K
such that p(y) = 0 with y ∈ K if and only if y is a solution of VI. Therefore
solving a VI is equivalent to the (global) minimization of the gap function on
K.
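As a concrete illustration of this definition, the sketch below evaluates the classical gap function p(y) = max_{x∈K} ⟨F(y), y − x⟩ (often attributed to Auslender), which on a box K admits a closed-form maximizer. The operator F, the box bounds and the test points are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: the classical gap function on a box K = [lo, hi]^n.
def F(y):
    # illustrative strongly monotone affine operator; since F vanishes at
    # (0.3, 0.7), which lies in K, that point solves the VI
    return [y[0] - 0.3, y[1] - 0.7]

def gap(y, lo=0.0, hi=1.0):
    g = F(y)
    total = 0.0
    for yi, gi in zip(y, g):
        # max over x_i in [lo, hi] of gi * (yi - x_i):
        # take x_i = lo if gi > 0, else x_i = hi
        x_best = lo if gi > 0 else hi
        total += gi * (yi - x_best)
    return total

assert abs(gap([0.3, 0.7])) < 1e-12      # p vanishes at the VI solution
assert gap([0.9, 0.1]) > 0.0             # and is positive elsewhere on K
```

This p is non-negative on K and vanishes exactly at solutions, but it is in general nondifferentiable, which motivates the regularized gap functions discussed next.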
In recent years the efforts of scholars have been directed to the study
of differentiable gap functions, in order to simplify the computational aspects of
the problem. See Harker et al. (1990) for a survey on the theory and algorithms
developed for VI.
The problem of defining a continuously differentiable gap function was first
solved by Fukushima (1992), whose approach was generalized by Zhu et al.
(1994); they proved that

where x = x(t), t ≥ 0.
Moreover some algorithmic applications have been developed in the field of
bundle methods for solving VI (see e.g. Lemaréchal et al. (1995)).
In this paper, we will deepen the analysis of descent methods for VI*, initiated by Mastroeni (1999). In particular, we will define an inexact line-search
algorithm for the minimization of a gap function associated to the problem
VI*.
In Section 2 we will recall the main properties of the gap functions related
to VI*. In Section 3 we will develop an inexact descent method for VI*, in the
hypothesis of strong monotonicity of the operator F. Section 4 will be devoted
to a brief outline of the applications of the Minty Variational Inequality and to the
recently introduced extension to the vector case (Giannessi (1998)).
We recall the main notations and definitions that will be used in the sequel.
A function f : ℝⁿ → ℝ is said to be quasi-convex on the convex set K iff:

f(λx_1 + (1 − λ)x_2) ≤ max{f(x_1), f(x_2)}, ∀x_1, x_2 ∈ K, ∀λ ∈ [0, 1].   (1.1)

A function f : K → ℝ is said to be strictly quasi-convex iff strict inequality holds
in (1.1), for every x_1 ≠ x_2 and every λ ∈ (0, 1).
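A quick numeric illustration of these definitions: f(x) = √|x| is strictly quasi-convex on ℝ (hence in particular quasi-convex) although it is not convex. The sample grid below is an arbitrary illustrative choice.

```python
# Hedged numeric check of the (strict) quasi-convexity inequality (1.1)
# for f(x) = sqrt(|x|) on a sample grid of R.
import math

def f(x):
    return math.sqrt(abs(x))

pts = [i / 10.0 for i in range(-30, 31)]      # grid on [-3, 3]
lams = [j / 10.0 for j in range(1, 10)]       # lambda in (0, 1)

for x1 in pts:
    for x2 in pts:
        if x1 == x2:
            continue
        for lam in lams:
            z = lam * x1 + (1 - lam) * x2
            # strict form of (1.1) for x1 != x2 and lam in (0, 1)
            assert f(z) < max(f(x1), f(x2)) + 1e-12

# f is not convex: on the chord from 0 to 1, f(0.25) = 0.5 > 0.25
assert f(0.25) > 0.75 * f(0.0) + 0.25 * f(1.0) + 1e-6
```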
This last definition was given by Ponstein (1967). Different definitions
of strict quasi-convexity can be found in the literature (see e.g. Karamardian
(1967)); for a deeper analysis of this topic see Avriel et al. (1981) and references
therein.
A strictly quasi-convex function has the following properties (Thomson et
al. (1973)):
(i) f is quasi-convex on K,
In this section, we will briefly recall the main results concerning the gap function
theory for VI* (Mastroeni (1999)). Following the analysis developed for the
classic VI, we introduce the gap function associated to VI*.
H(x, x) = 0, ∀x ∈ K; (2.1)
Proposition 2.1 Let K be a convex set in ℝⁿ. Suppose that H : ℝⁿ × ℝⁿ → ℝ
is a non-negative, differentiable function on K that fulfils (2.1) and (2.2), and
that F : ℝⁿ → ℝⁿ is a differentiable and pseudomonotone operator on K.
Then

⟨∇_y[q(x*, x*) + H(x*, x*)], y − x*⟩ ≥ 0, ∀y ∈ K,
Since ∇_y q(x, y) = F(y) + ∇F(y)(y − x), then ∇_y q(x*, x*) = F(x*), which
implies that x* is a solution of VI. By the pseudomonotonicity of F, we obtain
that x* is also a solution of VI*.
Now suppose that x* is a solution of VI*. Since H(x, y) is non-negative, we
have that

⟨F(y), y − x*⟩ + H(x*, y) ≥ 0, ∀y ∈ K,

which is equivalent to the condition
h(x) = − inf_{y∈K} ψ(x, y)
Since ψ(x, y) is strictly quasi-convex with respect to y, there exists a
unique minimum point y(x) of problem (2.3). Applying Theorem 4.3.3 of
Bank et al. (1983) (see the Appendix), we obtain that y(x) is u.s.c. according
GAP FUNCTIONS 535
From the continuity of F, y(x) and ∇_x H, it follows that h′(x) is continuous at
x, so that h is continuously differentiable and
In the previous section we have shown that, under suitable assumptions on the
operator F and the function H, the gap function associated to the variational
inequality VI*:
is continuously differentiable on K.
This remarkable property allows us to define descent direction methods for
solving the problem

min_{x∈K} h(x).
3. F is a continuously differentiable operator on an open set A ⊃ K;

where ψ : ℝⁿ → ℝ is nonnegative, continuously differentiable and such that
ψ(0) = 0.
Lemma 3.1 Suppose that hypotheses 1-4 hold and, furthermore, ∇F(y) is
a positive definite matrix, ∀y ∈ K. Let y(x) be the solution of P(x). Then x*
is a solution of VI* iff x* = y(x*).
The next result proves that y(x) − x provides a descent direction for h at the
point x, when x ≠ x*.
Proposition 3.1 Suppose that hypotheses 1-4 hold and F is strongly monotone
on K (with modulus μ > 0). Let y(x) be the solution of the problem P(x) and
d(x) := y(x) − x. Then
The following exact line search algorithm has been proposed by Mastroeni
(1999):
Algorithm 1
Theorem 3.1 Suppose that hypotheses 1-4 hold and ∇F(y) is positive definite, ∀y ∈ K. Then, for any x_0 ∈ K, the sequence {x_k} defined by Algorithm 1
belongs to the set K and converges to the solution of the variational inequality
VI*.
Proof: Since ∇F(y) is positive definite ∀y ∈ K, and F is continuously differentiable, then F is a strictly monotone operator (Ortega et al. (1970), Theorem
5.4.3) and therefore both problems VI and VI* have the same unique solution.
The convexity of K implies that the sequence {x_k} ⊂ K since t_k ∈ [0, 1]. It
is proved in Proposition 2.2 that the function y(x) is continuous, which
implies the continuity of d(x). It is known (see e.g. Minoux (1986), Theorem
3.1) that the map
Algorithm 2
Step 3. Let d_k := y(x_k) − x_k. Select the smallest nonnegative integer m such
that

h(x_k) − h(x_k + β^m d_k) ≥ αβ^m ||d_k||²,

set α_k = β^m and x_{k+1} = x_k + α_k d_k.
If ||x_{k+1} − x_k|| < ε, then STOP; otherwise let k = k + 1 and go to Step 2.
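The steps of Algorithm 2 can be sketched on a one-dimensional instance in which the inner problem P(x) is solvable in closed form. Everything concrete here is an illustrative assumption: the set K = [0, 2], the strongly monotone operator F(y) = 2(y − 1) (whose unique VI/VI* solution is y* = 1), the term H(x, y) = (ALPHA/2)(y − x)², and the constants SIGMA (playing the role of the Armijo constant), BETA (the backtracking base) and EPS.

```python
# Hedged 1-D sketch of the inexact (Armijo-type) descent of Algorithm 2.
ALPHA, SIGMA, BETA, EPS = 1.0, 0.1, 0.5, 1e-8

def F(y):
    return 2.0 * (y - 1.0)

def psi(x, y):
    # the function whose infimum over K defines the gap function
    return F(y) * (y - x) + 0.5 * ALPHA * (y - x) ** 2

def y_of(x):
    # closed-form minimizer of psi(x, .) over K = [0, 2]
    y = (2.0 + (2.0 + ALPHA) * x) / (4.0 + ALPHA)
    return min(max(y, 0.0), 2.0)

def h(x):
    return -psi(x, y_of(x))    # gap function: h >= 0, h(y*) = 0

x = 0.0                        # starting point x0 in K
for k in range(1000):
    d = y_of(x) - x            # descent direction of Step 3
    m = 0                      # smallest m fulfilling the Armijo rule
    while h(x) - h(x + BETA ** m * d) < SIGMA * BETA ** m * d * d:
        m += 1
    x_new = x + BETA ** m * d
    if abs(x_new - x) < EPS:   # stopping test of Step 3
        x = x_new
        break
    x = x_new

assert abs(x - 1.0) < 1e-6     # iterates converge to the VI* solution
```

On this instance the Armijo test is satisfied at m = 0 in every iteration, so the method reduces to a fixed damped step; the backtracking loop matters when curvature varies along the path.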
Theorem 3.2 Suppose that hypotheses 1-4 hold, F is a strongly monotone
operator on K with modulus μ, α < μ/2, and {x_k} is the sequence defined in
Algorithm 2. Then, for any x_0 ∈ K, the sequence {x_k} belongs to the set
K and converges to the solution of the variational inequality VI*.
If α_k ≥ β^{m_0} > 0 for some m_0 and all k ≥ k̄ ∈ ℕ, then ||d(x_k)|| → 0, so that
y(x*) = x*.
Otherwise suppose that there exists a subsequence {α_{k_l}} ⊆ {α_k} such that
α_{k_l} → 0. By the line search rule we have that

Taking the limit in (3.6) for k → ∞, since α_{k_l} → 0 and h is continuously
differentiable, we obtain

−⟨∇h(x*), d*⟩ ≥ μ||d*||².
Besides the already mentioned equivalence with the classic VI, the Minty variational inequality enjoys some peculiar properties that justify the interest in the
development of this analysis. We will briefly recall some applications in the
field of optimization problems and in the theory of dynamical systems. Finally,
we will outline the recently introduced extension to the vector case (Giannessi
(1998)).
Consider the problem

min f(x), s.t. x ∈ K,   (4.1)
then x* is stable.
Giannessi (1998) has extended the analysis of VI* to the vector case and has
obtained a first order optimality condition for a Pareto solution of the vector
optimization problem:
(VVI*)
5 CONCLUDING REMARKS
We have shown that the gap function theory developed for the classic VI, introduced by Stampacchia, can be extended, under suitable further assumptions, to
the Minty Variational Inequality. These extensions concern not only the
theoretical point of view but also the algorithmic one: under strict or
strong monotonicity assumptions on the operator F, exact or inexact descent
methods, respectively, can be defined for VI* following the lines developed for
VI.
It would be of interest to analyse the relationships between the class of gap
functions associated to VI and the one associated to VI* in the hypothesis of
pseudomonotonicity of the operator F, which guarantees the equivalence of the
two problems. This might make it possible to define a resolution method based on the
simultaneous use of both gap functions related to VI and VI*.
6 APPENDIX
In this appendix we recall the main theorems that have been employed in the
proofs of the results stated in the present paper.
GAP FUNCTIONS 543
Theorem 6.1 (Bank et al. (1983)) is concerned with the continuity of the optimal solution map of a parametric optimization problem. Theorem 6.2 (Auslender (1976)) is a generalization of well-known results on the directional differentiability of extremal-value functions. Theorem 6.3 is the Zangwill convergence
theorem for a general algorithm formalized in the form of a multifunction.
Consider the following parametric optimization problem:

v(x) := inf {f(x, y) s.t. y ∈ M(x)},

where f : A × Y → ℝ and M : A → 2^Y. Let ψ : A → 2^Y be the optimal set
mapping.
We report the statement of Auslender (1976). We recall that a function h :
ℝᵖ → ℝ is said to be "directionally differentiable" at the point x* ∈ ℝᵖ in
the direction d iff the following limit exists and is finite:

h′(x*; d) := lim_{t→0⁺} [h(x* + td) − h(x*)] / t.
v(x) := inf_{y∈Y} f(x, y),

1. f is continuous on ℝᵖ × Y;

v′(x; d) = inf_{y∈ψ(x)} ⟨∇_x f(x, y), d⟩.

∇v(x*) = ∇_x f(x*, y(x*)).
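A numeric check of the extremal-value formula above, under the illustrative assumption (not from the paper) f(x, y) = x·y and Y = [−1, 1]: then v(x) = −|x|, the optimal set ψ(0) at x* = 0 is all of Y, and the formula gives v′(0; d) = inf_{y∈Y} y·d = −|d|, even though v is not differentiable at 0.

```python
# Hedged finite-difference check of v'(0; d) = -|d| for v(x) = -|x|,
# the value function of inf_{y in [-1,1]} x*y.
def v(x):
    # the infimum of x*y over y in [-1, 1] is attained at y = -sign(x)
    return -abs(x)

def dir_deriv(g, x, d, t=1e-8):
    # one-sided difference quotient approximating the limit above
    return (g(x + t * d) - g(x)) / t

for d in (1.0, -2.0, 0.5):
    assert abs(dir_deriv(v, 0.0, d) - (-abs(d))) < 1e-6
```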
A : X → 2^X.

1. x ∉ M implies z(y) < z(x), ∀y ∈ A(x),
References
1 INTRODUCTION
Note that the class of P_0 functions includes the class of monotone functions.
Applications of NCP can be found in many important fields such as mathematical programming, economics, engineering and mechanics (see, e.g., Cottle
et al. (1992); Harker and Pang (1990)).
There exist several methods for the solution of the complementarity problem.
In this paper we consider regularization methods, which are designed to
handle ill-posed problems. Very roughly speaking, an ill-posed problem may
be difficult to solve since small errors in the computations can lead to a totally
wrong solution.
For the class of P_0 functions, Facchinei and Kanzow (1999) considered the
Tikhonov regularization; this scheme consists of solving a sequence of complementarity problems NCP(F_k), where F_k(x) := F(x) + c_k x and c_k is a positive
parameter converging to 0. Yamashita et al. (1999) considered the proximal
point algorithm, proposed by Martinet (1970) and further studied by Rockafellar (1976). For the NCP(F), given the current point x^k, the proximal point
algorithm produces the next iterate by approximately solving the subproblem
NCP(F_k), where F_k(x) := F(x) + c_k(x − x^k) and c_k is a positive parameter
that does not necessarily converge to 0. In the case above, if F is a P_0 function,
then F_k is a P function, that is, for any x, y ∈ ℝⁿ with x ≠ y,

max_{1≤i≤n} (x_i − y_i)[(F_k)_i(x) − (F_k)_i(y)] > 0.
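The two regularization schemes just described can be contrasted on the one-dimensional NCP with F(x) = x − 1 (monotone, hence P_0), whose unique solution is x* = 1. Both subproblems admit closed-form solutions here; F, the parameter schedules and iteration counts are illustrative choices, not from the paper.

```python
# Hedged 1-D comparison of Tikhonov vs. proximal point regularization.
def F(x):
    return x - 1.0

# Tikhonov: solve NCP(F + c_k * id) exactly; here (1 + c_k) x = 1,
# and c_k -> 0 is required.
x_tik = None
for k in range(1, 30):
    c_k = 2.0 ** (-k)
    x_tik = 1.0 / (1.0 + c_k)
assert abs(x_tik - 1.0) < 1e-6

# Proximal point: solve NCP(F + c (. - x_k)) exactly at each step; here
# (1 + c) x = 1 + c x_k, and c need not tend to 0.
c, x_prox = 1.0, 0.0
for k in range(60):
    x_prox = (1.0 + c * x_prox) / (1.0 + c)
assert abs(x_prox - 1.0) < 1e-6

# In both cases the regularized map is a P function:
# (x - y)(F_k(x) - F_k(y)) = (1 + c)(x - y)^2 > 0 for x != y.
```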
2 PRELIMINARIES
In this section we review some basic definitions and properties which will be
used in the subsequent analysis.
We first restate the basic definition.
(b) (Moré and Rheinboldt (1973), Theorem 5.2) If F′(x) is a P-matrix for
every x ∈ ℝⁿ, then F is a P function;
The following theorem is a version of the mountain pass theorem (see Palais
and Terng (1988)) that will be used to establish a global convergence theorem
for the proposed algorithm.
A NEW CLASS OF PROXIMAL ALGORITHMS 553
m := min_{x∈∂S} f(x).

Assume further that there are two points a ∈ S and b ∉ S such that f(a) < m
and f(b) < m. Then there exists a point c ∈ ℝⁿ such that ∇f(c) = 0 and
f(c) ≥ m.
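The mountain pass statement above can be illustrated numerically with the double-well f(x) = (x² − 1)² and S = [−1.5, −0.5]; both f and S are illustrative choices, not taken from the paper.

```python
# Hedged 1-D illustration of the mountain pass theorem.
def f(x):
    return (x * x - 1.0) ** 2

def fprime(x):
    return 4.0 * x * (x * x - 1.0)

m = min(f(-1.5), f(-0.5))      # min of f on the boundary of S = [-1.5, -0.5]
a, b = -1.0, 1.0               # a in S, b outside S, both below the "pass"
assert f(a) < m and f(b) < m

c = 0.0                        # the critical point predicted by the theorem
assert fprime(c) == 0.0 and f(c) >= m
```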
NCP(F) using the Fischer-Burmeister function (see Fischer (1992)), φ :
ℝ² → ℝ, defined by

φ(a, b) = √(a² + b²) − a − b.

The most fundamental property of this function is that

φ(a, b) = 0 ⟺ a ≥ 0, b ≥ 0, ab = 0.

Ψ(x) = ½||Φ(x)||².
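The sketch below evaluates φ and the merit function Ψ, assuming (as is standard for this reformulation) that Φ applies φ componentwise to the pairs (x_i, F_i(x)); the affine map F and the test points are illustrative assumptions.

```python
# Hedged sketch of the Fischer-Burmeister function and its merit function.
import math

def phi(a, b):
    return math.sqrt(a * a + b * b) - a - b

def F(x):
    # illustrative monotone affine map; x* = (1, 0) solves NCP(F):
    # F(x*) = (0, 1), so x* >= 0, F(x*) >= 0, <x*, F(x*)> = 0
    return [x[0] - 1.0, x[1] + 1.0]

def Psi(x):
    # Psi(x) = (1/2) ||Phi(x)||^2 with Phi applied componentwise
    return 0.5 * sum(phi(xi, fi) ** 2 for xi, fi in zip(x, F(x)))

# phi(a, b) = 0 exactly when a >= 0, b >= 0 and a*b = 0
assert phi(2.0, 0.0) == 0.0 and phi(0.0, 3.0) == 0.0
assert phi(-1.0, 2.0) != 0.0 and phi(1.0, 1.0) != 0.0

assert Psi([1.0, 0.0]) < 1e-15          # zero at the NCP solution
assert Psi([0.5, 0.5]) > 0.0            # positive elsewhere
```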
For the regularized problem, we define the corresponding operator and the
corresponding merit function similarly as
and
z_j if j ∉ J.
where j is one of the indices for which the max is attained, that is independent of l. Since {y^l} is bounded, by continuity of F_j it follows
that {F_j(y^l)} is bounded. Therefore, z_j^l[F_j(y^l)] ≥ 0 implies
that {φ(z_j^l)} does not tend to −∞. This, in turn, implies z_j^l → ∞ ⟹
F_j(z^l) + c_k(x_j^k)^{−r}(z_j^l − x_j^k) → ∞. By Lemma 3.1 we have that
We are now in a position to prove the following existence and uniqueness result.
in turn, implies, also using Proposition 3.1, item 2, that the global minimum
is a stationary point of Ψ_k. However, F_k is a
P function; in particular, F_k itself is a P_0 function, so that the global minimum must be a solution of NCP(F_k), due to Proposition 3.1, item 3.
Step 2: Choose c_{k+1} ∈ (0, c_k) and δ_{k+1} ∈ (0, δ_k). Set k := k + 1, and go to
Step 1.
Algorithm 4.1 is well defined since, by Theorem 3.1, the NCP(F_k) has
a unique solution; therefore, as Ψ_k is continuous, given δ_k > 0, there exists
x^{k+1} ∈ ℝⁿ₊₊ in a neighborhood of the unique solution of the NCP(F_k) such
that Ψ_k(x^{k+1}) ≤ δ_k.
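A simplified one-dimensional run in the spirit of Algorithm 4.1: at each outer step we solve NCP(F_k) for the unweighted proximal regularization F_k(x) = F(x) + c_k(x − x_k) described in the introduction (a simplifying assumption), check the inexactness test Ψ_k(x^{k+1}) ≤ δ_k, and shrink c_k and δ_k. F(x) = x − 1 and all parameter schedules are illustrative choices.

```python
# Hedged 1-D run of a proximal scheme with Fischer-Burmeister stopping test.
import math

def F(x):
    return x - 1.0

def phi(a, b):                     # Fischer-Burmeister function
    return math.sqrt(a * a + b * b) - a - b

x_k, c_k, delta_k = 0.5, 1.0, 0.1
for k in range(60):
    # Step 1: solve NCP(F_k) -- here exactly, in closed form, since the
    # interior solution of (1 + c_k) x = 1 + c_k x_k stays positive
    x_next = (1.0 + c_k * x_k) / (1.0 + c_k)
    Fk_val = F(x_next) + c_k * (x_next - x_k)
    Psi_k = 0.5 * phi(x_next, Fk_val) ** 2
    assert Psi_k <= delta_k        # the inexactness test of Step 1
    # Step 2: shrink the parameters and iterate
    x_k, c_k, delta_k = x_next, 0.9 * c_k, 0.9 * delta_k

assert abs(x_k - 1.0) < 1e-6       # iterates approach the NCP solution x* = 1
```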
solution of NCP(F). The sequence {c_k} satisfies the following conditions:

(A) c_k(x^k)^{−r}(x^{k+1} − x^k) → 0 if {x^k} is bounded;
Remark 4.2 The conditions (A) and (B) can be verified if we define c_k =
for φ ∈ (0, 1). In this case c_k → 0.
Lemma 4.1 Suppose that condition (B) holds. Let S ⊂ ℝⁿ be an arbitrary
compact set. If {x^k} is unbounded, then for any ε > 0, there exists a sufficiently
large k_0 such that for all k ≥ k_0

where e = (1, . . . , 1)ᵀ ∈ ℝⁿ. Therefore, by condition (B) and the fact that S
is compact, we have ||F_k(x) − F(x)|| → 0, since r ≥ 1 and x ∈ S. Now, for any
a, b, c ∈ ℝ we have that
Applying the result above with a = x_i, b = F_i(x) and c = c_k(x_i^k)^{−r}(x_i − x_i^k),
we have that

|φ(x_i, (F_k)_i(x)) − φ(x_i, F_i(x))| ≤ 2|c_k(x_i^k)^{−r}(x_i − x_i^k)| → 0.
Since S is compact, F_k converges uniformly to F on S; furthermore, φ, F
and F_k are continuous, so we have that for all i
The following result is our main convergence theorem for Algorithm 4.1.
Theorem 4.1 Suppose that F is a P_0 function and assume that the solution
set S* of NCP(F) is nonempty and bounded.
Proof: First we show that {x^k} is bounded. Suppose that the sequence {x^k}
is not bounded. Then there exists a subsequence {x^k}_{k∈K} such that
||x^k|| → ∞
as k → ∞ with k ∈ K. Since S* is bounded, there exists a nonempty compact
set S ⊂ ℝⁿ such that S* ⊂ int(S) and x^k ∉ S for all k ∈ K sufficiently large.
If x* ∈ S*, then we have Ψ(x*) = 0. Let

Applying Lemma 4.1 with ε := σ/4, there exists some k_0 such that for all k ≥ k_0

and

m := min_{x∈∂S} Ψ_k(x) > 3σ/4.

Since Ψ_k(x^{k+1}) ≤ δ_k by Step 1 of Algorithm 4.1, there exists some k_1 such
that for all k ≥ k_1,

Ψ_k(x^{k+1}) < σ/4.
Next, we show that any accumulation point of {x^k} is a solution of NCP(F).
Since {x^k} is bounded, we have ||F_k(x^{k+1}) − F(x^{k+1})|| → 0 by condition
(A), and hence |Ψ_k(x^{k+1}) − Ψ(x^{k+1})| → 0. By Step 1 of the algorithm and the
assumption that δ_k → 0, we have Ψ_k(x^{k+1}) → 0. Consequently it holds
that Ψ(x^{k+1}) → 0, which means that every accumulation point of the sequence
{x^k} is a solution of NCP(F).
5 CONCLUSIONS
Acknowledgments
The author Da Silva thanks CAPES/PICDT/UFG for support. The author Oliveira
thanks CNPq for support.
References
Palais, R.S. and Terng, C.L. (1988), Critical point theory and submanifold geom-
etry, Lecture Note in Mathematics, 1353, Springer Verlag, Berlin.
Cottle, R.W., Pang, J.S. and Stone, R.E. (1992), The Linear Complementarity
Problem, Academic Press, New York.
Rapcsák, T. (1997), Smooth Nonlinear Optimization in Rn, Kluwer Academic
Publishers, Dordrecht, Netherlands.
Udriste, C. (1994), Convex Functions and Optimization Methods in Riemannian
Geometry, Kluwer Academic Publishers, Dordrecht, Netherlands.
Bayer, D.A. and Lagarias, J.C. (1989), The Nonlinear Geometry of Linear Pro-
gramming I, Affine and Projective Scaling Trajectories, Transactions of the
American Mathematical Society, Vol. 314, No 2, pp. 499-526.
Bayer, D.A. and Lagarias, J.C. (1989), The Nonlinear Geometry of Linear
Programming II, Legendre Transform Coordinates and Central Trajectories, Transactions of the American Mathematical Society, Vol. 314, No 2,
pp. 527-581.
Cruz Neto, J.X. and Oliveira, P.R. (1995), Geodesic Methods in Riemannian
Manifolds, Preprint, RT 95-10, PESC/COPPE - Federal University of Rio
de Janeiro, Brazil.
Cruz Neto, J.X., Lima, L.L. and Oliveira, P.R. (1998), Geodesic Algorithm in
Riemannian Manifold, Balkan Journal of Geometry and Applications, Vol.
3, No 2, pp. 89-100.
Ferreira, O.P. and Oliveira, P.R. (1998), Subgradient Algorithm on Riemannian
Manifold, Journal of Optimization Theory and Applications, Vol. 97, No 1,
pp. 93-104.
Ferreira, O.P. and Oliveira, P.R. (2002), Proximal Point Algorithm on Rie-
mannian Manifolds, Optimization, Vol. 51, No 2, pp. 257-270.
Gabay, D. (1982), Minimizing a Differentiable Function over a Differential Man-
ifold, Journal of Optimization Theory and Applications, Vol. 37, No 2, pp.
177-219.
Karmarkar, N. (1990), Riemannian Geometry Underlying Interior-Point Meth-
ods for Linear Programming, Contemporary Mathematics, Vol. 114, pp. 51-
75.
Karmarkar, N. (1984), A New Polynomial-Time Algorithm for Linear Program-
ming, Combinatorics, Vol. 4, pp. 373-395.
Nesterov, Y.E. and Todd, M. (2002), On the Riemannian Geometry Defined
by Self-Concordant Barriers and Interior-Point Methods, Preprint.
Dikin, I.I. (1967), Iterative Solution of Problems of Linear and Quadratic Programming, Soviet Mathematics Doklady, Vol. 8, pp. 647-675.
Eggermont, P.P.B. (1990), Multiplicative Iterative Algorithms for Convex Pro-
gramming, Linear Algebra and its Applications, Vol. 130, pp. 25-42.
Oliveira, G.L. and Oliveira, P.R. (2002), A New Class of Interior-Point Methods
for Optimization Under Positivity Constraints, TR PESC/COPPE-UFRJ,
preprint.
Pinto, A.W.M., Oliveira, P.R. and Cruz Neto, J.X. (2002), A New Class of Potential Affine Algorithms for Linear Convex Programming, TR PESC/COPPE-UFRJ, preprint.
Moré, J.J. and Rheinboldt, W.C. (1973), On P- and S-functions and related
classes of n-dimensional nonlinear mappings, Linear Algebra Appl., Vol. 6,
pp. 45-68.
Harker, P.T. and Pang, J.S. (1990), Finite dimensional variational inequality
and nonlinear complementarity problems: A survey of theory, algorithms and
applications, Mathematical Programming, Vol. 48, pp. 161-220.
Rockafellar, R.T. (1976), Monotone operators and the proximal point algo-
rithm, SIAM Journal on Control and Optimization, Vol. 14, pp. 877-898.
Martinet, B. (1970), Régularisation d'inéquations variationnelles par approximations successives, Revue Française d'Informatique et de Recherche Opérationnelle, Vol. 4, pp. 154-159.
Facchinei, F. (1998), Structural and stability properties of P_0 nonlinear complementarity problems, Mathematics of Operations Research, Vol. 23, pp.
735-745.
Facchinei, F. and Kanzow, C. (1999), Beyond Monotonicity in regularization
methods for nonlinear complementarity problems, SIAM Journal on Control
and Optimization, Vol. 37, pp. 1150-1161.
Yamashita, N., Imai, I. and Fukushima, M. (2001), The proximal point algorithm for the P_0 complementarity problem, Complementarity: Algorithms
and Extensions, Edited by Ferris, M.C., Mangasarian, O.L. and Pang, J.S.,
Kluwer Academic Publishers, pp. 361-379.
Kanzow, C. (1996), Global convergence properties of some iterative methods
for linear complementarity problems, SIAM Journal of Optimization, Vol. 6,
pp. 326-341.
Moré, J.J. (1974), Coercivity conditions in nonlinear complementarity problems,
SIAM Rev., Vol. 16, pp. 1-16.
Fischer, A. (1992), A special Newton-type optimization method, Optimization,
Vol. 24, pp. 269-284.