
OPTIMIZATION

Springer Optimization and Its Applications

VOLUME 32

Managing Editor
Panos M. Pardalos (University of Florida)

Editor–Combinatorial Optimization
Ding-Zhu Du (University of Texas at Dallas)

Advisory Board
J. Birge (University of Chicago)
C.A. Floudas (Princeton University)
F. Giannessi (University of Pisa)
H.D. Sherali (Virginia Polytechnic Institute and State University)
T. Terlaky (McMaster University)
Y. Ye (Stanford University)

Aims and Scope


Optimization has been expanding in all directions at an astonishing rate dur-
ing the last few decades. New algorithmic and theoretical techniques have
been developed, the diffusion into other disciplines has proceeded at a rapid
pace, and our knowledge of all aspects of the field has grown even more
profound. At the same time, one of the most striking trends in optimization
is the constantly increasing emphasis on the interdisciplinary nature of the
field. Optimization has been a basic tool in all areas of applied mathematics,
engineering, medicine, economics and other sciences.
The series Springer Optimization and Its Applications publishes under-
graduate and graduate textbooks, monographs and state-of-the-art exposi-
tory works that focus on algorithms for solving optimization problems and
also study applications involving such problems. Some of the topics covered
include nonlinear optimization (convex and nonconvex), network flow prob-
lems, stochastic optimization, optimal control, discrete optimization, mul-
tiobjective programming, description of software packages, approximation
techniques and heuristic approaches.
OPTIMIZATION

Structure and Applications

Edited By

CHARLES PEARCE
School of Mathematical Sciences,
The University of Adelaide,
Adelaide, Australia

EMMA HUNT
School of Economics & School of Mathematical Sciences,
The University of Adelaide,
Adelaide, Australia

Editors

Charles Pearce
Department of Applied Mathematics
University of Adelaide
70 North Terrace
Adelaide SA 5005
Australia
[email protected]

Emma Hunt
Department of Applied Mathematics
University of Adelaide
70 North Terrace
Adelaide SA 5005
Australia
[email protected]

ISSN 1931-6828
ISBN 978-0-387-98095-9 e-ISBN 978-0-387-98096-6
DOI 10.1007/978-0-387-98096-6
Springer Dordrecht Heidelberg London New York

Library of Congress Control Number: 2009927130

Mathematics Subject Classification (2000): 49-06, 65Kxx, 65K10, 76D55, 78M50

© Springer Science+Business Media, LLC 2009


All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York,
NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in
connection with any form of information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to
proprietary rights.

Cover illustration: Picture provided by Elias Tyligadas

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


This volume is dedicated with great affection to the late Alex Rubinov, who was an invited plenary speaker at the mini-conference. He is sorely missed.

Contents

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii


Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Part I Optimization: Structure

1 On the nondifferentiability of cone-monotone functions in Banach spaces . . . . . . . . . . . . . . . . . 3
Jonathan Borwein and Rafal Goebel
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Duality and a Farkas lemma for integer programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Jean B. Lasserre
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Summary of content . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Duality for the continuous problems P and I . . . . . . . . . . . . . . 18
2.2.1 Duality for P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Duality for integration . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.3 Comparing P, P∗ and I, I∗ . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 The continuous Brion and Vergne formula . . . . . . . . . 20
2.2.5 The logarithmic barrier function . . . . . . . . . . . . . . . . . 21
2.2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Duality for the discrete problems Id and Pd . . . . . . . . . . . . . . 22
2.3.1 The Z-transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 The dual problem I∗d . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.3 Comparing I∗ and I∗d . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


2.3.4 The “discrete” Brion and Vergne formula . . . . . . . . . 25


2.3.5 The discrete optimization problem Pd . . . . . . . . . . . . 26
2.3.6 A dual comparison of P and Pd . . . . . . . . . . . . . . . . . 27
2.4 A discrete Farkas lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 The case when A ∈ Nm×n . . . . . . . . . . . . . . . . . . . . . . 30
2.4.2 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6.1 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6.2 Proof of Corollary 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6.3 Proof of Proposition 3.1 . . . . . . . . . . . . . . . . . . . . . . . . 36
2.6.4 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 Some nonlinear Lagrange and penalty functions for
problems with a single constraint . . . . . . . . . . . . . . . . . . . . . . . . . 41
J. S. Giri and A. M. Rubinov
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 The relationship between extended penalty functions and
extended Lagrange functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Generalized Lagrange functions . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.1 The Lagrange function approach . . . . . . . . . . . . . . . . . 51
3.5.2 Penalty function approach . . . . . . . . . . . . . . . . . . . . . . 52
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Convergence of truncates in l1 optimal feedback control . . 55


Robert Wenczel, Andrew Eberhard and Robin Hill
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Mathematical preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 System-theoretic preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 Basic system concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.2 Feedback stabilization of linear systems . . . . . . . . . . . 62
4.4 Formulation of the optimization problem in l1 . . . . . . . . . . . . 64
4.5 Convergence tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.6 Verification of the constraint qualification . . . . . . . . . . . . . . . . 71
4.6.1 Limitations on the truncation scheme . . . . . . . . . . . . . 76
4.7 Convergence of approximates . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.7.1 Some extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.8 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5 Asymptotical stability of optimal paths in nonconvex problems . . . . . . . . . . . . . . . . . . . . . . . . 95
Musa A. Mamedov
5.1 Introduction and background . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 The main conditions of the turnpike theorem . . . . . . . . . . . . . 97
5.3 Definition of the set D and some of its properties . . . . . . . . . . 100
5.4 Transformation of Condition H3 . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.5 Sets of 1st and 2nd type: Some integral inequalities . . . . . . . . 105
5.5.1 ............................................. 105
5.5.2 ............................................. 106
5.5.3 ............................................. 107
5.5.4 ............................................. 108
5.5.5 ............................................. 113
5.6 Transformation of the functional (5.2) . . . . . . . . . . . . . . . . . . . 117
5.6.1 ............................................. 117
5.6.2 ............................................. 119
5.7 The proof of Theorem 13.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.7.1 ............................................. 123
5.7.2 ............................................. 129
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6 Pontryagin principle with a PDE: a unified approach . . . . . 135


B. D. Craven
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.2 Pontryagin for an ODE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.3 Pontryagin for an elliptic PDE . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4 Pontryagin for a parabolic PDE . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

7 A turnpike property for discrete-time control systems in metric spaces . . . . . . . . . . . . . . . . . . 143
Alexander J. Zaslavski
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.2 Stability of the turnpike phenomenon . . . . . . . . . . . . . . . . . . . . 146
7.3 A turnpike is a solution of the problem (P) . . . . . . . . . . . . . . . 149
7.4 A turnpike result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

8 Mond–Weir Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157


B. Mond
8.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2 Convexity and Wolfe duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.3 Fractional programming and some extensions
of convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

8.4 Mond–Weir dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160


8.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.6 Second order duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.7 Symmetric duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

9 Computing the fundamental matrix of an M/G/1–type Markov chain . . . . . . . . . . . . . . . . . . 167
Emma Hunt
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9.2 Algorithm H: Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9.3 Probabilistic construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9.4 Algorithm H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.5 Algorithm H: Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.6 H, G and convergence rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.7 A special case: The QBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.8 Algorithms CR and H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

10 A comparison of probabilistic and invariant subspace methods for the block M/G/1 Markov chain . . . . . . . . 189
Emma Hunt
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.2 Error measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
10.3 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
10.3.1 Experiment G1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
10.3.2 Experiment G2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.3.3 The Daigle and Lucantoni teletraffic problem . . . . . . 196
10.3.4 Experiment G6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
10.3.5 Experiment G7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

11 Interpolating maps, the modulus map and Hadamard's inequality . . . . . . . . . . . . . . . . . . . . . 207
S. S. Dragomir, Emma Hunt and C. E. M. Pearce
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.2 A refinement of the basic inequality . . . . . . . . . . . . . . . . . . . . . 210
11.3 Inequalities for Gf and Hf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
11.4 More on the identric mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.5 The mapping Lf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Part II Optimization: Applications

12 Estimating the size of correcting codes using extremal graph problems . . . . . . . . . . . . . . . . . 227
Sergiy Butenko, Panos Pardalos, Ivan Sergienko, Vladimir Shylo
and Petro Stetsyuk
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.2 Finding lower bounds and exact solutions for the largest
code sizes using a maximum independent set problem . . . . . . 229
12.2.1 Finding the largest correcting codes . . . . . . . . . . . . . . 232
12.3 Lower Bounds for Codes Correcting One Error on
the Z-Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
12.3.1 The partitioning method . . . . . . . . . . . . . . . . . . . . . . . . 237
12.3.2 The partitioning algorithm . . . . . . . . . . . . . . . . . . . . . . 239
12.3.3 Improved lower bounds for code sizes . . . . . . . . . . . . . 239
12.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

13 New perspectives on optimal transforms of random vectors . . . . . . . . . . . . . . . . . . . . . . . . . 245
P. G. Howlett, C. E. M. Pearce and A. P. Torokhti
13.1 Introduction and statement of the problem . . . . . . . . . . . . . . . 245
13.2 Motivation of the statement of the problem . . . . . . . . . . . . . . . 247
13.3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
13.4 Main results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
13.5 Comparison of the transform T 0 and the GKLT . . . . . . . . . . . 251
13.6 Solution of the unconstrained minimization problem (13.3) . 252
13.7 Applications and further modifications and extensions . . . . . 253
13.8 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
13.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

14 Optimal capacity assignment in general queueing networks . . . . . . . . . . . . . . . . . . . . . . . . . 261
P. K. Pollett
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
14.2 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
14.3 The residual-life approximation . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.4 Optimal allocation of effort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
14.5 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
14.6 Data networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
14.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

15 Analysis of a simple control policy for stormwater management in two connected dams . . . . . . 273
Julia Piantadosi and Phil Howlett
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
15.2 A discrete-state model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
15.2.1 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
15.2.2 The transition matrix for a specific control policy . . 275
15.2.3 Calculating the steady state when 1 < m < k . . . . . . 276
15.2.4 Calculating the steady state for m = 1 . . . . . . . . . . . . 279
15.2.5 Calculating the steady state for m = k . . . . . . . . . . . . 280
15.3 Solution of the matrix eigenvalue problem using Gaussian
elimination for 1 < m < k . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
15.3.1 Stage 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
15.3.2 The general rules for stages 2 to m − 2 . . . . . . . . . . . 281
15.3.3 Stage m − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
15.3.4 The general rules for stages m to k − 2m . . . . . . . . . . 284
15.3.5 Stage k − 2m + 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
15.3.6 The general rule for stages k − 2m + 2
to k − m − 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
15.3.7 The final stage k − m − 1 . . . . . . . . . . . . . . . . . . . . . . . 287
15.4 The solution process using back substitution for 1 < m < k . 287
15.5 The solution process for m = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 290
15.6 The solution process for m = k . . . . . . . . . . . . . . . . . . . . . . . . . . 292
15.7 A numerical example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
15.8 Justification of inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
15.8.1 Existence of the matrix W0 . . . . . . . . . . . . . . . . . . . . . . 296
15.8.2 Existence of the matrix Wp for 1 ≤ p ≤ m − 1 . . . . . 296
15.8.3 Existence of the matrix Wq
for m ≤ q ≤ k − m − 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 298
15.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306

16 Optimal design of linear consecutive–k–out–of–n systems . 307


Malgorzata O’Reilly
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.1.1 Mathematical model . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.1.2 Applications and generalizations of linear
consecutive–k–out–of–n systems . . . . . . . . . . . . . . . . . 308
16.1.3 Studies of consecutive–k–out–of–n systems . . . . . . . . 309
16.1.4 Summary of the results . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.2 Propositions for R and M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
16.3 Preliminaries to the main proposition . . . . . . . . . . . . . . . . . . . . 315
16.4 The main proposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
16.5 Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

16.6 Procedures to improve designs not satisfying necessary conditions for the optimal design . . . . 324
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

17 The (k+1)-th component of linear consecutive–k–out–of–n systems . . . . . . . . . . . . . . . . . . . 327
Malgorzata O’Reilly
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
17.2 Summary of the results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
17.3 General result for n > 2k, k ≥ 2 . . . . . . . . . . . . . . . . . . . . . . . . . 330
17.4 Results for n = 2k + 1, k > 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
17.5 Results for n = 2k + 2, k > 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
17.6 Procedures to improve designs not satisfying the necessary
conditions for the optimal design . . . . . . . . . . . . . . . . . . . . . . . . 340
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341

18 Optimizing properties of polypropylene and elastomer compounds containing wood flour . . . . . 343
Pavel Spiridonov, Jan Budin, Stephen Clarke and Jani Matisons
18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
18.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
18.2.1 Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
18.2.2 Sample preparation and tests . . . . . . . . . . . . . . . . . . . . 345
18.3 Results and discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
18.3.1 Density of compounds . . . . . . . . . . . . . . . . . . . . . . . . . . 345
18.3.2 Comparison of compounds obtained in a Brabender
mixer and an injection-molding machine . . . . . . . . . . 346
18.3.3 Compatibilization of the polymer matrix
and wood flour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
18.3.4 Optimization of the compositions . . . . . . . . . . . . . . . . 350
18.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353

19 Constrained spanning, Steiner trees and the triangle inequality . . . . . . . . . . . . . . . . . . . . . . 355
Prabhu Manyem
19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
19.2 Upper bounds for approximation . . . . . . . . . . . . . . . . . . . . . . . . 359
19.2.1 The most expensive edge is at most a minimum
spanning tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
19.2.2 MaxST is at most (n − 1)MinST . . . . . . . . . . . . . . . . . 359
19.3 Lower bound for a CSP approximation . . . . . . . . . . . . . . . . . . . 360
19.3.1 E-Reductions: Definition . . . . . . . . . . . . . . . . . . . . . . . . 360
19.3.2 SET COVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
19.3.3 Reduction from SET COVER . . . . . . . . . . . . . . . . . . . 361
19.3.4 Feasible Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

19.3.5 Proof of E-Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 365


19.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

20 Parallel line search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369


T. C. Peachey, D. Abramson and A. Lewis
20.1 Line searches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
20.2 Nimrod/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
20.3 Execution time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
20.3.1 A model for execution time . . . . . . . . . . . . . . . . . . . . . 373
20.3.2 Evaluation time a Bernoulli variate . . . . . . . . . . . . . . . 373
20.3.3 Simulations of evaluation time . . . . . . . . . . . . . . . . . . . 375
20.3.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
20.4 Accelerating convergence by incomplete iterations . . . . . . . . . 377
20.4.1 Strategies for aborting jobs . . . . . . . . . . . . . . . . . . . . . . 377
20.4.2 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
20.4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381

21 Alternative Mathematical Programming Models: A Case for a Coal Blending Decision Process . . 383
Ruhul A. Sarker
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
21.2 Mathematical programming models . . . . . . . . . . . . . . . . . . . . . . 385
21.2.1 Single period model (SPM) . . . . . . . . . . . . . . . . . . . . . . 386
21.2.2 Multiperiod nonlinear model (MNM) . . . . . . . . . . . . . 389
21.2.3 Upper bound linear model (ULM) . . . . . . . . . . . . . . . 390
21.2.4 Multiperiod linear model (MLM) . . . . . . . . . . . . . . . . 391
21.3 Model flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
21.3.1 Case-1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
21.3.2 Case-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
21.3.3 Case-3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
21.4 Problem size and computation time . . . . . . . . . . . . . . . . . . . . . . 395
21.5 Objective function values and fluctuating situation . . . . . . . . 396
21.6 Selection criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
21.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

About the Editors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401


List of Figures

3.1 P(f0, f1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2 L(x; 5/2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 L+_{s_{1/3}}(x; 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.1 A closed-loop control system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
12.1 A scheme of the Z-channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
12.2 Algorithm for finding independent set partitions . . . . . . . . . . . . 240
13.1 Illustration of the performance of our method. . . . . . . . . . . . . . 256
13.2 Typical examples of a column reconstruction in the matrix
X (image “Lena”) after filtering and compression of the
observed noisy image (Figure 13.1b) by transforms H
(line with circles) and T 0 (solid line) of the same rank.
In both subfigures, the plot of the column (solid line)
virtually coincides with the plot of the estimate by the
transform T 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
18.1 Density of (a) polypropylene and (b) SBS in elastomer
compounds for different blending methods. . . . . . . . . . . . . . . . . 347
18.2 Comparison of tensile strength of the compounds obtained
in an injection-molding machine and in a Brabender mixer. . . 348
18.3 Influence of wood flour fractions and the modifier on the
tensile strength of injection-molded specimens of the (a) PP
and (b) SBS compounds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
18.4 Relative cost of the (a) PP and (b) SBS compounds
depending on the content of wood flour
and maleated polymers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
18.5 Photographs of the PP compounds containing 40% wood
flour of different fractions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
19.1 A Constrained Steiner Tree and some of its special cases. . . . . 357
19.2 A CSPI instance reduced from SET COVER (not all edges
shown). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361


19.3 Feasible solution for our instance of CSPI (not all edges shown). . . . . . . . . . . . . . . . . . . . . 364
20.1 A sample configuration file. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
20.2 Architecture of Nimrod/O. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
20.3 Performance with Bernoulli job times. . . . . . . . . . . . . . . . . . . . . 374
20.4 Test function g(x). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
20.5 Results of simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
20.6 Incomplete evaluation points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
20.7 Strategy 1 with exponential distribution of job times. . . . . . . . 379
20.8 Strategy 1 with rectangular distribution of job times. . . . . . . . 379
20.9 Results for Strategy 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
20.10 Results for Strategy 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
21.1 Simple case problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
List of Tables

9.1 The interlacing property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184


10.1 Experiment G1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.2 Experiment G2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
10.3 Experiment G2 continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.4 Iterations required with various traffic levels: Experiment G3 198
10.5 Iterations required with various traffic levels: Experiment
G3 continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10.6 Experiment G4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
10.7 Experiment G4 continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
10.8 Experiment G5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10.9 Experiment G6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10.10 Experiment G7: a transient process . . . . . . . . . . . . . . . . . . . . . . . 204
12.1 Lower bounds obtained . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
12.2 Exact algorithm: Computational results . . . . . . . . . . . . . . . . . . . 235
12.3 Exact solutions found . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
12.4 Lower bounds obtained in: a [27]; b [6]; c [7]; d [12]; e (this
chapter) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
12.5 Partitions of asymmetric codes found . . . . . . . . . . . . . . . . . . . . . 240
12.6 Partitions of constant weight codes obtained in: a (this
chapter); b [4]; c [12] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
12.7 New lower bounds. Previous lower bounds were found in:
a [11]; b [12] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.1 Ratios ρij of the error associated with the GKLT H to that
of the transform T 0 with the same compression ratios . . . . . . . 255
15.1 Overflow lost from the system for m = 1, 2, 3, 4 . . . . . . . . . . . . . 295
16.1 Invariant optimal designs of linear consecutive–k–out–of–n
systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
18.1 Physical characteristics of the wood flour fractions . . . . . . . . . . 345
19.1 Constrained Steiner Tree and special cases: References . . . . . . 358
19.2 E-Reduction of a SET COVER to a CSPI : Costs and delays
of edges in G . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362


21.1 Relative problem size of ULM, MLM and MNM . . . . . . . . . . . . 395


21.2 Objective function values of ULM, MLM and MNM . . . . . . . . 397
Preface

This volume comprises a selection of material based on presentations at the Eighth Australian Optimization Day, held in McLaren Vale, South Australia,
in September 2001, and some additional invited contributions by distin-
guished colleagues, here and overseas. Optimization Day is an annual mini-
conference in Australia which dates from 1994. It has been successful in bring-
ing together Australian researchers in optimization and related areas for the
sharing of ideas and the facilitation of collaborative work. These meetings
have also attracted some collaborative researchers from overseas.
This particular meeting was remarkable in the efforts made by some of
the participants to ensure being present. It took place within days of the
September 11 tragedy in New York and the financial collapse of a major
Australian airline. These events left a number of us without air tickets on the
eve of the conference. Some participants arrived in South Australia by car,
having driven up to several thousand kilometers to join the meeting.
This volume has two parts, one concerning mathematical structure and
the other applications. The first part begins with a treatment of nondifferen-
tiability of cone-monotone functions in Banach spaces, showing that whereas
several regularity properties of cone-monotone functions in finite-dimensional
spaces carry over to a separable Banach space provided the cone has an in-
terior, further generalizations are not readily possible. The following chapter
concerns a comparison between linear and integer programming, particularly
from a duality perspective. A discrete Farkas lemma is provided and it is
shown that the existence of a nonnegative integer solution to a linear equa-
tion can be tested via a linear program. Next, there is a study of connec-
tions between generalized Lagrangians and generalized penalty functions for
problems with a single constraint. This is followed by a detailed theoretical
analysis of convergence of truncates in l1 optimal feedback control. The treat-
ment permits consideration of the frequently occurring case of an objective
function lacking interiority of domain. The optimal control theme continues
with a study of asymptotic stability of optimal paths in nonconvex prob-
lems. The purpose of the chapter is to avoid the convexity conditions usually assumed in turnpike theory. The succeeding chapter proposes a unified approach to Pontryagin's principle for optimal control problems with dynam-
ics described by a partial differential equation. This is followed by a study
of a turnpike property for discrete-time control systems in metric spaces.
A treatment of duality theory for nonlinear programming includes compar-
isons of alternative approaches and discussion of how Mond–Weir duality
and Wolfe duality may be combined. There are two linked chapters centered
on the use of probabilistic structure for designing an improved algorithm for
the determination of the fundamental matrix of a block-structured M/G/1
Markov chain. The approach via probabilistic structure makes clear in par-
ticular the nature of the relationship between the cyclic reduction algorithms
and the Latouche–Ramaswami algorithm in the QBD case. Part I concludes
with a chapter developing systematic classes of refinements of Hadamard’s
inequality, a cornerstone of convex analysis.
Although Part II of this volume is concerned with applications, a number
of the chapters also possess appreciable theoretical content. Part II opens
with the estimation of the sizes of correcting codes via formulation in terms
of extremal graph problems. Previously developed algorithms are used to gen-
erate new exact solutions and estimates. The second chapter addresses the
issue of optimal transforms of random vectors. A new transform is presented
which has advantages over the Karhunen–Loève transform. Theory is devel-
oped and applied to an image reconstruction problem. The following chapter
considers how to assign service capacity in a queueing network to minimize
expected delay under a cost constraint. Next there is analysis of a control pol-
icy for stormwater management in a pair of connected tandem dams, where
a developed mathematical technology is proposed and exhibited. Questions
relating to the optimal design of linear consecutive-k-out-of-n systems are
treated in two related chapters. There is a study of optimizing properties of
plastics containing wood flour; an analysis of the approximation characteris-
tics of constrained spanning and Steiner tree problems in weighted undirected
graphs where edge costs and delays satisfy the triangle inequality; heuristics
for speeding convergence in line search; and the use of alternative mathemat-
ical programming formulations for a real-world coal-blending problem under
different scenarios.
All the contributions to this volume had the benefit of expert refereeing.
We are grateful to the following reviewers for their help:
Mirta Inés Aranguren (Universidad Nacional de Mar del Plata, Argentina),
Eduardo Casas (Universidad de Cantabria, Spain), Aurelian Cernea (Uni-
versity of Bucharest), Pauline Coolen–Schrijner (University of Durham),
Bruce Craven (University of Melbourne), Yu-Hong Dai (Chinese Academy
of Sciences, Beijing), Sever Dragomir (Victoria University of Technology,
Melbourne), Elfadl Khalifa Elsheikh (Cairo University), Christopher Frey
(N. Carolina State University), Frank Kelly (University of Cambridge), Peter
Kloeden (Johann Wolfgang Goethe-Universität, Frankfurt), Denis Lander
(RMIT University), Roy Leipnik (UCSB), Musa Mamedov (University of Ballarat), Jie Mi (Florida International University), Marco Muselli (Institute of Electronics, Computer and Telecommunication Engineering, Genova),
Malgorzata O’Reilly (University of Tasmania), Stavros Papastavridis (Uni-
versity of Athens), Serpil Pehlivan (Süleyman Demirel University, Turkey),
Danny Ralph (Cambridge University), Sekharipuram S. Ravi (University at
Albany, New York), Dan Rosenkrantz (University at Albany, New York),
Alexander Rubinov (University of Ballarat), Hanif D. Sherali (Virginia
Polytechnic Institute), Moshe Sniedovich (University of Melbourne), Nicole
Stark (USDA Forest Products Laboratory, Madison), Nasser Hassan Sweilam
(Cairo University), Fredi Tröltzsch (Technische Universität, Berlin), Erik van
Doorn (University of Twente, Netherlands), Frank K. Wang (National Chiao Tung University, Taiwan, R.O.C.), Jianxing Yin (SuDa University, China), Alexan-
der Zaslavski (Technion-Israel Institute of Technology), and Ming Zuo (Uni-
versity of Alberta).
We wish to thank John Martindale for his involvement in bringing about a
firm arrangement for publication of this volume with Kluwer; Elizabeth Loew
for shepherding the book through to publication with Springer following the
Kluwer–Springer merger; and Panos Pardalos for his support throughout. A
special thank you is due to Jason Whyte, through whose technical skill, effort
and positive morale many difficulties were overcome.
Charles Pearce & Emma Hunt
Chapter 1
On the nondifferentiability of cone-monotone functions in Banach spaces

Jonathan Borwein and Rafal Goebel

Abstract In finite-dimensional spaces, cone-monotone functions – a special case of which are coordinate-wise nondecreasing functions – possess several
regularity properties like almost everywhere continuity and differentiability.
Such facts carry over to a separable Banach space, provided that the cone
has interior. This chapter shows that further generalizations are not readily
possible. We display several examples of cone–monotone functions on various
Banach spaces, lacking the regularity expected from their finite-dimensional
counterparts.

Key words: Monotone functions, ordered Banach spaces, generating cones, differentiability

1.1 Introduction

Functions for which f(y) ≥ f(x) whenever y − x is an element of a given convex cone K are called cone monotone with respect to K (or, simply, K-
monotone). The simplest examples are provided by nondecreasing functions
on the real line. These have several immediate regularity properties, the most
intuitive of which may be the at most countable number of discontinuities.
Regularity properties of coordinate-wise nondecreasing functions on IRn , that
is, functions f for which f (y) ≥ f (x) whenever yi ≥ xi for i = 1, 2, . . . , n,

Jonathan Borwein
Centre for Experimental and Constructive Mathematics,
Simon Fraser University, Burnaby, BC, CANADA V5A 1S6
e-mail: [email protected]
Rafal Goebel
Center for Control Engineering and Computation ECE, University of California, Santa
Barbara, CA 93106-9650 U. S. A.
e-mail: [email protected]


were first collected by Chabrillac and Crouzeix [5] and include measurability,
almost everywhere continuity, and almost everywhere Fréchet differentia-
bility. Note that nondecreasing functions, whether on the real line or IRn ,
are cone monotone with respect to the nonnegative cone, either [0, +∞) or
[0, +∞)n .
Recently, Borwein, Burke and Lewis [2] showed that functions on a sepa-
rable Banach space, monotone with respect to a convex cone with nonempty
interior, are differentiable except at points of an appropriately understood
null set. The main goal of the current chapter is to demonstrate how possible
extensions of this result, or other generalizations of finite-dimensional results
on regularity of cone-monotone functions, fail in a general Banach space.
Motivation for studying coordinate-wise nondecreasing functions in
Chabrillac and Crouzeix, and cone-monotone functions by Borwein, Burke
and Lewis, comes in part from the connections of such functions with Lip-
schitz, and more generally, directionally Lipschitz functions. Interest in the
latter stems from the work of Burke, Lewis and Overton [4] on approximation
of the Clarke subdifferential using gradients, an important idea in practical
optimization. It turns out that such approximations, like in the Lipschitz
case, are possible in the more general directionally Lipschitz setting.
Before summarizing the properties of nondecreasing functions in finite di-
mensions, we illustrate their connection with Lipschitz functions. Consider a
Lipschitz function l : IRn → IR and a K > 0 satisfying

|l(x) − l(y)| ≤ K ‖x − y‖∞  for all x, y ∈ IRn.

Let z ∈ IRn be given by zi = K, i = 1, 2, . . . , n, and define a function f : IRn → IR by f(x) = l(x) + ⟨z, x⟩. Then for x and y such that yi ≥ xi, i = 1, 2, . . . , n, we have

f(y) − f(x) = ⟨z, y − x⟩ + l(y) − l(x) ≥ K ( ∑_{i=1}^{n} (yi − xi) − max_{i=1,2,...,n} (yi − xi) ) ≥ 0.

In effect, f is coordinate-wise nondecreasing. Thus a Lipschitz function de-


composes into a sum of a linear and a nondecreasing function, and inherits the
regularity properties of the latter. Borwein, Burke and Lewis show that di-
rectionally Lipschitz functions on more general spaces admit a similar (local)
decomposition, into a sum of a linear function and a cone-monotone one (with
respect to a convex cone with interior).
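The decomposition just described is easy to sanity-check numerically in finite dimensions. The sketch below is not from the text: it assumes Python with NumPy, picks the illustrative choice l(x) = min_i x_i (which is 1-Lipschitz in the sup-norm, so K = 1), and verifies on random coordinate-wise ordered pairs that f = l + ⟨z, ·⟩ is nondecreasing.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 1.0                       # l below is 1-Lipschitz in the sup-norm

def l(x):
    # illustrative Lipschitz function: |min(x) - min(y)| <= max_i |x_i - y_i|
    return float(np.min(x))

def f(x):
    # f(x) = l(x) + <z, x> with z = (K, ..., K)
    return l(x) + K * float(np.sum(x))

# coordinate-wise monotonicity: y >= x componentwise should give f(y) >= f(x)
for _ in range(10_000):
    x = rng.normal(size=n)
    y = x + rng.uniform(0.0, 1.0, size=n)   # y dominates x in every coordinate
    assert f(y) >= f(x) - 1e-12
print("f = l + <z, .> was nondecreasing on every sampled ordered pair")
```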
Theorem 1 (Monotone functions in finite dimensions). Suppose that
f : IRn → IR satisfies f (x) ≤ f (y) whenever xi ≤ yi , i = 1, 2, . . . , n. Then:
(a) f is measurable.
(b) If, for some d with di > 0 for i = 1, 2, . . . , n, the function t → f (x0 + td)
is lower semicontinuous at t = 0, then f is lower semicontinuous at x0 .
Similarly for upper semicontinuity.

(c) f is almost everywhere continuous.


(d) If f is Gâteaux differentiable at x0 , then it is Fréchet differentiable at x0 .
(e) Let f̄ be the lower semicontinuous hull of f . Then f̄ is continuous at x0 if and only if f is. Similarly, f̄ is Gâteaux differentiable at x0 if and only if f is, and if these functions are Gâteaux differentiable, their derivatives agree.
(f ) f is almost everywhere Fréchet differentiable.

For details and proofs consult Chabrillac and Crouzeix [5]. Statements (c)
and (f ) generalize the Lebesgue monotone differentiability theorem and in
fact can be deduced from the two-dimensional version given by S. Saks [9];
for details consult Borwein, Burke and Lewis [2]. The Banach space version
of (c) and (f ) of Theorem 1, proved by Borwein, Burke and Lewis, requires a
notion of a null set in a Banach space. We recall that a Banach space does not
admit a Haar measure unless it is finite-dimensional, and proceed to make the
following definitions – for details on these, and other measure-related notions
we use in this chapter, we refer the reader to Benyamini and Lindenstrauss
[1].
Let X be a separable Banach space. A probability measure μ on X is
called Gaussian if for every x∗ ∈ X∗, the measure μx∗ on the real line, defined by μx∗(A) = μ{y | ⟨x∗, y⟩ ∈ A}, has a Gaussian distribution. It is additionally called nondegenerate if for every x∗ ≠ 0 the distribution μx∗ is
nondegenerate. A Borel set C ⊂ X is called Gauss null if μ(C) = 0 for every
nondegenerate Gaussian measure on X. It is known that the set of points
where a given Lipschitz function f : X → IR is not Gâteaux differentiable is
Gauss null. This in fact holds for functions with values in a space with the
Radon–Nikodym property (Benyamini and Lindenstrauss [1] Theorem 6.42),
whereas it fails completely for the stronger notion of Fréchet differentiability.

Theorem 2 (Borwein, Burke and Lewis). Let X be a separable space


and let K ⊂ X be a convex cone with non-empty interior. If f : X →
IR is K-monotone, then it is continuous and Hadamard (so also Gâteaux)
differentiable except at the points of a Gauss null set.

In what follows, we show that all the assumptions in the above theorem
are necessary, and, more generally, demonstrate how the properties of cone-
monotone functions described in Theorem 1 fail to extend to a general Banach
space. Note that the results of Theorems 1 and 2 hold if the functions are
allowed to take on infinite values, as long as appropriate meaning is given
to the continuity or differentiability of such functions. Indeed, composing a
possibly infinite-valued function with, for example, an inverse tangent does
not change its monotonicity properties, while leading to finite values. We do
not address this further, and work with finite-valued functions. Moreover, we
only work with monotone functions which are in fact nondecreasing (homo-
tone) and note that nonincreasing (antitone) functions can be treated in a
symmetric fashion.

1.2 Examples

We begin the section by showing that K-monotonicity of a function f : X → IR, with X finite- or infinite-dimensional, carries little information about the function if the cone K is not generating, that is, if K − K ≠ X. (Recall that K − K is always a linear subspace of X.) An interesting example of a nongenerating
cone K is provided by nonnegative and nondecreasing functions in X =
C[0, 1]: K − K is the non-closed set of all functions of bounded variation,
dense in (but not equal to) X. Note that the example below, as well as
Example 2, could be considered in a general vector space.

Example 1 (a nongenerating cone). Suppose that K − K ≠ X. Let L ⊃ K − K be a hyperplane in X, not necessarily closed, and let l be the (one-
dimensional) algebraic complement of L in X, that is, L + l = X, L ∩ l = 0.
Such L and l can be constructed using one of the versions of the Separation
Principle: for any point l0 not in the intrinsic core (relative algebraic interior)
of the convex set K − K (as K − K is linear, for any point not in K − K)
there exists a linear functional (not necessarily continuous) φ on X such that ⟨φ, l0⟩ > ⟨φ, K − K⟩; see Holmes [6], page 21. Now let L = ker φ, l = IR l0.
Define Pl (x) to be the projection of x onto l – the unique point of l such
that x ∈ Pl (x) + L. Given any function g : l → IR, let

f (x) = g(Pl (x)).

The function f : X → IR is K-monotone: if y ≥K x, that is, y − x ∈ K, then Pl(x) = Pl(y), and in effect, f(y) = f(x). Now note that at any point x ∈ X,
the function f has the properties of g “in the direction of l.” Consequently,
in this direction, f may display any desired irregular behavior.
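A finite-dimensional analogue may make the construction concrete. The sketch below is an illustration, not part of the text: it assumes Python with NumPy and the specific choices X = IR^2, K = {(t, 0) : t ≥ 0} (so K − K is the x-axis and K is not generating), L = the x-axis and l = the y-axis; f = g ∘ Pl is then K-monotone however irregular g is, because moving in a K-direction leaves the projection onto l unchanged.

```python
import numpy as np

# Illustrative finite-dimensional analogue of Example 1: X = R^2,
# K = {(t, 0) : t >= 0} is a nongenerating cone, L = x-axis, l = y-axis.

def P_l(x):
    # projection onto l along L: keep only the y-coordinate
    return x[1]

def g(t):
    # an arbitrarily irregular function of one real variable
    return float(np.sign(np.sin(1.0 / t))) if t != 0 else 0.0

def f(x):
    # f = g o P_l is K-monotone regardless of how badly g behaves
    return g(P_l(x))

rng = np.random.default_rng(1)
for _ in range(1_000):
    x = rng.normal(size=2)
    y = x + np.array([rng.uniform(0.0, 5.0), 0.0])   # y - x lies in K
    assert f(y) == f(x)           # projections onto l agree, so the values agree
print("f is constant along K-directions, hence (trivially) K-monotone")
```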

In light of the example above, in what follows we only discuss generating cones. A cone K is certainly generating when int K ≠ ∅, as then the linear
subspace K − K has interior. More generally, K is generating when the core
(algebraic interior) of K is not empty. These conditions are met by nonneg-
ative cones in C[a, b] or l∞ but not by all generating cones — consider for
example nonnegative cones in c0 or lp , 1 ≤ p < ∞. A condition equivalent
to K − K = X is that for any x, y ∈ X there exists an upper bound of x
and y: an element z ∈ X such that z ≥K x, z ≥K y. Consequently, nonnega-
tive cones are generating in Banach lattices, as in such spaces, the maximum
of two elements (with respect to the nonnegative cone) is defined. Banach
lattices include Lp , lp , C[a, b] and c0 spaces. When the subspace K − K is
dense in X, K is generating whenever K − K is additionally closed; equiv-
alently, when the difference of polar cones to K is closed. Finally, in some
cases, measure-theoretic arguments lead to conclusions that int(K − K) is
nonempty, under some assumptions on K. We take advantage of such argu-
ments in Example 4.

In what follows, we assume that the spaces in question are infinite-dimensional. The next example involves a nonnegative cone defined through
a Hamel basis.

Example 2 (lack of continuity, general Banach space). Consider an ordered Hamel basis in X (every element of X is a finite linear combination of ele-
ments of such a basis, the latter is necessarily uncountable). Let K be a cone
of elements, all of which have nonnegative coordinates. This cone is closed,
convex, generating and has empty interior. Define a function f : X → IR by
setting f (x) = 1 if the last nonzero coordinate of x is positive, and f (x) = 0
otherwise. Then f is K-monotone: if y − x ∈ K, then the last coordinate α at which x and y differ must satisfy yα > xα. If the α-th coordinate is the last
nonzero one for either x or y, then f (y) ≥ f (x). In the opposite case, the last
nonzero coordinates for x and y agree, and f (x) = f (y).
In any neighborhood of any point in X, there are points y with f (y) = 0
as well as points z with f (z) = 1. Indeed, if xα is the last nonzero coordinate
of x, we can perturb x by assigning a small negative or positive value to
some xβ , with β > α. In effect, f is neither lower nor upper semicontinuous.
However, for any nonzero x ∈ X, the function t → f (x + tx) is continuous at
t = 0 (multiplying the last nonzero coordinate by 1 + t does not change its
sign). Moreover, the lower semicontinuous hull of f is the constant function
l(x) = 0, and the upper semicontinuous hull of f is the constant function
u(x) = 1. Both of these are smooth.
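The "last nonzero coordinate" argument in Example 2 can also be exercised in finite dimensions, where the ordered Hamel basis may be taken to be the standard basis of IR^n. The check below is an illustrative sketch, not from the text (Python with NumPy; integer coordinates are used only to keep comparisons exact); note that in IR^n this f is almost everywhere continuous, so the discontinuity pathology itself is genuinely infinite-dimensional.

```python
import numpy as np

def f(x):
    # 1 if the last nonzero coordinate of x is positive, 0 otherwise (0 at x = 0)
    nz = np.flatnonzero(x)
    return 0 if nz.size == 0 else int(x[nz[-1]] > 0)

rng = np.random.default_rng(2)
for _ in range(10_000):
    x = rng.integers(-3, 4, size=6)
    y = x + rng.integers(0, 4, size=6)   # y - x has nonnegative coordinates
    assert f(y) >= f(x)                  # monotone w.r.t. the nonnegative cone
print("the last-nonzero-coordinate sign function was monotone on all sampled pairs")
```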

The preceding example demonstrated the failure of (c) and (e) of Theorem
1 in a general Banach space. We note that the cone K in this example is a
variation on one given by Klee [7], of a pointed convex cone dense in X
(pointed means that K ∩ −K = 0). Such a set is obtained by considering all
elements of X for which the last nonzero coordinate in a given Hamel basis
is positive.
In the example below, we use a Schauder basis to construct a cone-
monotone function violating (c) and (e) of Theorem 1 similarly to Example
2, however continuous at every point in a dense set of directions.

Example 3 (lack of continuity but continuity on a dense set of directions). Let {xi}∞i=1 be a Schauder basis of X, and let {x∗i}∞i=1 be the associated projections, so that x = ∑_{i=1}^{∞} ⟨x, x∗i⟩ xi for any x ∈ X. We assume that the basis is unconditional, that is, for any x the sum ∑_{i=1}^{∞} εi ⟨x, x∗i⟩ xi converges for any combination of signs εi = ±1, and consequently, ∑_{j=1}^{∞} ⟨x, x∗ij⟩ xij converges for any subsequence {ij}∞j=1. The standard bases in c0 and lp with p < +∞ satisfy this condition.
Define a cone K ⊂ X by K = co(cone{xi }∞ i=1 ), a closed convex hull of
the cone generated by xi ’s – equivalently, K = {x |
x, x∗i ≥ 0 for all i =
1, 2, . . .}. As the basis is unconditional, any x can be decomposed into a
sum of an element with positive coordinates and an element with negative
coordinates, and thus the cone K is generating. Let f : X → IR be given by
8 J. Borwein and R. Goebel

f (x) = lim sup sign


x, x∗j + ,
j→∞

where a+ = max{0, a}. Then f is K-monotone. Indeed, if x ≤K y, that is,


y − x ∈ K, then
y − x, x∗j ≥ 0 – equivalently
y, x∗j ≥
x, x∗j – for all x∗j .
This implies that f (x) ≤ f (y).
Note that the sets {x | f (x) = 0} and {x | f (x) = 1} are dense in X:
we have f (x) = 0for any x ∈ span({xi }∞ i=1 ) whereas f (x) = 1 for any x ∈

span({xi }∞i=1 ) +
−i
i=1 2 xi / xi . As a result, f (x) is nowhere continuous,
whereas for any x ∈ X there exists d ∈ X such that f (x + td) is continuous
at t = 0. In fact, f (x + txi ) is continuous in t, with f (x + txi ) = f (x) for any t
and any xi . In greater generality, f (x + y) = f (x) for any y ∈ span({xi }∞ i=1 ),
as for large enough j, we have
x + y, x∗j =
x, x∗j . Thus there exists a set
D dense in X such that, for every x ∈ X and every d ∈ D, the function
t → f (x + td) is continuous.
As in Example 2 the lower and upper semicontinuous hulls of f are the
constant functions l(x) = 0 and u(x) = 1, respectively, and in particular,
they are smooth.

We now need to introduce another notion of a null set in a separable


Banach space, more general than that of a Gauss null set, described in the
comments preceding Theorem 2. A Borel set C ⊂ X is called Haar null if
there is a Borel probability measure μ on X such that μ(C + x) = 0 for all
x ∈ X. Haar null sets include all Gauss null sets. The nonnegative cone in lp ,
1 ≤ p < ∞, is Haar null but not Gauss null (in fact it is not σ-directionally
null, a notion weaker than Gauss null), whereas the nonnegative cone in c0 is
not Haar null. In the example below, we use a fact that follows from Theorem
6.4 in Benyamini and Lindenstrauss [1]: if a set S is not Haar null, then S − S
contains a neighborhood of 0.

Example 4 (continuity, but only on a dense subset of a separable and non-


reflexive Banach space). Let X be a separable, nonreflexive space, and
Y ⊂ X a hyperplane not containing 0. Let C ⊂ Y be a closed convex set,
with empty interior and not Haar null with respect to Y . In X, consider
K = IR+ C = {rC | r ∈ [0, +∞)}, and note that this set is a closed con-
vex cone: any description of C as {x ∈ X |
x, aγ ≥ bγ , γ ∈ Γ } for aγ ∈ X ∗ ,
bγ ∈ IR leads to K = {x ∈ X |
x, aγ ≥ bγ , γ ∈ Γ, bγ = 0}. Moreover, K has
empty interior and is not Haar null. Indeed, suppose that μ(K) = 0 for some
Borel probability measure on X. Then μ (C) = 0 where μ is a Borel prob-
ability measure on Y defined by μ (A) = μ(IR+ A), and this contradicts C
being non Haar null. Also note that K − K = X, as K − K is a cone, and,
since K is not Haar null, K − K is a neighborhood of 0.
Define a function f : X → IR by

0 if x ∈ −K,
f (x) =
1 if x ∈ −K.
1 On the nondifferentiability of cone-monotone functions in Banach spaces 9

We check that f is K-monotone. The only way this could fail is if for some
y ≥K x, f (y) = 0 and f (x) = 1. But f (y) = 0 states that y ∈ −K, y ≥K x
implies x = y − k for some k ∈ K, and since −K is a convex cone, x =
y + (−k) ∈ −K + (−K) = −K. Thus x ∈ −K and f (x) cannot equal 1. Thus
f is K-monotone. Additionally, as K is closed and convex, the function f is,
respectively, lower semicontinuous and quasiconvex (level sets {x | f (x) ≤
r} are closed and convex). Moreover, f is generically continuous by Fort’s
theorem (see Borwein, Fitzpatrick and Kenderov [3]).
However, for every x ∈ −K, there exists d ∈ X such that t → f (x +
td) is not continuous at t = 0. Indeed, suppose that this failed for some
x0 ∈ −K. Then for every d ∈ X there exists (d) > 0 so that |t| < (d)
implies f (x0 + td) = 0, and so x0 + td ∈ −K. Thus x0 is an absorbing point
of a closed convex set −K, and, as X is barelled, x0 ∈ int(−K). But the
latter set is empty.
To sum up, f is continuous on a dense set but it is not continuous (and
not differentiable) at any point of the given non Haar null set.

The closed convex set C ⊂ Y in the example above was chosen to be not
Haar null and has no interior. Such a set exists in any nonreflexive space,
and in fact can be chosen to contain a translate of every compact subset of
the space – see Benyamini and Lindenstrauss [1]. In c0 , the nonnegative cone
is such a set (this requires the cone to be not Haar null), whereas the Haar
null nonnegative cone in l1 is not. Still, in l1 , and in fact in any nonreflexive
space, a cone satisfying the mentioned conditions can be found.
Indeed, suppose the set C of Example 4 contains translates of all compact
subsets of Y . We show that the constructed cone K contains a translate of
every compact subset of X. Pick any compact D ⊂ X. Let g ∈ X ∗ be such
that Y = g −1 (1). Shift D by z1 so that mind∈D+z1 g(d) = 1, and moreover,
so that (D + z1 ) ∩ C = ∅. Pick any v ∈ (D + z1 ) ∩ C, and let E ⊂ Y be the
projection of D onto Y in the direction v. Then E is a compact subset of Y ,
and thus for some z2 ∈ ker g, E + z2 ⊂ C. Now note that E + z2 is exactly
the projection in the direction v onto Y of the set D + z1 + z2 , which implies
that the latter set is a subset of C + IR+ v. Now C + IR+ v ⊂ K, as C ⊂ K
and v ∈ C. In effect, K contains D + z1 + z2 .
We now address another question on regularity of cone-monotone func-
tions. Weak and strong notions of lower semicontinuity for convex functions
agree. One cannot expect the same to hold for monotone functions, as the
following example demonstrates.

Example 5 (Lipschitz continuity, but no weak continuity). Let X = c0 with


the supremum norm. The nonnegative cone K is closed, convex, has empty
interior but is not Haar null. Fix any a ∈ X with a > 0 (a has positive
coordinates) and define f : X → IR by

x+
f (x) = .
x+ + (a − x)+
10 J. Borwein and R. Goebel

The denominator is never 0, as at least one of the summands is always posi-


tive, and thus f is continuous. In fact, x+ + (a − x)+ ≥ a , and since
both the numerator and denominator are Lipschitz, so is f .
Note also that f (X) = [0, 1], with f (x) = 0 if and only if x ≤ 0, and
f (x) = 1 if and only if x ≥ a. We check that f is monotone. For any y ≥ x
we have y + ≥ x+ , and (a − x)+ ≥ (a − y)+ , since a − x ≥ a − y. Then

y + x+ x+
≥ + ≥ + ,
y + + (a − y)
+ x + (a − y)
+ x + (a − x)+

where the first inequality stems from the fact that for a fixed β ≥ 0, the
function α → α/(α + β) is nondecreasing. Thus f is monotone. Let {en }∞
n=1
be the standard unit vectors in X. Notice that for any fixed α > 0 and
large enough n, we have (x − αen )+ = x+ and (a − x + αen )+ =
max{ (a − x)+ , α}. In effect,

(x − αen )+
f (x − αen ) =
(x − αen )+ + (a − x + αen )+
x+
→ +
x + max{ (a − x)+ , α}

as n → ∞. Note that the last expression is less than f (x) whenever x+ > 0
and (a − x)+ < α. Similar analysis leads to

max{ x+ , α}
f (x + αen ) → ,
max{ x+ , α} + (a − x)+

with the limit greater than f (x) when (a − x)+ > 0 and x+ < α. For a
given α, the vectors αen converge weakly to 0. The constant α can be fixed
arbitrarily large, and thus the function f is not weakly lower semicontinuous
at any x with x+ > 0 (equivalently x ∈ −K), and not weakly upper
semicontinuous at any x with (a − x)+ > 0 (equivalently x ∈ a + K).
Consider any x with xn < 0 for all n. It is easy to verify that

f (x + th) − f (x)
lim =0
t→0 t h

for all h ∈ c00 , that is, for sequences h with finitely many nonzero entries (in
fact, the difference quotient is then 0 for all small enough t). As c00 is dense
in X, and f is Lipschitz continuous, f is Gâteaux differentiable at x, with
the derivative equal to 0 ∈ X ∗ . Similarly, f has Gâteaux derivative 0 ∈ X ∗
at every x such that xn > an for all n.
Theorem 2 of Burke, Borwein and Lewis states that functions monotone
with respect to a cone with interior are Gâteaux differentiable outside a Gauss
null set. In the example below, we show a failure of that conclusion for a cone
with empty interior, even in a Hilbert space.
1 On the nondifferentiability of cone-monotone functions in Banach spaces 11

We first recall that the nonnegative cone in c0 is not Haar null, and so
is not Gaussian null, whereas those in lp , 1 < p < ∞ are Haar null but not
Gauss null. To see that the nonnegative cone in l2 is not Gauss null, observe
for example that it contains the interval

J = {x ∈ l2 | 0 ≤ xn ≤ 1/8n , n = 1, 2, . . .}

and apply the fact that the closed convex hull of any norm-convergent to 0
sequence with dense span is not Gauss null (Theorem 6.24 in Benyamini and
Lindenstrauss). The non Gauss null interval J will be used in the example
below.
Example 6 (Holder continuity, lack of Gâteaux differentiability). We show
that the Holder continuous function

f (x) = x+ ,

monotone with respect to the nonnegative cone K, fails to be Gâteaux dif-


ferentiable at any point of −K, both in c0 with the supremum norm and in
the Hilbert space l2 with the standard l2 –norm.
We discuss the c0 case first. Pick any x ∈ −K. If xn = 0 for some n,
then considering xn + ten shows that f is not Gâteaux differentiable (the
directional derivative in the direction of en is infinite). Suppose that xn < 0
for all n. Let h be given by hn = (−xn )1/3 , and consider tk = 2(−xk )2/3
converging to 0 as k → ∞. We have

f (x + tk h) − f (x) f (x + tk h) xk + tk hk
= ≥
tk tk tk
1/2
(−xk ) 1
= = ,
2(−xk )2/3 2(−xk )1/6

and the last expression diverges to +∞ as k → ∞. Thus f is not differentiable


at x.
We now turn to l2 , and first show that f fails to be Gâteaux differentiable
on the non Gauss null interval

−J = {x ∈ l2 | − 1/8n ≤ xn ≤ 0, n = 1, 2, . . .}.

Indeed, for any x ∈ −J, consider h with hn = 1/2n and tk = 1/2k . Then
√  
f (x + tk h) − f (x) xk + tk hk −1/8k + 1/4k
≥ ≥ = 1 − 1/2k ,
tk tk 1/2k

and, if the function was Gâteaux differentiable at x, the directional derivative


in the direction h would be at least 1. But this is impossible, as x provides a
global minimum for f .
12 J. Borwein and R. Goebel

To see that f fails to be Gâteaux differentiable at any point x ∈ −K, note


that for some sequence ni , we have −1/8i ≤ xni ≤ 0. An argument as above,
with hni = 1/2i and 0 otherwise, leads to the desired conclusion.
A slight variation on Example 6 leads to a continuous function on c0 ,
monotone with respect to the nonnegative cone K, but not Gâteaux differ-
entiable at any point of a dense set c00 − K (any point with finite number of
positive coefficients). Let {ki }∞i=1 be dense in K, and define

∞ 
(x − ki )+
f (x) = .
i=1
2i

Monotonicity is easy to check, and f is continuous as (x − ki )+ ≤ x+ . If


for some i, x ≤K ki , then f is not Gâteaux at x. Indeed, in such a case we
have, for h ≥K 0 and t > 0,
 
f (x + th) − f (x) (x − ki + th)+ − (x − ki )+
≥ .
t 2i t
Picking t and h as in Example 6 (for x − Ki ) leads to the desired conclusion.
The close relationship of cone-monotone and Lipschitz functions suggests
that badly behaved cone-monotone functions will exist in spaces where irreg-
ular Lipschitz functions do. For example,

p(x) = lim sup |xn |


n→∞

is a nowhere Gâteaux differentiable continuous seminorm in l∞ , see Phelps


[8]. Arguments similar to those given by Phelps show that the function f :
l∞ → IR, given by

n − lim sup xn ,
f (x) = lim sup x+
n→∞ n→∞

though monotone with respect to the nonnegative cone, is not Gâteaux dif-
ferentiable outside c0 , that is, at any x for which at least one of lim sup x+ n
and lim sup x− − −
n is positive. (Recall α = −α when α < 0 and α = 0 other-
wise.) Indeed, suppose that lim sup x+ n = α > 0, and choose a subsequence
nk so that lim xnk = α. Define h by hn2i = 1, hn2i+1 = −1, i = 1, 2, . . .,
and hn = 0 for n = nk , k = 1, 2, . . .. Notice that for t close enough to 0,
lim sup(x + th)+n = α + |t| (if t > 0 then (x + th)n2i = xn2i + t, i = 1, 2, . . .,
and these terms provide the lim sup(x + th)+ n ; if t < 0 one should consider
(x + th)n2i+1 = xn2i+1 − t). On the other hand, lim sup(x + th)− −
n = lim sup xn ,
and in effect, the limit of (f (x + th) − f (x))/t as t → 0 does not exist. The
case of lim sup x−
n = β > 0 can be treated in a symmetric fashion.
Borwein, Burke and Lewis [2] show that any directionally Lipschitz func-
tion decomposes (locally) into a linear function, and a cone-monotone one
(with respect to a cone with interior). Consequently, nondifferentiable
1 On the nondifferentiability of cone-monotone functions in Banach spaces 13

Lipschitz functions lead to local examples of nonregular cone-monotone func-


tions. On spaces where there exist nowhere differentiable globally Lipschitz
functions, like l∞ or l1 (Γ ) with Γ uncountable, one can in fact construct
nowhere Gâteaux cone-monotone functions; we carry this out explicitly in
our final example. We note that the technique of Example 7 can be used
to construct cone-monotone functions (with respect to cones with nonempty
interiors) from any given Lipschitz function, on spaces like c0 and lp . Also
note that spaces which admit a nowhere Fréchet differentiable convex func-
tion (spaces which are not Asplund spaces) also admit a nowhere Fréchet
renorm (and so a nowhere Fréchet globally Lipschitz function); the situation
is not well understood for Gâteaux differentiability.
Example 7 (a nowhere Gâteaux differentiable function on l∞ ). As discussed
above, p(x) = lim supn→∞ |xn | is nowhere Gâteaux differentiable on l∞ . We
use this fact to construct a nowhere Gâteaux differentiable function, mono-
tone with respect to a cone with interior.
Let e1 be the first of the standard unit vectors in l∞ , and consider
the

function f (x) = p(x) +


e1 , x = p(x) + x1 and the cone K = IR+ IB 1/2 (e1 )
(the cone generated by the closed ball of radius 1/2 centered at e1 ). Then K
has interior and f is K-monotone. Indeed, as for any x ∈ IB 1/2 (e1 ), x1 ≥ 1/2
while xr ≤ 1/2, r = 2, 3, . . ., we have, for any k ∈ K, k = k1 (for the
supremum norm). As p(x) is Lipschitz continuous, with constant 1, we obtain,
for any x ∈ X, k ∈ K,

p(x + k) − p(x) ≥ − k ≥ −
e1 , k ,

which translates to

p(x + k) +
e1 , x + k ≥ p(x) +
e1 , x ,

and this means that f is K-monotone. As p is nowhere Gâteaux differentiable,


so is f .

Acknowledgments The first author’s research was partially supported by NSERC and
by the Canada Research Chair Programme. The second author performed this research at
the Centre for Experimental and Constructive Mathematics at Simon Fraser University
and at the Department of Mathematics at University of British Columbia.

References

1. Y. Benyamini and J. Lindenstrauss,Geometric Nonlinear Functional Analysis, Vol. 1


(American Mathematical Society, Providence, RI, 2000).
2. J. M. Borwein, J. Burke and A. S. Lewis, Differentiability of cone–monotone functions
on separable Banach space, Proc. Amer. Math. Soc., 132 (2004), 1067–1076.
3. J. M. Borwein, S. Fitzpatrick and P. Kenderov, Minimal convex uscos and monotone
operators on small sets, Canad. J. Math. 43 (1991), 461–477.
14 J. Borwein and R. Goebel

4. J. Burke, A. S. Lewis and M. L. Overton, Approximating subdifferentials by random


sampling of gradients, Math. Oper. Res. 27 (2002), 567–584.
5. Y. Chabrillac and J.-P. Crouzeix, Continuity and differentiability properties of mono-
tone real functions of several real variables, Nonlinear Analysis and Optimization
(Louvain-la-Neuve, 1983); Math. Programming Stud. 30 (1987), 1–16.
6. R. B. Holmes, Geometric Functional Analysis and its Applications (Springer-Verlag,
New York, 1975).
7. V. Klee, Convex sets in linear spaces, Duke Math. J. 8 (1951), 433–466.
8. R. Phelps, Convex Functions, Monotone Operators and Differentiability (Springer-
Verlag, New York, 1993).
9. S. Saks, Theory of the Integral, English translation, second edition (Stechert, New York,
1937).
Chapter 2
Duality and a Farkas lemma
for integer programs

Jean B. Lasserre

Abstract We consider the integer program max{c x | Ax = b, x ∈ Nn }. A


formal parallel between linear programming and continuous integration, and
discrete summation, shows that a natural duality for integer programs can
be derived from the Z-transform and Brion and Vergne’s counting formula.
Along the same lines, we also provide a discrete Farkas lemma and show that
the existence of a nonnegative integral solution x ∈ Nn to Ax = b can be
tested via a linear program.

Key words: Integer programming, counting problems, duality

2.1 Introduction

In this paper we are interested in a comparison between linear and integer


programming, and particularly in a duality perspective. So far, and to the
best of our knowledge, the duality results available for integer programs are
obtained via the use of subadditive functions as in Wolsey [21], for exam-
ple, and the smaller class of Chvátal and Gomory functions as in Blair and
Jeroslow [6], for example (see also Schrijver [19, pp. 346–353]). For more de-
tails the interested reader is referred to [1, 6, 19, 21] and the many references
therein. However, as subadditive, Chvátal and Gomory functions are only
defined implicitly from their properties, the resulting dual problems defined
in [6] or [21] are conceptual in nature and Gomory functions are used to
generate valid inequalities for the primal problem.
We claim that another natural duality for integer programs can be derived
from the Z-transform (or generating function) associated with the counting

Jean B. Lasserre
LAAS-CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse Cédex 4, FRANCE
e-mail: [email protected]

C. Pearce, E. Hunt (eds.), Structure and Applications, Springer Optimization 15


and Its Applications 32, DOI 10.1007/978-0-387-98096-6 2,
c Springer Science+Business Media, LLC 2009
16 J.B. Lasserre

version (defined below) of the integer program. Results for counting problems,
notably by Barvinok [4], Barvinok and Pommersheim [5], Khovanskii and
Pukhlikov [12], and in particular, Brion and Vergne’s counting formula [7],
will prove especially useful.
For this purpose, we will consider the four related problems P, Pd , I and
Id displayed in the diagram below, in which the integer program Pd appears
in the upper right corner.

Continuous Optimization Discrete Optimization


− | −

P : f (b, c) := max c x P d : fd (b, c) := max c x
Ax = b ←→ Ax = b
s.t. s.t.
x ∈ Rn+ x ∈ Nn
 − 
Integration Summation
− | − 

εc x
I : f (b, c) :=

εc x ds Id : fd (b, c) :=
Ω(b)
←→ x∈Ω(b)
Ax = b Ax = b
Ω(b) := Ω(b) :=
x ∈ Rn+ x ∈ Nn

Problem I (in which ds denotes the Lebesgue measure on the affine variety
{x ∈ Rn | Ax = b} that contains the convex polyhedron Ω(b)) is the inte-
gration version of the linear program P, whereas Problem Id is the counting
version of the (discrete) integer program Pd .
Why should these four problems help in analyzing Pd ? Because first, P
and I, as well as Pd and Id , are simply related, and in the same manner.
Next, as we will see, the nice and complete duality results available for P, I
and Id extend in a natural way to Pd .

2.1.1 Preliminaries

In fact, I and Id are the respective formal analogues in the algebra (+, ×) of
P and Pd in the algebra (⊕, ×), where in the latter, the addition a ⊕ b stands
for max(a, b); indeed, the “max” in P and Pd can be seen as an idempotent
integral (or Maslov integral) in this algebra (see, for example, Litvinov et al.
[17]). For a nice parallel between results in probability ((+, ×) algebra) and
optimization ((max, +) algebra), the reader is referred to Bacelli et al. [3,
Section 9].
Moreover, P and I, as well as Pd and Id , are simply related via

εf (b,c) = lim f (b, rc)1/r ; εfd (b,c) = lim f d (b, rc)1/r . (2.1)
r→∞ r→∞
2 Duality and a Farkas lemma for integer programs 17

Equivalently, by continuity of the logarithm,


1 1
f (b, c) = lim ln f (b, rc); fd (b, c) = lim ln fd (b, rc), (2.2)
r→∞ r r→∞ r
a relationship that will be useful later.
Next, concerning duality, the standard Legendre-Fenchel transform which
yields the usual dual LP of P,

P∗ → minm {b λ | A λ ≥ c}, (2.3)


λ∈R

has a natural analogue for integration, the Laplace transform, and thus the
inverse Laplace transform problem (that we call I∗ ) is the formal analogue of
P∗ and provides a nice duality for integration (although not usually presented
in these terms). Finally, the Z-transform is the obvious analogue for summa-
tion of the Laplace transform for integration. We will see that in the light of
recent results in counting problems, it is possible to establish a nice duality
for Id in the same vein as the duality for (continuous) integration and by
(2.2), it also provides a powerful tool for analyzing the integer program Pd .

2.1.2 Summary of content

(a) We first review the duality principles that are available for P, I and Id
and underline the parallels and connections between them. In particular, a
fundamental difference between the continuous and discrete cases is that in
the former, the data appear as coefficients of the dual variables whereas in the
latter, the same data appear as exponents of the dual variables. Consequently,
the (discrete) Z-transform has many more poles than the Laplace transform.
Whereas the Laplace transform has only real poles, the Z-transform has ad-
ditional complex poles associated with each real pole, which induces some
periodic behavior, a well-known phenomenon in number theory where the
Z-transform (or generating function) is a standard tool (see, for example, Io-
sevich [11], Mitrinovı́c et al. [18]). So, if the procedure of inverting the Laplace
transform or the Z-transform (that is, solving the dual problems I∗ and I∗d )
is basically of the same nature, that is, a complex integral, it is significantly
more complicated in the discrete case, due to the presence of these additional
complex poles.
(b) Then we use results from (a) to analyze the discrete optimization
problem Pd . Central to the analysis is Brion and Vergne’s inverse formula
[7] for counting problems. In particular, we provide a closed-form expression
for the optimal value fd (b, c) which highlights the special role played by the
so-called reduced costs of the linear program P and the complex poles of the
Z-transform associated with each basis of the linear program P. We also
show that each basis B of the linear program P provides exactly det(B)
18 J.B. Lasserre

complex dual vectors in Cm , the complex (periodic) analogues for Pd of the


unique dual vector in Rm for P, associated with the basis B. As in linear
programming (but in a more complicated way), the optimal value fd (b, c) of
Pd can be found by inspection of (certain sums of) reduced costs associated
with each vertex of Ω(b).
(c) We also provide a discrete Farkas lemma for the existence of nonneg-
ative integral solutions x ∈ Nn to Ax = b. Its form also confirms the special
role of the Z-transform described earlier. Moreover, it allows us to check
the existence of a nonnegative integral solution by solving a related linear
program.

2.2 Duality for the continuous problems P and I

With A ∈ Rm×n and b ∈ Rm , let Ω(b)Rn be the convex polyhedron

Ω(b) := {x ∈ Rn | Ax = b; x ≥ 0}, (2.4)

and consider the standard linear program (LP)

P: f (b, c) := max{c x | Ax = b; x ≥ 0} (2.5)

with c ∈ Rn , and its associated integration version




I : f (b, c) :=

εc x ds (2.6)
Ω(b)

where ds is the Lebesgue measure on the affine variety {x ∈ Rn | Ax = b}


that contains the convex polyhedron Ω(b).
For a vector c and a matrix A we denote by c and A their respective
transposes. We also use both the notation c x and
c, x for the usual scalar
product of two vectors c and x. We assume that both A ∈ Rm×n and b ∈ Rm
have rational entries.

2.2.1 Duality for P

It is well known that the standard duality for (2.5) is obtained from the
Legendre-Fenchel transform F (., c) : Rm → R of the value function f (b, c)
with respect to b, that is, here (as y → f (y, c) is concave)

λ → F (λ, c) := infm
λ, y − f (y, c), (2.7)
y∈R

which yields the usual dual LP problem

P∗ → inf
λ, b − F (λ, c) = minm {b λ | A λ ≥ c}. (2.8)
λ∈Rm λ∈R
2 Duality and a Farkas lemma for integer programs 19

2.2.2 Duality for integration

Similarly, the analogue for integration of the Fenchel transform is the two-
sided Laplace transform F (., c) : Cm → C of f (b, c), given by

λ → F (λ, c) := ε−λ,y f (y, c) dy. (2.9)
Rm

It turns out that developing (2.9) yields


n
1
F (λ, c) = whenever Re(A λ − c) > 0, (2.10)
(A λ − c)k
k=1

(see for example [7, p. 798] or [13]). Thus F (λ, c) is well-defined provided

Re(A λ − c) > 0, (2.11)

and f (b, c) can be computed by solving the inverse Laplace transform prob-
lem, which we call the (integration) dual problem I∗ of (2.12), that is,
γ+i∞
1
I∗ → f (b, c) := εb,λ F (λ, c) dλ
(2iπ)m γ−i∞

1 γ+i∞
εb,λ
= dλ, (2.12)
(2iπ)m γ−i∞ 
n

(A λ − c)k
k=1

where γ ∈ Rm is fixed and satisfies A γ −c > 0. Incidentally, observe that the


domain of definition (2.11) of F (., c) is precisely the interior of the feasible
set of the dual problem P∗ in (2.8). We will comment more on this and the
link with the logarithmic barrier function for linear programming (see Section
2.2.5 below).
We may indeed call I∗ a dual problem of I as it is defined on the space Cm
of variables {λk } associated with the nontrivial constraints Ax = b; notice
that we also retrieve the standard “ingredients” of the dual optimization
problem P∗ , namely b λ and A λ − c.

2.2.3 Comparing P, P∗ and I, I∗

One may compute f (b, c) directly using Cauchy residue techniques. That is,
one may compute the integral (2.12) by successive one-dimensional complex
integrals with respect to one variable λk at a time (for example starting
with λ1 , λ2 , . . .) and by repeated application of Cauchy’s Residue Theorem
20 J.B. Lasserre

[8]. This is possible because the integrand is a rational fraction, and after
application of Cauchy’s Residue Theorem at step k with respect to λk , the
ouput is still a rational fraction of the remaining variables λk+1 , . . . , λm . For
more details the reader is referred to Lasserre and Zeron [13]. It is not difficult
to see that the whole procedure is a summation of partial results, each of them
corresponding to a (multi-pole) vector λ ∈ Rm that annihilates m terms of
n products in the denominator of the integrand.
This is formalized in the nice formula of Brion and Vergne [7, Proposition
3.3 p. 820] that we describe below. For the interested reader, there are several
other nice closed-form formulae for f (b, c), notably by Barvinok [4], Barvinok
and Pommersheim [5], and Khovanskii and Pukhlikov [12].

2.2.4 The continuous Brion and Vergne formula

The material in this section is taken from [7]. To explain the closed-form
formula of Brion and Vergne we need some notation.
Write the matrix A ∈ Rm×n as A = [A1 | . . . |An ] where Aj ∈ Rm denotes
the j-th column of A for all j = 1, . . . , n. With Δ := (A1 , . . . , An ) let C(Δ) ⊂
Rm be the closed convex cone generated by Δ. Let Λ ⊆ Zm be a lattice.
A subset σ of {1, . . . , n} is called a basis of Δ if the sequence {Aj }j∈σ is
a basis of Rm , and the set of bases of Δ is denoted by B(Δ). For σ ∈ B(Δ)
let C(σ) be the cone generated by {Aj }j∈σ . With any y ∈ C(Δ) associate
the intersection of all cones C(σ) which contain y. This defines a subdivision
of C(Δ) into polyhedral cones. The interiors of the maximal cones in this
subdivision are called chambers in Brion and Vergne [7]. For every y ∈ γ,
the convex polyhedron Ω(y) in (2.4) is simple. Next, for a chamber γ (whose
closure is denoted by γ), let B(Δ, γ) be the set of bases σ such that γ is
contained
 in C(σ), and let μ(σ) denote the volume of the convex polytope
{ j∈σ tj Aj | 0 ≤ tj ≤ 1} (normalized so that m
 vol(R /Λ) = 1). Observe that
for b ∈ γ and σ ∈ B(Δ, γ) we have b = j∈σ xj (σ)Aj for some xj (σ) ≥ 0.
Therefore the vector x(σ) ∈ Rn+ , with xj (σ) = 0 whenever j ∈ σ, is a vertex of
the polytope Ω(b). In linear programming terminology, the bases σ ∈ B(Δ, γ)
correspond to the feasible bases of the linear program P. Denote by V the
subspace {x ∈ Rn | Ax = 0}. Finally, given σ ∈ B(Δ), let π σ ∈ Rm be the
row vector that solves π σ Aj = cj for all j ∈ σ. A vector c ∈ Rn is said to
be regular if cj − π σ Aj = 0 for all σ ∈ B(Δ) and all j ∈ σ. Let c ∈ Rn
be regular with −c in the interior of the dual cone (Rn+ ∩ V )∗ (which is the
case if A u > c for some u ∈ Rm ). Then, with Λ = Zm , Brion and Vergne’s
formula [7, Proposition 3.3, p. 820] states that
 εc,x(σ)
f (b, c) =  σ
∀ b ∈ γ. (2.13)
μ(σ) k ∈σ (−ck + π Ak )
σ∈B(Δ,γ)
2 Duality and a Farkas lemma for integer programs 21

Notice that in linear programming terminology, ck − π σ Ak is simply the so-


called reduced cost of the variable xk , with respect to the basis {Aj }j∈σ .
Equivalently, we can rewrite (2.13) as
 εc,x(σ)
f (b, c) =  σ
. (2.14)
μ(σ) k ∈σ (−ck + π Ak )
x(σ): vertex of Ω(b)

Thus f (b, c) is a weighted summation over the vertices of Ω(b) whereas f (b, c)
is a maximization over the vertices (or a summation with ⊕ ≡ max).
So, if c is replaced by rc and x(σ ∗ ) denotes the vertex of Ω(b) at which

c x is maximized, we obtain
⎡ ⎤ r1
⎢  εrc,x(σ)−x(σ ) ⎥ ∗

f (b, rc)1/r = εc,x(σ ) ⎢ ⎥


⎣  ⎦
n−m μ(σ) σ
x(σ):vertex of Ω(b)
r (−c k + π Ak )
k ∈σ

from which it easily follows that

lim ln f (b, rc)1/r =


c, x(σ ∗ ) = max
c, x = f (b, c),
r→∞ x∈Ω(b)

as indicated in (2.2).

2.2.5 The logarithmic barrier function

It is also worth noticing that



1 γr +i∞
εb,λ
f (b, rc) = dλ
(2iπ)m γr −i∞ 
n

(A λ − rc)k
k=1
γ+i∞ m−n rb,λ
1 r ε
= dλ
(2iπ)m γ−i∞ 
n
(A λ − c)k
k=1

with γr = rγ and we can see that (up to the constant (m − n) ln r) the loga-
rithm of the integrand is simply the well-known logarithmic barrier function

n
−1
λ → φμ (λ, b) = μ
b, λ − ln (A λ − c)j ,
j=1
22 J.B. Lasserre

with parameter μ := 1/r, of the dual problem P∗ (see for example den Hertog
[9]). This should not come as a surprise as a self-concordant barrier function
of a cone K ⊂ Rn is given by the logarithm of the Laplace transform
φK (x)−x,s
K∗
ε ds of its dual cone K ∗ (see for example Güler [10], Truong and
Tunçel [20]).
Thus, when r → ∞, minimizing the exponential logarithmic barrier func-
tion on its domain in Rm yields the same result as taking its residues.

2.2.6 Summary

The parallel between P, P∗ and I, I∗ is summarized below.

Fenchel-duality Laplace-duality

f (b, c) := max c x
f (b, c) :=

εc x ds
Ax=b; x≥0 Ax=b; x≥0

F (λ, c) := ε−λ y f (y, c) dy

F (λ, c) := infm {λ y − f (y, c)}
y∈R Rm
1
=

n
(A λ − c)k
k=1

with : A λ − c ≥ 0 with : Re(A λ − c) > 0



1
f (b, c) = ελ b F (λ, c) dλ


f (b, c) = minm {λ b − F (λ, c)}
λ∈R (2iπ)m
Γ 
  1 ελ b
= minm {b λ | A λ ≥ c} = dλ
λ∈R (2iπ)m Γ 
n

(A λ − c)k
k=1

Simplex algorithm → Cauchy’ s Residue →


vertices of Ω(b) poles of F (λ, c)
 c x
→ max c x over vertices. → ε over vertices.

2.3 Duality for the discrete problems Id and Pd

In the respective discrete analogues Pd and Id of (2.5) and (2.6) one replaces
the positive cone Rn+ by Nn (or Rn+ ∩ Zn ), that is, (2.5) becomes the integer
program
Pd : fd (b, c) := max {c x | Ax = b; x ∈ Nn } (2.15)
2 Duality and a Farkas lemma for integer programs 23

whereas (2.6) becomes a summation over Nn ∩ Ω(b), that is,


 
Id : f d (b, c) := {εc x | Ax = b; x ∈ Nn }. (2.16)

We here assume that A ∈ Zm×n and b ∈ Zm , which implies in particular that


the lattice Λ := A(Zn ) is a sublattice of Zm (Λ ⊆ Zm ). Note that b in (2.15)
and (2.16) is necessarily in Λ.
In this section we are concerned with what we call the “dual” problem I∗d
of Id , the discrete analogue of the dual I∗ of I, and its link with the discrete
optimization problem Pd .

2.3.1 The Z-transform

The natural discrete analogue of the Laplace transform is the so-called Z-


transform. Therefore with f d (b, c) we associate its (two-sided) Z-transform
F d (., c) : Cm → C defined by

z → F d (z, c) := z −y f d (y, c), (2.17)
y∈Zm

where the notation z y with y ∈ Zm stands for z1y1 · · · zm


ym
. Applying this
definition yields

F d (z, c) = z −y f d (y, c)
y∈Zm
⎡ ⎤
  
= z −y ⎣ εc x ⎦
y∈Zm x∈Nn ; Ax=y
⎡ ⎤
  
= εc x ⎣ −ym ⎦
z1−y1 · · · zm
x∈Nn y=Ax
  −(Ax)1 −(Ax)m
= εc x z 1 · · · zm
x∈Nn

n
1
=
k=1
(1 − εck z1−A1k z2−A2k −Amk
· · · zm )
n
1
= , (2.18)
(1 − εck z −Ak )
k=1

which is well-defined provided

|z1A1k · · · zm
Amk
| (= |z Ak |) > εck ∀k = 1, . . . , n. (2.19)
24 J.B. Lasserre

Observe that the domain of definition (2.19) of F d (., c) is the exponential


version of (2.11) for F (., c). Indeed, taking the real part of the logarithm in
(2.19) yields (2.11).

2.3.2 The dual problem I∗d

Therefore the value f d (b, c) is obtained by solving the inverse Z-transform


problem I∗d (that we call the dual of Id )

1
f d (b, c) = · · · F d (z) z b−em dzm · · · dz1 , (2.20)
(2iπ)m |z1 |=γ1 |zm |=γm

where em is the unit vector of Rm and γ ∈ Rm is a (fixed) vector that


satisfies γ1A1k γ2A2k · · · γm
Amk
> εck for all k = 1, . . . , n. We may indeed call I∗d
the dual problem of Id as it is defined on the space Zm of dual variables zk
associated with the nontrivial constraints Ax = b of the primal problem Id .
We also have the following parallel.

Continuous Laplace-duality Discrete Z-duality




f (b, c) :=

εc x ds f d (b, c) :=

εc x
Ax=b; x∈Rn
+ Ax=b; x∈Nn


F (λ, c) := ε−λ y f (y, c)dy F d (z, c) := z −y f d (y, c)


Rm y∈Zm

n
1 
n
1
= =
(A λ − c)k 1 − εck z −Ak
k=1 k=1

with Re(A λ − c) > 0. with |z Ak | > εck , k = 1, . . . , n.

2.3.3 Comparing I∗ and I∗d

Observe that the dual problem I∗d in (2.20) is of the same nature as I∗ in
(2.12) because both reduce to computing a complex integral whose integrand
is a rational function. In particular, as I∗ , the problem I∗d can be solved by
Cauchy residue techniques (see for example [14]).
However, there is an important difference between I∗ and I∗d . Whereas the
data {Ajk } appears in I∗ as coefficients of the dual variables λk in F (λ, c),
it now appears as exponents of the dual variables zk in F d (z, c). And an
immediate consequence of this fact is that the rational function F d (., c) has
many more poles than F (., c) (by considering one variable at a time), and in
2 Duality and a Farkas lemma for integer programs 25

particular, many of them are complex, whereas F (., c) has only real poles. As
a result, the integration of F d (z, c) is more complicated than that of F (λ, c),
which is reflected in the discrete (or periodic) Brion and Vergne formula
described below. However, we will see that the poles of F d (z, c) are simply
related to those of F (λ, c).

2.3.4 The “discrete” Brion and Vergne formula

Brion and Vergne [7] consider the generating function H : Cm → C defined


by 
λ → H(λ, c) := f d (y, c)ε−λ,y ,
y∈Zm

which, after the change of variable zi = ελi for all i = 1, . . . , m, reduces to


F d (z, c) in (2.20).
They obtain the nice formula (2.21) below. Namely, and with the same
notation used in Section 2.2.4, let c ∈ Rn be regular with −c in the interior of
(Rn+ ∩V )∗ , and let γ be a chamber. Then for all b ∈ Λ ∩ γ (recall Λ = A(Zn )),

 
εc x(σ)
f d (b, c) = Uσ (b, c) (2.21)
μ(σ)
σ∈B(Δ,γ)

for some coefficients Uσ (b, c) ∈ R, a detailed expression for which can be


found in [7, Theorem 3.4, p. 821]. In particular, due to the occurrence of
complex
 poles in F (z, c), the term Uσ (b, c) in (2.21) is the periodic analogue
of ( k ∈σ (ck − πσ Ak ))−1 in (2.14).
Again, as for f (b, c), (2.21) can be re-written as

 
εc x(σ)
f d (b, c) = Uσ (b, c), (2.22)
μ(σ)
x(σ): vertex of Ω(b)

to compare with (2.14). To be more precise, by inspection of Brion and


Vergne’s formula in [7, p. 821] in our current context, one may see that
 ε2iπb (g)
Uσ (b, c) = , (2.23)
Vσ (g, c)
g∈G(σ)

where G(σ) := (⊕j∈σ ZAj )∗ /Λ∗ (where ∗ denotes the dual lattice); it is a
2iπb
finite abelian group of order μ(σ) and with (finitely many) characters ε
for all b ∈ Λ. In particular, writing Ak = j∈σ ujk Aj for all k ∈ σ,

ε2iπAk (g) = ε2iπ j∈σ ujk gj
k ∈ σ.
26 J.B. Lasserre

Moreover,  
1 − ε−2iπAk (g)εck −π
σ
Ak
Vσ (g, c) = , (2.24)
k ∈σ

with Ak , π σ as in (2.13) (and π σ rational). Again note the importance of the


reduced costs ck − π σ Ak in the expression for F d (z, c).

2.3.5 The discrete optimization problem Pd

We are now in a position to see how I∗d provides some nice information about
the optimal value fd (b, c) of the discrete optimization problem Pd .

Theorem 1. Let A ∈ Zm×n , b ∈ Zm and let c ∈ Zn be regular with −c in the


interior of (Rn+ ∩ V )∗ . Let b ∈ γ ∩ A(Zn ) and let q ∈ N be the least common
multiple (l.c.m.) of {μ(σ)}σ∈B(Δ,γ) .
If Ax = b has no solution x ∈ Nn then fd (b, c) = −∞, else assume that

1
max c x(σ) + lim ln Uσ (b, rc) ,
x(σ): vertex of Ω(b) r→∞ r

is attained at a unique vertex x(σ) of Ω(b). Then



 1
fd (b, c) = max c x(σ) + lim ln Uσ (b, rc)
x(σ): vertex of Ω(b) r→∞ r

1
= max c x(σ) + (deg(Pσb ) − deg(Qσb ))
x(σ): vertex of Ω(b) q
(2.25)

for some real-valued univariate polynomials Pσb and Qσb .


Moreover, the term limr→∞ ln Uσ (b, rc)/r or (deg(Pσb ) − deg(Qσb ))/q in
(2.25) is a sum of certain reduced costs ck − π σ Ak (with k ∈ σ).

For a proof see Section 2.6.1.

Remark 1. Of course, (2.25) is not easy to obtain but it shows that the optimal
value fd (b, c) of Pd is strongly related to the various complex poles of F d (z, c).
It is also interesting to note the crucial role played by the reduced costs
ck − π σ Ak in linear programming. Indeed, from the proof of Theorem 1 the
optimal value fd (b, c) is the value of c x at some vertex x(σ) plus a sum of
certain reduced costs (see (2.50) and the form of the coefficients αj (σ, c)).
Thus, as for the LP problem P, the optimal value fd (b, c) of Pd can be found
by inspection of (certain sums of) reduced costs associated with each vertex
of Ω(b).
2 Duality and a Farkas lemma for integer programs 27

We next derive an asymptotic result that relates the respective optimal values
fd (b, c) and f (b, c) of Pd and P.

Corollary 1. Let A ∈ Zm×n , b ∈ Zm and let c ∈ Rn be regular with −c in


the interior of (Rn+ ∩ V )∗ . Let b ∈ γ ∩ Λ and let x∗ ∈ Ω(b) be an optimal
vertex of P, that is, f (b, c) = c x∗ = c x(σ ∗ ) for σ ∗ ∈ B(Δ, γ), the unique
optimal basis of P. Then for t ∈ N sufficiently large,

1
fd (tb, c) − f (tb, c) = lim ln Uσ∗ (tb, rc) . (2.26)
r→∞ r

In particular, for t ∈ N sufficently large, the function t → f (tb, c) − fd (tb, c)


is periodic (constant) with period μ(σ ∗ ).

For a proof see Section 2.6.2. Thus, when b ∈ γ ∩ Λ is sufficiently large, say
b = tb0 with b0 ∈ Λ and t ∈ N, the “max” in (2.25) is attained at the unique
optimal basis σ ∗ of the LP (2.5) (see details in Section 2.6.2).
From Remark 1 it also follows that for sufficiently large t ∈ N, the optimal

value fd (tb, c) is equal to f (tb, c) plus a certain sum of reduced costs ck −π σ Ak
∗ ∗
(with k ∈ σ ) with respect to the optimal basis σ .

2.3.6 A dual comparison of P and Pd

We now provide an alternative formulation of Brion and Vergne’s discrete


formula (2.22), which explicitly relates dual variables of P and Pd . Recall
that a feasible basis of the linear program P is a basis σ ∈ B(Δ) for which
A−1
σ b ≥ 0. Thus let σ ∈ B(Δ) be a feasible basis of the linear program P and
consider the system of m equations in Cm :
A
z1 1j · · · zm
Amj
= ε cj , j ∈ σ. (2.27)

Recall that Aσ is the nonsingular matrix [Aj1 | · · · |Ajm ], with jk ∈ σ for


all k = 1, . . . , m. The above system (2.27) has ρ(σ) (= det(Aσ )) solutions
ρ(σ)
{z(k)}k=1 , written as

z(k) = ελ ε2iπθ(k) , k = 1, . . . , ρ(σ) (2.28)

for ρ(σ) vectors {θ(k)} in Cm .


Indeed, writing z = ελ ε2iπθ (that is, the vector {eλj ε2iπθj }m m
j=1 in C ) and
passing to the logarithm in (2.27) yields

Aσ λ + 2iπ Aσ θ = cσ , (2.29)

where cσ ∈ Rm is the vector {cj }j∈σ . Thus λ ∈ Rm is the unique solution of


Aσ λ = cσ and θ satisfies
28 J.B. Lasserre

Aσ θ ∈ Zm . (2.30)
Equivalently, θ belongs to (⊕j∈σ Aj Z)∗ , the dual lattice of ⊕j∈σ Aj Z.
Thus there is a one-to-one correspondence between the ρ(σ) solutions
{θ(k)} and the finite group G (σ) = (⊕j∈σ Aj Z)∗ /Zm , where G(σ) is a sub-
group of G (σ). Thus, with G(σ) = {g1 , . . . , gs } and s := μ(σ), we can write
(Aσ )−1 gk = θgk = θ(k), so that for every character ε2iπy of G(σ), y ∈ Λ, we
have

ε2iπy (g) = ε2iπy θg , y ∈ Λ, g ∈ G(σ) (2.31)
and

ε2iπAj (g) = ε2iπAj θg = 1, j ∈ σ. (2.32)
So, for every σ ∈ B(Δ), denote by {zg }g∈G(σ) these μ(σ) solutions of (2.28),
that is,
zg = ελ ε2iπθg ∈ Cm , g ∈ G(σ), (2.33)
with λ = (Aσ )−1 cσ , and where ελ ∈ Rm is the vector {ελi }m
i=1 .
So, in the linear program P we have a dual vector λ ∈ Rm associated with
each basis σ. In the integer program P, with each (same) basis σ there are now
associated μ(σ) “dual” (complex) vectors λ + 2iπθg , g ∈ G(σ). Hence, with a
basis σ in linear programming, the “dual variables” in integer programming
are obtained from (a), the corrresponding dual variables λ ∈ Rm in linear
programming, and (b), a periodic correction term 2iπθg ∈ Cm , g ∈ G(σ).
We next introduce what we call the vertex residue function.
Definition 1. Let b ∈ Λ and let c ∈ Rn be regular. Let σ ∈ B(Δ) be a
feasible basis of the linear program P and for every r ∈ N, let {zgr }g∈G(σ)
be as in (2.33), with rc in lieu of c, that is,

zgr = εrλ ε2iπθg ∈ Cm ; g ∈ G(σ), with λ = (Aσ )−1 cσ .

The vertex residue function associated with a basis σ of the linear program
P is the function Rσ (zg , .) : N → R defined by

1  b
zgr
r → Rσ (zg , r) :=  , (2.34)
μ(σ) −Ak rck
g∈G(σ) (1 − zgr ε )
j ∈σ

which is well defined because when c is regular, |zgr |Ak = εrck for all k ∈ σ.

The name vertex residue is now clear because in the integration (2.20),
Rσ (zg , r) is to be interpreted as a generalized Cauchy residue, with respect
to the μ(σ) “poles” {zgr } of the generating function F d (z, rc).
Recall from Corollary 1 that when b ∈ γ ∩Λ is sufficiently large, say b = tb0
with b0 ∈ Λ and some large t ∈ N, the “max” in (2.25) is attained at the
unique optimal basis σ ∗ of the linear program P.
2 Duality and a Farkas lemma for integer programs 29

Proposition 1. Let c be regular with −c ∈ (Rn+ ∩ V )∗ and let b ∈ γ ∩ Λ be


sufficiently large so that the max in (2.25) is attained at the unique optimal
basis σ ∗ of the linear program P. Let {zg }g∈G(σ∗ ) be as in (2.33) with σ = σ ∗ .
Then the optimal value of Pd satisfies
⎡ ⎤
1 ⎣ 1  b
zgr
fd (b, c) = lim ln  ⎦
μ(σ ∗ ) −Ak rck
k ∈σ ∗ (1 − zgr ε
r→∞ r )
∗ g∈G(σ )
1
= lim ln Rσ∗ (zg , r) (2.35)
r→∞ r

and the optimal value of P satisfies


⎡ ⎤
1 1  |zgr |b
f (b, c) = lim ln ⎣  ⎦
r→∞ r μ(σ ∗ ) ∗ k ∈σ ∗ (1 − |zgr |−Ak εrck )
g∈G(σ )
1
= lim ln Rσ∗ (|zg |, r). (2.36)
r→∞ r
For a proof see Section 2.6.3.
Proposition 1 shows that there is indeed a strong relationship between
the integer program Pd and its continuous analogue, the linear program
P. Both optimal values obey exactly the same formula (2.35), but for the
continuous version, the complex vector zg ∈ Cm is replaced by the vector

|zg | = ελ ∈ Rm of its component moduli, where λ∗ ∈ Rm is the optimal
solution of the LP dual of P. In summary, when c ∈ Rn is regular and
b ∈ γ ∩ Λ is sufficiently large, we have the following correspondence.

Linear program P Integer program Pd

unique optimal basis σ ∗ unique optimal basis σ ∗

1 optimal dual vector μ(σ ∗ ) dual vectors


λ∗ ∈ Rm zg ∈ Cm , g ∈ G(σ ∗ )

ln zg = λ∗ + 2iπθg

1 1
f (b, c) = lim ln Rσ∗ (|zg |, r) fd (b, c) = lim ln Rσ∗ (zg , r)
r→∞ r r→∞ r

2.4 A discrete Farkas lemma

In this section we are interested in a discrete analogue of the continuous


Farkas lemma. That is, with A ∈ Zm×n and b ∈ Zm , consider the issue of the
existence of a nonnegative integral solution x ∈ Nn to the system of linear
equations Ax = b .
30 J.B. Lasserre

The (continuous) Farkas lemma, which states that given A ∈ Rm×n and
b ∈ Rm ,

{x ∈ Rn | Ax = b, x ≥ 0} = ∅ ⇔ [A λ ≥ 0] ⇒ b λ ≥ 0, (2.37)

has no discrete analogue in an explicit form. For instance, the Gomory func-
tions used in Blair and Jeroslow [6] (see also Schrijver [19, Corollary 23.4b])
are implicitly and iteratively defined, and are not directly defined in terms of
the data A, b. On the other hand, for various characterizations of feasibility
of the linear diophantine equations Ax = b, where x ∈ Zn , the interested
reader is referred to Schrijver [19, Section 4].
Before proceeding to the general case when A ∈ Zm×n , we first consider
the case A ∈ Nm×n , where A (and thus b) has only nonnegative entries.

2.4.1 The case when A ∈ Nm×n

In this section we assume that A ∈ Nm×n and thus necessarily b ∈ Nm , since


otherwise {x ∈ Nn | Ax = b} = ∅.
Theorem 2. Let A ∈ Nm×n and b ∈ Nm . Then the following two proposi-
tions (i) and (ii) are equivalent:
(i) The linear system Ax = b has a solution x ∈ Nn .
(ii) The real-valued polynomial z → z b − 1 := z1b1 · · · zm
bm
− 1 can be written


n
zb − 1 = Qj (z)(z Aj − 1), (2.38)
j=1

for some real-valued polynomials Qj ∈ R[z1 , . . . , zm ], j = 1, . . . , n, all


of which have nonnegative coefficients.
In addition, the degree of the Qj in (2.38) is bounded by


m 
m
b∗ := bj − min Ajk . (2.39)
k
j=1 j=1

For a proof see Section 2.6.4. Hence Theorem 2 reduces the issue of existence
of a solution x ∈ Nn to a particular ideal membership problem, that is, Ax = b
has a solution x ∈ Nn if and only if the polynomial z b − 1 belongs to the
binomial ideal I =
z Aj − 1 j=1,...,n ⊂ R[z1 , . . . , zm ] for some weights Qj with
nonnegative coefficients.
Interestingly, consider the ideal J ⊂ R[z1 , . . . , zm , y1 , . . . , yn ] generated by
the binomials z Aj − yj , j = 1, . . . , n, and let G be a Gröbner basis of J. Using
the algebraic approach described in Adams and Loustaunau [2, Section 2.8],
it is known that Ax = b has a solution x ∈ Nn if and only if the monomial
2 Duality and a Farkas lemma for integer programs 31

z b can be reduced (with respect to G) to some monomial y α , in which case,


α ∈ Nn is a feasible solution. Observe that in this case, we do not know
α ∈ Nn in advance (we look for it!) to test whether z b − y α ∈ J. One has
to apply Buchberger’s algorithm to (i) find a reduced Gröbner basis G of
J, and (ii) reduce z b with respect to G and check whether the final result
is a monomial y α . Moreover, in the latter approach one uses polynomials in
n + m (primal) variables y and (dual) variables z, in contrast with the (only)
m dual variables z in Theorem 2.

Remark 2. (a) With b∗ as in (2.39) denote by s(b∗ ) := m+b b∗ the dimension


of the vector space of polynomials of degree b∗ in m variables. In view of
Theorem 2, and given b ∈ Nm , checking the existence of a solution x ∈ Nn
to Ax = b reduces to checking whether or not there exists a nonnegative
solution y to a system of linear equations with:
• n × s(b∗ ) variables, the nonnegative coefficients of the Qj ;
n
• s(b∗ + max Ajk ) equations to identify the terms of the same powers on
k
j=1
both sides of (2.38).
This in turn reduces to solving an LP problem with ns(b∗ ) variables and
s(b∗ + maxk j Ajk ) equality constraints. Observe that in view of (2.38),
this LP has a matrix of constraints with coefficients made up only of 0’s and
±1’s.
(b) From the proof of Theorem 2 in Section 2.6.4, it easily follows that one
may even constrain the weights Qj in (2.38) to be polynomials in Z[z1 , . . . , zm ]
(instead of R[z1 , . . . , zm ]) with nonnegative coefficients. However, (a) shows
that the strength of Theorem 2 is precisely allowing Qj ∈ R[z1 , . . . , zm ] while
enabling us to check feasibility by solving a (continuous) linear program. By
enforcing Qj ∈ Z[z1 , . . . , zm ] one would end up with an integer linear system
whose size was larger than that of the original problem.

2.4.2 The general case

In this section we consider the general case where A ∈ Zm×n so that A may
have negative entries, and we assume that the convex polyhedron Ω := {x ∈
Rn+ | Ax = b} is compact.
The above arguments cannot be repeated because of the occurrence of
negative powers. However, let α ∈ Nn and β ∈ N be such that
jk := Ajk + αk ≥ 0,
A k = 1, . . . , n; bj := bj + β ≥ 0, (2.40)

for all j = 1, . . . , m. Moreover, as Ω is compact, we have that


32 J.B. Lasserre
⎧ ⎫ ⎧ ⎫
⎨n ⎬ ⎨n ⎬
max αj xj | Ax = b ≤ maxn αj xj | Ax = b =: ρ∗ (α) < ∞.
x∈Nn ⎩ ⎭ x∈R+ ⎩ ⎭
j=1 j=1
(2.41)

Observe that given α ∈ N , the scalar ρ (α) is easily calculated by solving
n

an LP problem. Choose N  β ≥ ρ∗ (α), and let A ∈ Nm×n and b ∈ Nm be


as in (2.40). Then the existence of solutions x ∈ Nn to Ax = b is equivalent
to the existence of solutions (x, u) ∈ Nn ×N to the system of linear equations


⎨ n Ax + uem = b
Q  (2.42)

⎩ αj xj + u = β.
j=1

Indeed, if Ax = b with x ∈ Nn then



n 
n
Ax + em αj xj − em αj xj = b + em β − em β,
j=1 j=1

or equivalently, ⎛ ⎞

n
+ ⎝β −
Ax αj xj ⎠ em = b,
j=1
n
and thus, as β ≥ ρ∗ (α) ≥ j=1 αj xj (see, for example, (2.41)), we see that
n
(x, u) with β − j=1 αj xj =: u ∈ N is a solution of (2.42). Conversely, let
and b, it
(x, u) ∈ Nn × N be a solution of (2.42). Using the definitions of A
then follows immediately that

n 
n
Ax + em αj xj + uem = b + βem ; αj xj + u = β,
j=1 j=1

so that Ax = b. The system of linear equations (2.42) can be cast in the form
⎡ ⎤
  A | em
x b
B = with B := ⎣ − − ⎦ , (2.43)
u β
α | 1

and as B only has entries in N, we are back to the case analyzed in Section
2.4.1.

Corollary 2. Let A ∈ Zm×n and b ∈ Zm and assume that Ω := {x ∈


Rn+ | Ax = b} is compact. Let α ∈ Nn and β ∈ N be as in (2.40) with
β ≥ ρ∗ (α) (see, for example, (2.41)). Then the following two propositions (i)
and (ii) are equivalent:
(i) The system of linear equations Ax = b has a solution x ∈ Nn ;
2 Duality and a Farkas lemma for integer programs 33

(ii) The real-valued polynomial z → z b (zy)β − 1 ∈ R[z1 , . . . , zm , y] can be


written

n
z b (zy)β − 1 = Q0 (z, y)(zy − 1) + Qj (z, y)(z Aj (zy)αj − 1) (2.44)
j=1

for some real-valued polynomials {Qj }nj=0 in R[z1 , . . . , zm , y], all of which
have nonnegative coefficients.
The degree of the Qj in (2.44) is bounded by
⎡ ⎡ ⎤⎤
 m m
(m + 1)β + bj − min ⎣m + 1, min ⎣(m + 1)αk + Ajk ⎦⎦ .
k=1,...,n
j=1 j=1

Proof. Let A ∈ Nm×n , b ∈ Nm , α ∈ Nn and β ∈ N be as in (2.40) with



β ≥ ρ (α). Then apply Theorem 2 to the equivalent form (2.43) of the system
Q in (2.42), where B and ( b, β) only have entries in N, and use the definitions
and b.
of A 

Indeed Theorem 2 and Corollary 2 have the flavor of a Farkas lemma as


it is stated with the transpose A of A and involving the dual variables zk
associated with the constraints Ax = b. In addition, and as expected, it
implies the continuous Farkas lemma because if {x ∈ Nn | Ax = b} = ∅, then
from (2.44), and with z := ελ and y := (z1 · · · zm )−1 ,

 
m  
εb λ − 1 = Qj (eλ1 , . . . eλm , e− i λi
)(ε(A λ)j − 1). (2.45)
j=1


Therefore A λ ≥ 0 ⇒ ε(A λ)j − 1 ≥ 0 for all j = 1, . . . , n, and as the Qj have

nonnegative coefficients, we have eb λ − 1 ≥ 0, which in turn implies b λ ≥ 0.
Equivalently, evaluating the partial derivatives
n of both sides of (2.45) with
respect to λj , at the point λ = 0, yields bj = k=1 Ajk xk for all j = 1, . . . , n,
with xk := Qk (1, . . . , 1) ≥ 0. Thus Ax = b for some x ∈ Rn+ .

2.5 Conclusion

We have proposed what we think is a natural duality framework for the in-
teger program Pd . It essentially relies on the Z-transform of the associated
counting problem Id , for which the important Brion and Vergne inverse for-
mula appears to be an important tool for analyzing Pd . In particular, it
shows that the usual reduced costs in linear programming, combined with
the periodicities phenomena associated with the complex poles of F d (z, c),
also play an essential role for analyzing Pd . Moreover, for the standard dual
34 J.B. Lasserre

vector λ ∈ Rm associated with each basis B of the linear program P, there


are det(B) corresponding dual vectors z ∈ Cm for the discrete problem Pd .
Moreover, for b sufficiently large, the optimal value of Pd is a function of
these dual vectors associated with the optimal basis of the linear program P.
A topic of further research is to establish an explicit dual optimization prob-
lem P∗d in these dual variables. We hope that the above results will stimulate
further research in this direction.

2.6 Proofs

A proof in French of Theorem 1 can be found in Lasserre [15]. The English


proof in [16] is reproduced below.

2.6.1 Proof of Theorem 1

Proof. Use (2.1) and (2.22) to obtain


⎡ ⎤1/r
 ε rc x(σ)
εfd (b,c) = lim ⎣ Uσ (b, rc)⎦
r→∞ μ(σ)
x(σ): vertex of Ω(b)
⎡ ⎤1/r
 ε rc x(σ)  ε2iπb
(g) ⎦
= lim ⎣
r→∞ μ(σ) Vσ (g, rc)
x(σ): vertex of Ω(b) g∈G(σ)
⎡ ⎤1/r

= lim ⎣ Hσ (b, rc)⎦ . (2.46)
r→∞
x(σ): vertex of Ω(b)

Next, from the expression of Vσ (b, c) in (2.24), and with rc in lieu of c, we see
that Vσ (g, rc) is a function of y := er , which in turn implies that Hσ (b, rc) is
also a function of εr , of the form

  ε2iπb (g)
Hσ (b, rc) = (εr )c x(σ) 
, (2.47)
j δj (σ, g, A) × (er )αj (σ,c)
g∈G(σ)

for finitely many coefficients {δj (σ, g, A), αj (σ, c)}. Note that the coefficients
αj (σ, c) are sums of some reduced costs ck − π σ Ak (with k ∈ σ). In addition,
the (complex) coefficients {δj (σ, g, A)} do not depend on b.
Let y := εr/q , where q is the l.c.m. of {μ(σ)}σ∈B(Δ,γ) . As q(ck −π σ Ak ) ∈ Z
for all k ∈ σ,
 Pσb (y)
Hσ (b, rc) = y qc x(σ) × (2.48)
Qσb (y)
2 Duality and a Farkas lemma for integer programs 35

for some polynomials Pσb , Qσb ∈ R[y]. In view of (2.47), the degree of Pσb
and Qσb , which depends on b but not on the magnitude of b, is uniformly
bounded in b.
Therefore, as r → ∞,

Hσ (b, rc) ≈ (εr/q )qc x(σ)+deg(Pσb )−deg(Qσb ) , (2.49)

so that the limit in (2.46), which is given by max εc x(σ) lim Uσ (b, rc)1/r (as
σ r→∞
we have assumed unicity of the maximizer σ), is also

max εc x(σ)+(deg(Pσb )−deg(Qσb ))/q .
x(σ): vertex of Ω(b)

Therefore fd (b, c) = −∞ if Ax = b has no solution x ∈ Nn , else



 1
fd (b, c) = max c x(σ) + (deg(Pσb ) − deg(Qσb )) , (2.50)
x(σ): vertex of Ω(b) q

from which (2.25) follows easily.

2.6.2 Proof of Corollary 1

Proof. Let t ∈ N and note that f (tb, rc) = trf (b, c) = trc x∗ = trc x(σ ∗ ). As
in the proof of Theorem 1, and with tb in lieu of b, we have
⎡ % &t ⎤ r1
U (tb, rc)  ε rc x(σ)
Uσ (tb, rc) ⎦
f d (tb, rc) = εtc x ⎣
1  ∗ σ∗
r +
μ(σ ∗ ) εrc x(σ∗ ) μ(σ)
vertex x(σ) =x∗

and from (2.47)–(2.48), setting δσ := c x∗ − c x(σ) > 0 and y := εr/q ,


⎡ ⎤1/r

f d (tb, rc)1/r = εtc x ⎣ Uσ (tb, rc) + P (y)
 ∗ ∗
y −tqδσ
σtb ⎦ .
μ(σ ∗ ) ∗
Q σtb (y)
vertex x(σ) =x

Observe that c x(σ ∗ )−c x(σ) > 0 whenever σ = σ ∗ because Ω(y) is simple
if y ∈ γ, and c is regular. Indeed, as x∗ is an optimal vertex of the LP problem

P, the reduced costs ck − π σ Ak (k ∈ σ ∗ ) with respect to the optimal basis
σ ∗ are all nonpositive, and in fact, strictly negative because c is regular (see
Section 2.2.4). Therefore the term

 Pσtb (y)
y −tqδσ
Qσtb (y)
vertex x(σ) =x∗
36 J.B. Lasserre

is negligible for t sufficiently large, when compared with Uσ∗ (tb, rc). This is
because the degrees of Pσtb and Qσtb depend on tb but not on the magnitude
of tb (see (2.47)–(2.48)), and they are uniformly bounded in tb. Hence taking
the limit as r → ∞ yields
%  ∗
&1/r
fd (tb,c) εrtc x(σ )  ∗
ε = lim Uσ∗ (tb, rc) = εtc x(σ ) lim Uσ∗ (tb, rc)1/r ,
r→∞ μ(σ ∗ ) r→∞

from which (2.26) follows easily.


Finally, the periodicity comes from the term ε2iπtb (g) in (2.23) for g ∈
G(σ ∗ ). The period is then, of the order G(σ ∗ ). 

2.6.3 Proof of Proposition 3.1



Proof. Let Uσ∗ (b, c) be as in (2.23)–(2.24). It follows immediately that π σ =
(λ∗ ) and so
σ∗  ∗ 
ε−π Ak −2iπAk
ε (g) = ε−Ak λ ε−2iπAk θg = zg−Ak , g ∈ G(σ ∗ ).

Next, using c x(σ ∗ ) = b λ∗ ,


 ∗  ∗ 
εc x(σ ) ε2iπb (g) = εb λ ε2iπb θg = zgb , g ∈ G(σ ∗ ).

Therefore

1  1  zgb
εc x(σ) Uσ∗ (b, c) =

μ(σ ) μ(σ ∗ )
g∈G(σ ∗ )
(1 − zg−Ak εck )
= Rσ∗ (zg , 1),

and (2.35) follows from (2.25) because, with rc in lieu of c, zg becomes zgr =

εrλ ε2iπθg (only the modulus changes).

Next, as only the modulus of zg is involved in (2.36), we have |zgr | = εrλ
for all g ∈ G(σ ∗ ), so that

1  |zgr |b εrb λ
 ∗

 =  r(ck −Ak λ∗ ) )
,
μ(σ ∗ ) k ∈σ ∗ (1 − |zgr |
−A kε rc k)
k ∈σ ∗ (1 − ε
g∈G(σ ∗ )

and, as r → ∞ ,
 ∗
εrb λ  ∗
 r(ck −Ak λ∗ ) )
≈ εrb λ ,
k ∈σ ∗ (1 − ε

because (ck − Ak λ∗ ) < 0 for all k ∈ σ ∗ . Therefore


2 Duality and a Farkas lemma for integer programs 37
%  ∗
&
1 εrb λ
limln  r(ck −Ak λ∗ )
= b λ∗ = f (b, c),
k ∈σ ∗ (1 − ε
r→∞ r )

the desired result.

2.6.4 Proof of Theorem 2

Proof. (ii) ⇒ (i). Assume that z^b − 1 can be written as in (2.38) for some polynomials {Q_j} with nonnegative coefficients {Q_{jα}}, that is, Q_j(z) = Σ_{α∈N^m} Q_{jα} z^α = Σ_{α∈N^m} Q_{jα} z_1^{α_1} ··· z_m^{α_m}, for finitely many nonzero (and nonnegative) coefficients Q_{jα}. Using the notation of Section 2.3, the function f_d(b, 0), which (as c = 0) counts the nonnegative integral solutions x ∈ N^n to the equation Ax = b, is given by

f_d(b, 0) = (1/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{b−e_m} / Π_{k=1}^n (1 − z^{−A_k}) dz,

where γ ∈ R^m satisfies A'γ > 0 (see (2.18) and (2.20)).
Writing z^{b−e_m} as z^{−e_m}(z^b − 1 + 1) we obtain

f_d(b, 0) = B_1 + B_2,

with

B_1 = (1/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{−e_m} / Π_{k=1}^n (1 − z^{−A_k}) dz

and

B_2 := (1/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{−e_m}(z^b − 1) / Π_{k=1}^n (1 − z^{−A_k}) dz
     = Σ_{j=1}^n (1/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{A_j−e_m} Q_j(z) / Π_{k≠j} (1 − z^{−A_k}) dz
     = Σ_{j=1}^n Σ_{α∈N^m} (Q_{jα}/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{A_j+α−e_m} / Π_{k≠j} (1 − z^{−A_k}) dz.

From (2.20) (with b := 0) we recognize in B_1 the number of solutions x ∈ N^n to the linear system Ax = 0, so that B_1 = 1. Next, again from (2.20) (now with b := A_j + α), each term

C_{jα} := (Q_{jα}/(2πi)^m) ∫_{|z_1|=γ_1} ··· ∫_{|z_m|=γ_m} z^{A_j+α−e_m} / Π_{k≠j} (1 − z^{−A_k}) dz

is equal to Q_{jα} × the number of integral solutions x ∈ N^{n−1} of the linear system A(j)x = A_j + α, where A(j) is the matrix in N^{m×(n−1)} obtained from A by deleting its j-th column A_j. As by hypothesis each Q_{jα} is nonnegative, it follows that

B_2 = Σ_{j=1}^n Σ_{α∈N^m} C_{jα} ≥ 0,

so that f_d(b, 0) = B_1 + B_2 ≥ 1. In other words, the system Ax = b has at least one solution x ∈ N^n.
(i) ⇒ (ii). Let x ∈ N^n be a solution of Ax = b, and write

z^b − 1 = (z^{A_1x_1} − 1) + z^{A_1x_1}(z^{A_2x_2} − 1) + ··· + z^{Σ_{j=1}^{n−1} A_jx_j}(z^{A_nx_n} − 1)

and

z^{A_jx_j} − 1 = (z^{A_j} − 1)[1 + z^{A_j} + ··· + z^{A_j(x_j−1)}],  j = 1, ..., n,

to obtain (2.38) with

z ↦ Q_j(z) := z^{Σ_{k=1}^{j−1} A_kx_k}[1 + z^{A_j} + ··· + z^{A_j(x_j−1)}],  j = 1, ..., n.

We immediately see that each Q_j has all its coefficients nonnegative (and even in {0, 1}). Finally, the bound on the degree follows immediately from the proof of (i) ⇒ (ii). □
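To see the construction in the last part of the proof in action, the following small sketch (not part of the original text; the matrix A, right-hand side b and the solution x are arbitrary illustrative choices) builds the polynomials Q_j from a nonnegative integer solution of Ax = b and verifies the identity (2.38) symbolically.

```python
# Illustrative sketch only: verify the telescoping decomposition used in the proof
# of (i) => (ii) on a small hand-picked instance.
import numpy as np
import sympy as sp

A = np.array([[1, 2, 0],
              [0, 1, 1]])              # columns A_1, A_2, A_3 in N^m (m = 2, n = 3)
x = np.array([1, 2, 3])                # a nonnegative integer solution
b = A @ x                              # so Ax = b holds by construction

z = sp.symbols('z1 z2')

def mono(v):
    """The monomial z^v = z1^{v_1} z2^{v_2} for a vector v in N^m."""
    return sp.Mul(*[zi**int(vi) for zi, vi in zip(z, v)])

Q, prefix = [], np.zeros(A.shape[0], dtype=int)
for j in range(A.shape[1]):
    Aj = A[:, j]
    geom = sum(mono(Aj * t) for t in range(int(x[j])))   # 1 + z^{A_j} + ... + z^{A_j(x_j-1)}
    Q.append(sp.expand(mono(prefix) * geom))             # coefficients of Q_j are 0 or 1
    prefix += Aj * x[j]

lhs = sp.expand(mono(b) - 1)
rhs = sp.expand(sum((mono(A[:, j]) - 1) * Q[j] for j in range(A.shape[1])))
assert sp.simplify(lhs - rhs) == 0                       # the identity (2.38) holds
```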

References

1. K. Aardal, R. Weismantel and L. A. Wolsey, Non-standard approaches to integer programming, Discrete Appl. Math. 123 (2002), 5–74.
2. W. W. Adams and P. Loustaunau, An Introduction to Gröbner Bases (American
Mathematical Society, Providence, RI, 1994).
3. F. Baccelli, G. Cohen, G. J. Olsder and J.-P. Quadrat, Synchronization and Linearity
(John Wiley & Sons, Chichester, 1992).
4. A. I. Barvinok, Computing the volume, counting integral points and exponential sums,
Discrete Comp. Geom. 10 (1993), 123–141.
5. A. I. Barvinok and J. E. Pommersheim, An algorithmic theory of lattice points in
polyhedra, in New Perspectives in Algebraic Combinatorics, MSRI Publications 38
(1999), 91–147.
6. C. E. Blair and R. G. Jeroslow, The value function of an integer program, Math.
Programming 23 (1982), 237–273.

7. M. Brion and M. Vergne, Residue formulae, vector partition functions and lattice
points in rational polytopes, J. Amer. Math. Soc. 10 (1997), 797–833.
8. J. B. Conway, Functions of a Complex Variable I, 2nd ed. (Springer, New York, 1978).
9. D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Program-
ming (Kluwer Academic Publishers, Dordrecht, 1994).
10. O. Güler, Barrier functions in interior point methods, Math. Oper. Res. 21 (1996),
860–885.
11. A. Iosevich, Curvature, combinatorics, and the Fourier transform, Notices Amer.
Math. Soc. 48 (2001), 577–583.
12. A. Khovanskii and A. Pukhlikov, A Riemann-Roch theorem for integrals and sums of
quasipolynomials over virtual polytopes, St. Petersburg Math. J. 4 (1993), 789–812.
13. J. B. Lasserre and E. S. Zeron, A Laplace transform algorithm for the volume of a
convex polytope, JACM 48 (2001), 1126–1140.
14. J. B. Lasserre and E. S. Zeron, An alternative algorithm for counting integral points
in a convex polytope, Math. Oper. Res. 30 (2005), 597–614.
15. J. B. Lasserre, La valeur optimale des programmes entiers, C. R. Acad. Sci. Paris
Ser. I Math. 335 (2002), 863–866.
16. J. B. Lasserre, Generating functions and duality for integer programs, Discrete Optim.
1 (2004), 167–187.
17. G. L. Litvinov, V. P. Maslov and G. B. Shpiz, Linear functionals on idempotent
spaces: An algebraic approach, Dokl. Akad. Nauk. 58 (1998), 389–391.
18. D. S. Mitrinović, J. Sándor and B. Crstici, Handbook of Number Theory (Kluwer
Academic Publishers, Dordrecht, 1996).
19. A. Schrijver, Theory of Linear and Integer Programming (John Wiley & Sons, Chich-
ester, 1986).
20. V. A. Truong and L. Tunçel, Geometry of homogeneous convex cones, duality map-
ping, and optimal self-concordant barriers, Research report COOR #2002-15 (2002),
University of Waterloo, Waterloo, Canada.
21. L. A. Wolsey, Integer programming duality: Price functions and sensitivity analysis,
Math. Programming 20 (1981), 173–195.
Chapter 3
Some nonlinear Lagrange and penalty
functions for problems with a single
constraint

J. S. Giri and A. M. Rubinov†

Abstract We study connections between generalized Lagrangians and gen-


eralized penalty functions, which are formed by increasing positively homo-
geneous functions. In particular we show that the least exact penalty param-
eter is equal to the least Lagrange multiplier. We also prove, under some
natural assumptions, that the natural generalization of a Lagrangian cannot
improve it.

Key words: Generalized Lagrangians, generalized penalty functions, single


constraint, IPH convolutions, IPH functions

3.1 Introduction

Consider the following constrained optimization problem P (f0 , f1 ):

min f0 (x) subject to x ∈ X, f1 (x) ≤ 0, (3.1)

where X ⊆ IRn and f0 (x), f1 (x) are real-valued, continuous functions. (We
shall assume that these functions are directionally differentiable in Section
3.4.) Note that a general mathematical programming problem:

min f0 (x) subject to x ∈ X, gi (x) ≤ 0, (i ∈ I), hj (x) = 0 (j ∈ J),

J. S. Giri
School of Information Technology and Mathematical Sciences, University of Ballarat,
Victoria, AUSTRALIA
e-mail: [email protected]
A. M. Rubinov†
School of Information Technology and Mathematical Sciences, University of Ballarat,
Victoria, AUSTRALIA
e-mail: [email protected]


where I and J are finite sets, can be reformulated as (3.1) with

f_1(x) = max( max_{i∈I} g_i(x), max_{j∈J} |h_j(x)| ).    (3.2)

Note that the function f1 defined by (3.2) is directionally differentiable if the


functions gi (i ∈ I) and hj (j ∈ J) possess this property.
The traditional approach to problems of this type has been to employ a
Lagrange function of the form

L(x; λ) = f0 (x) + λf1 (x).

The function q(λ) = inf x∈X (f0 (x) + λf1 (x)) is called the dual function and
the problem
max q(λ) subject to λ > 0
is called the dual problem. The equality

sup_{λ>0} inf_{x∈X} L(x, λ) = inf{ f_0(x) : x ∈ X, f_1(x) ≤ 0 }

is called the zero duality gap property. The number λ̄ > 0 such that

inf_{x∈X} L(x, λ̄) = inf{ f_0(x) : x ∈ X, f_1(x) ≤ 0 }

is called the Lagrange multiplier.


Let f1+ (x) = max(f1 (x), 0). Then the Lagrange function for the problem
P (f0 , f1+ ) is called the penalty function for the initial problem P (f0 , f1 ).
The traditional Lagrange function may be considered to be a linear con-
volution of the objective and constraint functions. That is,

L(x; λ) ≡ p(f0 (x), λf1 (x)),

where p(u, v) = u + v. It has been shown in [4, 5] that for penalty func-
tions, increasing positively homogeneous (IPH) convolutions provide exact
penalization for a large class of objective functions. The question thus arises
“are there nonlinear convolution functions for which Lagrange multipliers
exist?” The most interesting example of a nonlinear IPH convolution function is the function s_k(u, v) = (u^k + v^k)^{1/k}. These convolutions also often provide a smaller exact penalty parameter than does the traditional
linear convolution. (See Section 3.3 for the definition of an exact penalty
parameter.)
We will show in this chapter that for problems where a Lagrange multiplier
exists, an exact penalty parameter also exists, and the smallest exact penalty
parameter is equal to the smallest Lagrange multiplier.
We also show that whereas a generalized penalty function can often im-
prove the classical situation (for example, provide exact penalization with
a smaller parameter than that of the traditional function), this is not true

for generalized Lagrange functions. Namely, we prove, under some natural


assumptions, that among all functions sk the Lagrange multiplier may exist
only for the k = 1 case. So generalized Lagrangians cannot improve the
classical situation.

3.2 Preliminaries

Let us present some results and definitions which we will make use of later in
this chapter. We will denote the optimal value of the general problem P(f_0, f_1) by M(f_0, f_1).
We will also make use of the sets X_0 = {x ∈ X : f_1(x) ≤ 0} and X_1 = {x ∈ X : f_1(x) > 0}.
It will be convenient to talk about Increasing Positively Homogeneous (IPH) functions. These are defined as functions which are increasing, that is, if (δ, γ) ≥ (δ′, γ′) then p(δ, γ) ≥ p(δ′, γ′), and positively homogeneous of the first degree, that is, p(α(δ, γ)) = αp(δ, γ) for α > 0.
We shall consider only continuous IPH functions defined on either the half-plane {(u, v) : u ≥ 0} or on the quadrant IR^2_+ = {(u, v) ∈ IR^2 : u ≥ 0, v ≥ 0}. In the latter case we consider only IPH functions p : IR^2_+ → IR which possess the following properties:

p(1, 0) = 1,    lim_{v→+∞} p(1, v) = +∞.

We shall denote by P1 the class of all such functions. The simplest example of a function from P1 is the function s_k (0 < k < +∞), defined on IR^2_+ by

s_k(u, v) = (u^k + v^k)^{1/k}.    (3.3)

If k = (2l + 1)/(2m + 1) with l, m ∈ N then the function s_k is well defined and IPH on the half-plane {(u, v) : u ≥ 0}. (Here N is the set of positive integers.)
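As a quick numerical illustration (not part of the original text; the sample values are arbitrary), the defining properties of s_k on the quadrant can be checked directly:

```python
# Minimal numerical sketch of s_k on the quadrant u, v >= 0; sample values are arbitrary.
def s_k(u, v, k):
    """s_k(u, v) = (u**k + v**k)**(1/k), as in (3.3)."""
    return (u**k + v**k) ** (1.0 / k)

u, v, k, alpha = 2.0, 3.0, 0.5, 1.7
assert abs(s_k(alpha * u, alpha * v, k) - alpha * s_k(u, v, k)) < 1e-9   # positive homogeneity
assert s_k(u + 1.0, v + 1.0, k) >= s_k(u, v, k)                          # increasing
assert s_k(1.0, 0.0, k) == 1.0                                           # p(1, 0) = 1
assert s_k(1.0, 1e6, k) > 1e5                                            # p(1, v) -> +infinity
```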
A perturbation function plays an important part in the study of extended
penalty functions and is defined on IR+ = {y ∈ IR : y ≥ 0} by

β(y) = inf{f0 (x) : x ∈ X, f1 (x) ≤ y}.

We denote by CX the set of all problems (f0 , f1 ) such that:


1. inf x∈X f0 (x) > 0;
2. there exists a sequence xk ∈ X1 such that f1 (xk ) → 0 and f0 (xk ) →
M (f0 , f1 );
3. there exists a point x ∈ X such that f1 (x) ≤ 0;
4. the perturbation function of the problem (f0 , f1 ) is l.s.c. at the point
y = 0.

An important result which follows from the study of perturbation functions


is as follows.
Theorem 1. Let P(f_0, f_1) ∈ C_X. Let k > 0 and let p = p_k be defined as p_k(δ, γ) = (δ^k + γ^k)^{1/k}. There exists a number d̄ > 0 such that q_{p^+}(d̄) = M(f_0, f_1) if and only if β is calm of degree k at the origin. That is,

lim inf_{y→+0} (β(y) − β(0))/y^k > −∞.

A proof of this is presented in [4] and [5].

3.3 The relationship between extended penalty functions and extended Lagrange functions

Let (f0 , f1 ) ∈ CX and let p be an IPH function defined on the half-plane


IR2∗ = {(u, v) : u ≥ 0}. Recall the following definitions. The Lagrange-type
function with respect to p is defined by

Lp (x, d) = p(f0 (x), df1 (x)).

(Here d is a real number and df does not mean the differential of f .) The
dual function qp (d) with respect to p is defined by

q_p(d) = inf_{x∈X} p(f_0(x), d f_1(x)),   d > 0.

Let p^+ be the restriction of p to IR^2_+. Consider the penalty function L^+_p and the dual function q_{p^+} corresponding to p^+:

L^+_p(x, d) = p^+(f_0(x), d f_1^+(x)),   (x ∈ X, d ≥ 0),

q_{p^+}(d) = inf_{x∈X} L^+_p(x, d),   (d ≥ 0).

Note that if f_1(x) = 0 for x ∈ X_0 then q_p = q_{p^+}. Let

t_p(d) = inf_{x∈X_0} p(f_0(x), d f_1(x)).    (3.4)

Then

q_p(d) = min(t_p(d), r_{p^+}(1, d)),    (3.5)

where r_{p^+} is defined by

r_{p^+}(d_0, d) = inf_{x∈X_1} p^+(d_0 f_0(x), d f_1(x)).    (3.6)

(The function r_{p^+} was introduced and studied in [4] and [5].)
If the restriction p^+ of p to IR^2_+ belongs to P1 then r_{p^+}(1, d) = q_{p^+}(d) (see [4, 5]), so

q_p(d) = min(t_p(d), q_{p^+}(d)).

Note that the function t_p is decreasing and

t_p(d) ≤ t_p(0) = M(f_0, f_1),   (d > 0).

The function q_{p^+}(d) = r_{p^+}(1, d) is increasing. It is known (see [4, 5]) that

q_{p^+}(d) ≡ r_{p^+}(1, d) ≤ lim_{u→+∞} r_{p^+}(1, u) = M(f_0, f_1).    (3.7)

Recall that a positive number d̄ is called a Lagrange multiplier of P(f_0, f_1) with respect to p if q_p(d̄) = M(f_0, f_1). A positive number d̄ is called an exact penalty parameter of P(f_0, f_1) with respect to p^+ if q_{p^+}(d̄) = M(f_0, f_1). We will now show that the following statement holds.
Theorem 2. Consider (f_0, f_1) ∈ C_X and an IPH function p defined on IR^2_*. Assume that the restriction p^+ of p to IR^2_+ belongs to P1. Then the following assertions are equivalent:
1) there exists a Lagrange multiplier d̄ of P(f_0, f_1) with respect to p;
2) there exists an exact penalty parameter d̄ of P(f_0, f_1) with respect to p^+ and

max(t_p(d), r_{p^+}(1, d)) = M(f_0, f_1) for all d ≥ 0.    (3.8)

Proof. 1) ⇒ 2). Let d̄ be a Lagrange multiplier of P(f_0, f_1). Then

inf_{x∈X} p(f_0(x), d̄ f_1(x)) = M(f_0, f_1).

Since p is an increasing function and d̄ f_1^+(x) ≥ d̄ f_1(x) for all x ∈ X, we have

q_{p^+}(d̄) = inf_{x∈X} p^+(f_0(x), d̄ f_1^+(x)) = inf_{x∈X} p(f_0(x), d̄ f_1^+(x)) ≥ inf_{x∈X} p(f_0(x), d̄ f_1(x)) = M(f_0, f_1).

On the other hand, due to (3.7) we have q_{p^+}(d) ≤ M(f_0, f_1) for all d. Thus q_{p^+}(d̄) = M(f_0, f_1), that is, d̄ is an exact penalty parameter of P(f_0, f_1) with respect to p^+.
Due to (3.5) we have

min(t_p(d̄), r_{p^+}(1, d̄)) = M(f_0, f_1).

Since t_p(d̄) ≤ M(f_0, f_1) and r_{p^+}(1, d̄) ≤ M(f_0, f_1), it follows that

t_p(d̄) = M(f_0, f_1)  and  r_{p^+}(1, d̄) = M(f_0, f_1).    (3.9)

Since t_p(d) is decreasing and r_{p^+}(1, d) is increasing, (3.9) implies the equalities

t_p(d) = M(f_0, f_1),   (0 ≤ d ≤ d̄),
r_{p^+}(1, d) = M(f_0, f_1),   (d̄ ≤ d < +∞),

which, in turn, imply (3.8).
2) ⇒ 1). Assume now that (3.8) holds. Let

D_s = {d : t_p(d) = M(f_0, f_1)},   D_r = {d : r_{p^+}(1, d) = M(f_0, f_1)}.

Since p is a continuous function it follows that t_p is upper semicontinuous. Also M(f_0, f_1) is the greatest value of t_p and this function is decreasing; therefore the set D_s is a closed segment with left end-point equal to zero. It should also be noted that the set D_r is nonempty. Indeed, since p^+ ∈ P1 it follows that D_r contains an exact penalty parameter of P(f_0, f_1) with respect to p^+. The function r_{p^+}(1, ·) is increasing and upper semicontinuous, and since M(f_0, f_1) is the greatest value of this function, it follows that D_r is a closed segment. Due to (3.8) we can say that D_s ∪ D_r = [0, +∞). Since both D_s and D_r are closed segments, we conclude that the set D_l := D_s ∩ D_r ≠ ∅. Let d̄ ∈ D_l; then t_p(d̄) = M(f_0, f_1) and r_{p^+}(1, d̄) = M(f_0, f_1). Due to (3.5) we have q_p(d̄) = M(f_0, f_1). □

Remark 1. Assume that p^+ ∈ P1 and an exact penalty parameter exists. It easily follows from the second part of the proof of Theorem 2 that the set of Lagrange multipliers coincides with the closed segment D_l = D_s ∩ D_r.

Note that for penalty parameters the following assertion (A_pen) holds: a number which is greater than an exact penalty parameter is also an exact penalty parameter.
The corresponding assertion (A_lag): a number which is greater than a Lagrange multiplier is also a Lagrange multiplier, does not hold in general. Assume that a Lagrange multiplier exists. Then according to Theorem 2 an exact penalty parameter also exists. It follows from Remark 1 that (A_lag) holds if and only if D_s = [0, +∞), that is,

inf_{x∈X_0} p(f_0(x), d f_1(x)) = M(f_0, f_1) for all d ≥ 0.    (3.10)

We now point out two cases where (3.10) holds.


One of them is closely related to penalization. If p is an arbitrary IPH function such that p(1, 0) = 1 and f_1(x) = 0 for all x ∈ X_0 (in other words, f_1^+ = f_1), then (3.10) holds.

We now remove condition f1+ = f1 and consider very special IPH functions,
for which (3.10) holds without this condition. Namely, we consider a class P∗
of IPH functions defined on the half-plane IR2∗ = {(u, v) : u ≥ 0} such that
(3.10) holds for each problem (f0 , f1 ) ∈ CX .
The class P∗ consists of functions p : IR2∗ → IR, such that the restriction
of p on the cone IR2+ belongs to P1 and p(u, v) = u for (u, v) ∈ IR2∗ with
v ≤ 0.
It is clear that each p ∈ P∗ is positively homogeneous of the first degree. Let
us now describe some further properties of p. Let (u, v) ≥ (u′, v′). Assuming without loss of generality that v ≥ 0, v′ ≤ 0, we have

p(u, v) ≥ p(u′, 0) ≥ u′ = p(u′, v′)

so p is increasing. Since p(u, 0) = u, it follows that p is continuous. Thus


P∗ consists of IPH continuous functions. The simplest example of a function
p ∈ P∗ is p(u, v) = max(u, av) with a > 0. Clearly the function
p_k(u, v) = max((u^k + a v^k)^{1/k}, u)

with k = (2l + 1)/(2m + 1), l, m ∈ N belongs to P_* as well.
Let us check that (3.10) holds for each (f_0, f_1) ∈ C_X. Indeed, since f_0(x) > 0 for all x ∈ X, we have

inf_{x∈X_0} p(f_0(x), d f_1(x)) = inf_{x∈X_0} f_0(x) = M(f_0, f_1) for all d ≥ 0.

3.4 Generalized Lagrange functions

In this section we consider problems P (f0 , f1 ) such that both f0 and f1 are
directionally differentiable functions defined on a set X ⊆ IRn . Recall that
a function f defined on X is called directionally differentiable at a point x ∈ int X if for each z ∈ IR^n there exists the derivative f′(x, z) at the point x in the direction z:

f′(x, z) = lim_{α→+0} (1/α)(f(x + αz) − f(x)).
Usually only directionally differentiable functions with a finite derivative are
considered. We also accept functions whose directional derivative can attain
the values ±∞. It is well known (see, for example, [1]) that the maximum of
two directionally differentiable functions is also directionally differentiable. In
particular the function f + is directionally differentiable, if f is directionally
differentiable. Let f(x) = 0. Then

(f^+)′(x, z) = max(f′(x, z), 0) = (f′(x, z))^+.



Let s_k, k > 0, be the function defined on IR^2_+ by (3.3). Assume that there exists an exact penalty parameter for a problem P(f_0, f_1) with (f_0, f_1) ∈ C_X. It easily follows from results in [5, 6] that an exact penalty parameter with respect to k′ < k also exists and that the smallest exact penalty parameter d̄_{k′} with respect to s_{k′} is smaller than the smallest exact penalty parameter d̄_k with respect to s_k. The question then arises: does this property hold for Lagrange multipliers? (This question makes sense only for k = (2l + 1)/(2m + 1) with l, m ∈ N and functions s_k defined by (3.3) on the half-plane IR^2_*.) We provide a proof that the answer to this question is, in general, negative.
Let f be a directionally differentiable function defined on a set X and let
x ∈ intX. We say that x is a min-stationary point of f on X if for each
direction z either f  (x, z) = 0 or f  (x, z) = +∞. We now present a simple
example.
Example 1. Let X = IR,

f_1(x) = √x if x > 0, −x if x ≤ 0;    f_2(x) = √x if x > 0, x if x ≤ 0;
f_3(x) = −√x if x > 0, −x if x ≤ 0.

Then the point x = 0 is a min-stationary point for f_1 and f_2, but this point is not min-stationary for f_3.

Proposition 1. (Necessary condition for a local minimum). Let x ∈ intX


be a local minimizer of a directionally differentiable function f . Then x is a
min-stationary point of f .

Proof. Indeed, for all z ∈ IR^n and sufficiently small α > 0 we have (1/α)(f(x + αz) − f(x)) ≥ 0. Thus the result follows. □

Consider a problem P (f0 , f1 ) where (f0 , f1 ) ∈ CX are functions with finite


directional derivatives. Consider the IPH function sk defined by (3.3). Let us
define the corresponding Lagrange-type function Lsk :

L_{s_k}(x, λ) = f_0(x)^k + λ f_1(x)^k.    (3.11)

We have for x ∈ X such that f_1(x) ≠ 0 that

L′_{s_k}(x, z; λ) = k f_0(x)^{k−1}(f_0)′(x, z) + λ k f_1(x)^{k−1}(f_1)′(x, z).    (3.12)

Assume now that f_1(x) = 0. We consider the following cases separately.
1) k > 1. Then

L′_{s_k}(x, z; λ) = k f_0(x)^{k−1}(f_0)′(x, z).    (3.13)

2) k = 1. Then

L′_{s_k}(x, z; λ) = (f_0)′(x, z) + λ(f_1)′(x, z).    (3.14)

3) k < 1. First we calculate the limit

A(z) := lim_{α→+0} (1/α)(f_1(x + αz))^k = lim_{α→+0} (1/α)(f_1(x) + α f_1′(x, z) + o(α))^k = lim_{α→+0} (1/α)(α f_1′(x, z) + o(α))^k.

We have

A(z) = +∞ if f_1′(x, z) > 0,   A(z) = 0 if f_1′(x, z) = 0,   A(z) = −∞ if f_1′(x, z) < 0.

Hence

L′_{s_k}(x, z; λ) = +∞ if f_1′(x, z) > 0;   = k f_0(x)^{k−1}(f_0)′(x, z) if f_1′(x, z) = 0;   = −∞ if f_1′(x, z) < 0.    (3.15)
Note that for problems P(f_0, f_1) with (f_0, f_1) ∈ C_X a minimizer is located on the boundary of the set of feasible elements {x : f_1(x) ≤ 0}.
Proposition 2. Let k > 1. Let (f_0, f_1) ∈ C_X. Assume that the functions f_0 and f_1 have finite directional derivatives at a point x̄ ∈ int X, which is a minimizer of the problem P(f_0, f_1). Assume that

there exists u ∈ IR^n such that (f_0)′(x̄, u) < 0,    (3.16)

(that is, x̄ is not a min-stationary point for the function f_0 over X). Then the point x̄ is not a min-stationary point of the function L_{s_k} for each λ > 0.

Proof. Assume that x̄ is a min-stationary point of the function Lsk (x; λ) over
X. Then combining Proposition 1 and (3.13) we have

f_0(x̄)^{k−1}(f_0)′(x̄, z) ≥ 0,   z ∈ IR^n.

Since f_0(x̄) > 0 it follows that (f_0)′(x̄, z) ≥ 0 for all z, which contradicts (3.16).

It follows from this proposition that the Lagrange multiplier with respect
to Lsk (k > 1) does not exist for a problem P (f0 , f1 ) if (3.16) holds. Condition
(3.16) means that the constraint f1 (x) ≤ 0 is essential, that is, a minimum
under this constraint does not remain a minimum without it.
Remark 2. Consider a problem P(f_0, f_1) with (f_0, f_1) ∈ C_X. Then under some mild assumptions there exists a number k > 1 such that the zero duality gap property holds for the problem P(f_0^k, f_1^k) with respect to the classical Lagrange function (see [2]). This means that

sup_{λ>0} inf_{x∈X} (f_0^k(x) + λ f_1^k(x)) = inf_{x∈X: f_1(x)≤0} f_0^k(x).

Clearly this is equivalent to

sup_{λ>0} inf_{x∈X} s_k(f_0(x), λ f_1(x)) = inf_{x∈X: f_1(x)≤0} f_0(x),

that is, the zero duality gap property with respect to s_k holds. It follows from Proposition 2 that a Lagrange multiplier with respect to s_k does not exist. Hence there is no Lagrange multiplier for P(f_0, f_1) with respect to the classical Lagrange function.
Remark 3. Let g(x) = f1+ (x). Then the penalty-type function for P (f0 , f1 )
with respect to sk coincides with the Lagrange-type function for P (f0 , g) with
respect to sk . Hence an exact penalty parameter with respect to this penalty
function does not exist if (3.16) holds.
Proposition 3. Let k < 1 and let (f_0, f_1) ∈ C_X. Assume that the functions f_0 and f_1 have finite directional derivatives at a point x̄ ∈ int X, which is a minimizer for the problem P(f_0, f_1). Assume that

there exists u ∈ IR^n such that (f_1)′(x̄, u) < 0,    (3.17)

(that is, x̄ is not a min-stationary point for the function f_1 over X). Then the point x̄ is not a min-stationary point of the function L_{s_k} for each λ > 0.
Proof. Assume that x̄ is a min-stationary point of L_{s_k}. Then combining Proposition 1, (3.15) and (3.17) we get a contradiction. □
It follows from this proposition that a Lagrange multiplier with respect to L_{s_k}, k < 1, does not exist if condition (3.17) holds. We now give the simplest example where (3.17) is valid. Let f_1 be a differentiable function with ∇f_1(x̄) ≠ 0. Then (3.17) holds.
Consider now a more complicated and interesting example. Let f_1(x) = max_{i∈I} g_i(x), where the g_i are differentiable functions. Then f_1 is a directionally differentiable function and f_1′(x̄, u) = max_{i∈I(x̄)} [∇g_i(x̄), u], where I(x̄) = {i ∈ I : g_i(x̄) = f_1(x̄)} and [x, y] stands for the inner product of vectors x and y.
Thus (3.17) holds in this case if and only if there exists a vector u such that
[∇gi (x̄), u] < 0 for all i ∈ I(x̄). To understand the essence of this result, let us
consider the following mathematical programming problem with m inequality
constraints:

min f0 (x) subject to gi (x) ≤ 0, i ∈ I = {1, . . . , m}. (3.18)

We can present (3.18) as the problem P (f0 , f1 ) with f1 (x) = maxi∈I gi (x).
Recall the well-known Mangasarian–Fromovitz (MF) constraint qualification

for (3.18) (see, for example, [3]): (MF) holds at a point x̄ if there exists a
vector u ∈ IRn such that [∇gi (x̄), u] < 0 for all i ∈ I such that gi (x̄) = 0. Thus
(3.17) for P(f_0, f_1) is equivalent to the (MF) constraint qualification for (3.18). In
other words, if (MF) constraint qualification holds then a Lagrange multiplier
for Lsk with k < 1 does not exist. (It is known that (MF) implies the existence
of a Lagrange multiplier with k = 1.)
Let (f_0, f_1) ∈ C_X, where f_0, f_1 are functions with finite directional derivatives. Let g = f_1^+ and let x be a point such that f_1(x) = 0. Then g′(x, z) = max(f_1′(x, z), 0) ≥ 0 for all z; hence (3.17) does not hold for the problem P(f_0, g). This means that Proposition 3 cannot be applied to a penalty function for P(f_0, f_1) with respect to s_k.
Simple examples show that exact penalty parameters with respect to sk
with k < 1 can exist. We now present an example from [5]. We do not provide
any details. (These can be found in [5], Example 4.6.)
Example 2. Let 0 < b < c < a be real numbers and X = [0, c]. Let f_0(x) = (a − x)^2, f_1(x) = x − b, so P(f_0, f_1) coincides with the following problem:

minimize (a − x)^2 subject to x ≤ b, x ∈ X.

Let k = 1. Then an exact penalty parameter exists and the least exact penalty parameter d̄_1 is equal to 2(a − b). Let k = 1/2. Then an exact penalty parameter also exists and the least exact penalty parameter d̄_{1/2} coincides with c − b. We indicate the following two points:
1) d̄_1 does not depend on the set X; d̄_{1/2} depends on this set.
2) d̄_1 depends on the parameter a, that is, on the turning point of the parabola; d̄_{1/2} does not depend on this parameter.

3.5 Example

Consider the following one-dimensional optimization problem:

min f_0(x) = x^3 − (9x^2)/2 + (7x)/2 + 5,    (3.19)
subject to f_1(x) = x − 2 ≤ 0,  x ∈ X = [0, 4].

A graphical representation of this problem is given in Figure 3.1, where the shaded area represents the product of the feasible region and the axis {(0, y) : y ∈ R}.
It can easily be shown that for this problem M(f_0, f_1) = 2 at x̄ = 2.
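A rough numerical check of this claim (an illustrative sketch only, using a simple grid over the feasible interval):

```python
# Grid check that the constrained minimum of (3.19) is 2, attained at x = 2.
import numpy as np

f0 = lambda x: x**3 - 4.5 * x**2 + 3.5 * x + 5          # objective of (3.19)
xs = np.linspace(0.0, 2.0, 200001)                      # feasible set X ∩ {f1 <= 0} = [0, 2]
vals = f0(xs)
print(xs[np.argmin(vals)], vals.min())                  # -> 2.0  2.0  (up to grid resolution)
```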

3.5.1 The Lagrange function approach


The corresponding extended Lagrangian for (3.19) is

L_{s_k}(x, λ) = s_k(f_0(x), λ f_1(x)) = ((x^3 − (9x^2)/2 + (7x)/2 + 5)^k + λ^k (x − 2)^k)^{1/k},

Fig. 3.1 P(f_0, f_1) (graph of the objective function f_0).

recalling that k = (2l + 1)/(2m + 1).
Now consider

dL/dx̄ = (∂L/∂f_0) f_0′(x̄) + λ (∂L/∂f_1) f_1′(x̄).    (3.20)
An easy calculation shows that

dL/dx(x̄) = −5/2 if k > 1;   dL/dx(x̄) = −5/2 + λ if k = 1;   dL/dx(x̄) = ∞ if k < 1.    (3.21)

From this it is clear that an exact Lagrange multiplier λ̄ = 5/2 may exist only for the case k = 1.

Remark 4. In fact Figure 3.2 shows that in this example λ̄ = 5/2 provides a local minimum at x̄ for k = 1 but not a global minimum; therefore it follows that no exact Lagrange multiplier exists for this problem.
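The observation of Remark 4 is easy to check numerically; the sketch below (illustrative only) evaluates the classical Lagrangian L(x; 5/2) = f_0(x) + (5/2)f_1(x) over X = [0, 4]:

```python
# For k = 1 and lambda = 5/2 the Lagrangian has value 2 at the local minimum x = 2,
# but its infimum over X = [0, 4] is 0, attained at x = 0, so q(5/2) < M(f0, f1) = 2.
import numpy as np

f0 = lambda x: x**3 - 4.5 * x**2 + 3.5 * x + 5
f1 = lambda x: x - 2.0
L = lambda x, lam: f0(x) + lam * f1(x)

xs = np.linspace(0.0, 4.0, 400001)
vals = L(xs, 2.5)
print(vals.min(), xs[np.argmin(vals)])        # -> 0.0 at x = 0.0
print(L(2.0, 2.5))                            # -> 2.0, the (merely local) minimum value
```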

3.5.2 Penalty function approach

The corresponding penalty function for (3.19) is

L^+_{s_k}(x; λ) = (f_0^k + (λ f_1^+)^k)^{1/k}
= ((x^3 − (9x^2)/2 + (7x)/2 + 5)^k + λ^k (x − 2)^k)^{1/k},  for x ≥ 2,
= x^3 − (9x^2)/2 + (7x)/2 + 5,  for x ≤ 2.

Fig. 3.2 L(x; 5/2).

By Theorem 13.6 it can easily be shown that an exact penalty parameter exists when k < 1. This is shown in Figure 3.3, where an exact penalty parameter, d̄ = 1, is used.

Fig. 3.3 L^+_{s_{1/3}}(x; 1).
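For comparison, the penalty function with k = 1/3 and d̄ = 1 can be evaluated on the same grid (again an illustrative sketch only):

```python
# The minimum of L+_{s_{1/3}}(x; 1) over X = [0, 4] is 2, attained at x = 2,
# so d = 1 is an exact penalty parameter, consistent with Figure 3.3.
import numpy as np

f0 = lambda x: x**3 - 4.5 * x**2 + 3.5 * x + 5
f1plus = lambda x: np.maximum(x - 2.0, 0.0)
penalty = lambda x, d, k: (f0(x)**k + (d * f1plus(x))**k) ** (1.0 / k)

xs = np.linspace(0.0, 4.0, 400001)
vals = penalty(xs, d=1.0, k=1.0 / 3.0)
print(vals.min(), xs[np.argmin(vals)])        # -> approximately 2.0 near x = 2.0
```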

From these results we have shown that whereas the adoption of extended
penalty functions of the form sk yields an improvement to the traditional
penalty function approach, this cannot be generalized to improve the La-
grange approach.

References

1. V. F. Demyanov and A. M. Rubinov, Constructive Nonsmooth Analysis (Peter Lang, Frankfurt on Main, 1995).
2. D. Li, Zero duality gap for a class of nonconvex optimization problems, J. Optim.
Theory Appl., 85 (1995), 309–324.

3. Z. Q. Luo, J. S. Pang and D. Ralph, Mathematical Programming with Equilibrium


Constraints (Cambridge University Press, Cambridge, 1996).
4. A. M. Rubinov, B. M. Glover and X. Q. Yang, Decreasing functions with applications
to penalization, SIAM J. Optim., 10(1) (1999), 289–313.
5. A. M. Rubinov, Abstract Convexity and Global Optimization (Kluwer Academic Publishers, Dordrecht, 2000).
6. A. M. Rubinov, X. Q. Yang and A. M. Bagirov, Penalty functions with a small penalty parameter, Optim. Methods Softw. 17 (2002), 931–964.
Chapter 4
Convergence of truncates in l1 optimal
feedback control

Robert Wenczel, Andrew Eberhard and Robin Hill

Abstract Existing design methodologies based on infinite-dimensional linear


programming generally require an iterative process often involving progres-
sive increase of truncation length, in order to achieve a desired accuracy. In
this chapter we consider the fundamental problem of determining a priori es-
timates of the truncation length sufficient for attainment of a given accuracy
in the optimal objective value of certain infinite-dimensional linear programs
arising in optimal feedback control. The treatment here also allows us to con-
sider objective functions lacking interiority of domain, a problem which often
arises in practice.

Key words: l1 -feedback control, epi-distance convergence, truncated convex


programs

4.1 Introduction

In the literature on feedback control there exist a number of papers addressing


the problem of designing a controller to optimize the response of a system
to a fixed input. In the discrete-time context there are many compelling

Robert Wenczel
Department of Mathematics, Royal Melbourne University of Technology, Melbourne 3001,
AUSTRALIA
Andrew Eberhard
Department of Mathematics, Royal Melbourne University of Technology, Melbourne 3001,
AUSTRALIA
e-mail: [email protected]
Robin Hill
Department of Mathematics, Royal Melbourne University of Technology, Melbourne 3001,
AUSTRALIA


reasons for using error-sequences in the space l1 as the basic variable, and
using various measures of performance (see, for example, [15]) which includes
the l1 -norm (see, for example, [11]). The formulation of l1 -problems leads
to the computation of inf M f (and determination of optimal elements, if
any), where M is an affine subspace of a suitable product X of l1 with
itself. This subspace is generated by the YJBK (or Youla) parameterization
[24] for the set of all stabilizing feedback-controllers for a given linear, time-
invariant system. The objective f will typically be the l1 -norm, (see [6],
[11]). As is now standard in the literature on convex optimization ( [20],
[21]), we will use convex discontinuous extended-real-valued functions, of the
form f = ‖·‖_1 + δ_C, where δ_C is the indicator function (identically zero on C,
identically +∞ elsewhere) of some closed convex set in l1 . This formalism is
very flexible, encompassing many problem formats, including the case of time-
domain-template constraints ([10, Chapter 14], [14]). These represent bounds
on signal size, of the form Bi ≤ ei ≤ Ai (for all i), where e = {ei }∞i=0 denotes
the error signal. Often there are also similar bounds on the control signal u. As
the Youla parameterization generates controllers having rational z-transform,
the variables in M should also be taken to be rational in this sense, if the
above infimum is to equal the performance limit [6] for physically realizable
controllers. (If this condition is relaxed, then inf M f provides merely a lower
bound for the physical performance limit.) The set M may be recharacterized
by a set of linear constraints and thus forms an affine subspace of X. The
approach in many of the above references is to evaluate inf M f by use of
classical Lagrangian-type duality theory for such minimization problems. The
common assumption is that the underlying space X is l1 , or a product thereof,
and that M is closed, forcing it to contain elements with non-rational z-
transform, counter to the “physical” model.
Consequently, inf M f may not coincide with the performance limit for
physically realizable (that is, rational) controllers. However, in the context
of most of the works cited above, such an equality is actually easily estab-
lished (as was first noted in [27]). Indeed, whenever it is assumed that C has
nonempty interior, on combining this with the density in M of the subset con-
sisting of its rational members [19], we may deduce (see Lemma 15 below)
that the rational members of C ∩ M are l1 -dense in C ∩ M . This yields the
claimed equality for any continuous objective function (such as the l1 -norm).
Use of the more modern results of conjugate duality permits the extension
of the above approach to a more general class of (−∞, +∞]-valued objective
functions f . They are applicable even when int C may vanish. In this case,
the question of whether inf M f equals the physical limit becomes nontrivial
(in contrast to the case when C has interior). Indeed, if inf M f is strictly less
than the physical limit, any result obtained by this use of duality is arguably
of questionable engineering significance.
It is therefore important to know when inf M f is precisely the performance
limit for physically realizable controllers, to ensure that results obtained via
the duality approaches described above are physically meaningful. Note that

this question is posed in the primal space, and may be analyzed purely in the
primal space. This question will be the concern of this chapter.
In this paper, we derive conditions on the system, and on the time-domain
template set C, that ensure inf M (f0 + δC ) = inf C∩M f0 is indeed the per-
formance limit for physically realizable controllers, for various convex lower-
semicontinuous performance measures f0 . Here we only treat a class of 1-
input/2-output problems (referred to as “two-block” in the control litera-
ture), although the success of these methods for this case strongly suggests
the possibility of a satisfactory extension to multivariable systems.
Existing results on this question (see, for example, [14, Theorem 5.4])
rely on Lagrangian duality theory, and thereby demand that the time-domain
template C has interior. Here, for the class of two-block problems treated, we
remove this interiority requirement. Our result will be obtained by demon-
strating the convergence of a sequence of truncated (primal) problems. More-
over, this procedure will allow the explicit calculation of convergence esti-
mates, unlike all prior works with the exception of [23]. (This latter paper
estimates bounds on the truncation length for a problem with H∞ -norm
constraints and uses state-space techniques, whereas our techniques are quite
distinct.) The approach followed in our chapter has two chief outcomes. First,
it validates the duality-based approach in an extended context, by ensuring
that the primal problem posed in the duality recipe truly represents the
limit-of-performance for realizable controllers. Secondly, it provides a com-
putational alternative to duality itself, by exhibiting a convergent sequence of
finite-dimensional primal approximations with explicit error estimates along
the sequence. (This contrasts with traditional “primal–dual” approximation
schemes, which generally do not yield explicit convergence rates.)
This will be achieved by the use of some recently developed tools in opti-
mization theory (the relevant results of which are catalogued in Section 4.5)—
namely, the notion of epi-distance (or Attouch–Wets) convergence for convex
sets and functions [2, 3, 4, 5]. Epi-distance convergence has the feature that
if fn converges to f in this sense, then subject to some mild conditions,
| inf_X f_n − inf_X f | ≤ d(f, f_n),

where d(f, g) is a metric describing this mode of convergence. Since the l1 -


control problem is expressible as inf X (f + δM ), this leads naturally to the
question of the Attouch–Wets convergence of sums fn + δMn of sequences of
functions (where fn and Mn are approximations to f and M respectively). A
result from [5], which estimates the “epi-distances” d(fn + gn , f + g) between
sums of functions, in terms of sums of d(fn , f ) and d(gn , g), is restated and
modified to suit our purpose. This result requires that the objective and
constraints satisfy a so-called “constraint qualification” (CQ).
In Section 4.6 some conditions on the template C and on M are derived
that ensure that the CQ holds. Also, the truncated problems will be defined,
and some fundamental limitations on the truncation scheme (in relation to

satisfiability of the CQ) will be discussed. Specifically, the basic optimization


will be formulated over two variables e and u in l1 . The truncated approxi-
mate problems will be formed by truncating in the e variable only, since the
CQ will be seen to fail under any attempt to truncate in both variables. Thus
in these approximate problems, the set of admissible (e, u) pairs will contain
a u of infinite length. Despite this, it will be noted that the set of such (e, u)’s
will be generated by a finite basis (implied by the e’s), so these approximates
are truly finite-dimensional.
Finally, in Section 4.7, the results of Section 4.5 will be applied to deduce
convergence of a sequence of finite-dimensional approximating minimizations.
This will follow since the appropriate distances d(Cn , C) can be almost triv-
ially calculated. Also, we observe that for a sufficiently restricted class of
systems, our truncation scheme is equivalent to a simultaneous truncation in
both variables e and u. In fact, this equivalence is satisfied precisely when
the system has no minimum-phase zeros.
The motivation for this work arose from a deficiency in current computa-
tional practices which use simultaneous primal and dual truncations. These
yield upper and lower bounds for the original primal problem, but have the
disadvantage of not providing upper estimates on the order of truncation
sufficient for attainment of a prescribed accuracy.

4.2 Mathematical preliminaries

We let R̄ stand for the extended reals [−∞, +∞]. For a Banach space X, balls in X centered at 0 will be written as B(0, ρ) = {x ∈ X | ‖x‖ < ρ} and B̄(0, ρ) = {x ∈ X | ‖x‖ ≤ ρ}. Corresponding balls in the dual space X* will be denoted B*(0, ρ) and B̄*(0, ρ) respectively. The indicator function of a set A ⊆ X will be denoted δ_A. We will use u.s.c. to denote upper-semicontinuity and l.s.c. to denote lower-semicontinuity. Recall that a function f : X → R̄ is called proper if never equal to −∞ and not identically +∞, and proper closed if it is also l.s.c. For a function f : X → R̄, the epigraph of f, denoted epi f, is the set {(x, α) ∈ X × R | f(x) ≤ α}. The domain, denoted dom f, is the set {x ∈ X | f(x) < +∞}. The (sub-)level set {x ∈ X | f(x) ≤ α} (where α > inf_X f) will be given the abbreviation {f ≤ α}. For ε > 0, and if inf_X f is finite, ε-argmin f = {x ∈ X | f(x) ≤ inf_X f + ε} is the set of ε-approximate minimizers of f. Any product X × Y of normed spaces will always be understood to be endowed with the box norm ‖(x, y)‖ = max{‖x‖, ‖y‖}; any balls in such product spaces will always be with respect to the box norm.
Here l1(C) denotes the Banach space of all complex sequences a = {a_n}_{n=0}^∞ such that ‖a‖_1 := Σ_{n=0}^∞ |a_n| is finite; l1 denotes the Banach space of all real sequences in l1(C); and l∞ denotes the Banach space of all real sequences a = {a_n}_{n=0}^∞ such that ‖a‖_∞ := sup_n |a_n| is finite.

For two sequences a, b their convolution a ∗ b is the sequence (a ∗ b)_i = Σ_{j=0}^i a_j b_{i−j}.
The length of a sequence a, denoted l(a), is the smallest integer n such that a_i = 0 for all i ≥ n.
We define D̄ to be the closed unit disk {z ∈ C | |z| ≤ 1} in the complex plane; D is the open unit disk {z ∈ C | |z| < 1}.
The z-transform of a = {a_n}_{n=0}^∞ is the function â(z) = Σ_{n=0}^∞ a_n z^n for complex z wherever it is defined. The inverse z-transform of â will be written as Z^{−1}(â).
Also l̂1 denotes the set of all z-transforms of sequences in l1. It can be regarded as a subset of the collection of all continuous functions on D̄ that are analytic on D.
We use R∞ to denote the set of all rational functions of a complex variable with no poles in D̄.
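As a small illustration of these sequence operations (not taken from the original text; the example sequences are arbitrary), a direct implementation might look as follows:

```python
# Finite convolution (a * b)_i = sum_{j <= i} a_j b_{i-j} and the z-transform
# â(z) = sum_n a_n z^n, using the z^n convention adopted in this chapter.
import numpy as np

def convolve(a, b):
    out = np.zeros(len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        out[i:i + len(b)] += ai * np.asarray(b, dtype=float)
    return out

def z_transform(a, z):
    return sum(an * z**n for n, an in enumerate(a))

a, b = [1.0, -0.5, 0.25], [2.0, 1.0]
print(convolve(a, b))          # [2.0, 0.0, 0.0, 0.25]
print(z_transform(a, 0.5))     # 1 - 0.25 + 0.0625 = 0.8125
```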

Definition 1. Let A be a convex set in a topological vector space and x ∈ A.


Then
1 cone A = ∪λ>0 λA (the smallest convex cone containing A);
2 The core or algebraic interior of A is characterized as x ∈ core A iff ∀y ∈ X, ∃ε > 0 such that ∀λ ∈ [−ε, ε] we have x + λy ∈ A.

The following generalized interiority concepts were introduced in [8] and


[17, 18] respectively in the context of Fenchel duality theory, to frame a
sufficient condition for strong duality that is weaker than the classical Slater
condition.

Definition 2. Let A be a convex set in a topological vector space and x ∈ A.


Then
1 The quasi relative interior of A (qri A) consists of all x in A for which cl cone(A − x) is a closed subspace of X;
2 The strong quasi relative interior of A (sqri A) consists of all x in X for which cone(A − x) is a closed subspace of X.

Note that 0 ∈ core A if and only if cone A = X, and that in general,


core A ⊆ sqri A ⊆ qri A.
Nearly all modern results in conjugate duality theory use constraint qual-
ifications based on one or other of these generalized interiors. Some appli-
cations of such duality results to (discrete-time) feedback optimal control
may be found in [11] and [15]. An example of an application to a problem in
deterministic optimal control (that is, without feedback) is outlined in [17].
From [16] and [20] we have the following. Recall that a set A in a topological linear space X is ideally convex if for any bounded sequence {x_n} ⊆ A and any sequence {λ_n} of nonnegative numbers with Σ_{n=1}^∞ λ_n = 1, the series Σ_{n=1}^∞ λ_n x_n either converges to an element of A, or else does not converge at all. Open or closed convex sets are ideally convex, as is any finite-dimensional convex set. In particular, if X is Banach, then such series always converge, and the definition of ideal convexity only requires that Σ_{n=1}^∞ λ_n x_n be in A. From [16, Section 17E] we have the following proposition.
Section 17E] we have the following proposition.

Proposition 1. For a Banach space X,


1 If C ⊆ X is closed convex, it is ideally convex.
2 For ideally convex C, core C̄ = core C = int C̄ = int C.
3 If A and B are ideally convex subsets of X, one of which is bounded, then
A − B is ideally convex.

Proof. We prove the last assertion only; the rest can be found in the cited reference. Let {a_n − b_n} ⊆ A − B be a bounded sequence and let λ_n ≥ 0 be such that Σ_{n=1}^∞ λ_n = 1. Then, due to the assumed boundedness of one of A or B, {a_n} ⊆ A and {b_n} ⊆ B are both bounded, yielding the convergent sums Σ_{n=1}^∞ λ_n a_n ∈ A and Σ_{n=1}^∞ λ_n b_n ∈ B. Thus Σ_{n=1}^∞ λ_n(a_n − b_n) = Σ_{n=1}^∞ λ_n a_n − Σ_{n=1}^∞ λ_n b_n ∈ A − B. □

Corollary 1. Let A and C be closed convex subsets of the Banach space X.


Then

0 ∈ core (A − C) implies 0 ∈ int (A ∩ B(0, ρ) − C ∩ B(0, ρ))

for some ρ > 0.

Proof. Let ρ > inf_{A∩C} ‖·‖ and let x̄ ∈ A ∩ C ∩ B(0, ρ). Then for x ∈ X, x = λ(a − c) for some λ > 0, a ∈ A, c ∈ C, since by assumption cone(A − C) = X. Then for any t ≥ 1 sufficiently large so that t^{−1}(‖a‖ + ‖c‖) + ‖x̄‖ < ρ,

x = tλ [ ( (1/t)a + (1 − 1/t)x̄ ) − ( (1/t)c + (1 − 1/t)x̄ ) ]
  ∈ tλ (A ∩ B(0, ρ) − C ∩ B(0, ρ)) ⊆ cone(A ∩ B(0, ρ) − C ∩ B(0, ρ)).

Hence 0 ∈ core(A ∩ B(0, ρ) − C ∩ B(0, ρ)) from the arbitrariness of x ∈ X. The result follows since by Proposition 1 the core and interior of A ∩ B(0, ρ) − C ∩ B(0, ρ) coincide. □

4.3 System-theoretic preliminaries

4.3.1 Basic system concepts

In its most abstract form, a system may be viewed as a map H : XI → XO


between a space XI of inputs and a space XO of outputs. In the theory of
feedback stabilization of systems, interconnections appear where the output
of one system forms the input of another, and for this to make sense, XI

and XO will, for simplicity, be taken to be the same space X. The system H
is said to be linear, if X is a linear space and H a linear operator thereon.
Our focus will be on SISO (single-input/single-output) linear discrete-time
systems. In this case, X will normally be contained in the space RZ of real-
valued sequences.
For n ∈ N define a time-shift τn on sequences in X by

(τn φ)i := φi−n , φ∈X.

If each τn commutes with H, then H is called shift- (or time-) invariant. Our
interest will be in linear time-invariant (or LTI) systems.
It is well known [12] that H is LTI if and only if it takes the form of
the convolution operator h∗ for some h ∈ X. This h is called the impulse-
response of H, since h = H(δ) = h∗δ, where δ is the (Dirac) delta-function in
continuous time, or is the unit pulse sequence (1, 0, 0, . . . ), in discrete time.
The discrete-time LTI system H = h∗ is causal if the support of h lies
in the positive time-axis N = {0, 1, 2, . . .}. The significance of this notion is
clarified after
 observing the action of H on an input u, which takes the form
(Hu)n = k≤n hk un−k , so if h takes any nonzero values for negative time
then evaluation of the output at time n would require foreknowledge of input
behavior at later times.
Note that for LTI systems, a natural (but by no means the only) choice for
the “space of signals” X is the space of sequences for which the z-transform
exists. From now on we identify a LTI system H with its impulse-response
h := H(δ).

Definition 3. The LTI system h is BIBO (bounded-input/bounded-output)-


stable if h ∗ u ∈ l∞ whenever u ∈ l∞ .

From the proof of [12, Theorem 7.1.5], it is known that H = h∗ is BIBO-


stable if and only if h ∈ l1 .
Any LTI system H can be characterized by its transfer function, defined
to be the z-transform ĥ of its impulse-response h. The input–output relation
then takes the form û → ĥû for appropriate inputs u. Thus convolution equa-
tions can be solved by attending to an algebraic relation, much simplifying
the analysis of such systems.
For systems H1 , H2 we shall often omit the convolution symbol ∗ from
products H1 H2 , which will be understood to be the system (h1 ∗ h2 )∗. By
the commutativity of ∗, H1 H2 = H2 H1 . This notation will be useful in
that now formal manipulation of (LTI) systems is identical to that of their
transforms (that is, transfer functions). Consequently, we may let the sym-
bol H stand either for the system, or its impulse-response h, or its transfer
function ĥ.
We now express stability in terms of rational transfer functions. The alge-
bra R∞ to be defined below shall have a fundamental role in the theory of
feedback-stabilization of systems. First, we recall some notation:

R[z] — the space of polynomials with real coefficients, of the complex


variable z;
R(z) — the space of rational functions with real coefficients, of the complex
variable z. Here

R∞ := {h ∈ R(z) | h has no poles in the closed unit disk} .

It is readily established that

R∞ = l̂1 ∩ R(z),

so R∞ forms the set of all rational stable transfer functions.

4.3.2 Feedback stabilization of linear systems

Here we summarize the theory of stabilization of (rational) LTI systems by


use of feedback connection with other (rational) LTI systems. All definitions
and results given in this subsection may be found in [24].
Consider the feedback configuration of Figure 4.1. Here w is the “reference
input,” e = w − y is the “(tracking–) error,” representing the gap between
the closed-loop output y and the reference input w, and u denotes the control
signal (or “input activity,” or “actuator output”).

Fig. 4.1 A closed-loop control system.

Definition 4. A (rational) LTI discrete-time system K is said to (BIBO-)


stabilize the (rational) LTI system P if the closed loop in Figure 4.1 is stable
in the sense that for any bounded input w ∈ l∞ and any bounded additively-
applied disturbance (such as Δ in Figure 4.1) at any point in the loop, all
resulting signals in the loop are bounded. Such K is referred to as a stabilizing
compensator or controller.

It should be noted that the definition as stated applies to general non-LTI,


even nonlinear systems, and amounts to the requirement of “internal,” as well
as “external,” stability. This is of importance, since real physical systems have

a finite operating range, and it would not be desirable for, say, the control
signal generated by K to become too large and “blow up” the plant P .
Writing P = p∗ and K = k∗, with p̂ and k̂ in R(z) and noting that the
transfer function between any two points of the loop can be constructed by
addition or multiplication from the three transfer functions given in (4.1)
below, we obtain the following well-known result.

Proposition 2. K stabilizes P if and only if

1/(1 + p̂k̂) , k̂/(1 + p̂k̂) and p̂/(1 + p̂k̂) ∈ R∞ . (4.1)

We denote by S(P ) the set of rational stabilizing compensators for the


plant P . The fundamental result of the theory of feedback stabilization (the
YJBK factorization — see Proposition 3) states that S(P ) has the struc-
ture of an affine subset of R(z). Before we move to a precise statement of
this result, we remind the reader that we only intend to deal with single-
input/single-output systems. In the multi-input/multi-output case, each sys-
tem would be represented by a matrix of transfer functions, which greatly
complicates the analysis. We note, however, that this factorization result does
extend to this case (see [24, Chapter 5]).

Definition 5. Let n̂ and dˆ be in R∞ . We say that n̂ and dˆ are coprime if


there exist x̂ and ŷ in R∞ such that

x̂n̂ + ŷd̂ ≡ 1 in R(z),

(that is, for all z ∈ C, x̂(z)n̂(z) + ŷ(z)d̂(z) = 1 except at singularities). This can be easily shown to be identical to coprimeness in R∞ considered as an
abstract ring.

Let p̂ ∈ R(z). It has a coprime factorization p̂ = n̂/dˆ where n̂ and dˆ are in


R∞ . Indeed, we can write p̂ = q̂/r̂ for polynomials q̂ and r̂ having no common
factors, which implies coprimeness in R[z] and hence in R∞ ⊇ R[z].
We can now state the fundamental theorem [24, Chapter 2], also referred
to as the Youla (or YJBK) parameterization.
Proposition 3. Let the plant P have rational transfer function p̂ ∈ R(z), let n̂ and d̂ in R∞ form a coprime factorization and let x̂ and ŷ in R∞ arise from the coprimeness of n̂ and d̂. Then

S(P) = { (x̂ + d̂q̂)/(ŷ − n̂q̂) | q̂ ∈ R∞, q̂ ≠ ŷ/n̂ }.    (4.2)

This result has the following consequence for the stabilized closed-loop
mappings. Recall that 1/(1 + p̂ĉ) is the transfer function taking input w to e
(that is, ê = ŵ/(1 + p̂ĉ)) and ĉ/(1 + p̂ĉ) maps the input w to u. We now have
(see [24]) the following result.

Corollary 2. The set of all closed-loop maps Φ taking w to (e, u), achieved by some stabilizing compensator C ∈ S(P), is

Φ = { d̂ (ŷ, x̂) − q̂ d̂ (n̂, −d̂) | q̂ ∈ R∞, q̂ ≠ ŷ/n̂ },    (4.3)

and has the form of an affine set in R(z) × R(z).

4.4 Formulation of the optimization problem in l1

In the following, we consider SISO (single-input/single-output) rational, lin-


ear time-invariant (LTI) systems. For such a system (the ‘plant’) we char-
acterize the set of error-signal and control-signal pairs achievable for some
stabilizing compensator, in the one-degree-of-freedom feedback configuration
(Figure 4.1). The derivation, based on the Youla parameterization, is stan-
dard.
Assume that the reference input w ≠ 0 has rational z-transform and the plant P is rational and causal, so it has no pole at 0. For a = e^{iθ} on the unit circle, define a subspace X_a of l∞ by

X_a = l1 + span{c, s},

where

c := {cos kθ}_{k=0}^∞  and  s := {sin kθ}_{k=0}^∞.

We shall assume that the error signals (usually denoted by e here) are in l1 ,
so we are considering only those controllers that make the closed-loop system
track the input (in an l1 -sense), and we shall assume also that the associated
control signal remains bounded (and in fact resides in Xa ).

Definition 6. For any u ∈ Xa , let uc and us denote the (unique) real num-
bers such that u − uc c − us s is in l1 .

Let the plant P have transfer function P(z) with the coprime factorization P(z) = n̂(z)/d̂(z), where n̂ and d̂ are members of R[z] (the space of polynomials, with real coefficients, in the complex variable z). Write n̂(z) = Σ_{i=0}^s n_i z^i, d̂(z) = Σ_{i=0}^t d_i z^i, n := (n_0, n_1, .., n_s, 0, ..) and d := (d_0, .., d_t, 0, ..).
Let x and y be finite-length sequences such that x̂n̂ + ŷd̂ = 1 in R[z]. Their existence is a consequence of the coprimeness of n̂ and d̂ in R[z] (see [24]).
If we aim to perform an optimization over a subset of the set S(P ) of
stabilizing controllers K for P , such that for each K in this subset, the
corresponding error signal φ(K) and control output u(K) are in l1 and Xa
respectively, the appropriate feasible set is

F_0 := {(e, u) ∈ l1 × X_a | e = e(K) for some K ∈ S(P), u = K ∗ e}
     = {(e, u) ∈ l1 × X_a | (e, u) = w ∗ d ∗ (y, x) − q ∗ w ∗ d ∗ (n, −d) for some q such that q̂ ∈ R∞ \ {ŷ/n̂}},

where the latter equality follows from the YJBK factorization for S(P) (use Proposition 3 or Corollary 2).
Henceforth we shall always assume that w ∈ Xa is rational (that is, has
rational z-transform), the plant P is rational and has no zero at a.
By the lemma to follow (whose proof is deferred to the Appendix) we ob-
serve that a single sinusoid suffices to characterize the asymptotic (or steady-
state) behavior of u for all feasible signal pairs (e, u) ∈ F0 .

Lemma 1. Let w ∈ X_a and P be rational and assume that P(a) ≠ 0. Then for any (e, u) ∈ F_0,

u_c = β_c := w_c Re(1/P(a)) − w_s Im(1/P(a)),
u_s = β_s := w_s Re(1/P(a)) + w_c Im(1/P(a)).

Using this lemma, we can translate F0 in the u variable to obtain the set
F := F0 − (0, βc c + βs s), having the form (where we use the notation x ∗
(y, z) := (x ∗ y, x ∗ z) for sequences x, y, z)

F = {(e, u) ∈ l1 × l1 | (e, u) = w ∗ d ∗ (y, x) − (0, β_c c + β_s s) − q ∗ w ∗ d ∗ (n, −d) for some q such that q̂ ∈ R∞ \ {ŷ/n̂}}.

We need to recast F into a form to which the tools of optimization theory


can be applied. As a first step, formally define the sets

M := {(e, u) ∈ (l1)^2 |  ê(p̄_i) = 0 (p̄_i a pole of P with |p̄_i| ≤ 1), i = 1, .., m_1;
                         ê(z̄_j) = ŵ(z̄_j) (z̄_j a zero of P with |z̄_j| ≤ 1), j = 1, .., m_2;
                         ê(v̄_k) = 0 (v̄_k a zero of ŵ with |v̄_k| ≤ 1), k = 1, .., m_3;
                         d ∗ e + n ∗ u = w ∗ d − n ∗ (β_c c + β_s s) },

M_r := {(e, u) ∈ M | ê, û rational, e ≠ 0}  and

M^(0) := {e ∈ l1 |  ê(p̄_i) = 0 (p̄_i a pole of P with |p̄_i| ≤ 1), i = 1, .., m_1;
                    ê(z̄_j) = ŵ(z̄_j) (z̄_j a zero of P with |z̄_j| ≤ 1), j = 1, .., m_2;
                    ê(v̄_k) = 0 (v̄_k a zero of ŵ with |v̄_k| ≤ 1), k = 1, .., m_3 },

with the understanding that in the above sets, whenever P and ŵ have a common pole at a (and hence at ā), the constraint ê(a) = 0 is absent.

Moreover, note that in the definition of M, the constraints on ê at the zeros z̄ (if z̄ ≠ a, ā) are redundant, as follows from the closed-loop equation d ∗ e + n ∗ u = w ∗ d − n ∗ (β_c c + β_s s).
The above constraint system can also be obtained from the general multivariable formalism of [9], [14] etc., on removal of the redundancies in the latter. The content of the following remark will be used repeatedly in our work.

Remark 1. We note the following relation between M^(0) and M. Let (ē, ū) be any element of M. Then M^(0) − ē and M − (ē, ū) consist of elements satisfying the corresponding constraints with right-hand sides set to zero. Assuming that P (equivalently, n̂) has no zeros on the unit circle, and that all its zeros in D are simple, the map T on M^(0) − ē taking e to −Z^{−1}(d̂ê/n̂) maps into l1 (by Lemma 4 below) with ‖T‖ ≤ κ‖d‖_1. Then (e, Te) ∈ M − (ē, ū), since d ∗ e + n ∗ Te = 0, which follows on taking z-transforms.
since d ∗ e + n ∗ T e = 0, which follows on taking z-transforms.

The next two lemmas give simple sufficient conditions for the feasible
set F to be fully recharacterized as an affine subspace defined by a set of
linear constraints, and for when elements of finite length exist. Since they are
only minor modifications of standard arguments, we omit the proofs here,
relegating them to the Appendix.

Lemma 2. Assume that either


1 in D̄ the poles/zeros of P and ŵ are distinct and simple; or
2 in D̄\{a, ā} the poles/zeros of P and ŵ are distinct and simple, and P and
ŵ have a common simple pole at z = a.
Then Mr = F.

Lemma 3. Let P and w ∈ Xa be rational with P(a) ≠ 0. Assume also the


following:
1 either
(a) in D̄ the poles/zeros of P and ŵ are distinct and simple; or
(b) in D̄\{a, ā} the poles/zeros of P and ŵ are distinct and simple, and P
and ŵ have a common simple pole at z = a;
2 (w − wc c − ws s) ∗ d has finite length;
3 all zeros (in C) of P are distinct and simple.
Then M contains elements (ē, ū) of finite length.

Remark 2. In the above result, one can always choose that ē ≠ 0, so that
(ē, ū) ∈ Mr , which coincides with F by Lemma 2. Thus the latter is nonempty.

Lemma 4. Let f̂ ∈ l̂1 and let p(·) be a polynomial with real coefficients with
no zeros on the unit circle. Assume also that at each zero of p(·) in the open unit

disk, f̂ has a zero also, of no lesser multiplicity. Then f̂(·)/p(·) ∈ l̂1.
Also, if q := Z−1(f̂/p), then ‖q‖1 ≤ κ‖f‖1, where

    1/κ =          ∏          (1 − |a|) (1 − 1/|a′|) ,
          a, a′ zeros of p:
          |a| < 1, |a′| > 1

and where in the above product, the zeros appear according to multiplicity,
and moreover, both members of any conjugate pair of zeros are included.
Proof. See the Appendix. 
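As a small numerical companion to Lemma 4, the constant κ can be evaluated directly from the zeros of p. The sketch below is our own illustration (function name and the use of numpy.roots are assumptions, not part of the chapter), and it presumes no zero of p lies on the unit circle.

```python
import numpy as np

def kappa(p_coeffs):
    """Constant kappa of Lemma 4, computed from the zeros of p.

    p_coeffs lists the coefficients of p, highest degree first (the
    convention of numpy.roots).  Zeros are counted with multiplicity,
    and conjugate pairs both appear among the returned roots.
    """
    inv_kappa = 1.0
    for a in np.roots(p_coeffs):
        r = abs(a)
        # factor (1 - |a|) for zeros inside, (1 - 1/|a'|) for zeros outside
        inv_kappa *= (1.0 - r) if r < 1.0 else (1.0 - 1.0 / r)
    return 1.0 / inv_kappa

# Example: p(z) = (z - 0.5)(z - 2) gives 1/kappa = (1 - 0.5)(1 - 1/2) = 0.25
print(kappa([1.0, -2.5, 1.0]))   # ~4.0
```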
Remark 3. Such a result will not hold in general if p(·) has zeros on the unit
circle; a counterexample for |a| = 1 is as follows.
Let f0 = −1 and fj = a^{−j}/(j(j+1)) for j ≥ 1. Then f ∈ l1(C) and f̂(a) = 0 since
∑_{k=1}^∞ 1/(k(k+1)) = 1. If q is the inverse z-transform of [z ↦ f̂(z)/(z − a)], then for each k,

    |qk| = | (1/a^{k+1}) ∑_{j>k} fj a^j | = 1/(k+1) ,

so q ∉ l1(C).
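A quick numerical check of this counterexample (a hypothetical script, not from the chapter): the partial l1 sums of |qk| = 1/(k+1) grow without bound, roughly like log N.

```python
import numpy as np

# Partial l1 norms of q_k = 1/(k+1) from Remark 3; they keep growing,
# so q cannot belong to l1.
for N in (10, 100, 1000, 10000):
    print(N, np.sum(1.0 / (np.arange(N) + 1.0)))
```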
The hard bounds on the error and on input activity will be represented
by the closed convex sets C(e) and C(u) respectively, given by

    C(e) := {e ∈ l1 | Bi(e) ≤ ei ≤ Ai(e) for all i},                    (4.4)

where Bi(e) ≤ 0 ≤ Ai(e) eventually for large i, and

    C(u) := {u ∈ l1 | Bi(u) ≤ ui ≤ Ai(u) for all i},   where            (4.5)

    C := C(e) × C(u) .                                                   (4.6)

We shall also make use of the truncated sets

    Ck(e) := {e ∈ C(e) | ei = 0 for i ≥ k}.                              (4.7)

We shall not use the truncated sets Ck(u), for reasons that will be indicated
in Section 4.6.1.

4.5 Convergence tools

It is well known that there is no general correspondence between pointwise


convergence of functions, and the convergence, if any, of their infima, or of op-
timal or near optimal elements. Hence approximation schemes for minimiza-
tion of functions, based on this concept, are generally useless. In contrast,

schemes based on monotone pointwise convergence, or on uniform conver-


gence, have more desirable consequences in this regard. However, the cases
above do not include all approximation schemes of interest.
In response to this difficulty, a new class of convergence notions has gained
prominence in optimization theory, collectively referred to as “variational con-
vergence” or “epi-convergence.” These are generally characterized as suitable
set-convergences of epigraphs. For an extensive account of the theory, the
reader is referred to [1, 7]. Such convergences have the desirable property
that epi-convergence of functions implies, under minimal assumption, the
convergence of the infima to that of the limiting function, and further, for
some cases, the convergence of the corresponding optimal, or near-optimal,
elements.
The convergence notion with the most powerful consequences for the con-
vergence of infima is that of Attouch and Wets, introduced by them in [2, 3, 4].
Propositions 4 and 5 (see below) will form the basis for the calculation of ex-
plicit error estimates for our truncation scheme.

Definition 7. For sets C and D in a normed space X, and ρ > 0,

    d(x, D) := inf_{d∈D} ‖x − d‖ ,
    eρ(C, D) := sup_{x∈C∩B(0,ρ)} d(x, D),
    haus_ρ(C, D) := max{eρ(C, D), eρ(D, C)}   (the ρ-Hausdorff distance).

For functions f and g mapping X to R, the “ρ-epi-distance” between f and


g is defined by
dρ (f, g) := haus ρ (epi f, epi g).

It is easily shown that

haus ρ (C, D) = dρ (δC , δD ).

Definition 8. Let Cn and C be subsets of X. Then Cn Attouch–Wets con-


verges to C iff for each ρ > 0, we have

    lim_{n→∞} haus_ρ(Cn, C) = 0.

If fn and f are R-valued functions on X, then fn Attouch–Wets (or epi-


distance) converges to f if for each ρ > 0,

    lim_{n→∞} dρ(fn, f) = 0,

(that is, if their epigraphs Attouch–Wets converge).
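For intuition, the ρ-Hausdorff distance of Definitions 7 and 8 can be evaluated for finite point sets. The following sketch is our own illustration (the Euclidean norm stands in for the norm of X, and the helper names are assumptions); it simply mirrors the definitions.

```python
import numpy as np

def e_rho(C, D, rho):
    """One-sided excess e_rho(C, D) of Definition 7 for finite point clouds
    C and D (rows are points)."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    D = np.atleast_2d(np.asarray(D, dtype=float))
    C_ball = C[np.linalg.norm(C, axis=1) <= rho]     # C intersected with B(0, rho)
    if C_ball.shape[0] == 0:
        return 0.0                                   # convention: sup over empty set
    # distance from each x in C ∩ B(0, rho) to the set D
    dists = np.linalg.norm(C_ball[:, None, :] - D[None, :, :], axis=2).min(axis=1)
    return float(dists.max())

def haus_rho(C, D, rho):
    """rho-Hausdorff distance: max of the two one-sided excesses."""
    return max(e_rho(C, D, rho), e_rho(D, C, rho))

# Example: truncating the second coordinate to zero; only points in B(0,1) matter
C  = [[0.3, 0.4], [2.0, 2.0]]
Cn = [[0.3, 0.0], [2.0, 0.0]]
print(haus_rho(Cn, C, rho=1.0))   # 0.4
```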

This convergence concept is well suited to the treatment of approximation


schemes in optimization, since the epi-distances dρ provide a measure of the

difference between the infima of two functions, as indicated by the following


result of Attouch and Wets. For ϕ : X → R with inf X ϕ finite, we denote by
ε-argmin ϕ the set {x ∈ X | ϕ(x) ≤ inf_X ϕ + ε} of ε-approximate minimizers
of ϕ on X.
Proposition 4. [4, Theorem 4.3] Let X be normed and let ϕ and ψ be proper
R-valued functions on X, such that
1 inf X ϕ and inf X ψ are both finite;
2 there exists ρ0 > 0 such that for all ε > 0, (ε-argmin ϕ) ∩ B(0, ρ0) ≠ ∅ and
(ε-argmin ψ) ∩ B(0, ρ0) ≠ ∅.
Then

    | inf_X ϕ − inf_X ψ | ≤ d_{α(ρ0)}(ϕ, ψ),

where α(ρ0 ) := max{ρ0 , 1 + | inf X ϕ|, 1 + | inf X ψ|}.


Since our optimization problems are expressible as the infimum of a sum
of two (convex) functions, we will find useful a result on the Attouch–Wets
convergence of the sum of two Attouch–Wets convergent families of (l.s.c.
convex) functions. The following result by Azé and Penot may be found in
full generality in [5, Corollary 2.9]. For simplicity, we only quote the form
this result takes when the limit functions are both nonnegative on X.

Proposition 5. Let X be a Banach space, let fn , f , gn and g be proper closed


convex R-valued functions on X, with f ≥ 0 and g ≥ 0. Assume that we have
the Attouch–Wets convergence fn → f and gn → g, and that for some s > 0,
t ≥ 0 and r ≥ 0,

B(0, s)2 ⊆ Δ(X) ∩ B(0, r)2 − {f ≤ r} × {g ≤ r} ∩ B(0, t)2 , (4.8)

where Δ(X) := {(x, x) | x ∈ X} and B(0, s)2 denotes a ball in the box norm
in X × X (that is, B(0, s)2 = B(0, s) × B(0, s)). Then for each ρ ≥ 2r + t
and all n ∈ N such that

dρ+s (fn , f ) + dρ+s (gn , g) < s ,

we have

    dρ(fn + gn, f + g) ≤ ((2r + s + ρ)/s) [dρ+s(fn, f) + dρ+s(gn, g)] .
In particular, fn + gn Attouch–Wets converges to f + g.

Corollary 3. Let Cn , C and M be closed convex subsets of a Banach space


X and let fn and f be proper closed convex R-valued functions on X with
f ≥ 0 and Attouch–Wets convergence fn → f and Cn → C. Suppose that for
some s > 0, t ≥ 0 and r ≥ 0,

B(0, s)2 ⊆ Δ(X) ∩ B(0, r)2 − (C × M ) ∩ B(0, t)2 ; (4.9)



B(0, s)2 ⊆ Δ(X) ∩ B(0, r)2 − [{f ≤ r} × (C ∩ M )] ∩ B(0, t)2 . (4.10)


Assume further that fn and f satisfy

    ∃ n0 ∈ N, α ∈ R such that max{ sup_{n≥n0} inf_{Cn∩M} fn , inf_{C∩M} f } ≤ α,   and      (4.11)

there exists ρ0 > 0 such that

    B(0, ρ0) ⊇ ⋃_{n≥n0} {fn ≤ α + 1} ∩ Cn ∩ M   and   B(0, ρ0) ⊇ {f ≤ α + 1} ∩ C ∩ M .      (4.12)
Then for any fixed ρ ≥ max{2r + t, ρ0 , α + 1}, and all n ≥ n0 for which
    dρ+s(fn, f) + ((2r + 2s + ρ)/s) dρ+2s(Cn, C) < s ,   and   dρ+2s(Cn, C) < s ,            (4.13)

we have

    | inf_{Cn∩M} fn − inf_{C∩M} f | ≤ dρ(fn + δ_{Cn∩M}, f + δ_{C∩M})
                                    ≤ ((2r + s + ρ)/s) [ dρ+s(fn, f) + ((2r + 2s + ρ)/s) dρ+2s(Cn, C) ] .

Proof. For n ≥ n0, ϕ := f + δ_{C∩M} and ϕn := fn + δ_{Cn∩M} satisfy the
hypotheses of Proposition 4 and so

    | inf_{Cn∩M} fn − inf_{C∩M} f | ≤ d_{α(ρ0)}(ϕ, ϕn) ≤ dρ(ϕ, ϕn) .   □

The estimates for dρ(ϕ, ϕn) will now follow from two applications of Proposition 5.
Indeed, from (4.9) and Proposition 5, whenever ρ ≥ 2r + t and n is such
that dρ+2s(Cn, C) < s, then dρ+s(Cn ∩ M, C ∩ M) ≤ ((2r + s + (ρ + s))/s) dρ+2s(Cn, C).
Taking n to be such that (4.13) holds, we find that dρ+s(fn, f) + dρ+s(Cn ∩ M, C ∩ M) < s,
so from (4.10) and Proposition 5 again,

    dρ(ϕ, ϕn) ≤ ((2r + s + ρ)/s) [dρ+s(fn, f) + dρ+s(Cn ∩ M, C ∩ M)]
              ≤ ((2r + s + ρ)/s) [dρ+s(fn, f) + ((2r + 2s + ρ)/s) dρ+2s(Cn, C)] .
If we keep fn fixed in this process, then only one iteration of Proposition 5
is required, which will lead to a better coefficient for the rate of convergence
than that obtained from taking dρ (fn , f ) ≡ 0 in Corollary 3.

Corollary 4. Let Cn , C and M be closed convex subsets of a Banach space


X and let f be a proper closed convex R-valued function on X with f ≥ 0
and Attouch–Wets convergence Cn → C. Suppose that for some s > 0, t ≥ 0
and r ≥ 0,

B(0, s)2 ⊆ Δ(X) ∩ B(0, r)2 − [({f ≤ r} ∩ M ) × C] ∩ B(0, t)2 . (4.14)

Assume further that for some n0 ∈ N, α ∈ R and ρ0 > 0,

    inf_{Cn0∩M} f ≤ α   and   B(0, ρ0) ⊇ {f ≤ α + 1} ∩ C ∩ M .                       (4.15)

Then for any fixed ρ ≥ max{2r + t, ρ0 , α + 1}, and all n ≥ n0 for which

dρ+s (Cn , C) < s , (4.16)

we have

    | inf_{Cn∩M} f − inf_{C∩M} f | ≤ dρ(f + δ_{Cn∩M}, f + δ_{C∩M})
                                   ≤ ((2r + s + ρ)/s) dρ+s(Cn, C) .
Proof. Similar to that of Corollary 3, but with only one appeal to
Proposition 5. 

4.6 Verification of the constraint qualification

Our intention will be to apply the results of the preceding section with X =
l1 ×l1 (with the box norm), so the norm on X ×X will be the four-fold product
of the norm on l1 . From now on we will not notationally distinguish balls in
the various product spaces, the dimensionality being clear from context.
Before we can apply the convergence theory of Section 4.5, we need to check
that the constraint qualifications (4.9) and (4.10) (or (4.14)) are satisfied.
This will be the main concern in this section. First, we consider some more
readily verifiable sufficient conditions for (4.9) and (4.10) to hold.
Lemma 5. Suppose that for sets C and M we have

B(0, s) ⊆ C ∩ B(0, σ) − M (4.17)

for some s and σ. Then it follows that

B(0, s/2)2 ⊆ Δ(X) ∩ B(0, σ + 3s/2)2 − (C × M ) ∩ B(0, σ + s)2 (4.18)

(of the form (4.9)). Next, suppose we have

B(0, μ) ⊆ {f ≤ ν} ∩ B(0, λ) − (C ∩ M ) ∩ B(0, λ) (4.19)

for some λ, μ and ν.


Then it follows that

B(0, μ/2)2 ⊆ Δ(X) ∩ B(0, μ/2 + λ)2 − ({f ≤ ν} × (C ∩ M )) ∩ B(0, λ)2 (4.20)

(of the form (4.10)).


Also,
B(0, s) ⊆ C ∩ B(0, σ) − {f ≤ r} ∩ M ∩ B(0, σ + s) (4.21)
implies

B(0, s/2)2 ⊆ Δ(X) ∩ B(0, σ + 3s/2)2 − (C × ({f ≤ r} ∩ M )) ∩ B(0, σ + s)2


(4.22)
(which is of the form of (4.14)).
Proof. Now (4.17) implies

B(0, s) ⊆ C ∩ B(0, σ) − M ∩ B(0, σ + s).

Place D := Δ(X)−(C ×M )∩B(0, σ +s)2 . Then, if P denotes the subtraction


map taking (x, y) to y − x, we have

P (D) = C ∩ B(0, σ + s) − M ∩ B(0, σ + s)


⊇ C ∩ B(0, σ) − M ∩ B(0, σ + s) ⊇ B(0, s),

and hence

B(0, s/2)2 ⊆ P −1 (B(0, s)) ⊆ P −1 P (D)


= D + P −1 (0) = D + Δ(X) = D

since Δ(X) + Δ(X) = Δ(X). Thus

B(0, s/2)2 ⊆ Δ(X) − (C × M ) ∩ B(0, σ + s)2 ,

from which (4.18) clearly follows (and is of the form (4.9) for suitable r, s
and t).
Next, suppose we have (4.19). If we define D := Δ(X) − ({f ≤ ν} × (C ∩
M ))∩B(0, λ)2 , then similarly P (D) = {f ≤ ν}∩B(0, λ)−(C ∩M )∩B(0, λ) ⊇
B(0, μ) so, proceeding as above, we obtain (4.20) (which is of the form (4.10)).
The last assertion follows from the first on substituting {f ≤ r} ∩ M for
M therein.
In this chapter, the objective functions f will always be chosen such that
(4.17) will imply (4.19) or (4.21). Hence in this section we focus on verification
of (4.17).
The constraints of M (0) have the form Ae = b, where m = 2m1 +2m2 +2m3
and

b = (Re ŵ(z̄1 ), Im ŵ(z̄1 ),.., Re ŵ(z̄m1 ), Im ŵ(z̄m1 ), 0,.., 0,.., 0)T ∈ Rm

with A : l1 → Rm given by

Ae := (.., Re ê(z̄i ), Im ê(z̄i ),.., Re ê(p̄j ), Im ê(p̄j ),.., Re ê(v̄k ), Im ê(v̄k ),..)T ,
(4.23)
where z̄i , p̄j and v̄k denote distinct elements of the unit disk, and i, j and k
range over {1, . . . , m1 }, {1, . . . , m2 } and {1, . . . , m3 } respectively. Then A is
expressible as a matrix operator of the form
    (aij)_{1≤i≤m, 0≤j<∞} =

        ⎛ 1   Re z̄1     Re z̄1²      · · · ⎞
        ⎜ 0   Im z̄1     Im z̄1²      · · · ⎟
        ⎜ ⋮      ⋮           ⋮             ⎟
        ⎜ 1   Re z̄m1    Re z̄m1²     · · · ⎟
        ⎜ 0   Im z̄m1    Im z̄m1²     · · · ⎟
        ⎜ 1   Re p̄1     Re p̄1²      · · · ⎟
        ⎜ 0   Im p̄1     Im p̄1²      · · · ⎟
        ⎜ ⋮      ⋮           ⋮             ⎟
        ⎜ 1   Re p̄m2    Re p̄m2²     · · · ⎟
        ⎜ 0   Im p̄m2    Im p̄m2²     · · · ⎟
        ⎜ 1   Re v̄1     Re v̄1²      · · · ⎟
        ⎜ 0   Im v̄1     Im v̄1²      · · · ⎟
        ⎜ ⋮      ⋮           ⋮             ⎟
        ⎜ 1   Re v̄m3    Re v̄m3²     · · · ⎟
        ⎝ 0   Im v̄m3    Im v̄m3²     · · · ⎠

where rows of imaginary parts of the matrix (aij ) and of b are omitted when-
ever the associated z̄i , p̄j or v̄k is real.
For integer K, define A(K) to be the truncated operator taking RK into Rm,
given by the matrix

    (aij)_{1≤i≤m, 0≤j<K} .

It is known, as stated in [9], that A(m) is invertible on Rm. Hence for each
K ≥ m,

    αK :=    sup          inf        ‖ξ‖1                                     (4.24)
          β ∈ Rm,      ξ ∈ RK,
          ‖β‖∞ = 1     A(K)ξ = β

is finite. In particular, αK ≤ αm for all K ≥ m and

    αm = ‖(A(m))−1‖ ,

the norm being taken relative to the 1-norm on the range of the inverse
(A(m))−1 and the ∞-norm on its domain. Note that αK satisfies

    (∀β ∈ Rm)(∃ξ ∈ RK)(A(K)ξ = β   and   ‖ξ‖1 ≤ αK ‖β‖∞).


Lemma 6. Let C(e) := {e ∈ l1 | Bi ≤ ei ≤ Ai ; i = 0, 1, 2, . . . }, where the
bounds Bi, Ai satisfy: ∃ξ ∈ RK such that Aξ = b and Bi < ξi < Ai for
i = 0, 1, . . . , K − 1; and Bi ≤ 0 ≤ Ai for i ≥ K.
Let K ≥ m, and let αK be as in (4.24). Also, let

    ε := min_{0≤i≤K−1} {|Ai − ξi|, |Bi − ξi|}

(so ε > 0). Then

    B(0, ε αK^{-1}) ⊆ C(e) ∩ B(0, ε + ‖ξ‖1) − M(0) ∩ B(0, (1 + αK^{-1})ε + ‖ξ‖1) .

Proof. Place s := ε/αK and let η ∈ B(0, s). Set e := {ξ, 0, 0, ..} ∈ C(e) ∩ M(0).
As A(K) maps onto Rm, there exists p ∈ RK ⊆ l1 such that Ap = A(K)p = Aη with

    ‖p‖1 ≤ αK ‖Aη‖∞ ≤ αK max_{z̄i, p̄j, v̄k} {|η̂(z̄i)|, |η̂(p̄j)|, |η̂(v̄k)|} ≤ αK ‖η‖1 < ε .

Consequently, p ∈ C(e) − e, since for each i ≤ K − 1, |pi| < ε ≤ min_{i<K} {|Ai − ei|, |Bi − ei|}.
Thus, since

    A(η − (p + e)) = Aη − (Ap + Ae) = −Ae = −b ,

so that η − (p + e) ∈ −M(0), we have η = p + e + (η − (p + e)) ∈ C(e) + (−M(0)) = C(e) − M(0),
as well as

    ‖p + e‖1 ≤ ‖p‖1 + ‖e‖1 < ε + ‖ξ‖1 ,

which implies that η ∈ C(e) ∩ B(0, ε + ‖ξ‖1) − M(0) ∩ B(0, s + ε + ‖ξ‖1).   □


Lemma 7. Let C(e), C(u), C, M and M(0) be as in Section 4.4, and assume
that there exists (ē, ū) ∈ C ∩ M of finite length such that
1 B(0, τ) ⊆ C(e) ∩ B(0, σ) − M(0) ∩ B(0, ρ) for some positive ρ, σ, τ;
2 n̂ has no zeros on the unit circle; and
3 B(0, μ) ⊆ C(u) − ū for some positive μ with μ > κ‖d‖1(ρ + ‖ē‖1), where κ
is given in Lemma 4.
Then, if s := min{τ, μ − κ‖d‖1(ρ + ‖ē‖1)}, we have

    B(0, s) ⊆ C ∩ B(0, max{σ, μ + ‖ū‖1}) − M.                              (4.25)

Proof. Let (ξ, η) ∈ B(0, s). From Assumption 1, ξ = v − e ∈ C(e) ∩ B(0, σ) −
M(0) ∩ B(0, ρ), so ξ = v′ − e′ where v′ = v − ē ∈ (C(e) − ē) ∩ B(0, σ + ‖ē‖1)
and e′ = e − ē ∈ (M(0) − ē) ∩ B(0, ρ + ‖ē‖1). Place u = η + T e′; then

    ‖u‖1 ≤ ‖η‖1 + ‖T e′‖1
         ≤ ‖η‖1 + ‖T‖(‖ē‖1 + ρ)
         < s + ‖T‖(‖ē‖1 + ρ) ≤ μ,

where the last inequality follows by Remark 1, and so u ∈ C(u) − ū from
Assumption 3, and u + ū ∈ C(u) ∩ B(0, μ + ‖ū‖1). Also, v′ + ē = v ∈ C(e) ∩ B(0, σ)
and

    (ξ, η) = (v′, u) − (e′, T e′) ∈ (v′, u) − (M − (ē, ū))
           = (v′ + ē, u + ū) − M = (v, u + ū) − M
           ⊆ C ∩ B(0, max{σ, μ + ‖ū‖1}) − M.   □

Remark 4. The existence of (ē, ū) in M of finite length is ensured, for instance,
under the conditions of Lemma 3. If the bounds Ai(e) and Bi(e) satisfy
Bi(e) < ēi < Ai(e) for i ≤ l(ē) (where l(·) denotes length), then Condition 1
of Lemma 7 follows, for suitable constants, from Lemma 6. (Note again that
this holds for arbitrary Ai(e) and Bi(e) for i > l(ē), so by making these bounds
decay to zero sufficiently rapidly, as will be shown in Lemma 13 to come, we
can enforce compactness of C(e), which will be essential for the Attouch–Wets
convergence of Cn(e) to C(e).)
If, furthermore, the bounds Ai(u) and Bi(u) are chosen to envelop ū (∀i,
Bi(u) < ūi < Ai(u)), and to be bounded away from zero by sufficient distance
for all i, Condition 3 of Lemma 7 is also satisfied.

Remark 5. Note that


0 ∈ int (C − M )
iff (4.9) holds for some r, s and t. Indeed, if 0 ∈ int (C−M ) then by Corollary 1
we obtain (4.17) for some s and σ, which implies (4.18), which is of the form
(4.9) for suitable constants. Conversely, (4.9) implies (where P (x, y) := x−y)

0 ∈ int P (Δ(X) ∩ B(0, r)2 − (C × M ) ∩ B(0, t)2 )


⊆ int (C ∩ B(0, t) − M ∩ B(0, t)) ⊆ int (C − M ).

If we are interested only in knowing that Cn ∩ M Attouch–Wets con-


verges to C ∩ M , and not in the actual rate of such convergence, then
0 ∈ int (C − M ) certainly suffices for the applicability of the results of the
theory to this end. Note however that in this case, the error bounds obtained
in Section 4.7 are now not “computable” since we do not have explicit val-
ues for the constants r, s and t appearing in the constraint qualification. A
sufficient condition for 0 ∈ int (C − M ) may be obtained on modification of
Lemma 7.
Lemma 8. Let C (e) , C (u) , C and M be as above, and assume that:
1 there exists (ē, ū) ∈ C ∩ M such that 0 ∈ core (C (u) − ū);
2 0 ∈ core (C (e) − M (0) ); and
3 n̂ has no zeros on the unit circle.
Then 0 ∈ int (C − M ).

Proof. Note first that

    cone [(C(e) − ē) × (C(u) − ū)] = cone (C(e) − ē) × cone (C(u) − ū)

since C(e) − ē and C(u) − ū are convex sets containing 0. Then (where T
denotes the mapping introduced in Remark 1)

    cone (C − M) = cone [(C − (ē, ū)) − (M − (ē, ū))]
                 = cone (C(e) − ē) × cone (C(u) − ū) − (M − (ē, ū))
                 = cone (C(e) − ē) × l1
                     − {(e, u) | e ∈ M(0) − ē, d ∗ e + n ∗ u = 0}
                 = cone (C(e) − ē) × l1 − {(e, T e) | e ∈ M(0) − ē}
                 ⊆ cone (C(e) − M(0)) × l1 ,

where the final inclusion is in fact an equality, as follows by noting that if


(ξ, η) ∈ cone (C (e) −M (0) )×l1 , then ξ ∈ cone (C (e) −M (0) ) = cone (C (e) −ē)−
(M (0) − ē) so ξ = v − e for some v ∈ cone (C (e) − ē) and e ∈ M (0) − ē. Setting
u := η+T e ∈ l1 yields (ξ, η) = (v, u)−(e, T e) ∈ cone (C (e) −ē)×l1 −{(e, T e) |
e ∈ M (0) − ē}. Thus cone (C − M ) = cone (C (e) − M (0) ) × l1 , which by
the assumptions yields 0 ∈ core (C − M ) and the result then follows from
Corollary 1. 

4.6.1 Limitations on the truncation scheme

In Section 4.7, we will apply Corollary 3 to deduce various convergence results.
For this, it will be necessary that Ck := Ck(e) × Ck(u) Attouch–Wets
converges to C = C(e) × C(u), where Ck(u) denote the corresponding truncations
of C(u). Recall that we need a condition of the form (4.17) for the
application of the convergence theory of Section 4.5. This has an untoward
consequence in relation to convergence of truncations of C(u). From Lemma 9
below, we see that Attouch–Wets convergence of Ck to C is impossible unless
we keep Ck(u) = C(u) for all k; indeed, if truncations of C(u) are included,
then Attouch–Wets convergence will occur if and only if C, and hence C(u), is
then Attouch–Wets convergence will occur if and only if C, and hence C (u) , is
locally compact (in the sense of having compact intersections with all closed
balls), which is incompatible with the constraint qualification (4.17), as we
shall observe in Lemma 10.
Further, if instead we try to truncate the space M to form an expanding
family of finite-dimensional subspaces Mn , then similarly, any Attouch–Wets
convergence of Mn to M demands local compactness of M , which is an im-
possibility since the latter has infinite dimension.
We therefore use truncations in the e-variable only, yielding the form Ck :=
Ck(e) × C(u). Thus our truncations will generally not consist purely of elements
(e, u) of fixed finite length. It will be shown currently however (see the end

of this section) that each Cn ∩ M is in fact contained in a finite-dimensional


subspace, but the basis thereof may consist of infinite-length members (in the
u-variable). If we wish for these truncations to contain only (e, u) of some
fixed finite length dependent only on n, then further assumptions on the plant
will be required (see Lemma 16).
Lemma 9. Let X be a Banach space, let Cn and C be closed convex subsets,
with Cn ⊆ C for all n, and Cn Attouch–Wets convergent to C. Assume also
for each n ∈ N and ρ > 0 that Cn ∩ B̄(0, ρ) is compact. Then C ∩ B̄(0, ρ) is
compact whenever C ∩ B(0, ρ) is nonempty.
Proof. It suffices to show that C ∩ B̄(0, ρ) is totally bounded. Now 0 ∈ int (C −
B̄(0, ρ)) and hence B(0, s) ⊆ C ∩ B̄(0, ρ + s) − B̄(0, ρ) for some s > 0 (see (4.17)
with M := B̄(0, ρ)). On comparing (4.17) and (4.18), we find the indicator
functions fn = δCn, f = δC and g = δB̄(0,ρ) satisfy a condition of the form
(4.8), so by Proposition 5, Cn ∩ B̄(0, ρ) Attouch–Wets converges to C ∩ B̄(0, ρ).
Let ε > 0. By this convergence, there exists n such that C ∩ B̄(0, ρ) ⊆
Cn ∩ B̄(0, ρ) + B(0, ε/2). From the compactness of Cn ∩ B̄(0, ρ), there exist
x1, . . . , xN in Cn ∩ B̄(0, ρ) ⊆ C ∩ B̄(0, ρ) such that ⋃_{i=1}^N B(xi, ε/2) contains
Cn ∩ B̄(0, ρ). Hence C ∩ B̄(0, ρ) ⊆ ⋃_{i=1}^N B(xi, ε).

Lemma 10. Suppose that C = C (e) × C (u) is locally compact in the sense of
Lemma 9 and n̂ has no zeros on the unit circle. Then

    0 ∉ int (C − M) .

Proof. Supposing the contrary, Corollary 1 yields ρ > 0 satisfying cone (C ∩


B(0, ρ) − M ) = l1 × l1 , which in turn implies that

(∀(ξ, η) ∈ l1 × l1 )(∃e ∈ M (0) − ē) (4.26)


with ξ + e ∈ cone (C (e) ∩ B(0, ρ) − ē)
and η + T e ∈ cone (C (u) ∩ B(0, ρ) − ū),

where T is as in Remark 1, and (ē, ū) is a fixed member of C ∩ B(0, ρ) ∩ M .


Let χ ∈ l1. By the surjectivity of A : l1 → Rm given in (4.23), there exists
ξ ∈ l1 such that, for each zero z̄ in D for n̂, ξ̂(z̄) = χ̂(z̄)/d̂(z̄). Place η :=
Z−1((χ̂ − d̂ ξ̂)/n̂). Since χ̂ − d̂ ξ̂ now must have a zero at each (simple) zero
in D for n̂, and the latter has no zeros on the unit circle, we have η ∈ l1 by
Lemma 4. Thus χ = d ∗ ξ + n ∗ η. With e ∈ M(0) − ē as in (4.26), it follows
(on noting that d ∗ e + n ∗ T e = 0 from the definition of T) that

    χ = d ∗ ξ + n ∗ η
      ∈ d ∗ cone (C(e) ∩ B(0, ρ) − ē) + n ∗ cone (C(u) ∩ B(0, ρ) − ū)
          − (d ∗ e + n ∗ T e)
      = cone (d ∗ (C(e) ∩ B(0, ρ) − ē)) + cone (n ∗ (C(u) ∩ B(0, ρ) − ū))
      ⊆ cone [ d ∗ (C(e) ∩ B(0, ρ) − ē) + n ∗ (C(u) ∩ B(0, ρ) − ū) ] ,

where the latter inclusion follows since both C (e) ∩ B(0, ρ) − ē and C (u) ∩
B(0, ρ) − ū are convex sets containing 0. Since χ ∈ l1 is arbitrary,
    cone [ d ∗ (C(e) ∩ B(0, ρ) − ē) + n ∗ (C(u) ∩ B(0, ρ) − ū) ] = l1 ,

so the compact convex set d ∗ (C (e) ∩ B̄(0, ρ) − ē) + n ∗ (C (u) ∩ B̄(0, ρ) − ū)
has a nonempty core (Definition 1) and hence by Proposition 1, a nonempty
interior. However, this latter property is forbidden for any compact subset of
an infinite-dimensional normed space. Thus we arrive at a contradiction. 
We end this section with the promised verification that each truncation
Cn ∩ M = (Cn(e) × C(u)) ∩ M is indeed finite-dimensional.
Lemma 11. Under the assumptions of this section, Ck ∩ M is of finite di-
mension for each k.
Proof. Assume that C ∩ M has a member (ē, ū) with ē of finite length. Now
let (e, u) ∈ Ck ∩ M, and let K := max{k, l(ē)}. Since d ∗ (e − ē) + n ∗ (u − ū) = 0
and e − ē ∈ M(0) − ē, then u − ū = T(e − ē). Since e − ē ∈ (M(0) − ē) ∩ RK,
we can write e − ē = ∑_{i=1}^K αi e(i), where {e(i)} is some spanning set for
(M(0) − ē) ∩ RK. Placing u(i) := T e(i) ∈ l1, we obtain

    ∑_{i=1}^K αi u(i) = T( ∑_{i=1}^K αi e(i) ) = T(e − ē) = u − ū ,

so that u ∈ ū + span {u(i)}_{i=1}^K, and hence Ck ∩ M ⊆ RK × (ū + span {u(i)}_{i=1}^K),
a subspace of finite dimension. Note again that there is no guarantee that
any of the u(i) has finite length.   □

4.7 Convergence of approximates

As asserted in the opening paragraph of Section 4.6.1, if we wish to apply
convergence theory, we cannot simultaneously truncate in both e and u.
Accordingly, our truncations will always be of the form Cn = Cn(e) × C(u).
The following two lemmas show that compactness of C(e) is essential for the
Attouch–Wets convergence of the truncations Cn(e) and Cn.
Lemma 12. Let C, C(e) and C(u) be as usual, with also
∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} < +∞. Then

    dρ(Cn, C) = ∑_{i≥n} max{|Ai(e)|, |Bi(e)|}   for any ρ ≥ ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|},

and Cn Attouch–Wets converges to C.



Proof. Since Cn(e) ⊆ C(e) for all n, we have

    dρ(Cn, C) = dρ(Cn(e) × C(u), C(e) × C(u)) = dρ(C(e), Cn(e)) = eρ(C(e), Cn(e)) ,

and we compute the latter. Let e ∈ C(e) = C(e) ∩ B(0, ρ). Then d(e, Cn(e)) =
‖e − e(n)‖ = ∑_{i≥n} |ei|, where e(n) denotes the truncation to length n (that
is, e(n) = (e0, ..., en−1, 0, 0, ..)), and hence, as n → ∞,

    eρ(C(e), Cn(e)) = sup_{e∈C(e)} ∑_{i≥n} |ei| = ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} → 0 .   □
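Lemma 12 makes the truncation error completely explicit: it is just a tail sum of the bound sequences. The helper below is our own illustration of that formula (names and the example bounds are assumptions).

```python
import numpy as np

def truncation_distance(A, B, n):
    """d_rho(C_n, C) of Lemma 12: the tail sum of max(|A_i|, |B_i|) from
    index n on, for absolutely summable bound sequences A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return float(np.maximum(np.abs(A), np.abs(B))[n:].sum())

# Example with geometrically decaying bounds A_i = 0.5**i, B_i = -0.25**i
i = np.arange(50)
print(truncation_distance(0.5 ** i, -(0.25 ** i), n=10))   # about 0.5**10 / 0.5
```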

Lemma 13. The set C(e) is compact in l1 if and only if the bounds satisfy

    ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} < +∞ .

Proof. Let i0 be such that Bi(e) ≤ 0 ≤ Ai(e) for all i ≥ i0. If C(e) is compact,
then xn := (A0(e), .., An(e), 0, 0, ..) ∈ C(e) (n ≥ i0) must have a convergent
subsequence, along which we then have the uniform boundedness of the norms
‖xnk‖ = ∑_{i=0}^{nk} |Ai(e)|, and since these increase with k, ∑_{i=0}^∞ |Ai(e)| is finite.
Similarly, ∑_{i=0}^∞ |Bi(e)| is finite.
Conversely, if ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} < +∞, the compactness of C(e)
follows from Lemma 9, since its truncations Cn(e) are all compact and Attouch–Wets
converge to C(e) by Lemma 12.   □

The next lemma shows that C ∩M is always bounded whenever the bounds
on e define sequences in l1 . This ensures that condition (4.12) will always be
satisfied for any objective f .

Lemma 14. Suppose (as usual here) that n̂ has no zeros on the unit circle
and that all its zeros in the unit disk are simple. Then C ∩ M ⊆ B(0, ρ0),
where

    ρ0 = max{ ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} ,
              κ ( ‖b‖1 + ‖d‖1 ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} ) } ,

where κ is as in Lemma 4 and b := w ∗ d − n ∗ (βc c + βs s).

Proof. Let (e, u) ∈ C ∩ M; then from the relation d ∗ e + n ∗ u = b, b̂ − d̂ ê
has zeros at each zero in D̄ of n̂, and since u = Z−1((b̂ − d̂ ê)/n̂), Lemma 4 yields
that ‖u‖1 ≤ κ‖b − d ∗ e‖1 ≤ κ(‖b‖1 + ‖d‖1 ‖e‖1). From this, and the relation
‖e‖1 ≤ ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|}, the result follows.   □
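The bound ρ0 of Lemma 14 is likewise computable from the problem data. A minimal sketch, under the assumption that κ, ‖b‖1 and ‖d‖1 have already been evaluated (e.g. with the kappa helper sketched after Lemma 4); the function name is ours.

```python
def rho_0(A_bounds, B_bounds, kappa, b_l1, d_l1):
    """rho_0 of Lemma 14 from the error-bound sequences and the problem data."""
    e_bound = sum(max(abs(a), abs(b)) for a, b in zip(A_bounds, B_bounds))
    return max(e_bound, kappa * (b_l1 + d_l1 * e_bound))
```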

Assembling all the parts we obtain our main result.

Theorem 1. Let fn and f be proper closed convex R-valued functions, with
fn Attouch–Wets convergent to f and f ≥ 0. Also, assume the following for
C and M:
1 n̂ has no zeros on the unit circle, and all its zeros in the unit disk are
  simple;
2 The bounds {Bi(e), Ai(e)}_{i=0}^∞ characterizing C(e) form sequences in l1, and
  also satisfy the requirement that for some K ≥ m, ∃ξ ∈ RK such that
  Aξ = b with Bi(e) < ξi < Ai(e) for i = 0, 1, . . . , K − 1; and Bi(e) ≤ 0 ≤ Ai(e)
  for i ≥ K;
3 There exists (ē, ū) ∈ C ∩ M of finite length such that B(0, μ) ⊆ C(u) − ū
  for some positive μ with μ > κ‖d‖1[(1 + αK^{-1})ε + ‖ξ‖1 + ‖ē‖1], where κ is
  given in Lemma 4 and αK in (4.24);
4 γ := max{sup_{n≥n0} inf_{Cn∩M} fn , inf_{C∩M} f} is finite;
5 B(0, μ) ⊆ {f ≤ ν} ∩ B(0, λ) − (C ∩ M) ∩ B(0, λ) for some λ, μ and ν (that
  is, (4.19) holds).
Define the constants

    ε := min_{0≤i≤K−1} {|Ai − ξi|, |Bi − ξi|}                                      (4.27)
    s′ := (1/2) min{ ε αK^{-1}, μ − κ‖d‖1[(1 + αK^{-1})ε + ‖ξ‖1 + ‖ē‖1] }          (4.28)
    s := min{s′, μ}                                                                 (4.29)
    r := max{ε + ‖ξ‖1 + 3s′, μ + ‖ū‖1 + 3s′, μ/2 + λ, ν}                            (4.30)
    t := max{ε + ‖ξ‖1 + 2s′, μ + ‖ū‖1 + 2s′, λ} .                                   (4.31)

Let ρ0 be as in Lemma 14. Then for any fixed ρ satisfying

    ρ > max{ 2r + t, ρ0, γ + 1, ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} } ,

and all n ≥ n0 for which ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} < s and

    dρ+s(fn, f) + ((2r + 2s + ρ)/s) ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} < s ,

it follows that

    | inf_{Cn∩M} fn − inf_{C∩M} f | ≤ ((2r + s + ρ)(2r + 2s + ρ)/s²) ∑_{i≥n} max{|Ai(e)|, |Bi(e)|}
                                        + ((2r + s + ρ)/s) dρ+s(fn, f) .

Proof. By assumption 2 and Lemma 6, we obtain an inclusion of the form of
(4.17):

    B(0, ε αK^{-1}) ⊆ C(e) ∩ B(0, ε + ‖ξ‖1) − M(0) ∩ B(0, (1 + αK^{-1})ε + ‖ξ‖1) .

This, along with assumptions 1 and 3, may be inserted into Lemma 7 to yield

    B(0, 2s′) ⊆ C ∩ B(0, max{ε + ‖ξ‖1, μ + ‖ū‖1}) − M .

Lemma 5 then gives (where r′ := max{ε + ‖ξ‖1, μ + ‖ū‖1})

    B(0, s′) ⊆ Δ ∩ B(0, r′ + 3s′) − (C × M) ∩ B(0, r′ + 2s′) ,

which, along with (4.20) (a consequence of assumption 5 via Lemma 5), yields
(4.9) and (4.10) (in Corollary 3) for r, s and t as above.
Further, assumption 4 gives (4.11), and (4.12) follows from Lemma 14
with the indicated value for ρ0. Noting the explicit form for dρ(Cn, C) in
Lemma 12 (for ρ ≥ ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|}), the result may now be read
from Corollary 3.   □
In particular, if fn = f for all n, we see that

    | inf_{Cn∩M} f − inf_{C∩M} f | ≤ ((2r + s + ρ)(2r + 2s + ρ)/s²) ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} .

However, in this case, we can obtain better constants by using Corollary 4 in


place of Corollary 3, which will require the condition (4.21) (or (4.14)). To
illustrate, if f is like a norm, say, f(e, u) := ‖e‖1 + ζ‖u‖1 (ζ > 0), then for
any r > 0, {f ≤ r} ⊇ B(0, r/(2 max{1, ζ})), so if (4.17) holds (for some s, σ)
then taking r = 2(s + σ) max{1, ζ} gives

    C ∩ B(0, σ) − {f ≤ r} ∩ M ∩ B(0, σ + s)
        ⊇ C ∩ B(0, σ) − M ∩ B(0, σ + s) ⊇ B(0, s)   by (4.17)

so (4.21), and hence (4.14), holds for suitable constants. Accordingly, we


arrive at the following result.
Theorem 2. Let f : l1 × l1 → R have the form f(e, u) := ‖e‖1 + ζ‖u‖1
for some ζ > 0. Assume Conditions 1, 2 and 3 of Theorem 1 hold, and
Condition 4 is replaced by

    γ′ := inf_{Cn0∩M} f < ∞   for some n0 .

Define ε as in (4.27), and s := s′ by (4.28), with

    r := 2 max{1, ζ} (max{ε + ‖ξ‖1, μ + ‖ū‖1} + 2s)   and                      (4.32)


    t := max{ε + ‖ξ‖1 + 2s, μ + ‖ū‖1 + 2s} .                                    (4.33)

Then, for any fixed ρ satisfying

    ρ > max{ 2r + t, ρ0, γ′ + 1, ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} }

(where ρ0 appears in Lemma 14) and all n ≥ n0 for which

    ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} < s ,

we have

    | inf_{Cn∩M} f − inf_{C∩M} f | ≤ ((2r + s + ρ)/s) ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} .

Proof. This follows along similar lines to that for Theorem 1, but uses the
last displayed relation before Theorem 2 to obtain (4.21) from (4.17), so that
(4.14) is obtained for the above r, s and t, and we may apply Corollary 4.   □
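For a fixed objective of the form in Theorem 2, the error estimate can be inverted to choose a truncation length: one walks the tail sums of the bounds until the right-hand side of the estimate drops below a prescribed tolerance. A rough sketch follows; all names are ours, and the constants r, s, ρ are assumed to have been computed from (4.28) and (4.32)–(4.33).

```python
def required_truncation(A_bounds, B_bounds, r, s, rho, tol):
    """Smallest n with ((2r + s + rho)/s) * sum_{i>=n} max(|A_i|,|B_i|) <= tol
    (and the tail itself < s, as Theorem 2 requires); None if no such n exists
    within the supplied bound sequences."""
    coeff = (2 * r + s + rho) / s
    tails = [max(abs(a), abs(b)) for a, b in zip(A_bounds, B_bounds)]
    tail = sum(tails)
    for n in range(len(tails) + 1):
        if tail < s and coeff * tail <= tol:
            return n
        if n < len(tails):
            tail -= tails[n]
    return None
```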

In summary: we have obtained


• that inf C∩M f provides the exact lower bound for the performance of ra-
tional controllers for the l1 control problem, and
• computable convergence estimates for the approximating truncated prob-
lems.
Note further that these results are obtained for the case where the hard-
bound set C (or time-domain template) has no interior (since C (e) is assumed
compact and hence has an empty interior). This then extends a result of [14]
on such approximations (in the particular two-block control problem we con-
sider) to cases where int C is empty. We note however that the cited result
from [14] in fact has an alternate short proof by an elementary convexity argu-
ment (see Lemma 15 below) once the density in M of the subset of members
of finite length is demonstrated. (This density is established, in the general
“multi-block” case, in [19]. For the special two-block setup we consider, this
property is proved in [26].) The above convergence results should be readily
extendible to the multi-block formalism of [14].
Lemma 15. Let C, M and M0 be convex sets, with M = cl M0 and (int C) ∩
M ≠ ∅. Then cl (C ∩ M) = cl (C ∩ M0).

Proof. Let x ∈ C ∩ M and x0 ∈ (int C) ∩ M . Then for each 0 < λ < 1,


xλ := λx0 + (1 − λ)x ∈ (int C) ∩ M , and xλ → x for λ → 0. For each λ, the
density of M0 yields approximates to xλ from M0 which, if sufficiently close,
must be in (int C) ∩ M0 . 

This argument in fact leads to a very quick proof of [14, Theorem 5.4]
(whose original proof contains a flaw, detailed in [27]) which asserts the
equality inf_{C∩M} ‖·‖1 = inf_{C∩Mr} ‖·‖1 under a slightly stronger assumption,
which in fact implies nonemptiness of (int C) ∩ M . To see the applicability of
Lemma 15 to the cited result from [14], we temporarily adopt the notation
thereof. Assign C and M as follows:

    C := {Φ ∈ l1^{nz×nw} | Atemp Φ ≤ btemp}   and
    M := {Φ ∈ l1^{nz×nw} | Afeas Φ = bfeas} ,

where btemp ∈ l∞, bfeas ∈ Rcz × l1^{nz×nw}, Atemp : l1^{nz×nw} → l∞ and Afeas :
l1^{nz×nw} → Rcz × l1^{nz×nw} are bounded linear, and where the symbol ≤ stands
for the partial order on l∞ induced by its standard positive cone P + . The
assumption of [14, Theorem 5.4] is that btemp − Atemp Φ0 ∈ int P + for some
Φ0 ∈ M . However, the continuity of Atemp implies that Φ0 ∈ int C and hence
Φ0 ∈ (int C) ∩ M , which is the assumption of Lemma 15. This, coupled with
the density of M0 := Mr in M , gives the result.
As was discussed in Section 4.6.1, the approximating problems are con-
structed by truncating in the e-variable only, otherwise the method fails.
However, the truncated constraint-sets Ck ∩ M satisfy

dim span Ck ∩ M ≤ 2k (for large k)

by Lemma 11, whereby the approximating minimizations are indeed over


finite-dimensional sets. Since in general (e, u) ∈ Ck ∩ M does not imply that
u has finite length, one may ask under what conditions it may be possible for
there to exist, for each k, a uniform bound m(k) to the length of u whenever
(e, u) ∈ Ck ∩ M . In this case, the truncated sets Ck ∩ M would resemble those
obtained by the more natural method of simultaneously truncating both e
and u (a strategy that fails to yield convergence, in general, by the results
of Section 4.6.1). From the lemma to follow, such a property can hold only
when the plant has no minimum-phase zeros (that is, no zeros outside the unit
circle). Hence, except for some highly restrictive cases, Ck ∩ M will always
contain some u of infinite length.
Lemma 16. Suppose that the assumptions of Theorem 1 hold, and assume
further that n̂ has no zeros outside the open unit disk. (As usual, all zeros
are assumed simple). Let (ē, ū) be in C ∩ M with both ē and ū having finite
length. Then for any k and any (e, u) ∈ Ck ∩ M ,

l(u) ≤ max{l(ū), l(d) − l(n) + max{l(ē), k}},

where l(·) denotes length.


Moreover, assuming that the conditions of Remark 4 hold (with Ai(e) >
0 > Bi(e) for i > l(ē)), then if for all k, each (e, u) ∈ Ck ∩ M has u of finite
length, it follows that n̂ cannot have zeros outside the unit disk.

Proof. Let (e, u) ∈ Ck ∩ M and let (e1 , u1 ) := (e, u) − (ē, ū). Then l(e1 ) ≤
max{l(e), l(ē)} and similarly for u1 . Also u1 = −d ∗ Z −1 (ê1 /n̂) from the
corresponding convolution relation defining M . Since ê1 is a polynomial hav-
ing zeros at each zero of n̂ in the unit disk and hence the whole plane (re-
call e1 ∈ M (0) − ē), we have that û1 is a polynomial of degree at most
l(d) − l(n) + l(e1 ) − 1 ≤ l(d) − l(n) + max{l(e), l(ē)} − 1, and so, as u = u1 + ū,
the result follows.
For the second assertion, suppose n̂(z0) = 0 for some |z0| > 1. With (ē, ū)
as above, it follows from the closed-loop equation d ∗ ē + n ∗ ū = w ∗ d − n ∗ β =: b̃
(where β := βc c + βs s) that ŵ(z0) is finite and ēˆ(z0) = ŵ(z0). Let k exceed
both the number of constraints defining M(0) and the length of ē, and let
e ∈ M(0) ∩ Rk. If u := Z−1((b̃ˆ − d̂ ê)/n̂), the interpolation constraints on e ensure
that û has no poles in the closed unit disk (so u ∈ l1) and hence (e, u) ∈ M.
Also, since from the assumptions, cone (Ck(e) − ē) ⊇ Rk and cone (C(u) − ū) = l1,
we have

    (e, u) − (ē, ū) ∈ (M − (ē, ū)) ∩ (Rk × l1)
                    ⊆ (M − (ē, ū)) ∩ [cone (Ck(e) − ē) × cone (C(u) − ū)]
                    = (M − (ē, ū)) ∩ cone (Ck − (ē, ū))
                          (recalling that Ck := Ck(e) × C(u))
                    = cone (Ck ∩ M − (ē, ū)) ,

whence λ((e, u) − (ē, ū)) ∈ Ck ∩ M − (ē, ū) for some positive λ. Since now
λ(e, u) + (1 − λ)(ē, ū) ∈ Ck ∩ M , the hypothesized finiteness of length of
λu + (1 − λ)ū implies, via the equation

d ∗ (λe + (1 − λ)ē) + n ∗ (λu + (1 − λ)ū) = w ∗ d − n ∗ β ,

that (λe + (1 − λ)ē) (z0 ) = ŵ(z0 ) and hence ê(z0 ) = ŵ(z0 ). Since k is arbi-
trary, we have shown that every finite-length e ∈ M (0) satisfies an additional
interpolation constraint ê(z0 ) = ŵ(z0 ) at z0 , which yields a contradiction. 

Remark 6. Under the assumptions of the preceding lemma, note that for k ≥
max{l(ē), l(ū) − l(d) + l(n)}, and (e, u) ∈ Ck ∩ M , we have l(u) ≤ k + l(d) −
l(n) := k + l, so for such k, Ck ∩ M consists precisely of those elements (e, u)
of C ∩ M with e of length k and u of length k + l.
If l ≤ 0, then

Cn ∩ M = {(e, u) ∈ C ∩ M | l(e) ≤ n, l(u) ≤ n} := Qn .

If l ≥ 0, then for all n ≥ max{l(ē), l(ū) − l(d) + l(n)},

Cn ∩ M ⊆ Qn+l ⊆ Cn+l ∩ M,

and hence Qn Attouch–Wets converges to C ∩ M , so from Corollary 3,

    inf_{Qn} fn → inf_{C∩M} f

as n → ∞. Observe that the sets Qn represent truncations of C ∩ M to the


same length n for both e and u.

4.7.1 Some extensions

So far we have considered CQ of the form of an interiority 0 ∈ int (C − M ),


leading, via Proposition 5, to the determination of rates of convergence for
our truncation scheme. If the CQ is weakened to the strong quasi relative
interiority 0 ∈ sqri (C − M ) (meaning cone (C − M ) is a closed subspace), it
is not immediately possible to apply Proposition 5. In this case, then, explicit
convergence estimates may not be obtainable. We may still, however, derive
limit statements of the form inf Cn ∩M f → inf C∩M f for reasonable f . To
achieve this, we use an alternate result [13] on the Attouch–Wets convergence
of sums of functions, based on a sqri-type CQ, but unfortunately, this result
will not provide the estimates obtainable through Proposition 5.
We proceed by first establishing the Attouch–Wets convergence of Cn ∩
M → C ∩ M using [13, Theorem 4.9]. In this context, the required CQ is:
(1) 0 ∈ sqri (C − M ) and cone (C − M ) has closed algebraic complement
Y ; and (2) that Y ∩ span (Cn − M ) = {0} for all n. Note that (1) implies
(2) since Cn ⊆ C for all n, so we need only consider (1). The following two
lemmas provide a sufficient condition for 0 ∈ sqri (C − M ).
Lemma 17. Suppose that:
1 0 ∈ sqri (C (e) − M (0) );
2 0 ∈ core (C (u) − ū) for some (ē, ū) ∈ C ∩ M ; and
3 n̂ has no zeros on the unit circle.
Then 0 ∈ sqri (C − M ) and cone (C − M ) has a closed algebraic complement.
Proof. By arguing as in Lemma 8, cone (C − M ) = cone (C (e) − M (0) ) × l1 .
Thus it forms a closed subspace, so strong quasi relative interiority is es-
tablished. Since M (0) − ē is a subspace of finite codimension in l1 , the
complementary space Y0 to cone (C (e) − M (0) ) is finite dimensional and
hence closed. Clearly then cone (C − M ) has the closed complement Y :=
Y0 × {0}. 
From Lemma 17 the problem is reduced to finding conditions under which
0 ∈ sqri (C(e) − M(0)).

Lemma 18. Let ē ∈ l1 and C(e) = {e ∈ l1 | Bi(e) ≤ ei ≤ Ai(e) ; i = 0, 1, 2, . . . },
where the bounds Bi(e) and Ai(e) satisfy:

1 Bi(e) ≤ ēi ≤ Ai(e) for all i ∈ N;
2 ∑i |Ai(e)| and ∑i |Bi(e)| < ∞;
3 for a subsequence {ik}_{k=0}^∞ we have Bik(e) = ēik = Aik(e), and for all i not in
this subsequence, Bi(e) < ēi < Ai(e).
Then 0 ∈ sqri (C(e) − M(0)).

Proof. After projecting into a suitable subspace of l1, we shall follow the
argument of Lemma 6. Let P denote the projection on l1 onto the subspace
consisting of sequences vanishing on {ik}_{k=0}^∞. That is, (Pe)i := 0 for i ∈
{ik}_{k=0}^∞ and (Pe)i := ei otherwise. Evidently P is continuous and maps closed
sets to closed sets. Next, observe that (I − P) cone (C(e) − ē) = cone (I −
P)(C(e) − ē) = {0}, since if e ∈ (C(e) − ē) then for all k, we have eik ∈
[Bik(e) − ēik , Aik(e) − ēik] = {0}, so eik = 0, yielding (I − P)e = 0 by the
definition of P. Thus we obtain

    cone (C(e) − M(0)) = cone P(C(e) − M(0)) + (I − P)(M(0) − ē) .          (4.34)

If we can show that cone P(C(e) − M(0)) = Pl1 then (4.34) would give

    cone (C(e) − M(0)) = Pl1 + (I − P)(M(0) − ē)
                       = Pl1 + P(M(0) − ē) + (M(0) − ē)
                       = Pl1 + (M(0) − ē)

which must be closed, since Pl1 is closed and the linear subspace M(0) − ē
has finite codimension.
We now verify that cone P(C(e) − M(0)) = Pl1. (This part of the argument
parallels that of Lemma 6.) Let ξ ∈ Pl1, with

    ‖ξ‖1 < αK^{-1} ε ,   where   ε := min_{0≤i≤K−1; i∉{ik}} {|Ai(e) − ēi|, |Bi(e) − ēi|} > 0 .

There exists η ∈ RK ⊆ l1 such that ‖η‖1 ≤ αK ‖ξ‖1 < ε with Aη = Aξ. Then
for i ∉ {ik}, we have Bi(e) < ēi + ηi < Ai(e) whenever i < K, and for i ≥ K,
ēi + ηi = ēi ∈ [Bi(e), Ai(e)], whence P(η + ē) ∈ PC(e) (or Pη ∈ P(C(e) − ē)).
Also, η − ξ ∈ M(0) − ē as A(η − ξ) = 0, implying Pη − ξ = Pη − Pξ ∈
P(M(0) − ē). Thus

    ξ = Pη − (Pη − ξ)
      ∈ P(C(e) − ē) − P(M(0) − ē) = P(C(e) − M(0)).

This shows that B(0, αK^{-1}ε) ∩ Pl1 ⊆ P(C(e) − M(0)), whence
cone P(C(e) − M(0)) = Pl1 as required.   □

Corollary 5. Under the conditions of Lemmas 17 and 18, and for any convex
closed real-valued f : l1 × l1 → R,

    lim_{n→∞} inf_{Cn∩M} f = inf_{C∩M} f .

Proof. By the lemmas (and the cited result from [13]) the indicator functions
δCn ∩M Attouch–Wets converge to δC∩M , and (since dom f has an interior)
by any of the cited sum theorems, f + δCn ∩M → f + δC∩M also. The result
then follows from Proposition 4. 

Our formulation of the control problem (see Section 4.4) was chosen to
permit its re-expression as an optimization over l1 × l1 . This choice resulted
from a wish to compare with the other methods we described in the in-
troduction, which used duality theory. Accordingly, we sought to formulate
minimizations over a space such as l1 , which has a nice dual (namely, l∞ ).
However, this is not the only problem formulation that can be treated by the
methods of this paper.
Recall from Section 4.4 that we considered only stabilizing controllers
K ∈ S(P ) for which the resulting control signal u was restricted to the
subspace Xa ⊆ l∞ . This requirement will now be relaxed to u ∈ l∞ . This
will be seen to entail only trivial changes to the main results of this section.
(Note that in this case the resulting optimization is taken over l1 × l∞ , so
duality methods would not be readily applicable, since the dual (l∞ )∗ has a
complicated characterization.)
The basic feasible set is now of the form

    F0 := {(e, u) ∈ l1 × l∞ | e = e(K) for some K ∈ S(P), u = K ∗ e}

where u is free to range through l∞ instead of its subspace Xa. If we define
(where w ∈ l∞ and has rational z-transform) the altered sets

    M := { (e, u) ∈ l1 × l∞ |  ê(p̄i) = 0        (p̄i pole of P : |p̄i| ≤ 1),   i = 1, .., m1 ;
                               ê(z̄j) = ŵ(z̄j)   (z̄j zero of P : |z̄j| ≤ 1),   j = 1, .., m2 ;
                               ê(v̄k) = 0        (v̄k zero of ŵ : |v̄k| ≤ 1),   k = 1, .., m3 ;
                               d ∗ e + n ∗ u = w ∗ d } ,

    Mr := {(e, u) ∈ M | ê, û rational, e ≠ 0},

the argument of Lemma 2 again yields Mr = F0. Accordingly, we may now
frame a minimization problem over the subset C ∩ M of l1 × l∞ where C =
C(e) × C(u), but now

    C(u) = {u ∈ l∞ | Bi(u) ≤ ui ≤ Ai(u) for all i ∈ N} ⊆ l∞ .

For simplicity, we consider only the required changes to Theorem 2, since this
result has a simpler statement than Theorem 1. In fact, with the assumptions
of Theorem 2, but with f : l1 × l∞ → R given by f(e, u) = ‖e‖1 + ζ‖u‖∞, and
all references to ‖ū‖1 converted to ‖ū‖∞, the form of Theorem 2 is unaltered,

except that the ρ0 of Lemma 14 is not available, but another value may be
used (namely, ρ0 = (γ′ + 1) max{1, ζ^{-1}}).

Theorem 3. Let f : l1 × l∞ → R have the form f(e, u) := ‖e‖1 + ζ‖u‖∞
for some ζ > 0. Assume Conditions 1, 2 and 3 of Theorem 1 hold (where
the interiority in Condition 3 is relative to the l∞-norm) and Condition 4 is
replaced by

    γ′ := inf_{Cn0∩M} f < ∞   for some n0 .

Define ε as in (4.27), with

    s := (1/2) min{ ε αK^{-1}, μ − κ‖d‖1[(1 + αK^{-1})ε + ‖ξ‖1 + ‖ē‖1] } ,
    r := 2 max{1, ζ} (max{ε + ‖ξ‖1, μ + ‖ū‖∞} + 2s)   and
    t := max{ε + ‖ξ‖1 + 2s, μ + ‖ū‖∞ + 2s} .

Then for any fixed ρ satisfying

    ρ > max{ 2r + t, (γ′ + 1) max{1, ζ^{-1}}, ∑_{i=0}^∞ max{|Ai(e)|, |Bi(e)|} }

and all n ≥ n0 for which

    ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} < s ,

we obtain

    | inf_{Cn∩M} f − inf_{C∩M} f | ≤ ((2r + s + ρ)/s) ∑_{i≥n} max{|Ai(e)|, |Bi(e)|} .

Proof. Since the form of the convergence theory of Section 4.5 (and the
associated CQs) is insensitive to the particular norm used, all we need to do
to obtain a proof in this context is to reinterpret all statements in u-space
as referring instead to the norm ‖·‖∞. The only places where changes occur
are in Lemma 7, where all references to ‖ū‖1 become ‖ū‖∞; and instead of
using ρ0 from Lemma 14, we exploit the form of f to ensure the existence of
some ρ0 for which Condition (4.15) of Corollary 4 is valid.   □

4.8 Appendix

Remark 7. (A comment on z-transforms)


Let w = {wn}_{n=0}^∞ be a sequence with a rational z-transform ŵ. On a small
neighborhood of 0 in the complex plane, this is given by the power series
∑_{n=0}^∞ wn z^n for such z. However, by rationality, ŵ has an extension to the whole
plane by its functional form, and so is extended beyond the domain of convergence
of the power series.
For example, if w = (1, 1, ...), then for |z| < 1, ŵ(z) = ∑_{n=0}^∞ z^n = 1/(1 − z),
but the function taking z to 1/(1 − z) is defined on the whole plane except at
1 and so constitutes an extension in the above sense. We shall always assume
that such transforms have been so extended.
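A tiny numerical illustration of this remark (our own script, not part of the chapter): the power series and the rational extension agree inside the disk of convergence, while only the extension is meaningful outside it.

```python
# z-transform of w = (1, 1, 1, ...): partial sums of the power series versus
# the rational extension z -> 1/(1 - z).
def zhat_partial(z, N):
    return sum(z ** n for n in range(N))

print(zhat_partial(0.5, 200), 1.0 / (1.0 - 0.5))   # both ~2: the series converges
print(1.0 / (1.0 - 2.0))   # extension at z = 2, where the power series diverges
```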

Proof of Lemma 1: Let (e, u) ∈ F0 and write u = ū + uc c + us s and
w = w̄ + wc c + ws s, with ū and w̄ in l1. The ensuing relation d ∗ e + n ∗ u = d ∗ w
implies that

    (wc d̂ − uc n̂) ĉ + (ws d̂ − us n̂) ŝ ≡ d̂ ê + n̂ ūˆ − w̄ˆ d̂ ∈ l̂1

and so cannot have any poles in the closed disk D̄. Since

    ĉ(z) = (1 − z cos θ)/((z − a)(z − ā))   and   ŝ(z) = z sin θ/((z − a)(z − ā)) ,

it follows that

    z ↦ [ (wc d̂(z) − uc n̂(z))(1 − z cos θ) + (ws d̂(z) − us n̂(z)) z sin θ ] / ((z − a)(z − ā))

has no poles in D̄ and hence none at all in C, and must be a polynomial (so
that d ∗ (wc c + ws s) − n ∗ (uc c + us s) has finite length). If now θ = 0, so s = 0,
the above amounts to stating that

    z ↦ (wc d̂(z) − uc n̂(z))/(1 − z)

is polynomial, from which we obtain uc = wc d̂(1)/n̂(1) = wc/P(1). Similarly,
if θ = π (so again s = 0), we have (wc d̂ − uc n̂)/(1 + ·) polynomial, so that
uc = wc d̂(−1)/n̂(−1) = wc/P(−1). For other values of θ, similar reasoning implies

    (wc d̂(a) − uc n̂(a))(1 − a cos θ) + (ws d̂(a) − us n̂(a)) a sin θ = 0.

Rearranging terms and taking real and imaginary parts yields

    us cos θ sin θ + uc sin²θ = Re [ (ws a sin θ + wc(1 − a cos θ))/P(a) ]   and
    us sin²θ − uc sin θ cos θ = Im [ (ws a sin θ + wc(1 − a cos θ))/P(a) ] ,

with the right-hand side vanishing if P has a pole at a. Solving this for uc
and us gives the desired relation.   □

Proof of Lemma 2: It suffices to show that Mr ⊆ F, since the reverse
inclusion is easy to demonstrate. Let (e, u) ∈ Mr. Define R := ŷ/n̂ − ê/(n̂ŵd̂).
Then R ≠ ŷ/n̂ since e ≠ 0. If we can prove that R ∈ R∞ then ê = ŵd̂(ŷ − Rn̂)
and, from the convolution relation, û = βc ĉ + βs ŝ + ŵd̂(x̂ + Rd̂), and thus
(e, u) ∈ F.
Now, in the closed unit disk D̄, the only candidates for poles of R are the
poles/zeros of P and the zeros of ŵ. The proof now proceeds by successive
elimination of these possibilities.
If p̄ is such a pole (so d̂(p̄) = 0 and n̂(p̄) ≠ 0), then, if p̄ is not a pole for ŵ,
ŵ(p̄) is finite and nonzero (from the assumptions on pole/zero positioning),
so the nonsingularity of R at p̄ follows from that of ê/d̂, which itself is implied
by the interpolation constraint ê(p̄) = 0, whereby ê has a zero at p̄ cancelling
the simple zero at p̄ for d̂. If p̄ is also a pole of ŵ (so p̄ = a or p̄ = ā) then
ŵd̂(·) is finite and nonzero at a and ā, so R has no pole there, regardless of
the value of ê(a).
If z̄ is a zero in D̄ for P (so d̂(z̄) ≠ 0 and n̂(z̄) = 0), then again ŵ is
finite and nonzero here. Now R is expressible as (1/(n̂d̂))(1 − ê/ŵ) − x̂/d̂, at least in a
punctured neighborhood of z̄, where we used the relation x̂n̂ + ŷd̂ = 1. The
interpolation constraint ê(z̄) = ŵ(z̄) ≠ 0 means that 1 − ê/ŵ has a zero at
z̄, cancelling the simple zero of n̂d̂ there. Again the singularity of R here is
removable.
If v̄ is a zero of ŵ, it is neither a pole nor a zero of P (by assumption), so
n̂(v̄) and d̂(v̄) are both nonzero. The constraint ê(v̄) = 0 then implies that
ê/ŵ, and hence R, has a removable singularity there. Thus R has no poles in
D̄ as claimed.   □
The proof of Lemma 4 will follow from the next two elementary lemmas.

Lemma 19. Let f̂ ∈ l̂1(C) and let |a| < 1 be a zero of f̂. Then f̂(·)/(· − a) ∈
l̂1(C), and is in l̂1 if f̂ ∈ l̂1 and a ∈ R.

Proof. The indicated function is analytic on a neighborhood of 0 in the plane,
so q := Z−1[f̂(·)/(· − a)] exists as a complex sequence. Assume now that
a ≠ 0 (the argument for a = 0 is trivial and hence omitted). Then, since
q = Z−1[1/(· − a)] ∗ f = −(1/a)(1, 1/a, 1/a², . . . ) ∗ f,

    qk = −(1/a^{k+1}) ∑_{j=0}^k fj a^j = (1/a^{k+1}) ∑_{j>k} fj a^j   since f̂(a) = 0.

Then, where an interchange of the order of summation occurs below,

    ‖q‖1 ≤ ∑_{k=0}^∞ (1/|a|^{k+1}) ∑_{j>k} |fj| |a|^j = ∑_{j=1}^∞ |fj| |a|^j ∑_{k<j} (1/|a|^{k+1})
         ≤ ∑_{j=1}^∞ |fj| |a|^j · 1/(|a|^j (1 − |a|)) ≤ ‖f‖1/(1 − |a|) .   □

Lemma 20. If f̂ ∈ l̂1(C) and |a| > 1, then f̂(·)/(· − a) ∈ l̂1(C), and is in l̂1 if f̂ ∈ l̂1
and a ∈ R.
Proof. As in the preceding lemma, let q be the inverse transform of the function
in question. From the expression for qk given there, and an interchange
of summations,

    ‖q‖1 ≤ ∑_{j=0}^∞ |fj| |a|^j ∑_{k≥j} (1/|a|^{k+1}) = ∑_{j=0}^∞ |fj| |a|^j · 1/(|a|^{j+1}(1 − 1/|a|)) ≤ ‖f‖1/(1 − 1/|a|) .   □

Proof of Lemma 4: By Lemma 20, no generality is lost by assuming that
p has no zeros outside the open unit disk. Write p(z) = C ∏_i (z − ai)(z − āi),
where ai and āi are the zeros of p, with the understanding that for real ai
we only include the single factor z − ai in the above product. Also, we allow
the possibility that the ai are nondistinct. Now, as f̂(a1) = 0, Lemma 19
implies that f̂(·)/(· − a1) ∈ l̂1(C). If a1 is real, then the function is in l̂1; if
complex, then a1 ≠ ā1 and, since f̂(ā1) = 0, f̂(·)/(· − a1) has a zero at ā1,
so by Lemma 19 again, f̂(·)/((· − a1)(· − ā1)) ∈ l̂1(C) and hence is also in l̂1, since it
is a symmetric function of z. Continue inductively to complete the proof.   □
Proof of Lemma 3: Let b1, .., bp ∈ R and a1, .., aq ∈ C\R denote the
collection of poles/zeros of P in D̄, the zeros of ŵ in D̄, along with the zeros
of P outside D̄. From the assumed distinctness of these, it follows that the
corresponding square matrix

    ⎛ 1   b1        · · ·   b1^{p+2q−1}      ⎞
    ⎜ ⋮                     ⋮                ⎟
    ⎜ 1   bp        · · ·   bp^{p+2q−1}      ⎟
    ⎜ 1   Re a1     · · ·   Re a1^{p+2q−1}   ⎟
    ⎜ ⋮                     ⋮                ⎟
    ⎜ 1   Re aq     · · ·   Re aq^{p+2q−1}   ⎟
    ⎜ 0   Im a1     · · ·   Im a1^{p+2q−1}   ⎟
    ⎜ ⋮                     ⋮                ⎟
    ⎝ 0   Im aq     · · ·   Im aq^{p+2q−1}   ⎠

has full rank over R.


Now, note that if z̄ ∉ D̄ is a zero of P, then ŵ(z̄) is finite. To see this,
note that d̂(z̄) ≠ 0, so by condition 2, (w − wc c − ws s)ˆ has no pole at z̄, and hence
neither has ŵ, since z̄ equals neither a nor its conjugate. Thus ŵ(ai) and ŵ(bj)
are all finite. By the surjectivity of the above matrix, and constructing an
appropriate real (p + 2q)-vector from ŵ(ai) and ŵ(bj), we can find a nonzero
e ∈ l1 of finite length such that all the interpolation constraints of M are
satisfied, with, furthermore,

    ê(z̄) = ŵ(z̄)   at each zero z̄ ∉ D̄ of P .

We now seek u ∈ l1 of finite length such that

    û = ( ŵd̂ − n̂(βc ĉ + βs ŝ) − d̂ ê ) / n̂
       = ( w̄ˆd̂ + d̂(wc ĉ + ws ŝ) − n̂(βc ĉ + βs ŝ) − d̂ ê ) / n̂ ,

where w̄ := w − wc c − ws s. From the definition of βc and βs (see Lemma 1),
it follows that d̂(wc ĉ + ws ŝ) − n̂(βc ĉ + βs ŝ) has no pole at a (nor at ā) and
so must be polynomial, and hence the numerator in the above expression for
û is polynomial, since w̄ˆd̂ and ê are. To show that û is polynomial, we only
need to see that the numerator has a zero at each zero of n̂. (Recall that
we assumed all these rational transforms to be extended by their functional
form to the whole complex plane, as in Remark 7.) Indeed, let z̄ be a zero of
n̂ (that is, of P). Then z̄ ≠ a, ā and ê(z̄) = ŵ(z̄), and since ĉ(z̄) and ŝ(z̄) are
both finite, the numerator evaluated at z̄ is

    w̄ˆ(z̄)d̂(z̄) + d̂(z̄)(wc ĉ(z̄) + ws ŝ(z̄)) − n̂(z̄)(βc ĉ(z̄) + βs ŝ(z̄)) − ê(z̄)d̂(z̄)
        = w̄ˆ(z̄)d̂(z̄) + d̂(z̄)(wc ĉ(z̄) + ws ŝ(z̄)) − ê(z̄)d̂(z̄)
        = (ŵ(z̄) − ê(z̄)) d̂(z̄) = 0

as claimed. Thus u has finite length.   □

References

1. H. Attouch, Variational Convergence for Functions and Operators, Applicable Math-


ematics Series (Pitman, London, 1984).
2. H. Attouch and R. J.–B. Wets, Quantitative stability of variational systems:
I. The epigraphical distance, Trans. Amer. Math. Soc., 328 (1991), 695–729.
3. H. Attouch and R. J.–B. Wets, Quantitative stability of variational systems:
II. A framework for nonlinear conditioning, SIAM J. Optim., 3, No. 2 (1993), 359–381.
4. H. Attouch and R. J.–B. Wets, Quantitative stability of variational systems: III.
ε-approximate solutions, Math. Programming, 61 (1993), 197–214.
5. D. Azé and J.–P. Penot, Operations on convergent families of sets and functions,
Optimization, 21, No. 4 (1990), 521–534.
6. S. P. Boyd and C. H. Barratt, Linear Controller Design: Limits of Performance
(Prentice–Hall, Englewood Cliffs, NJ 1991).
7. G. Beer, Topologies on Closed and Closed Convex Sets, Mathematics and its Applica-
tions, 268 (Kluwer Academic Publishers, Dordrecht, 1993).
8. J. M. Borwein and A. S. Lewis, Partially–finite convex programming, Part I: Quasi
relative interiors and duality theory, Math. Programming, 57 (1992), 15–48.
9. M. A. Dahleh and J. B. Pearson, l1 -Optimal feedback controllers for MIMO discrete–
time systems, IEEE Trans. Automat. Control, AC-32 (1987), 314–322.
10. M. A. Dahleh and I. J. Diaz-Bobillo, Control of Uncertain Systems: A Linear Pro-
gramming Approach (Prentice–Hall, Englewood Cliffs, NJ 1995).
11. G. Deodhare and M. Vidyasagar, Control system design by infinite linear program-
ming, Internat. J. Control, 55, No. 6 (1992), 1351–1380.

12. C. A. Desoer and M. Vidyasagar, Feedback Systems: Input–Output Properties (Aca-


demic Press, New York, 1975).
13. A. C. Eberhard and R. B. Wenczel, Epi–distance convergence of parametrised sums
of convex functions in non–reflexive spaces, J. Convex Anal., 7, No. 1 (2000), 47–71.
14. N. Elia and M. A. Dahleh, Controller design with multiple objectives, IEEE Trans.
Automat. Control, AC-42, No. 5 (1997), 596–613.
15. R. D. Hill, A. C. Eberhard, R. B. Wenczel and M. E. Halpern, Fundamental limitations
on the time–domain shaping of response to a fixed input, IEEE Trans. Automat.
Control, AC-47, No. 7 (2002), 1078–1090.
16. R. B. Holmes, Geometric Functional Analysis and its Applications, Graduate Texts
in Mathematics, 24 (Springer-Verlag, New York, 1975).
17. V. Jeyakumar, Duality and infinite–dimensional optimization, Nonlinear Analysis:
Theory, Methods and Applications, 15, (1990), 1111–1122.
18. V. Jeyakumar and H. Wolkowicz, Generalizations of Slater’s constraint qualification
for infinite convex programs, Math. Programming, 57 (1992), 85–102.
19. J. S. McDonald and J. B. Pearson, l1 –Optimal control of multivariable systems with
output norm constraints, Automatica J. IFAC, 27, No. 2 (1991), 317–329.
20. R. T. Rockafellar, Conjugate Duality and Optimization (SIAM, Philadelphia, PA,
1974).
21. R. T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ, 1970).
22. R. T. Rockafellar and R. J.–B. Wets, Variational systems, an introduction, Multifunc-
tions and Integrands, G. Salinetti (Ed.), Springer-Verlag Lecture Notes in Mathematics
No. 1091 (1984), 1–54.
23. H. Rotstein, Convergence of optimal control problems with an H∞–norm constraint,
Automatica J. IFAC, 33, No. 3 (1997), 355–367.
24. M. Vidyasagar, Control System Synthesis (MIT Press, Cambridge MA, 1985).
25. M. Vidyasagar, Optimal rejection of persistent bounded disturbances, IEEE Trans.
Automat. Control, AC-31 (1986), 527–534.
26. R. B. Wenczel, PhD Thesis, Department of Mathematics, RMIT, 1999.
27. R. B. Wenczel, A. C. Eberhard and R. D. Hill, Comments on “Controller design with
multiple objectives,” IEEE Trans. Automat. Control, 45, No. 11 (2000), 2197–2198.
Chapter 5
Asymptotical stability of optimal
paths in nonconvex problems

Musa A. Mamedov

Abstract In this chapter we study the turnpike property for the nonconvex
optimal control problems described by the differential inclusion ẋ ∈ a(x). We
study the infinite horizon problem of maximizing the functional ∫_0^T u(x(t)) dt
as T grows to infinity. The purpose of this chapter is to avoid the convexity
conditions usually assumed in turnpike theory. A turnpike theorem is proved
in which the main conditions are imposed on the mapping a and the function
u. It is shown that these conditions may hold for mappings a with nonconvex
images and for nonconcave functions u.

Key words: Turnpike property, differential inclusion, functional

5.1 Introduction and background

Let x ∈ Rn and Ω ⊂ Rn be a given set. Denote by Πc (Rn ) the set of all


compact subsets of Rn . We consider the following problem:
·
x ∈ a(x), x(0) = x0 , (5.1)
T
JT (x(·)) = u(x(t))dt → max . (5.2)
0

Here x0 ∈ Ω ⊂ Rn is an assigned initial point. The multivalued mapping


a : Ω → Πc (Rn ) has compact images and is continuous in the Hausdorff

Musa A. Mamedov
School of Information Technology and Mathematical Sciences, University of Ballarat,
Victoria 3353, AUSTRALIA
e-mail: musa [email protected]

C. Pearce, E. Hunt (eds.), Structure and Applications, Springer Optimization 95


and Its Applications 32, DOI 10.1007/978-0-387-98096-6 5,
c Springer Science+Business Media, LLC 2009
96 M.A. Mamedov

metric. We assume that at every point x ∈ Ω the set a(x) is uniformly


locally–connected (see [3]). The function u : Ω → R1 is a given continuous
function.
In this chapter we study the turnpike property for the problem given by
(6.1) and (5.2). The term ‘turnpike property’ was first coined by Samuelson in
1958 [16] when he showed that an efficient expanding economy would spend
most of its time in the vicinity of a balanced equilibrium path. This prop-
erty was further investigated by Radner [13], McKenzie [11], Makarov and
Rubinov [7] and others for optimal trajectories of a von Neuman–Gale model
with discrete time. In all these studies the turnpike property was established
under some convexity assumptions.
In [10] and [12] the turnpike property was defined using the notion of
statistical convergence (see [4]) and it was proved that all optimal trajectories
have the same unique statistical cluster point (which is also a statistical limit
point). In these works the turnpike property is proved when the graph of the
mapping a does not need to be a convex set.
The turnpike property for continuous-time control systems has been stud-
ied by Rockafellar [14], [15], Cass and Shell [1], Scheinkman [18], [17] and
others who imposed additional conditions on the Hamiltonian. To prove a
turnpike theorem without these kinds of additional conditions has become a
very important problem. This problem has recently been further investigated
by Zaslavsky [19], [21], Mamedov [8], [9] and others. The theorem proved in
the current chapter was first given as a short note in [8]. In this work we give
the proof of this theorem and explain the assumptions used in the examples.

Definition 1. An absolutely continuous function x(·) is called a trajectory


(solution) to the system (6.1) in the interval [0, T ] if x(0) = x0 and almost
·
everywhere on the interval [0, T ] the inclusion x (t) ∈ a(x(t)) is satisfied.
We denote the set of trajectories defined on the interval [0,T] by XT and let

JT∗ = sup JT (x(·)).


x(·)∈XT

We assume that the trajectories of system (5.1) are uniformly bounded, that
is, there exists a number L < +∞ such that

|x(t)| ≤ L for all t ∈ [0, T ], x(·) ∈ XT , T > 0. (5.3)

Note that in this work we focus our attention on the turnpike property of
optimal trajectories. So we do not consider the existence of bounded tra-
jectories defined on [0, ∞]. This issue has been studied for different control
problems by Leizarowitz [5], [6], Zaslavsky [19], [20] and others.
Definition 2. The trajectory x(·) is called optimal if J(x(·)) = JT∗ and is
called ξ-optimal (ξ > 0) if
5 Asymptotical stability of optimal paths in nonconvex problems 97

J(x(·)) ≥ JT∗ − ξ.
Definition 3. The point x is called a stationary point if 0 ∈ a(x).

Stationary points play an important role in the study of the asymptotical


behavior of optimal trajectories. We denote the set of stationary points by
M:
M = {x ∈ Ω : 0 ∈ a(x)}.
We assume that the set M is bounded. This is not a hard restriction, because
we consider uniformly bounded trajectories and so the set Ω can be taken as
a bounded set. Since the mapping a(·) is continuous the set M is also closed.
Then M is a compact set.
Definition 4. The point x∗ ∈ M is called an optimal stationary point if

u(x∗ ) = max u(x).


x∈M

In turnpike theory it is usually assumed that the optimal stationary point


x∗ is unique. We also assume that the point x∗ is unique, but the method
suggested here can be applied in the case when we have several different
optimal stationary points.

5.2 The main conditions of the turnpike theorem

Turnpike theorems for the problem (6.1), (5.2) have been proved in [14], [18]
and elsewhere, where it was assumed that the graph of the mapping a is a
compact convex set and the function u is concave. The main conditions are
imposed on the Hamiltonian. In this chapter a turnpike theorem is presented
in which the main conditions are imposed on the mapping a and the function
u. Here we present a relation between a and u which provides the turnpike
property without needing to impose conditions such as convexity of the graph
of a and of the function u. On the other hand this relation holds if the graph
of a is a convex set and the function u is concave.
Condition M There exists b < +∞ such that for every T > 0 there is a
trajectory x(·) ∈ XT satisfying the inequality

JT (x(·)) ≥ u∗ T − b.

Note that satisfaction of this condition depends in an essential way on the


initial point x0 , and in a certain sense it can be considered as a condition
for the existence of trajectories converging to x∗ . Thus, for example, if there
exists a trajectory that hits x∗ in a finite time, then Condition M is satisfied.
Set
B = {x ∈ Ω : u(x) ≥ u∗ }.
98 M.A. Mamedov

We fix p ∈ Rn , p = 0, and define a support function

c(x) = max py.


y∈a(x)

Here the notation py means the scalar product of the vectors p and y. By |c|
we will denote the absolute value of c.
We also define the function
u(x)−u∗ u(y)−u∗
ϕ(x, y) = |c(x)| + c(y) .

Condition H There exists a vector p ∈ Rn such that


H1 c(x) < 0 for all x ∈ B, x = x∗ ;
H2 there exists a point x̃ ∈ Ω such that px̃ = px∗ and c(x̃) > 0;
H3 for all points x, y, for which

px = py, c(x) < 0, c(y) > 0,

the inequality ϕ(x, y) < 0 is satisfied; and also if

xk → x∗ , yk → y  = x∗ , pxk = pyk , c(xk ) < 0 and c(yk ) > 0,

then lim supk→∞ ϕ(xk , yk ) < 0.

Note that if Condition H is satisfied for any vector p then it is also satisfied
for all λp, (λ > 0). That is why we assume that ||p|| = 1.
Condition H1 means that derivatives of the system (6.1) are directed to
one side with respect to p, that is, if x ∈ B, x = x∗ , then py < 0 for all
y ∈ a(x). It is also clear that py ≤ 0 for all y ∈ a(x∗ ) and c(x∗ ) = 0.
Condition H2 means that there is a point x̃ on the plan {x ∈ Rn : p(x −

x ) = 0} such that pỹ > 0 for some ỹ ∈ a(x̃). This is not a restrictive
assumption, but the turnpike property may be not true if this condition does
not hold.
The main condition here is H3. It can be considered as a relation between
the mapping a and the function u which provides the turnpike property.
Note that Conditions H1 and H3 hold if the graph of the mapping a is a
convex set (in Rn × Rn ) and the function u is strictly concave. In the next
example we show that Condition H can hold for mappings a without a convex
graph and for functions u that are not strictly concave (in this example the
function u is convex).

Example 1 Let x = (x1 , x2 ) ∈ R2 and the system (6.1) have the form
·
x1 = λ[x21 + (x22 + 1)2 + w], − 1 ≤ w ≤ +1,
·
x2 = f (x1 , x2 , v), v ∈ U ⊂ Rm .
Here λ > 0 is a positive number, the function f (x1 , x2 , v) is continuous and
f (0, 0, ṽ) = 0 for some ṽ ∈ U.
5 Asymptotical stability of optimal paths in nonconvex problems 99

The mapping a can be written as

a(x) = { y = (y1 , y2 ) : y1 = λ[x21 + (x22 + 1)2 + w], y2 = f (x1 , x2 , v),


x = (x1 , x2 ) ∈ R2 , − 1 ≤ w ≤ +1, v ∈ U }.

The function u is given in the form


u(x) = cx2 + dx2k
1 , where d > 0, c ≥ 2d, k ∈ {1, 2, 3, ...}.
We show that Condition H holds.
It is not difficult to see that the set of stationary points M contains the
point (0, 0) and also M ⊂ B1 (0, −1), where B1 (0, −1) represents the sphere
with center (0, −1) and with radius 1. We have

u∗ = max u(x) = u(0, 0) = 0.


x∈M

Therefore x∗ = (0, 0) is a unique optimal stationary point.


We fix the vector p = (−1, 0) and calculate the support function c(x) =
maxy∈a(x) py :
c(x) = −λ(x21 + x22 + 2x2 ).
Take any point x = (x1 , x2 ) ∈ B = {x : u(x) > 0} such that x =
x∗ = (0, 0). Clearly x ∈/ B1 (0, −1) and therefore c(x) < 0. Then Condition
H1 holds. Condition H2 also holds, because, for example, for the point x̃ =
(0, −1) for which px̃ = 0 we have c(x̃) = λ > 0.
Now we check Condition H3.
Take any two points x = (x1 , x2 ), y = (y1 , y2 ) for which px = py, c(x) < 0,
c(y) > 0.
If u(x) < 0, from the expression of the function φ(x, y) we obtain that
φ(x, y) < 0. Consider the case u(x) ≥ 0.
From px = py we have x1 = y1 . Denote ξ = x1 = y1 . Since c(y) > 0 and
λ > 0 we obtain ξ 2 + (y2 + 1)2 < 1. Therefore 0 < ξ < 1 and y2 + (1/2)ξ2 < 0.
On the other hand

u(y) − u∗ = cy2 + dξ 2k < cy2 + dξ 2 ≤ c[y2 + (1/2)ξ 2 ].


Since c(x) < 0 and u(x) ≥ 0 then |c(x)| = λ(ξ 2 +x22 +2x2 ), x2 +(1/2)ξ 2 +
(1/2)x22 > 0,

u(x) − u∗ = cx2 + dξ 2k < cx2 + dξ 2 ≤ c[x2 + (1/2)ξ 2 ].

Thus
u(x) − u∗ c(x2 + (1/2)ξ 2 ) c
< ≤ ,
|c(x)| λ(ξ 2 + x22 + 2x2 ) 2λ
u(y) − u∗ c(y2 + (1/2)ξ 2 ) c
< ≤− .
c(y) −λ(ξ 2 + y22 + 2y2 ) 2λ
100 M.A. Mamedov

From these inequalities we have ϕ(x, y) < 0, that is, the first part of H3
holds. The second part of Condition H3 may also be obtained from these
inequalities. Therefore Condition H holds.
We now formulate the main result of the current chapter.

Theorem 1. Suppose that Conditions M and H are satisfied and that the
optimal stationary point x∗ is unique. Then:
1. there exists C < +∞ such that

T
(u(x(t)) − u∗ )dt ≤ C
0

for every T > 0 and every trajectory x(t) ∈ XT ;


2. for every ε > 0 there exists Kε,ξ < +∞ such that

meas {t ∈ [0, T ] : ||x(t) − x∗ || ≥ ε} ≤ Kε,ξ


for every T > 0 and every ξ-optimal trajectory x(t) ∈ XT ;
3. if x(t) is an optimal trajectory and x(t1 ) = x(t2 ) = x∗ , then x(t) ≡ x∗
for t ∈ [t1 , t2 ].

The proof of this theorem is given in Section 7, and in Sections 3 to 6 we


give preliminary results.

5.3 Definition of the set D and some of its properties

In this section a set D is introduced. This set will be used in all sections
below.
Denote
M∗ = {x ∈ Ω : c(x) ≥ 0}.
Clearly M ⊂ M∗ . We recall that B = {x ∈ Ω : u(x) ≥ u∗ }.
Consider a compact set D ⊂ Ω for which the following conditions hold:
a) x ∈ int D for all x ∈ B, x = x∗ ;
b) c(x) < 0 for all x ∈ D, x = x∗ ;
c) D ∩ M∗ = {x∗ } and B ⊂ D.

It is not difficult to see that there exists a set D with properties a), b) and
c). For example such a set can be constructed as follows. Let x ∈ B, x =
x∗ . Then c(x) < 0. Since the mapping a is continuous in the Hausdorff
metric the function c(x) is continuous too. Therefore there exists εx > 0
such that c(x ) < 0 for all x ∈ Vεx (x) ∩ Ω. Here Vε (x) represents the open
ε-neighborhood of the point x. In this case for the set
5 Asymptotical stability of optimal paths in nonconvex problems 101

⎧ ⎫
⎨ / ⎬
D = cl V 21 εx (x) ∩ Ω
⎩ ⎭
x∈B,x =x∗

Conditions a) to c) are satisfied.


Lemma 1. For every ε > 0 there exists νε > 0 such that

u(x) ≤ u∗ − νε

/ int D and ||x − x∗ || ≥ ε.


for every x ∈ Ω, x ∈
Proof. Assume to the contrary that for any ε > 0 there exists a sequence xk
such that xk ∈ / int D, ||xk − x∗ || ≥ ε and u(xk ) → u∗ as k → ∞. Since the
sequence xk is bounded it has a limit point, say x . Clearly x = x∗ , x ∈
/ int D
and also u(x ) = u∗ , which implies x ∈ B. This contradicts Condition a) of
the set D. 
Lemma 2. For every ε > 0 there exists ηε > 0 such that

c(x) < −ηε and for all x ∈ D, ||x − x∗ || ≥ ε.

Proof. Assume to the contrary that for any ε > 0 there exists a sequence xk
such that xk ∈ D, ||xk − x∗ || ≥ ε and c(xk ) → 0. Let x be a limit point of the
sequence xk . Then x ∈ D, x = x∗ and c(x ) = 0. This contradicts Property
b) of the set D. 

5.4 Transformation of Condition H3

In this section we prove an inequality which can be considered as a transfor-


mation of Condition H3.
Take any number ε > 0 and denote Xε = {x : ||x − x∗ || ≥ ε}. Consider
the sets D ∩ Xε and M∗ ∩ Xε .
Let x ∈ D ∩ Xε , y ∈ M∗ ∩ Xε and px = py, with c(x) < 0 and c(y) > 0.
From Condition H3 it follows that

ϕ(x, y) < 0. (5.4)

We show that for every ε > 0 there exists δε > 0 such that

ϕ(x, y) < −δε (5.5)

for all x ∈ D (x = x∗ ), y ∈ M∗ ∩ Xε , for which px = py, c(x) < 0


and c(y) > 0.
102 M.A. Mamedov

First we consider the case where x ∈ D ∩ Xε . In this case if (5.5) is not


true then there exist sequences (xn ) and (yn ), for which

pxn = pyn , c(xn ) < 0, c(yn ) > 0, xn ∈ D ∩ Xε , yn ∈ M∗ ∩ Xε

and
xn → x̄, yn → ȳ, ϕ(xn , yn ) → 0.
From Lemma 17.4.1 it follows that the sequence {(u(xn ) − u∗ )/|c(xn )|} is
bounded. Since ϕ(xn , yn ) → 0, the sequence {(u(yn ) − u∗ )/c(yn )} is also
bounded and therefore from Lemma 17.3.1 we obtain c(ȳ) > 0. We also
obtain c(x̄) < 0 from the inclusion x̄ ∈ D ∩ Xε . Thus the function ϕ(x, y)
is continuous at the point (x̄, ȳ). Then from ϕ(xn , yn ) → 0 it follows that
ϕ(x̄, ȳ) = 0, which contradicts (5.4).
We now consider the case where x ∈ D, x = x∗ . Assume that (5.5)
does not hold. Then there exist sequences (xn ) and (yn ), for which pxn =
pyn , c(xn ) < 0, c(yn ) > 0, xn ∈ D, yn ∈ M∗ ∩ Xε and xn → x̄, yn → ȳ,
ϕ(xn , yn ) → 0. If x̄ = x∗ , we have a contradiction similar to the first case. If
x̄ = x∗ , taking the inequality ȳ = x∗ into account we obtain a contradiction
to the second part of Condition H3.
Thus we have shown that (5.5) is true.
Define the function
u(x) − u∗ u(y) − u∗
ϕ(x, y)δ1 ,δ2 = + , (δ1 ≥ 0, δ2 ≥ 0).
|c(x)| + δ1 c(y) + δ2

Since the support function c(·) is continuous, from Conditions H2 and H3


it follows that there exists a number b ∈ (0, +∞) such that

u(x) − u∗
≤b for all x ∈ D, x = x∗ . (5.6)
|c(x)|

For a given ε > 0 we choose a number γ(ε) > 0 such that


νε
b− ≤ −δε ; (5.7)
γε

here the number νε is defined by Lemma (17.3.1). Clearly

u(y) − u∗ ≤ −νε for all y ∈ M∗ ∩ Xε . (5.8)

By using γ(ε) we divide the set M∗ ∩ Xε ∩ {y : c(y) > 0} into two parts:
1
Y1 = {y ∈ M∗ ∩ Xε ∩ {y : c(y) > 0} : c(y) ≥ γ(ε)},
2
1
Y2 = {y ∈ M∗ ∩ Xε ∩ {y : c(y) > 0} : c(y) < γ(ε)}.
2
5 Asymptotical stability of optimal paths in nonconvex problems 103

1
Consider the set Y2 . Denote δ̄2 = 2 γ(ε) and take any number δ̄1 > 0.
Then
u(x) − u∗
≤b for all x ∈ D, x = x∗ , δ1 ≤ δ̄1 (5.9)
|c(x)| + δ1

and
1 1
c(y) + δ2 ≤ c(y) + δ̄2 < γ(ε) + γ(ε) = γ(ε)
2 2
for all 0 ≤ δ2 ≤ δ̄2 , y ∈ Y2 . Using (5.8) we obtain

u(y) − u∗ νε νε
≤− ≤− for all y ∈ Y2 . (5.10)
c(y) + δ2 c(y) + δ2 γ(ε)

Thus from (5.9) and (5.10) we have


νε
ϕ(x, y)δ1 ,δ2 ≤ b − ≤ − δε (5.11)
γ(ε)

for all (x, y) and (δ1 , δ2 ) satisfying

x ∈ D, x = x∗ , y ∈ Y2 , c(x) < 0, c(y) > 0,


1
0 ≤ δ1 ≤ δ̄1 , 0 ≤ δ2 ≤ δ̄2 = γ(ε).
2
Now consider the set Y1 . Since Y1 is a bounded closed set and c(y) ≥
1
2 γ(ε) > 0 for all y ∈ Y1 , then the function (u(y) − u∗ )/(c(y) + δ2 ) is
uniformly continuous with respect to (y, δ2 ) on the closed set Y1 × [0, L],
where L > 0. That is why for a given number δε there exists δ̄ˆ21 (ε) such
that
u(y) − u∗ u(y) − u∗ 1
− ≤ δε for all y ∈ Y1 , 0 ≤ δ2 ≤ δ̄ˆ21 (ε).
c(y) + δ2 c(y) 2

On the other hand

u(x) − u∗ u(x) − u∗
≤ for all x ∈ D, x = x∗ , u(x) ≥ u∗ , 0 ≤ δ1 ≤ δ̄1 .
|c(x)| + δ1 |c(x)|

Then if px = py we have

1 1 1
ϕ(x, y)δ1 ,δ2 ≤ ϕ(x, y) + δε ≤ − δ ε + δε = − δε , (5.12)
2 2 2
for all x ∈ D, x = x∗ , u(x) ≥ u∗ , y ∈ Y1 , px = py,

0 ≤ δ1 ≤ δ̄1 and 0 ≤ δ2 ≤ δ̄ˆ21 (ε).


104 M.A. Mamedov

Now consider the case when x ∈ D and u(x) < u∗ . Since the function c(y)
is bounded on the set Y1 , then for ε > 0 there exist δ̄ˆ22 (ε) > 0 and δ̂ε > 0
such that

u(y) − u∗ 1
≤ − δ̂ε for all y ∈ Y1 , 0 ≤ δ2 ≤ δ̄ˆ22 (ε).
c(y) + δ2 2
Then
u(y) − u∗ 1
ϕ(x, y)δ1 ,δ2 ≤ ≤ − δ̂ε , (5.13)
c(y) + δ2 2

for all x ∈ D, u(x) < u∗ , y ∈ Y1 , px = py, c(x) < 0,

0 ≤ δ1 ≤ δ̄1 and 0 ≤ δ2 ≤ δ̄ˆ22 (ε).

Therefore from (5.11) to (5.13) we have

ϕ(x, y)δ1 ,δ2 ≤ − δ̄ε , (5.14)

for all x ∈ D, x = x∗ , y ∈ M∗ ∩ Xε , px = py, c(y) > 0,

0 ≤ δ1 ≤ δ̄1 and 0 ≤ δ2 ≤ δ2 (ε).

Here δ̄ε = min{ 12 δε , 1


2 δ̂ε } and δ2 (ε) = min{ 12 γ(ε), δ̄ˆ21 (ε), δ̄ˆ22 (ε)}.

Consider the function (u(x) − u∗ )/(|c(x)| + δ1 ) with respect to (x, δ1 ).


From Lemma 17.4.1 we obtain that this function is continuous on the set
(D ∩ Xε̂ ) × [0, δ̄1 ] for all ε̂ > 0. Then for a given number 12 δ̄ε > 0 there exists
η = η(ε, ε̂) > 0 such that

u(x) − u∗ u(x ) − u∗ 1
− ≤ δ̄ε
|c(x)| + δ1 |c(x )| + δ1 2

for all x ∈ cl (Vη (x )) = {x : ||x − x || ≤ η}, x ∈ D ∩ Xε and 0 ≤ δ1 ≤ δ̄1 .


If px = py then we have
1 1
ϕ(x, y)δ1 ,δ2 ≤ − ϕ(x , y)δ1 ,δ2 + δ̄ε ≤ − δ̄ε ,
2 2
  ∗ 
for all x ∈ cl (Vη (x )), x ∈ D ∩ Xε , y ∈ M ∩ Xε , px = py, c(y) > 0, 0 ≤
δ1 ≤ δ̄1 and 0 ≤ δ2 ≤ δ2 (ε).
Denote δ  (ε) = min{ 12 δ̄ε , δ2 (ε)}. Obviously δ  (ε) > 0 if ε > 0 and also for
every ε > 0 there exists δ  > 0 such that δ  (ε) ≥ δ  for all ε ≥ ε . Therefore
there exists a continuous function δ  (ε), with respect to ε, such that
• δ  (ε) ≤ δ  (ε) for all ε > 0 and
• for every ε > 0 there exists δ  > 0 such that δ  (ε) < δ  for all ε ≥ ε .
5 Asymptotical stability of optimal paths in nonconvex problems 105

Let x ∈ D and y ∈ M∗ . Taking ε̂ = ||x − x∗ || and e = ||y − x∗ ||, we


define the functions δ(y) = δ  (ε) = δ  (||y − x∗ ||) and η(x , y) = η(ε, ε̂) =
η(||y − x∗ ||, ||x − x∗ ||). Clearly in this case the function δ(y) is continuous
with respect to y.
Thus the following lemma is proved.
Lemma 3. Assume that at the point x ∈ D, y ∈ M∗ we have px =
py, c(x) < 0 and c(y) > 0. Then for every point x and numbers δ1 , δ2
satisfying

x ∈ cl Vη(x ,y) (x ) , c(x) < 0, 0 ≤ δ1 ≤ δ̄1 and 0 ≤ δ2 ≤ δ2 (y),

the following inequality holds:


+ ,
u(x) u(y) 1 1
+ ≤ u∗ + − δ(y).
|c(x)| + δ1 c(y) + δ2 |c(x)| + δ1 c(y) + δ2

Here the functions η(x , y) and δ(y) are such that δ(y) is continuous and
∼ ∧ ∧
for every ε > 0, ε> 0 there exist δ ε > 0 and η ε,∼ε > 0 such that
∧ ∧
δ(y) ≥δ ε and η(x , y) ≥η ε,∼ε

for all (x , y) for which ||x − x∗ || ≥ ε, ||y − x∗ || ≥ ε.

5.5 Sets of 1st and 2nd type: Some integral inequalities

In this section, for a given trajectory x(t) ∈ XT we divide the interval [0, T ]
into two types of intervals and prove some integral inequalities. We assume
that x(t) is a given continuously differentiable trajectory.

5.5.1

Consider a set {t ∈ [0, T ] : x(t) ∈ int D}. This set is an open set and therefore
it can be presented as a union of a countable (or finite) number of open
intervals τk = (tk1 , tk2 ), k = 1, 2, 3, ..., where τk ∩ τl = ∅ if k = l. We denote
the set of intervals τk by G = {τk : k = 1, 2, ...}.
From the definition of the set D we have
d ·
px(t) = p x (t) ≤ c(x(t)) < 0 for all t ∈ τk , k = 1, 2, ....
dt
Then px(tk1 ) > px(tk2 ) for all k.
We introduce the notation piτk = px(tki ), i = 1, 2, ..., Pτk = [p2τk , p1τk ] and
Pτk = (p2τk , p1τk ).
0
106 M.A. Mamedov

5.5.2

We divide the set G into the sets gm (G = ∪m gm ), such that for every set
g = gm the following conditions hold:
a) The set g consists of a countable (or finite) number of intervals τk , for
which

i - related intervals Pτ0k are disjoint;


ii - if [t1τi , t2τi ] ≤ [t1τj , t2τj ] for some τi , τj ∈ g, then Pτi ≥ Pτj ; (here and
henceforth the notation [t1τi , t2τi ] ≤ [t1τj , t2τj ] and Pτi ≥ Pτj means that
t2τi ≤ t1τj and p2τi ≥ p1τj , respectively)
iii - if [t1τ , t2τ ] ⊂ [t1g , t2g ] for τ ∈ G, then τ ∈ g; here t1g = inf τ ∈g t1τ and
t2g = supτ ∈g t2τ .
b) The set g is a maximal set satisfying Condition a); that is, the set g cannot
be extended by taking other elements τ ∈ G such that a) holds.

Sets gm (G = ∪m gm ) satisfying these conditions exist. For example we can


construct a set g satisfying Conditions a) and b) in the following way.
Take any interval τ1 = (t11 , t12 ). Denote by t2 a middle point of the interval
[0, t11 ] : t2 = 12 t11 . If for all intervals τ ∈ G, for which τ ⊂ [t2 , t11 ], Conditions
i and ii hold, then we take t3 = 12 t2 , otherwise we take t3 = 12 (t2 + t11 ). We
repeat this process and obtain a convergent sequence tn . Let tn → t . In this
case for all intervals τ ∈ G, for which τ ⊂ [t , t11 ], Conditions i and ii are
satisfied.
Similarly, in the interval [t12 , T ] we find the point t such that for all inter-
vals τ ∈ G, for which τ ⊂ [t12 , t ], Conditions i and ii are satisfied. Therefore
we obtain that for the set g which consists of all intervals τ ∈ G, for which
τ ⊂ [t , t ], Condition a) holds. It is not difficult to see that for the set g
Condition b) also holds.
Thus we have constructed a set g1 = g satisfying Conditions a) and b).
We can construct another set g2 taking G \ g1 in the same manner and so on.
Therefore G = ∪m gm , where for all gm Conditions a) and b) hold. For every
set g, we have intervals [t1g , t2g ] (see iii) and [p1g , p2g ], where p1g = supτ ∈g p1τ and
p2g = inf τ ∈g p2τ .

Definition 5. We say that g1 < g2 if:


a) p2g1 < p1g2 and t2g1 < t1g2 ;
b) there is no g ∈ G which belongs to the interval [t2g1 , t1g2 ].

Take some set g 1 ∈ G and consider all sets gm ∈ G, m = ±1, ±2, ±3, ... for
which
... < g−2 < g−1 < g 1 < g1 < g2 < ....

The number of sets gm may be at most countable. Denote G1 = {g 1 } ∪ {gm :


m = ±1, ±2, ±3, ...}.
5 Asymptotical stability of optimal paths in nonconvex problems 107

We take another set g 2 ∈ G \ G1 and construct a set G2 similar to G1 , and


so on. We continue this procedure and obtain sets Gi . The number of these
sets is either finite or countable. Clearly G = ∪i Gi .
Denote t1Gi = inf g∈Gi t1g and t2Gi = supg∈Gi t2g . Clearly

∪i [t1Gi , t2Gi ] ⊂ [0, T ] (5.15)

and (t1Gi , t2Gi ) ∩ (t1Gj , t2Gj ) = ∅ for all i = j.


Proposition 1. Let gk ∈ Gi , k = 1, 2, .... Then
a) if g1 < g2 < g3 < .... then x(t2Gi ) = x∗ ;
b) if g1 > g2 > g3 > .... then x(t1Gi ) = x∗ .

Proof. Consider case a). Take some gk . By Definition 5 it is clear that there
exists an interval τk ∈ gk and a point tk ∈ τk such that

x(tk ) ∈ D. (5.16)

Consider the interval [t2gk , t1gk+1 ]. Since gk < gk+1 by Definition 5 we have
px(t2gk ) < px(t1gk+1 ). Therefore there exists a point sk ∈ (t2gk , t1gk+1 ) such that
·
p x (sk ) ≥ 0, which implies

x(sk ) ∈ M∗ . (5.17)

On the other hand, |t2gk − t1gk | → 0 and |t1gk+1 − t2gk | → 0, as k → ∞.


− − −
Then x(t) →x as t → t2Gi . In this case x(tk ) →x and x(sk ) →x as k → ∞.
Therefore from the definition of the set D and from (5.16) and (5.17) we
− −
obtain x∈ D ∩ M∗ = {x∗ }; that is, x= x∗ .
We can prove case b) in the same manner. 

5.5.3

Take the set Gi . We denote by t1i an exact upper bound of the points t2Gm sat-
isfying t2Gm ≤ t1Gi and by t2i an exact lower bound of the points t1Gm satisfying
t1Gm ≥ t2Gi .
Proposition 2. There exist points ti ∈ [t1i , t1Gi ] and ti ∈ [t2Gi , t2i ] such that
x(ti ) = x(ti ) = x∗ .

Proof. First we consider the interval [t1i , t1Gi ]. Two cases should be studied.
1. Assume that the exact upper bound t1i is not reached. In this case there
exists a sequence of intervals [t1Gm , t2Gm ] such that t2Gm → t1i and t1Gm →
t1i . Since the intervals (t1Gm , t2Gm ) are disjoint we obtain that x(t1i ) = x∗ .
Therefore ti = t1i .
108 M.A. Mamedov

2. Assume that t1i = t2Gm for any Gm .


If there exists a sequence gk ∈ Gm , k = 1, 2, ..., such that g1 < g2 < ..., then
from Proposition 1 it follows that x(t2Gm ) = x∗ . So we can take ti = t2Gm = t1i .
Therefore we can assume that the set Gm consists of gmk , where gm1 >
gm2 > ....
Now consider the set Gi . If in this set there exists a sequence gil such that
gi1 > gi2 > · · · then from Proposition 6.1 it follows that x(t1Gi ) = x∗ , so we
can take ti = t1Gi . That is why we consider the case when the set Gi consists
of gil , where gi1 < gi2 < ....
Consider the elements gm1 and gi1 . We denote gm = gm1 and gi = gi1 .
The elements gm and gi belong to the different sets Gm and Gi , so they
cannot be compared by Definition 5. Since the second condition of this defini-
tion holds, the first condition is not satisfied, that is, p2Gm ≥ p2Gi . This means
that the interval τ in gm and gi forms only one element of type g. This is a
contradiction.
Therefore we can take either ti = t1i or ti = t1Gi .
Similarly we can prove the proposition for the interval [t2Gi , t2i ]. 

Note We take ti = 0 (or ti = T ) if for the chosen set Gi there does not exist
Gm such that t2Gm ≤ t1Gi (or t1Gm ≥ t2Gi , respectively).

Therefore the following lemma is proved.

Lemma 4. The interval [0, T ] can be divided into an at most countable num-
ber of intervals [0, t1 ], [t1k , t2k ] and [t2 , T ], such that the interiors of these in-
tervals are disjoint and
a) [0, T ] = [0, t1 ] ∪ {∪k [t1k , t2k ]} ∪ [t2 , T ];
b) in each interval [0, t1 ], [t1k , t2k ] and [t2 , T ] there is only one set G0 , Gk
and GT , respectively, and

G = G0 ∪ {∪k Gk } ∪ GT ;

c) x(t1 ) = x(t1k ) = x(t2k ) = x(t2 ) = x∗ , k = 1, 2, 3, ....

5.5.4

In this subsection we give two lemmas.

Lemma 5. Assume that the function x(t) is continuously differentiable on


the interval [t1 , t2 ] and p1 < p2 , where pi = px(ti ), i = 1, 2. Then there exists
an at most countable number of intervals [tk1 , tk2 ] ⊂ [t1 , t2 ] such that
·
a) p x (t) > 0, t ∈ [tk1 , tk2 ], k = 1, 2, ...;
b) [pk1 , pk2 ] ⊂ [p1 , p2 ] and pk1 < pk2 for all k = 1, 2, ...,
where pki = px(tki ), i = 1, 2;
5 Asymptotical stability of optimal paths in nonconvex problems 109

c) the intervals (pk1 , pk2 ) are disjoint and



p 2 − p1 = (pk2 − pk1 ).
k

Proof: For the sake of definiteness, we assume that px(t) ∈ [p1 , p2 ], t ∈ [t1 , t2 ].
Otherwise we can consider an interval [t1 , t2 ] ⊂ [t1 , t2 ], for which px(t) ∈
[p1 , p2 ], t ∈ [t1 , t2 ] and pi = px(ti ), i = 1, 2.
We set t(q) = min{t ∈ [t1 , t2 ] : px(t) = q} for all q ∈ [p1 , p2 ] and then
define a set
m = {t(q) : q ∈ [p1 , p2 ]}.
Clearly m ⊂ [t1 , t2 ]. Consider a function a(t) = px(t) defined on the set m.
It is not difficult to see that for every q ∈ [p1 , p2 ] there is only one number
.
t(q), for which a(t(q)) = q, and also a (t(q)) ≥ 0, ∀q ∈ [p1 , p2 ].
We divide the interval [p1 , p2 ] into two parts as follows:
. .
P1 = {q : a (t(q)) = 0} and P2 = {q : a (t(q)) > 0}.

Define the sets

m(P1 ) = {t(q) : q ∈ P1 } and m(P2 ) = {t(q) : q ∈ P2 }.

We denote by mΛ a set of points t ∈ [t1 , t2 ] which cannot be seen from


the left (see [2]). It is known that the set mΛ can be presented in the form
mΛ = ∪n (αn , βn ). Then we can write

[t1 , t2 ] = m(P1 ) ∪ m(P2 ) ∪ mΛ ∪ (∪n {βn }).

Let q ∈ P2 . Since the function x(t) is continuously differentiable, there


exists a number ε > 0, such that Vε (q) ⊂ P2 , where Vε (q) stands for the open
ε-neighborhood of the point q. Therefore the set m(P2 ) is an open set and
that is why it can be presented as m(P2 ) = ∪k (tk1 , tk2 ).
Thus we have
t2
· ·
p2 − p1 = px(t2 ) − px(t1 ) = p x (t)dt = p x (t)dt +
t1 m(P1 )

tk
 
2 βn
· · ·
p x (t)dt + p x (t)dt + p x (t)dt.
k n α
tk
1
n ∪n {βn }
110 M.A. Mamedov

· .
It is not difficult to observe that p x (t) = a (t) = 0, ∀t ∈ m(P1 ),
meas(∪n {βn }) = 0 and px(αn ) = px(βn ), n = 1, 2, ... (see [2]). Then we
obtain
 
p2 − p1 = (px(tk2 ) − px(tk1 )) = (pk2 − pk1 ).
k k

Therefore for the intervals [tk1 , tk2 ] all assertions of the lemma hold.

Lemma 6. Assume that on the intervals [t1 , t2 ] and [s2 , s1 ] the following con-
ditions hold:

1. px(ti ) = px(si ) = pi , i = 1, 2.
2. x(t) ∈ int D, ∀t ∈ (t1 , t2 ). In particular, from this condition it follows
·
that p x (t) < 0, ∀t ∈ (t1 , t2 ).
·
3. p x (s) > 0, ∀s ∈ (s2 , s1 ).
Then
t2 s1 s1

u(x(t))dt + u(x(s))ds ≤ u [(t2 − t1 ) + (s1 − s2 )] − δ 2 (x(s))ds
t1 s2 s2

where the function δ(x) is as defined in Lemma 17.5.1.

Proof: Consider two cases.



I. Let p∗ = px∗ = pi , i = 1, 2. We recall that pi = px(ti ), i = 1, 2. In this
case from Conditions 2) and 3) we have


ε= ρ (x∗ , {x(t) : t ∈ [t1 , t2 ]}) > 0 and ε = ρ (x∗ , {x(s) : s ∈ [s2 , s1 ]}) > 0.
∧ ∧
Now we use Lemma 17.5.1. We define δ =δ ε > 0 and η =η ε,∼ε for the chosen

numbers ε and ε. We take any number N > 0 and divide the interval [p2 , p1 ]
into N equal parts [pk2 , pk1 ]. From Conditions 2 and 3 it follows that in this
case the intervals [t1 , t2 ] and [s2 , s1 ] are also divided into N parts, say [tk1 , tk2 ]
and [sk2 , sk1 ], respectively. Here px(tki ) = px(ski ) = pki , i = 1, 2, k = 1, ..., N.
Clearly
p 1 − p2
pk1 − pk2 = → 0 as N → ∞.
N

Since x(t) ∈ D and ||x(t) − x∗ || ≥ ε > 0 for all t ∈ [t1 , t2 ] then from
Lemma 17.4.1 it follows that
·
p x (t) ≤ c(x(t)) < −η∼ε < 0.
That is why for every k we have tk2 − tk1 → 0 as N → ∞. Therefore for a
given η > 0 there exists a number N such that
5 Asymptotical stability of optimal paths in nonconvex problems 111

max ||x(t) − x(s)|| < η for all k = 1, ..., N. (5.18)


t,s∈[tk k
1 ,t2 ]

Now we show that for all k = 1, ..., N

sk1 − sk2 → 0 as N → ∞. (5.19)

Suppose that (5.19) is not true. In this case there exists a sequence of inter-
vals [sk2N , sk1N ], such that ski N → si , i = 1, 2, and s2 < s1 . Since pk1N −pk2N → 0
as N → ∞ then px(s2 ) = px(s1 ) = p , and moreover px(s) = p for all
s ∈ [s2 , s1 ]. This is a contradiction. So (5.19) is true.
A. Now we take any number k and fix it. For the sake of simplicity we
denote the intervals [tk1 , tk2 ] and [sk2 , sk1 ] by [t1 , t2 ] and [s2 , s1 ], respectively. Let
pi = px(ti ) = px(si ), i = 1, 2.
Take any s ∈ (s2 , s1 ) and denote by t the point in the interval (t1 , t2 ) for
which px(t ) = px(s). From (6.8) it follows that

x(t) ∈ Vη (x(t )) for all t ∈ [t1 , t2 ].

Therefore we can apply Lemma 17.5.1. We also note that the following
conditions hold:
·
• |cx(t))| ≤ |p x (t)| for all t ∈ (t1 , t2 );
·
• cx(s)) ≥ p x (s) for all s ∈ (s2 , s1 );
• u(x(s)) ≤ u∗ for all s ∈ [s2 , s1 ].
Then from Lemma 17.5.1 we obtain
 
u(x(t)) u(x(s)) ∗ 1 1
· + · ≤u · + ·
|p x (t)| + δ1 p x (s) + δ2 |p x (t)| + δ1 p x (s) + δ2
− δ(x(s)), (5.20)

for all t ∈ (t1 , t2 ), s ∈ (s2 , s1 ), δ1 ∈ [0, δ¯1 ] and δ2 ∈ [0, δ(x(s))].


Denote ξ = mins∈[s2 ,s1 ] δ(x(s)). Clearly ξ > 0. Since the function δ(x) is

continuous there is a point s∈ [s2 , s1 ] such that


ξ = δ(x( s)).

We transform t → π and s → ω as follows:

s2 − s1
π = px(t) + ξ (t1 − t), t ∈ [t1 , t2 ],
t1 − t2
ω = px(s) + ξ(s − s1 ), s ∈ [s2 , s1 ].
112 M.A. Mamedov

Clearly
· ∼ ·
dπ = [p x (t)− ξ ]dt and dω = [p x (s) + ξ]ds,

where ξ = ξ(s2 − s1 )/(t1 − t2 ).
· ∼ ·
Since p x (t)− ξ < 0 and p x (s) + ξ > 0 then there exist inverse functions
 
t = t(π) and s = s(ω). We also note that π1 = px(t1 ) = px(s1 ) = ω1 and
 
π2 = px(t2 ) + ξ(s2 − s1 ) = ω2 .
Therefore we have
t2 s1

A = u(x(t))dt + u(x(s))ds
t1 s2
2
π ω1
u(x(t(π))) u(x(s(ω)))
= · ∼ dπ +
· dω
p x (s(ω)) + ξ
π1 p x (t(π))− ξ ω2
ω1  
u(x(t(ω))) u(x(s(ω)))
= · ∼ + · dω.
ω2 |p x (t(ω))|+ ξ p x (s(ω)) + ξ


Let δ¯1 > ξ . Since
ξ ≤ δ(x(t)) = δ(x(t(ω))), t(ω) ∈ [t1 , t2 ], s(ω) ∈ [s2 , s1 ],

then from (6.10) we obtain


ω1   ω1
∗ 1 1
A≤ u · ∼ + · dω − δ(x(s(ω)))dω
ω2 |p x (t(ω))|+ ξ p x (s(ω)) + ξ
ω2

⎛t ⎞
2 s1 s1
∗⎝ ·
=u ⎠
dt + ds − δ(x(s))[p x (s) + ξ]ds
t1 s2 s2
s1

≤ u [(t2 − t1 ) + (s1 − s2 )] − ξδ(x(s)ds.
s2


On the other hand δ(x(s)) ≥ ξ = δ(x( s)). Thus

A ≤ u∗ [(t2 − t1 ) + (s1 − s2 )] − (s1 − s2 )δ 2 (x( s)).

B. Now we consider different numbers k. The last inequality shows that



for every k = 1, ..., N there is a point sk ∈ [sk2 , sk1 ] such that
5 Asymptotical stability of optimal paths in nonconvex problems 113

t2 s1

u(x(t))dt + u(x(s))ds ≤ u∗ [(tk2 − tk1 ) + (sk1 − sk2 )] − (sk1 − sk2 )δ 2 (x(sk )).
t1 s2

Summing over k we obtain


t2 s1 
N

u(x(t))dt + u(x(s))ds ≤ u∗ [(t2 − t1 ) + (s1 − s2 )] − (sk1 − sk2 )δ 2 (x(sk )).
t1 s2 k=1

Therefore the lemma is proved taking into account (5.19) and passing to
the limit as N → ∞.

II. Now consider the case when p∗ = pi for some i = 1, 2. For the sake of
definiteness we assume that p∗ = p1 .
Take any number α > 0 and consider the interval [p2 , p1 − α]. Denote by
[t1 − t(α), t2 ] and [s2 , s1 − s(α)] the intervals which correspond to the interval
[p2 , p1 − α]. Clearly t(α) → 0 and s(α) → 0 as α → 0. We apply the result
proved in the first part of the lemma for the interval [p2 , p1 − α]. Then we
pass to the limit as α → 0. Thus the lemma is proved.

5.5.5

We define two types of sets.


Definition 6. The set π ⊂ [0, T ] is called a set of 1st type on the interval
[p2 , p1 ] if the following conditions hold:
a) The set π consists of two sets π1 and π2 , that is, π = π1 ∪ π2 , such that
x(t) ∈ int D, ∀t ∈ π1 and x(t) ∈ / int D, ∀t ∈ π2 .
b) The set π1 consists of an at most countable number of intervals dk , with
end-points tk1 < tk2 and the intervals (px(tk2 ), px(tk1 )), k = 1, 2, ..., are
disjoint.
Clearly in this case the intervals d0k = (tk1 , tk2 ) are also disjoint.
c) Both the inequalities p1 ≥ supk px(tk1 ) and p2 ≤ inf k px(tk2 ) hold.
Definition 7. The set ω ⊂ [0, T ] is called a set of 2nd type on the interval
[p2 , p1 ] if the following conditions hold:
a) x(t) ∈
/ int D, ∀t ∈ ω.
b) The set ω consists of an at most countable number of intervals [sk2 , sk1 ],
such that the intervals (px(sk2 ), px(sk1 )), k = 1, 2, ..., are nonempty and
disjoint, and

p1 − p2 = [px(sk1 ) − px(sk2 )].
k
114 M.A. Mamedov

Lemma 7. Assume that π and ω are sets of 1st and 2nd type on the interval
[p2 , p1 ], respectively. Then

u(x(t))dt ≤ u∗ meas(π ∪ ω) − [u∗ − u(x(t))]dt − δ 2 (x(t))dt,
π∪ω Q E

where
a) Q ∪ E = ω ∪ π2 = {t ∈ π ∪ ω : x(t) ∈/ int D};
b) for every ε > 0 there exists a number δε > 0 such that

δ 2 (x) ≥ δε for all x for which ||x − x∗ || ≥ ε;

c) for every δ > 0 there exists a number K(δ) < ∞ such that

meas[(π ∪ ω) ∩ Zδ ] ≤ K(δ)meas[(Q ∪ E) ∩ Zδ ],

where Zδ = {t ∈ [0, T ] : |px(t) − p∗ | ≥ δ}.

Proof. Let π = π1 ∪ π2 , π1 = ∪k dk , ∪n νn ⊂ ω and νn = [sn2 , sn1 ] (see


Definitions 6 and 7).
We denote π10 = ∪k d0k and d0k = (tk1 , tk2 ). Clearly meas π1 = meas π10 and
that is why below we deal with d0k .
Denote pni = px(sni ), i = 1, 2. Clearly pn2 < pn1 . Since the function x(t) is
absolutely continuous, from Lemma 5 it follows that there exists an at most
countable number of intervals [snm 2 , s1 ] ⊂ [s2 , s1 ], m = 1, 2, ..., such that
nm n n
·
i - p x (s) > 0, for all s ∈ [snm nm
2 , s1 ], n, m = 1, 2, ...;
ii - [p2 , p1 ] ⊂ [p2 , p1 ] and p2 < pnm
nm nm n n nm
1 for all n, m,
here pnm
i = px(snm
i ), i = 1, 2;
iii - the intervals (pnm nm
2 , p1 ), n, m = 1, 2, ..., are disjoint and


pn1 − pn2 = (pnm
1 − pnm
2 ).
m

Therefore the set ω contains an at most countable number of intervals νm =

(sm m
2 , s1 ), such that:

1. the intervals (pm m m m


2 , p1 ), m = 1, 2, . . . , are disjoint (here pi = px(si ),
i = 1, 2, m = 1, 2, . . .) ;
2.
·
p x (t) > 0, for all t ∈ ∪m (sm m
2 , s1 ). (5.21)

Now we take some interval d0k = (tk1 , tk2 ) and let pki = px(tki ), i = 1, 2. Denote

2 , p1 ] = [p2 , p1 ] ∩ [p2 , p1 ].
[pkm km k k m m
(5.22)
5 Asymptotical stability of optimal paths in nonconvex problems 115

·
Since p x (t) < 0, for all t ∈ d0k , from (5.21) it follows that there are two
intervals [tkm km km km
1 , t2 ] and [s2 , s1 ] corresponding to the nonempty interval
[pkm
2 , pkm
1 ], and 
2 − t1 ) = t2 − t1 .
(tkm km k k

Applying Lemma 6 we obtain


km km km

t2
s1
s1

u(x(t))dt+ u(x(s))ds ≤ u∗ [(tkm


2 −t1 )+(s1 −s2 )]−
km km km
δ 2 (x(s))ds.
tkm
1 skm
2 skm
2

Summing over m and then over k we have

tk km

 
2 s1

u(x(t))dt + u(x(s))ds
k k,m km
tk
1 s2
⎡ ⎤ km


s1
 
≤ u∗ ⎣ (tk2 − tk1 ) + km ⎦
1 − s2 )
(skm − δ 2 (x(s))ds.
k k,m k,m km
s2

Denote ω  = ∪k,m [skm 


2 , s1 ]. Clearly ω ⊂ ω. Therefore
km

u(x(t))dt
π∪ω

= u(x(t))dt + u(x(t))dt + u(x(t))dt + u(x(t))dt
π1 ω ω\ω  π2

≤ u∗ (meas π1 + meas ω  ) − δ 2 (x(s))ds
ω

− [u∗ − u(x(t))]dt + u∗ [meas π2 + meas (ω \ ω  )]
π2 ∪(ω\ω  )

= u∗ meas (π ∪ ω) − [u∗ − u(x(t))]dt − δ 2 (x(t))dt,
Q E

where Q = π2 ∪ (ω \ ω  ) and E = ω  .
Now we check Conditions a), b) and c) of the lemma.
Condition a) holds, because Q ∪ E = π2 ∪ (ω \ ω  ) ∪ ω  = π2 ∪ ω. Condition
b) follows from Lemma 17.5.1. We now check Condition c).
Take any number δ > 0 and denote Pδ = {l : |l − p∗ | ≥ δ}.
116 M.A. Mamedov

Consider the intervals [pkm km k k m m


2 , p1 ], [p2 , p1 ] and [p2 , p1 ] (see (6.11)) cor-
0
responding to the interval dk , where


pk1 − pk2 = 1 − p2 ).
(pkm km
(5.23)
m

From Lemma 17.4.1 we have

·
p x (t) ≤ c(x(t)) < −ηδ for all t ∈ π1 ∩ Zδ . (5.24)

On the other hand there exists a number K < +∞, for which
·
p x (t) ≤ K for all t ∈ [0, T ].

Therefore


·
meas [pk2 , pk1 ] ∩ Pδ = [−p x (t)]dt ≥ ηδ meas (dk ∩ Zδ ).
dk ∩Zδ

Summing over k we have




I= meas [pk2 , pk1 ] ∩ Pδ ≤ ηδ meas (π1 ∩ Zδ ). (5.25)


k

On the other hand, from (5.23) it follows that




 ·
2 , p1 ] ∩ Pδ =
meas [pkm km
I= p x (t)dt
k,m k,m km km
[s2 ,s1 ]∩Zδ


≤ K 2 , s1 ] ∩ Zδ ≤ K meas (ω ∩ Zδ ) .
meas [skm km
(5.26)
k,m

Thus from from (5.25) and (5.26) we obtain

K K
meas (π1 ∩ Zδ ) ≤ meas (E ∩ Zδ ) ≤ meas [(Q ∪ E) ∩ Zδ ].
ηδ ηδ
But Q ∪ E = π2 ∪ ω and therefore

meas [(π ∪ ω) ∩ Zδ ] = meas (π1 ∩ Zδ ) + meas [(Q ∪ E) ∩ Zδ ]


K
≤ meas [(Q ∪ E) ∩ Zδ ] + meas [(Q ∪ E) ∩ Zδ ].
ηδ
K
Then Condition c) holds if we take Kδ = ηδ + 1 and thus the lemma is
proved. 
5 Asymptotical stability of optimal paths in nonconvex problems 117

5.6 Transformation of the functional (5.2)

In this section we divide the sets G0 , Gk and GT (see Lemma 4) into sets
of 1st and 2nd type such that Lemma 7 can be applied. Note that x(t) is a
continuously differentiable trajectory.

5.6.1

Lemma 8. Assume that the set Gi consists of a finite number of elements


gk , g1 < g2 < ... < gN . Then
2
tg
N

u(x(t))dt = u(x(t))dt + u(x(t))dt.
t1g k π ∪ω F
k k
1

Here πk and ωk are the sets of 1st and 2nd type in the interval [pk2 , pk1 ] and
the set F is either a set of 1st type in the interval [p2gN , p1g1 ], if p2gN ≤ p1g1 , or
is a set of 2nd type in the interval [p1g1 , p2gN ], if p2gN > p1g1 .

Proof. Take the set g1 and assume that [p2g1 , p1g1 ] and [t1g1 , t2g1 ] are the corre-
sponding intervals. Note that we are using the notation introduced in Section
5.5. Take the set g2 .

A. First we consider the case p1g2 < p1g1 . In this case there is a point
t ∈ [t1g1 , t2g1 ] such that px(t1 )) = p1g2 . Denote π1 = [t1 , t2g1 ] and ω1 = [t2g1 , t1g2 ].
1

Note that by Definition 6 we have p2g1 < p1g2 . It is clear that π1 and ω1 are
sets of 1st and 2nd type on the interval [p2g1 , p1g2 ], respectively. Therefore
2
tg2
u(x(t))dt = u(x(t))dt + u(x(t))dt,
t1g1 π1 ∪ω1 π11

where π11 = [t1g1 , t1 ] ∪ [t1g2 , t2g2 ] is a set of 1st type on the interval [p2g2 , p1g1 ].

B. Now we assume that p1g2 ≥ p1g1 . In this case there is a point t1 ∈ [t2g1 , t1g2 ]
such that px(t1 ) = p1g1 . Denote π1 = [t1g1 , t2g1 ] and ω1 = [t2g1 , t1 ]. Consider two
cases.

1. Let p2g2 ≥ p1g1 . Then there is a point t2 ∈ [t1 , t1g2 ] such that px(t2 )) = p2g2 .
In this case we denote
118 M.A. Mamedov

π2 = [t1g2 , t2g2 ], ω2 = [t2 , t1g2 ] ; and]; ω1 = [t1 , t2 ].

Therefore
2
tg2 
u(x(t))dt = u(x(t))dt + u(x(t))dt,
i=1,2π ∪ω
t1g1 i i ω1

where ω1 is a set of 2nd type on the interval [p1g1 , p2g2 ].


2. Let p2g2 < p1g1 . Then there is a point t2 ∈ [t1 , t1g2 ] such that px(t2 )) = p1g1 .
In this case we denote
π1 = [t1g2 , t2 ], ω2 = [t1 , t1g2 ] and π1 = [t2 , t2g2 ].

Therefore
2
tg2 
u(x(t))dt = u(x(t))dt + u(x(t))dt,
i=1,2π ∪ω
t1g1 i i π1

where π1 is a set of 1st type on the interval [p2g2 , p1g1 ].


We repeat this procedure taking g3 , g4 , ...gN and thus the lemma is proved.


Lemma 9. Assume that gn ∈ Gi , n = 1, 2, ..., g1 < g2 < ..., and t2 =


limn→∞ t2gn . Then

t 
2

u(x(t))dt = u(x(t))dt + u(x(t))dt.


n π ∪ω
t1g1 n n F

Here πn and ωn are sets of 1st and 2nd type in the interval [p2n , p1n ] and the
set F is either a set of 1st type in the interval [p∗ , p1g1 ], if p∗ ≤ p1g1 , or is a
set of 2nd type in the interval [p1g1 , p∗ ], if p∗ > p1g1 .
Proof. We apply Lemma 8 for every n. From Proposition 1 we obtain that
x(t) → x∗ as t → t2 , and therefore p2gn → p∗ as n → ∞. This completes the
proof. 
We can prove the following lemmas in a similar manner to that used for
proving Lemmas 8 and 9.
Lemma 10. Assume that the set Gi consists of a finite number of elements
gk , g1 > g2 > ... > gN . Then
2
tg1 
u(x(t))dt = u(x(t))dt + u(x(t))dt.
t1g k π ∪ω F
k k
N
5 Asymptotical stability of optimal paths in nonconvex problems 119

Here πk and ωk are sets of 1st and 2nd type in the interval [p2k , p1k ] and the
set F is either a set of 1st type in the interval [p2g1 , p1gN ], if p1gN ≥ p2g1 , or is
a set of 2nd type in the interval [p1gN , p2g1 ], if p1gN < p2g1 .

Lemma 11. Assume that gn ∈ Gi , n = 1, 2, ..., g1 > g2 > ..., and t1 =


limn→∞ t1gn . Then
2
tg1 
u(x(t))dt = u(x(t))dt + u(x(t))dt.
n π ∪ω
t1 n n F

Here πn and ωn are sets of 1st and 2nd type in the interval [p2n , p1n ] and the
set F is either a set of 1st type in the interval [p2g1 , p∗ ], if p∗ ≥ p2g1 , or is a
set of 2nd type in the interval [p∗ , p2g1 ], if p∗ < p2g1 .

In the next lemma we combine the results obtained by Lemmas 9 and 11.
Lemma 12. Assume that the set Gi consists of elements gn , n = ±1,
±2, . . ., where · · · < g−2 < g−1 < g1 < g2 < · · · , and where t1 =
limn→−∞ t1gn and t2 = limn→∞ t2gn . Then

t 
2

u(x(t))dt = u(x(t))dt.
n π ∪ω
t1 n n

Here πn and ωn are sets of 1st and 2nd type in the interval [p2n , p1n ].

Proof. We apply Lemmas 9 and 11 and obtain

t 
2

u(x(t))dt = u(x(t))dt + u(x(t))dt, (5.27)


n  ∪ω 
t1g1 πn n F

t2g
−1 
u(x(t))dt = u(x(t))dt + u(x(t))dt. (5.28)
n  ∪ω 
t1 πn n F 

We define π0 = F  ∪ F  and ω0 = [t2g−1 , t1g1 ]. Clearly they are sets of 1st and
2nd type in the interval [p2g−1 , p1g1 ] (note that p2g−1 < p1g1 by Definition 5).
Therefore the lemma is proved if we sum (5.27) and (5.28). 

5.6.2

Now we use Lemma 4. We take any interval [t1k , t2k ] and let
120 M.A. Mamedov

[t1k , t2k ] = [t1k , t1Gk ] ∪ [t1Gk , t2Gk ] ∪ [t2Gk , t2k ].

We show that
2
tk 
u(x(t))dt = u(x(t))dt + u(x(t))dt, (5.29)
n k ∪ω k
t1k πn n Ek

where πnk and ωnk are sets of 1st and 2nd type in the interval [p2nk , p1nk ] and
x(t) ∈ int D, ∀t ∈ E k .
If the conditions of Lemma 12 hold then (5.29) is true if we take

E k = [t1k , t1Gk ] ∪ [t2Gk , t2k ].

Otherwise we apply Lemmas 8–11 and obtain


2
tGk 
u(x(t))dt = u(x(t))dt + u(x(t))dt.
n k ∪ω k
t1G πn n Fk
k

If F k is a set of 2nd type then (5.29) is true if we take E k = F k . Assume


that F k is a set of 1st type on some interval [p2 , p1 ]. In this case we set

π0k = F k and ω0k = [t1k , t1Gk ] ∪ [t2Gk , t2k ].

We have x(t1k ) = x(t2k ) = x∗ (see Lemma 4) and therefore π0k and ω0k are sets
of 1st and 2nd type in the interval [p2 , p1 ]. Thus (5.29) is true.
Now we apply Lemmas 8–12 to the intervals [0, t1 ] and [t2 , T ]. We have

t 
1

u(x(t))dt = u(x(t))dt + u(x(t))dt + u(x(t))dt, (5.30)


0 n 0 ∪ω 0
πn n F0 E0

T 
u(x(t))dt = u(x(t))dt + u(x(t))dt + u(x(t))dt. (5.31)
n T ∪ω T
t2 πn n FT ET

Here
• F 0 and F T are sets of 1st type (they may be empty);
• [0, t1G0 ] ∪ [t2G0 , t1 ] ⊂ E 0 and [t2 , t1GT ] ∪ [t2GT , T ] ⊂ E T ;
• x(t) ∈ / int D for all t ∈ E 0 ∪ E T .
Thus, applying Lemma 4 and taking into account (5.29)–(5.31), we can
prove the following lemma.
Lemma 13. The interval [0, T ] can be divided into subintervals such that
5 Asymptotical stability of optimal paths in nonconvex problems 121

[0, T ] = ∪n (πn ∪ ωn ) ∪ F1 ∪ F2 ∪ E, (5.32)


T 
u(x(t)) dt = u(x(t)) dt + u(x(t)) dt + u(x(t)) dt.
n π ∪ω
0 n n F1 ∪F2 E
(5.33)

Here
1. The sets πn and ωn are sets of 1st and 2nd type, respectively, in the inter-
vals [p2n , p1n ], n = 1, 2, ....
2. The sets F1 and F2 are sets of 1st type in the intervals [p21 , p11 ] and [p22 , p12 ],
respectively, and

x(t) ∈ int D, f or all t ∈ F1 ∪ F2 , (5.34)


p1i − p2i ≤ C < +∞, i = 1, 2. (5.35)

3. Also

x(t) ∈
/ int D, f or all t ∈ E. (5.36)

4. For every δ > 0 there is a number C(δ) such that

meas [(F1 ∪ F2 ) ∩ Zδ ] ≤ C(δ), (5.37)

where the number C(δ) < ∞ does not depend on the trajectory x(t), on T
or on the intervals in (5.32).

Proof: We define
F1 = {t ∈ F 0 : x(t) ∈ int D}, F2 = {t ∈ F T : x(t) ∈ int D}
and E = ∪k E k ∪ E 0 ∪ E T . Then we replace π10 to π10 ∪ (F 0 \ F1 ) and π1T
to π1T ∪ (F T \ F2 ) in (5.30) and (5.31) (note that after these replacements
the conditions of Definition 6 still hold). We obtain (5.33) summing (5.29)–
(5.31). It is not difficult to see that all assertions of the lemma hold. Note that
(5.35) follows from the fact that the trajectory x(t) is uniformly bounded (see
(5.3)). The inequality (5.37) follows from Lemma 17.4.1, taking into account
Definition 6, and thus the lemma is proved.

Lemma 14. There is a number L < +∞ such that



[u(x(t)) − u∗ ] dt < L, (5.38)
F1 ∪F2

where L does not depend on the trajectory x(t), on T or on the intervals in


(5.32).

Proof: From Condition H3 it follows that there exist a number ε > 0 and a
∼ ∼
trajectory x (·) to the system (6.1), defined on [ 0, Tε ], such that p x (0) =

p∗ − ε, p x (Tε ) = p∗ + ε and
122 M.A. Mamedov
.

p x (t) > 0 for almost all t ∈ [0, Tε ]. (5.39)

Define

Rε = u(x (t))dt.
[0,Tε ]

Consider the set F1 and corresponding interval [p21 , p11 ]. Define a set

F1ε = {t ∈ F 1 : |px(t) − p∗ | < ε}.


We consider the most common case, when [p∗ − ε, p∗ + ε] ⊂ [p21 , p11 ]. In
this case the sets F1ε and [0, Tε ] are sets of 1st and 2nd type in the interval

[p∗ − ε, p∗ + ε] for the trajectories x(·) and x (·), respectively. We have

u(x(t))dt + Rε = u(x(t)) dt + u(x(t)) dt + u(x(t))dt.
F1 F1ε [0,Tε ] F1 \F1ε
(5.40)

We use Lemma 7. Note that this lemma can be applied for the trajectory x (·)
(which may not be continuously differentiable) due to the inequality (5.39).
Taking into account δ(x) ≥ 0 and u(x(t)) ≤ u∗ , for t ∈ Q, from Lemma 7 we
obtain

u(x(t)) dt + u(x(t)) dt ≤ u∗ (meas F1ε + Tε ). (5.41)
F1ε [0,Tε ]

From (5.37) it follows that

meas (F1 \ F1ε ) = meas (F1 ∩ Zε ) ≤ C(ε).


Thus


u(x(t)) dt ≤ Cε , (5.42)
F1 \F1ε

where the number Cε < +∞ does not depend on T or on the trajectory x(t).
Denote C  = Tε u∗ + Cε − Rε . Then from (5.40)–(5.42) we obtain

u(x(t)) dt ≤ u∗ meas F1ε + C  ≤ u∗ meas F1 + C 
F1

and therefore
[u(x(t)) − u∗ ] dt ≤ C  .
F1
5 Asymptotical stability of optimal paths in nonconvex problems 123

By analogy we can prove that



[u(x(t)) − u∗ ] dt ≤ C  .
F2

Thus the lemma is proved if we take L = C  + C  .

5.7 The proof of Theorem 13.6

From Condition M it follows that for every T > 0 there exists a trajectory
xT (·) ∈ XT , for which

u(xT (t)) dt ≥ u∗ T − b. (5.43)
[0,T ]

5.7.1

First we consider the case when x(t) is a continuously differentiable function.


In this case we can apply Lemma 7.
From Lemmas 4 and 14 we have

u(x(t)) dt ≤ u(x(t)) dt+ u(x(t)) dt+L+u∗ +meas (F1 ∪F2 ).
n π ∪ω
[0,T ] n n E

Then applying Lemma 7 we obtain



u(x(t)) dt
[0,T ]
⎛ ⎞

≤ ⎝u∗ meas (πn ∪ ωn ) − u(x(t)) dt − δ 2 (x(t)) dt⎠
n
Qn En

+ u(x(t)) dt + L + u∗ + meas (F1 ∪ F2 )
E
 

= u∗ meas (πn ∪ ωn ) + meas (F1 ∪ F2 ) + meas E
n

− [u∗ − u(x(t))] dt − δ 2 (x(t)) dt + L
Q A

∗ ∗
= u meas [0, T ] − [u − u(x(t))] dt − δ 2 (x(t)) dt + L.
Q A
124 M.A. Mamedov

Here Q = (∪n Qn ) ∪ E and A = ∪n En . Taking (5.43) into account we have



u(x(t)) dt − u(xT (t)) dt ≤ − [u∗ − u(x(t))] dt
[0,T ] [0,T ] Q

− δ 2 (x(t)) dt + L + b,
A

that is,

JT (x(·)) − JT (xT (·)) ≤ − [u∗ − u(x(t))] dt − δ 2 (x(t)) dt + L + b.
Q A
(5.44)

Here

Q = (∪n Qn ) ∪ E and A = ∪n En (5.45)

and the following conditions hold:


a)

Q ∪ A = { t ∈ [0, T ] : x(t) ∈
/ int D}; (5.46)

b)

[0, T ] = ∪n (πn ∪ ωn ) ∪ (F1 ∪ F2 ) ∪ E; (5.47)

c) for every δ > 0 there exist K(δ) < +∞ and C(δ) < +∞ such that

meas [(πn ∪ ωn ) ∩ Zδ ] ≤ K(δ) meas [(Qn ∪ En ) ∩ Zδ ] and (5.48)


meas [(F1 ∪ F2 ) ∩ Zδ ] ≤ C(δ); (5.49)

(recalling that Zδ = {t ∈ [0, T ] : |px(t) − p∗ | ≥ δ})


d) for every ε > 0 there exists δε > 0 such that

δ 2 (x) ≥ δε for all x, ||x − x∗ || ≥ ε. (5.50)

The first assertion of the theorem follows from (5.44), (5.46) and (5.50) for
the case under consideration (that is, x(t) continuously differentiable). We
now prove the second assertion.
Let ε > 0 and δ > 0 be given numbers and x(·) a continuously differentiable
ξ-optimal trajectory. We denote

Xε = {t ∈ [0, T ] : ||x(t) − x∗ || ≥ ε}.


5 Asymptotical stability of optimal paths in nonconvex problems 125

First we show that there exists a number K ε,ξ < +∞ which does not depend
on T > 0 and

meas [(Q ∪ A) ∩ Xε ] ≤ K ε,ξ . (5.51)

Assume that (5.51) is not true. In this case there exist sequences Tk → ∞,
k
Kε,ξ → ∞ and sequences of trajectories {xk (·)} (every xk (·) is a ξ-optimal
trajectory in the interval [0, Tk ]) and {xTk (·)} (satisfying (5.43) for every
T = Tk ) such that

meas [(Qk ∪ Ak ) ∩ Xεk ] ≥ Kε,ξ


k
as k → ∞. (5.52)

From Lemma 17.3.1 and (5.50) we have

u∗ − u(xk (t)) ≥ νε if t ∈ Qk ∪ Xεk and


δ 2 (xk (t)) ≥ δε2 if t ∈ Ak ∩ Xεk .

Denote ν = min {νε , δε2 } > 0. From (5.44) it follows that

JTk (xk (·)) − JTk (xTk (·)) ≤ L + b − ν meas [(Qk ∪ Ak ) ∩ Xεk ].

Therefore, for sufficiently large numbers k, we have

JTk (xk (·)) ≤ JTk (xTk (·)) − 2 ξ ≤ JT∗k − 2 ξ,

which means that xk (t) is not a ξ-optimal trajectory. This is a contradiction.


Thus (5.51) is true.
1
Now we show that for every δ > 0 there is a number Kδ,ξ < +∞ such that

meas Zδ ≤ Kδ,ξ
1
. (5.53)

From (5.47)–(5.49) we have



meas Zδ = meas [(πn ∪ ωn ) ∩ Zδ ] + meas [(F1 ∪ F2 ) ∩ Zδ ]
n
+ meas (E ∩ Zδ )

≤ K(δ) meas [(Qn ∪ En ) ∩ Zδ ] + C(δ) + meas (E ∩ Zδ )
n

≤K (δ) meas [([∪n (Qn ∪ En )] ∩ Zδ ) ∪ (E ∩ Zδ )] + C(δ)

=K (δ) meas [(Q ∪ A) ∩ Zδ ] + C(δ),

where K (δ) = max{1, K(δ)}.
Since Zδ ⊂ Xδ , taking (5.51) into account we obtain (5.53), where
126 M.A. Mamedov
∼ ∼
1
Kδ,ξ =K (δ) K δ,ξ +C(δ).

We denote Xε/2 0
= {t ∈ [0, T ] : ||x(t) − x∗ || > ε/2.}. Clearly Xε/2
0
is an
open set and therefore can be presented as a union of an at most countable

number of open intervals, say Xε/2 0
= ∪k τ k . Out of these intervals we
choose further intervals, which have a nonempty intersection with Xε , say
these are τk , k = 1, 2, .... Then we have

Xε ⊂ ∪k τk ⊂ Xε/2
0
. (5.54)

Since a derivative of the function x(t) is bounded, it is not difficult to see


that there exists a number σε > 0 such that

meas τk ≥ σε for all k. (5.55)

But the interval [0, T ] is bounded and therefore the number of intervals
τk is finite too. Let k = 1, 2, 3, ..., NT (ε). We divide every interval τk into
two parts:

τk1 = {t ∈ τk : x(t) ∈ int D} and τk2 = {t ∈ τk : x(t) ∈


/ int D}.

From (5.46) and (5.54) we obtain

∪k τk2 ⊂ (Q ∪ A) ∩ Xε/2
0

and therefore from (5.51) it follows that



meas (∪k τk2 ) ≤ K ε/2,ξ . (5.56)

Now we apply Lemma 17.4.1. We have


·
p x (t) ≤ − ηε/2 , t ∈ ∪k τk1 . (5.57)

Define p1k = supt∈τk px(t) and p2k = inf t∈τk px(t). It is clear that

p1k − p2k ≤ C , k = 1, 2, 3, ..., NT (ε), (5.58)

and
·
|p x (t)| ≤ K, for all t. (5.59)

Here the numbers C and K do not depend on T > 0, x(·), ε or ξ. We divide
the interval τk into three parts:
· ·
τk− = {t ∈ τk : p x (t) < 0}, τk0 = {t ∈ τk : p x (t) = 0} and
·
τk+ = {t ∈ τk : p x (t) > 0}.
5 Asymptotical stability of optimal paths in nonconvex problems 127

Then we have
* * * *
* * * *
* * * *
· * · · *
p1k − p2k ≥ ** p x (t)dt ** = * p x (t)dt + p x (t)dt * .
* * * *
τk * τ− τ+
*
k k

 ·  ·
We define α=− p x (t)dt and β = p x (t)dt. Clearly α > 0, β > 0
τk− τk+
and
)
−α + β, if α < β,
p1k − p2k ≥ (5.60)
α − β, if α ≥ β.

From (5.59) we obtain

0 < β ≤ K meas τk+ . (5.61)

On the other hand, τk1 ⊂ τk− and therefore from (5.57) we have

α ≥ ηε/2 meas τk− ≥ ηε/2 meas τk1 . (5.62)

Consider two cases.


a) α ≥ β. Then from (5.60)–(5.62) we obtain

C ≥ pk − pk ≥ α − β ≥ ηε/2 meas τk − K meas τk .
1 2 1 +
(5.63)

Since τk+ ⊂ τk2 , then from (5.56) it follows that meas τk+ ≤ K ε/2,ξ .
Therefore from (5.63) we have

 
meas τk1 ≤ Cε,ξ , where Cε,ξ = (C + K· K ε/2,ξ )/ηε/2 . (5.64)

b) α < β. Then from (5.61) and (5.62) we obtain



ηε/2 meas τk1 < K meas τk+ ≤ K· K ε/2,ξ

or

 
meas τk1 < Cε,ξ , where Cε,ξ = K· K ε/2,ξ /ηε/2 . (5.65)

Thus from (5.64) and (5.65) we obtain


 
meas τk1 ≤ Cε,ξ = max{Cε,ξ , Cε,ξ }, k = 1, 2, ..., NT (ε),

and then

meas (∪k τk1 ) ≤ NT (ε) Cε,ξ . (5.66)


128 M.A. Mamedov

Now we show that for every ε > 0 and ξ > 0 there exists a number

Kε,ξ < +∞ such that

meas (∪k τk1 ) ≤ Kε,ξ . (5.67)

Assume that (5.67) is not true. Then from (5.66) it follows that NT (ε) → ∞
as T → ∞. Consider the intervals τk for which the following conditions hold:
1
meas τk1 ≥ σε and meas τk2 ≤ λ meas τk1 , (5.68)
2
where λ is any fixed number. Since NT (ε) → ∞, then from (5.55) and (5.56)
it follows that the number of intervals τk satisfying (5.68) increases infinitely
as T → ∞.
On the other hand, the number of intervals τk , for which the conditions
α < β,
meas τk2 > λ meas τk1 and λ = ηε/2 /K
hold, is finite. Therefore the number of of intervals τk for which the conditions
α ≤ β and (5.68) hold infinitely increases as T → ∞. We denote the number
of such intervals by NT and for the sake of definiteness assume that these are
intervals τk , k = 1, 2, ..., NT .
We set λ = ηε/2 /2K for every τk . Then from (5.63) and (5.68) we have

ηε/2 1
p1k − p2k ≥ ηε/2 meas τk1 − K· meas τk1 = ηε/2 meas τk1 .
2K 2
Taking (5.55) into account we obtain

p1k − p2k ≥ eε , k = 1, 2, ..., NT , (5.69)

where
1
eε = ηε/2 σε > 0 and NT → ∞ as T → ∞.
2
1
Let δ = 8 eε . From (5.69) it follows that for every τk there exists an interval
d
dk = [sk , sk ] ⊂ τk such that
1 2

|p x(t) − p∗ | ≥ δ, t ∈ dk , p x(s1k ) = sup p x(t),


t∈dk

p x(s2k ) = inf p x(t) and p x(s1k ) − p x(s2k ) = δ.


t∈dk

From (5.59) we have


* *
* *
* *
* · * · ·
δ = * p x (t) dt* ≤ |p x (t)| dt ≤ |p x (t)| dt ≤ K·meas dk .
* *
* [s1 ,s2 ] * [s1 ,s2 ] dk
k k k k
5 Asymptotical stability of optimal paths in nonconvex problems 129

Then meas dk ≥ δ/K > 0. Clearly dk ⊂ Zδ and therefore


NT
δ
meas Zδ ≥ meas ∪N
k=1 dk =
T
meas dk ≥ NT .
K
k=1

This means that meas Zδ → ∞ as T → ∞, which contradicts (5.53).


Thus (5.67) is true. Then taking (5.56) into account we obtain
 ∼

meas ∪k τk = (meas τk1 + meas τk2 ) ≤ K ε/2,ξ +Kε,ξ .
k

Therefore from (5.54) it follows that

meas Xε = meas ∪k τk ≤ Kε,ξ ,




where Kε,ξ =K ε/2,ξ +Kε,ξ .
Thus we have proved that the second assertion of the theorem is true for
the case when x(t) is a continuously differentiable function.

5.7.2

We now take any trajectory x(·) to the system (6.1). It is known (see, for
example, [3]) that for a given number δ > 0 (we take δ < ε/2) there exists a

continuously differentiable trajectory x (·) to the system (6.1) such that

|| x(t)− x (t)|| ≤ δ for all t ∈ [0, T ].

Since the function u is continuous then there exists η(δ) > 0 such that

u(x (t)) ≥ u(x(t)) − η(δ) for all t ∈ [0, T ].

Therefore

u(x (t)) dt ≥ u(x(t)) dt − T η(δ).
[0,T ] [0,T ]

Let ξ > 0 be a given number. For every T > 0 we choose a number δ such
that T η(δ) ≤ ξ. Then

∼ ∼
u(x(t)) dt ≤ u(x (t)) dt + T η(δ) ≤ u(x (t)) dt + ξ, (5.70)
[0,T ] [0,T ] [0,T ]
130 M.A. Mamedov

that is,

∗ ∼
[ u(x(t)) − u ] dt ≤ [ u(x (t)) − u∗ ] dt + ξ.
[0,T ] [0,T ]

Since the function x (·) is continuously differentiable then the second integral
in this inequality is bounded (see the first part of the proof), and therefore
the first assertion of the theorem is proved.

Now we prove the second assertion of Theorem 13.6. We will use (5.70).
Take a number ε > 0 and assume that x(·) is a ξ-optimal trajectory, that is,
JT (x(·)) ≥ JT∗ − ξ.

From (5.70) we have


JT (x (·)) ≥ JT (x(·)) − ξ ≥ JT∗ − 2ξ.

Thus x (·) is a continuously differentiable 2ξ-optimal trajectory. That is why
(see the first part of the proof) for the numbers ε/2 > 0 and 2ξ > 0 there
exists Kε,ξ < +∞ such that

meas { t ∈ [0, T ] : || x (t) − x∗ || ≥ ε/2} ≤ Kε,ξ .

If || x(t ) − x∗ || ≥ ε for any t then


∼ ∼ ε
|| x (t ) − x∗ || ≥ || x(t ) − x∗ || − || x(t )− x (t ) || ≥ ε − δ ≥ .
2
Therefore


{ t ∈ [0, T ] : || x(t) − x∗ || ≥ ε} ⊂ { t ∈ [0, T ] : || x (t) − x∗ || ≥ ε/2},

which implies that the proof of the second assertion of the theorem is com-
pleted, that is,

meas { t ∈ [0, T ] : || x(t) − x∗ || ≥ ε} ≤ Kε,ξ .

Now we prove the third assertion of the theorem.


Let x(·) be an optimal trajectory and x(t1 ) = x(t2 ) = x∗ . Consider a
trajectory x∗ (·) defined by the formula
)
x(t) if t ∈ [0, t1 ] ∪ [t2 , T ],
x∗ (t) =
x∗ if t ∈ [t1 , t2 ].
5 Asymptotical stability of optimal paths in nonconvex problems 131

Assume that the third assertion of the theorem is not true, that is, there
is a point t′ ∈ (t1, t2) such that ‖x(t′) − x∗‖ = c > 0.
Consider the function x(·). In [3] it is proved that there is a sequence of
continuously differentiable trajectories xn(·), t ∈ [t1, T], which is uniformly
convergent to x(·) on [t1, T] and for which xn(t1) = x(t1) = x∗. That is, for
every δ > 0 there exists a number Nδ such that
\[
\max_{t \in [t_1, T]} \|x_n(t) - x(t)\| \le \delta \quad \text{for all } n \ge N_\delta.
\]

On the other hand, for every δ > 0 there exists a number η(δ) > 0 such that
η(δ) → 0 as δ → 0 and

| u(x(t)) − u(xn (t)) | ≤ η(δ) for all t ∈ [t1 , T ]. (5.71)

Then we have

\[
\int_{[t_1, T]} u(x(t))\, dt \le \int_{[t_1, T]} u(x_n(t))\, dt + T\,\eta(\delta). \tag{5.72}
\]

Take a sequence of points tn ∈ (t′, t2) such that tn → t2 as n → ∞. Clearly
in this case xn(tn) → x∗. We apply Lemma 13 for the interval [t1, tn] and
obtain (see also (5.31))
\[
\int_{[t_1, t_n]} u(x_n(t))\, dt
= \sum_k \int_{\pi_k^n \cup\, \omega_k^n} u(x_n(t))\, dt
+ \int_{F^n} u(x_n(t))\, dt
+ \int_{E^n} u(x_n(t))\, dt. \tag{5.73}
\]
Here x(t) ∈ int D for all t ∈ F^n, and F^n is a set of 1st type on the interval
[p xn(tn), p∗] if p xn(tn) < p∗.
Since xn(tn) → x∗, p xn(tn) → p∗ and thus for every t ∈ F^n we have
u(xn(t)) → u∗ as n → ∞. Therefore
\[
\alpha_n = \int_{F^n} [\,u(x_n(t)) - u^*\,]\, dt \to 0 \quad \text{as } n \to \infty.
\]
We also note that from xn(t) ∉ int D, t ∈ E^n, it follows that
\[
\int_{E^n} u(x_n(t))\, dt \le u^* \operatorname{meas} E^n.
\]

Now we use Lemma 7 and obtain
\[
\sum_k \int_{\pi_k^n \cup\, \omega_k^n} u(x_n(t))\, dt
= u^* \operatorname{meas}[\cup_k (\pi_k^n \cup \omega_k^n)]
- \int_{\cup_k Q_k^n} [\,u^* - u(x_n(t))\,]\, dt
- \int_{\cup_k E_k^n} \delta^2(x_n(t))\, dt.
\]
We take a number δ < c/2. Then there exists a number β′ > 0 such that
\[
\operatorname{meas}[\cup_k (Q_k^n \cup E_k^n)] \ge \beta'.
\]
Then there exists a number β > 0 for which
\[
\sum_k \int_{\pi_k^n \cup\, \omega_k^n} u(x_n(t))\, dt
\le u^* \operatorname{meas}[\cup_k (\pi_k^n \cup \omega_k^n)] - \beta.
\]

Therefore from (5.73) we have
\[
\int_{[t_1, t_n]} u(x_n(t))\, dt
\le u^* \{\operatorname{meas}[\cup_k (\pi_k^n \cup \omega_k^n)] + \operatorname{meas} F^n + \operatorname{meas} E^n\}
+ \alpha_n - \beta
\]
or
\[
\int_{[t_1, t_n]} u(x_n(t))\, dt \le u^*(t_n - t_1) + \alpha_n - \beta. \tag{5.74}
\]

From (5.71) we obtain
\[
\int_{[t_2, T]} u(x_n(t))\, dt
\le \int_{[t_2, T]} u(x(t))\, dt + T\,\eta(\delta)
= \int_{[t_2, T]} u(x^*(t))\, dt + T\,\eta(\delta). \tag{5.75}
\]

Thus from (5.72)–(5.75) we have
\[
\begin{aligned}
\int_{[t_1,T]} u(x(t))\, dt
&\le \int_{[t_1,T]} u(x_n(t))\, dt + T\,\eta(\delta) \\
&= \int_{[t_1,t_n]} u(x_n(t))\, dt + \int_{[t_n,t_2]} u(x_n(t))\, dt + \int_{[t_2,T]} u(x_n(t))\, dt + T\,\eta(\delta) \\
&\le u^*(t_n - t_1) + u^*(t_2 - t_n) + \int_{[t_2,T]} u(x^*(t))\, dt
+ \alpha_n - \beta + \lambda_n + 2T\,\eta(\delta) \\
&= \int_{[t_1,T]} u(x^*(t))\, dt + \alpha_n - \beta + \lambda_n + 2T\,\eta(\delta).
\end{aligned}
\]
Here
\[
\lambda_n = \int_{[t_n,t_2]} [\,u(x_n(t)) - u^*\,]\, dt \to 0 \quad \text{as } n \to \infty,
\]
because tn → t2. We choose the numbers δ > 0 and n such that the following
inequality holds:
\[
\alpha_n + \lambda_n + 2T\,\eta(\delta) < \beta.
\]
In this case we have
\[
\int_{[t_1,T]} u(x(t))\, dt < \int_{[t_1,T]} u(x^*(t))\, dt
\]
and therefore
\[
\int_{[0,T]} u(x(t))\, dt < \int_{[0,T]} u(x^*(t))\, dt,
\]
which means that x(t) is not optimal. This is a contradiction.


Thus the theorem is proved.

References

1. D. Cass and K. Shell, The structure and stability of competitive dynamical systems,
J. Econom. Theory, 12 (1976), 31–70.
2. A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis (Moscow, Nauka,
1975).

3. A. F. Filippov, Differential Equations with Discontinuous Right–Hand Sides, Mathe-


matics and its Applications (Soviet series) (Kluwer Academic Publishers, Dordrecht,
1988).
4. J. A. Fridy, Statistical limit points, Proc. Amer. Math. Soc., 118 (1993), 1187–1192.
5. A. Leizarowitz, Optimal trajectories on infinite horizon deterministic control systems,
Appl. Math. Optim., 19 (1989), 11–32.
6. A. Leizarowitz, Infinite horizon autonomous systems with unbounded cost, Appl. Math.
Optim., 13 (1985), 19–43.
7. V. L. Makarov and A. M. Rubinov, Mathematical Theory of Economic Dynamics and
Equilibria (Nauka, Moscow, 1973). English trans. Springer-Verlag, New York, 1977.
8. M. A. Mamedov, Turnpike theorems in continuous systems with integral functionals,
Russian Acad. Sci. Dokl. Math. 45, No. 2 (1993), 432–435.
9. M. A. Mamedov, Turnpike theorems for integral functionals, Russian Acad. Sci. Dokl.
Math. 46, No. 1 (1993), 174–177.
10. M. A. Mamedov and S. Pehlivan, Statistical cluster points and turnpike theorem in
nonconvex problems, J. Math. Anal. Appl., 256 (2001), 686–693.
11. L. W. McKenzie, Turnpike theory, Econometrica, 44 (1976), 841–866.
12. S. Pehlivan and M. A. Mamedov, Statistical cluster points and turnpike, Optimization,
48 (2000), 93–106.
13. R. Radner, Paths of economic growth that are optimal with regard only to final states;
a turnpike theorem, Rev. Econom. Stud., 28 (1961), 98–104.
14. R. T. Rockafellar, Saddle points of Hamiltonian systems in convex problems of La-
grange, J. Optimization Theory Appl., 12 (1973), 367–390.
15. R. T. Rockafellar, Saddle points of Hamiltonian systems in convex problems having a
nonzero discount rate, J. Econom. Theory, 12 (1976), 71–113.
16. P. A. Samuelson, A catenary turnpike theorem involving consumption and the golden
rule, Amer. Econom. Rev., 55 (1965), 486–496.
17. J. A. Scheinkman, On optimal steady states of n-sector growth models when utility is
discounted, J. Econom. Theory, 12 (1976), 11–30.
18. J. A. Scheinkman, Stability of regular equilibra and the correspondence principle for
symmetric variational problems, Internat. Econ. Rev. 20 (1979), 279–315.
19. A. Zaslavski, Existence and structure of optimal solutions of variational problems,
Contemp. Math., 204 (1997), 247–278.
20. A. Zaslavski, Existence and uniform boundedness of optimal solutions of variational
problems, Abstr. Appl. Anal., 3 (1998), 265–292.
21. A. Zaslavski, Turnpike theorem for nonautonomous infinite dimensional discrete–time
control systems, Optimization, 48 (2000), 69–92.
Chapter 6
Pontryagin principle with a PDE:
a unified approach

B. D. Craven

Abstract A Pontryagin principle is obtained for a class of optimal control


problems with dynamics described by a partial differential equation. The
method, using Karush–Kuhn–Tucker necessary conditions for a mathematical
program, is almost identical to that for ordinary differential equations.

Key words: Optimal control, Pontryagin principle, partial differential equation,
Karush–Kuhn–Tucker conditions

6.1 Introduction

Pontryagin’s principle has been proved in at least four ways, for an optimal
control problem in continuous time with dynamics described by an ordinary
differential equation (ODE). One approach ([5], [6]) regards the control prob-
lem as a mathematical program, and uses the Karush–Kuhn–Tucker (KKT)
necessary conditions as the starting point (though with some different hy-
potheses) for deriving the Pontryagin theory. There are various results for
optimal control when the dynamics are described by a partial differential
equation (PDE), often derived (as, for example, by Lions and Bensoussan)
using variational inequalities, which are generally equivalent to mathemat-
ical programs in infinite dimensions. The results in [1]–[5], and others by
the same authors, obtain some versions of Pontryagin’s principle by quite
different methods to those used for ODEs. However, the Pontryagin theory
involving a PDE can also be derived from the mathematical programming
approach, using the KKT conditions, and replacing the time variable t by a
space variable z, say in R2 or R3 , or by (t, z) combined. Whatever approach

B. D. Craven
Department of Mathematics, University of Melbourne, Victoria 3010, AUSTRALIA
e-mail: [email protected]


is followed requires a good deal of detailed calculation, concerned with choice


of function spaces (suitable Sobolev spaces), and proofs of differentiability
properties. These details are omitted here (they are adequately treated for
example in [1], [3]), since the aim here is to show that a Pontryagin principle
readily follows. The results depend indeed on certain differentiability prop-
erties, stated in what follows, but only indirectly on how these properties are
achieved.

6.2 Pontryagin for an ODE

Consider first an optimal control problem with an ODE:


\[
\text{MIN } J(u) := F(x, u) := \int_0^T f(x(t), u(t), t)\, dt \quad \text{subject to} \tag{6.1}
\]
\[
x(0) = x_0, \qquad \dot{x}(t) = m(x(t), u(t), t), \qquad u(t) \in \Gamma(t) \quad (0 \le t \le T).
\]

Here x(.) is the state function, u(.) is the control function, the time interval
[0, T ] is fixed, f and m are differentiable functions. Other details such as
variable horizon T, an endpoint constraint on x(T ), and state constraints, can
readily be added to the problem. They are omitted here, since the purpose
is to show the method. The steps are as follows.
(a) The problem (6.1) is expressed as a mathematical program:

MINx∈X,u∈U J(u) := F (x, u) subject to Dx = M (x, u), u ∈ Γ,

over suitable function spaces X and U ; X is chosen so that the differential


operator D := (d/dt) is a continuous linear mapping (see Note 1 in the
Appendix).
(b) Assume temporarily that F and M are differentiable with respect to
(x, u). Then necessary KKT conditions for a minimum at (x, u) = (x̄, ū) are

Fx (x̄, ū) + λ̂(−Dx + Mx (x̄, ū)) = 0, (6.2)


(Fu (x̄, ū) + λ̂Mu ( x̄, ū))(Γ − ū) ≥ 0, (6.3)

with a Lagrange multiplier λ̂. Represent λ̂ by a function λ̄(.), where


T
∀w ∈ C[0, T ] < λ̂, w ≥ λ̄(t), w(t)dt.
0

Define the Hamiltonian
\[
h(x(t), u(t), t, \lambda(t)) := f(x(t), u(t), t) + \lambda(t)\, m(x(t), u(t), t)
\]
and
\[
H(x, u, \hat{\lambda}) := F(x, u) + \hat{\lambda} M(x, u) = \int_0^T h(x(t), u(t), t, \bar{\lambda}(t))\, dt.
\]

In what follows, differentiability will be assumed only with respect to x, not


u, so that (6.3) is not available. The multiplier λ̂ remains, satisfying (6.2),
provided that the operator −D + Mx (x̄, ū) is assumed surjective.
(c) Integrating the −λ̂D term in (6.2) by parts leads to
\[
-D\bar{\lambda} = (F + \hat{\lambda} M)_x(\bar{x}, \bar{u}),
\]
if the integrated part vanishes. Choosing a boundary condition to do this,
the adjoint differential equation is obtained:
\[
-\dot{\bar{\lambda}}(t) = h_x(\bar{x}(t), \bar{u}(t), t, \lambda(t)), \qquad \bar{\lambda}(T) = 0. \tag{6.4}
\]

(d) Assume that Dx = M (x, u) defines x as a Lipschitz function of u, and


that (see Note 2 in the Appendix)

\[
F(x, u) - F(\bar{x}, u) = F_x(\bar{x}, \bar{u})(x - \bar{x}) + O(\|x - \bar{x}\| + \|u - \bar{u}\|), \tag{6.5}
\]
with a similar requirement for M. Then minimality of (x̄, ū), namely that
F(x, u) − F(x̄, ū) ≥ 0, with (6.2), leads (see [7], Theorem 7.2.3) to
\[
H(\bar{x}, u, \hat{\lambda}) - H(\bar{x}, \bar{u}, \hat{\lambda})
= F(x, u) - F(\bar{x}, \bar{u}) + O(\|u - \bar{u}\|) \ge O(\|u - \bar{u}\|), \tag{6.6}
\]

describing a quasimin (see [6]) of H(x̄, ., λ̂) over Γ (.) at ū. (Note that there
is no requirement of convexity on Γ (.).)
(e) Assuming that ū is a minimum in terms of the L1 norm, suppose if
possible that
h(x̄(t), u(t), t, λ(t)) < h(x̄(t), ū(t), t, λ(t))
for t in a set of positive measure. Then (see Note 3 in the Appendix) a set
of control functions {uβ (.) : β ≥ 0} ⊂ Γ is constructed (see [7], Theorem
7.2.6), for which
\[
H(\bar{x}, u, \hat{\lambda}) - H(\bar{x}, \bar{u}, \hat{\lambda}) \le -c\,\|u - \bar{u}\|
\]
for some constant c > 0, thus contradicting (6.6). (A required chattering
property holds automatically for the considered control constraint.) This has
proved Pontryagin’s principle, in the following form.

Theorem 1. Let the control problem (6.1) reach a local minimum at (x, u) =
(x̄, ū) with respect to the L1-norm for the control u. Assume that the differential
equation Dx = M(x, u) determines x as a Lipschitz function of u, that the
differentiability property (6.5) (with respect to x) holds, and that −D + Mx(x̄, ū)
is surjective. Then necessary conditions for the minimum are that the costate
λ̄(·) satisfies the adjoint equation (6.4), and that h(x̄(t), ·, t, λ̄(t)) is minimized
over Γ(t) at ū(t), for almost all t.
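To make the conditions of Theorem 1 concrete, the following sketch applies them
numerically to a scalar instance of (6.1) by a simple forward–backward sweep:
integrate the state equation forward, integrate the adjoint equation (6.4) backward,
and minimize the Hamiltonian pointwise over Γ(t). The data f, m, the control
interval and all numerical parameters below are illustrative assumptions, not taken
from the chapter.

import numpy as np

# Scalar instance of (6.1), chosen only for illustration:
#   minimize  int_0^T 0.5*(x^2 + u^2) dt,  x' = -x + u,  x(0) = 1,  u(t) in [-1, 1].
T, N = 1.0, 200
dt = T / N
x0, u_lo, u_hi = 1.0, -1.0, 1.0
f = lambda x, u: 0.5 * (x ** 2 + u ** 2)   # running cost
m = lambda x, u: -x + u                    # dynamics
h_x = lambda x, u, lam: x - lam            # = f_x + lam * m_x for this instance

u = np.zeros(N + 1)                        # initial control guess
for _ in range(200):
    # forward pass: state equation x' = m(x, u), x(0) = x0 (explicit Euler)
    x = np.empty(N + 1); x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] + dt * m(x[k], u[k])
    # backward pass: adjoint equation (6.4), -lam' = h_x, lam(T) = 0
    lam = np.empty(N + 1); lam[-1] = 0.0
    for k in range(N, 0, -1):
        lam[k - 1] = lam[k] + dt * h_x(x[k], u[k], lam[k])
    # pointwise Hamiltonian minimization over Gamma(t) = [u_lo, u_hi];
    # for this quadratic h the unconstrained minimizer is u = -lam, then clipped
    u_new = np.clip(-lam, u_lo, u_hi)
    if np.max(np.abs(u_new - u)) < 1e-9:
        break
    u = 0.5 * (u + u_new)                  # damped update
print("J(u) approx:", dt * np.sum(f(x[:-1], u[:-1])))

The same sweep structure carries over, at least formally, to the PDE settings of
Sections 6.3 and 6.4, with the forward and backward integrations replaced by solves
of the state and adjoint equations.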

6.3 Pontryagin for an elliptic PDE

Denote by Ω a closed bounded region in R3 (or R2 ), with boundary ∂Ω,


and disjoint sets Ai (i = 1, 2, 3, 4) whose union is ∂Ω. The optimal problem
considered is:

\[
\operatorname{MIN}_{x(\cdot),\,u(\cdot)} J(u) := \int_\Omega f(x(z), u(z), z)\, dz
\]

subject to

(∀z ∈ Ω) Dx(z) = m(x(z), u(z), z), (6.7)


(∀z ∈ A1 ) x(z) = x0 (z), (6.8)
(∀z ∈ A2 ) (∇x(z)).n(z) = g0 (z),
(∀z ∈ Ω) u(z) ∈ Γ (z). (6.9)

Here D is an elliptic linear partial differential operator, such as the Lapla-


cian ∇2 , and n ≡ n(z) denotes the outward-pointing unit normal vector to
∂Ω at z ∈ ∂Ω. The constraint on the control u(z) is specified in terms of
a given set-valued function Γ (z). The precise way in which x(.) satisfies the
PDE (6.8) need not be specified here; instead, some specific properties of the
solution will be required. The function spaces must be chosen so that D is
a continuous linear mapping. This holds, in particular, for D = ∇2 , with
Sobolev spaces, if x ∈ W_0^2(Ω) and u ∈ W_0^1(Ω). It is further required that
(6.7) determines x(.) as a Lipschitz function of u(.). The boundary ∂Ω of the
region need only be smooth enough that Green’s theorem can be applied to
it.
The Hamiltonian is

h(x(z), u(z), z, λ(z)) := f (x(z), u(z), z) + λ(z)m(x(z), u(z), z). (6.10)

The steps of Section 6.2 are now applied, but replacing t ∈ [0, T ] by z ∈ Ω. It
is observed that steps (a), (b), (d) and (e) remain valid – they do not depend
on t ∈ R. Step (c) requires a replacement for integration by parts. If D = ∇2 ,
it is appropriate to use Green’s theorem in the form

\[
\int_\Omega [\lambda \nabla^2 x - x \nabla^2 \lambda]\, dv
= \int_{\partial\Omega} [\lambda(\partial x/\partial n) - x(\partial \lambda/\partial n)]\, ds,
\]

in which dv and ds denote elements of volume and surface. The right side of
(6.10) becomes the integrated part; the origin can be shifted, in the spaces of

functions, to move x(z) = x0 (z) to x(z) = 0, and a similar replacement for


the normal component (∇x(z)).n(z); so the contributions to the integrated
part from A1 and A2 vanish already. The remaining contributions vanish if
boundary conditions are imposed:

λ(z) = 0 on A3 ; ∂λ/∂n = 0 on A4 (thus ∇λ(z).n(z) = 0 on A4 ). (6.11)

Then (6.2) leads to the adjoint PDE

D∗ λ(z) = ∂h(x(z), u(z), z; λ(z))/∂x(z),

with boundary conditions (6.11), where D∗ denotes the adjoint linear oper-
ator to D. Here, with D = ∇2 , (6.10) shows that D∗ = ∇2 also. Then (e),
with z ∈ Ω replacing t ∈ [0, T ], gives Pontryagin’s principle in the form:
h(x̄(z), ·, z, λ̄(z)) is minimized over Γ(z) at ū(z), possibly except for a set of
z of zero measure.
If f and m happen to be linear in u, and if Γ (z) is a polyhedron with
vertices pi (or an interval if u(z) ∈ R), then Pontryagin’s principle may lead
to bang-bang control, namely u(z) = pi when z ∈ Ei , for some disjoint sets
Ei ⊂ Ω.

6.4 Pontryagin for a parabolic PDE

Now consider a control problem with dynamics described by the PDE

∂x(z, t)/∂t = c2 ∇2z x(z, t) + m(x(z, t), u(z, t), z, t),

where ∇2z acts on the variable z. Here t (for the ODE) has been replaced by
3
(t, z) ∈ [0, T ] × Ω, for a closed bounded region Ω ⊂ R , and where m(.) is a
forcing function. Define the linear differential operator D := (∂/∂t) − c2 ∇2z .
The function spaces must be chosen so that D is a continuous linear mapping.
Define Ai ⊂ ∂Ω as in Section 6.3. The optimal control problem now becomes
(with a certain choice of boundary conditions)
\[
\operatorname{MIN}_{x(\cdot),\,u(\cdot)} J(u) := \int_0^T\!\!\int_\Omega f(x(z, t), u(z, t), z, t)\, dz\, dt
\]

subject to

(∀z ∈ Ω) Dx(z, t) = m(x(z, t), u(z, t), t, z),


(∀z ∈ A1 )(∀t ∈ [0, T ]) x(z, t) = x0 (z, t),
(∀z ∈ A2 ) (∇x(z, t)).n(z) = g0 (z, t),
(∀z ∈ ∂Ω) x(z, 0) = b0 (z),
(∀z ∈ Ω) u(z, t) ∈ Γ (z, t).

Then steps (a), (b), (d) and (e) proceed as in Section 6.3, for an elliptic PDE.
The Hamiltonian is

h(x(z, t), u(z, t), z, t, λ(z, t))


:= f (x(z, t), u(z, t), z, t) + λ(z, t)m(x(z, t), u(z, t), z, t).

Step (c) (integration by parts) is replaced by the following (where I := [0, T ]
and θ := (∂/∂t)):
\[
\begin{aligned}
-\hat{\lambda} D x &= -\int_I\!\int_\Omega \lambda(z, t)\, D x(z, t)\, dz\, dt \\
&= -\int_I\!\int_\Omega \lambda(z, t)\, [\theta - \nabla_z^2]\, x(z, t)\, dz\, dt \\
&= \int_\Omega dz \int_I (\theta \lambda(z, t))\, x(z, t)\, dt
+ \int_I\!\int_\Omega (\nabla_z^2 \lambda(z, t))\, x(z, t)\, dz\, dt,
\end{aligned}
\]

applying integration by parts to θ and Green’s theorem to ∇2z , provided


that the “integrated parts” vanish. Since x(z, t) is given for t = 0, and for
z ∈ A1 ∪ A2 ⊂ ∂Ω, it suffices if
(∀z) λ(z, T ) = 0,
so that \(\int_\Omega [\lambda(z, t)\, x(z, t)]_0^T\, dz = 0\), and if
(∀t ∈ [0, T ]) λ(z, t) = 0 on A3 ; ∇λ(z, t).n(z) = 0 on A4 .

With these boundary conditions, the adjoint PDE becomes

−(∂/∂t)λ(z, t) = c2 ∇2z λ(z, t).

Then (e), with (z, t) ∈ Ω × I replacing t ∈ [0, T ], gives Pontryagin’s principle


in the form:

h(x̄(z, t), ·, z, t, λ̄(z, t)) is minimized over Γ(z, t) at ū(z, t),

possibly except for a set of (z, t) of zero measure.


Concerning bang-bang control, a similar remark to that in Section 6.3
applies here also.

6.5 Appendix

Note 1. The linear mapping D is continuous if x(·) is given a graph norm:
\[
\|x\| := \|x\|_* + \|Dx\|_*,
\]
where \(\|x\|_*\) denotes a given norm, such as \(\|x\|_\infty\) or \(\|x\|_2\).


Note 2. It follows from Gronwall’s inequality that the mapping from u
(with L1 norm) to x (with L∞ or L2 norm) is Lipschitz if m(.) satisfies a
Lipschitz condition. The differentiability property (6.5) replaces the usual
Fx (x̄, ū) by Fx (x̄, u). This holds (using the first mean value theorem) if f
and m have bounded second derivatives.
Similar results are conjectured for the case of partial differential equations.
Note 3. The construction depends on the (local) minimum being reached
when u has the L1 -norm, and on the constraint (∀z)u(z) ∈ Γ (z) having the
chattering property, that if u and v are feasible controls, then w is a feasible
control, defining w(z) = u(z) for z ∈ Ω1 ⊂ Ω and w(z) = v(z) for z ∈ Ω\Ω1 .
For Section 6.4, substitute (z, t) for z here.

Acknowledgments The author thanks two referees for pointing out ambiguities and
omissions.

References

1. E. Casas, Boundary control problems for quasi–linear elliptic equations: A Pontryagin’s


principle, Appl. Math. Optim. 33 (1996), 265–291.
2. E. Casas, Pontryagin’s principle for state–constrained boundary control problems of
semilinear parabolic equations, SIAM J. Control Optim. 35 (1997), 1297–1327.
3. E. Casas, F. Tröltsch and A. Unger, Second order sufficient optimality condition for
a nonlinear elliptic boundary control problem, Z. Anal. Anwendungen 15 (1996),
687–707.
4. E. Casas and F. Tröltsch, Second order necessary optimality conditions for some state-
constrained control problems of semilinear elliptic equations, SIAM J. Control Optim.
38 (2000), 1369–1391.
5. E. Casas, J.-P. Raymond and H. Zidani, Pontryagin’s principle for local solutions of
control problems with mixed control–state constraints, SIAM J. Control Optim. 39
(1998), 1182–1203.
6. B. D. Craven, Mathematical Programming and Control Theory (Chapman & Hall,
London, 1978).
7. B. D. Craven, Control and Optimization (Chapman & Hall, London, 1995).
Chapter 7
A turnpike property for discrete-time
control systems in metric spaces

Alexander J. Zaslavski

Abstract In this work we study the structure of “approximate” solu-


tions for a nonautonomous infinite dimensional discrete-time control sys-
tem determined by a sequence of continuous functions vi : X × X → R1 ,
i = 0, ±1, ±2, . . . where X is a metric space.

Key words: Discrete-time control system, metric space, turnpike property

7.1 Introduction

Let X be a metric space and let ρ(·, ·) be the metric on X. For the set X × X
we define a metric ρ1 (·, ·) by

ρ1 ((x1 , x2 ), (y1 , y2 )) = ρ(x1 , y1 ) + ρ(x2 , y2 ), x1 , x2 , y1 , y2 ∈ X.

Let Z be the set of all integers. Denote by M the set of all sequences
of functions v = {vi}∞i=−∞ where vi : X × X → R1 is bounded from below for
each i ∈ Z. Such a sequence of functions {vi}∞i=−∞ ∈ M will occasionally be
denoted by a boldface v (similarly {ui}∞i=−∞ will be denoted by u, etc.)
The set M is equipped with the metric d defined by
\[
\tilde{d}(v, u) = \sup\{|v_i(x, y) - u_i(x, y)| : (x, y) \in X \times X,\ i \in Z\}, \tag{1.1}
\]
\[
d(v, u) = \tilde{d}(v, u)\,(1 + \tilde{d}(v, u))^{-1}, \qquad u, v \in M.
\]
In this paper we investigate the structure of “approximate” solutions of
the optimization problem

Alexander J. Zaslavski
Department of Mathematics, The Technion–Israel Institute of Technology, Haifa, Israel


\[
\sum_{i=k_1}^{k_2-1} v_i(x_i, x_{i+1}) \to \min, \qquad
\{x_i\}_{i=k_1}^{k_2} \subset X, \quad x_{k_1} = y, \quad x_{k_2} = z \tag{P}
\]
where v = {vi}∞i=−∞ ∈ M, y, z ∈ X and k2 > k1 are integers.


The interest in these discrete-time optimal problems stems from the study
of various optimization problems which can be reduced to this framework,
for example, continuous-time control systems which are represented by ordi-
nary differential equations whose cost integrand contains a discounting factor
(see [Leizarowitz (1985)]), the infinite-horizon control problem of minimizing
\(\int_0^T L(z, z')\, dt\) as T → ∞ (see [Leizarowitz (1989), Zaslavski (1996)]) and
the analysis of a long slender bar of a polymeric material under tension in
[Leizarowitz and Mizel (1989), Marcus and Zaslavski (1999)]. Similar opti-
mization problems are also considered in mathematical economics (see
[Dzalilov et al. (2001), Dzalilov et al. (1998), Makarov, Levin and Rubinov
(1995), Makarov and Rubinov (1973), Mamedov and Pehlivan (2000), Mame-
dov and Pehlivan (2001), McKenzie (1976), Radner (1961), Rubinov (1980),
Rubinov (1984)]). Note that the problem (P) was studied in [Zaslavski (1995)]
when X was a compact metric space and vi = v0 for all integers i.
For each v ∈ M, each m1 , m2 ∈ Z such that m2 > m1 and each z1 , z2 ∈ X
set

\[
\sigma(v, m_1, m_2, z_1, z_2) =
\inf\Bigl\{\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) :
\{x_i\}_{i=m_1}^{m_2} \subset X,\ x_{m_1} = z_1,\ x_{m_2} = z_2 \Bigr\}. \tag{1.2}
\]

If the space of states X is compact and vi is continuous for all integers


i, then the problem (P) has a solution for each y, z ∈ X and each pair of
integers k2 > k1 . For the noncompact space X the existence of solutions
of the problem (P) is not guaranteed and in this situation we consider δ-
approximate solutions.
Let v ∈ M, y, z ∈ X, k2 > k1 be integers and let δ be a positive number.
We say that a sequence \(\{x_i\}_{i=k_1}^{k_2} \subset X\) satisfying xk1 = y, xk2 = z is a
δ-approximate solution of the problem (P) if
\[
\sum_{i=k_1}^{k_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, k_1, k_2, y, z) + \delta.
\]
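When X is a finite set (a much more special situation than the metric-space
setting of this chapter), σ(v, k1, k2, y, z) in (1.2) and an exact minimizer of
(P) can be computed by straightforward dynamic programming. The sketch below is
purely illustrative; the cost function v and the state list are made-up inputs.

def solve_P(v, states, k1, k2, y, z):
    """Dynamic programming for problem (P) over a finite state set.

    v(i, a, b) plays the role of v_i(a, b); states is a finite list standing in
    for X; y, z are the prescribed values at times k1 and k2.
    Returns (sigma, path) with path = [x_{k1}, ..., x_{k2}]."""
    INF = float("inf")
    best = {x: (0.0 if x == y else INF) for x in states}   # cost of reaching x at the current time
    parent = {}                                            # parent[(i, x)] = predecessor of x at time i
    for i in range(k1, k2):
        new_best = {x: INF for x in states}
        for a in states:
            if best[a] == INF:
                continue
            for b in states:
                c = best[a] + v(i, a, b)
                if c < new_best[b]:
                    new_best[b] = c
                    parent[(i + 1, b)] = a
        best = new_best
    path, x = [z], z
    for i in range(k2, k1, -1):                            # walk the back-pointers from z
        x = parent[(i, x)]
        path.append(x)
    path.reverse()
    return best[z], path

# Illustrative use with invented data:
states = [0, 1, 2]
v = lambda i, a, b: (a - 1) ** 2 + (b - a) ** 2
print(solve_P(v, states, k1=0, k2=5, y=0, z=0))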

In this chapter we study the structure of δ-approximate solutions of the


problem (P).
Definition: Let v = {vi}∞i=−∞ ∈ M and {x̄i}∞i=−∞ ⊂ X. We say that v
has the turnpike property (TP) and {x̄i}∞i=−∞ is the turnpike for v if for each
ε > 0 there exist δ > 0 and a natural number N such that for each pair of
integers m1, m2 satisfying m2 ≥ m1 + 2N and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\)
satisfying
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta
\]
there exist τ1 ∈ {m1, . . . , m1 + N} and τ2 ∈ {m2 − N, . . . , m2} such that
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = \tau_1, \ldots, \tau_2.
\]
Moreover, if ρ(xm1, x̄m1) ≤ δ, then τ1 = m1, and if ρ(xm2, x̄m2) ≤ δ, then
τ2 = m2.
This property was studied in [Zaslavski (2000)] for sequences of functions
v which satisfy certain uniform boundedness and uniform continuity assump-
tions. We showed that a generic v has the turnpike property.
The turnpike property is very important for applications. Suppose that a
sequence of cost functions v ∈ M has the turnpike property and we know
a finite number of approximate solutions of the problem (P). Then we know
the turnpike {x̄i }∞ i=−∞ , or at least its approximation, and the constant N
which is an estimate for the time period required to reach the turnpike. This
information can be useful if we need to find an “approximate” solution of the
problem (P) with a new time interval [k1 , k2 ] and the new values y, z ∈ X
at the end points k1 and k2 . Namely instead of solving this new problem
on the “large” interval [k1 , k2 ] we can find an “approximate” solution of the
problem (P) on the “small” interval [k1 , k1 + N ] with the values y, x̄k1 +N
at the end points and an approximate solution of the problem (P) on the
“small” interval [k2 − N, k2 ] with the values x̄k2 −N , z at the end points.
Then the concatenation of the first solution, the sequence {x̄i }kk21 −N+N and the
second solution is an approximate solution of the problem (P) on the interval
[k1 , k2 ] with the values y, z at the end points. Sometimes as an “approximate”
solution of the problem (P) we can choose any sequence {xi }ki=k 2
1
satisfying

xk1 = y, xk2 = z and xi = x̄i for all i = k1 + N, . . . , k2 − N.

This sequence is a δ-approximate solution where the constant δ does not


depend on k1 , k2 and y, z. The constant δ is not necessarily a “small” number
but it may be sufficient for practical needs especially if the length of the
interval [k1 , k2 ] is large.
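The construction described in the preceding paragraphs is mechanical once the
turnpike and a solver for short intervals are available; the sketch below spells it
out. Both x_bar and solve_small are hypothetical user-supplied ingredients (for a
finite state set, a dynamic-programming routine such as the one sketched earlier
could serve as solve_small); none of these names come from the chapter.

def concatenated_solution(k1, k2, y, z, N, x_bar, solve_small):
    """Build an approximate solution of (P) on [k1, k2] with end values y, z
    by solving two short problems and following the turnpike in between.

    x_bar(i)                  -- the turnpike state at time i (assumed known)
    solve_small(ka, kb, a, b) -- list [x_ka, ..., x_kb] approximately optimal for (P)
    """
    assert k2 - k1 > 2 * N                                     # needs a long interval
    head = solve_small(k1, k1 + N, y, x_bar(k1 + N))           # reach the turnpike
    middle = [x_bar(i) for i in range(k1 + N + 1, k2 - N)]     # stay on the turnpike
    tail = solve_small(k2 - N, k2, x_bar(k2 - N), z)           # leave the turnpike
    return head + middle + tail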
The turnpike property is well known in mathematical economics. The term
was first coined by Samuelson in 1948 (see [Samuelson (1965)]) where he
showed that an efficient expanding economy would spend most of the time in
the vicinity of a balanced equilibrium path (also called a von Neumann path).
This property was further investigated in [Dzalilov et al. (2001), Dzalilov et al.
(1998), Makarov, Levin and Rubinov (1995), Makarov and Rubinov (1973),
Mamedov and Pehlivan (2000), Mamedov and Pehlivan (2001), McKenzie
(1976), Radner (1961), Rubinov (1980), Rubinov (1984)] for optimal trajec-
tories of models of economic dynamics.

The chapter is organized as follows. In Section 2 we study the stability


of the turnpike phenomenon. In Section 3 we show that if {x̄i}∞i=−∞ is the
turnpike for v = {vi}∞i=−∞ ∈ M and vi is continuous for each integer i,
then for each pair of integers k2 > k1 the sequence \(\{\bar{x}_i\}_{i=k_1}^{k_2}\) is a solution
of the problem (P) with y = x̄k1 and z = x̄k2 . In Section 4 we show that
under certain assumptions the turnpike property is equivalent to its weakened
version.

7.2 Stability of the turnpike phenomenon

In this section we prove the following result.

Theorem 1. Assume that v = {vi}∞i=−∞ ∈ M has the turnpike property and
{x̄i}∞i=−∞ ⊂ X is the turnpike for v. Then the following property holds:
For each ε > 0 there exist δ > 0, a natural number N and a neighborhood
U of v in M such that for each u ∈ U, each pair of integers m1, m2 satisfying
m2 ≥ m1 + 2N and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\) satisfying
\[
\sum_{i=m_1}^{m_2-1} u_i(x_i, x_{i+1}) \le \sigma(u, m_1, m_2, x_{m_1}, x_{m_2}) + \delta \tag{2.1}
\]
there exist τ1 ∈ {m1, . . . , m1 + N} and τ2 ∈ {m2 − N, . . . , m2} such that
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = \tau_1, \ldots, \tau_2. \tag{2.2}
\]
Moreover, if ρ(xm1, x̄m1) ≤ δ, then τ1 = m1, and if ρ(xm2, x̄m2) ≤ δ, then
τ2 = m2.

Proof. Let ε > 0. It follows from the property (TP) that there exist
\[
\delta_0 \in (0, \varepsilon/4) \tag{2.3}
\]
and a natural number N0 such that the following property holds:
(P1) for each pair of integers m1, m2 ≥ m1 + 2N0 and each sequence
\(\{x_i\}_{i=m_1}^{m_2} \subset X\) satisfying
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + 4\delta_0 \tag{2.4}
\]
there exist τ1 ∈ {m1, . . . , m1 + N0}, τ2 ∈ {m2 − N0, . . . , m2} such that (2.2)
holds. Moreover, if ρ(xm1, x̄m1) ≤ 4δ0, then τ1 = m1 and if ρ(xm2, x̄m2) ≤ 4δ0
then τ2 = m2.
then τ2 = m2 .
It follows from the property (TP) that there exist

δ ∈ (0, δ0 /4) (2.5)



and a natural number N1 such that the following property holds:


(P2) For each pair of integers m1, m2 ≥ m1 + 2N1 and each sequence
\(\{x_i\}_{i=m_1}^{m_2} \subset X\) which satisfies
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + 4\delta
\]
there exist τ1 ∈ {m1, . . . , m1 + N1}, τ2 ∈ {m2 − N1, . . . , m2} such that
\[
\rho(x_i, \bar{x}_i) \le \delta_0, \qquad i \in \{\tau_1, \ldots, \tau_2\}. \tag{2.6}
\]

Set
N = 4(N1 + N0 ) (2.7)
and

U = {u ∈ M : |ui (x, y) − vi (x, y)| ≤ (8N )−1 δ, (x, y) ∈ X × X}. (2.8)

Assume that u ∈ U, m1, m2 ∈ Z, m2 ≥ m1 + 2N,
\[
\{x_i\}_{i=m_1}^{m_2} \subset X \quad\text{and}\quad
\sum_{i=m_1}^{m_2-1} u_i(x_i, x_{i+1}) \le \sigma(u, m_1, m_2, x_{m_1}, x_{m_2}) + \delta. \tag{2.9}
\]

Let
k ∈ {m1 , . . . , m2 }, m2 − k ≥ 2N. (2.10)
(2.9) implies that

\[
\sum_{i=k}^{k+2N-1} u_i(x_i, x_{i+1}) \le \sigma(u, k, k + 2N, x_k, x_{k+2N}) + \delta. \tag{2.11}
\]

By (2.11), (2.7) and (2.8),
\[
\Bigl|\sum_{i=k}^{k+2N-1} u_i(x_i, x_{i+1}) - \sum_{i=k}^{k+2N-1} v_i(x_i, x_{i+1})\Bigr| \le \delta/4,
\]
\[
|\sigma(u, k, k + 2N, x_k, x_{k+2N}) - \sigma(v, k, k + 2N, x_k, x_{k+2N})| < \delta/4
\]
and
\[
\begin{aligned}
\sum_{i=k}^{k+2N-1} v_i(x_i, x_{i+1}) &\le \sum_{i=k}^{k+2N-1} u_i(x_i, x_{i+1}) + \delta/4 \\
&\le \delta/4 + \sigma(u, k, k + 2N, x_k, x_{k+2N}) + \delta \\
&\le \sigma(v, k, k + 2N, x_k, x_{k+2N}) + \delta + \delta/4 + \delta/4.
\end{aligned} \tag{2.12}
\]

We have that (2.12) holds for any k satisfying (2.10). This fact implies
that
\[
\sum_{i=k_1}^{k_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, k_1, k_2, x_{k_1}, x_{k_2}) + 2^{-1}\cdot 3\delta \tag{2.13}
\]

for each pair of integers k1 , k2 ∈ {m1 , . . . , m2 } such that

k1 < k2 ≤ k1 + 2N.

It follows from (2.13), (2.7) and the property (P2) that for any integer k ∈
{m1 , . . . , m2 } satisfying m2 − k ≥ 2N0 + 2N1 there exists an integer q such
that q − k ∈ [2N0 , 2N0 + 2N1 ] and

ρ(xq , x̄q ) ≤ δ0 .

This fact implies that there exists a finite strictly increasing sequence of
integers {τj }sj=0 such that

ρ(xτj , x̄τj ) ≤ δ0 , j = 0, . . . , s, (2.14)

m1 ≤ τ0 ≤ 2N0 + 2N1 + m1 , if ρ(xm1 , x̄m1 ) ≤ δ0 , then τ0 = m1 , (2.15)


τj+1 − τj ∈ [2N0 , 2N0 + 2N1 ], j = 0, . . . , s − 1 (2.16)
and
m2 − 2N0 − 2N1 < τs ≤ m2 . (2.17)
It follows from (2.13), (2.7), (2.5), (2.14) and (2.16) that for j = 0, . . . , s − 1

\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i \in \{\tau_j, \ldots, \tau_{j+1}\}.
\]
Thus ρ(xi, x̄i) ≤ ε, i ∈ {τ0, . . . , τs} with τ0 ≤ m1 + N, τs ≥ m2 − N. By
(2.15) if ρ(xm1, x̄m1) ≤ δ0, then τ0 = m1.
Assume that
ρ(xm2 , x̄m2 ) ≤ δ0 . (2.18)
To complete the proof of the theorem it is sufficient to show that

ρ(xi , x̄i ) ≤ ε, i ∈ {τs , . . . , m2 }.

By (2.17) and (2.16)

m2 − τs−1 = m2 − τs + τs − τs−1 ∈ [2N0 , 4N0 + 4N1 ]. (2.19)

By (2.13), (2.19) and (2.7),
\[
\sum_{i=\tau_{s-1}}^{m_2-1} v_i(x_i, x_{i+1})
\le \sigma(v, \tau_{s-1}, m_2, x_{\tau_{s-1}}, x_{m_2}) + 2^{-1}\cdot 3\delta. \tag{2.20}
\]
It follows from (2.19), (2.20), (2.18), (2.14) and the property (P1) that
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = \tau_{s-1}, \ldots, m_2.
\]

Theorem 2.1 is proved.

7.3 A turnpike is a solution of the problem (P)

In this section we show that if {x̄i}∞i=−∞ is the turnpike for v = {vi}∞i=−∞ ∈
M and vi is continuous for each integer i, then for each pair of integers
k2 > k1 the sequence \(\{\bar{x}_i\}_{i=k_1}^{k_2}\) is a solution of the problem (P) with y = x̄k1
and z = x̄k2 . We prove the following result.
Theorem 2. Let v = {vi}∞i=−∞ ∈ M, and {x̄i}∞i=−∞ ⊂ X. Assume that vi
is continuous for all i ∈ Z, v has the turnpike property and {x̄i}∞i=−∞ is the
turnpike for v. Then for each pair of integers m1, m2 > m1,
\[
\sum_{i=m_1}^{m_2-1} v_i(\bar{x}_i, \bar{x}_{i+1}) = \sigma(v, m_1, m_2, \bar{x}_{m_1}, \bar{x}_{m_2}).
\]

Proof. Assume the contrary. Then there exist a pair of integers m̄1, m̄2 > m̄1,
a sequence \(\{x_i\}_{i=\bar{m}_1}^{\bar{m}_2}\) and a number Δ > 0 such that
\[
x_{\bar{m}_1} = \bar{x}_{\bar{m}_1}, \qquad x_{\bar{m}_2} = \bar{x}_{\bar{m}_2},
\]
\[
\sum_{i=\bar{m}_1}^{\bar{m}_2-1} v_i(x_i, x_{i+1})
< \sum_{i=\bar{m}_1}^{\bar{m}_2-1} v_i(\bar{x}_i, \bar{x}_{i+1}) - \Delta. \tag{3.1}
\]

There exists ε ∈ (0, Δ/4) such that the following property holds:
(P3) if i ∈ {m̄1 − 1, . . . , m̄2 + 1}, z1, z2 ∈ X and
\[
\rho(z_1, \bar{x}_i),\ \rho(z_2, \bar{x}_{i+1}) \le \varepsilon,
\]
then
\[
|v_i(\bar{x}_i, \bar{x}_{i+1}) - v_i(z_1, z_2)| \le \Delta[64(\bar{m}_2 - \bar{m}_1 + 4)]^{-1}. \tag{3.2}
\]
By the property (TP) there exist δ ∈ (0, ε/4) and a natural number N such
that for each pair of integers n1, n2 ≥ n1 + 2N and each sequence \(\{y_i\}_{i=n_1}^{n_2} \subset X\)
satisfying
\[
\sum_{i=n_1}^{n_2-1} v_i(y_i, y_{i+1}) \le \sigma(v, n_1, n_2, y_{n_1}, y_{n_2}) + \delta \tag{3.3}
\]
the following inequality holds:
\[
\rho(y_i, \bar{x}_i) \le \varepsilon, \qquad i = n_1 + N, \ldots, n_2 - N. \tag{3.4}
\]

There exists \(\{\bar{y}_i\}_{i=\bar{m}_1-4N}^{\bar{m}_2+4N} \subset X\) such that
\[
\bar{y}_{\bar{m}_1-4N} = \bar{x}_{\bar{m}_1-4N}, \qquad \bar{y}_{\bar{m}_2+4N} = \bar{x}_{\bar{m}_2+4N}, \tag{3.5}
\]
and
\[
\sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} v_i(\bar{y}_i, \bar{y}_{i+1})
\le \sigma(v, \bar{m}_1 - 4N, \bar{m}_2 + 4N, \bar{y}_{\bar{m}_1-4N}, \bar{y}_{\bar{m}_2+4N}) + \delta/8. \tag{3.6}
\]
By (3.5), (3.6) and the definition of δ, N (see (3.3) and (3.4))
\[
\rho(\bar{y}_i, \bar{x}_i) \le \varepsilon, \qquad i = \bar{m}_1 - 3N, \ldots, \bar{m}_2 + 3N. \tag{3.7}
\]

Define \(\{y_i\}_{i=\bar{m}_1-4N}^{\bar{m}_2+4N} \subset X\) by
\[
y_i = \bar{y}_i, \quad i \in \{\bar{m}_1 - 4N, \ldots, \bar{m}_1 - 1\} \cup \{\bar{m}_2 + 1, \ldots, \bar{m}_2 + 4N\}, \tag{3.8}
\]
\[
y_i = x_i, \quad i \in \{\bar{m}_1, \ldots, \bar{m}_2\}.
\]
We will estimate
\[
\sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} v_i(\bar{y}_i, \bar{y}_{i+1})
- \sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} v_i(y_i, y_{i+1}).
\]

By (3.8) and (3.1),
\[
\begin{aligned}
&\sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} v_i(\bar{y}_i, \bar{y}_{i+1})
- \sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} v_i(y_i, y_{i+1}) \\
&\quad = \sum_{i=\bar{m}_1-1}^{\bar{m}_2} [v_i(\bar{y}_i, \bar{y}_{i+1}) - v_i(y_i, y_{i+1})] \\
&\quad = v_{\bar{m}_1-1}(\bar{y}_{\bar{m}_1-1}, \bar{y}_{\bar{m}_1}) - v_{\bar{m}_1-1}(\bar{y}_{\bar{m}_1-1}, y_{\bar{m}_1})
+ v_{\bar{m}_2}(\bar{y}_{\bar{m}_2}, \bar{y}_{\bar{m}_2+1}) \\
&\qquad - v_{\bar{m}_2}(y_{\bar{m}_2}, y_{\bar{m}_2+1})
+ \sum_{i=\bar{m}_1}^{\bar{m}_2-1} [v_i(\bar{y}_i, \bar{y}_{i+1}) - v_i(y_i, y_{i+1})] \\
&\quad = v_{\bar{m}_1-1}(\bar{y}_{\bar{m}_1-1}, \bar{y}_{\bar{m}_1}) - v_{\bar{m}_1-1}(\bar{y}_{\bar{m}_1-1}, \bar{x}_{\bar{m}_1})
+ v_{\bar{m}_2}(\bar{y}_{\bar{m}_2}, \bar{y}_{\bar{m}_2+1}) \\
&\qquad - v_{\bar{m}_2}(\bar{x}_{\bar{m}_2}, \bar{y}_{\bar{m}_2+1})
+ \sum_{i=\bar{m}_1}^{\bar{m}_2-1} [v_i(\bar{y}_i, \bar{y}_{i+1}) - v_i(\bar{x}_i, \bar{x}_{i+1})] \\
&\qquad + \sum_{i=\bar{m}_1}^{\bar{m}_2-1} [v_i(\bar{x}_i, \bar{x}_{i+1}) - v_i(x_i, x_{i+1})].
\end{aligned} \tag{3.9}
\]

By (3.7) and the property (P3),



|vm̄1 −1 (ȳm̄1 −1 , ȳm̄1 ) − vm̄1 −1 (ȳm̄1 −1 , x̄m̄1 )| ≤ 2Δ[64(m̄2 − m̄1 + 4)]−1 , (3.10)

|vm̄2 (ȳm̄2 , ȳm̄2 +1 ) − vm̄2 (x̄m̄2 , ȳm̄2 +1 )| ≤ 2Δ[64(m̄2 − m̄1 + 4)]−1 , (3.11)

|vi (ȳi , ȳi+1 ) − vi (x̄i , x̄i+1 )| ≤ Δ[64(m̄2 − m̄1 + 4)]−1 , i = m̄1 , . . . , m̄2 − 1.
(3.12)
It follows from (3.9), (3.12), (3.11) and (3.1) that
\[
\begin{aligned}
\sum_{i=\bar{m}_1-4N}^{\bar{m}_2+4N-1} [v_i(\bar{y}_i, \bar{y}_{i+1}) - v_i(y_i, y_{i+1})]
&\ge -\Delta[64(\bar{m}_2 - \bar{m}_1 + 4)]^{-1}(\bar{m}_2 - \bar{m}_1 + 4) \\
&\qquad + \sum_{i=\bar{m}_1}^{\bar{m}_2-1} [v_i(\bar{x}_i, \bar{x}_{i+1}) - v_i(x_i, x_{i+1})] \\
&\ge \Delta - \Delta/64 > 2\delta.
\end{aligned}
\]

Combined with (3.8) this fact contradicts (3.6). The contradiction we have
reached proves the theorem.

7.4 A turnpike result

In this section we show that under certain assumptions the turnpike property
is equivalent to its weakened version.

Theorem 3. Let v = {vi}∞i=−∞ ∈ M, {x̄i}∞i=−∞ ⊂ X and
\[
\sum_{i=m_1}^{m_2-1} v_i(\bar{x}_i, \bar{x}_{i+1}) = \sigma(v, m_1, m_2, \bar{x}_{m_1}, \bar{x}_{m_2}) \tag{4.1}
\]

for each pair of integers m1, m2 > m1. Assume that the following two properties
hold:
(i) For any ε > 0 there exists δ > 0 such that for each i ∈ Z, each
x1, x2, y1, y2 ∈ X satisfying ρ(xj, yj) ≤ δ, j = 1, 2,
\[
|v_i(x_1, x_2) - v_i(y_1, y_2)| \le \varepsilon;
\]
(ii) for each ε > 0 there exist δ > 0 and a natural number N such that for
each pair of integers m1, m2 ≥ m1 + 2N and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\)
satisfying
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, \bar{x}_{m_1}, \bar{x}_{m_2}) + \delta \tag{4.2}
\]
the inequality
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m_1 + N, \ldots, m_2 - N
\]
holds. Then v has the property (TP) and {x̄i}∞i=−∞ is the turnpike for v.

i=−∞ is the turnpike for v.

Proof. We show that the following property holds:


(C) Let ε > 0. Then there exists δ > 0 such that for each integer m, each
natural number k and each sequence \(\{x_i\}_{i=m}^{m+k} \subset X\) satisfying
\[
\rho(x_i, \bar{x}_i) \le \delta, \qquad i = m,\ m + k, \tag{4.3}
\]
\[
\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \le \sigma(v, m, m + k, x_m, x_{m+k}) + \delta
\]
the inequality
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m, \ldots, m + k \tag{4.4}
\]
holds.
Let ε > 0. There exist ε0 ∈ (0, ε/2) and a natural number N such that for
each pair of integers m1, m2 ≥ m1 + 2N and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\)
satisfying
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + 2\varepsilon_0
\]
the inequality
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m_1 + N, \ldots, m_2 - N \tag{4.5}
\]
holds.
By the property (i) there is δ ∈ (0, ε0/8) such that for each integer i
and each z1, z2, y1, y2 ∈ X satisfying ρ(zj, yj) ≤ δ, j = 1, 2, the following
inequality holds:
\[
|v_i(z_1, z_2) - v_i(y_1, y_2)| \le \varepsilon_0/16. \tag{4.6}
\]

Assume that m is an integer, k is a natural number and a sequence \(\{x_i\}_{i=m}^{m+k}\)
⊂ X satisfies (4.3). We will show that (4.4) is true. Clearly we may assume
without loss of generality that k > 1.
Define a sequence \(\{z_i\}_{i=m}^{m+k} \subset X\) by
\[
z_i = x_i, \ i = m,\ m + k, \qquad
z_i = \bar{x}_i, \ i \in \{m, \ldots, m + k\} \setminus \{m, m + k\}. \tag{4.7}
\]

(4.3) and (4.7) imply that




\[
\begin{aligned}
\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1})
&\le \sum_{i=m}^{m+k-1} v_i(z_i, z_{i+1}) + \delta \\
&\le \delta + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1})
+ |v_m(\bar{x}_m, \bar{x}_{m+1}) - v_m(x_m, \bar{x}_{m+1})| \\
&\qquad + |v_{m+k-1}(\bar{x}_{m+k-1}, \bar{x}_{m+k}) - v_{m+k-1}(\bar{x}_{m+k-1}, x_{m+k})|.
\end{aligned}
\]
Combined with (4.3) and the definition of δ (see (4.6)) this inequality
implies that
\[
\sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \le \delta + \varepsilon_0/8 + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1}). \tag{4.8}
\]

Define \(\{y_i\}_{i=m-2N}^{m+k+2N} \subset X\) by
\[
y_i = \bar{x}_i, \quad i \in \{m - 2N, \ldots, m - 1\} \cup \{m + k + 1, \ldots, m + k + 2N\}, \tag{4.9}
\]
\[
y_i = x_i, \quad i \in \{m, \ldots, m + k\}.
\]
It follows from (4.9), (4.3) and the definition of δ (see (4.6)) that

\[
|v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) - v_{m-1}(y_{m-1}, y_m)| \le \varepsilon_0/16
\]
and
\[
|v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1}) - v_{m+k}(y_{m+k}, y_{m+k+1})| \le \varepsilon_0/16.
\]

Combined with (4.9) and (4.8) these inequalities imply that
\[
\begin{aligned}
\sum_{i=m-1}^{m+k} v_i(y_i, y_{i+1})
&\le v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) + \frac{\varepsilon_0}{16}
+ v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1})
+ \frac{\varepsilon_0}{16} + \sum_{i=m}^{m+k-1} v_i(x_i, x_{i+1}) \\
&\le \varepsilon_0/8 + v_{m-1}(\bar{x}_{m-1}, \bar{x}_m) + v_{m+k}(\bar{x}_{m+k}, \bar{x}_{m+k+1}) + \delta
+ \varepsilon_0/8 + \sum_{i=m}^{m+k-1} v_i(\bar{x}_i, \bar{x}_{i+1}) \\
&< \varepsilon_0/2 + \sum_{i=m-1}^{m+k} v_i(\bar{x}_i, \bar{x}_{i+1}).
\end{aligned} \tag{4.10}
\]

By (4.9), (4.10) and (4.1)
\[
\begin{aligned}
\sum_{i=m-2N}^{m+k+2N-1} v_i(y_i, y_{i+1})
&= \sum_{i=m-2N}^{m-2} v_i(y_i, y_{i+1}) + \sum_{i=m-1}^{m+k} v_i(y_i, y_{i+1})
+ \sum_{i=m+k+1}^{m+k+2N-1} v_i(y_i, y_{i+1}) \\
&\le \sum_{i=m-2N}^{m-2} v_i(\bar{x}_i, \bar{x}_{i+1}) + \varepsilon_0/2
+ \sum_{i=m-1}^{m+k} v_i(\bar{x}_i, \bar{x}_{i+1})
+ \sum_{i=m+k+1}^{m+k+2N-1} v_i(\bar{x}_i, \bar{x}_{i+1}) \\
&= \varepsilon_0/2 + \sum_{i=m-2N}^{m+k+2N-1} v_i(\bar{x}_i, \bar{x}_{i+1}) \\
&= \varepsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, \bar{x}_{m-2N}, \bar{x}_{m+k+2N}) \\
&= \varepsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, y_{m-2N}, y_{m+k+2N}).
\end{aligned}
\]
Thus
\[
\sum_{i=m-2N}^{m+k+2N-1} v_i(y_i, y_{i+1})
\le \varepsilon_0/2 + \sigma(v, m - 2N, m + k + 2N, y_{m-2N}, y_{m+k+2N}).
\]

By this inequality and the definition of ε0 (see (4.5))
\[
\rho(y_i, \bar{x}_i) \le \varepsilon, \qquad i = m - N, \ldots, m + k + N.
\]
Together with (4.9) this implies that
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m, \ldots, m + k.
\]

Thus we have shown that the property (C) holds. Now we are ready to com-
plete the proof.
Let ε > 0. By the property (C) there exists δ0 ∈ (0, ε) such that for each
pair of integers m1, m2 > m1 and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\) satisfying
\[
\rho(x_i, \bar{x}_i) \le \delta_0, \qquad i = m_1,\ m_2
\]
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta_0 \tag{4.11}
\]

the following inequality holds:
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m_1, \ldots, m_2. \tag{4.12}
\]
There exist a number δ ∈ (0, δ0) and a natural number N such that for each
pair of integers m1, m2 ≥ m1 + 2N and each sequence \(\{x_i\}_{i=m_1}^{m_2} \subset X\) which
satisfies
\[
\sum_{i=m_1}^{m_2-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_2, x_{m_1}, x_{m_2}) + \delta \tag{4.13}
\]
the following inequality holds:
\[
\rho(x_i, \bar{x}_i) \le \delta_0, \qquad i = m_1 + N, \ldots, m_2 - N. \tag{4.14}
\]

Let m1, m2 ≥ m1 + 2N be a pair of integers and let \(\{x_i\}_{i=m_1}^{m_2} \subset X\) satisfy
(4.13). Then (4.14) is valid. Assume that ρ(xm1, x̄m1) ≤ δ. Then by (4.14)
and (4.13),
\[
\rho(x_{m_1+N}, \bar{x}_{m_1+N}) \le \delta_0
\]
and
\[
\sum_{i=m_1}^{m_1+N-1} v_i(x_i, x_{i+1}) \le \sigma(v, m_1, m_1 + N, x_{m_1}, x_{m_1+N}) + \delta.
\]
It follows from these relations and the definition of δ0 (see (4.11) and (4.12))
that
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m_1, \ldots, m_1 + N.
\]
Analogously we can show that if ρ(xm2, x̄m2) ≤ δ, then
\[
\rho(x_i, \bar{x}_i) \le \varepsilon, \qquad i = m_2 - N, \ldots, m_2.
\]

This completes the proof of the theorem. 

References

[Dzalilov et al. (2001)] Dzalilov, Z., Ivanov, A.F. and Rubinov, A.M. (2001) Difference
inclusions with delay of economic growth, Dynam. Systems Appl., Vol. 10, pp. 283–293.
[Dzalilov et al. (1998)] Dzalilov, Z., Rubinov, A.M. and Kloeden, P.E. (1998) Lyapunov
sequences and a turnpike theorem without convexity, Set-Valued Analysis, Vol. 6,
pp. 277–302.
[Leizarowitz (1985)] Leizarowitz, A. (1985) Infinite horizon autonomous systems with un-
bounded cost, Appl. Math. and Opt., Vol. 13, pp. 19–43.
[Leizarowitz (1989)] Leizarowitz, A. (1989) Optimal trajectories on infinite horizon de-
terministic control systems, Appl. Math. and Opt., Vol. 19, pp. 11–32.
[Leizarowitz and Mizel (1989)] Leizarowitz, A. and Mizel, V.J. (1989) One dimensional
infinite horizon variational problems arising in continuum mechanics, Arch. Rational
Mech. Anal., Vol. 106, pp. 161–194.

[Makarov, Levin and Rubinov (1995)] Makarov, V.L, Levin, M.J. and Rubinov, A.M.
(1995) Mathematical economic theory: pure and mixed types of economic mechanisms,
North-Holland, Amsterdam.
[Makarov and Rubinov (1973)] Makarov, V.L. and Rubinov, A.M. (1973) Mathematical
theory of economic dynamics and equilibria, Nauka, Moscow, English trans. (1977):
Springer-Verlag, New York.
[Mamedov and Pehlivan (2000)] Mamedov, M.A. and Pehlivan, S. (2000) Statistical con-
vergence of optimal paths, Math. Japon., Vol. 52, pp. 51–55.
[Mamedov and Pehlivan (2001)] Mamedov, M.A. and Pehlivan, S. (2001) Statistical clus-
ter points and turnpike theorem in nonconvex problems, J. Math. Anal. Appl., Vol. 256,
pp. 686–693.
[Marcus and Zaslavski (1999)] Marcus, M. and Zaslavski, A.J. (1999) The structure of
extremals of a class of second order variational problems, Ann. Inst. H. Poincare, Anal.
non lineare, Vol. 16, pp. 593–629.
[McKenzie (1976)] McKenzie, L.W. (1976) Turnpike theory, Econometrica, Vol. 44,
pp. 841–866.
[Radner (1961)] Radner, R. (1961) Paths of economic growth that are optimal with regard
only to final states; a turnpike theorem, Rev. Econom. Stud., Vol. 28, pp. 98–104.
[Rubinov (1980)] Rubinov, A.M. (1980) Superlinear multivalued mappings and their ap-
plications to problems of mathematical economics, Nauka, Leningrad.
[Rubinov (1984)] Rubinov, A.M. (1984) Economic dynamics, J. Soviet Math., Vol. 26,
pp. 1975–2012.
[Samuelson (1965)] Samuelson, P.A. (1965) A catenary turnpike theorem involving con-
sumption and the golden rule, American Economic Review, Vol. 55, pp. 486–496.
[Zaslavski (1995)] Zaslavski, A.J. (1995) Optimal programs on infinite horizon, 1 and 2,
SIAM Journal on Control and Optimization, Vol. 33, pp. 1643–1686.
[Zaslavski (1996)] Zaslavski, A.J. (1996) Dynamic properties of optimal solutions of
variational problems, Nonlinear Analysis: Theory, Methods and Applications, Vol. 27,
pp. 895–932.
[Zaslavski (2000)] Zaslavski, A.J. (2000) Turnpike theorem for nonautonomous infinite
dimensional discrete-time control systems, Optimization, Vol. 48, pp. 69–92.
Chapter 8
Mond–Weir Duality

B. Mond

Abstract Consider the nonlinear programming problem to minimize f (x)


subject to g(x) ≤ 0. The initial dual to this problem given by Wolfe required
that all the functions be convex. Since that time there have been many ex-
tensions that allowed the weakening of the convexity conditions. These gen-
eralizations include pseudo- and quasi-convexity, invexity, and second order
convexity. Another approach is that of Mond and Weir who modified the
dual problem so as to weaken the convexity requirements. Here we summa-
rize and compare some of these different approaches. It will also be pointed
out how the two different dual problems (those of Wolfe and Mond–Weir)
can be combined. Some applications, particularly to fractional programming,
will be discussed.

Key words: Mond–Weir dual, linear programming, Wolfe dual

8.1 Preliminaries

One of the most interesting and useful aspects of linear programming is duality
theory. Thus to the problem minimize c^t x subject to Ax ≥ b, x ≥ 0
(where A is an m × n matrix) there corresponds the dual problem maximize
b^t y subject to A^t y ≤ c, y ≥ 0. Duality theory says that for any feasible x
and y, c^t x ≥ b^t y; and, if x0 is optimal for the primal problem, there exists
an optimal y0 of the dual and c^t x0 = b^t y0.

B. Mond
Department of Mathematics and Statistical Sciences, La Trobe University, Victoria 3086,
AUSTRALIA; Department of Mathematics and Statistics, University of Melbourne
Victoria 3052, AUSTRALIA
e-mail: [email protected] and [email protected]


The first extension of this duality theory was to quadratic programming. In


[4], Dorn considered the quadratic program with linear constraints: minimize
(1/2) x^t C x + p^t x subject to Ax ≥ b, x ≥ 0, where C is a positive semi-definite
matrix. Dorn [4] showed that this problem was dual to the quadratic problem
maximize −(1/2) u^t C u + b^t y subject to A^t y ≤ C u + p, y ≥ 0. Weak duality holds
and if x0 is optimal for the primal, there exists y0 such that (u = x0 , y0 )
is optimal for the dual with equality of objective functions. The require-
ment that C be positive semi-definite ensures that the objective function is
convex.
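Both dual pairs above are easy to verify numerically on small instances. The sketch
below checks the linear pair using scipy on invented data; the matrix A and the
vectors b, c are arbitrary illustrative choices, not taken from the text.

import numpy as np
from scipy.optimize import linprog

# Small made-up data for: minimize c^t x  subject to  Ax >= b, x >= 0
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 5.0])

# Primal: linprog minimizes and expects A_ub x <= b_ub, so Ax >= b becomes -Ax <= -b.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)

# Dual: maximize b^t y  subject to  A^t y <= c, y >= 0, i.e. minimize -b^t y.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print("primal optimum  c^t x0 =", primal.fun)
print("dual optimum    b^t y0 =", -dual.fun)   # equal at optimality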

8.2 Convexity and Wolfe duality

Let f be a function from Rn into R and 0 ≤ λ ≤ 1. f is said to be convex if


for all x, y ∈ Rn ,

f (λy + (1 − λ)x) ≤ λf (y) + (1 − λ)f (x).

If f is differentiable, this is equivalent to

f (y) − f (x) ≥ (y − x)t ∇f (x). (8.1)

Although there are other characterizations of convex functions, we shall assume
that all functions have a continuous derivative and shall use the description
(8.1) for convex functions.
The extension of duality theory to convex non-linear programming prob-
lems (with convex constraints) was first given by Wolfe [17]. He considered
the problem
(P) minimize f (x) subject to g(x) ≤ 0,
and proposed the dual
(WD) maximize f (u) + y t g(u)
subject to ∇f (u) + ∇y t g(u) = 0, y ≥ 0.
Assuming f and g are convex, weak duality holds since, for feasible x and
(u, y)

f (x) − f (u) − y t g(u) ≥ (x − u)t ∇f (u) − y t g(u)


= −(x − u)t ∇y t g(u) − y t g(u) ≥ −y t g(x) ≥ 0.

He also showed that if x0 is optimal for (P) and a constraint qualification is


satisfied, then there exists y0 such that (u = x0 , y0 ) is optimal for (WD) and
for these optimal vectors the objective functions are equal.

8.3 Fractional programming and some extensions


of convexity

Simultaneously with the development of convex programming, there was consideration
of the fractional programming problem
(FP) minimize f(x)/g(x), (g(x) > 0), subject to h(x) ≤ 0,
and in particular, the linear fractional programming problem
(LFP) minimize (c^t x + α)/(d^t x + β), (d^t x + β > 0),
subject to Ax ≥ b, x ≥ 0.
It was noted (see, e.g., Martos [9]) that many of the features of linear
programming, such as duality and the simplex method, are easily adapted to
linear fractional programming, although the objective function is not convex.
This led to consideration of weaker than convexity conditions. Mangasar-
ian [6] defined pseudo-convex functions that satisfy

(y − x)t ∇f (x) ≥ 0 =⇒ f (y) − f (x) ≥ 0.

Also, useful are quasi-convex functions that satisfy

f (y) − f (x) ≤ 0 =⇒ (y − x)t ∇f (x) ≤ 0.

It can be shown [6] that if f is convex, ≥ 0 and g concave, > 0 (or f convex, g
linear > 0) then f /g is pseudo-convex. It follows from this that the objective
function in the (LFP) is pseudo-convex.
Mangasarian [7] points out that whereas some results (such as sufficiency
and converse duality) hold if, in (P), f is only pseudo-convex and g quasi-
convex, Wolfe duality does not hold for such functions. One example is the
following:
minimize x^3 + x subject to x ≥ 1, which has the optimal value 2 at x = 1.
The Wolfe dual
maximize u^3 + u + y(1 − u)
subject to 3u^2 + 1 − y = 0, y ≥ 0
can be shown to → +∞ as u → −∞.
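The unboundedness can be seen directly by eliminating y; the short computation
below is added for the reader's convenience and is not part of the original argument.
Substituting y = 3u^2 + 1 from the dual constraint into the dual objective gives
\[
u^3 + u + (3u^2 + 1)(1 - u)
= u^3 + u + 3u^2 - 3u^3 + 1 - u
= -2u^3 + 3u^2 + 1 \;\to\; +\infty
\quad \text{as } u \to -\infty .
\]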


One of the reasons that, in Wolfe duality, convexity cannot be weakened
to pseudo-convexity is that, unlike for convex functions, the sum of two
pseudo-convex functions need not be pseudo-convex. It is easy to see, however,
that duality between (P) and the Wolfe dual (WD) does hold if the
Lagrangian f + y^t g (y ≥ 0) is pseudo-convex. We show that weak duality
holds.
Since ∇[f (u) + y t g(u)] = 0, we have

(x − u)t ∇[f (u) + y t g(u)] ≥ 0 =⇒ f (x) + y t g(x) ≥ f (u) + y t g(u)

Now since y t g(x) ≤ 0, f (x) ≥ f (u) + y t g(u).



8.4 Mond–Weir dual

In order to weaken the convexity requirements, Mond and Weir [12] proposed
a different dual to (P).
(MWD) maximize f (u)
subject to ∇f (u) + ∇y t g(u) = 0,
y t g(u) ≥ 0, y ≥ 0.

Theorem 1 (Weak duality). If f is pseudo-convex and y t g is quasi-convex,


then
f (x) ≥ f (u).

Proof. y^t g(x) − y^t g(u) ≤ 0 =⇒ (x − u)^t ∇y^t g(u) ≤ 0
∴ −(x − u)^t ∇f(u) ≤ 0, or
(x − u)^t ∇f(u) ≥ 0 =⇒ f(x) ≥ f(u). □

It is easy to see that if also x0 is optimal for (P) and a constraint qualifi-
cation is satisfied, then there exists a y0 such that (u = x0 , y0 ) is optimal for
(MWD) with equality of objective functions.
Consider again the problem minimize x^3 + x subject to x ≥ 1, to which
Wolfe duality does not apply. The corresponding Mond–Weir dual, maximize
u^3 + u subject to 3u^2 + 1 − y = 0, y(1 − u) ≥ 0, y ≥ 0, has an optimum at
u = 1, y = 4 with optimal value equal to 2.
Although many variants of (MWD) are possible (see [12]), we give a dual
that can be regarded as a combination of (WD) and (MWD). Let M =
{1, 2, . . . , m} and I ⊆ M .

\[
\begin{aligned}
\max\ & f(u) + \sum_{i \in I} y_i g_i(u) \\
& \nabla f(u) + \nabla y^t g(u) = 0, \quad y \ge 0, \\
& \sum_{i \in M \setminus I} y_i g_i(u) \ge 0.
\end{aligned}
\]
Weak duality holds if \(f + \sum_{i \in I} y_i g_i\) is pseudo-convex and
\(\sum_{i \in M \setminus I} y_i g_i\) is quasi-convex.

8.5 Applications

Since Mond–Weir duality holds when the objective function is pseudo-convex


but not convex, it is natural to apply the duality results to the fractional
programming problem (FP). Thus if f ≥ 0, convex and g > 0, concave,
y t h quasi-convex then weak duality holds between (FP) and the following
problem:

max f (u)/g(u)
subject to ∇[f (u)/g(u) + y t h(u)] = 0
y t h(u) ≥ 0, y ≥ 0.

Other fractional programming duals can be found in Bector [1] and


Schaible [15, 16].
Instead of the problem (FP), Bector [1] considered the equivalent problem
\[
\text{minimize } \frac{f(x)}{g(x)} \quad \text{subject to } \frac{h(x)}{g(x)} \le 0.
\]
Here the Lagrangian is \(\dfrac{f(x) + y^t h(x)}{g(x)}\) and is pseudo-convex if f and h are
convex, g is concave > 0, f + y^t h ≥ 0 (unless g is linear). Thus his dual to
(FP) is
\[
\begin{aligned}
\text{maximize } & \frac{f(u) + y^t h(u)}{g(u)} \\
\text{subject to } & \nabla\Bigl[\frac{f(u) + y^t h(u)}{g(u)}\Bigr] = 0, \\
& f(u) + y^t h(u) \ge 0, \quad y \ge 0.
\end{aligned}
\]
Duality holds if f is convex, g is concave > 0, and h is convex.


A dual that combines the fractional dual of Mond–Weir and that of Bector
can be found in [13].
Schaible [15, 16] gave the following dual to (FP):

maximize λ
∇f (u) − λ∇g(u) + ∇y t h(u) = 0
f (u) − λg(u) + y t h(u) ≥ 0
y ≥ 0, λ ≥ 0.

Duality holds if f is convex, ≥ 0, g concave, > 0 and h is convex.


A Mond–Weir version of the Schaible dual is the following:

maximize λ
∇f (u) − λ∇g(u) + ∇y t h(u) = 0
f (u) − λg(u) ≥ 0
y t h(u) ≥ 0, λ ≥ 0, y ≥ 0.

Here duality holds if f is convex and nonnegative, g concave and strictly


positive, and y t h is quasiconvex.

A dual that is a combination of the last two is the following:
\[
\begin{aligned}
\text{maximize } & \lambda \\
& \nabla f(u) - \lambda \nabla g(u) + \nabla y^t h(u) = 0 \\
& f(u) - \lambda g(u) + \sum_{i \in I} y_i h_i(u) \ge 0 \\
& \sum_{i \in M \setminus I} y_i h_i(u) \ge 0, \quad \lambda \ge 0, \ y \ge 0.
\end{aligned}
\]
Here \(\sum_{i \in M \setminus I} y_i h_i(u)\) need only be quasi-convex for duality to hold.
A fractional programming problem where Bector and Schaible duality do
not hold but the Mond–Weir fractional programming duals are applicable is
the following:
\[
\operatorname*{minimize}_{x > 0}\ -\frac{1}{x} \quad \text{subject to } x^3 \ge 1.
\]
Here neither the Bector nor the Schaible dual is applicable. The Mond–Weir
Bector type dual is
\[
\operatorname*{maximize}_{u > 0}\ -\frac{1}{u} \quad \text{subject to }
y = \frac{1}{3u^4}, \quad y(1 - u^3) \ge 0, \quad y \ge 0.
\]
The maximum value −1 is attained at u = 1, y = 1/3.
The Mond–Weir Schaible type dual is
\[
\begin{aligned}
\text{maximize } & \lambda \\
\text{subject to } & -\lambda - 3yu^2 = 0, \\
& -1 - \lambda u \ge 0, \\
& y(1 - u^3) \ge 0, \quad y \ge 0.
\end{aligned}
\]
The maximum is attained at u = 1, y = 1/3, λ = −1.

8.6 Second order duality

Mangasarian [8] proposed the following second order dual to (P).


\[
\text{(MD)}\quad \text{maximize } f(u) + y^t g(u)
- \tfrac{1}{2}\, p^t\, [\nabla^2 f(u) + \nabla^2 y^t g(u)]\, p
\]
\[
\nabla y^t g(u) + \nabla^2 y^t g(u)\, p + \nabla f(u) + \nabla^2 f(u)\, p = 0, \qquad y \ge 0
\]

In [11] Mond established weak duality between (P) and (MD) under the
following conditions: if for all x, u, p
\[
f(x) - f(u) \ge (x - u)^t \nabla f(u) + (x - u)^t \nabla^2 f(u)\, p
- \tfrac{1}{2}\, p^t \nabla^2 f(u)\, p
\]
(subsequently called second order convex by Mahajan [5]) and
\[
g_i(x) - g_i(u) \ge (x - u)^t \nabla g_i(u) + (x - u)^t \nabla^2 g_i(u)\, p
- \tfrac{1}{2}\, p^t \nabla^2 g_i(u)\, p, \qquad i = 1, \ldots, m.
\]

The second order convexity requirements can be weakened by suitably modifying
the dual. The corresponding dual is
\[
\text{(MWSD)}\quad \max\ f(u) - \tfrac{1}{2}\, p^t \nabla^2 f(u)\, p
\]
\[
\nabla y^t g(u) + \nabla^2 y^t g(u)\, p + \nabla f(u) + \nabla^2 f(u)\, p = 0,
\]
\[
y^t g(u) - \tfrac{1}{2}\, p^t [\nabla^2 y^t g(u)]\, p \ge 0, \qquad y \ge 0.
\]
Weak duality holds between (P) and (MWSD) if f satisfies
\[
(x - u)^t \nabla f(u) + (x - u)^t \nabla^2 f(u)\, p \ge 0
\implies f(x) \ge f(u) - \tfrac{1}{2}\, p^t \nabla^2 f(u)\, p
\]
(called second order pseudo-convex) and y^t g satisfies
\[
y^t g(x) - y^t g(u) + \tfrac{1}{2}\, p^t \nabla^2 y^t g(u)\, p \le 0
\implies (x - u)^t \nabla y^t g(u) + (x - u)^t \nabla^2 [y^t g(u)]\, p \le 0
\]
(called second order quasi-convex).


Other second order duals can be found in [13].

8.7 Symmetric duality

In [3], Dantzig, Eisenberg and Cottle formulated the following pair of sym-
metric dual problems:

(SP) minimize K(x, y) − y t ∇2 K(x, y)


−∇2 K(x, y) ≥ 0
x≥0
(SD) maximize K(u, v) − ut ∇1 K(u, v)
−∇1 K(u, v) ≤ 0
v ≥ 0.

Weak duality holds if K is convex in x for fixed y and concave in y for


fixed x.
In [12] Mond and Weir considered the possibility of weakening the convex-
ity and concavity requirements by modifying the symmetric dual problems.
They proposed the following:
(MWSP) minimize K(x, y)
−∇2 K(x, y) ≥ 0
−y t ∇2 K(x, y) ≤ 0
x≥0
(MWSD) maximize K(u, v)
−∇1 K(u, v) ≤ 0
−ut ∇1 K(u, v) ≥ 0
v ≥ 0.

Weak duality holds if K is pseudo-convex in x for fixed y and pseudo-concave


in y for fixed x.

Proof. (x − u)t ∇1 K(u, v) ≥ 0 =⇒ K(x, v) ≥ K(u, v)


(v − y)t ∇2 K(x, y) ≤ 0 =⇒ K(x, v) ≤ K(x, y)
∴ K(x, y) ≥ K(u, v).
Once symmetric duality was shown to hold with only pseudo-convex and
pseudo-concave requirements, it was tempting to try to establish a pair of
symmetric dual fractional problems. Such a pair is given in [2].
minimize [φ(x, y)/ψ(x, y)]
ψ(x, y)∇2 φ(x, y) − φ(x, y)∇2 ψ(x, y) ≤ 0
y t [ψ(x, y)∇2 φ(x, y) − φ(x, y)∇2 ψ(x, y)] ≥ 0
x≥0
maximize [φ(u, v)/ψ(u, v)]
ψ(u, v)∇1 φ(u, v) − φ(u, v)∇1 ψ(u, v) ≥ 0
ut [ψ(u, v)∇1 φ(u, v) − φ(u, v)∇1 ψ(u, v)] ≤ 0
v≥0

Assuming that φ(·, y) and ψ(x, ·) are convex while φ(x, ·) and ψ(·, y) are
concave, then the objective function is pseudo-convex in x for fixed y and
pseudo-concave in y for fixed x. In this case weak duality holds, i.e., for
feasible (x, y) and (u, v)

φ(x, y)/ψ(x, y) ≥ φ(u, v)/ψ(u, v).

Finally we point out that Mond–Weir duality has been found to be useful
and applicable in a great many different contexts. A recent check of Math

Reviews showed 112 papers where the term Mond–Weir is used either in the
title or in the abstract. Seventy-eight of these papers are listed in [10]. 

References

1. C.R. Bector, Duality in Nonlinear Programming, Z. Oper. Res., 59 (1973), 183–193.


2. S. Chandra, B. D. Craven and B. Mond, Symmetric Dual Fractional Programming, Z.
Oper. Res., 29 (1985), 59–64.
3. G.G. Dantzig, E. Eisenberg and R.W. Cottle, Symmetric Dual Nonlinear Programs,
Pacific J. Math., 15 (1965), 809–812.
4. W.S. Dorn, Duality in Quadratic Programming, Quart. Appl. Math., 18 (1960), 155–
162.
5. D.G. Mahajan, Contributions to Optimality Conditions and Duality Theory in Non-
linear Programming, PhD Thesis, IIT Bombay, India, 1977.
6. O.L. Mangasarian, Pseudo-convex Functions, SIAM J. Control, 3 (1965), 281–290.
7. O.L. Mangasarian, Nonlinear Programming, McGraw-Hill, New York, 1969.
8. O.L. Mangasarian, Second and Higher-order Duality in Nonlinear Programming, J.
Math. Anal. Appl., 51 (1975), 607–620.
9. B. Martos, Nonlinear Programming; Theory and Methods, North Holland Pub. Co.,
Amsterdam, 1975.
10. B. Mond, What is Mond-Weir Duality?, in, Recent Developments in Operational Re-
search, Manja Lata Agarwal and Kanwar Sen, Editors, Narosa Publishing House, New
Delhi, India, 2001, 297–303.
11. B. Mond, Second Order Duality for Non-linear Programs, Opsearch, 11 (1974), 90–99.
12. B. Mond and T. Weir, Generalized concavity and duality, in Generalized Concavity in
Optimization and Economics, S. Schaible and W.T. Ziemba, Editors, Academic Press,
New York, 1981, 263–279.
13. B.Mond and T. Weir, Duality for Fractional Programming with Generalized Convexity
Conditions, J. Inf. Opt. Sci., 3 (1982), 105–124.
14. B. Mond and T. Weir, Generalized Convexity and Higher Order Duality, J. Math. Sci.,
16–18 (1981–83), 74–94.
15. S. Schaible, Duality in Fractional Programming: A Unified Approach, Oper. Res., 24
(1976), 452–461.
16. S. Schaible, Fractional Programming I, Duality, Man. Sci., 22 (1976), 858–867.
17. P. Wolfe, A Duality Theorem for Nonlinear Programming, Quart. Appl. Math., 19
(1961), 239–244.
Chapter 9
Computing the fundamental matrix
of an M /G/1–type Markov chain

Emma Hunt

Abstract A treatment is given of a probabilistic approach, Algorithm H, to


the determination of the fundamental matrix of a block-structured M/G/1–
type Markov chain. Comparison is made with the cyclic reduction algorithm.

Key words: Block Markov chain, fundamental matrix, Algorithm H,


convergence rates, LR Algorithm, CR Algorithm

9.1 Introduction

By a partitioned or block–M/G/1 Markov chain we mean a Markov chain


with transition matrix of block-partitioned form
\[
P = \begin{bmatrix}
B_1 & B_2 & B_3 & B_4 & \cdots \\
A_0 & A_1 & A_2 & A_3 & \cdots \\
0   & A_0 & A_1 & A_2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix},
\]

where each block is k × k, say. We restrict attention to the case where the
chain is irreducible but do not suppose positive recurrence. If the states are
partitioned conformably with the blocks, then the states corresponding to
block ℓ (ℓ ≥ 0) are said to make up level ℓ and to constitute the phases of level
ℓ. The j-th phase of level ℓ will be denoted (ℓ, j).
In [18] Neuts noted a variety of special cases of the block–M/G/1 Markov
chain which occur as models in various applications in the literature, such as

Emma Hunt
School of Mathematical Sciences & School of Economics, The University of Adelaide,
Adelaide SA 5005, AUSTRALIA
e-mail: [email protected]


Bailey’s bulk queue (pp. 66–69) and the Odoom–Lloyd–Ali Khan–Gani dam
(pp. 69–71 and 348–353).
For applications, the most basic problem concerning the block–M/G/1
Markov chain is finding the invariant probability measure in the positive
recurrent case. We express this measure as π = (π0 , π1 , . . .), the components
πi being k-dimensional vectors so that π is partitioned conformably with the
structure of P . An efficient and stable method of determining π has been
devised by Ramaswami [20] based on a matrix version of Burke’s formula.
The key ingredient here is the fundamental matrix, G. This arises as follows.
Denote by G_{r,ℓ} (1 ≤ r, ℓ ≤ k) the probability, given the chain begins in
state (i + 1, r), that it subsequently reaches level i ≥ 0 and that it first
does so by entering the state (i, ℓ). By the homogeneity of the transition
probabilities in levels one and above, plus the fact that trajectories are skip-
free downwards in levels, the probability G_{r,ℓ} is well defined and independent
of i. The fundamental matrix G is defined by G = (G_{r,ℓ}).
A central property of G, of which we shall make repeated use, is that it is
the smallest nonnegative solution to the matrix equation

G = A(G) := \sum_{i=0}^{\infty} A_i G^i,        (9.1)

where we define G^0 = I [9, Sections 2.2, 2.3].


Now define

A_i^* = \sum_{j=i}^{\infty} A_j G^{j-i},  \qquad  B_i^* = \sum_{j=i}^{\infty} B_{j+1} G^{j-i},  \qquad i ≥ 1.        (9.2)

The Ramaswami generalization of the Burke formula to block–M/G/1
Markov chains is as follows (Neuts [18, Theorem 3.2.5]).

Theorem A. For a positive recurrent block–M/G/1 chain P, the matrix
I − A_1^* is invertible and the invariant measure of P satisfies

\pi_i = \Bigl[ \pi_0 B_i^* + \sum_{j=1}^{i-1} \pi_j A_{i+1-j}^* \Bigr] (I − A_1^*)^{-1}   \qquad (i ≥ 1).

The determination of π_0 is discussed in Neuts [18, Section 3.3]. The theorem
may then be used to derive the vectors π_i once G is known. Thus the
availability of efficient numerical methods for computing the matrix G is
crucial for the calculation of the invariant measure.
Different algorithms for computing the minimal nonnegative solution of
(9.1) have been proposed and analyzed by several authors. Many of them
arise from functional iteration techniques based on manipulations of (9.1).
For instance, in Ramaswami [19] the iteration
X_{j+1} = \sum_{i=0}^{\infty} A_i X_j^i,  \qquad X_0 = 0,        (9.3)

was considered. Similar techniques, based on the recurrences

X_{j+1} = (I − A_1)^{-1} A_0 + (I − A_1)^{-1} \sum_{i=2}^{\infty} A_i X_j^i        (9.4)

or

X_{j+1} = \Bigl( I − \sum_{i=1}^{\infty} A_i X_j^{i-1} \Bigr)^{-1} A_0        (9.5)

were introduced in Neuts [18], Latouche [12] in order to speed up the con-
vergence. However, the convergence of these numerical schemes still remains
linear. In Latouche [13] a Newton iteration was introduced in order to arrive
at a quadratic convergence, with an increase in the computational cost. In
Latouche and Stewart [15] the approximation of G was reduced to solving
nested finite systems of linear equations associated with the matrix P by
means of a doubling technique. In this way the solution for the matrix P
is approximated with the solution of the problem obtained by cutting the
infinite block matrix P to a suitable finite block size n.
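To fix ideas, the basic substitution scheme (9.3) can be coded directly once the block sequence is truncated. The MATLAB sketch below is our own illustration: the function name, the cell-array input and the stopping tolerance are choices made for the example and are not part of the published algorithms.

    function G = g_by_functional_iteration(A, tol, maxit)
    % A is a cell array {A0, A1, ..., Am} of k-by-k blocks (a finite
    % truncation of the sequence (A_i)); tol is the stopping tolerance.
    k = size(A{1}, 1);
    m = numel(A) - 1;
    X = zeros(k);                        % X_0 = 0
    for it = 1:maxit
        Xpow = eye(k);                   % X^0
        Xnew = zeros(k);
        for i = 0:m
            Xnew = Xnew + A{i+1} * Xpow; % accumulate A_i * X^i
            Xpow = Xpow * X;
        end
        if norm(Xnew - X, inf) < tol
            X = Xnew;
            break
        end
        X = Xnew;
    end
    G = X;
    end

Each iteration costs one pass over the truncated block sequence; the schemes (9.4) and (9.5) differ only in the update rule, not in this overall structure.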
In this chapter we present a probabilistic algorithm, Algorithm H, for the
determination of the fundamental matrix G in a structured M/G/1 Markov
chain. An account of the basic idea is given by the author in [11]. Algorithm
H is developed in the following three sections. In Section 5 we then consider
an alternative approach to the calculation of G. This turns out to be rather
more complicated than Algorithm H in general. We do not directly employ
this alternative algorithm, Algorithm H̄, and we do not detail all its steps.
However Algorithm H̄ serves several purposes. First, we show that it and
Algorithm H possess an interlacing property. This enables us to use it to
obtain (in Section 7) information about the convergence rate of Algorithm H.
Algorithm H̄ reduces to Algorithm LR, the logarithmic-reduction algorithm
of Latouche and Ramaswami [14], in the quasi birth–death (QBD) case. Thus
for a QBD the interlacing property holds for Algorithms H and LR. This we
consider in Section 8.
In Section 9 we address the relation between Algorithm H and Bini and
Meini’s cyclic reduction algorithm, Algorithm CR. Algorithm CR was devel-
oped and refined in a chain of articles that provided a considerable improve-
ment over earlier work. See in particular [3]–[10]. We show that Algorithm H
becomes Algorithm CR under the conditions for which the latter has been
established. Algorithm H is seen to hold under more general conditions than
Algorithm CR. It follows from our discussion that, despite a statement to the
contrary in [8], Algorithm CR is different from Algorithm LR in the QBD
case.
9.2 Algorithm H: Preliminaries

It proves convenient to label the levels of the chain A as −1, 0, 1, 2, . . ., so


that A is homogeneous in the one-step transition probabilities out of all
nonnegative levels. Thus the matrix G gives the probabilities relating to first
transitions into level −1, given the process starts in level 0. Since we are
concerned with the process only up to the first transition into level −1, we
may without loss of generality change the transition probabilities out of level
−1 and make each phase of level −1 absorbing. That is, we replace our chain
A with a chain Ã with levels −1, 0, 1, 2, . . . and structured one-step transition
matrix

\tilde{P} = \begin{bmatrix}
      I   & 0   & 0   & \cdots \\
      A_0 & A_1 & A_2 & \cdots \\
      0   & A_0 & A_1 & \cdots \\
      \vdots & \vdots & \vdots & \ddots
    \end{bmatrix}.

Most of our analysis will be in terms of the (substochastic) subchain A_0
with levels 0, 1, 2, . . . and structured one-step transition matrix

P^{(0)} = \begin{bmatrix}
      A_1 & A_2 & A_3 & \cdots \\
      A_0 & A_1 & A_2 & \cdots \\
      0   & A_0 & A_1 & \cdots \\
      \vdots & \vdots & \vdots & \ddots
    \end{bmatrix}.

The assumption that A is irreducible entails that every state in a


nonnegative-labeled level of A has access to level −1. Hence all the states
of A0 are transient or ephemeral.
For t ≥ 0, let Xt denote the state of A0 at time t and let the random
variable Yt represent the level of A0 at time t. For r, s ∈ K := {1, 2, . . . , k}
we define

U_{r,s} := P\Bigl( \bigcup_{t>0} \{ X_t = (0, s),\; Y_u > 0\ (0 < u < t) \} \Bigm| X_0 = (0, r) \Bigr).

Thus Ur,s is the probability that, starting in (0, r), the process A0 revisits
level 0 at some subsequent time and does so with first entry into state (0, s).
The matrix U := (Ur,s ) may be regarded as the one-step transition matrix
of a Markov chain U on the finite state space K. The chain U is a censoring
of A0 in which the latter is observed only on visits to level zero. No state of
U is recurrent, for if r ∈ K were recurrent then the state (0, r) in A0 would
be recurrent, which is a contradiction. Since no state of U is recurrent, I − U
is invertible and

\sum_{i=0}^{\infty} U^i = (I − U)^{-1}.
The matrix U is also strictly substochastic, that is, at least one row sum is
strictly less than unity.
In Ã, any path whose probability contributes to G_{r,s} begins in (0, r), makes
some number n ≥ 0 of revisits to level 0 with −1 as a taboo level, and then
takes a final step to (−1, s). Suppose the final step to (−1, s) is taken from
(0, m). Allowing for all possible choices of m, we derive
G_{r,s} = \sum_{m \in K} \sum_{i=0}^{\infty} (U^i)_{r,m} (A_0)_{m,s},

so that

G = \Bigl( \sum_{i=0}^{\infty} U^i \Bigr) A_0 = (I − U)^{-1} A_0.

Our strategy for finding G is to proceed via the determination of U .


For ℓ ≥ 0, we write U(ℓ) for the matrix whose entries are given by

(U(ℓ))_{r,s} := P\Bigl( \bigcup_{t>0} \{ X_t = (0, s),\; 0 < Y_u < ℓ\ (0 < u < t) \} \Bigm| X_0 = (0, r) \Bigr)

for r, s ∈ K. Thus U(ℓ) corresponds to U when the trajectories in A_0 are
further restricted not to reach level ℓ or higher before a first return to
level 0.
We may argue as above that I − U(ℓ) is invertible and that

[I − U(ℓ)]^{-1} = \sum_{i=0}^{\infty} (U(ℓ))^i.

Further, since U is finite,

U(ℓ) ↑ U   as ℓ → ∞

and

[I − U(ℓ)]^{-1} ↑ [I − U]^{-1}   as ℓ → ∞.
The probabilistic construction we are about to detail involves the exact
algorithmic determination (to machine precision) of U(ℓ) for ℓ of the form
2^N with N a nonnegative integer. This leads to an approximation

T_N := \bigl[ I − U(2^N) \bigr]^{-1} A_0

for G. We have T_N ↑ G as N → ∞.
The matrix T_N may be interpreted as the contribution to G from those
trajectories from level 0 to level −1 in Ã that are restricted to pass through
only levels below 2^N.
9.3 Probabilistic construction

We construct a sequence (A_j)_{j≥0} of censored processes, each of which has
the nonnegative integers as its levels. For j ≥ 1, the levels 0, 1, 2, . . . of A_j
are respectively the levels 0, 2, 4, . . . of A_{j−1}, that is, A_j is A_{j−1} censored to
be observed in even-labeled levels only. Thus A_j is a process that has been
censored j times. By the homogeneity of one-step transitions out of level 1
and higher levels, A_j has a structured one-step transition matrix

P^{(j)} = \begin{bmatrix}
      B_1^{(j)} & B_2^{(j)} & B_3^{(j)} & \cdots \\
      A_0^{(j)} & A_1^{(j)} & A_2^{(j)} & \cdots \\
      0         & A_0^{(j)} & A_1^{(j)} & \cdots \\
      \vdots    & \vdots    & \vdots    & \ddots
    \end{bmatrix},

that is, each chain A_j is of structured M/G/1 type. We have B_i^{(0)} = A_i for
i ≥ 1 and A_i^{(0)} = A_i for i ≥ 0.
In the previous section we saw that A_0 contains no recurrent states, so
the same must be true also for the censorings A_1, A_2, . . . . The substochastic
matrices B_1^{(j)}, A_1^{(j)}, formed by censoring A_j to be observed only in levels 0
and 1 respectively, thus also contain no recurrent states. Hence I − B_1^{(j)} and
I − A_1^{(j)} are both invertible.
We now consider the question of deriving the block entries in P^{(j+1)} from
those in P^{(j)}. First we extend our earlier notation and write X_t^{(j)}, Y_t^{(j)}
respectively for the state and level of A_j at time t ∈ {0, 1, . . .}. For h a
nonnegative integer, define the event Ω_{s,t,h}^{(j)} by

Ω_{s,t,h}^{(j)} = \bigl\{ X_t^{(j)} = (h, s),\; Y_u^{(j)} − Y_0^{(j)} \text{ is even } (0 < u < t) \bigr\}

and for n ≥ 0, define the k × k matrix L_n^{(j+1)} by

\bigl( L_n^{(j+1)} \bigr)_{r,s} := P\Bigl( \bigcup_{t>0} Ω_{s,t,2ℓ+2n}^{(j)} \Bigm| X_0^{(j)} = (2ℓ + 1, r) \Bigr)

for r, s ∈ K. By the homogeneity in positive-labeled levels of the one-step
transition probabilities in A_j, the left-hand side is well defined, that is, the
right-hand side is independent of the value of ℓ ≥ 0.
We may interpret (L_n^{(j+1)})_{r,s} as the probability, conditional on initial state
(2ℓ + 1, r), that the first transition to an even-labeled level is to state
(2ℓ + 2n, s).
We may express the transitions in A_{j+1} in terms of those of A_j and the
matrices L_n^{(j+1)} by an enumeration of possibilities. Suppose i > 0. A single-step
transition from state (i, r) to state (i − 1 + n, s) (n ≥ 0) in A_{j+1} corresponds
to a transition (possibly multistep) from state (2i, r) to state (2(i − 1 + n), s)
in A_j that does not involve passage through any other state with even-labeled
level. When n > 0, this may occur in a single step with probability (A_{2n−1}^{(j)})_{r,s}.
For a multistep transition, we may obtain the probability by conditioning on
the first state lying in an odd-labeled level. This gives

A_n^{(j+1)} = A_{2n−1}^{(j)} + \sum_{m=0}^{n} A_{2m}^{(j)} L_{n−m}^{(j+1)}   \qquad (n ≥ 1).        (9.6)

For n = 0, there is no single-step transition in A_j producing a drop of two
levels, so the leading term disappears to give

A_0^{(j+1)} = A_0^{(j)} L_0^{(j+1)}.        (9.7)

A similar argument gives

B_n^{(j+1)} = B_{2n−1}^{(j)} + \sum_{m=1}^{n} B_{2m}^{(j)} L_{n−m}^{(j+1)}   \qquad (n ≥ 1)        (9.8)

for transitions from level 0.


The determination of the matrices L_n^{(j+1)} proceeds in two stages. For n ≥ 0,
define the k × k matrix K_n^{(j+1)} by

\bigl( K_n^{(j+1)} \bigr)_{r,s} := P\Bigl( \bigcup_{t>0} Ω_{s,t,2ℓ+2n+1}^{(j)} \Bigm| X_0^{(j)} = (2ℓ + 1, r) \Bigr)

for r, s ∈ K. Again the left-hand side is well defined. We may interpret this
as follows. Suppose A_j is initially in state (2ℓ + 1, r). The (r, s) entry in
K_n^{(j+1)} is the probability that, at some subsequent time point, A_j is in state
(2ℓ + 2n + 1, s) without in the meantime having been in any even-labelled
level.
Each path in A_j contributing to L_n^{(j+1)} consists of a sequence of steps each
of which involves even-sized changes of level, followed by a final step with an
odd-sized change of level. Conditioning on the final step yields

L_n^{(j+1)} = \sum_{m=0}^{n} K_m^{(j+1)} A_{2(n−m)}^{(j)}   \qquad (n ≥ 0).        (9.9)

To complete the specification of P^{(j+1)} in terms of P^{(j)}, we need to determine
the matrices K_n^{(j+1)}. We have by definition that

\bigl( K_0^{(j+1)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} Ω_{s,t,2ℓ+1}^{(j)} \Bigm| X_0^{(j)} = (2ℓ + 1, r) \Bigr).
Since A_j is skip-free to the left, trajectories contributing to K_0^{(j+1)} cannot
change level and so

K_0^{(j+1)} = \sum_{i=0}^{\infty} \bigl( A_1^{(j)} \bigr)^i = \bigl( I − A_1^{(j)} \bigr)^{-1}.        (9.10)

For n > 0, K_n^{(j+1)} involves at least one step in A_j with an increase in level.
Conditioning on the last such step yields the recursive relation

K_n^{(j+1)} = \sum_{m=0}^{n−1} K_m^{(j+1)} A_{2(n−m)+1}^{(j)} K_0^{(j+1)}   \qquad (n ≥ 1).        (9.11)

We may also develop a recursion by conditioning on the first such jump
between levels. This gives the alternative recursive relation

K_n^{(j+1)} = K_0^{(j+1)} \sum_{m=0}^{n−1} A_{2(n−m)+1}^{(j)} K_m^{(j+1)}   \qquad (n ≥ 1).        (9.12)

Since level 1 in A_N corresponds to level 2^N in A_0, paths in A_N from (0, r)
to (0, s) that stay within level 0 correspond to paths from (0, r) to (0, s) in
A_0 that do not reach level 2^N or higher. Hence

\bigl( B_1^{(N)} \bigr)_{r,s} = \bigl( U(2^N) \bigr)_{r,s}

for r, s ∈ K, or

B_1^{(N)} = U(2^N).

Thus the recursive relations connecting the block entries in P^{(j+1)} to those
in P^{(j)} for j = 0, 1, . . . , N − 1 provide the means to determine U(2^N) exactly
and so approximate G.

9.4 Algorithm H

In the last section we considered the sequence of censored processes (A_j)_{j≥0},
each with the nonnegative integers as its levels. The determination of B_1^{(N)}
requires only a finite number of the matrix entries in each P^{(j)} to be determined.
For the purpose of calculating T_N, the relevant parts of the construction of
the previous section may be summarized as follows.
The algorithm requires initial input of A_0, A_1, . . . , A_{2^N−1}. First we specify

B_n^{(0)} = A_n   (n = 1, . . . , 2^N),
A_n^{(0)} = A_n   (n = 0, 1, . . . , 2^N − 1).
We then determine

B_1^{(j)}, B_2^{(j)}, . . . , B_{2^{N−j}}^{(j)},
A_0^{(j)}, A_1^{(j)}, . . . , A_{2^{N−j}−1}^{(j)}

recursively for j = 1, 2, . . . , N as follows. To obtain the block matrices in
A_{j+1} from those in A_j, first find the auxiliary quantities
K_0^{(j+1)} = \bigl( I − A_1^{(j)} \bigr)^{-1},

K_n^{(j+1)} = \sum_{m=0}^{n−1} K_m^{(j+1)} A_{2(n−m)+1}^{(j)} K_0^{(j+1)},

for n = 1, 2, . . . , 2^{N−j−1} − 1, and

L_n^{(j+1)} = \sum_{m=0}^{n} K_m^{(j+1)} A_{2(n−m)}^{(j)},
for n = 0, 1, . . . , 2^{N−j−1} − 1.
Calculate

A_0^{(j+1)} = A_0^{(j)} L_0^{(j+1)}

and

B_n^{(j+1)} = B_{2n−1}^{(j)} + \sum_{m=1}^{n} B_{2m}^{(j)} L_{n−m}^{(j+1)},

A_n^{(j+1)} = A_{2n−1}^{(j)} + \sum_{m=0}^{n} A_{2m}^{(j)} L_{n−m}^{(j+1)},

for n = 1, 2, . . . , 2^{N−j−1} − 1.
The above suffices for the evaluation of B_1^{(N)}. We then compute

T_N = \bigl( I − B_1^{(N)} \bigr)^{-1} A_0,

which is an approximation to G incorporating all contributing paths in Ã
that do not involve level 2^N (or higher). The algorithm may be specified as
a short MATLAB program.
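One possible such program is sketched below. It follows the recursions just listed, holding the input blocks A_0, . . . , A_{2^N−1} in a cell array (so A{i+1} stores A_i) and treating any block beyond the supplied truncation as zero; the function and variable names are ours and the sketch is not claimed to reproduce the author's own implementation.

    function G = g_by_algorithm_H(A, N)
    % A is a cell array {A0, A1, ..., A_{2^N-1}} of k-by-k blocks, with
    % A{i+1} holding A_i.  Returns T_N, the Algorithm H approximation to G.
    k  = size(A{1}, 1);
    M  = 2^N;
    Ac = A;                               % A_n^{(0)}, n = 0..M-1  (Ac{n+1})
    Bc = cell(1, M);                      % B_n^{(0)}, n = 1..M    (Bc{n})
    for n = 1:M-1, Bc{n} = A{n+1}; end
    Bc{M} = zeros(k);                     % A_{2^N} not supplied: truncated to 0
    for j = 0:N-1
        half = 2^(N-j-1);                 % number of blocks needed at stage j+1
        K = cell(1, half);  L = cell(1, half);
        K{1} = inv(eye(k) - Ac{2});       % K_0 = (I - A_1^{(j)})^{-1}
        for n = 1:half-1
            S = zeros(k);
            for m = 0:n-1
                S = S + K{m+1} * Ac{2*(n-m)+2};     % A_{2(n-m)+1}^{(j)}
            end
            K{n+1} = S * K{1};
        end
        for n = 0:half-1
            S = zeros(k);
            for m = 0:n
                S = S + K{m+1} * Ac{2*(n-m)+1};     % A_{2(n-m)}^{(j)}
            end
            L{n+1} = S;
        end
        Anew = cell(1, half);  Bnew = cell(1, half);
        Anew{1} = Ac{1} * L{1};                     % A_0^{(j+1)}
        for n = 1:half-1
            SA = Ac{2*n};                           % A_{2n-1}^{(j)}
            for m = 0:n
                SA = SA + Ac{2*m+1} * L{n-m+1};
            end
            Anew{n+1} = SA;
        end
        for n = 1:half
            SB = Bc{2*n-1};                         % B_{2n-1}^{(j)}
            for m = 1:n
                SB = SB + Bc{2*m} * L{n-m+1};
            end
            Bnew{n} = SB;
        end
        Ac = Anew;  Bc = Bnew;
    end
    G = (eye(k) - Bc{1}) \ A{1};          % T_N = (I - B_1^{(N)})^{-1} A_0
    end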

9.5 Algorithm H̄: Preliminaries

We now consider an Algorithm H̄ oriented toward a different way of calculating
G. This involves two sequences (M_j)_{j≥1}, (N_j)_{j≥1} of censored processes,
all with levels −1, 0, 1, 2, . . .. The process M_j has block-structured one-step
transition matrix

\bar{P}^{(j)} = \begin{bmatrix}
      I               & 0               & 0               & 0               & \cdots \\
      \bar{A}_0^{(j)} & \bar{A}_1^{(j)} & \bar{A}_2^{(j)} & \bar{A}_3^{(j)} & \cdots \\
      0               & \bar{A}_0^{(j)} & \bar{A}_1^{(j)} & \bar{A}_2^{(j)} & \cdots \\
      \vdots          & \vdots          & \vdots          & \vdots          & \ddots
    \end{bmatrix}.

The process N_j has block-structured one-step transition matrix

\bar{Q}^{(j)} = \begin{bmatrix}
      I               & 0               & 0               & 0               & \cdots \\
      \bar{B}_0^{(j)} & 0               & \bar{B}_2^{(j)} & \bar{B}_3^{(j)} & \cdots \\
      0               & \bar{B}_0^{(j)} & 0               & \bar{B}_2^{(j)} & \cdots \\
      \vdots          & \vdots          & \vdots          & \vdots          & \ddots
    \end{bmatrix}.

These are set up recursively, beginning with M_1 = Ã, that is,

\bar{A}_i^{(1)} := A_i   \qquad (i ≥ 0).        (9.13)

We construct N_j by censoring M_j, observing it only when it is either in level
−1 or a change of level occurs. Thus for i ≠ 1,

\bar{B}_i^{(j)} = \bigl( I − \bar{A}_1^{(j)} \bigr)^{-1} \bar{A}_i^{(j)}.        (9.14)

We form M_{j+1} by censoring N_j, observing only the odd-labeled levels −1,
1, 3, 5, . . . , and then relabeling these as −1, 0, 1, 2, . . . . Thus level
ℓ ≥ −1 of M_{j+1} and N_{j+1} corresponds to level 2(ℓ + 1) − 1 of M_j and N_j. It
follows that level ℓ (≥ −1) of M_j and N_j corresponds to level (ℓ + 1)2^{j−1} − 1
of M_1 and N_1.
We derive the blocks of \bar{P}^{(j+1)} from those of \bar{Q}^{(j)} as follows. Let \bar{X}_t^{(j)},
\bar{Y}_t^{(j)} denote respectively the state and level of N_j at time t. Following the
procedure involved in Algorithm H, define for h a nonnegative integer

\bar{Ω}_{s,t,h}^{(j)} = \bigl\{ \bar{X}_t^{(j)} = (h, s),\; \bar{Y}_u^{(j)} − \bar{Y}_0^{(j)} \text{ is even } (0 < u < t) \bigr\}.

The matrices \bar{L}_n^{(j+1)} are then defined for n ≥ 0 by

\bigl( \bar{L}_n^{(j+1)} \bigr)_{r,s} := P\Bigl( \bigcup_{t>0} \bar{Ω}_{s,t,2ℓ+2n−1}^{(j)} \Bigm| \bar{X}_0^{(j)} = (2ℓ, r) \Bigr)

for r, s ∈ K. As before the right-hand side is independent of ℓ > 0. The matrix


\bar{L}_n^{(j+1)} plays a similar role to that of L_n^{(j+1)} for Algorithm H, only here the
trajectories utilize only even-labeled levels of N_j except for a final step to an
odd-labeled level.
Arguing as before, we derive

\bar{A}_n^{(j+1)} = \bar{B}_{2n−1}^{(j)} + \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n−m}^{(j+1)}   \qquad (n > 1)        (9.15)

with

\bar{A}_n^{(j+1)} = \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n−m}^{(j+1)}   \qquad (n = 0, 1).        (9.16)

The derivation is identical to that leading to (9.7) and (9.6). For n = 1 there is
no term corresponding to the first term on the right in (9.6) since the present
censoring requires \bar{B}_1^{(j)} := 0. The present censoring out of even-labeled, as
opposed to odd-labeled, levels means that no analogue to (9.8) is needed.
As before we now determine the matrices \bar{L}_n^{(j+1)} in terms of matrices
\bar{K}_n^{(j+1)}. We define
\bigl( \bar{K}_n^{(j+1)} \bigr)_{r,s} := P\Bigl( \bigcup_{t>0} \bar{Ω}_{s,t,2ℓ+2n}^{(j)} \Bigm| \bar{X}_0^{(j)} = (2ℓ, r) \Bigr)

for r, s ∈ K and n ≥ 1. As before the right-hand side is independent of ℓ ≥ 0.


By analogy with (9.10), we have since \bar{B}_1^{(j)} = 0 that

\bar{K}_0^{(j+1)} = \bigl( I − \bar{B}_1^{(j)} \bigr)^{-1} = I.

By analogy with (9.9), we have

\bar{L}_n^{(j+1)} = \sum_{m=0}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)}^{(j)}
                  = \bar{B}_{2n}^{(j)} + \sum_{m=1}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)}^{(j)}   \qquad (n ≥ 0).        (9.17)

For n = 0 we adopt the convention of an empty sum being interpreted as
zero.
Finally we have corresponding to (9.11) that

\bar{K}_n^{(j+1)} = \sum_{m=0}^{n−1} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)+1}^{(j)}
                  = \bar{B}_{2n+1}^{(j)} + \sum_{m=1}^{n−1} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)+1}^{(j)}   \qquad (n ≥ 1),        (9.18)

where again the empty sum for n = 1 is interpreted as zero. As with \bar{L}_n^{(j+1)},
the leading term on the right-hand side corresponds to a single-step transition
in N_j while the sum incorporates paths involving more than one step in N_j.
As with the computations involved in Algorithm H, \bar{B}_0^{(N)} can be calculated
in a finite number of steps. We may identify the relevant steps as follows.
We require initial input of A_0, A_1, . . . , A_{2^N−1}. First we specify \bar{A}_n^{(1)} = A_n
for n = 0, 1, . . . , 2^N − 1. We calculate

\bar{A}_0^{(j)}, \bar{A}_1^{(j)}, . . . , \bar{A}_{2^{N−j}−1}^{(j)},
\bar{B}_0^{(j)}, \bar{B}_2^{(j)}, \bar{B}_3^{(j)}, . . . , \bar{B}_{2^{N−j}−1}^{(j)}
recursively for j = 2, . . . , N as follows.
We have

\bar{B}_n^{(j)} = \bigl( I − \bar{A}_1^{(j)} \bigr)^{-1} \bar{A}_n^{(j)}   \qquad n = 0, 2, . . . , 2^{N−j} − 1.        (9.19)

To obtain the matrices

\bar{A}_0^{(j+1)}, \bar{A}_1^{(j+1)}, . . . , \bar{A}_{2^{N−j−1}−1}^{(j+1)},

first evaluate the auxiliary quantities

\bar{K}_n^{(j+1)} = \bar{B}_{2n+1}^{(j)} + \sum_{m=1}^{n−1} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)+1}^{(j)}        (9.20)

for n = 1, 2, . . . , 2^{N−j−1} − 1 and

\bar{L}_n^{(j+1)} = \bar{B}_{2n}^{(j)} + \sum_{m=1}^{n} \bar{K}_m^{(j+1)} \bar{B}_{2(n−m)}^{(j)}        (9.21)

for n = 0, 1, . . . , 2^{N−j−1} − 1, and then calculate

\bar{A}_n^{(j+1)} = \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n−m}^{(j+1)}   \qquad (n = 0, 1),        (9.22)

\bar{A}_n^{(j+1)} = \bar{B}_{2n−1}^{(j)} + \sum_{m=0}^{n} \bar{B}_{2m}^{(j)} \bar{L}_{n−m}^{(j+1)}   \qquad (n = 2, . . . , 2^{N−j−1} − 1).        (9.23)
We shall make use of this construction in subsequent sections.

9.6 H̄, G and convergence rates

For j ≥ 1, define \bar{M}^{(j)} by

\bigl( \bar{M}^{(j)} \bigr)_{r,s} := P\Bigl( \bigcup_{t≥0} Φ_{s,t}^{(j)} \Bigm| \bar{X}_0 = (0, r) \Bigr)        (9.24)

for r, s ∈ K, where

Φ_{s,t}^{(j)} = \bigl\{ \bar{X}_t = (2^{j−1} − 1, s),\; 0 ≤ \bar{Y}_u < 2^j − 1,\; \bar{Y}_u ≠ 2^{j−1} − 1\ (0 < u < t) \bigr\}.

We note that this gives

\bar{M}^{(1)} = I.        (9.25)

Also for j ≥ 1 we put

(V_j)_{r,s} = P\Bigl( \bigcup_{t>0} \bigl\{ \bar{X}_t = (−1, s),\; 2^{j−1} − 1 ≤ Y < 2^j − 1 \bigr\} \Bigm| \bar{X}_0 = (0, r) \Bigr)

for r, s ∈ K. Here Y = \max_{0≤u<t} \bar{Y}_u. We may interpret V_j as the contribution


to G from those trajectories which reach a level of at least 2^{j−1} − 1 but
achieve a maximum level less than 2^j − 1. By decomposing G according to
the maximum level reached by a trajectory contributing to it, we can thus
derive

G = \sum_{j=1}^{\infty} V_j.        (9.26)

For j = 1, we have also that

(V_1)_{r,s} := P\Bigl( \bigcup_{t>0} \bigl\{ \bar{X}_t = (−1, s),\; \bar{Y}_u = 0\ (0 ≤ u < t) \bigr\} \Bigm| \bar{X}_0 = (0, r) \Bigr)
             = \bigl( \bar{B}_0^{(1)} \bigr)_{r,s},

so that

V_1 = \bar{B}_0^{(1)}.        (9.27)
Proposition 9.6.1 For j ≥ 1, the matrices V_j, \bar{M}^{(j)} are related by

V_j = \bar{M}^{(j)} \bar{B}_0^{(j)}.        (9.28)

Proof. By (9.25) and (9.27), the result is immediate for j = 1, so suppose
j > 1. Since A is skip-free from above in levels, every trajectory contributing
to V_j must at some time pass through level 2^{j−1} − 1. Conditioning on the
first entry into level 2^{j−1} − 1, we have

(V_j)_{r,s} = \sum_{m \in K} \bigl( \bar{M}^{(j)} \bigr)_{r,m} P\Bigl( \bigcup_{t>0} Ψ_{s,t}^{(j)} \Bigm| \bar{X}_0 = (2^{j−1} − 1, m) \Bigr)
            = \sum_{m \in K} \bigl( \bar{M}^{(j)} \bigr)_{r,m} P\bigl( \bar{X}_1^{(j)} = (−1, s) \bigm| \bar{X}_0^{(j)} = (0, m) \bigr)
            = \bigl( \bar{M}^{(j)} \bar{B}_0^{(j)} \bigr)_{r,s},
where

Ψ_{s,t}^{(j)} = \bigl\{ \bar{X}_t = (−1, s),\; 0 ≤ \bar{Y}_u < 2^j − 1\ (0 < u < t) \bigr\},

giving the required result. □

Once a convenient recursion is set up for the determination of the matrix
\bar{M}^{(j)}, this may be used to set up Algorithm H̄ for approximating G by use
of (9.28). Iteration N of Algorithm H̄ gives the estimate

\bar{T}_N := \sum_{j=1}^{N+1} V_j        (9.29)
for G.
The estimate \bar{T}_N is the contribution to G (describing transitions from
level 0 to level −1) by paths which reach a level of at most 2^{N+1} − 2. The
estimate T_N from the first N iterations of Algorithm H is the contribution
from paths which reach a level of 2^{N+1} − 1 at most. Hence we have the
interlacing property

\bar{T}_1 ≤ T_1 ≤ \bar{T}_2 ≤ T_2 ≤ \bar{T}_3 ≤ . . .

connecting the successive approximations of G in Algorithms H and H̄. We
have \bar{T}_N ↑ G and T_N ↑ G as N → ∞.
The interlacing property yields

(G − \bar{T}_1)e ≥ (G − T_1)e ≥ (G − \bar{T}_2)e ≥ (G − T_2)e ≥ . . .

or

e − \bar{T}_1 e ≥ e − T_1 e ≥ e − \bar{T}_2 e ≥ e − T_2 e ≥ . . .

in the case of stochastic G.
The interlacing property need not carry over to other error measures such
as goodness of fit to the equation G = A(G). This will be further discussed
in the subsequent partner chapter in this volume.
Theorem 9.6.1 If A is transient and irreducible, then \bar{T}_N converges to G
quadratically as N → ∞.

Proof. If A is transient and irreducible, then the maximal eigenvalue of G
is numerically less than unity. We may choose a matrix norm such that
0 < ‖G‖ < ξ < 1. We have

\bigl( \bar{B}_0^{(j)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} Ψ_{s,t}^{(j)} \Bigm| \bar{X}_0 = (2^{j−1} − 1, r) \Bigr)
   ≤ P\Bigl( \bigcup_{t>0} \bigl\{ \bar{X}_t = (−1, s) \bigr\} \Bigm| \bar{X}_0 = (2^{j−1} − 1, r) \Bigr)
   = \bigl( G^{2^{j−1}} \bigr)_{r,s}.
Choose K ≥ 1 to be an upper bound for the norm of all substochastic
k × k matrices. (For some norms K equals 1 will suffice.) Then by (9.28)

‖V_j‖ ≤ K ‖\bar{B}_0^{(j)}‖ ≤ K ‖G^{2^{j−1}}‖ < K ξ^{2^{j−1}}.

Hence

\Bigl\| G − \sum_{j=1}^{N} V_j \Bigr\| ≤ \Bigl\| \sum_{j=N+1}^{\infty} V_j \Bigr\|
   ≤ \sum_{j=N+1}^{\infty} K ξ^{2^{j−1}}
   ≤ K ξ^{2^N} \sum_{ℓ=0}^{\infty} ξ^{2ℓ}
   < K ξ^{2^N} \big/ (1 − ξ^2),

whence the stated result.

Corollary 9.6.1 By the convergence result for \bar{T}_N and the interlacing property,
Algorithm H also converges to G quadratically when A is transient and
irreducible.
Remark 1. Similarly to the argument for \bar{B}_0^{(j)}, we have that

\bigl( \bar{B}_2^{(j)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} Λ_{s,t}^{(j)} \Bigm| \bar{X}_0 = (2^{j−1} − 1, r) \Bigr),        (9.30)

where

Λ_{s,t}^{(j)} = \bigl\{ \bar{X}_t = (2^j − 1, s),\; 0 ≤ \bar{Y}_u < 3 \cdot 2^{j−1} − 1\ (0 ≤ u < t) \bigr\}.

We shall make use of (9.30) in the next section.


By the interlacing property, an implementation of Algorithm H would,
for the same number of iterations, be no more accurate than Algorithm H.
(j)
The computation of the auxiliary matrices M appears to be in general
quite complicated, so Algorithm H offers no special advantages. However it
does have theoretical interest, as with the interlacing property shown above
and the consequent information about the convergence rate of Algorithm
H. We shall see in the next section that in the special case of a QBD, Al-
gorithm H reduces to the logarithmic-reduction algorithm of Latouche and
Ramaswami [14].

9.7 A special case: The QBD

There is considerable simplification to Algorithm H̄ in the special case of a
QBD, arising from the fact that this is skip-free in levels both from above
and from below. We now investigate this situation.
Because \bar{A}_n^{(j)} = 0 for n > 2, (9.20) gives \bar{K}_n^{(j+1)} = 0 for n > 0. We have
already seen that \bar{K}_0^{(j+1)} = I. Relation (9.21) now provides

\bar{L}_0^{(j+1)} = \bar{B}_0^{(j)}, \qquad \bar{L}_1^{(j+1)} = \bar{B}_2^{(j)} \qquad \text{and} \qquad \bar{L}_n^{(j+1)} = 0 \text{ for } n > 1.

Equations (9.22) and (9.23) consequently yield

\bar{A}_n^{(j+1)} = \bigl( \bar{B}_n^{(j)} \bigr)^2   \qquad \text{for } n = 0, 2,        (9.31)

\bar{A}_1^{(j+1)} = \bar{B}_0^{(j)} \bar{B}_2^{(j)} + \bar{B}_2^{(j)} \bar{B}_0^{(j)}.        (9.32)

The relations (9.19) give

\bar{B}_n^{(j)} = \bigl( I − \bar{A}_1^{(j)} \bigr)^{-1} \bar{A}_n^{(j)}   \qquad \text{for } n = 0, 2.        (9.33)

Equations (9.31)–(9.33) are simply the familiar defining relations for Algorithm
LR. We now turn our attention to the matrix \bar{M}^{(j)}.
For a QBD with j > 1,

\bigl( \bar{M}^{(j)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} χ_{s,t}^{(j)} \Bigm| \bar{X}_0 = (0, r) \Bigr)

where

χ_{s,t}^{(j)} = \bigl\{ \bar{X}_t = (2^{j−1} − 1, s),\; 0 ≤ \bar{Y}_u < 2^{j−1} − 1\ (0 < u < t) \bigr\}

and so

\bigl( \bar{M}^{(j+1)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} χ_{s,t}^{(j+1)} \Bigm| \bar{X}_0 = (0, r) \Bigr).

Thus in particular

\bigl( \bar{M}^{(2)} \bigr)_{r,s} = P\Bigl( \bigcup_{t>0} \bigl\{ \bar{X}_t = (1, s),\; \bar{Y}_u = 0\ (0 < u < t) \bigr\} \Bigm| \bar{X}_0 = (0, r) \Bigr)
   = P\bigl( \bar{X}_1 = (1, s) \bigm| \bar{X}_0 = (0, r) \bigr)
   = \bigl( \bar{B}_2^{(1)} \bigr)_{r,s},

so that

\bar{M}^{(2)} = \bar{B}_2^{(1)}.        (9.34)
For j ≥ 2, we derive by conditioning on the first passage of N_1 to level
2^{j−1} − 1 that
\bigl( \bar{M}^{(j+1)} \bigr)_{r,s}
   = \sum_{m \in K} P\Bigl( \bigcup_{t>0} χ_{m,t}^{(j)} \Bigm| \bar{X}_0 = (0, r) \Bigr) \times P\Bigl( \bigcup_{v>0} Υ_{s,t,v}^{(j)} \Bigm| \bar{X}_t = (2^{j−1} − 1, m) \Bigr)
   = \sum_{m \in K} \bigl( \bar{M}^{(j)} \bigr)_{r,m} \bigl( \bar{B}_2^{(j)} \bigr)_{m,s}
   = \bigl( \bar{M}^{(j)} \bar{B}_2^{(j)} \bigr)_{r,s},

where

Υ_{s,t,v}^{(j)} = \bigl\{ \bar{X}_{t+v} = (2^j − 1, s),\; 0 ≤ \bar{Y}_u < 2^j − 1\ (t < u < t + v) \bigr\}.

Thus

\bar{M}^{(j+1)} = \bar{M}^{(j)} \bar{B}_2^{(j)}

and so for j > 2

\bar{M}^{(j+1)} = \bar{M}^{(2)} \bar{B}_2^{(2)} \cdots \bar{B}_2^{(j)} = \bar{B}_2^{(1)} \bar{B}_2^{(2)} \cdots \bar{B}_2^{(j)}.

Taking this result with (9.34) yields

\bar{M}^{(j)} = \bar{B}_2^{(1)} \cdots \bar{B}_2^{(j−1)}   \qquad \text{for } j ≥ 2.

Finally, Proposition 9.6.1 provides

V_j = \begin{cases} \bar{B}_0^{(1)} & \text{for } j = 1, \\ \bar{B}_2^{(1)} \cdots \bar{B}_2^{(j−1)} \bar{B}_0^{(j)} & \text{for } j > 1. \end{cases}

With this evaluation for V_j, (9.29) is the formula employed in Algorithm LR
for calculating approximations to G. Thus Algorithm H̄ reduces to Algorithm
LR in the case of a QBD.
In the case of a QBD, some simplifications occur also in Algorithm H.
Since A_n^{(j)} = 0 for n > 2, we have K_n^{(j+1)} = 0 for n > 0 and so

L_0^{(j+1)} = K_0^{(j+1)} A_0^{(j)},
L_1^{(j+1)} = K_0^{(j+1)} A_2^{(j)},
L_n^{(j+1)} = 0   \quad \text{for } n > 1.

Also B_n^{(j)} = 0 for n > 2.
The relations linking A_{j+1} and A_j are thus

A_i^{(j+1)} = A_i^{(j)} K_0^{(j+1)} A_i^{(j)}   \qquad (i = 0, 2),
A_1^{(j+1)} = A_1^{(j)} + A_0^{(j)} K_0^{(j+1)} A_2^{(j)} + A_2^{(j)} K_0^{(j+1)} A_0^{(j)},
B_1^{(j+1)} = B_1^{(j)} + B_2^{(j)} K_0^{(j+1)} A_0^{(j)},
B_2^{(j+1)} = B_2^{(j)} K_0^{(j+1)} A_2^{(j)}.

The initialization is

A_i^{(0)} = A_i   (i = 0, 1, 2), \qquad B_i^{(0)} = A_i   (i = 1, 2).

As a result, Algorithm H can, in the QBD case, be run in a very similar
way to Algorithm LR. The censorings and algebraic detail are, however, quite
different.
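For a QBD the recursion involves only a handful of k × k blocks, and one way to code it is sketched below; the names, loop structure and final solve are our own choices, made only to mirror the relations just displayed.

    function G = g_qbd_algorithm_H(A0, A1, A2, N)
    % Algorithm H specialized to a QBD, following the recursions above.
    % A0, A1, A2 are the k-by-k blocks of the QBD and N the number of
    % censoring steps; the value returned is T_N = (I - B_1^{(N)})^{-1} A_0.
    k   = size(A0, 1);
    A0j = A0;  A1j = A1;  A2j = A2;       % A_i^{(0)} = A_i
    B1  = A1;  B2  = A2;                  % B_i^{(0)} = A_i (i = 1, 2)
    for j = 0:N-1
        K0  = inv(eye(k) - A1j);          % K_0^{(j+1)}
        B1  = B1 + B2 * K0 * A0j;         % B_1^{(j+1)}
        B2  = B2 * K0 * A2j;              % B_2^{(j+1)}
        A1n = A1j + A0j * K0 * A2j + A2j * K0 * A0j;
        A0j = A0j * K0 * A0j;             % A_0^{(j+1)}
        A2j = A2j * K0 * A2j;             % A_2^{(j+1)}
        A1j = A1n;                        % A_1^{(j+1)}
    end
    G = (eye(k) - B1) \ A0;               % T_N, the approximation to G
    end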
We programmed the LR Algorithm and ran it and Algorithm H on an
example given in [7] and subsequently in [1] and [2].

Example 5. Latouche and Ramaswami's pure-birth/pure-death process.
This example is a QBD with

A_0 = \begin{bmatrix} 1−p & 0 \\ 0 & 0 \end{bmatrix}, \qquad
A_1 = \begin{bmatrix} 0 & p \\ 2p & 0 \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1−2p \end{bmatrix}.

We chose p equal to 0.1.


In presenting results we employ G_I as a generic notation for the approximation
to G after I iterations with the algorithms involved, viz., \bar{T}_I and T_I
in the present case.
The results in Table 9.1 have errors

0.5672 > 0.4800 > 0.3025 > 0.2619 > · · · > 4.9960e−14 > 4.3854e−14,

illustrating well the interlacing property.

Table 9.1 The interlacing property

                       LR                                   H
Iteration I   ‖e − G_I e‖_∞   CPU Time (s)      ‖e − G_I e‖_∞   CPU Time (s)
    1         0.5672          0.001             0.4800          0.000
    2         0.3025          0.002             0.2619          0.000
    3         0.1027          0.003             0.0905          0.004
    4         0.0145          0.007             0.0130          0.007
    5         3.3283e-04      0.009             2.9585e-04      0.009
    6         1.7715e-07      0.011             1.5747e-07      0.010
    7         4.9960e-14      0.012             4.3854e-14      0.010
9.8 Algorithms CR and H

We now consider the relation between Algorithm H and Bini and Meini's
Cyclic Reduction Algorithm CR. The latter is carried out in terms of formal
power series, so to make a connection we need to express Algorithm H in
these terms, too. For j ≥ 0, we define

ψ^{(j)}(z) := \sum_{n=0}^{\infty} A_n^{(j)} z^n,  \qquad  φ^{(j)}(z) := \sum_{n=0}^{\infty} B_{n+1}^{(j)} z^n.

We remark that since A_j is substochastic, these series are absolutely convergent
for |z| ≤ 1. We encapsulate the odd- and even-labeled coefficients in the
further generating functions

ψ_e^{(j)}(z) := \sum_{n=0}^{\infty} A_{2n}^{(j)} z^n,   \qquad ψ_o^{(j)}(z) := \sum_{n=0}^{\infty} A_{2n+1}^{(j)} z^n,

φ_e^{(j)}(z) := \sum_{n=0}^{\infty} B_{2(n+1)}^{(j)} z^n,   \qquad φ_o^{(j)}(z) := \sum_{n=0}^{\infty} B_{2n+1}^{(j)} z^n.

Again, these power series are all absolutely convergent for |z| ≤ 1.
We introduce

L^{(j+1)}(z) := \sum_{n=0}^{\infty} L_n^{(j+1)} z^n,  \qquad  K^{(j+1)}(z) := \sum_{n=0}^{\infty} K_n^{(j+1)} z^n.

Multiplication of (9.6) by z^n, summing over n ≥ 1 and adding (9.7) provides

ψ^{(j+1)}(z) = \sum_{n=1}^{\infty} A_{2n−1}^{(j)} z^n + \sum_{m=0}^{\infty} A_{2m}^{(j)} z^m \sum_{n=0}^{\infty} L_n^{(j+1)} z^n
             = z ψ_o^{(j)}(z) + ψ_e^{(j)}(z) L^{(j+1)}(z).        (9.35)

Similarly (9.8) gives

φ^{(j+1)}(z) = \sum_{n=1}^{\infty} B_{2n−1}^{(j)} z^{n−1} + \sum_{m=1}^{\infty} B_{2m}^{(j)} z^m \sum_{n=0}^{\infty} L_n^{(j+1)} z^n
             = φ_o^{(j)}(z) + z φ_e^{(j)}(z) L^{(j+1)}(z).        (9.36)
Forming generating functions from (9.9) in the same way leads to

L^{(j+1)}(z) = K^{(j+1)}(z) ψ_e^{(j)}(z),        (9.37)

while from (9.10) and (9.11) we derive

K^{(j+1)}(z) = K_0^{(j+1)} + \sum_{n=1}^{\infty} z^n \sum_{m=0}^{n−1} K_m^{(j+1)} A_{2(n−m)+1}^{(j)} K_0^{(j+1)}
   = K_0^{(j+1)} + \sum_{m=0}^{\infty} K_m^{(j+1)} \sum_{n=m+1}^{\infty} z^n A_{2(n−m)+1}^{(j)} K_0^{(j+1)}
   = K_0^{(j+1)} + \sum_{m=0}^{\infty} K_m^{(j+1)} z^m \sum_{ℓ=1}^{\infty} z^ℓ A_{2ℓ+1}^{(j)} K_0^{(j+1)}
   = K_0^{(j+1)} + K^{(j+1)}(z) \sum_{ℓ=1}^{\infty} z^ℓ A_{2ℓ+1}^{(j)} K_0^{(j+1)}
   = K_0^{(j+1)} + K^{(j+1)}(z) \bigl( ψ_o^{(j)}(z) − A_1^{(j)} \bigr) K_0^{(j+1)}
   = K_0^{(j+1)} + K^{(j+1)}(z) \bigl( ψ_o^{(j)}(z) − I \bigr) K_0^{(j+1)}
     + K^{(j+1)}(z) \bigl( I − A_1^{(j)} \bigr) K_0^{(j+1)}.

By (9.10) the last term on the right simplifies to K^{(j+1)}(z). Hence we have

K_0^{(j+1)} = K^{(j+1)}(z) \bigl( I − ψ_o^{(j)}(z) \bigr) K_0^{(j+1)}.

Postmultiplication by I − A_1^{(j)} = [K_0^{(j+1)}]^{-1} yields

I = K^{(j+1)}(z) \bigl( I − ψ_o^{(j)}(z) \bigr),

so that

K^{(j+1)}(z) = \bigl( I − ψ_o^{(j)}(z) \bigr)^{-1}.

Hence we have from (9.37) that

L^{(j+1)}(z) = \bigl( I − ψ_o^{(j)}(z) \bigr)^{-1} ψ_e^{(j)}(z).

We now substitute for L^{(j+1)}(z) in (9.35) and (9.36) to obtain

φ^{(j+1)}(z) = φ_o^{(j)}(z) + z φ_e^{(j)}(z) \bigl( I − ψ_o^{(j)}(z) \bigr)^{-1} ψ_e^{(j)}(z)        (9.38)

and

ψ^{(j+1)}(z) = z ψ_o^{(j)}(z) + ψ_e^{(j)}(z) \bigl( I − ψ_o^{(j)}(z) \bigr)^{-1} ψ_e^{(j)}(z).        (9.39)

We have also that

ψ^{(0)}(z) = \sum_{n=0}^{\infty} A_n z^n        (9.40)

and

φ^{(0)}(z) = \sum_{n=0}^{\infty} A_{n+1} z^n.        (9.41)

The recursive relations (9.38)–(9.41) are precisely the generating functions


used in CR (see [5]). Thus Algorithm H is equivalent to the cyclic reduction
procedure whenever the latter is applicable.
The formulation of Algorithm CR of Bini and Meini that we derived above
is the simpler of two versions given in [5]. Bini and Meini have developed the
theme of [5] in this and a number of associated works (see, for example,
[3]–[10]).
The proofs in [5] are more complicated than those we use to establish
Algorithm H. Furthermore, their treatment employs a number of assumptions
that we have not found necessary. The most notable of these are restrictions
as to the M/G/1–type Markov chains to which the results apply. Like us,
they assume A is irreducible. However they require also that A be positive
recurrent and that the matrix G be irreducible and aperiodic.
Further conditions imposed later in the proofs are less straightforward.
Several alternative possibilities are proposed which can be used to lead to
desired results. These are:
(a) that the matrix \bigl( I − \sum_{i=1}^{\infty} A_i^{(j)} \bigr)^{-1} is bounded above;
(b) that the limit \lim_{j→∞} P^{(j)} exists and is the one-step transition
    matrix of a positive recurrent Markov chain;
(c) that the matrix A_1^{(j)} is irreducible for some j;
(d) that the matrices \sum_{i=1}^{\infty} A_i^{(j)} are irreducible for every j and do not
    converge to a reducible matrix.

References

1. N. Akar, N. C. Oǧuz & K. Sohraby, “TELPACK: An advanced TELetraffic analysis


PACKage,” IEEE Infocom ’97. http://www.cstp.umkc.edu/personal/akar/home.html
2. N. Akar, N. C. Oǧuz & K. Sohraby, An overview of TELPACK IEEE Commun. Mag.
36 (8) (1998), 84–87.
3. D. Bini and B. Meini, On cyclic reduction applied to a class of Toeplitz–like matrices
arising in queueing problems, in Proc. 2nd Intern. Workshop on Numerical Solution
of Markov Chains, Raleigh, North Carolina (1995), 21–38.
4. D. Bini and B. Meini, Exploiting the Toeplitz structure in certain queueing problems,
Calcolo 33 (1996), 289–305.
5. D. Bini and B. Meini, On the solution of a non–linear matrix equation arising in
queueing problems, SIAM J. Matrix Anal. Applic. 17 (1996), 906–926.
6. D. Bini and B. Meini, On cyclic reduction applied to a class of Toeplitz–like matrices
arising in queueing problems, in Computations with Markov Chains, Ed. W. J. Stewart,
Kluwer, Dordrecht (1996) 21–38.
7. D. Bini and B. Meini, Improved cyclic reduction for solving queueing problems, Nu-
merical Algorithms 15 (1997), 57–74.
8. D. A. Bini and B. Meini, Using displacement structure for solving non–skip–free
M/G/1 type Markov chains, in Advances in Matrix Analytic Methods for Stochas-
tic Models, Eds A. S. Alfa and S. R. Chakravarthy, Notable Publications, Neshanic
Station, NJ (1998), 17–37.
9. D. Bini and B. Meini, Solving certain queueing problems modelled by Toeplitz matrices,
Calcolo 30 (1999), 395–420.
10. D. Bini and B. Meini, Fast algorithms for structured problems with applications to
Markov chains and queueing models, Fast Reliable Methods for Matrices with Struc-
ture, Eds T. Kailath and A. Sayed, SIAM, Philadelphia (1999), 211–243.
11. E. Hunt, A probabilistic algorithm for determining the fundamental matrix of a block
M/G/1 Markov chain, Math. & Comput. Modelling 38 (2003), 1203–1209.
12. G. Latouche, Algorithms for evaluating the matrix G in Markov chains of P H/G/1
type, Bellcore Tech. Report (1992)
13. G. Latouche, Newton’s iteration for non–linear equations in Markov chains, IMA J.
Numer. Anal. 14 (1994), 583–598.
14. G. Latouche and V. Ramaswami, A logarithmic reduction algorithm for Quasi–Birth–
Death processes, J. Appl. Prob. 30 (1993), 650–674.
15. G. Latouche and G. W. Stewart, Numerical methods for M/G/1 type queues, in Proc.
Second Int. Workshop on Num. Solution of Markov Chains, Raleigh NC (1995), 571–
581.
16. B. Meini, Solving M/G/1 type Markov chains: recent advances and applications,
Comm. Statist.– Stoch. Models 14 (1998), 479–496.
17. B. Meini, Solving QBD problems: the cyclic reduction algorithm versus the invariant
subspace method, Adv. Performance Anal. 1 (1998), 215–225.
18. M. F. Neuts, Structured Stochastic Matrices of M/G/1 Type and Their Applications,
Marcel Dekker, New York (1989).
19. V. Ramaswami, Nonlinear matrix equations in applied probability – solution tech-
niques and open problems, SIAM Review 30 (1988), 256–263.
20. V. Ramaswami, A stable recursion for the steady state vector in Markov chains of
M/G/1 type, Stoch. Models 4 (1988), 183–188.
Chapter 10
A comparison of probabilistic
and invariant subspace methods
for the block M /G/1 Markov chain

Emma Hunt
School of Mathematical Sciences & School of Economics, The University of Adelaide,
Adelaide SA 5005, AUSTRALIA
e-mail: [email protected]

Abstract A suite of numerical experiments is used to compare Algorithm H
and other probability-based algorithms with invariant subspace methods for
determining the fundamental matrix of an M/G/1–type Markov chain.

Key words: Block M/G/1 Markov chain, fundamental matrix, invariant
subspace methods, probabilistic algorithms, Algorithm H

10.1 Introduction

In a preceding chapter in this volume, we discussed the structure of a new


probabilistic Algorithm H for the determination of the fundamental matrix
G of a block M/G/1 Markov chain. We assume familiarity with the ideas
and notation of that chapter. In the current chapter we take a numerical
standpoint and compare Algorithm H with other, earlier, probability-based
algorithms and with an invariant subspace approach.
The last-mentioned was proposed recently by Akar and Sohraby [4] for
determining the fundamental matrix G of an M/G/1–type Markov chain or
the rate matrix R of a GI/M/1–type Markov chain. Their approach applies
only for special subclasses of chains. For the M/G/1 case this is where


A(z) = \sum_{i=0}^{\infty} A_i z^i
is irreducible for 0 < z ≤ 1 and is a rational function of z. The analysis


can then be conducted in terms of solving a matrix polynomial equation

of finite degree. Following its originators, we shall refer to this technique


as TELPACK. It is important to note that TELPACK applies only in the
positive recurrent case.
It is natural to have high hopes for such an approach, since it exploits spe-
cial structure and circumvents the necessity for truncations being made to
the sequence (Ak )k≥0 . The solution to the polynomial problem is effected via
a so-called invariant subspace approach. The invariant subspace approach
is one that has been extensively used over the past 20 years for attacking
an important problem in control theory, that of solving the algebraic Ric-
cati equation. This has been the object of intense study and many solution
variants and refinements exist.
A further treatment relating to the M/G/1–type chain has been given
by Gail, Hantler and Taylor [6]. Akar, Oǧuz and Sohraby [3] also treat the
finite quasi-birth-and-death process by employing either Schur decomposition
or matrix-sign-function iteration to find bases for left- and right-invariant
subspaces.
In connection with a demonstration of the strength of the invariant sub-
space method, Akar, Oǧuz and Sohraby [1], [2] made available a suite of
examples of structured M/G/1 and GI/M/1 Markov chains which may be
regarded as standard benchmarks. These formed part of a downloadable pack-
age, including C code implementations of the invariant subspace approach,
which was until very recently available from Khosrow Sohraby’s home page at
http://www.cstp.umkc.edu/org/tn/telpack/home.html. This site no longer
exists.
Section 2 addresses some issues for error measures used for stopping rules
for iterative algorithms, in preparation for numerical experiments. In Sec-
tion 3 we perform numerical experiments, drawing on a suite of TELPACK
M/G/1 examples and a benchmark problem of Daigle and Lucantoni. Our
experiments illustrate a variety of points and provide some surprises. We
could find no examples in the literature for which A(z) is not rational, so
have supplied an original example.

10.2 Error measures

In [8], Meini noted that, in the absence of an analysis of numerical stability,
the common error measure

‖e − G_I e‖_∞        (10.1)

applied when G is stochastic may not be appropriate for TELPACK. She
proposed instead the measure

‖G_I − A(G_I)‖_∞,        (10.2)

which is also appropriate in the case of substochastic G.


We now note that a closer approximation to G can on occasion give rise
to a worse error as measured by (10.2). That is, it can happen that there are
substochastic matrices G_0, G_1 simultaneously satisfying

‖G − G_1‖_∞ < ‖G − G_0‖_∞,        (10.3)

and in fact 0 ≤ G_0 ≤ G_1 ≤ G, but with

‖G_1 − A(G_1)‖_∞ > ‖G_0 − A(G_0)‖_∞.        (10.4)

We shall make use of the QBD given by

A_0 = \begin{bmatrix} 1−p & 0 \\ 0 & 0 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 0 & p \\ rp & 0 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1−rp \end{bmatrix}        (10.5)

with

r ≥ 1 \quad \text{and} \quad 0 < p < 1/r.        (10.6)
This is an extension of the pure-birth/pure-death process of Latouche and
Ramaswami [7]. With these parameter choices, the QBD is irreducible. It is
null recurrent for r = 1 and positive recurrent for r > 1, with fundamental
matrix

G = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}.

Also for any matrix

G_I = \begin{bmatrix} x & 0 \\ y & 0 \end{bmatrix} \quad \text{with } 0 ≤ x, y ≤ 1,        (10.7)

we have

A(G_I) = \begin{bmatrix} 1 − p + py & 0 \\ rpx + (1 − rp)xy & 0 \end{bmatrix}.

Take r = 1 and p = 1/2 and put

G_0 = \begin{bmatrix} 0.5 & 0 \\ 0.5 & 0 \end{bmatrix}, \qquad
G_1 = \begin{bmatrix} 0.6 & 0 \\ 0.9 & 0 \end{bmatrix}.

We have

‖G − G_1‖_∞ = 0.4 < 0.5 = ‖G − G_0‖_∞

and so (10.3) holds. Also

A(G_0) = \begin{bmatrix} 0.75 & 0 \\ 0.375 & 0 \end{bmatrix}, \quad \text{so that } ‖G_0 − A(G_0)‖_∞ = 0.25,
and

A(G_1) = \begin{bmatrix} 0.95 & 0 \\ 0.57 & 0 \end{bmatrix}, \quad \text{so that } ‖G_1 − A(G_1)‖_∞ = 0.35.

We thus have (10.4) as desired.


Further

‖G − G_I‖_∞ = ‖e − G_I e‖_∞

for G_I of the form (10.7), so that we have also an example in which (10.4)
and

‖e − G_1 e‖_∞ < ‖e − G_0 e‖_∞        (10.8)

hold simultaneously.
The inequalities (10.3) and (10.4) or (10.4) and (10.8) also occur simulta-
neously for the same choices of G0 and G1 when we take the QBD given by
(10.5) with r = 2 and p = 0.4.
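These claims are easy to check numerically. The short MATLAB fragment below is our own illustration (with A(·) built from the three QBD blocks); it reproduces the figures 0.25 and 0.35 above and can be rerun with r = 2, p = 0.4.

    % Verify the error-measure counterexample for the QBD (10.5).
    r = 1;  p = 0.5;                              % also try r = 2, p = 0.4
    A0 = [1-p 0; 0 0];  A1 = [0 p; r*p 0];  A2 = [0 0; 0 1-r*p];
    A  = @(X) A0 + A1*X + A2*X^2;                 % A(G_I) for a QBD
    G  = [1 0; 1 0];                              % exact fundamental matrix
    G0 = [0.5 0; 0.5 0];  G1 = [0.6 0; 0.9 0];
    fprintf('||G-G0|| = %.4f   ||G-G1|| = %.4f\n', ...
            norm(G-G0, inf), norm(G-G1, inf));
    fprintf('||G0-A(G0)|| = %.4f   ||G1-A(G1)|| = %.4f\n', ...
            norm(G0-A(G0), inf), norm(G1-A(G1), inf));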
In the above examples, the two nonzero entries in Gi − A(Gi ) (i = 0, 1),
that is, those in the leading column, have opposite sign. This is a particular
instance of a general phenomenon with QBDs given by (10.5) and (10.6) and
GI of the form (10.7).
The general result referred to is as follows. Suppose GI is of the form (10.7)
with y < 1, x ≤ 1 and x < (1 − p)/(1 − rp). We have

(1 − p)(1 − y) > x(1 − rp)(1 − y)

or
x − [1 − p + py] < [rpx + (1 − rp)xy] − y,
so that
Θ1 < −Θ2 ,
where
Θi := [GI − A(GI )]i,1 (i = 1, 2).
If Θ2 ≥ 0, then Θ1 < 0. Conversely, if Θ1 ≥ 0, then Θ2 < 0. In particular,
if Θ1 and Θ2 are both nonzero, then they are of opposite sign. This sort of
behavior does not appear to have been reported previously.
It is also worthy of note that, because of its use of rational functions, no
truncations need be involved in the computation of the error measure on the
left in (10.2) with the use of TELPACK.

10.3 Numerical experiments

We now consider some numerical experiments testing Algorithm H. As previ-


ously noted, all outputs designated as TELPACK have been obtained running
the C program downloaded from Khosrow Sohraby’s website. All other code
has been implemented by us in MATLAB.
The following experiments illustrate a variety of issues.

10.3.1 Experiment G1

Our first experiment is drawn from the suite of TELPACK M/G/1 examples.
We use

A_i = \begin{bmatrix}
        \frac{1}{11^{i+1}} & \frac{2}{11^{i+1}} & \frac{7}{11^{i+1}} \\[2pt]
        \frac{18}{21}\bigl(\frac{10}{21}\bigr)^i & \frac{1}{21}\bigl(\frac{10}{21}\bigr)^i & \frac{1}{21}\bigl(\frac{10}{21}\bigr)^i \\[2pt]
        \frac{9}{40}\bigl(\frac{4}{7}\bigr)^{i+1} & \frac{9}{40}\bigl(\frac{4}{7}\bigr)^{i+1} & \frac{12}{40}\bigl(\frac{4}{7}\bigr)^{i+1}
      \end{bmatrix}

for i ≥ 0. This gives

A(z) = \begin{bmatrix}
         \frac{1}{11}(1 − \frac{z}{11})^{-1} & \frac{2}{11}(1 − \frac{z}{11})^{-1} & \frac{7}{11}(1 − \frac{z}{11})^{-1} \\[2pt]
         \frac{18}{21}(1 − \frac{10z}{21})^{-1} & \frac{1}{21}(1 − \frac{10z}{21})^{-1} & \frac{1}{21}(1 − \frac{10z}{21})^{-1} \\[2pt]
         \frac{9}{70}(1 − \frac{4z}{7})^{-1} & \frac{9}{70}(1 − \frac{4z}{7})^{-1} & \frac{12}{70}(1 − \frac{4z}{7})^{-1}
       \end{bmatrix}

     = \mathrm{diag}\Bigl( (1 − \tfrac{z}{11})^{-1}, (1 − \tfrac{10z}{21})^{-1}, (1 − \tfrac{4z}{7})^{-1} \Bigr)
       \begin{bmatrix} \frac{1}{11} & \frac{2}{11} & \frac{7}{11} \\ \frac{18}{21} & \frac{1}{21} & \frac{1}{21} \\ \frac{9}{70} & \frac{9}{70} & \frac{12}{70} \end{bmatrix}

     = \Bigl[ I − z\,\mathrm{diag}\Bigl( \frac{1}{11}, \frac{10}{21}, \frac{4}{7} \Bigr) \Bigr]^{-1}
       \begin{bmatrix} \frac{1}{11} & \frac{2}{11} & \frac{7}{11} \\ \frac{18}{21} & \frac{1}{21} & \frac{1}{21} \\ \frac{9}{70} & \frac{9}{70} & \frac{12}{70} \end{bmatrix}.

This example provides the simplest form of rational A(z) for which every
Ai has nonzero elements and can be expected to favor TELPACK.
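Since every A_i here is a diagonal scaling of a single matrix, a truncated block sequence for the iterative algorithms is easy to generate. The fragment below is our own illustration; the truncation length m is an arbitrary choice, and blocks beyond it are simply dropped.

    % Build a truncated block sequence {A_0, ..., A_{m-1}} for Experiment G1.
    m = 60;
    B = [ 1/11  2/11  7/11 ;
         18/21  1/21  1/21 ;
          9/70  9/70 12/70 ];
    d = [1/11; 10/21; 4/7];              % geometric ratios, one per row
    A = cell(1, m);
    for i = 0:m-1
        A{i+1} = diag(d.^i) * B;         % A_i = diag(d)^i * B
    end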
Using the stopping criterion

‖(G_I − G_{I−1})e‖ < ε        (10.9)

with ε = 10^{−8}, TELPACK converges in 7 iterations, Algorithm H in 5. In
Table 10.1 we include also, for comparison, details of the performance of
Algorithm H for 6 iterations.
Table 10.1 we include also, for comparison, details of the performance of
Algorithm H for 6 iterations.
Table 10.1 Experiment G1

          TELPACK                                  H
I    ‖e − G_I e‖_∞    CPU Time (s)      I    ‖e − G_I e‖_∞    CPU Time (s)
7    9.9920e-16       0.070             5    1.0617e-08       0.001
                                        6    5.5511e-16       0.001

The use of the stopping criterion (10.9) is fixed in TELPACK. We have


therefore employed it wherever TELPACK is one of the algorithms being
compared.

10.3.2 Experiment G2

We now consider a further numerical experiment reported by Akar and
Sohraby in [4]. The example we take up has

A(z) = \begin{bmatrix} 0.0002 & 0.9998 \\ 0.9800 & 0.0200 \end{bmatrix}
       \begin{bmatrix} 1 & 0 \\ 0 & \frac{1−a}{1−az} \end{bmatrix}.

The parameter a is varied to obtain various values of the traffic intensity ρ,
with the latter defined by

ρ = x A'(1) e,

where x is the invariant probability measure of the stochastic matrix A(1).
See Neuts [9, Theorem 2.3.1 and Equation (3.1.1)].
In Tables 10.2 and 10.3 we compare the numbers of iterations required
to calculate the fundamental matrix G to a precision of 10−12 or better
(using (10.9) as a stopping criterion) with several algorithms occurring in
the literature. The Neuts Algorithm is drawn from [9] and is based on the
iterative relations G(0) = 0 with
G(j + 1) = (I − A_1)^{-1} \Bigl( A_0 + \sum_{i=2}^{\infty} A_i (G(j))^i \Bigr)   \quad \text{for } j ≥ 0.        (10.10)

The Ramaswami Algorithm [10] is based similarly on G(0) = 0 with

G(j + 1) = \Bigl( I − \sum_{i=1}^{\infty} A_i (G(j))^{i−1} \Bigr)^{-1} A_0   \quad \text{for } j ≥ 0.        (10.11)
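Both recursions are easy to code once the block sequence is truncated; the MATLAB sketch below is our own illustration (names, truncation and tolerance included) and is not the TELPACK or benchmark implementation.

    function G = g_by_neuts_or_ramaswami(A, method, tol, maxit)
    % A is a cell array {A0, A1, ..., Am}; method is 'neuts' or 'ramaswami'.
    k = size(A{1}, 1);  m = numel(A) - 1;
    G = zeros(k);
    for it = 1:maxit
        P = cell(1, m+1);  P{1} = eye(k);          % powers of the iterate
        for i = 1:m, P{i+1} = P{i} * G; end
        if strcmp(method, 'neuts')                 % recursion (10.10)
            S = zeros(k);
            for i = 2:m, S = S + A{i+1} * P{i+1}; end
            Gnew = (eye(k) - A{2}) \ (A{1} + S);
        else                                       % recursion (10.11)
            S = zeros(k);
            for i = 1:m, S = S + A{i+1} * P{i}; end
            Gnew = (eye(k) - S) \ A{1};
        end
        if norm(Gnew - G, inf) < tol, G = Gnew; return; end
        G = Gnew;
    end
    end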

As noted previously, TELPACK is designed for sequences (Ai ) for which the
generating function A(z) is, for |z| ≤ 1, a rational function of z. In this event
Table 10.2 Experiment G2

ρ       Method                    I       ‖e − G_I e‖_∞       CPU Time (s)

0.20 Neuts 12 3.8658e-13 0.070


Ramaswami 8 1.6036e-12 0.060
Extended Neuts 26 1.1143e-12 0.020
Extended Ramaswami 23 1.1696e-12 0.020
TELPACK 7 6.6613e-16 0.060
H 4 2.2204e-16 0.001

0.40 Neuts 21 1.1224e-12 0.200


Ramaswami 15 3.7181e-13 0.180
Extended Neuts 50 3.0221e-12 0.040
Extended Ramaswami 41 1.9054e-12 0.030
TELPACK 6 2.2204e-16 0.060
H 5 1.1102e-16 0.001

0.60 Neuts 37 2.3257e-12 0.450


Ramaswami 26 9.0550e-13 0.330
Extended Neuts 95 5.7625e-12 0.060
Extended Ramaswami 72 4.1225e-12 0.040
TELPACK 6 4.4409e-16 0.060
H 6 2.2204e-16 0.001

0.80 Neuts 81 5.4587e-12 1.230


Ramaswami 55 4.2666e-12 0.870
Extended Neuts 220 1.4281e-11 0.150
Extended Ramaswami 157 1.0090e-11 0.090
TELPACK 6 4.4409e-16 0.060
H 7 4.4409e-16 0.001

the fundamental matrix satisfies a reduced matrix polynomial equation

F(G) := \sum_{i=0}^{f} F_i G^i = 0.        (10.12)

The Extended Neuts and Extended Ramaswami Algorithms are respectively
extensions of the Neuts and Ramaswami Algorithms based on Equation
(10.12). This enables the infinite sums to be replaced by finite ones. Assuming
the invertibility of F_1, the recursions (10.10), (10.11) become respectively

G(0) = 0, \qquad G(j + 1) = −F_1^{-1} \Bigl( F_0 + \sum_{i=2}^{f} F_i (G(j))^i \Bigr)

and

G(0) = 0, \qquad G(j + 1) = −\Bigl( \sum_{i=1}^{f} F_i (G(j))^{i−1} \Bigr)^{-1} F_0.
Table 10.3 Experiment G2 continued

ρ       Method                    I       ‖e − G_I e‖_∞       CPU Time (s)

0.90 Neuts 163 1.0630e-11 2.710


Ramaswami 111 7.2653e-12 1.900
Extended Neuts 451 3.2315e-11 0.310
Extended Ramaswami 314 2.1745e-11 0.180
TELPACK 6 6.6613e-16 0.060
H 8 1.1102e-16 0.001

0.95 Neuts 315 2.2072e-11 5.430


Ramaswami 214 1.5689e-11 3.810
Extended Neuts 881 6.6349e-11 0.590
Extended Ramaswami 606 4.4260e-11 0.350
TELPACK 7 0 0.060
H 9 1.1102e-15 0.001

0.99 Neuts 1368 1.1456e-10 24.770


Ramaswami 933 7.7970e-11 17.440
Extended Neuts 3836 3.3809e-10 2.880
Extended Ramaswami 2618 2.2548e-10 1.720
TELPACK 8 1.9984e-15 0.060
H 11 1.0880e-14 0.003

TELPACK is designed to exploit situations in which A(z) is a rational


function of z, so it is hardly surprising that it achieves a prescribed accu-
racy in markedly fewer iterations than needed for the Neuts and Ramaswami
Algorithms. What is remarkable is that these benefits do not occur for the
Extended Neuts or Ramaswami Algorithms, in fact they require almost three
times as many iterations for a prescribed accuracy, although overall the ex-
tended algorithms take substantially less CPU time than do their counter-
parts despite the extra iterations needed.
In contrast to the Extended Neuts and Ramaswami Algorithms and TEL-
PACK, Algorithm H holds for all Markov chains of block–M/G/1 type. In
view of this Algorithm H compares surprisingly well. We note that it achieves
accuracy comparable with that of TELPACK, with much smaller CPU times,
for all levels of traffic intensity.

10.3.3 The Daigle and Lucantoni teletraffic problem

The most common choice of benchmark problem in the literature and the sub-
ject of our next three numerical experiments is a continuous-time teletraffic
example of Daigle and Lucantoni [5]. This involves matrices expressed in
terms of parameters K, ρd , a, r and M . The defining matrices Ai (i = 0, 1, 2)
are of size (K + 1) × (K + 1). The matrices A_0 and A_2 are diagonal and
prescribed by

(A_0)_{j,j} = 192\,[1 − j/(K + 1)] \quad (0 ≤ j ≤ K), \qquad A_2 = 192 ρ_d I.

The matrix A_1 is tridiagonal with

(A_1)_{j,j+1} = a r \frac{M − j}{M} \quad (0 ≤ j ≤ K − 1), \qquad (A_1)_{j,j−1} = j r \quad (1 ≤ j ≤ K).

A physical interpretation of the problem (given in [5]) is as follows.


A communication line handles both circuit-switched telephone calls and
packet-switched data. There are a finite number M of telephone subscribers,
each of whom has exponentially distributed on-hook and off-hook times,
the latter having parameter r and the former being dependent upon the
offered load a which is given in Erlangs. In particular, the rate for the on-
hook distribution is given by the quantity a/(M r). Data packets arrive ac-
cording to a Poisson process and their lengths are assumed to be approx-
imated well by an exponentially distributed random variable having mean
8000.
The communication line has a transmission capacity of 1.544 megabits per
second of which 8000 bits per second are used for synchronization. Thus, at
full line capacity, the line can transmit 192 packets per second. Each active
telephone call consumes 64 kilobits per second. A maximum of min(M ,23)
active telephone subscribers are allowed to have calls in progress at any given
time. The transmission capacity not used in servicing telephone calls is used
to transmit data packets. Thus, if there are i active callers, then the service
rate for the packets is (1 − i/24) × 192. The offered load for the voice traffic
is fixed at 18.244 Erlangs.
Following the original numerical experiments in [5], the above example has
been used as a testbench by a number of authors including Latouche and Ra-
maswami [7] and Akar, Oǧuz and Sohraby [1], [2]. This example is a fairly de-
manding comparative test for an algorithm designed for general M/G/1–type
Markov chains, since it features a QBD. The logarithmic-reduction method in
[7] is expressly designed for such processes. The matrix-sign-function method
of [1] and [2] is designed for the more general, but still rather limited case,
in which A(z) is a rational function of z.
As our algorithm is designed for discrete-time rather than continuous-time
processes, we use the embedded jump chain of the latter, for which the entries
in G have to be the same, for our analysis.
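For reference, the MATLAB fragment below builds the three blocks and one possible embedded jump chain obtained by dividing each row by its total outflow rate. The parameter values and, in particular, the row scaling used to pass to discrete time are our own illustrative assumptions, since the precise construction is not detailed above.

    % Daigle-Lucantoni blocks (continuous-time rates) and one possible
    % embedded jump chain.  Parameter values and row scaling are our own
    % illustrative choices.
    K = 23;  rhod = 0.15;  a = 18.244;  r = 1/300;  M = 512;
    j  = (0:K)';
    A0 = diag(192 * (1 - j/(K+1)));            % down-level (packet service)
    A2 = 192 * rhod * eye(K+1);                % up-level (packet arrivals)
    A1 = diag(a*r*(M - j(1:K))/M, 1) ...       % within-level: call arrivals
       + diag(j(2:K+1)*r, -1);                 % within-level: call completions
    lam = sum(A0 + A1 + A2, 2);                % total outflow rate per phase
    D   = diag(1 ./ lam);
    A0d = D*A0;  A1d = D*A1;  A2d = D*A2;      % embedded jump-chain blocks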
In Experiments G3 and G4 we employ the criterion (10.9) with ε = 10^{−12}
used in the above stochastic-case references, together with the parameter
choice K = 23, the latter indicating that the matrices A_n in the example are
of size 24 × 24.
10.3.3.1 Experiment G3

In this experiment the call holding rate is set at r = 100−1 s−1 and the calling
population size M is fixed at 512. In Tables 10.4 and 10.5 we compare the
number of iterations involved in estimating G with different algorithms for a
range of traffic parameter values from ρd = 0.01 to ρd = 0.29568. The latter
value was noted by Daigle and Lucantoni [5] to correspond to an instability
limit. The algorithms considered are the logarithmic-reduction algorithm of
Latouche and Ramaswami (LR), TELPACK and Algorithm H. We do not
give iteration counts for the original experiments of Daigle and Lucantoni.
These counts are not detailed in [5] but are mentioned as running to tens of
thousands.

Table 10.4 Iterations required with various traffic levels: Experiment G3

ρ_d       Method        I       ‖G_I − A(G_I)‖_∞       CPU Time (s)

0.010 TELPACK 10 1.5613e-16 0.450


LR 4 1.4398e-16 0.010
H 4 2.4460e-16 0.010

0.025 TELPACK 10 1.8388e-16 0.480


LR 5 1.6306e-16 0.020
H 5 2.1164e-16 0.020

0.050 TELPACK 10 2.6368e-16 0.460


LR 8 1.5959e-16 0.040
H 8 1.5959e-16 0.040

0.075 TELPACK 10 2.2204e-16 0.450


LR 10 2.2204e-16 0.040
H 10 2.6368e-16 0.040

0.100 TELPACK 10 1.5266e-16 0.048


LR 11 3.4781e-16 0.040
H 11 3.4478e-16 0.040

0.120 TELPACK 10 2.0817e-16 0.060


LR 12 1.5266e-16 0.060
H 12 2.3592e-16 0.060

0.140 TELPACK 10 3.6082e-16 0.560


LR 13 2.6368e-16 0.060
H 13 1.5266e-16 0.060

0.160 TELPACK 9 2.2204e-16 0.420


LR 14 1.6653e-16 0.060
H 14 1.9429e-16 0.060
Table 10.5 Iterations required with various traffic levels: Experiment G3 continued

ρ_d       Method        I       ‖G_I − A(G_I)‖_∞       CPU Time (s)

0.180 TELPACK 9 4.9960e-16 0.470


LR 14 1.6653e-16 0.060
H 14 1.9429e-16 0.060

0.200 TELPACK 9 1.8822e-16 0.420


LR 15 1.1102e-16 0.070
H 15 1.9429e-16 0.070

0.220 TELPACK 9 3.0531e-16 0.410


LR 15 3.6082e-16 0.070
H 15 2.2204e-16 0.070

0.240 TELPACK 10 3.0531e-16 0.450


LR 16 1.3878e-16 0.080
H 16 1.1796e-16 0.070

0.260 TELPACK 10 3.7383e-16 0.470


LR 17 2.4980e-16 0.080
H 17 2.2204e-16 0.080

0.280 TELPACK 12 9.5659e-16 0.530


LR 18 1.9429e-16 0.080
H 18 1.1102e-16 0.080

0.290 TELPACK 13 7.5033e-15 0.560


LR 20 2.2204e-16 0.080
H 20 1.3878e-16 0.080

0.29568 TELPACK 20 1.5737e-09 0.830


LR 29 2.2204e-16 0.100
H 29 1.6653e-16 0.100

It should be noted that in the references cited there is some slight variation
between authors as to the number of iterations required with a given method,
with larger differences at the instability limit. Akar et al. attribute this to
differences in the computing platforms used [4]. All computational results
given here are those obtained by us, either using our own MATLAB code or
by running TELPACK.

10.3.3.2 Experiment G4

Our fourth numerical experiment fixed the offered data traffic at 15%, the
call holding rate at r = 300−1 s−1 and then considered system behavior as a
function of the calling population size M (see Tables 10.6 and 10.7).
Table 10.6 Experiment G4

M        Method        I       ‖G_I − A(G_I)‖_∞       CPU Time (s)

64 TELPACK 9 3.7323e-16 0.440


LR 16 2.7756e-16 0.030
H 16 1.3878e-16 0.030

128 TELPACK 10 3.0531e-16 0.470


LR 18 1.3878e-16 0.060
H 18 1.3878e-16 0.060

256 TELPACK 11 4.2340e-16 0.500


LR 19 2.2204e-16 0.050
H 19 1.3878e-16 0.060

512 TELPACK 12 6.6337e-16 0.530


LR 20 2.2204e-16 0.070
H 20 1.6653e-16 0.070

1024 TELPACK 13 3.1832e-15 0.550


LR 21 2.4980e-16 0.080
H 21 1.9429e-16 0.070

2048 TELPACK 13 3.8142e-14 0.550


LR 22 2.2204e-16 0.080
H 22 1.9429e-16 0.080

Table 10.7 Experiment G4 continued

M        Method        I       ‖G_I − A(G_I)‖_∞       CPU Time (s)

4096 TELPACK 14 6.3620e-14 0.530


LR 23 1.9429e-16 0.080
H 23 2.7756e-16 0.080

8192 TELPACK 15 1.5971e-13 0.610


LR 24 2.4980e-16 0.090
H 24 3.0531e-16 0.090

16384 TELPACK 16 4.2425e-12 0.650


LR 25 2.2204e-16 0.090
H 25 2.2204e-16 0.080

32768 TELPACK 17 2.5773e-11 0.690


LR 27 1.9429e-16 0.100
H 27 2.2204e-16 0.100

65536 TELPACK 25 6.5647e-08 0.960


LR 32 1.9429e-16 0.130
H 32 2.2204e-16 0.110
10.3.3.3 Overview of Experiments G3 and G4

In the light of its design versatility, Algorithm H compares quite well with the
above-mentioned more specialist algorithms. Its performance with respect to
CPU time and accuracy is comparable with that of the logarithmic-reduction
(LR) algorithm. Both the logarithmic-reduction algorithm and Algorithm
H require considerably less CPU time than does TELPACK (the difference
in times sometimes being as much as an order of magnitude) for superior
accuracy.
In Experiments G3 and G4 we report the alternative error measure ‖G_I −
A(G_I)‖_∞ < ε suggested by Meini (see, for example, [8]). In terms of this
measure, the performance of TELPACK deteriorates steadily with an increase
in the size of M, whereas Algorithms H and LR are unaffected.
The last two TELPACK entries in Tables 10.5 and 10.7 are in small typeface
to indicate that TELPACK was unable to produce a result in these cases
and crashed, generating the error message ‘segmentation fault.’ Reducing ε
to 10^{−8} produced a result in both instances.

10.3.3.4 Experiment G5

We ran Algorithm H on the Daigle and Lucantoni problem with the call
holding rate fixed at r = 300−1 s−1 , the offered data traffic at 28% and the
calling population size M at 65,536, varying the size of the matrices from
24 × 24 to 500 × 500. In all cases we used (10.9) as a stopping criterion with
ε = 10^{−8}.
We found that although the iteration counts decreased as the size of the
matrices increased, CPU times increased substantially (see Table 10.8). This
held for all matrix sizes except for 24×24 (the first entry in Table 10.8) where
the computation required for the extra iterations outweighed the speed gain
due to smaller matrix size.

10.3.4 Experiment G6

We now turn our attention to the case of a null recurrent process where the
defining transition matrices for the system are given by

A_0 = \begin{bmatrix} 0.4 & 0 \\ 0 & 0.4 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 0 & 0.1 \\ 0.2 & 0.2 \end{bmatrix} \quad \text{and} \quad
A_2 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.2 \end{bmatrix}.

Results for this experiment are given in Table 10.9. The stopping criterion
used was (10.9) with  = 10−8 . We note that this case is not covered by
Table 10.8 Experiment G5

H
K       Iterations I       ‖e − G_I e‖_∞       CPU Time (s)

23 29 9.4832e-09 0.110
24 19 5.1710e-11 0.080
25 18 8.0358e-11 0.080
26 17 2.6813e-08 0.090
27 17 9.2302e-11 0.100
28 17 4.4409e-16 0.100
29 16 2.5738e-08 0.110
39 15 2.3319e-11 0.200
49 14 1.18140e-09 0.260
59 14 2.2204e-15 0.600
69 13 3.6872e-08 1.130
79 13 4.5749e-10 2.250
89 13 4.5552e-12 4.170
99 13 5.3213e-13 7.670
149 12 3.0490e-09 76.400
299 12 9.7700e-15 853.990
499 12 5.8509e-14 3146.600

Table 10.9 Experiment G6

Method        Iterations I       ‖e − G_I e‖_∞       CPU Time (s)


Neuts 11307 2.1360e-04 10.950
LR 24 3.9612e-08 0.010
H 24 3.7778e-08 0.010

the Akar and Sohraby methodology and therefore that TELPACK cannot be
used for this experiment. The results for the H and LR Algorithms are several
orders of magnitude more accurate than those for the Neuts Algorithm, with
significantly lower CPU times.

10.3.5 Experiment G7

The numerical experiments above all involve matrix functions A(z) of rational
form. We could find no examples in the literature for which A(z) is not
rational. The following is an original example showing how Algorithm H
(and the Neuts Algorithm) perform when A(z) is not rational. We note that
these are the only two algorithms which can be applied here.
Suppose p, q are positive numbers with sum unity. We define two k × k
matrices Ω0 , Ω1 with ‘binomial’ forms. Let
Ω_0 = [ 0          0                        · · ·   0                          0
        ⋮          ⋮                                ⋮                          ⋮
        0          0                        · · ·   0                          0
        p^{k−1}    \binom{k−1}{1} p^{k−2}q  · · ·   \binom{k−1}{k−2} pq^{k−2}  q^{k−1} ],

Ω_1 = [ p          q                        0                           · · ·   0                          0
        p^2        2pq                      q^2                         · · ·   0                          0
        ⋮          ⋮                        ⋮                                   ⋮                          ⋮
        p^{k−1}    \binom{k−1}{1} p^{k−2}q  \binom{k−1}{2} p^{k−3}q^2   · · ·   \binom{k−1}{k−2} pq^{k−2}  q^{k−1}
        0          0                        0                           · · ·   0                          0 ].

We now define

A_0 := Ω_0 e^{−r},
A_n := Ω_0 (r^n/n!) e^{−r} + Ω_1 (r^{n−1}/(n−1)!) e^{−r}   (n ≥ 1),

for r a positive number. We remark that

A(z) := Σ_{m=0}^{∞} A_m z^m = (Ω_0 + zΩ_1) e^{−r(1−z)}   (|z| ≤ 1),

so that A(z) is irreducible for 0 < z ≤ 1 and stochastic for z = 1.


Let ω := (ω_1, ω_2, . . . , ω_k) denote the invariant probability measure of

Ω := Ω_0 + Ω_1 = A(1).

Then the condition

ωA′(1)e ≤ 1

for G to be stochastic (see [9, Theorem 2.3.1]) becomes

ω [Ω_1 + r(Ω_0 + Ω_1)] e ≤ 1,

that is,

ω [(r + 1)e − (0, 0, . . . , 0, 1)^T] ≤ 1

or

r + 1 − ω_k ≤ 1.

We deduce that G is stochastic if and only if

r ≤ ω_k.

The parameter choice r = 1 thus provides the new and interesting situation
of a transient chain. Results are given in Table 10.10 (with the size of the
matrices set to 5 × 5).
Since G is not stochastic we again revert to the use of

‖A(G_I) − G_I‖_∞ < ε

as an error measure.
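The construction of Ω_0, Ω_1 and the stochasticity test r ≤ ω_k are easy to reproduce numerically. The following Python sketch is our own illustration (the function and variable names, and the choices k = 5, p = 0.1, r = 1, are ours, not from the chapter); it builds the two binomial matrices and checks the criterion for the transient choice r = 1.

import numpy as np
from math import comb

def binomial_matrices(k, p):
    """Omega_0 and Omega_1 of Experiment G7 (k x k 'binomial' forms)."""
    q = 1.0 - p
    row = lambda n: np.array([comb(n, j) * p**(n - j) * q**j for j in range(n + 1)])
    O0 = np.zeros((k, k)); O0[-1, :] = row(k - 1)      # only the last row of Omega_0 is nonzero
    O1 = np.zeros((k, k))
    for i in range(1, k):                              # row i carries the binomial of order i
        O1[i - 1, :i + 1] = row(i)
    return O0, O1

k, p, r = 5, 0.1, 1.0
O0, O1 = binomial_matrices(k, p)
Omega = O0 + O1                                        # = A(1), a stochastic matrix
eigvals, eigvecs = np.linalg.eig(Omega.T)
omega = np.real(eigvecs[:, np.argmax(np.real(eigvals))]); omega /= omega.sum()
print(r <= omega[-1])                                  # False: G is not stochastic, the chain is transient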

Table 10.10 Experiment G7: a transient process

p      Method    I    ‖G_I − A(G_I)‖_∞    CPU Time (s)


0.05 Neuts 78 5.2153e-13 1.950
H 6 1.1102e-16 0.006

0.1 Neuts 38 6.5445e-13 0.960


H 5 1.1102e-16 0.003

0.2 Neuts 18 3.3762e-13 0.480


H 4 8.3267e-17 0.002

0.3 Neuts 11 3.5207e-13 0.310


H 3 2.5153e-17 0.002

0.4 Neuts 8 5.8682e-14 0.210


H 2 2.3043e-13 0.001

0.5 Neuts 6 2.1154e-14 0.150


H 2 5.5511e-17 0.001

0.6 Neuts 4 1.5774e-13 0.130


H 2 1.7347e-18 0.001

0.7 Neuts 3 1.0413e-13 0.100


H 1 2.7311e-15 0.001

0.8 Neuts 2 6.4682e-13 0.080


H 1 2.7756e-17 0.001

References

1. N. Akar, N. C. Oǧuz and K. Sohraby, TELPACK: An advanced TELetraffic analysis PACKage, IEEE Infocom '97. http://www.cstp.umkc.edu/personal/akar/home.html
2. N. Akar, N. C. Oǧuz and K. Sohraby, An overview of TELPACK, IEEE Commun. Mag. 36 (8) (1998), 84–87.
3. N. Akar, N. C. Oǧuz and K. Sohraby, A novel computational method for solving finite QBD processes, Comm. Statist. Stoch. Models 16 (2000), 273–311.
4. N. Akar and K. Sohraby, An invariant subspace approach in M/G/1 and G/M/1 type Markov chains, Commun. Statist. Stoch. Models 13 (1997), 381–416.

5. J. N. Daigle and D. M. Lucantoni, Queueing systems having phase–dependent arrival


and service rates, Numerical Solution of Markov Chains, Marcel Dekker, New York
(1991), 223–238.
6. H. R. Gail, S. L. Hantler and B. A. Taylor, M/G/1 type Markov chains with rational
generating functions, in Advances in Matrix Analytic Methods for Stochastic Models,
Eds A. S. Alfa and S. R. Chakravarthy, Notable Publications, Neshanic Station, NJ
(1998), 1–16.
7. G. Latouche and V. Ramaswami, A logarithmic reduction algorithm for Quasi–Birth–
Death processes, J. Appl. Prob. 30 (1993), 650–674.
8. B. Meini, Solving QBD problems: the cyclic reduction algorithm versus the invariant
subspace method, Adv. Performance Anal. 1 (1998), 215–225.
9. M. F. Neuts, Structured Stochastic Matrices of M/G/1 Type and Their Applications,
Marcel Dekker, New York (1989).
10. V. Ramaswami, Nonlinear matrix equations in applied probability – solution tech-
niques and open problems, SIAM Review 30 (1988), 256–263.
Chapter 11
Interpolating maps, the modulus map
and Hadamard’s inequality

S. S. Dragomir, Emma Hunt and C. E. M. Pearce

Abstract Refinements are derived for both parts of Hadamard’s inequality


for a convex function. The main results deal with the properties of various
mappings involved in the refinements.

Key words: Convexity, Hadamard inequality, interpolation, modulus map

11.1 Introduction

A cornerstone of convex analysis and optimization is Hadamard’s inequality,


which in its basic form states that for a convex function f on a proper finite
interval [a, b]
f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2,

whereas the reverse inequalities hold if f is concave. For simplicity we take f


as convex on [a, b] throughout our discussion. The three successive terms in

S. S. Dragomir
School of Computer Science and Mathematics, Victoria University, Melbourne VIC 8001,
AUSTRALIA
e-mail: [email protected]
Emma Hunt
School of Mathematical Sciences & School of Economics, The University of Adelaide,
Adelaide SA 5005, AUSTRALIA
e-mail: [email protected]
C. E. M. Pearce
School of Mathematical Sciences, The University of Adelaide, Adelaide SA 5005,
AUSTRALIA
e-mail: [email protected]


Hadamard's inequality are all means of f over the interval [a, b]. We denote them respectively by m_f(a, b), M_f(a, b), M̄_f(a, b), or simply by m, M, M̄ when f, a, b are understood. The bounds m and M̄ for M are both tight.
More generally, the integral mean M is defined by

M_f(a, b) = (1/(b − a)) ∫_a^b f(x) dx   if a ≠ b,     M_f(a, b) = f(a)   if a = b.

The Hadamard inequality can then be written as

m_f(a, b) ≤ M_f(a, b) ≤ M̄_f(a, b)                                (11.1)

without the restriction a ≠ b.


There is a huge literature treating various refinements, generalizations and
extensions of this result. For an account of these, see the monograph [4]. Work
on interpolations frequently involves use of the auxiliary function

φ_t(p, q) := pt + q(1 − t),   t ∈ [0, 1],

for particular choices of p, q. Thus a continuous interpolation of the first part


of Hadamard’s inequality is available via the map Hf : [0, 1] → R given by
H_f(t) := (1/(b − a)) ∫_a^b f(y_t(x)) dx,

where for x ∈ [a, b] we set

yt (x) := φt (x, (a + b)/2).

Theorem A. We have that:


(a) Hf is convex;
(b) Hf is nondecreasing with Hf (0) = m, Hf (1) = M.
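Theorem A is easy to check numerically for a particular convex f. The sketch below is our own illustration (the trapezoidal quadrature and the choice f(x) = e^x on [0, 2] are assumptions, not from the chapter); it evaluates H_f on a grid of t values and confirms monotonicity together with the endpoint identities H_f(0) = m and H_f(1) = M.

import numpy as np

a, b = 0.0, 2.0
f = lambda x: np.exp(x)                          # a convex function on [a, b]

def H(t, n=20000):
    """Trapezoidal approximation of H_f(t) = (1/(b-a)) * int_a^b f(y_t(x)) dx."""
    x = np.linspace(a, b, n + 1)
    y = f(t * x + (1.0 - t) * (a + b) / 2.0)     # y_t(x) = phi_t(x, (a+b)/2)
    return (0.5 * (y[0] + y[-1]) + y[1:-1].sum()) / n

m = f((a + b) / 2.0)                             # = H_f(0)
M = H(1.0)                                       # integral mean = H_f(1)
ts = np.linspace(0.0, 1.0, 11)
print(all(H(s) <= H(t) + 1e-9 for s, t in zip(ts, ts[1:])))    # H_f nondecreasing
print(abs(H(0.0) - m) < 1e-9, m <= M <= (f(a) + f(b)) / 2.0)   # endpoints and the chain (11.1)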

The first part of Hadamard’s inequality is associated with Jensen’s in-


equality and has proved much more amenable to analysis than the second,
though this is the subject of a number of studies, see for example Dragomir,
Milošević and Sándor [3] and Dragomir and Pearce [5]. In the former study
a map Gf : [0, 1] → R was introduced, defined by
G_f(t) := (1/2)[f(u_1) + f(u_2)],
where u1 (t) := yt (a) and u2 (t) := yt (b) for t ∈ [0, 1].

Theorem B. The map Gf enjoys the following properties:


(i) Gf is convex on [0, 1];
(ii) Gf is nondecreasing on [0, 1] with Gf (0) = m, Gf (1) = M̄;
(iii) we have the inequalities

0 ≤ Hf (t) − m ≤ Gf (t) − Hf (t) ∀t ∈ [0, 1] (11.2)

and
M_f((3a + b)/4, (a + 3b)/4) ≤ (1/2)[ f((3a + b)/4) + f((a + 3b)/4) ]
                            ≤ ∫_0^1 G_f(t) dt
                            ≤ (m + M̄)/2.                          (11.3)
Inequality (11.2) was proved for differentiable convex functions. As this
class of functions is dense in the class of all convex functions defined on the
same interval with respect to the topology induced by uniform convergence,
(11.2) holds also for an arbitrary convex map.
Dragomir, Milošević and Sándor introduced a further map Lf : [0, 1] → R
given by
L_f(t) := (1/(2(b − a))) ∫_a^b [f(u) + f(v)] dx,                   (11.4)
where we define

u(x) := φt (a, x) and v(x) := φt (b, x)

for x ∈ [a, b] and t ∈ [0, 1].


The following was shown.

Theorem C. We have that


(1) Lf is convex on [0, 1];
(2) for all t ∈ [0, 1]

G_f(t) ≤ L_f(t) ≤ (1 − t)M + tM̄ ≤ M̄,

sup_{t∈[0,1]} L_f(t) = L_f(1) = M̄;

(3) for all t ∈ [0, 1]

H_f(1 − t) ≤ L_f(t)   and   (H_f(t) + H_f(1 − t))/2 ≤ L_f(t).      (11.5)
In this chapter we take these ideas further and introduce results involving
the modulus map. With the notation of (11.1) in mind, it is convenient to
employ
σ(x) := |x|, i(x) := x.

This gives in particular the identity



M̄_f(a, b) = M_i(f(a), f(b)), which equals f(a) if f(a) = f(b) and (1/(f(b) − f(a))) ∫_{f(a)}^{f(b)} x dx otherwise.

In Section 2 we derive a refinement of the basic Hadamard inequality


and in Section 3 introduce further interpolations for the outer inequality
Gf (t) − Hf (t) ≥ 0 in (11.2) and for Hf (t) − m ≥ 0, all involving the modulus
map.
For notational convenience we shall employ also in the sequel

w1 (t) := φt (a, b), w2 (t) := φt (b, a).

We note that this gives


f((a + w_1)/2) = f(u_1),   f((b + w_2)/2) = f(u_2),

so that

G_f(t) = (1/2)[ f((a + w_1)/2) + f((b + w_2)/2) ].                 (11.6)
In Section 4 we derive some new results involving the identric mean I(a, b)
and in Section 5 introduce the univariate map Mf : [0, 1] → R given by

M_f(t) := (1/2)[f(w_1) + f(w_2)]                                   (11.7)
and derive further results involving Lf . We remark that by convexity

f (w1 ) ≤ tf (a) + (1 − t)f (b) and f (w2 ) ≤ (1 − t)f (a) + tf (b),

so that
M_f(t) ≤ M̄.                                                        (11.8)

11.2 A refinement of the basic inequality

We shall make repeated use of the following easy lemma.

Lemma 2.1 If f , g are integrable on some domain I and

f (x) ≥ |g(x)| on I,

then

∫_I f(x) dx ≥ | ∫_I g(x) dx |.
Proof. We have ∫_I f(x) dx ≥ ∫_I |g(x)| dx ≥ | ∫_I g(x) dx |.  □

We now proceed to a refinement of Hadamard’s inequality for convex func-


tions. For this result, we introduce the symmetrization f s of f on [a,b], defined
by
f^s(x) = (1/2)[f(x) + f(a + b − x)].

This has the properties

m_{f^s}(a, b) = m_f(a, b),   M_{f^s}(a, b) = M_f(a, b),   M̄_{f^s}(a, b) = M̄_f(a, b).

Theorem 2.2 Let I ⊂ R be an interval and f : I → R. Suppose a, b ∈ I


with a < b. Then

M̄_f(a, b) − M_f(a, b) ≥ |M_σ(f(a), f(b)) − M_{σ◦f}(a, b)|          (11.9)

and

M_f(a, b) − m_f(a, b) ≥ |M_{σ◦f^s}(a, b) − |m_f(a, b)| |.           (11.10)

Proof. From the convexity of f , we have for t ∈ [0, 1] that

0 ≤ tf (a) + (1 − t)f (b) − f (ta + (1 − t)b).

By virtue of the general inequality

|c − d| ≥ | |c| − |d| | , (11.11)

we thus have

tf (a) + (1 − t)f (b) − f (ta + (1 − t)b)


≥ | |tf (a) + (1 − t)f (b)| − |f (ta + (1 − t)b)| | . (11.12)

Lemma 2.1 provides


f(a) ∫_0^1 t dt + f(b) ∫_0^1 (1 − t) dt − ∫_0^1 f(ta + (1 − t)b) dt
   ≥ | ∫_0^1 |tf(a) + (1 − t)f(b)| dt − ∫_0^1 |f(ta + (1 − t)b)| dt |.    (11.13)

Inequality (11.9) now follows from evaluation of the integrals and a change
of variables.
Similarly we have for any α, β ∈ [a, b] that
(f(α) + f(β))/2 − f((α + β)/2) ≥ | |(f(α) + f(β))/2| − |f((α + β)/2)| |.

Set α = w_1, β = w_2. Then

α + β = a + b                                                       (11.14)

and by Lemma 2.1 we have

(1/2)[ ∫_0^1 f(w_1) dt + ∫_0^1 f(w_2) dt ] − m ≥ | ∫_0^1 |(f(w_1) + f(w_2))/2| dt − |m| |.

By (11.14) and a change of variables, the previous result reduces to (11.10).  □




When does the theorem provide an improvement on Hadamard's inequality? That is, when does strict inequality obtain in (11.9) or (11.10)? To answer this, we consider the derivation of (11.11). Since c = (c − d) + d, we have
|c| ≤ |c − d| + |d| (11.15)
or
|c| − |d| ≤ |c − d|. (11.16)
By symmetry we have also

|d| − |c| ≤ |d − c|. (11.17)

Combining the last two inequalities yields (11.11).


To have strict inequality in (11.11) we need strict inequality in both (11.16)
and (11.17), or equivalently in both (11.15) and

|d| ≤ |d − c| + |c|.

Thus strict inequality occurs in (11.16) if and only if d, c − d are of opposite sign, and in (11.17) if and only if c, d − c are of opposite sign. These conditions are satisfied if and only if c and d are of opposite sign.
It follows that strict inequality obtains in (11.12) if and only if

tf (a) + (1 − t)f (b) and f (ta + (1 − t)b) are of opposite sign,

that is, if and only if

tf (a) + (1 − t)f (b) > 0 > f (ta + (1 − t)b). (11.18)

Since an integrable convex function is continuous, (11.13) and so (11.9) ap-


plies with strict inequality if and only if there exists t ∈ [0, 1] for which (11.18)
holds.
Similarly strict inequality applies in (11.10) if and only if there exist α, β ∈ [a, b] such that
(f(α) + f(β))/2 > 0 > f((α + β)/2).
Changes of variable yield the following condition.

Corollary 2.3 A necessary and sufficient condition for strict inequality in


(11.9) is that there exists x ∈ [a, b] such that

(b − x)f (a) + (x − a)f (b) > 0 > f (x).

A necessary and sufficient condition for strict inequality in (11.10) is that there exists x ∈ [a, b] such that

f^s(x) > 0 > m.

Corollary 2.4 If in the context of Theorem 2.2 f^s = f, then

f(a) − M ≥ | |f(a)| − M_{σ◦f}(a, b) |                               (11.19)

and

M − m ≥ | M_{σ◦f}(a, b) − |m| |.                                    (11.20)

A necessary and sufficient condition for strict inequality in (11.19) is that there exists x ∈ [a, b] such that

f(x) < 0 < f(a).

A necessary and sufficient condition for strict inequality in (11.20) is that there exists x ∈ [a, b] such that

f(x) > 0 > m.

These ideas have natural application to means. For an example, denote by


A(a, b), G(a, b) and I(a, b) respectively the arithmetic, geometric and identric
means of two positive numbers a, b, given by
A(a, b) = (a + b)/2,   G(a, b) = √(ab)

and

I(a, b) = (1/e)(b^b/a^a)^{1/(b−a)}   if a ≠ b,     I(a, b) = a   if a = b.
These satisfy the geometric–identric–arithmetic (GIA) inequality

G(a, b) ≤ I(a, b) ≤ A(a, b).

This follows from


G(a, b) ≤ L(a, b) ≤ A(a, b)

(where L(a, b) refers to the logarithmic mean), which was first proved by
Ostle and Terwilliger [6] and Carlson [1], [2], and

L(a, b) ≤ I(a, b) ≤ A(a, b),

which was established by Stolarsky [7], [8].


The first part of the GIA inequality can be improved as follows.

Corollary 2.5 If a ∈ (0, 1] and b ∈ [1, ∞) with a ≠ b, then

I(a, b)/G(a, b) ≥ exp( | ((ln b)^2 + (ln a)^2)/ln((b/a)^2) − ln[ (b^b a^a e^{2−a−b})^{1/(b−a)} ] | ) ≥ 1.     (11.21)

Proof. For the convex function f (x) = − ln x (x > 0), the left-hand side of
(11.9) is
−(ln a + ln b)/2 + (1/(b − a)) ∫_a^b ln x dx
   = (1/(b − a))[ b ln b − a ln a − (b − a) ] − ln G(a, b)
   = ln( I(a, b)/G(a, b) ).

Since

∫_{ln a}^{ln b} |x| dx = ((ln b)^2 + (ln a)^2)/2

and

∫_a^b |ln x| dx = ln[ a^a b^b e^{2−a−b} ],

we have likewise for the same choice of f that the right-hand side of
(11.9) is
| ((ln b)^2 + (ln a)^2)/ln((b/a)^2) − ln[ (b^b a^a e^{2−a−b})^{1/(b−a)} ] |,

whence the desired result.


We note for reference the incidental result 

M− ln (a, b) = − ln I(a, b) (11.22)

derived in the proof.


For the first inequality in (11.21) to be strict, by Corollary 2.3 there needs to exist x with 1 < x < b for which

(b − x) ln a + (x − a) ln b < 0.

Since the left-hand side is strictly increasing in x, this condition can be sat-
isfied if and only if the left-hand side is strictly negative for x = 1, that is,
we require
(b − 1) ln a + (1 − a) ln b < 0. (11.23)

Because b > 1, we have b − 1 > ln b and so (b − 1)/a − ln b > 0, since


0 < a ≤ 1. Thus the left-hand side of (11.23) is strictly increasing in a. It
tends to −∞ as a → 0 and is zero for a = 1. Accordingly (11.23) holds
whenever 0 < a < 1 < b.
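A numerical check of Corollary 2.5 is straightforward. The following Python sketch is our own illustration (the values a = 0.5 and b = 2 are an assumed example, chosen so that 0 < a < 1 < b and the inequality is strict); it evaluates both sides of (11.21).

import numpy as np

a, b = 0.5, 2.0                                            # 0 < a < 1 < b

G = np.sqrt(a * b)
I = (b**b / a**a) ** (1.0 / (b - a)) / np.e                # identric mean
lhs = I / G

M_sigma = (np.log(b)**2 + np.log(a)**2) / np.log((b / a)**2)
M_abs = np.log(a**a * b**b * np.exp(2 - a - b)) / (b - a)  # (1/(b-a)) * int_a^b |ln x| dx
rhs = np.exp(abs(M_sigma - M_abs))

print(lhs, rhs)                                            # approximately 1.17 and 1.01
print(lhs >= rhs >= 1.0)                                   # a strict improvement on G <= I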

The second part of the GIA inequality may also be improved.

Corollary 2.6 If 0 < a < b < ∞, then

A(a, b)/I(a, b) ≥ exp( | (1/(b − a)) ∫_a^b |ln √(x(a + b − x))| dx − |ln((a + b)/2)| | ) ≥ 1.     (11.24)

Proof. For the convex function f (x) = − ln x (x > 0) we have that


M − m = ln((a + b)/2) − ln I(a, b) = ln( A(a, b)/I(a, b) )

and that the right-hand side of (11.10) is

| (1/(b − a)) ∫_a^b |ln √(x(a + b − x))| dx − |ln((a + b)/2)| |.

The stated result follows from (11.10).  □

By Corollary 2.3, a necessary and sufficient condition for the first inequality in (11.24) to be strict is that there should exist x ∈ [a, b] such that

ln[x(a + b − x)] < 0 < ln((a + b)/2),
that is,
x(a + b − x) < 1 < (a + b)/2.
The leftmost term is minimized for x = a and x = b, so the condition reduces
to
ab < 1 < (a + b)/2 or 2 − b < a < 1/b.
Since 2 − b < 1/b for b ≠ 1, there are always values of a for which this condition is satisfied.

Similar analyses may be made for the refinements of inequalities derived


in the remainder of this chapter.

11.3 Inequalities for Gf and Hf

Our first result in this section provides minorants for the difference between
the two sides of the first inequality and the difference between the outermost
quantities in (11.2).

Theorem 3.1. Suppose I is an interval of real numbers with a, b ∈ I and


a < b. Then if f : I → R is convex, we have for t ∈ [0, 1] that

Gf (t) − Hf (t) ≥ |Mσ (f (u1 ), f (u2 )) − Hσ◦f (t)| (11.25)

and
H_f(t) − m ≥ | M_{σ◦f^s◦y_t}(a, b) − |m| |.                          (11.26)

Proof. We have
G_f(t) = (1/2)[f(u_1) + f(u_2)] = M̄_f(u_1, u_2),
H_f(t) = M_{f◦y_t}(a, b) = M_f(u_1, u_2),
m_f(u_1, u_2) = m_f(a, b) = m,
so for t ∈ [0, 1] application of Theorem 2.2 to f on (u1 , u2 ) provides

Gf (t) − Hf (t) ≥ |Mσ (f (u1 ), f (u2 )) − Mσ◦f (u1 , u2 )| , (11.27)

Hf (t) − m ≥ |Mσ◦f s (u1 , u2 ) − |m| | . (11.28)


Since u2 − u1 = t(b − a), we have for t ∈ (0, 1] that
M_{σ◦f}(u_1, u_2) = (1/(b − a)) ∫_{u_1}^{u_2} |f(y)| dy/t
                  = (1/(b − a)) ∫_a^b |f(y_t(x))| dx
                  = H_{σ◦f}(t),

so (11.27) yields (11.25) for t ∈ (0, 1]. As (11.25) also holds for t = 0, we have
the first part of the theorem. Using u1 + u2 = a + b, we derive the second
part similarly from Mσ◦f s (u1 , u2 ) = Mσ◦f s ◦yt (a, b).

Our next result provides a minorant for the difference between the two
sides of the second inequality in (11.3) and a corresponding result for the
third inequality.

Theorem 3.2. Suppose the conditions of Theorem 3.1 hold. Then


(m + M̄)/2 − M ≥ | M_σ(m, M̄) − ∫_0^1 |G_f(t)| dt |                    (11.29)

and

M − (1/2)[ f((3a + b)/4) + f((a + 3b)/4) ]
   ≥ (1/2) | ∫_0^1 |G_f(t) + G_f(1 − t)| dt − | f((3a + b)/4) + f((a + 3b)/4) | |.
                                                                       (11.30)

Proof. First, we observe that


∫_0^1 G_f(t) dt = (1/2)[ ∫_0^1 f(u_1) dt + ∫_0^1 f(u_2) dt ]
               = (1/2)[ (2/(b − a)) ∫_a^{(a+b)/2} f(x) dx + (2/(b − a)) ∫_{(a+b)/2}^b f(x) dx ]
               = M.                                                    (11.31)

Application of Theorem 2.2 to Gf on [0,1] provides



[G_f(0) + G_f(1)]/2 − ∫_0^1 G_f(t) dt ≥ | M_σ(G_f(0), G_f(1)) − M_{σ◦G_f}(0, 1) |

and

M − G_f(1/2) ≥ | M_{σ◦G_f^s}(0, 1) − |G_f(1/2)| |.

By (11.31) and the relation G_f(0) = m, G_f(1) = M̄, we have the stated results.  □

11.4 More on the identric mean

For a, b > 0, define γa,b : [0, 1] → R by

γa,b (t) = G(u1 , u2 ), (11.32)

where, as before, G(x, y) denotes the geometric mean of the positive numbers
x, y.

Theorem 4.1 The mapping γa,b possesses the following properties:


(a) γa,b is concave on [0,1];

(b) γa,b is monotone nonincreasing on [0, 1], with

γa,b (1) = G(a, b) and γa,b (0) = A(a, b);

(c) for t ∈ [0, 1]


γ_{a,b}(t) ≤ I(u_1, u_2) ≤ A(a, b);
(d) we have
I((3a + b)/4, (a + 3b)/4) ≥ G((3a + b)/4, (a + 3b)/4) ≥ I(a, b) ≥ G(A(a, b), G(a, b)) ≥ G(a, b);

(e) for t ∈ [0, 1] we have

1 ≤ A(a, b)/I(u_1, u_2) ≤ I(u_1, u_2)/γ_{a,b}(t).                    (11.33)

Proof. We have readily that for t ∈ [0, 1]

γa,b (t) = exp [−G− ln (t)] ,

H− ln (t) = − ln I(u1 , u2 ).
Since the map x ↦ exp(−x) is order reversing, (b)–(e) follow from Theorem
B(ii),(iii). It remains only to establish (a).
Since dui /dt = (−1)i (b − a)/2 for i = 1, 2 and u2 − u1 = t(b − a), we have
from (11.32) that

dγ_{a,b}/dt = ((b − a)/4) · (u_1 − u_2)/(u_1 u_2)^{1/2} = − t(b − a)^2 / (4(u_1 u_2)^{1/2})

and so

d^2γ_{a,b}/dt^2 = − (b − a)^2/(8(u_1 u_2)^{1/2}) − ((b − a)^2/16)( u_2^{1/2} u_1^{−3/2} + u_1^{1/2} u_2^{−3/2} ) < 0.

This establishes (a).

We now apply Theorems 3.1 and 3.2 to obtain further information about
the identric mean. For t ∈ [0, 1], put

ηa,b (t) := I(u1 , u2 ).



Because
G− ln (t) = − ln γa,b (t), H− ln (t) = − ln ηa,b (t),
(11.25) provides

ln η_{a,b}(t) − ln γ_{a,b}(t) = G_{−ln}(t) − H_{−ln}(t) ≥ L_{a,b}(t),

where

L_{a,b}(t) := | M_σ(ln u_1, ln u_2) − H_{σ◦ln}(t) |.

This yields

η_{a,b}(t)/γ_{a,b}(t) ≥ exp[L_{a,b}(t)] ≥ 1   for t ∈ [0, 1].
From (11.26) we derive
ln((a + b)/2) − ln η_{a,b}(t) ≥ | (1/(b − a)) ∫_a^b | ln √( y_t(x)(a + b − y_t(x)) ) | dx − | ln((a + b)/2) | | ≥ 0,

which gives

A(a, b)/η_{a,b}(t) ≥ exp( | (1/(b − a)) ∫_a^b | ln √( y_t(x)(a + b − y_t(x)) ) | dx − | ln((a + b)/2) | | ) ≥ 1
for t ∈ [0, 1].

Also application of (11.29) to the convex function − ln yields

ln I(a, b) − ln[G(A(a, b), G(a, b))] ≥ Ka,b ,

where

K_{a,b} = | M_σ(ln A(a, b), ln G(a, b)) − ∫_0^1 |ln γ_{a,b}(t)| dt |.

Hence

I(a, b)/G(A(a, b), G(a, b)) ≥ exp[K_{a,b}] ≥ 1.
Finally, applying (11.30) to the convex mapping − ln provides
ln G((3a + b)/4, (a + 3b)/4) − ln I(a, b)
   ≥ (1/2) | ∫_0^1 | ln[ γ_{a,b}(t) γ_{a,b}(1 − t) ] | dt − | ln[ ((3a + b)/4)((a + 3b)/4) ] | |
   = M_{a,b},

where

M_{a,b} := | ∫_0^1 | ln G(γ_{a,b}(t), γ_{a,b}(1 − t)) | dt − | ln G((3a + b)/4, (a + 3b)/4) | |.

Hence

G((3a + b)/4, (a + 3b)/4) / I(a, b) ≥ exp[M_{a,b}] ≥ 1.

11.5 The mapping Lf

We now consider further the properties of the univariate mapping Lf defined


in the introduction. First we introduce the useful auxiliaries
A_f(t) := (1/(b − a)) ∫_a^b f(u) dx,
B_f(t) := (1/(b − a)) ∫_a^b f(v) dx

for t ∈ [0, 1]. We have immediately from (11.4) that

L_f(t) = (1/2)[ A_f(t) + B_f(t) ].
The following property is closely connected with the second part of the
Hadamard inequality.

Proposition 5.1. Suppose a < b and f : [a, b] → R is convex. Then



L_f(t) ≤ (1/2)[ (M̄ + M_f(t))/2 + G_f(t) ] ≤ (M̄ + M_f(t))/2 ≤ M̄        (11.34)

for all t ∈ [0, 1].

Proof. For t ∈ [0, 1] we have


A_f(t) = (1/(w_1 − a)) ∫_a^{w_1} f(u) du   and   B_f(t) = (1/(b − w_2)) ∫_{w_2}^b f(v) dv.

Substituting A_f(t), B_f(t) for the leftmost terms in the known inequalities

(1/(w_1 − a)) ∫_a^{w_1} f(u) du ≤ (1/2)[ (f(a) + f(w_1))/2 + f((a + w_1)/2) ] ≤ (f(a) + f(w_1))/2,

(1/(b − w_2)) ∫_{w_2}^b f(v) dv ≤ (1/2)[ (f(b) + f(w_2))/2 + f((b + w_2)/2) ] ≤ (f(b) + f(w_2))/2

respectively and adding, gives by (11.4) that

L_f(t) ≤ (1/4)[ M̄ + (1/2){f(w_1) + f(w_2)} + f((a + w_1)/2) + f((b + w_2)/2) ]
       ≤ (1/2)[ M̄ + (1/2){f(w_1) + f(w_2)} ].

The first two inequalities in (11.34) follow from (11.6) and (11.7) and the
final inequality from (11.8).

The first inequality in (11.5) is improved by the following proposition. We


introduce the auxiliary variable
z = z(t, x) := (1 − t)x + t(a + b)/2   for x ∈ [a, b] and t ∈ [0, 1].
Proposition 5.2. Under the assumptions of Proposition 5.1 we have

L_f(t) − H_f(1 − t)
   ≥ | (1/((1 − t)(b − a))) ∫_a^{w_1} | (f(u) + f(u + t(b − a)))/2 | du − H_{σ◦f}(1 − t) |
   ≥ 0                                                                  (11.35)

for all t ∈ [0, 1).

Proof. Put

z = (u + v)/2 = φt ((a + b)/2, x) for x ∈ [a, b] and t ∈ [0, 1].

By (11.11) and the convexity of f ,


(f(u) + f(v))/2 − f(z) = | (f(u) + f(v))/2 − f(z) | ≥ | |(f(u) + f(v))/2| − |f(z)| | ≥ 0

for all x ∈ [a, b] and t ∈ [0, 1].


By Lemma 2.1, integration with respect to x over [a, b] provides
L_f(t) − H_f(1 − t) ≥ | (1/(b − a)) ∫_a^b | (f(u) + f(v))/2 | dx − H_{σ◦f}(1 − t) | ≥ 0.

Since

(1/(b − a)) ∫_a^b | (f(u) + f(v))/2 | dx = (1/((1 − t)(b − a))) ∫_a^{w_1} | (f(u) + f(u + t(b − a)))/2 | du,

inequality (11.35) is proved.


Remark 5.3. We can apply Theorem 2.2 to Lf to provide results similar to
Theorems 3.1 and 3.2. In fact we may readily verify that the components Af
and Bf are themselves convex and so subject to Theorem 2.2.

We now apply the above to obtain results for the identric mean. We may
compute Af , Bf for f = − ln to derive
A_{−ln}(t) = (1/(w_1 − a)) ∫_a^{w_1} (−ln u) du = −ln I(w_1, a),

B_{−ln}(t) = (1/(b − w_2)) ∫_{w_2}^b (−ln u) du = −ln I(b, w_2).

Thus

L_{−ln}(t) = (1/2)[ A_{−ln}(t) + B_{−ln}(t) ] = −ln ζ_{a,b}(t),
where the map ζa,b : [0, 1] → R is defined by

ζa,b (t) = G(I(a, w1 ), I(w2 , b)).

Theorem 5.4. We have the following.


(a) for all t ∈ [0, 1]

γ_{a,b}(t) ≥ ζ_{a,b}(t) ≥ [I(a, b)]^{1−t} [G(a, b)]^t ≥ G(a, b);

(b) for all t ∈ [0, 1]

ηa,b (1 − t) ≥ ζa,b (t) and G(ηa,b (t), ηa,b (1 − t)) ≥ ζa,b (t).

Proof. Since

ζ_{a,b}(t) = exp[−L_{−ln}(t)]   for all t ∈ [0, 1]

and the map x ↦ exp(−x) is order reversing, (a) and (b) follow from Theorem C, parts 2 and 3.  □

Remark 5.5. Similar results may be obtained from Propositions 5.1 and
5.2.

References

1. B. C. Carlson, Some inequalities for hypergeometric functions, Proc. Amer. Math. Soc.
17 (1966), 32–39.
2. B. C. Carlson, The logarithmic mean, Amer. Math. Monthly 79 (1972), 615–618.
3. S. S. Dragomir, D. S. Milošević and J. Sándor, On some refinements of Hadamard’s
inequalities and applications, Univ. Belgrad Publ. Elek. Fak. Sci. Math. 4 (1993),
21–24.
4. S. S. Dragomir and C. E. M. Pearce, Hermite–Hadamard Inequali-
ties, RGMIA Monographs, Victoria University, Melbourne (2000), online:
http://rgmia.vu.edu.au/monographs.
5. S. S. Dragomir and E. Pearce, A refinement of the second part of Hadamard’s in-
equality, with applications, in Sixth Symposium on Mathematics & its Applications,
Technical University of Timisoara (1996), 1–9.
6. B. Ostle and H. L. Terwilliger, A comparison of two means, Proc. Montana Acad. Sci.
17 (1957), 69–70.
7. K. B. Stolarsky, Generalizations of the logarithmic mean, Math. Mag. 48 (1975),
87–92.
8. K. B. Stolarsky, The power and generalized logarithmic means, Amer. Math. Monthly 87 (1980), 545–548.
Chapter 12
Estimating the size of correcting codes
using extremal graph problems

Sergiy Butenko, Panos Pardalos, Ivan Sergienko, Vladimir Shylo


and Petro Stetsyuk

Abstract Some of the fundamental problems in coding theory can be formu-


lated as extremal graph problems. Finding estimates of the size of correcting
codes is important from both theoretical and practical perspectives. We solve
the problem of finding the largest correcting codes using previously developed
algorithms for optimization problems in graphs. We report new exact solu-
tions and estimates.

Key words: Maximum independent set, graph coloring, error-correcting


codes, coding theory, combinatorial optimization

12.1 Introduction

Let a positive integer l be given. For a binary vector u ∈ B l denote by Fe (u)


the set of all vectors (not necessarily of dimension l) which can be obtained
from u as a consequence of a certain error e, such as deletion or transposition of bits. A subset C ⊆ B^l is said to be an e-correcting code if F_e(u) ∩ F_e(v) = ∅ for all u, v ∈ C, u ≠ v. In this chapter we consider the following cases for the error e.
• Single deletion (e = 1d): F1d (u) ⊆ B l−1 and all elements of F1d (u) are
obtained by deletion of one of the components of u. For example, if l = 4
and u = 0101 then F1d (u) = {101, 001, 011, 010}. See [25] for a survey of
single-deletion-correcting codes.

Sergiy Butenko, Panos Pardalos


University of Florida, 303 Weil Hall, Gainesville, FL 32611, U. S. A.
e-mail: butenko,pardalos@ufl.edu
Ivan Sergienko, Vladimir Shylo and Petro Stetsyuk
Institute of Cybernetics, NAS of Ukraine, Kiev, UKRAINE
e-mail: [email protected]


• Two-deletion (e = 2d): F2d (u) ⊆ B l−2 and all elements of F2d (u) are
obtained by deletion of two of the components of u. For u = 0101 we have
F2d (u) = {00, 01, 10, 11}.
• Single transposition, excluding the end-around transposition (e = 1t):
F1t (u) ⊆ B l and all elements of F1t (u) are obtained by transposition of a
neighboring pair of components in u. For example, if l = 5 and u = 11100
then F1t (u) = {11100, 11010}.
• Single transposition, including the end-around transposition (e = 1et):
F1et (u) ⊆ B l and all elements of F1et (u) are obtained by transposition
of a neighboring pair of components in u, where the first and the last
components are also considered as neighbors. For l = 5 and u = 11100 we
obtain F1et (u) = {11100, 11010, 01101}.
• One error on the Z-channel (e = 1z): F1z (u) ⊆ B l and all elements
of F1z (u) are obtained by possibly changing one of the nonzero compo-
nents of u from 1 to 0. If l = 5 and u = 11100 then F1z (u) =
{11100, 01100, 10100, 11000}. The codes correcting one error on the Z-
channel represent the simplest case of asymmetric codes.
Our problem of interest here is to find the largest correcting codes. It appears
that this problem can be formulated in terms of extremal graph problems as
follows [24].
Consider a simple undirected graph G = (V, E), where V = {1, . . . , n}
is the set of vertices and E is the set of edges. The complement graph of G
is the graph Ḡ = (V, Ē), where Ē is the complement of E. Given a subset
W ⊆ V , we denote by G(W ) the subgraph induced by W on G. A subset
I ⊆ V is called an independent set (stable set, vertex packing) if the edge set
of the subgraph induced by I is empty. An independent set is maximal if it
is not a subset of any larger independent set and maximum if there are no
larger independent sets in the graph. The independence number α(G) (also
called the stability number) is the cardinality of a maximum independent set
in G. A subset C ⊆ V is called a clique if G(C) is a complete graph.
Consider a graph G_l having a vertex for every vector u ∈ B^l, with an edge joining the vertices corresponding to u, v ∈ B^l, u ≠ v, if and only if F_e(u) ∩ F_e(v) ≠ ∅. Then a correcting code corresponds to an independent
set in Gl . Hence the largest e-correcting code can be found by solving the
maximum independent set problem in the considered graph. Note that this
problem could be equivalently formulated as the maximum clique problem in
the complement graph of G.
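To make the construction concrete, the following Python sketch (ours, not from the chapter) builds the conflict graph G_l for single-deletion-correcting codes: vertices are the binary words of length l, and two distinct words are joined when their single-deletion sets intersect. For l = 7 it should reproduce the graph 1dc128 of Table 12.1 (128 vertices, 1471 edges).

from itertools import product

def single_deletions(u):
    """F_1d(u): all words obtained from u by deleting one bit."""
    return {u[:i] + u[i + 1:] for i in range(len(u))}

def conflict_graph(l):
    """Adjacency sets of G_l for e = 1d."""
    words = [''.join(bits) for bits in product('01', repeat=l)]
    adj = {u: set() for u in words}
    for i, u in enumerate(words):
        Fu = single_deletions(u)
        for v in words[i + 1:]:
            if Fu & single_deletions(v):
                adj[u].add(v); adj[v].add(u)
    return adj

adj = conflict_graph(7)
print(len(adj), sum(len(s) for s in adj.values()) // 2)   # expected: 128 vertices, 1471 edges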
Another discrete optimization problem which we will use to obtain lower
bounds for asymmetric codes is the graph coloring problem, which is formu-
lated as follows. A legal (proper) coloring of G is an assignment of colors to
its vertices so that no pair of adjacent vertices has the same color. A color-
ing induces naturally a partition of the vertex set such that the elements of
each set in the partition are pairwise nonadjacent; these sets are precisely the
subsets of vertices being assigned the same color. If there exists a coloring
of G that uses no more than k colors, we say that G admits a k-coloring

(G is k-colorable). The minimal k for which G admits a k-coloring is called


the chromatic number and is denoted by χ(G). The graph coloring problem
is to find χ(G) as well as the partition of vertices induced by a χ(G)-coloring.
The maximum independent set (clique) and the graph coloring problems
are NP-hard [15]; moreover, they are associated with a series of recent results
about hardness of approximations. Arora and Safra [2] proved that for some positive ε the approximation of the maximum clique within a factor of n^ε is NP-hard. Håstad [16] has shown that in fact for any δ > 0 the maximum clique is hard to approximate in polynomial time within a factor n^{1−δ}. Similar approximation complexity results hold for the graph coloring problem as well. Garey and Johnson [14] have shown that obtaining colorings using sχ(G) colors, where s < 2, is NP-hard. It has been shown by Lund and Yannakakis [18] that χ(G) is hard to approximate within n^ε for some ε > 0, and Feige and Kilian [13] have shown that for any δ > 0 the chromatic number is hard to approximate within a factor of n^{1−δ}, unless NP ⊆ ZPP. These results
together with practical evidence [17] suggest that the maximum independent
set (clique) and coloring problems are hard to solve even in graphs of mod-
erate sizes. Therefore heuristics are used to solve practical instances of these
problems. References [3] and [19] provide extensive reviews of the maximum
clique and graph coloring problems, respectively.
In this chapter, using efficient approaches for the maximum independent
set and graph coloring problems, we have improved some of the previously
known lower bounds for asymmetric codes and found the exact solutions for
some of the instances.
The remainder of this chapter is organized as follows. In Section 12.2 we
find lower bounds and exact solutions for the largest codes using efficient
algorithms for the maximum independent set problem. In Section 12.3 a
graph coloring heuristic and the partitioning method are utilized in order to
obtain better lower bounds for some asymmetric codes. Finally, concluding
remarks are made in Section 12.4.

12.2 Finding lower bounds and exact solutions


for the largest code sizes using a maximum
independent set problem

In this section we summarize the results obtained in [5, 21]. We start with
the following global optimization formulation for the maximum independent
set problem.
Theorem 1 ([1]). The independence number of G satisfies the following
equality:
α(G) = max_{x∈[0,1]^n} Σ_{i=1}^n x_i Π_{(i,j)∈E} (1 − x_j).              (12.1)

This formulation is valid if instead of [0, 1]n we use {0, 1}n as the feasi-
ble region, thus obtaining an integer 0–1 programming problem. In problem
(12.1), for each vertex i there is a corresponding Boolean expression:
i ⟷ r_i = x_i ∧ ( ⋀_{(i,j)∈E} x̄_j ).

Therefore the problem of finding a maximum independent set can be re-


duced to the problem of finding a Boolean vector x∗ which maximizes the
number of “true” values among ri , i = 1, . . . , n:
x^* = arg max Σ_{i=1}^n | x_i ∧ ( ⋀_{(i,j)∈E} x̄_j ) |.                   (12.2)

To apply local search techniques to the above problem one needs to define
a proper neighborhood. We define the neighborhood on the set of all maximal
independent sets as follows.
For each jq ∈ I, q = 1, . . . , |I|,
List_{j_q} = { i ∉ I : r_i = x_i ∧ ( ⋀_{(i,k)∈E} x̄_k ) has exactly 2 literals with value 0, namely x_i = 0 and x̄_{j_q} = 0 }.

If the set Listjq is not empty, let I(G(Listjq )) be an arbitrary maximal


independent set in G(Listjq ). Then sets of the form

(I − {jq }) ∪ I(G(Listjq )), q = 1, . . . , |I|,

are maximal independent sets in G. Therefore the neighborhood of a maximal


independent set I in G can be defined as follows:

O(I) = { (I − {j_q}) ∪ I(G(List_{j_q})) : j_q ∈ I, q = 1, . . . , |I| }.

We have the following algorithm to find maximal independent sets:


1. Given a randomly generated Boolean vector x, find an appropriate initial
maximal independent set I.
2. Find a maximal independent set from the neighborhood (defined for max-
imal independent sets) of I, which has the largest cardinality.
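A minimal Python sketch of this two-step scheme is given below (our own rendering, not the authors' code): a greedy pass produces an initial maximal independent set, and each improvement move removes one vertex j of I and inserts a maximal independent set of the subgraph induced by List_j. The graph is assumed to be a dict of neighbour sets, such as the one built in the earlier sketch.

import random

def greedy_mis(adj, order):
    """Greedily grow an independent set following the given vertex order."""
    I = set()
    for v in order:
        if not (adj[v] & I):
            I.add(v)
    return I

def improve(adj, I):
    """One neighbourhood move: (I - {j}) union a maximal independent set of G(List_j)."""
    for j in I:
        List_j = [v for v in adj if v not in I and (adj[v] & I) == {j}]
        S = greedy_mis(adj, List_j)
        if len(S) >= 2:                        # strict improvement
            return (I - {j}) | S
    return I

def local_search(adj, restarts=10, seed=0):
    rng = random.Random(seed)
    best = set()
    for _ in range(restarts):
        order = list(adj); rng.shuffle(order)
        I = greedy_mis(adj, order)
        while True:
            J = improve(adj, I)
            if len(J) <= len(I):
                break
            I = J
        if len(I) > len(best):
            best = I
    return best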

We tested the proposed algorithm with the following graphs arising from
coding theory. These graphs are constructed as discussed in Section 12.1 and
can be downloaded from [24]:
• Graphs From Single-Deletion-Correcting Codes (1dc);
• Graphs From Two-Deletion-Correcting Codes (2dc);
• Graphs From Codes For Correcting a Single Transposition, Excluding the
End-Around Transposition (1tc);
• Graphs From Codes For Correcting a Single Transposition, Including the
End-Around Transposition (1et);
• Graphs From Codes For Correcting One Error on the Z-Channel (1zc).

The results of the experiments are summarized in Table 12.1. In this table,
the columns “Graph,” “n” and “|E|” represent the name of the graph, the
number of its vertices and its number of edges. This information is available
from [24]. The column “Solution found” contains the size of the largest inde-
pendent sets found by the algorithm over 10 runs. As one can see the results
are very encouraging. In fact, for all of the considered instances they were at
least as good as the best previously known estimates.

Table 12.1 Lower bounds obtained


Graph n |E| Solution
found

1dc128 128 1471 16


1dc256 256 3839 30
1dc512 512 9727 52
1dc1024 1024 24063 94
1dc2048 2048 58367 172
2dc128 128 5173 5
2dc256 256 17183 7
2dc512 512 54895 11
2dc1024 1024 169162 16
1tc64 64 192 20
1tc128 128 512 38
1tc256 256 1312 63
1tc512 512 3264 110
1tc1024 1024 7936 196
1tc2048 2048 18944 352
1et64 64 264 18
1et128 128 672 28
1et256 256 1664 50
1et512 512 4032 100
1et1024 1024 9600 171
1et2048 2048 22528 316
1zc128 128 1120 18
1zc256 256 2816 36
1zc512 512 6912 62
1zc1024 1024 16140 112
1zc2048 2048 39424 198
1zc4096 4096 92160 379

12.2.1 Finding the largest correcting codes

The proposed exact algorithm consists of the following steps:

• Preprocessing: finding and removing the set of isolated cliques;


• Finding a partition which divides the graph into disjoint cliques;
• Finding an approximate solution;
• Finding an upper bound;
• A Branch-and-Bound algorithm.
Below we give more detail on each of these steps.
0. Preprocessing: finding and removing the set of isolated cliques
We will call a clique C isolated if it contains a vertex i with the property
|N (i)| = |C| − 1. Using the fact that if C is an isolated clique, then α(G) =
α(G − G(C)) + 1, we iteratively find and remove all isolated cliques in the
graph. After that, we consider each connected component of the obtained
graph separately.
1. Finding a partition which divides the graph into disjoint cliques
We partition the set of vertices V of G as follows:

/
k
V = Ci ,
i=1

where Ci , i = 1, 2, . . . , k, are cliques such that Ci ∩ Cj = ∅, i = j.


The cliques are found using a simple greedy algorithm. Starting with C_1 = ∅, we pick the vertex j ∉ C_1 that has the maximal number of neighbors among those vertices outside of C_1 which are in the neighborhood of every vertex from C_1. Set C_1 = C_1 ∪ {j}, and repeat recursively, until there is
no vertex to add. Then remove C1 from the graph, and repeat the above
procedure to obtain C2 . Continue in this way until the vertex set in the
graph is empty.
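A short Python sketch of this greedy clique partition follows (ours; the function name is not from the chapter). It assumes the same dict-of-neighbour-sets representation used above.

def clique_partition(adj):
    """Greedy partition of the vertex set into disjoint cliques (step 1)."""
    remaining = set(adj)
    cliques = []
    while remaining:
        C = set()
        cand = set(remaining)          # vertices adjacent to every vertex already in C
        while cand:
            j = max(cand, key=lambda v: len(adj[v] & cand))
            C.add(j)
            cand &= adj[j]
        cliques.append(C)
        remaining -= C
    return cliques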
2. Finding an approximate solution
An approximate solution is found using the approach described above.
3. Finding an upper bound
To obtain an upper bound for α(G) we can solve the following linear
program:

O_C(G) = max Σ_{i=1}^n x_i,                                   (12.3)

s. t.   Σ_{i∈C_j} x_i ≤ 1,   j = 1, . . . , m,                 (12.4)

        x ≥ 0,                                                 (12.5)

where Cj ∈ C is a maximal clique and C is a set of maximal cliques


with |C| = m. For a general graph the last constraint should read
0 ≤ xi ≤ 1, i = 1, . . . , n. But since an isolated vertex is an isolated clique
as well, after the preprocessing step our graph does not contain isolated
vertices and the above inequalities are implied by the set of clique con-
straints (12.4) along with nonnegativity constraints (12.5). We call OC (G)
the linear clique estimate.
In order to find a tight bound OC (G) one normally needs to consider a large
number of clique constraints. Therefore one deals with linear programs in
which the number of constraints may be much larger than the number
of variables. In this case it makes sense to consider the linear program
which is dual to problem (12.3)–(12.5). The dual problem can be written
as follows:
O_C(G) = min Σ_{j=1}^m y_j,                                    (12.6)

s. t.   Σ_{j=1}^m a_{ij} y_j ≥ 1,   i = 1, . . . , n,          (12.7)

        y ≥ 0,                                                 (12.8)

where

a_{ij} = 1 if i ∈ C_j, and a_{ij} = 0 otherwise.

The number of constraints in the last LP is always equal to the number


of vertices in G. This gives us some advantages in comparison to problem
(12.3)–(12.5). If m > n, the dual problem is more suitable for solving with
the simplex method and interior point methods. Increasing the number of
clique constraints in problem (12.3)–(12.5) only leads to an increase in the
number of variables in problem (12.6)–(12.8). This provides a convenient
“restart” scheme (start from an optimal solution to the previous problem)
when additional clique constraints are generated.
To solve problem (12.6)–(12.8) we used a variation of an interior point
method proposed by Dikin [8, 9]. We will call this version of interior
point method Dikin’s Interior Point Method, or DIPM. We present a com-
putational scheme of DIPM for an LP problem in the following form:


min_{y∈R^{m+n}} Σ_{i=1}^{m+n} c_i y_i,                         (12.9)

s. t.   Ay = e,                                                (12.10)

        y_i ≥ 0,   i = 1, . . . , m + n.                       (12.11)

Here A is an n × (m + n) matrix in which the first m columns are determined by the coefficients a_{ij} and the columns a_{m+i} = −e_i for i = 1, . . . , n, where e_i is the i-th unit coordinate vector. The vector c ∈ R^{m+n} has its first m components equal to one and the other n components equal to zero; e ∈ R^n is the vector of all ones. Problem (12.6)–(12.8) can be reduced to this form if the inequality constraints in (12.7) are replaced by equality constraints. As the initial point for the DIPM method we choose y^0 such that

y_i^0 = 2 for i = 1, . . . , m,   and   y_{m+i}^0 = 2 Σ_{j=1}^m a_{ij} − 1 for i = 1, . . . , n.

Now let y^k be a feasible point for problem (12.9)–(12.11). In the DIPM method the next point y^{k+1} is obtained by the following scheme:
• Determine D_k of dimension (m + n) × (m + n) as D_k = diag{y^k}.
• Compute the vector

  c_p = (I − (AD_k)^T (AD_k^2 A^T)^{−1} AD_k) D_k c.

• Find ρ_k = max_{i=1,...,m+n} c_{p,i}.
• Compute y_i^{k+1} = y_i^k (1 − α c_{p,i}/ρ_k), i = 1, . . . , m + n, where α = 0.9.

As the stopping criterion we used the condition


Σ_{j=1}^{m+n} c_j y_j^k − Σ_{j=1}^{m+n} c_j y_j^{k+1} < ε,   where ε = 10^{−3}.
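One iteration of this scheme is only a few lines of numpy. The sketch below is ours; it forms the projected cost c_p by solving the normal-equations system A D_k² Aᵀ u = A D_k² c mentioned next, rather than by building the projector explicitly, and it assumes A is passed as the n × (m + n) matrix described above with ρ_k > 0 at non-optimal iterates.

import numpy as np

def dipm_step(A, c, y, alpha=0.9):
    """One DIPM (affine-scaling) step for min c^T y, Ay = e, y >= 0.
    A is n x (m+n); y is the current strictly positive iterate."""
    D = np.diag(y)
    AD = A @ D
    u = np.linalg.solve(AD @ AD.T, AD @ (D @ c))   # (A D^2 A^T) u = A D^2 c
    cp = D @ c - AD.T @ u                          # projected cost vector
    rho = cp.max()                                 # assumed positive away from the optimum
    return y * (1.0 - alpha * cp / rho)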

The most labor-consuming operation of this method is the computation


of the vector cp . This part was implemented using subroutines DPPFA
and DPPSL of LINPACK [10] for solving the following system of linear
equations:
A D_k^2 A^T u^k = A D_k^2 c,   u^k ∈ R^n.
In this implementation the time complexity of one iteration of DIPM can be estimated as O(n^3).
The values of the vector u^k = (A D_k^2 A^T)^{−1} A D_k^2 c found from the last sys-
tem define the dual variables in problem (12.6)–(12.8) (Lagrange multipli-
ers for constraints (12.7)). The optimal values of the dual variables were
then used as weight coefficients for finding additional clique constraints,
which help to reduce the linear clique estimate OC (G). The problem of
finding weighted cliques was solved using an approximation algorithm; a
maximum of 1000 cliques were added to the constraints.
4. A Branch-and-Bound algorithm
(a) Branching: Based on the fact that the number of vertices from a clique
that can be included in an independent set is always equal to 0 or 1.

(b) Bounding: We use the approximate solution found as a lower bound and
the linear clique estimate OC (G) as an upper bound.
Tables 12.2 and 12.3 contain a summary of the numerical experiments with
the exact algorithm. In Table 12.2 Column “#” contains a number assigned to

Table 12.2 Exact algorithm: Computational results

Graph # 1 2 3

1tc128 1 5 4 4.0002
2 5 5 5.0002
3 5 5 5.0002
4 5 4 4.0001

1tc256 1 6 5 5.0002
2 10 9 9.2501
3 19 13 13.7501
4 10 9 9.5003
5 6 5 5.0003

1tc512 1 10 7 7.0003
2 18 14 14.9221
3 29 22 23.6836
4 29 22 23.6811
5 18 14 14.9232
6 10 7 7.0002

1dc512 1 75 50 51.3167

2dc512 1 16 9 10.9674

1et128 1 3 2 3.0004
2 6 4 5.0003
3 9 7 7.0002
4 9 7 7.0002
5 6 4 5.0003
6 3 2 3.0004

1et256 1 3 2 3.0002
2 8 6 6.0006
3 14 10 12.0001
4 22 12 14.4002
5 14 10 12.0004
6 8 6 6.0005
7 3 2 3.0002
1et512 1 3 3 3.0000
2 10 7 8.2502
3 27 18 18.0006
4 29 21 23.0626
5 29 21 23.1029
6 27 18 18.0009
7 10 7 8.2501
8 3 3 3.0000

Table 12.3 Exact solutions found

Graph n |E| α(G) Time (s)

1dc512 512 9727 52 2118


2dc512 512 54895 11 2618
1tc128 128 512 38 7
1tc256 256 1312 63 39
1tc512 512 3264 110 141
1et128 128 672 28 25
1et256 256 1664 50 72
1et512 512 4032 100 143

each connected component of a graph after the preprocessing. Columns “1,”


“2” and “3” stand for the number of cliques in the partition, the solution
found by the approximation algorithm and the value of the upper bound
OC (G), respectively. In Table 12.3 Column “α(G)” contains the independence
number of the corresponding instance found by the exact algorithm; Column
“Time” summarizes the total time needed to find α(G).
Among the exact solutions presented in Table 12.3 only two were previ-
ously known, for 2dc512 and 1et128. The rest were either unknown or were
not proved to be exact.

12.3 Lower Bounds for Codes Correcting One Error


on the Z-Channel

The error-correcting codes for the Z-channel have very important practi-
cal applications. The Z-channel shown in Fig. 12.1 is an asymmetric binary
channel, in which the probability of transformation of 1 into 0 is p, and the
probability of transformation of 0 into 1 is 0.

Fig. 12.1 A scheme of the Z-channel: 1 → 1 with probability 1 − p, 1 → 0 with probability p, and 0 → 0 with probability 1.

The problem we are interested in is that of finding good estimates for the
size of the largest codes correcting one error on the Z-channel.
Let us introduce some background information related to asymmetric
codes.

The asymmetric distance dA (x, y) between vectors x, y ∈ B l is defined as


follows [20]:
dA (x, y) = max{N (x, y), N (y, x)}, (12.12)
where N(x, y) = |{i : (x_i = 0) ∧ (y_i = 1)}|. It is related to the Hamming distance d_H(x, y) = Σ_{i=1}^l |x_i − y_i| = N(x, y) + N(y, x) by the expression

2 d_A(x, y) = d_H(x, y) + |w(x) − w(y)|,                       (12.13)

where w(x) = Σ_{i=1}^l x_i = |{i : x_i = 1}| is the weight of x. Let us define the
minimum asymmetric distance Δ for a code C ⊂ B l as

Δ = min { d_A(x, y) : x, y ∈ C, x ≠ y }.

It was shown in [20] that a code C with the minimum asymmetric distance
Δ can correct at most (Δ − 1) asymmetric errors (transitions of 1 to 0). In
this subsection we present new lower bounds for codes with the minimum
asymmetric distance Δ = 2.
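The definitions (12.12)–(12.13) are easy to verify directly. The small Python sketch below (ours; the example pair of words is an arbitrary assumption) computes N, d_A and d_H and checks the identity 2 d_A = d_H + |w(x) − w(y)|.

def N(x, y):
    """Number of positions where x has a 0 and y has a 1."""
    return sum(1 for a, b in zip(x, y) if a == 0 and b == 1)

def d_A(x, y):                              # asymmetric distance (12.12)
    return max(N(x, y), N(y, x))

def d_H(x, y):                              # Hamming distance
    return sum(a != b for a, b in zip(x, y))

x, y = (1, 1, 0, 1, 0), (0, 1, 1, 0, 0)
assert 2 * d_A(x, y) == d_H(x, y) + abs(sum(x) - sum(y))    # relation (12.13)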
Let us define the graph G = (V (l), E(l)), where the set of vertices V (l) =
B l consists of all binary vectors of length l, and (vi , vj ) ∈ E(l) if and only
if dA (vi , vj ) < Δ. Then the problem of finding the size of the code with
minimal asymmetric distance Δ is reduced to the maximum independent set
problem in this graph. Table 12.4 contains the lower bounds obtained using
the algorithm presented above in this section (some of which were mentioned
in Table 12.1).

Table 12.4 Lower bounds obtained in: a [27]; b [6]; c [7]; d [12]; e (this chapter)
l Lower Bound Upper Bound

4 4 4
5 6a 6
6 12b 12
7 18c 18
8 36c 36
9 62c 62
10 112d 117
11 198d 210
12 379e (378d) 410

12.3.1 The partitioning method

The partitioning method [4, 12, 26] uses independent set partitions of the
vertices of graph G in order to obtain a lower bound for the code size. An

independent set partition is a partition of vertices into independent sets such


that each vertex belongs to exactly one independent set, that is,
V(l) = ⋃_{i=1}^m I_i,   I_i is an independent set,   I_i ∩ I_j = ∅, i ≠ j.      (12.14)

Recall that the problem of finding the smallest m for which a partition of
the vertices into m disjoint independent sets exists is the well-known graph
coloring problem.
The independent set partition (12.14) can be identified by the vector

Π(l) = (I1 , I2 , . . . , Im ).

We associate the vector

π(l) = (|I1 |, |I2 |, . . . , |Im |),

which is called the index vector of partition Π(n), with Π(l). Its norm is
defined as

π(l) · π(l) = Σ_{i=1}^m |I_i|^2.

We will assume that |I1 | ≥ |I2 | ≥ . . . ≥ |Im |.


In terms of the codes, the independent set partition is a partition of words
(binary vectors) into a set of codes, where each code corresponds to an inde-
pendent set in the graph.
Similarly, for the set of all binary vectors of weight w we can construct a graph G(l, w), in which the set of vertices is the set of the \binom{l}{w} such vectors, and two vertices are adjacent iff the Hamming distance between the corresponding vectors is less than 4. Then an independent set partition

Π(l, w) = (I_1^w, I_2^w, . . . , I_m^w)

can be considered in which each independent set will correspond to a sub-


code with minimum Hamming distance 4. The index vector and its norm are
defined in the same way as for Π(n).
By the direct product Π(l1 ) × Π(l2 , w) of a partition of asymmetric codes
Π(l1 ) = (I1 , I2 , . . . , Im1 ) and a partition of constant weight codes Π(l2 , w) =
(I_1^w, I_2^w, . . . , I_{m_2}^w) we will mean the set of vectors

C = {(u, v) : u ∈ Ii , v ∈ Iiw , 1 ≤ i ≤ m},

where m = min{m1 , m2 }. It appears that C is a code of length l = l1 + l2


with minimum asymmetric distance 2, that is, a code correcting one error on
the Z-channel of length l = l1 + l2 [12].
In order to find a code C of length l and minimum asymmetric distance 2
by the partitioning method, we can use the following construction procedure:

1. Choose l_1 and l_2 such that l_1 + l_2 = l.
2. Choose ε = 0 or 1.
3. Set

   C = ⋃_{i=0}^{⌊l_2/2⌋} [ Π(l_1) × Π(l_2, 2i + ε) ].                    (12.15)

12.3.2 The partitioning algorithm

One of the popular heuristic approaches to the independent set partitioning


(graph coloring) problem is the following. Suppose that a graph G = (V, E)
is given.

INPUT: G = (V, E);
OUTPUT: I_1, I_2, . . . , I_m.
0. i = 1;
1. while G ≠ ∅
      find a maximal independent set I; set I_i = I; i = i + 1;
      G = G − G(I), where G(I) is the subgraph induced by I;
   end
In [22, 23] an improvement of this approach was proposed by finding at
each step a specified number of maximal independent sets. Then a new graph
G is constructed, in which a vertex corresponds to a maximal independent
set, and two vertices are adjacent iff the corresponding independent sets have
common vertices. In the graph G, a few maximal independent sets are found,
and the best of them (say, the one with the least number of adjacent edges in
the corresponding independent sets of G) is chosen. This approach is formally
described in Figure 12.2.
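The basic heuristic is a few lines of Python. The sketch below is our rendering of the simple peeling scheme only; it does not include the multi-candidate refinement of [22, 23] summarised in Figure 12.2, and it again assumes a dict-of-neighbour-sets graph.

def independent_set_partition(adj):
    """Partition V(G) into independent sets by repeatedly peeling off
    a greedily built maximal independent set."""
    remaining = set(adj)
    parts = []
    while remaining:
        I = set()
        # favour low-degree vertices in the surviving subgraph
        for v in sorted(remaining, key=lambda v: len(adj[v] & remaining)):
            if not (adj[v] & I):
                I.add(v)
        parts.append(I)
        remaining -= I
    return parts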

12.3.3 Improved lower bounds for code sizes

The partitions obtained using the described partition algorithm are given in
Tables 12.5 and 12.6. These partitions, together with the facts that [11]
Π(l, 0) consists of one (zero) codeword,
Π(l, 1) consists of l codes of size 1,
Π(l, 2) consists of l − 1 codes of size l/2 for even l,
the index vectors of Π(l, w) and Π(l, l − w) are equal

Fig. 12.2 Algorithm for finding independent set partitions

Table 12.5 Partitions of asymmetric codes found


l1 # Partition Index Vector Norm m

8 1 36, 34, 34, 33, 30, 29, 26, 25, 9 7820 9


9 1 62, 62, 62, 61, 58, 56, 53, 46, 29, 18, 5 27868 11
2 62, 62, 62, 62, 58, 56, 53, 43, 32, 16, 6 27850 11
3 62, 62, 62, 61, 58, 56, 52, 46, 31, 17, 5 27848 11
4 62, 62, 62, 62, 58, 56, 52, 43, 33, 17, 5 27832 11
5 62, 62, 62, 62, 58, 56, 54, 42, 31, 15, 8 27806 11
6 62, 62, 62, 60, 57, 55, 52, 45, 31, 18, 8 27794 11
7 62, 62, 62, 60, 58, 55, 51, 45, 37, 16, 4 27788 11
8 62, 62, 62, 60, 58, 56, 53, 45, 32, 16, 6 27782 11
9 62, 62, 62, 62, 58, 56, 52, 43, 32, 17, 6 27778 11
10 62, 62, 62, 60, 58, 56, 53, 45, 31, 18, 5 27776 11
11 62, 62, 62, 62, 58, 56, 50, 45, 32, 18, 5 27774 11
12 62, 62, 62, 61, 58, 56, 51, 45, 30, 22, 3 27772 11
13 62, 62, 62, 62, 58, 56, 50, 44, 34, 16, 6 27760 11
14 62, 62, 62, 62, 58, 55, 51, 44, 32, 20, 4 27742 11

10 1 112, 110, 110, 109, 105, 100, 99, 88, 75, 59, 37, 16, 4 97942 13
2 112, 110, 110, 109, 105, 101, 96, 87, 77, 60, 38, 15, 4 97850 13
3 112, 110, 110, 108, 106, 99, 95, 89, 76, 60, 43, 15, 1 97842 13
4 112, 110, 110, 108, 105, 100, 96, 88, 74, 65, 38, 17, 1 97828 13
5 112, 110, 110, 108, 106, 103, 95, 85, 76, 60, 40, 15, 4 97720 13
6 112, 110, 110, 108, 106, 101, 95, 87, 75, 61, 40, 17, 2 97678 13
7 112, 110, 109, 108, 105, 101, 96, 86, 78, 63, 36, 17, 3 97674 13

Table 12.6 Partitions of constant weight codes obtained in: a (this chapter); b [4]; c [12]
l2 w # Partition Index-Vector Norm m

10 4 1a 30, 30, 30, 30, 26, 25, 22, 15, 2 5614 9


12 4 1a 51, 51, 51, 51, 49, 48, 48, 42, 42, 37, 23, 2 22843 12
12 4 2a 51, 51, 51, 51, 49, 48, 48, 45, 39, 36, 22, 4 22755 12
12 4 3a 51, 51, 51, 51, 49, 48, 48, 45, 41, 32, 22, 6 22663 12
12 6 1a 132, 132, 120, 120, 110, 94, 90, 76, 36, 14 99952 10
14 4 1c 91, 91, 88, 87, 84, 82, 81, 79, 76, 73, 66, 54, 38, 11 78399 14
14 4 2c 91, 90, 88, 85, 84, 83, 81, 79, 76, 72, 67, 59, 34, 11, 1 78305 15
14 6 1b 278, 273, 265, 257, 250, 231, 229, 219, 211, 672203 16
203, 184, 156, 127, 81, 35, 4

were used in (12.15), with ε = 0, to obtain new lower bounds for the asym-
metric codes presented in Table 12.7. To illustrate how the lower bounds
were computed, let us show how the code for l = 18 was constructed. We use
l1 = 8 and l2 = 10:

|Π(8) × Π(10, 0)| = 36 · 1 = 36;


|Π(8) × Π(10, 2)| = 256 · 5 = 1280;
|Π(8) × Π(10, 4)| = 36 · 30 + 34 · 30 + 34 · 30 + 33 · 30 + 30 · 26 + 29 · 25
+ 26 · 22 + 25 · 15 + 9 · 2 = 6580;
|Π(8) × Π(10, 6)| = |Π(8) × Π(10, 4)| = 6580;
|Π(8) × Π(10, 8)| = |Π(8) × Π(10, 2)| = 1280;
|Π(8) × Π(10, 10)| = |Π(8) × Π(10, 0)| = 36;

The total is 2(36 + 1280 + 6580) = 15792 codewords.
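The same arithmetic can be scripted directly from the index vectors. The short Python check below (ours) reproduces the count for l = 18 from the data in Tables 12.5 and 12.6 and the constant-weight facts listed above.

pi_8    = [36, 34, 34, 33, 30, 29, 26, 25, 9]      # Pi(8), Table 12.5
pi_10_0 = [1]                                       # Pi(10, 0): the single zero word
pi_10_2 = [5] * 9                                   # Pi(10, 2): l - 1 = 9 codes of size l/2 = 5
pi_10_4 = [30, 30, 30, 30, 26, 25, 22, 15, 2]       # Pi(10, 4), Table 12.6

def product_size(p1, p2):
    """|Pi(l1) x Pi(l2, w)|: sum of pairwise products over the first min(m1, m2) parts."""
    return sum(a * b for a, b in zip(p1, p2))

even = [product_size(pi_8, pi_10_0),                # w = 0
        product_size(pi_8, pi_10_2),                # w = 2
        product_size(pi_8, pi_10_4)]                # w = 4
# weights 6, 8, 10 contribute the same as 4, 2, 0, since Pi(l, w) and Pi(l, l - w) share index vectors
print(2 * sum(even))                                # 15792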

Table 12.7 New lower bounds. Previous lower bounds were found in: a [11]; b [12]
Lower Bound
l New Previous

18 15792 15762a
19 29478 29334b
20 56196 56144b
21 107862 107648b
22 202130 201508b
24 678860 678098b

12.4 Conclusions

In this chapter we have dealt with binary codes of given length correcting
certain types of errors. For such codes, a graph can be constructed in which
each vertex corresponds to a binary vector and the edges are built such

that each independent set corresponds to a correcting code. The problem


of finding the largest code is thus reduced to the maximum independent set
problem in the corresponding graph. For asymmetric codes, we also applied
the partitioning method, which utilizes independent set partitions (or graph
colorings) in order to obtain lower bounds for the maximum code sizes.
We use efficient approaches to the maximum independent set and graph
coloring problems to deal with the problem of estimating the largest code
sizes. As a result, some improved lower bounds and exact solutions for the
size of the largest error-correcting codes were obtained.

Acknowledgments We would like to thank two anonymous referees for their valuable
comments.

References

1. J. Abello, S. Butenko, P. Pardalos and M. Resende, Finding independent sets in a


graph using continuous multivariable polynomial formulations, J. Global Optim. 21(4)
(2001), 111–137.
2. S. Arora and S. Safra, Approximating clique is NP–complete, Proceedings of the 33rd
IEEE Symposium on Foundations on Computer Science (1992) (IEEE Computer
Society Press, Los Alamitos, California, 1992), 2–13.
3. I. M. Bomze, M. Budinich, P. M. Pardalos and M. Pelillo, The maximum clique prob-
lem, in D.-Z. Du and P. M. Pardalos, Eds, Handbook of Combinatorial Optimization
(Kluwer Academic Publishers, Dordrecht, 1999), 1–74.
4. A. Brouwer, J. Shearer, N. Sloane and W. Smith, A new table of constant weight
codes, IEEE Trans. Inform. Theory 36 (1990), 1334–1380.
5. S. Butenko, P. M. Pardalos, I. V. Sergienko, V. Shylo and P. Stetsyuk, Finding max-
imum independent sets in graphs arising from coding theory, Proceedings of the 17th
ACM Symposium on Applied Computing (ACM Press, New York, 2002), 542–546.
6. S. D. Constantin and T. R. N. Rao, On the theory of binary asymmetric error cor-
recting codes, Inform. Control 40 (1979), 20–36.
7. P. Delsarte and P. Piret, Bounds and constructions for binary asymmetric error cor-
recting codes, IEEE Trans. Inform. Theory IT-27 (1981), 125–128.
8. I. I. Dikin, Iterative solution of linear and quadratic programming problems, Dokl.
Akad. Nauk. SSSR 174 (1967), 747–748 (in Russian).
9. I. I. Dikin and V. I. Zorkal’tsev, Iterative Solution of Mathematical Programming
Problems (Algorithms for the Method of Interior Points) (Nauka, Novosibirsk, 1980).
10. J. Dongarra, C. Moler, J. Bunch and G. Stewart, Linpack users’ guide,
http://www.netlib.org/linpack/index.html, available from the ICTP Library, 1979.
11. T. Etzion, New lower bounds for asymmetric and unidirectional codes, IEEE Trans. Inform. Theory 37 (1991), 1696–1704.
12. T. Etzion and P. R. J. Ostergard, Greedy and heuristic algorithms for codes and
colorings, IEEE Trans. Inform. Theory 44 (1998), 382–388.
13. U. Feige and J. Kilian, Zero knowledge and the chromatic number, J. Comput. System
Sci. 57 (1998), 187–199.
14. M. R. Garey and D. S. Johnson, The complexity of near–optimal coloring, JACM 23
(1976), 43–49.
15. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory
of NP–completeness (Freeman, San Francisco, 1979).

16. J. Håstad, Clique is hard to approximate within n^{1−ε}, Acta Math. 182 (1999), 105–142.
17. D. S. Johnson and M. A. Trick (Eds), Cliques, Coloring, and Satisfiability: Second
DIMACS Implementation Challenge, Vol. 26 of DIMACS Series, (American Mathe-
matical Society, Providence, RI, 1996).
18. C. Lund and M. Yannakakis, On the hardness of approximating minimization prob-
lems, JACM 41 (1994), 960–981.
19. P. M. Pardalos, T. Mavridou and J. Xue, The graph coloring problem: a bibliographic
survey, in D.-Z. Du and P. M. Pardalos, Eds, Handbook of Combinatorial Optimization,
Vol. 2 (Kluwer Academic Publishers, Dordrecht, 1999), 331–395.
20. T. R. N. Rao and A. S. Chawla, Asymmetric error codes for some lsi semiconductor
memories, Proceedings of the 7th Southeastern Symposium on System Theory (1975)
(IEEE Computer Society Press, Los Alamitos, California, 1975), 170–171.
21. I. V. Sergienko, V. P. Shylo and P. I. Stetsyuk, Approximate algorithm for solving
the maximum independent set problem, in Computer Mathematics, (V.M. Glushkov
Institute of Cybernetics NAS of Ukraine, Kiev, 2001), 4–20 (in Russian).
22. V. Shylo, New lower bounds of the size of error–correcting codes for the Z–channel,
Cybernet. Systems Anal. 38 (2002), 13–16.
23. V. Shylo and D. Boyarchuk, An algorithm for construction of covering by indepen-
dent sets, in Computer Mathematics (V.M. Glushkov Institute of Cybernetics NAS of
Ukraine, Kiev, 2001), 151–157.
24. N. Sloane, Challenge problems: Independent sets in graphs, http://www.research.
att.com/∼njas/doc/graphs.html, 2001.
25. N. Sloane, On single–deletion–correcting codes, in K. T. Arasu and A. Suress, Eds,
Codes and Designs: Ray–Chaudhuri Festschrift (Walter de Gruyter, Berlin, 2002),
273–291.
26. C. L. M. van Pul and T. Etzion, New lower bounds for constant weight codes, IEEE
Trans. Inform. Theory 35 (1989), 1324–1329.
27. R. R. Varshamov, A class of codes for asymmetric channels and a problem from the
additive theory of numbers, IEEE Trans. Inform. Theory IT–19 (1973), 92–95.
Chapter 13
New perspectives on optimal
transforms of random vectors

P. G. Howlett, C. E. M. Pearce and A. P. Torokhti

Abstract We present a new transform which is optimal over the class of


transforms generated by second-degree polynomial operators. The transform
is based on the solution of the best constrained approximation problem with
the approximant formed by a polynomial operator. It is shown that the new
transform has advantages over the Karhunen–Loève transform, arguably the
most popular transform, which is optimal over the class of linear transforms
of fixed rank. We provide a strict justification of the technique, demonstrate
its advantages and describe useful extensions and applications.

Key words: Optimal transforms, singular-value decomposition, filtering,


compression, tensors, random signals

13.1 Introduction and statement of the problem

Optimal transforms of random vectors have been applied successfully to many problems in signal processing including, for example, the filtering and

P. G. Howlett
Centre for Industrial and Applicable Mathematics, The University of South Australia,
Mawson Lakes, SA 5095, AUSTRALIA
e-mail: [email protected]
C. E. M. Pearce
School of Mathematical Sciences, The University of Adelaide, Adelaide SA 5005,
AUSTRALIA
e-mail: [email protected]
A. P. Torokhti
Centre for Industrial and Applicable Mathematics, The University of South Australia,
Mawson Lakes, SA 5095, AUSTRALIA
e-mail: [email protected]


compression of random signals and the classification and clustering of signals


[4, 8, 18].
Known transforms are mainly based on linear models. The Karhunen–
Loève transform is perhaps the most popular linear transform and achieves
the smallest associated error of all linear transforms of fixed rank. Recently
Hua and Liu [8] generalized it to the case where no relationship is assumed
between a stochastic signal and noise.
Although the associated error cannot be reduced by the use of any other
linear transform of the same rank, the performance of this transform is still
unsatisfactory in many applications. See the simulations in Section 13.7 in
this connection. In this chapter we present a new nonlinear transform with
a substantially better performance than that of the generalized Karhunen–
Loève transform (GKLT) of [8]. In particular, we show that for the same
rank our transform possesses a much smaller associated error. Our method
is based on the best constrained approximation of a stochastic signal by an
approximant generated by a second-degree polynomial operator. The tech-
nique is based on the primary concept presented in [14]–[16]. We begin with
a rigorous statement of the problem.
Let (Ω, Σ, μ) be a probability space, with Ω the set of outcomes, Σ the
minimal σ-field of measurable subsets of Ω and μ : Σ → [0, 1] an associated
probability measure on Σ. Suppose that x ∈ L2 (Ω, Rm ) and y ∈ L2 (Ω, Rn )
are random vectors with realizations x(ω) ∈ Rm and y(ω) ∈ Rn . We interpret
x as a given “idealized” signal (without any distortion) and y as an observed
signal. In particular, y can be interpreted as x contaminated with noise so that
no specific relationships between signal and noise are assumed. For instance,
noise can be additive, multiplicative or a combination of the two.
Each operator $F: \mathbb{R}^m \to \mathbb{R}^n$ defines an associated operator $\mathcal{F}_F: L^2(\Omega, \mathbb{R}^m) \to L^2(\Omega, \mathbb{R}^n)$ via
$$\mathcal{F}_F[(x)](\omega) = F[x(\omega)] \quad \text{for each } \omega \in \Omega. \qquad (13.1)$$

It is customary to write F (x) rather than FF (x), since we have [F (x)](ω)=


F [x(ω)] for each ω ∈ Ω. It is also convenient to write x for x(ω), y for y(ω),
etc.
Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be the operator associated with the map $\mathcal{T}_T: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$ by an equation similar to (13.1). Suppose $T$ is given by
$$T(y) = A + \sum_{j=0}^{n} B_j z_j, \qquad (13.2)$$

where A ∈ Rm , Bj ∈ Rm×n (j = 0, 1, . . . n), z0 = y, y = (y1 , . . . , yn )T ∈ Rn


and zj = yj y for j = 1, . . . n. Then the operator T is completely defined by
A and Bj for j = 0, 1, . . . n.

The problem is to find a vector $A^0$ and matrices $B_j^0$ such that
$$J(A^0, B_0^0, \ldots, B_n^0) = \min_{A, B_0, \ldots, B_n} J(A, B_0, \ldots, B_n), \qquad (13.3)$$
subject to
$$\operatorname{rank}[A\ B_0\ B_1\ \ldots\ B_n] = r \qquad (13.4)$$
with $r \le m$. Here
$$J(A, B_0, \ldots, B_n) = E\left[\,\Bigl\|\, x - \Bigl(A + \sum_{j=0}^{n} B_j z_j\Bigr)\Bigr\|^2\right], \qquad (13.5)$$

with $E$ the expectation operator, $\|\cdot\|$ the Frobenius norm, $q = n^2 + n + 1$ and $[A\ B_0\ B_1\ \ldots\ B_n] \in \mathbb{R}^{m\times q}$.
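For concreteness, the cost (13.5) can be estimated from data by averaging over realizations. The following short sketch is our own illustration of that computation (NumPy and the array shapes are assumptions, not part of the chapter's method):

```python
import numpy as np

def features(y):
    """Build z_0 = y and z_j = y_j * y for j = 1, ..., n from one realization y."""
    n = y.shape[0]
    return [y] + [y[j] * y for j in range(n)]

def empirical_cost(A, B, X, Y):
    """Monte Carlo estimate of J(A, B_0, ..., B_n) in (13.5).

    A : (m,) vector; B : list of n+1 matrices of shape (m, n);
    X : (N, m) realizations of x; Y : (N, n) realizations of y.
    """
    total = 0.0
    for x, y in zip(X, Y):
        z = features(y)
        approx = A + sum(Bj @ zj for Bj, zj in zip(B, z))
        total += np.sum((x - approx) ** 2)
    return total / len(X)

# Tiny usage example with random stand-in data (purely illustrative).
rng = np.random.default_rng(0)
m, n, N = 3, 4, 100
X, Y = rng.normal(size=(N, m)), rng.normal(size=(N, n))
A = np.zeros(m)
B = [np.zeros((m, n)) for _ in range(n + 1)]
print(empirical_cost(A, B, X, Y))  # equals an estimate of E||x||^2 when A, B are zero
```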

13.2 Motivation of the statement of the problem

Equations (13.3)–(13.5) represent the best constrained approximation prob-


lem. It is well known that a nonlinear approximant normally possesses a
smaller associated error than that of a linear approximation. Therefore it is
natural to seek a suitable nonlinear form of approximant.
Let us consider the nonlinear operator T given by

T (y) = A + B0 y + C(y, y), (13.6)

where $C: \mathbb{R}^n\times\mathbb{R}^n \to \mathbb{R}^m$ is a bilinear operator, that is, $(y, y) \in \mathbb{R}^n\times\mathbb{R}^n$. The operator $C$ is an $(m\times n\times n)$-tensor, $C = \{c_{ijk}\} \in \mathbb{R}^{m\times n\times n}$. Therefore
the vector C(y, y) can be presented as the product of a tensor C and vector
y and also as a product of the matrix Cy with the vector y. As a result we
have
C(y, y) = (Cy)y = B1 y1 y + . . . + Bn yn y,
where B1 = {ci1k }∈ Rm×n , . . . , Bn = {cink } ∈ Rm×n . Alternatively, B1 =
{cij1 } ∈ Rm×n , . . . , Bn = {cijn } ∈ Rm×n . Hence (13.6) coincides with
(13.2).
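The identity $C(y,y) = (Cy)y = B_1 y_1 y + \cdots + B_n y_n y$ is easy to check numerically. The sketch below is an illustration only (NumPy assumed); it compares the direct evaluation of the bilinear form with the sum over the matrix slices $B_j = \{c_{ijk}\}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
C = rng.normal(size=(m, n, n))   # an (m x n x n)-tensor
y = rng.normal(size=n)

# Direct evaluation of the bilinear form C(y, y).
direct = np.einsum('ijk,j,k->i', C, y, y)

# The same vector written as B_1 y_1 y + ... + B_n y_n y with B_j = C[:, j, :].
slices = sum(C[:, j, :] @ (y[j] * y) for j in range(n))

print(np.allclose(direct, slices))  # True
```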
In the following four sections we show that the transform T 0 produced by
the nonlinear approximant
$$T^0(y) = A^0 + \sum_{j=0}^{n} B_j^0 z_j \qquad (13.7)$$

possesses a much smaller associated error than that of the GKLT. We then
proceed to address applications and simulations.

13.3 Preliminaries

For any $u, w \in \mathbb{R}^n$, we write $E_{uw} = E[uw^T]$ and $\bar E_{uw} = E_{uw} - E[u]E[w^T]$.
The symbol $M^{\dagger}$ denotes the Moore–Penrose pseudo-inverse of a matrix $M$ (see [2]).

Lemma 13.3.1 We have the relations
$$\bar E_{xy}\bar E_{yy}^{\dagger}\bar E_{yy} = \bar E_{xy}, \qquad \bar E_{z_k y}\bar E_{yy}^{\dagger}\bar E_{yy} = \bar E_{z_k y} \quad\text{and}\quad \bar E_{xz_k}\bar E_{z_k z_k}^{\dagger}\bar E_{z_k z_k} = \bar E_{xz_k}. \qquad (13.8)$$

Lemma 13.3.2 Let $z = [z_1^T \cdots z_n^T]^T \in \mathbb{R}^{n^2}$,
$$D = \bar E_{zz} - \bar E_{zy}\bar E_{yy}^{\dagger}\bar E_{yz} \quad\text{and}\quad G = \bar E_{xz} - \bar E_{xy}\bar E_{yy}^{\dagger}\bar E_{yz}.$$
Then
$$GD^{\dagger}D = G.$$

The proofs are similar to those of Lemmas 2 and 3 in [14].
For the following result it is convenient to write $s = [1\ y^T\ z^T]^T \in \mathbb{R}^q$.

Lemma 13.3.3 Let
$$P_{11} = 1 - P_{12}E[y] - P_{13}E[z],$$
$$P_{12} = P_{21}^T, \qquad P_{13} = -E[y^T]P_{23} - E[z^T]P_{33},$$
$$P_{21} = -P_{22}E[y] - P_{23}E[z], \qquad P_{22} = \bar E_{yy}^{\dagger} - P_{23}\bar E_{zy}\bar E_{yy}^{\dagger},$$
$$P_{23} = P_{32}^T,$$
$$P_{31} = -P_{33}E[z] - P_{32}E[y], \qquad P_{32} = -P_{33}\bar E_{zy}\bar E_{yy}^{\dagger}, \qquad P_{33} = D^{\dagger}.$$
Then
$$E_{ss}^{\dagger} = \begin{bmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{bmatrix}. \qquad (13.9)$$

Proof. Let
$$t = \begin{bmatrix} 1 \\ y \end{bmatrix}, \qquad S_{11} = 1 - S_{12}E[y], \qquad S_{12} = -E[y^T]S_{22}, \qquad S_{21} = S_{12}^T, \qquad S_{22} = \bar E_{yy}^{\dagger}.$$
First we show that
$$E_{tt}^{\dagger} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}. \qquad (13.10)$$

We have  
S11 S12 Q11 Q12
Ett Ett = ,
S21 S22 Q21 Q22
where
Q11 = S11 + E[y T ]S21 + S12 E[y] + E[y T ]S22 E[y] = 1,
Q12 = S11 E[y T ] + E[y T ]S21 E[y T ] + S12 Eyy + E[y T ]S22 Eyy = E[y T ],
Q21 = E[y]S11 + Eyy S21 + E[y]S12 E[y] + Eyy S22 E[y] = E[y]
and

Q22 = E[y]S11 E[y T ] + Eyy S21 E[y T ] + E[y]S12 Eyy + Eyy S22 Eyy = Eyy .

S11 S12
Hence Ett Ett = Ett , that is, the first condition for the Moore–
S21 S22
Penrose inverse of Ett to be given by (13.10) is satisfied. The remain-

ing Moore–Penrose conditions for Ett are also easily verified, and therefore
(13.10) is established.
Next, let
† †
R11 = Ett − R12 Ezt Ett , R12 = R21 T
, (13.11)
† †
R21 = −R22 Ezt Ett and R22 = Dzt , (13.12)

where Dzt = Ezz − Ezt Ett Etz = D.
Arguing much as above, we have by Lemmas 13.3.1 and 13.3.2 that

† R11 R12
Ess = . (13.13)
R21 R22

Relation (13.9) follows from (13.13) by virtue of (13.10)–(13.12).

13.4 Main results

We denote by $U\Sigma V^T$ the singular-value decomposition of $E_{xs}(E_{ss}^{1/2})^{\dagger}$, that is,
$$U\Sigma V^T = E_{xs}(E_{ss}^{1/2})^{\dagger}, \qquad (13.14)$$

where
U = (u1 , . . . , uq ) ∈ Rm×q and V = (v1 , . . . , vq ) ∈ Rq×q

are orthogonal matrices and


Σ = diag(σ1 , . . . , σq ) ∈ Rq×q

is a diagonal matrix with σ1 ≥ · · · ≥ σk > 0 and σk+1 = · · · = σq = 0. Put

Ur = (u1 , . . . , ur ), Vr = (v1 , . . . , vr ), Σr = diag(σ1 , . . . , σr )



and define
$$\Theta_r = \Theta_r^{(x,s)} = U_r\Sigma_r V_r^T. \qquad (13.15)$$
Suppose
Φ = [A B0 . . . Bn ] ∈ Rm×q
and let Φ(:, η : τ ) be the matrix formed by the τ − η + 1 sequential columns
of Φ beginning with column η.
The optimal transform T 0 , introduced by (13.7), is defined by the following
theorem.
Theorem 13.4.1 The solution to problem (13.3) is given by
$$A^0 = \Phi^0(:, 1:1), \qquad B_j^0 = \Phi^0(:, jn+2 : jn+n+1), \qquad (13.16)$$
for $j = 0, 1, \ldots, n$, where
$$\Phi^0 = \Theta_r(E_{ss}^{1/2})^{\dagger} + M_r[I - E_{ss}^{1/2}(E_{ss}^{1/2})^{\dagger}],$$
with $I$ the identity matrix and $M_r \in \mathbb{R}^{m\times q}$ any matrix such that $\operatorname{rank}\Phi^0 \le r < m$.
Proof. We have
$$J(A, B_0, \ldots, B_n) = E[\|x - \Phi s\|^2].$$
By Lemma 13.3.1,
$$J(A, B_0, \ldots, B_n) = \mathrm{tr}\{E_{xx} - E_{xs}E_{ss}^{\dagger}E_{sx}\} + \mathrm{tr}\{(\Phi - E_{xs}E_{ss}^{\dagger})E_{ss}(\Phi - E_{xs}E_{ss}^{\dagger})^T\}$$
$$= \mathrm{tr}\{E_{xx} - E_{xs}E_{ss}^{\dagger}E_{sx}\} + \|(\Phi - E_{xs}E_{ss}^{\dagger})E_{ss}^{1/2}\|^2. \qquad (13.17)$$
The minimum of this functional subject to the constraint (13.4) is achieved if
$$\Phi E_{ss}^{1/2} = \Theta_r \qquad (13.18)$$
(see [5]). Here we have used
$$E_{ss}^{\dagger}E_{ss}^{1/2} = ([E_{ss}^{1/2}]^T E_{ss}^{1/2})^{\dagger}[E_{ss}^{1/2}]^T = (E_{ss}^{1/2})^{\dagger} = (E_{ss}^{\dagger})^{1/2}.$$

The necessary and sufficient condition (see [2]) for (13.18) to have a solution
is readily verified to hold and provides the solution Φ = Φ0 . The theorem is
proved.
Remark 1. The proof above is based on Lemma 13.3.1. The first equation in
(13.8) has been presented in [8] but without proof.
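As a concrete illustration of Theorem 13.4.1, a minimal NumPy sketch is given below. It is our own illustration and not part of the original development: it uses sample moments in place of $E_{xs}$ and $E_{ss}$, makes the particular choice $M_r = 0$, and the function and variable names are assumptions.

```python
import numpy as np

def optimal_transform(X, Y, r):
    """Sketch of the rank-r transform of Theorem 13.4.1 (with M_r = 0).

    X : (N, m) realizations of x;  Y : (N, n) realizations of y;  r : target rank.
    Returns Phi0 of shape (m, q), q = n^2 + n + 1, acting on s = [1, y^T, z^T]^T.
    """
    N, n = Y.shape
    # Build s = [1, y^T, z_1^T, ..., z_n^T]^T for every realization.
    S = np.hstack([np.ones((N, 1)), Y,
                   np.hstack([Y * Y[:, [j]] for j in range(n)])])
    Exs = X.T @ S / N                       # sample estimate of E_xs
    Ess = S.T @ S / N                       # sample estimate of E_ss
    # Symmetric square root of E_ss and its pseudo-inverse.
    w, V = np.linalg.eigh(Ess)
    w = np.clip(w, 0.0, None)
    Ess_half = (V * np.sqrt(w)) @ V.T
    Ess_half_pinv = np.linalg.pinv(Ess_half)
    # SVD of E_xs (E_ss^{1/2})^dagger, truncated to rank r, cf. (13.14)-(13.15).
    U, sig, Vt = np.linalg.svd(Exs @ Ess_half_pinv, full_matrices=False)
    Theta_r = U[:, :r] @ np.diag(sig[:r]) @ Vt[:r, :]
    return Theta_r @ Ess_half_pinv          # Phi^0 with the free term M_r set to 0

# In 0-based indexing, A^0 = Phi0[:, :1] and B_j^0 = Phi0[:, j*n+1 : j*n+n+1].
```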
Theorem 13.4.2 Let
$$\Delta = \|(E_{xz} - E_{xy}E_{yy}^{\dagger}E_{yz})(D^{\dagger})^{1/2}\|^2.$$
The error associated with the transform $T^0$ is
$$E[\|x - T^0(y)\|^2] = \mathrm{tr}\{E_{xx}\} + \sum_{i=r+1}^{k}\sigma_i^2 - \|E_{xy}(E_{yy}^{\dagger})^{1/2}\|^2 - \Delta.$$

Proof. By Lemma 13.3.3, it follows from (13.17) and (13.18) that
$$E[\|x - T^0(y)\|^2] = \mathrm{tr}\{E_{xx} - E_{xy}E_{yy}^{\dagger}E_{yx}\} + \|U\Sigma V^T - \Theta_r\|^2 - \|(E_{xz} - E_{xy}E_{yy}^{\dagger}E_{yz})(D^{\dagger})^{1/2}\|^2,$$
where
$$\|U\Sigma V^T - \Theta_r\|^2 = \sum_{j=r+1}^{k}\sigma_j^2$$
(see [5]). This proves the theorem.

13.5 Comparison of the transform T 0 and the GKLT

The GKLT is a particular case of our transform T 0 with A0 = O and each


Bj0 = O in (13.16), where O is the corresponding zero vector or zero matrix.
To compare the transform T 0 with the GKLT, we put A0 = O in (13.16).
Then the vector s in (13.14) can be written as s = s̃ = [y T z T ]T . We denote
by σ̃j the eigenvalues in (13.14) for s = s̃ and by T̃ 0 the transform which
follows from (13.7) and (13.16) with A0 = O and s = s̃. We denote the GKLT
by H.
Theorem 13.5.1 Let $\vartheta_1, \ldots, \vartheta_l$ be the nonzero singular values of the matrix $E_{xy}(E_{yy}^{1/2})^{\dagger}$, $\operatorname{rank} H = p \le l$ and $D = E_{zz} - E_{zy}E_{yy}^{\dagger}E_{yz}$. If
$$\sum_{j=r+1}^{k}\tilde\sigma_j^2 < \|(E_{xz} - E_{xy}E_{yy}^{\dagger}E_{yz})(D^{\dagger})^{1/2}\|^2 + \sum_{i=p+1}^{l}\vartheta_i^2, \qquad (13.19)$$
then the error associated with the transform $\tilde T^0$ is less than that associated with $H$, that is,
$$E\bigl[\|x - \tilde T^0(y)\|^2\bigr] < E[\|x - Hy\|^2].$$

Proof. It is easily shown that
$$E[\|x - Hy\|^2] = \mathrm{tr}\{E_{xx} - E_{xy}E_{yy}^{\dagger}E_{yx}\} + \sum_{i=p+1}^{l}\vartheta_i^2.$$

Hence
$$E[\|x - Hy\|^2] - E\bigl[\|x - \tilde T^0(y)\|^2\bigr] = \|(E_{xz} - E_{xy}E_{yy}^{\dagger}E_{yz})(D^{\dagger})^{1/2}\|^2 + \sum_{i=p+1}^{l}\vartheta_i^2 - \sum_{j=r+1}^{k}\tilde\sigma_j^2,$$
giving the desired result.

Condition (13.19) is not restrictive and is normally satisfied in practice.


In this connection, see the results of the simulations in Section 13.8.

13.6 Solution of the unconstrained minimization problem (13.3)

We now address the solution of the minimization problem (13.3) without the
constraint (13.4). This is important in its own right. The solution is a special
form of the transform T 0 and represents a model of the optimal nonlinear
filter with x an actual signal and y the observed data.
Let
$$P = \begin{bmatrix} D_{11} & \ldots & D_{1n} \\ D_{21} & \ldots & D_{2n} \\ \ldots & \ldots & \ldots \\ D_{n1} & \ldots & D_{nn} \end{bmatrix} \quad\text{and}\quad G = [G_1\ G_2\ \ldots\ G_n],$$
where
$$D_{ij} = \bar E_{z_i z_j} - \bar E_{z_i y}\bar E_{yy}^{\dagger}\bar E_{y z_j} \in \mathbb{R}^{n\times n} \quad\text{and}\quad G_j = \bar E_{x z_j} - \bar E_{xy}\bar E_{yy}^{\dagger}\bar E_{y z_j} \in \mathbb{R}^{m\times n}$$

for i, j = 1, . . . , n. We denote a solution to the unconstrained problem (13.3)


using the same symbols as before, that is, with A0 and Bj0 for j = 0, · · · , n.
Theorem 13.6.1 The solution to the problem (13.3) is given by
$$A^0 = E[x] - B_0^0 E[y] - \sum_{k=1}^{n} B_k^0 E[z_k], \qquad (13.20)$$
$$B_0^0 = \Bigl(\bar E_{xy} - \sum_{k=1}^{n} B_k^0 \bar E_{z_k y}\Bigr)\bar E_{yy}^{\dagger} + M_1[I - \bar E_{yy}^{1/2}(\bar E_{yy}^{1/2})^{\dagger}], \qquad (13.21)$$
$$[B_1^0\ B_2^0\ \ldots\ B_n^0] = GP^{\dagger} + M_2[I - P^{1/2}(P^{1/2})^{\dagger}], \qquad (13.22)$$


where $B_1^0, B_2^0, \ldots, B_n^0 \in \mathbb{R}^{m\times n}$, $M_1 \in \mathbb{R}^{m\times n}$ and $M_2 \in \mathbb{R}^{m\times n^2}$ are arbitrary matrices.

Theorem 13.6.2 Let
$$P^{\dagger} = \begin{bmatrix} Q_{11} & \ldots & Q_{1n} \\ Q_{21} & \ldots & Q_{2n} \\ \ldots & \ldots & \ldots \\ Q_{n1} & \ldots & Q_{nn} \end{bmatrix},$$
where $Q_{ij} \in \mathbb{R}^{n\times n}$ for $i, j = 1, \ldots, n$. The error associated with the transform $T^{(1)}$ defined by
$$T^{(1)}(y) = A^0 + \sum_{j=0}^{n} B_j^0 z_j,$$
with $A^0$ and $B_j^0$ given by (13.20)–(13.22), is
$$E[\|x - T^{(1)}(y)\|^2] = \mathrm{tr}\{\bar E_{xx}\} - \|\bar E_{xy}(\bar E_{yy}^{\dagger})^{1/2}\|^2 - \sum_{i=1}^{n}\|G_i Q_{ii}^{1/2}\|^2 - \sum_{\substack{j,k=1,\ldots,n \\ j\ne k}} \mathrm{tr}\{G_j Q_{jk} G_k^T\}.$$

The proofs of both theorems are similar to those of Theorems 1 and 2 in


[14].
It follows from Theorem 13.6.2 that the filter $T^{(1)}$ has a much smaller associated error than the error
$$E[\|x - H^{(1)}(y)\|^2] = \mathrm{tr}\{E_{xx}\} - \|E_{xy}(E_{yy}^{\dagger})^{1/2}\|^2$$
associated with the optimal linear filter $H^{(1)} = E_{xy}E_{yy}^{\dagger}$ in [8].

13.7 Applications and further modifications and extensions

Applications of our technique are abundant and include, for example, simul-
taneous filtering and compression of noisy stochastic signals, feature selection
in pattern recognition, blind channel equalization and the optimal rejection
of colored noise in some neural systems. For the background to these appli-
cations see [1, 4, 8, 18, 19].
The efficiency of a fixed-rank transform is mainly characterized by two
parameters; the compression ratio (see [8]) and the accuracy of signal restora-
tion. The signal compression is realized through the following device. Let p
be the rank of the transform H. Then H can be represented in the form

H = H1 H2 , where the matrix H2 ∈ Rp×n relates to compression of the sig-


nal and the matrix H1 ∈ Rm×p to its reconstruction. The compression ratio
of the transform is given by cH = p/m. Similarly the transform T 0 can be
represented in the form $\Phi^0 = C_1 C_2$, where, for example, $C_1 = U_r \in \mathbb{R}^{m\times r}$ and $C_2 = \Sigma_r V_r^T (E_{ss}^{1/2})^{\dagger} \in \mathbb{R}^{r\times q}$, so that the matrix $C_2$ is associated with
compression and the matrix C1 with signal reconstruction. The compression
ratio of the transform T 0 is given by cT = r/m.
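The factorization $\Phi^0 = C_1 C_2$ can be read off directly from the quantities computed in the earlier sketch of Theorem 13.4.1. The fragment below is again an illustration only, reusing those assumed variable names:

```python
import numpy as np

def factor_transform(U, sig, Vt, Ess_half_pinv, r):
    """Compression/reconstruction factors C1, C2 with Phi0 = C1 @ C2 (M_r = 0 case)."""
    C1 = U[:, :r]                                      # (m, r): reconstruction stage
    C2 = np.diag(sig[:r]) @ Vt[:r, :] @ Ess_half_pinv  # (r, q): compression stage
    return C1, C2

# With s the feature vector of an observation, the stored code is C2 @ s (r numbers),
# the reconstructed estimate of x is C1 @ (C2 @ s), and the compression ratio is r/m.
```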
Modifications of the method are motivated mostly by a desire to reduce the
computation entailed in the estimation of the singular-value decomposition of $E_{xs}(E_{ss}^{1/2})^{\dagger}$. This can be done by exploiting the representation (13.20)–

(13.22) in such a way that the matrices B10 , · · · , Bn0 in (13.22) are estimated
by a scheme similar to the Gaussian elimination scheme in linear algebra. A
rank restriction can then be imposed on the matrices B10 , · · · , Bn0 that will
bring about reduction of the computational work in finding certain pseudo-
inverse matrices.
Extensions of the technique can be made in the following directions. First,
the method can be combined with a special iterative procedure to improve the
associated accuracy of the signal estimation. Secondly, an attractive extension
may be based on the representation of the operator T (13.6) in the form

T (y) = A0 + A1 y + A2 (y, y) + · · · + Ak (y, . . . , y),

where Ak : (Rn )k → Rm is a k-linear operator.


Thirdly, a natural extension is to apply the technique to the optimal synthesis of nonlinear systems. Background material can be found in papers by
Sandberg (see, for example, [11, 12]) and also [6, 7, 13, 16].

13.8 Simulations

The aim of our simulations is to demonstrate the advantages of T 0 over the


GKLT H. To this end, we use the standard digitized image “Lena” presented
by a 256 × 256 matrix X.
To compare the transforms T 0 and H for different noisy signals, we parti-
tion the matrix X into 128 submatrices Xij ∈ R16×32 with i = 1, . . . , 16 and
j = 1, . . . , 8 and treat each Xij as a set of 32 realizations of a random vector
so that a column of Xij represents the vector realization.
Observed data have been simulated in the form
$$Y_{ij} = 10 * R_{ij}^{(1)} .* X_{ij} + 500 * R_{ij}^{(2)},$$
with $i = 1, \ldots, 16$ and $j = 1, \ldots, 8$, where each $R_{ij}^{(1)}$ is a matrix with entries uniformly distributed over the interval (0, 1) and each $R_{ij}^{(2)}$ is a matrix with normally distributed entries with mean 0 and variance 1. The symbol .* signifies Hadamard matrix multiplication.
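A sketch of this data-generation step is given below. It assumes NumPy and uses a random stand-in for one 16 × 32 block $X_{ij}$, since the original image data are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one 16 x 32 block X_ij of the test image (values in [0, 255]
# as for an 8-bit grey-scale image); the true "Lena" data are not used here.
Xij = rng.integers(0, 256, size=(16, 32)).astype(float)

R1 = rng.uniform(0.0, 1.0, size=Xij.shape)   # entries uniform on (0, 1)
R2 = rng.normal(0.0, 1.0, size=Xij.shape)    # standard normal entries

# Observed data: elementwise (Hadamard) product for the multiplicative noise
# plus an additive Gaussian term, as in the simulation set-up above.
Yij = 10.0 * R1 * Xij + 500.0 * R2
```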

The transforms T 0 and H have been applied to each pair Xij , Yij with the
same rank r = 8, that is, with the same compression ratio. The correspond-
ing covariance matrices have been estimated from the samples Xij and Yij .
Special methods for their estimation can be found, for example, in [3, 9, 10]
and [17].
Table 13.1 represents the values of the ratios
$$\rho_{ij} = \|X_{ij} - H(Y_{ij})\|^2 \big/ \|X_{ij} - T^0(Y_{ij})\|^2$$
for each $i = 1, \ldots, 16$ and $j = 1, \ldots, 8$, where $\|X_{ij} - H(Y_{ij})\|^2$ and $\|X_{ij} - T^0(Y_{ij})\|^2$ are the errors associated with the transforms $H$ and $T^0$, respectively. The value $\rho_{ij}$ is placed in the cell situated in row $i$ and column $j$.

Table 13.1 Ratios ρij of the error associated with the GKLT H to that of the transform
T 0 with the same compression ratios

↓i j → 1 2 3 4 5 6 7 8
1 5268.3 3880.6 1864.5 1094.7 2605.4 2878.0 4591.6 1052.7
2 2168.4 995.1 1499.7 338.6 1015.1 3324.0 2440.5 336.1
3 2269.3 803.5 158.4 136.4 66.7 2545.4 1227.1 326.6
4 1394.3 716.2 173.7 62.9 451.4 721.6 227.8 691.6
5 3352.4 1970.1 98.9 192.8 390.0 92.8 680.4 3196.8
6 1781.5 758.6 93.6 79.3 59.8 223.2 110.5 2580.8
7 2077.4 1526.0 67.4 30.3 172.5 70.3 1024.4 4749.3
8 3137.2 901.2 27.1 38.5 475.3 445.6 1363.2 2917.5
9 2313.2 117.0 18.0 39.3 180.6 251.0 1500.4 2074.2
10 1476.0 31.5 35.7 119.3 859.3 883.5 2843.1 3270.6
11 1836.7 35.3 36.4 1015.5 460.6 487.0 2843.1 8902.3
12 1808.5 74.5 38.2 419.0 428.0 387.2 2616.9 8895.3
13 1849.1 17.6 30.3 492.4 1175.5 135.8 1441.9 1649.2
14 2123.6 54.9 38.6 302.0 1310.5 2193.8 2681.5 1347.9
15 1295.1 136.3 31.8 711.1 2561.7 5999.2 550.7 996.0
16 2125.5 114.9 31.5 732.3 2258.2 5999.2 550.7 427.1

Inspection of Table 13.1 shows that, for the same compression ratio, the error associated with the transform $T^0$ ranges from one part in 17.6 to one part in 8,895.3 of that of the transform $H$.
We also applied our filter $T^{(1)}$ (constructed from Theorem 13.6.1) and the optimal linear filter $H^{(1)} = E_{xy}E_{yy}^{\dagger}$ to the same signals and data as above, that is, to each pair $X_{ij}, Y_{ij}$ with $i = 1, \ldots, 16$ and $j = 1, \ldots, 8$.
The errors associated with filters $T^{(1)}$ and $H^{(1)}$ are
$$\|X - X_T\|^2 = 1.4\times 10^{-12} \quad\text{and}\quad \|X - X_H\|^2 = 3.9\times 10^{7},$$
where the matrices $X_T$ and $X_H$ have been constructed from the submatrices $X_{Tij} \in \mathbb{R}^{16\times 32}$ and $X_{Hij} \in \mathbb{R}^{16\times 32}$ correspondingly, that is, $X_T = \{X_{Tij}\} \in \mathbb{R}^{256\times 256}$ and $X_H = \{X_{Hij}\} \in \mathbb{R}^{256\times 256}$, with $X_{Tij} = T^{(1)}(Y_{ij})$ the estimate of $X_{ij}$ by the filter $T^{(1)}$ and $X_{Hij} = H^{(1)}Y_{ij}$ that of $X_{ij}$ by the filter $H^{(1)}$. The error produced by the filter $H^{(1)}$ is $2.7\times 10^{19}$ times greater than that of the filter $T^{(1)}$.
Figures 13.1(c) and (d) represent images reconstructed after filtering and
compression of the noisy image in Figure 13.1(b) by the transforms H and

Fig. 13.1 Illustration of the performance of our method: (a) given signals; (b) observed signals; (c) reconstruction after filtering and compression by the GKLT; (d) reconstruction after filtering and compression by our transform with the same rank as that of the GKLT; (e) estimates by the filter H(1); (f) estimates by the filter T(1).



Fig. 13.2 Typical examples of a column reconstruction in the matrix X (image “Lena”) after filtering and compression of the observed noisy image (Figure 13.1b) by transforms H (line with circles) and T 0 (solid line) of the same rank: (a) estimates of the 18th column in the matrix X; (b) estimates of the 244th column in the matrix X. In both subfigures, the plot of the column (solid line) virtually coincides with the plot of the estimate by the transform T 0.

T 0 , which have been applied to each of the subimages Xij , Yij with the same compression ratio.
Figures 13.1(e) and (f) represent estimates of the noisy image in Figure
13.1(b) by filters H (1) and T (1) , respectively.
To illustrate the simulation results in a different way, we present typical
examples of the plots of a column estimate in matrix X by transforms H
and T 0 . Note that differences of the estimate by T 0 from the column plot are
almost invisible.
Table 13.1 and Figures 13.1 and 13.2 demonstrate the advantages of our
technique.

13.9 Conclusion

The recently discovered generalization [8] of the Karhunen–Loève transform


(GKLT) is the best linear transform of fixed rank. In this chapter we have
proposed and justified a new nonlinear transform which possesses substan-
tially smaller associated error than that of the GKLT of the same rank.
A number of potential applications, modifications and extensions have
been described. Numerical simulations demonstrate the clear advantages of
our technique.

References

1. K. Abed–Meraim, W. Qiu and Y. Hua, Blind system identification, Proc. IEEE 85


(1997), 1310–1322.
2. A. Ben–Israel and T. N. E. Greville, Generalized Inverses: Theory and Applications
(John Wiley & Sons, New York, 1974).
3. J.–P. Delmas, On eigenvalue decomposition estimators of centro–symmetric covariance
matrices, Signal Proc. 78 (1999), 101–116.
4. K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic Press, Boston,
1990).
5. G. H Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins University
Press, Baltimore, 1996).
6. P. G. Howlett and A. P. Torokhti, A methodology for the constructive approximation
of nonlinear operators defined on noncompact sets, Numer. Funct. Anal. Optim. 18
(1997), 343–365.
7. P. G. Howlett and A. P. Torokhti, Weak interpolation and approximation of non–linear
operators on the space C([0, 1]), Numer. Funct. Anal. Optim. 19 (1998), 1025–1043.
8. Y. Hua and W. Q. Liu, Generalized Karhunen-Loève transform, IEEE Signal Proc.
Lett. 5 (1998), 141–142.
9. M. Jansson and P. Stoica, Forward–only and forward–backward sample covariances –
a comparative study, Signal Proc. 77 (1999), 235–245.
10. E. I. Lehmann, Testing Statistical Hypotheses (John Wiley, New York, 1986).
11. I. W. Sandberg, Separation conditions and approximation of continuous–time approxi-
mately finite memory systems, IEEE Trans. Circuit Syst.: Fund. Th. Appl., 46 (1999),
820–826.

12. I. W. Sandberg, Time–delay polynomial networks and quality of approximation, IEEE


Trans. Circuit Syst.: Fund. Th. Appl. 47 (2000), 40–49.
13. A. P. Torokhti and P. G. Howlett, On the constructive approximation of non–linear
operators in the modelling of dynamical systems, J. Austral. Math. Soc. Ser. B 39
(1997), 1–27.
14. A. P. Torokhti and P. G. Howlett, An optimal filter of the second order, IEEE Trans.
Signal Proc. 49 (2001), 1044–1048.
15. A. P. Torokhti and P. G. Howlett, Optimal fixed rank transform of the second degree,
IEEE Trans. Circuit Syst.: Analog Digital Signal Proc. 48 (2001), 309–315.
16. A. P. Torokhti and P. G. Howlett, On the best quadratic approximation of nonlinear
systems, IEEE Trans. Circuit Syst.: Fund. Th. Appl., 48 (2001), 595–602.
17. V. N. Vapnik, Estimation of Dependences Based on Empirical Data (Springer-Verlag,
New York, 1982).
18. Y. Yamashita and H. Ogawa, Relative Karhunen–Loève transform, IEEE Trans. Signal
Proc. 44 (1996), 371–378.
19. L.–H. Zou and J. Lu, Linear associative memories with optimal rejection to colored
noise, IEEE Trans. Circuit Syst.: Analog Digital Signal Proc. 44 (1997), 990–1000.
Chapter 14
Optimal capacity assignment
in general queueing networks

P. K. Pollett

Abstract We consider the problem of how best to assign the service capacity
in a queueing network in order to minimize the expected delay under a cost
constraint. We study systems with several types of customers, general service
time distributions, stochastic or deterministic routing, and a variety of ser-
vice regimes. For such networks there are typically no analytical formulae for
the waiting-time distributions. Thus we shall approach the optimal alloca-
tion problem using an approximation technique: specifically, the residual-life
approximation for the distribution of queueing times. This work generalizes
results of Kleinrock, who studied networks with exponentially distributed
service times. We illustrate our results with reference to data networks.

Key words: Capacity assignment, queueing network, residual-life


approximation

14.1 Introduction

Since their inception, queueing network models have been used to study a
wide variety of complex stochastic systems involving the flow and interaction
of individual items: for example, “job shops,” where manufactured items are
fashioned by various machines in turn [7]; the provision of spare parts for
collections of machines [17]; mining operations, where coal faces are worked
in turn by a number of specialized machines [12]; and delay networks, where
packets of data are stored and then transmitted along the communications
links that make up the network [18, 1]. For some excellent recent expositions,
which describe these and other instances where queueing networks have been
applied, see [2, 6] and the important text by Serfozo [16].

P. K. Pollett
Department of Mathematics, University of Queensland, Queensland 4072, AUSTRALIA
e-mail: [email protected]


In each of the above-mentioned systems it is important to be able to determine how best to assign service capacity so as to optimize various performance measures, such as the expected delay or the expected number of items
(customers) in the network. We shall study this problem in greater generality
than has previously been considered. We allow different types of customers,
general service time distributions, stochastic or deterministic routing, and a
variety of service regimes. The basic model is that of Kelly [8], but we do not
assume that the network has the simplifying feature of quasi-reversibility [9].

14.2 The model

We shall suppose that there are J queues, labeled j = 1, 2, . . . , J. Cus-


tomers enter the network from external sources according to independent
Poisson streams, with type u customers arriving at rate νu (customers per
second). Service times at queue j are assumed to be mutually independent,
with an arbitrary distribution Fj (x) that has mean 1/μj (units of service)
and variance σj2 . For simplicity we shall assume that each queue operates
under the usual first-come–first-served (FCFS) discipline and that a total ef-
fort (or capacity) of φj (units per second) is assigned to queue j. We shall
explain later how our results can be extended to deal with other queueing
disciplines.
We shall allow for two possible routing procedures: fixed routing, where
there is a unique route specified for each customer type, and random alterna-
tive routing, where one of a number of possible routes is chosen at random.
(We do not allow for adaptive or dynamic routing, where routing decisions
are made on the basis of the observed traffic flow.) For fixed routing we define
R(u) to be the (unique) ordered list of queues visited by type u customers. In
particular, let R(u) = {ru (1), . . . , ru (su )}, where su is the number of queues
visited by a type u customer and ru (s) is the queue visited at stage s along
its route (ru (s), s = 1, 2, . . . , su , are assumed to be distinct). It is perhaps
surprising that random alternative routing can be accommodated within the
framework of fixed routing (see Exercise 3.1.2 of [10]). If there are several
alternative routes for a given type u, then one simply provides a finer type
classification for customers using these routes. We label the alternative routes
as (u, i), i = 1, 2, . . . , N (u), where N (u) is the number of alternative routes
for type u customers, and we replace R(u) by R(u, i) = {rui (1), . . . , rui (sui )},
for i = 1, 2, . . . , N (u), where now rui (s) is the queue visited at stage s along
alternative route i and sui is the number of stages. We then replace νu by
νui = νu qui , where qui is the probability that alternative route i is cho-
N (u)
sen. Clearly νu = i=1 νui , and so the effect is to thin the Poisson stream
of arrivals of type u into a collection of independent Poisson streams, one
for each type (u, i). We should think of customers as being identified by
their type, whether this be simply u for fixed routing, or the finer classi-

fication (u, i) for alternative routing. For convenience, let us denote by T


the set of all types, and suppose that, for each t in T , customers of type t
arrive according to a Poisson stream with rate νt and traverse the route
R(t) = {rt (1), . . . , rt (st )}, a collection of st distinct queues. This is the net-
work of queues with customers of different types described in [8]. If all service
times have a common exponential distribution with mean 1/μ (and hence
μj = μ), the model is analytically tractable. In equilibrium the queues be-
have independently: indeed, as if they were isolated , each with independent
Poisson arrival streams (independent among types). For example, if we let
αj (t, s) = νt when rt (s) = j, and α j (t, s)= 0 otherwise, so that the arrival
st
rate at queue j is given by αj = t∈T s=1 αj (t, s), and the demand (in
units per second) by aj = αj /μ, then, provided the system is stable (aj < φj
for each j), the expected number of customers at queue j is n̄j = aj /(φj − aj )
and the expected delay is W j = n̄j /αj = 1/(μφj − αj ); for further details,
see Section 3.1 of [10].

14.3 The residual-life approximation

Under our assumption that service times have arbitrary distributions, the
model is rendered intractable. In particular, there are no analytical formu-
lae for the delay distributions. We shall therefore adopt one of the many
approximation techniques. Consider a particular queue j and let Qj (x) be
the distribution function of the queueing time, that is, the period of time a
customer spends at queue j before its service begins. The residual-life approx-
imation, developed by the author [14], provides an accurate approximation
for Qj (x):
$$Q_j(x) \approx \sum_{n=0}^{\infty} \Pr(n_j = n)\, G_{jn}(x), \qquad (14.1)$$
where $G_j(x) = \mu_j \int_0^{\phi_j x} (1 - F_j(y))\, dy$ and $G_{jn}(x)$ denotes the $n$-fold convolution of $G_j(x)$. The distribution of the number of customers $n_j$ at queue $j$,
which appears in (14.1), is that of a corresponding quasi-reversible net-
work [10]: specifically, a network of symmetric queues obtained by imposing
a symmetry condition at each queue j. The term residual-life approximation
comes from renewal theory; Gj (x) is the residual-life distribution correspond-
ing to the (lifetime) distribution Fj (x/φj ).
One immediate consequence of (14.1) is that the expected queueing time $Q_j$ is approximated by $Q_j \approx \bar n_j (1 + \mu_j^2\sigma_j^2)/(2\mu_j\phi_j)$, where $\bar n_j$ is the expected number of customers at queue $j$ in the corresponding quasi-reversible network. Hence the expected delay at queue $j$ is approximated as follows:
$$W_j \approx \frac{1}{\mu_j\phi_j} + \frac{1+\mu_j^2\sigma_j^2}{2\mu_j\phi_j}\,\bar n_j.$$

Under the residual-life approximation, it is only n̄j which changes when the
service discipline is altered. In the current context, the FCFS discipline, which
is assumed to be in operation everywhere in the network, is replaced by a
preemptive-resume last-come–first-served discipline, giving n̄j = aj /(φj − aj )
with aj = αj /μj , for each j, and hence
$$W_j \approx \frac{1}{\mu_j\phi_j} + \frac{1+\mu_j^2\sigma_j^2}{2\mu_j\phi_j}\cdot\frac{\alpha_j}{\mu_j\phi_j - \alpha_j}. \qquad (14.2)$$

Simulation results presented in [14] justify the approximation by assessing


its accuracy under a variety of conditions. Even for relatively small networks
with generous mixing of traffic, it is accurate, and the accuracy improves
as the size and complexity of the network increases. (The approximation is
very accurate in the tails of the queueing time distributions and so it allows
an accurate prediction to be made of the likelihood of extreme queueing
times.) For moderately large networks the approximation becomes worse as
the coefficient of variation μj σj of the service-time distribution at queue j
deviates markedly from 1, the value obtained in the exponential case.
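For reference, a direct evaluation of the approximation (14.2) at a single queue might look as follows; the function and parameter names are our own illustrative choices, and the exponential case is included only as a check:

```python
def expected_delay(alpha_j, mu_j, sigma2_j, phi_j):
    """Residual-life approximation (14.2) to the expected delay at queue j.

    alpha_j : arrival rate, mu_j : reciprocal mean service requirement,
    sigma2_j : variance of the service-time distribution, phi_j : capacity.
    Requires stability: alpha_j < mu_j * phi_j.
    """
    if alpha_j >= mu_j * phi_j:
        raise ValueError("queue j is unstable: alpha_j >= mu_j * phi_j")
    c2 = (mu_j ** 2) * sigma2_j           # squared coefficient of variation
    return 1.0 / (mu_j * phi_j) + (1.0 + c2) / (2.0 * mu_j * phi_j) * (
        alpha_j / (mu_j * phi_j - alpha_j))

# Example: exponential service (c2 = 1) recovers W_j = 1/(mu*phi - alpha).
print(expected_delay(alpha_j=0.5, mu_j=1.0, sigma2_j=1.0, phi_j=1.0))  # = 2.0
```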

14.4 Optimal allocation of effort

We now turn our attention to the problem of how best to apportion resources
so that the expected network delay, or equivalently (by Little’s theorem)
the expected number of customers in the network, is minimized. We shall
suppose that there is some overall network budget F (dollars) which cannot
be exceeded, and that the cost of operating queue j is a function fj of its
capacity. Suppose that the cost of operating queue j is proportional to φj ,
that is, fj (φj ) = fj φj (the units of fj are dollars per unit of capacity, or
dollar–seconds per unit of service). Thus we should choose the capacities
subject to the cost constraint


$$\sum_{j=1}^{J} f_j\phi_j = F. \qquad (14.3)$$

We shall suppose that the average delay of customers at queue j is adequately


approximated by (14.2). Using Little’s theorem, we obtain an approximate
expression for the mean number m̄ of customers in the network. This is
$$\bar m \approx \sum_{j=1}^{J} \alpha_j\left[\frac{1}{\mu_j\phi_j} + \frac{\alpha_j(1+\mu_j^2\sigma_j^2)}{2\mu_j\phi_j(\mu_j\phi_j-\alpha_j)}\right] = \sum_{j=1}^{J}\left[\frac{a_j}{\phi_j} + \frac{a_j^2(1+c_j)}{2\phi_j(\phi_j-a_j)}\right],$$

where cj = μ2j σj2 is the squared coefficient of variation of the service time
distribution Fj (x). We seek to minimize m̄ over φ1 , . . . , φJ subject to (14.3).
To this end, we introduce a Lagrange multiplier $1/\lambda^2$; our problem then becomes one of minimizing
$$L(\phi_1,\ldots,\phi_J;\lambda^{-2}) = \bar m + \frac{1}{\lambda^2}\left(\sum_{j=1}^{J} f_j\phi_j - F\right).$$

Setting $\partial L/\partial\phi_j = 0$ for fixed $j$ yields a quartic polynomial equation in $\phi_j$:
$$2f_j\phi_j^4 - 4a_jf_j\phi_j^3 + 2a_j(a_jf_j - \lambda^2)\phi_j^2 - 2\varepsilon_j a_j^2\lambda^2\phi_j + \varepsilon_j a_j^3\lambda^2 = 0,$$
where $\varepsilon_j = c_j - 1$, and our immediate task is to find solutions such that $\phi_j > a_j$ (recall that this latter condition is required for stability). The task is simplified by observing that the transformation $\phi_jf_j/F \to \phi_j$, $a_jf_j/F \to a_j$, $\lambda^2/F \to \lambda^2$, reduces the problem to one with unit costs $f_j = F = 1$, whence the above polynomial equation becomes
$$2\phi_j^4 - 4a_j\phi_j^3 + 2a_j(a_j - \lambda^2)\phi_j^2 - 2\varepsilon_j a_j^2\lambda^2\phi_j + \varepsilon_j a_j^3\lambda^2 = 0, \qquad (14.4)$$

and the constraint becomes

φ1 + φ2 + · · · + φJ = 1 . (14.5)

It is easy to verify that, if service times are exponentially distributed ($\varepsilon_j = 0$ for each $j$), there is a unique solution to (14.4) on $(a_j,\infty)$, given by $\phi_j = a_j + |\lambda|\sqrt{a_j}$. Upon application of the constraint (14.5) we arrive at the optimal capacity assignment $\phi_j = a_j + \sqrt{a_j}\,(1-\sum_{k=1}^{J} a_k)/(\sum_{k=1}^{J}\sqrt{a_k})$, for unit costs. In the case of general costs this becomes
$$\phi_j = a_j + \frac{1}{f_j}\left(F - \sum_{k=1}^{J} f_k a_k\right)\frac{\sqrt{f_j a_j}}{\sum_{k=1}^{J}\sqrt{f_k a_k}},$$

after applying the transformation. This is a result obtained by Kleinrock [11] (see also [10]): the allocation proceeds by first assigning enough capacity to meet the demand $a_j$ at each queue $j$, and then allocating a proportion of the affordable excess capacity, $(F - \sum_{k=1}^{J} f_k a_k)/f_j$ (that which could be afforded to queue $j$), in proportion to the square root of the cost $f_j a_j$ of meeting that demand. In the case where some or all of the $\varepsilon_j$, $j = 1, 2, \ldots, J$, deviate from zero, (14.4) is difficult to solve analytically. We shall adopt a perturbation technique, assuming that the Lagrange multiplier and the optimal allocation take the following forms:


J
λ = λ0 + λ1k k + O(2 ), (14.6)
k=1

J
φj = φ0j + φ1jk k + O(2 ) , j = 1, . . . , J, (14.7)
k=1

where O(2 ) denotes terms of order i k . The zero-th order terms come from

Kleinrock’s solution: specifically, φ0j = aj + λ0 aj , j = 1, . . . , J, where λ0 =
J J √
(1 − k=1 ak )/( k=1 ak ). On substituting (14.6) and (14.7) into (14.4) we
obtain an expression for φ1jk in terms of λ1k , which in turn is calculated
using the constraint (14.5) and by setting k = δkj (the Kronecker delta). We
find that the optimal allocation, to first order, is
$$\phi_j = a_j + \lambda_0\sqrt{a_j} - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\sum_{k\ne j} b_k\varepsilon_k + \left(1 - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\right) b_j\varepsilon_j, \qquad (14.8)$$
where $b_k = \tfrac{1}{4}\lambda_0 a_k^{3/2}(a_k + 2\lambda_0\sqrt{a_k})/(a_k + \lambda_0\sqrt{a_k})^2$.
For most practical applications, higher-order solutions are required. To achieve this we can simplify matters by using a single perturbation $\varepsilon = \max_{1\le j\le J}|\varepsilon_j|$. For each $j$ we define a quantity $\beta_j = \varepsilon_j/\varepsilon$ and write $\phi_j$ and $\lambda$ as power series in $\varepsilon$:
$$\lambda = \sum_{n=0}^{\infty}\lambda_n\varepsilon^n, \qquad \phi_j = \sum_{n=0}^{\infty}\phi_{nj}\varepsilon^n, \quad j = 1,\ldots,J. \qquad (14.9)$$
n=0 n=0

Substituting as before into (14.4), and using (14.5), gives rise to an iterative
scheme, details of which can be found in [13]. The first-order approximation is
useful, nonetheless, in dealing with networks whose service-time distributions
are all ‘close’ to exponential in the sense that their coefficients of variation do
not differ significantly from 1. It is also useful in providing some insight into
how the allocation varies as $\varepsilon_j$, for fixed $j$, varies. Let $\phi_i'$, $i = 1, 2, \ldots, J$, be the new optimal allocation obtained after incrementing $\varepsilon_j$ by a small quantity $\delta > 0$. We find that to first order in $\delta$
$$\phi_j' - \phi_j = \left(1 - \frac{\sqrt{a_j}}{\sum_{k=1}^{J}\sqrt{a_k}}\right) b_j\,\delta > 0,$$
$$\phi_i' - \phi_i = -\frac{\sqrt{a_i}}{\sum_{k=1}^{J}\sqrt{a_k}}\,(\phi_j' - \phi_j) < 0, \quad i \ne j.$$

Thus, if the coefficient of variation of the service-time distribution at a given


queue j is increased (respectively decreased) by a small quantity δ, then there
is an increase (respectively decrease) in the optimal allocation at queue j
which is proportional to δ. All other queues experience a complementary
decrease (respectively increase) in their allocations and the resulting deficit
is reallocated in proportion to the square root of the demand.
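The zero-th order allocation and the first-order correction (14.8) are straightforward to compute. The following sketch is our own illustration, assuming unit costs, a total capacity normalized to 1 and NumPy; it is not part of the original analysis:

```python
import numpy as np

def first_order_allocation(a, eps):
    """Kleinrock's zero-th order capacities plus the first-order correction (14.8).

    a   : demands a_j (unit costs, total capacity normalised to 1, sum(a) < 1);
    eps : eps_j = c_j - 1, squared coefficient of variation minus one, per queue.
    """
    a, eps = np.asarray(a, float), np.asarray(eps, float)
    sq = np.sqrt(a)
    lam0 = (1.0 - a.sum()) / sq.sum()
    phi0 = a + lam0 * sq                               # Kleinrock's allocation
    b = 0.25 * lam0 * a**1.5 * (a + 2*lam0*sq) / (a + lam0*sq)**2
    w = sq / sq.sum()
    # (14.8): phi_j = phi0_j - w_j * sum_{k != j} b_k eps_k + (1 - w_j) b_j eps_j
    phi1 = -w * (b @ eps - b * eps) + (1.0 - w) * b * eps
    return phi0 + phi1

phi = first_order_allocation(a=[0.2, 0.1, 0.3], eps=[0.0, 0.5, -0.3])
print(phi, phi.sum())   # the capacities sum to 1
```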

In [13] empirical estimates were obtained for the radii of convergence of


the power series (14.9) for the optimal allocation. In all cases considered
there, the closest pole to the origin was on the negative real axis outside
the physical limits for the $\varepsilon_j$, which are of course $-1 \le \varepsilon_j < \infty$. The perturbation technique is therefore useful for networks whose service-time distributions are, for example, Erlang (gamma) ($-1 < \varepsilon_j < 0$) or mixtures of exponential distributions ($0 < \varepsilon_j < \infty$) with not too large a coefficient of
variation.

14.5 Extensions

So far we have assumed that the capacity does not depend on the state of
the queue (as a consequence of the FCFS discipline) and that the cost of
operating a queue is a linear function of its capacity. Let us briefly consider
some other possibilities. Let φj (n) be the effort assigned to queue j when there
are n customers present. If, for example, φj (n) = nφj /(n + η − 1), where η
is a positive constant, the zero-th order allocation, optimal under (14.3), is
precisely the same as before (the case η = 1). For values of η greater than 1
the capacity increases as the number of customers at queue j increases and
levels off at a constant value φj as the number becomes large. If we allow η
to depend on j we get a similar allocation but with the factor
 
$$\frac{\sqrt{f_j a_j}}{\sum_{k=1}^{J}\sqrt{f_k a_k}} \quad\text{replaced by}\quad \frac{\sqrt{f_j\eta_j a_j}}{\sum_{k=1}^{J}\sqrt{f_k\eta_k a_k}}$$

(see Exercise 4.1.6 of [10]). The higher-order analysis is very nearly the same
as before. The factor 1 + cj is replaced by ηj (1 + cj ); for the sake of brevity,
we shall omit the details.
As another example, suppose that the capacity function is linear, that is, $\phi_j(n) = \phi_j n$, and that service times are exponentially distributed. In this case, the total number of customers in the system has a Poisson distribution with mean $\sum_{j=1}^{J}(a_j/\phi_j)$ and it is elementary to show that the optimal allocation subject to (14.3) is given by
$$\phi_j = \frac{\sqrt{f_j a_j}}{f_j\sum_{k=1}^{J}\sqrt{f_k a_k}}\,F, \quad j = 1,\ldots,J.$$

It is interesting to note that we get a proportional allocation, $\phi_j/\phi_k = a_j/a_k$, in this case if (14.3) is replaced by $\sum_{j=1}^{J}\log\phi_j = 1$ (see Exercise 4.1.7 of [10]). More generally, we might use the constraint
$$\sum_{j=1}^{J} f_j\log(g_j\phi_j) = F$$

to account for ‘decreasing costs’: costs become less with each increase in
capacity. Under this constraint, the optimal allocation is $\phi_j = \lambda a_j/f_j$, where
$$\log\lambda = \left(F - \sum_{k=1}^{J} f_k\log(g_k a_k/f_k)\right)\Big/\sum_{k=1}^{J} f_k.$$
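Under this logarithmic constraint the allocation is explicit. A small illustrative sketch is given below (the function name and example numbers are our own, with NumPy assumed):

```python
import numpy as np

def log_cost_allocation(a, f, g, F):
    """Capacity allocation under the logarithmic ('decreasing cost') constraint
    sum_j f_j log(g_j phi_j) = F, with the linear capacity function phi_j(n) = phi_j n."""
    a, f, g = map(lambda v: np.asarray(v, float), (a, f, g))
    log_lam = (F - np.sum(f * np.log(g * a / f))) / f.sum()
    return np.exp(log_lam) * a / f          # phi_j = lambda * a_j / f_j

phi = log_cost_allocation(a=[1.0, 2.0, 0.5], f=[1.0, 1.0, 2.0], g=[1.0, 1.0, 1.0], F=3.0)
print(phi)   # by construction, sum_j f_j log(g_j phi_j) equals F
```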

14.6 Data networks

One of the most interesting and useful applications of queueing networks is


in the area of telecommunications, where they are used to model (among
other things) data networks. In contrast to circuit-switched networks (see for
example [15]), where one or more circuits are held simultaneously on several
links connecting a source and destination node, only one link is used at any
time by a given transmission in a data network (message- or packet-switched
network); a transmission is received in its entirety at a given node before
being transmitted along the next link in its path through the network. If
the link is at full capacity, packets are stored in a buffer until the link be-
comes available for use. Thus the network can be modeled as a queueing
network: the queues are the communications links and the customers are the
messages. The most important measure of performance of a data network
is the total delay, the time it takes for a message to reach its destination.
Using the results presented above, we can optimally assign the link capaci-
ties (service rates) in order to minimize the expected total delay. We shall
first explain in detail how the data network can be described by a queueing
network.
Suppose that there are N switching nodes, labeled n = 1, 2, . . . , N , and J
communications links, labeled j = 1, 2, . . . , J. We assume that all the links
are perfectly reliable and not subject to noise, so that transmission times are
determined by message length. We shall also suppose that the time taken to
switch, buffer, and (if necessary) re-assemble and acknowledge, is negligible
compared with the transmission times. Each message is therefore assumed to
have the same transmission time on all links visited. Transmission times are
assumed to be mutually independent with a common (arbitrary) distribution
having mean 1/μ (bits, say) and variance σ 2 . Traffic entering the network
from external sources is assumed to be Poisson and that which originates
from node m and is destined for node n is offered at rate νmn ; the origin–
destination pair determines the message type. We shall assume that each link
operates under a FCFS discipline and that a total capacity of φj (bits per
second) is assigned to link j.
In order to apply the above results, we shall need to make a further assump-
tion. It is similar to the celebrated independence assumption of Kleinrock [11].
As remarked earlier, each message has the same transmission time on all links
visited. However, numerous simulation results (see for example [11]) suggest

that, even so, the network behaves as if successive transmission times at


any given link are independent. We shall therefore suppose that transmission
times at any given link are independent and that transmission times at differ-
ent links are independent. This phenomenon can be explained by observing
that the arrival process at a given link is the result of the superposition of a
generally large number of streams, which are themselves the result of thin-
ning the output from other links. The approximation can therefore be justified
on the basis of limit theorems concerning the thinning and superposition of
marked point processes; see [3, 4, 5], and the references therein. Kleinrock’s
assumption differs from ours only in that he assumes the transmission-time
distribution at a given link j is exponential with common mean 1/μ, a natural
consequence of the usual teletraffic modeling assumption that messages ema-
nating from outside the network are independent and identically distributed
exponential random variables. However, although the exponential assump-
tion is usually valid in circuit-switched networks, we should not expect it
to be appropriate in the current context of message/packet switching, since
packets are of similar length. Thus it is more realistic to assume, as we do
here, that message lengths have an arbitrary distribution.
For each origin–destination (ordered) pair (m, n), let

R(m, n) = {rmn (1), rmn (2), . . . , rmn (smn )}

be the ordered sequence of links used by messages on that route; smn is the
number of links and $r_{mn}(s)$ is the link used at stage $s$. Let $\alpha_j(m,n,s) = \nu_{mn}$ if $r_{mn}(s) = j$, and 0 otherwise, so that the arrival rate at link $j$ is given by $\alpha_j = \sum_m\sum_{n\ne m}\sum_{s=1}^{s_{mn}}\alpha_j(m,n,s)$, and the demand (in bits per second) by $a_j = \alpha_j/\mu$. Assume that the system is stable ($\alpha_j < \mu\phi_j$ for each $j$). The
optimal capacity allocation ($\phi_j$, $j = 1, 2, \ldots, J$) can now be obtained using the results of Section 14.4. For unit costs, the optimal allocation of capacity (constrained by $\sum_j\phi_j = 1$) satisfies $\mu\phi_j = \alpha_j + \lambda\sqrt{\alpha_j}$, $j = 1,\ldots,J$, where $\lambda = (\mu - \sum_{k=1}^{J}\alpha_k)/(\sum_{k=1}^{J}\sqrt{\alpha_k})$, in the case of exponential transmission times. More generally, in the case where the transmission times have an arbitrary distribution with mean $1/\mu$ and variance $\sigma^2$, the optimal allocation satisfies (to first order in $\varepsilon$)
$$\mu\phi_j = \alpha_j + \lambda\sqrt{\alpha_j} + \left(c_j - \frac{\sqrt{\alpha_j}}{\sum_{k=1}^{J}\sqrt{\alpha_k}}\sum_{k=1}^{J} c_k\right)\varepsilon, \qquad (14.10)$$
where $c_k = \tfrac{1}{4}\lambda\alpha_k^{3/2}(\alpha_k + 2\lambda\sqrt{\alpha_k})/(\alpha_k + \lambda\sqrt{\alpha_k})^2$ and $\varepsilon = \mu^2\sigma^2 - 1$.
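A sketch of the resulting link-capacity assignment (14.10), for unit costs and the normalization $\sum_j\phi_j = 1$, is given below; the function name and the example arrival rates are our own illustrations:

```python
import numpy as np

def link_capacities(alpha, mu, sigma2):
    """First-order capacities from (14.10); unit costs, constraint sum(phi_j) = 1.

    alpha : arrival rates per link, mu : reciprocal mean message length,
    sigma2 : variance of the transmission-time distribution.
    Requires sum(alpha) < mu for stability of the normalized network.
    """
    alpha = np.asarray(alpha, float)
    sq = np.sqrt(alpha)
    lam = (mu - alpha.sum()) / sq.sum()
    c = 0.25 * lam * alpha**1.5 * (alpha + 2*lam*sq) / (alpha + lam*sq)**2
    eps = mu**2 * sigma2 - 1.0
    mu_phi = alpha + lam * sq + (c - sq / sq.sum() * c.sum()) * eps
    return mu_phi / mu

print(link_capacities(alpha=[0.2, 0.1, 0.15], mu=1.0, sigma2=1.0))  # exponential case
```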
To illustrate this, consider a symmetric star network , in which a collection
of identical outer nodes communicate via a single central node. Suppose that
there are J outer nodes and thus J communications links. The corresponding
queueing network, where the nodes represent the communications links, is a
fully connected symmetric network. Clearly there are J(J −1) routes, a typical
one being R(m, n) = {m, n}, where m = n. Suppose that transmission times

have a common mean 1/μ and variance σ 2 (for simplicity, set μ = 1), and,
to begin with, suppose that transmission times are exponentially distributed
and that all traffic is offered at the same rate ν. Clearly the optimal allocation
will be φj = 1/J, owing to the symmetry of the network. What happens to
the optimal allocation if we alter the traffic offered on one particular route
by a small quantity? Suppose that we alter ν12 by setting ν12 = ν + e. The
arrival rates at links 1 and 2 will then be altered by the same amount e. Since
μ = 1 we will have a1 = a2 = ν + e and aj = ν for j = 3, . . . , J. The optimal
allocation is easy to evaluate. We find that, for j = 1, 2,

$$\phi_j = \nu + e + \frac{(1 - J\nu - 2e)\sqrt{\nu + e}}{(J-2)\sqrt{\nu} + 2\sqrt{\nu + e}} = \frac{1}{J} + \frac{(J-2)(J\nu + 1)}{2J^2\nu}\,e + O(e^2),$$
and for $j = 3, \ldots, J$,
$$\phi_j = \nu + \frac{(1 - J\nu - 2e)\sqrt{\nu}}{(J-2)\sqrt{\nu} + 2\sqrt{\nu + e}} = \frac{1}{J} - \frac{J\nu + 1}{J^2\nu}\,e + O(e^2).$$

Thus, to first order in e, there is an O(1/J) decrease in the capacity at all


links in the network, except at links 1 and 2, where there is an O(1) increase
in capacity.
When the transmission times are not exponentially distributed, similar
results can be obtained. For example, suppose that the transmission times
have a distribution whose squared coefficient of variation is 2 (such as a
mixture of exponential distributions). Then it can be shown that the optimal
allocation is given for j = 1, 2 by

$$\phi_j = \frac{1}{J} + \frac{(J^2\nu^2 - J\nu + 2)(J^2\nu^2 - 2J\nu - 1)}{2J^2\nu}\,e + O(e^2)$$
and for $3 \le j \le J$ by
$$\phi_j = \frac{1}{J} - \frac{(J-2)(J^2\nu^2 - J\nu + 2)(J^2\nu^2 - 2J\nu - 1)}{4J^2\nu}\,e + O(e^2).$$
Thus, to first order in $e$, there is an $O(J^3)$ decrease in the capacity at all links in the network, except at links 1 and 2, where there is an $O(J^2)$ increase in capacity. Indeed, the latter is true whenever the squared coefficient of variation $c$ is not equal to 1, for it is easily checked that $\phi_j = 1/J + g_J(c)e + O(e^2)$, $j = 1, 2$, and $\phi_j = 1/J - (J/2 - 1)g_J(c)e + O(e^2)$, $j = 3, \ldots, J$, where
$$g_J(c) = \frac{J\nu(J\nu - 1)^3 c - (J^4\nu^4 - 3J^3\nu^3 + 3J^2\nu^2 + J\nu + 2)}{2J^2\nu}.$$
Clearly gJ (c) is O(J 2 ). It is also an increasing function of c, and so this accords
with our previous general results on varying the coefficient of variation of the
service-time distribution.

14.7 Conclusions

We have considered the problem of how best to assign service capacity in


a queueing network so as to minimize the expected number of customers in
the network subject to a cost constraint. We have allowed for different types
of customers, general service-time distributions, stochastic or deterministic
routing, and a variety of service regimes. Using an accurate approximation
for the distribution of queueing times, we derived an explicit expression for
the optimal allocation to first order in the squared coefficient of variation
of the service-time distribution. This can easily be extended to arbitrary
order in a straightforward way using a standard perturbation expansion. We
have illustrated our results with reference to data networks, giving particular
attention to the symmetric star network. In this context we considered how
best to assign the link capacities in order to minimize the expected total delay
of messages in the system. We studied the effect on the optimal allocation
of varying the offered traffic and the distribution of transmission times. We
showed that for the symmetric star network, the effect of varying the offered
traffic is far greater in cases where the distribution of transmission times
deviates from exponential, and that more allocation is needed at nodes where
the variation in the transmission times is greatest.

Acknowledgments I am grateful to Tony Roberts for suggesting that I adopt the per-
turbation approach described in Section 14.4. I am also grateful to Erhan Kozan for helpful
comments on an earlier draft of this chapter and to the three referees, whose comments
and suggestions did much to improve the presentation of my results. The support of the
Australian Research Council is gratefully acknowledged.

References

1. H. Akimaru and K. Kawashima, Teletraffic: Theory and Applications, 2nd edition


(Springer-Verlag, London, 1999).
2. G. Bloch, S. Greiner, H. de Meer and K. Trivedi, Queueing Networks and Markov
Chains: Modeling and Performance Evaluation with Computer Science Applications
(Wiley, New York, 1998).
3. T. Brown, Some Distributional Approximations for Random Measures, PhD thesis,
University of Cambridge, 1979.
4. T. Brown and P. Pollett, Some distributional approximations in Markovian networks,
Adv. Appl. Probab. 14 (1982), 654–671.
5. T. Brown and P. Pollett, Poisson approximations for telecommunications networks, J.
Austral. Math. Soc., 32 (1991), 348–364.
6. X. Chao, M. Miyazawa and M. Pinedo, Queueing Networks: Customers, Signals and
Product Form Solutions (Wiley, New York, 1999).
7. J. Jackson, Jobshop-like queueing systems, Mgmt. Sci. 10 (1963), 131–142.
8. F. Kelly, Networks of queues with customers of different types, J. Appl. Probab. 12
(1975), 542–554.
9. F. Kelly, Networks of queues, Adv. Appl. Probab. 8 (1976), 416–432.

10. F. Kelly, Reversibility and Stochastic Networks (Wiley, Chichester, 1979).


11. L. Kleinrock, Communication Nets (McGraw-Hill, New York, 1964).
12. E. Koenigsberg, Cyclic queues, Operat. Res. Quart. 9 (1958), 22–35.
13. P. Pollett, Distributional Approximations for Networks of Queues, PhD thesis, Uni-
versity of Cambridge, 1982.
14. P. Pollett, Residual life approximations in general queueing networks, Elektron. Infor-
mationsverarb. Kybernet. 20 (1984), 41–54.
15. K. Ross, Multiservice Loss Models for Broadband Telecommunication Networks
(Springer-Verlag, London, 1995).
16. R. Serfozo, Introduction to Stochastic Networks (Springer-Verlag, New York, 1999).
17. J. Taylor and R. Jackson, An application of the birth and death process to the provision
of spare machines, Operat. Res. Quart. 5 (1954), 95–108.
18. W. Turin, Digital Transmission Systems: Performance Analysis and Modelling, 2nd
edition (McGraw-Hill, New York, 1998).
Chapter 15
Analysis of a simple control policy
for stormwater management in two
connected dams

Julia Piantadosi and Phil Howlett

Abstract We will consider the management of stormwater storage in a


system of two connected dams. It is assumed that we have stochastic in-
put of stormwater to the first dam and that there is regular demand from the
second dam. We wish to choose a control policy from a simple class of con-
trol policies that releases an optimal flow of water from the first dam to the
second dam. The cost of each policy is determined by the expected volume
of water lost through overflow.

Key words: Stormwater management, storage dams, eigenvalues, steady-


state probabilities

15.1 Introduction

We will analyze the management of stormwater storage in interconnected


dams. Classic works by Moran [4, 5, 6] and Yeo [7, 8] have considered a
single storage system with independent and identically distributed inputs,
occurring as a Poisson process. Simple rules were used to determine the in-
stantaneous release rates and the expected average behavior. These models
provide a useful background for our analysis of more complicated systems
with a sequence of interdependent storage systems.
In this chapter we have developed a discrete-time, discrete-state Markov
chain model that consists of two connected dams. It is assumed that the
input of stormwater into the first dam is stochastic and that there is

Julia Piantadosi
C.I.A.M., University of South Australia, Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]
Phil Howlett
C.I.A.M., University of South Australia, Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]


regular release from the second dam that reflects the known demand for
water. We wish to find a control policy that releases an optimal flow of water
from the first dam to the second dam. In the first instance we have restricted
our attention to a very simple class of control policies. To calculate the cost of
a particular policy it is necessary to find an invariant measure. This measure
is found as the eigenvector of a large transposed transition matrix. A key
finding is that for our simple class of control policies the eigenvector of the
large matrix can be found from the corresponding eigenvector of a small block
matrix. The cost of a particular policy will depend on the expected volume of
water that is wasted and on the pumping costs. An appropriate cost function
will assist in determining an optimal pumping policy for our system.
This work will be used to analyze water cycle management in a suburban
housing development at Mawson Lakes in South Australia. The intention is to
capture and treat all stormwater entering the estate. The reclaimed water will
be supplied to all residential and commercial sites for watering of parks and
gardens and other non-potable usage. Since this is a preliminary investigation
we have been mainly concerned with the calculation of steady-state solutions
for different levels of control in a class of practical management policies. The
cost of each policy is determined by the expected volume of water lost through
overflow. We have ignored pumping costs. A numerical example is used to
illustrate the theoretical solution presented in the chapter. For underlying
methodology see [1, 2].

15.2 A discrete-state model

15.2.1 Problem description

Consider a system with two connected dams, D1 and D2 , each of finite ca-
pacity. The content of the first dam is denoted by Z1 ∈ {0, 1, . . . , h} and
the content of the second dam by Z2 ∈ {0, 1, . . . , k}. We assume a stochastic
supply of untreated stormwater to the first dam and a regular demand for
treated stormwater from the second dam. The system is controlled by pump-
ing water from the first dam into the second dam. The input to the first
dam is denoted by X1 and the input to the second dam by X2 . We have
formulated a discrete-state model in which the state of the system, at time t,
is an ordered pair (Z1,t , Z2,t ) specifying the content of the two dams before
pumping. We will consider a class of simple control policies. If the content of
the first dam is greater than or equal to a specified level U1 = m, then we
will pump precisely m units of water from the first dam to the second dam.
If the content of the first dam is below this level we do not pump any water
into the second dam. The parameter m is the control parameter for the class
of policies we wish to study. We assume a constant demand for treated water
from the second dam and pump a constant volume U2 = 1 unit from the
second dam provided the dam is not empty. The units of measurement are
chosen to be the daily level of demand.

15.2.2 The transition matrix for a specific control policy

We begin by describing the transitions; a short code sketch implementing these rules follows the list of cases. We consider the following cases:
• for the state (z1 , 0) where z1 < m we do not pump from either dam. If n
units of stormwater enter the first dam then

(z1 , 0) → (min([z1 + n], h), 0);

• for the state (z1 , z2 ) where z1 < m and 0 < z2 we do not pump water
from the first dam but we do pump from the second dam. If n units of
stormwater enter the first dam then

(z1 , z2 ) → (min([z1 + n], h), z2 − 1);

• for the state (z1 , 0) where z1 ≥ m we pump m units from the first dam
into the second dam. If n units of stormwater enter the system then

(z1 , 0) → (min([z1 − m + n], h), min(m, k)); and

• for the state (z1 , z2 ) where z1 ≥ m and 0 < z2 we pump m units from the
first dam into the second dam and pump one unit from the second dam to
meet the regular demand. If n units of stormwater enter the system then

(z1 , z2 ) → (min([z1 − m + n], h), min(z2 + m − 1, k)).
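The four cases can be combined into a single update rule. The following Python sketch illustrates that rule; the function name and argument order are illustrative choices rather than part of the chapter.

```python
# A sketch of the one-step transition of the pair (z1, z2): pump m units from the
# first dam when its content is at least m, meet one unit of demand from the second
# dam when it is not empty, and truncate each dam at its capacity (h and k).
def next_state(z1, z2, n, h, k, m):
    pumped = m if z1 >= m else 0        # transfer from dam 1 to dam 2
    released = 1 if z2 > 0 else 0       # regular demand met from dam 2
    new_z1 = min(z1 - pumped + n, h)    # n units of stormwater arrive; excess overflows
    new_z2 = min(z2 + pumped - released, k)
    return new_z1, new_z2
```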

If we order the states (z1 , z2 ) by the rules that (z1 , z2 ) ≺ (ζ1 , ζ2 ) if z2 < ζ2
and (z1 , z2 ) ≺ (ζ1 , z2 ) if z1 < ζ1 then the transition matrix can be written in
the form

$$
H(A,B) =
\begin{bmatrix}
A & 0 & \cdots & 0 & B & 0 & \cdots & 0 & 0 & 0\\
A & 0 & \cdots & 0 & B & 0 & \cdots & 0 & 0 & 0\\
0 & A & \cdots & 0 & 0 & B & \cdots & 0 & 0 & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & \cdots & A & 0 & 0 & \cdots & 0 & B & 0\\
0 & 0 & \cdots & 0 & A & 0 & \cdots & 0 & 0 & B\\
0 & 0 & \cdots & 0 & 0 & A & \cdots & 0 & 0 & B\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & A & 0 & B\\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & A & B
\end{bmatrix}.
$$

The block matrices $A = [a_{i,j}]$ and $B = [b_{i,j}]$ for $i, j \in \{0, 1, \ldots, h\}$ are defined by
$$
a_{i,j} =
\begin{cases}
p_j & \text{for } i = 0,\ 0 \le j \le h-1,\\
p_{j-i} & \text{for } 1 \le i \le m-1,\ 1 \le j \le h-1 \text{ and } i \le j,\\
0 & \text{for } 1 \le i \le m-1,\ j < i,\\
p^{+}_{h-i} & \text{for } j = h,\ 0 \le i \le m-1,\\
0 & \text{for } m \le i \le h,\ 0 \le j \le h,
\end{cases}
$$
and
$$
b_{i,j} =
\begin{cases}
0 & \text{for } 0 \le i \le m-1,\ 0 \le j \le h,\\
p_j & \text{for } i = m,\ 0 \le j \le h-1,\\
p_{j-i+m} & \text{for } m+1 \le i \le h,\ 0 \le j \le h-1 \text{ and } i-m \le j,\\
0 & \text{for } m+1 \le i \le h,\ j < i-m,\\
p^{+}_{h-i+m} & \text{for } j = h,\ m \le i \le h,
\end{cases}
$$
where $p_r$ is the probability that $r$ units of stormwater will flow into the first dam and $p^{+}_{s} = \sum_{r=s}^{\infty} p_r$. Note that $A, B \in \mathbb{R}^{(h+1)\times(h+1)}$ and $H \in \mathbb{R}^{n \times n}$, where $n = (h+1)(k+1)$.
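As an illustration, the blocks $A$ and $B$ and the full matrix $H(A,B)$ can be assembled directly from these definitions. The following Python sketch does this for a truncated inflow vector p; the helper names are illustrative and the truncated tail sum is an approximation of $p^{+}_{s}$.

```python
import numpy as np

def tail(p, s):
    # p_s^+ = sum_{r >= s} p_r, approximated from a long truncated vector p
    return p[s:].sum()

def blocks_A_B(p, h, m):
    # Build A = [a_{i,j}] and B = [b_{i,j}] for i, j in {0, ..., h}.
    A = np.zeros((h + 1, h + 1))
    B = np.zeros((h + 1, h + 1))
    for i in range(h + 1):
        for j in range(h + 1):
            if i <= m - 1:                       # no pumping from the first dam
                if j == h:
                    A[i, j] = tail(p, h - i)
                elif j >= i:
                    A[i, j] = p[j - i]
            else:                                # m units pumped from the first dam
                if j == h:
                    B[i, j] = tail(p, h - i + m)
                elif j - i + m >= 0:
                    B[i, j] = p[j - i + m]
    return A, B

def transition_matrix(A, B, k, m):
    # Assemble H(A, B): block row z2 has A in block column max(z2 - 1, 0) and
    # B in block column min(z2 + m - 1, k) (column min(m, k) when z2 = 0).
    d = A.shape[0]                               # d = h + 1
    H = np.zeros((d * (k + 1), d * (k + 1)))
    for z2 in range(k + 1):
        a_col = max(z2 - 1, 0)
        b_col = min(z2 + m - 1, k) if z2 > 0 else min(m, k)
        H[z2 * d:(z2 + 1) * d, a_col * d:(a_col + 1) * d] += A
        H[z2 * d:(z2 + 1) * d, b_col * d:(b_col + 1) * d] += B
    return H
```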

15.2.3 Calculating the steady state when 1 < m < k

We suppose that the level of control $m$ is fixed and write $H_m = H \in \mathbb{R}^{n \times n}$. The steady state $x[m] = x \in \mathbb{R}^n$ is the vector of state probabilities determined by the non-negative eigenvector of the transposed transition matrix $K = H^T$ corresponding to the unit eigenvalue. Thus we find $x$ by solving the equation
$$
Kx = x \quad \text{subject to the conditions} \quad x \ge 0 \quad \text{and} \quad \mathbf{1}^T x = 1. \tag{15.1}
$$
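For moderate state spaces, (15.1) can also be solved by brute force, without the reduction developed below, by forming $K = H^T$ and extracting the eigenvector for the unit eigenvalue. The following sketch is intended only as a numerical check; the function name is illustrative.

```python
import numpy as np

def steady_state(H):
    # Solve Kx = x with K = H^T, x >= 0 and 1^T x = 1, by locating the eigenvalue
    # closest to 1 and normalising the corresponding (Perron) eigenvector.
    w, V = np.linalg.eig(H.T)
    x = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    x = np.abs(x)          # the eigenvector for the unit eigenvalue can be taken non-negative
    return x / x.sum()
```

The remainder of the chapter reduces this large eigenproblem to a much smaller one by exploiting the block structure of $K$.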

If we define $C = A^T$ and $D = B^T$ then the matrix $K$ can be written in block form as $K = F(C) + G_m(D) = F + G_m$, where the $k+1$ block rows and columns are indexed by $0, 1, \ldots, k$,
$$
F =
\begin{bmatrix}
C & C & 0 & \cdots & 0\\
0 & 0 & C & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & C\\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
$$
and
$$
G_m =
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
D & D & 0 & \cdots & 0 & \cdots & 0\\
0 & 0 & D & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & D & \cdots & D
\end{bmatrix},
$$
in which the first $m$ block rows of $G_m$ are zero, block row $m$ contains $D$ in columns $0$ and $1$, and block row $k$ contains $D$ in its last $m$ columns.

Therefore Equation (15.1) can be rewritten as
$$
[F + G_m]x = x,
$$
and by substituting $y = [I - F]x$ and rearranging, this becomes
$$
G_m[I - F]^{-1}y = y. \tag{15.2}
$$

To solve this equation we make some preliminary calculations. We will show later that the inverse matrices used in the sequel are well defined. From the Neumann expansion
$$
(I - F)^{-1} = I + F + F^2 + \cdots,
$$
we deduce that
$$
(I - F)^{-1} =
\begin{bmatrix}
P & PC & PC^2 & \cdots & PC^{k-1} & PC^k\\
0 & I & C & \cdots & C^{k-2} & C^{k-1}\\
0 & 0 & I & \cdots & C^{k-3} & C^{k-2}\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & I & C\\
0 & 0 & 0 & \cdots & 0 & I
\end{bmatrix},
$$
where we have written $P = (I - C)^{-1}$. It follows that
$$
G_m(I - F)^{-1} =
\begin{bmatrix}
0 & 0\\
R & S
\end{bmatrix},
$$
where $S = [S_0, S_1, \ldots, S_{k-m}]$ is a block matrix with columns consisting of $k - m + 1$ blocks given by

$$
S_0 =
\begin{bmatrix}
DPC^{m-1}\\ DC^{m-2}\\ \vdots\\ D\\ 0\\ \vdots\\ 0
\end{bmatrix},
\qquad
S_1 =
\begin{bmatrix}
DPC^{m}\\ DC^{m-1}\\ \vdots\\ DC\\ D\\ \vdots\\ 0\\ 0
\end{bmatrix},
\;\cdots
$$
$$
\cdots\;
S_{k-2m+1} =
\begin{bmatrix}
DPC^{k-m}\\ DC^{k-m-1}\\ \vdots\\ DC^{k-2m+1}\\ DC^{k-2m}\\ \vdots\\ DC\\ D
\end{bmatrix},
\qquad
S_{k-2m+2} =
\begin{bmatrix}
DPC^{k-m+1}\\ DC^{k-m}\\ \vdots\\ DC^{k-2m+2}\\ DC^{k-2m+1}\\ \vdots\\ DC^2\\ D(I+C)
\end{bmatrix},
\;\cdots
$$
and finally
$$
S_{k-m} =
\begin{bmatrix}
DPC^{k-1}\\ DC^{k-2}\\ \vdots\\ DC^{k-m}\\ DC^{k-m-1}\\ \vdots\\ DC^{m}\\ D(I + C + \cdots + C^{m-1})
\end{bmatrix}.
$$

By writing the matrix equation (15.2) in partitioned form
$$
\begin{bmatrix}
I & 0\\
-R & (I - S)
\end{bmatrix}
\begin{bmatrix}
u\\ v
\end{bmatrix}
=
\begin{bmatrix}
0\\ 0
\end{bmatrix}
$$
it can be seen that $u = 0$ and that our original problem has reduced to solving the matrix equation
$$
(I - S)v = 0. \tag{15.3}
$$

Thus we must find the eigenvector for $S$ corresponding to the unit eigenvalue. To properly describe the elimination process we need to establish some suitable notation. We will write $S = [S_{i,j}]$ where $i, j \in \{0, 1, \ldots, k-m\}$ and
$$
S_{i,j} =
\begin{cases}
DPC^{m-1+j} & \text{for } i = 0,\\
DC^{m-1-i+j} & \text{for } 1 \le i \le k-m-1 \text{ and } i-m+1 \le j,\\
D(I + C + \cdots + C^{j-k+2m-1}) & \text{for } i = k-m \text{ and } k-2m+1 \le j,\\
0 & \text{for } m + j \le i.
\end{cases}
$$
We note that
$$
S_j =
\begin{bmatrix}
S_{0,j}\\ S_{1,j}\\ \vdots\\ S_{k-m,j}
\end{bmatrix}
$$
and that 1T Sj = 1T for each j = 0, 1, . . . , k − m. Hence S is a stochastic
matrix. One of our key findings is that we can use Gaussian elimination to
further reduce the problem from one of finding an eigenvector for the large
matrix S ∈ IR(h+1)(k−m+1)×(h+1)(k−m+1) to one of finding the corresponding
eigenvector for a small block matrix in IR(h+1)×(h+1) .

15.2.4 Calculating the steady state for m = 1

For the special case when $m = 1$ we have the block matrix $G_1$ with the following structure:
$$
G_1 =
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0\\
D & D & 0 & \cdots & 0\\
0 & 0 & D & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & D
\end{bmatrix}.
$$
In this case we have
$$
G_1(I - F)^{-1} =
\begin{bmatrix}
0 & 0\\
R & S
\end{bmatrix},
$$
where $S \in \mathbb{R}^{(h+1)k \times (h+1)k}$ is given by
$$
S =
\begin{bmatrix}
DP & DPC & DPC^2 & \cdots & DPC^{k-2} & DPC^{k-1}\\
0 & D & DC & \cdots & DC^{k-3} & DC^{k-2}\\
0 & 0 & D & \cdots & DC^{k-4} & DC^{k-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & D & DC\\
0 & 0 & 0 & \cdots & 0 & D
\end{bmatrix}.
$$
We now wish to solve
$$
(I - S)v = 0.
$$

15.2.5 Calculating the steady state for m = k

For the case when $m = k$ we have
$$
G_k =
\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & & \vdots\\
D & D & D & \cdots & D & \cdots & D
\end{bmatrix}.
$$
Therefore
$$
G_k(I - F)^{-1} =
\begin{bmatrix}
0 & 0\\
R & S
\end{bmatrix},
$$
where $S = DP \in \mathbb{R}^{(h+1)\times(h+1)}$, and so we wish to solve
$$
(I - DP)v = 0.
$$

15.3 Solution of the matrix eigenvalue problem using


Gaussian elimination for 1 < m < k

We wish to find the eigenvector corresponding to the unit eigenvalue for the
matrix S. We use Gaussian elimination in a block matrix format. During the
elimination we will make repeated use of the following elementary formulae.

Lemma 1. If $W = (I - V)^{-1}$ then $W = I + WV$ and $WV^r = V^rW$ for all non-negative integers $r$.

15.3.1 Stage 0

Before beginning the elimination we write $T^{(0)} = [I - S]^{(0)} = I - S^{(0)} = I - S$. We consider a sub-matrix $M$ from the matrix $T^{(0)}$ consisting of the $(0,0)$, $(q,0)$, $(0,s)$ and $(q,s)$th elements where $1 \le q \le m-1$ and $1 \le s$. We have
$$
M =
\begin{bmatrix}
I - DPC^{m-1} & -DPC^{m-1+s}\\
-DC^{m-1-q} & I\delta_{q,s} - DC^{m-1-q+s}
\end{bmatrix}.
$$
If we write $W_0 = [I - DPC^{m-1}]^{-1}$ then the standard elimination gives
$$
M \to
\begin{bmatrix}
I & -W_0DPC^{m-1+s}\\
0 & I\delta_{q,s} - DC^{m-1-q+s} - DC^{m-1-q}W_0DPC^{m-1+s}
\end{bmatrix}
=
\begin{bmatrix}
I & -DPC^{m-1}W_0C^{s}\\
0 & I\delta_{q,s} - DC^{m-1-q}[I + W_0DPC^{m-1}]C^{s}
\end{bmatrix}
=
\begin{bmatrix}
I & -DPC^{m-1}(W_0C)C^{s-1}\\
0 & I\delta_{q,s} - DC^{m-1-q}(W_0C)C^{s-1}
\end{bmatrix}.
$$
After stage 0 of the elimination we have a new matrix $T^{(1)} = I - S^{(1)}$ where
$$
S^{(1)}_{i,j} =
\begin{cases}
0 & \text{for } j = 0,\\
DPC^{m-1}(W_0C)C^{j-1} & \text{for } i = 0,\ 1 \le j,\\
DC^{m-1-i}(W_0C)C^{j-1} & \text{for } 1 \le i \le m-1,\ 1 \le j,\\
DC^{m-1-i+j} & \text{for } m \le i \le k-m-1 \text{ and } i-m+1 \le j,\\
D(I + C + \cdots + C^{j-k+2m-1}) & \text{for } i = k-m \text{ and } k-2m+1 \le j,\\
0 & \text{for } m+j \le i.
\end{cases}
$$

Note that column 0 is reduced to a zero column and that row 0 is fixed for all
subsequent stages. We therefore modify T (1) by dropping both column and
row 0.

15.3.2 The general rules for stages 2 to m − 2

After stage p−1 of the elimination, for 1 ≤ p ≤ m−2, we have T (p) = I −S (p)
where


⎪ 0 for j =p−1

⎪ p−1

⎪ DC m−p t=0 (Wt C)C j−p for i = p − 1, p ≤ j

⎪ p−1

⎪ DC m−1−i t=0 (Wt C)C j−p for p ≤ i ≤ m − 1, p ≤ j

⎨ p−1
(p) D t=i−m+1 (Wt C)C j−p for m ≤ i ≤ m + p − 2, p ≤ j
Si,j =

⎪ DC m−1−i+j for m+p−1≤i≤k−m−1



⎪ and i−m+1≤j

⎪ j−k+2m−1 t

⎪ D t=0 C for i = k − m, k − 2m + 1 ≤ j

⎩0 for m + j ≤ i.

Column p − 1 is reduced to a zero column and row p − 1 is fixed for all


subsequent stages. We modify T (p) = (I − S (p) ) by dropping both column
and row p − 1 and consider a sub-matrix M consisting of the (p, p), (q, p),
(r, p), (m + p − 1, p), (p, s), (q, s), (r, s) and (m + p − 1, s)th elements, where
p + 1 ≤ q ≤ m − 1, m ≤ r ≤ m + p − 2 and p + 1 ≤ s. The sub-matrix M is
given by
⎡ p−1 p−1 ⎤
I − DC m−1−p t=0 (Wt C) −DC m−1−p t=0 (Wt C)C s−p
⎢ ⎥
⎢ − DC m−1−q p−1 (Wt C) p−1
Iδq,s − DC m−1−q t=0 (Wt C)C s−p ⎥
⎢ t=0 ⎥
⎢ ⎥
⎢ − D p−1 p−1
Iδr,s − D t=r−m+1 (Wt C)C s−p ⎥
⎣ t=r−m+1 (Wt C) ⎦
−D Iδm+p−1,s − DC s−p

and if Wp = [I − DC m−1−p (W0 C) · · · (Wp−1 C)]−1 elimination gives


⎡ p ⎤
I −DC m−1−p t=0  (Wt C)C s−p−1
⎢ 0 Iδq,s − DC m−1−q p (Wt C)C s−p−1 ⎥
M →⎢ ⎣0
p t=0 ⎥.
Iδr,s − D t=r−m+1 (Wt C)C s−p−1 ⎦
0 Iδm+p−1,s − D(Wp C)C s−p−1

After stage p of the elimination we have T (p+1) = I − S (p+1) where




⎪ 0 p for j = p

⎪ m−1−p j−p−1
for i = p, p + 1 ≤ j

⎪ DC pt=0 (Wt C)C

⎪ m−1−i j−p−1
for p + 1 ≤ i ≤ m − 1

⎪ DC (W t C)C


t=0
p+1≤j

⎪  and

⎪ D pt=i−m+1 (Wt C)C j−p−1 for m ≤ i ≤ m + p − 1

(p+1)
Si,j = and p + 1 ≤ j

⎪ m−1−i+j
for m + p ≤ i ≤ k − m − 1

⎪ DC

⎪ i−m+1≤j

⎪ j−k+2m−1 t
and



⎪ D C for i =k−m


t=0

⎪ and k − 2m + 1 ≤ j

0 for m + j ≤ i.

Since column p is reduced to a zero column and row p is fixed for all subse-
quent stages we modify T (p+1) by dropping both column and row p.

15.3.3 Stage m − 1

After stage m − 2 we have T (m−1) = I − S (m−1) where




⎪ 0 for j =m−2

⎪ m−2

⎪ DC t=0 (Wt C)C j−m+1
for i = m − 2, m − 1 ≤ j


⎪ D m−2 (W C)C j−m+1

⎪ i = m − 1, m − 1 ≤ j

⎪ t=0 t for
⎪ m−2


⎪ D t=i−m+1 (Wt C)C j−m+1
for m ≤ i ≤ 2m − 3


⎨ and m−1≤j
(m−1)
Si,j =

⎪ DC m−1−i+j for 2m − 2 ≤ i ≤ k − m − 1



⎪ i−m+1≤j

⎪ and

⎪ j−k+2m−1 t

⎪ D t=0 C for i=k−m





⎪ and k − 2m + 1 ≤ j


0 for m + j ≤ i.

We modify T (m−1) by dropping both column and row m − 2 and consider a


sub-matrix M consisting of the (m − 1, m − 1), (r, m − 1), (2m − 2, m − 1),
(m − 1, s), (r, s) and (2m − 2, s)th elements, where m ≤ r ≤ 2m − 3 and
m ≤ s. We have
⎡ m−2 m−2 ⎤
I − D t=0 (Wt C) −D t=0 (Wt C)C s−m+1
⎢  m−2 ⎥
⎢ s−m+1 ⎥ .
M = ⎢ − D m−2 (Wt C) Iδ r,s − D (W t C)C ⎥
⎣ t=r−m+1 t=r−m+1 ⎦
−D Iδ2m−2,s − DC s−m+1

If we write Wm−1 = [I − D(W0 C) · · · (Wm−2 C)]−1 then the standard elimi-


nation gives
⎡ ⎤
I −D(W0 C) · · · (Wm−1 C)C s−m
⎢ ⎥
M → ⎣ 0 Iδr,s − D(Wr−m+1 C) · · · (Wm−1 C)C s−m ⎦ .
0 Iδ2m−2,s − D(Wm−1 C)C s−m

After stage m − 1 of the elimination we have T (m) = I − S (m) where



⎪ 0 for j =m−1

⎪ m−1

⎪ (Wt C)C j−m for i = m − 1, m ≤ j
⎪ D
⎪ t=0


⎪ m−1
⎨ D t=i−m+1 (Wt C)C
j−m
for m ≤ i ≤ 2m − 2, m ≤ j
(m)
Si,j = DC m−1−i+j for 2m − 1 ≤ i ≤ k − m − 1



⎪ and i − m + 1 ≤ j

⎪ j−k+2m−1 t



⎪ D t=0 C for i = k − m, k − 2m + 1 ≤ j

0 for m + j ≤ i.

Column m − 1 is reduced to a zero column and row m − 1 is fixed for all


subsequent stages. We modify T (m) by dropping both column and row m − 1.

15.3.4 The general rules for stages m to k − 2m

After stage p − 1 for m ≤ p ≤ k − 2m we have T (p) = I − S (p) where



⎪ 0 for j = p − 1

⎪ p−1

⎪ j−p
i = p − 1, p ≤ j


D (W t C)C for

⎪ t=p−m
⎪ p−1
⎨ D t=i−m+1 (Wt C)C
j−p
for p ≤ i ≤ m + p − 2, p ≤ j
(p)
Si,j = DC m−1−i+j for m + p − 1 ≤ i ≤ k − m − 1



⎪ and i−m+1≤j

⎪ j−k+2m−1 t



⎪ D t=0 C for i = k − m, k − 2m + 1 ≤ j

0 for m + j ≤ i.

We modify T (p) by dropping both column and row p − 1 and consider a sub-
matrix M using the (p, p), (r, p), (m+p−1, p), (p, s), (r, s) and (m+p−1, s)th
elements, where p + 1 ≤ r ≤ m + p − 2 and p + 1 ≤ s. We have
⎡ p−1 p−1 ⎤
I − D t=p−m+1 (Wt C) −D t=p−m+1 (Wt C)C s−p
⎢  p−1 ⎥
M = ⎣ −D p−1 t=r−m+1 (Wt C) Iδr,s − D t=r−m+1 (Wt C)C s−p ⎦
−D Iδm+p−1,s − DC s−p

and if we write Wp = [I − D(Wp−m+1 C) · · · (Wp−1 C)]−1 then the standard


elimination gives
⎡ ⎤
I −D(Wp−m+1 C) · · · (Wp C)C s−p−1
M → ⎣ 0 Iδr,s − D(Wr−m+1 C) · · · (Wp C)C s−p−1 ⎦ .
0 Iδm+p−1,s − D(Wp C)C s−p−1

After stage p of the elimination we have T (p+1) = I − S (p+1) where




⎪ 0  for j = p

⎪ D pt=i−m+1 (Wt C)C j−p−1 for p + 1 ≤ i ≤ m + p − 1



⎪ and p + 1 ≤ j


⎨ DC m−1−i+j for m + p ≤ i ≤ k − m − 1
(p+1)
Si,j =

⎪ and i−m+1≤j
⎪ D j−k+2m−1 C t

⎪ for i = k − m

⎪ t=0

⎪ and k − 2m + 1 ≤ j


0 for m + j ≤ i.

We now modify T (p+1) by dropping both column and row p.



15.3.5 Stage k − 2m + 1

To reduce the cumbersome notation it is convenient to write p = k − 2m + 1.


After stage k − 2m we have T (p) = I − S (p) where

⎪ 0 for j = k − 2m

⎪ k−2m

⎪ D t=k−3m+1 (Wt C)C j−k+2m−1 for i = k − 2m



⎨  and p≤j
(p)
Si,j = D k−2m
t=i−m+1 (Wt C)C
j−k+2m−1
for p≤i≤k−m−1



⎪ and p≤j
⎪ j−p t


⎪ i = k − m, p ≤ j
⎩ D t=0 C for
0 for m + j ≤ i.

We modify T (p) by dropping both column and row k − 2m. We consider


a sub-matrix M consisting of the (p, p), (r, p), (k − m, p), (p, s), (r, s) and
(k − m, s)th elements, where p + 1 ≤ r ≤ k − m − 1 and p + 1 ≤ s. We have
⎡ p−1 p−1 ⎤
I − D t=p−m+1 (Wt C) −D t=p−m+1 (Wt C)C s−p
⎢  p−1 ⎥
M = ⎣ −D p−1 t=r−m+1 (Wt C) Iδr,s − D t=r−m+1 (Wt C)C s−p ⎦ .
s−p
−D Iδk−m,s − D t=0 C t

We write Wp = [I − D(Wp−m+1 C) · · · (Wp−1 C)]−1 . In order to describe the


final stages of the elimination more easily we define

Xk−2m = I and set Xk−2m+r+1 = I + Xk−2m+r (Wk−2m+r C) (15.4)

for each r = 0, . . . , m − 1. With this notation the standard elimination gives


⎡ p ⎤
I −D t=p−m+1 (Wt C)C s−p−1
p
⎢ Iδr,s − D t=r−m+1 (Wt C)C s−p−1 ⎥
M → ⎣0 6 7⎦.
s−p−1 t
0 Iδk−m,s − D t=0 C + (Xp+1 C)C s−p−1

After stage p = k − 2m + 1 of the elimination we have T (p+1) = I − S (p+1)


where


⎪ 0 for j = p
⎪ p


⎪ D (W C)C j−p−1
for p + 1 ≤ i ≤ k − m − 1


t=i−m+1 t
⎨ and p + 1 ≤ j
(p+1)
Si,j = 6

⎪ D
j−p−1 t
C

⎪ +(X C)C j−p−1 9
⎪ t=0

⎪ for i = k − m, p + 1 ≤ j


p+1
0 for m + j ≤ i.

We modify T (k−2m+2) by dropping both column and row k − 2m + 1.



15.3.6 The general rule for stages k − 2m + 2


to k − m − 2

After stage p − 1 for k − 2m + 2 ≤ p ≤ k − m − 2 we have T (p) = I − S (p)


where

⎪ 0 for j = p − 1

⎪ D p−1 (W C)C j−p

⎪ for i = p − 1, p ≤ j
⎨ t=p−m
p−1
t
(p)
Si,j = D 6t=i−m+1 (W t C)C
j−p
7 for p ≤ i ≤ k − m − 1, p ≤ j



⎪ D
j−p t
C + X C j−p
for i = k − m, p ≤ j

⎩ t=0 p
0 for m + j ≤ i.

We modify T (p) by dropping both column and row p − 1 and consider a sub-
matrix M using the (p, p), (r, p), (k − m, p), (p, s), (r, s) and (k − m, s)th
elements, where p + 1 ≤ r ≤ k − m and p + 1 ≤ s. We have M given by

⎡ p−1 p−1 ⎤
I − D t=p−m+1 (Wt C) −D t=p−m+1 (Wt C)C s−p

⎢ −D p−1 p−1
⎣ t=r−m+1 (Wt C) Iδr,s − D 6t=r−m+1 (Wt C)C s−p 7 ⎥

s−p t
−DXp−1 Iδk−m,s − D t=0 C + X p C s−p

and if we write Wp = [I − D(Wp−m+1 C) · · · (Wp−1 C)]−1 then the standard


elimination gives

⎡ p ⎤
I −D t=p−m+1 (Wt C)C s−p−1
⎢0 p
M →⎣ Iδr,s −6D t=r−m+1 (Wt C)C s−p−1 7 ⎥
⎦.
s−p−1
0 Iδk−m,s − D t=0 C t + Xp+1 C s−p−1

After stage p of the elimination we have T (p+1) = I − S (p+1) where



⎪ 0 for j = p
⎪ p


⎪ D t=i−m+1 (Wt C)C j−p−1 for p + 1 ≤ i ≤ k − m − 1


(p+1) 6 and p + 1 ≤ j
Si,j = j−p−1 t

⎪ D C


t=0
9

⎪ +Xp+1 C j−p−1
for i = k − m, p + 1 ≤ j


0 for m + j ≤ i.

We again modify T (p+1) by dropping both column and row p.



15.3.7 The final stage k − m − 1

The matrix $S^{(k-m-1)}$ is given by
$$
S^{(k-m-1)} =
\begin{bmatrix}
D\prod_{t=k-2m}^{k-m-2}(W_tC) & D\prod_{t=k-2m}^{k-m-2}(W_tC)\,C\\[4pt]
DX_{k-m-1} & D[I + (X_{k-m-1}C)]
\end{bmatrix}.
$$
Hence
$$
T^{(k-m-1)} =
\begin{bmatrix}
I - D\prod_{t=k-2m}^{k-m-2}(W_tC) & -D\prod_{t=k-2m}^{k-m-2}(W_tC)\,C\\[4pt]
-DX_{k-m-1} & I - D[I + (X_{k-m-1}C)]
\end{bmatrix}.
$$
If we write $W_{k-m-1} = [I - D(W_{k-2m}C)\cdots(W_{k-m-2}C)]^{-1}$ then elimination gives
$$
M \to
\begin{bmatrix}
I & -D(W_{k-2m}C)\cdots(W_{k-m-1}C)\\
0 & I - DX_{k-m}
\end{bmatrix}.
$$
Since the original system is singular and since we show later in this chapter
that the matrices W0 , . . . , Wk−m−1 are well defined, we know that the final
pivot element I −DXk−m must also be singular. Therefore the original system
can be solved by finding the eigenvector corresponding to the unit eigenvalue
for the matrix DXk−m and then using back substitution.
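The pivot blocks produced by the elimination can be generated recursively from the formulas used above: $W_0 = [I - DPC^{m-1}]^{-1}$, $W_p = [I - DC^{m-1-p}(W_0C)\cdots(W_{p-1}C)]^{-1}$ for $1 \le p \le m-1$, $W_p = [I - D(W_{p-m+1}C)\cdots(W_{p-1}C)]^{-1}$ for $m \le p \le k-m-1$, together with the recursion (15.4) for $X$. The following Python sketch assumes the stage structure of this section applies without degeneracies; the function name is illustrative.

```python
import numpy as np

def pivot_blocks(C, D, k, m):
    # W_t, t = 0, ..., k-m-1, and X_{k-m} from (15.4); a sketch assuming the stage
    # ranges of Section 15.3 are all non-degenerate.
    n = C.shape[0]
    I = np.eye(n)
    P = np.linalg.inv(I - C)
    W = [np.linalg.inv(I - D @ P @ np.linalg.matrix_power(C, m - 1))]   # W_0
    for p in range(1, k - m):
        prod = I.copy()
        start = 0 if p <= m - 1 else p - m + 1
        for t in range(start, p):
            prod = prod @ (W[t] @ C)
        lead = np.linalg.matrix_power(C, m - 1 - p) if p <= m - 1 else I
        W.append(np.linalg.inv(I - D @ lead @ prod))
    X = I.copy()                                   # X_{k-2m} = I
    for t in range(k - 2 * m, k - m):
        X = I + X @ (W[t] @ C)                     # X_{t+1} = I + X_t (W_t C)
    return W, X
```

The final pivot of the elimination is then $I - DX_{k-m}$, and $v_{k-m}$ is the eigenvector of $DX_{k-m}$ corresponding to the unit eigenvalue.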

15.4 The solution process using back substitution


for 1 < m < k

After the Gaussian elimination has been completed the system reduces to an equation for $v_0$,
$$
v_0 = DPC^{m-1}(W_0C)\sum_{j=1}^{k-m} C^{j-1}v_j,
$$
a set of equations for $v_p$ when $p = 1, 2, \ldots, m-2$,
$$
v_p = DC^{m-1-p}\left[\prod_{t=0}^{p}(W_tC)\right]\sum_{j=p+1}^{k-m} C^{j-p-1}v_j,
$$
a set of equations for $v_q$ when $q = m-1, m, \ldots, k-m-1$,
$$
v_q = D\left[\prod_{t=q-m+1}^{q}(W_tC)\right]\sum_{j=q+1}^{k-m} C^{j-q-1}v_j,
$$
and finally an equation for $v_{k-m}$,
$$
(I - DX_{k-m})v_{k-m} = 0.
$$

We begin by solving the final equation to find vk−m . The penultimate equa-
tion now shows us that
% k−m−1 &

vk−m−1 = D (Wt C) vk−m .
t=k−2m

We proceed by induction. We suppose that m ≤ q and that for all s with


q ≤ s ≤ k − m we have
% k−m−1 &

vs = D (Wt C) vk−m
t=s−m+1

and %k−m−1 &



k−m 
j−s−1
C vj = (Wt C) vk−m .
j=s+1 t=s+1

The hypothesis is clearly true for q = k − m − 1. Now we have


k−m
C j−q−1 vj
j=q


k−m
= vq + C C j−q−2 vj
j=q+1
 % & %k−m−1 &-

k−m−1 
= D (Wt C) + C (Wt C) vk−m
t=q−m+1 t=q+1
 % & - %k−m−1 &

q−1 
= D (Wt C) Wq + I ×C (Wt C) vk−m
t=q−m+1 t=q+1
%k−m−1 &

= (Wt C) vk−m
t=q

and hence
% & k−m

q−1 
vq−1 = D (Wt C) C j−q vj
t=q−m j=q
%k−m−1 &

=D (Wt C) vk−m .
t=q−m

Thus the hypothesis is also true for m − 1 ≤ q − 1 ≤ s ≤ k − m. To complete


the solution we note that the pattern changes at this point. We still have


k−m
C j−m+1 vj
j=m−1


k−m
= vm−1 + C C j−m vj
j=m
 %k−m−1 & %k−m−1 &-
 
= D (Wt C) + C (Wt C) vk−m
t=0 t=m
 %m−2 & - %k−m−1 &
 
= D (Wt C) Wm−1 + I × C (Wt C) vk−m
t=0 t=m
%k−m−1 &

= (Wt C) vk−m
t=m−1

but now we have


%m−2 &
 
k−m
vm−2 = DC (Wt C) C j−m+1 vj
t=0 j=m−1
%k−m−1 &

= DC (Wt C) vk−m .
t=0

We use induction once more. Let 1 ≤ p ≤ m − 2 and for p ≤ s ≤ m − 2 we


suppose that %k−m−1 &

m−s−1
vs = DC (Wt C) vk−m
t=0

and %k−m−1 &



k−m 
C j−s−1 vj = (Wt C) vk−m .
j=s+1 t=s+1

The hypothesis is true for p = m − 2. Now we have


k−m
C j−p−1 vj
j=p


k−m
= vp + C C j−p−2 vj
j=p+1
 %p−1 & - %k−m−1 &
 
= DC m−p−1 (Wt C) Wp + I ×C (Wt C) vk−m
t=0 t=p+1
%k−m−1 &

= (Wt C) vk−m
t=p

and hence
%p−1 & k−m
 
vp−1 = DC m−p (Wt C) C j−q vj
t=0 j=p
%k−m−1 &

= DC m−p (Wt C) vk−m .
t=0

Thus the hypothesis is also true for $0 \le p-1 \le s \le k-m$. In summary we have the solution to Equation (15.3) given by
$$
v_p = DC^{m-p-1}\left[\prod_{t=0}^{k-m-1}(W_tC)\right]v_{k-m} \tag{15.5}
$$
for $p = 0, 1, \ldots, m-2$ and
$$
v_q = D\left[\prod_{t=q-m+1}^{k-m-1}(W_tC)\right]v_{k-m} \tag{15.6}
$$
for $q = m-1, m, \ldots, k-m-1$. The original solution can now be recovered through
$$
y = \begin{bmatrix} u\\ v\end{bmatrix} = \begin{bmatrix} 0\\ v\end{bmatrix}
\quad\text{and}\quad
x = (I - F)^{-1}y,
$$
where x = x[m] is the steady-state vector for the original system using the
control policy with level m. The steady-state vector can be used to calculate
the expected amount of water lost from the system when this policy is imple-
mented. The cost of a particular policy will depend on the expected volume
of water that is wasted and on the pumping costs. This cost will assist in
determining an optimal pumping policy for the system.
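The back substitution (15.5)–(15.6) and the recovery of $x$ can be sketched as follows, reusing the pivot blocks W and X from the previous sketch; the names are illustrative and the final normalization imposes $\mathbf{1}^T x = 1$.

```python
import numpy as np

def back_substitute(C, D, W, X, k, m):
    # Recover v from (15.5)-(15.6), then x = (I - F)^{-1} y with y = (0, v).
    n = C.shape[0]
    I = np.eye(n)
    w, V = np.linalg.eig(D @ X)                    # v_{k-m}: unit eigenvector of D X_{k-m}
    v_last = np.abs(np.real(V[:, np.argmin(np.abs(w - 1.0))]))
    v = {k - m: v_last}
    for q in range(m - 1, k - m):                  # equation (15.6)
        prod = I.copy()
        for t in range(q - m + 1, k - m):
            prod = prod @ (W[t] @ C)
        v[q] = D @ prod @ v_last
    full = I.copy()
    for t in range(k - m):
        full = full @ (W[t] @ C)
    for p in range(m - 1):                         # equation (15.5)
        v[p] = D @ np.linalg.matrix_power(C, m - p - 1) @ full @ v_last
    y = np.concatenate([np.zeros(n * m)] + [v[j] for j in range(k - m + 1)])
    # F has C in block positions (0,0), (0,1), (1,2), ..., (k-1,k)
    F = np.zeros((n * (k + 1), n * (k + 1)))
    F[0:n, 0:n] = C
    for i in range(k):
        F[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = C
    x = np.linalg.solve(np.eye(n * (k + 1)) - F, y)
    return x / x.sum()
```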

15.5 The solution process for m = 1

For the case when $m = 1$ the final equation is given by
$$
(I - D)v_k = 0.
$$
We will show that $(I - D)^{-1}$ is well defined and hence deduce that $v_k = 0$. Since
$$
D = \begin{bmatrix} 0 & L_2\\ 0 & M_2 \end{bmatrix}
$$
it follows that $(I - D)^{-1}$ is well defined if and only if $(I - M_2)^{-1}$ is well defined. We have the following result.

Lemma 2. The matrix $M_2^{\,k+1}$ is strictly sub-stochastic and $(I - M_2)^{-1}$ is well defined by the formula
$$
(I - M_2)^{-1} = (I + M_2 + M_2^2 + \cdots + M_2^k)(I - M_2^{\,k+1})^{-1}.
$$

Proof. We observe that
$$
\mathbf{1}^T M_2 = [\,p_1^{+}, 1, 1, \ldots, 1\,]
$$
and suppose that
$$
\mathbf{1}^T M_2^{\,r} = [\,\alpha_0, \alpha_1, \ldots, \alpha_{r-1}, 1, 1, \ldots, 1\,]
$$
for each $r = 1, 2, \ldots, q$, where $\alpha_j = \alpha_j(r) \in (0,1)$. Now it follows that
$$
\begin{aligned}
\mathbf{1}^T M_2^{\,q+1}
&= [\,\alpha_0, \alpha_1, \ldots, \alpha_{q-1}, 1, 1, \ldots, 1\,]M_2\\
&= \alpha_0[\,p_1, p_0, 0, \ldots, 0, 0, 0, \ldots, 0\,]
 + \alpha_1[\,p_2, p_1, p_0, \ldots, 0, 0, 0, \ldots, 0\,] + \cdots\\
&\quad + \alpha_{q-2}[\,p_{q-1}, p_{q-2}, p_{q-3}, \ldots, p_0, 0, 0, \ldots, 0\,]
 + \alpha_{q-1}[\,p_q, p_{q-1}, p_{q-2}, \ldots, p_1, p_0, 0, \ldots, 0\,]\\
&\quad + [\,p_{q+1}, p_q, p_{q-1}, \ldots, p_2, p_1, p_0, \ldots, 0\,] + \cdots\\
&\quad + [\,p_k, p_{k-1}, p_{k-2}, \ldots, p_{k-q+1}, p_{k-q}, p_{k-q-1}, \ldots, p_0\,]\\
&\quad + [\,p_{k+1}^{+}, p_k^{+}, p_{k-1}^{+}, \ldots, p_{k-q+2}^{+}, p_{k-q+1}^{+}, p_{k-q}^{+}, \ldots, p_1^{+}\,].
\end{aligned}
$$
The first element in the resultant row matrix is
$$
\alpha_0 p_1 + \alpha_1 p_2 + \cdots + \alpha_{q-1}p_q + p_{q+1}^{+} = \beta_0 < 1,
$$
and the second element is
$$
\alpha_0 p_0 + \alpha_1 p_1 + \cdots + \alpha_{q-1}p_{q-1} + p_q^{+} = \beta_1 < 1.
$$
A similar argument shows that the $j$th element is less than 1 for all $j \le q$ and indeed, for the critical case $j = q$, the $j$th element is given by
$$
\alpha_{q-1}p_0 + p_1^{+} = \beta_q < 1.
$$
The remaining elements for $q < j \le k$ are easily seen to be equal to 1. Hence the hypothesis is also true for $r = q+1$. By induction it follows that
$$
\mathbf{1}^T M_2^{\,k+1} < \mathbf{1}^T.
$$

Hence $(I - M_2^{\,k+1})^{-1}$ is well defined. The matrix $(I - M_2)^{-1}$ is now defined by the identity above. This completes the proof.

By back substitution into the equation $(I - S)v = 0$ we can see that $v_p = 0$ for $p = 1, \ldots, k-1$ and finally that
$$
(I - DP)v_0 = 0.
$$
Since $DP$ is a stochastic matrix, the eigenvector $v_0$ corresponding to the unit eigenvalue can be found. We know that
$$
y = \begin{bmatrix} u\\ v \end{bmatrix} = \begin{bmatrix} 0\\ v \end{bmatrix}
$$
and hence we can calculate the steady-state vector $x = (I - F)^{-1}y$.

15.6 The solution process for m = k

In this particular case we need to solve the equation
$$
(I - DP)v_0 = 0.
$$
Hence we can find the eigenvector $v_0$ corresponding to the unit eigenvalue of the matrix $DP$ and the original steady-state solution $x$ can be recovered through
$$
y = \begin{bmatrix} u\\ v \end{bmatrix} = \begin{bmatrix} 0\\ v \end{bmatrix}
\quad\text{and}\quad
x = (I - F)^{-1}y.
$$

15.7 A numerical example

We will consider a system of two connected dams with discrete states $z_1 \in \{0, 1, 2, 3, 4, 5, 6\}$ and $z_2 \in \{0, 1, 2, 3, 4\}$. Assume that the inflow to the first dam is defined by $p_r = (0.5)^{r+1}$ for $r = 0, 1, \ldots$ and consider the control policy with $m = 2$. The transition probability matrix has the block matrix form
$$
H =
\begin{bmatrix}
A & 0 & B & 0 & 0\\
A & 0 & B & 0 & 0\\
0 & A & 0 & B & 0\\
0 & 0 & A & 0 & B\\
0 & 0 & 0 & A & B
\end{bmatrix},
$$

where
$$
A =
\begin{bmatrix}
\tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{64} & \tfrac1{64}\\[2pt]
0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{32}\\[2pt]
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
and
$$
B =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
\tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{64} & \tfrac1{64}\\[2pt]
0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{32} & \tfrac1{32}\\[2pt]
0 & 0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac1{16} & \tfrac1{16}\\[2pt]
0 & 0 & 0 & \tfrac12 & \tfrac14 & \tfrac18 & \tfrac18\\[2pt]
0 & 0 & 0 & 0 & \tfrac12 & \tfrac14 & \tfrac14
\end{bmatrix}.
$$
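As a numerical cross-check (reusing the helper functions blocks_A_B, transition_matrix and steady_state sketched earlier, which are illustrative code rather than part of the chapter), the blocks above and the brute-force steady state can be reproduced as follows.

```python
import numpy as np

h, k, m = 6, 4, 2
p = np.array([0.5 ** (r + 1) for r in range(200)])   # truncated geometric inflow law
A, B = blocks_A_B(p, h, m)                            # should match the matrices displayed above
H = transition_matrix(A, B, k, m)
assert np.allclose(H.sum(axis=1), 1.0)                # H is (row) stochastic
x = steady_state(H)                                   # brute-force alternative to the elimination
print(np.round(x[:7], 4))                             # compare with the first block of x below
```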

As explained in Subsection 15.2.3 we solve the equation
$$
(I - S)v = 0,
$$
where $S = [S_{i,j}]$ for $i, j \in \{0, 1, 2\}$. Using the elimination process described in Section 15.3 we find the reduced coefficient matrix
$$
(I - S) \to
\begin{bmatrix}
I & -W_0D(I - C)^{-1}C^2 & -W_0D(I - C)^{-1}C^3\\
0 & I & -W_1DW_0C^2\\
0 & 0 & I - D(I + W_1C)
\end{bmatrix},
$$

where
$$
D(I + W_1C) =
\begin{bmatrix}
\tfrac{447}{2296} & \tfrac{491}{2296} & \tfrac12 & 0 & 0 & 0 & 0\\[2pt]
\tfrac{1021}{4592} & \tfrac{1065}{4592} & \tfrac14 & \tfrac12 & 0 & 0 & 0\\[2pt]
\tfrac{1849}{9184} & \tfrac{1805}{9184} & \tfrac18 & \tfrac14 & \tfrac12 & 0 & 0\\[2pt]
\tfrac{2677}{18368} & \tfrac{2545}{18368} & \tfrac1{16} & \tfrac18 & \tfrac14 & \tfrac12 & 0\\[2pt]
\tfrac{619}{5248} & \tfrac{575}{5248} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14 & \tfrac12\\[2pt]
\tfrac{619}{10496} & \tfrac{575}{10496} & \tfrac1{64} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14\\[2pt]
\tfrac{619}{10496} & \tfrac{575}{10496} & \tfrac1{64} & \tfrac1{32} & \tfrac1{16} & \tfrac18 & \tfrac14
\end{bmatrix}.
$$

We solve the matrix equation
$$
(I - D(I + W_1C))v_2 = 0
$$
to find the vector
$$
v_2 = \left[\tfrac{2787}{15124}, \tfrac{821}{3781}, \tfrac{12337}{60496}, \tfrac{9053}{60496}, \tfrac{7411}{60496}, \tfrac{7411}{120992}, \tfrac{7411}{120992}\right]^T
$$
and, by the back substitution process of Section 15.4, the vectors
$$
v_1 = \left[\tfrac{2845}{13408}, \tfrac{6197}{26816}, \tfrac{10563}{53632}, \tfrac{14929}{107264}, \tfrac{23661}{214528}, \tfrac{23661}{429056}, \tfrac{23661}{429056}\right]^T
$$
and
$$
v_0 = \left[\tfrac14, \tfrac14, \tfrac3{16}, \tfrac18, \tfrac3{32}, \tfrac3{64}, \tfrac3{64}\right]^T.
$$
The probability measure is the steady-state vector $x$ given by
$$
\begin{aligned}
x = \big[\,
& \tfrac{1718}{62375}, \tfrac{983}{12475}, \tfrac{983}{24950}, \tfrac{983}{49900}, \tfrac{983}{99800}, \tfrac{983}{199600}, \tfrac{983}{199600},\\
& \tfrac{1718}{62375}, \tfrac{3197}{62375}, \tfrac{3197}{124750}, \tfrac{3197}{249500}, \tfrac{3197}{499000}, \tfrac{3197}{998000}, \tfrac{3197}{998000},\\
& \tfrac{3436}{62375}, \tfrac{4676}{62375}, \tfrac{569}{12475}, \tfrac{1676}{62375}, \tfrac{2183}{124750}, \tfrac{2183}{249500}, \tfrac{2183}{249500},\\
& \tfrac{2816}{62375}, \tfrac{3888}{62375}, \tfrac{9959}{249500}, \tfrac{6071}{249500}, \tfrac{4127}{249500}, \tfrac{4127}{499000}, \tfrac{4127}{499000},\\
& \tfrac{2787}{62375}, \tfrac{3284}{62375}, \tfrac{12337}{249500}, \tfrac{9053}{249500}, \tfrac{7411}{249500}, \tfrac{7411}{499000}, \tfrac{7411}{499000}
\,\big]^T.
\end{aligned}
$$

Using the steady-state vector $x$ we can calculate the expected overflow of water from the system. Let $z = z(s) = (z_1(s), z_2(s))$ for $s = 1, 2, \ldots, n$ denote the collection of all possible states. The expected overflow is calculated by
$$
J = \sum_{s=1}^{n}\left(\sum_{r=0}^{\infty} f[z(s)\,|\,r]\,p_r\right)x_s,
$$

where f [z(s)|r] is the overflow from state z(s) when r units of stormwater
enter the first dam. We will consider the same pumping policy for four differ-
ent values m = 1, 2, 3, 4 of the control parameter. We obtain the steady-state
vector x = x[m] for each particular value of the control parameter and deter-
mine the expected total overflow in each case. Table 15.1 compares the four
parameter values by considering the overflow Ji = Ji [m] from the first and
second dams.
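The following sketch evaluates J under one consistent reading of the transition rules, in which the overflow from the first dam is the inflow in excess of its capacity after pumping and the overflow from the second dam is any pumped water in excess of its capacity; the within-day timing is an assumption of the sketch rather than something fixed by the chapter.

```python
def expected_overflow(x, p, h, k, m):
    # x: steady-state vector ordered as in the chapter (z2 block by block, z1 within);
    # p: truncated vector of inflow probabilities p_r.
    J = 0.0
    for z2 in range(k + 1):
        for z1 in range(h + 1):
            weight = x[z2 * (h + 1) + z1]
            pumped = m if z1 >= m else 0
            released = 1 if z2 > 0 else 0
            spill2 = max(z2 + pumped - released - k, 0)      # lost from the second dam
            for r, pr in enumerate(p):
                spill1 = max(z1 - pumped + r - h, 0)         # lost from the first dam
                J += (spill1 + spill2) * pr * weight
    return J
```

Accumulating the two spill terms separately should reproduce the quantities reported as J1 and J2 in Table 15.1 under this reading.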
Table 15.1 Overflow lost from the system for m = 1, 2, 3, 4

            m = 1    m = 2          m = 3       m = 4
  J1        1/7      1/25           1/25        1/17
  J2        0        9053/62375     263/1350    57/272
  Total     1/7      11548/62375    317/1350    73/272

From the table it is clear that the first pumping policy results in less overflow from the system. If pumping costs are ignored then it is clear that the policy m = 1 is the best. Of course, in a real system there are likely to
be other cost factors to consider. It is possible that less frequent pumping of


larger volumes may be more economical.

15.8 Justification of inverses

To justify the solution procedure described earlier we will show that $W_r$ is well defined for $r = 0, 1, \ldots, k-m-1$. From the definition of the transition matrix $H = H(A,B)$ in Subsection 15.2.2 and the subsequent definition of $C = A^T$ and $D = B^T$ we can see that $C$ and $D$ can be written in block form as
$$
C = \begin{bmatrix} L_1 & 0\\ M_1 & 0 \end{bmatrix}
\quad\text{and}\quad
D = \begin{bmatrix} 0 & L_2\\ 0 & M_2 \end{bmatrix},
$$
where $L_1 \in \mathbb{R}^{m \times m}$ and $M_1 \in \mathbb{R}^{(h-m+1)\times m}$ are given by
$$
L_1 =
\begin{bmatrix}
p_0 & 0 & \cdots & 0\\
p_1 & p_0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
p_{m-1} & p_{m-2} & \cdots & p_0
\end{bmatrix},
\qquad
M_1 =
\begin{bmatrix}
p_m & p_{m-1} & \cdots & p_1\\
p_{m+1} & p_m & \cdots & p_2\\
\vdots & \vdots & & \vdots\\
p_{h-1} & p_{h-2} & \cdots & p_{h-m}\\
p^{+}_{h} & p^{+}_{h-1} & \cdots & p^{+}_{h-m+1}
\end{bmatrix},
$$
and where $L_2 \in \mathbb{R}^{m \times (h-m+1)}$ and $M_2 \in \mathbb{R}^{(h-m+1)\times(h-m+1)}$ are given by
$$
L_2 =
\begin{bmatrix}
p_0 & \cdots & 0 & 0 & \cdots & 0\\
p_1 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & & \vdots & \vdots & & \vdots\\
p_{m-1} & \cdots & p_0 & 0 & \cdots & 0
\end{bmatrix}
$$
and
$$
M_2 =
\begin{bmatrix}
p_m & \cdots & p_1 & 0 & \cdots & 0\\
p_{m+1} & \cdots & p_2 & p_1 & \cdots & 0\\
\vdots & & \vdots & \vdots & \ddots & \vdots\\
p_{h-1} & \cdots & p_{h-m} & p_{h-m-1} & \cdots & p_{m-1}\\
p^{+}_{h} & \cdots & p^{+}_{h-m+1} & p^{+}_{h-m} & \cdots & p^{+}_{m}
\end{bmatrix}.
$$

15.8.1 Existence of the matrix W0

Provided $p_j > 0$ for all $j \le h$ the matrix $L_1$ is strictly sub-stochastic [3] with $\mathbf{1}^T L_1 < \mathbf{1}^T$. It follows that $(I - L_1)^{-1}$ is well defined and hence
$$
P = (I - C)^{-1} =
\begin{bmatrix}
(I - L_1)^{-1} & 0\\
M_1(I - L_1)^{-1} & I
\end{bmatrix}
$$
is also well defined. Note also that $\mathbf{1}^T C = [\mathbf{1}^T, 0^T]$ and $\mathbf{1}^T D = [0^T, \mathbf{1}^T]$. We begin with an elementary but important result.

Lemma 3. With the above definitions,
$$
\mathbf{1}^T DP = \mathbf{1}^T
\quad\text{and}\quad
\mathbf{1}^T DPC^{m-1} \le \mathbf{1}^T C^{m-1}, \tag{15.1}
$$
and the matrix $W_0 = [I - DPC^{m-1}]^{-1}$ is well defined.

Proof.
$$
\mathbf{1}^T D + \mathbf{1}^T C = \mathbf{1}^T
\;\Rightarrow\;
\mathbf{1}^T D = \mathbf{1}^T(I - C)
\;\Rightarrow\;
\mathbf{1}^T D(I - C)^{-1} = \mathbf{1}^T.
$$
Hence
$$
\begin{aligned}
\mathbf{1}^T DPC^{m-1} = \mathbf{1}^T C^{m-1}
&= [\mathbf{1}^T, \mathbf{1}^T]
\begin{bmatrix}
L_1^{m-1} & 0\\
M_1L_1^{m-2} & 0
\end{bmatrix}\\
&= \big[\mathbf{1}^T L_1^{m-1} + \mathbf{1}^T M_1 L_1^{m-2},\; 0^T\big]\\
&= \big[(\mathbf{1}^T L_1 + \mathbf{1}^T M_1)L_1^{m-2},\; 0^T\big]\\
&= \big[\mathbf{1}^T L_1^{m-2},\; 0^T\big] < \mathbf{1}^T
\end{aligned}
$$
for $m \ge 3$. Hence $W_0 = [I - DPC^{m-1}]^{-1}$ is well defined.

15.8.2 Existence of the matrix Wp for 1 ≤ p ≤ m − 1

We will consider the matrix
$$
W_1 = [I - DC^{m-2}(W_0C)]^{-1}.
$$
We have
$$
\mathbf{1}^T DC^{m-2} + \mathbf{1}^T DPC^{m-1}
= \mathbf{1}^T D[I + PC]C^{m-2}
= \mathbf{1}^T DPC^{m-2}
= \mathbf{1}^T C^{m-2}.
$$
Since $\mathbf{1}^T C^{m-2} \le \mathbf{1}^T$ we deduce that
$$
\mathbf{1}^T DC^{m-2} + (\mathbf{1}^T C^{m-2})DPC^{m-1} \le \mathbf{1}^T C^{m-2},
$$
from which it follows that
$$
\mathbf{1}^T DC^{m-2} \le \mathbf{1}^T C^{m-2}[I - DPC^{m-1}]
$$
and hence that
$$
\mathbf{1}^T DC^{m-2}(W_0C) \le \mathbf{1}^T C^{m-1} < \mathbf{1}^T.
$$
Since the column sums of the matrix $DC^{m-2}(W_0C)$ are all non-negative and less than one, it follows that the inverse matrix
$$
W_1 = [I - DC^{m-2}(W_0C)]^{-1}
$$
is well defined.

Lemma 4. With the above definitions the matrix $W_p$ is well defined for each $p = 0, 1, \ldots, m-1$ and for each such $p$ we have
$$
\mathbf{1}^T D\left[\sum_{j=0}^{m-1-p} C^j\right]\prod_{t=0}^{p-1}(W_tC) \le \mathbf{1}^T C^p.
$$
In the special case when $p = m-1$ the inequality becomes
$$
\mathbf{1}^T D\prod_{t=0}^{m-2}(W_tC) \le \mathbf{1}^T C^{m-1}.
$$

Proof. The proof is by induction. We note that


⎡ ⎤

m−2
1T D ⎣ C j + P C m−1 ⎦ = 1T DP = 1T
j=0

and hence ⎡ ⎤

m−2
1T D ⎣ C j ⎦ = 1T [I − DP C m−1 ],
j=0

from which it follows that


⎡ ⎤

m−2
1T D ⎣ C j ⎦ (W0 C) = 1T C.
j=0

Thus the result is true for p = 1. Suppose the result is true for 1 ≤ s ≤ p − 1.
Then

p−2
1T DC m−p (Wt C) ≤ 1T C p−1 < 1T
t=0

and hence
Wp−1 = [I − DC m−p (W0 C) · · · (Wp−2 C)]−1
is well defined. Now we have
⎡ ⎤

m−1−p 
p−2 
p−2
1T D ⎣ Cj⎦ (Wt C) + (1T C p−1 )DC m−p (Wt C)
j=0 t=0 t=0
⎡ ⎤

m−p 
p−2
≤ 1T D ⎣ Cj⎦ (Wt C)
j=0 t=0

≤ 1T C p−1

and hence
⎡ ⎤

m−1−p 
p−2 
p−2
1T D ⎣ Cj⎦ (Wt C) ≤ (1T C p−1 )[I − DC m−p (Wt C)],
j=0 t=0 t=0

from which it follows that


⎡ ⎤ % &

m−1−p 
p−2
1T D ⎣ Cj⎦ × (Wt C) Wp−1 ≤ 1T C p−1 .
j=0 t=0

If we multiply on the right by C we obtain the desired result


⎡ ⎤

m−1−p 
p−1
1T D ⎣ Cj⎦ (Wt C) ≤ 1T C p .
j=0 t=0

Thus the result is also true for s = p. This completes the proof. 

15.8.3 Existence of the matrix Wq


for m ≤ q ≤ k − m − 1

We need to establish some important identities.



Lemma 5. The JP identities of the first kind


⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ 
p−1 
p−1 
m−p−1 
p−1 ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C) = 1T (15.2)
⎩ ⎭
j=1 t=p−j j=0 t=0

are valid for each p = 1, 2, . . . , m − 1.

Proof. From the identity


m−2
C j + C m−1 P = P
j=0

and Lemma 3 we deduce that


⎡ ⎤

m−2
1T D ⎣ C j + C m−1 P ⎦ = 1T .
j=0

By rearranging this identity we have


⎡ ⎤

m−2
1T D ⎣ C j ⎦ = 1T [I − DC m−1 P ]
j=0

and hence ⎡ ⎤

m−2
1T D ⎣ C j ⎦ (W0 C) = 1T C,
j=0

from which it follows that


⎧ ⎡ ⎤ ⎫
⎨ 
m−2 ⎬
1T D I + ⎣ C j ⎦ (W0 C) = 1T C + 1T D = 1T .
⎩ ⎭
j=0

Therefore the JP identity of the first kind is valid for p = 1. We will use
induction to establish the general identity. Let p > 1 and suppose the result
is true for s = p < m − 1. From
⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ 
p−1 
p−1 
m−p−1 
p−1 ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C) = 1T
⎩ ⎭
j=1 t=p−j j=0 t=0

we deduce that
⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ 
p−1 
p−1 
m−p−2 
p−1 ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C)
⎩ ⎭
j=1 t=p−j j=0 t=0

= 1 [I − DC
T m−p−1
(W0 C) · · · (Wp−1 C)]

and hence that


⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ 
p−1 
p−1 
m−p−2 
p−1 ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C) (Wp C) = 1T C.
⎩ ⎭
j=1 t=p−j j=0 t=0

If we rewrite this in the form


⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ p p 
m−p−2 
p ⎬
1T D ⎣ (Wt C)⎦ + ⎣ C j ⎦ (Wt C) = 1T C
⎩ ⎭
j=1 t=p+1−j j=0 t=0

then it is clear that


⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ p 
p 
m−p−2 
p ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C)
⎩ ⎭
j=1 t=p+1−j j=0 t=0

= 1T C + 1T D = 1T .

Hence the result is also true for s = p + 1. This completes the proof.
Lemma 6. The matrix Wq exists for q = m, m + 1, . . . , k − m − 1 and for
each such q the JP identities of the second kind
⎧ ⎡ ⎤⎫
⎨  q−1
m−1  ⎬
1T D I + ⎣ (Wt C)⎦ = 1T (15.3)
⎩ ⎭
j=1 t=q−j

are also valid.


Proof. The JP identities of the second kind are established in the same way
that we established the JP identities of the first kind but care is needed
because it is necessary to establish that each Wq is well defined. From Lemma
5 the JP identity of the first kind for p = 1 is
⎧ ⎡ ⎤ ⎫
⎨ 
m−3 ⎬
1T D I + ⎣ C j ⎦ (W0 C) = 1T .
⎩ ⎭
j=0

Therefore
⎧ ⎡ ⎤ ⎫
⎨ 
m−3 ⎬
1T D I + ⎣ C j ⎦ (W0 C) + 1T DC m−2 (W0 C) = 1T
⎩ ⎭
j=0

and hence
⎧ ⎡ ⎤ ⎫
⎨ 
m−3 ⎬
1T D I +⎣ C j ⎦ (W0 C) = 1T [I − DC m−2 (W0 C)],
⎩ ⎭
j=0

from which we obtain


⎧ ⎡ ⎤ ⎫
⎨ 
m−3 ⎬
1T D I + ⎣ C j ⎦ (W0 C) (W1 C) = 1T C.
⎩ ⎭
j=0

In general if we suppose that


⎧ ⎡ ⎤ ⎫
⎨ 
m−p ⎬ p−2

1T D I + ⎣ C j ⎦ (W0 C) (Wt C) ≤ 1T C p−2
⎩ ⎭
j=0 t=1

then we have
⎧ ⎡ ⎤ ⎫
⎨ 
m−p−1 ⎬ p−2
 
p−2
1T D I + ⎣ C j ⎦ (W0 C) (Wt C) + (1T C p−2 )DC m−p (Wt C)
⎩ ⎭
j=0 t=1 t=0

≤1 CT p−2

and hence
⎧ ⎡ ⎤ ⎫
⎨ 
m−p−1 ⎬ p−2

1T D I +⎣ C j ⎦ (W0 C) (Wt C)
⎩ ⎭
j=0 t=1


p−2
≤ (1T C p−2 )[I − DC m−p (Wt C)],
t=0

from which we obtain


⎧ ⎡ ⎤ ⎫
⎨ 
m−p−1 ⎬ p−1

1T D I + ⎣ C j ⎦ (W0 C) (Wt C) ≤ 1T C p−1 .
⎩ ⎭
j=0 t=1

By continuing this process until W0 is eliminated we obtain the inequality


m−1
1T D (Wt C) ≤ 1T C m−1 .
t=1

Therefore the matrix

Wm = [I − D(W1 C) · · · (Wm−1 C)]−1

is well defined. The JP identity of the first kind with p = m − 1 gives


⎧ ⎡ ⎤ ⎫
⎨ 
m−2 
m−2 
m−2 ⎬
1T D I + ⎣ (Wr C)⎦ + (Wt C) = 1T
⎩ ⎭
j=1 t=m−1−j t=0

which, after rearrangement, becomes


⎧ ⎡ ⎤⎫
⎨ 
m−2 
m−2 ⎬ 
m−2
T
1 D I+ ⎣ (Wt C)⎦ = 1 [I − D
T
(Wt C)]
⎩ ⎭
j=1 t=m−1−j t=0

and allows us to deduce


⎧ ⎡ ⎤⎫
⎨ 
m−2 
m−2 ⎬
1T D I + ⎣ (Wt C)⎦ (Wm−1 C) = 1T C.
⎩ ⎭
j=1 t=m−1−j

If we rewrite this in the form


⎧ ⎡ ⎤ ⎫
⎨m−1
 m−1  
m−1 ⎬
1T D ⎣ (Wt C)⎦ + (Wt C) = 1T C
⎩ ⎭
j=1 t=m−j t=0

then it follows that


⎧ ⎡ ⎤⎫
⎨ 
m−1 
m−1 ⎬
1T D I+ ⎣ (Wt C)⎦ = 1T .
⎩ ⎭
j=1 t=m−j

Thus the JP identity of the second kind is true for q = m. We proceed by


induction. We suppose that the matrix Ws is well defined for each s with
m ≤ s ≤ q < k − m − 1 and that the JP identity of the second kind
⎧ ⎡ ⎤⎫
⎨  s−1
m−1  ⎬
1T D I + ⎣ (Wt C)⎦ = 1T
⎩ ⎭
j=1 t=s−j

is valid for these values of s. Therefore


⎧ ⎡ ⎤⎫
⎨  q−1
m−2  ⎬ 
q−1
1T D I + ⎣ (Wt C)⎦ + 1T D (Wt C) = 1T
⎩ ⎭
j=1 t=q−j t=q−m+1

and hence
⎧ ⎡ ⎤⎫
⎨  q−1
m−2  ⎬
1T D I + ⎣ (Wt C)⎦ = 1T [I − D(Wq−m+1 C) · · · (Wq−1 C)].
⎩ ⎭
j=1 t=q−j

Since Wq = [I − D(Wq−m+1 C) · · · (Wq−1 C)]−1 is well defined, we have


⎧ ⎡ ⎤⎫
⎨  q−1
m−2  ⎬
1T D I + ⎣ (Wt C)⎦ (Wq C) = 1T C
⎩ ⎭
j=1 t=q−j

which we rewrite as
⎧ ⎡ ⎤⎫
⎨m−1
 
q ⎬
1T D ⎣ (Wt C)⎦ = 1T C
⎩ ⎭
j=1 t=q+1−j

and from which it follows that


⎧ ⎡ ⎤⎫
⎨ 
m−1 
q ⎬
1T D I + ⎣ (Wt C)⎦ = 1T .
⎩ ⎭
j=1 t=q+1−j

Hence the JP identity of the second kind is valid for s = q + 1. To show that
Wq+1 is well defined, we must consider two cases. When q ≤ 2m − 2 we set
p = q − m + 2 in the JP identity of the first kind to give
⎡ ⎡ ⎤ ⎡ ⎤ ⎤

q+1−m 
q+1−m 
2m−q−1 
q+1−m
1T D ⎣I + ⎣ (Wt C)⎦+⎣ C j⎦ (Wt C)⎦ = 1T .
j=1 t=q−m+2−j j=0 t=0

Therefore
⎧ ⎡ ⎤ ⎡ ⎤ ⎫
⎨ 
q+1−m 
q+1−m 
2m−q−2 
q+1−m ⎬
1T D I + ⎣ (Wt C)⎦ + ⎣ Cj⎦ (Wt C)
⎩ ⎭
j=1 t=q−m+2−j j=0 t=0

= 1 [I − DC
T 2m−q−1
(W0 C) · · · (Wq+1−m C)]

and hence
⎧ ⎡ ⎤
⎨ 
q+1−m 
q+1−m
1T D I+ ⎣ (Wt C)⎦

j=1 t=q−m+2−j
⎡ ⎤ ⎫

2m−q−2 
q+1−m ⎬
+⎣ Cj⎦ (Wt C) (Wq−m+2 C) = 1T C.

j=0 t=0

Since 1T C ≤ 1T it follows that


⎧ ⎡ ⎤
⎨ 
q+1−m 
q+1−m
1T D I + ⎣ (Wt C)⎦

j=1 t=q−m+2−j
⎡ ⎤ ⎫

2m−q−3 
q+1−m ⎬
+ ⎣ Cj⎦ (Wt C) (Wq−m+2 C)

j=0 t=0


q−m+2
+ (1T C)DC 2m−q−2 (Wt C) ≤ 1T C
t=0

and hence that


⎧ ⎡ ⎤
⎨ 
q+1−m 
q+1−m
1T D I+ ⎣ (Wt C)⎦

j=1 t=q−m+2−j
⎡ ⎤ ⎫

2m−q−3 
q+1−m ⎬
+⎣ Cj⎦ (Wt C) (Wq−m+2 C)

j=0 t=0

≤ (1T C)[I − DC 2m−q−2 (W0 C) · · · (Wq−m+2 C)].

Now we can deduce that


⎧ ⎡ ⎤
⎨ 
q+1−m 
q+1−m
1T D I + ⎣ (Wt C)⎦

j=1 t=q−m+2−j
⎡ ⎤ ⎫

2m−q−3 
q+1−m ⎬
+⎣ Cj⎦ (Wt C) (Wq−m+2 C)(Wq−m+3 C) ≤ 1T C 2 .

j=0 t=0

If we continue this process the terms of the second sum on the left-hand side
will be eliminated after 2m − q steps, at which stage we have
⎧ ⎡ ⎤⎫
⎨ 
q+1−m 
q+1−m ⎬ m−1

T
1 D I+ ⎣ (Wt C) ⎦ (Wt C) ≤ 1T C 2m−q .
⎩ ⎭
j=1 t=q−m+2−j t=q−m+2

The details change slightly but we continue the elimination process. Since
(1T C 2m−q ) < 1T we now have
⎧ ⎡ ⎤⎫
⎨ 
q−m 
q+1−m ⎬ m−1

1T D I + ⎣ (Wt C)⎦ (Wt C)
⎩ ⎭
j=1 t=q−m+2−j t=q−m+2


m−1
+ (1T C 2m−q )D (Wt C) ≤ 1T C 2m−q
t=1

and hence
⎧ ⎡ ⎤⎫
⎨ 
q−m 
q+1−m ⎬ 
m−1
1T D I+ ⎣ (Wt C)⎦ (Wt C)
⎩ ⎭
j=1 t=q−m+2−j t=q−m+2

≤ (1T C 2m−q )[I − D(W1 C) · · · (Wm−1 C)],

from which we obtain


⎧ ⎡ ⎤⎫
⎨ 
q−m 
q+1−m ⎬ 
m
1T D I+ ⎣ (Wt C)⎦ (Wt C) ≤ 1T C 2m−q+1 .
⎩ ⎭
j=1 t=q−m+2−j t=q−m+2

The elimination continues in the same way until we eventually conclude that


q
T
1 D (Wt C) ≤ 1T C m−1 < 1T
t=q−m+2

and hence establish that Wq+1 = [I − D(Wq−m+2 C) · · · (Wq C)]−1 is well


defined. A similar but less complicated argument can be carried through
using the appropriate JP identity of the second kind when q ≥ 2m − 1. Hence
the matrix Ws is well defined for s = q + 1. This completes the proof. 

15.9 Summary

We have established a general method of analysis for a class of simple con-


trol policies in a system of two connected dams where we assume a stochastic
supply and regular demand. We calculated steady-state probabilities for each
particular policy within the class and hence determined the expected overflow
from the system. A key finding is that calculation of the steady-state proba-
bility vector for a large system can be reduced to a much smaller calculation
using the block matrix structure.
We hope to extend our considerations to more complex control policies
in which the decision to pump from the first dam requires that the con-
tent of the first dam exceeds a particular level m1 and also that the con-
tent of the second dam is less than the level m2 = k − m1 . We observe
that for this class the transition matrix can be written in block matrix
form using the matrices A and B described in this article in almost the
same form but with the final rows containing only one non-zero block ma-
trix R. Thus it seems likely that the methodology presented in this chapter
could be adapted to provide a general analysis for this new class of pumping
policies. Ultimately we would like to extend our considerations to include
more complicated connections and the delays associated with treatment of
stormwater.
We also believe a similar analysis is possible for the policies considered
in this chapter when a continuous state space is used for the first dam. The
matrices must be replaced by linear integral operators but the overall block
structure remains the same.

References

1. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes (Methuen & Co.,
London, 1965).
2. F. R. Gantmacher, The Theory of Matrices (Chelsea Publishing Company, New York,
1960).
3. D. L. Isaacson and R. W. Madsen, Markov Chains: Theory and Applications (John
Wiley & Sons, New York, 1976).
4. P. A. P. Moran, A probability theory of dams and storage systems, Austral. J. Appl.
Sci. 5 (1954), 116–124.
5. P. A. P. Moran, An Introduction to Probability Theory (Clarendon Press, Oxford,
1968).
6. P. A. P. Moran, A Probability Theory of Dams and Storage Systems (McGraw-Hill,
New York, 1974).
7. G. F. Yeo, A finite dam with exponential release, J. Appl. Probability 11 (1974),
122–133.
8. G. F. Yeo, A finite dam with variable release rate, J. Appl. Probability 12 (1975),
205–211.
Chapter 16
Optimal design of linear consecutive–k–out–of–n systems

Malgorzata O'Reilly
School of Mathematics and Physics, University of Tasmania, Hobart TAS 7001, AUSTRALIA
e-mail: [email protected]

Abstract A linear consecutive–k–out–of–n:F system is an ordered sequence of n components that fails if and only if at least k consecutive components fail. A linear consecutive–k–out–of–n:G system is an ordered sequence of n components that works if and only if at least k consecutive components work. This chapter establishes necessary conditions for the variant optimal design and procedures to improve designs not satisfying these conditions for linear consecutive systems with 2k ≤ n ≤ 3k.

Key words: Linear consecutive–k–out–of–n:F system, linear consecutive–k–out–of–n:G system, variant optimal design, singular design, nonsingular design

16.1 Introduction

16.1.1 Mathematical model

A linear consecutive–k–out–of–n:F system ([11], [20]–[23]) is a system of n


components ordered in a line, such that the system fails if and only if at least
k consecutive components fail. A linear consecutive–k–out–of–n:G system is
a system of n components ordered in a line, such that the system works if and
only if at least k consecutive components work. A particular arrangement of
components in a system is referred to as a design and a design that maximizes
system reliability is referred to as optimal. We assume the following:


1. the system is either in a failing or a working state;


2. each component is either in a failing or a working state;
3. the failures of the components are independent;
4. component reliabilities are distinct and within the interval (0, 1).

The fourth assumption is made for the clarity of presentation, without


loss of generality. Cases that include reliabilities 0 and 1 can be viewed as
limits of other cases. Some of the strict inequalities will become nonstrict
when these cases are included. Note also that, in all procedures, an improved
design X = (q1 , . . . , qn ) and its reverse X = (qn , . . . , q1 ) are considered to be
equivalent.
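For reference, the reliability of a given design can be evaluated by a standard dynamic programme over the length of the current run of failed components; the following Python sketch is a generic implementation and is not part of the chapter's argument.

```python
def reliability_F(design, k):
    # Probability that a linear consecutive-k-out-of-n:F system works, i.e. that no
    # k consecutive components fail; design[i] is the reliability of position i + 1.
    run = [1.0] + [0.0] * (k - 1)            # run[j] = P(no failure run of length k yet,
    for q in design:                         #           current trailing failure run = j)
        new = [0.0] * k
        new[0] = sum(run) * q                # this component works: the run resets
        for j in range(1, k):
            new[j] = run[j - 1] * (1.0 - q)  # it fails: the run grows but stays below k
        run = new
    return sum(run)

def reliability_G(design, k):
    # A consecutive-k-out-of-n:G system works iff at least k consecutive components
    # work, which is the complement of the F-system event with q replaced by 1 - q.
    return 1.0 - reliability_F([1.0 - q for q in design], k)
```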

16.1.2 Applications and generalizations of linear


consecutive–k–out–of–n systems

Two classic examples of consecutive–2–out–of–n:F systems were given by


Chiang and Niu in [11]:
• a telecommunication system with n relay stations (satellites or ground
stations) which fails when at least 2 consecutive stations fail; and
• an oil pipeline system with n pump stations which fails when at least 2
consecutive pump stations are down.
Kuo, Zhang and Zuo [24] gave the following example of a linear consecu-
tive–k–out–of–n:G system:
• consider n parallel-parking spaces on a street, with each space being suit-
able for one car. The problem is to find the probability that a bus, which
takes 2 consecutive spaces, can park on this street.

More examples of these systems are in [2, 19, 34, 35, 38]. For a review of
the literature on consecutive–k–out–of–n systems the reader is referred to [8].
Also see [5] by Chang, Cui and Hwang.
Introducing more general assumptions and considering system topology
has led to some generalizations of consecutive–k–out–of–n systems. These
are listed below:
• consecutively connected systems [32];
• linearly connected systems [6, 7, 14];
• consecutive–k–out–of–m–from–n: F systems [36];
• consecutive–weighed–k–out–of–n: F systems [37];
• m–consecutive–k–out–of–n: F systems [15];
• 2–dimensional consecutive–k–out–of–n: F systems [31];
• connected–X–out–of–(m, n): F lattice systems [3];
• connected–(r, s)–out–of–(m, n): F lattice systems [27];

• k–within–(r, s)–out–of–(m, n): F lattice systems [27];


• consecutively connected systems with multi-state components [27];
• generalized multi-state k–out–of–n: G systems [16];
• combined k–out–of–n: F , consecutive–k–out–of–n: F and linear connected–
(r, s)–out–of–(m, n): F system structures [39].

A number of related, more realistic systems have also been reported


[1, 9, 30].
Linear consecutive–k–out–of–n: F systems have been used to model vac-
uum systems in accelerators [18], computer ring networks [17], systems from
the field of integrated circuits [4], belt conveyors in open-cast mining [27]
and the exploration of distant stars by spacecraft [10]. Applications of gener-
alized consecutive systems include medical diagnosis [31], pattern detection
[31], evaluation of furnace systems in the petro-chemical industry [39] and a
shovel-truck system in an open mine [16].

16.1.3 Studies of consecutive–k–out–of–n systems

Studies of the optimal designs of consecutive–k–out–of–n systems have re-


sulted in establishing two types of optimal designs: invariant and variant
designs. The optimality of invariant optimal designs is independent of the
numerical values of components’ reliabilities and subject only to the ordering
of the numerical values of component reliabilities. Conversely, the optimality
of variant optimal designs is contingent on those numerical values. Malon
[26] has noticed that, in practice, it may be sufficient to know the ages of
the components to be able to order them according to their reliabilities. This
has an important implication when an optimal design of the system is in-
variant, that is, independent of the component reliabilities. For such optimal
designs, one does not need to know the exact component reliabilities to be
able to order components in an optimal way. A linear consecutive–k–out–of–
n:F system has an invariant optimal design only for k ∈ {1, 2, n − 2, n − 1, n}
[26]. The invariant optimal design for linear consecutive–2–out–of–n:F sys-
tems has been given by Derman, Lieberman and Ross [12] and proven by
Malon [25] and Du and Hwang [13]. The invariant optimal designs for lin-
ear consecutive–k–out–of–n:F systems with k ∈ {n − 2, n − 1} have been
established by Malon [26].
A linear consecutive–k–out–of–n:G system has an invariant optimal design
only for k ∈ {1, n − 2, n − 1, n} and for n/2 ≤ k < n − 2 [40]. The invariant
optimal design for linear consecutive–k–out–of–n:G systems with n/2 ≤ k ≤
n−1 has been given by Kuo et al. [24]. Zuo and Kuo [40] have summarized the
complete results on the invariant optimal designs of consecutive–k–out–of–n
systems. Table 16.1 lists all invariant optimal designs of linear consecutive–
k–out–of–n systems and has been reproduced from [40]. The assumed order
of component reliabilities is p1 < p2 < . . . < pn. The symbol ω represents any possible arrangement.

Table 16.1 Invariant optimal designs of linear consecutive–k–out–of–n systems

  k                  F System                       G System
  k = 1              ω                              ω
  k = 2              (1, n, 3, n−2, ...,            −
                      n−3, 4, n−1, 2)
  2 < k < n/2        −                              −
  n/2 ≤ k < n−2      −                              (1, 3, ..., 2(n−k)−1, ω,
                                                     2(n−k), ..., 4, 2)
  k = n−2            (1, 4, ω, 3, 2)                (1, 3, ..., 2(n−k)−1, ω,
                                                     2(n−k), ..., 4, 2)
  k = n−1            (1, ω, 2)                      (1, 3, ..., 2(n−k)−1, ω,
                                                     2(n−k), ..., 4, 2)
  k = n              ω                              ω
In all cases where an invariant optimal design is not listed, only variant
optimal designs exist.
Linear consecutive–k–out–of–n systems have variant optimal designs for
all F systems with 2 < k < n − 2 and all G systems with 2 ≤ k < n/2.
For these systems, the information about the order of component reliabili-
ties is not sufficient to find the optimal design. In fact, one needs to know
the exact value of component reliabilities. This is because different sets of
component reliabilities produce different optimal designs, so that for a given
linear consecutive–k–out–of–n system there is more than one possible optimal
design.
Zuo and Kuo [40] have proposed methods for dealing with the variant opti-
mal design problem which are based upon the following necessary conditions
for optimal design, proved by Malon [26] for linear consecutive–k–out–of–n:F
systems and extended by Kuo et al. [24] to linear consecutive–k–out–of–n:G
systems:

(i) components from positions 1 to min{k, n − k + 1} are arranged in nonde-


creasing order of component reliability;
(ii) components from positions n to max{k, n − k + 1} are arranged in nonde-
creasing order of component reliability;
(iii) the (2k − n) most reliable components are arranged from positions (n −
k + 1) to k in any order if n < 2k.

In the case when n ≥ 2k, a useful concept has been that of singularity,
which has been also applied in invariant optimal designs [13]. A design
X = (q1 , q2 , . . . , qn ) is singular if for symmetrical components qi and qn+1−i ,
1 ≤ i ≤ [n/2], either qi > qn+1−i or qi < qn+1−i for all i; otherwise the
design is nonsingular. According to Shen and Zuo [33] a necessary condi-
tion for the optimal design of a linear consecutive–k–out–of–n:G system with

n ∈ {2k, 2k + 1} is for it to be singular. In [28] we have shown that a nec-


essary condition for the optimal design of linear consecutive–k–out–of–n:F
systems with 2k ≤ n ≤ (2k + 1) is for it to be nonsingular. Procedures to
improve designs not satisfying necessary conditions for the optimal design of
linear consecutive–k–out–of–n:F and linear consecutive–k–out–of–n:G were
also given. The significance of these results was illustrated by an example
showing that designs satisfying these necessary conditions can be better than
designs satisfying other known necessary conditions.
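The singularity condition is straightforward to test for a given design, as the following small sketch illustrates; it assumes distinct component reliabilities, in line with assumption 4.

```python
def is_singular(design):
    # A design (q_1, ..., q_n) is singular when every symmetric pair (q_i, q_{n+1-i})
    # is ordered the same way; component reliabilities are assumed distinct.
    n = len(design)
    comparisons = {design[i] > design[n - 1 - i] for i in range(n // 2)}
    return len(comparisons) == 1

# For example, (0.9, 0.2, 0.6, 0.3, 0.1) is nonsingular: 0.9 > 0.1 but 0.2 < 0.3.
```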

16.1.4 Summary of the results

In this chapter we treat the case 2k + 2 ≤ n ≤ 3k and explore whether the re-
sults of Shen and Zuo [33] and O’Reilly [28] can be extended to this case. The
proofs included here are more complicated and the produced results do not
exactly mirror those when 2k ≤ n ≤ 2k + 1. We find that, although the nec-
essary conditions for the optimal design of linear consecutive–k–out–of–n:F
systems in the cases 2k ≤ n ≤ 2k +1 and 2k +2 ≤ n ≤ 3k are similar, the pro-
cedures to improve designs not satisfying this necessary condition differ in the
choice of interchanged components. Furthermore, the necessary conditions for
linear consecutive–k–out–of–n:G systems in these two cases are significantly
different. In the case when 2k + 2 ≤ n ≤ 3k, the requirement for the opti-
mal design of a linear consecutive–k–out–of–n:G system to be singular holds
only under certain limitations. Examples of nonsingular and singular optimal
designs are given. The theorems are built on three subsidiary propositions,
which are given in Sections 16.2 and 16.4. Proposition 16.4.1 itself requires
some supporting lemmas which are the substance of Section 16.3. The main
results for this case are presented in Section 16.5. The ideas are related to
those in the existing literature, though the detail is somewhat complicated.
The arguments are constructive and based on the following.
Suppose X ≡ (q1 , . . . , q2k+m ) is a design and {qi1 , . . . , qir } is an arbitrary

• proper subset of {q1 , . . . , qk } when m ≤ 1, or


• nonempty subset of {qm , . . . , qk } when m > 1.

We denote by $X^* \equiv (q_1^*, \ldots, q_{2k+m}^*)$ the design obtained from $X$ by interchanging symmetrical components $q_{i_j}$ and $q_{2k+m+1-i_j}$ for all $1 \le j \le r$.
We show that a number of inequalities exist between quantities defined
from X and the corresponding quantities defined for a generic X ∗ . We
use the notation X ∗ in this way throughout this chapter without further
comment.
Theorem 16.5.1 of Section 16.5 rules out only one type of design of consecu-
tive–k–out–of–(2k + m): F systems: singular designs. However, we emphasize
that the results for consecutive–k–out–of–(2k + m): G systems in Theorem
16.5.2 and Corollary 16.5.2, obtained by symmetry from Theorem 16.5.1,

significantly reduce the number of designs to be considered in algorithms


searching for an optimal design when m is small. For example, for m =
2, when we apply the necessary condition stated in Corollary 16.5.2, the
number of designs to be considered reduces from (2k − 2)! to (2k − 2)!/2k if
(q1 , qk+1 , qk+2 , q2k+2 ) is singular (which occurs with probability 0.5 when a
design is chosen randomly). We note that, except for the necessary conditions
mentioned in Section 16.1.3, little is known about the variant optimal designs.
We establish more necessary conditions for the variant optimal design in [29],
also appearing in this volume, which is an important step forward in studying
this difficult problem.

16.2 Propositions for R and M

Throughout this chapter we adopt the convention
$$
\prod_{s \in \emptyset} q_s \equiv 1,
\qquad
a + b - ab \equiv (a \oplus b),
$$
and make use of the following definitions.
and make use of the following definitions.

Definition 1. Let $X \equiv (q_1, \ldots, q_{2k+m})$, $2 \le m \le k$, $l \le k$ and $\{q_{i_1}, \ldots, q_{i_r}\}$ be an arbitrary nonempty subset of $\{q_m, \ldots, q_k\}$. We define
$$
A(X) \equiv \prod_{s \in \{i_1, \ldots, i_r\}} q_s,
\qquad
A'(X) \equiv \prod_{s \in \{i_1, \ldots, i_r\}} q_{2k+m+1-s},
$$
$$
B_l(X) \equiv \prod_{s \in \{l, \ldots, k\} \setminus \{i_1, \ldots, i_r\}} q_s
\qquad\text{and}\qquad
B'_l(X) \equiv \prod_{s \in \{l, \ldots, k\} \setminus \{i_1, \ldots, i_r\}} q_{2k+m+1-s},
$$
with similar definitions for $X^*$ (obtained by replacing $X$ with $X^*$ and $q$ with $q^*$).

Note that $B_l(X) = B_l(X^*)$, $B'_l(X) = B'_l(X^*)$, $A'(X) = A(X^*)$ and $A'(X^*) = A(X)$.

Definition 2. Let X ≡ (q1 , . . . , q2k+m ), 2 ≤ m ≤ k. Thus we have either


m = 2T + 1 for some T > 0 or m = 2T + 2 for some T ≥ 0. We define

W0 (X) ≡ F̄k2k+m (X),


t
> ?@ A
Wt (X) ≡ F̄k2k+m (q1 , . . . , qk , 1, . . . , 1, qk+t+1 , . . . , qk+m−t ,
t
> ?@ A
1, . . . , 1, qk+m+1 , . . . , q2k+m )
for 1 ≤ t ≤ T, m > 2 and
m
> ?@ A
WT +1 (X) ≡ F̄k2k+m (q1 , . . . , qk , 1, . . . , 1, qk+m+1 , . . . , q2k+m ).

Definition 3. Let X ≡ (q1 , . . . , q2k+m ), 2 ≤ m ≤ k with either m = 2T + 1


for some T > 0 or m = 2T + 2 for some T ≥ 0. We define
 k 
 
2k+m−t
Mt (X) ≡ qs ⊕ qs
s=t+1 s=k+m+1
for 0 ≤ t ≤ T.

Definition 4. Let X ≡ (q1 , . . . , q2k+m ), 2 ≤ m ≤ k. If m = 2T + 2 for some


T ≥ 0, we define
 k 
  −1
2k+m−T
RT (X) ≡ pk+T +1 qk+m−T qs ⊕ qs
s=T +1 s=k+m+1
 2k+m−T 
 
k
+ qk+T +1 pk+m−T qs ⊕ qs .
s=k+m+1 s=T +2

If m > 2 with either m = 2T + 1 for some T > 0 or m = 2T + 2 for some


T > 0, then for 0 ≤ t ≤ T − 1 we define
 k

Rt (X) ≡ pk+t+1 qk+m−t qs ⊕
s=t+1

m+k−3t−3
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 )


2k+m−t
+ qk+t+1 pk+m−t qs ⊕
s=k+m+1

m+k−3t−3
F̄k−t−1 (qt+2 , . . . , qk , qk+t+2 , . . . , qk+m−t−1 ) .

It will be convenient to make the following abbreviations: A = A(X), A∗ =


 
A(X ∗ ), Bl = Bl (X), Bl = Bl (X), Wt (X) = Wt , Wt (X ∗ ) = Wt∗ , Mt (X) =
Mt , Mt (X ∗ ) = Mt∗ , Rt (X) = Rt and Rt (X ∗ ) = Rt∗ .
314 M. O’Reilly

Propositions 16.2.1 and 16.2.2 below contain results for Mt and RT , which
are later used to prove a result for W0 in Theorem 16.5.1 of Section 16.5. In
the proofs of Propositions 16.2.1, 16.2.2 and 16.4.1 and Theorem 16.5.1 we
assume q1 > q2k+m . Note that, by the symmetry of the formulas, reversing
the order of components in X and X ∗ would not change the values of Wt , Mt
and Rt for X and X ∗ . Therefore the assumption q1 > q2k+m can be made
without loss of generality.

Proposition 16.2.1 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 ≤ m ≤ k, with


either m = 2T + 1 for some T > 0 or m = 2T + 2 for some T ≥ 0. Then

Mt > Mt∗ for 0 ≤ t ≤ T,

for any X ∗ .

Proof. Without loss of generality we can assume q1 > q2k+m . Note that
m ≥ 2 and so t + 1 < m for all 0 ≤ t ≤ T . We have
 

Mt = ABt+1 ⊕ A∗ Bt+1 ,
 

Mt∗ = A∗ Bt+1 ⊕ ABt+1 ,

Mt − Mt∗ = (A − A∗ )(Bt+1 − Bt+1 ),

where A − A∗ > 0 and Bt+1 − Bt+1 > 0 by the singularity of X, and so

Mt − Mt∗ > 0,

proving the proposition. 

Proposition 16.2.2 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 ≤ m ≤ k, with


m = 2T + 2 for some T ≥ 0. Then

RT > RT∗

for any X ∗ .

Proof. Without loss of generality we can assume q1 > q2k+m . Note that
m ≥ 2 and so T + 2 ≤ m. We have
 

RT = pk+T +1 qk+m−T qT +1 ABT +2 ⊕ A∗ BT +2
 

+ qk+T +1 pk+m−T q2k+m−T A∗ BT +2 ⊕ ABT +2 ,

 

RT∗ = pk+T +1 qk+m−T qT +1 A∗ BT +2 ⊕ ABT +2
 

+ qk+T +1 pk+m−T q2k+m−T ABT +2 ⊕ A∗ BT +2
16 Optimal design of linear consecutive–k–out–of–n systems 315

and

RT − RT∗ = qk+T +1 pk+m−T (BT +2 − q2k+m−T BT +2 )(A − A∗ )

− qk+m−T pk+T +1 (BT +2 − qT +1 BT +2 )(A − A∗ ),

where by the singularity of X



BT +2 − q2k+m−T BT +2 > 0,
A − A∗ > 0,
qk+T +1 pk+m−T > qk+m−T pk+T +1 ,
 
BT +2 − q2k+m−T BT +2 > BT +2 − qT +1 BT +2 ,

and so
RT − RT∗ > 0,
proving the proposition. 

16.3 Preliminaries to the main proposition

Lemmas 16.3.1–16.3.3 below contain some preliminary results which are used
in the proof of a result for Rt in Proposition 16.4.1 of Section 16.4.
Lemma 16.3.1 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 < m ≤ k, with either
m = 2T + 1 for some T > 0 or m = 2T + 2 for some T > 0. Then for any
X ∗ and all 0 ≤ t ≤ T − 1 we have


k 
m−1
qs = ABm qs , (16.1)
s=t+1 s=t+1

m+k−3t−3
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 )

2(m−2t−2)
= F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) ·


2k+t+1
qs
s=k+m+1

2(m−2t−2)
= F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) ·
 
2k+t+1
A∗ Bm qs , (16.2)
s=2k+2
316 M. O’Reilly


k
m+k−3t−3
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 ) qs
s=t+1
2(m−2t−2)
= F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) ·
 m−1  2k+t+1
 
∗ 
ABm A Bm qs qs , (16.3)
s=t+1 s=2k+2

 m−1 

k 
qs∗ = qs A∗ Bm , (16.4)
s=t+1 s=t+1

2(m−2t−2) ∗ ∗ ∗ ∗
F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , qk+m−1 , . . . , q2k+m−t−1 )
2(m−2t−2)
= F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) ·
 
2k+t+1
ABm qs , (16.5)
s=2k+2


k
m+k−3t−3 ∗ ∗ ∗ ∗
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 ) qs∗
s=t+1
2(m−2t−2)
= F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) ·
 m−1  2k+t+1
  
ABm A∗ Bm qs qs (16.6)
s=t+1 s=2k+2

and

k
m+k−3t−3
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 ) qs
s=t+1


k
m+k−3t−3 ∗ ∗ ∗ ∗
= F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 ) qs∗ .
s=t+1
(16.7)

Proof. Without loss of generality we can assume q1 > q2k+m .


We have m > 2 and so t + 1 < m for all 0 ≤ t ≤ T − 1. Therefore (16.1)
is satisfied. Also, consider that in
m+k−3t−3
F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 )

we have 2(k − t − 1) > m + k − 3t − 3. Therefore every event in which


k − t − 1 consecutive components fail will include failure of the components
qk+m+1 , . . . , q2k+t+1 . Hence (16.2) follows. From (16.1) and (16.2) we have
(16.3).
16 Optimal design of linear consecutive–k–out–of–n systems 317

In a similar manner we show (16.4)–(16.6).


From (16.3) and (16.6), we have (16.7) and the lemma follows. 

Lemma 16.3.2 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 < m ≤ k, with either


m = 2T + 1 for some T > 0 or m = 2T + 2 for some T > 0. Then for any
X ∗ and all 0 ≤ t ≤ T − 1 we have


2k+m−t
 
2k+m−t
qs = A∗ Bm qs , (16.8)
s=k+m+1 s=2k+2

m+k−3t−3
F̄k−t−1 (qt+2 , . . . , qk , qk+t+2 , . . . , qk+m−t−1 )

2(m−2t−2)

k
= F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 ) qs
s=m−t

2(m−2t−2)

m−1
= F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )ABm qs ,
s=m−t
(16.9)


2k+m−t
m+k−3t−3
F̄k−t−1 (qt+2 , . . . , qk , qk+t+2 , . . . , qk+m−t−1 ) qs
s=k+m+1

(qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )ABm A∗ Bm ·
2(m−2t−2)
= F̄m−2t−2
2k+m−t  m−1
 
qs qs , (16.10)
s=2k+2 s=m−t


2k+m−t
 
2k+m−t
qs∗ = ABm qs , (16.11)
s=k+m+1 s=2k+2

m+k−3t−3 ∗
F̄k−t−1 (qt+2 , . . . , qk∗ , qk+t+2
∗ ∗
, . . . , qk+m−t−1 )

m−1
(qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )A∗ Bm
2(m−2t−2)
= F̄m−2t−2 qs ,
s=m−t
(16.12)

2k+m−t
m+k−3t−3 ∗
F̄k−t−1 (qt+2 , . . . , qk∗ , qk+t+2
∗ ∗
, . . . , qk+m−t−1 ) qs∗
s=k+m+1

F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )ABm A∗ Bm
2(m−2t−2)
= ·
2k+m−t 
 
m−1
qs qs (16.13)
s=2k+2 s=m−t
318 M. O’Reilly

and

2k+m−t
m+k−3t−3
F̄k−t−1 (qt+2 , . . . , qk , qk+t+2 , . . . , qk+m−t−1 ) qs
s=k+m+1


2k+m−t
m+k−3t−3 ∗
= F̄k−t−1 (qt+2 , . . . , qk∗ , qk+t+2
∗ ∗
, . . . , qk+m−t−1 ) qs∗ .
s=k+m+1
(16.14)

Proof. Note that Lemma 16.3.1 is also true for designs reversed to X and
X ∗ , that is, for Xr = (q2k+m , . . . , q1 ) and Xr∗ = (q2k+m

, . . . , q1∗ ). Therefore
all equalities of Lemma 16.3.2 are satisfied. 

Lemma 16.3.3 Let Y ≡ (q2k , . . . , q1 ), k ≥ 2, let (qk , . . . , q1 ) be singular with



q1 < qk , and let Y ≡ (q2k , . . . , qk+1 , q1 , . . . , qk ). Then

F̄k2k (Y ) > F̄k2k (Y ).

Proof. For 1 ≤ i ≤ [k/2], let Y i be obtained from Y by interchanging com-


ponents i and k + 1 − i. It is sufficient to show

F̄k2k (Y ) > F̄k2k (Y i ),

that is, that the system Y is improved by interchanging the components at


positions i and k+1−i. Note that by the singularity of Y we have qi < qk+1−i .
Malon in [26] (for F systems) and Kuo et al. in [24] (for G systems) have
shown that if in a linear consecutive–k–out–of–n system we have pi > pj (or
equivalently qi < qj ) for some 1 ≤ i ≤ j ≤ min{k, n − k + 1}, then the system
is improved by interchanging components pi and pj (equivalently qi and qj ).
Applying this result to systems Y and Y i , we have

F̄k2k (Y ) > F̄k2k (Y i ),

proving the lemma.

16.4 The main proposition

Proposition 16.4.1 below contains a result for Rt which is later used in the
proof of a result for W0 in Theorem 16.5.1 of Section 16.5.
Proposition 16.4.1 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 < m ≤ k, with
either m = 2T + 1 for some T > 0 or m = 2T + 2 for some T > 0. Then

Rt > Rt∗ for 0≤t≤T −1


16 Optimal design of linear consecutive–k–out–of–n systems 319

for any X ∗ .

Proof. Without loss of generality we can assume q1 > q2k+m . Define



k
R̃t (X) ≡ pk+t+1 qk+m−t qs
s=t+1

m+k−3t−3
+ F̄k−t−1 (qk+t+2 , . . . , qk+m−t−1 , qk+m+1 , . . . , q2k+m−t−1 )


2k+m−t
+ qk+t+1 pk+m−t qs
s=k+m+1

m+k−3t−3
+ F̄k−t−1 (qt+2 , . . . , qk , qk+t+2 , . . . , qk+m−t−1 ) ,

with a similar formula for R̃t (X ∗ ). Put R̃t (X) = R̃t and R̃t (X ∗ ) = R̃t∗ .
Note that in the formulas for Rt and Rt∗ the following equalities are satisfied:
∗ ∗
qk+t+1 = qk+t+1 , qk+m−t = qk+m−t , (16.7) of Lemma 16.3.1 and (16.14) of
Lemma 16.3.2. Hence to prove that Rt > Rt∗ , it is sufficient to show R̃t > R̃t∗ .
Define

m−1
 
2k+m−t
T ≡ qs , T ≡ qs ,
s=t+1 s=2k+2
2(m−2t−2)
U1 ≡ F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ) and
2(m−2t−2)
U2 ≡ F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 ).

Applying results (16.1), (16.2), (16.4) and (16.5) of Lemma 16.3.1, and
(16.8), (16.9), (16.11) and (16.13) of Lemma 16.3.2, we have
 

2k+t+1
∗ 
R̃t = pk+t+1 qk+m−t T ABm + A Bm U1 qs
s=2k+2
 
  
m−1
+ qk+t+1 pk+m−t A∗ Bm T + ABm U2 qs ,
s=m−t
 
 
2k+t+1
R̃t∗ = pk+t+1 qk+m−t T A∗ Bm + ABm U1 qs
s=2k+2
 
  
m−1
+ qk+t+1 pk+m−t ABm T + A∗ Bm U2 qs ,
s=m−t
320 M. O’Reilly

and so
 

m−1

R̃t − R̃t∗ = qk+t+1 pk+m−t Bm U2 
qs − T Bm (A − A∗ )
s=m−t
 
 
2k+t+1
− qk+m−t pk+t+1 Bm U1 qs − T Bm (A − A∗ ). (16.15)
s=2k+2

Note that by the singularity of X

qk+t+1 pk+m−t > qk+m−t pk+t+1 , (16.16)


A − A∗ > 0, (16.17)

Bm ≥ Bm , (16.18)

T > T and (16.19)

m−1 
2k+t+2
qs ≥ qs , (16.20)
s=m−t s=2k+2

where the equalities include the assumption ∅ qs ≡ 1 , and
  m−t−1

m−1 
m−1 
U2 qs > qs qs
s=m−t s=m−t s=t+2


m−1
= qs
s=t+2


2k+m−t−1
> qs
s=2k+2

>T . (16.21)

From (16.18) and (16.21) it follows that


m−1
 
Bm U2 qs − T Bm > 0.
s=m−t

Next, by Lemma 16.3.3


2(m−2t−2)
F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )
2(m−2t−2)
> F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+m−t−1 , . . . , qk+t+2 ), (16.22)

and since
qt+2 > q2k+m−t−1 , . . . , qm−t−1 > q2k+t+2 ,
16 Optimal design of linear consecutive–k–out–of–n systems 321

we have
2(m−2t−2)
F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+m−t−1 , . . . , qk+t+2 )
2(m−2t−2)
> F̄m−2t−2 (q2k+m−t−1 , . . . , q2k+t+2 , qk+m−t−1 , . . . , qk+t+2 ).
(16.23)

From (16.22) and (16.23) it follows that


2(m−2t−2)
F̄m−2t−2 (qt+2 , . . . , qm−t−1 , qk+t+2 , . . . , qk+m−t−1 )
2(m−2t−2)
> F̄m−2t−2 (qk+t+2 , . . . , qk+m−t−1 , q2k+t+2 , . . . , q2k+m−t−1 ),

that is,
U2 > U1 .
From (16.18)–(16.20) and (16.24) we have


m−1
   
2k+t+1
Bm U2 qs − T Bm > Bm U1 qs − T Bm .
s=m−t s=2k+2

Considering (16.15), we conclude by (16.16), (16.17), (16.22) and (16.24) that

R̃t > R̃t∗ ,

proving the proposition. 

16.5 Theorems

Theorem 16.5.1 below states that if X is a singular design of a linear


consecutive–k–out–of–(2k + m):F system with 2 ≤ m ≤ k, then for any
nonsingular design X∗ obtained from X by interchanging symmetrical com-
ponents (as defined in Section 16.1.4), X∗ is a better design.
Theorem 16.5.1 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 ≤ m ≤ k, with
either m = 2T + 1 for some T > 0 or m = 2T + 2 for some T ≥ 0. Then X ∗
is nonsingular and
F̄k2k+m (X) > F̄k2k+m (X ∗ )
for any X ∗ .

Proof. Clearly, X ∗ is nonsingular. Without loss of generality we can assume


q1 > q2k+m . Proceeding by induction, we shall prove that W0 > W0∗ .
STEP 1. For 2 ≤ m = k we have WT +1 = WT∗ +1 = 1. For 2 ≤ m < k we
have
322 M. O’Reilly

2(k−m)
WT +1 = F̄k−m (qm+1 , . . . , qk , qk+m+1 , . . . , q2k ),
with a similar formula for WT∗ +1 . By Theorem 16.5.1 (see also O’Reilly [28]) it
follows that WT +1 ≥ WT∗ +1 , with equality if and only if either {qm+1 , . . . , qk }
is a subset of {qi1 , . . . , qir } or the intersection of those sets is empty. Either
way we have
WT +1 ≥ WT∗ +1 .
STEP 2. Note that if m = 2T + 1, then

WT = pk+T +1 MT + qk+T +1 WT +1 ,

with a similar formula for WT∗ , where qk+T +1 = qk+T


∗ ∗
+1 . We have MT > MT

by Proposition 16.2.1 and WT +1 ≥ WT∗ +1 by Step 1, and so it follows that


WT > WT∗ .
If m = 2T + 2, then

WT = pk+T +1 pk+m−T MT + qk+T +1 qk+m−T WT +1 + RT ,

with a similar formula for WT∗ , where qk+T +1 = qk+T


∗ ∗
+1 , qk+m−T = qk+m−T ,

MT > MT∗ by Proposition 16.2.1, WT +1 ≥ WT∗ +1 by Step 1 and RT > RT∗ by


Proposition 16.2.2. Hence WT > WT∗ .
Either way we have
WT > WT∗ .
If m = 2, then T = 0 and so W0 > W0∗ , completing the proof for m = 2.
Consider m > 2.

STEP 3. Suppose that Wt+1 > Wt+1 for some 0 ≤ t ≤ T − 1. We shall

show that then Wt > Wt .
We have

Wt = pk+t+1 pk+m−t Mt + qk+t+1 qk+m−t Wt+1 + Rt ,

with a similar formula for Wt∗ , where qk+t+1 = qk+t+1


∗ ∗
, qk+m−t = qk+m−t ,
Mt > Mt∗ by Proposition 16.2.1, Wt+1 > Wt+1 ∗
by the inductive assumption
and Rt > Rt∗ by Proposition 16.4.1. It follows that

Wt > Wt∗ .

From Steps 2–3 and mathematical induction we have

W0 > W0∗ ,

proving the theorem.


The following corollary is a direct consequence of Theorem 16.5.1.
16 Optimal design of linear consecutive–k–out–of–n systems 323

Corollary 16.5.1 A necessary condition for the optimal design of a linear


consecutive–k–out–of–(2k + m):F system with 2 ≤ m ≤ k is for it to be
nonsingular.

Theorem 16.5.2 below states that if X is a singular design of a linear


consecutive–k–out–of–(2k + m):G system with q1 > q2k+m , 2 ≤ m ≤ k, then
for any nonsingular design X∗ obtained from X by interchanging symmetrical
components (as defined in Section 16.1.4), X is a better design.

Theorem 16.5.2 Let X ≡ (q1 , . . . , q2k+m ) be singular, 2 ≤ m ≤ k. Then


X ∗ is nonsingular and

G2k+m
k (X) > G2k+m
k (X ∗ )

for any X ∗ .

Proof. This theorem for G systems can be proved in a manner similar to the
proof of Theorem 16.5.1 for F systems. That is, by giving similar definitions
for G systems, similar proofs for lemmas and propositions for G systems
can be given. Alternatively, the theorem can be proved by applying only
Theorem 16.5.1, as below.
Clearly, X ∗ is nonsingular. Define p̄i ≡ qi for all 1 ≤ i ≤ 2k + m. Then we
have

G2k+m
k (X) = F̄k2k+m (p̄1 , . . . , p̄2k+m ),
G2k+m
k (X ∗ ) = F̄k2k+m (p̄∗1 , . . . , p̄∗2k+m ),

where (p̄1 , . . . , p̄2k+m ) is singular, and so by Theorem 16.5.1

F̄k2k+m (p̄1 , . . . , p̄2k+m ) > F̄k2k+m (p̄∗1 , . . . , p̄∗2k+m ),

proving the theorem.

Corollary 16.5.2 Let Y = (q1 , . . . , q2k+m ) be the optimal design of a linear


consecutive–k–out–of–(2k + m):G system with 2 ≤ m ≤ k. If

(q1 , . . . , qm−1 , qk+1 . . . , qk+m , q2k+2 , . . . , q2k+m )

is singular, then Y must be singular too.

Proof. Suppose Y is not singular. Let Z be a singular design obtained from Y


by an operation in which we allow the interchange of only those symmetrical
components which are in places m, . . . , k, k + m + 1, . . . , 2k + 1. Then Z and
Y satisfy the conditions of Theorem 16.5.2, and so

G2k+m
k (Z) > G2k+m
k (Y ).

Hence Y is not optimal, and by contradiction the corollary is proved.


324 M. O’Reilly

16.6 Procedures to improve designs not satisfying


necessary conditions for the optimal design

We have shown that a necessary condition for the optimal design of a lin-
ear consecutive–k–out–of–n:F system with 2k + 2 ≤ n ≤ 3k is for it to be
nonsingular (Corollary 16.5.1 of Section 16.5), which is similar to the case
2k ≤ n ≤ 2k + 1 treated in [28]. However, the procedures given in [28] can-
not be implemented in this case. This is due to the restriction placed on
the choice of interchanged symmetrical components ((3m − 2) components
excluded from the interchange).
The following procedure is a consequence of Theorem 16.5.1.
Procedure 16.6.1 In order to improve a singular design of a linear consec-
utive–k–out–of–(2k +m):F system with 2 ≤ m ≤ k, apply the following steps:
1. select an arbitrary nonempty set of pairs of symmetrical components so
that the first component in each pair is in a position from m to k; and
then
2. interchange the two components in each selected pair.
Note that the number of possible choices in Step 1 is 2(k−m+1) − 1. Conse-
quently, the best improvement can be chosen or, if the number of possible
choices is too large to consider all options, the procedure can be repeated as
required.
Because the result for systems with 2k +2 ≤ n ≤ 3k excludes some compo-
nents, it is not possible to derive from it, unlike the case when 2k ≤ n ≤ 2k+1,
that it is necessary for the optimal design of a linear consecutive–k–out–of–n:
G system to be singular. However, as stated in Corollary 16.5.2 of Section
16.5, if a subsystem composed of those excluded components is singular, then
the whole system has to be singular for it to be optimal. Consequently, the
following procedure can be applied. Note that, for a given nonsingular design,
the number of possible singular designs produced in this manner is 1.
Procedure 16.6.2 Suppose a design of a linear consecutive–k–out–of–(2k +
m): G system is nonsingular, with 2 ≤ m ≤ k. Consider its subsystem com-
posed of components in positions from 1 to (m − 1) , from (k + 1) to (k + m),
and from (2k + 2) to (2k + m), in order as in the design. If such a subsys-
tem is singular, then in order to improve the design, interchange all required
symmetrical components so that the design becomes singular.
The following examples, calculated using a program written in C ++ , are
given in order to illustrate the fact that both nonsingular and singular optimal
designs of linear consecutive–k–out–of–n:G systems exist.
Example 1. (q1 , q5 , q7 , q9 , q8 , q6 , q4 , q3 , q2 ) is a nonsingular optimal design of
a linear consecutive–3–out–of–9:G system. It is optimal for q1 = 0.151860,
q2 = 0.212439, q3 = 0.304657, q4 = 0.337662, q5 = 0.387477, q6 = 0.600855,
q7 = 0.608716, q8 = 0.643610 and q9 = 0.885895.
16 Optimal design of linear consecutive–k–out–of–n systems 325

Example 2. (q1 , q3 , q4 , q5 , q7 , q9 , q8 , q6 , q2 ) is a singular optimal design of a lin-


ear consecutive–3–out–of–9:G system. It is optimal for q1 = 0.0155828, q2 =
0.1593690, q3 = 0.3186930, q4 = 0.3533360, q5 = 0.3964650, q6 = 0.4465830,
q7 = 0.5840900, q8 = 0.8404850 and q9 = 0.8864280.

References

1. D. S. Bai, W. Y. Yun and S. W. Chung, Redundancy optimization of k–out–of–n


systems with common–cause failures, IEEE Trans. Reliability 40(1) (1991), 56–59.
2. A. Behr and L. Camarinopoulos, Two formulas for computing the reliability of incom-
plete k–out–of–n:G systems, IEEE Trans. Reliability 46(3) (1997), 421–429.
3. T. Boehme, A. Kossow and W. Preuss, A generalization of consecutive–k–out–of–n:F
systems, IEEE Trans. Reliability 41(3) (1992), 451–457.
4. R. C. Bollinger, A. A. Salvia, Consecutive–k–out–of–n:F networks, IEEE Trans. Re-
liability 31(1) (1982), 53–56.
5. G. J. Chang, L. Cui and F. K. Hwang, Reliabilities of Consecutive–k Systems, (Kluwer,
Amsterdam, 2000).
6. M. T. Chao and J. C. Fu, A limit theorem of certain repairable systems, Ann. Inst.
Statist. Math. 41 (1989), 809–818.
7. M. T. Chao and J. C. Fu, The reliability of large series systems under Markov structure,
Adv. Applied Probability 23(4) (1991), 894–908.
8. M. T. Chao, J. C. Fu and M. V. Koutras, Survey of reliability studies of consecutive–
k–out–of–n: F & related systems, IEEE Trans. Reliability 44(1) (1995), 120–127.
9. R. W. Chen, F. K. Hwang and Wen-Ching Winnie Li, Consecutive–2–out–of–n:F
systems with node & link failures, IEEE Trans. Reliability 42(3) (1993), 497–502.
10. D. T. Chiang and R. F. Chiang, Relayed communication via consecutive–k–out–of–n:F
system, IEEE Trans. Reliability 35(1) (1986), 65–67.
11. D. T. Chiang and S–C. Niu, Reliability of consecutive–k–out–of–n:F system, IEEE
Trans. Reliability 30(1) (1981), 87–89.
12. C. Derman, G. J. Lieberman and S. M. Ross, On the consecutive–k–of–n:F system,
IEEE Trans. Reliability 31(1) (1982), 57–63.
13. D. Z. Du and F. K. Hwang, Optimal consecutive–2–out–of–n systems, Mat. Oper.
Research 11(1) (1986), 187–191.
14. J. C. Fu, Reliability of consecutive–k–out–of–n:F systems with (k − 1)–step Markov
dependence, IEEE Trans. Reliability 35(5) (1986), 602–606.
15. W. S. Griffith, On consecutive–k–out–of–n failure systems and their generalizations,
Reliability & Quality Control, Columbia, Mo. (1984), (North–Holland, Amsterdam,
1986), 157–165.
16. J. Huang and M. J. Zuo, Generalised multi–state k–out–of–n:G systems, IEEE Trans.
Reliability 49(1) (2000), 105–111.
17. F. K. Hwang, Simplified reliabilities for consecutive–k–out–of–n systems, SIAM J. Alg.
Disc. Meth. 7(2) (1986), 258–264.
18. S. C. Kao, Computing reliability from warranty, Proc. Amer. Statistical Assoc., Section
on Statistical Computing (1982), 309–312.
19. K. C. Kapur and L. R. Lamberson, Reliability in Engineering Design (John Wiley &
Sons, New York, 1977).
20. J. M. Kontoleon, Optimum allocation of components in a special 2–port network, IEEE
Trans. Reliability 27(2) (1978), 112–115.
21. J. M. Kontoleon, Analysis of a dynamic redundant system, IEEE Trans. Reliability
27(2) (1978), 116–119.
326 M. O’Reilly

22. J. M. Kontoleon, Reliability improvement of multiterminal networks, IEEE Trans.


Reliability 29(1) (1980), 75–76.
23. J. M. Kontoleon, Reliability determination of a r–successive–out–of–n:F system, IEEE
Trans. Reliability 29(5) (1980), 437.
24. W. Kuo, W. Zhang and M. Zuo, A consecutive–k–out–of–n:G system: the mirror image
of a consecutive–k–out–of–n:F system, IEEE Trans. Reliability 39(2) (1990), 244–253.
25. D. M. Malon, Optimal consecutive–2–out–of–n:F component sequencing, IEEE Trans.
Reliability 33(5) (1984), 414–418.
26. D. M. Malon, Optimal consecutive–k–out–of–n:F component sequencing, IEEE Trans.
Reliability 34(1) (1985), 46–49.
27. J. Malinowski and W. Preuss, On the reliability of generalized consecutive systems –
a survey, Intern. Journ. Reliab., Quality & Safety Engineering 2(2) (1995), 187–201.
28. M. M. O’Reilly, Variant optimal designs of linear consecutive–k–out–of–n systems, to
appear in Mathematical Sciences Series: Industrial Mathematics and Statistics, Ed.
J. C. Misra (Narosa Publishing House, New Delhi, 2003), 496–502.
29. M. M. O’Reilly, The (k + 1)–th component of linear consecutive–k–out–of–n systems,”
chapter in this volume.
30. S. Papastavridis and M. Lambiris, Reliability of a consecutive–k–out–of–n:F system
for Markov–dependent components, IEEE Trans. Reliability 36(1) (1987), 78–79.
31. A. A. Salvia and W. C. Lasher, 2–dimensional consecutive–k–out–of–n:F models,
IEEE Trans. Reliability 39(3) (1990), 382–385.
32. J. G. Shanthikumar, Reliability of systems with consecutive minimal cutsets, IEEE
Trans. Reliability 36(5) (1987), 546–550.
33. J. Shen and M. Zuo, A necessary condition for optimal consecutive–k–out–of–n:G
system design, Microelectron. Reliab. 34(3) (1994), 485–493.
34. J. Shen and M. J. Zuo, Optimal design of series consecutive–k–out–of–n:G systems,
Rel. Engin. & System Safety 45 (1994), 277–283.
35. M. L. Shooman, Probabilistic Reliability: An Engineering Approach, (McGraw-Hill,
New York, 1968).
36. Y. L. Tong, A rearrangement inequality for the longest run, with an application to
network reliability, J. App. Prob. 22 (1985), 386–393.
37. Jer–Shyan Wu and Rong–Jaya Chen, Efficient algorithms for k–out–of–n &
consecutive–weighed–k–out–of-n:F system, IEEE Trans. Reliability 43(4) (1994),
650–655.
38. W. Zhang, C. Miller and W. Kuo, Application and analysis for a consecutive–k–out–
of–n:G structure, Rel. Engin. & System Safety 33 (1991), 189–197.
39. M. J. Zuo, Reliability evaluation of combined k–out–of–n:F , consecutive–k–out–of–
n:F , and linear connected–(r, s)–out–of–(m, n):F system structures, IEEE Trans. Re-
liability 49(1) (2000), 99–104.
40. M. Zuo and W. Kuo, Design and performance analysis of consecutive–k–out–of–n
structure, Nav. Research Logistics 37 (1990), 203–230.
Chapter 17
The (k + 1)-th component of linear
consecutive–k–out–of–n systems

Malgorzata O’Reilly

Abstract A linear consecutive–k–out–of–n:F system is an ordered sequence


of n components that fails if and only if at least k consecutive components
fail. A linear consecutive–k–out–of–n:G system is an ordered sequence of n
components that works if and only if at least k consecutive components work.
The existing necessary conditions for the optimal design of systems with
2k ≤ n provide comparisons between reliabilities of components restricted
to positions from 1 to k and positions from n to (n − k + 1). This chapter
establishes necessary conditions for the variant optimal design that involve
components at some other positions, including component (k+1). Procedures
to improve designs not satisfying those conditions are also given.

Key words: Linear consecutive–k–out–of–n: F system, linear consecutive–


k–out–of–n: G system, variant optimal design, singular design, nonsingular
design

17.1 Introduction

For the description of the mathematical model of the system discussed here,
including nomenclature, assumptions and notation, the reader is referred to
[9], also appearing in this volume.
Zuo and Kuo [16] have proposed three methods for dealing with the variant
optimal design problem: a heuristic method, a randomization method and a
binary search method. The heuristic and randomization methods produce

Malgorzata O’Reilly
School of Mathematics and Physics, University of Tasmania, Hobart TAS 7001,
AUSTRALIA
e-mail: [email protected]

C. Pearce, E. Hunt (eds.), Structure and Applications, Springer Optimization 327


and Its Applications 32, DOI 10.1007/978-0-387-98096-6 17,
c Springer Science+Business Media, LLC 2009
328 M. O’Reilly

suboptimal designs of consecutive–k–out–of–n systems; the binary search


method produces an exact optimal design.
The heuristic method [16] is based on the concept of Birnbaum impor-
tance, which was introduced by Birnbaum in [1]. The Birnbaum reliability
importance Ii of component i is defined by the following formula:

Ii ≡ R(p1 , . . . , pi−1 , 1, pi+1 , . . . , pn ) − R(p1 , . . . , pi−1 , 0, pi+1 , . . . , pn ),

where R stands for the reliability of a system.


The heuristic method [16] implements the idea that a component with a
higher reliability should be placed in a position with a higher Birnbaum im-
portance. Based on Birnbaum’s definition, Papastavridis [10] and Kuo, Zhang
and Zuo [5] defined component reliability functions for consecutive–k–out–
of–n systems. Zakaria, David and Kuo [13], Zuo [14] and Chang, Cui and
Hwang [3] have established some comparisons of Birnbaum importance in
consecutive–k–out–of—n systems. Zakaria et al. [13] noted that more reli-
able components should be placed in positions with higher importance in a
reasonable heuristic for maximizing system reliability. Zuo and Shen [17] de-
veloped a heuristic method which performs better than the heuristic method
of Zuo and Kuo [16].
The randomization method [16] compares a limited number of randomly
chosen designs and obtains the best amongst them. The binary search
method [16] has been applied only to linear consecutive–k–out–of–n:F sys-
tems with n/2 ≤ k ≤ n. Both methods are based upon the following necessary
conditions for optimal design, proved by Malon [7] for linear consecutive–k–
out–of–n:F systems and extended by Kuo et al. [5] to linear consecutive–k–
out–of–n:G systems:
(i) components from positions 1 to min{k, n − k + 1} are arranged in nonde-
creasing order of component reliability;
(ii) components from positions n to max{k, n − k + 1} are arranged in nonde-
creasing order of component reliability;
(iii) the (2k − n) most reliable components are arranged from positions (n −
k + 1) to k in any order if n < 2k.
Pairwise rearrangement of components in a system has been suggested
as another method to enhance designs [2, 4, 6]. Other necessary condi-
tions have also been reported in the literature. Shen and Zuo [12] proved
that a necessary condition for the optimal design of a linear consecutive–
k–out–of–n:G system with n ∈ {2k, 2k + 1} is for it to be singular and
O’Reilly proved that a necessary condition for the optimal design of a lin-
ear consecutive–k–out–of–n:F system with n ∈ {2k, 2k + 1} is for it to be
nonsingular [8]. Those results have been extended to the case 2k ≤ n ≤ 3k
by O’Reilly in [9]. Procedures to improve designs not satisfying those neces-
sary conditions have been also provided ([8], Procedures 1–2; [9], Procedures
1–2).
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 329

17.2 Summary of the results

In this chapter we focus on variant optimal designs of linear consecutive–k–


out–of–n systems and establish necessary conditions for the optimal designs
of those systems. As an application of these results, we construct procedures
to enhance designs not satisfying these necessary conditions. An improved
design and its reverse, that is, a design with components in reverse order, are
regarded in these procedures as equivalent. Although variant optimal designs
depend upon the particular choices of component reliabilities, the necessary
conditions for the optimal design of linear consecutive systems established
here rely only on the order of component reliabilities and not their exact
values. Therefore they can be applied in the process of eliminating nonop-
timal designs from the set of potential optimal designs when it is possible
to compare component reliabilities, without necessarily knowing their exact
values.
We explore the case n ≥ 2k. The case n < 2k for G systems has been
solved by Kuo et al. [5] and the case n ≤ 2k for F systems can be limited to
n = 2k due to the result of Malon [7]. We can summarize this as follows.
Theorem 17.2.1 A design X = (q1 , q2 , . . . , qn ) is optimal for a linear
consecutive–k–out–of–n:F system with n < 2k if and only if
1. the (2k − n) best components are placed from positions (n − k + 1) to k, in
any order; and
2. the design (q1 , . . . , qn−k , qk+1 , . . . , qn ) is optimal for a linear consecu-
tive-(n − k)-out-of-2(n − k):F system.
The existing necessary conditions for the optimal design of systems with
n ≥ 2k [5, 7] provide comparisons between the reliabilities of components
restricted to the positions from 1 to k and the positions from n to (n − k + 1).
In this chapter we develop necessary conditions for the optimal design of
systems with n ≥ 2k with comparisons that involve components at some
other positions, including the (k + 1)-th component. The following conditions
are established as necessary for the design of a linear consecutive system to
be optimal (stated in Corollaries 17.3.1, 17.4.1 and 17.5.1 respectively):

• q1 > qk+1 and qn > qn−k for linear consecutive–k–out–of–n:F and


consecutive–k–out–of–n:G systems with n > 2k, k ≥ 2;
• min{q1 , q2k } > qk+1 > max{qk , qk+2 } for linear consecutive–k–out–of–
n:F systems with n = 2k + 1, k > 2;
• (q1 , qk+1 , qk+2 , q2k+2 ) is singular and (q1 , . . . , qk , qk+3 , . . . , q2k+2 )
nonsingular for linear consecutive–k–out–of–n:F systems with n = 2k + 2,
k > 2.

Further, procedures to improve designs not satisfying these conditions are


given. Whereas the first of the conditions is general, the other two conditions
330 M. O’Reilly

compare components in places other than only the k left-hand or k right-


hand places for systems with n > 2k, unlike what has been considered in
the literature so far. Zuo [15] proved that for linear consecutive–k–out–of–n
systems with components with common choices of reliability p and n > 2k,
k > 2, we have
I1 < Ik+1 ,

where Ii stands for Birnbaum reliability importance. Lemmas 17.3.1–17.3.2


and Corollary 17.3.1 give a stronger result, which also allows the component
unreliabilities to be distinct.
Suppose X is a design of a linear consecutive system with n > 2k. Let i
and j be the intermediate components with k ≤ i < j ≤ n − k + 1. From the
results of Koutras, Papadopoulos and Papastavridis [4] it follows that such
components are incomparable in a sense that the information qi > qj is not
sufficient for us to establish whether pairwise rearrangement of components
i and j improves the design. However, as we show in this chapter, this does
not necessarily mean that we cannot determine, as a necessary condition,
which of the components i and j should be more reliable for the design to be
optimal. In the proofs of Propositions 17.4.2 and 17.5.2 we apply the following
recursive formula of Shanthikumar [11]:

F̄kn (q1 , . . . , qn ) = F̄kn−1 (q1 , . . . , qn−1 ) + pn−k qn−k+1 . . . qn


3 4
+ 1 − F̄kn−k−1 (q1 , . . . , qn−k−1 ) .

17.3 General result for n > 2k, k ≥ 2

We shall make use of the following notation.

Definition 1. Let X ≡ (q1 , . . . , qn )(X ≡ (p1 , . . . , pn )). We define X i;j to be


a design obtained from X by interchanging components i and j.

Propositions 17.3.1 and 17.3.2 below contain preliminary results to Lem-


mas 17.3.1 and 17.3.2, followed by Corollary 17.3.1 which states a necessary
condition for the optimal design of linear consecutive–k–out–of–n:F and lin-
ear consecutive–k–out–of–n:G systems with n > 2k, k ≥ 2.
Proposition 17.3.1 Let X ≡ (q1 , . . . , qn ), n > 2k, k ≥ 2. Then
3
F̄kn (X)−F̄kn (X 1;k+1 )=qk+2 (qk+1 − q1 ) F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
4
− F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn ) .

Proof. By the theorem of total probability, conditioning on the behavior of


the items in positions 1 and k + 1, we have that
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 331

F̄kn (X) = q1 qk+1 F̄kn (1, q2 , . . . , qk , 1, qk+2 , . . . , qn )


+ p1 pk+1 F̄kn (0, q2 , . . . , qk , 0, qk+2 , . . . , qn )
+ (p1 qk+1 )qk+2 F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
+ (p1 qk+1 )pk+2 F̄kn−1 (q2 , . . . , qk , 1, 0, qk+3 , . . . , qn )
+ (q1 pk+1 )qk+2 F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn )
+ (q1 pk+1 )pk+2 F̄kn (1, q2 , . . . , qk , 0, 0, qk+3 , . . . , qn ), (17.1)

and so also
F̄kn (X 1;k+1 ) = qk+1 q1 F̄kn (1, q2 , . . . , qk , 1, qk+2 , . . . , qn )
+ pk+1 p1 F̄kn (0, q2 , . . . , qk , 0, qk+2 , . . . , qn )
+ (pk+1 q1 )qk+2 F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
+ (pk+1 q1 )pk+2 F̄kn−1 (q2 , . . . , qk , 1, 0, qk+3 , . . . , qn )
+ (qk+1 p1 )qk+2 F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn )
+ (qk+1 p1 )pk+2 F̄kn (1, q2 , . . . , qk , 0, 0, qk+3 , . . . , qn ). (17.2)

Note that
F̄kn−1 (q2 , . . . , qk , 1, 0, qk+3 , . . . , qn )
= F̄kn (1, q2 , . . . , qk , 0, 0, qk+3 , . . . , qn )
 k 
k 
= qs + F̄k n−k−2
(qk+3 , . . . , qn ) − qs F̄kn−k−2 (qk+3 , . . . , qn ),
s=2 s=2
(17.3)

and therefore
(p1 qk+1 )pk+2 F̄kn−1 (q2 , . . . , qk , 1, 0, qk+3 , . . . , qn )
+ (q1 pk+1 )pk+2 F̄kn (1, q2 , . . . , qk , 0, 0, qk+3 , . . . , qn )
= (pk+1 q1 )pk+2 F̄kn−1 (q2 , . . . , qk , 1, 0, qk+3 , . . . , qn )
+ (qk+1 p1 )pk+2 F̄kn (1, q2 , . . . , qk , 0, 0, qk+3 , . . . , qn ). (17.4)

Consequently, from (17.1), (17.2) and (17.4) we have


F̄kn (X) − F̄kn (X 1;k+1 )
3
= (p1 qk+1 )qk+2 F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
4
+ (q1 pk+1 )qk+2 F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn )
3
− (pk+1 q1 )qk+2 F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
4
+ (qk+1 p1 )qk+2 F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn )
3
= qk+2 (qk+1 − q1 ) F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )
4
− F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn ) , (17.5)

proving the proposition.


332 M. O’Reilly

Proposition 17.3.2 Let X ≡ (p1 , . . . , pn ), n > 2k, k ≥ 2. Then

Gnk (X) − Gnk (X 1;k+1 ) = pk+2 (pk+1 − p1 )·


3 n−1
Gk (p2 , . . . , pk , 1, 1, pk+3 , . . . , pn ) −
Gnk (1, p2 , . . . , pk , 0, 1, pk+3 , . . . , pn )] .

Proof. This result follows from Proposition 17.3.1 and the fact that

F̄kn (q1 , . . . , qn ) = Gnk (p̄1 , . . . , p̄n ),

where p̄i ≡ qi , 1 ≤ i ≤ n.
Lemma 17.3.1 Let X ≡ (q1 , . . . , qn ) be a design for a linear consecutive–
k–out–of–n:F system, n > 2k, k ≥ 2. If q1 < qk+1 , then X 1;k+1 is a better
design.

Proof. We assume the notation that if T < R then


T
fs ≡ 0.
s=R

Define

W ≡ F̄kn−1 (q2 , . . . , qk , 1, 1, qk+3 , . . . , qn )


− F̄kn (1, q2 , . . . , qk , 0, 1, qk+3 , . . . , qn ). (17.6)

Since qk+1 − q1 > 0 by assumption, it is sufficient by Proposition 17.3.1 to


show that W > 0.
We shall show W > 0. Define Wi for 0 ≤ i ≤ k − 1 in the following way. If
i = 0, then

Wi ≡ F̄kn−k (1, 1, qk+3 , . . . , qn ) − F̄kn−k (0, 1, qk+3 , . . . , qn );

if 1 ≤ i ≤ k − 2, then

Wi = F̄kn−k+i (1, . . . , 1, 1, 1, qk+3 , . . . , qn )


@ A> ?
i

− F̄kn−k+i (1, . . . , 1, 0, 1, qk+3 , . . . , qn );


@ A> ?
i

and if i = k − 1, then
Wi = F̄kn−k+i (1, . . . , 1, 1, 1, qk+3 , . . . , qn )
@ A> ?
k−1

− F̄kn−k+i (1, 1, . . . , 1, 0, 1, qk+3 , . . . , qn ). (17.7)


@ A> ?
k−1
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 333

Since  

k−2 
k 
k
pk + pk−i qs + qs = 1,
i=1 s=k−i+1 s=2

by the theorem of total probability, conditioning on the behavior of the items


in positions 2, . . . , k, we have that
   k 

k−2 k 
W = p k W0 + pk−i qs W i + qs Wk−1 . (17.8)
i=1 s=k−i+1 s=2

Note that Wk−1 = 1 − 1 = 0 and Wi > 0 for all 0 ≤ i ≤ k − 2. From this it


follows that W > 0, proving the lemma.
Lemma 17.3.2 Let X ≡ (p1 , . . . , pn ) be a design for a linear consecutive–
k–out–of–n:G system, n > 2k, k ≥ 2. If q1 < qk+1 , then X 1;k+1 is a better
design.
Proof. By reasoning similar to that in the proof of Lemma 17.3.1 it can be
shown that W > 0, where W is defined by

W ≡ Gn−1
k (p2 , . . . , pk , 1, 1, pk+3 , . . . , pn )
− n
Gk (1, p2 , . . . , pk , 0, 1, pk+3 , . . . , pn ). (17.9)

Since pk+1 − p1 < 0 by assumption, from Proposition 17.3.2 we have

Gk (X) < Gk (X 1;k+1 ),

proving that X 1;k+1 is a better design. 


Corollary 17.3.1 Let X ≡ (q1 , . . . , qn ), n > 2k, k ≥ 2. If X is an optimal
design for a linear consecutive–k–out–of–n:F or k–out–of–n:G system, then
q1 > qk+1 and qn > qn−k .
Proof. Let X be an optimal design for a linear consecutive–k–out–of–n:F
system. Suppose that q1 < qk+1 or qn < qn−k . If q1 < qk+1 , then from
Lemma 17.3.1 it follows that X 1;k+1 is a better design. Further, if qn < qn−k ,
then by Lemma 17.3.1 applied to the reversed design Xr ≡ (qn , . . . , q1 ), we
have that X n;n−k is a better design.
The proof for a linear consecutive–k–out–of–n:G system is similar and
follows from Lemma 17.3.2. 
Note: In the optimal design of linear consecutive–k–out–of–n systems
with n ∈ {2k + 1, 2k + 2}, the worst component must be placed in position 1
or n. This is due to the necessary condition for the optimal design stated in
Corollary 17.3.1 and the necessary conditions of Malon [7] and Kuo et al. [5],
as stated in Section 1. Considering that a design is equivalent to its reversed
version in the sense that their reliabilities are equal, it can be assumed in
algorithms that the worst component is placed in position 1.
334 M. O’Reilly

17.4 Results for n = 2k + 1, k > 2

We shall make use of the following notation.

Definition 2. Let X ≡ (q1 , . . . , qn ) (X ≡ (p1 , . . . , pn )). We define

X i1 ,...,ir ;j1 ,...,jr

to be a design obtained from X by interchanging components is and js for


all 1 ≤ s ≤ r.

Propositions 17.4.1 and 17.4.2 below contain preliminary results to Lemma


17.4.1, followed by Corollary 17.4.1 which states a necessary condition for the
optimal design of a linear consecutive–k–out–of–(2k+1):F system with k > 2.
Proposition 17.4.1 Let X ≡ (q1 , . . . , q2k+1 ), k > 2. Then

F̄k2k+1 (X) − F̄k2k+1 (X k;k+1 ) = (qk − qk+1 ) (q1 . . . qk−1 − qk+2 . . . q2k
+qk+2 . . . q2k+1 − q1 . . . qk−1 qk+2 . . . q2k+1 ) .

Proof. From

F̄k2k+1 (X) = qk qk+1 F̄k2k+1 (q1 , . . . , qk−1 , 1, 1, qk+2 , . . . , q2k+1 )


+ pk pk+1 F̄k2k+1 (q1 , . . . , qk−1 , 0, 0, qk+2 , . . . , q2k+1 )
+ pk qk+1 F̄k2k+1 (q1 , . . . , qk−1 , 0, 1, qk+2 , . . . , q2k+1 )
+ qk pk+1 F̄k2k+1 (q1 , . . . , qk−1 , 1, 0, qk+2 , . . . , q2k+1 ), (17.10)

and consequently

F̄k2k+1 (X k;k+1 ) = qk+1 qk F̄k2k+1 (q1 , . . . , qk−1 , 1, 1, qk+2 , . . . , q2k+1 )


+ pk+1 pk F̄k2k+1 (q1 , . . . , qk−1 , 0, 0, qk+2 , . . . , q2k+1 )
+ pk+1 qk F̄k2k+1 (q1 , . . . , qk−1 , 0, 1, qk+2 , . . . , q2k+1 )
+ qk+1 pk F̄k2k+1 (q1 , . . . , qk−1 , 1, 0, qk+2 , . . . , q2k+1 ), (17.11)

we have

F̄k2k+1 (X) − F̄k2k+1 (X k+1;k+2 )


3
= (qk − qk+1 ) F̄k2k+1 (q1 , . . . , qk−1 , 1, 0, qk+2 , . . . , q2k+1 )
4
− F̄k2k+1 (q1 , . . . , qk−1 , 0, 1, qk+2 , . . . , q2k+1 )
= (qk − qk−1 ) (q1 . . . qk−1 − qk+2 . . . q2k
+ qk+2 . . . q2k+1 − q1 . . . qk−1 qk+2 . . . q2k+1 ) , (17.12)

proving the proposition.


17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 335

Proposition 17.4.2 Let X ≡ (q1 , . . . q2k+1 ) and Y ≡ X 1,...,k−1;2k,...,k+2 .


Then
F̄k2k+1 (X) − F̄k2k+1 (Y ) = (qk+2 . . . q2k − q1 . . . qk−1 )·
(qk+1 p2k+1 + q2k+1 − qk ).

Proof. From Shanthikumar’s recursive algorithm [11], as stated in Section


17.2, it follows that
F̄k2k+1 (X) = F̄k2k (q1 , . . . , q2k ) + pk+1 qk+2 . . . q2k q2k+1 (1 − q1 . . . qk−1 qk )
(17.13)

and
F̄k2k+1 (Y ) = F̄k2k (q2k , . . . , qk+2 , qk , qk+1 , qk−1 , . . . , q1 )
+ pk+1 qk−1 . . . q1 q2k+1 (1 − qk+2 . . . q2k qk ) . (17.14)

Note that
pk+1 qk+2 . . . q2k q2k+1 (1 − q1 . . . qk−1 qk )
− pk+1 qk−1 . . . q1 q2k+1 (1 − qk+2 . . . q2k qk )
= pk+1 q2k+1 (qk+2 . . . q2k − q1 . . . qk−1 ) . (17.15)

Also, we have
F̄k2k (q1 , . . . , q2k ) = pk pk+1 · 0
2(k−2)
+ qk qk+1 F̄k−2 (q2 , . . . , qk−1 , qk+2 , . . . , q2k−1 )
+ pk qk+1 (qk+2 . . . q2k ) + qk pk+1 (q1 . . . qk−1 ) (17.16)

and
F̄k2k (q2k , . . . , qk+2 , qk , qk+1 , qk−1 , . . . , q1 ) = pk pk+1 · 0
2(k−2)
+ qk qk+1 F̄k−2 (q2 , . . . , qk−1 , qk+2 , . . . , q2k−1 )
+ pk qk+1 (q1 . . . qk−1 ) + pk+1 qk (qk+2 . . . q2k ), (17.17)

and so
F̄k2k (q1 , . . . , q2k ) − F̄k2k (q2k , . . . , qk+2 , qk , qk+1 , qk−1 , . . . , q1 )
= (qk+1 − qk ) (qk+2 . . . q2k − q1 . . . qk−1 ) . (17.18)

From (17.13)–(17.15) and (17.18) it follows that


F̄k2k+1 (X) − F̄k2k+1 (Y )
= (qk+1 − qk ) (qk+2 . . . q2k − q1 . . . qk−1 )
+ pk+1 q2k+1 (qk+2 . . . q2k − q1 . . . qk−1 )
= (qk+2 . . . q2k − q1 . . . qk−1 )(qk+1 p2k+1 + q2k+1 − qk ), (17.19)

and so the proposition follows. 


336 M. O’Reilly

Lemma 17.4.1 Let X ≡ (q1 , . . . q2k+1 ) be a design for a linear consecutive–


k–out–of–(2k + 1) : F system, k > 2. Let X satisfy the necessary condi-
tions for the optimal design given by Malon [7] and Kuo et al. [5], as stated
in Section 1. Assume qk+1 < qk . If

q1 . . . qk−1 ≥ qk+2 . . . q2k , (17.20)

then X k;k+1 is a better design, while if

q1 . . . qk−1 < qk+2 . . . q2k , (17.21)

then X 1,...,k−1;2k,...,k+2 is a better design and (q2k+1 , q1 , . . . q2k ) is a better


design.

Proof. Suppose q1 . . . qk−1 ≥ qk+2 . . . q2k . Then, since qk − qk+1 > 0 and

qk+2 . . . q2k+1 − q1 . . . qk−1 qk+2 . . . q2k+1 > 0,

by Proposition 17.4.1 we have

F̄k2k+1 (X) − F̄k2k+1 (X k;k+1 ) > 0, (17.22)

and so X k;k+1 is a better design.


Suppose q1 . . . qk−1 < qk+2 . . . q2k . We have assumed the values qi are
distinct, so q2k+1 = qk . If q2k+1 < qk , then by the necessary conditions of
Malon [7] and Kuo et al. [5] we have

q1 > q2 > · · · > qk > q2k+1 > q2k > · · · > qk+2 , (17.23)

and then q1 . . . qk−1 > qk+2 . . . q2k , contrary to assumption. Hence

q2k+1 > qk (17.24)

and so by Proposition 17.4.2 we have

F̄k2k+1 (X) − F̄k2k+1 (X 1,...,k−1;2k,...,k+2 ) > 0, (17.25)

proving that

X 1,...,k−1;2k,...,k+2 ≡ (q2k , . . . , qk+2 , qk , qk+1 , qk−1 , . . . , q1 , q2k+1 )


  
is a better design. Define X ≡ (q1 , . . . , q2k+1 ) ≡ X 1,...,k−1;2k,...,k+2 . Note
     
that qk+1 < qk and q1 · . . . · qk−1 > qk+2 · . . . · q2k and so by Proposition 17.4.1,
as we have shown in the earlier part of this proof, interchanging components
  
qk+1 and qk ∈ X improves the design. Since a design
     
(q1 , . . . , qk−1 , qk+1 qk , qk+2 , . . . , q2k+1 ) ≡ (q2k , . . . , q1 , q2k+1 )
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 337
 
is better than X , X is better than X, and

F̄k2k+1 (q2k+1 , q1 , . . . q2k ) = F̄k2k+1 (q2k , . . . , q1 , q2k+1 ), (17.26)

so (q2k+1 , q1 , . . . q2k ) is better than X.

Note that the rearrangement X 1,...,k−1;2k,...,k+2 as given in Lemma 17.4.1


above is equivalent to:

• taking the (2k + 1)-th component and putting it on the left-hand side of
the system, next to the first component (in position 0);
• interchanging components k and (k + 1); and then
• reversing the order of components.

Corollary 17.4.1 Let X ≡ (q1 , . . . q2k+1 ) be a design for a linear conse-


cutive–k–out–of–(2k + 1) : F system, k > 2. If X is optimal, then

min{q1 , q2k+1 } > qk+1 > max{qk , qk+2 }. (17.27)

Proof. From Corollary 17.3.1 we have min{q1 , q2k+1 } > qk+1 . If X is opti-
mal, then it satisfies the necessary conditions for the optimal design given by
Malon [7] and Kuo et al. [5], as stated in Section 17.2. From Lemma 17.4.1 it
follows that qk+1 > qk must be satisfied. Similarly, from Lemma
17.4.1 applied to the reversed design Xr ≡ (q2k+1 , . . . , q1 ), we have qk+1 >
qk+2 . 

17.5 Results for n = 2k + 2, k > 2

Propositions 17.5.1 and 17.5.2 below contain preliminary results for Lemma
17.5.1, followed by Corollary 17.5.1 which gives a necessary condition for the
optimal design of a linear consecutive–k–out–of–(2k + 2): F system.
Proposition 17.5.1 Let X ≡ (q1 , . . . , q2k+2 ), k > 2. Then

F̄k2k+2 (X) − F̄k2k+2 (X k+1;k+2 ) = (qk+2 − qk+1 )·


[(p2k+2 qk+3 . . . q2k+1 − p1 q2 . . . qk )
− (p2k+2 − p1 )q2 . . . qk qk+3 . . . q2k+1 ] .

Proof. Since
2(k−2)
F̄k2k+2 (X) = qk+1 qk+2 F̄k−2 (q3 , . . . , qk , qk+3 , . . . , q2k )
+ pk+1 pk+2 F̄k2k+2 (q1 , . . . , qk , 0, 0, qk+3 , . . . , q2k )
+ pk+1 qk+2 F̄k2k+2 (q1 , . . . , qk , 0, 1, qk+3 , . . . , q2k+2 )
+ qk+1 pk+2 F̄k2k+2 (q1 , . . . , qk , 1, 0, qk+3 , . . . , q2k+2 ) (17.28)
338 M. O’Reilly

and
2(k−2)
F̄k2k+2 (X k+1;k+2 ) = qk+1 qk+2 F̄k−2 (q3 , . . . , qk , qk+3 , . . . , q2k )
+ pk+1 pk+2 F̄k2k+2 (q1 , . . . , qk , 0, 0, qk+3 , . . . , q2k )
+ pk+2 qk+1 F̄k2k+2 (q1 , . . . , qk , 0, 1, qk+3 , . . . , q2k+2 )
+ qk+2 pk+1 F̄k2k+2 (q1 , . . . , qk , 1, 0, qk+3 , . . . , q2k+2 ), (17.29)

we have

F̄k2k+2 (X) − F̄k2k+2 (X k+1;k+2 ) = (qk+2 − qk+1 )·


3 2k+2
F̄k (q1 , . . . , qk , 0, 1, qk+3 , . . . , q2k+2 )
4
− F̄k2k+2
(q1 , . . . , qk , 1, 0, qk+3 , . . . , q2k+2 )
= (qk+2 − qk+1 ) [q1 . . . qk + qk+3 . . . q2k+1 − q1 . . . qk qk+3 . . . q2k+1
− q2 . . . qk − qk+3 . . . q2k+2 + q2 . . . qk qk+3 . . . q2k+2 ]
= (qk+2 − qk+1 ) [(p2k+2 qk+3 . . . q2k+1 − p1 q2 . . . qk )
− (p2k+2 − p1 )q2 . . . qk qk+3 . . . q2k+1 ] , (17.30)

proving the proposition. 

Proposition 17.5.2 Let X ≡ (q1 , . . . , q2k+2 ), k > 2. Then

F̄k2k+2 (X) − F̄k2k+2 (X 1;2k+2 ) = (q1 − q2k+2 )·


[(pk+1 q2 . . . qk − pk+2 qk+3 . . . q2k+1 )
− (pk+1 − pk+2 )q2 . . . qk qk+3 . . . q2k+1 ] .

Proof. Since

F̄k2k+2 (X) = q1 q2k+2 F̄k2k+2 (1, q2 , . . . , q2k+1 , 1)


+ p1 p2k+1 F̄k2k+2 (0, q2 , . . . , q2k+1 , 0)
+ p1 q2k+2 F̄k2k+1 (q2 , . . . , q2k+1 , 1)
+ p2k+2 q1 F̄k2k+1 (1, q2 , . . . , q2k+1 ) (17.31)

and

F̄k2k+2 (X 1;2k+2 ) = q1 q2k+2 F̄k2k+2 (1, q2 , . . . , q2k+1 , 1)


+ p1 p2k+1 F̄k2k+2 (0, q2 , . . . , q2k+1 , 0)
+ p2k+2 q1 F̄k2k+1 (q2 , . . . , q2k+1 , 1)
+ p1 q2k+2 F̄k2k+1 (1, q2 , . . . , q2k+1 ), (17.32)
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 339

we have

F̄k2k+2 (X) − F̄k2k+2 (X 1;2k+2 ) = (q1 − q2k+2 )·


3 2k+1
F̄k (1, q2 , . . . , q2k+1 )
4
− F̄k2k+1 (q2 , . . . , q2k+1 , 1) . (17.33)

From this, by Shanthikumar’s recursive algorithm [11], as stated in Section


17.2, it follows that

F̄k2k+2 (X) − F̄k2k+2 (X 1;2k+2 ) = (q1 − q2k+2 )·


38 2k
F̄k (q2 , . . . , q2k+1 )
+ pk+1 q2 . . . qk (1 − qk+2 . . . q2k+1 )}
8
− F̄k2k (q2 , . . . , q2k+1 )
+ pk+2 qk+3 . . . q2k+1 (1 − q2 . . . qk+1 )}]
= (q1 − q2k+2 )·
[pk+1 q2 . . . qk (1 − qk+3 . . . q2k+1 + pk+2 qk+3 . . . q2k+1 )]
− pk+2 qk+3 . . . q2k+1 (1 − q2 . . . qk + pk+1 q2 . . . qk )]
= (q1 − q2k+2 ) [(pk+1 q2 . . . qk − pk+2 qk+3 . . . q2k+1 )
− (pk+1 − pk+2 )q2 . . . qk qk+3 . . . q2k+1 ] , (17.34)

completing the proof. 


Lemma 17.5.1 Let X ≡ (q1 , . . . , q2k+2 ) be a design for a linear consecutive–
k–out–of–(2k + 2) : F system, q1 > q2k+2 . Assume qk+1 < qk+2 . If

qk+3 . . . q2k+1 ≥ q2 . . . qk , (17.35)

then X k+1;k+2 is a better design, whereas if

qk+3 . . . q2k+1 ≤ q2 . . . qk , (17.36)

then X 1;2k+2 is a better design.


Proof. If qk+3 . . . q2k+1 ≥ q2 . . . qk , then

p2k+2 qk+3 . . . q2k+1 − p1 q2 . . . qk


≥ (p2k+2 − p1 )q2 . . . qk
> (p2k+2 − p1 )q2 . . . qk qk+3 . . . q2k+1 , (17.37)

and so by Proposition 17.5.1 we have F̄k2k+2 (X) > F̄k2k+2 (X k+1;k+2 ), proving
that X k+1;k+2 is a better design.
340 M. O’Reilly

If qk+3 . . . q2k+1 ≤ q2 . . . qk , then

pk+1 q2 . . . qk − pk+2 qk+3 . . . q2k+1


≥ (pk+1 − pk+2 )qk+3 . . . q2k+1
> (pk+1 − pk+2 )q2 . . . qk qk+3 . . . q2k+1 , (17.38)

and from Proposition 17.5.2 it follows that F̄k2k+2 (X) > F̄k2k+2 (X 1;2k+2 ),
proving that X 1;2k+2 is a better design and completing the proof. 

Corollary 17.5.1 Let X ≡ (q1 , . . . , q2k+2 ) be a design for a linear consecu-


tive–k–out–of–(2k + 2):F system, k > 2. If X is optimal, then
(i)(q1 , qk+1 , qk+2 , q2k+2 ) is singular; and
(ii)(q1 , . . . , qk , qk+3 , . . . , q2k+2 ) is nonsingular.

Proof. Without loss of generality we may assume q1 > q2k+2 . For q1 < q2k+2
we apply the reasoning below to the reversed design Xr ≡ (q2k+2 , . . . , q1 ).
Suppose (q1 , qk+1 , qk+2 , q2k+2 ) is nonsingular. Then qk+1 < qk+2 , and by
Lemma 17.5.1 we have that either X k+1;k+2 or X 1;2k+2 must be a bet-
ter design. Hence X is not optimal contrary to the assumption, and (i)
follows.
Suppose that (q1 , . . . , qk , qk+3 , . . . , q2k+2 ) is singular. Then, since from
above (q1 , qk+1 , qk+2 , q2k+2 ) must be singular, we have that X is singular,
contrary to the necessary condition of nonsingularity stated by O’Reilly in
([9], Corollary 1). Hence (ii) follows and this completes the proof.

17.6 Procedures to improve designs not satisfying the


necessary conditions for the optimal design

The procedures below follow directly from the results of Lemmas 17.3.1,
17.3.2, 17.4.1, 17.5.1 respectively. Procedure 17.6.3 also applies the necessary
conditions for the optimal design given by Malon [7] and Kuo et al. [5], as
stated in Section 17.1.

Procedure 17.6.1 Let X be a design for a linear consecutive–k–out–of–n:F


or a linear consecutive–k–out–of–n:G system, with n > 2k, k ≥ 2. In order to
improve the design, if q1 < qk+1 , interchange components q1 and qk+1 . Next,
if qn < qn−k+1 , interchange components qn and qn−k+1 .
Procedure 17.6.2 Let X be a design for a linear consecutive–k–out–of–
(2k + 1) : F system, k > 2. Rearrange the components in the positions from
1 to k, and then the components in the positions from (2k + 1) to (k + 2) in
non-decreasing order of component reliability. In order to improve the design,
proceed as follows:
17 The (k + 1)-th component of linear consecutive–k–out–of–n systems 341

• If qk+1 < qk ,

1. Interchange components qk+1 and qk , when q1 . . . qk−1 ≥ qk+2 . . . q2k ;


otherwise take the q2k+1 component, put it on the left-hand side of the
system, next to the q1 component (in position 0).
2. In a design obtained in this way, rearrange the components in the po-
sitions from 1 to k, and then the components in the positions from
(2k + 1) to (k + 2) in non-decreasing order of component reliability.
3. If required, repeat steps 1–3 to further improve this new rearranged
design or until the condition qk+1 > qk is satisifed.
• If qk+1 < qk+2 , reverse the order of components and apply steps 1–3 to
the rearranged design.
Procedure 17.6.3 Let X be a design for a linear consecutive–k–out–of–
(2k + 2) : F system, with k > 2. In order to improve the design:

• If q1 > q2k+2 and qk+1 < qk+2 , interchange components


1. qk+1 and qk+2 , when qk+3 . . . q2k+1 ≥ q2 . . . qk or
2. q1 and q2k+2 , when qk+3 . . . q2k+1 ≤ q2 . . . qk .
• If q1 < q2k+2 and qk+1 > qk+2 , interchange components

1. qk+1 and qk+2 , when qk+3 . . . q2k+1 ≤ q2 . . . qk or


2. q1 and q2k+2 , when qk+3 . . . q2k+1 ≥ q2 . . . qk .

References

1. Z. W. Birnbaum, On the importance of different components in a multicomponent


system, in P. R. Krishnaiah, editor, Multivariate Analysis, II, (Academic Press, New
York, 1969), 581–592.
2. P. J. Boland, F. Proschan and Y. L. Tong, Optimal arrangement of components via
pairwise rearrangements, Naval Res. Logist. 36 (1989), 807–815.
3. G. J. Chang, L. Cui and F. K. Hwang, New comparisons in Birnbaum importance for
the consecutive–k–out–of–n system, Probab. Engrg. Inform. Sci. 13 (1999), 187–192.
4. M. V. Koutras, G. K. Papadopoulos and S. G. Papastavridis, Note: Pairwise rear-
rangements in reliability structures, Naval Res. Logist. 41 (1994), 683–687.
5. W. Kuo, W. Zhang and M. Zuo, A consecutive–k–out–of–n:G system: the mirror im-
age of a consecutive–k–out–of–n : F system, IEEE Trans. Reliability, 39(2) (1990),
244–253.
6. J. Malinowski and W. Preuss, Reliability increase of consecutive–k–out–of–n : F and
related systems through component rearrangement, Microelectron. Reliab. 36(10)
(1996), 1417–1423.
7. D. M. Malon, Optimal consecutive–k–out–of–n : F component sequencing, IEEE
Trans. Reliability, 34(1) (1985), 46–49.
8. M. M. O’Reilly, Variant optimal designs of linear consecutive–k–out–of–n systems, to
appear in Mathematical Sciences Series: Industrial Mathematics and Statistics, Ed.
J. C. Misra (Narosa Publishing House, New Delhi, 2003), 496–502.
342 M. O’Reilly

9. M. M. O’Reilly, Optimal design of linear consecutive–k–out–of–n systems, chapter in


this volume.
10. S. Papastavridis, The most important component in a consecutive–k–out–of–n: F sys-
tem, IEEE Trans. Reliability 36(2) (1987), 266–268.
11. J. G. Shanthikumar, Recursive algorithm to evaluate the reliability of a consecutive–
k–out–of–n: F system, IEEE Trans. Reliability 31(5) (1982), 442–443.
12. J. Shen and M. Zuo, A necessary condition for optimal consecutive–k–out–of–n: G
system design, Microelectron. Reliab. 34(3) (1994), 485–493.
13. R. S. Zakaria, H. T. David and W. Kuo, A counter–intuitive aspect of compo-
nent importance in linear consecutive–k–out–of–n systems, IIE Trans. 24(5) (1992),
147–154.
14. M. J. Zuo, Reliability of linear and circular consecutively–connected systems, IEEE
Trans. Reliability 42(3) (1993), 484–487.
15. M. Zuo, Reliability and component importance of a consecutive–k–out–of–n system,
Microelectron. Reliab. 33(2) (1993), 243–258.
16. M. Zuo and W. Kuo, Design and performance analysis of consecutive–k–out–of–n
structure, Naval Res. Logist. 37 (1990), 203–230.
17. M. Zuo and J. Shen, System reliability enhancement through heuristic design, OMAE
II, Safety and Reliability (ASME Press, 1992), 301–304.
Chapter 18
Optimizing properties
of polypropylene and elastomer
compounds containing wood flour

Pavel Spiridonov, Jan Budin, Stephen Clarke and Jani Matisons

Abstract Despite the fact that wood flour has been known as an inexpen-
sive filler in plastics compounds for many years, commercial wood-filled plas-
tics are not widely used. One reason for this has been the poor mechanical
properties of wood-filled compounds. Recent publications report advances in
wood flour modification and compatibilization of polymer matrices, which
has led to an improvement in processability and the mechanical properties
of the blends. In most cases the compounds were obtained in Brabender-
type mixers. In this work the authors present the results for direct feeding of
mixtures of wood flour and thermoplastic materials (polypropylene and SBS
elastomer) in injection molding. The obtained blends were compared with
Brabender-mixed compounds from the point of view of physical and mechan-
ical properties and aesthetics. It was shown that polymer blends with rough

Pavel Spiridonov
Centre for Advanced Manufacturing Research, University of South Australia,
Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]
Jan Budin
Institute of Chemical Technology, Prague, CZECH REPUBLIC
e-mail: [email protected]
Stephen Clarke
Polymer Science Group, Ian Wark Research Institute, University of South Australia,
Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]
Jani Matisons
Polymer Science Group, Ian Wark Research Institute, University of South Australia,
Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]
Grier Lin
Centre for Advanced Manufacturing Research, University of South Australia,
Mawson Lakes SA 5095, AUSTRALIA
e-mail: [email protected]

C. Pearce, E. Hunt (eds.), Structure and Applications, Springer Optimization 343


and Its Applications 32, DOI 10.1007/978-0-387-98096-6 18,
c Springer Science+Business Media, LLC 2009
344 P. Spiridonov et al.

grades of wood flour (particle size >300 microns) possess a better decorative
look and a lower density having, at the same time, poorer mechanical prop-
erties. Usage of compatibilizers allowed the authors to optimize the tensile
strength of these compounds.

Key words: Tensile strength, wood-filled plastic, polypropylene elastomer,


wood flour, optimal properties

18.1 Introduction

Wood flour is referred to [1] as an extender – a type of filler that is added


to polymer compounds as a partial substitute for expensive plastic or elas-
tomeric material. The major advantages of organic fillers, including wood
flour, are their relatively low cost and low density [2]. Despite the fact that
wood flour has been known as a filler since the 1950s, its commercial appli-
cation has been rather restricted. During the past 10–15 years, new organic
materials such as rice husks, oil palm fibers, natural bast fibers and sisal
strands have been studied as fillers [3]–[6]. Previous investigations in the area
of traditional wood fibers have studied particular types of wood such as euca-
lyptus [7] or ponderosa pine [8, 9]. Most of the above organic fillers are used
in composite materials [2]–[4],[6]–[8],[10]–[14].
The primary objective of this research was to find cost-effective ways to
optimize properties of polypropylene and thermoplastic elastomers filled with
wood flour. To eliminate a blending operation from the manufacturing pro-
cess, direct feeding of an injection-molding machine with the polymer-filler
mixtures was employed.
Filling the polymer compounds with wood flour would allow not only a
decrease in their cost, but would also reduce the environmental effect by
utilizing wood wastes and making recyclable or bio-degradable products [2,
15, 16]. From this point of view, the authors did not select a particular type
of wood; instead we used a mixture of unclassified sawdust.

18.2 Methodology

18.2.1 Materials

Two polymers were used as the polymer matrix. Polypropylene (unfilled,
density 0.875 g/cm³, melt index 10 cm³/10 min at 230°C), one of the most
common industrial plastics, was used. Styrene–butadiene tri-block
(SBS) elastomer (unfilled, rigid-to-soft fraction ratio 30/70, density 0.921
g/cm³, melt index 12 cm³/10 min at 230°C) was selected because of the
growing popularity of thermoplastic elastomers (TPE), which is due to their
“soft touch feeling” properties and their use in two-component injection-
molding applications.

To compatibilize the wood flour with polymers [9]–[13], maleic anhydride


grafted polypropylene (PP–MA) and styrene–ethylene / butylene–styrene
elastomer (SEBS–MA) were added to the mix. The content of maleic an-
hydride in PP–MA and SEBS–MA was 1.5% and 1.7%, respectively.
As a filler, a mixture of local unclassified sawdust was used. The mixture
was separated into 4 fractions. The characteristics of the fractions are given
in Table 18.1.

Table 18.1 Physical characteristics of the wood flour fractions


Fraction   Particle Size, μm   Sieve Mesh Size   Density, g/cm³
1          600–850             20                0.92
2          300–600             30                1.17
3          150–300             60                1.37
4          <150                100               1.56

18.2.2 Sample preparation and tests

Wood flour samples were pre-dried for 6 hours at 50–60°C in an electric
oven before blending. The polymer–wood flour blends were obtained in two
ways. For injection molding, the polymer and filler were mixed just before
molding; no additional pre-compounding was used. The specimens for the ten-
sile test (Australian Standard AS 1145) were molded in a 22 (metric) tonne
injection-molding machine.
These blends were also pre-mixed in a Brabender mixer at 40 rpm at 180°C
to 190°C. The polymers were first introduced into the mixer; the wood flour was
then added once the polymers had melted (that is, when a constant torque was
reached). The total mixing time was 6–8 min depending on the composition. Each blend
weighed 65–70 grams. While warm, the blended materials were formed into a
2-mm sheet in a laboratory vulcanization press under 10 MPa pressure at
180°C. The tensile specimens were punched from the sheets using a standard
cutting die. Tensile testing of the above specimens was conducted according
to AS 1145 on a horizontal tensile test machine. Five test samples of each
compound were tested.
Densities of the wood flour and polymer compounds were determined by
a volumetric method in either water or methylated spirits.

18.3 Results and discussions

18.3.1 Density of compounds

Comparison of the densities of the polymer compounds provides information


about the quality and interactions between the polymer matrix and filler.

From Table 18.1 it can be seen that the density of wood flour depends on
the particle size in each fraction. The fractions consisting of smaller particles
have a higher density. This is because wood is a cellulose material, which has
a porous structure [4]. Larger wood particles retain this structure with con-
siderably less displacement of voids by the flotation medium. On the other
hand, for smaller particles a higher percentage of voids are filled by the flota-
tion liquid [14] resulting in a higher density. The difference in density should
influence the density of the polymer compounds that contain different wood
flour fractions. When the compounds were molded in the injection-molding
machine or pressed after mixing in the Brabender mixer, their density was
both measured and calculated. The calculations were based on the density
of the polymer matrix and the wood flour and their ratio in the compounds.
The results are presented in Figure 18.1.
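The calculated densities referred to above can be illustrated with a short sketch. The mixing rule used here (an inverse rule of mixtures on mass fractions, assuming no voids) is an assumption for illustration only; the chapter does not state which rule the authors applied. The 40% loading and the component densities quoted are taken from the text.

    def composite_density(rho_polymer, rho_filler, w_filler):
        # Theoretical density of a two-phase compound from the component densities
        # and the filler mass fraction, assuming ideal mixing with no voids.
        # (Assumed mixing rule; the chapter does not specify the one actually used.)
        return 1.0 / (w_filler / rho_filler + (1.0 - w_filler) / rho_polymer)

    # Example: PP (0.875 g/cm3) filled with 40 wt% of wood flour fraction 2 (1.17 g/cm3)
    print(round(composite_density(0.875, 1.17, 0.40), 3))   # about 0.97 g/cm3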
A difference was observed between the two processing methods and the
calculated values. The results indicated that the densities of un-coupled PP
were lower when molded in the injection-molding machine. The most sta-
ble results were obtained when both PP and SBS compounds were mixed
in the Brabender mixer and then were formed in the press. Stark et al. [8]
have explained this observation by the compression of the compounds to
the maximum density that the wood cell walls can sustain. This correlates
with our results in regard to the Brabender method and the difference from
the calculated values. However, the pressure created in an injection mold
is comparable with the pressure developed in the press. Although an injec-
tion machine creates a greater plasticizing effect, the total blending time is
shorter (1–1.5 min) than for the Brabender mixer (6–8 min). Therefore, in
addition to the effect of compression, blending time is a very important
parameter.
The use of modifiers improved the quality of compounds without increas-
ing the blending time. Thus maleated polypropylene allowed us to obtain
compounds with close densities both in injection molding and the Braben-
der mixer (see Figure 18.1). This is because of the compatibilization impact
of maleic anhydride, which is achieved by improving the polymer matrix
impregnation, improving fiber dispersion, enhancing the interfacial adhesion
and other effects [9]–[13].

18.3.2 Comparison of compounds obtained


in a Brabender mixer and an injection-molding
machine

The difference in mixing by injection molding and by the Brabender method


influences not only the density of compounds but also their mechanical
properties. Despite the fact that the tensile strength of the control com-
pound (unmodified polypropylene) molded in the injection machine was

Fig. 18.1 Density of (a) polypropylene and (b) SBS in elastomer compounds for different
blending methods.

higher than that of Brabender-mixed specimens, most of the other com-


pounds were weaker (see Figure 18.2). The primary influential parameters
have been discussed above. In addition to pressure and blending time, tem-
perature is also an important technological parameter. The mixing tempera-
ture of the Brabender mixer was 185–190°C, which is below the temperature
of wood degradation (200°C) [7]. The injection-molding temperature varied
for polypropylene from 185°C in the center zone to 200°C in the nozzle.

Fig. 18.2 Comparison of tensile strength of the compounds obtained in an injection-


molding machine and in a Brabender mixer.

For the injection of SBS elastomer, the temperatures in the barrel were set
10–15°C higher. The observations of the injection molding of polymer–wood
flour mixtures showed that stability of this process depends on the filler /
polymer ratio and the temperature. When the content of wood flour was be-
low 40%, the process and the quality of the molded specimens were both quite
stable. When the content of wood flour exceeded 50%, its volume exceeded
the polymer volume and it became hard to obtain a consistent quality. In
addition, because wood flour is hard and does not melt during the process,
the friction between metal parts of the machine (screw, barrel) and the wood
flour particles is very high, which also prevents the polymer matrix from
forming a continuous phase. Therefore it was visually detected that the dis-
tribution of wood flour in the polymer was not regular (for example, particle
agglomerates were observed) when the wood filler content was greater than
50% weight. The maximum content of wood flour in the following experiments
was maintained at 40%.
During the injection-molding experiments it was noticed that higher tem-
peratures led to the formation of vapors in the compounds. This was attributed
to decomposition of the wood flour, which is known to start at around
200°C [7]. In such cases it was difficult or even impossible to obtain good
specimens, despite high mold pressure. Thus we were unable to mold mixtures
of nylon with wood flour, because nylon requires higher injection temperatures
(240–260°C). Therefore the direct feeding of polymer–wood flour
mixtures into an injection-molding machine can be done only when the
wood flour content is less than 40% and the polymer has a melting point
below 200°C.

18.3.3 Compatibilization of the polymer matrix


and wood flour

The mechanical properties of the wood-filled compounds were found to de-


pend not only on the wood flour content but also on the fraction (particle
size). Thus a maximum reduction in tensile strength was observed in the case
of blends containing wood flour fraction 1, which contained the largest par-
ticle size (600–850 microns). It was found that when the PP and SBS blends con-
tained 40% of wood flour fraction 1, tensile strength losses of 72% and 60%,
respectively, with respect to the control samples occurred (see Figure 18.3). The best strength
(15.2 MPa) of SBS compounds was achieved when they contained fraction

Fig. 18.3 Influence of wood flour fractions and the modifier on the tensile strength of
injection-molded specimens of the (a) PP and (b) SBS compounds.

4 (particles <150 microns). The PP compound containing 40% of fraction


2 had the best mechanical properties, although this result was practically
within the statistical deviations (8%) for the strength of the compounds con-
taining fractions 3 and 4. Similar results for PP compounds were described
in [8].
The influence of the particle size on the mechanical properties of wood
flour–polymer compounds can be explained by incompatibility between the
polymer matrix and the filler, and the differences in their physical and me-
chanical properties. The wood particles differ from the polymers by their
chemical nature [4], and good adhesion between them cannot be achieved.
Therefore a large proportion of the filler in the compounds prevents the ma-
trix from forming a continuous phase, which leads to a reduction of the me-
chanical properties of the blends [8, 15]. Bigger particles, moreover, act as
concentration points where deformation and strain occur. Cou-
pling agents and chemically modified polymers were introduced to improve
interfacial adhesion between the polymers and fillers [9]–[13]. In this research,
maleic anhydride grafted PP and SEBS were used as modifiers for PP and
SBS blends respectively. As shown above (see Figures 18.1 and 18.2) PP–MA
was able to improve the properties of the compounds. Figure 18.3 demon-
strates that PP–MA led to the improvement and stabilization of mechanical
properties [12, 13] for injection-molded compounds regardless of the wood
fraction. The modified SBS compounds had similar properties to the con-
trol specimens. Mixing wood flour with a polymer base and modifiers in the
Brabender mixer (see Figure 18.2) gave even better results and compensated
for the loss of mechanical properties of the compounds containing a high content
of wood flour [12, 13].
The improved compatibility between the polymer matrix and the wood
flour particles led to homogeneous morphology of the compounds [9, 11, 14].
Thus maleated polymers provided a compatibilization effect in the filled com-
pounds.

18.3.4 Optimization of the compositions

The introduction of wood flour as a filler and maleated polymers as modifiers


to the polymer base resulted in opposing technical and economic effects [12,
13]. The relative cost of the PP compounds decreased by 40–50% as the
content of the wood filler increased. At the same time, the tensile strength
of these compounds dropped to 40%. Increasing the PP–MA content up to a
certain point allowed a 50% improvement in mechanical properties; however,
the cost of the compounds was 2–3 times higher. This example demonstrates
the necessity to optimize these wood-filled compositions.
The relative cost of the PP and SBS compounds was calculated as a ratio
of the actual cost of the compound to the cost of the control (pure) material.

Fig. 18.4 Relative cost of the (a) PP and (b) SBS compounds depending on the content
of wood flour and maleated polymers.

The calculations were based on the cost of raw materials and their content
in the compounds. Figure 18.4 shows that PP compounds are cheaper than
the control sample (with a relative cost of less than 1) when they contain
a considerable amount of wood flour and less than 17% PP–MA. For SBS
compounds, the equilibrium cost threshold was much higher. It follows from
Figure 18.4 that it is possible to introduce 50% SEBS–MA to the compound
containing 50% wood flour without increasing the cost of the modified com-
pound. The difference between the optimum cost of the PP and SBS com-
pounds is caused by the difference in the cost of raw materials. Thus the
cost of the SBS elastomer is ∼ 3 times higher than the cost of virgin PP and
the cost of maleated SEBS is much higher than the cost of PP–MA. Under
these conditions, the application of a cheap filler such as wood flour provides
an economic effect, allowing the manufacturers greater flexibility with the
composition.
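As an illustration of the relative-cost calculation described above, the following minimal sketch computes the ratio of raw-material cost to the cost of the pure control polymer. The price figures in the example are hypothetical; the chapter does not give actual raw-material prices.

    def relative_cost(prices, mass_fractions, control_price):
        # Raw-material cost of the compound per unit mass, divided by the price
        # of the control (pure) polymer; mass_fractions should sum to 1.
        return sum(p * w for p, w in zip(prices, mass_fractions)) / control_price

    # Hypothetical prices per unit mass: PP = 1.0, PP-MA = 3.0, wood flour = 0.1;
    # a compound containing 50% PP, 10% PP-MA and 40% wood flour
    print(relative_cost([1.0, 3.0, 0.1], [0.50, 0.10, 0.40], 1.0))   # 0.84, cheaper than pure PP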
It should be noted that, in addition to financial savings, the use of wood
flour can provide plastics companies with the benefit of an aesthetically pleas-
ing, natural looking, wood-filled finish. Figure 18.5 provides an indication of
the decorative properties possible for PP compounds containing 40% of dif-
ferent grades of wood flour. It can be noticed that rough grades (fraction 1
and 2) provide a more natural wood look to the plastics than do fractions 3
and 4. Therefore wood flour with particle sizes in the range from 300 to 850
microns can be recommended for use in decorative plastics. Despite the fact
that those fractions decrease the mechanical properties of the compounds,
these properties may not necessarily be an essential criterion for decorative
parts and components. It should be possible to find an optimal balance be-
tween the properties and the cost of the compounds in the way described
above.


Fig. 18.5 Photographs of the PP compounds containing 40% wood flour of different
fractions.

18.4 Conclusions

The results of this research clearly demonstrate that it is possible to elim-


inate a mixing operation and make PP and SBS products with wood flour
content up to 40% directly by injection molding. Due to its low degradation
temperature, wood flour cannot be blended with polymers at temperatures
higher than 200°C. This limitation has to be considered when selecting a
polymer.
Our research has also shown that in addition to conventional plastics,
thermoplastic elastomers can be filled with wood flour. The combination of
compatible wood flour–filled plastic and elastomer materials can be used in
two-component injection-molding technology.
Lower mechanical properties of the wood fiber–filled products can be com-
pensated for by using maleic anhydride grafted polymers. The properties of the
wood flour–polymer compounds can be optimized from the point of view of
their mechanical properties and cost.
Due to the different influences of wood flour fractions on the properties of
the polymer compounds, they can be applied in different ways. Thus wood
flour grades with particle sizes less than 300 microns can be used as extenders
to replace expensive plastic materials, which allows for economic savings. The

grades with particles in the range between 300 to 850 microns can be used
in decorative plastics.

References

1. Modern Plastics Encyclopedia, Eds G. Graff, K. Kreiser (McGraw–Hill, New York,


1989).
2. D. N. Saheb and J. P. Jog, Natural fiber polymer composites: A review, Adv. Polymer
Tech. 18(4) (1999), 351–363.
3. M. Y. Fuad Ahmad et al., Rice husk and oil palm wood flour as fillers in polypropylene
composites: Materials characterization and mechanical properties evaluation, in Pro-
ceedings of the 4th International Conference on Composites Engineering, 5–11 July
1998, Ed. M. L. Scott (Woodhead Publishing, Cambridge U. K.), 337–338.
4. D. Ruys, A. Crosky and W. J. Evans, Natural bast fibre structure, in Proceedings of
the 3rd Conference on Technology Convergence in Composites Applications (ACUN-
3), 6–9 February 2001, Eds S. Bandyopadhyay, N. Gowripalan, N. Drayton (University
of New South Wales, Sydney, 2001) 468–472.
5. A. S. Blicblau, S. Laird and R. S. P. Couts, Air cured sisal strand reinforced cement
sheet, in Proceedings of the 3rd Conference on Technology Convergence in Composites
Applications (ACUN-3), 6–9 February 2001, Eds S. Bandyopadhyay, N. Gowripalan,
N. Drayton (University of New South Wales, Sydney, 2001) 447–451.
6. M. S. Sreekala and S. Thomas, Accelerated effects in oil palm fibre reinforced phenol
formaldehyde composites, in Proceedings of the 3rd Conference on Technology Con-
vergence in Composites Applications (ACUN-3), 6–9 February 2001, Eds S. Bandy-
opadhyay, N. Gowripalan, N. Drayton (University of New South Wales, Sydney, 2001)
461–467.
7. N. E. Marcovich, M. M. Reboredo and M. I. Aranguren, Modified woodflour as ther-
moset fillers: II. Thermal degradation of woodflours and composites, Thermoch. Acta
372 N1–2 (2001), 45–57.
8. N. M. Stark and M. J. Berger, Effect of particle size on properties of wood–flour
reinforced polypropylene composites, in Proceedings of the Fourth International Con-
ference on Woodfibre–Plastic Composites, 12–14 May 1997, (Forest Product Society,
Madison, Wisconsin, 1997) 134–143.
9. K. Oksman and C. Clemons, Effects of elastomers and coupling agent on impact perfor-
mance of wood flour–filled polypropylene, in Proceedings of the Fourth International
Conference on Woodfibre–Plastic Composites, 12–14 May 1997, (Forest Product Soci-
ety, Madison, Wisconsin, 1997) 144–155.
10. M. N. Angles, J. Salvado and A. Dufresne, Steam–exploded residual softwood–filled
polypropylene composites, J. Appl. Polymer Sci. 74(8) (1999), 1962–1977.
11. M. Kazayawoko, J. J. Balatinecz and L. M. Matuana, Surface modification and adhe-
sion mechanisms in woodfiber-polypropylene composites, J. Mat. Sci. 34(24) (1999),
6189–6199.
12. J. Z. Lu, Q. L. Wu and H. S. McNabb, Chemical coupling in wood fiber and polymer
composites: A review of coupling agents and treatments, Wood Fib. Sci. 32(1) (2000),
88–104.
13. R. Mahlberg et al., Effect of chemical modification of wood on the mechanical and
adhesion properties of wood fiber/polypropylene fiber and polypropylene/veneer com-
posites, Holz als Roh- und Werkstoff 59(5) (2001), 319–326.
14. S. B. Elvy, G. R. Dennis and L. T. Ng, Effects of coupling agent on the phys-
ical properties of wood-polymer composites, J. Mat. Proc. Tech. 48(1–4) (1995),
365–371.

15. B. J. Lee, A. G. McDonald and B. James, Influence of fiber length on the mechani-
cal properties of wood-fiber/polypropylene prepreg sheets, Mat. Res. Innov. 4(2–3)
(2001), 97–103.
16. J. J. Balatinecz and M. M. Sain, The influence of recycling on the properties of wood
fibre plastic composites, Macro. Symp. 135 (1998), 167–173.
Chapter 19
Constrained spanning, Steiner trees
and the triangle inequality

Prabhu Manyem

Abstract We consider the approximation characteristics of constrained span-


ning and Steiner tree problems in weighted undirected graphs where the edge
costs and delays obey the triangle inequality. The constraint here is in the
number of hops a message takes to reach other nodes in the network from a
given source. A hop, for instance, can be a message transfer from one end of
a link to the other. A weighted hop refers to the amount of delay experienced
by a message packet in traversing the link. The main result of this chapter
shows that no approximation algorithm for a delay-constrained spanning tree
satisfying the triangle inequality can guarantee a worst case approximation
ratio better than Θ(log n) unless NP ⊂ DTIME(n^{log log n}). This result extends
to the corresponding problem for Steiner trees which satisfy the triangle in-
equality as well.

Key words: Minimum spanning tree, maximum spanning tree, triangle in-
equality, Steiner tree, APX, approximation algorithm, asymptotic worst case
ratio

19.1 Introduction

Consider a network G = (V, E) where a certain node (the source or the


speaker) broadcasts messages to all the other nodes (the destinations or the
receivers) in the network. When a broadcast occurs, suppose the network
links through which the message is relayed need to be leased for a given
non-negative cost cij , where i and j are the end nodes of the link (i, j) ∈ E.

Prabhu Manyem
Centre for Industrial and Applied Mathematics, University of South Australia
Mawson Lakes SA 5095, AUSTRALIA∗
e-mail: [email protected]
∗ Currently at The University of Ballarat.


A feasible solution to this single source broadcast problem is a set of leased


links so that the message from the specified source reaches all destinations,
and the message passes through each (intermediate) node at most once –
because a receiver hearing the same piece of message more than once could
become confused. In other words, there should be no loops or cycles in the
feasible solution. (Any solution with loops can be modified by removing a
few edges to break the loops, with no increase in cost. Hence we consider
only acyclic solutions.) A piece of message (or data) is known as a packet in
telecommunications terminology.
The cost of a solution is the sum of the costs of the leased links, and
an optimal solution is one which minimizes this overall cost. This broad-
cast problem can be modeled as a MinST (minimum spanning tree) and the
solution is a tree rooted at the pre-defined source s.
As opposed to the broadcast problem, in the multicast problem, the mes-
sage needs to be sent only to a select group of nodes in the network, known
as the multicasting group. Just as the broadcast version lends itself to a
spanning tree formulation, the multicast version lends itself to a Steiner tree
formulation.
Suppose we add the following constraint to the broadcast problem: that
the number of hops taken by a message to reach any destination from a given
source vertex s is bounded by a threshold value Δ. A hop can be defined as
a message transfer from one end of a link to the other. We call this the hop-
constrained spanning tree problem or HCSP. This problem has been shown
to be NP-hard [12]. A variation of the HCSP is the DCSP, the diameter-
constrained spanning tree problem, where the diameter (the number of edges
in the longest path of the solution obtained) obeys an upper bound of Δ.
The DCSP is also NP-hard [4].
The CSP, the delay-constrained spanning tree problem, is a generalization
of the HCSP. Each edge in the network has two distinct parameters: (1)
a cost cij and (2) a delay dij . (Here, delay refers to the amount of delay
experienced by a message packet in traveling from one end of the link to the
other. The total delay in a link can be broken down into transmission delay,
switching delay and queueing delay, of which transmission delay is usually
predominant.) The delay parameter can be considered to be a weighted hop.
If in a CSP the delay is set to one for each edge, one obtains an HCSP.
For a given minimization problem P, let A be an approximation algorithm
and PI the set of instances in P. For a given instance I ∈ PI , let the cost of the
solution obtained by A be AI . Let the cost of the optimal solution for I be
OP TI . Then the approximation ratio of A for instance I is RA,I = AI /OP TI .
Over all instances I ∈ PI , the absolute performance ratio is defined as [4]:

RA = inf { r ≥ 1 : RA,I ≤ r for all I ∈ PI }.    (19.1)

The lower the value of RA , the better the heuristic A. A constant value of
RA is superior to a value that depends on the size of instances, for example,

RA ∈ Θ(n) or RA ∈ Θ(log n). Lund and Yannakakis [7] show that the SET
COVER problem cannot be in the class APX, which is the class of prob-
lems for which it is possible to construct a polynomial time heuristic A that
guarantees a constant value on RA . Feige [3] showed that unless NP ⊂
DTIME(n^{log log n}), the SET COVER problem cannot be approximated to
within Θ(log n). Manyem and Stallmann [9] have shown that an HCSP, and
hence a CSP too, cannot be in the complexity class APX. Results from [2]
indicate that a DCSP is unlikely to be in APX. Heuristics for the Steiner
tree version of the problem with general costs and weighted hops appear in
Manyem [8].
Marathe, Ravi et al. [10] consider networks with both cost and delay pa-
rameters on the edges. They provide an approximation algorithm that guar-
antees a diameter within O(log |V |) of the given threshold Δ and a total cost
within O(log |V |) of the optimum. A vast compendium of results on approx-
imability is provided in Ausiello et al. [1].
Figure 19.1 provides a road map of some of the optimization problems that
arise in telecommunication networks. Here S is the set of terminal nodes for
Steiner tree problems. In multicasting terminology, S is the set of conference
nodes. Problem 1, the Constrained Steiner Tree (CST), is the hardest in the

[Diagram: Problems 1–12 – the CST, the Constrained Shortest Path, the CSP, the HCST, the HCSP, their variants with unit-weight edges or edge weights 1 or 2, and delay- or hop-constrained paths with a minimum number of edges – are linked by arrows indicating specialization; problems above a dividing line are NP-complete, those below it are solvable in polynomial time.]
Fig. 19.1 A Constrained Steiner Tree and some of its special cases.

figure – all other problems are special cases of CSTs. Given an instance of a
CST, if we set the multicast group to be all nodes in the network, we obtain
the Constrained Spanning Tree (Problem 3). Given an instance of a CST, if we
set the multicast group to be just two nodes which need to communicate with
each other, we obtain Problem 2, the Constrained Shortest Path. Problem 7
is a Hop-Constrained Spanning Tree, and Problem 4 is a Hop-Constrained
Steiner Tree.
All problems above the dotted line in Figure 19.1 are NP-complete, and
the ones below can be solved in polynomial time. Positive results from one
problem to another flow in the direction of the arrows, and negative results
flow in the direction against that shown by the arrows. For example, if we can
develop a heuristic for Problem 1 that guarantees an upper bound B on the
approximation ratio over all instances, this will also hold true for all problems
in the figure. On the other hand, if we can show (a negative result) that unless
NP ⊂ DTIME(n^{log log n}), there can be no heuristic that guarantees an upper
bound of B for Problem 7, then this will also be true for Problems 1, 3 and
4. See Table 19.1 for further details.

Table 19.1 Constrained Steiner Tree and special cases: References


Problem Results and
Number in Figure 19.1 References
1, 3–5 [8] and [9]
2 [5]
4 [2], [8] and [9]
6 Shortest path problem
7 [2], [12] and Problem ND4 in [4]
8 [8] and [9]
9, 11 Problem ND30 in [4]
10 [12] and Problem ND4 in [4]
12 Breadth First Search

The proof in [12] that Problem 10 in Figure 19.1 is NP-hard renders prob-
lems 1, 3, 4 and 7 NP-hard as well. Similarly, the proofs in [9] show that
unless NP ⊂ DTIME(n^{log log n}), Problems 7 and 8 cannot be approximated
to better than Θ(log n). Hence this non-approximability result carries over
to Problems 1, 3, 4 and 5.
In this chapter, we consider special cases of CSPs and HCSPs where the
edge costs and delays obey the triangle inequality (we call these problems
CSPIs and HCSPIs, respectively). First, in Section 19.2, we show that the
cost of spanning tree solutions for a CSPI and an HCSPI in a given network
G = (V, E) is at most |V | − 1 times the cost of any other spanning tree
solution for G. This implies that any solution is within a |V | − 1 factor of
the optimal solution.
Next, in Section 19.3, we prove that the lower bound for any approxima-
tion algorithm for a CSPI is Θ(log n). Unless NP ⊂ DTIME(n^{log log n}), no

approximation algorithm can guarantee an RA better than this. We show


this by an E-Reduction (explained in Section 19.3.1) from the SET COVER
problem.

19.2 Upper bounds for approximation

We first show that for a given network G = (V, E) with non-negative costs
cij on undirected edges (i, j) ∈ E, the value of a spanning tree is at most
|V | − 1 times that of any other. We shall assume that the underlying graph
is complete without loss of generality (if the network is not complete, we can
add edges to the network with costs that obey the triangle inequality). We
start with a well-known result for such graphs.
Remark 1. For any two vertices i and j in G, where the edge costs of G obey
the triangle inequality, the edge (i, j) is also a least expensive path in G
between these two vertices.

19.2.1 The most expensive edge is at most a minimum


spanning tree

We show here that Lmax , the cost of the most expensive edge in E, is at most
equal to Tmin , the cost of a MinSTG (minimum spanning tree for G). Let
the endpoints of the most expensive edge be s and t. Let L1 be the cost of
the s − t path using the edges in MinSTG . From Remark 1, it follows that
cst = Lmax ≤ L1 . Since L1 ≤ Tmin , we conclude that Lmax ≤ Tmin .
Remark 2. In a network G where the edge costs obey the triangle inequality,
the cost of the most expensive edge in G is at most the cost of a minimum
spanning tree of G.

19.2.2 MaxST is at most (n − 1)MinST

Let Tmax be the value of a MaxSTG (maximum spanning tree of G) where


|V | = n. Since Lmax is the cost of the most expensive edge in G, it follows
that Tmax ≤ (n − 1)Lmax . From Remark 2, Lmax ≤ Tmin . Thus Tmax ≤
(n − 1)Tmin , which is what we set out to show in this section.
Remark 3. For a given undirected network G = (V, E) which satisfies the
triangle inequality, the ratio of the costs of MaxSTG to those of MinSTG has
an upper bound of |V | − 1. Hence the ratio of the costs of any two spanning
trees in G has this upper bound.

Remark 4. For undirected networks where the edge costs obey the triangle
inequality, the performance ratio RA for any approximation algorithm A has
an upper bound of |V | for any version of the spanning tree problem that has
the objective of minimizing the sum of the edge costs in the feasible solution.

In particular, the above remark is true for CSPIs and HCSPIs.
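The bound of Remarks 3 and 4 can be checked numerically. The sketch below (Python, not part of the original chapter) generates random points in the plane, so that the Euclidean edge costs obey the triangle inequality, computes minimum and maximum spanning trees with a simple Prim-style greedy procedure, and verifies that their ratio does not exceed |V| − 1.

    import math, random

    def spanning_tree_cost(dist, maximize=False):
        # Prim-style construction on a complete graph given by a distance matrix;
        # maximization is obtained by negating the weights.
        n = len(dist)
        sign = -1.0 if maximize else 1.0
        in_tree = {0}
        total = 0.0
        while len(in_tree) < n:
            w, v = min((sign * dist[u][x], x)
                       for u in in_tree for x in range(n) if x not in in_tree)
            total += sign * w
            in_tree.add(v)
        return total

    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(12)]
    dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts] for p in pts]

    t_min = spanning_tree_cost(dist)
    t_max = spanning_tree_cost(dist, maximize=True)
    print(t_max / t_min <= len(pts) - 1)   # True, as guaranteed by Remark 3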

19.3 Lower bound for a CSP approximation

An upper bound on the performance ratio RA for any approximation al-


gorithm for a CSPI and an HCSPI is provided in Remark 4. Let us now
turn to proving a lower bound for a CSPI . We show in this section that un-
less NP ⊂ DTIME(n^{log log n}), there can be no heuristic that can guarantee
a performance ratio better than Θ(log n) for a CSPI . We show this by an
E-Reduction from a SET COVER to a CSPI . Since this lower bound holds
for a SET COVER [3, 7], it does so for a CSPI too, via E-Reduction. Recall
that a CSPI is the version of a CSP where the edge costs obey the triangle
inequality.

19.3.1 E-Reductions: Definition

If problem A E-reduces to problem B, then B is as hard to approximate as


A. The formal definition of E-Reduction is as follows.

Definition 1 (E-reduction [6]). A problem A E-reduces to a problem B,


or A ≤E B, if there exist polynomial time functions f and g and a constant
β such that
(1) f maps an instance I of A to an instance J of B; and
(2) g maps solutions T of J to solutions S of I such that

ε(I, S) ≤ βε(J, T ), (19.2)

where the error term ε(I, S) is defined below.

Definition 2 (Error [6]). For minimization problems, a solution S to an


instance I has error ε(I, S) if

V (I, S)/opt(I) = 1 + ε(I, S),    (19.3)

where V (I, S) is the value of a solution S to instance I and opt(I) is the


value of an optimal solution to I.

Another type of reduction used in approximability theory is the L-


Reduction introduced in [11].

19.3.2 SET COVER

A SET COVER instance is defined by a ground set X = {xi |1 ≤ i ≤ p} and


a collection Y = {yj |1 ≤ j ≤ q}, each yj being a subset of X. The goal is to
find a cover Y′ of X such that (a) Y′ ⊆ Y, (b) |Y′|, the cardinality of Y′, is
minimal, and (c) Y′ satisfies

∪_{yj ∈ Y′} yj = X.

19.3.3 Reduction from SET COVER

For a CSPI , a spanning tree needs to be determined such that (1) its cost is
minimal and (2) the sum of the edge delays in the path from a specified vertex
s ∈ V (the source) to every other vertex in V is at most Δ, a non-negative
integer.
We create an instance of a CSPI as follows (see Figure 19.2). For each
xi ∈ X and yj ∈ Y in SET COVER, create a vertex. Create an additional
vertex s. Thus |V | = |X| + |Y | + 1. Since |V | = n, |X| = p and |Y | = q, we


Fig. 19.2 A CSPI instance reduced from SET COVER (not all edges shown).

have n = p+q +1. The edges in E in the instance G = (V, E) of the CSPI are
assigned as in Table 19.2. The costs (delays) assigned to the edges are given
in Column 3 (Column 4) of the table respectively. The graph G is complete.
However, in the interests of clarity, not all edges are shown in Figure 19.2.
Only edge costs are shown in the figure, not edge delays.

Table 19.2 E-Reduction of a SET COVER to a CSPI : Costs and delays of edges in G
Edge Set Definition Cost Delay
E1 {(s, yj )| 1 ≤ j ≤ q} n 1
E2 {(s, xi )| 1 ≤ i ≤ p} n+1 2
E3 {(yj , xi )| xi ∈ yj , 1 ≤ i ≤ p, 1 ≤ j ≤ q} 1 1
E4 {(yj , xi )| xi ∉ yj , 1 ≤ i ≤ p, 1 ≤ j ≤ q} 1 2
E5 {(yi , yj )| 1 ≤ i < j ≤ q} 1 1
E6 {(xi , xj )| 1 ≤ i < j ≤ p} 1 1

Let the delay constraint at all vertices be Δ = 2. Note that both the edge
costs and the edge delays in G obey the triangle inequality. (In most cases,
both the cost and the transmission delay of an edge directly relate to its
length. Further, queueing and switching delays are usually minor. Hence the
cost cij and delay dij of an edge are closely related – they could be directly
proportional, for example. However, there may be instances where the increase
in edge delay is significantly faster than that of the edge cost; for instance,
due to a high degree of congestion in the network, queueing and switching
delays could be far higher than normal.) From Table 19.2, the total set of
edges of the graph G is given by E = E1 ∪ E2 ∪ ··· ∪ E6.
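The construction of Table 19.2 can be written out explicitly. The following sketch (Python; the vertex labels 's', ('y', j) and ('x', i), and the representation of the collection Y as sets of ground-set indices, are choices made here purely for illustration) builds the edge costs and delays of the CSPI instance from a SET COVER instance with |X| = p.

    def reduce_set_cover(p, Y):
        # Y is a list of subsets of {0, ..., p-1}; returns edge costs, edge delays
        # and the delay bound Delta = 2, following Table 19.2.
        q = len(Y)
        n = p + q + 1
        xs = [('x', i) for i in range(p)]
        ys = [('y', j) for j in range(q)]
        cost, delay = {}, {}
        for yj in ys:                                   # E1
            cost[('s', yj)], delay[('s', yj)] = n, 1
        for xi in xs:                                   # E2
            cost[('s', xi)], delay[('s', xi)] = n + 1, 2
        for i, xi in enumerate(xs):                     # E3 and E4
            for j, yj in enumerate(ys):
                cost[(yj, xi)] = 1
                delay[(yj, xi)] = 1 if i in Y[j] else 2
        for a in range(q):                              # E5
            for b in range(a + 1, q):
                cost[(ys[a], ys[b])], delay[(ys[a], ys[b])] = 1, 1
        for a in range(p):                              # E6
            for b in range(a + 1, p):
                cost[(xs[a], xs[b])], delay[(xs[a], xs[b])] = 1, 1
        return cost, delay, 2

    # Example: X = {0, 1, 2, 3}, Y = {{0, 1}, {1, 2}, {2, 3}}
    cost, delay, Delta = reduce_set_cover(4, [{0, 1}, {1, 2}, {2, 3}])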

19.3.4 Feasible Solutions

Recall that the delay constraint is equal to 2. It is possible for the (s, xi )
edges to be utilized in a feasible solution – if they are, they can be replaced,
as shown below, with no increase in cost. Note that there can be no paths
of the form s − xi − yj nor of the form s − yj − xi , where (xi , yj ) ∈ E4 ; that
is, when xi ∈/ yj in the SET COVER problem. This is due to the high delay
(2 units) of such edges. In either of the paths just mentioned, the leaf vertex
would experience a delay of 3.
Suppose for xi = x0 , the edge (s, x0 ) is in the feasible solution S0 re-
turned by a heuristic. This edge can be replaced as follows. For any yj ∈ Y ,
the edge (x0 , yj ) is not in the feasible solution S0 , otherwise the delay at
such a yj would be 3, violating the delay constraint. There are two possible
cases here:
• Suppose there exists a y0 such that x0 ∈ y0 in SET COVER, and edge
(s, y0 ) ∈ S0 . Then replace (s, x0 ) with (x0 , y0 ) to obtain a new solution S1 .
Observe that cost[S0 ] − cost[S1 ] = n units (the cost decreases).

• Alternatively, such a y0 may not exist. In any case, there is at least one
edge (x0 , yj ) in G for some yj ∈ Y , otherwise no feasible solution is ever
possible for the CSPI . This is due to the fact that x0 ∈ yj for at least one
yj ∈ Y in SET COVER. We name this yj , y1 . As per our assumption
for this case, the edge (s, y1 ) is not in S0 . This implies that y1 is a leaf
vertex in S0 , and has another yj (say y2 ) as its parent. To obtain a new
solution S1 , we can delete the edges (y1 , y2 ) and (s, x0 ), and replace them
with (s, y1 ) and (x0 , y1 ). In S1 , the vertex x0 remains a leaf, but y1 is
no longer a leaf. The delay constraints are still obeyed at all vertices.
We have

cost[S0 ] − cost[S1 ] = cost[(s, x0 )] + cost[(y1 , y2 )] − cost[(s, y1 )]


− cost[(x0 , y1 )] = (n + 1) + 1 − n − 1 = 1

(that is, the cost decreases).


To obtain S1 from S0 , at most |X| edges of the form (s, xi ) need to be
replaced, and each replacement takes a constant amount of time. Thus S1
can be obtained from S0 in time Θ(|X|) = O(n), a time polynomial in the
number of elements in the ground set of SET COVER. Once all edges of the
form (s, xi ) have been eliminated from S0 , the resulting feasible solution S1
will be as described below.

19.3.4.1 Structure of S1

The parents of the x’s in a feasible solution (FS) have to be y’s — such y’s should in turn have s
as their parent. Also due to the delay constraint, a path such as s−yj −yr −xi
can also be ruled out for any 1 ≤ i ≤ p and 1 ≤ j < r ≤ q.
Not all y’s need to have s as their parent – some of the y’s can have another
y (say ya , for example) as their parent, as long as ya ’s parent is s (recall the
delay constraint of 2). Suppose we call a y such as ya a covering y and the
rest non-covering y’s. The covering y’s together form a cover to the x’s –
these y’s may or may not be leaves in an FS. The non-covering y’s will be
leaves. In other words, a yj is
• in the cover if s is yj ’s parent, and
• not in the cover otherwise. In such a case, the parent of yj would be a
covering y. The delay constraint forbids a non-covering yj to be the parent
of an xi in an FS.
It is sufficient for all the non-covering y’s to have a common parent. Let the
cover size (the number of covering y’s) be k. If in Figure 19.2, we move the
cover to the left (the y’s can be renumbered in such a way that y1 through
yk cover all elements in X), a feasible solution as described above will look
like the one in Figure 19.3.


Fig. 19.3 Feasible solution for our instance of CSPI (not all edges shown).

Note that it is cheaper for the non-covering y’s to have one of the covering
y’s as their parent, rather than s – cheaper by a factor of n. The spanning
tree (the feasible solution in Figure 19.3) includes the following:
• (s, yj ) edges: k in number, each with a cost of n,
• edges of the form (yi , yj ), where (s, yi ) is part of the FS, and (s, yj ) is not
(in other words, yi is in the cover and yj is not): q − k such edges, each
with unit cost,
• (xi , yj ) edges: p in number, each with unit cost.
Thus the cost of the spanning tree of Figure 19.3 equals

k(n) + (q − k)(1) + (p)(1) = kn + n − k − 1,

since n = p + q + 1. The cost can then be described as a function C(k),


where
C(k) = kn + n − k − 1, 1 ≤ k ≤ q. (19.4)
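Since C(k + 1) − C(k) = n − 1 > 0, the cost in (19.4) increases strictly with the cover size, so minimizing the tree cost is equivalent to minimizing k. A short illustrative check (not part of the original chapter):

    def C(k, n):
        # Equation (19.4): cost of a feasible solution whose cover has size k
        return k * n + n - k - 1

    n = 16                                    # e.g. p = 10, q = 5, so n = p + q + 1
    print([C(k, n) for k in range(1, 6)])     # [30, 45, 60, 75, 90]: strictly increasing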

19.3.4.2 Correspondence Between Feasible Solutions

Note that there is a one-to-one correspondence between feasible solutions in


SET COVER and S1 for our instance of the CSPI . A set of covering y’s in
our CSPI instance can also be used as a cover in SET COVER. In the other
direction, a cover in SET COVER can be transformed to a set of covering y’s
in our CSPI instance; these will have s as their parent in a FS. The other
(non-covering) y’s will have one of the covering y’s as their parent, and the
x’s will have the covering y’s as their parent(s).

For this reduction, I is a SET COVER instance, S is a solution to I, J is


our instance of CSPI corresponding to I, and T is a solution to J. From the
above argument, we have the following lemma.
Lemma 1. A SET COVER instance I has a solution with cardinality k (1 ≤
k ≤ q) iff the corresponding CSPI instance J has a solution with a total cost
C(k).
For any approximation algorithm for CSPI to obtain the least possible
cover size, the costs C(k) should monotonically increase from C(1) through
C(q), and this is indeed the case with Equation (19.4). Further note that
the reduction from SET COVER can be carried out in polynomial time. To
complete the proof that this is an E-reduction, we only need to show that
the error condition (19.2) is satisfied for some constant β.

19.3.5 Proof of E-Reduction

Let k (≤ q) be the value of any feasible solution to SET COVER, and l be


that of the optimal solution. Obviously, l ≤ k ≤ q. Therefore
ε(I, S) = k/l − 1 = (k − l)/l
and from (19.4),

ε(J, T ) = C(k)/C(l) − 1 = (kn + n − k − 1)/(ln + n − l − 1) − 1 = (n − 1)(k − l)/(nl + n − l − 1).

We need to find a constant β such that βε(J, T ) ≥ ε(I, S), or

β (n − 1)(k − l)/(nl + n − l − 1) ≥ (k − l)/l,

or

β ≥ 1 + 1/l.    (19.5)

The second term in (19.5), 1/l, is bounded by 0 < 1/l ≤ 1. Thus it is


sufficient to find a β ≥ 2. Let us set β = 2. This completes the proof of
E-Reduction. Thus we have shown that the following theorem holds.
Theorem 1. SET COVER E-reduces to a CSPI .
Corollary 1. A CSPI does not belong to APX. Further, CSPI cannot be
approximated to within Θ(log n) unless NP ⊂ DTIME(n^{log log n}), where n is
the number of nodes in the network.

19.4 Conclusions

From Remark 4, it follows that certain versions of the minimum spanning tree
problem that are of interest in data networking (which need not necessarily
be single-source) have an approximation upper bound of |V |, the number of
nodes in the network, when the edge costs obey the triangle inequality. In
particular,
• the hop-constrained version HCSPI ,
• the delay-constrained version CSPI , and
• the diameter-constrained versions (weighted as well as unweighted)
have an upper bound of |V | on the performance ratio of any approximation
algorithm.
The result from Section 19.3 extends to the case of constrained Steiner
trees which satisfy the triangle inequality, since CSPI is a special case of such
problems for Steiner trees. Specifically, we can conclude that the following
theorem holds.
Theorem 2. The following single-source problems with edge costs obeying the
triangle inequality cannot have an approximation heuristic A that can guaran-
tee a performance ratio RA better than Θ(log n) unless NP ⊂ DTIME(n^{log log n}),
and hence these problems cannot be in APX:
• the delay-constrained spanning tree problem CSPI , and
• the delay-constrained Steiner tree CSTI .
The CSTI is the triangle-inequality version of Problem 1 in Figure 19.1.
Both the edge costs and delays in the delay-constrained problem versions
mentioned in this section need to obey the triangle inequality.

Acknowledgments The author benefited from discussions with Matt Stallmann of North
Carolina State University. Support from the Sir Ross and Sir Keith Smith Foundation is
gratefully acknowledged. The comments from the referee were particularly helpful. Since
the early 1990s, the online compendium of Crescenzi and Kann, and more recently, their
book [1], has been a great help to the research community.

References

1. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti–Spaccamela and


M. Protasi, Combinatorial Optimization Problems and their Approximability Prop-
erties (Springer-Verlag, Berlin, 1999).
2. J. Bar–Ilan, G. Kortsarz and D. Peleg, Generalized submodular cover problems and
applications, Theor. Comp. Sci. 250 (2001), 179–200.
3. U. Feige, A threshold of ln n for approximating set cover, JACM 45 (1998), 634–652.
4. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory
of NP–Completeness (Freeman, New York, 1979).

5. R. Hassin, Approximation schemes for the restricted shortest path problem, Math.
Oper. Res. 17 (1992), 36–42.
6. S. Khanna, R. Motwani, M. Sudan and U. Vazirani, On syntactic versus computational
views of approximability, SIAM J. Comput. 28 (1998), 164–191.
7. C. Lund and M. Yannakakis, On the hardness of approximating minimization prob-
lems, JACM 41 (1994), 960–981.
8. P. Manyem, Routing Problems in Multicast Networks, PhD thesis, North Carolina
State University, Raleigh, NC, USA, 1996.
9. P. Manyem and M.F.M. Stallmann, Approximation results in multicasting, Technical
Report 312, Operations Research, NC State University, Raleigh, NC, 27695–7913,
USA, 1996.
10. M. V. Marathe, R. Ravi, R. Sundaram, S. S. Ravi, D. J. Rosenkrantz and H. B. Hunt
III, Bicriteria network design problems, J. Alg. 28 (1998), 142–171.
11. C.H. Papadimitriou, Computational Complexity (Addison-Wesley, Reading, MA,
1994).
12. H. F. Salama, Y. Viniotis and D. S. Reeves, The delay–constrained minimum span-
ning tree problem, in Second IEEE Symposium on Computers and Communications
(ISCC’97), 1997.
Chapter 20
Parallel line search

T. C. Peachey, D. Abramson and A. Lewis

Abstract We consider the well-known line search algorithm that iteratively


refines the search interval by subdivision and bracketing the optimum. In our
applications, evaluations of the objective function typically require minutes
or hours, so it becomes attractive to use more than the standard three steps
in the subdivision, performing the evaluations in parallel. A statistical model
for this scenario is presented giving the total execution time T in terms of
the number of steps k and the probability distribution for the individual
evaluation times. Both the model and extensive simulations show that the
expected value of T does not fall monotonically with k; in fact, more steps may
significantly increase the execution time. We propose heuristics for speeding
convergence by continuing to the next iteration before all evaluations are
complete. Simulations are used to estimate the speedup achieved.

Key words: Line search, parallel computation

20.1 Line searches

A line search involves finding the minimal value of a real function g of a single
real variable x. We attempt to locate the minimizing argument to within a
“tolerance.” Formally, given an interval [a, b] ⊂ IR, a function g : [a, b] → IR

T. C. Peachey
School of Computer Science and Software Engineering, Monash University, Clayton,
VIC 3800, AUSTRALIA
D. Abramson
School of Computer Science and Software Engineering, Monash University, Clayton,
VIC 3800, AUSTRALIA
A. Lewis
HPC Facility, Griffith University, Nathan, QLD 4111, AUSTRALIA


and a tolerance d, we require p, q such that x∗ ∈ [p, q] ⊂ [a, b] where g is


minimal at x∗ and q − p ≤ d. We assume that the derivative, if it exists, is
unknown.
Apart from their use in one-dimensional optimization, line searches are
used in optimization on domains of higher dimension. For example, the quasi-
Newton search methods use repeated cycles of determining the search direc-
tion and then performing a line search in that direction.
The line search algorithm is one of repeated subdivision of the interval
and restriction to a subinterval. It can be summarized as follows:
1. Enter initial interval [a, b] and tolerance d.
2. Set p = a, q = b.
3. Subdivide [p, q] with points p = x0 < x1 < x2 < ... < xk = q, where k ≥ 3.
4. Compute gi = g(xi ) for i = 0, 1, 2, . . . , k.
5. Select xm : gm = mini gi , the point where g is least.
6. If m = 0 replace p by x0 and q by x1 ,
else if m = k replace p by xk−1 and q by xk ,
else replace p by xm−1 and q by xm+1 .
7. If q − p ≤ d then return (p, q),
else go to Step 3.
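The steps above translate directly into code. The following is a minimal sequential sketch (Python, not part of the original chapter); in the intended setting, Step 4 would dispatch the k + 1 evaluations in parallel, and values already available from the previous iteration would not be recomputed.

    def line_search(g, a, b, d, k=3):
        # Bracket the minimizer of g on [a, b] to within tolerance d,
        # using k equally spaced subdivision steps per iteration (Steps 1-7).
        p, q = a, b                                           # Step 2
        while q - p > d:                                      # Step 7
            xs = [p + i * (q - p) / k for i in range(k + 1)]  # Step 3
            gs = [g(x) for x in xs]                           # Step 4 (parallel in practice)
            m = min(range(k + 1), key=lambda i: gs[i])        # Step 5
            if m == 0:                                        # Step 6
                p, q = xs[0], xs[1]
            elif m == k:
                p, q = xs[k - 1], xs[k]
            else:
                p, q = xs[m - 1], xs[m + 1]
        return p, q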
Clearly the algorithm will terminate if sup(xi −xi−1 )/(q−p) < 1/2, where the
supremum is taken over both steps in the line search and iterations of that
search. The process yields an interval [p, q] which is guaranteed to contain
the minimum if g is unimodal on [a, b]. Usually k is 3 as this is more efficient
in terms of the number of function evaluations. It has long been known that
the “Fibonacci search” [5] will minimize the number of function evaluations
in the worst case. If g is approximately quadratic near the minimum then
alternative methods such as Powell’s [6] can be expected to be more efficient.
We are concerned with applications where each function evaluation may
take at least several minutes on a fast processor. For example, g may represent
aerodynamic drag on an object where x is some shape parameter, so a flow
simulation would be required for each function evaluation. Further, we assume
that batches of evaluations may be performed concurrently, on a cluster of
computers or using the resources of the global grid. Clearly in such cases the
speed of convergence may be improved by using more than three steps in each
subdivision. These “parallel line searches” are the subject of this chapter.

20.2 Nimrod/O

Nimrod/O [1, 2] is an optimization package designed for the scenario de-


scribed above, that is, long evaluation times employing multiple processors.
The user prepares a “schedule file” such as the one in Figure 20.1. This speci-
fies the problem parameters, any constraints linking them, how the objective

Fig. 20.1 A sample configuration file:

parameter alpha float range from 1 to 15
parameter tcmax float range from 0.5 to 1.5
parameter cmax float range from 0.5 to 1.0
constraint alpha >= tcmax + 2.0*cmax
task main
copy * node:.
node:substitute skeleton foil.inp
node:execute run.all
copy node:obj.dat output.$jobname
endtask

method simplex
starts 5
starting points random
tolerance 0.01
endstarts
endmethod
method bfgs
starts 5
starting points random
tolerance 0.01
line steps 8
endstarts
endmethod

function is to be evaluated and the optimization algorithm to be used. This


example uses two algorithms, the downhill simplex and the method of Broy-
den, Fletcher, Goldfarb and Shanno (BFGS), each run 5 times with different
starting points.
The architecture of Nimrod/O is shown in Figure 20.2. Rectangles rep-
resent separate processes. The Controller reads the schedule and launches a
process for each optimization. When an optimization requires a set of objec-
tive evaluations, it first checks the Cache to determine which jobs have already
been run. Jobs that are new are sent to the dispatcher which is either the
“Nimrod” system [1] or its commercial version “enFuzion.” The dispatcher
may run evaluations on the local machine, or on a cluster of machines or
perhaps on the world grid.
Note that this architecture allows separate optimizations to be run in par-
allel. Within each optimization we have endeavored to speed the algorithms
by employing parallel evaluations where possible. For example our implemen-
tation of the BFGS algorithm uses a parallel line search and also concurrent
evaluations in the determination of the search direction; we call this imple-
mentation “Parallel-BFGS.”

Fig. 20.2 Architecture of Nimrod/O.

Currently Nimrod/O is being applied in three areas:


• Design of an aerofoil. Here a two-dimensional aerofoil is specified in terms
of three shape parameters. A FLUENT simulation is used to compute
the flowfield around the aerofoil and compute lift and drag. The design
problem is to determine the shape parameters that maximize the ratio of
lift to drag.
• Optimal fatigue life. Finite element models are used to predict the life
of mechanical components with pre-existing cracks under a cyclical stress
regime. We require the component shape that maximizes this life.
• Image compression. We consider a compression method based on the mam-
malian vision system which involves up to 96 parameters. The parameters
are to be selected to minimize the compression ratio.
It was noticed during the aerofoil study that the execution time for eval-
uations was bimodal. Most jobs took about 30 minutes but occasional ones
required between 3 and 4 hours. Consequently some of the line searches had
completed all but one of the evaluations in less than 40 minutes and then
required about 3 more hours to finish the last one. (There was no obvious
pattern to the values of the domain that gave rise to long execution times.)
This raised two issues:
A: A smaller number of steps in the line search may achieve faster
convergence as fewer jobs are less likely to include an exceptionally
long one.
B: Faster completion may be provided by a mechanism for aborting longer
jobs and proceeding to a subinterval identified by the completed jobs.
We consider Hypothesis A in Section 20.3 and B in Section 20.4.

20.3 Execution time

20.3.1 A model for execution time

This section presents a model for the execution time for a line search, in
terms of the number of steps used.
Suppose that each iteration of the line search uses k ≥ 3 steps; we assume
that the points are equally spaced. Let l be the length of the original search
interval. Each iteration reduces the length of the current domain to a proportion 2/k
of the previous one (or 1/k if the minimum happens to fall at an end point). Let
r be the maximum number of iterations required to reduce the length to the tolerance d, so r
is the least integer such that l(2/k)^r ≤ d. Hence

r = ceil( log(l/d) / log(k/2) ),    (20.1)

where ceil(x) signifies the least integer that is not less than x. We write Ti
for the evaluation time for the ith subdivision point and assume that all the
Ti have the same probability density function f (t) and distribution function
F (t). We write s for the number of evaluations required in an iteration. Note
that, after the first iteration, subsequent ones will not require evaluations at
the end points of the subinterval. Further, if k is even and the best point
in the previous interval was internal, then the objective at the midpoint of
the current interval will have been found in the previous iteration. So we
approximate s by k − 2 if k is even and k − 1 if k is odd. As these evaluations
are performed in parallel, the evaluation time for one iteration is B = maxi Ti .
For the scenario discussed above these times are much larger than the times
required for selection of the subdivision points and comparison of the values
there. So we assume that the time for each iteration is just B. We assume
also that the Ti are statistically independent. Under this condition, see for
example [3], the distribution function for B is F(t)^s. Thus the mean time for
completion of a batch is approximately

M = ∫_0^∞ t (d/dt)[F(t)^s] dt.    (20.2)

Hence the expected time for the complete optimization is


E = M r = ceil( log(l/d) / log(k/2) ) ∫_0^∞ t (d/dt)[F(t)^s] dt.    (20.3)
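A minimal numerical sketch of (20.1)–(20.3) follows (Python, illustrative only). It uses the equivalent tail-integral form E[B] = ∫_0^∞ (1 − F(t)^s) dt, valid for non-negative evaluation times; the exponential distribution, the truncation point and the step size of the numerical integration are all example choices, not prescriptions of the chapter.

    import math

    def num_iterations(l, d, k):
        # Equation (20.1)
        return math.ceil(math.log(l / d) / math.log(k / 2))

    def mean_batch_time(F, s, t_max=50.0, steps=50000):
        # Equation (20.2), written as the integral of 1 - F(t)^s over [0, infinity)
        h = t_max / steps
        return sum((1.0 - F(i * h) ** s) * h for i in range(steps))

    def expected_total_time(F, l, d, k):
        s = k - 2 if k % 2 == 0 else k - 1       # evaluations per iteration, as above
        return num_iterations(l, d, k) * mean_batch_time(F, s)   # Equation (20.3)

    F_exp = lambda t: 1.0 - math.exp(-2.0 * t)   # exponential job times, rate 2 (an example)
    for k in (3, 5, 10, 20, 40):
        print(k, round(expected_total_time(F_exp, 1.0, 0.001, k), 2))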

20.3.2 Evaluation time a Bernoulli variate


As a model of the bimodal distribution encountered with the wing flow ex-
periments, consider the case where the execution time for a single job has a
discrete distribution

f (t) = aδ(t − x) + (1 − a)δ(t − y), (20.4)

where δ is the Dirac delta and a, x and y are constants with 0 < a < 1 and
x < y. Then (20.2) becomes

M = x a^s + y(1 − a^s)    (20.5)

and (20.3) becomes


E = [x a^s + y(1 − a^s)] ceil( log(l/d) / log(k/2) ).    (20.6)

Graphs of these functions are shown in Figure 20.3. Figure 20.3(a) shows
how r decreases in a piecewise manner. Figure 20.3(b) gives M for the case
x = 1, y = 8, l/d = 1000 and a = 0.9. Figure 20.3(c) shows E, the product
of r and M . Since M increases and r is piecewise constant, E increases while
r is constant.

Fig. 20.3 Performance with Bernoulli job times: (a) number of iterations, r; (b) expected time per iteration, M; (c) expected time for line search, E.

20.3.3 Simulations of evaluation time

For other distributions of job times, computation of (20.2) becomes difficult,
so we have performed simulations instead. The line search was performed on
the function g(x) = e^(−x) sin(20x) on the domain [0, 1], shown in Figure 20.4.
This function has four local minima with a global minimum at x ≈ 0.2331.
The tolerance used was 0.001. Job times were generated randomly from (a) an
exponential distribution with parameter 2 and (b) a rectangular distribution
on [0, 1].
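The simulation itself is straightforward to reproduce in outline. The Python sketch below runs a full (uninterrupted) parallel line search with random job times; for simplicity it re-evaluates all k + 1 points in every iteration rather than the s ≈ k − 1 of the model above, and it reads "exponential with parameter 2" as rate 2 (mean 0.5). Both choices are assumptions of this sketch, not details taken from the chapter.

import numpy as np

rng = np.random.default_rng(1)
g = lambda x: np.exp(-x) * np.sin(20 * x)          # test function of Figure 20.4

def run_search(k, job_sampler, a=0.0, b=1.0, tol=1e-3):
    total_time = 0.0
    while b - a > tol:
        xs = np.linspace(a, b, k + 1)              # k + 1 equally spaced points
        total_time += job_sampler(k + 1).max()     # parallel batch: wait for the slowest job
        m = int(np.argmin(g(xs)))                  # best point of this iteration
        a, b = xs[max(m - 1, 0)], xs[min(m + 1, k)]
    return total_time, 0.5 * (a + b)

def summarise(k, job_sampler, runs=1000):
    times, hits = [], 0
    for _ in range(runs):
        t, xmin = run_search(k, job_sampler)
        times.append(t)
        hits += abs(xmin - 0.2331) < 1e-2          # did the search find the global minimum?
    return np.mean(times), hits / runs

expo = lambda n: rng.exponential(0.5, n)           # "parameter 2" read as rate 2
rect = lambda n: rng.uniform(0.0, 1.0, n)
for k in (5, 10, 20, 40):
    print(k, summarise(k, expo), summarise(k, rect))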

Fig. 20.4 Test function g(x).

Figure 20.5(a) shows the mean total execution time, averaged over 10,000
runs, plotted against k for the exponential distribution. Each point is shown
with error bars enclosing three standard errors. Figure 20.5(b) does the same
for the rectangular distribution. Similar results were obtained for a wide
variety of tolerance values.
For some simulations the line search failed to locate the global minimum,
converging on a local minimum instead. Figure 20.5(c) shows the “effective-
ness,” the proportion of runs that achieved the global minimum. Here the
algorithm is deterministic so effectiveness for a given k is either 0 or 1. In
the next section the search will depend on the order of arrival of jobs and
effectiveness will be fractional.

20.3.4 Conclusions

The preceding results show that increasing the number of steps in a parallel
line search may be counter-productive; increases in k may produce consider-
able increases in E. For this to occur there must of course be variability in
the job times. Note that Figure 20.5(b) shows much less increase than does
Figure 20.5(a), although the mean and variance of the job times are simi-
lar. The significant factor is the probability of job times considerably
larger than the mean.

Fig. 20.5 Results of simulations: (a) E versus k, exponential distribution job
times; (b) E versus k, rectangular distribution job times; (c) effectiveness
versus k.

A typical user of the line search algorithm will not have information on the
distribution of job times. However the total time E(k) has local minima at
points where r(k) decreases and these values can be predicted from knowledge
of just the initial interval length l and the tolerance d. Consideration of (20.1)
shows that r falls to a value ρ at k = ceil( 2 (l/d)^(1/ρ) ). This can be used to
compute the number of steps k for a desired number of iterations ρ.
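As a small illustration of this rule, under the assumption l/d = 1000 used earlier, the following lines compute the smallest k achieving a desired number of iterations ρ.

import math

def steps_for_iterations(rho, l_over_d=1000.0):
    # Smallest k with r(k) = rho, from l*(2/k)**rho <= d, i.e. k >= 2*(l/d)**(1/rho).
    return math.ceil(2 * l_over_d ** (1.0 / rho))

for rho in range(2, 8):
    print(rho, steps_for_iterations(rho))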
Our analysis has assumed that evaluation times are independent. If these
are dependent, one may expect positive autocorrelation on the parameter
space. This would lead to reduced variation in the later iterations of the line
search which in turn would reduce growth in E between jumps. When the
objective function is continuous but not unimodal we expect a priori that
increasing the value of k makes attaining the global minimum more likely.
Figure 20.5(c) supports this.

20.4 Accelerating convergence by incomplete iterations

20.4.1 Strategies for aborting jobs

We consider strategies for proceeding to the next iteration of a line search
before the evaluations for all points in the current iteration are complete.
Three heuristics are proposed.
Figure 20.6(a) illustrates a situation where 5 of the 7 evaluations of a
function g(x) are complete. The minimum so far occurs at x = 3 and the
neighbors of that point have been evaluated. If g is unimodal then clearly
the minimum is in the range [2, 4]. Thus the remaining evaluations may be
aborted and the line search can proceed to the next stage. This leads to the
following algorithm for one iteration of a line search:

Fig. 20.6 Incomplete evaluation points: (a) Strategy 1; (b) Strategy 2.

Strategy 1.
Suppose an iteration involves determination of objective values
g0 , g1 , . . . , gk . At any time suppose that S represents the set of the gi that
have been completed by parallel evaluation. When each new value gj arrives:
add it to the set S
determine gm , the least value in S
if 0 < m < k and gm−1 , gm+1 ∈ S return [xm−1 , xm+1 ]
else if m = 0 and g1 ∈ S return [x0 , x1 ]
else if m = k and gk−1 ∈ S return [xk−1 , xk ]
continue
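A minimal Python sketch of this decision rule is given below. The function name and the representation of the completed set S as a dictionary are illustrative choices, not part of the original description; the caller would invoke the check each time a job finishes and abort the outstanding jobs once a subinterval is returned.

def strategy1_check(done, xs, k):
    # done: {index: g value} for the evaluations completed so far (non-empty);
    # xs: the k + 1 subdivision points of the current iteration.
    m = min(done, key=done.get)                    # index of the least value in S
    if 0 < m < k and (m - 1) in done and (m + 1) in done:
        return xs[m - 1], xs[m + 1]                # interior minimum bracketed
    if m == 0 and 1 in done:
        return xs[0], xs[1]
    if m == k and (k - 1) in done:
        return xs[k - 1], xs[k]
    return None                                    # keep waiting for more jobs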

This approach can be extended to returning a greater interval than
that provided by the immediate neighbors of the minimum point. In Fig-
ure 20.6(b), if g is unimodal then the minimum is in the interval [2, 5]; it may
be worthwhile terminating the iteration with this interval. Many variants of
this idea are possible. We investigate only the following.
Strategy 2.
Construct S as in Strategy 1. When each new value gj arrives:
add it to the set S
determine gm , the least value in S
if 0 < m < k
if gm−1 ,gm+1 ∈ S then return [xm−1 , xm+1 ]
else if gm−1 ,gm+2 ∈ S return [xm−1 , xm+2 ]
else if gm−2 ,gm+1 ∈ S return [xm−2 , xm+1 ]
else if m = 0
if g1 ∈ S then return [x0 , x1 ]
else if g2 ∈ S return [x0 , x2 ]
else if m = k
if gk−1 ∈ S then return [xk−1 , xk ]
else if gk−2 ∈ S return [xk−2 , xk ]
continue
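The corresponding rule can be sketched in Python as below; as with the Strategy 1 sketch, the dictionary representation of S is an illustrative choice.

def strategy2_check(done, xs, k):
    # done: {index: g value} for completed evaluations; xs: the k + 1 points.
    m = min(done, key=done.get)
    if 0 < m < k:
        if (m - 1) in done and (m + 1) in done:
            return xs[m - 1], xs[m + 1]
        if (m - 1) in done and m + 2 <= k and (m + 2) in done:
            return xs[m - 1], xs[m + 2]
        if m - 2 >= 0 and (m - 2) in done and (m + 1) in done:
            return xs[m - 2], xs[m + 1]
    elif m == 0:
        if 1 in done:
            return xs[0], xs[1]
        if 2 in done:
            return xs[0], xs[2]
    else:                                          # m == k
        if (k - 1) in done:
            return xs[k - 1], xs[k]
        if (k - 2) in done:
            return xs[k - 2], xs[k]
    return None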
If sufficient processors are available it may be advantageous to both con-
tinue an iteration and to explore a subinterval identified as likely to contain
the minimum. This leads to our third heuristic.
Strategy 3.
Use Strategy 2 to identify the subinterval and then start an iteration based
on that interval, but also continue with the original iteration to completion.
If later the original iteration finds a minimum better than any so far in the
new iteration then the algorithm will “backtrack,” abort the new iteration and
start another iteration based on this improved minimum.
This is essentially a form of speculative computing, see [4]. Recursion allows
a simple implementation.
We also considered the effect of applying Strategies 1–3 only after the
penultimate job has arrived, that is, when k of the k+1 evaluations have been
completed. These heuristics will be denoted by 1p, 2p and 3p, respectively.
A full search, completing each iteration before proceeding to the next, is
denoted by F.

20.4.2 Experimental results

The strategies were implemented for line searches on the test function of
Figure 20.4 with tolerance 0.001. For each k from 3 to 70, the search process
was simulated 10,000 times, with execution times selected randomly from
some probability distribution.

Fig. 20.7 Strategy 1 with exponential distribution of job times: (a) execution
times and (b) effectiveness, for Strategy 1, Strategy 1p and a full search,
plotted against the number of steps in the line search.
Strategies 1 and 1p were applied using exponential evaluation times with
parameter λ = 2. Figure 20.7(a) shows the mean execution times and Figure
20.7(b) the effectiveness. In each case the results for these strategies are
compared with those for a full search. Figure 20.8 shows times for the same
range of strategies but with evaluation times from a rectangular distribution
over the interval [0, 1].
These experiments were repeated with the other strategies. Figure 20.9
shows results for the same method as Figure 20.7 but with Strategies 1 and
1p replaced by 2 and 2p. Similarly Figure 20.10 shows results for Strategies
3 and 3p.

12 "strategy_1"
mean execution time

10 "strategy_1p"
8 "full_search"

6
4
2
Fig. 20.8 Strategy 1 with 0
rectangular distribution of 0 10 20 30 40 50 60 70
job times. steps in line search
380 T.C. Peachey et al.

Fig. 20.9 Results for Strategy 2: (a) execution time and (b) effectiveness, for
Strategy 2, Strategy 2p and a full search, plotted against the number of steps
in the line search.

Fig. 20.10 Results for Strategy 3: (a) execution time and (b) effectiveness, for
Strategy 3, Strategy 3p and a full search, plotted against the number of steps
in the line search.

20.4.3 Conclusions

For job times with an exponential distribution, Strategy 1 shows a speedup
of between 2 and 3 for k ≥ 12, and less for k < 12. This increased
speed is at the expense of a deterioration in the effectiveness of the search.
Strategy 1p is intermediate in performance between F and Strategy 1 for
both execution times and effectiveness. The experiments with a rectangular
distribution of job times showed less speedup, as there was less increase in
M with k. Strategy 2 gave more speedup than that of Strategy 1 but with a
further loss of effectiveness. Strategy 3 gave a speedup almost identical to that
of Strategy 2 but with improved effectiveness. Hence this strategy is to be
preferred when occasional long jobs are delaying execution. This advantage
is at the expense of the need for extra processors when two iterations are
running concurrently.

References

1. D. Abramson, I. Foster, J. Giddy, A. Lewis, R. Sosic, R. Sutherst and N. White,
The Nimrod computational workbench: A case study in desktop metacomputing, in
Computational Techniques and Applications Conference, Melbourne, July, 1995.
2. D. Abramson, A. Lewis and T. C. Peachey, An automatic design optimization tool
and its application to computational fluid dynamics, in Supercomputing 2001, Denver,
November, 2001.
3. A. O. Allen, Probability, Statistics and Queueing Theory (Academic Press, New York,
1978).
4. F. W. Burton, Speculative computation, parallelism and functional programming,
IEEE Trans. Comput., C–34 (1985), 1190–1193.
5. J. Kiefer, Sequential minimax search for a maximum, Proc. AMS 4 (1953), 502–506.
6. M. J. D. Powell, An efficient method of finding the minimum of a function of several
variables without calculating derivatives, Comput. J., 7 (1964), 155–162.
Chapter 21
Alternative Mathematical
Programming Models: A Case
for a Coal Blending Decision Process

Ruhul A. Sarker

Abstract Real-world problems are complex. It is not always feasible to in-
clude all aspects of reality in the model of a problem. In most cases, we
deal with a simplified version of the problem that contains only some as-
pects of reality. Thus a problem can be modeled in a number of different
ways depending on the portion of reality to be included or excluded. In this
chapter, we address the alternative mathematical programming formulation
approaches for a real-world coal-blending problem under different scenarios.
The complexity of formulation and solution approaches, quality of solutions,
and solution implementation difficulties for these models are compared and
analyzed. Choice of the most appropriate model is suggested.

Key words: Coal blending, alternative modeling, mathematical program-
ming, linear programming, nonlinear programming

21.1 Introduction

In this chapter, we consider a real-world coal-blending problem. Coals are
extracted and upgraded for the customers. The raw coals are known as run of
mine (ROM) in the coal mining industry. Each coal has its own typical quality
specifications. Coal quality is measured in terms of percent of ash, sulfur
and moisture, and BTU content per pound, as well as having metallurgical
properties. BTU content per pound expresses the heating value of coal. Higher
ash content lowers the BTU content value. Sulfur in coal results in sulfur di-
oxide emission that pollutes the environment. Water particles in the coal

Ruhul A. Sarker
School of Information Technology and Electrical Engineering, UNSW@ADFA,
Australian Defence Force Academy, Canberra, ACT 2600, AUSTRALIA
e-mail: [email protected]


absorb heat to evaporate and then superheat. The customers specify the
quality parameters (maximum percentage of ash and sulfur, and minimum
BTU/pound) for their coals.
The Coal Company considered in the present research currently operates
three mines. These mines differ greatly in their cost of production and coal
quality. Mine-3 is a relatively low cost mine, but its coal contains high sulfur
and does not have satisfactory metallurgical properties. On the other hand,
it contains reasonably low ash. Mine-1 is the highest cost mine, and the coal
contains relatively high ash (stone) and medium sulfur but it has excellent
metallurgical properties. Mine-2 is the largest and lowest cost mine. Its coal
contains higher ash and sulfur than mine-1 coal, but it has good metallurgical
properties. Because of the coal properties, only mine-1 and mine-2 coals are
used in the preparation of metallurgical coal.
Preparation and blending are the two coal upgrading and processing facili-
ties. Coal preparation (washing) is a process of removing physical impurities.
The process involves several different operations, including crushing (to cre-
ate a size distribution), screening (to separate sizes) and separators (mainly
cyclones, to remove the physical impurities). The objective of running a coal
preparation plant is to maximize the revenue from clean coal while removing
the undesirable impurities.
The processing of ROM coal from mine-3 in the preparation plant does not
improve the quality of coal with a reasonable yield. Therefore, the involve-
ment of the preparation plant with this low quality ROM coal means a lower
financial performance for the company. The customers do not accept these
high sulfur coals for their plant operations because of environmental pollution
restriction. The conversion of low quality ROM coals to a minimum accept-
able quality level will mean a better financial performance for the company.
A blending process provides an opportunity of quality improvement.
Blending is a common process in the Coal Industry. Blending allows up-
grading the low quality run of mine coals by mixing with good quality
coals. Furthermore, supplying the good quality ROM coals to the customers,
through blending, can reduce the cost of production, because it saves (i) the
cost of washing and (ii) lost BTU from the refuses of the preparation plant,
and (iii) it also eliminates the need for capital investment in washing facil-
ities. Most of the thermal coal customers accept blended products if they
satisfy their quality requirements.
In the blending process, the problem is to determine the quantity required
from each run-of-mine coals and washed products that maximizes the revenue
but satisfies the quality constraints of the customers.
A single period coal-blending problem can be formulated as a simple
linear programming model ([Gershon, 1986], [Hooban and Camozzo, 1981],
[Bott and Badiozamani, 1982], [Gunn, 1988], [Gunn et al., 1989], [Gunn and
Chwialkowska, 1989], and [Gunn and Rutherford, 1990]). For the multiperiod
case ([Sarker, 1990], [Sarker, 1991], [Sarker and Gunn, 1990], [Sarker, 1994],
[Sarker and Gunn, 1991], [Sarker and Gunn, 1997], [Sarker and Gunn, 1995],

[Sarker and Gunn, 1994], and [Sarker, 2003]), the modeling process depends
on the decision whether to carry inventory of run-of-mine (ROM) or of
blended coal (final product). The multiperiod coal-blending problem with
inventory of blended coal is a nonlinear program. On the other hand, the
multiperiod blending problem with inventory of ROM can be formulated as
a linear program. In this case, a number of alternative LP models can be
developed allowing the use of ROM inventory in n future periods.
A large-scale LP is solvable using any of the standard LP packages. How-
ever, a large-scale nonlinear program is complex and is not easy to solve. The
current model is an especially structured nonlinear program, and is solved
using a simple SLP (Successive Linear Programming) algorithm developed by
[Sarker and Gunn, 1997]. The solutions of some multiperiod LP models are
not practically feasible for several technical reasons. The quality of solutions,
complexity of formulation and solution approaches, and solution implemen-
tation difficulties for these models are compared and analyzed. A choice of
the most appropriate model is suggested.
The chapter is organized as follows. Following the introduction, we discuss
four alternative models for coal blending and upgradation. The flexibility
of these models is analyzed in Section 21.3. Section 21.4 discusses the problem
sizes and the computational time required. The objective function values and the
nature of fluctuating situations for the test problems are presented in Section
21.5. The selection criteria for choosing the most appropriate model are discussed
in Section 21.6 and the conclusions are drawn in Section 21.7.

21.2 Mathematical programming models

The single period LP model is formulated to determine an optimal strategy
for coal blending, washing and customer allocation so as to transform the
available run of mine coal into products within customer market specifica-
tions at maximum overall profit. The constraints considered in this model
are the maximum and minimum allowable limits of ash, sulfur and BTU con-
tent, production limits, demand requirements, etc. In the multiperiod case,
the objective function and constraint types are similar to the single period
model. However, the inventory of ROM and/or blended product is used as
the linking mechanism from one period to the next. The problem formulation
when considering inventory of ROM becomes a linear program, whereas with
inventory of blended product it becomes a nonlinear program. Any ROM
extracted in period t can be used in the blending process in any or all future
periods. This assumption controls the size of LP in the multiperiod formula-
tion. The planning horizon considered is 12 months, in 1-month-long periods.
We consider four different models in this chapter. These models are defined
as follows:
• SPM: Single period model.
• MNM: Multiperiod nonlinear model.

• MLM: Multiperiod linear model.
• ULM: Upper bound linear model.
To give an idea about the mathematical models of coal blending and
upgradation, we will discuss the above four models briefly in this section.
All the models consider M mines, NS local customers, K washed coal cus-
tomers, CT thermal coal customers (by (metric) tonne basis) and CB thermal
coal customers (by BTU content basis). The company has both coal wash-
ing/upgrading and coal blending facilities. The blending process accepts both
ROMs and washed coal to produce blended coals. The objectives of these
models are to find appropriate production plans to satisfy the customers’
demand for a given number of periods by satisfying the following constraints:
• Demand constraints for all types of customers (maximum and minimum
requirements are known in advance)
• Mine production capacity (known maximum and minimum capacity)
• Wash plant capacity constraints
• Quality constraints such as allowable upper and lower limit of percentages
of ash, sulfur and BTU content per pound, and
• Overall sulfur emission constraint for environmental control

21.2.1 Single period model (SPM)

The details of the SPM are presented below. The SPM considers a single
period of one month.
Variables
bp_jl     (metric) tonnes of blended product for customer j made at location l
c_mjl     (metric) tonnes of run-of-mine coal from mine m used for blended
          product j at location l
wc_k      (metric) tonnes of washed product k produced
wb_kjl    (metric) tonnes of washed product k used for blended product j at
          location l
mb_kjl    (metric) tonnes of middling product k used for blended product j at
          location l
wp_kc     (metric) tonnes of washed product k sent to customer c

Data
J number of blended product customers
L(j) set of sites used for blended product for customer j
a^c_m, s^c_m, B^c_m     run-of-mine ash, sulfur and BTU/lb analysis for
                        mine m
a^w_k, s^w_k, B^w_k     as received ash, sulfur and BTU/lb analysis for
                        washed product k

Data (continued)
a^m_k, s^m_k, B^m_k     as received ash, sulfur and BTU/lb analysis for
                        middling product k
a^+_j, s^+_j            maximum allowable ash and sulfur analysis for customer j
a^-_j, s^-_j, B^-_j     minimum allowable ash, sulfur and BTU/lb analysis for
                        customer j
BTU^+_j, BTU^-_j        maximum and minimum BTU requirements for customer j
S^+_NS                  maximum allowable sulfur supplied to local customers
NS                      set of blended product customers who correspond to
                        local customers
ζ_j                     amount of SO2 per (metric) tonne of sulfur supplied to
                        customer j ∈ NS
I                       number of mines
r^c_mk                  amount of run-of-mine coal from mine m used per
                        (metric) tonne of washed product k (this corresponds to
                        the washed product recipe)
r^mid_k                 ratio of middling in product k produced
MP^+_m, MP^-_m          maximum and minimum production from mine m
BCOST_jl                blending cost (dollar/(metric) tonne) for customer j at
                        location l
MPRO_m                  mining cost (dollar/(metric) tonne) for mine m
B^M_m                   BTU content (million BTU/(metric) tonne) for ROM coal
                        from mine m
B^Wb_k                  BTU content (million BTU/(metric) tonne) for washed
                        product k
B^Mb_k                  BTU content (million BTU/(metric) tonne) for middlings
                        of washed product k
PB_j                    price (dollar/million BTU) offered by the blended
                        customer j
PP^c_k                  price (dollar/(metric) tonne) offered by customer c for
                        washed product k
TCBC_jl                 transportation cost (dollar/(metric) tonne) to blended
                        product customer j from blending location l
TCML_ml                 transportation cost (dollar/(metric) tonne) from mine m
                        to blending location l
TCMW_m                  transportation cost (dollar/(metric) tonne) from mine m
                        to VJ plant
TCWL_l                  transportation cost (dollar/(metric) tonne) from VJ
                        plant to blending location l
TCWC_c                  transportation cost (dollar/(metric) tonne) from VJ
                        plant to washed customer c (may include banking and
                        pier costs)

The objective function to be maximized, profit, is
    Z = Σ_j Σ_l [ −(BCOST_jl + TCBC_jl) ] × bp_jl
      + Σ_m Σ_j Σ_l [ B^M_m × PB_j − TCML_ml − MPRO_m ] × c_mjl
      + Σ_k [ −W_k − Σ_m r^c_mk × (TCMW_m + MPRO_m) ] × wc_k
      + Σ_k Σ_j Σ_l [ B^Wb_k × PB_j − TCWL_l ] × wb_kjl
      + Σ_k Σ_j Σ_l [ B^Mb_k × PB_j − TCWL_l ] × mb_kjl
      + Σ_k Σ_c [ PP^c_k − TCWC_c ] × wp_kc

Constraints
The constraints of SPM are presented below. Relations (21.1)–(21.8) all hold
for j = 1, J, l ∈ L(j).
1. Mass balance for blended products:
  
m k k

2. Ash limits in blended products:
    − a^+_j bp_jl + Σ_m a^c_m c_mjl + Σ_k a^w_k wb_kjl + Σ_k a^m_k mb_kjl ≤ 0    (21.2)

    − a^-_j bp_jl + Σ_m a^c_m c_mjl + Σ_k a^w_k wb_kjl + Σ_k a^m_k mb_kjl ≥ 0    (21.3)

3. Sulfur limits in blended products:
    − s^+_j bp_jl + Σ_m s^c_m c_mjl + Σ_k s^w_k wb_kjl + Σ_k s^m_k mb_kjl ≤ 0    (21.4)

    − s^-_j bp_jl + Σ_m s^c_m c_mjl + Σ_k s^w_k wb_kjl + Σ_k s^m_k mb_kjl ≥ 0    (21.5)

4. Minimum BTU content in blended products:
  
− Bj− bpjl + c
Bm cmjl + Bkw wbkjl + Bkm mbkjl ≥ 0 (21.6)
m k k
21 Alternative Mathematical Models 389

5. Overall BTU supply to customers:
    BTU^-_j ≤ Σ_l [ Σ_m B^c_m c_mjl + Σ_k B^w_k wb_kjl + Σ_k B^m_k mb_kjl ] ≤ BTU^+_j    (21.7)

6. Overall sulfur supplied to NSPC plants:
    Σ_{j∈NS} ζ_j [ Σ_m s^c_m c_mjl + Σ_k s^w_k wb_kjl + Σ_k s^m_k mb_kjl ] ≤ S^+_NS    (21.8)

7. Maximum and minimum mine production:
    MP^-_m ≤ Σ_j Σ_l c_mjl + Σ_k r^c_mk wc_k ≤ MP^+_m,    m = 1, …, I       (21.9)

8. Mass balance in washplant:
    wc_k − Σ_j Σ_l wb_kjl − Σ_c wp_kc = 0,    k = 1, …, K                   (21.10)

9. Middlings ratio for washed products:
    r^mid_k wc_k − Σ_j Σ_l mb_kjl = 0,    k = 1, …, K                       (21.11)

10. Nonnegativity constraints.
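To indicate how a model of this kind is set up in practice, the sketch below solves a deliberately tiny single-period blend (one customer, two mines, no washing or middlings) with scipy.optimize.linprog. Every number in it is invented for illustration, and only the flavour of the full SPM above is retained.

import numpy as np
from scipy.optimize import linprog

ash    = np.array([18.0, 24.0])   # % ash of ROM from mine 1, mine 2 (illustrative)
sulfur = np.array([1.2, 2.4])     # % sulfur
btu    = np.array([24.0, 22.0])   # million BTU per (metric) tonne
cost   = np.array([30.0, 22.0])   # mining plus transport cost, dollars per tonne
price  = 1.8                      # dollars per million BTU offered by the customer
cap    = np.array([150_000.0, 250_000.0])   # tonnes available from each mine
demand = 220_000.0                # tonnes of blended product required
ash_max, sul_max, btu_min = 22.0, 2.0, 22.5

# Decision variables: tonnes of ROM from each mine put into the blend.
c_obj = -(btu * price - cost)               # linprog minimises, so negate the profit
A_ub  = np.vstack([ash - ash_max,           # blend ash    <= ash_max
                   sulfur - sul_max,        # blend sulfur <= sul_max
                   btu_min - btu])          # blend BTU    >= btu_min
b_ub  = np.zeros(3)
A_eq  = np.ones((1, 2))                     # the blend exactly meets the demand
b_eq  = np.array([demand])

res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, c) for c in cap])
print("tonnes from each mine:", res.x, " profit:", -res.fun)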

We compare this model with the multiperiod model by considering a col-
lection of 12 single period models.

21.2.2 Multiperiod nonlinear model (MNM)

This model considers 12 periods where each period is 1 month long. The
variables of MNM are similar to those of SPM with an additional subscript t
to represent time period. To differentiate from SPM we use capital letters for
variables. Although the constraints of this model in each period are similar
to SPM, it has additional constraints to link one period to the next for the
entire planning horizon. The model allows the inventory of blended prod-
uct to be carried from one period to the next. The inventory variables and
inventory balance constraints maintain the links in this multiperiod model.
However, the quality (percentage of ash, sulfur and BTU content per pound)
parameters of blended coal inventories carried from one period to next are
unknown which introduce nonlinearity in the model. Although the details of
the mathematical model for MNM can be found in [Sarker and Gunn, 1997],

the mass balance constraint is presented below to give an idea about the
nature of variables and constraints in MNM:
    − BP_jlt − I_jlt + I_jl,t−1 + Σ_m C_mjlt + Σ_k WB_kjlt + Σ_k MB_kjlt = 0    ∀ j, l, t    (21.12)
where
BP_jlt      (metric) tonnes of blended product j, supplied to customer j, made
            at location l in period t (the jth product corresponds to the jth
            customer)
C_mjlt      (metric) tonnes of run-of-mine coal from mine m used for blended
            product j at location l in period t
WB_kjlt     (metric) tonnes of washed product k used for blended product j at
            location l in period t
MB_kjlt     (metric) tonnes of middling product k used for blended product j
            at location l in period t
I_jlt       inventory of blended product j at location l at the end of period t
The above constraint indicates that the total amount of blended product
j (produced for customers and inventory) is equal to the sum of its con-
stituents of ROM coals, washed coals, middling products and blended coals
from inventories.

21.2.3 Upper bound linear model (ULM)

The ULM is similar to MNM except that it allows the transfer of the inven-
tory of ROM from one period to any or all future periods within the planning
horizon. This model forms the upper bound of the problem since it consid-
ers all possible savings from inventories and productions. The details of the
model can be found in [Sarker, 2003].
This model allows carrying the most attractive input(s) in terms of quality
and cost, for future periods. This is the upper bound of the planning problem
because:
1. this model considers all possible alternatives of supplying coals to cus-
tomers and
2. the solution to this model will give an objective value larger than or equal
to any feasible solution to the "true problem".
To have a feeling about the nature of variables and constraints in ULM,
we present the mass balance constraint below:
    − BP_jlt + Σ_m Σ_{τ≤t} C_mjlτt + Σ_k Σ_{τ≤t} WB_kjlτt + Σ_k Σ_{τ≤t} MB_kjlτt = 0    (21.13)

where
C_mjlτt     ROM coal of mine m produced in period τ, used in blended product j
            at location l in period t
WB_kjlτt    washed product k produced in period τ, used in blended product j
            at location l in period t
MB_kjlτt    middling product k produced in period τ, used in blended product j
            at location l in period t
The constraint represents that the total amount of blended product j
(produced for customers only) is equal to the sum of its constituents of ROM
coals, washed coals and middling products taken from current and previous
periods.
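A rough indication of what the τ ≤ t indexing costs, anticipating the size comparison of Section 21.4: the snippet below counts only the (production period, use period) index pairs over a 12-period horizon, assumes for MLM that ROM produced in period t may be used in periods t and t + 1, and ignores the m, j, l dimensions and the other variable groups, so the numbers indicate the trend rather than reproduce Table 21.1.

T = 12                                               # periods in the planning horizon
ulm_pairs = sum(t for t in range(1, T + 1))          # tau <= t: 1 + 2 + ... + T
mlm_pairs = sum(min(t, 2) for t in range(1, T + 1))  # tau in {t - 1, t} only
print(ulm_pairs, mlm_pairs)                          # 78 versus 23 index pairs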

21.2.4 Multiperiod linear model (MLM)

The MLM is similar to ULM except that it only permits the carrying of
inventory of ROM coals from one period to the next where the quality pa-
rameters of ROM coals are known. That means, the run-of-mine and washed
coals produced in period t (=τ ) will be carried for further use in period t+1
only. So the corresponding mass balance constraint will be as follows:
    − BP_jlt + Σ_m C_mjl(t−1)t + Σ_k WB_kjl(t−1)t + Σ_k MB_kjl(t−1)t = 0    (21.14)

where
C_mjl(t−1)t     ROM coal of mine m produced in period (t − 1), used in blended
                product j at location l in period t
WB_kjl(t−1)t    washed product k produced in period (t − 1), used in blended
                product j at location l in period t
MB_kjl(t−1)t    middling product k produced in period (t − 1), used in blended
                product j at location l in period t
A number of new models can be formulated between MLM and ULM by
varying n, the number of future periods for which the inventory of ROM and
washed coals may be carried (the maximum value of n is 11 in our 12-period
case). Please note that we intentionally omit the mathematical details of MNM,
ULM and MLM in this chapter, as they are too long and the emphasis of the
chapter is on comparisons; we refer interested readers to [Sarker and Gunn, 1997]
and [Sarker, 2003]. Alternatively, the details can be made available by the author
upon request.
These models differ in their capability of handling fluctuating situations,
the computational time required, the size of the problem, optimal objective
function values, number of coal banks required, etc. By a fluctuating situation
we mean a variable planning environment. These aspects are discussed in the
following sections.

SPM, ULM and MLM have been solved using the XMP linear program-
ming codes. The MNM is a specially structured nonlinear program that is
solved using a simple SLP algorithm developed by [Sarker and Gunn, 1997].
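The chapter does not reproduce that algorithm, but the general flavour of successive linear programming can be indicated: linearize the objective at the current point, solve an LP restricted by a step bound, accept improving steps and shrink the bound otherwise. The toy problem and all settings below are illustrative only and are not the method of [Sarker and Gunn, 1997].

import numpy as np
from scipy.optimize import linprog

def f(x):    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2      # toy nonlinear objective
def grad(x): return np.array([2 * (x[0] - 1.0), 2 * (x[1] - 2.0)])

A = np.array([[1.0, 1.0]])         # linear constraint x0 + x1 <= 2
b = np.array([2.0])

x = np.array([0.0, 0.0])           # feasible starting point
delta = 0.5                        # bound on each component of the step
for _ in range(100):
    # LP in the step d: minimise grad(x).d  s.t.  A(x + d) <= b,  x + d >= 0,  |d_i| <= delta.
    res = linprog(grad(x), A_ub=A, b_ub=b - A @ x,
                  bounds=[(max(-delta, -xi), delta) for xi in x])
    d = res.x
    if f(x + d) < f(x):            # accept improving steps ...
        x = x + d
    else:                          # ... otherwise shrink the step bound
        delta *= 0.5
    if delta < 1e-8:
        break
print(x, f(x))                     # approaches the constrained minimiser (0.5, 1.5)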

21.3 Model flexibility

The SPM is the least flexible model and ULM is the most flexible model. The
MNM is less flexible in choosing the inputs in comparison to MLM, but more
flexible in handling fluctuating situations. In our computational experience,
the objective function value of MNM is a little less than that of MLM when
there is a stable demand and production pattern. This is due to the fact that
the blended product inventory may not be an attractive input in the next
period. In the following, we discuss how the models work under fluctuating
demand and inputs.
Consider the following simplified situations for a three period problem:
Let X1 , X2 and X3 be the maximum level of inputs available in periods 1,
2 and 3, and Q1 , Q2 and Q3 are the respective demands in periods 1, 2 and 3.

Case-1 X1 +X2 +X3 = Q1 +Q2 +Q3 , Q2 = X2 , Q1 < X1 and Q3 > X3

Case-2 X1 +X2 +X3 = Q1 +Q2 +Q3 , Q3 = X3 , Q1 < X1 and Q2 > X2

Case-3 X1 +X2 +X3 = Q1 +Q2 +Q3 , Q1 < X1 , Q2 > X2 and Q3 > X3

The simple line diagrams for these three cases are shown in parts (a)–(e)
of Figure 21.1. The models treat each of the cases as follows:

21.3.1 Case-1

SPM (Figure 21.1a):
• Q1 and Q2 are satisfied, but Q3 is not satisfied
• Shortage in period 3 = Q3 − X3
• Unused capacity in period 1 = X1 − Q1
• This model does not provide a feasible solution
MNM (Figure 21.1b):
• Q1 , Q2 and Q3 are satisfied
• IQ1 ≤ X1 − Q1 , IQ2 = IQ1 and IQ2 = Q3 − X3
• Unused capacity in period 1 = (X1 + X3 ) − (Q1 + Q3 )
• The model does provide a feasible solution
Fig. 21.1 Simple case problem: line diagrams of inputs X1, X2, X3, demands
Q1, Q2, Q3 and the inventories linking the periods for (a) SPM, (b) MNM,
(c) MLM, (d) ULM and (e) a variant of MNM.

MLM (Figure 21.1c):
• Q1 and Q2 are satisfied, but Q3 is not satisfied
• IX1 = 0, IX2 = 0
• Shortage in period 3 = Q3 − X3
• Unused capacity in period 1 = X1 − Q1
• This model does not provide a feasible solution
ULM (Figure 21.1d):
• Q1 , Q2 and Q3 are satisfied
• IX1 = 0, IX2 = 0
• IX13 ≤ X1 − Q1 , IX13 = Q3 − X3
• Unused capacity in period 1 = (X1 + X3 ) − (Q1 + Q3 )
• The model does provide a feasible solution
Variant of MNM (Figure 21.1e):
• Q1 and Q2 are satisfied, but Q3 is not satisfied
• IQ1 = 0, IQ2 = 0
• Shortage in period 3 = Q3 − X3
• Unused capacity in period 1 = X1 − Q1
• This model does not provide a feasible solution

21.3.2 Case-2

SPM (Figure 21.1a):
• Q1 and Q3 are satisfied, but Q2 is not satisfied
• Shortage in period 2 = Q2 − X2
• The model does not provide a feasible solution
MNM (Figure 21.1b):
• Q1 , Q2 and Q3 are satisfied
• IQ1 ≤ X1 − Q1 , IQ2 = 0 and IQ1 = Q2 − X2
• The model does provide a feasible solution
MLM (Figure 21.1c):
• Q1 , Q2 and Q3 are satisfied
• IX1 ≤ X1 − Q1 , IX2 = 0 and IX1 = Q2 − X2
• The model provides a feasible solution
ULM (Figure 21.1d):
• Q1 , Q2 and Q3 are satisfied
• IX13 = 0, IX2 = 0
• IX1 ≤ X1 − Q1 , IX1 = Q2 − X2
• The model provides a feasible solution

Variant of MNM (Figure 21.1e):
• Q1 , Q2 and Q3 are satisfied
• IQ1 ≤ X1 − Q1 , IQ2 = 0 and IQ1 = Q2 − X2
• The model provides a feasible solution

21.3.3 Case-3

Only MNM and ULM provide feasible solutions.
The MNM and ULM provide feasible solutions for all of the three cases.
The MLM and variant of MNM give feasible solutions for case 2 only, and
SPM does not provide any feasible solution for any of the three cases.

21.4 Problem size and computation time

The ULM is a simple but large linear program. We can solve a reasonably
large linear program without much difficulty. The MLM model is also a linear
program. It is smaller than the upper bounding model. The MNM is smaller
than the ULM and close to MLM, but it takes the largest computational time.
In our study, we solved 36 test problems. In the test problems, we considered
the number of blended products up to 2, blending locations up to 3, number
of mines up to 3, coal washing facilities up to 3 and the time periods 3 to 12.
The arbitrary demand, capacity and quality data were randomly generated
for different test problems. The ranges for some monthly data are: blended
product demand, 200,000 to 250,000 (metric) tonnes, washed coal demand,
290,000 to 550,000 (metric) tonnes, production capacity of mine-1, 85,000
to 270,000 (metric) tonnes, capacity of mine-2, 95,000 to 300,000 (metric)
tonnes and capacity of mine-3, 70,000 to 240,000 (metric) tonnes. The relative
problem sizes of the models are shown in Table 21.1.

Table 21.1 Relative problem size of ULM, MLM and MNM

            Minimum Problem Size           Maximum Problem Size
Model     Constraints    Variables       Constraints    Variables
ULM            66            49               576          4770
MLM            66            44               576          1800
MNM            75           46+9              792        1494+216

For the largest problem, the number of variables in MNM is (1494+216) =
1710. Out of these 1710 variables, 216 variables are additional variables, which

are required to solve the model using an SLP algorithm [Sarker and Gunn, 1997].
The ULM and MLM have a similar number of constraints and the MLM has
many fewer variables. For the largest problem, the ULM model contains 576
constraints and 4770 variables, whereas the MLM contains 576 constraints
and 1800 variables. With an increasing number of blended products, blending
locations, washed products, inputs and customers, the ULM becomes a very
large problem in comparison to the other two models.
The ULM is a possible candidate for the planning problem under consid-
eration. This model could be a very large linear program with a large number
of blended products, washed products, customers, mines and blending loca-
tions. If the problem size is too large, one could consider the following points
to reduce the size of the problem without losing the characteristics of the
model.

1. Reduce the number of periods in the planning horizon. Consider 5 periods
(3 of one month, 1 of three months and 1 of six months) instead of 12
periods.
2. Group the customers based on similar prices offered and transportation
costs.
3. Reduce or group the number of products based on similar quality param-
eters.
4. Omit the quality constraints for blended product.
These considerations will provide a solution in more aggregate form. More
aggregation means more difficulty in disaggregation under an unstable planning
environment. In such a case, the reformulation of the disaggregate model may
be necessary to obtain a detailed and practically feasible solution.
Though we ignore the physical inventory in the upper bounding model, the
model does not require extensive inventory carry through. We have examined
closely the blending processes considered in modeling MNM and MLM. The
way of dealing with inventory is one of the major differences between these
two models. We allow the inventory of blended product in MNM and inven-
tory of run-of-mine and washed coals in MLM. In both cases, the inventories
were carried for use in the next period. However, the inventory of blended
product (in MNM) of a period can be used in further future periods through
blending in the intermediate periods.

21.5 Objective function values and fluctuating situation

The SPM, considered in this chapter, is a collection of 12 single period models
without a linking mechanism from one period to the next. This means
the model does not allow carrying of any inventory. This model gives a lower
bound of the planning problem for profit maximization and cannot be applied
in fluctuating cases. The ULM is a Land algorithm or transportation type

model. This model considers all possible savings from the carrying inventories
of inputs for the blending process. This model gives an upper bound of the
problem. The computational comparisons of these models are presented in
Table 21.2.

Table 21.2 Objective function values of ULM, MLM and MNM
          Objective Value of Smallest     Objective Value of Largest
Model     Problem (Million Dollars)       Problem (Million Dollars)
ULM                27.817                         117.852
MLM                27.778                         117.327
MNM                27.771                         116.011

SPM is infeasible for both cases. Normally the MLM shows lower profit
than that of the ULM and higher than that of the SPM. However this model
shows infeasibility in highly fluctuating situations. The MNM also shows
lower profit than that of the ULM, higher than that of the SPM and close
to that of MLM for most cases. The MNM can handle a highly fluctuating
situation as well as the ULM.

21.6 Selection criteria

The above analysis is used to suggest an appropriate model for the planning
problem. The ULM model can be proposed for use as a tactical model. The
analysis shows that

1. the ULM ensures the highest profit,
2. it takes less computational time than does the nonlinear model,
3. it provides flexibility to choose inputs and tackle fluctuating situations,
and
4. for practicality, the number of banks (coal piles) suggested by the model
is much lower than that predicted by theoretical calculations.

The selection of ULM as a planning model may raise a question as to
the use of MNM. The MNM is an alternative formulation approach for the
planning issues which follows the concept of traditional formulation of mul-
tiperiod planning. This model also allows us to check the solution of ULM or
MLM. The solutions of these models are reasonably close as expected. One
may prefer to carry an inventory of blended product instead of run-of-mine
and washed coals, and take advantage of managing fewer banks by using
the solution of MNM. The development of the solution method for MNM
has given us that opportunity. The use of available LP codes in solving a
large nonlinear program makes the algorithm very attractive for practical
applications. The algorithm can also be used for solving other multiperiod

blending problems like food or crude oil blending. Since the algorithm devel-
oped for the MNM can be generalized for a class of nonlinear programs, its
contribution to knowledge is justified.

21.7 Conclusions

We addressed a real-world coal-blending problem. The coal-blending problem
can be modeled in a number of different ways depending on the portion of
reality to be excluded. The alternative mathematical programming models
under different scenarios are discussed and analyzed.
A coal-blending model has been selected from a number of alternative
models by comparing:
1. the computational complexity,
2. the model flexibility,
3. the number of banks required, and
4. the objective function value.
The upper bound linear programming model seems appropriate because
1. it shows the highest profit,
2. it takes less computational time than does the nonlinear model,
3. it is the most flexible model in tackling an unstable planning environment,
4. for practicality, the number of banks suggested by the model is much lower
than that predicted by theoretical calculations, and
5. it is implementable.

References

[Bott and Badiozamani, 1982] Bott, D. L. and Badiozamani, K. (1982). Optimal blending
of coal to meeting quality compliance standards. In Proc. XVII APCOM Symposium,
(American Institute of Mining Engineers, New York, 1982) pages 15–23.
[Gershon, 1986] Gershon, M. (1986). A blending-based approach to mine planning and
production scheduling. In Proc. XIX APCOM Symposium, (American Institute of
Mining Engineers, New York, 1986) pages 120–126.
[Gunn, 1988] Gunn, E. A. (1988). Description of the coal blend and wash linear program-
ming model. In A Res. Report for Cape Breton Dev. Corp., Canada, (Cape Breton
Development Corporation, Cape Breton) pages 1–16.
[Gunn et al., 1989] Gunn, E. A., Allen, G., Campbell, J. C., Cunningham, B., and Ruther-
ford, R. (1989). One year of OR: Models for operational and production planning in the
coal industry. In Won CORS Practice Prize-89, TIMS/ORSA/CORS Joint Meeting,
Vancouver, Canada.
[Gunn and Chwialkowska, 1989] Gunn, E. A. and Chwialkowska, E. (1989). Develop-
ments in production planning at a coal mining corporation. In Proc. Int. Ind. Eng.
Conference, (Institute of Industrial Engineers, Norcross, Georgia USA) pages 319–324.

[Gunn and Rutherford, 1990] Gunn, E. A. and Rutherford, P. (1990). Integration of an-
nual and operational planning in a coal mining enterprise. In Proc. of XXII APCOM
Intl. Symposium, Berlin, (Technische Universität, Berlin) pages 95–106.
[Hooban and Camozzo, 1981] Hooban, M. and Camozzo, R. (1981). Blending coal with
a small computer. Coal Age, 86:102–104.
[Sarker, 1990] Sarker, R. A. (1990). SLP algorithm for solving a nonlinear multiperiod coal
blending problem. In Honourable Mention Award, CORS National Annual Confer-
ence, Ottawa, Canada, (Canadian Operational Research Society, Ottawa) pages 1–27.
[Sarker, 1991] Sarker, R. A. (1991). A linear programming based algorithm for a specially
structured nonlinear program. In CORS Annual Conference, Quebec City, Canada.
[Sarker, 1994] Sarker, R. A. (1994). Solving a class of nonlinear programs via a sequence
of linear programs. In Eds Krishna, G. Reddy, R. Nadarajan, Stochastic Models,
Optimization Techniques and Computer Applications, pages 269–278. Wiley Eastern
Limited.
[Sarker, 2003] Sarker, R. A. (2003). Operations Research Applications in a Mining Com-
pany. (Dissertation. de Verlag, Berlin).
[Sarker and Gunn, 1990] Sarker, R. A. and Gunn, E. A. (1990). Linear programming
based tactical planning model for a coal industry. In CORS National Annual Confer-
ence, Ottawa, Canada.
[Sarker and Gunn, 1991] Sarker, R. A. and Gunn, E. A. (1991). A hierarchical production
planning framework for a coal mining company. In CORS Annual Conference, Quebec
City, Canada.
[Sarker and Gunn, 1994] Sarker, R. A. and Gunn, E. A. (1994). Coal bank scheduling
using a mathematical programming model. Applied Mathematical Modelling, 18:672–
678.
[Sarker and Gunn, 1995] Sarker, R. A. and Gunn, E. A. (1995). Determination of a coal
preparation strategy using a computer based enumeration method. Indian Journal of
Engineering and Material Sciences, 2:150–156.
[Sarker and Gunn, 1997] Sarker, R. A. and Gunn, E. A. (1997). A simple slp algorithm
for solving a class of nonlinear programs. European Journal of Operational Research,
101(1):140–154.
About the Editors

Emma Hunt undertook her undergraduate studies at the University of Ade-
laide, obtaining a B.A. with a double major in English and a B.Sc. with first
class honors in Mathematics. She subsequently graduated with a Ph.D. in
the area of Stochastic Processes. She worked as a Research Scientist with
DSTO from 1999 to 2005. She is currently a Visiting Lecturer in the School
of Economics at the University of Adelaide. She is Executive Editor of the
ANZIAM Journal, Deputy Editor of the Bulletin of the Australian Society
for Operations Research (ASOR) and Chair of the South Australian Branch
of ASOR. Her research interests lie in optimization and stochastic processes.

Charles Pearce graduated in mathematics and physics from the University
of New Zealand and has a Ph.D. from the Australian National University in
the area of stochastic processes. He holds the Elder Chair of Mathematics at
the University of Adelaide and is on the editorial boards of more than a dozen
journals, including the Journal of Industrial and Management Optimization,
the Journal of Innovative Computing Information and Control, Advances in
Nonlinear Variational Inequalities, Nonlinear Functional Analysis and Ap-
plications, and the ANZIAM Journal (of which he is Editor-In-Chief).
He has research interests in optimization, convex analysis and probabilis-
tic modeling and analysis and about 300 research publications. In 2001 he
was awarded the ANZIAM medal of the Australian Mathematical Society
for outstanding contributions to applied and industrial mathematics in Aus-
tralia and New Zealand. In 2007 he was awarded the Ren Potts medal of the
Australian Society for Operations Research for outstanding contributions to
operations research in Australia.
