
Performance of Computer Communication Systems: A Model-Based Approach.

Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons, Ltd
Print ISBN 0-471-97228-2 Electronic ISBN 0-470-84192-3

PERFORMANCE OF
COMPUTER
COMMUNICATION
SYSTEMS
A Model-Based Approach

BOUDEWIJN R. HAVERKORT
Rheinisch-Westfälische Technische
Hochschule Aachen, Germany

John Wiley & Sons, Ltd


Chichester · New York · Weinheim · Brisbane · Singapore · Toronto
Copyright © 1998 by John Wiley & Sons Ltd,
Baffins Lane, Chichester,
West Sussex PO19 1UD, England
National 01243 779777
International (+44) 1243 779777
e-mail (for orders and customer service enquiries): [email protected]
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com

Reprinted June 1999

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or
otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a
licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1P 9HE, UK,
without the permission in writing of the Publisher.

Neither the author nor John Wiley & Sons Ltd accept any responsibility or liability for loss or damage
occasioned to any person or property through using the material, instructions, methods or ideas contained
herein, or acting or refraining from acting as a result of such use. The author and Publisher expressly
disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will
be no duty on the author or Publisher to correct any errors or defects in the software.

Designations used by companies to distinguish their products are often claimed as trademarks. In all
instances where John Wiley & Sons is aware of a claim, the product names appear in initial capital or all
capital letters. Readers, however, should contact the appropriate companies for more complete information
regarding trademarks and registration.

Other Wiley Editorial Offices

New York · Weinheim · Brisbane · Singapore · Toronto

Library of Congress Cataloguing in Publication Data

Haverkort, Boudewijn R.
Performance of computer communication systems : a model-based
approach / Boudewijn R. Haverkort.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-97228-2 (alk. paper)
1. Computer networks-Evaluation. 2. Electronic digital
computers-Evaluation. 3. Queuing theory. 4. Telecommunication
systems-Evaluation. 5. Stochastic processes. I. Title.
TK5105.5.H375 1998
004.6-dc21 98-27222
CIP

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 0 471 97228 2

Produced from PostScript files supplied by the author.


Printed and bound in Great Britain by Biddles Ltd, Guildford and King’s Lynn.
This book is printed on acid-free paper responsibly manufactured from sustainable
forestry, in which at least two trees are planted for each one used in paper production.
In memory of my father
Johannes Hermannus Hendrikus Haverkort
Contents

Preface xvii

I Performance modelling with stochastic processes 1

1 Introduction 3
1.1 Performance evaluation: aim and approach .................. 3
1.2 Model solution techniques ........................... 8
1.3 Stochastic models ................................ 9
1.4 Queueing models ................................ 11
1.4.1 The principle of queueing ....................... 11
1.4.2 Single queues: the Kendall notation .................. 13
1.5 Tool support. .................................. 15
1.5.1 Model construction ........................... 15
1.5.2 Model solution ............................. 18
1.6 Further reading ................................. 19
1.7 Exercises ..................................... 19

2 Little’s law and the M|M|1 queue 21


2.1 Little’s law ................................... 21
2.1.1 Understanding Little’s law ....................... 21
2.1.2 Proof of Little’s law ........................... 24
2.2 The simplest queueing model: the M|M|1 queue ............... 25
2.3 Further reading ................................. 28
2.4 Exercises ..................................... 28

3 Stochastic processes 31
3.1 Overview of stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Renewal processes ................................ 34


3.3 Discrete-time Markov chains .......................... 37
3.4 Convergence properties of Markov chains ................... 41
3.5 Continuous-time Markov chains ........................ 43
3.5.1 From DTMC to CTMC ........................ 43
3.5.2 Evaluating the steady-state and transient behaviour ......... 45
3.6 Semi-Markov chains ............................... 51
3.7 The birth-death process ............................ 53
3.8 The Poisson process. .............................. 53
3.9 Renewal processes as arrival processes ..................... 54
3.9.1 Phase-type distributions ........................ 55
3.9.2 Phase-type renewal processes ..................... 58
3.10 Summary of Markov chains .......................... 62
3.11 Further reading ................................. 63
3.12 Exercises. .................................... 63

II Single-server queueing models 67

4 M|M|1 queueing models 69


4.1 General solution of the M|M|1 queue ..................... 70
4.2 The M|M|1 queue with constant rates ..................... 73
4.3 The PASTA property .............................. 74
4.4 Response time distribution in the M|M|1 queue ............... 76
4.5 The M|M|m multi-server queue ........................ 77
4.6 The M|M|∞ infinite-server queue ....................... 79
4.7 Job allocation in heterogeneous multi-processors ............... 80
4.8 The M|M|1|m single-server queue with bounded buffer ........... 83
4.9 The M|M|m|m multi-server queue without buffer ............... 85
4.10 The M|M|1||K queue or the terminal model ................. 86
4.11 Mean values for the terminal model ...................... 88
4.12 Further reading ................................. 92
4.13 Exercises. .................................... 92

5 M|G|1-FCFS queueing models 95


5.1 The M|G|1 result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 An intuitive proof of the M|G|1 result . . . . . . . . . . . . . . . . . . . . . 99

5.2.1 Residual lifetime ............................ 100


5.2.2 Intuitive proof .............................. 103
5.3 A formal proof of the M|G|1 result ...................... 104
5.4 The M|G|1 model with batch arrivals ..................... 107
5.5 M|G|1 queueing systems with server breakdowns ............... 109
5.5.1 Single arrivals .............................. 110
5.5.2 Batch arrivals .............................. 111
5.6 Further reading ................................. 112
5.7 Exercises ..................................... 112

6 M|G|1 queueing models with various scheduling disciplines 115


6.1 Non-preemptive priority scheduling ...................... 115
6.2 Preemptive priority scheduling ......................... 121
6.3 Shortest job next scheduling .......................... 123
6.4 Round robin scheduling ............................. 126
6.5 Processor sharing scheduling .......................... 128
6.6 Scheduling based on elapsed processing time ................. 129
6.7 Further reading ................................. 131
6.8 Exercises ..................................... 131

7 G|M|1-FCFS and G|G|1-FCFS queueing models 133


7.1 The G|M|1 queue ................................ 133
7.2 The G|G|1 queue ................................ 139
7.3 Approximate results for the G|G|1 queue ................... 143
7.4 Further reading ................................. 143
7.5 Exercises ..................................... 144

8 PH|PH|1 queueing models 145
8.1 The M|M|1 queue ................................ 145
8.2 The PH|PH|1 queue .............................. 148
8.2.1 A structured description of the CTMC ................ 148
8.2.2 Matrix-geometric solution ....................... 152
8.2.3 Stability issues ............................. 153
8.2.4 Performance measures ......................... 154
8.3 Numerical aspects ................................ 155
8.3.1 Solving the boundary equations .................... 155

8.3.2 A successive substitution algorithm .................. 156


8.3.3 The logarithmic reduction algorithm ................. 158
8.4 A few special cases ............................... 162
8.4.1 The M|PH|1 queue: an explicit expression for R ........... 162
8.4.2 The PH|M|m queue ........................... 163
8.5 The caudal curve ................................ 164
8.6 Other models with matrix-geometric solution ................. 167
8.7 Further reading ................................. 169
8.8 Exercises ..................................... 169

9 Polling models 173


9.1 Characterisation of polling models ....................... 173
9.1.1 Basic terminology ............................ 174
9.1.2 The visit order ............................. 174
9.1.3 The scheduling strategy ........................ 176
9.2 Cyclic polling: cycle time and conservation law ............... 177
9.3 Count-based symmetric cyclic polling models ................. 180
9.4 Count-based asymmetric cyclic polling models ................ 183
9.4.1 Exhaustive service: exact analysis ................... 183
9.4.2 Some approximate results ....................... 184
9.5 Performance evaluation of the IBM token ring ................ 186
9.5.1 Timed-token access mechanisms .................... 186
9.5.2 Approximating the timed-token access mechanism .......... 187
9.5.3 The influence of the token holding timer ............... 189
9.6 Local and global time-based polling models .................. 191
9.7 Further reading ................................. 194
9.8 Exercises ..................................... 194

III Queueing network models 197

10 Open queueing networks 199


10.1 Basic terminology ................................ 200
10.2 Feed-forward queueing networks ........................ 200
10.2.1 The MIMI1 queue ............................ 200
10.2.2 Series of MIMI1 queues ......................... 201
10.2.3 Feed-forward queueing networks .................... 204

10.3 Jackson queueing networks ........................... 205


10.4 The QNA method ................................ 207
10.4.1 The QNA queueing network class ................... 208
10.4.2 The QNA method ............................ 209
10.4.3 Summary of approximations ...................... 219
10.5 Telecommunication network modelling .................... 220
10.5.1 System description ........................... 220
10.5.2 Evaluation using Jackson queueing networks ............. 223
10.5.3 Evaluation using networks of M|G|1 queues ............. 224
10.5.4 Evaluation using QNA ......................... 225
10.6 Further reading ................................. 226
10.7 Exercises ..................................... 226

11 Closed queueing networks 229


11.1 Gordon-Newell queueing networks ....................... 229
11.2 The convolution algorithm ........................... 235
11.3 Mean-value analysis ............................... 241
11.4 Mean-value analysis-based approximations .................. 245
11.4.1 Asymptotic bounds ........................... 245
11.4.2 The Bard-Schweitzer approximation .................. 247
11.4.3 Balanced networks ........................... 248
11.5 An approximate solution method ....................... 251
11.5.1 Queueing network definition ...................... 252
11.5.2 Basic approach ............................. 252
11.5.3 Numerical solution ........................... 254
11.5.4 Extension to other queueing stations ................. 256
11.6 Application study ............................... 258
11.6.1 System description and basic model ................. 258
11.6.2 Evaluation with MVA and other techniques ............. 259
11.6.3 Suggestions for performance improvements .............. 261
11.7 Further reading ................................. 261
11.8 Exercises ..................................... 263

12 Hierarchical queueing networks 265


12.1 Load-dependent servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
12.2 The convolution algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 269

12.3 Special cases of the convolution algorithm .................. 272


12.3.1 Convolution with multi-server queueing stations ........... 272
12.3.2 Convolution with an infinite-server station .............. 273
12.4 Mean-value analysis ............................... 274
12.5 Exact hierarchical decomposition ....................... 277
12.5.1 Informal description of the decomposition method .......... 277
12.5.2 Formal derivation of the decomposition method ........... 279
12.6 Approximate hierarchical decomposition ................... 283
12.6.1 Multiprogrammed computer system models ............. 283
12.6.2 Studying paging effects ......................... 286
12.7 Further reading ................................. 290
12.8 Exercises ..................................... 291

13 BCMP queueing networks 293


13.1 Queueing network class and solution ..................... 293
13.1.1 Model class ............................... 293
13.1.2 Steady-state customer probability distribution ............ 295
13.2 Computational algorithms ........................... 298
13.2.1 The convolution algorithm ....................... 298
13.2.2 Mean-value analysis ........................... 299
13.3 Further reading ................................. 300
13.4 Exercises ..................................... 301

IV Stochastic Petri net models 303

14 Stochastic Petri nets 305


14.1 Definition .................................... 305
14.1.1 Static SPN properties ......................... 306
14.1.2 Dynamic SPN properties ........................ 307
14.1.3 SPN extensions ............................. 311
14.2 Structural properties .............................. 312
14.3 Measures to obtain from SPNs ......................... 316
14.4 Mapping SPNs to CTMCs ........................... 318
14.5 Further reading ................................. 323
14.6 Exercises ..................................... 325

15 Numerical solution of Markov chains 329


15.1 Computing steady-state probabilities ..................... 329
15.1.1 Gaussian elimination .......................... 330
15.1.2 LU decomposition ............................ 334
15.1.3 Power, Jacobi, Gauss-Seidel and SOR iterative methods. ...... 337
15.2 Transient behaviour ............................... 343
15.2.1 Introduction ............................... 343
15.2.2 Runge-Kutta methods ......................... 346
15.2.3 Uniformisation ............................. 347
15.2.4 Cumulative measures .......................... 352
15.3 Further reading ................................. 353
15.4 Exercises ..................................... 354

16 Stochastic Petri net applications 357
16.1 Multiprogramming systems ........................... 357
16.1.1 Multiprogramming computer systems ................. 358
16.1.2 The SPN model ............................. 358
16.1.3 Some numerical results ......................... 360
16.2 Polling models .................................. 363
16.2.1 Count-based, cyclic polling models .................. 365
16.2.2 Local time-based, cyclic polling models ................ 367
16.2.3 Approximating large models ...................... 371
16.2.4 Polling with load-dependent visit ordering .............. 373
16.3 An SPN availability model ........................... 376
16.4 Resource reservation systems .......................... 378
16.5 Further reading ................................. 381
16.6 Exercises ..................................... 382

17 Infinite-state SPNs 383


17.1 Introduction ................................... 383
17.2 Definitions .................................... 385
17.2.1 Preliminaries .............................. 385
17.2.2 Requirements: formal definition .................... 385
17.2.3 Requirements: discussion ........................ 386
17.3 Matrix-geometric solution ........................... 387
17.4 iSPN specification and measure computation ................. 390

17.4.1 From iSPN to the underlying QBD .................. 390


17.4.2 Efficient computation of reward-based measures ........... 391
17.5 Application studies ............................... 393
17.5.1 A queueing model with delayed service ................ 393
17.5.2 Connection management in communication systems ......... 395
17.5.3 A queueing system with checkpointing and recovery ......... 399
17.6 Further reading ................................. 405
17.7 Exercises ..................................... 405

V Simulation 407

18 Simulation: methodology and statistics 409


18.1 The idea of simulation ............................. 409
18.2 Classifying simulations ............................. 411
18.3 Implementation of discrete-event simulations ................. 412
18.3.1 Terminology ............................... 412
18.3.2 Time-based simulation ......................... 413
18.3.3 Event-based simulation ......................... 415
18.3.4 Implementation strategies ....................... 418
18.4 Random number generation .......................... 419
18.4.1 Generating pseudo-random numbers ................. 420
18.4.2 Testing pseudo-uniformly distributed random numbers ....... 422
18.4.3 Generation of non-uniformly distributed random numbers ...... 424
18.5 Statistical evaluation of simulation results .................. 427
18.5.1 Obtaining measurements ........................ 428
18.5.2 Mean values and confidence intervals ................. 430
18.6 Further reading ................................. 434
18.7 Exercises ..................................... 435

VI Appendices 439

A Applied probability for performance analysts 441


A.1 Probability measures .............................. 441
A.2 Discrete random variables ........................... 443
A.3 Some important discrete distributions ..................... 444

A.4 Continuous random variables ......................... 445


A.5 Some important continuous distributions ................... 447
A.6 Moments of random variables ......................... 450
A.7 Moments of discrete random variables ..................... 451
A.8 Moments of continuous random variables ................... 452

B Some useful techniques in applied probability 455


B.1 Laplace transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
B.2 Geometric series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
B.3 Tensor sums and products . . . . . . . . . . . . . . . . . . . . . . . . . . 458

C Abbreviations 461

Bibliography 465

Index 489
Preface

WHEN I began lecturing a course on the performance of communication systems in
the electrical engineering department of the University of Twente in the fall of
1990, I had difficulty in finding a suitable textbook for this course. Many books were too
mathematically oriented, others used only a very limited set of evaluation techniques,
e.g., only the M|G|1 queue. I therefore decided to use a collection of book chapters, recent
survey articles and some research papers, thereby focusing on the performance evaluation
of network access mechanisms. During the semester, however, I began to realise that using
material from different sources is not the most adequate approach. To overcome this
inadequacy, I started to develop my own course material in the years that followed.
In 1993 I also started lecturing a course on the performance evaluation of computer
systems in the computer science department of the University of Twente, traditionally fo-
cused on scheduling techniques, queueing networks and multiprogramming models. Trying
to organise my teaching more efficiently, I decided to extend my course notes so that they
could be used for both classes: an introductory part on stochastic processes and single-
server queues, followed by specialisations towards communication networks and computer
systems. By mid 1995, the course notes had evolved into a booklet of some 200 pages.
After having moved to the RWTH-Aachen in the fall of 1995, I started two new courses
for computer science students, dealing with a superset of the themes I already addressed
at the University of Twente. The evolving course notes for these two courses resulted in
the book you are now reading.

Prerequisites
Many performance evaluation textbooks require a rather strong mathematical background
of the readers, e.g., by the extensive use of Laplace transforms. In this book, the use
of Laplace and z-transforms is completely avoided (except for two non-critical issues in
Part II); however, it is assumed that the readers are familiar with the basic principles of

mathematical analysis, linear algebra, numerical mathematics, and probability theory and
statistics (some appendices are included as a refresher). Stochastic processes are treated in
this book from first principles; background knowledge in this area is therefore not required.
It should be well understood that this is not a book on queueing theory, nor on stochastic
processes, although both will be dealt with. Instead, the aim of the book is to apply
queueing theory and stochastic processes for the evaluation of the performance of computer
and communication systems. In order to address interesting applications, I assume that the
readers are familiar with the basic principles of computer architecture [19, 128, 223, 275],
operating systems [261, 276, 278] and computer networks [265, 277, 284].

How to use this book?


The book consists of five main parts, as sketched in Figure 1, which will be introduced
below.

I. Performance modelling with stochastic processes. Part I starts with a chapter
on the aim of model-based performance evaluation. Then, a number of basic performance
modelling building blocks are addressed, as well as generally valid rules, such as Little’s
law in Chapter 2. Chapter 3 then presents background material in the field of stochastic
processes, most notably about renewal processes and continuous-time Markov chains.

II. Single-server queueing models. In the second part, a wide variety of single-server
queueing models is addressed, which can be used to evaluate the performance of system
parts, such as network access mechanisms or CPU schedulers. In Chapter 4 we discuss
the simplest class of queues, the M|M|1 queues, followed by the slightly more general
M|G|1 queues with various types of scheduling in Chapters 5 and 6. We then continue, in
Chapter 7, with the analysis of the G|M|1 and G|G|1 queueing models. Although the latter
analyses are interesting from a theoretical viewpoint, they are less useful from a practical
point of view. We therefore present PH|PH|1 queues, for which efficient matrix-geometric

III. Queueing network models. In the third part of the book, we study networks of
queues. These are especially useful when complex systems consisting of many interacting
parts are studied, such as complete computer systems or communication networks. We
start, in Chapter 10, with the evaluation of open queueing networks. We address both
well-known exact methods (Jackson networks) and approximate methods. In Chapter 11
Figure 1: The organisation of the book in 5 parts and 18 chapters

we then continue with the study of load-independent closed queueing networks, including
the discussion of computational algorithms such as the convolution method, mean-value
analysis and various approximation techniques. These are extended in Chapter 12 to
include load-dependent queueing networks in order to handle hierarchical decomposition
as a method to cope with model complexity. Finally, in Chapter 13, queueing networks
with multiple customer classes and various types of stations are discussed.

IV. Stochastic Petri net models. In the fourth part of the book, stochastic Petri
nets (SPNs) are discussed as a representative of Markovian techniques to evaluate large
performance models. SPNs provide more modelling flexibility than queueing networks,
however, most often at the cost of a more expensive solution. We present the basic SPN
formalism in Chapter 14, including the computation of invariants and the derivation of the
underlying Markov chain. Chapter 15 is then devoted to the numerical solution of large
Markov chains, as they arise from SPN models. Both steady-state and transient analysis
methods are presented. In Chapter 16 we then present four larger SPN-based performance
evaluation case studies. In Chapter 17 we finally address a special class of SPNs that uses
the efficient matrix-geometric solution method of Chapter 8.

V. Simulation. The fifth part comprises only Chapter 18, in which the basic principles
and statistical tools for stochastic discrete-event simulations are discussed.

A final part consisting of appendices on probability theory and Laplace transforms com-
pletes the book.

In Figure 1 we sketch an overview of the book chapters and their relation (A → B expresses
that Chapter A is a prerequisite to understand Chapter B; dashed arrows express a similar

but less strong relation). Given this ordering, various trajectories can be followed through
the book:
• Single-server queues: 1-2-3-4-5-6-7-(8)-9;

• Queueing networks: 1-2-3-4-10-11-12-13;

• Stochastic Petri nets: 1-2-3-14-15-16-(8-17);

• General introduction: 1-2-3-4-5-(6)-10-(11-14-16)-18;

• Advanced course: (1-2-3-4-5)-6-7-8-9-(10)-11-12-13-(14)-15-16-17.


Regarding the notation used, the following remarks are in order. Normally, mathematical
variables are denoted in lower case (x or α). Sets are indicated using calligraphic letters (S),
except for the sets of natural numbers (ℕ) and real numbers (ℝ). Vectors are written
as underlined lower case (v or π) and matrices as boldface upper case (M). Random
variables are given in upper case, their realisations as normal mathematical variables, and
their estimators have a tilde (X̃ is an estimator for the random variable X). Finally,
the expectation and variance of a stochastic variable X are denoted as E[X] and var[X],
respectively.

On the use of tools


Throughout the book many numerical examples are presented. In the simpler cases, these
examples have been performed “by hand” or with the computer algebra package MAPLE
[lag]. The more involved queueing network and stochastic Petri net models have been
built and evaluated with the tools SHARPE [250, 249] and SPNP [53], both developed at
Duke University. All quasi-birth-death models have been created and evaluated using the
tool SPN2MGM [118, 125] developed at the RWTH-Aachen.
To really understand performance evaluation techniques, example modelling studies
should be performed, preferably also using modern tools. Many performance evaluation
tools can be used free of charge for teaching purposes. Please contact the “tool authors”
or the web-site below for more details.

Get in touch!
For more information on the use of this book, including updates, errata, new exercises,
tools and other interesting links, visit the following web-site:

http://www-lvs.informatik.rwth-aachen.de/pccs/

or mail the author at haverkort@informatik.rwth-aachen.de.

Acknowledgements
I have been working on this book for a long time. Throughout this period, I have had the
pleasure to cooperate with many researchers and students, most notably at the University
of Twente and the RWTH-Aachen, but also from many other places around the globe,
both from industry and academia. All of them have contributed to my understanding of
performance evaluation of computer communication systems, for which I am very grate-
ful. I would like to thank a number of people explicitly: Ignas Niemegeers (University of
Twente) for his long-time encouragement and the many stimulating discussions; Leonard
Franken (KPN Research), Geert Heijenk (Ericsson) and Aad van Moorsel (Bell Laborato-
ries) for the joint work on model-based performance evaluation since the beginning of the
1990’s; Bill Henderson (University of Adelaide), John Meyer (University of Michigan), Bill
Sanders (University of Illinois) and Kishor Trivedi (Duke University) for being my over-
seas collaborators and hosts; Henrik Bohnenkamp and Alexander Ost for their support and
collaboration since I have been working at the RWTH-Aachen.
Aad van Moorsel and Henrik Bohnenkamp read the complete manuscript; their sug-
gestions and comments improved the book substantially. Many of the presented exercises
have been developed in close cooperation with Henrik Bohnenkamp as well. Of course, the
final responsibility for any flaws or shortcomings lies with me.
To write a book requires more than just scientific support. The endless encouragement
and love of my wife Ellen ter Brugge, my daughter Isabelle and my son Arthur cannot be
expressed in words. Without them, there would not have been a “higher aim” to complete
this book. My parents, Ans and Henk Haverkort, encouraged me to study ever since I
attended school, providing me with the opportunities they never had. Regrettably, my
father has not been given time to witness the completion of this book. I dedicate this book
to him.

Boudewijn R. Haverkort
Aachen, July 1998

Index

ACE algorithm, 354 checkpointing, 400


aggregate service center, 277 circuit-switched telecommunication network,
Amdahl’s law, 7 378
arc multiplicity, 306 closed queueing network, 87, 229
arrival instance, 107 Cobham’s formula, 119
arrival rate, intensity, 21 completion-time problem, 400
arrival theorem, 89, 242, 274 conditional probability, 442
asymptotic bound, 245 confidence interval, 433
availability model, 376 congruential method
additive, 420
balanced queueing network, 248
linear, 420
Bard-Schweitzer approximation, 247, 253,
connection-oriented service, 395
300
connectionless service, 395
BASTA property, 76
conservation law, 120, 125
batch arrivals, 107
conservative, 313
batch means method, 431
strictly, 313
BCMP queueing network, 252, 293, 295
continuous-time Markov chain, 43
approximate solution, 251
generator matrix, 46
birth-death process, 53, 69
steady-state probabilities, 47
blocking probability, 74
bottleneck, 201, 234, 259 transient probabilities, 46

boundary column 146 convolution algorithm, 236, 237, 269, 272,

Burke’s theorem, 202 298


infinite-server version, 273
caudal curve, 164 multi-server version, 272
central limit theorem, 427, 432, 447 correlation test, 423
central server system, 258 critical value
chain, 31 double-sided, 433
Chapman-Kolmogorov equation, 39 single-sided, 433
Chebyshev’s inequality, 411 CSMA/CD network access, 267

cumulative density function, 443, 445 distribution function, 445


customer
enabling function, 311, 393
combination, 208
Erlang’s loss formula, 86
creation, 208
Erlang-k distribution, 58
cycle length, 420
estimate, 410
cycle time, 177
estimator, 410, 430
delayed service, 393 consistent, 410, 431
departure instance, 105, 107 unbiased, 410, 430
destination release, 187 event, 413
deterministic and stochastic Petri net, 324 event list, 416
direct method, 330 expected accumulated reward, 352
Gaussian elimination, 330 expected value, 450
LU decomposition, 334
factorial moment, 107
discrete-time Markov chain, 37
feed-forward queueing network, 200, 204
state residence time distribution, 41
fill-in, 333, 337
state-transition probability matrix, 37
firing rule, 307
steady-state probabilities, 40
first-order dependence, 33
transient probabilities, 39
first-order traffic equations, 204, 210, 230,
transition probabilities, 37
294
distribution fixed-point iteration, 372
Bernoulli, 444 flow-equivalent service center, 277
binomial, 445 folding, 371
Coxian, 59, 449 fork/join-operation, 208
deterministic, 447
Erlang, 448 Gauss-Seidel method, 338, 339
exponential, 448 Gaussian elimination, 330, 331
geometric, 444 general modelling tool framework, 16, 322
hyperexponential, 58, 448 generator matrix, 318, 387
hypo-exponential, 448 geometric series, 457
modified geometric, 445 global balance equations, 71, 329, 388
normal, 447 Gordon-Newell queueing network, 229
phase-type, 449
hierarchical decomposition, 277, 282
Poisson, 445
approximate, 283
Student, 433
uniform, 445, 449 independent, 442

independent replicas, 431 Lindley’s integral equation, 142


infinite-state stochastic Petri net, 383 Little’s law, 21
inhibiting place, 307 load-dependency, 244
inhibitor arc, 306 load-dependent server, 266
initial transient period, 429 local balance equations, 72
input arc, 306 log file, 428
input place, 306 LU decomposition, 334
insensitivity property, 297
inter-visit time, 178 marginal probability density function, 444,
interrupted Poisson process, 60 446
inversion method, 424 marking, 306
iterative method, 337 reachable, 312
comparison of, 342 tangible, 307
Gauss-Seidel, 338 vanishing, 307
Jacobi, 338 marking dependent
Power, 337 arc multiplicity, 311
successive over-relaxation, 340 weight, 311
time and space complexity, 341 Markov
dependence, 33
Jackson queueing network, 205, 223 property, 33
Jacobi method, 338 reward model, 354
Jensen’s method, 347 Markov chain, 33, 63
job allocation, 80 (non-) recurrent state, 42
joint distribution function, 443, 446 absorbing state, 42
convergence properties, 41
k-bounded, 313
irreducible, 42
k-th moment, 450
normalisation, 47
Kendall notation, 13
periodic, 42
Kleinrock’s conservation law, 178
semi-, 34
Kleinrock’s independence assumption, 222
time-homogeneous, 33
Kolmogorov-Smirnov test, 422
Marshall’s result, 212
Kramer and Langenbach-Belz approxima-
matrix-geometric solution, 152, 387
tion, 143, 215
mean time to absorption, 57
Laplace transform, 455 mean-value analysis, 88, 241, 274, 299,
law of total probability, 443 381
level, 385 measure, 4

cumulative, 344 normalisation, 331, 336


instant-of-time, 344 normalising constant, 231, 236, 244, 269,
reward-based, 391 298
system-oriented, 4, 322, 428 Normally distributed random numbers, 427
user-oriented, 4, 428 Norton’s approach, 277
measurement, 3
open queueing network, 199
memoryless, 444, 448
output arc, 306
property, 100
output place, 307
method of moments, 104
mgl queueing network, 224 packet-switched telecommunication networks,
model, 4 220
analytical representation, 15 paging, 286, 358
modeller's representation, 15 passage, 230
moments, 450, 451 PASTA property, 26, 74, 99, 107, 134, 180
Bernoulli, 451 performability, 353
binomial, 451 distribution, 345
deterministic, 452 performance evaluation, 6
Erlang, 453 phase-type distribution, 55
exponential, 452 represent ation of, 56
geometric, 451 Phipp’s formula, 124
hyperexponential, 454 pivot, 331
hypo-exponential, 453 place, 306
modified geometric, 451 invariant, 313, 360, 390
normal, 452 Poisson
Poisson, 45 1 distribution, 36, 54
uniform, 452, 454 probability, 349, 35 1
monitoring, 3 Poisson process, 36, 53
hardware, 4 merging, 54
hybrid, 4 splitting, 54
software, 4 Pollaczek-Khintchine (PK-) formula, 96
Monte Carlo simulation, 410 polling model, 173, 363
multi-mode system, 399 approximations, 185
multiprogramming computer system, 283, count based, asymmetric, 183
358 count based, symmetric, 180
multiprogramming limit, 283 Power method, 167, 337
mutually exclusive events, 441 probability, 441

density function, 443


distribution function, 443 M|G|1, shortest-remaining processing
mass function, 443 time, 130
product-form, 231 M|M|1, 25, 70, 73, 145
queueing network, 202 M|M|1||K, 86
steady-state probabilities, 266 M|M|1|m, 83
stochastic Petri net, 324 M|M|∞, 79
pseudo-conservation law, 178 M|M|m, 77
pseudo-random number generation, 420 M|M|m|m, 85
M|PH|1, 162
QNA method, 207, 225
PH|M|m, 163
quasi-birth-death model, 149, 383, 384 PH|PH|1, 148
boundary level, 149 queueing, 11
level, 148 queueing network, 199
logarithmic reduction algorithm, 158 queueing station, 12
matrix R, 152
matrix quadratic equation, 153 random number generation, 419
matrix-geometric solution, 152, 387 random variable, 443
repeating level, 146, 149 reachability graph, 313, 363
stability, 153 reachability set, 313
steady-state probabilities, 152 recurrence time
successive substitution algorithm, 156 backward, 100
queue forward, 100
D|M|1, 136 regenerative method, 432
&(&IL 150 rejection method, 426
G|G|1, 139 relaxation factor, 340
G|M|1, 133 remaining service time, 26
IPPIE& 160 renewal
M|G|1, 95, 115 counting process, 34
M|G|1 with batches, 107 density, 35
M|G|1 with server breakdown, 109 equation, 35
M|G|1, non-preemptive priority, 115 function, 35
M|G|1, processor sharing, 128 process, 33, 34, 54
M|G|1, round robin, 126 renewal process
M|G|1, shortest job next, 123 alternating, 64
M|G|1, shortest-elapsed processing time, 129 splitting, 211

superpositioning, 213 shortest-elapsed processing time, 129


reservation-based system, 378 shortest-remaining processing time, 130
residual lifetime, 100 time-based, 177, 367
response time, 22 second-order traffic equations, 2 11, 213
distribution, 76, 137 semi-Markov chain, 51
law, 88 kernel, 51
Runge-Kutta method, 346 steady-state probabilities, 52
explicit 4th-order, 347 server breakdown, 109
order of, 346 server vacation, 109
single-step method, 346 multiple, 110
single, 111
sample space, 441 service demands per passage, 230
saturation point, 89, 247 simulation
scalar state process, 145 asynchronous, 415
scheduling, 13 continuous-event, 411
k-limited, 176 discrete-event, 412
Bernoulli, 176 event-based, 415
Binomial, 176 rare-event, 435
count-based, 176, 367 synchronous, 413
exhaustive, 176 time-based, 413
first come, first served, 13, 293 simulation time, 412
gated, 176 solution techniques, 8
global time-based, 177 analytical, 8
infinite server, 13, 294 closed-form, 8
last come, first served, 13, 294 numerical, 8
local time-based, 177 simulation, 8
non-preemptive priority, 116 source release, 187
preemptive priority, 116 spectral expansion, 168
preemptive repeat different, 121 speed-up, 7
preemptive repeat identical, 121 squared coefficient of variation, 97, 450
preemptive resume, 121 state space, 31, 229
priority, 13 state transition diagram, 38
processor sharing, 13, 128, 294 stochastic activity network, 324
round robin, 13, 126 stochastic matrix, 37
semi-exhaustive, 176 stochastic models, 10
shortest job next, 123 stochastic Petri net, 305

coloured, 324 enabled, 307


embedded Markov chain, 318 immediate, 306
reduced embedded Markov chain, 320 invariant, 315
stochastic process, 31 live, 314
continuous-state, 32 rate, 306
continuous-time, 32 timed, 306
discrete-state, 31 weight, 306
discrete-time, 32 triple-modular redundant system, 49
independent, 33 truncation method, 429
state, 31 truncation point
stationary, 33 left, 351, 378
stochastic simulation 410 right, 350, 378
stretch factor, 27 steady-state, 35 1, 378
successive over-relaxation, 340
uniformisation, 347, 353
swap-in queue, 283, 358
uniformisation rate, 348
task-completion time, 399 utilisation, 23, 73
tensor product, 149, 458
vector state process, 148
tensor sum, 458
visit order, 174
terminal model, 86
cyclic, 174
mean-value analysis, 88
load-dependent, 373
thrashing, 288
Markovian, 175
threshold priority policy, 374
tabular, 175
extended, 375 visit ratio, 230
throughput, 22
timed-token access mechanism, 186 waiting time distribution, 136
approximations, 187, 191 waiting time paradox, 103
token, 306 work conserving, 98
token holding timer, 187, 367 working set, 287
influence of, 189, 368, 370
χ²-test, 422
token ring, 186, 363
tools, 15
trace file, 428
traffic intensity, 23
transient probabilities, 343
transition, 306

Part I

Performance modelling with


stochastic processes

Chapter 1

Introduction

IN this chapter we discuss the aim of and the approach normally followed in performance
evaluation of computer and communication systems in Section 1.1. A classification of
solution techniques is presented in Section 1.2. The fact that we will need stochastic models
is motivated in Section 1.3. As a special case of these, we then introduce queueing models
in Section 1.4. Finally, in Section 1.5, we discuss the use of software tools for model
construction and solution.

1.1 Performance evaluation: aim and approach


Performance evaluation aims at forecasting system behaviour in a quantitative way. When-
ever new systems are to be built or existing systems have to be reconfigured or adapted,
performance evaluation can be employed to predict the impact of architectural or imple-
mentation changes on the system performance.
An important aspect of performance evaluation is performance measurement or mon-
itoring. By monitoring the timing of certain important events in a system, insight can
be obtained in which system operations take most time, or which system components are
heavily loaded and which are not. Notice that a prerequisite for performance measurement
is the availability of a system that can be observed (measured). If such a system is not
available, measurement cannot be employed. As can easily be understood, performance
measurement will occur much more often in cases where existing systems have to be altered
than in cases where new systems have to be designed. Another important aspect of per-
formance measurement is the fact that the system that is studied will have to be changed
slightly in order to perform the measurements, i.e., extra code might be required to gen-
erate time-stamps and to write event logs. Of course, these alterations themselves affect

the system performance. This is especially the case when employing software monitoring,
i.e., when all the necessary extra functionality for the monitoring process is implemented in
software. When employing hardware monitoring extra hardware is used to detect certain
events, e.g., a computer address bus is monitored to measure the time between certain ad-
dresses passing by, thus giving information on the execution time of parts of programs. As
a combination of hard- and software monitoring, hybrid monitoring can also be employed.
In all cases, one sees that system-specific software or hardware is needed, of which the
development is very costly.
For the above mentioned cost and availability reasons, performance monitoring can
often not be employed. Instead, in those cases, one can use model-based performance
evaluation. This proceeds as follows. If there is no system available that can be used for
performing measurements, we should at least have an unambiguous system description.
From this system description we can then make an abstract model. According to [136]:

“a model is a small-scale reproduction or representation of something”.

In the context of performance evaluation, a model is an abstract description, based on


(mathematically) well-defined concepts, of a system in terms of its components and their
interactions, as well as its interactions with the environment. The environment part in
the model describes how the system is being used, by humans or by other systems. Very
often, this part of the model is called the system workload model. The process of designing
models is called modelling. According to [136]:

“modelling is the art of making models”.

This definition stresses a key issue in model-based performance evaluation, namely the
fact that developing models for computer-communication systems is a very challenging
task. Indeed, performance modelling requires many engineering skills, but these alone are
not enough. There is no such thing as a generally applicable model “cookbook” from
which we can learn how to build the right performance models for all types of computer-
communication systems. Surely, there are generally applicable guidelines, but these are no
more than that. Depending on the situation at hand, a good model (where good needs to
be defined) can range from being extremely simple to being utterly complex.
Let us now come to a few of the guidelines in constructing performance models. The
choice for a particular model heavily depends on the performance measure of interest.
The measure of interest should be chosen such that its value answers the questions one
has about the system. The measures of interest are either user-oriented (sometimes also
called task-oriented) or system-oriented. Examples of the former are the (job) response

time (R), the throughput of jobs (X), the job waiting time (W) and the job service time
(S). In any case, these measures tell something about the performance of system requests
(jobs) as issued by system users. As for system-oriented measures, one can think of the
number of jobs in the system (N) or in some system queue (Nq), or about the utilisation
of system components (p). These measures are not so much related to what users perceive
as system performance; they merely say something about the internal organisation of the
system under study. Very often, system-oriented measures can be related to user-oriented
measures, e.g., via Little’s law (see Chapter 2). In the course of this book, we will address
all these measures in more detail.
Once we have decided to use a particular measure, we have to answer the question
how detailed we want to determine it. Do average values suffice, or are variances also of
interest, or do we even need complete distributions? This degree of detail clearly has its
influence on the model to be developed. As an example of this, for deriving the average
response time in a multiprogrammed computer system, a different model will be needed
than for deriving the probability that the response time is larger than some threshold value.
This aspect is related to the required accuracy of the measure of interest. If only a rough
estimate of a particular measure is required, one might try to keep the model as simple
as possible. If a more accurate determination is required, it might be needed to include
many system details in the model. It is important to point out at this place the fact that
in many circumstances where model-based performance evaluation is employed, there is
great uncertainty about many system aspects and parameters. However, for the model
to be solvable, one needs exact input. In such cases, it seems to be preferable to make a
fairly abstract model with mild assumptions, rather than make a detailed model for which
one cannot provide the required input parameters. In any case, the outcome of the model
should be interpreted taking into account the accuracy of the input; a model is as good as
its input!
Very often, not a single model will be made, but a set of models, one for each design
alternative. Also, these models can have parameters that are still unknown or that are
subject to uncertainty. The analysis and evaluation phases to follow should be performed
for all the model alternatives and parameter values.
Once a model has been constructed, it should be analysed. This analysis can proceed
by using a variety of techniques; we give an overview of the existing solution techniques
in the next section. In many practical cases, model construction and analysis should be
supported by software tools. Real computer and communication systems are generally too
complex to be modelled and analysed with just pencil and paper, although this might not
be totally true for some quick initial calculations.
Figure 1.1: The model-based performance evaluation cycle. The dashed arrows denote
feedback loops; the normal arrows indicate the procedure order.

The numerical outcome of the model solution should be interpreted with care. First
of all, one should ask the question whether the numerical outcomes do provide the answer
to the initially posed question. If not, one might need to change to another measure
of interest, or one might require a different accuracy. Also, it might be required to use
a different solution technique or to change the model slightly. Finally, if the numerical
solution does give an answer to the posed question, this answer should be interpreted in
terms of the operation of the modelled system. This interpretation, or the whole process
that leads to this interpretation, is called system (performance) evaluation. The evaluation
might point to specific system parts that need further investigation, or might result in a
particular design choice. The sketched approach is illustrated in Figure 1.1.
As a final remark, it should be noted that model-based performance evaluation can
also be employed in combination with performance monitoring, especially when changes to
existing systems are considered. In those cases, one can measure particular events in the

system, in order to determine system parameters. These parameters can be input to a fairly
detailed system model. Then, using the model, various alternative system configurations
can be evaluated, which might lead to conclusions about how the real system needs to be
adapted and where investments can best be made. This is, again, often a more cost-effective
approach than just investing and hoping that the system performance improves.

Example 1.1. Amdahl’s law.


Suppose we are interested in determining what the merit of parallelism is for performing a
particular task. We know that this task (a program) takes t seconds to execute on a single
processor of a particular type, and we like to have answers to the following two questions:

• How long does it take to complete this task on an n-processor (of the same type)?

• What is the “speed-up” reached?

To make the above informal questions more concrete, we define two measures:

• T(n), the time it takes to complete the task on an n-processor (we know T(1) = t);

• The speed-up S(n) = T(1)/T(n).

We want to determine these two measures; however, since we do not know much about the
task, the processor, etc., we make a very simple model that gives us estimates about what
using more processors might bring us. We furthermore like to use the model to determine
the performance of massive parallelism (n → ∞).
The task taking t seconds on a single processor can certainly not be completely
parallelised. Indeed, it is reasonable to assume that only a fraction α (0 < α < 1) can be
parallelised. Therefore, the sequential part of the task, of length (1 - α)t, will not be
shortened when using multiple processors; the part of length αt will be shortened. Thus
we find for T(n):

T(n) = (1 - α)t + αt/n.

Taking the limit n → ∞, we find that T(n) → (1 - α)t, meaning that no matter how many
processors we have at our disposal, the task we are interested in will not be completed in
less than (1 - α)t seconds. This result might be a surprise: we cannot reduce the completion
time of tasks to 0 by simply using more processors. A similar observation can be made
regarding the speed-up. We find

S(n) = T(1)/T(n) = t / ((1 - α)t + αt/n) = 1 / (1 - ((n - 1)/n)α).

Taking the limit n → ∞, we find S(n) → 1/(1 - α). This result is known as Amdahl’s
law and states that the speed-up gained when using multiple processors is limited by the
inverse of the fraction of the task that can only be performed sequentially.
As an example, we consider α = 90%. We then find that the best we can do is to
reduce the completion time by a factor of 10, i.e., lim_{n→∞} T(n) = t/10, and the speed-up is
at most lim_{n→∞} S(n) = 10. □
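Simple models like this lend themselves to quick numerical exploration. The following
Python fragment is a minimal sketch of such an exploration (the function names and the
chosen values of t and α are ours and purely illustrative); it tabulates T(n) and S(n) for
increasing n and shows how the speed-up saturates at 1/(1 - α):

    def completion_time(t, alpha, n):
        # Amdahl's model: sequential part (1 - alpha)*t plus parallelisable part alpha*t/n.
        return (1 - alpha) * t + alpha * t / n

    def speed_up(alpha, n):
        # Speed-up S(n) = T(1)/T(n); note that t cancels out.
        return 1.0 / ((1 - alpha) + alpha / n)

    t, alpha = 100.0, 0.9          # illustrative values: a 100-second task, 90% parallelisable
    for n in (1, 2, 4, 8, 16, 64, 1024):
        print(f"n = {n:4d}: T(n) = {completion_time(t, alpha, n):7.2f} s, "
              f"S(n) = {speed_up(alpha, n):5.2f}")
    print(f"asymptotic speed-up 1/(1 - alpha) = {1.0 / (1 - alpha):.1f}")

For α = 0.9 the printed speed-ups approach, but never reach, the value 10 derived above.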

1.2 Model solution techniques


Regarding the solution techniques that can be employed, two main classes of techniques
can be distinguished: analytical and simulative techniques.
If the model at hand fulfills a number of requirements, we can directly calculate im-
portant performance measures from the model by using analytical techniques. Analytical
techniques are of course very convenient, but, as we will see, not many real systems can be
modelled in such a way that the requirements are fulfilled. However, we will spend quite
some time on deriving and applying analytical techniques. The reasons for this are, among
others, that they can give a good insight into the operation of the systems under study at
low cost, and that they can be used for “quick engineering” purposes in system design.
Within the class of analytical techniques, a subclassification is often made. First of
all, there are the so-called closed-form analytical techniques. With these, the performance
measure of interest is given as an explicit expression in terms of the model structure
and parameters. Such techniques are only available for the simplest models. A broader
class of techniques are the analytic/numerical techniques, or numerical techniques, for
short. With these, we are able to obtain (systems of) equations of which the solution
can be obtained by employing techniques known from numerical analysis, e.g., by iterative
procedures. Although such numerical techniques do not give us closed-form formulae, we
still can obtain exact results from them, of course within the error tolerance of the computer
which is used for the numerical calculations.
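To give a flavour of what such a numerical technique looks like, consider the steady-state
probabilities of a small discrete-time Markov chain (Markov chains are only introduced in
Chapter 3); they satisfy a system of linear equations which can be solved by a simple
fixed-point iteration, much like the Power method of Chapter 15. The sketch below is
purely illustrative; the three-state transition matrix is invented for the occasion:

    import numpy as np

    # Illustrative 3-state transition matrix (each row sums to 1).
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])

    pi = np.full(3, 1.0 / 3.0)        # start from the uniform distribution
    for _ in range(1000):
        nxt = pi @ P                  # one iteration step: pi <- pi P
        if np.max(np.abs(nxt - pi)) < 1e-12:
            break
        pi = nxt

    print(pi)                         # converges to [0.25, 0.5, 0.25]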
For the widest class of models that can be imagined, analytical techniques do not exist
to obtain model solutions. In these cases we have to resort to simulation techniques in order
to solve the model, i.e., in order to obtain the measures of interest. With simulation, we
mimic the system behaviour, generally by executing an appropriate simulation program.
When doing so, we take time stamps, tabulate events, etc. After having simulated for some
time, we use the time stamps to derive statistical estimates of the measures of interest.
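The following fragment gives a flavour of this approach: it mimics a single-server queue
with first-come-first-served scheduling (treated from Chapter 2 onwards) by generating
successive customers and tabulating their waiting times; the arrival and service rates are
again purely illustrative:

    import random

    random.seed(42)
    lam, mu = 0.8, 1.0                 # arrival rate and service rate (illustrative)
    n, total_wait, w = 100_000, 0.0, 0.0
    for _ in range(n):
        service = random.expovariate(mu)
        interarrival = random.expovariate(lam)
        w = max(0.0, w + service - interarrival)   # waiting time of the next arriving customer
        total_wait += w

    print(total_wait / n)              # statistical estimate of the mean waiting time

For these rates the estimate should come out close to 4, the exact mean waiting time of
the corresponding M|M|1 queue; how to attach confidence intervals to such estimates is
the subject of Chapter 18.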
It is also possible to combine the above modelling approaches. This is called hybrid

modelling. In such an approach, parts of the model are solved with one technique and the
obtained results are used in combination with the other model parts and solved by another
technique.
The presented classification of solution techniques is not unique, nor beyond debate.
Very often also, the performance models are classified after the techniques that can be used
to solve them, i.e., one then speaks of analytical models or of simulation models.
It is difficult to state in general terms which of the three solution techniques is best.
Each has its own merits and drawbacks. Analytical techniques tend to be the least ex-
pensive and give the modeller deep insight into the main characteristics of the system.
Unfortunately, real systems often cannot be adequately modelled by analytically tractable
models. Approximate analytical models can be an outcome; however, their validity is of-
ten limited to a restricted range of parameters. Numerical techniques, as an intermediate
between pure analytical and simulative techniques, can be applied in very many cases. Us-
ing simulation, the modeller is tempted to make the models too complex since the model
solution technique itself does not bring about any restrictions in the modelling process.
This might easily lead to very large and expensive simulation models. As Alan Scherr,
IBM’s time-sharing pioneer and the first to use analytical techniques for the evaluation of
time-sharing computer systems, puts it in [98]:

“. . . blind, imitative simulation models are by and large a waste of time and
money. To put it into a more diplomatic way, the return on investment isn’t
nearly as high as on a simpler, analytic-type model . . . ”,

and

“. . . the danger is that people will be tempted to take the easy way out and use
the capacity of the computers as a way of avoiding the hard thinking that often
needs to be done”.

Stated differently, (analytical) performance modelling is about “finding those 10% of the
system that explain 90% of its behaviour”. Throughout this book, we will deal mainly
with analytical and numerical modelling techniques.

1.3 Stochastic models


As pointed out in Section 1.1, we are generally concerned with models of systems that do
not yet exist, of which we do not know all the parameters exactly, and, moreover, of which
the usage patterns are not known exactly. Consequently, there is quite some uncertainty
in the models to be developed.
Uncertainty in the system parameters is often dealt with by doing parametric analysis,
i.e., by solving the model many times for different parameters, or by doing a parametric
sensitivity analysis. Both approaches can be used to come up with plots of the system
performance, expressed in some measure of interest, against a varying system parameter.
A different type of uncertainty concerns the usage pattern of the system. This is
generally denoted as the workload imposed on the system. The usage pattern is dependent
on many factors, and cannot be described deterministically, i.e., we simply do not know
what future system users will require the system to do for them. The only thing we do
know are statistics about the usage in the past, or expectations about future behaviour.
These uncertainties naturally lead to the use of random variables in the models. These
variables then express, in a stochastic way, the uncertainty about the usage patterns.

Example 1.2. Uncertainty in user behaviour: workload modelling.


Consider a model of a telephone exchange that is used to compute the long-term probability
that an incoming call needs to be rejected because all outgoing lines are busy. A system
parameter that might yet be uncertain is the number of outgoing lines. In addition there is
uncertainty about at which times (call) arrivals take place and how long calls last. These
uncertainties can be described by random variables obeying a chosen interarrival time
distribution and call duration distribution.
For a given workload, we can do a parametric analysis on the number of outgoing lines,
in order to study the call rejection probability when the telephone exchange is made more
powerful (and expensive!). Doing a parametric analysis of the system (with fixed number
of outgoing lines) on the mean time between call arrivals, gives insight into the quality
of the system, i.e., the call rejection probability, when the workload increases. Changing
the distribution of the times between calls or the call duration distribution, but keeping
the call rate (one over the mean intercall time) and the number of outgoing lines constant,
allows us to study the call rejection probability as a function of the variability in the arrival
pattern and the duration of the calls. □

When making stochastic assumptions we naturally end up with stochastic models. The
overall behaviour of the system is then described as a stochastic process in time, i.e., a col-
lection of random variables that change their value in the course of time. The performance
measures of interest then need to be expressed as functions of this stochastic process. De-
pending on the type of the stochastic process and the type of measures requested, this
function can be more or less easy to determine.

Example 1.3. From stochastic model to measure of interest.


Referring to the previous example, we will see in Chapter 4 that the long-run call rejection
probability can be computed in closed form under the assumption that the times between
call arrivals and the call duration distributions are negative exponential. Under the same
conditions, but if the measure of interest is slightly changed to the call rejection proba-
bility at a certain time instance t, the numerical solution of a system of linear differential
equations is required. □

1.4 Queueing models


A very important class of stochastic models are queueing models, which we introduce in this
section. We discuss the principle of queueing in Section 1.4.1. We then present Kendall’s
notation to characterise simple queueing stations in Section 1.4.2.

1.4.1 The principle of queueing

Queueing models describe queueing phenomena that occur in reality. Queueing can be
observed almost everywhere. We know about it from our daily lives: we line up in front
of airline check-in counters, in front of coffee machines, at the dentist, at traffic crossings,
etc. In all these cases, queueing occurs because the arrival pattern of customers varies in
time, and the service characteristics vary from customer to customer. As a general rule of
thumb, the more variability is involved, the more we need to queue. Directly associated
with queueing is waiting. The longer a queue, the longer one normally has to wait before
being served.
Also in technical systems, queueing plays an important role. Although we will focus
on computer-communication systems, also in logistic systems and in manufacturing lines,
queueing can be observed. It is interesting to note that in all these fields, similar techniques
are used to analyse and optimise system operation.
In the area of computer-communication systems, one observes that many system users
want to access, every now and then, shared resources. These shared resources vary from
printers, to central file or compute servers, or to the access networks for these central
facilities. Because the request rates and the requested volumes issued by a large user
population vary in time, situations occur when more than one user wants to access a single
resource. Waiting for one another is then the only reasonable solution. The alternative
to give all users all the resources privately is not a very cost-effective solution. Besides
that, it would also preclude other advantages of the use of shared resources, such as central
support and data sharing.

Figure 1.2: The basic model for a scarce resource (arrivals, a queue for waiting, a server, and departures)

Example 1.4. A centralised compute server.


At a computer centre, jobs from various users (sitting behind their terminals) arrive to be
processed at a powerful compute server. Depending on how many other jobs are already
being processed, an incoming job is directly served or it has to wait for some time before
its predecessors are served. Once in service, the job occupies some part of the system
capacity. Let us assume a round-robin scheduling strategy, in which up to some maximum
number of jobs are served quasi-simultaneously. When the job processing has completed,
the resources that were used (apart from the CPU, some part of virtual memory and disks
are mostly involved) are released again and can be used for servicing other jobs. Important
to observe is that we deal with a resource with limited capacity, namely the compute server
in the computer centre which is to be shared by a large population of customers/terminal
users. The population of terminal users is so large that, if they all wanted to be served
simultaneously, the compute server would be overloaded. □

The idea now is to model all types of shared resources as service providing entities
preceded by waiting queues, as depicted in Figure 1.2. Let us try to characterise such a
basic queueing station at which customers arrive, wait, are served, and finally depart. The
following aspects are of importance for the quality of the provided service as perceived by
the customers:

• The time between successive arrivals of customers requesting service. These
intervals are often assumed to obey some stochastic regime.

• The customer population. Are we dealing with an infinite or with a finite customer
population?

• The amount of waiting room that is available for the customers that cannot directly
be served.

• The amount of service a customer requests. This is generally also described by some
stochastic variable.

• The number of service providing entities that are available. Are we dealing with a
single server or with a multi-server?

• The way in which incoming customers are scheduled to obtain their requested service.
Commonly used scheduling principles are:

- FCFS: First Come, First Served;

- RR: Round-Robin;

- PS: Processor Sharing (the limiting case of RR);

- LCFS(PR): Last Come, First Served, with or without Preemption;

- IS: Infinite Server;

- PRIO: Prioritized scheduling.

Once a queueing station has been characterised completely, we can try to evaluate the
performance characteristics of such a queueing station.
A remark about the employed terminology should be made here. In general, we speak of
either jobs or customers in a queueing station. These two names are used interchangeably.
Sometimes, when the application is more computer-network oriented, we also use the term
packet.

1.4.2 Single queues: the Kendall notation

In order to compactly describe single queueing stations in an unambiguous way, the so-
called Kendall notation is often used; it consists of 6 identifiers, separated by vertical bars,
as follows:

Arrivals | Service | Servers | Buffersize | Population | Scheduling,

where “Arrivals” characterises the customer arrival process, “Service” the customer service
requirements, “Servers” the number of service providing entities, “Buffersize” the maximum
number of customers in the queueing station, which includes the customer possibly in the
server, “Population” the size of the customer population, and finally, “Scheduling” the
employed scheduling strategy. Often, the buffer size and the population are omitted from
the description; in that case they are assumed to be infinitely large. The scheduling strategy
is also often omitted; in that case, it is assumed to be FCFS. The parameters, especially
“Arrivals” and “Service”, may assume many different values. We mention some commonly
used ones:

• M (Markovian or Memoryless): whenever the interarrival or service times are nega-
tive exponentially distributed;

• G (General): whenever the times involved may be arbitrarily distributed;

• D (Deterministic): whenever the times involved are constant;

• E_r (r-stage Erlang): whenever the times involved are distributed according to an
Erlang-r distribution;

• H_r: whenever the times involved are distributed according to an r-state hyper-
exponential distribution.

We will discuss many different types of queueing stations in Part II of the book.

Example 1.5. Kendall notation.


When we have an M|G|2|8|∞|LCFS queueing station, we have (negative) exponentially dis-
tributed interarrival times, generally distributed service times, 2 service providing entities,
maximally 8 customers present, no limitation on the total customer population, and an
LCFS scheduling strategy. □
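The defaults mentioned above (infinite buffer size and population, FCFS scheduling when omitted) are easy to capture in a small data structure. The sketch below is merely an illustration added to this text, not an established library, and the class and field names are invented for the example:

from dataclasses import dataclass
import math

@dataclass
class KendallStation:
    # Identifiers in the order Arrivals|Service|Servers|Buffersize|Population|Scheduling;
    # omitted buffer size and population default to infinity, scheduling to FCFS.
    arrivals: str
    service: str
    servers: int
    buffersize: float = math.inf
    population: float = math.inf
    scheduling: str = "FCFS"

    def __str__(self):
        def fmt(x):
            return "inf" if x == math.inf else str(x)
        return "|".join([self.arrivals, self.service, str(self.servers),
                         fmt(self.buffersize), fmt(self.population), self.scheduling])

print(KendallStation("M", "M", 1))                        # M|M|1|inf|inf|FCFS
print(KendallStation("M", "G", 2, 8, scheduling="LCFS"))  # the station of Example 1.5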

Example 1.6. Queueing in daily life.


Coin-operated coffee machines can be found in many universities and laboratories. Al-
though their service time, that is, the time for preparing a cup of coffee once a coin has
been inserted, is deterministic, some waiting often occurs in front of them. This is typi-
cally due to the stochastic nature of the arrival process. A single coffee machine could be
described by a G|D|1 queue.
When visiting a doctor, after having made an appointment, one often still has to wait.
Although the arrivals of patients can be seen as deterministic if the appointments have
been made accurately, long waiting times arise due to the fact that the service times,
i.e., the time a doctor talks to or examines patients, is stochastic. This situation could be
well described with a D|G|1 queue.
When visiting a doctor without an appointment, that is at the regular “walk-in” con-
sulting hours, things get even worse. In that case, the arrival process of patients is not
deterministic any more. The perceived waiting times therefore also increase. This situation
could very well be described with a G|G|1 queue. □

Important to note is the fact that sometimes the notation “GI” is used, instead of just “G”,
to indicate a general arrival process. The added “I” specifically denotes that succeeding
arrivals are independent of one another; just “G” might allow for dependence between
successive arrivals.

1.5 Tool support


For complex systems, it is not easy to come up directly with a stochastic model that de-
scribes all the relevant aspects of the system in relation to the measure of interest. Instead,
we build system models in a stepwise fashion using basic building blocks (submodels). The
translation of the thus-constructed model to an underlying stochastic model can often be
performed automatically. In this way we “pull up” the modelling activity to a level that
is closer to the system designer’s point of view than to the mathematician’s point of view.
Still though, for a proper application of high-level modelling constructs, we need to have
knowledge about the underlying mathematics.
Once we have obtained the stochastic model of interest, the derivation of the measure
of interest needs to be performed. As already indicated in Section 1.2, the model solution
can be more or less intricate, requiring detailed knowledge about simulation or about
analytic/numerical techniques.
In both the above cases, i.e., model construction and model solution, tool support is
of key importance. We discuss the role of software tools for model construction in Sec-
tion 1.5.1, and their role in model solution in Section 1.5.2.

1.5.1 Model construction


Software tools can be a great help in the construction of large stochastic models. We can
distinguish at least two representations of a model. First, there is the analytical repre-
sentation of a model. This is the representation that is directly suitable for a numerical
evaluation by one of the techniques mentioned in Section 1.2. Secondly, there is the mod-
eller’s representation of a model. This is a description in a symbolic form oriented towards
the specific application, that is, the system to be modelled. Clearly, most system designers
prefer to use the modeller’s representation rather than the analytical representation.

The idea of having more than one view of a model, depending on what one wants to do
with the model, is central to the general modelling tool framework (GMTF) for quantitative
systems modelling.
In the GMTF, a hierarchy of descriptive formalisms ranging from F_0 (the lowest level)
to F_n (the highest level) is employed. F_0 yields models that are directly suitable for
evaluations of one or another form, by using analytic, numerical or simulation techniques,
whereas F_n is the formalism closest to the application domain. We define F_i-modelling as
the process of abstracting, simplifying and/or rewriting a system description S in such a
way that it fits the formalism F_i. The result of this process is called an F_i-model M_i of S.
An F_i-model M_i of S can be rewritten in another formalism F_{i-1} (provided i ≥ 1), yielding
an F_{i-1}-model M_{i-1} of S. This is called F_{i-1}-modelling. The lowest level formalism is F_0,
which coincides with the earlier mentioned analytical representation.
When F_n is the highest-level formalism, most of the user activity in the modelling
process will be F_n-modelling. The lower level modelling activities can often be partially or
completely automated; the need for tool support is evident.
Once we have a model it should be evaluated. The evaluation of an F_0-model M_0 yields
results in the descriptive formalism R_0. To do so, solution techniques like those indicated in
Section 1.2 can directly be applied since the formalism F_0 has been chosen to directly suit
those techniques. The evaluation of an F_0-model M_0 is called a V_0-evaluation. The results
presented in the formalism or domain R_i (i ≥ 0) can be further processed or enhanced to
the higher level R_{i+1}. This is called an E_{i+1}-enhancement. Often these enhancements can
be done automatically; tool support is again of importance here.
When we have an F_j-model M_j of a system S we generally want to evaluate this model
and obtain measures that are specified at the same level. That is, we need the results to
be given in domain R_j. We define a virtual evaluation V_j as the process of subsequently
modelling M_i (1 ≤ i ≤ j) in formalism F_{i-1}, until an F_0-model M_0 is obtained, followed
by the V_0-evaluation and the subsequent enhancements E_1 through E_j. Schematically, we
have the structure as depicted in Figure 1.3. This structure represents the GMTF. The
small boxes represent system models (right hand side) or evaluation results (left hand side).
The large box represents the actual mathematical evaluation. The single pointed arrows
represent automatic translations of one formalism into another. The double pointed arrows
represent the virtual evaluations.

Example 1.7. A tool based on stochastic Petri nets.


In Chapter 14 we will present stochastic Petri nets (SPNs) as a suitable formalism for
many performance models. Various tools have been developed that support this formalism
(see the surveys [121, 119]).

Figure 1.3: The general modelling tool framework (system models M_i in formalisms F_i on the right, evaluation results in domains R_i on the left, connected by modelling steps, enhancements E_i and the V_0-evaluation)


When using these tools, the relevant aspects of a system S have to be captured in the
SPN formalism. This is the F_1-modelling activity, yielding an F_1-model M_1 of the system.
This modelling activity is performed by a human being.
Model M_1 can be converted to a continuous-time Markov chain (see Chapter 3). This is
the F_0-modelling activity, yielding an F_0-model M_0 of the system. Note that this modelling
activity can be performed automatically with the algorithm outlined in Chapter 14.
The resulting M_0 model, which is a Markov chain, can be evaluated by a standard
numerical technique (see Chapter 15); this is the V_0-evaluation. The results from this
evaluation are state probabilities, which fall in the domain R_0. Although interesting as
such, the system analyst who does not know that the SPN is solved via an underlying
CTMC cannot do anything with these state probabilities. They have to be enhanced
to a level which corresponds to the level at which the model was originally made. The
calculation of token distributions per place from the state probabilities is an example of an
enhancement E_1 to the domain R_1. This enhancement can also be performed automatically.
As a conclusion, the tool user only “sees” the virtual evaluation V_1. □

1.5.2 Model solution

Once we have obtained a stochastic model, the measure of interest still needs to be derived.
Seen in light of the GMTF, one can say that solution techniques are needed to perform
the V_0-evaluations, that is, to take the F_0-model M_0 and to “transform” it into results in
R_0, which can be more or less intricate. It should be noted also that the transformations
from one modelling level to the next level below (and up again as far as the solutions are
concerned) should be regarded as part of the solution process. Indeed, as we will see later,
these transformations can be more complex and time-consuming than the solution of the
lowest-level model once it has been generated.
In a generally applicable performance evaluation tool, one would prefer to have a wide
variety of solution techniques available. Depending on the model at hand, e.g., its statisti-
cal characteristics or its size, one or another solution technique might be more suitable. It
is desirable that the different solution techniques can be fed with models in the same for-
malism. In that way, different solution techniques can be compared, and, while increasing
the complexity of the model by adding details, shifts from more general to more specific
solution techniques can be made. It should be understood that the desire to be able to
“play around” with different solution techniques is in (mild) conflict with the desire to
keep the solution techniques as invisible as possible to the end-users of the tool.

Example 1.8. Incremental modelling.


In the early design phases of a system, often only imprecise information is known about the
durations of certain events. Suppose that one chooses to make a queueing network model of
such a system design, to evaluate various design alternatives. Since only limited information
is available, only fairly abstract models can be made. For such models, a solution based on
mean-value analysis might be appropriate (see Chapter 11). When more implementation
aspects become clear in the course of the design, more detailed stochastic assumptions
can be made, which might violate the restrictions needed to perform a mean-value analysis. A
simulation approach towards the solution, or an approximate solution might, however, still
apply. If these solution techniques are supported, starting from the same model description,
a consistent upgrade to the more detailed model can easily be established. Also, it can
then be easily investigated whether the more detailed information is indeed of importance
for the measure of interest. If not, one can use the “lighter” solution technique. □

In many performance evaluation techniques some kind of accuracy control is involved.


This can be in the form of required confidence interval widths in simulations, in the allowed
number of iteration steps in numerical procedures, or in the choice of a truncation point
in an infinite series in the case of analytic/numerical techniques. A software tool should


provide suitable values for such parameters, so that users unaware of the specific techniques
involved are not bothered by these details. However, it should always be possible for the
more experienced user to change these values to better ones, given the application at hand.
The specification and control of (required and obtained) accuracy remains a difficult task.

1.6 Further reading


More information on measurement-based performance evaluation can be found in Jain
[145], Hofmann et al. [135] and Lange et al. [171]. A model-based performance evaluation
cycle similar to the one presented here has been presented by Van Moorsel [206]. Amdahl
published his now-famous law in 1967 [8], although it is not beyond debate [115]. Queueing
models to evaluate system performance have been developed since the beginning of this
century; these models and their applications are the main topic of this book. Kendall
introduced the short-hand notation for single queueing systems [155]. Scherr describes his
time-sharing models already in 1966 [255]. Berson et al. [17] and Page et al. [227] presented
a simpler form of the GMTF. The GMTF was introduced by Haverkort [120]; variants of
it, tailored towards specific application areas, can be found in the literature [127, 180, 211,
258]. Interesting general guidelines for the construction of performance evaluation tools
have been reported by Beilner [16]. Recent surveys of performance evaluation tools can be
found in [126, 125].

1.7 Exercises
1.1. Performance measures.
The average response times of computer systems A and B are such that E[RA] < E[RB].

1. Is system A better than system B for all types of applications?

2. What would be an appropriate performance measure to compare these systems in


case they have to fulfill real-time requirements?

3. What would be an appropriate performance measure to compare these systems in


case they have to fulfill reliability requirements?

1.2. Amdahl’s law.


Using the expression for T(n), compute the number of processors n_p that is needed to
reduce the task completion time t to βt (0 < β < 1). What is the range of values that β
can assume?

1.3. Queueing in daily life.


Discuss the models presented in Example 1.6. What are the underlying assumptions?

1.4. Influence of variability.


Address the following questions based on your intuition and experience. In later chapters,
we will address these questions in more detail.

1. How do you think increased variability in customer arrival and service patterns
affects the performance of many systems?

2. Is increased variability in service times “worse” for performance than increased vari-
ability in interarrival times?

3. What kind of CPU scheduling is often used in operating systems and how do you
think it affects the performance of systems?

1.5. The GMTF.


Compare the main ideas behind the GMTF with those behind:

1. The use of high-level programming languages and adequate compilers to avoid as-
sembler or machine programming.

2. The possibility to define high-level functions and modules in many programming


languages to enhance the programming activity from a solution-algorithm-orientation
to a problem-description-orientation.

3. The use of layered communication protocol reference models.



Chapter 2

Little’s law and the M|M|1 queue

In Section 2.1 we present Little’s law, a very general law that can be applied in many
queueing models. Using Little’s law, we are able to study the simplest queueing model,
the M|M|1 queueing model, in Section 2.2.

2.1 Little’s law


In this section we will introduce probably the most general law in model-based performance
evaluation: Little’s law, named after the author who first proved it [186]. Its generality
lies in the fact that it can be applied almost unconditionally to all queueing models and at
many levels of abstraction. Its strength furthermore lies in the fact that its form is both
intuitively appealing and simple.
In Section 2.1.1 we introduce Little’s law and explain it intuitively. A more thorough
proof is given in Section 2.1.2.

2.1.1 Understanding Little’s law

Little’s law relates the average number of jobs in a queueing station to the average number
of arrivals per time unit and the average time a job spends in a queueing station.
Consider a queueing station as a black box at which on average λ jobs per time unit
arrive (see Figure 2.1); λ is called the arrival rate or the arrival intensity. When we assume
that jobs are served on a first come-first served basis (FCFS), whenever a job arrives at the
queueing station, two things might happen. Either the job is immediately served which
implies that there are no other jobs in the queueing station, or it has to wait until the jobs
already in the queueing station are served before it gets its turn. Notice that we do not
assume anything about the distributions of the interarrival and service times; only their
means are used! Denote the average time a job spends in the queueing station as E[R]
(residence time / response time), and the average number of jobs in the queueing station
as E[N].

Figure 2.1: Black box view of a job servicing system (arrivals enter the service providing entity and departures leave it)
Now, suppose that we mark a particular job while observing the queueing station.
When the marked job enters the queueing system we take a time stamp t_i (i for “in”).
When the marked job comes out of the queueing station, we take a time stamp t_o (o for
“out”). The difference t_o - t_i will on average be equal to the earlier defined value E[R].
However, at the moment the marked job leaves the queueing station, we know that while
the job passed through the queueing station, other jobs have arrived. How many? Well,
since on average there have elapsed E[R] time units between the arrival and the departure
of the marked job, on average λ × E[R] jobs have arrived after the marked job in that
period. This number, however, must be equal to the earlier defined value of E[N]. This is
due to the fact that every job can be a marked job, and hence, the product λE[R] always
equals the number of jobs left behind in the queueing station by a departing job, i.e., it can
be interpreted as the average value of the number of jobs in the system. As a consequence,
we have

E[N] = λE[R].    (2.1)
Notice that we have assumed that all jobs that arrive also leave the system after having
received service so that no losses of jobs occur and that the system is not overloaded. In
that case, the arrival rate λ equals the throughput of jobs (denoted X). This is normally
the case when the system is not overloaded and when there is infinite buffer capacity. In
case the system is overloaded or when there are finite buffers, customers will be lost, and
only the non-lost customers should be taken into account. In such cases Little’s law should
read E[N] = XE[R]. Due to the fact that losses may occur in some systems, X is not
always a priori known!
The derivation sketched above can be applied at any desired level of abstraction. Sup-
pose that we open up the black box and look at it as if it were a queue, followed by the
actual server (see Figure 2.2). Then we could still apply Little’s law; however, we could
also apply it on the more detailed level of the queue and the server separately, as we will
explain below.

Figure 2.2: The inside view of a job servicing system (a queue for waiting, followed by the server providing service)
At the queue on average λ jobs arrive per time unit. The time spent in the queue is the
waiting time, which on average equals E[W]. Applying Little’s law, the average number
of jobs in the queue, E[N_q], must be equal to the product of the waiting time and the
throughput through the server. The latter equals the arrivals at the queue (no jobs are
lost in between) and we thus obtain:

E[N_q] = λE[W].    (2.2)


When looking solely at the server, we again observe, on average, λ job arrivals per time
unit, since between the queue and the server no jobs are lost. The average time a job
spends in the server, i.e., is being served, is denoted E[S]. Then the average number of
jobs in the server is given by:

E[N_s] = λE[S].    (2.3)

Since we are dealing with a single server, E[N_s] can at most be equal to 1 and must at
least equal 0. In fact, E[N_s] not only indicates the average number of jobs in the server
but also the average fraction of time the server is busy. Therefore, E[N_s] is also called the utilisation
or traffic intensity and denoted as ρ = λE[S] = XE[S] (where the last equality holds when
there are no losses).
Returning to our overall queueing system, the average number of jobs in the queueing
system must be equal to the sum of the average numbers of jobs in the components of
the queueing system, i.e., E[N] = E[N_q] + E[N_s]. Applying the earlier derived versions of
Little’s law, we obtain

E[N] = E[N_q] + E[N_s] = λE[W] + λE[S] = λ(E[W] + E[S]) = λE[R],    (2.4)


which is as expected.
There are some important remarks to be made on the generality and use of Little’s law:

• Little’s law expresses a simple relationship between the average values of the through-
put, the residence times and the number of jobs in the system;

• No assumptions regarding the involved interarrival- and service-time distributions
have to be made;

• Little’s law is valid independently of the scheduling discipline of a queue as well as
of the number of servers;

• As we will see later, with many analysis techniques E[N] can be obtained easily.
Using Little’s law, measures like E[R] can then be derived;

• Little’s law not only applies to single queueing stations, but also to networks of
queueing stations.

2.1.2 Proof of Little’s law


After the proof given by Little in 1961 a wide variety of “simple proofs for E[N] = λE[R]”
has appeared in the literature. In this section we give a proof of Little’s law which indeed is
more formal than the intuitive explanation we gave in Section 2.1.1. The presented proof
is valid under the assumption that the queueing system empties infinitely often. For a
queueing system where the arrival rate of jobs is smaller than the service rate of jobs, this
is the case.
As before, we consider the black box view of a queueing system. Without loss of
generality we start observing the queueing system at time t = 0. Let A(t) denote the
number of job arrivals to the queueing system up till time t. A similar definition holds for
D(t), the number of job departures from the system up to time t. Clearly, D(t) ≤ A(t).
When we denote with N(t) the number of jobs in the system at time t, we have N(t) =
A(t) - D(t). In Figure 2.3 we show a possible evolution of A(t) and D(t).

Figure 2.3: Sample evolution of A(t) and D(t)

We now proceed to define the following four quantities:

• The average arrival rate at the system up to time t, denoted λ_t, simply is the total
number of arrivals up to time t, i.e., A(t), divided by t. Thus, we have λ_t = A(t)/t,
and λ_t → λ if t → ∞.

• The total system time of all jobs, R(t), is exactly the area between the two curves of
A(t) and D(t), i.e.,

R(t) = ∫_0^t N(s) ds.    (2.5)

• The average time a job spends in the system as observed over the period [0, t) is
denoted R_t. This quantity, however, is equal to the total system time, i.e., R(t),
divided by the total number of customers having been present in the system, i.e., A(t).
Thus, we have R_t = R(t)/A(t), and R_t → E[R] if t → ∞.

• If the total system time of all jobs during [0, t) equals R(t) job-seconds, the average
number of jobs in the system up to time t, denoted N_t, equals R(t)/t, and N_t → E[N]
if t → ∞.

We now can derive the following relation:

N_t = R(t)/t = (A(t)/t) × (R(t)/A(t)) = λ_t R_t.    (2.6)

Now, taking the limit as t → ∞, we obtain E[N] = λE[R] since λ_t → λ, N_t → E[N] and
R_t → E[R]. In case the queue is saturated, the number of jobs queued will grow without
bound, hence, the expected response time will not be bounded either.
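The relation (2.6) can also be observed on a simulated sample path. The following sketch is an added illustration, not part of the original text; it assumes exponentially distributed interarrival and service times and FCFS service purely as an example. It builds the arrival and departure epochs of a single-server queue, computes the time average N_t by integrating N(s), and compares it with λ_t R_t:

import random

def queue_sample_path(lam, mu, num_jobs, seed=42):
    # Single-server FCFS queue, exponential interarrival (rate lam) and
    # service (rate mu) times; returns per-job arrival and departure epochs.
    random.seed(seed)
    arr, dep = [], []
    t, busy_until = 0.0, 0.0
    for _ in range(num_jobs):
        t += random.expovariate(lam)
        start = max(t, busy_until)          # wait if the server is still busy
        busy_until = start + random.expovariate(mu)
        arr.append(t)
        dep.append(busy_until)
    return arr, dep

def time_average_number_in_system(arr, dep, horizon):
    # Integrate N(s) over [0, horizon) by sweeping arrival/departure events.
    events = sorted([(a, +1) for a in arr] + [(d, -1) for d in dep])
    area, n, last = 0.0, 0, 0.0
    for when, delta in events:
        area += n * (when - last)
        n += delta
        last = when
    return area / horizon

arr, dep = queue_sample_path(lam=0.8, mu=1.0, num_jobs=100000)
t = dep[-1]                                        # all jobs have left by time t
lam_t = len(arr) / t                               # observed arrival rate
r_t = sum(d - a for a, d in zip(arr, dep)) / len(arr)   # observed mean sojourn time
n_t = time_average_number_in_system(arr, dep, t)
print(n_t, lam_t * r_t)   # the two values coincide, as relation (2.6) states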

2.2 The simplest queueing model: the M|M|1 queue


In this section we make use of Little’s law to evaluate the performance characteristics of
the simplest queueing system possible: the M|M|1 queue.
Consider a queueing station where jobs arrive with a negative exponential interarrival
time distribution with rate λ (mean interarrival time E[A] = 1/λ). Furthermore, the job
service requirements are also negative exponentially distributed with mean E[S] = 1/μ.
Such a model typically applies when addressing a telephone exchange: empirical results

indeed support the assumptions of negative exponential interarrivals times and service
times (call durations).
We first verify whether the utilisation ρ = λ/μ < 1, that is, we check whether on
average the number of jobs arriving per unit of time is smaller than the number of jobs
that the system is able to handle per unit of time. If so, the system is stable and we
continue our investigations. If not, the system is overloaded and an infinite queue will
grow; further analysis will then not help us.
We are interested in determining the following mean performance measures for this
queueing system: the mean queue length E[N_q], the mean number of jobs in the system
E[N], the mean waiting time E[W] and the mean response time E[R].
We already know that we can apply Little’s law for the complete queueing station as
well as for the queue in isolation, yielding:

E[N_q] = λE[W], and E[N] = λE[R].    (2.7)

We furthermore know that E[R] = E[W] + E[S]. To solve for the above four unknown
quantities, we have to have one extra relation. This relation can be derived from the so-
called PASTA property (which will be discussed in detail in Chapter 4) which states that
jobs arriving according to a Poisson process (a process where the interarrival times are
negative exponentially distributed, as we have here (see also Chapter 3)) see the queueing
system upon their arrival as if in equilibrium. This means that an arriving job finds, on
average, E[N] jobs already in the queueing station, for which it has to wait before being
served. This yields the following extra equation:

E[W] = E[N]E[S]. (2.8)

The overall response time of such a customer then consists of two main components: (i)
the service time of all customers queued in front of it; and (ii) its own service time. Thus, we
have:

E[R] = E[N]E[S] + E[S].    (2.9)

We also use the fact here that the service times are negative exponentially distributed since
we assume that the remaining service time of the customer in service at the arrival instance
of the new customer is the same as a normal service time (we also come back to this issue
in Chapter 5). Now, we can use Little’s law to write E[N] = λE[R], so that we find:

E[R] = λE[R]E[S] + E[S] ⇒ E[R](1 - λE[S]) = E[S].    (2.10)


Recognising ρ = λE[S] and rearranging terms we find

E[R] = E[S] / (1 - ρ).    (2.11)

Using the relations we already had, we then find that

E[W] = ρE[S] / (1 - ρ),   E[N] = ρ / (1 - ρ),   and   E[N_q] = ρ² / (1 - ρ).    (2.12)

There are a number of important conclusions to be drawn from these results. First of
all, we observe that the mean performance measures of interest grow infinitely large as
ρ → 1. This conforms with our intuition. Secondly, we see that the performance measures
of interest do not grow linearly with ρ; instead they grow with a factor

1 / (1 - ρ),    (2.13)

which is called the stretch factor since it can be interpreted as the factor with which the
mean service time E[S] is stretched (multiplied) to yield the mean response time when a
server with total utilisation ρ processes a job with average length E[S].
In Figure 2.4 we present E[R] and E[W] as a function of ρ. Note that E[N] = 1 for
ρ = 50% (at 50% utilisation, on average, there is one job in the queueing station) and
E[W] = E[S] (at 50% utilisation, the mean waiting time equals the mean service time).
Consequently, for higher utilisations, the mean response time includes more than 50% of
waiting. Also observe that the curves start relatively flat but increase quite sharply once
above the 50% boundary. This is typical for most queueing systems and explains why
many computer and communication systems exhibit “all of a sudden” bad performance
when the load is only moderately increased.
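The sharp growth of the curves in Figure 2.4 can be reproduced directly from (2.11) and (2.12). The following small sketch (added here for illustration) tabulates E[W] and E[R], in multiples of E[S], for increasing utilisation:

# Mean waiting and response time of the M|M|1 queue, in units of E[S],
# following (2.11) and (2.12): E[R] = E[S]/(1 - rho), E[W] = rho*E[S]/(1 - rho).
def mm1_means(rho):
    assert 0 <= rho < 1, "the queue is only stable for rho < 1"
    e_r = 1.0 / (1.0 - rho)      # E[R] / E[S]
    e_w = rho / (1.0 - rho)      # E[W] / E[S]
    e_n = rho / (1.0 - rho)      # E[N]
    return e_w, e_r, e_n

for rho in (0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
    e_w, e_r, e_n = mm1_means(rho)
    print(f"rho={rho:4.2f}  E[W]={e_w:6.2f} E[S]  E[R]={e_r:6.2f} E[S]  E[N]={e_n:6.2f}")
# At rho = 0.5 we indeed find E[W] = E[S] and E[N] = 1; beyond that the values explode.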
The derivation just given provides us with only the mean values of a number of per-
formance measures of interest. For some applications, this might be enough. However,
very often also information on the variance of performance measures is of importance or
information on detailed probabilities regarding certain events in the system (e.g., the prob-
ability that a buffer contains at least k customers). For these cases a more detailed and
intricate analysis of the queueing model is necessary. To derive such information, we need
to study the stochastic process underlying the queueing systems. In order to be able to do
so we need to study stochastic processes first, which is therefore the topic of Chapter 3.
Figure 2.4: The mean waiting time E[W] and the mean response time E[R] (measured in
E[S]) as a function of ρ for the M|M|1 queueing model

2.3 Further reading


Little’s law was originally presented in 1961 [186]. Other proofs of Little’s results can be
found in [81, 150, 193, 269, 270]. The analysis of the M|M|1 queue as presented here is an
example of the so-called “method of moments” to derive average performance measures;
this method was first presented by Cobham in 1954 [57, 58]. Paterok et al. discuss its
suitability for the derivation of higher moments [230]. We will use the method of moments
on many occasions in the course of this book.

2.4 Exercises
2.1. Time-sharing computer systems [adapted from [156]].
Consider a time-sharing computer system that supports K users sitting behind their ter-
minals issuing commands. The time it takes a single user to interpret an answer from the
central computer before a new command is issued is called the think time and takes, on
average, E[Z] seconds. The computer system can process a command in, on average, E[S]
seconds. Now assume that there are on average k commands (jobs) being worked upon
by the central computer, hence, there are K - k jobs being “processed” by the users; the
effective rate at which these users generate new requests for the central computer then
equals (K - k)/E[Z].

1. Express the expected central computer response time E[R] in terms of K, k and E[Z]
using Little’s law applied to the central computer.

2. Let p_0 equal the probability that the central computer is empty (all jobs are being
worked upon by the users). The throughput of the central computer can then be
expressed as (1 - p_0)/E[S]. This can be understood as follows. If there were
an infinite supply of jobs, the computer would on average process one every E[S]
units of time, yielding a rate of completion of 1/E[S]. However, with probability
p_0 there is no job for the computer to work upon, hence its rate of completion
is effectively reduced to (1 - p_0)/E[S]. Using Little’s law for the terminals, express
the expected number of customers in the terminals, i.e., K - k, as a function of p_0,
E[S] and E[Z].

3. Show that the combination of the above expressions yields:

E[R] = K E[S] / (1 - p_0) - E[Z].

4. Based on the possible values that p_0 can assume, derive the following lower bound
for E[R], thereby taking into account the fact that E[R] cannot be smaller than E[S]:

E[R] ≥ max{K E[S] - E[Z], E[S]}.

We will address this and similar models in more detail in Chapter 4 and in Chapters 11
through 13.

2.2. M|M|1 queueing.


How large can the utilisation in an M|M|1 queue be so that the expected response time is
at most n times the expected service time?

Chapter 3

Stochastic processes

The aim of this chapter is to provide the necessary background in stochastic processes
for practical performance evaluation purposes. We do not aim at completeness in this
chapter, nor at mathematical rigour. It is assumed that the reader has basic knowledge
about probability theory, as outlined in Appendix A.


We first define stochastic processes and classify them in Section 3.1, after which we
discuss a number of different stochastic process classes in more detail. We start with
renewal processes in Section 3.2. We follow with the study of discrete-time Markov chains
(DTMCs) in Section 3.3, followed by Section 3.4 in which general properties of Markov
chains are presented. Then, in Section 3.5, continuous-time Markov chains (CTMCs)
are discussed. Section 3.6 then discusses semi-Markov processes. Two special cases of
CTMCs, the birth-death process and the Poisson process are discussed in Sections 3.7 and
3.8, respectively. In Section 3.9 we discuss the use of renewal processes as arrival processes;
particular emphasis is given to phase-type renewal processes. Finally, in Section 3.10, we
summarise the specification and evaluation of the various types of Markov chains.

3.1 Overview of stochastic processes


A stochastic process is a collection of random variables {X(t) | t ∈ T}, defined on a prob-
ability space, and indexed by a parameter t (usually assumed to be time) which can take
values in a set T.
The values that X(t) assumes are called states. The set of all possible states is called
the state space and is denoted I. If the state space is discrete, we deal with a discrete-
state stochastic process, which is called a chain. For convenience, it is often assumed that
whenever we deal with a chain, the state space I = {0, 1, 2, ...}. The state space can also
be continuous. We then deal with a continuous-state stochastic process. A similar classi-
fication can be made regarding the index set T. The set T can be denumerable, leading
to a discrete-time stochastic process, or it can be continuous, leading to a continuous-time
stochastic process. In case the set T is discrete, the stochastic process is often denoted as
{N_k | k ∈ T}. Since we have two possibilities for each of the two sets involved, we end up
with four different types of stochastic processes. Let us give examples of these four:

• I and T discrete. Consider the number of jobs N_k present in a computer system
at the moment of the departure of the k-th job. Clearly, in a computer system only
an integer number of jobs can be present, thus I = {0, 1, ...}. Likewise, only after
the first job departs, N_k is clearly defined. Thus we have T = {1, 2, ...}.

• I discrete and T continuous. Consider the number of jobs N(t) present in the
computer system at time t. Again only integer numbers of jobs can be present, hence
I = {0, 1, ...}. We can, however, observe the computer system continuously. This
implies that T = [0, ∞).

• I continuous and T discrete. Let W_k denote the time the k-th job has to wait
until its service starts. Clearly, k ∈ T is a discrete index set, whereas W_k can take
any value in [0, ∞), implying that I is continuous.

• I and T continuous. Let C_t denote the total amount of service that needs to
be done on all jobs present in the computer system at time t. Clearly, t ∈ T is a
continuous parameter. Furthermore, C_t can take any value in [0, ∞), implying again
that I is continuous.

Apart from those based on the above distinctions, we can also classify stochastic processes
in another way. We will do so below, thereby taking the notation for the case of continuous-
time, continuous-state space stochastic processes. However, the proposed classification is
also applicable for the three other cases.
At some fixed point in time t ∈ T, the value X(t) simply is a random variable describing
the state of the stochastic process. The cumulative distribution function (CDF) or distribution
(function) of the random variable X(t), F(x; t) = Pr{X(t) ≤ x}, is called the first-order distribution
of the stochastic process {X(t) | t ∈ T}. We can generalise this to
the n-th order joint distribution of the stochastic process {X(t) | t ∈ T} as follows:

F(x; t) = Pr{X(t_1) ≤ x_1, ..., X(t_n) ≤ x_n},    (3.1)

where x = (x_1, ..., x_n) and t = (t_1, ..., t_n).
If all the n-th order distributions (n ∈ N+) of a stochastic process {X(t) | t ∈ T} are
invariant under time shifts for all possible values of x and t, then the stochastic process is
said to be strictly stationary, i.e., F(x; t) = F(x; t + τ), where t + τ is a shorthand notation
for the vector (t_1 + τ, ..., t_n + τ).
We call a stochastic process {X(t) | t ∈ T} an independent process whenever its n-th
order joint distribution satisfies the following condition:

F(x; t) = ∏_{i=1}^{n} F(x_i; t_i) = ∏_{i=1}^{n} Pr{X(t_i) ≤ x_i}.    (3.2)

An example of an independent stochastic process is the renewal process. A renewal process
{X_n | n = 1, 2, ...} is a discrete-time stochastic process, where X_1, X_2, ... are independent,
identically distributed, nonnegative random variables.
A renewal process is a stochastic process in which total independence exists between
successive states. In many situations, however, some form of dependence exists between
successive states assumed by a stochastic process. The minimum possible dependence is
the following: the next state to be assumed by a stochastic process only depends on the
current state of the stochastic process, and not on states that were assumed previously.
This is called first-order dependence or Markov dependence, which leads us to the following
definition.
A stochastic process {X(t) | t ∈ T} is called a Markov process if for any t_0 < ... < t_n <
t_{n+1} the distribution of X(t_{n+1}), given the values X(t_0), ..., X(t_n), only depends on X(t_n),
i.e.,

Pr{X(t_{n+1}) ≤ x_{n+1} | X(t_0) = x_0, ..., X(t_n) = x_n} = Pr{X(t_{n+1}) ≤ x_{n+1} | X(t_n) = x_n}.    (3.3)
Equation (3.3) is generally denoted as the Markov property. Similar definitions can be
given for the discrete-state cases and for discrete-time. Most often, Markov processes used
for performance evaluations are invariant to time shifts, that is, for any s < t, and x, x_s,
we have

Pr{X(t) ≤ x | X(s) = x_s} = Pr{X(t - s) ≤ x | X(0) = x_s}.    (3.4)
In these cases we speak of time-homogeneous Markov processes. Important to note here
is that we stated that the next state only depends on the current state, and not on how
long we have been already in that state. This means that in a Markov process, the state
residence times must be random variables that have a memoryless distribution. As we
will see later, this implies that the state residence times in a continuous-time Markov
chain need to be exponentially distributed, and in a discrete-time Markov chain need to
be geometrically distributed (see also Appendix A).
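The memoryless property mentioned above is easy to check numerically. The following small sketch (an added illustration, not from the original text) estimates Pr{X > s + t | X > s} for an exponentially distributed X and compares it with Pr{X > t}:

import math, random

# For X ~ exponential(rate), memorylessness means
# Pr{X > s + t | X > s} = Pr{X > t} = exp(-rate * t).
def conditional_tail(rate, s, t, samples=200000, seed=7):
    random.seed(seed)
    survived_s, survived_s_plus_t = 0, 0
    for _ in range(samples):
        x = random.expovariate(rate)
        if x > s:
            survived_s += 1
            if x > s + t:
                survived_s_plus_t += 1
    return survived_s_plus_t / survived_s

rate, s, t = 2.0, 0.5, 0.3
print(conditional_tail(rate, s, t))   # empirical Pr{X > s+t | X > s}
print(math.exp(-rate * t))            # Pr{X > t}; the two values agree closely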

An extension of Markov processes can be imagined in which the state residence time
distributions are not exponential or geometric any more. In that case it is important to
know how long we have been in a particular state and we speak of semi-Markov processes.

3.2 Renewal processes


We define a renewal process to be a discrete-time stochastic process {X_n | n = 1, 2, ...},
where X_1, X_2, ... are independent, identically distributed, nonnegative random variables.
Let us now assume that all the random variables X_i are distributed as the random
variable X with underlying distribution function F_X(x). Furthermore, let

S_k = ∑_{i=1}^{k} X_i    (3.5)

denote the time, from the initial time instance 0 onwards, until the k-th occurrence of a
renewal (S_0 = 0). Then, S_k has distribution function F_X^{(k)}(x), the k-fold convolution of
F_X(x), defined as

F_X^{(k)}(x) = ∫_0^x F_X^{(k-1)}(x - s) f_X(s) ds,   k ≥ 1,   with F_X^{(0)}(x) = 1(x ≥ 0),    (3.6)

with f_X(s) the probability density function of X, and 1(x) the indicator function which
evaluates to 1 if its argument is true, and to zero otherwise.
Now, let us address the number of renewals during the time interval [0, t). We define
the renewal counting process {N(t) | t ∈ R}, which counts the number of renewals in the
interval [0, t). This stochastic process has a discrete state space, being the natural numbers
N, and a continuous time parameter. The probability of having exactly n renewals in a
certain time interval can now be expressed as follows:

Pr{N(t) = n} = Pr{S_n ≤ t < S_{n+1}}
             = Pr{S_n ≤ t} - Pr{S_{n+1} ≤ t}
             = F_X^{(n)}(t) - F_X^{(n+1)}(t).    (3.7)

This expression can be understood as follows (see also Figure 3.1). The n-th renewal should
take place before t, yielding the term Pr{S_n ≤ t}. However, that alone is not enough: the
(n + 1)-th renewal should take place after t, so that we have to subtract the probability
that the (n + 1)-th arrival happens before t, explaining the second term.
Obtaining the probability distribution for {N(t) | t ∈ R} is often complex, so that only
the expected number of renewals during some time interval, E[N(t)], is computed. This
quantity is denoted as M(t) and called the renewal function.

Figure 3.1: A renewal process and the associated counting process (interrenewal times X_1, X_2, ... and renewal epochs S_k; N(t) counts the renewals up to time t)

Using the definition for
expectation, we now derive the following:

M(t) = E[N(t)] = ∑_{n=0}^∞ n Pr{N(t) = n}
     = ∑_{n=0}^∞ n F_X^{(n)}(t) - ∑_{n=0}^∞ n F_X^{(n+1)}(t)
     = ∑_{n=0}^∞ n F_X^{(n)}(t) - ∑_{n=1}^∞ (n - 1) F_X^{(n)}(t)
     = ∑_{n=1}^∞ F_X^{(n)}(t) = F_X^{(1)}(t) + ∑_{n=1}^∞ F_X^{(n+1)}(t).    (3.8)

Considering the fact that F_X^{(n+1)}(t) is the convolution of F_X^{(n)}(t) and F_X^{(1)}(t) = F_X(t) as
given in (3.6), we obtain

M(t) = F_X(t) + ∑_{n=1}^∞ ( ∫_0^t F_X^{(n)}(t - s) f_X(s) ds )
     = F_X(t) + ∫_0^t ( ∑_{n=1}^∞ F_X^{(n)}(t - s) ) f_X(s) ds
     = F_X(t) + ∫_0^t M(t - s) f_X(s) ds.    (3.9)

This equation is known as the fundamental renewal equation. The derivative of M(t),
denoted as m(t), is known as the renewal density and can be interpreted as follows. For
small values ε > 0, ε × m(t) is the probability that a renewal occurs in the interval [t, t + ε).
Taking the derivative of (3.8), we obtain

m(t) = ∑_{n=1}^∞ f_X^{(n)}(t),    (3.10)
where f_X^{(n)}(t) is the derivative of F_X^{(n)}(t), and consequently, we obtain the renewal equation:

m(t) = f_X(t) + ∫_0^t m(t - s) f_X(s) ds.    (3.11)

Renewal processes have a number of nice properties. First of all, under a number of regu-
larity assumptions, it can be shown that the limiting value of m(t) for large t approaches
the reciprocal value of E[X]:

lim_{t→∞} m(t) = 1 / E[X].    (3.12)

This result states that, in the long run, the rate of renewals is inversely related to the mean
inter-renewal time. This is an intuitively appealing result. Very often, this limiting value
of m(t) is also called the rate of renewals, or simply the rate (of the renewal process).
Secondly, a renewal process can be split into a number of less intensive renewal pro-
cesses. Let α_i ∈ (0, 1] and ∑_{i=1}^n α_i = 1 (the α_i form a proper probability density). If we
have a renewal process with rate λ and squared coefficient of variation of the renewal time
distribution C², we can split it into n ∈ N+ renewal processes, with rate α_i λ and squared
coefficient of variation α_i C² + (1 - α_i), respectively.
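The limiting behaviour (3.12) lends itself to a quick numerical experiment. The sketch below is an added illustration; it uses uniformly distributed interrenewal times merely as an example and estimates N(t)/t for a growing horizon, comparing it with 1/E[X]:

import random

def count_renewals(horizon, draw_interrenewal, seed=3):
    # Counts the renewals in [0, horizon) for i.i.d. interrenewal times.
    random.seed(seed)
    t, n = 0.0, 0
    while True:
        t += draw_interrenewal()
        if t >= horizon:
            return n
        n += 1

draw = lambda: random.uniform(0.5, 1.5)   # interrenewal times with E[X] = 1.0
for horizon in (10.0, 100.0, 10000.0):
    n = count_renewals(horizon, draw)
    print(horizon, n / horizon)           # tends to 1/E[X] = 1.0 for large horizons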

Example 3.1. Poisson process.


Whenever the times between renewals are exponentially distributed, the renewal process
is called a Poisson process. A Poisson process has many attractive features which explains
part of its extensive use. The other part of the explanation is that for many processes
observed in practice, the Poisson process is a very natural representation.
When we have as renewal time distribution F_X(t) = 1 - e^{-λt}, t ≥ 0, we can derive that
M(t) = λt and m(t) = λ. M(t) denotes the expected number of renewals in [0, t) and
m(t) denotes the average renewal rate. Finally, notice that the n-fold convolution of the
interrenewal time distribution has an Erlang-n distribution:

F_X^{(n)}(t) = 1 - ( ∑_{k=0}^{n-1} (λt)^k / k! ) e^{-λt}.    (3.13)
This means that the time until exactly n renewals have taken place is Erlang-n distributed.
When n increases, the coefficient of variation decreases. This means that the variance that
exists in the individual realisations of the renewals, for larger numbers of renewals “averages
out” .
Finally, as we will also see in Section 3.8, in a Poisson process, the probability density
of the number of renewals in an interval [0, t) has a Poisson distribution with parameter
λt:

Pr{N(t) = n} = ((λt)^n / n!) e^{-λt}.    (3.14)

This can be understood by considering (3.7) and realizing that F_X^{(n)}(t) is an Erlang-n
distribution. In the subtraction (3.7) all the summands cancel against one another, except
the one with k = n for F_X^{(n+1)}(t) in (3.13). □
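As a small numerical cross-check of (3.7), (3.13) and (3.14), added here purely for illustration, one can evaluate the Erlang-n distribution functions directly and verify that their difference equals the Poisson probability:

import math

def erlang_cdf(n, lam, t):
    # F_X^{(n)}(t) = 1 - sum_{k=0}^{n-1} (lam*t)^k / k! * exp(-lam*t), see (3.13);
    # for n = 0 the convention F_X^{(0)}(t) = 1 for t >= 0 is used.
    if n == 0:
        return 1.0
    return 1.0 - sum((lam * t) ** k / math.factorial(k) for k in range(n)) * math.exp(-lam * t)

def poisson_pmf(n, lam, t):
    # Pr{N(t) = n} as in (3.14)
    return (lam * t) ** n / math.factorial(n) * math.exp(-lam * t)

lam, t = 1.5, 2.0
for n in range(5):
    via_3_7 = erlang_cdf(n, lam, t) - erlang_cdf(n + 1, lam, t)   # relation (3.7)
    print(n, via_3_7, poisson_pmf(n, lam, t))                     # identical values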

3.3 Discrete-time Markov chains


In this section we present in some detail the theory of discrete-time, discrete-state space
Markov processes. These types of stochastic processes are generally called discrete-time
Markov chains (DTMCs).
A DTMC has the usual properties of Markov processes: its future behaviour only
depends on its current state and not on states assumed in the past. Without loss of
generality we assume that the index set ‘T = (0, 1,2,. .a} and that the state space is
denoted by 1. The Markov property (3.3) has the following form:

pr{X,+l = in+llXO = io, - - -, X, = in} = I+{&+1 = L+ll& = in>, (3.15)

where io, . . . , in+i E Z. From this definition we see that the future (time instance n + 1)
depends only on the current status (time instance n), and not on the past (time instances
n - l,a.. ,O). Let pj(n) = Pr{X, = j} d enote the probability of “being” in state j at
time n. Furthermore, define the conditional probability pj,k(m,n) = Pr{X, = kiX, =
j}, for all m = 0, . . . , n, i.e., the probability of going from state j at time m to state
k at time n. Since we will only deal with time-homogeneous Markov chains, these so-
called transition probabilities only depend on the time difference I = n - m. We therefore
denote them as pj,k(l) = Pr{X,+l = klX, = j}. Th ese probabilities are called the l-step
transition probabilities. The l-step transition probabilities are simple denoted pj,k (the
parameter 1 is omitted). The O-step probabilities are defined as &,k(O) = 1, whenever
j = k, and 0 e,l sewhere. The initial distribution p(0)
- of the Markov chain is defined as
p(o) = (PO(O), * * * ,plj-l(O)). By iteratively applying the rule for conditional probabilities, it
can easily be seen that

Pr(X0 = i0, X1 = il, - e - ,X, = in} = pio(o)pio,il . . .pin-l,i,. (3.16)

This implies that the DTMC is completely described by the initial probabilities and the 1-step
probabilities. The 1-step probabilities are conveniently collected in a state-transition probability
matrix P = (p_{i,j}). The matrix P is a stochastic matrix because all its entries p_{i,j} satisfy
0 ≤ p_{i,j} ≤ 1, and Σ_j p_{i,j} = 1, for all i.

Figure 3.2: State transition diagram for the example DTMC

A DTMC is very conveniently visualised as a labelled directed graph with the elements
of I as vertices. A directed edge with label p_{i,j} exists between vertices i and j whenever
p_{i,j} > 0. Such representations of Markov chains are often called state transition diagrams.

Example 3.2. Graphical representation of a DTMC.


In Figure 3.2 we show the state transition diagram for the DTMC with state-transition
probability matrix

P = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.1 & 0.8 & 0.1 \\ 0.6 & 0 & 0.4 \end{pmatrix}.   (3.17)

Let us now calculate the 2-step probabilities of a DTMC with state-transition probability
matrix P. We have

p_{i,j}(2) = Pr{X_2 = j | X_0 = i} = \sum_{k \in I} Pr{X_2 = j, X_1 = k | X_0 = i},   (3.18)

since in going from state i to state j in two steps, any state k ∈ I can be visited as
intermediate state. Now, due to the rules of conditional probabilities as well as the Markov
property, we can write

p_{i,j}(2) = \sum_{k \in I} Pr{X_2 = j, X_1 = k | X_0 = i}
          = \sum_{k \in I} Pr{X_1 = k | X_0 = i} Pr{X_2 = j | X_1 = k, X_0 = i}
          = \sum_{k \in I} Pr{X_1 = k | X_0 = i} Pr{X_2 = j | X_1 = k}
          = \sum_{k \in I} p_{i,k} p_{k,j}.   (3.19)

In the last equality we recognise the matrix product. We have thus obtained that the
2-step probabilities p_{i,j}(2) are elements of the matrix P². The above technique can be
applied iteratively, yielding that the n-step probabilities p_{i,j}(n) are elements of the matrix
P^n. For the 0-step probabilities we can write I = P⁰. The equation that establishes a
relation between the (m + n)-step probabilities and the m- and n-step probabilities is

P^{m+n} = P^m P^n,   (3.20)

which is generally known as the Chapman-Kolmogorov equation.


When we want to calculate p_j(n) we can simply condition on the initial probabilities,
i.e.,

p_j(n) = Pr{X_n = j} = \sum_{i \in I} Pr{X_0 = i} Pr{X_n = j | X_0 = i}
       = \sum_{i \in I} p_i(0) p_{i,j}(n).   (3.21)

Writing this in matrix-vector notation, with p(n) = (p_0(n), p_1(n), ...), we arrive at

p(n) = p(0) P^n.   (3.22)

Recalling that the index n in the above expression can be interpreted as the step-count or
the time in the DTMC, (3.22) indeed expresses the time-dependent or transient behaviour
of the DTMC.

Example 3.3. Transient behaviour of a DTMC.


Let us compute p(n) = p(0)P^n for n = 1, 2, 3, ... with P as given in (3.17), and p(0) =
(1, 0, 0). Clearly, p(1) = p(0)P = (0.6, 0.2, 0.2). Then, p(2) = p(0)P² = p(1)P =
(0.50, 0.28, 0.22). We proceed with p(3) = p(2)P = (0.460, 0.324, 0.216). We could go
on and calculate many more p(n) values. What can already be observed is that the successive
values for p(n) seem to converge somehow, and that the elements of all the vectors
p(n) always sum to 1. □
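As a quick numerical illustration of (3.22), the following small Python fragment (assuming the numpy library is available; the matrix is the one assumed in (3.17)) iterates the matrix-vector product and reproduces the values computed above:

import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.8, 0.1],
              [0.6, 0.0, 0.4]])
p = np.array([1.0, 0.0, 0.0])      # initial distribution p(0)

for n in range(1, 11):
    p = p @ P                      # p(n) = p(n-1) P, cf. (3.22)
    print(n, p.round(4))           # tends towards (0.4, 0.4, 0.2)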

It is interesting to note that for many DTMCs (but definitely not all; we will discuss
conditions for convergence in the next section) all the rows in P^n converge to a common
limit when n → ∞. For the time being, we assume that such a limit indeed exists and
denote it as v. Define

v_j = \lim_{n \to \infty} p_j(n) = \lim_{n \to \infty} Pr{X_n = j} = \lim_{n \to \infty} \sum_{i \in I} p_i(0) p_{i,j}(n).   (3.23)

Writing this in matrix-vector notation, we obtain

v = \lim_{n \to \infty} p(n) = \lim_{n \to \infty} p(0) P^n.   (3.24)

However, we also have

v = \lim_{n \to \infty} p(n+1) = \lim_{n \to \infty} p(0) P^{n+1} = \left( \lim_{n \to \infty} p(0) P^n \right) P = v P.   (3.25)
We have thus established that whenever the limiting probabilities v exist, they can be
obtained by solving the system of linear equations

v = v P  ⟺  v(I − P) = 0,   (3.26)

with, since v is a probability vector, Σ_i v_i = 1, and 0 ≤ v_i ≤ 1. Note that the vector v is
the left eigenvector of P associated with eigenvalue 1. The equivalent form on the right,
i.e., v(I − P) = 0, will be discussed in Section 3.5 in relation to CTMCs.
The vector v is called the stationary or steady-state probability vector of the Markov
chain. For the Markov chains we will encounter, a unique limiting distribution will most
often exist. Furthermore, in most of the practical cases we will encounter, this steady-state
probability vector will be independent of the initial state probabilities.

Example 3.4. Steady-state probability vector calculation.


Let us compute v = vP with P as in the previous example, and compare it to the partially
converged result obtained there. Denoting v = (v_1, v_2, v_3) we have a system of three linear
equations:

v_1 = (6v_1 + v_2 + 6v_3)/10,
v_2 = (2v_1 + 8v_2)/10,   (3.27)
v_3 = (2v_1 + v_2 + 4v_3)/10.

Multiplying all equations by 10 and collecting terms yields

4v_1 = v_2 + 6v_3,
2v_2 = 2v_1,   (3.28)
6v_3 = 2v_1 + v_2.

From the middle equation we obtain v_1 = v_2. Substituting this in the other two equations
reveals that v_3 = v_1/2. Using the fact that v_1 + v_2 + v_3 = 1 then gives us v = (2/5, 2/5, 1/5).
Note that the observed convergence in the previous example indeed goes in the direction
of v. □
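The same steady-state vector is easily obtained numerically. A minimal sketch (Python with numpy) solves v(I − P) = 0 by replacing one of the redundant balance equations with the normalisation condition:

import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.8, 0.1],
              [0.6, 0.0, 0.4]])
n = P.shape[0]
A = (np.eye(n) - P).T              # columns of I - P give the balance equations
A[-1, :] = 1.0                     # replace one equation by sum(v) = 1
b = np.zeros(n); b[-1] = 1.0
v = np.linalg.solve(A, b)
print(v)                           # (0.4, 0.4, 0.2)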

The steady-state probabilities can be interpreted in two ways. One way is to see them
as the long-run proportion of time the DTMC “spends” in the respective states. The other

way is to regard them as the probabilities that the DTMC would be in a particular state
if one would take a snapshot after a very long time. It is important to note that for large
values of n state changes do still take place in the DTMC.
Let us finally address the state residence time distributions in DTMCs. We have seen
that the matrix P describes the 1-step state transition probabilities. If, at some time
instance n, the state of the DTMC is i, then, at time instance n + 1, the state will still
be i with probability p_{i,i}, but the state will be some j ≠ i with probability 1 − p_{i,i} = Σ_{j≠i} p_{i,j}.
For time instance n + 2 a similar reasoning holds, so that the probability of still residing
in state i (given residence there at time instances n and n + 1) equals p_{i,i}. Taking this
further, the probability of residing in state i for exactly m consecutive time steps equals
(1 − p_{i,i}) p_{i,i}^{m−1}, that is, there are m − 1 steps in which the possibility of staying in i (with
probability p_{i,i}) is taken, and one final step with probability 1 − p_{i,i} where indeed a step
towards another state j ≠ i is taken. Interpreting leaving state i as a success and staying
in state i as a failure (one fails to leave) we see that the state residence times in a DTMC
obey a geometric distribution. The expected number of steps of residence in state i then
equals 1/(1 − p_{i,i}) and the variance of the number of residence steps in state i equals
p_{i,i}/(1 − p_{i,i})².
The fact that the state residence times in a DTMC are geometrically distributed need
not be a surprise. When discussing the Markov property, we have stated that only the
current state, at some time instance, is of importance in determining the future, and not the
residence time in that state. The geometric distribution is the only discrete distribution
exhibiting this memoryless property.
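A short simulation makes the geometric residence times tangible. The following sketch (plain Python; the value of p_{i,i} is an arbitrary assumption) repeatedly "stays" in state i with probability p_{i,i} and compares the sample mean with 1/(1 − p_{i,i}):

import random

p_ii = 0.8                          # assumed self-loop probability
samples = []
for _ in range(100_000):
    steps = 1                       # count the step in which the state is finally left
    while random.random() < p_ii:   # stay in state i with probability p_ii
        steps += 1
    samples.append(steps)

print(sum(samples) / len(samples))  # close to 1/(1 - p_ii) = 5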

3.4 Convergence properties of Markov chains

As indicated in the previous section many DTMCs exhibit convergence properties, but
certainly not all. In this section we will discuss, in a very compact way, a number of
properties of DTMCs that help us in deciding whether a DTMC has a unique steady-state
distribution or not. In a similar way such properties can also be established for CTMCs
(see Section 3.5).
Let us start with a classification of the states in a DTMC. A state j is said to be
accessible from state i if, for some value n, p_{i,j}(n) > 0, which means that there is a step
number for which there is a nonzero probability of going from state i to j. For such a pair
of states, we write i → j. If i → j and j → i, then i and j are said to be communicating
states, denoted i ↔ j. Clearly, the communicating relation (↔) is:

• transitive: if i ↔ j and j ↔ k then i ↔ k;

• symmetric: by its definition in terms of →, i ↔ j is equivalent to j ↔ i;

• reflexive: for n = 0, we have p_{i,i}(0) = 1, so that i → i and therefore i ↔ i.

Consequently, ↔ is an equivalence relation which partitions the state space into communicating
classes. If all the states of a Markov chain belong to the same communicating class,
the Markov chain is said to be irreducible. If not, the Markov chain is called reducible.
The period d_i ∈ N of state i is defined as the greatest common divisor of those values n
for which p_{i,i}(n) > 0. When d_i = 1, state i is said to be aperiodic, in which case, at every
time step there is a non-zero probability of residing in state i. It has been proven that
within a communicating class all states have the same period. Therefore, one can also speak
of periodic and aperiodic communicating classes, or, in the case of an irreducible Markov chain,
of an aperiodic or periodic Markov chain (consisting of just one communicating class).
A state i is said to be absorbing when lim_{n→∞} p_{i,i}(n) = 1. When there is only one
absorbing state, the Markov chain will, with certainty, reach that state for some value of
n. A state is said to be transient or non-recurrent if there is a nonzero probability that the
Markov chain will not return to that state again. If this is not the case, the state is said to
be recurrent. For recurrent states, we can address the time between successive visits. Let
f_{i,j}(n) denote the probability that exactly n steps after leaving state i, state j is visited for
the first time. Consequently, f_{i,i}(n) is the probability that the Markov chain takes exactly
n steps between two successive visits to state i. The probability to ever end up in state j ≠ i
when started in state i, can now be expressed as

f_{i,j} = \sum_{n=1}^{\infty} f_{i,j}(n).   (3.29)

From this definition, it follows that if f_{i,i} = 1, then state i is recurrent. If state i is
nonrecurrent then f_{i,i} < 1. In the case f_{i,i} = 1 we can make a further classification based
upon the mean recurrence time of state i:

m_i = \sum_{n=1}^{\infty} n f_{i,i}(n).   (3.30)

A recurrent state i is said to be positive recurrent (or recurrent non-null) if the mean
recurrence time m_i is finite. If m_i is infinite, state i is said to be null recurrent.
Having defined the above properties, the following theorem expresses when a DTMC
has a (unique) steady-state probability distribution.

Theorem 3.1. Steady-state probability distributions in a DTMC.


In an irreducible and aperiodic DTMC with positive recurrent states:

• the limiting distribution v, with v_j = lim_{n→∞} p_j(n) = lim_{n→∞} p_{i,j}(n), does exist;

• v is independent of the initial probability distribution p(0);

• v is the unique stationary probability distribution (the steady-state probability vector).

In most of the performance models we will encounter, the Markov chains will be of the
last type. When we do not state so explicitly, we assume that we deal with irreducible and
aperiodic DTMCs with positive recurrent states. When we are dealing with continuous-
time Markov chains, similar conditions apply.

3.5 Continuous-time Markov chains


In this section we present in some detail the theory of continuous-time Markov chains
(CTMCs). We first discuss how CTMCs can be constructed by enhancing DTMCs with
state residence time distributions in Section 3.5.1. We then present the evaluation of the
steady-state and transient behaviour of CTMCs in Section 3.5.2; this section also includes
examples.

3.5.1 From DTMC to CTMC


As before, we assume without loss of generality that the state space is denoted I =
{0, 1, 2, ...}. In DTMCs we have only been addressing abstract time steps up till now.
No "physical" time has been associated with these steps. With CTMCs we interpret the
index as real time, i.e., we denote T = [0, ∞), where t = 0 is the time point at which the
CTMC starts, and is in its initial state.
The easiest way to introduce CTMCs is to develop them from DTMCs. We do so by
associating with every state i a state residence time distribution. Since for CTMCs the
general Markov property must be valid, we do not have complete freedom to choose any
state residence time distribution. Recalling the Markov property, we must have, for all

non-negative t_0 < t_1 < ... < t_{n+1} and x_0, x_1, ..., x_{n+1}:

Pr{X(t_{n+1}) = x_{n+1} | X(t_0) = x_0, ..., X(t_n) = x_n} = Pr{X(t_{n+1}) = x_{n+1} | X(t_n) = x_n}.   (3.31)
As can be observed, the probability distribution for the (n + 1)-th state residence time
only depends on the current (n-th) state and not on the time the chain has already resided
in the current state, nor on states assumed in the past. This implies that a memoryless
continuous distribution is needed to describe the state residence times. Since there is only
one memoryless continuous distribution, we have little choice here. Indeed, as will be
shown later, the state residence times in CTMCs are exponentially distributed. Thus, we
can associate with every state i in the CTMC a parameter μ_i describing the rate of the
exponential distribution, that is, we have as residence time distribution in state i:

F_i(t) = 1 - e^{-\mu_i t},  t ≥ 0.   (3.32)

The vector μ = (..., μ_i, ...) thus describes the state residence time distributions in the
CTMC. We can still use the state transition probability matrix P to describe the state
transition behaviour. The initial probabilities remain p(0). The operation of the CTMC
can now be interpreted as follows. When entering state i, the CTMC will "stay" in i for
a random amount of time, distributed according to the state residence distribution F_i(t).
After this delay, a state change to state j will take place with probability p_{i,j}; to ease
understanding at this point, assume that p_{i,i} = 0 for all i.
Instead of associating with every state just one negative exponentially distributed delay,
it is also possible to associate as many delays with a state as there are transition
possibilities. We therefore define the matrix Q with q_{i,j} = μ_i p_{i,j}, in case i ≠ j, and
q_{i,i} = − Σ_{j≠i} q_{i,j} (in some publications the diagonal entries q_{i,i} are denoted as −q_i
with q_i = Σ_{j≠i} q_{i,j}). Notice that since we assume that p_{i,i} = 0, we have q_{i,i} = −μ_i. Using
this notation allows for the following interpretation. When entering state i, it is investigated
which states j can be reached from i, namely those j ≠ i for which q_{i,j} > 0. Then,
for each of these possibilities, a random variable is thought to be drawn, according to the
(negative exponential) distributions F_{i,j}(t) = 1 − e^{−q_{i,j} t}; these distributions model the
delay perceived in state i when going from i to j. One of the "drawn" delays will be the
smallest, meaning that the transition corresponding to that delay will take the smallest
amount of time, and hence will take place. The possible transitions starting from state i
can be interpreted as being in a race condition: the fastest one wins.
Why is this interpretation also correct? The answer lies in the special properties of the
employed negative exponential distributions. Let us first address the state residence times.

Being in state i, the time it takes to reach state j is exponentially distributed with rate
q_{i,j}. When there is more than one possible successor state, the next state will be such that
the residence time in state i is minimised (race condition). However, the minimum of
a number of exponentially distributed random variables with rates q_{i,j} (j ≠ i) is again
an exponentially distributed random variable, with as rate the sum Σ_{j≠i} q_{i,j} of the original
rates. This is exactly equal to the rate μ_i of the residence time in state i.
A second point to verify is whether the state transition behaviour is still the same.
In general, if we have n negative exponentially distributed random variables X_k (with
rates z_k), then X_i will be the minimum of them with probability z_i / Σ_k z_k. In our case,
we have a number of competing delays when starting from state i, which are all negative
exponentially distributed random variables (with rates q_{i,j}). The shortest one will then
lead to state j with probability

\frac{q_{i,j}}{\sum_{k \neq i} q_{i,k}} = \frac{\mu_i p_{i,j}}{\mu_i} = p_{i,j},   (3.33)

which shows the equivalence of the transition probabilities in both interpretations.


Let us now discuss the case where p_{i,i} > 0, that is, the case where, after having resided
in state i for an exponentially distributed period of time (with rate μ_i), there is a positive
probability of staying in i for another period. In particular, we have seen in Section 3.3
that the state residence times in a DTMC obey a geometric distribution (measured
in "visits"), with mean 1/(1 − p_{i,i}) for state i. Hence, if we decide that the expected state
residence time in the CTMC constructed from the DTMC is 1/μ_i, the time spent in state
i per visit should on average be (1 − p_{i,i})/μ_i. Hence, the rate of the negative exponential
distribution associated with that state should equal μ_i/(1 − p_{i,i}). Using this rate in the
above procedure we find that we have to assign the following transition rates for j ≠ i:

q_{i,j} = \frac{\mu_i}{1 - p_{i,i}} \cdot p_{i,j} = \mu_i \frac{p_{i,j}}{1 - p_{i,i}} = \mu_i Pr{jump i → j | jump away from i},   (3.34)

that is, we have renormalised the probabilities p_{i,j} (j ≠ i) such that they make up a proper
distribution. To conclude, if we want to associate a negative exponential residence time
with rate μ_i with state i, we can do so by just normalising the probabilities p_{i,j} (j ≠ i)
appropriately.
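This construction is easily mechanised. The sketch below (Python with numpy; the matrices P and μ are illustrative values with p_{i,i} = 0) builds the generator matrix with q_{i,j} = μ_i p_{i,j} and diagonal entries equal to minus the row sums:

import numpy as np

P  = np.array([[0.0, 0.7, 0.3],
               [0.2, 0.0, 0.8],
               [1.0, 0.0, 0.0]])    # assumed DTMC with p_ii = 0
mu = np.array([2.0, 1.0, 4.0])      # assumed exponential residence rates

Q = P * mu[:, None]                 # q_ij = mu_i * p_ij for i != j
np.fill_diagonal(Q, -Q.sum(axis=1)) # diagonal: minus the off-diagonal row sums
print(Q, Q.sum(axis=1))             # row sums of a proper generator are zero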

3.5.2 Evaluating the steady-state and transient behaviour


Like DTMCs, CTMCs can be conveniently depicted using state transition diagrams. These
state transition diagrams are labelled directed graphs, with the states of the CTMC
represented by the vertices. An edge between vertices i and j (i ≠ j) exists whenever q_{i,j} > 0.
The edges in the graph are labelled with the corresponding rates.
Formally, a CTMC can be described by an (infinitesimal) generator matrix Q = (q_{i,j})
and initial state probabilities p(0). Denoting the system state at time t ∈ T as X(t) ∈ I,
we have, for h → 0:

Pr{X(t + h) = j | X(t) = i} = q_{i,j} h + o(h),  i ≠ j,   (3.35)

where o(h) is a term that goes to zero faster than h, i.e., lim_{h→0} o(h)/h = 0. This result
follows from the fact that the state residence times are negative exponentially distributed.
The value q_{i,j} (i ≠ j) is the rate at which the current state i changes to state j. Denote
with p_i(t) the probability that the state at time t equals i: p_i(t) = Pr{X(t) = i}. Given
p_i(t), we can compute the evolution of the Markov chain in the very near future [t, t + h),
as follows:

p_i(t + h) = p_i(t) Pr{do not depart from i} + \sum_{j \neq i} p_j(t) Pr{go from j to i}
           = p_i(t) \Big(1 - \sum_{j \neq i} q_{i,j} h\Big) + \Big(\sum_{j \neq i} p_j(t) q_{j,i}\Big) h + o(h).   (3.36)

Now, using the earlier defined notation q_{i,i} = -\sum_{j \neq i} q_{i,j}, we have

p_i(t + h) = p_i(t) + h \sum_{j \in I} q_{j,i} p_j(t) + o(h).   (3.37)

Rearranging terms, dividing by h and taking the limit h → 0, we obtain

p_i'(t) = \lim_{h \to 0} \frac{p_i(t + h) - p_i(t)}{h} = \sum_{j \in I} q_{j,i} p_j(t),   (3.38)

which in matrix notation has the following form:

p'(t) = p(t) Q,   (3.39)

where p(t) = (..., p_i(t), ...) and where the initial probability vector p(0) is given. We
have thus obtained that the time-dependent or transient state probabilities in a CTMC are
described by a system of linear differential equations, which can be solved using a Taylor
series expansion as follows:

p(t) = p(0) e^{Qt} = p(0) \sum_{n=0}^{\infty} \frac{(Qt)^n}{n!}.   (3.40)



As we will see in Chapter 15, this solution for p(t) is not the most appropriate to use.
Other methods will be shown to be more efficient and accurate.
In many cases, however, the transient behaviour p(t) of the Markov chain is more than
we really need. For performance evaluation purposes we are often already satisfied when
we are able to compute the long-term or steady-state probabilities p_i = lim_{t→∞} p_i(t). When
we assume that a steady-state distribution exists, this implies that the above limit exists,
and thus that lim_{t→∞} p_i'(t) = 0. Consequently, for obtaining the steady-state probabilities
we only need to solve the system of linear equations:

p Q = 0,   \sum_{i \in I} p_i = 1.   (3.41)

The right part is added to ensure that the obtained solution is indeed a probability vector;
the left part alone has infinitely many solutions, which upon normalisation all yield the
same probability vector.
It is important to note here that the equation pQ = 0 is of the same form as the equation
v = vP we have seen for DTMCs. Since this equation can be rewritten as v(P − I) = 0,
the matrix (P − I) can be interpreted as a generator matrix. It conforms to the format
discussed above: all non-diagonal entries are non-negative and the diagonal entries equal
the (negated) sums of the off-diagonal elements in the same row.
Given a CTMC described by Q and p(0), it is also possible to solve the steady-state
probabilities via an associated DTMC. We therefore construct a state-transition probability
matrix P with p_{i,j} = q_{i,j}/|q_{i,i}| (i ≠ j) and with diagonal elements p_{i,i} = 0. The resulting
DTMC is called the embedded Markov chain corresponding to the CTMC. The probabilities
p_{i,j} represent the branching probabilities, given that a transition out of state i occurs in
the CTMC. For this DTMC we solve the steady-state probability vector v via v = vP; v_i
now represents the probability that state i is visited, irrespective of the length of stay
in this state. To include the latter aspect, we have to renormalise the probabilities v_i with
the mean times spent in each state according to the CTMC definition. In the CTMC, the
mean residence time in state i is 1/q_i (with q_i = |q_{i,i}|), so that the steady-state probabilities
for the CTMC become:

p_i = \frac{v_i / q_i}{\sum_j v_j / q_j},   for all i.   (3.42)
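The following sketch (Python with numpy; the generator matrix is an arbitrary small example) follows exactly this route: construct the embedded DTMC, solve v = vP, and weigh the visit probabilities with the mean residence times 1/q_i as in (3.42):

import numpy as np

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 4.0,  0.0, -4.0]])  # assumed generator matrix

q = -np.diag(Q)                     # exit rates q_i = |q_ii|
P = Q / q[:, None]                  # embedded DTMC: p_ij = q_ij / q_i for i != j
np.fill_diagonal(P, 0.0)

n = Q.shape[0]
A = (np.eye(n) - P).T
A[-1, :] = 1.0                      # normalisation: sum(v) = 1
b = np.zeros(n); b[-1] = 1.0
v = np.linalg.solve(A, b)           # visit probabilities, v = vP

p = (v / q) / (v / q).sum()         # weigh with mean residence times, cf. (3.42)
print(p, p @ Q)                     # p @ Q is numerically the zero vector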

Example 3.5. Evaluation of a 2-state CTMC.


Consider a computer system that can either be completely operational or not at all. The
time it is operational is exponentially distributed with mean 1/λ. The time it is not
operational is also exponentially distributed, with mean 1/μ. Signifying the operational

Figure 3.3: A simple 2-state CTMC

state as state "1", and the down state as state "0", we can model this system as a 2-state
CTMC with generator matrix Q as follows:

Q = \begin{pmatrix} -\mu & \mu \\ \lambda & -\lambda \end{pmatrix}.   (3.43)

Furthermore, it is assumed that the system is initially fully operational so that p(0) = (0, 1).
In Figure 3.3 we show the corresponding state transition diagram. Note that the numbers
with the edges are now rates and not probabilities.
Solving (3.41) yields the following steady-state probability vector:

p = \left( \frac{\lambda}{\lambda + \mu}, \frac{\mu}{\lambda + \mu} \right).   (3.44)

This probability vector can also be computed via the embedded DTMC, which is given as:

P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.   (3.45)

Solving for v yields v = (1/2, 1/2), indicating that both states are visited equally often.
However, these visits are not equally long. Incorporating the mean state residence times,
being respectively 1/μ and 1/λ, yields

p = \left( \frac{\lambda}{\lambda + \mu}, \frac{\mu}{\lambda + \mu} \right),   (3.46)
which is the solution we have seen before.


We can also study the transient behaviour of the CTMC. We then have to solve the
corresponding system of linear differential equations. Although this is difficult in general,
for this specific example we can obtain the solution explicitly. From

p(t) = p(0) e^{Qt},


Figure 3.4: Steady-state and transient behaviour of a 2-state CTMC

we derive

p_0(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t},
p_1(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t}.   (3.48)
Notice that p_0(t) + p_1(t) = 1 (for all t) and that the limit of the transient solutions for
t → ∞ indeed equals the steady-state probability vector derived before. In Figure 3.4 we
show the transient and steady-state behaviour of the 2-state CTMC for 3λ = μ = 1. □
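Readers who prefer a numerical check can evaluate p(t) = p(0)e^{Qt} directly with the matrix exponential. The sketch below (Python with numpy and scipy; it takes λ = 1/3 and μ = 1, the values suggested by Figure 3.4) compares the result with the closed-form expression for p_0(t) in (3.48):

import numpy as np
from scipy.linalg import expm

lam, mu = 1.0 / 3.0, 1.0            # assumed failure and repair rates
Q  = np.array([[-mu,  mu],
               [lam, -lam]])
p0 = np.array([0.0, 1.0])           # initially operational, i.e. in state "1"

for t in (0.5, 1.0, 2.0, 4.0):
    p = p0 @ expm(Q * t)
    closed = lam / (lam + mu) * (1.0 - np.exp(-(lam + mu) * t))  # p_0(t), cf. (3.48)
    print(t, p, closed)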

Example 3.6. Availability evaluation of a fault-tolerant system.


Consider a fault-tolerant computer system consisting of three computing nodes and a single
voting node. The three computing nodes generate results after which the voter decides upon
the correct value (by selecting the answer that is given by at least two computing nodes).
Such a fault-tolerant computing system is also known as a triple-modular redundant system
(TMR). The failure rate of a computing node is λ and of the voter ν failures per hour (fph).
The expected repair time of a computing node is 1/μ and of the voter is 1/δ hours. If the
voter fails, the whole system is supposed to have failed and after a repair (with rate δ) the
system is assumed to start "as new". The system is assumed to be operational when at
least two computing nodes and the voter are functioning correctly.

Figure 3.5: CTMC for the TMR system

To model the availability of this system as a CTMC, we first have to define the state
space: I = {(3,1), (2,1), (1,1), (0,1), (0,0)}, where state (i, j) specifies that i computing
nodes are operational, as well as j voters. Note that the condition of the computing
nodes does not play a role any more as soon as the voter goes down; after a repair in
this down state the whole system will be fully operational, irrespective of the past state.
Using the above description, the state-transition diagram can be drawn easily, as given in
Figure 3.5. The corresponding generator matrix is given as:

Q = \begin{pmatrix}
-(3\lambda + \nu) & 3\lambda & 0 & 0 & \nu \\
\mu & -(\mu + 2\lambda + \nu) & 2\lambda & 0 & \nu \\
0 & \mu & -(\mu + \lambda + \nu) & \lambda & \nu \\
0 & 0 & \mu & -(\mu + \nu) & \nu \\
\delta & 0 & 0 & 0 & -\delta
\end{pmatrix}.   (3.49)
We assume that the system is fully operational at t = 0. The following numerical parameters
are given: λ = 0.01 fph, ν = 0.001 fph, μ = 1.0 repairs per hour (rph) and δ = 0.2
rph.
We can now compute the steady-state probabilities by solving the linear system pQ = 0
under the condition that Σ_i p_i = 1, which yields the following values:

(i, j)     (3,1)           (2,1)           (1,1)           (0,1)           (0,0)
p_{i,j}    9.6551 x 10^-1  2.8936 x 10^-2  5.7813 x 10^-4  5.7755 x 10^-6  4.9751 x 10^-3

The probability that the system is operational can thus be computed as 0.99444. Although
this number looks very good (it is very close to 100%) for a non-stop transaction processing
facility, it would still mean an expected down-time of 48.7 hours a year ((1 − 0.99444) x
24 x 365).
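These numbers are easily reproduced with a few lines of Python (assuming numpy): build the generator matrix of (3.49), replace one of the redundant balance equations of pQ = 0 by the normalisation condition, and solve the resulting linear system:

import numpy as np

lam, nu, mu, delta = 0.01, 0.001, 1.0, 0.2
# state order: (3,1), (2,1), (1,1), (0,1), (0,0), cf. (3.49)
Q = np.array([
    [-(3*lam + nu),       3*lam,            0.0,          0.0,      nu],
    [ mu,           -(mu + 2*lam + nu),     2*lam,        0.0,      nu],
    [ 0.0,                mu,         -(mu + lam + nu),   lam,      nu],
    [ 0.0,                0.0,              mu,      -(mu + nu),    nu],
    [ delta,              0.0,              0.0,          0.0,  -delta]])

n = Q.shape[0]
A = Q.T.copy()
A[-1, :] = 1.0                      # replace one balance equation by sum(p) = 1
b = np.zeros(n); b[-1] = 1.0
p = np.linalg.solve(A, b)

print(p)                            # reproduces the probabilities tabulated above
print(p[0] + p[1])                  # availability: approximately 0.99444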


Figure 3.6: Transient probabilities for the TMR system (1)

We can also address the transient behaviour of this small CTMC by numerically solving
the differential equations for p(t) with a technique known as uniformisation (see Chapter
15). In Figure 3.6 we show the probability p_{(3,1)}(t) for the first 10 hours of system
operation. As can be observed, the transient probability reaches the steady-state probability
relatively fast. A similar observation can be made for the other transient probabilities
in Figure 3.7 (note the logarithmic scale of the vertical axis). □

3.6 Semi-Markov chains


It is possible to associate other than exponential distributions with the states in a Markov
chain. In semi-Markov chains (SMCs) the rate of the transition from state i to state j
may depend on the time already spent in state i (since the last entrance) but not on states
visited before entering state i nor on any previous residence times. Thus, we deal with a
time-dependent probability matrix K(t), known as the kernel of the SMC, where an entry
k_{i,j}(t) equals the probability that, after having entered state i, it takes at most t time units
to switch to state j (given that no transition to any other state takes place).
We can now define two important intermediate quantities. First, we define p_{i,j} =
lim_{t→∞} k_{i,j}(t), which expresses the probability that once state i has been entered, the next
state will be j. Furthermore, we define F_i(t) = Σ_j k_{i,j}(t). As before, F_i(t) is the residence

Figure 3.7: Transient probabilities for the TMR system (2)

time distribution for state i (per visit to state i). Using these two definitions we can
describe the operation of an SMC as follows. Once state i has been entered, the residence
time in that state will be a random variable with distribution Fi(t) (and density fi(t)).
After this period, the state changes to j with probability pi,j.
To obtain the steady-state probabilities for an SMC, we can follow the same method
as has been presented to solve the steady-state probabilities for CTMCs using DTMCs.
Indeed, at the moments of state changes, the SMC behaves exactly as a DTMC. Therefore,
we can compute the steady-state probabilities for the DTMC with P = (p_{i,j}), denoted here
as v. Now, we have to compute the average state residence times \bar{f}_i for all states i in the
SMC. We do this directly from the state residence time distributions:

\bar{f}_i = \int_0^{\infty} t f_i(t) \, dt.   (3.50)

We then obtain the steady-state probabilities in the SMC by taking these residence times
into account, as follows:

p_i = \frac{v_i \bar{f}_i}{\sum_j v_j \bar{f}_j},   for all i.   (3.51)
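A small sketch (Python with numpy; the embedded matrix P and the mean residence times are illustrative assumptions) shows the two-step procedure, first solving the embedded DTMC and then applying (3.51):

import numpy as np

P    = np.array([[0.0, 1.0],
                 [0.5, 0.5]])       # assumed embedded DTMC
fbar = np.array([2.0, 0.5])         # assumed mean residence times per visit

n = P.shape[0]
A = (np.eye(n) - P).T
A[-1, :] = 1.0
b = np.zeros(n); b[-1] = 1.0
v = np.linalg.solve(A, b)           # embedded-chain probabilities, v = vP

p = v * fbar / (v * fbar).sum()     # weigh with the residence times, cf. (3.51)
print(v, p)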

3.7 The birth-death process

A special case of the CTMCs discussed in Section 3.5 is the birth-death process. Since such
processes have many applications in queueing theory (see also Chapter 4) we introduce
them here. A birth-death process on state space I = {0, 1, 2, ...} is a CTMC in which
from any state i ∈ N⁺ only transitions to the neighbouring states i − 1 and i + 1 are
allowed. From state 0, the only allowed transition is to state 1. When interpreting the
current state i ∈ N as the current population in a system, a birth occurs with rate λ_i > 0,
resulting in state i + 1. On the other hand, from state i ∈ N⁺ a death occurs with rate
μ_i > 0, resulting in state i − 1. This type of CTMC is characterised by a tridiagonal
generator matrix Q as follows:

Q = \begin{pmatrix}
-\lambda_0 & \lambda_0 & 0 & \cdots \\
\mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & \cdots \\
0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & 0 & \cdots \\
 & & \ddots & \ddots & \ddots
\end{pmatrix}.   (3.52)

Birth-death models do not necessarily have an infinite state space: a finite state space
I = {0, 1, 2, ..., n} is also possible. The interpretation of state n is then such that no
births can occur in this state. We will see in Chapter 4 that birth-death models are
very well suited for the analysis of elementary queueing stations. Moreover, due to the
tridiagonal structure of Q the solution of the system of linear equations (3.41) can in many
cases be given explicitly.

3.8 The Poisson process


Consider a birth-death model on an infinite state space I = {0, 1, 2, ...} with two special
properties:

1. All the birth-rates are equal to one another, i.e., λ_i = λ, i = 0, 1, 2, ...; λ is called
the intensity or rate of the Poisson process;

2. All the death-rates are equal to 0, i.e., μ_i = 0, i = 1, 2, 3, ....

We deal here with a pure birth model with constant birth-rate. The state probability
p_i(t) = Pr{N(t) = i} can be obtained by solving the differential equation p'(t) = p(t)Q

with

Q = \begin{pmatrix}
-\lambda & \lambda & 0 & \cdots \\
0 & -\lambda & \lambda & 0 & \cdots \\
0 & 0 & -\lambda & \lambda & \cdots \\
 & & & \ddots & \ddots
\end{pmatrix}.   (3.53)

It can be shown (see also the exercises) that

p_i(t) = Pr{N(t) = i} = \frac{(\lambda t)^i}{i!} e^{-\lambda t},   t ≥ 0,  i ∈ N.   (3.54)

Consequently, N(t) is distributed according to a Poisson distribution with parameter λt
(also E[N(t)] = Var[N(t)] = λt). Notice that N(t) is exactly equal to the number of births in
[0, t), and thus the state of the CTMC indicates the count of births. The time between
state changes is exponentially distributed with rate λ.
There are two other properties of the Poisson process of importance in practical modelling
studies (in Figure 3.8 we visualise these two properties):

• Merging of independent Poisson streams: n independent Poisson streams with intensities
λ_1 through λ_n can be merged to form a single Poisson stream with intensity
λ_1 + ... + λ_n.

• Splitting of a Poisson stream: a Poisson stream with intensity λ can be split into
n > 1 Poisson streams with intensities α_1λ through α_nλ, with α_1, ..., α_n ≥ 0, and
Σ_{i=1}^{n} α_i = 1.

Notice that the merging property does not hold for all renewal processes; the splitting
property, however, does.
Another important property of Poisson processes is the PASTA property; it will be
discussed in Section 4.3. Finally, recall that we have discussed two ways to arrive at a
Poisson process: (i) as a special renewal process with negative exponential renewal time
distributions, and (ii) as a special birth-death process with constant birth-rate and zero
death-rate.

3.9 Renewal processes as arrival processes


Stochastic processes describing the arrival streams of jobs at a queueing station are often
assumed to be renewal processes. This implies that the times between successive arrivals
are assumed to be independently and identically distributed. Via the choice of the renewal

Figure 3.8: Merging and splitting of Poisson streams

time distribution, a wide variety of arrival processes can be constructed. For instance,
the network traffic generated by a fixed bit-rate voice source will be very deterministic,
e.g., with a rate of 8000 packets/second, and with the packets equidistantly spaced in time.
On the other hand, the data traffic generated over a network by a general computer user
will have a high variance: periods of long inactivity, e.g., for local word processing, will be
followed by high activity periods, e.g., for the sending and receiving of files or programs
to/from a file server. Consequently, the variance of the interrenewal time distribution will
be high.
As we have seen before, the Poisson process is a very special case of a renewal process.
One can, of course, use many distributions other than the exponential for the interrenewal
times. A class of distributions that is very well suited for this purpose is the class of
phase-type distributions (PH-distributions). The resulting PH-renewal processes have the
nice property that they fall within the range of Markovian models, thus allowing for many
analysis techniques (see also Chapter 8).
We present some background on PH-distributions in Section 3.9.1 and continue with
the description of a number of special PH-renewal processes in Section 3.9.2.

3.9.1 Phase-type distributions


PH-distributions are an important class of distributions that can be seen as generalisations
of the exponential distribution. In Figure 3.9 we depict the exponential distribution as an
absorbing CTMC with two states. The initial probability distribution is p(0) = (1, 0), and
the time until absorption (in state 1) has an exponential distribution with rate λ. When we
now generalise the single state (state 0) of this CTMC into a number of states, but maintain

Figure 3.9: The exponential distribution as a phase-type distribution

the property that the CTMC is absorbing in a single state, we obtain a PH-distribution.
The time until absorption consists of a number of phases, each of exponentially distributed
length. The initial probability distribution is denoted α (as far as the non-absorbing states
are concerned).
To generalise this, consider a CTMC on the state space I = {1, ..., m, m + 1}, with
generator matrix

Q = \begin{pmatrix} T & T^0 \\ 0 & 0 \end{pmatrix},   (3.55)

where T is an m x m matrix with t_{i,i} < 0 (i = 1, ..., m), t_{i,j} ≥ 0 (i ≠ j) and T⁰ is a
column vector with nonnegative elements. The row sums of Q equal zero, i.e., T1 + T⁰ = 0,
where 1 denotes a column vector of ones. Notice that T by itself is not a proper generator
matrix. The initial probability vector is given as (α, α_{m+1}) with α1 + α_{m+1} = 1. Furthermore,
the states 1, ..., m are transient, and consequently, state m + 1 is the one and only
absorbing state, regardless of the initial probability vector. In Figure 3.10 we visualise this.
The probability distribution F(x) of the time until absorption in state m + 1 is then
given as [217]:

F(x) = 1 - \alpha e^{Tx} 1,   x ≥ 0.   (3.56)

We can now define a distribution F(x) on [0, ∞) to be of phase type if and only if it is
the distribution of the time to absorption in a CTMC as defined above. The pair (α, T) is
called a representation of F(x). Note that since Q is a generator matrix of which the row
sums equal 0, the elements of T⁰ can be computed from T. Then, the following properties
hold:

• the distribution F(x) has a jump of α_{m+1} at x = 0 and its density on (0, ∞) equals
f(x) = F'(x) = α e^{Tx} T⁰;

• the moments E[X^i] of F(x) are finite and given by

E[X^i] = (-1)^i i! (\alpha T^{-i} 1),   i = 0, 1, ....   (3.57)



Figure 3.10: Schematic view of a PH distribution

Examples of PH-distributions are the earlier mentioned exponential distribution, the Erlang-
k distribution, and the hyper- and hypoexponential distribution. Note that these well-
known PH distributions are all represented by acyclic CTMCs. The definition above,
however, also allows for the use of non-acyclic CTMCs.
Important to note is the fact that the first moment of a PH-distribution exactly expresses
the mean time to absorption in an absorbing CTMC. Using (3.57) for that specific
case, we obtain:

E[X] = -\alpha T^{-1} 1.   (3.58)

Note that the last multiplication (with 1) is just included to sum the elements in the
row vector −α T^{−1}. We can thus also write:

E[X] = \sum_{i=1}^{m} x_i = x 1,   with x = -\alpha T^{-1},   (3.59)

where x = (x_1, ..., x_m) is a vector in which x_i denotes the mean time spent in state i
before absorption. Instead of explicitly computing T^{−1} and performing a vector-matrix
multiplication, it is often smarter to solve the linear system

x T = -\alpha,   (3.60)

to obtain x and compute E[X] = Σ_i x_i. In a similar way we can compute the j-th moment
E[X^j] = Σ_i x_i^{(j)}, where the vector x^{(j)} follows from

x^{(j)} T^j = (-1)^j j! \alpha.   (3.61)

Suppose we have already computed x^{(j)} and want to compute x^{(j+1)}; we then proceed as
follows:

x^{(j+1)} T^{j+1} = (-1)^{j+1} (j + 1)! \alpha.   (3.62)

Multiplying both sides of this equation with T^{−j} we obtain

x^{(j+1)} T = (-1)^{j+1} (j + 1)! \alpha T^{-j} = -(j + 1) \underbrace{(-1)^j j! \alpha T^{-j}}_{x^{(j)}} = -(j + 1) x^{(j)},   (3.63)

that is, we have obtained a linear system of equations that expresses x^{(j+1)} in terms of x^{(j)}.
If we solve this linear system of equations with a direct method such as LU-decomposition
(see Chapter 15) we only have to decompose T once and we can compute successive vectors
x^{(j)} using back-substitutions only. However, if T is large and we only require the first few
moments, then an iterative solution might be the fastest way to proceed.
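To illustrate the procedure, the following Python fragment (assuming numpy) computes the first two moments of an Erlang-2 representation with an assumed phase rate by solving the linear systems, rather than by inverting T:

import numpy as np

lam = 3.0                           # assumed rate of both exponential phases
T = np.array([[-lam,  lam],
              [ 0.0, -lam]])        # Erlang-2 representation
alpha = np.array([1.0, 0.0])

x = alpha.copy()                    # x^(0) = alpha
for j in (1, 2):
    x = np.linalg.solve(T.T, -j * x)    # solve x^(j) T = -j x^(j-1), cf. (3.63)
    print("E[X^%d] =" % j, x.sum())     # expected: 2/lam and 6/lam**2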

3.9.2 Phase-type renewal processes


In this section we discuss a number of PH-renewal processes that have practical significance.

The E_k-renewal process

The Erlang-k distribution is a phase-type distribution consisting of a series of k exponential


phases, followed by a single absorbing state. An Erlang-k distribution results when the
sum of k independent and identically distributed exponential random variables is taken.
The squared coefficient of variation of the Erlang-k distribution is 1/k. Consequently, for
large k, it approaches a deterministic arrival pattern. Erlang-k interarrival time distributions
are often used to approximate truly deterministic arrival patterns (which cannot be
incorporated in Markov models).
Instead of Erlang-k interarrival times, one can also define hypo-exponentially dis-
tributed interarrival times; they are the sum of a number of independent exponentially
distributed random variables which need not be identically distributed. Hypo-exponentially
distributed interarrival times also have a squared coefficient of variation at most equal to 1.
It can be shown, though, that no hypo-exponential distribution with k phases yields a
squared coefficient of variation smaller than 1/k, i.e., the Erlang-k distribution is the
PH-distribution with the smallest coefficient of variation, for given k.

The H_k-renewal process

Where the Erlang-k distribution is a series of exponential phases, the hyperexponential


distribution can be interpreted as a probabilistic choice between k exponential distributions
(see Figure 3.11). As this probabilistic choice introduces extra randomness, one can imagine

Figure 3.11: The hyperexponential distribution as a phase-type distribution

that for the Hk-renewal process the squared coefficient of variation is at least equal to 1.
When modelling “bursty” traffic sources, this therefore seems to be a good choice.
Often, a two-phase hyperexponential distribution is used. The H_2-distribution has
three free parameters, α, λ_1 and λ_2, so that

F_{H_2}(t) = 1 - \alpha e^{-\lambda_1 t} - (1 - \alpha) e^{-\lambda_2 t},   t ≥ 0.   (3.64)

Fitting such a distribution on the first and second moment (or coefficient of variation)
derived via measurements leaves one free parameter. Therefore, often the H_2 distribution
with balanced means is taken. The balanced means property can be expressed mathematically
as α/λ_1 = (1 − α)/λ_2. This extra equation can then be used to fit the interarrival
time distribution.
Suppose that measurements have revealed that the first moment of the interarrival
times in a job stream can be estimated as \bar{x}, and that the squared coefficient of variation
can be estimated as \hat{C}^2; then the parameters of the H_2 distribution can be expressed as
follows:

\alpha = \frac{1}{2}\left(1 + \sqrt{\frac{\hat{C}^2 - 1}{\hat{C}^2 + 1}}\right),   \lambda_1 = \frac{2\alpha}{\bar{x}},   \lambda_2 = \frac{2(1 - \alpha)}{\bar{x}}.   (3.65)
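A small helper function (plain Python; the measured values x̄ = 2 and Ĉ² = 4 below are merely illustrative) implements this fit and verifies that the fitted H_2 distribution indeed reproduces the given mean and squared coefficient of variation:

from math import sqrt

def fit_h2_balanced(xbar, c2):
    # balanced means: alpha/lam1 = (1 - alpha)/lam2; requires c2 >= 1
    alpha = 0.5 * (1.0 + sqrt((c2 - 1.0) / (c2 + 1.0)))
    lam1  = 2.0 * alpha / xbar
    lam2  = 2.0 * (1.0 - alpha) / xbar
    return alpha, lam1, lam2

a, l1, l2 = fit_h2_balanced(xbar=2.0, c2=4.0)
mean = a / l1 + (1.0 - a) / l2                      # first moment of the H2
m2   = 2.0 * a / l1**2 + 2.0 * (1.0 - a) / l2**2    # second moment
print(mean, m2 / mean**2 - 1.0)                     # recovers 2.0 and 4.0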

The Cox-renewal process

A well-known class of PH-distributions is formed by the Cox distributions. They can be


regarded as a generalisation of the Erlang distributions. A Cox distribution consists of
n exponential phases, with possibly different rates λ_1 through λ_n. Before each phase,
it is decided whether the next phase is entered (with probability 1 − r_i) or not (with

Figure 3.12: The Cox distributions as a mixed DTMC/CTMC state transition diagram

Figure 3.13: The interrenewal distribution of the IPP as a phase-type distribution

probability r_i). The Cox distribution is visualised in Figure 3.12, where the small black
diamonds indicate the probabilistic choices to be made before each exponential phase.
With a proper choice of parameters, coefficients of variation both smaller and larger than
1 can be dealt with.

The interrupted Poisson process

In many telecommunication systems, the arrival stream of packets can be viewed as a


Poisson stream, however, switched on and off randomly. The on-period then corresponds
to an active period of a source, the off-period to a passive period of a source. The on- and
off-periods are determined by the characteristics of the source. As an example of this, one
can think of the stream of digitised voice packets that is originating from a telephone user:
periods of speaking are interchanged with periods of silence. These two period durations
determine the on- and off-time distributions. When on, the rate of digitised voice packets
determines the rate of a Poisson process. When off, no arrivals are generated. The on/off-
process is also called an interrupted Poisson process (IPP).
In Figure 3.13 we depict the interrenewal distribution of the IPP as a PH-distribution.
Starting state is, deterministically, state 1, representing the on-state. In that state, a
renewal period can end, i.e., an arrival can take place, via the transition with rate λ to the
absorbing state. However, instead of an arrival, also the active period can end, according to
the transition with rate γ_{1,0} to state 0, the off-state. After an exponentially distributed off-

period, with rate γ_{0,1} the on-state is entered again. The generator matrix of this absorbing
CTMC is given as

Q_{IPP} = \begin{pmatrix}
-\gamma_{0,1} & \gamma_{0,1} & 0 \\
\gamma_{1,0} & -(\gamma_{1,0} + \lambda) & \lambda \\
0 & 0 & 0
\end{pmatrix},   (3.66)

and the initial probability vector α = (0, 1). Note that due to the choice of α, we have
established that after a packet arrival (an absorption) the PH-distribution starts anew in
state 1 (a renewal), so that the arrival process remains in a burst. From Q_{IPP}, we have

T = \begin{pmatrix}
-\gamma_{0,1} & \gamma_{0,1} \\
\gamma_{1,0} & -(\gamma_{1,0} + \lambda)
\end{pmatrix},   (3.67)

so that we can easily derive moments, using (3.57):

E[X] = (-1)^1 1! (0, 1) \cdot T^{-1} \cdot 1
     = -\frac{1}{\gamma_{0,1}\lambda} (0, 1) \begin{pmatrix} -(\gamma_{1,0} + \lambda) & -\gamma_{0,1} \\ -\gamma_{1,0} & -\gamma_{0,1} \end{pmatrix} \cdot 1
     = \frac{1}{\lambda} \left( 1 + \frac{\gamma_{1,0}}{\gamma_{0,1}} \right).   (3.68)
Note that the inverse of a 2 x 2 matrix can be expressed directly:

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}  ⟹  A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},

where det(A) = ad − cb is the determinant of matrix A.


The first moment E[X] of an IPP can also be obtained via an alternative calculation,
as follows. When one enters the initial state of an IPP, i.e., state 1 in Figure 3.13, the
average time until absorption E[X] consists of a number of phases. With probability
1, there is residence in state 1, with average length 1/(λ + γ_{1,0}). Then, with probability
γ_{1,0}/(λ + γ_{1,0}), a state change to state 0 takes place, adding to the average time to absorption
1/γ_{0,1}, the average state residence time in state 0, plus, after returning to state 1 again,
another E[X]. In other words, when entering state 1 for the second time, it is as if one
enters it for the first time. In equational form, this can be written as

E[X] = \frac{1}{\lambda + \gamma_{1,0}} + \frac{\gamma_{1,0}}{\lambda + \gamma_{1,0}} \left( \frac{1}{\gamma_{0,1}} + E[X] \right).   (3.69)

Collecting terms, this can be written as

E[X] = \frac{1}{\lambda} \left( 1 + \frac{\gamma_{1,0}}{\gamma_{0,1}} \right),   (3.70)

which is the result we have seen before.

3.10 Summary of Markov chains

To finish this chapter, we present in Table 3.1 an overview of the three classes of Markov
chains we have discussed.

As can be observed from this table, to obtain the steady-state behaviour of Markov
chains (DTMCs, CTMCs and SMCs) we need to solve systems of linear equations. How
can this best be done? To answer this question, we first have to answer whether the Markov
chain under study, and thus the corresponding system of linear equations, exhibits a special
structure. This is for instance the case when we are addressing a Markov chain that actually
is a birth-death process (where the generator matrix has a tri-diagonal structure). If such
a special structure does exist, we can often exploit it, yielding explicit expressions for the
steady-state probabilities, even when the number of states is infinitely large. Examples of
such cases will be discussed in Chapter 4 where we discuss birth-death models, in Chapter 8
where we discuss quasi-birth-death models, and in Chapters 10 through 13 where we discuss
queueing network models.

When the Markov chain under study does not exhibit a nice structure but is finite,
we can still obtain the steady-state probabilities by solving the system of linear equations
with numerical means (known from linear algebra). For small Markov chains, say, with up
to 5 states, we might be able to perform the computations “by hand”. For larger models,
with up to 100 or even 1000 states, we might employ so-called direct techniques such as
Gaussian elimination or LU-decomposition. For even larger models, with possibly tens or
hundreds of thousands of states, we have to employ iterative techniques such as Gauss-
Seidel iterations or the successive over-relaxation technique. We will come back to these
techniques in Chapter 15.

As can also be observed from Table 3.1, to evaluate the transient behaviour of DTMCs
just requires matrix-vector multiplications. To evaluate the transient behaviour of CTMCs
we have to solve a system of linear differential equations. For CTMCs with an infinite
state space, the transient behaviour can only be obtained in very special cases which
go beyond the scope of this book. For finite CTMCs, explicit expressions (closed-form
solutions) for the transient behaviour can be obtained, e.g., when the CTMC is acyclic. For
general CTMCs, however, we have to solve the system of differential equations numerically,
either via standard techniques such as Runge-Kutta methods, or via a technique specially
developed for CTMCs, known as uniformisation. We will also discuss these techniques in
Chapter 15.

                        DTMC                          CTMC                          SMC
initial probabilities   p(0)                          p(0)                          p(0)
transitions             P = [p_{i,j}]                 Q = [q_{i,j}]                 P = [p_{i,j}]
interpretation          probabilities                 rates                         probabilities
state residence times   steps, mean 1/(1 - p_{i,i})   implicit in rates,            F_i(t), mean h_i
                        (geometric number)            mean 1/|q_{i,i}|              (any distribution)
                                                      (negative exponential)
steady-state            v = vP,                       pQ = 0,                       v = vP, Σ_i v_i = 1,
distribution            Σ_i v_i = 1                   Σ_i p_i = 1                   p_i = v_i h_i / Σ_j v_j h_j
transient               p(n) = p(0)P^n                p'(t) = p(t)Q,                not addressed
distribution                 = p(n-1)P                given p(0)

Table 3.1: Tabular overview of three classes of Markov chains

3.11 Further reading


More information on stochastic processes, in much more detail than we have presented here,
can be found in many mathematics textbooks. We mention Kemeny and Snell [154], Feller
[87], Howard [138, 139], Cinlar [55], Wolff [290] and Ross [247]. Phase-type distributions
have been treated extensively by Neuts [217]. Cox introduced the Coxian distributions
[69]. Many books on performance evaluation also contain chapters on stochastic processes.

3.12 Exercises
3.1. Poisson and renewal processes.
We consider a collection of overhead projectors. The life-time of a light-bulb in the i-th
overhead projector is given by the stochastic variable Li. All variables Li obey a negative
exponential distribution with mean 1000 hours. A university teaching term lasts 16 weeks
of 5 days of 8 hours each, hence 640 hours.

1. Assume that an overhead projector has a built-in spare light-bulb. Compute the
probability that an overhead projector becomes useless during a term.

2. Assume now that a projector does not have its own spare light-bulb. Instead, the
warden who is in charge of the 5 overhead projectors in the computer science building
is keeping spare light-bulbs. How many spare light-bulbs should he have at his
disposal to ensure that the probability that an overhead projector becomes unusable
during a term is below 10%?

3.2. Alternating renewal processes.


An extension of the renewal process is the alternating renewal process in which the inter-event
times are distributed alternatingly according to two distributions: X_{2i} is distributed
as U with given distribution F_U and X_{2i+1} is distributed as D with given distribution
F_D (i ∈ N). This type of process is often used to model the operational lifetime of a
component which switches between "up" and "down" periods.
Define the length of the i-th cycle as C_i = X_{2i} + X_{2i+1} and assume that U and D both
are negative exponentially distributed with rates 1/10 and 2/3 respectively.

1. Compute the mean up-, down-, and cycle-time.

2. Give the distribution of the length of a cycle.

3. Compute the percentage of time the renewal process resides in an "up"-period, given
a very long observation interval (t → ∞); this quantity is often called the steady-state
availability A.

4. Give the probability of having exactly n cycles in an interval [0, t).

3.3. Exponential distribution.


Derive expressions for the distribution, the density and the first two moments of an expo-
nential distribution, thereby using the more generally applicable results for PH-distributions.

3.4. A small CTMC.


Given is a CTMC with generator matrix

Q = \begin{pmatrix}
-4 & 1 & 3 & 0 \\
0 & -2 & 2 & 0 \\
0 & 0 & -1 & 1 \\
3 & 0 & 0 & -3
\end{pmatrix}.

Draw the state-transition diagram of this CTMC and compute the steady-state probabili-
ties (i) directly from the CTMC specification, and (ii) via the embedded DTMC.

3.5. Multiprocessor interference [adapted from [280]].


Consider a shared-memory multiprocessor system consisting of n processors and m memory
modules all connected to an interconnection network (which we do not address any further).
We make the following assumptions:

• all memory modules can be addressed independently from one another and in parallel;

• every memory access takes a fixed amount of time (the same for all modules);

• every memory module can serve one request per time unit;

• every processor has at most one outstanding request for a single memory module;

• as soon as a memory request has been answered, a processor immediately issues a
new request, for memory module i with probability q_i, where Σ_i q_i = 1 (the values q_i
are independent of the processor index (uniform memory access assumption)).

Consider the case where n = m = 2.

1. Construct a DTMC (with three states) that describes the access pattern of the processors
over the memory modules; use as state description a tuple (n_1, n_2) where n_i
is the number of processors accessing memory module i; clearly, n_1 + n_2 = n.

2. Compute the steady-state probability of having no memory-access conflicts and the


probability of having a memory-access conflict at memory module i.

3. Compute the mean number of requests the memory handles per unit of time, denoted
E[B] (for bandwidth).

4. Compute the maximum value E[B] can attain as a function of q_1 and q_2.

3.6. Two-component availability model.


A component of a system can either be up or down. The component has an exponentially
distributed lifetime with mean 1/λ; the repair of a component takes an exponentially
distributed amount of time with mean 1/μ.

1. Describe the state of a component as a simple CTMC.

2. Compute the steady-state probabilities p_i for the component to be operational or not


(this probability is often called the availability).

3. Consider the case where we have two identical components (each with their own
repair facility). Describe the states of these two components as a CTMC.

4. Compute the steady-state probabilities p_{i,j} of the two components together and show
that p_{i,j} = p_i p_j, where p_i (and p_j) have been computed above.

5. How would the availability of just a single repair facility for both components
affect the steady-state probabilities p_{i,j}? First reason about the expected changes in
availability and then explicitly compute the probability that both components are
operational (using a three-state CTMC).

3.7. Poisson distribution and Poisson processes.


Show that the number of arrivals in [0, t) in a pure birth process has a Poisson
distribution by explicitly solving the linear system of differential equations p'(t) = p(t)Q.

3.8. Cox distributions.


Show that any Cox distribution can be represented as a PH-distribution. Show that the
Erlang- and the hypo-exponential distributions are special cases of the Cox distribution.

3.9. Higher moments of the IPP.


Derive, using the moments-property of PH-distributions, the second moment of an IPP, as
well as its variance and squared coefficient of variation.

Part II

Single-server queueing models



Chapter 4

M|M|1 queueing models

In the previous chapter we have already presented birth-death processes as an important
class of CTMCs. In this chapter, we will see that a birth-death model can be used
perfectly for analysing various types of elementary queueing stations. Furthermore, by the
special structure of the birth-death process, the steady-state probability vector p can be
expressed explicitly in terms of the model parameters, thus making these models attractive
to use.
In Section 4.1 we consider the solution of the most fundamental birth-death model:
an M|M|1 queueing model with variable service and arrival rates. In principle, all other
queueing models discussed in this chapter are then derived from this model. In Section 4.2
we deal with a (single server) queueing model with constant arrival and service rates. We
then discuss the important PASTA property for queues with Poisson arrivals in Section 4.3;
it is used in the derivation of many interesting performance measures. The response time
distribution in such a queueing model is addressed in Section 4.4. We then address multi-
server queueing stations in Section 4.5, and infinite-server queueing stations in Section 4.6.
A comparison of a number of queueing stations with equal capacity but different structure is
presented in Section 4.7. We address the issue of limited buffering space and the associated
losses of jobs in Section 4.8 for single servers and in Section 4.9 for multi-server stations.
In Section 4.10 we address a queueing model in which the total number of customers is
limited. We finally present a mean-value based computational procedure for such a model
in Section 4.11.

Figure 4.1: State transition diagram for the most general M|M|1 model

4.1 General solution of the M|M|1 queue


Consider a single-server queueing station at which jobs arrive. As discussed already in
Chapter 3, we can see these arrivals as births. Similarly, the departure of a job at the end
of its service (or transmission) can be regarded as a death. We now make the following
assumptions regarding the time durations that are involved:

• the time between successive arrivals is exponentially distributed with mean 1/λ_i
whenever there are i jobs in the queueing station, i.e., we have a Poisson arrival
process with state-dependent rates λ_i;

• the time it takes to serve a job when there are i jobs present obeys a negative
exponential distribution with mean 1/μ_i.

We can now describe the overall behaviour of this queueing station with a very simple
CTMC on the state space I = {0, 1, 2, ...}. From every state i ∈ N a transition with rate
λ_i exists to state i + 1, corresponding to an arrival of a job. From every state i ∈ N⁺
a transition with rate μ_i exists to state i − 1, corresponding to a departure of a job. In
Figure 4.1 we depict the state transition diagram. The corresponding generator matrix
then has the following form:

    Q = | -λ_0        λ_0            0            0      ...  |
        |  μ_1     -(λ_1+μ_1)       λ_1           0      ...  |
        |   0         μ_2        -(λ_2+μ_2)      λ_2     ...  |               (4.1)
        |  ...        ...           ...          ...     ...  |
We can now use (3.41) for solving the steady-state probabilities p_i, i = 0, 1, ..., that is,
we solve pQ = 0 under the normalisation condition. Instead of using (3.41), we can also
infer the appropriate equations directly from the state transition diagram by assuming
"probability flow balance". Since we assume that the system will reach some equilibrium,
finally, the probability flow (or flux) into each state must equal the probability flow out of

each state, where the probability flow out of a state equals the state probability multiplied
by the outgoing rates. This interpretation also explains why (3.41) are often called the
global balance equations (GBEs). Thus, we have:

For state 0:                 p_0 λ_0 = p_1 μ_1,                              (4.2)

For states i = 1, 2, ...:    p_i(λ_i + μ_i) = p_{i-1} λ_{i-1} + p_{i+1} μ_{i+1}.   (4.3)

Of course, since the probability of being in any of all possible states equals 1, we have the
normalisation equation:

Σ_{i=0}^∞ p_i = 1.                                                           (4.4)

Having obtained the global balance equations we have to solve them. Only in very special
cases do these equations allow for an explicit solution of the state probabilities pi in terms
of the system parameters. An important aspect here is that it is in general difficult to say
a priori whether there exists such an explicit solution or not. However, here we deal with
such a special case. From (4.2) we have:

p_1 = (λ_0/μ_1) p_0.                                                         (4.5)

Substituting this in (4.3) for the case i = 1 we obtain:

p_1(λ_1 + μ_1) = p_0 λ_0 + p_2 μ_2,                                          (4.6)

which, after using (4.5), yields:


p_2 = (λ_0 λ_1)/(μ_1 μ_2) p_0.                                               (4.7)
Examining (4.7) gives us the idea that the general solution takes the form:

p_i = p_0 ∏_{k=0}^{i-1} λ_k/μ_{k+1},   i = 1, 2, ....                        (4.8)

Substitution of (4.8) in (4.2)-(4.3) immediately confirms this. We can use (4.8) to ex-
press all the state probabilities in terms of p_0. We finally obtain p_0 by considering the
normalisation equation (4.4):

p_0 (1 + Σ_{i=1}^∞ ∏_{k=0}^{i-1} λ_k/μ_{k+1}) = 1,                           (4.9)

from which we derive:

p_0 = (1 + Σ_{i=1}^∞ ∏_{k=0}^{i-1} λ_k/μ_{k+1})^{-1}.                        (4.10)
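The product-form solution (4.8)-(4.10) is straightforward to evaluate numerically. The
following sketch (in Python, which we use for illustration only; the function name
birth_death_probs and the truncation level are our own choices, not from the text)
computes the steady-state probabilities of a birth-death chain with state-dependent rates,
truncating the infinite sum at a point where the remaining terms are negligible.

    def birth_death_probs(lam, mu, n_max=1000):
        """Steady-state probabilities p_0 .. p_n_max of a birth-death chain.

        lam(i) is the birth (arrival) rate in state i, mu(i) the death
        (service) rate in state i; the chain is truncated at n_max, which
        must be chosen large enough for the neglected tail to be small.
        """
        # unnormalised products prod_{k=0}^{i-1} lam(k)/mu(k+1), cf. (4.8)
        terms = [1.0]
        for i in range(1, n_max + 1):
            terms.append(terms[-1] * lam(i - 1) / mu(i))
        p0 = 1.0 / sum(terms)            # normalisation, cf. (4.9)-(4.10)
        return [p0 * t for t in terms]

    # example: constant rates lambda = 0.8, mu = 1.0; p_i should equal (1-rho) rho^i
    p = birth_death_probs(lambda i: 0.8, lambda i: 1.0)
    print(p[0], p[1])   # approximately 0.2 and 0.16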

Of course, p_0 is only positive when the infinite sum has a finite limit. If the latter is the
case, the queue is said to be stable. In the more specific models that follow, we can state
this stability criterion more exactly and in a way that is easier to verify.
An important remark here is the following. It has been pointed out that the flow
balance holds for any single state. However, the flow balance argument holds for any
connected group of states as well. Sometimes, the resulting system of equations is easier
to solve when smartly chosen groups of states are addressed. We will address this in more
detail when such a situation arises.
A third way to solve the steady-state probabilities of this birth-death Markov chain is
the following. From the GBE for state 0 we see that the probability flows from and to
state 0 must be equal. Because this is the case in the GBE for state 1, the terms p_0 λ_0 and
p_1 μ_1 cancel so that we obtain p_1 λ_1 = p_2 μ_2. This can be done repeatedly, resulting in the
following system of equations:

p_i λ_i = p_{i+1} μ_{i+1},   i ∈ ℕ,                                          (4.11)

again under the normalisation condition. These balance equations are called the local
balance equations (LBEs) and are often easier to solve than the GBEs. The LBEs are
obtained by equating the probability flow into a particular state due to the arrival of
a job to the probability flow out of that state due to the departure of that same job.
Important to note is the fact that LBEs do not hold for all CTMCs. Only for a special
class of queueing models is this the case (for more details we refer to [74]).
Finally, note that since we are dealing with a queueing station with an infinite buffer,
no jobs will be lost. Define the overall arrival rate λ = Σ_{i=0}^∞ p_i λ_i and the overall
service rate μ = Σ_{i=1}^∞ p_i μ_i. As long as ρ = λ/μ < 1 the queueing station is stable and
the throughput X will be equal to the overall arrival rate λ.

Other measures of interest are the expected number of jobs in the queueing station and
in the queue, respectively defined as:

E[N] = Σ_{i=0}^∞ i p_i,   and   E[N_q] = Σ_{i=1}^∞ (i-1) p_i,                (4.12)

from which, via Little's law and X = λ, the expected response time E[R] = E[N]/X and
the expected waiting time E[W] = E[N_q]/X = E[R] - E[S] can be derived. Also more
detailed measures, such as the probability of having at least k jobs in the queueing system,
can easily be computed: B(k) = Σ_{i=k}^∞ p_i.

Figure 4.2: State transition diagram of the constant-rate M|M|1 model

4.2 The M|M|1 queue with constant rates


The simplest possible form of the general model discussed in Section 4.1 is the one in
which all the arrival rates are the same, i.e., λ_i = λ, i = 0, 1, 2, ..., and in which all
the service rates are the same, i.e., μ_i = μ, i = 1, 2, 3, .... Figure 4.2 shows the state
transition diagram. Note that the arrival process is a Poisson process. Substituting the
model parameters in the general result (4.8) we obtain

p_i = p_0 (λ/μ)^i,   i = 0, 1, 2, ....                                       (4.13)

If we define ρ = λ/μ, we can compute p_0 using the normalisation equation:

Σ_{i=0}^∞ p_i = p_0 Σ_{i=0}^∞ ρ^i = p_0/(1-ρ) = 1,                           (4.14)
where we use the well-known solution of the geometric series under the assumption that
ρ < 1 (see also Appendix B). We thus find p_0 = 1 - ρ whenever ρ < 1 or λ < μ. This
existence condition is intuitively appealing since it states that a solution exists whenever
the average number of arrivals per unit of time is smaller than the average number of
services per unit of time. In summary, we find for the steady-state probabilities in the
M|M|1 queue:

p_i = (1 - ρ)ρ^i,   i = 0, 1, 2, ....                                        (4.15)

We observe that these steady-state probabilities are geometrically distributed with base ρ.
Moreover, since p_0 = 1 - ρ, we have ρ = 1 - p_0 = Σ_{i=1}^∞ p_i. So, ρ equals the sum of the
probabilities of the states in which at least one job is present, or, in other words, the sum
of the probabilities of states in which there is work to do. This equality explains why ρ is
often called the utilisation. By the fact that all jobs that enter the queueing station also
leave the queueing station, we can directly state that the throughput X = λ, so that we
can also express ρ = XE[S].
Having obtained the steady-state probabilities, we can easily calculate other quantities
of interest such as E[N], the average number of jobs in the queueing station (see also

Appendix B.2):

E[N] = Σ_{i=0}^∞ i p_i = (1-ρ) Σ_{i=0}^∞ i ρ^i = (1-ρ)ρ Σ_{i=0}^∞ i ρ^{i-1}

     = (1-ρ)ρ d/dρ (1/(1-ρ)) = ρ/(1-ρ).                                      (4.16)

Note that we have again used the result for the geometric series and that we have changed
the order of summation and differentiation. Although the latter is not allowed in all cases,
it is in the cases where we do so. We can then apply Little's law to obtain the
average time E[R] spent in the queueing station:

E[R] = E[N]/λ = ρ/(λ(1-ρ)) = E[S]/(1-ρ) = 1/(μ-λ).                           (4.17)

What we see here is that the time spent in the station basically equals the average service
time E[S]; however, it is "stretched" by the factor 1/(1-ρ) due to the fact that there
are other jobs in need of service as well. Note that we have already seen these results in
Chapter 2. Now we can, however, also derive more detailed results as follows.
Chapter 2. Now we can, however, also derive more detailed results as follows.
A measure of interest might be the variance σ²_N of the number of packets in the station.
This measure can be derived as the expectation of (i - E[N])² under the probability
distribution p:

σ²_N = Σ_{i=0}^∞ (i - E[N])² p_i = ρ/(1-ρ)².                                 (4.18)
The probability B(k) of having at least k packets in the queueing station is also of interest,
for instance when dimensioning buffers. We have

B(k) = Σ_{i=k}^∞ p_i = Σ_{i=k}^∞ (1-ρ)ρ^i = ρ^k (1-ρ) Σ_{i=0}^∞ ρ^i = ρ^k.   (4.19)

We observe that the probability B(k) of having k or more jobs in the station decreases
exponentially with k. Very often, the value B(k) is called a blocking probability. This
name, however, might easily lead to confusion since no losses actually occur.
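The constant-rate results (4.15)-(4.19) can be collected in a few lines of code. The sketch
below (our own illustration in Python; the function name and parameter values are
assumptions, not from the text) assumes a stable queue with ρ < 1.

    def mm1_measures(lam, mu):
        """Mean measures for the M|M|1 queue with arrival rate lam < mu."""
        rho = lam / mu                     # utilisation, cf. (4.15)
        EN = rho / (1.0 - rho)             # mean number in station, cf. (4.16)
        ER = 1.0 / (mu - lam)              # mean response time, cf. (4.17)
        varN = rho / (1.0 - rho) ** 2      # variance of N, cf. (4.18)
        B = lambda k: rho ** k             # Pr{k or more jobs present}, cf. (4.19)
        return rho, EN, ER, varN, B

    rho, EN, ER, varN, B = mm1_measures(0.8, 1.0)
    print(EN, ER, B(10))   # 4.0, 5.0 and about 0.107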

4.3 The PASTA property


A well-known and often applied result of queueing theory is the PASTA property (Poisson
Arrivals See Time Averages; see also [291]):

Theorem 4.1. PASTA property.


The distribution of jobs in a queueing station at the moment a new job of a
Poisson arrival process arrives is the same as the long-run or steady-state job
distribution. □

We have already used the PASTA property in Chapter 2 where we derived first moments for
the performance measures of interest of an M|M|1 queue. There we used the fact that an
arriving customer "sees" the queue at which it arrives "as if in equilibrium".

Although the PASTA property is intuitively appealing, it is certainly not true for all
arrival processes. Consider as an example a D|D|1 queueing station where every second
a job arrives which requires 0.6 seconds to be served. Clearly, since ρ = λE[S] = 0.6
this queueing station is stable. However, whenever a job arrives it will find the station
completely empty with probability 1, although in the long run the queue-empty probability
will only be 1 - ρ = 0.4.
The proof of the PASTA property is relatively simple, as outlined below. Let us con-
sider a queueing system in which the number of customers present is represented by a
stochastic process {X_t, t ≥ 0}. Furthermore, define the event "there was (at least) one
arrival at this queueing station in the interval (t-h, t]". Since the arrivals as such
form a homogeneous Poisson process, the probability of this event equals the proba-
bility that there is an arrival in the interval (0, h], which equals Pr{N(h) ≥ 1}, where
N(t) is the counting process defined in Chapter 3. For non-Poisson processes, this "shift
to the origin" would not be valid. Since the interarrival times are memoryless, the
thus-defined probability is independent of the past history of the arrival process and of
the state of the queueing station: Pr{N(h) ≥ 1 | X_{t-h} = i} = Pr{N(h) ≥ 1}, so that
Pr{N(h) ≥ 1 ∩ X_{t-h} = i} = Pr{N(h) ≥ 1} Pr{X_{t-h} = i}. From this, we can conclude that
also

Pr{X_{t-h} = i | N(h) ≥ 1} = Pr{X_{t-h} = i}.                                (4.20)

If we now take the limit h → 0, the left-hand side of this equality simply expresses the
probability that an arrival at time-instance t arrives at a queue with i customers in it.
This probability then equals the probability that the queue at time t has i customers in it,
independent of any arrival, hence, the steady-state probability of having i customers in
the queue. As a conclusion, we see that a Poisson arrival acts as a random observer and
sees the queue as if in equilibrium.
Important to note is that we have used the memoryless property of the interarrival
times here. Indeed, it is only for the Poisson process that this property holds, simply since

there is no memoryless interarrival time distribution other than the negative exponential
one used in the Poisson process. The discrete-time analogue of the PASTA property is
the BASTA property where in every time-slot an arrival takes place (or not) with a fixed
probability p (or 1 - p). This means that in every slot the decision on an arrival is taken
by an independent Bernoulli trial so that the times between arrivals have a geometrically
distributed length. The latter might be no surprise since the geometric distribution is the
only discrete-time memoryless distribution.

4.4 Response time distribution in the M|M|1 queue


In Section 4.2 we have computed the steady-state probability distribution of the number of
customers in an M|M|1 queueing station. From this distribution, we were able to derive the
mean response time (via Little's law). In many modelling studies, obtaining such mean
performance measures is enough to answer the dimensioning questions at hand. How-
ever, in some applications it is required to have knowledge of the complete response time
distribution, e.g., when modelling real-time systems for which the probability of missing
deadlines should be investigated. In general, obtaining complete response or waiting time
distributions is a difficult task; however, for the M|M|1 queue it remains relatively simple,
as we will see below.
When a job arrives at an M|M|1 queue, it will find there, due to the PASTA property,
n jobs with probability p_n = (1-ρ)ρ^n. As the exponential distribution is memoryless,
the remaining processing time of a job in service again has an exponential distribution.
Therefore, the response time of a job arriving at an M|M|1 queue which is
already occupied by n other jobs has an Erlang-(n+1) distribution, that is, the sum of n
exponentially distributed service times, plus its own service time.

From Appendix A we know that the Erlang-k (E_k) distribution with rate μ per stage
has the following form:

F_{E_k}(t) = 1 - e^{-μt} Σ_{i=0}^{k-1} (μt)^i/i!.                            (4.21)
i=O ’
Using this result, we can calculate the response time distribution by unconditioning: the
complete response time distribution is the weighted sum of the response time distributions
given that there are n jobs present upon arrival, added over all possible n:

F_R(t) = Pr{R ≤ t} = Σ_{n=0}^∞ p_n Pr{R ≤ t | n jobs present upon arrival}

       = Σ_{n=0}^∞ p_n F_{E_{n+1}}(t) = Σ_{n=0}^∞ (1-ρ)ρ^n F_{E_{n+1}}(t)

       = 1 - e^{-μt} Σ_{i=0}^∞ (ρμt)^i/i! = 1 - e^{-μt} e^{ρμt}

       = 1 - e^{-μ(1-ρ)t} = 1 - e^{-(μ-λ)t},                                 (4.22)

where in the third line we have interchanged the order of the two summations and used
the geometric series.

Surprisingly, the response time is exponentially distributed, now with parameter μ-λ.
We can directly conclude from this that the average response time equals E[R] = 1/(μ-λ),
as we have seen before.

Response time distributions can often be used to give system users guarantees of the
form "with probability p the response time will be less than F_R^{-1}(p) seconds". Especially
for time-critical applications such response time guarantees are often more useful than the
average response time.

Example 4.1. Response time distributions at varying ρ.

Consider an M|M|1 queue with μ = 1 and where λ is either 0.2, 0.5 or 0.8. For these cases,
the response time distribution is given in Figure 4.3. As can be observed, the higher ρ, the
smaller Pr{R ≤ t}, i.e., the higher the probability that the response time exceeds a certain
threshold (Pr{R > t} = 1 - Pr{R ≤ t}). □
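Since R is exponential with parameter μ-λ, such guarantees follow from inverting (4.22).
A small sketch (our own illustration in Python; the function name is an assumption):

    import math

    def response_time_quantile(lam, mu, p):
        """Smallest t such that Pr{R <= t} = p in an M|M|1 queue, from (4.22)."""
        return -math.log(1.0 - p) / (mu - lam)

    # with mu = 1 and lambda = 0.8, 95% of the responses finish within about 15 s
    print(response_time_quantile(0.8, 1.0, 0.95))   # approximately 14.98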

4.5 The M|M|m multi-server queue


In the previous sections we have discussed models of systems in which there is only a
single server. Now consider a system in which a number of service providing units can
work independently on a number of jobs. Examples of such systems are (homogeneous)
multiprocessor systems or telecommunication systems (telephone switches) with multiple
outgoing lines.

The multi-server aspect can easily be incorporated in the general birth-death model we
developed before. We assume constant arrival rates, such that λ_i = λ for all i = 0, 1, ...,
but the number of active servers depends on the number of jobs present, as follows:

μ_i = iμ for i = 0, 1, ..., m,   and   μ_i = mμ for i = m+1, m+2, ....       (4.23)

Figure 4.3: Response time distributions Pr{R ≤ t} as a function of t in an M|M|1 queue for μ = 1 and various λ's

Figure 4.4: State transition diagram for the M|M|m model

This definition says that as long as there are fewer than m jobs present, the effective service
rate equals that number times the per-server service rate, and when at least m jobs are
present, the effective service rate equals mμ. In Figure 4.4 we show the corresponding state
transition diagram.

When we define ρ = λ/(mμ), the stability condition for this model can again be expressed
as ρ < 1; ρ can also be interpreted as the utilisation of each individual server. The
expected number of busy servers equals mρ = λ/μ. When the station is not overloaded the
throughput X = λ. When the station is highly loaded, it will operate at maximum speed,
i.e., with rate mμ. Under these assumptions, we can compute the following steady-state
probabilities from the global balance equations:

p_i = p_0 (mρ)^i/i!,   i = 0, ..., m-1,                                      (4.24)

and

p_i = p_0 (mρ)^m/m! · (mρ)^{i-m}/m^{i-m},   i = m, m+1, ...,                 (4.25)

where p_0 follows from the normalisation as follows:

p_0 = ( Σ_{j=0}^{m-1} (mρ)^j/j! + (mρ)^m/((1-ρ)m!) )^{-1}.                   (4.26)

Using these steady-state probabilities we can compute E[N] as follows:

E[N] = Σ_{i=0}^∞ i p_i = ... = mρ + ρ (mρ)^m/m! · p_0/(1-ρ)².                (4.27)

An arriving job will have to wait before its service starts when all the m servers are busy
upon its arrival. Due to the PASTA property this probability can be expressed as the sum
over all state probabilities for which there are at least m jobs present, as follows:

Pr{waiting} = Σ_{i=m}^∞ p_i = p_0 (mρ)^m/m! Σ_{i=m}^∞ ρ^{i-m} = p_0 (mρ)^m/m! · 1/(1-ρ).   (4.28)

By explicitly including (4.26) for p_0, we obtain Erlang's C formula:

W(m, ρ) = ((mρ)^m/m! · 1/(1-ρ)) / ( Σ_{j=0}^{m-1} (mρ)^j/j! + (mρ)^m/m! · 1/(1-ρ) ).   (4.29)

4.6 The M|M|∞ infinite-server queue


As an extreme form of a multi-server system, consider now a system which increases its
capacity whenever more jobs are to be served. Stated differently, the effective service rate
increases linearly with the number of jobs present: μ_i = iμ, for i = 1, 2, .... Combining
this with an arrival process with fixed rate, i.e., λ_i = λ, for i = 0, 1, ..., and using (4.8) we
can immediately derive that

p_i = p_0 ∏_{k=0}^{i-1} λ/((k+1)μ) = p_0 ρ^i/i!,   i = 1, 2, ...,            (4.30)

with ρ = λ/μ as usual. In Figure 4.5 we show the corresponding state transition diagram.
Using the normalisation equation (4.4) we find that

p_0 = (1 + Σ_{i=1}^∞ ρ^i/i!)^{-1} = (Σ_{i=0}^∞ ρ^i/i!)^{-1} = 1/e^ρ = e^{-ρ}.   (4.31)

Figure 4.5: State transition diagram for the infinite-server model

Now, we can compute E[N] as

E[N] = Σ_{i=0}^∞ i p_i = e^{-ρ} Σ_{i=0}^∞ i ρ^i/i! = e^{-ρ} Σ_{i=1}^∞ i ρ^i/i!

     = e^{-ρ} ρ Σ_{i=1}^∞ ρ^{i-1}/(i-1)! = e^{-ρ} ρ e^ρ = ρ,                 (4.32)

which can easily be explained. Since all arriving jobs are immediately served, the queue
will always be empty, i.e., E[N] = E[N_q] + E[N_s] = 0 + E[N_s] = ρ. The average time spent
in the queueing system simply equals the average service time 1/μ.
Infinite servers are often used for modelling the behaviour of system users. For instance,
the delay that computer jobs perceive when a user has to give a command from a termi-
nal can be modelled by an infinite server. There is no queueing of jobs at the terminal
(every user has its own terminal and there is only one job per user/terminal), but submitting
a command takes time. Consequently, there is only a service delay (the "think time").
Infinite-server queueing stations are also used when fixed delays in communication links
have to be modelled. For that reason, infinite servers are sometimes also called delay
servers.

4.7 Job allocation in heterogeneous multi-processors


When investments in computer systems have to be made, very often the question arises
what is most effective to buy: a single fast multi-user computer, or a number of smaller
single-user computers. We can formalise this question now, using some of the models we
have just presented, albeit in a fairly abstract way.

The three abstract system models are given in Figure 4.6. In case (1) we consider
a single fast processing device with service rate Kμ (all users share a single but fast
computer). In the second case we consider K smaller computers, each with capacity μ. In
doing so, we can either deal with a single queue with K servers (all users share a number
of smaller computers; case (2a)), each of speed μ, or with K totally separate computers

Figure 4.6: Three equal-capacity systems (K = 3)

among which we divide the workload (each user has his own machine; case (2b)). What
is the most profitable in terms of the average response time E[R], provided that jobs arrive
as a Poisson process with rate λ and that the service times are exponentially distributed
with mean E[S]? For this example, we define ρ = λ/μ.

In case (1) we deal with an M|M|1 queue with arrival rate λ and service rate Kμ.
The actual average service time then equals E[S]/K and the actual utilisation equals
λE[S]/K = ρ/K. The average response time then equals

E[R_1] = (E[S]/K)/(1 - ρ/K) = E[S]/(K - ρ).                                  (4.33)

In case (2a) we deal with an M|M|K queue with job arrival rate λ and job service rate μ.
We calculate E[R_{2a}] using (4.27), thereby taking into account the definition of ρ used in
this example, and Little's law, as follows:

E[R_{2a}] = E[N_{2a}]/λ = E[S] + E[S] · p_0 ρ^K/(K · K! · (1 - ρ/K)²),       (4.34)

with p_0 as defined in (4.26), and with m = K. In case (2b), we deal with K M|M|1 queues
in parallel, each with arrival rate λ/K and service rate μ. The average service time equals
E[S] but the utilisation per queue is only (λ/K) · E[S] = ρ/K. Therefore, we have

E[R_{2b}] = E[S]/(1 - ρ/K) = K E[S]/(K - ρ) = K E[R_1].                      (4.35)

It now follows that

E[R_1] < E[R_{2a}] < E[R_{2b}].                                              (4.36)

This can also be intuitively explained. Case (2b) is worse than case (2a) because in case
(2b) some of the servers might be idle while others are working. These busy servers might
even have jobs queued, which is not possible in case (2a). Case (2a), in turn, is worse than
case (1) because if there are L < K jobs in the system, the system in case (2a) only operates
at effective speed Lμ whereas in case (1) the speed of operation always is Kμ. If there are
K or more jobs in the system, cases (1) and (2a) are equivalent. Despite this fact, case
(1) is best in general. In general, we observe that pooling of jobs and systems is the most
efficient, apart from possibly introduced overheads.
Now suppose that we have to divide a Poisson stream, with rate λ, of arriving jobs, of
length S, over two processors, one with capacity K_1 and one with capacity K_2. How should
we choose the probabilities α_1 and α_2, with λ_i = α_i λ and α_1 + α_2 = 1, such that the
average job response time is minimised?

For the average job response time we have E[R] = α_1 E[R_1] + α_2 E[R_2]. First of all, we
know that

E[R_i] = E[S_i]/(1 - ρ_i),   E[S_i] = E[S]/K_i,   ρ_i = λ_i E[S_i],   i = 1, 2.   (4.37)

Writing E[R] = α_1 E[R_1] + (1 - α_1) E[R_2] and substituting α_2 = 1 - α_1 whenever possible,
we obtain an expression for E[R] as a function of α_1:

E[R(α_1)] = α_1 (E[S]/K_1)/(1 - α_1 λE[S]/K_1) + (1 - α_1) (E[S]/K_2)/(1 - (1 - α_1) λE[S]/K_2).   (4.38)

Taking the derivative of this expression with respect to α_1 and collecting terms, we obtain

dE[R(α_1)]/dα_1 = (E[S]/K_1)/(1 - ρ_1)² - (E[S]/K_2)/(1 - ρ_2)².             (4.39)

Now, to optimise E[R(α_1)] we set dE[R(α_1)]/dα_1 = 0, which yields

dE[R(α_1)]/dα_1 = 0  ⟺  (1 - ρ_1)²/(1 - ρ_2)² = K_2/K_1  ⟺  (1 - ρ_1)/(1 - ρ_2) = √(K_2/K_1).   (4.40)

We observe that the quotient of the squared idle fractions for queues 1 and 2 should equal
the quotient of K_2 and K_1! Thus, the queue with the highest capacity should have the
smallest idle fraction. Stated differently, the faster queue should be more heavily loaded to
optimise E[R]. From the quotient of the idle fractions, we can straightforwardly calculate
α_1 and α_2.

Example 4.2. Calculating α_1 and α_2.

As an example, consider the case where λ = 4 and E[S] = 1 and where server 2 is four
times as fast as server 1, i.e., K_1 = 1 and K_2 = 4. Following the above computations will
yield α_1 = 1/6 and α_2 = 5/6, yielding ρ_1 = 4/6 and ρ_2 = 5/6. Clearly, the faster server is
more heavily loaded! This implies that load balancing on the basis of (average) utilisations
is not always the best choice. The expected queue lengths at the queues can be computed
as E[N_1] = 2 and E[N_2] = 5. Note that the faster server will have a larger average queue
length. This implies that load balancing on the basis of (average) queue lengths without
taking into account server speeds is not a good idea either! □
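The optimal split can also be computed directly from condition (4.40), since ρ_1 and ρ_2
are linear in α_1. The sketch below (our own, in Python; the function name is an
assumption, and it presumes the resulting fractions and utilisations are feasible, i.e.,
between 0 and 1) reproduces the numbers of Example 4.2.

    import math

    def optimal_split(lam, ES, K1, K2):
        """Fraction alpha_1 of the Poisson stream routed to server 1 that
        minimises the mean response time, using condition (4.40)."""
        a, b = lam * ES / K1, lam * ES / K2       # rho_1 = a*alpha_1, rho_2 = b*(1-alpha_1)
        r = math.sqrt(K2 / K1)                    # (1-rho_1)/(1-rho_2) must equal r
        alpha1 = (1.0 - r + r * b) / (a + r * b)  # solve the resulting linear equation
        return alpha1, 1.0 - alpha1

    print(optimal_split(4.0, 1.0, 1.0, 4.0))      # (1/6, 5/6), as in Example 4.2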

4.8 The M|M|1|m single-server queue with bounded buffer
In the models we have addressed so far, the buffer capacity has always been infinitely large.
In practice the buffer capacity of a system is limited. In some performance evaluation
studies it is important to reflect this fact in the model. In this section we therefore address
an M|M|1 system where the total number of jobs in the queueing station is limited to
m. Jobs arriving when there are already m present in the queueing station are not
accepted; they are simply lost. From the viewpoint of the queueing station, the situation
is as follows. Whenever there are m jobs in the station, no new arrivals can occur. It is
as if the Poisson arrival stream has been switched off in these cases. The serving of jobs,
however, always happens at the same speed. Consequently, we have:

λ_i = λ for i = 0, 1, ..., m-1 and λ_i = 0 for i = m, m+1, ...;
μ_i = μ for i = 1, ..., m and μ_i = 0 for i = m+1, m+2, ....                 (4.41)

The corresponding state transition diagram is given in Figure 4.7. It is interesting to notice
that this model is always stable; when λ > μ jobs will queue, but only up to a total of
m. The jobs that arrive after the queue has been filled completely are simply discarded.
Turning to the solution of this model, we have from (4.8) that

p_i = p_0 (λ/μ)^i = p_0 ρ^i,   i = 1, ..., m,                                (4.42)

where ρ equals the ratio λ/μ. Notice that this ratio does not represent the utilisation any
more. We can solve for p_0 by using the normalisation equation (4.4):

p_0 = (Σ_{i=0}^m ρ^i)^{-1} = (1-ρ)/(1-ρ^{m+1}),                              (4.43)

Figure 4.7: State transition diagram for an M|M|1|m loss system

where we used the result for the finite geometric series (see Appendix B). Substituting
(4.43) in (4.42), we obtain the following steady-state probabilities:

p_i = ρ^i (1-ρ)/(1-ρ^{m+1}),   i = 0, 1, ..., m.                             (4.44)

In queueing stations with finite buffer capacity, arriving jobs can be lost. Due to the
PASTA property the probability that an arriving job will be lost equals p_m. Therefore,
the throughput of such a queueing station is not automatically equal to the job arrival
rate. For the case considered here, we can derive that the throughput X only equals λ
(the arrival rate) whenever the queue is not yet completely filled: whenever the state upon
arrival of a new packet is not equal to m, so that we have X = λ(1 - p_m). On the other
hand, as long as the queue is not empty, which is the case with probability 1 - p_0, the
server serves jobs with rate μ. Therefore, the throughput is also equal to X = μ(1 - p_0).
The utilisation now equals XE[S] or 1 - p_0.

Example 4.3. The throughput of the M|M|1|5 queue.

In an M|M|1|5 queue, the probability that an arriving job will be lost equals p_5. We
therefore have X = λ(1 - p_5), but also X = μ(1 - p_0), so that:

X = μ(1 - p_0) = λ(1 - p_5) = λ (1 - ρ^5)/(1 - ρ^6).                         (4.45)

Example 4.4. Losses and overflow.

Due to the fact that the M|M|1|m queue has finite buffering capacity, losses occur. The probability
that an arriving customer is lost is expressed by p_m. In Table 4.1 we show p_m as a function
of m, for three values of ρ = λ/μ. As can be observed, the loss probability remains
substantial for higher utilisations, even when the buffer capacity becomes larger. □

  m     ρ = 0.2    ρ = 0.6    ρ = 0.9
  -------------------------------------
  0     1.0000     1.0000     1.0000
  1     0.1667     0.3750     0.4737
  2     0.0322     0.1837     0.2989
  3     0.0064     0.0993     0.2120
  4     0.0013     0.0562     0.1602
  5     0.0003     0.0326     0.1260
  6     0.0000     0.0192     0.1019
  7     0.0000     0.0114     0.0840
  8     0.0000     0.0068     0.0703
  9     0.0000     0.0041     0.0595
  10    0.0000     0.0024     0.0508

Table 4.1: Blocking probabilities p_m in the M|M|1|m queue
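The entries of Table 4.1 follow directly from (4.44). The sketch below (our own, in Python;
the function name is an assumption) reproduces, for example, the column for ρ = 0.6.

    def mm1m_blocking(rho, m):
        """Loss probability p_m in the M|M|1|m queue, cf. (4.44)."""
        if rho == 1.0:                      # all m+1 states are then equally likely
            return 1.0 / (m + 1)
        return rho ** m * (1.0 - rho) / (1.0 - rho ** (m + 1))

    print([round(mm1m_blocking(0.6, m), 4) for m in range(11)])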

Figure 4.8: State transition diagram for the M|M|m|m model

4.9 The M|M|m|m multi-server queue without buffer

Finite buffers and multi-server behaviour can also be combined. A particularly important
class of models arises where the number of servers equals the number of jobs that can be
dealt with; in principle no buffering takes place in such models. Whenever an arriving job
does not find a free server, it is lost. This typically occurs in (older) telephone switches
where the number of outgoing lines equals the maximum number of customers that can be
coped with. In case all m lines are busy, no further queueing can occur and the request
is not accepted, nor is it queued (the user simply hears the busy tone). In Figure 4.8 we
show the corresponding state transition diagram.

We can see this model again as a special case of the general birth-death model, now
with the following parameters:

λ_i = λ,   i = 0, 1, ..., m-1,   and   μ_i = iμ,   i = 1, 2, ..., m.         (4.46)



Solving (4.8) and setting again ρ = λ/μ, we obtain

p_i = p_0 ρ^i/i!,   i = 0, 1, ..., m,                                        (4.47)

where p_0 follows from the normalisation:

p_0 = (Σ_{i=0}^m ρ^i/i!)^{-1}.                                               (4.48)

The probability p_m signifies the probability that all servers are in use. Due to the PASTA
property, this probability equals the long-term probability that an arriving packet is lost.
The formula for p_m was first established by Erlang in 1917 and is therefore often referred
to as Erlang's loss formula or Erlang's B formula, and denoted as B(m, λ/μ) = B(m, ρ):

p_m = B(m, ρ) = (ρ^m/m!) / (Σ_{i=0}^m ρ^i/i!).                               (4.49)
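In practice, Erlang's B formula is usually evaluated with the numerically stable recursion
B(0, ρ) = 1, B(k, ρ) = ρB(k-1, ρ)/(k + ρB(k-1, ρ)), which is mathematically equivalent to
(4.49) but avoids large factorials. The sketch below (our own, in Python; the recursion
itself is standard but not given in the text, and the example numbers are ours):

    def erlang_b(m, rho):
        """Blocking probability B(m, rho) of the M|M|m|m loss system, cf. (4.49)."""
        b = 1.0                                    # B(0, rho) = 1
        for k in range(1, m + 1):
            b = rho * b / (k + rho * b)            # stable recursion step
        return b

    print(erlang_b(10, 8.0))   # 10 lines offered 8 Erlang: about 0.122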

4.10 The M|M|1||K queue or the terminal model


We finally address a queueing station in an environment where the total number of jobs is
limited. This occurs in numerous situations, e.g., in a computer network system in which
a finite set of users may all issue one request at a time to a network file server. Only after
an answer has been received from the file server may a new request be issued by each user.
Thus, there can never be more jobs (modelling requests) in the queueing station (modelling
the server) than there are users.

We can interpret the M|M|1||K queueing model as follows. There are K system users
sitting behind their terminals. These users issue requests for service after an exponentially
distributed think time Z with mean E[Z] = 1/λ. The more users there are in the thinking
state, the higher the effective completion rate of thinkers is, that is, the effective arrival
rate at the system is proportional to the number of thinkers. Stated differently, the users
can be modelled as an infinite server, where each user has its own server. Once jobs have
been submitted, the users wait for the answer. The system completes jobs with mean time
E[S] = 1/μ.

To formalise this model, we proceed as follows. When there are i jobs in the queueing
station, there will be K - i potential other jobs. We think of these jobs as being part
of some "environment" of the queueing station. Furthermore, the arrival rate of jobs is
proportional to the number of jobs in this environment, i.e., proportional to K - i. This
means that we have:

λ_i = λ(K - i),   i = 0, ..., K-1,   and   μ_i = μ,   i = 1, ..., K.         (4.50)



Figure 4.9: State transition diagram for the M|M|1||K model

Figure 4.10: Simple terminal model (K users at their terminals issuing requests to the computer system)

Notice that this queueing station can never be overloaded. When all the jobs are in the
queueing station, no new jobs will arrive, so the queue will not grow infinitely large. In
Figure 4.9 the state transition diagram is depicted. Again, we can use (4.8) to obtain the
following expression for p_i:

p_i = p_0 ∏_{k=0}^{i-1} λ(K-k)/μ = p_0 ρ^i ∏_{k=0}^{i-1} (K-k),   i = 0, 1, ..., K.   (4.51)

Using the normalisation equation we find that

p_0 = (Σ_{i=0}^K ρ^i ∏_{k=0}^{i-1} (K-k))^{-1} = (Σ_{i=0}^K ρ^i K!/(K-i)!)^{-1}.   (4.52)

In Figure 4.10 we sketch this so-called terminal model. It is an example of a closed
queueing network with a population of K customers circling between the terminals and the
processing system; note the special representation we use to signify infinite-server nodes.
One can think of closed queueing models as models in which a departing customer at one
station immediately results in the arrival of that customer at one of the other queueing
stations in the model. We will discuss queueing network models at length in Chapters 10
through 13.

4.11 Mean values for the terminal model


Using the birth-death model developed in Section 4.10 we can compute the steady-state
probabilities p and from those we can compute average performance measures such as
E[N] in a similar way as before. However, if we are only interested in average
performance values and not in the precise queue-length distribution p, there is a simpler
method available. This method, known as mean-value analysis, can be applied in many
cases for more general queueing networks; we will come back to it in Chapters 11 through
13. Here we develop the method for this special case only.
Let us first address the average response times at the terminals (E[R_t]) and at the
system (E[R_s(K)]). In an infinite-server queueing station, every job has its own server, so
no queueing or waiting occurs. Therefore E[R_t] = E[Z], independently of the number of
customers K actually present. For the processing system, however, the average response
time depends on the number of customers K. To compute E[R_s(K)], we first have to
introduce the average cycle time as E[C(K)] = E[R_t] + E[R_s(K)] = E[Z] + E[R_s(K)].
E[C(K)] expresses the mean time it takes for a customer to go once through the cycle
"think-serve". The throughput X(K) (again note the dependence on K) can now be
expressed as K/E[C(K)], that is, as the product of the frequency with which jobs cycle
and the number of jobs (notice that X(K) is not equal to 1/E[Z]). Combining the above
two results, we obtain

X(K) = K/E[C(K)] = K/(E[Z] + E[R_s(K)]),                                     (4.53)

from which we derive the response time law:

E[R_s(K)] = K/X(K) - E[Z].                                                   (4.54)

Using Little's law for the system, we have E[N_s(K)] = X(K)E[R_s(K)]. Using Little's law
for the terminals, we have E[N_t(K)] = X(K)E[R_t]. Using X(K) = K/E[C(K)] in the
two instances of Little's law, we can eliminate X(K), as follows:

E[N_s(K)] = (E[R_s(K)]/E[C(K)]) K   and   E[N_t(K)] = (E[Z]/E[C(K)]) K.      (4.55)

As can be observed, the K customers spread themselves over the two nodes in the model
with ratios proportional to the average time spent at the nodes divided by the time spent
on an average cycle.
Although the above equations give some insight, they still do not yield us answers. For
that purpose we need to know the throughput X(K), which equals the product of the server-busy
probability and the service rate of the system: X(K) = (1 - p_0)μ = (1 - p_0)/E[S].

We can evaluate this throughput by using the result of Section 4.10, thereby again noting
that if we change K, p_0 will change as well (we therefore write p_0 as a function of K below).
In summary, we have

E[R_s(K)] = K/X(K) - E[Z] = K E[S]/(1 - p_0(K)) - E[Z].                      (4.56)

Instead of computing p_0(K) explicitly, let us first address two asymptotic results. For large
values of K, the idle fraction is very small, so that the denominator 1 - p_0(K) will approach
1. For large K we therefore have E[R_s(K)] ≈ K E[S] - E[Z]. For K = 1, the server-busy
probability (ρ(1), the utilisation in case of 1 job) simply equals E[S]/(E[S] + E[Z]) (the
average time the job spends in the server divided by the time for an average cycle). This
is due to the fact that E[R_s(1)] = E[S] since queueing will not occur.

The above two limiting cases can be regarded as asymptotes for the actual curve of
E[R_s(K)]. Their crossing point, that is, the value K* such that E[S] = K*E[S] - E[Z],
is called the saturation point and is computed as follows:

E[R_s(1)] = E[R_s(∞)]  ⟹  K* = (E[S] + E[Z])/E[S].                           (4.57)

Notice that K* = 1/ρ(1). When the think and service times would have been constants
rather than random variables, K* would have been the maximum number of users that
could have been served before any queueing would occur in the system. Having more
than K* customers present would for sure imply that at some place in the model queueing
would occur. Since the involved service times are not constants but random variables, the
actual values for the response time at the system (E[R_s(K)]) are of course larger than the
asymptotes, and queueing will already occur for values of K smaller than K*. In Figure 4.11
E[R_s(K)] is depicted as a function of K for the parameter values of the example to be
discussed below. The asymptotes are also indicated.
For the precise calculation of E[R_s(K)] we still need to know the value of p_0(K).
Although we can compute this value by using the summation as in Section 4.10, there
is a smarter way to go. We can calculate E[R_s(K)] recursively from E[R_s(K-1)]. To
understand this, we need a result, which we will discuss in more detail in Chapters 11 and
12, known as the arrival theorem, which was proven in the late 1970s.

Theorem 4.2. Arrival theorem.

A customer in a closed queueing network arriving at a queue will see this
queue in equilibrium (with average filling), however, for the case in which there
is one customer less in the queueing network. □

According to this theorem, the average response time at the system can be expressed as
follows:

E[R_s(K)] = E[N_s(K-1)]E[S] + E[S].                                          (4.58)

The first term represents the average waiting time because of jobs already queued (or in
service) upon arrival, whereas the second term is the average service time of the job just
arriving. Now, using (4.55) we can write

E[N_s(K-1)] = E[R_s(K-1)]/(E[R_s(K-1)] + E[Z]) · (K-1),                      (4.59)

so that

E[R_s(K)] = ( E[R_s(K-1)]/(E[R_s(K-1)] + E[Z]) · (K-1) ) E[S] + E[S].        (4.60)

To begin this recursion, we use E[R_s(1)] = E[S].
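The recursion (4.58)-(4.60) translates directly into a few lines of code. The sketch below
(our own, in Python; the function name is an assumption) reproduces the numbers of
Example 4.5 and Table 4.2 below.

    def terminal_model_mva(ES, EZ, K_max):
        """Mean-value analysis of the terminal model: for each population
        K = 1..K_max return E[R_s(K)], rho(K) and X(K), cf. (4.58)-(4.60)."""
        results = []
        ERs = ES                                    # E[R_s(1)] = E[S]
        for K in range(1, K_max + 1):
            if K > 1:
                ENs = ERs * (K - 1) / (ERs + EZ)    # E[N_s(K-1)], cf. (4.59)
                ERs = ENs * ES + ES                 # arrival theorem, cf. (4.58)
            X = K / (ERs + EZ)                      # throughput, cf. (4.53)
            results.append((K, ERs, X * ES, X))     # rho(K) = X(K) E[S]
        return results

    for K, ERs, rho, X in terminal_model_mva(2.0, 10.0, 12):
        print(K, round(ERs, 2), round(rho, 3), round(X, 3))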

Example 4.5. An MVA of the terminal model.

Consider a terminal model as described throughout this section in which E[S] = 2 and
E[Z] = 10. The saturation point can easily be calculated as

K* = (E[S] + E[Z])/E[S] = (2 + 10)/2 = 6.                                    (4.61)

The asymptotes of E[R_s(K)] are respectively given as E[R_s(K = 1)] = E[S] = 2 and
E[R_s(K → ∞)] ≈ K E[S] - E[Z] = 2K - 10. Using the MVA recursion we can now
compute E[R_s(K)], for K = 1, ..., 12. Notice that we can also express ρ(K) and X(K)
directly in terms of E[R_s(K)] as follows:

ρ(K) = K E[S]/(E[R_s(K)] + E[Z])   and   X(K) = K/(E[R_s(K)] + E[Z]).        (4.62)

In Table 4.2 we present the values for E[R_s(K)], ρ(K) and X(K). In Figure 4.11 we show
the average response time curve and its asymptotes (lower bounds). In Figure 4.12 we
show the throughput curve, again with its asymptotes (upper bounds). Note that in both
cases the asymptotes are very easy to compute and that their crossing points lie in both
cases at K*. It is clearly visible that adding more customers (adding more terminal users)
does increase the response times; however, it does not significantly increase the throughput
after a particular point. □

Figure 4.11: E[R_s(K)] and its two lower bounds as a function of K

Figure 4.12: X(K) and its two upper bounds as a function of K



  K   E[R_s(K)]   ρ(K)    X(K)  |   K   E[R_s(K)]   ρ(K)    X(K)
  ----------------------------------------------------------------
  1      2.00     0.167   0.084 |   7      5.92     0.880   0.440
  2      2.33     0.324   0.162 |   8      7.21     0.930   0.465
  3      2.76     0.470   0.235 |   9      8.70     0.963   0.482
  4      3.30     0.602   0.301 |  10     10.37     0.982   0.492
  5      3.98     0.715   0.358 |  11     12.18     0.992   0.496
  6      4.85     0.808   0.404 |  12     14.08     0.997   0.497

Table 4.2: E[R_s(K)], ρ(K) and X(K) for K = 1, ..., 12

4.12 Further reading


Further information on birth-death processes and their application to simple queueing
models can be found in many performance evaluation books. Early work related to birth-
death processes and blocking probabilities is due to Erlang [84]. Background information
on the existence of local balance equations and their relation to the global balance equations
can be found in the books by Kelly [153] and Van Dijk [74]. The PASTA property has been
described by Wolff [291]. The arrival theorem and the associated mean-value analysis will
be discussed in more detail in Chapters 11 through 12; seminal papers in this area have
been published by Reiser and Lavenberg [245, 243].

4.13 Exercises
4.1. Calculation of B(k).
Compute the minimum number of jobs k such that for an M|M|1 queue with ρ = 0.8 the
value B(k) < 10⁻⁶.

4.2. Waiting time distribution.

The waiting time distribution in the M|M|1 queue can be derived in a similar way as the
response time distribution. First notice that with probability p_0 the waiting time equals 0
and that with probability p_k (k = 1, 2, ...) the waiting time has an Erlang-k distribution.
Show that

F_W(t) = Pr{W ≤ t} = p_0 + Σ_{k=1}^∞ p_k F_{E_k}(t) = 1 - ρe^{-(μ-λ)t} = 1 - ρe^{-μ(1-ρ)t}.

4.3. Variance in the M|M|1 queue.

Show that the variance of the number of customers in an M|M|1 queue indeed corresponds
to (4.18).

4.4. Multi-server queueing stations.

For the multi-server M|M|m queue introduced in Section 4.5 show that:

1. the solutions for p_i reduce to those for the M|M|1 queue when m = 1,

2. the expressions for p_i do indeed fulfil the global balance equations,

3. E[N] can indeed be computed using (4.27).

4.5. Comparing average response times when K = 2.


Show that inequality (4.36) holds for K = 2.

4.6. The M|M|1 queue with server breakdowns.

Consider an M|M|1 queue in which the server has an exponentially distributed life-time
with mean 1/ζ. Once failed, the server is repaired; such a repair takes an exponentially
distributed amount of time, with mean 1/r. Customer arrivals form a Poisson process with
rate λ and services last a negative exponentially distributed time with mean 1/μ. For the
time being, assume that the arrival process is stopped as soon as the server breaks down.

1. Draw the state-transition diagram for this extended M|M|1 queue.

2. Derive the global balance equations for this CTMC.

3. Derive a formula for E[N] (be inspired by the normal M|M|1 queue).

4. Derive a formula for σ²_N.

5. Now assume that arrivals continue to occur, even if the server has broken down.
How does this change the state-transition diagram and the global balance equations?
What is the stability condition in this situation?

4.7. Capacity and arrival rate increase.

How does the average response time in an M|M|1 queue change when both the arrival rate
λ and the service rate μ are increased by a multiplicative factor α, i.e., when ρ remains the
same?

4.8. Erlang's B formula.

Calculate B(m, λ/μ) for m = 5, 10, 15, 25 and 100 and λ/μ = 0.1, ..., 0.9.

4.9. Multi-server queues.


Prove (4.26) and (4.27) using (4.24) and (4.25).

4.10. MVA and the terminal model.

Consider a terminal model with K customers, E[Z] = 15 and E[S] = 3.

1. Compute the saturation point K*.

2. Compute the asymptotes for the system response time E[R_s(K)] and draw them in
a graph.

3. Compute the asymptotes for the system throughput X(K) and draw them in a graph.

4. Compute the exact values of X(K) and E[R_s(K)], for K = 1, ..., 12, using MVA.

5. Propose an MVA-based scheme to compute p_0(K) from ρ(K). Do not directly com-
pute p_0(K) via (4.52)!

Chapter 5

M|G|1-FCFS queueing models

In the previous chapter we have discussed a number of Markovian queueing models and
shown various applications of them. In practice, however, there are systems for which
the negative exponential service times that were assumed in these models are not realistic.
There exist, however, also single-server models that require less strict assumptions regarding
the used service time distributions. Examples are the M|G|1 model, the G|G|1 model and
the G|PH|1 model. The analysis of these models is more complicated than that of the
simple birth-death models encountered in Chapter 4.

In this chapter we focus on the M|G|1 queueing model. This model is rather generally
applicable in environments where multiple users (a large population of potential customers)
are using a scarce resource, such as a transmission line or a central server, for generally
distributed periods of time.
This chapter is organised as follows. In Section 5.1 we present the well-known results
for various mean performance measures of interest for the M|G|1 queue. We pay special
attention to the impact of the general service time distribution. The M|G|1 result can
be proven in an intuitive fashion; we do so in Section 5.2. A rigorous proof based on an
embedded Markov chain is then presented in Section 5.3. In Section 5.4 we discuss an
extension of the M|G|1 model in which batches of jobs arrive simultaneously. Finally, in
Section 5.5, we discuss M|G|1 queueing models with server breakdowns.

5.1 The M|G|1 result


Consider a single-server queueing station with unlimited buffering capacity and an unlimited
customer population. Jobs arriving at the queueing station form a Poisson process with
rate λ. The service requirement of a job is a random variable S, distributed according

to the distribution function B(s), i.e., B(s) = Pr{S ≤ s}. S has expectation E[S] (first
moment) and second moment E[S²].

Again we use the notation E[N] for the average number of jobs in the queueing system,
E[N_q] for the average number of customers in the queue, and E[N_s] for the average number
of customers in the server. Applying Little's law for the server alone we have ρ = E[N_s] =
λE[S], and we assume ρ < 1 for stability. The derivation of E[N_q] is somewhat more
complicated. At this stage we will only present and discuss the result. Proofs will be
postponed to later sections.

For the average number of jobs in the queue of an M|G|1 queueing station the following
expression has been derived:

E[N_q] = λ²E[S²]/(2(1-ρ)).                                                   (5.1)

Applying Little's law (E[W] = E[N_q]/λ), we obtain

E[W] = λE[S²]/(2(1-ρ)).                                                      (5.2)

These two equations only address the queueing part of the overall queueing station. By
including the service, we arrive at the following expressions:

E[N] = λE[S] + λ²E[S²]/(2(1-ρ)),                                             (5.3)

E[R] = E[S] + λE[S²]/(2(1-ρ)).                                               (5.4)

The M|G|1 result is presented mostly in one of the four forms above. The form (5.3) is
often referred to as the Pollaczek-Khintchine (or PK-) formula. Let us discuss this equation
in more detail now.
Looking at the PK-formula we observe that E[N] depends on the first and second
moment of the service time distribution. What does this imply? From the first two
moments of a distribution its variance can be obtained as σ²_S = E[(S - E[S])²] = E[S²] -
E[S]². We thus see that a higher variance implies a higher average number of jobs in
the system. From a queueing point of view, exhibiting no variance in the service times
(E[S²] = E[S]²) is optimal. This is a very general observation: the more variability exists
in the system, the worse the performance. With worse performance we of course mean
longer queues, longer waiting times, etc.

Example 5.1. Influence of variance.

Consider two almost equivalent queueing stations. In the first one the service requirement
is 1, deterministically, i.e., E[S_1] = 1 and E[S_1²] = 1, so that var[S_1] = 0. In the second,
the service requirement is also 1 on average, but the actual values are either 0.5 or 1.5
(with probability 0.5 each), i.e., E[S_2] = 1 and E[S_2²] = 0.5(0.5)² + 0.5(1.5)² = 1.25, so that
var[S_2] = 0.25. Consequently, in the second system, the average waiting times will be 25%
higher than in the first system, although the average work requirements are the same! □

Example 5.2. Infinite variance.

Consider a queueing station at which jobs arrive as a Poisson process with rate λ = 0.4.
The service requirement S has a probability density function f_S(s) = 2/s³, whenever s ≥ 1,
and f_S(s) = 0, whenever s < 1. Calculating the average service time, we obtain

E[S] = ∫_0^∞ s f_S(s) ds = ∫_1^∞ 2s⁻² ds = (-2s⁻¹)|_1^∞ = 2 (seconds).

Clearly, since ρ = λE[S] = 0.8 < 1 the queueing station is stable. However, when calcu-
lating the second moment of the service time distribution, we obtain

E[S²] = ∫_0^∞ s² f_S(s) ds = ∫_1^∞ 2s⁻¹ ds = (2 ln s)|_1^∞ = ∞.

Application of the PK formula thus reveals that E[N] = ∞, even though the queue is not
overloaded! □

It is important to note that not the total M|G|1 queueing or waiting time behaviour
is given by the first two moments of the service time distribution, but only the averages.
The effect that the performance becomes worse when the variance of the service time
distribution increases shows up nicely when we use the squared coefficient of variation
in our formulae. The squared coefficient of variation of a stochastic variable X, that is,
C²_X = σ²_X/E[X]², expresses the variance of a random variable relative to its (squared)
mean. Using this notation the PK-formula can be rewritten as:

E[N] = λE[S] + (λE[S])²(1 + C²_S)/(2(1-ρ)) = ρ + ρ²(1 + C²_S)/(2(1-ρ)).      (5.5)

We observe that E[N] increases linearly with C²_S. For E[W] we obtain a similar equation:

E[W] = λE[S]²(1 + C²_S)/(2(1-ρ)) = ρE[S](1 + C²_S)/(2(1-ρ)).                 (5.6)
Before we end this section with two examples, we make a few remarks about the applica-
bility of the PK-formula:

1. The arrival and service processes must be independent of each other.

2. The server should be work conserving, meaning that the server may never be idle
whenever there are jobs to be served.

3. The scheduling discipline is not allowed to base job scheduling on a priori knowledge
about the service times, e.g., shortest-job-next scheduling is not allowed. Disciplines
that are allowed are, e.g., FCFS or LCFS.

4. The scheduling discipline should be non-preemptive, i.e., jobs being served may not
be interrupted.

Example 5.3. A simple communication channel.

Consider a buffered 10 kbps (kilobit per second) communication channel, over which
two types of packets have to be transmitted. The overall stream of packets constitutes
a Poisson process with arrival rate λ = 40 packets per second (p/s). Of the arriving
packets, a fraction α_1 = 0.2 is short; the remaining packets, a fraction α_2 = 0.8, are long.
Short packets have an average length E[S_1] = 10 bits, whereas long packets are on average
E[S_2] = 200 bits. Both packet lengths are exponentially distributed. We are interested
in the average waiting time for and the average number of packets in the communication
channel.

This system can be modelled as an M|G|1 queueing station at which packets arrive
as a Poisson stream with intensity 40 p/s. The fact that we have two types of packets
which are of different lengths can be coped with by choosing an appropriate service time
distribution. The appropriate distribution in this case is the hyperexponential distribution
with 2 stages (see also Appendix A). The hyperexponential density function with r stages
has the following form: f(x) = Σ_{i=1}^r α_i μ_i e^{-μ_i x}, where 1/μ_i is the time it takes to transmit
a packet of class i, and Σ_{i=1}^r α_i = 1. The average value then equals Σ_{i=1}^r α_i/μ_i, the
second moment 2 Σ_{i=1}^r α_i/μ_i², and the variance 2 Σ_{i=1}^r α_i/μ_i² - (Σ_{i=1}^r α_i/μ_i)².

Let us first calculate the μ_i by taking into account the packet sizes and the channel
transmission speed: μ_1 = 10 kbps / 10 bits/packet = 1000 p/s. In a similar way we obtain
μ_2 = 10⁴/200 = 50 p/s. For the utilisation we find ρ = Σ_{i=1}^2 λα_i/μ_i = 0.648. Applying
the formulae above, we obtain E[S²] = 6.404 × 10⁻⁴ sec², so that the service time variance
equals σ²_S = E[S²] - E[S]² ≈ 3.78 × 10⁻⁴ sec². Substituting these results in the M|G|1
result for E[W] and E[N] we obtain:

E[W] = 40 · 6.404 × 10⁻⁴ / (2(1 - 0.648)) = 36.39 msec.                      (5.7)

Applying Little's law we obtain E[N] = ρ + λE[W] = 2.1034 packets.

If we would have modelled this system as an M|M|1 queue with λ = 40 and E[S] =
0.0162, we would have obtained E[W] = 29.82 msec and E[N] = 1.841 packets. Here we
clearly see the importance of using the correct model. □
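The numbers in Example 5.3 are easy to check numerically. The sketch below (our own, in
Python; the function name and variable names are assumptions) evaluates (5.2)-(5.3) for
the two-class channel.

    def mg1_waiting(lam, ES, ES2):
        """Mean waiting time and mean number in station for the M|G|1 queue,
        from the Pollaczek-Khintchine formulae (5.2) and (5.3)."""
        rho = lam * ES
        EW = lam * ES2 / (2.0 * (1.0 - rho))
        EN = rho + lam * EW
        return EW, EN

    # the channel of Example 5.3: two packet classes with exponential lengths
    lam, alpha, mu = 40.0, (0.2, 0.8), (1000.0, 50.0)
    ES = sum(a / m for a, m in zip(alpha, mu))               # 0.0162 s
    ES2 = sum(2.0 * a / m ** 2 for a, m in zip(alpha, mu))   # 6.404e-4 s^2
    print(mg1_waiting(lam, ES, ES2))   # E[W] about 0.0364 s, E[N] about 2.10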

Example 5.4. Comparing M|M|1, M|E_2|1, M|H_2|1, and M|D|1.

We now proceed with comparing four queueing stations: an M|M|1, an M|E_2|1, an M|H_2|1
and an M|D|1. These models only differ by their service requirement distribution. We
assume that the mean value of the distributions is the same and equal to 1. In the M|H_2|1
case we assume that α_1 = α_2 = 0.5 and that μ_1 = 2.0 and μ_2 = 2/3. The performance
metric we will use as a comparison is the average number of jobs in the queueing station.

Let us first derive the squared coefficients of variation of the four service time distributions
involved. For the deterministic distribution we have C²_D = 0. For the exponential dis-
tribution C²_M = 1, for the 2-stage Erlang distribution C²_{E_2} = 1/2 and for the 2-stage
hyperexponential distribution we have C²_{H_2} = 1.5. Substituting this in (5.6) we obtain:

E[W_M]     = ρE[S](1 + 1)/(2(1-ρ))   = ρE[S]/(1-ρ),
E[W_D]     = ρE[S](1 + 0)/(2(1-ρ))   = ρE[S]/(2(1-ρ)) = (1/2) E[W_M],
E[W_{E_2}] = ρE[S](1 + 0.5)/(2(1-ρ)) = (3/4) E[W_M],
E[W_{H_2}] = ρE[S](1 + 1.5)/(2(1-ρ)) = (5/4) E[W_M].

We observe that in the case of deterministic service times, the waiting times reduce by 50% in
comparison to the exponential case. A reduction to 75% can be observed for the Erlang-
2 case, whereas an increase to 125% can be observed for the hyperexponential case. In
Figure 5.1 we show the curves for the average waiting times E[W] (for the deterministic,
the hyperexponential and the exponential service time distribution) as a function of the
utilisation ρ. □
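The comparison of Example 5.4 can be reproduced with a few lines of code. The sketch
below (our own, in Python; the function name and the chosen utilisation values are
assumptions) evaluates the PK formula in the form (5.6).

    def mg1_EW(rho, ES, CS2):
        """Mean waiting time from the PK formula in the form (5.6)."""
        return rho * ES * (1.0 + CS2) / (2.0 * (1.0 - rho))

    # Example 5.4: E[S] = 1, squared coefficients of variation 0, 0.5, 1 and 1.5
    for name, cs2 in [("D", 0.0), ("E2", 0.5), ("M", 1.0), ("H2", 1.5)]:
        print(name, [round(mg1_EW(rho, 1.0, cs2), 2) for rho in (0.5, 0.8, 0.9)])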

5.2 An intuitive proof of the M|G|1 result

In this section we will give an intuitive proof of the M|G|1 result as discussed in the previous
section. For this intuitive proof we will use the PASTA property discussed in Section 4.3.

Figure 5.1: Comparison of the M|H_2|1, the M|M|1 and the M|D|1 queueing systems (top
to bottom): the average waiting time E[W] as a function of the utilisation ρ

Furthermore, we will need information about the so-called residual lifetime of a stochastic
variable. We will discuss this issue in Section 5.2.1 before we prove the M|G|1 result in
Section 5.2.2.

5.2.1 Residual lifetime

Consider a Poisson arrival process. The time between successive arrivals is exponentially
distributed. From the memoryless property that holds for these exponentially distributed
interarrival times we know that when we observe this Poisson process at any point in time,
the time from that observation until the next arrival is again exponentially distributed.
This is in fact a very peculiar property which, as we will see later, has some interesting
implications which have puzzled probability engineers for a long time.

Let us now try to derive the residual lifetime distribution of a more general renewal
process. Assume that we deal with a renewal process where the interevent times X are
distributed with (positive) density function f_X(x). If we observe this process at some
random time instance, we denote the time until the next event as Y, the so-called residual
lifetime or the forward recurrence time. The time from the observation instance back to the
previous event is denoted T and called the backward recurrence time. We denote the length
of the interval in which the observation takes place, i.e., the "intercepted" interval, with a

Figure 5.2: Stochastic variables involved in the derivation of the residual lifetime: X
denotes the interevent time, Z the intercepted interval length, T the backward recurrence
time and Y the forward recurrence time

stochastic variable Z. Note that Z does not have the same distribution as X, although one
is tempted to think so at first sight. Since longer intervals have a higher probability of being
intercepted, a shift of probability mass from lower to higher values can be observed when
comparing the density functions of X and Z. In Figure 5.2 we show the various stochastic
variables.

It is reasonable to assume that the probability that the random observation falls in
an interval with length z is proportional to the length z and to the relative occurrence
probability of such an interval, which equals f_X(z)dz. We thus have that f_Z(z)dz =
Czf_X(z)dz, where C is a constant which assures that f_Z(z) is indeed a proper density
function. Taking the integral from 0 to infinity should yield 1, i.e.,

∫_0^∞ f_Z(z)dz = ∫_0^∞ Czf_X(z)dz = CE[X] = 1,                               (5.8)

so that we must conclude that C = 1/E[X] and

f_Z(z) = zf_X(z)/E[X].                                                       (5.9)

Now that we have an expression for f_Z(z), we will derive an expression for f_{Y|Z}(y|z), the
probability density of Y, given a particular Z. Together with f_Z(z) we can then derive
f_Y(y) by unconditioning.

Assume that we have "intercepted" an interval with length z. Given such an interval,
the only reasonable assumption we can make is that the actual random observation point
occurs in this interval according to a uniform distribution. Consequently, we have

f_{Y|Z}(y|z) = 1/z for 0 < y ≤ z, and 0 elsewhere.                           (5.10)

Now, applying the law of conditional probability, we obtain:

f_{Y,Z}(y, z) = f_Z(z) f_{Y|Z}(y|z) = (zf_X(z)/E[X]) · (1/z) = f_X(z)/E[X],   0 < y ≤ z < ∞.   (5.11)

In order to obtain f_Y(y) we now have to integrate over all possible z:

f_Y(y) = ∫_y^∞ f_X(z)/E[X] dz = (1 - F_X(y))/E[X].                           (5.12)

We have thus obtained the density function of Y, the forward recurrence time. A similar
expression can, by similar arguments, be derived for the backward recurrence time T.
Applying this result now for deriving E[Y] we obtain:

E[Y] = ∫_0^∞ y f_Y(y) dy = (1/E[X]) ∫_0^∞ y(1 - F_X(y)) dy

     = (1/E[X]) ( (½y²(1 - F_X(y)))|_0^∞ + ∫_0^∞ ½y² f_X(y) dy )

     = (1/(2E[X])) ∫_0^∞ y² f_X(y) dy = E[X²]/(2E[X]),                       (5.13)

by using partial integration (∫uv' = uv - ∫u'v). In the same way, we have

E[T] = E[X²]/(2E[X]).                                                        (5.14)

It is important to observe that the expected forward/backward recurrence time is not equal
to half the expected lifetime!
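
The inspection paradox implied by these formulas is easy to check numerically. The following
small Python sketch (not part of the original text; the Erlang-2 interevent-time distribution
and all numerical values are arbitrary choices for illustration) estimates E[Y] by observing a
simulated renewal process at random time instants and compares the outcome with E[X^2]/(2E[X]).

    import bisect
    import random

    def mean_forward_recurrence(sample_interevent, n_intervals=100_000,
                                n_observations=100_000, seed=1):
        """Monte Carlo estimate of E[Y], the mean forward recurrence time of a
        renewal process whose interevent times are drawn by sample_interevent."""
        rng = random.Random(seed)
        # Build one long sample path: renewal epochs t_1 < t_2 < ... < t_n.
        epochs, t = [], 0.0
        for _ in range(n_intervals):
            t += sample_interevent(rng)
            epochs.append(t)
        total = epochs[-1]
        resid = 0.0
        for _ in range(n_observations):
            u = rng.random() * total            # random observation instant
            i = bisect.bisect_right(epochs, u)  # index of the next renewal
            resid += epochs[i] - u              # forward recurrence time Y
        return resid / n_observations

    if __name__ == "__main__":
        # Erlang-2 interevent times with mean 1: E[X] = 1, E[X^2] = 1.5, so
        # (5.13) predicts E[Y] = E[X^2]/(2 E[X]) = 0.75, and not E[X]/2 = 0.5.
        erlang2 = lambda rng: rng.expovariate(2.0) + rng.expovariate(2.0)
        print(mean_forward_recurrence(erlang2))  # typically close to 0.75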

Example 5.5. Exponentially distributed periods X.


Let us now apply these results to the case where X is exponentially distributed with rate λ.
Calculating

f_Y(y) = \frac{1 - F_X(y)}{E[X]} = \frac{1 - (1 - e^{-\lambda y})}{1/\lambda} = \lambda e^{-\lambda y} = f_X(y),    (5.15)

reveals that the residual lifetime is distributed similarly to the overall lifetime. This is
exactly the memoryless property of the exponential distribution. A similar derivation can
be made for the backward recurrence time. The expected residual lifetime equals

E[Y] = \frac{E[X^2]}{2E[X]} = \frac{2/\lambda^2}{2/\lambda} = 1/\lambda.    (5.16)

Similarly, we have E[T] = 1/λ. Now, since E[Z] = E[T] + E[Y] = 2/λ, we see that
E[Z] ≠ E[X]. This inequality is also known as the waiting time paradox. It can be
understood by imagining that long intervals X have a higher probability to be "hit" by a
random observer. This implies that in Z the longer intervals from X are more strongly
represented.    □

Example 5.6. Deterministic periods X.


In case the random intervals X are not really random but have deterministic length, we
have E[X^2] = E[X]^2. In that case, the forward recurrence time equals E[X^2]/(2E[X]),
which reduces to E[X]/2. We find that the, maybe intuitively appealing, value for the
forward recurrence time, namely half the interevent time, is only correct when the time
periods themselves are of deterministic length.    □

Example 5.7. Waiting on a bus.


Consider the case where buses arrive at a bus stop according to the schedule at x.00, x.20
and x.40, i.e., three times an hour at equidistant points in time. If one goes to the
bus stop at random time instants, one is tempted to think that one has to wait
on average 10 minutes for the next bus to come. This is, however, not correct. Although
the "inter-bus" times B are planned to be exactly 20 minutes, in practice they are not;
hence, the variance in the inter-bus times is positive, and so E[B^2] > E[B]^2. The correct
expected waiting time for the bus is E[B^2]/(2E[B]), which is larger than E[B]/2. Only when the
"inter-bus" times are deterministic is the expected waiting time 10 minutes.    □

5.2.2 Intuitive proof


Combining the results of the previous section with the PASTA property, we are in a position
to intuitively prove the PK-result. We are interested in the average response time E[R] a
job perceives in an M|G|1 queueing station. Jobs arrive as a Poisson stream with rate λ.
The service time per job is a random variable S with first and second moment E[S] and

E[S^2] respectively. For the derivation, which is similar to the derivation for the M|M|1
queue presented in Chapter 2, we assume an FCFS scheduling discipline.
At the moment a new job arrives, it will find, due to the PASTA property, E[N] jobs
already in the system. Of these jobs, on average ρ = λE[S] will reside in the server, and
consequently, on average E[N_q] = E[N] − ρ jobs will reside in the queue. The average
response time E[R] for the arriving job can then be seen as the sum of three parts:

• E[R_1]: the residual service time of the job in service, if any at all;

• E[R_2]: the service time for the jobs queued in front of the newly arriving job;

• E[R_3]: the service time of the new job itself.

The term E[R_1] equals the product of the probability that there is a job in service and
the mean residual service time, i.e., E[R_1] = ρE[S^2]/(2E[S]). The E[N] − ρ jobs in the queue
in total require on average E[R_2] = (E[N] − ρ)E[S] time to be served. The job itself
requires E[R_3] = E[S] amount of service. Noting that due to Little's law E[N] = λE[R],
we have:

E[R] = E[R_1] + E[R_2] + E[R_3]
     = \rho \frac{E[S^2]}{2E[S]} + (E[N] - \rho) E[S] + E[S]
     = \rho \frac{E[S^2]}{2E[S]} + (\lambda E[R] - \rho) E[S] + E[S]
\Rightarrow (1 - \rho) E[R] = \rho \frac{E[S^2]}{2E[S]} + (1 - \rho) E[S]
\Rightarrow E[R] = E[S] + \frac{\lambda E[S^2]}{2(1 - \rho)}.    (5.17)
This is the result that we have seen before and concludes our proof. This intuitively
appealing form of proof, based on mean-values, is also known as “the method of moments”.
We will see examples of it in later chapters as well.
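
As a small illustration (not from the book), the PK result (5.17) is easily wrapped in a helper
function; for exponential service times, where E[S^2] = 2E[S]^2, it reduces to the familiar
M|M|1 value E[R] = E[S]/(1 − ρ). The function name and the numerical example are arbitrary.

    def mg1_response_time(lam, es, es2):
        """Mean response time E[R] of the M|G|1-FCFS queue, equation (5.17).

        lam : arrival rate lambda;  es : E[S];  es2 : E[S^2].
        """
        rho = lam * es
        if rho >= 1.0:
            raise ValueError("unstable: rho >= 1")
        return es + lam * es2 / (2.0 * (1.0 - rho))

    # Exponential service with E[S] = 1 (so E[S^2] = 2) and lambda = 0.5:
    # the result 2.0 equals the M|M|1 value E[S]/(1 - rho) = 1/0.5.
    print(mg1_response_time(0.5, 1.0, 2.0))   # 2.0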

5.3 A formal proof of the MIGIl result


One of the difficulties in analysing the M|G|1 queue is the fact that the state of the queue
is not simply given by the number of jobs in the queue, as was the case with the M|M|1
model. Because the service time distribution is not memoryless any more,
the time a particular job has already been served has to be represented in the state of the
queueing station. Consequently, we deal with a stochastic process with a state variable
that takes values in the two-dimensional mixed discrete-continuous set ℕ × ℝ. There is,

however, a particular set of time values, for which the already expired service time of the
job in service is known, and always the same: the instances of time immediately after the
departure of a job and before the new job starts being served. At these time instances,
the so-called departure instances, the already expired service time equals 0. Consequently,
a correct and sufficient state description at departure instances is the number of jobs in
the queueing station. In the following we will first derive a result for the average number
of jobs in the queueing system at departure instances. After that we will reflect on the
general applicability of this formula.
We will first find an expression for the expected number of customers left behind by a
departing customer. Let n_i denote the number of jobs left behind in the queue by the i-th
job. During the service of the i-th job, a_i new jobs arrive. We first express n_{i+1} in terms
of n_i and a_i. In case n_i > 0, the number of jobs left behind by the (i+1)-th job equals the
number left behind by the i-th job minus 1 (the (i+1)-th job itself), plus the number of jobs
that arrived during the servicing of the (i+1)-th job. Consequently, we have n_{i+1} = n_i + a_{i+1} − 1.
When n_i = 0, the number of jobs left behind by the (i+1)-th job simply is the number
of arrivals during its service: n_{i+1} = a_{i+1}. Introducing the indicator function 1(x), which
equals 1 if x > 0, and 0 elsewhere, we have:

n_{i+1} = n_i + a_{i+1} - 1(n_i).    (5.18)

Now, taking expectations on both sides in this equation we obtain

E[n_{i+1}] = E[n_i] + E[a_{i+1}] - E[1(n_i)],    (5.19)

where we assume that the expectations exist. Now, realizing that in equilibrium the average
number of jobs left behind by the i-th and the (i+1)-th job must be equal, we have
E[n_{i+1}] = E[n_i] = E[n]. Noting that we deal with Poisson arrivals, E[a_i] will also be
independent of i, i.e., E[a] = E[a_i]. By a similar argument, we also have E[1(n_i)] =
E[1(n)]. Substituting this in (5.19) yields

E[a] = E[1(n)].    (5.20)

Note that E[a], the average number of arrivals per average amount of service time, equals
ρ = λE[S]. We can also derive this differently (more intricate in this case, but it serves as
a step-up to the derivation of E[a^2] later): the average number of Poisson arrivals in an
interval of length t is λt (the first moment of a Poisson distribution with parameter λt).
Deconditioning on the length of the service time, we obtain

E[a] = \lambda \int_0^\infty s\, b(s)\,ds = \lambda E[S] = \rho.    (5.21)



Although we now have an expression for E[1(n)], we still do not have the desired result.
To come closer to the desired measure, E[n], we employ a trick: we square both sides of
(5.18):

n_{i+1}^2 = n_i^2 + (1(n_i))^2 + a_{i+1}^2 - 2 n_i 1(n_i) - 2\, 1(n_i) a_{i+1} + 2 n_i a_{i+1}.    (5.22)

First observe that (1(x))^2 = 1(x) and x\,1(x) = x in case x ≥ 0. Now, taking expectations
on both sides and again using the earlier discussed "i-independence" we obtain

E[n^2] = E[n^2] + E[1(n)] + E[a^2] - 2E[n] - 2E[a\,1(n)] + 2E[na].    (5.23)

First observe that the two E[n^2] terms cancel. Furthermore, since arrivals are independent
of the state of the queue, we have E[na] = E[n]E[a] and E[a\,1(n)] = E[a]E[1(n)] = E[a]^2
(by (5.20)). Rewriting (5.23) then results in

2E[n] = E[a] + E[a^2] - 2E[a]^2 + 2E[n]E[a].    (5.24)

Bringing the E[n]-terms together on the left-hand side, we obtain

2E[n](1 - E[a]) = E[a] + E[a^2] - 2E[a]^2 = 2E[a](1 - E[a]) + E[a^2] - E[a].    (5.25)

Dividing both sides by 2(1 - E[a]) yields

E[n] = E[a] + \frac{E[a^2] - E[a]}{2(1 - E[a])}.    (5.26)

We already know that E[a] = ρ, so the only unknown still is E[a^2]. Given that the service
period lasts t seconds, the number of arrivals during this service period has a Poisson
distribution with parameter λt. The second moment of this distribution equals (λt)^2 + λt.
Consequently, we have E[a^2 | service time is t] = (λt)^2 + λt. Deconditioning on the service
time, we obtain

E[a^2] = \int_0^\infty \left((\lambda s)^2 + \lambda s\right) b(s)\,ds
       = \lambda^2 \int_0^\infty s^2 b(s)\,ds + \lambda \int_0^\infty s\, b(s)\,ds = \lambda^2 E[S^2] + \lambda E[S].    (5.27)

Using this result, we derive

E[n] = \rho + \frac{(\lambda^2 E[S^2] + \lambda E[S]) - \lambda E[S]}{2(1-\rho)} = \lambda E[S] + \frac{\lambda^2 E[S^2]}{2(1-\rho)}.    (5.28)
This is the result that we have seen before, e.g., in (5.3), except for the fact that we now have
an expression for E[n], the average number of jobs in the queueing station upon departure
instances, whereas (5.3) gives an expression for E[N], the average number of jobs in the
queueing station at any time instance. Now we have to make use of the following theorem,
which we state without proof (for a proof, we refer to [160, Section 5.3, p. 176]).

Theorem 5.1. Departure instances are arrival instances.


In a queueing system in which customers arrive and depart one at a time, the
distribution of the number of customers left behind at a departure instance
is equal to the distribution of the number of customers found upon arrival at an arrival
instance.    □

Using this theorem and knowing that for the M|G|1 queue (due to the PASTA property)
the distribution of the number of customers found at arrival instances equals the customer
distribution at arbitrary time instances, we have established E[n] = E[N].
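
The embedded recursion (5.18) also suggests a simple way to check the result numerically. The
sketch below (an illustration under assumed parameter values, not part of the original text)
simulates the number of jobs left behind at successive departure instants for an M|D|1 queue
and compares the long-run average with (5.28).

    import math
    import random

    def _poisson(rng, mean):
        # Knuth's inversion-by-multiplication method; adequate for small means.
        if mean <= 0.0:
            return 0
        limit, k, p = math.exp(-mean), 0, 1.0
        while p > limit:
            k += 1
            p *= rng.random()
        return k - 1

    def mg1_departures(lam, sample_service, n_jobs=200_000, seed=7):
        """Average number of jobs left behind at departure instants, obtained by
        simulating the embedded recursion n_{i+1} = n_i + a_{i+1} - 1(n_i)."""
        rng = random.Random(seed)
        n, total = 0, 0
        for _ in range(n_jobs):
            a = _poisson(rng, lam * sample_service(rng))  # arrivals during one service
            n = n + a - (1 if n > 0 else 0)
            total += n
        return total / n_jobs

    # M|D|1 with lambda = 0.8 and S = 1: (5.28) gives
    # E[n] = rho + lambda^2 E[S^2] / (2(1 - rho)) = 0.8 + 0.64/0.4 = 2.4.
    print(mg1_departures(0.8, lambda rng: 1.0))   # typically close to 2.4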

5.4 The M|G|1 model with batch arrivals


In many communication systems, packets that have to be transmitted come from a higher
protocol entity in which a user-originated packet has been split into a number of mini-
packets that fit the packet size used at lower levels in the protocol stack. Since the splitting
of a packet into mini-packets normally proceeds very fast, it is as if a number of mini-packets
(a batch) arrive at the same time (or within a very short time period).
We can model this kind of system by assuming so-called batch or bulk arrivals. We
assume that we have a Poisson arrival process of batches of mini-packets with intensity λ.
An arriving batch consists of a random number H of mini-packets, characterised by the
discrete probabilities h_k, k = 1, 2, .... The expected number of mini-packets in a batch is

E[H] = \sum_{k=1}^{\infty} k h_k,    (5.29)

and the second factorial moment equals

E[H^2] - E[H] = E[H(H-1)] = \sum_{k=1}^{\infty} k(k-1) h_k.    (5.30)

Each mini-packet requires a random amount of service S with first and second moment
E[S] and E[S^2] respectively. We thus have ρ = λE[H]E[S]. Without proof, we state the
average mini-packet waiting time E[W]:

E[W] = \frac{\lambda E[H] E[S^2]}{2(1-\rho)} + \frac{E[H(H-1)] E[S]}{2 E[H] (1-\rho)}.    (5.31)

The first term indeed is the average mini-packet waiting time in a normal M|G|1 model as
if packets arrived with average size E[H]E[S] and second moment E[H]E[S^2]. The second
term accounts for the extra waiting time experienced by packets that are not the first in
the batch.
The M|G|1 model with batch arrivals is often denoted as M^[H]|G|1, where the H denotes
the batch-size distribution.

Example 5.8. Fixed mini-packet size.


Consider a distributed computing application that generates exponentially distributed ap-
plication packets with length A, with E[A] = 1/μ bytes. The underlying network can only
accept mini-packets (frames) with fixed length v bytes. The overall stream of application
packets forms a Poisson process with rate λ. The network transmission speed is r bytes
per second.
Let us compute the average mini-packet waiting time E[W], given that λ = 75 packets
per second, E[A] = 1 Kbyte, v = 50 bytes and r = 100 Kbytes per second.

We first compute the density of the number of mini-packets H, i.e., the probabilities h_k,
k = 1, 2, ..., that an application packet is split into k mini-packets. An application packet
of length a will be split into k mini-packets whenever (k−1)v < a ≤ kv. Consequently, we
have

h_k = \int_{(k-1)v}^{kv} f_A(a)\,da = F_A(kv) - F_A((k-1)v)
    = (1 - e^{-\mu k v}) - (1 - e^{-\mu (k-1) v}) = e^{-\mu (k-1) v}\left(1 - e^{-\mu v}\right)
    = \left(e^{-\mu v}\right)^{k-1}\left(1 - e^{-\mu v}\right), \quad k = 1, 2, \ldots.    (5.32)

We thus find that H is geometrically distributed with parameter Ω = 1 − e^{-μv}. We know
that (see also Appendix A) E[H] = 1/Ω. Then note that var[H] = (1 − Ω)/Ω^2, so that
E[H^2] = var[H] + (E[H])^2 = (2 − Ω)/Ω^2. Consequently,

E[H(H-1)] = E[H^2] - E[H] = \frac{2-\Omega}{\Omega^2} - \frac{1}{\Omega} = \frac{2(1-\Omega)}{\Omega^2}.    (5.33)

Substituting these results in (5.31), and noting that we have deterministic mini-packet
service times so that E[S^2] = E[S]^2, we obtain

E[W] = \frac{\lambda E[S]^2}{2(1-\rho)\Omega} + \frac{(1-\Omega) E[S]}{(1-\rho)\Omega}.    (5.34)

Let us now use the above given numerical values. We first calculate Ω = 1 − e^{-μv} =
1 − e^{-50/1000} = 0.049. Consequently, the average number of mini-packets an application
packet is split into equals E[H] = 1/Ω = 20.504. Furthermore, the first moment of the mini-packet
service time is E[S] = 50/100000 = 0.5 msec, and ρ = λE[H]E[S] = 0.769. We thus obtain

E[W] = \frac{75\,(5 \times 10^{-4})^2}{2(1-0.769) \cdot 0.049} + \frac{(1-0.049) \cdot 5 \times 10^{-4}}{(1-0.769) \cdot 0.049} = 43 \mbox{ msec}.    (5.35)

Interesting to note is that when there is no splitting, that is, when the application packets
can be transmitted directly, we would have obtained

E[W] = \frac{\lambda E[S_A^2]}{2(1 - \lambda E[S_A])} = \frac{150 \times 10^{-4}}{2(1 - 75 \times 10^{-2})} = 30 \mbox{ msec},    (5.36)

where E[S_A] and E[S_A^2] are the first and second moment of the application packet lengths
measured in seconds. Observe that the batch arrivals cause an increase in waiting time of
13 msec (around 43%)!    □
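
The numbers of Example 5.8 can be reproduced with a few lines of Python; the code below is
an illustrative sketch (the helper name and structure are not from the book) that evaluates
(5.31) for the geometric batch-size distribution derived above.

    import math

    def batch_mg1_wait(lam, e_h, e_h2_fact, es, es2):
        """Mean mini-packet waiting time E[W] in the M^[H]|G|1 queue, (5.31).

        lam       : batch arrival rate (lambda)
        e_h       : E[H], mean batch size
        e_h2_fact : E[H(H-1)], second factorial moment of the batch size
        es, es2   : first and second moment of the mini-packet service time
        """
        rho = lam * e_h * es
        if rho >= 1.0:
            raise ValueError("unstable: rho >= 1")
        return (lam * e_h * es2 / (2 * (1 - rho))
                + e_h2_fact * es / (2 * e_h * (1 - rho)))

    # Values from Example 5.8:
    mu, v, r, lam = 1.0 / 1000, 50.0, 100_000.0, 75.0
    omega = 1 - math.exp(-mu * v)                 # ~ 0.049
    e_h = 1 / omega                               # ~ 20.5 mini-packets
    e_h2_fact = 2 * (1 - omega) / omega ** 2
    es = v / r                                    # 0.5 msec, deterministic
    print(batch_mg1_wait(lam, e_h, e_h2_fact, es, es ** 2))   # ~ 0.043 s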

5.5 M|G|1 queueing systems with server breakdowns


In many computer-communication applications one can observe temporary unavailability
of resources. One can think of the breakdown of a server, the unavailability of a server due
to maintenance, or the unavailability of a server for a particular customer class due to the
fact that the server is occupied by another customer class. One also often speaks about
server vacation models in this context, simply because from a customer point-of-view, the
server is taking a vacation and cannot serve customers for some time.
There are many aspects that characterise queueing models with breakdowns. Apart
from the “normal” characteristics of a queueing system, there are various characteristics
concerning the vacations the server takes:

• The length of the server vacation, that is, its distribution and its dependence on,
e.g., the state of the queue.

• The time instants at which server vacations start. Do they start during a normal
service, thus implying a preemption of the job in service, or can they only take place
when the server is idle or when the server changes from one customer to the next?

• The scheduling of server vacations. Are vacations started randomly, after some fixed
number of (job) services or after some fixed time period?

• The resume policy of the server. Is a new vacation started if after a vacation the
server finds the queue empty, or does the server just become idle in these cases?

Depending on the application that one studies, one can decide for one or another model.
As an example, maintenance work is preferably done during idle periods of a system.
When modelling a system including its maintenance, the maintenance activities could be
modelled as server vacations which can only start when there are no jobs to be served.
Depending on the maintenance strategy, a maintenance activity may be started after a
fixed number of services or after a fixed time interval, whichever comes first (compare a
car maintenance schedule: e.g., every 20000 km or once a year, whichever comes first).
Without going into too much detail regarding derivations, we will present M|G|1 queueing
systems with server breakdowns and single arrivals in Section 5.5.1 and with server
breakdowns and batch arrivals in Section 5.5.2.

5.5.1 Single arrivals


We first consider a variant of the M|G|1 model in which the server takes a vacation of
random length whenever the system becomes empty. Upon returning from a vacation,
the server starts working again when there are jobs queued, until the station empties
again. Whenever, upon return from a vacation, the server finds the queue empty, it takes
another vacation. This model is called an M|G|1 model with exhaustive service and multiple
vacations.
Assume that jobs arrive as a Poisson stream with intensity λ, and that job lengths
have a distribution characterised by the first two moments E[S] and E[S^2]. As usual,
ρ = λE[S]. A vacation has a duration V and is characterised by its first two moments
E[V] and E[V^2].
Let us use the method of moments to derive the average waiting time. The average
waiting time E[W] consists of three parts. The first part is the average remaining vacation
time, given that a vacation is going on at the instant of arrival. The latter is the case
with probability 1 − ρ, the probability that there is no work to be done. The second and
third part are similar to the corresponding parts in the normal M|G|1 queue: ρ times the
average residual service time for the job being served and E[N_q] = E[N] − ρ times the
average service time of the jobs queued in front of the arriving job. Consequently, we
have

E[W] = (1-\rho)\frac{E[V^2]}{2E[V]} + \rho\frac{E[S^2]}{2E[S]} + E[N_q] E[S].    (5.37)

By using Little's law to rewrite E[N_q] = λE[W] and rearranging terms, we obtain

E[W] = \frac{E[V^2]}{2E[V]} + \frac{\lambda E[S^2]}{2(1-\rho)}.    (5.38)

We observe that the expected waiting time E[W] of a packet in this system simply equals
the sum of the residual vacation duration, E[V^2]/(2E[V]), and the waiting time as perceived
in a normal M|G|1 model. This result is not surprising. All jobs being served in fact receive
service as in a normal M|G|1 queue, however, delayed by the expected remaining vacation
time (of the last vacation).
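
Equation (5.38) is straightforward to evaluate; the following sketch (illustrative only, with
arbitrarily chosen numbers) shows the decomposition into the PK waiting time plus the mean
residual vacation time.

    def mg1_multiple_vacations_wait(lam, es, es2, ev, ev2):
        """Mean waiting time E[W] for the M|G|1 queue with exhaustive service
        and multiple vacations, equation (5.38): the mean residual vacation
        time plus the PK waiting time."""
        rho = lam * es
        if rho >= 1.0:
            raise ValueError("unstable: rho >= 1")
        return ev2 / (2 * ev) + lam * es2 / (2 * (1 - rho))

    # Deterministic vacations of length V = 4 add a penalty of V/2 = 2;
    # with lam = 0.5, E[S] = 1 and E[S^2] = 2 the PK part equals 1.
    print(mg1_multiple_vacations_wait(0.5, 1.0, 2.0, 4.0, 16.0))   # 3.0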
A slight change to the above model is the case where at most one vacation is taken: an
M|G|1 model with exhaustive service and a single vacation. Whenever the server returns
from a vacation and finds the system empty, it simply becomes idle, waiting for the next
arrival. For E[W] we then have the following slightly more complex result:

E[W] = \frac{\lambda E[V^2]}{2\left(f_V^*(\lambda) + \lambda E[V]\right)} + \frac{\lambda E[S^2]}{2(1-\rho)},    (5.39)

where we recognize the normal M|G|1 waiting time, plus a term accounting for the server
vacation. In this term, f_V^*(λ) is the Laplace transform of the density function of V, evaluated
at s = λ (see also Appendix B):

f_V^*(s = \lambda) = \left( \int_0^\infty e^{-st} f_V(t)\,dt \right)_{s=\lambda}.    (5.40)
For the above evaluation we thus require knowledge of f_V^*(s) and, hence, of the distribution
of V. What can be observed, though, is the fact that the single vacation model yields a
smaller expected waiting time than the multiple vacation model, since the additive term
f_V^*(λ) in the left denominator is always positive, thus yielding a smaller contribution to
the waiting time.

5.5.2 Batch arrivals


We can easily generalise the M|G|1 model with server breakdowns to one with batch
arrivals, as discussed in Section 5.4. We have a Poisson arrival process of batches with
rate λ. An arriving batch of random size H consists of k = 1, 2, ... mini-packets with
probabilities h_k, k = 1, 2, .... The expected number of mini-packets in a batch is E[H];
the second factorial moment is E[H(H−1)]. Each mini-packet requires a random amount
of service with first and second moment E[S] and E[S^2], respectively. Recalling that
ρ = λE[H]E[S], we have for the average message waiting time E[W] the following result:

E[W] = \frac{E[V^2]}{2E[V]} + \frac{\lambda E[H] E[S^2]}{2(1-\rho)} + \frac{E[H(H-1)] E[S]}{2 E[H] (1-\rho)},    (5.41)

where the first term indeed is the residual vacation time and the latter two terms represent
the earlier presented average message waiting time in an M^[H]|G|1 model without vacations.

5.6 Further reading


The M|G|1 queue is treated in most textbooks on computer performance evaluation. The
earliest results on the M|G|1 queue have been derived by Pollaczek [235]. Early work on the
embedded Markov chain approach has been performed by Palm [228] and Kendall [155].
Background on queues with server breakdowns can be found in the survey of Doshi [76].
Many variants of the M|G|1 queue are treated in depth in the book by Takagi [274].

5.7 Exercises
5.1. The M|G|1 queue.
In a university computer center three types of jobs are distinguished (with their relative
occurrence): student jobs (30%), faculty jobs (50%) and administrative jobs (20%). Jobs
of these three classes have negative exponential lengths with mean values 10, 30 and 20
milliseconds, respectively.

1. Compute the first and second moment of the job service time.

2. At which overall job arrival rate does the system become overloaded?

3. Express E[W] as a function of the overall arrival rate λ and draw a graph of E[W]
against λ.

4. How large can the arrival rate become if the expected waiting time is to be at most
equal to two times the expected service time? How big is the utilisation in that case?

5.2. Modelling of a disk system.


We consider a (simplified) single disk system. The disk has t tracks and s sectors per
track. We assume that seek operations can be performed with constant speed w tracks
per second. The number of disk rotations per second is T. A single sector (containing b
bytes) corresponds to the smallest unit of transfer (a block). We consider the case where
disk requests (for single blocks) arrive as a Poisson process with rate λ and assume that
requests are uniformly distributed over the disk; successive requests are independent of one
another. The service time of a single request then consists of the rotational latency, the
seek time and the block transfer time.

1. Find expressions for the first and second moment of the rotational latency.

2. Find expressions for the first and second moment of the seek time.

3. Find expressions for the first and second moment of the block transfer time.

4. Give an expression for the expected response time for a disk request.

5. How would the analysis change when a single disk request consists of i blocks with
probability b_i (i = 1, ..., s), under the assumption that these blocks are stored
consecutively?

The issue of disk modelling is discussed in detail in [248].

5.3. M|G|1 queues with batch arrivals.

Consider an M|G|1 queue with batch arrivals. The arrival process (of batches) has rate λ.
Every batch consists of H mini-jobs (mini-packets), with h_k = 1/5, for k = 1, ..., 5. We
furthermore have: E[S] = 0.03, deterministically.

1. Compute the mean batch size E[H].

2. Compute the second factorial moment E[H(H−1)].

3. How large can λ be before the system becomes unstable?

4. Compute E[W] in case ρ = 0.9.

5. Now, suppose we are not aware of the results for the M|G|1 queue with batch arrivals.
Instead, to approximate the fact that arrivals take place in batches, we accelerate the
arrival process with a factor E[H] and treat the resulting system as a normal M|G|1
queueing model. How large is E[W] in this case?

6. To improve upon the approximation derived in the previous exercise, we now assume
that arrivals of batches do take place, but we "encode" the batch size in the service
times, i.e., we assume that a service lasts k times the mini-packet service time when the
arrival consists of k mini-packets (which is the case with probability h_k, k = 1, ..., 5).
Compute E[S] and E[S^2] and evaluate the expected waiting time E[W] using the
normal M|G|1 result.

Chapter 6

M|G|1 queueing models with various
scheduling disciplines

IN this chapter we continue the study of M|G|1 queueing models. In particular, we will
study the influence of various new scheduling disciplines, in comparison to the FCFS
scheduling we have addressed in Chapter 5. We address non-preemptive priority scheduling
in Section 6.1 and preemptive priority scheduling in Section 6.2. A limiting case of the
non-preemptive priority scheduling is shortest job next scheduling, which is discussed in
Section 6.3. Then, in Section 6.4 we discuss the round robin scheduling strategy and, in
Section 6.5, its limiting case, processor sharing scheduling. Finally, we discuss scheduling
disciplines based on the already elapsed service time of jobs in Section 6.6.

6.1 Non-preemptive priority scheduling


Up till now we have assumed that all the jobs that need to be served have the same
priority. In practice, this is often not the case. In communication systems, as an exam-
ple, short packets that contain control information might have priority over the generally
longer user packets. Another case where priorities are needed is in integrated services
communication systems which transmit real-time data such as voice or video samples next
to time-insensitive data. Also in computer system scheduling multiple priority classes do
make sense, e.g., to distinguish between interactive jobs and computation-intensive batch
jobs. To illustrate the need for priority scheduling, let us consider the following example.

Example 6.1. A computer center without priorities.


Consider a computer center where jobs enter as a Poisson process with rate λ. Of the

λ    ρ        E[W]
1    0.1540   0.0222
2    0.3080   0.0542
3    0.4620   0.1046
4    0.6160   0.1953
5    0.7700   0.4076
6    0.9240   1.4803

Table 6.1: Mean waiting times E[W] (in seconds) in a non-priority computing center

incoming jobs, a fraction of 40% comprises interactive jobs with a fixed length of 10
msec, and a fraction of 60% comprises batch jobs with a fixed length of 250 msec. We
can compute the expected service time E[S] = 0.4 × 10 + 0.6 × 250 = 154 msec. We thus
find that the system becomes unstable at λ = 1/0.154 ≈ 6.5 job arrivals per second. For
the second moment we find E[S^2] = 0.4 × 0.01^2 + 0.6 × 0.250^2 = 0.0375 seconds^2. Using
the standard M|G|1 result, we can compute the expected waiting times for increasing load,
as indicated in Table 6.1.
What can be observed is that even for small utilisations the average waiting time E[W]
is significantly larger than the time required to serve interactive jobs. This is an undesirable
situation which can be overcome by introducing (higher) priority for the interactive jobs.
□
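
Table 6.1 can be reproduced directly from the PK formula; the short sketch below (not part
of the original text) uses the rounded moments E[S] = 0.154 s and E[S^2] = 0.0375 s^2 quoted
above.

    def mg1_wait(lam, es, es2):
        """PK mean waiting time E[W] = lambda E[S^2] / (2(1 - rho))."""
        return lam * es2 / (2.0 * (1.0 - lam * es))

    # Job mix of Example 6.1: 40% interactive (10 msec), 60% batch (250 msec),
    # with the rounded moments quoted in the text.
    es, es2 = 0.154, 0.0375
    for lam in range(1, 7):
        print(lam, round(lam * es, 4), round(mg1_wait(lam, es, es2), 4))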

An important characteristic of priority strategies is whether they are preemptive or not:

• In preemptive priority strategies the service of a job is interrupted as soon as
a higher-priority job arrives. In that case, first the higher-priority job is served,
after which the service of the lower-priority job that was originally being serviced
is resumed or restarted. Preemptive scheduling strategies thus order the jobs in the
queue, including the one in service.

• In non-preemptive priority strategies, a job in service is always first finished before a
new job is put into service. Non-preemptive priority strategies only order the jobs in
the queue. Apart from the introduced overhead, preemptive priority strategies are
better for the higher-priority jobs than non-preemptive ones. On the other hand,
non-preemptive strategies are generally easier to implement and cause less overhead.

In this section we focus on non-preemptive priority scheduling, whereas Section 6.2 is


devoted to preemptive priority scheduling.

Figure 6.1: M|G|1 model with P priority classes

Let us now consider a model of a single server system in which arriving jobs can be
classified into P priority classes, numbered 1 through P. We assume that class 1 has the
highest and class P the lowest priority. Jobs of class r = 1, ..., P, arrive at the queueing
station according to a Poisson process with rate λ_r. The average service requirement for
class r jobs is E[S_r] = 1/μ_r. The second moment of the service requirement for class r
jobs is E[S_r^2]. In Figure 6.1 we show the M|G|1 model with multiple priorities. Note that
we have drawn multiple queues for convenience only; one could also consider a single queue
in which the customers are ordered on the basis of their priority.
We will now derive a relation between the average waiting time E[W_r] of a class r job
and the average waiting times of jobs of the higher priority classes 1, ..., r−1.
A job of class r arriving at the queueing station has to wait before it can be served. Its
waiting time W_r consists of three components:

• The remaining service time of the job in service, if any;

• The time it costs to serve all the jobs of priority classes k = 1, ..., r, i.e., of all higher
and equal priority jobs, that are present in the system upon the arrival of the class
r job;

• The time it costs to serve jobs of higher priority classes, i.e., jobs of classes k =
1, ..., r−1, that arrive during the waiting period of the class r job.

In an equation, this takes the following form:

W_r = T_P + \sum_{k=1}^{r} T_k' + \sum_{k=1}^{r-1} T_k'',    (6.1)

where W_r is the waiting time of class r jobs, T_P the remaining service time of the job in
service, T_k' the time it costs to serve all the class k customers that are present upon the
arrival of the class r customer (note the summation index that ranges from 1 to r), and
T_k'' the time it costs to serve all the class k customers that arrive during W_r and that need
to be served before the class r customer (note the summation index that ranges from 1 to
r−1). Taking expectations on both sides we obtain:

E[W_r] = E[T_P] + \sum_{k=1}^{r} E[T_k'] + \sum_{k=1}^{r-1} E[T_k''].    (6.2)

We will now derive expressions for the three terms on the right-hand side of this equation:

• The remaining service time of the job in service clearly depends on the class of the job
that is in service. A job of class r is in service with probability ρ_r, where ρ_r = λ_r E[S_r],
the utilisation caused by class r jobs. The remaining processing time of a class r job
equals E[S_r^2]/(2E[S_r]). In total, we thus obtain:

E[T_P] = \sum_{k=1}^{P} \rho_k \frac{E[S_k^2]}{2E[S_k]} = \frac{1}{2} \sum_{k=1}^{P} \lambda_k E[S_k^2].    (6.3)

• The term E[T_k'] is determined by the number of jobs per class in the queueing station
upon the arrival of the new class r job. Due to the PASTA property and Little's
law we know that on average there are E[N_{q,k}] = λ_k E[W_k] class k jobs waiting upon
arrival. Since they require on average 1/μ_k amount of service each, we have
E[T_k'] = E[N_{q,k}]/μ_k = λ_k E[W_k]/μ_k = ρ_k E[W_k].

• Finally, we have to calculate E[T_k'']. During the E[W_r] time units the class r job has
to wait, on average λ_k E[W_r] jobs of class k arrive, each requiring 1/μ_k service. Thus,
we have E[T_k''] = λ_k E[W_r]/μ_k = ρ_k E[W_r].

Substituting these results in (6.2), we obtain

E[W_r] = \sum_{k=1}^{P} \frac{\lambda_k}{2} E[S_k^2] + \sum_{k=1}^{r} \rho_k E[W_k] + \sum_{k=1}^{r-1} \rho_k E[W_r]
       = \sum_{k=1}^{P} \frac{\lambda_k}{2} E[S_k^2] + \sum_{k=1}^{r-1} \rho_k E[W_k] + \rho_r E[W_r] + E[W_r] \sum_{k=1}^{r-1} \rho_k.    (6.4)

Bringing the E[W_r]-terms to the left, we obtain

E[W_r] \left(1 - \sum_{k=1}^{r} \rho_k\right) = \sum_{k=1}^{P} \frac{\lambda_k}{2} E[S_k^2] + \sum_{k=1}^{r-1} \rho_k E[W_k].    (6.5)

Defining σ_r = \sum_{k=1}^{r} ρ_k (with σ_0 = 0) and dividing by (1 − σ_r) we obtain:

E[W_r] = \frac{\sum_{k=1}^{P} \frac{\lambda_k}{2} E[S_k^2] + \sum_{k=1}^{r-1} \rho_k E[W_k]}{1 - \sigma_r}.    (6.6)

We thus have established a relation that expresses E[W_r] in terms of E[W_1] through
E[W_{r-1}]. Taking the case r = 1 we obtain:

E[W_1] = \frac{E[T_P]}{1 - \sigma_1} = \frac{\frac{1}{2}\sum_{k=1}^{P} \lambda_k E[S_k^2]}{1 - \rho_1}.    (6.7)

Notice the similarity of this expression with the normal M|G|1 waiting time formula. If
P = 1, the above expression even reduces to the PK result we have seen before. For the
case r = 2 we obtain

E[W_2] = \frac{E[T_P] + \rho_1 E[W_1]}{1 - \sigma_2} = \cdots = \frac{E[T_P]}{(1 - \sigma_2)(1 - \sigma_1)}.    (6.8)

Continuing this process we recognize the following relation:

E[W_r] = \frac{E[T_P]}{(1 - \sigma_r)(1 - \sigma_{r-1})},    (6.9)

with E[T_P] = \sum_{k=1}^{P} \lambda_k E[S_k^2]/2. This result is known as Cobham's formula [57, 58]. Notice
that we can express the average response time for class r as

E[R_r] = E[S_r] + \frac{E[T_P]}{(1 - \sigma_r)(1 - \sigma_{r-1})}.    (6.10)

Example 6.2. A computer center with two priority classes.


To illustrate the impact of the use of priorities, let us readdress the computer center that
serves two types of customers. Instead of merging the two classes we now give priority to the
shorter interactive jobs. In Table 6.2 we show the results. For ease of comparison we also
include the non-priority results. We observe that the average waiting time for interactive
jobs is only slightly longer than the remaining service time of the job in service (shown
in the rightmost column of the table). As can be seen, the performance improvement for
interactive jobs is tremendous, at almost no performance penalty for the batch jobs.    □

λ   ρ        E[W]     λ_i   ρ_i     λ_b   ρ_b    E[W_i]   E[W_b]   E[T_P]
1   0.1540   0.0222   0.4   0.004   0.6   0.15   0.0188   0.0222   0.0188
2   0.3080   0.0542   0.8   0.008   1.2   0.30   0.0378   0.0547   0.0375
3   0.4620   0.1046   1.2   0.012   1.8   0.45   0.0570   0.1059   0.0563
4   0.6160   0.1953   1.6   0.016   2.4   0.60   0.0763   0.1987   0.0750
5   0.7700   0.4076   2.0   0.020   3.0   0.75   0.0958   0.4164   0.0938
6   0.9240   1.4803   2.4   0.024   3.6   0.90   0.1154   1.5183   0.1126

Table 6.2: The waiting times in a computer center with job priorities
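
Cobham's formula (6.9) translates into a few lines of code. The sketch below (an illustration,
not taken from the book) computes the per-class waiting times for the computer center example
and reproduces the E[W_i] and E[W_b] columns of Table 6.2 for λ = 5.

    def cobham_wait(lams, es_list, es2_list):
        """Mean waiting times per class under non-preemptive priorities,
        Cobham's formula (6.9); classes are listed highest priority first.

        lams, es_list, es2_list : per-class arrival rates and service moments.
        """
        etp = 0.5 * sum(l * s2 for l, s2 in zip(lams, es2_list))   # E[T_P]
        rhos = [l * s for l, s in zip(lams, es_list)]
        waits, sigma_prev = [], 0.0
        for rho in rhos:
            sigma = sigma_prev + rho
            waits.append(etp / ((1 - sigma) * (1 - sigma_prev)))
            sigma_prev = sigma
        return waits

    # Example 6.2 with lam = 5: interactive jobs (10 msec) have priority
    # over batch jobs (250 msec); both service times are deterministic.
    wi, wb = cobham_wait([2.0, 3.0], [0.010, 0.250], [0.010**2, 0.250**2])
    print(round(wi, 4), round(wb, 4))   # 0.0958 and 0.4164, as in Table 6.2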

The non-preemptive priority scheduling strategy in fact does nothing more than change
the order in which arriving jobs are served. The amount of work to be done by the queueing
station remains the same. Note that we assume that the overhead for "implementing"
the priorities is negligible. One might therefore expect that some kind of law exists which
expresses that whenever one gives one class of jobs a higher priority, other classes of
jobs suffer from this.
Indeed, such a conservation law does exist. This law expresses that the sum of the average
waiting times per class, weighted by their utilisations, remains the same, independent of
the priority assignment to the various classes. This law, often denoted as Kleinrock's
conservation law [160], has the following form:

\sum_{r=1}^{P} \rho_r E[W_r] = \rho E[W],    (6.11)

where ρ = \sum_{r=1}^{P} ρ_r, and E[W] is the average waiting time when there are no priority
classes. E[W] can thus be derived by the normal M|G|1 result. Substituting ρ_r = λ_r/μ_r
(Little's law for the server only) and E[W_r] = E[N_{q,r}]/λ_r (Little's law for the queue only)
we obtain

\rho E[W] = \sum_{r=1}^{P} \rho_r E[W_r] = \sum_{r=1}^{P} \frac{\lambda_r}{\mu_r}\frac{E[N_{q,r}]}{\lambda_r} = \sum_{r=1}^{P} E[N_{q,r}] E[S_r].    (6.12)
The right-hand side of this equation expresses a quantity that is often called the amount
of work in the system, that is, the sum over all priority classes of the number of queued
jobs multiplied by their average service requirements.

Example 6.3. The conservation law for the computer center with job priorities.
As an example of the conservation law, reconsider the computer center with job priorities.
From Table 6.1 we read that for λ = 5, the amount of work in the system equals 0.314.
Computing ρ_i E[W_i] + ρ_b E[W_b] from Table 6.2 yields the same result.    □
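
The check of Example 6.3 takes only a few lines; the snippet below (illustrative, reusing the
values from Tables 6.1 and 6.2) confirms that both sides of (6.11) are approximately 0.314.

    # Conservation-law check for lam = 5 (Example 6.3):
    rho_i, rho_b, rho = 0.02, 0.75, 0.77
    wi, wb = 0.0958, 0.4164          # Table 6.2, priority scheduling
    w_fcfs = 0.4076                  # Table 6.1, no priorities
    print(rho_i * wi + rho_b * wb)   # ~ 0.314
    print(rho * w_fcfs)              # ~ 0.314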

6.2 Preemptive priority scheduling


In non-preemptive priority scheduling, a high-priority job might have to wait for a low-
priority job already in service when the former arrives. To circumvent this, we might
allow for the preemption of low-priority jobs by arriving high-priority jobs. In a time-
sharing computer system such a scheduling mechanism can be implemented very well; in
a communication system such a scheduling mechanism can be applied at the mini-packet
level.
A job that has been preempted can be handled in various ways when the processor
restarts working on it. Most efficient is to resume the work from the point where the
preemption took place. This is called the preemptive resume (PRS) strategy. Instead of
resuming the work, the job can also be restarted. Two possibilities exists in this case, at
least from a modelling point of view. One can either repeat exactly the same job, or one
can repeat a job with a service time redrawn from the service time distribution. The former
variant is known as the preemptive repeat identical (PRI) strategy and is what happens
most often in reality. The latter strategy is known as preemptive repeat different (PRD).
As the PRS-strategy excludes the possibility of work being redone, it seems to be the most
efficient. That the PRD-strategy is more efficient than the PRI-strategy can be seen as
follows. First observe that the probability of a job being preempted increases when the
job increases in length. If such a job is then later repeated identically, it will again be
long, and the probability that it is preempted remains relatively high. If on the other hand
a different job drawn from the same distribution is repeated, there is a fair probability
that it will be shorter than the “original” (long) one. It will thus finish faster and will be
preempted with a smaller probability.
We now focus on the PRS-strategy. For jobs of the highest priority class (class 1) the
workload imposed by the lower priority classes does not matter at all. Therefore, E[W_1]
can simply be derived via the PK-formula for class 1:

E[W_1] = \frac{\lambda_1 E[S_1^2]}{2(1 - \rho_1)} = \frac{\lambda_1 E[S_1^2]}{2(1 - \sigma_1)}.    (6.13)
For jobs of class r = 2, ..., P, the situation is slightly different. These jobs first of all
have to wait for the remaining service time of the job in service if it is of class r or higher
priority, i.e., of one of the classes 1, ..., r. The expected value of this quantity equals

E[T_r] = \frac{1}{2} \sum_{k=1}^{r} \lambda_k E[S_k^2].    (6.14)

Notice the difference with the non-preemptive case here. There, the arriving job always
has to wait for the completion of the job in service, regardless of its class, that is, we always
deal with a remaining processing time E[T_P].
Then, similar to the non-preemptive case, the job has to wait for the completion of
jobs of classes 1 through r that are already waiting when it arrives. Following the same
derivation as in the non-preemptive case, we obtain that the average waiting time for a job
of class r before it is put into service (denoted as W_r') equals

E[W_r'] = \frac{E[T_r]}{(1 - \sigma_r)(1 - \sigma_{r-1})}.    (6.15)

However, there is one element that we did not yet take into account, namely the fact that
once a class r job is in service, it can be preempted by higher-priority jobs, so that extra
waiting time is introduced. The amount of work that flows in during the service of the class
r job and that needs to be handled first equals, on average, σ_{r-1}E[S_r], i.e., the utilisation of
the high-priority job classes multiplied by the service duration of the class r job. However, also
when serving this extra work, additional work might flow in: a quantity equal to σ_{r-1}^2 E[S_r].
Taking this reasoning further reveals that the effective time to complete the service of the
single class r job takes

E[S_r]\left(1 + \sigma_{r-1} + \sigma_{r-1}^2 + \sigma_{r-1}^3 + \cdots\right) = \frac{E[S_r]}{1 - \sigma_{r-1}}.    (6.16)

Since the actual average service time still is E[S_r], extra waiting time due to preemptions,
of length

\frac{E[S_r]}{1 - \sigma_{r-1}} - E[S_r] = \frac{\sigma_{r-1} E[S_r]}{1 - \sigma_{r-1}},    (6.17)

is introduced, so that the average waiting time for a class r customer finally becomes

E[W_r] = \frac{\sigma_{r-1} E[S_r]}{1 - \sigma_{r-1}} + \frac{E[T_r]}{(1 - \sigma_r)(1 - \sigma_{r-1})}.    (6.18)

Defining σ_0 = 0, this equation is also valid for class 1. For the average response time for
class r we derive:

E[R_r] = \frac{E[S_r]}{1 - \sigma_{r-1}} + \frac{E[T_r]}{(1 - \sigma_r)(1 - \sigma_{r-1})}.    (6.19)

Notice that in comparison with the non-preemptive case the average service time E[S_r] for
class r jobs is stretched by a factor (1 − σ_{r-1})^{-1}, and E[T_P] has been changed to E[T_r].

Example 6.4. Preemptive priorities in the computer center.


We can further improve on the performance (mean waiting time) for the interactive jobs
by giving them preemptive priority over the batch jobs. In Table 6.3 we show the expected

              non-preemptive        preemptive
λ   ρ        E[W_i]   E[W_b]    E[W_i]    E[W_b]
1   0.1540   0.0188   0.0222    0.00002   0.0233
2   0.3080   0.0378   0.0547    0.00004   0.0567
3   0.4620   0.0570   0.1059    0.00006   0.1090
4   0.6160   0.0763   0.1987    0.00008   0.2028
5   0.7700   0.0958   0.4164    0.00010   0.4215
6   0.9240   0.1154   1.5183    0.00012   1.5244

Table 6.3: The waiting times (in seconds) in a computer center with non-preemptive and
preemptive priorities

waiting times for the two job classes. For ease of comparison, we have also included the
expected waiting times in the non-preemptive priority case. As can be seen, the waiting
time for the interactive jobs almost vanishes; only when another interactive job is still in
service does an arriving one have to wait (the probability that no such arrival takes place
equals e^{-ρ_i}, which is very close to 1). The batch jobs do suffer from this improvement only
to a very limited extent.    □
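
The preemptive-resume waiting times of Table 6.3 follow from (6.18) in the same way. The
sketch below (not from the book) evaluates the formula per class, highest priority first.

    def prs_wait(lams, es_list, es2_list):
        """Mean waiting times per class under preemptive-resume priorities,
        equation (6.18); classes are listed highest priority first."""
        waits, sigma_prev, et = [], 0.0, 0.0
        for lam, es, es2 in zip(lams, es_list, es2_list):
            et += lam * es2 / 2.0          # E[T_r] = sum_{k<=r} lam_k E[S_k^2]/2
            sigma = sigma_prev + lam * es
            waits.append(sigma_prev * es / (1 - sigma_prev)
                         + et / ((1 - sigma) * (1 - sigma_prev)))
            sigma_prev = sigma
        return waits

    # Example 6.4 with lam = 1: interactive jobs (10 msec) preempt batch
    # jobs (250 msec); both service times are deterministic.
    wi, wb = prs_wait([0.4, 0.6], [0.010, 0.250], [0.010**2, 0.250**2])
    print(wi, round(wb, 4))   # ~ 2e-5 and 0.0233, as in Table 6.3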

6.3 Shortest job next scheduling


Given that the required service times of jobs are known in advance, which is sometimes the
case when transmitting packets in a communication system or when submitting "standard
jobs" to a computer system, one might want to give priority to small jobs. This leads to
what is known as shortest job next (SJN) scheduling. In fact, SJN scheduling is a form
of priority scheduling in which shorter jobs get priority over longer jobs, i.e., the length
of the job is the priority criterion. Therefore, a derivation similar to the one presented in
Section 6.1 is possible; however, we now deal with a continuous priority criterion, thereby
assuming that job service times or job transmission times take values in some continuous
domain. Again, we assume that the priority mechanism works in a non-preemptive way.
Let us assume that jobs arrive at the queue as a Poisson process with rate λ and with
service requirement S. S is a random variable with probability density f_S(t). We say that
a job has "priority t" whenever t ≤ S < t + dt and dt → 0. All jobs with priority t together
form a Poisson process with intensity

\lambda_t = \lambda \Pr\{t \leq S < t + dt\} = \lambda f_S(t)\,dt.    (6.20)

The jobs with priority t together constitute a job class with utilisation

\rho_t = \lambda_t t = \lambda t f_S(t)\,dt,    (6.21)

as the (average) service requirement of class t jobs is t units of time.

Analogous to σ_r for the priority scheduling mechanism discussed in Section 6.1, we
define β_t to be the cumulative utilisation of jobs of priority t and higher. Note that a
higher priority corresponds to a lower value of t. Consequently, we have

\beta_t = \int_0^t \lambda s f_S(s)\,ds.    (6.22)

As a result, \beta_\infty = \lambda \int_0^\infty s f_S(s)\,ds = \lambda E[S] = \rho, the total utilisation.


In the derivation in Section 6.1 the average remaining processing time E[T_P] of a job in
service, given P different priority classes, played an important role. This quantity can now
be calculated as the integral over all priority classes of the product of the probability
to find a class t customer in service and its average remaining processing time; we denote
this quantity as E[T]. First note that a class t customer is in service with probability
density value ρ_t. Then note that a class t customer has a fixed service time of t seconds,
so that the average remaining processing time equals t/2. Consequently, we have

E[T] = \int_0^\infty \frac{t}{2}\,\rho_t = \int_0^\infty \frac{t}{2}\,\lambda t f_S(t)\,dt = \frac{\lambda}{2}\int_0^\infty t^2 f_S(t)\,dt = \frac{\lambda E[S^2]}{2} = \rho E[Y],    (6.23)

where Y is the residual service time. The conditional average waiting time for a class
t customer, denoted E[W_t], now simply equals the average remaining processing time,
divided by (1 − β_t) and (1 − β_{t−h}) (h → 0), in a similar way as in (6.9). In this case,
however, β_t and β_{t−h} are the same since h → 0, so that we obtain:

E[W_t] = \frac{E[T]}{(1 - \beta_t)^2} = \frac{\lambda E[S^2]}{2(1 - \beta_t)^2}.    (6.24)

This result is known as Phipps' formula [233]. This conditional waiting time can be used
to calculate the unconditional waiting time as follows:

E[W] = \int_0^\infty E[W_t] f_S(t)\,dt.    (6.25)

[Figure 6.2 plots the conditional average waiting time E[W_t] under SJN against the job
length t, together with the constant FCFS average waiting time, for the M|U|1 queue of
Example 6.5.]

Figure 6.2: Comparing SJN and FCFS scheduling for the M|U|1 queue

As in the discrete case, also in the continuous case there exists a conservation law. It takes
a form completely analogous to the one seen before:

\int_0^\infty \rho_t E[W_t] = \int_0^\infty \lambda t f_S(t) E[W_t]\,dt = \frac{\rho \lambda E[S^2]}{2(1-\rho)} = \rho E[W_{FCFS}].    (6.26)

Example 6.5. FCFS and SJN scheduling in an M|U|1 queue.


Let us now compare the average waiting times of two M|U|1 queues: one with FCFS scheduling
and one with SJN scheduling. Let the service times be uniformly distributed on [0, 1]
and let us assume that λ = 1.0. Consequently, ρ = λE[S] = 1/2. Furthermore, we have
E[S^2] = 1/3. Using f_S(t) we can easily calculate β_t as follows:

\beta_t = \lambda \int_0^t s f_S(s)\,ds = \frac{t^2}{2}, \quad 0 \leq t \leq 1.    (6.27)

For the FCFS scheduling case, we can simply apply the PK-formula, which yields

E[W_{FCFS}] = \frac{\lambda E[S^2]}{2(1-\rho)} = \frac{1}{3}.    (6.28)

For the SJN scheduling we find

E[W_t] = \frac{\lambda E[S^2]}{2(1-\beta_t)^2} = \frac{1/3}{2(1-t^2/2)^2} = \frac{1}{6(1-t^2/2)^2}, \quad 0 \leq t \leq 1.    (6.29)

In Figure 6.2 we show the two curves for the (conditional) average waiting times. As can
be observed, SJN is advantageous for short jobs; the price to be paid is the longer expected
waiting time for long jobs.
It is also possible to calculate the unconditional waiting time for SJN scheduling, simply
by unconditioning E[W_t] according to (6.25):

E[W_{SJN}] = \int_0^1 E[W_t] f_S(t)\,dt = \int_0^1 \frac{dt}{6(1-t^2/2)^2}    (6.30)

           = \left[ \frac{\mathrm{arctanh}(t/\sqrt{2})}{6\sqrt{2}} - \frac{t}{6(t^2-2)} \right]_{t=0}^{t=1} = 0.2705.    (6.31)

As can be seen, the unconditional average waiting time for SJN scheduling is smaller than
for FCFS scheduling (note that we have used a formula manipulation package to establish
the last equality).    □
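
The unconditional SJN waiting time of Example 6.5 can also be obtained by straightforward
numerical integration of (6.25), which avoids the formula manipulation package; the sketch
below (illustrative only) uses a simple midpoint rule.

    def sjn_wait_uniform(lam=1.0, n=200_000):
        """Unconditional mean waiting time under SJN for the M|U|1 queue of
        Example 6.5 (service uniform on [0,1]), by numerically integrating
        E[W_t] f_S(t) over t with the midpoint rule."""
        es2 = 1.0 / 3.0
        total = 0.0
        for i in range(n):
            t = (i + 0.5) / n              # f_S(t) = 1 on [0, 1]
            beta_t = lam * t * t / 2.0     # lam * int_0^t s ds
            total += lam * es2 / (2.0 * (1.0 - beta_t) ** 2)
        return total / n

    print(sjn_wait_uniform())   # ~ 0.2705, below the FCFS value 1/3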

A disadvantage of SJN scheduling is that the required service times have to be known
in advance. As an example of a system where this is the case, consider a communica-
tion system in which packets are queued for transmission. The service times (the packet
transmission times) are then readily known, in which case SJN scheduling can be fruit-
fully employed. In general purpose computer systems, however, such knowledge will not
be available. Indications of job durations provided by the system users are generally very
unreliable and therefore not a solid basis for a scheduling discipline. One way to overcome
this problem is to quit (and later restart) jobs that last longer than the durations indicated
by the users. This prevents the users being overly optimistic about their job lengths; esti-
mating too tightly will result in necessary reruns of jobs and even higher effective response
times.

6.4 Round robin scheduling


Although SJN scheduling improves the performance of short jobs, there is still a non-zero
probability that long jobs are taken into service. In fact, in any stable system, all jobs
finally receive service. An unlucky short job can therefore experience a long delay if the job
in service happens to be a rather long one. In such a case the preemption of the long job
will help the small job significantly, whereas the long job will not be delayed that much,
relative to its own long service time.
A very special preemptive scheduling discipline is round robin (RR) scheduling. With
RR scheduling all the customers in the queue receive service on a turn-by-turn basis. Per

turn, the customers receive a maximum quantum or time-slice of Q time units of service. If
the customer completes during such a short service period, it leaves the system. Otherwise,
it rejoins the queue at the end. Clearly, RR is only possible when it makes sense to work on
jobs little by little. In time-shared or multi-programmed computer systems this is possible
(and normally practiced): after having received an amount of CPU time, the status of a job
can be saved. At a later stage the work on that job can be resumed. For communication
networks a similar scheme can also be employed: large application packets from various
sources are then split into mini-packets (segmentation). These can then be transmitted
on a turn-by-turn basis, e.g., in a cyclic way. The result is a multiplexing scheme at the
mini-packet level.
Below, we outline the derivation of the response time E[R_k] for customers requiring k
quanta, as given in [59]. We assume that the actual service time of a customer is an integer
number of quanta Q and that the probability that a customer requires i quanta is
geometrically distributed with parameter σ:

g_i = \Pr\{\mbox{customer length} = iQ\} = (1 - \sigma)\sigma^{i-1}, \quad i = 1, 2, \ldots.    (6.32)

For the first and second moment of the service time distribution, we then find

E[S] = \sum_{i=1}^{\infty} (iQ) g_i = \frac{Q}{1-\sigma},    (6.33)

and

E[S^2] = \sum_{i=1}^{\infty} (iQ)^2 g_i = \frac{(1+\sigma)Q^2}{(1-\sigma)^2}.    (6.34)

The analysis then proceeds to compute

E[R_k] = \sum_{i=0}^{\infty} p_i E[R_k \mid i],    (6.35)

where p_i is the steady-state probability of having i customers in the queue (at arrival
instances), and E[R_k | i] is the expected response time for a job requiring k quanta of service
when upon its arrival i jobs are already queued. A recursive expression for E[R_k | i] is then
derived which, using the summation (6.35), reduces to a recursive expression for E[R_k],
which can be solved to a direct expression (as we have seen for priority queues). For a
customer requiring only a single quantum of service (k = 1), we immediately see

E[R_1] = \rho \frac{Q}{2} + E[N_q] Q + Q.    (6.36)

The first term is the remaining time for the service quantum currently given to the customer
in service, if any at all (this explains the factor ρ); the second term is the waiting time for

all the quanta of service given to the customers queued in front of the arriving customer;
the third term is the quantum for the arriving customer itself. Note that ρ is defined as
λE[S] = λQ/(1−σ) here. Furthermore, E[N_q] is computed using the normal M|G|1 result,
however, with the first and second moment of the service time as given above:

E[N_q] = \frac{(1+\sigma)\rho^2}{2(1-\rho)}.    (6.37)

Knowing E[Ri], we can compute E[R k] as follows (without derivation) :

w&cl = q&l + XE[&]+oE[N]-& l-@-l,


1-CY
k = 1,2, * * * ,
(6.38)
where Q = XQ + 0.

6.5 Processor sharing scheduling


Let us now focus on the case that the quantum Q in the RR scheduling case of Section 6.4
approaches zero. However, we adjust σ so that E[S] = Q/(1 − σ) is kept constant. In
doing so, the number of required quanta per job goes to infinity. Under the assumption
that the switching between processes does not take any time, we obtain what is called
processor sharing (PS) scheduling. In PS scheduling all the jobs in the queue are processed
quasi-simultaneously. When there are n jobs in the queue, every job receives a fraction
1/n of the total processing capacity.
We assume that a job of general length requires t seconds to complete, so that it requires
k = t/Q quanta. Substituting this value of k in (6.38) and keeping ρ (and thus E[S]) fixed,
we find the conditional expected response time of a job with length t:

E[R_{PS}(t)] = \frac{t}{1 - \rho},    (6.39)

that is, the time spent in the queueing system is linearly dependent on the service requirement.
A job's actual service time t is stretched by a factor (1 − ρ)^{-1} to its perceived
average response time. Note that this property holds without the need to have a priori
knowledge about the job length. This also illustrates why the PS and the RR scheduling
mechanisms are attractive to use in multiprogrammed computer systems. However, note
that the limiting case PS cannot be implemented efficiently; if the quanta are too small,
too much time will be spent on context switches. Therefore, in practice the RR discipline
is implemented.

Although there are no waiting customers in a processor sharing system, we can still
compute the difference between the average response and service time. Thus, the (virtual)
average waiting time for PS scheduling for customers of length t becomes:

E[W_{PS}(t)] = E[R_{PS}(t)] - t = \frac{\rho t}{1 - \rho}.    (6.40)

The unconditional average waiting time then becomes:

E[W_{PS}] = \int_0^\infty E[W_{PS}(t)] f_S(t)\,dt = \frac{\rho E[S]}{1 - \rho}.    (6.41)

We observe that this result is the same as for the M|M|1 queue! This, again, is remarkable.
The use of PS scheduling implies that the second moment of the service time distribution
does not play a role any more. One explanation for this is that since all jobs are
processed simultaneously in the PS case, a source of variance is removed from the queue
(short jobs do not suffer specifically from long jobs). Less variance normally implies better
performance, in this case a performance equal to that of the M|M|1 queue. On the other hand, if
we deal with deterministic service times, a source of variance is introduced: the processing
of a job now depends on how many other jobs there are in the queue, which was not
the case with FCFS scheduling. In this case, therefore, the performance becomes worse.
To make this more concrete, we can compare the PK-result (for FCFS scheduling) with
E[W_{PS}]. We see that

E[W_{FCFS}] > E[W_{PS}] \Leftrightarrow E[S^2] > 2E[S]^2 \Leftrightarrow C_S^2 = \frac{E[S^2] - E[S]^2}{E[S]^2} > 1.    (6.42)

Concluding, when C_S^2 > 1, PS is better than FCFS scheduling, otherwise FCFS is better.
When C_S^2 = 1, as is the case for exponential service times, PS and FCFS result in the same
mean waiting time.
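
The comparison in (6.42) is easily made concrete; the following sketch (not part of the
original text, with arbitrarily chosen moments) evaluates E[W_PS] and E[W_FCFS] for a
low-variance and a high-variance service time at the same utilisation.

    def fcfs_wait(lam, es, es2):
        """PK mean waiting time for M|G|1-FCFS."""
        return lam * es2 / (2 * (1 - lam * es))

    def ps_wait(lam, es):
        """(Virtual) mean waiting time under processor sharing, (6.41):
        E[W_PS] = rho E[S] / (1 - rho), independent of E[S^2]."""
        rho = lam * es
        return rho * es / (1 - rho)

    # lam = 0.5, E[S] = 1: PS gives 1.0 regardless of the service variance;
    # FCFS gives 0.5 for deterministic service (C_S^2 = 0) and 2.0 for a
    # highly variable service time with E[S^2] = 4 (C_S^2 = 3).
    print(ps_wait(0.5, 1.0), fcfs_wait(0.5, 1.0, 1.0), fcfs_wait(0.5, 1.0, 4.0))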

6.6 Scheduling based on elapsed processing time


Although PS scheduling and its practical counterpart RR scheduling exhibit very attractive
behaviour, there are circumstances in which one does want to decrease (or increase) the
service intensity for jobs that have already received a large amount of service or that still
need a large amount of service. We briefly touch upon three such disciplines in this
section.
The shortest-elapsed processing time (SEPT) scheduling discipline uses the elapsed processing
time (EPT) of a job as priority criterion; the lower the elapsed processing time, the higher the job

priority. Since SEPT scheduling disciplines do not require the actual processing times a
priori, they are easier to implement than SJN scheduling. Typically, SEPT scheduling
is organised using an RR mechanism with multiple queues. Upon arrival, jobs enter the
queue with the highest priority. After having received a quantum of service, they are either
finished, or rejoin a queue of one priority level less, until they are finally finished. Notice
that the number of queues is not known in advance. It has been derived that the expected
response time of a job of length t can be computed as follows:

E[R_{SEPT}(t)] = \frac{\frac{\lambda}{2}\left(\int_0^t s^2 f_S(s)\,ds + t^2(1 - F_S(t))\right)}{\left(1 - \beta_t - \lambda t (1 - F_S(t))\right)^2} + \frac{t}{1 - \beta_t - \lambda t (1 - F_S(t))},    (6.43)

where β_t = λ \int_0^t s f_S(s)\,ds is the utilisation due to customers with a length of at most t
seconds, as we have encountered before in the derivation for SJN scheduling.
A slightly different approach is taken in the shortest-remaining processing time (SRPT)
scheduling discipline. This variant of SJN scheduling takes exactly that customer into
service (with preemption, if necessary) which requires the smallest amount of time to be
completed. Of course, as with SJN scheduling, the service requirements of the jobs have
to be known in advance. It has been derived that

E[R_{SRPT}(t)] = \frac{\lambda\left(\int_0^t s^2\,dF_S(s) + t^2(1 - F_S(t))\right)}{2(1 - \beta_t)^2} + \int_0^t \frac{d\tau}{1 - \beta_\tau}.    (6.44)

Finally, we mention scheduling based on the so-called response ratio of a job, which is
defined as the ratio

    expected response time / expected service time.

By scheduling on the basis of this ratio (which is maintained and updated for all jobs in their
process record), where a higher ratio means a higher priority, all jobs will in the end perceive
a similar ratio. Notice that such a scheduling method can only be implemented using a
round robin mechanism of some form. Brinch Hansen derives the following approximation
for the expected waiting times for jobs of length t in such a system:

ml + $$), t 5 2JmlP,
E[Wm(t)] = [I _ (6.45)
(Ip)(ly+qq ’ t > wm,

where E[T] = λE[S^2]/2. Simulations have been conducted to validate the accuracy of this
approximation.

6.7 Further reading


Many books on performance evaluation address the M|G|1 queue only in combination with
FCFS and priority scheduling. The more advanced scheduling mechanisms are not treated
so often. Since the M|G|1 models discussed in this chapter find their application most
directly in the context of operating systems and processor scheduling, more information
on these queues can be found in the earlier books on operating systems, most notably in
Coffman and Denning [59] and Brinch Hansen [29]. Many more recent books on operating
systems do not address scheduling and queueing effects in so much detail.
The non-preemptive priority model was first developed by Cobham [57] and the shortest-
job-next model by Phipps [233]. Early work on round robin scheduling and processor sharing
was reported by Coffman, Kleinrock and Muntz [60, 61, 159, 161]. The SEPT and
SRPT scheduling disciplines have been studied by Schrage [256] and Miller and Schrage
[201]. The HRN scheduling mechanism has been studied by Brinch Hansen [29].

6.8 Exercises
6.1. The M|G|1 queue with three priority classes.
Reconsider Exercise 5.1, but now assign different non-preemptive priorities to the three
customer classes. Compute the expected waiting times per class when the priority ordering
is (from highest to lowest):

1. faculty-student-administrative;

2. faculty-administrative-student;

3. student-faculty-administrative.

6.2. MlGll with 2 priority classes.


Readdress the computer center example with interactive and batch jobs given in this chap-
ter.

1. Compute the expected waiting times for the two job classes when priority is given to
the batch jobs and check the validity of the conservation law.

2. Compute the expected waiting times for the two classes when preemptive priority is
given to the batch jobs.
132 6 MlGll queueing models with various scheduling disciplines

6.3. SJN versus FCFS scheduling.


Compare FCFS and SJN scheduling for an M]M] 1 queue. Take X = 1 and E[S] = 0.8 and
find expressions for ,Bt, EIIVt] and E[IVFCFS].

6.4. FCFS versus PS scheduling.


Compare FCFS and PS scheduling for an M]G/l queue with E[S] = 0.8 and var[S] E [0,2].
Draw graphs of E[W] a g ainst var[S] for both queueing models thereby varying X = 0.25,0.5
and 0.75.

6.5. A voice/data multiplexer.


Consider a multiplexer that has to merge one data stream and one real-time voice stream.
The server speed is assumed to be 3642 packets per second, which totals to 1.544 Mbps
with packets of 53 bytes; note that the size of a packet conforms to the size of an ATM
cell [225, 2511.
The data load forms a Poisson process and accounts for 40% of the multiplexer capacity,
i.e., Xd = 1456 packet/set (p/s). Th e voice load is increased from 1 to 13 calls where every
call brings an extra X, = 150 p/ s, also as a Poisson process. Packets have a deterministic
length.

1. Compute the first and second moment of the packet transmission time.

2. Express the overall arrival rate and utilisation as a function of the number of active
calls k.

3. Taking no priorities into account, compute the average response time E[R(lc)] for
both voice and data packets, given Ic active voice calls.

4. Giving priority to the delay sensitive voice packets, compute the average response
time for voice packets E[R,(k)] and the average response time for data packets
Jw&)1> given Ic voice calls.
5. Would a preemption mechanism increase the performance of the data stream ,,&a-
matically?
Performance of Computer Communication Systems: A Model-Based Approach.
Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons Ltd
ISBNs: 0-471-97228-2 (Hardback); 0-470-84192-3 (Electronic)

Chapter 7

GIMIl-FCFS and GIGI&FCFS


queueing models

I N Chapters 5 and 6 we have addressed queues with generally distributed service time
distributions, but still with Poisson arrivals. In this chapter we focus on queues with
more general interarrival time distributions. In Section 7.1 we address the GlMll queue,
the important “counterpart” of the MIGil q ueue. Then, in Section 7.2, we present an exact
result for the GIG11 q ueue. Since this result is more of theoretical than of any practical
interest, we conclude in Section 7.3 with a well-known approximate result for the GIG11
queue.
It should be noted that most of the exact results presented in this chapter are less
easy to apply in practical performance evaluation. For a particular subclass of GIGI 1
queueing models, namely those where the interarrival and service times are of phase-type,
easy-applicable computational techniques have been developed, known as matrix-geometric
techniques. These techniques will be studied in Chapter 8.

7.1 The GlMil queue


For the analysis of the G IMI 1 queue, one encounters similar problems as for the analysis of
the M/G/l q ueue. As before, the state of the GlMll queue consist of two parts, a continuous
and a discrete part, since the state is given by the number of customers in the system and
the time since the last arrival. Unfortunately, the intuitively appealing method of moments
followed for the MI G I1 q ueue, based on average values, the PASTA property and knowledge
about the residual service time cannot be used in this case because the PASTA property
does not hold. Instead, we will simply state a number of important results and discuss
134 7 GIMIl-FCFS and GIGIl-FCFS queueing models

their meaning in detail.


We first have to define some notation. The interarrival time distribution is denoted
FA (t), and has as first moment E[A] = l/X and as second moment E[A2]. The service
time distribution F,(t) = 1 - e- pt, t 2 0, E[S] = l/p, E[S2] = 2/p2 and Cg = 1.
As we will see below, we need knowledge about the full interarrival time distribution
and density; the latter is denoted f~(t) and has Laplace transform f;(s):

fi(s) = lrn e-“fL&)dt. (7.1)


The Laplace transforms of many interarrival time densities are easy to derive; some of
them are listed in Appendix B.
For the G]M]l queue, an embedded Markov chain approach can be employed, similar to
the one discussed in Section 5.3 for the M]G]l q ueue. A general two-dimensional Markov
chain can then be defined: ((N(t), V(t)), t 2 0), w h ere N(t) E IV denotes the number of
customers in the queueing station and V(t) E R denotes the time since the last arrival; we
have to keep track of this time since the interarrival times are not memoryless anymore.
Suitable embedding moments are now arrival instances, since then V(t) = 0. Taking this
further will reveal that the probability that an arriving job finds i jobs in the queueing
system is of the form
ri = (l-a)& i=O,l,***, P-4

with 0 < 0 < 1. Surprisingly, this is a geometric distribution, just as in the MIMI1 case.
Notice, however, that the base of this geometric distribution is not p but 0, which is defined
as the probability that the system is perceived not empty by an arriving customer. This
agrees with the fact that ~0 = 1 - 0, and thus we also have that 0 = 1 - ro = ‘&. ri.
Notice that in general 0 # p; only in case of Poisson arrivals are these two quantities the
same, thus reflecting the PASTA property. As a result of this, for general arrival processes,
the probability that an arriving customer finds the queue non-empty differs from p. The
long-term probability that the queue is not empty, however, remains equal to p.
Knowing the probabilities ri, we can compute the expected waiting time for a customer
as
co
E[W] = C riiE[S], (7.3)
i=O

by noting that for a customer arriving at a station with i customers in it, which happens
with probability ri, i services of average length E[S] have to be performed before the
arriving customer is being served. Notice that we use the memoryless property of the
7.1 The GlMll queue 135

service time distribution here. Continuing our computation, we find:

E[W] = 2 r&?qS] = F(l - a)a%E[S] = aE[S](l - 0) 2 iai-r


i=o i=o i=O

= OE[S](l - 0) F. $(Oi) = OE[S](l - O)$ (gOi)


i-0

= aE[S](l - 0,; ($-) = s. (74


This is again a remarkable result since it has exactly the same form as the M]M] 1 result
for the expected waiting time, however, with p replaced by 0. Hence, we can view the
G]M] 1 queue as an M ]M 11queue in which p is replaced by 0 through some transformation.
Without proof we now state that o can be derived from the following nonlinear equation
involving the Laplace transform of the interarrival time density:

This fixed-point equation can be understood as the transformation of p in 0. Unfortunately,


(7.5) can most often not be solved explicitly. We therefore usually have to employ the
following (straightforward) fixed-point iteration. Pick a first guess for CT,denoted a(‘). It
can be proven that 0 < 0 < 1 as long as p = XE[S] < 1, so a first guess could be o(l) = p.
Then, we compute ac2) = ji(p(l - &))), and so on, until we find a a@) such that (7.5)
is sufficiently well satisfied. Notice that cr = 1 is always a solution of (7.5)) however, this
solution is not valid as it would not result in a proper density fi.

Example 7.1. Poisson arrivals.


In case we have Poisson arrivals, fA(t) = Xemxt, and the above results should reduce to
results we already know for the MIMI 1 queue. We find as Laplace transform
t=cm x
-“tXemxtdt= --e x -(s+X)t -
(7.6)
s+x t=o s+X’
so that (7.5) becomes:

x
* (0 - l)(/Qa - A) = 0. (7.7)
c7= /Q(l - a) + x
The solution 0 = 1 is not valid; it would not result in a proper density for ri, Therefore,
the result is 0 = X/p, which equals our expectation since for the MIMI 1 queue, the queue
length distribution at arrival instances equals the steady-state queue length distribution.
cl
136 7 GlMll-FCFS and GlGll-FCFS queueing models

Example 7.2. The DIM11 queue.


For the DIM]1 q ueue, we have FA(t) = l(t 2 l/X), that is, arrivals take place exactly
every l/X time units. The density f~(t) is a Dirac impulse at t = l/X (see Appendix B).
As Laplace transform we find fi(s) = e-8/X, so that we have to obtain 0 from

This nonlinear equation cannot be solved explicitly, so that we have to resort to the fixed-
point iteration scheme.
As a numerical example, consider the case where E[S] = 1.0 and where X increases
from 0.05 to 0.95, i.e.,p increases from 5% to 95%. In Table 7.1 we compare the results
for the D(M(1 queue with those of the MIMI1 queue. As performance measure of interest,
we have chosen the average waiting time. As can be observed, the deterministic arrivals
have a positive effect on the performance since E[W] is smaller in the deterministic arrivals
case.
Also observe that 0 approaches p as p increases. As CThas taken over the role of p
in the expression for the mean waiting time, we see that for small utilisations (when CJis
much smaller than p) the waiting time is much lower than in the corresponding MIMI1
case. For larger utilisations the effect of having deterministic arrivals instead of Poisson
arrivals becomes relatively less important.
To conclude this example, we can state that with respect to man waiting times the
DIM] 1 queue can be seen as an MIMI 1 queue with reduced utilisation 0 where 0 follows
from (7.8). cl

Since we have knowledge about the number of customers seen by an arriving customer,
we can express the waiting time distribution for such an arriving customer as well. If
n customers are present upon its arrival, there are n services to be performed, before the
arriving customer is taken into service. Since all the services are of exponentially distributed
length, the arriving customer perceives an Erlang-n waiting time distribution (denoted
FE, (t)) with probability r,. This is similar to what we have already seen in Chapter 4,
where we computed the response time distribution for the M(M(1 queue. Summing over
all possible numbers of customers present at arrivals, and weighting with the appropriate
occurrence probabilities, we find

Pr{W 5 t} = E r,F~,(t) = g(l - c+PFEn(t)


n=O n=O

= (1 - 0) g on (I- epPtz $I.!)


n=O i=o *
7.1 The GlMll queue 137

P 0 wv WV
0.05 2.06 x lo-’ 2.06 x lo-’ 5.26 x 1O-2
0.15 1.28 x 1O-3 1.29 x 1O-3 1.77 x 10-l
0.25 1.98 x 1O-2 2.02 x 1O-2 3.33 x 10-l
0.35 7.02 x 1O-2 7.55 x 1O-2 5.39 x 10-l
0.45 1.52 x 10-l 1.79 x 10-l 8.18 x 10-i
0.55 2.61 x 10-l 3.83 x 10-l 1.22
0.65 3.93 x 10-l 6.48 x 10-l 1.86
0.75 5.46 x 10-l 1.20 3.00
0.85 7.16 x 10-l 2.52 5.67
0.95 9.01 x 10-l 9.17 1.90 x lOi

Table 7.1: Comparing the expected waiting time E[W] of a DIM]1 queue with an MIMI1
queue

= 1 - (1 - ++ E L$z 5 fy
i=o ’ n=i+l
i #+l
= 1- (l-+-+$-G
i=O *
DC)(pcq
= 1 - ae-@ C 7
i=O ’
= 1 - ae-p(l-a)t, t 2 0.
(7.9)
In a similar way, we find for the response time distribution:

Pr{R 5 t} = 1 - e-h(l-a)t, t 2 0. (7.10)


Both these results are again similar to those obtained previously for the MIMI1 queue (see
Section 4.4); the only difference is that 0 takes over the role of p.
Prom the waiting time density fw(t) = a,~(1 - a)e-p(l-a)t, we can also derive the
average waiting time as we have seen before.

Example 7.3. The Hypo-2IM/l queue.


In the Hypo-2]M]l queue services take an exponentially distributed time (here, we take
138 7 GIMIl-FCFS and G 1G (l-FCFS queueing models

E[S] = l/p = 1). Th e interarrival times can be considered to consist of two exponential
phases; only after an exponentially distributed time with rate X1 and an exponentially
distributed time with rate X2 an arrival takes place (here we assume that Xr = 2 and
X2 = 1). Thus, the mean interarrival time E[A] = l/2 + l/l = 1.5 and the utilisation
p = 2/3. From A ppendix A we know that

fA(t) = $$- (emXzt - eexlt), t 2 0. (7.11)


1 2

For the Laplace transform we find (with X1 = 2 and X2 = 1):

h x2
-- (7.12)
fxs) = (x,(s)2 + s) - (s + 2;(s + 1)’
The fixed-point equation becomes:

= u + u3 - 5a2 + 60 - 2 = 0. (7.13)
(l-0+2)2(1-0+1)
This third-order polynomial can be solved easily by noting that 0 = 1 must be a solution
to it (check this!) so that we can divide it by (0 - 1) yielding

a2 - 40 + 2 = 0. (7.14)

This quadratic equation has solutions 0 = 2 f &. Since we are looking for a solution
in the range (0, l), the only valid solution we find is CY= 2 - fi z 0.586. Note that
D<P z 0.667. Thus, we find E[W] = (2 - fi)/(fi - 1) = 1.42. The waiting time
distribution has the form

Fw(t) = 1 - (2 - JZ),-(J2P1)t, t 2 0, (7.15)

and is depicted in Figure 7.1. Notice the ‘jump” at t = 0; it corresponds to the fact that
there is a non-zero probability of not having to wait at all. cl

Example 7.4. Waiting time distribution in the EklMll queue.


We will now study the influence of the variance of the arrival process for a single server
queue with memoryless services. We assume E[S] = l/p = 1 and p = X. For an Erlang-k
interarrival time distribution with rate parameter kX, the mean interarrival time equals
E[A] = Ic/lcX = l/X and the Laplace transform equals

(7.16)
7.2 The GIG11 queue 139

1.4 ;;lv(t)
-
1.2

1.0

Fw(t) O-8
0.6
/
0.4

0.2

0.0
0 2 4 6 8 10
t

Figure 7.1: The waiting time distribution in a Hypo-2IMIl queue (p = 0.667)

The fixed-point iteration to solve becomes

(7.17)

In Figure 7.2 we show the value of 0 as a function of p for various values of k. As can
be observed, we always have 0 5 p. Moreover, the larger k, the more D deviates from p.
Hence, for a given utilisation p in the EklMll queue, a larger k implies a smaller utilisation
cr in the MIMI1 q ueue. Clearly, increasing k removes variance from the model and thus
improves the performance. cl

7.2 The GIG11 queue


For the GI Cl 1 queue, explicit results for mean performance measures are even more difficult
to obtain. The main difficulty lies in the complex state space of the stochastic process un-
derlying this queueing system which, apart from the discrete number of customers present,
also consists of continuous components for both the remaining service time and the remain-
ing interarrival time. Even an embedding approach as has been followed for the MlGll
and the G (M I1 queue is therefore not possible anymore. Still though, an important gen-
eral result known as Lindley’s integral equation can be obtained quite easily; its practical
applicability is, however, limited.
140 7 GIMIl-FCFS and GIGIl-FCFS queueing models

1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
P

Figure 7.2: Values of o as a function of p for k = 1,2,9 (top to bottom) in the Ek[M[l
queue

As before, we assume that we deal with independent and identically distributed inter-
arrival and service times, with distributions (and densities) J”(t) (fA(t)) and R’s(t) (fs(t)),
respectively. Moments and (squared) coefficients of variation are denoted as usual.
Let us now try to express the waiting time perceived by the (n + 1)-th customer in
terms of the waiting time of the n-th customer. For that purpose, let r, denote the arrival
time for the n-th customer, S, the service time and W, the waiting time perceived by the
n-th customer. Furthermore, we can define A, = r, - 7,-r, so that A, is the interarrival
time between the n-th and the (n - l)-th arrival. Notice that the random variables Sn
and A, are in fact independent of n; they are only governed by the interarrival and service
time distributions. We have to distinguish two cases now (see also Figure 7.3):

(a) the (n + 1)-th customer finds a busy system: the sum of the service and waiting
time of the n-th customer is more than the time between the arrival of the n-th and
(n + 1)-th customer, and we have

W n+l = W, + Sn - A,+l, if Wn + Sn 2 &+I;

(b) the (n + l)-th customer finds an empty system: the sum of the service and waiting
time of the n-th customer is less than the time between the arrival of the n-th and
7.2 The GIG11 queue 141

n-l n n-l
departures jl-!--J -----I;, ) t
server
It
I WI =4n
t(---------
M-1

arrivals ----__
i- A n+l -i
(------------>I
Wn+l
I
I
I
Gt
n n+l n n+l

(4 (b)

Figure 7.3: Two cases for the evolution of a GIG11 system: (a) system non-empty upon an
arrival, and (b) system empty upon an arrival

(n + l)-th customer, and we have

Wn+l = 0, if Wn + Sn 2 &+I-

The equations for these two cases describe the evolution of the GIG11 system. By intro-
ducing a new random variable U, = S, - An+l, we can rewrite them as

Wn + un, Wn+Un L O,
Wn+l = maX{W, + Un,0) = o (7.18)
{ ’ Wn+Un 50.

The random variable U, measures the difference in interarrival and service time of the n-th
and (n + l)-th customer. For stability of the GIG11 q ueue, it should have an expectation
smaller than 0, meaning that, on average, the interarrival time is larger than the service
time. If we know the distribution of U,, we can calculate the distribution of W,. To start,
we have to compute
Pr{U, 6 U} = Pr{S, - A,+1 5 u}. (7.19)
Since u can be both negative and positive, we have to distinguish between these two cases.
In Figure 7.4 we show the two possible cases. On the x- and y-axis we have drawn S and
A (since our arguments are valid for all n, we can drop the subscript n). If u 2 0, the area
that signifies the events “S-A < u” is the shaded area in Figure 7.4(a) over which we have
to integrate. If u 5 0 we have to integrate over the shaded area in Figure 7.4(b). Since
142 7 GlMl l-FCFS and GI GI l-FCFS queueing models

04

Figure 7.4: The two cases to be distinguished when computing Fu(u)

A and S are independent random variables, we have fA,s(a, s) = fA (a)fs (s). In summary,
we have

so” so” fS(s).fA(+a ds + s,” Js:, f&)fA(a)da ds, U 2 0,


Pr{U 5 u} = (7.20)
~oi h;, fds).fA(a)da ds, u 5 0.

Note that in case u 2 0, the first integral reduces to J’s(u).


Now that we have obtained the distribution of U, we can derive the waiting time
distribution Fw,(t). Clearly, we have FwJt) = 0, for t 5 0. For t > 0, we have from
(7.18):

F&+1(t) = Pr{W,+l 5 t} = Pr{W, + U, 5 t}


= O”Pr{U, 5 t - w]IV, = w}dF~,,(w), (7.21)
J0
where we have conditioned on IV, taking a specific value w, which happens to be the
case with probability density dF~,(w). We omit the subscripts n; this can be done under
the assumption that the system is stable, i.e.,p < 1, in which case, in steady state, all
customers experience the same waiting time distribution. Because the random variables
U = S - A and W are independent, we can rewrite the above conditional probability as
follows:
Fw(t) = /” Pr{U 5 t - w}dFw(w) =
0
Jm
FU(t - w)dFw(w).
0
(7.22)
Combining the above two results, we obtain Lindley’s integral equation:

Fv(t) = 6 FU(t- w)dFdw)?: 5 i, (7.23)


7
7.3 Approximate results for the G/ GI 1 queue 143

Lindley’s integral equation is a fundamental result for the GIG] 1 queue. As this equation
is implicit in Fw (t) , we still do not have an explicit expression for Fw(t) or for E[W]. Un-
fortunately, such expressions are not readily available. They require the solution of (7.23)
which is a complicated task. Various approaches have been proposed for that purpose,
however, they all require fairly advanced mathematics and go beyond the scope of this
book.

7.3 Approximate results for the GIGI 1 queue


To do practical performance evaluations for GIGI 1 queues various approximate results have
been derived. It would go beyond the scope of this chapter to give a complete overview
of all these approximations. Instead we restrict ourselves to a prominent approximation
proposed by Kramer and Langenbach-Belz [164]. Adhering to the same notation as before,
we have:
E[W] = (Ci + cg> (7.24)

where

(7.25)

Notice that this result reduces to the exact result for the MIMI1 and the M]G]l queue.

Example 7.5. The Hypo-2JMIl queue revisited.


We now use the KLB-approximation to compute E[W] for the Hypo-2]M(l queue we ad-
dressed before. Since we deal with exponential services, we have Cs = 1. We can compute
the variance of the hypo-exponential distribution as the sum of the variances of its expo-
nent i al phases. We find 0; = f and we thus have Cl = ;/(i)” = 1 = 0.556. We then
compute g(O.667,0.556,1.000) = 0.959. Using this in (7.24) we find E[W] FZ 1.492 which
is only 5% off the exact value we have derived before. cl

7.4 Further reading


In contrast to M]G]l queues, the theory of queues of G]M]l and GIG]1 type is not often
addressed in manuscripts on performance evaluation of computer and communication sys-
tems. A notable exception is Kleinrock [160, Chapters 6 and 81, on which most of this
chapter is based; note that Kleinrock treats the GIG] 1 queue as “advanced material”. A
144 7 GIMIl-FCFS and GIG(l-FCFS queueing models

very thorough overview of single server queues of a wide variety, including the ones ad-
dressed in this chapter, can be found in the book by Cohen [62]. Lindley published his
integral equation already in 1952 [185]! Kendall published the embedded Markov chain
approach for the G]M] 1 queue in 1951 [155]. Many approximation schemes do exist for
G]M]l and M]G]l queues; we only addressed the KLB-approximation [164].

7.5 Exercises
7.1. The HzjMI1 queue.
For hyperexponential interarrival times, the Laplace transform of the interarrival density
is given as:
(7.26)

Take E[S] = 1, and evaluate o for increasing values of l/E[A] = X = pXr+ (1 - p)X2, and
for various values of p and Xi. How does 0 relate to p when C& > 1. Compare the exact
value for E[W] with the Kramer and Langenbach-Belz approximation.

7.2. The IPPlMll queue.


Derive the fixed-point equation for o in case of IPP-arrivals (see Chapter 4).

7.3. Comparison with the DIM11 queue.


Derive the average waiting time for the D ]M] 1 queue with the Kramer and Langenbach-
Belz approximation and compare it with the exact results in case E[S] = 1, for X =
0.05,0.10, * * * ) 0.95.

7.4. Fu(u) for the MIMI1 queue (adapted from [160, Example 8.21).
Show that for the MIMI1 queue (with usual parameters X and p) Fu(u) can be computed
as:
l-&i
&J(u) = -!L,~u
{ p+x ’
Performance of Computer Communication Systems: A Model-Based Approach.
Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons Ltd
ISBNs: 0-471-97228-2 (Hardback); 0-470-84192-3 (Electronic)

Chapter 8

PHIPHI 1 queueing models

I N this
as special
efficient
chapter

numerical
we address
cases of GIGI 1 queues;
algorithms
the class of PHlPH
however,
known as matrix-geometric
11 queues.
due to the specific
methods
These queues can be seen
distributions
can be applied
involved,
for their
solution.
The aim of this chapter is not to present all the known material on PHiPHIl queues
and their matrix-geometric solution. Instead, our aim is to show the usefulness of matrix-
geometric methods, to provide insight into their operation, and to show that PHIPHIl
models, together with their efficient solution techniques, are a good alternative to GIG/l
queueing models.
This chapter is further organised as follows. In Section 8.1 we readdress the analysis
of the MIMI 1 queue in a “matrix-geometric way”. This is used as an introduction to the
matrix-geometric analysis of the PH IPH/ 1 q ueue in Section 8.2. Numerical algorithms that
play an important role in the matrix-geometric technique are discussed in Section 8.3. We
then discuss a few special cases in Section 8.4. In Section 8.5 we discuss the caudal curve
which plays an interesting role when studying the “tail behaviour” of queues. We finally
comment on additional queueing models that still allow for a matrix-geometric solution in
Section 8.6.

8.1 The MIMI1 queue


Consider an MIMI 1 queueing model with arrival rate X and service rate p. The Markov
chain underlying this queueing model is a simple birth-death process where the state vari-
able denotes the total number of packets in the queueing station. Since the state variable
is a scalar, such a process is sometimes called a scaZar state process. The generator matrix
146 8 PHIPHIl queueing models

Figure 8.1: The state-transition diagram for the MIMI1 queue

Q of the CTMC undc3’ lying ;he MIMI 1 queue has the following form:

-A x 0 ... ... ...


0 ... ...

Q=
I-L -0+/Q)
0 I-L
0
x
-(X+/A) x
-(X+/L)
0
x
**-
---
(8.1)
1.
P
. *.

Observe that, apart from the first column, all the columns are basically the same, except
that “they are shifted down one row”. We call the first column a boundary column and
the other ones repeating columns. Notice that the repeating structure can also be observed
nicely in the state-transition diagram of the CTMC underlying the MIMI1 queue, as given
in Figure 8.1. The balance equations for states i = 1,2, . . ., have the following form:

Pi@ + FL) = Pi-J + pi+1p. (8.2)

Now, let us assume that the steady-state probability for state i only depends on the prob-
ability pi-l and the rates between states i and i - 1. As those rates are constants, we
guess that there is a constant p, of yet unknown value, which defines that pi = ppi-1, for
i = 1,2,--a, or equivalently, pi = pipe, for i = 1,2,. . . . Substituting this in the global
balance equation for the repeating portion of the CTMC (i = 1,2,. ss):

p&x + /J) = pl$flx + PoPitlk (8.3)

Since all steady-state probabilities depend via the multiplicative factor p on po, we have
to assume that p > 0, otherwise all pi would be equal to 0. In doing so, we may divide the
above equation by pi-‘, so that we obtain

pop@ + p) = pox + PoP2P * Po(PP2 - (A + cL)P + 4 = 0. (8.4)

The latter is true when either po = 0 which does not make sense because in that case
all the pi = 0, or when the quadratic equation in p evaluates to 0. This is true for two
8.1 The MIMI1 queue 147

values of p: p = X/p or p = 1. The latter solution cannot be accepted due to the fact that
a proper normalisation of all the probabilities requires p < 1, otherwise xi pi # 1. We
thus conclude that p = X/p. Using this result, we can solve the “boundary” of the global
balance equations as follows. We have

POX = PllJ * pox = pppo = pox. (8.5)

This equality is always true and does not yield us the value of po. We therefore use the
normalisation equation:

fJPi=l*p~~/+=&=I,
1-p
(8.6)
i=O i=O

provided that p < 1. We therefore conclude that po = 1 - p, and, consequently, pi =


(1 - p)pi, i E N. This result is of course well-known; however, the method by which we
derived it here differs from the methods we used earlier.
We need the fact that p < 1 in order for the geometric sum to be finite. The requirement
p < 1 also exactly expresses a stability requirement for the queue: on average, the rate of
incoming jobs must be smaller than the rate at which jobs can be served. In the state-
transition diagram, this property can be interpreted as follows: the rate to the next higher
state must be smaller than the rate to the next lower state.
Once the steady-state probabilities pi (i = 0, 1,. . .) are known, we can derive various
performance metrics, such as:

l the average number of jobs in the queue: E[N] = CEO ipi = 6;

l the average response time (via Little’s law): E[R] = y = $, with the average
service time E[S] = l/p;

l the probability of at least Ic jobs in the queue: B(k) = Cz”=, pi = pk.

In summary, to evaluate the MIMI 1 queue, we have gone through 4 steps:

1. Guessing a solution based on the repetitive CTMC structure;

2. Substituting this solution in the repeating part of the global balance equations to
derive the multiplicative constant p;

3. Computing the solution of the “boundary probabilities” by solving the corresponding


part of the global balance equations, thereby using the normalisation equation and
the result from Step 2;
148 8 PHIPHIl queueing models

4. Computing performance measures of interest.

In the next section we will use this 4-step approach to evaluate more complex queueing
systems.

8.2 The PHIPHIl queue


In Section 8.2.1 we present a structured description of the CTMC underlying a PHIPHIl
queue. In Section 8.2.2 we then proceed with the matrix-geometric solution. Stability
issues are discussed in Section 8.2.3 and explicit expressions for performance measures of
interest are discussed in Section 8.2.4.

8.2.1 A structured description of the CTMC


Now, consider the PHIPHIl queue, the generalisation of the MIMI1 queue in which both
the service and the interarrival times have a phase-type distribution (see also Chapter 3
and Appendix A). In this case, the state of the underlying CTMC is not totally described
by the number of jobs in the queueing system. Part of the state description now is the
phase of the arrival process, and the phase of the job in service, if any. Consequently, the
state is a vector of three elements. The underlying Markov chain is therefore referred to
as a vector state process.
Suppose that the representation of the arrival process is given by (a, T), where T is
an m, x m, matrix, and that the representation of the service process is given by (p, S),
where S is an m, x m, matrix. Ordered lexicographically, the states of the Markov chain
underlying this PHIPHIl queueing system are of the form (n, a, s) where n is the number of
jobs in the queue, a E { 1, . . . , m,} is the phase of the arrival process, and s E { 1, . . . , m,}
is the phase of the service process:

Z= {(O,lJ), (OJ, l>,-, (0-b 1>,(LL1),-~, (1,1,ms),(1,2,1),...,(1,2,ms),


- - -, (1, m,, I), f - - , (l,m,, m,), * - * , (2, mu,w), - * - , (~,md%)).
All the states that have the same number of jobs in the queue are said to belong to one
level. For example, level i, i = 1,2,. . ., corresponds to the set of states

Level 0 consists of the states ((0, 1, l), (0,2, l), . . . , (0, m,, 1)). Apart from some irregu-
larities at the boundary, the matrix Q shows great similarity with the generator matrices
8.2 The PHIPHIl queue 149

we have seen for birth-death queueing models. In particular, we observe that Q is block-
tridiagonal. The upper-diagonal blocks describe the transitions from a certain level to the
next higher level: they correspond to arrivals. The lower-diagonal blocks describe the tran-
sitions from a certain level to the next lower level: they correspond to service completions.
The diagonal blocks describe transitions internal to a level causing arrivals nor departures,
that is, they correspond to phase changes in the arrival or service process. Viewing the
levels as “super states” the matrix Q in fact describes a birth-death process on them. For
this reason vector state processes are most often called quasi-birth-death models (QBDs).
Returning to the specific case at hand, the generator matrix Q of the underlying Markov
chain has the following structure:
/ Boo Bol 0 0 - -a
Blo Al A0 0 ---
Q= 0 (8-V
0

The square matrices Ai are of size mams x mams and have the following form:

Al = (T@I) + (I@S),
AZ = I@@‘/?-’ (8.8)
where the binary operator @ is used to represent the tensor or Kronecker product of two
matrices (see Appendix B). The matrix A0 describes the transitions to the next higher
level and includes the component To which indicates the rate at which the arrival process
completes. The factor a accounts for the possible change in phase in the next arrival
interval directly starting afterwards. Similarly, A2 describes the rate at which services
complete (factor so) multiplied by the vector -,8 in order to account for the starting phase
of the next service epoch. The matrix Al describes changes in arrival or service process
phase within a single level. The matrices Bij are all differently dimensioned and have the
following form:

Boo = T (size m, x m,),

Bol = Toa @I-p (size m, x mams),


BIO = I @ so (size mams x m,). (8.9)
Matrix Boo describes state changes internal to level 0; because there-is no job in the queue
at level 0, only changes in the arrival process need to be kept track of, which explains its
150 8 PHIPHIl queueing models

equality to T. Matrix B o1 has a similar structure to A 0; however, as it represents the


arrival of the first job after the system has been empty for some time, it needs an extra
factor taking into account the phase at which the next service is started, which explains the
factor -,0. Similarly, B 1o much resembles A,; however, as this matrix represents transitions
to the empty system, no new service epochs can start, so that a factor ,0 - is missing.

Example 8.1. The EzlH211 queue (I).


Consider a queueing model with a single server and with Erlang-2 interarrival times and
hyper-exponentially distributed service times. Notice that this model is a special case of
the GIG11 queue, which we cannot solve. We use the following PH-type representation of
a 2-stage Erlang distribution:

T= (-,“” TiA), To= ( ;A) ande=(l,O), (8.10)

that is, we have two exponential phases which are visited one after another. If both phases
have been passed, an arrival takes place and the next interarrival period begins (a renewal
process). We take the following representation of a 2-stage hyper-exponential service time
distribution:

(8.11)

that is, with probability l/4 phase 1 is taken, with mean length l/pi and with probability
3/4 phase 2 is taken, with mean length l/pz.
The CTMC describing this queueing model is a QBD process where the number of
states per level is 4 (2 x 2; two phase possibilities for the service as well as for the arrival
process). When the system is empty, only the arrival process can be in one of its two
states, hence, the first level (level 0) only has two states. The state transition diagram is
depicted in Figure 8.2. As can be observed, transitions related to services always point to
states one level lower. Notice that at the end of a service period, the next service period is
started; the probabilities -,8 are used for that purpose by multiplying them with the service
rates pi. The first phase in the arrival process always leads to a state transition within
a level; the completion of the second phase in the arrival process leads to a next higher
level. Notice that only at the arrival of the first customer (transition from level 0 to level
1) the service time distribution is chosen (with probability ,& take rate pi). For arrivals at
a non-empty queue the choice of the service time distribution is postponed until the next
customer has completed its service.
8.2 The PHIPHIl queue 151

Figure 8.2: State transition diagram for the QBD underlying the E21Hzll queue

From Figure 8.2 the generator matrix for this queueing system can readily be obtained.
Instead, one could also apply the definitions of the block matrices to arrive at:

Boo = T =

Blo=I@So= , Ao=~o~@I=
i 0 000 I
-(p1+ 2X)
I 0 2x
2x 0

0
00 '

0 +2 + 24 0 2x
Al =
0 +1+ 2X) 0
0 0 492 + 2X)

and (8.12)
L 00
t32
CL1 3p1
3~2
0 ~200
I-L1 00
3/-h
3~2 I *
152 8 PHIPHIl queueing models

8.2.2 Matrix-geometric solution


We now proceed to solve the CTMC for its steady-state probabilities in a way that closely
resembles the approach followed for the MIMI 1 queue. First, let zi denote the vector
of lexicographically ordered steady-state probabilities of states in level i. Then, for i =
1,2, * * .) we have

The arguments that led to the simple geometric form for the steady-state probabilities
in an MIM 11 queue apply again here; however, we now have to take levels of states as
neighbours between which probability flows. That is, the probability of residence in level
i (z;) depends only on the probability of residence in level i - 1 (Zi-i)r for those levels i
which are already in the repeating portion of the CTMC, i.e., for i = 2,3, - - -. Since the
rates between neighbouring levels of states are constants, we hope that the relation we
are aiming at can be expressed as a multiplicative constant between gi and zi-1. Note,
however, that since we are dealing with vectors now, this constant needs to be a square
matrix R of size mams x mams. If we assume that

it follows that the steady-state probability vectors Xi must have the form

(8.15)

Notice that we “go back in the recursion” to level 1 here, and not to level 0 as in the MIMI1
case. This difference is due to the fact that in the M 1M) 1 case we had only one boundary
level, whereas we have two here. The length of the vectors z;, i = 1,2, ++., is m,m,. As in
the MIMI 1 case for p, the square matrix R of size mams x mams follows from a quadratic
equation obtained by substituting the assumed geometric form in the repeating portion of
the global balance equation:

which, after rewriting the vectors pi in terms of z1 yields

z,(R2A2 + RIAl + R”Ao) = 0. (8.17)

This equation can only be true when either z i = 0, or when the quadratic equation within
parentheses equals 0. For the same reasons as mentioned when discussing the MI MI 1
8.2 The PHIPHIl queue 153

queue, the latter must be the case, and the matrix R thus follows from the following
matrix quadratic equation:

R2A2 + R’A1 + R”Ao = 0. (8.18)

We discuss the actual computation of R from this quadratic equation in Section 8.3; for
the time being we assume that we can do so.
To start the recursive relation (8.15) the following boundary equations must be solved
to obtain go and z,:

=0,and (z~,z~,z~)~ (8.19)

By the fact that z2A2 = (z,R)A 2 we can rewrite the right-hand equation as

(8.20)

so that we obtain the following boundary equations:

(8.21)

As can be observed, the length of z. = (~(~,r,o),... ,~(o,~,,~)) is m,. Also, since (8.21) is
not of full rank, the normalisation equation has to be used to arrive at a unique solution:

-&il=go~+‘&l=zol+gl gol+z,(I-R)-ll= 1. (8.22)


i=o i=l

This equation might be integrated in (8.21) to yield one system of linear equations although
this might not be most attractive to do from a numerical point of view (see Chapter 15).

8.2.3 Stability issues


Let us now return to the question under which conditions (8.18) can indeed be solved and
lead to a unique matrix R.
First recall from the MI MI 1 queue that such a queue is stable as long as X < p.
Translated to the state-transition diagram, this inequality can be interpreted as follows.
As long as the “drift” to higher level states is smaller than the “drift” to lower level states,
the system is stable: A < p + p = X/,Y < 1 and a steady-state solution exists.
154 8 PHIPHIl queueing models

A similar reasoning can be followed when we deal with QBDs. Let us denote A =
A0 + Al + A,; this matrix can be interpreted as a Markov generator matrix which describes
the transition behaviour within a level. Transitions between levels, normally described by
matrices A0 and Aa, are simply “looped back” to their originating level. Now, define the
vector sir to be the steady-state probability vector belonging to the CTMC with generator
matrix A. E follows from EA = 0 under the normalising condition xi ri = 1. Thus,
given presence in some repeating level j, the probability of being in the i-th state of that
level, i.e., in state (j, i), equals xi. Given this probability, the total “drift” to a next higher
level can be expressed as xi ni x1 Al:,), , where A:$, is the (i, I)-th element of matrix A,-,.
Similarly, the drift to the next lower level can be expressed as Ci ni CI A&. Notice that
in both these expressions, the rates to the next higher (lower) level are weighted by their
relative occurrence. Now, as long as the drift to next higher levels, i.e., xi ri XI A::,{,, is
smaller than the drift to a next lower level, i.e., Ci ri Cl A[$,, we have a stable system. In
matrix notation, the required inequality for stability then becomes:

~JA~L< EAJ. (8.23)

Once the matrices Ai have been derived, this inequality can easily be verified.

8.2.4 Performance measures


Once the matrix R and the boundary vectors z. and zl have been obtained, various
interesting performance metrics can be derived:

l Arrival and service process characteristics:

- the average interarrival time E[A] = --aT-‘L; we define X = l/E[A];


- the average service time E[S] = -/$3-‘1; we define p = l/E[S].

l Average performance measures:

- the utilisation E[N,] = p = X/F;


- the average number of jobs in the queue and server:

E[N] = C igil= 21 ( x iRip ) 1


\i=l

= zl-& ((I - R)-1 - I) 1= &I - R)-2& (8.24)


8.3 Numerical asPects 155

- the average number of jobs in the queue E[N,] = E[N] - p;

- the average response time E[R] = E[N]/X;

- the average waiting time E[W] = E[N,]/X.

l Detailed performance measures:

- the probability zi of having i jobs in the queue: xi = z;l;

- the probability Bk of having at least k jobs in the queue, which for k > 1 can
be written as:

L= s,R”-l(I - R)-‘l. (8.25)

For all these measures, the similarity with the corresponding measures for the M] M ]1 queue
is striking.

8.3 Numerical aspects


In order to really compute steady-state probabilities we have to solve for the matrix R
as well as for the (normalised) boundary equations. We discuss the numerical solution
of the boundary equations in Section 8.3.1. Only in special cases can R be computed
explicitly (see Section 8.4). Normally, R is computed iteratively. We present a simple and
straightforward substitution algorithm in Section 8.3.2, after which we present the recently
developed logarithmic reduction algorithm in Section 8.3.3.

8.3.1 Solving the boundary equations


The calculation of the boundary probability vectors from the (normalised) system of linear
equations can be done using the standard techniques for solving linear systems of equations,
as will be described in Chapter 15. As the number of boundary equations is normally not
too large, direct approaches such as Gaussian elimination are in general well-suited. Notice
that we need to compute R before we can solve the linear system-of boundary equations
(8.21).
156 8 PHIPHIl queueing models

8.3.2 A successive substitution algorithm


From the quadratic matrix equation

F(R) = R2A2 + RIAl + R”Ao = 0, (8.26)

we can derive
R = -(A0 + R2A2)A;‘. (8.27)

Now, taking as a first guess R( 0) = 0, we obtain the next guess R( 1) = -AoA;l. We


obtain successive better approximations of R as follows:

R(k + 1) = -(A, + R2(k)A2)A;‘, k = 1,2,. e.. (8.28)

The iteration is stopped when 1IF(R) I I < E. It has been shown that the sequence {R(k), k =
0, 1, * * *} is entry-wise nondecreasing, and that it converges monotonically to the matrix R.
The following remark is of importance when implementing this method in an efficient
way. First notice that R(1) is defined as -AoALl. In many cases we will see only a few
non-zero entries in the vectors a and To. Since A0 = (2’ . a) @ I (it describes arrivals
to the queue) we see that rows containing just O’s in A0 will yield similar zero-rows in
R. Further iteration steps do not change this situation for the matrix R. Therefore, the
matrix R will often have many rows completely zero. This fact can be exploited in the
multiplications by simply skipping rows which contain only zero entries.
For each iteration step, 3 matrix-multiplications have to be made. If R is of size
N x N, then we require 0(3N3) operations per iteration step. The number of iteration
steps heavily depends on the utilisation of the queue under study.

Example 8.2. The E21H211 queue (II).


Consider again the E21H211 q ueue in which we assume the following numerical values:
X = 8, ~1 = 20 and ,92 = 10. Due to this choice, p = 0.7. The matrix R is calculated with
the above simple algorithm, with precision 10m8in 72 steps and equals:

0.0000 0.0000 0.0000 0.0000


R= ’ (8.29)
0.4678 0.0970 0.2229 0.1217
0.0415 0.7879 0.0521 0.6247

First notice that the matrix R has two non-zero rows; these correspond to those situations
in which arrivals can take place (at each level). There are 4 states per level, but in only two
8.3 Numerical aspects 157

2x P Jw,I SS LR
2 0.0875 0.0023 6 3
4 0.1750 0.0160 11 4
6 0.2625 0.0492 15 4
8 0.3500 0.1122 21 4
10 0.4375 0.2200 28 5
12 0.5250 0.3985 38 5
14 0.6125 0.6963 52 5
16 0.7000 1.2186 72 6
18 0.7875 2.2432 100 6
20 0.8750 4.8242 100 7
22 0.9581 17.5829 100 9

Table 8.1: Analysis results for the E2 IH211 queue

of them can arrivals take place, namely in those states where the Erlang-2 arrival process
is in its second phase.
To compute the boundary probabilities, we have to solve (8.21) under the normalising
condition (8.22). We find go = (0.1089,0.1911) f rom which we quickly verify that p =
1 - z,l = 0.70.
In Table 8.1 we show some more analysis results. We vary X from 1 through 11. Note
the increase in the average queue length with increasing utilisation. We tabulated the
required number of steps in the successive substitution procedure to compute R (column
SS). Note that when p increases the iterative solution of R slows down tremendously; since
we stopped the iterative procedure after 100 steps, the last three rows do not represent
accurate results. cl

We finally note that a number of more efficient methods (based on successive substi-
tutions) have been developed than the one presented here. These methods rely on the
computation of the intermediate matrices G and U, which will be introduced and related
to R in the next section. However, since these algorithms are also outperformed by the
recently developed logarithmic reduction algorithm, we do not discuss them any further
here; for further details, refer to [172, 1881.
158 8 PHIPHIl queueing models

8.3.3 The logarithmic reduction algorithm


Recently, Latouche and Ramaswami developed the logarithmic reduction (LR) algorithm
to compute R [173]. It is regarded as the most efficient to date. The algorithm is based on
the following three matrix-quadratic equations (here given in the case of a discrete-time
QBD):

G = A2+A1G+AoG2,
R = A,+RA, +R2A2,
U = Al +Ao(I - U)-lA2. (8.30)

The three unknown matrices (all of size N x N) have the following interpretation:

l the element gi,j of the matrix G is the probability that, starting from state (1, i), the
QBD process eventually visits level 0 in state (0,j);

l the element ui,j of the matrix U is the taboo probabdity that, starting from state
(1, i), the QBD eventually returns to level 1 by visiting state (1, j), under taboo of
level 0, that is, without visiting level 0 in between;

l the element ri,j of the matrix R can be interpreted as the expected number of visits
into state (1, j), starting from state (0, i), until the first return to level 0.

Once one of these matrices is known, the other ones can be readily computed. For instance,
having computed G, we can derive U = Al + AoG, and R = -AOUml.
When the QBD is recurrent, the matrix G is stochastic. We can therefore iteratively
compute successive estimates for G, denoted as G(lc), Ic = 1,2, *es, until the row sums
equal (almost) 1, that is, until Ill- GI]I < E, where E is a prespecified accuracy criterion.
We have: limk,, G( Ic) = G.
The (i,j)-th element of G(lc), denoted as gi,j(lc), now has the following interpretation
[173]: gi j(lc) is the probability that, starting from state (1, i), the QBD visits level 0 in state
(0, j), under taboo of levels Ic+ 1 and beyond. Clearly, to compute G accurately, we should
make Ic large, so that even very long queue lengths are allowed for (so that effectively the
taboo is not there). The algorithms developed in the past all compute successive matrices
G(lc) by increasing lc one-at-a-time. Especially for queueing models with high utilisation,
large queue lengths occur quite often, thus requiring many iteration steps (as witnessed by
Table 8.1). In contrast, the new LR algorithm doubles the value of k in every step, thus
reaching a far smaller effective taboo, given a fixed number of iteration steps. In practice,
8.3 Numerical aspects 159

1. B. := -AllA,; B2 := -A,lA2
2. G:=B,; T:=Bo
3. while ]]I.-Gl]( > E do
4. D := BoBz + BzBo
5. B. := (I - D)-lB;
6. Bz := (I - D)-lB;
7. G:=G+TB2
8. T := TBo
9. od
10. U := Al + AoG
II. R := -A&-l

Figure 8.3: The logarithmic-reduction algorithm for continuous-time QBDs

20 iteration steps suffice most of the time, thus allowing the QBD to have upsurges to
levels as high as 220 (about 1 million).
Without going into the detailed derivations, we now present the basic equations to
compute G (note that we treat the case of a continuous-time QBD here; in [172, 173, 2681,
the discrete-time variant is presented):

(8.31)

‘64
with
Be(O) = -AlAo, and B2(0) = -AlA2, (8.32)

and
Bi(k + 1) = (I - (Bo(k)B&) + B2(k)Bo(k)))-‘B;(k), i = o, 2. (8.33)
Dik)
The corresponding algorithm is given in Figure 8.3. The first line represent the initialisation
according to (8.32), and in the second line G is set equal to the first term in (8.31); note
that when k = 0, T(0) = I so that G = B2. Then, until G truly is a stochastic matrix,
successive terms for D(k) and the matrices B;(k) are computed in lines 4-6, according to
(8.33). In line 7 the current term is added to G and in line 8 the.new value for T(k) is
computed. When the iteration ends, U and R are finally computed.
160 8 PHIPHIl q ueueing models

Regarding the complexity of the LR algorithm, it can be shown that the number of
operations per iteration step equals O($N3). Th is is about eight times more than in the
case of the successive substitution method. However, the strength of the LR algorithm lies
in the fact that it does need far less iterations; roughly speaking, when the successive sub-
stitution method requires Ic, iteration steps to reach an accuracy E, then the LR-algorithm
requires only O(log, k,) steps. Recently, a slightly faster iteration step has been proposed
by Wagner et al. [282]; their algorithm requires O(yn3) per iteration steps and the same
number of steps as the LR algorithm.

Example 8.3. The E21H211 queue (III).


In the last column of Table 8.1 we show the number of steps required by the LR algorithm
to reach the same accuracy as the SS algorithm. The increase in performance is indeed
tremendous. The number of steps still increases for increasing p, however, only very slowly.
Needing more than 20 iterations to compute R has been an exception in the case studies
we have performed so far. 0

Example 8.4. The IPPIEkIl queue (I).


To model the bursty nature of many arrival processes we can use an IPP which has two
states: ‘<off” (0) and “on” (1) (see also Section 3.9.1). The transition rate from state i to
state j equals yi,j. In state i, jobs are generated as a Poisson process with rate Xi. Such
an IPP is completely described by the matrix T:

-Yo,1 To,1
T= (8.34)
Yl,O -ho + X) ’

and the vector (a, a~) = (0, 1,O). Note that due to the choice of a!, we have established
that after an arrival (an absorption) the PH distribution stays in state 1 so that the arrival
process remains in a burst. An important parameter for an IPP is the burstiness b. It is
defined as the ratio between the arrival rate in a burst and the overall average arrival rate:

arrival rate in a burst x To,1 + Yl,O


b= (8.35)
overall average arrival rate = ~YO,l/(YO,l + Yl,O) = "lo,1 *

We will first study the influence of the burstiness of the IPP on the average number of
customers queued while keeping the utilisation constant. This allows us to investigate
quantitatively the influence of b on the performance. At the same time, we vary the
number of Erlang phases k in the service time distribution from 1 to 10. We address four
different burstiness levels: b = 1, 2.75, 5.5 and 11; we kept ~o,i = 10 and varied yi,-, from
8.3 Numerical aspects 161

E[N,] for given b


k b=l b=2.75 b=5.5 b=ll
1 0.375 1.759 2.859 3.496
2 0.284 1.595 2.741 3.394
3 0.253 1.537 2.702 3.361
4 0.237 1.508 2.682 3.344
5 0.227 1.489 2.670 3.334
6 0.221 1.477 2.662 3.328
7 0.216 1.468 2.657 3.323
8 0.213 1.462 2.652 3.320
9 0.210 1.457 2.649 3.317
10 0.208 1.452 2.646 3.315

Table 8.2: The average queue length E[N,] in the IPP IEI, (1 queue, for increasing number
of phases k and different burstiness factors (p = 0.4545)

17.5 via 45 to 100 in the latter three cases. We want to keep the utilisation equal to 45.45%
in all cases and we therefore adjusted X to 45.45, 125, 250 and 500 respectively; the service
rate p = 100. Notice that the case b = 1 corresponds to the case where the IPP has been
replaced by a normal Poisson arrival process (or an IPP with yl,o = 0).
The results are depicted in Table 8.2. We observe that for all burstiness levels the aver-
age number of queued customers decreases as the service times become more deterministic,
i.e., as Ic increases. Notice, however, that the relative decrease becomes less pronounced for
higher values of b. This implies that for (very) bursty sources, the service time distribu-
tion (or its second moment) plays a less important role, as far as the average performance
variables are concerned. Looking at the arrival rates, we see that for the larger burstiness
values, the queue is overloaded when the arrival process is active; this causes the enormous
increase in average number of customers queued for larger values of b (and so, via Little’s
law, in the average waiting time).
In Figure 8.4 we show the effect of increasing t>henumber of Erlang stages in the service
time distribution from 1 to 50. The figure shows the expected queue length when the
arrival process has b = 11 and utilisation 72.73%. Notice that altering Ic to a value around
5 already makes the services very deterministic. Adding more phases does not change the
performance measure of interest very much any more; it does change the number of states
per level. The number of rows and columns in R equals 2k. 0
162 8 PHIPHIl queueing models

18.8

18.6
w,J
18.4

18.2

18.0
5 10 15 20 25 30 35 40 45 50
lc

Figure 8.4: E[N,] for the IPP/Ekll q ueue as a function of the number of phases k in the
Erlang service time distribution

8.4 A few special cases


We address a few often recurring special cases of the general theory we have addressed so
far. These, and a few other cases can also be found in [217, Section 3.21. We discuss the
M]PH]l queue in Section 8.4.1 because it allows for an explicit computation of R. We then
discuss the PH]M] m multi-server queue in Section 8.4.2.

8.4.1 The MIPHIl queue: an explicit expression for R


Consider an M IPH 11 queue in which the Poisson arrival rate equals X and in which the
service time distribution has representation (p,
- S) of order m,. We assume that the queue
is stable, i.e.,p = XE[S] = -X&S-‘1 < 1. The CTMC describing this queueing system
has states Z = (0) U {(i, j)li E IV+, j = 1, - - - , m,}. State 0 signifies the empty state and
state (i, j) is the state with i customers and the service process in phase j. The generator
matrix Q is then given as:
8.4 A few mecial cases 163

f-x x/f3 0 Q 4
so s -XI XI 0 ...

Q = ; ‘;’ ,,;I s yAI 1:: . (8.36)

0 ()- sop ...


f . . .-.*
/
The steady-state probability vector z = (20, z,, z,, . . .) is then given as

1-P, i = 0,
(8.37)
G= 1 (1 -p)pRi,
- i = 1,2,...,

where the matrix R is given as

R = X(X1 - XB - S)-‘, (8.38)

with fi = 1-p. It is worthwhile for the reader to investigate the validity of the above result
for the case of an exponential service time distribution.

8.4.2 The PHlMlm queue


As the next special case we consider an M-server queueing system with exponentially
distributed service times and a general PH-renewal process as arrival process. Although
for this queueing system no explicit expression for R exists, we include it here since it can
be regarded as a computationally attractive special case of the GIJM(m queueing system.
We assume that the interarrival distribution has a PH-representation (T, a) and that we
deal with an exponentially distributed service time with mean l/p. The generator matrix
describing the corresponding QBD is then given as (only non-zero blocks are shown):

T p$&! ... ...


p1 T - p1 Tog +a+
WI T - 2pI Tog
*. -. *.
Q= *. . . (8.39)
T - (m - 1)pI T0a
*. T -mpI Tog
md
*. *. T-mpI
WI
*.
164 8 PHIPHIl queueing models

All submatrices have their size equal to the size of T: m x m. The difference with the
PHIPHIl queue, is that we now have m boundary columns, instead of only 2. Therefore,
the geometric regime in the steady-state probability vectors per level now starts from level
m onwards:
gi =smFtpm, i = m,m+ l,..., (8.40)

where the matrix R again is the minimal nonnegative solution of the matrix quadratic
equation
R2A2 + RA1 + A,, = 0, (8.41)
where
A2 = mpl, Al = T - mpl, and A0 = T0 . a. (8.42)

The first m (boundary) probability vectors, 3, through z,-~, follow from the boundary
equations:

gi-,Toa + g;(T - ipI> + (i + l)p~~+~ = Q, i = 1,. . . , m - 2, (8.43)


z,-,T”~ + x,-,(T - (m - l)pI + mpR) = 0.
This system of linear equations is not of full rank, so we need the normalisation equation
to reach a unique solution:
03 m-2 co

i=O i=O i=m-1


m-2
= c zi.il + grnpl(I - R)-‘1 = 1. (8.44)

8.5 The caudal curve


A specially interesting result that can be obtained from the matrix R is the so-called caudal
curve, which is the graph of the largest Eigenvalue of R, denoted q, as a function of the
utilisation p. The name caudal curve stems from the Latin cauda, meaning tail, a name
that will become clear shortly. It has been shown that

(8.45)

This means that for large i the ratio of the relative amount of time spent at level i + 1
compared to that at level i is approximately equal to Q. A similar result holds for any
two corresponding elements of the probability vectors gi and zi+i* Recalling that a level
8.5 The caudal curve 165

corresponds to the set of states for which the number of customers in the queue is the
same, it is clear that (8.45) expresses the rate of decay of (the tail of) the queue length
distribution. In a similar way, the equality
Pr{N, > Ic} = hqk + o(qk), for Ic -+ co, (8.46)
holds, with h a nonnegative constant. So, we observe that q “rules” the rate of decrease of
the steady-state queue length distribution. Knowledge of q thus gives us insight into the
tail of the queues.
Only for very few queueing systems can the caudal curve be obtained with little effort.
For M]M]c queueing systems, we have R = (p), so that q(p) = p. We could regard an
M]M]c queueing system as a reference queueing system.
For &-I&I1 q ueueing systems, it can be derived that v(p) = p’, which implies that
q(p) 5 p. Intuitively, one might have expected the latter inequality. Erlang-r distributions
have a smaller coefficient of variation than exponential distributions. Less variance often
implies better performance (smaller waiting times and smaller queues). Thus, q can be
expected to be smaller than p because then there will be less probability mass for states
representing longer queue lengths.
In a similar way, for an Hz [MI 1 queueing system, the explicit solution for the cau-
da1 curve reveals that q(p) > p. By a suitable parameter choice for the 2-phase hyper-
exponential distribution, the caudal curve will increase very steeply to close to 1 in the
interval [0, h’), and then increase very slowly to 1 in the interval [h’, 11, for small h’ > 0
(see also [218]). Ag ain, this is intuitively appealing since the hyperexponential distribution
is known to introduce more randomness in the system, which often implies worse perfor-
mance (longer queues and longer waiting times). A value of q larger than p will cause a
shift in the queue length distribution towards states representing longer queues.
For more general queueing systems than those mentioned, the caudal curve q(p) can
best be computed numerically from R. A practical method for that purpose is the Power
method; it will be discussed after the following examples.

Example 8.5. The EzIH:!ll queue (IV).


We present the values of the largest Eigenvalue q of R as a function of p in Table 8.3. As
can be observed, for all utilisations, we have q < p. This means that the E2/H2]1 queue
is a very “friendly” system, in the sense that it has a smaller tendency than the M ]M 11
queue to build up long queues. Cl

Example 8.6. The IPPIEkIl queue (II).


When we address the caudal curve of the IPP 1Ek ]1 queue, we find that despite the rather
166 8 PHIPHIl queueing models

2x P rl 2x P 29 P 'I
1 0.044 0.00836 2 0.088 0.02882 3 0.131 0.05705
4 0.175 0.09057 5 0.219 0.12788 6 0.263 0.16800
7 0.306 0.21018 8 0.350 0.25406 9 0.394 0.29928
10 0.436 0.34560 11 0.481 0.38288 12 0.525 0.44097
13 0.569 0.48979 14 0.613 0.53927 15 0.656 0.58936
16 0.700 0.64000 17 0.744 0.69117 18 0.786 0.74282
19 0.831 0.79495 20 0.875 0.84752 21 0.919 0.90051

Table 8.3: The values of v against p for the EzIH2 11 queue

deterministic arrivals, the value of 7 is always at least as large as that of p. This implies
that in this queue, there is a clear tendency towards states with more customers queued.
As an example, when b = 11 and /C = 10, we find 7 = 0.9613 in case p = 0.7273. Note
that in both the examples the arrival process dominates the service process regarding the
influence on the caudal curve. q

Let us now return to the actual computation of the caudal curve. The Eigenvalues of
R are defined as those values X for which Rx = X: for any :. To find them all, we have
to find those values of X for which the determinant of (R - XI) equals zero. When doing
so, we have to solve the so-called characteristic polynomial, which is of the same order as
the number of rows (and columns) in R. If we have computed them all, we can select the
largest one.
Instead of computing all the Eigenvalues, we can also compute the largest one only,
via a numerical procedure known as the Power method. This method can be described as
follows. We choose an initial row-vector y” and successively compute

(k+l) = Ry(“-l). (8.47)


Y

Suppose that the matrix R has N Eigenvalues which can be ordered as follows: 1~1=
lrlll 2 Id 2 - - - 2 IqNI. We furthermore introduce an initial approximation vector y(O)
which can be written as a linear combination of the Eigenvectors vi corresponding to vi
(i = 1,. . . , N):

?J(O)= 5 Zigi. (8.48)


i=l
8.6 Other models with matrix-geometric solution 167

After lc iterations in the Power method, the resulting vector can be described as:

I!!ck)= Rky(o)
_ = 5i=l xiv;gi = qlk (8.49)

The smaller the ratios Iqi/qrl (i = 2, s. . , N), the faster the summation on the right-hand
side will turn to zero. For large k, what remains is the following approximation:

yCk) w Vkxil!i 7 (8.50)

so that the most-dominant Eigenvalue of R can be computed as the ratio of the j-th
element in two successive vectors ~(‘1:

(8.51)

In practice, the iteration is continued until two successive approximations of 7 differ less
than some predefined value 6 > 0. Since 0 < q < 1, the computation of -y(“) might lead to
loss of accuracy if Ic is large. To avoid this, the successive vectors -y(“) are often renormalised
during the iteration process as follows:

2(k+l) = +@-‘),
lk-1
(8.52)

where lk-1 = minj{yjk-‘) ). This does not change the result since in the quotient that
computes q these factors cancel.

8.6 Other models with matrix--geometric solution


The queues of PHIPHI 1 type we have addressed so far are not the only queueing models
that can be solved using matrix-geometric methods. In this section we will briefly address
other queueing models for which similar methods as those presented here can be applied.
First of all, we can use the QBD structure when we are dealing with queues of PHIPHIl
type, in a more general context. The QBD structure still exists when the arrival and ser-
vice process are no longer renewal processes but they are still Markovian. This can best be
understood by addressing an arrival process with multiple active modes; in each mode the
(Poisson) arrival process may have a different rate. Mode changes in the arrival process
can coincide with arrivals, but can also be dependent on another Markov chain, i.e., we
deal with a Poisson process of which the rate is modulated by an independent Markov
168 8 PHIPHIl q ueueing models

chain. At the service side, we can have a similar situation; we then often speak of a
multi-mode server. General QBD models are very important in studying the performance
aspects of communication systems subject to complex arrival streams and to systems with
server breakdowns and repairs. However, their specification at the level of the block ma-
trices constituting the Markov chain is cumbersome. Instead, one should use higher-level
mechanisms to construct these models; we will see an example of such an approach in
Chapter 17.
In the QBD models we have addressed in this chapter the state space has been un-
bounded in one direction. In other words, we have addressed system models with an
infinite buffer. QBD models on a finite state space can be studied with similar means. For
the MIPHIl and the PHlMll queue, the matrix R that needs to be computed is in fact
the same as the one that is computed in the unbounded case; the only difference lies in
another normalisation equation. For general QBDs on a finite state space, we have to deal
with two different boundaries of the state space (one corresponding to the empty system
and one corresponding to the completely filled system). This most naturally leads to two
second-order matrix equations that need to be solved, as well as two sets of boundary
equations. The state probabilities gi can then be computed as the sum of two geometric
terms.
A remark should be made here about another powerful solution method for queueing
models with a QBD structure known as the spectral expansion method. Using this method,
the global balance equations for the states in the repeating part of the CTMC are in-
terpreted as a matrix-vector difference equation of second order. To solve this difference
equation, a characteristic matrix polynomial has to be solved. Using an Eigenvalue analy-
sis, the probability vectors for each level can be written as a sum of weighted Eigenvectors;
the coefficients in the sum are given by powers of the Eigenvalues. Recent comparisons
with the logarithmic reduction algorithm show favourable performance for the spectral
expansion method in most cases [119].
Finally, queues of type G/Ml 1 and MlGll can be handled using matrix-geometric tech-
niques. Studying these queues at arrival and departure instances, the embedded Markov
chains have an upper and lower triangular form, respectively. In practice, this means that
we have to deal with matrices Ai (i E JV) in the repeating part of the Markov chain.
This then leads to the following non-linear equation that needs to be solved (for the MlGll
case) :
R=gRiAi. (8.53)
i=O

Special variants of the successive substitution and logarithmic reduction algorithm then
8.7 Further reading 169

have to be used to compute R. The solution of the boundary probabilities and the proba-
bility vectors zi remain as we have seen.

8.7 Further reading


Despite their great applicability and their numerically attractiveness, only a few books on
model-based performance evaluation of computer-communication systems address matrix-
geometric methods, all very concisely; most notably are [117, 152, 156, 2161. The tutorial
paper by Nelson provides a good introduction [215]. Background information can be found
in the books by Neuts [217, 2191 and in many mathematical journals; a seminal paper has
been written by Evans [85]. Matrix-geometric analyses of queues subject to complex arrival
processes, such as Markov-modulated arrival processes, can be found in [89, 1871. Surveys
on algorithms to solve for the matrix R can be found in [172, 1881. The logarithmic-
reduction algorithm has been published by Latouche and Ramaswami [173]. Information
on the caudal curve can be found in [218]. The spectral expansion method has been
advocated by Daigle and Lucantoni [71] and Mitrani et al. [203]; a comparison with the
logarithmic reduction method has been performed by Haverkort and Ost [119]. Finite
QBD models have been discussed by various authors as well. Gun and Makowski [114],
Bocharov and Naoumov [24] and Wagner et al. [282] p resent a matrix-geometric solution.
Chakka and Mitrani use the spectral expansion method also for these models [39] whereas
Ye and Li recently proposed a new (and fast) folding algorithm [293].

8.8 Exercises
8.1. The EklMll queue.
For the EklM]l queue with k > 1:

1. Draw the state-transition diagram.

2. Find the matrices Ai.

3. Given the specific form of the matrices Ai) what will be the form of R?

4. For k = 2, X = 10 (rate per arrival phase), ,X = 10 (service rate) and compute


E[N] for this queueing model. Compare your results with those obtained via the
Laplace-transform approach for the GlMll queue (see Chapter 7).
170 8 PHIPHIl queueing models

8.2. The IPPlMll queue.


Consider an IPP]M] 1 queue with arrival rate X, service rate /-Land on- and off-rates ~0,~
and YI,O.

1. Draw the state transition diagram.

2. Recognise the block matrices Ai and B;j.

3. Compute the matrix R for suitable numerical values of the model parameters.

4. Recognise the boundary blocks Bij and solve the boundary equations.

8.3. An MIMI1 queue with slow-starting server.


Consider an MIMI 1 queueing system in which the server only starts working when at least
T = 3 customers are queued. Once it is serving customers, it continues to do so until the
queue is empty. The job arrival rate X = 2 and the service rate ,Y = 3.

1. Define the state space Z of the QBD underlying this model.

2. Draw the state-transition diagram.

3. Recognise the QBD structure and the block matrices Ai and B;j.

4. Compute R. Does the value of R surprise you?

5. Compute the boundary probabilities.

6. Show that E[N] = 3 in this slow-starting MIMI1 queue. For a normal MIMI1 queue,
i.e., with T = 1, we would find that E[N] = 2. Explain the difference.

8.4. Two queues in series.


Consider a system that can be modelled as a tandem of two queues. At queue 1 jobs arrive
as a Poisson process with rate X and are served with rate pl. After service at queue 1, jobs
are transferred to queue 2. The service rate of queue 2 is p2. The number of customers that
can be held in queue 2 is limited to K (in queue 1 this number is not limited). Whenever
queue 2 is completely filled, the service process in queue 1 is stopped, in order to avoid a
customer ready at queue 1 being unable to move into queue 2. This form of blocking at
queue 1, due to a full successor queue is known as communication blocking. The model
sketched so far can be regarded as a QBD.
8.8 Exercises 171

1. Define the state space Z.

2. Draw the state-transition diagram.

3. Define the block matrices Ai and B;j.

8.5. Exponential polling systems with 2 stations.


Consider a two-station polling model (see also Chapter 9) with arrival, service and switch-
over rates Xi, pi and Si respectively (i = 1,2). The service strategy is exhaustive in both
stations. Furthermore, the queue at station 1 has infinite capacity, the queue at station 2
is bounded to K customers.

1. Define the state space Z of this Markovian polling model.

2. Draw the state-transition diagram for K = 1.

3. Indicate how a matrix-geometric solution can be performed (define the matrices Ai


and Bij).

4. Indicate the changes in the model and matrices when both stations are served ac-
cording to the l-limited strategy.
Performance of Computer Communication Systems: A Model-Based Approach.
Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons Ltd
ISBNs: 0-471-97228-2 (Hardback); 0-470-84192-3 (Electronic)

Chapter 9

Polling models

T
whether
HE principle of polling
tions. In early timed-sharing
they had any processing
is well-known
computers,
in many branches of computer
terminals
science applica-
were polled in order to investigate
to be done. These days, intelligent workstations access
file or computing servers via a shared communication medium that grants access using a
polling scheme. Also in other fields, e.g., manufacturing, logistics and maintenance, the
principle of polling is often encountered.
When trying to analyse systems that operate along some polling scheme, so-called
polling models are needed. In this chapter we provide a concise overview of the theory and
application of polling models. Although we do provide some mathematical derivations,
our main aim is to show how relatively simple models can be used, albeit sometimes
approximately, for the analysis of fairly complex systems.
This chapter is further organised as follows. In Section 9.1 we characterise polling
models and introduce notation and terminology. In Section 9.2 we address some important
general results for polling models. Symmetric and asymmetric count-based polling models
are addressed in Section 9.3 and 9.4 respectively. Using these models, the IBM token
ring system is analysed in Section 9.5. Time-based polling models, both symmetric and
asymmetric, are finally discussed in Section 9.6.

9.1 Characterisation of polling models


In polling models, there is a single server which visits (polls) a number of queues in some
predefined order. Customers arrive at the queues following some arrival process. Upon
visiting a particular queue, queued customers are being served according to some scheduling
strategy. After that, t&e server leaves the queue and visits the next queue. Going from
174 9 Polling models

one queue to another takes some time which is generally called the switch-over time.
In the above description a number of issues have deliberately been left unspecified. It
is these issues that, once specified, characterise the polling model. In particular, the visit
ordering of the server to the queues and the strategy being used to decide how long a
particular queue receives service before the server leaves, characterise the model. These
issues will be addressed in Section 9.1.2 and 9.1.3 respectively after some preliminary
notation and terminology has been introduced in Section 9.1.1.

9.1.1 Basic terminology


We will assume that we deal with a polling model with N stations, modelled by queues
Qi through QN. We use queue indices i, j E ( 1, . . . , N). At queue i customers arrive
according to a Poisson process with rate Xi. The mean and second moment of the service
requirement of customers arriving at queue i is E[Si] and E[ST] respectively. The total
offered load is given by p = Cz, pi, with pi = XJZ[Si]. The mean and variance of the time
needed by the server to switch from queue i to queue j are denoted &j and 62(i) respectively.
When the queues are assumed to be unbounded, under stability conditions, the through-
put of each queue equals the arrival rate of customers at each queue. The main performance
measure of interest is then the customer waiting time for queue i, i.e., Wi. Most analytic
models only provide insight into the average waiting time E[ Wi] . When the queues are
bounded, the throughput and blocking probability at the stations are also of interest; we
will come back to finite-buffer polling models in Chapter 16.

9.1.2 The visit order


We distinguish three different visit orders: a cyclic ordering, a Markovian ordering and an
ordering via a polling table.

l Cyclic polling. In a cyclic visiting scheme, after having served queue i, the server
continues to poll station i @ 1 where @ is the modulo-N addition operator such that
N @ 1 = 1. As a consequence of this deterministic visit ordering only “neighbouring”
switch-over times and variances are possibly non-zero, i.e., S;,j = 6(2)
;,j = 0 whenever
j # i @ 1. For ease of notation we set Si = 6i,+@i and 6c2) a = dj:Li. The mean
and variance of the total switch-over time, defined as the total time spent switching
during a cycle in which all stations are visited once, are given by n = CEi & and
A(“) = CE, 8j2) respectively. In Figure 9.1 we show a polling model with cyclic visit
ordering.
9.1 Characterisation of polling models 175

Ql /.
I/ \\

Figure 9.1: Basic polling model with cyclic visit ordering

Due to the fact that most, and especially the earlier, results on polling models as-
sumea the cyclic visit order, polling models are often called cyclic server models.
This, however, is a slight abuse of terminology: the class of polling models is larger,
as discussed below.

l Markovian polling. In a Markovian polling scheme, after having polled queue


i, the server switches to poll queue j with probability ~i,~. Since the probabilities
pi,j are ind ependent of the state of the polling model this form of polling is called
Markovian polling. The probabilities are gathered in an N x N matrix P. The mean
and variance of the total switch-over time are now defined as n = CL, C~=ipi,~&j
and AC21 = CL, CyJ=l pi,jb,(i’ respectively.
Notice that the degenerate case of Markovian polling in which pi,iei = 1 and pi,j =
0 (j # i @ 1) is equivalent to a polling model with a cyclic visit ordering.

l Tabular polling. Finally, an ordering via a polling table T = (Z’i, Tz, . . . , TM)
establishes a cyclic visit ordering of the server along the queues; however, these
cycles may contain multiple visits to the same queue. The server starts with visiting
queue QT1, then goes to QT2, etc. After having visited QTM the server visits QT~
again and a new cycle starts.
Typically, when the polling process is controlled in a centralised way, a polling table is
used, e.g., when station 1 actively controls the system, we have 2’ = (1,2,1,3,1,4, e. . ,
1, N - 1, 1, N). ,Also the scan-polling order can be observed quite frequently:
176 9 Polling models

T = (1,2,3,.*., N-l,N,N,N-l,N-2,- ,2, l), e.g., when the queues model disk
tracks that are visited by a moving disk head.
The mean and variance of the total switch-over time are now defined as A =
~;,M,I b&j, and AC21 = Cr=, ~~~‘T~~l
> respectively. Note that the @-operator is
now defined on {l,...,M}.

Whenever T = (1,2,-m-,N), i.e., M = N, a polling model with polling table T is


equivalent to a polling model with a cyclic visit ordering.

9.1.3 The scheduling strategy


The scheduling strategy defines how long or how many customers are served by the server
once it visits a particular queue. Two main streams in scheduling strategies can be distin-
guished: count-based scheduling and time-based scheduling.

Count-based scheduling. With count-based scheduling the maximum amount of service


that is granted during one visit of the server at a particular queue is based on the number
of customers served in the polling period. Among the well-known scheduling disciplines
are the following (for a more complete survey, see [286, Chapter 11):

l Exhaustive (E) : the server continuously serves a queue until it is empty;

l Gated (G) : the server only serves those customers that were already in the queue
at the time the service started (the polling instant);

l k-limited (k-L): each queue is served until it is emptied, or until Ic customers


have been served, whichever occurs first. The case where Ic = 1 is often mentioned
separately as it results in simpler models;

l Decrementing or semi-exhaustive (D) : when the server finds at least one cus-
tomer at the queue it starts serving the queue until there is one customer less in the
queue than at the polling instant;

l Bernoulli (B) : when the server finds at least one customer in the queue it serves
that customer; with probability bi E (0, l] an extra customer is served, after which,
again with probability bi another one is served, etc.;

l Binomial (Bi): when the server finds ki customers in queue i at the polling instant,
the number of customers served in the current service period is binomially distributed
with parameters ki and bi E (0, 11.
9.2 Cyclic polling: cycle time and conservation law 177

All of the above strategies are local, i.e., they are determined per queue. One can also
imagine global count-based strategies. For instance, the global-gated strategy marks all
jobs present at the beginning of a polling cycle. During that cycle all of those jobs are
served exhaustively. Jobs arriving during the current cycle are saved for the next cycle.

Time-based scheduling. With time-based scheduling the maximum amount of service


that is granted during one visit of the server at a particular queue is based on the time
already spent at that queue. Two basic variants exist:

l Local time-based: the server continues to serve a particular queue until either all
customers have been served or until some local timer, which has been started at the
polling instant, expires;

l Global time-based: the server continues to serve a particular queue until either
all customers have been served or until some global timer, which might have been
started when the server last left the queue, expires.

In fact, the first mechanism can be found in the IBM token ring (IEEE P802.5) [35] whereas
the second one can be found in FDDI [35, 277, 2841.

9.2 Cyclic polling: cycle time and conservation law


In this section we restrict ourselves to polling models with a cyclic visit order and a mixture
of count-based scheduling strategies (exhaustive, l-limited, gated and/or decrementing).
The mean cycle time E[C] is’ d efi ne d as the average time between two successive polling
instants at a particular queue. E[C] is independent of i, also for asymmetric systems. This
can easily be shown using the following conservation argument. Assuming that the service
discipline is exhaustive, one cycle consists, on average, of the servicing of all jobs plus the
total switch-over time. The latter component equals a = xi &. In one cycle all the jobs
that arrive at station i, i.e., X$[C] jobs, have to be served. This requires, for the i-th
station X$[C]E[Si] t ime units. Thus we have:

E[C] = 5 & + 5 X+?qC]E[SJ= a + E[C]p =+E[C] = &. (94


i=l i=l

This result is also valid when the service discipline is other than exhaustive. The average
service period E[Pi] for queue i can be derived as

E[ly = X~E[C]E[S~] = f$. (9.2)


178 9 Polling models

This equation follows from the fact that for stability reasons, on average, everything that
arrives in one cycle at station i, must be servable in one cycle. For the average time between
the departure of the server from station i and the next arrival at station i, the so-called
inter-visit time Ii, we have

E[IJ = E[C] - E[P,] = (’ lv~~” e (9.3)


Important to note is that upon the arrival of a job at station i, the average time until
the server reaches that station is not E[Ii]/2. Th is is due to the fact that Ii is a random
variable, and we thus have an example of the waiting time paradox. The average time
until the next server visit therefore equals the residual inter-visit time E[1:]/2E[Ii]. Notice
that, in general, an explicit expression for E[1!] is not available.
The (cyclic) polling models we address are generally not work conserving, that is, there
are situations in which there is work to be done (the queues are non empty) but in which
the server does no real work since it is switching from one queue to another. When
the switching times are zero, the polling model would have been work conserving and
Kleinrock’s conservation law would apply ([ 1601; see also Chapters 5 and 6 on M]G] 1
queues) :
N
x pi~[~i] = pCEl xiE[s3. (9.4)
i=l w - P>

Because piE[Wi] = XiE[Si] x E[N,,i]/Xi = E[AJq,i]E[Si], the left-hand side of (9.4) is often
called the amount of work in the system. Independent of how the queues are visited, this
amount always equals the steady-state amount of work in a model in which the service
order is FCFS (the right-hand side of (9.4)). If we have only one station (N = 1) and zero
switch-over times, we obtain a normal (work conserving) M]G]l queue, and the right-hand
side of (9.4) is just the expected waiting time in the M]G]l model. When we have only
one station with exhaustive service but now with positive switch-over times, we obtain a
queue with multiple server vacations.
When the model is not work conserving, that is, when the switch-over times are positive,
Kleinrock’s conservation law does not hold anymore. It has, however, been shown by
Boxma et al. [28, 27, 1121 that a so-called pseudo-conservation law still does hold. This
pseudo-conservation law is based on the principle of work decomposition:

V=V+Y, (9.5)

where v is a random variable indicating the steady-state amount of work in the model
with positive switch-over times, V is a random variable indicating the steady-state amount
9.2 Cyclic polling: cycle time and conservation law 179

of work in the model when the switch-over times are set to 0, and Y is a random vari-
able indicating the steady-state amount of work in the model at an arbitrary switch-over
instance. The principle of work decomposition is valid for cyclic polling models as well
as for polling models with Markovian routing or a polling table. V is totally independent
of the scheduling discipline, whereas Y and therefore ? are dependent on the scheduling.
Intuitively, one expects Y and ? to decrease if the switch-over times decrease, if the visit
order becomes more efficient or if the scheduling becomes “more exhaustive”. In particular,
for polling models with non-zero switch-over times (with cyclic, tabular or Markovian visit
ordering) a pseudo-conservation law of the following form applies:

When we are dealing with a cyclic polling order, one has:

C ;E[wi] + c ; (I- -=) E[Wi] + c ; (1 - A$ I;‘“) E[Wi] =


iEE,G iEL iED

= Cf!, XiE[Sf] Ac2) n(&J - Cz, pf) + n C~EG,L d n c&D ~ix~E[s~l


, (9.7)
2(1 - P> +2a+ 2P(l - P> P(l-PI - 2P(l - P>

where E, G, L, and D are the index sets of the queues with an exhaustive, a gated,
a l-limited and a decrementing scheduling discipline, respectively. Clearly, the pseudo-
conservation law expresses that the sum of the waiting times at the queues, weighted by
their relative utilisations (for E and G directly and with more ccmplex factors for L and
D) equals a constant.
The pseudo-conservation law does not give explicit expressions for the individual mean
waiting times since it is only one equation with as many unknowns as there are stations.
Nevertheless, it does provide insight into system operation and in the efficiency of schedul-
ing strategies. Also, it can be used as a basis for approximations or to verify simulation
results (see below).
It is interesting to study the stability conditions for cyclic server models. For cyclic
server models with an exhaustive or a gated service discipline a necessary and sufficient
condition is p < 1. For models with a l-limited service strategy, a necessary condition
can be derived as follows. The mean number of customers arriving at station i per cycle
equals XiE[C]. Th is number must be smaller than 1, as there is only 1 customer served
per cycle. Using the fact that E[C] = n/(1 - p), th e necessary stability condition equals
p + xin < 1, for all i. For models with a decrementing scheduling strategy a necessary
stability condition of the form p + Xi(l - pi)A < 1, for all i, can be derived.
180 9 Polling models

Example 9.1. A 2-station asymmetric polling model (I).


Consider an asymmetric polling model with 2 stations: station 1 has exhaustive scheduling
and station 2 l-limited scheduling. Furthermore, the following parameters apply: E[Si] =
0.4, E[$] = 0.32, Xi = 1, Si = 0.05 and Ji2) = 0 for i = 1,2. Clearly, the stability
conditions are satisfied so finite average waiting times do exist for both stations. We are
not in the position to compute E[IVi] and E[W2] directly; however, we can apply the
pseudo-conservation law, yielding the following relation between E[W,] and E[W2]:

E[W,] + kE[W2] = 3.7. (9.8)


This linear relation can be drawn in the E[IVi]-E[W,] pl ane; the exact solutions for E[Wi]
and E[W2] have to lie on this line. 0

9.3 Count-based symmetric cyclic polling models


When we address models in which all the scheduling disciplines and parameters are station
independent, we can obtain closed-form results for the average waiting times by using the
pseudo-conservation law, since in a fully symmetric system all the average waiting times
are equal to one another so that we are left with only one unknown in (9.7). We will not
use this approach here, since it does not provide us much insight into the actual system
operation. Instead, we will derive the expected waiting time in a fully symmetric exhaustive
scheduling polling model in an operational way, following the lines of the proofs for the
expected waiting times in the MIG]l and related queues as presented in Chapters 5 and 6.
For an exhaustive count-based symmetric polling model, the expected waiting time for
an arriving customer can be thought to consist of 4 components:

where the 4 components can be understood as follows:

1. An arriving customer will, due to the PASTA property, find another customer (at
some queue) in service with probability p. The remaining service time of this cus-
tomer equals E[S”]/2E[S].

2. Similarly, with probability 1 - p an arriving customer will find the server switching
from one queue to another. The remaining switch-over time equals S2/2S (notice
that Sc2) denotes the variance in switch-over times, whereas 62 denotes the second
moment of. the switch-over time here).
9.3 Count-based symmetric cyclic polling models 181

3. An arriving job arrives at any queue with equal probability, so on average (N - 1)/2
switch-overs, each of expected length 6, are needed for the server to arrive at the
particular queue (since the number of queues N is a constant, the waiting time
paradox does not apply).

4. Finally, upon arrival


of a customer, the steady-state amount of work in the system
equals NE[N,]E[S]. I n eq ui 1i b rium, this amount of work should be handled before
the randomly arriving customer is served.

Adding these 4 components, we have:

JqWl = (I- PI2 + + NE[N,]E[S]. (9.10)

Using Little’s law to rewrite E[N,] = XE[W], we obtain

(l-p)E[W] = (I-~)$+~c~+P~

+ E[W] =
s2
-+------- N - ’ 6 + NXE[S] w21 (9.11)
26 2(1-p) w - Puwl ’
We can rewrite the first two additive terms as follows:
(N - p)6
(9.12)
2(1 - P> ’
so that we have
h(2) NXE[S2] + 6(N - p)
E[WT]= 26 + (9.13)
W-d *
The subscript ‘E’ is added to indicate that the formula is valid for exhaustive scheduling.
Along the above lines, one can also derive the mean waiting time when all scheduling
strategies are of gated type:

c5c2) NXE[S2] + 6(N + p)


E[~G] = 26 + (9.14)
W-P) n
For the l-limited scheduling discipline we have:

S(2) NXE[S2] + 6(N + p) + NJd2)


E[JC,] = 26 + (9.15)
2(1-p- NM) ’

Finally, for the decrementing scheduling discipline we have:


dc2) NXE[S2](1 - XS) + (N - p)(6 + xhC2))
E[~D] = 26 + (9.16)
2(1 - p - M(N - p)) ’
182 9 Polling models

P E[WE] E[WG] E[WL] J?@%] p -@[WE] E[WG] B[WL] E[b]

0.05 0.6000 0.6053 0.6085 0.6031 0.55 1.711 1.833 2.089 1.931
0.15 0.7176 0.7353 0.7485 0.7302 0.65 2.314 2.500 3.070 2.793
0.25 0.8667 0.9000 0.9310 0.8953 0.75 3.400 3.700 5.286 4.690
0.35 1.062 1.115 1.179 1.119 0.85 5.933 6.500 15.00 12.27
0.45 1.327 1.409 1.535 1.428 0.95 18.60 20.50 - -

Table 9.1: E[W] in symmetric polling models with exhaustive, gated, l-limited, and decre-
menting scheduling

From these explicit formulae, the earlier derived stability conditions can also easily be seen;
they correspond to those traffic conditions where the right-hand denominator becomes zero.
From these expressions, it can also be observed that

and

E[WG] > E[WD] (at low load), and E[WG] < E[WD] (at high load). (9.18)

Example 9.2. Symmetric polling models: influence of scheduling.


In Table 9.1 we have tabulated the average waiting times for symmetric polling models
with exhaustive, gated, l-limited, and decrementing scheduling strategies for increasing
utilisation (established by increasing the arrival rate). The other parameters are: N = 10,
s = 0.1, JC2) = 0.01, E[S] = 1.0, and E[S2] = 1.0. The above inequalities can easily
be observed. Also notice that for p = 0.95, the l-limited and decrementing systems are
already overloaded. 0

The fact that the exhaustive scheduling discipline is the most efficient can easily be
understood. It simply does not spoil its time for switching purposes when there is still
work to do, i.e., when the queue it is serving still is not empty. The gated and decrement-
ing discipline, however, sometimes take time to switch when there are still customers in
the current queue. This counts even more for the l-limited case where every service is
effectively lengthened with the succeeding switch-over time. The fact that the amount of
switching overhead per customer is smallest with an exhaustive scheduling strategy does
not necessarily imply that it is also the best. From a fairness point of view, the other
9.4 Count-based asymmetric cyclic polling models 183

disciplines might be considered better since they prevent one station from totally hogging
the system.

Example 9.3. A 2-station asymmetric polling model (II).


Reconsider the asymmetric polling model addressed before. Since station 2 uses a l-limited
scheduling strategy, station 1 profits from this as it receives more opportunities to serve
customers. In fact, E[IVi] should be smaller in the mixed scheduling case than when both
stations would have exhaustive scheduling. In this latter symmetric case, however, we can
exactly compute the average waiting times: E[VVr,J = 1.75. So, in the asymmetric case
station 1 is expected to perform better, i.e., E[IVr] 5 1.75. This implies, by the pseudo-
conservation law derived for this example, that in the asymmetric case station 2 will suffer
more, i.e., EIIVZ] 2 3.9.
We can improve on the above bounds by considering the case where the arrival rate of
station 2 is set to zero. In that case, the model reduces to an M]G]l queue with exhaustive
service and multiple vacations (as seen from station 1) because at station 2 no jobs arrive.
Thus, after the queue in station 1 empties, the server switches to queue 2 and directly back
to queue 1. This switching can be interpreted as a vacation with average length 0.1 (two
switches of length 0.05). The variance of the switching (vacation) time is 0, so that we can
compute E[IVi] using (5.38) as follows:

qJ/j/q_ Jfw21I WS21 =-Oil+ o-32 (9.19)


2Jwq 20 - P> 2(1 - 0.4) = Oe317*

We thus have: 0.317 < E[IVi] 5 1.75 and, using the pseudo-conservation law again,
3.9 5 E[W2] 5 6.766. cl

9.4 Count-based asymmetric cyclic polling models


The analysis of asymmetric cyclic server models is much more complicated than the analysis
of symmetric models. We will present the exact analysis of an asymmetric cyclic polling
model with exhaustive service in Section 9.4.1 followed by a number of approximate results
derived by using the pseudo-conservation law in Section 9.4.2.

9.4.1 Exhaustive service: exact analysis


In the exhaustive asymmetric case the average waiting time E[IVi] perceived at station
i has the following form (see also (5.38) for the M]GJl queue with server vacations in
184 9 Polling models

Chapter 5):
E[WE,i] = -
JW3 XiE[Sf]
(9.20)
2E[Ii] + 2( 1 - pi) ’
where we recognise as the first term the residual inter-visit time, i.e., the average time it
costs for the server to arrive at the station. The second term is simply the MlGll waiting
time applied for station i only. Since the scheduling discipline is exhaustive, once the
server is at station i, we can analyse station i in isolation as if it were an MlGll queue.
The second term can readily be computed. E[li] d irectly follows from (9.3). The only
problem we have in evaluating E [W E,+.] is the determination of E[I,f]; it has been derived
by Ferguson and Aminetzah [88] as:

E[I?]2 = E[Ij22 + &!)I + ’ - pi c ri,j 7 (9.21)


P j#i

where the coefficients ri,j (i, j = 1,. . . , N) follow from a system of N2 linear equations:

j < i,

ri,j = j > i, (9.22)


j = i.

A similar solution exists for gated systems. For k-limited and decrementing systems such
solution schemes do not exist. For these cases, one has to resort to approximate solutions.

9.4.2 Some approximate results


For most asymmetric system models, exact results do not (yet) exist. For these cases, there
is a wide variety of approximate results, some of which we will present here. The pseudo-
conservation law plays an important role in the construction of these approximations, by
the use of the following three-step approach:

1. The expected waiting time for queue i, E[Wi], is expressed in terms of the expected
residual cycle time E[jZi] = E[If]/2E[&] (for th e congestion due to traffic at other
queues) and some local parameters (for the congestion at queue i, e.g., in the form of
an MlGll result). As a result, E[W.] is expressed as a function of known parameters
and the unknown parameter E[Ri], similar to (9.20).

2. It is then assumed that the residual inter-visit times are equal for all the stations,
i.e., E[&] = E[R], f or all i. This assumption has been shown to be less accurate
when the system becomes more asymmetric and when the variance of the switch-over
9.4 Count-based asymmetric cyclic polling models 185

times increases. The resulting expressions for E[Wi] are substituted in the pseudo-
conservation law in which, due to the assumption just made, only one unknown
remains, being E[Q . This yields an explicit expression for E[R] .

3. The results of Steps 1 and 2 are combined to obtain explicit expressions for all the
E[W].

Using this approach, for a mixture of exhaustive and gated scheduling disciplines, the
following result has been derived by Everitt [86]:

(I - p)d2) + Cj"=, X$[S;]


E[~E,G,~]z (1 f pi)& 1+ A
A2 >) *
(9.23)
For the gated scheduling discipline the + sign should be taken, whereas for the exhaustive
scheduling discipline the - sign holds.
Similarly, for an asymmetric l-limited model, the following approximation has been
derived by Boxma and Meister [28]:

1-P
E[WL,+] FZ ’ - + pi
1 - p - AiA ’ (1 - p)p + CfLl pz
pAc2)
X - + $q c kE[$‘] + & $PiCl + pi)) * (9.24)
2A i-l 2-l

An iterative and more accurate solution procedure for asymmetric l-limited models has
been proposed by Groenendijk [112, Section 7.2.31.
For models in which a mixture of exhaustive, gated, and l-limited scheduling disciplines
exists, the following approximation has been proposed by Groenendijk [ill]. First set

EIWE,iI = (1 - Pi)E[RI, E[WG,i] = (1 + Pi)E[R], E[WL,i] = ll~~pp~Ap~E[fil, (9.25)


i

where E[R] approximates the mean, scheduling dependent, residual cycle time. Substitut-
ing this result in the pseudo-conservation law yields:

(9.26)

Example 9.4. A a-station asymmetric polling model (III).


Reconsider the asymmetric polling model addressed before. Using the above approximation
we calculate E[R] = I.028 and consequently EIWl] = 0.617 and E[W2] E 6.167. As can
186 9 Polling models

be observed, these values not only lie on the line described by the pseudo-conservation law
but also within the bounds we established before. cl

Finally, for k-limited systems, Fuhrmann has derived the following bound for the symmetric
case [loll:
(9.27)

This bound is so tight that it can be used as an approximation. Note that the equal
sign holds if either Ic = 1 (l-limited) or Ic = 00 (exhaustive). For asymmetric ki-limited
systems, Fuhrmann and Wang have also provided an approximation based on the pseudo-
conservation law [ 1001.

9.5 Performance evaluation of the IBM token ring


In this section we discuss the use of count-based cyclic server models for the analysis of
timed-token ring systems such as the IBM token ring. We briefly touch upon the IBM
token ring access mechanism in Section 9.5.1. Then we discuss an approximation of this
timed-token access mechanism by means of (approximate) k-limited cyclic server models in
Section 9.5.2, and discuss the influence of the token holding time on the system performance
in Section 9.5.3. The methods presented here have been developed by Groenendijk in his
Ph.D thesis [112, Chapter 81.

9.5.1 Timed-token access mechanisms


In this section a brief operational explanation is given of the timed-token network access
mechanism as used in the IBM token ring. We do not address the priority mechanism. For
more details, we refer to the survey paper by Bux [35].
In a token ring system a number of stations, denoted Qi through QN, are connected to a
ring shaped medium. On this medium, a special pattern of bits circulates, the token. There
are two types of tokens: busy tokens and free tokens. A busy token is always followed by
an actual data packet. At the beginning of the data packet is the header which contains,
among others, a field of bits reserved for the indication of the destination address (the
address field).
Whenever a token passes a particular station Qi, there are two possibilities. When
the passing token is of the busy type, Qi checks whether the address field matches its own
address. If so, it starts copying the trailing data packet(s) in its input buffer. If the address
9.5 Performance evaluation of the IBM token ring 187

field does not match the station’s address, Q; simply lets the busy token and the trailing
packet pass.
When a free token passes the station, there are again two possibilities. If Qi has no
data packets to send, it just does nothing to the token: it simply passes it to the next
downstream station which will take a certain switch-over time. If Qi has data packets to
send, it grabs the empty token, changes it to a busy token and puts it on the ring again,
directly followed by the data packets it wants to send, of course preceded by the correct
header.
At the moment Qi starts transmitting data packets on the ring, it also starts its (local)
token holding timer (THT), which operates as a count-down timer with initial value thti.
Now, Qi continues to send until either all its data packets have been transmitted or the
THT expires, whichever comes first. After finishing the transmission of data packets, by
one of the two above reasons, the station issues a new free token.
Important to note is that the expiration of the THT is non-preemptive. This means
that once the THT expires: Qi is still allowed to finish the transmission of the data packet
that is in progress. Because data packets have a maximum length, one can calculate the
maximum time a station will hold the token.
Upon receipt of a data packet by a station, there are two ways to go. The receiving
station might change the busy token in a free token (so-called destination release) or it
forwards the token to the sending station, which then releases a free token (source release).
Note that destination release is more efficient, although it might incur unfairness between
stations as a truly cyclic polling order is not guaranteed.

9.5.2 Approximating the timed-token access mechanism


We cannot directly model the above timed-token ring access mechanism in terms of count-
based polling models. However, wc can approximate it by a cyclic (asymmetric) polling
model with a k-limited scheduling strategy as follows.
Consider a timed-token ring system, in which for station i the token holding time is
thti and the average packet transmission time is E[Si]. E[Si] and E[Sf] should reflect the
transmission time of the packet, plus possibly the propagation delay on the medium or
even the total round-trip delay, depending on which token scheme is used [35].
Whenever thti is much smaller than E[Si], in most cases no more than 1 packet will
be transmitted per visit to station i. This situation can conveniently be modelled as a
l-limited scheduling discipline.
If, on the other hand thti is much larger than E[Si], in most cases all queued packets
188 9 Polling models

simulation l-limited 2-limited


p E[W,] E[W2,3] E[wl] E[W2,3] E[W,] E[W2,3]

0.3 0.421 0.347 0.525 0.370 0.420 0.322


0.5 1.12 0.747 1.51 0.775 1.12 0.670
0.8 10.6 2.33 55.7 2.31 9.67 2.23
0.3 0.442 0.399 0.506 0.444 0.317 0.558
0.5 1.20 0.989 1.38 1.16 0.861 1.38
0.8 7.52 5.60 10.72 8.30 6.91 6.22

Table 9.2: Comparison of the expected waiting time for a timed token protocol; simulation
results and approximate results using k-limited cyclic server models (results from [112])

at station i will be served. This can be modelled as an exhaustive scheduling discipline.


Clearly, interesting modelling problems arise whenever thti z E[S,]. Noting that, on
average, t&/E[Si] packets “fit” in a single service period at station i, a k-limited scheduling
strategy seems reasonable to assume, with

Ic = round 1+ (9.28)
( g).

The addition of 1 comes from the fact that the timer expiration is non-preemptive.

Example 9.5. A 3-station, timed-token ring system (adapted from [112]).


Consider a timed-token ring system with only three stations. For comparison purposes,
the average waiting times E[Wi] h ave been derived in two ways: (1) by simulation, using
the timed-token mechanism in all its details, and (2) by using cyclic server models with a
l-limited and a 2-limited scheduling strategy.
The value of the token holding time in station i is equal to the average packet service
duration, i.e.,th& = E[Si]. Th e switch-over times & are assumed to be 0.1, determin-
istically, i.e., bi2) = 0. All packet lengths are assumed to be exponentially distributed.
According to (9.28), we should use a 2-limited scheduling discipline when these parameters
apply (Ic = 2).
In the upper half of Table 9.2 we present the results for Xi = 0.6, X2 = X3 = 0.2,
and E[Sr] = E[S2] = E[S3]. In the lower half of Table 9.2 we present the results for
x1 = x2 = x3 = l/3, and E[Si] = 3E[S2] = 3E[S3]. Th e results for the l-limited cyclic
server models have been obtained by the Boxma and Meister approximation (9.24) for
small utilisation (p = 0.3) and by an improved approximation for the higher utilisations
9.5 Performance evaluation of the IBM token ring 189

[112]. The results for the 2-limited case have been derived with the Fuhrmann and Wang
approximation [loo]. The results reveal that especially for higher loads (p = 0.8) the l-
limited approximation overestimates the waiting time, especially for the station with the
higher arrival rate. This is due to the fact that when at a polling instant a queue is not
empty (which is normally the case when p is high), then in the timed token protocol at
least one packet will be transmitted whereas in the l-limited case at most one packet is
transmitted. This causes more switching overhead and consequently higher waiting times.
Also notice that the approximations for the system with different arrival intensities (upper
half of Table 9.2) are more accurate than the approximations for the systems with different
service intensities (lower half of Table 9.2). cl

Example 9.6. Stability condition.


Observing Table 9.2, we see a very high value for E[VVi] in the l-limited case. This is
not surprising if we consider the stability condition for l-limited models: we should have
p + &A < 1. Although p = 0.8 does not directly suggest that the model is extremely
heavily loaded, the per-station stability condition p + x, A = 0.8 + 0.6 x 0.3 = 0.98 actually
does indicate that it is operating close to saturation. 0

9.5.3 The influence of the token holding timer


One of the problems in the management of token ring systems is the setting of the token
holding times for the stations. Dependent on the overall loading of the system and the
relative loading of the connected stations, one or another choice might yield a better
performance or might be more or less unfair. One of the things which we expect from
(9.28) is that large THTs coincide with the exhaustive scheduling discipline, i.e., a k-limited
discipline with k + 00, whereas very small THTs coincide with a l-limited scheduling
strategy.

Example 9.7. The symmetric case (adapted from [112]).


Consider a symmetric cyclic server system with only two connected stations. The following
parameters are chosen (i = 1,2): Xi = X = 0.5, E[SJ = E[S] = 0.5 where the distribution is
assumed to be negative exponential. The switch-overs equal & = S = 0.1 deterministically.
We derive that p = NXE[S] = 0.5. A ssuming that thti = tht, we have a totally
symmetric system. If we choose tht very small, the system will behave as a l-limited cyclic
server system, so that according to (9.15) we have:
190 9 Polling models

tht + 0 + l-limited + E[WL] = 0.938. (9.29)


Conversely, taking tht very large, we end up with an exhaustive cyclic server system, with
the following solution (according to (9.13)):

tht + 00 + exhaustive + E[WE] = 0.650. (9.30)

By the symmetric nature of the model one would now expect that E[WE] 5 E[Wtht] 5
E[WL]. Simulations of a model including the timed-token mechanism have indeed con-
firmed this [112]. 0

Example 9.8. The asymmetric case (adapted from [112]).


Now, consider an asymmetric cyclic server system with only two connected stations. The
following parameters are chosen (i = 1,2): E[S,] = E[S] = 0.5 where the distribution is
assumed to be negative exponential. The switch-overs again equal 6, = 6 = 0.1 determin-
istically. The arrival rates, however, differ for the two stations: Xi = 0.7 and X2 = 0.3.
We derive that p = xi XiE[Si] = 0.5. Once again, we choose both THTs to be the
same. If we choose tht very small, the system will behave as an asymmetric l-limited
cyclic server system which can be solved approximately. We thus have, following (9.24):

EIW~,l] x 1.133,
tht -+ 0 + l-limited + (9.31)
E[W@] M 0.709.
An exact result, only available for a 2-station asymmetric l-limited cyclic polling model
derived by Groenendijk [112, Chapter 61 provides
JT[W~,~] = 1.152,
tht -+ 0 + l-limited + (9.32)
E[WL,2] = 0.671.
Conversely, taking tht very large, we end up with an asymmetric exhaustive cyclic server
system, for which the following exact solution can be obtained (see (9.21)-(9.22)):

EIWE,l] = 0.585,
tht + 00 + exhaustive + (9.33)
E[W& = 0.792.
Interesting to observe here is that for a small THT, station 1 does worse than station 2,
whereas for a large THT, station 1 does better than station 2. By the asymmetric nature
of the model it is now not correct to assume that E[WE,i] 2 EIWtht,l] 5 E[WL,i]. An
increase in the THT now makes the access mechanism more efficient, but also more unfair.
Especially the large value for THT might not be acceptable for station 2. We will come
back to this issue in Section 16.2 where we discuss SPN-based polling models. cl
9.6 Local and global time-based polling models 191

9.6 Local and global time-based polling models


As has become clear from Section 9.5 many real systems do not operate with the mathemat-
ically attractive count-based scheduling strategies. Instead they use scheduling strategies
based on the elapsed service time already spent at the queues. Although it often makes
sense to use count-based scheduling strategies for bounding purposes, it would be good to
have closed-from results available for time-based scheduling strategies as well. To the best
of my knowledge, only the work of Tangemann proceeds in this direction [279].
The model addressed by Tangemann is a cyclic polling model where the scheduling
strategy is either locally or globally timed. Let C be the set of nodes having a locally
timed scheduling strategy and let G be the set of nodes having a globally timed scheduling
strategy. In the first case, there is a limit T! on the time the server may stay at station
i during one visit. In the second case, there is a limit Tig on the sum of the time the
server may stay at station i in one visit and the length of the previous cycle. These two
scheduling strategies correspond to the operation of the access mechanisms of the IBM
token ring and FDDI [246, 277, 2841 respectively. Since both access mechanisms allow for
so-called overrun of the timers, i.e., a packet being transmitted when the timer expires will
be completed, the above timers include an additive factor accounting for the allowance of
overrun. This factor equals the expected remaining service time, i.e., E[S3/2E[Si]. If an
access mechanism is studied which does not allow for overrun, the above timers should not
include such a component.
The stability conditions are derived as follows. For the overall model we must have
p < 1. For the individual queues, however, some extra restrictions apply. In the locally
timed case, the expected amount of work flowing into queue i must be less than the timer
threshold Ti, i.e., XJZ[Si]E[C] < Ti or, by the fact that E[C] = a/l - p,

p+pin < 1.
T,”

Notice that in case Ti = E[Si], th is condition reduces to the stability condition for the
l-limited system. For globally timed models, the amount of work flowing into queue i
during one cycle, i.e., XJ3[Si]E[C], must be less than the threshold T! minus the length of
the previous cycle. This results in the condition

P+ (1+ Pi,; 51.


a

Tangemann now proceeds by deriving approximate solutions for the mean unfinished work
E[Uz] at queue i, that is, the expected amount of work left behind in queue i when the
192 9 Polling models

I I I I I I I I I
14 -‘., p = 0.05 -
12- p = 0.45 - _
p = 0.75 -
10 - p = 0.85 ”-

6-
4--7
2-
0 I I I I I I I I 1
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3
T1

Figure 9.2: The influence of T1 in a symmetric locally-timed polling model

server leaves there (the * denotes either a ‘g’ or an ‘1’). The pseudo-conservation law can
be written as
N N N

c PiE[Wi*] = c piE[WE,J + c piE[U,*], (9.36)


i=l i=l i=l

where E[Wf] is the exact average waiting time in the globally and locally timed stations and
where E[WE,i] is the expected waiting time in the queues when all the scheduling strategies
would have been of exhaustive type. Using an approximation for E [UT], Tangemann derives
the following approximate pseudo-conservation law for time-based cyclic polling models:

2 P~E[W~] (1 - q) + C piE[W:] (I - ‘“;,lr, p)EaIP’)


iEB i

(9.37)

where
N
c piE[WE,i] = pct"'l xiE[sfl a(") qp - cz, pf)
+Px+ (9.38)
i=l 81 -P> W-P) ’
and where E[C] = A/(1 - p) is the mean cycle length. For symmetric systems, the
approximate pseudo-conservation law can directly be used for deriving expected waiting
9.6 Local and global time-based polling models 193

times since it will then only contain one unknown. For locally-timed scheduling we can
derive
(1 - PP-VW + AE[Sl 2 _ (l-dN)AEIC1
Tl (Iv 2
(1 - 9))
E[Wc] = > (9.39)
1-p-g-l
where E[WE] is as in (9.13). For globally timed scheduling we can derive

(1 - p)E[WE] + ‘“;;)f;[” ($ - (l-“;)XE’C1 (1 - a))


JwGTI= (9.40)
l-P-g&2$
For asymmetric systems, an approximation procedure as in Section 9.4.2 can be followed.
The average waiting time in each queue is expressed in terms of known parameters and the
expected remaining cycle time, which is then assumed to be station independent. Taking
into account the stability conditions and the limiting case T* -+ 00, the following two
results have been derived:

C:, piE[W~,i]
+ CE, p* (pi - (I- pi)q (I- y ))
X >
N
CL 21% P. (

(9.41)

and

l-Pi(l-&~)
E[W]g = 1-*
2

where xi pi E [WE,i] is defined in (9.38).

Example 9.9. The influence of T1.


Consider the symmetric polling model with 10 stations we have addressed before: N = 10,
6 = 0.1, Jc2) = 0.01, E[S] = 1.0, and E[S2] = 1.0. In Figure 9.2 we depict) the expected
waiting time as a function of Tz based on the approximation (9.39) for p = 0.05, 0.45, 0.75
and 0.85. Comparing these results with those calculated in Table 9.1, we observe that for
194 9 Polling models

T’ = 1 the values rougly agree with those calculated for the l-limited scheduling strategy
and that the results for T1 = 3 rougly agree with those calculated for the exhaustive
scheduling strategy. 0

Example 9.10. Limiting cases for large values of T*,


The above approximate results are correct for large values of the timers. In the two sym-
metric cases, taking the limit T* -+ 00 reduces the expressions to the average waiting time
in symmetric exhaustive models as given in (9.13), which seems intuitively correct. Simi-
larly, in the asymmetric case, taking T* + co reduces the expressions to the approximate
result for the average waiting times for exhaustive models (9.25). q

9.7 Further reading


Polling models have been the subject of research for many years now. Early papers in this
field were written in the 1970s; e.g., by Avi-Itzhak et al. [9], Cooper and Murray [67, 661,
Eisenberg [82, 831, Kuhn [167], Konheim and Meister [163] and Bux and Truong [36].
Surprisingly though, polling models are treated in only a few textbooks on performance
evaluation, see e.g., [152, 1561. 0 ver the last 10 to 15 years the number of research papers
on polling models has increased tremendously; Takagi reports about 250 publications in
the period 198661990 only [272, 273]! The book on polling models by Takagi [271] focusses
on mathematical issues, whereas the survey by Levy and Sidi puts more emphasis on
applicat>ions [181]. The Ph.D. theses by Groenendijk [112] and Weststrate [286] also provide
excellent reading on the topic. Recently, Blanc published interesting work on the power
method with application to polling models [al, 201. The Ph.D. thesis of Tangemann treats
time-based scheduling in polling models in great detail [279]. Background on the IBM
token-ring can be found in [35]. FDDI is described in [246]. The issue of timer-setting in
token ring networks is discussed by Jain in more detail [146]. Johnson and Sevcik discuss
stability and cycle time issues for FDDI in detail [151, 2591. We will come back to the
evaluation of polling models in Chapter 16 where we discuss applications of stochastic Petri
nets.

9.8 Exercises
9.1. Symmetric polling systems.
Use the pseudo-conservation law to compute the expected waiting time when all the stations
9.8 Exercises 195

have either gated, l-limited or decrementing service ordering.

9.2. Priority and polling systems.


Consider a 2-station polling model where, at each station, jobs arrive as a Poisson process
with rate X = 5 and have fixed duration E[S] = 75 msec. Furthermore, the switch-over
times S are fixed and small. This model can be interpreted as a model of a priority
queueing system when station 1 has exhaustive scheduling (highest priority) and station 2
has l-limited scheduling (lowest priority). The switch-over time now models the overhead
incurred by the switching between priority classes.

1. Approximate E[IVr] and E[wz] as a function of A.

2. Compute E[W,] and E[W] 2 using a suitable priority model from Chapter 6, thereby
neglecting the switching overhead.

3. What happens when we take the limit A -+ 0 in the result of l?

9.3. Polling models with matrix-geometric solution.


Consider a 2-station polling model with exhaustive service with the following parameters.
Arrivals form a Poisson process with rate X; and services are negative exponentially dis-
tributed with rate pi. The switch-overs take a negative exponentially distributed amount
of time with rate Si. Furthermore, the queue in station 1 has unlimited capacity but the
queue in station 2 has a limited capacity of K.

1. Define the state space Z of this Markovian polling model.

2. Draw the state-transition diagram of this quasi-birth-death model for K = 2.

3. Discuss how such a model can be solved, thereby explicitly stating the block matrices
Ai and Bi,j.
Performance of Computer Communication Systems: A Model-Based Approach.
Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons Ltd
ISBNs: 0-471-97228-2 (Hardback); 0-470-84192-3 (Electronic)

Part III

Queueing network models


Performance of Computer Communication Systems: A Model-Based Approach.
Boudewijn R. Haverkort
Copyright © 1998 John Wiley & Sons Ltd
ISBNs: 0-471-97228-2 (Hardback); 0-470-84192-3 (Electronic)

Chapter 10

Open queueing networks

I N the previous chapters we have addressed single queueing stations. In practice, many
systems consist of multiple, fairly independent service providing entities, each with their
own queue. Jobs in such systems “travel” from queueing station to queueing station in
order to complete. Instances of these so-called queueing networks can be observed at many
places: in computer systems where a number of users are trying to get things done by
a set of processors and peripherals, or in communication systems where packets travel
via independent links and intermediate routers from source to destination. In fact, in
Chapter 4 we already saw an example of a queueing network: the simple terminal model
in which many system users attended a central processing system and their terminals in a
cyclic manner. In this chapter we will elaborate in particular on the class of open queueing
network models, i.e., queueing networks in which the number of customers is not a priori
limited.

We first introduce basic terminology in Section 10.1 after which we discuss the class
of feed-forward queueing networks in Section 10.2. This discussion provides us with a
good insight into the analysis of more general open queueing networks, such as Jackson
networks in Section 10.3. Although Jackson queueing networks can be applied in many
cases, there are situations in which the model class supported does not suffice, in particular
when arrival streams are not Poisson and service times are not exponentially distributed.
Therefore, we present in Section 10.4 an approximation procedure for large open queueing
networks with characteristics that go well beyond the class of Jackson networks. We finally
address the evaluation of packet-switched telecommunication networks as an application
study in Section 10.5.
200 10 Open queueing networks

10.1 Basic terminology


Queueing networks (QNs) consist of a number of interconnected queueing stations, which
we will number 1, . s. , M. Individual queueing stations or nodes, are independent of one
another. They are, however, connected to each other so that the input stream of customers
of one node is formed by the superposition of the output streams of one or more other nodes,
We assume that there is a never-empty source from which customers originate and arrive
at the QN, and into which they disappear after having received their service.
A more formal way to describe QNs is as a directed graph of which the nodes are the
queueing stations and the vertices the routes along which customers may be routed from
node to node. The vertices may be labelled with routing probabilities or arrival rates.
When dealing with open QNs, the source (and sink) is generally denoted as a special node,
mostly numbered 0 (we will do so as well).
A well-known example of an open QN model is a model of a (public) telecommunication
infrastructure where calls are generated by a very large group of potential system users.
We will address such a model in Section 10.5.

10.2 Feed-forward queueing networks


In this section we discuss feed-forward queueing networks (FFQNs). In such networks the
queues can be ordered in such a way that whenever customers flow from queue i to queue
j, this implies that i < j, i.e., these QNs are acyclic. Note that due to this property FFQNs
must be open. We focus on the case where the individual queueing stations are of MIMI 1
type, but will indicate generalisations to multi-server queueing stations.
In order to increase the understandability we present the evaluation of FFQNs in three
steps. We first discuss the MIMI1 queue as the simplest case of an FFQN in Section 10.2.1.
We then discuss series of MIMI1 queues in Section 10.2.2 and finally come to the most
general form of FFQNs in Section 10.2.3.

10.2.1 The MIMI1 queue


Consider a simple MIMI1 queue with arrival rate X and service rate p. Given that the
queue operates in a stable fashion, i.e., p = X/p < 1, we know from Chapter 4 that the
steady-state probability of having i customers in the queue is given by

pi = Pr{N = i} = (1 - p)pi, i E LN. (10.1)


10.2 Feed-forward queueing networks 201

x . ....
A42 CLM

Figure 10.1: Series connection of M queues

The correctness of this result can be verified by substituting it in the global balance equa-
tions of the underlying CTMC. From p- all kinds of interesting performance measures can
be obtained. Notice that we have implicitly defined the state space Z of the underlying
CTMC to be equal to JV.

10.2.2 Series of MIMI1 queues


Now consider the case where we deal with M queues in series. The external arrival rate
to queue 1 equals X and the service rate of queue i equals pi. For stability we require that
all pi = X/pi < 1. If there are queues for which pj 2 1, these queues build up an infinitely
large waiting line and the average response time of the series of queues goes to infinity as
well. The queue with the largest value of pj is said to be the bottleneck node (or weakest
link) of the QN. In Figure 10.1 we show a series QN.
Since there are no departures from the QN, nor arrivals to the QN in between any
two queues, the arrival rate at any queue i equals X. Furthermore, a departure at queue
i (i = l;**,A4 - 1) results in an arrival at queue i + 1. In Figure 10.2 we show the
state-transition diagram of the CTMC underlying a series QN with M = 2. Notice that
the state space Z = LV2. Every state (i, j) E Z signifies the situation with i customers in
queue 1 and j customers in queue 2. The sum of i and j is the total number of customers in
the series QN. As can be observed form Figure 10.2, a “column of states” represent states
with the same overall number of customers present in the QN. Recognising that there are
basically 4 different types of states, it is easy to write down the GBEs for this CTMC:

state (0,O) : Po,oX= POJl-12,

states (i,O), i E JV+ : Pi,O(X + PI> = Pi-1,0X + Pi,lp27

states (0, j), j f JV+ : Po,j(X + /32) = Pl,j-1p1 + PO,jSlcL2,


states (i, j), i, j E N+ : Pi,j(x + pl + p2) = Pi-l,jx + Pi+l,j-lpi + Pi,j+ly2,

normalisation : Fc
,=oJ~ol% = l* (10.2)
.

Solving these GBEs seems a problem at first sight; however, their regular structure and
202 10 Open queueing networks

/
/---

P2
/
//
/-----
x x

Figure 10.2: CTMC underlying two MIMI 1 queues in series

the fact that we already know the answer when A4 = 1 might lead us to try whether

P&j = (1 - Pl)P”; x (1 - P2,d (10.3)

is the correct solution. That this is indeed the case, we leave as an exercise to the reader.
Now we will spend some more time on interpreting this result. In a series QN it seems that
the overall customer probability distribution can be obtained as a product over the per-
queue customer probability distributions. It is therefore that series QNs (and many more
QNs we will encounter later) are often called product-form queueing networks (PFQNs).
For series QNs the product-form property is easy to show. In fact, all the customer streams
in a series QN are Poisson streams. This fact has been proven by Burke and is known as:

Theorem 10.1. Burke’s theorem.


The departure process from a stable single server MIMI 1 queue with arrival and
service rates X and ,Y respectively, is a Poisson process with rate X. Cl

That Burke’s theorem is indeed valid, can easily be seen. Consider an MIMI 1 queue
as sketched. As long as the queue is non-empty, customers will depart with the inter-
departure time distribution being the same as the service time distribution. If upon a
10.2 Feed-forward queueing networks 203

departure the queue empties, one has to wait for the next arrival, which takes a negative
exponentially distributed length (with rate X), plus the successive service period. So, when
leaving behind an empty queue, the time until the next departure has a hypo-exponential
distribution with 2 phases, with rates X and ,Y (see Appendix A). The probability that
upon departure instances the queue is non-empty equals p, so that we can compute the
inter-departure time distribution FD(t) as follows:

&(t) = p(1 - eVCLt)+ (1 - p) 1 - Le-Xt + Xe -d 7


P--x CL-X
which reduces to Fo (t) = 1 - epxt = FA (t).
In fact, Burke’s theorem also applies to MIM(m queues, in which case the output process
of every server is a Poisson process with rate X. However, it is easy to see that for an MIMI 1
queue with feedback, the resulting outgoing job stream is not a Poisson stream any more.
In fact, also the total stream of jobs entering such a queue is no Poisson stream. The
superposition of the external arrival stream and the jobs fed back after having received
service is not a Poisson process since the superpositioned Poisson streams are no longer
independent.
Given the steady-state probability distribution for series of queues, we can address
average performance measures for the individual queues. Here also the results for an
individual MIMI1 q ueue simply apply. As an example, the average number of customers
in queue 2 is derived as

E[N,] = gyjPi,j = p& -P1)Pi x .7ll - Pa)Pi


ix0 j=o i=o j=o

= (1 -PdFPI (1 -Pz&P32
( i=o I( j=o )
P2 (10.5)
= I-pz’

Finally, we generalise the above results to a series of A4 queueing stations (given all pi < 1).
Denoting with Ni the number of customers in queue i, the vector valued random variable
N = (Nl,- , NM) represents the state of the series QN. Consequently, we have state space
Z = LVM and find the following steady-state probabilities:

Pr{N = 11) = :(I - pi)p’”


i=l

(10.6)
204 10 Open queueing networks

where G is called the normalising constant, i.e.,

G=(fjl-Pi~j’- (10.7)

which assures that the sum of all probabilities CnEZ Pr{N = n} = 1.

10.2.3 Feed-forward queueing networks


In this section we address general FFQNs, i.e.: acyclic QNs, not necessarily in series. All
the queues themselves are M 1M 11 queues. Denote the environment with “0” and let the
overall arrival process from the environment be a Poisson process with rate X0. Let ri,j
be the (user-specified) probability that a customer leaving queue i goes to queue j. By
definition, we have ri,j = 0 whenever j 5 i. The probabilities ro,+ indicate how the arrivals
are spread over the individual queues and the probabilities T~,Orepresent departures from
the QN.
The overall flow of jobs through queue j now equals the sum of what comes in from
the environment and what comes from other queues upon service completion, i.e.,

These equations are called the (first-order) trufic equations. Their number equals the
number of queues in the QN. As FFQNs are acyclic we can solve the traffic equations
successively:
x0 --+ Xl + x2 + * * - -+ X&f. (10.9)

NOW, if all the pi = Xi/pi < 1, the QN is said to be stable. If that is the case, the
overall steady-state probability distribution of the QN can again be seen as the product
of the per-queue steady-state probability distributions. Again, the queues can be regarded
as independent from each other. Using the same notation as above, the steady-state
probabilities are given as:

Pr{iJ = 111) = Pr{K Ni = ni>


i=l

= 6 Pr{Ni = ni} = fi(l - pi)pn” = A fi p’“. (10.10)


i=l i=l 2-l

The factor l/G is independent of & and can again be regarded as a normalisation constant
for the probabilities to sum up to 1.
10.3 Jackson queueing networks 205

When we deal with multiple server FFQNs, a similar results hold. Only the normali-
sation constant changes according to the results derived in Chapter 4 for MlMlm queues.
In any case, Burke’s theorem applies so that in FFQNs all the job streams are Poisson
streams.

10.3 Jackson queueing networks


Jackson QNs (JQN s) are an extension of FFQNs in the sense that the “feed-forward restric-
tion” is removed, i.e., jobs may be routed to queues they attended before. The job streams
between the various queues now are not Poisson streams because they are composed out
of dependent streams. Surprisingly enough, however, the queue-wise decomposition of the
QN can still be applied! Despite the fact that the job streams are not Poisson, the steady-
state probabilities for the QN take a form as if they are ! This also implies that we still
have a nice product-form solution for the steady-state probabilities. The only difference
with the FFQN is now given by the traffic equations:

(10.11)

These cannot be solved successively any more, so that we either have to use a Gaussian
elimination procedure or an iterative technique (see Chapter 15). Once the values Xi are
known we can establish the stability of the QN by verifying whether pi = Xi/pi < 1 for all
i. If this is indeed the case, we have the following solution for the steady-state customer
probability distribution:

Pr{N = ?l} = c(l - pi)p%” = $6 pai, (10.12)


i=l 2=1

with G = (n,M_i(l - pi))-‘. Again, the state space of the underlying CTMC equals Z =
NM.
As we will see in the application in Section 10.5, we do not always have to specify
the routing probabilities. In many cases, it is more natural to specify routes through the
queueing network and to compute the values Xj directly from them. Furthermore, under
the restriction that all queues are stable, the throughputs Xi are equal to the arrival rates
Xi, for all queues.
When changing from single-server queues to multiple-server queues, the presented
product-from result still holds. The only difference lies in the steady-state probability
206 10 Open queueing networks

Figure 10.3: A simple Jackson QN

distributions for the multiple-server queueing stations (all according to the results given in
Section 4.5). The first-order traffic equations remain unchanged.

Example 10.1. A simple Jackson queueing network.


Consider the simple 2-station Jackson QN given in Figure 10.3. We denote the external
arrival rate at queue i as X~ro,i; the overall arrival rate at queue i is Xi. Services in queue i
take place with rate pi and after service, the customer leaves the QN with probability ri,o or
goes to the other queue with probability ri,3-i = 1-ri,a. The corresponding state-transition
diagram is given in Figure 10.4. From the state transition diagram we can conclude the
following GBEs:

Po,oXo = P0,1P2r2,0 + PI,O~I,O,

+
Pi,o(Xo + l.4) = p.-2 i,o A0 r 04 + Pi+1,0Plr1,0 + pi,1p2T2,0 + pi-1,1p2~2,1, i E N ,
+
Po,#o + i-42) = p 0,~.- 1 X 0 r 0,2 + Po,j+1rcL2r2,0 + pl,jwl,0 + pl,j4w1,2, j E JV ,

P&O + PI + p2) = pi+ I X 0 r 0,2 + Pi+l,j-lwl,2 + pi+l,jw1,0

+ pi,j+lp2r2,0 + pi-l,j-lp2r2,1 + pi--l,jXor0,1, i, j E N+. (10.13)

To verify whether the general solution given satisfies these GBEs (including the normali-
sation equation) we have to solve the traffic equations first. The overall traffic arriving at
queue i can be computed as:

Xi = Xoro,l + X&ir3-i,i, i = 1,2. (10.14)

From these equations we find:


x = X0r0,l + X0r0,2r2,1 X0r0,2 + X0r0,lr1,2
1 , and X2 = (10.15)
1- r1,2r2,1 1- r1,2r2,1

Notice the easy interpretation of these expressions: the overall traffic that flows into queue
1 equals that what flows in from the environment, plus that which flows in after it has
10.4 The QNA method 207

.... xoro,1
w-l,0

x07-0,2

I-Lzr2,o

w-l,2

. 1-12r2,l

Figure 10.4: State transition diagram for the simple Jackson QN

been at queue 2. However, since customers may cycle through the queues more than once,
this sum has to be “stretched” by a factor (1 - r1,2r2,1)-1.
The stability conditions are p; = Xi/pi < 1. Using these pi-values, the result for Jackson
QNs can be verified by substituting it in the above GBEs. We will start with the case
i = j = 0. We have:

(1 - PdP -P2Po = (1 -Pd(l -p2)p2p2r0,2 + (1 -p&1(1- p2)plrl,o. (10.16)

Dividing by (1 - pi)( 1 - p2) and using the fact that p+i = Xi, we obtain:

X0 = h-1,0 + X2r2,0. (10.17)

This equality must hold, since it expresses that whatever flows into the QN, i.e., X0, equals
that which flows out of both queues together. Moreover, if we substitute (10.15), and write
ri,o = 1 - ri,3-i, this equality becomes trivial. We leave the validation of the other GBEs
as an exercise to the reader. 0

10.4 The QNA method


In many modelling applications, the restriction to Poisson arrival processes and negative
exponential service times in combination with FCFS scheduling cannot be justified. As
an example, think of a model for a multimedia communication system in which arrivals
of fixed-length packets (containing digitized voice or video) have either a very bursty or
208 10 Open queueing networks

a very deterministic nature. For such systems, Jackson queueing networks might not be
the most appropriate modelling tool. Instead, a modelling method known as the Queueing
Network Analyzer (QNA) might be applied in such cases.
The QNA method, as developed by Kuhn and later extended by Whitt, allows for
quick analyses of large open queueing networks with fixed routing probabilities and FCFS
scheduling. Its most important characteristic, though, is that it treats queueing networks
in which the arrival processes need not be Poisson, and the service-time distribution need
not be exponential. With QNA all arrival processes are assumed to be renewal processes
characterised by their first two moments, i.e., successive arrivals are still independent of
one another. Similarly, all service time distributions are characterised by the first two
moments; in particular, constant services times can be dealt with, albeit approximately.
Besides that, the QNA method also allows for the merging and splitting of customers, as
well as for multiple customer classes. In its simplest form, it reduces to an exact analysis
of Jackson queueing networks as we have seen in Section 10.3.
With QNA the computational complexity is linearly dependent on the number of queue-
ing stations. Once the traffic equations have been solved, all queueing stations can be stud-
ied individually. This advantage, however, comes not without costs; the decomposition of
the overall queueing network into a number of individual queueing stations is approximate.
We will discuss how good this approximation is.
This section is further organised as follows. The considered queueing network class is
introduced in section 10.4.1. The computational method is discussed in Section 10.4.2 and
a summary of the involved approximation steps is given in Section 10.4.3.

10.4.1 The QNA queueing network class


A queueing network to be solved with QNA consists of M nodes or queueing stations. Cus-
tomers travel between nodes according to fixed routing probabilities ri,j, i, j = 0, 1, . . . , M
(; and j not both equal to 0). There is one special node, the environment (indexed 0) from
which external arrivals and to which departures take place.
A special property of QNA is that it allows for customer creation (splitting) and com-
bination (merging) so as to represent the segmentation and reassembly process in com-
munication networks or the fork/join-operation in parallel processing. The combination
or creation capability is indicated with every station. For each queueing station i, the
following parameters have to be defined:

l the arrivals from the environment, characterised by the first and second moment,
i.e., by the arrival rate Xo,i and the squared coefficient of variation C~;o,i;
10.4 The QNA method 209

l the number of servers mi;

l the service time distribution, characterised by the first) and second moment, i.e., by
the average service time E[Si] = l/pi and the squared coefficient of variation C~;i;

l the routing probabilities ri,j;

a the customer creation or combination factor Ti, with

for combination stations,


for ordinary stations, (10.18)
(1, oo), for creation stations.

For all the queues, it is assumed that the scheduling strategy is FCFS and that the buffer
is infinitely large.

10.4.2 The QNA method

The general approach in the QNA method consists of four steps:

1. Input. Description of the QN in terms of the defining parameters as given in Sec-


tion 10.4.1;

2. Flows. Determination of the customer flows inside the network which includes the
generation and the solution of the traffic equations, both for the first moment (similar
to that seen in Section 10.3), and for the second moment (to be discussed below);

3. Per-node results. The computation of the results for the individual queues, given
the first and second moment of the service time distribution, and the first and second
moment of the interarrival time distribution, using exact and approximate results for
the MIGlm, the GlMlm and the GlGlm queue;

4. Network-wide results. Calculation of the network-wide performance results, i.e., the


throughput, the customer departure rate, mean and variance of the sojourn time and
number of customers, per queue and for the overall network.

These steps are treated in more detail in the following subsections.


210 10 Open queueing networks

Input

Although the QNA method allows for multiple customer classes, we restrict ourselves here to
the single-class case. In the papers on QNA a multiple class queueing network is transformed
to a single class queueing network by aggregating customer classes.
Once all the parameters for the single class queueing network have been specified, it
might be the case that some values ri,i > 0, i.e., some queues have immediate feedback.
Although this is allowed as such, the approximations used in QNA are more accurate when
such immediate feedback is eliminated. The idea here is to regard the multiple visits of
the immediately fed-back customer as one larger visit, and to compensate for this later
in the computations. With immediate feedback probability ri,+, the expected number of
successive visits to node i equals (1 - ri,i)-l, so that we alter the model for node i as
follows:

ci;i + ri,i + (1- ri,i)Cg;i)


ri,j
rii + 1 _ ri,i ' Li # 4

ri,i t 0. (10.19)

When computing the performance measure per node, we have to correct for these changes.

Flows: first-order traffic equations

In this step the customer flows between nodes should be obtained. We first concentrate
on the mean traffic flow, i.e., the arrival rates Xi to all the nodes. The first-order traffic
equations are well-known and given by

Xj = &,j + 5 &Tiri,j, j = l;**,M, (10.20)


i=l

that is, the arrival rate at node j is just the sum of the external arrival rate at that node,
and the departure rates X; of every node Xi, weighted by the customer creation factor yi
and the appropriate routing probabilities ri,j.
There are basically three operations that affect the traffic through the QN and which
are illustrated in Figure 10.5:

(a) the probabilistic splitting of a renewal stream, induced by the constant routing prob-
abilities which take place after customer completion at a queueing station;
10.4 The QNA method 211

Figure 10.5: The basic operations (a) splitting, (b) superpositioning and (c) servicing that
affect the traffic streams

(b) the service process at a particular queueing station;

(c) the superpositioning of renewal streams before entering a particular queueing station.

The fact that the first-order traffic equations are so easily established comes from the fact
that the superpositioning of renewal streams as well as the probabilistic splitting of renewal
streams can be expressed as additions and multiplications of rates. Also, the service process
at a queueing station does not affect the average flows.
The first-order traffic equations form a non-homogeneous system of linear equations,
which can easily be solved with such techniques as Gaussian elimination or Gauss-Seidel
iterations.
Now that the arrival rates to the queues have been found, it is possible to calculate the
utilisation pi = Xi/mipi at node i. If pi 2 1 for some i, that queueing station is overloaded.
If all the pi < l9 the queueing network is stable and the analysis can be taken further.

Flows: second-order traffic equations

To determine the second moment of the traffic flows we use the so-called second-order traffic
equations. The three operations that affect the traffic characteristics are superpositioning
and probabilistic splitting of renewal streams, and the service process. We therefore focus
on these three factors and the way they influence the second-order characteristics of the
arrival processes.

(a) Splitting. The probabilistic splitting of renewal processes can be expressed exactly.
As we have seen in Chapter 3, if a renewal stream with rate X and squared coefficient
of variation C2 is split by independent probabilities oi, i = 1,. . . , n, then the out-
going processes are again renewal processes, with rates Xi = aiX and with squared
coefficient of variation (2’: = aiC2 + (1 - ai). If, however, at node i customer creation
or combination takes place with value yi, then C2 is scaled by a factor yi as well,
i.e., Cf = oiyiC2 + (1 - ai). It should be noted that this scaling is an approximation.
212 10 Open queueing networks

(b) Servicing. The service process also has its influence on the departure process. If the
service process has a very high variability, it increases the variability of the outgoing
stream, or on the other hand, a more deterministic service process decreases the
variability of the outgoing stream (in comparison with the variability of the incoming
stream).

Apart from the service time variability, the utilisation of the server is also of impor-
tance, and therefore the average service time, i.e.,

- when the utilisation is low, the outgoing stream will more closely resemble the
characteristics of the incoming stream;
- if the utilisation is high, there will almost always be customers present so that
the departure process is almost completely determined by the service process.

In QNA , an adaptation of Marshall’s result for Ci, the variability of the departure
process in the GIlGIl q ueue, is used. Marshall’s result states that

(10.21)

Since E[W] is not known for the GIlGIl queue, the Kramer and Langenbach-Beltz
approximation (7.24) is used, under the assumption that Ci 2 1:

E[W] = (C; + C;)* (10.22)


w - PI’

Recall that this approximation is exact in the M 1G )1 and the MIMI 1 case. Combining
these two results, we obtain

c; = p2c; + (1 - p”)Ci.

We observe that the coefficient of variation of the service process is weighted with p2
and the coefficient of variation of the arrival process is weighted with (1 - p2) to add
up to the coefficient of variation of the departure process.

For multiple server queues, i.e., GIIGI m q ueues, this result has been extended to

G = 1+(1-P2)(c; - 1)$&p; - 1)
-- ~c~+(l-P”)c2,+P2(l-~j. (10.24)
10.4 The QNA method 213

This result reduces to the earlier result for m = 1, and yields Cg = 1 for the M 1M(m
and the MlGlco queue, as it should.
Practical experience has revealed that although low-variance service processes can
decrease the variability of the departure process, this decrease is often less than pre-
dicted by (10.24). Therefore, instead of taking Cs, in the QNA approach max{Cz, 0.2)
is proposed.

(4 Superpositioning. Finally, we turn our attention to the superpositioning of renewal


streams. Consider the case where we have to superposition a number of renewal
streams with rates Xi and coefficients of variation Cz. A first approximation for the
coefficient of variation of the merged stream would be

(10.25)

which is a weighted sum of the coefficients of variation of the merged streams. Ex-
perimental work, however, showed that the actual C2 is better given by

c2 = WC: + (1 - w)C$, (10.26)

where a good choice for C$ is C$ = 1. The weighting factor w (0 < w < 1) is derived
as
1
(10.27)
w = 1 + 4( 1 - p)“(v - 1) ’
with

v= (l&)2)-‘. (10.28)

It should be noted that these last three equations are based on experience and ex-
tensive simulations, rather than on theory.

Combining the results for superpositioning, splitting and the service of customer flows, we
obtain the following system of second-order trufzc equations:

c+.zj+~c~,ib,,, j=l,***,iW, (10.29)


i=l

where aj and b,,j are constants which follow from the above considerations:

aj = 1 + wj ((P0,jc&-J,j - 1) + gpi,j((l - ri,j) + 7iri,jp&>>, (10.30)


i=l
214 10 Open queueing networks

and
h,j = qPi,jri,j%(l - $1, (10.31)

with p;,j the proportion of arrivals to j that came from i:


JWiri,j
Pi/j = (10.32)
xj ’

and
max{C&,O.2} - 1
Xi = 1+
@G 7
1
wj =
1 + 4(1 - pj)+j - 1) ’
1
v.i = ~ (10.33)
czl P$j *
Consequently, (10.29) comprises a non-homogeneous system which can be easily solved
using such techniques as Gaussian elimination or Gauss-Seidel iterations (see also Chap-
ter 15).

Example 10.2. Second-order traffic equations: simple case.


To obtain more insight into the QNA approximation, consider the case where w = 1 in
(10.26), i.e., where the first approximation for the superpositioning of renewal streams is
used. The total coefficient of variation of the stream arriving at node j is then just the
weighted sum of the incoming streams:

ci,j = PO,jci;O,j + 5 Pi,jG;i,j> (10.34)


i=l

where C&ij is the coefficient of variation of the stream that left node i and is directed to
node j and’C2A;O,jis the coefficient of variation of the external arrival stream. The weighting
factors pi,j are derived such that they express the relative weight of stream i in the total
arrival stream at node j, i.e., if X,,j = Xiyiri,j, then pi,j = Xi,j/Xj. The stream from i to j
is the result of the splitting or merging of the stream that has been multiplied by pi after
it left node i:
Cg;i,j = ri,j+fiCi;i + (1 - ri,j) 7 (10.35)
where C& , is the coefficient of variation of the stream that left server i, but before the
splitting or merging took place. But this stream is the result of an incoming stream at
node i and the service process there:

(10.36)
10.4 The QNA method 215

Using these two results, we can rewrite (10.34), in a number of steps, to

c;;j = PO;jCyi;o,j + 5
i=l
Pi,j (1 - Tij) + 5
i=l
7iyipi,jri,j &Ci;i

+ FPi,jri,j%Pt 1 - - + FPi,jri,jYi(l - Pf)ci;i


i=l ( & 1 i=l
A4
= aj + C h,jci;i, (10.37)
i=l

with aj and b,,j as defined before. Notice that the only difference that remains is that the
above equation uses C~;i directly, whereas the more general result presented before uses
max{C&, 0.2). As this adaptation was based on experience rather than on a mathematical
argument, this is no surprise. q

Performance measures at the nodes

Now that we have decomposed the queueing network into separate service facilities of
GIjGjm type characterised by the first two moments of the interarrival time and the service
time distribution, we can analyse them in isolation. We first focus on the single server case,
i.e.,m = 1, and then on the multi-server case.
An important performance measure is the mean waiting time E[W] for which the
Kramer and Langenbach-Belz approximation (7.24) is used:

(10.38)

where

(10.39)

The mean of the number of customers in the queueing station, E[N], can easily be found
employing Little’s law:
E[N] = p + XE[W]. (10.40)

The probability of delay, i.e., 0 = Pr{W > 0}, is also based on the Kramer and Langenbach-
Belz approximation as follows
216 10 Open queueing networks

with
(10.42)

Notice that in case of an M]G] 1 queue (10.41) reduces to 0 = p, which is correct since in
that case the Poisson arrivals see the average utilisation of the queue (due to the PASTA
property). The squared coefficient of variation of the waiting time, i.e., C&, is approxi-
mated as follows (see [288, (50)-(53)]):

4( 1 - p)(2Cg + mu{C& 1))


c& z 2. 2P + -0 . (10.43)
0 ( 3(C. + 1) 1
From this result, the squared coefficient of variation of the number of customers in the
facility, i.e., Cc is derived as:

(10.44)

Also here various approximations and results based on experience have been used.
In the case of multiserver nodes, only a few approximate measures can be computed,
by modifying the exact results for the M]M] m queue. The multiserver results are therefore
less accurate. As an example of such an approximate result, the average waiting time in a
GI]G]m node is obtained as

(10.45)

where E[W]Mlr,+ is the average waiting time in an M]M]m queue with the same arrival and
service rates. For M]G]m queues, this approach is known to perform well. More advanced
methods can of course be included here.
Finally, notice that for those queues for which we have eliminated the immediate feed-
back in the input-phase, we have to adjust the obtained performance measures as follows:

Cg;i - ri,i
Cg;i t
1 - ri,i ’
E[Wi] t (1 - Ti,i)E[Wi]. (10.46)

For higher moments of N and W, more complex back-transformations are necessary [288].
10.4 The QNA method 217

Network-wide performance measures

In the last step the performance results of the separate queueing facilities have to be
combined to final network-wide performance results. The throughput is defined in QNA as
the total external arrival rate:

AJ = x0,1+ - - - + X()&I. (10.47)

When no customers are combined or created at the nodes, the departure rate from the
network will be equal to the throughput. Otherwise the departure rate can differ from the
throughput. In general, the departure rate from the network is given as:

(10.48)

The mean number of customers in the network is given by

E[N] = E[N,] + * - * + E[N$f]. (10.49)

Neglecting the fact that the nodes are dependent, the variance of the number of customers
can be approximated as:

var[N] = var[A$] + . . . + var[NM]. (10.50)

The response time is calculated from the perspective of an aggregate customer. A customer
enters queue i with probability Xo,i/Aa. The expected number of visits to node i, the so-
called visit-count Vi (see Chapter ll), is

x = Xi/X@ (10.51)

The mean time an arbitrary customer spends in node i during its total time in the network
(for all its visits to node i) therefore equals

E[&] = k&?qSJ + E[WJ). (10.52)

Note that the response time for an arbitrary customer to pass through queue i once, simply
equals E[Si] + E[IV”]. Th e expected total response time for an arbitrary customer to pass
through the network then equals

(10.53)
i=l
218 10 Open queueing networks

+-El@
x
+-El@
Figure 10.6: The nine-node example queueing network

c; = 0.5, c; = 0.5 Cl = 6.0, C; = 0.5 Cl = 6.0, c: = 4.0


QNA SIM RE QNA SIM RE QNA SIM RE
E[Wl;3] 0.533 0.497 7.2 3.972 3.220 23.4 6.111 6.228 -1.9
E[W4] 3.008 2.931 2.6 9.189 10.97 -16.2 21.00 22.92 -8.4
E[W~] 1.377 1.301 5.8 1.596 1.559 2.4 6.005 6.591 -9.7
E[l&] 1.091 0.985 10.8 1.372 1.282 7.0 5.257 5.956 -11.7
E[W7] 1.830 1.695 7.9 5.604 6.026 -7.5 12.29 13.31 -7.7
E[W8] 0.820 0.758 8.2 3.273 2.685 21.9 6.719 7.597 -11.6
E[V&] 20.22 18.40 9.9 34.51 70.25 -50.9 113.1 100.2 12.9

Table 10.1: The expected waiting times for the nine-node example queueing network

Example 10.3. Kiihn’s nine-node queueing network.


We take a queueing network similar to the one discussed in [166, Section 4.2.31 and [287,
Section VI]. It is a nine-node queueing network as depicted in Figure 10.6. Customers
arrive via a renewal process at nodes 1-3, each with arrival intensity X = 0.55. All node
service times are, on average, equal to 1, except for node 5, at which the mean service time
equals 2. Three cases are addressed, with different coefficients of variation for the arrival
and service processes. For every case, all the arrival processes have identical coefficients of
variation, and so have the service times.
10.4 The QNA method 219

The results are presented in Table 10.1. For the three different cases we present the
expected waiting time at each node (notice that nodes l-3 have the same expected waiting
time), both derived with QNA and with simulation (SIM). The relative error (RE) is defined
as RE= 100% x (QNA -SIM)/SIM. In the first case, QNA gives acceptable results. In the
second case, QNA provides less good results, except for queues 5-7. This is most probably
due to the rather extreme coefficient of variation of the arrival process. In the last case,
the results are a little better again. It is interesting to see that the QNA result for the
queues “in the middle” of the queueing network are often better than those for queues “on
the border”. As QNA is based on a number of approximations, it is difficult to track down
the exact source of the errors. cl

10.4.3 Summary of approximations


It is important to realise that the QNA method is not exact but is based on a number of
approximations. The results obtained with QNA therefore always have to be interpreted
carefully. Below we mention which approximations are made in the QNA approach:

1. Performance measures for the network as a whole are obtained by assuming that the
nodes are stochastically independent given the approximate flow parameters;

2. It is assumed that traffic streams are renewal processes that can be characterised
adequately by the first two moments of the interrenewal time;

3. A GI 1G (m approximation for the performance at a node is used which is only exact


for the MIMI1 or the MlGll queue;

4. Several equations are not based on an exact mathematical derivation , but on approx-
imate results, sometimes obtained after extensive simulations.

To avoid the third approximate step, one can consider the use of PHIPHlm queues, for
which an exact matrix-geometric solution exists. For networks with queues in series and
where some of the queues include customer creation, the arrival processes at intermediate
nodes are no renewal processes. In fact, such arrivals are better described as batch arrivals,
as we have seen in relation to the M 1G 11q ueue. Under some circumstances, it is therefore
better to use batch queueing models for the intermediate nodes.
220 10 Open queueing networks

node link

c E H

Figure 10.7: An example telecommunication network with && links connecting A& nodes

10.5 Telecommunication network modelling


In this section we present a typical application of open queueing network models: the
evaluation of possibly large packet-switched telecommunication networks. We begin with
a system description of such a telecommunication network in Section 10.5.1. We then
proceed with the first modelling approach, based on Jackson networks, in Section 10.5.2.
We refine our evaluation rnethod by switching from networks of MIMI1 queues to networks
of M]G]l queues in Section 10.5.3. We finish this section by illustrating the use of QNA for
the evaluation of telecommunication networks in Section 10.5.4.

10.5.1 System description


Consider a packet-switched telecommunication network consisting of A& nodes connected
by i$ links. The topology of the network might take any form, i.e., we do not restrict
ourselves to ring- or star-shaped networks. Most often, we will see meshed structures, as
depicted in Figure 10.7, with switching nodes A, B, ea‘, I. In what follows, we assume that
all the links, as depicted in the figure, are bidirectional but may have different capacity
in each direction (notice that this includes the case of having uni-directional links). The
nodes are capable of buffering incoming traffic and transmitting it on any of the outgoing
links. For this reason, these networks are also often called store-and-forward networks.
Examples of such networks are the ARPANET and the Internet [161, 277, 2841.
The nodes in the network model should not necessarily be understood as end-nodes in
the network; they can best be understood as concentrators for a large user group. As an
example of this, the nodes could be university computer centers and the network could
span a complete country. All computer users within a certain university would then be
able to communicate with all other users at all other universities via the central nodes
for their respective universities. We denote the aggregate stream of traffic originating in
10.5 Telecommunication network modelline; 221

bidirectional link
. . . . . ............ . .

node

Figure 10.8: The queueing network equivalent of a switching node and its associated bi-
directional links

node i destined for node j as ~i,j (in packets per second) and assume that it is a Poisson
process; normally ~i,i = 0. Since the traffic originating at one node is the aggregate traffic
of many users connected directly to this node, this assumption seems reasonable. The
overall aggregate network traffic, measured in packets per second, is denoted
^ A
y = p&j. (10.54)
i=l j=l
The complete traffic to be transported by the network is often summarised in a square
traffic matrix P with entries yi,j.
Let us now have a look at a possible queueing network model for such a telecommuni-
cation network. We take an abstract point of view and assume that a switching node can
be modelled as a single-server queue. Furthermore, we assume that each unidirectional
link has to be modelled as a separate queueing station as well. The queueing-network
equivalent of a switching node such as B, C, E or H in Figure 10.7 and its three associated
bidirectional links is illustrated in Figure 10.8. In doing so, we end up with a queueing
network with A& = Qn queueing stations representing the nodes and Ml = 2&$ queueing
stat ions representing the links.
Let us now come to the characteristics of the queueing stations. For every switching
node i we assume that the scheduling order is FCFS and that the service rate pi (in switch-
ing operations per second) is known. For now, we do not dwell on the actual switching
time distribution. For the queueing stations representing the links, the situation is more
complicated. Every link I is assumed to have a certain capacity cl (in bits per second). Now
222 10 Open queueing networks

consider a packet generated at node i and destined for node j; its length is drawn from a
particular packet length distribution. On its way, the packet passes through a number of
links and switching nodes. Since the packet length does not change on its way, we observe
that the service times over the links are in fact not independent. In the queueing network
models we have introduced so far, however, the service time at every node is assumed to
be independent of the service times at all the other nodes. Thus, when we try to model the
actual packet flow through the network realistically, we end up with a queueing network
model that we cannot solve. However, it seems reasonable to assume that in the overall
network, packets from many source-destination pairs will be transmitted in an interleaved
fashion. This source of randomness then justifies the assumption that the packet length
itself is a random variable with fixed mean, that is regenerated at every queue. In a sense,
we assume that the queues operate independently from one another. This assumption has
become known as Kleinrock’s independence assumption and has been validated as accurate
using extensive simulations.
We assume that all the packets originating at any of the nodes have the same mean
length and obey the same packet length distribution FB(~) (in bits) with mean value l/p
(bits per packet). A packet of b bits length will then take b/ci seconds to be transmitted
over link i. The number of packets that can be transmitted over link i is pci (packets per
second).
Now that we have established the characteristics of the queueing network nodes, we
have to compute the workload per node. We do so by readdressing the traffic matrix I? and
the actual routes through the network. Let R(i, j) be the set of links visited by packets
routed from i to j and let N(i, j) be the set of switching nodes in the route from i to j. We
assume that these sets are uniquely defined and that they are static. We do not address
the question how these sets, i.e., the actual routes through the network, are obtained or
established. We then can compute the arrival rate of packets to a link 1 as

Xl= c 7i,j, (10.55)


{(i,~)l~EW~,8)

and the arrival rate to switching node n as

(10.56)

Given these arrival rates, we can validate the stability of the system by verifying whether
the utilisations pl = Xl//~1 and pn = X,/p, are all smaller than 1. We can then compute
the expected response time for packets from i to j (often denoted as the expected delay)
10.5 Telecommunication network modelling 223

as the sum over the response times at all the links and nodes visited along the way:

= c (Jwvl + E[Sl] + fi> + c (E[W,] + E[&]), (10.57)


lER(i,j) nEN(i,j)
where we have split the response times per link and node in waiting and service time per
link and node, for reasons to become clear below. For links, we furthermore have added
the term Pi which is the propagation delay for link 1. It depends on the actual size of the
network and on the technology employed whether Pl does play a role in the response time
for a link or not. In networks with links of moderate capacity and small geographical size,
the service time, i.e., the transmission time, normally dominates the propagation time; in
these cases Pl is often neglected. In high-speed wide-area networks, on the other hand, Pl
can play an important role and can even be larger than E[Sl].
The overall average network response time can be expressed as the expected response
time on a route from node i to j, weighted by its relative importance with respect to the
overall traffic carried, i.e.,
E[R] = 2 2 =E[R(i,j)]. (10.58)
i=lj=l Y
Notice that we do not specify the routing probabilities ~i,j directly. Instead, we specify
routes. Given the information on the routes, we can of course compute the long-term
probability that an incoming packet in a switching node needs to be transmitted over a
particular outgoing link, but we do not need these probabilities as such.

10.5.2 Evaluation using Jackson queueing networks


We can evaluate a packet-switched communication network as sketched in the previous
section easily using a Jackson queueing network. Note, however, that we have already
made a number of assumptions in the sketched modelling trajectory in order to make this
possible.
The queueing network model of the communication network is completely specified by
the link and switching node parameters (ci and pi), the traffic matrix I?, the mean pa,cket
length l/p and the routing information (the sets R(i, j) and ni(i, j)). Extra assumptions
we have to make here are that the packet length and the switching time are negative
exponentially distributed random variables.
Having made these assumptions, we can compute the per-link expected delay as:

(10.59)
224 10 Open queueing networks

where the first term represents the expected waiting time and the second term E[Sl] = l/pcl
is the expected service time (see Chapter 4). Note that we have left out the propagation
delay over link 1. Similarly, the expected per-node delay at node n is:
PJW?J (10.60)
q%J = I-p + qsL1,
n
with E[S,] = l/pn.
We can adapt this model slightly when we consider the expected delay in the com-
munication network for a specific class of packets, e.g., user-oriented packets. The overall
traffic mix will consist of user packets and control packets. Since control packets are gen-
erally shorter than user packets, the overall expected packet length will be smaller than
the expected user packet length. For the per-link delay, we then can compute the expected
waiting time using the overall expected packet length, but the service time for user-packets
is then changed to the specific value for the user packets. For the switching nodes, we do
not change anything.
In our treatment so far we have restricted ourselves to the computation of mean perfor-
mance measures. Of course, using Jackson networks, we can also compute specific customer
distributions for the model, as indicated in Section 10.3.

10.5.3 Evaluation using networks of MIGIl queues


There are a number of restrictions in the use of Jackson queueing networks to evaluate
communication networks that might be overcome when switching to networks of MlGl 1
queues. First of all, such queueing networks allow us to use other than exponential service
times (in the switching nodes) and packet length distributions. Secondly, the use of different
packet lengths (in mean or distribution) would be possible. However, these advantages do
not come without cost. The computational procedure we will outline below is slightly more
complicated than the one sketched before and it is no longer exact.
Let us first address the case with fixed-length packets, since many networks do operate
with them. As we have seen in Chapter 5 the expected waiting time in a queueing station
with deterministic service times as opposed to negative exponentially distributed service
times, reduces by a factor 2. Hence, the only thing we change in the computations is the
MIMI1 based term for E[&] with the corresponding term for the MlGll queue. A similar
procedure can be followed for the expected response times at the switching nodes. First
note that this approach can also be followed when distributions other than deterministic
are required. Secondly, notice that we are making an approximation here. The fact that we
can analyse a network of queues by assessing one queue at a time is a typical characteristic
10.5 Telecommunication network modelling 225

of Jackson queueing networks (its product-form result) that does, in fact, not apply to
networks of M(G]l queues. Still though, simulation studies have confirmed that in most
cases this approximate approach yields results that are reasonably accurate.
We can take this even further by assuming that the packets from node i to j, which
form a Poisson process with rate yi,j, have fixed length l/,~~i,j. A random packet arriving
at link I then has the following mean and second moment (of transmission time) :
E[SJ= c --
7i,j 1
and E[SF] = c 7i,j
--
1
(10.61)
(i,j),lER(i,j) xl c@i,j ’ (i,j),lER(i,j) ‘1 (clk.i)2 ’
We finally comment on the use of even other M ]G (1 queues as models for the links and
switching nodes. We already addressed the possibility that in the network two types of
packets are communicated: control packets and user packets. By addressing these streams
of packets separately, one can use MI G] 1 models with priority scheduling to further enhance
the performance of the control packets. Any of the other M]GJl queueing models presented
in Chapter 6 might be used for modelling the nodes and links in a more appropriate way.
It should be noted, however, that these approaches are all approximate and thus should
be practiced with care.

10.5.4 Evaluation using QNA

In the previous section we have addressed a number of extensions to Jackson queueing net-
works that lead to approximate performance measures. A large number of these extensions
have already been captured in the QNA approach. Given a telecommunication network as
we have defined in Section 10.5.1, with QNA the performance evaluation would take the
following form.
The traffic matrix r would be used to compute the values Xl and X,, and also to
compute the routing probabilities ri,j. The second-order traffic equations are then used to
compute the second moment of the packet streams between links and nodes. Notice that
the traffic originating at node i and destined for node j does not need to be a Poisson
process any more. Furthermore, the packet length between each pair of end-nodes may
be different. As we have indicated in Section 10.5.3, in many cases the packet length per
source-destination pair is deterministic. We can then easily compute the first two moments
of the transmission or service time distribution at each of the links and nodes. The number
of servers per node $m_i$ can still be varied; the packet splitting and reassembly option can
also be used.
When the above has been given, the complete QNA input is known and the computa-
tional procedure as sketched in Section 10.4 can be performed.

10.6 Further reading


Seminal work on open queueing networks was done by Jackson in the 1950s [143, 144].
Later, his work has been extended and combined with other queueing network results (see
Chapter 12). Burke presented the result on the flow out of an M|M|1 queue as early as
1956 [...].
The work on the QNA method was initiated by Kühn [166] and has later been extended
by Whitt [288, 287]. Important steps in the QNA method are given by the Krämer and
Langenbach-Belz approximation [164] and by Marshall's result [191]. Haverkort recently
improved on the approximation for the individual node performance by using matrix-
geometric methods under the restriction of having phase-type renewal arrival streams and
phase-type service time distributions. Under the same restrictions, he also extended the
model class to the case where the nodes have finite capacity so that customer losses can
occur [122, 123].
Kleinrock has done pioneering work in the application of Jackson queueing networks
(and various variants) to computer networking problems. In [161], Chapters 5 and 6 are
devoted to the analysis of the early ARPANET in which a similar (but less detailed)
modelling approach is presented to that given here.

10.7 Exercises
10.1. Feed-forward queueing networks.
Show that the suggested solution for the series QN in Section 10.2.2 indeed fulfills the
given GBEs (case $M = 2$). Also show that a similar result applies for general $M$.

10.2. Jackson networks.


Show that the suggested solution for the Jackson network in Example 10.1 indeed fulfills
the given global balance equations.

10.3. A central server QN.


Consider the central server QN given in Figure 10.9. Jobs arrive at the CPU (the central
server) with rate $\lambda$ as a Poisson process. After service at the CPU, they either leave
the system (with probability $r_{1,0}$) or they move to one of the $M-1$ I/O-devices (with
probability $r_{1,j}$, $j = 2, \ldots, M$). After service at the I/O-device of choice, the jobs return to
the central server. All service times are assumed to be negative exponentially distributed.
We here address the case with $M = 3$, i.e., we have 2 I/O subsystems. Furthermore,

Figure 10.9: A central server QN

it is given that $E[S_1] = 2$ and $E[S_j] = 1.2$ ($j = 2, 3$), $\lambda = 1/7$ and the non-zero routing
probabilities are $r_{1,0} = 0.3$, $r_{1,2} = 0.6$ and $r_{1,3} = 0.1$.

1. Draw the CTMC underlying this JQN.

2. Write down the GBEs and show that Jackson’s solution for these equations is indeed
valid.

3. Compute the mean queue lengths at the stations.

4. Compute the mean time a (random) customer spends in this system.

10.4. QNA without customer creation and combination.


Derive the simplified QNA traffic equations when no customer creation nor combination
takes place, i.e., all pi = 1 and w = 1.

10.5. QNA without multiserver queues.


Further reduce the simplified QNA traffic equations when all queues are served by single
servers, i.e., all $m_i = 1$.

10.6. A simple QNA application.


Consider a QN with $M = 2$ queues and the following numerical parameters. For the
arrival streams we have: $\lambda_{0,1} = 2$ with $C^2_{0,1} = 2$ and $\lambda_{0,2} = 3$ with $C^2_{0,2} = 3$. For the
service time distributions we have: $E[S_1] = 0.1$ with $C^2_{S,1} = 3$ and $E[S_2] = 0.05$ with
$C^2_{S,2} = 0.5$. Furthermore, the following routing probabilities are given: $r_{1,2} = 1$, $r_{2,1} = 0.5$

and $r_{2,0} = 0.5$. Finally, there is no splitting and combining ($\gamma_i = 1$) and the queues have a
single server ($m_i = 1$).

1. Derive and solve the first-order traffic equations.

2. Derive and solve the second-order traffic equations.

3. Compute $E[W_i]$ and $E[N_i]$.

4. Assume that the squared coefficients of variation are all equal to 1. Now, compute
E[Wi] and E[Ni] using Jackson queueing networks, and compare the results.

Chapter 11

Closed queueing networks

In Chapter 10 we addressed queueing networks with, in principle, an unbounded number
of customers. In this chapter we will focus on the class of queueing networks with a fixed
number of customers. The simplest case of this class is represented by the so-called Gordon-
Newell queueing networks; they are presented in Section 11.1. As we will see, although
the state space of the underlying Markov chain is finite, the solution of the steady-state
probabilities is not at all straightforward (in comparison to Jackson networks). A recursive
scheme to calculate the steady-state probabilities in Gordon-Newell queueing networks
is presented in Section 11.2. In order to ease the computation of average performance
measures, we discuss the mean value analysis (MVA) approach to evaluate GNQNs in
Section 11.3. Since this approach still is computationally quite expensive for larger QNs or
QNs with many customers, we present MVA-based bounding techniques for such queueing
networks in Section 11.4. We then discuss an approximate iterative technique to evaluate
GNQNs in Section 11.5. We conclude the chapter with an application study in Section 11.6.

11.1 Gordon-Newell queueing networks


Gordon-Newell QNs (GNQNs), named after their inventors, are representatives of a class
of closed Markovian QNs (all services have negative exponential service times). Again we
deal with $M$ M|M|1 queues which are connected to each other according to the earlier
encountered routing probabilities $r_{i,j}$. The average service time at queue $i$ equals $E[S_i] =
1/\mu_i$. Let the total and fixed number of customers in such a QN be denoted $K$, so that we
deal with a CTMC on a finite state space.

To stress the dependence of the performance measures on both the number of queues and
the number of customers present, we will often include both $M$ and $K$ as (functional)
parameters of the measures of interest.
As we have seen before, the involved customer streams between the queues are not
necessarily Poisson but this type of QN still allows for a product-form solution as we will
see later. Let us start with solving the traffic equations:
$$\lambda_j(K) = \sum_{i=1}^{M} \lambda_i(K)\, r_{i,j}, \qquad (11.1)$$
where $\lambda_j(K)$ denotes the throughput through node $j$, given the presence of $K$ customers
in the QN. This system of equations, however, is of rank $M - 1$. We can only calculate
the values of $\lambda_j$ relative to some other $\lambda_i$ ($i \neq j$), and therefore we introduce relative
throughputs as follows. We define $\lambda_i(K) = V_i\, \lambda_1(K)$, where the so-called visit count or
visit ratio $V_i$ expresses the throughput of node $i$ relative to that of node 1; clearly $V_1 = 1$.
These $V_i$-values can also be interpreted as follows: whenever node 1 is visited once ($V_1 = 1$),
node $i$ is visited, on average, $V_i$ times. Stated differently, if we call the period between
two successive departures from queue 1 a passage, then $V_i$ expresses the number of visits
to node $i$ during such a passage. Using the visit counts as just defined, we can restate the
traffic equations as follows:
$$V_j = \sum_{i=1}^{M} V_i\, r_{i,j} = V_1\, r_{1,j} + \sum_{i=2}^{M} V_i\, r_{i,j} = r_{1,j} + \sum_{i=2}^{M} V_i\, r_{i,j}. \qquad (11.2)$$

This system of linear equations has a unique solution. Once we have computed the visit
counts, we can calculate the service demands per passage as $D_i = V_i\, E[S_i]$ for all $i$. So,
$D_i$ expresses the amount of service a customer requires (on average) from node $i$ during
a single passage. The queue with the highest service demand per passage clearly is the
bottleneck in the system. Here, $D_i$ takes over the role of $\rho_i$ in open queueing networks.
Notice that in many applications, the visit-ratios are given as part of the QN specifi-
cation, rather than the routing probabilities. Of course, the latter can be computed from
the former; however, as we will see below, this is not really necessary. We only need the
visit-ratios to compute the performance measures of interest.
Furthermore, notice that the values for $D_i$ might be larger than 1; this does not imply
that the system is unstable in these cases. We simply changed the time basis from the
percentage of time server $i$ is busy (expressed by $\rho_i$) to the amount of service server $i$ has
to handle per passage (expressed by $D_i$). Moreover, a closed QN is never unstable, i.e., it
will never build up unbounded queues: it is self-regulatory because the maximal filling of
any queue is bounded by the number of customers present (K).
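As an illustration of how visit counts and service demands are obtained in practice, the following Python sketch (not from the book; variable names are illustrative) solves the visit-ratio equations (11.2) for $V_1 = 1$ and computes the $D_i$; the numbers correspond to the three-node example introduced below:

    import numpy as np

    def visit_ratios(r):
        """r[i][j]: routing probability from node i to node j (0-based); node 0 is the reference."""
        M = len(r)
        R22 = np.array([[r[i][j] for j in range(1, M)] for i in range(1, M)])
        A = np.eye(M - 1) - R22.T          # V_j - sum_{i>=2} V_i r_{i,j} = r_{1,j}
        b = np.array([r[0][j] for j in range(1, M)])
        return np.concatenate(([1.0], np.linalg.solve(A, b)))

    r  = [[0.0, 0.4, 0.6], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]   # routing of Example 11.1
    ES = np.array([1.0, 2.0, 3.0])
    V  = visit_ratios(r)                    # [1.0, 0.4, 0.6]
    D  = V * ES                             # service demands [1.0, 0.8, 1.8]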

Figure 11.1: A small three-node GNQN

Let us now address the state space of a GNQN. Clearly, if we have $M$ queues, the state
space must be a finite subset of $\mathbb{N}^M$. In particular, we have
$$\mathcal{I}(M,K) = \{n \in \mathbb{N}^M \mid n_1 + \cdots + n_M = K\}. \qquad (11.3)$$
We define the vector $N = (N_1, \ldots, N_M)$ in which $N_i$ is the random variable denoting the
number of customers in node $i$. Note that the random variables $N_i$ are not independent
from one another any more (this was the case in Jackson networks). It has been shown by
Gordon and Newell that the steady-state distribution of the number of customers over the
$M$ nodes is still given by a product-form:
$$\Pr\{N = n\} = \frac{1}{G(M,K)} \prod_{i=1}^{M} D_i^{n_i}, \qquad n \in \mathcal{I}(M,K), \qquad (11.4)$$
where the normalising constant (or normalisation constant) is defined as
$$G(M,K) = \sum_{n \in \mathcal{I}(M,K)} \prod_{i=1}^{M} D_i^{n_i}. \qquad (11.5)$$

It is the constant G(M, K), depending on M and K, that takes care of the normalisation so
that we indeed deal with a proper probability distribution, Only once we know the normal-
ising constant are we able to calculate the throughputs and other interesting performance
measures for the QN and its individual queues. This makes the analysis of GNQNs more
difficult than that of JQNs, despite the fact that we have changed an infinitely large state
space to a finite one.

Example 11.1. A three-node GNQN (I).


Consider a GNQN with only three stations, numbered 1 through 3. It is given that $E[S_1] =
1$, $E[S_2] = 2$ and $E[S_3] = 3$. Furthermore, we take $r_{1,2} = 0.4$, $r_{1,3} = 0.6$, $r_{2,1} = r_{3,1} = 1$.

The other routing probabilities are zero. The number of jobs circulating equals K = 3.
This GNQN is depicted in Figure 11.1.
We first calculate the visit ratios. As usual, we take station 1 as a reference station,
i.e., $V_1 = 1$, so that $V_2 = 0.4$ and $V_3 = 0.6$ (in a different type of specification, these visit
counts would be directly given). The service demands equal: $D_1 = 1$, $D_2 = 2(0.4) = 0.8$
and $D_3 = 1.8$. Station 3 has the highest service demand and therefore forms the bottleneck.
Let us now try to write down the state space of the CTMC underlying this GNQN. It
comprises the set $\mathcal{I}(3,3) = \{(n_1, n_2, n_3) \in \mathbb{N}^3 \mid n_1 + n_2 + n_3 = 3\}$, which is small enough to
state explicitly:
$$\mathcal{I}(3,3) = \{(3,0,0), (0,3,0), (0,0,3), (1,2,0), (1,0,2), (0,1,2), (2,1,0), (2,0,1), (0,2,1), (1,1,1)\}. \qquad (11.6)$$

As can be observed, $|\mathcal{I}(3,3)| = 10$. Applying the definition of $G(M, K)$, we calculate
$$G(3,3) = \sum_{n \in \mathcal{I}(3,3)} D_1^{n_1} D_2^{n_2} D_3^{n_3} = 19.008. \qquad (11.7)$$
The probability of residing in state $(1,1,1)$ now, for instance, equals
$$\Pr\{N = (1,1,1)\} = \frac{D_1 D_2 D_3}{G(3,3)} = \frac{1.44}{19.008} = 0.0758. \qquad (11.8)$$
The probability that station 1 is empty can be calculated as
$$\Pr\{N_1 = 0\} = \frac{0.8^3 + 1.8^3 + 0.8(1.8)^2 + 0.8^2(1.8)}{19.008} = \frac{10.088}{19.008} = 0.5307. \qquad (11.9)$$
Consequently, we have $\rho_1(3) = 1 - 0.5307 = 0.4693$, and $\lambda_1(3) = \rho_1 \mu_1 = 0.4693$. Note that
we stress the dependence on the number of customers present by including this number
between parentheses. From this, the other throughputs can be derived using the calculated
visit counts. The average filling of, e.g., station 1 can be derived as
$$E[N_1(3)] = \frac{1(D_1 D_2^2 + D_1 D_3^2 + D_1 D_2 D_3) + 2(D_1^2 D_2 + D_1^2 D_3) + 3 D_1^3}{19.008} = \frac{13.520}{19.008} = 0.7113. \qquad (11.10)$$
Then, the expected response time when there are 3 customers present, denoted $E[R_1(3)]$,
can be calculated using Little's law as
$$E[R_1(3)] = \frac{E[N_1(3)]}{\lambda_1(3)} = \frac{0.7113}{0.4693} = 1.516. \qquad (11.11)$$

Figure 11.2: The CTMC underlying the three-node GNQN

In a similar way, we can compute for node 2: $\Pr\{N_2 = 0\} = 0.6246$, $\rho_2(3) = 0.3754$, $\lambda_2(3) =
0.1877$, $E[N_2(3)] = 0.5236$ and $E[R_2(3)] = E[N_2(3)]/\lambda_2(3) = E[N_2(3)]/(V_2\lambda_1(3)) = 2.789$.
For node 3, the corresponding values are $\rho_3(3) = 0.8447$, $\lambda_3(3) = 0.2816$, $E[N_3(3)] = 1.765$
and $E[R_3(3)] = 6.268$, respectively. Note the large utilisation and average queue length and
response time at node 3 (the bottleneck station). $\Box$
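For small models such as this one, the product form (11.4)-(11.5) can also be evaluated by brute force; the following Python sketch (not from the book) enumerates the 10 states of $\mathcal{I}(3,3)$ and reproduces the numbers above:

    from itertools import product

    D, K = [1.0, 0.8, 1.8], 3
    states = [n for n in product(range(K + 1), repeat=3) if sum(n) == K]
    weight = {n: D[0] ** n[0] * D[1] ** n[1] * D[2] ** n[2] for n in states}
    G = sum(weight.values())                               # G(3,3) = 19.008
    p = {n: w / G for n, w in weight.items()}              # steady-state probabilities (11.4)
    EN1  = sum(n[0] * p[n] for n in states)                # E[N_1(3)] = 0.7113
    rho1 = 1.0 - sum(p[n] for n in states if n[0] == 0)    # rho_1(3) = 0.4693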

Notice that it is also possible to solve directly the CTMC underlying the GNQN from
the previous example. The only thing one has to do for that is to solve the global balance
equations that follow directly from the state-transition diagram that can be drawn for the
QN; it is depicted in Figure 11.2, where a state identifier is a triple $(i, j, k)$ indicating
the number of customers in the three nodes and where $\mu_{1,2} = \mu_1 r_{1,2}$ and $\mu_{1,3} = \mu_1 r_{1,3}$. In
general, the derivation of the CTMC is a straightforward task; however, due to its generally
large size, it is not so easy in practice. In this example, the number of states is only 10,
but it increases quickly with increasing numbers of customers. To be precise, in a GNQN
with $M$ nodes and $K$ customers, the number of states equals
$$|\mathcal{I}(M,K)| = \binom{K + M - 1}{M - 1}. \qquad (11.12)$$

This can be understood as follows. The $K$ customers are to be spread over $M$ nodes.
We can understand this as the task to put $M - 1$ dividing lines between the $K$ lined-up
customers, i.e., we have a line of $K + M - 1$ objects, of which $K$ represent customers and
$M - 1$ represent boundaries between queues. The number of ways in which we can assign
$M - 1$ of the objects to be of type "boundary" is exactly the number of combinations

of $M - 1$ out of $K + M - 1$. For the three-node example at hand we find $|\mathcal{I}(3,K)| =
\frac{1}{2}(K^2 + 3K + 2)$, a quadratic expression in the number of customers present. For larger
customer populations, the construction of the underlying CTMC therefore is not practically
feasible, nor is the explicit computation of the normalising constant as a sum over all the
elements of $\mathcal{I}(M, K)$. Fortunately, there are more efficient ways to compute the normalising
constant; we will discuss such methods in the following sections. Before doing so, however,
we present an important result to obtain, in a very easy way, bounds on the performance
of a system.

It is important to realise that in a GNQN all queues are stable. The worst thing that can
happen is that almost all customers spend most of their time waiting at a particular queue.
We denote the queueing station with the largest service demand as the system bottleneck;
its index is denoted $b$. Since $\rho_i(K) = \lambda(K)\, V_i\, E[S_i] < 1$ for all $i$, we can immediately derive
the following throughput bound:
$$\lambda(K) < \min_i \left\{\frac{1}{V_i\, E[S_i]}\right\} = \frac{1}{D_b}, \quad \text{for all } K, \qquad (11.13)$$
that is, the bottleneck station determines the maximum throughput. If we increase $K$, the
utilisation of station $b$ will approach 1:
$$\lim_{K \to \infty} \rho_b(K) = 1, \quad \text{and} \quad \lim_{K \to \infty} \lambda(K) = \frac{1}{D_b}. \qquad (11.14)$$
By this fact, the utilisations of other stations will also be bounded. We have
$$\lim_{K \to \infty} \rho_i(K) = \lim_{K \to \infty} V_i\, E[S_i]\, \lambda(K) = \frac{D_i}{D_b} \leq 1, \quad \text{for all } i. \qquad (11.15)$$

Thus, when $K$ increases, the utilisation of station $i$ will converge to a fixed, station-specific
value, the utilisation limit $D_i/D_b$. It is very important to realise that not all the limiting
utilisations are equal to 1. Some stations cannot be used to their full capacity because
another station is fully loaded. This result also implies that when a bottleneck in a system
is exchanged for an $x$-times faster component it does not follow that the overall speed of
the system is increased by the same factor. Indeed, the old bottleneck may have been
removed, but another one might have appeared. We will illustrate these concepts with two
examples.

Example 11.2. A three-node GNQN (II).


We come back to the previous three-node GNQN. There we observed the following three

service demands: $D_1 = 1$, $D_2 = 0.8$, $D_3 = 1.8$. Since $D_3$ is the largest service demand,
node 3 is the bottleneck in the model ($b = 3$). This implies that
$$\lim_{K \to \infty} \lambda(K) = \frac{1}{D_3} = 0.5556. \qquad (11.16)$$
Furthermore, we find that
$$\lim_{K \to \infty} \rho_1(K) = \frac{D_1}{D_3} = \frac{1}{1.8} = 0.5556, \qquad
\lim_{K \to \infty} \rho_2(K) = \frac{D_2}{D_3} = \frac{0.8}{1.8} = 0.4444, \qquad
\lim_{K \to \infty} \rho_3(K) = \frac{D_3}{D_3} = 1. \qquad (11.17)$$
We observe that nodes 1 and 2 can only be used up to 55% and 44% of their capacity! $\Box$

Example 11.3. Bottleneck removal.


Consider a simple computer system that can be modelled conveniently with a two-node
GNQN. Analysis of the system parameters reveals that the service demands to be used
in the GNQN have the following values: $D_1 = 4.0$ and $D_2 = 5.0$. Clearly, node 2 is the
bottleneck as its service demand is the largest. The throughput of the system is therefore
bounded by $1/D_2 = 0.2$. By doubling the speed of node 2, we obtain the following situation:
$D_1' = 4.0$ and $D_2' = 2.5$. The bottleneck has been removed but another one has appeared:
station 1. In the new system, the throughput is bounded by $1/D_1' = 0.25$. Although this
is an increase of 25%, it is not an increase by a factor of 2 (200%) which might have been
expected from the doubling of the speed of the bottleneck node. $\Box$

11.2 The convolution algorithm


In the GNQNs presented in the previous section, the determination of the normalising
constant turned out to be a computationally intensive task, especially for larger queueing
networks and large numbers of customers, since then the employed summation (11.5) en-
compasses very many elements. This is made worse by the fact that the summands are
often very small or very large, depending on the values of the $D_i$ and the particular value
of $n$ one is accounting for (one can often avoid summands that become either too small or
too large by an appropriate rescaling of the $D_i$-values). In general though, it will be very
difficult to limit round-off errors so it is better to avoid direct summation.

Fortunately, a very fast and stable algorithm for the computation of G(M, K) does
exist and is known as the convolution algorithm; it was first presented by Buzen in the
early 1970s. Let us start with the definition of G(M, K):

$$G(M,K) = \sum_{n \in \mathcal{I}(M,K)} \prod_{i=1}^{M} D_i^{n_i}. \qquad (11.18)$$
We now split this sum into two parts, one part accounting for all the states with $n_M = 0$
and one part accounting for all the states with $n_M > 0$:
$$G(M,K) = \sum_{n \in \mathcal{I}(M-1,K)} \prod_{i=1}^{M-1} D_i^{n_i} + D_M \sum_{n \in \mathcal{I}(M,K-1)} \prod_{i=1}^{M} D_i^{n_i}. \qquad (11.19)$$
Since the first term sums over all states such that there are $K$ customers to distribute over
$M - 1$ queues (queue $M$ is empty) it represents exactly $G(M-1, K)$. Similarly, in the
second term one sums over all states such that there are $K - 1$ customers to distribute
over the $M$ queues, as we are already sure that one of the $K$ customers resides in queue
$M$; hence we have a term $D_M\, G(M, K-1)$. Consequently, we find:
$$G(M, K) = G(M-1, K) + D_M\, G(M, K-1). \qquad (11.20)$$

This equation allows us to express the normalising constant G(M, K) in terms of nor-
malising constants with one customer and with one queue less, i.e., we have a recursive
expression for G(M, K). To start this recursion we need boundary values. These can be
derived as follows. When there is only 1 queue, by definition, $G(1, k) = D_1^k$, for all $k \in \mathbb{N}$.
Also, by the fact that there is only one way of distributing 0 customers over $M$ queues, we
have $G(m, 0) = 1$, for all $m$.
A straightforward recursive solution can now be used to compute $G(M, K)$. However,
this does not lead to an efficient computation since many intermediate results will be
computed multiple times, as illustrated by the following double application of (11.20):
$$\begin{aligned}
G(M, K) &= G(M-1, K) + D_M\, G(M, K-1)\\
        &= G(M-2, K) + D_{M-1}\, G(M-1, K-1)\\
        &\quad + D_M\, G(M-1, K-1) + D_M^2\, G(M, K-2). \qquad (11.21)
\end{aligned}$$

Instead, (11.20) can be implemented efficiently in an iterative way, as illustrated in Ta-


ble 11.1. The boundary values are easily set. The other values for G(m, k) can now be

  k    | G(1,k)      G(2,k)   ...   G(m-1,k)       G(m,k)                              ...   G(M,k)
  -----+---------------------------------------------------------------------------------------------
  0    | 1           1        ...   1              1                                   ...   1
  1    | D_1
  2    | D_1^2
  3    | D_1^3
  ...  |
  k-1  | D_1^{k-1}            ...   G(m-1,k-1)     G(m,k-1)
  k    | D_1^k                ...   G(m-1,k)  -->  G(m,k) = G(m-1,k) + D_m G(m,k-1)

Table 11.1: The calculation of G(M, K) with Buzen's convolution algorithm

calculated in a column-wise fashion, left to right, top to bottom. In its most efficient form,
the iterative approach to compute $G(M, K)$ only requires $M - 1$ columns to compute, each
of which requires K additions and K multiplications. Hence, we have a complexity O(MK)
which is far less than the direct summation approach employed earlier. Furthermore, if
only the end value G( M, K) is required, only a single column of intermediate values needs
to be stored. New columns replace older columns as the computation proceeds. Thus, we
need only O(K) storage for this algorithm.
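A minimal Python sketch of this computation (not from the book; it stores, as discussed, only a single column) could look as follows:

    def buzen(D, K):
        """Convolution algorithm (11.20); returns the column G[k] = G(M,k), k = 0..K."""
        G = [1.0] + [0.0] * K          # start from the 'empty network' column
        for Dm in D:                   # add one queue at a time (column by column)
            for k in range(1, K + 1):
                G[k] += Dm * G[k - 1]  # G(m,k) = G(m-1,k) + D_m G(m,k-1)
        return G

    G = buzen([1.0, 0.8, 1.8], 3)      # [1.0, 3.6, 8.92, 19.008]
    throughput = G[2] / G[3]           # X(3) = G(M,2)/G(M,3) = 0.469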
Once we have computed the normalising constant G( M, K) we can very easily calculate
interesting performance measures. For instance, to evaluate the probability of having at
least $n_i$ customers in queue $i$, we have
$$\Pr\{N_i \geq n_i\} = \sum_{n \in \mathcal{I}(M,K):\, N_i \geq n_i} \Pr\{N = n\} = D_i^{n_i}\, \frac{G(M, K - n_i)}{G(M, K)}. \qquad (11.22)$$
Using this result, we find for the utilisation of queue $i$:
$$\rho_i = \Pr\{N_i \geq 1\} = D_i\, \frac{G(M, K-1)}{G(M, K)}. \qquad (11.23)$$
Using the fact that $\rho_i = \lambda_i(K)\, E[S_i] = \lambda(K)\, V_i\, E[S_i] = \lambda(K)\, D_i$, we find:
$$\lambda(K) = \frac{G(M, K-1)}{G(M, K)}. \qquad (11.24)$$

Thus, the throughput for the reference station simply equals the quotient of the last two
computed normalising constants. To compute the probability of having exactly $n_i$ cus-
tomers in node $i$, we proceed as follows:
$$\Pr\{N_i = n_i\} = \sum_{n \in \mathcal{I}(M,K):\, N_i = n_i} \frac{1}{G(M,K)} \prod_{j=1}^{M} D_j^{n_j} = \frac{D_i^{n_i}}{G(M,K)} \sum_{n \in \mathcal{I}(M,K):\, N_i = n_i} \prod_{j \neq i} D_j^{n_j}. \qquad (11.25)$$
As can be observed, the sum in the last expression resembles the normalising constant
in a GNQN in which one queue (namely $i$) and $n_i$ customers are removed. However, it
would be wrong to "recognize" this sum as $G(M-1, K-n_i)$ since we have removed the
$i$-th station, and not the $M$-th station; the column $G(M-1, \cdot)$ corresponds to a GNQN
in which station $M$ has been removed. However, if we, for the time being, assume that
$i = M$, we obtain the following:
$$\Pr\{N_M = n_M\} = \frac{D_M^{n_M}}{G(M,K)} \sum_{n \in \mathcal{I}(M-1, K-n_M)} \prod_{j=1}^{M-1} D_j^{n_j} = \frac{D_M^{n_M}\, G(M-1, K-n_M)}{G(M,K)}. \qquad (11.26)$$

In this expression we see two normalising constants appearing, one corresponding to column
M - 1 and one corresponding to column M. It is this dependence on two columns that
makes the ordering of the stations in the convolution scheme important.
We will now derive an alternative expression for $\Pr\{N_M = n_M\}$ in which only normalising
constants of the form $G(M, \cdot)$ appear, so that the ordering of the stations, as sketched
above, does not pose a problem any more. These expressions are then valid for all stations.
We first rewrite (11.20) as
$$G(M-1, K) = G(M, K) - D_M\, G(M, K-1), \qquad (11.27)$$
and substitute it, with the number of customers set equal to $K - n_M$, in (11.26); we then
obtain
$$\Pr\{N_M = n_M\} = \frac{D_M^{n_M}}{G(M,K)} \left( G(M, K - n_M) - D_M\, G(M, K - n_M - 1) \right). \qquad (11.28)$$
In this expression, however, we see only normalising constants appearing that belong to
the M-th column in the tabular computational scheme. Since the last column must always

be the same, independently of the ordering of the stations, this expression is not only valid for
the $M$-th station, but for all stations, so that we have:
$$\Pr\{N_i = n_i\} = \frac{D_i^{n_i}}{G(M,K)} \left( G(M, K - n_i) - D_i\, G(M, K - n_i - 1) \right). \qquad (11.29)$$
Another advantage of expressions only using normalising constants of the form $G(M, \cdot)$
is that they are all available at the end of the computation. The columns $G(M - i, \cdot)$ might
have been overwritten during the computations to save storage.
For calculating average queue fillings the situation is a little bit more complicated,
however, still reasonable. We have
$$E[N_i(K)] = \sum_{k=1}^{K} k\, \Pr\{N_i = k\} = \sum_{k=1}^{K} \Pr\{N_i \geq k\} = \sum_{k=1}^{K} D_i^{k}\, \frac{G(M, K-k)}{G(M, K)}. \qquad (11.30)$$

From these results we can, by applying Little's law, derive the expected response time
$E[R_i(K)]$ for queue $i$ as follows:
$$E[R_i(K)] = \frac{E[N_i(K)]}{\lambda_i(K)} = \frac{E[N_i(K)]}{V_i\, \lambda(K)} = \frac{\sum_{k=1}^{K} D_i^{k}\, G(M, K-k)}{V_i\, G(M, K-1)}. \qquad (11.31)$$
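These detailed measures follow directly from the final column produced by the convolution algorithm; a possible Python sketch (illustrative only), building on the buzen() function given above, is:

    def node_measures(D, V, G, i, K):
        """Measures for node i (0-based) from the final column G[k] = G(M,k)."""
        lam = G[K - 1] / G[K]                                        # X(K), (11.24)
        rho = D[i] * lam                                             # utilisation, (11.23)
        pn = [D[i] ** n * (G[K - n] - (D[i] * G[K - n - 1] if n < K else 0.0)) / G[K]
              for n in range(K + 1)]                                 # Pr{N_i = n}, (11.29)
        EN = sum(D[i] ** k * G[K - k] for k in range(1, K + 1)) / G[K]   # (11.30)
        ER = EN / (V[i] * lam)                                       # (11.31)
        return lam, rho, pn, EN, ER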

Example 11.4. A three-node GNQN (III).


Consider the GNQN with 3 nodes we have addressed before, and of which we do know
the visits counts and the service demands: VI = 1, Vz = 0.4, V, = 0.6 and Dr = 1.0, D2 =
0.8,Ds = 1.8, respectively. The procedure to calculate G(M, K) is now performed by
stepwisely filling the following table:

k GW) G(2A G(W)


0 1.0 1.0 1.0
1 1.0 1.8 3.6
2 1.0 2.44 8.92
3 1.0 2.952 19.008

Using the result that $\lambda(K) = G(M, K-1)/G(M, K)$, we have $\lambda(1) = 1/3.6 = 0.278$,
$\lambda(2) = 3.6/8.92 = 0.404$ and $\lambda(3) = 8.92/19.008 = 0.469$. Using these values, we can
calculate the utilisations: $\rho_i(3) = \lambda(3)\, D_i$. For instance, we have $\rho_2(3) = 0.469 \times 0.8 =
0.3752$. For calculating the average number of customers in node 2, we use
$$E[N_2(3)] = \sum_{k=1}^{3} D_2^{k}\, \frac{G(M, 3-k)}{G(M, 3)} = \frac{(0.8 \times 8.92) + (0.8^2 \times 3.6) + (0.8^3 \times 1.0)}{19.008} = 0.5236. \qquad (11.32)$$

  k   ρ_1(k)   ρ_2(k)   ρ_3(k)   E[N_1(k)]   E[N_2(k)]   E[N_3(k)]   E[R_1(k)]   E[R_2(k)]   E[R_3(k)]
  1   0.278    0.222    0.500    0.278       0.222       0.500       1.000       2.000       3.000
  2   0.404    0.323    0.726    0.516       0.395       1.090       1.278       2.445       4.500
  3   0.469    0.375    0.845    0.711       0.524       1.765       1.516       2.791       6.268

Table 11.2: Performance results for the small GNQN derived with the convolution method

From this, we derive
$$E[R_2(3)] = \frac{E[N_2(3)]}{\lambda_2(3)} = \frac{E[N_2(3)]}{V_2\, \lambda(3)} = \frac{0.5236}{0.4 \times 0.469} = 2.7909. \qquad (11.33)$$
Other performance measures can be derived similarly and are presented in Table 11.2. $\Box$

In the computations presented so far we have computed the expected response time at
nodes per visit; we have taken the viewpoint of an arbitrary customer and computed its
expected residence time at a particular node. Very often in the analysis of GNQNs, the
expected response time per passage is also computed. This is nothing more than the usual
expected response time weighted by the number of times the particular node is visited in
a passage. We denote the expected response time per passage at node $i$ as $E[\bar{R}_i(K)]$ and
have:
$$E[\bar{R}_i(K)] = V_i\, E[R_i(K)]. \qquad (11.34)$$
$E[\bar{R}_i(K)]$ simply denotes the expected amount of time a customer spends in node $i$ during
an average passage through the network. Consequently, if node $i$ is visited more than once
per passage ($V_i > 1$), the residence times of all these visits to node $i$ are added. Similarly,
if node $i$ is only visited in a fraction of passages ($V_i < 1$) the average time a customer
spends at node $i$ is weighted accordingly. In a similar way, the overall expected response
time per passage is defined as
$$E[\bar{R}(K)] = \sum_{i=1}^{M} E[\bar{R}_i(K)] = \sum_{i=1}^{M} V_i\, E[R_i(K)], \qquad (11.35)$$
and expresses the expected amount of time it takes a customer to pass once through the
queueing network. Given $E[\bar{R}(K)]$, the frequency at which a single customer attends the
reference node (usually node 1) is then $1/E[\bar{R}(K)]$. Since there are $K$ customers cycling
through the QN, the throughput through the reference node must be:
$$\lambda(K) = \frac{K}{E[\bar{R}(K)]}. \qquad (11.36)$$
We will use this result in the mean-value analysis to be presented in Section 11.3.


Figure 11.3: The role of CTMCs and MVA in the solution of QN models

A final remark regarding the convolution approach follows here. We have presented it
for GNQNs where all the queues are M|M|1 queues. However, various extensions do exist,
most notably to the case where the individual nodes have service rates that depend on
their queue length. In particular, the infinite-server queue belongs to this class of nodes.
We will come back to these extensions of the convolution algorithm in Chapter 12.

11.3 Mean-value analysis


In the analysis of the GNQNs discussed so far, we have used intermediate quantities such
as normalising constants (in the case of the convolution algorithm) or steady-state proba-
bilities (in the case of a direct computation at the CTMC level) to compute user-oriented
measures of interest such as average response times or utilisations at the nodes of the QN.
This is illustrated in Figure 11.3. Although this approach as such is correct, there may
be instances in which it is inefficient or computationally unattractive. This happens when
one is really only interested in average performance measures. Then we do not need the
steady-state probabilities nor the normalising constants, but resort to an approach known
as mean-value analysis (MVA) which was developed by Reiser and Lavenberg in the late
1970s. With MVA, the average performance measures of interest are directly calculated at
the level of the QN without resorting to an underlying CTMC and without using normalis-
ing constants. The advantage of such an approach is that the quantities that are employed
in the computations have a “physical” interpretation in terms of the GNQN and hence the
computations can be better understood and verified for correctness. Note that in Chapter 4
we have already encountered a special case of the MVA approach in the discussion of the
terminal model. Here we will generalise that approach.

A key role in MVA is played by the so-called arrival theorem for closed queueing net-
works, which was derived independently in the late 1970s by Lavenberg and Reiser [174]
and Sevcik and Mitrani [260]; we refer to these publications for the (involved) proof of this
theorem.

Theorem 11.1. Arrival theorem.


In a closed queueing network the steady-state probability distribution of cus-
tomers at the moment a customer moves from one queue to another, equals
the (usual) steady-state probability distribution of customers in that queueing
network without the moving customer. $\Box$

A direct consequence of the arrival theorem is that a customer moving from queue $i$ to
queue $j$ in a QN with $K$ customers, will find, upon its arrival at queue $j$, on average
$E[N_j(K-1)]$ customers in queue $j$. Assuming that customers are served in FCFS order,
we can use this result to establish a recursive relation between the average performance
measures in a GNQN with $K$ customers and a GNQN with $K - 1$ customers as follows.
The average waiting time of a customer arriving at queue $i$, given an overall customer
population $K$, equals the number of customers present upon arrival, multiplied by their
average service time. According to the arrival theorem, this can be expressed as:
$$E[W_i(K)] = E[N_i(K-1)]\, E[S_i]. \qquad (11.37)$$
The average response time (per visit) then equals the average waiting time plus the average
service time:
$$E[R_i(K)] = (E[N_i(K-1)] + 1)\, E[S_i]. \qquad (11.38)$$
Multiplying this with the visit ratio $V_i$, we obtain the average response time per passage:
$$E[\bar{R}_i(K)] = (E[N_i(K-1)] + 1)\, V_i\, E[S_i] = (E[N_i(K-1)] + 1)\, D_i. \qquad (11.39)$$
Using Little's law, we know
$$E[N_i(K)] = \lambda_i(K)\, E[R_i(K)] = \lambda(K)\, V_i\, E[R_i(K)] = \lambda(K)\, E[\bar{R}_i(K)], \qquad (11.40)$$
or, if we sum over all stations,
$$\sum_{i=1}^{M} E[N_i(K)] = \lambda(K) \sum_{i=1}^{M} E[\bar{R}_i(K)] = \lambda(K)\, E[\bar{R}(K)] = K. \qquad (11.41)$$

From this equation, we have
$$\lambda(K) = \frac{K}{E[\bar{R}(K)]}, \qquad (11.42)$$
as we have seen before, and which can be substituted in Little's law applied to a single
node:
$$E[N_i(K)] = \lambda(K)\, E[\bar{R}_i(K)] = K\, \frac{E[\bar{R}_i(K)]}{E[\bar{R}(K)]}. \qquad (11.43)$$
This result is intuitively appealing since it expresses that the average number of customers
at queue $i$ equals the fraction of time a customer resides in queue $i$ during a passage, i.e.,
$E[\bar{R}_i(K)]/E[\bar{R}(K)]$, times the total number of customers $K$.
Using (11.39) and (11.43), we can recursively compute average performance measures for
increasing values of $K$. Knowing $E[\bar{R}(K)]$, we can use (11.42) to calculate $\lambda(K)$. Finally,
we can compute $\rho_i(K) = \lambda(K)\, V_i\, E[S_i]$. The start of the recursion is formed by the case
$K = 1$, for which $E[\bar{R}_i(1)] = D_i$, for all $i$.
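The complete recursion fits in a few lines of code; the following Python sketch (not from the book, with illustrative names; it already includes the infinite-server variant of (11.46) discussed below) implements it:

    def mva(D, K, is_infinite=None):
        """Exact MVA; D[i] is the service demand of node i, is_infinite[i] marks IS nodes."""
        M = len(D)
        is_infinite = is_infinite or [False] * M
        EN = [0.0] * M                                       # E[N_i(0)] = 0
        for k in range(1, K + 1):
            ER = [D[i] if is_infinite[i] else (EN[i] + 1.0) * D[i]
                  for i in range(M)]                         # per-passage response times, (11.39)/(11.46)
            lam = k / sum(ER)                                # throughput, (11.42)
            EN = [lam * ER[i] for i in range(M)]             # Little's law per node, (11.43)
        return lam, ER, EN

    lam, ER, EN = mva([1.0, 0.8, 1.8], 3)                    # lam = 0.469 for the running example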

Example 11.5. A three-node GNQN (IV).


We readdress the example of the previous sections, but this time apply the MVA approach
to solve for the average performance measures. For K = 1, we have

$$\begin{aligned}
E[\bar{R}_1(1)] &= (E[N_1(0)] + 1)\, D_1 = D_1 = 1.0\\
E[\bar{R}_2(1)] &= (E[N_2(0)] + 1)\, D_2 = D_2 = 0.8 \qquad (11.44)\\
E[\bar{R}_3(1)] &= (E[N_3(0)] + 1)\, D_3 = D_3 = 1.8
\end{aligned}$$
From this, we derive $E[\bar{R}(1)] = \sum_{i=1}^{3} E[\bar{R}_i(1)] = 3.6$ and so $\lambda(1) = 1/E[\bar{R}(1)] = 0.278$. Us-
ing $E[N_i(1)] = \lambda(1)\, E[\bar{R}_i(1)]$ we have $E[N_1(1)] = 0.278$, $E[N_2(1)] = 0.222$ and $E[N_3(1)] =
0.500$. We also have immediately $\rho_1(1) = D_1\, \lambda(1) = 0.278$, $\rho_2(1) = 0.222$ and $\rho_3(1)
= 0.500$. For $K = 2$, we have
$$\begin{aligned}
E[\bar{R}_1(2)] &= (E[N_1(1)] + 1)\, D_1 = 1.278\, D_1 = 1.278\\
E[\bar{R}_2(2)] &= (E[N_2(1)] + 1)\, D_2 = 1.222\, D_2 = 0.978 \qquad (11.45)\\
E[\bar{R}_3(2)] &= (E[N_3(1)] + 1)\, D_3 = 1.500\, D_3 = 2.700
\end{aligned}$$
From this, we derive $E[\bar{R}(2)] = \sum_{i=1}^{3} E[\bar{R}_i(2)] = 4.956$ and so $\lambda(2) = 2/E[\bar{R}(2)] =
0.404$. Using $E[N_i(2)] = \lambda(2)\, E[\bar{R}_i(2)]$ we have $E[N_1(2)] = 0.516$, $E[N_2(2)] = 0.395$ and
$E[N_3(2)] = 1.091$. We also have immediately $\rho_1(2) = D_1\, \lambda(2) = 0.404$, $\rho_2(2) = 0.323$ and
$\rho_3(2) = 0.726$.

We leave it as an exercise for the reader to derive the results for the case K = 3 and
compare them with the results derived with Buzen's convolution scheme. $\Box$

When the scheduling discipline at a particular queue is of infinite server type (IS), we
can still employ the simple MVA approach as presented here. For more general cases of
load-dependency a more intricate form is needed (to be addressed in Chapter 12). The only
thing that changes in the infinite-server case is that for the stations $j$ with IS semantics,
equation (11.39) changes: there is no waiting so the response time always equals the service
time, i.e., we have
$$\text{for IS nodes:} \quad E[R_j(K)] = E[S_j], \quad \text{or} \quad E[\bar{R}_j(K)] = D_j. \qquad (11.46)$$

As can be observed, the case of IS nodes makes the computations even simpler!
Regarding the complexity of the MVA approach the following remarks can be made.
We have to compute the response times at the nodes for a customer population increasing
to $K$. Given a certain customer population, for every station one has to perform one
addition, one multiplication and one division. Consequently, the complexity is of order
O(KM). In principle, once results for K have been computed, the results for K - 1 do not
need to be stored any longer. Therefore, one needs at most O(M) storage for the MVA
approach.
Although the MVA approach might seem slightly more computationally intensive, its
advantage clearly is that one computes with mean values that can be understood in terms of
the system being modelled. Since these mean values are usually not as large as normalising
constants tend to be, computational problems relating to round-off and loss of accuracy are
less likely to occur. Furthermore, while computing performance measures for a particular
GNQN with K customers, the results for the same GNQN with less customers present are
computed as well. This is very useful when performing parametric studies on the number
of customers.
After an MVA has been performed, it might turn out that at a particular station the
average response time is very high and one might be interested in obtaining more detailed
performance measures for that station, e.g., the probability of having more than a certain
number of customers in that station. Such measures cannot be obtained with an MVA, but
they can be obtained using the convolution method. The question then is: should we redo
all the work and perform a convolution solution from the start, or can we reuse the MVA
results? Fortunately, the latter is the case, i.e., we can use the MVA results to calculate
normalising constant! From Section 11.2 we recall that
G(M, K - 1) G(M, K - 1)
X(K) = =s- G(M,K) = (11.47)
WC Eo WE0 *

Since we have calculated the values of $\lambda(k)$ for $k = 1, \ldots, K$, using the MVA, we can
calculate $G(M, 1) = 1/\lambda(1)$, then calculate $G(M, 2) = G(M, 1)/\lambda(2)$, etc. This approach
is shown in Figure 11.3 by the arc labelled “trick”. Using the thus calculated normalising
constants we can proceed to calculate more detailed performance measures as shown in
Section 11.1.
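A possible Python sketch of this "trick" (illustrative only) is:

    def constants_from_throughputs(lams):        # lams[k-1] = X(k) from the MVA
        G = [1.0]                                # G(M,0) = 1
        for lam in lams:
            G.append(G[-1] / lam)                # (11.47): G(M,k) = G(M,k-1)/X(k)
        return G

    constants_from_throughputs([0.278, 0.404, 0.469])   # reproduces (11.48) up to rounding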

Example 11.6. A three-node GNQN (V).


Using MVA we have computed the following values for the throughputs at increasing
customer populations: $\lambda(1) = 0.278$, $\lambda(2) = 0.404$ and $\lambda(3) = 0.469$. Consequently, as
indicated above, we have
$$\begin{aligned}
G(M, 1) &= \frac{1}{\lambda(1)} = \frac{1}{0.278} = 3.600,\\
G(M, 2) &= \frac{G(M, 1)}{\lambda(2)} = \frac{3.600}{0.404} = 8.920, \qquad (11.48)\\
G(M, 3) &= \frac{G(M, 2)}{\lambda(3)} = \frac{8.92}{0.469} = 19.008.
\end{aligned}$$
As can be observed, these normalising constants correspond with those calculated via
Buzen’s algorithm. For the computations as presented here, it is of key importance to
maintain a large number of significant digits; typically, using ordinary floating point num-
bers, such as those of the 64-bit IEEE 754 floating point number standard, does not suffice
when evaluating large GNQNs! $\Box$

11.4 Mean-value analysis-based approximations


The larger the number of customers K in a GNQN, the longer the MVA recursion scheme
takes. To overcome this, a number of approximation and bounding techniques have been
proposed; we discuss three of them here. We start with asymptotic bounds in Sec-
tion 11.4.1; we have already seen these in a less general context in Chapter 4. The well-
known Bard-Schweitzer approximation is presented in Section 11.4.2 and an approximation
based on balanced networks is discussed in Section 11.4.3.

11.4.1 Asymptotic bounds


We have already encountered asymptotic bounds in a less general context in Chapter 4
when discussing the terminal model, as well as in Section 11.1, when discussing throughput
bounds. Here we present these bounds in a more general context.

Consider a GNQN in which all but one of the nodes are normal FCFS nodes. There is
one node of infinite-server type which is assumed to be visited once during a passage (note
that this does not form a major restriction since the visit counts $V_i$ can be scaled at will).
Denote its expected service time as $E[Z]$ and the service demands at the other nodes as
$D_i = V_i\, E[S_i]$. Furthermore, let $D_+ = \max_i\{D_i\}$ and $D_\Sigma = \sum_i D_i$.
We first address the case where the number of customers is the smallest possible: $K = 1$.
In that case, the time for the single customer to pass once through the queueing network
can be expressed as $E[\bar{R}(1)] = E[Z] + D_\Sigma$, simply because there will be no queueing. From
this, we can directly conclude that the expected response time for $K \geq 1$ will always be at
least equal to the value when $K = 1$:
$$E[\bar{R}(K)] \geq E[Z] + D_\Sigma. \qquad (11.49)$$

When there is only a single customer present, the throughput equals $1/E[\bar{R}(1)]$. We assume
that the throughput will increase when more customers are entering the network (this
assumption implies that we do not take into account various forms of overhead or ineffi-
ciencies). However, as more customers are entering the network, queueing effects will start
to play a role. Hence, the throughput with $K$ customers will not be as good as $K$ times
the throughput with a single customer only:
$$\lambda(K) \leq \frac{K}{E[Z] + D_\Sigma}. \qquad (11.50)$$
Although the above bounds are also valid for large $K$ they can be made much tighter in
that case. Since we know that for large $K$ the bottleneck device does have a utilisation
approaching 1, we know that for $K \to \infty$ the following must hold: $\lambda(K)\, D_+ \to 1$, so that
$\lambda(K) \to 1/D_+$ or
$$\lambda(K) \leq \frac{1}{D_+}. \qquad (11.51)$$
Using Little's law for the queueing network as a whole we find $K = \lambda(K)\, E[\bar{R}(K)]$, or,
when taking $K \to \infty$:
$$E[\bar{R}(K)] = \frac{K}{\lambda(K)} \geq K\, D_+. \qquad (11.52)$$
In conclusion we have found the following bounds:
$$\lambda(K) \leq \min\left\{\frac{K}{E[Z] + D_\Sigma},\, \frac{1}{D_+}\right\}, \qquad
E[\bar{R}(K)] \geq \max\{E[Z] + D_\Sigma,\, K\, D_+\}. \qquad (11.53)$$



Figure 11.4: Asymptotic bounds for a closed queueing network (dashed lines: asymptotes;
solid lines: true values (estimated))

These bounds are illustrated in Figure 11.4. The crossing point $K^*$ of the two asymptotes
is called the saturation point and is given by
$$K^* = \frac{D_\Sigma + E[Z]}{D_+}. \qquad (11.54)$$
The integer part of $K^*$ can be interpreted as the maximum number of customers that could
be accommodated without any queueing when all the service times are of deterministic
length. Stated differently, if the number of customers is larger than $K^*$ we are sure that
queueing effects in the network contribute to the response times.
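The bounds (11.53) and the saturation point (11.54) are easily computed; a small Python sketch (illustrative names only) could be:

    def asymptotic_bounds(D, EZ, K):
        """D: FCFS service demands; EZ: think time of the single IS node (0 if absent)."""
        Dmax, Dsum = max(D), sum(D)
        X_upper = min(K / (EZ + Dsum), 1.0 / Dmax)     # (11.53), throughput
        R_lower = max(EZ + Dsum, K * Dmax)             # (11.53), response time per passage
        K_star = (Dsum + EZ) / Dmax                    # saturation point (11.54)
        return X_upper, R_lower, K_star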

11.4.2 The Bard-Schweitzer approximation


A simple approximation for GNQNs with large customer population was proposed by Bard
and Schweitzer in the late 1970s. When the number of customers in the GNQN is large, it
might be reasonable to assume that
$$E[N_i(K-1)] \approx \frac{K-1}{K}\, E[N_i(K)], \qquad (11.55)$$
that is, the decrease in customer number is divided equally over all queues. Substituting
this in the basic MVA equation (11.39), we obtain
$$E[\bar{R}_i(K)] = \begin{cases} \left(\dfrac{K-1}{K}\, E[N_i(K)] + 1\right) D_i, & \text{FCFS nodes},\\[2mm] D_i, & \text{IS nodes}. \end{cases} \qquad (11.56)$$

We can use this equation to compute an estimate for $E[\bar{R}_i(K)]$ given an estimate for
$E[N_i(K)]$. As a first guess, typically the uniform distribution of all $K$ customers over
the $M$ nodes is taken, i.e., $E[N_i(K)] \approx K/M$. Then, (11.56) can be used to obtain a
more exact estimate for $E[\bar{R}_i(K)]$, after which (11.43) is used to compute a more accurate
approximation of $E[N_i(K)]$. This process is continued until convergence is reached. It is
important to note that convergence should be tested on all the measures of interest. If the
interest is in mean response times at the nodes, the iteration should be stopped when two
successive estimates for that measure are sufficiently close to one another. At that point,
it is possible that the estimates for the mean queue lengths or throughput are not as good.
The Bard-Schweitzer approximation scheme often works very well in practice, however,
its accuracy depends on the model and parameters at hand. Notice that an iteration step in
the approximation is as expensive as a normal MVA iteration step; thus, the approximation
scheme is computationally only attractive when the expected number of iterations is smaller
than the number of customers in the GNQN.
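A possible Python sketch of the Bard-Schweitzer fixed-point iteration (illustrative names; all nodes are assumed FCFS and convergence is tested here on the mean queue lengths only) is:

    def bard_schweitzer(D, K, tol=1e-6, max_iter=1000):
        """Fixed-point iteration of (11.55)-(11.56)."""
        M = len(D)
        EN = [K / M] * M                                              # initial guess: uniform
        for _ in range(max_iter):
            ER = [(1.0 + (K - 1) / K * EN[i]) * D[i] for i in range(M)]   # (11.56)
            lam = K / sum(ER)                                         # (11.42)
            EN_new = [lam * ER[i] for i in range(M)]                  # (11.43)
            if max(abs(a - b) for a, b in zip(EN, EN_new)) < tol:
                return lam, ER, EN_new
            EN = EN_new
        return lam, ER, EN

    bard_schweitzer([1.0, 0.8, 1.8], 60)     # the three-node example with K = 60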

Example 11.7. A three-node GNQN (VI).


Let us address the simple three-node GNQN again, but now with K = 60. Using the
Bard-Schweitzer iterative scheme as given above, we find the performance measures as
given in Table 11.3 for increasing number of iterations (the number of iterations is seen in
the column headed #). As can be observed, with only a few iterations, the approximation
scheme approaches the correct values relatively well. $\Box$

11.4.3 Balanced networks


A class of GNQN that can very easily be solved using MVA is the class of balanced networks.
A GNQN is said to be balanced whenever $D_i = D$, for all $i$; in other words, in a balanced
QN all the nodes have the same service demand. If this is the case, the performance

  #     E[R_1(60)]   E[R_2(60)]   E[R_3(60)]   E[N_1(60)]   E[N_2(60)]   E[N_3(60)]
  1     21.0000      42.0000       63.0000     16.6667      13.3333      30.0000
  2     17.6667      28.6667       93.0000     12.4804       8.1005      39.4192
  3     13.4804      18.2009      121.2575      8.6491       4.6711      46.6798
  4      9.6491      11.3423      143.0393      5.7889       2.7219      51.4892
  5      6.7889       7.4438      157.4676      3.9074       1.7137      54.3789
  6      4.9074       5.4274      166.1366      2.7580       1.2201      56.0219
  7      3.7580       4.4402      171.0657      2.0844       0.9851      56.9304
  8      3.0844       3.9703      173.7913      1.6987       0.8746      57.4267
  9      2.6987       3.7492      175.2802      1.4805       0.8227      57.6967
  10     2.4805       3.6455      176.0902      1.3580       0.7983      57.8436
  11     2.3580       3.5967      176.5309      1.2895       0.7868      57.9237
  12     2.2895       3.5735      176.7711      1.2513       0.7812      57.9674

  20     2.2041       3.5520      177.0597      1.2038       0.7760      58.0203

  MVA    2.2500       3.6000      173.8500      1.2500       0.8000      57.9500

Table 11.3: Bard-Schweitzer approximation for the three-node GNQN when K = 60

measures will be the same for all queues. As a consequence of this, the K customers are
divided equally over the $M$ queues, i.e., $E[N_i(K)] = K/M$, for all $i$.
The expected response times (per passage) are then also the same for all stations. Using
the basic MVA relation (11.39) and substituting $E[N_i(K-1)] = (K-1)/M$, we find:
$$E[\bar{R}_i(K)] = \left(\frac{K-1}{M} + 1\right) D = \frac{K + M - 1}{M}\, D. \qquad (11.57)$$

The overall expected response time per passage is $E[\bar{R}(K)] = (K + M - 1)\, D$. For the
throughput and utilisations we obtain:
$$\lambda(K) = \frac{K}{E[\bar{R}(K)]} = \frac{K}{D\,(K + M - 1)}, \quad \text{and} \quad \rho_i(K) = \lambda(K)\, D = \frac{K}{K + M - 1}. \qquad (11.58)$$
In practice almost no system can be modelled as a balanced GNQN. However, we can
use a balanced QN to obtain bounds for the throughput and average response times in
non-balanced GNQNs as follows. Define:

$$D_+ = \max_i\{D_i\}, \quad D_- = \min_i\{D_i\}, \quad \text{and} \quad D_\Sigma = \sum_i D_i. \qquad (11.59)$$

These values correspond to the largest, the smallest and the sum of the service demands.
The throughput $\lambda(K)$ in the unbalanced GNQN will be smaller than the throughput
in a completely balanced GNQN with service demands set to $D_-$, but higher than in a
completely balanced GNQN with service demands set to $D_+$. Consequently,
$$\frac{K}{D_+\,(K + M - 1)} \leq \lambda(K) \leq \frac{K}{D_-\,(K + M - 1)}. \qquad (11.60)$$
From this inequality, we can also derive bounds for the response times:
$$(K + M - 1)\, D_- \leq E[\bar{R}(K)] \leq (K + M - 1)\, D_+. \qquad (11.61)$$
The above bounds can be made a little bit tighter by realising that the performance is
best when the overall service demand $D_\Sigma$ is divided equally over the $M$ stations, i.e., when
$D_i = \bar{D} = D_\Sigma/M$. Also, the throughput is bounded by $1/D_+$. These considerations lead
to the following throughput upper bound:
$$\lambda(K) \leq \min\left\{\frac{K}{\bar{D}(K-1) + D_\Sigma},\, \frac{1}{D_+}\right\}. \qquad (11.62)$$
The worst performance appears when the network is most unbalanced, keeping in mind
that the largest service demand is $D_+$. We then have $\lceil D_\Sigma/D_+ \rceil$ stations with service
demand $D_+$ and $M - \lceil D_\Sigma/D_+ \rceil$ stations with service demand zero. From this, we derive
the following lower bound:
$$\lambda(K) \geq \frac{K}{(K + \lceil D_\Sigma/D_+ \rceil - 1)\, D_+} = \frac{K}{D_+(K-1) + D_\Sigma}. \qquad (11.63)$$
From these two inequalities we can easily derive the following bound for the expected
response time per passage:
$$\max\{D_\Sigma + \bar{D}(K-1),\, K\, D_+\} \leq E[\bar{R}(K)] \leq D_\Sigma + D_+(K-1). \qquad (11.64)$$
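A small Python sketch of the balanced-network bounds (11.62)-(11.64), for the case without an infinite-server node (illustrative names only), could be:

    def balanced_bounds(D, K):
        """Balanced-network bounds (11.62)-(11.64); no infinite-server node assumed."""
        M = len(D)
        Dmax, Dsum = max(D), sum(D)
        Davg = Dsum / M
        X_hi = min(K / (Davg * (K - 1) + Dsum), 1.0 / Dmax)     # (11.62)
        X_lo = K / (Dmax * (K - 1) + Dsum)                      # (11.63)
        R_lo = max(Dsum + Davg * (K - 1), K * Dmax)             # (11.64)
        R_hi = Dsum + Dmax * (K - 1)
        return (X_lo, X_hi), (R_lo, R_hi)

    balanced_bounds([1.0, 0.8, 1.8], 3)      # ((0.417, 0.5), (6.0, 7.2)), cf. Example 11.8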

Example 11.8. A three-node GNQN (VII).


Let us consider the three-node GNQN again with $M = K = 3$ and $D_1 = 1$, $D_2 = 0.8$ and
$D_3 = 1.8$. For the simple bounds we derive the following:
$$0.333 \leq \lambda(3) \leq 0.750, \quad \text{and} \quad 4.000 \leq E[\bar{R}(3)] \leq 9.000.$$
As can be observed, these bounds are not too tight; the exact values are: $\lambda(3) = 0.469$
and $E[\bar{R}(3)] = 6.393$. When we employ the tighter bounding scheme we find, however,
much better bounds:
$$0.417 \leq \lambda(3) \leq 0.500, \quad \text{and} \quad 6.000 \leq E[\bar{R}(3)] \leq 7.200.$$



The error is about 10%. The bounds obtained in this way improve when the differences in
the involved service demands $D_i$ are smaller. $\Box$

When the queueing network to be bounded does not only contain FCFS nodes but also
nodes of infinite-server type, the bounds presented above cannot be used. Instead, a special
treatment has to be given to the infinite-server nodes. Below, we will consider bounds for
the performance measures in a GNQN in which there is only one infinite-server node with
think time E[Z]. Since we will use the think time directly in the bounds below, we have to
make sure that the visit count $V_i$ for the infinite-server station is 1; this can be achieved by
simply renormalising them as such. The values $D_-$, $D_+$, $\bar{D}$ and $D_\Sigma$ are defined as before,
however, taking into account the renormalised visit counts. Without proof, we state that
in such a QN the expected response time given $K$ customers present, i.e., $E[\bar{R}(K)]$, lies in
the following interval:

E[Z], DC + (K - l)D Dc
Qz + -q-q
Jc+(K-l)D+
w - 1)&I
(K - 1)Dc + E[Z] *1
(11.65)
Notice that these bounds are to be interpreted as the bounds for the system to respond
to a request from the infinite-server station (normally, the modelled terminals), i.e., the
think time itself is not included in the response time bound. Similarly, we find that the
throughput of the infinite-server station (the station with $V_i = 1$) when there are $K$
customers present, i.e., $\lambda(K)$, lies in the following interval:

II
K 1 K
Dt’ E[Z] + DC + (K - l)r>&] *
[ E[Z] + DC + (K - 1)D+~17min I
(11.66)
Note that when there are no infinite-server stations, setting E[Z] = 0 in these bounds will
yield the bounds for a model without infinite-server station, as we have seen before. A
four-page proof of these bounds can be found in [145, Section 34.4]. We will show the
usefulness of the bounds in Section 11.6.

11.5 An approximate solution method


As pointed out in the previous sections, exact solution methods for GNQNs suffer from
the fact that the computational requirements increase rapidly with increasing numbers
of customers and nodes. Also, a normally quite efficient method such as MVA becomes
less attractive when multi-server queues or nodes with load-dependent service rates are
introduced (see Chapter 13 for details). In order to circumvent these difficulties, we present

a generally applicable approximate approach that has been developed by Bolch et al. [26];
since this approach is based on finding the zero of a function, it is sometimes referred to as
the "functional approach".
We define the class of queueing networks that can be handled in Section 11.5.1. Then,
we discuss the basic solution approach in Section 11.5.2. A numerical solution procedure
is presented in Section 11.5.3, and Section 11.5.4 addresses a few extensions.

11.5.1 Queueing network definition


To start with, we consider a GNQN with M nodes and K customers and assume that, for
every node, the visit ratios $V_i$ and the service rates $\mu_i$ ($E[S_i] = 1/\mu_i$) are known. Then, we
assume that every node in the queueing network is of one of the four following types:

1. M|M|m-FCFS;
2. M|G|1-PS;
3. M|G|∞-IS;
4. M|G|1-LCFSPR.
These four node types coincide with the four node types allowed in BCMP queueing net-
works which we will present in Chapter 13.

11.5.2 Basic approach


Central to the approximation procedure is the observation that in open, load-independent
queueing stations, the mean number of customers is an increasing function of the through-
put (or the utilisation). For instance, in the M|M|1 case we have
$$E[N_i] = f_i(\lambda_i) = \frac{\lambda_i/\mu_i}{1 - \lambda_i/\mu_i} = \frac{\rho_i}{1 - \rho_i}. \qquad (11.67)$$
For (load-dependent) queueing stations in closed queueing networks, this functional relation
might be slightly different, but we will assume that such a function $f_i$ exists and is non-
decreasing. For a type 3 station, it is known that $f_i(\lambda_i) = \lambda_i/\mu_i$. Observe that $f_i(0) = 0$
and that $f_i$ is only defined for $\lambda_i \in [0, m_i \mu_i)$, i.e., for cases in which the node is not
saturated.
Now, suppose that we know the functions $f_i$ for all the queueing stations in the network.
Let $N_i(K)$ be a random variable that denotes the number of customers in station $i$, given
there are in total $K$ customers present. Clearly, by the fact that the total number of
customers is fixed, we have
$$N_1(K) + N_2(K) + \cdots + N_M(K) = K. \qquad (11.68)$$



Taking expectations on both sides, we obtain

$$E[N_1(K)] + E[N_2(K)] + \cdots + E[N_M(K)] = K, \qquad (11.69)$$
which, by using the functions $f_i$ and the fact that $\lambda_i(K) = \lambda(K)\, V_i$, can be rewritten as
$$\sum_{i=1}^{M} E[N_i(K)] = \sum_{i=1}^{M} f_i(\lambda_i(K)) = \sum_{i=1}^{M} f_i(\lambda(K)\, V_i) = K. \qquad (11.70)$$

This equation basically states that the sum of the average number of customers in all
stations must equal the total number of customers. If we know the functions $f_i$, the non-
linear equation (11.70) in $\lambda(K)$ can be solved numerically. Let us now first focus on the
determination of the functions $f_i$ before we discuss the numerical solution of (11.70) in
Section 11.5.3.
For ease of notation, we use $\rho_i = \lambda_i/(m_i \mu_i)$. One problem in the determination of the
functions $f_i$ is that they should be valid in the finite-customer domain. Normal functional
relationships, e.g., those derived from M|M|1 analysis, are valid for open QNs, hence for
infinite customer populations.
We first address the type 1 queueing stations with $m = 1$. For the infinite-buffer single-
server case, we know from M|M|1 analysis that (11.67) holds. Observe that $\lim_{\rho_i \to 1} E[N_i] =
\infty$ in this case. However, if we introduce a correction factor $(K-1)/K$ in the denominator,
we obtain
$$E[N_i] = \frac{\rho_i}{1 - \frac{K-1}{K}\, \rho_i}. \qquad (11.71)$$
This equation has a limiting value of $K$ when $\rho_i \to 1$, which is intuitively appealing: when
the utilisation approaches 100%, all $K$ customers will reside in this node. Due to the
insensitivity property in BCMP queueing networks (see Chapter 13) this result can also be
used for nodes of type 2 (PS) and type 4 (LCFSPR). Note that the above result can also be
derived from the Bard-Schweitzer approximation discussed in Section 11.4.2, in particular,
by rewriting (11.56).
When we deal with an M|M|$m_i$-FCFS station, we have to adapt the result so as to
account for the multiserver behaviour. Maintaining the same line of thinking, we obtain:
$$E[N_i] = m_i \rho_i + p_m(\rho_i) \times \frac{\rho_i}{1 - \frac{K - m_i - 1}{K - m_i}\, \rho_i}. \qquad (11.72)$$
The first additive term gives the number of customers in service, whereas the latter gives
the number of customers queued, since $p_m(\rho_i)$ is the probability that there are customers
queued. From analyses similar to those performed in Chapter 4, we can derive that
$$p_m(\rho) = \frac{\frac{(m\rho)^m}{m!\,(1-\rho)}}{\sum_{k=0}^{m-1} \frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!\,(1-\rho)}}, \qquad (11.73)$$

which can be approximated as

$$p_m(\rho) = \begin{cases} \rho^{(m+1)/2}, & \rho \leq 0.7,\\ \frac{1}{2}(\rho^m + \rho), & \rho \geq 0.7. \end{cases} \qquad (11.74)$$
Again notice the behaviour of $E[N_i]$ when $\rho_i \to 1$: the first additive term will equal $m_i$; the
probability $p_m(\rho)$ approaches 1, and the second term will therefore approach $K - m_i$, so
that for the overall number of customers we have: $E[N_i] \to K$. Finally, for type 3 stations,
i.e., IS stations, we have the exact relation
$$E[N_i] = \lambda_i/\mu_i. \qquad (11.75)$$

11.5.3 Numerical solution


Now that we have obtained the individual functions fi for the allowed types of nodes in the
queueing network we have to solve (11.70). Denoting the left-hand side of this equation
as $F(\lambda(K))$, we have to solve $F(\lambda(K)) = K$. Since the additive parts of $F(\lambda(K))$ are
all increasing, $F(\lambda(K))$ is so as well. By the fact that $f_i(0) = 0$, for all $i$, and the fact
that $f_i$ is increasing, $\lambda(K)$ must be non-negative. Also, because
$$\rho_i(K) = \frac{\lambda(K)\, V_i}{m_i \mu_i} \leq 1,$$
we have
$$\lambda(K) \leq \frac{m_i \mu_i}{V_i}, \quad \text{for all } i,$$
i.e., we require
$$\lambda(K) \leq \min_i \left\{\frac{m_i \mu_i}{V_i}\right\}.$$
To solve $F(\lambda(K)) = K$, we start with two guesses: we set $l = 0$ (low) and $h = \min_i\{m_i \mu_i / V_i\}$
(high) so that $F(l) \leq K$ and $F(h) \geq K$. Then, we compute $m = (l + h)/2$ (middle) and
evaluate $F(m)$. If $F(m) < K$, we set $l \leftarrow m$; otherwise, if $F(m) > K$ we set $h \leftarrow m$. We
repeat this procedure until we have found a value $m$ such that $F(m) = K \pm \epsilon$, where
$\epsilon$ is some preset accuracy level. The approach is illustrated in Figure 11.5; we basically
determine a zero of the function $F(\lambda(K)) - K$ by interval splitting. Of course, other
methods such as Newton's method might be employed for this purpose as well.
Once $\lambda(K)$ has been determined, one can easily compute other performance mea-
sures of interest. For $i = 1, \ldots, M$, one has $\lambda_i(K) = \lambda(K)\, V_i$, $\rho_i(K) = \lambda_i(K)/(m_i \mu_i)$,
$E[N_i(K)] = f_i(\lambda_i(K))$, $E[N_{q,i}(K)] = E[N_i(K)] - \rho_i(K)$, $E[R_i(K)] = f_i(\lambda_i(K))/\lambda_i(K)$
and $E[W_i(K)] = E[N_{q,i}(K)]/\lambda_i(K)$.
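The following Python sketch (illustrative names only; it covers single-server and multi-server type 1 nodes, using (11.71), (11.72) and the approximation (11.74), as well as type 3 IS nodes via (11.75)) puts the functions $f_i$ and the interval-splitting procedure together:

    def approx_throughput(V, mu, m, K, eps=1e-8):
        """V[i]: visit ratio, mu[i]: service rate, m[i]: number of servers (None marks an IS node)."""
        M = len(V)

        def f(i, lam_i):                       # E[N_i] as a function of lambda_i
            if m[i] is None:                   # type 3 (IS) node, (11.75)
                return lam_i / mu[i]
            rho = lam_i / (m[i] * mu[i])
            if m[i] == 1:                      # single server, (11.71)
                return rho / (1.0 - (K - 1) / K * rho)
            # multi-server, (11.72) with the queueing probability p_m of (11.74)
            pm = rho ** ((m[i] + 1) / 2.0) if rho <= 0.7 else (rho ** m[i] + rho) / 2.0
            return m[i] * rho + pm * rho / (1.0 - (K - m[i] - 1) / (K - m[i]) * rho)

        def F(lam):                            # total mean population at throughput lam
            return sum(f(i, lam * V[i]) for i in range(M))

        lo = 0.0
        hi = min(m[i] * mu[i] / V[i] for i in range(M) if m[i] is not None)
        while hi - lo > eps:                   # interval splitting on F(lam) = K
            mid = (lo + hi) / 2.0
            if F(mid) < K:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Example 11.9: lam ~= 0.460 for the three-node GNQN with K = 3
    lam = approx_throughput([1.0, 0.4, 0.6], [1.0, 0.5, 1.0 / 3.0], [1, 1, 1], 3)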

Figure 11.5: Interval splitting to determine $F(\lambda(K)) = K$

It can be seen that although the non-linear equation increases in size (complexity)
when the number of stations $M$ increases, it does not increase in complexity when either
$K$ increases or the number of servers $m_i$ becomes larger than 1. This is an important
advantage of this solution technique.

Example 11.9. The three-node GNQN (VIII).


Applying this method to our small three-node GNQN, we find
$$E[N_i(K; \lambda)] = \frac{\lambda V_i/\mu_i}{1 - \frac{K-1}{K}\, \lambda V_i/\mu_i},$$
so that we have to solve, for $K = 3$:
$$\frac{\lambda}{1 - \frac{2}{3}\lambda} + \frac{8\lambda}{10\left(1 - \frac{8}{15}\lambda\right)} + \frac{18\lambda}{10\left(1 - \frac{6}{5}\lambda\right)} = 3.$$
Using a numerical method as sketched above, we find $\lambda = 0.460$ which differs by only 2%
from the exact value 0.4693 we found earlier. We find also: $E[N_1(3)] = 0.664$, $E[N_2(3)] =
0.488$ and $E[N_3(3)] = 1.848$. Here the differences with the exact values are less than 7%.
$\Box$

Example 11.10. A four-node BCMP queueing network (adapted from [26]).


Consider a queueing network with $M = 4$ nodes and $K = 10$ customers. The station
parameters are given in Table 11.4. Since the example network is of BCMP type, a direct
MVA can also be employed. In Table 11.5 we present the results for both the MVA and

  i   V_i   μ_i   m_i
  1   1.0   1.9   4
  2   0.4   0.9   ∞
  3   0.4   5.0   1
  4   0.2   1.5   1

Table 11.4: Parameters of the four-node queueing network

                   MVA                                    approximation
  i    λ_i     ρ_i     E[N_i]   E[R_i]       λ_i     ρ_i     E[N_i]   E[R_i]
  1   6.021   0.792   3.990    0.662        5.764   0.758   4.156    0.721
  2   2.408   0.267   3.676    1.111        2.305   0.256   2.561    1.111
  3   2.408   0.481   0.856    0.355        2.305   0.461   0.788    0.314
  4   1.204   0.802   2.476    2.056        1.152   0.768   2.492    2.162

Table 11.5: MVA and approximate results for the four-node queueing network

the approximate analysis. As can be observed, the results match within about 10-15%.
This seems to be reasonable, given the difference in computational effort required to get
the solution. $\Box$

11.5.4 Extension to other queueing stations


Given the approach, we can easily extend it to include other nodes as long as we are able
to find functions $E[N_i(K)] = f_i(\lambda_i(K))$. One such function can for instance be found for
the M|G|m-FCFS queueing station:
$$E[N_i(K)] = m_i \rho_i + \alpha\, p_m(\rho_i) \times \frac{\rho_i}{1 - \frac{K - m_i - 1}{K - m_i}\, \rho_i}, \qquad (11.76)$$
where $p_m(\rho)$ is defined as before, and where
$$\alpha = \tfrac{1}{2}(1 + C_B^2), \qquad (11.77)$$
where $C_B^2$ is the squared coefficient of variation of the service time distribution. This type
of queueing station does not fall into the category of BCMP queueing networks any more.
Notice that for $\alpha = 1$, i.e., for negative exponentially distributed service times, the above

  i    V_i    μ_i    m_i    C_i
1 1.00 0.20 4 0.30
2 1.13 0.08 7 2.40
3 0.33 0.80 1 1.00
4 0.33 0.12 10 3.90
5 0.67 0.05 00 1.00

Table 11.6: Parameters of the five-node non-BCMP queueing network

            simulation                                   approximation
i   X_i     ρ_i     E[N_i]   E[R_i]        X_i     ρ_i     E[N_i]   E[R_i]
1   0.339   0.424    2.045    6.038        0.322   0.402    1.649    5.115
2   0.382   0.682    5.848   15.328        0.364   0.650    5.437   14.927
3   0.112   0.140    0.184    1.643        0.106   0.132    0.152    1.434
4   1.127   0.939   17.465   15.502        1.073   0.894   18.415   17.155
5   0.228   0.152    4.399   19.330        0.215   0.143    4.319   20.000

Table 11.7: Approximate and simulation results for the five-node non-BCMP network

result reduces to the earlier presented result for the M|M|m_i queue. Also note that when ρ_i(K) → 1, then E[N_i(K)] → K. Finally, for G|G|m_i queues we can use α' = \frac{1}{2}(C_A^2 + C_B^2) instead of α in the equation above (see the Krämer and Langenbach-Belz approximation (7.24)).
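For completeness, a small sketch of the waiting probability that enters (11.76); here it is assumed that P_m(ρ) denotes the usual Erlang-C probability that all m servers of an M|M|m station are busy (the function name is our own):

```python
from math import factorial

def erlang_c(m, rho):
    """Erlang-C probability that an arriving customer has to wait in an
    M|M|m queue with per-server utilisation rho < 1."""
    a = m * rho                                          # offered load
    summation = sum(a**k / factorial(k) for k in range(m))
    waiting_term = a**m / factorial(m) / (1.0 - rho)
    return waiting_term / (summation + waiting_term)
```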

Example 11.11. A five-node non-BCMP queueing network (adapted from [26]).


Consider a non-BCMP queueing network with M = 5 stations and K = 30 customers. The station parameters are given in Table 11.6. Notice the extra column which denotes the squared coefficient of variation of the service time distribution. Since the example network is not of BCMP type, MVA cannot be employed. In Table 11.7 we present the results for both the approximate analysis and a simulation. Confidence intervals are within 5% of the mean values. As can be observed, the results match within 5-10%. This seems to be reasonable, given the difference in computational effort required to find the solution. □


Figure 11.6: A simple queueing network model of a central server system

11.6 Application study

We consider a modelling study of a central server system; the structure of this model
has been adapted from [249, Section 10.2.1]. The system and model are presented in
Section 11.6.1. A first performance evaluation, using MVA and a bounding technique,
is presented in Section 11.6.2. Suggestions for performance improvements are studied in
Section 11.6.3.

11.6.1 System description and basic model

We consider a central server system consisting of a CPU and two disk systems. A number of terminals (users) is connected to this system. Requests from the users are processed by the CPU. The CPU needs to access the disks during its processing, after which a response is given back to the users. Users think, on average, 10 seconds between receiving an answer from the system and submitting a new request: E[Z] = 10 (seconds). A single burst of CPU processing on a job takes E[S_cpu] = 0.02 (seconds). Similarly, the two disks require E[S_d1] = 0.03 and E[S_d2] = 0.05 (seconds) to perform requests issued to them. An average user request requires 10 CPU bursts, 6 disk accesses at disk 1, and 3 at disk 2, so that we have: V_term = 1, V_cpu = 10, V_d1 = 6 and V_d2 = 3. In Figure 11.6 we show the corresponding queueing network model. We denote the numerical parameter set presented thus far as configuration C1.


Figure 11.7: The actual throughput X(K) (middle curve) and lower and upper bounds for increasing K in the central server model (configuration C1)

11.6.2 Evaluation with MVA and other techniques

On the basis of the service demands and the visit counts, we can directly establish which system component is the bottleneck and compute the maximum throughput reachable. We compute D_cpu = 0.2, D_d1 = 0.18 and D_d2 = 0.15. The terminals cannot form a bottleneck. Clearly, the CPU is the most heavily used component and we can directly compute that X(K) → 5 for K → ∞. Furthermore, we note that for K → ∞: ρ_cpu(K) → 1, ρ_d1(K) → 0.9 and ρ_d2(K) → 0.75.
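These numbers follow directly from the service demands; a small sketch (variable names are ours, not the book's):

```python
# Configuration C1: visit counts and mean service times (in seconds)
ES = {'cpu': 0.020, 'disk1': 0.030, 'disk2': 0.050}
V  = {'cpu': 10,    'disk1': 6,     'disk2': 3}

D = {i: V[i] * ES[i] for i in ES}          # service demands: 0.20, 0.18, 0.15
Dmax = max(D.values())                     # bottleneck demand (the CPU)

print(1.0 / Dmax)                          # asymptotic throughput bound: 5 requests/s
print({i: D[i] / Dmax for i in D})         # asymptotic utilisations: 1.0, 0.9, 0.75
```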
In Figure 11.7 we show the bounds on the throughput as well as the exact values
(computed using MVA) for K increasing from 1 to 91. Similarly, we show the expected
response time as perceived by the terminal users in Figure 11.8. It can be seen that the
actual X(K) is very close to the computed upper bound, whereas the actual E[R(K)] is very close to the computed lower bound. If the system had had more balanced service demands, and if E[Z] had been closer to any of the other service demands, the computed
bounds would have been tighter.
We finally compare the exact and the approximate throughput computed using the
method of Section 11.5 as a function of K in Figure 11.9. It can be seen that over the full
spectrum of K, the method performs very well.


Figure 11.8: The actual expected response time E[R(K)] (middle curve) and lower and upper bounds for increasing K in the central server model (configuration C1)


Figure 11.9: Comparison of the exact and the approximate throughput (computed with the method of Section 11.5) as a function of K (configuration C1)

11.6.3 Suggestions for performance improvements


After having studied the performance of the central server system in configuration C1, we try to evaluate some system changes. We will first investigate what the impact is of improvement of the CPU, the bottleneck in C1. We therefore evaluate the central server model with a two times faster CPU; we denote this changed configuration by C2. In C2, D_cpu has decreased from 0.2 to 0.1, thus making disk D1 the bottleneck. We immediately can see that the new throughput is bounded from above by 1/0.18 ≈ 5.56.
To improve further on the system performance, we have to increase the speed of the disk subsystem. Although disk D1 is the bottleneck in the system, it might not be wise to increase its speed; since disk D1 is already a rather fast disk, an increase of its speed will most probably be expensive. Instead, we improve upon the speed of disk D2, by doubling its speed. However, that alone would not be a good investment since not many requests are handled by disk D2. To take care of a larger share of disk requests, we have to change the partitioning of the file system such that V_d1 = 4 and V_d2 = 5, that is, we let the fastest disk do most of the work. We denote these latter changes as configuration C3. In this final configuration, we have disk D2 as bottleneck and the throughput is bounded from above by 1/0.125 = 8.
In Figure 11.10 we show the throughput as a function of the number of users, for the three configurations. The increase in throughput when going from C1 via C2 to C3 is clear. With C3, we can support even more than 100 users before the throughput curve starts to flatten. Similar observations can be made from Figure 11.11; the better the configuration, the flatter the response time curve.

11.7 Further reading


Seminal work on closed queueing networks has been done by Gordon and Newell [106]. Buzen first introduced the convolution method [37] and Reiser, Kobayashi and Lavenberg developed the mean-value analysis approach [244, 241, 245, 243] which is based on the use of the arrival theorem [174, 260]. Reiser also discusses the application of closed queueing networks for the modelling of window flow control mechanisms [242]. The special issue of ACM Computing Surveys (September 1978) on queueing network models of computer system performance contains a number of very interesting papers from the early days of computer performance evaluation [12, 38, 44, 72, 107, 212, 292].
Approximation schemes for large GNQNs have been developed by Bard and Schweitzer [13, 257]. An extension of it is the so-called Linearizer approach developed by Chandy and

lo 20 30 40 50 60 70 80 90
K

Figure 11.10: The throughput X(K) f or increasing K for the three configurations

I I I I I I I
I

Cl

/:

1
II
10 20 30 40 50 60 70 80 90
K

Figure 11.11: The expected response time E[R(K)] f or increasing K for the three config-
urations
11.8 Exercises 263

Neuse [43]. Zahorjan et al. describe the bounds based on balanced networks [294]; they can also be found in [177]. Jain also discusses the bounds on GNQNs with infinite-server stations at length [145, Chapters 33 and 34]. Extensions to bound hierarchies have been developed by Eager and Sevcik [77, 78]. King also discusses many approximate methods in his book [156, Chapter 12]. Bolch et al. introduced the approximation approach presented in Section 11.5 [26].

11.8 Exercises
11.1. Approximate M|M|1 result.
Derive the M|M|1 result that is used in the "functional approach" of Bolch et al. in Section 11.5, starting from the Bard-Schweitzer approximation (11.56).

11.2. Convolution in GNQNs.


Consider a GNQN with three nodes (M = 3). The mean service times at the nodes are given as: E[S_1] = 20 msec, E[S_2] = 100 msec and E[S_3] = 50 msec. For the routing probabilities we have: r_{1,1} = 0.1, r_{1,2} = 0.2, r_{1,3} = 0.7, r_{2,1} = r_{3,1} = 1.0; all other routing probabilities equal 0.

1. Compute the visit counts V_i.

2. Compute the average service demands D_i.

3. Compute the normalising constant when K = 3, i.e., G(3,3), with the convolution method.

4. Compute, using the convolution scheme, for all stations i: X_i(K), E[N_i(K)] and E[R_i(K)].

11.3. MVA in GNQN.


We readdress the GNQN of the previous exercise.

1. Compute, using the MVA scheme, for all stations i: X_i(K), E[N_i(K)] and E[R_i(K)].

2. Compute the normalising constants G(3, k), k = 1, 2, 3, using the results from MVA.

3. For K = 1, ..., 6, find bounds for the throughput X(K) using balanced queueing networks.

4. For K = 1, ..., 6, find better bounds for the throughput X(K) using balanced queueing networks.

5. For K = 1, ..., 6, use the Bard-Schweitzer approximation to compute X(K) and compare the results with the above bounds and the exact values.

6. For K = 1, ..., 6, use the functional approach of Bolch et al. to compute X(K) and compare the results with the above bounds and the exact values.

11.4. Normalising constants.


Consider a GNQN with M ≥ 2 queues and K ≥ M customers.

1. Find an expression in terms of the D_i's and normalising constants for the probability that node 1 contains exactly 1 customer and node 2 exactly 2, i.e., find an expression for Pr{n_1 = 1, n_2 = 2}.

2. Assume that K = M. Find an explicit expression for Pr{n_i = 1, i = 1, ..., M}.

3. Assume that E[S_i] = i and V_i = D/i. Give an explicit expression for G(M, K) for M ≥ 2 and K ≥ M.

4. Show that under the assumptions of 3, the following holds:

X(K) = \frac{K}{(M + K - 1)D}.

11.5. Approximation schemes.


Consider a GNQN with M = 4 and the following service rates: μ_1 = μ_2 = 200, μ_3 = 100 and μ_4 = 50. Furthermore, the routing probabilities are given in the following matrix:
0 1 7 2
1
i6 i 6040
6400
0 0 0 10
I '

1. Compute the visit counts V_i and the average service demands D_i.

2. Compute E[R_i(K)], E[N_i(K)] and X(K) for K = 60 using the Bard-Schweitzer approximation.

3. Compute E[R_i(K)], E[N_i(K)] and X(K) for K = 60 using Bolch's functional approach.

Chapter 12

Hierarchical queueing networks

In the previous chapter it has become clear that the evaluation of large closed queueing
networks can be quite unattractive from a computational point of view; this was also
the reason for addressing approximation schemes and bounding methods. In this chapter
we go a different way to attack large queueing network models: hierarchical modelling and
evaluation. We address a modelling and evaluation approach where large submodels are
solved in isolation and where the results of such an isolated evaluation are used in other
models. To be able to do so, however, we need load-dependent queueing stations, that
is, queueing nodes in which the service rate depends on the number of customers present.
In Section 12.1 we introduce load-dependent servers and show the corresponding product-
form results for closed queueing networks including such servers. We then continue with
the extension of the convolution algorithm to include load-dependent service stations in
Section 12.2 and discuss two important special cases, namely infinite-server systems and
multi-server systems, in Section 12.3. In Section 12.4 we extend the mean-value analysis
method to the load-dependent case. We then outline an exact hierarchical decomposition
approach using load-dependent service centers in Section 12.5. The hierarchical decompo-
sition method can also be used in an approximate fashion; an example of that is discussed
in Section 12.6, where we study memory management issues in time-sharing computer
systems.

12.1 Load-dependent servers


Up till now we have assumed that the service rate at the nodes in a queueing network is
constant and independent of the state of the queue or the state of the queueing network.
It is, however, also possible to deal with load-dependent service rates (or load-dependent

servers). In fact we have already encountered two special cases of load-dependent servers
in Chapter 4:
• multiple server stations in which the service rate grows linearly with the number of customers present until there are more customers in the stations than there are servers;

• infinite server stations in which the service rate grows linearly (without bound) with the number of customers present.
Observe that in both these cases the load-dependency is “local” to a single queueing station:
the service rate in a certain station only depends on the number of customers in that station.
One can also imagine similar dependencies among queueing stations. Although they can
have practical significance, we do not address them here because their analysis is more
difficult; in general such dependencies spoil the product-form properties of a queueing
network so that mean-value and convolution solution approaches cannot be employed any
more. When using stochastic Petri nets, however, such dependencies can be modelled with
relative ease, albeit at the cost of a more expensive solution (see Chapter 14).
Having load-dependent service rates, it becomes difficult to specify the service time distribution of a single job since this distribution depends on the number of customers present during the service process. It is therefore easier to specify the service rate of node i as a function of the number of customers present: μ_i(n_i). The value E[S_i(n_i)] = 1/μ_i(n_i) can then be interpreted as the average time between service completions at station i, given that during the whole service period there are exactly n_i customers present. In principle, μ_i(n_i) can be any non-negative function of n_i.
The load-independent case and the above two special cases can easily be expressed in the above formalism:

• load-independent nodes: μ_i(n_i) = μ_i, for all n_i;

• infinite server nodes (delay centers): μ_i(n_i) = n_iμ_i, for all n_i;

• K-server nodes: μ_i(n_i) = min{n_iμ_i, Kμ_i} (see also Chapter 4).


Load-dependency as introduced above does not change the product-form structure of
queueing networks of the Jackson (JQN) and Gordon-Newell (GNQN) type introduced
in Chapters 10 and 11. Having M queueing stations and population K we still have a
product-form solution for the steady-state probabilities:

Pr{N = n} = \frac{1}{G} \prod_{i=1}^{M} p_i(n_i),   (12.1)

with

p_i(n_i) = \prod_{j=1}^{n_i} ρ_i(j)  (JQN),   or   p_i(n_i) = \prod_{j=1}^{n_i} D_i(j)  (GNQN),   (12.2)

and the normalising constant defined as usual. Comparing this with (10.12) for JQNs, we observe that the n_i-th power of ρ_i has been replaced by the product \prod_j ρ_i(j) in the load-dependent case. Similarly, comparing this with (11.4) for GNQNs, we observe that the n_i-th power of D_i has been replaced by the product \prod_j D_i(j) in the load-dependent case. In both cases, the above result reduces to the simpler expressions for the load-independent case whenever μ_i(n_i) = μ_i, for all i.
Before we proceed with the analysis of QN including load-dependent service stations,
we give two examples of such stations.

Example 12.1. Non-ideal multi-processing.


Consider a K-processor system where, due to multiprocessing overhead, not the full capacity of all the processors can be used. To be more precise, whenever there is only one customer present, this customer can be served in a single processor at full speed, i.e., μ(1) = μ. However, if two customers are present, each will be processed in a separate processor with speed μ(1 − ε), where ε is the fraction of the processing capacity "lost" due to overhead, i.e., we have μ(2) = 2μ(1 − ε). This continues until the number of jobs present equals K, i.e., μ(k) = kμ(1 − ε), k ≤ K. For k > K, we have μ(k) = Kμ(1 − ε). Clearly, the service rate of this multi-server system is load-dependent. □
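The corresponding rate function is easily written down; this is a sketch, with the function name and argument order our own:

```python
def mu_multiproc(k, K, mu, eps):
    """Service rate of the K-processor station of Example 12.1: a single job
    runs at full speed; with two or more jobs, each busy processor loses a
    fraction eps of its capacity to multiprocessing overhead."""
    if k <= 1:
        return k * mu                       # mu(0) = 0, mu(1) = mu
    return min(k, K) * mu * (1.0 - eps)     # mu(k) = min(k, K) * mu * (1 - eps)
```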

Example 12.2. Modelling of CSMA/CD networks.


For the modelling of CSMA/CD network access mechanisms like Ethernet [194, 277, 284],
the queueing analysis methods we have discussed so far do not suffice (we cannot model
the carrier sensing, the collisions and the binary-exponential backoff period in all their
details, to name a few examples). Instead of modelling the exact system operation in all
its details, we can also try to incorporate in a model just their net effect. In particular,
measurement studies have revealed that the effective throughput at the network boundary
in CSMA/CD systems strongly depends on the length of the network, the used packet
length and the number of users simultaneously trying to use the network. In modelling
studies, one therefore may include these three aspects in an expression for the effective
capacity that a CSMA/CD network offers, as follows. If the number of customers increases,
more collisions will occur. If a collision occurs, it will take longer to be resolved when the
network length is larger due to the longer propagation time. A good approximation of the
network efficiency E(n) when n users are trying to access the network is therefore given

by:

E(n) = \frac{E[S]}{E[S] + C(n)\, t_R},   (12.3)
where E[S] is the average packet length, t_R equals the round-trip delay, and C(n) is the expected number of collisions before a successful transmission. This expression can be understood as follows. If there are no other users, C(n) = 0 and the efficiency is equal to 1. On the contrary, if there are n customers trying to use the network, this will cause on average C(n) collisions before a successful transmission takes place. A collision takes t_R time to be resolved, because the information that there has been a collision has to be passed through the whole network. Thus, to send one packet of (average) length E[S] requires C(n)t_R collision resolution time plus the actual packet transmission time E[S]. Of these E[S] + C(n)t_R time units, only the transmission time E[S] is effectively used, yielding the above expression.
What can immediately be seen is that longer CSMA/CD networks are less efficient since collisions take longer to be resolved. Rewriting E(n) slightly we obtain

E(n) = \frac{1}{1 + C(n)\, t_R / E[S]},   (12.4)

which reveals that having longer packets is better for the efficiency since in that case, the time spent on collision resolution relative to the amount of information sent decreases. With a few extra assumptions, it has been shown that a good expression for the expected number of collisions with n active users is [177]:

C(n) = \frac{1 - A(n)}{A(n)},   with   A(n) = \left(1 - \frac{1}{n}\right)^{n-1}.   (12.5)

As n → ∞, A(n) → 1/e and C(n) → e − 1.


We can use the above efficiency E(n) in a simple load-dependent queueing model of a CSMA/CD network in which the service rate μ(n) = μE(n), where μ is the transmission rate. Comparison of this simple model with measurements on Ethernet show that this model does reasonably well [25].
As a final remark, note that the presentation above follows [177]. In [277] a similar derivation is given, however, with some differences. Since the model in [177] follows the measurement results presented in [25] more accurately than the model in [277], we stick to the former model. □
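A small sketch of this load-dependent rate, using the reconstruction of (12.5) given above (A(n) = (1 − 1/n)^{n−1}); the function names and the treatment of n = 1 are our own choices:

```python
def efficiency(n, ES, tR):
    """CSMA/CD efficiency E(n) for n active users, mean packet transmission
    time ES and round-trip delay tR, following (12.3)-(12.5)."""
    if n <= 1:
        return 1.0                          # a single user never collides
    A = (1.0 - 1.0 / n) ** (n - 1)          # success probability of an attempt
    C = (1.0 - A) / A                       # expected collisions per packet
    return ES / (ES + C * tR)

def service_rate(n, mu, ES, tR):
    # load-dependent rate of the CSMA/CD station: mu(n) = mu * E(n)
    return mu * efficiency(n, ES, tR)
```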

12.2 The convolution algorithm


We now proceed to the solution of closed queueing networks with load-dependent servers using the convolution scheme. Consider a GNQN consisting of M M|M|1 stations with K customers. As before, the routing probabilities are denoted r_{i,j} and the state space Z(M, K). The traffic equations are solved as usual, yielding the visit counts V_i. Since we allow for load-dependent service rates now, we have to define μ_i(j), the service rate at station i, given that station i is currently being visited by j customers. We define the service demand (per passage) of station i given that there are j customers present as D_i(j) = V_i/μ_i(j). For the steady-state customer distribution, we now have the following expression:

Pr{N = n} = \frac{1}{G(M,K)} \prod_{i=1}^{M} p_i(n_i),   with   p_i(n_i) = \prod_{j=1}^{n_i} D_i(j).   (12.6)

As we have seen before, the latter product replaces the n_i-th power of D_i we have seen in the load-independent case. A direct calculation of the state probabilities and the normalising constant therefore does not change so much; however, a more practical way to calculate performance measures of interest is via a recursive algorithm. This recursive solution of G(M, K) does change by the introduction of load-dependency, as we now need to take care of all different populations in each station. In particular, we have:

G(M,K) = \sum_{n \in Z(M,K)} \prod_{i=1}^{M} \prod_{j=1}^{n_i} D_i(j).   (12.7)
We now split the single sum into K + 1 smaller sums, each accounting for a particular population at station M, which then ranges from 0 to K:

G(M,K) = \sum_{k=0}^{K} \sum_{\substack{n \in Z(M,K) \\ n_M = k}} \prod_{i=1}^{M} \prod_{j=1}^{n_i} D_i(j)
       = \sum_{k=0}^{K} \left( \prod_{l=1}^{k} D_M(l) \right) \times \sum_{n \in Z(M-1, K-k)} \prod_{i=1}^{M-1} \prod_{j=1}^{n_i} D_i(j).   (12.8)

In the first term, we recognize p_M(k), the (unnormalised) probability of having k customers in queue M, and in the second term we recognize the normalising constant with one station (namely, the M-th) and k customers less. Hence, we can write:

G(M, K) = \sum_{k=0}^{K} p_M(k)\, G(M-1, K-k).   (12.9)


Figure 12.1: The calculation of G(M, K) with Buzen’s convolution algorithm

This summation explains the name convolution method: the normalising constant G(M, K) is computed as the convolution of the sequences p_M(0), ..., p_M(K) and G(M−1, K), ..., G(M−1, 0). As initial values for the recursion, we have G(m, 0) = 1, for m = 1, ..., M (there is only one way to divide 0 customers over m nodes), and G(1, k) = p_1(k) = \prod_{j=1}^{k} D_1(j), for k = 1, ..., K.
Although this recursion scheme is slightly more involved than the load-independent case, we can easily represent it in a two-dimensional scheme as before (see Figure 12.1). We can still work through the scheme column-wise, however, we need to remember the complete left-neighbouring column for the calculation of the entries in the current column. We therefore need to store one column more than in the load-independent case; the memory requirements are therefore of order O(2K). If all the nodes are load-dependent, we need to store the precomputed values D_i(j) which costs O(MK). In summary, the memory requirements are of order O(MK). The time complexity can be bounded as follows. To compute the k-th entry in a column, we have to add k products. Since k can at most be equal to K, we need at most O(K) operations per element in the table. Since the table contains MK elements, the overall computational complexity is O(MK²).
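The column-wise scheme of Figure 12.1 is easily sketched in code. In this sketch every station is treated as load-dependent and D[i][j] (for j = 1, ..., K) holds the service demands D_i(j) = V_i/μ_i(j), with index 0 unused; the function name and data layout are our own:

```python
def convolution_load_dependent(D, K):
    """Normalising constants for a closed network with load-dependent stations.

    D[i][j] = D_i(j) for j = 1..K (D[i][0] is ignored); returns the last
    column G(M, 0), ..., G(M, K) of the scheme of Figure 12.1."""
    M = len(D)
    # p[i][k] = prod_{j=1..k} D_i(j), the unnormalised marginal weight of station i
    p = [[1.0] * (K + 1) for _ in range(M)]
    for i in range(M):
        for k in range(1, K + 1):
            p[i][k] = p[i][k - 1] * D[i][k]
    G = p[0][:]                                    # first column: G(1, k) = p_1(k)
    for m in range(1, M):                          # add stations 2, ..., M via (12.9)
        G = [sum(p[m][j] * G[k - j] for j in range(k + 1)) for k in range(K + 1)]
    return G
```

With the last column available, the throughputs follow as X_i(K) = V_i G(M, K−1)/G(M, K), cf. (12.14) below.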
To compute Pr{N_i = n_i} we now proceed in a similar way as for load-independent nodes. As we have seen there, it turns out to be convenient to first address the case i = M:

Pr{N_M = n_M} = \sum_{\substack{n \in Z(M,K) \\ n_M \text{ given}}} Pr{N = n} = p_M(n_M)\, \frac{G(M-1, K-n_M)}{G(M, K)}.   (12.10)
As we have seen before, this expression contains normalising constants of columns M and M−1; hence, the ordering of columns (stations) is important. In the load-independent case we were in the position to write G(M−1, K−n_M) as the difference of two normalising constants of the form G(M, ·) by using the simple recursion (11.20). Due to the convolution-based expression in the load-dependent case (12.9) we cannot do so now, so we cannot generalise the above expression for all stations i. If we want to compute this measure for more than one station, the only thing we can do is to repeat the convolution scheme with all of the stations of interest appearing once as station M. Notice that the nodes for which no such detailed measures are necessary can be numbered from 1 onwards and the part of the convolution for these nodes does not have to be repeated. Using the above result, the utilisation of station M can be calculated as

ρ_M(K) = 1 − Pr{N_M = 0} = 1 − \frac{G(M-1, K)}{G(M, K)}.   (12.11)
For the calculation of the throughput X(K) we need the following result, which is valid for all i = 1, ..., M:

μ_i(j)\, p_i(j) = V_i\, p_i(j-1),   j = 1, ..., K,   (12.12)

because D_i(j) = V_i/μ_i(j). For the throughput of station M we now find:

X_M(K) = \sum_{k=1}^{K} μ_M(k)\, \Pr\{N_M = k\} = \sum_{k=1}^{K} μ_M(k)\, p_M(k)\, \frac{G(M-1, K-k)}{G(M, K)} = V_M\, \frac{G(M, K-1)}{G(M, K)}.   (12.13)

Here we are fortunate to find again an expression based on only the M-th column in the computational scheme, and hence it is valid not only for station M, but for all stations:

X_i(K) = V_i\, \frac{G(M, K-1)}{G(M, K)}.   (12.14)

As we have seen in the load-independent case, the throughput through the reference node (with visit-count 1) is the quotient of the last two normalising constants; all other node throughputs depend on that value via their visit ratio V_i. For the average population of station M we find

E[N_M(K)] = \sum_{k=1}^{K} k\, \Pr\{N_M = k\} = \sum_{k=1}^{K} k\, p_M(k)\, \frac{G(M-1, K-k)}{G(M, K)}.   (12.15)

Notice that this expression is again only valid for station M. If one would be interested in E[N_i(K)] (i ≠ M), the convolution algorithm should be run with a different ordering of stations so that station i is the last one to be added.
We finally comment on the difficulty in computing Pr{N_M ≥ n_M} in the load-dependent case. Similar to the load-independent case, we can express this probability as follows:

\Pr\{N_M \geq n_M\} = \frac{\prod_{l=1}^{n_M} D_M(l)}{G(M, K)} \sum_{\tilde{n} \in Z(M, K-n_M)} \left( \prod_{i=1}^{M-1} p_i(\tilde{n}_i) \right) \prod_{l=n_M+1}^{n_M+\tilde{n}_M} D_M(l).   (12.16)

We observe that we cannot reduce the remaining sum to a well-known normalising constant, since the terms D_M(l) over which the product for station M is taken (l ranges from the smallest number larger than n_M, i.e., n_M+1, to the actual number of customers in station M) are different from the terms D_M(l') that would appear in the expression for the normalising constant G(M, K−n_M) (l' would then range from 0 to K−n_M). This subtle observation shows the increased complexity of computing performance measures in queueing networks with load-dependent servers.

12.3 Special cases of the convolution algorithm


There are two special cases when using the convolution algorithm for GNQN with load-
dependent service rates: the case of having multiple servers per queue (Section 12.3.1) and
the case of having an infinite server station (Section 12.3.2).

12.3.1 Convolution with multi-server queueing stations


Consider the case where we deal with a GNQN with M stations as we have seen before,
however, the number of identical servers at station i is mi. The service rate at station i

then is:

μ_i(j) = \begin{cases} j\,μ_i, & j < m_i, \\ m_i\,μ_i, & j \geq m_i. \end{cases}   (12.17)

Consequently, the steady-state distribution of the number of customers in such a queueing network, given in total K customers, equals:

\Pr\{N = n\} = \frac{1}{G(M,K)} \prod_{i=1}^{M} \frac{D_i^{n_i}}{α_i(n_i)},   (12.18)

where D_i = V_i/μ_i and

α_i(n_i) = \begin{cases} n_i!, & n_i < m_i, \\ m_i!\, m_i^{n_i - m_i}, & n_i \geq m_i. \end{cases}   (12.19)
The normalising constant is defined as usual to sum all the terms to one. It can be shown that the following recursion holds for the normalising constant:

G(M, K) = \sum_{k=0}^{K} \frac{D_M^k}{α_M(k)}\, G(M-1, K-k),   (12.20)

with as initial conditions G(m, 0) = 1, for m = 1, ..., M, and G(1, k) = D_1^k/α_1(k), for k = 0, ..., K.
The recursion given here is only to be used for nodes that are indeed multi-servers. For
those columns in the tabular computational scheme that correspond to load-independent
single server nodes, the simpler load-independent recursion can be used.

12.3.2 Convolution with an infinite-server station


A special case arises when we address a GNQN with a single infinite-server station. In
principle, an infinite-server station in a GNQN with K customers can be regarded as a
K-server station. It is reasonable to assume that there is only one such station in the
queueing network (if there are more they can be merged into a single one). Without loss
of generality we assume this node is numbered one. Then, the only thing we need to do is
to change the initialisation to the tabular computational scheme as follows.
The steady-state distribution of the number of customers in such a queueing network,
given in total K customers, equals

\Pr\{N = n\} = \frac{1}{G(M,K)}\, \frac{D_1^{n_1}}{n_1!}\, \prod_{i=2}^{M} D_i^{n_i}.   (12.21)

The normalising constant is defined as usual to obtain probabilities that sum to one. The following recursion then holds for the normalising constant:

G(M, K) = G(M-1, K) + D_M\, G(M, K-1),   (12.22)

with as initial conditions G(m, 0) = 1, for m = 1, ..., M, and G(1, k) = D_1^k/k!, for k = 0, ..., K. The only "irregularity" in the queueing network is brought into the computational scheme directly at the initialisation; the rest of the computations do not change.

12.4 Mean-value analysis


In Chapter 11 we have developed an MVA recursion for the load-independent case and for
the special load-dependent case of the infinite servers. In this section we develop an MVA
recursion scheme for general load-dependency.
As before, the throughput X(K) is simply expressed as the fraction of the number of customers K present and the overall response time per passage:

X(K) = \frac{K}{E[R(K)]} = \frac{K}{\sum_{i=1}^{M} E[R_i(K)]}.   (12.23)

In order to calculate the value of E[R_i(K)] we again use the arrival theorem for closed queueing networks. However, since the service times depend on the exact number of customers in the queue, the average number E[N_i(K)] in the queue does not provide us with enough information for the calculation of E[R_i(K)]. Instead, we need to know the probability π_i(j|k) for j customers to be present at queue i, given overall network population k. Let us for the time being assume we know these probabilities.
In a more detailed version, the arrival theorem for closed queueing networks states that an arriving customer at queue i will find j (j = 0, ..., K−1) customers already in that queue with probability π_i(j|K−1), i.e., with the steady-state probability of having j customers in queue i, given in total one customer less in the queueing network. Since E[R_i(K)] includes the service of the arriving customer, we have j services with average demand D_i(j) with probability π_i(j−1|K−1), and thus:

E[R_i(K)] = \sum_{j=1}^{K} j\, π_i(j-1|K-1)\, D_i(j).   (12.24)

As before, we have E[R(K)] = \sum_{i=1}^{M} E[R_i(K)] and X(K) = K/E[R(K)], X_i(K) = V_i X(K), and the average population at station i can be computed using Little's law or can be expressed as:

E[N_i(K)] = \sum_{j=0}^{K} j\, π_i(j|K).   (12.25)

Anyway, the only unknowns to be solved are the probabilities π_i(j|k). We will develop a recursion scheme for the π_i(j|k) below.

Example 12.3. Readdressing the load-independent case.


Before we present the solution of the probabilities π_i(j|k) it might be instructive to address the case when D_i(j) = D_i, for all j, that is, for the case that we again deal with the load-independent case. We have:

E[R_i(K)] = \sum_{j=1}^{K} j\, π_i(j-1|K-1)\, D_i
          = D_i\,( π_i(0|K-1) + 2π_i(1|K-1) + \cdots + Kπ_i(K-1|K-1) )
          = D_i\,( π_i(0|K-1) + π_i(1|K-1) + π_i(2|K-1) + \cdots + π_i(K-1|K-1) )
            + D_i\,( π_i(1|K-1) + 2π_i(2|K-1) + \cdots + (K-1)π_i(K-1|K-1) )
          = D_i + D_i\, E[N_i(K-1)]
          = (E[N_i(K-1)] + 1)\, D_i,   (12.26)

which indeed conforms to the load-independent case we have seen before. □

Let us now come to the actual computation of π_i(j|k). First, notice that π_i(0|0) = 1 since with probability 1 there are no customers at queue i when there are none in the network at all. Secondly, we have

π_i(0|K) = 1 - \sum_{j=1}^{K} π_i(j|K),   (12.27)

since \sum_{j=0}^{K} π_i(j|K) = 1. Notice that this subtraction can be a source of round-off errors.
We then use (12.10) as follows:

π_i(j|K) = \Pr\{N_i(K) = j\} = p_i(j)\, \frac{G(M-1, K-j)}{G(M, K)}
         = \left( \prod_{l=1}^{j} D_i(l) \right) \frac{G(M-1, K-j)}{G(M, K)}
         = \left( p_i(j-1)\, \frac{G(M-1, K-j)}{G(M, K-1)} \right) \left( \frac{G(M, K-1)}{G(M, K)} \right) D_i(j)
         = π_i(j-1|K-1)\, D_i(j)\, \frac{X_i(K)}{V_i}.   (12.28)

As can be observed, we have found a way to express π_i(j|K) in terms of π_i(j−1|K−1) so that the MVA recursion scheme for load-dependent GNQNs is complete.
It is possible to combine the MVA presented here with those for load-independent and infinite-server nodes. To summarize, the following MVA computational scheme can be used to evaluate closed queueing networks with FCFS, IS and load-dependent nodes for increasing customer number k = 1, ..., K, and for all nodes i (a small code sketch of this scheme is given after the list):

1. Initialise k = 1, E[N_i(0)] = 0, and π_i(0|0) = 1;

2. Compute E[R_i(k)] as follows:

   • E[R_i(k)] = (E[N_i(k−1)] + 1) D_i, if node i is of type FCFS;

   • E[R_i(k)] = D_i, if node i is of IS type;

   • E[R_i(k)] = \sum_{j=1}^{k} j\, π_i(j−1|k−1)\, D_i(j), if node i is load-dependent, thereby using the probabilities π_i(·|k−1);

3. Compute E[R(k)] = \sum_i E[R_i(k)];

4. Compute X(k) = k/E[R(k)] and X_i(k) = V_i X(k);

5. For the load-dependent nodes, compute the values π_i(·|k) using (12.27) and (12.28);

6. Increase k by 1 and continue with 2, until k = K is reached.
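A minimal sketch of this recursion, assuming for simplicity that every node is load-dependent (FCFS and IS nodes are then just special choices of D_i(j)); the list layout and names are our own:

```python
def mva_load_dependent(D, K):
    """Load-dependent MVA for a closed GNQN.

    D[i][j] = D_i(j) = V_i / mu_i(j) for j = 1..K (index 0 unused).
    Returns the reference throughput X(K) (so X_i(K) = V_i * X(K)) and
    the marginal probabilities pi_i(j | K)."""
    M = len(D)
    pi = [[1.0] + [0.0] * K for _ in range(M)]        # pi_i(j | 0)
    X = 0.0
    for k in range(1, K + 1):
        # step 2: E[R_i(k)] = sum_j j * pi_i(j-1 | k-1) * D_i(j), cf. (12.24)
        R = [sum(j * pi[i][j - 1] * D[i][j] for j in range(1, k + 1)) for i in range(M)]
        # steps 3-4: overall response time and reference throughput
        X = k / sum(R)
        # step 5: new marginal probabilities via (12.28) and (12.27)
        for i in range(M):
            new = [0.0] * (K + 1)
            for j in range(1, k + 1):
                new[j] = pi[i][j - 1] * D[i][j] * X    # X equals X_i(k)/V_i
            new[0] = 1.0 - sum(new[1:k + 1])
            pi[i] = new
    return X, pi
```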

Regarding the cost of this version of MVA, the following remarks are in place. Since we iterate over K customers and M stations, the computational cost is directly proportional to the product MK. However, to compute E[R_i(K)] we need to compute the K probabilities π_i(j|K−1), which involves another K operations. In total, the computational cost is O(MK²). Regarding memory costs, we have to store the extra information for the probabilities π_i(j|K), which is (at most) K times more than we used to store; hence, we have O(MK) memory costs.

12.5 Exact hierarchical decomposition


Load-dependent servers are especially useful in hierarchical model decomposition, which not
only saves computation time, but also keeps the modelling process structured and allows
for reuse of subsystem models. In this section, we restrict ourselves to the case where
hierarchical model decomposition can be applied exactly. We describe hierarchical model
decomposition informally in Section 12.5.1, after which we formalize it in Section 12.5.2.

12.5.1 Informal description of the decomposition method

Consider the case where one has to model a large computer or communication system
involving many queueing stations (M) and customers (K). We have seen that the required
computational effort is at least proportional to the product MK (in the load-independent
case), so that we might try to decrease either K or M. Therefore, instead of constructing a
monolithic model and analysing that, we proceed to analyse subsystems first. The results
of these detailed analyses, each with a smaller number of nodes, are “summarized” in a
load-dependent server modelling the behaviour of the subsystem, which can subsequently
be used in a higher-level model of the whole system (again with a smaller number of
nodes). The hierarchical decomposition approach as sketched here is also often referred
to as Norton’s approach by its similarity with the well-known decomposition approach
in electrical circuit analysis. By the fact that a subsystem model, possibly consisting of
multiple queueing stations, is replaced by a single load-dependent queueing station, this
queueing station is often referred to as a flow-equivalent service center (FESC) or as an
aggregate service center. The former name indicates that the load-dependency is chosen
such that the single station acts in such a way that its customer flow is equivalent to that
of the original queueing network.
Let us illustrate this approach for dealing with a GNQN with M stations numbered 1, ..., M*, M*+1, ..., M and K customers. Stations 1 through M* are the nodes to be aggregated in a single FESC. Stations M*+1, ..., M are the queueing stations that will not be affected; sometimes these are called the high-level nodes. Note that the node numbering scheme does not affect generality. Furthermore, we assume that the queueing network is structured in such a way that there is only a single customer stream from the high-level model stations to the stations to be aggregated and back. This is visualized in Figure 12.2; we come back to the interpretation of the probabilities α and β later.
Since the total number of customers in the GNQN equals K, the number of customers in
the stations to be aggregated varies between 0 and K as well. Given a certain population in

Figure 12.2: High-level view of a GNQN that is to be decomposed


Figure 12.3: Decomposition approach using a FESC

the group of stations to be aggregated, the high-level stations perceive a fixed average delay for customers passing through the stations to be aggregated, namely the average response time (per passage!) for this subnetwork with k customers in it (denoted E[R*(k)]). From this, we can compute the perceived rate X*(k) = k/E[R*(k)] at which customers are served in the subnetwork. We compute X*(k) by studying the subnetwork in isolation by connecting the in- and out-going flows to the high-level model. That is, the outgoing branch (labelled with probability α) is looped back to the queueing station at which the flow labelled 1 − β ends. The throughput along this shorted circuit can then be used as service rate for the aggregated subnetwork (the FESC) that can be embedded in the high-level nodes, as visualized in Figure 12.3. Notice that we have added an immediate feedback loop around the FESC (with probability 1 − α); this is to ensure that the visit counts in the original non-decomposed GNQN of the non-aggregated stations, relative to the visit-count of the FESC, remain the same. In many textbooks on queueing networks this immediate feedback loop is not explicitly mentioned.
Note that since the overall network is a GNQN, the subnetwork is so as well. Therefore,
one can employ MVA or the convolution approach to solve it. Since an MVA is recursive

in the number of customers in the network, with one MVA for the maximum population K, the throughputs for smaller populations are computed as well.
When we need to aggregate a subnetwork which is to be embedded in a yet unknown high-level model, we face the problem that there is no given bound on the population in the subnetwork. One then often assumes that the subnetwork population is bounded by some K̂ and computes the throughputs X*(1), ..., X*(K̂). Whenever in the high-level model evaluation the situation occurs that the number of customers in the FESC is larger than K̂, one assumes X*(k) ≈ X*(K̂). If K̂ is taken large enough, this yields reasonably accurate results in most practical cases.

12.5.2 Formal derivation of the decomposition method


Let us now formalize the hierarchical decomposition approach and show its correctness by using results from the convolution method with load-dependent stations. To keep the notation and terminology simple, we assume that all stations are load-independent and have visit-counts V_i, mean service times E[S_i] and service demands D_i. The nodes 1 through M* are to be aggregated in a single FESC (we use the subscript "a" to refer to this aggregate station).
If we study the queueing stations 1 through M* in isolation (in the above sketched short-circuited way) the service demands for these stations do not alter. Hence, we can perform a standard convolution to obtain the normalising constants G(M*, 0) = 1, G(M*, 1), G(M*, 2), ..., G(M*, K), using the computational scheme depicted in Figure 11.1 (we basically compute the first M* columns in this scheme). According to (11.24), we then know that the throughput through this queueing network, given k customers are present, is given by

X^*(k) = \frac{G(M^*, k-1)}{G(M^*, k)},   k = 1, ..., K.   (12.29)

Notice that X*(k) is the throughput through a station in the subnetwork that is being aggregated for which V_i = 1. When V_i ≠ 1, the actual throughput through station i in the subnetwork is V_i X*(k). To keep the notation simple, we assume that V_1 = 1 and that the short-circuit originates and ends at station 1, that is, customers from the high-level model enter and leave the subnetwork at station 1. We find that the throughput through the short-circuit equals X*(k)\,(1 − \sum_{j=1}^{M^*} r_{1,j}). For reasons to become clear below, we now construct a load-dependent queueing station with service rate μ_a(k) = X*(k).

Figure 12.4: A small three-node GNQN

can use a convolution scheme for load-dependent servers. As first column in the tabular computation scheme we take the aggregated station. We know from Figure 12.1 that the first column is given as

G(1, k) = p_a(k) = \prod_{j=1}^{k} D_a(j) = \prod_{j=1}^{k} \frac{1}{μ_a(j)}.   (12.30)

Now, by our choice μ_a(k) = X*(k), we find that p_a(0) = 1 (as it should), but also p_a(1) = 1/μ_a(1) = G(M*, 1) by (11.24). In a similar way, we find

p_a(k) = \prod_{j=1}^{k} \frac{1}{μ_a(j)} = \frac{G(M^*, 1)}{G(M^*, 0)}\, \frac{G(M^*, 2)}{G(M^*, 1)} \cdots \frac{G(M^*, k)}{G(M^*, k-1)} = G(M^*, k).   (12.31)

The service rates μ_a(k) have apparently been chosen such that the original column of normalising constants for the subnetwork to be aggregated is exactly reproduced. We can now continue with adding more columns to the convolution table, namely the columns corresponding to stations M*+1 through M. In other words, stations 1 through M* have been summarized in a single station yielding the same "behaviour" in the sense of the computational method used. For this approach to work correctly, the visit-count to the aggregate should be the same in the overall and in the decomposed model.

Example 12.4. A three-node GNQN.


We reconsider the small GNQN we have used throughout Chapter 11 to illustrate algo-
rithms. For ease of reference, we depict this GNQN again in Figure 12.4 and restate the
model parameters: V_1 = 1, V_2 = 0.4, V_3 = 0.6, and E[S_i] = i, so that D_1 = 1, D_2 = 0.8 and D_3 = 1.8. We are now intending to aggregate stations 2 and 3 into a single load-dependent
queueing station. Note that when adhering to the above introduced notation, we should


Figure 12.5: Aggregating the three-node GNQN

renumber the nodes as follows: (1,2,3) → (2,3,1), so that we can aggregate the new nodes 1 and 2 into an aggregate "a"; however, to avoid confusion, we will adhere to the node
numbering scheme of Figure 12.4.
Short-circuiting nodes 2 and 3 will yield the simple GNQN as depicted in Figure 12.5(a);
note that the visit counts remain the same. After we have computed the service rates for
the FESC, we have to evaluate the GNQN as given in Figure 12.5(b). Note that there is no
need to have an immediate feedback loop around the FESC with probability 1 − α, since the probability to make more than a single pass through the subnetwork to be aggregated equals 0 (α = 1 → 1 − α = 0).
Let us now start the computations. We find for the subnetwork to be aggregated the
following normalising constants:

k    G(2,k)   G(3,k)
0    1.0       1.0
1    0.8       2.6
2    0.64      5.32
3    0.512    10.088

The throughput through the subnetwork, given k customers present, and hence the service rates for the FESC can then be computed as the usual ratios of normalising constants: μ_a(1) = X*(1) = 1/2.6 ≈ 0.3846, μ_a(2) = X*(2) = 2.6/5.32 ≈ 0.4887 and μ_a(3) = X*(3) = 5.32/10.088 ≈ 0.5274.

We now continue to embed the FESC in the overall network. We therefore have to compute
the normalising constant for the queueing network given in Figure 12.5(b). As first column,
we take the column computed for the aggregated subnetwork, as follows:
282 12 Hierarchical queueing networks

k    G(a,k)   G(1,k)
0    1.0       1.0
1    2.6       3.6
2    5.32      8.92
3   10.088    19.008

The second column is then computed by standard convolution (load-independent), using the service demand D_1 = 1 for station 1. As can be observed, this column is the same as the column of normalising constants we have computed before (in Chapter 11) without aggregation. □
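The whole example fits in a few lines of code; this is a sketch, with variable names of our own:

```python
K = 3
D2, D3, D1 = 0.8, 1.8, 1.0

# Subnetwork {2, 3} in isolation: convolution gives the columns G(2,k) and G(3,k)
p2 = [D2**k for k in range(K + 1)]
G_sub = [sum(p2[j] * D3**(k - j) for j in range(k + 1)) for k in range(K + 1)]
#   G_sub == [1.0, 2.6, 5.32, 10.088]

# FESC service rates mu_a(k) = X*(k) = G(k-1)/G(k), cf. (12.29)
mu_a = [G_sub[k - 1] / G_sub[k] for k in range(1, K + 1)]

# Embedding: the FESC column equals G_sub; station 1 (D1 = 1) is then added
# with the load-independent recursion G_new(k) = G_old(k) + D1 * G_new(k-1)
G = G_sub[:]
for k in range(1, K + 1):
    G[k] = G[k] + D1 * G[k - 1]
#   G == [1.0, 3.6, 8.92, 19.008];  X(3) = G[2]/G[3] = 0.4693
```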

Having computed the normalising constants for the GNQN including the FESC, we can
compute other performance measures of interest in the usual way. The big advantage of
aggregating model parts is to cope with model complexity. Subsystems that appear in a
number of models, e.g., a disk subsystem model as part of a number of models of various
computer architecture alternatives, only need to be analysed once and can then be used in
its aggregate form.
We have demonstrated the hierarchical decomposition method here, using the convo-
lution scheme. Needless to say the MVA scheme can also be used for this purpose. The
subnetwork to be aggregated is then evaluated using MVA for all possible populations K
and the relevant throughputs are computed. These are then used as service rates in a load-
dependent queueing station. The MVA variant for queueing networks with load-dependent
stations is then used to evaluate the overall network.
The result presented here has been proven by Chandy, Herzog and Woo in the mid-
1970s (it is sometimes referred to as the CHW-Theorem). To be more precise, the following
theorem holds.

Theorem 12.1. Hierarchical decomposition.


In any GNQN, a group of queueing stations can be replaced by a single
load-dependent server with suitably chosen service rates, without changing the
queue length distribution at the nodes. □

Example 12.5. Central server model.


Hierarchical decomposition works very well for queueing network models of central server
systems, as addressed in Section 11.6. A typical decomposition approach would then be to

simply aggregate the CPU and the two disks into a single load-dependent service station.
What remains of the overall model are two queues, one with infinite-server semantics (the
terminals) and one with general load-dependent behaviour (the system). □

12.6 Approximate hierarchical decomposition


Although hierarchical model decomposition is a very useful technique for GNQNs, its
use also lies in the analysis of queueing networks which do not conform to the class of
GNQNs. In such a case the decomposition is not exact any more; however, in many
practical situations, the approximations obtained with it are reasonably good. Without
going into too much detail, we state that the approximation becomes better whenever the
customers in the subnetwork to be aggregated are less affected by what happens in the
high-level model. This is most notably the case when the services in the submodel complete
at much higher rates than in the high-level model and the customers stay relatively long
in the aggregate. In such cases, between any two interactions of the high-level model and
the submodel to be aggregated, many activities take place in the submodel. We say then
that there are time-scale differences between the two model parts.
As a representative example of a case in which approximate hierarchical decomposition
works well, we consider a model of a multiprogramming computer system. A simple case,
not taking into account paging effects, is addressed in Section 12.6.1 whereas Section 12.6.2
deals with a model including paging effects.

12.6.1 Multiprogrammed computer system models

Consider a model of a multiprogramming computer system as given in Figure 12.6. K


terminals are connected to a computer system consisting of a CPU and two IO-systems.
Due to the limited memory size of the computer, not all jobs can be processed at the same
time (think of the CPU as a processor with PS or RR scheduling). There is a so-called
multiprogramming limit J < K. Jobs issued from the terminals are taken into service as
long as the number of jobs currently being processed is smaller than J, otherwise they are
queued in a swap-in queue.
In a way, the swap-in queue could be regarded as a load-dependent server in which the
service rate depends on the population at the CPU and the IO-systems; the service rate is
infinite as long as the latter population is smaller than J, and zero otherwise. As indicated
in Section 12.1 such non-local load-dependency cannot be analysed using queueing network


Figure 12.6: Central server model with K terminals and multiprogramming limit J < K

models (as we will see later, this model can easily be solved using stochastic Petri nets if
the number of customers K is not too large). We therefore employ an approximate model
decomposition. We solve the “dashed” submodel in Figure 12.6 in isolation for possible
populations ranging from 1 through J in order to obtain X*(1), ..., X*(J), as depicted in Figure 12.7(a). We then solve the high-level model with a load-dependent FESC as a substitute for the system with multiprogramming limit, as depicted in Figure 12.7(b). The effect that only J customers are allowed to enter the subnetwork will be reflected in the approximate model by taking X*(k) = X*(J), whenever k > J.

Note that based on the above considerations, the approximation will become better when the probability α decreases (the interactions between the high-level and the low-level model take place less frequently) or when the CPU and IO service times decrease as opposed to the terminal think time.

We consider the same numerical parameters as we have considered in case C1 in Section 11.6, i.e., the mean service times are E[S_cpu] = 20 msec, E[S_IO1] = 30 msec, and E[S_IO2] = 50 msec, and the visit counts are given as V_term = 1, V_cpu = 10, V_IO1 = 6 and V_IO2 = 3. The multiprogramming limit is set to J = 20. Solving the short-circuited model,


Figure 12.7: Decomposition of the model with multiprogramming limit

yields the following throughputs (we show only a few of the computed values):

j        1        6       11       16       20
X*(j)  1.8868   3.9907   4.6471   4.8239   4.8934
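These rates follow from a standard (load-independent) MVA of the short-circuited submodel; a small sketch (names are ours), which should reproduce the tabulated values, e.g. X*(1) = 1/(0.2 + 0.18 + 0.15) = 1.8868:

```python
ES = {'cpu': 0.020, 'io1': 0.030, 'io2': 0.050}    # mean service times (seconds)
V  = {'cpu': 10,    'io1': 6,     'io2': 3}        # visit counts per job
D  = {i: V[i] * ES[i] for i in ES}                 # service demands
J  = 20                                            # multiprogramming limit

N = {i: 0.0 for i in ES}                           # mean populations, start empty
for j in range(1, J + 1):
    R = {i: D[i] * (1.0 + N[i]) for i in ES}       # FCFS mean-value equations
    Xstar = j / sum(R.values())                    # X*(j): FESC rate for population j
    N = {i: Xstar * R[i] for i in ES}
```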

We can use these throughputs in the analysis of the overall model of Figure 12.7(b). First note that the swap-in queue is no longer present in the model. Its effect is still present in the model by the use of the specific values for the rates of the FESC. Also notice the direct feedback loop around the FESC (with probability α); it is included to obtain the proper visit counts in the model (see also Section 12.5.1).
Without presenting the detailed recursion, the approximate terminal throughput X(K) is presented in Table 12.1 as a function of the number of users K. Next to the throughput results from the decomposition analysis, we present the exact throughputs obtained via a numerical analysis of the underlying CTMC, thereby using the stochastic Petri net formalism as will be presented in Chapter 14. As can be observed, the results agree very well. This is due to the large time-scale differences in the two models; the rates involved differ by three orders of magnitude! To indicate that we have gained a lot by the decomposition approach, we show the number of states in the CTMC model as well; as will become clear later, we have to solve a linear system of equations of that size to compute the solution. Surely, the hierarchical decomposition method is much cheaper to pursue.
A few remarks can be made regarding the employed model. We did not distinguish
between different jobs and therefore assumed that the memory requirements of all jobs are
the same. Furthermore, we assumed that when terminal requests enter the system, the

K    X(K) (decomposition)   X(K) (numerical)   states

 1        0.0950                0.0945              4
 5        0.4731                0.4731             56
11        1.0343                1.0343            364
16        1.4948                1.4948            969
21        1.9468                1.9468           2002
26        2.3875                2.3875           3157
31        2.8133                2.8133           4312
36        3.2191                3.2191           5467
41        3.5981                3.5980           6622
46        3.9419                3.9416           7777
51        4.2409                4.2397           8932
56        4.4861                4.4830          10087

Table 12.1: Comparing the throughputs obtained via the decomposition approach and via a numerical solution approach for the multiprogramming system model for increasing number of customers K

jobs already present are not affected. This is not the case in practice. More often than
not, allowing new customers to start their execution influences the amount of memory that
is granted to the customers already there. Either these customers will be able to use less
memory or they will perceive extra delays due to increased paging activity.

12.6.2 Studying paging effects


In multiprogramming computer systems the number of jobs being processed simultaneously
is limited by the multiprogramming limit J. When there are more requests for job pro-
cessing, jobs are queued in the swap-in queue. We assumed that only jobs being processed
by the terminals are swapped out. This, however, is only part of the truth. When the
processing of a job requires a lengthy I/O operation, this job may be swapped out as long
as the I/O device can autonomously work on the job. To speed up the swapping process (in
which large process images might have to be stored) use is made of fast I/O devices, or of
I/O devices that are relatively close to the system’s main memory (local disks) as opposed
to remote disks that need to be accessed for normal user I/O. Apart from the above swap-
ping phenomena, virtual memory and paging play an important role. In virtual memory
systems, every job can have a virtual work space larger than the physical memory of the


Figure 12.8: A decomposed GNQN of the multiprogramming system with user and page I/O devices

system. Only those parts or pages of the job that are needed are loaded in main memory;
the other pages are stored in a secondary storage device, normally disk. Depending on the
characteristics of the individual jobs, more or less pages are needed by a job to progress
efficiently. Whenever a running job needs to access a page that is not yet in main memory
a page fault occurs. The page fault results in a page I/O request to get the missing page
from secondary storage and to place it in main memory, thereby most often replacing the
least recently used page, either from the same job or from all jobs being processed.
The number of pages needed by a job over a time interval of length t is called the job’s
working set w(t). w(t) is increasing with t, however, its derivative w’(t) goes to zero for
increasing t since there is a maximum number of pages that the job needs to complete (the
total memory requirement is limited). If the number of jobs that is simultaneously accepted
by the system increases, possibly up to the multiprogramming limit J, less main memory
per job is available to store the job’s working set. Hence, the higher the number of jobs in
the system, the higher the page-fault rate. As a consequence of this, the probability that
job processing has to be interrupted by a page I/O device access increases with increasing
numbers of jobs. As can be understood, this decreases the perceived speed of the system.
Now, we will discuss the above sketched phenomenon once more, illustrated with a
simple queueing model. Consider a QN model of a multiprogramming computer system,
similar to the one depicted in Figure 12.2; however, there is one so-called user-I/O device
and one dedicated page-I/O device. Due to the multiprogramming limit J, we have to
apply a decomposition to solve the submodel with j = 1, . . . , J customers in it. This
short-circuited submodel is shown in Figure 12.8.
Whenever the service demands at the queues are constant, for increasing multiprogramming limit j = 1, ..., J, the throughput X*(j) will be limited by the highest service


Figure 12.9: The throughput X*(j) approaches its bound for increasing j, with fixed paging
load

demand:

X^*(j) \leq \frac{1}{\max_i\{D_i\}}.   (12.32)
The result of this is that with increasing j, X*(j) approaches its maximum value, as illustrated in Figure 12.9. However, when the multiprogramming limit increases, the amount of main memory per admitted job decreases. Since this results in more page faults, it seems reasonable to assume that the routing probability from the CPU to the page-I/O device increases; therefore the visit ratio V_page(j) will increase relative to the other visit ratios (we explicitly indicate the dependence of the visit ratios on the multiprogramming limit j). We take the CPU as a reference, i.e., V_cpu(j) = 1, and assume that the system initially is CPU bound which means that for small j the CPU has the largest service demand. When increasing j, by the increase of V_page(j), D_page(j) becomes the largest service demand so that under larger load the system becomes page-IO bound. In the latter case, it is possible that the throughput X*(j) = 1/max_i{D_i(j)} decreases for increasing j. The effect that by increasing the multiprogramming limit the performance of a system deteriorates is known as thrashing, and is illustrated in Figure 12.10. When the multiprogramming degree is taken too small the system is under-utilized and the throughput is too small, but when the multiprogramming limit is taken too high, the system performs badly due to too much page-I/O overhead.
To evaluate a multiprogramming computer system, including paging effects, as sketched
above, we can use an evaluation similar to the approximate hierarchical decomposition we
have seen before. We can aggregate the complete computer system model, excluding the
terminals, in a single FESC. As in the previous case, we need to evaluate the submodel to be

Figure 12.10: The throughput X*(j) decreases for larger multiprogramming limits j; the thrashing effect

aggregated for all possible populations; however, we now have to take into account different
routing probabilities (or visit counts) for different multiprogramming limits to reflect the
fact that for higher values of the multiprogramming limit, the visit count for the paging
device becomes larger. Since we cannot cope in our models with routing probabilities that
change depending on the population of a model part, we have to assume that the routing
probabilities are fixed. We assume them to be equal to the case when the maximum number
of jobs is present, even if fewer customers are being processed (interpreted in the context of
the modelled system, this implies that for a given multiprogramming limit, each process
obtains a fixed number of pages, whether it uses these or not, and whether the other pages
are free or not). In summary, we have to use the following computational scheme:

1. For j = 1, . . . , J compute the visit ratios V_i(j) from the traffic equations;

2. For a given value of j, use D_i and V_i(j) to compute X*(j) using an MVA
or convolution scheme;

3. Construct a FESC with μ_a(j) = X*(j) as computed above; for l > J we
set μ_a(l) = X*(J).

Important to observe is that we need multiple MVAs (or convolutions) to compute the
service rates for the FESC. The problem we now face is that of embedding the above
FESC in the overall model. As we have seen in Section 12.5.1, when embedding the FESC
in the overall model, we need to route jobs departing from the FESC directly back to
it, with probability 1 - a. However, when we change the population j, in the submodel
to be aggregated the visit count for the paging device changes. Although the other visit
counts remain the same, this changes the routing probabilities in the model. Thus, for
every population j, a has been different. Hence, there is no unique way to construct the
overall aggregated model. Thus, at this point the only thing we can do is to study X(k)
of the aggregate in itself. In Chapter 14 we will use stochastic Petri net models to solve
the unresolved modelling problem.

Example 12.6. Increased paging device load.


Consider a multiprogramming computer system as we have addressed before with the
following numerical parameters: E[S_cpu] = 20 msec, E[S_page] = 40 msec, E[S_user] = 50
msec, the multiprogramming limit J = 20 and a(j) = 0.3 − j/80 and p(j) = 0.6 + j/80
and r(j) = 0.3, for j = 1, . . . , J.
For increasing j, the service demands D_i(j) can be computed from the traffic equations
and the service times. We find (in msec):
\[
D_{\rm cpu}(j) = 20, \qquad D_{\rm page}(j) = 16 + j/2, \qquad D_{\rm user}(j) = 15, \qquad (12.33)
\]

so that, according to the bounds presented in Section 11.4.1:
\[
X^*(j) \leq \begin{cases} 50, & j \leq 8, \\[1ex] \dfrac{1000}{16 + j/2}, & j \geq 8. \end{cases} \qquad (12.34)
\]

Using multiple MVA evaluations, the aggregate throughputs X_a(j) are depicted in Fig-
ure 12.11; the discussed thrashing effect is clearly visible. Notice how tight the upper
bound is for larger j, e.g., for j = 20 we find X*(20) ≤ 38.46 whereas the exact value
equals X*(20) = 38.38. In order to increase the system performance, admitting more cus-
tomers is not a good idea. In fact, one should try to keep the number of admitted customers
below the value of j for which X_a(j) is not yet decreasing (here, a multiprogramming limit
of 10 would have been a good choice). A suggestion for a performance improvement would
be to either increase the speed of the paging device (decreasing E[S_page]), or to increase
the size of the main memory so that fewer page faults occur (decreasing V_page(j)); both will
decrease D_page(j) and therefore increase X_a(k). □
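
The scheme is easy to automate. The following Python sketch (not taken from the book; the function and variable names are choices of this illustration) follows the three steps above for the numbers of Example 12.6: for every multiprogramming limit j it feeds the demands D_i(j) of (12.33) into a standard exact single-class MVA and collects the resulting throughput X*(j) as the load-dependent rate μ_a(j) of the FESC. It should reproduce the values discussed above, e.g., X*(20) close to 38.4 jobs per second.

```python
def demands(j):
    # service demands in msec for multiprogramming limit j, cf. (12.33): CPU, page I/O, user I/O
    return [20.0, 16.0 + j / 2.0, 15.0]

def mva(D, k):
    """Exact single-class MVA for a closed network of queueing stations
    with fixed demands D and population k; returns the throughput (jobs per msec)."""
    n = [0.0] * len(D)                                      # mean queue lengths E[N_i]
    X = 0.0
    for pop in range(1, k + 1):
        R = [D[i] * (1.0 + n[i]) for i in range(len(D))]    # arrival theorem
        X = pop / sum(R)                                    # throughput at population pop
        n = [X * R[i] for i in range(len(D))]               # Little's law
    return X

J = 20
mu_a = [1000.0 * mva(demands(j), j) for j in range(1, J + 1)]   # mu_a(j) = X*(j) in jobs/second
# For populations l > J one sets mu_a(l) = X*(J), as in step 3 of the scheme.
print([round(x, 1) for x in mu_a])
```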

12.7 Further reading


The hierarchical decomposition approach has been developed by Chandy, Herzog and Woo
in the mid-seventies [42, 41]. It should be noted that the FESC approach can be applied
iteratively: an FESC can be embedded in a model which in turn is aggregated in an FESC.

Figure 12.11: The aggregate throughput X*(j) as a function of the multiprogramming limit j

For each aggregation phase, the most suitable computational method can be chosen. In
fact, when different computational methods are chosen (including possibly also simulation)
to obtain the FESC, one often speaks of hybrid modelling. In the book by Lazowska et
al. these issues are treated in more detail [177].
The convolution method for QN with load-dependent stations has first been presented
by Buzen [37]. The MVA for this case has been presented by Reiser and Lavenberg [245,
243]. Interesting work on multiprogrammed central server systems, including paging effects
and working sets can be found in the book of Trivedi (a simple model similar to the one
discussed here) [280] and the book by Coffman and Denning (a complete discussion of
memory management issues) [59].

12.8 Exercises
12.1. Load-independent convolution.
Show that when D_i(j) = D_i, for all i, j, the convolution recursion for the load-dependent
case (12.9) indeed reduces to the one for the load-independent case (11.20).

12.2. I/O-subsystems.
A typical system component in many computer systems is the I/O-subsystem, consisting
of a number of parallel independent disks. Suppose we have n_{I/O} of these parallel disks
and that a disk request is handled by disk i with probability α_i and takes 1/μ_i seconds to
complete (i = 1, . . . , n_{I/O}).

1. Discuss how an I/O-subsystem as described above can be aggregated into a single


FESC.

2. Simplify the results of 1, when α_i = 1/n_{I/O} and μ_i = μ.

3. Simplify the results of 1, when α_i E[S_i] is the same for all disks i.

12.3. A three-node GNQN.


Reconsider the three-node GNQN for K = 3.

1. Aggregate nodes 1 and 2 and construct an appropriate FESC.

2. Embed this aggregate in the overall network, i.e., combine it with node 3.

3. Solve the resulting two-node GNQN (with one load-dependent node) with both MVA
and the convolution method.

Chapter 13

BCMP queueing networks

IN this chapter we present a number of results for a yet richer class of (mixed) open and
closed queueing networks, the so-called BCMP queueing networks. The seminal paper
on this class of queueing networks, published by Baskett, Chandy, Muntz and Palacios in
1975, is probably the most referenced paper in the performance evaluation literature. We
present the BCMP result in Section 13.1, and then we discuss a number of computational
algorithms in Section 13.2. It is important to note that we do not strive for completeness
in this chapter; we merely selected a few computational algorithms to show their similarity
to the algorithms discussed so far and to comment on their computational complexity.

13.1 Queueing network class and solution


The best-known class of (mixed) open and closed queueing networks with product-form
solution has been published by Baskett, Chandy, Muntz and Palacios in 1975 [14]. We
first present this class of queueing models in Section 13.1.1, after which we discuss the
steady-state probability distribution of customers in Section 13.1.2.

13.1.1 Model class


A BCMP queueing network consists of M queueing stations (or nodes). Customers belong
to one of R classes. For each class, routing probabilities through the network must be
specified. A class can either be open or closed and jobs are allowed to change classes when
changing from queue to queue. The queueing stations can be of 4 types:

1. In FCFS nodes, jobs are served in a first come, first served fashion. Although
FCFS nodes may be visited by jobs of multiple classes, the service time distributions
of all classes need to be the same and must be negative exponential, albeit possibly
load-dependent. This latter option can be used to model multiple server stations or
FESCs.

2. In PS nodes, jobs are served in a processor sharing fashion. All jobs are processed
simultaneously with an equal share of the capacity. Jobs of different classes may
have different service requirements and the service rates (per class) may depend on
the queue length at the node. Service time distributions must be of Coxian type,
although only the first moment does play a role in the computations.

3. In IS or delay nodes, an infinite number of servers is available. All jobs will be


served by “their own” server (sometimes this is called the server-per-job strategy).
Jobs of different classes may have different service requirements and the service rates
(per class) may depend on the queue length at the node. Service time distributions
must be of Coxian type; only the first moment needs to be specified.

4. In LCFSPR nodes, jobs are served on a last come first served basis with preemption.
Further restrictions are the same as for the PS and IS case.

In the literature, these node types are also often referred to as type 1, type 2, type 3 and
type 4 nodes.
When leaving node i, a job of class r will go to node j as a class s job with probability
r_{i,r;j,s}. Jobs leave the network with probability r_{i,r;0}. Depending on which routing possi-
bilities are present, the pairs (node, class) can be partitioned into so-called routing chains.
In many practical cases, class changes do not occur and a class is then directly connected
to a particular route.
For open models, two arrival possibilities exist: (1) either there is a single Poisson arrival
stream with rate λ(k), where k is the total actual population of the queueing network. A
fraction r_{0;i,r} of the arrivals goes as a class r job to station i; or (2) every routing chain has
its own Poisson arrival stream with a rate only dependent on the population of that chain
(denoted λ_c(k_c) with c ∈ C; see below). A fraction r_{0;i,c} of these arrivals arrive at queue i.
For every routing chain c ∈ C (C is the set of all routing chains, N_C = |C|), the
following traffic equation can be established:
\[
\lambda_{i,r} = \lambda^{0}_{i,r} + \sum_{(j,s)} \lambda_{j,s}\, r_{j,s;i,r}, \qquad (i,r) \in c, \qquad (13.1)
\]
where λ^0_{i,r} is determined by the external arrivals of class r to node i (in the case of closed
networks this term equals 0; for open networks it equals λ r_{0;i,r} (one arrival process) or
λ_c r_{0;i,r} (arrivals per chain/class)). As a result of this, one obtains the throughputs λ_{i,r} for
open chains and the visit ratios V_{i,r} for closed chains (both per node and per class). In
many cases, however, the visit counts V_{i,r} are directly given as part of the QN specification.
To conclude this section, let us list the most striking characteristics of BCMP QNs in
comparison to the JQNs and GNQNs we discussed before:

• queueing stations can be of four different types, instead of just two (IS and FCFS);

• all station types may have load-dependent service rates;

• customers may belong to different classes, each following their own route through the
QN and requesting class-specific service;

• the service time distributions are all of Coxian type, except for the FCFS stations,
where we still have to adhere to negative exponentially distributed service times;

• BCMP QNs allow for closed and open routes;

• arrivals at the QN (for the open classes) may depend on the QN population.

13.1.2 Steady-state customer probability distribution


Without going into derivations, we simply present the BCMP theorem below. Before
doing so, we need to know how to represent the states of a BCMP QN. Let N_i =
(N_{i,1}, N_{i,2}, . . . , N_{i,R}) be the state of node i, where N_{i,r} represents the number of class r cus-
tomers present in queue i, and let N_i = Σ_{r=1}^{R} N_{i,r} be the total number of jobs in queue i.
The overall state is given by the vector N = (N_1, N_2, . . . , N_M) and the overall number of
customers in the QN is given by K = Σ_{i=1}^{M} N_i.

Theorem 13.1. BCMP.


The steady-state probability distribution in a BCMP QN has the following
form:
\[
\Pr\{N = n\} = \frac{1}{G}\, A(n) \prod_{i=1}^{M} p_i(n_i), \qquad (13.2)
\]

where G is a normalising constant, A(n) is a function of the arrival process
and the functions p_i(n_i) are the “per-node” steady-state distributions, to be
specified below. We refer to the next section for the definition and computation
of the normalising constant G.
When node i is of type FCFS, we have in the load-independent case
\[
p_i(n_i) = n_i! \left(\frac{1}{\mu_i}\right)^{n_i} \prod_{r=1}^{R} \frac{V_{i,r}^{n_{i,r}}}{n_{i,r}!}, \qquad (13.3)
\]
and in the load-dependent case
\[
p_i(n_i) = n_i! \prod_{r=1}^{R} \frac{V_{i,r}^{n_{i,r}}}{n_{i,r}!} \prod_{j=1}^{n_i} \frac{1}{\mu_i(j)}. \qquad (13.4)
\]
When node i is of type PS or LCFSPR, we have in the load-independent case
\[
p_i(n_i) = n_i! \prod_{r=1}^{R} \frac{1}{n_{i,r}!} \left(\frac{V_{i,r}}{\mu_{i,r}}\right)^{n_{i,r}}, \qquad (13.5)
\]
and in the load-dependent case

(13.6)

When node i is of type IS, we have in the load-independent case
\[
p_i(n_i) = \prod_{r=1}^{R} \frac{1}{n_{i,r}!} \left(\frac{V_{i,r}}{\mu_{i,r}}\right)^{n_{i,r}}, \qquad (13.7)
\]
and in the load-dependent case

(13.8)

Notice that the load-dependent cases we have addressed here refer to the case
where the service rate for a customer of a particular class depends on the total
number of customers in the queueing station (n_i). In the BCMP paper, other
forms of load-dependency are also discussed, e.g., the case where the service
rate of a class r customer at queue i depends on the number of customers of
that class at that station (n_{i,r}); we do not address these cases here.

The term A(n) is determined by the arrival processes. If all chains are closed,
A(n) = 1. If the arrivals depend on the total QN population, its value equals
A(n) = ∏_{j=0}^{k−1} λ(j), where k is the actual network population. If the arrivals
are per chain, its value equals A(n) = ∏_{c∈C} ∏_{j=0}^{k_c−1} λ_c(j), where k_c is the actual
population in routing chain c. □
Important to note is that although the service time distribution in PS, IS and LCFSPR
nodes is of Coxian type, only its mean value is of importance in the expressions for the
steady-state distributions. This is also known as the insensitivity property (with respect
to higher moments) of BCMP queueing networks.

Example 13.1. Single-class, load-independent open networks.


A simplification of the BCMP theorem is obtained when only open networks are addressed
in which there is only a single, load-independent Poisson arrival process with rate λ and
where the service rates are fixed as well. In that case, the steady-state distribution
\[
\Pr\{N = n\} = \prod_{i=1}^{M} p_i(n_i), \qquad (13.9)
\]
with
\[
p_i(n_i) = \begin{cases} (1 - \rho_i)\,\rho_i^{n_i}, & \text{FCFS, PS, LCFSPR type,} \\[1ex] e^{-\rho_i}\, \dfrac{\rho_i^{n_i}}{n_i!}, & \text{IS type,} \end{cases} \qquad (13.10)
\]
where ρ_i is defined as
\[
\rho_i = \begin{cases} \lambda \sum_{r \in R_i} \dfrac{V_{i,r}}{\mu_i}, & \text{FCFS type,} \\[1.5ex] \lambda \sum_{r \in R_i} \dfrac{V_{i,r}}{\mu_{i,r}}, & \text{PS, IS, LCFSPR type,} \end{cases} \qquad (13.11)
\]

with R_i the set of classes asking service at station i. Notice that the value A(n) = λ^k is not
explicitly used in the expression for Pr{N = n}; it is hidden in the product of the ρ_i-terms.
Furthermore, realize that in this simplified case, nodes of FCFS, PS and LCFSPR type
operate as if they are M|M|1 queues studied in isolation (see also Chapter 6 where we found
a similar result for the M|G|1 queue with processor sharing scheduling). To conclude, this
special case of the BCMP theorem leads to a slight generalisation of the JQNs we have
studied in Chapter 10. □
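
To illustrate this special case, the following short sketch evaluates (13.10) and (13.11) for a small open network; all numerical values, node types and class names are invented for this illustration and do not come from the book.

```python
# Hypothetical open network: 3 nodes, 2 classes, single Poisson arrival stream with rate lam.
lam = 2.0
V   = {1: {'A': 1.0, 'B': 1.0},             # visit ratios V_{i,r}
       2: {'A': 0.5, 'B': 0.2},
       3: {'A': 2.0, 'B': 1.0}}
mu  = {1: {'A': 10.0, 'B': 10.0},           # service rates mu_{i,r} (equal per class at the FCFS node)
       2: {'A': 4.0,  'B': 5.0},
       3: {'A': 8.0,  'B': 6.0}}
kind = {1: 'FCFS', 2: 'PS', 3: 'IS'}

# rho_i per (13.11): lambda times the summed per-class demands at node i
rho = {i: sum(lam * V[i][r] / mu[i][r] for r in V[i]) for i in V}

for i in sorted(V):
    if kind[i] == 'IS':
        EN = rho[i]                          # mean of the Poisson-shaped marginal in (13.10)
    else:
        EN = rho[i] / (1.0 - rho[i])         # mean of the geometric marginal (1-rho)rho^n
    print(f"node {i} ({kind[i]}): rho = {rho[i]:.3f}, E[N] = {EN:.3f}")
```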

Example 13.2. Closed, multi-class, load-independent BCMP networks.


A class of BCMP networks that is of particular interest encompasses load-independent
servers, multiple customer classes (without class changes) and fixed populations per class.
For such QNs, the steady-state distribution equals

\[
\Pr\{N = n\} = \frac{1}{G} \prod_{i=1}^{M} p_i(n_i), \qquad (13.12)
\]
with
\[
p_i(n_i) = \begin{cases} n_i! \left(\dfrac{1}{\mu_i}\right)^{n_i} \prod_{r=1}^{R} \dfrac{V_{i,r}^{n_{i,r}}}{n_{i,r}!}, & \text{FCFS type,} \\[2ex] n_i! \prod_{r=1}^{R} \dfrac{1}{n_{i,r}!}\left(\dfrac{V_{i,r}}{\mu_{i,r}}\right)^{n_{i,r}}, & \text{PS, LCFSPR type,} \\[2ex] \prod_{r=1}^{R} \dfrac{1}{n_{i,r}!}\left(\dfrac{V_{i,r}}{\mu_{i,r}}\right)^{n_{i,r}}, & \text{IS type.} \end{cases} \qquad (13.13)
\]
Note that n_i = Σ_{r=1}^{R} n_{i,r}. □

13.2 Computational algorithms


Specifying BCMP QNs is one thing, evaluating them is a completely different story. Gen-
eralisations of the convolution and MVA algorithms we have discussed do exist, but they
are notationally not so convenient anymore, and they do have a high time and space
complexity.
We will start with a brief treatment of the extension of the convolution method in
Section 13.2.1. We then present MVA algorithms for a number of special cases of BCMP
QNs in Section 13.2.2.

13.2.1 The convolution algorithm


We address the case of a closed queueing network with R classes of customers, no class
changes and M load-independent nodes. The fixed population is given by the vector
K = (K_1, K_2, . . . , K_R) and K = Σ_{r=1}^{R} K_r. The state space Z(M, K) is now specified as
follows:
\[
Z(M,K) = \Big\{ (n_1, \ldots, n_M) \in \mathbb{N}^{M \cdot R} \,\Big|\, n_i = (n_{i,1}, \ldots, n_{i,R}),\; \sum_{i=1}^{M} n_{i,j} = K_j,\; j = 1, \ldots, R \Big\}. \qquad (13.14)
\]
The normalising constant is now dependent on both the number of nodes M and the
population vector K, and is defined as
\[
G(M,K) = \sum_{n \in Z(M,K)} \prod_{i=1}^{M} p_i(n_i), \qquad (13.15)
\]
where the terms p_i(n_i) are dependent on the node type and defined in (13.3)–(13.8). A
direct computation of the normalising constant would involve
\[
\prod_{j=1}^{R} \binom{K_j + M - 1}{M - 1}
\]

computational steps (one for each state) where each step consists of multiple multipli-
cations. This is clearly not a practical approach to use. Therefore, an extension of the
convolution approach has been proposed. Without derivation, we state that the following
recursive relation holds:

G(M,K) = 2 2 --- F pM(k1,k2;-,kR)G(M- l,K- (kl,k2,..+R)). (13.16)


k1=0 kz=O kR=O
We observe that the recursion is multi-dimensional along all the routing chains. As initial
values, we have G(l,K) =pl(K), and G(IM,K) = 0, as soon as one of the components of
K is negative.
The above expression can be simplified in case we exclusively deal with load-independent
nodes of types FCFS, PS or LCFSPR, in which case we have:
\[
G(M,K) = G(M-1,K) + \sum_{r=1}^{R} \frac{V_{M,r}}{\mu_{M,r}}\, G(M, K - 1_r), \qquad (13.17)
\]
where 1_r is a unit-vector with a 1 on the r-th position. Here, we see that the recurrence is
again over the classes: the normalising constant G(M, K) is the weighted sum of normal-
ising constants with one node less, and with one customer less in each of the classes. Note
that for a node i of FCFS type, the value μ_{i,r} = μ_i, for all classes r.
The throughput of customers of class r at node i can again be expressed as a quotient
of normalising constants:
\[
X_{i,r} = V_{i,r}\, \frac{G(M, K - 1_r)}{G(M, K)}. \qquad (13.18)
\]
Other measures of interest can be computed in a similar way as we have seen before.
The computational time complexity of this convolution algorithm is O(MR ∏_{r=1}^{R}(K_r +
1)) and the space complexity is O(M ∏_{r=1}^{R}(K_r + 1)). Hence, the inclusion of more cus-
tomer classes, even if the number of customers remains the same, increases the number of
operations to be performed significantly.
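
The recursion (13.17) and the throughput result (13.18) translate directly into a short program. The sketch below is illustrative only (it is not the book's implementation); it assumes load-independent PS/LCFSPR-type nodes so that only the demands D_{i,r} = V_{i,r}/μ_{i,r} matter, and it uses G(0, 0) = 1 as base case, which is equivalent to the initialisation G(1, K) = p_1(K) stated above.

```python
from functools import lru_cache

# Illustrative two-class, three-node closed network (numbers invented):
# D[i][r] = V_{i,r} / mu_{i,r}, the service demand of class r at node i (PS/LCFSPR nodes).
D = [[0.2, 0.1],
     [0.4, 0.3],
     [0.1, 0.5]]
M, R = len(D), len(D[0])
K = (3, 2)                                   # population per class

@lru_cache(maxsize=None)
def G(m, k):
    """Normalising constant for the first m nodes with population vector k, cf. (13.17)."""
    if any(kr < 0 for kr in k):
        return 0.0
    if m == 0:                               # empty network: only the empty population remains
        return 1.0 if all(kr == 0 for kr in k) else 0.0
    total = G(m - 1, k)
    for r in range(R):
        k_minus = tuple(kr - (1 if s == r else 0) for s, kr in enumerate(k))
        total += D[m - 1][r] * G(m, k_minus)
    return total

def chain_throughput(r):
    """G(M, K - 1_r) / G(M, K): the class-r throughput at the reference node, cf. (13.18);
    multiply by V_{i,r} to obtain the throughput at node i."""
    k_minus = tuple(kr - (1 if s == r else 0) for s, kr in enumerate(K))
    return G(M, k_minus) / G(M, K)

for r in range(R):
    print(f"class {r}: X_r = {chain_throughput(r):.4f}")
```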

13.2.2 Mean-value analysis


The MVA algorithm can be extended to the multi-class case in a straightforward way. We
adhere to the same class of QNs as in the previous section. If we define the average service
demand for class r at node i as D_{i,r} = V_{i,r}/μ_{i,r}, the arrival theorem states that the response
time (per passage) for a class r customer at node i is given as follows
\[
E[R_{i,r}(K)] = \begin{cases} D_{i,r}\left(\sum_{j=1}^{R} E[N_{i,j}(K - 1_r)] + 1\right), & \text{FCFS, PS, LCFSPR nodes,} \\[1.5ex] D_{i,r}, & \text{IS nodes.} \end{cases} \qquad (13.19)
\]
The throughput for class r customers (through the node with V_{i,r} = 1) is given as:
\[
X_r(K) = \frac{K_r}{\sum_{i=1}^{M} E[R_{i,r}(K)]}, \qquad (13.20)
\]
and the expected number of class r customers in node i is, according to Little's law, given
as
\[
E[N_{i,r}(K)] = X_r(K)\, E[R_{i,r}(K)]. \qquad (13.21)
\]
Of course, E[N_{i,r}(K)] = 0 if K_r = 0.
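
A direct implementation of the exact recursion (13.19)–(13.21) looks as follows (an illustrative sketch, not the book's code; the network data are the same invented numbers used above). Note how it must visit every population vector below K, which is where the exponential growth in the number of classes comes from.

```python
from itertools import product

D = [[0.2, 0.1], [0.4, 0.3], [0.1, 0.5]]     # demands D_{i,r}; numbers are illustrative
is_node = [False, False, True]               # node 3 is an IS (delay) node
M, R = len(D), len(D[0])
K = (3, 2)

N = {tuple([0] * R): [[0.0] * R for _ in range(M)]}   # E[N_{i,r}(k)] per population vector k
X = [0.0] * R

for k in product(*[range(Kr + 1) for Kr in K]):       # lexicographic order: predecessors first
    if all(kr == 0 for kr in k):
        continue
    Rt = [[0.0] * R for _ in range(M)]                # E[R_{i,r}(k)]
    X = [0.0] * R
    for r in range(R):
        if k[r] == 0:
            continue
        k_minus = tuple(kr - (1 if s == r else 0) for s, kr in enumerate(k))
        for i in range(M):
            if is_node[i]:
                Rt[i][r] = D[i][r]                                   # (13.19), IS node
            else:
                Rt[i][r] = D[i][r] * (1.0 + sum(N[k_minus][i]))      # (13.19)
        X[r] = k[r] / sum(Rt[i][r] for i in range(M))                # (13.20)
    N[k] = [[X[r] * Rt[i][r] for r in range(R)] for i in range(M)]   # (13.21)

print("X_r(K) =", [round(x, 4) for x in X])
print("E[N_{i,r}(K)] =", [[round(n, 3) for n in row] for row in N[K]])
```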
The time and space complexity of this MVA algorithm are similar to those of the multi-
class convolution algorithm. Especially when dealing with many classes and with many
customers, a direct computation along this scheme therefore becomes prohibitive. To avoid
such direct computations, approximation schemes have been devised. For the multi-class
MVA algorithm, one can use the following variant of the Bard-Schweitzer approximation
to break the recursion.
Instead of using E[N_{i,j}(K − 1_j)] for the number of class j customers present upon arrival
at node i, one can estimate this value as
\[
E[N_{i,j}(K - 1_j)] \approx \frac{K_j - 1}{K_j}\, E[N_{i,j}(K)],
\]
while the queue lengths of the other classes are left unchanged. In this approximation, it is
assumed that removing a class j customer does not affect the performance of other classes
(which is not true!) and affects all class j queue lengths in the same way. An advantage of
this approach is that the system of recursive MVA equations is transformed into a system
of non-linear equations independent of the value of the population vector K that can be
solved using a fixed-point iteration technique. Especially for larger populations this is an
advantage. As initial estimates one might take E[N_{i,r}(K)] = K_r/M (per class, the customers
are uniformly spread over all queues). In practice, good results have been obtained with
this approximation scheme; errors typically stay within 10% and a few dozen iterations
are needed at most.
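
The resulting fixed-point iteration can be sketched as follows; the initial estimate E[N_{i,r}(K)] = K_r/M follows the text, whereas the convergence threshold and the iteration limit are arbitrary choices of this illustration (which again uses the invented network data from above).

```python
D = [[0.2, 0.1], [0.4, 0.3], [0.1, 0.5]]     # demands D_{i,r}; illustrative values
is_node = [False, False, True]               # node 3 is an IS (delay) node
M, R = len(D), len(D[0])
K = (3, 2)

N = [[K[r] / M for r in range(R)] for i in range(M)]   # initial guess: uniform spread
X = [0.0] * R

for _ in range(200):                                    # fixed-point iteration
    N_new = [[0.0] * R for _ in range(M)]
    for r in range(R):
        if K[r] == 0:
            continue
        Rt = []
        for i in range(M):
            if is_node[i]:
                Rt.append(D[i][r])
            else:
                # estimated queue length seen on arrival: own class scaled by (K_r-1)/K_r,
                # other classes taken unchanged
                seen = sum(N[i][j] if j != r else N[i][j] * (K[r] - 1) / K[r]
                           for j in range(R))
                Rt.append(D[i][r] * (1.0 + seen))
        X[r] = K[r] / sum(Rt)
        for i in range(M):
            N_new[i][r] = X[r] * Rt[i]
    diff = max(abs(N_new[i][r] - N[i][r]) for i in range(M) for r in range(R))
    N = N_new
    if diff < 1e-8:
        break

print("approximate X_r(K) =", [round(x, 4) for x in X])
```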
The MVA presented here can also be extended to QNs with load-dependent service
rates. We do not discuss these here but refer to the appropriate literature.

13.3 Further reading


BCMP queueing networks have been introduced in [14] as extension of the work on Jack-
son networks by Buzen [37]. Around the same time, Denning and Buzen published their
so-called operational analysis method which also addresses QNs with a variety of character-
istics [72]. Their paper also includes some computational algorithms. The MVA schemes
have been developed by Reiser and Lavenberg [245, 243]. Reiser also presented approxi-
mation schemes to break the recursion [242]. The MVA algorithms, as well as bounds and
a few approximations, are also discussed in [177]. In the books by King [156], Harrison
and Patel [117] and Kant [152] some more computational techniques for BCMP queueing
networks are discussed (e.g., LBANC [45] and RECAL [64]). Lam discusses the relation
between various algorithms [170] as well as ways to deal with large normalising constants
[169]. The books by Bruell and Balbo [30] and by Conway and Georganas [65] deal ex-
clusively with computational algorithms for BCMP networks. Conway also discusses the
applicability of queueing network models for the analysis of layered communication sys-
tems [63]. Onvural discusses closed-form solutions and algorithms for queueing networks
with blocking [224].

13.4 Exercises
13.1. The GNQN-special case of BCMP queueing networks.
Show that the BCMP theorem reduces to the product-form result for GNQN when we deal
with single-class, load-independent servers of FCFS type in a closed network.

13.2. Use of BCMP queueing networks.


List examples of computer or communication system components that can best be modelled
by BCMP nodes of types 1 through 4 (FCFS, PS, IS or LCFSPR).

Part IV

Stochastic Petri net models



Chapter 14

Stochastic Petri nets

IN this chapter we present the basic theory concerning stochastic Petri nets (SPNs).
SPNs grew out of the general theory of (non-timed) Petri nets developed by Petri in
the early 1960s through the introduction of stochastic timing in these models. The first
papers in the field of stochastic Petri nets appeared in the late 1970s and early 1980s; refer
to the end of the chapter for an overview.
SPNs are, like queueing networks, a graph-based modelling formalism allowing for the
easy specification of finite-state machine models. Under suitable assumptions regarding
the involved stochastic timing, a large class of SPN models can be mapped (automatically)
to an underlying CTMC with finite state space. Such a CTMC can then be analysed
numerically, thus yielding information about the SPN.
In this chapter we present a fairly general class of SPNs in which the involved distri-
butions are all of negative exponential type so that an underlying CTMC can be readily
computed. Notation and terminology is introduced in Section 14.1. Structural properties
of these Petri net models are discussed in Section 14.2. We present important SPN-related
performance measures in Section 14.3. The actual mapping of SPNs to the underlying
CTMCs is then discussed in Section 14.4.
Note that the discussion of the numerical solution of the CTMCs generated from the
SPNs is postponed until Chapter 15. Chapter 16 is devoted to various applications of SPN
modelling.

14.1 Definition
In this section we introduce a class of SPN models. In Section 14.1.1 we present the static
structure of SPNs, after which we discuss the dynamic properties of SPNs in Section 14.1.2.
We finally comment on a number of wide-spread model extensions in Section 14.1.3.

14.1.1 Static SPN properties


Stochastic Petri net models are graph models. More precisely, SPNs are directed bipartite
graphs where the set of vertices is formed by the union of a set of places P and a set
of transitions T. The edges are a subset of the union of a set of input arcs (pointing
from places to transitions), a set of output arcs (pointing from transitions to places) and
a set of inhibitor arcs (pointing from places to transitions; actually, inhibitor arcs were
first introduced in generalised SPNs, but we will consider them here as well). Places are
depicted as circles, transitions as bars. Input and output arcs are depicted as arrows and
inhibitor arcs as arrows with a small circle as arrow head (see also Figure 14.1).
Transitions can be one of two types: immediate or timed. Immediate transitions do not
take any time at all to complete, whereas timed transitions require a negative exponential
time to fire, after they have become enabled. Immediate transitions are normally drawn
as thin bars and used to model system aspects that do not have physical time associated
with them, or system aspects that take only very little time when compared to the time
consuming activities in a system. Timed transitions are depicted as thick bars.
Finally, places may contain one or more tokens, depicted as small black circles. A
distribution of tokens over places is called a marking.
Before we elaborate more on the operation of SPNs, let us first define them formally.
An SPN is given as:
\[
SPN = (P, T, Pr, I, O, H, W, m_0), \qquad (14.1)
\]
where P = {P_1, P_2, . . . , P_{n_p}} is a set of places (|P| = n_p), T = {t_1, t_2, . . . , t_{n_t}} is a set
of transitions (|T| = n_t), Pr : T → ℕ is a function which associates priorities with
transitions, where timed transitions have lowest priority (coded as 0 here) and immediate
transitions have higher priorities (1, 2, . . .). I : P × T → ℕ is a function which associates
a multiplicity with each input arc. O : T × P → ℕ is a function which associates a
multiplicity with each output arc, and H : P × T → ℕ is a function which associates a
multiplicity with each inhibitor arc. W : T → ℝ⁺ is a function which associates with every
transition t, either (i) a firing rate W(t) whenever t is a timed transition (Pr(t) = 0), or
(ii) a weight W(t) whenever t is an immediate transition (Pr(t) > 0). Finally, m_0 ∈ ℕ^{n_p} is
the initial marking, that is, the initial distribution of tokens over the n_p places; we assume
that m_0 is finite. Based on this definition, we can define the following sets:

• the input places of transition t: I_p(t) = {p ∈ P | I(p, t) > 0};

• the output places of transition t: O_p(t) = {p ∈ P | O(t, p) > 0};

• the inhibiting places of transition t: H_p(t) = {p ∈ P | H(p, t) > 0}.

We will use these sets in the description of the dynamic properties of SPNs below.

14.1.2 Dynamic SPN properties

The dynamic properties of SPNs describe the possible movements of tokens from place to
place. The following so-called firing rules describe the dynamic properties:

• A transition t is said to be enabled in a marking m when all its input places I_p(t)
contain at least I(p, t) tokens and when no inhibiting place in H_p(t) contains H(p, t)
or more tokens.

• When there are immediate transitions enabled in marking m, this marking is said to
be vanishing. Then, the following rules apply:

- The set E_i(m) ⊆ T of enabled transitions of highest priority is determined. Let
W be the sum of the weights of the transitions in E_i(m), i.e., W = Σ_{t ∈ E_i(m)} W(t).
- Transition t is selected to fire with probability W(t)/W.
- Upon firing, transition t removes I(p, t) tokens from input place p ∈ I_p(t) (for
all p ∈ I_p(t)) and adds O(t, p) tokens to output place p ∈ O_p(t). Thus, the
firing of transition t in marking m yields a new marking m′. Firing is an atomic
action, it either takes place completely, or not at all.

• When there are no immediate transitions enabled, the marking m is said to be tan-
gible. Then, the following rules apply:

- An enabled transition t fires after an exponentially distributed time with rate
W(t).
- Upon firing, a transition t removes I(p, t) tokens from input place p ∈ I_p(t) and
adds O(t, p) tokens to output place p ∈ O_p(t). Thus, the firing of transition t in
marking m yields a new marking m′.
- When in a particular marking m more than one timed transition is enabled, one
of them is selected probabilistically as follows:
* Let E_t(m) ⊆ T denote the set of transitions enabled in marking m.
Figure 14.1: SPN model of simultaneous resource possessing (legend: places, timed and
immediate transitions, tokens, input and output arcs)

* All the enabled transitions t ∈ E_t(m) take a sample from their correspond-
ing exponential distribution with rate W(t). The transition with the mini-
mum sample time fires first, and a new marking arises according to the token
flow along the input and output arcs of that transition. Alternatively, one
can define W = Σ_{t ∈ E_t(m)} W(t) as the total outgoing rate of marking m.
The delay in marking m is then exponentially distributed with rate W;
after that delay, transition t will fire (without further timing delay) with
probability W(t)/W. This interpretation is equivalent to the former one by
the memoryless property of the negative exponential distribution.

Example 14.1. Simultaneous resource possessing (I).


Consider two computer system users A and B that want to make use of a single, scarce
resource such as a printer, every once in a while. The behaviour of both users can be
described by an infinite repetition of the following activities: do normal work-acquire
printer-use printer-release printer-. . . . We can model this by a SPN as depicted in
Figure 14.1. Formally, we have:
P = {Pa, Pb, WoPa, WoPb, PrA, PrB, Printer};
T = {Wa, Wb, AcqA, AcqB, RelA, RelB};
Pr = {(Wa, 0), (Wb, 0), (AcqA, 1), (AcqB, 1), (RelA, 0), (RelB, 0)};
I = {(Pa, Wa, 1), (Pb, Wb, 1), (WoPa, AcqA, 1), (WoPb, AcqB, 1), (Printer, AcqA, 1),
(Printer, AcqB, 1), (PrA, RelA, 1), (PrB, RelB, 1)};
O = {(Wa, WoPa, 1), (Wb, WoPb, 1), (AcqA, PrA, 1), (AcqB, PrB, 1), (RelA, Pa, 1),
(RelB, Pb, 1), (RelA, Printer, 1), (RelB, Printer, 1)};
m_0 = (1, 1, 0, 0, 0, 0, 1);
H = ∅;
W = {(Wa, λ_a), (Wb, λ_b), (AcqA, 1), (AcqB, 1), (RelA, μ_a), (RelB, μ_b)}.

An informal description of the above SPN is as follows. There are two users, represented
by the tokens in places Pa and Pb. They do an amount of work, represented by the timed
transitions Wa and Wb respectively. After that, they try to acquire access to the single
printer (represented by the single token in place Printer) via the immediate transitions
AcqA and AcqB respectively, while remaining in the places (“in state”) WoPa and WoPb (Wait
on Printer). Note that when one of the users already has access to the printer, the other
user is “blocked” since its corresponding acquire-transition cannot fire due to the fact that
the place Printer is already empty. After acquiring access to the printer, the printer is
used for some time and released afterwards via the transitions RelA and RelB. By the firing
of these transitions, the printer becomes available again for other users, and the user who
just used the printer resumes normal work. 0

Example 14.2. Priority systems (I).


Reconsider the SPN of Figure 14.1 and assume that the initial number of tokens in places
Pa and Pb is much larger. We can then interpret this model as a two-class model, where
customers of class A (and B) arrive at the system in their own buffers, and which require
exclusive service from the system (the printer) with durations specified by the transitions
RelA and RelB. If both WoPaand WoPbcontain tokens, the next customer taken into service
is chosen probabilistically, using the weights of the transitions AcqA and AcqB.
We can adapt this model to address the case where class A customers have non-
preemptive priority over class B customers, by simply adding an inhibitor arc from WoPa
to AcqB; as long as there are class A customers waiting, no work on a class B customer
may be started. Alternatively, we can set the priority of transition AcqA higher than that
of AcqB. □
Figure 14.2: SPN representation of the M|M|2|5|10 queue

Example 14.3. The M|M|2|5|10 queue (I).

Consider an SPN representation of the M|M|2|5|10 queue as depicted in Figure 14.2. The
waiting room is represented by place P1. Idle servers are represented by tokens in place
P3 and busy servers are represented by tokens in place P2. Place P4 models the outside
world; it is the finite source and sink of tokens. Notice that a token in place P2 signifies
a server-customer pair. The tokens in place P5 represent free buffer places (free places in
the waiting room P1). Once the waiting room is fully occupied, place P5 is empty, so that
no more arrivals are allowed since T1 is disabled. In order to exactly describe the multi-
server behaviour, the marking of place P2 should determine the rate at which customers
are served by the timed transition T3: its rate should be proportional to the number of
customers in service, i.e., the number of tokens in place P2; we come back to these so-called
marking-dependent firing rates below. Notice that the movement of customers waiting for
service in place P1 to the service area (place P2), via the immediate transition T2, does not
take any time at all. □

It is important to note that the employed terminology of SPNs is very general. It is left open
to the user what places, transitions, etc., represent in reality. In most of the SPN models
we interpret tokens as customers or packets and places as buffers or queues; however, they
are actually more than that. Their interpretation depends on the place they are in. As an
example of this, in Figure 14.1, a token in place Pa or WoPa indeed represents a customer.
On the other hand, the token in place Printer represents a resource. Moreover, a token
in place PrA represents a customer using a particular resource. Similarly, SPN transitions
model system activities such as packet transmissions or the processing of jobs. However, a
single transition might consume two (or more) tokens at the same time from different places
and “transform” them into a single token for yet another place (as AcqA does). On the
other hand, a single token taken from a single place by a transition may be transformed to
multiple tokens put in multiple places (as RelA does). The complex of enabling possibilities
by the input and output arcs then represents the rules for using resources, the involved
protocols, etc. In the literature, SPNs have also been used to model completely different
dynamic phenomena, such as the flow of traffic through cities and paperwork through
offices, but also chemical systems where the transitions model partial chemical reactions
and the tokens molecules.

14.1.3 SPN extensions


Over the years many enhancements and extensions to the definition of SPNs have been
published. Below, we discuss a number of widely-used extensions.

• We have assumed so far that the weights W(t) are constants. This need not be
the case. The weights, represented by the function W : T → ℝ⁺, can be made
marking dependent. Denoting with M the set of possible markings, i.e., M ⊆ ℕ^{n_p},
the function W becomes: W : T × M → ℝ⁺. Using marking dependent weights often
results in models that are easier to understand and more compact. Graphically, the
symbol “#” is placed near a transition with a marking dependent rate; a separate
specification of this dependence then needs to be given. As an example using this
notation, in Figure 14.2, the rate of T3 should equal #P2 · μ.

• On top of the normal firing rules, the use of enabling functions (or guards) has
become more widespread. An enabling function E : T × M → {0, 1} specifies for
every transition t ∈ T in marking m ∈ M whether it is enabled (1) or not (0). Given
a particular marking, the choice of which transition will fire first is based on the
normal firing rules; however, for the transitions having passed this first selection, the
enabling functions are evaluated. Only if these also evaluate to 1 are the transitions
said to be enabled. The use of enabling functions eases the specification of SPNs,
especially when complex enabling conditions have to be expressed. A disadvantage of
the use of enabling functions is that part of the net semantics is no longer graphically
visible.

• Modelling features that are often very convenient are marking dependent arc mul-
tiplicities. We then have to specify functions I : P × T × M → ℕ (and similar
extensions for output and inhibitor arcs) that specify the multiplicity of an arc, de-
pendent on the current marking. As an example of this, consider the case where
an input arc I(p, t, m) equals the actual number of tokens in p. Firing of t then
empties place p (given O(t, p, m) = 0). Marking dependent arc multiplicities should
be used with great care; although they often help in keeping the SPN models simple,
they also might obscure the semantics of the model. Arcs with marking dependent
multiplicity are often marked with a “zig-zag”.

Example 14.4. Priority systems (II).


Reconsider the SPN of Figure 14.1 and again assume that the initial number of tokens in
places Pa and Pb is larger than 1. Instead of giving absolute priority to class A (over B)
we can also increase the probability of a class A (or B) customer to be taken into service if
there are more of them waiting (in comparison to those waiting of the other class). We can
do so by making the weights of AcqA and AcqB proportional to the number of customers of
that class waiting.
If the printer server is extended to include a second printer, we can simply add an
extra (initial) token to place Printer. However, to correctly model the cases where the
two printers are handling jobs of either class A or B, we should double the rate of the
transitions RelA and RelB in those cases. This can be accomplished by making the rate
of these transitions proportional to the number of tokens in their input places; if there is
only one token there, the normal print rate is employed, but if there are two tokens in that
place, both printers are active for that class, hence, the effective rate is twice the original
one. □

14.2 Structural properties


Given an SPN as defined in Section 14.1, we can check a number of properties directly
from its structure, i.e., its places, transitions, and arcs, which can be used to verify whether
the model actually models what it should (the system) as well as to find bounds on the
size of the underlying CTMC. For the time being, assume that we do not need to make
a distinction between timed and immediate transitions to determine the properties listed
below.
Given a marking m, we say that marking m′ is immediately reachable from m, if m′ can
be obtained by the firing of an enabled transition t in marking m; we write m →t m′.
A marking m′ is reachable from marking m, if m′ is immediately reachable from m, or
if m″ is immediately reachable from m, and m′ is reachable from m″; we write m ⟶ m′.
Given the initial marking m_0, the set of all possible markings that are reachable from
m_0 is called the reachability set R(m_0); notice that R(m_0) ⊆ ℕ^{n_p}. The graph (R(m_0), E)
in which all possible markings are the vertices (nodes) and in which an edge labelled t
exists between two vertices m and m′ whenever m →t m′, is called the reachability graph.
A marking m from which no other marking m’ is reachable is called a dead marking.
In terms of the SPN, this means that after a marking m is reached, no transition can fire
any more. Such markings therefore generally point to deadlock or absorption situations
(either in the system being modelled or in the model).
An SPN is said to be k-bounded when in all m ∈ R(m_0) the number of tokens per place
is at most k. In the special case of 1-boundedness, we speak of a safe SPN. A k-bounded
Petri net, possibly with inhibitor and multiple arcs, is always finite and can therefore be
represented as a finite state machine with at most (k + 1)^{n_p} states.
An SPN is said to be strictly conservative if the sum of the number of tokens in all
places is the same for all markings. This implies that for each transition the number of
input and output arcs, measured by their multiplicity, must be equal. A Petri net is said
to be conservative if the number of tokens in all places, weighted by a per-place weighting
factor is the same in all markings. A place invariant then is a vector of weighting factors
such that for that vector, the Petri net is conservative; note that some of the weighting
factors may be zero!
When using SPNs to represent real systems (in which the tokens represent physical
resources) the notion of conservation is important. Since physical resources are normally
constant in number, the corresponding SPN will have to exhibit conservation properties.
We can compute place invariants in the following way. We define c_{p,t} = O(t, p) − I(p, t) as
the effect that the firing of transition t has on the marking of place p (it takes I(p, t) tokens
from p, but adds another O(t, p) to p). The matrix C = (c_{p,t}) specifies in each column how
the firing of a certain transition affects all the places. Define the column vector f ∈ ℕ^{n_t}.
It can then be proven that for all m ∈ R(m_0), we can write:
\[
m = m_0 + C f, \qquad (14.2)
\]
where the vector f denotes how often every transition in T has been fired, starting from
m_0 to reach m. Multiplying this equation with a row vector v gives us v · m = v · m_0, for
all m ∈ R(m_0), provided vC = 0. Since we know m_0, the product v · m_0 can readily be
computed. Thus, once we have computed vectors v ≠ 0 such that vC = 0, we know the
value of the product v · m for any reachable marking m. This gives us a means to decide
whether a certain marking can be reached or not. We call a vector v ≠ 0 such that vC = 0
a place invariant. Sometimes place invariants are called S-invariants, where the “S” stands
for “Stellen”, the German word for place.

Example 14.5. The M|M|2|5|10 queue (II).


We readdress the queueing model we have seen before. The matrix C (rows P1 through P5,
columns T1 through T3) can easily be obtained from Figure 14.2 as:
\[
C = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}.
\]
Computing vC = 0 leads to the following equations:
\[
v_1 - v_4 - v_5 = 0, \qquad -v_1 + v_2 - v_3 + v_5 = 0, \qquad -v_2 + v_3 + v_4 = 0,
\]
from which we can conclude the following place invariants: (v_4 + v_5, v_3 + v_4, v_3, v_4, v_5).
Notice that these place invariants are not determined uniquely! Assuming values for v_3,
v_4 and v_5, we come to the following place invariants: i_1 = (0, 1, 1, 0, 0), i_2 = (1, 1, 0, 1, 0)
and i_3 = (1, 0, 0, 0, 1). Multiplying these invariants with the initial marking m_0, we find
that:
\[
P2 + P3 = 2, \qquad P1 + P2 + P4 = 10, \qquad \text{and} \qquad P1 + P5 = 5,
\]

where we have (informally) used the place identifiers to indicate the number of tokens in
the places. The first place invariant states that the number of servers active and passive
always sums up to 2, as it should. The second place invariant states that the number of
customers buffered, served, or outside of the system, equals 10. The last place invariant
states that the number of used and unused buffer places must equal 5. All reachable
markings m should obey these invariants; hence, a marking like (1, 1, 1, 0, 4) cannot be
reached. □

A place P_i is said to be covered by a place invariant v if the entry corresponding to
place P_i is non-zero, i.e., if v_i > 0. If a place is covered by a place invariant, then this place
is bounded. If all the places of the SPN are covered by place invariants, then the SPN is
bounded.
A transition t is said to be live if for all possible markings m ∈ R(m_0) there is a
marking m′, reachable from m, that enables t. If this property holds for all transitions,
the Petri net is said to be live. Liveness is a property which is closely related to liveness
in computer or communication systems. When a system model is live, the corresponding
system is deadlock free. A live Petri net cannot contain dead markings.
A transition invariant is a series of transitions that, when starting from marking m,
after the successive firing of these transitions, will yield marking m again. Using f as
defined before, f is a transition invariant if
\[
m = m + C f.
\]
Hence, to find transition invariants, we have to find those f ≠ 0 such that Cf = 0. If
an SPN is bounded and live, all of its transitions should be part of at least one transition
invariant (all transitions should be covered by transition invariants).

Example 14.6. Simultaneous resource possessing (II).


The transition invariants for the simultaneous resource possessing example are easy to
compute from Cf = 0 with

\[
C = \begin{pmatrix}
-1 & 0 & 0 & 0 & 1 & 0 \\
0 & -1 & 0 & 0 & 0 & 1 \\
1 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & -1 & -1 & 1 & 1
\end{pmatrix}.
\]
The first group of three equations from the system Cf = 0 yields f_1 = f_3 = f_5, and the sec-
ond group of three equations gives f_2 = f_4 = f_6. The last equation is dependent on the
former six. Thus, we have as transition invariants both (1, 0, 1, 0, 1, 0) and (0, 1, 0, 1, 0, 1).
These can also be easily interpreted; either user A or user B will use the printer. After
usage, they both leave the system in the same state as that they entered it. Parallel usage
is not possible. □

Example 14.7. The M|M|2|5|10 queue (III).

From Cf = 0 we immediately find that f = (1, 1, 1). This means that after having fired
all transitions once, given starting marking m_0, we are back in m_0 again. □
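
Both kinds of invariants can be computed mechanically: place invariants span the left null space of C (vC = 0), transition invariants the right null space (Cf = 0). The following sketch (illustrative, using sympy) does this for the matrix C of Example 14.5; the basis vectors returned may differ from i_1, i_2, i_3 above, but they span the same space.

```python
from sympy import Matrix

# Incidence matrix C of the M|M|2|5|10 SPN (rows P1..P5, columns T1..T3),
# with c_{p,t} = O(t,p) - I(p,t), as in Example 14.5.
C = Matrix([[ 1, -1,  0],
            [ 0,  1, -1],
            [ 0, -1,  1],
            [-1,  0,  1],
            [-1,  1,  0]])

# Place invariants: v C = 0, i.e. the null space of C transposed.
place_invariants = C.T.nullspace()
# Transition invariants: C f = 0, i.e. the null space of C itself.
transition_invariants = C.nullspace()

print("place invariant basis:", [list(v) for v in place_invariants])
print("transition invariant basis:", [list(f) for f in transition_invariants])
```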

We finally comment on the applicability of the structural properties for timed Petri
nets. Originally, these properties have been defined for non-timed Petri nets. However,

Figure 14.3: Structural properties in SPNs

as long as the timed transitions have delay distributions with support on [0, ∞), and
transition priorities and enabling functions are not used, nor are marking dependent arc
multiplicities, the structural properties of the non-timed Petri nets are valid for their timed
counterparts as well. For further details on invariants for SPNs with marking-dependent
arc multiplicities, refer to [51].

Example 14.8. Structural properties in SPNs.


Consider the SPN as given in Figure 14.3. If all the transitions are non-timed, there will
be two place invariants: Pl + P2 = 1 and Pl + P3 = 1. Furthermore, there will be two
transition invariants: (1, 0, 1, 0) and (0, 1, 0, 1). When all the transitions have a negative
exponential firing delay, these invariants still hold. However, if Tl has a deterministic delay
of length 1 and T3 of length 2, then the second place and transition invariant will both
not be valid any more. On the other hand, if Tl and T2 still exhibit negative exponential
delays, but T3 and T4 have deterministic delays, then all the above invariants still hold.
Thus, it depends on the involved timing delays and their “position” in the SPN whether
or not the invariants for the non-timed Petri net continue to hold. □

14.3 Measures to obtain from SPNs


Assuming that a given SPN is finite, that is |R(m_0)| < ∞, we can derive many interesting
measures from the SPN. For the time being we focus on steady-state measures.
Let R(m_0) denote the set of all possible markings, m ∈ R(m_0), and let #P(m) denote
the number of tokens in place P given marking m (one also often sees the notation #P,
leaving the “in marking m” implicit; when there is no confusion possible, we will do so as
well). The following measures can easily be obtained:
• The probability p_m of a particular marking m. This probability can be obtained from
the CTMC underlying the SPN (see Section 14.4).

• The probability Pr{#P = k}, that is, the probability of having exactly k tokens in
place P. It is derived as follows:
\[
\Pr\{\#P = k\} = \sum_{m \in R(m_0),\, \#P(m) = k} p_m. \qquad (14.3)
\]

• More generally, the probability Pr{A} of an event A, where A ⊆ R(m_0) expresses
some condition on the markings of interest, can be derived as
\[
\Pr\{A\} = \sum_{m \in A} p_m. \qquad (14.4)
\]
Note that the condition A can refer to multiple places so that complex conditions
are allowed.

• The average number of tokens in a place P:
\[
E[\#P] = \sum_{k=0}^{\infty} k \Pr\{\#P = k\}. \qquad (14.5)
\]
Note that the infinite summation does not occur in practice as the number of different
markings is finite.

• The throughput X_t of a timed transition t can be computed as the weighted sum of
the firing rate of t over all markings in which t is enabled:
\[
X_t = \sum_{m \in R(m_0),\, t \in E_t(m)} W(t, m)\, p_m, \qquad (14.6)
\]
where W(t, m) is the rate at which transition t fires, given marking m.

• The average delay E[R] a token perceives when traversing a subnet of the SPN. It
must be computed with Little's formula: E[R] = E[N]/X where E[N] is the average
number of tokens in the subnet and X is the throughput through the SPN subnet.
The measures expressed here all refer to steady-state. Of course, we can also study SPNs
as a function of time. We then have to compute transient measures. Finally, we can also
express interest in cumulative measures over time, e.g., to express the number of tokens
having passed a certain place in a finite time interval. We will come back to such measures
when we discuss the numerical analysis of the CTMC underlying the SPNs in Chapter 15.
We will now first derive these CTMCs.
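
Once the steady-state probabilities p_m of the underlying CTMC are available (Chapter 15), the measures above are plain summations over markings. A minimal illustrative sketch follows, with markings encoded as tuples of token counts; all numbers and the example transition are invented.

```python
# Illustrative only: p[m] is the steady-state probability of marking m,
# here for a toy two-place net with markings given as tuples (#P1, #P2).
p = {(1, 0): 0.6, (0, 1): 0.4}

def pr_tokens(place_index, k):
    """Pr{#P = k}, cf. (14.3)."""
    return sum(prob for m, prob in p.items() if m[place_index] == k)

def mean_tokens(place_index):
    """E[#P], cf. (14.5)."""
    return sum(m[place_index] * prob for m, prob in p.items())

def throughput(rate, enabled):
    """X_t: sum of W(t,m) p_m over markings enabling t, cf. (14.6)."""
    return sum(rate(m) * prob for m, prob in p.items() if enabled(m))

print(pr_tokens(0, 1), mean_tokens(1))
# e.g. a timed transition with constant rate 2.0 that is enabled whenever P1 is non-empty:
print(throughput(rate=lambda m: 2.0, enabled=lambda m: m[0] > 0))
```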
1. input SPN = (P, T, Pr, I, O, H, W, m_0)
2. NM := {m_0}; RS := {m_0}
3. while NM ≠ ∅
4. do
5.     let m ∈ NM
6.     NM := NM − {m}
7.     for all t ∈ E_t(m)
8.     do
9.         let m →t m′
10.        store_Q(m, m′, W(t, m))
11.        if m′ ∉ RS
12.        then NM := NM ∪ {m′}
13.             RS := RS ∪ {m′}
14.    od
15. od
16. p(0) := (1, 0, . . . , 0)
Figure 14.4: Deriving a CTMC from a SPN

14.4 Mapping SPNs to CTMCs

As we have seen, k-bounded SPNs can be mapped on a finite state machine. Given that
the state residence times are exponentially distributed, the underlying state machine can
be interpreted as a CTMC. It is often referred to as the underlying CTMC of the SPN or
the embedded CTMC. As a consequence of this, we can “solve” SPN models by successively
constructing and solving the underlying CTMC. In this section we focus on the construction
of the underlying CTMC from the SPN. We first address the case where the SPN does not
contain immediate transitions. We then discuss the slightly more complicated case with
immediate transitions.
Let us first recall how a CTMC is described (see Chapter 3): its generator matrix Q
specifies the possible transitions and their rates; its initial probability vector p(0) specifies
the starting state. Since we are only interested in the non-null entries of Q, we only need
to know all triples (i, j, q_{i,j}) (where i and j are state identifiers) for which q_{i,j} ≠ 0, and
where i ≠ j, since the diagonal entries of Q can be derived from the non-diagonal entries.
Let us now discuss the algorithm that derives a CTMC on state space R(m_0), with
generator matrix Q and initial probability vector p(0) from an SPN. For the time being
we assume that there are only (exponentially) timed transitions.
The algorithm is given in Figure 14.4. First, the sets NM and RS (for new markings and
reachability set, respectively) are initialised with the initial marking m_0. As long as NM
is not empty, an element m is taken from NM. Given this marking m, all enabled (timed)
transitions t ∈ E_t(m) are generated. Then, one by one, the result of the firing of these
transitions is determined. The firing of transition t in marking m yields the marking m′.
The firing rate is W(t, m). Therefore, the triple (m, m′, W(t, m)) is stored, representing an
entry of the matrix Q. When the newly derived marking m′ has not been examined before,
i.e., when it is not yet part of the reachability set, it is put in the reachability set as well
as in the set of new markings for further examination. When all the markings have been
examined, the set RS contains all possible markings (all states of the underlying CTMC)
and all non-zero non-diagonal entries of the matrix Q have been stored. Assuming that
the initial marking is the first one, the vector p(0) is assigned its value.
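
A runnable rendering of the algorithm of Figure 14.4, restricted (as in the figure) to timed transitions, is sketched below; the dictionary-based SPN encoding and the small example net are choices of this sketch, not notation from the book.

```python
def enabled(spn, m):
    """Timed transitions enabled in marking m (input and inhibitor conditions)."""
    ok = []
    for t in spn['T']:
        if all(m[p] >= n for p, n in spn['I'].get(t, {}).items()) and \
           all(m[p] <  n for p, n in spn['H'].get(t, {}).items()):
            ok.append(t)
    return ok

def fire(spn, m, t):
    """Firing rule: remove I(p,t) tokens, add O(t,p) tokens; return the new marking."""
    m = dict(m)
    for p, n in spn['I'].get(t, {}).items():
        m[p] -= n
    for p, n in spn['O'].get(t, {}).items():
        m[p] += n
    return tuple(sorted(m.items()))

def generate_ctmc(spn):
    m0 = tuple(sorted(spn['m0'].items()))
    RS, NM, Q = {m0}, [m0], []           # reachability set, new markings, triples (m, m', rate)
    while NM:
        m = NM.pop()
        md = dict(m)
        for t in enabled(spn, md):
            m2 = fire(spn, md, t)
            w = spn['W'][t]
            Q.append((m, m2, w(md) if callable(w) else w))
            if m2 not in RS:
                RS.add(m2)
                NM.append(m2)
    return RS, Q

# Toy example (timed transitions only): 3 jobs cycling between a "free" and a "busy"
# place; the service rate is marking dependent, mimicking two parallel servers.
lam, mu = 1.0, 2.0
spn = {'T': ['Tarr', 'Tsrv'],
       'I': {'Tarr': {'Pfree': 1}, 'Tsrv': {'Pbusy': 1}},
       'O': {'Tarr': {'Pbusy': 1}, 'Tsrv': {'Pfree': 1}},
       'H': {},
       'W': {'Tarr': lam, 'Tsrv': lambda m: min(m['Pbusy'], 2) * mu},
       'm0': {'Pfree': 3, 'Pbusy': 0}}
RS, Q = generate_ctmc(spn)
print(len(RS), "markings;", Q)
```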
When we allow for immediate transitions, we have to deal in a special way with the
vanishing markings. In principle, we can apply a similar algorithm as before; however, we
have to mark the entries in Q that correspond to the firing of immediate transitions. Instead
of storing the corresponding rates, we have to store the firing probabilities. Since the
obtained matrix is not really a generator matrix any more, we denote it as Q’. Furthermore,
note that the diagonal entries have not been given their value yet.
Suppose that we have used the above algorithm to derive a CTMC from an SPN and
that we have marked those state transitions that correspond to the firing of immediate
transitions. Note that these transitions come together in rows of which the corresponding
starting state is a vanishing marking. We therefore can reorder the states in Q’ in such a
way that
\[
Q' = A + B = \begin{pmatrix} C & D \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ E & F \end{pmatrix}, \qquad (14.7)
\]
where C is the matrix describing transitions between vanishing states, D the matrix de-
scribing transitions from vanishing to tangible states, E the matrix describing transitions
from tangible to vanishing states, and F the matrix describing transitions between tangible
states. The elements of the matrices C and D are transition probabilities. Below, we
CTMC.
A departure from the set of tangible markings, via a non-zero entry in the matrix E, is
followed by zero or more transitions between vanishing states, via non-zero entries in the
matrix C, and exactly one transition from a vanishing to a tangible state, via a non-zero
entry in the matrix D. We would like to reduce the matrix Q’ in such a way that all
the possible state transitions between tangible states via one or more vanishing states are
accounted for. The thus obtained matrix Q (of the same size as F) then describes the
reduced embedded Markov chain.
To derive Q from Q’, we let i and j denote tangible states and let r and s denote
vanishing states. Going from one tangible state i to another j can either take place directly,
with rate f_{i,j}, or indirectly with rate Σ_r e_{i,r} Pr{r → j}, where e_{i,r} is the rate to leave a
tangible marking i to end up in a vanishing marking r, and where Pr{r → j} is the
probability of ending up in a tangible state j, starting from a vanishing state r, by one or
more steps. Consequently, we set
\[
q_{i,j} = \begin{cases} f_{i,j} + \sum_r e_{i,r} \Pr\{r \rightarrow j\}, & i \neq j, \\[1ex] 0, & i = j. \end{cases} \qquad (14.8)
\]

From (14.7) we observe that the entries of the matrix $A^l$ represent the probability of going
from a vanishing state r to any other state in exactly l steps, under the condition that only
vanishing intermediate states are visited. Since

$$A^l = \begin{pmatrix} C^l & C^{l-1}D \\ 0 & 0 \end{pmatrix}, \qquad (14.9)$$

the element $g_{r,i}$ of the matrix $G_l = \sum_{k=1}^{l} C^{k-1}D$ signifies the probability of reaching
tangible state i from vanishing state r in at most l steps. Under mild regularity conditions
(irreducibility) the matrix $G_l$ exists and is finite, also for $l \to \infty$. Moreover, it can be
shown that $G^\infty = (I - C)^{-1}D$ (geometric series extended for matrices, see Appendix B.2).
Using this result, we derive, for all $i \neq j$:

$$Q_{i,j} = f_{i,j} + \sum_r e_{i,r}\,g_{r,j}, \qquad (14.10)$$

or, in matrix notation,

$$Q = F + EG^\infty = F + E(I - C)^{-1}D. \qquad (14.11)$$

This equation shows us how we can derive the generator matrix of the reduced embedded
CTMC from elements of the matrix Q’ of the embedded CTMC. The only thing still to be
done is the computation of the diagonal entries. Notice that the reduced embedded CTMC
is smaller than the embedded CTMC. The determination of this smaller matrix Q, using
(14.11), however, involves an expensive matrix inversion.
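As an illustration of (14.11), the following Python sketch (an assumption of this text, not code from the book) computes the reduced generator from the four blocks of Q'; the example blocks model a single vanishing marking that branches with probabilities α and β.

import numpy as np

def reduce_vanishing(C, D, E, F):
    # Q = F + E (I - C)^{-1} D, cf. (14.11); afterwards the diagonal is set
    # so that every row of the reduced generator sums to zero.
    n_v = C.shape[0]
    G_inf = np.linalg.solve(np.eye(n_v) - C, D)      # G^infinity = (I - C)^{-1} D
    Q = F + E @ G_inf
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

# Hypothetical example: tangible marking 3 moves with rate lam to one vanishing
# marking, which branches to tangible markings 1 and 2 with probabilities alpha, beta.
alpha, beta, lam = 0.3, 0.7, 2.0
C = np.array([[0.0]])                 # vanishing -> vanishing (empty here)
D = np.array([[alpha, beta, 0.0]])    # vanishing -> tangible (probabilities)
E = np.array([[0.0], [0.0], [lam]])   # tangible -> vanishing (rates)
F = np.zeros((3, 3))                  # tangible -> tangible (rates)
print(reduce_vanishing(C, D, E, F))   # row 3 becomes (alpha*lam, beta*lam, -lam)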

Figure 14.5: Generated (unreduced) reachability graph

Instead of removing the immediate transitions after the complete unreduced Markov chain has been generated, using (14.11), one can also remove the immediate transitions during the generation process itself ("on the fly") or by inspection of the matrix Q'. Let $m_1$, $m_2$ and $m_3$ be tangible markings and let $m_v$ be a vanishing marking. After Q' has been constructed, it turns out that $m_1 \xrightarrow{\lambda} m_v$, $m_v \xrightarrow{\alpha} m_2$ and $m_v \xrightarrow{\beta} m_3$, where α and β are the firing probabilities of the immediate transitions. The vanishing marking is then removed and the following two transitions (rates) are introduced: from $m_1$ to $m_2$ with rate $\alpha\lambda$ and from $m_1$ to $m_3$ with rate $\beta\lambda$. This approach can be followed for all vanishing markings encountered, and works well as long as no infinite sequences (loops) of immediate transitions are possible. We think that the latter is no severe restriction; it might even be questioned whether using SPNs that allow for infinite loops of immediate transition firings is good modelling practice anyway. In software tools that allow for the construction of SPNs and for the automatic generation of the underlying CTMC, a typical restriction is set on the number of immediate transitions that might be enabled in sequence.
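A minimal sketch of this on-the-fly removal, under the assumption that the state transitions are kept in a dictionary mapping (source, target) pairs to rates (for tangible sources) or firing probabilities (for vanishing sources); the data structure is purely illustrative, not the book's.

def eliminate_vanishing(transitions, vanishing):
    # Merge every vanishing marking into its predecessors; assumes that no
    # loops of immediate transitions exist among the vanishing markings.
    for v in vanishing:
        out_v = {d: p for (s, d), p in transitions.items() if s == v}
        in_v = [(s, x) for (s, d), x in transitions.items() if d == v]
        transitions = {(s, d): x for (s, d), x in transitions.items()
                       if s != v and d != v}
        for s, x in in_v:
            for d, p in out_v.items():
                transitions[(s, d)] = transitions.get((s, d), 0.0) + x * p
    return transitions

# Hypothetical example: m1 -> mv with rate 2.0; mv branches to m2 (0.3) and m3 (0.7).
trans = {("m1", "mv"): 2.0, ("mv", "m2"): 0.3, ("mv", "m3"): 0.7}
print(eliminate_vanishing(trans, {"mv"}))   # {('m1', 'm2'): 0.6, ('m1', 'm3'): 1.4}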

Example 14.9. Simultaneous resource possessing (III).


We apply the state space generation algorithm of Figure 14.4 to the SPN given in Fig-
ure 14.1. The reachability graph generated is depicted in Figure 14.5 where the dashed
ovals refer to vanishing markings and the other ovals to tangible markings. Numbering the
states as indicated in this figure, we obtain the following matrix:

Q' = [matrix over the markings of Figure 14.5; the rows belonging to the vanishing markings contain the firing probabilities of the immediate transitions, the rows belonging to the tangible markings contain the rates λ_a, λ_b, μ_a and μ_b],
in which we have indicated the partitioning in submatrices C through F. The matrix
$G^\infty = (I - C)^{-1}D = D$ (here $C = 0$, since a vanishing marking is never followed directly by another vanishing marking), so that the (non-diagonal entries of) $Q = F + EG^\infty = F + ED$ can be computed.
Notice that Q could also have been derived by inspection of Figure 14.5. □

To conclude this chapter, we show in Figure 14.6 how SPNs are used in a model-based system evaluation. The system of interest is modelled as an SPN and the measures of interest are indicated as well. After the model has been checked for correctness,
e.g., by computing place invariants with the method discussed in Section 14.2, the actual
solution can start. First, the underlying CTMC is generated with the algorithm discussed
in Section 14.4. Then, from this CTMC the probabilities of interest are computed with
the numerical methods to be discussed in Chapter 15. These probabilities are then used
to compute the measures of interest, according to the rules of Section 14.3; we could say
that the state probabilities are enhanced to yield system-oriented measures. Finally, these
system-oriented measures are interpreted in terms of the system being studied (refer to
Chapter 1 and the discussion of the GMTF, in particular to the example addressing tools
for SPNs).

Figure 14.6: The model construction and solution trajectory in SPN-based system evaluation (modelling yields the SPN model and the measures of interest; generation yields the underlying CTMC; numerical solution yields the state probabilities; enhancement yields the measures of interest; interpretation relates these back to the system)

14.5 Further reading


Petri published his Ph.D. thesis on non-timed Petri nets in the early 1960s [232]. He was
awarded the Werner-von-Siemens-ring for his accomplishments in science and engineering
early in 1998. The first stochastic extensions of Petri nets appeared in the late 1970s.
Pioneering work has been accomplished by Natkin, Molloy, Ajmone Marsan et al. and
Meyer et al. Natkin published his Ph.D. thesis on stochastic Petri nets in 1980 [214]. In the
early 1980s Molloy published a paper on SPNs in which all the transitions had exponential
timing [205]. At the same time, Ajmone Marsan et al. published the well-known paper
on generalised SPNs (GSPNs) in which transitions could be either exponentially timed or
immediate (as we used) [4]. At about the same time, Meyer et al. introduced stochastic
activity networks (SANs) in which transitions are either of exponential type or immediate
(instantaneous in their terminology) [210, 209, 199]. In the mid 1980s the amount of
research on SPN formalisms and evaluation techniques increased sharply, also witnessed
by the start of a bi-annual IEEE conference series on Petri nets and performance models
in 1985 (published by the IEEE Computer Society Press as Proceedings of the International
Workshop on Petri Nets and Performance Models).
A number of books have been devoted (almost) exclusively to the use of SPNs: we
mention Ajmone Marsan et al. [1, 2] and Bause and Kritzinger [15]. The performance
evaluation textbook by Kant [152] and the book on Sharpe by Sahner et al. [249] also
address SPNs. Two surveys on the success of SPNs, including many references, have
recently been published by Balbo [11, 10].
In the field of stochastic Petri nets, much research is still going on. Most of it addresses

one of the following four issues: (i) modelling ease; (ii) largeness of the underlying CTMCs;
(iii) timing distributions other than exponential; and (iv) alternative solution methods. We
briefly touch upon these four issues below.
Over the years, a number of extensions to SPN models has been proposed. One of these
extensions is the attribution of colours to tokens, leading to the so-called coloured stochastic
Petri nets (CSPNs) [148, 149]. A marking then is not just an enumeration of the number of
tokens in all places, but also includes the colours of the tokens. Correspondingly, transitions
can be both marking and colour dependent. Coloured Petri nets can be transformed to
“black-and-white” stochastic Petri nets as we have discussed here. They then generally
become more complex to understand. For the solution, again the underlying CTMC is
used.
A slightly different approach has been followed in the definition of stochastic activity
networks (SANs) [253]. With SANs the enabling of transitions has been made more explicit
by the use of enabling functions that must be specified with every transition. Furthermore,
input and output gates are associated with transitions to indicate from which places tokens
are taken and in which places tokens are stored upon firing of the transition. With SANs
some nice hierarchical modelling constructs have been developed that decrease the size of
the state space of the underlying CTMC by using results from the theory of lumping in
CTMCs.
One of the largest problems in the application of SPNs is the growth of the state space
of the underlying CTMC. Although CTMCs with hundreds of thousands of states can be
handled with current-day workstations, there will always remain models that are just too
big to be solved efficiently. Solutions that have been proposed are often based on (approx-
imate) truncation techniques [121] or lumping [31, 32, 47, 253], thereby still performing
the solution at the state space level. By employing the structure of the underlying CTMC,
explicit generation of it can sometimes be avoided. This has led to alternative solution
methods, e.g., via the use of product-form results, leading to so-called product-form SPNs
(PFSPNs) which allow for an efficient convolution or mean-value analysis style of solution [70, 75, 95], and the use of matrix-geometric methods (which will be discussed in
Chapter 17).
Instead of using exponentially distributed firing times, deterministic and general timing
distributions are also of interest. Apart from using phase-type expansion techniques, the
state-of-the-art in generally-timed SPNs is such that only SPNs with at most one non-
exponentially timed transition enabled in every marking can be handled at reasonable
computational expense. These so-called deterministic and stochastic Petri nets (DSPNs)
have been introduced by Ajmone Marsan and Chiola [3] and have been further developed

by, among others, Lindemann and German [104, 103, 182, 183, 184], Ciardo et al. [52] and
Choi et al. [49].
SPNs are not the only “high-level” formalisms to specify CTMCs. QN models can
also be interpreted as specifications of CTMCs. However, since the queueing networks
we have addressed allow for a specialised solution, using either MVA or convolution, it is
not necessary to generate and solve the underlying CTMC; it is more efficient to use the
special algorithms. When no special algorithms can be used, the underlying CTMC might
be explicitly generated and solved. Next to QNs, other techniques, such as those based on
production rule systems, process algebras and reliability block diagrams can be used. For
an overview of high-level formalisms for the specification of CTMCs, we refer to [121].

14.6 Exercises
14.1. SPN model of a system with server vacations.
Consider an M|M|1 queue with finite population K in which the server takes a vacation of
exponentially distributed length after exactly L customers have been served.

1. Construct an SPN for this queueing model when L = 2. Compute the place and
transition invariants.

2. Derive the underlying CTMC when K = 6.

3. Generalise the model such that only the initial marking has to be changed when L
changes. Again compute the place invariants.

14.2. CTMC construction.


Consider the SPN as given in Figure 14.7.

1. Construct the embedded Markov chain, including both the vanishing and the tangible
markings.

2. Construct the reduced embedded Markov chain, i.e., remove the vanishing markings

• with the matrix method discussed in Section 14.4;


• and by inspection of the matrix Q'.

14.3. Preemptive priority systems.


Construct an SPN for a system in which two priority classes exist. There are $K_1$ (resp. $K_2$)

Figure 14.7: An example SPN for which the underlying CTMC is to be determined

customers in each class, and customers of class 1 have preemptive priority over customers
of class 2. Assume that a class 2 customer is interrupted when a class 1 customer arrives
and that the interrupted customer has to be re-served completely.

14.4. Polling systems.


In Chapter 9 we have discussed polling models and developed results for the mean waiting
time in symmetric polling models. Let us now try to develop SPN-based polling mod-
els. Consider the case where we have three stations and where all the service times are
exponentially distributed, possibly with different means. Likewise, we assume that the
switch-over times are all exponentially distributed. Finally, assume that the arrival pro-
cesses are Poisson processes, however, fed from a finite source, i.e., for every station there
is a place that models the finite source (and sink) of jobs for that station.

1. Let there be at most one job at each station. Construct an SPN, modelling a three-
station polling model, where every station has a 1-limited scheduling strategy.

2. Derive the underlying CTMC for such an SPN.

3. Now let there be more than one job at each station (we increase the number of jobs
in the finite source). Construct an SPN, modelling a three-station polling model,
where every station has an exhaustive scheduling strategy.

4. Is it still possible to derive the underlying CTMC by hand?

5. Construct a model for a polling system where some stations have 1-limited schedul-
ing, and others have exhaustive scheduling. How is such a “mixed” model solved?

Does the asymmetry have any impact on the complexity of the numerical solution
approach?

6. Construct a model of a station with a gated scheduling discipline.

7. Construct a model of a station with a time-based scheduling discipline, such as we


have discussed for the IBM token ring.

In Chapter 16 we will present a number of SPN-based polling models.

14.5. Availability modelling.


Consider a system consisting of $n_A$ components of type A and $n_B$ components of type B.
The failure rate of components of type A (B) is $f_A$ ($f_B$) and the repair rates are $r_A$ and $r_B$
respectively. We assume that the times to failure and the times to repair are exponentially
distributed. There is a single repair unit available for repairing failed components; it can
only repair one component at a time. Components that fail are immediately repaired,
given the repair unit is free; otherwise they have to queue for repair.

1. Construct an SPN modelling the availability of the system.

2. What would be a reasonable initial marking?

3. How many states does the underlying CTMC have, given $n_A$ and $n_B$?

4. The system is considered operational (available) when at least one component of each
class is non-failed. Express this condition in terms of required place occupancies. How
would you compute this availability measure?

Haverkort discusses this model at length in [121]; we will also address it in Chapter 16.

Chapter 15

Numerical solution of Markov chains

Although a CTMC is completely described by its state space I, its generator matrix
Q and its initial probability vector $\underline{p}(0)$, in most cases we will not directly specify
the CTMC at the state level. Instead, we use SPNs (or other high-level specification
techniques) to specify the CTMCs.
In this chapter, we focus on the solution of CTMCs with a finite, but possibly large,
state space, once they have been generated from a high-level specification. The solution
of infinite-state CTMCs has been discussed in Chapter 4 (birth-death queueing models),
Chapter 8 (quasi-birth-death queueing models) and Chapter 10 (open queueing network
models) and will be discussed further in Chapter 17.
Finite CTMCs can be studied for their steady-state as well as for their transient be-
haviour. In the former case, systems of linear equations have to be solved. How to do this,
using direct or iterative methods, is the topic of Section 15.1. In the latter case, linear
systems of differential equations have to be solved, which is addressed in Section 15.2.

15.1 Computing steady-state probabilities


As presented in Chapter 3, for obtaining the steady-state probabilities of a finite CTMC
with N states (numbered 1 through N), we need to solve the following system of N linear
equations:

$$\underline{p}\,Q = \underline{0}, \qquad \sum_{i=1}^{N} p_i = 1, \qquad (15.1)$$

where the right part is added to assure that the obtained solution is a probability vector.
We assume here that the CTMC is irreducible and aperiodic such that $\underline{p}$ does exist and
is independent of $\underline{p}(0)$. Notice that the left part of (15.1) in fact does not uniquely define

the steady-state probabilities; however, together with the normalisation equation a unique
solution is found. For the explanations that follow, we will transpose the matrix Q and
denote it as A. Hence, we basically have to solve the following system of linear equations:

$$A\,\underline{p}^T = \underline{b}, \quad \text{with } A = Q^T \text{ and } \underline{b} = \underline{0}^T. \qquad (15.2)$$

Starting from this system of equations, various solution approaches can be chosen:

1. Direct methods:

l Gaussian elimination;
l LU-decomposition;

2. Iterative methods:

l Jacobi iteration;
a Gauss-Seidel iteration;
0 Successive overrelaxation;

These methods will be discussed in more detail in the following sections.

15.1.1 Gaussian elimination


A characteristic of direct methods is that they aim at rewriting the system of equations in
such a form that explicit expressions for the steady-state probabilities are obtained. Typ-
ically, this rewriting procedure takes place in an a priori known number of computations
(given N). A very well-known and straightforward direct solution technique is Gaussian
elimination. The Gaussian elimination procedure consists of two phases: a reduction phase
and a substitution phase.
In the reduction phase repetitive subtractions of equations from one another are used
to make the system of equations upper-triangular (see also Figure 15.1). To do so, let the
i-th equation be $\sum_j a_{i,j}\,p_j = 0$ (this equals $\sum_j p_j\,q_{j,i} = 0$ in the non-transposed system).
We now vary i from 1 to N. The j-th equation, with $j = i+1, \ldots, N$, is now changed by
subtracting the i-th equation $m_{j,i}$ times from it, where $m_{j,i} = a_{j,i}/a_{i,i}$, that is, we reassign
the $a_{j,k}$ values as follows:

$$a_{j,k} := a_{j,k} - m_{j,i}\,a_{i,k}, \qquad j, k > i. \qquad (15.3)$$

Clearly, $a_{j,i} := a_{j,i} - m_{j,i}\,a_{i,i} = 0$, for all $j > i$. By repeating this procedure for increasing
i, the linear system of equations is transformed, in N − 1 steps, to an upper-triangular



Figure 15.1: Schematic representation of the i-th reduction step in the Gaussian elimination
procedure (row i is subtracted $m_{j,i}$ times from row j: $a_{j,k} := a_{j,k} - m_{j,i}\,a_{i,k}$; column i is being reduced)

system of equations. The element ai,i that acts as a divisor is called the pivot. If a pivot
is encountered that equals 0, the algorithm would attempt to divide by 0. Such a failure
indicates that the system of equations being solved does not have a solution. Since Q is a
generator matrix of an irreducible ergodic CTMC, this problem will not occur. Moreover,
since A is weakly diagonally dominant ($a_{i,i}$ is in absolute value as large as the sum of all the values $a_{j,i}$, $j \neq i$,
in the same column) we have that $m_{j,i} \leq 1$ so that overflow problems are unlikely to occur.
At the end of the reduction phase, the N-th equation will always reduce to a trivial
one (0 = 0). This is no surprise, since the system of equations without normalisation is
not of full rank. We might even completely ignore the last equation. Since the right-hand
side of the linear system of equations equals 0, we do not have to change anything there.
When the right-hand side is a non-zero vector b, we would have to set bj := bj - mj,ibi, for
all j > i in each step in the reduction process.
After the reduction has been performed, the substitution phase can start. The equation
for $p_N$ does not help us any further; we therefore assume a value $\alpha > 0$ for $p_N$. This
value for $p_N$ can be substituted in the first N − 1 equations, thus yielding a system of
equations with one unknown less. We implement this by setting $b_j := b_j - a_{j,N}\,p_N$. Now,
the (N − 1)-th equation will have only one unknown left which we can directly compute
as $p_{N-1} = b_{N-1}/a_{N-1,N-1}$. This new value can be substituted in the N − 2 remaining
equations, after which the (N − 2)-th equation has only one unknown. This procedure
can be repeated until all probabilities have been computed explicitly in terms of α. We
then use the normalisation equation to compute α to obtain the true probability vector,

 1. for i := 1 to N − 1
 2. do for j := i + 1 to N
 3.    do
 4.       m_{j,i} := a_{j,i} / a_{i,i};  a_{j,i} := 0
 5.       for k := i + 1 to N
 6.       do a_{j,k} := a_{j,k} − m_{j,i} a_{i,k}
 7.       od
 8.    od
 9. od
10. p_N := α;  σ := α
11. for j := N downto 2
12. do
13.    for i := j − 1 downto 1
14.    do b_i := b_i − a_{i,j} p_j
15.    od
16.    p_{j−1} := b_{j−1} / a_{j−1,j−1}
17.    σ := σ + p_{j−1}
18. od
19. for i := 1 to N do p_i := p_i / σ

Figure 15.2: The Gaussian elimination procedure

that is, we compute $\sigma = \sum_{i=1}^{N} p_i$ and set $p_i := p_i/\sigma$, for all i. We summarise the complete
algorithm in Figure 15.2.
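For concreteness, a small Python rendering of Figure 15.2 (with α = 1 and the final normalisation); this is an illustrative dense implementation, not the book's code, and it ignores sparsity and pivoting issues.

import numpy as np

def steady_state_gauss(Q):
    # Steady-state vector of an irreducible CTMC via Gaussian elimination on A = Q^T.
    N = Q.shape[0]
    A = Q.T.astype(float).copy()
    b = np.zeros(N)
    # reduction phase: make A upper-triangular
    for i in range(N - 1):
        for j in range(i + 1, N):
            m = A[j, i] / A[i, i]
            A[j, i:] -= m * A[i, i:]
    # substitution phase: fix p_N = alpha = 1 and solve upwards
    p = np.zeros(N)
    p[N - 1] = 1.0
    for j in range(N - 1, 0, -1):
        b[:j] -= A[:j, j] * p[j]
        p[j - 1] = b[j - 1] / A[j - 1, j - 1]
    return p / p.sum()          # normalisation

# the generator corresponding to the equations (15.5) of Example 15.1
Q = np.array([[-4.0,  2.0,  2.0],
              [ 1.0, -2.0,  1.0],
              [ 6.0,  0.0, -6.0]])
print(steady_state_gauss(Q))    # approximately (0.4, 0.4, 0.2)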

Example 15.1. Gaussian elimination.


As an example, consider a CTMC for which the matrix Q is given by

$$Q = \begin{pmatrix} -4 & 2 & 2 \\ 1 & -2 & 1 \\ 6 & 0 & -6 \end{pmatrix}. \qquad (15.4)$$

Writing pQ = 0, we obtain:

−4p_1 + p_2 + 6p_3 = 0,
 2p_1 − 2p_2       = 0,          (15.5)
 2p_1 + p_2 − 6p_3 = 0.

Adding the first equation one-half times ($m_{2,1} = -\frac{1}{2}$) to the other equations, we obtain the

following:
−4p_1 + p_2 + 6p_3 = 0,
 −1.5p_2 + 3p_3    = 0,          (15.6)
  1.5p_2 − 3p_3    = 0.
We then add the second equation once to the third to obtain:

−4p_1 + p_2 + 6p_3 = 0,
 −1.5p_2 + 3p_3    = 0,          (15.7)

where the last equation has disappeared; we therefore assume $p_3 = \alpha$ and obtain the
following system of equations by back-substitution:

−4p_1 + p_2 = −6α,
 −1.5p_2    = −3α,          (15.8)

from which we obtain that $p_2 = 2\alpha$. Substituting this result in the first equation, we obtain
$p_1 = 2\alpha$. Thus, we find $\underline{p} = \alpha(2, 2, 1)$. Using the normalisation equation, we find $5\alpha = 1$
so that the final solution vector equals $\underline{p} = (\frac{2}{5}, \frac{2}{5}, \frac{1}{5})$. □

Instead of assuming the value α for $p_N$, we can also directly include the normalisation
equation in the Gaussian elimination procedure. The best way to go then is to replace
the N-th equation with the equation $\sum_i p_i = 1$. In doing so, the last equation will directly
give us $p_N$. The substitution phase can proceed as before.
Observing the algorithm in Figure 15.2 we see that the computational complexity is
$O(N^3)$. By a more careful study of the algorithm, one will find that about $N^3/3 + N^2/2$
multiplications and additions have to be performed, as well as N(N + 1)/2 divisions.
Clearly, these numbers increase rapidly with increasing N. The main problem with Gaus-
sian elimination lies in the storage. Although A will initially be sparse for most models,
the reduction procedure normally increases the number of non-zeros in A. At the end of
the reduction phase, most entries of the upper half of A will be non-zero. The non-zero el-
ements generated during this phase are called fill-ins. They can only be inserted efficiently
when direct storage structures (arrays) are used. To store the upper-triangular matrix A,
$N^2/2$ floats have to be stored, normally each taking 6 or 8 bytes. For moderately sized
models generated from SPN specifications, N can easily be as large as $10^4$ or even $10^5$.
This then precludes the use of Gaussian elimination. Fortunately, there are methods to
compute $\underline{p}$ that do not change A and that are very fast as well. We will discuss these
methods after we have discussed one alternative direct method.

15.1.2 LU decomposition
A method known as LU decomposition is advantageous to use when multiple systems of
equations have to be solved, all of the form $A\underline{x} = \underline{b}$, for different values of $\underline{b}$. The method
starts by decomposing A such that it can be written as the product of two matrices L and
U, where the former is lower-triangular, and the latter is upper-triangular. We have:

A:=b + LUz=b. (15.9)


-
z
After the decomposition has taken place, we have to solve Lz = b, after which we solve
UJ: = z. Since the last two systems of equations are triangular, their solution can be found
by a simple forward- and back-substitution.
The main question then lies in the computation of suitable matrices L and U. Since
A is the product of these two matrices, we know that

CLi,j = 5 li,kuk,j, i,j = 1, - * * , N. (15.10)


k=l

Given the fact that L and U are lower- and upper-triangular, we have to find N2 + N
unknowns:
1.23.7 i = I,..., N, k = l,...,i,
(15.11)
Uk,j, k = I,---, N, j = k;--, N.

Since (15.10) only consists of $N^2$ equations, we have to assume N values to determine a
unique solution. Two well-known schemes in this context are [268]:

• the Doolittle decomposition where one assumes $l_{i,i} = 1$, $i = 1, \ldots, N$;

• the Crout decomposition where one assumes $u_{i,i} = 1$, $i = 1, \ldots, N$.

In the sequel, we consider the Doolittle variant. First notice that in (15.10) many of the
terms in the summation are zero, since one of the numbers being multiplied is zero. In
fact, we can rewrite (15.10) in a more convenient form as follows:

$$\begin{aligned}
i \leq j:&\quad a_{i,j} = u_{i,j} + \sum_{k=1}^{i-1} l_{i,k}\,u_{k,j},\\
i > j:&\quad a_{i,j} = l_{i,j}\,u_{j,j} + \sum_{k=1}^{j-1} l_{i,k}\,u_{k,j}.
\end{aligned} \qquad (15.12)$$

From this system of equations, we can now iteratively compute the entries of L and U as
follows:

$$\begin{aligned}
i \leq j:&\quad u_{i,j} = a_{i,j} - \sum_{k=1}^{i-1} l_{i,k}\,u_{k,j},\\
i > j:&\quad l_{i,j} = \frac{1}{u_{j,j}}\left(a_{i,j} - \sum_{k=1}^{j-1} l_{i,k}\,u_{k,j}\right),
\end{aligned} \qquad (15.13)$$

by increasing i from 1 until N is reached.
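The recursion (15.13) translates directly into code; the following Python sketch (illustrative, without pivoting, not the book's code) computes the Doolittle factors of a dense matrix and uses the matrix of Example 15.2 below as a check.

import numpy as np

def doolittle_lu(A):
    # Doolittle LU decomposition: A = L U with l_{i,i} = 1 (no pivoting).
    N = A.shape[0]
    L = np.eye(N)
    U = np.zeros((N, N))
    for i in range(N):
        for j in range(i, N):                 # row i of U, case i <= j of (15.13)
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        for j in range(i + 1, N):             # column i of L, case i > j of (15.13)
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

A = np.array([[ 3.0, 2.0,  5.0],
              [-6.0, 1.0,  8.0],
              [-7.0, 2.0, -3.0]])
L, U = doolittle_lu(A)
print(np.allclose(L @ U, A))                  # True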

Example 15.2. LU decomposition (I).


Suppose we want to decompose

$$A = \begin{pmatrix} 3 & 2 & 5 \\ -6 & 1 & 8 \\ -7 & 2 & -3 \end{pmatrix}$$

using a Doolittle LU decomposition. We then know that

$$L = \begin{pmatrix} 1 & 0 & 0 \\ l_{2,1} & 1 & 0 \\ l_{3,1} & l_{3,2} & 1 \end{pmatrix} \quad \text{and} \quad U = \begin{pmatrix} u_{1,1} & u_{1,2} & u_{1,3} \\ 0 & u_{2,2} & u_{2,3} \\ 0 & 0 & u_{3,3} \end{pmatrix}.$$

We start to compute $u_{1,1} = a_{1,1} = 3$. We then can compute $l_{2,1} = a_{2,1}/u_{1,1} = -2$. Then,
$u_{1,2} = a_{1,2} = 2$. From this, we find $u_{2,2} = a_{2,2} - l_{2,1}u_{1,2} = 5$. We then compute $l_{3,1} = -\frac{7}{3}$
and find $l_{3,2} = \frac{4}{3}$. Via $u_{1,3} = a_{1,3} = 5$ and $u_{2,3} = 18$ we find $u_{3,3} = a_{3,3} - \sum_{k=1}^{2} l_{3,k}u_{k,3} = -\frac{46}{3}$.
We thus find:

$$A = LU, \quad \text{with } L = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -\frac{7}{3} & \frac{4}{3} & 1 \end{pmatrix} \text{ and } U = \begin{pmatrix} 3 & 2 & 5 \\ 0 & 5 & 18 \\ 0 & 0 & -\frac{46}{3} \end{pmatrix}.$$

Example 15.3. LU decomposition (II).


The result of the LU decomposition is given by the matrices L and U in the previous
example. To solve $A\underline{x} = \underline{1}$ (with $\underline{1} = (1, 1, 1)^T$), we now first solve for $\underline{z}$ in $L\underline{z} = \underline{1}$. A simple substitution
procedure yields $\underline{z} = (1, 3, -\frac{2}{3})$. We now continue to solve $U\underline{x} = \underline{z}$; also here a substitution
procedure suffices to find $\underline{x} = \frac{1}{115}(-4, 51, 5)$. It is easily verified that this value for $\underline{x}$ indeed
satisfies $A\underline{x} = \underline{1}$. □

Example 15.4. LU decomposition (III).


We reconsider the CTMC for which the matrix Q is given by

$$Q = \begin{pmatrix} -4 & 2 & 2 \\ 1 & -2 & 1 \\ 6 & 0 & -6 \end{pmatrix}. \qquad (15.14)$$

We now form the matrix A = QT and in addition directly include the normalisation
equation. To find the steady-state probabilities we have to solve:

$$\begin{pmatrix} -4 & 1 & 6 \\ 2 & -2 & 0 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \qquad (15.15)$$

We now decompose A using the Doolittle decomposition as follows:

$$A = LU = \begin{pmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ -\frac{1}{4} & -\frac{5}{6} & 1 \end{pmatrix}\begin{pmatrix} -4 & 1 & 6 \\ 0 & -\frac{3}{2} & 3 \\ 0 & 0 & 5 \end{pmatrix}. \qquad (15.16)$$

The solution of $L\underline{z} = (0, 0, 1)^T$ now reveals, via a simple substitution, that $\underline{z} = (0, 0, 1)$.
We now have to find $\underline{p}$ from $U\underline{p} = \underline{z}$, from which we, again via a substitution procedure,
find $\underline{p} = (\frac{2}{5}, \frac{2}{5}, \frac{1}{5})$, as we have seen before. □

In the above example, we took a specific way to deal with the normalisation equation:
we replaced one equation from the "normal" system with the normalisation equation. In
doing so, the vector $\underline{b}$ changes to $\underline{b} = (0, \ldots, 0, 1)^T$ and after the solution of $L\underline{z} = \underline{b}$, we found
$\underline{z} = (0, 0, 1)^T$. This is not only true for the above example; if we replace the last equation,
the vector $\underline{z}$ always has this value, and so we do not really have to solve the system
$L\underline{z} = (0, \ldots, 0, 1)^T$. Hence, after the LU decomposition has been performed, we can always
directly solve $\underline{p}$ from $U\underline{p} = (0, \ldots, 0, 1)^T$.
Opposed to the above variant, we can also postpone the normalisation, as we have done
in the Gaussian elimination case. We then decompose $A = Q^T = LU$, for which we will
find that the last row of U contains only 0's. The solution of $L\underline{z} = \underline{0}$ will then always yield
$\underline{z} = \underline{0}$, so that we can immediately solve $U\underline{p} = \underline{0}$. This triangular system of equations can
easily be solved via a back-substitution procedure; however, we have to assume $p_N = \alpha$
and compute the rest of $\underline{p}$ relative to α as well. A final normalisation will then yield the
ultimate steady-state probability vector $\underline{p}$.
Postponing the normalisation is preferred in most cases for at least two reasons: (i)
it provides an implicit numerical accuracy test in that the last row of U should equal 0;
and (ii) it requires less computations than the implicit normalisation since the number of
non-zeros in the matrices that need to be handled is smaller. Of course, these advantages
will become more important for larger values of N.
The LU decomposition solution method has the same computational complexity of
$O(N^3)$ as the Gaussian elimination procedure. The decomposition can be performed with

only one data structure (typically an array). Initially, the matrix A is stored in it, but
during the decomposition the elements of L (except for the diagonal elements from L, but
these are equal to 1 anyway) and the elements of U replace the original values.
We finally comment on the occurrence of over- and underflow during the computations.
Underflow can be dealt with by setting intermediate values smaller than some threshold,
say $10^{-24}$, equal to 0. Overflow is unlikely to occur during the reduction phase in the
Gaussian elimination (the pivots are the largest (absolute) quantities in every column). If
in other parts of the algorithms overflow tends to occur (observed if some of the values
grow above a certain threshold, say $10^{10}$) then an intermediate normalisation of the solution
vector is required. A final normalisation then completes the procedures.

15.1.3 Power, Jacobi, Gauss-Seidel and SOR iterative methods


Although direct methods are suitable to solve smaller instances of the system of equations
(15.2), for reasons of computational and memory efficiency they cannot be used when the
number of states N grows beyond about a thousand. Instead, we use iterative methods
in these cases. With iterative methods, the involved matrices do not change (fill-in is
avoided), so that they can be stored efficiently using sparse matrix methods. Moreover,
these methods can be implemented such that in the matrix-multiplications only the mul-
tiplications with two non-zero operands are taken into account. This of course speeds up
the computations enormously.
Iterative procedures do not result in an explicit solution of the system of equations. A
key characteristic of iterative methods is that it is not possible to state a priori how many
computational steps are required. Instead, a simple numerical procedure (the iteration
step) is performed repeatedly until a desired level of accuracy is reached.

The Power method

We have already seen the simplest iterative method to solve for the steady-state probabil-
ities of a DTMC in Chapter 4: the Power method. The Power method performs successive
multiplication of the probability vector with P until convergence is reached.
The Power method can also be applied for CTMCs. Given a CTMC with generator matrix
Q, we can compute the DTMC transition matrix $P = I + Q/\lambda$. If we take $\lambda \geq \max_i\{|q_{i,i}|\}$,
the matrix P is a stochastic matrix and describes the evolution of the CTMC in time-steps
of mean length $1/\lambda$ (see Section 15.2 for a more precise formulation). Using P and setting
$\underline{p}^{(0)} = \underline{p}(0)$ as initial estimate for the steady-state probability vector, we can compute
$\underline{p}^{(k+1)} = \underline{p}^{(k)}P$ and find that $\underline{p} = \lim_{k\to\infty}\underline{p}^{(k)}$.

The Power method solves $\underline{p}$ as the left Eigenvector of P, corresponding to an Eigenvalue 1.
The employed matrix P is called the iteration matrix and denoted as $\Phi_P$.
Note that the Power method as sketched above is also often called uniformisation (see
Section 15.2). It is also allowed to use values of $\lambda < \max_i\{|q_{i,i}|\}$ to construct a matrix P.
However, in that case P is not a stochastic matrix and the DTMC-interpretation is not
valid any more. Furthermore, it is also possible to apply the Power method with a matrix
P', with $p'_{i,j} = q_{i,j}/|q_{i,i}|$, for $i \neq j$, and $p'_{i,i} = 0$. After convergence, the resulting probability
vector $\underline{p}'$ then needs to be renormalised to account for the different mean residence times
per state, as we have seen for SMCs in Chapter 3.
In practice, the Power method is not very efficient (see also Chapter 8 where we dis-
cussed the convergence of the Power method to compute the largest Eigenvalue of a matrix).
Since more efficient methods do exist, we do not discuss the Power method any further.

The Jacobi method

Two of the best-known (and simple) iterative methods are the Jacobi iterative method and
the Gauss-Seidel iterative method. These methods first rewrite the i-th equation of the
linear system (15.2) into:

$$p_i = \frac{1}{|a_{i,i}|}\sum_{j \neq i} a_{i,j}\,p_j. \qquad (15.17)$$

We clearly need $a_{i,i} \neq 0$; when the linear system is used to solve for the steady-state
probabilities of an irreducible aperiodic CTMC, this is guaranteed.
The iterative procedures now proceed with assuming a first guess for $\underline{p}$, denoted $\underline{p}^{(0)}$.
If one does know an approximate solution for $\underline{p}$, it can be used as initial guess, e.g., when
a similar model with slightly different parameters has been solved before, the solution
vector for that model might be used as initial estimate, although the convergence gain in
doing so is mostly small. In other cases, the uniform distribution is a reasonable choice,
i.e., $p_i^{(0)} = 1/N$. The next estimate for $\underline{p}$ is then computed as follows:

$$p_i^{(k+1)} = \frac{1}{|a_{i,i}|}\left(\sum_{j<i} p_j^{(k)}a_{i,j} + \sum_{j>i} p_j^{(k)}a_{i,j}\right) = \frac{1}{|a_{i,i}|}\sum_{j \neq i} p_j^{(k)}a_{i,j}. \qquad (15.18)$$

This is the Jacobi iteration scheme. We continue to iterate until two successive estimates
for $\underline{p}$ differ less than some ε from one another, i.e., when $||\underline{p}^{(k+1)} - \underline{p}^{(k)}|| < \epsilon$ (difference
criterion). Notice that when this difference is very small, this does not always imply that
the solution vector has been found. Indeed, it might be the case that the convergence

towards the solution is very slow. Therefore, it is good to check whether $||A\,\underline{p}^{(k)}|| < \epsilon$
(residual criterion). Since this way of checking convergence is more expensive, often a
combination of these two methods is used: use the difference criterion normally; once it is
satisfied use the residual criterion. If the convergence is really slow, two successive iterates
might be very close to one another, although the actual value for $\underline{p}$ is still "far away".
To avoid the difference criterion stopping the iteration process too soon, one might instead
check the difference between non-successive iterates, i.e., $||\underline{p}^{(k+1)} - \underline{p}^{(k-d)}|| < \epsilon$, with
$d \in \mathbb{N}^+$ (and $d \leq k$).
When we denote the diagonal matrix $D = \mathrm{diag}(a_{i,i})$ and L and U respectively as the
lower and upper triangular half of A (these matrices should not be confused with the
matrices L and U used in the LU-decomposition!), we can write $Q^T = A = D - (L + U)$,
so that we can write the Jacobi iteration scheme in matrix-vector notation as:

$$\underline{p}^{(k+1)} = D^{-1}(L + U)\,\underline{p}^{(k)}. \qquad (15.19)$$

We observe that the Jacobi method has iteration matrix $\Phi_J = D^{-1}(L + U)$.

The Gauss-Seidel method

The Jacobi method requires the storage of both $\underline{p}^{(k)}$ and $\underline{p}^{(k+1)}$ during an iteration step.
If, instead, the computation is structured such that the (k+1)-th estimates are already
used as soon as they have been computed, we obtain the Gauss-Seidel scheme:

$$p_i^{(k+1)} = \frac{1}{|a_{i,i}|}\left(\sum_{j<i} p_j^{(k+1)}a_{i,j} + \sum_{j>i} p_j^{(k)}a_{i,j}\right), \qquad (15.20)$$

where we assume that the order of computation is from $p_1$ to $p_N$. This scheme then requires
only one probability vector to be stored, since the (k+1)-th estimate for $p_i$ immediately
replaces the k-th estimate in the single stored vector $\underline{p}$.
-
Employing the same matrix notation as above, we can write the Gauss-Seidel iteration
scheme in matrix-vector notation as

$$D\,\underline{p}^{(k+1)} = L\,\underline{p}^{(k+1)} + U\,\underline{p}^{(k)}, \qquad (15.21)$$

from which we conclude

$$\underline{p}^{(k+1)} = (D - L)^{-1}U\,\underline{p}^{(k)}. \qquad (15.22)$$

We observe that the Gauss-Seidel method has iteration matrix $\Phi_{GS} = (D - L)^{-1}U$.
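As an illustration (not the book's code), a dense Python implementation of the Gauss-Seidel scheme (15.20), with renormalisation after every sweep as in Example 15.5 below:

import numpy as np

def gauss_seidel_steady_state(Q, eps=1e-6, max_iter=10_000):
    # Gauss-Seidel iteration (15.20) on A = Q^T; renormalise after every sweep.
    A = Q.T.astype(float)
    N = A.shape[0]
    p = np.full(N, 1.0 / N)                     # uniform initial guess
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(N):
            s = A[i, :] @ p - A[i, i] * p[i]    # sum over j != i, newest values used
            p[i] = s / abs(A[i, i])
        p /= p.sum()                            # renormalisation
        if np.abs(p - p_old).max() < eps:       # difference criterion
            break
    return p

Q = np.array([[-4.0,  2.0,  2.0],
              [ 1.0, -2.0,  1.0],
              [ 6.0,  0.0, -6.0]])
print(gauss_seidel_steady_state(Q))             # approximately (0.4, 0.4, 0.2)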

The SOR method

The last method we mention is the successive over-relaxation method (SOR). SOR is an extension
of the Gauss-Seidel method, in which the vector $\underline{p}^{(k+1)}$ is computed as the weighted
average of the vector $\underline{p}^{(k)}$ and the vector $\underline{\hat{p}}^{(k+1)}$ that would have been used in the (pure)
Gauss-Seidel iteration. That is, we have, for $i = 1, \ldots, N$:

$$p_i^{(k+1)} = (1-\omega)\,p_i^{(k)} + \frac{\omega}{|a_{i,i}|}\left(\sum_{j<i} p_j^{(k+1)}a_{i,j} + \sum_{j>i} p_j^{(k)}a_{i,j}\right), \qquad (15.23)$$

where $\omega \in (0, 2)$ is the relaxation factor. When $\omega = 1$, this method reduces to the Gauss-
Seidel iteration scheme; however, when we take $\omega > 1$ (or $\omega < 1$) we speak of over-
relaxation (under-relaxation). With a proper choice of ω, the iterative solution process can
be accelerated significantly. Unfortunately, the optimal choice of ω cannot be determined
a priori. We can, however, estimate ω during the solution process itself, as will be pointed
out below.
Employing the same matrix notation as before, we can write the SOR iteration scheme
in matrix-vector notation as

$$D\,\underline{p}^{(k+1)} = (1-\omega)D\,\underline{p}^{(k)} + \omega\left(L\,\underline{p}^{(k+1)} + U\,\underline{p}^{(k)}\right), \qquad (15.24)$$

from which we conclude

$$\underline{p}^{(k+1)} = (D - \omega L)^{-1}\left(\omega U + (1-\omega)D\right)\underline{p}^{(k)}. \qquad (15.25)$$

We observe that the SOR method has iteration matrix $\Phi_{SOR} = (D - \omega L)^{-1}(\omega U + (1-\omega)D)$.
When using the SOR method, we start with $\omega = 1$ for about 10 iteration steps. In order
to find a better value for ω, we can use the method proposed by Hageman and Young [116].
We then have to compute an estimate for the second largest Eigenvalue of the iteration
matrix $\Phi_{SOR}$ as follows:

$$\hat{\lambda}_2 = \frac{||\underline{p}^{(k+1)} - \underline{p}^{(k)}||}{||\underline{p}^{(k)} - \underline{p}^{(k-1)}||}. \qquad (15.26)$$

A new estimate for ω, denoted ω', can then be computed as

$$\omega' = \frac{2}{1 + \sqrt{1 - \nu^2}}, \quad \text{with } \nu = \frac{\hat{\lambda}_2 + \omega - 1}{\omega\sqrt{\hat{\lambda}_2}}. \qquad (15.27)$$

This new estimate then replaces the old value of ω, and should be used for another number
of iterations, after which the estimation procedure is repeated. Whenever successive iteration
vectors are becoming worse, or when the estimated sub-dominant Eigenvalue $\hat{\lambda}_2 > 1$,
then ω should be reduced towards 1.
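A sketch of one SOR sweep (15.23) together with the ω re-estimation of (15.26)–(15.27); the fall-back to ω = 1 when the estimate is unusable is an assumption of this illustration, not a rule from the book.

import numpy as np

def sor_sweep(A, p, omega):
    # One SOR sweep (15.23) over A = Q^T, in place; returns the updated vector.
    N = len(p)
    for i in range(N):
        gs = (A[i, :] @ p - A[i, i] * p[i]) / abs(A[i, i])   # Gauss-Seidel value
        p[i] = (1.0 - omega) * p[i] + omega * gs
    return p

def new_omega(omega, p_prev, p_curr, p_next):
    # Hageman-Young style re-estimation of the relaxation factor, cf. (15.26)-(15.27).
    lam2 = np.linalg.norm(p_next - p_curr) / np.linalg.norm(p_curr - p_prev)
    nu = (lam2 + omega - 1.0) / (omega * np.sqrt(lam2))
    if nu >= 1.0:                       # estimate unusable: fall back towards omega = 1
        return 1.0
    return 2.0 / (1.0 + np.sqrt(1.0 - nu * nu))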

From the discussion of the Power method in Chapter 8 (in the context of the computa-
tion of Eigenvalues), we recall that convergence is achieved faster when the second-largest
(sub-dominant) Eigenvalue is smaller. In the iteration matrices given above, the largest
Eigenvalue always equals 1, and the speed of convergence of the discussed methods then
depends on the sub-dominant Eigenvalue. With the SOR method, with a proper choice
of ω one can adapt the iteration matrix such that its second-largest Eigenvalue becomes
smaller.

Time and space complexity

The above iterative methods can be used to solve the linear systems arising in the solution
of the steady-state probabilities for CTMCs, with or without the normalisation equation.
Quite generally we can state that it is better not to include the normalisation equation.
If the normalisation equation is included, the actual iteration matrices change in such a
way that the sub-dominant Eigenvalue increases, hence reducing the speed of convergence.
Therefore, one better performs an explicit normalisation after a number of iterations.
We finally comment on the time and space complexity of the discussed iterative meth-
ods. All methods require the storage of the matrix A in some form. For larger modelling
problems, A has to be stored sparsely; it is then important that the sparse storage struc-
ture is structured such that row-wise access is very efficient since all methods require the
product of a row of A with the iteration vector $\underline{p}^{(k)}$. The Power and the Jacobi method
require two iteration vectors to be stored, each of length N. The Gauss-Seidel and the
SOR method only require one such vector. In all the iteration schemes the divisions by
$|a_{i,i}|$ (and for SOR the multiplication with ω) need to be done only once, either before the
actual iteration process starts or during the first iteration step, by changing the matrix
A accordingly. This saves N divisions (and N multiplications for SOR) per iteration. A
single iteration can then be interpreted as a single matrix-vector multiplication (MVM).
In a non-sparse implementation, a single MVM costs $O(N^2)$ multiplications and additions.
However, in a suitably chosen sparse storage structure only O(q) multiplications and addi-
tions are required, where q is the number of non-zero elements in A. Typically, the number
of nonzero elements per column in A is limited to a few dozen. For example, considering
an SPN used to generate an underlying CTMC, the number of nonzero elements per row
in Q equals the number of enabled transitions in a particular marking. This number is
bounded by the number of transitions in the SPN, which is normally much smaller than
N (especially when N is large). Hence, it is reasonable to assume that q is of order O(N),
so that one iteration step then takes O(N) operations.
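To make the O(q) claim concrete, a row-wise sparse (CSR-style) matrix-vector product might look as follows; the storage layout shown is a common convention, not one prescribed by the book.

def csr_matvec(values, col_idx, row_ptr, x):
    # y = A x for a matrix stored row-wise in CSR form:
    # values[row_ptr[i]:row_ptr[i+1]] are the non-zeros of row i,
    # col_idx gives their column positions; cost is O(number of non-zeros).
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = Q^T of the running three-state example, in CSR form; the product with the
# steady-state vector (0.4, 0.4, 0.2) is (close to) the zero vector.
values  = [-4.0, 1.0, 6.0, 2.0, -2.0, 2.0, 1.0, -6.0]
col_idx = [0, 1, 2, 0, 1, 0, 1, 2]
row_ptr = [0, 3, 5, 8]
print(csr_matvec(values, col_idx, row_ptr, [0.4, 0.4, 0.2]))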

An important difference between the presented iterative methods is the number of iterations
that are typically required. This number can be estimated as

$$N_{OI} = \frac{\log \epsilon}{\log \lambda_2}, \qquad (15.28)$$

where ε is the required accuracy and $\lambda_2$ is the sub-dominant Eigenvalue of the iteration
matrix. Since we do not have knowledge about $\lambda_2$, we cannot compute $N_{OI}$ a priori, but
we observe that the closer $\lambda_2$ is to $\lambda_1 = 1$, the slower the convergence. Typically, the
Power method converges slowest, and the Gauss-Seidel method typically outperforms the
Jacobi method. With the SOR method, a proper choice of the relaxation factor ω results
in a smaller value for $\lambda_2$ and hence accelerates the iteration process, so that it often is the
fastest method. In practical modelling problems, arising from CTMCs generated from SPN
specifications, $N_{OI}$ can range from just a few to a few thousands.

Example 15.5. Comparing the Power, Jacobi and Gauss-Seidel methods.


We reconsider the CTMC for which the matrix Q is given by (15.4). For the Power method we
obtain the iteration matrix $\Phi_P = I + Q/\lambda$, with $\lambda = 6$, so that

$$\Phi_P = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{6} & \frac{2}{3} & \frac{1}{6} \\ 1 & 0 & 0 \end{pmatrix}.$$

The matrix $A = Q^T$ can be decomposed as $D - (L + U)$, so that we find:

$$\Phi_J = D^{-1}(L + U) = \begin{pmatrix} 0 & \frac{1}{4} & \frac{3}{2} \\ 1 & 0 & 0 \\ \frac{1}{3} & \frac{1}{6} & 0 \end{pmatrix}$$

and

$$\Phi_{GS} = (D - L)^{-1}U = \begin{pmatrix} 0 & \frac{1}{4} & \frac{3}{2} \\ 0 & \frac{1}{4} & \frac{3}{2} \\ 0 & \frac{1}{8} & \frac{3}{4} \end{pmatrix}.$$

These iteration matrices have the following Eigenvalues:

$\Phi_P$: 1.000, 0.408, −0.408;  $\Phi_J$: 1.000, −0.500, −0.500;  $\Phi_{GS}$: 1.000, 0.000, 0.000.



# Power Jacobi Gauss-Seidel


1 ( 0.5000, 0.3333, 0.1667 ) ( 0.5385, 0.3077, 0.1538 ) ( 0.5833, 0.5833, 0.2917 )
2 ( 0.3889, 0.3889, 0.2222 ) ( 0.4902, 0.3137, 0.1961 ) ( 0.4000, 0.4000, 0.2000 )
3 ( 0.4167, 0.3889, 0.1944 ) ( 0.3796, 0.4213, 0.1991 ) ( 0.4000, 0.4000, 0.2000 )
4 ( 0.3981, 0.3981, 0.2037 ) ( 0.3979, 0.4023, 0.1998 )
5 ( 0.4028, 0.3981, 0.1991 ) ( 0.4001, 0.3999, 0.2000 )
6 ( 0.3997, 0.3997, 0.2006 ) ( 0.4000, 0.4000, 0.2000 )
7 ( 0.4005, 0.3997, 0.1998 ) ( 0.4000, 0.4000, 0.2000 )
8 ( 0.3999, 0.3999, 0.2001 ) ( 0.4000, 0.4000, 0.2000 )
9 ( 0.4001, 0.3999, 0.2000 )
10 ( 0.4000, 0.4000, 0.2000 )

Table 15.1: The first few iteration vectors for three iterative solution methods

As starting vector for the iterations we take $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$. In the Jacobi and Gauss-Seidel
method we renormalised the probability vector after every iteration. In Table 15.1 we
show the first ten iteration vectors for these methods. As can be seen, the Power method
converges slowest, followed by the Jacobi and the Gauss-Seidel method. □

15.2 Transient behaviour


In this section we discuss the solution of the time-dependent behaviour of a CTMC. In Sec-
tion 15.2.1 we explain why transient behaviour is of interest and which equations we need
to solve for that purpose. We then continue with the discussion of a simple Runge-Kutta
method in Section 15.2.2. In Section 15.2.3 we proceed with a very reliable method, known
as uniformisation, to compute the time-dependent state probabilities. This method exploits
the special probabilistic properties of the problem at hand. Finally, in Section 15.2.4, we
comment on the use of uniformisation to compute cumulative measures.

15.2.1 Introduction
Up till now we have mainly addressed the use and computation of the steady-state prob-
abilities of CTMCs. In general, steady-state measures do suffice for the evaluation of the
performance of most systems. There are, however, exceptions to this rule, for instance

• when the system life-time is so short that steady-state is not reached;

• when the start-up period towards the steady-state situation itself is of interest;

• when temporary overload periods, for which no steady-state solution exists, are of
interest;

• when reliability and availability properties are taken into account in the model,
e.g., non-repairable systems that are failure-prone are of no interest in steady-state,
since then they will have completely failed.

The time-dependent state probabilities of a CTMC are specified by a linear system of


differential equations (see also (3.39)):

$$\underline{p}'(t) = \underline{p}(t)\,Q, \quad \text{given } \underline{p}(0). \qquad (15.29)$$

Measures that are specified in terms of $\underline{p}(t)$ are called instant-of-time measures. If we
associate a reward $r_i$ with every state, the expected reward at time t can be computed as

$$E[X(t)] = \sum_{i=1}^{N} r_i\,p_i(t). \qquad (15.30)$$

The rewards express the amount of gain (or costs) that is accumulated per unit of time in
state i; $E[X(t)]$ then expresses the overall gain accumulated per time-unit.
In many modelling applications, not only the values of the state probabilities at a time
instance t are of importance, but also the total time spent in any state up to some time
t. That is, we are interested in so-called cumulative measures. We therefore define the
cumulative state vector $\underline{l}(t)$ as

$$\underline{l}(t) = \int_0^t \underline{p}(s)\,ds. \qquad (15.31)$$

Notice that the entries of $\underline{l}(t)$ are no longer probabilities; $l_i(t)$ denotes the overall time
spent in state i during the interval [0, t). Integrating (15.29), we obtain

$$\int_0^t \underline{p}'(s)\,ds = \int_0^t \underline{p}(s)\,Q\,ds, \qquad (15.32)$$

which can be rewritten as

$$\underline{p}(t) - \underline{p}(0) = \underline{l}(t)\,Q, \qquad (15.33)$$

which equals, after having substituted $\underline{l}'(t) = \underline{p}(t)$:

$$\underline{l}'(t) = \underline{l}(t)\,Q + \underline{p}(0). \qquad (15.34)$$



We see that a similar differential equation can be used to obtain $\underline{l}(t)$ as to obtain $\underline{p}(t)$. If
$r_i$ is the reward for staying one time-unit in state i, then

$$Y(t) = \sum_{i=1}^{N} r_i\,l_i(t) \qquad (15.35)$$

expresses the total amount of reward gained over the period [0, t). The distribution
$F_Y(y, t) = \Pr\{Y(t) \leq y\}$ has been defined by Meyer as the performability distribution
[196, 197]; it expresses the probability that a reward of at most y is gained in the period
[0, t). Meyer developed his performability measure in order to express the effectiveness of
use of computer systems in failure prone environments. After the next example we will
present an expression to compute $E[Y(t)]$ efficiently and comment on the computation of
the distribution of $Y(t)$.

Example 15.6. Measure interpretation.


Consider a three-state CTMC with generator matrix

$$Q = \begin{pmatrix} -2f & 2f & 0 \\ r & -(r+f) & f \\ 0 & r & -r \end{pmatrix}.$$

This CTMC models the availability of a computer system with two processors. In state 1
both processors are operational but can fail with rate 2f. In state 2 only one processor is
operational (and can fail with rate f); the other one is repaired with rate r. In state 3 both
processors have failed; one of them is being repaired. Note that we assume that both the
processor life-times and the repair times are negative exponentially distributed. Since in
state 1 both processors operate, we assign a reward $2\mu$ to state 1, where μ is the effective
processing rate of a single processor. Similarly, we assign $r_2 = \mu$ and $r_3 = 0$. We assume
that the system is initially fully operational, i.e., $\underline{p}(0) = (1, 0, 0)$. The following measures
can now be computed:

• Steady-state reward rate ($\sum_i r_i p_i$): the expected processing rate of the system in
steady-state, i.e., the long-term average processing rate of the system;

• Expected instant reward rate ($\sum_i r_i p_i(t)$): the expected processing rate at a particular
time instance t;

• Expected accumulated reward ($\sum_i r_i l_i(t)$): the expected number of jobs (of fixed
length 1) processed in the interval [0, t);

• Finally, the accumulated reward distribution $F_Y(y, t)$ at time t expresses the probability
that at most y jobs (of fixed length 1) have been processed in [0, t). □

15.2.2 Runge-Kutta methods


The numerical solution of systems of differential equations has long been (and still is)
an important topic in numerical mathematics. Many numerical procedures have been
developed for this purpose, all with specific strengths and weaknesses. Below, we will
present one such method in a concise way, thereby focusing on the computation of e(t).
The basic idea with Runge-Kutta methods (RK-methods) is to approximate the continuous
vector function $\underline{p}(t)$ that follows from the differential equation $\underline{p}'(t) = \underline{p}(t)Q$, given
$\underline{p}(0)$, by a discrete function $\underline{\pi}_i$ ($i \in \mathbb{N}$), where $\underline{\pi}_i$ approximates $\underline{p}(ih)$, i.e., h is the fixed
step-size in the discretisation. Normally, $\underline{\pi}_0 = \underline{p}(0)$, and the smaller h, the better (but
more expensive) the solution is.
With Runge-Kutta methods, the last computed value for any point $\underline{\pi}_i$ is used to
compute $\underline{\pi}_{i+1}$. The values $\underline{\pi}_0$ through $\underline{\pi}_{i-1}$ are not used to compute $\underline{\pi}_{i+1}$. Therefore,
Runge-Kutta methods are called single-step methods. They are always stable, provided
the step-size h is taken sufficiently small. Unlike Euler-methods (and variants), Runge-
Kutta methods do not require the computation of derivatives of the function of interest.
Since the latter is normally a costly operation, RK-methods are fairly efficient. There are
many Runge-Kutta schemes; they are distinguished on the basis of their order. A Runge-
Kutta method is of order k if the exact Taylor series for $\underline{p}(t+h)$ and the solution of the
RK-scheme for time instance t + h coincide as far as the terms up to $h^k$ are concerned.
One of the most widely used RK-methods is the RK4 method (Runge-Kutta method of
order 4). For a (vector) differential equation $\underline{p}'(t) = \underline{p}(t)Q$, given $\underline{p}(0)$, successive estimates
for $\underline{\pi}_i$ are computed as follows:

$$\underline{\pi}_{i+1} = \underline{\pi}_i + \frac{h}{6}\left(\underline{k}_1 + 2\underline{k}_2 + 2\underline{k}_3 + \underline{k}_4\right), \qquad (15.36)$$

with

$$\underline{k}_1 = \underline{\pi}_i Q, \quad \underline{k}_2 = \left(\underline{\pi}_i + \tfrac{h}{2}\underline{k}_1\right)Q, \quad \underline{k}_3 = \left(\underline{\pi}_i + \tfrac{h}{2}\underline{k}_2\right)Q, \quad \underline{k}_4 = \left(\underline{\pi}_i + h\,\underline{k}_3\right)Q. \qquad (15.37)$$

Since the RK4 method provides an explicit solution to $\underline{\pi}_i$, it is called an explicit 4th-order
method. Per iteration step of length h, it requires 4 matrix-vector multiplications, 7 vector-
vector additions and 4 scalar-vector multiplications. Furthermore, apart from Q and $\underline{\pi}$ also
storage for at least two intermediate probability vectors is required.
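A direct transcription of (15.36)–(15.37) into Python (illustrative only; a practical solver would choose or adapt the step-size h more carefully):

import numpy as np

def rk4_transient(Q, p0, t, h=0.01):
    # Approximate p(t) for p'(t) = p(t) Q using fixed-step RK4, cf. (15.36)-(15.37).
    steps = int(round(t / h))
    pi = np.asarray(p0, dtype=float)
    for _ in range(steps):
        k1 = pi @ Q
        k2 = (pi + 0.5 * h * k1) @ Q
        k3 = (pi + 0.5 * h * k2) @ Q
        k4 = (pi + h * k3) @ Q
        pi = pi + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return pi

Q = np.array([[-4.0,  2.0,  2.0],
              [ 1.0, -2.0,  1.0],
              [ 6.0,  0.0, -6.0]])
print(rk4_transient(Q, [1.0, 0.0, 0.0], t=2.0))   # close to the steady state (0.4, 0.4, 0.2)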
In contrast, other RK-methods will yield a system of linear equations in which the
vector of interest, i.e., $\underline{\pi}_{i+1}$, appears implicitly. Such methods are normally more expensive
to employ and can therefore only be justified in special situations, e.g., when the CTMC
under study is stiff, meaning that the ratio of the largest and smallest rate appearing in
Q is very large, say of the order of $10^4$ or higher.
There is much more to say about numerical methods to solve differential equations,
however, we will not do so. Instead, we will focus on a class of methods especially developed
for the solution of the transient behaviour of CTMCs in the next section.

15.2.3 Uniformisation
We first consider the scalar differential equation $p'(t) = pQ$, given $p(0)$. From elementary
analysis we know that the solution to this differential equation is $p(t) = p(0)e^{Qt}$. When we
deal with a linear system of differential equations, as appears when addressing CTMCs,
the transient behaviour can still be computed from an exponential function, now in terms
of vectors and matrices:

$$\underline{p}(t) = \underline{p}(0)\,e^{Qt}. \qquad (15.38)$$

Direct computation of this matrix exponential, e.g., via a Taylor series as $\sum_{i=0}^{\infty}(Qt)^i/i!$,
is in general not feasible because [204]: (i) the infinite summation that appears in the
Taylor series cannot be truncated efficiently; (ii) severe round-off errors usually will occur
due to the fact that Q contains positive as well as negative entries; and (iii) the matrices
$(Qt)^i$ become non-sparse, thus requiring too much storage capacity for practically relevant
applications. Although the last drawback can be handled by using larger memories, the
former two drawbacks are more profound. An analysis based on the Eigenvalues and
Eigenvectors of Q is possible, but this is a very expensive way to go, especially if Q is large.
Therefore, other algorithms have been developed of which uniformisation is currently the
most popular.
Uniformisation is based on the more general concept of uniformisation [147] and is also
known as Jensen's method [110] or randomisation [113]. To use uniformisation, we define
the matrix

$$P = I + \frac{Q}{\lambda} \;\Leftrightarrow\; Q = \lambda(P - I). \qquad (15.39)$$

Figure 15.3: A small CTMC and the corresponding DTMC after uniformisation

If λ is chosen such that $\lambda \geq \max_i\{|q_{i,i}|\}$, then the entries in P are all between 0 and 1, while
the rows of P sum to 1. In other words, P is a stochastic matrix and describes a DTMC.
The value of λ, the so-called uniformisation rate, can be derived from Q by inspection.

Example 15.7. Uniformising a CTMC.


Consider the CTMC given by

(15.40)

and initial probability vector -p(O) = (1, 0,O). For the uniformisation rate we find by
inspection: X = 6, so that the corresponding DTMC is given by:

(15.41)

The CTMC and the DTMC are given in Figure 15.3. Cl

The uniformisation of a CTMC into a DTMC can be understood as follows. In the


CTMC, the state residence times are exponentially distributed. The state with the shortest
residence times provides us with the value λ. For that state, one epoch in the DTMC
corresponds to one negative exponentially distributed delay with rate λ, after which one
of the successor states is selected probabilistically. For the states in the CTMC that have
total outgoing rate λ, the corresponding states in the DTMC will not have self-loops. For
states in the CTMC having a state residence time distribution with a rate smaller than
λ (the states having on average a longer state residence time), one epoch in the DTMC
might not be long enough; hence, in the next epoch these states might be revisited. This
is made possible by the definition of P, in which these states have self-loops, i.e., $p_{i,i} > 0$.
Using the matrix P, we can write

$$\underline{p}(t) = \underline{p}(0)e^{Qt} = \underline{p}(0)e^{\lambda(P-I)t} = \underline{p}(0)e^{-\lambda I t}e^{\lambda P t} = \underline{p}(0)e^{-\lambda t}e^{\lambda P t}. \qquad (15.42)$$

We now employ a Taylor-series expansion for the last matrix exponential as follows:

$$\underline{p}(t) = \underline{p}(0)e^{-\lambda t}\sum_{n=0}^{\infty}\frac{(\lambda t)^n}{n!}P^n = \underline{p}(0)\sum_{n=0}^{\infty}\psi(\lambda t; n)\,P^n, \qquad (15.43)$$

where

$$\psi(\lambda t; n) = e^{-\lambda t}\frac{(\lambda t)^n}{n!}, \quad n \in \mathbb{N}, \qquad (15.44)$$

are Poisson probabilities, i.e., $\psi(\lambda t; n)$ is the probability of n events occurring in [0, t) in a
Poisson process with rate λ. Of course, we still deal with a Taylor series approach here;
however, the involved P-matrix is a probabilistic matrix with all its entries between 0 and
1, as are the Poisson probabilities. Hence, this Taylor series "behaves nicely", as we will
discuss below.
discuss below.
Equation (15.43) can be understood as follows. At time t, the probability mass of
the CTMC, initially distributed according to $\underline{p}(0)$, has been redistributed according to the
DTMC with state-transition matrix P. During the time interval [0, t), with probability
$\psi(\lambda t; n)$ exactly n jumps have taken place. The effect of these n jumps on the initial
distribution $\underline{p}(0)$ is described by the vector-matrix product $\underline{p}(0)P^n$. Weighting this vector
with the associated Poisson probability $\psi(\lambda t; n)$, and summing over all possible numbers
of jumps in [0, t), we obtain, by the law of total probability, the probability vector $\underline{p}(t)$.
Uniformisation allows for an iterative solution algorithm in which no matrix-matrix
multiplications take place, and thus no matrix fill-in occurs. Instead of directly computing
(15.43) one considers the following sum of vectors:

$$\underline{p}(t) = \sum_{n=0}^{\infty}\psi(\lambda t; n)\left(\underline{p}(0)P^n\right) = \sum_{n=0}^{\infty}\psi(\lambda t; n)\,\underline{\pi}_n, \qquad (15.45)$$

where $\underline{\pi}_n$, being the state probability distribution vector after n epochs in the DTMC with
transition matrix P, is derived recursively as

$$\underline{\pi}_0 = \underline{p}(0) \quad \text{and} \quad \underline{\pi}_n = \underline{\pi}_{n-1}P, \quad n \in \mathbb{N}^+. \qquad (15.46)$$

Clearly, the infinite sum in (15.45) has to be truncated, say after $k_\epsilon$ iterations or epochs in
the DTMC. The actually computed state probability vector $\underline{\hat{p}}(t)$ is then:

$$\underline{\hat{p}}(t) = \sum_{n=0}^{k_\epsilon}\psi(\lambda t; n)\,\underline{\pi}_n. \qquad (15.47)$$

                  λt
    ε         0.1  0.2   1    2    4    8   16
    0.0005      2    3    6    8   12   19   31
    0.00005     3    3    7   10   14   21   34
    0.000005    3    4    8   11   16   23   37

Table 15.2: The number of required steps $k_\epsilon$ as a function of ε and the product λt

The number of terms that has to be added to reach a prespecified accuracy ε can now be
computed a priori as follows. It can be shown that the difference between the computed
and the exact value of the transient probability vector is bounded as follows:

$$||\underline{p}(t) - \underline{\hat{p}}(t)|| \leq 1 - \sum_{n=0}^{k_\epsilon} e^{-\lambda t}\frac{(\lambda t)^n}{n!}. \qquad (15.48)$$

Thus, we have to find that value of $k_\epsilon$ such that $1 - \sum_{n=0}^{k_\epsilon} e^{-\lambda t}(\lambda t)^n/n! \leq \epsilon$. Stated
differently, we need the smallest value of $k_\epsilon$ that satisfies

$$\sum_{n=0}^{k_\epsilon}\frac{(\lambda t)^n}{n!} \geq \frac{1-\epsilon}{e^{-\lambda t}} = (1-\epsilon)e^{\lambda t}. \qquad (15.49)$$

For reasons that will become clear below, $k_\epsilon$ is called the right truncation point.

Example 15.8. How large should we take $k_\epsilon$.

In Table 15.2 we show the number of required steps $k_\epsilon$ as a function of ε and the product λt
in the uniformisation procedure. As can be observed, $k_\epsilon$ increases sharply with increasing
λt and decreasing ε. □

If the product λt is large, $k_\epsilon$ tends to be of order $O(\lambda t)$. On the other hand, if λt
is large, the DTMC described by P might have reached steady-state along the way, so

is large, the DTMC described by P might have reached steady-state along the way, so
that the last matrix-vector multiplications do not need to be performed any more. Such a
steady-state detection can be integrated in the computational procedure (see [213]).

Example 15.9. Transient solution of the three-state CTMC.


We consider the transient solution of the CTMC given in Figure 15.3; we already performed
the uniformisation to form the matrix P with uniformisation rate $\lambda = 6$.
We first establish how many steps we have to take into account for increasing time
values t. This number can be computed by checking the inequality (15.49) and taking
$\epsilon = 10^{-4}$. We find:

     t     0.1  0.2  0.5   1    5   10   20   50   100
   $k_\epsilon$    5    7   11   17   52   91  163  367   693
We then continue to compute p̂(t) by adding the appropriate number of vectors π_n, weighted
by the Poisson probabilities according to (15.47), to find the curves for p_i(t) as indicated
in Figure 15.4. As can be observed, for t ≥ 2 steady-state is reached. Hence, although
for larger values of t we require very many steps to be taken, the successive vectors π_n do
not change any more after some time. This can be detected during the computations. To
that end, denote with k_ss < k_ε the value after which π_n does not change any more. Instead
of explicitly computing the sum (15.47) for all values of n, the last part of it can then be
computed more efficiently as follows:

p̂(t) = Σ_{n=0}^{k_ε} ψ(λt; n) π_n = Σ_{n=0}^{k_ss} ψ(λt; n) π_n + (Σ_{n=k_ss+1}^{k_ε} ψ(λt; n)) π_{k_ss},     (15.50)

thus saving the computation-intensive matrix-vector multiplications in the last part of the
sum. The point k_ss is called the steady-state truncation point.
If the product λt is very large, the first group of Poisson probabilities is very small,
often so small that the corresponding vectors π_n hardly contribute to the sum. We can exploit
this by only starting to add the weighted vectors π_n after the Poisson weighting factors become
reasonably large. Of course, we still have to compute the matrix-vector products (15.46).
The point where we start to add the probability vectors is called the left truncation point
and is denoted k_0.                                                                      □

It should be noted that using the precomputed value of k_ε, the truncation error is
bounded by ε; whether round-off introduces extra error is a separate issue. Finally, we
note that the Poisson probabilities ψ(λt; n), n = 0, …, k_ε, can be computed efficiently
when taking into account the following recursive relations:

ψ(λt; 0) = e^{−λt},   and   ψ(λt; n + 1) = ψ(λt; n) · λt/(n + 1),   n ∈ ℕ.     (15.51)

When λt is large, say larger than 25, overflow might easily occur. However, for these
cases, the normal distribution can be used as an approximation. Fox and Glynn recently
proposed a stable algorithm to compute Poisson probabilities [96].
To use uniformisation, the sparse matrix P has to be stored, as well as two probability
vectors. The main computational complexity lies in the k_ε matrix-vector multiplications
that need to be performed (and the subsequent multiplication of these vectors with the
appropriate Poisson probabilities; these are precomputed once). We finally remark that
numerical instabilities generally do not occur when using uniformisation, since all compu-
tational elements are probabilities.
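
Putting the pieces together, a minimal Python/NumPy sketch of the standard uniformisation
procedure could look as follows. It assumes that the uniformised DTMC matrix P and the rate λ
have already been constructed; all names are illustrative, and for very large λt a numerically
stable computation of the Poisson weights (e.g., Fox and Glynn [96]) should be substituted.

    import numpy as np

    def uniformise_transient(p0, P, lam, t, eps=1e-7, ss_tol=1e-12):
        """Approximate p(t) via (15.47), assuming P = I + Q/lam.

        p0  : initial probability vector p(0) as a 1-D array
        P   : uniformised DTMC matrix (dense here; a sparse matrix works equally well)
        lam : uniformisation rate (at least the largest total outgoing rate)"""
        lam_t = lam * t
        psi = np.exp(-lam_t)              # psi(lam*t; 0), cf. (15.51)
        cumulative = psi
        pi_n = np.array(p0, dtype=float)  # pi_0 = p(0), cf. (15.46)
        p_hat = psi * pi_n
        n = 0
        while cumulative < 1.0 - eps:     # stop at the right truncation point, cf. (15.49)
            n += 1
            pi_next = pi_n @ P            # one vector-matrix multiplication, (15.46)
            if np.max(np.abs(pi_next - pi_n)) < ss_tol:
                # steady-state detected: all remaining Poisson mass multiplies pi_n, cf. (15.50)
                p_hat += (1.0 - cumulative) * pi_n
                return p_hat
            pi_n = pi_next
            psi *= lam_t / n              # Poisson recursion (15.51)
            cumulative += psi
            p_hat += psi * pi_n
        return p_hat

For the three-state CTMC of Example 15.9, one would call uniformise_transient(p0, P, 6.0, t)
with the matrix P constructed there.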

Figure 15.4: First two seconds in the evolution of the three-state CTMC computed via
uniformisation

15.2.4 Cumulative measures

As we have seen, in many modelling applications, not only the values of the state probabil-
ities at a time instant t are of importance, but also the total time spent in any state up
to some time t is often of interest. We therefore defined the cumulative state vector l(t)
and

Y(t) = Σ_{i=1}^{N} r_i l_i(t),     (15.52)

which expresses the total amount of reward gained over the period [0, t). Below, we will
present an expression to compute E[Y(t)] efficiently and comment on the computation of
the distribution of Y(t).
In an interval [0, t), the expected time between two jumps, when k jumps have taken
place according to a Poisson process with rate λ, equals t/(k + 1). The expected accumulated
reward until time t, given k jumps, then equals

(t/(k + 1)) Σ_{i=1}^{N} r_i Σ_{m=0}^{k} π_{m,i}.

Summing this expression over all possible numbers of jumps during the interval [0, t) and
weighting these possibilities accordingly, we obtain:

E[Y(t)] = Σ_{k=0}^{∞} ψ(λt; k) (t/(k + 1)) Σ_{i=1}^{N} Σ_{m=0}^{k} r_i π_{m,i},     (15.53)

where π_{m,i} denotes the i-th component of π_m. Based on this expression, efficient
numerical procedures can be devised [263, 264].
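
As a simple illustration of (15.53), the sketch below accumulates E[Y(t)] alongside the
uniformisation iteration; it assumes a reward vector r with one reward rate per state and
is only a direct evaluation of (15.53), not the more refined algorithms of [263, 264].

    import numpy as np

    def expected_accumulated_reward(p0, P, lam, r, t, eps=1e-7):
        """E[Y(t)] according to (15.53): for every number of jumps k, the Poisson
        weight psi(lam*t; k) multiplies t/(k+1) times the rewards gathered over
        the DTMC epochs 0, ..., k."""
        lam_t = lam * t
        psi = np.exp(-lam_t)
        cumulative = psi
        pi_m = np.array(p0, dtype=float)
        partial = float(pi_m @ r)              # sum_{m=0}^{k} pi_m r, here for k = 0
        ey = psi * t * partial                 # k = 0 term: t/(0+1) = t
        k = 0
        while cumulative < 1.0 - eps:
            k += 1
            pi_m = pi_m @ P
            partial += float(pi_m @ r)
            psi *= lam_t / k
            cumulative += psi
            ey += psi * (t / (k + 1)) * partial
        return ey
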
We finally comment on the solution of the performability distribution F_Y(y, t) =
Pr{Y(t) ≤ y}. Also here, uniformisation can be employed; however, a direct summa-
tion over all states does not suffice any more. Instead, we have to sum the accumulated
reward over all paths of a given length (given a starting state) that can be taken through the
DTMC, after which we have to compute a weighted sum over all these paths and their
occurrence probabilities. De Souza e Silva and Gail have developed nice recursive solution
procedures to efficiently compute F_Y(y, t) [263, 264]; these go beyond the scope of this book.

15.3 Further reading


For more details on the numerical solution of Markov chains (with a focus on steady-state
probabilities) the recent books by Stewart [268] and Buchholz et al. [33] provide excellent
material. The overview paper by Krieger et al. [165], and the comparisons by Stewart
[266, 267] can also be recommended. General information on numerical methods for the
solution of linear systems of (differential) equations can also be found in textbooks on
numerical analysis [105, 116, 254].
Meyer introduced the concept of performability [195, 196, 200, 197, 198, 199], for which
uniformisation turned out to be a very important computational method. Uniformisation
for CTMCs has been presented by Gross and Miller [113], Grassmann [109, 108, 110], Van
Dijk [73] and originally by Jensen [147]. Apart from the standard uniformisation procedure
we have presented here, various variants do exist, most notably adaptive uniformisation
[208], partial uniformisation, dynamic uniformisation, orthogonal uniformisation and lay-
ered uniformisation; the Ph.D. thesis of van Moorsel provides an excellent overview of
uniformisation and its applications [206]. Van Moorsel and Haverkort discuss the use of
partial uniformisation in the context of probabilistic validation [207]. De Souza e Silva
and Gail as well as Qureshi and Sanders apply uniformisation for the computation of
performability measures [236, 262, 263, 264].
For CTMCs with special properties, uniformisation might be second best. For instance,
for CTMCs that are stiff, ordinary differential equation solution methods might be better;
Reibman et al. present comparisons in [239, 240, 238]. A procedure to handle the stiffness
based on aggregation is proposed by Bobbio and Trivedi [23]. For acyclic CTMCs, a special
algorithm known as ACE has been developed to compute the transient state probabilities
efficiently, although this algorithm is not always numerically stable [190].
Trivedi et al. [281] and Haverkort and Trivedi [121] discuss the use of Markov-reward
models for performance, reliability and performability evaluation. Haverkort and Niemegeers
recently discussed software tools to support performability evaluation [125].

15.4 Exercises
15.1. Direct methods for steady-state.
A four-state CTMC is given by its generator matrix:

Compute the steady-state probability distribution for this CTMC, thereby handling the
normalisation equation separately, using the following methods:

1. Gaussian elimination.

2. LU-decomposition in the Doolittle variant.

3. LU-decomposition in the Crout variant.

15.2. Iterative methods for steady-state.


Reuse the four-state CTMC of the previous exercise. Compute the steady-state probability
distribution for this CTMC (again handling the normalisation equation separately) using
the following methods:

1. The Power method (how large is λ?).

2. The Jacobi iterative method.

3. The Gauss-Seidel method.



15.3. Computing transient probabilities.


Reuse the four-state CTMC of the previous exercise and assume that p(0) = (1, 0, 0, 0).
Compute the transient probability distribution for t = 1 using the following methods:

1. The RK4-method (how large should h be?).

2. The uniformisation method, thereby also addressing the following questions:

(a) How large is the uniformisation rate λ?


(b) How large is k_ε for ε = 10⁻ⁿ, n = 1, 2, 3, 4, 5?

15.4. The RK4 method.


Show that the coefficients in the RK4 method have been chosen such that they yield the
first four terms of the Taylor-series expansion of p(t).

Chapter 16

Stochastic Petri net applications

I N this chapter we address a number of applications of the use of SPN models. All
the addressed applications include aspects that are very difficult to capture by other
performance evaluation techniques. The aim of this chapter is not to introduce new theory,
but to make the reader more familiar with the use of SPNs.
We start with SPN models of a multiprogramming computer system in Section 16.1.
We will show how to use exact SPN models for multiprogramming models including paging
phenomena. Although the SPN-based solution approach is more expensive than one based
on queueing networks, this study shows how to model system aspects that cannot be
coped with by traditional queueing models. Then, in Section 16.2, we discuss SPN-based
polling models for the analysis of token ring systems and traffic multiplexers. These models
include aspects that could not be addressed with the techniques presented in Chapter 9.
Since some of the models become very large, i.e., the underlying CTMC becomes very large,
we also discuss a number of approximation procedures. We then present a simple SPN-
based reliability model for which we will perform a transient analysis in Section 16.3. We
finally present an SPN model of a very general resource reservation system in Section 16.4.

16.1 Multiprogramming systems


We briefly recall the most important system aspects to be modelled in Section 16.1.1. We
then present the SPN model and perform invariant analysis in Section 16.1.2. We present
some numerical results in Section 16.1.3.

16.1.1 Multiprogramming computer systems


We consider a multiprogramming computer system at which K system users work; they sit
behind their terminals and issue requests after a negative exponentially distributed think
time E[Z]. Requests are accepted by the system, but not necessarily put in operation
immediately since there exists a multiprogramming limit J, such that at most J customers
are actively being processed. For its processing, the system uses its CPU and two disks,
one of which is used exclusively for handling paging I/O. After having received a burst
of CPU time, a customer either has to do a disk access (user I/O), or a new page has to
be obtained from the paging device (page I/O), or a next CPU burst can be started, or
an interaction with the terminal is necessary. In the latter case, the customer’s code will
be completely swapped out of main memory. We assume that every customer receives an
equal share of the physical memory.

16.1.2 The SPN model

The SPN depicted in Figure 16.1 models the above sketched system. Place terminal
models the users sitting behind their terminals; it initially contains K tokens. The think
time is modelled by transition think; its rate is #terminals/E[Z], i.e., it is proportional
to the number of users thinking. The place swap models the swap-in queue. Only if there
are free pages available, modelled by available tokens in free, is a customer swapped in,
via immediate transition getmem, to become active at the CPU. The place used is used to
count the number of users being processed: it contains J - #free tokens. After service at
the CPU, a customer moves to place decide. There, it is decided what the next action the
customer will undertake is: it might return to the terminals (via the immediate transition
freemem), it might require an extra CPU burst (via the immediate transition reserve),
or it might need I/O, either from the user-disk (via the immediate transition user-io) or
from the paging disk (via the immediate transition page-io). The weight to be associated
with transition page-io is made dependent on the number of customers being worked
upon, i.e., on the number of tokens in place used, so as to model increased paging activity
if there are more customers being processed simultaneously.
We are interested in the throughput and (average) response time perceived at the ter-
minals. Furthermore, to identify bottlenecks, we are interested in computing the utilisation
of the servers and the expected number of customers active at the servers (note that we
can not directly talk about queue lengths here, although a place like cpu might be seen as
a queue that holds the customer in service as well and where serve is the corresponding

Figure 16.1: An SPN model of a multiprogramming computer system

server). The measures of interest can be defined and computed as follows:


• The component utilisations are defined as the probability that the corresponding
  places are non-empty. As an example, for the CPU, we find:

  ρ_cpu = Σ_{m: #cpu(m)>0} Pr{m}.     (16.1)

• The expected number of tokens in a component is expressed as follows (as an example,
  we again consider the CPU):

  E[N_cpu] = Σ_m #cpu(m) Pr{m}.     (16.2)

• The total number of customers being operated upon and the number of customers
  waiting is denoted as the number in the system (syst):

  E[N_syst] = Σ_m (#used(m) + #swap(m)) Pr{m}.     (16.3)

• The throughput perceived at the terminals:

  X_t = Σ_m (#terminals(m)/E[Z]) Pr{m}.     (16.4)

• With Little's law, we finally can express the expected system response time as:

  E[R] = E[N_syst]/X_t.     (16.5)
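
All of the measures (16.1)–(16.5) follow directly from the steady-state marking probabilities
of the underlying CTMC. The following Python sketch illustrates this computation; the
representation of markings as dictionaries of place names and the function name are
illustrative assumptions, not notation used in the text.

    def multiprogramming_measures(dist, e_z):
        """dist: list of (prob, marking) pairs over the tangible markings, where a
        marking maps place names such as 'cpu', 'used', 'swap' and 'terminals'
        to token counts; e_z is the mean think time E[Z]."""
        rho_cpu  = sum(p for p, m in dist if m['cpu'] > 0)              # (16.1)
        e_n_cpu  = sum(p * m['cpu'] for p, m in dist)                   # (16.2)
        e_n_syst = sum(p * (m['used'] + m['swap']) for p, m in dist)    # (16.3)
        x_t      = sum(p * m['terminals'] / e_z for p, m in dist)       # (16.4)
        e_r      = e_n_syst / x_t                                       # (16.5), Little's law
        return rho_cpu, e_n_cpu, e_n_syst, x_t, e_r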

Before we proceed to the actual performance evaluation, we can compute the place invari-
ants. In many cases, we can directly obtain them from the graphical representation of the
SPN, as is the case here. Some care has to be taken regarding places that will only contain
tokens in vanishing markings (as decide in this case). We thus find the following place
invariants:

free + used = J, free + cpu + decide + user-disk + page-disk = J,


terminals + swap + cpu + decide + user-disk + page-disk = K.

16.1.3 Some numerical results


To evaluate the model, we have assumed the following numerical parameters: the number
of terminals K = 30, the think time at the terminals E[Z] = 5, the service time at the cpu
E[S_cpu] = 0.02, the service time at the user disk E[S_user-disk] = 0.1, and the service time
at the paging device E[S_page-disk] = 0.0667. The weights of the immediate transitions are,
apart from one, all taken constant: W(getmem) = 1, W(freemem) = 0.1, W(reserve) =
0.5, and W(user-io) = 0.2. Finally, we have set W(page-io, m) = 0.2 + 0.04 · #used(m),
i.e., the load on the paging device increases as the number of actually admitted customers
increases. When we exclude paging effects, we simply set W(page-io) = 0.2. In the
latter case, the five transitions that are enabled when decide contains a token form a
probabilistic switch and the weights can be interpreted as fixed routing probabilities. In
what follows, we will compare the performance of the multiprogrammed computer system
for increasing multiprogramming degree J, when paging is taken into account, or not.
First of all, we study the throughput perceived at the terminals for increasing J in
Figure 16.2. When paging is not taken into account, we see an increase of the throughput
for increasing J, until a certain maximum has been reached. When paging is included in
the model, we observe that after an initial increase in throughput, a dramatic decrease in
throughput takes place. By allowing more customers, the paging device will become more
heavily loaded and the effective rate at which customers are completed decreases. Similar

Figure 16.2: The terminal throughput X_t as a function of the multiprogramming limit J
when paging effects are modelled and not modelled

Figure 16.3: The expected response time E[R] as a function of the multiprogramming limit
J when paging effects are modelled and not modelled

Figure 16.4: The mean number of customers in various components as a function of the
multiprogramming limit J when paging is not modelled

Figure 16.5: The mean number of customers in various components as a function of the
multiprogramming limit J when paging is modelled

observations can be made from Figure 16.3 where we compare, for the same scenarios, the
expected response times. Allowing only a small number of customers leaves the system
resources largely unused and only increases the expected number of tokens in place swap.
Allowing too many customers, on the other hand, causes the extra paging activity which,
in the end, overloads the paging device and causes thrashing to occur.
To obtain slightly more detailed insight into the system behaviour, we also show the
expected number of customers in a number of queues, when paging is not taken into account
(Figure 16.4) and when paging is taken into account (Figure 16.5). As can be observed,
the monotonic behaviour of the expected place occupancies in the model without paging
changes dramatically if paging is included in the model. First of all, by increasing the
multiprogramming limit above a certain value (about 8 or 9) the number of customers
queued at the CPU decreases, simply because more and more customers start to queue
up at the paging device (the sharply increasing curve). The swap-in queue (place swap)
does not decrease so fast in size any more when paging is modelled; it takes longer for
customers to be completely served (including all their paging) before they return to the
terminals; hence, the time they spend at the terminals becomes relatively smaller (in a
complete cycle) so that they have to spend more time in the swap-in queue.
We finally comment on the size of the underlying reachability graph and CTMC of
this SPN. We therefore show in Table 16.1 the number of tangible markings TM (which
equals the number of states in the CTMC), the number of vanishing markings VM (which
need to be removed during the CTMC construction process) and the number of nonzero
entries (η) in the generator matrix of the CTMC, as a function of the multiprogramming
limit J. Although the state space increases for increasing J, the models presented here can
still be evaluated within reasonable time; for J = 30 the computation time remains below
60 seconds (Sun Sparc 20). Notice, however, that the human-readable reachability graph
requires about 1.4 Mbyte of storage.

16.2 Polling models


In this section we discuss the use of SPNs to specify and solve polling models for the analysis
of token ring systems. A class of cyclic polling models with count-based scheduling will be
discussed in Section 16.2.1, whereas Section 16.2.2 is devoted to cyclic polling models with
local time-based scheduling. We then comment on some computational aspects for large
models in Section 16.2.3. We finally comment on the use of load-dependent visit-orderings
in Section 16.2.4.

  J    TM    VM      η      J    TM    VM      η      J    TM    VM      η
  1    92    59    327     11  1846  3014  14022     21  4301  7469  34617
  2   178   173    873     12  2093  3458  16068     22  4508  7843  36363
  3   290   338   1650     13  2345  3913  18165     23  4700  8188  37980
  4   425   550   2640     14  2600  4375  20295     24  4875  8500  39450
  5   581   805   3825     15  2856  4840  22440     25  5031  8775  40755
  6   756  1099   5187     16  3111  5304  24582     26  5166  9009  41877
  7   948  1428   6708     17  3363  5763  26703     27  5278  9198  42798
  8  1155  1788   8370     18  3610  6213  28785     28  5365  9338  43500
  9  1375  2175  10155     19  3850  6650  30810     29  5425  9425  43965
 10  1606  2585  12045     20  4081  7070  32760     30  5456  9920  44640

Table 16.1: The number of tangible and vanishing states and the number of nonzero entries
in the CTMCs for increasing multiprogramming limit J

Figure 16.6: SPN model of a station with exhaustive scheduling strategy



16.2.1 Count-based, cyclic polling models


Using SPNs we can construct Markovian polling models of a wide variety. However, the
choice for a numercial solution of a finite Markov chain implies that only finite-buffer (or
finite-customer) systems can be modelled, and that all timing distributions are exponential
or of phase-type. Both these restrictions do not imply fundamental problems; however,
from a practical point of view, using phase-type distributions or large finite buffers results
in large Markovian models which might be too costly to generate and solve. The recent
developments in the use of so-called DSPNs [182, 183, 184] allow us to use deterministically
timed transitions as well, albeit in a restricted fashion.
Ibe and Trivedi discuss a number of SPN-based cyclic server models [142]. A few
of them will be discussed here. First consider the exhaustive service model of which we
depict a single station in Figure 16.6. The overall model consists of a cyclic composition of a
number of station submodels. Tokens in place passive indicate potential customers; after
an exponentially distributed time, they become active (they are moved to place active
where they wait until they are served). When the server arrives at the station, indicated
by a token in place token, two situations can arise. If there are customers waiting to be
served, the service process starts, thereby using the server and transferring customers from
place active to passive via transition serve. After each service completion, the server
returns to place token. Transition serve models the customer service time. If there are
no customers waiting (anymore) to be served (place active is empty), transition serve
is disabled and the server is transferred via the immediate transition direct-switch to
place switch. After the switch-over (transition switch-over), the server arrives at the
place token of the next station.
Using the SPN model an underlying Markov chain can be constructed and solved.
Suppose, for station i, the initial number of customers in place passive equals n_i and the
rate of customers arriving via transition arrive equals λ_i. Then, from the SPN analysis
we can obtain E[N_active,i], the expected number of customers in place active, and α_i, the
probability that place passive is empty. The effective arrival rate of customers to place
active is (1 − α_i)λ_i, since only when passive is non-empty is the arrival transition enabled.
Using Little's law, the expected response time then equals

E[R_i] = E[N_active,i] / ((1 − α_i)λ_i).     (16.6)

In Figure 16.7 we depict a similar model for the case when the scheduling strategy is
k-limited. Notice that when k = 1, a simpler model can be used. If a token arrives
at such a station, three situations can occur. Either there is nothing to send, so that

Figure 16.7: SPN model of a station with k-limited scheduling strategy

immediate transition direct fires and the token is forwarded to the next station. If there
are customers waiting, at most k of them can be served (transition serve is inhibited as
soon as count contains k tokens). Transition enough then fires, thus resetting place count
to zero and taking the token into place prepare. Then, transition flush can fire, taking
all tokens in place count (note the marking dependent arc multiplicity which is chosen to
equal the number of tokens in place count; this number can also be zero, meaning that
the arc is effectively not there), and preparing the token for the switch-over to the next
station. When there are less than k customers queued upon arrival of the token, these
can all be served. After their service, only transition direct can fire, putting a token in
prepare. Then, as before, transition flush fires and removes all tokens from count.
As can be observed here, the SPN approach towards the modelling of polling systems
provides great flexibility. Not only can we model most "standard" polling mechanisms,
we can also combine them as we like. Since the underlying CTMC is solved numerically,
dealing with asymmetric models does not change the solution procedure. Notice that we
have used Poisson arrival processes in the polling models of Chapter 9. In the models
presented, we approximate these by using a place passive (per station submodel) as a
finite source and sink of customers; by making the initial number of tokens in this place
larger, we approximate the Poisson process better (and make the state space larger). Of
course, we can use other arrival processes as well, i.e., we can use any phase-type renewal
process, or even non-renewal processes.

Figure 16.8: SPN-based station model with local, exponentially distributed THT

16.2.2 Local time-based, cyclic polling models


The SPN models presented so far all exhibit count-based scheduling. As we have seen
before, time-based scheduling strategies are often closer to reality. We can easily model
such time-based polling models using SPNs as well.
Consider the SPN as depicted in Figure 16.8. It represents a single station of a polling
model. Once the token arrives at the station, i.e., a token is deposited in place token, two
possibilities exist:

1. There are no customers buffered: the token is immediately released and passed to
the next station in line, via the immediate transition direct;

2. There are customers buffered: these customers are served and simultaneously, the
token holding timer (THT) is started. The service process can end in one of two
ways:

• The token holding timer expires by the firing of transition timer, in which case
the token is passed to the next downstream station and the serving of customers
stops;
• All customers are served via transition serve before the token holding timer
expires: the token is simply forwarded to the next downstream station.

Instead of using a single exponential transition to model the token holding timer, one
can also use a more deterministic Erlang-J distributed token holding timer as depicted in
Figure 16.9. The number J of exponential phases making up the overall Erlang distribution

Figure 16.9: SPN-based station model with local, Erlang-J distributed THT

is present in the model via the multiplicity of the arc from count to expire and the rate of
transition timer, which equals J/tht; place count now counts the number of phases in the
Erlang-J distributed timer that have been passed already. The operation of this SPN is
similar to the one described before; the only difference is that transition timer now needs
to fire J times before the THT has expired (and transition expire becomes enabled). In
case all customers have been served but the timer has not yet expired, transition direct
will fire and move the token to the next station. Notice that one of its input arcs is marking
dependent; its multiplicity equals the number of tokens in place count.

Example 16.1. The influence of the THT in a symmetric model.


Consider a 3-station cyclic polling model as depicted in Figure 16.9 with J = 2. The
system is fully symmetric but for the THT values per station: we have tht_1 varying whereas
tht_2 = tht_3 = 0.2. The other system parameters are λ = 3, E[S] = 0.1 (exponentially distributed)
and δ = 0.05 (exponentially distributed).
In Figure 16.10 we depict the average waiting times perceived at station 1 and stations
2 and 3 (the latter two appear to be the same) when we vary tht_1 from 0.05 through 2.0
seconds. As can be observed, with increasing tht_1, the performance of station 1 improves
at the cost of stations 2 and 3.                                                         □

Figure 16.10: The influence of tht_1 on the average waiting times in a symmetric system

  J      K      η    E[W]   E[N_q]      J      K      η    E[W]   E[N_q]
  1   4488  20136  0.778    1.731       5  12648  58056  0.648    1.482
  2   6528  29616  0.699    1.580       6  14688  67538  0.642    1.471
  3   8568  39096  0.671    1.527       7  16728  77016  0.638    1.463
  4  10608  48576  0.657    1.499       8  18768  86496  0.635    1.457

Table 16.2: The influence of the variability of the THT

Example 16.2. Erlang-J distributed THT.


Consider a symmetric cyclic polling model consisting of N = 3 stations of the form as
depicted in Figure 16.9. As system parameters we have λ = 2, E[S] = 0.1 (exponentially
distributed) and δ = 0.05 (exponentially distributed), and tht = 0.2 for all stations.
In Table 16.2 we show, for increasing J, the state space size K and the number of
non-zero entries η in the generator matrix Q of the Markov chain, which is a good measure for
the required amount of computation, the expected waiting time, and the expected queue
length. As can be observed, when the THT becomes more deterministic (when J increases)
the performance improves. This is due to the fact that variability is taken out of the model.
□

Figure 16.11: The influence of the THT on the average waiting times in an asymmetric
system
Example 16.3. The influence of the THT in an asymmetric model.


Consider a 2-station polling model as depicted in Figure 16.9 with J = 4. Furthermore, we
have E[S_i] = 0.5 (exponentially distributed) and δ_i = 0.1 (exponentially distributed). The
asymmetry exists in the arrival rates: λ_1 = 0.8 and λ_2 = 0.2. The system is moderately
loaded: p = 0.5.
In Figure 16.11 we show E[W_1] and E[W_2] as a function of tht, which is the same in both
stations. For small values of the THT, the system behaves approximately as a 1-limited
system and in the higher loaded station (1) a higher average waiting time is perceived.
When the THT becomes very large, the system behaves as an exhaustive service system in
which station 1 dominates and station 2 suffers. Indeed, for tht = 100 (not in the figure),
we find the limiting values E[W_1] = 0.616 and E[W_2] = 0.947 (see also Chapter 9). The
performance perceived at station 2 is worse than at station 1, despite its lower load.
When increasing the THT, E[W_1] monotonically decreases: the larger the THT the
more station 1 profits. For station 2 this is not the case. When the THT increases, station
2 first profits from the increase in efficiency that is gained. However, when the THT grows
beyond 1.5, station 2 starts to suffer from the dominance of station 1.                  □

Figure 16.12: Folding an N-station model to an approximate 2-station model

16.2.3 Approximating large models

An advantage of the SPN approach is that asymmetric models are as easy to solve as
symmetric models and that different scheduling strategies can be easily mixed. An inherent
problem with this approach is that the state-space size increases rapidly with the number
of stations and the maximum number of customers per station.
To cope with the problem of very large Markovian models, one can go at least two ways.
One can try to exploit symmetries in the model in order to reduce the state space. This can
sometimes be done in an exact way, in other circumstances only approximately. Another
way to go is to decompose the model and to analyse the submodels in isolation (divide and
conquer). Again, this can sometimes be done exactly, in other cases only approximately.

Exploiting symmetries: folding. When the model is highly symmetric it is possible to


exploit this by folding together states which are statistically equivalent or near equivalent.
As an example of this, consider the case where we model an N-station 1-limited polling
model where stations 2, …, N are statistically the same. About station 1 we do not make
any assumptions. Instead of modelling stations 2 through N separately, we can also fold
them together, i.e., model them as one station with an increased arrival rate which is
visited N − 1 times in a row, each time including a switch-over time, before a visit to
station 1 occurs. This approach is illustrated in Figure 16.12 where basically two stations
are depicted; however, the station on the right “models” stations 2 through N. The extra
places in station 2 model a counter that takes care of visiting the folded model N - 1 times
before visiting station 1 again; for details, see [50]. In this approach, the service of station
1 is interrupted every now and then. The interruption duration, i.e., the residence time

Figure 16.13: Approximate model M_i for a single station in a polling model

of the server at stations 2 through N, is modelled fairly much in detail. A system aspect
that is lost in this approach is the ordering of the stations. Suppose that, when N = 8, in
the unfolded model the server is at station 4 and an arrival takes place at station 2. The
server would not serve this job before going to station 1 first. In the folded model, however,
it might be the case, dependent on other buffer conditions, that this customer is served
before the server moves to station 1, simply because the station identity of the customer
is lost. Choi and Trivedi report fairly accurate results with this approach even though the
state-space size of the folded model is only a few percent of the unfolded model [50]. They
have successfully applied this strategy in the analysis of client-server architectures with
token ring and Ethernet communication infrastructures [140].
When the polling strategy itself would have been symmetric, i.e., when we would have
dealt with a polling table with p_{i,j} = 1/N, the folding strategy would have yielded exact
results. In such a case, the folding technique corresponds to the mathematically exact
technique of state lumping in CTMCs. The software tool UltraSAN [68] supports this kind
of lumping automatically.

Model decomposition: fixed-point iteration. When the models are asymmetric, a


folding procedure cannot be followed. Instead, one can employ a procedure in which all the
stations are analysed individually, thereby taking into account the "server unavailability"
due to service granted at other stations. Since these latter quantities are unknown in
advance, one can initially guess them. Using these guesses, a more exact approximation
can be derived which can again be used to obtain a better approximation, etc.
Using such so-called fixed-point iteration techniques, Choi and Trivedi derive results for
large asymmetric polling models with acceptable accuracy [50]. For every station (indexed

i), they solve a model (named M_i) similar to the one presented in Figure 16.13. In M_i,
transition others models the time it takes to visit and serve all other stations j ≠ i. Seen
from station i, this time is just a vacation for the server; what the server actually does
during this time is not important for station i. Initially, a reasonable guess is made for the
rate of others, e.g., the reciprocal value of the sum of the switch-over times. From M_i one
can calculate the probability p_i(k) of having k ∈ {0, …, n_i} customers queued in place
buffer (n_i is the initial number of tokens in place passive in station i). The expected
delay the token perceives when passing through station i then equals

d_i = Σ_{k=0}^{n_i} d_i(k) p_i(k),

where d_i(k) = δ_i + k/μ_i, δ_i is the switch-over time starting from station i, and μ_i is the
service rate at station i. When all the values d_i have been calculated, the mean server
vacation time perceived at station i equals D_i = Σ_{j≠i} d_j. The reciprocal value 1/D_i can
then be used as the new guess for the rate at which transition others completes in model
M_i. This process is iterated until two successive values of D_i do not differ by more than a
prescribed error tolerance from one another.
An intrinsic assumption in this approach is that the server unavailability time perceived
by a station is exponentially distributed. This is generally not the case in practice. Using
phase-type distributions, this assumption might be relaxed.
From the analysis of M_i with the converged value 1/D_i for the rate of others, various
performance measures can easily be derived as before. Choi and Trivedi report relative
errors on the mean response time per node of less than 1% for low utilisations (less than
50%) up to less than 10% for larger utilisations, when compared to the exact analysis of the
complete models (when these complete models can still be solved). The solution time of
the fixed-point iteration was reported to be only a few percent of the time required to solve
the overall models. The fact that the sketched procedure indeed leads to a unique solution
relies on the fixed-point theorem of Brouwer and is extensively discussed by Mainkar and
Trivedi [189].
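
In outline, the fixed-point iteration can be coded as follows; solve_station stands for the
numerical solution of the single-station SPN M_i (returning the queue-length distribution
p_i(·) seen by the token) and is not spelled out here, and all names and the convergence
test are illustrative assumptions.

    def fixed_point_polling(stations, solve_station, tol=1e-6, max_iter=100):
        """stations: list of dicts with keys 'delta' (switch-over time) and
        'mu' (service rate). solve_station(i, vacation_rate) must return the
        distribution p_i(0), ..., p_i(n_i) of the queue length found at station i."""
        N = len(stations)
        # initial guess: rate of 'others' = 1 / (sum of the other switch-over times)
        D = [sum(s['delta'] for j, s in enumerate(stations) if j != i)
             for i in range(N)]
        for _ in range(max_iter):
            d = []
            for i, s in enumerate(stations):
                p = solve_station(i, 1.0 / D[i])                   # solve model M_i
                d.append(sum((s['delta'] + k / s['mu']) * p[k]     # d_i = sum_k d_i(k) p_i(k)
                             for k in range(len(p))))
            D_new = [sum(d[j] for j in range(N) if j != i) for i in range(N)]
            if max(abs(a - b) for a, b in zip(D_new, D)) < tol:    # successive D_i close enough?
                return D_new
            D = D_new
        return D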

16.2.4 Polling with load-dependent visit ordering


In the polling models we have discussed so far, the visit-ordering of the stations has been
fixed. We now address a two-station polling model where the visit-order is dependent
on the length of the queues in the two stations. Of course, for such an ordering to be
practically feasible, the server attending the queues must have knowledge of the status of

all the queues, otherwise a proper decision cannot be taken. We therefore restrict ourselves
to a simple case, with only two queues; we furthermore assume that the switch-over delays
are equal to zero. Under these assumptions, this polling model can be seen as a model of
a traffic multiplexer for two classes of traffic.
Lee and Sengupta recently proposed such a polling mechanism as a flexible priority
mechanism to be used in high-speed networking switches [178]. They proposed their so-
called threshold priority policy (TPP) as a priority mechanism that gives priority to one
traffic class over another, only when really needed. In the following, we will assume that
the two traffic classes are a video or real-time class (rt) and a data or nonreal-time class
(nrt).
In the TPP, two buffers are used for the two traffic classes. A predetermined threshold
L is associated with the real-time buffer. When the queue length in the real-time buffer is
less than or equal to L, the server alternates between the two buffers transmitting one cell
from each buffer (as long as a queue is not empty). On the other hand, when the queue
length in the real-time buffer exceeds L, the server continues to serve the real-time buffer
until its queue length is reduced to L. The value of the threshold L gives the degree of
preferential treatment of the real-time traffic. When L = 0, real-time traffic is given an
absolute non-preemptive priority. When L = ∞, both traffic classes are served alternatingly
when not empty, i.e., the server acts as a l-limited cyclic server. By selecting L between
these two extremes, one may provide an adequate quality of service to both real-time and
nonreal-time traffic.
In Figure 16.14 we depict the SPN model of the multiplexer. On the left side, we see the
arrival streams coming into the buffers for the two traffic classes; here appropriate arrival
models should be added. The server is represented by the single token that alternates
between places try-rt and try-nrt. After a cell of one class is served (via either transition
serve-rt or serve-nrt) the server polls the other class. When nothing is buffered for a
particular traffic class, the server also polls the other class, via the transitions empty-rt and
empty-nrt. However, depending on whether there are more or less than L cells buffered
in place buff-rt, it can be decided that the server remains serving the real-time traffic
class. This is enforced by the immediate transitions rt-rt, rt-nrt, nrt-nrt, nrt-rt1
and nrt-rt2. Apart from the normal enabling conditions for these transitions (at least a
token in every input place and no tokens in places that are connected via an inhibitor arc
to the transition) these transitions have enabling functions associated with them, as given
in Table 16.3; they are taken such that the TPP is exactly enforced.
The TPP as proposed by Lee and Sengupta functions well when the arrival streams of
the two traffic classes are Poisson streams. However, when one of the arrival streams is

Figure 16.14: The TPP as an SPN model

  transition    enabling function
  rt-rt         (#buff-rt > L) or ((#buff-rt > 0) and (#buff-nrt = 0))
  rt-nrt        (#buff-rt > 0) and (#buff-rt ≤ L)
  nrt-nrt       (#buff-rt = 0) and (#buff-nrt > 0)
  nrt-rt1       (#buff-rt > L)
  nrt-rt2       (#buff-rt > 0)

Table 16.3: Enabling functions for the immediate transitions in the TPP model

more bursty, which can be the case for real-time video traffic, it does not function properly
any more. We therefore recently introduced the extended TPP (ETPP) mechanism, which
also works well in case of bursty real-time traffic [124]. In the ETPP, the server continues to
serve the real-time queue until the burst of real-time traffic has been handled completely,
instead of polling the nonreal-time queue already when there are less than L real-time
traffic jobs left.
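
The threshold priority policy itself is easily restated in executable form. The following
sketch mirrors the verbal description of the TPP given above (and hence the enabling
functions of Table 16.3); the function and argument names are illustrative.

    def tpp_next_queue(just_served, buff_rt, buff_nrt, L):
        """Return which buffer ('rt' or 'nrt') the server polls next under the
        threshold priority policy, or None if both buffers are empty.
        just_served is the buffer the server has just visited."""
        if buff_rt == 0 and buff_nrt == 0:
            return None
        if buff_rt > L:                      # threshold exceeded: keep serving real-time
            return 'rt'
        if buff_rt == 0:
            return 'nrt'
        if buff_nrt == 0:
            return 'rt'
        # both buffers non-empty and buff_rt <= L: serve them alternatingly
        return 'nrt' if just_served == 'rt' else 'rt'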

Figure 16.15: A simple availability model for a system consisting of multiple components
of two classes and a single repair unit

16.3 An SPN availability model

We now consider a typical availability model for a multi-component system; we have already
addressed this model in one of the exercises of Chapter 14.
Consider a system consisting of n_A components of type A and n_B components of type
B. The failure rate of components of type A (B) is f_A (f_B) and the repair rate is r_A (r_B).
We assume that the times to failure and the times to repair are exponentially distributed.
There is n_R = 1 repair unit available for repairing failed components; it can only repair
one component at a time. Components that fail are immediately repaired, if the repair
unit is free; otherwise they have to queue for repair.
The availability of such a system can be modelled using a fairly simple SPN, as given in
Figure 16.15. The n_A components in place UpA can fail, each with rate f_A, hence the rate of
transition FailA is made marking dependent on the number of tokens in UpA. Once failed,
these components have to wait for their repair in place WoRA. If the repair unit is free, i.e., if
RepUnit is not empty, the immediate transition srA (start repair A) fires, thus bringing
together the repair unit and the failed component. After an exponentially distributed time
(with rate r_A) the repair is completed, bringing the component up again. For components
of class B, the SPN operates similarly. The weights of the immediate transitions srA and
srB have been made linearly dependent on the number of failed components waiting to be
repaired in the places WoRA and WoRB. If multiple repair units are provided, i.e., n_R > 1,
then the rate of the repair transitions RepA and RepB should be made dependent on the
number of currently repaired components (per class); we only consider the case n_R = 1.

The number of tangible states in the underlying CTMC can be expressed as:

NoS = 1 + n_A + n_B + 2·n_A·n_B.

This can be understood as follows. There is one state with no components failed. Given
that one or more components of class A have failed (and none of class B), one of them is
being repaired; there are n_A of such states. A similar reasoning is valid for the case when
only one or more class B components have failed. There are n_A·n_B cases in which there are
failed components of both classes A and B, in each of which either a class A component or
a class B component is being repaired, which explains the factor 2.
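
For the numerical parameters used below (n_A = 20 and n_B = 15) this expression gives
NoS = 1 + 20 + 15 + 2 · 20 · 15 = 636 tangible states, which is exactly the CTMC size of
636 states mentioned with Table 16.4.
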
Now consider the case in which the components of class A and B are used to process
items of some sort. After an initial processing phase at a class A component, a class B
component finishes the processing of the item. The n_A components of class A can each
maintain a speed of item processing of p_A items per minute. Similarly, the n_B components
of class B can each handle p_B items per minute. Finally, items flow in at rate λ (items per
minute). We assume that all times for item processing are deterministic. If some of the
components have failed, their processing capacity is lost, leading to a smaller throughput
of produced items. We assume that a measure of merit for the overall processing system,
given "state" (n_A, n_B), is given by min{n_A·p_A, n_B·p_B}, i.e., the weakest link in the processing
chain determines the achievable throughput X; this minimum value can be taken as reward
to be associated with every state in the model.
To evaluate this model, we assume the following numerical parameters. We take as
failure and repair characteristics for the components: n_A = 20, n_B = 15, f_A = 0.0001
failures per hour (fph), f_B = 0.0005 fph, and r_A = 1 repair per hour (rph) and r_B = 0.5
rph. Furthermore, the item production rates are p_A = 7.5 items per minute (ipm) and
p_B = 10 ipm. We set λ = 150, such that the overall item processing capacity of components
of class A and B exactly matches the arrival rate of items to be processed. If one or
more components fail, the production rate of items will fall. We now define the following
measures for this system model:

• the probability that the system is fully operational at time t (the availability A(t));

• the probability that the system is not fully operational at time t (the unavailability
  U(t) = 1 − A(t)); for these two measures, the limiting case for t → ∞ is also of
  interest;

• the expected rate at which items are produced at time t, denoted E[X(t)], measured
  in items per hour, using the rewards just defined;

     t       U(t)      E[X(t)]   E[Y_loss(t)]   k_0   k_ε   k_ss
    0.1   0.000921   8999.475       0.0265       0     7     -
    0.2   0.001788   8998.980       0.1039       0     8     -
    0.5   0.004097   8997.657       0.6140       0     9     -
    1.0   0.007141   8995.891       2.2475       0    12     -
    2.0   0.011154   8993.530       7.6420       0    15     -
    5.0   0.015679   8990.796      32.153        0    23     -
   10.0   0.016864   8990.056      80.700        0    33     -
   20.0   0.016983   8989.978     180.757        2    52     -
   50.0   0.016985   8989.978     481.430       19    98    55
  100.0   0.016985   8989.978     982.555        -     -    55
    ∞     0.016985   8989.978        -           -     -     -

Table 16.4: Results for the transient analysis of a CTMC (636 states) of moderate size,
using uniformisation

• the cumulative number of items not produced due to capacity losses caused by failures,
  denoted E[Y_loss(t)], using the rewards just defined.

In Table 16.4 we show, for increasing values of t, some of the above measures. We also
indicate the left-, right- and steady-state truncation points in the summation of the uni-
formisation procedure, taking ε = 10⁻⁷ (see Section 15.2). We observe that for increasing t,
the unavailability steadily increases towards its limiting value, whereas the expected pro-
duction throughput E[X(t)] decreases. The column E[Y_loss(t)] shows that even a small unavailability
can lead to a large loss in production. The column k_0 shows the left-truncation point in the
uniformisation process. For small t, it equals 0; however, for increasing t, it shifts slightly
higher. Similar remarks apply for the right-truncation point k_ε. For t ≥ 50, we see that
steady-state is reached (column k_ss), hence the tail of the summation can be performed
more efficiently. For t ≥ 100 we see that steady-state is reached before the actual addition
of probability vectors starts, hence, for t = 100 the same results apply as for steady-state.
Notice that lim_{t→∞} E[Y_loss(t)] = ∞.

16.4 Resource reservation systems


We finally present an SPN model for the analysis of reservation-based systems. A typical
example of such a system can be found in circuit-switched telecommunication networks

Figure 16.16: A generic SPN model for a resource reservation system

where, before actual messages can be exchanged, a number of links (or part of their ca-
pacity) has to be reserved (connection reservation and set-up phase). After usage, these
resources are then freed and can be used as part of other connections.
Let us consider the case in which there are R resource types. Of each resource type r, n_r
instances exist. Furthermore, we have K types of resource users. We have m_k instances of
user type k, and a resource user of type k is characterised by the following three quantities:

• its resource request rate λ_k;

• its resource usage rate μ_k;

• its resource claim set C_k ⊆ {1, …, R}.

Using these quantities, the typical operation of a user of class k is as follows. First of all, the
user remains inactive, i.e., it does not need any resources, for an exponentially distributed
time with rate λ_k. This user then tries to claim all the resources it needs, i.e., it claims
resource instances r ∈ C_k. After the user has acquired all its resources, it starts using
them, for an exponentially distributed time with rate μ_k. After that, the claimed resources
are freed again, and the cycle restarts.
This type of behaviour can very well be described by the SPN (partially) given in
Figure 16.16. For type k of resource users, we have an arrival transition request_k. After such a

user has made its request, an immediate transition grab_k takes all the required resources in
the claim set C_k and the usage of the resources starts. After an exponentially distributed
time, modelled by transition release_k, the resources are freed again, and the user of type
k remains inactive for some time. Notice that transition release_k should have infinite-
server semantics; for every token in inuse_k, an independent exponentially distributed time
until resource release should be modelled. This implies that the rate of release_k should be
made linearly dependent on the marking of place inuse_k. Whether or not transition
request_k should also have infinite-server semantics (with respect to place inactive_k) depends
on the actual behaviour of the users. Typical measures of interest for this type of model,
which can all be computed from the steady-state marking probabilities (as sketched below),
are the following:

l the number of outstanding requests for class k that cannot be granted immediately,
denoted E[# waitk], or the density of the number of tokens in place waitk;

a the throughput for customers of class k: Xk = k Czr 1Pr{#inusek = 1);

l the utilisation of resources of type T: p7.= I - E[#&,]/q..

There are a number of possible extensions to the model sketched in Figure 16.16:

• the arrivals of requests could be made more general, e.g., by allowing more complex
constellations of places and transitions instead of the single request-transition;

• the resource holding time could be made more deterministic or more variable, e.g., by
allowing for more complex constellations of places and transitions instead of the single
release-transition;

• the resource request could be made more general in the sense that some requests might
require multiple instances of certain resources; the claim sets would then become
multisets and the multiplicities of the arcs from the resource-places to the grab-
transitions would have to be changed accordingly.

We finally comment on the solution of this type of SPN. Although a straightforward solution
via the underlying CTMC is, in principle, always possible, in many practical cases the
underlying CTMC becomes too large to be generated and evaluated, especially if R, K,
and the values of n_r and m_k are not small. Therefore, various researchers have looked
for approximations for this type of model. In particular, if the transitions request_k and
grab_k are amalgamated (and place wait_k is removed, as in Figure 16.17) the SPN has a
structure that allows for a product-form solution of a similar form to that seen for GNQNs

Figure 16.17: An approximate SPN model for a resource reservation system

[75, 134, 133]. The development of a mean-value analysis algorithm for a special case of
such an SPN is demonstrated in [70]. It should be noted though, that the changed model
might not accurately describe the real system operation anymore.

16.5 Further reading

Further application examples of SPNs can be found in a number of books on SPNs, e.g., by
Ajmone Marsan et al. [1, 2] and by Bause and Kritzinger [15]. Also the book on Sharpe by
Sahner et al. [249] provides application examples.
Regarding SPN-based polling models, we refer to papers from Ajmone Marsan et al. [5,
7, 6], Ibe et al. [141, 142, 140], Choi and Trivedi [50] and Haverkort et al. [124].
The model in Section 16.3 is elaborated in [121]. Sanders and Malhis also present SPN
dependability models [252]. Ciardo et al. also present interesting SPN-based performance
and dependability models in [54].
Finally, more application examples of SPNs can, among others, be found in the pro-
ceedings of the IEEE workshop series on Petri Nets and Performance Models. SPN-based
dependability models can be found in the proceedings of the annual IEEE Fault-Tolerant
Computer Systems Symposium and in the proceedings of the IEEE International Perfor-
mance and Dependability Symposium.

16.6 Exercises
16.1. 1-limited polling.
Construct a simpler version of the SPN model of Figure 16.7 for a 1-limited scheduling
strategy.

16.2. Globally-timed polling.


Construct an SPN-based polling model with global timing, as used in FDDI (see Chapter 9).
Notice that in such a system, a timer is started when the token (the server) leaves a
station. Upon return to the station, it is checked whether there is still time left to start
transmission(s), i.e., whether the timer has yet expired or not.

16.3. ETPP polling.


In Section 16.2.4 we presented the threshold priority policy (TPP) as a polling model with
load-dependent visit order. We are able to change this policy to an extended form (ETPP)
in which the real-time buffer is completely emptied once its filling has reached the threshold
L, before service is granted to the nonreal-time traffic. Starting from the SPN given in
Figure 16.14, construct an SPN for the ETPP. For details on such models, we refer to [124].

16.4. Polling models with IPP workload.


Adapt the SPN-based polling models so that the arrival stream of jobs per station proceeds
according to an IPP. How does this change affect the size of the underlying CTMC?

16.5. Polling models with PH-distributed services.


Adapt the SPN-based polling models so that job service times are PH-distributed. As pos-
sible PH-distributions, take an Erlang-k distribution and a hyper-exponential distribution
(with k phases). How do these changes affect the size of the underlying CTMC?

16.6. The availability model.


Consider the following generalisation of the availability model of Section 16.3. We now deal
with C classes of components. Component class c ∈ C = {1, 2, …, C} has n_c components.
Show that the number of tangible states in the underlying CTMC can be expressed as:

NoS = 1 + Σ_{T ⊆ C, T ≠ ∅} |T| · Π_{c ∈ T} n_c,

where the sum runs over all non-empty subsets T in the power set of {1, 2, …, C} and |T|
denotes the cardinality of set T. Hint: try to generalise the reasoning for the case C = 2
as presented in Section 16.3. For more details on this model, and on the solution of this
exercise, see [121].

Chapter 17

Infinite-state SPNs

THE SPNs we have addressed in the previous chapters all have a possibly large, but finite
state space. In this chapter we focus on a special class of stochastic Petri nets with
unbounded state space, known as one-place unbounded SPNs or infinite-state SPNs
(abbreviated as iSPNs). In particular, we focus on a class of SPNs of which the underlying


CTMC has a QBD structure, for which efficient solution methods exist (see Chapter 8).
The properties an SPN has to fulfill to belong to this class can be verified at the SPN
level, without having to construct the reachability graph. The main advantage of iSPNs is
that efficient matrix-geometric techniques for the solution are combined with the powerful
description facilities of (general) SPNs. This not only allows non-specialists to use these
methods, it also avoids the state-space explosion problem that is so common in traditional
SPN analysis based on the complete (finite) underlying Markov chain.
We motivate the use of iSPNs in Section 17.1 and characterise the class of iSPNs
by defining a number of constraints that have to be fulfilled in Section 17.2. We then
discuss, in Section 17.3, how matrix-geometric methods can also be applied in this case.
In Section 17.4 we comment on algorithms to detect the special iSPN structure and to
compute reward-based measures efficiently. We finally discuss a number of application
examples in Section 17.5.

17.1 Introduction
The general approach in solving SPNs is to translate them to an underlying finite CTMC
which can be solved numerically. However, a problem that often arises when following this
approach is the rapid growth of the state space. Various solutions have been proposed for
this problem, e.g., the use of state space truncation techniques [121] or lumping techniques
Figure 17.1: Overview of the modelling and solution approach for iSPNs

[31, 32, 47, 253]. For a restricted class of SPNs, product-form results apply, so that an
efficient mean-value analysis style of solution becomes feasible [134, 75]. In all these cases,
the “trick” lies in circumventing the generation of the large overall state space.
When studying queueing models, as we have done in Parts II and III, one observes
that the analysis of models with an unbounded state space is often simpler than analysing
similar models on a finite state space. This suggests the idea of studying SPNs that have
an unbounded state space. Instead of generating and solving a very large but finite SPN
model (as we have done so far) we solve infinitely large CTMCs derived from special SPN
models. Of course, not all SPNs can be used for this purpose; we require them to exhibit
a certain regular structure. Although this limits their applicability, it is surprising how
many SPNs do fulfill the extra requirements.
A class of infinitely large Markovian models which allows for an efficient solution is the
class of quasi-birth-death models, as described in Chapter 8. A state-level characterisation
of such models is, however, cumbersome for practical applications. We therefore define a
class of SPNs which has an underlying CTMC which is a QBD; these SPNs are denoted
iSPNs.
In Figure 17.1 we present the GMTF (as introduced in Chapter 1) applied to iSPNs.
The sections that follow are devoted to the specific parts indicated in this figure. Defini-
tions and terminology for iSPNs are given in Section 17.2. The solution of the underlying
QBD, starting from the block matrices $A_i$ and $B_{i,j}$, and yielding the matrix $R$ and the
probability vectors $\underline{v}_i$, is discussed in Section 17.3 (this section depends strongly on Chap-
ter 8). The transformation process from the SPN description to the QBD structure and the
enhancement of the steady-state probability vectors to reward-based performance measures

is then discussed in Section 17.4.

17.2 Definitions
We discuss some preliminary notation and terminology in Section 17.2.1. The requirements
for iSPNs are formally given in Section 17.2.2 and they are discussed in Section 17.2.3.

17.2.1 Preliminaries
The class of iSPNs is similar to the class of SPNs defined in Chapter 14. Without loss of generality we assume that the iSPN under study, denoted iSPN, has a set $P = \{P_0, P_1, \ldots, P_{n_P}\}$ of places, of which $P_0$ may contain an infinitely large number of tokens. A distribution of tokens over the places is called a marking and denoted $m = (m_0, \underline{m}) = (m_0, m_1, \ldots, m_{n_P})$. With $m_0 \in \mathbb{N}$ and $\underline{m} \in \mathcal{R}'$, the set of all possible markings is denoted $\mathcal{R} = \mathbb{N} \times \mathcal{R}'$. Clearly, $|\mathbb{N}| = \infty$ and $|\mathcal{R}'| < \infty$. The set of transitions is denoted $T$.

We now define level $\mathcal{R}(k)$ to be the set of markings such that place $P_0$ contains $k$ tokens, i.e., $\mathcal{R}(k) = \{(m_0, \underline{m}) \in \mathcal{R} \mid m_0 = k\}$. The levels $\mathcal{R}(k)$, $k \in \mathbb{N}$, constitute a partition of the overall state space: $\mathcal{R} = \bigcup_{k=0}^{\infty} \mathcal{R}(k)$ and $\mathcal{R}(k) \cap \mathcal{R}(l) = \emptyset$, $k \neq l$. For ease in notation, we also introduce $\mathcal{R}'(k) = \{\underline{m} \mid (k, \underline{m}) \in \mathcal{R}(k)\}$.

We furthermore define the following two leads-to relations. We denote $m \stackrel{t}{\rightarrow} m'$ if transition $t$ is enabled in $m$ and, upon firing, leads to marking $m'$. The firing rate of $t$ is not important. We denote $m \stackrel{t,\lambda}{\rightarrow} m'$ if transition $t \in T$ is enabled in $m$ and, upon firing, with rate $\lambda$, leads to marking $m'$.

17.2.2 Requirements: formal definition


We now can define the class of iSPNs by imposing a number of requirements on the SPN
structure and transition firing behaviour. It should be noted that these requirements are
sufficient, rather than necessary.

Requirement 1. Given iSPN, there exists a $K \in \mathbb{N}$ such that for all $k, l \geq K$: $\mathcal{R}'(k) = \mathcal{R}'(l)$. We denote $L = |\mathcal{R}'(K)|$.

Requirement 2. Given iSPN and $K$ as defined above, the following requirements should
hold for the so-called repeating portion of the state space:

1. intra-level equivalence:
   $\forall k, l \geq K$, $t \in T$, $\lambda \in \mathbb{R}^+$: if $(k, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (k, \underline{m}')$ then $(l, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (l, \underline{m}')$;

2. inter-level one-step increases only:

   $\forall k \geq K$, $\exists t \in T$: $(k, \underline{m}) \stackrel{t}{\rightarrow} (k+1, \underline{m}')$;

   $\forall k \geq K$, $t \in T$, $\lambda \in \mathbb{R}^+$: if $(k, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (k+1, \underline{m}')$ then $(k+1, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (k+2, \underline{m}')$;

   $\forall k \geq K$, $\forall i \in \mathbb{N}$, $i \geq 2$, $\nexists t \in T$: $(k, \underline{m}) \stackrel{t}{\rightarrow} (k+i, \underline{m}')$;

3. inter-level one-step decreases only:

   $\forall k \geq K$, $\exists t \in T$: $(k+1, \underline{m}) \stackrel{t}{\rightarrow} (k, \underline{m}')$;

   $\forall k \geq K$, $t \in T$, $\lambda \in \mathbb{R}^+$: if $(k+2, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (k+1, \underline{m}')$ then $(k+1, \underline{m}) \stackrel{t,\lambda}{\rightarrow} (k, \underline{m}')$;

   $\forall k \geq K$, $\forall i \in \mathbb{N}$, $i \geq 2$, $\nexists t \in T$: $(k+i, \underline{m}) \stackrel{t}{\rightarrow} (k, \underline{m}')$.

Requirement 3. Given iSPN and $K$ as defined above, for the so-called boundary portion
of the state space the following requirements should hold:

1. no boundary jumping:

   $\forall k \leq K-1$, $\forall l > K$, $\nexists t \in T$: $(k, \underline{m}) \stackrel{t}{\rightarrow} (l, \underline{m}')$;

   $\forall k \leq K-1$, $\forall l > K$, $\nexists t \in T$: $(l, \underline{m}) \stackrel{t}{\rightarrow} (k, \underline{m}')$;

2. only boundary crossing:

   $\exists t_1, t_2 \in T$: $(K-1, \underline{m}_1) \stackrel{t_1}{\rightarrow} (K, \underline{m}_1')$, $(K, \underline{m}_2) \stackrel{t_2}{\rightarrow} (K-1, \underline{m}_2')$.

17.2.3 Requirements: discussion

To ease the understanding, let us now discuss the formal requirements in a more informal
way. We first address the state space and its partitioning in levels. The first requirement
states that, starting from a certain level $K$ upwards, all levels are the same as far as the
non-infinite places (places $P_1$ through $P_{n_P}$) are concerned; they only differ in the number of
tokens in place $P_0$. It is for this reason that the levels $k \geq K$ are called the repeating portion
(levels) of the state space. The levels $k < K$ are called the boundary portion (levels) of the
state space. In Figure 17.2 we depict the overall state space and its partitioning in levels.
We have tried to visualise the fact that starting from level K upwards, the levels repeat one
another. Levels 0 through K - 1 can be totally different from one another. Between states
from levels 0 through K - 1 all kinds of transitions may occur. That is why we can also
see these boundary levels as one aggregated boundary level (Requirement 1).
Transitions can occur within a level, and between levels. Since the repeating levels are
always the same (apart from the level number itself) all internal transitions in one level must
have similar equivalents in other repeating levels. There are no transitions possible between
non-neighbouring levels. There have to exist up- and down-going transitions between
Figure 17.2: State space partitioning in levels

neighbouring levels. Also, for the repeating levels, their interaction with neighbouring
levels is always the same (Requirement 2).
The transitions between the boundary levels and the repeating levels only take place
in levels K - 1 and K; however, they may have any form (Requirement 3).
As a conclusion, due to the three requirements the CTMC underlying iSPNs obeys a
quasi-birth-death structure. We will exploit this fact in the solution of this class of SPNs.
A final remark should be made regarding the necessity of the requirements. Indeed, the
requirements are sufficient, but not always necessary. One can imagine CTMCs which
have a slightly different structure, especially in the boundary part of the state space, that
still have a matrix-geometric solution. Stating necessary requirements, however, would
make the requirements more cumbersome to validate.

17.3 Matrix-geometric solution


In this section we discuss the matrix-geometric solution of the class of iSPNs. Referring
to Figure 17.2, it is easy to see that the generator matrix Q of the QBD has the following
form:
$$Q = \begin{pmatrix}
B_{0,0} & \cdots & B_{0,K-1} & 0 & 0 & \cdots \\
B_{1,0} & \cdots & B_{1,K-1} & 0 & 0 & \cdots \\
\vdots & & \vdots & \vdots & \vdots & \\
B_{K-1,0} & \cdots & B_{K-1,K-1} & B_{K-1,K} & 0 & \cdots \\
0 & \cdots & B_{K,K-1} & B_{K,K} & B_{K,K+1} & \cdots \\
0 & \cdots & 0 & B_{K+1,K} & B_{K+1,K+1} & \ddots \\
0 & \cdots & 0 & 0 & B_{K+2,K+1} & \ddots \\
\vdots & & & & \ddots & \ddots
\end{pmatrix}, \qquad (17.1)$$
where the rows and columns of $Q$ are indexed by the level numbers $0, \ldots, K-1, K, K+1, K+2, \ldots$

Now, by the requirements posed on the intra- and inter-level transitions, we have

$$\left\{\begin{array}{ll}
B_{k,k+1} = A_0, & k = K, K+1, \ldots, \\
B_{k,k} = A_1, & k = K+1, K+2, \ldots, \\
B_{k,k-1} = A_2, & k = K+1, K+2, \ldots
\end{array}\right. \qquad (17.2)$$
Using this notation, we may write Q as follows:

$$Q = \begin{pmatrix}
B_{0,0} & \cdots & B_{0,K-1} & 0 & \cdots & \cdots & \cdots \\
\vdots & & \vdots & \vdots & & & \\
B_{K-1,0} & \cdots & B_{K-1,K-1} & B_{K-1,K} & 0 & \cdots & \cdots \\
0 & \cdots & B_{K,K-1} & B_{K,K} & A_0 & 0 & \cdots \\
0 & \cdots & 0 & A_2 & A_1 & A_0 & \cdots \\
0 & \cdots & 0 & 0 & A_2 & A_1 & \ddots \\
\vdots & & & & & \ddots & \ddots
\end{pmatrix}. \qquad (17.3)$$
From a CTMC with the above generator matrix, we can compute the steady-state probabilities $\underline{p}$ by solving the usual global balance equations (GBEs):
$$\underline{p}\,Q = \underline{0}, \qquad \sum_i p_i = 1, \qquad (17.4)$$
where the right part is a normalisation to make sure that all the probabilities sum to 1. First, we partition $\underline{p}$ according to the levels, i.e., $\underline{p} = (\underline{v}_0, \underline{v}_1, \ldots, \underline{v}_{K-1}, \underline{v}_K, \underline{v}_{K+1}, \ldots)$. Substituting this in (17.4) we obtain the following system of linear equations:
$$\begin{array}{lll}
(a) & \displaystyle\sum_{j=0}^{K-1} \underline{v}_j B_{j,k} = \underline{0}, & k = 0, \ldots, K-2, \\[6pt]
(b) & \displaystyle\sum_{j=0}^{K} \underline{v}_j B_{j,K-1} = \underline{0}, & \\[6pt]
(c) & \displaystyle\sum_{j=K-1}^{K+1} \underline{v}_j B_{j,K} = \underline{0}, & \\[6pt]
(d) & \underline{v}_{k-1} A_0 + \underline{v}_k A_1 + \underline{v}_{k+1} A_2 = \underline{0}, & k = K+1, K+2, \ldots, \\[6pt]
(e) & \displaystyle\sum_{i=0}^{\infty} \underline{v}_i\,\underline{1} = 1. &
\end{array} \qquad (17.5)$$

We now exploit the regular structure of the state space in the solution process in a similar
way to that in Chapter 8. In particular, looking at (17.5(d)), it seems reasonable to
assume that for the state probabilities $\underline{v}_i$, $i = K, K+1, \ldots$, only the neighbouring levels are
of importance, so that they can be expressed as:

$$\underline{v}_{K+1} = \underline{v}_K R, \qquad \underline{v}_{K+2} = \underline{v}_{K+1} R = \underline{v}_K R^2, \ldots, \qquad (17.6)$$

or, equivalently,
$$\underline{v}_{K+i} = \underline{v}_K R^i, \quad i \in \mathbb{N}, \qquad (17.7)$$

where $R$ is a square $L \times L$ matrix relating the steady-state probability vector at level $K+i$
to the steady-state probability vector at level $K+i-1$ ($i = 1, 2, \ldots$). As we have seen
for QBDs in Chapter 8, we know that this is true when the matrix R satisfies the matrix
polynomial:
Aa + RAr + R2A2 = 0. (17.8)

We have discussed means to solve this matrix polynomial in Section 8.3.
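In practice, (17.8) is solved numerically. As a minimal illustration (and certainly not the only method of Section 8.3), the following numpy sketch implements the classical successive-substitution iteration, starting from $R = 0$; the matrices A0, A1 and A2 are assumed to be given as numpy arrays, and all names are illustrative only.

import numpy as np

def solve_R(A0, A1, A2, tol=1e-12, max_iter=100000):
    # Successive substitution for A0 + R A1 + R^2 A2 = 0,
    # rewritten as R := -(A0 + R^2 A2) A1^{-1}; starting from R = 0
    # the iterates increase monotonically towards the minimal solution.
    L = A0.shape[0]
    A1_inv = np.linalg.inv(A1)
    R = np.zeros((L, L))
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("iteration for R did not converge")

For a scalar QBD with $A_0 = (\lambda)$, $A_1 = (-(\lambda+\mu))$ and $A_2 = (\mu)$ (and $\lambda < \mu$), this iteration converges to $R = \lambda/\mu$, as expected.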


When i = K, (17.5(c)) can be rewritten to incorporate the above assumption, because
$\underline{v}_{K+1}$ can be written in terms of $\underline{v}_K$, and $B_{K+1,K} = A_2$:

$$\sum_{j=K-1}^{K+1} \underline{v}_j B_{j,K} = \underline{v}_{K-1} B_{K-1,K} + \underline{v}_K B_{K,K} + \underline{v}_{K+1} B_{K+1,K}
= \underline{v}_{K-1} B_{K-1,K} + \underline{v}_K \left(B_{K,K} + R A_2\right) = \underline{0}. \qquad (17.9)$$

With this substitution, (17.5(a-c)) comprises a system of K+ 1 linear vector equations with
as many unknown vectors. However, as these vectors are still dependent, the normalisation
(17.5(e)) has to be integrated in it, to yield a unique solution. This normalisation can be
written as follows:

$$\sum_{i=0}^{\infty} \underline{v}_i\,\underline{1} = \sum_{i=0}^{K-1} \underline{v}_i\,\underline{1} + \sum_{i=K}^{\infty} \underline{v}_i\,\underline{1}
= \sum_{i=0}^{K-1} \underline{v}_i\,\underline{1} + \underline{v}_K (I - R)^{-1}\,\underline{1} = 1. \qquad (17.10)$$

We have mentioned means to solve this system of linear equations in Section 8.3; details
about these solution techniques can be found in Section 15.1.
Regarding the stability of the modelled system, similar remarks apply as given in Section 8.2.3 for PH|PH|1 queueing systems; condition (8.23) can be validated once the matrices $A_i$ have been computed.
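Although we do not repeat condition (8.23) here, the usual mean-drift test for QBDs can be evaluated directly from the $A$-matrices. The sketch below (illustrative names, numpy assumed) computes the stationary vector of the finite generator $A_0 + A_1 + A_2$ and compares the mean upward and downward drifts; whether this coincides literally with (8.23) should be checked against Chapter 8.

import numpy as np

def qbd_stable(A0, A1, A2):
    # pi is the stationary vector of the (finite) generator A = A0 + A1 + A2;
    # the QBD is stable iff pi A0 1 < pi A2 1 (mean up-drift smaller than mean down-drift).
    A = A0 + A1 + A2
    L = A.shape[0]
    M = np.vstack([A.T[:-1, :], np.ones(L)])   # replace one balance equation by the normalisation
    b = np.zeros(L); b[-1] = 1.0
    pi = np.linalg.solve(M, b)
    one = np.ones(L)
    return pi @ A0 @ one < pi @ A2 @ one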

17.4 iSPN specification and measure computation


In Section 17.4.1 we reflect on an algorithm to translate iSPNs to the underlying QBD
processes. In Section 17.4.2 we then present a number of techniques to evaluate reward-
based measures for iSPNs in an efficient way.

17.4.1 From iSPN to the underlying QBD


The translation of an iSPN to the underlying QBD can in principle be performed with
well-known algorithms for state-space generation, as discussed in Chapter 14. It should
be noted, however, that iSPNs have an unbounded number of states so that a "standard"
algorithm will not terminate. Instead, a proper translation algorithm should end when
it recognises that it is exploring the state space in the repeating part. One practical
problem that one encounters is that the stated requirements are not easily verified in
general. When inhibitor arcs and enabling functions are allowed in their most general
setting, the verification problem is even undecidable (see also [18]). Therefore, the given
requirements should be interpreted as being sufficient to allow for the matrix-geometric
solution. However, we are free to pose extra requirements, in order to ease the decision
task, albeit possibly at the cost of less modelling flexibility. Taking these considerations
into account, we decided on the following practical restrictions on iSPNs:

• input and output arcs connected to place $P_0$ may only have multiplicity one;

• marking-dependent rates and weights depending on the contents of place $P_0$, and enabling functions using the marking of place $P_0$, are not allowed.

Up till now, these restrictions did not bother us in performance evaluation studies.
In the above restrictions, as well as in Section 17.2, we assumed that the identity of the
place that may contain an unbounded number of tokens is known in advance. Indeed, when
using iSPNs it is normally easy for a modeller to indicate its identity; the whole model
is normally built around this place. On the other hand, place $P_0$ can also be recognised
automatically. Since only for $P_0$ is the marking unbounded, place $P_0$ will not appear in any
place invariant. Thus, given an iSPN, an invariant analysis as discussed in Section 14.2
will reveal the identity of $P_0$.
A final problem in the translation algorithm is the determination of the level number
$K$ where the repetition starts, in order to stop the state-space generation process in time.
Although this is easy for a human being, doing this for instance “by inspection” of the
upper left part of the partially generated matrix Q, it is less easy to grasp in an algorithm.

However, given the above restrictions, we have been able to prove that once two successive
levels are the same, all levels beyond these will also be the same. The 3-page proof, based
on finite induction, goes beyond the scope of this book [97].

17.4.2 Efficient computation of reward-based measures


Once the steady-state probabilities are known, reward-based performance measures can be
computed easily. Let $r : \mathcal{R} \rightarrow \mathbb{R}$ denote a real-valued reward function defined on the state
space of the model. The steady-state expected reward is then computed as

$$E[X] = \sum_{i=0}^{\infty} \sum_{\underline{m} \in \mathcal{R}'(i)} v_{i,\underline{m}}\; r(i, \underline{m}). \qquad (17.11)$$

Without any further restrictions on the form of $r(i, \underline{m})$ we cannot further reduce the above
expression. Thus, to compute E[X] in such cases, we start the infinite summation and
continue to add more terms until the additional terms are smaller than a certain threshold.
Assuming that all rewards are positive, we thus compute a lower bound on the actual
expected reward.
There are, however, quite a number of reward-based measures that are of general in-
terest and that can be computed more efficiently, without involving infinite summations.
We discuss a number of these special cases below.

Reward function only depending on the level

If the reward function only depends on the level number and not on the intra-level state,
that is, if $r(i, \underline{m}) = r(i, \underline{m}')$ for all $i \in \mathbb{N}$ and for all $\underline{m}, \underline{m}' \in \mathcal{R}'(i)$, we may write $r(i, \underline{m}) = r(i)$. Note that this type of reward-based measure typically concerns the unbounded place
$P_0$, when computing

• the probability that the number of tokens in $P_0$ is above a certain threshold $l$, we set
$r(i) = 1\{P_0 > l\}$;

• the probability that $P_0$ contains exactly $l$ tokens, we set $r(i) = 1\{P_0 = l\}$;

• the expected number of tokens in $P_0$, we set $r(i) = i$.

For these cases, we can write:

$$E[X] = \sum_{i=0}^{\infty} r(i)\,(\underline{v}_i \cdot \underline{1})
= \underbrace{\sum_{i=0}^{K-1} r(i)\,(\underline{v}_i \cdot \underline{1})}_{LT}
+ \sum_{j=0}^{\infty} r(K+j)\,(\underline{v}_K R^j \cdot \underline{1}). \qquad (17.12)$$

The first, finite, sum does not pose a problem. We now concentrate on the second sum
for each of the measures identified above:

• When $r(i) = 1\{P_0 = l\}$ the infinite summation contains at most one non-zero term,
so that we have:
$$E[X] = LT + \underline{v}_K R^{l-K} \cdot \underline{1}. \qquad (17.13)$$

• If we want to compute the probability that $P_0$ contains more than $l$ tokens, this can
be rewritten as 1 minus the probability that $P_0$ contains at most $l$ tokens. Thus,
the infinite summation also reduces to a finite one.

• A more complex case arises when $r(i) = i$; however, here also a closed-form expression
can be derived:
$$\begin{array}{rcl}
E[X] & = & \displaystyle\sum_{i=0}^{K-1} i\,(\underline{v}_i \cdot \underline{1}) + \sum_{j=0}^{\infty} (K+j)\,(\underline{v}_K R^j \cdot \underline{1}) \\[10pt]
& = & \displaystyle\sum_{i=0}^{K-1} i\,(\underline{v}_i \cdot \underline{1}) + K\,\underline{v}_K \left(\sum_{j=0}^{\infty} R^j\right) \underline{1} + \underline{v}_K \left(\sum_{j=0}^{\infty} j R^j\right) \underline{1} \\[10pt]
& = & \displaystyle\sum_{i=0}^{K-1} i\,(\underline{v}_i \cdot \underline{1}) + K\,\underline{v}_K (I-R)^{-1} \underline{1} + \underline{v}_K R (I-R)^{-2} \underline{1} \\[10pt]
& = & \displaystyle\sum_{i=0}^{K-1} i\,(\underline{v}_i \cdot \underline{1}) + \left(\underline{v}_K (I-R)^{-1}\left(K I + R (I-R)^{-1}\right)\right) \cdot \underline{1}. \qquad (17.14)
\end{array}$$
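To make the gain concrete, the numpy sketch below evaluates the closed forms (17.13) and (17.14) above; v_bound is a list with the boundary vectors $\underline{v}_0, \ldots, \underline{v}_{K-1}$ (as numpy arrays), vK holds $\underline{v}_K$, and all names are illustrative.

import numpy as np

def p0_measures(v_bound, vK, R, K, l):
    # Level-dependent rewards without infinite summations, cf. (17.13) and (17.14).
    L = len(vK)
    one = np.ones(L)
    I = np.eye(L)
    IRinv = np.linalg.inv(I - R)
    # Pr{P0 contains exactly l tokens}, for l >= K (for l < K the mass sits in the boundary)
    pr_l = vK @ np.linalg.matrix_power(R, l - K) @ one
    # expected number of tokens in P0
    LT = sum(i * np.sum(vi) for i, vi in enumerate(v_bound))
    e_p0 = LT + vK @ IRinv @ (K * I + R @ IRinv) @ one
    return pr_l, e_p0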

Reward function independent of level and $P_0$

When computing mean place occupancies for places other than $P_0$, the rewards depend
on $\underline{m}$ rather than on $i$: $r(i, \underline{m}) = r(\underline{m})$, irrespective of $i$. We start with the general
reward-based expression:
$$E[X] = \underbrace{\sum_{i=0}^{K-1} \sum_{\underline{m} \in \mathcal{R}'(i)} v_{i,\underline{m}}\; r(i, \underline{m})}_{LT}
+ \sum_{i=K}^{\infty} \sum_{\underline{m} \in \mathcal{R}'(i)} v_{i,\underline{m}}\; r(i, \underline{m}). \qquad (17.15)$$

The left additive term LT again does not cause problems. The right term can be reduced
considerably by changing the summation order and using the fact that the rewards are
level-independent as follows:
$$E[X] = LT + \sum_{\underline{m} \in \mathcal{R}'(K)} r(\underline{m}) \sum_{i=K}^{\infty} v_{i,\underline{m}}
= LT + \sum_{\underline{m} \in \mathcal{R}'(K)} r(\underline{m}) \sum_{i=0}^{\infty} \underline{v}_K R^i\, \underline{e}_{\underline{m}}, \qquad (17.16)$$
where $\underline{e}_{\underline{m}}$ is a vector with a single one at its $\underline{m}$-th position. The right-most sum can be
reduced, yielding:
$$E[X] = LT + \sum_{\underline{m} \in \mathcal{R}'(K)} r(\underline{m})\; \underline{v}_K (I-R)^{-1}\, \underline{e}_{\underline{m}}. \qquad (17.17)$$

Similar expressions are obtained when computing the probability that the number of tokens
in a certain place (unequal to $P_0$) is larger or smaller than a threshold $l$.
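A corresponding sketch for (17.17), with r_bound holding the rewards for the boundary levels and r_rep those for the $L$ states of a repeating level (again, all names are ours, and numpy is assumed):

import numpy as np

def level_independent_measure(v_bound, vK, R, r_bound, r_rep):
    # E[X] = LT + (vK (I - R)^{-1}) . r_rep, cf. (17.15)-(17.17)
    LT = sum(np.dot(vi, ri) for vi, ri in zip(v_bound, r_bound))
    I = np.eye(len(vK))
    return LT + (vK @ np.linalg.inv(I - R)) @ np.asarray(r_rep)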

17.5 Application studies


To demonstrate the usability of iSPNs we present three small application examples. We
present a queueing model with delayed service in Section 17.5.1; this model is so simple
that we can solve it explicitly. We then continue with two more realistic models: a model of
a connection management mechanism in Section 17.5.2 and a model of a queueing system
with checkpointing and recovery in Section 17.5.3.

17.5.1 A queueing model with delayed service


Consider a single server queueing system (see Figure 17.3) at which customers arrive as
a Poisson process with rate $\lambda$ via transition arr and are served with rate $\mu$ via transition
serve. Before service, arriving customers are stored in a buffer. Service is not immediately granted to an arriving customer, even if the server is idle at that time (a token in
place sleep). Only after there are at least $T$ (for threshold) customers queued does the
server awake and start its duties. This is enforced by an enabling function associated with
the immediate transition wake-up: $\#\mbox{buffer} \geq T$. The server subsequently remains awake
until the buffer becomes empty, after which it resumes sleeping.
Figure 17.3: iSPN of a single server queueing system with delayed service

For a threshold $T = 3$, the corresponding CTMC is given in Figure 17.4. From the
models, it can easily be seen that they fulfill requirements 1-3 with $K = T = 3$. The
generator matrix has the following form:
$$Q = \begin{pmatrix}
* & \lambda & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & * & 0 & \lambda & 0 & 0 & 0 & \cdots \\
\mu & 0 & * & 0 & \lambda & 0 & 0 & \cdots \\
0 & 0 & 0 & * & 0 & \lambda & 0 & \cdots \\
0 & 0 & \mu & 0 & * & \lambda & 0 & \cdots \\
0 & 0 & 0 & 0 & \mu & * & \lambda & \cdots \\
0 & 0 & 0 & 0 & 0 & \mu & * & \ddots \\
\vdots & & & & & & \ddots & \ddots
\end{pmatrix},$$
with the states ordered as $0, (1,S), (1,A), (2,S), (2,A), 3, 4, \ldots$ (S: server sleeping, A: server awake), and
where each diagonal entry $*$ is chosen such that the row sums equal 0. From this matrix we observe that the
A-matrices are in fact scalars or $1 \times 1$ matrices: $A_0 = (\lambda)$, $A_1 = (-(\lambda+\mu))$ and $A_2 = (\mu)$.
From these matrices we derive $\mu R^2 - (\lambda+\mu)R + \lambda = 0$, for which the only valid solution
is $R = (\lambda/\mu)$. From this, it once again becomes clear that $R$ takes over the role of $\rho$ in
simpler queueing analysis, such as in the M|M|1 queue.

Denoting $\underline{v}_i = v_i$ for $i = 0$ or $i = 3, 4, \ldots$, and $\underline{v}_i = (v_{i,S}, v_{i,A})$ for $i = 1, 2$, the boundary

Figure 17.4: QBD of the single server queueing system with delayed service in case T = 3

equations, including the normalisation equation, become:

$$\begin{array}{l}
-\lambda v_0 + \mu v_{1A} = 0, \\
\lambda v_0 - \lambda v_{1S} = 0, \\
-(\lambda+\mu)\, v_{1A} + \mu v_{2A} = 0, \\
\lambda v_{1S} - \lambda v_{2S} = 0, \\
\lambda v_{1A} - (\lambda+\mu)\, v_{2A} + \mu v_3 = 0, \\
\lambda v_{2S} + \lambda v_{2A} - (\lambda+\mu)\, v_3 + \mu v_4 = 0, \\
v_0 + v_{1S} + v_{1A} + v_{2S} + v_{2A} + v_3 (1-\rho)^{-1} = 1.
\end{array} \qquad (17.18)$$

As a numerical example, consider the case where $\lambda = 2$, $\mu = 3$, $T = 3$ and, consequently,
$\rho = 2/3$. The matrix-geometric solution results in the following boundary steady-state
probabilities:
$$v_0 = 0.11111, \quad v_{1A} = 0.07407, \quad v_{1S} = 0.11111, \quad
v_{2A} = 0.12346, \quad v_{2S} = 0.11111, \quad v_3 = 0.15638.$$
Using these probabilities, and $v_i = v_3 \rho^{i-3}$, $i = 4, 5, \ldots$, we obtain for the average number
of customers in the system $E[N] = 3.00$.
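These numbers are easily reproduced; the following lines of Python recompute the normalisation and $E[N]$ directly from the boundary probabilities and the geometric tail (the probabilities are simply copied from above):

lam, mu = 2.0, 3.0
rho = lam / mu                              # the scalar R of this model
v0, v1S, v1A = 0.11111, 0.11111, 0.07407
v2S, v2A, v3 = 0.11111, 0.12346, 0.15638
total = v0 + v1S + v1A + v2S + v2A + v3 / (1 - rho)   # should be (close to) 1
EN = (v1S + v1A) + 2 * (v2S + v2A) \
     + v3 * (3 / (1 - rho) + rho / (1 - rho) ** 2)    # sum over i >= 3 of i * v3 * rho^(i-3)
print(total, EN)                            # prints approximately 1.000 and 3.00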

17.5.2 Connection management in communication systems


An ATM/B-ISDN-based communication infrastructure offers a connection-oriented service.
Via the ATM adaptation layers 3/4 and 5, connectionless services can also be provided [225,
251]. Packets arriving at the AAL service boundary to make use of such a service suffer
a possible delay from the connection establishment at the ATM service boundary, unless
there already exists a connection when the packet arrives. Once the connection has been
established, all buffered packets can be transmitted and the connection can be released.
This can be done immediately, or with some delay. The former has the disadvantage that
a connection is being maintained when it is not needed; however, it has the advantage

Figure 17.5: iSPN model of the OCDR mechanism

Figure 17.6: CTMC of the OCDR mechanism

that some packets might profit from the fact that there is still a connection when they
arrive. Clearly, there is a trade-off between the release delay, the costs of maintaining
an unused connection and the perceived performance (average delay). The above way of
implementing connectionless services, has been proposed by Heijenk et al. [131] under the
name “on-demand connection with delayed release” (OCDR).
An iSPN model for such a system is given in Figure 17.5. Packets arrive via transition
arr and are placed in the buffer. The rate of transition arr is modulated by an inde-
pendent on/off-model. A token in place on or off models the fact that the source is in a
burst or not, respectively. When in a burst, packets are generated according to a Poisson
process with rate $\lambda$ packets/second. When not in a burst, no packets are generated. The
transitions go-on and go-off, with rates $\alpha$ and $\beta$ respectively, model the time durations
the source remains in the off and on state. The service rate is $\mu$ Mbps and the average
packet length is denoted $l$.
Figure 17.7: The expected delay E[D] (in seconds) as a function of the arrival rate $\lambda$ in a burst

If the server is busy, there will be a token in place busy and arriving packets have to
wait their turn. If the server is idle, but there is no connection available, signified by
a token in place no-conn, a connection will be established, causing a negative exponential
delay with average length $1/c$ (transition set-up). Once there is a connection, normal
packet transmissions can take place. Once the buffer is empty, the connection is released
with a negative exponential delay with average length $1/r$ (transition release).
The corresponding CTMC is given in Figure 17.6. In this model, the state space
$\mathcal{R} = \{(i,j,k) \mid i \in \mathbb{N},\ j,k = 0,1\}$. Parameter $i$ denotes the number of packets in the
system, $j$ denotes whether there is a connection ($j = 1$) or not ($j = 0$), and $k$ denotes
whether the arrival process is in a burst ($k = 1$) or not ($k = 0$). As can be seen from
the CTMC, but as can also be verified using Requirements 1-3, this model has a structure
that allows for a matrix-geometric solution. Every level consists of the $L = 4$ states with
$i$ packets present, i.e., $\mathcal{R}(i) = \{(i,0,0), (i,0,1), (i,1,0), (i,1,1)\}$.
Clearly, $K = 1$ and the boundary equations are given by the global balance equations
for the first two levels, plus the normalisation equation, in total yielding a system of 8
linear equations. The $L \times L$ matrix $R$ with $L = 4$ now has to be solved numerically.
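For readers who wish to experiment, the repeating-level blocks can be written down directly from the SPN. The sketch below does so for one possible ordering of the four states, $(j,k) = (0,0), (0,1), (1,0), (1,1)$, with mu_p denoting the per-packet transmission rate $\mu/l$; note that the release transition only acts in boundary level 0 and therefore does not appear in these blocks. The ordering and names are ours and need not coincide with those used in [131].

import numpy as np

def ocdr_blocks(lam, alpha, beta, c, mu_p):
    # State order within a level: (no conn, off), (no conn, on), (conn, off), (conn, on).
    A0 = np.diag([0.0, lam, 0.0, lam])      # arrivals occur only during a burst
    A2 = np.diag([0.0, 0.0, mu_p, mu_p])    # transmissions require an established connection
    A1 = np.array([[0.0,  alpha, c,     0.0  ],    # go_on, set_up
                   [beta, 0.0,   0.0,   c    ],    # go_off, set_up
                   [0.0,  0.0,   0.0,   alpha],    # go_on
                   [0.0,  0.0,   beta,  0.0  ]])   # go_off
    A1 -= np.diag((A0 + A1 + A2) @ np.ones(4))     # diagonal: rows of [A0 A1 A2] sum to zero
    return A0, A1, A2

Feeding these blocks to an $R$-iteration as sketched in Section 17.3 then yields the $4 \times 4$ matrix $R$.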
Measures of interest we could address are: (i) the average node delay $E[D]$ (in seconds);
(ii) the average reserved bandwidth $E[Bw]$ (in Mbps); and (iii) the expected number of
connection establishments per second $E[C]$ (in s$^{-1}$). All these quantities can be
Figure 17.8: The expected bandwidth E[Bw] (in Mbps) as a function of the arrival rate $\lambda$ in a burst

expressed in closed form using $R$ and the boundary vector $\underline{v}_0$ (for details, see [131]). Under
the assumption that communication capacity can be claimed in various amounts, the service
rate $\mu$ can be chosen freely. Given a certain workload, a higher requested transmission
speed $\mu$ will yield smaller connection times, however, at higher costs per time unit. The
parameter $\mu$ together with the connection release rate $r$ are therefore interesting quantities
for controlling the system performance and cost.
Let us now turn to some numerical results. We assume that $\alpha = 1.0$, $\beta = 0.04$, $c = 10.0$,
and $l = 10$ kbit. We address the following two combinations of transmission and release
rates: $(\mu, r)_1 = (336.0, 1.0)$ and $(\mu, r)_2 = (236.0, 0.5)$. In the first case, the transmission
speed is relatively high, but connections are rapidly released after usage. In the second
case, a lower transmission speed is used, but a connection is maintained longer. Therefore,
arriving packets have a smaller probability of experiencing an extra connection setup delay.
In Figure 17.7 we depict the expected delay $E[D]$ (in seconds) as a function of the arrival
rate $\lambda$ in a burst. Although for $\lambda \approx 85$ the average delay values coincide ($E[D] \approx 0.0245$),
for changing $\lambda$ this is certainly not the case. For the first parameter set, the average delay
is less sensitive to changes in $\lambda$, especially towards higher values. For smaller values of $\lambda$,
the average delay is smaller for parameter set 2. Surprisingly, the less sensitive solution
requires a smaller average bandwidth as well, as illustrated in Figure 17.8. The number
of connection establishments, however, is higher, as illustrated by Figure 17.9. Since the
Figure 17.9: The expected connection setup rate E[C] (in s$^{-1}$) as a function of the arrival rate $\lambda$ in a burst

latter can also be associated with costs in a B-ISDN context, the price for the less sensitive
delay behaviour and for the smaller bandwidth consumption is paid here. Also observe
that for higher traffic, the number of connection establishments decreases, i.e., a connection,
once established, is used for a long time since the probability of having a connection
and no packets present decreases with larger $\lambda$.

17.5.3 A queueing system with checkpointing and recovery


In this section we present an analysis of the time to complete tasks on systems that change
their structure in the course of time (multi-mode systems). The task-completion time
of jobs on multi-mode systems has been studied by several authors. Bobbio describes the
completion-time problem as the time-to-absorption problem in a CTMC with an absorbing
state [22]. Chimento and Trivedi consider different types of failures that can occur as well
as different ways in which failed services might be resumed [46]. The above studies aim at
the completion time distribution of single jobs; effects of queueing, due to congestion, are
not taken into account. In [220] the performance of a queueing system in which the server
is subject to breakdowns and repairs is studied, so that queueing is taken into account.
Typically, variants of queues of M|G|1 type are studied, where, depending on the status
of the server, jobs experience a delay in the queueing system. Related models occur when
Figure 17.10: iSPN of the transaction processing system model with checkpointing and recovery

studying the effect of checkpointing strategies on the performance of computer systems


that are subject to failures and repairs [40, 221, 237]. When increasing the checkpointing
frequency, the amount of overhead increases; however, the amount of work to be done after
a failure and subsequent rollback, i.e., the actual recovery time, decreases. This interesting
trade-off has lead researchers to study the optimality of checkpointing strategies in various
situations.
To illustrate the use of iSPNs we address a job completion-time problem taking into
account queueing. We consider a simple model of a transaction processing system where
jobs (transactions) arrive in a buffer. A single server is normally available to process
these jobs. After K jobs have been processed, a checkpoint is made. We refer to K
as the checkpointing interval. When making a checkpoint, the server is unable to serve
other jobs. When the server is idle or processing ordinary jobs, it might fail. Once the
server has failed, it requires a repair after which it needs to re-serve all the jobs that
were processed since the last checkpoint. After that, the server becomes available for
ordinary job processing. For simplicity, it is assumed that the server cannot fail when
it is checkpointing or recovering. This assumption can be changed without altering the
fundamental solution approach proposed here. Also, it is assumed that the service times,
the checkpointing time and the repair time are exponentially distributed and that the
arrivals form a Poisson process. The corresponding iSPN is depicted in Figure 17.10. In
Table 17.1 we summarise the meaning of the places and transitions.

place       meaning                                          init
buffer      input queue of jobs                              0
up          server available                                 1
down        server down and recovering                       0
done        number of jobs processed since last checkpoint   0
saving      server making a checkpoint                       0
always      infinite source of jobs                          1

transition  meaning                           rate
arr         arrivals                          λ
serve       services                          μ
reserve     recovery                          μ
fail        failures                          φ
do-check    checkpointing                     α
restart     put into normal operation         β
start-chk   start checkpointing               1

Table 17.1: Places and transitions in the iSPN describing the job completion time problem
including queueing aspects

Interesting reward-based performance measures are, among others, the average number
of jobs queued, the average number of jobs that have not yet been checkpointed, the per-
centage of time that the processor is available for normal processing and the probability
that the buffer is empty (or not), all as a function of the checkpointing interval K.
We have evaluated the above described model for three different scenarios (see Ta-
ble 17.2) for varying checkpointing intervals K. First notice that the number of (tangible)
states in the repeating levels equals 2K + 1. This can be explained as follows. As long as
the system is up, i.e., there is a token in place up, there can be 0 through K - 1 tokens
in place done. When the K-th token arrives in place done, transition start-chk fires,
yielding a single token in place saving, and none in up. This makes K + 1 different states
already. Then, when the server is down, i.e., when there is a token in place down, there are
up to K - 1 possible tokens in place done, thus making up another K different situations.
In total, this yields $2K+1$ states per level. As a consequence of this, the matrices $A_i$ and
$R$ have dimension $(2K+1) \times (2K+1)$. Since these matrices in principle remain transparent
to the modeller, this is not a problem at all; however, the construction of these matrices

scenario    1       2       3
λ           5.0     6.5     8.0
φ           0.13    0.13    0.25
μ           10      10      10
α           21.0    21.0    21.0
β           11.0    11.0    11.0

Table 17.2: Parameters used in the three scenarios

Figure 17.11: The expected buffer occupancy as a function of the checkpointing interval K

“by hand” would have been very impractical.


Studying the model reveals that the level number K at which the repeating structure
begins is 1. This implies that the boundary equations comprise a system of (4K + 2)
linear equations. Consequently, the state probabilities for levels 0 and 1 are computed
directly, including their normalisation, after which state probabilities for higher levels can
be computed recursively.
In Figure 17.11 we present the expected buffer occupancy E[buffer] as a function of
the checkpointing interval $K$ for two values of the job arrival rate $\lambda$ (we compare scenarios
1 and 2 here). As can be observed, the buffer occupancy is higher for a more heavily
loaded system, for all K. Interestingly, the curves show a pronounced minimum for K = 4

(scenario 1) or K = 5 (scenario 2); a choice of K too small causes the system to make
checkpoints too often (checkpoints take an amount of time that is independent of K), thus
losing processing capacity. On the other hand, if only very few checkpoints are made,
the recovery process, which requires the reprocessing of not-yet-checkpointed jobs plus the
restart delay, will be longer. As such, this figure clearly illustrates the trade-off that exists
in designing rollback/recovery schemes.
Then, in Figure 17.12, we present the percentage of time the server is making check-
points (E[saving]), as well as the expected time the server is re-serving unchecked jobs
after a failure has occurred plus the expected restart time (E[down]), again for scenarios 1
and 2. The latter quantity is almost the same for both scenarios and therefore drawn as a
single line; since failures occur almost randomly, the number of unchecked jobs is almost
the same in both cases, so that the percentage of time the server is actively recovering
is also almost the same. The fact that the computed values for this probability are not
exactly alike lies in the fact that the server can only fail when it is serving regular jobs
or when it is idle. Since in scenario 2 the job arrival rate is larger than in scenario 1,
the server is slightly more busy making checkpoints and therefore is slightly less failure
prone. Therefore, in scenario 2 this probability is slightly smaller, but the difference is at
most $10^{-3}$. The other two curves show the decreasing percentage of time the server is busy
making checkpoints when K increases. As expected, in scenario 1 there is less load, so
there is less checkpointing required.
We finally increase the job arrival rate to 8.0, as indicated in scenario 3. For this
scenario we show two probabilities in Figure 17.13, again as a function of K. First notice
that for K = 2 the system is not stable any more. This can be understood as follows. The
percentage of time needed for normal processing equals $\lambda/\mu = 8/10 = 80\%$. On top of
that comes, for $K = 2$, half a checkpointing time per job, requiring another $4/21 \approx 19.04\%$
capacity of the server. Taking into account the non-zero probability of failure, and the
extra work required when failures occur, it becomes clear that the server cannot do all the
work when K = 2.
Let us now address the cases where the system is stable. The upper curve shows the
probability that the buffer is non-empty. As can be seen, this probability ranges around
95%. Here we see similar behaviour to that of the expected buffer occupancy. There is a
pronounced minimum for K = 6. For K < 6, the server is making too many checkpoints.
This extra work is not earned back by the shorter recovery time that results. On the other
hand, for K > 6, the recovery times become larger so that the gain of less checkpointing
overhead is lost. The lower curve shows the probability that the server is available for
regular processing or for being idle, i.e., the probability that the server is not making
Figure 17.12: The percentage of time spent making checkpoints and the percentage of time
recovering as a function of the checkpointing interval K

Figure 17.13: The probability of actual server availability and the probability of the buffer
being non-empty as a function of the checkpointing interval K

checkpoints nor is recovering. Also here a choice of K too small or too large yields a loss
in performance.

17.6 Further reading


One-place unbounded SPNs were described in the mid-1980s by Florin and Natkin [92,
93, 94]. Haverkort proposed the more general class of iSPNs (not yet named as such)
in [118], after which a software tool, named SPN2MGM, supporting the construction and
evaluation of iSPNs was reported in [125]. The M.Sc. students Klein [158] and Frank [97]
contributed to the development of SPN2MGM. Haverkort and Ost reported on the efficiency
of the matrix-geometric solution method as compared to the spectral expansion method,
using the model of Section 17.5.3 in [119]. In that paper, models with block sizes up to
hundreds of states are addressed. The OCDR mechanism was introduced by Heijenk [131]
and evaluated by Heijenk and Haverkort [132]. Recently, Ost et al. extended the OCDR
model to include non-exponential connection-setup and -release times [226].

17.7 Exercises
17.1. OCDR model.
Extend the OCDR model of Figure 17.5 such that it includes:

• hyper-exponential or Erlangian connection-setup times;

• hyper-exponential or Erlangian connection-release times.

Discuss how the size of the levels changes due to the model changes. In the last case,
be careful to reset the state of the connection-release places and transitions when a new
job arrives before the connection-release time has completely expired. For details, see also
[226].
Which of the two types of distributions do you think is more appropriate to model the
setup- and release-times?

17.2. OCDR model with Poisson arrivals.


Consider the OCDR model of Figure 17.5 when the arrivals form a pure Poisson process.

1. Find the $2 \times 2$ matrices $A_0$, $A_1$ and $A_2$ in symbolic form.

2. Explicitly solve the quadratic matrix equation and show that the matrix $R$ has the
following form:
$$R = \left(\begin{array}{cc} \dfrac{\lambda}{\lambda+c} & \dfrac{\lambda l}{\mu} \\[10pt] 0 & \dfrac{\lambda l}{\mu} \end{array}\right).$$

17.3. Checkpointing models.


Extend the model of Figure 17.10 such that:

• it includes system failures during the checkpointing process;

• checkpoints are made after $K$ jobs or after the expiration of a timer, whichever comes
first. Model the timer as an Erlang-2 distribution.

17.4. Two queues in series.


Reconsider the two queues in series, as addressed in Exercise 8.4. Model this queueing
system using iSPNs.

17.5. Polling models.


Consider a Markovian polling model with N stations. Under which conditions can such a
model be regarded as an iSPN (see also Exercise 8.5). Construct such an iSPN for N = 3
stations and 2-limited scheduling.

Part V

Simulation

Chapter 18

Simulation: methodology and statistics

In the previous chapters we have addressed models that can be solved by analytical or
numerical means. Although the class of addressed models has been very wide, there are
still models that cannot be solved adequately with the presented techniques. These models,
however, can still be analysed using simulation. With simulation there are no fundamental
restrictions towards what models can be solved. Practical restrictions do exist since the
amount of computer time or memory required for running a simulation can be prohibitively
large.
In this chapter we concentrate on the general set-up of simulations as well as on the sta-
tistical aspects of simulation studies. To compare the concept of simulation with analytical
and numerical techniques we discuss the application of simulation for the computation of an
integral in Section 18.1. Various forms of simulation are then classified in Section 18.2. Im-
plementation aspects for so-called discrete event simulations are discussed in Section 18.3.
In order to execute simulation programs, realisations of random variables have to be gen-
erated. This is an important task that deserves special attention since a wrong or biased
number generation scheme can severely corrupt the outcome of a simulation. Random
number generation is therefore considered in Section 18.4. The gathering of measurements
from the simulation and their processing is finally discussed in Section 18.5.

18.1 The idea of simulation


Consider the following mathematical problem. One has to obtain the (unknown) area $\alpha$
under the curve $y = f(x) = x^2$, from $x = 0$ to $x = 1$. Let $\hat{a}$ denote the result of the

calculation we perform to obtain this value. Since $f(x)$ is a simple quadratic term, this
problem can easily be solved analytically:
$$\hat{a} = \int_0^1 x^2\,dx = \left[\tfrac{1}{3}x^3\right]_0^1 = \tfrac{1}{3}. \qquad (18.1)$$
Clearly, in this case, the calculated value $\hat{a}$ is exactly the same as the real value $\alpha$.
Making the problem somewhat more complicated, we can pose the same question when
$f(x) = x^{\sin x}$. Now, we cannot solve the problem analytically any more (as far as we have
consulted integration tables). We can, however, resort to a numerical technique such as
the trapezoid rule. We then have to split the interval $[0,1]$ into $n$ consecutive intervals
$[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$ so that the area under the curve can be approximated as:

$$\hat{a} = \frac{1}{2} \sum_{i=1}^{n} (x_i - x_{i-1})\left(f(x_i) + f(x_{i-1})\right). \qquad (18.2)$$

By making the intervals sufficiently small, $\hat{a}$ will approximate $\alpha$ to any desired level of
accuracy. This is an example of a numerical solution technique.
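A few lines of Python suffice to carry out (18.2); the function and interval follow the text above, the names are ours.

import math

def trapezoid(f, n):
    # Composite trapezoid rule on [0, 1] with n equal-width sub-intervals, cf. (18.2).
    x = [i / n for i in range(n + 1)]
    return 0.5 * sum((x[i] - x[i - 1]) * (f(x[i]) + f(x[i - 1])) for i in range(1, n + 1))

print(trapezoid(lambda x: x ** math.sin(x), 1000))   # numerical estimate of the area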
Surprisingly, we can also obtain a reasonable estimate $\hat{a}$ for $\alpha$ by means of stochastic
simulation. Studying $f(x) = x^{\sin x}$ on the interval $[0,1]$, we see that $0 \leq f(x) \leq 1$. Taking
two random samples $x_i$ and $y_i$ from the uniform distribution on $[0,1]$ can be interpreted as
picking a random point in the unit square $\{(x,y) \mid 0 \leq x \leq 1,\ 0 \leq y \leq 1\}$. Repeating this
$N$ times, the variables $n_i = 1\{y_i \leq f(x_i)\}$ indicate whether the $i$-th point lies below $f(x)$, or
not. Then, the value
$$\hat{a} = \frac{1}{N} \sum_{i=1}^{N} n_i \qquad (18.3)$$
estimates the area $\alpha$.


In trying to obtain a by means of a so-called Monte Carlo simulation we should keep
track of the accuracy of the obtained results. Since 6 is obtained as a function of a number
of realisations of random variables, 6 is itself a realisation of a random variable (which we
denote as 2). The random variable A is often called the estimator, whereas the realisation
6 is called the estimate. The random variable A should be defined such that it obeys a
number of properties, otherwise the estimate 6 cannot be guaranteed to be accurate:

l A should be unbiased, meaning that E[A] = a;

l A should be consistent, meaning that the more samples we take, the more accurate
the estimate 6 becomes.

We will come back to these properties in Section 18.5. From the simulation we can compute
an estimate for the variance of the estimator $\hat{A}$ as follows:
$$\hat{s}^2 = \frac{1}{N(N-1)} \sum_{i=1}^{N} (n_i - \hat{a})^2. \qquad (18.4)$$

Note that this estimator should not be confused with the estimator for the variance of a
single sample, which is $N$ times larger; see also Section 18.5.2 and [231]. Now we can apply
Chebyshev's inequality, which states that for any $\beta > 0$
$$\Pr\{|\hat{A} - \hat{a}| \geq \beta\} \leq \frac{\hat{s}^2}{\beta^2}. \qquad (18.5)$$

In words, it states that the probability that $\hat{A}$ deviates more than $\beta$ from the estimated
value $\hat{a}$ is at most equal to the quotient of $\hat{s}^2$ and $\beta^2$. The smaller the allowed deviation is,
the weaker the bound on the probability. Rewriting (18.5) by setting $\delta = 1 - \hat{s}^2/\beta^2$, i.e.,
$\beta = \hat{s}/\sqrt{1-\delta}$, we obtain
$$\Pr\left\{|\hat{A} - \hat{a}| \leq \hat{s}/\sqrt{1-\delta}\right\} \geq \delta. \qquad (18.6)$$
This equation tells us that $\hat{A}$ deviates at most $\hat{s}/\sqrt{1-\delta}$ from $\hat{a}$, with a probability of
at least $\delta$. In this expression, we would like $\delta$ to be relatively large, e.g., 0.99. Then,
$\sqrt{1-\delta} = 0.1$, so that $\Pr\{|\hat{A} - \hat{a}| \leq 10\hat{s}\} \geq 0.99$. In order to let this inequality have high
significance, we must make sure that the term "$10\hat{s}$" is small. This can be accomplished
by making many observations.
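The hit-or-miss scheme (18.3), the variance estimate (18.4) and the bound (18.6) translate directly into a short program; all names are ours, and the resulting half-width is, of course, rather conservative.

import math, random

def monte_carlo_area(f, N, delta=0.99):
    # Hit-or-miss Monte Carlo estimate of the area under f over the unit square.
    hits = []
    for _ in range(N):
        x, y = random.random(), random.random()
        hits.append(1 if y <= f(x) else 0)
    a_hat = sum(hits) / N                                        # (18.3)
    s2 = sum((n - a_hat) ** 2 for n in hits) / (N * (N - 1))     # (18.4)
    return a_hat, math.sqrt(s2) / math.sqrt(1 - delta)           # half-width from (18.6)

print(monte_carlo_area(lambda x: x ** math.sin(x), 100_000))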
It is important to note that when there is an analytical solution available for a particular
problem, this analytical solution typically gives far more insight than the numerical answers
obtained from a simulation. Individual simulation results only give information about a
particular solution to a problem, and not at all over the range of possible solutions, nor do
they give insight into the sensitivity of the solution to changes in one or more of the model
parameters.

18.2 Classifying simulations


In this section we will classify simulations according to two criteria: their state space and
their time evolution. Note that we used the same classification criteria when discussing
stochastic processes in Chapter 3.
In continuous-event simulations, systems are studied in which the state continuously
changes with time. Typically, these systems are physical processes that can be described by
Figure 18.1: Classifying simulations

systems of differential equations with boundary conditions. The numerical solution of such
a system of differential equations is sometimes called a simulation. In physical systems,
time is a continuous parameter, although one can also observe systems at predefined time
instances only, yielding a discrete time parameter. We do not further address continuous-
state simulations, as we did not consider continuous-state stochastic processes.
More appropriate for our aims are discrete-event simulations (DES). In discrete-event
simulations the state changes take place at discrete points in time. Again we can either
take time as a continuous or as a discrete parameter. Depending on the application at
hand, one of the two can be more or less suitable. In the discussions to follow we will
assume that we deal with time as a continuous parameter.
In Figure 18.1 we show the discussed classification, together with some sub-classifications
that follow below.

18.3 Implementation of discrete-event simulations


Before going into implementation details of discrete-event simulations, we first define some
terminology in Section 18.3.1. We then present time-based simulations in Section 18.3.2
and event-based simulations in Section 18.3.3. We finally discuss implementation strategies
for event-based discrete-event simulations in Section 18.3.4.

18.3.1 Terminology

The simulation time or simulated time of a simulation is the value of the parameter “time”
that is used in the simulation program, which corresponds to the value of the time that
would have been valid in the real system. The run time is the time it takes to execute a
simulation program. Difference is often made between wall-clock time and process time;

the former includes any operating system overhead, whereas the latter includes only the
required CPU, and possibly I/O time, for the simulation process.
In a discrete-event system, the state will change over time. The cause of a state variable
change is called an event. Very often the state changes themselves are also called events.
Since we consider simulations in which events take place one-by-one, that is, discrete in
time, we speak of discrete-event simulations. In fact, it is because events in a discrete-
event system happen one-by-one that discrete-event simulations are so much easier to
handle than simulations of continuous-events systems. In discrete-event simulations we
“‘jump” from event to event and it is the ordering of events and their relative timing we are
interested in, because this exactly describes the performance of the simulated system. In
a simulation program we will therefore mimic all the events. By keeping track of all these
events and their timing, we are able to derive measures such as the average inter-event
time or the average time between specific pairs of events. These then form the basis for
the computation of performance estimates.

18.3.2 Time-based simulation

In a time-based simulation (also often called synchronous simulation) the main control loop
of the simulation controls the time progress in constant steps. At the beginning of this
control loop the time $t$ is increased by a step $\Delta t$ to $t + \Delta t$, with $\Delta t$ small. Then it is
checked whether any events have happened in the time interval $[t, t + \Delta t]$. If so, these
events will be executed, that is, the state will be changed according to these events, before
the next cycle of the loop starts. It is assumed that the ordering of the events within the
interval $[t, t + \Delta t]$ is not of importance and that these events are independent. The number
of events that happened in the interval $[t, t + \Delta t]$ may change from time to time. When $t$
rises above some maximum, the simulation stops. In Figure 18.2 a diagram of the actions
to be performed in a time-based simulation is given.
Time-based simulation is easy to implement. The implementation closely resembles the
implementation of numerical methods for solving differential equations. However, there
are some drawbacks associated with this method as well. Both the assumption that the
ordering of events within an interval $[t, t + \Delta t]$ is not important and the assumption that
these events are independent require that $\Delta t$ be sufficiently small, in order to minimize
the probability of occurrence of mutually dependent events. For this reason, we normally
have to take $\Delta t$ so small that the resulting simulation becomes very inefficient. Many very
short time-steps will have to be performed without any event occurring at all. For these
reasons time-based simulations are not often employed.
Figure 18.2: Diagram of the actions to be taken in a time-based simulation

Example 18.1. A time-based M|M|1 simulation program.


As an example, we present the framework of a time-based simulation program for an
M|M|1 queue with arrival rate $\lambda$ and service rate $\mu$. In this program, we use two state
variables: $N_s \in \{0,1\}$, denoting the number of jobs in service, and $N_q \in \mathbb{N}$, denoting the
number of jobs queued. Notice that there is a slight redundancy in these two variables since
$N_q > 0 \Rightarrow N_s = 1$. The aim of the simulation program is to generate a list of time instances
and the state variables at these instances. The variable $\Delta t$ is assumed to be sufficiently
small. Furthermore, we have to make use of a function draw(p), which evaluates to true
with probability $p$ and to false with probability $1-p$; see also Section 18.4.
The resulting program is presented in Figure 18.3. After the initialisation (lines 1-3),
the main program loop starts. First, the time is updated (line 6). If during the period
$[t, t + \Delta t)$ an arrival has taken place, which happens with probability $\lambda \cdot \Delta t$ in a Poisson
process with rate $\lambda$, we have to increase the number of jobs in the queue (line 7). Then, we
check whether there is a job in service. If not, the just arrived job enters the server (lines
13-14). If there is already a job in service, we verify whether its service has ended in the
last interval (line 9). If so, the counter $N_s$ is set to 0 (line 12), unless there is another job
waiting to be served (line 10); in that case a job is taken out of the queue and the server
1. input(λ, μ, t_max)
2. t := 0
3. N_s := 0; N_q := 0
4. while t < t_max
5. do
6.     t := t + Δt
7.     if draw(λ · Δt) then N_q := N_q + 1
8.     if N_s = 1
9.     then if draw(μ · Δt)
10.         then if N_q > 0
11.             then N_q := N_q - 1
12.             else N_s := 0
13.    if N_s = 0 and N_q > 0
14.    then N_s := 1; N_q := N_q - 1
15.    writeln(t, N_q, N_s)
16. od

Figure 18.3: Pseudo-code for a time-based M|M|1 simulation

remains occupied ($N_s$ does not need to be changed). □

18.3.3 Event-based simulation


In time-based simulations the time steps were of fixed length, but the number of events per
time step varied. In event-based simulations (also often called asynchronous simulation) it
is just the other way around. We then deal with time steps of varying length such that
there is always exactly one event in every time step. So, the simulation is controlled by
the occurrence of “next events” . This is very efficient since the time steps are now just
long enough to optimally proceed with the simulation and just short enough to exclude the
possibility of having more than one event per time step, thus circumventing the problems
of handling dependent events in one time step.
Whenever an event occurs this causes new events to occur in the future. Consider for
instance the arrival of a job at a queue. This event causes the system state to change,
but will also cause at least one event in the future, namely the event that the job is taken
Figure 18.4: Diagram of the actions to be taken in an event-based simulation

into service. All future events are generally gathered in an ordered event list. The head
of this list contains the next event to occur and its occurrence time. The tail of this list
contains the future events, in their occurrence order. Whenever the first event is simulated
(processed), it is taken from the list and the simulation time is updated accordingly. In the
simulation of this event, new events may be created. These new events are inserted in the
event list at the appropriate places. After that, the new head of the event list is processed.
In Figure 18.4 we show a diagram of the actions to be performed in such a simulation.
Most of the discrete-event simulations performed in the field of computer and com-
munication performance evaluation are of the event-based type. The only limitation to
event-based simulation is that one must be able to compute the time instances at which
future events take place. This is not always possible, e.g., if very complicated delay dis-
tributions are used, or if the system is a continuous-variable dynamic system. In those
cases, time-based simulations may be preferred. Also when simulating at a very fine time-
1. input(λ, μ, t_max)
2. t := 0
3. N_s := 0; N_q := 0
4. while t < t_max
5. do
6.     if N_s = 1
7.     then narr := negexp(λ)
8.          ndep := negexp(μ)
9.          if ndep < narr
10.         then t := t + ndep
11.              if N_q > 0
12.              then N_q := N_q - 1
13.              else N_s := 0
13.         else t := t + narr
14.              N_q := N_q + 1
15.    else narr := negexp(λ)
16.         t := t + narr
17.         N_s := 1
18.    writeln(t, N_q, N_s)
19. od

Figure 18.5: Pseudo-code for an event-based M|M|1 simulation

granularity, time-based simulations are often used, e.g., when simulating the execution of
microprocessor instructions. In such cases, the time-steps will resemble the processor clock-
cycles and the microprocessor should have been designed such that dependent events within
a clock-cycle do not exist. We will only address event-based simulations from now on.

Example 18.2. An event-based M|M|1 simulation program.


We now present the framework of an event-based simulation program for the M|M|1 queue
we addressed before. We again use two state variables: $N_s \in \{0,1\}$ denoting the number of
jobs in service, and $N_q \in \mathbb{N}$ denoting the number of jobs queued. We furthermore use two
variables that represent the possible next events: narr denotes the time of the next arrival
and ndep denotes the time of the next departure. Since there are at most two possible

next events, we can store them in just two variables (instead of in a list). The aim of the
simulation program is to generate a list of event times, and the state variables at these
instances. We have to make use of a function negexp(λ) which generates a realisation of a
random variable with negative exponential distribution with rate $\lambda$; see also Section 18.4.

The resulting program is presented in Figure 18.5. After the initialisation (lines 1-3), the main program loop starts. Using the variable $N_s$, it is decided what the possible
next events are (line 6). If there is no job being processed, the only possible next event
is an arrival: the time until this next event is generated, the simulation time is updated
accordingly and the state variable $N_s$ is increased by 1 (lines 15-17). If there is a job being
processed, then two possible next events exist. The times for these two events are computed
(lines 7-8) and the one which occurs first is performed (decision in line 9). If the departure
takes place first, the simulation time is adjusted accordingly, and if there are jobs queued,
one of them is taken into service. Otherwise, the queue and server remain empty (lines
10-13). If the arrival takes place first, the time is updated accordingly and the queue is
enlarged by 1 (lines 13-14). □
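For concreteness, a direct Python transcription of Figure 18.5 is given below; negexp is realised with random.expovariate, and note that re-drawing both event times in every iteration is only valid here because the exponential distribution is memoryless. The names are illustrative.

import random

def simulate_mm1(lam, mu, t_max, seed=None):
    # Event-based simulation of the M|M|1 queue, mirroring Figure 18.5.
    rng = random.Random(seed)
    t, Ns, Nq = 0.0, 0, 0
    trace = []
    while t < t_max:
        if Ns == 1:
            narr = rng.expovariate(lam)          # time until the next arrival
            ndep = rng.expovariate(mu)           # time until the next departure
            if ndep < narr:                      # the departure occurs first
                t += ndep
                if Nq > 0:
                    Nq -= 1                      # a queued job enters service
                else:
                    Ns = 0                       # the server becomes idle
            else:                                # the arrival occurs first
                t += narr
                Nq += 1
        else:                                    # empty system: only an arrival is possible
            t += rng.expovariate(lam)
            Ns = 1
        trace.append((t, Nq, Ns))
    return trace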

18.3.4 Implementation strategies


Having chosen the event-based approach towards simulation, there exist different imple-
mentation forms. The implementation can either be event-oriented or process-oriented.
With the event-oriented implementation there is a procedure Pi defined for every type
of event i that can occur. In the simulator an event list is defined. After initialisation of
this event list the main control loop starts, consisting of the following steps. The first event
to happen is taken from the list. The simulation time is incremented to the value at which
this (current) event occurred. Then, if this event is of type i, the procedure Pi is invoked.
In this procedure the simulated system state is changed according to the occurrence of
event i, and new events are generated and inserted in the event list at the appropriate
places. After procedure Pi terminates, some statistics may be collected, and the main
control loop is continued. Typically employed stopping criteria include the simulated time,
the number of processed events, the amount of used processing time, or the width of the
confidence intervals that are computed for the measures of interest (see Section 18.5).
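
As an illustration (not taken from any particular simulation package), a minimal event-oriented
control loop with the event list kept in a priority queue could be sketched as follows in Python;
the handler interface shown here is an assumption made for the sake of the example.

import heapq

def event_oriented_loop(handlers, initial_events, t_max):
    """Event-oriented main control loop (illustrative sketch).

    handlers maps an event type i to a procedure P_i(state, t) that changes the
    simulated system state and returns newly generated (time, event_type) tuples."""
    event_list = list(initial_events)
    heapq.heapify(event_list)                     # event list, ordered by event time
    state, t = {}, 0.0
    while event_list and t < t_max:
        t, ev_type = heapq.heappop(event_list)    # first event to happen
        for new_event in handlers[ev_type](state, t):
            heapq.heappush(event_list, new_event) # insert new events at the appropriate places
        # statistics on the measures of interest may be collected here
    return state
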
In an event-oriented implementation, the management of the events is explicitly visible.
In a process-oriented implementation, on the other hand, a process is associated with every
event-type. These processes exchange information to communicate state changes to one
another, e.g., via explicit message passing or via shared variables. The simulated system
operation can be seen as an execution path of the communicating event-processes. The
scheduling of the events in the simulation is done implicitly in the scheduling of the event-
processes. The latter can be done by the operating system or by the language run-time
system. A prerequisite for this approach is that language elements for parallel programming
are provided.
Both implementation strategies are used extensively. For the event-oriented implemen-
tation normal programming languages such as Pascal or C are used. For the process-
oriented implementation, Simula’67 has been used widely for a long period; however,
currently the use of C++ in combination with public domain simulation classes is most
common.
Instead of explicitly coding a simulation, there are many software packages available
(both public domain and commercial) that serve to construct and execute simulations in
an efficient way. Internally, these packages employ one of the two methods discussed above;
however, to the user they present themselves in a more application-oriented way, thus
hiding most of the details of the actual simulation (see also Section 1.5 on the GMTF).
A number of commercial software packages, using graphical interfaces, for the simulation
of computer-communication systems have recently been discussed by Law and McComas
[176]; with these tools, the simulations are described as block-diagrams representing the
system components and their interactions. A different approach is taken with (graphical)
simulation tools based on queueing networks and stochastic Petri nets. With such tools,
the formalisms we have discussed for analytical and numerical performance evaluations
are extended so that they can be interpreted as simulation specifications. The tools then
automatically transform these specifications to executable simulation programs and present
the results of these simulations in a tabular or graphical format. Of course, restrictions
that apply for the analytic and numerical solutions of these models do not apply any more
when the simulative solution is used. For more information, we refer to the literature,
e.g., [125].

18.4 Random number generation


In order to simulate performance models of computer-communication systems using a com-
puter program we have to be able to generate random numbers from certain probability
distributions, as we have already seen in the examples in the previous section. Random
number generation (RNG) is a difficult but important task; when the generated random
numbers do not conform to the required distribution, the results obtained from the simu-
lation should at least be regarded with suspicion.


To start with, true random numbers cannot be generated with a deterministic algo-
rithm. This means that when using computers for RNG, we have to be satisfied with
pseudo-random numbers. To generate pseudo-random numbers from a given distribution,
we proceed in three steps. We first generate a series of pseudo-random numbers on a finite
subset of ℕ, normally {0, ..., m − 1}, m ∈ ℕ. This is discussed in Section 18.4.1. From
this pseudo-random series, we compute (pseudo) uniformly distributed random numbers.
To verify whether these pseudo-random numbers can be regarded as true random numbers
we have to employ a number of statistical tests. These are discussed in Section 18.4.2.
Using the uniformly distributed random variables, various methods exist to compute non-
uniform pseudo-random variables. These are discussed in Section 18.4.3.

18.4.1 Generating pseudo-random numbers


The generation of sequences of pseudo-random numbers is a challenging task. Although
many methods exist for generating such sequences, we restrict ourselves here to the so-
called linear and additive congruential methods, since these methods are relatively easy to
implement and most commonly used. An RNG can be classified as good when:

• successive pseudo-random numbers can be computed with little cost;

• the generated sequence appears as truly random, i.e., successive pseudo-random numbers
  are independent from and uncorrelated with one another and conform to the desired
  distribution;

• its period (the time after which it repeats itself) is very long.

Below, we will present two RNGs and comment on the degree of fulfillment of these prop-
erties.
The basic idea of linear congruential RNGs is simple. Starting with a value x_0, the
so-called seed, x_{i+1} is computed from x_i as follows:

x_{i+1} = (a x_i + c) modulo m.    (18.7)

With the right choice of parameters a, c, and m, this algorithm will generate m different
values, after which it starts anew. The number m is called the cycle length. Since the
next value of the series only depends on the current value, the cycle starts anew whenever
a value reappears. The linear congruential RNG will generate a cycle of length m if the
following three conditions hold:

• the values m and c are relative primes, i.e., their greatest common divisor is 1;

• all prime factors of m should divide a − 1;

• if 4 divides m, then 4 should also divide a − 1.

These conditions only state something about the cycle length; they do not imply that the
resulting cycle appears as truly random.

Example 18.3. Linear congruential method.


Consider the case when m = 16, c = 7, and a = 5. We can easily check the conditions
above. Starting with x_0 = 0, we obtain x_1 = (5 × 0 + 7) modulo 16 = 7. Continuing in this
way we obtain: 0, 7, 10, 9, 4, 11, 14, .... □
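
The following small Python sketch implements the linear congruential scheme (18.7); with
the parameters of this example it reproduces the sequence 0, 7, 10, 9, 4, 11, 14, ....

def lcg(m, a, c, seed):
    """Linear congruential RNG: x_{i+1} = (a*x_i + c) mod m."""
    x = seed
    while True:
        yield x
        x = (a * x + c) % m

gen = lcg(m=16, a=5, c=7, seed=0)
print([next(gen) for _ in range(8)])    # [0, 7, 10, 9, 4, 11, 14, 13]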

The main problem with linear congruential methods is that the cycles are relatively
short, hence, there is too much repetition, too little randomness. This problem is avoided
by using additive congruential methods. With these methods, the i-th value x_i is derived from
the k previous values (x_{i−1}, ..., x_{i−k}) in the following way:

x_i = ( Σ_{j=1}^{k} a_j x_{i−j} ) modulo m.    (18.8)

The starting values x_0 through x_{k−1} are generally derived by a linear congruential method,
or by assuming x_l = 0, for l < 0. With an appropriate selection of the factors a_j, cycles of
length m^k − 1 are obtained.

Example 18.4. Additive congruential method.


Choosing the value k = 7 and setting the coefficients a_1 = a_7 = 1 and a_2 = · · · = a_6 = 0, we
can extend the previous example. As starting sequence we take the first 7 terms computed
before: 0, 7, 10, 9, 4, 11, 14. The next value would then be (14 + 0) modulo 16 = 14.
Continuing in this way we obtain: 0, 7, 10, 9, 4, 11, 14, 14, 5, 15, 8, 12, 7, 5, .... Observe that
when a number reappears, this does not mean that the cycle restarts. For this example,
the cycle length is limited by 16^7 − 1 = 268435455. □
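
The same example in code: a Python sketch of the additive congruential scheme (18.8) with
k = 7 and a_1 = a_7 = 1, seeded with the values from the previous example, reproduces the
sequence of Example 18.4.

from collections import deque

def additive_cong(m, seeds):
    """Additive congruential RNG with a_1 = a_k = 1 (all other a_j = 0),
    i.e. x_i = (x_{i-1} + x_{i-k}) mod m, started from k seed values."""
    buf = deque(seeds, maxlen=len(seeds))
    for x in seeds:
        yield x
    while True:
        x = (buf[-1] + buf[0]) % m      # x_{i-1} + x_{i-k}
        yield x
        buf.append(x)

gen = additive_cong(16, [0, 7, 10, 9, 4, 11, 14])
print([next(gen) for _ in range(14)])   # [0, 7, 10, 9, 4, 11, 14, 14, 5, 15, 8, 12, 7, 5]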

Finally, it is advisable to use a different RNG for each random number sequence to be
used in the simulation, otherwise undesired dependencies between random variables can
be introduced. Also, a proper choice of the seed is of importance. There are good RNGs
that do not function properly, or not optimally, with wrongly chosen seeds. To be able
to reproduce simulation experiments, it is necessary to control the seed selection process;
taking a random number as seed is therefore not a good idea.

18.4.2 Testing pseudo-uniformly distributed random numbers

With the methods of Section 18.4.1 we are able to generate pseudo-random sequences.
Since the largest number that is obtained is m − 1, we can simply divide the successive x_i
values by m − 1 to obtain a sequence of values u_i = x_i/(m − 1). It is then assumed that
these values are pseudo-uniformly distributed.
Before we proceed to compute random numbers obeying other distributions, it is now
time to verify whether the generated sequence of pseudo-uniform random numbers can
indeed be viewed as a realisation sequence of the uniform distribution.

Testing the uniform distribution with the χ²-test

We apply the χ²-test to decide whether a sequence of n random numbers x_1, ..., x_n obeys
the uniform distribution on [0, 1]. For this purpose, we divide the interval [0, 1] into k intervals
I_i = [(i − 1)/k, i/k], that is, I_i is the i-th interval of length 1/k in [0, 1] starting from the
left, i = 1, ..., k. We now compute the number n_i of generated random numbers in the
i-th interval:

n_i = |{x_j | x_j ∈ I_i, j = 1, ..., n}|.    (18.9)

We would expect all values n_i to be close to n/k. We now define as quality criterion for
the RNG the relative squared difference of the values n_i and their expectation [289]:

d = Σ_{i=1}^{k} (n_i − n/k)² / (n/k).    (18.10)

The value d is a realisation of a stochastic variable D which has approximately a χ²-
distribution with k − 1 degrees of freedom. The hypothesis that the generated numbers do
come from the uniform distribution on [0, 1] cannot be rejected with probability α, if d is
smaller than the critical value for χ²_{k−1}, according to Table 18.1.
A few remarks are in order here. For the χ²-test to be valuable, we should have a large
number of intervals k, and the number of random numbers in each interval should not be
too small. Typically, one would require k ≥ 10 and n_i ≥ 5.
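
As an illustration, the χ²-statistic d of (18.10) can be computed with the following Python
sketch; the critical value still has to be looked up in Table 18.1 for the chosen k and α.

def chi_square_statistic(u, k):
    """Return d = sum_i (n_i - n/k)^2 / (n/k) for k equal-width intervals of [0, 1]."""
    n = len(u)
    counts = [0] * k
    for x in u:
        counts[min(int(x * k), k - 1)] += 1   # interval index of x
    expected = n / k
    return sum((c - expected) ** 2 / expected for c in counts)

# e.g., with k = 10 intervals and alpha = 0.95, the hypothesis of uniformity is not
# rejected if chi_square_statistic(u, 10) < 16.919 (Table 18.1, k - 1 = 9 degrees of freedom)
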
The χ²-test employs a discretisation to test the generated pseudo-random sequence.
Alternatively, one could use the Kolmogorov-Smirnov test to directly test the real numbers
generated. As quality measure, this test uses the maximum difference between the desired
CDF and the observed CDF; for more details, see e.g., [137, 145].

k    α = 0.9    α = 0.95    α = 0.99


2 4.605 5.991 9.210
3 6.253 7.817 11.356
4 7.779 9.488 13.277
5 9.236 11.071 15.086
6 10.645 12.592 16.812
7 12.017 14.067 18.475
8 13.362 15.507 20.090
9 14.684 16.919 21.666
10 15.987 18.307 23.209
15 22.307 24.996 30.578
20 28.412 31.410 37.566
25 34.382 37.653 44.314
30 40.256 43.773 50.892
40 51.805 55.759 63.691
50 63.17 67.505 76.154
60 74.40 79.082 88.379
70 85.53 90.531 100.425
80 96.58 101.879 112.329
90 107.6 113.145 124.116
100 118.5 124.342 135.807
k > 100    ½(h + 1.28)²    ½(h + 1.64)²    ½(h + 2.33)²

Table 18.1: Critical values χ²_{k,α} for the χ²-distribution with k degrees of freedom and
confidence level α such that Pr{D ≤ χ²_{k,α}} = α (with h = √(2k − 1))

Testing the correlation structure

In order to test whether successive random numbers can be considered independent from
one another, we have to study the correlation between the successive pseudo-random num-
bers. Let the random numbers x_1, ..., x_n be generated uniform numbers on [0, 1]. The
auto-correlation coefficient with lag k ≥ 1 is then estimated by:

c_k = (1/(n − k)) Σ_{i=1}^{n−k} (x_i − ½)(x_{i+k} − ½).    (18.11)

Since C_k is the sum of a large number of identically distributed random variables, it has a
Normal distribution, here with mean 0 and variance (144(n − k))^{−1}. Therefore, the random

Table 18.2: Critical values z for the N(0, 1)-distribution and confidence level α such that
Pr{|Z| ≤ z} = α

variable A_k = 12 C_k √(n − k) is N(0, 1)-distributed. Hence, we can determine the value z
such that

Pr{ |C_k| ≤ z/(12√(n − k)) } = Pr{ |A_k| ≤ z } = α,    (18.12)

by using Table 18.2. Thus, for a chosen confidence level α, the autocorrelation coefficient
at lag k will lie in the interval [c_k − z/(12√(n − k)), c_k + z/(12√(n − k))], where c_k is a realisation
of C_k. For a proper uniform RNG, the auto-correlation coefficients should be very close to
0, that is, the computed confidence intervals should contain 0.
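
Based on the formulas (18.11)-(18.12) as reconstructed above, a small Python sketch of the
auto-correlation test could look as follows; z is the N(0, 1) double-sided critical value from
Table 18.2 for the chosen confidence level.

import math

def autocorrelation_interval(u, k, z):
    """Estimate the lag-k auto-correlation coefficient c_k of uniform samples u and
    return (c_k, confidence interval) with half-width z/(12*sqrt(n-k))."""
    n = len(u)
    c_k = sum((u[i] - 0.5) * (u[i + k] - 0.5) for i in range(n - k)) / (n - k)
    half = z / (12.0 * math.sqrt(n - k))
    return c_k, (c_k - half, c_k + half)

# with z = 1.645 (90% confidence for the N(0,1) distribution), the RNG passes the
# lag-k test if the returned interval contains 0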

18.4.3 Generation of non-uniformly distributed random numbers


There are various techniques to use uniform random numbers to obtain differently dis-
tributed random numbers that obey other distributions. We present some of these tech-
niques below.

The inversion method

Consider the distribution function F_Y(y) of some stochastic variable Y. Let Z be a random
variable defined as a function of the random variable Y, and let us choose as function a
very special one, namely the distribution function of Y: i.e., Z = F_Y(Y). The distribution
function of Z, i.e., F_Z(z), has the following form:

F_Z(z) = Pr{Z ≤ z} = Pr{F_Y(Y) ≤ z}.    (18.13)

Now, assuming that F_Y can be inverted, we can equate the latter probability with Pr{Y ≤
F_Y^{−1}(z)}, for 0 ≤ z ≤ 1. But, since F_Y(y) = Pr{Y ≤ y} we find, after having substituted
F_Y^{−1}(z) for y:

F_Z(z) = F_Y(F_Y^{−1}(z)) = z, for 0 ≤ z ≤ 1.    (18.14)

In conclusion, we find that Z is distributed uniformly on [0, 1]. To generate random
numbers with a distribution F_Y(y) we now proceed as follows. We generate a uniformly
distributed random number z and apply the inverse function to yield y = F_Y^{−1}(z). The


Figure 18.6: Deriving a continuous random variable from a uniformly distributed random
variable

realisations y are then distributed according to distribution function F_Y. In Figure 18.6
we visualise the inversion approach.

Example 18.5. Negative exponential random numbers.


To generate random numbers from the exponential distribution F_Y(y) = 1 − e^{−λy}, y > 0,
we proceed as follows. We solve z = F_Y(y) for y to find: y = −ln(1 − z)/λ. Thus, we can
generate uniformly distributed numbers z, and apply the just derived equation to obtain
exponentially distributed random numbers y. To save one arithmetic operation, we can
change the term 1 − z to z, since if z is uniformly distributed, then 1 − z is so as well. □
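
In code, the inversion method for the negative exponential distribution then amounts to the
following Python sketch (one possible realisation of the function negexp used in Figure 18.5):

import math, random

def negexp(lam, rng=random):
    """Negative exponentially distributed random number with rate lam (inversion)."""
    z = rng.random()                  # uniformly distributed on [0, 1)
    return -math.log(1.0 - z) / lam   # or -log(z)/lam, saving one subtraction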

Example 18.6. Erlang-k random numbers.


To generate random numbers from the Erlang-k distribution, we generate k random num-
bers, distributed according to a negative exponential distribution, and simply add these.
In order to avoid having to take k logarithms, we can consider the following. Let u_1, ..., u_k
be k uniformly distributed random numbers, and let x_i = −ln(u_i)/λ be the corresponding
k negative exponentially distributed random numbers (λ is the rate per phase). We now
compute the Erlang-k distributed number x as follows:

x = Σ_{i=1}^{k} x_i = −(1/λ) ln( Π_{i=1}^{k} u_i ).    (18.15)

In conclusion, we simply have to multiply the k terms, and have to take only one nat-
ural logarithm. Since RNGs are invoked very often during a simulation, such efficiency
improvements are very important. □
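
A Python sketch of this more efficient Erlang-k generator:

import math, random

def erlang_k(k, lam, rng=random):
    """Erlang-k random number with rate lam per phase: one logarithm of a product."""
    prod = 1.0
    for _ in range(k):
        prod *= 1.0 - rng.random()    # uniform on (0, 1], avoids log(0)
    return -math.log(prod) / lam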

Example 18.7. Hyper-exponential random numbers.


The hyper-exponential distribution can be interpreted as a choice between n negative expo-


Figure 18.7: Rejection method for generating random variables with density f_X(x)

nential distributions, each with rate λ_i. We therefore first generate a random integer i from
the set {1, ..., n}, according to the selection probabilities; this is the selection phase. We then
generate a negative exponentially distributed random number with rate λ_i. □
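
Similarly, a sketch for the 2-stage hyper-exponential case, where p_1 and p_2 = 1 − p_1 are the
selection probabilities:

import math, random

def hyperexp2(p1, lam1, lam2, rng=random):
    """2-stage hyper-exponential random number: selection phase, then exponential."""
    rate = lam1 if rng.random() < p1 else lam2     # select phase 1 with probability p1
    return -math.log(1.0 - rng.random()) / rate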

The rejection method

For obtaining random variables for which the inverse distribution function cannot be easily
obtained, we can use the rejection method, provided we know the density function f_X(x).
Furthermore, the density function must have a finite domain, say [a, b], as well as a finite
image on [a, b], say [0, c]. If there are values x ∉ [a, b] for which f_X(x) > 0, then the rejection
method only provides approximate random numbers. In Figure 18.7 we show a density
function fulfilling the requirements. We proceed as follows. We generate two uniformly
distributed numbers u_1 and u_2 on [0, 1] and derive the random numbers x = a + (b − a)u_1
and y = c u_2. The tuple (x, y) is a randomly selected point in the rectangle [a, b] × [0, c].
Now, whenever y < f_X(x), that is, whenever the point (x, y) lies below the density f_X(x),
we accept x. Successive values for x then obey the density f_X(x). Whenever y ≥ f_X(x)
we repeat the procedure until we encounter a tuple for which the condition holds. This
procedure is fairly efficient when the area under the density f_X(x) is close to c(b − a). In
that case we have a relatively high probability that a sample point lies under the density,
so that we do not need many sample points.
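
A Python sketch of the rejection procedure just described; the density fx and the example
at the bottom are illustrative assumptions, not prescribed by the text.

import random

def rejection_sample(fx, a, b, c, rng=random):
    """Draw one sample from density fx with support [a, b] and 0 <= fx(x) <= c."""
    while True:
        x = a + (b - a) * rng.random()   # uniform on [a, b]
        y = c * rng.random()             # uniform on [0, c]
        if y < fx(x):                    # point lies below the density: accept x
            return x

# illustrative use: a triangular density on [0, 2] with maximum value 1 at x = 1
# sample = rejection_sample(lambda x: 1.0 - abs(x - 1.0), a=0.0, b=2.0, c=1.0)
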
The proof of the rejection method is fairly simple. Consider two random variables: X
is distributed uniformly on [a, b], and Y is distributed uniformly on [0, c]. We then proceed
to compute the following conditional probability, which exactly equals the probability
distribution function for an accepted value x according to the rejection scheme:

Pr{x ≤ X ≤ x + dx | Y ≤ f_X(X)} = Pr{x ≤ X ≤ x + dx, Y ≤ f_X(x)} / Pr{Y ≤ f_X(X)}
                                 = (dx/(b − a)) (f_X(x)/c) (1/(c(b − a)))^{−1} = f_X(x) dx.

We see that the conditional probability reduces to the required probability density.

Normally distributed random numbers

For some random variables, the distribution function cannot be explicitly computed or
inverted, nor does the density function have a finite domain. For these cases we have to
come up with yet other methods to generate random numbers.
As a most interesting example of these, we address the normal distribution. We apply
the central limit theorem to compute normally distributed random numbers as follows. We
first generate n independent and identically distributed random numbers x_1, ..., x_n, which
can be seen as realisations of the random variables X_1, ..., X_n, which are all distributed
as the random variable X with mean E[X] and variance var[X]. We can then define the
random variable S_n = X_1 + · · · + X_n. The central limit theorem then states that the
random variable

N = (S_n − nE[X]) / √(n var[X])

approaches a normally distributed random variable with mean 0 and variance 1, i.e., an
N(0, 1) distribution.
Now, by choosing the uniform distribution on [0, 1] for X (X has mean E[X] = 1/2
and var[X] = 1/12) and taking n = 12 samples, S_12 = X_1 + · · · + X_12, so that

N = S_12 − 6    (18.16)

approaches an N(0, 1)-distributed random variable. An advantage of using N is that it is
very efficient to compute. Of course, taking larger values for n increases the accuracy of
the generated random numbers.
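
A Python sketch of this "sum of twelve uniforms" method:

import random

def normal01(rng=random):
    """Approximately N(0,1)-distributed random number: S_12 - 6 (equation (18.16))."""
    return sum(rng.random() for _ in range(12)) - 6.0

# an N(mu, sigma^2)-distributed number is then obtained as mu + sigma * normal01()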

18.5 Statistical evaluation of simulation results


A discrete-event simulation is performed to obtain quantitative insight into the operation
of the modelled system. When executing a simulation (program), relevant events can be
time-stamped. All the resulting samples (or observations) can be written to a trace or
log file. In Section 18.5.1 we discuss how we can obtain single samples from a simulation
execution. Although a complete log file contains all the available information, it is generally
of little practical use. Therefore, the simulation log is “condensed” to a format that is more
suitable for human interpretation. This step is performed using statistical techniques and
is discussed in Section 18.5.2.

18.5.1 Obtaining measurements

We address the issues of obtaining individual samples and the removal of invalid samples
(the initial transient) below.

Sampling individual events

We distinguish two types of measures that can be obtained from a simulation: user-oriented
measures and system-oriented measures. User-oriented measures are typically obtained by
monitoring specific users of the system under study, i.e., by monitoring individual jobs. An
example of a user-oriented measure is the job residence time in some part of the modelled
system. When the i-th job enters that system part, a time-stamp t_i^(a) ("a" for arrival) is
taken. When the job leaves the system part a time-stamp t_i^(d) ("d" for departure) is taken.
The difference t_i = t_i^(d) − t_i^(a) is a realisation of the job residence time. By summing over
all simulated jobs, denoted as n, we finally obtain an estimate for the mean job residence
time as:

t̄ = (1/n) Σ_{i=1}^{n} t_i = (1/n) ( Σ_{i=1}^{n} t_i^(d) − Σ_{i=1}^{n} t_i^(a) ).    (18.17)

Notice that during the simulation, we do not have to store all the individual time-stamp
values, since in the end we only need the difference of their sums.
For the derivation of system-oriented measures no individual jobs should be monitored,
but the system state itself. A typical example is the case where the measure of interest
is the long-run probability that a finite buffer is fully occupied. Upon every state change
in the model the system state of interest is checked for this condition. If this condition
becomes true, a time-stamp t_i^(f) is taken ("f" for full). The next time-stamp, denoted
t_i^(n) ("n" for not full), is then taken when the buffer is not fully occupied anymore. The
difference t_i^(n) − t_i^(f) is a realisation of a random variable that could be called the "buffer-full
period". The sum of all these periods, divided by the total simulation time, then estimates
the long-run buffer-full probability, i.e.,

(1/T) Σ_{i=1}^{n} ( t_i^(n) − t_i^(f) ),    (18.18)

where we assume that during the simulation of length T we have experienced n buffer-full
periods. Notice that the simulation time T is predetermined for system-oriented measures,
whereas the number of samples n is prespecified for user-oriented measures, in order to
obtain unbiased estimators.

Initial transient removal

With most simulations, we try to obtain steady-state performance measures. However,


when starting the simulation, the modelled system state will generally be very non-typical,
e.g., in queueing network simulations one may choose all queues to be initially empty.
Hence, the observations made during the first period of the simulation will be non-typical
as well, thus influencing the simulation results in an inappropriate way. Therefore, we
should ignore the measurements taken during this so-called (initial) transient period. The
main question then is: how long should we ignore the measurements, or, in other words,
how long should we take this initial transient ? This question is not at all easy to answer;
Pawlikowski discusses 11 “rules” to recognise the initial transient period [231]. Below, we
briefly discuss a number of simple guidelines.
The first guideline is to simulate so long that the effect of the initial transient period
becomes negligible. Of course, this is not an efficient method, nor does it provide any
evidence. This method becomes slightly better when combined with a smart initial state
in the simulation, e.g., based on an analytic queueing network model of a simplified version
of the simulation model. In doing so, the period in which queues have to built up towards
their mean occupation becomes smaller. But this method does not provide evidence for a
particular choice.
A slightly better guideline is given by the truncation method, which removes the first
l < n samples from a sequence of samples x_1, ..., x_n, where l is the smallest value such
that:

min{x_{l+1}, ..., x_n} ≠ x_{l+1} ≠ max{x_{l+1}, ..., x_n}.    (18.19)

In words: the samples x_1, ..., x_l are removed when the (l + 1)-th sample is no longer the
maximum, nor the minimum, of the remaining samples. This means that the samples that
follow x_{l+1} have values both smaller and larger than x_{l+1}, thus indicating that an oscillation
around a stationary situation has started. This method corresponds to rule R1 in [231] and
has been shown to overestimate the length of the initial transient period when simulating
systems under low utilisation, and to underestimate it for high utilisation.
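
A Python sketch of this truncation rule; it returns the number of leading samples to discard
(0-based indexing).

def truncation_point(samples):
    """Smallest l such that the first remaining sample is neither the minimum nor the
    maximum of the remaining samples (rule (18.19)); discard the first l samples."""
    for l in range(len(samples)):
        rest = samples[l:]
        if min(rest) != rest[0] != max(rest):
            return l
    return len(samples)    # no such l: all samples are considered transient
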
A better but still simple guideline is the following, based on an estimation of the vari-
ance. Consider a sequence of n samples. The sample mean m is computed as (Σ_{i=1}^{n} x_i)/n.
Then, we split the n observations into k groups or batches, such that k = ⌊n/l⌋. We start
with batch size l = 2 and increase it stepwise, thereby computing k accordingly, until the
sample variance starts to decrease, as follows. We compute the batch means as

m_i = (1/l) Σ_{j=1}^{l} x_{(i−1)l+j},    i = 1, ..., k,    (18.20)

and the sample variance as:

σ² = (1/(k − 1)) Σ_{i=1}^{k} (m_i − m)².    (18.21)

By increasing the batch size l, more and more samples of the initial transient period will
become part of the first batch. If l is small, many batches will contain samples from the
initial transient period, thus making σ² larger, also for increasing l. However, if l becomes
so large that the first batch contains almost exclusively the samples that can be considered
part of the initial transient period, only m_1 will significantly differ from m, thus making σ²
smaller. The batch size l for which σ² starts to decrease monotonously equals the number
of samples that should not be considered any further (see also [145]).

18.5.2 Mean values and confidence intervals


Suppose we are executing a simulation to estimate the mean of the random variable X
with (unknown) E[X] = a. In doing so, a simulation is used to generate n samples x_i,
i = 1, ..., n, each of which can be seen as a realisation of a stochastic variable X_i. The
simulation has been constructed such that the stochastic variables X_i are all distributed
as the random variable X. Furthermore, to compute confidence intervals, we require the
X_i to be independent of each other.
Below, we discuss how to estimate the mean value of X and present means to com-
pute confidence intervals for the obtained estimate. We also comment on ways to handle
independence in the measurements.

Mean values

To estimate E[X], we define a new stochastic variable, X̄, which is called an estimator of X.
Whenever E[X̄] = a, the estimator X̄ is called unbiased. Whenever Pr{|X̄ − a| > ε} → 0
when n → ∞, the estimator X̄ is called consistent. The latter condition translates itself
to the requirement that var[X̄] → 0, whenever the number of samples n → ∞. Clearly,
both unbiasedness and consistency are desirable properties of estimators. Whenever the
observations X_1, ..., X_n are independent, then

X̄ = (1/n) Σ_{i=1}^{n} X_i    (18.22)

is an unbiased and consistent estimator for E[X] since E[X̄] = E[X] (unbiasedness) and
var[X̄] = var[X]/n, so that var[X̄] → 0, when n → ∞ (consistency).
Although the estimator X̄ seems to be fine, the requirement that the random variables
X_i should be independent causes us problems (independence is required for consistency,
not for unbiasedness). Indeed, successive samples taken from a simulation are generally
not independent. For instance, suppose that the random variables X_i signify samples of
job response times in a particular queue. Whenever X_i is large (or small), X_{i+1} will most
probably be large (or small) as well, so that successive samples are dependent. There are
a number of ways to cope with the dependence between successive samples; these will be
discussed below.

Guaranteeing independence

There are a number of ways to obtain almost independent observations from simulations,
as they are required to compute confidence intervals. We discuss the three most-widely
used methods below.
With the method of independent replicas the simulation execution is replicated n times,
each time with a different seed for the RNG. In the i-th simulation run, the samples
x_{i,1}, ..., x_{i,m} are taken. Although these individual samples are not independent, the sample
means x̄_i = (Σ_{j=1}^{m} x_{i,j})/m, i = 1, ..., n, are considered to be independent from one another.
These n mean values are therefore considered as the samples from which the overall mean
value and confidence interval (see below) are estimated. The advantage of this method
lies in the fact that the used samples are really independent, provided the RNGs deliver
independent streams. A disadvantage is that the simulation has to be executed from start
several times. This implies that also the transient behaviour at the beginning of every
simulation has to be performed and removed multiple times.
A method which circumvents the latter problem is the batch means method. It requires
only a single simulation run from which the samples x_1, ..., x_{n·m} are split into n batches
of size m each. Within every batch the samples are averaged as follows:

y_i = (1/m) Σ_{j=1}^{m} x_{(i−1)m+j},    i = 1, ..., n.    (18.23)

The samples y_1, ..., y_n are assumed to be independent and used to compute the overall
average and confidence intervals. The advantage of this method is that only a single
simulation run is performed for which the initial transient has to be removed only once.
A disadvantage of this method is the fact that the batches are not totally independent,
hence, this method is only approximate. In practice, however, the batch-means method is
most often used.
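
A Python sketch of the batching step; the confidence interval itself is then computed from
these batch means as described in Section 18.5.2 below.

def batch_means(samples, m):
    """Split the samples into batches of size m and return the batch means y_1, ..., y_n."""
    n = len(samples) // m
    return [sum(samples[i * m:(i + 1) * m]) / m for i in range(n)]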
The fact that successive batches are not totally independent is overcome with the so-
called regenerative method. With this method a single simulation execution is split into
several batches as well; however, the splitting is done at so-called regeneration points in the
simulation. Regeneration points are defined such that the behaviour before such a point is
totally independent from the behaviour after it. A good example of a regeneration point
is the moment the queue gets empty in the simulation of an M|G|1 queue. As before, the
batch averages are taken as the samples to compute the overall mean and the confidence
intervals; however, since the number of samples per batch is not constant, more complex,
so-called ratio-estimators, must be used.
The advantage of the regeneration method is that the employed samples to compute
the confidence intervals are really independent. The problems with it are the more complex
estimators and the fact that regeneration points might occur only very rarely, thus yielding
long simulation run times. True regeneration points that are visited frequently during a
simulation are rare; the state in which the modelled system empties is often considered
a regeneration point, but this is only correct when the times until the next events are
drawn from memoryless distributions. To illustrate this, in an M|M|1 simulation, every
arrival and departure point is a regeneration point. In an M|G|1 simulation, all departure
instances and the first arrival instance are regeneration points. In contrast, in a G|G|1
simulation, only the first arrival instance is a regeneration point.

Confidence intervals

Since the random variables X_i are assumed to be independent and identically distributed,
the estimator X̄ as defined in (18.22) will, according to the central limit theorem, approx-
imately have a Normal distribution with mean a and variance σ²/n (we denote var[X] as
σ²). This implies that the random variable

Z = (X̄ − a) / (σ/√n)    (18.24)

is N(0, 1)-distributed. However, since we do not know the variance of the random variable
X, we have to estimate it as well. An unbiased estimator for σ², known as the sample
variance, is given as:

S² = (1/(n − 1)) Σ_{i=1}^{n} (X_i − X̄)².    (18.25)

The stochastic variable

Z = (X̄ − a) / (S/√n)    (18.26)

then has a Student- or t-distribution with n − 1 degrees of freedom (a t_{n−1}-distribution).
Notice that x̄ and s² can easily be computed when Σ_i x_i and Σ_i x_i² are known:

x̄ = (Σ_{i=1}^{n} x_i)/n,   and   s² = (Σ_{i=1}^{n} x_i²)/(n − 1) − (Σ_{i=1}^{n} x_i)² / (n(n − 1)),    (18.27)

so that during the simulation, only two real numbers have to be maintained per measure.

so that during the simulation, only two real numbers have to be maintained per measure.
The Student distribution with three or more degrees of freedom is a symmetric bell-
shaped distribution, similar in form to the Normal distribution (see Figure 18.8). For
n + co, the t,-distribution approaches an N(0, 1) distribution. By using a standard table
for it, such as given in Table 18.3, we can find the value z > 0 such that Pr{lZl 5 z} = p;
z is called the double-sided critical value for the t,-distribution, given ,0. The last row in
Table 18.3 corresponds to the case of having an unbounded number of degrees of freedom,
that is, these critical values follow from the N(0, 1) distribution. Using these double-sided
critical values, we can write:

(18.28)

which states that the probability that the estimator X deviates less than z~/fi from
the mean a is ,0. Stated differently, the probability that X lies in the so-called confidence
interval [u - .m/fi, a + zo/fl n is /?. The probability p is called the confidence level.
As can be observed, to make the confidence interval a factor I smaller, Z2 times more
observations are required, so that the simulation needs to be Z2times as long.
Note that many statistical tables present single-sided critical values, i.e., those values
z’ for which the probability Pr{Z 5 z’} = p’. Due to the symmetrical nature of the
&-distribution, for n 2 3, we have z’ = z if p’ = (1 + /?)/a.

Figure 18.8: The bell-shaped Student distribution with n ≥ 3 degrees of freedom

We finally note that when using regenerative simulation, due to the use of ratio-
estimators, the computation of confidence intervals becomes more complicated.

Example 18.8. Confidence interval determination.


From five mutually independent simulation runs we obtain the following five samples:
(x_1, ..., x_5) = (0.108, 0.112, 0.111, 0.115, 0.098). The sample mean x̄ = (Σ_{i=1}^{5} x_i)/5 =
0.1088. The sample variance s² = Σ_{i=1}^{5} (x_i − x̄)²/(5 − 1) = 0.0000427. We assume a
confidence level of β = 0.9. In Table 18.3, we find the corresponding double-sided critical
value for the t-distribution with 4 degrees of freedom and 90% confidence level: z = 2.132.
We thus obtain that:

Pr{|Z| ≤ 2.132} = Pr{ |X̄ − a| ≤ 2.132 √(s²/5) },

from which we derive that Pr{ |X̄ − a| ≤ 0.00623 } = 0.90. Thus, we know that:

a ∈ [0.1026, 0.1150] with 90% confidence.

Notice that in practical situations, we should always aim to have at least ten degrees of
freedom, i.e., n ≥ 10. □
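
For illustration, the computation of this example can be redone with the following Python
sketch; the critical value 2.132 is the one taken from Table 18.3.

import math

def confidence_interval(samples, z):
    """Sample mean and double-sided confidence interval with critical value z."""
    n = len(samples)
    mean = sum(samples) / n
    s2 = sum((x - mean) ** 2 for x in samples) / (n - 1)   # sample variance (18.25)
    half = z * math.sqrt(s2 / n)
    return mean, (mean - half, mean + half)

mean, ci = confidence_interval([0.108, 0.112, 0.111, 0.115, 0.098], z=2.132)
# mean = 0.1088, ci = (0.1026, 0.1150) approximately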

18.6 Further reading


More information on the statistical techniques to evaluate simulation results can be found
in the performance evaluation textbooks by Jain [145], Mitrani [202], Trivedi [280] and in
the survey by Pawlikowski [231]. The former two books also discuss the simulation setup
and random number generation. More detailed information on the statistical techniques
to be used in the evaluation of simulation results can be found in [90, 157, 175, 285] and
statistics textbooks [137, 289]. Random number generation is treated extensively by Knuth

n    α = 90%    α = 95%    α = 99%        n    α = 90%    α = 95%    α = 99%


3 2.353 3.182 5.841 18 1.734 2.101 2.878
4 2.132 2.776 4.604 20 1.725 2.086 2.845
5 2.015 2.571 4.032 22 1.717 2.074 2.819
6 1.943 2.447 3.707 24 1.711 2.064 2.797
7 1.895 2.365 3.499 26 1.706 2.056 2.779
8 1.860 2.306 3.355 28 1.701 2.048 2.763
9 1.833 2.262 3.250 30 1.697 2.042 2.750
10 1.812 2.228 3.169 60 1.671 2.000 2.660
12 1.782 2.179 3.055 90 1.662 1.987 2.632
14 1.761 2.145 2.977 120 1.658 1.980 2.617
16 1.746 2.120 2.921 ∞ 1.645 1.960 2.576

Table 18.3: Double-sided critical values z of the Student distribution for n degrees of
freedom and confidence level α such that Pr{|Z| ≤ z} = α

[162], l'Ecuyer [79, 80] and others [56, 229]. Some recent tests of random number generators
can be found in [179].
Research in simulation techniques is still continuing. In particular, when the interest is
in obtaining measures that are to be associated with events that occur rarely, e.g., packet
loss probabilities in communication systems or complete system failures in fault-tolerant
computer systems, simulations tend to be very long. Special techniques to deal with these
so-called rare-event simulations are being developed, see [99, 130, 222]. Pawlikowski gives
an overview of the problems (and solutions) of the simulation of queueing processes [231].
Also techniques to speed-up simulations using parallel and distributed computer systems
are receiving increased attention; for an overview of these techniques see [102]. Chiola and
Ferscha discuss the special case of parallel simulations of SPNs [48].

18.7 Exercises
18.1. Random number generation.
Use a linear congruential method with m = 18, c = 17, a = 7 and seed x_0 = 3 to generate
random numbers in [0, 17].
Now make use of an additive congruential method to generate random numbers, using
m = 18 and a_1 = · · · = a_17 = 1, thereby using as seeds the values computed with the linear
436 18 Simulation: methodology and statistics

congruential method.

18.2. Uniform number generation.


Use the additive congruential method of the previous exercise to obtain 1000 uniformly
distributed random numbers on [0, 1]. Use the χ²-test to validate this RNG. Use the auto-
correlation test to validate the independence of successive random numbers at lags k = 1, 2, 3, 4
generated with this RNG.

18.3. Uniform number generation.


Validate the following two RNGs, recently proposed by Fishman and Moore [91]:

• x_i = 48271 x_{i−1} modulo 2^31 − 1;

• x_i = 69621 x_{i−1} modulo 2^31 − 1.

18.4. How fair is a coin.


We try to compute the probability p with which the tossing of a coin yields "heads". We
therefore toss the coin n times. How large should we take n such that with confidence level
α = 0.9 the width of the confidence interval is only 0.1 times the mean np? What happens
(with n) if p tends to 0, that is, when tossing "heads" becomes a rare event?

18.5. Confidence interval construction.


As a result of a batch-means simulation we have obtained the following batch means:

2.45 2.55 2.39 2.41 2.49 2.67 2.38 2.44 2.47

Compute confidence intervals, with α = 0.9, 0.95, 0.99. How does the width of the confi-
dence intervals change when the confidence level gets closer to 1? How many batches do
you expect to be necessary to decrease the width of the confidence interval by a factor l?

18.6. Simulating a G|M|1 and a G|G|1 queue.


We consider the event-based simulation of the M|M|1 queue, as sketched in Section 18.3.
How should we adapt the given program to simulate queues of M|G|1 and of G|G|1 type?
Hint: do not "redraw" random numbers for non-exponential distributions, unless the asso-
ciated events have really been executed.

18.7. Simulating a CTMC.


We are given a finite CTMC on state space I, with generator matrix Q and initial proba-
bility distribution p(0). We are interested in the long-run proportion of time that the state
of the simulated CTMC is in some (given) subset J of I. We start simulating at t_s = 0
until some value t_e; we only start collecting samples after the initial transient period of
length t_i is over.

1. What is a useful estimator for the measure of interest?

2. Outline a simulation program for CTMC simulation, thereby using properties of


CTMCs.

18.8. Simulating SPNs.


Given is an SPN of the class we have discussed in Chapter 14, however, without any
restrictions on the timings of the transitions. Furthermore, we assume that a transition
which is disabled due to the firing of another transition will resume its activity once it
becomes enabled again. Thus, the time until the firing of a transition is not resampled
every time it becomes enabled, but only when it becomes first enabled (after having fired
or after simulation starts). Present the outline of a simulation program for SPNs obeying
the sketched sampling strategy. What are the advantages of simulating SPNs instead of
numerically solving them? And what are the disadvantages?

Part VI

Appendices

Appendix A

Applied probability for performance analysts

In this appendix we briefly discuss a number of elementary concepts of probability theory
that are of use for performance evaluation purposes. By no means is this appendix
intended to cover all of probability theory; it is included as a refresher for those having
enrolled in a basic course on probability theory and statistics in the past. For further
details, refer to the textbooks [87, 152, 280].
This appendix is further organised as follows. In Section A. 1 we introduce the math-
ematical basis of probability theory and present some general laws and definitions. We
introduce discrete random variables in Sections A.2 and A.3, and continuous random vari-
ables in Sections A.4 and A.5. Mean values and variances are discussed in Sections A.6
through A.8.

A.1 Probability measures
The mathematical concept of probability can be defined as a triple (S, E, Pr) where S is
the sample space of all possible outcomes of an experiment, E ⊆ 2^S (the power set of S)
is a set of events, and Pr : E → ℝ is a probability mapping from events to real numbers
which satisfies the following three rules:

1. For all possible e ∈ E: Pr{e} exists and 0 ≤ Pr{e} ≤ 1, that is, any possible event
   occurs with a probability between 0 and 1;

2. Pr{S} = 1, that is, the sum of probabilities of all possible events equals 1;

3. If e, e' ∈ E are two mutually exclusive events then Pr{e ∪ e'} = Pr{e} + Pr{e'}.

Example A. 1. Rolling a dice (I).


Consider a normal cubic dice with numbers 1 through 6. Rolling this dice will generate
samples or events from the sample space S = {1, ..., 6}. E is the set of all possible events:
E = {{1}, ..., {6}}. The probability of rolling "6" is then most naturally set equal to
Pr{6} = 1/6 (for convenience we normally omit the double {{}}-notation from now on).
The probability of rolling one of the numbers 1 through 6 equals Pr{1, ..., 6} = Pr{S} = 1.
The probability of rolling "even" equals Pr{"even"} = Pr{2, 4, 6} = Pr{2} + Pr{4} +
Pr{6} = 1/2. □

When two events are not mutually exclusive, the probability that at least one of them
occurs is not the sum of the probabilities of both events individually; one should account for
the "overlapping" part of the events by subtracting the overlap. To be more concrete, if
e, e' ∈ E, then

Pr{e ∪ e'} = Pr{e} + Pr{e'} − Pr{e ∩ e'}.    (A.1)

Two events e, e' ∈ E that are disjoint, i.e., e ∩ e' = ∅, are mutually exclusive events.

Example A.2. Rolling a dice (II).


Let event e signify "even" rolls, i.e., e = {2, 4, 6}. Let e' signify the event "at least three",
i.e., e' = {3, 4, 5, 6}. We then have that Pr{e ∪ e'} = Pr{e} + Pr{e'} − Pr{e ∩ e'} =
Pr{2, 4, 6} + Pr{3, 4, 5, 6} − Pr{4, 6} = 3/6 + 4/6 − 2/6 = 5/6. The latter could also be
derived directly, since Pr{e ∪ e'} = Pr{2, 3, 4, 5, 6} = 5/6. □

The probability of the complement of an event e equals 1 minus the probability of event
e: Pr{¬e} = 1 − Pr{e}.

Example A.3. Rolling a dice (III).


Let event e signify the event "at least three", i.e., e = {3, 4, 5, 6}. We then have Pr{e} =
4/6. Throwing less than three then has probability Pr{¬e} = 1 − Pr{e} = 2/6. □

Let Pr{ele’} denote the probability that e happens, given that e’ already happens, and
it is called the conditional probability of e, given e’. We have:

Pr{e, e’} = Pr{e n e’} = Pr{ele’} Pr{e’} * Pr{e/e’} = ‘~~~~,;“. (A.4


Two events e and e’ are said to be independent if Pr{e, e’} = Pr{e} Pr{e’}. When e and e’
are independent we of course have Pr{ele’} = Pr{e} and also Pr{e’le} = Pr{e’}. In these
cases, the occurrence of e (or e’) does not say anything about the possible occurrence of e’
A.2 Discrete random variables 443

(or e). If e and e’ are mutually exclusive events, then Pr{e, e’} = Pr{e n e’} = 0, and thus
also Pr{e]e’} = Pr{e’]e} = 0.
Let e_1, ..., e_n be mutually exclusive events such that S = e_1 ∪ · · · ∪ e_n and let e ⊆ S;
then the law of total probability states:

Pr{e} = Σ_{i=1}^{n} Pr{e|e_i} Pr{e_i}.    (A.3)

A.2 Discrete random variables


Up till now we have only worked with probabilities. In the examples we only used the dice.
This was convenient because the outcome of the experiment (the events) were numbers. In
general this need not be the case. Consider the tossing of a coin. This can yield “heads” or
“tails”. What to do with these outcomes in a calculation? To encompass these problems,
random variables have been introduced.
A random variable is a mapping from the sample space of a probability measure to, in
the case of discrete random variables, the natural numbers or a subset thereof.
A discrete random variable N can be characterised by its probability distribution func-
tion or cumulative density function (CDF) F_N(n) = Pr{N ≤ n}. The probability density
function (PDF) or probability mass function (pmf) is defined as f_N(n) = Pr{N = n}. The
following relation exists between the CDF and PDF:

F_N(n) = Σ_{m=0}^{n} f_N(m).    (A.4)

As lower limit in the summation we have chosen 0. This will mostly be the case although
this is not necessary. Since the sum of all probabilities of all events, or to be precise, of all
images n ∈ ℕ of events, must equal 1, we have

Σ_{all n} f_N(n) = 1.    (A.5)

Consequently, F_N(n) is a monotonously increasing function with image in [0, 1].

Example A.4. Car counting.


The number of cars C driving over a particular roundabout per minute can be regarded
as a discrete random variable which can take values from 0 onwards. We say that the
distribution of C has support on {0, 1, ...}. □

It is also possible to address two (or more) random variables simultaneously. Consider
the case where N and K are random variables with joint distribution function F_{N,K}(n, k) =
Pr{N ≤ n, K ≤ k}. Similarly, we have f_{N,K}(n, k) = Pr{N = n, K = k}. Again, we have a
simple relation between the joint CDF and the joint PDF:

F_{N,K}(n, k) = Σ_{i≤n} Σ_{j≤k} Pr{N = i, K = j}.    (A.6)

The so-called marginal probability density function f_N(n) is defined as follows:

f_N(n) = Σ_{k=0}^{∞} f_{N,K}(n, k).    (A.7)

A similar definition can be given for the marginal PDF of K. The conditional PDF of N
with respect to K is given as

f_{N|K}(n|k) = Pr{N = n | K = k} = Pr{N = n, K = k} / Pr{K = k} = f_{N,K}(n, k) / f_K(k),    (A.8)

by virtue of the definition of conditional probability.
If N and K are independent we have f_{N|K}(n|k) = f_N(n), and consequently f_{N,K}(n, k) =
f_N(n) f_K(k). The latter two equalities also hold for the CDFs.

A.3 Some important discrete distributions

Bernoulli. A discrete random variable N has a Bernoulli distribution with parameter p
when it only takes two values, without loss of generality called "0" and "1", and for which
f_N(0) = Pr{N = 0} = 1 − p and f_N(1) = Pr{N = 1} = p.
The result of tossing a coin is generally assumed to generate a Bernoulli distribution
(with p = 0.5). The event "0" is generally called a failure, the event "1" a success.
Geometric. Consider again a series of Bernoulli experiments in which the success
probability is p. The number of trials N until the first success, the success itself included,
has a geometric distribution:

f_N(n) = Pr{N = n} = (1 − p)^{n−1} p,    n = 1, 2, ....    (A.9)

A geometrically distributed random variable has support on {1, 2, ...}.
The geometric distribution is the only discrete distribution that is memoryless. This
means that the probability that a geometrically distributed random variable N = n + m,
given that we know that N > n (with m ≥ 1), simply is equal to the probability that
N = m:

Pr{N = n + m | N > n} = Pr{N = m},    m ≥ 1.    (A.10)
A.4 Continuous random variables 445

In words, this means that the fact that we have knowledge of the past (the fact that
N > n) does not change the future behaviour.

Modified geometric. A slight variation occurs with the modified geometric distribution.
Here the number of Bernoulli trials M before the first success is of interest. A random
variable M distributed like this has support on ℕ:

f_M(m) = Pr{M = m} = (1 − p)^m p,    m = 0, 1, ....    (A.11)

Binomial. Consider again a Bernoulli experiment in which the success probability equals
p. Now repeat this Bernoulli experiment n times. The number of successes N then has a
binomial distribution with parameters n and p:

f_N(k) = Pr{N = k} = \binom{n}{k} p^k (1 − p)^{n−k},    k = 0, ..., n.    (A.12)

Note that N has support over the range 0, ..., n.

Poisson. When the number of trials n in the binomial distribution becomes large, the
calculation of the binomial probabilities might be cumbersome. Instead, one might ap-
proximate the binomial PDF by a Poisson PDF; however, this approximation is only good
when n ≥ 20 and p ≤ 0.05. If this is the case, we can set α = np and use the following
expression for the Poisson PDF:

f_N(n) = Pr{N = n} = e^{−α} α^n / n!.    (A.13)

Note that a Poisson distributed number has support on ℕ.

Uniform. A discrete uniformly distributed random variable N takes, with equal proba-
bility, one value out of a countable set. As an example, think of rolling a dice. If N can
take values in S = {n_1, ..., n_k}, we have f_N(n_i) = 1/k.

A.4 Continuous random variables


In this section we address continuous random variables. Since a continuous random variable
X can assume values in a finite or infinite range of the real numbers, there are always
infinitely many possible values for X. Therefore, the probability of any particular value
x equals 0: Pr{X = x} = 0. The cumulative density function or distribution function is
defined as F_X(x) = Pr{X ≤ x}. The probability density function, if it exists, is defined as

f_X(x) = lim_{δ→0} ( F_X(x + δ) − F_X(x) ) / δ = dF_X(x)/dx = F'_X(x).    (A.14)

Stated differently, we see that F_X(x) is the integral over an infinite range of f_X(x) values,
like F_N(n) was the (discrete) sum over a finite countable set of f_N(n) values:

F_X(x) = ∫_{−∞}^{x} f_X(u) du.    (A.15)

Consequently, Pr{x ∈ [a, b]} = Pr{a < X ≤ b} = Pr{X ≤ b} − Pr{X ≤ a} = F_X(b) − F_X(a).
Because the probability of all outcomes must sum to one, we have

∫_{−∞}^{∞} f_X(x) dx = 1.    (A.16)

Consequently, under certain (technical) conditions, F_X(x) is a monotonously increasing
function with range [0, 1].

Example A. 5. Telephone calls.


The duration of a telephone call is generally represented as a random variable with support
on [0, ∞), whereas the number of calls a particular exchange is handling at a particular
time instance is a discrete random variable on {0, 1, ...}. □

It is also possible to address two (or more) random variables simultaneously. Consider
the case where X and Y are random variables with joint distribution function F_{X,Y}(x, y) =
Pr{X ≤ x, Y ≤ y}. We have a simple relation between the joint CDF and the joint PDF:

F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u, v) du dv.    (A.17)

The so-called marginal probability density function f_X(x) is defined as follows:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.    (A.18)

A similar definition can be given for the marginal PDF of Y. The conditional PDF of X
with respect to Y is given as:

f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),    (A.19)

by virtue of the definition of conditional probability. If X and Y are independent we have
f_{X|Y}(x|y) = f_X(x), and consequently f_{X,Y}(x, y) = f_X(x) f_Y(y). The latter two equalities
also hold for the CDFs.
Now consider two independent random variables X and Y, with distribution functions
F_X(x) and F_Y(y), respectively. The sum Z = X + Y is a random variable with as density
the convolution of f_X(x) and f_Y(y), as follows:

f_Z(z) = ∫_{−∞}^{∞} f_X(u) f_Y(z − u) du,    −∞ < z < ∞.    (A.20)

When X and Y are non-negative random variables, then this can be reduced to

f_Z(z) = ∫_{0}^{z} f_X(u) f_Y(z − u) du,    0 ≤ z < ∞.    (A.21)

A.5 Some important continuous distributions

Normal. Let X denote a random variable with support over the entire real axis and with
mean μ and variance σ² (σ is called the standard deviation; see also the next section). We
say that X has an N(μ, σ²) distribution, if the associated PDF is:

f_X(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.    (A.22)

No explicit expression exists for the CDF F_X(x). There are a number of interesting prop-
erties associated with the normal distribution:

• When X has an N(μ, σ²) distribution, the random variable (X − μ)/σ has an N(0, 1)
  distribution.

• Let X_1, ..., X_n be a set of mutually independent random variables with means
  μ_1, ..., μ_n and standard deviations σ_1, ..., σ_n, such that, for all i, the values μ_i,
  σ_i, and μ_i/σ_i are finite; then, as n → ∞, the random variable

  ( Σ_{i=1}^{n} X_i − Σ_{i=1}^{n} μ_i ) / √( Σ_{i=1}^{n} σ_i² )    (A.23)

  approaches an N(0, 1)-distributed random variable.
  This is the central limit theorem. Note that although we did not assume anything
  about the distributions of the involved X_i, their arithmetic average tends to be
  normally distributed for large n. The normal distribution plays a role in the statistical
  evaluation of simulation results.

Deterministic. A deterministic random variable X can only take one value, d, with
probability 1. In fact, this random variable can be viewed as discrete as well as continuous:

f_X(x) = Dirac(d),   and   F_X(x) = 0 for x < d, and 1 for x ≥ d,    (A.24)

where Dirac(d) is a Dirac impulse at x = d.



Exponential. A non-negative random variable X has an exponential distribution with
parameter λ (sometimes also called a negative exponential distribution) when, for λ, x > 0,
we have

f_X(x) = λ e^{−λx}   and   F_X(x) = 1 − e^{−λx}.    (A.25)

The exponential distribution has a very nice property: it is the only continuous memoryless
distribution. The memoryless property states that

Pr{X ≤ x + y | X > y} = Pr{X ≤ x}.

This implies that the knowledge of X being larger than y does not matter at all for
determining the probability that X is larger than x + y. Somehow, the history of X (its
being larger than y) does not matter; it has no memory.

Erlang. Let X_1, ..., X_n be identically and independently distributed random variables, all
with the same exponential distribution. Then X = Σ_{i=1}^{n} X_i has an n-stage Erlang distribu-
tion (an Erlang-n distribution). An Erlang-n distribution is a series of n independent and
identically distributed exponential distributions. We have, for x, λ > 0 and n = 1, 2, ...:

f_X(x) = λ (λx)^{n−1} e^{−λx} / (n − 1)!,    (A.27)

and

F_X(x) = 1 − e^{−λx} Σ_{j=0}^{n−1} (λx)^j / j!.    (A.28)

Notice that the Erlang-n density is the convolution of n exponential densities (all with the
same mean).

Hypo-exponential. This distribution is also called the generalised Erlang distribution. It is
similar to the Erlang distribution; however, the successive exponential stages need not have
the same mean, that is, we have X = Σ_{i=1}^{n} X_i, where X_i is an exponentially distributed
random variable with parameter λ_i. When we have 2 stages and parameters λ_1, λ_2 > 0
(with λ_1 ≠ λ_2) and x > 0, we have:

f_X(x) = (λ_1 λ_2 / (λ_2 − λ_1)) (e^{−λ_1 x} − e^{−λ_2 x}),   and
F_X(x) = 1 − (λ_2 / (λ_2 − λ_1)) e^{−λ_1 x} + (λ_1 / (λ_2 − λ_1)) e^{−λ_2 x}.    (A.29)

Notice that the hypo-exponential density is the convolution of n exponential densities.

Hyperexponential. Let X_1 and X_2 be two independent exponentially distributed random
variables with parameters λ_1 and λ_2 respectively. Now, let X be a random variable that
with probability p_1 is distributed as X_1 and with probability p_2 as X_2. Note that p_1 + p_2 = 1.
We say that X has a 2-stage hyperexponential distribution:

f_X(x) = p_1 λ_1 e^{−λ_1 x} + p_2 λ_2 e^{−λ_2 x}.    (A.30)

Generalisations to more than two stages are easy to imagine.

Example A.6. Service times.


In the modelling of computer and communication systems, one often has to make assump-
tions about the involved job service times or the packet transmission times (the packet
lengths). For these purposes often the exponential, Erlang, hypo- and hyper-exponential
distribution are used. The exponential distribution is especially advantageous to use, due
to its memoryless property.
Extensive monitoring of telephone exchanges has revealed that telephone call durations
typically are exponentially distributed. Moreover, the time between successive call
arrivals obeys the exponential distribution. □

Phase-type distributions. The last three mentioned distributions are examples of the
class of phase-type distributions, which are distributions that are formed by summing
exponential distributions, or by probabilistically choosing among them. We discuss phase-
type distributions in more detail in Chapter 3. A special type of phase-type distribution
that is used often is the Coxian distribution. It is basically a hypo-exponential distribution;
however, before every exponential phase, it is decided probabilistically whether a next phase
is taken or not. Coxian distributions can be used to approximate any other distribution
(with rational Laplace transform). They can also be seen as special cases of a phase-type
distribution.

Uniform. A continuous random variable X has a uniform distribution on [a, b] when all
the values in the interval [a, b] have equal probability. The uniform PDF equals

    f_X(x) = 1/(b-a),   a ≤ x ≤ b,
             0,         otherwise.                                            (A.31)

The uniform CDF equals

    F_X(x) = 0,             x < a,
             (x-a)/(b-a),   a ≤ x < b,                                        (A.32)
             1,             x ≥ b.

A.6 Moments of random variables


In many situations we do not characterise a random variable by its complete distribution
but rather by its moments. The most important moment of a random variable is the mean
or the expected value. This first moment is calculated as

    E[N] = Σ_{n=0}^∞ n p_N(n),                                                (A.33)

when N is a discrete random variable, and as

    E[X] = ∫_{-∞}^{∞} x f_X(x) dx,                                            (A.34)

when X is a continuous random variable. E[·] is called the expectation operator. This is
generalised to the k-th moment (k = 1, 2, ...) as follows:

    E[N^k] = Σ_{n=0}^∞ n^k p_N(n),                                            (A.35)

when N is a discrete random variable, and as

    E[X^k] = ∫_{-∞}^{∞} x^k f_X(x) dx,                                        (A.36)

when X is a continuous random variable.
The quantity E[(X - E[X])^k] is known as the k-th central moment (k = 1, 2, ...).
Note that the first central moment equals 0. The second central moment is known as the
variance. It is denoted as var[X] or as σ_X^2:

    σ_X^2 = var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2.                       (A.37)

A similar definition exists for the variance of discrete random variables (by just changing
the X's into N's). We call var[·] the variance operator.
A measure that is often used in performance analysis is the squared coefficient of variation:

    C_X^2 = var[X] / E[X]^2,                                                  (A.38)

which expresses the variance of X relative to its average value.
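
As a small numerical illustration (an editorial sketch assuming NumPy; the exponential
sample and its parameter are chosen arbitrarily), these quantities can be estimated from
data; for exponential data the estimate of C_X^2 should be close to 1:

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.exponential(scale=2.0, size=100_000)   # example data: Exp(lambda = 0.5)

    mean = x.mean()
    var = ((x - mean) ** 2).mean()   # second central moment
    c2 = var / mean ** 2             # squared coefficient of variation (A.38)

    print(mean, var, c2)             # approximately 2.0, 4.0 and 1.0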


Note that we have assumed a number of properties of the expectation operator. First of
all, that it is a linear operator. This means that whenever the random variable Y = aX + b,
then E[Y] = aE[X] + E[b] = aE[X] + b, where X is a random variable and where a and b
are constant.
More generally, when X_1, ..., X_n are random variables, we always have

    E[Σ_{i=1}^n a_i X_i] = Σ_{i=1}^n a_i E[X_i].                              (A.39)

If the X_i are mutually independent we have

    E[∏_{i=1}^n X_i] = ∏_{i=1}^n E[X_i].                                      (A.40)

Also under the independence assumption, we have

    var[Σ_{i=1}^n X_i] = Σ_{i=1}^n var[X_i],                                  (A.41)

and

    var[∏_{i=1}^n X_i] = ∏_{i=1}^n E[X_i^2] - ∏_{i=1}^n E[X_i]^2.             (A.42)

Important to note is the fact that the variance operator is not a linear operator:
var[aX + b] = a^2 var[X].

A.7 Moments of discrete random variables


In Table A.1 we summarise the parameters and moments of the most important discrete
random variables.

Bernoulli. Consider a Bernoulli distributed random variable N with parameter p. E[N] =
1·p + 0·(1-p) = p. Similarly, we derive E[N^2] = p, so that var[N] = p - p^2 = p(1-p).
The squared coefficient of variation then equals: C_N^2 = var[N]/E[N]^2 = (1-p)/p.

Binomial. Consider a binomially distributed random variable N with parameters n and
p. By the fact that a binomial distribution is derived from n independent Bernoulli trials
and the linearity of the expectation operator we immediately see that E[N] = np and
var[N] = np(1-p). Consequently, C_N^2 = (1-p)/np.

Poisson. Consider a Poisson distributed random variable N with parameter α. The
expectation then equals, analogously to the binomial distribution: E[N] = α. We can also
calculate that var[N] = α. Consequently, we have C_N^2 = 1/α.

Geometric. Consider a geometrically distributed random variable N with parameter p.
We can calculate E[N] = 1/p and var[N] = (1-p)/p^2, so that C_N^2 = 1 - p.

Modified geometric. Consider a modified geometrically distributed random variable N
with parameter p. We now have E[N] = (1-p)/p, var[N] = (1-p)/p^2, and C_N^2 = 1/(1-p).

distribution     parameters     E[N]          var[N]          C_N^2

Bernoulli        p              p             p(1-p)          (1-p)/p

Geometric        p              1/p           (1-p)/p^2       1-p

Mod. Geo.        p              (1-p)/p       (1-p)/p^2       1/(1-p)

Binomial         n, p           np            np(1-p)         (1-p)/(np)

Poisson          α              α             α               1/α

Uniform          n              (n+1)/2       (n^2-1)/12      (n-1)/(3(n+1))

Table A.1: Moments of important discrete random variables

Uniform. Consider a discrete uniformly distributed random variable N that can take
values in S = {1, ..., n}. Its mean then equals E[N] = (n+1)/2 and its variance var[N] =
(n^2-1)/12. Consequently, C_N^2 = (n-1)/(3(n+1)).

A.8 Moments of continuous random variables


In Table A.2 we summarise the parameters and moments of the most important continuous
random variables.
Normal. A random variable X with an N(μ, σ^2) distribution has, by definition, E[X] = μ
and var[X] = σ^2. Consequently, C_X^2 = (σ/μ)^2.
Deterministic. A deterministic random variable X has as expectation its only possible
value: E[X] = d. Since there is no randomness at all, var[X] = 0 (E[X^2] = d^2), so that
C_X^2 = 0.

Exponential. An exponentially distributed random variable X with parameter λ has
E[X] = 1/λ. This can be derived as follows:

    E[X] = ∫_{-∞}^{∞} x f_X(x) dx = ∫_0^∞ λ x e^{-λx} dx = 1/λ.

distribution     parameters          E[X]              E[X^2]               var[X]             C_X^2

Normal           μ, σ^2              μ                 μ^2 + σ^2            σ^2                (σ/μ)^2

Deterministic    d                   d                 d^2                  0                  0

Exponential      λ                   1/λ               2/λ^2                1/λ^2              1

Erlang           λ, n                n/λ               n(n+1)/λ^2           n/λ^2              1/n

Hypo-expo.       λ_1, ..., λ_n       Σ_{i=1}^n 1/λ_i   def.                 Σ_{i=1}^n 1/λ_i^2  ≤ 1

Hyper-expo.      p_1, ..., p_n,      Σ_{i=1}^n p_i/λ_i Σ_{i=1}^n 2p_i/λ_i^2 def.               ≥ 1
                 λ_1, ..., λ_n

Uniform          a, b                (a+b)/2           (b^3-a^3)/(3(b-a))   (b-a)^2/12         (b-a)^2/(3(a+b)^2)

Table A.2: Moments of important continuous random variables

By integrating by parts twice, it can be derived that E[X^2] = 2/λ^2, so that var[X] = 1/λ^2.
Consequently, we have C_X^2 = 1.

Erlang. For the n-stage Erlang distribution we simply derive, by the linearity of the
expectation operator, that E[X] = n/λ. We derive that E[X^2] = n(n+1)/λ^2, so that
var[X] = n/λ^2. Consequently, we have C_X^2 = 1/n, which is always smaller than 1, for
n = 2, 3, .... An Erlang-n distribution is "more deterministic" than an exponential distribution.

Hypo-exponential. Consider a hypo-exponentially distributed random variable X with
parameters λ_1, ..., λ_n, that is, X is the sum of n independent exponential stages X_i
with parameters λ_i. By using the linearity of the expectation operator, we easily derive
that E[X] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n 1/λ_i. Since the stages that constitute the hypo-exponential
are independent, we also have linearity in the variance operator, so that
var[X] = Σ_{i=1}^n var[X_i] = Σ_{i=1}^n 1/λ_i^2. For the squared coefficient of variation it can
then be derived that

    C_X^2 = (Σ_{i=1}^n 1/λ_i^2) / (Σ_{i=1}^n 1/λ_i)^2,                        (A.43)

which is at most 1.

Hyperexponential. An n-stage hyperexponentially distributed random variable X with
parameters p_1, ..., p_n and λ_1, ..., λ_n has the following moments: E[X] = Σ_{i=1}^n p_i/λ_i and
E[X^2] = Σ_{i=1}^n 2p_i/λ_i^2. Consequently,

    var[X] = Σ_{i=1}^n 2p_i/λ_i^2 - (Σ_{i=1}^n p_i/λ_i)^2.                    (A.44)

The squared coefficient of variation can then be derived as C_X^2 = E[X^2]/E[X]^2 - 1. It can be shown
that C_X^2 ≥ 1.

Uniform. A uniformly distributed random variable X on [a, b] has as average value
E[X] = (a+b)/2. Its variance can be derived as var[X] = (b-a)^2/12. From these two,
we derive the squared coefficient of variation as C_X^2 = (b-a)^2/(3(a+b)^2).

Appendix B

Some useful techniques in applied probability

In this appendix we present a brief introduction to Laplace transforms in Section B.1.
We then present the geometric series in Section B.2, after which we introduce tensor
operators in Section B.3.

B.1 Laplace transforms

The Laplace transform is a useful tool in many stochastic models. In model-based performance
evaluation, Laplace transforms of probability density functions play an important
role. Let f(x) be such a PDF (for x > 0). Then, whenever |f(x)| < M e^{αx} for some
M, α > 0, the Laplace transform of f(x) can be computed as:

    f*(s) = ∫_0^∞ f(x) e^{-sx} dx.                                            (B.1)

Using this definition, a number of "standard" Laplace transforms can easily be computed,
as given in Table B.1.

    f(x)              f*(s)                 condition

    c                 c/s
    x                 1/s^2
    x^i               i!/s^{i+1}            i ∈ ℕ
    e^{-ax}           1/(s+a)               Re(s) > -a
    x^i e^{-ax}       i!/(s+a)^{i+1}        i ∈ ℕ, Re(s) > -a

    Table B.1: A number of standard Laplace transforms

There are a number of interesting properties associated with Laplace transforms. We list
some of them below:

1. Uniqueness. If two Laplace transforms f*(s) = g*(s), for all s, then f(x) = g(x),
for all x.

2. Convolution. Consider n random variables X_1 through X_n, each distributed according
to PDF f_{X_i}(x). The PDF of the sum Z = Σ_{i=1}^n X_i has Laplace transform:

    f_Z*(s) = ∏_{i=1}^n f_{X_i}*(s).                                          (B.2)


We see that convolution in the x-domain transforms to a simple product in the Laplace
domain (the s-domain).

3. Linearity. Let f(x) = Σ_i c_i f_i(x); then f*(s) = Σ_i c_i f_i*(s).

4. Moment generation. Let X be a random variable distributed with PDF f(x), and
with Laplace transform f*(s); then the k-th moment of X can be computed from the
transform as follows (see also the small sketch following this list):

    E[X^k] = (-1)^k  d^k f*(s)/ds^k |_{s=0}.                                  (B.3)

5. Dirac impulse. If F(x) = 0 for x < d, and F(x) = 1 for x ≥ d, then f(x) is
a Dirac impulse at x = d. The corresponding Laplace transform is then given as
f*(s) = e^{-sd}.
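
The moment-generation property is easy to check symbolically. The following sketch (an
editorial addition, assuming Python with SymPy) computes the transform of the exponential
density λe^{-λx} by direct integration as in (B.1) and recovers its first two moments via (B.3):

    import sympy as sp

    x, s, lam = sp.symbols('x s lambda', positive=True)

    f = lam * sp.exp(-lam * x)                                 # exponential PDF
    f_star = sp.integrate(f * sp.exp(-s * x), (x, 0, sp.oo))   # Laplace transform (B.1)

    m1 = (-1) ** 1 * sp.diff(f_star, s, 1).subs(s, 0)          # first moment: 1/lambda
    m2 = (-1) ** 2 * sp.diff(f_star, s, 2).subs(s, 0)          # second moment: 2/lambda**2

    print(sp.simplify(f_star), sp.simplify(m1), sp.simplify(m2))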

More information on Laplace transforms can be found in textbooks on mathematical analysis,
such as in Kwakernaak and Sivan [168], but also in some performance evaluation
textbooks, e.g., [280].

B.2 Geometric series


The geometric series is used extensively in model-based performance evaluation, most
notably in the analysis of birth-death processes and their variants. Below, we will present
some important results for this series.
Consider the sum S_n(x) = Σ_{i=0}^n x^i, with n ∈ ℕ, x > 0 and x ≠ 1; S_n(1) = n + 1. The
sum xS_n(x) = Σ_{i=0}^n x^{i+1}. We now find:

    S_n(x) - xS_n(x) = Σ_{i=0}^n x^i - Σ_{i=0}^n x^{i+1} = x^0 - x^{n+1} = 1 - x^{n+1},

which can be rewritten as:

    (1 - x)S_n(x) = 1 - x^{n+1}   ⟹   S_n(x) = (1 - x^{n+1}) / (1 - x).       (B.4)

This expression for S_n(x) is valid for all x > 0 (x ≠ 1) and for all n ∈ ℕ. If we now take
0 < x < 1 and let n → ∞, the term x^{n+1} will vanish, so that we obtain:

    S_∞(x) = Σ_{i=0}^∞ x^i = 1 / (1 - x),   0 < x < 1.                        (B.5)

Related to S_∞(x), one often has to compute the series

    T_∞(x) = Σ_{i=0}^∞ i x^i,

e.g., in expectation computations. We can compute this expression explicitly by interchanging
summation and differentiation as follows:

    T_∞(x) = Σ_{i=0}^∞ i x^i = x Σ_{i=0}^∞ i x^{i-1} = x Σ_{i=0}^∞ d/dx (x^i)
           = x d/dx (Σ_{i=0}^∞ x^i) = x d/dx (1/(1-x)) = x / (1-x)^2.         (B.6)

The expressions for S_∞(x) and T_∞(x) extend to the case where x is replaced by a square
matrix M of which the largest eigenvalue (the spectral radius) is smaller than 1 (in absolute
value):

    S_∞(M) = Σ_{i=0}^∞ M^i = (I - M)^{-1},   |sp(M)| < 1,                     (B.7)

and

    T_∞(M) = Σ_{i=0}^∞ i M^i = M (I - M)^{-2},   |sp(M)| < 1.                 (B.8)

The matrix extensions are often used when evaluating performance models by means of
matrix-geometric methods (Chapters 8 and 17).
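
A quick numerical check of (B.7) (an editorial sketch assuming NumPy; the matrix M is an
arbitrary example with spectral radius 0.5) compares a truncated matrix-geometric sum
with the inverse of I - M:

    import numpy as np

    M = np.array([[0.2, 0.3],
                  [0.1, 0.4]])        # spectral radius 0.5 < 1

    S, term = np.zeros((2, 2)), np.eye(2)
    for _ in range(200):              # partial sum of I + M + M^2 + ...
        S += term
        term = term @ M

    print(S)
    print(np.linalg.inv(np.eye(2) - M))   # (I - M)^{-1}; should agree with S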

B.3 Tensor sums and products

Tensor sums and products (also often called Kronecker sums and products) can be used
when composing large CTMCs out of smaller ones, as is done in Chapter 8.
Let M(d_1) and M(d_2) denote the sets of all matrices of sizes d_1 × d_1 and d_2 × d_2
respectively. Let Q_1 ∈ M(d_1) and Q_2 ∈ M(d_2) and let I_d be the identity matrix in M(d).
The tensor sum Q_⊕ of the matrices Q_1 and Q_2 is defined as follows:

    Q_⊕ = Q_1 ⊕ Q_2 = (Q_1 ⊗ I_{d_2}) + (I_{d_1} ⊗ Q_2),                      (B.9)

where the + operator is the normal element-to-element addition and ⊗ the tensor product.
The tensor product Q_⊗ of two matrices Q_1 ∈ M(d_1) and Q_2 ∈ M(d_2) is a matrix in
M(d_1 d_2), i.e., a matrix consisting of d_1 × d_1 blocks, each of size d_2 × d_2. Let q_1(i_1, j_1) be
an element of the matrix Q_1 and let q_2(i_2, j_2) be an element of the matrix Q_2. Then, the
element q_⊗(ī, j̄) equals q_1(i_1, j_1) q_2(i_2, j_2), where ī = (i_1, i_2) and j̄ = (j_1, j_2). (i_1, j_1) can be
interpreted as the block coordinate of q_⊗(ī, j̄), and (i_2, j_2) as its position within this block.
The tensor sum and product are associative.
Consider two independent CTMCs on state spaces I_1 and I_2 and with generator matrices
Q_1 and Q_2 respectively. The state space of the combined CTMC is given by the
Cartesian product I_1 × I_2, and the generator Q of the combined CTMC equals the tensor
sum of the individual Markov generators, i.e., Q = Q_1 ⊕ Q_2.

Example B.1. Tensor sum and product.
Consider the following two matrices:

    A = ( a  b ),   and   B = ( 1  2 ).                                       (B.10)
        ( c  d )              ( 3  4 )
                              ( 5  6 )

We then can compute

            ( a   2a   b   2b )
            ( 3a  4a   3b  4b )
    A ⊗ B = ( 5a  6a   5b  6b )                                               (B.11)
            ( c   2c   d   2d )
            ( 3c  4c   3d  4d )
            ( 5c  6c   5d  6d )

and

            ( a   b    2a  2b )
            ( c   d    2c  2d )
    B ⊗ A = ( 3a  3b   4a  4b )                                               (B.12)
            ( 3c  3d   4c  4d )
            ( 5a  5b   6a  6b )
            ( 5c  5d   6c  6d )

For further details on tensor algebra and its applications, see e.g., Massey [192] or Plateau
et al. [234].
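
Tensor (Kronecker) products and sums are also directly available in numerical libraries.
The following sketch (an editorial addition, assuming NumPy; the values substituted for
a, b, c, d and the small generators Q1 and Q2 are arbitrary) reproduces the matrices of
Example B.1 and builds a tensor sum according to (B.9):

    import numpy as np

    # concrete values for a, b, c, d of Example B.1 (chosen arbitrarily)
    a, b, c, d = 10.0, 20.0, 30.0, 40.0
    A = np.array([[a, b], [c, d]])
    B = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

    print(np.kron(A, B))   # the 6 x 4 matrix of (B.11)
    print(np.kron(B, A))   # the 6 x 4 matrix of (B.12)

    # tensor sum (B.9) for two small generator-like 2 x 2 matrices Q1 and Q2
    Q1 = np.array([[-1.0, 1.0], [2.0, -2.0]])
    Q2 = np.array([[-3.0, 3.0], [4.0, -4.0]])
    print(np.kron(Q1, np.eye(2)) + np.kron(np.eye(2), Q2))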

Appendix C

Abbreviations

AAL ATM adaptation layer


ACE acyclic Markov chain evaluator
ACM Association for Computing Machinery
ATM asynchronous transfer mode
BASTA Bernoulli arrivals see time averages
BCMP Baskett, Chandy, Muntz, Palacios
CDF cumulative density function
CHW Chandy, Herzog, Woo (theorem)
CPU central processing unit
CTMC continuous-time Markov chain
CSPN coloured stochastic Petri net
DES discrete-event simulation
DSPN deterministic and stochastic Petri net
DTMC discrete-time Markov chain
EPT elapsed processing time
ETPP extended threshold priority policy
FCFS first-come first-served
FDDI fiber distributed data interface
FESC flow-equivalent service center
FFQN feed-forward queueing network
FIFO first in, first out
GBE global balance equations
GMTF general modelling tool framework

GNQN Gordon-Newell queueing network


GS Gauss-Seidel
GSPN generalised stochastic Petri net
HRN highest response ratio next
IBM International Business Machines
IEEE Institute of Electrical and Electronics Engineers, Inc.
IO input/output
IPP interrupted Poisson process
IS infinite server
iSPN infinite-state stochastic Petri net
ISDN integrated services digital network
JQN Jackson queueing network
KLB Kramer and Langenbach-Belz
LAN local-area network
LBE local balance equations
LCFS last-come first-served
LCFSPR last-come first-served, with preemption
LDS load-dependent server
LR logarithmic reduction
LU lower-upper (decomposition)
LWB lower bound
MGM matrix-geometric method
MMAP Markov-modulated arrival process
MMPP Markov-modulated Poisson process
MTTF mean time to failure
MTTR mean time to repair
MVA mean-value analysis
MVM matrix-vector multiplications
OCDR on-demand connection with delayed release
PASTA Poisson arrivals see time averages
PDF probability density function
PFQN product-form queueing network
PFSPN product-form stochastic Petri net
PH phase-type (distribution)
PK Pollaczek-Khintchine (formula)

PRD preemptive resume different


PRI preemptive resume identical
PRIO priority scheduling
PRS preemptive resume
PS processor sharing
QBD quasi-birth-death process
QN queueing network
QNA queueing network analyzer
RNG random number generator/generation
RK Runge-Kutta
RR round robin
SAN stochastic activity network
SEPT shortest elapsed processing time
SJN shortest job next
SMC semi-Markov chain
SOR successive over-relaxation
SPN stochastic Petri net
SRPT shortest remaining processing time
SS successive substitution
SU standard uniformisation
THT token holding timer
TMR triple-modular redundancy
TPP threshold priority policy
TTRT target token rotation timer
UPB upper bound

Bibliography

[I] M. Ajmone Marsan, G. Balbo and G. Conte. Performance Models of Multiprocessor


Systems. The MIT Press, 1986.

[2] M. Ajmone Marsan, G. Balbo, G. Conte, S. Donatelli and G. Franceschinis. ModelZing


with generalized stochastic Petri nets. John Wiley & Sons, 1995.

[3] M. Ajmone Marsan and G. Chiola. On Petri nets with deterministic and exponen-
tially distributed firing time. In: G. Rozenberg (editor), Advances in Petri Nets,
pp. 132-145. Springer-Verlag, 1987.

[4] M. Ajmone Marsan, G. Conte and G. Balbo. A class of generalized stochastic Petri
nets for the performance evaluation of multiprocessor systems. ACM Transactions
on Computer Systems, 2(2):93-122, 1984.

[5] M. Ajmone Marsan, S. Donatelli and F. Neri. GSPN models of Markovian multiserver
multiqueue systems. Performance Evaluation, 11:227-240, 1990.

[6] M. Ajmone Marsan, S. Donatelli, F. Neri and U. Rubino. GSPN models of ran-
dom, cyclic and optimal l-limited multiserver multiqueue systems. ACM Computer
Communications Review, 21(4):69-80, 1991.

[7] M. Ajmone Marsan, S. Donatelli, F. Neri and U. Rubino. On the construction of


.
abstract GSPNs: An exercise in modelling. In: Proceedings of the 4th International
Workshop on Petri Nets and Performance Models, pp.2-17. IEEE Computer Society
Press, 1991.

[8] G.M. Amdahl. Validity of the single-processor approach to achieving large scale com-
puting capabilities. In: Proceedings of the AFIPS Conference; Volume 30, pp.483-
485. AFIPS Press, 1967.

[9] B. Avi-Itzhak, W.L. Maxwell and L.W. Miller. Queueing with alternating priorities.
Operations Research, 13:306-318, 1965.

[10] G. Balbo. On the success of stochastic Petri nets. In: Proceedings of the 5th Inter-
national Workshop on Petri Nets and Performance Models, pp.2-9. IEEE Computer
Society Press, 1995.

[11] G. Balbo. Stochastic Petri nets: Accomplishments and open problems. In: Proceed-
ings of the 1st International Computer Performance and Dependability Symposium,
pp.51-60. IEEE Computer Society Press, 1995.

[12] Y. Bard. The VM/370 performance predictor. ACM Computing Surveys, 10(3):333-
342, 1978.

[13] Y. Bard. Some extensions to multiclass queueing network analysis. In: M. Arato,
A. Butrimenko and E. Gelenbe (editors), Performance of Computer Systems, pp.51-
61. North-Holland, 1979.

[14] F. Baskett, K.M. Chandy, R.R. Muntz and F. Palacios. Open, closed and mixed net-
works of queues with different classes of customers. Journal of the ACM, 22(2):248-
260, 1975.

[15] F. Bause and P.S. Kritzinger. Stochastic Petri Nets: An Introduction to the Theory.
Vieweg Verlag, 1996.

[16] H. Beilner. Workload characterisation and performance modelling. In: Proceedings


of the International Workshop on Workload Characterisation of Computer Systems,
Pavia, Italy, 1985.

[17] S. Berson, E. de Souza e Silva and R.R. Muntz. An object-oriented methodology for
the specification of Markov models. Technical report, University of California, Los
Angeles, 1987.

[18] S.S. Berson and R.R. Muntz. Detecting Block GI|M|1 and Block M|G|1 Matrices
from Model Specifications. Technical report, University of California, Los Angeles,
1994.

[19] G.A. Blaauw and F.P. Brooks. Computer Architecture: Concepts and Evolution.
Addison Wesley, 1997.

[20] J.P.C. Blanc. An algorithmic solution of polling models with limited service disciplines. IEEE Transactions on Communications, 40(7):1152-1155, 1992.

[21] J.P.C. Blanc. Performance evaluation of polling systems by means of the power-series algorithm. Annals of Operations Research, 35:155-186, 1992.

[22] A. Bobbio and L. Roberti. Distribution of the minimal completion time of parallel
tasks in multi-reward semi-Markov models. Performance Evaluation, 14:239-256,
1992.

[23] A. Bobbio and K.S. Trivedi. An aggregation technique for the transient analysis of
stiff Markov chains. IEEE Transactions on Computers, 35(9):803-814, 1986.

[24] P.P. Bocharov and V. Nauomov. Matrix-geometric stationary distribution for


the PH]PH]l]r queue. Elektronische Informationsverarbeitung und Kybernetik,
22(4):179-186, 1986.

[25] D.R. Boggs, J.C. Mogul and C.A. Kent. Measured capacity of an Ethernet: Myths
and reality. ACM Computer Communication Review, 18(4):222-234, 1988.

[26] G. Bolch, G. Fleischmann and R. Schreppel. Ein funktionales Konzept zur Analyse
von Warteschlangennetzen und Optimierung von Leistungsgrossen. In: U. Herzog
and M. Paterok (editors), Messung, Modellierung und Bewertung von Rechensyste-
men, Informatik Fachberichte 154, pp.327-342. Springer-Verlag, 1987.

[27] O.J. Boxma, W.P. Groenendijk and J.A. Weststrate. A pseudo-conservation law
for service systems with a polling table. IEEE Transactions on Communications,
38( 10):1865-1870, 1993.

[28] O.J. Boxma and B. Meister. Waiting time approximations in multi-queue systems
with cyclic service. Performance Evaluation, 7( 1):59-70, 1987.

[29] P. Brinch Hansen. Operating System Principles. Prentice-Hall, 1973.

[30] S.C. Bruell and G. Balbo. Computational algorithms for closed queueing networks.
North-Holland, 1980.

[31] P. Buchholz. Hierarchical Markovian models: Symmetries and reduction. In: R. Poo-
ley and J. Hillston (editors), Computer Performance Evaluation ‘92: ModelZing Tech-
niques and Tools, pp.305-319. Edinburgh University Press, 1992.

[32] P. Buchholz. Aggregation and reduction techniques for hierarchical GCSPNs. In:
Proceedings of the 5th International Workshop on Petri Nets and Performance Mod-
els, pp.216 225. IEEE Computer Society Press, 1993.

[33] P. Buchholz, J. Dunkel, B. Muller-Clostermann, M. Sczittnick and S. Zaske. Quan-


titative Systemanalyse mit Markovschen Ketten. B.G. Teubner Verlag, 1994.

[34] P.J. Burke. The output of a queueing system. Operations Research, 4:699-704, 1956.

[35] W. Bux. Token-ring local-area networks and their performance. Proceedings of the
IEEE, 77(2):238-256, 1989.

[36] W. Bux and H.L. Truong. Mean-delay approximation for cyclic-server queueing
systems. Performance Evaluation, 3: 187-196, 1983.

[37] J.P. Buzen. Computational algorithms for closed queueing networks with exponential
servers. Communications of the ACM, 16(9):527-531, 1973.

[38] J.P. Buzen. A queueing network model of MVS. ACM Computing Surveys,
10(3):319-331, 1978.

[39] R. Chakka and I. Mitrani. Spectral expansion solution for a finite capacity multiserver
system in a Markovian environment. In: D.D. Kouvatsos (editor), Proceedings of the
3rd International Workshop on Queueing Networks with Finite Capacity, pp.6.1-6.9,
1995.

[40] K.M. Chandy. A survey of analytic models of rollback and recovery strategies. IEEE
Computer, 8(5):40-47, 1975.

[41] K.M. Chandy, U. Herzog and L.S. Woo. Approximate analysis of general queueing
networks. IBM Journal of Research and Development, 19(1):43-49, 1975.

[42] K.M. Chandy, U. H erzog and L.S. Woo. Parametric analysis of queueing network
models. IBM Journal of Research and Development, 19(1):36-42, 1975.

[43] K.M. Chandy and D. Neuse. Linearizer: A heuristic algorithm for queueing network
models of computing systems. Communications of the ACM, 25(2):126-134, 1982.

[44] K.M. Chandy and C.H. Sauer. Approximate methods for analyzing queueing network
models of computer systems. ACM Computing Surveys, 10(3):281-317, 1978.

[45] K.M. Chandy and C.H. Sauer. Computational algorithms for product-form queueing
networks. Communications of the ACM, 23(10):573-583, 1980.

[46] P. Chimento and K.S. Trivedi. The completion time of programs on processors
subject to failure and repair. IEEE Transactions on Computers, 42( 10):1184-1194,
1993.

[47] G. Chiola, C. Dutheillet, G. Franceschinis and S. Haddad. Stochastic well-formed


coloured nets and symmetric modelling applications. IEEE Transactions on Com-
puters, 42( 11):1343-1360, 1993.

[48] G. Chiola and A. Ferscha. Distributed simulation of Petri nets. IEEE Parallel and
Distributed Technology, 1(3):33-50, 1993.

[49] H. Choi, V.G. Kulkarni and K.S. Trivedi. Markov-regenerative stochastic Petri nets.
In: G. Iazeolla and S.S. Lavenberg (editors), Proceedings Performance ‘93. North-
Holland, 1993.

[50] H. Choi and K.S. Trivedi. Approximate performance models of polling systems using
stochastic Petri nets. In: Proceedings Infocom ‘92, pp.2306-2314. IEEE Computer
Society Press, 1992.

[51] G. Ciardo. Petri nets with marking-dependent arc cardinality: Properties and anal-
ysis. In: R. Valette (editor), Application and Theory of Petri Nets 1994, pp.179-198.
Springer-Verlag, 1994.

[52] G. Ciardo, R. German and C. Lindemann. A characterization of the stochastic pro-


cess underlying a stochastic Petri net. IEEE Transactions on Software Engineering,
20(7):506-515, 1994.

[53] G. Ciardo, J. Muppala and K. S. Trivedi. SPNP: Stochastic Petri net package.
In: Proceedings of the 3rd International Workshop on Petri Nets and Performance
Models, pp.142-151. IEEE Computer Society Press, 1989.

[54] G. Ciardo, J.K. Muppala and K.S. Trivedi. Analyzing concurrent and fault-tolerant
software using stochastic reward nets. Journal of Parallel and Distributed Computing,
15:225-269, 1992.

[55] E. Cinlar. Introduction to Stochastic Processes. Prentice-Hall, 1975.



[56] R.F.W. Coates, G.J. Janacek and K.V. Leever. Monte Carlo simulation and random
number generation. IEEE Journal on Selected Areas in Communications, 6( 1):58-65,
1988.

[57] A. Cobham. Priority assignments in waiting line problems. Operations Research,


2:70-76, 1954.

[58] A. Cobham. Priority assignment-a correction. Operations Research, 3:547, 1955.

[59] E.G. Coffman and P.J. Denning. Operating Systems Theory. Prentice-Hall, 1973.

[60] E.G. Coffman and L. Kleinrock. Feedback queueing models for time-shared systems.
Journal of the ACM, 15(4):549-576, 1968.

[61] E.G. Coffman, R.R. Muntz and H. Trotter. Waiting time distributions for processor-
sharing systems. Journal of the ACM, 17(1):123-130, 1970.

[62] J.W. Cohen. The Single Server Queue. North-Holland, 1969.

[63] A.E. Conway. A perspective on the analytical performance evaluation of multilayered


communication protocol architectures. IEEE Journal on Selected Areas in Commu-
nications, 9( 1):4-14, 1991.

[64] A.E. Conway and N.D. Georganas. A new method for computing the normalisation
constant of multiple-chain queueing networks. INFOR, 24(3):184-198, 1986.

[65] A.E. Conway and N.D. Georganas. Queueing Networks: Exact Computational Algo-
rithms. The MIT Press, 1989.

[66] R.B. Cooper. Queues served in cyclic order: Waiting times. The Bell System Technical
Journal, 49:399-413, 1970.

[67] R.B. Cooper and G. Murray. Queues served in cyclic order. The Bell System Technical
Journal, 48:675-689, 1969.

[68] J.A. Couvillion, R. Freire, R. Johnson, W.D. Obal, A. Qureshi, M. Rai, W.H. Sanders
and J.E. Tvedt. Performability modelling with UltraSAN. IEEE Software, 8(5):69-
80, 1991.

[69] D.R. Cox. A use of complex probabilities in the theory of stochastic processes.
Proceedings of the Cambridge Philosophical Society, 51:313-319, 1955.

[70] A.J. Coyle, B.R. Haverkort, W. Henderson and C.E.M. Pearce. A mean-value analysis of slotted-ring network models. Telecommunication Systems, 6(2):203-227, 1996.

[71] J.N. Daigle and D.M. Lucantoni. Queueing systems having phase-dependent arrival and service rates. In: W.J. Stewart (editor), Numerical Solution of Markov Chains, pp.161-202. Marcel Dekker Inc., 1991.

[72] P.J. Denning and J.P. Buzen. The operational analysis of queueing network models. ACM Computing Surveys, 10(3):225-261, 1978.

[73] N.M. van Dijk. On a simple proof of uniformization for continuous and discrete-state continuous-time Markov chains. Advances in Applied Probability, 22:749-750, 1990.

[74] N.M. van Dijk. Queueing Networks and Product Form: A Systems Approach. John Wiley & Sons, 1993.

[75] S. Donatelli and M. Sereno. On the product-form solution for stochastic Petri nets. In: K. Jensen (editor), Application and Theory of Petri Nets 1992, pp.154-172. Springer-Verlag, 1992.

[76] B.T. Doshi. Queueing systems with vacations: A survey. Queueing Systems, 1(1):29-66, 1986.

[77] D.L. Eager and K.C. Sevcik. Performance bound hierarchies for queueing networks. ACM Transactions on Computer Systems, 1(2):99-116, 1983.

[78] D.L. Eager and K.C. Sevcik. Bound hierarchies for multiple-class queueing networks. Journal of the ACM, 33(1):179-206, 1986.

[79] P. l'Ecuyer. Efficient and portable combined random number generators. Communications of the ACM, 31(6):742-774, 1988.

[80] P. l'Ecuyer. Random numbers for simulation. Communications of the ACM, 33(10):85-97, 1990.

[81] S. Eilon. A simpler proof of L = λW. Operations Research, 17(5):915-916, 1969.

[82] M. Eisenberg. Two queues with changeover times. Operations Research, 9:386-401, 1971.

[83] M. Eisenberg. Queues with periodic service and changeover times. Operations Research, 20:440-451, 1972.

[84] A.K. Erlang. Solution of some problems in the theory of probabilities of significance
in automatic telephone exchanges. The Post Office Electrical Engineer's Journal,
10: 189-197, 1917.

[85] R.V. Evans. Geometric distribution in some two-dimensional queueing systems. Op-
erations Research, 15:830-846, 1967.

[86] D. Everitt. Simple approximations for token rings. IEEE Transactions on Commu-
nications, 34(7):719-721, 1986.

[87] W. Feller. An Introduction to Probability Theory and its Applications. John Wiley
& Sons, 1968.

[88] M.J. Ferguson and Y.J. Aminetzah. Exact results for nonsymmetric token ring sys-
tems. IEEE Transactions on Communications, 33(3):223-331, 1985.

[89] W. Fischer and K.S. Meier-Hellstern. The Markov-modulated Poisson process


(MMPP) cookbook. Performance Evaluation, 18:149-171, 1992.

[90] G.S. Fishman. Principles of Discrete Event Simulation. John Wiley & Sons, 1978.

[91] G.S. Fishman and L.R. Moore. An exhaustive analysis of multiplicative congruential
random number generators with modulus 231- 1. SIAM Journal on Scientific and
Statistical Computing, 7127-45, 1986.

[92] G. Florin and S. Natkin. On open synchronized queueing networks. In: Proceedings
of the 1st International Workshop on Timed Petri Nets, pp.226-223. IEEE Computer
Society Press, 1985.

[93] G. Florin and S. Natkin. One place unbounded stochastic Petri nets: Ergodicity
criteria and steady-state solution. Journal of Systems and Software, 1(2):103-115,
1986.

[94] G. Florin and S. Natkin. A necessary and sufficient saturation condition for open
synchronized queueing networks. In: Proceedings of the 2nd International Workshop
on Petri Nets and Performance Models, pp.4-13. IEEE Computer Society Press,
1987.

[95] G. Florin and S. Natkin. Generalizations of queueing network product-form solutions


to stochastic Petri nets. IEEE Transactions on Software Engineering, 17(2):99-107,
1991.

[96] B.L. Fox and P.W. Glynn. Computing Poisson probabilities. Communications of the
ACM, 31(4):440-445, 1988.

[97] C. Frank. Bewertung von stochastischen Petrinetxen mit Hilfe der Matrix-
geometrischen Methode. Master’s thesis, RWTH Aachen, 1997.

[98] K.A. Frenkel. Allan L. Scherr - Big Blue's time-sharing pioneer. Communications
of the ACM, 30(10):824-828, 1987.

[99] V.S. Frost, W.W. LaRue and K.S. Shanmugan. Efficient techniques for the simulation of computer communication networks. IEEE Journal on Selected Areas in Communications, 6(1):146-157, 1988.

[100] S.W. Fuhrmann and Y.T. Wang. Mean waiting time approximations of cyclic service systems with limited service. In: P.J. Courtois and G. Latouche (editors), Proceedings Performance '87, pp.253-265. North-Holland, 1987.

[101] S.W. Fuhrmann and Y.T. Wang. Analysis of cyclic service systems with limited service: Bounds and approximations. Performance Evaluation, 9(1):35-54, 1988.

[102] R.M. Fujimoto. Parallel discrete-event simulation. Communications of the ACM, 33(10):30-53, 1990.

[103] R. German. New results for the analysis of deterministic and stochastic Petri nets.
In: Proceedings of the 1st International Performance and Dependability Symposium,
pp.114-123. IEEE Computer Society Press, 1995.

[104] R. German and C. Lindemann. Analysis of stochastic Petri nets by the method of
supplementary variables. Performance Evaluation, 20:317-335, 1994.

[105] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University
Press, 1989.

[106] W.J. Gor don and G. J. Newell. Closed queueing systems with exponential servers.
Operations Research, 15:254-265, 1967.

[107] G.S. Graham. Queueing network models of computer system performance. ACM
Computing Surveys, 10(3):219-224, 1978.

[108] W. Grassmann. Means and variances of time averages in Markovian environments.


European Journal of Operational Research, 31(1):132-139, 1987.

[109] W.K. Grassmann. Transient solutions in Markovian queueing systems. Computers


and Operations Research, 4:47-53, 1977.

[110] W.K. Grassmann. Finding transient solutions in Markovian event systems through
randomization. In: W.J. Stewart (editor), Numerical Solution of Markov Chains,
pp.357-371. Marcel Dekker, 1991.

[111] W.P. Groenendijk. Waiting-time approximations for cyclic service systems with mixed service strategies. In: M. Bonatti (editor), Teletraffic Science for New, Cost-Effective Systems, Networks and Services, pp.1434-1441. North-Holland, 1989.

[112] W.P. Groenendijk. Conservation Laws in Polling Systems. PhD thesis, University
of Utrecht, 1990.

[113] D. Gross and D.R. Miller. The randomization technique as a modeling tool and
solution procedure for transient Markov processes. Operations Research, 32(2):343-
361, 1984.

[114] L. Gun and A.M. Makowski. Matrix geometric solutions for finite capacity queues
with phase-type distributions. In: P.J. Courtois and G. Latouche (editors), Proceed-
ings Performance ‘87, pp.269-282, North-Holland, 1987.

[115] J.L. Gustafson. Reevaluating Amdahl’s law. Communications of the ACM,


31(5):532-533, 1988.

[116] A.L. Hageman and D.M. Young. Applied Iterative Methods. Academic Press, 1981.

[117] P.G. Harrison and N.M. Patel. Performance ModelZing of Communication Networks
and Computer Architectures. Addison-Wesley, 1992.

[118] B.R. Haverkort. Matrix-geometric solution of infinite stochastic Petri nets. In: Pro-
ceedings of the 1st International Computer Performance and Dependability Sympo-
sium, pp.72-81. IEEE Computer Society Press, 1995.

[119] B.R. Haverkort and A. Ost. Steady-state analysis of infinite stochastic Petri nets:
A comparison between the spectral expansion and the matrix-geometric method.
In: Proceedings of the 7th International Workshop on Petri Nets and Performance
Models, pp.36-45. IEEE Computer Society Press, 1997.

[120] B.R. Haverkort. Performability Modelling Tools, Evaluation Techniques and Appli-
cations. PhD thesis, University of Twente, 1990.

[121] B.R. Haverkort. Approximate performability and dependability modelling using gen-
eralized stochastic Petri nets. Performance Evaluation, 18:61-78, 1993.

[122] B.R. Haverkort. Approximate analysis of networks of PH|PH|1|K queues: Theory & tool support. In: H. Beilner and F. Bause (editors), Quantitative Evaluation of Computing and Communication Systems, Lecture Notes in Computer Science 977, pp.239-253. Springer-Verlag, 1995.

[123] B.R. Haverkort. Approximate analysis of networks of PH|PH|1|K queues with customer losses: Test results. Annals of Operations Research, 79:271-291, 1998.

[124] B.R. Haverkort, H. Idzenga and B.G. Kim. Performance evaluation of threshold-
based ATM cell scheduling policies under Markov-modulated Poisson traffic using
stochastic Petri nets. In: D.D. Kouvatsos (editor), Performance Modelling and Evaluation of ATM Networks, pp.553-572. Chapman and Hall, 1995.

[125] B.R. Haverkort and I.G. Niemegeers. Performability modelling tools and techniques.
Performance Evaluation, 25: 17-40, 1996.

[126] B.R. Haverkort and K.S. Trivedi. Specification and generation of Markov reward
models. Discrete-Event Dynamic Systems: Theory and Applications, 31219-247,
1993.

[127] B.R. Haverkort, A.P.A. van Moorsel and D.-J. Speelman. Xmgm: A performance
analysis tool based on matrix geometric methods. In: Proceedings of the 2nd International Workshop on Modelling, Analysis and Simulation of Computer and Telecom-
munication Systems, pp.152-157. IEEE Computer Society Press, 1994.

[128] J.P. Hayes. Computer Architecture and Organization. McGraw-Hill, 1988.

[129] A. Heck. Introduction to MAPLE. Springer-Verlag, 1993.

[130] P. Heidelberger. Fast simulation of rare events in queueing and reliability models.
ACM Transactions on Modeling and Computer Simulation, 5(1):43-85, 1995.

[131] G.J. Heijenk. Connectionless Communications using the Asynchronous Transfer


Mode. PhD thesis, University of Twente, 1995.

[132] G.J. Heijenk and B.R. Haverkort. Design and evaluation of a connection management
mechanism for an ATM-based connectionless service. Distributed System Engineering
Journal, 3( 1):53-67, 1996.

[133] W. Henderson and D. Lucic. Aggregation and disaggregation through insensitivity


in stochastic Petri nets. Performance Evaluation, 17:91-114, 1993.

[134] W. Henderson and P.G. Taylor. Embedded processesin stochastic Petri nets. IEEE
Transactions on Software Engineering, 17(2):108-116, 1991.

[135] R. Hofmann, R. Klar, B. Mohr, A. Quick and M. Siegle. Distributed performance


monitoring: Methods, tools and applications. IEEE Transactions on Parallel and
Distributed Systems, 5(6):585-598, 1994.

[136] A.S. Hornby. Oxford Advanced Learner’s Dictionary of Current English. Oxford
University Press, 1974.

[137] R. Houterman. Wiskundige Statistiek met Toepassingen. Course Notes 153014, Uni-
versity of Twente, 1992.

[138] R.A. Howard. Dynamic probabilistic systems; Volume I: Markov models. John Wiley
& Sons, 1971.

[139] R.A. Howard. Dynamic Probabilistic Systems; Volume II: Semi-Markov and decision
processes. John Wiley & Sons, 1971.

[140] O.C. Ibe, H. Choi and K.S. Trivedi. Performance evaluation of client-server systems.
IEEE Transactions on Parallel and Distributed Systems, 4(11):1217-1229, 1993.

[141] O.C. Ibe, A. Sathaye, R.C. Howe and K.S. Trivedi. Stochastic Petri net modelling of
VAXcluster system availability. In: Proceedings of the 3rd International Workshop
on Petri Nets and Performance Models, pp.112-121. IEEE Computer Society Press,
1989.

[142] O.C. Ibe and K.S. Trivedi. Stochastic Petri net models of polling systems. IEEE
Journal on Selected Areas in Communications, 8(9):1649-1657, 1990.

[143] J.R. Jackson. Networks of waiting lines. Operations Research, 5:518-521, 1957.

[144] J.R. Jackson. Jobshop-like queueing systems. Management Sciences, 10:131-142,


1963.

[145] R. Jain. The Art of Computer System Performance Evaluation. John Wiley & Sons,
1991.

[146] R. Jain. Performance analysis of FDDI token ring networks: Effects of parameters
and guidelines for setting TTRT. IEEE Magazine of Lightwave Telecommunication
Systems, pp.16-22, 1991.

[147] A. Jensen. Markov chains as an aid in the study of Markov processes. Skand.
Aktuarietidskrift, 3:87-91, 1953.

[148] K. Jensen. Coloured Petri Nets. Basic Concept, Analysis Methods and Practical
Applications (Volume 1). EATCS Monographs on Theoretical Computer Science.
Springer-Verlag, 1992.

[149] K. Jensen. Coloured Petri Nets. Basic Concept, Analysis Methods and Practical
Applications (Volume 2,). EATCS Monographs on Theoretical Computer Science.
Springer-Verlag, 1997.

[150] W.S. Jewell. A simple proof of L = λW. Operations Research, 15(6):1109-1116,


1967.

[151] M.J. Johnson. Proof that timing requirements of the FDDI token ring protocol are
satisfied. IEEE Transactions on Communications, 35(6):620-625, 1987.

[152] K. Kant. Introduction to Computer System Performance Evaluation. McGraw-Hill,


1992.

[153] F.P. Kelly. Reversibility and Stochastic Networks. John Wiley & Sons, 1979.

[154] J.G. Kemeny and J.L. Snell. Finite Markov chains. Van Nostrand, Princeton, 1960.

[155] D.G. Kendall. Some problems in the theory of queues. Journal of the Royal Statistical
Society, Ser. B, 13:151-185, 1951.

[156] P.J.B. King. Computer and Communication Systems Performance Modelling.


Prentice-Hall, 1990.

[157] J.P.C. Kleijnen and W. van Groenendaal. Simulation: A Statistical Perspective. John
Wiley & Sons, 1992.

[158] E.F.J. Klein. SPN2MGM: a tool for solving a class of infinite GSPN models. Master's
thesis, University of Twente, 1995.

[159] L. Kleinrock. Time-shared systems: A theoretical treatment. Journal of the ACM,


14(2):242-261, 1967.

[160] L. Kleinrock. Queueing Systems; Volume 1: Theory. John Wiley & Sons, 1975.

[161] L. Kleinrock. Queueing Systems; Volume 2: Computer Applications. John Wiley &
Sons, 1976.

[162] D.E. Knuth. The Art of Computer Programming; Volume 2: Seminumerical Algo-
rithms. Addison-Wesley, 1981.

[163] A.G. Konheim and B. Meister. Waiting lines and times in a system with polling.
Journal of the ACM, 21(7):470-490, 1974.

[164] W. Kramer and M. Langenbach-Belz. Approximate formulae for the delay in the queueing system GI|G|1. In: Proceedings of the 8th International Teletraffic Congress, pp.235-1/8, 1976.

[165] U. Krieger, B. Müller-Clostermann and M. Sczittnick. Modelling and analysis of


communication systems based on computational methods for Markov chains. IEEE
Journal on Selected Areas in Communications, 8(9):1630-1648, 1990.

[166] P.J. Kühn. Approximate analysis of general queueing networks by decomposition.


IEEE Transactions on Communications, 27(1):113-126, 1979.

[167] P.J. Kühn. Multiqueue systems with non-exhaustive cyclic service. The Bell System
Technical Journal, 58(3):671-698, 1979.

[168] H. Kwakernaak and R. Sivan. Modern Signals and Systems. Prentice-Hall, 1991.

[169] S.S. Lam. Dynamic scaling and growth behaviour of queueing network normalization
constants. Journal of the ACM, 29(2):492-513, 1982.

[170] S.S. Lam. A simple derivation of the MVA and LBANC algorithms from the convo-
lution algorithm. IEEE Transactions on Computers, 32(11):1062-1064, 1983.

[171] F. Lange, R. Kroeger and M. Gergeleit. JEWEL: Design and implementation of


a distributed measurement system. IEEE Transactions on Parallel and Distributed
Systems, 3(6):657-671, 1992.

[172] G. Latouche. Algorithms for infinite Markov chains with repeating columns. In:
C.D. Meyer (editor), Linear algebra, Markov chains and queueing models, pp.231-
265. Springer-Verlag, 1993.

[173] G. Latouche and V. Ramaswami. A logarithmic reduction algorithm for quasi birth
and death processes. Journal of Applied Probability, 30:650-674, 1993.

[174] S.S. Lavenberg and M. Reiser. Stationary state probabilities at arrival instants for
closed queueing networks with multiple types of customers. Journal of Applied Prob-
ability, 17(4):1048-1061, 1980.

[175] A.M. Law and W.D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, 1991.

[176] A.M. Law and M.G. McComas. Simulation software for communications networks:
The state of the art. IEEE Communications Magazine, 32(3):44-50, 1994.

[177] E.D. Lazowska, J.L. Zahorjan, G.S. Graham and K.C. Sevcik. Quantitative System
Performance: Computer system analysis using queueing network models. Prentice-
Hall, 1982.

[178] D.S. Lee and B. Sengupta. Queueing analysis of a threshold based priority scheme
for ATM networks. IEEE/ACM Transactions on Networking, 1(6):709-717, 1993.

[179] H. Leeb and S. Wegenkittl. Inversive and linear congruential pseudorandom num-
ber generators in empirical tests. ACM Transactions on Modeling and Computer
Simulation, 7(2):272-286, 1997.

[180] R. Lepold. PENPET: A new approach to performability modelling using stochastic


Petri nets. In: B.R. Haverkort, I.G. Niemegeers and N.M. van Dijk (editors), Proceed-
ings of the First International Workshop on Performability ModelZing of Computer
and Communication Systems, pp.3-17. University of Twente, 1991.

[181] H. Levy and M. Sidi. Polling systems: Applications, modeling and optimization.
IEEE Transactions on Communications, 38(10):1750-1760, 1990.

[182] C. Lindemann. An improved numerical algorithm for calculating steady-state solu-


tions of deterministic and stochastic Petri net models. In: Proceedings of the 4th
International Workshop on Petri Nets and Performance Models, pp.176-185. IEEE
Computer Society Press, 1991.

[183] C. Lindemann. An improved numerical algorithm for calculating steady-state solu-


tions of deterministic and stochastic Petri nets. Performance Evaluation, 18(1):79-
95, 1993.

[184] C. Lindemann. Performance Modeling with Deterministic and Stochastic Petri Nets.
John Wiley & Sons, 1998.

[185] D.V. Lindley. The theory of queues with a single server. Proceedings of the Cambridge
Philosophical Society, 48:277-289, 1952.

[186] J.D.C. Little. A proof of the queueing formula L = λW. Operations Research,
9(3):383-387, 1961.

[187] D.M. Lucantoni, K.S. Meier-Hellstern and M.F. Neuts. A single-server queue with
server vacations and a class of non-renewal arrival processes. Advances in Applied
Probability, 22:676-705, 1990.

[188] D.M. Lucantoni and V. Ramaswami. Efficient algorithms for solving the non-linear
matrix equations arising in phase-type queues. Stochastic Models, l( 1):29-51, 1985.

[189] V. Mainkar and K.S. Trivedi. Sufficient conditions for existence of fixed-point in
stochastic reward net-based iterative models. IEEE Transactions on Software Engi-
neering, 22(9):640-653, 1996.

[190] R.A. Marie, A.L. Reibman and K.S. Trivedi. Transient analysis of acyclic Markov
chains. Performance Evaluation, 71175-194, 1987.

[191] K.T. Marshall. Some inequalities in queueing. Operations Research, 16(3):651-665,


1981.

[192] W.A. Massey. Open networks of queues: their algebraic structure and estimating
their transient behavior. Advances in Applied Probability, 16: 176-201, 1984.

[193] W.L. Maxwell. On the generality of the equation L = λW. Operations Research,
18(1):172-174, 1970.

[194] R.M. Metcalfe and D.R. Boggs. Ethernet: distributed packet switching for local
computer networks. Communications of the ACM, 19(7):395-404, 1976.

[195] J.F. Meyer. Computation-based reliability analysis. IEEE Transactions on Comput-


ers, 25(6):578-584, 1976.

[196] J.F. Meyer. On evaluating the performability of degradable computing systems.


IEEE Transactions on Computers, 29(8):720-731, 1980.

[197] J.F. Meyer. Closed-form solutions of performability. IEEE Transactions on Com-


puters, 31(7):648-657, 1982.

[198] J.F. Meyer. Performability: A retrospective and some pointers to the future. Per-
formance Evaluation, 14(3):139-156, 1992.

[199] J.F. Meyer. Performability evaluation: Where it is and what lies ahead. In: Proceed-
ings of the 1st International Performance and Dependability Symposium, pp.334-343.
IEEE Computer Society Press, 1995.

[200] J.F. Meyer, D.G. Furchtgott and L.T. Wu. Performability evaluation of the SIFT
computer. IEEE Transactions on Computers, 29(6):501-506, 1980.

[201] L.W. Miller and L.E. Schrage. The queue M|G|1 with the shortest remaining pro-
cessing time. Operations Research, 14:670-683, 1966.

[202] I. Mitrani. Simulation Techniques for Discrete-Event Systems. Cambridge University


Press, 1982.

[203] I. Mitrani and R. Chakka. Spectral expansion solution of a class of Markov mod-
els: Application and comparison with the matrix-geometric method. Performance
Evaluation, 23:241-260, 1995.

[204] C. Moler and C.F. van Loan. Nineteen dubious ways to compute the exponential of
a matrix. SIAM Review, 20(4):801-835, 1978.

[205] M.K. Molloy. Performance analysis using stochastic Petri nets. IEEE Transactions
on Computers, 31(9):913-917, 1982.

[206] A.P.A. van Moorsel. Performability Evaluation Concepts and Techniques. PhD thesis,
University of Twente, 1993.

[207] A.P.A. van Moorsel and B.R. Haverkort. Probabilistic evaluation for the analytical
solution of large Markov models: Algorithms and tool support. Microelectronics and
Reliability, 36(6):733-755, 1996.

[208] A.P.A. van Moorsel and W.H. Sanders. Adaptive uniformization. Stochastic Models,
10(3):619-648, 1994.

[209] A. Movaghar. Performability modeling with stochastic activity networks. PhD thesis,
The University of Michigan, 1985.

[210] A. Movaghar and J.F. Meyer. Performability modelling with stochastic activity
networks. In: Proceedings of the 1984 Real-Time Systems Symposium, pp.215-224.
IEEE Computer Society Press, 1984.

[211] M. Mulazzani and K.S. Trivedi. Dependability prediction: Comparison of tools and
techniques. In: Proceedings IFAC SAFECOMP, pp.171-178, 1986.

[212] R.R. Muntz. Queueing networks: A critique of the state of the art and directions for
the future. ACM Computing Surveys, 10(3):353-359, 1978.

[213] J.K. Muppala and K.S. Trivedi. Numerical transient solution of finite Markovian
queueing systems. In: U. Bhat (editor), Queueing and Related Models. Oxford Uni-
versity Press, 1992.

[214] S. Natkin. Reseaux de Petri Stochastiques. PhD thesis, CNAM, Paris, 1980.

[215] R. Nelson. Matrix Geometric Solutions in Markov Models: A Mathematical Tutorial.


Technical report, IBM Research Report RC 16777, 1991.

[216] R. Nelson. Probability, stochastic processes and queueing theory. Springer-Verlag,


1995.

[217] M.F. Neuts. Matrix Geometric Solutions in Stochastic Models: An Algorithmic Ap-
proach. Johns Hopkins University Press, 1981.

[218] M.F. Neuts. The caudal characteristic curve of queues. Advances in Applied Proba-
bility, 18:221-254, 1986.

[219] M.F. Neuts. Structured Stochastic Matrices of M|G|1 Type and Their Applications.
Marcel Dekker, 1989.

[220] V.F. Nicola, V.G. Kulkarni and K.S. Trivedi. Queueing analysis of fault-tolerant com-
puter systems. IEEE Transactions on Software Engineering, 13(3):363-375, 1987.

[221] V.F. Nicola and J.M. van Spanje. Comparative analysis of different models of check-
pointing and recovery. IEEE Transactions on Software Engineering, 16(8):807-821,
1990.

[222] W.D. Obal and W.H. Sanders. Importance sampling simulation in UltraSAN. Sim-
ulation, 62(2):98-111, 1994.

[223] W. Oberschelp and G. Vossen. Rechneraufbau und Rechnerstrukturen. Oldenbourg


Verlag, 1997.

[224] R.O. Onvural. Survey of closed queueing networks with blocking. ACM Computing
Surveys, 22(2):83-121, 1990.

[225] R.O. Onvural. Asynchronous Transfer Mode Networks: Performance Issues. Artech
House, 1994.

[226] A. Ost, C. Frank and B.R. Haverkort. Untersuchungen zum Verbindungsmanagement


bei Videoverkehr mit Matrix-geometrischen stochastischen Petrinetzen. In: K. Irm-
scher (editor), Proceedings of the 9th ITG/GI Workshop on Measurement, Modeling
and Evaluation of Computer System Performance, pp.71-85. VDE Verlag, 1997.

[227] T.W. Page, S.E. Berson, W.C. Cheng and R.R. Muntz. An object-oriented modelling
environment. ACM Sigplan Notices, 24( 10):287-296, 1989.

[228] C. Palm. Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics, pp.1-189, 1943.

[229] S.K. Park and K.W. Miller. Random number generators: Good ones are hard to
find. Communications of the ACM, 31(10):1192-1201, 1988.

[230] M. Paterok, P. Dauphin and U. Herzog. The method of moments for higher moments
and the usefulness of formula manipulation systems. In: H. Beilner and F. Bause (ed-
itors), Quantitative Evaluation of Computing and Communication Systems, Lecture
Notes in Computer Science 977, pp.56-70. Springer-Verlag, 1995.

[231] K. Pawlikowski. Steady-state simulation of queueing processes: A survey of problems


and solutions. ACM Computing Surveys, 22(2):123-170, 1990.

[232] C.A. Petri. Kommunikation mit Automaten. PhD thesis, University of Bonn, 1962.

[233] T.E. Phipps. Machine repair as a priority waiting-line problem. Operations Research,
9:732-742, 1961.

[234] B. Plateau, J.-M. Fourneau and K.-H. Lee. PEPS: A package for solving com-
plex Markov models of parallel systems. In: R. Puigjaner and D. Potier (editors),
Modelling Techniques and Tools for Computer Performance Evaluation, pp.291-305.
Plenum Press, 1990.

[235] F. Pollaczek. Über eine Aufgabe der Wahrscheinlichkeitstheorie. Mathematische Zeitschrift, 32:729-750, 1930.

[236] M.A. Qureshi and W.H. Sanders. Reward model solution methods with impulse
and rate rewards: An algorithm and numerical results. Performance Evaluation,
20:413-436, 1994.

[237] A. Ranganathan and S. J. Upadhyaya. Performance evaluation of rollback-recovery


techniques in computer programs. IEEE Transactions on Reliability, 42(2):220-226,
1993.

[238] A.L. Reibman, R. Smith and K.S. Trivedi. Markov and Markov reward models
transient analysis: An overview of numerical approaches. European Journal of Op-
erational Research, 4:257-267, 1989.

[239] A.L. Reibman and K.S. Trivedi. Numerical transient analysis of Markov models.
Computers and Operations Research, 15( 1):19-36, 1988.

[240] A.L. Reibman and K.S. Trivedi. Transient analysis of cumulative measures of Markov
model behavior. Stochastic Models, 5(4):683-710, 1989.

[241] M. Reiser. Numerical methods in separable queueing networks. Studies in the Management Sciences, 7:113-142, 1977.

[242] M. Reiser. A queueing network analysis of computer communication networks with


window flow control. IEEE Transactions on Communications, 27(8):1199-1209,
1979.

[243] M. Reiser. Mean value analysis and convolution method for queue dependent servers
in closed queueing networks. Performance Evaluation, 1(1):7-18, 1981.

[244] M. Reiser and H. Kobayashi. Queueing networks with multiple closed chains: Theory
and computational algorithms. IBM Journal of Research and Development, 19:285-
294, 1975.

[245] M. Reiser and S.S. Lavenberg. Mean value analysis of closed multichain queueing
networks. Journal of the ACM, 22(4):313-322, 1980.

[246] F.E. Ross. An overview of FDDI: the fiber-distributed data interface. IEEE Journal
on Selected Areas in Communications, 7(7):1043-1051, 1989.

[247] S.M. Ross. Stochastic Processes. John Wiley & Sons, 1983.

[248] C. Ruemmler and J. Wilkes. An introduction to disk drive modelling. IEEE Com-
puter, 27(3):17-28, 1994.

[249] R. Sahner, K.S. Trivedi and A. Puliafito. Performance and Reliability Analysis of
Computer Systems: An Example-Bused Approach using the SHARPE Software Puck-
age. Kluwer Academic Publishers, 1996.

[250] R.A. Sahner and K.S. Trivedi. A software tool for learning about stochastic models.
IEEE Transactions on Education, 36(1):56-61, 1993.

[251] H. Saito. Teletraffic Technologies in ATM Networks. Artech House, 1994.

[252] W.H. Sanders and L.M. Malhis. Dependability evaluation using composed SAN-
based reward models. Journal on Parallel and Distributed Computing, 15:238-254,
1992.

[253] W.H. Sanders and J.F. Meyer. Reduced-base model construction for stochastic ac-
tivity networks. IEEE Journal on Selected Areas in Communications, 9(1):25-36,
1991.

[254] F. Scheid. Theory and Problems of Numerical Analysis. McGraw-Hill, 1983.

[255] A.L. Scherr. An analysis of time-shared computer systems. The MIT Press, 1966.

[256] L.E. Schrage. The queue M|G|1 with feedback to lower priority queues. Management Science, 13:466-474, 1967.

[257] P. Schweitzer. Approximate analysis of multichain closed queueing networks. In:


Proceedings of the International Conference on Stochastic Control and Optimization,
1979.

[258] M. Sczittnick and B. Müller-Clostermann. MACOM - a tool for the Markovian analysis of communication systems. In: R. Puigjaner (editor), Participants Proceedings of the 4th International Conference on Data Communication Systems and Their Performance, pp.456-470, 1990.

[259] K.C. Sevcik and M.J. Johnson. Cycle time properties of the FDDI token ring protocol.
IEEE Transactions on Software Engineering, 13(3):376-385, 1987.

[260] K.C. Sevcik and I. Mitrani. The distribution of queueing network states at input and
output instants. Journal of the ACM, 28(2):358-371, 1981.

[261] A. Silberschatz and P.B. Galvin. Operating System Concepts. Addison Wesley, 1994.

[262] E. de Souza e Silva and H.R. Gail. Calculating cumulative operational time distribu-
tions of repairable computer systems. IEEE Transactions on Computers, 35(4):322-
332, 1986.

[263] E. de Souza e Silva and H.R. Gail. Calculating availability and performability mea-
sures of repairable computer systems using randomization. Journal of the ACM,
36(1):171-193, 1989.

[264] E. de Souza e Silva and H.R. Gail. Performability analysis of computer systems:
from model specification to solution. Performance Evaluation, 1:157-196, 1992.

[265] J.D. Spragins, J.L. Hammond and K. Pawlikowski. Telecommunication Protocols and
Design. Addison Wesley, 1991.

[266] W.J. Stewart. A comparison of numerical techniques in Markov modelling. Commu-


nications of the ACM, 21(2):144-252, 1978.

[267] W.J. Stewart. On the use of numerical methods for ATM models. In: H. Perros,
G. Pujolle and Y. Takahashi (editors), ModelZing and Performance Evaluation of
ATM Technology, pp.375-396. North-Holland, 1993.

[268] W.J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton
University Press, 1994.

[269] S. Stidham. L = λW: A discounted analogue and a new proof. Operations Research,
20(6):1115-1126, 1972.

[270] S. Stidham. A last word on L = λW. Operations Research, 22(2):417-421, 1974.

[271] H. Takagi. Analysis of Polling Models. The MIT Press, 1986.

[272] H. Takagi. Queueing analysis of polling models. ACM Computing Surveys, 20(1):5-
28, 1988.

[273] H. Takagi. Queueing analysis of polling models: An update. In: H. Takagi (editor),
Stochastic Analysis of Computer and Communication Systems, pp.267-318, North-
Holland, 1990.

[274] H. Takagi. Queueing analysis: A foundation of performance evaluation; Volume 1:


Vacation and Priority Models. North-Holland, 1991.

[275] A.S. Tanenbaum. Structured Computer Organization. Prentice-Hall, 1990.

[276] A.S. Tanenbaum. Distributed Operating Systems. Prentice-Hall, 1995.

[277] A.S. Tanenbaum. Computer Networks. Prentice-Hall, 1996.

[278] A.S. Tanenbaum and A.S. Woodhull. Operating Systems: Design and Implementa-
tion. Prentice-Hall, 1997.

[279] M. Tangemann. Mean waiting time approximations for symmetric and asymmetric
polling systems with time-limited service. In: B. Walke and O. Spaniol (editors),
Messung, Modellierung und Bewertung von Rechen- und Kommunikationssystemen,
pp.143-158. Springer-Verlag, 1993.

[280] K.S. Trivedi. Probability and Statistics with Reliability, Queueing and Computer
Science Applications. Prentice-Hall, 1982.

[281] K.S. Trivedi, J.K. Muppala, S.P. Woolet and B.R. Haverkort. Composite performance
and dependability analysis. Performance Evaluation, 14:197-215, 1992.

[282] D. Wagner, V. Nauomov and U. Krieger. Analysis of a finite capacity multi-server


delay-loss system with a general Markovian arrival process. In: S.S. Alfa and
S. Chakravarthy (editors), Matrix-Analytic Methods in Stochastic Models. Marcel
Dekker, 1995.

[283] J. Walrand. An Introduction to Queueing Networks. Prentice-Hall, 1988.

[284] J. Walrand. Communication Networks. Aksen Associates, 1991.

[285] P.D. Welch. The statistical analysis of simulation results. In: S.S. Lavenberg (editor),
Computer Performance Modelling Handbook, pp.267-329. Academic Press, 1993.

[286] J.A. Weststrate. Analysis and Optimization of Polling Models. PhD thesis, Catholic
University of Brabant, 1992.

[287] W. Whitt. Performance of the queueing network analyzer. The Bell System Technical
Journal, 62(9):2817-2843, 1983.

[288] W. Whitt. The queueing network analyzer. The Bell System Technical Journal,
62(9):2779-2815, 1983.

[289] S.S. Wilks. Mathematical Statistics. John Wiley & Sons, 1962.

[290] R.A. Wolff. Stochastic Modelling and Theory of Queues. Prentice-Hall, 1989.

[291] R.W. Wolff. Poisson arrivals see time averages. Operations Research, 30(2):223-231,
1982.

[292] J.W. Wong. Queueing network modelling of computer communication networks.


ACM Computing Surveys, 10(3):343-351, 1978.

[293] J. Ye and S. Li. Folding algorithm: A computational method for finite QBD pro-
cesses with level-dependent transitions. IEEE Transactions on Communications,
42(2):625-639, 1994.

[294] J.L. Zahorjan, K.C. Sevcik, D.L. Eager and B.I. Galler. Balanced job bound analysis of queueing networks. Communications of the ACM, 25(2):134-141, 1982.
