
Stochastic Modeling of Manufacturing Systems

Advances in Design, Performance Evaluation, and Control Issues


G. Liberopoulos · C. T. Papadopoulos · B. Tan
J. MacGregor Smith · S. B. Gershwin
Editors


With 121 Figures and 91 Tables
George Liberopoulos
Department of Mechanical and Industrial Engineering
University of Thessaly
38334 Volos, Greece
E-mail: [email protected]

J. M. Smith
Department of Mechanical and Industrial Engineering
University of Massachusetts
Amherst, Massachusetts 01003, USA
E-mail: [email protected]

Chrissoleon T. Papadopoulos
Department of Economic Sciences
Aristotle University of Thessaloniki
54124 Thessaloniki, Greece
E-mail: [email protected]

Stanley B. Gershwin
Department of Mechanical Engineering
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139-4307, USA
E-mail: [email protected]

Barış Tan
Graduate School of Business
Koç University
80910 Sarıyer, Istanbul, Turkey
E-mail: [email protected]

Parts of the papers of this volume have been published in the journal OR Spectrum.

Library of Congress Control Number: 2005930501

ISBN-10 3-540-26579-1 Springer Berlin Heidelberg New York


ISBN-13 978-3-540-26579-5 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is con-
cerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, repro-
duction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts
thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its
current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable
for prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
Cover design: Erich Kirchner
Production: Helmut Petri
Printing: Strauss Offsetdruck
Editorial – Stochastic Modeling of Manufacturing
Systems: Advances in Design, Performance Evaluation,
and Control Issues

Manufacturing systems rarely perform exactly as expected and predicted. Unexpected
events always happen: customers may change their orders, equipment may
break down, workers may be absent, raw parts may not arrive on time, processed
parts may be defective, etc. Such randomness affects the performance of the system
and complicates decision-making. Responding to unexpected disturbances
occupies a significant amount of manufacturing managers' time. There are two
possible plans of action for addressing randomness: reduce it, or respond to it in a
way that limits its corrupting effect on system performance. This volume is devoted
to the second. It includes fifteen novel chapters on stochastic models for the
design, coordination, and control of manufacturing systems. The advantage of
modeling is that it can lead to the deepest understanding of the system and give the
most practical results, provided that the models apply well to the real systems that
they are intended to represent. The chapters in this volume mostly focus on the
development and analysis of performance evaluation models using decomposition-
based methods, Markovian and queuing analysis, simulation, and inventory con-
trol approaches. They are organized into four distinct sections to reflect their shared
viewpoints.
Section I includes a single chapter (Chapter 1) on factory design. In this chapter,
Smith raises several concerns that must be addressed before even choosing a
modeling approach and developing and testing a model. Specifically, he discusses
a number of dilemmas in factory design problems and the paradoxes that they lead
to. These paradoxes give rise to new paradigms that can bring on new approaches
and insights for solving them.
Section II includes Chapters 2–7 on unreliable production lines with in-process
buffers.
More specifically, in Chapter 2, Enginarlar, Li, and Meerkov analyze a tandem
production line and determine the minimum buffer levels that are necessary to obtain
a desired line efficiency. The work considers tandem lines with non-exponential
stations and extends prior work on tandem lines with exponential servers. A fairly
detailed simulation study is conducted to analyze the performance of the tandem
lines. The results are used to derive an empirical law that provides an upper bound
on the desired buffer levels.
In Chapter 3, Helber uses decomposition to analyze flow lines with Cox-2 dis-
tributed processing times and limited buffer capacity. First, he derives an exact
solution for a two-station line. Based on this solution, he then derives an approximate,
decomposition-based solution for larger flow lines. Finally, he compares the
results obtained by his decomposition method against those obtained by Buzacott,
Liu, and Shanthikumar.
In Chapter 4, Colledani, Matta, and Tolio present a decomposition method to
evaluate the performance of a production line with multiple failure modes and
multiple products. They solve analytically the two-part-type, two-machine line and
derive the decomposition equations for longer lines. They use an algorithm similar
to the DDX algorithm to solve these equations to determine the production rate and
other performance measures approximately.
In the next chapter (Chapter 5), Matta, Runchina, and Tolio address the question
of how to increase the production rate of production lines by using a shared buffer
within the system in order to avoid blocking. Simulation is used to demonstrate the
gain in the mean production rate when a common buffer is used. In addition, an
application of the shared buffer approach to a real case is reported.
In Chapter 6, Kim and Gershwin ask what happens if machines in a production
line can either fail catastrophically (stop producing), or fail to produce good parts
while continuing to produce. First, they develop a Markov process model for machines
with both quality and operational failures. Then, they develop models for two-machine
systems, for which they calculate total production rate, effective production rate,
and yield. Using these models, they conduct numerical studies on the effect of the
buffer sizes on the effective production rate.
Finally, in Chapter 7, Lee and Lee consider a flow line with finite buffers that
repetitively produces multiple items in a cyclic order. They develop an exact method
for evaluating the performance of a two-station line with exponentially or phase-
type distributed processing times by making use of the matrix geometric structure
of the associated Markov chain. They then present a decomposition-based approx-
imation method for evaluating larger lines. They report on the accuracy of their
proposed method and they discuss the effects of job variation and job sequence on
performance.
Section III includes Chapters 8–13 on queueing network models of manufac-
turing systems.
More specifically, in Chapter 8, Van Vuuren, Adan, and Resing-Sassen consider
multi-server tandem queues with finite buffers and generally distributed service
times. They develop an effective approximation technique based on a spectral expan-
sion method. Numerous experiments are utilized to demonstrate the effectiveness
of their performance methodology when compared with simulation of the same
systems. Their approximation methodology should be very useful for production
line design.
In Chapter 9, Koukoumialos and Liberopoulos present an analytical approxi-
mation method for the performance evaluation of multi-stage, serial systems
operating under nested or echelon kanban control. Full decomposition is utilized
along with an associated set of algorithms to effectively analyze the performance of
these systems. Finally, these approximation algorithms are utilized to accurately
optimize the design parameters of the system.
In the next chapter (Chapter 10), Spanjers, van Ommeren, and Zijm consider
closed-loop, two-echelon repairable item systems with repair facilities at a number
of local service centers and at a central location. They use an approximation method
based on a general multi-class marginal distribution analysis algorithm to evaluate
the performance of the system. The performance evaluation results are then used to
find the stock levels that maximize the availability, given a fixed configuration of
machines and servers and a certain budget for storing items.
In Chapter 11, Van Nyen, Bertrand, van Ooijen, and Vandaele present a heuris-
tic that minimizes the relevant costs and satisfies the customer service levels in
multi-product, multi-machine production-inventory systems characterized by job-
shop routings and stochastic arrival, set-up, and processing times. The numerical
results derived from the heuristic are compared against simulation.
In Chapter 12, Van Houtum, Adan, Wessels, and Zijm study a production system
consisting of several parallel machines, where each machine has its own queue and
can produce a particular set of job types. When a job arrives to the system, it joins
the shortest queue among all queues capable of serving that job. Under the assumption
of Poisson arrivals and identical exponential processing times, they derive upper
and lower bounds for the mean waiting time and investigate how the mean waiting
time is affected by the number of common job types that can be produced by
different machines.
Finally, in Chapter 13, Geraghty and Heavey review two approaches that have
been followed in the literature for overcoming the disadvantages of kanban control
in non-repetitive manufacturing environments. The first approach has been con-
cerned with developing new, or combining existing, pull control strategies and the
second approach has focused on combining JIT and MRP. A comparison between a
Production Control Strategy (PCS) from each approach is presented. Also, a com-
parison of the performance of several pull production control strategies in an envi-
ronment with low variability and a light-to-medium demand load is carried out.
The last section (Section IV) includes Chapters 14 and 15 on production plan-
ning and assembly.
In Chapter 14, Axsäter considers a multi-stage assembly network, where a num-
ber of end items must be delivered at certain due dates. The operation times at all
stages are independent stochastic variables. The objective is to choose starting times
for different operations in order to minimize the total expected holding and back-
order costs. An approximate decomposition technique, which is based on repeated
application of the solution of a simpler single-stage problem, is proposed. The per-
formance of the approximate technique is compared to exact results in a numerical
study.
In Chapter 15, Yıldırım, Tan, and Karaesmen study a stochastic, multi-period
production planning and sourcing problem of a manufacturer with a number of plants
and subcontractors with different costs, lead times, and capacities. The demand for
each product in each period is random. They present a methodology for deciding
how much, when, and where to produce, and how much inventory to carry, given
certain service level constraints. The randomness in demand and related probabilistic
service level constraints are integrated in a deterministic mathematical program by
adding a number of additional linear constraints. They evaluate the performance of
their methodology analytically and numerically.
This volume is a reprint of a special issue of OR Spectrum (Vol. 27,
Nos. 2–3) on stochastic models for the design, coordination, and control of
manufacturing systems, with the addition of Chapters 7 and 12, which appeared as
articles in other issues of OR Spectrum. That special issue of OR Spectrum
originated from the 4th Aegean International Conference on Analysis of Manufacturing
Systems, which was held on Samos Island, Greece, on July 1–4, 2003. The purpose
of that issue was not simply to publish the proceedings of the conference. Rather,
it was to put together a select set of rigorously refereed articles, each focusing on a
novel topic. Collected into a single issue, the articles aimed to serve as a useful
reference for manufacturing systems researchers and practitioners, and as reading
material for graduate courses and seminars.
We wish to thank Professor Dr. Hans-Otto Guenther, Managing Editor of OR
Spectrum, and his staff for supporting the special issue of OR Spectrum, for seeing
it through to publication, and for supporting its subsequent reprinting in this volume
with the addition of Chapters 7 and 12.

G. Liberopoulos, University of Thessaly, Greece
C. T. Papadopoulos, Aristotle University of Thessaloniki, Greece
B. Tan, Koç University, Turkey
J. M. Smith, University of Massachusetts, USA
S. B. Gershwin, Massachusetts Institute of Technology, USA
Contents

Section I: Factory Design

Dilemmas in factory design: paradox and paradigm
J. MacGregor Smith . . . . . 3

Section II: Unreliable Production Lines

Lean buffering in serial production lines with non-exponential machines
Emre Enginarlar, Jingshan Li and Semyon M. Meerkov . . . . . 29

Analysis of flow lines with Cox-2-distributed processing times and limited buffer capacity
Stefan Helber . . . . . 55

Performance evaluation of production lines with finite buffer capacity producing two different products
M. Colledani, A. Matta and T. Tolio . . . . . 77

Automated flow lines with shared buffer
A. Matta, M. Runchina and T. Tolio . . . . . 99

Integrated quality and quantity modeling of a production line
Jongyoon Kim and Stanley B. Gershwin . . . . . 121

Stochastic cyclic flow lines with blocking: Markovian models
Young-Doo Lee and Tae-Eog Lee . . . . . 149

Section III: Queueing Network Models of Manufacturing Systems

Performance analysis of multi-server tandem queues with finite buffers and blocking
Marcel van Vuuren, Ivo J. B. F. Adan and Simone A. E. Resing-Sassen . . . . . 169

An analytical method for the performance evaluation of echelon kanban control systems
Stelios Koukoumialos and George Liberopoulos . . . . . 193

Closed loop two-echelon repairable item systems
L. Spanjers, J. C. W. van Ommeren and W. H. M. Zijm . . . . . 223

A heuristic to control integrated multi-product multi-machine production-inventory systems with job shop routings and stochastic arrival, set-up and processing times
P. L. M. van Nyen, J. W. M. Bertrand, H. P. G. van Ooijen and N. J. Vandaele . . . . . 253

Performance analysis of parallel identical machines with a generalized shortest queue arrival mechanism
G. J. Van Houtum, I. J. B. F. Adan, J. Wessels and W. H. M. Zijm . . . . . 289

A review and comparison of hybrid and pull-type production control strategies
John Geraghty and Cathal Heavey . . . . . 307

Section IV: Stochastic Production Planning and Assembly

Planning order releases for an assembly system with random operation times
Sven Axsäter . . . . . 333

A multiperiod stochastic production planning and sourcing problem with service level constraints
Işıl Yıldırım, Barış Tan and Fikri Karaesmen . . . . . 345
Section I: Factory Design
Dilemmas in factory design: paradox and paradigm
J. MacGregor Smith
Department of Mechanical and Industrial Engineering, University of Massachusetts,
Amherst, MA 01003, USA (e-mail: [email protected])

Abstract. The problems of factory design are notorious for their complexity. It
is argued in this paper that factory design problems represent a class of problems
for which there are crucial dilemmas and correspondingly deep-seated underlying
paradoxes. These paradoxes, however, give rise to novel paradigms which can bring
about fresh approaches as well as insights into their solution.

Keywords: Factory design – Dilemmas – Paradox – Paradigm

1 Introduction

The purpose of this paper is to develop a new paradigm for factory design that
integrates much of the theoretical underpinnings of the problems and processes
encountered in the author's experiences with factory design. As a side benefit, many
of the ideas discussed here point towards a new direction along which
manufacturing and industrial engineering professionals might re-align themselves,
since the paradigms that have guided these fields are in need of a new vision and
repair.

1.1 Motivation

The origins of this paper stem from an invitation to give a keynote address at a
conference on the Analysis of Manufacturing Systems¹ where the idea of the
address was to recount the author's philosophy about manufacturing systems design
and in particular an approach to factory design problems.

I would like to thank the referees for their insights and suggestions and for pointing
out some problems in earlier drafts. My approach to factory design has evolved over
the years, and is still evolving; this is largely due to the influence of Professor Horst
Rittel, my professor at the University of California at Berkeley during my formative
undergraduate days, who instilled much of the basis of this philosophy.

¹ 4th Aegean Conference on "The Analysis of Manufacturing Systems", Samos Island,
Greece, July 1–4, 2003
Concurrently with the conference, there appeared a related conundrum on the
email listserv of the Industrial Engineering faculty ([email protected]) about an
"identity" crisis within the industrial engineering community, the direction of the
profession, and, more practically speaking, what fundamental courses should be
taught to students of industrial engineering. It is not the first time this identity
crisis has arisen in IE, nor is the crisis one exclusive to industrial engineers, as it
commonly occurs throughout most professions from time to time. Paradoxically,
all professions have a vested interest in their clients, but cannot be trusted to act in
their clients' best interests, "a conspiracy against the laity" [21, 17].
Since the Factory Design Problem (FDP) is a very important aspect of
manufacturing and industrial engineering, it became obvious that the subject matter
of the keynote address and the crisis in industrial engineering education are two
closely related matters. So, while not attempting to be presumptuous, the resulting
paper was partly a response to this crisis and, more importantly, a demonstration of
the author's philosophy about factory design. The viewpoint and conclusions in the
paper may also apply to the problems of factory planning and control, but the focus
of the present paper is on the FDP.

1.2 Outline of paper

Section 2 of this paper provides necessary background, definitions, and notation on
the problem of factory design. Section 3 describes a case study used to illustrate
many of the ideas within the paper, while Section 4 provides the theoretical
background of the many concepts in the paper. Section 5 describes the implications
for the manufacturing and IE profession, and Section 6 concludes the paper.

2 Background

Many manufacturing and industrial engineering professionals view the FDP as
a complex queueing network, where one has to manufacture or produce a series
of products (1, 2, . . . , n) from different raw materials and possible sources. The
average arrival rate of type j raw material from source k is defined as λjk (j =
1, 2, . . . , J; k = 1, 2, . . . , K). People, machines, manufacturing processes and the
material handling system are necessary to transform the raw materials into finished
goods for shipment to consumers at throughput rates θ1, θ2, . . . , θn. Figure 1 is a
useful caricature of the flow paradigm.

Fig. 1. Factory flow design paradigm (raw-material arrival rates λ11, λ21, . . . , λjk enter
the queueing network model Σ, which delivers product throughputs θ1, θ2, . . . , θn)

The Σ represents the mathematical model
of the queueing network underlying the people, resources, products and their flow
relationships.
The professionals (especially the academics) would like to know the set of
underlying equations Σ (no questions asked) which would allow them to design the
factory to maximize the overall throughput (Θ) of the products and also minimize
the work-in-process (WIP) inside the plant.
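The flow paradigm of Figure 1 can be made concrete with the standard traffic equations of an open queueing network. The sketch below is a minimal illustration, not taken from this chapter: the station count, arrival rates, and routing fractions are hypothetical, and the fixed-point iteration is just one simple way to solve λ = γ + λP for the station flow rates.

```python
# Hypothetical 3-station open network. gamma[j] is the external (raw-material)
# arrival rate into station j; P[i][j] is the fraction of station i's output
# routed to station j (row sums are below 1; the remainder leaves the plant
# as finished goods). The traffic equations lambda = gamma + lambda * P give
# each station's total flow rate, solved here by fixed-point iteration.

def station_flows(gamma, P, iterations=100):
    lam = gamma[:]  # initial guess: external arrivals only
    for _ in range(iterations):
        lam = [gamma[j] + sum(lam[i] * P[i][j] for i in range(len(gamma)))
               for j in range(len(gamma))]
    return lam

gamma = [5.0, 3.0, 0.0]          # parts per hour from outside sources
P = [[0.0, 0.6, 0.2],            # station 1 feeds stations 2 and 3
     [0.0, 0.0, 0.5],            # station 2 feeds station 3
     [0.0, 0.0, 0.0]]            # station 3 only ships finished goods
lam = station_flows(gamma, P)
print(lam)  # total flow through each station: [5.0, 6.0, 4.0]
```

Flow conservation then equates total throughput out of the plant with the total external input, here 8.0 parts per hour; the per-station flows are what one needs to size machines and buffers.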
The desire to find all these equations, or laws [9] as some people would like
to characterize them, is largely attributed to the scientific foundation of Industrial
Engineering education, with its strong physics, chemistry, and mathematics
background. A sterling example of one of these laws is Little's Law, L = λW, which is
an extremely robust, effective tool for calculating numbers of machines, throughput,
and waiting times in queueing processes [9]. What will be shown in the following
is that this scientific approach is deficient. The problems of factory design cannot
be answered with just a scientific background, but need to be augmented with other
knowledge-based skills. The scientific background is necessary but not sufficient
to solve the problem.
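As a small illustration of how a law such as Little's Law is used in practice (the numbers here are hypothetical, not from the paper), a target arrival rate and an observed time in system convert directly into expected WIP:

```python
# Little's Law: L = lambda * W, with lambda in parts/hour and W in hours.
# The station figures below are hypothetical, used only to show the arithmetic.

def littles_law_L(arrival_rate, time_in_system):
    """Average number in system (WIP) from arrival rate and mean sojourn time."""
    return arrival_rate * time_in_system

# 12 parts/hour arrive and each part spends 0.5 h in the station
# (queueing plus processing) on average:
wip = littles_law_L(arrival_rate=12.0, time_in_system=0.5)
print(wip)  # 6.0 parts of WIP on average
```

The same identity can be turned around to estimate waiting time from measured WIP and throughput, which is what makes it such a robust design tool.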
In order to realize this factory flow paradigm, most IE professionals system-
atically define the multiple products (there can be hundreds) and their input rates
and raw material requirements, the constraint relationships with the machines, peo-
ple, resources, and materials handling equipment, and the functional equations for
achieving the WIP and throughput objectives, utilization, cycle time, lateness, etc.
This factory flow paradigm is often realized as a series of well-defined steps or
phases similar to the following top-down approach (see Fig. 2).
This top-down approach is also a hallmark of an operations research (OR)
paradigm typically argued for in OR textbooks found in the Industrial Engineering
curriculum. While this top-down (“waterfall”) [3] paradigm has its merits, mainly
for project management, it will be argued in this paper that other paradigms are
warranted, ones more realistically appropriate for treating FDPs. A key criticism
of the top-down approach is that no feedback loops occur at the detailed stages,
which is clearly unrealistic. A bottom-up approach, on the other hand, is really not
much better, since one has no real overall knowledge of what is being constructed.
One needs a paradigm that is paradoxically top-down and bottom-up at the same
time. Unfortunately, very few individuals are capable of this prescient feat, thus
necessitating the development of new external aids.
It will also be argued later on in this paper, that the recommended paradigm has
strong implications for changes in the profession and in the education of manufac-
turing and industrial engineers.

2.1 Definitions

Before we proceed too far along, it would be good to posit some of the key definitions
and notation utilized throughout the paper [6].
Dilemma: (Late Greek) dilēmmat-, dilēmmatos: an argument presenting two or
more conclusive alternatives against an opponent; a problem involving
a difficult choice; a perplexing predicament.
Fig. 2. Factory design process paradigm:
Step 1.0 Identify Product Classes/Sources
Step 2.0 Product Routing Vectors
Step 3.0 Distance and Flow Matrices
Step 4.0 Topological Network Design (TND) Diagrams
Step 5.0 Optimal TND Alternatives
Step 6.0 Stochastic Flow Matrices
Step 7.0 Evaluation of Alternatives
Step 8.0 Factory Plan Synthesis
Step 9.0 Sensitivity Analysis
Step 10.0 Factory Plan Implementation

Paradox: (Greek) paradoxon, paradoxos: a tenet contrary to received opinion; a
statement that is seemingly contradictory or opposed to common sense.
Paradigm: (Greek) paradeigma, from paradeiknynai, "to show side by side": a
pattern; an outstandingly clear example or archetype (a.k.a. a philosophy).

The notion of a dilemma in Factory Design is that we are often faced with diffi-
cult issues of what to do, and, occasionally, we must select between two alternatives
that are not necessarily desirable.
The notion of paradox is important because it helps frame the seemingly con-
tradictory elements which are contrary to common sense.
Dilemmas give rise to paradoxes, which in turn underlie paradigms for solution.
Paradigm is a particularly appropriate word when one thinks of it as a “pattern”,
since this is often what we employ in resolving design problems because of its
modular structure.
All three of these concepts are crucial underpinnings to what is to follow, and
they form the basis of the general design "philosophy" put forward in this paper.
The fact that these three concepts come down to us from the Greek philosophers
is an indication of their importance.

2.2 Notation

The following notation shall be utilized to aid the discussion:



– ∆:= Dilemma
– χ:= Paradox
– δi := Deontic issue
– εi := Causal or explanatory issue
– ιi := Instrumental issue
– φi := Factual issue
– πi := Planning Issue
– FDP:= Factory Design Problem
– WP:= Wicked Problem
– TP:= Tame Problem
– IBIS:= Issue Based Information System
– NI:= Non-Inferior set of solutions

3 Case study: polymer recycling project

In order to place things in perspective, a case study will be utilized to characterize
the ideas and concepts of the paper. One project, completed eight years ago, stands
out as a compelling example of the ideas in this paper. It was concerned with the
FDP of a polymer re-processing plant in western Massachusetts.

3.1 Problem description

Essentially, this plant represented a manufacturing/warehouse capacity design
problem. The plant maintains a dynamic material handling system which operates
three shifts, 24 hours a day.
The problem as first posed to the factory design team largely revolved around
space capacity and equipment needs since the business was growing and there was
some real concern about the ability of the present site to accommodate future growth
of the business. The business is largely concerned with manufacturing essentially
four different polymer products (PC, PC/ABS, PS, and ABS) and their combinations.
In fact, the unit load of the plant is the 1000-lb gaylord (raw materials and finished
goods) filled with various plastic pellets. As will unfold, forecasting the ability of
the plant to respond to fluctuations in demand over time also became a critical part
of the study.

3.2 Links to paper

Figure 3 illustrates the initial layout of the plant that formed the basis of the layout
and systems model about to be discussed. One can see the 4 × 4 gaylords spread
throughout the facility in Figure 3.
As one can see in the plant, there is little room for expansion and there is a
restricted material handling system where the forklift traffic coming and going
must traverse the same aisles.

Fig. 3. Existing polymer re-processing plant

4 Dilemmas in factory design

The notion of the dilemmas in factory design stems from a seminal paper of Horst
Rittel and Mel Webber [17] on wicked problems. They outline the characteristics
of wicked problems and go on to recount how many planning problems are actu-
ally wicked problems. In fact they argue that there are essentially two classes of
problems:
– Tame Problems (TPs)
– Wicked Problems (WPs)
Tame problems are like puzzles: precisely described, with a finite (or count-
ably infinite) set of solutions, although perhaps extremely difficult to solve.
Problems solved via numerical and combinatorial algorithms can be grouped
in this category. The classes P, N P, N P-Complete, and N P-Hard of computational
complexity theory are very appropriate characterizations for tame problems. Also,
more recently, designing large-scale interacting systems has been shown to be
N P-complete [5].
It will be argued that the N P complexity classification is a useful way of
characterizing TPs. On the other hand, wicked problems are the exact opposite of tame
problems, and while not "evil" in themselves, they present particularly nasty
characteristics which Rittel and Webber feel justly deserve the approbation.

Fig. 4. Wicked problem–tame problem dichotomy (the tame-problem classes N P,
N P-Complete, and N P-Hard sit inside the larger space of wicked problems, WP)

Their wicked problem framework is useful for characterizing the FDP, since the characteristics of
FDPs as shall be argued are similar. Not all IEs or manufacturing engineers might
agree with the equivalence statement, but the equivalence framework, as we shall
argue, will become the basis for the new paradigm.
Very often, IEs utilize algorithmic approaches to solve FDPs, so they become
integral parts of the solution process of factory design problems, but a key question
here is: Can we utilize systematic procedures to solve FDPs?
While no formal classification of WPs has been developed so far, other than what
is depicted in Figure 4, it appears that the distinction between one type of wicked
problem and another can be based on the following three measurable dimensions:

– x:= # Stakeholders (# persons concerned, involved, and affected by the problem)
– y:= # Objectives in the problem {f1, f2, . . . , fp}
– z:= Time frame or planning horizon (in years)

The degree of "wickedness" is correlated with the cardinality of these dimensions.
For example, finding a solution for the disposal of nuclear waste is one of the
most difficult WPs, since the time frame is thousands of years and the consequences
affect millions of people. The reason for selecting these problem dimensions should
become clearer as the paper unfolds.
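To make the three dimensions concrete, a small sketch follows. The (x, y, z) figures and the scoring rule (a simple product of the dimensions) are hypothetical illustrations; the paper itself claims only that wickedness is correlated with the cardinality of the dimensions.

```python
# Illustrative sketch (not from the paper): represent a problem by its three
# wickedness dimensions and rank problems by a crude proxy score. The product
# rule below is a hypothetical choice of scoring function.

def wickedness(x_stakeholders, y_objectives, z_horizon_years):
    return x_stakeholders * y_objectives * z_horizon_years

problems = {
    "nuclear waste disposal": wickedness(1_000_000, 10, 10_000),
    "factory redesign":       wickedness(200, 4, 10),
}
ranked = sorted(problems, key=problems.get, reverse=True)
print(ranked[0])  # nuclear waste disposal dominates on every dimension
```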
Project management is a classic example of a WP. We know that minimizing the
number of dummy activities in a PERT/CPM diagram is actually N P-Complete
[12]; however, the complexity of balancing time, cost, and quality tradeoffs in
scheduling the construction and launch of, for example, the space shuttle is a
very wicked problem. Tame problems and their solutions are often subsets of WPs,
and they have their usefulness, especially in providing arguments to convince people
one way or another on resolving a planning issue, but TPs are in another class
compared to WPs.
Many other researchers have begun to realize the importance and extent of
wicked problems in other professions besides factory design. Some of the literature
on wicked problems is related to public service facility planning [22], government
resource planning within developing countries [19], software engineering design
projects [3], and planning and project scheduling [20].
Unlike TPs, the first characteristic of a wicked problem is that:
10 J. MacGregor Smith

∆1 : There is no definitive problem formulation.

The dilemma argues that factory design problems cannot be written down on a
sheet of paper (like a quadratic equation) and given to someone who can then
go off into a corner and work out the solution. Students are continually drilled with
textbook problems (the author is guilty of this himself), but these are not the real
problems. Recent research on the modularization of design problems has shown that
modularization avoids trade-offs in decision making and often ignores important
interactions between decision choices [5].
If someone states the problem as: “build a new plant” or “remodel the existing
facility”, or “add another storey”, then the solution and the problem are one and
the same! This is antithetical to the scientific paradigm. In fact, the entire edifice
of NP-Completeness problems (i.e. Tame Problems) is critically structured around
the precise problem definition, e.g. 3-satisfiability.
For FDPs, it is important whom you talk with and their worldview because
in the ensuing dialog the solution to the problem and the problem definition will
emerge.
In the case of the polymer recycling plant, when the facility was first examined,
its receiving and shipping areas were co-located in the same area of the plant (see
the lower left-hand corner of Figure 3), which resulted in severe material handling
conflicts with forklift truck movements, accidents, and space utilization problems.
It was obvious that separate receiving and shipping areas were desirable; thus, the
problem was the same as the solution: “re-layout the plant and separate receiving
and shipping.”
Thus, we have the first formal paradox: χ1 := Every formulation of a problem
corresponds to a statement of its solution, and vice versa [14].
This first dilemma of factory design is a most difficult one. One cannot know
a priori the problems inherent in factory design, independent of the client and the
context around which the problem occurs. In essence, the factory design process is
information deficient.
Many “experts” in manufacturing and IE purport to know the answers, yet one
must talk with the owners, the plant manager, the line staff, and many others involved
with the facility, before the problems and their solutions can be identified. As the
paper proceeds, we will postulate the underlying principles of the new paradigm
as Propositions. In fact, the principle underlying the paradigm associated with this
first dilemma and paradox is:
Proposition 1. The FDP design system ≡ Knowledge/Information System.
What is meant here by a knowledge/information system? The knowledge/information
system here is a special type of information system: not just a sophisticated database
system, where one collects data for the sake of collecting data, but one where
data is collected to resolve the planning issues. The planning issues are
the fundamental units within the information system [13]. A related information
system approach based on the first proposition is that of Peter Checkland’s work
[1]; however, the information system and resulting paradigm discussed in this paper
is based upon different concepts and is directly related to the FDP.
Dilemmas in factory design: paradox and paradigm 11

Fig. 5. Planning issue πi (the factual issue φi and the deontic issue δi give rise to πi, with an explanatory issue εi and instrumental issues ι1 and ι2)

What are the building blocks of this knowledge/information system? There
are essentially four categories of knowledge (issues) needed to help formulate the
problem. These fundamental categories of issues are basic to the IBIS [13]:
– Factual issue (φi) := Knowledge of what is, was, or will be the case.
– Deontic issue (δi) := Knowledge of what ought to be or should be the case.
– Explanatory issue (εi) := Knowledge of why something is the case.
– Instrumental issue (ιi) := Knowledge of the conditions and methods under
which the problem can be resolved.

Proposition 2. A planning issue πi is a discrepancy between what is the case (φi)
and what ought to be the case (δi) [15].
The conflict between φi and δi gives rise to πi . Deontic knowledge is critical to
the problem formulation and might be considered as factory planning principles,
or “golden rules.”
The explanatory issues εi describe why the problem occurs, and the instrumental
issues ι1, ι2 describe alternative ways of resolving the πi. At least two alternative
ways of resolving an issue are felt to be important for the problem structure and
its completeness. Figure 5 illustrates the relationship between a factual issue, a
deontic issue, and the explanatory and instrumental issues. Each planning issue should
be comprised of these component parts.
The planning issue structure is a useful paradigm itself of the elements of
problem formulation. It becomes clear how the component parts of a problem
should be defined. It also provides an unambiguous method for defining a problem.
Each planning issue is dynamic but also bounded. A brief example of a planning
issue is derived from the polymer recycling plant.
– Factual Issue (φi ):= The number of accidents and potential conflicts with per-
sonnel in the plant at the receiving and shipping areas is excessive.
– Deontic Issue (δi ):= The number of conflicts between plant personnel and fork-
lift trucks should be minimized.
– Planning Issue (πi ):= How should congestion between forklift trucks and plant
personnel be avoided at the receiving and shipping area?

Fig. 6. Planning issues resolution process (questions and answers drive issues, positions taken, arguments heard, decisions reached, and knowledge gained)

– Explanatory Issue (εi) := There is no clear separation between the forklift
trucks and the plant personnel within the receiving and shipping area.
– Instrumental Issue (ι1) := If space is available, separate receiving and shipping
and design the material handling systems in the plant in a U-shape layout.
– Instrumental Issue (ι2) := If space is unavailable, clearly demarcate the
receiving and shipping areas and the paths of the vehicles and pedestrians.
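The four-part issue structure above lends itself to a simple data representation. The sketch below encodes the polymer-plant example; the class name, field names, and the completeness rule are illustrative assumptions, not part of the IBIS formalism itself:

```python
from dataclasses import dataclass, field

@dataclass
class PlanningIssue:
    """An IBIS planning issue: the discrepancy between what is the case
    (factual, phi) and what ought to be the case (deontic, delta), with
    explanations (epsilon) and alternative resolutions (instrumental, iota)."""
    factual: str                                      # phi_i
    deontic: str                                      # delta_i
    question: str                                     # pi_i, the issue raised
    explanatory: list = field(default_factory=list)   # eps_i: why it occurs
    instrumental: list = field(default_factory=list)  # iota_k: ways to resolve

    def is_complete(self) -> bool:
        # The text argues for an explanation and at least two alternatives.
        return bool(self.explanatory) and len(self.instrumental) >= 2

shipping = PlanningIssue(
    factual="Accidents and conflicts at the co-located receiving/shipping area are excessive.",
    deontic="Conflicts between plant personnel and forklift trucks should be minimized.",
    question="How should congestion between forklifts and personnel be avoided?",
    explanatory=["No clear separation between forklifts and plant personnel."],
    instrumental=["Separate receiving and shipping in a U-shaped layout.",
                  "Demarcate the areas and the vehicle/pedestrian paths."],
)
print(shipping.is_complete())  # prints True
```

A completeness check of this kind makes the structural requirement (at least two instrumental alternatives per issue) operational during issue collection.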

The reason the above are stated as issues is that evidence must be brought
forth to support or refute each issue. People must be convinced
of the case being made. Some issues are easily resolved as questions, while others
may not be so easily resolved. Not everyone might agree with what we mean by
“excessive” traffic in the receiving and shipping area of φi, so some supporting
data may be necessary. Likewise, even the instrumental issues will likely need
supporting evidence, such as sophisticated simulation and queueing models can
provide, to estimate the expected (maximum) volume of forklift traffic, the number of
expected gaylords in the shipping and receiving areas, etc. Why a U-shape layout?
That is certainly arguable. Figure 6 is suggestive of the issue resolution process.
While this approach to problem formulation through the planning issues
paradigm can be seen as well-structured, there can be many planning issues in
factory design, which, unfortunately, leads to the next dilemma.

Fig. 7. IBIS dynamic programming paradigm (stages C1, ..., Cn, each containing planning issues πij)

∆2 : Every factory design problem is symptomatic of every other factory design problem.

The second dilemma underscores the fact that there are many problems nested
together, there is not simply one isolated problem to be solved. The paradox sur-
rounding the second dilemma is that: χ2 := Tackling the problem as formulated
may lead to curing the symptoms of the problem rather than the real problem; you
are never sure you are tackling the right problem at the right level.
One needs to tackle the problems on as high a level as possible. In the poly-
mer recycling project, issues of scheduling, resource configuration and utilization,
quality control, and many others became functionally related to the plant layout
problem. As will be shown, these other issues emerged as critical to the plant lay-
out. The principle needed in the paradigm in response to the paradox of dilemma
#2 is:

Proposition 3. Construct a network of planning issues, an Issue-Based Information System (IBIS).

An IBIS is needed in order to identify, interrelate, and quantify (weights of
importance) the different planning issues within the FDP. Figure 7 illustrates one
realization of an IBIS through a dynamic programming (DP) paradigm.
An IBIS has a number of stages C1 , . . . Cn which serve as useful ways of
organizing the planning issues as they are defined and emerge in the planning
process. Each node within a stage j represents a planning issue πij . The planning
issues represent the states of the DP framework. Within each stage Cj, all π's are
inter-connected as cliques. There can be many links from one πij to another πik, so it
makes the most sense that the data organization would be some type of relational
database. However, depending upon the problem, other ways of organizing the
issues would be possible, such as a simple matrix.

Each Cj represents a stage of the DP paradigm, and each state has a set of
alternative ways of resolving each planning issue πij, labelled as alternative k within
each planning issue, xijk. Transitions between states in adjacent stages would have
an associated cost for transitioning or linking adjacent states. One possible recursive
cost function for an additive or separable resource-constrained problem could be
[8]:

fj(πi, xijk) = c(πi, xijk) + fj+1(xijk)

In general, the recursive cost function need not be additive, yet the additive
situation would be quite appropriate in many resource-constrained IBIS scenarios.
The general recursive cost function relationship would more likely be:

fj(π) = max/min over the alternatives xijk of {fj(π, xijk)}

One can consider the overall cost of resolving a set of planning issues as a
path/tree through the stages and states of the IBIS problem. Each such path repre-
sents a morphological plan solution.
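The backward recursion over stages and states can be sketched directly. In the following minimal example, the stage names, issue names, and transition costs are all hypothetical, and the recursion uses the additive, minimizing form of the cost function:

```python
# A toy IBIS with three stages; each arc carries the cost of resolving an
# issue via one alternative and moving to an issue in the next stage.
stages = ["C1", "C2", "C3"]
issues = {"C1": ["pi11", "pi12"], "C2": ["pi21", "pi22"], "C3": ["pi31"]}

# cost[(stage, issue)][next_issue] = cost of that resolution alternative
cost = {
    ("C1", "pi11"): {"pi21": 4, "pi22": 2},
    ("C1", "pi12"): {"pi21": 1, "pi22": 5},
    ("C2", "pi21"): {"pi31": 3},
    ("C2", "pi22"): {"pi31": 6},
}

f = {pi: 0.0 for pi in issues[stages[-1]]}   # boundary: nothing beyond C3
choice = {}
for j in range(len(stages) - 2, -1, -1):     # sweep backward over stages
    g = {}
    for pi in issues[stages[j]]:
        # f_j(pi) = min over alternatives x of { c(pi, x) + f_{j+1}(x) }
        x_best, v_best = min(
            ((x, c + f[x]) for x, c in cost[(stages[j], pi)].items()),
            key=lambda t: t[1],
        )
        g[pi], choice[pi] = v_best, x_best
    f = g

start = min(f, key=f.get)                    # cheapest issue to start from
plan, total = [start], f[start]
while plan[-1] in choice:                    # reconstruct the cheapest path
    plan.append(choice[plan[-1]])
print(plan, total)  # ['pi12', 'pi21', 'pi31'] 4.0
```

Each reconstructed path is one morphological plan; swapping `min` for `max` handles objectives one wishes to maximize.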


Figure 8 illustrates another IBIS network that was utilized by the author to
approach a resource planning problem at the University of Massachusetts [20]. In
this study there were five categories (stages) of planning issues (22 issues total):

– C1: Client Communication/Ownership
– I2: Information Tracking of Projects
– S3: Scheduling and Control of Projects
– G4: General Project Management
– O5: Outreach to Clients

The IBIS provided a viable framework which resulted in a successful resolution
of the management process of small-scale construction projects. In fact, as we speak,
this management struggle is still ongoing at the University. The planning issues
will simply not go away.
The obvious implications for the manufacturing and IE professionals and their
education is that the design and analysis of information systems are crucial to the
profession. This is in response to dilemmas ∆1 and ∆2 .
Well, let’s argue that these notions of planning issues and information systems
are reasonable; what next?

Fig. 8. University of Massachusetts IBIS project

∆3 : There is no list of permissible operations.

When one plays chess, there are only a finite number of moves to start the game.
In linear programming, one needs a starting feasible solution to begin the process.
In factory design, there is no one single place to start the problem formulation and
solution process.
For the polymer recycling project, we could have visited other polymer pro-
cessing plants, travelled to other locations besides Western Massachusetts, read all
the literature on polymer re-processing, carried out a mail survey, talked with all the
employees, and so on. We should have done all the above, but alas, it was neither
practical nor cost-effective. This dilemma is founded on the following paradox: χ3 := If
one is rational, one should consider the consequences of their actions; however,
one should also consider the consequences of considering the consequences, i.e. if
there is nowhere to start to be rational, one should somehow start earlier [15].
The paradox indicates that a great deal of knowledge about the system under
study is needed to assist the client and the engineers in making decisions about the
FDP. Of course, a logical response to this paradox is the following principle:
Proposition 4. Construct a system representation Σ (analytical or simulation) of
the manufacturing system within which the FDP is situated.
This principle is a very useful one, but the system representation can obviously
be expensive in time to construct. It makes eminent sense in the currently popular
supply-chain business environment, since the more one understands the logistics
and the manufacturing systems and processes, the better. At this point, the system
model Σ becomes an integral
part of the new paradigm.
A discrete-event digital simulation model of the polymer recycling plant was
constructed in order to better understand the manufacturing processes and the sys-
tem as well as the logistics of the product shipments to and from the plant. This

Fig. 9. Final plan for polymer re-processing plant

was felt to be crucial before simply re-laying out the plant and will be shown to be
an extremely fortunate decision.
Figure 9 illustrates the layout plan arrived at, with a U-shaped circulation flow
to eliminate the forklift conflicts from the previous scheme (Fig. 3). Unfortunately,
this was not the end of the story.
Thus, for the Manufacturing and IE professional, system models such as supply-
chain networks, simulation and queueing network models are critically important
to frame the context of the problem. The “systems approach” is still sage advice.
Related to ∆3 is:

∆4 : There is no stopping rule.

In chess, you either win, lose, or draw– game over! In linear programming, either
you find the optimal solution, an unbounded one, or find out that the problem is
infeasible. In factory design, you can always make improvements to the system. As
we saw above, simply arriving at the layout design is not enough. Thus, we have
the following paradox: χ4 := If one is rational, one should also know that every
consequence has a consequence, so once one starts to be rational, one cannot stop;
one can always do better [15].

Fig. 10. Path through IBIS network (stage columns: Client Comm., Information Systems, Scheduling Control, General Process, Outsource)

In fact, another paradox which interrelates ∆3 and ∆4 is: χ5 := One cannot
start to be rational and, consequently, one cannot stop [15].
The final step in generating plans for FDPs here is that in factory design and
in most wicked problems, time, resources, and the finances involved indicate that
one must terminate the design process and arrive at a final plan.
In the context of the IBIS network (see Fig. 10), the highlighted circles illustrate
the selected path/plan through the IBIS issues which is actually the path that was
taken for the University of Massachusetts project. This path included the following
prescient issues, which were used to formulate the ultimate strategy (and problem!)
for solution:
– C31 : There is no customer feedback loop.
– I32 : Small construction projects are not as well managed as larger capital
projects.
– S33 : Cycle times for small construction projects are not satisfactory.
– G34 : There is no dedicated professional staff assigned to small projects.
– O35 : Outside private contractors (rather than University personnel) do not have
access to as-built drawings of University facilities.
Given the resources, time, and financial constraints, this selected path through the
IBIS represented a reasonable morphological plan solution.
Also, the remaining issue network does not disappear once the final plan is
agreed upon. This is a realistic assessment of the planning process and is also
related to the next dilemma.

∆5 : There are many alternative explanations for a planning issue.

As one can argue, there are many explanations for each planning issue, and thus,
there are many potential solutions, not just one. Refer to Figure 11 for an illustration
of this process.

Fig. 11. Many explanations possible for πi (each explanatory issue εij spawns its own instrumental alternatives ιjk)

The paradox surrounding this dilemma is: χ5 := People need to choose one
solution as a “best” solution; but, unfortunately, there are many potential solutions,
with correspondingly difficult tradeoffs.
In response to this situation, one needs much help to generate innovative so-
lutions to the underlying FDP problems. Layout planning algorithms were used
in the polymer processing plant to help come to a solution to the layout problem
and also were seen as vehicles to resolve issues in the layout problem, not as ends
in themselves. Besides using combinatorial optimization algorithms, one needs to
generate a special set of solutions, in fact, the paradigmatic principle which ∆5 is
based upon is closely related to the next dilemma both in spirit and in practice.

∆6 : There is no single criterion for correctness.

In most TPs, there are objective functions which clearly demarcate feasible from
optimal solutions. The gap between linear and nonlinear programming TPs can be
quite huge. In wicked problems, there are multiple objective functions, not only
linear and nonlinear ones. Paradoxically, in factory design we have:
χ6 := Solutions are either good or bad, not right or wrong (true or false). There are
multiple criteria embedded within each planning issue.

∆5 and ∆6 are closely related since one of the reasons why there are so many
solutions is that there are multiple objectives in an FDP. Thus, we need to generate a
Non-Inferior (NI) set of solutions, and the notion of optimality becomes spurious
because it only makes sense in a single-objective environment. It is very rare that an
FDP has only one objective. In another project we worked on, the project manager
gave out the following daunting list of objectives before we started our project:
Optimize product flow in order to:
– Reduce project costs;
– Reduce WIP investment;
– Increase inventory turns;
– Reduce scrap and rework;
– Respond more quickly to customer needs;
– Improve response time to quality problems;
– Improve housekeeping;
– Better utilize floor space;
– Improve safety;
– Eliminate forklift trucks.
Thus, we have the following:
Proposition 5. Generate a Non-inferior (NI) set of Solutions based upon the mul-
tiple objectives/criteria {f1 (x), f2 (x), . . . , fp (x)} involved in the FDP.
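As a sketch of what Proposition 5 asks for, the following filters a finite list of candidate plans down to its non-inferior set using pairwise dominance; the three objectives (e.g. cost, WIP, response time) and the candidate vectors are purely hypothetical, and all objectives are taken to be minimized:

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_inferior(plans):
    # keep a plan only if no other distinct plan dominates it
    return [p for p in plans if not any(dominates(q, p) for q in plans if q != p)]

plans = [(10, 5, 7), (8, 6, 7), (9, 9, 9), (8, 6, 6)]
print(non_inferior(plans))  # [(10, 5, 7), (8, 6, 6)]
```

The surviving plans are exactly the ones among which the stakeholders must trade off; no single "optimal" plan exists once there is more than one objective.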
The implications of ∆5 and ∆6 for the manufacturing and IE profession and
curriculum are that multi-criteria and multi-objective programming are essential
methodological concepts and algorithmic tools in manufacturing systems and IE.
MCDM concepts and methodologies have slowly been introduced into IE
curricula, which is a very positive sign. Related to the last dilemma is the fact that:

∆7 : There is no immediate or ultimate test of a solution.

Mathematical programming models, analytical stochastic tools, and simulation
models become very important for arguing why resolving a certain issue in a certain
way should be carried out. Thus, the systems model suggested in ∆3 is critical for
resolving ∆7. The paradox here is: χ7 := Unlike chess or solving an equation system,
there is no immediate or ultimate test of a solution, because there are dynamic
consequences over time, i.e. a great deal of uncertainty.
∆7 is closely related to ∆4. Both simulation and analytical stochastic and
dynamic models are necessary.

∆8 : Every factory design problem is a one-shot operation.

In factory design problems, one doesn’t get a second chance. One can play chess
or solitaire many times over. Solving mathematical programming problems on
one computer or a distributed computer network is routine. Markovian queueing
networks can be run forwards or backwards in time, and this affords their decomposability.
The paradox is that χ8 := FDPs are not time reversible. There is no
trial and error with factory design problems, no experimentation: you cannot build
a plant, tear it down, and rebuild it without significant consequences.
This dilemma and paradox are very troubling because once the factory design
project goes to the construction phase, there is no turning back. In many scientific
disciplines, repeated experimentation to test an hypothesis is routine and accepted
practice because the costs and consequences are justified. The principle relating
∆7 and ∆8 is:
Proposition 6. Dynamic Models Σ(t) are needed for FDPs.
For the manufacturing and IE profession, simulation modelling is accepted
practice and with good reason. Analytical system models with queueing networks
are also becoming more important and many of these analytical tools are often used
in addition to simulation.
The polymer recycling project is most appropriate as an illustration of these
dilemmas at this stage. In order to test our final factory design layout, we ran the
simulation model and calculated the number of gaylords in the warehouse as a
function of variations in the input demand λi, ∀i, from 0% to 20%. Figure 12
illustrates the results of the simulation runs for the total number of raw material
gaylords possible on the y-axis vs. the input demand on the x-axis.
The first 3 columns of Figure 12 illustrate the number of raw material gaylords
as a function of input demand. Thus, as one can see, the initial design of the plant
was fairly robust.
However, Figure 13 revealed that as the input volume ramped up in the plant to
120% (3rd column), a serious problem arose with one of the key resources because
at 120% of input demand the minimum raw material input volume went negative
by 670 gaylords. Essentially, the plant input-processing of raw materials basically
shut down. We needed to find out which resource was the bottleneck.
After a detailed analysis of the simulation model outputs, the third column of
Figure 14 revealed that the auger blender was operating at 100% capacity and
could not handle any more input. The auger blender was the bottleneck. Thus,
if the input demand were to grow by more than 20% over the current demand,
it became obvious that a minimum of 2 auger blenders were needed as opposed
to only one.
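The bottleneck effect can be illustrated with a back-of-the-envelope utilization check of the kind the simulation made vivid; the arrival rate and blending time below are hypothetical, chosen only so that a single blender saturates near 120% of current demand:

```python
# Utilization rho = (arrival rate x blending time) / (number of blenders);
# when rho reaches 1, the input side of the plant can no longer keep up.
base_rate = 10.0    # gaylords/hour arriving at current (100%) demand (hypothetical)
blend_time = 0.084  # hours of blending per gaylord (hypothetical)

for scale in (1.0, 1.1, 1.2):
    for m in (1, 2):
        rho = scale * base_rate * blend_time / m
        flag = "bottleneck" if rho >= 1.0 else "ok"
        print(f"demand {scale:4.0%}  blenders={m}  rho={rho:.2f}  {flag}")
```

With these numbers, one blender is stable up to 110% of demand but saturates at 120%, while two blenders keep the utilization comfortably below 1 at all three levels, mirroring the simulation's finding.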

Fig. 12. Total number of raw material gaylords



Fig. 13. Average and minimum raw material capacity

Fig. 14. Blender utilizations vs. input demand

In subsequent runs of the simulation model, 2 auger blenders were utilized, so
that in viewing the fourth column in Figures 12, 13, and 14, the output statistics
include 2 auger blenders operating within the plant. Finally, Figure 15 illustrates
that with 2 auger blenders, the total capacity of the plant (# of gaylords including
raw materials and finished goods in the revised layout) is acceptable for the given
input levels of input demand.
Additional runs of the simulation model revealed that if future input demand
were to increase beyond 20%, four extruders rather than the current three would be
needed to handle the demand. Thus, the simulation model became an invaluable
tool to identify the shifting bottlenecks and forecast the configuration of resources
needed within the plant as demand increased over time. The next dilemma is very
troubling for academics, because it argues that:

Fig. 15. Final total # of gaylords warehouse capacity

∆9 : Every factory design problem is unique.

In academia, one learns general principles (deontic knowledge); however, in practice,
these general principles must be tempered with the surrounding context, the
client, the ever-changing problem requirements, and uncertainty in modelling. With
every new FDP, one must start over again. The paradox is that: χ9 := General knowledge
and rules are very limited. You cannot learn for the next time. One cannot
easily use strategies that have worked in the past and expect that they will work in
the future [15].
Even with all the detailed simulation models and understanding of the plant
painstakingly done, when it came to examining the relocation of the polymer pro-
cessing plant two years after the study, everything had to be done all over again,
because the site was different, the existing buildings were not the same, the input
volume had changed, etc.
Certainly one might argue that experienced people have special knowledge of
the issues surrounding a FDP, but there is no guarantee, even if one knows the
issues, that the solutions used in the past to resolve them will work in the future.
Proposition 7. You should never decide too early the nature of the solution and
whether or not an old solution can be used in a new context [15].
Finally, we have the last dilemma:

∆10 : We have no right to be wrong.

This is also very challenging for professionals as well as academics, because the
principles of scientific research can be compromised. Science can accept or refute
an hypothesis, mathematicians can disprove conjectures, but running a business
cannot accept failure. Compromise is essential. The cynical remarks by George
Bernard Shaw [21] mentioned at the beginning of this paper underlie the moral
dilemma captured here. The paradox surrounding this last dilemma
is that: χ10 := Design cannot be carried out in solitary confinement; the FDP design
process is democratic. The final principle summarizes our overall approach to FDP:

Proposition 8. Solving FDPs is an argumentative and dynamic process concerned
with identifying, explaining, and resolving the planning issues.
This last principle links back to ∆1 , since the problem formulation process
must start with inquiries and issues, and thus an argumentative, dynamic process
through an IBIS is critical to the entire FDP process.

5 Implications for the profession and the curriculum

To briefly summarize and emphasize the importance of the preceding discussion,
the ten different dilemmas are re-presented below:
∆1 : There is no definitive problem formulation.
∆2 : Every problem is symptomatic of every other problem.
∆3 : There is no list of permissible operations.
∆4 : There is no stopping rule.
∆5 : There are many alternative explanations for a planning issue.
∆6 : There is no single criterion for correctness.
∆7 : There is no immediate or ultimate test of a solution.
∆8 : Every factory design problem is a one-shot operation.
∆9 : Every factory design problem is unique.
∆10 : We have no right to be wrong.

The elemental implications for the manufacturing and IE profession are probably
best described in a summary implication diagram centered around the dilemmas

Fig. 16. Final IBIS paradigm (the IBIS integrating ∆1–∆10 with the system models Σ and Σ(t))

and the IBIS which must integrate them and the models necessary to resolve the
issues (see Fig. 16).
The IBIS is necessary to frame ∆1 and to interrelate the different issues and
problems spawned by ∆2. A systems model Σ is necessary for ∆3 and ∆4. Generating
ideas and evaluating them, as captured by ∆5 and ∆6, must rely on effective algorithmic
tools, but these must be tempered with a cognizance of the multiple objectives and
criteria involved so that effective tradeoffs can be made. A stochastic/dynamic
model Σ(t) is necessary to address the variability, prediction, and control issues
surrounding ∆7 , ∆8 , ∆9 . Indeed, the degree of uncertainty in most FDPs makes
this last stage very challenging. Finally, the IBIS needs to be an open and democratic
system that links all aspects of the FDP process.
Perhaps the weakest element in most manufacturing and IE curriculums, at least
from the perspectives argued in this paper, is the lack of adequate exposure to FDPs
as Wicked Problems.
Design problems within academia with real clients are most desirable; if this is
not possible, projects derived from a real-world setting with realistic constraints
and expectations should be pursued. In a very positive sense, many schools
have semester or year-long senior design projects which can capture this aspect
of the FDP problem. An interesting development in Engineering education is the
Conceiving-Designing-Implementing-Operating real-world systems and products
(CDIO) collaborative (http://www.cdio.org/), which underscores much of what has
been argued in this paper. It is oriented to all of Engineering rather than just
Industrial and Manufacturing Engineering, but its philosophy is similar. However,
it does not appear to rely on an IBIS approach, which, as argued in this paper, is
very critical to success in resolving real-world problems.
Problem formulation and structuring for WPs are very difficult topics to treat
and teach, but the IBIS framework is something which has clear paradigmatic
and teachable elements. Of course, how these elements are put together into the
curriculum remains the real wicked problem.

6 Summary and conclusions

The underlying dilemmas, paradoxes, and possible paradigms of factory design
have been expounded upon. All these concepts are closely intertwined, and it is
hoped that illuminating the relationship between these elements will shed some
light on possible approaches to FDPs. An IBIS is proposed to be the vehicle for
structuring the design process for FDPs. Also, as a side benefit, possible changes
to the manufacturing and IE curriculums have been discussed, since FDPs pose a
microcosm and synthesis of many of the activities manufacturing and IEs profess.

References

1. Checkland P (1984) Rethinking a systems approach. In: Tomlinson R, Kiss I (eds) Re-
thinking the process of operational research and systems analysis, pp 43–65. Pergamon
Press, New York
2. Cook SA (1971) The complexity of theorem-proving procedures. Proc. of the Third
ACM Symposium on Theory of Computing, pp 4–18
3. DeGrace P, Stahl L (1990) Wicked problems, righteous solutions. Yourdon Press Com-
puting Series, Upper Saddle River, NJ
4. Dixon JR, Poli C (1995) Engineering design and design for manufacturing. Field Stone,
Conway, MA
5. Ethiraj SK, Levinthal D (2004) Modularity and innovation in complex systems. Man-
agement Science 50(2): 159–173
6. Funk & Wagnalls (1968) Standard college dictionary. Harcourt, Brace and World, New
York
7. Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of
NP-completeness. Freeman, San Francisco
8. Hillier F, Lieberman G (2001) Introduction to operations research. McGraw-Hill, New
York
9. Hopp W, Spearman M (1996) Factory physics. McGraw-Hill, New York
10. Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher
JW (eds) Complexity of computer computations. Plenum Press, New York
11. Karp RM (1975) On the computational complexity of combinatorial problems. Net-
works 5: 45–68
12. Krishnamoorthy MS, Deo N (1979) Complexity of minimum-dummy-activities prob-
lem in a PERT network. Networks 9: 189–194
13. Kunz W, Rittel H (1970) Issues as elements of information systems. Institute for Urban
and Regional Development, University of California, Berkeley, Working Paper No. 131
14. Rittel H (1968) Lecture notes for arch, vol 130. Unpublished, Department of Architec-
ture, University of California, Berkeley, CA
15. Rittel H (1972a) On the planning crisis: systems analysis of the ‘first and second
generations’. Bedrifts Okonomen 8: 390–396
16. Rittel H (1972b) Structure and usefulness of planning information systems. Bedrifts
Okonomen (8): 398–401
17. Rittel H, Webber M (1973) Dilemmas in a general theory of planning. Policy Sciences
4: 155–167
18. Rittel H (1975) On the planning crisis: systems analysis of the first and second gen-
erations. Reprinted from: Bedrifts ØKonmen, No. 8, October 1972; Reprint No. 107,
Berkeley, Institute of Urban and Regional Development
19. Roberts N (2001) Coping with wicked problems: the case of Afghanistan. Learning from
International Public Management Forum, vol 11B, pp 353–375. Elsevier, Amsterdam
20. Robinson D, et al. (1999) A change proposal for small-scale renovation and construction
projects. Project Team Report. Internal University of Massachusetts Report. Campus
Committee for Organization Restructuring, University of Massachusetts, Amherst, MA
21. Shaw GB (1946) The doctors dilemma. Penguin, New York
22. Smith J MacGregor, Larson RJ, MacGilvary DF (1976) Trial court facility. National
Clearinghouse for Criminal Justice Planning and Architecture, Monograph B5. In:
Guidelines for the planning and design of state court programs and facilities. University
of Illinois, Champaign, IL
Section II: Unreliable Production Lines
Lean buffering in serial production lines
with non-exponential machines
Emre Enginarlar¹, Jingshan Li², and Semyon M. Meerkov³

¹ Decision Applications Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
² Manufacturing Systems Research Laboratory, GM Research and Development Center, Warren, MI 48090-9055, USA
³ Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122, USA (e-mail: [email protected])
Abstract. In this paper, lean buffering (i.e., the smallest level of buffering neces-
sary and sufficient to ensure the desired production rate of a manufacturing system)
is analyzed for the case of serial lines with machines having Weibull, gamma, and
log-normal distributions of up- and downtime. The results obtained show that: (1)
the lean level of buffering is not very sensitive to the type of up- and downtime dis-
tributions and depends mainly on their coefficients of variation, CVup and CVdown ;
(2) the lean level of buffering is more sensitive to CVdown than to CVup but the
difference in sensitivities is not too large (typically, within 20%). Based on these
observations, an empirical law for calculating the lean level of buffering as a func-
tion of machine efficiency, line efficiency, the number of machines in the system,
and CVup and CVdown is introduced. It leads to a reduction of lean buffering by a
factor of up to 4, as compared with that calculated using the exponential assump-
tion. It is conjectured that this empirical law holds for any unimodal distribution of
up- and downtime, provided that CVup and CVdown are less than 1.

Keywords: Lean production systems – Serial lines – Non-exponential machine reliability model – Coefficients of variation – Empirical law

1 Introduction

1.1 Goal of the study

The smallest buffer capacity, which is necessary and sufficient to achieve the desired
throughput of a production system, is referred to as lean buffering. In (Enginarlar
et al., 2002, 2003a), the problem of lean buffering was analyzed for the case of

serial production lines with exponential machines, i.e., the machines having up-
and downtime distributed exponentially. The development was carried out in terms
of normalized buffer capacity and production system efficiency. The normalized
buffer capacity was introduced as
    k = N / Tdown ,                                                        (1)
where N denoted the capacity of each buffer and Tdown the average downtime of
each machine in units of cycle time (i.e., the time necessary to process one part
by a machine). Parameter k was referred to as the Level of Buffering (LB). The
production line efficiency was quantified as
    E = PRk / PR∞ ,                                                        (2)
where PRk and PR∞ represented the production rate of the line (i.e., the average number of parts produced by the last machine per cycle time) with LB equal to k and infinity, respectively. The smallest k, which ensured the desired E, was denoted
as kE and referred to as the Lean Level of Buffering (LLB).
Using parameterizations (1) and (2), Enginarlar et al. (2002, 2003a) derived
closed formulas for kE as a function of system characteristics. For instance, in the
case of two-machine lines, it was shown that (Enginarlar et al., 2002)

    kE^exp = 2e(E − e) / (1 − E),   if e < E,
           = 0,                     otherwise.                             (3)
Here the superscript exp indicates that the machines have exponentially distributed
up- and downtime, and e denotes machine efficiency in isolation, i.e.,
    e = Tup / (Tup + Tdown),                                               (4)
where Tup is the average uptime in units of cycle time. For the case of M > 2-machine serial lines, the following formula had been derived (Enginarlar et al., 2003a):

    kE^exp(M ≥ 3) = [ e(1 − Q)(eQ + 1 − e)(eQ + 2 − 2e)(2 − Q) / ( Q(2e − 2eQ + eQ² + Q − 2) ) ]
                    × ln[ (E − eE + eEQ − 1 + e − 2eQ + eQ² + Q) / ( (1 − e − Q + eQ)(E − 1) ) ],
                                                   if e < E^{1/(M−1)},
                  = 0,                             otherwise,              (5)

where

    Q = 1 − E^{[1 + ((M−3)/(M−1))^{M/4}]/2}
        + ( E^{[1 + ((M−3)/(M−1))^{M/4}]/2} − E^{(M−2)/(M−1)} ) exp( −(E^{1/(M−1)} − e)/(1 − E) ).   (6)

This formula is exact for M = 3 and approximate for M > 3.
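For illustration, formulas (3) and (5), (6) are straightforward to evaluate numerically. The sketch below is an editorial addition, not part of the original paper; it simply transcribes the formulas as printed above (the function name and argument order are ours):

```python
import math

def k_exp(M, E, e):
    """Lean Level of Buffering k_E^exp for a line of M identical exponential
    machines, per Eq. (3) for M = 2 and Eqs. (5)-(6) for M >= 3."""
    if M == 2:
        # Eq. (3): buffering is needed only when isolation efficiency e < E
        return 2 * e * (E - e) / (1 - E) if e < E else 0.0
    if e >= E ** (1 / (M - 1)):
        return 0.0
    # Eq. (6); note ((M-3)/(M-1))^(M/4) vanishes at M = 3, giving Q = 1 - sqrt(E)
    a = (1 + ((M - 3) / (M - 1)) ** (M / 4)) / 2
    Q = (1 - E ** a
         + (E ** a - E ** ((M - 2) / (M - 1)))
         * math.exp(-(E ** (1 / (M - 1)) - e) / (1 - E)))
    # Eq. (5): rational prefactor times a logarithmic term
    pre = (e * (1 - Q) * (e * Q + 1 - e) * (e * Q + 2 - 2 * e) * (2 - Q)
           / (Q * (2 * e - 2 * e * Q + e * Q ** 2 + Q - 2)))
    return pre * math.log(
        (E - e * E + e * E * Q - 1 + e - 2 * e * Q + e * Q ** 2 + Q)
        / ((1 - e - Q + e * Q) * (E - 1)))
```

For example, k_exp(2, 0.95, 0.9) ≈ 1.8: with e = 0.9, each buffer must hold about 1.8 average downtimes to reach 95% of PR∞.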


Initial results on lean buffering for non-exponential machines have been re-
ported in (Enginarlar et al., 2002). Two distributions of up- and downtime have
been considered (Rayleigh and Erlang). It has been shown that LLB for these
cases is smaller than that for the exponential case. However, (Enginarlar et al.,
2002) did not provide a sufficiently complete characterization of lean buffering in
non-exponential production systems. In particular, it did not quantify how different
types of up- and downtime distributions affect LLB and did not investigate relative
effects of uptime vs. downtime on LLB.
The goal of this paper is to provide a method for selecting LLB in serial lines
with non-exponential machines. We consider Weibull, gamma, and log-normal
reliability models under various assumptions on their parameters. This allows us to
place their coefficients of variation at will and study LLB as a function of up- and downtime variability. Moreover, since each of these distributions is defined by two parameters, selecting them appropriately allows us to analyze lean buffering for 26 different shapes of density functions, ranging from almost a delta-function to almost uniform. This analysis quantifies both the influence of distribution shape on LLB and the effects of up- and downtime on LLB. Based on these results,
we develop a method for selecting LLB in serial lines with Weibull, gamma, and
log-normal reliability characteristics and conjecture that the same method can be
used for selecting LLB in serial lines with arbitrary unimodal distributions of up-
and downtime.

1.2 Motivation for considering non-exponential machines

The case of non-exponential machines is important for at least two reasons:


First, in practice the machines often have up- and downtime distributed non-
exponentially. As the empirical evidence (Inman, 1999) indicates, the coefficients
of variation, CVup and CVdown of these random variables are often less than 1; thus,
the distributions cannot be exponential. Therefore, an analytical characterization
of kE for non-exponential machines is of theoretical importance.
Second, such a characterization is of practical importance as well. Indeed, it can be expected that kE^exp is the upper bound of kE for CV < 1 and, moreover, kE might be substantially smaller than kE^exp. This implies that a smaller buffer capacity
is necessary to achieve the desired line efficiency E when the machines are non-
exponential. Thus, selecting LLB based on realistic, non-exponential reliability
characteristics would lead to increased leanness of production systems.

1.3 Difficulties in studying the non-exponential case

Analysis of lean buffering in serial production lines with non-exponential machines


is complicated, as compared with the exponential case, by the reasons outlined in
Table 1 . Especially damaging is the first one, which practically precludes analytical
investigation. The other reasons lead to a combinatorially increasing number of
cases to be investigated. In this work, we partially overcome these difficulties by

Table 1. Difficulties of the non-exponential case as compared with the exponential one

Exponential case                                     Non-exponential case
Analytical methods for evaluating PR                 No analytical methods for evaluating PR
are available.                                       are available.
Machine up- and downtimes are distributed            Machine up- and downtimes may have
identically (i.e., exponentially).                   different distributions.
Coefficients of variation of machine up- and         Coefficients of variation of machine up- and
downtimes are identical and equal to 1.              downtimes may take arbitrary positive values
                                                     and may be non-identical.
All machines in the system have the same type        Each machine in the system may have
of up- and downtime distributions                    different types of up- and downtime
(i.e., exponential).                                 distributions.

using numerical simulations and by restricting the number of distributions and


coefficients of variation analyzed.

1.4 Related literature

The majority of quantitative results on buffer capacity allocation in serial produc-


tion lines address the case of exponential or geometric machines (Buzacott, 1967;
Caramanis, 1987; Conway et al., 1988; Smith and Daskalaki, 1988; Jafari and
Shanthikumar, 1989; Park, 1993; Seong et al., 1995; Gershwin and Schor, 2000).
Just a few numerical/empirical studies are devoted to the non-exponential case.
Specifically, two-stage coaxian type completion time distributions are considered
by Altiok and Stidham (1983), Chow (1987), Hillier and So (1991a,b), and the
effects of log-normal processing times are analyzed by Powell (1994), Powell and
Pyke (1998), Harris and Powell (1999). These papers consider lines with reliable
machines having random processing time. Another approach is to develop methods
to extend the results obtained for such cases to unreliable machines with determinis-
tic processing time (Tempelmeier, 2003). Phase-type distributions to model random
processing time and reliability characteristics are analyzed by Altiok (1985, 1989),
Altiok and Ranjan (1989), Yamashita and Altiok (1998), but the resulting methods
are computationally intensive and can be used only for short lines with small buffers
(e.g., two-machine lines with buffers of capacity less than six). Finally, as it was
mentioned in the Introduction, initial results on lean level of buffering in serial lines
with Rayleigh and Erlang machines have been reported in (Enginarlar et al., 2002).

1.5 Contributions of this paper

The main results derived in this paper are as follows:


– LLB is not very sensitive to the type of up- and downtime distributions and
depends mostly on their coefficients of variation (CVup and CVdown ).
– LLB is more sensitive to CVdown than to CVup , but this difference in sensi-
tivities is not too large (typically, within 20%).
– In serial lines with M machines having Weibull, gamma, and log-normal dis-
tributions of up- and downtime with CVup and CVdown less than 1, LLB can
be selected using the following upper bound:
    kE(M, E, e, CVup, CVdown) ≤ [ (max{0.25, CVup} + max{0.25, CVdown}) / 2 ] · kE^exp(M, E, e),   (7)

where kE^exp is given by (5), (6). This bound is referred to as the empirical
law. It is conjectured that this bound holds for all unimodal up- and downtime
distributions with CVup < 1 and CVdown < 1.
– Although for some values of CVup and CVdown , bound (7) may not be too tight,
it still leads to a reduction of lean buffering by a factor of up to 4, as compared
to LLB based on the exponential assumption.
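Once kE^exp is available, applying the empirical law (7) is a one-liner, and converting the resulting LLB back into a physical buffer capacity uses N = k · Tdown from (1). The sketch below is an editorial addition (function names are ours):

```python
import math

def lean_buffer_bound(k_e_exp, cv_up, cv_down):
    # Empirical law (7): clip each CV at 0.25 from below, average, and
    # scale the exponential-case LLB. Stated for CV_up, CV_down < 1.
    return 0.5 * (max(0.25, cv_up) + max(0.25, cv_down)) * k_e_exp

def buffer_capacity(k, t_down):
    # Convert a level of buffering k into buffer slots: N = k * Tdown.
    return math.ceil(k * t_down)
```

For instance, with kE^exp = 4 and CVup = CVdown = 0.25, the bound is 0.25 · 4 = 1, i.e., the four-fold reduction mentioned above.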

1.6 Paper organization

In Section 2, the model of the production system under consideration is introduced


and the problems addressed are formulated. Section 3 describes the approach of
this study. Sections 4 and 5 present the main results pertaining, respectively, to
systems with machines having identical and non-identical coefficients of variation
of up- and downtime. In Section 6, serial lines with machines having arbitrary, i.e.,
general, reliability models are discussed. Finally, in Section 7, the conclusions are
formulated.

2 Model and problem formulation

2.1 Model

The block diagram of the production system considered in this work is shown
in Figure 1, where the circles represent the machines and the rectangles are the
buffers. Assumptions on the machines and buffers, described below, are similar to
those of (Enginarlar et al., 2003a) with the only difference that up- and downtime
distributions are not exponential. Specifically, these assumptions are:
(i) Each machine mi , i = 1, . . . , M , has two states: up and down. When up, the
machine is capable of processing one part per cycle time; when down, no production
takes place. The cycle times of all machines are the same.

m1 b1 m2 b2 m M-2 b M-2 m M-1 b M-1 mM

Fig. 1. Serial production line

(ii) The up- and downtime of each machine are random variables measured in units
of the cycle time. In other words, uptime (respectively, downtime) of length t ≥ 0
implies that the machine is up (respectively, down) during t cycle times. The up-
and downtime are distributed according to one of the following probability density
functions, referred to as reliability models:
(a) Weibull, i.e.,

    fup^W(t) = p^P P t^{P−1} e^{−(pt)^P},
    fdown^W(t) = r^R R t^{R−1} e^{−(rt)^R},                                (8)
where fup^W(t) and fdown^W(t) are the probability density functions of up- and downtime, respectively, and (p, P) and (r, R) are their parameters. (Here, and in the subsequent distributions, the parameters are positive real numbers.) These distributions are denoted as W(p, P) and W(r, R), respectively.
(b) Gamma, i.e.,

    fup^g(t) = p e^{−pt} (pt)^{P−1} / Γ(P),
    fdown^g(t) = r e^{−rt} (rt)^{R−1} / Γ(R),                              (9)

where Γ(x) is the gamma function, Γ(x) = ∫₀^∞ s^{x−1} e^{−s} ds. These distributions are denoted as g(p, P) and g(r, R), respectively.
(c) Log-normal, i.e.,

    fup^LN(t) = ( 1 / (√(2π) P t) ) e^{−(ln(t) − p)² / (2P²)},
    fdown^LN(t) = ( 1 / (√(2π) R t) ) e^{−(ln(t) − r)² / (2R²)}.           (10)

We denote these distributions as LN(p, P) and LN(r, R), respectively.
The expected values, variances, and coefficients of variation of distributions
(8)–(10) are given in Table 2.
(iii) The parameters of distributions (8)–(10) are selected so that the machine efficiencies, i.e.,

    e = Tup / (Tup + Tdown),                                               (11)

and, moreover, Tup, Tdown, CVup, and CVdown are identical for all machines and all reliability models, i.e.,

    Tup = p^{−1} Γ(1 + 1/P)       (Weibull)
        = P/p                     (gamma)
        = e^{p + P²/2}            (log-normal);

    Tdown = r^{−1} Γ(1 + 1/R)     (Weibull)
          = R/r                   (gamma)
          = e^{r + R²/2}          (log-normal);

    CVup = √(Γ(1 + 2/P) − Γ²(1 + 1/P)) / Γ(1 + 1/P)     (Weibull)
         = 1/√P                                         (gamma)
         = √(e^{P²} − 1)                                (log-normal);

    CVdown = √(Γ(1 + 2/R) − Γ²(1 + 1/R)) / Γ(1 + 1/R)   (Weibull)
           = 1/√R                                       (gamma)
           = √(e^{R²} − 1)                              (log-normal).

Table 2. Expected value, variance, and coefficient of variation of up- and downtime distributions considered

          Gamma    Weibull                                        Log-normal
Tup       P/p      p^{−1} Γ(1 + 1/P)                              e^{p + P²/2}
Tdown     R/r      r^{−1} Γ(1 + 1/R)                              e^{r + R²/2}
σ²up      P/p²     p^{−2} [Γ(1 + 2/P) − Γ²(1 + 1/P)]              e^{2p + P²} (e^{P²} − 1)
σ²down    R/r²     r^{−2} [Γ(1 + 2/R) − Γ²(1 + 1/R)]              e^{2r + R²} (e^{R²} − 1)
CVup      1/√P     √(Γ(1 + 2/P) − Γ²(1 + 1/P)) / Γ(1 + 1/P)       √(e^{P²} − 1)
CVdown    1/√R     √(Γ(1 + 2/R) − Γ²(1 + 1/R)) / Γ(1 + 1/R)       √(e^{R²} − 1)

(iv) Buffer bi , i = 1, . . . , M − 1 is of capacity 0 ≤ N ≤ ∞.


(v) Machine mi , i = 2, . . . , M , is starved at time t if it is up at time t, buffer bi−1 is
empty at time t and mi−1 does not place any work in this buffer at time t. Machine
m1 cannot be starved.
(vi) Machine mi , i = 1, . . . , M − 1, is blocked at time t if it is up at time t, buffer bi
is full at time t and mi+1 fails to take any work from this buffer at time t. Machine
mM cannot be blocked.

Remark 1.
– Assumptions (i)–(iii) imply that all machines are identical from all points of
view except, perhaps, for the nature of up- and downtime distributions. The
buffers are also assumed to be of equal capacity (see (iv)). We make these
assumptions in order to provide a compact characterization of lean buffering.
– Assumption (ii) implies, in particular, that time-dependent, rather than
operation-dependent failures, are considered. This failure mode simplifies the
analysis and results in just a small difference in comparison with operation-
dependent failures.
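In practice, the parameters in (8)–(10) are chosen to hit prescribed (Tdown, CVdown) targets by inverting the moment formulas of Table 2. The sketch below is an editorial addition (not the authors' code); it reproduces Table 3 entries such as g(0.2, 4) and W(0.044, 2.1) for Tdown = 20, CVdown = 0.5. The Weibull shape has no closed-form inverse, so it is found by bisection on the decreasing CV-versus-shape curve:

```python
import math

def gamma_params(t_down, cv):
    """Invert Table 2 for the gamma model: CV = 1/sqrt(R), Tdown = R/r."""
    R = 1.0 / cv ** 2
    return R / t_down, R                      # returns (r, R)

def lognormal_params(t_down, cv):
    """Invert Table 2 for the log-normal model: CV = sqrt(e^{R^2} - 1),
    Tdown = e^{r + R^2/2}."""
    R = math.sqrt(math.log(1.0 + cv ** 2))
    return math.log(t_down) - R ** 2 / 2.0, R

def weibull_params(t_down, cv):
    """Invert Table 2 for the Weibull model; the shape R is found by
    bisection, then the rate r from Tdown = Gamma(1 + 1/R) / r."""
    def cv_of(R):
        g1 = math.gamma(1.0 + 1.0 / R)
        g2 = math.gamma(1.0 + 2.0 / R)
        return math.sqrt(g2 - g1 ** 2) / g1
    lo, hi = 0.2, 50.0                        # CV is decreasing in R here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cv_of(mid) > cv:
            lo = mid
        else:
            hi = mid
    R = 0.5 * (lo + hi)
    return math.gamma(1.0 + 1.0 / R) / t_down, R
```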

2.2 Notations

Each machine considered in this paper is denoted by a pair


[Dup (p, P ), Ddown (r, R)]i , i = 1, . . . , M, (12)
where Dup (p, P ) and Ddown (r, R) represent, respectively, the distributions of up-
and downtime of the i-th machine in the system, Dup and Ddown ∈ {W, g, LN }.
The serial line with M machines is denoted as
{[Dup , Ddown ]1 , . . . , [Dup , Ddown ]M }. (13)
If all machines have identical distribution of uptimes and downtimes, the line is
denoted as
{[Dup (p, P ), Ddown (r, R)]i , i = 1, . . . , M }. (14)
If, in addition, the types of up- and downtime distributions are the same, the notation
for the line is
{[D(p, P ), D(r, R)]i , i = 1, . . . , M }. (15)
Finally, if up- and downtime distributions of the machines are not necessarily W ,
g, or LN but are general in nature, however, unimodal, the line is denoted as
{[Gup , Gdown ]1 , . . . , [Gup , Gdown ]M }. (16)

2.3 Problems addressed

Using the parameterizations (1), (2), the model (i)–(vi), and the notations (12)–(16),
this paper is intended to
– develop a method for calculating Lean Level of Buffering in production lines
(13)–(15) under the assumption that the coefficients of variation of up- and
downtime, CVup and CVdown , are identical, i.e., CVup = CVdown = CV ;
– develop a method of calculating LLB in production lines (13)–(15) for the case of CVup ≠ CVdown;
– extend the results obtained to production lines (16).
Solutions of these problems are presented in Sections 4–6 while Section 3
describes the approach used in this work.

3 Approach

3.1 General considerations

Since LLB depends on line efficiency E, the calculation of kE requires the knowl-
edge of the production rate, P R, of the system. Unfortunately, as it was mentioned
earlier, no analytical methods exist for evaluating P R in serial lines with either
Weibull, or gamma, or log-normal reliability characteristics. Approximation methods are also hardly applicable since, in our experience, even 1%–2% errors in the
production rate evaluation (due to the approximate nature of the techniques) often
lead to much larger errors (up to 20%) in lean buffering characterization. There-
fore, the only method available is the Monte Carlo approach based on numerical
simulations. To implement this approach, a MATLAB code was constructed, which
simulated the operation of the production line defined by assumptions (i)–(vi) of
Section 2. Then, a set of representative distributions of up- and downtime was se-
lected and, finally, for each member of this set, P R and LLB were evaluated with
guaranteed statistical characteristics. Each of these steps is described below in more
detail.

3.2 Up- and downtime distributions analyzed

The set of 26 downtime distributions analyzed in this work is shown in Table 3,


where the notations introduced in Section 2.1 are used. These distributions are
classified according to their coefficients of variation, CVdown , which take values
from the set {0.1, 0.25, 0.5, 0.75, 1.0}. The analysis of LLB for this set is intended
to reveal the behavior of kE as a function of CVdown .
To investigate the effect of the average downtime, the distributions of Table 3
have been classified according to Tdown , which takes values 20 and 100.
An illustration of a few of the downtime distributions included in Table 3 is
given in Figure 2 for CVdown = 0.5. As one can see, the shapes of the distributions
included in Table 3 range from “almost” uniform to “almost” δ-function.

Table 3. Downtime distributions considered

CVdown   Tdown = 20                                    Tdown = 100
0.1      g(5, 100), W(0.048, 12.15), LN(2.99, 0.1)     g(1, 100), W(0.01, 12.15), LN(4.602, 0.1)
0.25     g(0.8, 16), W(0.046, 4.54), LN(2.97, 0.25)    g(0.16, 16), W(0.009, 4.54), LN(4.57, 0.25)
0.5      g(0.2, 4), W(0.044, 2.1), LN(2.88, 0.49)      g(0.04, 4), W(0.009, 2.1), LN(4.49, 0.49)
0.75     g(0.09, 1.8), W(0.046, 1.35), LN(2.77, 0.66)  g(0.018, 1.8), W(0.009, 1.35), LN(4.38, 0.66)
1.00     LN(2.65, 0.83)                                LN(4.26, 0.83)

Fig. 2. Different distributions with identical coefficients of variation (CVdown = 0.5)

The uptime distributions, corresponding to the downtime distributions of Table 3, have been selected as follows: For a given machine efficiency, e, the average uptime was chosen as

    Tup = ( e / (1 − e) ) Tdown .
Next, CVup was selected as CVup = CVdown , when the case of identical coef-
ficients of variation of up- and downtime was considered; otherwise CVup was
selected as a constant independent of CVdown . Finally, using these Tup and CVup ,
the distribution of uptime was selected to be the same as that of the downtime, if the
case of identical distributions was analyzed; otherwise it was selected as any other
distribution from the set {W, g, LN }. For instance, if the downtime was distributed
according to Ddown (r, R) = g(0.018, 1.8) and e was 0.9, the uptime distribution
was selected as

    Dup(p, P) = g(0.002, 1.8)     for CVup = CVdown,
              = g(0.0044, 4)      for CVup = 0.5,

or

    Dup(p, P) = LN(6.69, 0.47)    for CVup = CVdown,
              = LN(2.88, 0.49)    for CVup = 0.5.

Remark 2. Both CVup and CVdown considered are less than 1 because, according
to the empirical evidence of (Inman, 1999), the equipment on the factory floor often
satisfies this condition. In addition, it has been shown by Li and Meerkov (2005)
that CVup and CVdown are less than 1 if the breakdown and repair rates of the
machines are increasing functions of time, which often takes place in reality.

3.3 Parameters selected

In all systems analyzed, particular values of M , E, and e have been selected as


follows:
(a) The number of machines in the system, M: Since, as it was shown in (Enginarlar et al., 2002), kE^exp is not very sensitive to M if M ≥ 10, the number of machines in the system was selected to be 10. For verification purposes, we also analyzed serial lines with M = 5.
(b) Line efficiency, E: In practice, production lines are often operated close to
their maximum capacity. Therefore, for the purposes of simulation, E was selected
to belong to the set {0.85, 0.9, 0.95}. For the purposes of verification, additional
values of E analyzed were {0.7, 0.8}.
(c) Machine efficiency, e: Although in practice e may have widely different val-
ues (e.g., smaller in machining operations and much larger in assembly), to ob-
tain a manageable set of systems for simulation, e was selected from the set
{0.85, 0.9, 0.95}. For verification purposes, we considered e ∈ {0.6, 0.7, 0.8}.

3.4 Systems analyzed

Specific systems of the form (15) considered in this work are:


{[W (p, P ), W (r, R)]i , i = 1, . . . , 10},
{[g(p, P ), g(r, R)]i , i = 1, . . . , 10}, (17)
{[LN (p, P ), LN (r, R)]i , i = 1, . . . , 10}.
Systems of the form (13) have been formed as follows: For each machine
mi , i = 1, . . . , 10, the up- and downtime distributions were chosen from the set
{W, g, LN } equiprobably and independently of each other and all other machines
in the system. As a result, the following two lines were selected:
Line 1: {(g, W ), (LN, LN ), (W, g), (g, LN ), (g, W ),
(LN, g), (W, W ), (g, g), (LN, W ), (g, LN )},
Line 2: {(W, LN ), (g, W ), (LN, W ), (W, g), (g, LN ), (18)
(g, W ), (W, W ), (LN, g), (g, W ), (LN, LN )}.
We will use notations A ∈ {(17)}, A ∈ {(18)} or A ∈ {(17), (18)} to indicate
that line A is one of (17), or one of (18), and one of (17) and (18), respectively.
Lines (17) and (18) are analyzed in Sections 4 and 5 for the cases of CVup = CVdown and CVup ≠ CVdown, respectively.
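The equiprobable, independent selection of distribution types that produced Lines 1 and 2 in (18) can be mechanized as follows (editorial sketch; the seed, and hence the resulting line, are ours and purely illustrative):

```python
import random

families = ["W", "g", "LN"]        # Weibull, gamma, log-normal
rng = random.Random(2005)          # hypothetical seed, for reproducibility only

# One (uptime, downtime) distribution-type pair per machine, chosen
# equiprobably and independently of each other and of all other machines
line = [(rng.choice(families), rng.choice(families)) for _ in range(10)]
```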

3.5 Evaluation of the production rate

To evaluate the production rate in systems (17) and (18), using the MATLAB code
and the up- and downtime distributions discussed in Sections 3.1–3.3, zero initial

conditions of all buffers have been assumed and the states of all machines at the
initial time moment have been selected “up”. The first 100,000 cycle times were
considered as warm-up period. The subsequent 1,000,000 cycle times were used
for statistical evaluation of P R. Each simulation was repeated 10 times, which
resulted in 95% confidence intervals of less than 0.0005.
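The simulation just described can be sketched as follows. This is our simplified editorial reconstruction, not the authors' MATLAB code: it assumes N ≥ 1, rounds up- and downtimes to whole cycle times, and serves machines downstream-first within a cycle; starvation and blocking follow assumptions (v) and (vi), and machine clocks run regardless of blocking (time-dependent failures, Remark 1):

```python
import random

def simulate_pr(M, N, draw_up, draw_down, cycles=100000, warmup=10000, seed=1):
    """Monte Carlo estimate of PR (parts per cycle) for an M-machine serial
    line with buffers of capacity N, started up and empty."""
    rng = random.Random(seed)
    up = [True] * M                                  # all machines start up
    rem = [max(1, round(draw_up(rng))) for _ in range(M)]
    buf = [0] * (M - 1)                              # buffers start empty
    produced = 0
    for t in range(warmup + cycles):
        for i in range(M):                           # up/down clocks always run
            rem[i] -= 1
            if rem[i] == 0:
                up[i] = not up[i]
                rem[i] = max(1, round((draw_up if up[i] else draw_down)(rng)))
        for i in range(M - 1, -1, -1):               # downstream first
            if not up[i]:
                continue
            if i > 0 and buf[i - 1] == 0:            # starved, assumption (v)
                continue
            if i < M - 1 and buf[i] >= N:            # blocked, assumption (vi)
                continue
            if i > 0:
                buf[i - 1] -= 1
            if i < M - 1:
                buf[i] += 1
            elif t >= warmup:
                produced += 1                        # last machine outputs a part
    return produced / cycles

# Illustration: two exponential machines with Tup = 90, Tdown = 10 (e = 0.9)
draw_up = lambda rng: rng.expovariate(1 / 90.0)
draw_down = lambda rng: rng.expovariate(1 / 10.0)
pr_small = simulate_pr(2, 1, draw_up, draw_down)
pr_large = simulate_pr(2, 50, draw_up, draw_down)
```

As expected, enlarging the buffer from N = 1 to N = 50 moves the estimated PR toward the isolation efficiency e.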

3.6 Evaluation of LLB

The lean buffering, kE , necessary and sufficient to ensure line efficiency E, was
evaluated using the following procedure:
For each model of serial line (13)–(15), the production rate was evaluated first
for N = 0, then for N = 1, and so on, until the production rate P R = E ·P R∞ was
achieved. Then kE was determined by dividing the resulting NE by the machine
average downtime (in units of the cycle time).
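The procedure amounts to incrementing N until PR(N) ≥ E · PR∞ and then normalizing by Tdown. A minimal editorial sketch, shown with a synthetic, saturating PR(N) curve in place of the simulator (any PR evaluator can be plugged in):

```python
def lean_level_of_buffering(pr_of_n, pr_inf, E, t_down):
    """Smallest buffer capacity N_E with PR(N) >= E * PR_inf,
    returned as k_E = N_E / T_down."""
    n = 0
    while pr_of_n(n) < E * pr_inf:
        n += 1
    return n / t_down

# Synthetic PR(N) curve, purely for illustration (not from the paper):
# PR(0) = 0.8, PR(inf) = 0.9
pr = lambda n: 0.9 - 0.1 * 0.8 ** n
k_e = lean_level_of_buffering(pr, 0.9, 0.95, t_down=10)   # N_E = 4, k_E = 0.4
```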
Remark 3. Although, as is well known (Hillier and So, 1991b), the optimal allocation of a fixed total buffer capacity is non-uniform, the optimal (i.e., inverted bowl) allocation typically results in just a 1–2% throughput improvement over the uniform allocation. For the sake of simplicity, we therefore consider only uniform allocations.

4 LLB in serial lines with CVup = CVdown = CV

4.1 System {[D(p, P ), D(r, R)]i , i = 1, . . . , 10}

Figures 3 and 5 present the simulation results for production lines (17) for all
distributions of Table 3. These figures are arranged as matrices where the rows
and columns correspond to e ∈ {0.85, 0.9, 0.95} and E ∈ {0.85, 0.9, 0.95}, re-
spectively. Since, due to space considerations, the graphs in Figures 3 and 5 are
congested and may be difficult to read, one of them is shown in Figure 4 in a larger
scale. (The dashed lines in Figs. 3–5 will be discussed in Sect. 4.3.) Examining
these data, the following may be concluded:
– As expected, kE for non-exponential machines is smaller than kE^exp. Moreover,
kE is a monotonically increasing function of CV . In addition, kE (CV ) is
convex, which implies that reducing larger CV ’s leads to larger reduction of
kE than reducing smaller CV ’s.
– Function kE (CV ) seems to be polynomial in nature. In fact, each curve of
Figures 3 and 5 can be approximated by a polynomial of an appropriate order.
However, since these approximations are “parameter-dependent” (i.e., different
polynomials must be used for different e and E), they are of small practical
importance, and, therefore, are not reported here.
– Since for every pair (E, e), corresponding curves of Figures 3 and 5 are identical,
it is concluded that kE is not dependent of Tup and Tdown explicitly but only
through the ratio e. In other words, the situation here is the same as in lines with
exponential machines (see (5), (6)).

Fig. 3. LLB versus CV for systems (17) with Tdown = 20

Fig. 4. LLB versus CV for system {(D(p, P), D(r, R))i, i = 1, . . . , 10} with Tdown = 20, e = 0.9, E = 0.9

– Finally, and perhaps most importantly, the behavior of kE as a function of CV is almost independent of the type of up- and downtime distributions considered. Indeed, let kE^A(CV) denote LLB for line A ∈ {(17)} with CV ∈ {0.1, 0.25, 0.5, 0.75, 1.0}. Then the sensitivity of kE to up- and downtime distributions may be characterized by

    Δ1(CV) = max over A, B ∈ {(17)} of |kE^A(CV) − kE^B(CV)| / kE^A(CV) · 100%.   (19)

Fig. 5. LLB versus CV for systems (17) with Tdown = 100

Fig. 6. Sensitivity of LLB to the nature of up- and downtime distributions for systems (17)

Function Δ1(CV) is illustrated in Figure 6. As one can see, in most cases it takes
values within 10%. Thus, it is possible to conclude that for all practical purposes
kE depends on the coefficients of variation of up- and downtime, rather than
on actual distribution of these random variables.
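Given measured LLB values at a fixed (E, e, CV) point, the sensitivity measure (19) is a one-line computation. An editorial sketch with hypothetical inputs:

```python
def sensitivity(k_values):
    """Worst-case pairwise relative deviation (in %) of LLB values measured
    across different distribution types, as in Eq. (19)."""
    return max(abs(a - b) / a for a in k_values for b in k_values) * 100.0

# Hypothetical LLB readings (gamma, Weibull, log-normal) at one CV point
delta1 = sensitivity([2.0, 1.9, 1.85])
```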

4.2 System {[D(p, P ), D(r, R)]1 , . . . , [D(p, P ), D(r, R)]10 }

Figures 7 and 8 present the simulation results for lines (18), while Figure 9 char-
acterizes the sensitivity of kE to up- and downtime distributions. This sensitivity
is calculated according to (19) with the only difference that the max is taken over
A, B ∈ {(18)}. Based on these data, we affirm that the conclusions formulated in
Section 4.1 hold for production lines of the type (13) as well.

4.3 Empirical law

4.3.1 Analytical expression


Simulation results reported above provide a characterization of kE for M = 10 and
E and e ∈ {0.85, 0.9, 0.95}. How can kE be determined for other values of M , E,
and e? Obviously, simulations for all values of these variables are impossible. Even
for particular values of M , E, and e, simulations take a very long time: Figures 3
and 5 required approximately one week of calculations using 25 Sun workstations
working in parallel. Therefore, an analytical method for evaluating kE for all values
of M , E, e, and CV is desirable. Although an exact characterization of the function
kE = kE(M, E, e, CV) is all but impossible, results of Sections 4.1 and 4.2 provide an opportunity for introducing an upper bound of kE as a function of all four variables. This upper bound is based on the expression of kE^exp = kE^exp(M, E, e), given by (5), (6), and the fact that all curves of Figures 3, 5 and 7, 8 are below the linear function of CV with the slope kE^exp, if 0.25 < CV ≤ 1. For 0 < CV ≤ 0.25, all curves are below the constant 0.25·kE^exp. Thus, the following piece-wise linear upper bound for kE may be introduced:

    kE(M, E, e, CV) ≤ max{0.25, CV} · kE^exp(M, E, e),   CV ≤ 1.           (20)
This expression, referred to as the empirical law, is illustrated in Figures 3-5 and
7, 8 by the broken lines.
The tightness of this bound can be characterized by the function

    Δ2(CV) = max over A ∈ {(17), (18)} of (kE^{upper bound} − kE^A) / kE^A · 100%,   CV ≤ 1,   (21)

where kE^{upper bound} is the right-hand side of (20). Function Δ2(CV) is illustrated in Figure 10. Although, as one can see, the empirical law is quite conservative, its usage still leads to a reduction of buffering by a factor of up to 4, as compared with that based on the exponential assumption (see Figs. 3, 5 and 7, 8).
Remark 4. As it was pointed out above, the curves of Figures 3, 5 and 7, 8 are polynomial in nature. This, along with the quadratic dependence of performance measures on CV in G/G/1 queues, might lead to a temptation to approximate these curves by polynomials. This, however, proved to be practically impossible, since for various values of M, E, and e, the order and the coefficients of the polynomials would have to be selected differently. This, together with the fact that only one point is known analytically (i.e., kE^exp), leads to the selection of the piece-wise linear approximation (20).

Fig. 7. LLB versus CV for systems (18) with Tdown = 20

Fig. 8. LLB versus CV for systems (18) with Tdown = 100

Fig. 9. Sensitivity of LLB to the nature of up- and downtime distributions for systems (18)

Fig. 10. The tightness of the empirical law (20)

Fig. 11. Verification: LLB versus CV for system {(D(p, P), D(r, R))i, i = 1, . . . , 5} with Tdown = 10

4.3.2 Verification
To verify the empirical law (20), production lines (17) and (18) were simulated with
parameters M , E, and e other than those considered in Sections 4.1 and 4.2. Specif-
ically, the following parameters have been selected: M = 5, E ∈ {0.7, 0.8, 0.9},
e ∈ {0.6, 0.7, 0.8}, Tdown = 10. (In lines (18), the first 5 machines were selected.)
The results are shown in Figure 11. As one can see, the upper bound given by (20)
still holds.

5 LLB in serial lines with CVup ≠ CVdown

5.1 Effect of CVup and CVdown

The case of CVup ≠ CVdown is complicated by the fact that CVup and CVdown may
have different effects on kE . If this difference is significant, it would be difficult

to expect that the empirical law (20) could be extended to the case of unequal
coefficients of variation. On the other hand, if CVup and CVdown affect kE in a
somewhat similar manner, it would seem likely that (20) might be extended to the
case under consideration. Therefore, analysis of effects of CVup and CVdown on
kE is of importance. This section is devoted to such an analysis.
To investigate this issue, introduce two functions:

kE (CVup |CVdown = α) (22)

and

kE (CVdown |CVup = α), (23)

where

α ∈ {0.1, 0.25, 0.5, 0.75, 1.0}. (24)

Function (22) describes kE as a function of CVup given that CVdown = α, while
(23) describes kE as a function of CVdown given that CVup = α. If for all α and
β ∈ {0.1, 0.25, 0.5, 0.75, 1.0},

kE (CVdown = β|CVup = α) < kE (CVup = β|CVdown = α) (25)

when α > β, it must be concluded that CVdown has a larger effect on kE than CVup .
If the inequality is reversed, CVup has a stronger effect. Finally, if (25) holds for
some α and β from (24) and does not hold for others, the conclusion would be that,
in general, neither has a dominant effect.
To investigate which of these situations takes place, we evaluated functions
(22) and (23) using the approach described in Section 3. Some of the results for
Weibull distribution are shown in Figure 12 (where the broken lines and CVeff will
be defined in Sect. 5.2). Similar results were obtained for gamma and log-normal
distributions as well (see Enginarlar et al., 2003b for details). From these results,
the following can be concluded:

– For all α and β, such that α > β, inequality (25) takes place. Thus, CVdown
has a larger effect on kE than CVup .
– However, since each pair of curves (22), (23) corresponding to the same α are
close to each other, the difference in the effects of CVup and CVdown is not too
dramatic. To analyze this difference, introduce the function

∆3^A (CV | CVup = CVdown = α)
  = [kE^A (CVup = CV | CVdown = α) − kE^A (CVdown = CV | CVup = α)] / kE^A (CVup = CV | CVdown = α) · 100%,   (26)
where A ∈ {W, g, LN }. The behavior of this function for Weibull distribution
is shown in Figure 13 (see Enginarlar et al., 2003b for gamma and log-normal
distributions). Thus, the effects of CVup and CVdown on kE are not dramatically
different (typically within 20% and no more than 40%).

Fig. 12. LLB versus CV for M = 10 Weibull machines

5.2 Empirical law

5.2.1 Analytical expression


Since the upper bound (20) is not too tight (and, hence, may accommodate additional
uncertainties) and the effects of CVup and CVdown on kE are not dramatically
different, the following extension of the empirical law is suggested:
kE (M, E, e, CVup , CVdown ) ≤ [(max{0.25, CVup } + max{0.25, CVdown }) / 2] · kE^exp (M, E, e),
CVup ≤ 1, CVdown ≤ 1,   (27)

where, as before, kE^exp is defined by (5), (6). If CVup = CVdown , (27) reduces to
(20); otherwise, it takes into account different values of CVup and CVdown .
The first factor in the right-hand-side of (27) is denoted as CVeff:

CVeff = (max{0.25, CVup } + max{0.25, CVdown }) / 2.   (28)

Thus, (27) can be rewritten as

kE ≤ CVeff · kE^exp (M, E, e).   (29)

The right-hand-side of (29) is shown in Figure 12 by the broken lines.
The utilization of this law can be illustrated as follows: Suppose CVup = 0.1
and CVdown = 1. Then CVeff = 0.625 and, according to (27),

kE ≤ 0.625 · kE^exp (M, E, e).
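The effective coefficient of variation (28) and the bound (29) can be sketched in the same spirit (illustrative code of our own; kE_exp again denotes the value from (5), (6)):

```python
# Sketch of the extended empirical law (27)-(29); our own illustration.
def cv_eff(cv_up, cv_down):
    """Effective coefficient of variation, Eq. (28)."""
    return (max(0.25, cv_up) + max(0.25, cv_down)) / 2.0

def lean_buffering_bound_uneq(cv_up, cv_down, kE_exp):
    """Upper bound on kE for unequal CVs, Eq. (29)."""
    return cv_eff(cv_up, cv_down) * kE_exp
```

For the example above, cv_eff(0.1, 1.0) returns 0.625, reproducing the stated bound.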

Fig. 13. Function ∆3^W (CV | CVup = CVdown = α)

Table 4. ∆(10, E, e) for all CVup ≠ CVdown cases considered

           E = 0.85   E = 0.9   E = 0.95
e = 0.85   0.1016     0.0386    0.0687
e = 0.9    0.0425     0.1647    0.1625
e = 0.95   0.0402     0.0488    0.1200

To investigate the validity of the empirical law (27), consider the following
function:

∆(M, E, e) = min_{A∈{(17)}} min_{CVup ,CVdown ∈{(24)}} [kE^{upper bound} (M, E, e, CVeff) − kE^A (M, E, e, CVup , CVdown )],   (30)

where kE^{upper bound} is the right-hand-side of (29), i.e.,

kE^{upper bound} (M, E, e, CVeff) = CVeff · kE^exp (M, E, e).

If for all values of its arguments, function ∆(M, E, e) is positive, the right-hand-
side of inequality (27) is an upper bound. The values of ∆(10, E, e) for E ∈
{0.85, 0.9, 0.95} and e ∈ {0.85, 0.9, 0.95} are shown in Table 4. As one can see,
function ∆(10, E, e) indeed takes positive values. Thus, the empirical law (27)
takes place for all distributions and parameters analyzed.

Fig. 14. The tightness of the empirical law (27)

To investigate the tightness of the bound (27), consider the function

∆4 (CVeff) = max_{A∈{(17)}} max_{CVup ,CVdown ∈{(24)}} [(kE^{upper bound} (M, E, e, CVeff) − kE^A (M, E, e, CVup , CVdown )) / kE^A (M, E, e, CVup , CVdown )] · 100%.   (31)

Figure 14 illustrates the behavior of this function. Comparing this with Figure 10,
we conclude that the tightness of bound (27) appears to be similar to that of (20).

5.2.2 Verification
To evaluate the validity of the upper bound (27), serial production lines with M = 5,
E ∈ {0.7, 0.8, 0.9}, e ∈ {0.6, 0.7, 0.8}, and Tup = 10 were simulated. For each
of these parameters, systems (17) and (18) have been considered. (For system (18),
the first 5 machines were selected.) Typical results are shown in Figure 15 (see
Enginarlar et al., 2003b for more details). The validity of empirical law (27) for
these cases is analyzed using function ∆(M, E, e), defined in (30) with the only
difference that the first min is taken over A ∈ {(17), (18)}. Since the values of
this function, shown in Table 5, are positive, we conclude that empirical law (27) is
indeed verified for all values of M , E, e, and all distributions of up- and downtime
considered.

Fig. 15. Verification: LLB versus CV for M = 5 Weibull machines

Table 5. Verification: ∆(5, E, e) for all CVup ≠ CVdown cases considered

          E = 0.7   E = 0.8   E = 0.9
e = 0.6   0.0039    0.0242    0.0547
e = 0.7   0.0102    0.0213    0.0481
e = 0.8   0.0084    0.0162    0.0355

6 SYSTEM {[Gup , Gdown ]1 , . . . , [Gup , Gdown ]M }

So far, serial production lines with Weibull, gamma, and log-normal reliability
models have been analyzed. It is of interest to extend this analysis to general
probability density functions. Based on the results obtained above, the following
conjecture is formulated:
The empirical laws (20) and (27) hold for serial production lines satisfying
assumptions (i), (iii)–(vi) with up- and downtime having arbitrary unimodal prob-
ability density functions.
The verification of this conjecture is a topic for future research.

7 Conclusions

Results described in this paper suggest the following procedure for designing lean
buffering in serial production lines defined by assumptions (i)–(vi):
1. Identify the average value and the variance of the up- and downtime, Tup ,
Tdown , σup^2 , and σdown^2 , for all machines in the system (in units of machine
cycle time). This may be accomplished by measuring the duration of the up-
and downtimes of each machine during a shift or a week of operation (depending
on the frequency of occurrence). If the production line is at the design stage,
this information may be obtained from the equipment manufacturer (however,
typically with a lower level of certainty).
2. Using (5), (6), and Tup , Tdown , determine the level of buffering, necessary
and sufficient to obtain the desired efficiency, E, of the production line, if the
downtime of all machines were distributed exponentially, i.e., kE^exp .
3. Finally, if CVup = σup /Tup ≤ 1 and CVdown = σdown /Tdown ≤ 1, evaluate the level of
buffering for the line with machines under consideration using the empirical
law

kE ≤ [(max{0.25, CVup } + max{0.25, CVdown }) / 2] · kE^exp .
As shown in this paper, this procedure leads to a reduction of lean buffering
by a factor of up to 4, as compared with that based on the exponential assumption.

References

Altiok T (1985) Production lines with phase-type operation and repair times and finite buffers.
International Journal of Production Research 23: 489–498
Altiok T (1989) Approximate analysis of queues in series with phase-type service times
and blocking. Operations Research 37: 601–610
Altiok T, Stidham SS (1983) The allocation of interstage buffer capacities in production
lines. IIE Transactions 15: 292–299
Altiok T, Ranjan R (1989) Analysis of production lines with general service times and
finite buffers: a two-node decomposition approach. Engineering Costs and Production
Economics 17: 155–165
Buzacott JA (1967) Automatic transfer lines with buffer stocks. International Journal of
Production Research 5: 183–200
Caramanis M (1987) Production line design: a discrete event dynamic system and general-
ized benders decomposition approach. International Journal of Production Research 25:
1223–1234
Chow W-M (1987) Buffer capacity analysis for sequential production lines with variable
processing times. International Journal of Production Research 25: 1183–1196
Conway R, Maxwell W, McClain JO, Thomas LJ (1988) The role of work-in-process inven-
tory in serial production lines. Operations Research 36: 229–241
Enginarlar E, Li J, Meerkov SM, Zhang RQ (2002) Buffer capacity to accommodate machine
downtime in serial production lines. International Journal of Production Research
40: 601–624

Enginarlar E, Li J, Meerkov SM (2003a) How lean can lean buffers be? Control Group Report
CGR 03-10, Department of EECS, University of Michigan, Ann Arbor, MI; accepted
for publication in IIE Transactions on Design and Manufacturing (2005)
Enginarlar E, Li J, Meerkov SM (2003b) Lean buffering in serial production lines with
non-exponential machines. Control Group Report CGR 03-13, Department of EECS,
University of Michigan, Ann Arbor, MI
Gershwin SB, Schor JE (2000) Efficient algorithms for buffer space allocation. Annals of
Operations Research 93: 117–144
Harris JH, Powell SG (1999) An algorithm for optimal buffer placement in reliable serial
lines. IIE Transactions 31: 287–302
Hillier FS, So KC (1991a) The effect of the coefficient of variation of operation times on the
allocation of storage space in production line systems. IIE Transactions 23: 198–206
Hillier FS, So KC (1991b) The effect of machine breakdowns and internal storage on the
performance of production line systems. International Journal of Production Research
29: 2043–2055
Inman RR (1999) Empirical evaluation of exponential and independence assumptions in
queueing models of manufacturing systems. Production and Operation Management 8:
409–432
Jafari MA, Shanthikumar JG (1989) Determination of optimal buffer storage capacity and
optimal allocation in multistage automatic transfer lines. IIE Transactions 21: 130–134
Li J, Meerkov SM (2005) On the coefficients of variation of up- and downtime in manufac-
turing equipment. Mathematical Problems in Engineering 2005: 1–6
Park T (1993) A two-phase heuristic algorithm for determining buffer sizes in production
lines. International Journal of Production Research 31: 613–631
Powell SG (1994) Buffer allocation in unbalanced three-station lines. International Journal
of Production Research 32: 2201–2217
Powell SG, Pyke DF (1998) Buffering unbalanced assembly systems. IIE Transactions 30:
55–65
Seong D, Chang SY, Hong Y (1995) Heuristic algorithm for buffer allocation in a production
line with unreliable machines. International Journal of Production Research 33: 1989–
2005
Smith JM, Daskalaki S (1988) Buffer space allocation in automated assembly lines. Opera-
tions Research 36: 343–357
Tempelmeier H (2003) Practical considerations in the optimization of flow production sys-
tems. International Journal of Production Research 41: 149–170
Yamashita H, Altiok T (1998) Buffer capacity allocation for a desired throughput of produc-
tion lines. IIE Transactions 30: 883–891
Analysis of flow lines
with Cox-2-distributed processing times
and limited buffer capacity
Stefan Helber
University of Hannover, Department for Production Management, Königsworther Platz 1,
30167 Hannover, Germany (e-mail: [email protected])

Abstract. We describe a flow line model consisting of machines with Cox-2-


distributed processing times and limited buffer capacities. A two-machine sub-
system is analyzed exactly and larger flow lines are evaluated through a decom-
position into a set of coupled two-machine lines. Our results are compared to those
given by Buzacott, Liu, and Shanthikumar for their “Stopped Arrival Queue Model”.

Keywords: Flow line – Performance evaluation – Decomposition – General pro-


cessing times – Cox-2-distribution

1 Introduction

We describe an approximate approach to determine the production rate and in-


ventory level of a flow line consisting of more than two machines where adjacent
machines are decoupled through buffers of limited capacity. We assume that ma-
chines are reliable and that processing times are Cox-2-distributed. This allows
us to model processing times with any squared coefficient of variation c2 ≥ 0.5.
These processing times can include the random delay of workpieces which is due to
random failures and repairs of the machines if we use the completion time concept
proposed by Gaver [17].
Several researchers have studied transfer lines or assembly/disassembly (A/D)
systems with limited buffer capacity. A comprehensive survey is given by Dallery
and Gershwin [15]. This review includes the literature on reliable two-machine
transfer lines, on transfer lines without buffers as well as longer lines with more
than two machines and A/D systems. Earlier reviews are [7, 11], and [28].
Transfer lines and A/D systems are often studied using Markov chain or process
models to allow for an analytic solution or an accurate approximation. Many of these

The author thanks the anonymous referees for their helpful comments and suggestions.
56 S. Helber

Table 1. Two-machine models and approximation approaches

Type of process                    Analysis of two-machine models   Approximate decomposition approaches
Discrete state/discrete time       [2,5,7,24,26,29,34]              [13,18,20,25]
Discrete state/continuous time     [6,22,31]                        [12,19,25]
Continuous state/continuous time   [21,23,32,33,35]                 [4,14,16]

approximations are based on a decomposition of the complete system into a set of


single server queues [27] or two-machine transfer lines [18, 32, 35] which can be
evaluated analytically. The main advantage of analytical approaches as opposed to
simulation models is that the analytical techniques are much faster. This is crucial if a
large number of different systems has to be evaluated in order to find a configuration
which is optimal with respect to some objective.
When analyzing the related work with respect to two-machine models and
decomposition approaches, we can distinguish [15]
– Markov processes with discrete state and discrete time,
– Markov processes with discrete state and continuous time, and
– Markov processes with mixed state and continuous time.
In the first two cases, the state is discrete since discrete parts are produced.
An additional possible reason to have discrete states is that machines can be either
operational or under repair. Time is divided into discrete periods in the first case or
treated as continuous in the second. The third group of Markov processes assumes
that continuous material is produced in continuous time (which leads to a continuous
buffer level), but machine states are discrete. In this paper we describe a discrete-
state, continuous-time model where the discrete states reflect discrete buffer levels.
Table 1 gives an overview of two-machine models and decomposition ap-
proaches for the case of limited buffer capacity. In many of these papers machines
are assumed to be unreliable. Textbooks covering these and similar techniques in
detail are [1, 10, 30] as well as [21] which gives a thorough introduction into how
to derive these models. In this paper, we develop a two-machine transfer line de-
composition of the discrete state-continuous time type. We assume, however, that
machines are reliable and that processing times may exhibit variability with any
squared coefficient of variation larger than 0.5. The two papers most closely related
to this one are an older paper by Buzacott and Kostelski [8] on the analysis of a
specific two-machine line and a paper by Buzacott et al. [9] on particular decomposition
techniques for longer lines with limited buffer capacity.
The paper is structured as follows: In Section 2 we formally describe the type of
flow line to be analyzed. Section 3 outlines the exact analysis of the two-machine,
one-buffer subsystem that serves as the building block of a decomposition and
Analysis of flow lines with Cox-2-distributed processing times 57

which has already been analyzed by Buzacott and Kostelski [8] using the Matrix
geometric method. Our analysis of the two-machine system, however, follows the
approach for two-machine systems which is thoroughly explained by Gershwin
[21]. The decomposition algorithm is briefly described in Section 4. In Section 5
we present some preliminary numerical results by comparing our results to those
obtained from the multistage flow line analysis with the stopped arrival queue model
as proposed by Buzacott et al. [9].

2 The model

We assume that the flow line consists of M machines or stages. The processing
times at machine Mi follow a Cox-2 distribution. Each buffer Bi between machines
Mi and Mi+1 has the capacity to hold up to Ci workpieces which flow from the
leftmost to the rightmost machine. An example of such a flow line is depicted in
Figure 1.
Fig. 1. Flow line with three machines

The rates of the two phases of stage i are µi1 and µi2 respectively. The second
phase of stage i will be required after completion of phase one with probability ai .
Therefore, a workpiece completes its service at stage i with probability 1 − ai after
the completion of the first phase and with probability ai after the completion of the
second phase. Note that these states of a machine or stage do not represent servers:
No more than one workpiece can be at a machine at any moment in time, and if it
is there, it is in one out of the two phases of the respective machine. Each machine
Mi except for the first and the last can be either idle (starved) or blocked or it can
be processing a part in phase one or two. The state of machine Mi is denoted as
αi (t). The possible machine states are αi (t) ∈ {1, 2, B, S}, representing phase
one, phase two, blocking and starvation. The buffer level n(i) is defined such that it
includes the parts between machines Mi and Mi+1 , the one part residing at machine
Mi+1 (if this machine is not starved) and the one part waiting at machine Mi if this
machine is blocked because the buffer between Mi and Mi+1 is full.
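To make the Cox-2 service mechanism concrete, here is a small sampling sketch (our own illustration, not part of the model's analysis). The analytic mean 1/µ1 + a/µ2 agrees with the expected processing time used later in Section 3.2, and the squared coefficient of variation of a Cox-2 time never falls below 0.5, matching the modelling range stated above.

```python
# Our own illustrative sketch: sample Cox-2 processing times and compare the
# empirical mean with the analytic first two moments of the distribution.
import random

def cox2_sample(mu1, mu2, a, rng):
    """One Cox-2 processing time: phase 1, then phase 2 with probability a."""
    t = rng.expovariate(mu1)
    if rng.random() < a:
        t += rng.expovariate(mu2)
    return t

def cox2_moments(mu1, mu2, a):
    """Analytic mean and squared coefficient of variation of the Cox-2 time.

    Derived from T = X1 + B*X2 with X1 ~ Exp(mu1), X2 ~ Exp(mu2), B ~ Bern(a).
    """
    mean = 1.0 / mu1 + a / mu2
    var = 1.0 / mu1**2 + a * (2.0 - a) / mu2**2
    return mean, var / mean**2

rng = random.Random(0)
mean, scv = cox2_moments(2.0, 1.0, 0.5)     # analytic mean is 1.0 here
est = sum(cox2_sample(2.0, 1.0, 0.5, rng) for _ in range(100_000)) / 100_000
```

In the Erlang-2 limit (a = 1, µ1 = µ2) the squared coefficient of variation reaches its minimum of exactly 0.5.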

3 The two-machine subsystem

3.1 State space and transition equations

In order to analyze larger systems with more than two machines, we first study a
two-machine line. The state of this two-machine line is given by the state of the first
machine, the state of the second machine, and the buffer level. In the analysis to
follow, we define the buffer level to include all parts that are currently being processed
at the second machine, that are waiting in the physical buffer between the first and

the second machine, and those parts that have been processed at the first machine
but cannot leave it because the physical buffer between the machines is full so that
the first machine is blocked. That is, we follow the blockage convention which
is described in [21, p. 95]. The (total or extended) buffer capacity is therefore
Ni = Ci + 2. In order to describe the state space, we use the triple (n, α1 , α2 )
where n denotes the buffer level. The probability of the system being in this state is
p(n, α1 , α2 ). Machine M1 can either be in the first phase (α1 = 1), in the second
phase (α1 = 2) or it can be blocked (α1 = B). The downstream machine M2 can
either be in the first phase (α2 = 1), in the second phase (α2 = 2) or it can be
starved (α2 = S).
This leads to the following transition equations which differ for states with an
empty or almost empty buffer, states with a full or almost full buffer, and the states
with a buffer level that is in between:
Lower boundary states:
µ11 p(0, 1, S) = (1 − a2 )µ21 p(1, 1, 1) + µ22 p(1, 1, 2) (1)
µ12 p(0, 2, S) = a1 µ11 p(0, 1, S) + (1 − a2 )µ21 p(1, 2, 1) +
µ22 p(1, 2, 2) (2)
(µ11 + µ21 )p(1, 1, 1) = (1 − a1 )µ11 p(0, 1, S) + µ12 p(0, 2, S) +
(1 − a2 )µ21 p(2, 1, 1) + µ22 p(2, 1, 2) (3)
(µ11 + µ22 )p(1, 1, 2) = a2 µ21 p(1, 1, 1) (4)

Intermediate stages:
(µ11 + µ21 )p(n, 1, 1) = (1 − a1 )µ11 p(n − 1, 1, 1) + µ12 p(n − 1, 2, 1) +
(1 − a2 )µ21 p(n + 1, 1, 1) + µ22 p(n + 1, 1, 2)
(for 2 ≤ n ≤ N − 2) (5)
(µ11 + µ22 )p(n, 1, 2) = (1 − a1 )µ11 p(n − 1, 1, 2) + µ12 p(n − 1, 2, 2) +
a2 µ21 p(n, 1, 1) (for 2 ≤ n ≤ N − 1) (6)
(µ12 + µ21 )p(n, 2, 1) = a1 µ11 p(n, 1, 1) + µ22 p(n + 1, 2, 2) +
(1 − a2 )µ21 p(n + 1, 2, 1) (for 1 ≤ n ≤ N − 2) (7)
(µ12 + µ22 )p(n, 2, 2) = a1 µ11 p(n, 1, 2) + a2 µ21 p(n, 2, 1)
(for 2 ≤ n ≤ N − 1) (8)
Upper boundary states:
µ21 p(N, B, 1) = (1 − a1 )µ11 p(N − 1, 1, 1) +
µ12 p(N − 1, 2, 1) (9)
µ22 p(N, B, 2) = a2 µ21 p(N, B, 1) +
(1 − a1 )µ11 p(N − 1, 1, 2) +
µ12 p(N − 1, 2, 2) (10)

(µ11 + µ21 )p(N − 1, 1, 1) = (1 − a1 )µ11 p(N − 2, 1, 1) + µ12 p(N − 2, 2, 1) +


(1 − a2 )µ21 p(N, B, 1) + µ22 p(N, B, 2) (11)
(µ12 + µ21 )p(N − 1, 2, 1) = a1 µ11 p(N − 1, 1, 1) (12)
Together with the normalization equation

p(0, 1, S) + p(0, 2, S) + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} Σ_{α2=1}^{2} p(n, α1 , α2 ) +
p(N, B, 1) + p(N, B, 2) = 1   (13)
this leads to a linear system of equations which can be solved in several ways.
An almost identical system of equations has been formulated in [8] and solved
via the matrix geometric method and a recursive algorithm. Since their methods
suffered from numerical instabilities, we developed a solution technique using the
ideas for the analysis of two-machine models presented in [21, pp.105]. It leads to
a numerically stable algorithm providing the exact values of all the system states
as well as the performance measures such as the production rate and the inventory
level.

3.2 Identities

Conservation of flow. The rate at which parts leave machine M1 is the product
of the steady-state probabilities of all states where M1 is not blocked times the
respective rate for this state:
PR1 = µ11 (1 − a1 ) p(0, 1, S) + µ12 p(0, 2, S) +
      Σ_{n=1}^{N−1} Σ_{α2=1}^{2} (µ11 (1 − a1 ) p(n, 1, α2 ) + µ12 p(n, 2, α2 ))   (14)

The reasoning for machine M2 (which may not be starved) is similar:


PR2 = Σ_{n=1}^{N−1} Σ_{α1=1}^{2} (µ21 (1 − a2 ) p(n, α1 , 1) + µ22 p(n, α1 , 2)) +
      µ21 (1 − a2 ) p(N, B, 1) + µ22 p(N, B, 2)   (15)
The Conservation-of-Flow-identity (COF) states that the rates of parts passing
through machines M1 and M2 are equal:

PR1 = PR2   (16)

The reason is that the flow of material is linear and parts are neither created nor
destroyed at either machine.

Rate of changes from phase one to two equals rate of changes from phase two
to one. For each change of machine M1 from phase one to phase two there must
be a change from phase two to phase one

a1 µ11 [ p(0, 1, S) + Σ_{n=1}^{N−1} Σ_{α2=1}^{2} p(n, 1, α2 ) ]
  = µ12 [ p(0, 2, S) + Σ_{n=1}^{N−1} Σ_{α2=1}^{2} p(n, 2, α2 ) ]   (17)

and the same holds true for machine M2 :



a2 µ21 [ p(N, B, 1) + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} p(n, α1 , 1) ]
  = µ22 [ p(N, B, 2) + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} p(n, α1 , 2) ]   (18)

Flow-Rate-Idle-Time-Equations. The Flow-Rate-Idle-Time-Equations (FRIT-


Equations) relate the flow or production rates of the up- and downstream machines
to the probability of the respective machine being blocked or starved.
The expected processing time E[T1 ] at the upstream machine M1 of a two-
machine line is the weighted sum of the expected processing time 1/µ11 if a workpiece
only goes through phase one (which happens with probability (1 − a1 )) and the
expected processing time 1/µ11 + 1/µ12 if it undergoes both phases (with probability
a1 ):

E[T1 ] = (1 − a1 ) · (1/µ11 ) + a1 · (1/µ11 + 1/µ12 )   (19)

The reasoning for the expected processing time E[T2 ] at the second (down-
stream) machine of a two-machine line leads to an analogous result:

E[T2 ] = (1 − a2 ) · (1/µ21 ) + a2 · (1/µ21 + 1/µ22 )   (20)

Now the production rate PR1 of machine M1 is the multiplicative inverse
of the average processing time of this machine times the probability 1 − pB =
1 − (p(N, B, 1) + p(N, B, 2)) of not being blocked:

PR1 = (1 − pB ) / E[T1 ] = (1 − pB ) / [(1 − a1 ) · (1/µ11 ) + a1 · (1/µ11 + 1/µ12 )]   (21)

This leads to an equation for the probability of the machine being blocked:

pB = 1 − PR1 · [(1 − a1 ) · (1/µ11 ) + a1 · (1/µ11 + 1/µ12 )]   (22)

For the downstream machine the FRIT-equation

PR2 = (1 − pS ) / E[T2 ] = (1 − pS ) / [(1 − a2 ) · (1/µ21 ) + a2 · (1/µ21 + 1/µ22 )]   (23)

is similar and it also leads to a similar equation for the probability of the downstream
machine being starved:

pS = 1 − PR2 · [(1 − a2 ) · (1/µ21 ) + a2 · (1/µ21 + 1/µ22 )]   (24)

While equations (21) and (23) can be used to determine the production rate of
a two-machine system, the equations (22) and (24) will later be used in a decom-
position approach to analyze larger flow lines with more than two machines.
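In code, the FRIT relations take a very compact form (a small sketch of our own; the production rates are assumed to be known, e.g. from the solution of the two-machine system):

```python
# Our own illustrative transcription of Eqs. (19)/(20), (22), and (24).
def expected_service_time(mu1, mu2, a):
    """Mean Cox-2 service time, Eqs. (19)/(20)."""
    return (1 - a) / mu1 + a * (1 / mu1 + 1 / mu2)

def blocking_prob(PR1, mu11, mu12, a1):
    """Probability that M1 is blocked, Eq. (22)."""
    return 1 - PR1 * expected_service_time(mu11, mu12, a1)

def starvation_prob(PR2, mu21, mu22, a2):
    """Probability that M2 is starved, Eq. (24)."""
    return 1 - PR2 * expected_service_time(mu21, mu22, a2)
```

For instance, in the purely exponential special case (a1 = 0, unit rates) a production rate of 0.8 implies a blocking probability of 0.2 via (22).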

3.3 Derivation of the solution

In this section, we derive a specialized solution procedure similar to the one given
in [22].

3.3.1 Analysis of internal states


Following the basic approach in Gershwin and Berman, we assume that the internal
equations (5)–(8) have a solution of the form
p[n, α1 , α2 ] = Σ_{j=1}^{J} cj ξj (n, α1 , α2 ) = Σ_{j=1}^{J} cj Xj^n Y1j^{α1−1} Y2j^{α2−1}   (25)

where cj , Xj , Y1j , and Y2j are parameters to be determined. The analysis below
is very similar to the one in [22] and [26, Sect. 3.2.4]. Replacing p(n, α1 , α2 ) by
Xj^n Y1j^{α1−1} Y2j^{α2−1} in Equations (6), (7) and (8), we derive the following non-linear
set of equations:
(µ11 + µ22 )XY2 = a2 µ21 X + (1 − a1 )µ11 Y2 + µ12 Y1 Y2 (26)
(µ12 + µ21 )Y1 = a1 µ11 + (1 − a2 )µ21 XY1 + µ22 XY1 Y2 (27)
(µ12 + µ22 )Y1 Y2 = a2 µ21 Y1 + a1 µ11 Y2 (28)
Equations (26) and (27) are used to eliminate X. From the resulting equation
and (28) we can next eliminate Y2 . A considerable algebraic effort leads to the
following fourth degree equation in Y1

a2 µ21 (µ12 Y1 − a1 µ11 )(Y1^3 + sY1^2 + tY1 + v) = 0   (29)

with auxiliary variables s, t, v, and w defined as follows:


w = µ21 (a2 µ12 − µ12 − µ22 ) (30)

s = (1/w) (µ11 µ12 − µ12^2 + a1 µ11 µ21 + a2 µ11 µ21 − a1 a2 µ11 µ21
          − µ12 µ21 + µ11 µ22 − µ12 µ22 − µ21 µ22 )   (31)

t = (1/w) (a1 µ11 (−µ11 + 2µ12 + µ21 + µ22 ))   (32)

v = −(a1^2 µ11^2 ) / w   (33)
From the first term on the left side of Equation (29) we see that one solution to
(29) is
Y11 = a1 µ11 / µ12   (34)
Applying this result to Equation (28), we find
Y21 = a2 µ21 / µ22   (35)
and from (34) and (35) in (26) or (27) we see that
X1 = 1. (36)
The remaining three solutions to (29) are¹

Y12 = 2 √(−a/3) · cos(φ/3) − s/3   (37)
Y13 = 2 √(−a/3) · cos(φ/3 + 2π/3) − s/3   (38)
Y14 = 2 √(−a/3) · cos(φ/3 + 4π/3) − s/3   (39)

with auxiliary variables

a = (3t − s^2 ) / 3   (40)
b = (2s^3 − 9st + 27v) / 27   (41)
φ = arccos( −b / (2 √(−a^3 / 27)) )   (42)
The corresponding values of Y22 , Y23 , and Y24 are again determined via (28).
The values of X2 , X3 , and X4 are next computed from (26) or (27).
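Since formulas (30)–(42) are easy to mis-transcribe, the following sketch (our own numerical check, reusing the paper's symbol names) evaluates the auxiliary variables and the four roots for sample machine parameters and verifies that the three trigonometric roots indeed satisfy the cubic factor of (29). It assumes a parameter regime in which the cubic has three real roots, which is when the trigonometric formulas apply.

```python
# Our own check of Eqs. (30)-(42): compute the roots of the quartic (29).
import math

def y1_roots(mu11, mu12, a1, mu21, mu22, a2):
    w = mu21 * (a2 * mu12 - mu12 - mu22)                        # Eq. (30)
    s = (mu11 * mu12 - mu12**2 + a1 * mu11 * mu21 + a2 * mu11 * mu21
         - a1 * a2 * mu11 * mu21 - mu12 * mu21 + mu11 * mu22
         - mu12 * mu22 - mu21 * mu22) / w                       # Eq. (31)
    t = a1 * mu11 * (-mu11 + 2 * mu12 + mu21 + mu22) / w        # Eq. (32)
    v = -(a1**2 * mu11**2) / w                                  # Eq. (33)
    a = (3 * t - s**2) / 3                                      # Eq. (40)
    b = (2 * s**3 - 9 * s * t + 27 * v) / 27                    # Eq. (41)
    phi = math.acos(-b / (2 * math.sqrt(-a**3 / 27)))           # Eq. (42)
    radius = 2 * math.sqrt(-a / 3)
    y1_first = a1 * mu11 / mu12                                 # Eq. (34)
    cubic = [radius * math.cos(phi / 3 + k * 2 * math.pi / 3) - s / 3
             for k in range(3)]                                 # Eqs. (37)-(39)
    return y1_first, cubic, (s, t, v)

y11, roots, (s, t, v) = y1_roots(1.0, 1.0, 0.5, 1.0, 1.0, 0.5)
residuals = [y**3 + s * y**2 + t * y + v for y in roots]
```

The residual check confirms the three trigonometric roots to machine precision for the sample parameters, and Vieta's relations (sum of roots = −s) provide an independent cross-check.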
Since we have found four solutions to equations (26), (27), and (28), the general
expression for the steady-state probabilities of the internal states is as follows

p(n, α1 , α2 ) = Σ_{j=1}^{4} cj ξj (n, α1 , α2 ) = Σ_{j=1}^{4} cj Xj^n Y1j^{α1−1} Y2j^{α2−1}   (43)

where we still have to determine the parameters cj .


¹ See [3, Sect. 2.4.2.3, p. 131]

3.3.2 Analysis of boundary states


There is a total of 12 boundary states in the model. The transition equations of four
of them ((1, 2, 1), (1, 2, 2), (N − 1, 1, 2), and (N − 1, 2, 2)) are of internal form
(6) - (8), i.e. their steady-state probabilities can be computed from equation (43)
even though they are boundary states.
Since p(1, 2, 1) and p(1, 2, 2) are given from (43), the corresponding equations
(7) and (8) related to states (1, 2, 1) and (1, 2, 2) constitute a linear system of two
equations in two unknowns p(1, 1, 1) and p(1, 1, 2) with the following solution:
p(1, 1, 1) = [(µ12 + µ21 ) p(1, 2, 1) − (µ21 − a2 µ21 ) p(2, 2, 1) − µ22 p(2, 2, 2)] / (a1 µ11 )   (44)

p(1, 1, 2) = [(µ12 + µ22 ) p(1, 2, 2) − a2 µ21 p(1, 2, 1)] / (a1 µ11 )   (45)
Given p(1, 1, 1), p(1, 1, 2), p(1, 2, 1), and p(1, 2, 2), Equations (1) and (2) can
immediately be used to determine first p(0, 1, S) and next p(0, 2, S) (in this order).
The upper boundary steady-state probabilities are determined in exactly the
same way as now states (N − 1, 1, 2) and (N − 1, 2, 2) are of internal form and we
may compute p(N − 1, 1, 2) and p(N − 1, 2, 2) from (43), then solve (6) and (8)
for p(N − 1, 1, 1) and p(N − 1, 2, 1) to find
p(N − 1, 1, 1) = [(µ11 + µ22 ) p(N − 1, 1, 2) − (µ11 − a1 µ11 ) p(N − 2, 1, 2) − µ12 p(N − 2, 2, 2)] / (a2 µ21 )   (46)

p(N − 1, 2, 1) = [(µ12 + µ22 ) p(N − 1, 2, 2) − a1 µ11 p(N − 1, 1, 2)] / (a2 µ21 ).   (47)
Given p(N − 1, 1, 1) and p(N − 1, 2, 1), we can now (in this order) compute
p(N, B, 1) from equation (9) and finally p(N, B, 2) from equation (10). Consider
again the symmetry of upper and lower boundary values.
Since boundary states are now expressed in terms of internal states, and since
internal states are of the form

4
p(n, α1 , α2 ) = cj ξj (n, α1 , α2 ), (48)
j=1

the equations for boundary states hold for each solution ξj (n, α1 , α2 ) of the equa-
tions for internal states. The equation (45) corresponding to state (1, 1, 2), for
example, leads to
Σ_{j=1}^{4} cj ξj (1, 1, 2) = [(µ12 + µ22 ) Σ_{j=1}^{4} cj ξj (1, 2, 2) − a2 µ21 Σ_{j=1}^{4} cj ξj (1, 2, 1)] / (a1 µ11 )   (49)

Similar equations can be found to determine the terms ξj (n, α1 , α2 ) for the other
boundary state probabilities. The terms ξj (n, α1 , α2 ) corresponding to transient
states are all zero.
Now all steady-state probabilities have been related to equation (43). What
remains to be done is to find appropriate values of the coefficients cj in (43).

3.3.3 Determination of coefficients cj


To determine the four coefficients cj , j = 1, ..., 4, a linear system of four equations in
the four unknowns cj can be solved. The following four equations can be derived by
inserting (43) into the conservation of flow equation (16), the two equations stating
that for every transition from phase one to phase two there is one from phase two
to phase one ((17) and (18)), and the condition (13) that all probabilities sum up to
one:
Conservation of flow

µ11 (1 − a1 ) Σ_{j=1}^{4} cj ξj (0, 1, S) + µ12 Σ_{j=1}^{4} cj ξj (0, 2, S)
+ Σ_{n=1}^{N−1} Σ_{α2=1}^{2} ( µ11 (1 − a1 ) Σ_{j=1}^{4} cj ξj (n, 1, α2 ) + µ12 Σ_{j=1}^{4} cj ξj (n, 2, α2 ) )
− Σ_{n=1}^{N−1} Σ_{α1=1}^{2} ( µ21 (1 − a2 ) Σ_{j=1}^{4} cj ξj (n, α1 , 1) + µ22 Σ_{j=1}^{4} cj ξj (n, α1 , 2) )
− µ21 (1 − a2 ) Σ_{j=1}^{4} cj ξj (N, B, 1) − µ22 Σ_{j=1}^{4} cj ξj (N, B, 2) = 0   (50)
j=1 j=1

Rate of changes from phase one to two equals rate of changes from phase two to
one at machine M1

a1 µ11 ( Σ_{j=1}^{4} cj ξj (0, 1, S) + Σ_{n=1}^{N−1} Σ_{α2=1}^{2} Σ_{j=1}^{4} cj ξj (n, 1, α2 ) )
− µ12 ( Σ_{j=1}^{4} cj ξj (0, 2, S) + Σ_{n=1}^{N−1} Σ_{α2=1}^{2} Σ_{j=1}^{4} cj ξj (n, 2, α2 ) ) = 0   (51)
Rate of transitions from phase one to phase two equals rate of transitions from
phase two to phase one at machine M2:

    a2 µ21 ( Σ_{j=1}^{4} cj ξj(N, B, 1) + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} Σ_{j=1}^{4} cj ξj(n, α1, 1) )
    − µ22 ( Σ_{j=1}^{4} cj ξj(N, B, 2) + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} Σ_{j=1}^{4} cj ξj(n, α1, 2) ) = 0    (52)
Analysis of flow lines with Cox-2-distributed processing times 65
Probabilities sum up to one:

    Σ_{j=1}^{4} cj ξj(0, 1, S) + Σ_{j=1}^{4} cj ξj(0, 2, S)
    + Σ_{n=1}^{N−1} Σ_{α1=1}^{2} Σ_{α2=1}^{2} Σ_{j=1}^{4} cj ξj(n, α1, α2)
    + Σ_{j=1}^{4} cj ξj(N, B, 1) + Σ_{j=1}^{4} cj ξj(N, B, 2) = 1    (53)

Note that the right hand side of the three of the four equations is zero. For this
reason, it is relatively painless to solve this linear system of equations in the four
unknowns cj , j = 1..4 numerically.
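Since three of the four right-hand sides vanish, Step 2 reduces to a single call of a standard dense solver. The sketch below is an illustration, not part of the original method: the rows of A would hold the coefficients of c1, ..., c4 assembled from (50)-(53), with right-hand side (0, 0, 0, 1).

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Pivot on the largest entry in the current column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x
```

For the 4 × 4 systems arising here, partial pivoting is sufficient and no sparsity needs to be exploited.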

3.4 The algorithm to determine steady-state probabilities and performance measures

The algorithm to compute the required steady-state probabilities p(n, α1, α2) and
performance measures PR and n̄ consists of the following steps:
1. Compute auxiliary variables w, s, t, v, a, b, and φ from (30)-(33) and (40)-(42).
Compute Y11 from (34) and Y12 ...Y14 from (37)-(39). Compute Y21 ...Y24 from
(28) and X1 ...X4 from (26) or (27).
2. Determine the coefficients cj , j = 1, ..., 4 in Equation (43) by solving the linear
system of equations given by (50)-(53).
3. Use the cj from Step 2 to compute the required steady-state probabilities
p(n, α1 , α2 ) of states of internal form via (43) and those of the remaining
boundary states as described in Section 3.3.2.
4. Determine performance measures. Determine the production rate from (14) or
(15), the in-process inventory via

    n̄ = Σ_{n=1}^{N−1} Σ_{α1=1}^{2} Σ_{α2=1}^{2} n p(n, α1, α2) + N (p(N, B, 1) + p(N, B, 2))    (54)
and blocking and starvation probabilities pB and pS via
pB = p(N, B, 1) + p(N, B, 2) (55)
pS = p(0, 1, S) + p(0, 2, S). (56)
This algorithm proved to be numerically stable and it was used as a building
block within the decomposition approach employed to analyze flow lines with more
than two machines.
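Step 4 can be illustrated by a small helper that evaluates (54)-(56) once the steady-state probabilities are available. The sketch below assumes (our choice, not the paper's data structure) that probabilities are stored in a dictionary keyed by the state tuples (n, α1, α2), with 'S' and 'B' marking starvation and blocking:

```python
def performance_measures(p, N):
    """Average inventory (54) and blocking/starvation probabilities
    (55)-(56) from steady-state probabilities.  `p` maps states
    (n, alpha1, alpha2) to probabilities; missing states count as 0."""
    # Internal states contribute n * p(n, alpha1, alpha2), equation (54).
    n_bar = sum(n * p.get((n, a1, a2), 0.0)
                for n in range(1, N)
                for a1 in (1, 2)
                for a2 in (1, 2))
    # Upper boundary (blocking) states, equations (54) and (55).
    p_blocked = p.get((N, 'B', 1), 0.0) + p.get((N, 'B', 2), 0.0)
    n_bar += N * p_blocked
    # Lower boundary (starvation) states, equation (56).
    p_starved = p.get((0, 1, 'S'), 0.0) + p.get((0, 2, 'S'), 0.0)
    return n_bar, p_blocked, p_starved
```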
4 The decomposition approach

4.1 Derivation of decomposition equations

While it is possible to analyze a two-machine system exactly, the exact analysis of
larger systems is practically impossible as the state space of the system explodes
very quickly. For this reason decomposition approaches are frequently used to
analyze larger systems. The basic idea is to decompose a system with K machines
and K − 1 buffers into K − 1 two-machine systems with virtual machines that
mimic to an observer in the buffer the flow of material in and out of this buffer
as it would be seen in the corresponding buffer of the real system. We followed
the ideas presented in great detail in [21] to develop an iterative decomposition
algorithm to analyze flow lines with more than two machines. However, some
modifications were necessary which we will now briefly outline. While the models
analyzed in [21] assumed unreliable machines and consequently lead to so-called
interruption-of-flow- and resumption-of-flow-equations, we are studying a flow line
with reliable machines which cannot fail. The machines in our system, however,
change their phases of operation as described in Section 2. For this reason, we
derived the following three types of decomposition equations:
– Phase-One-to-Two(P1t2)-Equation: This type of equation deals with the
probability of the transition of the virtual machine from operating in its first
phase to its second.
– Phase-Two-to-One(P2t1)-Equation: This type of equation deals with the
probability of the transition of the virtual machine from operating in its second
phase to its first.
– Flow-Rate-Idle-Time(FRIT)-Equation: This is a type of equation which re-
lates the flow of material through a machine to its isolated production rate and
its probability of being blocked and starved. This type of equation has also been
used by Gershwin et al.
In the following we will briefly discuss the derivation of the parameters of the
virtual machines.
The key to the derivation of the P1t2- and P2t1-equations is the definition of
virtual machine states. We study a virtual two-machine line L(i) which is related
to the buffer between machines Mi and Mi+1 . The virtual machines of line L(i)
are Mu (i) (upstream of the buffer) and Md (i) (downstream of the buffer). We
want to determine the parameters au (i), µu1 (i), and µu2 (i) of the virtual machine
Mu (i) as well as the parameters ad (i), µd1 (i), and µd2 (i) of the virtual machine
Md (i) in order to be able to use our two-machine model in Section 3 to determine
performance measures for the flow line.
The upstream machine of a two-machine line is never starved (and the down-
stream machine is never blocked). We therefore assume that the virtual machine
Mu (i) is in phase one if the real machine Mi is processing a workpiece in phase
one or when it is waiting for the next workpiece:
{αu (i, t) = 1} iff {αi (t) = 1} or {αi (t) = S} (57)
Machine Mu (i) is in phase two if Mi is in phase two
{αu (i, t) = 2} iff {αi (t) = 2} (58)
and it is blocked if Mi is blocked:
{αu (i, t) = B} iff {αi (t) = B} (59)
The definition of virtual machine states for machine Md (i) is symmetric: Ma-
chine Md (i) is in phase one if the machine Mi+1 downstream of the buffer number
i is in phase one or blocked:
{αd (i, t) = 1} iff {αi+1 (t) = 1} or {αi+1 (t) = B} (60)
It is in phase two if machine Mi+1 in the real system is in phase two
{αd (i, t) = 2} iff {αi+1 (t) = 2} (61)
and starved if Mi+1 is starved:
{αd (i, t) = S} iff {αi+1 (t) = S} (62)
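The definitions (57)-(62) amount to a simple mapping from real-machine states to virtual-machine states, which can be sketched as follows (illustrative only; states are encoded as 1, 2, 'S', 'B' as in the text):

```python
def upstream_virtual_state(alpha_i):
    """Map the state of real machine M_i to virtual machine M_u(i),
    following (57)-(59): starvation of M_i is folded into phase one,
    since the upstream machine of a two-machine line is never starved."""
    if alpha_i in (1, 'S'):
        return 1
    if alpha_i == 2:
        return 2
    return 'B'          # alpha_i == 'B'

def downstream_virtual_state(alpha_next):
    """Map the state of real machine M_{i+1} to M_d(i) per (60)-(62):
    blocking of M_{i+1} is folded into phase one, since the downstream
    machine of a two-machine line is never blocked."""
    if alpha_next in (1, 'B'):
        return 1
    if alpha_next == 2:
        return 2
    return 'S'          # alpha_next == 'S'
```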
Phase-One-to-Two (P1t2)-Equation: To derive the P1t2-equation for machine
Mu (i), we ask for the probability of observing a transition of the virtual machine
Mu (i) from phase one to phase two. For this to happen, we have to observe a
completion of phase one (with probability µu1 (i)δt) and the process must enter the
second phase, which happens with probability au (i).
au (i)µu1 (i)δt = Prob[{αu (i, t + δt) = 2}|{αu (i, t) = 1}] (63)
The joint probability au (i)µu1 (i)δt can be related to a change in the machine
states defined above if we insert the definitions of the virtual machine states given
in (57) and (58):
au (i)µu1 (i)δt = Prob[{αu (i, t + δt) = 2}|{αu (i, t) = 1}]
= Prob[{αi (t + δt) = 2}|{αi (t) = 1} or {αi (t) = S}]
= Prob[{αi (t + δt) = 2}|{αi (t) = 1}] ·
Prob[{αi (t) = 1}|{αi (t) = 1} or {αi (t) = S}]
+ Prob[{αi (t + δt) = 2}|{αi (t) = S}] ·
Prob[{αi (t) = S}|{αi (t) = 1} or {αi (t) = S}]
au (i)µu1 (i) ≈ ai µi1 Prob[n(i − 1, t) > 0] (64)
In the above derivation, the probability of machine Mi being in phase two
at time t + δt, given that it was starved at time t, is zero. However, the rest of
this derivation is still only a (possibly crude) approximation since the conditional
probability Prob[{αi (t) = 1}|{αi (t) = 1} or {αi (t) = S}] of machine Mi being
in phase one given that it is either in phase one or starved is simply approximated
by the probability Prob[n(i − 1, t) > 0] of machine Mi not being starved. This is
crude since if it is not starved, it can still be in phase two or blocked. The reasoning
behind this crude approximation is that if machine Mi is in phase one, we at least
know that it cannot be starved, and the probability of this state is related to the
probability of machine Md (i − 1) not being starved. While there is no stronger
analytical justification for this substitution, it appears to work well in the numerical
algorithm to be described below.
The basic approach to derive the probability of a transition from phase one to
two at the virtual machine Md (i) is similar:
ad (i)µd1 (i)δt = Prob[{αd (i, t + δt) = 2}|{αd (i, t) = 1}] (65)
We again insert the definition of virtual machine states and find
ad (i)µd1 (i)δt = Prob[{αd (i, t + δt) = 2}|{αd (i, t) = 1}]
= Prob[{αi+1 (t + δt) = 2}|{αi+1 (t) = 1} or {αi+1 (t) = B}]
= Prob[{αi+1 (t + δt) = 2}|{αi+1 (t) = 1}] ·
Prob[{αi+1 (t) = 1}|{αi+1 (t) = 1} or {αi+1 (t) = B}]
+ Prob[{αi+1 (t + δt) = 2}|{αi+1 (t) = B}] ·
Prob[{αi+1 (t) = B}|{αi+1 (t) = 1} or {αi+1 (t) = B}]
ad (i)µd1 (i) ≈ ai+1 µi+1,1 Prob[n(i + 1, t) < N ] (66)
where we again substitute in an admittedly crude way the conditional probability
of machine Mi+1 being in phase one given that it is either in phase one or blocked
by the probability Prob[n(i + 1, t) < N] of machine Mi+1 not being blocked.
Phase-Two-to-One (P2t1)-Equation: A transition of the virtual machine Mu (i)
from state two to state one or to being blocked can only occur if the real machine
Mi completes the second phase of operation on a workpiece:
µu2 (i)δt = Prob[{αu (i, t + δt) = 1} or {αu (i, t + δt) = B}|
{αu (i, t) = 2}] (67)
If we insert the definition of the virtual machine states given in (57), (58) and
(59), we get
µu2 (i)δt = Prob[{αu (i, t + δt) = 1} or {αu (i, t + δt) = B}|{αu (i, t) = 2}]
= Prob[{αi (t + δt) = 1} or {αi (t + δt) = S} or
{αi (t + δt) = B}|{αi (t) = 2}] = µi2 δt (68)
and eventually

    µu2 (i) = µi2.    (69)
In this derivation, the equation (68) holds because a part must complete its phase
two (which happens with probability µi2 δt) in order for machine Mi to reach states
1, S, or B.
The reasoning for machine Md (i) is analogous:
µd2 (i)δt = Prob[{αd (i, t + δt) = 1} or {αd (i, t + δt) = S}|
{αd (i, t) = 2}] (70)
This leads to the following result:
    µd2 (i) = µi+1,2    (71)
FRIT-Equation: The Flow-Rate-Idle-Time-Equation is the third type of decomposition
equation. It states that the production rate PRi of machine Mi in the real
system is the probability that this machine is neither blocked nor starved divided
by the average processing time of this machine:

    PRi = Prob[{ni(t) > 0} and {ni+1(t) < Ni+1}] / [ (1 − ai)(1/µi1) + ai (1/µi1 + 1/µi2) ]    (72)

The probability of machine Mi in the real system not being blocked or starved
is approximated as follows:

    Prob[{ni(t) > 0} and {ni+1(t) < Ni+1}] ≈ 1 − Prob[{ni(t) = 0}] − Prob[{ni+1(t) = Ni+1}]    (73)
This is only an approximation since Mi can be both blocked and starved. These
probabilities are unknown. However, we can use equations (22) and (24)
from the two-machine model to approximate these quantities if we decompose the
real system into a set of two-machine lines. This leads to the following equation:
    PRi ≈ [1 − pS(i) − pB(i + 1)] / [ (1 − ai)(1/µi1) + ai (1/µi1 + 1/µi2) ]

        = PR2(i) [ (1 − ad(i))(1/µd1(i)) + ad(i) (1/µd1(i) + 1/µd2(i)) ] / [ (1 − ai)(1/µi1) + ai (1/µi1 + 1/µi2) ]

        + PR1(i + 1) [ (1 − au(i + 1))(1/µu1(i + 1)) + au(i + 1) (1/µu1(i + 1) + 1/µu2(i + 1)) ] / [ (1 − ai)(1/µi1) + ai (1/µi1 + 1/µi2) ]

        − 1 / [ (1 − ai)(1/µi1) + ai (1/µi1 + 1/µi2) ]    (74)
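The denominator that appears throughout (72)-(74) is the mean of a Cox-2 processing time. A small sketch (ours, for illustration) of this mean and of the resulting flow-rate-idle-time relation (72):

```python
def cox2_mean_time(a, mu1, mu2):
    """Mean of a Cox-2 processing time: phase one always occurs
    (rate mu1); phase two (rate mu2) follows with probability a."""
    return (1.0 - a) / mu1 + a * (1.0 / mu1 + 1.0 / mu2)

def production_rate(a, mu1, mu2, p_starved, p_blocked):
    """Flow-rate-idle-time relation (72) with the sum approximation
    (73): throughput is the probability of being neither starved nor
    blocked divided by the mean processing time."""
    return (1.0 - p_starved - p_blocked) / cox2_mean_time(a, mu1, mu2)
```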
Because of the conservation-of-flow equation, the following condition should
be met by any decomposition of the original flow line into a set of two-machine
lines:

    PRi = PR(i) = PR(i + 1)    (75)

These equations will be used when we solve the decomposition equations.
4.2 Simultaneous solution of the decomposition equations

Equations (64), (69) and (74) can be solved simultaneously for the parameters
µu1 (i), µu2 (i), and au (i) of the upstream machine Mu (i) related to line L(i) of
the decomposition to find:
    au(i) = ai [ µd1(i − 1) µi1 µd2(i − 1) µi2
              + ai µd1(i − 1) µi1 µd2(i − 1) PR(i − 1)
              + µd1(i − 1) µd2(i − 1) µi2 PR(i − 1)
              − ad(i − 1) µd1(i − 1) µi1 µi2 PR(i − 1)
              − µi1 µd2(i − 1) µi2 PR(i − 1) ] (1 − pS(i − 1))
            · 1 / [ µd1(i − 1) µd2(i − 1) PR(i − 1) (µi2 + ai µi1 (1 − pS(i − 1))) ]    (76)

    µu1(i) = [ µd1(i − 1) µi1 µd2(i − 1) PR(i − 1) (µi2 + ai µi1 (1 − pS(i − 1))) ] /
             [ µd1(i − 1) µi1 µd2(i − 1) µi2
              + ai µd1(i − 1) µi1 µd2(i − 1) PR(i − 1)
              + µd1(i − 1) µd2(i − 1) µi2 PR(i − 1)
              − ad(i − 1) µd1(i − 1) µi1 µi2 PR(i − 1)
              − µi1 µd2(i − 1) µi2 PR(i − 1) ]    (77)

    µu2(i) = µi2    (78)
In Eqs. (76) and (77), the expressions PRi and PR(i) have been replaced by
PR(i − 1), which is allowed because of conservation of flow. Now all three parameters
of the virtual upstream machine Mu(i) are expressed in terms of parameters
of the real machine Mi or parameters or performance measures of line L(i − 1).
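A direct transcription of (76)-(78) might look as follows (a sketch under our own argument naming; a_d, mu_d1, mu_d2, PR and p_S are the downstream-machine parameters, production rate and starvation probability of line L(i − 1)):

```python
def update_upstream(a_i, mu_i1, mu_i2, a_d, mu_d1, mu_d2, PR, p_S):
    """Update the parameters of virtual machine M_u(i) via (76)-(78).
    All right-hand-side quantities refer to real machine M_i and to
    line L(i-1) of the decomposition."""
    # Common bracketed expression of (76) and (77).
    bracket = (mu_d1 * mu_i1 * mu_d2 * mu_i2
               + a_i * mu_d1 * mu_i1 * mu_d2 * PR
               + mu_d1 * mu_d2 * mu_i2 * PR
               - a_d * mu_d1 * mu_i1 * mu_i2 * PR
               - mu_i1 * mu_d2 * mu_i2 * PR)
    denom = mu_d1 * mu_d2 * PR * (mu_i2 + a_i * mu_i1 * (1.0 - p_S))
    a_u = a_i * bracket * (1.0 - p_S) / denom                     # (76)
    mu_u1 = (mu_d1 * mu_i1 * mu_d2 * PR
             * (mu_i2 + a_i * mu_i1 * (1.0 - p_S)) / bracket)     # (77)
    mu_u2 = mu_i2                                                 # (78)
    return a_u, mu_u1, mu_u2
```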
In exactly the same way the parameters for the downstream machine Md (i) can
be determined:
    ad(i) = ai+1 [ µi+1,1 µu1(i + 1) µi+1,2 µu2(i + 1)
              + ai+1 µi+1,1 µu1(i + 1) µu2(i + 1) PR(i + 1)
              + µu1(i + 1) µi+1,2 µu2(i + 1) PR(i + 1)
              − au(i + 1) µi+1,1 µu1(i + 1) µi+1,2 PR(i + 1)
              − µi+1,1 µi+1,2 µu2(i + 1) PR(i + 1) ] (1 − pB(i + 1))
            · 1 / [ µu1(i + 1) µu2(i + 1) PR(i + 1) (µi+1,2 + ai+1 µi+1,1 (1 − pB(i + 1))) ]    (79)

    µd1(i) = [ µi+1,1 µu1(i + 1) µu2(i + 1) PR(i + 1) (µi+1,2 + ai+1 µi+1,1 (1 − pB(i + 1))) ] /
             [ µi+1,1 µu1(i + 1) µi+1,2 µu2(i + 1)
              + ai+1 µi+1,1 µu1(i + 1) µu2(i + 1) PR(i + 1)
              + µu1(i + 1) µi+1,2 µu2(i + 1) PR(i + 1)
              − au(i + 1) µi+1,1 µu1(i + 1) µi+1,2 PR(i + 1)
              − µi+1,1 µi+1,2 µu2(i + 1) PR(i + 1) ]    (80)

    µd2(i) = µi+1,2    (81)
Note that again all three parameters of the virtual downstream machine Md (i)
are expressed in terms of parameters of the real machine Mi+1 or parameters or
performance measures of line L(i + 1).
4.3 Decomposition algorithm

We used an iterative algorithm to solve the decomposition equations numerically.
No proof of convergence or accuracy can be given for this algorithm, as for many
similar algorithms for flow line decomposition. It consists of the following steps:
1. Initialization: The initial parameters for the M − 1 two-machine lines arising
in the decomposition of a flow line with M machines are given as follows:
au (i) := ai , i = 1, . . . , M − 1
µu1 (i) := µi1 , i = 1, . . . , M − 1
µu2 (i) := µi2 , i = 1, . . . , M − 1
ad (i) := ai+1 , i = 1, . . . , M − 1
µd1 (i) := µi+1,1 , i = 1, . . . , M − 1
µd2 (i) := µi+1,2 , i = 1, . . . , M − 1
If Ci is the number of buffer spaces between machines Mi and Mi+1 , the
extended buffer size N (i) is
N (i) := Ci + 2. (82)
It includes the workspace at machine Mi+1 (and also the one at Mi if this
machine should be blocked). Given these parameters for the M − 1 virtual
two-machine lines, the production rate P R(i), the average inventory level n̄(i)
and the probabilities of blocking pB (i) and starvation pS (i) can be computed
using the algorithm in Section 3.4.
2. Iteration:
(a) Downstream phase: For line l = 2, . . . , M − 1, update parameters
au (l), µu1 (l), and µu2 (l) via equations (76), (77) and (78). Compute new
performance measures P R(l), n̄(l), pB (l), and pS (l).
(b) Upstream phase: For line l = M − 2, . . . , 1, update parameters ad (l),
µd1 (l), and µd2 (l) via equations (79), (80) and (81). Compute new perfor-
mance measures P R(l), n̄(l), pB (l), and pS (l).
(c) Accuracy check: If the condition

        | PR(l) − PR(l + 1) | / PR(l) < 0.000001,    l = 1, . . . , M − 1    (83)

    holds, terminate the algorithm because the conservation of flow condition
    is met by the result of the decomposition. Otherwise, goto step 2a.
(The algorithm also terminates if no convergence should be reached after 50
iterations.)
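Step 1 and the accuracy check (83) can be sketched as follows (illustrative; evaluating each two-machine line inside the iteration would call the algorithm of Section 3.4, which is not reproduced here):

```python
def initialize_lines(a, mu1, mu2, C):
    """Step 1 of the decomposition algorithm for M machines:
    a, mu1, mu2 hold the real-machine parameters (index 0 = machine 1),
    C[i] is the number of buffer spaces between machines i+1 and i+2."""
    M = len(a)
    lines = []
    for i in range(M - 1):
        lines.append({
            'a_u': a[i],      'mu_u1': mu1[i],     'mu_u2': mu2[i],
            'a_d': a[i + 1],  'mu_d1': mu1[i + 1], 'mu_d2': mu2[i + 1],
            'N': C[i] + 2,    # extended buffer size, equation (82)
        })
    return lines

def flow_conserved(PR, tol=1e-6):
    """Termination test (83): the relative production-rate mismatch of
    adjacent two-machine lines must fall below the tolerance."""
    return all(abs(PR[l] - PR[l + 1]) / PR[l] < tol
               for l in range(len(PR) - 1))
```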
Table 2. Three-stage system with general service times

Case          2        3        4        7
µ1            0.5      0.5      0.5      0.5
µ2            0.5      0.5      0.5      1.0
µ3            0.5      0.5      0.5      0.5
c1²           0.5      0.8      2.0      0.6
c2²           0.5      0.8      2.0      0.6
c3²           0.5      0.8      2.0      0.6
Sim. PR       0.382    0.351    0.296    0.427
BLS-a (abs.)  0.384    0.347    0.272    0.441
BLS-a (rel.)  0.52%    −1.14%   −8.11%   3.28%
BLS-b (abs.)  0.381    0.349    0.282    0.429
BLS-b (rel.)  −0.26%   −0.57%   −4.73%   0.47%
CoxDC (abs.)  0.380    0.349    0.298    0.443
CoxDC (rel.)  −0.55%   −0.48%   0.61%    3.82%
5 Numerical results and conclusion

Since the analysis of the two-machine model is exact, the only interesting question
with respect to the two-machine algorithm is whether it is numerically stable. The results
reported in [8] indicated that their method tended to have difficulties with larger
buffer sizes. We did not observe such instabilities for the buffer sizes they considered
(up to 100 buffer spaces).
In order to evaluate the accuracy of the decomposition algorithm, we compared
it to results given in [9]. In these cases, the expected value and the squared
coefficient of variation of the processing time for each machine were given. The
Cox-2-distribution, however, has three parameters, so that one degree of freedom
is left. We used the so-called "balanced-mean" two-phase Coxian distribution [10,
p. 542] to match the problem data given in [9, pp. 450–451].
Table 2 gives parameters and results for a three-stage system with general
service times and one buffer space between adjacent machines. (In the paper by
Buzacott et al. [9], the buffer space includes the workspace at the downstream
machine. If there is just one buffer space between two adjacent machines, our
general approach to determine steady-state probabilities as described in Section 3.4
cannot be applied since there are no “internal states” in terms of our two-machine
model. However, such a system has only 10 states which are all either lower or
upper boundary states. It is trivial to determine the steady-state probabilities and
performance measures for such a tiny system and we therefore simply added the
required algorithm to our decomposition approach in order to be able to deal with
two-machine lines with just one buffer space between adjacent machines.) The
entries “BLS-a” and “BLS-b” are related to two approaches described in [9], “Sim”
denotes the simulation results and “CoxDC” the results from our approach. For the
system in Table 2 our approach gives comparable results.
Table 3. Four-stage system with exponential service times

Case          1        2        3        4
µ1            1.0      1.0      1.0      1.0
µ2            1.1      1.2      1.5      2.0
µ3            1.2      1.4      2.0      3.0
µ4            1.3      1.6      2.5      4.0
Exact PR      0.71     0.765    0.861    0.929
BLS-a (abs.)  0.689    0.746    0.85     0.925
BLS-a (rel.)  −2.96%   −2.48%   −1.28%   −0.43%
BLS-b (abs.)  0.7      0.756    0.855    0.927
BLS-b (rel.)  −1.41%   −1.18%   −0.70%   −0.22%
CoxDC (abs.)  0.712    0.767    0.862    0.930
CoxDC (rel.)  0.29%    0.24%    0.07%    0.09%
Table 4. Three-stage system with general service times

Case           1        2        3
µ1             0.5      0.5      0.5
µ2             0.5      0.5      0.5
µ3             0.5      0.5      0.5
c1²            0.75     2.0      2.0
c2²            0.75     2.0      2.0
c3²            0.75     2.0      2.0
Sim. PR        0.385    0.322    0.360
BLS-a (abs.)   0.385    0.303    0.345
BLS-a (rel.)   0.00%    −5.90%   −4.17%
BLS-b (abs.)   0.385    0.312    0.349
BLS-b (rel.)   −0.00%   −3.11%   −3.06%
Altiok (abs.)  0.368    0.338    0.368
Altiok (rel.)  −4.42%   4.97%    2.22%
CoxDC (abs.)   0.386    0.327    0.360
CoxDC (rel.)   0.26%    1.40%    −0.10%
For the four-stage systems in Table 3 with exponential service times our ap-
proach is more accurate than the procedures proposed in [9]. There is again just
one buffer space between adjacent machines in these cases.
Table 4 presents results for systems given by Altiok as cited in [9]. In all cases
there are two buffer spaces between adjacent machines except for Case 3 with 9
buffer spaces between machines 2 and 3 (and two between machines 1 and 2). For
these systems our approach outperforms the other methods.
Table 5. Eight-stage systems with general service times

Case          1       2       3       4       5       6       7       8
µ1            1.0     1.0     1.0     1.0     1.2     1.2     1.2     1.2
µ2            1.0     1.0     1.0     1.0     1.3     1.3     1.3     1.3
µ3            1.0     1.0     1.0     1.0     1.1     1.1     1.1     1.1
µ4            1.0     1.0     1.0     1.0     0.8     0.8     0.8     0.8
µ5            1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
µ6            1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
µ7            1.0     1.0     1.0     1.0     1.1     1.1     1.1     1.1
µ8            1.0     1.0     1.0     1.0     0.9     0.9     0.9     0.9
c1²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c2²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c3²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c4²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c5²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c6²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c7²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
c8²           0.5     0.5     2.0     2.0     0.5     0.5     2.0     2.0
Buffer sizes  1       10      1       10      1       10      1       10
Sim. PR       0.683   0.918   0.462   0.760   0.661   0.799   0.461   0.723
CoxDC (abs.)  0.688   0.923   0.494   0.784   0.663   0.799   0.486   0.736
CoxDC (rel.)  0.62%   0.54%   6.85%   3.26%   0.28%   0.03%   5.44%   1.77%
We finally study some eight-stage systems in Table 5. The numerical results
indicate that quite often the proposed algorithm is rather accurate. However, in
cases with a high degree of variability of the processing times (squared coefficient
of variation of 2.0) and small buffer sizes (one buffer space between adjacent
machines), the approximation quality deteriorates, while the convergence of the
algorithm still appears to be quick and reliable.

Given the numerical results we conclude that our decomposition approach can
be used to analyze flow lines with general service times as long as these service
times exhibit a squared coefficient of variation larger than 0.5.
References

1. Altiok T (1996) Performance analysis of manufacturing systems. Springer, Berlin
Heidelberg New York
2. Artamonov G (1977) Productivity of a two-instrument discrete processing line in the
presence of failures. Cybernetics 12: 464–468
3. Bronstein IN, Semendjajew KA (1983) Taschenbuch der Mathematik, 21st edn. Teub-
ner, Leipzig
4. Burman MH (1995) New results in flow line analysis. PhD thesis, Massachusetts
Institute of Technology. Also available as Report LMP-95-007, MIT Laboratory for
Manufacturing and Productivity
5. Buzacott JA (1967) Automatic transfer lines with buffer stocks. International Journal
of Production Research 5(3): 183–200
6. Buzacott J (1972) The effect of station breakdowns and random processing times on
the capacity of flow lines. AIIE Transactions 4: 308–312
7. Buzacott JA, Hanifin LE (1978) Models of automatic transfer lines with inventory
banks – a review and comparison. AIIE Transactions 10(2): 197–207
8. Buzacott JA, Kostelski D (1987) Matrix-geometric and recursive algorithm solution of
a two-stage unreliable flow line. IIE Transactions 19(4): 429–438
9. Buzacott JA, Liu XG, Shanthikumar JG (1995) Multistage flow line analysis with the
stopped arrival queue model. IIE Transactions 27(4): 444–455
10. Buzacott JA, Shanthikumar JG (1993) Stochastic models of manufacturing systems.
Prentice Hall, Englewood Cliffs, NJ
11. Buxey G, Slack N, Wild R (1973) Production flow line system design – a review. AIIE
Transactions 5: 37–48
12. Choong Y, Gershwin SB (1987) A decomposition method for the approximate eval-
uation of capacitated transfer lines with unreliable machines and random processing
times. IIE Transactions 19: 150–159
13. Dallery Y, David R, Xie XL (1988) An efficient algorithm for analysis of transfer lines
with unreliable machines and finite buffers. IIE Transactions 20(3): 280–283
14. Dallery Y, David R, Xie XL (1989) Approximate analysis of transfer lines with un-
reliable machines and finite buffers. IEEE Transactions on Automatic Control 34(9):
943–953
15. Dallery Y, Gershwin SB (1992) Manufacturing flow line systems: a review of models
and analytical results. Queuing Systems Theory and Applications 12(1–2): 3–94
16. Di Mascolo M, David R, Dallery Y (1991) Modeling and analysis of assembly systems
with unreliable machines and finite buffers. IIE Transactions 23(4): 315–330
17. Gaver DP (1962) A waiting line with interrupted service, including priorities. Journal
of the Royal Statistical Society 24: 73–90
18. Gershwin SB (1987) An efficient decomposition algorithm for the approximate eval-
uation of tandem queues with finite storage space and blocking. Operations Research
35: 291–305
19. Gershwin SB (1989) An efficient decomposition algorithm for unreliable tandem
queueing systems with finite buffers. In: Perros G, Altiok T (eds) Queueing networks
with blocking, pp 127–146. North Holland, Amsterdam
20. Gershwin SB (1991) Assembly/disassembly systems: An efficient decomposition al-
gorithm for tree-structured networks. IIE Transactions 23(4): 302–314
21. Gershwin SB (1994) Manufacturing systems engineering. Prentice Hall, Englewood
Cliffs, NJ
22. Gershwin SB, Berman O (1981) Analysis of transfer lines consisting of two unreliable
machines with random processing times and finite storage buffers. AIIE Transactions
13(1): 2–11
23. Gershwin SB, Schick I (1980) Continuous model of an unreliable two-stage material
flow system with a finite interstage buffer. Technical Report LIDS-R-1039, Mas-
sachusetts Institute of Technology, Cambridge, MA
24. Gershwin SB, Schick I (1983) Modeling and analysis of three-stage transfer lines with
unreliable machines and finite buffers. Operations Research 31(2): 354–380
25. Helber S (1998) Decomposition of unreliable assembly/dissassembly networks with
limited buffer capacity and random processing times. European Journal of Operational
Research 109(1): 24–42
26. Helber S (1999) Performance analysis of flow lines with non-linear flow of material.
Springer, Berlin Heidelberg New York
27. Hillier F, Boling RW (1967) Finite queues in series with exponential or Erlang service
times – a numerical approach. Operations Research 16: 286–303
28. Koenigsberg E (1959) Production lines and internal storage – a review. Management
Science 5: 410–433
29. Okamura K, Yamashina H (1977) Analysis of the effect of buffer storage capacity in
transfer line systems. AIEE Transactions 9: 127–135
30. Papadopoulos HT, Heavey C, Browne J (1993) Queueing theory in manufacturing
systems analysis and design. Chapman & Hall, London
31. Sastry BLN, Awate PG (1988) Analysis of a two-station flow line with machine pro-
cessing subject to inspection and rework. Opsearch 25: 89–97
32. Sevast’yanov BA (1962) Influence of storage bin capacity on the average standstill
time of a production line. Theory of Probability and Its Applications 7: 429–438
33. Wijngaard J (1979) The effect of interstage buffer storage on the output of two unreliable
production units in series, with different production rates. AIIE Transactions 11(1):
42–47
34. Yeralan S, Muth EJ (1987) A general model of a production line with intermediate
buffer and station breakdown. IIE Transactions 19(2): 130–139
35. Zimmern B (1956) Etudes de la propagation des arrêts aleatoires dans les chaines de
production. Revue de Statistique Appliquée 4: 85–104
Performance evaluation of production lines with finite
buffer capacity producing two different products
M. Colledani, A. Matta, and T. Tolio
Politecnico di Milano, Dipartimento di Meccanica, via Bonardi 9, 20133 Milano, Italy
(e-mail: [email protected])

Abstract. This paper presents an approximate analytical method for the perfor-
mance evaluation of a production line with finite buffer capacity, multiple failure
modes and multiple part types. This paper presents a solution to a class of problems
where flexible machines take different parts to process from distinct dedicated input
buffers and deposit produced parts into distinct dedicated output buffers with finite
capacity. This paper considers the case of two part types processed on the line, but
the method can be extended to the case of n part types. Also, the solution is devel-
oped for deterministic processing times of the machines which are all identical and
are assumed to be scaled to unity. The approach however is amenable of extension
to the case of inhomogeneous deterministic processing times. The proposed method
is based on the approximate evaluation of the performance of the k-machine line
by the evaluation of 2(k-1) two-machine lines. An algorithm inspired by the DDX
algorithm has been developed and the validation of the method has been carried
out by means of testing and comparison with simulation.

Keywords: Flow lines – Performance evaluation – Multiple part types

1 Introduction

Given the increasing flexibility of manufacturing machines and assembly stations, it
is rather frequent that more than one part type is produced on a single production
line. Also, in automated systems, machines are normally connected by accumulating
conveyors which act as finite capacity buffers. Existing analytical techniques do not
allow the modelling of such systems; indeed classical analytical techniques allow
the modelling of multiclass systems but do not consider finite capacity buffers while
approximate analytical techniques developed to model transfer lines do not take into
account different part types. This paper presents a solution procedure to a class of
problems of this type where flexible machines take different parts to process from
Correspondence to: T. Tolio
78 M. Colledani et al.

Fig. 1. Example of a system producing two part types

distinct dedicated input buffers and deposit produced parts into distinct dedicated
output buffers with finite capacity. By dedicated input and output buffers we mean
buffers that can store only one part type. The proposed solution is developed for
the case of two part types, however the approach is amenable to extension to the
multiple part type case. Also, the solution is developed for deterministic processing
times of the machines which are all identical and are assumed to be scaled to unity.
The approach however is amenable to extension to the case of inhomogeneous
deterministic processing times.
A typical system of the proposed class is represented in Figure 1. In this case
machines M1 , M2 , M3 , M6 and M7 are dedicated machines i.e. they can produce
only one part type. On the contrary, machines M4 and M5 are flexible machines
and can process both part types. The selection of which type of part to produce
depends on the state of the system and on a dispatching rule. If the upstream buffer
of one part type is empty or the downstream buffer is full, the machine will produce
the other part type. If both the part types are either blocked or starved, the machine
will not produce. If both part types can be produced, then the machine will produce
part type A with probability αiA and part type B with probability αiB . In the paper,
systems formed only by flexible machines will be considered, but the proposed
method in principle can be extended to the case of systems in which both flexible
and dedicated machines are present, such as the one in Figure 1.
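The dispatching rule described above can be sketched as a small selection function (an illustration, not taken from the paper; the two boolean flags summarize the buffer tests, i.e. input buffer non-empty and output buffer non-full, for each part type):

```python
import random

def select_part_type(can_produce_A, can_produce_B, alpha_A, rng=random):
    """Dispatching rule of a flexible machine: a part type can be
    produced only if it is neither starved nor blocked (encoded in the
    boolean flags).  alpha_A is the probability of choosing type A
    when both types are available."""
    if can_produce_A and can_produce_B:
        # Random choice in the ratios alpha_A : (1 - alpha_A).
        return 'A' if rng.random() < alpha_A else 'B'
    if can_produce_A:
        return 'A'
    if can_produce_B:
        return 'B'
    return None   # machine idles: both types blocked or starved
```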
It is important to notice that the proposed system is quite different from assem-
bly/disassembly systems [2, 1]. Indeed in assembly/disassembly systems assembly
machines take contemporarily different parts from different input buffers to produce
a single subassembly while disassembly machines from one subassembly produce
contemporarily different components that are put into different buffers. In either
case there is no selection of components to work on but they are all contemporarily
involved in the process. On the contrary, in the described system a flexible machine
selects a single component to work on.
The proposed system is also different from fork and join networks [3, 4]. Indeed
in fork and join networks each machine can either take the input from different
buffers to produce an undifferentiated product or take in as input an undifferenti-
ated product and place it after processing in different buffers. On the contrary in the
described system flexible machines take in as input different parts from different
buffers and produce different products placed in the corresponding buffers. In other
words, the identity of the part is not lost within the machine.
The problem presented in this paper has been originally stated by S.B. Gershwin
and addressed by Nemec [6]. The original statement however considers a priority
rule between the parts and therefore when both part types can be produced, the part
Performance evaluation of two part type lines 79

type with the highest priority is selected. This would correspond in our statement
to the case of αA = 1, αB = 0. The solution approach adopted in Nemec is heavily
dependent on the original problem statement because parts are treated differently
depending on the priority. On the contrary, in the proposed approach all the parts
are considered in the same way.
It is interesting to note that the described problem, which has been inspired by automated production systems, is similar to other relevant problems that can be addressed with the same methodology. In particular, it is interesting to consider the case of production networks where different enterprises cooperate to produce complex products. In this case each enterprise of the network can be modelled as a flexible machine, while input and output storages can be modelled as buffers.

2 Assumptions and notations

In this paper we consider transfer lines composed of K machines in which two


distinct part types (type A and type B) are processed in certain ratios. Both part
types follow a linear path through the system since they are processed by all the Mi
machines (with i = 1, ..., K), starting from the first one and finishing with the last
machine, after which they leave the system. Adjacent machines are separated by two different buffers B^A_i and B^B_i with limited capacities, dedicated to temporarily storing parts of types A and B respectively. Buffer capacities between machines M_i and M_{i+1} are denoted by N^A_i and N^B_i for part types A and B respectively. Machine M_i of the system works part type A and part type B in the ratios \alpha^A_i and \alpha^B_i when it is not blocked or starved.
Machines are multiple failure mode machines, i.e. they are unreliable and can fail in F_i different modes, as assumed in [7]; we denote by p_{i,j} the probability of failure of machine M_i in mode j and by r_{i,j} the probability of repair of machine M_i failed in mode j (with j = 1,...,F_i).
A detailed list of the assumptions used in the proposed model is described in
the following; assumptions regard the behavior of the machines and describe in
particular how failures can occur and how machines select the part type to produce
on the basis of blocking and starvation that characterize the part flow in the system.

– In the model, the flow of material through the system is approximated by a


discrete time model.
– The first machine is never starved, i.e. there is an infinite number of pieces of
both part types waiting to be processed by the system.
– The last machine is never blocked, i.e. there is an infinite space downstream the
system where it is always possible to store pieces processed by the system.
– Blocking before service (BBS) is assumed for the machines.
– If buffer B^A_i (B^B_i) is full then machine M_i will process part type B (A) if possible.
– If buffer B^A_{i-1} (B^B_{i-1}) is empty then machine M_i will process part type B (A) if possible.
80 M. Colledani et al.

– If for a given machine both the upstream buffers are not empty and both the downstream buffers are not full, the machine will produce a part of type A with probability \alpha^A_i and a part of type B with probability \alpha^B_i (\alpha^A_i + \alpha^B_i = 1).
– Operation dependent failures are assumed, i.e. machines can fail only while operational: a machine cannot fail if it is down, if it is simultaneously blocked or starved for both part types, or if it is simultaneously starved for part type A (B) and blocked for part type B (A).
– A given machine Mi can fail in Fi different failure modes.
– An operational machine can fail in only one of its failure modes.
– Mean time to failure (MTTF) and mean time to repair (MTTR) are geometrically distributed with average values 1/p_{i,j} and 1/r_{i,j} respectively (i = 1,...,K; j = 1,...,F_i).

3 Outline of the method

The method evaluates the performance measures of the systems described in the
previous section by using a generalization of the decomposition technique proposed
in [7]. The method can also be used in principle with the decomposition technique
proposed in [8]. The analyzed system is decomposed into 2(K − 1) sets of two-
machine lines that together represent the behavior of the system. Each two-machine
line (building block) models the flow of one of the two part types in the system
(Fig. 2). In other words the method creates a two-machine line for each buffer of
the original line; each building block is composed of two pseudo machines and
one intermediary buffer. The upstream pseudo machine represents the behavior
of the portion of the system that precedes, in the original line, the corresponding
buffer considered in the building block. In the same way, the downstream pseudo
machine represents the behavior of the portion of the system that follows, in the
original line, the corresponding buffer considered in the building block. The idea is
to analyze simple building blocks, easy to study with existing techniques, instead of the complex original system. In this way the complexity of the analysis is reduced to studying several two-machine lines instead of a long production line. However, the different two-machine lines are not independent and have to be analyzed by means of decomposition equations. To do this, the parameters of the pseudo machines are calculated so that the flow of parts through the buffers of the decomposed systems closely matches the flow through the corresponding buffers of the original line.
Therefore, for buffers B^A_i and B^B_i of the original line, two building blocks (Fig. 2) are created. The first building block models the flow of type A parts and is composed of the upstream pseudo-machine M^{U(A)}(i), the downstream pseudo-machine M^{D(A)}(i) and the buffer B^A(i). These two pseudo-machines together with the buffer form the building block a(i). The second building block models the flow of type B parts and is composed of the upstream pseudo-machine M^{U(B)}(i), the downstream pseudo-machine M^{D(B)}(i) and the buffer B^B(i). These two pseudo-machines together with the buffer form the building block b(i).
To model the interruptions of flow through the buffers of the original line, failure probabilities of different modes are associated with each pseudo-machine. In the following we consider the case of the upstream pseudo-machines; a similar reasoning applies to the downstream pseudo-machines.
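As a minimal illustration of this bookkeeping, the sketch below enumerates the 2(K-1) building blocks a(i) and b(i) of a K-machine line; the dictionary layout is our own assumption, only the naming follows the text.

```python
# One building block per buffer of the original line: a(i) for the type-A
# buffer B^A(i), b(i) for the type-B buffer B^B(i), i = 1, ..., K-1.
def build_blocks(K):
    """Return the 2(K-1) building blocks of a K-machine two-part-type line."""
    blocks = []
    for i in range(1, K):
        blocks.append({"name": f"a({i})", "part": "A",
                       "upstream": f"M_U(A)({i})", "buffer": f"B_A({i})",
                       "downstream": f"M_D(A)({i})"})
        blocks.append({"name": f"b({i})", "part": "B",
                       "upstream": f"M_U(B)({i})", "buffer": f"B_B({i})",
                       "downstream": f"M_D(B)({i})"})
    return blocks
```

For a four-machine line this yields the six building blocks a(1), b(1), ..., a(3), b(3).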

Fig. 2. Decomposition of the original line

Interruptions of flow due to a failure of the machine M_i of the original line are modelled by assigning to the upstream pseudo-machines local failure modes with probabilities of failure p^{U(A)}_{i,f_i} and p^{U(B)}_{i,f_i} and probabilities of repair r_{i,f_i} (i.e. the same as the ones in the original line, for both part types), with f_i = 1,...,F_i. It must be
noticed that, in multiple product lines, the probabilities of failure in local mode are not the same as the ones of the original flexible machine. They must be increased, considering the probability of the original machine failing while producing the other part type. In fact, given the presence of two part types, a machine, even if it is starved or blocked from the point of view of a given part type, can produce the other part type and can fail while producing it. Therefore, the probabilities of local failures must be adjusted to take this situation into account.
To mimic the interruptions of flow due to starvation, remote failure modes are introduced and assigned to the upstream pseudo-machines of the building blocks, namely M^{U(A)}(i) and M^{U(B)}(i). These remote failures have probabilities of failure p^{U(A)}_{j,f_j} and p^{U(B)}_{j,f_j} and probabilities of repair r^{U(A)}_{j,f_j} and r^{U(B)}_{j,f_j}, where j = 1,...,i-1 indicates the machine of the original line that actually failed (and is therefore responsible for the starvation) and f_j = 1,...,F_j indicates the failure mode in which that machine failed. For these remote failure modes, we assume that the repair probabilities are identical to the repair probabilities of the machine of the original line that actually failed. On the other hand, the probabilities of failure for these remote modes are not known and must be evaluated by using decomposition equations. The described failure modes follow the approach described in [7] to predict the performance of a transfer line producing only one part type.
To model the interactions between the parts competing for the same machines, in addition to the described failure modes, a new failure mode has been introduced and assigned to each pseudo-machine of the building blocks. This new failure mode is called competition failure and mimics the situation in which a machine does not produce a given part type because it is busy producing the other part type. It has probability of failure p^{U(A)}_{i,F_i+1}, p^{U(B)}_{i,F_i+1} and probability of repair r^{U(A)}_{i,F_i+1}, r^{U(B)}_{i,F_i+1} for part types A and B respectively.

In order to estimate the failure and repair probabilities of the competition failure and to adjust the local failure probabilities, it is necessary to study in detail all the states of each machine M_i of the original line, producing two part types.
The solution approach is therefore based on the analysis of all the states in which the machine M_i of the original line can be, and on the solution of the Markov chain representing this machine. In this Markov chain some transition probabilities are not known; however, the probabilities of the starvation and blocking states can be derived by studying the probabilities of the upstream buffers being empty and of the downstream buffers being full. Indeed, by using the decomposition of the original line into building blocks, the flow of material through the buffers in the decomposed lines approximates the flow of material through the buffers in the original line. Therefore, by means of decomposition equations, which are a generalization of the ones derived in [7], the probabilities of these states can be calculated. In the end, it is possible to solve a linear system of equations which yields both the unknown transition probabilities and the probabilities of all the states of the Markov chain.
The probabilities obtained for the various states of the flexible machine M_i are then used to build two separate models, one for each upstream pseudo-machine of the two building blocks (M^{U(A)}(i), M^{U(B)}(i)). By studying these two models it is then possible to calculate the local failure parameters p^{U(A)}_{i,f_i} and p^{U(B)}_{i,f_i}, considering the possibility for each machine of going down due to a failure occurred while processing the other part. In addition, it is possible to find the probabilities of failure and repair of the competition failures p^{U(A)}_{i,F_i+1} and p^{U(B)}_{i,F_i+1}. These parameters completely define the pseudo-machines and allow in turn the evaluation of the building blocks.
In Figure 3 the simplified scheme of the proposed method is presented. In particular, it represents one single iteration of the algorithm, while studying the upstream pseudo-machines.

Fig. 3. Outline of the method



Fig. 4. Markov chain of the flexible machine Mi

4 Detailed description of the method

4.1 Macro states of the original machine

The Markov chain of the machine Mi of the original line is presented in Figure 4.
To simplify the picture, all the states of the same type are grouped into a unique
aggregate state, without considering different failure modes. Each aggregate state
is defined by two state indicators, one referred to part type A and the other referred
to part type B. Each state indicator can assume four values that are, for part type
A: working (W^A), down in local mode (R^A), starved (S^A) if the upstream buffer dedicated to the storage of product A is empty, and blocked (B^A) if the downstream buffer dedicated to the storage of part type A is full. The same can be written for part type B. In total there are 16 possible aggregate states. For each aggregate state, the probability is obtained by adding up the probabilities of all the states of the same type over the failure modes involved; for example, the probability of the aggregate state R^B S^A is obtained by adding up the probabilities of the states R^B_{i,f_i} S^A, one for each local failure mode of the machine, i.e. \pi(R^B S^A) = \sum_{f_i=1}^{F_i} \pi(R^B_{i,f_i} S^A).
Obviously, while in the picture we consider the aggregate states, in writing the equations it is important to distinguish all the different failure modes, in order to correctly evaluate the state probabilities. It must be noticed that machine M_i cannot be working one part type while being down in local mode for the other part type; therefore the states of type W^A R^B and R^A W^B are not feasible and are not represented in the
picture. Also the aggregate state RA RB represents a situation where the machine
is down and therefore cannot produce either A or B. We call this aggregate state
pure local down state and we rename it R. Finally the state W A W B represents a
state where, for both part types, no local failures, starvation or blocking are present,
therefore the original machine can produce either A or B.
In the following, some key characteristics of the Markov chain of the machine
Mi (Fig. 4) are discussed:

– If the machine is in a state of type W^A S^B and it fails while producing part type A (part type B cannot be produced because the machine is starved for that part), it moves to a state of type R^A S^B. This means that the machine is both down in local mode and starved. From this state it can go to the pure local down state R if one part is stored in the upstream buffer B^B_{i-1}, back to W^A S^B if the local failure is repaired, or to W^A W^B if the local failure is repaired and, at the same time, one part is stored in the upstream buffer B^B_{i-1}. A similar reasoning applies to the states of type W^A B^B, S^A W^B and B^A W^B.
– If the machine is in pure local down state, R, by repairing the local failure it
always enters the W A W B state.
– When the machine is in state W^A W^B it can process A or B depending on the ratios \alpha^A_i and \alpha^B_i (\alpha^A_i + \alpha^B_i = 1). Therefore from state W^A W^B, since only one of the two part types is produced, it is not possible to go to states of type B^A B^B, B^A S^B, S^A B^B, S^A S^B (because if a part type is not produced it is not possible to have blocking or starvation for that part type).
– During a time interval a given machine of the line can process at most one part; therefore it is impossible to move from states of type S^A S^B or B^A B^B to state W^A W^B.
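The state count discussed above can be checked mechanically. The snippet below (our own illustration, not part of the method) builds the 4x4 grid of state-indicator pairs and removes the two infeasible combinations, leaving the 14 aggregate states.

```python
from itertools import product

# Each part type has four state indicators: working (W), down in local
# mode (R), starved (S), blocked (B). Of the 16 pairs, W^A R^B and
# R^A W^B are infeasible: the machine cannot work one part type while
# being down in local mode for the other.
def aggregate_states():
    indicators = ["W", "R", "S", "B"]
    states = []
    for a, b in product(indicators, repeat=2):
        if {a, b} == {"W", "R"}:   # the two infeasible combinations
            continue
        states.append(f"{a}^A {b}^B")
    return states
```

Running `aggregate_states()` returns 14 states, including W^A W^B and the pure local down state R^A R^B (renamed R in the text).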

As already mentioned, in Figure 4, to simplify the diagram, all the states of the same type are grouped into a unique state without considering the different failure modes. The probability of each of these 14 aggregate states is therefore the sum of the probabilities of the disaggregated states over all the failure modes. It must be noticed that, in the Markov chain of the original machine, the competition failure is not considered, because this machine is able to produce both part types, as assumed in the previous section.
It must also be noticed that in this Markov chain not all the transition probabilities are known. Indeed, the values of p^{S(A)}_{j,f_j}, p^{S(B)}_{j,f_j} and p^{B(A)}_{k,f_k}, p^{B(B)}_{k,f_k} cannot be derived directly from the original line and therefore they must be found using
appropriate equations. In the following, the 14 sets of equations required to evaluate the probabilities of the various states are provided, together with the equations required to evaluate the unknown transition probabilities. The approach used to derive these equations is based on the idea that, in the decomposed lines as they have been defined, each building block mimics the flow of material through one buffer of the original line. Therefore, the macro state probabilities of the machine M_i must be coherent with the probabilities of the building blocks a(i-1), b(i-1), a(i) and b(i).

In the Markov chain, the sum of all the state probabilities must be equal to unity. Therefore it is possible to write the normalization equation for the model:

    \sum_{\text{all states}} \pi = 1                                    (1)
Since buffers B^A(i-1) and B^B(i-1) of the decomposed lines are equal to buffers B^A_{i-1} and B^B_{i-1} of the original line, the probability of starvation of machine M_i has to be equal to the one that derives from the upstream building blocks a(i-1) and b(i-1):

    \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(S^A_{j,f_j} S^B_{k,f_k})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(S^A_{j,f_j} B^B_{k,f_k})
    + \pi(W^B S^A_{j,f_j}) + \sum_{f_i=1}^{F_i} \pi(R^B_{i,f_i} S^A_{j,f_j}) = Ps^A_{j,f_j}(i-1)      (2)
        j = 1,...,i-1;  f_j = 1,...,F_j+1

    \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(S^A_{j,f_j} S^B_{k,f_k})
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j+1} \pi(B^A_{j,f_j} S^B_{k,f_k})
    + \pi(W^A S^B_{k,f_k}) + \sum_{f_i=1}^{F_i} \pi(R^A_{i,f_i} S^B_{k,f_k}) = Ps^B_{k,f_k}(i-1)      (3)
        k = 1,...,i-1;  f_k = 1,...,F_k+1

where \pi(X) is the steady state probability of state X.
Since buffers B^A(i) and B^B(i) of the decomposed lines are equal to buffers B^A_i and B^B_i of the original line, the probability of blocking of machine M_i has to be equal to that of the downstream building blocks a(i) and b(i):

    \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(S^B_{k,f_k} B^A_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(B^A_{j,f_j} B^B_{k,f_k})
    + \sum_{f_i=1}^{F_i} \pi(R^B_{i,f_i} B^A_{j,f_j}) + \pi(W^B B^A_{j,f_j}) = Pb^A_{j,f_j}(i)        (4)
        j = i+1,...,K;  f_j = 1,...,F_j+1

    \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(S^A_{j,f_j} B^B_{k,f_k})
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j+1} \pi(B^A_{j,f_j} B^B_{k,f_k})
    + \sum_{f_i=1}^{F_i} \pi(R^A_{i,f_i} B^B_{k,f_k}) + \pi(W^A B^B_{k,f_k}) = Pb^B_{k,f_k}(i)        (5)
        k = i+1,...,K;  f_k = 1,...,F_k+1
Considering the states of type R^A S^B, R^A B^B, R^B S^A, R^B B^A, we can write node equations balancing the probability of entering these states with the probability of exiting the same states:

    \pi(W^A S^B_{j,f_j}) p_{i,f_i} (1 - r^{S(B)}_{j,f_j})
    = \pi(R^A_{i,f_i} S^B_{j,f_j}) r^{S(B)}_{j,f_j} + \pi(R^A_{i,f_i} S^B_{j,f_j}) r_{i,f_i} (1 - r^{S(B)}_{j,f_j})      (6)
        f_i = 1,...,F_i;  j = 1,...,i-1;  f_j = 1,...,F_j+1

    \pi(W^B S^A_{j,f_j}) p_{i,f_i} (1 - r^{S(A)}_{j,f_j})
    = \pi(R^B_{i,f_i} S^A_{j,f_j}) r^{S(A)}_{j,f_j} + \pi(R^B_{i,f_i} S^A_{j,f_j}) r_{i,f_i} (1 - r^{S(A)}_{j,f_j})      (7)
        f_i = 1,...,F_i;  j = 1,...,i-1;  f_j = 1,...,F_j+1

    \pi(W^A B^B_{k,f_k}) p_{i,f_i} (1 - r^{B(B)}_{k,f_k})
    = \pi(R^A_{i,f_i} B^B_{k,f_k}) r^{B(B)}_{k,f_k} + \pi(R^A_{i,f_i} B^B_{k,f_k}) r_{i,f_i} (1 - r^{B(B)}_{k,f_k})      (8)
        f_i = 1,...,F_i;  k = i+1,...,K;  f_k = 1,...,F_k+1

    \pi(W^B B^A_{k,f_k}) p_{i,f_i} (1 - r^{B(A)}_{k,f_k})
    = \pi(R^B_{i,f_i} B^A_{k,f_k}) r^{B(A)}_{k,f_k} + \pi(R^B_{i,f_i} B^A_{k,f_k}) r_{i,f_i} (1 - r^{B(A)}_{k,f_k})      (9)
        f_i = 1,...,F_i;  k = i+1,...,K;  f_k = 1,...,F_k+1
Considering the states of type S^A S^B, S^A B^B, B^A S^B, B^A B^B, we can write node equations balancing the probability of entering these states with the probability of leaving the same states:

    \pi(W^A S^B_{k,f_k}) p^{S(A)}_{j,f_j} (1 - r^{S(B)}_{k,f_k}) + \pi(W^B S^A_{j,f_j}) p^{S(B)}_{k,f_k} (1 - r^{S(A)}_{j,f_j})
    = \pi(S^A_{j,f_j} S^B_{k,f_k}) (r^{S(A)}_{j,f_j} + r^{S(B)}_{k,f_k})                                                  (10)
        j = 1,...,i-1;  f_j = 1,...,F_j+1;  k = 1,...,i-1;  f_k = 1,...,F_k+1

    \pi(W^A B^B_{k,f_k}) p^{B(A)}_{j,f_j} (1 - r^{B(B)}_{k,f_k}) + \pi(W^B B^A_{j,f_j}) p^{B(B)}_{k,f_k} (1 - r^{B(A)}_{j,f_j})
    = \pi(B^A_{j,f_j} B^B_{k,f_k}) (r^{B(A)}_{j,f_j} + r^{B(B)}_{k,f_k})                                                  (11)
        j = i+1,...,K;  f_j = 1,...,F_j+1;  k = i+1,...,K;  f_k = 1,...,F_k+1

    \pi(W^A B^B_{k,f_k}) p^{S(A)}_{j,f_j} (1 - r^{B(B)}_{k,f_k}) + \pi(W^B S^A_{j,f_j}) p^{B(B)}_{k,f_k} (1 - r^{S(A)}_{j,f_j})
    = \pi(S^A_{j,f_j} B^B_{k,f_k}) (r^{S(A)}_{j,f_j} + r^{B(B)}_{k,f_k} - r^{S(A)}_{j,f_j} r^{B(B)}_{k,f_k})              (12)
        j = 1,...,i-1;  f_j = 1,...,F_j+1;  k = i+1,...,K;  f_k = 1,...,F_k+1

    \pi(W^A S^B_{k,f_k}) p^{B(A)}_{j,f_j} (1 - r^{S(B)}_{k,f_k}) + \pi(W^B B^A_{j,f_j}) p^{S(B)}_{k,f_k} (1 - r^{B(A)}_{j,f_j})
    = \pi(S^B_{k,f_k} B^A_{j,f_j}) (r^{B(A)}_{j,f_j} + r^{S(B)}_{k,f_k} - r^{B(A)}_{j,f_j} r^{S(B)}_{k,f_k})              (13)
        j = i+1,...,K;  f_j = 1,...,F_j+1;  k = 1,...,i-1;  f_k = 1,...,F_k+1
Considering the set of states of type R, R^A S^B, R^A B^B, R^B S^A, R^B B^A, we can write equations balancing the probability of entering this set of states with the probability of leaving it:

    r_{i,f_i} \Big( \pi(R_{i,f_i}) + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(R^B_{i,f_i} S^A_{j,f_j})
    + \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(R^A_{i,f_i} S^B_{k,f_k})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(R^A_{i,f_i} B^B_{k,f_k})
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j+1} \pi(R^B_{i,f_i} B^A_{j,f_j}) \Big)
    = p_{i,f_i} \Big( \pi(W^A W^B) + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(W^B S^A_{j,f_j})
    + \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(W^A S^B_{k,f_k})
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j+1} \pi(W^B B^A_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(W^A B^B_{k,f_k}) \Big)                                                    (14)
        f_i = 1,...,F_i

In order to calculate the unknown transition probabilities, we first write the node equations for nodes of type W^A S^B, W^B S^A, W^A B^B and W^B B^A and then, after some manipulation, we obtain:

    p^{S(B)}_{j,f_j} = \frac{Ps^B_{j,f_j}(i-1)}{E^B(i-1)} r^{S(B)}_{j,f_j},
    \quad p^{S(A)}_{j,f_j} = \frac{Ps^A_{j,f_j}(i-1)}{E^A(i-1)} r^{S(A)}_{j,f_j}      (15)
        j = 1,...,i-1;  f_j = 1,...,F_j+1

    p^{B(B)}_{k,f_k} = \frac{Pb^B_{k,f_k}(i)}{E^B(i)} r^{B(B)}_{k,f_k},
    \quad p^{B(A)}_{k,f_k} = \frac{Pb^A_{k,f_k}(i)}{E^A(i)} r^{B(A)}_{k,f_k}          (16)
        k = i+1,...,K;  f_k = 1,...,F_k+1

where E^A(i) and E^B(i) are the average production rates of the building blocks a(i) and b(i) respectively, i = 1,...,K-1. The repair probabilities of these failures are assumed to be equal to those of the local failures of the machines of the original line responsible for starvation and blocking.
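Equations (15) and (16) are simple ratios, and can be transcribed directly as below. The inputs Ps, Pb and E would come from the building-block evaluation; the numeric values in the test are made up for illustration.

```python
# Direct transcription of equations (15)-(16): the unknown starvation and
# blocking transition probabilities, from building-block quantities.
def starvation_transition_prob(Ps, E, r):
    """p^{S}_{j,fj} = Ps_{j,fj}(i-1) / E(i-1) * r_{j,fj}  (equation (15))."""
    return Ps / E * r

def blocking_transition_prob(Pb, E, r):
    """p^{B}_{k,fk} = Pb_{k,fk}(i) / E(i) * r_{k,fk}  (equation (16))."""
    return Pb / E * r
```

For example, a starvation probability Ps = 0.04 with throughput E = 0.8 and repair probability r = 0.5 gives a transition probability of 0.025.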
Considering the expression of the efficiency in isolation e_i of the flexible machine M_i of the original line and the expression of the efficiency E_i of that machine related to the presence of buffers, it is simple to show that from the normalization equation (1) it is possible to derive two conservation of flow equations, one for each product type:

    \pi(W^A W^B) \alpha^A_i + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(W^A S^B_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(W^A B^B_{k,f_k}) = E^A(i-1)           (17)

    \pi(W^A W^B) \alpha^B_i + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(W^B S^A_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(W^B B^A_{k,f_k}) = E^B(i-1)           (18)

4.2 Pseudo-machine models

Once the probabilities of the various states of the flexible machine M_i of the original line are obtained, it is possible to build two models (Fig. 5), one for each pseudo-machine M^{U(A)}(i) and M^{U(B)}(i). This results in two five-state models. The probability of each state is obtained by adding up the values calculated in the previous Markov chain.

Fig. 5. Five state models for pseudo-machines M U (A) (i) and M U (B) (i)

For the pseudo-machine M^{U(A)}(i), we have:

    \pi(W^{U(A)}) = \pi(W^A W^B) \alpha^A_i + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j+1} \pi(W^A S^B_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(W^A B^B_{k,f_k})                      (19)

    \pi(S^{U(A)}_{j,f_j}) = \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(S^A_{j,f_j} S^B_{k,f_k})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(S^A_{j,f_j} B^B_{k,f_k})
    + \pi(W^B S^A_{j,f_j}) + \sum_{f_i=1}^{F_i} \pi(R^B_{i,f_i} S^A_{j,f_j})          (20)
        j = 1,...,i-1;  f_j = 1,...,F_j+1

    \pi(R^{U(A)}_{i,f_i}) = \pi(R_{i,f_i}) + \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(R^A_{i,f_i} S^B_{k,f_k})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(R^A_{i,f_i} B^B_{k,f_k})              (21)
        f_i = 1,...,F_i

    \pi(B^{U(A)}_{j,f_j}) = \sum_{k=1}^{i-1} \sum_{f_k=1}^{F_k+1} \pi(S^B_{k,f_k} B^A_{j,f_j})
    + \sum_{k=i+1}^{K} \sum_{f_k=1}^{F_k+1} \pi(B^A_{j,f_j} B^B_{k,f_k})
    + \sum_{f_i=1}^{F_i} \pi(R^B_{i,f_i} B^A_{j,f_j}) + \pi(W^B B^A_{j,f_j})          (22)
        j = i+1,...,K;  f_j = 1,...,F_j+1

    \pi(W^B) = \pi(W^A W^B) \alpha^B_i                                                (23)
The state W^{U(A)} represents the state in which the original machine M_i works part type A, that is the up state for the pseudo-machine M^{U(A)}(i). The state W^B represents the state in which the original machine M_i works part type B even if it could work both part types; it is a down state for the pseudo-machine M^{U(A)}(i). The same can be written for the pseudo-machine M^{U(B)}(i).
Having these two approximate models, one for each pseudo-machine, and knowing all the state probabilities, it is possible to calculate new local failure probabilities and remote failure probabilities for the two pseudo-machines. It must be noticed that, for these machines, the probability of entering a local failure is higher than that of the corresponding machine in the original line, because we take into account the probability of failing while producing the other part type.
We can evaluate the local failure parameters using the balance equations of nodes R^{U(A)} and R^{U(B)}:

    p^{U(A)}_{i,f_i}(i) = \frac{\pi(R^{U(A)}_{i,f_i})}{\pi(W^{U(A)})} r_{i,f_i}
    = \frac{\pi(R^{U(A)}_{i,f_i})}{E^A(i-1)} r_{i,f_i},   f_i = 1,...,F_i             (24)

    p^{U(B)}_{i,f_i}(i) = \frac{\pi(R^{U(B)}_{i,f_i})}{\pi(W^{U(B)})} r_{i,f_i}
    = \frac{\pi(R^{U(B)}_{i,f_i})}{E^B(i-1)} r_{i,f_i},   f_i = 1,...,F_i             (25)

As originally proposed in [7], we introduce remote failures for the upstream pseudo-machines to mimic starvation, so it is possible to write:

    p^{U(A)}_{j,f_j}(i) = p^{S(A)}_{j,f_j},  \quad  p^{U(B)}_{j,f_j}(i) = p^{S(B)}_{j,f_j}                                (26)
    r^{U(A)}_{j,f_j}(i) = r^{S(A)}_{j,f_j} = r_{j,f_j},  \quad  r^{U(B)}_{j,f_j}(i) = r^{S(B)}_{j,f_j} = r_{j,f_j}        (27)
        j = 1,...,i-1;  f_j = 1,...,F_j+1
In addition, since we know the probability of being in the competition failure state (that models the situation in which the pseudo-machine does not produce because the other pseudo-machine is producing), we can use it to evaluate the parameters of the competition failure. Indeed, using a node equation balancing the probability of entering the competition failure state with the probability of leaving the same state, we have, for part type A:

    \pi(W^{U(A)}) p^{U(A)}_{F_i+1}(i) = \pi(W^B) r^{U(A)}_{F_i+1}(i)                  (28)

In this equation there are two unknowns: the probabilities of failure and repair of the competition failure. Reasoning on the behavior of the pseudo-machine model, it is possible to estimate the failure probability. The probability \pi(W^{U(A)}) of the pseudo-machine M^{U(A)}(i) being operational has been obtained by adding up the probabilities of being in three different types of states: \pi(W^A) = \pi(W^A W^B) \alpha^A_i, \pi(W^A S^B) and \pi(W^A B^B). Therefore, it is possible to evaluate the transition probabilities between each of these starting states and the competition failure state separately:

    p^{U(A)}_{F_i+1}(i) = (1 - p^{U(A)}(i)) \alpha^B_i
    \Big( \frac{\pi(W^A)}{\pi(W^{U(A)})}
    + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j} \frac{\pi(W^A S^B_{j,f_j})}{\pi(W^{U(A)})} r^{S(B)}_{j,f_j}
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j} \frac{\pi(W^A B^B_{j,f_j})}{\pi(W^{U(A)})} r^{B(B)}_{j,f_j} \Big)               (29)

where p^{U(A)}(i) = \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j} p^{U(A)}_{j,f_j}(i) + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j} p^{B(A)}_{j,f_j}(i) + \sum_{f_i=1}^{F_i} p^{U(A)}_{i,f_i} is the sum of all the other failure probabilities of the pseudo-machine M^{U(A)}(i). Now, using equation (28), the probability of repair for the competition failure mode can be evaluated as follows:

    r^{U(A)}_{F_i+1}(i) = \frac{\pi(W^{U(A)})}{\pi(W^B)} p^{U(A)}_{F_i+1}(i)          (30)

In a similar way we can find for machine M^{U(B)}(i) the values of p^{U(B)}_{i,F_i+1} and r^{U(B)}_{i,F_i+1}:

    p^{U(B)}_{F_i+1}(i) = (1 - p^{U(B)}(i)) \alpha^A_i
    \Big( \frac{\pi(W^B)}{\pi(W^{U(B)})}
    + \sum_{j=1}^{i-1} \sum_{f_j=1}^{F_j} \frac{\pi(W^B S^A_{j,f_j})}{\pi(W^{U(B)})} r^{S(A)}_{j,f_j}
    + \sum_{j=i+1}^{K} \sum_{f_j=1}^{F_j} \frac{\pi(W^B B^A_{j,f_j})}{\pi(W^{U(B)})} r^{B(A)}_{j,f_j} \Big)

and

    r^{U(B)}_{F_i+1}(i) = \frac{\pi(W^{U(B)})}{\pi(W^A)} p^{U(B)}_{F_i+1}(i)          (31)
Once local, remote and competition failure probabilities are evaluated they can be
used within the building blocks a(i) and b(i).

5 Algorithm

The unknown failure parameters of all the pseudo-machines of the decomposed lines are determined by following an iterative algorithm inspired by the DDX algorithm. In particular, it consists of the following steps:
1. Initialization: for each pseudo-machine of each building block, local failure parameters are initialized to the corresponding values of the machines of the original line, while remote failures and competition failures are initialized to a small value (for instance we used \lambda = 0.05).
For M^{U(A/B)}(i):

    p^{U(A/B)}_{i,f_i}(i) = p_{i,f_i},  r^{U(A/B)}_{i,f_i}(i) = r_{i,f_i}     i = 1,...,K-1;  f_i = 1,...,F_i            (32)

    p^{U(A)}_{i,F_i+1}(i) = \lambda,  r^{U(A)}_{i,F_i+1}(i) = \alpha^A_i,
    p^{U(B)}_{i,F_i+1}(i) = \lambda,  r^{U(B)}_{i,F_i+1}(i) = \alpha^B_i                                                 (33)

    p^{U(A/B)}_{j,f_j}(i) = \lambda,  r^{U(A/B)}_{j,f_j}(i) = r_{j,f_j}       j = 1,...,i-1;  f_j = 1,...,F_j            (34)

For M^{D(A/B)}(i-1):

    p^{D(A/B)}_{i,f_i}(i-1) = p_{i,f_i},  r^{D(A/B)}_{i,f_i}(i-1) = r_{i,f_i}     i = 2,...,K;  f_i = 1,...,F_i          (35)

    p^{D(A)}_{i,F_i+1}(i-1) = \lambda,  r^{D(A)}_{i,F_i+1}(i-1) = \alpha^A_i,
    p^{D(B)}_{i,F_i+1}(i-1) = \lambda,  r^{D(B)}_{i,F_i+1}(i-1) = \alpha^B_i                                             (36)

    p^{D(A/B)}_{j,f_j}(i-1) = \lambda,  r^{D(A/B)}_{j,f_j}(i-1) = r_{j,f_j}       j = i+1,...,K;  f_j = 1,...,F_j        (37)
2. Step 1. For i = 1, ..., K − 1: failure parameters of machines M U (A) (i) and
M U (B) (i) are evaluated:
– Unknown transition probabilities are calculated using equations (15).
– Evaluation of all the state probabilities of the flexible machine Mi by using
the linear system formed by equations (1) to (14). Blocking probabilities
and transitions to blocking states are derived from previous iterations of
the algorithm and they are equal to remote failures of downstream pseudo-
machines M D(A) (i − 1) and M D(B) (i − 1), in case of i > 1, while for
i = 1 they can be evaluated using equations (16).
– Distribution of the calculated probabilities into the two pseudo-machine models of Figure 5, using equations (19) to (23), for both part types.
– Evaluation of new local failures using equations (24), (25).
– Calculation of remote failures using equations (26), (27).
– Evaluation of competition failures using equations (29), (30), (31) and
(32).
– Insertion of calculated failure parameters into upstream pseudo-machines
of building blocks a(i) and b(i).
– Evaluation of average throughput, probabilities of blocking and probabil-
ities of starvation of blocks a(i) and b(i) using the building block solution
proposed in [5].
3. Step 2. For i = K, ..., 2: failure parameters of machines M D(A) (i − 1) and
M D(B) (i − 1) are evaluated:
– Unknown transition probabilities are calculated using equations (16).
– Evaluation of all the state probabilities of the flexible machine Mi by using
the linear system formed by equations (1) to (14). Starvation probabilities
and transitions to starvation states are derived from previous iterations of
the algorithm and they are equal to remote failures of upstream pseudo-
machines M U (A) (i) and M U (B) (i), in case of i < K, while for i = K
they can be evaluated using equations (15).
– Distribution of the calculated probabilities into the two pseudo-machine
models similar to those of Figure 5, using equations similar to (19) to (23).
– Evaluation of new local failures using equations similar to (24), (25).
– Calculation of remote failures using equations similar to (26), (27).

– Evaluation of competition failures using equations similar to (29), (30),


(31) and (32).
– Insertion of calculated failure parameters into downstream pseudo-
machines of building blocks a(i − 1) and b(i − 1).
– Evaluation of average throughput, probabilities of blocking and probabili-
ties of starvation of blocks a(i − 1) and b(i − 1) using the building block
solution proposed in [5].
The algorithm stops when the following condition becomes true:

    |E^A(i) - E^A(i-1)| \le \varepsilon  and  |E^B(i) - E^B(i-1)| \le \varepsilon,   i = 1,...,K-1                       (38)
Performance measures can be evaluated as follows.
Average throughput of the line:

    E^A = E^A(1) = E^A(2) = ... = E^A(K-1)
    E^B = E^B(1) = E^B(2) = ... = E^B(K-1)

Average buffer level in the line:

    n^A_i = n^A(i),   n^B_i = n^B(i),   i = 1,...,K-1
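The overall structure of the algorithm is a fixed-point iteration on the throughput estimates. A skeleton of the convergence loop might look like the sketch below, where a dummy contraction stands in for the forward and backward sweeps of Steps 1 and 2 (the function names are ours, not from the paper).

```python
# DDX-style fixed-point skeleton: repeat the decomposition sweeps until
# the throughput estimate settles, as in stopping condition (38).
def iterate_until_converged(E0, step, eps=1e-9, max_iter=1000):
    """Iterate E <- step(E) until successive estimates differ by at most eps."""
    E = E0
    for _ in range(max_iter):
        E_new = step(E)                  # one forward + backward sweep
        if abs(E_new - E) <= eps:        # stopping condition
            return E_new
        E = E_new
    raise RuntimeError("decomposition did not converge")
```

For example, a contraction step(E) = 0.5 E + 0.4 converges to its fixed point 0.8 regardless of the starting estimate.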

6 Numerical results

In order to show the accuracy of the new analytical method (method CMT), a set of numerical tests has been carried out comparing the analytical results with those obtained by running simulation experiments. More than 150 lines producing two products have been analyzed using the proposed method, with probabilities of failure and repair varying in the following ranges: 0 < p_{i,f_i} < 0.25 and 0 < r_{i,f_i} < 0.8. In particular, systems with three machines/four buffers (Table 1), four machines/six buffers (Table 2), five machines/eight buffers (Table 3) and six

Table 1. Three machines cases



Table 2. Four machines cases

Table 3. Five machines cases

machines/ten buffers (Table 4) with one failure mode per machine have been studied, and a sample of results is reported in the following tables. Also, systems with machines characterized by multiple failure modes have been studied, and the results are reported in Table 5. For each simulation experiment 10 replications have been performed, with a warm-up period of 10^5 time units followed by a simulation period of 10^6 time units. The average throughput has been evaluated with a 95% half confidence interval of at most 0.0009; the average buffer level with a 95% half confidence interval of at most 0.08.
In all the tables, for each analyzed case, the failure and repair probabilities of the machines are reported on the left, together with the buffer capacities and the \alpha^A_i parameters, which are equal for all the machines in the line. The average production rates of the line, calculated with the proposed method and with simulation, are

Table 4. Six machines cases

Table 5. Multiple failure machines cases

reported on the right and the error between the evaluations is estimated using the
following equations:
Δ%E^A = |E^A_SIM − E^A_CMT| / E^A_SIM · 100

Δ%E^B = |E^B_SIM − E^B_CMT| / E^B_SIM · 100
As can be seen from the results provided in this section and from the summary of
results in Table 6, the algorithm has proven reliable and accurate in all the
tested cases; indeed, the maximum error in throughput evaluation is around 3%, and
in a high percentage of cases the throughput error is lower than 1%.
Table 6. Summary of results in 150 test cases

N. MACHINES    3       4       5       6
ERROR > 2%     6.8%    4.7%    2.9%    10%
ERROR < 1%     81.8%   66.6%   70.6%   72.2%
MAX ERROR      2.5%    2.38%   2.77%   3.13%

Table 7. Error in average buffer level evaluation for cases of Table 4

Fig. 6. Throughput evaluation with α_i^A equal for all the machines of the line and variable

In Table 7, the error in the evaluation of the average buffer level is reported
for the six-machine cases of Table 4. The error between the evaluations has been
calculated using the following equations:

Δ%(n_i^A) = |(n_i^A)_SIM − (n_i^A)_CMT| / N_i^A · 100

Δ%(n_i^B) = |(n_i^B)_SIM − (n_i^B)_CMT| / N_i^B · 100
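As an illustration, the two error measures can be computed directly. Note the different normalizations: the throughput error is relative to the simulated value, while the buffer-level error is normalized by the buffer capacity N_i. The function names and sample values below are ours, not taken from the paper's tables:

```python
def throughput_error_pct(e_sim: float, e_cmt: float) -> float:
    """Relative throughput error: |E_SIM - E_CMT| / E_SIM * 100."""
    return abs(e_sim - e_cmt) / e_sim * 100.0

def buffer_level_error_pct(n_sim: float, n_cmt: float, capacity: float) -> float:
    """Buffer-level error normalized by the buffer capacity N_i
    (not by the simulated level): |n_SIM - n_CMT| / N_i * 100."""
    return abs(n_sim - n_cmt) / capacity * 100.0

# Illustrative values only (not from the paper's tables):
print(round(throughput_error_pct(0.850, 0.842), 2))    # -> 0.94
print(round(buffer_level_error_pct(6.3, 5.9, 10), 2))  # -> 4.0
```

Normalizing the buffer-level error by the capacity rather than by the simulated level keeps the measure bounded even when the simulated average level is close to zero.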

Table 8. Error in throughput evaluation

As normally happens in decomposition methods, the errors in the average buffer
level evaluation are much higher than those in the throughput evaluation.
It is worth noting that the total throughput of the line (the sum of the throughputs of
part types A and B) is divided between part types A and B differently from the
values of α_i^A and α_i^B introduced for the machines of the line. This is due to the
fact that the occurrence of blocking and starvation differs between part types A and B,
depending on their relative buffer capacities. In the following tables, α indicates the
value of α_i^A and α1 indicates the ratio between the throughput of part type A and the
total throughput, resulting from the simulation. An important future development of
this research would be a method able to assess the values of the α_i^A parameters for
each machine of the line, starting from the α1 value that one actually wants to obtain
from the line.
To study the accuracy of the method for different values of α_i^A and
α_i^B in the line and for each single machine, some focused tests have been carried out.
In particular, we studied a six-machine line with α_i^A variable for the bottleneck
machine and equal to 0.6 for all the other machines of the line. The behavior of the
system is well approximated by the method for values of α_5^A similar to those of the
other machines but, in the other cases, the method does not evaluate the performance
measures of the line correctly. This limitation of the method's field of application is not
very relevant, because in real automated multiproduct flow lines the α_i^A parameter
is normally constant throughout the line. In this case the proposed method correctly

estimates the average throughput of the test line, as shown in Fig. 6 and Table 8,
for a wide range of variability of the parameter α_i^A.
As can be seen from the results provided in this section, the algorithm has proven
reliable and accurate in all the tested cases with α_i^A and α_i^B parameters equal
for all the machines of the line.

7 Conclusions

A new approximate analytical method for the performance evaluation of multiproduct
automated flow lines with multiple failure modes and finite buffer capacity has
been proposed. The method has been applied to the case of lines producing two
different part types, but it is amenable to extension to the case of n part types. An
algorithm inspired by the DDX algorithm has been developed to evaluate the failure
probabilities of all pseudo-machines of the decomposed lines. Extensive testing
has proven the accuracy of the method. As a future development, the method could
be extended to the case of continuous lines with multiple part types. In addition,
the method can in principle be extended to study assembly/disassembly networks
[1, 2] and fork and join systems [3, 4].

References

1. Tolio T, Matta A, Levantesi R (2000) Performance evaluation of assembly/disassembly
systems with deterministic processing times and multiple failure modes. In: ICPR2000
International Conference on Production Research, Bangkok, Thailand
2. Gershwin SB (1991) Assembly/disassembly systems: an efficient decomposition algo-
rithm for tree structured networks. IIE Transactions 23(4): 302–314
3. Helber S (1999) Performance analysis of flow lines with nonlinear flow of material,
vol 243. Lecture notes in economics and mathematical systems. Springer, Berlin Hei-
delberg New York
4. Helber S (2000) Approximate analysis of unreliable transfer lines with splits in the flow
of materials. Annals of Operations Research (93): 217–243
5. Tolio T, Gershwin SB, Matta A (2002) Analysis of two-machine lines with multiple
failure modes. IIE Transactions 34(1): 51–62
6. Nemec JE (1999) Diffusion and decomposition approximations of stochastic models
of multiclass processing networks. PhD thesis, Massachusetts Institute of Technology,
February
7. Tolio T, Matta A (1998) A method for performance evaluation of automated flow lines.
Annals of CIRP 47(1): 373–376
8. Le Bihan H, Dallery Y (1999) An improved decomposition method for the analysis of
production lines with unreliable machines and finite buffers. International Journal of
Production Research 37(5): 1093–1117
Automated flow lines with shared buffer
A. Matta, M. Runchina, and T. Tolio
Politecnico di Milano, Dipartimento di Meccanica, via Bonardi 9, 20133 Milano, Italy
(e-mail: {andrea.matta,tullio.tolio}@polimi.it)

Abstract. The paper addresses the problem of fully using buffer spaces in manufacturing
flow lines. The idea is to exploit recent technological devices to move
pieces in reasonable times from a machine to a common buffer area of the system
and vice versa. In this way, machines can avoid blocking, since they can
send pieces to the shared buffer area. The introduction of a buffer area shared
by all machines of the system leads to an increase in production rate, as demonstrated
by simulation experiments. In addition, a preliminary economic evaluation on a
real case has been carried out to estimate the profitability of the system, comparing
the increase in production rate obtained with the new system architecture with the
related additional cost.

Keywords: Flow lines – Buffer allocation – System design – Performance evaluation

1 Introduction

A manufacturing flow line is defined in the literature as a serial production system
in which parts are worked sequentially by machines: pieces flow from the first
machine, in which they are still raw parts, to the last machine, where the process
cycle is completed and the finished parts leave the system. When a machine is not
available, parts wait in the buffer immediately upstream of the machine. If the number
of parts flowing in the system is constant during production, these systems
are also called closed flow lines (see Fig. 1, where rectangles and circles represent
machines and buffers of the system, respectively) to distinguish them from open
flow lines, where the number of parts is not kept constant. Gershwin gives
in [4] a general description of flow lines in manufacturing. The production rate of
flow lines is clearly a function of the speed and reliability of the machines: the faster
and more reliable the machines are, the higher the production rate is. However, since machines
Correspondence to: A. Matta
100 A. Matta et al.

Fig. 1. Scheme of closed flow lines

can have different speeds and may be affected by random failures, the part flow
can be interrupted at a certain point of the system, causing blocking and starvation
of machines. In particular, blocking in the line occurs when at least one machine
cannot move the parts just worked (BAS, Blocking After Service) or still to be worked
(BBS, Blocking Before Service) to the next station. In flow lines, the blocking of a
machine can be caused only by a long processing time or a failure of a downstream
machine. Analogously, starvation occurs when one or more machines cannot
operate because they have no input part to work on; in this case the machine
cannot work and is said to be starved. In flow lines, the starvation of a machine can
be caused only by a long processing time or a failure of an upstream machine.
Therefore, in flow lines the state of a machine affects the rest of the system because
of blocking and starvation phenomena, which propagate upstream and downstream
of the source of flow interruption in the line, respectively. If there is no area in which to
store pieces between two adjacent machines, the behavior of the machines is strongly
correlated.
To reduce blocking and starvation phenomena in flow lines, buffers between
two adjacent machines are normally included to decouple the machines' behavior.
Indeed, buffers absorb the impact of a failure or a long processing
time because (a) the presence of parts in buffers decreases the starvation of
machines and (b) the possibility of storing parts in buffers decreases the blocking
of machines. Therefore, the production rate of flow lines is also a function of the buffer
capacities; more precisely, the production rate is a monotone increasing function of the
total buffer capacity of the system. Refer to [5, 7] for a list of works focused on the
properties of the production rate in flow lines as a function of the buffer size.
Flow lines have traditionally been deeply investigated in the literature.
Researchers' efforts have been devoted to developing new models for evaluating the
performance of flow lines and for optimizing their design and management in shop
floors. Operations research techniques such as simulation and analytical methods have
been widely used to estimate system performance parameters such as throughput
and work-in-process level. Performance evaluation models are currently used in
configuration algorithms for finding the optimal design of flow lines, taking into
account the total investment cost, operating cost and production rate of the system.
In summary, academic innovation has mainly focused on the development of
performance evaluation and optimization methods for flow lines without entering
into mechanical details. See also the review [1] of Dallery and Gershwin
for a detailed view of performance evaluation models for flow lines, and a recent
state of the art on optimization techniques applied in practice [8]. Indeed,
most works are at the system level, as they deal with the optimization of macro variables
such as the number of machines in the line, buffer capacities and machines' speed and

efficiency. On the other hand, engineers in firms have had to face the complexity
due to the fact that flow lines are designed in practice with all their mechanical
components. Innovation from builders of manufacturing flow lines has mainly
been dedicated to increasing machine reliability and reducing system costs by improving
the design of specific mechanical components such as feed drives, spindles,
transporters, etc.
Therefore, advancements in flow line evolution do not regard the main philosophy
of the system. Parts are loaded into the system at the first machine and, after
having been processed, they are moved into the first buffer to wait for the availability
of the second machine. Blocking phenomena are limited by buffers: the larger
their capacity, the higher the throughput of the line. However, buffers in flow
lines are dedicated to machines; this characteristic implies that a buffer can contain
only pieces worked by the machine immediately upstream. Therefore, when
a long failure occurs at a machine of the line, the portion of the system upstream
of the failed machine is blocked, but the upstream machines continue to work until their
corresponding buffers are full. On the other hand, the portion of the system downstream
of the failed machine is starved, because the downstream machines cannot work
since they have no pieces to work on. In that case, the buffer area downstream of
the failed machine cannot be used to store parts worked by machines that are upstream
of the failed machine, since empty buffers are dedicated and cannot be used for
pieces coming from other machines. It appears that buffer spaces are not fully
exploited when needed. The problem of properly using all the available space in
flow lines is the subject of this paper.

2 Flow lines with shared buffer

2.1 Motivation

The paper presents a new concept of manufacturing flow line characterized by two
different types of buffers: traditional dedicated buffers and a common buffer shared
by all the machines of the system. The common buffer allows pieces to be stored at
any point of the system, thus increasing the buffer capacity of each machine (see
Fig. 2). The main advantage is that wherever an interruption of
flow occurs in the system, the common shared buffer can be used by all machines. As
a consequence, blocking of machines should be lower than in classical flow
lines, thus allowing an increase in production rate at constant total buffer capacity.
However, the profitability of the new system architecture depends on the costs incurred
for the additional shared buffer. Traditionally, the main goal in the design phase
of flow lines is to find the system configuration with minimum cost constrained
Fig. 2. Scheme of the proposed system architecture



to a minimum value of production rate. In this context, the introduction of the
shared buffer in flow lines is possible only if the time necessary for moving parts
from the shared buffer to the machines is small and the related investment in additional
mechanical components is reasonable.
Indeed, in our opinion costs are the main reason why shared buffers have
still not been adopted in manufacturing flow lines. Designing a shared buffer in flow
lines implies additional components, and thus larger costs, for moving
pieces from the machines to the central buffer and vice versa. However, the technology
is now mature enough to be used for this purpose at affordable costs. Several manufacturers
can provide at low cost a wide set of transport modules for part movements.
These modules can be assembled in a flexible way to move parts through the
system; currently the speed of conveyors is around 20 m/min on average, depending
on the weight of the parts. Parts can follow linear paths, as usual in flow lines, and
circular paths with small turns. Furthermore, in order to save floor space, parts
can be moved up or down to reach different heights. The cost of transporter
modules is now affordable, allowing their intensive usage in practice at the same
productivity level, defined in this paper as the amount of output obtained for one
unit of input. We consider the production rate of the system as the output and the
total cost of the system as the input. It is rather difficult to increase the productivity
of manufacturing systems, since a specific action that can increase the production
rate of a system is normally balanced by the effort required. Actions that can
improve system productivity should reduce the total costs (reduction of machine
and fixture costs, reduction of adaptation costs, etc.) without reducing the production
rate, or should increase the production rate (shorter system set-up times, reduction
of unproductive times, improvement of system availability, etc.) without increasing
costs. The proposed system can be considered interesting for practical exploitation
if its productivity remains constant or increases in comparison with traditional
systems.

2.2 System description

The proposed system architecture is a flow line composed of K machines separated
by limited buffers. In the case of open systems, the number of buffers is equal to K − 1
and we assume that the first machine is never starved and the last machine is never
blocked; in the case of closed systems, the number of buffers is equal to the number of
machines. We denote by Mi and Bi (with i = 1, ..., K) the i-th machine
and the i-th dedicated buffer, respectively.
Machines are normally unreliable and their efficiency depends on their failure
and repair distributions. The K − 1 buffers (or K in closed flow lines) are
dedicated to their corresponding machines: buffer B1 contains only pieces already
worked by the first machine M1, buffer B2 contains only pieces already worked by the
second machine M2, and so on. If buffer Bi is full, i.e. the buffer level has reached
the buffer capacity, machine Mi can send worked pieces to the shared buffer, denoted
by Bs, which is located in a specific area of the system, shared by all the machines,
where it is possible to store pieces independently of their process status. A generic
machine Mi is blocked only if both the dedicated and shared buffers, i.e. buffers Bi and

Bs, are full. The presence of the shared buffer decreases blocking phenomena in the
flow line: if the dedicated buffer is full, pieces worked by machine Mi can be moved
to the shared buffer until the part flow resumes at machine Mi+1 and the level of
buffer Bi decreases. In more detail, a part which cannot be stored in a dedicated
buffer stays in the shared buffer until a place in the dedicated buffer becomes
available. The way in which parts in the shared buffer are positioned depends
on the technology used and the management rules adopted. If the shared buffer
consists of a simple conveyor on which parts flow until a new space is available at
the dedicated buffers, the ordering of parts depends on their entering sequence on
the conveyor. If the shared buffer consists of a series of racks, the ordering of parts
depends on the particular management rule adopted; in this case it is necessary to
have a resource such as a robot or a carrier that takes parts from the machines and puts
them on the racks. Tempelmeier and Kuhn describe different mechanisms in the case
of Flexible Manufacturing Systems (FMS) with a central buffer [9].
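The routing rule described above (a finished part goes to the dedicated buffer if there is room, overflows into the shared buffer otherwise, and the machine blocks only when both are full) can be sketched as follows. The class and function names are illustrative, not part of the authors' model:

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    capacity: int
    level: int = 0

    def is_full(self) -> bool:
        return self.level >= self.capacity

def route_finished_part(dedicated: Buffer, shared: Buffer) -> str:
    """Where does machine M_i put a part it has just finished?
    Prefer the dedicated buffer B_i; overflow into the shared
    buffer B_s; block only if both are full (BAS discipline)."""
    if not dedicated.is_full():
        dedicated.level += 1
        return "dedicated"
    if not shared.is_full():
        shared.level += 1
        return "shared"
    return "blocked"

b_i = Buffer(capacity=2, level=2)  # dedicated buffer B_i already full
b_s = Buffer(capacity=5, level=4)  # one free slot left in the shared buffer
print(route_finished_part(b_i, b_s))  # -> shared
print(route_finished_part(b_i, b_s))  # -> blocked
```

The sketch makes explicit why the shared buffer weakens the blocking condition: a machine blocks only on the conjunction of two "full" events instead of a single one.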
A certain amount of time is necessary to physically move parts from a dedicated
buffer area to the shared buffer area and vice versa. Therefore, while blocking
decreases with the introduction of the shared buffer, starvation increases [9]. In
the remainder of the paper we call this time the travel time, denoted by tt. The
profitability of the system depends on the value of the travel time and its impact on
system performance. If the travel time is reasonably small, then the penalty time
incurred for using the shared buffer does not deeply decrease the system performance
since, after the resumption of flow, the time spent by parts in going from
the shared buffer area to the dedicated one is hidden, i.e. covered by the pieces
already present in the dedicated area and processed in the meanwhile by machine
Mi+1. If the travel time is large, then the penalty time incurred for using the shared
buffer can strongly decrease the system performance, since machines are frequently
starved. The increase of starvation as a consequence of the transport time from/to
the shared buffer has been previously described by Tempelmeier and Kuhn [9] in
the analysis of a special FMS configured as a flexible flow line. The next section
reports a numerical analysis assessing the productivity of flow lines with shared
buffer in different situations.
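As a rough back-of-envelope check (our own simplification, not a formula from the paper): the travel time is "hidden" roughly when the downstream machine can keep busy on the parts already in the dedicated buffer while the transfer takes place, i.e. when tt does not exceed the dedicated-buffer level times the cycle time:

```python
def travel_time_hidden(t_travel: float, buffer_level: int, t_cycle: float) -> bool:
    """Crude hiding condition (illustrative assumption, not the authors'
    model): the retrieval time from the shared buffer is covered if the
    downstream machine has enough work queued in the dedicated buffer,
    i.e. t_travel <= buffer_level * t_cycle."""
    return t_travel <= buffer_level * t_cycle

# With 5 parts waiting and a 60 s cycle, a 30 s travel time is hidden;
# with only 2 parts waiting and a 5 s cycle, it is not.
print(travel_time_hidden(30.0, 5, 60.0))  # -> True
print(travel_time_hidden(30.0, 2, 5.0))   # -> False
```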

3 Numerical evaluation

The objective of this section is to evaluate the gain in productivity due to the
introduction of shared buffers in production lines. To this end, experimentation
has been carried out by simulating flow lines on simple test cases, created ad hoc
to understand the system behavior in different situations (Sect. 3.1), and on a real
flow line (Sect. 3.2).

3.1 Test cases

We consider a closed flow line composed of five machines, each one with a finite-capacity
buffer immediately downstream. Machines are unreliable and characterized
by the same type of failure. Failures are time dependent and do not depend
on the processing times of operations at the machines. Failures have mean time to failure
(MTTF) and mean time to repair (MTTR) exponentially distributed with means
1000 s and 100 s, respectively. The blocking mechanism is of the BAS (Blocking After
Service) type. The cycle time of each machine of the system is the same and
is denoted by tc, i.e. the line is balanced, because the machines also have the same
efficiency. The number of parts circulating in the system is kept constant
during production and equal to P. For simplicity, the dedicated buffers have the same
capacity Ni with i = 1, ..., K.
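Given these parameters, the isolated efficiency of each machine follows from the standard reliability identity MTTF/(MTTF + MTTR); a quick check, not stated explicitly in the paper:

```python
def isolated_efficiency(mttf: float, mttr: float) -> float:
    """Fraction of time a machine is up when run in isolation:
    MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# MTTF = 1000 s and MTTR = 100 s, as in the test cases above.
e = isolated_efficiency(1000.0, 100.0)
print(round(e, 4))  # -> 0.9091
```

So each machine, taken alone, is up about 91% of the time; the line efficiency E studied below is lower because of blocking and starvation.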

Table 1. Test case: factor levels of the 2^5 experiment

Factors                                     Low    High
Cycle time (tc)                             5 s    60 s
Total buffer capacity (NTOT)                100    125
Portion of dedicated buffer capacity (α)    0.5    1
Travel time (tt)                            0 s    30 s
Number of parts (P)                         75     90

The goal of the experiment is to evaluate, by means of steady-state simulations,
the significance that potential factors may have on the main performance indicator,
the production rate. The factors taken into consideration in the experiment are: the
machine cycle time tc, the total buffer capacity NTOT, the portion of dedicated
buffer capacity α, the travel time tt and the number of parts P that circulate in the
system. The design of experiments is a 2^5 factorial plan; Table 1 reports the factor
levels chosen in the experiment. The parameter α can assume values between
0 and 1. Notice that systems with α = 1 correspond to traditional flow lines in
which the whole buffer capacity of the system is dedicated. In each treatment of
the designed factorial plan, 15 simulation replications have been carried out, and
statistics have been collected after a warm-up period of 86400 simulated seconds
and 25000 finished pieces. In each simulated scenario the capacity of the dedicated
buffers is calculated in the following way:

N_i = (NTOT · α) / K,   i = 1, ..., K     (1)

while the capacity of the shared buffer is equal to NTOT · (1 − α). The analysis
of variance has been applied to test the significance of the analyzed factors on the
system's efficiency, denoted by E and calculated as:

E = X / X*     (2)

where X is the average production rate collected in a simulation run and X* is
the maximum production rate calculated without considering failures, blocking
and starvation at the machines. However, since the normality assumption on the residuals is
not satisfied, we were forced to divide the analyzed response values into two
distinct populations corresponding to the low and high levels of the cycle time factor.

After that, all the assumptions required by the analysis of variance were satisfied,
and the main results are now presented. In particular, the Anderson-Darling and
Bartlett tests at the 95 percent confidence level have been used to test normality and
variance homogeneity of the residuals, respectively; independence was assured by
the randomized execution of the experiments and by the different seeds used for
generating pseudo-random numbers. The results from the ANOVA are reported in
Tables 2 and 3, where significant factors and interactions are recognizable: a source
is significant for the system's efficiency if the p-value in the last column is lower
than the Bonferroni family alpha (chosen equal to 0.05) divided by the number of
executed statistical tests (i.e. 15 in this experiment).
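Equations (1) and (2), together with the Bonferroni significance criterion just described, can be sketched as follows; the function names are ours:

```python
def dedicated_capacity(n_tot: int, alpha: float, k: int) -> float:
    """Eq. (1): capacity of each dedicated buffer, N_i = N_TOT * alpha / K."""
    return n_tot * alpha / k

def shared_capacity(n_tot: int, alpha: float) -> float:
    """Capacity of the shared buffer, N_TOT * (1 - alpha)."""
    return n_tot * (1.0 - alpha)

def efficiency(x: float, x_max: float) -> float:
    """Eq. (2): E = X / X*, observed vs. failure-free production rate."""
    return x / x_max

def is_significant(p_value: float, alpha_family: float = 0.05,
                   n_tests: int = 15) -> bool:
    """Bonferroni criterion: significant if p < alpha_family / n_tests."""
    return p_value < alpha_family / n_tests

# N_TOT = 100, alpha = 0.5, K = 5 machines:
print(dedicated_capacity(100, 0.5, 5))  # -> 10.0 (per dedicated buffer)
print(shared_capacity(100, 0.5))        # -> 50.0 (shared buffer)
print(is_significant(0.000))            # -> True  (e.g. the pallet-number factor)
print(is_significant(0.084))            # -> False (e.g. the travel-time factor)
```

With alpha = 1 the shared capacity is zero and the sketch collapses to the traditional dedicated-buffer line.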
As far as the main effects are concerned, the number of pallets that circulate
in the system, the portion of dedicated buffers and the total buffer capacity are
significant for both populations with different cycle times. The main effect of a
factor is the average change in the response due to moving the factor from its low
level to its high level [6]; this average is taken over all combinations of the factor
levels in the design. A first conclusion is that in the analyzed system a travel time
equal to 0 or 30 s is not relevant for the system's efficiency, owing to the values
chosen for the levels. However, increasing the travel time beyond a threshold value
greater than 30 s leads to poor performance; we will see at the end of this section
the threshold values beyond which the travel time becomes significant. The factor
α is significant for both populations and, furthermore, it is possible to conclude,
by comparing the two levels with Tukey's method, that the analyzed closed
flow line with shared buffer has a statistically higher efficiency than the
analyzed traditional flow line. Notice that the difference of efficiency between the two
levels is around 5% for tc = 5 s and 1% for tc = 60 s. The significance of the
number of pallets and the total buffer capacity is a well-known result in the literature
[2–4].
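The main-effect definition just given can be computed directly from a two-level design: average the response at the factor's high level, average it at the low level, and take the difference. A minimal sketch with made-up efficiency values, not the paper's data:

```python
def main_effect(responses_high, responses_low):
    """Main effect of a 2-level factor: mean response at the high level
    minus mean response at the low level, averaged over all other
    factor-level combinations."""
    return (sum(responses_high) / len(responses_high)
            - sum(responses_low) / len(responses_low))

# Made-up efficiencies at alpha = 0.5 (shared buffer) vs alpha = 1 (all dedicated):
e_alpha_low = [0.88, 0.90, 0.87, 0.89]   # alpha = 0.5
e_alpha_high = [0.84, 0.85, 0.83, 0.84]  # alpha = 1
print(round(main_effect(e_alpha_high, e_alpha_low), 3))  # -> -0.045
```

A negative value here would mean that moving from the shared-buffer configuration to the fully dedicated one lowers the efficiency, consistent with the direction reported in the text.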
As far as the interaction effects are concerned, it is possible to state that the two-way
interactions between P, α and NTOT are statistically relevant (see also Figs. 3
and 4). A two-way interaction is significant if the combined variation of the two
factors has a relevant effect on the response. The interaction between P and α
shows that the system performance decreases when the number of pallets is high
and all buffers are dedicated: in this case the blocking of machines is frequent
and it deeply affects the line efficiency. Notice that the system with shared buffer
has approximately the same efficiency value independently of how many pallets
circulate in the line. On the contrary, performance decreases in the traditional system
when the number of pallets is increased. The interaction between NTOT and α
shows that system performance decreases when the total buffer capacity is low and
all buffers are dedicated: in this case the blocking of machines is frequent because
the buffer capacity is simultaneously reduced and dedicated. The interaction between
NTOT and P is known in the literature [3, 9] and is not discussed further. The triple
interaction among P, α and NTOT is significant only for a cycle time
equal to 60 s.
The same system has also been simulated for a wider set of values to better
understand the effect of the shared buffer on the system performance. Figure 5
shows the average throughput for different sharing levels of the central buffer when

Table 2. Test case: ANOVA results (tc = 5s)

Source DF Seq SS Adj SS Adj MS F P

Pallet number (P ) 1 0.018475 0.018475 0.018475 441.97 0.000
Alpha (α) 1 0.154370 0.154370 0.154370 3693.01 0.000
Travel time (tt ) 1 0.000126 0.000126 0.000126 3.01 0.084
Total buffer capacity (NT OT ) 1 0.077345 0.077345 0.077345 1850.32 0.000
P *α 1 0.010684 0.010684 0.010684 255.59 0.000
P *tt 1 0.000004 0.000004 0.000004 0.09 0.762
P *NT OT 1 0.017484 0.017484 0.017484 418.28 0.000
α*tt 1 0.000068 0.000068 0.000068 1.63 0.203
α*NT OT 1 0.020673 0.020673 0.020673 494.56 0.000
tt *NT OT 1 0.000063 0.000063 0.000063 1.50 0.222
P *α*tt 1 0.000167 0.000167 0.000167 3.99 0.047
P *α*NT OT 1 0.000358 0.000358 0.000358 8.57 0.004
P *tt *NT OT 1 0.000077 0.000077 0.000077 1.85 0.175
α*tt *NT OT 1 0.000005 0.000005 0.000005 0.12 0.732
P *α*tt *NT OT 1 0.000024 0.000024 0.000024 0.58 0.445
Error 224 0.009363 0.009363 0.000042
Total 239 0.309285

Table 3. Test case: ANOVA results (tc = 60s)

Source DF Seq SS Adj SS Adj MS F P

Pallet number (P ) 1 0.0010176 0.0010176 0.0010176 440.48 0.000
Alpha (α) 1 0.0049118 0.0049118 0.0049118 2126.16 0.000
Travel time (tt ) 1 0.0000065 0.0000065 0.0000065 2.81 0.095
Total buffer capacity (NT OT ) 1 0.0022927 0.0022927 0.0022927 992.46 0.000
P *α 1 0.0009786 0.0009786 0.0009786 423.60 0.000
P *tt 1 0.0000011 0.0000011 0.0000011 0.49 0.483
P *NT OT 1 0.0007292 0.0007292 0.0007292 315.65 0.000
α*tt 1 0.0000001 0.0000001 0.0000001 0.06 0.804
α*NT OT 1 0.0017108 0.0017108 0.0017108 740.57 0.000
tt *NT OT 1 0.0000018 0.0000018 0.0000018 0.78 0.380
P *α*tt 1 0.0000014 0.0000014 0.0000014 0.60 0.438
P *α*NT OT 1 0.0002936 0.0002936 0.0002936 127.10 0.000
P *tt *NT OT 1 0.0000099 0.0000099 0.0000099 4.30 0.039
α*tt *NT OT 1 0.0000009 0.0000009 0.0000009 0.40 0.525
P *α*tt *NT OT 1 0.0000013 0.0000013 0.0000013 0.55 0.461
Error 224 0.0005175 0.0005175 0.0000023
Total 239 0.0124749

Fig. 3. Test case: interaction plot (tc = 5s)

Fig. 4. Test case: interaction plot (tc = 60s)

the travel time is equal to 5 s. When the number of pallets is small, the shared buffer is
never used and the different systems have the same performance. When the number
of pallets increases, starvation decreases and the throughput increases; however,
as the number of pallets increases further, blocking occurs more frequently and the
systems with shared buffer perform better than the traditional one (i.e. α = 1). In
more detail, the higher the sharing percentage, the better the performance. Beyond a
certain number of pallets in the system, blocking penalizes the system performance
and the throughput decreases [2, 3, 9]. This inversion value is larger in systems
with shared buffer than in traditional systems. In particular, the inversion point
increases as the sharing percentage of the buffers increases. Thus, in order to increase
the throughput, the system's user could move a portion of the buffer capacity from
dedicated to shared and simultaneously increase the number of pallets.

Fig. 5. Test case: average production rate (±1.2 part/hour) vs P when NTOT = 50 and
tt = 5 s for different values of α

Fig. 6. Test case: average production rate (±1.2 part/hour) vs tt when NTOT = 50 and
P = 30 for different values of α
Figure 6 shows the effect of the travel time on the average throughput. The
system performance stays stable for travel time values below a threshold and
deteriorates beyond it. In this experiment the threshold value
is equal to 40 s, 25 s and 15 s for values of α equal to 0.75, 0.5 and 0.25, respectively,
when the number of pallets is 30. The threshold value of the travel time
decreases as the percentage of shared buffer increases, because there are fewer pallets
in the dedicated buffers and starvation occurs more frequently. This effect could be
compensated by increasing the number of pallets.

Fig. 7. Test case: average production rate (±1.2 part/hour) vs tt when NTOT = 50 and
α = 0.75 for different values of P
Figure 7 shows the effect of the travel time for the system with α = 0.5 and
different numbers of pallets. Notice that the loss of production beyond the threshold
value of the travel time is larger for a small number of pallets. It is worth
noticing that the results reported in this section are valid for closed flow lines. Open
flow lines are more difficult to manage, owing to the large number of parts which may
enter from the first machine. Indeed, if there is no limit to the number of parts
entering the system and the first machine is very efficient, the parts just
entered and processed by the first machine fill the shared buffer, thus
limiting the possibility for the other machines to resort to the shared buffer. Thus,
specific rules for managing the entry of parts should be designed.

3.2 Real case

In this section we consider a real assembly line composed of five machines
separated by buffers with limited capacity. Pallets are empty before entering
the first machine; components are then loaded on the pallets until the assembled final
product is obtained at the last machine of the system. The components are stored
in containers located at each machine and are not modelled as customers; thus
the system can be viewed as a flow line crossed by parts (i.e. the pallets) that visit
the machines in a fixed sequence. The number of pallets in the system remains constant

Fig. 8. Real case: lay-out of the real system

Fig. 9. Real case: lay-out of alternative 1

during the production. Machines are unreliable and characterized by different types
of failures. In particular, machines M1, M2 and M5 can fail in three different ways,
while M3 and M4 in only one. Times to failure and times to repair for each failure
type are exponentially distributed, with MTTF and MTTR values as reported in
Table 4 and estimated by the firm. As in the real system, a machine can fail only
when it is occupied by a part. The first failure type of each machine models
mechanical and electronic failures in an aggregated way, while the second and third
failure types of M1, M2 and M5 model the emptying of component containers.
Table 4 reports the failure parameters and the deterministic processing rates of the
machines. The BAS (blocking after service) control rule is considered, that is,
machines may enter a blocking state only after the completion of the process.
A physical constraint in the lay-out does not allow changes in the portion
of the system between M5 and M1, i.e. buffer B5 is dedicated and its capacity
cannot be modified. The lay-out of the system is shown in Figure 8. The real
system already uses flexible transport modules in the traditional way for moving
parts through the line at a constant speed of 17.6 m/min.
Among a large set of feasible solutions, two reasonable alternative systems
with shared buffer are considered in the comparison with the real one. The first
alternative has a shared buffer, located at the center of the line, in addition to
the dedicated buffers of the real system. The increase in total buffer capacity is
around 31%, corresponding to an increase of approximately 44 kEuro in the total
investment cost (this value has been estimated on the basis of additional conveyors,
sensors, engines and control system). The second alternative has been designed
with an investment cost equal to that of the real line; its total buffer capacity is lower
than that of the real line because of the additional costs of sensors and engines.
Total buffer capacities are reported in Table 5, while the lay-outs of the alternatives
with shared buffer are shown in Figures 9 and 10.

Automated flow lines with shared buffer 111

Fig. 10. Real case: lay-out of alternative 2

Fig. 11. Input/output into/from the shared buffer
In the proposed alternatives each machine is blocked only if both its dedicated buffer
and the common buffer are full. The mechanism of input/output into/from the
shared buffer is now described with reference to Figure 11. Let us consider the portion
of the system between machines M1 and M2. Before machine M1 releases
a processed part, the system checks the availability of space in the portion of
conveyor between M1 and M2, i.e. in the dedicated buffer with size N1. If there
is space in the dedicated buffer, the machine releases the part, which then moves
towards machine M2; otherwise the system checks whether the part can be introduced into
the shared buffer. If there is space available in the shared buffer, the machine releases
the part, which enters the shared buffer; otherwise the machine is blocked until a
new space becomes available in the dedicated or shared buffer. This reaction
of the machine has been called the "block-and-recirculate" strategy by Tempelmeier
and Kuhn in their book [9]. The point denoted A in Figure 11 is the transfer
point at which parts can change conveyor. The transfer point is bi-directional, that
is, a part can be switched from the output conveyor of the machine to the shared conveyor
and vice versa. Switching devices are available on the market at affordable cost and
allow the machine to avoid the blocking state. An example of a switching mechanism
is shown in Figure 12. When a new space becomes available in the shared buffer, a
control rule must be defined to decide which part, if any, will access the common
area. In the proposed systems, precedence is given to the machine that has just
freed the place in the shared buffer.

Table 4. Real case: processing rates [part/min] and MTTFs and MTTRs [min] of machines

Machine  Processing  MTTF    MTTR    MTTF    MTTR    MTTF    MTTR
number   rate        type 1  type 1  type 2  type 2  type 3  type 3

1        20.0        5.64    0.81    499.17  4.00    143.99  7.16
2        17.3        2.90    1.08    94.78   5.16    69.23   5.16
3        16.5        5.61    0.57    –       –       –       –
4        15.7        21.28   0.51    –       –       –       –
5        16.0        10.60   0.63    274.43  5.16    29.93   5.00

Table 5. Real case: buffer capacities of real system and alternatives with shared buffer

System         N1   N2   N3   N4   N5   NShared  Total  Dedicated  Shared

Real           110  66   107  70   83   0        436    100 %      0 %
Alternative 1  110  43   92   52   83   192      572    61 %       39 %
Alternative 2  44   24   29   37   83   149      366    47 %       53 %

Table 6. Real case: comparison between real system and alternatives with shared buffer

System         Max average production  Investment    Average productivity
               rate [part/h]           cost [kEuro]  index

Real           650.9±3.6               2250          0.289
Alternative 1  677.7±3.9               2294          0.295
Alternative 2  667.3±3.1               2250          0.297
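The release rule for a machine that has completed a part can be sketched as follows. This is a minimal illustration of the control logic only, not the authors' simulation model; the function name and arguments are invented for the example.

```python
def release_decision(dedicated_free: int, shared_free: int) -> str:
    """Block-and-recirculate release rule for a machine that has just
    finished processing a part (sketch; names are illustrative)."""
    if dedicated_free > 0:
        return "dedicated"   # part proceeds towards the next machine
    if shared_free > 0:
        return "shared"      # part recirculates through the shared buffer
    return "blocked"         # machine waits until a space frees up

# A machine is blocked only when both buffers are full:
print(release_decision(0, 3))  # shared
print(release_decision(2, 0))  # dedicated
print(release_decision(0, 0))  # blocked
```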
Each time a part must leave the shared buffer, it has to reach the transfer point.
If the shared buffer is large, the time to reach the transfer point can be so long
that the dedicated buffer empties and starvation occurs. In order to decrease this
time, which is a portion of the travel time defined above, two inner alternative
paths have been introduced in the first alternative (see Fig. 9). In the second
alternative the machines are closer and the travel time is not critical. The transfer,
in which the part changes conveyor from the shared to the dedicated buffer, takes
a few seconds.
The performance of the systems has been estimated by terminating simulations of
one production day, because at the end of the shift the system is always emptied.
The simulation model has been validated against the real production rates of seven days.
Statistics have been collected and confidence intervals on the production rate have been
calculated at the 95% confidence level. Figure 13 shows the average production rate
for the analyzed systems, which depends on the number of pallets that circulate
in the system [2, 3, 9]. As shown in Table 6, both proposed alternative systems
have a productivity index (calculated as average production rate over investment
cost) greater than that of the real system. In particular, at equal investment cost, the
second alternative has an average production rate 2.5% greater than that of the real
case. Figures 13–18 report the detailed states of the machines for the real system and
alternative 2.

Fig. 12. Switching mechanism

Fig. 13. Real case: average production rate vs number of pallets (±4 part/h)

It can be noticed that the blocking of machines decreases from
the real system to the system with the shared buffer, while starvation increases due to
the fact that parts circulate in the shared buffer. However, the disadvantages due to
the increased starvation do not offset the advantages due to the reduction
of blocking in the system.
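The productivity index in Table 6 is simply the maximum average production rate divided by the investment cost. The short sketch below, using the table's figures, reproduces the reported indices and the 2.5% production-rate gain of alternative 2 at equal cost:

```python
systems = {
    # name: (max average production rate [part/h], investment cost [kEuro])
    "Real":          (650.9, 2250),
    "Alternative 1": (677.7, 2294),
    "Alternative 2": (667.3, 2250),
}

# productivity index = average production rate / investment cost
index = {name: rate / cost for name, (rate, cost) in systems.items()}
for name, value in index.items():
    print(f"{name}: {value:.3f}")

# At equal investment cost, alternative 2 outproduces the real line:
gain = systems["Alternative 2"][0] / systems["Real"][0] - 1
print(f"gain: {gain:.1%}")  # about 2.5%
```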
A long term investment analysis must consider the Net Present Value (NPV)
related to the investment, i.e. the sum of all discounted cash flows during the life
of the system. The NPV considers the initial investment cost (fundamentally ma-
chines, buffers), the future discounted cash flows during the normal running of the
system (revenues and production costs) and the residual value of the system after
the planning horizon of the investment. In the investment analysis only the discrim-
inating voices have to be considered. In this case the three alternative systems differ
in the buffer capacity since machines remain the same. Thus the investment cost of
machines is not differential and is not considered. The investment cost of buffers
is different only for the alternative 1. Revenues are discriminant if the additional
production capacity of the proposed alternatives is converted in additional sales.

Fig. 14. Real case: Machine 1 states vs number of pallets for real system and alternative 2

A reasonable assumption is that the unit production cost does not differ, because
the process technology is the same for each alternative. Again, if the additional
capacity is exploited in new sales, the total variable production costs are a differential
item. The residual value of a production system at the end of the planning
horizon is very difficult to estimate. However, the difference in initial
investment with respect to the real system is limited to 44 kEuro for the first alternative and
null for the second one, and the difference between the residual values can be
expected to be even smaller, so we neglect the residual value item in the analysis. The
second alternative has the same investment cost as the real system with a higher
production rate; in this case the NPV analysis is not necessary because the real
system is dominated by the alternative. For the first alternative we have to impose

Fig. 15. Real case: Machine 2 states vs number of pallets for real system and alternative 2

some assumptions to compensate for the lack of information on revenues and costs.
Assuming a yearly discount rate of 0.1, a planning horizon of 5 years, 8 hours of
production per day, and that all the additional capacity is sold in the market, the
marginal gain (i.e. the difference between price and variable cost of the product) that the
product must have to compensate for the additional investment of the first alternative is
equal to 0.17 euro/part, corresponding to approximately 0.6% of the market price
of the product. We believe that the marginal gain on the product is much larger than
the calculated threshold value, and therefore the proposed alternative 1 seems to
be profitable. Obviously, if the additional capacity is not exploited, the real
system is clearly more profitable than the first alternative; however, in this situation
the second alternative dominates the others.
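The break-even marginal gain quoted above can be reproduced with a standard annuity calculation. The number of working days per year is not stated in the text; the sketch below assumes roughly 320 days, which makes the paper's figures come out to about 0.17 euro/part:

```python
# Break-even marginal gain for alternative 1 (sketch).
# Figures from the paper: extra investment 44 kEuro, discount rate 0.1,
# horizon 5 years, 8 h of production per day. The number of working
# days per year is NOT stated in the paper; 320 is an assumption that
# reproduces the reported threshold of about 0.17 euro/part.
extra_investment = 44_000.0          # euro
rate_alt1, rate_real = 677.7, 650.9  # part/h, from Table 6
extra_capacity = rate_alt1 - rate_real

i, years = 0.10, 5
annuity = (1 - (1 + i) ** -years) / i    # present value of 1 euro/year

working_days = 320                        # assumed, not from the paper
extra_parts_per_year = extra_capacity * 8 * working_days

breakeven_gain = extra_investment / (annuity * extra_parts_per_year)
print(f"break-even marginal gain: {breakeven_gain:.2f} euro/part")  # 0.17
```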

Fig. 16. Real case: Machine 3 states vs number of pallets for real system and alternative 2

4 Practical considerations

In order to introduce manufacturing flow lines with shared buffer into real shop floors,
several aspects, both technological and economic, have to be clarified. First of all,
a necessary condition for exploiting the common buffer is that parts must be tracked
during their movements in the system, in order to be able to dispatch them to the
different machines. Several technologies are available for this at low cost.
Laser markers can engrave codes, easily readable by optical devices, on metal
components in a fast and cheap way. Standard devices like chips can store information
and exchange it with the system supervisor. Also, radio frequency technology
is now ready to be used in shop floors to exchange information without the limiting

Fig. 17. Real case: Machine 4 states vs number of pallets for real system and alternative 2

constraint of designing control points in the system. Therefore, traceability of parts
in the system does not seem to be an obstacle in future applications.
Conveyors seem to be a good and consolidated solution for moving parts from
machines to the common buffer and vice versa. However, other devices should also be
investigated, such as robot manipulators that can move parts through the system.
The main advantage of manipulators is their flexibility, since they can be adapted to
different situations (e.g. reacting to changes in the lay-out of the system)
simply by re-programming them. The main drawback of manipulators is related
to their investment cost and the skills needed to program them. Shuttles and AGVs
(Automated Guided Vehicles) represent a traditional solution to part movement in

Fig. 18. Real case: Machine 5 states vs number of pallets for real system and alternative 2

Flexible Manufacturing Systems, which represent the first case of shared buffer in
automated manufacturing systems.
Another important aspect that is normally taken into consideration in the design
phase of a flow line is the floor space occupied by the system. Theoretically,
additional space is necessary to locate the common buffer in a manufacturing flow
line. However, it is also true that the space occupied by dedicated buffers decreases,
and consequently the machines of the line are closer together. Therefore, it is not
possible to say anything a priori about the effect of the shared buffer on the occupied
floor space, since this aspect is closely connected to the lay-out of the real system.

5 Conclusions and future developments

The paper addresses the problem of fully using buffer space in flow lines. The idea
is to exploit recent technological devices to move pieces in reasonable time from
a machine to a common buffer area of the system and vice versa. In this way
machines can avoid blocking by moving pieces to the shared buffer area. The
decrease of blocking in flow lines has a positive impact on their production rate.
The numerical analysis reported in the paper demonstrates the validity of the idea
and points out the factors that affect the improvement of the proposed system
architecture in terms of productivity.
In conclusion, several practical aspects have to be investigated before it can be
stated that shared buffers can be successfully adopted in real manufacturing flow lines;
however, the first results shown in this paper and the technologies now available
motivate further research in this direction. Ongoing research is dedicated to identifying
the potential sectors for practical applications of the new concepts proposed in
this paper. Further research will then focus on new key issues, never addressed in the
literature, that are introduced by the architecture with the shared buffer:
– Allocation of dedicated and shared buffers. Traditionally, only the capacities of
dedicated buffers have been considered in the design phase of manufacturing
flow lines. In our opinion, the buffer allocation problem in the case of a shared
buffer will be easier than the traditional one because the new
system architecture is more robust, i.e. the system performance is stable under
several conditions and does not decay after changes in the design of the
line.
– Performance evaluation of flow lines with shared buffer. New analytical methods
are necessary to estimate the performance of the new system architecture. The
method of Tempelmeier et al. [10], originally developed to evaluate the performance
of Flexible Manufacturing Systems with blocking, could be adapted
to flow lines with shared buffer. This method is being tested in terms of the
accuracy of the results it provides.
– Management of flow lines with shared buffer. New dispatching rules could
be necessary to avoid deadlock in the new system architecture when pieces
converge on the same area from different positions.

References

1. Dallery Y, Gershwin SB (1992) Manufacturing flow line systems: a review of models and analytical results. Queueing Systems 12: 3–94
2. Dallery Y, Towsley D (1991) Symmetry property of the throughput in closed tandem queueing networks with finite capacity. Operations Research Letters 10(9): 541–547
3. Frein Y, Commault C, Dallery Y (1996) Modeling and analysis of closed-loop production lines with unreliable machines and finite buffers. IIE Transactions 28: 545–554
4. Gershwin SB (1994) Manufacturing systems engineering. PTR Prentice Hall, New Jersey
5. Gershwin SB, Schor JE (2000) Efficient algorithms for buffer space allocation. Annals of Operations Research 93: 91–116
6. Law AM, Kelton WD (2000) Simulation modeling and analysis. McGraw-Hill, New York
7. Shanthikumar JG, Yao DD (1989) Monotonicity and concavity properties in cyclic queueing networks with finite buffers. In: Perros HG, Altiok T (eds) Queueing networks with blocking, pp 325–344. North-Holland, Amsterdam
8. Tempelmeier H (2003) Practical considerations in the optimization of flow production systems. International Journal of Production Research 41(1): 149–170
9. Tempelmeier H, Kuhn H (1993) Flexible manufacturing systems – decision support for design and operation. Wiley, New York
10. Tempelmeier H, Kuhn H, Tetzlaff U (1989) Performance evaluation of flexible manufacturing systems with blocking. International Journal of Production Research 27(11): 1963–1979
Integrated quality and quantity modeling
of a production line
Jongyoon Kim and Stanley B. Gershwin
Department of Mechanical Engineering, Massachusetts Institute of Technology,
Cambridge, MA 02139-4307, USA (e-mail: [email protected])

Abstract. During the past three decades, the success of the Toyota Production Sys-
tem has spurred much research in manufacturing systems engineering. Productivity
and quality have been extensively studied, but there is little research in their inter-
section. The goal of this paper is to analyze how production system design, quality,
and productivity are inter-related in small production systems. We develop a new
Markov process model for machines with both quality and operational failures,
and we identify important differences between types of quality failures. We also
develop models for two-machine systems, with infinite buffers, buffers of size zero,
and finite buffers. We calculate total production rate, effective production rate (ie,
the production rate of good parts), and yield. Numerical studies using these models
show that when the first machine has quality failures and the inspection occurs only
at the second machine, there are cases in which the effective production rate in-
creases as buffer sizes increase, and there are cases in which the effective production
rate decreases for larger buffers. We propose extensions to larger systems.

Keywords: Quality, Productivity, Manufacturing system design

1 Introduction

1.1 Motivation

During the past three decades, the success of the Toyota Production System has
spurred much research in manufacturing systems design. Numerous research pa-
pers have tried to explore the relationship between production system design and

We are grateful for support from the Singapore-MIT Alliance, the General Motors Re-
search and Development Center, and PSA Peugeot-Citroën.
Correspondence to: S.B. Gershwin

productivity, so that they can show ways to design factories to produce more products
on time with fewer resources (such as people, material, and space). On the other
hand, topics in quality research have captured the attention of practitioners and re-
searchers since the early 1980s. The recent popularity of Statistical Quality Control
(SQC), Total Quality Management (TQM), and Six Sigma has demonstrated the
importance of quality.
These two fields, productivity and quality, have been extensively studied and
reported separately both in the manufacturing systems research literature and the
practitioner literature, but there is little research in their intersection. The need for
such work was recently described by authors from the GM Corporation based on
their experience [13]. All manufacturers must satisfy these two requirements (high
productivity and high quality) at the same time to maintain their competitiveness.
Toyota Production System advocates admonish factory designers to combine
inspections with operations. In the Toyota Production System, the machines are
designed to detect abnormalities and to stop automatically whenever they occur.
Also, operators are equipped with means of stopping the production flow whenever
they note anything suspicious. (They call this practice jidoka.) Toyota Production
System advocates argue that mechanical and human jidoka prevent the waste that
would result from producing a series of defective items. Therefore jidoka is a means
to improve quality and increase productivity at the same time [23], [24]. But this
statement is arguable: quality failures are often those in which the quality of each
part is independent of the others. This is the case when the defect takes place due
to common (or chance or random) causes of variations [16]. In this case, there is
no reason to stop a machine that has made a bad part because there is no reason
to believe that stopping it will reduce the number of bad parts in the future. In this
case, therefore, stopping the operation does not influence quality but it does reduce
productivity. On the other hand, when quality failures are those in which once a
bad part is produced, all subsequent parts will be bad until the machine is repaired
(due to special or assignable or systematic causes of variations) [16], catching bad
parts and stopping the machine as soon as possible is the best way to maintain high
quality and productivity.
Non-stock or lean production is another popular buzzword in manufacturing
systems engineering. Some lean manufacturing professionals advocate reducing
inventory on the factory floor since the reduction of work-in-process (WIP) reveals
the problems in the production lines [3]. Thus, it can help improve product quality.
It is true in some sense: less inventory reduces the time between making a defect
and identifying the defect. But it is also true that productivity would diminish
significantly without stock [5]. Since there is a tradeoff, there must be optimal
stock levels that are specific to each manufacturing environment. In fact, Toyota
recently changed their view on inventory and are trying to re-adjust their inventory
levels [9].
What is missing in discussions of factory design, quality, and productivity is a
quantitative model to show how they are inter-related. Most of the arguments about
this are based on anecdotal evidence or qualitative reasoning that lack a sound sci-
entific quantitative foundation. The research described here tries to establish such
a foundation to investigate how production system design and operation influence

productivity and product quality by developing conceptual and computational models
of two-machine-one-buffer systems and performing numerical experiments.

1.2 Background

1.2.1 Quality models. There are two extreme kinds of quality failures based on
the characteristics of variations that cause the failures. In the quality literature,
these variations are called common (or chance or random) cause variations and
assignable (or special or unusual) cause variations [18].
Figure 1 shows the types of quality failures and variations. Common cause
failures are those in which the quality of each part is independent of the others.
Such failures occur often when an operation is sensitive to external perturbations
like defects in raw material or when the operation uses a new technology that is
difficult to control. This is inherent in the design of the process. Such failures can be
represented by independent Bernoulli random variables, in which a binary random
variable, which indicates whether or not the part is good, is chosen each time a
part is operated on. A good part is produced with probability π, and a bad part
is produced with probability 1 − π. The occurrence of a bad part implies nothing
about the quality of future parts, so no permanent changes can have occurred in the
machine. For the sake of clarity, we call this a Bernoulli-type quality failure. Most
of the quantitative literature on inspection allocation assumes this kind of quality
failure [21]. In this case, if bad parts are destined to be scrapped, it is useful to catch
them as soon as possible because the longer before they are scrapped, the more they
consume the capacity of downstream machines. However, there is no reason to stop
a machine that has produced a bad part due to this kind of failure.
The quality failures due to assignable cause variations are those in which a
quality failure only happens after a change occurs in the machine. In that case, it is
very likely that once a bad part is produced, all subsequent parts will be bad until
the machine is repaired. Here, there is much more incentive to catch defective parts
and stop the machine quickly. In addition to minimizing the waste of downstream
capacity, this strategy minimizes the further production of defective parts. For this
kind of quality failure, there is no inherent measure of yield because the fractions
Fig. 1. Types of quality failures (random variation within the specification limits
vs. an assignable variation, e.g. tool breakage, that persists until repair takes place)



of parts that are good and bad depend on how soon bad parts are detected and how
quickly the machine is stopped for repair. In this paper, we call this a persistent-type
quality failure. Most quantitative studies in Statistical Quality Control (SQC) are
dedicated to finding efficient inspection policies (sampling interval, sample size,
control limits, and others) to detect this type of quality failure [26].
In reality, failures are mixtures of Bernoulli-type quality failures and persistent-
type quality failures. It can be argued that the quality strategy of the Toyota Produc-
tion System [17], in which machines are stopped as soon as a bad part is detected,
is implicitly based on the assumption of the persistent-type quality failure. In this
paper, we focus on persistent failures.
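The operational difference between the two failure types can be illustrated with a small Monte Carlo sketch. It is a simplification (operational failures are ignored, and the rates are illustrative rather than drawn from any real line): for a persistent-type failure the yield depends on the detection rate h = 1/MTTD, whereas for a Bernoulli-type failure it would be fixed at π no matter how fast inspection is.

```python
import random

random.seed(1)

def persistent_yield(g: float, h: float, cycles: int = 200_000) -> float:
    """Fraction of good production under a persistent-type failure:
    good periods ~ Exp(g) alternate with undetected-bad periods ~ Exp(h),
    where h = 1/MTTD; operational failures are ignored for simplicity."""
    good = bad = 0.0
    for _ in range(cycles):
        good += random.expovariate(g)   # producing good parts
        bad += random.expovariate(h)    # producing bad parts until detection
    return good / (good + bad)

g = 0.01                       # quality failure rate (illustrative)
for h in (0.05, 0.5):          # slow vs fast detection
    y = persistent_yield(g, h)
    print(f"h={h}: simulated yield {y:.3f}, analytic {h / (g + h):.3f}")
```

Faster detection (larger h) directly raises the yield, which is exactly why stopping the machine quickly pays off for this failure type.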
1.2.2 System yield. System yield is defined here as the fraction of input to a sys-
tem that is transformed into output of acceptable quality. This is an important metric
because customers observe the quality of products only after all the manufacturing
processes are done and the products are shipped. The system yield is a complex
function of how the factory is designed and operated, as well as of the characteristics
of the machines. Some of the influencing factors include individual operation yields,
inspection strategies, operation policies, buffer sizes, and other factors. Compre-
hensive approaches are needed to manage system yield effectively. This research
aims to develop mathematical models to show how the system yield is influenced
by these factors.
1.2.3 Quality improvement policy. System yield is a complex function of various
factors such as inspection, individual operation yields, buffer size, operation poli-
cies, and others. There are many ways to affect the system yield. Inspection policy
has received the most attention in the literature. Research on inspection policies
can be divided into optimizing inspection parameters at a single station and the
inspection station allocation problem. The former issue has been investigated ex-
tensively in the SQC literature [26]. Here, optimal SQC parameters such as control
limits, sampling size, and frequency are sought for an optimal balance between the
inspection cost and the cost of quality. The latter research looks for the optimal
distributions of inspection stations along production lines [21].
Improving individual operation yield is another important way to increase the
system yield. Many studies in this field try to stabilize the process either by finding
root causes of variation and eliminating them or by making the process insensitive
to external noise. The former topic has numerous qualitative research papers in the
fields of Total Quality Management (TQM) [2] and Six Sigma [19]. Quantitative
research is more oriented toward the latter topic. Robust engineering [20] is an area
that has gained substantial attention.
It has been argued that inventory reduction is an effective means to improve
system yield. Many lean manufacturing specialists have asserted that less inventory
on the factory floor reveals problems in the manufacturing lines more quickly and
helps quality improvement activities [1, 17].
There also have been investigations to explain the relationship between plant
layout design and quality [7]. They argue that U-shaped lines are better than straight
lines for producing higher quality products since there are more points of contact
between operators. There is also less material movement, and there are other rea-
sons.

There are many ways to improve system yield, but using only a single method
will give limited gains. The effectiveness of each method is greatly dependent on
the details of the factory. Thus, there is need to determine which method or which
combination of methods is most effective in each case. The quantitative tools that
will be developed from this research can help fulfill this need.

1.3 Outline

In Section 2 we introduce the structure of the modeling techniques used in this
paper. We present modeling, solution techniques, and validation of the 2-machine-
1-finite buffer case in Section 3. Discussions on the behavior of a production line
based on numerical experiments are provided in Section 5. A future research plan
is shown in Section 6. Parameters of many of the systems studied numerically here,
and details of the analytical solution of the two-machine line, can be found in the
appendices.

2 Mathematical models

2.1 Single machine model

There are many possible ways to characterize a machine for the purpose of simul-
taneously studying quality and quantity issues. Here, we model a machine as a
discrete state, continuous time Markov process. Material is assumed continuous,
and µi is the speed at which Machine i processes material while it is operating and
not constrained by the other machine or the buffer. It is a constant, in that µi does
not depend on the repair state of the other machine or the buffer level.
Figure 2 shows the proposed state transitions of a single machine with persistent-
type quality failures. In the model, the machine has three states:
∗ State 1: The machine is operating and producing good parts.
∗ State -1: The machine is operating and producing bad parts, but the operator does
not know this yet.
∗ State 0: The machine is not operating.
The machine therefore has two different failure modes (i.e. transition to failure
states from state 1):
∗ Operational failure: transition from state 1 to state 0. The machine stops pro-
ducing parts due to failures like motor burnout.
∗ Quality failure: transition from state 1 to state -1. The machine stops producing
good parts (and starts producing bad parts) due to a failure like sudden tool damage.
When a machine is in state 1, it can fail due to a non-quality related event.
It goes to state 0 with transition probability rate p. After that an operator fixes it,
and the machine goes back to state 1 with transition rate r. Sometimes, due to an
assignable cause, the machine begins to produce bad parts, so there is a transition

Fig. 2. States of a machine (transitions: state 1 → state −1 at rate g; state −1 → state 0 at rate f; state 1 → state 0 at rate p; state 0 → state 1 at rate r)

from state 1 to state -1 with a probability rate g. Here g is the reciprocal of the Mean
Time to Quality Failure (MTQF). A more stable operation leads to a larger MTQF
and a smaller g.
The machine, when it is in state -1, can be stopped for two reasons: it may
experience the same kind of operational failure as it does when it is in state 1; and
the operator may stop it for repair when he learns that it is producing bad parts.
The transition from state -1 to state 0 occurs at probability rate f = p + h where h
is the reciprocal of the Mean Time To Detect (MTTD). A more reliable inspection
leads to a shorter MTTD and a larger f . (The detection can take place elsewhere,
for example at a remote inspection station.) Note that this implies that f > p. Here,
for simplicity, we assume that whenever a machine is repaired, it goes back to state
1. All the indicated transitions are assumed to follow exponential distributions.
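Because every holding time is exponential, the model is easy to simulate directly. The sketch below estimates the fraction of time spent in each state from one long sample path; the parameter values are illustrative, not taken from the paper.

```python
import random

random.seed(0)

# Illustrative rates (not from the paper): operational failure p,
# repair r, quality failure g, stop-from-bad-state f = p + h.
p, r, g, h = 0.01, 0.1, 0.005, 0.05
f = p + h

def step(state):
    """Sample the sojourn time in `state` and the next state."""
    if state == 1:                       # up, producing good parts
        t = random.expovariate(p + g)
        nxt = 0 if random.random() < p / (p + g) else -1
    elif state == -1:                    # up, producing bad parts
        t, nxt = random.expovariate(f), 0
    else:                                # down, under repair
        t, nxt = random.expovariate(r), 1
    return t, nxt

time_in = {1: 0.0, -1: 0.0, 0: 0.0}
state, horizon, clock = 1, 500_000.0, 0.0
while clock < horizon:
    t, nxt = step(state)
    time_in[state] += t
    clock += t
    state = nxt

total = sum(time_in.values())
for s in (1, -1, 0):
    print(f"time fraction in state {s}: {time_in[s] / total:.3f}")
```

The estimated fractions converge to the steady-state probabilities derived in the analysis below.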

Single machine analysis. To determine the production rate of a single machine,
we first determine the steady-state probability distribution. This is calculated based
on the probability balance principle: the probability of leaving a state is the same
as the probability of entering that state. We have
(g + p)P (1) = rP (0) (1)
f P (−1) = gP (1) (2)
rP (0) = pP (1) + f P (−1) (3)
The probabilities must also satisfy the normalization equation:
P (0) + P (1) + P (−1) = 1 (4)
The solution of (1)–(4) is
P(1) = 1 / [1 + (p + g)/r + g/f]   (5)

P(0) = [(p + g)/r] / [1 + (p + g)/r + g/f]   (6)

P(−1) = [g/f] / [1 + (p + g)/r + g/f]   (7)

The total production rate, including good and bad parts, is

PT = µ(P(1) + P(−1)) = µ (1 + g/f) / [1 + (p + g)/r + g/f]   (8)

The effective production rate, the production rate of good parts only, is

PE = µ P(1) = µ / [1 + (p + g)/r + g/f]   (9)

The yield is

PE/PT = P(1) / [P(1) + P(−1)] = f / (f + g)   (10)

2.2 2-machine-1-buffer continuous model

A flow (or transfer) line is a manufacturing system with a very special structure. It
is a linear network of service stations or machines (M1 , M2 , ..., Mk ) separated by
buffer storages (B1 , B2 , ..., Bk−1 ). Material flows from outside the system to M1 ,
then to B1 , then to M2 , and so forth until it reaches Mk , after which it leaves. Figure
3 depicts a flow line. The rectangles represent machines and the circles represent
buffers.

M1 B1 M2 B2 M3 B3 M4 B4 M5

Fig. 3. Five-machine flow line

2-machine-1-buffer (2M1B) models should be studied first. Then a decompo-
sition technique that divides a long transfer line into multiple 2-machine-1-buffer
models can be developed. (See [14].) Among the various modeling techniques
for the 2M1B case, including deterministic, exponential, and continuous models,
the continuous material line model is used for this research because it can handle
deterministic but different operation times at each operation. This is an extension
of the continuous material serial line modeling of [10] by adding another machine
failure state. Figure 4 shows the 2M1B continuous model where the machines,
buffer and discrete parts are represented as valves, a tank, and a continuous fluid.

Fig. 4. Two-machine-one-buffer continuous model

We assume that an inexhaustible supply of workpieces is available upstream of


the first machine in the line, and an unlimited storage area is present downstream

of the last machine. Thus, the first machine is never starved, and the last machine
is never blocked. Also, failures are assumed to be operation dependent (ODF).
Finally, we assume that each machine works on a different feature. For example,
the two machines may be making two different holes. We do not consider cases
where both machines work on the same hole, in which the first machine does
a roughing operation and the second does a finishing operation. This allows us to
assume that the failures of the two machines are independent.

2.3 Infinite buffer case

An infinite buffer case is a special 2M1B line in which the size of the buffer (B)
is infinite. This is an extreme case in which the first machine (M1 ) never suffers
from blockage. To derive expressions for the total production rate and the effective
production rate, we observe that when there is infinite buffer capacity between two
machines (M1 , M2 ), the total production rate of the 2M1B system is a minimum
of the total production rates of M1 and M2 . The total production rate of machine i
is given by (8), so the total production rate of the 2M1B system is
PT∞ = min[ µ1(1 + g1/f1) / {1 + (p1 + g1)/r1 + g1/f1} , µ2(1 + g2/f2) / {1 + (p2 + g2)/r2 + g2/f2} ]   (11)
The probability that machine Mi does not add non-conformities is

Yi = Pi(1) / [Pi(1) + Pi(−1)] = fi / (fi + gi)   (12)

Since there is no scrap and rework in the system, the system yield becomes

f1 f2 / [(f1 + g1)(f2 + g2)]   (13)

As a result, the effective production rate is

PE∞ = { f1 f2 / [(f1 + g1)(f2 + g2)] } PT∞   (14)
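Equations (8) and (11)–(14) chain together directly; a minimal sketch with illustrative parameter values (not the Appendix B cases):

```python
# Effective production rate of a 2M1B line with infinite buffer, eqs. (11)-(14).
# Each machine is described by a tuple (mu, p, r, g, f) as in Section 2.1.
def isolated_rates(mu, p, r, g, f):
    denom = 1 + (p + g) / r + g / f
    PT = mu * (1 + g / f) / denom     # eq. (8): isolated total rate
    Y = f / (f + g)                   # eq. (12): machine yield
    return PT, Y

def pe_infinite_buffer(m1, m2):
    PT1, Y1 = isolated_rates(*m1)
    PT2, Y2 = isolated_rates(*m2)
    PT = min(PT1, PT2)                # eq. (11): the slower stage dominates
    return Y1 * Y2 * PT               # eqs. (13)-(14): no scrap or rework
```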
The effective production rate evaluated from (14) has been compared with a
discrete-event, discrete-part simulation. Table 1 shows good agreement. The pa-
rameters for these cases are shown in Appendix B.
As indicated in Section 2.1, the detection of quality failures due to machine M1
need not occur at that machine. For example, the inspection of the feature that M1
works on could take place at an inspection station at M2 , and this inspection could
trigger a repair of M1 . (We call this quality information feedback. See Section 4.)
In that case, the MTTD of M1 (and therefore f1 ) will be a function of the amount
of material in the buffer. We return to this important case in Section 4.

2.4 Zero buffer case

The zero buffer case is one in which there is no buffer space between the machines.
This is the other extreme case where blockage and starvation take place most
frequently.

Table 1. Validation of infinite buffer case

Case # PE∞ (Analytic) PE∞ (Simulation) %Difference

1 0.762 0.761 0.17
2 0.708 0.708 0.00
3 0.657 0.657 0.00
4 0.577 0.580 −0.50
5 0.527 0.530 −0.42
6 0.745 0.745 0.01
7 0.762 0.760 0.30
8 1.524 1.522 0.14
9 0.762 0.762 0.00
10 1.524 1.526 −0.13

In the zero-buffer case in which machines have different operation times, when-
ever one of the machines stops, the other one is also stopped. In addition, when
both of them are working, the production rate is min[µ1 , µ2 ]. Consider a long time
interval of length T during which M1 fails m1 times and M2 fails m2 times. If we
assume that the average time to repair M1 is 1/r1 and the average time to repair
M2 is 1/r2, then the total system down time will be close to D = m1/r1 + m2/r2.
Consequently, the total up time will be approximately

U = T − D = T − (m1/r1 + m2/r2)   (15)
Since we assume operation-dependent failures, the rates of failure are reduced
for the faster machine. Therefore,
pi^b = pi min(µ1, µ2)/µi,   gi^b = gi min(µ1, µ2)/µi,   and   fi^b = fi min(µ1, µ2)/µi   (16)
The reduction of pi is explained in detail in [10]. The reductions of gi and fi
are done for the same reasons.
Table 2 lists the possible working states α1 and α2 of M1 and M2 . The third
column is the probability of finding the system in the indicated state. The fourth
and fifth columns indicate the expected number of transitions to down states during
the time interval from each of the states in column 1.

Table 2. Zero-buffer states, probabilities, and expected numbers of events

α1   α2   Probability π(α1, α2)                      Em1(α1, α2)        Em2(α1, α2)
 1    1   [f1^b/(f1^b+g1^b)][f2^b/(f2^b+g2^b)]      p1^b U π(1, 1)     p2^b U π(1, 1)
 1   −1   [f1^b/(f1^b+g1^b)][g2^b/(f2^b+g2^b)]      p1^b U π(1, −1)    f2^b U π(1, −1)
−1    1   [g1^b/(f1^b+g1^b)][f2^b/(f2^b+g2^b)]      f1^b U π(−1, 1)    p2^b U π(−1, 1)
−1   −1   [g1^b/(f1^b+g1^b)][g2^b/(f2^b+g2^b)]      f1^b U π(−1, −1)   f2^b U π(−1, −1)

From Table 2, the expectations of m1 and m2 are

Em1 = Σ_{α1=−1,1} Σ_{α2=−1,1} Em1(α1, α2) = U f1^b (p1^b + g1^b) / (f1^b + g1^b)   (17)

Em2 = Σ_{α1=−1,1} Σ_{α2=−1,1} Em2(α1, α2) = U f2^b (p2^b + g2^b) / (f2^b + g2^b)

By plugging them into equation (15), we find the total production rate:

PT0 = min[µ1, µ2] / { 1 + f1^b(p1^b + g1^b) / [r1(f1^b + g1^b)] + f2^b(p2^b + g2^b) / [r2(f2^b + g2^b)] }   (18)

The effective production rate is

PE0 = { f1^b f2^b / [(f1^b + g1^b)(f2^b + g2^b)] } PT0   (19)
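A sketch of the zero-buffer computation: apply the operation-dependent adjustment (16), then evaluate (18) and (19). Parameter values in the test are illustrative:

```python
# Zero-buffer 2M1B line, eqs. (16), (18), (19). Because failures are
# operation dependent (ODF), the failure-type rates of the faster machine are
# scaled down by min(mu1, mu2)/mu_i; the repair rate r is not scaled.
def zero_buffer(m1, m2):
    mu = min(m1[0], m2[0])
    def adjusted(m):
        mu_i, p, r, g, f = m
        s = mu / mu_i               # eq. (16): ODF scaling factor
        return p * s, r, g * s, f * s
    p1b, r1, g1b, f1b = adjusted(m1)
    p2b, r2, g2b, f2b = adjusted(m2)
    PT0 = mu / (1 + f1b * (p1b + g1b) / (r1 * (f1b + g1b))
                  + f2b * (p2b + g2b) / (r2 * (f2b + g2b)))      # eq. (18)
    PE0 = f1b * f2b / ((f1b + g1b) * (f2b + g2b)) * PT0          # eq. (19)
    return PT0, PE0
```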
The comparison with simulation is shown in Table 3. The parameters of the
cases are shown in Appendix B.

Table 3. Zero buffer case

Case # PE0 (Analytic) PE0 (Simulation) %Difference

1 0.657 0.662 −0.73
2 0.620 0.627 −1.15
3 0.614 0.621 −1.03
4 0.529 0.534 −0.99
5 0.480 0.484 −0.77
6 0.647 0.651 −0.57
7 0.706 0.712 −0.91
8 1.377 1.406 −2.10
9 0.706 0.711 −0.77
10 1.377 1.380 −0.22

3 2-machine-1-finite-buffer line

The two-machine line is the simplest non-trivial case of a production line. In the
existing literature on the performance evaluation of systems in which quality is not
considered, two-machine lines are used in decomposition approximations of longer
lines (see [10]).
We define the model here and show the solution technique in Appendix A.

3.1 State definition

The state of the 2M1B line is defined as (x, α1 , α2 ) where


∗ x: the total amount of material in buffer B, 0 ≤ x ≤ N ,
∗ α1 : the state of M1 . (α1 = −1, 0, or 1),
∗ α2 : the state of M2 . (α2 = −1, 0, or 1)
The parameters of machine Mi are µi , ri , pi , fi , gi and the buffer size is N .

3.2 Model development

3.2.1 Internal transition equations. When buffer B is neither empty nor full, its
level can rise or fall depending on the states of adjacent machines. Since it can
change only a small amount during a short time interval, it is reasonable to use a
continuous probability density f (x, α1 , α2 ) and differential equations to describe
its behavior. The probability of finding both machines at state 1 with a storage level
between x and x + δx at time t + δt is given by f(x, 1, 1, t + δt)δx, where

f(x, 1, 1, t + δt) = {1 − (p1 + g1 + p2 + g2)δt} f(x + (µ2 − µ1)δt, 1, 1)
                   + r2 δt f(x − µ1 δt, 1, 0) + r1 δt f(x + µ2 δt, 0, 1) + o(δt)   (20)
Except for the factor of δx, the first term is the probability of transition from
between (x + (µ2 − µ1 )δt, 1, 1) and (x + (µ2 − µ1 )δt + δx, 1, 1) at time t to
between (x, 1, 1) and (x + δx, 1, 1) at time t + δt. This is because
∗ The probability of neither machine failing between t and t + δt is
{1 − (p1 + g1)δt}{1 − (p2 + g2)δt} ≈ {1 − (p1 + g1 + p2 + g2)δt}   (21)
∗ If there are no failures between t and t + δt and the buffer level is between x and
x + δx at time t + δt, then it could only have been between x + (µ2 − µ1 )δt and
x + (µ2 − µ1 )δt + δx at time t.
The other terms, which represent the probabilities of transition from (1) machine
states (1,0) with buffer level between x − µ1 δt and x − µ1 δt + δx and (2) machine
states (0,1) with buffer level between x + µ2 δt and x + µ2 δt + δx, can be found
similarly. No other transitions are possible. After linearizing and letting δt → 0,
this equation becomes
∂f(x, 1, 1)/∂t = (µ2 − µ1) ∂f(x, 1, 1)/∂x − (p1 + g1 + p2 + g2) f(x, 1, 1)
               + r2 f(x, 1, 0) + r1 f(x, 0, 1)   (22)

In steady state ∂f/∂t = 0. Then, we have

(µ2 − µ1) df(x, 1, 1)/dx − (p1 + g1 + p2 + g2) f(x, 1, 1) + r2 f(x, 1, 0)
               + r1 f(x, 0, 1) = 0   (23)

In the same way, the eight other internal transition equations for the probability
density function are

p2 f(x, 1, 1) − µ1 df(x, 1, 0)/dx − (p1 + g1 + r2) f(x, 1, 0) + f2 f(x, 1, −1)
   + r1 f(x, 0, 0) = 0   (24)

g2 f(x, 1, 1) + (µ2 − µ1) df(x, 1, −1)/dx − (p1 + g1 + f2) f(x, 1, −1)
   + r1 f(x, 0, −1) = 0   (25)

p1 f(x, 1, 1) + µ2 df(x, 0, 1)/dx − (r1 + p2 + g2) f(x, 0, 1) + r2 f(x, 0, 0)
   + f1 f(x, −1, 1) = 0   (26)

p1 f(x, 1, 0) + p2 f(x, 0, 1) − (r1 + r2) f(x, 0, 0) + f2 f(x, 0, −1)
   + f1 f(x, −1, 0) = 0   (27)

p1 f(x, 1, −1) + g2 f(x, 0, 1) − (r1 + f2) f(x, 0, −1) + µ2 df(x, 0, −1)/dx
   + f1 f(x, −1, −1) = 0   (28)

g1 f(x, 1, 1) − (p2 + g2 + f1) f(x, −1, 1) + (µ2 − µ1) df(x, −1, 1)/dx
   + r2 f(x, −1, 0) = 0   (29)

g1 f(x, 1, 0) − µ1 df(x, −1, 0)/dx − (r2 + f1) f(x, −1, 0) + p2 f(x, −1, 1)
   + f2 f(x, −1, −1) = 0   (30)

g1 f(x, 1, −1) + g2 f(x, −1, 1) + (µ2 − µ1) df(x, −1, −1)/dx
   − (f1 + f2) f(x, −1, −1) = 0   (31)

3.2.2 Boundary transition equations. While the internal behavior of the system
can be described by probability density functions, there is a nonzero probability of
finding the system in certain boundary states. For example, if µ1 < µ2 and both
machines are in state 1, the level of storage tends to decrease. If both machines
remain operational for enough time, the storage will become empty (x = 0). Once
the system reaches state (0, 1, 1), it will remain there until a machine fails. There are
18 probability masses for boundary states (P (N, α1 , α2 ) and P (0, α1 , α2 ) where
α1 = −1, 0 or 1, and α2 = −1, 0 or 1) and 22 boundary equations for the µ1 = µ2
case.
To arrive at state (0, 1, 1) at time t + δt when µ1 = µ2, the system may have
been in one of two states at time t. It could have been in state (0, 1, 1), with neither
machine suffering an operational failure or a quality failure. Or it could have been
in state (0, 0, 1), with a repair of the first machine. (The second machine could not
have failed since it was starved.) If the second-order terms are ignored,

P(0, 1, 1, t + δt) = {1 − (p1 + g1 + p2^b + g2^b)δt} P(0, 1, 1) + r1 δt P(0, 0, 1)   (32)



After the usual analysis, (32) becomes


∂P(0, 1, 1)/∂t = −(p1 + g1 + p2^b + g2^b) P(0, 1, 1) + r1 P(0, 0, 1)   (33)
In steady state

−(p1 + g1 + p2^b + g2^b) P(0, 1, 1) + r1 P(0, 0, 1) = 0   (34)

There are 21 other boundary equations derived similarly for µ1 = µ2 [14]:

P(0, 1, 0) = 0   (35)
g2^b P(0, 1, 1) − (p1 + g1 + f2^b) P(0, 1, −1) + r1 P(0, 0, −1) = 0   (36)
p1 P(0, 1, 1) − r1 P(0, 0, 1) + µ2 f(0, 0, 1) + f1 P(0, −1, 1)
   + r2 P(0, 0, 0) = 0   (37)
−(r1 + r2) P(0, 0, 0) = 0   (38)
p1 P(0, 1, −1) − r1 P(0, 0, −1) + µ2 f(0, 0, −1) + f1 P(0, −1, −1) = 0   (39)
g1 P(0, 1, 1) − (f1 + p2^b + g2^b) P(0, −1, 1) = 0   (40)
P(0, −1, 0) = 0   (41)
g1 P(0, 1, −1) + g2^b P(0, −1, 1) − (f1 + f2^b) P(0, −1, −1) = 0   (42)
−(p1^b + g1^b + p2 + g2) P(N, 1, 1) + r2 P(N, 1, 0) = 0   (43)
p2 P(N, 1, 1) − r2 P(N, 1, 0) + µ1 f(N, 1, 0) + f2 P(N, 1, −1)
   + r1 P(N, 0, 0) = 0   (44)
g2 P(N, 1, 1) − (p1^b + g1^b + f2) P(N, 1, −1) = 0   (45)
P(N, 0, 1) = 0   (46)
−(r1 + r2) P(N, 0, 0) = 0   (47)
P(N, 0, −1) = 0   (48)
g1^b P(N, 1, 1) − (f1^b + g2 + p2) P(N, −1, 1) + r2 P(N, −1, 0) = 0   (49)
−r2 P(N, −1, 0) + µ1 f(N, −1, 0) + f2 P(N, −1, −1)
   + p2 P(N, −1, 1) = 0   (50)
g1^b P(N, 1, −1) + g2 P(N, −1, 1) − (f1^b + f2) P(N, −1, −1) = 0   (51)
µ1 f(0, 1, 0) = r1 P(0, 0, 0) + p2^b P(0, 1, 1) + f2^b P(0, 1, −1)   (52)
µ1 f(0, −1, 0) = p2^b P(0, −1, 1) + f2^b P(0, −1, −1)   (53)
µ2 f(N, 0, 1) = r2 P(N, 0, 0) + p1^b P(N, 1, 1) + f1^b P(N, −1, 1)   (54)
µ2 f(N, 0, −1) = p1^b P(N, 1, −1) + g2 P(N, 0, 1) + f1^b P(N, −1, −1)   (55)

3.2.3 Normalization. In addition to these, all the probability density functions
and probability masses must satisfy the normalization equation:

Σ_{α1=−1,0,1} Σ_{α2=−1,0,1} [ ∫_0^N f(x, α1, α2) dx + P(0, α1, α2) + P(N, α1, α2) ] = 1   (56)

3.2.4 Performance measures. After finding all probability density functions and
probability masses, we can calculate the average inventory in the buffer from

x̄ = Σ_{α1=−1,0,1} Σ_{α2=−1,0,1} [ ∫_0^N x f(x, α1, α2) dx + N P(N, α1, α2) ]   (57)

The total production rate is

PT = PT1 = µ1 Σ_{α2=−1,0,1} [ ∫_0^N {f(x, −1, α2) + f(x, 1, α2)} dx + P(0, 1, α2) + P(0, −1, α2) ]
   + µ2 {P(N, 1, −1) + P(N, 1, 1) + P(N, −1, −1) + P(N, −1, 1)}   (58)

The rate at which machine M1 produces good parts is

PE1 = µ1 Σ_{α2=−1,0,1} [ ∫_0^N f(x, 1, α2) dx + P(0, 1, α2) ]
   + µ2 {P(N, 1, −1) + P(N, 1, 1)}   (59)

The probability that the first machine produces a non-defective part is then Y1 =
PE1 /PT . Similarly, the probability that the second machine finishes its operation
without adding a bad feature to a part is Y2 = PE2 /PT , where

PE2 = µ2 Σ_{α1=−1,0,1} [ ∫_0^N f(x, α1, 1) dx + P(N, α1, 1) ]
   + µ1 {P(0, −1, 1) + P(0, 1, 1)}   (60)

Therefore, the effective production rate is


PE = Y1 Y2 PT (61)

3.3 Validation

The 2M1B systems with the same machine speed (µ1 = µ2 ) are solved in Ap-
pendix A. As we have indicated, we represent discrete parts in this model as a
continuous fluid and time as a continuous variable. We compare analytical and
simulation results in this section. In the simulation, both material and time are
discrete. Details are presented in [14].
Figure 5 shows the comparison of the effective production rate and the average
inventory from the analytic model and the simulation. 50 cases are generated by
changing machine and buffer parameters, and % errors are plotted on the vertical
axis. The parameters for these cases are given in Appendix B.

Fig. 5. Validation of the intermediate buffer size case (left: % error of PE; right: % error of Inv, plotted against case number)

The % error in the
effective production rate is calculated from

PE %error = [PE(A) − PE(S)] / PE(S) × 100(%)   (62)

where PE(A) and PE(S) are the effective production rates estimated from the
analytical model and the simulation respectively. The % error in the average
inventory, however, is calculated from

InvE %error = [InvE(A) − InvE(S)] / (0.5 × N) × 100(%)   (63)

where InvE(A) and InvE(S) are the average inventories estimated from the analytical
model and the simulation respectively and N is the buffer size.¹

The average absolute value of the % error in the effective production rate esti-
mate is 0.76%, and it is 1.89% for the average inventory estimate.
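For completeness, the two error measures (62) and (63) in code (a trivial sketch):

```python
# Percent errors used in the validation comparisons, eqs. (62)-(63).
def pe_percent_error(pe_analytic, pe_sim):
    return (pe_analytic - pe_sim) / pe_sim * 100.0        # eq. (62)

def inv_percent_error(inv_analytic, inv_sim, N):
    # Normalizing by half the buffer size keeps the measure unbiased with
    # respect to the relative machine speeds (see the footnote).
    return (inv_analytic - inv_sim) / (0.5 * N) * 100.0   # eq. (63)
```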

4 Quality information feedback

Factory designers and managers know that it is ideal to have inspection after every
operation. However, it is often costly to do this. As a result, factories are usually
designed so that multiple inspections are performed at a small number of stations.
In this case, inspection at downstream operations can detect bad features made by
upstream machines. We call this quality information feedback. A simple example of
the quality information feedback in 2M1B systems is when M1 produces defective
features but does not have inspection and M2 has inspection and it can detect bad
features made by M1. In this situation, as we demonstrate below, the yield of a line
is a function of the buffer size. This is because when the buffer gets larger, more
material can accumulate between an operation (M1) and the inspection of that
operation (M2). All such material will be defective if a persistent quality failure
¹ This is an unbiased way to calculate the error in average inventory. If it were calculated
in the same way as the error in the effective production rate, the error would depend on the
relative speeds of the machines. This is because there will be a lower error when the buffer
is mostly full (i.e., when M1 is faster than M2) and a higher error when the buffer is mostly
empty (i.e., when M2 is faster than M1).

takes place. In other words, if the buffer is larger, there tends to be more material in the
buffer and consequently more material is defective. In addition, it takes longer for
a part to be inspected after its operation is finished. We can capture this phenomenon with
the adjustment of a transition probability rate of M1 from state -1 to state 0.
Let us define f1^q as the transition rate of M1 from state -1 to state 0 when there
is quality information feedback and f1 as the transition rate without quality
information feedback. The adjustment is done in such a way that the yield of M1 is
the same as Z1^g / (Z1^g + Z1^b), where

∗ Z1^b: the expected number of bad parts generated by M1 while it stays in state -1.
∗ Z1^g: the expected number of good parts produced by M1 from the moment when
M1 leaves the -1 state to the next time it arrives at state -1.
From (10), the yield of M1 is

P(1) / [P(1) + P(−1)] = f1^q / (f1^q + g1)   (64)
Suppose that M1 has been in state 1 for a very long time. Then all parts in the
buffer B are non-defective. Suppose that M1 goes to state -1. Defective parts will
then begin to accumulate in the buffer. Until all the parts in the buffer are defective,
the only way that M1 can go to state 0 is through its own inspection or its own
operational failure. Therefore, the probability of a transition to 0 before M1 finishes
a part is

f1/µ1 ≡ χ11

Eventually all the parts in the buffer are bad, so defective parts reach M2. Then
there is another way that M1 can move to state 0 from state -1: quality information
feedback. The probability that the inspection at M2 detects a nonconformity made
by M1 is

χ21 ≡ h21/µ2

where 1/h21 is the mean time until the inspection at M2 detects a bad part made
by M1 after M2 receives the bad part.
The expected value of the number of bad parts produced by M1 before it is
stopped by either an operational failure or quality information feedback is

Z1^b = [χ11 + 2χ11(1 − χ11) + 3χ11(1 − χ11)² + ... + wχ11(1 − χ11)^(w−1)]
     + [(w + 1)(1 − χ11)^w χ21 + (w + 2)(1 − χ11)^(w+1) χ21(1 − χ21) + ...]   (65)

where w is the average inventory in buffer B. This is an approximate formula, since
we simply use the average inventory rather than averaging the expected number of
bad parts produced by M1 over the different inventory levels wi. After some
mathematical manipulation,

Z1^b = [1 − (1 − χ11)^w] / χ11 − w(1 − χ11)^w
     + (1 − χ11)^w χ21 [(w + 1) − w(1 − χ11)(1 − χ21)] / [1 − (1 − χ11)(1 − χ21)]²   (66)

On the other hand, Z1^g is given as

Z1^g = µ1/(p1 + g1) + [p1/(p1 + g1)] [µ1/(p1 + g1)] + [p1/(p1 + g1)]² [µ1/(p1 + g1)] + ... = µ1/g1   (67)

By setting f1^q/(f1^q + g1) = Z1^g/(Z1^g + Z1^b), we have

f1^q = µ1 / { [1 − (1 + wχ11)(1 − χ11)^w]/χ11
          + (1 − χ11)^w χ21 [1 + w(χ21 + χ11 − χ21χ11)] / [1 − (1 − χ11)(1 − χ21)]² }   (68)

Since the average inventory is a function of f1^q and f1^q is dependent on the
average inventory, an iterative method is required to determine these values.
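A minimal sketch of that fixed-point iteration. The mapping from f1^q to the average inventory w is treated as a black box here (the `inventory_model` argument is a hypothetical stand-in; in the actual procedure it would be the 2M1B evaluation of Section 3):

```python
# Fixed-point iteration for f1^q, eq. (68): f1q determines the average
# inventory w, and w in turn determines f1q via f1q = mu1 / Z1b(w).
def f1q_of_w(mu1, w, chi11, chi21):
    a = 1.0 - chi11
    D = 1.0 - a * (1.0 - chi21)
    z1b = ((1.0 - (1.0 + w * chi11) * a ** w) / chi11
           + a ** w * chi21 * (1.0 + w * (chi21 + chi11 - chi21 * chi11)) / D ** 2)
    return mu1 / z1b          # eq. (68)

def solve_f1q(mu1, f1, chi11, chi21, inventory_model, tol=1e-10, iters=100):
    f1q = f1                  # start from the no-feedback detection rate
    for _ in range(iters):
        w = inventory_model(f1q)              # hypothetical stand-in mapping
        f1q_new = f1q_of_w(mu1, w, chi11, chi21)
        if abs(f1q_new - f1q) < tol:
            return f1q_new
        f1q = f1q_new
    return f1q
```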

Fig. 6. Validation of the quality information feedback formula (left: % error of PE; right: % error of Inv, plotted against case number)

Figure 6 shows the comparison of the effective production rate and the average
inventory from the analytic model and the simulation. 50 cases are generated by
selecting different machine and buffer parameters and % errors are plotted on the
vertical axis. The parameters for these cases are given in Appendix B. The % errors in the
effective production rate and average inventory are calculated using equations (62)
and (63) respectively. The average absolute values of the % error in the PE and x̄
estimates are 1.01% and 3.67% respectively.

5 Insights from numerical experimentation

In this section, we perform a set of numerical experiments to provide intuitive
insight into the behavior of production lines with inspection. The parameters of all
the cases are presented in Appendix B.

5.1 Beneficial buffer case

5.1.1 Production rates. Having quality information feedback means having more
inspection than otherwise. Therefore, machines tend to stop more frequently. As
a result, the total production rate of the line decreases. However, the effective
production rate can increase since added inspections prevent the making of defective
parts. This phenomenon is shown in Figure 7. Note that the total production rate PT
without quality information feedback is consistently higher than PT with quality
information feedback regardless of buffer size and the opposite is true for the

effective production rate PE . Also it should be noted that in this case, both the total
production rate and the effective production rate increase with buffer size, with or
without quality information feedback.

Fig. 7. Production rates with/without quality information feedback (left: total production rate; right: effective production rate, as functions of buffer size)

5.1.2 System yield and buffer size. Even though a larger buffer increases both total
and effective production rates in this case, it decreases yield. As explained in Section
4, the system yield is a function of the buffer size if there is quality information
feedback. Figure 8 shows system yield decreasing as buffer size increases when
there is quality information feedback. This happens because when the buffer gets
larger, more material accumulates between an operation and the inspection of that
operation. All such material will be defective when the first machine is at state -1
but the inspection at the first machine does not find it. This is a case in which a
smaller buffer improves quality, which is widely believed to be generally true. If
there is no quality information feedback, then the system yield is independent of
the buffer size (and is substantially less).

5.2 Harmful buffer case


5.2.1 Production rates. Typically, increasing the buffer size leads to higher ef-
fective production rate. This is the case in Figure 7. But under certain conditions,
the effective production rate can actually decrease as buffer size increases. This can
happen when
∗ The first machine produces bad parts frequently: this means g1 is large.
∗ The inspection at the first machine is poor or non-existent and the inspection at the
second machine is reliable: this means h1 ≪ h2 or, equivalently, f1 − p1 ≪ f2 − p2.
∗ There is quality information feedback.
∗ The isolated production rate of the first machine is higher than that of the second
machine:

µ1 (1 + g1/f1) / [1 + (p1 + g1)/r1 + g1/f1] > µ2 (1 + g2/f2) / [1 + (p2 + g2)/r2 + g2/f2]
Figure 9 shows a case in which a buffer size increase leads to a lower effective
production rate. Note that even in this case, total production rate monotonically
increases as buffer size increases.

Fig. 8. System yield as a function of buffer size (with and without quality information feedback)


Fig. 9. Total production rate and effective production rate (as functions of buffer size, with and without feedback)

Fig. 10. System yield as a function of buffer size (with and without feedback)



5.2.2 System yield. The system yield for this case is shown in Figure 10. Note
that the yield decreases dramatically as the buffer size increases. In this case, the
decrease of the system yield is more than the increase of the total production rate
so that the effective production rate monotonically decreases as buffer size gets
bigger.

5.3 How to improve quality in a line with persistent quality failures

There are two major ways to improve quality. One is to increase the yield of in-
dividual operations and the other is to perform more rigorous inspection. Having
extensive preventive maintenance on manufacturing equipment and using robust
engineering techniques to stabilize operations have been suggested as tools to in-
crease yield of individual operations. Both approaches increase the Mean Time to
Quality Failure (MTQF) (i.e. decrease g). On the other hand, the inspection policy
aims to detect bad parts as soon as possible and prevent their flow toward down-
stream operations. More rigorous inspection decreases the mean time to detect
(MTTD) (i.e. increases h and therefore increases f ). It is natural to believe that
using only one kind of method to achieve a target quality level would not give the
most cost-efficient quality assurance policy. Figure 11 indicates that the impact of
individual operation stabilization on the system yield decreases as the operation
becomes more stable. It also shows that the effect of improving inspection (reducing
MTTD) on the system yield diminishes as inspection improves. Therefore, it is
optimal to use a combination of both methods to improve quality.

Fig. 11. Quality improvement (left: system yield vs. MTQF; right: system yield vs. f = p + h)

5.4 How to increase productivity

Improving the stand-alone throughput of each operation and increasing the buffer
space are typical ways to increase the production rate of manufacturing systems.
If operations are apt to have quality failures, however, there may be other ways to
increase the effective production rate: increasing the yield of each operation and
conducting more extensive inspections. Stabilizing operations, thus improving the
yield of individual operations, will increase effective throughput of a manufacturing

system regardless of the type of quality failure. On the other hand, reducing the
mean time to detect (MTTD) will increase the effective production rate only if the
quality failure is persistent but it will decrease the effective production rate if the
quality failure is Bernoulli. This is because the quality of each part is independent
of the others when the quality failure is Bernoulli. Therefore, stopping the line does
not reduce the number of bad parts in the future.
In a situation in which machines produce defective parts frequently and in-
spection is poor, increasing inspection reliability is more effective than increasing
buffer size to boost the effective production rate. Figure 12 shows this. Also, in other
situations in which machines produce defective parts frequently and inspection is
reliable, increasing machine stability is more effective than increasing buffer size
to enhance effective production rate. Figure 13 shows this phenomenon.

Fig. 12. Mean time to detect and effective production rate (effective production rate vs. buffer size for MTTD = 20, 10, and 2)


Fig. 13. Quality failure frequency and effective production rate (effective production rate vs. buffer size for MTQF = 20, 100, and 500)



6 Future research

The 2-Machine-1-Buffer (2M1B) model with µ1 ≠ µ2 is analyzed in [14]. This
case is more challenging because the number of roots of the internal transition
equations depends on the machine parameters. A more general 2M1B model with
multiple-yield quality failures (a mixture of Bernoulli- and persistent-type quality
failures) should also be studied. A long line analysis using decomposition is under
development. Refer to Kim [14] for more detailed information.

Appendix
A Solution technique
It is natural to assume an exponential form for the solution to the steady-state density
functions since equations (23)–(31) are coupled ordinary linear differential equa-
tions. A solution of the form e^(λx) K1^(α1) K2^(α2) worked successfully in the continuous
material two-machine line with perfect quality [10]. Therefore, a solution of the form

f(x, α1, α2) = e^(λx) G1(α1) G2(α2)   (69)
is assumed here. This form satisfies the transition equations if all of the following
equations are met. Equations (23)–(31) become, after substituting (69),
{(µ2 − µ1)λ − (p1 + g1 + p2 + g2)} G1(1)G2(1)
+ r2 G1(1)G2(0) + r1 G1(0)G2(1) = 0   (70)
−{µ1 λ + (p1 + g1 + r2 )}G1 (1)G2 (0) + p2 G1 (1)G2 (1) + f2 G1 (1)G2 (−1)
+r1 G1 (0)G2 (0) = 0 (71)
{(µ2 − µ1 )λ − (p1 + g1 + f2 )}G1 (1)G2 (−1) + g2 G1 (1)G2 (1)
+r1 G1 (0)G2 (−1) = 0 (72)
{µ2 λ − (r1 + p2 + g2 )}G1 (0)G2 (1) + p1 G1 (1)G2 (1) + r2 G1 (0)G2 (0)
+f1 G1 (−1)G2 (1) = 0 (73)
p1 G1 (1)G2 (0) + p2 G1 (0)G2 (1) − (r1 + r2 )G1 (0)G2 (0) + f2 G1 (0)G2 (−1)
+f1 G1 (−1)G2 (0) = 0 (74)
{µ2 λ − (r1 + f2 )}G1 (0)G2 (−1) + p1 G1 (1)G2 (−1) + g2 G1 (0)G2 (1)
+f1 G1 (−1)G2 (−1) = 0 (75)
{(µ2 − µ1 )λ − (p2 + g2 + f1 )}G1 (−1)G2 (1) + g1 G1 (1)G2 (1)
+r2 G1 (−1)G2 (0) = 0 (76)
−{µ1 λ + (r2 + f1 )}G1 (−1)G2 (0) + g1 G1 (1)G2 (0) + p2 G1 (−1)G2 (1)
+f2 G1 (−1)G2 (−1) = 0 (77)
{(µ2 − µ1 )λ − (f1 + f2 )}G1 (−1)G2 (−1) + g1 G1 (1)G2 (−1)
+g2 G1 (−1)G2 (1) = 0 (78)

These are nine equations in seven unknowns (λ, G1(1), G1(0), G1(−1), G2(1),
G2(0), and G2(−1)). Thus, there must be seven independent equations and two
dependent ones.

If we divide equations (70)–(78) by G1(0)G2(0) and define new parameters

Γi = pi Gi(1)/Gi(0) − ri + fi Gi(−1)/Gi(0)   (79)

Ψi = −pi − gi + ri Gi(0)/Gi(1)   (80)

Θi = −fi + gi Gi(1)/Gi(−1)   (81)
then equations (70)–(78) can be rewritten as
Γ1 + Γ2 = 0 (82)
−µ2 λ = Γ1 + Ψ2 (83)
µ1 λ = Γ2 + Ψ1 (84)
(µ1 − µ2 )λ = Ψ1 + Ψ2 (85)
(µ1 − µ2 )λ = Θ1 + Θ2 (86)
µ1 λ = Γ2 + Θ1 (87)
−µ2 λ = Γ1 + Θ2 (88)
(µ1 − µ2 )λ = Ψ2 + Θ1 (89)
(µ1 − µ2 )λ = Ψ1 + Θ2 (90)
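As a quick numerical check of this change of variables (our addition, not part of the original derivation), one can verify that dividing (71) by G1(1)G2(0) yields (84), and dividing (72) by G1(1)G2(−1) yields (90), identically in the parameters:

```python
# Numerical check (ours) that the substitutions (79)-(81) turn the transition
# equations into (82)-(90); e.g. (71) becomes (84) and (72) becomes (90).
# The parameter and G values below are arbitrary positive numbers.
mu1, mu2, lam = 1.3, 0.7, 0.41
p1, p2, g1, g2 = 0.01, 0.02, 0.03, 0.04
r1, r2, f1, f2 = 0.11, 0.12, 0.21, 0.22
G1 = {1: 0.9, 0: 1.0, -1: 1.1}
G2 = {1: 0.8, 0: 1.0, -1: 1.2}

Gam2 = p2 * G2[1] / G2[0] - r2 + f2 * G2[-1] / G2[0]   # (79) for i = 2
Psi1 = -p1 - g1 + r1 * G1[0] / G1[1]                   # (80) for i = 1
The2 = -f2 + g2 * G2[1] / G2[-1]                       # (81) for i = 2

# LHS of (71); dividing by G1(1)G2(0) must give -mu1*lam + Gam2 + Psi1,
# so setting it to zero is exactly (84): mu1*lam = Gam2 + Psi1.
lhs71 = (-(mu1 * lam + (p1 + g1 + r2)) * G1[1] * G2[0]
         + p2 * G1[1] * G2[1] + f2 * G1[1] * G2[-1] + r1 * G1[0] * G2[0])
assert abs(lhs71 / (G1[1] * G2[0]) - (-mu1 * lam + Gam2 + Psi1)) < 1e-12

# LHS of (72); dividing by G1(1)G2(-1) gives (mu2-mu1)*lam + Psi1 + The2,
# which is equation (90).
lhs72 = (((mu2 - mu1) * lam - (p1 + g1 + f2)) * G1[1] * G2[-1]
         + g2 * G1[1] * G2[1] + r1 * G1[0] * G2[-1])
assert abs(lhs72 / (G1[1] * G2[-1]) - ((mu2 - mu1) * lam + Psi1 + The2)) < 1e-12
print("identities verified")
```

The remaining seven pairings follow the same pattern.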

From equations (82)–(90), it is clear that only seven equations are independent.
After much mathematical manipulation [14], these equations become
0 = {(M + r1)(µ1 N − 1) − f1}^2 / [(f1 − p1)(µ1 N − 1)]
− {(p1 + g1 − f1) + r1(µ1 N − 1)}{(M + r1)(µ1 N − 1) − f1} / [(f1 − p1)(µ1 N − 1)] − r1 (91)
0 = {(−M + r2)(µ2 N − 1) − f2}^2 / [(f2 − p2)(µ2 N − 1)]
− {(p2 + g2 − f2) + r2(µ2 N − 1)}{(−M + r2)(µ2 N − 1) − f2} / [(f2 − p2)(µ2 N − 1)] − r2 (92)
where
p1 G1(1)/G1(0) − r1 + f1 G1(−1)/G1(0) = −{p2 G2(1)/G2(0) − r2 + f2 G2(−1)/G2(0)} = M (93)
(1/µ1){1 + 1/(G1(1)/G1(0) + G1(−1)/G1(0))} = (1/µ2){1 + 1/(G2(1)/G2(0) + G2(−1)/G2(0))} = N (94)

All the equations and unknowns have now been reduced to two equations in two
unknowns. By solving equations (91) and (92) simultaneously we can calculate
144 J. Kim and S.B. Gershwin

M and N . An example of these equations is plotted in Figure 14. Equation (91)


is represented with lighter lines and equation (92) is shown as darker lines. The
intersections of the two sets of lines are the solutions of the equations.

Fig. 14. Plot of equations (91) and (92)

These are high order polynomial equations for which no general analytical
solution exists. A numerical approach is required to find the roots of the equations.
A special algorithm to find the solutions has been developed [14] based on the
characteristics of the equations. Once we find the roots of equations (91) and (92),
we can get the ratios Gi(1)/Gi(0) and Gi(−1)/Gi(0) (i = 1, 2) from equation (94). By setting
G1(0) = G2(0) = 1, we can calculate G1(1), G1(−1), G2(1), and G2(−1). After
some mathematical manipulation, we find that λ can be expressed as
λ = {−p1 − g1 + r1/G1(1) − p1 G1(1) + r1 − f1 G1(−1)} / µ1 (95)
Therefore, we can get a probability density function f(x, α1, α2) corresponding
to each (M, N) pair. The number of roots of equations (91) and (92) depends on
the machine parameters. There are only 3 roots when µ1 = µ2, regardless of the other
parameters. Therefore, a general expression of the probability density function in
this case is
f(x, α1, α2) = c1 f1(x, α1, α2) + c2 f2(x, α1, α2) + c3 f3(x, α1, α2) (96)
where f1(x, α1, α2), f2(x, α1, α2), and f3(x, α1, α2) are the densities of form (69)
corresponding to the three roots of equations (91) and (92).
The remaining unknowns, including c1, c2, c3 and the probability masses at the
boundaries, can be calculated by solving the boundary equations (34)–(55) and the
normalization equation (56) with f(x, α1, α2) given by equation (96).
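The special algorithm of [14] is not reproduced here, but a minimal root-finding sketch (our illustration) conveys the idea. For a line with identical machines, such as Case 1 of Table 4, M = 0 by symmetry, so a single bisection in N along the line M = 0 locates a root:

```python
# Sketch (ours, not the algorithm of [14]): code the left-hand sides of
# (91)-(92) and bisect along M = 0 for a symmetric line (Case 1 of Table 4).
def F(M, N, mu, r, p, g, f, sign):
    """LHS of (91) (sign=+1) or of (92) (sign=-1) for one machine's parameters."""
    A = (sign * M + r) * (mu * N - 1.0) - f
    D = (f - p) * (mu * N - 1.0)
    return A * A / D - ((p + g - f) + r * (mu * N - 1.0)) * A / D - r

mu, r, p, g, f = 1.0, 0.1, 0.01, 0.01, 0.2      # Case 1 of Table 4, both machines
F1 = lambda N: F(0.0, N, mu, r, p, g, f, +1)

lo, hi = 1.01, 11.0          # bracket chosen so that F1(lo) > 0 > F1(hi)
for _ in range(200):         # plain bisection
    mid = 0.5 * (lo + hi)
    if F1(mid) > 0:
        lo = mid
    else:
        hi = mid
N_star = 0.5 * (lo + hi)
# With symmetric parameters, (92) is automatically satisfied at (0, N_star).
assert abs(F(0.0, N_star, mu, r, p, g, f, -1)) < 1e-8
print(N_star)   # root at N = 25/21, approximately 1.190476
```

Asymmetric lines require a genuine two-dimensional search over (M, N), as suggested by Figure 14.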

B Machine parameters for numerical and simulation experiments

Table 4. Machine parameters for infinite buffer case and zero buffer case

Case # µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2
1 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2
2 1.0 1.0 0.3 0.3 0.005 0.005 0.05 0.05 0.5 0.5
3 1.0 1.0 0.2 0.05 0.01 0.01 0.01 0.01 0.2 0.2
4 1.0 1.0 0.1 0.1 0.05 0.005 0.01 0.01 0.2 0.2
5 1.0 1.0 0.1 0.1 0.01 0.01 0.05 0.005 0.2 0.2
6 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.5 0.1
7 2.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.5 0.1
8 3.0 2.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2
9 1.0 2.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2
10 2.0 3.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2

Table 5. Machine parameters for Figures 7 and 8

µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2
1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.1 0.9

Table 6. Machine parameters for Figures 9 and 10

µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2
2.0 2.0 0.5 0.1 0.005 0.05 0.5 0.005 0.02 0.9

Table 7. Machine parameters for Figure 11

µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2
1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2

Table 8. Machine parameters for Figure 12

µ1 µ2 r1 r2 p1 p2 g1 g2
1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01

Table 9. Machine parameters for Figure 13

µ1 µ2 r1 r2 p1 p2 f1 f2
1.0 1.0 0.1 0.1 0.01 0.01 0.2 0.2

Table 10. Machine parameters for intermediate buffer case validation

Case # µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2 N
1 1.0 1.0 0.1 0.1 0.01 0.01 0.02 0.01 0.1 0.2 30
2 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 5
3 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 10
4 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 15
5 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 20
6 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 25
7 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 35
8 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 40
9 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 45
10 0.5 0.5 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
11 1.5 1.5 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
12 2.0 2.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
13 2.5 2.5 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
14 3.0 3.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
15 1.0 1.0 0.01 0.01 0.01 0.01 0.01 0.01 0.2 0.2 30
16 1.0 1.0 0.05 0.05 0.01 0.01 0.01 0.01 0.2 0.2 30
17 1.0 1.0 0.2 0.2 0.01 0.01 0.01 0.01 0.2 0.2 30
18 1.0 1.0 0.5 0.5 0.01 0.01 0.01 0.01 0.2 0.2 30
19 1.0 1.0 0.8 0.8 0.01 0.01 0.01 0.01 0.2 0.2 30
20 1.0 1.0 0.1 0.1 0.001 0.001 0.01 0.01 0.2 0.2 30
21 1.0 1.0 0.1 0.1 0.005 0.005 0.01 0.01 0.2 0.2 30
22 1.0 1.0 0.1 0.1 0.02 0.02 0.01 0.01 0.2 0.2 30
23 1.0 1.0 0.1 0.1 0.05 0.05 0.01 0.01 0.2 0.2 30
24 1.0 1.0 0.1 0.1 0.1 0.1 0.01 0.01 0.2 0.2 30
25 1.0 1.0 0.1 0.1 0.01 0.01 0.001 0.001 0.2 0.2 30
26 1.0 1.0 0.1 0.1 0.01 0.01 0.005 0.005 0.2 0.2 30
27 1.0 1.0 0.1 0.1 0.01 0.01 0.02 0.02 0.2 0.2 30
28 1.0 1.0 0.1 0.1 0.01 0.01 0.05 0.05 0.2 0.2 30
29 1.0 1.0 0.1 0.1 0.01 0.01 0.10 0.10 0.2 0.2 30
30 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.02 0.02 30
31 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.05 0.05 30
32 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.1 0.1 30
33 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.5 0.5 30
34 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.95 0.95 30
35 1.0 1.0 0.5 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
36 1.0 1.0 0.01 0.1 0.01 0.01 0.01 0.01 0.2 0.2 30
37 1.0 1.0 0.1 0.5 0.01 0.01 0.01 0.01 0.2 0.2 30
38 1.0 1.0 0.1 0.01 0.01 0.01 0.01 0.01 0.2 0.2 30
39 1.0 1.0 0.1 0.1 0.1 0.01 0.01 0.01 0.2 0.2 30
40 1.0 1.0 0.1 0.1 0.001 0.01 0.01 0.01 0.2 0.2 30
41 1.0 1.0 0.1 0.1 0.01 0.1 0.01 0.01 0.2 0.2 30
42 1.0 1.0 0.1 0.1 0.01 0.001 0.01 0.01 0.2 0.2 30
43 1.0 1.0 0.1 0.1 0.01 0.01 0.1 0.01 0.2 0.2 30
44 1.0 1.0 0.1 0.1 0.01 0.01 0.001 0.01 0.2 0.2 30
45 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.1 0.2 0.2 30
46 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.001 0.2 0.2 30
47 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.9 0.2 30
48 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.05 0.2 30
49 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.9 30
50 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 0.05 30

Table 11. Machine parameters for quality information feedback validation

Case # µ1 µ2 r1 r2 p1 p2 g1 g2 f1 f2 N
1 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
2 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 0
3 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 5
4 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 20
5 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 30
6 1.0 1.0 0.01 0.01 0.01 0.01 0.01 0.01 0.01 1.0 10
7 1.0 1.0 0.05 0.05 0.01 0.01 0.01 0.01 0.01 1.0 10
8 1.0 1.0 0.4 0.4 0.01 0.01 0.01 0.01 0.01 1.0 10
9 1.0 1.0 0.8 0.8 0.01 0.01 0.01 0.01 0.01 1.0 10
10 1.0 1.0 0.1 0.1 0.001 0.001 0.01 0.001 0.01 1.0 10
11 1.0 1.0 0.1 0.1 0.005 0.005 0.01 0.005 0.01 1.0 10
12 1.0 1.0 0.1 0.1 0.02 0.02 0.01 0.01 0.02 1.0 10
13 1.0 1.0 0.1 0.1 0.1 0.1 0.01 0.01 0.1 1.0 10
14 1.0 1.0 0.1 0.1 0.01 0.01 0.001 0.001 0.01 1.0 10
15 1.0 1.0 0.1 0.1 0.01 0.01 0.005 0.005 0.01 1.0 10
16 1.0 1.0 0.1 0.1 0.01 0.01 0.02 0.02 0.01 1.0 10
17 1.0 1.0 0.1 0.1 0.01 0.01 0.05 0.05 0.01 1.0 10
18 0.5 0.5 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
19 1.5 1.5 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
20 2.0 2.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
21 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.05 1.0 10
22 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 1.0 10
23 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.5 1.0 10
24 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.8 1.0 10
25 1.0 1.0 0.5 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
26 1.0 1.0 0.01 0.1 0.01 0.01 0.01 0.01 0.01 1.0 10
27 1.0 1.0 0.1 0.5 0.01 0.01 0.01 0.01 0.01 1.0 10
28 1.0 1.0 0.1 0.01 0.01 0.01 0.01 0.01 0.01 1.0 10
29 1.0 1.0 0.1 0.1 0.1 0.01 0.01 0.01 0.1 1.0 10
30 1.0 1.0 0.1 0.1 0.001 0.01 0.01 0.01 0.001 1.0 10
31 1.0 1.0 0.1 0.1 0.01 0.1 0.01 0.01 0.01 1.0 10
32 1.0 1.0 0.1 0.1 0.01 0.001 0.01 0.01 0.01 1.0 10
33 1.0 1.0 0.1 0.1 0.01 0.01 0.05 0.01 0.01 1.0 10
34 1.0 1.0 0.1 0.1 0.01 0.01 0.001 0.01 0.01 1.0 10
35 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.05 0.01 1.0 10
36 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.001 0.01 1.0 10
37 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.5 1.0 10
38 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.2 1.0 10
39 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 0.8 10
40 1.0 1.0 0.1 0.1 0.01 0.01 0.01 0.01 0.01 0.2 10

References

1. Alles M, Amershi A, Datar S, Sarkar R (2000) Information and incentive effects of
inventory in JIT production. Management Science 46(12): 1528–1544
2. Besterfield DH, Besterfield-Michna C, Besterfield G, Besterfield-Sacre M (2003) Total
quality management. Prentice Hall, Englewood Cliffs
3. Black JT (1991) The design of the factory with a future. McGraw-Hill, New York
4. Bonvik AM, Couch CE, Gershwin SB (1997) A comparison of production line control
mechanisms. International Journal of Production Research 35(3): 789–804

5. Burman M, Gershwin SB, Suyematsu C (1998) Hewlett-Packard uses operations research to improve the design of a printer production line. Interfaces 28(1): 24–26
6. Buzacott JA, Shanthikumar JG (1993) Stochastic models of manufacturing systems.
Prentice-Hall, Englewood Cliffs
7. Cheng CH, Miltenburg J, Motwani J (2000) The effect of straight and U shaped lines
on quality. IEEE Transactions on Engineering Management 47(3): 321–334
8. Dallery Y, Gershwin SB (1992) Manufacturing flow line systems: a review of models
and analytical results. Queueing Systems: Theory and Applications 12: 3–94
9. Fujimoto T (1999) The evolution of a manufacturing system at Toyota. Oxford University Press, Oxford
10. Gershwin SB (1994) Manufacturing systems engineering. Prentice Hall, Englewood
Cliffs
11. Gershwin SB (2000) Design and operation of manufacturing systems – the control-point
policy. IIE Transactions 32(2): 891–906
12. Gershwin SB, Schor JE (2000) Efficient algorithms for buffer space allocation. Annals
of Operations Research 93: 117–144
13. Inman RR, Blumenfeld DE, Huang N, Li J (2003) Designing production systems for
quality: research opportunities from an automotive industry perspective. International
Journal of Production Research 41(9): 1953–1971
14. Kim J (2004) Integrated quality and quantity modeling of a production line. PhD thesis, Massachusetts Institute of Technology (in preparation)
15. Law AM, Kelton WD (1999) Simulation modeling and analysis. McGraw-Hill, New York
16. Ledolter J, Burrill CW (1999) Statistical quality control. Wiley, New York
17. Monden Y (1998) Toyota production system – an integrated approach to just-in-time.
EMP Books, Norcross
18. Montgomery DC (2001) Introduction to statistical quality control, 4th edn. Wiley, New
York
19. Pande P, Holpp L (2002) What is six sigma? McGraw-Hill, New York
20. Phadke M (1989) Quality engineering using robust design. Prentice Hall, Englewood
Cliffs
21. Raz T (1986) A survey of models for allocating inspection effort in multistage produc-
tion systems. Journal of Quality Technology 18(4): 239–246
22. Shin WS, Mart SM, Lee HF (1995) Strategic allocation of inspection stations for a flow
assembly line: a hybrid procedure. IIE Transactions 27: 707–715
23. Shingo S (1989) A study of the Toyota production system from an industrial engineering
viewpoint. Productivity Press, Portland
24. Toyota Motor Corporation (1996) The Toyota production system
25. Wein L (1988) Scheduling semiconductor wafer fabrication. IEEE Transactions on
Semiconductor Manufacturing 1(3): 115–130
26. Woodall WH, Montgomery DC (1999) Research issues and ideas in statistical process
control. Journal of Quality Technology 31(4): 376–386
Stochastic cyclic flow lines with blocking:
Markovian models
Young-Doo Lee and Tae-Eog Lee
Department of Industrial Engineering, Korea Advanced Institute of Science and Technology,
373-1, Kuseong-Dong, Yuseong-Gu, Taejon 305-701, Korea
(e-mail: {ydlee,telee}@kaist.ac.kr)

Abstract. We consider a cyclic flow line model that repetitively produces multiple
items in a cyclic order. We examine performance of stochastic cyclic flow line mod-
els with finite buffers whose processing times have exponential or phase-type
distributions. We develop an exact method for computing the performance of a two-station model by
making use of the matrix geometric structure of the associated Markov chain. We
present a computationally tractable approximate performance computing method
that decomposes the line model into a number of two-station submodels and parameterizes the submodels by propagating the starvation and blocking probabilities
through the adjacent submodels. We discuss performance characteristics including
comparison with random order processing and effects of the job variation and the
job processing sequence. We also report the accuracy of our proposed method.

Keywords: Cyclic flow line – Stochastic – Blocking – Performance – Decomposition

1 Introduction

Cyclic production is a way of producing multiple items simultaneously in a shop.


It repetitively produces an identical set of items in the same loading and sequence
at each station. For instance, suppose we have production requirements of 100, 200, and 300
items for item types a, b, and c, respectively. Then, the minimal set of 1 a, 2 b’s, and
3 c’s is produced 100 times in the same production method. Depending on the visit
sequence of the items through the stations, the shop can be a job shop or a flow line.
In a cyclic flow line, each item flows through the stations in the same sequence. Each
station processes the items in the order of first come first service. Therefore, each
station repeats an identical cyclic sequence of processing the items, for instance,
Correspondence to: T.-E. Lee, 373-1 Gusung-Dong, Yusong-Gu, Daejon 305-701, Korea

a, b, b, c, c, c, which is the same as the release sequence of the items into the line.
Cyclic flow lines are widely used for assembly lines or serial processing lines where
multiple types of items are simultaneously produced and the setup times are not
significant. Advantages of cyclic production over conventional batch production
or random order production include better utilization of the machines, simplified
flow control, continuous and smooth supply of complete part sets for downstream
assembly, timely delivery, and reduced work-in-progress inventory [14].
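The minimal set in such an example is simply the requirement vector divided by its greatest common divisor, as a short computation (our illustration) shows:

```python
# Minimal part set for the example above: divide the production requirements
# by their greatest common divisor.
from functools import reduce
from math import gcd

requirements = {"a": 100, "b": 200, "c": 300}
g = reduce(gcd, requirements.values())            # number of repetitions: 100
minimal_set = {t: n // g for t, n in requirements.items()}
print(minimal_set)   # {'a': 1, 'b': 2, 'c': 3}
```

The line then produces this minimal set g times, each repetition in the same loading and sequence.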
There have been studies on cyclic shops. Essential issues can be found in [1,
7, 10, 11, 14–18, 22]. Cyclic flow lines are often used for printed circuit board as-
sembly and electronics or other home appliance assembly, and integrated with an
accumulation-type conveyor system. Such a conveyor system allows only a few
parts to wait before each station. Such cyclic flow lines with blocking have been
examined [1, 18, 22]. They deal with scheduling issues for the cases where process
times are completely known. However, cyclic shops are subject to random disrup-
tions such as tool jamming and recovery, retrials of an assembly operation, etc.
These tend to contribute to random variation in job processing times. Scheduling
models often neglect transport times or tend to increase the processing times by the
transport times. Such approximate modeling simplifies the scheduling model, but
adds randomness of the transport times to the combined process times.
There are a few works on stochastic cyclic shop models. Rao and Jackson [20]
develop an approximate algorithm to compute the average cycle time for a cyclic
job shop with general processing time distributions, which makes use of Clark’s
approximation method for stochastic PERT networks. Bowman and Muckstadt [2]
deliberately develop a finite Markov chain model for a cyclic job shop with expo-
nential processing times and compute the average cycle time, but do not discuss
the queue length. Zhang and Graves [23] find the schedules that are least dis-
turbed by machine failures in a re-entrant cyclic flow shop. For cyclic flow lines,
Seo and Lee [21] examine the queue length distributions of the cases that have
exponential processing times and infinite buffers. Stochastic cyclic flow lines with
limited buffers have distinct performance characteristics and require a different per-
formance analysis method due to blocking. Therefore, it is necessary to examine
performance of stochastic flow lines with blocking. Karabati and Tan [13] propose
a heuristic procedure for scheduling stochastic cyclic transfer lines that move jobs
between the stations synchronously.
Stochastic cyclic flow lines are comparable to conventional tandem queues with
multiple customer classes. While the former produces different types of items in
a cyclic order, the latter processes the items in random order. Therefore, stochas-
tic flow lines require a distinct performance analysis method. Nonetheless, it is
expected that some ideas for analyzing tandem queues also will be useful for ex-
amining stochastic cyclic flow lines.
An important technique for analyzing a tandem queue model is to decompose
the model into multiple two-station models, each of which is modeled by an appro-
priately parameterized single-queue model, and approximate the performance of
the tandem queue from the performance estimates of the decomposed single-queue
models [4–6, 8]. While Dallery et al. [4, 5] and Gershwin [8] propose such decom-
position technique for transfer lines with unreliable machines and finite buffers,

the technique is popularly used for tandem queues. Various decomposed single-
queue models and different approximation schemes can be found in the survey on
modeling and analysis of tandem queues [6]. We note that most works on tandem
queues assume single customer class while a stochastic cyclic flow line processes
multiple customer classes simultaneously. It is expected that stochastic cyclic flow
lines with blocking require yet another decomposition and approximation method.
In this paper, we examine performance of cyclic flow line models that have finite
buffers and processing times of exponential or phase-type distributions. Phase-type
distributions are more realistic for modeling processing time distributions since any
distribution can be arbitrarily closely approximated by phase-type distributions.
While such a cyclic flow line model would be modeled by a finite continuous-
time Markov chain, the number of states tends to explode and the chain easily
becomes computationally intractable as the number of stations, the buffer capac-
ities, the number of job types, and the number of phases in the processing time
distributions increase. Therefore, we present a computationally tractable perfor-
mance approximation method that decomposes the line model into a number of
two-station submodels and appropriately parameterizes the mean processing times
of the decomposed submodels. We examine the performance characteristics and
report the experimental accuracy of the proposed algorithm. We also compare the
performance of cyclic production with that of random order production. The effect
of the job processing sequence is also discussed.

2 Stochastic cyclic flow line models with finite buffers

We first explain stochastic cyclic flow line models. We consider a cyclic flow line
that consists of (K + 1) stations (S0 , S1 , . . ., SK ). Each station has a single
machine. The first station S0 has an unlimited buffer. Each subsequent station
Si (i = 1, . . . , K) has an input buffer of capacity B − 1 (that is, each station has
capacity B). Each station can process the next job in the buffer after the previous
one completes and leaves the station. A job completed at a station immediately
leaves the station and enters the input buffer of the next station. When the next
input buffer is full, the job cannot leave the station and waits until the next buffer is
available. Such waiting is called blocking, more specifically blocking after service
(BAS). When there is no job available at the input buffer, the station is idle. This is
called starvation. The transport times of jobs between the stations are negligible or
included in the processing times. The jobs in an input buffer are processed in the
order of first come and first service. A job being processed at a station cannot be
preempted. We assume that the stations are all reliable and there is no breakdown.
There are enough jobs available and hence no shortage or starvation at the first
station S0 . The last station SK has no blocking since there is no next station. The
J types of jobs are repetitively loaded into the line in a predefined cyclic order.
Therefore, each station repeats the identical cyclic order of processing the jobs.
We assume that the processing times of the jobs at a station have exponential or
phase-type distributions. The setup times are negligible.
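Sections 3 and 4 analyze this model as a Markov chain. As an independent illustration (ours, not the authors' method), the same model can also be simulated with the standard departure-time recursion for flow lines with blocking after service, D(i, k) = max{max(D(i−1, k), D(i, k−1)) + p(i, k), D(i+1, k−B)}, where D(i, k) is the departure time of the k-th released job from station Si. A sketch with a cyclic release sequence:

```python
import random

# Simulation sketch (ours): departure-time recursion for a flow line with
# blocking after service and a cyclic loading sequence of J job types.
def simulate_line(num_jobs, K, B, means, J, sampler):
    """means[i][j] = mean processing time of job type j at station i (i = 0..K);
    sampler(mean) draws one processing time; stations 1..K have capacity B."""
    D = [[0.0] * (num_jobs + 1) for _ in range(K + 1)]
    for k in range(1, num_jobs + 1):
        jtype = (k - 1) % J                         # cyclic release sequence
        for i in range(K + 1):
            t = sampler(means[i][jtype])
            arrive = D[i - 1][k] if i > 0 else 0.0  # station S0 never starves
            finish = max(arrive, D[i][k - 1]) + t
            if i < K and k - B >= 1:                # blocking after service:
                finish = max(finish, D[i + 1][k - B])   # wait for a free slot
            D[i][k] = finish
    return D

rng = random.Random(42)
means = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]        # 3 identical stations, J = 2
D = simulate_line(2000, K=2, B=2, means=means, J=2,
                  sampler=lambda m: rng.expovariate(1.0 / m))
print(2000 / D[2][2000])   # throughput estimate, well below the isolated rate 1
```

With constant (deterministic) processing times the line behaves as a pure pipeline, which gives an easy correctness check on the recursion.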

3 Two-station models

We first examine performance of a two-station model that has exponential processing time distributions. We introduce parameters for the two-station model. λi and
µi are the processing rates of job i at station 1 and 2, respectively, and B is the
capacity of station 2. The state of the line model is then denoted by (m1 , m2 , n),
where n = the number of jobs at station 2 including the job in progress, and mi =
the state of station i. mi usually indicates the job type being processed at station i.
However, when station 1 is being blocked, its state is indicated by m1 = b. m2 = s
means that station 2 is starving. For example, (1,1,3) indicates that both stations 1
and 2 are processing job 1, and 3 jobs at station 2 (2 in the buffer and 1 in progress).
(b, 2, 2) represents that station 1 is blocked after processing a job since station 2 has
capacity 2. We note that if we know the number of jobs at the buffer and the job type
in progress at a station, the job type in progress or just completed at another station
is easily determined. By examining the operational behavior of the line model,
the state transition diagram is obtained as in Figure 1. Since all event occurrences

1s0 2s0 3s0 Js0

λ1 λ2 λ3 λJ

µ1 µ2 µ3 µJ
211 321 431 1J1

λ2 λ4 µJ
λ3 λ1

µ1 µJ
µ2

µJ
µ1 µ2 µ3
J1(J-1) 12(J-1) 23(J-1) (J-1)J(J-1)

λJ λ1 λ2 λ J −1

µ1 µ2 µ3 µJ
11J 22J 33J JJJ
µJ
λ1 λ2 λ3 λJ

µ1 µ2 µ3 µJ
21(J+1) 32(J+1) 43(J+1) 1J(J+1)

λ2 λ4 µJ
λ3 λ1

µ1 µ2

µJ
µ1 µ2 µ3
b1B b2B b3B bJB

Fig. 1. State transition diagram of a two-station model



are governed by exponential processing times, the state transition process forms a
continuous-time Markov chain.
We observe that the diagram repeats an identical structure at each multiple of
J of the state variable n. The transition rates are marked on the corresponding
arcs. We therefore expect that the generator of the Markov chain has a repeating pattern. Define r ≡ B/J. We let π(m1 , m2 , n) denote the probability that
the line is at state (m1 , m2 , n) in the steady state. For exposition convenience,
we explain the case of J = 2. Define the steady state distribution vector as
π ≡ (πs , π0 , π1 , . . . , πk , . . . , πr−1 , πb ), where πs ≡ (π(1, s, 0), π(2, s, 0)), πk ≡
(π(2, 1, 2k+1), π(1, 2, 2k+1), π(1, 1, 2k+2), π(2, 2, 2k+2)), k = 0, 1, . . . , r−1,
and πb ≡ (π(b, 1, B), π(b, 2, B)). From the state transition diagram, we have the
generator matrix Q that is represented by block matrices with special structures as

⎛ ⎞
S1 S2 ···
⎜ ⎟
⎜S S A ··· ⎟
⎜ 3 4 0 ⎟
⎜ ⎟
⎜ A2 A1 A0 · · · ⎟
⎜ ⎟
⎜ ⎟
⎜ ⎟

Q=⎜ .
.. ⎟,

⎜ ⎟
⎜ ⎟
⎜ A 2 A1 A0 ⎟
⎜ ⎟
⎜ ⎟
⎜ A2 B4 B3 ⎟
⎝ ⎠
B1 B2
⎛ ⎞
−λ1 0 λ1 0
⎜ ⎟
⎜ 0 −λ2 0 λ2 ⎟
⎜ ⎟
where S1 = ⎜ ⎟,
⎜ 0 µ1 −(µ1 + λ2 ) 0 ⎟
⎝ ⎠
µ2 0 0 −(µ2 + λ1 )
⎛ ⎞
0 0 00
⎜ ⎟
⎜ 0 0 0 0⎟
⎜ ⎟
S 2 = A0 = ⎜ ⎟,
⎜ λ2 0 0 0 ⎟
⎝ ⎠
0 λ1 0 0
⎛ ⎞
−(µ1 +λ1 ) 0 λ1 0
⎜ ⎟
⎜ 0 −(µ2 +λ2 ) 0 λ2 ⎟
⎜ ⎟
S4 = B4 = A1 = ⎜ ⎟,
⎜ 0 µ1 −(µ1 +λ2 ) 0 ⎟
⎝ ⎠
µ2 0 0 −(µ2 +λ1 )
154 Y.-D. Lee and T.-E. Lee

⎛ ⎞
0 0 0 µ1
⎜ ⎟


⎜ 0 0 µ2 0 ⎟ 0 0 0 µ1 −µ1 0
⎜ ⎟
S 3 = A2 = ⎜ ⎟ , B1 = , B2 = ,
⎜0 0 0 0 ⎟ 0 0 µ2 0 0 −µ2
⎝ ⎠
00 0 0
⎛ ⎞
0 0
⎜ ⎟
⎜ 0 0 ⎟
⎜ ⎟
and B3 = ⎜ ⎟.
⎜ λ2 0 ⎟
⎝ ⎠
0 λ1
Generator Q for J > 2, which has the same block structure, can be identified similarly.
The size of each block matrix is determined by the number of job types,
J, and the station capacity, B. Ai is a J^2 × J^2 square matrix regardless of B.
It is easily seen that Q is irreducible and the finite Markov chain is positive
recurrent and hence ergodic. Therefore, the steady state probability π is the solution
of the balance equation πQ = 0. The balance equation can be efficiently solved
since Q has a special structure called a generalized birth-and-death process. Such
a structure is also a special case of the general matrix geometric structure. There
are two generally known strategies for solving the balance equation for such a
structured generator, matrix geometric technique [19] and recursive technique [3].
Buzacott and Kostelski [3] report that there is no significant difference between the
two methods in their accuracy, but the recursive method is more efficient than the
matrix geometric algorithm. There can be different implementations of the recursive
method depending on the detailed matrix structure. We adapt the recursive algorithm
of Hong et al. [12] that is used for two-station tandem queues with random failures
of stations.
From πQ = 0, we obtain the following equations.
πs S1 + π0 S3 = 0, (1)
πs S2 + π0 S4 + π1 A2 = 0, (2)
πk−1 A0 + πk A1 + πk+1 A2 = 0, k = 1, 2, . . . , r − 2, (3)
πr−2 A0 + πr−1 B4 + πb B1 = 0, and (4)
πr−1 B3 + πb B2 = 0. (5)
From (1), (2), (3), and (4), we derive the following relationships. From (1) and (2),
πs = π0 T0 , where T0 = −S3 S1^{−1} , and (6)
π0 = π1 T1 , where T1 = −A2 (S4 + T0 S2)^{−1} . (7)
From (3), we can derive
πk = πk+1 Tk+1 , where Tk+1 = −A2 (A1 + Tk A0)^{−1} , k = 1, . . . , r − 2. (8)
From (4), we have
πr−1 = πb Tb , where Tb = −B1 (B4 + Tr−1 A0)^{−1} . (9)

From the initial value of πb , T0 , T1 , T2 , . . . , Tr−1 , and Tb are successively obtained
and vector π is computed from the normalizing condition:

πs 1 + Σ_{k=0}^{r−1} πk 1 + πb 1 = 1, (10)

where 1 is the column vector of (1, 1, . . . , 1) with an appropriate dimension. The


procedure for computing the performance is summarized as follows.

Algorithm: Two-station
Step 1. Set πb = 1 and SAVE = πb .
Step 2. Compute πr−1 from (9).
Step 3. Compute πb from (5).
Step 4. If ||SAVE − πb || ≤ ε, go to Step 5.
Else set SAVE = πb , and go to Step 2.
Step 5. Compute πr−1 , πr−2 , . . . , π0 , πs from (6)–(9).
Step 6. Normalize the steady state distribution vector π.
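As a sketch of this recursion (ours, not the authors' code), the same steps can be run with generic block matrices. To keep the example self-checking we feed it 1×1 blocks rather than the 4×4 blocks above; the chain then collapses to an ordinary M/M/1/3 birth-and-death queue (λ = 1, µ = 2, r = 2), whose geometric steady-state distribution the recursion must reproduce:

```python
# Sketch of Algorithm Two-station with generic block matrices.  Here the
# blocks are 1x1, so the chain is an M/M/1/3 queue (levels s, 0, 1, b).
import numpy as np

def solve_two_station(S1, S2, S3, S4, A0, A1, A2, B1, B2, B3, B4, r, eps=1e-12):
    inv = np.linalg.inv
    # T-recursions (6)-(9)
    T = [-S3 @ inv(S1)]                        # T0
    T.append(-A2 @ inv(S4 + T[0] @ S2))        # T1
    for k in range(1, r - 1):
        T.append(-A2 @ inv(A1 + T[k] @ A0))    # T_{k+1}
    Tb = -B1 @ inv(B4 + T[r - 1] @ A0)
    # Fixed-point loop (Steps 1-4)
    pi_b = np.ones((1, B2.shape[0]))
    while True:
        pi_r1 = pi_b @ Tb                      # from (9)
        pi_b_new = -pi_r1 @ B3 @ inv(B2)       # from (5)
        if np.linalg.norm(pi_b_new - pi_b) <= eps:
            break
        pi_b = pi_b_new
    # Backward substitution (Step 5) and normalization (Step 6)
    pis = [pi_r1]
    for k in range(r - 1, 0, -1):
        pis.append(pis[-1] @ T[k])             # pi_{k-1} = pi_k T_k
    pis.append(pis[-1] @ T[0])                 # pi_s = pi_0 T_0
    pi = np.concatenate([p.ravel() for p in reversed(pis)] + [pi_b.ravel()])
    return pi / pi.sum()

lam, mu = 1.0, 2.0
m = lambda x: np.array([[x]])
pi = solve_two_station(m(-lam), m(lam), m(mu), m(-(lam + mu)), m(lam),
                       m(-(lam + mu)), m(mu), m(mu), m(-mu), m(lam),
                       m(-(lam + mu)), r=2)
print(pi)   # proportional to (8, 4, 2, 1): the geometric distribution, ratio 1/2
```

For the scalar case the fixed-point loop converges in one pass; with the matrix blocks of the cyclic model it iterates until ||SAVE − πb|| falls below the tolerance.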
Vectors πb , πk , and πs are computed recursively by matrix operations. After
taking some initial value of πb , we compute the πk 's and πb recursively until the value
of πb converges. Then, the π vector is normalized. The steady state queue length
distributions, the starvation probability, the blocking probability, the throughput
rate, and the mean queue length are computed, respectively, as
pn = Σ_{m1=1}^{J} Σ_{m2=1}^{J} π(m1 , m2 , n), n = 1, . . . , B,

ps = Σ_{m1=1}^{J} π(m1 , s, 0),

pb = Σ_{m2=1}^{J} π(b, m2 , B),

T = Σ_{j=1}^{J} µj [Σ_{y=1}^{B} Σ_{m1=1}^{J} π(m1 , j, y)] + µJ π(b, J, B), and

L = Σ_{y=1}^{B} Σ_{m1=1}^{J} Σ_{m2=1}^{J} y π(m1 , m2 , y) + Σ_{m2=1}^{J} B π(b, m2 , B).

Note that 1/T is the mean cycle time for all types of items. The mean cycle time
of job sets is J/T .
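As a cross-check of these measures (our sketch, based on our reading of Figure 1, not the recursive algorithm above), the chain for J = 2 and B = 2 is small enough to build and solve directly. With all rates equal to 1 the chain lumps into a birth-and-death process on four levels, so the throughput is 3/4 and the mean queue length 5/4:

```python
import numpy as np

# Brute-force check for J = 2, B = 2 under our reading of Figure 1.
# States are (m1, m2, n); m1 = 'b' means blocking, m2 = 's' means starvation.
states = [(1, 's', 0), (2, 's', 0), (2, 1, 1), (1, 2, 1),
          (1, 1, 2), (2, 2, 2), ('b', 1, 2), ('b', 2, 2)]
idx = {s: i for i, s in enumerate(states)}
lam = {1: 1.0, 2: 1.0}     # station-1 processing rates per job type
mu = {1: 1.0, 2: 1.0}      # station-2 processing rates per job type
nxt = {1: 2, 2: 1}         # cyclic job-type order

Q = np.zeros((8, 8))
def add(a, b, rate):       # transition a -> b at the given rate
    Q[idx[a], idx[b]] += rate
    Q[idx[a], idx[a]] -= rate

for (m1, m2, n) in states:
    if m1 != 'b':                       # station-1 completion at rate lam[m1]
        if n == 0:
            add((m1, m2, n), (nxt[m1], m1, 1), lam[m1])
        elif n == 1:
            add((m1, m2, n), (nxt[m1], m2, 2), lam[m1])
        else:                           # buffer full: block after service
            add((m1, m2, n), ('b', m2, 2), lam[m1])
    if m2 != 's':                       # station-2 completion at rate mu[m2]
        if n == 1:
            add((m1, m2, n), (m1, 's', 0), mu[m2])
        elif m1 != 'b':
            add((m1, m2, n), (m1, nxt[m2], 1), mu[m2])
        else:                           # held job enters buffer, station 1 restarts
            add((m1, m2, n), (nxt[m2], nxt[m2], 2), mu[m2])

# Solve pi Q = 0 together with the normalization sum(pi) = 1.
A = np.vstack([Q.T, np.ones(8)])
pi = np.linalg.lstsq(A, np.concatenate([np.zeros(8), [1.0]]), rcond=None)[0]

T = sum(mu[m2] * pi[idx[(m1, m2, n)]] for (m1, m2, n) in states if m2 != 's')
L = sum(s[2] * pi[idx[s]] for s in states)   # blocked states count as n = B
print(round(T, 6), round(L, 6))   # 0.75 1.25 for these balanced rates
```

Unbalanced rates break the lumping, and the full chain (or the recursive algorithm above) is then needed.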

4 Models with more than two stations

In order to analyze the performance of a line model with more than two stations, we
extend the decomposition technique that has been used for tandem queues [8]. The

Fig. 2. Two-station decomposition

procedure is outlined as follows. First, the line model is decomposed into K two-
station submodels as shown in Figure 2. Each two-station submodel S(i) consists
of upstream station S u (i), downstream station S d (i), and buffer B(i) between
them with the same capacity B − 1 as in the original line model S. Stations S u (i)
and S d (i) are parameterized to have the performances close to those of stations
Si−1 and Si , respectively, in the original line, which are subject to starvation and
blocking.

4.1 Exponential models

We first examine the decomposition method for the case where all processing times
are exponentially distributed. Let tj (i) denote the mean processing time of job j
at station i in the original line model. Let t(i) ≡ (t1 (i), . . . , tJ (i)) , i = 0, . . . , K,
be the mean processing time vector at station Si . Let tu (i) ≡ (tu1 (i), . . . , tuJ (i))
and td (i) ≡ (td1 (i), . . . , tdJ (i)) , i = 1, . . . , K, be the mean processing times of
S u (i) and S d (i) in submodel S(i), respectively. The processing capacity at each
station of a decomposed two-station submodel is parameterized to be as close as
possible to the effective processing capacity of the corresponding station in the
original line. The processing capacity of a station in the original line is reduced due
to starvation or blocking at the station. Therefore, the processing times tu (i) of the
upstream station S u (i) of each submodel S(i) are extended as much as the delays
due to starvation of the corresponding station Si−1 in the original line. Similarly,
the processing times td (i) of the downstream station S d (i) of each submodel S(i)
are extended as much as the delays due to blocking of the corresponding stations Si
in the original line. However, for the first submodel S(1), the processing times of
station S u (1) are kept same as those of S0 in the original line. It is because the first
station S0 is never starved. Similarly, for the last submodel S(K), the processing
times of S d (K) are kept same as those of SK in the original line because the last

station SK is never blocked. Therefore, we have the following boundary conditions:


tu (1) = t(0) and td (K) = t(K). (11)
Consider a submodel S(i) such that 1 ≤ i < K. The processing time of job j at
the upstream station S u (i) should be taken as the sum of the processing time of
the job at station Si−1 and the starvation time of the station in the original line.
Suppose that Si−1 is starved, i.e., Bi−1 is empty, at the instant of completion of
job j at the station. Since the exact starvation probability at station Si−1 in the
original line is not available, it is approximated by the starvation probability at the
corresponding station S d (i − 1) of the preceding submodel S(i − 1), denoted by
ps (i − 1), where the preceding submodel was appropriately parameterized. The
starved station Si−1 is delayed as long as the next job j + 1’s residual processing
time at station Si−2 . The residual processing time is approximated to be job j + 1’s
residual processing time at station S u (i − 1) of submodel S(i − 1). The residual
processing time is exponentially distributed with mean tuj+1 (i − 1) because of
the memoryless property. The delay due to starvation is regarded to extend the
processing time of job j + 1 at station S u (i). Therefore, the mean processing times
of station S u (i) in submodel S(i) are parameterized to be
tu (i) = t(i − 1) + tu (i − 1) × ps (i − 1), i = 2, . . . , K. (12)
We observe that the processing times of the upstream station S u (i) of each submodel
S(i) are recursively modified from the mean processing times of the upstream
station S u (i−1) and the starvation probability of the preceding submodel S(i−1).
This kind of recursion is often called starvation propagation [6].
Similarly, the mean processing time of job j at the downstream station S d (i)
of each submodel S(i), that is, tdj (i), is parameterized to be the sum of the mean
processing time of the job at station Si and the blocking time of the station in the
original line. Suppose that Si is blocked, i.e., Bi+1 is full at the instant of completion
of job j at the station. The blocking probability at station Si is approximated by
the blocking probability at the corresponding station S u (i + 1) of the succeeding
submodel S(i + 1), denoted by pb (i + 1), where the succeeding submodel was
appropriately parameterized. The blocked station waits until the job in progress at
S d (i + 1) is finished. The type of the job in progress, θ(j), is determined from the
jobs at station S d (i + 1). The job list is in the sequence of j − 1, j − 2, . . . , 1, J, J −
1, . . . , 2, 1, . . . , J, J −1, . . . , 2, 1, J, J −1, . . . , θ(j), where the number of identical
subsequences is appropriately determined. Since the capacity of station S d (i + 1)
is B, j − 1 + mJ + (J − θ(j) + 1) = B, where m is an appropriate nonnegative
integer and 1 ≤ j, θ(j) ≤ J. Solving this relation for θ(j) shows that θ(j) =
j − (B mod J) (mod J). Therefore, the residual processing time of job θ(j),
for which the upstream station S u (i + 1) of submodel S(i + 1) is being blocked, is
added to the processing time of job j at the downstream station S d (i) of submodel
S(i), which corresponds to station S u (i + 1). Due to the memoryless property,
the residual processing time of job θ(j) is exponentially distributed with mean
tdθ(j) (i + 1). We let tdθ (i) ≡ (tdθ(1) (i), . . . , tdθ(J) (i)) . By matching the index of the
blocked job with that of the job in progress at the next station, the mean processing
times of station S d (i) of submodel S(i) are parameterized to be
td (i) = t(i) + tdθ (i + 1) × pb (i + 1), i = 1, . . . , K − 1. (13)
158 Y.-D. Lee and T.-E. Lee

Fig. 3a,b. Propagation of starvation and blocking. a Starvation propagation. b Blocking
propagation

The mean processing times of the downstream station S d (i) of each submodel S(i)
are recursively computed from the mean processing times of the downstream station
S d (i + 1) and the blocking probability of the succeeding submodel S(i + 1). This
kind of recursion is often called blocking propagation [6]. Figure 3 illustrates the
propagation mechanism of starvation and blocking. The mean processing time of a
specific job (shaded box) at S(i) is extended by the starvation (or blocking) time,
which is the remaining processing time of the job in progress at the preceding (or
succeeding) station.
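The index mapping θ(j) used in equation (13) can be checked numerically. The sketch below (ours, for illustration only) brute-forces the counting constraint j − 1 + mJ + (J − θ(j) + 1) = B and compares the result with the closed form, using the convention that a modulus of 0 maps back to J (jobs are numbered 1, . . . , J):

```python
def theta_closed_form(j, J, B):
    # theta(j) = j - (B mod J) (mod J), with a result of 0 mapped back to J.
    r = (j - B) % J
    return r if r != 0 else J

def theta_from_constraint(j, J, B):
    # Search for theta in 1..J and m >= 0 such that
    # (j - 1) + m*J + (J - theta + 1) = B, i.e. the jobs ahead of job j
    # plus m full cycles plus the tail of a cycle fill the capacity B.
    for th in range(1, J + 1):
        m_times_J = B - (j - 1) - (J - th + 1)
        if m_times_J >= 0 and m_times_J % J == 0:
            return th
    return None  # no feasible theta (only possible when B < j)
```

Both functions agree whenever B ≥ J, i.e., whenever the buffer can actually be filled by a whole number of cycles plus a partial one.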
We now have a simultaneous equation system that has 2JK unknown param-
eters, tu (i) and td (i), i = 1, . . . , K, and 2JK independent equations. The perfor-
mance of the original line is then approximated by the decomposed submodels that
are parameterized by the decomposition procedure summarized below. Once each
submodel is parameterized based on the starvation and blocking probabilities of
the adjacent submodels, the starvation and blocking probabilities of each submodel
change due to the modifications in its processing times. Therefore, the submod-
els should be parameterized again based on the changed starvation and blocking
probabilities. The algorithm repeats this computing cycle until the processing times
of the submodels no longer change. We note that our decomposition algo-
rithm is structurally similar to the well-known decomposition procedures of [4, 5, 8]
for conventional transfer lines or tandem queues. It is known that such algorithms
based on propagation of starvation and blocking converge. In fact, our algorithm
converged quickly, mostly within 10 iterations, for all experimental cases, which
are explained in Section 5. The computation times were within 1∼2 CPU seconds
on a 1 GHz Pentium PC.

Algorithm: Decomposition
Step 1. Initialize.
Set tu (1) ≡ t(0) and td (K) ≡ t(K).
Let td (i) = t(i), i = 1, . . . , K − 1.
Step 2. For i = 2, . . . , K,
compute ps (i − 1) from submodel S(i − 1), and
compute tu (i) using equation (12).
Step 3. For i = K − 1, . . . , 1,
compute pb (i + 1) from submodel S(i + 1), and
compute td (i) using equation (13).

Step 4. Go to step 2 until tu (i) and td (i) converge.
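The iteration above can be sketched in code. The sketch below is ours and illustrative only: the exact two-station submodel solve that yields ps(i) and pb(i) (matrix-geometric in the paper) is replaced by a crude M/M/1/B surrogate, purely to keep the example self-contained; the boundary conditions (11) and the propagation equations (12) and (13) are implemented as stated, for the exponential case.

```python
import numpy as np

def theta(j, J, B):
    # theta(j) = j - (B mod J) (mod J), with a result of 0 mapped back to J.
    r = (j - B) % J
    return r if r != 0 else J

def decompose(t, B, tol=1e-9, max_iter=200):
    """Fixed-point parameterization of the two-station submodels.

    t : (K+1) x J array; t[i, j] is the mean processing time of job j+1 at station i.
    B : buffer capacity, assumed identical at every station.
    Returns (tu, td, iterations) with tu[i-1] = tu(i) and td[i-1] = td(i).
    """
    t = np.asarray(t, dtype=float)
    K, J = t.shape[0] - 1, t.shape[1]
    tu = t[:-1].copy()   # tu(i) initialized to t(i-1); tu(1) = t(0) stays fixed
    td = t[1:].copy()    # td(i) initialized to t(i);   td(K) = t(K) stays fixed
    perm = [theta(j, J, B) - 1 for j in range(1, J + 1)]  # 0-based theta map

    def probs(i):
        # Surrogate for solving submodel S(i) exactly: treat it as an M/M/1/B
        # queue and use its empty/full probabilities as ps(i) and pb(i).
        rho = td[i - 1].mean() / tu[i - 1].mean()
        if abs(rho - 1.0) < 1e-9:
            return 1.0 / (B + 1), 1.0 / (B + 1)
        norm = 1.0 - rho ** (B + 1)
        return (1.0 - rho) / norm, (1.0 - rho) * rho ** B / norm

    for it in range(max_iter):
        tu_prev, td_prev = tu.copy(), td.copy()
        for i in range(2, K + 1):          # starvation propagation, eq. (12)
            ps, _ = probs(i - 1)
            tu[i - 1] = t[i - 1] + tu[i - 2] * ps
        for i in range(K - 1, 0, -1):      # blocking propagation, eq. (13)
            _, pb = probs(i + 1)
            td[i - 1] = t[i] + td[i][perm] * pb
        diff = max(np.abs(tu - tu_prev).max(), np.abs(td - td_prev).max())
        if diff < tol:
            return tu, td, it + 1
    raise RuntimeError("decomposition did not converge")
```

With an exact two-station solver in place of `probs`, the loop structure is exactly Steps 2-4 of the algorithm: a forward sweep of starvation propagation, a backward sweep of blocking propagation, repeated until the processing times converge.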

4.2 Phase-type distribution models

We now explain how the proposed decomposition method can be extended to


the case of phase-type distributions. A phase-type distribution with k phases
is represented as (1 − β1 )exp(µ1 ) + (1 − β2 )β1 exp(µ1 ) ∗ exp(µ2 ) + . . . +
β1 β2 · · · βk−1 [exp(µ1 ) ∗ . . . ∗ exp(µk )], where k ≥ 1 is an integer, βj ∈ (0, 1), j =
1, . . . , k − 1, and βk = 0. The k-Erlang, Coxian, and hyper-exponential distributions
are special cases. For a two-station model with phase-type processing time distri-
butions, a Markov chain model can be constructed once the state is taken to include
the phase of the job in progress at each station. This is because the time to complete
each phase has an exponential distribution and hence all event occurrences are gov-
erned by exponential time distributions. The state of the two-station model can be
represented by (m1 , m2 , n), where mi = j_l indicates the current phase l of job j at station i.
For notational convenience, we assume that the processing time distributions have
the same number of phases. For example, (1_2, 1_1, 3) represents that job 1 is in
phase 2 at station 1, job 1 is in phase 1 at station 2 while 3 jobs are at station 2. The
performance computing procedure is then similar to that for the exponential case
except that the size of the generator matrix increases due to the multiple phases.
Using a two-station, two-job case with Coxian-2 distributions, we outline how the
decomposition method can be extended for the cases with phase-type distributions.
Let (µij1 , βij , µij2 ) be the parameters of Coxian-2 distribution for processing time
of job j at station i. µijl is the mean processing rate of phase l job j at station i. βij is
the probability that job j enters the second phase after completion of the first phase
at station i. Hence, a job leaves station i immediately after completion of the first
phase at station i with probability 1 − βij . Let tijl ≡ 1/µijl , tj (i) ≡ (tij1 , tij2 ) ,
and t(i) ≡ (t1 (i), . . . , tJ (i)) , i = 0, . . . , K, be the mean processing time vector
of the jobs at station i. The starvation time at station Si−1 can be approximated
by the residual processing time of job j in progress at station S u (i − 1), which
has a Coxian distribution with parameter (1/tu(i−1)j1 , β(i−1)j , 1/tu(i−1)j2 ). To find
the mean residual processing time, we should know αjl (i − 1), the probability of
station S u (i − 1) being in phase l for processing job j when starvation occurs at the
downstream station S d (i − 1). This can be derived from the two-station model. We
note that only the mean time for processing the first phase of job j is extended be-
cause the completed job enters the first phase at the downstream station. Therefore,
the mean processing time of job j at station S u (i) is parameterized to be
tuj (i) = tj (i − 1) + (1, 0) αj (i − 1)β j (i − 1)tuj (i − 1)ps (i − 1),
i = 2, . . . , K, (14)
 
where αj (i − 1) ≡ (αj1 (i − 1), αj2 (i − 1)), and β j (i − 1) is the 2 × 2 upper-triangular matrix with rows (1, β(i−1)j ) and (0, 1).
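For reference, the moments of a Coxian-2 distribution with parameters (µ1, β, µ2) follow directly from its construction as an exponential phase exp(µ1) followed, with probability β, by an independent exponential phase exp(µ2). The helper below (ours, for illustration) computes the mean, the second moment, and the squared coefficient of variation:

```python
def coxian2_moments(mu1, beta, mu2):
    # X = E1 + I*E2 with E1 ~ exp(mu1), E2 ~ exp(mu2), I ~ Bernoulli(beta),
    # all mutually independent.
    mean = 1.0 / mu1 + beta / mu2
    # E[X^2] = E[E1^2] + 2*beta*E[E1]*E[E2] + beta*E[E2^2]
    second = 2.0 / mu1 ** 2 + 2.0 * beta / (mu1 * mu2) + 2.0 * beta / mu2 ** 2
    scv = second / mean ** 2 - 1.0
    return mean, second, scv
```

The limiting cases behave as expected: β → 0 recovers a single exponential phase (SCV = 1), and β → 1 with µ1 = µ2 recovers an Erlang-2 distribution (SCV = 0.5).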
The blocking time of job j at station Si is approximated by the residual process-
ing time of job θ(j) in progress at station S d (i+1), which has a Coxian distribution

Table 1. Effects of traffic intensity for M(3,3)

Buffer W(0.5,0.5,3) Exact Error


size CT L1 L2 CT L1 L2 CT L1 L2

1 36.774 0.652 0.540 37.189 0.670 0.537 −1.12(%) −2.69(%) 0.56(%)


10 30.058 1.049 1.078 30.030 1.058 1.082 0.09(%) −0.85(%) 0.37(%)
20 30.000 1.060 1.104 30.003 1.060 1.093 −0.01(%) 0.00(%) 1.01(%)

Buffer W(0.8,0.8,3) Exact Error


size CT L1 L2 CT L1 L2 CT L1 L2

1 46.289 0.996 0.774 46.447 0.998 0.766 −0.34(%) −0.20(%) 1.04(%)


10 31.333 3.080 2.885 31.466 3.170 2.879 −0.42(%) −2.84(%) 0.21(%)
20 30.164 4.090 4.363 30.112 4.138 4.216 0.17(%) −1.16(%) 1.11(%)

with parameter (1/td(i+1)θ(j)1 , β(i+1)θ(j) , 1/td(i+1)θ(j)2 ). As for the starvation
time, we obtain the mean residual processing time using the probability
γjl (i + 1) that station S d (i + 1) is in phase l for processing job j when blocking
occurs at the upstream station. Therefore, the mean processing time of job j at
station S d (i) is parameterized to be
tdj (i) = tj (i) + (1, 1) γ θ(j) (i + 1)β θ(j) (i + 1)tdθ(j) (i + 1)pb (i + 1),
i = 1, . . . , K − 1, (15)
where γ θ(j) (i + 1) = (γθ(j)1 (i + 1), γθ(j)2 (i + 1)). Together with the boundary
conditions similar to equation (11), we obtain the parameters for the two-station
submodels.

5 Experiments

We investigate the accuracy of the decomposition algorithm. A cyclic flow line


model with K stations and J jobs is denoted by M (K, J). We let W (ρ1 , . . . , ρK , υ)
indicate the workloads at the stations, where the workload at station i is
ρi ≡ Σ_{j=1}^{J} tj (i) / Σ_{j=1}^{J} tj (0). Job variation υ indicates the relative variations of the
mean processing times of the jobs at each station, defined as the ratio of the max-
imum to the minimum of the mean processing times of the jobs at each station.
It is chosen identical for all stations. We compute the cycle time (CT), the mean
queue lengths at the stations (Li ’s), and the total mean queue length in the line
(L ≡ Σ_{i=1}^{K} Li ). Table 1 shows the performance for M (3, 3) models with differ-
ent buffer capacities. Each station has the same buffer capacity. For such small
line models, the exact performance values are computed from the continuous-time
Markov chain for the whole line.
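These quantities are straightforward to compute from the matrix of mean processing times; the helper below (ours, for illustration) returns the station workloads and the common job variation for a given instance, following the definitions above:

```python
def workloads_and_variation(t):
    # t[i][j] = mean processing time of job j+1 at station i, i = 0..K.
    base = sum(t[0])
    rho = [sum(row) / base for row in t[1:]]     # workloads of stations 1..K
    spread = [max(row) / min(row) for row in t]  # max/min ratio per station
    v = spread[0]
    assert all(abs(s - v) < 1e-9 for s in spread), "job variation assumed identical"
    return rho, v
```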
We explain the error behavior of our proposed approximation algorithm that is
shown in Table 1. ‘Error’ in the table indicates the percent deviation from the exact
value. The proposed algorithm approximates the performance of each station by

that of the corresponding decomposed two-station submodel. The submodels are


not independent but interact with each other through blocking and starvation. The
interactions are approximately modeled by accommodating the process times of
each submodel S(i) based on the starvation and blocking probabilities at the adja-
cent submodels S(i − 1) and S(i + 1), respectively (see equations (12) and (13) ).
The starvation and blocking probabilities are approximate values that are estimated
from the adjacent submodels. Therefore, when the starvation or blocking probabil-
ities are higher, the performance of each submodel is more affected by starvation
or blocking at the adjacent submodels, and hence the performance estimates based
on the decomposed submodels tend to have larger errors. As the traffic intensity
increases, the blocking probability increases while the starvation probability de-
creases. Therefore, it is hard to exactly figure out how the approximation errors
are affected by the traffic intensity values. They depend on the relative size of the
starvation probability decrement to the blocking probability increment, which is
also affected by the buffer size. When the buffer size is 10 or 20, higher traf-
fic intensity causes a higher blocking probability and hence tends to increase the
errors, although the traffic intensity increment reduces the starvation probability.
However, for the case of buffer size 1, we observe that the higher traffic intensity
tends to decrease the errors. A conjecture for this reversed error behavior follows.
Buffer size 1 implies that there is no waiting place except the machine itself and
hence the blocking probability is extremely high, close to 1. Therefore, the block-
ing probability increment due to the traffic intensity increment is relatively small
as compared to the starvation probability decrement. Consequently, the no-buffer
case with higher traffic intensity has less error. Nonetheless, we observe that for a
given traffic intensity, the smaller buffer tends to make larger errors. Our further
experiments for longer lines such as M (12, 5) with υ = 3 and high workloads
indicate estimation errors within 2∼3%.
Table 2 shows the performance estimates for 5-station cases with different
values of job variation υ. The table is visualized in Figure 4, where, for clarity,
the performance values for the two job variation cases at a given
buffer size are offset by a small horizontal space. The simulation estimates and
99% confidence intervals in the table are obtained by 100 simulation replications.
Since there is no exact method available for these larger models, we list two types
of estimates, one from simulation and another from our approximation algorithm.
We observe that increasing the job variation from 3 to 10 causes a significant increase
in the cycle time. The increase is most salient when the buffer size is small and hence
the blocking probability is high. The job variation can be considered as another
kind of variation to the processing times.
The primary purpose of Table 2 and Figure 4 is to show the effects of job
variation on the performance. Nonetheless, the table shows the relative accuracy
of our estimates in comparison to the simulation estimates. The relative errors are
listed in the last two columns. In order to compare the two types of estimates, it is
desirable to reduce the confidence interval so that its width is much smaller than
the errors of our proposed algorithm. However, the confidence interval may not
be significantly reduced by increasing the number of replications due to numerical
errors and incomplete randomness of the pseudo-random number streams in the

Table 2. Effects of job variation for M(5,3)

Buffer W(0.5,0.5,0.5,0.5,3) Simulation Error


size CT L CT L CT L

1 37.532 2.486 37.968±0.168 2.473±0.014 −1.15(%) 0.53(%)


5 30.816 4.125 30.761±0.192 4.123±0.055 0.18(%) 0.05(%)
10 30.070 4.392 30.049±0.175 4.485±0.069 −0.07(%) −2.07(%)
20 30.000 4.506 30.005±0.161 4.521±0.065 −0.02(%) −0.19(%)

Buffer W(0.5,0.5,0.5,0.5,10) Simulation Error


size CT L CT L CT L
1 39.004 2.585 39.465±0.193 2.528±0.018 −1.17(%) 2.25(%)
5 31.115 4.636 31.232±0.177 4.628±0.062 −0.38(%) 0.17(%)
10 30.090 5.073 30.125±0.167 5.143±0.088 −0.12(%) −1.36(%)
20 30.002 5.229 30.000±0.168 5.242±0.098 0.01(%) −0.25(%)

[Figure 4: two panels, "Cycle time estimates" (y-axis: cycle time) and "Queue length estimates" (y-axis: number of waiting jobs), each plotted against buffer size (1, 5, 10, 20) for job variations υ = 3 and υ = 10, showing the simulation max/min range together with the approximation estimate.]

Fig. 4. Effects of job variation for M (5, 3)

computer simulation program, which are hard to control. In fact, Figure 4 shows
that the confidence intervals are not sufficiently reduced. Nonetheless, we can roughly
assess the relative accuracy of the estimates by our proposed algorithm and how
much job variation increment increases the errors of our estimates. For instance,
among 8 comparisons in the table, 6 estimates by our approximation algorithm fall
within the confidence intervals. The figure shows that the errors tend to be larger
when the buffer size is smaller. This is because when the blocking probabilities are
higher, the blocking propagation procedure amplifies the blocking approximation
errors more. We also observe that increasing the job variation tends to increase the errors
of our algorithm in the cycle time estimates. This is because higher job variation causes
more blocking especially when the buffer size is small.

[Figure 5: cycle time in seconds versus processing sequence (sequences 1–6 and random order), with 99% confidence intervals, for the four cases B = 1 and B = 10 combined with υ = 3 and υ = 10.]
Fig. 5. Effects of job processing orders for M (4, 4)

6 Effects of job processing sequences and comparison


with random order processing

We examine the effects of the job processing sequence on the performance and
compare the performance with that of random order job processing. There are
(J − 1)! cyclic sequences of processing J types of jobs.
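Since a cyclic sequence is unchanged by rotation, one job type can be fixed in the first position, which gives the (J − 1)! count. A quick enumeration (illustrative):

```python
from itertools import permutations

def cyclic_sequences(J):
    # Distinct cyclic processing sequences of J job types, job 1 fixed first.
    return [(1,) + rest for rest in permutations(range(2, J + 1))]
```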
Since the exact method is not available for the larger cases such as M (4, 4),
we should resort to simulation or our proposed approximation method. Although
the errors of our approximation method are small, mostly within 1∼2 percent, they
are biased. However, simulation tends to give less biased point estimates because
the point estimates are obtained by averaging out the estimates from many inde-
pendent replications. Furthermore, the confidence intervals provide information on
the relative accuracy of the point estimates. Therefore, we primarily use simula-
tion estimates to have more consistent performance estimates for examining the
effects of the job processing order. Table 3 shows the performance estimates for
each processing sequence for line model M(4,4) with W(0.8, 0.8, 0.8, υ). Figure 5
visualizes the cycle time estimates. The point estimates and the 99% confidence
intervals are obtained from 100 replications of simulation. CV in the table indi-
cates 100 times the coefficient of variation of the performance estimates over the
sequences. For random order processing, the first station, which is regarded as an
arrival generator, is modified to generate arrivals in random order while the long-
run proportion of job types is maintained. From Figure 5, we observe that while
the mean processing times are kept identical for all four cases, the relative
performance between the processing sequences is different for each case. There-
fore, the processing sequence should be carefully chosen based on the performance
estimates. Our proposed procedure can efficiently compute the approximate cycle
time estimates within 1∼2 percent errors. As seen in Figures 4 and 5, the estimates

Table 3. Effects of job processing orders for M(4,4) with W(0.8,0.8,0.8,υ)

Order B=1 B=10

CT L CT L

υ=3 1 45.186±0.173 2.390±0.013 31.428±0.110 9.144±0.101


2 45.235±0.163 2.431±0.015 31.444±0.103 9.247±0.097
SCFL 3 45.516±0.185 2.404±0.014 31.514±0.107 9.260±0.108
4 46.331±0.177 2.453±0.014 31.652±0.106 9.356±0.099
5 45.860±0.187 2.397±0.014 31.556±0.107 9.148±0.094
6 45.952±0.185 2.421±0.014 31.099±0.101 9.228± 0.109

CV 0.978 0.979 0.603 0.856

Random 46.615±0.179 2.573±0.013 31.758±0.107 9.453±0.106

Order B=1 B=10

CT L CT L

υ=10 1 50.030±0.274 2.395±0.021 32.993±0.179 10.170±0.129


2 51.204±0.301 2.494±0.018 33.404±0.186 10.329±0.128
SCFL 3 51.368±0.312 2.322±0.019 33.554±0.177 10.303±0.138
4 47.883±0.286 2.374±0.021 32.885±0.184 10.106±0.029
5 48.346±0.286 2.483±0.020 33.215±0.187 10.174±0.027
6 50.164±0.342 2.387±0.020 33.129±0.174 10.219±0.137

CV 2.894 2.764 0.755 0.834

Random 51.029±0.321 2.596±0.021 33.987±0.178 10.563±0.136

mostly fall within or are close to the confidence intervals. Further, Figure 5 shows
that their changes for different sequences are consistent with those of the simulation
estimates. Even though our estimates tend to have larger errors when the buffer size
is smaller, the performance differences between the processing sequences become
larger for such case. Therefore, our approximation procedure can be effectively
used for selecting the optimal or near-optimal processing sequence.
We observe that the processing sequence significantly affects the performance,
especially when job variation υ is high and the buffer sizes are small. The cyclic
processing sequences outperform random order processing in most cases. When the
job variation is higher and the buffer capacities are smaller, the cyclic sequences
have larger performance differences, and the optimal cyclic sequence has much
better performance than random order processing. A good choice of the processing
sequence tends to minimize both the cycle time and the queue length.

7 Final remarks

We proposed a procedure for efficiently computing approximate performance es-


timates of stochastic cyclic flow line models with finite buffers, where processing
times have exponential or phase-type distributions. For two-station models, we
developed an exact computing procedure by making use of the matrix geometric
structure. We identified that the popular method of parameterizing the decomposed
submodels by propagating starvation and blocking through the adjacent submod-
els of a tandem queue can also be effectively extended to cyclic flow shops with
blocking. We also found that the job processing sequence significantly affects the
performance especially when the job variation is large and the buffer capacities are
small. It was shown that cyclic production has better performance than random order
production. Future topics include reversibility and buffer allocation characteristics.

References

1. Ahmadi RH, Wurgaft H (1994) Design for synchronized flow manufacturing. Manage-
ment Science 40(11): 1469–1483
2. Bowman RA, Muckstadt JA (1993) Stochastic analysis of cyclic schedules. Operations
Research 41(5): 947–958
3. Buzacott JA, Kostelski D (1987) Matrix-geometric and recursive algorithm solution of
a two-stage unreliable flow line. IIE Transactions 19(4): 429–438
4. Dallery Y, David R, Xie XL (1988) An efficient algorithm for analysis of transfer lines
with unreliable machines and finite buffers. IIE Transactions 20(3): 280–283
5. Dallery Y, David R, Xie XL (1989) Approximate analysis of transfer lines with un-
reliable machines and finite buffers. IEEE Transactions on Automatic Control 34(9):
943–953
6. Dallery Y, Gershwin SB (1992) Manufacturing flow lines: a review of models and
analytical results. Queueing Systems 12(1): 3–94
7. Dobson G, Yano CA (1994) Cyclic scheduling to minimize inventory in a batch flow
line. European Journal of Operational Research 75(2): 441–461
8. Gershwin SB (1987) An efficient decomposition method for the approximate evaluation
of tandem queues with finite storage space and blocking. Operations Research 35(2):
291–305
9. Gershwin SB, Schick IC (1983) Modeling and analysis of three-stage transfer lines
with unreliable machines and finite buffers. Operations Research 31(2): 354–380
10. Graves SC, Meal HC, Stefek D, Zeghmi AH (1983) Scheduling of re-entrant flow shops.
Journal of Operations Management 3(4): 197–207
11. Hall NG, Lee TE, Posner ME (2002) The complexity of cyclic shop scheduling prob-
lems. Journal of Scheduling 5(4): 307–327
12. Hong Y, Glassey CR, Seong D (1992) The analysis of a production line with unreliable
machines and random processing times. IIE Transactions 24(1): 77–83
13. Karabati S, Tan B (1998) Stochastic cyclic scheduling problem in synchronous as-
sembly and production lines. The Journal of the Operational Research Society 49(11):
1173–1187
14. Lee TE, Posner ME (1997) Performance measures and schedules in periodic job shops.
Operations Research 45(1): 72–91
15. Lee TE (2000) Stable earliest starting schedules for cyclic job shops: a linear system
approach. International Journal of Flexible Manufacturing Systems 12(1): 59–80

16. Seo JW, Lee TE (2002) Steady state analysis of cyclic job shops with overtaking.
International Journal of Flexible Manufacturing Systems 14(4): 291–318
17. Kim JH, Lee TE, Lee HY, Park DB (2003) Scheduling analysis of time-constrained
dual-armed cluster tools. IEEE Transactions on Semiconductor Manufacturing 16(3):
521–534
18. McCormick ST, Pinedo ML, Shenker S, Wolf B (1989) Sequencing in an assembly line
with blocking to minimize cycle time. Operations Research 37(6): 925–935
19. Neuts MF (1981) Matrix-geometric solutions in stochastic models: an algorithmic ap-
proach. The Johns Hopkins University Press, Baltimore, MD
20. Rao US, Jackson PL (1996) Estimating performance measures in repetitive manufac-
turing environments via stochastic cyclic scheduling. IIE Transactions 28(11): 929–939
21. Seo JW, Lee TE (1996) Stochastic cyclic flow lines: non-blocking, Markovian models.
Journal of the Operational Research Society 49(5): 537–548
22. Wittrock RJ (1985) Scheduling algorithm for flexible flow lines. IBM Journal of Re-
search and Development 29(4): 401–412
23. Zhang H, Graves SC (1997) Cyclic scheduling in a stochastic environment. Operations
Research 45(6): 894–903
Section III: Queueing Network Models
of Manufacturing Systems
Performance analysis of multi-server tandem queues
with finite buffers and blocking
Marcel van Vuuren1 , Ivo J.B.F. Adan1 , and Simone A.E. Resing-Sassen2
1 Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven,
The Netherlands (e-mail: [email protected], [email protected])
2 CQM BV, P.O. Box 414, 5600 AK Eindhoven, The Netherlands
(e-mail: [email protected])

Abstract. In this paper we study multi-server tandem queues with finite buffers
and blocking after service. The service times are generally distributed. We develop
an efficient approximation method to determine performance characteristics such as
the throughput and mean sojourn times. The method is based on decomposition into
two-station subsystems, the parameters of which are determined by iteration. For the
analysis of the subsystems we developed a spectral expansion method. Comparison
with simulation shows that the approximation method produces accurate results,
so it is useful for the design and analysis of production lines.

Keywords: Approximation – Blocking – Decomposition – Finite buffers – Multi-


server tandem queues – Production lines – Spectral expansion

1 Introduction

Queueing networks with finite buffers have been studied extensively in the litera-
ture; see, e.g., Dallery and Gershwin [6], Perros [17, 18], and Perros and Altiok [19],
and the references therein. Most studies, however, consider single-server models.
The few references dealing with multi-server models typically assume exponential
service times. In this paper we focus on multi-server tandem queues with general
service times, finite buffers and Blocking After Service (BAS).
Models with finite buffers and phase-type service times can be represented by
finite state Markov chains. Hence, in theory, they can be analyzed exactly. However,
the number of states of the Markov chain can be very large, which makes numerical
solutions intractable. In practice, only small systems with one or two queues can
be solved exactly; for exact methods we refer to Perros [18].
We develop an efficient method to approximate performance characteristics
such as the throughput and the mean sojourn time. The method only needs the
first two moments of the service time and it decomposes the tandem queue into

subsystems with one buffer. Each multi-server subsystem is approximated


by a single (super) server system with state dependent arrival and departure rates,
the queue length distribution of which can be efficiently computed by a spectral
expansion method. The parameters of the inter-arrival and service times of each
subsystem are determined by an iterative algorithm. Numerical results show that
this method produces accurate estimates for important performance characteristics
as the throughput and the mean sojourn time.
Decomposition techniques have also been used by, e.g., Buzacott [2], Dallery
et al. [5], Perros [18], and Kerbache and MacGregor Smith [11]. These papers deal
with single-server queueing networks. Methods for multi-server queueing networks
with finite buffers are presented by Tahilramani et al. [21], Jain and MacGregor
Smith [9], and Cruz et al. [3, 4]. These methods, however, do not assume general
service times. An excellent survey on the analysis of manufacturing flow lines with
finite buffers is presented by Dallery and Gershwin [6].
In the analysis of queueing networks with blocking three basic approaches
can be distinguished. The first approach decomposes the network into subsystems
and the parameters of the inter-arrival and service times of the subsystems are
determined iteratively. This is the most common approach. It involves three steps:
1. Characterize the subsystems;
2. Derive a set of equations that determine the unknown parameters of each sub-
system;
3. Develop an iterative algorithm to solve these equations.
This approach is treated in Perros’ book [18] and in the survey of Dallery
and Gershwin [6]. The approach in this paper also involves the three steps men-
tioned above, as we will explain in Section 5. There are also decomposition meth-
ods available for finite buffer models with some special features, such as assem-
bly/disassembly systems (see Gershwin and Burman [7]) and systems with multiple
failure modes (see Tolio et al. [23]).
The second approach is also based on decomposition of the network, but instead
of iteratively determining the parameters of the inter-arrival and service times of
the subsystems, holding nodes are added to represent blocking. This so-called
expansion method has been introduced by Kerbache and Smith [11]. The expansion
method has been successfully used to model tandem queues with the following kinds
of nodes: M/G/1/K [20], M/M/C/K [9] and M/G/C/C [3, 4].
The expansion method consists of the following three stages:
1. Network reconfiguration;
2. Parameter estimation;
3. Feedback elimination.
This method is very efficient; it produces accurate results when the buffers are large.
The third approach has been introduced by Kouvatsos and Xenios [12]. They
developed a method based on the maximum entropy method (MEM) to analyze
single-server networks. Here, holding nodes are also used and the characteristics
of the queues are determined iteratively. For each subsystem in the network the
queue-length distribution is determined by using a maximum entropy method. This

algorithm is a linear program where the entropy of the queue-length distribution is


maximized subject to a number of constraints. For more information we refer the
reader to [12]. This method has been implemented in QNAT by Tahilramani et al.
[21]; they also extended the method to multi-server networks. This method works
well; the average error in the throughput is typically around 5%.
There are also several methods available for optimizing tandem queues with
finite buffers. For example, Hillier and So [8] give some insight into the general
form of the optimal design of tandem queues with the expected service times, the
queue capacities and the number of servers at each station as the decision variables.
Li et al. [13] have developed a method for optimization of tandem queues using
techniques and concepts like simulation, critical path and perturbation analysis.
The paper is organized as follows. In Section 2 we introduce the tandem queue
and its decomposition. In the section thereafter we elaborate on the arrivals at and
departures from the subsystems. The spectral expansion method for analyzing the
subsystems is discussed in Section 4. Section 5 describes the iterative algorithm.
Numerical results are presented in Section 6. The results of the approximation
method are compared with simulation and with QNAT. Finally, Section 7 contains
some concluding remarks.

2 Model and decomposition

We consider a tandem queue L with M server-groups and M − 1 buffers Bi,
i = 1, . . . , M − 1, of size bi in between. The server-groups are labelled Mi,
i = 0, . . . , M − 1; server-group Mi has mi parallel identical servers. The random
variable Si denotes the service time of a server in group Mi; Si is generally
distributed with rate µp,i (and thus with mean 1/µp,i) and coefficient of variation cp,i.
Each server can serve one customer at a time and the customers are served in order
of arrival. The servers of M0 are never starved and we consider the blocking-after-service
(BAS) protocol. Figure 1 shows a tandem queue with four server-groups.
The tandem queue L is decomposed into M −1 subsystems L1 , L2 , . . . , LM −1 .
Subsystem Li consists of a finite buffer of size bi , mi−1 so-called arrival servers in
front of the buffer, and mi so-called departure servers after the buffer. The arrival
and departure servers are virtual servers that describe the arrivals to a buffer and
the departures from a buffer. The decomposition of L is shown in Figure 1.
The random variable Ai denotes the service time of an arrival-server in sub-
system Li , i = 1, . . . , M − 1. This random variable represents the service time
of a server in server-group Mi−1 including possible starvation of this server. The
random variable Di denotes the service time of a departure-server in subsystem
Li ; it represents the service time of a server in server-group Mi including possible
blocking of this server. Let us indicate the rates of Ai and Di by µa,i and µd,i
and their coefficients of variation by ca,i and cd,i , respectively. If these character-
istics are known, we are able to approximate the queue-length distribution of each
subsystem. Then, by using the queue-length distribution we can also approximate
characteristics of the complete tandem queue, such as the throughput and mean
sojourn time.
172 M. van Vuuren et al.

Fig. 1. The tandem queue L and its decomposition into three subsystems L1 , L2 and L3

3 Service times of arrival and departure servers

In this section we describe how the service times of the arrival and departure servers
in subsystem Li are modelled.
The service-time Di of a departure-server in subsystem Li is approximated as
follows. We define bi,j as the probability that just after service completion of a server
in server-group Mi , exactly j servers of server-group Mi are blocked. This means
that, with probability bi,j , a server in server-group Mi has to wait for one residual
inter-departure time and j − 1 full inter-departure times of the next server-group
Mi+1 before the customer can leave the server. The inter-departure times of server-
group Mi+1 are assumed to be independent and distributed as the inter-departure
times of the superposition of mi+1 independent service processes, each with service
times Di+1 ; the residual inter-departure time is approximated by the equilibrium
residual inter-departure time of the superposition of these service processes. Let
the random variable SDi+1 denote the inter-departure time of server-group Mi+1
and RSDi+1 the residual inter-departure time. Figure 2 displays a representation
of the service time of a departure-server of subsystem Li .
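The first moment of Di follows directly from this representation. The sketch below is our own illustration (the function name and argument layout are not from the paper): it computes E[Di] from the mean service time, the blocking probabilities bi,j, and the mean residual and full inter-departure times of the next server-group.

```python
def departure_mean(mean_s, blocking, mean_rsd, mean_sd):
    """Mean of the departure-server service time D_i (cf. Fig. 2).

    blocking[j] is the probability that exactly j servers are blocked just
    after a service completion (blocking[0] = no blocking). With j >= 1
    blocked servers, the service time S_i is extended by one residual
    inter-departure time and j - 1 full inter-departure times of the
    next server-group.
    """
    extra = sum(b_j * (mean_rsd + (j - 1) * mean_sd)
                for j, b_j in enumerate(blocking) if j >= 1)
    return mean_s + extra
```

For instance, with E[Si] = 1, blocking probabilities (0.7, 0.2, 0.1), E[RSDi+1] = 0.5 and E[SDi+1] = 1, the mean becomes 1 + 0.2 · 0.5 + 0.1 · 1.5 = 1.25.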
In the appendix it is explained how the rates and coefficients of variation of
SDi+1 and RSDi+1 can be determined. If also the blocking probabilities bi,j
Fig. 2. Representation of the service time Di of a departure-server of subsystem Li

are known, then we can determine the rate µd,i and coefficient of variation cd,i
of the service time Di of a departure-server of subsystem Li . The distribution of
Di is approximated by fitting an Erlangk−1,k or Coxian2 distribution on µd,i and
cd,i, depending on whether c²d,i is less than or greater than 1/2. More specifically, if
c²d,i > 1/2, then the rate and coefficient of variation of the Coxian2 distribution
with density

  f(t) = (1 − q) µ1 e^(−µ1 t) + q (µ1 µ2 / (µ1 − µ2)) (e^(−µ2 t) − e^(−µ1 t)),   t ≥ 0,

match µd,i and cd,i, provided the parameters µ1, µ2 and q are chosen as (cf.
Marie [14]):

  µ1 = 2 µd,i,   q = 1 / (2 c²d,i),   µ2 = µ1 q.   (1)

If 1/k ≤ c²d,i ≤ 1/(k − 1) for some k > 2, then the rate and coefficient of variation
of the Erlangk−1,k distribution with density

  f(t) = p µ^(k−1) (t^(k−2) / (k − 2)!) e^(−µt) + (1 − p) µ^k (t^(k−1) / (k − 1)!) e^(−µt),   t ≥ 0,

match µd,i and cd,i if the parameters µ and p are chosen as (cf. Tijms [22]):

  p = ( k c²d,i − √( k (1 + c²d,i) − k² c²d,i ) ) / (1 + c²d,i),   µ = (k − p) µd,i.   (2)
Of course, other phase-type distributions may also be fitted on the rate and
coefficient of variation of Di, but numerical experiments suggest that other distributions
only have a minor effect on the results, as shown in [10].
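The two-moment fits (1) and (2) are straightforward to implement. The sketch below is our own illustration (the function name is ours, not the authors'); it selects Marie's Coxian2 fit when c² > 1/2 and Tijms' Erlangk−1,k fit otherwise:

```python
import math

def fit_phase_type(rate, cv):
    """Two-moment fit of Section 3: Coxian2 (Eq. (1)) or Erlang_{k-1,k} (Eq. (2)).

    Returns ("coxian2", mu1, mu2, q) when cv^2 > 1/2, following Marie's fit,
    and ("erlang", k, mu, p) when 1/k <= cv^2 <= 1/(k-1), following Tijms' fit.
    """
    scv = cv * cv
    if scv > 0.5:
        mu1 = 2.0 * rate                  # Eq. (1)
        q = 1.0 / (2.0 * scv)
        mu2 = mu1 * q
        return ("coxian2", mu1, mu2, q)
    k = math.ceil(1.0 / scv)              # smallest k with 1/k <= scv
    p = (k * scv - math.sqrt(k * (1.0 + scv) - k * k * scv)) / (1.0 + scv)  # Eq. (2)
    mu = (k - p) * rate
    return ("erlang", k, mu, p)
```

One can check that the fitted distribution matches the prescribed first two moments; e.g. for rate 1 and c² = 2 the Coxian2 fit gives µ1 = 2, q = 1/4, µ2 = 1/2, whose mean is 1 and squared coefficient of variation is 2.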
The service times Ai of the arrival-servers in subsystem Li are modelled simi-
larly. Instead of bi,j we now use si,j defined as the probability that just after service
completion of a server in server-group Mi , exactly j servers of Mi are starved.
This means that, with probability si,j , a server in server-group Mi has to wait one
Fig. 3. Representation of the service time Ai of an arrival-server of subsystem Li

residual inter-departure time and j − 1 full inter-departure times from the preced-
ing server-group Mi−1 . Figure 3 displays a representation of the service time of an
arrival-server of subsystem Li .

4 Spectral analysis of a subsystem

By fitting Coxian or Erlang distributions on the service times Ai and Di, subsystem
Li can be modelled as a finite-state Markov process; below we describe this Markov
process in more detail for a subsystem with ma arrival servers, md departure servers
and a buffer of size b.
To reduce the state space we replace the arrival and departure servers by super
servers with state-dependent service times. The service time of the super arrival
server is the inter-departure time of the service processes of the non-blocked arrival
servers. If the buffer is not full, all arrival servers are working. In this case, the
inter-departure time (or super service time) is assumed to be Coxianl distributed,
where phase j (j = 1, . . . , l) has parameter λj and pj is the probability to proceed
to the next phase (note that Erlang distributions are a special case of Coxian dis-
tributions). If the buffer is full, one or more arrival servers may be blocked. Then
the super service time is Coxian distributed, the parameters of which depend on
the number of active servers (and follow from the inter-departure time distribution
of the active service processes). The service time of the super departure server is
defined similarly. In particular, if none of the departure servers is starved, the super
service time is the inter-departure time of the service processes of all md departure
servers. This inter-departure time is assumed to be Coxiann distributed with
parameters µj and qj (j = 1, . . . , n). So, the time spent in phase j is exponentially
distributed with parameter µj and the probability to proceed to the next phase is qj .
Now the subsystem can be described by a Markov process with states (i, j, k).
The state variable i denotes the total number of customers in the subsystem. Clearly,
i is at most equal to md + b + ma . Note that, if i > md + b, then i − md − b actually
indicates the number of blocked arrival servers. The state variable j (k) indicates
the phase of the service time of the super arrival (departure) server. If i ≤ md + b,
then the service time of the super arrival server consists of l phases; the number of
phases depends on i for i > md + b. Similarly, the number of phases of the service
time of the super departure server is n for i ≥ md , and it depends on i for i < md .
The steady-state distribution of this Markov process can be determined effi-
ciently by using the spectral expansion method, see e.g. Mitrani [16]. Using the
spectral expansion method, Bertsimas [1] analysed a multi-server system with an
infinite buffer; we will adapt this method for finite buffer systems. The advantage of
the spectral expansion method is that the time to solve a subsystem is independent
of the size of the buffer.
Below we formulate the equilibrium equations for the equilibrium probabilities
P (i, j, k). Only the equations in the states (i, j, k) with md < i < md + b are
presented; the form of the equations in the other states appears to be of minor
importance to the analysis.
So, for md < i < md + b we have:

  P(i, 1, 1)(λ1 + µ1) = Σ_{j=1}^{l} (1 − pj)λj P(i − 1, j, 1) + Σ_{k=1}^{n} (1 − qk)µk P(i + 1, 1, k)   (3)

  P(i, j, 1)(λj + µ1) = p_{j−1}λ_{j−1} P(i, j − 1, 1) + Σ_{k=1}^{n} (1 − qk)µk P(i + 1, j, k),   j = 2, . . . , l   (4)

  P(i, 1, k)(λ1 + µk) = q_{k−1}µ_{k−1} P(i, 1, k − 1) + Σ_{j=1}^{l} (1 − pj)λj P(i − 1, j, k),   k = 2, . . . , n   (5)

  P(i, j, k)(λj + µk) = p_{j−1}λ_{j−1} P(i, j − 1, k) + q_{k−1}µ_{k−1} P(i, j, k − 1),   j = 2, . . . , l,  k = 2, . . . , n.   (6)
We are going to use the separation of variables technique presented in Mickens
[15], by assuming that the equilibrium probabilities P(i, j, k) are of the form

  P(i, j, k) = Dj Rk w^i,   md ≤ i ≤ md + b,  1 ≤ j ≤ l,  1 ≤ k ≤ n.   (7)
Substituting (7) in the equilibrium equations (3)–(6) and dividing by common
powers of w yields:

  D1 R1 (λ1 + µ1) = (1/w) Σ_{j=1}^{l} (1 − pj)λj Dj R1 + w Σ_{k=1}^{n} (1 − qk)µk D1 Rk   (8)

  Dj R1 (λj + µ1) = p_{j−1}λ_{j−1} D_{j−1} R1 + w Σ_{k=1}^{n} (1 − qk)µk Dj Rk,   2 ≤ j ≤ l   (9)

  D1 Rk (λ1 + µk) = (1/w) Σ_{j=1}^{l} (1 − pj)λj Dj Rk + q_{k−1}µ_{k−1} D1 R_{k−1},   2 ≤ k ≤ n   (10)

  Dj Rk (λj + µk) = p_{j−1}λ_{j−1} D_{j−1} Rk + q_{k−1}µ_{k−1} Dj R_{k−1},   2 ≤ j ≤ l,  2 ≤ k ≤ n.   (11)
We can rewrite (11) as:

  (λj Dj − p_{j−1}λ_{j−1} D_{j−1}) / Dj = (−µk Rk + q_{k−1}µ_{k−1} R_{k−1}) / Rk,   2 ≤ j ≤ l,  2 ≤ k ≤ n.   (12)

Since (12) holds for each combination of j and k, the left-hand side of (12) is
independent of k and the right-hand side of (12) is independent of j. Hence, there
exists a constant x, depending on w, such that

  −x Dj = λj Dj − p_{j−1}λ_{j−1} D_{j−1},   2 ≤ j ≤ l,   (13)
  −x Rk = −µk Rk + q_{k−1}µ_{k−1} R_{k−1},   2 ≤ k ≤ n.   (14)
Solving equation (13) gives

  Dj = D1 ∏_{r=1}^{j−1} pr λr / (x + λ_{r+1}).   (15)

Substituting (15) in (10) and using equation (14) we find the following relationship
between x and w:

  w = Σ_{j=1}^{l} ( (1 − pj)λj / (x + λj) ) ∏_{r=1}^{j−1} ( pr λr / (x + λr) ).   (16)
Note that w is equal to the Laplace–Stieltjes transform fA(s) of the service time of
the super arrival server, evaluated at s = x. Now we do the same for (9), yielding
another relationship between x and w:

  1/w = Σ_{k=1}^{n} ( (1 − qk)µk / (−x + µk) ) ∏_{r=1}^{k−1} ( qr µr / (−x + µr) ).   (17)
Clearly, 1/w is equal to the Laplace–Stieltjes transform fD(s) of the service time
of the super departure server, evaluated at s = −x. Substituting (16) and (17) in
(8) and using (13) and (14) we find that

  1 = fA(x) fD(−x).

This is a polynomial equation of degree l + n; the roots are labeled xt,
t = 1, . . . , l + n, and they are assumed to be distinct. Note that these roots may be
complex-valued. Using equation (17) we can find the corresponding l + n values for
wt, t = 1, . . . , l + n. Summarizing, for each t, we obtain the following solution of
(3)–(6):

  P(i, j, k) = Bt wt^i ∏_{r=1}^{j−1} ( pr λr / (xt + λ_{r+1}) ) ∏_{r=1}^{k−1} ( qr µr / (−xt + µ_{r+1}) ),
      md ≤ i ≤ md + b,  1 ≤ j ≤ l,  1 ≤ k ≤ n,

where Bt = D1,t R1,t is some constant. Since the equilibrium equations are linear,
any linear combination of the above solutions satisfies (3)–(6). Hence, the general
solution of (3)–(6) is given by

  P(i, j, k) = Σ_{t=1}^{l+n} Bt wt^i ∏_{r=1}^{j−1} ( pr λr / (x(wt) + λ_{r+1}) ) ∏_{r=1}^{k−1} ( qr µr / (−x(wt) + µ_{r+1}) ),
      md ≤ i ≤ md + b,  1 ≤ j ≤ l,  1 ≤ k ≤ n.
Finally, the unknown coefficients Bt and the unknown equilibrium probabilities
P(i, j, k) for i < md and i > md + b can be determined from the equilibrium
equations for i ≤ md and i ≥ md + b and the normalization equation.
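To illustrate the root-finding step, consider the simplest case of single-phase (exponential) super servers, l = n = 1. The sketch below is our own illustration (the function name is hypothetical): it cross-multiplies 1 = fA(x)fD(−x) into a polynomial and solves it with numpy; the nonzero root then yields the familiar geometric factor w = λ/µ.

```python
import numpy as np

def spectral_roots_exponential(lam, mu):
    """Roots of 1 = fA(x) * fD(-x) for single-phase (l = n = 1) super servers.

    With fA(s) = lam / (s + lam) and fD(s) = mu / (s + mu), cross-multiplying
    1 = fA(x) fD(-x) gives the degree-2 polynomial (x + lam)(mu - x) - lam*mu,
    i.e. -x^2 + (mu - lam) x = 0.
    """
    xs = np.roots([-1.0, mu - lam, 0.0])  # coefficients, highest degree first
    ws = [lam / (x + lam) for x in xs]    # w = fA(x) for each root
    return list(xs), ws
```

For λ = 1 and µ = 2 the roots are x = 0 and x = µ − λ = 1; the nonzero root gives w = λ/µ = 0.5, the geometric queue-length factor one expects with exponential servers.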

5 Iterative algorithm

We now describe the iterative algorithm for approximating the performance
characteristics of tandem queue L. The algorithm is based on the decomposition of L
into M − 1 subsystems L1, L2, . . . , LM−1. Before going into detail in Section 5.2,
we present the outline of the algorithm in Section 5.1.

5.1 Outline of the algorithm

• Step 0: Determine initial characteristics of the service times Di of the departure
  servers of subsystem Li, i = M − 1, . . . , 1.
• Step 1: For subsystem Li, i = 1, . . . , M − 1:
  1. Determine the first two moments of the service time Ai of the arrival servers,
     given the queue-length distribution and throughput of subsystem Li−1.
  2. Determine the queue-length distribution of subsystem Li.
  3. Determine the throughput Ti of subsystem Li.
• Step 2: Determine the new characteristics of the service times Di of the departure
  servers of subsystem Li, i = M − 1, . . . , 1.
• Repeat Steps 1 and 2 until the service time characteristics of the departure servers
  have converged.
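The outline above can be sketched as a fixed-point loop. The skeleton below is only an illustration of the control flow: update_rates stands in for Steps 1 and 2 (which in the paper require solving every subsystem by spectral expansion), and the toy contraction used in the demo is not part of the method.

```python
def iterate_decomposition(init_rates, update_rates, eps=1e-6, max_iter=100):
    """Skeleton of the fixed-point iteration over the departure rates mu_d,i.

    `update_rates` stands in for Steps 1 and 2: given the current departure
    rates of subsystems L_1, ..., L_{M-1}, it must return new rates (in the
    paper this requires solving each subsystem by spectral expansion).
    The loop stops when sum_i |mu_new[i] - mu[i]| < eps, the convergence
    criterion used in the paper.
    """
    mu = list(init_rates)  # Step 0: initialise assuming no blocking
    for _ in range(max_iter):
        mu_new = update_rates(mu)
        if sum(abs(a - b) for a, b in zip(mu_new, mu)) < eps:
            return mu_new
        mu = mu_new
    return mu

# Toy stand-in update with known fixed point 0.8, for illustration only.
rates = iterate_decomposition([1.0, 1.0, 1.0],
                              lambda mu: [0.5 * (m + 0.8) for m in mu])
```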

5.2 Details of the algorithm

Step 0: Initialization: The first step of the algorithm is to set bi,j = 0 for all i and
j. This means that we initially assume that there is no blocking. This also means
that the random variables Di are initially the same as the service times Si .

Step 1: Evaluation of subsystems: We now know the service time characteristics
of the departure servers of Li, but we also need to know the characteristics of the
service times of its arrival servers before we are able to determine the queue-length
distribution of Li.

(a) Service times of arrival servers


For the first subsystem L1 , the characteristics of A1 are the same as those of S0 ,
because the servers of M0 cannot be starved.
For the other subsystems we proceed as follows. By application of Little's law
to the arrival servers, it follows that the throughput of the arrival servers multiplied
by the mean service time of an arrival server is equal to the mean number of active (i.e.
non-blocked) arrival servers. The mean service time of an arrival server of subsystem
Li is equal to 1/µa,i and the mean number of active servers is equal to

  (1 − Σ_{j=1}^{m_{i−1}} p_{i, mi+bi+j}) m_{i−1} + Σ_{j=1}^{m_{i−1}} p_{i, mi+bi+j} (m_{i−1} − j).

So, we have for the throughput Ti of subsystem Li,

  Ti = (1 − Σ_{j=1}^{m_{i−1}} p_{i, mi+bi+j}) m_{i−1} µa,i + Σ_{j=1}^{m_{i−1}} p_{i, mi+bi+j} (m_{i−1} − j) µa,i,   (18)

where pi,j denotes the probability of j customers in subsystem Li. By substituting
the estimate T^(n)_{i−1} for Ti and p^(n−1)_{i, mi+bi+j} for p_{i, mi+bi+j}, we get as new estimate
for the service rate µa,i

  µ^(n)_{a,i} = T^(n)_{i−1} / [ (1 − Σ_{j=1}^{m_{i−1}} p^(n−1)_{i, mi+bi+j}) m_{i−1} + Σ_{j=1}^{m_{i−1}} p^(n−1)_{i, mi+bi+j} (m_{i−1} − j) ],

where the superscripts indicate in which iteration the quantities have been calculated.
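As a small numerical illustration of Eq. (18) (our own sketch; the function name and argument layout are hypothetical), the throughput follows from the queue-length probabilities of the blocking states:

```python
def arrival_throughput(m_prev, m_i, b_i, p, mu_a):
    """Throughput of subsystem L_i via Eq. (18).

    p[n] is the probability of n customers in the subsystem; in states with
    m_i + b_i + j customers (j >= 1), j of the m_prev arrival servers are
    blocked, so only m_prev - j of them contribute at rate mu_a.
    """
    blocked = [p[m_i + b_i + j] if m_i + b_i + j < len(p) else 0.0
               for j in range(1, m_prev + 1)]
    rate = (1.0 - sum(blocked)) * m_prev * mu_a
    rate += sum(p_j * (m_prev - j) * mu_a
                for j, p_j in zip(range(1, m_prev + 1), blocked))
    return rate
```

For example, with m_{i−1} = 2, m_i = 1, b_i = 1, µa,i = 1 and a uniform queue-length distribution over 0, . . . , 4 customers, Eq. (18) gives Ti = 0.6 · 2 + 0.2 · 1 + 0.2 · 0 = 1.4.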
To approximate the coefficient of variation ca,i of Ai we use the representation
for Ai as described in Section 3 (which is based on si−1,j , Si−1 , RSAi−1 and
SAi−1 ).

(b) Analysis of subsystem Li


Based on the (new) characteristics of the service times of both arrival and departure
servers we can determine the steady-state queue-length distribution of subsystem
Li. To do so, we first fit Coxian2 or Erlangk−1,k distributions on the first two
moments of the service times of the arrival-servers and departure-servers as described
in Section 3. Then we calculate the equilibrium probabilities pi,j by using the
spectral expansion method as described in Section 4.

(c) Throughput of subsystem Li


Once the steady-state queue-length distribution is known, we can determine the
new throughput T^(n)_i according to (cf. (18))

  T^(n)_i = (1 − Σ_{j=0}^{m_i−1} p^(n)_{i,j}) m_i µ^(n−1)_{d,i} + Σ_{j=1}^{m_i−1} p^(n)_{i,j} j µ^(n−1)_{d,i}.   (19)

We also determine new estimates for the probabilities bi−1,j that j servers of
server-group Mi−1 are blocked after service completion of a server in server-group
Mi−1 and the probabilities si,j that j servers of server-group Mi are starved after
service completion of a server in server-group Mi .
We perform Step 1 for every subsystem from L1 up to LM −1 .
Step 2: Service times of departure servers: Now we have new information about
the departure processes of the subsystems. So we can again calculate the first two
moments of the service times of the departure-servers, starting from DM −2 down
to D1 . Note that DM −1 is always the same as SM −1 , because the servers in server-
group MM −1 can never be blocked.
A new estimate for the rate µd,i of Di is determined from (cf. (18))

  µ^(n)_{d,i} = T^(n)_{i+1} / [ (1 − Σ_{j=0}^{m_i−1} p^(n)_{i,j}) m_i + Σ_{j=1}^{m_i−1} p^(n)_{i,j} j ].   (20)
The calculation of a new estimate for the coefficient of variation cd,i of Di is similar
to the one of Ai .

Convergence criterion: After Steps 1 and 2 we check whether the iterative algorithm
has converged by comparing the departure rates in the (n − 1)-th and n-th
iteration. We decide to stop when the sum of the absolute values of the differences
between these rates is less than ε; otherwise we repeat Steps 1 and 2. So the
convergence criterion is

  Σ_{i=1}^{M−1} | µ^(n)_{d,i} − µ^(n−1)_{d,i} | < ε.
Of course, we may use other stop-criteria as well; for example, we may consider
the throughput instead of the departure rates. The bottom line is that we go on until
all parameters do not change anymore.
Remark. Equality of throughputs.
It is easily seen that, after convergence, the throughputs in all subsystems are
equal. Let us assume that the iterative algorithm has converged, so µ^(n)_{d,i} = µ^(n−1)_{d,i}
for all i = 1, . . . , M − 1. From equations (19) and (20) we find the following:

  T^(n)_i = (1 − Σ_{j=0}^{m_i−1} p^(n)_{i,j}) m_i µ^(n−1)_{d,i} + Σ_{j=1}^{m_i−1} p^(n)_{i,j} j µ^(n−1)_{d,i}
         = (1 − Σ_{j=0}^{m_i−1} p^(n)_{i,j}) m_i µ^(n)_{d,i} + Σ_{j=1}^{m_i−1} p^(n)_{i,j} j µ^(n)_{d,i}
         = T^(n)_{i+1}.
Hence we can conclude that the throughputs in all subsystems are the same after
convergence.

Complexity analysis: The complexity of this method is as follows. Within the
iterative algorithm, solving a subsystem consumes most of the time. In one iteration
a subsystem is solved M times. The number of iterations needed is difficult to
predict, but in practice it is about three to seven.
The time consuming part of solving a subsystem is solving the boundary equa-
tions. This can be done in O((ma + md )(ka kd )3 ) time, where ka is the number
of phases of the distribution of one arrival process and kd is the number of phases
of the distribution of one departure process. Then, the time complexity of one it-
eration becomes O(M maxi ((mi + mi−1 )(ki ki−1 )3 )). This means that the time
complexity is polynomial and does not depend on the sizes of the buffers.

6 Numerical results

In this section we present some numerical results. To investigate the quality of our
method we compare it with discrete event simulation. After that, we compare our
method with the method developed by Tahilramani et al. [21], which is implemented
in QNAT [25].

6.1 Comparison with simulation

In order to investigate the quality of our method we compare the throughput and
the mean sojourn time with the ones produced by discrete event simulation. We are
especially interested in investigating for which set of input-parameters our method
gives satisfying results. Each simulation run is sufficiently long such that the widths
of the 95% confidence intervals of the throughput and the mean sojourn time are
smaller than 1%.
In order to test the quality of the method we use a broad set of parameters. We
test two different lengths M of tandem queues, namely with 4 and 8 server-groups.
For each tandem queue we vary the number of servers mi in the server-groups; we
use tandems with 1 server per server-group, 5 servers per server-group and with the
sequence (4, 1, 2, 8). We also vary the level of balance in the tandem queue; every
server-group has a maximum total rate of 1 and the group right after the middle
can have a total rate of 1, 1.1, 1.2, 1.5 and 2. The coefficient of variation of the
service times varies between 0.1, 0.2, 0.5, 1, 1.5 and 2. Finally we vary the buffer
sizes between 0, 2, 5 and 10. This leads to a total of 720 test-cases. The results for
each category are summarized in Tables 1 to 5. Each table lists the average error
in the throughput and the mean sojourn time compared with the simulation results.
Each table also gives for 4 error-ranges the percentage of the cases which fall in
that range. The results for a selection of 54 cases can be found in Tables 6 and 7.

Table 1. Overall results for tandem queues with different buffer sizes

Buffer       Error in throughput                 Error in mean sojourn time
sizes (bi)   Avg.   0–5%   5–10%  10–15%  >15%   Avg.   0–5%   5–10%  10–15%  >15%

0            5.7%   55.0%  35.0%  4.4%    5.6%   6.8%   42.8%  35.0%  14.4%   7.8%
2            3.2%   76.1%  22.8%  1.1%    0.0%   4.7%   57.2%  35.0%  7.2%    0.6%
5            2.1%   90.6%  9.4%   0.0%    0.0%   4.5%   60.6%  32.2%  7.2%    0.0%
10           1.4%   95.6%  4.4%   0.0%    0.0%   5.1%   53.3%  34.4%  12.2%   0.0%
Table 2. Overall results for tandem queues with different balancing rates

Rate of unbalanced       Error in throughput                 Error in mean sojourn time
server-group (mi µp,i)   Avg.   0–5%   5–10%  10–15%  >15%   Avg.   0–5%   5–10%  10–15%  >15%

1.0                      3.3%   76.4%  20.8%  1.4%    1.4%   3.4%   74.3%  22.2%  2.1%    1.4%
1.1                      3.1%   78.5%  18.1%  2.1%    1.4%   4.0%   68.1%  27.1%  3.5%    1.4%
1.2                      3.0%   79.2%  18.8%  0.7%    1.4%   4.6%   59.7%  34.7%  4.2%    1.4%
1.5                      3.0%   81.3%  16.0%  1.4%    1.4%   6.5%   38.2%  43.1%  16.7%   2.1%
2.0                      3.1%   81.3%  16.0%  1.4%    1.4%   7.9%   27.1%  43.8%  25.0%   4.2%

Table 3. Overall results for tandem queues with different coefficients of variation of the
service times

Coefficients of     Error in throughput                 Error in mean sojourn time
variation (c2p,i)   Avg.   0–5%   5–10%  10–15%  >15%   Avg.   0–5%   5–10%  10–15%  >15%

0.1                 4.4%   54.2%  44.2%  1.7%    0.0%   3.1%   77.5%  21.7%  0.8%    0.0%
0.2                 2.6%   88.3%  11.7%  0.0%    0.0%   3.4%   75.8%  22.5%  1.7%    0.0%
0.5                 2.2%   90.8%  9.2%   0.0%    0.0%   4.5%   60.8%  32.5%  6.7%    0.0%
1.0                 1.5%   93.3%  2.5%   4.2%    0.0%   4.1%   64.2%  30.0%  5.0%    0.8%
1.5                 3.0%   82.5%  13.3%  0.0%    4.2%   7.5%   25.8%  54.2%  15.0%   5.0%
2.0                 4.8%   66.7%  26.7%  2.5%    4.2%   9.1%   16.7%  44.2%  32.5%   6.7%

Table 4. Overall results for tandem queues with a different number of servers per server-group

Number of       Error in throughput                 Error in mean sojourn time
servers (mi)    Avg.   0–5%   5–10%  10–15%  >15%   Avg.   0–5%   5–10%  10–15%  >15%

All 1           2.9%   83.8%  9.2%   2.9%    4.2%   5.9%   46.3%  39.2%  10.0%   4.6%
All 5           3.8%   68.3%  30.8%  0.8%    0.0%   4.6%   60.0%  29.2%  10.8%   0.0%
Mixed           2.6%   85.8%  13.8%  0.4%    0.0%   5.3%   54.2%  34.2%  10.0%   1.7%

We may conclude the following from the above results. First, we see in Table 1
that the performance of the approximation becomes better when the buffer sizes
increase. This may be due to fewer dependencies between the server-groups when
the buffers are large.
We also notice that the performance is better for balanced lines (Table 2); for
unbalanced lines, especially the estimate for the mean sojourn time is not as good
as for balanced lines. If we look at the coefficients of variation of the service times
(Table 3), we get the best approximations for the throughput when the coefficients
Table 5. Overall results for tandem queues with 4 and 8 server-groups

Number of           Error in throughput                 Error in mean sojourn time
server-groups (M)   Avg.   0–5%   5–10%  10–15%  >15%   Avg.   0–5%   5–10%  10–15%  >15%

4                   2.3%   87.2%  12.2%  0.6%    0.0%   4.7%   57.5%  32.8%  9.7%    0.0%
8                   3.9%   71.4%  23.6%  2.2%    2.8%   5.8%   49.4%  35.6%  10.8%   4.2%

Table 6. Detailed results for balanced tandem queues

mi M c2p,i Buffers T App. T Sim. Diff. S App. S Sim. Diff.

1 4 0.1 0 0.735 0.771 −4.7% 4.70 4.63 1.5%


8 2 0.906 0.926 −2.2% 16.14 15.99 0.9%
4 10 0.981 0.985 −0.4% 19.22 19.03 1.0%
8 1.0 0 0.488 0.443 10.2% 11.73 13.43 −12.7%
4 2 0.703 0.700 0.4% 9.09 9.25 −1.7%
8 10 0.855 0.855 0.0% 49.52 49.81 −0.6%
4 1.5 0 0.504 0.473 6.6% 5.82 6.27 −7.2%
8 2 0.607 0.581 4.5% 21.94 23.52 −6.7%
4 10 0.834 0.835 −0.1% 22.38 22.31 0.3%

5 4 0.1 0 0.789 0.856 −7.8% 22.48 21.78 3.2%


8 2 0.827 0.926 −10.7% 52.35 49.71 5.3%
4 10 0.927 0.983 −5.7% 36.88 35.24 4.7%
8 1.0 0 0.693 0.697 −0.6% 49.20 49.14 0.1%
4 2 0.797 0.808 −1.4% 26.37 26.17 0.8%
8 10 0.867 0.882 −1.7% 83.09 83.96 −1.0%
4 1.5 0 0.742 0.724 2.5% 22.99 23.90 −3.8%
8 2 0.759 0.737 3.0% 54.63 57.27 −4.6%
4 10 0.867 0.874 −0.8% 37.97 38.86 −2.3%

Mixed 4 0.1 0 0.746 0.793 −5.9% 16.19 16.28 −0.6%


8 2 0.845 0.921 −8.3% 39.90 38.96 2.4%
4 10 0.956 0.984 −2.8% 31.61 30.05 5.2%
8 1.0 0 0.619 0.604 2.5% 37.90 38.55 −1.7%
4 2 0.756 0.757 −0.1% 20.15 20.14 0.0%
8 10 0.863 0.871 −0.9% 71.67 71.74 −0.1%
4 1.5 0 0.633 0.619 2.3% 16.78 18.01 −6.8%
8 2 0.705 0.678 4.0% 43.38 46.32 −6.3%
4 10 0.850 0.856 −0.7% 31.43 32.37 −2.9%
Table 7. Detailed results for unbalanced tandem queues

mi M c2p,i Buffers T App. T Sim. Diff. S App. S Sim. Diff.

1 8 0.1 0 0.718 0.751 −4.4% 8.90 9.27 −4.0%


4 2 0.960 0.958 0.2% 6.18 6.41 −3.6%
8 10 0.980 0.983 −0.3% 38.45 43.22 −11.0%
4 1.0 0 0.594 0.561 5.9% 4.84 5.28 −8.3%
8 2 0.690 0.670 3.0% 18.81 20.31 −7.4%
4 10 0.918 0.912 0.7% 16.20 17.41 −7.0%
8 1.5 0 0.482 0.409 17.8% 11.26 13.79 −18.3%
4 2 0.714 0.691 3.3% 8.03 8.60 −6.6%
8 10 0.830 0.819 1.3% 46.75 50.16 −6.8%

5 8 0.1 0 0.781 0.851 −8.2% 43.03 42.65 0.9%


4 2 0.902 0.958 −5.8% 21.63 21.50 0.6%
8 10 0.922 0.983 −6.2% 71.89 73.95 −2.8%
4 1.0 0 0.801 0.794 0.9% 20.79 21.13 −1.6%
8 2 0.789 0.787 0.3% 51.52 53.49 −3.7%
4 10 0.927 0.929 −0.2% 30.37 32.61 −6.9%
8 1.5 0 0.730 0.692 5.5% 44.43 47.95 −7.3%
4 2 0.850 0.828 2.7% 21.95 23.70 −7.4%
8 10 0.864 0.862 0.2% 74.69 81.01 −7.8%

Mixed 8 0.1 0 0.744 0.790 −5.8% 30.96 32.41 −4.5%


4 0.1 2 0.920 0.953 −3.5% 16.72 17.14 −2.5%
8 0.1 10 0.945 0.983 −3.9% 61.00 62.54 −2.5%
4 1.0 0 0.714 0.702 1.7% 16.22 16.43 −1.3%
8 1.0 2 0.750 0.742 1.1% 39.64 42.20 −6.1%
4 1.0 10 0.926 0.919 0.8% 25.99 27.60 −5.8%
8 1.5 0 0.628 0.588 6.8% 32.68 37.66 −13.2%
4 1.5 2 0.787 0.773 1.8% 17.52 18.93 −7.4%
8 1.5 10 0.844 0.843 0.1% 61.82 69.32 −10.8%

of variation are 1, and also the estimate for the mean sojourn time is better for small
coefficients of variation.
The quality of the results seems to be rather insensitive to the number of servers
per server-group (Table 4), in spite of the super-server approximation used for
multi-server models. Finally we may conclude from Table 5 that the results are
better for shorter tandem queues.
Most crucial to the quality of the approximation of the throughput appears to be
the buffer-size. For the sojourn time this appears to be the coefficient of variation of
the service time. In Figures 4 and 5 we present scatter-plots of simulation results
versus approximation results for the throughput and mean sojourn times; the plotted
cases are the same as in Tables 6 and 7. The results for the throughput are split up
according to the buffer size; the ones for the sojourn times are split up according to
the squared coefficient of variation of the service times.

Fig. 4. Scatter-plot of the throughput of 54 cases split up by buffer-size
Overall we can say that the approximation produces accurate results in most
cases. In the majority of the cases the error of the throughput is within 5% of the
simulation and the error of the mean sojourn time is within 10% of the simulation
(see also Tables 6 and 7). The worst performance is obtained for unbalanced lines
with zero buffers and high coefficients of variation of the service times. But these
cases are unlikely (and undesirable) to occur in practice.
The computation times are very short. On a modern computer they are much less
than a second in most cases; only in cases with service times with low coefficients
of variation and one server per server-group do the computation times increase to
a few seconds. Therefore, this is a very useful approximation method for the design
of production lines.

6.2 Comparison with QNAT

We also compare the present method with QNAT, a method developed by Tahilra-
mani et al. [21]. We use a tandem queue with four server-groups. It was only possible
to test cases where the first server-group consists of a single exponential server. The
reason is that the two methods assume a different arrival process to the system. Both
processes, however, coincide for the special case of a single exponential server at
the beginning of the line. We varied the number of servers per server-group and the
size of buffers. Table 8 shows the results.
Fig. 5. Scatter-plot of the mean sojourn time of 54 cases split up by coefficient of variation
Table 8. Comparison of our method with QNAT

                     Throughput                                       Mean sojourn time
mi          bi   Sim.    Our App.  Error    QNAT    QNAT error   Sim.    Our App.  Error    QNAT    QNAT error

(1,1,1,1)   0    0.515   0.537     −4.3%    0.500   2.9%         5.95    5.61      5.7%     –       –
(1,1,1,1)   2    0.702   0.703     −0.1%    0.750   −6.8%        9.25    9.10      1.7%     8.17    11.7%
(1,1,1,1)   10   0.879   0.876     0.3%     0.917   −4.3%        21.43   21.41     0.1%     18.55   13.5%
(1,5,5,5)   0    0.711   0.717     −0.8%    0.167   76.5%        17.87   17.67     1.1%     –       –
(1,5,5,5)   2    0.791   0.788     0.3%     0.800   −1.1%        20.53   20.45     0.4%     –       –
(1,5,5,5)   10   0.898   0.884     1.6%     0.895   0.3%         32.27   32.59     −1.0%    22.88   29.1%
(1,4,2,8)   0    0.677   0.692     −2.3%    0.200   70.5%        16.59   16.28     1.9%     –       –
(1,4,2,8)   2    0.775   0.774     0.1%     0.800   −3.2%        19.29   19.15     0.7%     –       –
(1,4,2,8)   10   0.893   0.886     0.8%     0.902   −1.0%        31.03   30.86     0.6%     23.04   25.7%

We see that the present approximation method is much more stable than QNAT
and gives better results in almost all cases. Especially the approximation of the
mean sojourn time is much better; in a number of cases QNAT is not able to produce
an approximation of the mean sojourn time at all. Of course, one should be careful
with drawing conclusions from this limited set of cases; Table 8 only gives an
indication of how the two methods perform.
6.3 Industrial case

To give an indication of the performance of our method in practice, we present the


results of an industrial case. The case involves a production line for the production
of light bulbs. The production line consists of 5 production stages with buffers in
between. Each stage has a different number of machines varying between 2 and 8.
The machines have deterministic service times, but they do suffer from breakdowns.
In the queueing model we included the breakdowns into the coefficient of variation
of the service times, yielding effective service times with coefficients of variation
larger than 0. In Table 9 the parameters of the production line are shown.

Table 9. Parameters for the production line for the production of bulbs

Stage mi µp,i c2p,i bi

1 2 5.73 0.96 −
2 8 1.53 0.09 21
3 4 3.43 0.80 11
4 1 32.18 0.57 34
5 4 16.12 0.96 19

We only have data for the throughput and not for the mean sojourn time of the
line, so we can only test the approximation of the throughput. The output of the
production line based on the measured data is 11.34 products per time unit. If we
simulate this production line, we obtain a throughput of 11.41 products per time
unit. The throughput given by our approximation method is 11.26, so in this case
the approximation is a good prediction for the actual throughput.

7 Concluding remarks

In this paper we described a method for the approximate analysis of a multi-server
tandem queue with finite buffers and general service times. We decomposed the
tandem queue into subsystems. We used an iterative algorithm to approximate the
arrivals and departures at the subsystems and to approximate some performance
characteristics of the tandem queue. Each multi-server subsystem is approximated
by a single (super) server queue with state-dependent inter-arrival and service times,
the steady-state queue length distribution of which is determined by a spectral
expansion method.
This method is robust and efficient; it provides a good and fast alternative to
simulation methods. In most cases the errors for performance characteristics as the
throughput and mean sojourn time are within 5% of the simulation results. Numer-
ical results also give an indication of the performance of the method compared with
QNAT. The method can be extended in several directions. One may think of more
Fig. 6. Phase diagram of an arbitrary inter-departure time

general configurations, like splitting and merging of streams or the possibility of
feedback. Other possibilities for extension are for example unreliable machines and
assembly/disassembly (see [24]). Possibilities for improving the quality of the
approximation are, for example, using a more detailed description of the arrivals to and
departures from the subsystems (e.g. including correlations between consecutive
arrivals and departures) or improving the subsystem analysis by using a description
of the service process that is more detailed than the super-server approach.

Appendix: Superposition of service processes

Let us consider m independent service processes, each of them continuously
servicing customers one at a time. The service times are assumed to be independent
and identically distributed. We are interested in the first two moments of an arbi-
trary inter-departure time of the superposition of m service processes. Below we
distinguish between Coxian2 service times and Erlangk−1,k service times.

A.1 Coxian2 service times

We assume that the service times of each service process are Coxian2 distributed
with the same parameters. The rate of the first phase is µ1 , the rate of the second
phase is µ2 and the probability that the second phase is needed is q. The distribution
of an arbitrary inter-departure time of the superposition of m service processes can
be described by a phase-type distribution with m+1 phases, numbered 0, 1, . . . , m.
In phase i exactly i service processes are in the second phase of the service time and
m − i service processes are in the first phase. A phase diagram of the phase-type
distribution of an arbitrary inter-departure time is shown in Figure 6. The probability
to start in phase i is denoted by ai , i = 0, . . . , m − 1. The sojourn time in phase
i is exponentially distributed with rate R(i), and pi is the probability to continue
with phase i + 1 after completion of phase i. Now we explain how to compute the
parameters ai , R(i) and pi .
The probability ai can be interpreted as follows. It is the probability that i service
processes are in phase 2 just after a departure (i.e., service completion). There is
at least one process in phase 1, namely the one that generated the departure. Since
the service processes are mutually independent, the number of service processes
in phase 2 is binomially distributed with m − 1 trials and success probability p.
188 M. van Vuuren et al.

The success probability is equal to the fraction of time a single service process is
in phase 2, so

p = qµ1/(qµ1 + µ2).
Hence, for the initial probability ai we get

ai = (m−1 choose i) (qµ1/(qµ1 + µ2))^i (µ2/(qµ1 + µ2))^(m−1−i),   i = 0, . . . , m − 1.   (21)
To determine the rate R(i), note that in state i there are i processes in phase 2 and
m − i in phase 1, so the total rate at which one of the service processes completes
a service phase is equal to
R(i) = (m − i)µ1 + iµ2 (22)
It remains to find pi , the probability that there is no departure after phase i. In phase
i three things may happen:
– Case (i): A service process completes phase 1 and immediately continues with
phase 2;
– Case (ii): A service process completes phase 1 and generates a departure;
– Case (iii): A service process completes phase 2 (and thus always generates a
departure).
Clearly, pi is the probability that case (i) happens, so
pi = q(m − i)µ1/R(i).   (23)
Now that the parameters of the phase-type distribution are known, we can determine
its first two moments. Let Xi denote the total sojourn time, given that we start in
phase i, i = 0, 1, . . . , m. Starting with
EXm = 1/R(m),   EX²m = 2/R(m)²,
the first two moments of Xi can be calculated from i = m − 1 down to i = 0 by
using

EXi = 1/R(i) + pi EXi+1,   (24)

EX²i = 2/R(i)² + pi (2EXi+1/R(i) + EX²i+1).   (25)
Then the rate µs and coefficient of variation cs of an arbitrary inter-departure time
of the superposition of m service processes follow from

µs^{-1} = Σ_{i=0}^{m−1} ai EXi = (1/m)(1/µ1 + q/µ2),   (26)

c²s = µ²s Σ_{i=0}^{m−1} ai EX²i − 1.   (27)
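The computation in (21)–(27) is easy to mechanize. Below is a minimal Python sketch; the function and variable names are ours, not from the paper.

```python
from math import comb

def superposed_coxian2_interdeparture(m, mu1, mu2, q):
    """Mean and squared coefficient of variation (SCV) of an arbitrary
    inter-departure time of m superposed Coxian-2 service processes,
    following equations (21)-(27)."""
    # long-run fraction of time a single service process spends in phase 2
    p = q * mu1 / (q * mu1 + mu2)
    a = [comb(m - 1, i) * p**i * (1 - p)**(m - 1 - i) for i in range(m)]   # (21)
    R = [(m - i) * mu1 + i * mu2 for i in range(m + 1)]                    # (22)
    pr = [q * (m - i) * mu1 / R[i] for i in range(m)]                      # (23)
    EX = [0.0] * (m + 1)          # EX[i]  = E[X_i]
    EX2 = [0.0] * (m + 1)         # EX2[i] = E[X_i^2]
    EX[m], EX2[m] = 1.0 / R[m], 2.0 / R[m] ** 2
    for i in range(m - 1, -1, -1):                                         # (24)-(25)
        EX[i] = 1.0 / R[i] + pr[i] * EX[i + 1]
        EX2[i] = 2.0 / R[i] ** 2 + pr[i] * (2 * EX[i + 1] / R[i] + EX2[i + 1])
    mean = sum(ai * xi for ai, xi in zip(a, EX))                           # (26)
    scv = sum(ai * x2 for ai, x2 in zip(a, EX2)) / mean**2 - 1.0           # (27)
    return mean, scv
```

A convenient sanity check: for m = 1 the superposition is a single Coxian-2 process, so the mean must equal 1/µ1 + q/µ2 and the SCV must be that of the Coxian-2 distribution itself.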

A.2 Erlangk−1,k service times

Now the service times of each service process are assumed to be Erlangk−1,k
distributed, i.e., with probability p (respectively 1 − p) a service time consists of
k − 1 (respectively k) exponential phases with parameter µ. Clearly, the time that
elapses until one of the m service processes completes a service phase is exponential
with parameter mµ. The number of service phase completions before one of the
service processes generates a departure ranges from 1 up to m(k − 1) + 1. So the
distribution of an arbitrary inter-departure time of the superposition of m service
processes is a mixture of Erlang distributions; with probability pi it consists of i
exponential phases with parameter mµ, i = 1, . . . , m(k − 1) + 1. Figure 7 depicts
the phase diagram. Below we show how to determine the probabilities pi .
An arbitrary inter-departure time of the superposition of m service processes is
the minimum of m − 1 equilibrium residual service times and one full service time.
Both residual and full service time have a (different) mixed Erlang distribution. In
particular, the residual service time consists with probability ri of i phases with
parameter µ, where

ri = 1/(k − p),   i = 1, 2, . . . , k − 1;
ri = (1 − p)/(k − p),   i = k.
The minimum of two mixed Erlang service times has again a mixed Erlang distribu-
tion; below we indicate how the parameters of the distribution of the minimum can
be determined. Then repeated application of this procedure yields the minimum of
m mixed Erlang service times.
Let X1 and X2 be two independent random variables with mixed Erlang
distributions, i.e., with probability qk,i the random variable Xk (k = 1, 2) consists of
i exponential phases with parameter µk, i = 1, . . . , nk. Then the minimum of X1
and X2 consists of at most n1 + n2 − 1 exponential phases with parameter µ1 + µ2.

Fig. 7. Phase diagram of an arbitrary inter-departure time

To find the probability qi that the minimum consists of i phases, we proceed as
follows. Define qi(j) as the probability that the minimum of X1 and X2 consists
of i phases, where j (≤ i) of the phase transitions are due to X1 and i − j are due
to X2. Obviously we have

qi = Σ_{j=max(0,i−n2)}^{min(i,n1)} qi(j),   i = 1, 2, . . . , n1 + n2 − 1.

To determine qi (j) note that the ith phase transition of the minimum can be due to
either X1 or X2 . If X1 makes the last transition, then X1 clearly consists of exactly
j phases and X2 of at least i − j + 1 phases; the probability that X2 makes i − j
transitions before the jth transition of X1 is negative-binomially distributed with
parameters j and µ1 /(µ1 + µ2 ). The result is similar if X2 instead of X1 makes
the last transition. Hence, we obtain
qi(j) = (i−1 choose j−1) (µ1/(µ1+µ2))^j (µ2/(µ1+µ2))^(i−j) q1,j ( Σ_{k=i−j+1}^{n2} q2,k )
      + (i−1 choose j) (µ1/(µ1+µ2))^j (µ2/(µ1+µ2))^(i−j) ( Σ_{k=j+1}^{n1} q1,k ) q2,i−j,

1 ≤ i ≤ n1 + n2 − 1,   0 ≤ j ≤ i,
where by convention, q1,0 = q2,0 = 0.
By repeated application of the above procedure we can find the probability pi
that the distribution of an arbitrary inter-departure time of the superposition of m
Erlangk−1,k service processes consists of exactly i service phases with parameter
mµ, i = 1, 2, . . . , m(k − 1) + 1. It is now easy to determine the rate µs and
coefficient of variation cs of an arbitrary inter-departure time, yielding

µs^{-1} = (1/m)( p(k − 1)/µ + (1 − p)k/µ ) = (k − p)/(mµ),
and, by using that the second moment of an Ek distribution with rate parameter
µ is k(k + 1)/µ²,
c²s = µ²s Σ_{i=1}^{m(k−1)+1} pi i(i + 1)/(mµ)² − 1 = −1 + (1/(k − p)²) Σ_{i=1}^{m(k−1)+1} pi i(i + 1).
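The whole of this subsection — the minimum of two mixed Erlang distributions, applied repeatedly to one full service time and m − 1 residual service times — can be sketched in Python as follows. Names are ours; the q lists are 0-indexed, so q[i] is the probability of i + 1 phases.

```python
from math import comb

def min_mixed_erlang(d1, d2):
    """Distribution of min(X1, X2) for independent mixed-Erlang X1, X2.
    Each d = (mu, q) with q[i] = probability of i+1 exponential phases of
    rate mu; the minimum is mixed Erlang with rate mu1 + mu2."""
    (mu1, q1), (mu2, q2) = d1, d2
    n1, n2 = len(q1), len(q2)
    p1, p2 = mu1 / (mu1 + mu2), mu2 / (mu1 + mu2)
    q = [0.0] * (n1 + n2 - 1)
    for i in range(1, n1 + n2):
        for j in range(max(0, i - n2), min(i, n1) + 1):
            if j >= 1:      # X1 makes the last transition: X1 = j phases, X2 > i-j
                q[i - 1] += (comb(i - 1, j - 1) * p1**j * p2**(i - j)
                             * q1[j - 1] * sum(q2[i - j:]))
            if i - j >= 1:  # X2 makes the last transition: X2 = i-j phases, X1 > j
                q[i - 1] += (comb(i - 1, j) * p1**j * p2**(i - j)
                             * sum(q1[j:]) * q2[i - j - 1])
    return (mu1 + mu2, q)

def superposed_erlang_interdeparture(m, k, p, mu):
    """Mean and SCV of an inter-departure time of m superposed
    Erlang_{k-1,k}(mu) service processes: the minimum of one full service
    time and m-1 equilibrium residual service times."""
    resid = [1.0 / (k - p)] * (k - 1) + [(1.0 - p) / (k - p)]
    full = ([0.0] * (k - 2) + [p, 1.0 - p]) if k >= 2 else [1.0]
    d = (mu, full)
    for _ in range(m - 1):
        d = min_mixed_erlang(d, (mu, resid))
    rate, q = d
    mean = sum((i + 1) * qi for i, qi in enumerate(q)) / rate
    m2 = sum((i + 1) * (i + 2) * qi for i, qi in enumerate(q)) / rate**2
    return mean, m2 / mean**2 - 1.0
```

The returned mean should agree with the closed form (k − p)/(mµ), which gives an easy consistency check on the implementation.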

A.3 Equilibrium residual inter-departure time

To determine the first two moments of the equilibrium residual inter-departure time
of the superposition of m independent service processes we adopt the following
simple approach.
Let the random variable D denote an arbitrary inter-departure time and let R
denote the equilibrium residual inter-departure time. It is well known that
E(R) = E(D²)/(2E(D)),   E(R²) = E(D³)/(3E(D)).

In the previous sections we have shown how the first two moments of D can be
determined in case of Coxian2 and Erlangk−1,k service times. Its third moment is
approximated by the third moment of the distribution fitted on the first two moments
of D, according to the recipe in Section 3.
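These residual-time relations translate directly into code (function name ours):

```python
def residual_moments(ED, ED2, ED3):
    """First two moments of the equilibrium residual time R of a renewal
    process whose inter-event time D has moments E(D), E(D^2), E(D^3)."""
    return ED2 / (2.0 * ED), ED3 / (3.0 * ED)
```

For an exponential D the residual time has the same distribution as D itself, which gives a quick consistency check: with rate 2, E(D) = 0.5, E(D²) = 0.5, E(D³) = 0.75, and the residual moments come out equal to those of D.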

References

1. Bertsimas D (1990) An analytic approach to a general class of G/G/s queueing systems.
Operations Research 38(1): 139–155
2. Buzacott JA (1967) Automatic transfer lines with buffer stock. International Journal of
Production Research 5: 183–200
3. Cruz FRB, MacGregor Smith J (2004) Algorithm for analysis of generalized
M/G/C/C state dependent queueing networks.
http://www.compscipreprints.com/comp/Preprint/fcruzfcruz/20040105/1
4. Cruz FRB, MacGregor Smith J, Queiroz DC (2004) Service and capacity allocation in
M/G/C/C state dependent queueing networks. Computers & Operations Research
(to appear)
5. Dallery Y, David R, Xie X (1989) Approximate analysis of transfer lines with unreliable
machines and finite buffers. IEEE Transactions on Automatic Control 34(9): 943–953
6. Dallery Y, Gershwin SB (1992) Manufacturing flow line systems: a review of models
and analytical results. Queueing Systems 12: 3–94
7. Gershwin SB, Burman MH (2000) A decomposition method for analyzing inhomogeneous
assembly/disassembly systems. Annals of Operations Research 93: 91–115
8. Hillier FS, So KC (1995) On the optimal design of tandem queueing systems with finite
buffers. Queueing Systems: Theory and Applications 21: 245–266
9. Jain S, MacGregor Smith J (1994) Open finite queueing networks with M/M/C/K
parallel servers. Computers & Operations Research 21(3): 297–317
10. Johnson MA (1993) An empirical study of queueing approximations based on phase-
type distributions. Communications in Statistics: Stochastic Models 9(4): 531–561
11. Kerbache L, MacGregor Smith J (1987) The generalized expansion method for open
finite queueing networks. European Journal of Operational Research 32: 448–461
12. Kouvatsos D, Xenios NP (1989) MEM for arbitrary queueing networks with multiple
general servers and repetitive-service blocking. Performance Evaluation 10: 169–195
13. Li Y, Cai X, Tu F, Shao X (2004) Optimization of tandem queue systems with finite
buffers. Computers & Operations Research 31: 963–984
14. Marie RA (1980) Calculating equilibrium probabilities for the λ(n)/Ck/1/N queue.
Proceedings of Performance '80, Toronto, pp 117–125
15. Mickens R (1987) Difference equations. Van Nostrand-Reinhold, New York
16. Mitrani I, Mitra D (1992) A spectral expansion method for random walks on semi-
infinite strips. In: Beauwens R, de Groen P (eds) Iterative methods in linear algebra,
pp 141–149. North-Holland, Amsterdam
17. Perros HG (1989) A bibliography of papers on queueing networks with finite capacity
queues. Performance Evaluation 10: 255–260
18. Perros HG (1994) Queueing networks with blocking. Oxford University Press, Oxford
19. Perros HG, Altiok T (1989) Queueing networks with blocking. North-Holland, Amsterdam
20. MacGregor Smith J, Cruz FRB (2000) The buffer allocation problem for general finite
buffer queueing networks. http://citeseer.nj.nec.com/smith00buffer.html
21. Tahilramani H, Manjunath D, Bose SK (1999) Approximate analysis of open network
of GE/GE/m/N queues with transfer blocking. MASCOTS’99, pp 164–172

22. Tijms HC (1994) Stochastic models: an algorithmic approach. Wiley, Chichester


23. Tolio T, Matta A, Gershwin SB (2002) Analysis of two-machine lines with multiple
failure modes. IIE Transactions 34: 51–62
24. van Vuuren M (2003) Performance analysis of multi-server tandem queues with finite
buffers. Master’s Thesis, University of Technology Eindhoven, The Netherlands
25. http://poisson.ecse.rpi.edu/~hema/qnat/
An analytical method for the performance evaluation
of echelon kanban control systems
Stelios Koukoumialos and George Liberopoulos
Department of Mechanical and Industrial Engineering, University of Thessaly, Volos, Greece
(e-mail: [email protected]; [email protected])

Abstract. We develop a general purpose analytical approximation method for the


performance evaluation of a multi-stage, serial, echelon kanban control system.
The basic principle of the method is to decompose the original system into a set of
nested subsystems, each subsystem being associated with a particular echelon of
stages. Each subsystem is analyzed in isolation using a product-form approximation
technique. An iterative procedure is used to determine the unknown parameters of
each subsystem. Numerical results show that the method is fairly accurate.

Keywords: Production/inventory control – Multi-stage system – Echelon kanban


– Performance evaluation

1 Introduction

In 1960, Clark and Scarf [10] initiated the research on the coordination of multi-
stage, serial, uncapacitated inventory systems with stochastic demand and constant
lead times. Their work received considerable attention in the years that followed
and spawned a large amount of follow-on research. Much of that research evolved
around variants of the base stock control system. Research on the coordination of
multi-stage, serial, production/inventory systems having networks of stations with
limited capacity, on the other hand, has been directed mostly towards variants of
the kanban control system. In this paper, we develop an analytical approximation
method for the performance evaluation of an echelon kanban control system, used
for the coordination of production in a multi-stage, serial production/inventory
system. We test the behavior of this method with several numerical examples.
The term “echelon kanban” was introduced in [19]. The basic principle of the
operation of the echelon kanban control system is very simple: When a part leaves
Correspondence to: G. Liberopoulos

the last stage of the system to satisfy a customer demand, a new part is demanded
and authorized to be released into each stage. It is worth noting that the echelon
kanban control system is equivalent to the integral control system described in [8].
The echelon kanban control system differs from the conventional kanban control
system, which is referred to as installation kanban control system or policy in [19],
in that in the conventional kanban control system, a new part is demanded and
authorized to be released into a stage when a part leaves this particular stage and
not when a part leaves the last stage, as is the case with the echelon kanban control
system. This implies that in the conventional kanban control system, the placement
of a demand and an authorization for the production of a new part into a stage is
based on local information from this stage, whereas in the echelon kanban control
system, it is based on global information from the last stage. This constitutes a
potential advantage of the echelon kanban control system over the conventional
kanban control system. Moreover, the echelon kanban control system, just like the
conventional kanban control system, depends on only one parameter per stage, the
number of echelon kanbans, as we will see later on, and is therefore simpler to
optimize and implement than more complicated kanban-type control systems that
depend on two parameters per stage, such as the generalized kanban control system
[7] and the extended kanban control system [12]. These two apparent advantages of
the echelon kanban control system motivated our effort to develop an approximation
method for its performance evaluation.
Kanban-type production/inventory systems have often been modeled as queue-
ing networks in the literature. Consequently, most of the techniques that have been
developed for the analysis of kanban-type production/inventory systems are based
on methods for the performance evaluation of queueing networks. Exact analytical
solutions exist for a class of queueing networks known as separable, in which the
steady-state joint probabilities have a product-form solution. Jackson [18] was the
first to show that the steady-state joint probability of an open queueing network with
Poisson arrivals, exponential service times, probabilistic routing, and first-come-
first-served (FCFS) service disciplines has a product-form solution, where each
station of the network can be analyzed in isolation as an M/M/1 queue. For closed
queueing networks of the Jackson type, Gordon and Newell [17] showed that an
analytical, product-form solution also exists. The performance parameters of such
networks can be obtained using efficient algorithms, such as the mean value analy-
sis (MVA) algorithm [22] and the convolution algorithm [9]. The BCMP theorem
[1] summarizes extensions of product-form networks that incorporate alternative
service disciplines and several classes of customers.
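For closed product-form networks of the Gordon–Newell type, the exact MVA recursion mentioned above is only a few lines of code. The sketch below covers a single-class network of single-server FCFS stations with exponential service (names are ours):

```python
def mva(visits, service, n_customers):
    """Exact Mean Value Analysis for a closed, single-class, product-form
    network of single-server FCFS exponential stations.  visits[m] is the
    visit ratio and service[m] the mean service time of station m.
    Returns (throughput, mean queue lengths) at the given population."""
    M = len(visits)
    L = [0.0] * M               # mean queue lengths at population n - 1
    X = 0.0
    for n in range(1, n_customers + 1):
        # arrival theorem: an arriving customer sees the network in its
        # time-average state with itself removed
        W = [service[m] * (1.0 + L[m]) for m in range(M)]
        X = n / sum(visits[m] * W[m] for m in range(M))
        L = [X * visits[m] * W[m] for m in range(M)]
    return X, L
```

For two identical stations (unit rates, unit visit ratios) and n customers, the classical closed-form throughput is n/(n + 1), which the recursion reproduces.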
Since the class of queueing networks for which an exact solution is known
(separable networks) is too restrictive for modeling and analyzing real systems,
much work has been devoted to the development of approximation methods for the
analysis of non-separable networks. Whitt [27] presented an approximation method
for the analysis of a general open queueing network that is based on decomposing
the network into a set of GI/GI/1 queues and analyzing each queue in isolation. In
the case of closed queueing networks, the approximation methods are for the most
part based on two approaches. The first approach relies on heuristic extensions of
the MVA algorithm (e.g. [23]). The second approach relies on approximating the

performance of the original network by that of an equivalent product-form network.


Spanjers et al. [24] developed a method that is based on the second approach for
a closed-loop, two-indenture, repairable-item system. Interestingly, their system is
equivalent to an echelon kanban control system with a finite population of exter-
nal jobs. Their method aggregates several states of the underlying continuous-time
Markov chain and adjusts some service rates using Norton’s Theorem for closed
queueing networks to obtain a product-form solution. Among the different methods
that rely on the second approach, Marie’s method [20] has attracted considerable at-
tention. Extensions and comparative studies of Marie’s method have been proposed
for a variety of queueing networks [2–5], and [11]. Di Mascolo, Frein and Dallery
[14,16] developed approximation methods based on Marie’s method for the perfor-
mance evaluation of the conventional kanban control system and the generalized
kanban control system.
The approximation method that we develop in this paper for the performance
evaluation of the echelon kanban control system relies on Marie’s method. To
develop our method, we first model the system as an open queueing network with
synchronization stations. By exchanging the roles of jobs (parts) and resources
(echelon kanbans) in the open network, we obtain an equivalent, multi-class, nested,
closed queueing network, in which the population of each class is equal to the job
capacity or number of echelon kanbans of the echelon of stages associated with a
particular stage. The echelon of stages associated with a particular stage is the stage
itself and all its downstream stages. We then decompose the closed network into a
set of nested subsystems, each subsystem being associated with a particular class.
This means that we have as many subsystems as the number of the stages. Each
subsystem is analyzed in isolation using Marie’s method. Each subsystem interacts
with its neighboring subsystems in that it includes its downstream subsystem in the
form of a single-server station with load-dependent, exponential service rates, and
it receives external arrivals from its upstream subsystem. A fixed-point, iterative
procedure is used to determine the unknown parameters of each subsystem by
taking into account the interactions between neighboring subsystems.
The rest of this paper is organized as follows. In Section 2, we describe the
exact operation of the echelon kanban control system by means of a simple exam-
ple. In Section 3 we present the queueing network model of the echelon kanban
control system and the performance measures of the system that we are interested
in evaluating. In Section 4, we describe the decomposition of the original system
into many subsystems. In Section 5, we present the analysis in isolation of each sub-
system, and in Section 6 we develop the analysis of the entire system. In Section 7,
we present numerical results on the effects and optimization of the parameters.
Finally, in Section 8, we draw conclusions. The analysis of the synchronization sta-
tions that appear in the queueing network models of each subsystem is presented
in Appendices A and B, and a table of the notation used in the paper is given in
Appendix C.

Fig. 1. A serial production system decomposed into three stages in series

2 The echelon kanban control system

In this section, we give a precise description of the operation of the echelon kan-
ban control system by means of a simple example. In this example, we consider
a production system that consists of M = 9 machines in series, labeled M1 to
M9, produces a single part type, and does not involve any batching, reworking or
scrapping of parts. Each machine has a random processing time. All parts visit
successively machines M1 to M9. The production system is decomposed into N =
3 stages. Each stage is a production/inventory system consisting of a manufactur-
ing process and an output buffer. The output buffer stores the finished parts of the
stage. The manufacturing process consists of a subset of machines of the original
manufacturing system and contains parts that are in service or waiting for service
on the machines. These parts represent the work in process (WIP) of the stage and
are used to supply the output buffer. In the example, each stage consists of three
machines. More specifically, the sets of machines {M1, M2, M3}, {M4, M5, M6}
and {M7, M8, M9} belong to stages 1, 2 and 3, respectively. The decomposition
of the production system into three stages is illustrated in Figure 1.
Each stage has associated with it a number of echelon kanbans that are used
to demand and authorize the release of parts into this stage. An echelon kanban
of a particular stage traces a closed path through this stage and all its downstream
stages. The number of echelon kanbans of stage i is fixed and equal to Ki . There
must be at least one echelon kanban of stage i available in order to release a new
part into this stage. If such a kanban is available, the kanban is attached onto the
part and follows it through the system until the output buffer of the last stage. Since
an echelon kanban of stage i is attached to every part in any stage from i to N , the
number of parts in stages i to N is limited by Ki .
Parts that are in the output buffer of stage N are the finished parts of the
production system. These parts are used to satisfy customer demands. When a
customer demand arrives to the system, a demand for the delivery of a finished
part from the output buffer of the last stage to the customer is placed. If there
are no finished parts in the output buffer of the last stage, the demand cannot be
immediately satisfied and is backordered until a finished part becomes available. If
there is at least one finished part in the output buffer of the last stage, this part is
delivered to the customer after releasing the kanbans of all the stages (1, 2, and 3,
in the example) that were attached to it, hence the demand is immediately satisfied.
The released kanbans are immediately transferred upstream to their corresponding
stages. The kanban of stage i carries with it a demand for the production of a new

stage-i finished part and an authorization to release a finished part from the output
buffer of stage i − 1 into stage i. When a finished part of stage i − 1 is transferred
to stage i, the stage-i kanban is attached to it on top of the kanbans of stages 1 to
i − 1, which have already been attached to the part at previous stages. With this in
mind, we can just as well assume that
Ki ≥ Ki+1 , i = 1, . . ., N − 1. (1)
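The kanban accounting implied by this description is compact enough to state as code. A sketch follows; the state representation and names are ours. Here wip[j] counts every part currently in stage j + 1 (in process or in its output buffer), so the number of stage-i echelon kanbans in use equals the total WIP from stage i downstream.

```python
def free_echelon_kanbans(K, wip):
    """Free echelon kanbans per stage: K[i] minus the WIP of stages i..N,
    since a stage-(i+1) kanban stays attached to a part until the part
    leaves the output buffer of the last stage."""
    return [K[i] - sum(wip[i:]) for i in range(len(K))]

def release_authorized(i, K, wip, out_buf):
    """A part may be released into stage i+1 iff a stage-(i+1) echelon
    kanban is free and an input part is available: raw material for the
    first stage, a finished part in the upstream output buffer otherwise."""
    has_input = (i == 0) or out_buf[i - 1] > 0
    return has_input and free_echelon_kanbans(K, wip)[i] > 0
```

Note that when K1 ≤ Ki for all i, the binding constraint is total WIP ≤ K1, which is why such a system behaves like a make-to-stock CONWIP loop.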

3 Queueing network model of the echelon kanban control system

In order to develop the approximation method for the performance evaluation of


the echelon kanban control system, we first model the system as an open queueing
network with synchronization stations. Figure 2 shows the queueing network model
of the echelon kanban control system with three stages in series, considered in
Section 2. The manufacturing process of each stage is modeled as a subnetwork in
which the machines of the manufacturing process are represented by single-server
stations. The subnetwork associated with the manufacturing process of stage i is
denoted by Li, and the single-server stations representing machines M1, . . . , M9
are denoted by S1, . . . , S9, respectively. The number of stations of subnetwork Li
is denoted by mi . In the example, mi = 3, i = 1, 2, 3. The echelon kanban control
mechanism is modeled via three synchronization stations, denoted by J i , at the
output of each stage i, i = 1, 2, 3.
A synchronization station is a modeling element that is often used to model
assembly operations in queueing networks. It can be thought of as a server with
instant service times. This server is fed by two or more queues (in our case by two).
When there is at least one customer in each of the queues that feed the server, these
customers move instantly through and out of the server. This implies that, at any
time, at least one of the queues that feed the server is empty. Customers that enter
the server, exit the server after possibly having been split into more or merged into
fewer customers. In our case, the queues in each synchronization station contain
either parts or demands combined with kanbans.
To illustrate the operation of the synchronization stations, let us first focus on
any synchronization station Ji , except that of the last stage. This synchronization
station represents the synchronization between a stage-i finished part and a stage-
(i+1) free kanban. Let P Ai and DAi+1 denote the two queues of Ji . P Ai represents

Fig. 2. Queueing network model of the echelon kanban control system of Figure 1

the output buffer of stage i and contains stage-i finished parts, each of which has
attached to it a kanban from each stage from 1 to i. DAi+1 contains demands for
the production of new stage-(i + 1) parts, each of which has attached to it a stage-
(i + 1) kanban. The synchronization station operates as follows. As soon as there
is one entity in each queue P Ai and DAi+1 , the stage-i finished part engages the
stage-(i + 1) kanban without releasing the kanbans from stages 1 to i that were
already attached to it, and joins the first station of stage i + 1. Note that at stage
1, as soon as a stage-1 kanban is available, a new part is immediately released into
stage 1 since there are always raw parts at the input of the system.
Let us now consider the last synchronization station JN (J3 in the example). JN
synchronizes queues P AN , and DN +1 . P AN represents the output buffer of stage
N and contains stage-N finished parts, each of which has attached to it a kanban
from each stage from 1 to N . DN +1 contains customer demands. When a customer
demand arrives to the system, it joins DN +1 , thereby demanding the release of a
finished part from P AN to the customer. If there is a finished part in queue P AN ,
it is released to the customer and the demand is satisfied. In this case, the finished
part in P AN releases the kanbans that were attached to it, and these kanbans are
transferred upstream to queues DAi (i = 1, . . ., N ). The kanban of stage i carries
along with it a demand for the production of a new stage-i(i = 1, . . ., N ) finished
part and an authorization for the release of a finished part from queue P Ai−1 into
stage i. If there are no finished parts in queue P AN , the customer demand remains
on hold in DN +1 as a backordered demand.
An important special case of the echelon kanban control system is the case
where there are always customer demands for finished parts. This case is known
as the saturated echelon kanban control system. Its importance lies in the fact that
its throughput determines the maximum capacity of the system. In the saturated
system, when there are finished parts at stage N , they are immediately consumed
and an equal number of parts enter the system. As far as the queueing network
corresponding to this model is concerned, the synchronization station JN can be
eliminated since queue DN +1 is never empty and can therefore be ignored. In
the saturated echelon kanban control system, when the processing of a part is
completed at stage N , this part is immediately consumed after releasing the kanbans
of stages 1,. . . , N that were attached to it and sending them back to queues DAi (i =
1, . . ., N ).
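The saturated system is also easy to simulate, which gives an independent estimate of its production rate. The sketch below assumes one exponential single-server machine per stage (the paper allows general subnetworks of machines per stage); all names are ours.

```python
import heapq
import random

def simulate_saturated_echelon_kanban(mu, K, n_departures=40000, seed=1):
    """Estimate the production rate P_r of a saturated echelon kanban line
    with one exponential single-server machine per stage.  mu[j]: service
    rate of stage j+1; K[j]: number of stage-(j+1) echelon kanbans."""
    N = len(mu)
    rng = random.Random(seed)
    stage = [0] * N        # parts inside stage j (queued or in service)
    buf = [0] * N          # finished parts in the output buffer of stage j
    busy = [False] * N
    events = []            # heap of (completion time, stage index)
    t, done = 0.0, 0

    def echelon_wip(j):    # parts currently carrying a stage-(j+1) kanban
        return sum(stage[j:]) + sum(buf[j:])

    def try_releases():
        moved = True
        while moved:
            moved = False
            for j in range(N):
                has_input = (j == 0) or buf[j - 1] > 0
                if has_input and echelon_wip(j) < K[j]:
                    if j > 0:
                        buf[j - 1] -= 1
                    stage[j] += 1
                    moved = True
            for j in range(N):   # start any idle server that has work
                if not busy[j] and stage[j] > 0:
                    busy[j] = True
                    heapq.heappush(events, (t + rng.expovariate(mu[j]), j))

    try_releases()
    while done < n_departures:
        t, j = heapq.heappop(events)
        busy[j] = False
        stage[j] -= 1
        if j == N - 1:
            done += 1      # demands always waiting: part consumed, kanbans freed
        else:
            buf[j] += 1
        try_releases()
    return done / t
```

As a check: with K1 = K2 and two identical unit-rate stages, the saturated system behaves as a closed two-station cycle with K1 customers, whose throughput for K1 = 2 is 2/3, and the estimate should be close to that value.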
It is worth noting that the echelon kanban control system contains the make-
to-stock CONWIP system [23] as a special case. In the make-to-stock CONWIP
system, as soon as a finished part leaves the production system to be delivered to a
customer, a new part enters the system to begin its processing. An echelon kanban
control system with K1 ≤ Ki, i ≠ 1, behaves exactly like the make-to-stock
CONWIP system.
The dynamic behavior of the echelon kanban control system depends on the
manufacturing processes, the arrival process of customer external demands, and
the number of echelon kanbans of each stage. Among the performance measures
that are of particular interest are the average work in process (WIP) and the average
number of finished parts in each stage, the average number of backordered (not
immediately satisfied) demands, and the average waiting time and percentage of

backordered demands. In the case of the saturated echelon kanban control system,
the main performance measure of interest is its production rate, Pr , i.e. the average
number of finished parts leaving the output buffer of stage N per unit of time. Pr
represents the maximum rate at which customer demands can be satisfied. With this
in mind, the average arrival rate of external customer demands in the unsaturated
system, say λD , must be strictly less than Pr in order for the system to meet all the
demands in the long run. In other words, the stability condition for the unsaturated
system is
λD < Pr . (2)

4 Decomposition of the echelon kanban control system


To evaluate the performance of the multi-stage, serial, echelon kanban control
system, we decompose the system into many nested, single-stage subsystems and
analyze each subsystem in isolation. The subsystems are nested in each other in such
a way that each subsystem includes its downstream subsystem in the form of a
single-server station and receives external arrivals from its upstream subsystem.
The first subsystem mimics the original system. To analyze each subsystem, we
view it as a closed queueing network and we approximate each station of this
network by an exponential-service station with load-dependent service rates. The
resulting network is a product-form network. A fixed-point iterative procedure is
then used to determine the unknown parameters of each subsystem by taking into
account the interactions between neighboring subsystems. A detailed description
of the decomposition follows.
Consider the queueing network model of an echelon kanban control system
consisting of N stages in series as described in Section 3 (see Fig. 2 for N = 3). Let
us denote the queueing network of the system by R. Our goal is to analyze R by
decomposing it into a set of N nested subsystems, Ri , i = 1, . . ., N . This is done
as follows (see Fig. 3 for N = 3).
Subsystem RN (R3 in the example) is an open queueing network with restricted
capacity consisting of 1) an upstream synchronization station, denoted by I N ,
representing JN −1 in the original system, 2) the subnetwork of stations LN of
the original system, and 3) a downstream synchronization station, denoted by ON ,
representing JN in the original system. Each subsystem Ri , i = 2, . . ., N − 1, is
an open queueing network with restricted capacity consisting of 1) an upstream
synchronization station, denoted by I i , representing Ji−1 in the original system,
2) the subnetwork of stations Li of the original system, and 3) a downstream
single-server pseudo-station, denoted by Ŝi , representing the part of the system
downstream of Li in the original system. Finally, subsystem R1 is a closed queueing
network consisting of 1) the subnetwork of stations L1 of the original system, and
2) a downstream single-server pseudo-station, denoted by Ŝ1 , representing the part
of the system downstream of L1 in the original system. Note that pseudo-station
Ŝi in subsystem Ri , i = 1, . . ., N − 1, is an aggregate representation of subsystem
Ri+1 .
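To keep the bookkeeping of this nesting straight, the structure just described can be summarized in a small data structure. This is purely illustrative; the class and field names are ours, not part of the paper's method:

```python
from dataclasses import dataclass

@dataclass
class Subsystem:
    """One nested subsystem R_i of the decomposition (illustrative names)."""
    stage: int
    echelon_kanbans: int      # K_i: population of the associated closed network
    has_input_sync: bool      # upstream synchronization station I^i
    has_output_sync: bool     # downstream synchronization station O^N
    has_pseudo_station: bool  # aggregate single-server station for R_{i+1}

def build_decomposition(N, K):
    """R_1 is closed with only a pseudo-station; R_2..R_{N-1} add an
    upstream synchronization station; R_N has both synchronization
    stations and no pseudo-station (nothing downstream to aggregate)."""
    return [
        Subsystem(stage=i,
                  echelon_kanbans=K[i - 1],
                  has_input_sync=(i > 1),
                  has_output_sync=(i == N),
                  has_pseudo_station=(i < N))
        for i in range(1, N + 1)
    ]

# For N = 3 with echelon kanban counts (K_1, K_2, K_3) = (15, 10, 5):
subs = build_decomposition(3, [15, 10, 5])
```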
The number of echelon kanbans of subsystem Ri is Ki . Subsystem RN is syn-
chronized with two external arrival processes, one at synchronization station I N
200 S. Koukoumialos and G. Liberopoulos

Fig. 3. Illustration of the decomposition of a 3-stage echelon kanban control system

concerning parts that arrive from subnetwork LN −1 , and the other at synchroniza-
tion station ON concerning customer demands. Subsystem Ri , i = 2, . . ., N − 1,
is synchronized with only one external arrival process at synchronization station
I i concerning parts that arrive from subnetwork Li−1 . Subsystem R1 is a closed
network; therefore it is not synchronized with any external arrival processes. As
can be seen from Fig. 3, each synchronization station Ji of the original network
R, linking stage i to stage i + 1, is represented only once in the decomposition.
To completely characterize each subsystem Ri , i = 2, . . ., N , we assume that
each of the external arrival processes to Ri is a state-dependent, continuous-time
Markov process. Let λi (ni ) denote the state-dependent arrival rate of stage-i raw
parts at the upstream synchronization station I i of subsystem Ri , where ni is the
state of subsystem Ri and is defined as the number of parts in this subsystem.
Let Qiu and QiI be the two queues of synchronization station I i , containing niu
and niI customers, respectively, where niu is the number of finished parts of stage
i − 1 waiting to enter subnetwork Li , and niI is the number of free stage-i kanbans
waiting to authorize the release of stage-(i − 1) finished parts into subnetwork
Li . Then, it is clear that the only possible states of the synchronization station are

the states (niI , 0), for niI = 0, . . ., Ki , and (0, niu ), for niu = 0, . . ., Ki−1 − Ki ;
therefore, the state ni of subsystem Ri can be simply obtained from niu and niI
using the following relation:

$$
n^i =
\begin{cases}
K_i - n_I^i & \text{if } n_I^i \neq 0,\\[2pt]
K_i + n_u^i & \text{if } n_I^i = 0.
\end{cases}
\tag{3}
$$
The above relation implies that 0 ≤ ni ≤ Ki−1 . Also, since the number of raw
parts at the input of stage i cannot be more than the number of stage-(i−1) kanbans,
λi (Ki−1 ) = 0. In subsystem RN , besides the arrival rate of stage-N raw parts at
I N , λN (nN ), there is also the external arrival rate of customer demands at ON ,
λD . Subsystem R1 , as was mentioned above, is a closed network and therefore has
no external arrival processes to define.
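As a quick illustration, the state mapping in (3) can be coded as a small function; the function and argument names are ours:

```python
def subsystem_state(n_I, n_u, Ki):
    """State n^i of subsystem R^i from the two queue lengths of its
    upstream synchronization station, following Eq. (3).

    n_I : free stage-i echelon kanbans waiting at I^i
    n_u : finished stage-(i-1) parts waiting at I^i
    Ki  : number of stage-i echelon kanbans
    """
    # At a synchronization station, at most one of the two queues
    # can be non-empty at any time.
    if n_I > 0 and n_u > 0:
        raise ValueError("both queues of a synchronization station non-empty")
    return Ki - n_I if n_I != 0 else Ki + n_u

assert subsystem_state(5, 0, 5) == 0   # all kanbans free: empty subsystem
assert subsystem_state(0, 3, 5) == 8   # full, plus 3 parts waiting to enter
```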
To obtain the performance of the original network R, the following two prob-
lems must be addressed: 1) How to analyze each subsystem Ri , i = 1, . . ., N ,
assuming that the external arrival rates are known (except in the case of the first
subsystem R1 , where there are no external arrivals), and 2) how to determine the
unknown external arrival rates. These two problems are addressed in Sections 5
and 6, respectively. Once these two problems have been solved, the performance
of each stage of the original network R can be obtained from the performances of
subsystems Ri , i = 1, . . ., N .

5 Analysis of each subsystem in isolation

In this section, we describe how to analyze each subsystem in isolation using Marie’s
approximate analysis of general closed queueing networks [20]. Throughout this
analysis, the state-dependent rates of the external arrival processes, λi (ni ), 0 ≤
ni ≤ Ki−1 , i = 2, . . ., N , are assumed to be known. To analyze each subsystem
using Marie’s method, we first view the subsystem as a closed queueing network.
For subsystems Ri , i = 2, . . ., N , this is done by considering the kanbans of stage
i as the customers of the closed network, and the parts and demands (in the case of
the last subsystem RN ) as external resources. Note that the queueing network asso-
ciated with subsystem R1 is already being modeled as a closed queueing network
in the decomposition. Its customers are the kanbans of stage 1.
The closed queueing network associated with subsystem RN is partitioned
into mN + 2 stations, namely, the synchronization stations I N and ON and the
mN stations of subnetwork LN . Similarly, the closed queueing network asso-
ciated with each subsystem Ri is partitioned into mi + 2 stations, namely, the
synchronization station I i , the mi stations of subnetwork Li , and station Ŝi . Fi-
nally, the closed queueing network associated with subsystem R1 is partitioned into
m1 +1 stations, namely, the m1 stations of subnetwork L1 , and station Ŝ1 . Each sta-
tion is approximated by an exponential-service station with load-dependent service
rates. The resulting network associated with each subsystem is a Gordon-Newell,
product-form network [17] consisting of Ki customers and mi + 2 stations for
subsystems Ri , i = 2, . . ., N , and m1 + 1 stations for subsystem R1 . The stations
within each subsystem Ri , i = 1, . . ., N , will be denoted by the index k ∈ Mi ,

where M1 = {1, . . ., m1 , Ŝ}, Mi = {I, 1, . . ., mi , Ŝ} for i = 2, . . ., N − 1, and
MN = {I, 1, . . ., mN , O}.
Let µik (nik ) denote the load-dependent service rate of station k in the product-
form network of subsystem Ri when there are nik customers in that station. We will
show how to determine µik (nik ), nik = 1, . . . , Ki , for each station k ∈ Mi within a
particular subsystem Ri , i = 1, . . ., N . The method for doing this is the same for
all subsystems Ri , i = 1, . . ., N ; therefore, for the sake of notational simplicity we
will drop index i that denotes variables associated with subsystem Ri .
Let vector n = (nk , k ∈ M ) be the state of the closed, product-form network,
where nk denotes the number of customers present at station k. Then, the probability
of being in state n, P (n), is given by the following product-form solution [12]:

$$
P(\mathbf{n}) = \frac{1}{G(K)} \prod_{k \in M} \prod_{n=1}^{n_k} \frac{V_k}{\mu_k(n)},
\tag{4}
$$

where Vk is the average visit ratio of station k in the original system and is given
from the routing matrix of the original system, and G(K) is the normalization
constant.
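For readers who want to experiment, G(K) with load-dependent rates can be computed with the standard convolution algorithm for product-form networks. This is a generic sketch, not the authors' code; the two-station cycle, rates, and visit ratios below are made-up numbers:

```python
def station_factor(V, mu, K):
    """f(n) = V**n / (mu[0]*...*mu[n-1]) for n = 0..K: the per-station
    term of the product-form solution (4) with load-dependent rates."""
    f = [1.0]
    for n in range(1, K + 1):
        f.append(f[-1] * V / mu[n - 1])
    return f

def normalization_constant(factors, K):
    """Convolve the per-station factors to get G(0..K) for the network."""
    g = [1.0] + [0.0] * K            # network with no stations
    for f in factors:
        g = [sum(f[j] * g[n - j] for j in range(n + 1)) for n in range(K + 1)]
    return g

# Two-station closed cycle, K = 3 customers, visit ratios 1,
# constant exponential rates 2.0 and 1.0:
K = 3
f1 = station_factor(1.0, [2.0] * K, K)
f2 = station_factor(1.0, [1.0] * K, K)
G = normalization_constant([f1, f2], K)
throughput = 1.0 * G[K - 1] / G[K]   # V * G(K-1)/G(K) = 14/15 here
```

The same G(K) also normalizes the state probabilities in (4): P(n) = f1(n1) · f2(n2) / G(K).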
To determine the unknown parameters µk (nk ), nk = 1, . . ., K, for each station
k ∈ M , in the product-form solution (4), each station is analyzed in isolation
as an open system with a state-dependent, Poisson arrival process, whose rate
λk (nk ) depends on the total number of customers, nk , present in the station. Let
Tk denote this open system. Assume that the rates λk (nk ) are known for nk =
1, . . ., K − 1. The open queueing system Tk can then be analyzed in isolation
using any appropriate technique to obtain the steady-state probabilities of having
nk customers in the isolated system, say Pk (nk ). The issue of analyzing each
queueing system Tk in isolation will be discussed immediately after Algorithm 1,
below. Once the probabilities Pk (nk ) are known, the conditional throughput of Tk
when its population is nk , which is denoted by vk (nk ), can be derived using the
relation [12],
$$
v_k(n_k) = \lambda_k(n_k - 1)\,\frac{P_k(n_k - 1)}{P_k(n_k)},
\quad \text{for } n_k = 1, \ldots, K.
\tag{5}
$$
The load-dependent service rates of the k-th station of the closed product-form
network are then set equal to the conditional throughputs of the corresponding open
station in isolation, i.e.:
µk (nk ) = vk (nk ), for nk = 1, . . ., K. (6)
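In code, this step amounts to solving a birth-death chain and applying (5) and (6). The sketch below uses arbitrary invented arrival rates; for a station whose true service is exponential with a constant rate µ, the conditional throughput comes out equal to µ for every population, the special case the text notes later for exponential service times:

```python
def birth_death_stationary(lam, mu, K):
    """Steady-state probabilities P(0..K) of a birth-death chain with
    state-dependent arrival rates lam[n] (out of state n, n = 0..K-1)
    and service rates mu[n-1] (out of state n, n = 1..K)."""
    w = [1.0]
    for n in range(1, K + 1):
        w.append(w[-1] * lam[n - 1] / mu[n - 1])
    total = sum(w)
    return [x / total for x in w]

def conditional_throughput(lam, P):
    """v(n) = lam(n-1) * P(n-1) / P(n) for n = 1..K, Eq. (5)."""
    return [lam[n - 1] * P[n - 1] / P[n] for n in range(1, len(P))]

# Exponential server with constant rate 2.0 and arbitrary
# state-dependent arrivals: v(n) == 2.0 for every n, so Eq. (6)
# recovers the true service rate.
K = 4
lam = [0.9, 0.7, 0.5, 0.3]
P = birth_death_stationary(lam, [2.0] * K, K)
v = conditional_throughput(lam, P)
```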
Once the rates µk (nk ) have been obtained, the state-dependent arrival rates
λk (nk ) can be obtained from the generalized, product-form solution as [6,12]:
$$
\lambda_k(n_k) = V_k\,\frac{G_k(K - n_k - 1)}{G_k(K - n_k)},
\quad \text{for } n_k = 1, \ldots, K - 1, \quad \text{and } \lambda_k(K) = 0,
\tag{7}
$$
where Gk (n) is the normalization constant of the closed, product-form network with
station k removed (complementary network) and population n. Gk (n) is a function
of the parameters µk′ (nk′ ) for all k′ ≠ k and nk′ = 1, . . ., K, and can be computed
efficiently using any computational algorithm for product-form networks [6,9]. An

iterative procedure can then be used to determine these unknown quantities. This
procedure is described by the following algorithm.
Algorithm 1: Analysis of a Subsystem in Isolation.
Step 0. (Initialization) Set µk (nk ) to some initial value, for k ∈ M and nk =
1, . . ., K.
Step 1. For k ∈ M :
Calculate the state-dependent arrival rates λk (nk ), for nk = 0, . . ., K −1, using
(7).
Step 2. For k ∈ M :
1. Analyze the open queueing system Tk .
2. Derive the steady state probabilities Pk (nk ) of having nk customers, for nk =
1, . . ., K.
3. Calculate the conditional throughputs vk (nk ), for nk = 1, . . ., K, using (5).

Step 3. For k ∈ M :
Set the load-dependent service rates µk (nk ), for nk = 1, . . ., K, in the closed,
product-form network using (6).
Step 4. Go to Step 1 until convergence of the parameters µk (nk ).
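Putting Steps 0–4 together, here is a toy, self-contained sketch of Algorithm 1 under strong simplifying assumptions: every station is a single-server exponential station (so the Step-2 analysis in isolation reduces to a birth-death solve, and the fixed point is exact and converges immediately), and the rates and visit ratios in the example are invented. For general service, Step 2 would instead analyze a λ(n)/G/1/K queue:

```python
def analyze_subsystem(mu_true, V, K, tol=1e-10, max_iter=100):
    """Fixed-point sketch of Algorithm 1 for a closed network of
    single-server exponential stations.  Returns the network
    throughput measured at station 0."""
    M = len(mu_true)

    def factors(mu_hat):
        # f_k(n) = V_k^n / (mu_hat_k(1) * ... * mu_hat_k(n))
        fs = []
        for k in range(M):
            f = [1.0]
            for n in range(1, K + 1):
                f.append(f[-1] * V[k] / mu_hat[k][n - 1])
            fs.append(f)
        return fs

    def convolve(fs):
        g = [1.0] + [0.0] * K
        for f in fs:
            g = [sum(f[j] * g[n - j] for j in range(n + 1))
                 for n in range(K + 1)]
        return g

    # Step 0: initialize the load-dependent rates.
    mu_hat = [[mu_true[k]] * K for k in range(M)]
    for _ in range(max_iter):
        fs = factors(mu_hat)
        new_mu = []
        for k in range(M):
            # Step 1: arrival rates from the complementary network, Eq. (7).
            Gk = convolve(fs[:k] + fs[k + 1:])
            lam = [V[k] * Gk[K - n - 1] / Gk[K - n] for n in range(K)]
            # Step 2: isolated birth-death solve; for an exponential
            # station the conditional throughput is just mu_true[k].
            w = [1.0]
            for n in range(1, K + 1):
                w.append(w[-1] * lam[n - 1] / mu_true[k])
            v = [lam[n - 1] * w[n - 1] / w[n] for n in range(1, K + 1)]
            new_mu.append(v)                      # Step 3
        if max(abs(a - b) for vk1, vk2 in zip(mu_hat, new_mu)
               for a, b in zip(vk1, vk2)) < tol:  # Step 4
            break
        mu_hat = new_mu
    g = convolve(factors(mu_hat))
    return V[0] * g[K - 1] / g[K]

# Two exponential stations (rates 2.0 and 1.0), K = 3 customers:
th = analyze_subsystem([2.0, 1.0], [1.0, 1.0], K=3)   # 14/15 for this cycle
```

Because the stations here are exponential, the conditional throughputs never change and the loop exits after one pass, reproducing the exact Gordon–Newell throughput.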
Next, we show how to analyze each open queueing system Tk . To do this, we
reintroduce index i to denote subsystem Ri . Step 2.1 of Algorithm 1 above requires
the analysis of the open queueing systems Tki for k ∈ Mi and i = 1, . . ., N .
There are four different types of queueing systems: 1) the synchronization station
ON in subsystem RN , 2) the synchronization stations I i in subsystems Ri , i =
2, . . ., N, 3) the mi stations in each subnetwork Li , i = 1, . . ., N , and 4) the pseudo-
stations Ŝi in subsystems Ri , i = 1, . . ., N − 1.
First, consider the analysis of synchronization station ON in subsystem RN .
ON is a synchronization station fed by a continuous-time Markov arrival process
with state-dependent rates, λON (nON ), 0 ≤ nON ≤ KN , and an external Poisson
process with fixed rate λD . An exact solution for this system is easy to obtain by
solving the underlying continuous-time Markov chain. Namely, the steady-state
probabilities PON (nON ) of having nON customers in subsystem ON can be derived,
and the conditional throughput vON (nON ) can be estimated using (5) (see [11] and
Appendix A).
The synchronization station I i in each subsystem Ri , i = 2, . . ., N , is a syn-
chronization station fed by two continuous-time Markov arrival processes with
state-dependent rates, λiI (niI ), 0 ≤ niI ≤ Ki , and λi (ni ), 0 ≤ ni ≤ Ki−1 . An
exact solution for this system is also easy to obtain by solving the underlying
continuous-time Markov chain (see [14] and Appendix B).
The analysis in isolation of any station k ∈ {1, . . ., mi } in each subnetwork
Li , i = 1, . . ., N , reduces to the analysis of a λik (nik )/Gi /1/N queue. Classical
methods can be used to analyze this queue to obtain the steady-state probabilities
Pki (nik ). For instance, if the service time distribution is Coxian, the algorithms
given in [21] may be used. For multiple-server stations, we can use the numerical

technique presented in [26]. The conditional throughput vki (nik ) can then be derived
from the state probabilities using (5). In the special case where the service time is
exponentially distributed, the conditional throughput vki (nik ) is simply equal to the
load-dependent service rate µik (nik ) [12].
Finally, as was mentioned earlier, pseudo-station Ŝi in subsystem Ri , i =
1, . . ., N − 1, is an aggregate representation of subsystem Ri+1 , which is nested
inside subsystem Ri . Therefore, the conditional throughput of pseudo-station Ŝi ,
vŜi (niŜ ), is set equal to the conditional throughput of subsystem Ri+1 . The condi-
tional throughput of any subsystem Ri , i = 2, . . ., N , is denoted by v i (ni ) and can
be estimated by the following simple expression [3]:

$$
v^i(n^i) =
\begin{cases}
\lambda_I^i(K_i - n^i) & \text{for } 1 \le n^i \le K_i,\\[2pt]
\lambda_I^i(0) & \text{for } K_i \le n^i \le K_{i-1}.
\end{cases}
\tag{8}
$$
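Expression (8) is a simple table lookup; a one-function sketch (names and rates are ours):

```python
def subsystem_throughput(lam_I, Ki, Ki_prev):
    """Conditional throughput v^i(n) for n = 1..K_{i-1} from the arrival
    rates lam_I[m] = lambda_I^i(m) at the input synchronization station,
    following Eq. (8): the rate saturates at lambda_I^i(0) once every
    stage-i kanban is occupied."""
    return [lam_I[Ki - n] if n <= Ki else lam_I[0]
            for n in range(1, Ki_prev + 1)]

# K_i = 2, K_{i-1} = 4, with made-up rates lambda_I^i(0..2):
v = subsystem_throughput([0.9, 0.6, 0.0], Ki=2, Ki_prev=4)   # [0.6, 0.9, 0.9, 0.9]
```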

6 Analysis of the entire echelon kanban control system

In Section 5 we analyzed each subsystem of the decomposition in isolation, given
that the arrival rates of the external arrival processes were known. In this section,
we show how to determine these arrival rates.
Consider again the queueing network of the original system, R, which was de-
composed into N subsystems (see Fig. 3 for N = 3). In each subsystem Ri ,
i = 2, . . ., N , the unknown parameters involved in the decomposition are the ar-
rival rates of raw parts at each upstream synchronization station I i , λi (ni ), 0 ≤
ni ≤ Ki−1 . Recall that pseudo-station Ŝi−1 in subsystem Ri−1 represents sub-
system Ri , i = 2, . . ., N ; therefore, the external arrival process of raw parts at
synchronization station I i in subsystem Ri should be identical to the arrival pro-
cess of parts at pseudo-station Ŝi−1 in subsystem Ri−1 . The latter process was
involved in the analysis of subsystem Ri−1 in isolation and was modeled as a state-
dependent Poisson arrival process with rate λŜi−1 (nŜi−1 ), 0 ≤ nŜi−1 ≤ Ki−1 . As a
result, the following set of equations holds:

$$
\lambda^i(n^i) = \lambda_{\hat{S}}^{i-1}\big(n_{\hat{S}}^{i-1}\big)
\quad \text{for } 0 \le n^i = n_{\hat{S}}^{i-1} \le K_{i-1}
\ \text{and}\ i = 2, \ldots, N.
\tag{9}
$$
Equation (9) implies that the unknown parameters λi (ni ) are the solutions of a
fixed-point problem. To determine these quantities we use an iterative procedure.
This procedure is given by Algorithm 2 below. Algorithm 2 consists of several
forward and backward steps. A forward step from subsystem Ri−1 to Ri uses
new estimates of the arrival rates to the upstream synchronization station I i of
subsystem Ri , λi (ni ), to resolve Ri using Algorithm 1. A backward step from
Ri to Ri−1 solves Ri−1 using Algorithm 1, given that the arrival rates λi (ni )
to the upstream synchronization station I i of each subsystem Rj , j = i, . . ., N ,
have converged. The procedure starts with subsystem RN and moves backwards
until it reaches subsystem R1 . Subsystem RN is analyzed first using Algorithm 1
and current estimates of λN (nN ). This yields the conditional throughput of RN ,
v N (nN ), which is needed to analyze subsystem RN −1 , since it determines the load-
dependent exponential-service rates of pseudo-station ŜN −1 . Subsystem RN −1

is analyzed next using Algorithm 1 and current estimates of λN −1 (nN −1 ). This
yields the conditional throughput of RN −1 , v N −1 (nN −1 ), and the arrival rates to the
pseudo-station ŜN −1 , λŜN −1 (nŜN −1 ). If these arrival rates are not equal to the current
estimates of the arrival rates λN (nN ), then the latter rates have not converged. In this
case, the current estimates of λN (nN ) are updated to λŜN −1 (nŜN −1 ) and subsystem
RN is analyzed again using Algorithm 1 with the new estimates.
arrival rates λN (nN ) have converged and the procedure moves on to the analysis
of subsystem RN −2 using Algorithm 1, where the load-dependent exponential-
service rates of pseudo-station ŜN −2 are set equal to v N −1 (nN −1 ). This procedure
is repeated for subsystems RN −2 ,RN −3 ,. . . , until the first subsystem, R1 , is reached
and all the arrival rates λi (ni ), i = 2, ..., N , have converged. All the performance
parameters of interest can then be derived.
Algorithm 2: Analysis of a multi-stage echelon kanban control system.
Step 0. (Initialization) Set the unknown arrival rates of each subsystem Ri to some
initial values, e.g., λi (ni ) = λD , 0 ≤ ni ≤ Ki−1 , and i = 2, . . ., N .
Step 1. Computation and convergence of the arrival rates, λi (ni ), i = 2, . . ., N .
Set i = N
While i ≥ 1
If i = N
Solve subsystem RN using Algorithm 1 and calculate the throughput
v N (nN ), nN = 1, . . ., KN −1 , from (8).

Set i = i − 1.
Else
Solve subsystem Ri using Algorithm 1 and calculate the arrival rate λiŜ (niŜ ),
niŜ = 0, . . . , Ki , and the throughput v i (ni ), ni = 1, . . ., Ki−1 , from (8).
If λiŜ (ni+1 ) = λi+1 (ni+1 ), ni+1 = 0, . . . , Ki ,
Set i = i − 1
Else
Set λi+1 (ni+1 ) = λiŜ (ni+1 ), ni+1 = 0, . . . , Ki , and set i = i + 1
Endif
Endif
Endwhile
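The forward/backward control flow of Algorithm 2 can be sketched on its own, with the queueing analysis abstracted away; `solve` below is a placeholder standing in for Algorithm 1, and the dummy example at the end is invented purely to exercise the sweep:

```python
def algorithm2_sweep(N, solve, lam, tol=1e-4, max_steps=1000):
    """Control-flow skeleton of Algorithm 2.

    lam[i]        : current arrival-rate estimates for subsystem R^i (i = 2..N)
    solve(i, lam) : stand-in for Algorithm 1; analyzes R^i and returns the
                    arrival rates at its pseudo-station (a new estimate of
                    lam[i+1]), or None for i == N, which has no pseudo-station.
    """
    i = N
    for _ in range(max_steps):
        if i < 1:
            break                     # all subsystems solved and converged
        lam_S = solve(i, lam)
        if i == N or all(abs(a - b) <= tol
                         for a, b in zip(lam_S, lam[i + 1])):
            i -= 1                    # backward step: estimates agree
        else:
            lam[i + 1] = lam_S        # forward step: update and re-solve
            i += 1
    return lam

# A dummy `solve` whose pseudo-station rates are constant shows the sweep
# terminating with mutually consistent estimates:
lam = {2: [0.0], 3: [0.0]}
result = algorithm2_sweep(3, lambda i, l: None if i == 3 else [0.5], lam)
```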
In the case of the saturated echelon kanban control system, we can use the
same algorithm. The only difference is in the analysis of subsystem RN in Al-
gorithm 1, where there is no downstream synchronization station ON . As far as
the convergence properties of Algorithms 1 and 2 are concerned, in all of the nu-
merical examples that we examined (see Sect. 7), both algorithms converged. The
convergence criterion was that the relative difference between the values of every
unknown parameter at two consecutive iterations should be less than 10−4 .
Once Algorithm 2 has converged, all the performance parameters of the system
can be calculated. Indeed, from the analysis of each subsystem Ri using Algo-
rithm 1, it is possible to derive the performance parameters of stage i in the original
network R, especially the throughput and the average length of each queue, includ-
ing the queues of the synchronization stations. Thus, in the case of the saturated

echelon kanban system, we can derive the throughput, the average WIP, the average
number of finished parts, and the average number of free echelon kanbans for each
stage. In the case of the echelon kanban control system with external demands,
some other important performance measures can be derived from the analysis of
subsystem RN , namely, the proportion of backordered demands, pB , the average
number of backordered demands, QD , and the average waiting time of backordered
demands, WB . These performance measures can be derived as follows [11,14]:
$$
p_B = P_O^N(0), \qquad
Q_D = P_O^N(0)\,\frac{1}{\dfrac{\lambda_O^N(0)}{\lambda_D} - 1}, \qquad
W_B = \frac{Q_D}{p_B\,\lambda_D},
$$

where λON (0) is the arrival rate of finished parts at synchronization station ON
when there are no finished parts at that station and PON (0) is the steady-state
probability of having no finished parts at synchronization station ON .
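These three measures are easy to package as a helper. The sketch below assumes the geometric-backorder-tail form Q_D = P_O^N(0) / (λ_O^N(0)/λ_D − 1), which is our reading of the displayed formula; the numbers in the example are invented:

```python
def backorder_measures(P0, lam_O0, lam_D):
    """Backorder measures for the last stage (hypothetical helper).

    P0     : P_O^N(0), probability of no finished parts at O^N
    lam_O0 : lambda_O^N(0), finished-part arrival rate at an empty O^N
    lam_D  : customer demand rate; stability requires lam_D < lam_O0
    """
    p_B = P0                             # proportion of backordered demands
    Q_D = P0 / (lam_O0 / lam_D - 1.0)    # average number of backorders
    W_B = Q_D / (p_B * lam_D)            # average wait of a backordered demand
    return p_B, Q_D, W_B

p_B, Q_D, W_B = backorder_measures(P0=0.2, lam_O0=1.0, lam_D=0.8)
```

As a cross-check, the relation W_B = Q_D /(p_B λ_D) is consistent with configuration 1.19 of Table 3 (Q_D = 4.176, p_B = 48.38%, λ_D = 0.8 gives W_B ≈ 10.79).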

7 Numerical results

In this section, we test the approximation method for the performance evaluation
of the echelon kanban control system that we developed in Sections 4–6 on sev-
eral numerical examples. The approximation method was implemented on an Intel
Celeron PC @ 433 MHz, and its results are compared to simulation results obtained
using the simulation software ARENA on an AMD Athlon PC @ 1400 MHz. For
each simulation experiment we run a single replication. The length of this replica-
tion was set equal to the time needed for the system to produce 68 million parts. The
initial condition of the system at the beginning of the replication was set to a typical
regenerative state, namely the state where all customer demands and demands for
the production of new parts at all stages have been satisfied. This permitted us to
set the warm-up period at the beginning of the replication equal to zero. In all simu-
lation experiments we used 95% confidence intervals. The numerical examples are
organized into Sections 7.1 and 7.2. In Section 7.1, we study the accuracy and ra-
pidity of the approximation method as well as the influence of some key parameters
of the echelon kanban control system on system performance. In Section 7.2, we
use the approximation method to optimize the design parameters (echelon kanbans)
of the system.

7.1 Influence of parameters

In this section, we test the accuracy and rapidity of the approximation method
with two numerical examples in which we vary the number of stages, the number
of kanbans in each stage, and the service-time distributions of the manufacturing
process of each stage. For each example, we consider first the case of the saturated
system and then the case of the system with external demands. In each example,
we compare the performance of the system obtained by the approximation method
to that obtained by simulation. We also compare the performance of the echelon
kanban control system obtained by the approximation method and by simulation to

Table 1. Production capacity of the saturated echelon kanban control system (Example 1)

Simulation Approximation

Configuration Production Confidence Production Relative Iterations


capacity interval capacity error

1.1: N = 3; K = 1 0.581 ±0.1% 0.571 −1.8% 7


1.2: N = 3; K = 3 0.809 ±0.1% 0.804 −0.6% 7
1.3: N = 3; K = 5 0.877 ±0.2% 0.873 −0.5% 7
1.4: N = 3; K = 10 0.934 ±0.5% 0.933 −0.1% 7
1.5: N = 3; K = 15 0.955 ±0.6% 0.954 −0.1% 7
1.6: N = 5; K = 1 0.522 ±0.0009% 0.502 −4% 16
1.7: N = 5; K = 3 0.772 ±0.1% 0.761 −1.4% 16
1.8: N = 5; K = 5 0.850 ±0.1% 0.843 −0.8% 16
1.9: N = 5; K = 10 0.919 ±0.2% 0.916 −0.3% 16
1.10: N = 5; K = 15 0.945 ±0.0009% 0.942 −0.3% 16
1.11: N = 10; K = 1 0.485 ±0.0007% 0.456 −6.4% 56
1.12: N = 10; K = 3 0.745 ±0.5% 0.730 −2.1% 56
1.13: N = 10; K = 5 0.831 ±0.7% 0.820 −1.3% 56
1.14: N = 10; K = 10 0.908 ±0.1% 0.902 −0.7% 56
1.15: N = 10; K = 15 0.937 ±0.1% 0.933 −0.4% 56

the performance of the conventional or installation kanban control system obtained
by a similar approximation method developed in [14] and by simulation.

Example 1. In Example 1, we consider an echelon kanban system composed of
N identical stages, where each stage contains a single machine with exponentially
distributed service times with mean equal to 1. In order to compare the echelon
kanban control system to the conventional kanban control system, we first set the
number of installation kanbans of each stage i in the conventional kanban control
system, say Kic , equal to some constant, K, i.e. Kic = K. Then, we set the number
of echelon kanbans of each stage i in the echelon kanban control system, say Kie ,
equal to the sum of the installation kanbans of stages i, . . ., N , in the conventional
kanban control system, i.e. $K_i^e = \sum_{j=i}^{N} K_j^c = (N + 1 - i)K$.
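The relation between installation and echelon kanban counts is just a downstream cumulative sum, e.g.:

```python
def echelon_kanbans(installation):
    """Echelon counts K_i^e as downstream sums of installation counts
    K_i^c; with K_i^c = K for all i this gives K_i^e = (N + 1 - i) * K."""
    return [sum(installation[i:]) for i in range(len(installation))]

# N = 3 stages with K = 5 installation kanbans each:
counts = echelon_kanbans([5, 5, 5])   # [15, 10, 5]
```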
For the case of the saturated system, the main performance parameter of interest
is the throughput of the system, which determines the production capacity of the
system. Table 1 shows the throughput of the saturated echelon kanban control
system obtained by the approximation method and by simulation, for different
values of N and K. The same table also shows the 95% confidence interval for
the simulation results, the percentage of relative error of the approximation method
with respect to simulation, and the number of iterations of Algorithm 2 that are
needed to reach convergence. Table 2 shows the same results for the conventional
kanban control system obtained in [14].
From the results in Table 1, we note that the number of iterations of Algorithm 2
of the approximation method increases with the number of stages, as is expected.

Table 2. Production capacity of the saturated conventional kanban control system (Exam-
ple 1)

Simulation Approximation

Configuration Production Confidence Production Relative Iterations


capacity interval capacity error

1.1: N = 3; K = 1 0.562 ±0.5% 0.547 −2.7% 2


1.2: N = 3; K = 3 0.800 ±0.7% 0.792 −1.0% 2
1.3: N = 3; K = 5 0.869 ±1.3% 0.865 −0.5% 2
1.4: N = 3; K = 10 0.926 ±0.8% 0.928 +0.2% 2
1.5: N = 3; K = 15 0.952 ±1.2% 0.951 −0.1% 2
1.6: N = 5; K = 1 0.484 ±0.6% 0.449 −7.0% 4
1.7: N = 5; K = 3 0.746 ±0.8% 0.731 −2.0% 4
1.8: N = 5; K = 5 0.833 ±0.8% 0.822 −1.3% 4
1.9: N = 5; K = 10 0.901 ±1.2% 0.904 +0.3% 4
1.10: N = 5; K = 15 0.943 ±1.1% 0.934 −0.9% 4
1.11: N = 10; K = 1 0.429 ±0.5% 0.379 −11.6% 7
1.12: N = 10; K = 3 0.704 ±0.7% 0.680 −3.4% 6
1.13: N = 10; K = 5 0.806 ±0.9% 0.786 −2.6% 5
1.14: N = 10; K = 10 0.855 ±0.5% 0.883 −3.2% 5
1.15: N = 10; K = 15 0.917 ±1.3% 0.919 +0.2% 5

Specifically, for N = 3, 5, and 10, we have 7, 16, and 56 iterations of Algorithm 2,
respectively. As far as the convergence of Algorithm 1 is concerned, we also note
that subsystem RN requires two iterations of Algorithm 1, subsystem R1 requires
one iteration, and all other subsystems require three iterations, irrespectively of the
number of stages N , for all the configurations tested. The simulation time is ex-
tremely long (over two hours) compared to the time required for the approximation
method, which is approximately 1–10 seconds. From Table 1, we see that as the
number of echelon kanbans increases, for a given number of stages N , the through-
put also increases and asymptotically tends to the production rate of each machine
in isolation. Moreover, the throughput seems to be decreasing in the number of
stages. The results obtained by the approximation method are fairly accurate when
compared to the simulation results. The relative error is very small in general except
for the cases where K = 1, where we observe somewhat significant errors. This
happens because when the number of echelon kanbans is small, there are strong
dependence phenomena among stations and these phenomena are not captured well
by the state-dependent, continuous-time, Markov arrival processes assumed in the
decomposition method. Comparing the results between Tables 1 and 2, we note that
the production capacity of the echelon kanban control system is always higher than
that of the conventional kanban control system, given that the two systems have the
same value of K.

For the system with backordered demands, the main performance parameters
of interest are the proportion of backordered demands, pB , the average number of
backordered demands, QD , and the average waiting time of backordered demands,
WB , as defined at the end of Section 6. Table 3 shows these performance parameters
obtained by the approximation method and by simulation, for the configurations of
parameters 1.3, 1.8, and 1.13 of Table 1, i.e. for K = 5, and different values of the
customer demand rate, λD . The same table also shows the 95% confidence interval
for the simulation results and the number of iterations of Algorithm 2 that are
needed to reach convergence. Table 4 shows the same results for the conventional
kanban control system obtained in [14].
From the results in Table 3, we note that as the customer demand arrival rate
increases, the number of iterations of Algorithm 2 also increases, though not dra-
matically. As far as the average number of backordered demands, QD , is concerned,
we note that the analytical method is fairly accurate. This is not true for the average
waiting time of backordered demands, WB , where in some cases the difference
between the approximation method and simulation are significant. Comparing the
results between Tables 3 and 4, we note that the echelon kanban control system
always has a smaller average number of backordered demands, QD , than the con-
ventional kanban control system, given that the two systems have the same value
of K. The difference in the average number of backordered demands is more pro-
nounced when the two systems are highly loaded, i.e. when λD is close to the
production capacity.
Table 5 shows the results for the average number of finished parts (FP) and the
average work-in-process (WIP) at each stage for the configurations of parameters
1.17 and 1.19 in Table 3. Table 6 shows the same results for the conventional kanban
control system.
Comparing the results between Tables 5 and 6, we note that the echelon kanban
control system has slightly higher average WIP and lower FP than the conventional
kanban control system, when the two systems are highly loaded (i.e. λD is close
to Pr ), and given that the two systems have the same value of K. When the two
systems are not highly loaded, the difference in average WIP and FP between the
two systems is very small. Finally, it appears that the difference in average WIP
and FP between the echelon kanban control system and the conventional kanban
control system is higher in upstream stages than in downstream stages.
Although the above observations hold for the particular configurations of pa-
rameters examined, we expect that they should also hold for the other configurations
of Table 1 and different values of the customer demand rate, λD , because to a large
extent they are due to the fact that the echelon kanban control system always re-
sponds faster to customer demands than the conventional kanban control system,
given that the two systems have the same value of K.
Finally, we should note that the approximation method for the performance
evaluation of the conventional kanban control system developed in [14] is also
based on decomposing a system of N stages into N subsystems. The total number
of the unknown parameter sets (the arrival rates of the external arrival processes
to the subsystems) that must be determined for the conventional kanban control
system, however, is twice as big as that which must be determined for the echelon

Table 3. Average number of backordered demands, average waiting time of backordered de-
mands, and proportion of backordered demands for the echelon kanban system (Example 1)

Configuration QD WB pB (%) Iterations

1.16: N = 3; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 6
Simulation 0.0 0.0 0.0
1.17: N = 3; K = 5; λD = 0.5
Approximation 0.035 4.069 1.729 7
Simulation 0.034 (±0.9%) 2.066 (±1.2%) 3.337
1.18: N = 3; K = 5; λD = 0.625
Approximation 0.221 4.594 7.687 7
Simulation 0.213 (±0.1%) 3.014 (±14.2%) 11.32
1.19: N = 3; K = 5; λD = 0.8
Approximation 4.176 10.791 48.38 8
Simulation 4.095 (±3.6%) 9.755 (±7%) 52.47
1.20: N = 5; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 16
Simulation 0.0 0.0 0.0
1.21: N = 5; K = 5; λD = 0.5
Approximation 0.035 4.070 1.71 16
Simulation 0.032 (±0.007%) 3.189 (±0.003%) 2.03
1.22: N = 5; K = 5; λD = 0.8
Approximation 6.774 14.440 58.69 22
Simulation 6.5686 (±0.08%) 12.895 (±0.02%) 63.67
1.23: N = 10; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 20
Simulation 0.0 0.0 0.0
1.24: N = 10; K = 5; λD = 0.5
Approximation 0.035 4.070 1.72 39
Simulation 0.023 (±0.005%) 3.512 (±0.002%) 1.28
1.25: N = 10; K = 5; λD = 0.77
Approximation 3.817 10.709 46.3 61
Simulation 3.131 (±0.003%) 9.064 (±0.001%) 49.3

kanban control system (namely, there are 2(N − 1) external arrival rates for the
conventional kanban control system compared to N − 1 external arrival rates for
the echelon kanban control system). Yet, for both examples examined, the number
of iterations needed for the convergence of the parameters is significantly lower for
the conventional kanban control system than for the echelon kanban control system,
given the same convergence criterion for the two systems, as can be seen from Tables
1–4. This is due to the fact that the coordination of production is decentralized in

Table 4. Average number of backordered demands, average waiting time of backordered


demands, and proportion of backordered demands for the conventional kanban control
system (Example 1)

Configuration QD WB pB (%) Iterations

1.16: N = 3; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 1
Simulation 0.0 0.0 0.0
1.17: N = 3; K = 5; λD = 0.5
Approximation 0.035 2.06 3.4 2
Simulation 0.033 (±30%) 2.16 (±17%) 3.1
1.18: N = 3; K = 5; λD = 0.625
Approximation 0.222 3.00 11.82 3
Simulation 0.230 (±17%) 3.26 (±15%) 11.78
1.19: N = 3; K = 5; λD = 0.8
Approximation 4.56 10.1 56.3 4
Simulation 4.26 (±19%) 10.3 (±13%) 52.1
1.20: N = 5; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 1
Simulation 0.0 0.0 0.0
1.21: N = 5; K = 5; λD = 0.5
Approximation 0.0353 2.07 3.40 2
Simulation 0.038 (±30%) 2.16 (±9%) 3.58
1.22: N = 5; K = 5; λD = 0.8
Approximation 11.26 19.3 73.0 7
Simulation 8.93 (±22%) 17.2 (±15%) 65.2
1.23: N = 10; K = 5; λD = 0.1
Approximation 0.0 0.0 0.0 1
Simulation 0.0 0.0 0.0
1.24: N = 10; K = 5; λD = 0.5
Approximation 0.0353 2.07 3.40 2
Simulation 0.0368 (±30%) 2.18 (±17%) 3.38
1.25: N = 10; K = 5; λD = 0.77
Approximation 6.89 13.9 64.2 11
Simulation 5.95 (±22%) 13.7 (±14%) 56.9

the conventional kanban control system, whereas it is centralized in the echelon


kanban control system. Nonetheless, this does not seem to constitute a noticeable
disadvantage of the approximation method for the echelon kanban control system,
since for all the cases examined, the method converges in a matter of 1–10 seconds.
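The convergence behavior described above is typical of a successive-substitution scheme: the decomposition parameters are updated until successive iterates agree to within a tolerance. A minimal generic sketch of such a loop (the update function, tolerance, and iteration cap are illustrative assumptions, not taken from the paper):

```python
def fixed_point(update, x0, tol=1e-6, max_iter=1000):
    """Iterate x <- update(x) until successive parameter vectors
    differ by less than tol in the sup-norm, as in a decomposition loop."""
    x = x0
    for it in range(1, max_iter + 1):
        x_new = update(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")
```

With a contractive update, such a loop typically stops after a handful of iterations, which is consistent with the small iteration counts reported in Tables 1–4.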
Example 2. In Example 2, we consider an echelon kanban control system con-
sisting of N = 3 identical stages, where each stage contains a single machine with
212 S. Koukoumialos and G. Liberopoulos

Table 5. Average work in progress (WIP) and average number of finished parts (FP) in each
stage for the echelon kanban control system (Example 1)

Configuration Stage 1 Stage 2 Stage 3

WIP FP WIP FP WIP FP

1.17: N = 3; K = 5; λD = 0.5
Simulation 0.988 4.039 0.978 4.022 0.961 4.011
(±0.1%) (±0.09%) (±0.1%) (±0.1%) (±0.1%) (±0.1%)
Approximation 0.999 4.031 0.995 4.005 0.969 4.000
Error +1.1% −0.2% +1.7% −0.4% +0.8% −0.3%
1.19: N = 3; K = 5; λD = 0.8
Simulation 3.363 2.392 3.068 2.018 2.589 1.569
(±0.5%) (±0.3%) (±0.3%) (±0.3%) (±0.3%) (±0.5%)
Approximation 3.479 2.349 3.159 1.902 2.655 1.455
Error +3.3% −1.8% +2.9% −6.1% +2.5% −7.8%

Table 6. Average work in progress (WIP) and average number of finished products (FP) in
each stage for the conventional kanban control system (Example 1)

Configuration Stage 1 Stage 2 Stage 3

WIP FP WIP FP WIP FP

1.17: N = 3; K = 5; λD = 0.5
Simulation 0.94 4.06 0.95 4.02 0.94 4.04
(±3.2%) (±0.7%) (±3.1%) (±0.7%) (±3.2%) (±0.8%)
Approximation 0.97 4.03 0.97 4.01 0.97 4.00
Error +3% −0.7% +2% −0.2% +3% −1%
1.19: N = 3; K = 5; λD = 0.8
Simulation 2.54 2.47 2.52 1.98 2.55 1.58
(±3.0%) (±4.0%) (±3.2%) (±5.0%) (±3.1%) (±6.3%)
Approximation 2.61 2.38 2.58 1.85 2.66 1.40
Error +2.7% −3.6% +2.4% −6.5% +4% −11%

mean service-time equal to 1. The number of echelon kanbans at each stage is


K1 = 15, K2 = 10, and K3 = 5. Our goal is to investigate the influence of the
variability of the service time on the performance of the above system. To this end,
we consider three different service-time distributions: a Coxian-2 distribution with
squared coefficient of variation cv2 = 2.0, an Erlang-2 distribution with cv2 = 0.5,
and an exponential distribution with cv2 = 1.0. Table 7 shows the production capac-
ity for the saturated echelon kanban control system obtained by the approximation
method and by simulation, for the three different distributions. Table 8 shows the
same results for the conventional kanban control system obtained in [14].
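The Coxian-2 service-time distribution with cv² = 2 used in Example 2 can be obtained from a standard two-moment fit (valid for cv² ≥ 0.5). The recipe below is a common textbook fit, not necessarily the parameterization used in the paper:

```python
def coxian2_fit(mean, scv):
    """Two-moment Coxian-2 fit (assumes scv >= 0.5): returns phase rates
    mu1, mu2 and the probability p of continuing to the second phase."""
    mu1 = 2.0 / mean
    mu2 = 1.0 / (mean * scv)
    p = 1.0 / (2.0 * scv)
    return mu1, mu2, p

def coxian2_moments(mu1, mu2, p):
    """Mean and squared coefficient of variation of the fitted Coxian-2
    (T = X1 + B*X2 with X1 ~ Exp(mu1), X2 ~ Exp(mu2), B ~ Bernoulli(p))."""
    m1 = 1.0 / mu1 + p / mu2
    m2 = 2.0 / mu1 ** 2 + 2.0 * p / (mu1 * mu2) + 2.0 * p / mu2 ** 2
    return m1, (m2 - m1 ** 2) / m1 ** 2
```

For mean 1 and cv² = 2 this gives μ1 = 2, μ2 = 0.5, p = 0.25, and the recovered moments match the targets exactly.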

Table 7. Production capacity of the echelon kanban control system (Example 2)

Configuration | Simulation: production capacity, confidence interval | Approximation: production capacity, relative error, iterations

2.1: N = 3; K = 5; cv2 = 0.5 0.929 ±0.1% 0.934 +0.5% 11


2.2: N = 3; K = 5; cv2 = 1 0.876 ±0.2% 0.873 −0.3% 7
2.3: N = 3; K = 5; cv2 = 2 0.813 ±0.3% 0.808 −0.6% 13

Table 8. Production capacity of the conventional kanban control system (Example 2)

Configuration | Simulation: production capacity, confidence interval | Approximation: production capacity, relative error, iterations

2.1: N = 3; K = 5; cv2 = 0.5 0.926 ±0.2% 0.932 +0.6% 2


2.2: N = 3; K = 5; cv2 = 1 0.870 ±0.1% 0.865 −0.6% 2
2.3: N = 3; K = 5; cv2 = 2 0.787 ±0.5% 0.786 −0.2% 2

From the results in Table 7, we note that when the variability of the service
time distribution increases, the production capacity decreases, as is expected. The
results obtained by the approximation method are fairly accurate when compared to
the simulation results. Comparing the results between Tables 7 and 8, we note that
for all the service-time distributions, the production capacity of the echelon kanban
control system is higher than that of the conventional kanban control system. The
results for the analytical solution and simulation for the case of the echelon kanban
system with backordered demands are shown in Figure 4. More specifically, Figure 4
depicts the proportion of backordered demands, pB , as a function of the arrival rate
of demands, λD , for the three different service time distributions. It appears that as
the cv2 of the service time distribution increases, the difference between simulation
and analytical results tends to increase.

7.2 Optimization of parameters

The main purpose of developing an approximation method for the performance


evaluation of the echelon kanban control system is to use it to optimize the design
parameters of the system. The design parameters of the echelon kanban control
system are the number of echelon kanbans for each stage. In order to optimize
these parameters, we must define a performance measure of the system. Typical
performance measures are those that include the cost of not being able to satisfy the
demands on time (i.e. quality of service) and the cost of producing parts ahead of

Fig. 4. Proportion of backordered demands versus the average arrival rate of demands for
different values of the service-time squared coefficient of variation (Example 2)

time and, therefore, building up inventory (inventory holding cost). In this paper, we
consider an optimization problem where the objective is to meet a certain quality
of service constraint with minimum inventory holding cost.
We examine two quality-of-service measures as in [15]. The first measure is
the probability that when a customer demand arrives, it is backordered. The second
measure is the probability that when a customer demand arrives, it sees more than
n waiting demands, excluding itself. The first measure is denoted by Prupt and con-
cerns the situation where the demands must be immediately satisfied. The second
measure is denoted by P (Q > n) and concerns the situation where we have the
prerogative to introduce a delay in filling orders, which is equivalent to authorizing
demands to wait. Specifically, Prupt is the marginal stationary probability of having
no finished parts in the last synchronization station, which is given by equation (18)
in Appendix A. Similarly, P (Q > n) is the stationary probability of having more
than n customers waiting and can be computed from the following expression:

P(Q > n) = \sum_{x=n+1}^{\infty} P(Q = x) = 1 - \sum_{y=0}^{n} P(Q = y), \qquad (10)

where P (Q = n) is given by (see Appendix A):


P(Q = n) = p_O^N(0, n) = p_O^N(0, 0) \left( \frac{\lambda_D}{\lambda_O^N(0)} \right)^{n}. \qquad (11)

The stationary distribution p_O^N(0, 0) that is needed to evaluate both Prupt and
P(Q > n) is given by the following expression:

p_O^N(0, 0) = \left[ \frac{1}{1 - \lambda_D/\lambda_O^N(0)} + \sum_{x=1}^{K_N} \frac{1}{\lambda_D^{x}} \prod_{i=0}^{x-1} \lambda_O^N(i) \right]^{-1}. \qquad (12)
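Equations (10)–(12) reduce to simple geometric-series computations once the conditional arrival rates λ_O^N(i) are known. A numerical sketch (the rate vector lam_O is a hypothetical stand-in for the rates produced by the decomposition method):

```python
def qos_measures(lam_D, lam_O, K_N, n):
    """Evaluate p_O^N(0,0), Prupt and P(Q > n) per Eqs. (10)-(12) and (18).
    lam_O[i] plays the role of lambda_O^N(i); requires lam_D < lam_O[0]."""
    geo = 1.0 / (1.0 - lam_D / lam_O[0])   # total mass of states (0, n_D)
    prod, tail = 1.0, 0.0
    for x in range(1, K_N + 1):            # states (x, 0), x = 1..K_N
        prod *= lam_O[x - 1] / lam_D       # (1/lam_D^x) * prod_{i<x} lam_O(i)
        tail += prod
    p00 = 1.0 / (geo + tail)               # Eq. (12)
    p_rupt = p00 * geo                     # Eq. (18): P_O^N(0)
    rho = lam_D / lam_O[0]
    p_q_gt_n = p00 * rho ** (n + 1) / (1.0 - rho)   # Eqs. (10)-(11)
    return p00, p_rupt, p_q_gt_n
```

For constant rates λ_O^N(i) = 1 with λ_D = 0.5 and K_N = 5, the aggregate state probabilities sum to one, as required.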

The cost function that we want to minimize is the long-run, expected, average cost
of holding inventory,

C_{total} = \sum_{i=1}^{N} h_i \, E[WIP_i + FP_i], \qquad (13)

where hi is the unit cost of holding WIPi + FPi inventory per unit time in stage i.
In the remainder of this section, we optimize the echelon kanbans of an echelon
kanban control system consisting of N = 5 stages, where each stage contains a
single machine with exponentially distributed service times with mean equal to
1, for different combinations of inventory holding cost rates, hi , i = 1, . . ., 5,
and demand arrival rate λD = 0.5. In all cases we assume that there is value
added to the parts at every stage so that the inventory holding cost increases as
the stage increases, i.e. h1 < h2 < . . . < h5 . If this were not the case, i.e. if
h1 = h2 = . . . = h5 , then clearly it would make no sense to block the passage of
parts from one stage to another via the use of echelon kanbans, because this would
not lower the inventory holding cost but would worsen the quality of service. This
implies that if h1 = h2 = . . . = h5 , the optimal echelon kanbans satisfy K1 ≤ Ki ,
i = 2, . . ., 5, in which case the echelon kanban control system is equivalent to the
make-to-stock CONWIP system [23] with a WIP-cap on the total number of parts
in the system equal to K1 .
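Given the performance-evaluation routine, the optimization described here can be carried out by enumerating kanban vectors and keeping the cheapest one that meets the quality-of-service constraint. A brute-force sketch (the evaluate callback, the bound K_max, and the toy cost structure are illustrative assumptions; the paper does not specify the search procedure):

```python
from itertools import product

def optimize_kanbans(evaluate, h, K_max, qos_ok):
    """Search (K_1,...,K_N) over {1..K_max}^N; evaluate(K) is assumed to
    return (expected inventories E[WIP_i + FP_i], quality-of-service value)."""
    best = None
    for K in product(range(1, K_max + 1), repeat=len(h)):
        inv, qos = evaluate(K)
        if not qos_ok(qos):
            continue                                     # constraint violated
        cost = sum(hi * ei for hi, ei in zip(h, inv))    # cost as in Eq. (13)
        if best is None or cost < best[1]:
            best = (K, cost)
    return best
```

In practice, the monotonicity of the quality of service in the K_i's can be exploited to prune this enumeration.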
Table 9 shows the optimal design parameters (K1 , . . ., K5 ) and associated min-
imum, long-run, expected, average cost of holding inventory, for λD = 0.5 and
different quality of service constraints and inventory holding cost rates h1 , . . ., h5 ,
where h1 < h2 < . . . < h5 . The quality of service constraints that we use are
Prupt ≤ 0.02 and P (Q > n) ≤ 0.02, for n = 2, 5, 10.
From the results in Table 9, we see that the higher the number of backordered
demands n in the quality of service definition, P (Q > n), the lower the optimal
number of echelon kanbans, and hence the inventory holding cost. As the difference
between the holding cost rates hi , i = 1, . . ., 5, increases, the difference between the
optimal values of Ki , i = 1, . . ., 5, also increases, since the behavior of the echelon
kanban control system diverges further from that of the make-to-stock CONWIP
system. When the relative difference between the holding cost rates hi , i = 1, . . ., 5,
is low, the behavior of the echelon kanban control system tends to that of the make-
to-stock CONWIP system.
Table 10 shows the optimal design parameter K1 and associated minimum
inventory holding cost for λD = 0.5 and different quality of service constraints and
inventory holding cost rates h1 , . . ., h5 , for the make-to-stock CONWIP system.
The last column of Table 10 shows the relative increase in cost of the optimal
make-to-stock CONWIP system compared to the optimal echelon kanban control
system. Comparing the results between Tables 9 and 10, we note that the optimal
make-to-stock CONWIP system performs considerably worse than the optimal
echelon kanban control system, particularly when the relative difference between
the holding cost rates hi , i = 1, . . ., 5, is high and/or the number of backordered
demands n in the quality of service definition, P (Q > n), is high, indicating that
the quality of service is low.

Table 9. Optimal configuration and associated costs for λD = 0.5 and different values of
h1 , . . . , h5 , for the echelon kanban control system

Design criterion K1 K2 K3 K4 K5 Cost


h1 = 1, h2 = 2, h3 = 3, h4 = 4, h5 = 5
Prupt ≤ 0.02 15 13 12 10 8 55.885
P (Q > 2) ≤ 0.02 13 11 10 8 7 46.555
P (Q > 5) ≤ 0.02 10 8 7 6 2 31.120
P (Q > 10) ≤ 0.02 7 6 5 3 1 20.253
h1 = 3, h2 = 8, h3 = 9, h4 = 10, h5 = 12
Prupt ≤ 0.02 15 13 12 10 8 144.314
P (Q > 2) ≤ 0.02 13 11 10 9 6 121.161
P (Q > 5) ≤ 0.02 10 8 7 6 2 84.074
P (Q > 10) ≤ 0.02 7 6 5 3 1 57.360
h1 = 1, h2 = 2, h3 = 4, h4 = 11, h5 = 12
Prupt ≤ 0.02 15 14 13 9 8 121.288
P (Q > 2) ≤ 0.02 14 13 10 7 6 98.890
P (Q > 5) ≤ 0.02 10 9 8 5 2 67.383
P (Q > 10) ≤ 0.02 8 6 4 3 1 39.483
h1 = 1, h2 = 6, h3 = 11, h4 = 16, h5 = 21
Prupt ≤ 0.02 17 13 11 10 8 218.702
P (Q > 2) ≤ 0.02 15 11 10 8 5 178.162
P (Q > 5) ≤ 0.02 10 8 7 6 2 115.601
P (Q > 10) ≤ 0.02 8 6 5 3 1 76.523
h1 = 1, h2 = 11, h3 = 21, h4 = 31, h5 = 41
Prupt ≤ 0.02 17 13 11 10 8 420.405
P (Q > 2) ≤ 0.02 15 11 10 8 5 341.324
P (Q > 5) ≤ 0.02 10 8 7 6 2 221.203
P (Q > 10) ≤ 0.02 8 6 5 3 1 145.047
h1 = 1, h2 = 2, h3 = 4, h4 = 8, h5 = 16
Prupt ≤ 0.02 17 15 12 9 7 143.879
P (Q > 2) ≤ 0.02 14 13 11 7 5 112.442
P (Q > 5) ≤ 0.02 10 8 7 6 2 65.843
P (Q > 10) ≤ 0.02 8 6 5 3 1 39.934
h1 = 1, h2 = 3, h3 = 9, h4 = 27, h5 = 81
Prupt ≤ 0.02 19 17 14 10 6 633.178
P (Q > 2) ≤ 0.02 17 15 12 8 4 471.867
P (Q > 5) ≤ 0.02 12 10 8 6 1 231.446
P (Q > 10) ≤ 0.02 8 6 5 3 1 139.066

Table 10. Optimal configuration and associated costs for λD = 0.5 and different values of
h1 , . . . , h5 , for the CONWIP system

Design criterion K1 Cost Relative cost increase

h1 = 1, h2 = 6, h3 = 11, h4 = 16, h5 = 21
Prupt ≤ 0.02 14 244.163 10.43%
P (Q > 2) ≤ 0.02 12 202.415 11.98%
P (Q > 5) ≤ 0.02 10 161.006 28.2%
P (Q > 10) ≤ 0.02 8 120.307 36.39%
h1 = 1, h2 = 11, h3 = 21, h4 = 31, h5 = 41
Prupt ≤ 0.02 14 474.326 11.37%
P (Q > 2) ≤ 0.02 12 392.830 13.11%
P (Q > 5) ≤ 0.02 10 312.012 29.1%
P (Q > 10) ≤ 0.02 8 232.613 37.64%
h1 = 1, h2 = 2, h3 = 4, h4 = 8, h5 = 16
Prupt ≤ 0.02 14 175.160 17.86%
P (Q > 2) ≤ 0.02 12 143.407 21.59%
P (Q > 5) ≤ 0.02 10 111.986 41.2%
P (Q > 10) ≤ 0.02 8 81.260 50.86%
h1 = 1, h2 = 3, h3 = 9, h4 = 27, h5 = 81
Prupt ≤ 0.02 14 850.927 25.59%
P (Q > 2) ≤ 0.02 12 690.358 31.65%
P (Q > 5) ≤ 0.02 10 531.715 56.47%
P (Q > 10) ≤ 0.02 8 377.102 63.12%

8 Conclusions

We developed an analytical, decomposition-based approximation method for the


performance evaluation of the echelon kanban control system and tested it on several
numerical examples. The numerical examples showed that the method is quite
accurate in most cases. They also showed that the echelon kanban control system
has some advantages over the conventional kanban control system. Specifically,
when the two systems have the same value of K, the echelon kanban control system
has higher production capacity, lower average number of backordered demands, but
only slightly higher average WIP and either slightly higher or slightly lower FP than
the conventional kanban control system. The numerical results also showed that as
the variability of the service time distribution increases, the production capacity of
the echelon kanban control system and the accuracy of the approximation method
decrease. Finally, we know that the optimized echelon kanban control system always
performs at least as well as the optimized make-to-stock CONWIP system since the
latter system is a special case of the first system. The numerical results showed that
in fact the superiority in performance of the echelon kanban control system over
that of the make-to-stock CONWIP system can be quite significant, particularly
when the relative increase in inventory holding costs from one stage to the next
downstream stage is high and/or the quality of service is low.

Appendix A – Analysis of synchronization station ON

O^N is a synchronization station fed by a continuous-time Markov arrival process
with state-dependent arrival rate λ_O^N(n_O^N), 0 ≤ n_O^N < K_N, and an external Poisson
process with rate λ_D. The underlying continuous-time Markov chain is shown in
Figure 5. The state of this Markov chain is (n_O^N, n_D), where n_O^N is the number of
engaged kanbans and n_D, n_D ≥ 0, is the number of external resources (customer
demands) currently present in subsystem O^N. Let p_O^N(n_O^N, n_D) be the steady-state
probabilities of the Markov chain. These probabilities are the solution of the following
balance equations:
balance equations:

Fig. 5. Continuous-time Markov chain describing the state (n_O^N, n_D) of synchronization station O^N

p_O^N(n_O^N, 0) \, \lambda_D = p_O^N(n_O^N - 1, 0) \, \lambda_O^N(n_O^N - 1) \quad \text{for } n_O^N = 1, \ldots, K_N \qquad (14)

p_O^N(0, n_D) \, \lambda_O^N(0) = p_O^N(0, n_D - 1) \, \lambda_D \quad \text{for } n_D > 0 \qquad (15)
The marginal probabilities P_O^N(n_O^N) are then simply given by

P_O^N(n_O^N) = p_O^N(n_O^N, 0) \quad \text{for } n_O^N = 1, \ldots, K_N, \qquad (16)

P_O^N(0) = \sum_{n_D=0}^{\infty} p_O^N(0, n_D). \qquad (17)

From (15) and (17) we get

P_O^N(0) = p_O^N(0, 0) \sum_{n_D=0}^{\infty} \left( \frac{\lambda_D}{\lambda_O^N(0)} \right)^{n_D} = p_O^N(0, 0) \, \frac{1}{1 - \lambda_D/\lambda_O^N(0)}. \qquad (18)

The conditional throughputs of subsystem O^N are obtained from (5), (14) and (16),
as follows:

v_O^N(n_O^N) = \lambda_D \quad \text{for } n_O^N = 2, \ldots, K_N \qquad (19)

From (5), (14), (16) and (18), we also get

v_O^N(1) = \frac{1}{\dfrac{1}{\lambda_D} - \dfrac{1}{\lambda_O^N(0)}}. \qquad (20)

Appendix B – Analysis of synchronization station Ii

I^i, i = 2, ..., N, is a synchronization station fed by two continuous-time Markov
arrival processes with state-dependent arrival rates: λ_I^i(n_I^i), 0 ≤ n_I^i ≤ K_i, and
λ^i(n^i), 0 ≤ n^i ≤ K_{i-1}. The underlying continuous-time Markov chain is shown
in Figure 6. The state of this Markov chain is (n_I^i, n_u^i), where n_I^i is the number
of free kanbans and n_u^i is the number of external resources (finished parts of stage
i − 1) currently present in subsystem I^i. Recall that n^i can be obtained from n_u^i and
n_I^i using (3). The steady-state probabilities p_I^i(n_I^i, n_u^i) can be derived as solutions
of the underlying balance equations and are given by:
p_I^i(n_I^i, 0) = \left[ \prod_{n=1}^{n_I^i} \frac{\lambda_I^i(n-1)}{\lambda^i(K_i - n)} \right] p_I^i(0, 0), \qquad (21)

p_I^i(0, n_u^i) = \frac{\prod_{n=1}^{n_u^i} \lambda^i(K_i + n - 1)}{\left( \lambda_I^i(0) \right)^{n_u^i}} \, p_I^i(0, 0). \qquad (22)
The marginal probabilities, P_I^i(n_I^i), can then be derived by summing up the probabilities above as follows:

P_I^i(n_I^i) = \left[ \prod_{n=1}^{n_I^i} \frac{\lambda_I^i(n-1)}{\lambda^i(K_i - n)} \right] p_I^i(0, 0) \quad \text{for } n_I^i = 1, \ldots, K_i, \qquad (23)

P_I^i(0) = \left[ 1 + \sum_{n_u^i = 1}^{K_{i-1} - K_i} \frac{\prod_{n=1}^{n_u^i} \lambda^i(K_i + n - 1)}{\left( \lambda_I^i(0) \right)^{n_u^i}} \right] p_I^i(0, 0). \qquad (24)

The estimation of the conditional throughputs of subsystem I^i can then be obtained
by substituting the above probabilities into (5), as follows:

v_I^i(n_I^i) = \lambda^i(K_i - n_I^i) \quad \text{for } n_I^i = 2, \ldots, K_i, \qquad (25)

Fig. 6. Continuous-time Markov chain describing the state (n_I^i, n_u^i) of synchronization station I^i

v_I^i(1) = \lambda^i(K_i - 1) \left[ 1 + \sum_{n_u^i = 1}^{K_{i-1} - K_i} \frac{\prod_{n=1}^{n_u^i} \lambda^i(K_i + n - 1)}{\left( \lambda_I^i(0) \right)^{n_u^i}} \right]. \qquad (26)
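The marginal probabilities (23)–(24) can be evaluated directly by accumulating the products. A numerical sketch (the rate vectors lam_I and lam are hypothetical inputs standing for λ_I^i(·) and λ^i(·)):

```python
def station_I_marginals(lam_I, lam, K_i, K_im1):
    """Weights w[n_I] proportional to P_I^i(n_I) per Eqs. (23)-(24), then
    normalized; lam_I[n] ~ lambda_I^i(n), lam[n] ~ lambda^i(n)."""
    w = [0.0] * (K_i + 1)
    w[0] = 1.0
    for nu in range(1, K_im1 - K_i + 1):   # states (0, n_u), Eq. (24)
        num = 1.0
        for n in range(1, nu + 1):
            num *= lam[K_i + n - 1]
        w[0] += num / lam_I[0] ** nu
    prod = 1.0
    for nI in range(1, K_i + 1):           # states (n_I, 0), Eq. (23)
        prod *= lam_I[nI - 1] / lam[K_i - nI]
        w[nI] = prod
    total = sum(w)                          # normalization over all states
    return [x / total for x in w]
```

Normalizing the weights gives a proper distribution, from which the conditional throughputs (25)–(26) follow.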

Appendix C – Table of notation

N Number of stages
Ki Number of echelon kanbans of stage i
Li Subnetwork associated with the manufacturing process of stage i
mi Number of stations of subnetwork Li
Ji Synchronization station at the output of stage i
λD Average arrival rate of external customer demands in the unsaturated
system
Pr Maximum rate at which customer demands can be satisfied
R Queueing network of the echelon kanban control system
Ri Subsystem associated with stage i
Ii Upstream synchronization station of subsystem Ri
ON Downstream synchronization station of subsystem RN
Ŝi Downstream single-server pseudo-station of subsystem Ri
ni State of subsystem Ri
λi (ni ) State-dependent arrival rate of stage-i raw parts at the upstream synchro-
nization station I i of subsystem Ri
v i (ni ) Conditional throughput of subsystem Ri
k ∈ Mi Index denoting the stations within subsystem Ri , where M1 =
{1, . . ., m1 , Ŝ}, Mi = {I, 1, . . ., mi , Ŝ} for i = 2, . . ., N − 1, and
MN = {I, 1, . . ., mN , O}
nik State of station k in subsystem Ri
µik (nik ) Load-dependent service rate of station k in subsystem Ri
µk (nk ) Same as µik (nik ) with index i dropped
Tki Open system representing station k in subsystem Ri
Tk Same as Tki with index i dropped
λik (nik ) Rate of state-dependent Poisson arrival process at Tki
λk (nk ) Same as λik (nik ) with index i dropped
vki (nik ) Conditional throughput of Tki
vk (nk ) Same as vki (nik ) with index i dropped
Pki (nik ) Steady-state probability of Tki
pB Proportion of backordered demands
QD Average number of backordered demands
WB Average waiting time of backordered demands

References

1. Baskett F, Chandy KM, Muntz RR, Palacios-Gomez F (1975) Open, closed and mixed
networks of queues with different classes of customers. Journal of ACM 22: 248–260
2. Baynat B, Dallery Y (1993) A unified view of product-form approximation techniques
for general closed queueing networks. Performance Evaluation 18(3): 205–224
3. Baynat B, Dallery Y (1993) Approximate techniques for general closed queueing net-
works with subnetworks having population constraints. European Journal of Opera-
tional Research 69: 250–264
4. Baynat B, Dallery Y (1996) A product-form approximation method for general closed
queueing networks with several classes of customers. Performance Evaluation 24: 165–
188
5. Baynat B, Dallery Y, Ross K (1994) A decomposition approximation method for mul-
ticlass BCMP queueing networks with multiple-server stations. Annals of Operations
Research 48: 273–294
6. Bruell SC, Balbo G (1980) Computational algorithms for closed queueing networks.
Elsevier North-Holland, Amsterdam
7. Buzacott JA (1989) Queueing models of kanban and MRP controlled production sys-
tems. Engineering Costs and Production Economics 17: 3–20
8. Buzacott JA, Shanthikumar JG (1993) Stochastic models of manufacturing systems.
Prentice-Hall, Englewood Cliffs, NJ
9. Buzen JP (1973) Computational algorithms for closed queueing networks with expo-
nential servers. Comm. ACM 16(9): 527–531
10. Clark A, Scarf H (1960) Optimal policies for a multi-echelon inventory problem. Man-
agement Science 6: 475–490
11. Dallery Y (1990) Approximate analysis of general open queueing networks with re-
stricted capacity. Performance Evaluation 11(3): 209–222
12. Dallery Y, Cao X (1992) Operational analysis of stochastic closed queueing networks.
Performance Evaluation 14(1): 43–61
13. Dallery Y, Liberopoulos G (2000) Extended kanban control system: combining kanban
and base stock. IIE Transactions 32(4): 369–386
14. Di Mascolo M, Frein Y, Dallery Y (1996) An analytical method for performance eval-
uation of kanban controlled production systems. Operations Research 44(1): 50–64
15. Duri C, Frein Y, Di Mascolo M (2000) Comparison among three pull control policies:
kanban, base stock and generalized kanban. Annals of Operations Research 93: 41–69
16. Frein Y, Di Mascolo M, Dallery Y (1995) On the design of generalized kanban control
systems. International Journal of Operations and Production Management 15(9): 158–
184
17. Gordon WJ, Newell GF (1967) Closed queueing networks with exponential servers.
Operations Research 15: 252–267
18. Jackson JR (1963) Jobshop-like queueing systems. Management Science 10(1): 131–
142
19. Liberopoulos G, Dallery Y (2002) Comparative modeling of multi-stage production-
inventory control policies with lot sizing. International Journal of Production Research
41(6): 1273–1298
20. Marie R (1979) An approximate analytical method for general queueing networks.
IEEE Transactions on Software Engineering 5(5): 530–538
21. Marie R (1980) Calculating equilibrium probabilities for λ(n)/Ck /1/N queues. Perfor-
mance Evaluation Review 9: 117–125
22. Reiser M, Lavenberg SS (1980) Mean value analysis of closed multichain queueing
networks. Journal of ACM 27(2): 313–322

23. Schweitzer PJ (1979) Approximate analysis of multiclass closed networks of queues.


Proceedings of the International Conference on Stochastic Control and Optimization,
Amsterdam
24. Spanjers L, van Ommeren JCW, Zijm WHM (2005) Closed loop two-echelon reparable
item systems. OR Spectrum 27(2–3): 369–398
25. Spearman ML, Woodruff DL, Hopp WJ (1990) CONWIP: a pull alternative to kanban.
International Journal of Production Research 28: 879–894
26. Stewart WJ, Marie R (1980) A numerical solution for the λ(n)/Ck /r/N queue. European
Journal of Operational Research 5: 56–68
27. Whitt W (1983) The queueing network analyser. Bell Systems Technology Journal
62(9): 2779–2815
Closed loop two-echelon repairable item systems
L. Spanjers, J.C.W. van Ommeren, and W.H.M. Zijm
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente,
P.O. Box 217, 7500 AE Enschede, The Netherlands
(e-mail: [email protected])

Abstract. In this paper we consider closed loop two-echelon repairable item sys-
tems with repair facilities both at a number of local service centers (called bases)
and at a central location (the depot). The goal of the system is to maintain a number
of production facilities (one at each base) in optimal operational condition. Each
production facility consists of a number of identical machines which may fail inci-
dentally. Each repair facility may be considered to be a multi-server station, while
any transport from the depot to the bases is modeled as an ample server. At all
bases as well as at the depot, ready-for-use spare parts (machines) are kept in stock.
Once a machine in the production cell of a certain base fails, it is replaced by a
ready-for-use machine from that base’s stock, if available. The failed machine is
either repaired at the base or repaired at the central repair facility. In the case of local
repair, the machine is added to the local spare parts stock as a ready-for-use ma-
chine after repair. If a repair at the depot is needed, the base orders a machine from
the central spare parts stock to replenish its local stock, while the failed machine
is added to the central stock after repair. Orders are satisfied on a first-come-first-
served basis while any requirement that cannot be satisfied immediately either at
the bases or at the depot is backlogged. In case of a backlog at a certain base, that
base’s production cell performs worse.
To determine the steady state probabilities of the system, we develop a slightly
aggregated system model and propose a special near-product-form solution that
provides excellent approximations of relevant performance measures. The depot
repair shop is modeled as a server with state-dependent service rates, of which the
parameters follow from an application of Norton’s theorem for Closed Queuing
Networks. A special adaptation to a general Multi-Class Marginal Distribution
Analysis (MDA) algorithm is proposed, on which the approximations are based.
All relevant performance measures can be calculated with errors which are generally

Correspondence to: W.H.M. Zijm


224 L. Spanjers et al.

less than one percent, when compared to simulation results. The approximations
are used to find the stock levels which maximize the availability given a fixed
configuration of machines and servers and a certain budget for storing items.

Keywords: Multi-echelon systems – Repairable items – Spare parts inventory –


Closed queueing networks – Near-product form solutions

1 Introduction

Repairable inventory theory involves designing inventory systems for items which
are repaired and returned to use rather than discarded. The items are less expensive
to repair than to replace. Such items can for example be found in the military, avi-
ation, copying machines, transportation equipment and electronics. The repairable
inventory problem is typically concerned with the optimal stocking of parts at bases
and a central depot facility which repairs failed units returned from bases while pro-
viding some predetermined level of service. Different performance measures may
be used, such as cost, backorders and availability.
Over the past 30 years there has been considerable interest in multi-echelon
inventory theory. Much of this work originates from a model called METRIC, which
was first reported in the literature by Sherbrooke [9]. The model was developed for
the US Air Force at the Rand Corporation for a multi-echelon repairable-item
inventory system. In this model an item at failure is replaced by a spare if one is
available. If none are available a spare is backordered. Of the failed items a certain
proportion is repaired at the base and the rest at a repair depot, thereby creating
a two-echelon repairable-item system. Items are returned from the depot using a
one-for-one reordering policy. The METRIC model determines the optimal level
of spares to be maintained at each of the bases and at the depot.
A shortfall of the METRIC model is that it assumes that failures are Poisson
from an infinite source and that the repair capacity is unlimited. Therefore, others
have continued the research to gain results more useful for real life applications.
Gross, Kioussis and Miller [5], Albright and Soni [1] and Albright [2] focused their
attention on closed queuing network models, thereby dropping the assumption of
Poisson failures from an infinite source. The intensity by which machines enter the
repair shops depends on the number of machines operating in the production cell.
In case of a backlog at a base, this intensity is therefore smaller than in the optimal
case where the maximum number of machines is operating in the production cell.
Also the assumption of unlimited repair capacity is dropped in Gross et al. [5] and
Albright [2].
This paper deals with similar models. It handles closed queuing network mod-
els with limited repair. However, the approximation method differs considerably.
The approximation method builds on the method by Avsar and Zijm [3]. Avsar and
Zijm considered an open queuing network model with limited repair. By a small
aggregation step, the system is changed into a system with a special near-product-
form solution that provides an approximation for the steady state distribution. From
the steady state distribution all relevant performance measures can be computed.

We will perform a similar aggregation step in this paper and again a special near-
product-form solution will be obtained. However, as opposed to open systems, in a
system with finite sources, the demand rates to the depot also become state depen-
dent; moreover, these demand rates are clearly influenced by the efficiency of the
base repair stations. Nevertheless, we are able to develop relatively simple approxi-
mation algorithms to obtain the relevant performance measures. These performance
measures can ultimately be used within an optimization model to determine such
quantities as the optimal repair capacities and the optimal inventory levels.
The organization of this paper is as follows: In the next section we consider
a very simple two-echelon system, consisting of one base, a base repair shop and
a central repair shop. The repair shops are modeled as single servers. This model
mainly serves to explain the essential elements of the aggregation step. We present
the modified system with near-product-form solution and numerical results to show
the accuracy of the approximation. Next, in Section 3, we turn to more general re-
pairable item network structures, containing multiple bases and transport lines from
the depot to the bases. The repair shops are modeled as multi-servers. The approx-
imation method leading to an adapted Multi-Class MDA algorithm is presented
and some numerical results are discussed. In Section 4, an optimization algorithm
based on this approximation method, is given which finds the stock levels that
maximize the (weighted) availability under a given cost constraint. In the last section,
we summarize our results and discuss a number of extensions that are currently
being investigated.

2 Analysis of a simple two-echelon system with single server facilities

In this section a simplified repairable item system is discussed to explain how a


slight modification turns this system into a near-product form network that can be
analyzed completely. In the next section we turn to more complex systems.

2.1 The single base model without transportation

Consider the system as depicted in Figure 1. The system consists of a single base
and a depot. At the base a maximum of J1 machines can be operational in the
production cell.
Operational machines fail at exponential rate λ1 and are replaced by a machine
from the base stock (if available). Both at the base and at the depot there is a repair
shop. Failed machines are base-repairable with probability p1 and consequently
depot-repairable with probability 1 − p1 . The repair shops are modeled as single
servers with exponential service rate µ0 for the depot and exponential service rate
µ1 for the base. In addition to the J1 machines another group of S1 machines is
dedicated to the base to act as spares. When a machine fails, the failed machine goes
to a repair shop while at the same time a request is sent to place a spare machine
from the base stock in the production cell. This request is carried out immediately,
if possible. In case no spare machines are at the base, a backlog occurs. As soon
as there is a repaired machine available, it becomes operational. A number of S0
226 L. Spanjers et al.

Fig. 1. The single base repairable item system

machines is dedicated to the depot to act as spares. When a failed machine cannot
be repaired at the base and hence is sent to the depot, a spare machine is shipped
from the depot to the base to replenish the base stock, or - in case of a backlog
- to become operational immediately. When no spares are available at the depot,
a backorder is created. In that case, as soon as a machine is repaired at the depot
repair shop, it is sent to the base. In this simple model, transport times from the
base to the depot and vice versa are not taken into account.
In Figure 1 (and subsequent figures), requests are indicated by dotted lines. The
matching of a request and a ready-for-use machine is modeled as a synchronization
queue, both at the base and at the depot. At the base however, some reflection reveals
that the synchronization queue can be seen as a normal queue where machines
are waiting to be moved into the production cell. This is only possible when the
production cell does not contain the maximum number of machines, that is, if a
machine in the production cell has failed. This leads to the model in Figure 2.

Fig. 2. The modified single base system

In this figure the variables n1 , n2 , k, m11 and m12 indicate the lengths of the
various queues in the system. The number of machines in (or awaiting) depot repair
is denoted by the random variable n1 , the number of spare machines at the depot
is denoted by the random variable n2 and the backlog of machines at the depot
is denoted by k. At the base there are m11 machines waiting for repair or being
repaired and m12 machines are acting as spares. In the production cell j1 machines
are operational. As a result of the operating inventory control policies, for n1 = n1,
n2 = n2, k = k, m11 = m11, m12 = m12 and j1 = j1 the following equations
Closed loop two-echelon repairable item systems 227

must hold:
n1 + n2 − k = S0, (1)
n2 · k = 0, (2)
k + m11 + m12 + j1 = S1 + J1, (3)
m12 · (J1 − j1) = 0, (4)
where Equations (2) and (4) follow from the fact that it is impossible to have a
backlog and to have spare machines available at the same time. If spare machines
are available, a request is satisfied immediately. In case of a backlog, a request is
not satisfied until a repair completion. The repaired machine is merged with the
longest waiting request.
From these relations it follows immediately that n1 and m11 completely deter-
mine the state of the system, including the values of n2 , k, m12 and j1 . Therefore,
the system can be modeled as a continuous time Markov chain with state description
(n1 , m11 ). The corresponding transition diagram is displayed in Figure 3.
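As a small illustration (our own Python sketch, not part of the paper), relations (1)-(4) let all remaining quantities be recovered from the state description (n1, m11):

```python
def derived_state(n1, m11, J1, S0, S1):
    """Recover (n2, k, m12, j1) from (n1, m11) via relations (1)-(4)."""
    k = max(n1 - S0, 0)         # depot backlog; follows from (1) with n2 >= 0 and (2)
    n2 = S0 - n1 + k            # spare machines at the depot, from (1)
    b1 = S1 + J1 - k - m11      # machines physically at the base, from (3)
    j1 = min(J1, b1)            # operational machines in the production cell
    m12 = b1 - j1               # base spares; m12 * (J1 - j1) = 0 holds, cf. (4)
    return n2, k, m12, j1
```

For example, with J1 = 3, S0 = 1, S1 = 0, the state (n1, m11) = (2, 1) yields a depot backlog k = 1, no spares anywhere, and a single operational machine.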

Fig. 3. Transition diagram for state description (n1, m11)

Let P (n1 , m11 ) = P (n1 = n1 , m11 = m11 ) be the steady state probability of
being in state (n1 , m11 ). This steady state probability can be found by solving the
global balance equations of the system. These can be deduced from the transition
diagram. Nevertheless, it is not possible to find an algebraic expression for the
steady state probabilities. Moreover, for larger systems with e.g. multiple bases,
the computational effort becomes prohibitive. Therefore the system will be slightly
adjusted in the next subsection, in order to arrive at a near-product form network.
Note that the analysis presented in this paper is partly similar to the one given in
Avsar and Zijm [3], where the equivalent open two-echelon network is considered.
For this open network, an algebraic and easily computable product form approxi-
mation is found. In the current paper, a closed network is considered, and an easily
computable algebraic approximation could not be found. However, the aggregated
network has a product form steady state distribution, and we can use MDA-like
algorithms to find numerical approximations for performance measures.
An alternative approach is to model the number of machines at the depot and
the bases as a level dependent quasi birth death process. This method may yield an
algebraic solution but, here too, the finite state space makes the analysis more com-
plex. Moreover, the transition rates in a given state depend not only on the phase
but also on the level. Together, this makes the alternative method computationally
very demanding, if not intractable.
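For small instances, however, the exact steady state is straightforward to compute numerically. The following sketch (our own, using uniformization of the (n1, m11) chain with the rates of Figure 3 rather than a direct linear solve) produces the "exact" values used for comparison in Section 2.4:

```python
def exact_single_base(J1, S0, S1, p1, lam1, mu0, mu1, iters=6000):
    """Steady state of the (n1, m11) chain via uniformization;
    feasible only for small instances."""
    states = [(n1, m11)
              for n1 in range(S0 + S1 + J1 + 1)
              for m11 in range(S1 + J1 + 1)
              if max(n1 - S0, 0) + m11 <= S1 + J1]

    def transitions(n1, m11):
        k = max(n1 - S0, 0)
        j1 = min(J1, S1 + J1 - k - m11)          # operational machines
        out = []
        if j1 > 0:
            out.append(((n1, m11 + 1), j1 * lam1 * p1))        # base-repairable failure
            out.append(((n1 + 1, m11), j1 * lam1 * (1 - p1)))  # depot-repairable failure
        if m11 > 0:
            out.append(((n1, m11 - 1), mu1))                   # base repair completion
        if n1 > 0:
            out.append(((n1 - 1, m11), mu0))                   # depot repair completion
        return out

    Lam = J1 * lam1 + mu0 + mu1 + 1.0            # exceeds any total outflow rate
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):                       # pi <- pi (I + Q / Lam)
        new = dict.fromkeys(states, 0.0)
        for (n1, m11), ps in pi.items():
            stay = ps
            for target, rate in transitions(n1, m11):
                flow = ps * rate / Lam
                stay -= flow
                new[target] += flow
            new[(n1, m11)] += stay
        pi = new
    avail = sum(ps for (n1, m11), ps in pi.items()
                if max(n1 - S0, 0) + m11 <= S1)  # P(all J1 machines operational)
    return pi, avail
```

For the first row of Table 1 below (J1 = 3, S0 = 1, S1 = 0, p1 = 0.5, λ1 = 1, µ0 = 6, µ1 = 3) this yields the exact availability 0.5651.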

2.2 Approximation

A first step towards an approximation for the steady state probabilities is to aggregate
the state space. The most difficult parts of the transition diagram are regions I and
II, that is, the parts with n1 ≤ S0 or, equivalently, the parts with k = 0. The parts
with k > 0 are equivalent to the states with n1 = k + S0 . A natural aggregation of
the system is a description through the states (k, m11 ). The states (n1 , m11 ) with
n1 = 0, 1, . . . , S0 are then aggregated into one state (0, m11 ). Denote the steady
state probabilities for the new model by P̃ then the following holds for any m11 :
P̃(k = 0, m11 = m11) = Σ_{n1=0}^{S0} P(n1 = n1, m11 = m11), (5)

P̃(k = k, m11 = m11) = P(n1 = S0 + k, m11 = m11). (6)


The transition diagram corresponding to the alternative state space description is
displayed in Figure 4.
The rates only differ from the transition diagram in Figure 3 for the case k = 0.
Let q(m11 ) be the steady state probability that an arriving request for a machine
at the depot has to wait, given that it finds no other waiting requests in front of it
(k = 0) and m11 = m11 . Given the (aggregated) state (0, m11 ), the state does not
change in case of an arriving request with probability 1−q(m11 ), because spares are
available. With probability q(m11 ) no spares are available and the state changes into
(1, m11 ). The transition rate from (0, m11 ) to (1, m11 ) equals j1 (1−p1 )λ1 q(m11 ).
To determine q(m11 ) one needs

q(m11 ) = P (n1 = S0 |n1 ≤ S0 , m11 = m11 ). (7)

However, to compute this, one needs to know the steady state distribution of the
original system, which is exactly what we attempt to approximate. Therefore, we
approximate the q(m11 )’s by their weighted average, i.e. we focus on the conditional
probability q defined by

q = Σ_{m11} q(m11) P(m11 = m11 | n1 ≤ S0) = P(n1 = S0 | n1 ≤ S0) (8)

and for every m11 we replace q(m11 ) in the transition diagram by this q. In the
next section it will be explained how a reasonable approximation for this q can be
obtained by means of an application of Norton’s theorem.

Fig. 4. Transition diagram for state description (k, m11)

Lemma 1 The steady state probabilities for the model with state description
(k, m11 ) and transition rates as denoted in Figure 4 with q(m11 ) replaced by
arbitrary q have a product form.

Proof. To find the steady state probabilities, consider both the original model in
Figure 2 and the alternative model in Figure 5.

Fig. 5. Typical-server Closed Queuing Network (TCQN)

In Figure 5 the depot repair shop with synchronization queue is replaced by


a typical server. For jobs that find the server idle the server has infinite service
rate with probability 1 − q (the case spares are available) and service rate µ0 with
probability q (the case no spares are available). Let b1 be the random variable equal
to m12 + j 1 , then by looking at the system with the typical server, and conditioning
on the fact that the network contains exactly J1 + S1 jobs, it is easily verified that
the following expression for P̃ (k = k, m11 = m11 , b1 = b1 ) satisfies the balance
equations of the TCQN:


P̃(k, m11, b1) =

    G̃ q (p1/µ1)^{m11} ((1 − p1)/µ0)^{k} (1/λ1)^{b1} · 1/(J1! J1^{b1−J1}),   b1 > J1, k > 0,
    G̃ q (p1/µ1)^{m11} ((1 − p1)/µ0)^{k} (1/λ1)^{b1} · 1/b1!,               b1 ≤ J1, k > 0,
    G̃ (p1/µ1)^{m11} (1/λ1)^{b1} · 1/(J1! J1^{b1−J1}),                      b1 > J1, k = 0,
    G̃ (p1/µ1)^{m11} (1/λ1)^{b1} · 1/b1!,                                   b1 ≤ J1, k = 0,   (9)

with k + m11 + b1 = J1 + S1 and G̃ the normalization constant. □




Expressed in terms of the state variables (k, m11 ), this result immediately leads
to:

Lemma 2 The steady state distribution for the aggregate model is given by
P̃(k, m11) =

    G q (p1 λ1/µ1)^{m11} ((1 − p1) λ1/µ0)^{k} · 1/(J1! J1^{S1−k−m11}),   k + m11 ≤ S1, k > 0,
    G q (p1 λ1/µ1)^{m11} ((1 − p1) λ1/µ0)^{k} · 1/(S1 + J1 − k − m11)!,  k + m11 > S1, k > 0,
    G (p1 λ1/µ1)^{m11} · 1/(J1! J1^{S1−m11}),                            m11 ≤ S1, k = 0,
    G (p1 λ1/µ1)^{m11} · 1/(S1 + J1 − m11)!,                             m11 > S1, k = 0,   (10)

with G = G̃ λ1^{−(J1+S1)} the normalization constant.

The previous lemma gives an explicit expression for the steady state probabili-
ties. For large systems it may be difficult to calculate the normalization constant G.
However, since we are dealing with a product form network, Marginal Distribution
Analysis (see e.g. Buzacott and Shanthikumar [4]) can be used to calculate the
appropriate performance measures directly.
The results presented so far hold true for any value of q ∈ [0, 1]. In the derivation
of the lemmas above the interpretation of q as the conditional probability that a
request at the depot has to wait given that it finds no other requests in front of it
(see (8)), has not been used. Therefore any q ∈ [0, 1] will do, but it is expected that
a good approximation will be obtained by using a q that does correspond to this
interpretation. In the next subsection Norton’s theorem will be used to find a q with
a meaningful interpretation that gives good results.

2.3 Applying Norton’s theorem to approximate q

Although we have stated in the previous section that the product form does not
depend on q, it is still needed to find a q that gives a good approximation for the
performance measures. In this section, the basic idea of Norton’s theorem (see
Harrison and Patel [6] for an overview) is used to find an approximation for q that
gives good results. This basic idea is that a product form network can be analyzed
by replacing subnetworks by state dependent servers. Norton’s theorem states that
the joint distributions for the numbers of customers in the subnetworks and the
queue lengths at the replacing state dependent servers are the same.
To use this idea, first recall the original model as shown in Figure 2. We want
to find q, the conditional probability that a request corresponding with a machine
failure finds no spare parts in stock at the depot, although there was no backlog so
far. The base, consisting of the production cell and the base repair shop, is taken
apart and replaced by a state dependent server.

Fig. 6. a The new network with state dependent server. b The short circuited network

The new network with the state dependent server is displayed in Figure 6a. In
order to find the service rates for this state dependent server, the original network is
short circuited by setting the service rate at the depot repair facility to infinity. This
short circuited network is also depicted in Figure 6b. The service rate for the new
state dependent server with i jobs present is equal to the throughput of the short
circuited network with i jobs present, denoted by T H1 (i).
The evolution of n1 = n1 , the number of machines in or awaiting depot re-
pair, can be described as a birth-death process. The transition diagram is shown in
Figure 7.
Fig. 7. Transition diagram for n1

Note that this is just an approximation due to the fact that Norton’s theorem is
only valid for product form networks. In case S0 = 0, we would have a product
form network and the results would be exact. From the diagram one can observe
that

P (n1 = n1 ) T H1 (J1 + S1 − (n1 − S0 )+ ) = P (n1 = n1 + 1) µ0 (11)

for n1 = 0, . . . , J1 + S1 + S0 − 1. In principle one can derive an approximation
of the distribution of n1 from this. However, by the definition of q (see (8)), we
only need to study the behavior for n1 ≤ S0 . For these states, the service rate of
the state dependent server is equal to T H1 (J1 + S1 ). Let δ = T H1 (J1 + S1 )/µ0 .
From (11) we observe that P(n1 = n1) = δ^{n1} P(n1 = 0) for n1 = 0, . . . , S0, so

q = P(n1 = S0)/P(n1 ≤ S0) = δ^{S0} P(n1 = 0) / Σ_{n1=0}^{S0} δ^{n1} P(n1 = 0)
  = δ^{S0} (1 − δ)/(1 − δ^{S0+1}). (12)
It remains to find the throughput of the short circuited network in Fig-
ure 6b with J1 + S1 jobs present. A simple observation reveals that P (b1 =
b1 ) min(b1 , J1 ) λ1 p1 = P (b1 = b1 − 1)µ1 for b1 = 1, . . . , J1 + S1 from
which the steady state probabilities of b1 are immediately deduced. Moreover, the
throughput satisfies
TH1(J1 + S1) = (1 − p1) Σ_{b1=1}^{J1+S1} P(b1 = b1) min(b1, J1) λ1
             = ((1 − p1)/p1) µ1 (1 − P(b1 = J1 + S1)). (13)
We can determine q with (12) and (13). This q can be used to approximate the steady
state distribution using (10) or using Marginal Distribution Analysis. Results of this
approximation are presented in the next section.
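The whole pipeline — the b1 birth-death chain giving TH1 via (13), then δ and q via (12), then the product-form weights (10) and the measures (14)-(15) — fits in a short sketch (our own illustration, not the authors' implementation; for the first row of Table 1, J1 = 3, S0 = 1, S1 = 0, it reproduces Aappr = 0.5674 and Ej1 = 2.4246):

```python
from math import factorial

def approx_single_base(J1, S0, S1, p1, lam1, mu0, mu1):
    """Approximate availability A and E[j1] via q from (12)-(13)
    and the product-form steady state (10)."""
    N = J1 + S1
    # Birth-death weights for b1 in the short-circuited network:
    # P(b1 - 1) * mu1 = P(b1) * min(b1, J1) * lam1 * p1
    w = [0.0] * (N + 1)
    w[N] = 1.0
    for b in range(N, 0, -1):
        w[b - 1] = w[b] * min(b, J1) * lam1 * p1 / mu1
    P_full = w[N] / sum(w)                              # P(b1 = J1 + S1)
    TH1 = (1 - p1) / p1 * mu1 * (1 - P_full)            # eq. (13)
    delta = TH1 / mu0
    q = delta**S0 * (1 - delta) / (1 - delta**(S0 + 1)) # eq. (12)

    rho1, rho0 = p1 * lam1 / mu1, (1 - p1) * lam1 / mu0
    weight = {}
    for k in range(N + 1):
        for m in range(N + 1 - k):                      # k + m11 <= J1 + S1
            t = rho1**m * rho0**k * (q if k > 0 else 1.0)
            if k + m <= S1:
                t /= factorial(J1) * J1**(S1 - k - m)
            else:
                t /= factorial(S1 + J1 - k - m)
            weight[(k, m)] = t                          # eq. (10), up to G
    Z = sum(weight.values())
    A = sum(v for (k, m), v in weight.items() if k + m <= S1) / Z     # (14)
    Ej1 = sum((J1 - max(k + m - S1, 0)) * v
              for (k, m), v in weight.items()) / Z                    # (15)
    return q, A, Ej1
```

Note that for S0 = 0 the formula gives q = 1, which is consistent with the exactness of the approximation in that case.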

2.4 Results

In this section numerical results obtained by the approximation described above
will be presented. To be able to judge the approximation, the results are compared
to exact results. The exact results are obtained by solving the balance equations for
the original model.
The performance measures we are interested in are the availability, i.e. the
probability that the maximum number of machines is working in the production
cell, denoted by A, and the expected number of machines operating in the production
cell (Ej 1 ). These are defined as follows:
A = P(j1 = J1) = P(b1 ≥ J1) = P(k + m11 ≤ S1), (14)

Ej1 = E(J1 − [k + m11 − S1]+) = Σ_{k,m11} (J1 − [k + m11 − S1]+) P(k, m11). (15)

The performance measures are computed for several values of J1, S0, S1, p1, λ1, µ0
and µ1. The results are given in Table 1 and in Tables 5 and 6 in the Appendix.
Also, the percentage deviation is given.
The numbers reveal that in these systems, the approximation gives an error of
at most 1 %. In all other cases that we tested, we got similar results. The largest
errors are attained in the cases with only a small number of spares (S0 > 0) in the
system. For the case S0 = 0 the results are exact.

3 General two-echelon repairable item systems

In this section the simple system from Section 2 will be extended to a more realistic
one. The system will contain multiple bases and transport lines. Furthermore, the
single servers that are used in the repair shops are replaced by multiple parallel
servers. These adjustments will make the analysis of the system more complicated.
Nevertheless, the basic idea of the aggregation step will be the same.

3.1 The multi-base model with transportation

The system in this section consists of multiple bases, where the number of bases is
denoted by L. A graphical representation of the system is given in Figure 8 for the
case L = 2.
As in the simple system described before, at base ℓ = 1, . . . , L at most Jℓ
machines are operating in the production cell. The machines fail at exponential rate
λℓ and are always replaced by a machine from the corresponding base stock (if
available). Failed machines from base ℓ are base-repairable with probability pℓ and
depot-repairable with probability 1 − pℓ. In contrast to the simple model described
before, the repair shops are modeled as multi-servers. That is, at the repair shop of
base ℓ = 1, . . . , L, Rℓ repairmen are working, each at exponential rate µℓ. At the
depot repair shop R0 repairmen are working at exponential rate µ0. Consistent with
the simple model, Sℓ machines are dedicated to base ℓ to act as spares and S0 spare
machines are dedicated to the depot. Broken machines at a certain base ℓ that are
base-repairable are sent to the base ℓ repair shop. After repair they fill up the spares
buffer at base ℓ or, in case of a backlog at that base, become operational immediately.
Broken machines from base ℓ that are considered depot-repairable are sent to the
depot repair shop. When depot spares are available, a spare is immediately sent to
the stock of base ℓ. In case there are no spares available a backlog occurs. Machines
that have completed repair are sent to the base that has been waiting the longest. That
is, an FCFS return policy is used. In this model the transportation from the depot to

Table 1. Results for the simple single base model, p1 = 0.5, λ1 = 1, µ0 = 2J1 , µ1 = J1

J1 S0 S1 Aexact Aappr % dev Ej 1 exact Ej 1 appr % dev

3 1 0 0.5651 0.5674 0.4185 2.4225 2.4246 0.0853
3 3 0 0.5889 0.5892 0.0543 2.4572 2.4576 0.0145
3 5 0 0.5901 0.5901 0.0041 2.4589 2.4590 0.0012
3 1 1 0.7945 0.7952 0.0934 2.7283 2.7286 0.0098
3 3 1 0.8110 0.8111 0.0154 2.7506 2.7507 0.0036
3 5 1 0.8120 0.8120 0.0014 2.7518 2.7518 0.0004
3 1 3 0.9506 0.9506 0.0012 2.9349 2.9348 0.0057
3 3 3 0.9554 0.9554 0.0000 2.9412 2.9412 0.0006
3 5 3 0.9557 0.9557 0.0000 2.9416 2.9416 0.0000
3 1 4 0.9755 0.9754 0.0012 2.9677 2.9676 0.0036
3 3 4 0.9779 0.9779 0.0004 2.9709 2.9709 0.0005
3 5 4 0.9781 0.9781 0.0000 2.9711 2.9711 0.0000
5 1 0 0.5369 0.5387 0.3314 4.3147 4.3160 0.0318
5 3 0 0.5625 0.5628 0.0461 4.3581 4.3584 0.0064
5 5 0 0.5639 0.5639 0.0037 4.3604 4.3604 0.0006
5 1 1 0.7759 0.7765 0.0761 4.6703 4.6704 0.0006
5 3 1 0.7940 0.7941 0.0127 4.6978 4.6979 0.0012
5 5 1 0.7950 0.7950 0.0012 4.6994 4.6994 0.0002
5 1 3 0.9453 0.9453 0.0012 4.9198 4.9196 0.0041
5 3 3 0.9506 0.9506 0.0000 4.9276 4.9276 0.0005
5 5 3 0.9510 0.9510 0.0000 4.9281 4.9281 0.0000
5 1 4 0.9727 0.9727 0.0009 4.9601 4.9600 0.0025
5 3 4 0.9755 0.9755 0.0003 4.9641 4.9640 0.0004
5 5 4 0.9757 0.9757 0.0000 4.9643 4.9643 0.0000
10 1 0 0.5091 0.5102 0.2178 9.1830 9.1837 0.0073
10 3 0 0.5363 0.5365 0.0328 9.2375 9.2377 0.0017
10 5 0 0.5379 0.5379 0.0028 9.2406 9.2406 0.0002
10 1 1 0.7565 0.7569 0.0507 9.5979 9.5977 0.0016
10 3 1 0.7762 0.7762 0.0087 9.6321 9.6321 0.0000
10 5 1 0.7774 0.7774 0.0008 9.6341 9.6341 0.0000
10 1 3 0.9395 0.9395 0.0006 9.9006 9.9004 0.0020
10 3 3 0.9455 0.9455 0.0001 9.9104 9.9104 0.0003
10 5 3 0.9458 0.9458 0.0000 9.9110 9.9110 0.0000
10 1 4 0.9698 0.9698 0.0007 9.9504 9.9503 0.0012
10 3 4 0.9728 0.9728 0.0002 9.9554 9.9554 0.0002
10 5 4 0.9730 0.9730 0.0000 9.9557 9.9557 0.0000

Fig. 8. The multi-base repairable item system for L = 2

Fig. 9. The modified multi-base system for L = 2

the bases is taken into account explicitly. The transport lines are modeled as ample
servers with exponential service rate γℓ for the transport to base ℓ = 1, . . . , L. The
number of machines in transport to base ℓ is denoted by the random variable tℓ.
The transport from the bases to the depot is not taken into account.
As in the simple model, the synchronization queues at the bases can be replaced
by ordinary queues as is depicted in Figure 9.
The vector m1 = (m11, m21, . . . , mL1) denotes the numbers of machines in
base repair (ℓ = 1, . . . , L) and the vector m2 = (m12, m22, . . . , mL2) denotes
the numbers of spares at the bases (ℓ = 1, . . . , L). The variable n1 stands for the
number of machines in depot repair and n2 is the number of spare machines at the
depot. The vector k0 = (k01, k02, . . . , k0L) denotes the backorders at the depot
originating from base ℓ (ℓ = 1, . . . , L). The total number of backorders at the depot
equals k = Σ_{ℓ=1}^{L} k0ℓ. The machines in transit to the bases are given by the vector
t = (t1, t2, . . . , tL) and the numbers of machines operating in the production cells
are expressed in vector j = (j1, j2, . . . , jL). The sum of the number of machines in
base stock and the number of machines operating in the production cell is denoted
in the vector b = (b1, b2, . . . , bL), where bℓ = mℓ2 + jℓ.
As a result of the operating inventory control policies, for n1 = n1, n2 = n2,
k0 = k0, t = t, m1 = m1, m2 = m2 and j = j the following equations must
hold:

n1 + n2 − k = S0, (16)
n2 · k = 0, (17)

and for ℓ = 1, 2, . . . , L :

k0ℓ + tℓ + mℓ1 + mℓ2 + jℓ = Sℓ + Jℓ, (18)
mℓ2 · (Jℓ − jℓ) = 0. (19)

From these relations it follows immediately that k0, n1, t and m1 completely
determine the state of the system. Therefore, the system can be modeled as a
continuous time Markov chain with state description (k0, n1, t, m1).

Remark 3 In the vector that denotes the number of backorders originating from the
bases, k0 = (k01 , k02 , . . . , k0L ), it is not taken into account that the order of the
backorders matters. Since an FCFS return policy is assumed, this order should be
known. Nevertheless, in this model all states with similar numbers of backorders
per base, are aggregated into one state. This aggregation step will not have a big
influence on the results, but it will considerably simplify the analysis.

3.2 Approximation

In correspondence with the simple model as described in Section 2, a similar aggregation
step is performed to tackle this extended model. Once more, all states with
0 ≤ n1 ≤ S0 are aggregated into one state. The aggregation step is performed as
follows:

P̃(k0 = 0, k = 0, t = t, m1 = m1) = Σ_{n1=0}^{S0} P(k0 = 0, n1 = n1, t = t, m1 = m1), (20)

P̃(k0 = k0, k = k, t = t, m1 = m1) = P(k0 = k0, n1 = S0 + k, t = t, m1 = m1). (21)

The aggregated system can be described by (k0, k, t, m1). Furthermore, because
k = Σ_{ℓ=1}^{L} k0ℓ the state space can also be described by (k0, t, m1).
Define q as before, that is q is the conditional probability that an arriving request
at the depot cannot be fulfilled immediately, given that there are no other requests
waiting. In a formula it says q = P(n1 = S0 | n1 ≤ S0). So, given there is no
backlog at the depot, an arriving request has to wait with probability q. The waiting
time depends on the number of spares already in the queue.

Fig. 10. The Typical-server Closed Queuing Network

The first spare that finishes repair will fulfill the just arrived request. With
probability 1 − q spares are available and the arriving request does not have to wait.
This aggregated network is depicted as a Typical-server Closed Queuing Network
in Figure 10. The depot repair shop is modeled as a typical server. In case of no
backlog (k = 0) the service rate equals infinity with probability 1 − q and equals
min(S0 , R0 )µ0 with probability q. In all other cases (k > 0) the service rate equals
min(k + S0 , R0 )µ0 .
To determine q Norton’s theorem is used once more. As in Subsection 2.3 each
base (the transport line, the base repair shop and the production cell) is replaced by a
state dependent server. To determine the transition rate of this state dependent server,
each base-part of the network is short circuited and its throughput is calculated.
This throughput operates as the service rate of the state dependent server. The
new network with the state dependent servers and the short circuited networks are
depicted in Figure 11.
Once again the evolution of n1 can be described as a birth-death process. The
(approximated) transition diagram for n1 = 0, . . . , S0 is given in Figure 12.
Let THℓ(i) be the throughput of the subnetwork replacing base ℓ (ℓ = 1, . . . , L)
with i jobs present. As in the simple model only the behavior for n1 ≤ S0 needs to
Fig. 11. a The new network with state dependent servers. b The short circuited networks

Fig. 12. Transition diagram for n1


be studied to determine q. Take δ = Σ_{ℓ=1}^{L} THℓ(Jℓ + Sℓ)/µ0 ; then

P(n1 = n1) = δ^{n1} ( Π_{k=1}^{n1} 1/min(k, R0) ) P(n1 = 0) for n1 = 0, . . . , S0 (22)

and

q = P(n1 = S0)/P(n1 ≤ S0)
  = δ^{S0} Π_{k=1}^{S0} (1/min(k, R0)) / ( Σ_{n1=0}^{S0} δ^{n1} Π_{k=1}^{n1} (1/min(k, R0)) ). (23)

The throughputs can be obtained by applying a standard MDA algorithm (see [4])
on the short circuited product form networks as shown in Figure 11.
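Such a standard MDA (exact MVA with marginal probabilities, cf. [4]) on a short-circuited base subnetwork can be sketched as follows. The encoding is our own: each station is a tuple (servers, rate, visit ratio) with None for an ample server, so base ℓ becomes its production cell (Jℓ, λℓ, 1/(1−pℓ)), its repair shop (Rℓ, µℓ, pℓ/(1−pℓ)) and its transport line (None, γℓ, 1):

```python
def mda_throughput(stations, N):
    """Exact single-class MDA for a closed product-form network.
    Each station is (c, mu, v): c servers (None = ample server), rate mu
    per server, visit ratio v. Returns the throughput along the
    reference (v = 1) arc at population N."""
    p = [[1.0] for _ in stations]          # marginals p[j][k] at population n-1
    th = 0.0
    for n in range(1, N + 1):
        W = []
        for j, (c, mu, v) in enumerate(stations):
            if c is None:
                W.append(1.0 / mu)         # ample server: no queueing
            else:
                w = 1.0 / mu               # own (residual) service time
                for k in range(c, n):      # expected wait behind a full station
                    w += (k - c + 1) / (c * mu) * p[j][k]
                W.append(w)
        th = n / sum(v * Wj for (c, mu, v), Wj in zip(stations, W))
        new_p = []
        for j, (c, mu, v) in enumerate(stations):
            pj = [0.0] * (n + 1)
            for k in range(1, n + 1):
                eff = mu * (k if c is None else min(k, c))
                pj[k] = v * th / eff * p[j][k - 1]
            pj[0] = max(0.0, 1.0 - sum(pj[1:]))
            new_p.append(pj)
        p = new_p
    return th

def q_depot(base_nets, S0, R0, mu0):
    """q of eq. (23); base_nets is a list of (stations, N) short-circuited
    base subnetworks."""
    delta = sum(mda_throughput(st, N) for st, N in base_nets) / mu0
    def w(n):                              # delta^n * prod_{k<=n} 1/min(k, R0)
        r = delta**n
        for k in range(1, n + 1):
            r /= min(k, R0)
        return r
    return w(S0) / sum(w(n) for n in range(S0 + 1))
```

For the short-circuited network of the simple model (Fig. 6b: J1 = 3, S1 = 0, p1 = 0.5, λ1 = 1, µ1 = 3, no transport) this gives TH1(3) = 75/61 ≈ 1.2295, in agreement with (13), and for S0 = 1, R0 = 1, µ0 = 6 it reduces q of (23) to the single-base value of (12).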
The steady state marginal probabilities as well as the main performance mea-
sures for the aggregated system can be found by using an adapted Multi-Class
Marginal Distribution Analysis algorithm (see Buzacott and Shanthikumar [4]
for ordinary Multi-Class MDA). To see this, introduce tokens of class ℓ with
ℓ = 1, . . . , L that either represent machines present at base ℓ (in the production
cell, in the base repair shop, in the base stock or in transit to this base) or
represent requests to the depot stock emerging from a failure of a machine at base ℓ
that cannot be repaired locally. Recall that machines that have to be repaired in the
depot repair shop, in fact lose their identity, i.e. after completion they are placed
in the depot stock, from which they can in principle be shipped to any arbitrary
base. However, the request arriving jointly with that broken machine at the depot,
maintains its identity, meaning that it is matched with the first spare machine avail-
able, after which the combination is transported to the base the request originated
from. Therefore, a token can be seen as connected to a machine as long as that
machine is at the base (in any status) and connected with the corresponding request
as soon as the machine is sent to the depot. This request matches with an available
machine from stock (which generally is different from the one sent to the depot,
unless S0 = 0) and the combination returns to the base that generated the request.
Hence, in this way, a multi-class network arises in a natural way.
The adapted algorithm is given below. An important aspect of an MDA algo-
rithm is the computation of the expected sojourn time in the stations. Since the depot
repair shop is modeled as a typical server, the standard sojourn time as described in
[4] will not do for this station. As denoted before, in case of no backlog (k = 0) the
service rate equals infinity with probability 1 − q and equals min(S0 , R0 )µ0 with
probability q. In all other cases (k > 0) the service rate equals min(k + S0 , R0 )µ0 .
The expected sojourn time of an arriving request is the time it takes until all re-
quests in front of it (k) are fulfilled and the request itself is fulfilled. That is, the
time until k + 1 machines come out of repair. In case k = 0 with probability 1 − q
the sojourn time equals 0 because a spare fulfills the request. The adaptations to
the sojourn time reveal themselves in the algorithm in step 4. Another adaptation to
the ordinary algorithm is found in step 6. The transition rates from the states with
0 machines in depot repair to the states with 1 machine in depot repair now equal
q times the throughput, instead of just the throughput.
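The adapted depot sojourn time of step 4 below can be sketched in isolation (our own fragment; the argument p0_prev plays the role of the marginal distribution p0(· | z − eℓ)):

```python
def depot_sojourn_time(p0_prev, q, S0, R0, mu0):
    """Expected sojourn time at the typical-server depot station (step 4).
    p0_prev[k] = p0(k | z - e_l). With no backlog the arriving request
    waits only with probability q; behind k waiting requests, k + 1
    repair completions are needed."""
    ew = q / (min(R0, S0 + 1) * mu0) * p0_prev[0]
    for k in range(1, len(p0_prev)):
        ew += (k + 1) / (min(R0, S0 + k + 1) * mu0) * p0_prev[k]
    return ew
```

For instance, with q = 1, S0 = 0 and a single depot repairman of rate µ0 = 2, an arriving request that surely finds an empty depot waits one full repair time, 0.5.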
Algorithm 4 The depot repair shop is defined as station 0 and all other stations
are defined as station ℓi, where ℓ denotes the number of the base (ℓ = 1, . . . , L)
and i denotes the specific station associated with that base. The production cell is
denoted by i = b, the base repair shop by i = m and the transport line from the
depot to the base by i = t.
Let V_j^{(r)} be the visit ratio of station j for class r type machines. Let z denote
the number of machines in the system and z = (z^{(1)}, . . . , z^{(r)}, . . . , z^{(L)}) the vector
denoting the state that indicates the number of machines per class. The steady state
probability that y machines are in station j, given vector z, is denoted by p_j(y|z).
The expected sojourn time for type r machines arriving at station j given that z
machines are wandering through the system is given by EW_j^{(r)}(z) and TH_j^{(r)}(z)
denotes the throughput of type r machines given state z. The algorithm is executed
as follows:

1. (Initialization) For ℓ = 1, . . . , L set V_0^{(ℓ)} = 1, V_{ℓb}^{(ℓ)} = 1/(1 − pℓ), V_{ℓm}^{(ℓ)} = pℓ/(1 − pℓ) and
   V_{ℓt}^{(ℓ)} = 1. For ℓ = 1, . . . , L, r = 1, . . . , L, r ≠ ℓ, i ∈ {b, m, t} set V_{ℓi}^{(r)} = 0.
   Set z = 0 and p_j(0|0) = 1 for j ∈ ∪_ℓ {ℓb, ℓm, ℓt} ∪ {0}.
2. z := z + 1.
3. For all states z ∈ {z | Σ_{ℓ=1}^{L} z^{(ℓ)} = z and z^{(ℓ)} ≤ Jℓ + Sℓ} execute steps 4
   through 6.
4. Compute the sojourn times for ℓ = 1, . . . , L for which z^{(ℓ)} > 0 from:

   EW_0^{(ℓ)}(z) = Σ_{k=1}^{z−1} [(k + 1)/(min(R0, S0 + k + 1) µ0)] p_0(k | z − eℓ)
                 + [q/(min(R0, S0 + 1) µ0)] p_0(0 | z − eℓ),

   EW_{ℓb}^{(ℓ)}(z) = Σ_{bℓ=Jℓ}^{z−1} [(bℓ − Jℓ + 1)/(Jℓ λℓ)] p_{ℓb}(bℓ | z − eℓ) + 1/λℓ,

   EW_{ℓm}^{(ℓ)}(z) = Σ_{mℓ1=Rℓ}^{z−1} [(mℓ1 − Rℓ + 1)/(Rℓ µℓ)] p_{ℓm}(mℓ1 | z − eℓ) + 1/µℓ,

   EW_{ℓt}^{(ℓ)}(z) = 1/γℓ.

5. Compute TH_0^{(ℓ)}(z) for ℓ = 1, . . . , L if z^{(ℓ)} > 0 from:

   TH_0^{(ℓ)}(z) = z^{(ℓ)} / ( V_0^{(ℓ)} EW_0^{(ℓ)}(z) + Σ_{i∈{b,m,t}} V_{ℓi}^{(ℓ)} EW_{ℓi}^{(ℓ)}(z) ),

   and if z^{(ℓ)} = 0 then TH_0^{(ℓ)}(z) = 0. Compute TH_{ℓi}^{(ℓ)}(z) for ℓ = 1, . . . , L and
   i ∈ {b, m, t} from:

   TH_{ℓi}^{(ℓ)}(z) = V_{ℓi}^{(ℓ)} TH_0^{(ℓ)}(z).

6. Compute the marginal probabilities for all stations from:

   µ0 min(R0, S0 + 1) p_0(1|z) = Σ_{ℓ=1}^{L} TH_0^{(ℓ)}(z) q p_0(0 | z − eℓ),
   µ0 min(R0, S0 + k) p_0(k|z) = Σ_{ℓ=1}^{L} TH_0^{(ℓ)}(z) p_0(k − 1 | z − eℓ) for k = 2, . . . , z,

   and for ℓ = 1, . . . , L from:

   λℓ min(Jℓ, bℓ) p_{ℓb}(bℓ|z) = TH_{ℓb}^{(ℓ)}(z) p_{ℓb}(bℓ − 1 | z − eℓ) for bℓ = 1, . . . , z,
   µℓ min(Rℓ, mℓ1) p_{ℓm}(mℓ1|z) = TH_{ℓm}^{(ℓ)}(z) p_{ℓm}(mℓ1 − 1 | z − eℓ) for mℓ1 = 1, . . . , z,
   γℓ tℓ p_{ℓt}(tℓ|z) = TH_{ℓt}^{(ℓ)}(z) p_{ℓt}(tℓ − 1 | z − eℓ) for tℓ = 1, . . . , z.

   Compute p_j(0|z) for j ∈ ∪_ℓ {ℓb, ℓm, ℓt} ∪ {0} from:

   p_j(0|z) = 1 − Σ_{y=1}^{z} p_j(y|z).

7. If z = Σ_{ℓ=1}^{L} (Jℓ + Sℓ) then stop; else go to step 2.

With the adapted Multi-Class MDA algorithm presented above, the marginal
probabilities of the system as well as the throughputs and the sojourn times can be
approximated. From these, various performance measures can be computed. In the
next section some results obtained by the algorithm will be compared with results
from simulation.

Remark 5 In contrast to the simple problem discussed in Section 2 (which merely served to illustrate the basic steps of the aggregation procedure), an exact solution
approach for the current extended problem already proves to be computationally
intractable, due to the curse of dimensionality. The aggregation procedure, on the
other hand, poses no essential computational difficulties. This is due to two reasons.
First of all, the aggregation and subsequent small changes on some border transition
rates allow us to come up with a near-product form solution for the approximated
system. Second, as a result of that, we are able to apply Norton’s theorem, which
allows for an exact decomposition of the remaining approximated model. Although
for large problems the adapted Multi-Class MDA algorithm becomes slower, stan-
dard approximation techniques for multi-class systems are available to speed up
these algorithms further, without losing much accuracy (see also our final remarks
in Sect. 5).

3.3 Results

In this section results obtained by the adapted Multi-Class MDA algorithm from
the previous section will be presented. They will be compared to results obtained by
simulation. For each base we are interested in the availability, that is the probability
that the maximum number of machines is operating in the production cell. For base
ℓ this is denoted by A_ℓ for ℓ = 1, . . . , L. Furthermore we are interested in the expected number of machines operating in the production cell, denoted by Ej_ℓ for base ℓ = 1, . . . , L. For ℓ = 1, . . . , L the performance measures can be computed by

A_ℓ = P(j_ℓ = J_ℓ) = P(b_ℓ ≥ J_ℓ) = P(k_ℓ0 + m_ℓ1 ≤ S_ℓ),     (24)

Ej_ℓ = E(J_ℓ − [k_ℓ0 + m_ℓ1 − S_ℓ]^+) = Σ_{k_ℓ0, m_ℓ1} (J_ℓ − [k_ℓ0 + m_ℓ1 − S_ℓ]^+) P(k_ℓ0, m_ℓ1).     (25)
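As an illustration of (24) and (25), the sketch below evaluates both measures for one base from a joint distribution P(k_ℓ0, m_ℓ1); the distribution used here is a made-up toy example, not output of the MDA algorithm.

```python
# Availability (24) and expected number of working machines (25) for one base,
# given the joint distribution P over (k_l0, m_l1).
def base_performance(P, J, S):
    A = sum(p for (k0, m1), p in P.items() if k0 + m1 <= S)                # Eq. (24)
    Ej = sum((J - max(k0 + m1 - S, 0)) * p for (k0, m1), p in P.items())   # Eq. (25)
    return A, Ej

P = {(0, 0): 0.5, (0, 1): 0.3, (1, 1): 0.2}   # hypothetical joint probabilities
A, Ej = base_performance(P, J=2, S=1)
print(round(A, 6), round(Ej, 6))   # 0.8 1.8
```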

In Table 2 and Table 7 in the Appendix, the parameter settings for some repre-
sentative test problems are given. In this section, we consider dual base systems
(L = 2); in the appendix we also have examples of systems with three (L = 3) and
four bases (L = 4). The other parameters in this case are given in Table 2 with
J_ℓ, the maximum number of working machines at base ℓ,
S_ℓ, the maximum number of stored items at base ℓ (or at the depot),
λ_ℓ, the breakdown rate of individual machines at base ℓ,
µ_ℓ, the repair rate of individual machines at base ℓ (or at the depot),
R_ℓ, the number of repairmen at base ℓ (or at the depot),
p_ℓ, the probability that a machine can be repaired at base ℓ,
γ_ℓ, the transportation rate to base ℓ, and
ρ_ℓ, the traffic intensity at base ℓ (or at the depot).

The first 10 models are symmetric, that is the same parameter values apply to both
bases. The other 10 problems concern asymmetric cases. It is obvious that a large
number of input parameters is required to specify a given problem. This makes it
difficult to vary these parameters in a totally systematic manner. In Albright [2] it is
shown that traffic intensities are good indicators of whether a system will work well
(minimal backorders) and are better indicators than the stock levels. Therefore we
selected most of the test problem parameter settings by selecting values of the traffic
intensities, usually well less than 1, and then selecting parameters to achieve these
traffic intensities. For the base ℓ repair facility, the traffic intensity ρ_ℓ is defined as

ρ_ℓ = J_ℓ λ_ℓ p_ℓ / (R_ℓ µ_ℓ),     (26)

the maximum failure rate divided by the maximum repair rate. Similarly, the depot traffic intensity ρ_0 is defined as

ρ_0 = Σ_{ℓ=1}^L J_ℓ λ_ℓ (1 − p_ℓ) / (R_0 µ_0).     (27)
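As a quick check, (26) and (27) reproduce the traffic intensities listed for test problem 1 in Table 2 (two identical bases with J_ℓ = 10, λ_ℓ = 1, p_ℓ = 0.5, R_ℓ = 1, µ_ℓ = 10; depot with R_0 = 1, µ_0 = 20):

```python
# Traffic intensities (26)-(27) evaluated for test problem 1 of Table 2.
def rho_base(J, lam, p, R, mu):
    return J * lam * p / (R * mu)                                       # Eq. (26)

def rho_depot(bases, R0, mu0):
    # bases: list of (J_l, lambda_l, p_l) triples
    return sum(J * lam * (1 - p) for J, lam, p in bases) / (R0 * mu0)   # Eq. (27)

print(rho_base(10, 1.0, 0.5, 1, 10.0))                          # 0.5, as in Table 2
print(rho_depot([(10, 1.0, 0.5), (10, 1.0, 0.5)], 1, 20.0))     # 0.5
```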

The results are given in Table 3 and Table 8 in the Appendix. The simulation leads
to 95 % confidence intervals. The simulation method was the so called replication
deletion method where the warmup period was found by Welch’s graphical proce-
dure (cf. Law and Kelton [7]). To compare the approximations with the simulation
results, the deviation from the approximation to the midpoint of the confidence
interval is calculated. These percentage deviations are given as well.
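Assuming the deviation is taken relative to the midpoint of the confidence interval, the reported value for problem 1 can be reproduced as follows (entries for other problems may differ in the last digit because the published interval bounds are rounded):

```python
# Percentage deviation of the approximation from the midpoint of the 95%
# simulation confidence interval, as reported in Table 3.
def pct_deviation(ci_low, ci_high, approx):
    mid = (ci_low + ci_high) / 2
    return abs(approx - mid) / mid * 100

# Problem 1, availability: CI (0.8529, 0.8563), approximation 0.8542.
print(round(pct_deviation(0.8529, 0.8563, 0.8542), 2))   # 0.05, as in Table 3
```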
From the results it can be concluded that the approximations are very accurate.
The maximum deviation for the availability, as well as the relative deviation for the expected number of working machines, is well below 1%, and all approximated values lie within the confidence intervals. Furthermore, all types of problems exhibited similar levels of accuracy.

4 Optimization

In the preceding sections an accurate approximation for several performance measures of closed two-echelon repairable item systems has been obtained. These approximation methods can be used to find an optimal allocation of spares in the system, in order to achieve the best performance. In this section we give algorithms to find the optimal allocation.
First, we formulate the optimization problem. Subsequently, we present a fast but reliable greedy approximation scheme for the optimization problem. The section is concluded with some numerical results.

Table 2. Parameter settings for test problems multi-base model with transportation (1)

Problem S_0 µ_0 R_0 ρ_0 (depot) base J_ℓ S_ℓ λ_ℓ µ_ℓ R_ℓ p_ℓ γ_ℓ ρ_ℓ

1 1 20 1 0.5 1/2 10 2 1 10 1 0.5 ∞ 0.5


2 1 10 1 0.5 1/2 5 2 1 5 1 0.5 ∞ 0.5
3 1 10 2 0.25 1/2 5 2 1 5 2 0.5 ∞ 0.25
4 1 10 1 0.5 1/2 5 2 1 5 1 0.5 10 0.5
5 1 10 1 0.5 1/2 5 2 1 5 1 0.5 2 0.5
6 1 10 2 0.25 1/2 5 2 1 5 2 0.5 2 0.25
7 1 2 5 0.5 1/2 5 2 1 1 5 0.5 2 0.5
8 7 2 5 0.5 1/2 5 2 1 1 5 0.5 2 0.5
9 5 6 1 0.83 1/2 5 5 1 3 1 0.5 ∞ 0.83
10 5 10 1 0.5 1/2 5 5 1 5 1 0.5 ∞ 0.5
11 1 20 1 0.5 1 10 2 1 10 1 0.5 ∞ 0.5
2 10 2 1 3 1 0.5 ∞ 1.67
12 1 20 1 0.38 1 10 2 1 10 1 0.5 ∞ 0.5
2 10 2 1 3 1 0.75 ∞ 2.5
13 1 20 1 0.38 1 10 2 1 10 1 0.5 1 0.5
2 10 2 1 20 1 0.75 ∞ 0.375
14 2 20 1 0.25 1 10 5 1 10 1 0.5 2 0.5
2 10 2 1 20 1 0.5 2 0.25
15 2 10 1 0.5 1 10 1 1 12 1 0.5 ∞ 0.42
2 10 4 1 3 4 0.5 ∞ 0.42
16 1 10 1 0.5 1 5 2 1 5 1 0.5 ∞ 0.5
2 5 2 1 5 1 0.5 300 0.5
17 1 10 1 0.5 1 5 2 1 5 1 0.5 ∞ 0.5
2 5 2 1 5 1 0.5 5 0.5
18 1 10 1 0.5 1 5 2 1 5 1 0.5 ∞ 0.5
2 5 2 1 5 1 0.5 2 0.5
19 1 6 1 1.67 1 10 2 1 5 1 0.5 ∞ 1
2 10 2 1 5 1 0.5 2 1
20 1 10 1 1 1 10 2 1 5 1 0.5 ∞ 1
2 10 2 1 5 1 0.5 2 1

4.1 The optimization problem

The aim is to maximize the overall performance of the system under a budget con-
straint for stocking costs. For the overall performance of the two-echelon repairable
item system, the total availability Atot , defined by

A_tot = Σ_{ℓ=1}^L J_ℓ λ_ℓ A_ℓ / Σ_{ℓ=1}^L J_ℓ λ_ℓ,

Table 3. Results for test problems from Table 2

Problem base ℓ A_ℓ sim A_ℓ appr % dev Ej_ℓ sim Ej_ℓ appr % dev

1 1/2 (0.8529,0.8563) 0.8542 0.05 (9.7533,9.7615) 9.7562 0.01


2 1/2 (0.8638,0.8750) 0.8683 0.13 (4.7957,4.8161) 4.8043 0.03
3 1/2 (0.9695,0.9714) 0.9701 0.04 (4.9626,4.9655) 4.9633 0.02
4 1/2 (0.8311,0.8403) 0.8353 0.04 (4.7461,4.7640) 4.7543 0.02
5 1/2 (0.6548,0.6639) 0.6605 0.18 (4.4542,4.4737) 4.4672 0.07
6 1/2 (0.7490,0.7539) 0.7514 0.00 (4.6463,4.6545) 4.6521 0.04
7 1/2 (0.2938,0.3008) 0.2978 0.17 (3.6284,3.6497) 3.6445 0.15
8 1/2 (0.3781,0.3883) 0.3800 0.83 (3.8866,3.9096) 3.8907 0.19
9 1/2 (0.8165,0.8361) 0.8234 0.34 (4.6622,4.7032) 4.6770 0.12
10 1/2 (0.9854,0.9894) 0.9875 0.01 (4.9785,4.9851) 4.9817 0.00
11 1 (0.8631,0.8703) 0.8663 0.05 (9.7672,9.7864) 9.7782 0.01
2 (0.0739,0.0830) 0.0797 1.54 (5.8615,5.9716) 5.9036 0.22
12 1 (0.8733,0.8785) 0.8753 0.07 (9.7915,9.8028) 9.7940 0.03
2 (0.0078,0.0100) 0.0082 8.06 (3.9778,4.0665) 3.9965 0.64
13 1 (0.1252,0.1367) 0.1303 0.49 (7.5031,7.5736) 7.5390 0.01
2 (0.9423,0.9455) 0.9452 0.14 (9.9154,9.9219) 9.9208 0.02
14 1 (0.8466,0.8565) 0.8512 0.04 (9.7283,9.7524) 9.7408 0.00
2 (0.4846,0.4995) 0.4895 0.52 (9.0411,9.0802) 9.0602 0.00
15 1 (0.4413,0.4647) 0.4382 3.27 (8.6368,8.7362) 8.6387 0.55
2 (0.7007,0.7231) 0.7012 1.50 (9.3273,9.3925) 9.3375 0.24
16 1 (0.8617,0.8694) 0.8693 0.43 (4.7934,4.8076) 4.8043 0.08
2 (0.8625,0.8717) 0.8673 0.02 (4.7938,4.8113) 4.8028 0.00
17 1 (0.8644,0.8734) 0.8691 0.03 (4.7985,4.8139) 4.8057 0.01
2 (0.7899,0.8005) 0.7957 0.06 (4.6831,4.7016) 4.6928 0.01
18 1 (0.8690,0.8752) 0.8707 0.16 (4.8049,4.8158) 4.8082 0.05
2 (0.6514,0.6618) 0.6579 0.20 (4.4494,4.4689) 4.4620 0.06
19 1 (0.0742,0.0837) 0.0769 2.60 (6.2112,6.2972) 6.2354 0.30
2 (0.0277,0.0338) 0.0301 2.15 (5.5644,5.6682) 5.5946 0.39
20 1 (0.3298,0.3430) 0.3354 0.29 (8.0845,8.1484) 8.1002 0.20
2 (0.1417,0.1492) 0.1472 1.20 (7.2625,7.3228) 7.2967 0.06

is taken. It can be considered as the weighted average of the availabilities per


base. The total availability is considered as a function of the maximal stock sizes
S0 , S1 , · · · , SL ; the other parameters that influence the total availability are given.
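As a small numerical illustration, the total availability for test problem 11 follows from the per-base approximations in Table 3 (both bases have J_ℓ = 10 and λ_ℓ = 1, so the weights are equal):

```python
# Failure-rate-weighted total availability, applied to test problem 11
# with the approximated base availabilities 0.8663 and 0.0797 from Table 3.
def a_tot(J, lam, A):
    num = sum(j * l * a for j, l, a in zip(J, lam, A))
    return num / sum(j * l for j, l in zip(J, lam))

print(round(a_tot([10, 10], [1.0, 1.0], [0.8663, 0.0797]), 4))   # 0.473
```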
The constraint for the optimization problem is an upperbound C for the total
stocking costs. The stocking costs are linear in the maximum stock sizes. Let c be
the storage cost for keeping one spare at stockpoint . The (non-linear) optimization

problem can now be formulated as:


max A_tot(S_0, . . . , S_L),

s.t. Σ_{ℓ=0}^L c_ℓ S_ℓ ≤ C,

     S_ℓ ≥ 0, for ℓ = 0, . . . , L.
In the next subsection a greedy approximation scheme will be given to approximate
the optimal values for S0 , . . . , SL .

4.2 Optimization algorithm

The most straightforward solution method to find optimal stock levels, is the brute
force method. This method simply checks all feasible allocations and picks the one
which gives the highest total availability. By assuming that Atot is an increasing
function, the brute force can be improved by considering only allocations on the
boundary of the feasible region, that is those allocation where adding another spare
part would lead to an infeasible allocation. Even this improved brute force approach
turns out to be rather time consuming.
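The improved brute force can be sketched as follows; the availability function used here is a made-up increasing concave placeholder, not the MDA-based approximation of Section 3.

```python
# Improved brute force: evaluate only allocations on the boundary of the
# feasible region (adding any further spare would exceed the budget C).
import itertools

def a_tot_toy(S, w=(0.5, 0.3, 0.2)):
    # increasing and concave in each stock size (placeholder for A_tot)
    return sum(wi * (1 - 0.5 ** (s + 1)) for wi, s in zip(w, S))

def brute_force(c, C):
    def on_boundary(S):
        cost = sum(ci * si for ci, si in zip(c, S))
        return cost <= C and all(cost + ci > C for ci in c)
    grid = itertools.product(*(range(C // ci + 1) for ci in c))
    best = max((S for S in grid if on_boundary(S)), key=a_tot_toy)
    return best, a_tot_toy(best)

S_best, A_best = brute_force(c=[1, 1, 2], C=4)
print(S_best)   # (2, 2, 0)
```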
In Zijm and Avsar [10], a greedy approximation procedure is given to find the
optimal allocation of stocks for an open two-indenture model. This method can
also be applied to closed two-echelon repairable item systems.
At the start of the heuristic algorithm no spares are allocated. One repeatedly allo-
cates one spare to the location that leads to the maximum increase in total availability
per unit of money invested, under the constraint that the allocation is feasible. The
heuristic continues as long as this maximum increase is positive; it can be presented
as follows:
Algorithm 6 Approximative optimization method (greedy approach)

1. (Initialization) Set Ŝ_ℓ = 0, for ℓ = 0, 1, . . . , L, and set Ĉ = 0.
2. (Repetition) Define ∆_ℓ for ℓ = 0, 1, . . . , L, by

   ∆_ℓ = [A_tot(Ŝ_0, . . . , Ŝ_ℓ + 1, . . . , Ŝ_L) − A_tot(Ŝ_0, . . . , Ŝ_ℓ, . . . , Ŝ_L)] / c_ℓ   if Ĉ + c_ℓ ≤ C,
   ∆_ℓ = 0,   otherwise.

   Let ℓ̂ = arg max_ℓ ∆_ℓ. If ∆_ℓ̂ ≤ 0 then stop; otherwise repeat this step after setting Ŝ_ℓ̂ = Ŝ_ℓ̂ + 1 and Ĉ = Ĉ + c_ℓ̂.
3. (Solution) The resulting stock allocation (Ŝ_0, Ŝ_1, . . . , Ŝ_L) is the approximative solution to the optimization problem.
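Heuristic 6 can be sketched as follows; as in the brute-force sketch, A_tot is replaced by a made-up increasing concave placeholder.

```python
# Greedy marginal allocation (Heuristic 6): repeatedly add the spare with the
# largest availability gain per unit of money, while the budget permits.
def a_tot_toy(S, w=(0.5, 0.3, 0.2)):
    return sum(wi * (1 - 0.5 ** (s + 1)) for wi, s in zip(w, S))

def greedy(c, C):
    S, spent = [0] * len(c), 0
    while True:
        base = a_tot_toy(S)
        deltas = []
        for l, cl in enumerate(c):
            if spent + cl <= C:                    # feasibility: budget left for l
                trial = S[:]
                trial[l] += 1
                deltas.append(((a_tot_toy(trial) - base) / cl, l))
        if not deltas or max(deltas)[0] <= 0:      # stop: no feasible positive gain
            return tuple(S), a_tot_toy(S)
        _, l = max(deltas)                         # best gain per unit cost
        S[l] += 1
        spent += c[l]

S_hat, A_hat = greedy(c=[1, 1, 2], C=4)
print(S_hat)   # (2, 2, 0) - for this toy instance the greedy allocation is optimal
```

Because the placeholder is strictly increasing, the heuristic only stops when no further spare fits in the budget, i.e. on the boundary of the feasible region.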
The greedy heuristic presented above builds on the observation that
Atot (S0 , . . . , SL ) tends to behave as an increasing multi-dimensional concave func-
tion, in particular for not too small values of Si , i = 1, . . . , L. This observation

Table 4. Optimal stock sizes for test problems

Problem base J_ℓ λ_ℓ µ_ℓ R_ℓ p_ℓ γ_ℓ c_ℓ C A_tot,bf S_ℓ,bf A_tot,greedy S_ℓ,greedy

1 depot 5 1 1 10 0.7513 2 0.7513 2


1 5 1 5 1 0.5 10 1 4 4
2 5 1 5 1 0.5 10 1 4 4
2 depot 5 1 1 20 0.8668 6 0.8662 7
1 5 1 5 1 0.5 10 1 7 7
2 5 1 5 1 0.5 10 1 7 6
3 depot 5 1 1 20 0.8328 9 0.8328 9
1 5 1 5 1 0.5 10 2 3 3
2 5 1 5 1 0.5 10 1 5 5
4 depot 5 1 1 20 0.7977 8 0.7977 8
1 5 1 5 1 0.5 10 2 3 3
2 5 1 5 1 0.5 10 2 3 3
5 depot 5 1 1 20 0.6487 4 0.6438 2
1 5 1 5 1 0.5 1 2 4 5
2 5 1 5 1 0.5 1 2 4 4
6 depot 5 2 1 20 0.9716 4 0.9716 4
1 5 1 5 1 0.5 10 2 4 4
2 7 1 5 2 0.5 10 2 4 4
7 depot 5 2 2 20 0.9987 0 0.9987 0
1 5 1 5 1 0.5 10 1 10 10
2 7 1 5 2 0.5 10 1 10 10
8 depot 5 3 1 20 0.9144 4 0.9144 4
1 10 1 5 2 0.5 10 2 4 4
2 10 1 5 2 0.5 10 2 4 4
9 depot 5 3 1 20 0.6327 6 0.6234 6
1 10 2 5 4 0.5 10 2 5 4
2 10 1 5 2 0.5 10 2 2 3
10 depot 3 2 1 20 0.6976 2 0.6976 2
1 3 1 3 1 0.2 1 2 3 3
2 7 1 3 2 0.8 1 2 6 6

of concavity is strongly supported by empirical evidence. In addition, we note that


in the uncapacitated case, a formal proof of the concavity of the availability func-
tion can be given, based on convexity properties of backorder probabilities as a
function of the base stock levels (see e.g. Rustenburg et al. [8]), at least when the
values of Si , i = 1, . . . , L, exceed certain (low) thresholds. In other words: a law
of diminishing added value is valid here, and is again very likely to hold in the
capacitated case as well. If Atot is an increasing function, the heuristic will stop
when the boundary of the feasible region is reached. In the next section the greedy
approach is numerically compared with the brute force approach.

In this section results are obtained for several test problems. The results ob-
tained by the brute force approach are compared to the results found by the greedy
approach. Even when the greedy approach gives a different allocation for spare
items, the total availability decreases only slightly.
In Table 4 several test problems are presented. The parameters in this case are
J_ℓ, the maximum number of working machines at base ℓ,
λ_ℓ, the breakdown rate of individual machines at base ℓ,
µ_ℓ, the repair rate of individual machines at base ℓ (or at the depot),
R_ℓ, the number of repairmen at base ℓ (or at the depot),
p_ℓ, the probability that a machine can be repaired at base ℓ,
γ_ℓ, the transportation rate to base ℓ,
c_ℓ, the costs to store an item at base ℓ (or at the depot),
C, the available budget for storing items.
Note that the maximal stock sizes S_0, S_1, . . . , S_L and the total availability A_tot are not given but computed, either by the brute force approach (A_tot,bf and S_ℓ,bf) or by Heuristic 6 (A_tot,greedy and S_ℓ,greedy). The numerical results indicate that the greedy approach yields good results.

5 Summary and possible extensions


In this paper we have analyzed a closed loop two-echelon repairable item system
with a fixed number of items circulating in the network. The system consists of
several bases and a central repair facility (depot). Each base consists of a production
cell and a base repair shop. There are transport lines leading from the depot to the
bases. Transport from bases to the depot is not taken into account. The repair shops
are modeled as multi-servers and the transport lines as ample servers. Repair shops
at the depot as well as at the bases are able to keep a number of ready-for-use
items in stock. Machines that have failed in the production cell of a certain base
are immediately replaced by a ready-for-use machine from that base’s stock, if
available. The failed machine is sent to either the base repair facility or to the depot
repair facility, in the latter case a spare machine is sent from the depot to the base,
to deplete the base’s stock of ready-for-use items. Once the machine at the depot is
repaired, it is added to the central stock. Orders are satisfied on a first-come-first-
served basis while any requirement that cannot be satisfied immediately either at a
base or at the depot is backlogged. In case of a backlog at a certain base, that base’s
production cell performs worse. This also means that the expected total rate at
which machines fail at the production cell is smaller than in the case of no backlog.
The exact analysis of a Markov chain model for this system with multiple bases
and many machines or with large inventories, is difficult to handle. Therefore, we
aggregated a number of states and adjusted some rates to obtain a special near-
product-form solution. The new system can be viewed as a Typical-server Closed
Queuing Network (TCQN). The notion typical comes from modeling the central
repair facility together with the synchronization queue, as a typical server with state
dependent service rates. These state dependent service rates follow from an appli-
cation of Norton’s theorem for Closed Queuing Networks. An adapted Multi-Class

Marginal Distribution Analysis algorithm is developed to compute the steady state


probabilities. From these steady state probabilities several performance measures
can be obtained, such as the availability and the expected number of machines op-
erating in the production cells. Numerical results show that the approximations are
extremely accurate, when compared to simulation results. The approximations are
used in an optimization heuristic to determine inventory levels at both the central
and local facilities with a maximal total availability under a cost constraint.
A disadvantage of the adapted Multi-Class Marginal Distribution Analysis al-
gorithm is the computational slowness. Especially for large systems with multiple
bases, many machines and large inventories, the algorithm is not very fast. Here,
further aggregation steps may speed up the system evaluation considerably, unfor-
tunately at the cost of some accuracy.
Furthermore, the model considered is already quite realistic. However, it could be made more realistic by including transport from the bases to the depot and by allowing for more complicated networks in the repair facilities. In the model described in
this paper, each repair shop is modeled as a multi-server. An interesting extension
to this, is to consider the repair facility to be a job shop and model it as a limited
capacity open queuing network, as has been done in [3] for the case of an open
multi-echelon repairable item system. Then, it is easy to include transport to the
depot repair facility as just an additional node in the job shop. Last but not least, it
is interesting to find a heuristic to optimize inventory levels at the central and local
facilities in combination with optimal repair capacities. This will be the subject of
future research.
References
1. Albright SC, Soni A (1988) Markovian multi-echelon repairable inventory system.
Naval Research Logistics 35(1): 49–61
2. Albright SC (1989) An approximation to the stationary distribution of a multieche-
lon repairable-item inventory system with finite sources and repair channels. Naval
Research Logistics 36(2): 179–195
3. Avsar ZM, Zijm WHM (2002) Capacitated two-echelon inventory models for repairable
item systems. In: Gershwin SB et al. (eds) Analysis and modeling of manufacturing
systems, pp 1–36. Kluwer, Boston
4. Buzacott JA, Shanthikumar JG (1993) Stochastic models of manufacturing systems.
Prentice-Hall, Englewood Cliffs, NJ
5. Gross D, Kioussis LC, Miller DR (1987) A network decomposition approach for ap-
proximate steady state behavior of Markovian multi-echelon repairable item inventory
systems. Management Science 33: 1453–1468
6. Harrison PG, Patel NM (1993) Performance modelling of communication networks
and computer architectures. Addison Wesley, New York
7. Law AM, Kelton WD (2000) Simulation modeling and analysis, 3rd edn. McGraw-Hill
Higher Education, Singapore
8. Rustenburg WD, van Houtum GJ, Zijm WHM (2000) Spare parts management for
technical systems: resupply of spare parts under limited budgets. IIE Transactions 32:
1013–1026
9. Sherbrooke CC (1968) METRIC: a multi-echelon technique for recoverable item con-
trol. Operations Research 16: 122–141
10. Zijm WHM, Avsar ZM (2003) Capacitated two-indenture models for repairable item
systems. International Journal of Production Economics 81–82: 573–588

Appendix

In this appendix, numerical results are given for various parameter settings in our
model. In most cases, the availability is high as desired in practical situations. In
Table 5 and Table 6 the focus is on the single base model. Multiple base models
are considered in Table 7 and Table 8.
Table 5. Results for the simple single base model, p1 = 0.5, λ1 = 1, µ0 = J1 , µ1 = J1

J_1 S_0 S_1 A exact A appr % dev Ej_1 exact Ej_1 appr % dev

3 1 0 0.5056 0.5100 0.8575 2.3178 2.3225 0.2037


3 3 0 0.5749 0.5771 0.3784 2.4338 2.4368 0.1227
3 5 0 0.5874 0.5880 0.1066 2.4544 2.4553 0.0379
3 1 1 0.7322 0.7340 0.2516 2.6331 2.6345 0.0545
3 3 1 0.7948 0.7961 0.1590 2.7264 2.7279 0.0531
3 5 1 0.8082 0.8087 0.0578 2.7463 2.7469 0.0217
3 1 3 0.9171 0.9172 0.0114 2.8875 2.8873 0.0061
3 3 3 0.9465 0.9466 0.0106 2.9287 2.9287 0.0005
3 5 3 0.9535 0.9536 0.0055 2.9385 2.9385 0.0014
3 1 4 0.9538 0.9538 0.0008 2.9376 2.9374 0.0058
3 3 4 0.9722 0.9722 0.0001 2.9630 2.9629 0.0022
3 5 4 0.9766 0.9766 0.0006 2.9691 2.9691 0.0003
5 1 0 0.4690 0.4722 0.6947 4.1654 4.1688 0.0817
5 3 0 0.5452 0.5470 0.3263 4.3224 4.3250 0.0595
5 5 0 0.5602 0.5607 0.0987 4.3529 4.3538 0.0209
5 1 1 0.7045 0.7059 0.2070 4.5407 4.5416 0.0187
5 3 1 0.7748 0.7758 0.1318 4.6643 4.6654 0.0237
5 5 1 0.7905 0.7909 0.0486 4.6915 4.6920 0.0108
5 1 3 0.9068 0.9069 0.0094 4.8573 4.8570 0.0059
5 3 3 0.9403 0.9404 0.0078 4.9111 4.9110 0.0016
5 5 3 0.9484 0.9484 0.0040 4.9240 4.9240 0.0001
5 1 4 0.9480 0.9480 0.0007 4.9207 4.9205 0.0045
5 3 4 0.9689 0.9689 0.0006 4.9537 4.9536 0.0022
5 5 4 0.9740 0.9740 0.0002 4.9617 4.9617 0.0005
10 1 0 0.4318 0.4339 0.4703 8.9658 8.9676 0.0206
10 3 0 0.5150 0.5162 0.2363 9.1819 9.1836 0.0182
10 5 0 0.5329 0.5333 0.0756 9.2279 9.2286 0.0073
10 1 1 0.6746 0.6756 0.1407 9.4175 9.4177 0.0023
10 3 1 0.7535 0.7542 0.0907 9.5842 9.5848 0.0057
10 5 1 0.7718 0.7721 0.0336 9.6225 9.6228 0.0031
10 1 3 0.8953 0.8953 0.0058 9.8165 9.8161 0.0036
10 3 3 0.9335 0.9335 0.0043 9.8880 9.8879 0.0017
10 5 3 0.9428 0.9428 0.0021 9.9054 9.9053 0.0004
10 1 4 0.9414 0.9414 0.0009 9.8980 9.8978 0.0024
10 3 4 0.9652 0.9652 0.0011 9.9415 9.9414 0.0015
10 5 4 0.9711 0.9711 0.0002 9.9522 9.9522 0.0004

Table 6. Results for the simple single base model, p1 = 0.25, λ1 = 1, µ0 = 2J1 , µ1 = J1

J_1 S_0 S_1 A exact A appr % dev Ej_1 exact Ej_1 appr % dev

3 1 0 0.5348 0.5383 0.6612 2.3402 2.3436 0.1475


3 3 0 0.6743 0.6783 0.5878 2.5726 2.5777 0.1978
3 5 0 0.7282 0.7310 0.3796 2.6619 2.6658 0.1468
3 1 1 0.7201 0.7208 0.0913 2.5951 2.5956 0.0176
3 3 1 0.8384 0.8394 0.1194 2.7746 2.7757 0.0405
3 5 1 0.8906 0.8914 0.0958 2.8537 2.8548 0.0381
3 1 3 0.8705 0.8705 0.0007 2.8110 2.8109 0.0007
3 3 3 0.9311 0.9311 0.0019 2.8999 2.8999 0.0001
3 5 3 0.9613 0.9613 0.0023 2.9442 2.9443 0.0007
3 1 4 0.9075 0.9075 0.0001 2.8649 2.8649 0.0003
3 3 4 0.9505 0.9505 0.0000 2.9278 2.9278 0.0002
3 5 4 0.9726 0.9726 0.0002 2.9602 2.9602 0.0000
5 1 0 0.4900 0.4923 0.4515 4.1493 4.1514 0.0483
5 3 0 0.6429 0.6455 0.4015 4.4641 4.4675 0.0762
5 5 0 0.7066 0.7085 0.2621 4.5946 4.5975 0.0620
5 1 1 0.6814 0.6818 0.0605 4.4558 4.4560 0.0042
5 3 1 0.8147 0.8154 0.0774 4.6983 4.6990 0.0142
5 5 1 0.8761 0.8767 0.0617 4.8098 4.8105 0.0149
5 1 3 0.8477 0.8477 0.0002 4.7371 4.7370 0.0005
5 3 3 0.9182 0.9182 0.0007 4.8597 4.8596 0.0003
5 5 3 0.9540 0.9540 0.0011 4.9219 4.9219 0.0001
5 1 4 0.8904 0.8904 0.0002 4.8106 4.8106 0.0002
5 3 4 0.9409 0.9409 0.0002 4.8980 4.8980 0.0002
5 5 4 0.9672 0.9672 0.0000 4.9436 4.9436 0.0001
10 1 0 0.4390 0.4401 0.2503 8.8481 8.8489 0.0094
10 3 0 0.6051 0.6064 0.2198 9.2890 9.2906 0.0176
10 5 0 0.6807 0.6817 0.1415 9.4891 9.4906 0.0158
10 1 1 0.6338 0.6340 0.0316 9.2282 9.2282 0.0000
10 3 1 0.7843 0.7846 0.0387 9.5703 9.5706 0.0024
10 5 1 0.8574 0.8576 0.0300 9.7364 9.7367 0.0032
10 1 3 0.8177 0.8177 0.0001 9.6112 9.6112 0.0003
10 3 3 0.9007 0.9007 0.0001 9.7898 9.7898 0.0002
10 5 3 0.9440 0.9440 0.0002 9.8828 9.8828 0.0001
10 1 4 0.8675 0.8675 0.0002 9.7170 9.7170 0.0001
10 3 4 0.9277 0.9277 0.0002 9.8460 9.8460 0.0001
10 5 4 0.9597 0.9597 0.0001 9.9146 9.9146 0.0001

Table 7. Parameter settings for test problems multi-base model with transportation (2)

Problem S_0 µ_0 R_0 ρ_0 (depot) Base J_ℓ S_ℓ λ_ℓ µ_ℓ R_ℓ p_ℓ γ_ℓ ρ_ℓ

21 5 10 1 0.5 1/2 5 5 1 5 1 0.5 10 0.5


22 3 10 1 0.8 1/2 5 2 1 2 1 0.2 ∞ 0.5
23 3 10 2 0.4 1/2 5 2 1 2 2 0.2 ∞ 0.25
24 3 10 2 0.4 1/2 5 2 1 2 2 0.2 5 0.25
25 2 5 1 1 1/2 5 1 1 5 1 0.5 ∞ 0.5
26 2 3 3 0.56 1/2 5 3 1 2 3 0.5 5 0.42
27 4 2 10 0.25 1/2 5 2 1 5 1 0.5 10 0.5
28 8 5 3 0.33 1/2 5 2 1 5 1 0.5 10 0.5
29 8 1 8 0.63 1/2 5 2 1 5 1 0.5 10 0.5
30 3 10 1 1.05 1/2 7 3 1 5 1 0.25 ∞ 0.35
31 3 10 1 0.75 1 5 1 1 5 1 0.5 ∞ 0.5
2 10 3 1 10 1 0.5 ∞ 0.5
32 3 5 1 0.68 1 2 1 1 2 1 0.5 ∞ 0.5
2 8 3 1 8 1 0.7 ∞ 0.7
33 1 10 1 0.6 1 5 2 1 5 1 0.5 ∞ 0.5
2 7 2 1 5 1 0.5 ∞ 0.7
34 8 5 3 0.5 1/2/3 5 2 1 5 1 0.5 10 0.5
35 1 4 8 0.23 1/2/3 5 1 1 2 3 0.5 10 0.42
36 3 4 8 0.25 1 2 1 2 3 1 0.5 5 0.67
2 5 1 1 2 3 0.5 10 0.42
3 7 1 1 5 3 0.5 10 0.23
37 5 3 7 0.9 1 7 5 1 3 3 0.5 10 0.39
2 7 5 2 3 3 0.2 10 0.31
3 7 5 3 3 7 0.8 10 0.8
38 5 5 2 1.05 1 7 0 1 5 2 0.5 10 0.35
2 7 5 1 5 2 0.5 10 0.35
3 7 10 1 5 2 0.5 10 0.35
39 2 5 2 0.45 1 3 2 1 5 1 0.5 5 0.3
2 3 2 1 5 2 0.5 5 0.15
3 3 2 1 5 3 0.5 5 0.1
40 2 5 4 0.5 1/2/3/4 5 2 1 5 2 0.5 10 0.25

Table 8. Results for test problems from Table 7

Problem Base ℓ A_ℓ sim A_ℓ appr % dev Ej_ℓ sim Ej_ℓ appr % dev

21 1/2 (0.9826,0.9854) 0.9840 0.00 (4.9737,4.9792) 4.9765 0.00


22 1/2 (0.8151,0.8266) 0.8192 0.20 (4.7062,4.7304) 4.7129 0.11
23 1/2 (0.9720,0.9742) 0.9731 0.01 (4.9650,4.9690) 4.9669 0.00
24 1/2 (0.8559,0.8607) 0.8563 0.23 (4.8108,4.8181) 4.8118 0.05
25 1/2 (0.5462,0.5522) 0.5526 0.61 (4.1962,4.2112) 4.2061 0.06
26 1/2 (0.8487,0.8510) 0.8493 0.06 (4.7788,4.7833) 4.7804 0.01
27 1/2 (0.8567,0.8614) 0.8594 0.04 (4.7882,4.7982) 4.7931 0.00
28 1/2 (0.8704,0.8752) 0.8714 0.16 (4.8103,4.8195) 4.8113 0.08
29 1/2 (0.8526,0.8606) 0.8555 0.13 (4.7798,4.7945) 4.7851 0.04
30 1/2 (0.6480,0.6813) 0.6608 0.58 (6.2491,6.3370) 6.2806 0.20
31 1 (0.7250,0.7325) 0.7305 0.25 (4.5786,4.5933) 4.5884 0.05
2 (0.8776,0.8871) 0.8813 0.12 (9.7689,9.7944) 9.7783 0.03
32 1 (0.7985,0.8068) 0.8019 0.09 (1.7560,1.7676) 1.7607 0.06
2 (0.7977,0.8020) 0.7994 0.05 (7.6077,7.6190) 7.6096 0.05
33 1 (0.8511,0.8587) 0.8561 0.14 (4.7765,4.7885) 4.7849 0.05
2 (0.6898,0.6971) 0.6933 0.02 (6.4198,6.4366) 6.4237 0.07
34 1/2 (0.8676,0.8718) 0.8711 0.25 (4.8051,4.8128) 4.8109 0.08
35 1/2/3 (0.5041,0.5130) 0.5109 0.46 (4.2436,4.2618) 4.2558 0.07
36 1 (0.6456,0.6538) 0.6525 0.43 (1.5538,1.5658) 1.5638 0.26
2 (0.5754,0.5821) 0.5790 0.04 (4.3828,4.3952) 4.3883 0.02
3 (0.7056,0.7089) 0.7070 0.04 (6.5989,6.6031) 6.6016 0.01
37 1 (0.9577,0.9617) 0.9599 0.02 (6.9352,6.9433) 6.9400 0.01
2 (0.7820,0.7903) 0.7859 0.03 (6.5683,6.5911) 6.5778 0.03
3 (0.4492,0.4542) 0.4510 0.15 (5.8151,5.8303) 5.8196 0.05
38 1 (0.1745,0.1807) 0.1766 0.59 (5.0709,5.1143) 5.0859 0.13
2 (0.8530,0.8624) 0.8575 0.03 (6.7166,6.7389) 6.7280 0.00
3 (0.9845,0.9862) 0.9848 0.05 (6.9731,6.9760) 6.9738 0.01
39 1 (0.9430,0.9450) 0.9443 0.04 (2.9308,2.9337) 2.9326 0.01
2 (0.9649,0.9671) 0.9670 0.10 (2.9595,2.9624) 2.9622 0.04
3 (0.9674,0.9686) 0.9686 0.07 (2.9627,2.9644) 2.9644 0.03
40 1/2/3/4 (0.9250,0.9273) 0.9268 0.07 (4.9047,4.9083) 4.9074 0.02
A heuristic to control integrated multi-product
multi-machine production-inventory systems
with job shop routings and stochastic arrival,
set-up and processing times
P.L.M. Van Nyen1 , J.W.M. Bertrand1 , H.P.G. Van Ooijen1 ,
and N.J. Vandaele2
1
Technische Universiteit Eindhoven, Department of Technology Management,
Den Dolech 2, Pav. F-14, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
(e-mail: {p.v.nyen,j.w.m.bertrand,h.p.g.v.ooijen}@tm.tue.nl)
2
University of Antwerp, Antwerpen, Belgium
(e-mail: [email protected])

Abstract. This paper investigates a multi-product multi-machine production-


inventory system, characterized by job shop routings and stochastic demand in-
terarrival times, set-up times and processing times. The inventory points and the
production system are controlled integrally by a centralized decision maker. We
present a heuristic that minimizes the relevant costs by making near-optimal pro-
duction and inventory control decisions while target customer service levels are
satisfied. The heuristic is tested in an extensive simulation study and the results are
discussed.

Keywords: Production-inventory system – Queueing network analyser – Produc-


tion control – Inventory control – Performance analysis

1 Introduction

This paper addresses the problem of determining optimal inventory and production
control decisions for an integrated production-inventory (PI) system in which mul-
tiple products are made-to-stock through a functionally oriented shop. As can be
seen in Figure 1, inventory is carried to service customer demand. The customers
require that their orders are serviced with a target fill rate. Unsatisfied demand is
backordered. Customers arrive according to a renewal process that is characterised
by interarrival times with a known mean and squared coefficient of variation (scv).

The authors would like to thank two anonymous referees and the editor for many valuable
suggestions.
254 P.L.M. Van Nyen et al.

Fig. 1. Integrated production-inventory system with job shop routings

The inventory points generate replenishment orders that are, in this integrated PI
system, equivalent to production orders. There is a fixed cost incurred every time a
production order is generated. In this paper, all inventory points are controlled using
periodic review, order-up-to (R, S) inventory policies. Other inventory policies can
be embedded in the framework, if desired. The production orders are manufactured
through the shop. We assume ample supply of raw material. The production system
consists of multiple functionally oriented workcenters through which a consider-
able number of different products can be produced. Each of the products can have
a specific serial sequence of production steps, which results in a job shop routing
structure. The production orders for different products compete for capacity at the
different workcenters, where they are processed in order of arrival (FCFS priority).
Before starting the production of an order, a set-up that takes a certain time and cost
is performed. The total production time of a production order depends on its size.
When the production of the entire order is finished, it is moved to the inventory point
where the products are temporarily stored until they are requested by a customer.
We consider situations in which the average demand for end products is relatively
high and stationary. Since the production system is characterized by considerable
set-up times and costs, the products are produced in batches. We assume that a
centralized decision maker controls both the inventory points and the production
system. Then, the objective of the centralized decision maker is to minimize the
sum of fixed set-up costs, work-in-process holding costs and final inventory holding
costs. Moreover, we impose that customer demand has to be serviced with a target
fill rate. The decision variables that can be influenced by the decision maker are the
review periods and the order-up-to levels of the products.
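At each review epoch, the periodic review, order-up-to (R, S) rule described above reduces to raising the inventory position (on hand plus on order minus backorders) to the level S. A minimal sketch, with illustrative numbers:

```python
# Replenishment (= production) order size under an order-up-to (R, S) policy:
# at a review epoch, order the gap between S and the inventory position.
def review_order(inventory_position, S):
    return max(S - inventory_position, 0)

print(review_order(12, S=20))   # 8: a production order of size 8 is released
print(review_order(25, S=20))   # 0: position already above S, no order
```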
Typically, integrated PI systems with job shop routing structure and stochastic
demand interarrival times, set-up times and processing times can be found in metal
or woodworking companies, e.g. the suppliers of the automotive or aircraft building
industry. In particular, the advent of Vendor Managed Inventory (VMI) has forced
manufacturing companies to integrate production and inventory decisions.
It appears that it is impossible to analyse this integrated PI system exactly and
consequently, it is very difficult (if possible) to solve the optimization problem faced
Control of multi-product multi-machine production-inventory systems 255

by the centralized decision maker to optimality. Therefore, we propose a heuristic


method that allows us to determine the near-optimal production and inventory con-
trol decisions. To this end, we apply and integrate aspects from production control
and inventory theory. We follow the basic idea of Zipkin [34]: we use standard
inventory models to represent the inventory points of every product and standard
results on open queueing networks to represent the production system. Next, we link
these submodels together appropriately. Our model differs from Zipkin’s model in
several ways. Firstly, we consider periodic review (R, S) policies, instead of con-
tinuous review (s, Q) policies. Secondly, we use the fill rate as measure of customer
service, instead of a backorder cost. Thirdly, our multi-workcenter production fa-
cility is modeled as a general open queueing network, instead of an open Jackson
network. The general open queueing network model allows us to account more
accurately for the effect of batching on the arrival and production processes. In
our heuristic, the production capacity is explicitly modeled because the inventory
control and the production control are interrelated and both depend on the production
capacity. The inventory policy determines the review periods, which in turn
determine the order throughput times in the production system. Based on these
throughput times, the safety stock can be set at the inventory points. This reasoning
shows how the production and inventory system form one integrated system. If
both subsystems are controlled in isolation, a sequential control approach can be
used (see e.g. [24]). The review periods are set without assessing the impact on the
production system (e.g. using the economic order quantity). Next, one observes
the throughput times that result from the selected review periods. Based on the ob-
served throughput times, the safety stocks can be set. The costs resulting from this
sequential decision-making approach typically exceed the costs of the integrated
control approach since treating the subsystems in isolation leads to suboptimality.
The integrated production-inventory control approach proposed in this paper
is a simplified version of the hierarchical production control approach advocated
by Lambrecht et al. [16] for the control of real-life make-to-order job shops. Their
hierarchical control approach consists of three important decisions: (i) lot sizing
decisions; (ii) determination of the release dates of production orders; and (iii)
sequencing decisions. The control approach proposed in this paper focuses on the
decisions at the highest level of the control hierarchy: the lot sizing decisions. In
our control approach, the lot sizes are determined by setting the review periods
for all products in the PI system. Note that the production system is characterized
by considerable set-up times. Therefore, by setting the review periods the decision
maker allocates the available production capacity to the different products. We
propose an approximate analytical model that allows determining the review periods
that minimize the total relevant costs. The approximate analytical model is an
extension of the lot sizing procedure based on open queueing networks developed
in [16]. Their lot sizing procedure is used to minimize the expected lead times in a
make-to-order job shop. We propose an approximate analytical model that takes into
account many of the characteristics of the integrated PI system. Most importantly, it
explicitly deals with the interaction between the production orders for the different
products in the job shop, assuming a FCFS priority rule for the sequencing of the
production orders. In this way, the approximate analytical model takes into account
256 P.L.M. Van Nyen et al.
the influence of the review periods on the capacity utilization of the workcenters and
on the order throughput times (thus dealing with the multi-product aspect of the PI
system under study). Similarly to [13, 15] and [16], we observe a convex relationship
between the review periods and the order throughput times. The control decisions
situated at the lower level of the hierarchical framework of Lambrecht et al. [16],
the determination of the release dates and the sequencing decisions, are dealt with
in a straightforward way. Firstly, the production orders are released immediately.
Secondly, the sequencing of all orders is done using the FCFS priority rule. The use
of the FCFS rule allows for the queueing network to be analyzed using standard
queueing network analysers. However, from production control theory it is known
that other sequencing policies (priority rules or scheduling methods) may lead to
substantial time and cost savings. An overview of priority rules and scheduling
methods can be found in [21]. We believe that sequencing policies, other than the
FCFS priority rule, can be embedded in the hierarchical control framework in the
same way as Lambrecht et al. [16] incorporate the shifted bottleneck procedure [1]
for the scheduling of production orders. Observe that by choosing reasonable – not
necessarily optimal – review periods or lot sizes, the sequencing decisions situated
lower in the control hierarchy become significantly easier to make.
To the best of our knowledge, this paper is the only research that studies the
specific problem of minimizing the total relevant costs in integrated multi-product
multi-machine PI systems with job shop routing structure and stochastic demand
interarrival times, set-up times and processing times. For related problems, however,
solution methods have been developed. First, for the deterministic version of this
problem, Ouenniche et al. [19, 20] present heuristics that are based on cyclical
production plans. The cost-optimal cyclical plans are generated using mathematical
programming techniques. We believe that it is not possible to apply the results
of Ouenniche et al. directly in our setting because the influence of variability in
the demand and manufacturing processes makes the proposed production plans
infeasible.
A second relevant contribution is the literature review on the stochastic lot
scheduling problem (SLSP) by Sox et al. [25]. This paper gives an extensive
overview of current research on the control of multi-product PI systems in which
the production system consists of a single workcenter. The critical assumption in
this body of research is that the performance of the production system is deter-
mined by a single bottleneck process. Although this assumption may be valid in
some situations, it is certainly not in others. The heuristic proposed in this paper
explicitly considers situations in which the production system consists of multiple
workcenters.
Thirdly, Benjaafar et al. [4, 5] study managerial issues related to PI systems,
such as the effect of product variety and the benefits of pooling. Benjaafar et al.
[4] propose a method to jointly optimize batch sizes and base-stock levels for
a multi-product single workcenter integrated PI system controlled by continuous
review (s, Q) policies.
Fourthly, Amin and Altiok [3] and Altiok and Shiue [2] study production
allocation issues in multi-product PI systems. More specifically, they address such
issues as “when to switch production to a new product” and “what product to switch
to”. They propose to handle the first issue with a continuous review inventory
control policy. The second issue is resolved by using switching policies that are
based on priority structures. Amin and Altiok [3] use simulation to compare a
number of manufacturing strategies and switching policies for a serial multi-stage
production system with finite interstage buffer space. Altiok and Shiue [2] develop
an approximate analytical model to compute the average inventory levels under
different priority schemes for a single machine production system.
Fifthly, Rubio and Wein [22] study a PI system where the production system
is modeled as a Jackson queueing network. Under this assumption, they derive a
formula characterizing the optimal base stock level. The main difference with our
approach is that they do not have batching issues in their model because of the
absence of set-up times and set-up costs. Note that it is precisely the batching issue
that makes the problem very hard to solve.
As a sixth contribution, we mention the work of Lambrecht et al. [16] who
study the control of a stochastic make-to-order job shop. They describe a lot sizing
procedure that minimizes the expected lead times and thus, the expected work-in-
process costs. To this end, they model the production environment as a general
open queueing network. Vandaele et al. [28] successfully implemented the method
described in Lambrecht et al. [16] to solve lot sizing problems with the medium-
sized metal working company Spicer Off-Highway. Their research shows that the lot
sizing procedure is capable of solving realistic, large-scale problems. Our research
builds on the work of Lambrecht et al. We extend their make-to-order model by
including inventory points so that we obtain an integrated PI system in which
products are made-to-stock instead of made-to-order.
Seventhly, Bowman and Muckstadt [7] present a production control approach
for cyclically scheduled production systems that are characterized by demand and
process variability. The production management can delay the release of material
to the floor and increase production in a cycle to anticipate on demand and to avoid
overtime in future cycles. The control approach uses estimates for the cycle time
and the task criticality that are obtained from a Markov chain model. Using these
estimates, the control approach seeks a trade-off between inventory holding costs
and overtime costs.
Finally, Liberopoulos and Dallery [18] propose a unified modeling framework
for describing and comparing several production-inventory control policies with lot
sizing. The control method used in this paper is based on (R, S) inventory policies
and is comparable to the Reorder Point Policies (RPPs) described in [18]. More
insights on RPPs can be found in their paper. Our work differs from their work
in several aspects. First, they study an N-stage serial PI system through which a
single product is manufactured while we focus on a single stage PI system in which
multiple products are produced. Secondly, Liberopoulos and Dallery use queueing
network representations to define (not analyze or optimize) several production-
inventory control policies that decide when to place and release replenishment
orders at each stage. Our work focuses on a (R, S) inventory rule based control
policy for which we not only define the control policy in place, but also present a
heuristic to make the production and inventory control decisions that minimize the
relevant costs.
The remainder of this paper is organized as follows: Section 2 presents the for-
mal problem statement; Section 3 proposes a heuristic to determine review periods
and order-up-to levels that minimize the relevant costs; in Section 4 the perfor-
mance of the heuristic is tested in an extensive simulation study; in Section 5, the
results of the simulation study are discussed; and finally, Section 6 summarizes the
major findings of our paper.

2 Formal problem statement

First, we introduce the notation used in this paper. Then, we derive formulas for
the different cost components. After this, a formal problem definition is given.

2.1 Notation

General input variables:

– P : number of products;
– M : number of workcenters in the production system;
– ADi : interarrival time of demand for product i (stochastic variable);
– αi∗ : target fill rate for product i.

Cost related input variables:

– ci : fixed set-up cost incurred for one production order of product i;
– vij : echelon value of one item of product i at workcenter j;
– vi : end value of product i;
– r : inventory holding cost per unit of time.

Tactical control variables:

– Ri : review period of product i;
– Si : order-up-to level of product i;
– ssi : safety stock for product i.

Performance related output variables:

– Tij : throughput time of production orders for product i at workcenter j (stochastic variable);
– αi : realized fill rate for product i.

Mathematical operators:

– E[.] : expectation of a stochastic variable;
– σ²[.] : variance of a stochastic variable;
– c²[.] : squared coefficient of variation of a stochastic variable.
Control of multi-product multi-machine production-inventory systems 259

2.2 Modeling the cost components

In a periodic review policy, a replenishment order for product i is placed every Ri
time units. Consequently, the number of orders placed per time unit is given by
Ri⁻¹, so that the total set-up costs per time unit for product i are given by:

SCi = ci Ri⁻¹.

We use Little's law to compute that the average number of items of product i at
workcenter j equals E[Tij]/E[ADi]. Multiplying the average work-in-process at a machine
with the holding cost and summing over all machines leads to the total work-in-
process cost for product i:

WIPCi = Σ_{j=1}^{M} vij r E[Tij]/E[ADi].

The final inventory cost for product i is given by the formula below [24]. The term
between brackets gives the average amount of final inventory at inventory point i,
which consists of half the average order quantity plus the safety stock:

FICi = ( Ri/(2E[ADi]) + ssi ) vi r.

The total cost for product i is simply the sum of its components. Clearly, the total
cost TC for the whole PI system is given by the sum over all products of the total
costs for each product:

TC = Σ_{i=1}^{P} (SCi + WIPCi + FICi)    (1)
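Putting the components together, the cost model above can be sketched in a few lines of Python. This is an illustration only: all numerical values are hypothetical, and in practice E[Tij] and ssi come from the queueing and inventory analysis of Section 3.

```python
# Sketch of the cost model of Section 2.2 (hypothetical parameter values).

def total_cost_product(c_i, R_i, E_AD_i, E_T_ij, v_ij, v_i, ss_i, r):
    """Total relevant cost per time unit for one product i.

    c_i     : fixed set-up cost per production order
    R_i     : review period
    E_AD_i  : mean demand interarrival time E[ADi]
    E_T_ij  : mean throughput times E[Tij], one entry per workcenter j
    v_ij    : echelon values of product i at each workcenter j
    v_i     : end value of product i
    ss_i    : safety stock
    r       : inventory holding cost per unit of time
    """
    SC_i = c_i / R_i                                   # set-up cost, ci * Ri^-1
    # Little's law: average WIP at workcenter j is E[Tij] / E[ADi].
    WIPC_i = sum(v * r * t / E_AD_i for v, t in zip(v_ij, E_T_ij))
    # Cycle stock (half an average order) plus safety stock, at end value.
    FIC_i = (R_i / (2.0 * E_AD_i) + ss_i) * v_i * r
    return SC_i + WIPC_i + FIC_i

# Example with made-up numbers: 5.0 + 0.52 + 1.8 = 7.32 per time unit.
tc = total_cost_product(c_i=50.0, R_i=10.0, E_AD_i=0.5,
                        E_T_ij=[2.0, 3.0], v_ij=[4.0, 6.0],
                        v_i=10.0, ss_i=8.0, r=0.01)
```

The total cost TC of equation (1) is then the sum of this quantity over all P products.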

2.3 Formal problem statement

In this system, we have one centralized decision maker who wants to minimize the
total costs of the PI system. As stated in the introduction, the total costs consist
of set-up costs, final inventory holding costs and work-in-process holding costs.
The decision maker has to ensure that the target fill rates are satisfied and that
the review periods are positive. Consequently, the mathematical formulation of the
optimization problem can be stated as:

min_{Ri} Σ_{i=1}^{P} [ ci Ri⁻¹ + Σ_{j=1}^{M} vij r E[Tij]/E[ADi] + ( Ri/(2E[ADi]) + ssi ) vi r ]    (2)

subject to:
1. αi ≥ αi∗ for i = 1, ..., P
2. Ri > 0 for i = 1, ..., P
Observe that one can easily compute most of the cost components if the review
periods for all the products are given. However, two variables – the throughput time
in the workcenters Tij and the safety stock ssi – cannot be computed analytically.
In order to find an expression for the throughput times Tij in the production system,
a general open queueing network should be solved. Unfortunately, no exact results
for the throughput times in such a queueing system are known. Consequently, it
is also impossible to find an exact expression for the safety stock ssi since ssi
depends on the average and the variance of the throughput times. In conclusion, it
is impossible to derive exact expressions for these variables and this implies that
our objective function is analytically intractable. Therefore, we have to rely on
estimates to evaluate the cost of a given solution.

3 Heuristic to determine review periods and order-up-to levels

In this section, we present a three-phase heuristic that allows finding near-optimal
review periods and order-up-to levels. The heuristic is based on an integrated view
of the inventory and production system and takes into account all relevant costs,
including work-in-process and safety stock costs. Moreover, the heuristic simul-
taneously considers cost and capacity aspects. This results in a solution that is
near-optimal in terms of costs and feasible with respect to production capacity.
Given that the exact analytical evaluation of the objective function is mathe-
matically intractable, we have to use estimation methods to evaluate and optimize
the objective function. Two estimation methods can be used to estimate the relevant
costs in the PI system under study: simulation and approximate analytical models.
Simulation is an accurate estimation method and therefore, simulation-based op-
timization techniques can be used to accurately solve the optimization problem.
For details on simulation-based optimization, see e.g. [17]. Unfortunately, these
techniques are very expensive in terms of computation time. This often prohibits
the use of simulation based optimization techniques, even for medium-sized prob-
lems. Alternatively, it is possible to optimize our problem using an approximate
analytical model. The main advantage of an approximate analytical model is the
low amount of computation time required. Obviously, the price of the gain in speed
is a certain degree of inaccuracy. To have the best of both worlds, we propose a
three-phase heuristic that combines an approximate analytical model with simu-
lation techniques. The simulation techniques are used to circumvent some of the
inaccuracies due to the approximate analytical model.
Our heuristic is presented graphically in Figure 2. In the optimization phase,
the heuristic determines near-optimal review periods and initial order-up-to levels
based on an approximate analytical model. The approximate analytical model is
designed so that it captures the most essential characteristics of the PI system while
it can be optimized fast using a greedy search algorithm. Unfortunately, the use of
an approximate model may result in suboptimal control decisions. Also, the initial
order-up-to levels computed by the approximate model may be insufficient to meet
the target fill rates. Therefore, the second phase of the heuristic uses simulation
techniques to fine-tune the order-up-to levels. Finally, in the third phase of the
heuristic, the costs and operational characteristics (fill rates, throughput times, etc.)
are estimated accurately with simulation. The remainder of this section discusses
the three phases of the heuristic in more detail.
Fig. 2. Outline of three-phase heuristic

3.1 Optimization phase

In the first phase of the heuristic, near-optimal tactical inventory and production
control decisions are determined. The optimization tool consists of two main ele-
ments: (i) an analytical model that approximates the relevant costs given a vector
of review periods and (ii) a greedy search method that finds the vector of review
periods that minimizes the relevant costs. In the subsections below, both elements
are described in more detail.
The approximate analytical model follows the same basic idea as Zipkin [34].
First, we use a standard inventory model to represent the (R, S) inventory policy
of every product, see e.g. [24]. Next, we use standard results on open queueing
networks to represent the production system. Our open queueing network is solved
using the Queueing Network Analyser developed by Whitt [32]. Similarly to [16],
the expressions for the performance measures of the queueing network are written
as a function of the production lot size. Finally, we link both submodels together
using concepts from renewal theory; see [8]. The resulting analytical model is an
approximation to the real PI system. Similarly to the approach proposed by Zipkin,
we sacrifice accuracy for the sake of simplicity and computational tractability.

3.1.1 Approximate analytical model. In this section we present an approximate
analytical model to estimate the relevant costs in the PI system under study, given
a set of review periods R = (R1 , ..., Ri , ..., RP ). The analytical model consists of
four successive steps.
Step 1: Determine characteristics of production orders
We start by analysing the generation of replenishment orders by the inventory
points. Note that in the PI system under study, generating a replenishment order at
an inventory point results in placing a production order to the production system. By
analysing the characteristics of the replenishment orders, we therefore implicitly
analyse the characteristics of the production orders that arrive to the production
system. In our approximation model, we focus on two main characteristics of the
production orders: the time between the arrivals of two successive production orders
of product i, referred to as the interarrival time APi, and the processing time of a
production order for product i on machine j, denoted as PijP. We limit ourselves to
the determination of the expectation E[.] and variance σ 2 [.] of the interarrival times
and the processing times. In the case of an (Ri , Si )-inventory policy, a production
order of variable size is generated at every review moment, i.e. every Ri time units.
Therefore, the expectation and variance of the interarrival times of production orders
are given by: E[APi] = Ri and σ²[APi] = 0. The production orders for product
i are of variable size, which we denote here by Ni. By applying limiting results
from renewal theory [8] we obtain that the number of arrivals Ni in a review period
Ri is approximately normally distributed with mean E[Ni] = Ri E⁻¹[ADi] and
variance σ²[Ni] ≈ Ri c²[ADi] E⁻¹[ADi]. Note that the normal approximation for
the number of arrivals in a review period is only justified if the review period is
relatively long or the arrival rate is relatively high, since there must be a significant
number of arrivals within a review period. Since we are concerned with situations
in which the average demand for end products is relatively high, the use of the
normal approximation is acceptable here.
From these expressions, we can derive the mean and variance of the production
order processing times, but first we introduce some additional notation:
Pij : processing time of one item of product i at machine j;
Lij : set-up time of production orders of product i at machine j.
The expected processing time of an order of product i on machine j is given
by the expected total processing time plus the expected set-up time, i.e.
E[PijP] = Ri E⁻¹[ADi] E[Pij] + E[Lij]. We assume that the processing times of single units
are independent and identically distributed (i.i.d.) and independent of the set-up
time. Then, the variance of the net processing times, excluding set-up time, equals
the variance of the sum of a variable number of variable processing times, which
can be computed with a formula given by e.g. [24]. Consequently, the variance of
the processing times of the orders of product i at machine j is given by:

σ²[PijP] = E[Ni] σ²[Pij] + E²[Pij] σ²[Ni] + σ²[Lij]
         = Ri E⁻¹[ADi] σ²[Pij] + E²[Pij] Ri c²[ADi] E⁻¹[ADi] + σ²[Lij]    (3)
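Under the stated assumptions, Step 1 reduces to a handful of moment calculations. A sketch (parameter values are hypothetical; APi and PijP denote the order interarrival and order processing times introduced above):

```python
def order_process_moments(R_i, E_AD, c2_AD, E_P, var_P, E_L, var_L):
    """Step 1 sketch: first two moments of the production-order arrival
    and processing processes of product i at machine j.

    R_i    : review period (an order is placed every R_i time units)
    E_AD   : mean demand interarrival time E[ADi]
    c2_AD  : squared coefficient of variation c2[ADi]
    E_P    : mean unit processing time E[Pij]
    var_P  : variance of the unit processing time
    E_L    : mean set-up time E[Lij]
    var_L  : variance of the set-up time
    """
    # Order interarrival times are deterministic under periodic review.
    E_AP, var_AP = R_i, 0.0
    # Number of demands Ni per review period (renewal-theory limits;
    # approximately normal when many demands fall in one review period).
    E_N = R_i / E_AD
    var_N = R_i * c2_AD / E_AD
    # Order processing time: Ni unit processing times plus one set-up, eq. (3).
    E_PP = E_N * E_P + E_L
    var_PP = E_N * var_P + E_P ** 2 * var_N + var_L
    return E_AP, var_AP, E_PP, var_PP
```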

Step 2: Analyse queueing network
The second step in the approximate analytical model uses the characterization of
the production orders to compute performance measures of the production system.
Based on the characterization of the production orders, we can model the production
system as a general open queueing network in which the arrival and production pro-
cesses of the orders have known first and second moments. From the late seventies
on, extensive research has been executed on the estimation of performance mea-
sures in such queueing systems. Our procedure is based on the queueing network
analyser developed by Whitt [32]. The lot sizing procedure proposed in Lambrecht
et al. [16] is also based on Whitt [32]. However, our approach differs from the work
of Lambrecht et al. in two ways: (i) they use a simplified expression for the scv of
the aggregated arrival process, whereas we use Whitt’s original approximation; (ii)
we use an improved expression for the scv of the departure processes that is due to
[33], whereas Lambrecht et al. use an adapted version of Shantikumar and Buzacott
[23]. Van Nyen et al. [29] present simulation results on the estimation performance
of the queueing network analyzer, which indicate that serious estimation errors
may occur. The queueing network analyser allows us to find approximations for
the expectation and variance of the throughput times of product i at the different
machines j in the production system, i.e. E[Tij ] and σ 2 [Tij ]. In order to approx-
imate the throughput times in the entire production system, Whitt [32] assumes
that the throughput times at different machines are independent. Then, the expectation
and variance of the total throughput time of a production order are given by:
E[Ti] = Σ_{j=1}^{M} E[Tij] and σ²[Ti] = Σ_{j=1}^{M} σ²[Tij]. For more details on the use of
the queueing network analyser, see [26], [30] or [31].
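Whitt's full queueing network analyser is too long to reproduce here, but its per-workcenter building block — an approximate mean throughput time at a GI/G/1 station from the first two moments of the interarrival and processing times — can be sketched. The formula below is the Kraemer and Langenbach-Belz refinement commonly used in such analysers; it is an illustrative stand-in, not necessarily the exact expression of [32]:

```python
import math

def gg1_mean_throughput_time(E_A, c2_A, E_S, c2_S):
    """Approximate mean throughput time (waiting + service) at a single
    GI/G/1 workcenter (Kraemer & Langenbach-Belz approximation).

    E_A, c2_A : mean and scv of the order interarrival times
    E_S, c2_S : mean and scv of the order processing times
    """
    rho = E_S / E_A
    assert 0.0 < rho < 1.0, "workcenter must not be overloaded"
    if c2_A + c2_S == 0.0:
        return E_S  # D/D/1 with rho < 1: no waiting at all
    # M/M/1 waiting time scaled by the variability term (c2_A + c2_S)/2 ...
    Wq = (rho / (1.0 - rho)) * E_S * (c2_A + c2_S) / 2.0
    # ... with the KLB correction factor g.
    if c2_A < 1.0:
        g = math.exp(-2.0 * (1.0 - rho) * (1.0 - c2_A) ** 2
                     / (3.0 * rho * (c2_A + c2_S)))
    else:
        g = math.exp(-(1.0 - rho) * (c2_A - 1.0) / (c2_A + 4.0 * c2_S))
    return g * Wq + E_S
```

For Poisson arrivals and exponential service (c2_A = c2_S = 1) the expression collapses to the exact M/M/1 result, which is a useful sanity check.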
Step 3: Calculate order-up-to levels and safety stocks
In the third step of the approximate analytical model, we determine the order-up-to
levels S = (S1 , ..., Si , ..., SP ) using standard inventory theory. The reorder level
is set such that the customer demand is satisfied with a target fill rate αi∗. We need a
characterisation of the customer demand Di(Ti+Ri) during the throughput time Ti and
the review period Ri to determine the appropriate order-up-to level. Note that the
customer demand Di(Ti+Ri) is related to the time between successive demand arrivals
ADi. Using renewal theory, we obtain expressions for E[Di(Ti+Ri)] and σ[Di(Ti+Ri)].
See [31] for more details.
Then, the order-up-to level Si can be determined by:

Si = E[Di(Ti+Ri)] + ki σ[Di(Ti+Ri)]    (4)

where ki is the so-called safety factor for product i that depends on the target fill
rate αi∗ . Given a target fill rate, [24] presents a very accurate rational approximation
for ki in the case of normally distributed demand. Finally, the safety stock ssi for
product i can be computed as: ssi = ki σ[Di(Ti+Ri)]. This step results in a vector of
order-up-to levels S = (S1 , ..., Si , ..., SP ) that correspond to the vector of review
periods R = (R1 , ..., Ri , ..., RP ).
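Step 3 can be sketched as follows. Instead of the rational approximation of [24] for the safety factor ki, this illustration solves the standard fill-rate condition σ·G(k) = (1 − αi∗)·(mean demand per review cycle) numerically, where G is the standard normal loss function; the demand over Ti + Ri is assumed normal with known moments. This is a stand-in for, not a reproduction of, the authors' exact procedure.

```python
import math

def normal_loss(k):
    """Standard normal loss function G(k) = phi(k) - k * (1 - Phi(k))."""
    phi = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(k / math.sqrt(2.0)))
    return phi - k * (1.0 - Phi)

def order_up_to_level(mu_D, sigma_D, mean_demand_per_cycle, alpha_target):
    """Step 3 sketch: order-up-to level S_i = E[D] + k * sigma[D].

    mu_D, sigma_D         : moments of demand over Ti + Ri (assumed normal)
    mean_demand_per_cycle : mean demand during one review period Ri
    alpha_target          : target fill rate

    k is chosen so that the expected shortfall per cycle, sigma_D * G(k),
    equals the allowed fraction (1 - alpha) of the mean demand per cycle.
    """
    target_G = (1.0 - alpha_target) * mean_demand_per_cycle / sigma_D
    lo, hi = -4.0, 6.0  # bisection: G(k) is strictly decreasing in k
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normal_loss(mid) > target_G:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return mu_D + k * sigma_D, k
```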
Step 4: Estimate costs
In the previous steps, we presented an approach to approximate the expected
throughput times E[Tij ] and the safety stocks ssi , given a vector R =
(R1 , ..., Ri , ..., RP ). Using the expressions for the different cost components given
in Section 2.1 we can now compute the total costs T C corresponding to a solution R.

3.1.2 Optimization of tactical control parameters. In the previous subsection, we
presented an approximate analytical model to estimate the cost of a given set of
review periods R = (R1 , ..., Ri , ..., RP ). In this subsection, we try to find the vec-
tor of review periods R∗ = (R1∗, ..., Ri∗, ..., RP∗) that minimizes the total relevant
costs. Unfortunately, we cannot prove the unimodality of the total cost function in
terms of the review periods R. However, extensive tests allow us to postulate that
the objective function, when estimated with the approximate analytical model, is
unimodal in the review periods. We ground this unimodality-postulate on the ob-
servation that (i) using a greedy search algorithm with different starting solutions
always resulted in the same final solution; (ii) using simulated annealing, a general
optimization technique for non-convex functions (see e.g. [9]), never resulted in a
better solution than the greedy search algorithm. Moreover, our postulate is consis-
tent with [27] where it is postulated that the expected throughput times are convex
in the lot sizes.
Based on the unimodality-postulate, we propose to use a simple greedy search
algorithm for the minimization of the relevant costs, called univariant search paral-
lel to the axes. This approach fixes all review periods but one and performs a direct
search along this variable until the minimum of the objective function in the current
direction has been found. This minimum is then used as the starting point for the
next iteration. Again, all review periods but one are fixed and a direct search is
performed. This process is repeated until the value of the objective function cannot
be further improved. The final solution R∗ cannot be improved in any direction
parallel to the axes and is the solution proposed by the greedy search heuristic.
The performance of the greedy search heuristic has been tested against a simulated
annealing algorithm. For all instances in the test bed, the greedy search algorithm
outperformed simulated annealing.
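The univariant search described above can be sketched as follows; the cost function stands in for the approximate analytical model, and the step size and starting point are arbitrary illustrative choices. The positivity constraint Ri > 0 from the problem statement is enforced directly in the line search.

```python
def univariant_search(cost, R0, step=100.0, max_iter=100):
    """Greedy 'univariant search parallel to the axes' (sketch).

    cost : maps a tuple of review periods to an estimated total cost
           (in the paper: the approximate analytical model)
    R0   : initial vector of review periods

    Fixes all review periods but one, line-searches that coordinate while
    the cost improves, and repeats until no single-coordinate move helps.
    """
    R = list(R0)
    best = cost(tuple(R))
    for _ in range(max_iter):
        improved = False
        for i in range(len(R)):
            for direction in (step, -step):
                while R[i] + direction > 0:  # keep review periods positive
                    cand = R[:]
                    cand[i] += direction
                    c = cost(tuple(cand))
                    if c < best - 1e-12:
                        R, best, improved = cand, c, True
                    else:
                        break
        if not improved:
            break
    return tuple(R), best
```

On a unimodal cost function — the situation postulated above — this search stops only at a point that cannot be improved along any axis.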

3.2 Tuning phase

In the first phase of the heuristic, we use an approximate analytical model to de-
termine the near-optimal review periods R∗ and initial settings for the order-up-to
levels S . Because of the use of approximations, the realized fill rates may be lower

than the target fill rates. Given the constraints on the fill rate in the formal problem
statement this may result in the infeasibility of the solution. Also, it may happen
that the realized fill rates are higher than the target fill rates. Obviously, this leads
to a solution that is unnecessarily expensive. For these reasons, we add a second
step to the heuristic in which the order-up-levels are fine-tuned.
In this second step of the heuristic, we use a procedure proposed by Gudum and
de Kok [10]. Their procedure builds on the following observation. Given that the
inventory points are controlled by periodic review (R, S) policies with full backo-
rdering, the size and the timing of replenishment orders is determined by the review
periods only. In an integrated PI system, this implies that the arrivals of production
orders to the production system, as well as the processing times of the production
orders at the different workcenters are determined by the review intervals and not
by the order-up-to-levels. Therefore, the throughput times of the replenishment
orders are completely determined by the review intervals. This also implies that
a change in the order-up-to levels only influences the customer service levels. In
conclusion, this observation states that a given selection of the review periods fully
determines the stochastic behavior of the PI system and that the order-up-to levels
can be adjusted to achieve a certain customer service level without affecting the
behavior of the PI system. For more details on this observation, see [10].
In our heuristic, the fine-tuning phase starts by simulating the initial solution
to estimate the realized fill rates corresponding to the solution (R*, S). Based on
a trace of the inventory levels in the first simulation run, a procedure developed in
[10] is used to fine-tune the order-up-to levels. Note that we do not adjust the review
periods. The procedure makes use of the observation above, stating that changes in
the order-up-to levels only influence the fill rates. This procedure allows us to set the
order-up-to levels to the lowest possible levels that satisfy the fill rate constraints,
denoted as S∗ . This results in the solution (R∗ , S∗ ). From a computational point
of view, the procedure developed in [10] is attractive because it uses only one
simulation run instead of iterative simulation runs.
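A minimal sketch of this idea — not Gudum and de Kok's exact procedure — exploits the shift property directly: raising the order-up-to level by Δ raises every net-stock observation in the trace by Δ without changing order timing, so one simulated trace can be replayed for candidate values of Δ. The trace format assumed below (demand size and net stock just before each demand epoch) is a simplification for illustration.

```python
def fine_tune_order_up_to(S0, demands, net_stock_before, alpha_target, step=1.0):
    """Tuning-phase sketch: lowest order-up-to level meeting the fill rate.

    S0               : order-up-to level used in the simulation run
    demands          : demand size at each demand epoch in the trace
    net_stock_before : net stock just before each demand (simulated under S0)
    """
    def fill_rate(delta):
        # Shifting S by delta shifts every net-stock observation by delta.
        served = sum(min(d, max(0.0, s + delta))
                     for d, s in zip(demands, net_stock_before))
        return served / sum(demands)

    # Walk delta up until feasible, then back down to the smallest
    # feasible level, in increments of 'step'.
    delta = 0.0
    while fill_rate(delta) < alpha_target:
        delta += step
    while delta - step >= -S0 and fill_rate(delta - step) >= alpha_target:
        delta -= step
    return S0 + delta
```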

3.3 Estimation phase

In the third and final phase of the heuristic, a simulation experiment is used to
estimate the costs and operational performance characteristics corresponding to
the solution (R∗ , S∗ ) . Simulation is used because of its estimation accuracy. All
the costs that are listed in Section 2.2 are estimated. The operational performance
characteristics that are estimated include the fill rates, the throughput times at the
different workcenters and the utilization of the workcenters.

4 Testing the heuristic in a simulation study

The heuristic uses an approximate analytical model to determine the tactical pro-
duction and inventory decisions, which may result in suboptimal decisions. In this
section we test the performance of our heuristic in an extensive simulation study.
A specific problem instance is studied in detail in order to gain understanding of

the mechanisms embedded in the optimization tool. Furthermore, we compare our
heuristic to two simulation based optimization methods. Since simulation based
optimization techniques allow for the accurate optimization of the objective func-
tion, the comparison enables us to assess the quality of our heuristic. This section
is organized as follows. We first present the experimental design of our simulation
study. Then, a specific problem instance is studied in detail. Finally, we compare
the performance of our heuristic with the simulation based optimization methods.

4.1 Experimental design of the simulation study

In the simulation study, we consider an integrated PI system with 10 products and
5 workcenters. We assume that the customer demands arrive according to a Pois-
son process. Furthermore, the set-up times and processing times are exponentially
distributed, leading to phase-type production order processing times. This assump-
tion allows incorporating all kinds of variability that are present in real production
systems: operator influences, workcenter defects, etc. The parameter values in our
experimental design are based on data from two medium-sized metalworking com-
panies.
In the simulation study, we vary four factors over several levels:
– net utilization of the workcenters ρnet (0.65, 0.75, 0.85);
– set-up times Lij (randomly generated in the intervals [30, 60] min. or [90, 180]
min.);
– set-up costs ci (randomly generated in the intervals €[0, 0], €[6.67, 13.33],
€[20, 40] or €[60, 120]);
– target fill rates αi∗ (0.90, 0.98).
The number of combinations that can be made with the levels of the four factors
equals 3×2×4×2 = 48 combinations. We use a procedure presented in Appendix
1 to generate five random instances for each combination of the levels of the four
factors. Therefore, the total simulation study consists of 5 × 48 = 240 instances.
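The counting above is easy to reproduce; the snippet below enumerates the full-factorial grid (interval labels are shorthand for the randomly generated values):

```python
# Enumerate the full-factorial design: 3 x 2 x 4 x 2 = 48 factor-level
# combinations, each replicated by 5 randomly generated instances.
from itertools import product

rho_net = [0.65, 0.75, 0.85]
setup_times = ["30-60", "90-180"]                      # minutes
setup_costs = ["0", "6.67-13.33", "20-40", "60-120"]   # euro
fill_rates = [0.90, 0.98]

combos = list(product(rho_net, setup_times, setup_costs, fill_rates))
instances = [(combo, rep) for combo in combos for rep in range(5)]
print(len(combos), len(instances))   # 48 240
```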
In order to reduce the computation time required for the optimization phase
of the heuristic, we restrict the value of the review periods to multiples of 100
minutes. Since the objective function is flat around the optimum, this restriction
has a negligible impact on the total cost of a solution. In the second and third step
of our heuristic, we use a simulation model that is built in Simula. Simula is a
general-purpose simulation language, for more details see [6]. We use the batch-
means method with 10 subruns to find performance estimates. The length of the
subruns is chosen so that at least 100,000 customer orders for each product arrive.
The review moments are initialized by letting them start at a random moment in the
interval [0, Ri ] for i = 1, ..., P . This ensures that no special patterns are built into
the order generation process and into the arrival process of orders to the production
system.
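The batch-means estimator mentioned above is simple to sketch. The trace below is synthetic rather than simulator output, and the Student-t quantile t_{0.95,9} ≈ 1.833 for a 90% interval with 10 batches is hardcoded as a stated assumption.

```python
# Minimal batch-means sketch: split one long output trace into 10 subruns,
# average within each, and build a 90% confidence interval from the batch
# means. t_{0.95,9} ~= 1.833 is hardcoded to avoid dependencies.
from statistics import mean, stdev

def batch_means(trace, n_batches=10, t_quantile=1.833):
    size = len(trace) // n_batches
    batches = [mean(trace[k * size:(k + 1) * size]) for k in range(n_batches)]
    half_width = t_quantile * stdev(batches) / (n_batches ** 0.5)
    return mean(batches), half_width

obs = [float(x % 7) for x in range(1000)]   # stand-in for a simulation trace
m, hw = batch_means(obs)
print(round(m, 3), round(hw, 3))
```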

4.2 Mechanisms embedded in the approximate analytical model

In this section, we discuss how the approximate analytical model uses the review
periods to minimize the total relevant costs. The mechanisms behind the selection
of the review periods are illustrated by comparing the detailed output of the opti-
mization tool with the output of a simple heuristic method for setting the review
periods. More specifically, we use the economic order quantity (EOQ) expressed
as a time supply to set the review periods, see e.g. [24]:
R_i = \sqrt{ \frac{2\, c_i\, E[A_i^D]}{v_i\, r} } \qquad (5)
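Eq. (5) is straightforward to compute; the sketch below uses made-up parameter values (set-up cost c_i, mean demand interarrival time E[A_i^D], unit value v_i, holding rate r), not values from the paper's instances.

```python
# Uncapacitated EOQ review period, Eq. (5): R_i = sqrt(2 c_i E[A_i^D] / (v_i r)).
from math import sqrt

def eoq_review_period(setup_cost, mean_interarrival, unit_value, holding_rate):
    return sqrt(2.0 * setup_cost * mean_interarrival / (unit_value * holding_rate))

# illustrative: 10 euro set-up, one demand per 50 min., 4 euro unit value,
# holding rate 0.0001 per minute
print(round(eoq_review_period(10.0, 50.0, 4.0, 1e-4), 1))   # 1581.1 minutes
```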
The EOQ formula completely ignores the impact of the lot sizing decision on the
production system. Hence, it does not take into account the costs that are related
to capacity utilization and throughput times, i.e. work-in-process and safety stock
costs. We use the uncapacitated EOQ approach to solve one random set of 48
problem instances of the experimental design. Below, the results for all 48 problem
instances are summarized, but first we study in detail one specific problem instance.
This problem instance is selected because it clearly demonstrates how our heuristic
works. In this way, the reader can gain understanding of the mechanisms that
are embedded in the heuristic. The selected problem instance is characterized by:
ρnet = 0.85; Lij ∈ [90, 180] min.; ci ∈ €[6.67, 13.33]; αi∗ = 0.98.
In Tables 1 to 3, we present the detailed output of the simulation of the
heuristic (HEU) and the uncapacitated EOQ method for this problem instance.
From the analysis of the decision variables and the corresponding performance
measures, we learn how the optimization tool works and how it tries to achieve the
minimal total relevant costs. Table 1 displays the decision variables (review periods
and order-up-to levels) and the resulting throughput times, characterized by their
expectation E[T] and standard deviation σ[T ]. It can be seen from Table 1 that
the heuristic proposes considerably smaller review periods than the uncapacitated
EOQ method. The order-up-to levels are lowered accordingly. Note that the
review periods of the different products are decreased in a non-proportional way
in order to account for the specific processing characteristics of every product. The
impact of the smaller review periods on the expectation and standard deviation of
the throughput times is high: the expected throughput times decrease by 36.5% on
average while the standard deviations of the throughput times decrease by 42.4%
on average.
Table 2 shows the impact of the changes in the decision variables on the relevant
costs. Since the solution of the heuristic uses considerably smaller review periods,
the set-up costs are substantially higher compared to the uncapacitated EOQ solu-
tion (+58.4%). However, since the review periods chosen by the heuristic lead to
shorter and more reliable throughput times, the work-in-process costs and the final
inventory holding costs decline significantly (−35.8% and −36.5%). Overall, this
leads to a cost decrease realized by the heuristic versus the uncapacitated EOQ
approach of 9.0%.
Finally, Table 3 gives insight into the mechanisms embedded in the optimization
tool. We use elementary insights from queueing theory to illustrate the trade-offs

Table 1. Decision variables and throughput times for one problem instance: heuristic vs.
uncapacitated EOQ method

Prod. HEU EOQ

R  S  E[T]  σ[T]  R  S  E[T]  σ[T]

1 3700 665 3784.2 653.1 8157 1326 6718.3 1362.6


2 4800 608 3844.4 865.7 6178 827 5385.9 1307.7
3 4600 657 5107.2 974.5 7449 1028 7812.7 1574.1
4 5400 536 2291.0 628.9 6359 679 3062.3 1012.5
5 5000 807 3639.5 857.0 5703 1092 5424.3 1481.8
6 4000 860 5028.4 923.8 7605 1490 8259.6 1539.9
7 5100 561 2781.4 782.0 7534 825 4045.0 1185.1
8 7000 574 2422.7 673.6 7289 703 3461.8 1329.2
9 4300 662 4763.3 852.6 7000 1061 7518.3 1472.9
10 3700 411 3455.2 694.8 8095 846 6780.5 1452.2

Avg. 4760 634.1 3711.7 790.6 7136.9 987.7 5846.9 1371.8

Table 2. Cost components for one problem instance: heuristic vs. uncapacitated EOQ
method

Prod. HEU EOQ

OC FIC WIPC OC FIC WIPC

1 8867.4 2526.5 2337.8 4021.7 5475.6 4183.7


2 3790.3 3230.4 2183.1 2945.1 4416.5 3079.6
3 5571.8 3242.3 2867.6 3440.8 5123.0 4410.6
4 3277.1 2849.1 1257.7 2783.0 3681.2 1681.3
5 4325.4 4521.5 2909.1 3792.2 6161.7 4257.8
6 8224.6 3579.1 3793.2 4325.6 6332.5 6192.0
7 4709.4 2842.7 1409.5 3188.0 4203.9 2050.2
8 3393.5 3581.2 1340.6 3259.2 4523.1 1960.0
9 4820.3 2670.3 2648.0 2960.6 4331.3 4224.1
10 6096.1 1877.1 1555.8 2786.5 3929.4 3055.4

Tot. 53076.0 30920.2 22302.3 33502.6 48178.1 35094.6

Overall 106298.4 116775.2

that are made by the heuristic. From elementary queueing theoretical results for
the GI/G/1 queue, e.g. Hopp and Spearman [12], we learn that there are four main
elements affecting the expectation of the throughput times E[T] on the machines:
(i) utilization ρ; (ii) variation of the arrivals c_a^2; (iii) average processing time t_p;

Table 3. Operational characteristics of production system for one problem instance: heuristic
vs. uncapacitated EOQ method

Mach.nr. HEU EOQ

ρ  c_a^2  t_p  c_p^2  ρ  c_a^2  t_p  c_p^2

1 0.91 0.7015 582.5 0.046 0.89 0.6735 867.7 0.082


2 0.93 0.6834 547.9 0.047 0.90 0.7526 807.2 0.082
3 0.89 0.6146 1022.6 0.024 0.88 0.5964 1563.9 0.050
4 0.90 0.4963 806.3 0.045 0.88 0.6533 1256.1 0.180
5 0.92 0.691 636.2 0.053 0.89 0.7967 1070.7 0.133

Avg. 0.91 0.637 719.1 0.043 0.89 0.695 1113.12 0.105

(iv) variation of the processing times c_p^2. This insight is based on the Kingman
approximation for the expectation of the throughput times in a GI/G/1 queue [14]:

 
E[T] = \left( \frac{c_a^2 + c_p^2}{2} \right) \left( \frac{\rho}{1 - \rho} \right) t_p + t_p \qquad (6)
From Table 3, it can be seen that the optimization tool adapts the review periods
so that the variation of the arrivals and the expectation and the variation of the
processing times are reduced. However, this happens at the expense of increased
utilization levels. We may conclude that the optimization tool ‘harmonizes’ the
review periods of the different products so as to obtain the best balance between
utilization and variability.
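Eq. (6) transcribes directly into code. The M/M/1 case (c_a^2 = c_p^2 = 1) is a convenient sanity check, since the approximation is then exact: E[T] = t_p/(1 − ρ).

```python
# Kingman approximation, Eq. (6): expected time in a GI/G/1 station
# (waiting plus service).
def kingman_throughput_time(rho, ca2, tp, cp2):
    wait = ((ca2 + cp2) / 2.0) * (rho / (1.0 - rho)) * tp
    return wait + tp

# M/M/1 sanity check: rho = 0.8, tp = 10 should give 10 / 0.2 = 50
print(round(kingman_throughput_time(0.8, 1.0, 10.0, 1.0), 6))   # 50.0
```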
In the job shop production system under study, the departure process of a ma-
chine is the arrival process to the next machine in the routing of a product. An
elementary approximation, due to Hopp and Spearman [12], for the scv of the
departure process leaving a queue is:
 
c_d^2 = \rho^2 c_p^2 + (1 - \rho^2)\, c_a^2 \qquad (7)
From this approximation, it can be observed that when the utilization of the ma-
chines is high, it is important to achieve low variation in the processing times in
order to obtain an arrival process with low variability to the next machine. Table
3 shows that the heuristic realizes a low variation in the processing times, while
the utilization levels are high (around 90%). From Table 1, it can be seen that the
actions taken by the heuristic, based on the mechanisms presented above, result
in shorter and less variable throughput times. Note that the mechanisms described
above are embedded in the proposed approximate analytical model using advanced
queueing theoretical results developed in [32, 33].
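Because the departure stream of one machine is the arrival stream of the next, Eq. (7) can be chained along a routing. The station parameters below are illustrative (they mimic the magnitudes in Table 3 but are not taken from it); the point is that at utilizations close to 1, the downstream arrival scv is pulled toward the low processing-time scv.

```python
# Eq. (7): scv of the departure process, c_d^2 = rho^2 c_p^2 + (1 - rho^2) c_a^2,
# chained along a serial routing.
def departure_scv(rho, cp2, ca2):
    return rho ** 2 * cp2 + (1.0 - rho ** 2) * ca2

ca2 = 1.0   # Poisson arrivals to the first station
for rho, cp2 in [(0.91, 0.05), (0.93, 0.05), (0.89, 0.02)]:
    ca2 = departure_scv(rho, cp2, ca2)   # departures feed the next station
print(round(ca2, 4))   # far below 1: arrival variability has been absorbed
```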
Now, we briefly present the results for the 48 instances that were solved using
the uncapacitated EOQ method. In 14 out of 48 problem instances, the uncapacitated EOQ approach resulted in a solution that is infeasible with respect to
production capacity. For the 34 feasible instances, the uncapacitated EOQ solution
is on average 5.2% more expensive than the solution of the heuristic. The maxi-
mum cost increase reported on this set of experiments is 10.1%. The conclusion of

these experiments is that the uncapacitated EOQ method may work relatively well
compared to the heuristic, but since the uncapacitated EOQ approach does not take
into account capacity issues, it may result in unnecessarily expensive solutions or
in solutions that are infeasible with respect to production capacity (and that require
capacity expansion in the form of overwork, outsourcing, etc.).
To avoid obtaining infeasible solutions, one can add capacity restrictions to the uncapacitated EOQ method. Doing so, we obtain the following mathematical programming problem, which we call the capacitated EOQ approach:
\min_{R} \; \sum_{i=1}^{P} \left( \frac{c_i}{R_i} + \frac{v_i\, r}{2\, E[A_i^D]}\, R_i \right)

subject to:

1. \; \sum_{i=1}^{P} \left( \frac{E[P_{ij}]}{E[A_i^D]} + \frac{E[L_{ij}]}{R_i} \right) \le \rho_j^{max} \quad \text{for } j = 1, \dots, M \qquad (8)

2. \; R_i > 0 \quad \text{for } i = 1, \dots, P
The objective function of this mathematical program is identical to the cost function
of the uncapacitated EOQ method. The first set of constraints imposes that the
machine utilization must be lower than a maximum allowable utilization level ρ_j^max. The second set of constraints states that the review periods should be strictly
positive. Note that the objective function and the constraints are convex in the review
periods Ri . This convex programming problem can easily be solved to optimality
using the commercially available CONOPT algorithm. The CONOPT algorithm
attempts to find a local optimum satisfying the Karush-Kuhn-Tucker conditions. It
is well known that for convex programming problems a local optimum is also the
global optimum (see e.g. [11]).
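The paper solves program (8) with CONOPT. The sketch below is not that solver: it computes the unconstrained EOQ review periods and then, if a utilization constraint is violated, scales all review periods up by the smallest uniform factor that restores feasibility. This yields a feasible point (binding the tightest constraint) but not necessarily the KKT-optimal one; all data are illustrative.

```python
from math import sqrt

def capacitated_eoq(c, EA, v, r, EP, EL, rho_max):
    """Sketch of program (8), NOT the CONOPT solution: take the unconstrained
    EOQ periods R_i = sqrt(2 c_i E[A_i]/(v_i r)) and, if some machine's
    utilization constraint is violated, scale all periods up by the smallest
    uniform factor restoring feasibility (feasible, not necessarily optimal)."""
    P, M = len(c), len(rho_max)
    R = [sqrt(2.0 * c[i] * EA[i] / (v[i] * r)) for i in range(P)]
    scale = 1.0
    for j in range(M):
        net = sum(EP[i][j] / EA[i] for i in range(P))    # processing load
        setup = sum(EL[i][j] / R[i] for i in range(P))   # set-up load at R
        assert net < rho_max[j], "infeasible even without set-ups"
        if setup > 0.0:
            scale = max(scale, setup / (rho_max[j] - net))
    return [scale * Ri for Ri in R]

# two products, one machine; all numbers illustrative
R = capacitated_eoq(c=[10.0, 8.0], EA=[50.0, 60.0], v=[4.0, 5.0], r=1e-4,
                    EP=[[20.0], [25.0]], EL=[[120.0], [100.0]], rho_max=[0.95])
print([round(x, 1) for x in R])
```

In this example the unconstrained periods would load the machine to about 0.96, so both periods are stretched until utilization equals the cap of 0.95.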
The main difficulty that arises with this capacitated EOQ method is the choice of the maximum allowable utilization level ρ_j^max. For deterministic problems, ρ_j^max is usually chosen so that all production capacity is utilized, i.e. ρ_j^max = 1. Clearly, in stochastic settings ρ_j^max should be lower than 1 for reasons of stability. It is, however, not obvious how the precise value of ρ_j^max should be chosen. If ρ_j^max is chosen too low, this leads to long review periods (in order to reduce the capacity utilization due to set-ups) and thus to high cycle stocks. Conversely, if ρ_j^max is chosen too high, this results in high congestion, large amounts of work-in-process, long throughput times and high safety stocks. A priori, the EOQ model cannot predict which value of ρ_j^max leads to the lowest total relevant costs. Therefore, in our experiments we vary ρ_j^max over a range of reasonable values and observe the resulting total relevant costs.
We use the capacitated EOQ method to solve the 48 problem instances that were also solved using the uncapacitated EOQ method. The value of ρ_j^max is set to 0.90, 0.95 and 0.99. Let us define the deviation in the total costs between the capacitated EOQ method and our heuristic as:
\Delta = \frac{TC^{eoq} - TC^{heu}}{TC^{heu}} \times 100\%.

Table 4. Summary statistics for ∆, the relative deviation in total costs between the capacitated
EOQ method and the proposed heuristic (instances with set-up costs > 0)

ρ_j^max = 0.90    ρ_j^max = 0.95    ρ_j^max = 0.99

min. 1.9 1.9 1.9


ρnet = 0.65 avg. 3.7 3.7 3.7
max. 5.4 5.4 5.4

min. 1.8 1.7 1.7


ρnet = 0.75 avg. 5.0 5.0 5.0
max. 7.5 7.5 7.5

min. 5.7 4.2 3.8


ρnet = 0.85 avg. 25.2 7.8 6.7
max. 80.2 10.7 10.1

Table 5. Summary statistics for ∆, the relative deviation in total costs between the capacitated
EOQ method and the proposed heuristic (instances with set-up costs = 0)

ρ_j^max = 0.90    ρ_j^max = 0.95    ρ_j^max = 0.99

min. 3.3 7.1 121.2


ρnet = 0.65 avg. 3.8 9.0 157.6
max. 4.6 11.0 224.5

min. 17.5 1.8 71.5


ρnet = 0.75 avg. 19.2 3.1 84.5
max. 21.2 4.5 96.2

min. 106.8 15.0 11.7


ρnet = 0.85 avg. 114.6 17.9 13.8
max. 119.1 20.4 16.0

Tables 4 and 5 give the minimum, average and maximum of ∆, respectively for the
instances with set-up costs > 0 and the instances with set-up costs = 0. The results
are shown for the different levels of the net utilization of the machines ρnet .
The results in Tables 4 and 5 show that the proposed heuristic always outper-
forms the capacitated EOQ method. The capacitated EOQ approach may work
reasonably well, provided that a good choice is made for ρ_j^max: the lowest ∆ observed in this set of instances is 1.7%. However, one can also observe that an inappropriate choice of ρ_j^max may result in very poor performance: the maximum of ∆ in this set of instances is 224.5%. As mentioned before, the EOQ approach does not provide any guideline for choosing the value of ρ_j^max.
For the instances with set-up costs > 0 and ρ_j^max = 0.99, the performance of
the capacitated EOQ method seems reasonable: the average of ∆ is 5.1% with a

maximum of 10.1%. It appears that in the majority of these instances the capacity constraints are non-binding so that the solution of the capacitated EOQ method is identical to the solution of the uncapacitated EOQ method. When ρ_j^max is lowered, the performance of the capacitated EOQ method degrades for the instances with high ρnet. When ρnet = 0.85 and ρ_j^max = 0.90, the average of ∆ is 25.2% with a maximum of 80.2%. For the majority of the instances with low and moderate ρnet, the capacity constraints are also non-binding when ρ_j^max = 0.90 and 0.95. Therefore, in these instances the performance of the capacitated EOQ method is similar to that of the uncapacitated EOQ method and the capacitated EOQ method with ρ_j^max = 0.99.
For the instances with set-up costs = 0, it seems even more important to select the appropriate value of ρ_j^max than in the case of set-up costs > 0. For example, when ρnet = 0.65, the capacitated EOQ method works well when ρ_j^max = 0.90: the average of ∆ is 3.8% with a maximum of 4.6%. If ρ_j^max is chosen too high, the performance of the capacitated EOQ method degrades strongly: for ρ_j^max = 0.99, the average of ∆ is 157.6% with a maximum of 224.5%. Similar results hold for the instances with ρnet = 0.75. For the instances with ρnet = 0.85, the capacitated EOQ approach performs rather poorly for all choices of ρ_j^max: the minimum of ∆ reported on this set of instances is 11.7%.
The main conclusion from these experiments is that the capacitated EOQ approach is very sensitive to the choice of ρ_j^max. Since the appropriate value of ρ_j^max depends on the specific characteristics of the problem instance, it is difficult to develop a general rule of thumb for selecting ρ_j^max. The heuristic proposed in this
paper does not suffer from this problem. The approximate analytical model embed-
ded in the heuristic explicitly models the impact of the review periods on capacity
utilization and on congestion phenomena, taking into consideration the specific
characteristics of the problem instance. Therefore, our heuristic is a more robust
and reliable method to set the decision variables.

4.3 Testing the quality of the heuristic

In this section, we test the optimization quality of the heuristic. The methodology
used for this test warrants some discussion. First, note that the optimal solution
for the problem under study is unknown. Furthermore, no high-quality bounds on
the optimal costs are available, mainly due to the difficulty of finding bounds on the
waiting times in the production system. Moreover, to the best of our knowledge, no
other control approaches have been developed for the integrated PI system with job
shop routings and stochastic arrival, processing and set-up times. Consequently, the
heuristic solution cannot be compared to the true optimum, nor to a good bound,
nor to another control approach reported in the literature. In short, there exists no
good benchmark to test the quality of our heuristic. Therefore, we constructed our
own benchmarks in order to test the performance of the heuristic.
First, we test the prediction quality of the approximate analytical model. If the
prediction quality of the approximate model is satisfactory, then one may expect
that the optimization quality of the tool is good. However, when the approximate
analytical model wrongly estimates the absolute value of the costs, but correctly

Fig. 3. Frequency diagram for the relative difference between the total costs of the approximate analytical model (AAM) and simulation (SIM)

captures the relative behavior of the costs in function of the review periods, the
optimization process may still perform well. In Figure 3, we present the relative
difference between the cost estimates of the approximate analytical model (AAM)
and simulation (SIM) for the solutions proposed by the heuristic for the 240 in-
stances. Since for optimization purposes mainly the absolute value of the relative
deviation is relevant, we group the positive and negative intervals with the same
absolute value. The frequency of negative and positive values of the relative devi-
ation is shown in distinctive colors. From Figure 3, it can be seen that the relative
approximation error is rather small, lying in the range of −17% to 12% on this set
of 240 experiments. From this figure, it also can be seen that for the vast majority
of the instances (more than 92%) the absolute relative approximation error is lower
than 10%. Negative relative differences occur, but in more than 70% of the cases
the relative error is positive. From these results, we conclude that the estimation
quality of the approximate analytical model is satisfactory.
Secondly, in order to test the optimization quality of the heuristic we develop
two simulation based optimization methods to solve several instances of our opti-
mization problem. Simulation based optimization is known to be an accurate but
time-consuming optimization method, see e.g. Law and Kelton [17]. The perfor-
mance of the simulation based optimization methods is compared to the perfor-
mance of our heuristic in terms of the optimization quality as well as the required
computation time.
We use two different simulation based optimization methods:
– A modification of the greedy search algorithm presented in Section 3.1.2. We
use three different step sizes to perform the search along the axes. For each of
the step sizes, we use the greedy search algorithm. The solution of one phase
is used as an input for the next phase. The step sizes for the review periods are
2500, 500 and 100 minutes.
– OptQuest, a commercially available software package developed by Glover
et al. OptQuest combines elements of scatter search, taboo search and neural
networks to find solutions for non-convex optimization problems [17]. We limit
the review periods to the interval [0.5, 2] times the review periods R∗ proposed

by our heuristic. Moreover, we use a step size of 100 minutes for the review
periods.
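The multi-phase greedy search can be sketched as coordinate descent with the decreasing step-size schedule from the text (2500, 500, 100 minutes). In the paper each candidate vector R is evaluated by simulation; here a cheap convex stand-in cost function keeps the example self-contained, so the cost rates are purely illustrative.

```python
# Sketch of the multi-phase greedy (coordinate) search over review periods.
# The paper evaluates each candidate R by simulation; `cost` below is a
# cheap convex stand-in so the example is self-contained.
def cost(R):
    setup, hold = [10.0, 8.0], [0.004, 0.005]   # illustrative cost rates
    return sum(c / r + h * r / 2.0 for c, r, h in zip(setup, R, hold))

def greedy_search(R, steps=(2500.0, 500.0, 100.0)):
    best = cost(R)
    for step in steps:                    # phases with decreasing step size
        improved = True
        while improved:                   # keep moving while any axis improves
            improved = False
            for i in range(len(R)):
                for delta in (step, -step):
                    cand = R[:]
                    cand[i] += delta
                    if cand[i] > 0 and cost(cand) < best:
                        R, best, improved = cand, cost(cand), True
    return R, best

R, tc = greedy_search([5000.0, 5000.0])
print([round(x) for x in R], round(tc, 4))
```

As in the text, the solution of one phase seeds the next; with this stand-in cost both review periods descend from 5000 to 100 minutes on the finest grid.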
Both optimization techniques suggest new vectors R that need to be evaluated
using simulation. In our research, we use the second and the third phase of our
heuristic to evaluate the vector R. The evaluation of a vector R consists of a
tuning phase, see Section 3.2, in which the correct order-up-to levels are computed
based on a simulation run. Next, the total cost of the solution R is estimated using
a second simulation run in the evaluation phase presented in Section 3.3.
In order to reduce the computation time required, we use the solution of the
heuristic as initial solution. The simulation based optimization methods are then
used to improve this initial solution. Furthermore, we limit the number of simulation
subruns to 5. However, even with these measures, the simulation based optimization
techniques take very large amounts of computation time. For this reason, it is
impossible to use the simulation based optimization techniques for all 240 instances
in the experimental design. Instead, we select 15 worst-case instances for which
we apply simulation based optimization techniques.
The selection of the 15 worst-case instances warrants some discussion. Let
us first introduce a lower bound for the total costs in the PI system under study.
The lower bound neglects the impact of the variability in the system as well as
the interaction between different products. More details on the lower bound can
be found in Appendix 2. The computation of the lower bound is only possible for
the 180 instances with set-up costs ci > 0. We compute the relative deviation e1
between the total cost of our heuristic and the lower bound. On the set of 180
instances, e1 usually lies in the interval 10–30% with an average of 20%. In this
paper, we use e1 as an indicator for the potential improvement that can be realized
when simulation based optimization techniques are used. We select 10 instances
with the largest e1 to be optimized using simulation based optimization techniques.
Since these instances have the largest potential for improvement, we call them
worst-case instances. We ensure that only one of the five random instances of the
same combination of levels of the four factors is selected as a worst-case instance.
For the case of set-up costs = 0, we cannot compute the lower bound presented above.
Therefore, we use another criterion to select the instances. Since the optimization
quality of the heuristic depends on the accuracy of the approximate analytical model
presented in Section 3.1.1, we select the 5 instances with the largest deviation
between cost estimate of the approximate model and the total costs estimated with
simulation for the optimal solution R∗ proposed by the heuristic. The selected
instances are worst-case instances on the criterion of approximation performance.
Again, we ensure that only one of the five random instances of the same combination
of levels is chosen. In this way, 15 worst-case instances are selected.
Table 6 summarizes the findings of the simulation based optimization exper-
iments for the 15 worst-case instances. It presents the optimal total costs of our
heuristic T C heu , the greedy search algorithm T C gs and the OptQuest algorithm
T C optq . Also the 90% confidence interval on the total costs is given. Finally, Table
6 presents the relative improvement of the greedy search and OptQuest algorithm
with regard to the solution of the heuristic, defined as:
Table 6. Results of simulation based optimization experiments

Exp. nr.  TC^heu  CI (90%)  TC^gs  CI (90%)  i1 (%)  TC^optq  CI (90%)  i2 (%)  max(i1, i2) (%)

1 172843.8 426.6 169700.5 722.7 1.8 170462.7 335.5 1.4 1.8


2 166937.9 288.5 164988.8 403.1 1.2 163182.4 153.9 2.2 2.2
3 180776 308.7 179496.9 305 0.7 175361.6 391.3 3.0 3.0
4 308863.3 508.5 305763.6 759.4 1.0 303366.3 587 1.8 1.8
5 193689.6 650.5 187546.9 838.7 3.2 191255.4 790.8 1.3 3.2
6 119521.1 592.6 118594.7 938 0.8 118768.7 657.5 0.6 0.8
7 103428.2 270.3 101259.9 240 2.1 101309.4 240.7 2.0 2.1
8 139734.6 717.5 135119.1 1177 3.3 138166.8 622.9 1.1 3.3
9 292435.2 566.0 290025.9 558.7 0.8 286411.8 424.3 2.1 2.1
10 98308.4 143.0 95414.6 100.6 2.9 95801.9 129.4 2.5 2.9
11 21316.6 95.0 21027.7 151.2 1.4 20899.5 127.9 2.0 2.0
12 26897.1 158.0 26503.7 119.7 1.5 26105.3 123.7 2.9 2.9
13 6883.8 25.0 6755.4 18.4 1.9 6752.2 22.8 1.9 1.9
14 8690.8 43.7 8510 37.7 2.1 8665.6 32.7 0.3 2.1
15 13561.9 102.2 12729 72.4 6.1 13189.4 71.3 2.7 6.1

Average 2.0 1.9 2.5


i_1 = \frac{TC^{heu} - TC^{gs}}{TC^{heu}} \times 100\% \quad \text{and} \quad i_2 = \frac{TC^{heu} - TC^{optq}}{TC^{heu}} \times 100\%.
On the set of 15 experiments, the greedy search heuristic achieves a 2.0% improvement on average. The OptQuest technique achieves a 1.9% improvement on average. Taking the maximum improvement for each instance, we see that the solution of our heuristic can be improved by 2.5% on average using simulation based optimization techniques. On this set of 15 worst-case instances, the maximum improvement is 6.1%. Based on these results, we claim that the optimization quality
of our heuristic is satisfactory.
Next, we compare the results of the greedy search algorithm and the OptQuest
algorithm. Table 6 shows that in 60% of the instances the greedy search algorithm
outperforms the OptQuest algorithm, while in 40% of the instances OptQuest out-
performs the greedy search algorithm. The difference in optimization performance
between both search methods does not exhibit a clear pattern and we were not able
to relate it to any of the factors in the study. In our opinion, the difference in per-
formance is due to the fact that both simulation based optimization techniques are
heuristics without any performance guarantee. Both simulation based optimization
techniques can get stuck into non-optimal solutions, as can be observed in Table 6.
The simulation based optimization techniques require large amounts of compu-
tation time: the OptQuest algorithm is stopped after 1000 iterations, resulting in the
solutions presented in Table 6. The greedy search algorithm used a variable number
of iterations to achieve the results in Table 6: on average 123, with a minimum of
80 and a maximum of 185. Depending on the problem instance, one iteration takes
about 2.5 to 4 minutes on an Intel Pentium 4 – 2.00 GHz processor. On average, the
OptQuest algorithm takes about 54 hours to find the solutions presented in Table 6.
The greedy search algorithm gives comparable results in a much shorter amount of
time: about 6.5 hours on average. Our heuristic, however, is many times faster than
the simulation based techniques: it takes about 8 minutes on average to find solu-
tions that are only slightly worse than the solutions found by the simulation based
optimization techniques. In conclusion, we can state that our heuristic performs satisfactorily: it takes a fraction of the time required by simulation based optimization techniques to find solutions that are only slightly worse in terms of total costs.
Table 7 presents the average of the review periods, order-up-to levels and ma-
chine utilization for every problem instance solved using simulation based optimiza-
tion. The first column gives the instance number. The second and third columns
present the relative difference between the average of the review periods over all
products for the simulation based techniques and the heuristic. The fourth and fifth
columns give the relative difference between the average order-up-to levels for
the simulation based optimization methods and the heuristic. The sixth and sev-
enth columns give the relative difference in the average of the utilization of the
machines in the production system between the simulation optimization and the
heuristic. Finally, the eighth and ninth columns repeat the relative improvement i1
and i2 that is obtained by using the simulation based optimization techniques versus
the heuristic.
From the analysis of Table 7, we try to gain understanding of how the three op-
timization techniques work. Surprisingly, no clear patterns appear in the numerical
Table 7. Relative difference in average review periods, order-up-to levels, utilization and total costs for simulation based optimization vs. heuristic

Exp. nr.  % diff. in avg.            % diff. in avg.             % diff. in avg.          % improvement
          review periods             order-up-to levels          utilization level        in total costs
          (R^x − R^heu)/R^heu        (S^x − S^heu)/S^heu         (ρ^x − ρ^heu)/ρ^heu      (TC^heu − TC^x)/TC^heu
          × 100%                     × 100%                      × 100%                   × 100%
          gs       optq              gs       optq               gs       optq            gs       optq

1 2.4 6.5 2.2 3.3 −0.5 −0.5 1.8 1.4


2 4.9 6.8 5.6 6.3 −0.7 −1.0 1.2 2.2
3 3.3 6.3 3.9 3.9 −0.1 −0.3 0.7 3
4 4.6 10.1 5.2 7.7 −0.2 −0.2 1 1.8
5 2.6 8.7 −1.0 4.0 −0.3 −0.6 3.2 1.3
6 −3.8 −1.4 −2.3 −1.2 0.4 0.2 0.8 0.6
7 7.7 8.7 4.7 5.1 −1.2 −1.6 2.1 2
8 −8.4 −1.7 −6.0 −1.6 0.7 0.1 3.3 1.1
9 9.5 5.3 8.6 2.7 −0.6 −0.4 0.8 2.1
10 1.7 1.5 −2.6 −1.4 0.0 −0.3 2.9 2.5
11 −3.4 2.2 −1.7 −0.9 1.7 −0.7 1.4 2
12 −1.5 2.6 −1.2 −1.6 0.8 −0.5 1.5 2.9
13 4.1 4.5 0.0 0.0 −0.6 −1.1 1.9 1.9
14 −0.8 2.0 −1.9 0.1 0.1 −0.6 2.1 0.3
15 −2.4 3.0 −4.8 −1.4 1.0 −0.6 6.1 2.7


results in Table 7. The relative improvement does not seem to be directly related
to the relative difference in the average review periods and order-up-to levels. Take
e.g. instance 4 where the OptQuest algorithm increases the average review peri-
ods by 10.1%, while the greedy search algorithm proposes an increase of 4.6%.
For this instance the relative improvement realized by the OptQuest algorithm is
1.8% while the relative improvement for the greedy search algorithm is 1.0%. One
may conclude that larger differences in the review periods lead to larger cost im-
provements. However, this conclusion is contradicted by e.g. instances 9 and 15.
In instance 9, the greedy search algorithm increases the review periods by 9.5%
leading to a cost saving of 0.8%, while the OptQuest algorithm increases the lot
sizes by only 5.3% leading to a larger cost saving of 2.1%. In instance 15, the
greedy search algorithm obtains a cost improvement of 6.1%, the largest observed
on this set of experiments. In order to achieve this cost improvement, the review
periods are reduced by 2.4% on average. Furthermore, it can be observed that even
if the utilization remains almost unchanged, still a substantial cost improvement
can be realized by the harmonization of the review periods. This can be seen, for example, from instance 10, where the utilization does not change for the greedy search
algorithm, but the costs improve by 2.9%.
From this analysis, we conclude that there is no direct relation between the
average review periods, order-up-to levels, machine utilization and the relative cost
improvement that can be obtained from the use of simulation based optimization
techniques. This conclusion leads to the obvious question how the relative cost
improvement is then realized by the simulation based optimization techniques. We
believe that the answer lies in the mechanisms that were described in Section 4.2.
Similarly to the approximate analytical model, the simulation based optimization
techniques ‘harmonize’ the review periods of the different products so as to obtain
the best balance between utilization and variability. Both simulation based opti-
mization techniques seek the best possible trade-off between the utilization of the
machines, variation of the arrivals, average processing times and variation of the
processing times. In this way, it is possible that relatively small differences in the
average review periods can lead to relatively large cost savings.
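The trade-off that this 'harmonization' exploits can be made concrete with the standard Kingman (VUT) approximation for a single queue. The sketch below is our own illustration with invented parameter values; it is not the approximate analytical model of the paper.

```python
def kingman_wq(ca2, cs2, rho, es):
    """Kingman (VUT) approximation for the mean queueing delay in a G/G/1
    station: Wq ~ ((ca^2 + cs^2)/2) * (rho/(1-rho)) * E[S]."""
    return (ca2 + cs2) / 2.0 * rho / (1.0 - rho) * es

def lot_delay(R, demand=0.08, setup=45.0, unit_time=2.0, ca2=1.0, cs2=1.0):
    """Delay seen by a lot when lots are released every R minutes: a larger R
    means fewer set-ups (lower utilization) but bigger, slower lots."""
    lot_size = demand * R                 # units accumulated per review period
    es = setup + lot_size * unit_time     # mean service time of one lot
    rho = es / R                          # one lot arrives per R minutes
    return float("inf") if rho >= 1.0 else kingman_wq(ca2, cs2, rho, es)

# Scanning R exposes the balance the heuristic seeks: short review periods
# saturate capacity with set-ups, long ones inflate the lot delay.
delays = {R: lot_delay(R) for R in (150, 300, 600, 1200, 2400)}
best_R = min(delays, key=delays.get)
```

With these (made-up) parameters the delay first falls and then rises in R, so an interior review period is best; small shifts of R near the optimum change the delay only slightly, which is consistent with small parameter differences producing the observed cost differences.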

5 Discussion of the simulation results

In this section we analyse and interpret the results of the heuristic for the 240
instances in the simulation study. This allows us to verify the soundness of the
solutions proposed by the heuristic.

5.1 Main effects of factors

Figures 4–6 show the impact of the four factors on the average costs, review periods
and order-up-to levels. The impact of every factor is discussed successively.
The impact of increases in the net utilization on the total costs is about 17% and
19% for changes from 0.65 to 0.75 and from 0.75 to 0.85, respectively. The increase
of the net utilization from 0.65 to 0.75 causes the review periods to decrease, while
Control of multi-product multi-machine production-inventory systems 279

Fig. 4a–d. Impact of factor on average costs: a net utilization – b fill rate – c avg. set-up
times – d avg. set-up costs

Fig. 5a–d. Impact of factor on average review periods: a net utilization – b fill rate – c avg.
set-up times – d avg. set-up costs

they increase when the net utilization is increased from 0.75 to 0.85. The reason for
this pattern is to be found in the mechanisms embedded in the heuristic, as described
in Section 4.2. When the net utilization goes from 0.65 to 0.75, the review periods
are decreased in order to reduce throughput times, work-in-process costs and final
inventory holding costs. However, a further increase in the net utilization requires
that the review periods be slightly increased in order to reduce the impact of the

Fig. 6a–d. Impact of factor on average order-up-to levels: a net utilization – b fill rate – c
avg. set-up times – d avg. set-up costs

set-up times on the capacity utilization. The order-up-to levels on the contrary,
increase steadily when the utilization is increased. This can be explained by two
effects. First, in the experiments, the rise in the net utilization is caused by increases
in the demand rate for the products. The increased demand rate results in higher
demand during a review period and increased cycle stock. Secondly, the rise in
the net utilization increases the congestion in the system, leading to longer order
throughput times and higher safety stocks.
When the fill rates increase from 90% to 98%, the average total costs increase
by 11%. In order to account for the increase in the fill rate, the order-up-to levels
are increased. However, the increase in the order-up-to levels is fairly small. This
is due to the fact that the heuristic proposes smaller review periods, which lead to
lower cycle stock. Moreover, the order throughput times are reduced so that less
safety stock is required.
The increase of the average set-up times from 45 to 135 leads to an increase
in the total costs of 12%. The review periods are raised to limit the impact on the
capacity utilization. The increased review periods lead to lower set-up costs, but
also to longer throughput times and higher work-in-process costs. Because of the
increases in the review periods and the throughput times, the order-up-to levels and
the final inventory costs increase.
The set-up costs appear to be the dominant factor in our experimental design:
when the average set-up costs increase from 0 to 10, the average total cost rises by
156%. Further increases in the average set-up costs result in cost increases of 64%
and 70%. The review periods and the order-up-to levels increase in a similar fashion.
From these results, it appears that efforts to cut set-up costs (as advocated by e.g.
the Just-In-Time philosophy) may effectively result in large savings in the overall
costs. Figure 4-d presents the division of the total costs (TC) over the different

Fig. 7. Frequency diagram of allocation of free capacity for set-ups (instances with set-up
costs = 0)

Fig. 8. Frequency diagram of allocation of free capacity for set-ups (instances with set-up
costs > 0)

cost components. For the instances with set-up costs = 0, the heuristic proposes
solutions in which the final inventory costs (FIC) and the work-in-process costs
(WIPC) are almost balanced, the former being slightly dominant. For the instances
with set-up costs > 0, the heuristic seeks a balance between the fixed set-up costs
(SC) and the final inventory and work-in-process holding costs. Remarkably, this
result is similar to the well-known Economic Order Quantity model for which the
holding costs and fixed ordering costs are the same if the economic order quantity
is ordered. Similarly to the instances with set-up cost = 0, the final inventory costs
dominate the work-in-process costs.

5.2 Allocation of capacity for set-ups

Now we take a look at the fraction of the free capacity, computed as 1 − ρnet , that is
allocated for performing set-ups. From Figure 7 we see that for the instances with
set-up costs = 0, the allocated fraction of free capacity lies in the interval 60–74%,
the average being 65%. As a rule of thumb, it appears that in the case of set-up costs
= 0 about 2/3 of the free capacity should be allocated for set-up times. However,
from the analysis of results of simulation based optimization techniques in Table 7
it appears that it is not only important to select the right level of capacity utilization.

The numerical results in Table 7 indicate that substantial savings can be realised by
adjusting the review periods, but keeping the capacity utilization more or less at the
same level. Indeed, the harmonization of the review periods is of high importance
in the multi-product system under study. Figure 8 shows the case of non-zero set-up
costs for which a wide range of allocation of free capacity is observed (3–69%).
In general, the allocated fraction of free capacity for set-ups is lower than in the
case of zero set-up costs. Moreover, our experiments indicate that the fraction of
allocated capacity decreases when the set-up costs increase. This sound behavior
is due to increases in the review periods because of rising set-up costs. Obviously,
the increases in the review periods lead to lower capacity allocation for set-ups.
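The allocated fraction discussed in this subsection can be recomputed directly from the control parameters: each product contributes one set-up of mean length E[L] per review period R. The single-machine view and the product data below are hypothetical, chosen only to land near the 2/3 rule of thumb.

```python
def setup_capacity_fraction(rho_net, products):
    """Fraction of the free capacity 1 - rho_net consumed by set-ups.
    `products` holds (mean set-up time, review period) pairs; each product
    contributes E[L]/R to the set-up utilization (illustrative sketch)."""
    rho_setup = sum(EL / R for EL, R in products)
    return rho_setup / (1.0 - rho_net)

# Hypothetical machine at rho_net = 0.65 serving four products, in minutes:
products = [(45, 750), (60, 1000), (45, 900), (90, 1500)]
frac = setup_capacity_fraction(0.65, products)   # close to the 2/3 rule
```

Raising the review periods lowers every E[L]/R term, which is exactly why the allocated fraction drops when rising set-up costs push the review periods up.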

5.3 Behavior of review periods when set-up times are changed

Next, we turn our attention to the behavior of the average review periods, order-up-
to levels and capacity utilization when the set-up times are tripled from [30, 60] to
[90, 180]. The results are displayed in Figures 9 and 10. When set-up costs are zero
and the set-up times are tripled, our heuristic increases the review periods and order-
up-to levels by almost the same factor as the set-up times ([2.7, 3.2] in our set of
experiments). In this way, the fraction of free capacity allocated for set-ups remains
almost unchanged. For the instances with set-up costs > 0, a substantial change in
set-up times has virtually no impact on the average review periods and order-up-to
levels. In the main part of the instances, the proportion lies in the interval [0.9, 1.3].
Therefore, the fraction of free capacity used for set-up times increases by the
same factor as the increase in set-up times. From this observation, we conclude that
the cost aspect dominates the capacity aspect in the optimization process when the
set-up costs > 0. Finally, we observe that these results fade when set-up costs are
relatively small, i.e. ci ∈ [6.67, 13.33]. Especially when the net utilization is high
(0.85), an increase in the set-up times leads to rises in the review periods and the

Fig. 9. Frequency diagram of proportion of average review periods for set-up times [90, 180]
over [30, 60] (instances with set-up costs = 0)

Fig. 10. Frequency diagram of proportion of review periods for set-up times [90, 180] over
[30, 60] (instances with set-up costs > 0)

corresponding order-up-to levels. Again, this indicates that our optimization tool
works soundly.
From the results in this subsection and the previous subsection, it appears that
behaviour of the optimized control parameters is rather different in the case where
set-up costs are equal to zero and the case where the set-up costs are larger than
zero. In the case that set-up costs are zero, the optimization tool focuses more on
the capacity utilization aspect whereas when the set-up costs are larger than zero,
the tool is mainly concerned with the cost aspect. The lesson that can be learned
from these observations is that it is very important to take into account both cost and
capacity issues when making production and inventory control decisions. Decision
support systems that focus solely on one of these issues are bound to make errors
that can result in substantial cost increases. Unfortunately, most decision support
systems focus on either the capacity aspect or the cost aspect.
Finally, these observations illustrate the importance of a good knowledge of the cost
structure of the PI system; the application of sound management accounting
techniques should therefore be given high priority.

6 Conclusions

We propose a three-step heuristic to coordinate production and inventory control
decisions in an integrated multi-product multi-machine production-inventory system
characterized by job shop routings and stochastic demand, set-up and processing
times. Our heuristic minimizes the sum of set-up costs, work-in-process holding
costs and final inventory holding costs while stochastic customer demand is satis-
fied with a target fill rate. The first step uses an approximate analytical model and a
greedy search algorithm to find near-optimal control parameters. Several approxi-
mations are used in this step. Since this may result in customer service levels that
are too low or too high, the order-up-to levels are fine-tuned in the second step.
This step ensures that all customer service level requirements are satisfied. In the
third step, the performance characteristics of the system are accurately estimated
using simulation.
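The three-step structure can be sketched as follows. All function names, the shrink/grow neighbourhood and the toy cost function are placeholders of our own, not the authors' implementation.

```python
def three_step_heuristic(evaluate_analytic, fine_tune_S, simulate, R0):
    # Step 1: greedy search on the review periods R, scored by the fast
    # approximate analytical model.
    R = list(R0)
    improved = True
    while improved:
        improved = False
        for i in range(len(R)):
            for cand in (R[i] * 0.8, R[i] * 1.25):      # shrink / grow moves
                trial = R[:i] + [cand] + R[i + 1:]
                if evaluate_analytic(trial) < evaluate_analytic(R):
                    R, improved = trial, True
    # Step 2: fine-tune the order-up-to levels so all fill-rate targets hold.
    S = fine_tune_S(R)
    # Step 3: estimate the performance of (R, S) accurately by simulation.
    return R, S, simulate(R, S)

# Toy stand-ins: a convex cost in each review period (minimised near r = 10),
# a trivial step 2, and the cost itself in place of a simulator.
cost = lambda R: sum(100.0 / r + r for r in R)
R, S, perf = three_step_heuristic(cost, lambda R: [2 * r for r in R],
                                  lambda R, S: cost(R), [40.0, 40.0])
```

The greedy search stops once neither shrinking nor growing any single review period improves the analytical cost estimate.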

We tested our heuristic in an extensive simulation study, consisting of 240
instances. We selected a subset of 15 worst-case instances that were optimized using
our heuristic and two simulation based optimization techniques. The comparison
of the performance of our heuristic to the simulation based optimization techniques
allowed us to conclude that our heuristic performs satisfactorily, both in terms of
optimization quality and required computation time.
The detailed analysis of one problem instance helped to gain understanding of
the mechanisms that are embedded in the heuristic. It appeared that our optimization
tool harmonizes the review periods of the different products so that the variability in
the arrival and production processes, the average processing times and the utilization
of the workcenters are balanced.
Based on the results of our simulation study, we conclude that the set-up cost
is the dominant factor in the study. The impact of the set-up costs on the total costs
is many times higher than that of the other factors in the study: utilization, fill rate
and set-up times. The results support the insight that substantial cost savings can
be realized by set-up cost reduction programs, as advocated by the Just-In-Time
philosophy. When compared to the other relevant cost components, i.e. work-
in-process and final inventory holding costs, the set-up costs are dominant. For
the instances with set-up costs > 0, about 50% of the total costs are due to the
fixed set-up costs. The work-in-process costs are slightly dominated by the final
inventory holding costs, both in the instances with and without set-up costs. When
set-up costs are absent, review periods are chosen so that about 2/3 of available
capacity is allocated to set-ups. When set-up costs are considerably high, it seems
that the review periods are chosen based on cost considerations only. Instances
that are in between the two extremes show a trade-off between cost and capacity
considerations. These results indicate that a good knowledge of the cost structure
of the production-inventory system is of the highest importance in order to make
control decisions that minimize the total relevant costs. Unlike other approaches,
our heuristic integrates both capacity and cost aspects. Therefore, our heuristic is
able to make robust decisions for every instance, regardless of the values of the
different cost parameters. Moreover, the heuristic captures the interaction between
different products and different workcenters and their impact on the congestion
phenomena in the production system. Finally, our heuristic combines the speed of
a queueing network model with the accuracy of a simulation experiment. Therefore,
it works fast and accurately.
Some interesting directions for further research may be to improve the approx-
imation techniques for open queueing networks, to extend the heuristic to other
inventory policies and to test the heuristic in real-life case studies. Furthermore,
it would be interesting to compare the performance of the reorder point policy
based control method presented in this paper to other control methods. Other con-
trol methods can e.g. be based on cyclical production plans or pull-type control.
Finally, it can be worthwhile to investigate the influence of different priority rules
on the performance of the production-inventory system.

Appendix 1: generation of problem instances

We use the following procedure to generate the problem instances.


1. Randomly generate a set of routings. The routing structures are chosen so that
the average number of operations per product equals 3 and the number of
operations per product lies in the interval [2, 4]. Furthermore, the number of
products per workcenter lies in the interval [4, 8];
2. Allocate to every product i a relative share of capacity utilization of workcenter
j, denoted as rscuij . We use the capacity utilization profiles that are presented
in Table A.1. These profiles depend on the number of products that are produced
at a workcenter, denoted as Nj ;  
3. Randomly draw the demand rate for product i, λi = 1/E[A_i^D], from the
   interval [λLB, λUB]. This interval is chosen so that the expected item produc-
   tion time E[Pij] varies between PLB = 1 min. and PUB = 5 min. Then λLB
   and λUB are given by: λLB = ρnet maxi,j(rscuij)/PUB = 0.06 ρnet and
   λUB = ρnet mini,j(rscuij)/PLB = 0.1 ρnet. If ρnet = 0.85, then the yearly
   demand lies in the interval [26805; 44676].
4. Calculate the average item processing time for every product i and every work-
   center j: E[Pij] = rscuij ρnet / λi ;
5. Generate randomly: r ∈ [0.15, 0.25] €/(€ · year);
6. Generate randomly for every i:
– ci ∈ € [0, 0] or € [6.67, 13.33] or € [20, 40] or € [60, 120];
– vi ∈ € [10, 15];
– Cost of raw material as a fraction [0.35, 0.50] of vi ;
– The difference between vi and the cost of raw material is the added value.
To find the echelon value vij of a product, we distribute the added value
equally over the different production steps.
7. Generate randomly for every i and every j:
– Lij ∈ [30, 60] min. or [90, 180] min.
8. The length of a simulation subrun is chosen as 100,000 · maxi E[A_i^D].
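Steps 3 and 4 of the procedure can be sketched as follows. The routing matrix below is a toy example of our own (its columns sum to 1 per workcenter, as in Table A1); the function names are ours, not the authors'.

```python
import random

def draw_demand_and_processing(rscu, rho_net, p_lb=1.0, p_ub=5.0, seed=42):
    """Sketch of steps 3-4 above. rscu[i][j] is product i's relative share of
    workcenter j's utilization (0 if product i skips workcenter j); each
    column is assumed to sum to 1, as in the Table A1 profiles."""
    rng = random.Random(seed)
    shares = [s for row in rscu for s in row if s > 0]
    lam_lb = rho_net * max(shares) / p_ub      # step 3: demand-rate interval
    lam_ub = rho_net * min(shares) / p_lb
    lam = [rng.uniform(lam_lb, lam_ub) for _ in rscu]
    # Step 4: back out processing times so the utilization shares are honoured.
    ep = [[s * rho_net / lam[i] if s > 0 else 0.0 for s in row]
          for i, row in enumerate(rscu)]
    return lam, ep

rscu = [[0.55, 0.00],          # toy 3-product, 2-workcenter instance
        [0.45, 0.40],
        [0.00, 0.60]]
lam, ep = draw_demand_and_processing(rscu, rho_net=0.85)
# By construction each workcenter's utilization reassembles to rho_net:
util = [sum(lam[i] * ep[i][j] for i in range(3)) for j in range(2)]
```

The interval bounds guarantee that every drawn E[Pij] lies in [PLB, PUB], while the per-workcenter utilizations come out at exactly ρnet.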

Appendix 2: lower bound on total costs

In this appendix we propose a lower bound for the total costs in a production-
inventory system with job shop routings and stochastic arrival and processing times.
The lower bound neglects the impact of the variability in the system as well as the
interaction between different products. The lower bound consists of three terms:
1. final inventory costs. The lower bound is based on an inventory model charac-
terized by deterministic demand and zero replenishment lead times;
2. work-in-process costs due to processing times and set-up times (waiting times
are excluded);
3. fixed set-up costs (which are known exactly).

Table A1. Capacity utilization profiles

profile nr. ↓ \ Nj →      4      5      6      7      8

 1                      0.30   0.30   0.25   0.20   0.17
 2                      0.28   0.25   0.20   0.17   0.15
 3                      0.25   0.20   0.17   0.16   0.15
 4                      0.17   0.15   0.15   0.14   0.12
 5                             0.10   0.13   0.12   0.11
 6                                    0.10   0.11   0.10
 7                                           0.10   0.10
 8                                                  0.10

Total                   1.00   1.00   1.00   1.00   1.00

We formulate a lower bound for the total costs for product i for a given review
period Ri :
    LBTCi(Ri) = ci Ri^(-1) + Σ_{j=1}^{M} ( vij r / E[A_i^D] ) ( Ri E[Pij] / E[A_i^D] + E[Lij] )
                + (αi*)^2 Ri vi r / ( 2 E[A_i^D] )                                          (9)
Based on the formula for LBT Ci (Ri ), the review period Ri∗ that minimizes
LBT Ci equals:
    Ri* = sqrt( 2 E[A_i^D]^2 ci / ( 2 Σ_{j=1}^{M} E[Pij] vij r + (αi*)^2 E[A_i^D] vi r ) )    (10)

The computation of the lower bound becomes infeasible if ci = 0. Furthermore,
the lower bound for the total costs of the whole system is given by the sum of the
lower bounds of the different products, since the interaction between the different
products is ignored.
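Equations (9) and (10) translate directly into code; the parameter values below are illustrative only. Since the set-up times E[Lij] enter LBTC only through a constant term, they drop out of the optimal review period.

```python
import math

def lbtc(R, c, v_ij, v, r, EP, EL, EA, alpha):
    """Lower bound of Eq. (9): fixed set-up costs, WIP due to processing and
    set-up times, and the deterministic-demand final inventory term."""
    wip = sum(vj * r / EA * (R * epj / EA + elj)
              for vj, epj, elj in zip(v_ij, EP, EL))
    return c / R + wip + alpha ** 2 * R * v * r / (2.0 * EA)

def r_star(c, v_ij, v, r, EP, EA, alpha):
    """Closed-form minimiser of Eq. (10)."""
    denom = 2.0 * sum(epj * vj * r for epj, vj in zip(EP, v_ij))
    denom += alpha ** 2 * EA * v * r
    return math.sqrt(2.0 * EA ** 2 * c / denom)

# Illustrative (made-up) data for one product visiting M = 3 workcenters:
pars = dict(c=30.0, v_ij=[4.0, 6.0, 8.0], v=12.0, r=0.2,
            EP=[2.0, 3.0, 2.0], EA=12.5, alpha=0.95)
EL = [45.0, 60.0, 45.0]
Rs = r_star(**pars)
# Rs minimises the bound: nudging R either way can only increase LBTC.
assert lbtc(Rs, EL=EL, **pars) <= min(lbtc(0.9 * Rs, EL=EL, **pars),
                                      lbtc(1.1 * Rs, EL=EL, **pars))
```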

References

1. Adams J, Balas E, Zawack D (1988) The shifting bottleneck procedure for job-shop
scheduling. Management Science 34: 391–401
2. Altiok T, Shiue GA (2000) Pull-type manufacturing systems with multiple product
types. IIE Transactions 32: 115–124
3. Amin M, Altiok T (1997) Control policies for multi-product multi-stage manufacturing
systems: an experimental approach. International Journal of Production Research 35:
201–223
4. Benjaafar S, Kim JS, Vishwanadham N (2004) On the effect of product variety in
production-inventory systems. Annals of Operations Research 126: 71–101

5. Benjaafar S, Cooper WL, Kim JS (2003) On the benefits of pooling in production-
inventory systems. Working paper, University of Minnesota, Minneapolis. Management
Science (to appear)
6. Birtwistle GM, Dahl OJ, Nygaard K (1984) Simula Begin. Studentlitteratur, Lund
7. Bowman RA, Muckstadt JA (1995) Production control of cyclic schedules with demand
and process variability. Production and Operations Management 4: 145–162
8. Cox DR (1962) Renewal theory. Methuen, London
9. Eglese RW (1990) Simulated annealing: a tool for operational research. European
Journal of Operational Research 46: 271–281
10. Gudum CK, de Kok AG (2002) A safety stock adjustment procedure to enable tar-
get service levels in simulation of generic inventory systems. Working paper, BETA
Research School, The Netherlands
11. Hillier FS, Lieberman GJ (2005) Introduction to Operations Research, 8th edn.
McGraw-Hill, Boston
12. Hopp WJ, Spearman ML (1996) Factory physics. Irwin, Chicago
13. Karmarkar US (1987) Lot sizes, lead times and in-process inventories. Management
Science 33: 409–419
14. Kingman JFC (1961) The single server queue in heavy traffic. Proceedings of the
Cambridge Philosophical Society 57: 902–904
15. Lambrecht MR, Vandaele NJ (1996) A general approximation for the single product
lot sizing model with queueing delays. European Journal of Operational Research 95:
73–88
16. Lambrecht MR, Ivens PL, Vandaele NJ (1998) ACLIPS: A capacity and lead time
integrated procedure for scheduling. Management Science 44: 1548–1561
17. Law AM, Kelton WD (2000) Simulation modeling and analysis, 3rd edn. McGraw-Hill,
Boston
18. Liberopoulos G, Dallery Y (2003) Comparative modelling of multi-stage production-
inventory control policies with lot sizing. International Journal of Production Research
41: 1273–1298
19. Ouenniche J, Boctor F (1998) Sequencing, lot sizing and scheduling of several products
in job shops: the common cycle approach. International Journal of Production Research
36: 1125–1140
20. Ouenniche J, Bertrand JWM (2001) The finite horizon economic lot sizing problem in
job shops: the multiple cycle approach. International Journal of Production Economics
74: 49–61
21. Pinedo M, Chao X (1999) Operations scheduling with applications in manufacturing
and services. McGraw-Hill, London
22. Rubio R, Wein LM (1996) Setting base stock levels using product-form queueing
networks. Management Science 42: 259–268
23. Shanthikumar JG, Buzacott JA (1981) Open queueing network models of dynamic job
shops. International Journal of Production Research 19: 255–266
24. Silver EA, Pyke DF, Peterson R (1998) Inventory management and production planning
and scheduling. Wiley, New York
25. Sox CR, Jackson PL, Bowman A, Muckstadt JA (1999) A review of the stochastic lot
scheduling problem. International Journal of Production Economics 62: 181–200
26. Suri R, Sanders JL, Kamath M (1993) Performance evaluation of production networks.
In: Graves SC et al (eds) Handbooks in Operations Research and Management Science,
vol 4, pp 199–286. Elsevier, Amsterdam
27. Vandaele NJ (1996) The impact of lot sizing on queueing delays: multi product, multi
machine models. Ph.D. thesis, Department of Applied Economics, Katholieke Univer-
siteit Leuven, Belgium

28. Vandaele NJ, Lambrecht MR, De Schuyter N, Cremmery R (2000) Spicer Off-Highway
Products Division-Brugge improves its lead-time and scheduling performance. Inter-
faces 30: 83–95
29. Van Nyen PLM, Van Ooijen HPG, Bertrand JWM (2004) Simulation results on the
performance of Albin and Whitt’s estimation method for waiting times in integrated
production-inventory systems. International Journal of Production Economics 90: 237–
249
30. Van Nyen PLM, Bertrand JWM, Van Ooijen HPG, Vandaele NJ (2004) The control
of an integrated multi-product multi-machine production-inventory system. Working
paper, BETA Research School, The Netherlands
31. Van Nyen PLM (2005) The integrated control of production inventory systems. Ph.D.
thesis, Department of Technology Management, Technische Universiteit Eindhoven,
The Netherlands (to appear)
32. Whitt W (1983) The queueing network analyzer. Bell System Technical Journal 62:
2779–2815
33. Whitt W (1994) Towards better multi-class parametric-decomposition approximations
for open queueing networks. Annals of Operations Research 48: 221–224
34. Zipkin P (1986) Models for design and control of stochastic multi-item batch production
systems. Operations Research 34: 91–104
Performance analysis of parallel identical machines
with a generalized shortest queue arrival mechanism
G.J. van Houtum1 , I.J.B.F. Adan2 , J. Wessels2 , and W.H.M. Zijm1,3
1
Faculty of Technology Management, Eindhoven University of Technology, P.O. Box 513,
5600 MB Eindhoven, The Netherlands (e-mail: [email protected])
2
Faculty of Mathematics and Computing Science, Eindhoven University of Technology,
Eindhoven, The Netherlands
3
Faculty of Applied Mathematics, University of Twente, Twente, The Netherlands

Received: February 8, 2000 / Accepted: November 28, 2000

Abstract. In this paper we study a production system consisting of a group of
parallel machines producing multiple job types. Each machine has its own queue
and it can process a restricted set of job types only. On arrival a job joins the
shortest queue among all queues capable of serving that job. Under the assumption
of Poisson arrivals and identical exponential processing times we derive upper and
lower bounds for the mean waiting time. These bounds are obtained from so-called
flexible bound models, and they provide a powerful tool to efficiently determine
the mean waiting time. The bounds are used to study how the mean waiting time
depends on the amount of overlap (i.e. common job types) between the machines.

Key words: Queueing system – Shortest queue routing – Performance analysis –


Flexibility – Truncation model – Bounds

1 Introduction

In this paper we consider a queueing system consisting of a group of parallel
identical servers serving multiple job types. Each server has its own queue and
is capable of serving a restricted set of job types only. Jobs arrive according to a
Poisson process and on arrival they join the shortest feasible queue. The service
times are exponentially distributed. We will refer to this queueing model as the
Generalized Shortest Queue System (GSQS). This model is motivated by a situation
encountered in the assembly of Printed Circuit Boards (PCBs). This is explained
in more detail below.
Fig. 1. A flexible assembly system consisting of three parallel insertion machines, on which
three types of PCBs are made

Correspondence to: G.J. van Houtum

Figure 1 shows a typical layout of an assembly system for PCBs. It consists
of three parallel insertion machines, each with its own local buffer. An insertion
machine mounts vertical components, such as resistors and capacitors, on a PCB


by the insertion head. The components are mounted in a certain sequence, which
is prescribed by a Numerical Control program. The insertion head is fed by the
sequencer, which picks components from tapes and transports them in the right
order to the insertion head. Each tape contains only one type of components. The
tapes are stored in the component magazine, which can contain at most 80 tapes,
say. Each PCB needs on average 60 different types of components. To assemble
a PCB all required components have to be available in the component magazine.
Hence, the set of components available in the magazine determines the set of PCB
types that can be processed on that machine. The system in Figure 1 has to assemble
three PCB types, labeled A, B and C. The machines are basically similar, but due
to the fact that they are loaded with different types of components, the sets of PCB
types that can be handled by the machines are different. Machine M1 can handle
the A and B types, machine M2 the A and C, and machine M3 the B and C. When
the mounting times for all PCB types are approximately the same, it is reasonable
to send arriving PCBs to the shortest feasible queue.
Since the assembly of PCBs is often characterized by relatively few job types,
large production batches and small mounting times (see Zijm [16]), the use of
a queueing model seems appropriate to predict performance characteristics such
as the mean waiting time. An important issue is the assignment of the required
components to the machines. Ideally, each machine should get all components
needed to process all PCB types. However, since the component magazines have
a finite capacity, they can contain the components needed for a (small) subset of
PCB types only. In this paper we will investigate how much overlap (i.e. common
components) between the machines is required such that the system nearly performs
as in the ideal situation where the machines are equipped with all components.
The GSQS is also relevant for many other practical situations; e.g., for parallel
machines loaded with different sets of tools, computer disks loaded with different
information files, or operators in a call center handling requests from different
customers. Nevertheless, the literature on the GSQS is limited. Schwartz [12] (see
also Roque [11]) considered a system related to the GSQS, but with a specific server
hierarchy. He derived some expressions for the mean waiting times. Adan, Wessels
and Zijm [2] derived rough approximations for the mean waiting times in a GSQS.
Green [7] constructed a truncation model for a related system with two types of
jobs and two types of servers: servers which can serve both job types and servers
which can only serve jobs of the second type.
For the present model with general (i.e. nonexponential) arrivals, Sparaggis,
Cassandras and Towsley [13] showed that the generalized shortest queue routing
is optimal with respect to the overall mean waiting time for symmetric cases (see
Theorem 3.1 in [13]; see also Subsection 2.3). For more general systems, Foss and
Chernova [6] used a fluid approximation approach to establish ergodicity condi-
tions (see also the remarks at the end of Section 2.2). The issue of ergodicity has
also been considered in a recent report by Foley and McDonald [5]. Their main
contribution, however, consists of results on the asymptotic behavior of a GSQS
with two exponential servers with different service rates. Finally, Hassin and Ha-
viv [8] have studied a symmetric GSQS with two servers and an additional property
called threshold jockeying. They focus on the difference in waiting time between
jobs which can choose between both servers and jobs which can not choose.
The GSQS can be described by a continuous-time Markov process with multi-
dimensional states where each component denotes the queue length at one of the
servers. Only in very special cases exact analytical solutions can be found (see e.g.
[3]). Therefore, to determine the mean waiting times, we will construct truncation
models which: (i) are flexible (i.e. the size of their state space can be controlled
by one or more truncation parameters); (ii) can be solved efficiently; (iii) provide
upper and lower bounds for the mean waiting times. Such models are called solvable
flexible bound models. They are derived by using the so-called precedence relation
method. This is a systematic approach for the construction of bound models, which
has been developed in [14, 15]. In this paper we will construct a lower and upper
bound model for the mean waiting times. These two models constitute the core of a
powerful numerical approach: the two bound models are solved for increasing sizes
of the truncated state space until the mean waiting times are determined within a
given, desired accuracy.
This paper is organized as follows. In Section 2, we describe the GSQS and
we discuss conditions under which the GSQS is ergodic and balanced. Next, in
Section 3, we construct the flexible bound models and we formulate a numerical
approach to determine the mean waiting times. Finally, in Section 4, we investigate
how the mean waiting times for the GSQS depend on the amount of overlap (i.e.
common job types) between the servers. This is done by numerically evaluating
several scenarios.

2 Model

Fig. 2. A GSQS with c = 2 servers and three job types

This section consists of three subsections. In the first subsection, we describe the
GSQS. In Subsection 2.2 we present a simple condition that is necessary and sufficient
for ergodicity. In the last subsection, we present a related condition under
which the GSQS is said to be balanced and we briefly discuss symmetric systems.

2.1 Model description

The GSQS consists of c ≥ 2 parallel servers serving multiple job types. Each server
has its own queue and is capable of serving a restricted set of job types only. All
service times are exponentially distributed with the same parameter µ > 0. The
arrival stream of each job type is Poisson and an arriving job joins the shortest
queue among all queues capable of serving that job (ties are broken with equal
probabilities). Figure 2 shows a GSQS with c = 2 servers and three job types: type
A, B and C jobs arrive with intensity λA , λB and λC , respectively. The A jobs can
be served by both servers, the B jobs can only be served by server 1, and the C
jobs must be served by server 2.
We introduce the following notations. The servers are numbered from 1, . . . , c
and the set I is defined by I = {1, . . . , c}. The set of all job types is denoted by J.
The arrival intensity of type j ∈ J jobs is given by λj ≥ 0, and λ = Σ_{j∈J} λj is
the total arrival intensity. For each j ∈ J, I(j) denotes the set of servers that can
serve the jobs of type j. We assume that each job type can be served by at least one
server and each server can handle at least one job type; so, I(j) ≠ ∅ for all j ∈ J,
and ∪j∈J I(j) = I. Without loss of generality, we set µ = 1. Then the average
workload per server is given by ρ = λ/c. Obviously, the requirement ρ < 1 is
necessary for ergodicity.
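As a quick illustration of the routing rule (a sketch of our own, not part of the paper; all function and variable names are invented), the GSQS can be simulated directly. Since arrivals are Poisson, PASTA lets us estimate the mean waiting time of type j as the time-average of the shortest feasible queue length, so only the queue-length vector needs to be tracked:

```python
import random

def simulate_gsqs(lam, servers_for, horizon=5e4, seed=7):
    """Monte-Carlo estimate of the mean waiting times in a GSQS (mu = 1).

    lam: dict job type -> Poisson arrival intensity (lambda_j)
    servers_for: dict job type -> tuple of feasible servers (the set I(j))
    By PASTA, the mean waiting time of type j equals the time-average of
    min over I(j) of m_i, so only the queue-length vector m is tracked.
    """
    rng = random.Random(seed)
    servers = sorted({i for s in servers_for.values() for i in s})
    m = {i: 0 for i in servers}            # queue lengths, jobs in service included
    t = 0.0
    area = {j: 0.0 for j in lam}           # time-integrals of shortest feasible queue
    while t < horizon:
        events = [(lam[j], ('arr', j)) for j in lam]
        events += [(1.0, ('dep', i)) for i in servers if m[i] > 0]
        total = sum(r for r, _ in events)
        dt = rng.expovariate(total)
        for j in lam:                      # accumulate with the pre-event state
            area[j] += dt * min(m[i] for i in servers_for[j])
        t += dt
        u, acc = rng.random() * total, 0.0
        for rate, (kind, x) in events:     # pick the next event
            acc += rate
            if u <= acc:
                break
        if kind == 'arr':                  # join the shortest feasible queue
            short = min(m[i] for i in servers_for[x])
            m[rng.choice([i for i in servers_for[x] if m[i] == short])] += 1
        else:                              # service completion at server x
            m[x] -= 1
    W = {j: area[j] / t for j in lam}
    lam_tot = sum(lam.values())
    W['overall'] = sum(lam[j] * W[j] for j in lam) / lam_tot
    return W
```

For the system of Figure 2 (λA = 0.6, λB = λC = 0.3) this yields noisy point estimates only; the remainder of the paper computes such quantities with guaranteed accuracy instead.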
The behavior of the GSQS is described by a continuous-time Markov process
with states (m1 , . . . , mc ), where mi denotes the length of the queue at server i,
i ∈ I (jobs in service are included). So, the state space is equal to
M = {m | m = (m1 , . . . , mc ) with mi ∈ IN0 for all i ∈ I} . (1)

We assume that ∑_{j∈J} λj 1{i∈I(j)} > 0 for all servers i ∈ I (here, 1{G} is the
indicator function, which is 1 if G is true and 0 otherwise), i.e., that all servers
have a positive potential arrival rate. This guarantees that the Markov process is irreducible. The transition rates are denoted by qm,n. Figure 3 shows the transition rates for the GSQS in Figure 2.

Performance analysis of parallel identical machines 293

Fig. 3. The transition rate diagram for the GSQS in Figure 2
The relevant performance measures are the mean waiting times W(j) for each job type j ∈ J, and the overall mean waiting time W, which is equal to

W = ∑_{j∈J} (λj/λ) W(j) .    (2)

It is obvious that for an ergodic system,

W(j) = ∑_{(m1 ,...,mc )∈M} ( min_{i∈I(j)} mi ) π(m1 , . . . , mc ) ,    j ∈ J,    (3)

where π(m1 , . . . , mc ) denotes the steady-state probability for state (m1 , . . . , mc ).

2.2 Ergodicity

By studying the job routing, we obtain a simple, necessary condition for the ergodicity of the GSQS. For each subset J′ ⊂ J, J′ ≠ ∅, jobs of type j ∈ J′ arrive with an intensity equal to ∑_{j∈J′} λj and they must be served by the servers ∪_{j∈J′} I(j). This immediately leads to the following lemma.

Lemma 1 The GSQS can only be ergodic if

∑_{j∈J′} λj < |∪_{j∈J′} I(j)| for all J′ ⊂ J, J′ ≠ ∅.    (4)

Note that for J′ = J, this inequality is equivalent to ρ < 1. For the GSQS in
Figure 2, condition (4) states that for ergodicity it is necessary that the inequalities
λB < 1, λC < 1 and λ < 2 (or, equivalently, ρ < 1) are satisfied. It appears that
condition (4) is also sufficient for ergodicity. To show this, we consider so-called
corresponding static systems.
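Since J is finite, condition (4) can be verified by brute-force enumeration of the subsets J′. A small checker (illustrative code of ours, with µ = 1 as in the paper):

```python
from itertools import combinations

def is_ergodic(lam, servers_for):
    """Check condition (4): for every nonempty subset J' of job types,
    sum of lambda_j over J' must be strictly below the number of servers
    in the union of the feasible sets I(j), j in J' (service rate mu = 1)."""
    jobs = list(lam)
    for r in range(1, len(jobs) + 1):
        for Jp in combinations(jobs, r):
            load = sum(lam[j] for j in Jp)
            pool = set().union(*(set(servers_for[j]) for j in Jp))
            if load >= len(pool):
                return False
    return True
```

For the GSQS of Figure 2 the checker confirms ergodicity when λA = 0.6, λB = λC = 0.3, and rejects any instance with, say, λB ≥ 1 (subset {B} violates (4)). The enumeration is exponential in |J|, which is harmless for the small job-type sets considered here.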
A corresponding static system is a system that is identical to the GSQS, but
with static (random) routing instead of dynamic shortest queue routing. The static
routing is described by discrete distributions {xi(j)}i∈I(j), j ∈ J, where for each
j ∈ J and i ∈ I(j), the variable xi(j) denotes the probability that an arriving job
of type j is sent to server i. Under static routing, it holds for each j ∈ J that the
Poisson stream of arriving type j jobs is split up into Poisson streams with intensities
xj,i = λj xi(j), i ∈ I(j), for type j arrivals joining server i. Hence the queues i ∈ I
constitute independent M/M/1 queues with identical mean service times equal to
µ = 1 and arrival intensities ∑_{j∈J(i)} xj,i, where J(i) = {j ∈ J | i ∈ I(j)}. As a
result, we obtain a simple necessary and sufficient condition for the ergodicity of a
corresponding static system, viz.

∑_{j∈J(i)} xj,i < 1 for all i ∈ I.

Lemma 2 For a GSQS, there exists a corresponding static system that is ergodic,
if and only if condition (4) is satisfied.

Proof. There exists a corresponding static system that is ergodic if and only if there
exists a nonnegative solution {xj,i}(j,i)∈A, with A = {(j, i) | j ∈ J, i ∈ I and i ∈
I(j)}, of the following equations and inequalities:

∑_{i∈I(j)} xj,i = λj for all j ∈ J,    ∑_{j∈J(i)} xj,i < 1 for all i ∈ I;    (5)

the equalities in (5) guarantee that the solution {xj,i}(j,i)∈A corresponds to discrete
distributions {xi(j)}i∈I(j) which describe a static routing, and the inequalities in (5)
must be satisfied for ergodicity. It is easily seen that (5) has no solution if condition
(4) is not satisfied.
Now, assume that condition (4) is satisfied. To prove that there exists a nonnegative solution {xj,i}(j,i)∈A of (5), we consider a transportation problem with supply
nodes V̂1 = J ∪ {0}, demand nodes V̂2 = I, and arcs Â = A ∪ {(0, i) | i ∈ I}
(supply node 0 denotes an extra type of jobs, which can be served by all servers).
Define the supplies âj by âj = λj for all j ∈ V̂1 \ {0} and â0 = c − λ − cε, where

ε := min_{J′⊂J, J′≠∅} ( |∪_{j∈J′} I(j)| − ∑_{j∈J′} λj ) / |∪_{j∈J′} I(j)|

(from (4), it follows that ε > 0, and â0 ≥ 0 since by taking J′ = J we obtain
the inequality ε ≤ (c − λ)/c ). Further, we define the demands b̂i by b̂i = 1 − ε
for all i ∈ V̂2; note that ∑_{j∈V̂1} âj = ∑_{i∈V̂2} b̂i. It may be verified that this transportation problem satisfies a necessary and sufficient condition for the existence of
a feasible flow; see Lemma 5.4 of [14], whose proof is based on a transformation
to a maximum-flow problem followed by the application of the max-flow min-cut
theorem (see e.g. [4]). So, there exists a feasible flow for the transportation problem,
i.e., there exists a nonnegative solution {x̂j,i}(j,i)∈Â of the equations

∑_{i∈V̂2, (j,i)∈Â} x̂j,i = âj for all j ∈ V̂1 ,    ∑_{j∈V̂1, (j,i)∈Â} x̂j,i = b̂i for all i ∈ V̂2 .

It is easily seen that then the solution {xj,i}(j,i)∈A defined by xj,i = x̂j,i for all
(j, i) ∈ A, is a nonnegative solution of (5), which completes the proof. □
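The constructive argument above can be mirrored numerically. The sketch below is our own illustration (it uses a plain Edmonds-Karp max-flow rather than the cited transformation of [14], and all names are invented): it computes ε from condition (4), builds the transportation network with the extra supply node 0, and tests whether the demands 1 − ε can be met.

```python
from collections import defaultdict, deque
from itertools import combinations

def max_flow(edges, s, t):
    """Edmonds-Karp max flow with float capacities (residual arcs included)."""
    cap, adj = defaultdict(float), defaultdict(set)
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v); adj[v].add(u)        # reverse arc starts at residual 0
    total = 0.0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:        # BFS for an augmenting path
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u; q.append(v)
        if t not in parent:
            return total
        aug, v = float('inf'), t            # bottleneck along the path
        while parent[v] is not None:
            aug = min(aug, cap[(parent[v], v)]); v = parent[v]
        v = t
        while parent[v] is not None:        # push flow, update residuals
            u = parent[v]; cap[(u, v)] -= aug; cap[(v, u)] += aug; v = u
        total += aug

def ergodic_static_routing_exists(lam, servers_for):
    """Mirror the Lemma 2 construction: compute epsilon from condition (4),
    then test feasibility of the transportation problem with the extra
    common supply node 0 and per-server demand 1 - epsilon."""
    servers = sorted({i for s in servers_for.values() for i in s})
    c, lam_tot, jobs = len(servers), sum(lam.values()), list(lam)
    def pool(Jp):
        return set().union(*(set(servers_for[j]) for j in Jp))
    eps = min((len(pool(Jp)) - sum(lam[j] for j in Jp)) / len(pool(Jp))
              for r in range(1, len(jobs) + 1) for Jp in combinations(jobs, r))
    if eps <= 0:
        return False                        # condition (4) violated
    edges = [('s', ('j', j), lam[j]) for j in jobs]
    edges += [('s', ('j', 0), c - lam_tot - c * eps)]
    edges += [(('j', j), ('i', i), lam[j]) for j in jobs for i in servers_for[j]]
    edges += [(('j', 0), ('i', i), c) for i in servers]
    edges += [(('i', i), 't', 1 - eps) for i in servers]
    return abs(max_flow(edges, 's', 't') - c * (1 - eps)) < 1e-9
```

For the GSQS of Figure 2 with λA = 0.6, λB = λC = 0.3 one finds ε = 0.4, â0 = 0, and a feasible flow exists, in line with Lemma 2.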

In situations with many job types, shortest queue routing will balance the queue
lengths more than any static routing. So if there is a corresponding static system
that is ergodic, then the GSQS will also be ergodic. Together with Lemma 2, this
informally shows that the following theorem holds.
Theorem 1 The GSQS is ergodic if and only if condition (4) is satisfied.
For a formal proof of this theorem, the reader is referred to Foss and Chernova [6]
or Foley and McDonald [5]. In the latter paper, a generalization of condition (4)
is proved to be necessary and sufficient for the (more general) model with differ-
ent service rates. Their proof also exploits the connection with a corresponding
static system. Foss and Chernova [6] use a fluid approximation approach to derive
necessary conditions for a model with general arrivals and general service times.

2.3 Balanced and symmetric systems

It is desirable that the shortest queue routing, as reflected by the sets I(j), balances
the workload among the servers. Formally, we say that a GSQS is balanced if there
exists a corresponding static system for which all queues have the same workload.
This means that there must exist discrete distributions {xi(j)}i∈I(j) such that for
each server i ∈ I, the arrival intensity ∑_{j∈J, (j,i)∈A} xj,i is equal to λ/c = ρ, where
the xj,i and the set A are defined as before. Such discrete distributions exist if and
only if there exists a nonnegative solution {xj,i}(j,i)∈A of the equations

∑_{i∈I, (j,i)∈A} xj,i = λj for all j ∈ J,    ∑_{j∈J, (j,i)∈A} xj,i = λ/c for all i ∈ I.    (6)

These equations are precisely the equations which must be satisfied by a feasible
flow for the transportation problem with supply nodes V1 = J, demand nodes
V2 = I, arcs A, supplies aj = λj for all j ∈ V1 and demands bi = λ/c for all
i ∈ V2 . Applying the necessary and sufficient condition for the existence of such a
feasible flow (see [14]) leads to the following lemma.
Lemma 3 A GSQS is balanced if and only if

∑_{j∈J′} λj ≤ (λ/c) |∪_{j∈J′} I(j)| for all J′ ⊂ J.    (7)

Note that for J′ = ∅ and J′ = J, condition (7) holds by definition. Further, it
follows that a balanced GSQS satisfies condition (4) if and only if ρ < 1. So,
for a balanced GSQS, the simple condition ρ < 1 is necessary and sufficient for
ergodicity.
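Like condition (4), the balance condition (7) involves only finitely many subsets and can be checked directly (an illustrative helper of ours):

```python
from itertools import combinations

def is_balanced(lam, servers_for):
    """Check condition (7): for every subset J' of job types,
    sum of lambda_j over J' is at most (lambda/c) * |union of I(j) over J'|."""
    jobs = list(lam)
    servers = {i for s in servers_for.values() for i in s}
    rho = sum(lam.values()) / len(servers)      # lambda / c
    for r in range(1, len(jobs) + 1):
        for Jp in combinations(jobs, r):
            pool = set().union(*(set(servers_for[j]) for j in Jp))
            if sum(lam[j] for j in Jp) > rho * len(pool) + 1e-12:
                return False
    return True
```

For the two-server example this reduces to the stated pair of inequalities λB ≤ λ/2 and λC ≤ λ/2.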
For a balanced GSQS the workloads under the shortest queue routing are not
necessarily balanced. This can be seen by considering the GSQS in Figure 2. Ac-
cording to condition (7), this GSQS is balanced if and only if λB ≤ λ/2 and
λC ≤ λ/2, i.e. if and only if λB ≤ λA + λC and λC ≤ λA + λB . This condition is
obviously satisfied if we take λC = λA +λB . In this case, equal workloads for both
servers can only be obtained if all jobs of type A are sent to server 1. But, under
the shortest queue routing, it will still occur that jobs of type A are sent to server 2,
and therefore server 2 will have a higher workload than server 1. Nevertheless, one
may expect that for a balanced GSQS, the shortest queue routing at least ensures
that the workloads will not differ too much.
A subclass of balanced systems are the symmetric systems. A GSQS is said to
be symmetric, if

λ(I1) = λ(I2) for all I1, I2 ⊂ I with |I1| = |I2|,    (8)

where

λ(I′) := ∑_{j∈J: I(j)=I′} λj ,    I′ ⊂ I.

So, a GSQS is symmetric, if for all subsets I′ ⊂ I with the same number of servers
|I′|, the arrival intensity λ(I′) for the jobs which can be served by precisely the
servers of I′, is the same. The GSQS in Figure 2 is symmetric if λB = λC .
For a symmetric GSQS, all queue lengths have the same distribution, which
implies that all servers have equal workloads. For such a system, it follows from
Sparaggis et al. [13] that the shortest queue routing minimizes the total number of
jobs in the system and hence the overall mean waiting time W . In particular, this
implies that the overall mean waiting time in a symmetric GSQS is less than in the
corresponding system consisting of c independent M/M/1 queues with workload
ρ.
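Condition (8) can likewise be tested mechanically by grouping the arrival intensities by feasible-server set (a hypothetical helper of ours):

```python
from collections import defaultdict
from itertools import combinations

def is_symmetric(lam, servers_for):
    """Check condition (8): lambda(I') must depend only on |I'|, where
    lambda(I') sums the rates of job types whose feasible set is exactly I'."""
    lam_of = defaultdict(float)
    for j, rate in lam.items():
        lam_of[frozenset(servers_for[j])] += rate
    servers = sorted({i for s in servers_for.values() for i in s})
    for k in range(1, len(servers) + 1):
        # all subsets of a given size must carry the same intensity
        vals = {round(lam_of[frozenset(S)], 9)
                for S in combinations(servers, k)}
        if len(vals) > 1:
            return False
    return True
```

For Figure 2, the check passes exactly when λB = λC, as stated in the text.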

3 Flexible bound models

In this section we construct two truncation models which are much easier to solve
than the original model. One truncation model produces lower bounds for the mean
waiting times, and the other one upper bounds. At the end of this section we describe
a numerical method for the computation of the mean waiting times within a given,
desired accuracy.
The truncation models exploit the property that the shortest queue routing causes
a drift towards states with equal queue lengths. The state space M′ of the two models
is obtained by truncating the original state space M around the diagonal, i.e.,

M′ = {m ∈ M | m = (m1 , . . . , mc ) and mi ≤ min(m) + Ti for all i ∈ I} ,    (9)

where min(m) := min_{i∈I} mi and T1 , . . . , Tc ∈ IN are so-called threshold parameters; the corresponding vector T̂ := (T1 , . . . , Tc ) is called the threshold vector.
So state m ∈ M also lies in M′ if and only if for each i ∈ I the length of queue
i is at most Ti greater than the length of any other queue. Later on in this section
we discuss how appropriate values for T̂ can be selected. There are two types of
transitions pointing from states inside M′ to states outside M′ :

(i) in state m = (m1 , . . . , mc ) ∈ M′ with min(m) > 0 and I′ = {i ∈ I | mi = min(m) + Ti } ≠ ∅, at a server k ∈ I with mk = min(m) a service completion
occurs with rate 1 and leads to a transition from m to state n = m − ek ∉ M′ ;
(ii) in state m = (m1 , . . . , mc ) ∈ M′ with I′ = {i ∈ I | mi = min(m) + Ti } ≠ ∅, at a server i ∈ I′ an arrival of a new job leads to a transition from m to the state n = m + ei ∉ M′ ; this transition occurs with
rate ∑_{j∈J} |I(j; m)|^{-1} λj 1{i∈I(j;m)} , where the set I(j; m) is defined by
I(j; m) = {i ∈ I(j) | mi = min_{k∈I(j)} mk } (note that this rate may be
equal to 0).

In the lower (upper) bound model, the transitions to states n outside M′ are redirected to states n′ with fewer (more) jobs inside M′ .

In the lower bound model, the transition in (i) is redirected to n′ = m − ek −
∑_{i∈I′} ei ∈ M′ . This means that the departure of a job at a non-empty shortest
queue is accompanied by killing one job at each of the queues i ∈ I′ , which are
already Ti greater than the shortest queue. The transition in (ii) is redirected to m
itself, i.e., a new job arriving at one of the servers i ∈ I′ is rejected. The lower
bound model is therefore called the Threshold Killing and Rejection (TKR) model.
In the upper bound model, the transition in (i) is redirected to m itself. This
means that if at least one queue is already Ti greater than the shortest queue, the
finished job in the shortest queue is not allowed to depart, but is served once more;
this is equivalent to saying that the servers at the shortest queues are blocked.
Transition (ii) is redirected to n′ = m + ei + ∑_{k∈Isq} ek ∈ M′ , with Isq = {k ∈
I | mk = min(m)}. This means that an arrival of a new job at one of the queues
which is already Ti greater than the shortest queue, is accompanied by the addition
of one extra job at each of the shortest queues. The upper bound model is therefore
called the Threshold Blocking and Addition (TBA) model. Note that this model
may be non-ergodic while the original model is ergodic. However, the larger the
values of the thresholds Ti , the more unlikely this situation. In Figure 4, we show
the redirected transitions in the lower and upper bound models for the GSQS of
Figure 2.
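For small instances the TKR lower-bound model can be solved by brute force instead of the matrix-geometric approach used in the paper. The sketch below is our own construction for c = 2: the band M′ is additionally capped at a finite queue length and the uniformized chain is solved by power iteration, both simplifications of ours, so the numbers are only approximations of the true TKR bounds.

```python
from itertools import product

def tkr_waiting_times(lam, servers_for, T, level_cap=30, iters=2500):
    """Steady state of the TKR (lower-bound) truncation model for c = 2.

    Servers are labelled 0 and 1 and thresholds must satisfy T_i >= 1.
    The band M' = {m : m_i <= min(m) + T_i} is still infinite, so queue
    lengths are additionally capped at `level_cap` (an extra truncation of
    ours, not in the paper; its effect vanishes as the cap grows).  The
    uniformized chain is solved by power iteration, and the lower bounds
    are evaluated via the analogue of (3) on M'.
    """
    c = 2
    states = [m for m in product(range(level_cap + 1), repeat=c)
              if all(m[i] <= min(m) + T[i] for i in range(c))]
    index = {m: k for k, m in enumerate(states)}
    Lam = sum(lam.values()) + c                  # uniformization constant
    trans = []                                   # per state: ([(target, prob)], self-loop)
    for m in states:
        out, stay = [], 1.0
        for j, rate in lam.items():              # arrivals join shortest feasible queue
            short = min(m[i] for i in servers_for[j])
            targets = [i for i in servers_for[j] if m[i] == short]
            for i in targets:
                n = tuple(m[k] + (k == i) for k in range(c))
                p = rate / len(targets) / Lam
                if n in index:
                    out.append((index[n], p)); stay -= p
                # else: job rejected (TKR rule (ii)); mass stays on m
        Ip = [i for i in range(c) if m[i] == min(m) + T[i]]
        for k in range(c):                       # departures at rate 1 per busy server
            if m[k] == 0:
                continue
            n = list(m); n[k] -= 1
            if m[k] == min(m) and Ip:            # TKR rule (i): kill one job per queue in I'
                for i in Ip:
                    n[i] -= 1
            out.append((index[tuple(n)], 1.0 / Lam)); stay -= 1.0 / Lam
        trans.append((out, stay))
    pi = [1.0 / len(states)] * len(states)       # power iteration
    for _ in range(iters):
        new = [0.0] * len(states)
        for k, (out, stay) in enumerate(trans):
            new[k] += pi[k] * stay
            for tgt, p in out:
                new[tgt] += pi[k] * p
        pi = new
    return {j: sum(pi[index[m]] * min(m[i] for i in servers_for[j])
                   for m in states)
            for j in lam}
```

Running this for the Figure 2 system gives lower bounds on the mean waiting times; by construction the bound for the common jobs never exceeds the bound for either specialist type, since min(m1, m2) ≤ m1 holds state by state.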
It is intuitively clear that the queues in the TKR model are stochastically smaller
than the queues in the original model. Hence, for each j ∈ J, the TKR model yields
a lower bound for the mean length of the shortest queue among the queues i ∈ I(j),
and thus also for the mean waiting time of type j jobs (cf. (3)). Denote the steady-
state probabilities in the TKR model by πTKR (m1 , . . . , mc ) and let

WTKR(j) (T̂ ) = ∑_{(m1 ,...,mc )∈M′} ( min_{i∈I(j)} mi ) πTKR (m1 , . . . , mc ) ,    j ∈ J.

Fig. 4. The redirected transitions in the TKR and TBA model for the GSQS depicted in
Figure 2. For both models, T̂ = (T1 , T2 ) = (3, 3)

Then we have for each j ∈ J that WTKR(j) (T̂ ) ≤ W(j) , and thus (cf. (2))

WTKR (T̂ ) = ∑_{j∈J} (λj/λ) WTKR(j) (T̂ )

yields a lower bound for the overall mean waiting time W . The lower bounds
WTKR(j) (T̂ ) monotonically increase as the thresholds T1 , . . . , Tc increase. Similarly,
the TBA model produces monotonically decreasing upper bounds WTBA(j) (T̂ ), j ∈
J, and WTBA (T̂ ). The bounds and the monotonicity properties can be rigorously
proved by using the precedence relation method, see [14]. This method is based on
Markov reward theory and has been developed in [14, 15].
The truncation models can be solved efficiently by using the matrix-geometric
approach described in [10]. Since the truncation models exploit the property that
shortest queue routing tries to balance the queues, one may expect that the bounds
are tight for already moderate values of the thresholds T1 , . . . , Tc .
We will now formulate a numerical method to determine the mean waiting
times with an absolute accuracy εabs . The method repeatedly solves the TKR and
TBA model for increasing threshold vectors T̂ = (T1 , . . . , Tc ). For each vector T̂ we use (WTKR(j) (T̂ ) + WTBA(j) (T̂ ))/2 as an approximation for W(j) and
∆(j) (T̂ ) = (WTBA(j) (T̂ ) − WTKR(j) (T̂ ))/2 as an upper bound for the error; we similarly approximate W by (WTKR (T̂ ) + WTBA (T̂ ))/2 where the error is at most
∆(T̂ ) = (WTBA (T̂ ) − WTKR (T̂ ))/2. The approximations and error bounds are
set equal to ∞ if the TBA model is not ergodic (which may be the case for small
thresholds). The computation procedure stops when all error bounds are less than
or equal to εabs ; otherwise at least one of the thresholds is increased by 1 and new
approximations are computed. The decision to increase a threshold Ti is based on
the rate of redirections rrd (i). This is explained in the next paragraph.

The variable rrd (i), i ∈ I, denotes the rate at which redirections occur in the
boundary states m = (m1 , . . . , mc ) with mi = min(m) + Ti of the truncated state
space. If for given T̂ only the TKR model is ergodic, then rrd (i) denotes the rate for
the TKR model, otherwise rrd (i) denotes the sum of the rate for the TKR and TBA
model. The rates rrd (i) can be computed directly from the steady-state distributions
of the bound models. The higher the rate rrd (i), the higher the expected impact
of increasing Ti . The computation procedure increases all thresholds Ti for which
rrd (i) = maxk∈I rrd (k). The numerical method is summarized below.

Algorithm (to determine the mean waiting times for the GSQS)
Input: The data of an ergodic instance of the GSQS, i.e.,
        c, J, I(j) for all j ∈ J, and λj for all j ∈ J;
        the absolute accuracy εabs ;
        the initial threshold vector T̂ = (T1 , . . . , Tc ).
Step 1. Determine WTKR(j) (T̂ ), WTBA(j) (T̂ ) and ∆(j) (T̂ ) for all j ∈ J,
        and WTKR (T̂ ), WTBA (T̂ ) and ∆(T̂ ),
        and rrd (i) for all i ∈ I.
Step 2. If ∆(j) (T̂ ) > εabs for some j ∈ J or ∆(T̂ ) > εabs ,
        then Ti := Ti + 1 for all i ∈ I with rrd (i) = maxk∈I rrd (k),
        and return to Step 1.
Step 3. W(j) = (WTKR(j) (T̂ ) + WTBA(j) (T̂ ))/2 for all j ∈ J,
        and W = (WTKR (T̂ ) + WTBA (T̂ ))/2.

Note that for a symmetric GSQS it is natural to start with a threshold vector T̂
with equal components. Then in each iteration all rates rrd (i) will be equal, and
hence each Ti will be increased by 1. So the components of T̂ will remain equal.

4 Numerical study of the GSQS

In this section we consider three scenarios. In Subsection 4.1 we distinguish two
types of jobs: common jobs and specialist jobs. The common jobs can be served
by all servers and the other ones can be served by only one specific server. We
focus on the behavior of the overall mean waiting time W as a function of the
fraction of work due to common jobs. The higher this fraction, the more balanced
the queues and the better the performance. So W will be decreasing as the number
of common jobs increases. In one extreme case, viz. when all jobs are specialist
jobs, the GSQS reduces to independent M/M/1 queues, and W is maximal. In the
other extreme case, viz. when all jobs are common jobs, the GSQS is identical to a
pure Symmetric Shortest Queue System (SSQS), and W is minimal. In Subsection
4.1 we investigate how W behaves in between these two extremes.
In Subsection 4.2 we consider a symmetric GSQS with c = 3 servers, and,
besides common and specialist jobs, we also have semi-common jobs. These jobs
can be served by two servers. We compare two situations: (i) a GSQS with a given
fraction of common jobs (and no semi-common jobs); (ii) a GSQS with twice this

fraction of semi-common jobs (and no common jobs). In both cases the average
number of servers capable of serving an arbitrary job is the same. In Subsection
4.3 we evaluate a series of balanced, asymmetric systems. We investigate how the
mean waiting times deteriorate due to the asymmetry. Finally, in Subsection 4.4,
the main conclusions are summarized.

4.1 The impact of common jobs

We distinguish c + 1 job types, numbered 1, . . . , c, c + 1. Type j jobs are specialist
jobs, which can only be served by server j, j = 1, . . . , c. The type c + 1 jobs are
common jobs, which can be served by all servers. The total arrival intensity is equal
to λ = cρ, with ρ ∈ (0, 1). The common jobs constitute a fraction p, p ∈ [0, 1],
of the total arrival stream, while each of the streams of specialist jobs constitutes
an equal part of the remaining stream. So λc+1 = pλ and λj = (1 − p)λ/c for
j = 1, . . . , c.
Table 1 lists the mean waiting times for specialist jobs (= W(1) = . . . = W(c) ),
common jobs (= W(c+1) ), and an arbitrary job (= W ) as a function of p for a system
with c = 2 and c = 3 servers, respectively, and a workload ρ = 0.9. For p = 0 there
are no common jobs; then W(c+1) is defined as the limiting value of the waiting
time of common jobs as p ↓ 0. For p = 1 a similar remark holds for the mean
waiting times W(1) = · · · = W(c) . Table 1 also lists the realized reduction rr(p).
This is defined as

rr(p) = (WM/M/1 − W) / (WM/M/1 − WSSQS) ,    (10)
where WM/M/1 and WSSQS denote the mean waiting time in an M/M/1 system
and SSQS, respectively, both with the same workload ρ = 0.9 and mean service
time µ = 1 as for the GSQS. The mean waiting time WM/M/1 is realized when
p = 0, and WSSQS is realized when p = 1. Clearly, rr(0) = 0 and rr(1) = 1 by
definition. For all cases in Table 1, WM/M/1 = 9 and WSSQS = 4.475 for c = 2
and WSSQS = 2.982 for c = 3. The mean waiting times in the SSQS have been
determined with an absolute accuracy of 0.0001 by using the bound models in [1].
The mean waiting times in Table 1 have been determined by using the algorithm
described in Section 3 with an absolute accuracy εabs = 0.005.
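For reference, (10) is easy to evaluate from the quoted constants; the helper below is illustrative (our own names), with WM/M/1 obtained from the standard M/M/1 mean waiting time ρ/(1 − ρ) for µ = 1:

```python
def rr(W, rho=0.9, W_sqs=4.475):
    """Realized reduction, Eq. (10).  W_mm1 = rho/(1 - rho) is the M/M/1
    mean waiting time (mu = 1); the default W_sqs is the SSQS value quoted
    in the text for c = 2 and rho = 0.9."""
    W_mm1 = rho / (1.0 - rho)
    return (W_mm1 - W) / (W_mm1 - W_sqs)
```

With W = 5.72 (Table 1, c = 2, p = 0.2) this reproduces the tabulated reduction of about 72.6%, up to the rounding of the tabulated waiting times.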
In Table 1 we see that the overall mean waiting time W = pW(c+1) + (1 − p)W(1) sharply decreases for small values of p; see also Figure 5. Already 73% of
the maximal reduction is realized when 20% of the jobs are common, and 91% of
the maximal reduction is realized when 50% of the jobs are common. A surprising
result is that the realized reduction rr(p) is almost the same for c = 2 and c = 3
servers. Further note that for large p the mean waiting time W(1) for specialist jobs
is only a little bit larger than the mean waiting time W(c+1) for common jobs. This
is due to the balancing effect of the common jobs.
The behavior of the overall mean waiting time W is further investigated in Table
2 for different values of p, ρ and c. The mean waiting times are again determined
with an absolute accuracy εabs = 0.005 (and 0.0001 for WSSQS ). Only for low
workloads (i.e., ρ ≤ 0.4), the mean waiting time has been determined even more

Table 1. Mean waiting times as a function of p and c

            c = 2                             c = 3
p      W(1)   W(c+1)   W     rr(p)      W(1)   W(c+1)   W     rr(p)
0.0 9.00 4.26 9.00 0.0 % 9.00 2.69 9.00 0.0 %
0.1 6.80 4.36 6.56 54.0 % 6.07 2.82 5.75 54.1 %
0.2 6.04 4.40 5.72 72.6 % 5.06 2.88 4.63 72.7 %
0.3 5.66 4.43 5.29 82.0 % 4.56 2.91 4.06 82.0 %
0.4 5.43 4.44 5.04 87.6 % 4.25 2.93 3.72 87.7 %
0.5 5.28 4.45 4.86 91.4 % 4.05 2.95 3.50 91.4 %
0.6 5.17 4.46 4.74 94.1 % 3.90 2.96 3.34 94.1 %
0.7 5.09 4.46 4.65 96.1 % 3.79 2.97 3.21 96.1 %
0.8 5.02 4.47 4.58 97.7 % 3.71 2.97 3.12 97.7 %
0.9 4.97 4.47 4.52 99.0 % 3.64 2.98 3.04 99.0 %
1.0 4.93 4.48 4.48 100.0 % 3.58 2.98 2.98 100.0 %

Fig. 5. Graphical representation of the mean waiting times W listed in Table 1

accurately in order to obtain sufficiently accurate estimates for rr(p). The results
in Table 2 show that for each combination of p and c, the values for W for varying
workloads ρ are not that far away from the values for WSSQS ; in particular, the
absolute differences are small for small workloads ρ and the relative differences are
small for high workloads ρ. The results also suggest that rr(p) is insensitive to
the number of servers c. However, rr(p) strongly depends on ρ; it is rather small
for low workloads and large for high workloads (it seems that rr(p) ↑ 1 as ρ ↑ 1).

4.2 Common versus semi-common jobs

In Subsection 4.1 we distinguished two job types only, specialist and common
jobs. For GSQSs with more than two servers, one may also have jobs in between,

Table 2. Mean waiting times as a function of p, ρ and c

                          c = 2                       c = 3
p      ρ     WM/M/1     W     WSSQS   rr(p)       W     WSSQS   rr(p)
0.25 0.2 0.25 0.19 0.07 32.1 % 0.18 0.02 30.8 %
0.4 0.67 0.51 0.26 39.6 % 0.46 0.13 38.6 %
0.6 1.50 1.10 0.68 49.2 % 0.97 0.42 48.9 %
0.8 4.00 2.67 1.96 64.8 % 2.24 1.29 64.8 %
0.9 9.00 5.47 4.47 77.9 % 4.31 2.98 78.0 %
0.95 19.00 10.69 9.49 87.3 % 7.94 6.33 87.4 %
0.98 49.00 25.86 24.49 94.4 % 18.17 16.35 94.4 %
0.50 0.2 0.25 0.14 0.07 58.7 % 0.12 0.02 57.2 %
0.4 0.67 0.40 0.26 66.4 % 0.32 0.13 65.5 %
0.6 1.50 0.89 0.68 74.5 % 0.70 0.42 74.3 %
0.8 4.00 2.27 1.96 84.7 % 1.70 1.29 84.8 %
0.9 9.00 4.86 4.47 91.4 % 3.50 2.98 91.4 %
0.95 19.00 9.93 9.49 95.4 % 6.92 6.33 95.4 %
0.98 49.00 24.97 24.49 98.1 % 16.98 16.35 98.1 %

i.e., jobs that can be served by two or more, but not all servers. In this subsection
we investigate which job types lead to the largest reduction of W : common or
semi-common jobs?
We consider a GSQS with c = 3 servers and a total arrival rate λ = 3ρ
with ρ ∈ (0, 1). The following two cases are distinguished for the detailed arrival
streams. For case I, we copy the situation in Subsection 4.1. In this case there are
4 job types. The type 4 jobs are common jobs; they arrive with intensity λ4 = pλ
with p ∈ [0, 0.5] (the reason why p may not exceed 0.5 follows below). Type j jobs,
j = 1, 2, 3 are specialist jobs which only can be served by server j; they arrive with
intensity λj = (1 − p)λ/3. So the mean number of servers capable of serving an
arbitrary job is equal to 1 + 2p. In case II we have 6 job types. The type j jobs,
j = 1, 2, 3, are again specialist jobs which can only be served by server j. The type
4, 5 and 6 jobs are semi-common jobs; the type 4 jobs can be served by the servers
1 and 2, the type 5 jobs by 1 and 3, and the type 6 jobs by 2 and 3. To guarantee
that the mean number of servers capable of serving an arbitrary job remains the
same (i.e., equal to 1 + 2p), the arrival intensity λj is set equal to λj = 2pλ/3 for
j = 4, 5, 6 and λj = (1 − 2p)λ/3 for j = 1, 2, 3 (to avoid negative intensities, p
must be less than or equal to 0.5).
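That the overlap is indeed the same in the two cases can be double-checked by computing the arrival-rate-weighted mean of |I(j)| (an illustrative helper of ours):

```python
def mean_feasible_servers(lam, servers_for):
    """Arrival-rate-weighted average of |I(j)|: the mean number of servers
    capable of serving an arbitrary arriving job."""
    lam_tot = sum(lam.values())
    return sum(rate * len(servers_for[j]) for j, rate in lam.items()) / lam_tot
```

For p = 0.25 and λ = 3, both case I (one common type of rate pλ) and case II (three semi-common types of rate 2pλ/3 each) give the same mean 1 + 2p = 1.5.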
Table 3 lists the overall mean waiting time W for different values of p and ρ.
The results for case I are copied from Table 2. We can conclude that the absolute
difference between the mean waiting time W in case I and II is rather small in each
situation. This suggests that W is mainly determined by the mean number of servers
capable of serving an arbitrary job; it does not matter whether this mean number is
realized by common or by (twice as many) semi-common jobs. Nevertheless, the
results in Table 3 also show that in each situation case II yields a smaller W than

Table 3. Mean waiting times as a function of p and ρ

                   W              Diff. (I−II)
p      ρ      Case I   Case II    Abs.    Rel.
0.25 0.2 0.18 0.14 0.04 23.4 %
0.4 0.46 0.38 0.08 18.1 %
0.6 0.97 0.83 0.15 14.9 %
0.8 2.24 1.97 0.27 11.9 %
0.9 4.31 3.92 0.38 8.9 %
0.95 7.94 7.46 0.48 6.0 %
0.98 18.17 17.62 0.55 3.0 %
0.50 0.2 0.12 0.05 0.07 55.1 %
0.4 0.32 0.22 0.10 32.5 %
0.6 0.70 0.56 0.14 20.1 %
0.8 1.70 1.51 0.19 11.2 %
0.9 3.50 3.27 0.22 6.4 %
0.95 6.92 6.67 0.25 3.6 %
0.98 16.98 16.72 0.26 1.5 %

case I. This may be explained as follows. Let us consider the situation with p = 0.5.
In case I, λ1 = λ2 = λ3 = λ/6 and λ4 = λ/2. Hence, for each group of 6 arriving
jobs, on average 4 jobs join the shortest queue, 1 job joins the shortest but one
queue, and 1 job joins the longest queue. In case II, however, λ1 = λ2 = λ3 = 0
and λ4 = λ5 = λ6 = λ/3. Thus for each group of 6 arriving jobs, on average 4
jobs join the shortest queue and 2 jobs join the shortest but one queue. So in case
II the balancing of queues will be slightly stronger, and thus W will be slightly
smaller.

4.3 Balanced asymmetric systems

In this subsection we study the GSQS with c = 2 servers and three job types as
depicted in Figure 2. The parameters are chosen as follows: ρ = 0.9, λ = 2ρ = 1.8,
λA = λ/2 = 0.9, λB = p̂λ/2 = 0.9p̂, λC = (1 − p̂)λ/2 = 0.9(1 − p̂) where
p̂ ∈ [0, 0.5]. So one half of the jobs are common (type A) jobs and the other half
are specialist (type B and C) jobs. But the specialist jobs are not equally divided
over the servers. The fraction p̂ of specialist jobs which must be served by server
1 (i.e., the type B jobs) is less than or equal to the fraction 1 − p̂ of specialist jobs
which must be served by server 2 (i.e. the type C jobs). Only for p̂ = 0.5 we have
a symmetric system. For all p̂ ∈ [0, 0.5) we have an asymmetric, but balanced
system; a static system with equal workloads for both servers is obtained when a
fraction 1 − p̂ of the type A jobs is sent to server 1 and a fraction p̂ to server 2.
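The balancing static split just mentioned can be written down in closed form for this two-server family; the helper below (our own, with invented names) simply solves the single linear equation for the split of the type A stream:

```python
def balanced_split_fraction(lam_A, lam_B, lam_C):
    """For the two-server system of Figure 2: fraction of the common (type A)
    stream to send to server 1 so that a static routing equalises the two
    workloads; returns None if no such split exists."""
    target = (lam_A + lam_B + lam_C) / 2.0     # load each server should carry
    if lam_A == 0.0:
        return 0.5 if abs(lam_B - lam_C) < 1e-12 else None
    f = (target - lam_B) / lam_A
    return f if -1e-12 <= f <= 1.0 + 1e-12 else None
```

For the parameters of this subsection the function returns 1 − p̂, confirming the statement above; when the specialist load of one server alone already exceeds λ/2, no balancing split exists.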
Table 4 shows the mean waiting times W (A) , W (B) , W (C) for each job type and
the overall mean waiting time W for p̂ = 0, 0.1, . . . , 0.5. These waiting times have
again been computed with an absolute accuracy εabs = 0.005. In the last column of

Table 4. Mean waiting times as a function of p̂

p̂     W(A)    W(B)    W(C)    W      rr(p̂)
0.0 4.28 4.34 13.05 8.66 7.5 %
0.1 4.37 4.52 8.52 6.25 60.8 %
0.2 4.42 4.68 6.93 5.45 78.5 %
0.3 4.44 4.84 6.12 5.09 86.5 %
0.4 4.45 5.03 5.62 4.92 90.3 %
0.5 4.45 5.28 5.28 4.86 91.4 %

Table 4 we list the realized reduction rr(p̂) defined by (10), where WM/M/1 = 9
and WSSQS = 4.475 for ρ = 0.9. The results in Table 4 show that W (A) is fairly
constant for all values of p̂. As expected, W(B) decreases and W(C) increases as
p̂ decreases. A striking observation is that W(C) sharply increases for p̂ close to
0, and thus so does W = (W(A) + p̂W(B) + (1 − p̂)W(C) )/2. For p̂ = 0 we have
λA = λC = 0.9 and λB = 0, and the overall mean waiting time W is equal to
8.66. This is close to WM/M/1 = 9, which would be realized if all type A jobs were
sent to server 1.

4.4 Conclusion

The main conclusion from the numerical experiments is that the overall mean wait-
ing time may already be reduced significantly by creating a little bit of (semi-)
common work. Furthermore, this reduction is mainly determined by the amount
of overlap, i.e., the mean number of servers capable of handling an arbitrary job.
Finally, the beneficial effect of (semi-)common jobs may vanish for highly asym-
metric situations.

References

1. Adan IJBF, Van Houtum GJ, Van der Wal J (1994) Upper and lower bounds for the
waiting time in the symmetric shortest queue system. Annals of Operations Research
48: 197–217
2. Adan IJBF, Wessels J, Zijm WHM (1989) Queueing analysis in a flexible assembly
system with a job-dependent parallel structure. In: Operations Research Proceedings,
pp 551–558. Springer, Berlin Heidelberg New York
3. Adan IJBF, Wessels J, Zijm, WHM (1990) Analysis of the symmetric shortest queue
problem. Stochastic Models 6: 691–713
4. Ahuja RK, Magnanti TL, Orlin JB (1993) Network flows: theory, algorithms, and
applications. Prentice-Hall, Englewood Cliffs, NJ
5. Foley RD, McDonald DR (2000) Join the shortest queue: Stability and exact asymp-
totics. The Annals of Applied Probability (to appear)
6. Foss S, Chernova N (1998) On the stability of a partially accessible multi-station queue
with state-dependent routing. Queueing Systems 29: 55–73

7. Green L (1985) A queueing system with general-use and limited-use servers. Operations
Research 33: 168–182
8. Hassin R, Haviv M (1994) Equilibrium strategies and the value of information in a two
line queueing system with threshold jockeying. Stochastic Models 10: 415–435
9. Latouche G, Ramaswami V (1993) A logarithmic reduction algorithm for quasi-birth-
death processes. Journal of Applied Probability 30: 650–674
10. Neuts MF (1981) Matrix-geometric solutions in stochastic models. Johns Hopkins
University Press, Baltimore
11. Roque DR (1980) A note on “Queueing models with lane selection”. Operations Re-
search 28: 419–420
12. Schwartz BL (1974) Queueing models with lane selection: a new class of problems.
Operations Research 22: 331–339
13. Sparaggis PD, Cassandras CG, Towsley D (1993) Optimal control of multiclass par-
allel service systems with and without state information. In Proceedings of the 32nd
Conference on Decision and Control, San Antonio, pp 1686–1691
14. Van Houtum GJ (1995) New approaches for multi-dimensional queueing systems. Ph.D.
Thesis, Eindhoven University of Technology, Eindhoven
15. Van Houtum GJ, Zijm WHM, Adan IJBF, Wessels J (1998) Bounds for performance
characteristics: A systematic approach via cost structures. Stochastic Models 14: 205–
224 (Special issue in honor of M.F. Neuts)
16. Zijm WHM (1991) Operational control of automated PCB assembly lines. In: Fandel
G, Zaepfel G (eds) Modern production concepts: theory and applications, pp 146–164.
Springer, Berlin Heidelberg New York
A review and comparison
of hybrid and pull-type production control strategies
John Geraghty¹ and Cathal Heavey²
¹ School of Mechanical and Manufacturing Engineering, Dublin City University, Glasnevin,
  Dublin 9, Ireland (e-mail: [email protected])
² Department of Manufacturing and Operations Engineering, University of Limerick,
  Limerick, Ireland (e-mail: [email protected])
Abstract. In order to overcome the disadvantages of Kanban Control Strategy
(KCS) in non-repetitive manufacturing environments, two research approaches
have been followed in the literature in the past two decades. The first approach has
been concerned with developing new, or combining existing, pull-type production
control strategies in order to maximise the benefits of pull control while increasing
the ability of a production system to satisfy demand. The second approach has
focused on how best to combine Just-In-Time (JIT) and Material-Requirements-
Planning (MRP) philosophies in order to maximise the benefits of pull control in
non-repetitive manufacturing environments. This paper provides a review of the
research activities in these two approaches, presents a comparison between a Pro-
duction Control Strategy (PCS) from each approach, and presents a comparison
of the performance of several pull-type production control strategies in addressing
the Service Level vs. WIP trade-off in an environment with low variability and a
light-to-medium demand load.

Keywords: Hybrid Push/Pull – CONWIP/Pull – EKCS – BSCS – Kanban –
Markov decision process – Discrete event simulation – Simulated annealing opti-
mization algorithm

1 Introduction
The selection, implementation and management of an appropriate Production Con-
trol Strategy is an important tool for any organisation aiming to adopt a Lean Manu-
facturing Philosophy. Production control strategies that push products through the
system based on forecasted customer demands are classified as Push-type produc-
tion control strategies. Such strategies aim to maximise the throughput of the system
Correspondence to: J. Geraghty
so as to minimise shortage in supply and tend to result in excess work-in-progress
inventory, WIP, that masks flaws in the system. Production control strategies that
pull products through the system based on actual customer demands at the end of the
line are classified as Pull-type production control strategies. Such strategies tend to
minimise WIP and unveil flaws in the system at the risk of failure to satisfy demand.
The advantages and disadvantages of push systems such as MRP and pull systems
such as kanban controlled Just-In-Time have been well documented in the literature
[11, 23, 24, 31]. In order to overcome the disadvantages of Kanban Control Strategy
(KCS), two research approaches have been followed in the last two decades. The
first approach has been concerned with developing new, or combining existing,
pull-type production control strategies in order to maximise the benefits of pull
control while increasing the ability of a production system to satisfy demand. The
second approach has focused on how best to combine JIT and MRP philosophies
in order to maximise the benefits of pull control in non-repetitive manufacturing
environments. A hybrid production system could be characterised as a production
system that combines elements of the two philosophies in order to minimise inven-
tory and unmask flaws in the system while maintaining the ability of the system to
satisfy demand. These research approaches are not mutually exclusive as there are
intersections between these approaches. For instance, we classify CONWIP as a
pull-type production control strategy, however CONWIP could also be considered
as a hybrid Push/Pull production control strategy that utilises a pull-type control
strategy to limit the amount of inventory in the line and a push-type control
strategy within the line to speed the progress of inventory toward the finished-goods
buffer. In addition, Geraghty and Heavey [15] showed that under certain conditions
the Horizontally Integrated Hybrid Production Control Strategy, HIHPS, favoured
by Hodgson and Wang [20, 21] is equivalent to the pull-type production control
strategy hybrid Kanban-CONWIP introduced by Bonvik and Gershwin [3], Bonvik
et al. [2].
In this paper we firstly present a brief review of the research efforts in the de-
sign and development of both Pull-type PCS and Horizontally Integrated Hybrid
Systems, HIHS (see Sects. 2 and 3 respectively). Section 4 compares the perfor-
mance of one popular model of HIHS with the Pull-type PCS known as Extended
Kanban Control Strategy, EKCS. Section 5 presents an experiment to explore the
comparative performance of several Pull-type PCS in addressing the Service Level
vs. WIP trade-off. Finally Section 6 presents a discussion of the main results of the
experiments and research presented in this paper.
2 Pull-type production control strategies
The Kanban Control Strategy, KCS, developed by Toyota allows part flow in a Just-
In-Time, JIT, line to be controlled by basing production authorisations on end-item
demands. KCS is often referred to as a ‘Pull’ production control strategy since part
demands travel upstream and pull products down the line by authorising production
based on the presence of Kanban cards which are limited in number and circulated
between production stages. KCS has been the focus of considerable research effort
since the early 1980’s. In particular, optimising the number and distribution of
Kanbans has received a lot of attention. However, in practice, Kanban distributions
tend to be determined by implementing rules of thumb or simple formulae [2].
Berkley [1] provides a review of Kanban control literature, while Muckstadt and
Tayur [26] provide a review of KCS mechanisms that have been developed.
The Basestock Control Strategy, BSCS, is the oldest pull-type production con-
trol strategy. The definitive paper on BSCS [7] was published in 1960. In a BSCS
line the inventory points of each stage are initialised to predefined levels. When
a demand event occurs, demand cards are transmitted to each production stage.
These demand cards are matched with a part in the stage’s input buffer to authorise
production and are destroyed once production begins. Liberopoulos and Dallery
[25] demonstrated that BSCS is equivalent to the Hedging Point Control System
which has its origins in the work of Kimemia and Gershwin [22]. The primary
advantage of BSCS is that it responds quickly to demand events. Every stage is
informed instantly of demand events, unlike KCS where demand information must
pass slowly upstream. However, BSCS has been criticised for the loose coordina-
tion provided between stages and the fact that it does not provide any guarantee
to limit the number of parts that may enter the system [25]. Every demand event
authorises the release of new parts into the system.
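The mechanics just described can be sketched in a toy discrete-time simulation (the class and function names are illustrative, not taken from the BSCS literature): each demand event is broadcast to every stage in the same period, and one period later every output buffer is back at its base stock level, but nothing caps the total number of parts released into the line.

```python
import math

class Stage:
    def __init__(self, base_stock):
        self.demand_cards = 0       # outstanding production authorisations
        self.output = base_stock    # parts in this stage's output buffer

def bscs_step(stages, demand):
    """One period of a BSCS-controlled serial line (toy sketch)."""
    # Demand cards are transmitted to every stage instantly.
    for s in stages:
        s.demand_cards += demand
    # Process downstream-to-upstream so a part advances one stage per period.
    for j in range(len(stages) - 1, -1, -1):
        # Stage 1 draws from an unlimited raw-material supply.
        feed = math.inf if j == 0 else stages[j - 1].output
        produced = min(stages[j].demand_cards, feed)
        stages[j].demand_cards -= produced
        stages[j].output += produced
        if j > 0:
            stages[j - 1].output -= produced
    # The customer withdraws finished goods (unmet demand is simply lost here).
    shipped = min(stages[-1].output, demand)
    stages[-1].output -= shipped
    return shipped
```

Running `bscs_step` on a three-stage line with base stock 2 and a demand of 1 restores every buffer to 2 by the end of the period, illustrating both the quick response and the absence of any limit on the parts entering the system.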
Generalised Kanban Control Strategy, GKCS, and Extended Kanban Control
Strategy, EKCS, are both based on the integration of KCS and BSCS. GKCS was
first proposed by [4, 36] and EKCS was proposed by [9, 10]. In both systems
the inventory points are initialised to a predefined level as in BSCS and demand
information is communicated to each stage in the line. The movement of parts
between stages is coordinated by Kanbans as in KCS. The difference between the
control structures employed in both systems is very subtle and has to do with how
demand information is communicated to the individual production stages. In GKCS
when a demand event occurs information about the demand is communicated to
the final stage in the form of demand cards. Each demand card must be matched
with a free Kanban. When this match occurs, a demand card is sent to the stage’s
immediate predecessor and production at the stage is authorised if the demand-
Kanban match can be matched with a part. Therefore, demand information is not
necessarily transferred instantly to all production stages. The arrival of demand
information at a stage can be delayed if downstream stages fail to match the demand
cards with Kanbans instantly. In an EKCS governed line demand information is
communicated instantly to all production stages. Production is authorised when a
demand card, a Kanban and a part are available. The advantages of EKCS over
GKCS are, firstly, its comparative simplicity and secondly, the separation of the
role of the basestock and Kanban parameters is clearly distinguishable, whereas in
a GKCS system it is not [25].
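The subtle difference in demand communication can be made concrete with a small sketch (illustrative, not from [25]): under EKCS a demand card reaches every stage in the period it arrives, whereas under GKCS it is relayed upstream only while each stage can match it with a free Kanban.

```python
def demand_reach_gkcs(free_kanban):
    """How many stages (counting upstream from the final stage) a demand card
    reaches this period under GKCS. free_kanban[j] is True if stage j can
    match the card with a free Kanban immediately. Under EKCS the reach is
    always the full line, regardless of kanban availability."""
    reached = 0
    for has_free in reversed(free_kanban):
        reached += 1
        if not has_free:      # the card waits here until a kanban frees up
            break
    return reached
```

For a three-stage line where the middle stage has no free Kanban, the card stalls after two stages; with free Kanbans everywhere it reaches all three, matching the EKCS behaviour.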
Constant Work In Process or CONWIP has received a lot of research attention
since it was first proposed by [29, 30]. Initially CONWIP was proposed as a Pull
alternative to KCS and often referred to as an Order Release Mechanism as opposed
to a Production Control Strategy. CONWIP was purported to bring the advantages of
pull-control to non-repetitive manufacturing environments [29, 30]. The mechanism
utilised by CONWIP is very simple. A limit known as the WIP Cap is placed on
the amount of inventory that may be in the system at any given period of time.
Once this level of inventory has been achieved, inventory may not enter the system
until a demand event removes a corresponding amount of inventory from the line.
With only one parameter to optimise, i.e. the WIP Cap, CONWIP is very simple to
implement and maintain. The main reason why CONWIP lines outperform KCS
lines is that demand information is instantly communicated to the initial stage and
the release rate is adjusted to match the demand rate. In a KCS line, as has been stated
earlier, demand information has to travel upstream from the end-item inventory
point to the initial stage. The longer the line and the more delays encountered
at individual production stages (e.g. processing time, breakdown/repair time, set-
up time etc.) the longer the information delay encountered. A disadvantage of
CONWIP is that inventory levels are not controlled at the individual stages, which
can result in high inventory levels building up in front of bottleneck stages.
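A minimal sketch of the WIP Cap mechanism (class and method names are ours, not from [29, 30]) shows why CONWIP is so simple to implement: the single parameter is the cap itself, and a demand withdrawal is what frees capacity for the next release.

```python
class ConwipLine:
    """Toy CONWIP gate: one parameter, the WIP Cap."""

    def __init__(self, wip_cap):
        self.wip_cap = wip_cap
        self.wip = 0          # jobs anywhere in the line, finished goods included
        self.finished = 0     # jobs waiting in the finished-goods buffer

    def try_release(self):
        """Release a raw part into the line iff the WIP Cap allows it."""
        if self.wip < self.wip_cap:
            self.wip += 1
            return True
        return False          # blocked: the line is at its cap

    def complete(self):
        self.finished += 1    # a job reaches the finished-goods buffer

    def satisfy_demand(self):
        """A demand withdrawal frees capacity, pacing releases to demand."""
        if self.finished > 0:
            self.finished -= 1
            self.wip -= 1
            return True
        return False          # shortage: no finished product available
```

With a cap of 2, a third release is blocked until a demand event removes a finished part, which is exactly the pacing behaviour described above.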
Chang and Yih [6] introduced a Pull PCS named Generic Kanban System (GKS)
applicable to dynamic, multi-product, non-repetitive manufacturing environments.
KCS requires inventories of semi-finished products of each product type to be
maintained at each production stage. In multi-product environments the amount of
semi-finished inventory maintained in the line could be prohibitively large [6]. GKS
operates by providing a fixed number of Kanbans at each workstation that can be
acquired by any part. A part/job can only enter the system if it acquires a Kanban
from each of the workstations in the system. GKS reduces to CONWIP if an equal
number of Kanbans are distributed to all workstations. Comparison with a push-type
production control strategy was favourable with GKS shown to be less susceptible to
the position of the bottleneck. GKS outperformed KCS in terms of WIP required to
achieve a desired Cycle Time. Comparison with CONWIP was favourable and GKS
was shown to be more flexible in that by manipulating the number of Kanbans at
each workstation the performance of GKS could be improved beyond that achieved
by CONWIP. Chang and Yih [5] presented a simulated annealing algorithm for
determining the optimal Kanban distribution for a GKS line.
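The GKS admission rule can be sketched in a few lines (an illustrative reading of the mechanism, not code from [6]): a job enters only if it can acquire one Kanban from every workstation up front.

```python
def gks_admit(free_kanbans):
    """GKS admission sketch: a job enters the line only if one kanban is free
    at every workstation; it seizes one from each on entry."""
    if all(k > 0 for k in free_kanbans):
        return [k - 1 for k in free_kanbans]   # kanban counts after admission
    return None                                 # job waits outside the line
```

When every workstation holds the same number of Kanbans, the rule degenerates to a single limit on jobs in the line, which is why GKS reduces to CONWIP in that case.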
In order to overcome the disadvantages of loose coordination between pro-
duction stages in a CONWIP line Bonvik and Gershwin [3] and Bonvik et al.
[2] proposed an alternative strategy, hybrid Kanban-CONWIP. In hybrid Kanban-
CONWIP, as in CONWIP, an overall cap is placed on the amount of inventory
allowed in the production system. In addition, inventory is controlled using Kan-
bans in all stages except the last stage. CONWIP can be considered as a special
case of hybrid Kanban-CONWIP in which there is an infinite number of Kanbans
distributed to each production stage [2]. A comparison of KCS, minimal blocking
KCS, BSCS, CONWIP and hybrid Kanban-CONWIP was presented in [2]. The
different PCS were compared in a four-stage tandem production line using simula-
tion. Each of the PCS were compared using constant demand and demand that had a
stepped increase/decrease. It was found that the hybrid Kanban-CONWIP strategy
decreased inventories by 10% to 20% over KCS while maintaining the same service
levels (percentage of demands instantaneously matched with a finished product).
The performance of basestock and CONWIP strategies fell between those of KCS
and hybrid Kanban-CONWIP.
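The hybrid Kanban-CONWIP authorisation rule can be sketched as follows (a simplified reading with illustrative names; it ignores input-part availability, which a full model would also check). Setting every kanban count to infinity recovers pure CONWIP, as noted in [2].

```python
import math

def may_start(stage, buffers, kanbans, cc):
    """True if `stage` (0-based) may begin a new part under hybrid
    Kanban-CONWIP.

    buffers -- current output-buffer contents per stage
    kanbans -- kanban allocation per stage; math.inf recovers pure CONWIP
    cc      -- the overall CONWIP cap on total line inventory
    """
    if stage == 0 and sum(buffers) >= cc:
        return False                 # release gate: the WIP Cap is reached
    if stage < len(buffers) - 1 and buffers[stage] >= kanbans[stage]:
        return False                 # no free kanban at this stage
    return True                      # the last stage has no kanban limit
```

For example, with a cap of 3 and total WIP already 3, no new release is authorised at stage 0, while an interior stage is blocked as soon as its output buffer holds one part per kanban.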
Two papers that generalize hybrid Kanban-CONWIP are [14] and [13]. These
papers propose a generic pull model that, as well as encapsulating the three basic
pull control strategies, KCS, CONWIP and BSCS, also allows customized pull
control strategies to be developed. Simulation and an evolutionary algorithm were
used to study the generic model. Details of the evolutionary algorithm are given in
[14] while results on extensive experimentation on the effect of factors (i.e., line im-
balance, machine reliability) on the proposed generic pull model are given in [13].
Gaury and Kleijnen [12] noted that Operations Research has traditionally con-
centrated on optimisation whereas practitioners find the robustness of a proposed
solution more important. A methodology was presented in [12] that was a stagewise
combination of four techniques: (i) simulation, (ii) optimization, (iii) risk or un-
certainty analysis, and (iv) bootstrapping. Gaury and Kleijnen [12] illustrated their
methodology through a production-control study for the four-stage, single product
production line utilised by [2]. Robustness was defined in [12] as the capability
to maintain short-term service, in a variety of environments; i.e. the probability of
the short-term fill-rate (service level) remaining within a pre-specified range. Be-
sides satisfying this probabilistic constraint, the system minimised expected long-
term WIP. Four systems were compared in [12], namely Kanban, CONWIP, hybrid
Kanban-CONWIP, and Generic. The optimal parameters found in [2] were used
for KCS, CONWIP and hybrid Kanban-CONWIP. Gaury and Kleijnen [12] used
a Genetic Algorithm to determine the optimal parameters for the Generic pull sys-
tem. For the risk analysis step, seventeen inputs were considered; the mean and
variance of the processing time for each of the four production stages, mean time
between failures and mean time to repair per production stage, and the demand rate.
The inputs were varied over a range of ±5% around their base values. Gaury and
Kleijnen [12] concluded that in this particular example, hybrid Kanban-CONWIP
was best when risk was not ignored; otherwise Generic was best and therefore, risk
considerations can influence the selection of a PCS.
Each of the pull-type production control strategies discussed above, with the
exception of GKCS, have one important advantage over KCS that ensures that they
are more readily applicable to non-repetitive manufacturing environments. That
advantage stems from the manner in which demand information is communicated
in comparison to KCS. In KCS, demand information is not communicated directly to
production stages that release parts/jobs into the system. Rather it is communicated
sequentially up the line from the finished goods buffer as withdrawals are made by
customer demands. This communication delay means that the pace of the production
line is not adjusted automatically to account for changes in the demand rate. The
arrival of demand information to the initial stages in a GKCS line might be delayed
if the demand cards at a production stage in the line are not instantaneously matched
with Kanban cards. BSCS, EKCS, CONWIP, GKS and hybrid Kanban-CONWIP
all, however, communicate the demand information instantaneously to the initial
stages allowing the release rate to be paced to the actual demand rate. For instance,
Bonvik et al. [2] showed that if the demand rate decreases unexpectedly the impact
on a CONWIP strategy and hybrid Kanban-CONWIP strategy would be for the
finished-goods buffer to increase toward the WIP Cap with all intermediate buffers
tending toward empty. The impact, however, on a KCS line would be that all
the intermediate buffers would increase toward their maximum permissible limits.
Therefore, the KCS line would have semi-finished inventory distributed throughout
the line.
3 Hybrid production control strategies
Hybrid control strategies can be classified into two categories: vertically integrated
hybrid systems (VIHS) or horizontally integrated hybrid systems (HIHS) [8]. VIHS
consist of two levels, usually an upper level push-type PCS and a lower level pull-
type PCS. For example, Synchro MRP utilises MRP for long range planning and
KCS for shop floor execution [17]. The main disadvantage of VIHS is that MRP
calculations must be performed for each stage in the production system. This makes
VIHS complex to implement and maintain and accounts for their relative lack of
use in industry [20]. HIHS consist of one level where some production stages are
controlled by push-type PCS and other stages by pull-type PCS. Only HIHS are
considered in the discussion that follows.
Hodgson and Wang [20, 21] developed a Markov Decision Process (MDP)
model for HIHS. The model was solved using both dynamic programming and
simulation for several production strategies, including pure push and pure pull
production strategies and strategies based on the integration of push and pull control.
In this push/pull integration strategy each individual stage may push or pull. This
type of control strategy is denoted as Hybrid Push/Pull in [20, 21]. Initially in [20],
the research was applied to a four-stage semi-continuous production iron and steel
works (see Fig. 1), with the first two stages in parallel and the remaining stages as
serial production stages. In order to simplify the analysis the model assumes that
the production process is a discrete time process and that demand per period and
the amount of inventory are both integer multiples of a unit size. The research was
later extended to a five-stage production system [21]. For both the four and five
stage production systems, a strategy where production stages 1 and 2 (P1 and P2
in Fig. 1) push and all other stages pull was demonstrated to result in the lowest
average gain (average system cost). Hodgson and Wang [21] stated that they had
observed similar results for an eight-stage system and concluded that this strategy
would be the optimal hybrid integration strategy for a J-stage system. Subsequent
papers that use the model in [20, 21] or extensions of it are [11], [28] and [35].
Deleersnyder et al. [11] considered that the complexity of the control structure
required for the successful implementation of Synchro MRP resulted in it being
largely ignored by industry. Synchro MRP requires MRP control to be linked into
every stage in the production line while utilising local kanban control to authorise
production at each stage. Deleersnyder et al. [11] developed a hybrid production
strategy that limited the number of stages into which MRP type information is
added in order to reduce the complexity of the hybrid strategy in comparison to
Synchro MRP, while realizing the benefits of integrating push and pull type control
strategies. The model developed in [11] is similar to that presented in [20, 21] and
comparable results were obtained for a serial production line.
Pandey and Khokhajaikiat [28] extended the model in [20, 21] to allow for
the inclusion of raw material constraints at each stage. The modified model also
allowed for a stage to require more than one item of inventory and/or more than one
Fig. 1. Parallel/Serial four stage production system modelled by Hodgson and Wang [20]
item of raw material to produce a part. Pandey and Khokhajaikiat [28] presented
results from two sets of experiments. In the first set they modelled a four-stage
parallel/serial production line similar to the system shown in Figure 1. The initial
production stages (P1 and P2 in Fig. 1) operated under raw material availability
constraints, had different order purchasing and delivery distributions but had iden-
tical production unreliability. Sixteen integration strategies were considered. In the
second experimental set the authors applied the raw material availability constraint
to all stages of the production line. The authors concluded that the hybrid strategy
in which the initial stages (P1 and P2 ) operate under push control and the remaining
stages operate under pull control is the best strategy when raw material constraints
apply only to the initial stages. When the raw material availability constraint is
applied to all stages the push strategy becomes the optimal control strategy. For
systems with large variability in demand none of the strategies dominated.
Wang and Xu [35] presented an approach that facilitated the evaluation of
a wide range of topologies that utilize hybrid push/pull. They used a structure
model to describe a manufacturing system’s topology. Their methodology was
used to investigate four 45-stage manufacturing systems: (i) A single-material serial
processing system; (ii) A multi-material serial processing system; (iii) A multi-part
processing and assembly system, and (iv) A multi-part multi-component processing
and assembly system. Wang and Xu [35] compared pure pull and push strategies
against the optimal hybrid strategy found in [20, 21], where the initial stages push
and all other stages pull. Their results suggest that the optimal hybrid strategy
out-performs pure push or pull strategies.
Other models that implement hybrid push/pull control strategies similar to [20,
21] have been developed. Takahashi et al. [32] defined push/pull integration as a
system in which there is a single junction point between push stages and pull stages.
In Takahashi et al. [32] a model was presented to evaluate this control strategy. Two
subsequent papers, [33] and [34] further developed and experimented with this
model. Hirakawa et al. [19] and Hirakawa [18] developed a mathematical model
for a hybrid push/pull control strategy that allows each production stage to switch
between push and pull control depending on whether demand can be forecasted
reliably or not. Cochran and Kim [8] present a HIHS with a movable junction point
between a push sub-system and a pull sub-system. The control strategy presented
had three decision variables: (i) the junction point, i.e., the last push stage in the
HIHS; (ii) the safety stock level at the junction point; (iii) the number of kanbans for
each stage in the pull sub-system. Simulation combined with simulated annealing
was used to find the optimal decision variables for the control strategy.

4 Comparison of EKCS and the push-type PCS modelled by Hodgson and Wang

Several comparisons of Pull-PCS have been reported in the literature, for example
[2, 12, 13, 25]. There have also been several comparisons between HIHS and KCS,
for example [8, 27, 32, 33, 34, 20, 21, 11, 28, 35]. Comparisons between HIHS and
other Pull-Type PCS are rare in the literature. Orth and Coskunoglu [27] included
CONWIP, in addition to KCS, in the comparison analysis. In a previous paper [15]
we demonstrated that the optimal HIHS selected by Hodgson and Wang [20, 21]
where initial stages employ push control and all other stages employ pull control, is
equivalent to a Pull-Type PCS, namely hybrid Kanban-CONWIP [3, 2]. As well as
considering several alternative integration strategies, Hodgson and Wang [20, 21]
also included a Push-Type and a Pull-Type PCS in their analysis, which they referred
to as ‘Pure Push’ and ‘Pure Pull’ PCS. After examining the equations used in [20, 21]
to model the ‘Pure Push’ PCS we felt that there were similarities to the control
structure implemented by the Pull-Type PCS known as EKCS [9, 10]. Therefore, in
this section we explore these comparisons. The notation used in the remainder of
this paper is shown in Table 1.
In Hodgson and Wang’s ‘Pure-Push’ PCS, production is authorised when (i) suf-
ficient space exists in the output buffer of the stage, (ii) sufficient inventory exists in
the input buffer of the stage, (iii) sufficient production capacity exists at the stage,
and (iv) downstream inventory levels have decreased below forecasted require-
ments necessary to meet expected demand. The MDP model presented in [20, 21]
required the evaluation of two equations (i.e., the production trigger, A_j(n), and
the production objective, PO_j(n)) in order to determine production authorisations
for a stage in period n. For the purposes of the discussion presented here we have
combined these equations to form a single equation for the number of production
authorisations, PA_j(n), available to a stage in period n. This was achieved without
making any simplifying assumptions. PA_j(n) for a system controlled by Hodgson
and Wang's ‘Pure-Push’ PCS can be modelled by Eq. (1) where 1 ≤ j ≤ J − 1
and by Eq. (2) for the final production stage.
Table 1. Notation used in models presented

Notation    Description
A_j(n):     Production trigger for stage j in period n.
PO_j(n):    Production objective for stage j in period n.
I_j^max:    Maximum capacity of inventory point j.
SS:         Desired safety stock level of finished product.
NS_j:       The number of stages that succeed stage j (i.e., the number of stages that
            components produced at stage j traverse after stage j before reaching the
            customer).
D(n):       Forecasted demand in period n.
j, J:       Unique number identifying a production stage, where 1 ≤ j ≤ J.
n:          Production period.
d(n):       The actual demand quantity in period n.
PA_j(n):    The production authorisation for stage j in period n.
P_j^min:    The minimum production capacity of stage j.
P_j^max:    The maximum production capacity of stage j.
P_j(n):     Production quantity for stage j in period n.
q:          The production reliability of a stage, which is modelled by a probability
            mass function.
I_j:        The output buffer of stage j.
I_j(n):     The amount of inventory held in the output buffer of production stage j
            in period n.
{B_j(n)}:   The set of inventories held in the output buffers of the immediate
            predecessors of stage j in period n.
c_j(n):     The sum of inventories held in the output buffers of stages parallel to,
            but with stage number greater than, production stage j.
K_j:        The number of Kanbans allocated to production stage j.
CC:         The cap on total inventory allowed in CONWIP and hybrid Kanban-CONWIP
            lines.
DC_j(n):    Number of demand cards held at stage j in period n in BSCS and EKCS
            lines.
S_j:        The initialisation stock level for stage j in BSCS and EKCS lines.
S_j^min:    The minimum initialisation stock level for stage j in BSCS and EKCS lines.
P_j:        Production center at stage j.

 
PA_j(n) = min{ I_j^max − I_j(n−1), max[ P_j^min, SS + (NS_j + 1) × D(n)
          − ( Σ_{i=j}^{J} I_i(n−1) − c_j(n−1) ) ], {B_j(n−1)}, P_j^max },  ∀ j ≤ J − 1    (1)
 
PA_J(n) = min{ I_J^max − I_J(n−1) + D(n), max[ P_J^min, SS + D(n)
          − I_J(n−1) ], {B_J(n−1)}, P_J^max }    (2)
It is possible for the term IJ (n − 1) to become negative. This occurs in the event
of a shortage in period n − 1, i.e. a failure to satisfy demand. Therefore the term
IJ (n − 1) is not only used to record the inventory in the finished goods buffer in
period n − 1 but also the backlog in period n − 1. If a backlog occurs, i.e. IJ (n − 1)
is negative, Eq. (2) would effectively result in a temporary increase of the maximum
capacity of the finished goods inventory buffer. Equation (3) models the number
of production authorisations available to the final stage where shortages are not
permitted to temporarily increase the maximum capacity of the finished goods
inventory buffer.
 
PA_J(n) = min{ I_J^max − max[0, I_J(n−1)] + D(n), max[ P_J^min, SS
          + D(n) − I_J(n−1) ], {B_J(n−1)}, P_J^max }    (3)
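Eqs. (1) and (3) translate directly into code for a serial line, where c_j = 0 and {B_j(n−1)} reduces to the single predecessor buffer, passed in here as `B_prev` (the function signature and 1-based stage index are our own choices, not from [20, 21]):

```python
def pure_push_pa(j, I_prev, B_prev, I_max, P_min, P_max, NS, SS, D):
    """PA_j(n) per Eqs. (1) and (3) for a serial line. j is 1-based;
    I_prev holds I_1(n-1)..I_J(n-1). For stage 1, B_prev represents the
    raw-material supply (pass a large number)."""
    J = len(I_prev)
    if j < J:  # Eq. (1)
        downstream = sum(I_prev[j - 1:])               # sum_{i=j}^{J} I_i(n-1)
        need = SS + (NS[j - 1] + 1) * D - downstream   # forecast-driven target
        return min(I_max[j - 1] - I_prev[j - 1],
                   max(P_min[j - 1], need),
                   B_prev, P_max[j - 1])
    # Eq. (3): a backlog (negative I_J) may not enlarge the finished-goods buffer
    return min(I_max[-1] - max(0, I_prev[-1]) + D,
               max(P_min[-1], SS + D - I_prev[-1]),
               B_prev, P_max[-1])
```

With hypothetical data (J = 3, SS = 1, D = 2), a backlog of one unit at the final stage raises its authorisation through the max[...] term without raising the buffer-capacity term, exactly the distinction between Eqs. (2) and (3).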
Notice that Hodgson and Wang’s model of Push control includes a limit on inventory
in the output buffer of a stage, Ijmax . Since a production stage does not become
aware of a change in state of the buffer until the subsequent production period, n+1,
and with the exception of the final stage does not attempt to predict the removal
of inventory from the output buffer by immediate successors, this limit behaves
similarly to Kanban control. However, the Push strategy modelled by [20, 21] is
not equivalent to KCS since each stage has information about the status of the line
downstream from its output buffer, through the term Σ_{i=j}^{J} I_i(n−1) − c_j(n−1) in
Eq. (1) above. Since only a demand event can change the state of the downstream
section of the line, in terms of total WIP, this is equivalent to demand cards being
passed to each stage in the line. Therefore, it would appear that there is some
similarity between the ‘Pure-Push’ PCS modelled by [20, 21] and EKCS.
In an EKCS system, a production stage has authorisation to produce a part
when: (i) inventory is available in its buffer; (ii) a Kanban card is available and (iii)
a demand card is available. The number of demand cards available to a production
stage in period n is given by Eq. (4). In period n the number of production au-
thorisations at stage j, PA_j(n), is given by Eq. (5). For an EKCS system where
the introduction of temporary Kanbans in the event of shortages is not permitted,
PA_J(n) is modelled by Eq. (6) for the final production stage.
DC_j(n) = DC_j(n−1) − P_j(n−1) + d(n),  ∀ j ≤ J    (4)
PA_j(n) = min{ K_j − I_j(n−1), max[ P_j^min, DC_j(n) ], {B_j(n−1)}, P_j^max },  ∀ j ≤ J − 1    (5)
PA_J(n) = min{ K_J − max[0, I_J(n−1)], max[ P_J^min, DC_J(n) ], {B_J(n−1)}, P_J^max }    (6)
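Eqs. (4)-(6) admit an equally direct transcription (again for a serial line, with a 1-based stage index and an illustrative signature of our own):

```python
def ekcs_demand_cards(dc_prev, p_prev, d):
    """Eq. (4): carry over unmet demand cards, cancel last period's
    production, and add the new demand, which every stage sees instantly."""
    return dc_prev - p_prev + d

def ekcs_pa(j, K, I_prev, DC, B_prev, P_min, P_max):
    """PA_j(n) per Eqs. (5) and (6). j is 1-based and J = len(K);
    B_prev is the predecessor's buffer (raw material for stage 1)."""
    J = len(K)
    if j < J:  # Eq. (5)
        return min(K[j - 1] - I_prev[j - 1],
                   max(P_min[j - 1], DC[j - 1]),
                   B_prev, P_max[j - 1])
    # Eq. (6): no temporary kanbans are created when the line is backlogged
    return min(K[-1] - max(0, I_prev[-1]),
               max(P_min[-1], DC[-1]),
               B_prev, P_max[-1])
```

Note how production at any stage requires all three ingredients at once: a free Kanban (the K_j term), a demand card (the DC_j term), and an input part (the B_prev term).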
Let us assume a serial production line with J stages. The state transition equa-
tions for the inventory levels of the buffers, for either model, can be determined
from Eqs. (7) and (8):

I_j(n) = I_j(n−1) + P_j(n) − P_{j+1}(n),  ∀ j ≤ J − 1    (7)
I_J(n) = I_J(n−1) + P_J(n) − d(n)    (8)

From examining the equations for the two models, for both dynamic and static
Kanban distributions, it is clear that in order for the PCS to be equivalent three
conditions must be satisfied: (i) the inventory level of a buffer in the ‘Pure-Push’
model must equate to the inventory level of the same buffer in the EKCS model in
a given production period n. (ii) the Kanban distribution in EKCS must be equal
to the buffer capacity limits in the ‘Pure-Push’ model.
K_j = I_j^max,  ∀ j ≤ J − 1    (9)
K_J = I_J^max + D(n)    (10)
and (iii) the following two equalities must hold:
DC_J(n) = SS + D(n) − I_J(n−1)    (11)
DC_j(n) = SS + (NS_j + 1) × D(n) − ( Σ_{i=j}^{J} I_i(n−1) − c_j(n−1) ),  ∀ j ≤ J − 1    (12)
Substituting Eq. (11) into Eq. (4) yields:
SS + D(n) − I_J(n−1) = SS + D(n) − I_J(n−2) − P_J(n−1) + d(n)    (13)
Re-writing Eq. (13) yields:
I_J(n−1) = I_J(n−2) + P_J(n−1) − d(n)    (14)
Equation (14) is not equivalent to Eq. (8) and therefore the two PCS are not equiv-
alent. The primary difference between the two PCS stems from the method in
which demand information is communicated to each stage. In EKCS demand in-
formation is instantly communicated to all stages. In the ‘Pure-Push’ PCS the
communication of demand information is delayed by one period. In fact, the
‘Pure-Push’ PCS described by [20, 21] could more accurately be described as
a Vertically Integrated Hybrid System, in which each production stage develops
a forecast of production requirements for the production period through the term
SS + (NS_j + 1) × D(n) − ( Σ_{i=j}^{J} I_i(n−1) − c_j(n−1) ) and utilises kanbans to
implement the production plan on the shop floor. It would, therefore, be more
accurate to refer to this PCS as Synchro MRP than ‘Pure-Push’.
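The one-period lag is easy to see with hypothetical numbers: Eq. (8), written for period n − 1, subtracts d(n − 1), whereas Eq. (14) subtracts d(n); the two agree only when demand is constant from period to period.

```python
# Hypothetical values: I_J(n-2) = 4, P_J(n-1) = 3, d(n) = 2, d(n-1) = 1.
i_prev2, p_prev, d_n, d_prev = 4, 3, 2, 1

true_transition = i_prev2 + p_prev - d_prev   # Eq. (8) written for period n-1
pure_push_view  = i_prev2 + p_prev - d_n      # what Eq. (14) implies instead

assert true_transition == 6
assert pure_push_view == 5
assert true_transition != pure_push_view      # equal only if d(n) == d(n-1)
```

The discrepancy is exactly the demand of the most recent period, which is why adding a d(n) correction term, as in Eqs. (15)-(17), restores equivalence.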
However, it is worth noting that the two PCS can be made equivalent if the
demand information is communicated instantly in both PCS or delayed for one
period in both PCS. For instance, communication of demand information in the
‘Pure-Push’ PCS can be made instantaneous by adjusting Eqs. (1), (2) and (3) by
including a component to adjust the downstream inventory levels by the demand
quantity d(n) as shown by Eqs. (15), (16) and (17) below.
   PAj(n) = min[ Ijmax − Ij(n − 1),
                 max{ Pjmin, SS + (NSj + 1) × D(n)
                      − ( Σ_{i=j}^{J} Ii(n − 1) − cj(n − 1) − d(n) ) },
                 {Bj(n − 1)}, Pjmax ],   ∀j ≤ J − 1                     (15)
318 J. Geraghty and C. Heavey

   PAJ(n) = min[ IJmax − IJ(n − 1) + D(n),
                 max{ PJmin, SS + D(n) − (IJ(n − 1) − d(n)) },
                 {BJ(n − 1)}, PJmax ]                                   (16)

   PAJ(n) = min[ IJmax − max[0, IJ(n − 1)] + D(n),
                 max{ PJmin, SS + D(n) − (IJ(n − 1) − d(n)) },
                 {BJ(n − 1)}, PJmax ]                                   (17)
The PCS will be equivalent if the first two conditions stated earlier are met and the
following two equalities hold:
   DCJ(n) = SS + D(n) − (IJ(n − 1) − d(n))                              (18)

   DCj(n) = SS + (NSj + 1) × D(n) − ( Σ_{i=j}^{J} Ii(n − 1) − cj(n − 1) − d(n) ),
            ∀j ≤ J − 1                                                  (19)
Substituting Eq. (18) into Eq. (4) yields:
   SS + D(n) − (IJ(n − 1) − d(n)) = SS + D(n) − (IJ(n − 2) − d(n − 1))
                                    − PJ(n − 1) + d(n)                  (20)

Re-writing Eq. (20) yields:

   IJ(n − 1) = IJ(n − 2) + PJ(n − 1) − d(n − 1)                         (21)
Equation (21) is equivalent to Eq. (8) and therefore requires no further proof.
Substituting Eq. (19) into Eq. (4) yields:
   SS + (NSj + 1) × D(n) − ( Σ_{i=j}^{J} Ii(n − 1) − cj(n − 1) − d(n) ) =
      SS + (NSj + 1) × D(n) − ( Σ_{i=j}^{J} Ii(n − 2) − cj(n − 2) − d(n − 1) )
      − Pj(n − 1) + d(n)                                                (22)

Re-writing Eq. (22) yields:

   Σ_{i=j}^{J} Ii(n − 1) − cj(n − 1) = Σ_{i=j}^{J} Ii(n − 2) − cj(n − 2)
                                       + Pj(n − 1) − d(n − 1)           (23)
We can use the state transition equations (i.e. Eqs. (7) and (8)) to prove Eq. (23) as
shown below. Note that in a serial line the terms cj (n−1) and cj (n−2) are both zero.
   Ij(n − 1)   = Ij(n − 2)   + Pj(n − 1)   − Pj+1(n − 1)
   Ij+1(n − 1) = Ij+1(n − 2) + Pj+1(n − 1) − Pj+2(n − 1)
   Ij+2(n − 1) = Ij+2(n − 2) + Pj+2(n − 1) − Pj+3(n − 1)
        ...          ...          ...           ...
   IJ−1(n − 1) = IJ−1(n − 2) + PJ−1(n − 1) − PJ(n − 1)
   IJ(n − 1)   = IJ(n − 2)   + PJ(n − 1)   − d(n − 1)

Summing these equations, the intermediate production terms Pj+1(n − 1), . . . ,
PJ(n − 1) cancel, leaving:

   Σ_{i=j}^{J} Ii(n − 1) = Σ_{i=j}^{J} Ii(n − 2) + Pj(n − 1) − d(n − 1)
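The telescoping argument can also be checked numerically by iterating the state transition equations (7)–(8) for one period. The sketch below is our illustration, with arbitrary inventory, production and demand values; it confirms the summed identity for every starting stage j.

```python
# Numerical check (illustrative, ours) of the telescoping identity:
# summing the serial-line transitions of Eqs. (7)-(8) over stages j..J
# leaves only Pj(n-1) and d(n-1).

def step(I_prev, P, d):
    """One period of Eqs. (7)-(8) for a serial line; returns I(n)."""
    J = len(I_prev)
    I = list(I_prev)
    for j in range(J - 1):                    # Eq. (7), stages 1..J-1
        I[j] = I_prev[j] + P[j] - P[j + 1]
    I[J - 1] = I_prev[J - 1] + P[J - 1] - d   # Eq. (8), stage J
    return I

I_old = [5, 4, 6, 3, 7]   # Ii(n-2), arbitrary example
P     = [4, 4, 3, 5, 4]   # Pi(n-1), arbitrary example
d     = 3                 # d(n-1)
I_new = step(I_old, P, d)

# The telescoped sum holds from any starting stage j (0-based index):
checks = [sum(I_new[j:]) == sum(I_old[j:]) + P[j] - d
          for j in range(len(I_old))]
```

Each entry of `checks` is True, matching the cancellation shown above.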
The initialisation stock levels for the buffers for both models can be determined
from Eqs. (4) and (19). For instance, assume that the initial number of demand
cards, DCj (0), the production in the previous period, Pj (0), and the demand in the
previous period d(0) are all zero. Therefore, from Eq. (4) the number of demand
cards available to each production stage in the first period, DCj (1), will be zero.
The initialisation stocks for both models can be calculated from Eq. (19) as follows:
   0 = SS + (NSj + 1) × D(n) − ( Σ_{i=j}^{J} Ii(0) − cj(0) )            (24)

   Σ_{i=j}^{J} Ii(0) − cj(0) = SS + (NSj + 1) × D(n)                    (25)

Therefore, for the final production stage the initialisation stock level should be:
   IJ(0) − cJ(0) = SS + (NSJ + 1) × D(n)                                (26)

Since it is the final production stage, the term cJ(0) on the LHS will equal zero and
on the RHS the term NSJ will equal zero. Therefore:

   IJ(0) = SS + D(n)                                                    (27)

For stage J − 1 the initialisation stock level will be:

   IJ(0) + IJ−1(0) − cJ−1(0) = SS + (NSJ−1 + 1) × D(n)                  (28)

Given that cJ−1(0) = 0 and NSJ−1 = 1, substituting in Eq. (27) gives:

   SS + D(n) + IJ−1(0) = SS + 2 × D(n)                                  (29)
   IJ−1(0) = SS + 2 × D(n) − SS − D(n)                                  (30)
   IJ−1(0) = D(n)                                                      (31)
In fact it can be shown that for 1 ≤ j ≤ J −1 the appropriate choice for initialisation
stock is Ij (0) = D(n). Of course, if the initial number of demand cards available
to production stages is not zero then appropriate initialisation stock levels for both
models can be determined in a similar manner from Eq. (19). Therefore, EKCS and
Hodgson and Wang’s ‘Pure-Push’ PCS are equivalent if:

(1) The demand event is communicated instantaneously in the ‘Pure-Push’ PCS or
delayed by one production period in the EKCS PCS,
(2) The Kanban distribution for EKCS and buffer capacity limits for the ‘Pure-
Push’ PCS are equivalent,
(3) The initialisation stocks of both models are equivalent and calculated from
Eq. (19) for all stages, i.e. j = 1, ..., J, and
(4) The forecasted demand quantity, D(n), in the ‘Pure-Push’ PCS is constant for
all values of n.
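The stage-by-stage back-solution of Eq. (19) used above is easy to mechanise. The sketch below is ours: it assumes a serial line with NSj = J − j succeeding stages, cj(0) = 0, a constant forecast D, and DCj(1) = 0 as in the derivation, and recovers IJ(0) = SS + D and Ij(0) = D for all upstream stages.

```python
# Sketch (ours, under the stated assumptions): back-solve Eq. (19) with
# DCj(1) = 0 to obtain the initialisation stocks of a serial line.

def init_stocks(J, SS, D):
    """Return [I1(0), ..., IJ(0)] satisfying Eq. (19) with DCj(1) = 0."""
    I = [0] * J
    cum = 0                    # running sum I_{j+1}(0) + ... + IJ(0)
    for j in range(J, 0, -1):  # solve from the final stage backwards
        NS = J - j             # number of succeeding stages
        I[j - 1] = SS + (NS + 1) * D - cum   # Eq. (19) rearranged
        cum += I[j - 1]
    return I

stocks = init_stocks(J=5, SS=2, D=4)
# Final stage holds SS + D; every upstream stage holds exactly D.
```

For J = 5, SS = 2 and D = 4 this gives [4, 4, 4, 4, 6], matching Eqs. (27) and (31).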

5 Comparison of pull-type PCS

We now turn our attention toward examining the comparative performance of


several Pull-Type PCS. The PCS examined are KCS, CONWIP, hybrid Kanban-
CONWIP, BSCS and EKCS. The study presented here differs from Bonvik et al. [2]
and Gaury and Kleijnen [12] as EKCS is included in the analysis, variable demand
is used and unsatisfied demand is backlogged rather than being treated as a lost
opportunity.
The system modelled for the purposes of these experiments was the five-stage
parallel/serial line described by Hodgson and Wang [21]. The line produces a single
product type produced from two components. In the line, stages 1 and 2 operate
in parallel to input the two components to the system that are assembled on a one-
to-one ratio at stage 3. Stages 3, 4 and 5 are in series. The output buffer of stage
5 is the finished goods buffer, from which all demands must be satisfied. Demand
in a given period, n, is either 3 or 4 units with equal probability. For the purposes
of the experimental work presented here it is assumed that minimum production
level (Pjmin ) of stage j in period n is zero. The reliability of stage j in period n
was modelled by the Probability Mass Function given in Table 2. Noting that q and
P Aj(n) are independent, the probability that stage j produces q units in period n
given that the production authorisation is P Aj (n), i.e. Pr[Pj (n) = q|P Aj (n)], is
given by:
   Pr[Pj(n) = q | PAj(n)] = Pr[Pj(n) = q],   q = 0, 1, . . . , PAj(n) − 1   (32)

   Pr[Pj(n) = PAj(n) | PAj(n)] = Pr[Pj(n) = PAj(n)] + Pr[Pj(n) = PAj(n) + 1]
                                 + . . . + Pr[Pj(n) = Pjmax],   q ≥ PAj(n)  (33)
Table 2. Probability mass function for reliability in production of individual stages

   q       3     4     5
   Pr[q]   0.2   0.6   0.2
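Equations (32)–(33) describe exactly the distribution of min(q, PAj(n)) when q is drawn from the PMF of Table 2: mass below the authorisation level is unchanged, while the mass at and above it collapses onto PAj(n). A small sketch of ours makes this explicit:

```python
# Sketch (ours): the conditional distribution of Eqs. (32)-(33) is that
# of min(q, PAj(n)) with q drawn from the PMF of Table 2.

pmf = {3: 0.2, 4: 0.6, 5: 0.2}   # Pr[q] from Table 2

def production_pmf(PA):
    """Pr[Pj(n) = p | PAj(n) = PA] per Eqs. (32)-(33)."""
    out = {}
    for q, pr in pmf.items():
        p = min(q, PA)            # a stage cannot exceed its authorisation
        out[p] = out.get(p, 0.0) + pr
    return out

# With PA = 4 the mass at 4 and 5 collapses onto 4 (Eq. (33)):
trunc = production_pmf(4)
mean = sum(p * pr for p, pr in trunc.items())
```

With PA = 4 the truncated PMF is {3: 0.2, 4: 0.8} with mean 3.8; a large PA leaves the original PMF (mean 4) untouched.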
The remainder of this section details the models that were developed for each
PCS examined, the experiment design and the results from the experiment. The
A review and comparison of hybrid and pull-type production control strategies 321

models have been developed by the authors with reference to the notation and
methodologies employed by [20, 21], and [28].

5.1 Kanban control strategy

In a KCS system, production at stage j is authorised by the presence of Kanban


cards and parts. When stage j begins production on a part, a Kanban card is attached
to the part and travels downstream with the part. When the succeeding stage begins
production on the part the Kanban card is removed and passed back to stage j to
be available to authorise production of a new part. The Production Authorisation
for period n for KCS stage j, where 1 ≤ j ≤ J − 1, is obtained from Eq. (34).
The Production Authorisation for the final stage is obtained from Eq. (35) and is
different from the model in [20, 21] in that the number of Kanbans available to the
final stage cannot be increased temporarily in response to a shortage.
   PAj(n) = min[ Kj − Ij(n − 1), {Bj(n − 1)}, Pjmax ],   ∀j ≤ J − 1     (34)

   PAJ(n) = min[ KJ − max[0, IJ(n − 1) − d(n)], {BJ(n − 1)}, PJmax ]    (35)
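The two KCS authorisation rules are direct min-expressions; the helper names below are ours, with B standing for the buffer content {Bj(n − 1)}.

```python
# Sketch (ours) of the KCS authorisation rules, Eqs. (34)-(35).

def pa_kcs(K, I_prev, B, P_max):
    """PAj(n) for an internal stage, Eq. (34)."""
    return min(K - I_prev, B, P_max)

def pa_kcs_final(K, I_prev, d, B, P_max):
    """PAJ(n) for the final stage, Eq. (35); a backlog cannot raise KJ."""
    return min(K - max(0, I_prev - d), B, P_max)
```

For instance, with K = 8 kanbans and 5 parts already in the output buffer, at most 3 new parts can be authorised regardless of buffer content or capacity.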

5.2 CONWIP control strategy

For CONWIP systems, P Aj (n) for an input stage (j = 1, 2) was modelled by


Eq. (36). P Aj (n) for an input stage is constrained by a cap (CC) on the total
inventory in the system, the number of components available in the raw material
buffers and the maximum production capacity of the stage. For the purposes of the
experiments conducted raw material was assumed to be always available. For this
situation the term {Bj (n−1)} would be infinitely large. P Aj (n) for all other stages
is only constrained by the maximum amount of units that the stage can produce in
a production period and the availability of components in the stage’s input buffer.
Therefore, Eq. (37) was used to model PAj(n) for all stages that are not input
stages, i.e. 3 ≤ j ≤ J.
   PAj(n) = min[ CC − ( ( Σ_{i=j}^{J} Ii(n − 1) ) − cj(n − 1) − d(n) ),
                 {Bj(n − 1)}, Pjmax ],   1 ≤ j ≤ 2                      (36)

   PAj(n) = min[ {Bj(n − 1)}, Pjmax ],   3 ≤ j ≤ J                      (37)
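A sketch of ours for the two CONWIP rules; the assumption that raw material is always available for input stages is represented by an infinite default for B.

```python
# Sketch (ours) of the CONWIP authorisations, Eqs. (36)-(37).

import math

def pa_conwip_input(CC, I_down, c, d, P_max, B=math.inf):
    """PAj(n), j = 1, 2 (Eq. (36)); I_down is the sum of Ii(n-1), i = j..J."""
    return min(CC - (I_down - c - d), B, P_max)

def pa_conwip(B, P_max):
    """PAj(n), 3 <= j <= J (Eq. (37)): only buffer content and capacity bind."""
    return min(B, P_max)
```

With a cap of CC = 20, downstream inventory 19 and a demand of 3, the cap term 20 − (19 − 3) = 4 is what limits the release, not the stage capacity.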

5.3 Hybrid Kanban-CONWIP control strategy

Production Authorisations for production stages in a hybrid Kanban-CONWIP


system were determined by combining the equations used to model P Aj (n) for
KCS and CONWIP. For an input stage of a hybrid Kanban-CONWIP system (j =
1, 2), PAj(n) was modelled by Eq. (38). This equation was developed by further
constraining Eq. (36) such that sufficient Kanbans must also be available at the
322 J. Geraghty and C. Heavey

stage to authorise production. For stage j, where 3 ≤ j ≤ J − 1, P Aj (n) was


modelled by Eq. (34). For the final stage P AJ (n) was modelled by Eq. (37) where
j = J.
   PAj(n) = min[ CC − ( ( Σ_{i=j}^{J} Ii(n − 1) ) − cj(n − 1) − d(n) ),
                 Kj − Ij(n − 1), {Bj(n − 1)}, Pjmax ],   1 ≤ j ≤ 2      (38)
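Eq. (38) simply intersects the CONWIP cap term of Eq. (36) with the kanban constraint of Eq. (34); sketched below with our parameter names.

```python
# Sketch (ours) of Eq. (38): an input stage of the hybrid strategy must
# satisfy the CONWIP cap AND hold a free kanban.

def pa_hybrid_input(CC, I_down, c, d, K, I_prev, B, P_max):
    """PAj(n), j = 1, 2, per Eq. (38)."""
    return min(CC - (I_down - c - d), K - I_prev, B, P_max)
```

When the cap is slack but all kanbans of the stage are attached to parts (K − I_prev = 0), no release is authorised, which is exactly how the kanban loop tightens CONWIP.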

5.4 Basestock control strategy and extended Kanban control strategy

In a system employing BSCS, production at stage j in period n is authorised by


the presence of demand cards at the production stage. When a demand occurs
the equivalent number of demand cards are dispatched to each production stage
to authorise the production of new parts. When the stage begins production of a
new part the demand card is destroyed. The number of demand cards available to
production stage j in period n, DCj (n), was determined from Eq. (39). P Aj (n)
for a BSCS system was determined by employing Eq. (40).
   DCj(n) = DCj(n − 1) − Pj(n − 1) + d(n),   ∀j ≤ J                     (39)

   PAj(n) = min[ DCj(n), {Bj(n − 1)}, Pjmax ],   ∀j ≤ J                 (40)
The production in period n of stage j in an EKCS system is constrained by the
availability of Kanban and Demand cards. When a demand occurs, as with BSCS,
the equivalent number of demand cards are dispatched to each production stage to
authorise the production of new parts. However, before production can be autho-
rised by the presence of a demand card, the demand card must be matched with a
Kanban card and an available part. A demand card is destroyed when stage j begins
production on the part while the associated Kanban card is attached to the part and
travels downstream with the part. When the succeeding stage begins production on
the part the Kanban card is removed and passed back to stage j to be available to
authorise production of a new part. For an EKCS system the number of demand
cards available to production stage j in period n, DCj (n), was also modelled by
Eq. (39) while P Aj (n) for an EKCS system was determined by employing Eq. (41)
for 1 ≤ j ≤ J − 1 and Eq. (42) for the final production stage, i.e. j = J.
   PAj(n) = min[ DCj(n), Kj − Ij(n − 1), {Bj(n − 1)}, Pjmax ],
            ∀j ≤ J − 1                                                  (41)

   PAJ(n) = min[ DCJ(n), KJ − max[0, IJ(n − 1) − d(n)],
                 {BJ(n − 1)}, PJmax ]                                   (42)
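The demand-card update shared by BSCS and EKCS and the three authorisation rules can be sketched as follows; the parameter names are ours and mirror DCj(n), Kj, Ij(n − 1), {Bj(n − 1)} and Pjmax.

```python
# Sketch (ours) of Eqs. (39)-(42).

def demand_cards(DC_prev, P_prev, d):
    """DCj(n) = DCj(n-1) - Pj(n-1) + d(n), Eq. (39)."""
    return DC_prev - P_prev + d

def pa_bscs(DC, B, P_max):
    """Eq. (40): BSCS needs only a demand card, a part and capacity."""
    return min(DC, B, P_max)

def pa_ekcs(DC, K, I_prev, B, P_max):
    """Eq. (41): EKCS additionally needs a free kanban."""
    return min(DC, K - I_prev, B, P_max)

def pa_ekcs_final(DC, K, I_prev, d, B, P_max):
    """Eq. (42): the final-stage kanban count nets out the demand just served."""
    return min(DC, K - max(0, I_prev - d), B, P_max)
```

The kanban term K − Ij(n − 1) is the only difference between Eqs. (40) and (41), which is why BSCS can be viewed as EKCS with the kanban constraint removed.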

5.5 Experimental conditions

The models just described for each PCS were translated into discrete event simu-
lation models in eM-Plant, an object-oriented simulation software tool developed
by Tecnomatix Technologies Ltd. The powerful debugging environment within


eM-Plant was utilised to conduct a step-by-step walk through of each simulation
model in order to verify the timing and accuracy of the calculations of the con-
ceptual models had been correctly encapsulated in the models. In order to validate
the simulation models, we firstly developed simulation models of the various PCS
explored by [20, 21]. These models were validated against the results published in
[20, 21] and the results were presented in Geraghty and Heavey [15]. In order to
validate the individual simulation models developed for the PCS examined in this
section we conducted the following:

(1) The output of our simulation model of KCS was compared to the output of
our validated simulation model of Hodgson and Wang’s conceptual model of
KCS. The results were identical when the assumption that a backlog could not
temporarily increase the number of kanbans available to the final stage was
incorporated into our simulation model of Hodgson and Wang’s conceptual
model of KCS.
(2) In Geraghty and Heavey [15] we showed mathematically that the optimal HIHS
identified by [20, 21] is equivalent to hybrid Kanban-CONWIP, under certain
conditions. We also demonstrated this equivalence by comparing the outputs
of our simulation model of hybrid Kanban-CONWIP with our validated simu-
lation model of Hodgson and Wang’s optimal HIHS.
(3) In this paper we have demonstrated mathematically that EKCS and Hodgson
and Wang’s ‘Pure-Push’ PCS will give equivalent results if the occurrence of
demand events are communicated at the same time in both PCS and other con-
ditions detailed earlier are met. Results are presented in [16] that demonstrate
that our simulation model of EKCS achieves the same results as our validated
simulation model of Hodgson and Wang’s ‘Pure-Push’ PCS when all conditions
for equivalence are met.
(4) It was not possible to validate our models of CONWIP and BSCS. However,
these PCS are simplifications of hybrid Kanban-CONWIP and EKCS, respec-
tively, in which kanbans are not distributed. Therefore, since we have been able
to validate our simulation models of hybrid Kanban-CONWIP and EKCS, we
assume that our simulation models of CONWIP and BSCS are valid.

For the purposes of the experimental process the simulation run-time over which
statistics were collected was 10,000 periods with a warm-up period of 1,000 periods.
Ten replications of each simulation were conducted. The PCS were compared
by conducting a partial enumeration of the solution space for their control parame-
ters. A detailed description of the solution spaces evaluated for each PCS for each
demand distribution is given below.
The comparison of the strategies was achieved by conducting a partial enumer-
ation of the control parameters of the five PCS examined. The minimum values for
the Kanban allocations for the KCS, EKCS and hybrid Kanban-CONWIP models
were eight for each stage. This was selected since preliminary work indicated that
values below this level significantly degraded the solution. For instance setting the
Kanban levels of the input stages (j = 1, 2) equal to 7 always resulted in a Service
Level of 0 regardless of the number of Kanbans allocated to the remaining stages


for both KCS and hybrid Kanban-CONWIP.
CONWIP Cap, CC, values below 16 resulted in service levels of less than 10%
for both CONWIP and hybrid Kanban-CONWIP. A minimum of four parts was
selected for the initialisation stocks (Sj ) for both BSCS and EKCS. This value was
selected because (i) the nature of the control strategies implies that the initialisation
stocks must be greater than zero and (ii) mean demand was 3.5 and it was desirable
to initialise the buffers such that they could satisfy the mean demand. For KCS
and hybrid Kanban-CONWIP the maximum value for the number of Kanbans
considered for distribution to workstations 1 to 3 was 16 each with a maximum of
20 to workstation 4. For KCS workstation 5 had an upper bound of 20 Kanbans. For
CONWIP and hybrid Kanban-CONWIP the maximum value considered for CC
was 50. For BSCS the upper bounds for the initialisation stocks of workstations 1
to 5 were 12, 12, 12, 16 and 50 respectively. For each simulation run the models of
the individual PCS were initialised with inventory as described by Table 3.
For the EKCS model it would have been impossible to conduct a partial enu-
meration of the solution space for all parameters (i.e. all possible combinations
of Kanban and initialisation stock levels). The amount of computer time required
would not have been feasible. For instance, suppose a partial enumeration of the
solution space for the EKCS model were conducted with minimum values as de-
scribed above and maximum values for Kj and Sj equal to the maximum values
for Kj for the KCS model. Over 90,000,000 hours of CPU time would have been
required to conduct this experiment (based on 5.3 seconds per replication and 10
replications per iteration on a 1.8GHz Intel Pentium 4 Dell PC with 256Mb of
RAM). Therefore, in order to minimise the time requirements a method had to
be found to predetermine the Kanban distribution or the initialisation stock levels.
Dallery and Liberopoulos [10] noted that the production capacity of the EKCS only
depends on Kj and not on Sj, j = 1, . . . , J. They suggested that a reasonable
design procedure for the EKCS could be to first design parameters Kj to obtain
a desirable production capacity level, and subsequently design parameters Sj to
obtain a desirable customer satisfaction level.
It seemed that a reasonable design for the Kanban allocation for the EKCS
model might be the allocation that achieved 100% Service Level for the hybrid
Kanban-CONWIP model. Therefore, it is not claimed that EKCS was compared
for optimality with the other PCS, just that a reasonable design for EKCS was
compared. Under hybrid Kanban-CONWIP Kanbans are not allocated to the final
stage since the maximum amount of inventory that can be in the output buffer of
the final stage in any period is CC. Therefore, if it is desired to design the Kanban
allocation for the EKCS such that it has at most the equivalent amount of inventory as
a hybrid Kanban-CONWIP line then the number of Kanbans to allocate to the final
stage for the EKCS model would be the maximum inventory from hybrid Kanban-
CONWIP minus the minimum inventory to be allocated to the internal buffers in
the EKCS design, i.e. CC − 12.¹ The Kanban allocation for EKCS therefore was
10, 10, 15, 9 and 13 for workstations 1 to 5 respectively. These values were also
set as the maximum initialisation stock levels, Sjmax, for each workstation.

Table 3. Initialisation levels for each buffer under each PCS

   Strategy                 I1   I2   I3   I4   I5

   KCS                      K1   K2   K3   K4   K5
   CONWIP                   0    0    0    0    CC
   Hybrid Kanban-CONWIP     0    0    0    0    CC
   BSCS                     S1   S2   S3   S4   S5
   EKCS                     S1   S2   S3   S4   S5

5.6 Experimental results

Of the five PCS examined, KCS was consistently the worst performer in terms of
addressing the Service Level vs. WIP trade-off. Table 4 illustrates this by giving
the percentage reduction in minimum WIP required by each PCS to achieve a
targeted Service Level when compared to KCS. Hybrid Kanban-CONWIP was
consistently the best performer, requiring 9% to 15.5% less WIP than KCS to
achieve a targeted Service Level. A paired-t test demonstrated that the performance
of hybrid Kanban-CONWIP was statistically significantly better than CONWIP at
both 95% and 99% significance levels. BSCS and EKCS required on average 8%
to 13.5% less WIP than KCS to achieve a targeted Service Level. A paired-t test
demonstrated that the performance of EKCS was statistically significantly better
than BSCS at both 95% and 99% significance levels for all targeted service levels
with the exception of a targeted Service Level of 96%. Tables 5 and 6 illustrate
the inventory placement patterns achieved by each PCS for targeted service levels
of 100% and 99.9% respectively. KCS required more semi-finished inventory than
the other four PCS and a similar amount of end-item inventory as CONWIP and
hybrid Kanban-CONWIP to achieve a targeted Service Level. While the differences
between the other four PCS in terms of total WIP was small, the inventory placement
patterns of the PCS were different. CONWIP and hybrid Kanban-CONWIP tended
to maintain less WIP in semi-finished inventory and more in the end-item buffer
than BSCS and EKCS.

6 Discussion

In the last two decades researchers have followed two approaches to developing pro-
duction control strategies to overcome the disadvantages of KCS in non-repetitive
¹ CC is a component-based inventory cap; therefore the internal inventory for a component
in this parallel/serial model in period n is I1(n) + I3(n) + I4(n) or I2(n) + I3(n) + I4(n),
and the value 12 is arrived at as S1min + S3min + S4min or S2min + S3min + S4min.
Table 4. Percentage reduction over KCS in minimum inventory required by each PCS to
achieve a targeted service level

   SL ≥                     100%    99.9%   99%     98%     97%     96%

   CONWIP                   8.9%    14.0%   14.3%   15.1%   14.2%   10.2%
   Hybrid Kanban-CONWIP     9.0%    14.5%   14.8%   15.5%   14.7%   13.0%
   BSCS                     7.8%    12.8%   12.8%   13.3%   12.5%   12.8%
   EKCS                     7.9%    12.9%   12.9%   13.5%   12.6%   8.5%

Table 5. Inventory placements under optimal parameters for each PCS for targeted service
level of 100%

              KCS       CONWIP    Hybrid Kanban-CONWIP   BSCS      EKCS

   I1         4.3118    3.9492    3.9365                 4.2644    4.2500
   I2         4.3113    3.9473    3.9346                 4.2625    4.2480
   I3         4.8917    3.7785    3.8283                 4.0694    4.1127
   I4         5.0808    3.7794    3.7499                 6.6772    4.0277
   I5         9.0833    9.7595    9.7341                 6.2555    8.8573

   Internal   18.5956   15.4544   15.4493                19.2736   16.6384
   Total      27.6790   25.2139   25.1833                25.5291   25.4958

Table 6. Inventory placements under optimal parameters for each PCS for targeted service
level of 99.9%

              KCS       CONWIP    Hybrid Kanban-CONWIP   BSCS      EKCS

   I1         4.3118    3.9486    3.9075                 4.2644    4.2500
   I2         4.3113    3.9469    3.9053                 4.2625    4.2480
   I3         4.8917    3.7779    3.7708                 4.0694    4.1127
   I4         5.0808    3.7794    3.7746                 4.0558    4.0277
   I5         6.0843    5.7608    5.7313                 4.8784    4.8593

   Internal   18.5956   15.4528   15.3581                16.6522   16.6384
   Total      24.6799   21.2136   21.0895                21.5305   21.4977

manufacturing environments. The first approach has been to develop new or com-
bine existing Pull-type PCS while the second approach has been to develop hybrid
PCS based on combining elements of Push and Pull PCS. In a previous paper [15]
it was demonstrated that the optimal HIHS selected by Hodgson and Wang [20, 21],
where initial stages employ push control and all other stages employ pull control, is
equivalent to a Pull-type PCS, namely hybrid Kanban-CONWIP [3, 2]. Here it was
shown that the ‘Pure-Push’ PCS modelled by Hodgson and Wang [20, 21] would
be more accurately described as a vertical integration production control strategy,


since each production stage forecasts its production requirements and utilises kan-
bans to control shop-floor production for each production period. However, it was
also shown that by ensuring that demand information in the ‘Pure-Push’ PCS is
communicated the instant it occurs rather than being delayed for one period, the
‘Pure-Push’ PCS is equivalent to EKCS.
Using the model presented in Hodgson and Wang [21] a comparative study
of KCS, CONWIP, hybrid Kanban-CONWIP, BSCS and EKCS was carried out.
The criterion used in the study was the Service Level vs. WIP trade-off. KCS
performed worst in terms of addressing this trade-off in that KCS consistently
required more inventory than the other four PCS to achieve a targeted Service
Level. The reason for the poor performance of KCS is due to the information delay
that occurs in a KCS line. When a demand event occurs this information is only
communicated to the final production stage to authorise production of replacement
parts. The longer the line, and the more delays that occur in the system (such as
downtime due to machine unreliability), the longer the delay in communicating the
demand information to the initial stages. Therefore, the release rate is not easily adjusted to
match changes in the demand rate. CONWIP and hybrid Kanban-CONWIP employ
limits on inventory in the system and once this limit has been reached only the
occurrence of a demand event can authorise the release of a part into the system.
The release rate is therefore paced to match the demand rate. BSCS and EKCS use
demand cards that are instantly communicated to each production stage to pace the
production rate of the line to the demand rate.
For the system modelled, the demand rate was 3.5 parts per production period,
which was 87.5% of the isolated production rate of a stage (4 parts per period).
The coefficient of variation of the demand distribution was approximately 14%
and the coefficient of variation of the production rate of a stage in isolation was
approximately 16%. This therefore is a system with low variability and a light-to-
medium demand load. For this system there was minimal difference between the
performances of the various PCS examined, with the exception of KCS. A statistical
analysis of the data however revealed these differences to be statistically significant.
For this system, hybrid Kanban-CONWIP performed the best in addressing the
Service Level vs. WIP trade-off.
EKCS tended to maintain similar overall inventory levels as CONWIP, hybrid
Kanban-CONWIP and BSCS. However, EKCS tended to maintain more of this
inventory internally in the line, i.e. in a semi-finished state, than CONWIP and
hybrid Kanban-CONWIP. This may be either an advantage or disadvantage and
will depend on the manufacturing objectives of the organisation. The strategy of
the organisation might be to maintain as much as possible of the WIP in a finished
state and thereby provide the organisation with greater flexibility to respond to
unexpected demands. If this is the strategy of the organisation then hybrid Kanban-
CONWIP is the preferable PCS for the manufacturing system. On the other hand
the strategy of the organisation might be to maintain WIP in semi-finished states
close to the completion state allowing the organisation to respond to changes in
customer demands by reassigning WIP to other customers or altering the WIP to
meet new customer specifications. If this is the strategy of the organisation then
EKCS is the preferable PCS for the manufacturing system.
Finally, as has been stated, the experiment presented here to examine the com-
parative performance of various Pull-type PCS was for a manufacturing system with
moderate variability and a light-to-medium demand load. Future planned work is
to examine the comparative performance of the five PCS further by examining how
the PCS respond as the coefficient of variation of the demand distribution increases
and as the mean of the demand distribution approaches the maximum capacity of
the manufacturing system.

References
1. Berkley BJ (1992) A review of the kanban production control research literature.
Production and Operations Management 1(4): 393–411
2. Bonvik AM, Couch CE, Gershwin SB (1997) A comparison of production-line control
mechanisms. International Journal of Production Research 35(3): 789–804
3. Bonvik AM, Gershwin SB (1996) Beyond kanban – creating and analyzing lean shop
floor control policies. In: Proceedings of Manufacturing and Service Operations Man-
agement Conference, Dartmouth College, The Amos Tuck School, Hanover, NH, USA
4. Buzacott JA (1989) Queueing models of kanban and MRP controlled production sys-
tems. Engineering Cost and Production Economics 17: 3–20
5. Chang T-M, Yih Y (1994a) Determining the number of kanbans and lotsizes in a generic
kanban system: A simulated annealing approach. International Journal of Production
Research 32(8): 1991–2004
6. Chang T-M, Yih Y (1994b) Generic kanban systems for dynamic environments. Inter-
national Journal of Production Research 32(4): 889–902
7. Clark AJ, Scarf H (1960) Optimal policies for the multi-echelon inventory problem.
Management Science 6(4): 475–490
8. Cochran JK, Kim S-S (1998) Optimum junction point location and inventory levels
in serial hybrid push/pull production systems. International Journal of Production
Research 36(4): 1141–1155
9. Dallery Y, Liberopoulos G (1995) A new kanban-type pull control mechanism for multi-
stage manufacturing systems. In: Proceedings of the 3rd European Control Conference
vol 4(2), pp 3543–3548
10. Dallery Y, Liberopoulos G (2000) Extended kanban control system: combining kanban
and base stock. IIE Transactions 32(4): 369–386
11. Deleersnyder JL, Hodgson TJ, King RE, O’Grady PJ, Savva A (1992) Integrating
kanban type pull systems and MRP type push systems: Insights from a Markovian
model. IIE Transactions 24(3): 43–56
12. Gaury EGA, Kleijnen JPC (2003) Short-term robustness of production management
systems: A case study. European Journal of Operational Research 148: 452–465
13. Gaury EGA, Kleijnen JPC, Pierreval H (2001) A methodology to customize pull control
systems. Journal of the Operational Research Society 52(7): 789–799
14. Gaury EGA, Pierreval H, Kleijnen JPC (2000) An evolutionary approach to select a
pull system among kanban, CONWIP and hybrid. Journal of Intelligent Manufacturing
11(2): 157–167
15. Geraghty J, Heavey C (2004) A comparison of hybrid Push/Pull and CONWIP/Pull
production inventory control policies. International Journal of Production Economics
91(1): 75–90
16. Geraghty JE (2003) An investigation of pull-type production control mechanisms for


lean manufacturing environments in the presence of variability in the demand process.
PhD, University Of Limerick, Ireland
17. Hall RW (1983) Zero inventories. Dow Jones-Irwin, Homewood, IL
18. Hirakawa Y (1996) Performance of a multistage hybrid push/pull production control
system. International Journal of Production Economics 44: 129–135
19. Hirakawa Y, Hoshino K, Katayama H (1992) A hybrid push/pull production control
system for multistage manufacturing processes. International Journal of Operations
and Production Management 12(4): 69–81
20. Hodgson TJ, Wang D (1991a) Optimal hybrid push/pull control strategies for a parallel
multi-stage system: Part I. International Journal of Production Research 29(6): 1279–
1287
21. Hodgson TJ, Wang D (1991b) Optimal hybrid push/pull control strategies for a parallel
multi-stage system: Part II. International Journal of Production Research 29(7): 1453–
1460
22. Kimemia J, Gershwin SB (1983) An algorithm for the computer control of a flexible
manufacturing system. IIE Transactions 15(4): 353–362
23. Krajewski LJ, King BE, Ritzman LP, Wong DS (1987) Kanban, MRP, and shaping the
manufacturing environment. Management Science 33(1): 39–57
24. Lee LC (1989) A comparative study of the push and pull productions systems. Inter-
national Journal of Operations and Production Management 9(4): 5–18
25. Liberopoulos G, Dallery Y (2000) A unified framework for pull control mechanisms
in multi-stage manufacturing systems. Annals of Operations Research 93: 325–355
26. Muckstadt JA, Tayur SR (1995) A comparison of alternative kanban control mecha-
nisms. IIE Transactions 27: 140–161
27. Orth MJ, Coskunoglu O (1995) Comparison of push/pull hybrid manufacturing control
strategies. In: Proceedings of Industrial Engineering Research, Nashville, TN, pp 881–
890. IIE, Norcross, GA
28. Pandey PC, Khokhajaikiat P (1996) Performance modelling of multistage production
systems operating under hybrid push/pull control. International Journal of Production
Economics 43(1): 17–28
29. Spearman ML (1988) An analytical congestion model for closed production systems.
Technical Report 88-23, Dept. of Industrial Engineering and Management Sciences,
Northwestern University, Evanston, IL
30. Spearman ML, Woodruff DL, Hopp WJ (1990) CONWIP: A pull alternative to kanban.
International Journal of Production Research 28(5): 879–894
31. Spearman ML, Zazanis MA (1992) Push and pull production systems: Issues and
comparisons. Operations Research 40(3): 521–532
32. Takahashi K, Hiraki S, Soshiroda M (1991) Integration of push type and pull type pro-
duction ordering systems. In: Proceedings of 1st China-Japan International Symposium
on Industrial Management, Beijing, China, pp 396–401
33. Takahashi K, Hiraki S, Soshiroda M (1994) Push-pull integration in production ordering
systems. International Journal of Production Economics 33(1–3): 155
34. Takahashi K, Soshiroda M (1996) Comparing integration strategies in production
ordering systems. International Journal of Production Economics 44(1–2): 83–89
35. Wang D, Xu CC (1997) Hybrid push/pull production control strategy simulation and
its applications. Production Planning and Control 8(2): 142–151
36. Zipkin P (1989) A kanban-like production control system: analysis of simple models.
Working paper 89-1, Graduate School of Business, Columbia University, New York
Section IV: Stochastic Production Planning
and Assembly
Planning order releases for an assembly system
with random operation times
Sven Axsäter
Department of Industrial Management and Logistics, Lund University, Sweden
(e-mail: [email protected])

Abstract. A multi-stage assembly network is considered. A number of end items


should be delivered at a certain time. Otherwise a delay cost is incurred. End items
and components that are delivered before they are needed will cause holding costs.
All operation times are independent stochastic variables. The objective is to choose
starting times for different operations in order to minimize the total expected costs.
We suggest an approximate decomposition technique that is based on repeated
application of the solution of a simpler single-stage problem. The performance of
our approximate technique is compared to exact results in a numerical study.

Keywords: Multi-stage production/inventory systems – Decomposition

1 Introduction

In this paper we consider the planning of interrelated assembly operations with


independent stochastic operation times. One or more end items should, according to a given contract, be delivered at a certain time. The delivery cannot take place
until all end items are ready. In case the given delivery requirement cannot be
satisfied there is a delay cost that is proportional to the length of the delay. If the
end items are ready at different times the delay cost is based on the time when all
items are ready. Furthermore, if end items are ready earlier than the delivery time,
holding costs are incurred. A final assembly operation can normally not start unless
a set of preceding operations, also with stochastic durations, has been completed.
Delays for such preceding operations will not result in any direct delay costs but
may indirectly result in additional delay costs for the end items. If such preceding
operations are finished before the corresponding final operations start, holding
costs are incurred. The operations preceding the final operations can, in turn, have
preceding operations and so on. Our purpose is to find starting times for the different

operations that minimize the total expected costs, i.e., in other words we are looking
for optimal safety times.
The considered problem with several stages is, in general, too difficult to be
solved exactly. We therefore suggest a heuristic that is based on successive appli-
cations of the solution of a simpler one-stage problem. Different versions of such
simpler problems with one or two stages have been studied in several papers before.
Examples of this research are Yano (1987a), Kumar (1989), Hopp and Spearman
(1993), Chu et al. (1993), and Shore (1995). Song et al. (2000) consider stochastic
operation times as well as stochastic demand. In a recent overview Song and Zipkin
(2003) consider a more general class of stochastic assembly problems. There are
also a number of papers dealing with similar problems for other types of systems.
Gong et al. (1994) consider a serial system and show that the problem of choos-
ing optimal lead-times is equivalent to the well-known model in Clark and Scarf
(1960). Yano (1987b) deals also with a serial system, while Yano (1987c) consid-
ers a distribution-type system. Examples of papers analyzing related problems for
single-stage systems are Buzacott and Shanthikumar (1994), Hariharan and Zipkin
(1995), Chen (2001), and Karaesmen et al. (2002).
The outline of this paper is as follows. We first give a detailed problem for-
mulation in Section 2. In Section 3 we consider the simpler single-stage system
that is the basis for our heuristic. The approximate procedure is then described in
Section 4. In Section 5 we apply our technique to two sets of sample problems, and
finally we give some concluding remarks in Section 6.

2 Problem formulation

We consider an assembly network (see Fig. 1). The arcs represent the operations.
The node where operation i starts is denoted node i. The operation times are in-
dependent random variables with continuous distributions. Our purpose is to plan
production so that the expected holding and delay costs are minimized.
Let us introduce the following notation:
ti = starting time for operation i,
τi = stochastic duration time of operation i,
fi (x) = density for τi ,
Fi (x) = cumulative distribution function for τi , (It is assumed that Fi (x) < 1 for
any finite x.)
td = requested delivery time for the assembly,
ei = positive echelon holding cost associated with operation i,
h = sum of all echelon holding costs, i.e., holding cost for all end items,
b = positive delay cost per time unit.
The delay costs at node 0 are obtained as b(t1 + τ1 − td )+ , where x+ =
max(x, 0). If there are several end items, the delay cost is based on the maximum
delay. There are no delay costs associated with other nodes. However, delays at
other nodes may affect the delay at node 0. Consider node i, which is the starting
point for operation i. Note first that we must have ti ≥ max(tj + τj , tk + τk ). After
starting operation i, the echelon holding cost ei is incurred until the final delivery,

Fig. 1. Assembly network (arcs are operations; operation i starts at node i at time ti and has duration τi, and all paths converge at the final node 0)

which will take place at the requested delivery time td , or later in case of a delay.
However, we disregard the holding costs during the operations, because they are
not affected by when the operations are carried out. It is assumed that raw material
can be obtained instantaneously from an outside supplier. This means that initial
operations like l, m, and n in Figure 1 can start at any time.

3 Single-stage system

We shall derive an approximate solution by successively applying the exact solution


for a single-stage system. Consider therefore first the system in Figure 2.

Fig. 2. Single-stage system (operations 1, ..., N all feed the final node 0)

We can express the delay d as

d = max_{1≤i≤N} (t_i + τ_i − t_d)+ .   (1)

The average costs C can then be expressed as

C = Σ_{i=1}^{N} e_i E(t_d + d − t_i − τ_i) + b E(d)
  = Σ_{i=1}^{N} e_i (t_d − t_i − E(τ_i)) + (Σ_{i=1}^{N} e_i + b) E(d).   (2)
We shall optimize C with respect to the starting times ti (i = 1, 2, ..., N). This
problem was solved by Yano (1987a), who also showed that C is a convex function
of the starting times for N = 2. It is easy to see that C is convex in the starting
times also for larger values of N. Consider the right-hand side of (2). The first term
is linear so we only need to show that E(d) is convex. Because the operation times
are independent it is enough to demonstrate that d in (1) is convex in the starting
times for given operation times. Note that x+ is convex. Let 0 ≤ α ≤ 1, and let t and t′ be two sets of starting times. We have

max_{1≤i≤N} (α t_i + (1 − α) t′_i + τ_i − t_d)+
  ≤ max_{1≤i≤N} [ α (t_i + τ_i − t_d)+ + (1 − α) (t′_i + τ_i − t_d)+ ]
  ≤ α max_{1≤i≤N} (t_i + τ_i − t_d)+ + (1 − α) max_{1≤i≤N} (t′_i + τ_i − t_d)+ .   (3)

Furthermore, it is evident that C → ∞ as ti → ∞ or ti → −∞. It follows that C


has a unique minimum for finite values of the starting times ti . See also e.g., Hopp
and Spearman (1993) and Song et al. (2000).
Let G(x) be the cumulative distribution function of the delay d. We have

G(x) = P(d ≤ x) = ∏_{i=1}^{N} F_i(x + t_d − t_i).   (4)
i=1
We can now express C as

C = Σ_{i=1}^{N} e_i (t_d − t_i − E(τ_i)) + (Σ_{i=1}^{N} e_i + b) ∫_0^∞ [1 − ∏_{i=1}^{N} F_i(x + t_d − t_i)] dx.   (5)
Consequently we get the partial derivative of C with respect to ti as

∂C/∂t_i = −e_i + (Σ_{i=1}^{N} e_i + b) ∫_0^∞ f_i(x + t_d − t_i) ∏_{j≠i} F_j(x + t_d − t_j) dx.   (6)

When evaluating ∂C/∂ti numerically we need to carry out a numerical integration.


Assume now first that there are no constraints on the variables ti . We will then
obtain the minimum as the unique solution of ∂C/∂ti = 0. Note that for N = 1 the
problem degenerates to the familiar Newsboy problem, and we obtain the optimum
from the condition
F_1(t_d − t_1) = b / (b + e_1).   (7)
Consider the general case again. For given values of the other tj it is easy to find
the ti giving ∂C/∂ti = 0. Due to the convexity ∂C/∂ti is increasing so we can

apply a simple search procedure. If we start with some initial values of the starting
times, e.g., ti = td − E(τi ), and carry out optimizations with respect to one ti at a
time the costs are nonincreasing and we will ultimately reach the optimal solution
due to the convexity.
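The search procedure above can be sketched in code (again our own illustration, not the author's implementation): each coordinate step solves ∂C/∂t_i = 0, Eq. (6), by bisection, which is valid because ∂C/∂t_i is increasing in t_i. Exponential operation times and all parameter values are assumptions made for the example; for N = 1 the result is checked against the closed-form newsboy solution of (7).

```python
import math

def make_exp(mean):
    f = lambda x: math.exp(-x / mean) / mean if x > 0 else 0.0   # density
    F = lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0    # CDF
    return f, F

def dC_dti(i, t, dists, e, b, td, upper=60.0, n=4000):
    # Eq. (6): dC/dt_i = -e_i + (sum_j e_j + b) *
    #   int_0^inf f_i(x + td - t_i) prod_{j != i} F_j(x + td - t_j) dx
    h = upper / n
    s = 0.0
    for k in range(n + 1):
        x = k * h
        g = dists[i][0](x + td - t[i])
        for j, (fj, Fj) in enumerate(dists):
            if j != i:
                g *= Fj(x + td - t[j])
        s += (0.5 if k in (0, n) else 1.0) * g
    return -e[i] + (sum(e) + b) * s * h

def optimize(dists, e, b, td, sweeps=3):
    """Coordinate-wise descent: bisection on each t_i in turn."""
    t = [td - 1.0] * len(e)
    for _ in range(sweeps):
        for i in range(len(t)):
            lo, hi = td - 50.0, td + 50.0   # derivative <0 at lo, >0 at hi
            for _ in range(40):
                t[i] = 0.5 * (lo + hi)
                if dC_dti(i, t, dists, e, b, td) < 0.0:
                    lo = t[i]
                else:
                    hi = t[i]
            t[i] = 0.5 * (lo + hi)
    return t

# N = 1 must reproduce the newsboy condition (7): F_1(td - t1) = b/(b + e1),
# i.e. t1 = td + mean * ln(e1/(b + e1)) for an exponential operation time.
e, b, td = [1.0], 25.0, 0.0
t1 = optimize([make_exp(1.0)], e, b, td)[0]
print(round(t1, 3), round(td + math.log(e[0] / (b + e[0])), 3))
```

The same routine handles any N; only the one-dimensional bisection bracket and the integration range would need adjusting for very different time scales.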
Assume now that we have lower bounds for the starting times ti .
ti0 = lower bound for ti .
We can then obtain the optimal solution in almost the same way. We start with
feasible values of ti , e.g., ti = max(td −E(τi ), ti0 ). Then we optimize with respect
to one ti at a time. Consider a local optimization of ti . We first check whether ti = ti0
is optimal. This is the case if ∂C/∂ti > 0 for ti = ti0 . Otherwise we determine
the ti giving ∂C/∂ti = 0 as before. Again the costs are nonincreasing and we will
ultimately reach the optimal solution due to the convexity.

4 Approximate procedure for a multi-stage system

Consider now a general assembly system. We shall describe our approximate plan-
ning procedure. Let
P(i) = set of operations that are immediate predecessors of node i.
Consider first the immediate predecessors of the final node, i.e., P(0). In Figure 1
we have a single predecessor, but in a more general case we may have multiple
predecessors like in Figure 2. Assume that there are N operations in P(0). Assume
also that the operations that must precede these operations are finished, i.e., the N
operations in P(0) are ready to start. Consider first all operations, which do not
belong to P(0). The holding costs associated with these operations before time td
cannot be affected by the starting times for the operations in P(0). So we disregard
these holding costs in the first step. There are also holding costs associated with
these operations during a possible delay, which should be included because they
are affected by the starting times for the operations in P(0). For the operations in
P(0) both the holding costs from their respective starting times to td , and during a
possible delay are affected by the starting times for the operations in P(0). There are
also delay costs associated with a delay. This leads to the following cost expression
for the operations in P(0).
C(0) = Σ_{j=1}^{N} e_j (t_d − t_j − E(τ_j)) + (h + b) ∫_0^∞ [1 − ∏_{j=1}^{N} F_j(x + t_d − t_j)] dx.   (8)

Note that the only difference compared to (5) is that h includes also holding costs
during a possible delay for operations preceding P(0). Using the algorithm in
Section 3 (without lower bounds for the starting times) we optimize (8) in our first
step and get the corresponding starting times t∗i for the operations in P(0).
Assume then that i is one of the operations in P(0), and consider its immediate
predecessors, i.e., the operations in P(i). We shall now consider the single stage
system consisting of these operations and thereby interpret t∗i as a requested delivery
time. For an operation in P(i) the starting time will affect the holding costs before

t∗i . Let us also consider a delay cost b̂ that replaces h + b in (8). The resulting
problem is to minimize
C(i) = Σ_{j=1}^{N} e_j (t∗_i − t_j − E(τ_j)) + b̂ ∫_0^∞ [1 − ∏_{j=1}^{N} F_j(x + t∗_i − t_j)] dx.   (9)

Note that although we, for simplicity, are using a similar notation in (8) and (9), the
considered operations are not the same. So the number of operations N, the holding
costs ej , and the distribution functions Fj are normally different.
It remains to determine b̂. If there is a delay, this delay will affect the starting
times of the operations in P(0). This will increase the costs C(0) in (8). Consider
some given starting times for the operations in P(i) and let the corresponding
stochastic delay relative to t∗i be δ. A reasonable approximate delay cost is
b̂ = E_{δ>0}[ (dC(0)/dδ)(t∗_1 + δ, t∗_2 + δ, ..., t∗_N + δ) ]
  = E_δ[ (dC(0)/dδ)(t∗_1 + δ, t∗_2 + δ, ..., t∗_N + δ) ] / Pr(δ > 0).   (10)

The second equality in (10) follows because dC(0)/dδ = 0 for δ = 0 due to the
optimality of t∗i . Because of our assumption concerning the distributions of the
operation times we know that Pr(δ > 0) > 0. In (10) it is implicitly assumed
that all operations in P(0) are started δ time units later compared to the optimal
solution, i.e., not only operation i. This is a reasonable assumption and will also
simplify the computations. Using that

(d/dδ) ∏_{j=1}^{N} F_j(x + t_d − t_j − δ) = −(d/dx) ∏_{j=1}^{N} F_j(x + t_d − t_j − δ),   (11)

we get from (10)


(dC(0)/dδ)(t∗_1 + δ, t∗_2 + δ, ..., t∗_N + δ)
  = −Σ_{j=1}^{N} e_j + (h + b) ∫_0^∞ (d/dx) ∏_{j=1}^{N} F_j(x + t_d − t∗_j − δ) dx
  = −Σ_{j=1}^{N} e_j + (h + b) [1 − ∏_{j=1}^{N} F_j(t_d − t∗_j − δ)]   (12)

so it is relatively easy to evaluate b̂ according to (10). Recall that we get the


distribution of the delay from (4).
Because b̂ in (10) depends on the starting times of the operations in P(i), it is
unknown. It is, however, still easy to determine an optimal solution corresponding
to the delay cost (10). Note first that the b̂ from (10) is bounded from below by 0 and from above by h + b − Σ_{j=1}^{N} e_j . The upper bound is easy to see from (12). It is
also clear that the upper bound will lead to a finite optimal solution of (9). (Recall
that h is the sum of all echelon holding costs.) Assume now that we start with a

certain b̂in from the considered interval. There are now two possibilities. If b̂in is
sufficiently small there is no finite optimum of (9). The resulting b̂out from (10)
will approach the upper bound. If, on the other hand, there is a finite solution we
know that b̂out is between the lower and upper bounds. Clearly, b̂out is a continuous
function of b̂in . Consequently, it follows from Brouwer’s fixed point theorem that
there exists a fixed point b̂out = b̂in , i.e., a solution corresponding to the delay cost
(10). It is easy to find such a fixed point by a one-dimensional search.
Remark. Normally b̂out is a decreasing function of b̂in . In that case it is very easy
to find the unique fixed point.
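The one-dimensional fixed-point search can be sketched as follows. The mapping b̂_out(·) below is a hypothetical stand-in (the real one requires optimizing (9) and evaluating (10) for each trial value); the search itself relies only on b̂_out being continuous and decreasing, as in the Remark.

```python
def find_bhat(b_out, lower, upper, tol=1e-9):
    """Bisection for the fixed point b_out(b) = b, assuming b_out is
    continuous and decreasing on [lower, upper] (the 'normal' case of the
    Remark), so that b_out(b) - b is decreasing as well."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if b_out(mid) > mid:
            lo = mid    # fixed point lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up decreasing mapping on an interval [0, 30] standing in for the
# bounds [0, h + b - sum_j e_j]; with this choice the fixed point solves
# b = 30 / (1 + 0.2 b), i.e. b = 10.
b_hat = find_bhat(lambda bb: 30.0 / (1.0 + 0.2 * bb), 0.0, 30.0)
print(round(b_hat, 6))  # → 10.0
```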
We can then handle the predecessors of the operations in P(i) in the same way,
etc. Let j be one of the operations in P(i). When dealing with P(j) we let t∗j be the
requested delivery time for the single-stage system. In (10) we are using C(i) instead
of C(0). A difference here is that the upper bound for b̂ will not necessarily lead
to a finite solution. This means in that case that the delay δ will approach infinity
and, as a consequence, also the operations in P(0) will be delayed. Consequently
it is reasonable to use the costs in the preceding step, i.e., in this case C(0) instead
of C(i). If necessary we can go one step further, and so on. This will always work
because C(0) will provide an upper bound leading to a finite optimal solution.
We will end up with starting times t∗i for all operations and delay costs for all
single-stage systems. When implementing the solution we will stick to the obtained
starting times as long as they are possible to follow. However, delays may enforce
changes. Consider, for example, operation j in Figure 1. Assume that operations
l, m, and n are finished at some time tj0 > t∗j . We then derive a new solution for
the operations in P(i). (The delay cost at node i is not changed.) In the solution we
apply the constraint tj ≥ tj0 . Starting times that have already been implemented are
regarded as given. If operation k has not yet started, its starting time may increase
but cannot decrease. To see this consider (6) and note that ∂C/∂ti is nonincreasing
if some other tj is increasing, i.e., t∗i is nondecreasing.
Let us summarize our approximate procedure:
1. Optimize C(0) as given by (8). Let K be the number of stages. Set k = −1.
2. Set k = k + 1.
3. For all operations i with k succeeding operations, optimize (9) for the operations of P(i), with the delay cost b̂ determined by the fixed-point condition (10) applied to the cost function C(i). If k > 0 it may occur that no finite optimum exists. In that case use the cost function for the successor of i; if necessary go to the successor of the successor, etc.
4. If k < K − 1, then goto 2.
5. Implement the solution. In case of a delay reoptimize free starting times without
changing the delay cost.
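A small runnable skeleton (our own; the single-stage optimizations themselves are stubbed out) makes the traversal order of steps 1–4 concrete: operations are grouped by their number k of succeeding operations and processed in order of increasing k.

```python
# succ[i] is the (single) successor operation of i, or None for the
# operations in P(0); in an assembly network every operation has exactly
# one successor.

def stages(succ):
    def k(i):                      # number of succeeding operations of i
        n = 0
        while succ[i] is not None:
            i = succ[i]
            n += 1
        return n
    groups = {}
    for i in succ:
        groups.setdefault(k(i), []).append(i)
    # Step 1 handles k = 0 (the operations in P(0));
    # steps 2-4 then sweep k = 1, 2, ..., K - 1.
    return [sorted(groups[kk]) for kk in sorted(groups)]

# The network of Figure 3: operations 1 and 2 feed the final node 0, and
# operations 3, 4, and 5 precede operation 1.
succ = {1: None, 2: None, 3: 1, 4: 1, 5: 1}
print(stages(succ))  # → [[1, 2], [3, 4, 5]]
```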

5 Numerical results

To evaluate the suggested approximate procedure we used two sets of sample prob-
lems.

Problem set 1

Our first problem set concerns the two-stage network in Figure 3.

Fig. 3. Network for Problem set 1 (operations 3, 4, and 5 precede operation 1; operations 1 and 2 end at the final node 0; the requested delivery time is td = 0)

The requested delivery time is td = 0. All operation times have the same
distribution. We denote this stochastic operation time by τ . This means that it is
relatively easy to determine also the exact solution for comparison. By symmetry
operations 3, 4, and 5 should have the same starting time ts = t3 = t4 = t5 . As
before consider first the single-stage network to the right. Let t∗1 = t∗2 be the optimal
solution of (8). Given ts there is a certain stochastic delay d at node 1 relative to t∗1 .
If d = 0 it is optimal to apply t∗1 and t∗2 . If d > 0 it is optimal to use the solution
obtained with the constraint t1 ≥ t10 = t∗1 +d. Recall that this leads to t2 ≥ t∗2 . Let
c(d) be the corresponding expected costs for the single-stage network according to
(8). We obtain the total costs as

C(t_s) = Σ_{i=3}^{5} e_i (t_d − t_s − E(τ)) + E_d{c(d)}.   (13)

This is the case both for the exact and the approximate solution. The only differ-
ence between the exact and approximate solution is the determination of ts . In the
approximate procedure we use the procedure described in Section 4. In the optimal
solution we optimize (13) with respect to ts .
All echelon holding costs are kept equal, ei = 1, while we considered three
different delay costs b = 5, 25, and 50. Furthermore the expected operation time
E(τ ) = 1 in all considered cases. Two different types of distributions for the
operation time were considered. For each distribution we considered the standard
deviations σ = 0.2, 0.5, and 1. Both distributions are constructed as α + (1 − α)X,
where α is a constant between 0 and 1 and X is a stochastic variable with its mean
equal to 1. Distribution 1 is obtained by letting X have an exponential distribution
with mean (and standard deviation) equal to 1. Distribution 2 is similarly obtained

Table 1. Optimal parameters and costs for Problem set 1

              Stand.   Delay    Optimal policy      Approx. policy      Cost
Distribution  dev. σ   cost b   ts       Costs      ts       Costs      increase %
1             0.2      5        −2.16    2.06       −2.22    2.08       1.0
1             0.2      25       −2.50    3.48       −2.58    3.52       1.1
1             0.2      50       −2.66    4.21       −2.75    4.26       1.1
1             0.5      5        −2.39    5.15       −2.54    5.19       0.8
1             0.5      25       −3.25    8.71       −3.45    8.81       1.1
1             0.5      50       −3.66    10.52      −3.88    10.64      1.1
1             1        5        −2.79    10.30      −3.09    10.40      1.0
1             1        25       −4.50    17.43      −4.90    17.62      1.1
1             1        50       −5.32    21.04      −5.75    21.27      1.1
2             0.2      5        −2.09    2.07       −2.17    2.10       1.4
2             0.2      25       −2.49    3.74       −2.57    3.77       0.8
2             0.2      50       −2.69    4.64       −2.78    4.68       0.9
2             0.5      5        −2.30    5.20       −2.43    5.23       0.6
2             0.5      25       −3.25    9.38       −3.43    9.45       0.7
2             0.5      50       −3.72    11.56      −3.95    11.68      1.0
2             1        5        −2.61    10.40      −2.86    10.47      0.7
2             1        25       −4.49    18.75      −4.87    18.90      0.8
2             1        50       −5.43    23.12      −5.90    23.35      1.0

by letting X be the square of a normally distributed random variable with mean 0


and standard deviation 1. In other words, X has a χ2-distribution with one degree of freedom. This means that X has mean 1 and standard deviation √2. In both cases
the mean is equal to 1 for any value of α, and by adjusting α we can obtain the
considered standard deviations.
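The construction of the two distributions is easy to reproduce (our own sketch): since τ = α + (1 − α)X with E(X) = 1, the mean is 1 for every α and the standard deviation is (1 − α) sd(X), so α follows directly from the target σ.

```python
import math
import random

def alpha_for_sigma(sigma, sd_X):
    # sd(alpha + (1 - alpha) X) = (1 - alpha) sd(X) = sigma
    return 1.0 - sigma / sd_X

def sample_dist1(alpha):
    # Distribution 1: X ~ exponential with mean (and standard deviation) 1
    return alpha + (1.0 - alpha) * random.expovariate(1.0)

def sample_dist2(alpha):
    # Distribution 2: X ~ chi-square with 1 degree of freedom, sd(X) = sqrt(2)
    return alpha + (1.0 - alpha) * random.gauss(0.0, 1.0) ** 2

random.seed(0)
a1 = alpha_for_sigma(0.5, 1.0)             # sigma = 0.5 for Distribution 1
a2 = alpha_for_sigma(0.5, math.sqrt(2.0))  # sigma = 0.5 for Distribution 2
m1 = sum(sample_dist1(a1) for _ in range(100000)) / 100000
m2 = sum(sample_dist2(a2) for _ in range(100000)) / 100000
print(round(a1, 3), round(a2, 3))  # → 0.5 0.646
```

The two sample means m1 and m2 come out close to 1 regardless of α, as the construction guarantees.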
The results are shown in Table 1. The relative cost increase when using our
approximate technique is quite small in all 18 cases. The maximum error is 1.4 %.
The relative errors are fairly insensitive to the distribution type, the delay cost, and
the standard deviation of the operation times. For Problem set 1, the approximate method always results in an earlier starting time ts for the initial operations, i.e., the needed safety times are overestimated.

Problem set 2

Our second problem set concerns a more complicated network with three stages
(see Fig. 4). Operations 1, 2, 3, 5, and 7 have the same stochastic operation time τ ,
while the times of operations 4 and 6 are 2τ . All times are independent. The time
τ has the same distribution as distribution 1 in Problem set 1 with E(τ ) = 1 and
the standard deviations σ = 0.2, 0.5, and 1. Furthermore, as for Problem set 1 all

Fig. 4. Network for Problem set 2 (a three-stage network with operations 1–7 converging to the final node 0; the requested delivery time is td = 0)

Table 2. Optimal parameters and costs for Problem set 2

Stand.  Delay    Optimum by simulation              Approximate policy                 Cost
dev. σ  cost b   ts1    ts2    tm     Costs         ts1     ts2     tm      Costs      increase %
0.2     5        −4.3   −3.1   −2.3   3.68          −4.20   −3.20   −2.26   3.94       7.1
0.2     25       −5.0   −3.5   −2.5   6.46          −4.57   −3.53   −2.53   6.86       6.2
0.2     50       −5.3   −3.6   −2.7   7.82          −4.76   −3.70   −2.68   8.69       11.1
0.5     5        −4.6   −3.2   −2.6   9.18          −4.49   −3.49   −2.66   9.86       7.4
0.5     25       −5.8   −3.9   −3.4   16.16         −5.42   −4.32   −3.33   17.57      8.7
0.5     50       −7.1   −4.4   −3.8   19.12         −5.90   −4.75   −3.70   22.29      16.6
1       5        −5.2   −3.3   −3.3   17.89         −4.98   −3.99   −3.31   20.04      12.0
1       25       −8.3   −4.7   −4.6   31.85         −6.84   −5.64   −4.67   34.41      8.0
1       50       −9.9   −5.5   −5.4   38.79         −7.80   −6.49   −5.41   43.69      12.6

echelon holding costs are equal, ei = 1, and we consider the delay costs b = 5, 25,
and 50. Table 2 shows the results.
The approximate policy in Table 2 is obtained as described in Section 4. The
costs for the approximate policy are obtained by simulation. The standard deviation
is less than 0.02. The time ts1 is the starting time of the longer operations 4 and 6,
and ts2 is the starting time of operations 5 and 7. The time tm is the starting time
for operations 2 and 3 if both operations are ready to start.
The optimum by simulation is obtained by a combination of simulation and ana-
lytical techniques. We omit the details. All times, both starting times and stochastic
times were for simplicity rounded to multiples of 0.1. This does not affect the re-
sults much. More important is that in the determination of the optimal policy we
carried out minimizations over several simulated costs. This means that the costs
are somewhat underestimated. By considering the costs for the starting times sug-
gested by the approximate policy we could, however, conclude that the error is not
very significant. The cost increase of the approximate policy in Table 2 may be
overestimated by 1−2 % but hardly more.
We must conclude that the approximation errors for Problem set 2 are much
larger than for Problem set 1. The average cost increase is nearly 10 %. Note
that the intermediate time tm is very accurate also in Table 2. We note that our

approximation underestimates the needed safety times for the long operations 4
and 6 while it overestimates the safety times for the shorter operations 5 and 7. To
explain the large difference in errors, consider first the network for Problem set 1 in
Figure 3. The delay at node 1 will determine the starting times t1 and t2 because we
can initiate operation 2 at any time. A linear delay cost is then reasonable. Consider
then the network for Problem set 2 in Figure 4 and the delay at node 2. A small
delay may then not be that serious because there may anyway be a delay at node 3.
A long delay may, however, cause a long delay also for operation 3. This indicates
that our linear delay cost may be less appropriate in this case. It also explains why
the longer more stochastic operations are starting so early in the optimal solution,
i.e., we wish to avoid long delays. A general, not very surprising, conclusion can
be that our approximation works better for networks with a single “critical path”.

6 Conclusions

We have considered the problem of minimizing expected delay and holding costs for
a complex assembly network where the operation times are independent stochastic
variables. An approximate decomposition technique for solving the problem has
been suggested. The technique means repeated applications of the solution of a
simpler single-stage problem. The approximate method has been evaluated for two
problem sets. The results are very good for the first set of two-stage problems and
the relative cost increase due to the approximation is only about 1 %. For the second
set of three-stage problems the errors are about 10 % and cannot be disregarded.
Although the numerical results show some promise, further research is needed for
evaluation of the applicability of the suggested technique in more general settings.
Because it is difficult to derive exact solutions for problems of realistic size it may
be most fruitful to compare different heuristics for larger problems.

References

Buzacott JA, Shanthikumar JG (1994) Safety stock versus safety time in MRP controlled
production systems. Management Science 40: 1678–1689
Chen F (2001) Market segmentation, advanced demand information, and supply chain per-
formance. Manufacturing & Service Operations Management 3: 53–67
Chu C, Proth JM, Xie X (1993) Supply management in assembly systems. Naval Research
Logistics 40: 933–950
Clark AJ, Scarf H (1960) Optimal policies for a multi-echelon inventory problem. Manage-
ment Science 6: 475–490
Gong L, de Kok T, Ding J (1994) Optimal leadtimes planning in a serial production system.
Management Science 40: 629–632
Hariharan R, Zipkin P (1995) Customer-order information, leadtimes, and inventories. Man-
agement Science 41: 1599–1607
Hopp W, Spearman M (1993) Setting safety leadtimes for purchased components in assembly
systems. IIE Transactions 25: 2–11
Karaesmen F, Buzacott JA, Dallery Y (2002) Integrating advance order information in make-
to-stock production systems. IIE Transactions 34: 649–662

Kumar A (1989) Component inventory costs in an assembly problem with uncertain supplier
lead-times. IIE Transactions 21: 112–121
Shore H (1995) Setting safety lead-times for purchased components in assembly systems: a
general solution procedure. IIE Transactions 27: 634–637
Song J-S, Yano CA, Lerssrisuriya P (2000) Contract assembly: Dealing with combined
supply lead time and demand quantity uncertainty. Manufacturing & Service Operations
Management 2: 287–296
Song J-S, Zipkin P (2003) Supply chain operations: Assemble-to-order systems, Ch. 11.
In: Graves SC, De Kok T (eds) Handbooks in operations research and management
science, Vol 11. Supply chain management: design, coordination and operation. Elsevier,
Amsterdam
Yano C (1987a) Stochastic leadtimes in two-level assembly systems. IIE Transactions 19:
371–378
Yano C (1987b) Setting planned leadtimes in serial production systems with tardiness costs.
Management Science 33: 95–106
Yano C (1987c) Stochastic leadtimes in two-level distribution-type networks. Naval Research
Logistics 34: 831–843
A multiperiod stochastic production planning
and sourcing problem with service level constraints
Işıl Yıldırım1 , Barış Tan2 , and Fikri Karaesmen1
1 Department of Industrial Engineering, Koç University, Rumeli Feneri Yolu, Sariyer, Istanbul, Turkey (e-mail: [email protected]; [email protected])
2 Graduate School of Business, Koç University, Rumeli Feneri Yolu, Sariyer, Istanbul, Turkey (e-mail: [email protected])

Abstract. We study a stochastic multiperiod production planning and sourcing


problem of a manufacturer with a number of plants and/or subcontractors. Each
source, i.e. each plant and subcontractor, has a different production cost, capacity,
and lead time. The manufacturer has to meet the demand for different products
according to the service level requirements set by its customers. The demand for
each product in each period is random. We present a methodology that a manu-
facturer can utilize to make its production and sourcing decisions, i.e., to decide
how much to produce, when to produce, where to produce, how much inventory to
carry, etc. This methodology is based on a mathematical programming approach.
The randomness in demand and related probabilistic service level constraints are in-
tegrated in a deterministic mathematical program by adding a number of additional
linear constraints. Using a rolling horizon approach that solves the deterministic
equivalent problem based on the available data at each time period yields an ap-
proximate solution to the original dynamic problem. We show that this approach
yields the same result as the base stock policy for a single plant with stationary
demand. For a system with dual sources, we show that the results obtained from
solving the deterministic equivalent model on a rolling horizon give similar results
to a threshold subcontracting policy.

Keywords: Stochastic production planning – Service level constraints – Subcon-


tracting


The authors are grateful to Yves Dallery for his ideas, comments and suggestions on the
earlier versions of this paper.
Correspondence to: F. Karaesmen

1 Introduction and motivation

In this study, we consider a manufacturer that supplies products to a retailer. The


manufacturer has a number of production sources that are either its own plants or
its subcontractors. Each source has a different production cost, capacity, and lead
time. The demand for each product in each period is random. The manufacturer
has to meet the demand for multiple products taking into account the service level
requirements set by the retailer.
In the production planning and the sourcing problem, the manufacturer’s deci-
sion variables are how much to produce, when to produce, where to produce, and
how much inventory to carry in each period. The objective is to minimize its total
production and inventory carrying costs during the planning horizon subject to the
service level requirements and other possible constraints.
This problem is motivated by the problems faced by suppliers of lean retailers
in the textile-apparel-retail channel (Abernathy et al., 1999). Namely, adoption of
lean retailing practices force suppliers of lean retailers to adopt new strategies to
respond quickly to changing demand effectively. Using subcontractors emerge as
a viable alternative to increase production capacity temporarily when it is needed.
Additional cost of subcontracting can be justified by lowering inventories and im-
proving the service. However, deciding on where to produce and how much to
produce is a challenging task especially when the demand is volatile. A qualitative
discussion of this problem can be found in Abernathy et al. (2000). Figure 1 below
depicts the system which motivates this study.
We propose a solution methodology that is based on solving a deterministic
mathematical problem at each time period on a rolling horizon basis. Randomness
in the problem that comes from uncertain demand and service level constraints
are integrated in a deterministic mathematical program by adding a number of
additional linear constraints similar to the approach proposed by Bitran and Yanasse
(1984). We propose using this approach to address the more relevant but also more

Plant 1

Retailer orders Product 1

Plant
Distribution Center
Retailer

Subcontractor Inventory
1
Product M

Subcontractor Sales data

N
Decision and Control
Production

Fig. 1. A manufacturer with multiple plants that sells multiple products to a retailer

difficult dynamic problem where decisions can be updated over time. Since the
equivalent deterministic problem is a well-structured mathematical programming
problem, the proposed methodology can easily be integrated with the Advanced
Planning and Optimization tools, such as the products of i2, Manugistics, etc., that
are commonly used in practice.
The organization of the remaining parts of the paper is as follows: In Section 2,
we review the literature on mathematical-programming-based stochastic produc-
tion planning methodologies. The particular stochastic production planning and
sourcing problem we investigate is introduced in Section 3. Section 4 presents the
proposed solution methodology that is based on solving the deterministic equiva-
lent problem at each time step on a rolling horizon basis. The performance of the
rolling horizon approach is evaluated by considering a number of special cases in
Section 5. Finally, conclusions are presented in Section 6.

2 Literature review

The classical deterministic production planning problem, its mathematical programming
formulations, and solution methodologies have received a lot of attention
for many years (see Hax and Candea, 1984 for a number of well-known models).
In this section, we only review the literature directly related to mathematical pro-
gramming based approaches for stochastic production planning problems.
Bitran and Yanasse (1984) deal with a similar stochastic production planning
problem with a service level requirement. They provide non-sequential (static)
and deterministic equivalent formulations of the model and derive error bounds
between the exact solution and the proposed approximation. Their main focus is on the
solution of the static problem, i.e., the solution at time zero for the whole planning
horizon.
Bitran, Haas and Matsuo (1986) present a model that is motivated by a case
in the consumer electronics and textile and apparel industry. In this model, the
stochastic problem is transformed into a deterministic one by replacing the random
demands with their average values. Then, the solution of the transformed problem
provides answers to the questions of what to produce and when to produce. The
complete solution is obtained by determining how much to produce from a newsboy-
type formulation based on the solution of the deterministic problem.
Feiring and Sastri (1989) focus on production smoothing plans with rolling
horizon strategies and confidence levels for the demand, which are set by the pro-
duction planners. The probabilistic constraints in the demand-driven scheduling
model are revised by Bayesian procedures and are transformed into deterministic
constraints by inverse transformations of normally distributed demand.
Zäpfel (1996) claims that MRP II systems can be inadequate for the solution of
production planning problems with uncertain demand because of the insufficiently
supported aggregation/disaggregation process. The paper then proposes a procedure
to generate an aggregate plan and a consistent disaggregate plan for the Master
Production Schedule.
348 I. Yıldırım et al.

Kelle, Clendenen and Dardeau (1994) extend the economic lot scheduling problem
for the single-machine, multi-product case with random demands. Their objective
is to find the optimal length of production cycles that minimizes the sum of
set-up costs and inventory holding costs per unit of time and satisfies the demand
of products at the required service levels.
Clay and Grossman (1997) focus on a two-stage fixed-recourse problem with
stochastic Right-Hand-Side terms and stochastic cost coefficients and propose a
sensitivity-based successive disaggregation algorithm.
Sox and Muckstadt (1996) present a model for the finite-horizon, discrete-time,
capacitated production planning problem with random demand for multiple prod-
ucts. The proposed model includes backorder cost in the objective function rather
than enforcing service level constraints. A subgradient optimization algorithm is
developed for the solution of the proposed model by using Lagrangian relaxation
and some computational results are provided.
Beyer and Ward (2000) report a production and inventory problem of Hewlett-
Packard’s Network Server Division. The authors propose a method to incorporate
the uncertainties in demand in an Advanced Planning System utilized by Hewlett-
Packard.
Albritton, Shapiro and Spearman (2000) study a production planning problem
with random demand and limited information and propose a simulation based op-
timization method. Qiu and Burch (1997) study a hierarchical production planning
and scheduling problem motivated by the fibre industry and propose an optimization
model that uses logic of expert systems.
Van Delft and Vial (2003) consider multiperiod supply chain contracts with
options. In order to analyze the contracts, they propose a methodology to formulate
the deterministic equivalent problem from the base deterministic model and from
an event tree representation of the stochastic process and solve the stochastic linear
program by discretizing demand under the backlog assumption.
For the textile-apparel-retail problem discussed in Abernathy et al. (2000), a
simulation model has also been developed (Yang et al., 1997). Then a simulation-
based optimization technique, referred to as ordinal optimization, has been used
to determine the parameters of a production and inventory control policy that gives
a good-enough solution approximately (Yang et al., 1997; Lee, 1997). However,
one needs to set a specific production and inventory control policy in the simulation.
In addition to the difficulty of setting a plausible policy in a complicated case, as the
number of sources and products increase, the number of parameters to be optimized
also increases. As a result, finding an approximate solution requires considerable
time.
Simplified versions of the sourcing problem studied in this paper have been
investigated in the past by using stochastic optimal control (Bradley, 2002; Tan and
Gershwin, 2004; Tan, 2001). Bradley (2002) considers a system with a producer
and a subcontractor and discrete flow of goods. In an M/M/1 setting without the
service level requirements, he proves that the optimal control policy structure is a
dual-base stock policy. In this policy, when the number of customers in the queue
reaches a certain level, new incoming customers are sent to the subcontractor.
When there are no customers waiting in the queue, then the producer continues
production until a certain threshold is reached.

In Tan (2001) and Tan and Gershwin (2004), a producer with a single sub-
contractor is formulated with continuous flow of goods without the service level
requirements. They also show that a threshold-type policy is optimal to decide when
and how to use a subcontractor. In the threshold policy, the subcontractor is used
when the inventory or the backlog is below a certain threshold level.
Our paper uses the idea of incorporating randomness in a deterministic math-
ematical program that is used in many of the above studies in different formats.
We utilize the approach proposed by Bitran and Yanasse (1984) that shows the
equivalence for the static problem. In contrast to this study where the main objec-
tive is determining error bounds for the optimal cost in the non-sequential case,
our main focus is generating a production and sourcing plan, i.e. determining the
values of the decision variables in the sequential (dynamic) problem where sourc-
ing decisions are made (or updated) dynamically over time. We also compare the
approximate solution of the dynamic problem with certain benchmark policies.
Since the exact optimal solution of the dynamic problem is not known, we use
two different benchmarks. It is proven that for a single source with lead time, the
proposed approach yields the same production policy as the optimal base stock
policy. For a dual-source, e.g. a producer with a subcontractor, a threshold-type
subcontracting policy suggested by Bradley (2002), Tan (2001), Tan and Gershwin
(2004) is utilized as a benchmark. After adapting the threshold policy to a more
generalized case with lead time and service-level requirements, it is observed that
the proposed approach yields very similar results to the threshold-based benchmark
in the numerical examples considered.

3 Stochastic multiperiod sourcing problem with service level constraints

Assume that there is a single product and N different production sources (plants and
subcontractors). The demand for this specific product at time t, d_t, is random. The
main decision variables are the production quantities at each production source at
time t, X_{i,t}, i = 1, ..., N. The inventory level at the end of time period t is denoted
by I_t. The number of periods in the planning horizon is T. The inventory holding
cost per unit per unit time is h_t, and the production cost at production source i at
time t is c_{i,t}.
Constraints on the performance (related to backorders) of the system are im-
posed by requiring service levels. The frequently used Type 1 Service Level is
defined to be the fraction of periods in which there is no stock out. It can be viewed
as the plant’s no-stock-out frequency. This service level measures whether or not a
backorder occurs but is not concerned with the size of the backorder. In this study,
we consider a Modified Type 1 Service Level requirement. The Modified Type 1
Service Level forces the probability of having no stock out to be greater than or
equal to a service level requirement in each period. The service level requirement
in period t is denoted by αt .
The Stochastic Production Planning and Sourcing Problem (SP) is defined as:
Z^*(SP) = \min \; \mathrm{E}\left[\sum_{t=1}^{T}\left(h_t (I_t)^+ + \sum_{i=1}^{N} c_{i,t} X_{i,t}\right)\right]

subject to

I_t = I_{t-1} + \sum_{i=1}^{N} X_{i,t} - d_t, \quad t = 1, ..., T;  (1)

P\{I_t \ge 0\} \ge \alpha_t, \quad t = 1, ..., T;  (2)

X_{i,t} \ge 0, \quad i = 1, ..., N, \; t = 1, ..., T;  (3)

where (I_t)^+ = \max\{0, I_t\}, t = 1, ..., T.
The objective of the problem is to minimize the total expected cost, which is
the expected value of the sum of the inventory holding and production costs in the
planning horizon. The first constraint set defines the inventory balance equations
for each time period. The next constraint imposes the service level requirement for
each period. Finally, the last constraint states that the production quantities cannot
be negative.
This formulation can easily be extended to multiple products and production
sources with lead times. Moreover different service level definitions can also be
considered by following the same approach.
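Since SP involves an expectation and chance constraints, any candidate plan can be checked by simulation. The following sketch is illustrative and not from the paper (the function name `evaluate_plan`, the demand model, and the cost figures are assumptions): it estimates the expected cost in the objective and the per-period no-stock-out probabilities of constraint (2) for a fixed plan X.

```python
import random

def evaluate_plan(X, h, c, demand_sampler, I0=0, n_runs=2000, seed=1):
    """Monte Carlo estimate of E[cost] and P{I_t >= 0} for a fixed plan X[t][i]."""
    rng = random.Random(seed)
    T, N = len(X), len(X[0])
    total_cost, no_stockout = 0.0, [0] * T
    for _ in range(n_runs):
        I = I0
        for t in range(T):
            # inventory balance, constraint (1)
            I += sum(X[t]) - demand_sampler(rng)
            # holding cost on positive inventory plus production costs
            total_cost += h[t] * max(I, 0) + sum(c[t][i] * X[t][i] for i in range(N))
            no_stockout[t] += I >= 0
    return total_cost / n_runs, [k / n_runs for k in no_stockout]
```

With a degenerate (deterministic) demand sampler the estimate is exact, which makes the function easy to sanity-check before plugging in a stochastic demand model.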

4 An approximate solution procedure based on a rolling horizon procedure

The solution of the above problem at time 0 for the planning horizon [0, T ] is
referred to as the static solution. The static solution is obtained by using the available
information about the distribution of demand in the future periods and the initial
inventory. A policy that sets (or updates) the future production quantities Xi,t at
time t based on the information available at that time, e.g., demand realizations,
demand distributions in the future periods, and current inventory levels, is referred
to as the dynamic solution.
In theory, the optimal policy which determines production quantities based on
actual state information may be obtained by solving the stochastic dynamic pro-
gram associated with this problem. In practice, however, there are several problems
with the stochastic dynamic programming solution. First, the well-known curse
of dimensionality makes numerical solutions challenging even for relatively small
problems. Second, it is difficult to integrate constraints on the trajectory of the
underlying stochastic processes such as service level requirements in inventory
models. Therefore, we propose a rolling-horizon approach in which the static
problem is solved at each time period using the available information. This, however,
requires solving the static problem repeatedly, which in turn requires the transformation
explained below.

4.1 Deterministic equivalent formulation for the static solution

Although obtaining the optimal dynamic solution is, in general, not tractable, the
static solution can relatively easily be obtained by using deterministic mathematical
programming as suggested by Bitran and Yanasse (1984).

In particular, Bitran and Yanasse show that the (Modified Type 1) service level
constraint can be transformed into a deterministic equivalent constraint by specify-
ing certain minimum cumulative production quantities that depend on the service
level requirements.
To summarize this approach, let l_t denote the (deterministic equivalent) minimum
cumulative production quantity in period t, which is calculated by solving the
probabilistic equality

P\left\{\sum_{\tau=1}^{t} d_\tau \le l_t\right\} = \alpha_t, \quad t = 1, ..., T,

for l_t (t = 1, ..., T). This yields

l_t = F_t^{-1}(\alpha_t), \quad t = 1, ..., T,

where F_t(\cdot) is the cumulative distribution function of the random sum \sum_{\tau=1}^{t} d_\tau.
Then the probabilistic constraint P\{I_t \ge 0\} \ge \alpha_t, t = 1, ..., T, can be expressed
equivalently by:

\sum_{\tau=1}^{t} \sum_{i=1}^{N} X_{i,\tau} + I_0 \ge l_t, \quad t = 1, ..., T.  (4)
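As a concrete illustration (not from the paper; the function names are assumptions), when per-period demands are i.i.d. Poisson with rate lam, the cumulative demand over t periods is Poisson(lam * t), so l_t can be computed by inverting the Poisson cumulative distribution function:

```python
import math

def poisson_quantile(mean, alpha):
    """Smallest integer l with P(Poisson(mean) <= l) >= alpha, i.e. F^{-1}(alpha)."""
    l, pmf = 0, math.exp(-mean)
    cdf = pmf
    while cdf < alpha:
        l += 1
        pmf *= mean / l   # recursion p(l) = p(l-1) * mean / l
        cdf += pmf
    return l

def min_cumulative_production(lam, alpha, T):
    """l_t = F_t^{-1}(alpha) for i.i.d. Poisson(lam) per-period demand."""
    return [poisson_quantile(lam * t, alpha) for t in range(1, T + 1)]
```

For lam = 10 and alpha = 0.95 (the rates used in the numerical examples of Section 5), this gives l_1 = 15, consistent with the base stock level of 15 observed in most scenarios of Table 2.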
Now, the deterministic equivalent problem with service level constraints that has
been mentioned in the previous sections can be modeled as below (Bitran and
Yanasse, 1984):
Deterministic Equivalent Problem (DEP):

Z^*(DEP) = \min \sum_{t=1}^{T}\left[h_t\left(I_0 + \sum_{\tau=1}^{t}\sum_{i=1}^{N} X_{i,\tau}\right) + \sum_{i=1}^{N} c_{i,t} X_{i,t}\right]

subject to

\sum_{\tau=1}^{t} \sum_{i=1}^{N} X_{i,\tau} + I_0 \ge l_t, \quad t = 1, ..., T;  (5)

X_{i,t} \ge 0, \quad i = 1, ..., N, \; t = 1, ..., T.  (6)
The optimal decision variable values in DEP are the same as the ones in the
solution of SP at time 0.
The static solution is obtained by transforming the stochastic problem into a
deterministic one and then solving the resulting mathematical program. The rolling
horizon approach repeats this procedure by using the available information at each
time period until time T .
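For the single-source, zero-lead-time special case taken up in Section 5, the re-solved DEP has a closed-form first-period decision: with one uncapacitated source, producing max(0, l_1 - I) is feasible and cost-minimal. A minimal sketch of the rolling-horizon loop follows (illustrative, not the authors' implementation; the function name is an assumption):

```python
def rolling_horizon_plan(l1, I0, demands):
    """Apply the first-period DEP decision each period; returns the realized plan."""
    I, plan = I0, []
    for d in demands:
        X = max(0, l1 - I)   # lift the inventory position to the first quantile l_1
        plan.append(X)
        I += X - d           # inventory balance, Eq. (1)
    return plan
```

Starting from I0 = l1, the realized plan replaces each period's demand one period later, i.e., it behaves exactly like a base stock policy with level l_1 (the equivalence formalized in Proposition 1 of Section 5).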

5 Performance of the rolling horizon solution

It is known that the rolling-horizon approach yields good results for a number of
dynamic optimization problems. In some special cases, the rolling horizon method
may even yield the optimal solution. In this section, we evaluate the performance
of the proposed method by comparing it to certain benchmark policies in two
commonly encountered special cases in production planning.

5.1 A single source problem with stationary demand

We start with the special case of a single production source. When there is only one
source, the objective function includes only the holding cost (since the expected
total production must equal the total expected demand over the planning
horizon, the expected production cost is a constant). In this case, we use the base stock policy as the benchmark policy. The
base stock policy is widely known and utilized in many applications. In addition,
it is known to be optimal in a number of related inventory problems. It, therefore,
constitutes a natural benchmark for comparison. The base stock policy has a single
parameter, the base stock (order-up-to) level, with a base lot size of one unit. It aims to maintain
a pre-specified target inventory level. Under this policy, the sequence of events is
as follows: the system starts with a pre-specified base stock level in the finished
goods inventory. The arrival of the customer demand triggers the consumption of an
end-item from the inventory and issuing of a replenishment order to the production
facility. Using this policy, an order is placed (or the manufacturing facility operates)
if and only if the inventory level drops below the base stock level. The comparison
of these two models is performed for two cases with and without a lead time.

5.1.1 Single source without lead time


In this first scenario, there is a single product to be produced by a single production
facility. It is assumed that the demand of this specific product stays stationary over
the planning horizon. We show that solving the deterministic equivalent model
with modified service level constraints on a rolling horizon basis is equivalent to
operating the system under the base stock policy. The next proposition establishes
this equivalence:
Proposition 1. When the production facility has no lead time and the demand
is stationary, using a base stock policy is equivalent to solving the deterministic
equivalent model with service level constraints on a rolling horizon basis (either
Modified Type 1 or Modified Type 2) in the following way: assume that the base
stock level in the base stock policy equals I_0(BS) = S_1 and the initial inventory
level in the deterministic equivalent problem equals I_0(DEP) = l_1. If S_1 = l_1,
then the equivalent base stock policy gives the same total expected cost value, yields
the same production plan and results in the same service level with the deterministic
equivalent model with modified service level constraints solved on a rolling horizon
basis.
Since this case is a special case of the next one with lead time, the proof of
Proposition 1 is not given here but is reported in Yıldırım (2004).
Corollary 1. The optimal base stock level is equal to l_1. Equivalently, the base
stock level S_1 = l_1 ensures that the resulting production plan satisfies the required
service levels.
Proof. If the initial inventory level is set to be S_1 = l_1, the resulting production plan
is the same as that of the base stock policy, which starts with a base stock level
of S_1 = l_1. Although the base stock policy does not by itself guarantee the
service levels, since we know that the deterministic equivalent model satisfies the
required service levels and the two policies are equivalent, we can say that the base
stock level S_1 = l_1 ensures that the resulting production plan satisfies the required
service levels. Note that S_1 = l_1 must be optimal because decreasing the base stock
level below l_1 leads to an infeasible solution and increasing it above l_1 would lead
to higher average inventory costs and therefore cannot be optimal.

Even though a formal proof is lacking, it is highly likely that the base stock
policy (with a stationary base stock level) is optimal for the single-plant single-
product problem in an infinite horizon setting. Proposition 1 and Corollary 1 establish
that for this problem, the rolling horizon approach yields the same solutions as the
optimal base stock policy leading us to conclude that the rolling horizon procedure
performs optimally in this case.

5.1.2 Single source with lead time


The deterministic equivalent model with service level constraints (DEP) can be
extended to a case in which the production facility has a production lead time.
Assume that there is a production lead time of LT periods and the initial scheduled
receipts are denoted by SR_t, t = 1, ..., LT. Then, the problem can be modeled in
the following way:
Deterministic Equivalent Production Planning Problem including Lead Time
(DEPLT):
Z^*(DEPLT) = \min \left[\sum_{t=1}^{LT} h_t\left(I_0 + \sum_{\tau=1}^{t} SR_\tau\right) + \sum_{t=LT+1}^{T} h_t\left(I_0 + \sum_{\tau=1}^{LT} SR_\tau + \sum_{\tau=LT+1}^{t} X_{\tau-LT}\right)\right]

subject to

\sum_{\tau=LT+1}^{t} X_{\tau-LT} + \sum_{\tau=1}^{LT} SR_\tau + I_0 \ge l_t, \quad t = LT+1, ..., T;  (7)

X_t \ge 0, \quad t = 1, ..., T.  (8)
Our main result is as follows:
Proposition 2. When the production facility has a non-negative lead time LT, the
demand is stationary and there are no scheduled receipts initially, using a base stock
policy is equivalent to solving the deterministic equivalent model with service level
constraints on a rolling horizon basis in the following manner: assume that the base
stock level in the base stock policy including lead time equals I_0(BSLT) = S_2
and the initial inventory level in the deterministic equivalent model including lead
time equals I_0(DEPLT) = l_{LT+1}. If S_2 = l_{LT+1}, then the equivalent base stock
policy gives the same total expected cost value, yields the same production plan
and results in the same service level as the deterministic equivalent model with
service level constraints solved on a rolling horizon basis.
Proof. The proof of Proposition 2 is given in the Appendix. 
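To make the order-up-to mechanics with a lead time concrete, here is a small simulation sketch (illustrative and not from the paper; it assumes LT >= 1, backlogged demand, and no initial scheduled receipts). An order placed in period t arrives LT periods later, and each order restores the inventory position (on hand plus pipeline) to S, the level that Proposition 2 identifies with l_{LT+1}:

```python
from collections import deque

def base_stock_lead_time(S, LT, demands):
    """Order-up-to-S policy with lead time LT >= 1; returns (order, on_hand) per period."""
    on_hand, pipeline, trace = S, deque([0] * LT), []
    for d in demands:
        on_hand += pipeline.popleft()          # scheduled receipt arrives
        on_hand -= d                           # demand is consumed (negative = backlog)
        order = S - (on_hand + sum(pipeline))  # restore inventory position to S
        pipeline.append(order)
        trace.append((order, on_hand))
    return trace
```

Each period's order equals that period's demand, so the inventory position stays at S while on-hand inventory lags behind by the demand of the last LT periods.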


5.2 A dual source problem with stationary demand

Since the optimal solution of our dynamic problem is not known, a plausible bench-
mark is used to evaluate the performance of the proposed approach. We use
a threshold subcontracting model suggested in a number of studies in the literature
(Bradley, 2002; Tan, 2001; Tan and Gershwin, 2004). Although the threshold
policy is only shown to be optimal under specific assumptions including zero lead
time, stationary demand, no service level requirements, etc., we think that it is a
reasonable benchmark policy for our problem.

5.2.1 A threshold subcontracting policy


Now we explain the operation of the threshold policy for our benchmark case. We
consider a dual source system with an in-house production facility and a subcontrac-
tor. We assume that the in-house facility has a capacity of C but the subcontractor
has an infinite capacity. There is a lead time of one period. That is, production
quantities scheduled at time t become available at time t + 1.
The threshold policy is characterized by two threshold levels S and Z. The
in-house production facility operates when the inventory level is below S. That
is, it starts producing when the inventory level drops below the target level S and
stops producing when the inventory level again reaches S. The subcontractor is
used when the inventory level decreases to a threshold level of Z.
When the inventory level is below S, but is still above Z, the in-house facility
produces to cover the shortfall with respect to S. If there is not sufficient production
capacity to cover the whole shortfall, the in-house facility operates at full capacity
and the portion of demand that cannot be satisfied is backlogged for the next period.
Let X_{1,t} and X_{2,t} denote the production amounts of the in-house facility and
the subcontractor in period t, respectively. Then, the production amounts of each
production facility in each time period can be determined for the threshold subcontracting
model in the following way:

X_{1,t} = \min\{S - Z, S - I_{t-1}, C\}, \quad t = 1, ..., T;  (9)

X_{2,t} = \max\{0, Z - I_{t-1}\}, \quad t = 1, ..., T.  (10)
Figure 2 shows the evolution of X_{1,t}, X_{2,t}, and I_t under this policy for Poisson
demand arrivals with rate 10 and S = 15, Z = 7, and C = 8.
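A minimal simulation of Eqs. (9)-(10) can be sketched as follows (illustrative, not the authors' code; for simplicity the one-period lead time is ignored and production is credited in the same period, and X_{1,t} is clamped at zero as a safeguard in case inventory ever exceeds S):

```python
def simulate_threshold(S, Z, C, demands, I0):
    """Evolve (X1, X2, I) under the threshold policy of Eqs. (9)-(10)."""
    I, history = I0, []
    for d in demands:
        X1 = max(0, min(S - Z, S - I, C))   # in-house production, Eq. (9)
        X2 = max(0, Z - I)                  # subcontracted production, Eq. (10)
        I = I + X1 + X2 - d                 # inventory balance
        history.append((X1, X2, I))
    return history
```

With S = 15, Z = 7, C = 8 the in-house plant covers shortfalls up to its capacity and the subcontractor only steps in once inventory falls to Z or below, reproducing the qualitative behavior seen in Figure 2.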

5.2.2 Comparison of the performance of the threshold policy and the rolling horizon approach
The deterministic equivalent model for this case is solved for a rolling horizon of
10 periods repeatedly throughout a planning horizon of 1000 periods. 5000 sample
demand streams are generated and the realized inventory levels are integrated in
the model accordingly. The production plans and the realized cost values between
periods 451 and 550 are observed. All cost values are calculated on a per period
basis.
The optimal values of the thresholds S and Z are determined by using a
direct simulation-based numerical search. It is assumed that there are 1000 periods
in the planning horizon and the same 5000 sample demand streams are utilized.

Fig. 2. Sample realization of d_t, X_{1,t}, X_{2,t}, and I_t under the threshold policy S = 15, Z = 7, C = 8
The service level requirement is relaxed using the one-sided 95% confidence interval
of the simulation result: whenever the upper confidence limit of the observed
service level reaches the desired value, the case is accepted as satisfying the service
level requirement. The reasoning behind this modification is that the sample size
we utilize might not be large enough to make the realized service level exactly
equal to the required one. Among the base stock and threshold levels that satisfy
the relevant service level requirements, the model aims to find the pair with minimum
total cost. The calculations are performed for periods between 451 and 550.

Table 1. The possible scenarios for which comparisons are made

Subcontracting  Holding  In-house    (Subcontr. cost)/  (Holding cost)/  (In-house prod.
cost            cost     production  (in-house          (in-house        capacity)/
                         capacity    prod. cost)        prod. cost)      (mean demand)
4               16        8          1                  4                0.8
4               16       12          1                  4                1.2
4               16       20          1                  4                2
6                1        8          1.5                0.25             0.8
6                1       12          1.5                0.25             1.2
6                1       20          1.5                0.25             2
6                4        8          1.5                1                0.8
6                4       12          1.5                1                1.2
6                4       20          1.5                1                2
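The simulation-based search for (S, Z) described above can be sketched as follows (a hedged illustration, not the authors' code: the cost rates c1 = 4, c2 = 6, h = 1, capacity C = 8, Poisson demand with rate 10, and the 95% target mirror one scenario of Table 1, while the search ranges, run length, seed, and function names are arbitrary assumptions; the confidence-interval relaxation is omitted):

```python
import math, random

def poisson_sample(rng, lam):
    """Knuth's method for Poisson variates (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def evaluate(S, Z, C, c1, c2, h, lam, periods, seed):
    """Estimate per-period cost and Modified Type 1 service level by simulation."""
    rng = random.Random(seed)
    I, cost, ok = S, 0.0, 0
    for _ in range(periods):
        X1 = max(0, min(S - Z, S - I, C))   # Eq. (9)
        X2 = max(0, Z - I)                  # Eq. (10)
        I += X1 + X2 - poisson_sample(rng, lam)
        cost += c1 * X1 + c2 * X2 + h * max(I, 0)
        ok += I >= 0
    return cost / periods, ok / periods

def grid_search(C=8, c1=4, c2=6, h=1, lam=10, target=0.95):
    """Cheapest (S, Z) pair whose simulated service level meets the target."""
    best = None
    for S in range(12, 21):
        for Z in range(-2, S):
            cost, svc = evaluate(S, Z, C, c1, c2, h, lam, periods=1500, seed=7)
            if svc >= target and (best is None or cost < best[0]):
                best = (cost, S, Z)
    return best
```

Enumerating every pair is affordable here only because the policy has two parameters; as the paper notes, the number of parameters (and hence the search effort) grows quickly with the number of sources and products.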
For the numerical examples reported below, the order arrivals are governed by a
Poisson process with rate 10 products per period. The production cost is assumed to
be $4 per product for the in-house facility. The initial inventory level of the specific
product is set to be zero. The service level requirement is set to be 95%.
The comparison between the deterministic equivalent model and the threshold
subcontracting model is performed for nine combinations of the subcontracting cost
to in-house production cost, holding cost to in-house production cost, and capacity
to mean demand ratios. The combinations of subcontracting costs, holding costs,
and in-house production capacities, together with the resulting ratios, can be
observed in Table 1. For each of the problem settings, the base stock and threshold
levels observed in the threshold subcontracting model are reported in Table 2.
Note that in some of the cases the base stock and threshold pairs are observed
to be the same. The reason is that these pairs lead to the same average
inventory levels and minimum cost values in these settings.
In comparing the two models, we focus on the total expected cost, the average
production cost, the average inventory holding cost, and the assignment of production
to the plants (in percentages). Table 3 summarizes the
total expected cost values of the deterministic equivalent model (DEM) and the
threshold subcontracting model (TSM) for the nine different scenarios for each
modified service level type.
The tables below show that the deterministic equivalent model gives solutions very
close to those of the threshold subcontracting model for both types of the modified
service levels. The deterministic equivalent model results in total expected
cost values equal to or slightly larger than those of the threshold subcontracting
model.

Table 2. Base stock and threshold levels observed in each scenario

Subcontracting  Holding  In-house      Critical levels
cost            cost     production    Base stock   Threshold
                         capacity
4               16        8            15            7
4               16       12            15            3
4               16       20            15           −∞
6                1        8            17            7
6                1       12            16            0
6                1       20            15           −∞
6                4        8            15            7
6                4       12            15            3
6                4       20            15           −∞

Table 3. The comparison of total expected cost values observed in each scenario

Subcontracting  Holding  In-house      Total expected cost
cost            cost     production    DEM       TSM       Percentage
                         capacity                          difference
4               16        8            121.66    121.66    0.00
4               16       12            121.66    121.66    0.00
4               16       20            121.66    121.62    0.03
6                1        8             49.97     49.89    0.16
6                1       12             46.16     45.65    1.12
6                1       20             45.10     45.10    0.02
6                4        8             65.33     65.33    0.00
6                4       12             61.47     61.47    0.00
6                4       20             60.42     60.40    0.03

For our set of numerical experiments, the deterministic equivalent model
gives close results to the threshold subcontracting model when the service level
requirement is of Modified Type 1.
Tables 4 and 5 display the comparison of average production and holding cost
values. As can be seen, the deterministic equivalent model gives similar results to
the threshold subcontracting model.
Table 6 summarizes the percentage of production assigned to the in-house pro-
duction facility for both the deterministic equivalent model and the threshold sub-
contracting model. The results suggest that the production assignments of the
deterministic model follow a pattern similar to that of the chosen benchmark.
Based on these figures, we can conclude that the proposed deterministic equivalent
model solved on a rolling horizon basis performs as well as the threshold
subcontracting model tuned by a simulation-based optimization technique for the
Modified Type 1 service level. The total expected cost values of the deterministic
equivalent model in all nine cases are equal to or slightly larger than those of
the threshold subcontracting model. However, we cannot reach the same conclusion
for the average production and holding cost values: the deterministic equivalent
model performs worse in some cases and better in others when the comparison
is based on average production or holding costs alone. The sum of these two terms,
the total expected cost, is nevertheless equal to or slightly larger than that of the
threshold subcontracting model. Moreover, the proportion of production assigned
to the in-house facility in the deterministic equivalent model resembles that in the
simulation-based threshold subcontracting model.

Table 4. The comparison of average production cost values observed in each scenario

Subcontracting  Holding  In-house      Average production cost
cost            cost     production    DEM      TSM      Percentage
                         capacity                        difference
4               16        8            39.99    39.99     0.00
4               16       12            39.99    39.99     0.00
4               16       20            39.99    39.99     0.00
6                1        8            44.06    44.36    −0.68
6                1       12            41.05    40.24     2.03
6                1       20            40.00    39.97     0.06
6                4        8            44.91    44.91     0.00
6                4       12            41.05    41.05     0.00
6                4       20            40.00    39.99     0.01

Table 5. The comparison of average holding cost values observed in each scenario

Subcontracting  Holding  In-house      Average holding cost
cost            cost     production    DEM      TSM      Percentage
                         capacity                        difference
4               16        8            81.67    81.67     0.00
4               16       12            81.67    81.67     0.00
4               16       20            81.67    81.63     0.05
6                1        8             5.92     5.53     6.91
6                1       12             5.10     5.41    −5.67
6                1       20             5.10     5.10     0.05
6                4        8            20.42    20.42     0.00
6                4       12            20.42    20.42     0.00
6                4       20            20.42    20.41     0.05
It is worth mentioning that the sample size utilized in the above numerical
comparisons, 5000, might not be large enough to satisfy the service level require-
ments in each time period that the modified service level definitions necessitate.

Table 6. The percentage of production assignments to the in-house production facility observed in each scenario

Subcontracting  Holding  In-house      % In-house production
cost            cost     production    Base stock   Threshold
                         capacity
4               16        8            75.45        75.40
4               16       12            94.73        94.70
4               16       20            99.97       100.00
6                1        8            79.76        78.17
6                1       12            94.73        98.78
6                1       20            99.97       100.00
6                4        8            75.45        75.40
6                4       12            94.73        94.70
6                4       20            99.97       100.00

The coefficient of variation in the realized service level values might be larger than
expected. To handle this issue, we introduced one-sided confidence intervals.
Although the threshold subcontracting model constitutes a lower bound
in terms of total expected cost values for our set of numerical examples, it cannot
be generalized from these examples that the deterministic equivalent model always
gives solutions worse than those of the threshold subcontracting model. Nevertheless,
the proposed approach gives very promising results in this
particular case as well.

6 Conclusions

In many practical situations, mathematical models of production planning/outsourcing
problems have to deal with the randomness in demand. We present
a systematic approach that enables the randomness in demand and the desired ser-
vice levels to be incorporated in a mathematical programming framework.
We show that solving the deterministic equivalent problem on a rolling-horizon
basis gives results similar to those of the benchmark policies. Although the
threshold-type policies are conceptually quite intuitive, it is very challenging to
determine the optimal threshold levels by using simulation. The proposed algorithm
is easier to implement and optimize by using available solvers.
This study can be extended in a number of ways. The same approach can be
used to derive results for different service level definitions. Yıldırım (2004) reports
preliminary results for Type 2 and Modified Type 2 service levels. The formulation
of the multi-product case is also straightforward.
The effects of demand variability, production cost, and the lead time on the production and sourcing plans need further investigation. Since the optimal solution to the general problem is not known for the dynamic case, investigation of the static case or a stylized model can yield insights regarding the interaction of demand variability, cost, and the lead time.

360 I. Yıldırım et al.

Appendix

Proof of Proposition 2

We use induction to show that:

i. If the inventory levels at the beginning of the first period are equal, I_0(BSLT) = I_0(DEPLT) = l_{LT+1}, then the production quantities in the first period and the inventory levels at the end of the first period for both policies become equal, i.e. X_1(BSLT) = X_1(DEPLT) = 0 and I_1(BSLT) = I_1(DEPLT) = l_{LT+1} − d_1;
ii. If the inventory levels at the end of a period t_1 such that t_1 ≤ LT are equal, I_{t_1}(BSLT) = I_{t_1}(DEPLT) = l_{LT+1} − Σ_{τ=1}^{t_1} d_τ, then the production quantities in period (t_1+1) and the inventory levels at the end of period (t_1+1) for both policies become equal, i.e. X_{t_1+1}(BSLT) = X_{t_1+1}(DEPLT) = d_{t_1} and I_{t_1+1}(BSLT) = I_{t_1+1}(DEPLT) = l_{LT+1} − Σ_{τ=1}^{t_1+1} d_τ;
iii. If the inventory levels at the end of period (LT+1) are equal, I_{LT+1}(BSLT) = I_{LT+1}(DEPLT) = l_{LT+1} − Σ_{τ=1}^{LT+1} d_τ, then the production quantities in period (LT+2) and the inventory levels at the end of period (LT+2) for both policies become equal, i.e. X_{LT+2}(BSLT) = X_{LT+2}(DEPLT) = d_{LT+1} and I_{LT+2}(BSLT) = I_{LT+2}(DEPLT) = l_{LT+1} − Σ_{τ=2}^{LT+2} d_τ; and
iv. If the inventory levels at the end of a period t_2 such that t_2 ≥ LT+1 are equal, I_{t_2}(BSLT) = I_{t_2}(DEPLT) = l_{LT+1} − Σ_{τ=t_2−LT}^{t_2} d_τ, then the production quantities in period (t_2+1) and the inventory levels at the end of period (t_2+1) for both policies become equal, i.e. X_{t_2+1}(BSLT) = X_{t_2+1}(DEPLT) = d_{t_2} and I_{t_2+1}(BSLT) = I_{t_2+1}(DEPLT) = l_{LT+1} − Σ_{τ=t_2+1−LT}^{t_2+1} d_τ.
Assume that the initial inventory levels are equal such that I_0(BSLT) = S_2, I_0(DEPLT) = l_{LT+1}, and S_2 = l_{LT+1}. In the base stock policy, each demand observed is produced in the next period; therefore, there is no production in the first period, X_1(BSLT) = 0. In the deterministic equivalent approach, the production quantity in the first period is determined according to the constraint X_1(DEPLT) + Σ_{τ=1}^{LT} SR_τ(DEPLT) + I_0(DEPLT) = X_1(DEPLT) + 0 + l_{LT+1} ≥ l_{LT+1}, and therefore X_1(DEPLT) ≥ 0. Since the problem is of minimization type, the production quantity in the first period equals zero, i.e. X_1(DEPLT) = 0. Next, a customer demand of d_1 arrives. The end-of-period inventory for the base stock policy becomes I_1(BSLT) = I_0(BSLT) + SR_1(BSLT) − d_1 = S_2 + 0 − d_1 = S_2 − d_1, and the end-of-period inventory for the deterministic equivalent approach becomes I_1(DEPLT) = I_0(DEPLT) + SR_1(DEPLT) − d_1 = l_{LT+1} + 0 − d_1 = l_{LT+1} − d_1. Since we know that S_2 = l_{LT+1}, I_1(BSLT) = I_1(DEPLT).
In the second period, the base stock policy produces the demand of the first period, i.e. X_2(BSLT) = d_1. At the beginning of the second period, the deterministic equivalent model is rerun, since it is solved on a rolling horizon basis.
The demand is assumed to be stationary over the planning horizon. Although solving the model on a rolling horizon basis throughout the planning horizon requires integrating the minimum cumulative production quantities for the number of periods in the rolling horizon into the model, only the minimum cumulative production quantity of period (LT+1), l_{LT+1}, is fully utilized. The production quantity of the deterministic equivalent model in the second period is determined by X_2(DEPLT) + Σ_{τ=2}^{LT+1} SR_τ(DEPLT) + I_1(DEPLT) = X_2(DEPLT) + X_1(DEPLT) + I_1(DEPLT) = X_2(DEPLT) + 0 + l_{LT+1} − d_1 ≥ l_{LT+1}; therefore, X_2(DEPLT) ≥ d_1. In order to minimize the production costs, the production quantity in the second period equals the demand of the first period, i.e. X_2(DEPLT) = d_1. After the arrival of a customer demand of d_2, the end-of-period inventory for the base stock policy becomes I_2(BSLT) = I_1(BSLT) + SR_2(BSLT) − d_2 = S_2 − d_1 − d_2, and the end-of-period inventory for the deterministic equivalent approach becomes I_2(DEPLT) = I_1(DEPLT) + SR_2(DEPLT) − d_2 = l_{LT+1} − d_1 − d_2. Since S_2 = l_{LT+1}, we can say that I_2(BSLT) = I_2(DEPLT).
Since demand during the lead time cannot be satisfied sooner than (LT+1) periods of time, the inventory levels at the end of any period t_1 such that t_1 ≤ (LT−1) can be written as I_{t_1}(BSLT) = S_2 − Σ_{τ=1}^{t_1} d_τ and I_{t_1}(DEPLT) = l_{LT+1} − Σ_{τ=1}^{t_1} d_τ, with S_2 = l_{LT+1}. In period (t_1+1), the base stock policy produces X_{t_1+1}(BSLT) = d_{t_1}. In the deterministic equivalent approach, the production quantity is determined by the constraint X_{t_1+1}(DEPLT) + Σ_{τ=t_1+1}^{t_1+LT} SR_τ(DEPLT) + I_{t_1}(DEPLT) = X_{t_1+1}(DEPLT) + Σ_{τ=1}^{t_1} X_τ(DEPLT) + I_{t_1}(DEPLT) = X_{t_1+1}(DEPLT) + Σ_{τ=1}^{t_1−1} d_τ + l_{LT+1} − Σ_{τ=1}^{t_1} d_τ ≥ l_{LT+1}; therefore, X_{t_1+1}(DEPLT) ≥ d_{t_1}. Since the problem is of minimization type, X_{t_1+1}(DEPLT) = d_{t_1}. Then, a customer demand of d_{t_1+1} is observed. The end-of-period inventory for the base stock policy becomes I_{t_1+1}(BSLT) = I_{t_1}(BSLT) + SR_{t_1+1}(BSLT) − d_{t_1+1} = S_2 − Σ_{τ=1}^{t_1} d_τ − d_{t_1+1} = S_2 − Σ_{τ=1}^{t_1+1} d_τ, and the end-of-period inventory for the deterministic equivalent approach becomes I_{t_1+1}(DEPLT) = I_{t_1}(DEPLT) + SR_{t_1+1}(DEPLT) − d_{t_1+1} = l_{LT+1} − Σ_{τ=1}^{t_1} d_τ − d_{t_1+1} = l_{LT+1} − Σ_{τ=1}^{t_1+1} d_τ. Since S_2 = l_{LT+1}, I_{t_1+1}(BSLT) = I_{t_1+1}(DEPLT).
Similarly, d_{LT} is produced by the base stock policy in period (LT+1), i.e. X_{LT+1}(BSLT) = d_{LT}. The constraint X_{LT+1}(DEPLT) + Σ_{τ=LT+1}^{2LT} SR_τ(DEPLT) + I_{LT}(DEPLT) = X_{LT+1}(DEPLT) + Σ_{τ=1}^{LT} X_τ(DEPLT) + I_{LT}(DEPLT) = X_{LT+1}(DEPLT) + Σ_{τ=1}^{LT−1} d_τ + l_{LT+1} − Σ_{τ=1}^{LT} d_τ ≥ l_{LT+1}, i.e. X_{LT+1}(DEPLT) ≥ d_{LT}, determines the production quantity of the deterministic equivalent model in period (LT+1). Then, X_{LT+1}(DEPLT) = d_{LT}. Next, a customer demand of d_{LT+1} arrives. The end-of-period inventory for the base stock policy becomes I_{LT+1}(BSLT) = I_{LT}(BSLT) + SR_{LT+1}(BSLT) − d_{LT+1} = S_2 − Σ_{τ=1}^{LT} d_τ + X_1(BSLT) − d_{LT+1} = S_2 − Σ_{τ=1}^{LT} d_τ + 0 − d_{LT+1} = S_2 − Σ_{τ=1}^{LT+1} d_τ, and the end-of-period inventory for the deterministic equivalent approach becomes I_{LT+1}(DEPLT) = I_{LT}(DEPLT) + SR_{LT+1}(DEPLT) − d_{LT+1} = l_{LT+1} − Σ_{τ=1}^{LT} d_τ + X_1(DEPLT) − d_{LT+1} = l_{LT+1} − Σ_{τ=1}^{LT} d_τ + 0 − d_{LT+1} = l_{LT+1} − Σ_{τ=1}^{LT+1} d_τ. Since S_2 = l_{LT+1}, I_{LT+1}(BSLT) = I_{LT+1}(DEPLT).
In period (LT+2), the base stock policy produces X_{LT+2}(BSLT) = d_{LT+1}. For the deterministic equivalent approach, we know that X_{LT+2}(DEPLT) + Σ_{τ=LT+2}^{2LT+1} SR_τ(DEPLT) + I_{LT+1}(DEPLT) = X_{LT+2}(DEPLT) + Σ_{τ=2}^{LT+1} X_τ(DEPLT) + I_{LT+1}(DEPLT) = X_{LT+2}(DEPLT) + Σ_{τ=1}^{LT} d_τ + l_{LT+1} − Σ_{τ=1}^{LT+1} d_τ ≥ l_{LT+1}, i.e. X_{LT+2}(DEPLT) ≥ d_{LT+1}, and then X_{LT+2}(DEPLT) = d_{LT+1}. After the arrival of d_{LT+2}, the following end-of-period inventory levels are observed: I_{LT+2}(BSLT) = I_{LT+1}(BSLT) + SR_{LT+2}(BSLT) − d_{LT+2} = S_2 − Σ_{τ=1}^{LT+1} d_τ + X_2(BSLT) − d_{LT+2} = S_2 − Σ_{τ=1}^{LT+1} d_τ + d_1 − d_{LT+2} = S_2 − Σ_{τ=2}^{LT+2} d_τ, and I_{LT+2}(DEPLT) = I_{LT+1}(DEPLT) + SR_{LT+2}(DEPLT) − d_{LT+2} = l_{LT+1} − Σ_{τ=1}^{LT+1} d_τ + X_2(DEPLT) − d_{LT+2} = l_{LT+1} − Σ_{τ=1}^{LT+1} d_τ + d_1 − d_{LT+2} = l_{LT+1} − Σ_{τ=2}^{LT+2} d_τ. Since we know that S_2 = l_{LT+1}, I_{LT+2}(BSLT) = I_{LT+2}(DEPLT).
Now assume that at the end of any period t_2 such that t_2 ≥ (LT+1), I_{t_2}(BSLT) = S_2 − Σ_{τ=t_2−LT}^{t_2} d_τ and I_{t_2}(DEPLT) = l_{LT+1} − Σ_{τ=t_2−LT}^{t_2} d_τ, with S_2 = l_{LT+1}. In period (t_2+1), X_{t_2+1}(BSLT) = d_{t_2}, and X_{t_2+1}(DEPLT) is determined by the constraint X_{t_2+1}(DEPLT) + Σ_{τ=t_2+1}^{t_2+LT} SR_τ(DEPLT) + I_{t_2}(DEPLT) = X_{t_2+1}(DEPLT) + Σ_{τ=t_2+1−LT}^{t_2} X_τ(DEPLT) + I_{t_2}(DEPLT) = X_{t_2+1}(DEPLT) + Σ_{τ=t_2−LT}^{t_2−1} d_τ + l_{LT+1} − Σ_{τ=t_2−LT}^{t_2} d_τ ≥ l_{LT+1}; hence X_{t_2+1}(DEPLT) ≥ d_{t_2}, and since the model is of minimization type, X_{t_2+1}(DEPLT) = d_{t_2}. Next, a customer demand of d_{t_2+1} arrives. The end-of-period inventory levels for both policies become I_{t_2+1}(BSLT) = I_{t_2}(BSLT) + SR_{t_2+1}(BSLT) − d_{t_2+1} = S_2 − Σ_{τ=t_2−LT}^{t_2} d_τ + X_{t_2+1−LT}(BSLT) − d_{t_2+1} = S_2 − Σ_{τ=t_2−LT}^{t_2} d_τ + d_{t_2−LT} − d_{t_2+1} = S_2 − Σ_{τ=t_2+1−LT}^{t_2+1} d_τ and I_{t_2+1}(DEPLT) = I_{t_2}(DEPLT) + SR_{t_2+1}(DEPLT) − d_{t_2+1} = l_{LT+1} − Σ_{τ=t_2−LT}^{t_2} d_τ + X_{t_2+1−LT}(DEPLT) − d_{t_2+1} = l_{LT+1} − Σ_{τ=t_2−LT}^{t_2} d_τ + d_{t_2−LT} − d_{t_2+1} = l_{LT+1} − Σ_{τ=t_2+1−LT}^{t_2+1} d_τ. Since we know that S_2 = l_{LT+1}, I_{t_2+1}(BSLT) = I_{t_2+1}(DEPLT). This proves our proposition.
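The induction above can also be checked numerically. The sketch below simulates the base stock policy (produce last period's demand; an order released in period t arrives LT periods later) against the rolling-horizon deterministic equivalent rule (produce the smallest quantity keeping on-hand plus pipeline stock at l_{LT+1}) and confirms that, when S_2 = l_{LT+1}, the production and inventory trajectories coincide. Parameter values are arbitrary:

```python
import random

def base_stock(demands, S, LT):
    # BSLT: X_t = d_{t-1}; an order released in period t arrives as SR_{t+LT}.
    I, pipeline, prev_d, hist = S, [0] * LT, 0, []
    for d in demands:
        x = prev_d
        pipeline.append(x)
        I += pipeline.pop(0) - d   # receipt SR_t arrives, then demand d_t
        hist.append((x, I))
        prev_d = d
    return hist

def det_equiv(demands, l_min, LT):
    # DEPLT: smallest X_t with X_t + outstanding receipts + I_{t-1} >= l_min.
    I, pipeline, hist = l_min, [0] * LT, []
    for d in demands:
        x = max(0, l_min - (I + sum(pipeline)))
        pipeline.append(x)
        I += pipeline.pop(0) - d
        hist.append((x, I))
    return hist

random.seed(1)
demands = [random.randint(0, 5) for _ in range(50)]
# With equal starting stocks (S_2 = l_{LT+1}), the trajectories are identical
assert base_stock(demands, S=30, LT=3) == det_equiv(demands, l_min=30, LT=3)
```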

References
Abernathy FH, Dunlop JT, Hammond JH, Weil D (1999) A stitch in time. Oxford University
Press, New York
Abernathy FH, Dunlop JT, Hammond JH, Weil D (2000) Control your inventory in a world
of lean retailing. Harvard Business Review Nov-Dec: 169–176
Albritton M, Shapiro A, Spearman M (2000) Finite capacity production planning with ran-
dom demand and limited information. Stochastic Programming E-Print Series
Beyer RD, Ward J (2000) Network server supply chain at HP: a case study. HP Labs Tech
Report 2000-84
Bitran GR, Yanasse HH (1984) Deterministic approximations to stochastic production prob-
lems. Operations Research 32: 999–1018
Bitran GR, Haas EA, Matsuo H (1986) Production planning of style goods with high setup costs and forecast revisions. Operations Research 34(2): 226–236
Bradley JR (2002) Optimal control of a dual service rate M/M/1 production-inventory model. European Journal of Operational Research (forthcoming)
Candea D, Hax AC (1984) Production and inventory management. Prentice-Hall, New Jersey
Clay RL, Grossmann IE (1997) A disaggregation algorithm for the optimization of stochastic planning models. Computers and Chemical Engineering 21(7): 751–774

Feiring BR, Sastri T (1989) A demand-driven method for scheduling optimal smooth pro-
duction levels. Annals of Operations Research 17: 199–216
Kelle P, Clendenen G, Dardeau P (1994) Economic lot scheduling heuristic for random
demands. International Journal of Production Economics 35: 337–342
Lee LH (1997) Ordinal optimization and its application in apparel manufacturing systems.
Ph.D. Thesis, Harvard University, Cambridge, MA
Qiu MM, Burch EE (1997) Hierarchical production planning and scheduling in a multi-
product, multi-machine environment. International Journal of Production Research
35(11): 3023–3042
Sox CR, Muckstadt JA (1996) Multi-item, multi-period production planning with uncertain
demand. IIE Transactions 28: 891–900
Tan B, Gershwin SB (2004) Production and subcontracting strategies for manufacturers with
limited capacity and volatile demand. Annals of Operations Research (Special volume
on Stochastic Models of Production/Inventory Systems) 125: 205–232
Tan B (2002) Managing manufacturing risks by using capacity options. Journal of the Op-
erational Research Society 53(2): 232–242
Van Delft C, Vial J-PH (2003) A practical implementation of stochastic programming: an
application to the evaluation of option contracts in supply chains. Automatica (to appear)
Yang MS, Lee LH, Ho YC (1997) On stochastic optimization and its applications to manu-
facturing. Lectures in Applied Mathematics 33: 317–331
Yıldırım I (2004) Stochastic production planning and sourcing problems with service level
constraints. M.S. Thesis, Koç University, Industrial Engineering, Istanbul, Turkey
Zäpfel G (1996) Production planning in the case of uncertain individual demand: extension for an MRP II concept. International Journal of Production Economics 46–47: 153–164
