
SREE VIDYANIKETHAN ENGINEERING COLLEGE

(AUTONOMOUS)
Sree Sainath Nagar, A. Rangampet-517102

Department Of Computer Science and Systems Engineering

COURSE MATERIAL

IV B. Tech. - I Semester
20BT71501: SYSTEM SIMULATION AND MODELING

Prepared by

Dr. C. Siva Kumar


Associate Professor, CSSE
IV B. Tech. – I Semester
(20BT71501) SYSTEM SIMULATION AND MODELING
Int. Marks: 40   Ext. Marks: 60   Total Marks: 100   L: 3   T: -   P: -   C: 3
PRE-REQUISITES: Courses on Programming for Problem Solving, Numerical Methods,
Probability and Statistics.

COURSE DESCRIPTION: Discrete event simulation; useful statistical models; queueing systems; probabilistic models; properties of random numbers; tests for random numbers; input modeling; types of simulations with respect to output analysis.

COURSE OUTCOMES: After successful completion of the course, students will be able to:
CO1. Understand the concepts of discrete event simulation by using a single-server queuing system and simulation software.
CO2. Develop a probabilistic model to solve real-life problems and validate it.
CO3. Apply statistical models to represent the data for simulation.
CO4. Apply techniques to generate random variates for modeling a system.
CO5. Apply goodness-of-fit tests for the identified input data distribution.
CO6. Analyze the techniques for output data analysis for a single system.

DETAILED SYLLABUS:
UNIT I: BASIC SIMULATION MODELING (10 Periods)
Introduction: The nature of simulation, Systems, Models, and simulation, discrete event
simulation, Simulation of a single-server queuing system: problem statement, Intuitive
Explanation, Program Organization and Logic, simulation output and discussion, Alternative
Stopping Rules, steps in simulation study, advantages, disadvantages, and pitfalls of
simulation
Simulation software: introduction, comparison of simulation packages with programming
languages, classification of simulation software, desirable software features.

UNIT II: MODELING A PROBABILISTIC SYSTEM (11 Periods)


Introduction: Random variables and their properties, simulation output data and stochastic
processes, estimation of means, Variances, and correlations, confidence intervals and
hypothesis tests for the mean
Validation of simulation Model: Definitions, guidelines for determining the level of model
detail, verification of simulation computer programs, techniques for increasing model Validity
and credibility

UNIT III: SELECTION OF INPUT PROBABILITY DISTRIBUTIONS (10 Periods)


Introduction, Probability distributions, Continuous distributions, discrete distributions
hypothesizing families of distributions

UNIT IV: GENERATING RANDOM VARIATES (07 Periods)


Properties of random numbers, Generation of pseudo-random numbers, Techniques for
generating random numbers, Tests for random numbers, Inverse-transform technique,
Acceptance rejection technique, Special properties.

UNIT V: OUTPUT DATA ANALYSIS FOR A SINGLE SYSTEM (07 Periods)


Introduction, transient and steady-state behavior of a stochastic process, Types of simulations
with respect to output analysis, statistical analysis for terminating simulations
Total Periods: 45
Topics for self-study are provided in lesson plan.
TEXT BOOK:
1. Averill M. Law, Simulation Modeling and Analysis, McGraw Hill Education (India) Private
Limited, 5th edition, 2015.

REFERENCE BOOKS:
1. Jerry Banks, John S. Carson II, Barry L. Nelson and David M. Nicol, Discrete-Event System Simulation, Pearson India, 5th edition, 2013.
2. Narsingh Deo, System Simulation with Digital Computer, Prentice Hall India, 2009.

CO-PO and PSO Mapping Table:


(PO = Program Outcome, PSO = Program Specific Outcome)
Course Outcomes   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 3 - - - 3 - - - - - - - 3 - -
CO2 1 3 3 3 1 1 - - - - - - 3 - -
CO3 2 3 2 - 2 - - - - - - - 2 - -
CO4 2 3 - - 2 - - - - - - - 2 - -
CO5 2 3 - 2 2 - - - - - - - 3 - -
CO6 1 3 1 3 1 - - - - - - - 3 - -
Average 1.8 3 2 2.6 1.8 1 - - - - - - 2.6 - -
Level of correlation of the course: 2 3 2 3 2 1 - - - - - - 3 - -
Level of Correlation: 3 – High, 2 – Medium, 1 – Low
SREE VIDYANIKETHAN ENGINEERING COLLEGE
(AUTONOMOUS)
Sree Sainath Nagar, A. Rangampet-517102
Department of Computer Science and Systems Engineering

Lesson Plan cum Diary 2021-22

Name of the Subject : System Simulation and Modeling (19BT71501)


Name of the Faculty Member : Dr. C. Siva Kumar
Class & Semester : IV B.Tech I semester Section: CSSE-A&B

S. No. | Topic | Book(s) followed | Course Outcome | Blooms Level | Self-Learning Concepts
Unit-I: BASIC SIMULATION MODELING
1. The nature of simulation, Systems, Models, and simulation | T1 | CO1 | BL1
2. Discrete event simulation, Simulation of a single-server queuing system: problem statement | T1 | CO1 | BL4
3. Intuitive Explanation, Program Organization | T1 | CO1 | BL2
4. Logic, simulation output and discussion | T1 | CO1 | BL4
5. Alternative Stopping Rules, steps in simulation study | T1 | CO1 | BL2
6. Advantages, disadvantages, and pitfalls of simulation | T1 | CO1 | BL2
7. Simulation software: introduction | T1 | CO1 | BL2
8. Comparison of simulation packages with programming languages | T1 | CO1 | BL4
9. Classification of simulation software | T1 | CO1 | BL4
10. Desirable software features | T1 | CO1 | BL2
Self-Learning Concepts: List processing in simulation
Total no of periods required:
Unit-II: MODELING A PROBABILISTIC SYSTEM
11. Introduction: Random variables and their properties | T1, R1 | CO2 | BL2
12. Simulation output data | T1, R1 | CO2 | BL2
13. Stochastic processes | T1, R1 | CO2 | BL2
14. Estimation of means, variances | T1, R1 | CO2 | BL4
15. Correlations, confidence intervals | T1, R1 | CO2 | BL4
16. Hypothesis tests for the mean | T1, R1 | CO2 | BL3
17. Validation of simulation model: definitions | T1, R1 | CO2 | BL4
18. Guidelines for determining the level of model detail | T1, R1 | CO2 | BL3
19. Verification of simulation computer programs | T1, R1 | CO2 | BL3
20. Techniques for increasing model validity and credibility | T1, R1 | CO2 | BL3
Self-Learning Concepts: Validating the components of a model
Total no of periods required:
Unit-III: SELECTION OF INPUT PROBABILITY DISTRIBUTIONS
21. Introduction, Probability distributions | T1 | CO3 | BL2
22. Types of probability distributions | T1 | CO3 | BL2
23. Examples | T1 | CO3 | BL3
24. Continuous distributions | T1 | CO3 | BL2
25. Types of continuous distributions | T1 | CO3 | BL2
26. Examples | T1 | CO3 | BL3
27. Discrete distributions | T1 | CO3 | BL2
28. Types of discrete distributions | T1 | CO3 | BL2
29. Hypothesizing families of distributions | T1 | CO3 | BL3
30. Examples | T1 | CO3 | BL3
Self-Learning Concepts: Shifted and truncated distributions
Total no of periods required:
Unit-IV: GENERATING RANDOM VARIATES
31. Properties of random numbers | T1, R2 | CO4 | BL2
32. Generation of pseudo-random numbers | T1, R2 | CO4 | BL3
33. Techniques for generating random numbers | T1, R2 | CO4 | BL3
34. Tests for random numbers | T1, R2 | CO5 | BL4
35. Inverse-transform technique | T1, R2 | CO5 | BL3
36. Acceptance-rejection technique | T1, R2 | CO5 | BL4
37. Special properties | T1, R2 | CO4 | BL2
Self-Learning Concepts: Generating discrete random variates
Total no of periods required:
Unit-V: OUTPUT DATA ANALYSIS FOR A SINGLE SYSTEM
38. Introduction to output data analysis | T1, R1 | CO6 | BL2
39. Single-server queueing system | T1, R1 | CO6 | BL2
40. Transient behavior of a stochastic process | T1, R1 | CO6 | BL3
41. Steady-state behavior of a stochastic process | T1, R1 | CO6 | BL4
42. Types of simulations with respect to output analysis | T1, R1 | CO6 | BL2
43. Examples | T1, R1 | CO6 | BL3
44. Statistical analysis for terminating simulations | T1, R1 | CO6 | BL4
Self-Learning Concepts: Confidence intervals for comparing two systems
Total no of periods required:
Grand total of periods required: 45

TEXT BOOK:
1. Averill M. Law, Simulation Modeling and Analysis, McGraw Hill Education (India) Private
Limited, 5th edition, 2015.
REFERENCE BOOKS:
1. Jerry Banks, John S. Carson II, Barry L. Nelson and David M. Nicol, Discrete-Event System Simulation, Pearson India, 5th edition, 2013.
2. Narsingh Deo, System Simulation with Digital Computer, Prentice Hall India, 2009.
UNIT-I: BASIC SIMULATION MODELING

1.1 Definition of Simulation:


A simulation is the imitation of the operation of a real-world process or system over time. It generates an artificial history of the system, which is then observed to draw inferences about the real system. Generally, a model is a conceptual framework that describes a system.
The behavior of a system as it evolves over time is studied by developing a simulation model. The model takes a set of expressed assumptions, which are mathematical, logical, or symbolic relationships between the entities of the system.
1.1.1 Goals of Modeling and Simulation:
A model can be used to investigate a wide variety of "what if" questions about a real-world system. The goals of modeling and simulation are:
 Potential changes to the system can be simulated and their impact on the system predicted.
 Adequate parameters can be found before implementation.
 Simulation can be used as an analysis tool for predicting the effect of changes, and as a design tool for predicting the performance of a new system.
 It is better to simulate before implementation.
How can a model be developed?
A simulation model can be developed using either of the following approaches.
 Mathematical methods
i. Probability theory, algebraic methods
ii. Their results are accurate
iii. They involve a small number of parameters
iv. They become impractical for complex systems
 Numerical, computer-based simulation
 It is simple
 It is useful for complex systems
When simulation is an appropriate tool
 Simulation enables the study of the internal interactions of a subsystem within a complex system.
 Informational, organizational, and environmental changes can be simulated and their effects observed.
 A simulation model helps us gain knowledge about how a system can be improved.
 Important input parameters can be identified by changing the simulation inputs.
 Simulation can be used to experiment with new designs and policies before implementation.
 Simulating different capabilities for a machine can help determine its requirements.
 Simulation models designed for training make learning possible without the cost and disruption of on-the-job instruction.
 A plan can be visualized with an animated simulation.
 Modern systems (a factory, wafer fabrication plant, or service organization) are so complex that their internal interactions can be treated only through simulation.
When simulation is not appropriate
 When the problem can be solved by common sense.
 When the problem can be solved analytically.
 If it is easier to perform direct experiments.
 If the costs exceed the savings.
 If resources or time are not available.
 If system behavior is too complex to model, such as human behavior.
Advantages of simulation:
 New policies, operating procedures, decision rules, information flows, etc. can be explored without disrupting the ongoing operations of the real system.
 New hardware designs, physical layouts, and transportation systems can be tested without committing resources for their acquisition.
 Hypotheses about how or why certain phenomena occur can be tested for feasibility.
 Time can be compressed or expanded, allowing for a speedup or slowdown of the phenomena under investigation.
 Insight can be obtained about the interaction of variables.
 Insight can be obtained about the importance of variables to the performance of the system.
 Bottleneck analysis can be performed, indicating where work-in-process, information, materials, and so on are being excessively delayed.
 A simulation study can help in understanding how the system operates rather than how individuals think the system operates.
 "What-if" questions can be answered, which is useful in the design of new systems.
Disadvantages of simulation:
Model Building Requires Special Training
It is an art that is learned over time and through experience. Furthermore, if two
models of the same system are constructed by two competent individuals, they may
have similarities, but it is highly unlikely that they will be the same.
Simulation Results May Be Difficult to Interpret
Since most simulation outputs are essentially random variables (they are usually based on random inputs), it may be hard to determine whether an observation is a result of system interrelationships or of randomness.

Simulation Modeling and Analysis Can Be Time Consuming and Expensive.


Skimping on resources for modeling and analysis may result in a simulation model
and/or analysis that is not sufficient for the task.
Simulation May Be Used Inappropriately
Simulation is used in some cases when an analytical solution is possible, or even
preferable. This is particularly true in the simulation of some waiting lines where
closed-form queueing models are available, at least for long-run evaluation.
1.1.2 Applications of simulation:
Simulation can be used in different application areas shown in Figure 1.
[Figure 1 groups the application areas of simulation: Manufacturing, Semiconductor Manufacturing, Construction Engineering and Project Management, Logistics/Supply Chain and Distribution, Transportation Modes and Traffic, Military Applications, Business Process Simulation, and Health Care.]
Fig.1 Applications of Simulation


Manufacturing Applications
 Analysis of electronics assembly operations
 Design and evaluation of a selective assembly station for high-precision scroll
compressor shells
 Comparison of dispatching rules for semiconductor manufacturing using large-
facility models
 Evaluation of cluster tool throughput for thin-film head production
 Determining optimal lot size for a semiconductor back-end factory
 Optimization of cycle time and utilization in semiconductor test manufacturing
 Analysis of storage and retrieval strategies in a warehouse
 Investigation of dynamics in a service-oriented supply chain
 Model for an Army chemical munitions disposal facility
Semiconductor Manufacturing:
 Comparison of dispatching rules using large-facility models
 The corrupting influence of variability
 A new lot-release rule for wafer fabs
 Assessment of potential gains in productivity due to proactive reticle management
 Comparison of a 200-mm and 300-mm X-ray lithography cell
 Capacity planning with time constraints between operations
 300-mm logistic system risk reduction facility
Construction Engineering:
 Construction of a dam embankment
 Trenchless renewal of underground urban infrastructures
 Activity scheduling in a dynamic, multi project setting
 Investigation of the structural steel erection process
 Special-purpose template for utility tunnel construction
Military Application:
 Modeling leadership effects and recruit type in an Army recruiting station
 Design and test of an intelligent controller for autonomous underwater vehicles
 Modeling military requirements for non-war fighting operations
 Using adaptive agents in U.S. Air Force pilot retention
Logistics, Transportation and Distribution Applications:
 Evaluating the potential benefits of a rail-traffic planning algorithm
 Evaluating strategies to improve railroad performance
 Parametric modeling in rail-capacity planning
 Analysis of passenger flows in an airport terminal
 Proactive flight-schedule evaluation
 Logistics issues in autonomous food production systems for extended-duration space
exploration
 Sizing industrial rail-car fleets
 Product distribution in the newspaper industry
 Design of a toll plaza
 Choosing between rental-car locations
 Quick-response replenishment
Business process simulation:
 Impact of connection bank redesign on airport gate assignment
 Product development program planning
 Reconciliation of business and systems modeling
 Personnel forecasting and strategic workforce planning
Human Systems and Health Care:
 Modeling human performance in complex systems
 Studying the human element in air traffic control
 Modeling front office and patient care in ambulatory health care practices
 Evaluating hospital operations between the emergency department and a medical unit
 Estimating maximum capacity in an emergency room and reducing length of stay in
that room.
1.2 SYSTEMS, MODELS, AND SIMULATION
1.2.1 Systems and System Environment:
System
A system is defined as a group of objects that are joined together in some regular interaction or interdependence toward the accomplishment of some purpose.
System environment
A system is often affected by changes occurring outside the system, such changes are
said to occur in the system environment.
Components of a System:
A simulated system consists of the following components.
Activity: Represents a time period of specified length.
Example: a manufacturing process of a department.
State of the System:
The state of a system is defined as the collection of variables necessary to describe a
system at any time, relative to the objective of study.
Event: An event is defined as an instantaneous occurrence that may change the state of the system.
 Endogenous: describes activities and events occurring within the system.
 Exogenous: describes activities and events in the environment that affect the system.

Examples of System and Components:


Table 1 shows examples of different simulated systems and their components.
Table1: System and Components
1.2.2 System Study:
At some point in the lives of most systems, there is a need to study them to try to gain some
insight into the relationships among various components, or to predict performance under
some new conditions being considered. Figure 2 maps out different ways in which a system
might be studied.

Fig2. Ways to study a system


Simulated systems are categorized into two types.
1.2.3 Discrete and Continuous Systems:
 A discrete system is one in which the state variables change only at a discrete set of points in time, as shown in Fig. 3.
Example: a bank.
 A continuous system is one in which the state variables change continuously over time, as shown in Fig. 4.
Example: the head of water behind a dam.
Discrete Systems:

Fig.3 Discrete system


Continuous Systems:

Fig.4 Continuous System


Types of Models:
 Mathematical or Physical
 Static Model
 Dynamic Model
 Deterministic Model
 Stochastic Model
Mathematical model:
A mathematical model uses symbolic notation and equations to represent a system.
Static Model:
A static simulation model represents a system at a particular point in time; it is also called a Monte Carlo simulation.
Dynamic Model:
A dynamic simulation model represents a system as it changes over time.
Example: simulation of a bank from 9 a.m. to 4 p.m.
Deterministic Model:
A deterministic simulation model contains no random variables; a given set of known inputs results in a unique set of outputs.
Stochastic Model:
A stochastic simulation model has one or more random variables as input. Random inputs lead to random outputs; since the outputs are random, they can be considered only as estimates of the true characteristics of the model.
1.3 Steps in Simulation Study:
Model programming is just part of the overall effort to design or analyze a complex system
by simulation. Figure 4 shows the steps that will compose a typical, sound simulation study.
Problem formulation:
 Every study should begin with a statement of the problem.
 If the statement is provided by the policy makers or those that have the problem,
The analyst must ensure that the problem being described is clearly understood
 If the problem statement is being developed by the analyst, it is important that the
policy makers understand and agree with the formulation.
Setting of objective and overall project plan:
 The objectives indicate the questions to be answered by simulation
 The overall project plan should include the study in terms of
 A statement of the alternative systems
 A method for evaluating the effectiveness of these alternatives
 Plans for the study in terms of the number of people involved
 Cost of the study
 The number of days required to accomplish each phase of the work with the
anticipated results.
Model Conceptualization:
 The art of modeling is enhanced by the ability to do the following:
 To abstract the essential features of a problem.
 To select and modify basic assumptions that characterizes the system.
 To enrich and elaborate the model until a useful approximation results.
Data Collection:
There is a constant interplay between the construction of the model and the collection of
the needed input data.
As complexity of the model changes the required data elements may also change.
Model Translation:
 Since most real-world systems result in models that require a great deal of information storage and computation, the model must be entered into a computer-recognizable format.
 We use the term "program" even though it is possible to accomplish the desired result in many instances with little or no actual coding.
Fig.4 Steps in Simulation Study
Verification:
• Verification pertains to the computer program and checks that it performs properly.
• If the input parameters and logical structure are correctly represented, verification is complete.
Validation:
 Validation is the determination that a model is an accurate representation of the real system.
 It is usually achieved through calibration of the model, an iterative process of comparing the model with actual system behavior and using the discrepancies between the two, and the insights gained, to improve the model.
 This process is repeated until model accuracy is judged acceptable.
Experimental Design:
• The alternatives that are to be simulated must be determined. For each system
design, decisions need to be made concerning
• Length of the initialization period
• Length of simulation runs
• Number of replications to be made of each run
Production runs and analysis:
They are used to estimate measures of performance for the system designs that are
being simulated.
More runs:
 Based on the analysis of runs that have been completed, the analyst determines whether additional runs are needed and what design those additional experiments should follow.
Documentation and reporting:
There are two types of documentation: program documentation and process documentation.
Program documentation: can be used again by the same or different analysts to understand how the program operates.
Process documentation: enables review of the final formulation, the alternatives considered, the results of the experiments, and the recommended solution to the problem. The final report provides a vehicle of certification.
1.4 Discrete Event Simulation:
Discrete-event simulation concerns the modeling of a system as it evolves over time by a
representation in which the state variables change instantaneously at separate points in time.
These points in time are the ones at which an event occurs, where an event is defined as an
instantaneous occurrence that may change the state of the system.
EXAMPLE: Consider a service facility with a single server—e.g., a one-operator barbershop
or an information desk at an airport—for which we would like to estimate the (expected)
average delay in queue (line) of arriving customers, where the delay in queue of a customer
is the length of the time interval from the instant of his arrival at the facility to the instant he
begins being served. For the objective of estimating the average delay of a customer, the
state variables for a discrete-event simulation model of the facility would be the status of the
server, i.e., either idle or busy, the number of customers waiting in queue to be served (if
any), and the time of arrival of each person waiting in queue. The status of the server is
needed to determine, upon a customer’s arrival, whether the customer can be served
immediately or must join the end of the queue. When the server completes serving a
customer, the number of customers in the queue is used to determine whether the server will
become idle or begin serving the first customer in the queue.
1.4.1 Time-Advance Mechanisms
The variable in a simulation model that gives the current value of simulated time is called the simulation clock. The unit of time for the simulation clock is never stated explicitly when a model is written in a general-purpose language such as C; it is assumed to be the same as the units of the input parameters.
Two principal approaches have been suggested for advancing the simulation clock: next-event
time advance and fixed-increment time advance. Since the first approach is used by all major
simulation software and by most people programming their model in a general-purpose
language, and since the second is a special case of the first, we shall use the next-event time-
advance approach for all discrete-event simulation models.
With the next-event time-advance approach, the simulation clock is initialized to zero and the
times of occurrence of future events are determined. The simulation clock is then advanced
to the time of occurrence of the most imminent (first) of these future events, at which point
the state of the system is updated to account for the fact that an event has occurred, and our
knowledge of the times of occurrence of future events is also updated. Then the simulation
clock is advanced to the time of the (new) most imminent event, the state of the system is
updated, and future event times are determined, etc. This process of advancing the simulation
clock from one event time to another is continued until eventually some pre specified stopping
condition is satisfied.
t_i = time of arrival of the ith customer (t_0 = 0)
A_i = t_i − t_(i−1) = interarrival time between the (i−1)st and ith arrivals of customers
S_i = time that the server actually spends serving the ith customer (exclusive of the customer's delay in queue)
D_i = delay in queue of the ith customer
c_i = t_i + D_i + S_i = time that the ith customer completes service and departs
e_i = time of occurrence of the ith event of any type (the ith value the simulation clock takes on, excluding the value e_0 = 0)
Each of these defined quantities will generally be a random variable. Assume that the
probability distributions of the inter arrival times A1, A2, . . . and the service times S1, S2, .
. . are known and have cumulative distribution functions denoted by FA and FS, respectively.
At time e0 = 0 the status of the server is idle, and the time t1 of the first arrival is determined
by generating A1 from FA.
Fig.5 The next-event time-advance approach illustrated for the single-server
queueing system

1.4.2 Components and Organization of a Discrete-Event Simulation:


System state: The collection of state variables necessary to describe the system at a
particular time.
Simulation clock: A variable giving the current value of simulated time.
Event list: A list containing the next time when each type of event will occur.
Statistical counters: Variables used for storing statistical information about system
performance.
Initialization routine: A subprogram to initialize the simulation model at time 0.
Timing routine: A subprogram that determines the next event from the event list and then
advances the simulation clock to the time when that event is to occur
Event routine: A subprogram that updates the system state when a particular type of
event occurs (there is one event routine for each event type)
Library routines: A set of subprograms used to generate random observations from
probability distributions that were determined as part of the simulation model.
Report generator: A subprogram that computes estimates (from the statistical counters)
of the desired measures of performance and produces a report when the simulation ends
Main program: A subprogram that invokes the timing routine to determine the next event and then transfers control to the corresponding event routine to update the system state appropriately.
Flow for the Next-Event Time-Advance Approach:
The logical relationships (flow of control) among these components are shown in Fig. 6. The
simulation begins at time 0 with the main program invoking the initialization routine, where
the simulation clock is set to zero, the system state and the statistical counters are initialized,
and the event list is initialized. After control has been returned to the main program, it invokes
the timing routine to determine which type of event is most imminent. If an event of type i is
the next to occur, the simulation clock is advanced to the time that event type i will occur.
Fig.6 Flow for the Next-Event Time-Advance Approach

1.5 SIMULATION OF A SINGLE-SERVER QUEUEING SYSTEM


Consider a single-server queueing system (see Fig. 7) for which the inter arrival times A1,
A2, . . . are independent and identically distributed (IID) random variables.
 “Identically distributed” means that the inter arrival times have the same probability
distribution.
 A customer who arrives and finds the server idle enters service immediately, and
the service times S1, S2, . . . of the successive customers are IID random variables
that are independent of the inter arrival times.
 A customer who arrives and finds the server busy joins the end of a single queue.
 Upon completing service for a customer, the server chooses a customer from the
queue
(if any) in a first-in, first out(FIFO) manner.
 The simulation will begin in the “empty-and-idle” state; i.e., no customers are
present and the server is idle.
 At time 0, we will begin waiting for the arrival of the first customer, which will occur
after the first inter arrival time, A1, rather than at time 0
Fig.7 Simulation of a single-server Queueing System

1.5.1 Intuitive Explanation:


Simulate a single-server queueing system by showing how its simulation model would be
represented inside the computer at time e0 = 0 and the times e1, e2, . . . , e13 at which the
13 successive events occur that are needed to observe the desired number, n = 6, of delays
in queue. For expository convenience, we assume that the inter arrival and service times of
customers are
A1 = 0.4, A2 = 1.2, A3 = 0.5, A4 = 1.7, A5 = 0.2,
A6 = 1.6, A7 = 0.2, A8 = 1.4, A9 = 1.9, . . .
S1 = 2.0, S2 = 0.7, S3 = 0.2, S4 = 1.1, S5 = 3.7, S6 = 0.6, . . .
Figure 8 gives a snapshot of the system itself and of a computer representation of the system
at each of the times e0 = 0, e1 = 0.4, . . . , e13 = 8.6. In the “system” pictures, the square
represents the server, and circles represent customers; the numbers inside the customer
circles are the times of their arrivals. In the “computer representation” pictures, the values
of the variables shown are after all processing has been completed at that event.
Fig.8 Snapshots of the system and of its computer representation at time 0 and at
each of the 13 succeeding event times.
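For a FIFO single-server queue the delays can also be checked without the full event-driven machinery, using the recurrence D(i+1) = max(0, D(i) + S(i) − A(i+1)) with D(1) = 0 (Lindley's recurrence, mentioned here only as a cross-check; it is not part of the textbook example). The short C sketch below applies it to the interarrival and service times listed above and reproduces the n = 6 delays and their average of 0.95.

#include <stdio.h>

/* Check the hand simulation: reproduce the first n = 6 delays in queue
   using the recurrence D[i+1] = max(0, D[i] + S[i] - A[i+1]), D[1] = 0. */
int main(void)
{
    double A[] = {0.4, 1.2, 0.5, 1.7, 0.2, 1.6}; /* interarrival times A1..A6 */
    double S[] = {2.0, 0.7, 0.2, 1.1, 3.7, 0.6}; /* service times S1..S6      */
    int    n   = 6;
    double D   = 0.0;   /* delay of the current customer (D1 = 0) */
    double sum = 0.0;   /* total delay of the n customers         */

    for (int i = 0; i < n; ++i) {
        printf("D%d = %.2f\n", i + 1, D);
        sum += D;
        if (i + 1 < n) {                     /* delay of the next customer */
            D = D + S[i] - A[i + 1];
            if (D < 0.0) D = 0.0;
        }
    }
    printf("Average delay for %d customers = %.2f\n", n, sum / n);
    return 0;
}

Note that this shortcut yields only the delays; time-average statistics such as the time-weighted number in queue still require the event-driven bookkeeping shown in the snapshots.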
1.5.2 Program Organization and Logic
There are several reasons for choosing a general-purpose language such as C, rather than
more powerful high-level simulation software, for introducing computer simulation at this
point.
Figure 9 contains a flowchart for the arrival event. First, the time of the next arrival in the
future is generated and placed in the event list. Then a check is made to determine whether
the server is busy. If so, the number of customers in the queue is incremented by 1, and we
ask whether the storage space allocated to hold the queue is already full. If the queue is
already full, an error message is produced and the simulation is stopped; if there is still room
in the queue, the arriving customer’s time of arrival is put at the (new) end of the queue.
If the arriving customer finds the server idle, then this customer has a delay of 0, which is
counted as a delay, and the number of customer delays completed is incremented by 1. The
server must be made busy, and the time of departure from service of the arriving customer
is scheduled into the event list.

Fig. 9 Flowchart for arrival routine, queueing model.

The departure event’s logic is depicted in the flowchart of Fig. 10. Recall that this routine is
invoked when a service completion (and subsequent departure) occurs. If the departing
customer leaves no other customers behind in queue, the server is idled and the departure
event is eliminated from consideration, since the next event must be an arrival. On the other
hand, if one or more customers are left behind by the departing customer, the first customer
in queue will leave the queue and enter service, so the queue length is reduced by 1, and
the delay in queue of this customer is computed and registered in the appropriate statistical
counter

Fig.10 Flowchart for departure routine, queueing model.
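The two flowcharts can be turned into a working program. The sketch below is a condensed, illustrative C version of the kind of program described here, not the textbook's code: the event list holds just two entries (the next arrival time and the next departure time), the timing routine is the comparison at the top of the main loop, queued arrival times sit in a fixed array, and exponential interarrival and service times are generated by the inverse-transform method. The parameter values (mean interarrival time 1.0, mean service time 0.5, 1000 delays) are arbitrary choices for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define Q_LIMIT 100                 /* maximum queue length allowed    */
#define BUSY    1
#define IDLE    0
#define NO_EVENT 1.0e30             /* "no departure scheduled" marker */

static double mean_interarrival = 1.0;    /* illustrative parameters */
static double mean_service      = 0.5;
static int    num_delays_required = 1000;

static double sim_time, time_next_arrival, time_next_departure;
static int    server_status, num_in_q, num_delayed;
static double time_arrival[Q_LIMIT];      /* arrival times of queued customers */
static double total_delay;

static double expon(double mean)          /* exponential variate, inverse transform */
{
    return -mean * log(1.0 - (double)rand() / ((double)RAND_MAX + 1.0));
}

static void arrive(void)                  /* arrival event routine (Fig. 9 logic) */
{
    time_next_arrival = sim_time + expon(mean_interarrival);  /* schedule next arrival */
    if (server_status == BUSY) {
        if (num_in_q >= Q_LIMIT) {
            fprintf(stderr, "Queue overflow at time %f\n", sim_time);
            exit(EXIT_FAILURE);
        }
        time_arrival[num_in_q++] = sim_time;           /* join end of queue */
    } else {
        ++num_delayed;                                 /* delay of 0 still counts */
        server_status = BUSY;
        time_next_departure = sim_time + expon(mean_service);
    }
}

static void depart(void)                  /* departure event routine (Fig. 10 logic) */
{
    if (num_in_q == 0) {
        server_status = IDLE;
        time_next_departure = NO_EVENT;                /* no departure pending */
    } else {
        total_delay += sim_time - time_arrival[0];     /* delay of next customer */
        ++num_delayed;
        for (int i = 1; i < num_in_q; ++i)             /* shift queue forward */
            time_arrival[i - 1] = time_arrival[i];
        --num_in_q;
        time_next_departure = sim_time + expon(mean_service);
    }
}

int main(void)
{
    sim_time = 0.0;
    server_status = IDLE;
    num_in_q = num_delayed = 0;
    total_delay = 0.0;
    time_next_arrival = expon(mean_interarrival);      /* first arrival */
    time_next_departure = NO_EVENT;

    while (num_delayed < num_delays_required) {
        if (time_next_arrival <= time_next_departure) {    /* timing routine */
            sim_time = time_next_arrival;
            arrive();
        } else {
            sim_time = time_next_departure;
            depart();
        }
    }
    printf("Average delay in queue for %d customers: %.3f\n",
           num_delayed, total_delay / num_delayed);
    return 0;
}

With these illustrative parameters the corresponding M/M/1 queue has a long-run expected delay in queue of 0.5, so the printed estimate should be near that value for a long run.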

1.6 Comparison of Simulation Packages with Programming Languages:
Some advantages of using a simulation package rather than a general-purpose programming language:
 Simulation packages automatically provide most of the features needed to build a
simulation model, resulting in a significant decrease in “programming” time and a
reduction in overall project cost.
 They provide a natural framework for simulation modeling. Their basic modeling
constructs are more closely akin to simulation than are those in a general-purpose
programming language like C.
 Simulation models are generally easier to modify and maintain when written in a
simulation package.
 They provide better error detection because many potential types of errors are checked
for automatically
Advantages of General-purpose Programming Languages:
 Most modelers already know a programming language, but this is often not the case
with a simulation package.
 A simulation model efficiently written in C, C++, or Java may require less execution
time than a model developed in a simulation package.
 Programming languages may allow greater programming flexibility than certain
simulation packages.
 The programming languages C++ and Java are object-oriented, which is of
considerable importance to many analysts and programmers, such as those in the
defense industry.
 Software cost is generally lower, but total project cost may not be.
1.7 Classification of Simulation Software:
The following are the various aspects of simulation packages.
 General-Purpose vs. Application-Oriented Simulation Packages
 Modeling Approaches
 Common Modeling Elements
General-Purpose vs. Application-Oriented Simulation Packages
 A general-purpose simulation package can be used for any application, but might
have special features for certain ones (e.g., for manufacturing or process
reengineering).
 An application-oriented simulation package is designed to be used for a certain class
of applications such as manufacturing, health care, or communications networks.
Modeling Approaches
Most contemporary simulation packages use the process approach to simulation modeling.
 A process is a time-ordered sequence of interrelated events separated by intervals
of time, which describes the entire experience of an “entity” as it flows through a
“system.”
 The process corresponding to an entity arriving to and being served at a single server is shown in Fig. 11.
 A system or simulation model may have several different types of processes.

Fig. 11 Process describing the flow of an entity through a system.


1.7.1 Prototype customer-process routine for a single-server queueing system:
Fig. 12 gives a flowchart for a prototype customer-process routine in the case of a single
server
queueing system. Unlike an event routine, this process routine has multiple entry points at
blocks 1, 5, and 9. Entry into this routine at block 1 corresponds to the arrival event for a
customer entity that is the most imminent event in the event list. At block 1 an arrival event
record is placed in the event list for the next customer entity to arrive. (This next customer
entity will arrive at a time equal to the time the current customer entity arrives plus an inter
arrival time.) To determine whether the customer entity currently arriving can begin service,
a check is made (at block 2) to see whether the server is idle. If the server is busy, this
customer entity is placed at the end of the queue (block 3) and is made to wait (at block 4)
until selected for service at some undetermined time in the future. (This is called a conditional
wait.) Table 2 below shows the different components of some common simulation applications.
Table 2 Entities, attributes, resources, and queues for some common simulation
applications
Fig.12 Prototype customer-process routine for a single-server queueing system.

1.7.2 Common Modelling Elements:


 Simulation packages typically include entities, attributes, resources, and queues as
part of their modeling framework.
 An entity is created, travels through some part of the simulated system, and then is
usually destroyed.
 Entities are distinguished from each other by their attributes, which are pieces of
information stored with the entity.
 As an entity moves through the simulated system, it requests the use of resources.
 If a requested resource is not available, then the entity joins a queue.
 The entities in a particular queue may be served in a FIFO (first-in, first-out) manner,
served in a LIFO (last-in, first-out) manner, or ranked on some attribute in increasing
or decreasing order
1.8 Desirable Software Features:
There are numerous features to consider when selecting simulation software. The
following are the different features grouped into the following form.
 General capabilities (including modeling flexibility and ease of use)
 Hardware and software requirements
 Animation and dynamic graphics
 Statistical capabilities
 Customer support and documentation
 Output reports and graphics
General Capabilities
The following are some specific capabilities that make a simulation product flexible:
• Ability to define and change attributes for entities and also global variables, and to use both
in decision logic (e.g., if-then-else constructs)
• Ability to use mathematical expressions and mathematical functions (logarithms,
exponentiation, etc.)
• Ability to create new modeling constructs and to modify existing ones, and to store them in
libraries for use in current and future models
Hardware and Software Requirements:
 In selecting simulation software, one must consider what computer platforms the
software is available for.
 Almost all software is available for Windows-based PCs, and some products are also
available for Apple computers.
 If a software package is available for several platforms, then it should be compatible
across platforms.
 The amount of RAM required to run the software should be considered as well as what
operating systems are supported.
 It is highly desirable if independent replications of a simulation model can be made
simultaneously on multiple processor cores or on networked computers.
Animation and Dynamic Graphics
The availability of built-in animation is one of the reasons for the increased use of
simulation modeling. In an animation, key elements of the system are represented on
the screen by icons that dynamically change position, color, and shape as the
simulation model evolves through time.
The following are some of the uses of animation:
• Communicating the essence of a simulation model (or simulation itself) to a
manager or to other people who may not be aware of (or care about) the technical
details of the model
• Debugging the simulation computer program
• Showing that a simulation model is not valid
• Suggesting improved operational procedures for a system (some things may not
be apparent from looking at just the simulation’s numerical results)
• Training operational personnel
• Promoting communication among the project team
It should be possible to import CAD drawings and clip art into an animation. It is often
desirable to display dynamic graphics and statistics on the screen as the simulation
executes. Examples of dynamic graphics are clocks, dials, level meters and
dynamically updated histograms and time plots
Customer Support and Documentation
The simulation software vendor should provide public training on the software on a regular
basis, and it should also be possible to have customized training presented at the client’s site.
Good technical support is extremely important for questions on how to use the software
and in case a bug in the software is discovered. Technical support, which is usually in the
form of telephone help, should be such that a response is received in at most one day.
Good documentation is a crucial requirement for using any software product. It should be
possible, in our opinion, to learn a simulation package without taking a formal training course.
There should be a detailed description of how each modeling construct works, particularly if
its operating procedures are complex.
Output Reports and Graphics
Standard reports should be provided for the estimated performance measures. It should also
be possible to customize reports, perhaps for management presentations.
Since a simulation product should be flexible enough so that it can compute estimates of user-
defined performance measures, it should also be possible to write these estimates into a
custom report.
The simulation product should provide a variety of (static) graphics. First, it should be possible
to make a histogram for a set of observed data. For continuous (discrete) data, a histogram
is a graphical estimate of the underlying probability density (mass) function that produced
the data.
In a time plot one or more key system variables (e.g., the numbers in certain queues) are
plotted over the length of the simulation, providing a long-term indication of the dynamic
behavior of the simulated system.
Unit-II
MODELING A PROBABILISTIC SYSTEM

2.1 Random Variables and properties


A Random Variable is a set of possible values from a random experiment.
Example: Tossing a coin: we could get Heads or Tails.
Let's give them the values Heads=0 and Tails=1 and we have a Random Variable "X":
In short:
X = {0, 1}
Note: We could choose Heads=100 and Tails=150 or other values if we want! It is our
choice.
So:
 We have an experiment (such as tossing a coin)
 We give values to each event
 The set of values is a Random Variable

2.1.1 Not Like an Algebra Variable


In Algebra a variable, like x, is an unknown value:
Example: x + 2 = 6
In this case we can find that x=4
But a Random Variable is different ...
A Random Variable has a whole set of values ...
... and it could take on any of those values, randomly.

Example: X = {0, 1, 2, 3}
X could be 0, 1, 2, or 3 randomly.
And they might each have a different probability.
We use a capital letter, like X or Y, to avoid confusion with the Algebra type of variable.
Sample Space: A Random Variable's set of values is the Sample Space.
Example: Throw a die once
Random Variable X = "The score shown on the top face".
X could be 1, 2, 3, 4, 5 or 6
So the Sample Space is {1, 2, 3, 4, 5, 6}

2.1.2 Probability
We can show the probability of any one value using this style:
P(X = value) = probability of that value
Example (continued): Throw a die once
X = {1, 2, 3, 4, 5, 6}
In this case they are all equally likely, so the probability of any one is 1/6
 P(X = 1) = 1/6
 P(X = 2) = 1/6
 P(X = 3) = 1/6
 P(X = 4) = 1/6
 P(X = 5) = 1/6
 P(X = 6) = 1/6
Note that the sum of the probabilities = 1, as it should be.

Example: How many heads when we toss 3 coins?


X = "The number of Heads" is the Random Variable.
In this case, there could be 0 Heads (if all the coins land Tails up), 1 Head, 2 Heads or 3
Heads.
So the Sample Space = {0, 1, 2, 3}
But this time the outcomes are NOT all equally likely.
The three coins can land in eight possible ways:

Outcome   Number of Heads
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0
Looking at the table we see just 1 case of Three Heads, but 3 cases of Two Heads, 3 cases
of One Head, and 1 case of Zero Heads. So:
 P(X = 3) = 1/8
 P(X = 2) = 3/8
 P(X = 1) = 3/8
 P(X = 0) = 1/8

Example: Two dice are tossed.


The Random Variable is X = "The sum of the scores on the two dice".
Let's make a table of all possible values:
             1st Die
          1   2   3   4   5   6
2nd   1   2   3   4   5   6   7
Die   2   3   4   5   6   7   8
      3   4   5   6   7   8   9
      4   5   6   7   8   9   10
      5   6   7   8   9   10  11
      6   7   8   9   10  11  12
There are 6 × 6 = 36 possible outcomes, and the Sample Space (the set of possible sums of the scores on the two dice) is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Let's count how often each value occurs, and work out the probabilities:
 2 occurs just once, so P(X = 2) = 1/36
 3 occurs twice, so P(X = 3) = 2/36 = 1/18
 4 occurs three times, so P(X = 4) = 3/36 = 1/12
 5 occurs four times, so P(X = 5) = 4/36 = 1/9
 6 occurs five times, so P(X = 6) = 5/36
 7 occurs six times, so P(X = 7) = 6/36 = 1/6
 8 occurs five times, so P(X = 8) = 5/36
 9 occurs four times, so P(X = 9) = 4/36 = 1/9
 10 occurs three times, so P(X = 10) = 3/36 = 1/12
 11 occurs twice, so P(X = 11) = 2/36 = 1/18
 12 occurs just once, so P(X = 12) = 1/36
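These probabilities can be verified with a short enumeration; the C sketch below (added for illustration, not part of the original text) counts the 36 equally likely outcomes and prints P(X = s) for every possible sum.

#include <stdio.h>

/* Enumerate the 36 equally likely outcomes of two dice and
   print P(X = s) for each possible sum s = 2, ..., 12. */
int main(void)
{
    int count[13] = {0};        /* count[s] = number of ways to roll sum s */

    for (int d1 = 1; d1 <= 6; ++d1)
        for (int d2 = 1; d2 <= 6; ++d2)
            ++count[d1 + d2];

    for (int s = 2; s <= 12; ++s)
        printf("P(X = %2d) = %d/36 = %.4f\n", s, count[s], count[s] / 36.0);
    return 0;
}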
2.1.3 A Range of Values
We could also calculate the probability that a Random Variable takes on a range of values.
Example (continued) What is the probability that the sum of the scores is 5, 6, 7 or 8?
In other words: What is P(5 ≤ X ≤ 8)?
P(5 ≤ X ≤ 8) =P(X=5) + P(X=6) + P(X=7) + P(X=8)
=(4+5+6+5)/36
=20/36
=5/9
Solving
We can also solve a Random Variable equation.
Example (continued) If P(X=x) = 1/12, what is the value of x?
Looking through the list above we find:
 P(X=4) = 1/12, and
 P(X=10) = 1/12
So there are two solutions: x = 4 or x = 10
Notice the different uses of X and x:
 X is the Random Variable "The sum of the scores on the two dice".
 x is a value that X can take.

2.1.4 Continuous
Random Variables can be either Discrete or Continuous:
 Discrete Data can only take certain values (such as 1,2,3,4,5)
 Continuous Data can take any value within a range (such as a person's height)
All our examples have been Discrete.

2.1.5 Random Variables – Continuous
A Random Variable is a set of possible values from a random experiment; so far all our examples have been Discrete. Here we look at the more advanced topic of Continuous Random Variables.
2.1.6 Discrete vs. Continuous
Random Variables can be either Discrete or Continuous:
 Discrete Data can only take certain values (such as 1, 2, 3, 4, 5)
 Continuous Data can take any value within a range (such as a person's height)
2.1.7 The Uniform Distribution
The Uniform Distribution (also called the Rectangular Distribution) is the simplest
distribution.
It has equal probability for all values of the Random variable between a and b:

The probability of any value between a and b is p


We also know that p = 1/(b-a), because the total of all probabilities must be 1, so
the area of the rectangle = 1
p × (b−a) = 1
p = 1/(b−a)
We can write the probability density as:
f(x) = 1/(b−a) for a ≤ x ≤ b
f(x) = 0 otherwise

Example: Old Faithful erupts every 91 minutes. You arrive there at random and wait for 20
minutes ... what is the probability you will see it erupt?

This is actually easy to calculate, 20 minutes out of 91 minutes is:


p = 20/91 = 0.22 (to 2 decimals)

But let's use the Uniform Distribution for practice.


To find the probability between a and a+20, find the blue area:

Area = (1/91) x (a+20 − a)


= (1/91) x 20
= 20/91
= 0.22 (to 2 decimals)
So there is a 0.22 probability you will see Old Faithful erupt.

If you waited the full 91 minutes you would be sure (p=1) to have seen it erupt.
But remember this is a random thing! It might erupt the moment you arrive, or any time in
the 91 minutes.
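The same answer can be obtained from the Uniform CDF, F(x) = (x − a)/(b − a) for a ≤ x ≤ b. The C sketch below is illustrative; it takes a = 0 and b = 91 minutes and evaluates P(a ≤ X ≤ a + 20) = F(a + 20) − F(a).

#include <stdio.h>

/* CDF of the Uniform(a, b) distribution. */
static double uniform_cdf(double x, double a, double b)
{
    if (x <= a) return 0.0;
    if (x >= b) return 1.0;
    return (x - a) / (b - a);
}

int main(void)
{
    double a = 0.0, b = 91.0;      /* one 91-minute eruption cycle */
    double wait = 20.0;            /* we watch for 20 minutes      */
    double p = uniform_cdf(a + wait, a, b) - uniform_cdf(a, a, b);
    printf("P(see an eruption) = %.2f\n", p);   /* prints 0.22 */
    return 0;
}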
2.1.8 Cumulative Uniform Distribution
We can have the Uniform Distribution as a cumulative (adding up as it goes along)
distribution:

The probability starts at 0 and builds up to 1


This type of thing is called a "Cumulative distribution function", often shortened to "CDF"
Example (continued):
Let's use the "CDF" of the previous Uniform Distribution to work out the probability:

At a+20 the probability has accumulated to about 0.22


2.1.9 Other Distributions
Knowing how to use the Uniform Distribution helps when dealing with more complicated distributions.

The general name for any of these is probability density function or "pdf"

2.1.10 The Normal Distribution


The most important continuous distribution is the Standard Normal Distribution
It is so important the Random Variable has its own special letter Z.
The graph for Z is a symmetrical bell-shaped curve:

Usually we want to find the probability of Z being between certain values.


Example: P(0 < Z < 0.45)

(What is the probability that Z is between 0 and 0.45)


This is found by using the Standard Normal Distribution Table
Start at the row for 0.4, and read along until 0.45: there is the value 0.1736
P(0 < Z < 0.45) = 0.1736
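Instead of reading a printed table, the standard normal CDF Φ(z) can be evaluated with the error function erf() from the C99 math library, using Φ(z) = 0.5 (1 + erf(z/√2)). The sketch below computes the same probability.

#include <stdio.h>
#include <math.h>

/* Standard normal CDF expressed through erf() from <math.h> (C99). */
static double phi(double z)
{
    return 0.5 * (1.0 + erf(z / sqrt(2.0)));
}

int main(void)
{
    double p = phi(0.45) - phi(0.0);          /* P(0 < Z < 0.45) */
    printf("P(0 < Z < 0.45) = %.4f\n", p);    /* about 0.1736   */
    return 0;
}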
2.2 SIMULATION OUTPUT DATA AND STOCHASTIC PROCESSES
A stochastic process, also known as a random process, is a collection of random variables that are indexed by some mathematical set: each random variable is uniquely associated with an element of that set. The index set is the set used to index the random variables. Traditionally the index set was a subset of the real line, such as the natural numbers, which gives the index set the interpretation of time.
In other words, a stochastic process describes a system for which there are observations at certain times, and for which the outcome, that is, the observed value at each time, is a random variable.

Each random variable in the collection takes values from the same mathematical space, known as the state space. This state space could be the integers, the real line, or n-dimensional Euclidean space, for example. A stochastic process's increment is the amount
that a stochastic process changes between two index values, which are frequently interpreted
as two points in time. Because of its randomness, a stochastic process can have many
outcomes, and a single outcome of a stochastic process is known as, among other things, a
sample function or realization.

Classification
A stochastic process can be classified in a variety of ways, for example by its state space, by its index set, or by the dependence among its random variables. One common classification is by the cardinality of the index set and of the state space.

When expressed in terms of time, a stochastic process is said to be in discrete-time if its index
set contains a finite or countable number of elements, such as a finite set of numbers, the set
of integers, or the natural numbers. Time is said to be continuous if the index set is some
interval of the real line. Discrete-time and continuous-time stochastic processes are thus the two main types. Because continuous-time stochastic processes require more advanced mathematical techniques, particularly since the index set is uncountable, discrete-time stochastic processes are considered easier to study.
If the index set consists of integers or a subset of them, the stochastic process is also known
as a random sequence.

If the state space is made up of integers or natural numbers, the stochastic process is known as a discrete or integer-valued stochastic process. If the state space is the real line, the stochastic process is known as a real-valued stochastic process or a process with continuous state space. If the state space is n-dimensional Euclidean space, the stochastic process is known as an n-dimensional vector process or n-vector process.

Examples
The main types of stochastic processes are described briefly below; they are treated in detail in standard texts such as Essentials of Stochastic Processes.

Types of Stochastic Processes


The probability of any event depends upon various external factors. The mathematical interpretation of these factors, and its use in calculating the possibility of such an event, is studied in probability theory. A collection of the random variables describing such events, indexed by a set, is called a stochastic process. Stochastic processes are used as mathematical models of any phenomenon or system that results from highly random behavior; such phenomena can occur anywhere and at any time in this constantly active and changing world.
To make the study of stochastic processes easier, they are classified into various categories. If the index set consists of a finite or countable number of elements, such as the integers or the natural numbers, the process is a discrete-time process. For an uncountable index set the process is more complex; in this material we deal with discrete-time stochastic processes.
The various types of stochastic processes are as follows:

Bernoulli Process
The Bernoulli process is one of the simplest stochastic processes. It is a sequence of independent and identically distributed (iid) random variables, where each random variable takes the value one with probability p and the value zero with probability 1 − p. This process is analogous to repeatedly flipping a coin, where a head (value one) occurs with probability p and a tail (value zero) occurs with probability 1 − p. In other words, a Bernoulli process is a sequence of iid Bernoulli random variables, with each coin flip representing a Bernoulli trial.
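A Bernoulli process is easy to simulate: draw a uniform random number for each trial and compare it with p. The C sketch below is illustrative (p = 0.5 and 20 trials are arbitrary choices).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    double p = 0.5;       /* probability of a "success" (head) on each trial */
    int    n = 20;        /* number of independent Bernoulli trials          */

    srand((unsigned)time(NULL));
    for (int i = 0; i < n; ++i) {
        double u = (double)rand() / ((double)RAND_MAX + 1.0);   /* U(0,1)      */
        printf("%d ", u < p ? 1 : 0);                           /* Bernoulli(p) */
    }
    printf("\n");
    return 0;
}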

Wiener Process
The Wiener process is a stochastic process with stationary, independent increments that are normally distributed with a variance depending on the size of the increment. The Wiener process is named after Norbert Wiener, who demonstrated its mathematical existence, but it is also known as the Brownian motion process, or simply Brownian motion, due to its historical significance as a model for Brownian movement in liquids.

The Wiener process, which plays a central role in probability theory, is frequently regarded as
the most important and studied stochastic process, with connections to other stochastic
processes. It has a continuous index set and states space because its index set and state
spaces are non-negative numbers and real numbers, respectively. However, the process can
be defined more broadly so that its state space is dimensional Euclidean space. The resulting
Wiener or Brownian motion process is said to have zero drift if the mean of any increment is
zero. If the mean of the increment between any two points in time equals the time difference
multiplied by some constant μ, that is a real number, the resulting stochastic process is said
to have drift μ.
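A sample path of a Wiener process with drift can be simulated by summing normal increments: over a step of length Δt the process changes by a normal variate with mean μΔt and variance σ²Δt. The C sketch below is illustrative (the values of μ, σ, and Δt are arbitrary, and the normal variates come from the Box-Muller transform).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Standard normal variate via the Box-Muller transform. */
static double std_normal(void)
{
    double two_pi = 2.0 * acos(-1.0);
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* U in (0,1) */
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(two_pi * u2);
}

int main(void)
{
    double mu = 0.1, sigma = 1.0;     /* drift and diffusion coefficients */
    double dt = 0.01, t = 0.0, w = 0.0;
    int    steps = 100;

    for (int i = 0; i < steps; ++i) {
        w += mu * dt + sigma * sqrt(dt) * std_normal();   /* normal increment */
        t += dt;
        printf("%.2f %.4f\n", t, w);
    }
    return 0;
}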

Poisson Process
The Poisson process is a stochastic process with various forms and definitions. It is a counting
process, which is a stochastic process that represents the random number of points or events
up to a certain time. The number of process points located in the interval from zero to some
given time is a Poisson random variable that is dependent on that time and some parameter.
This process's state space is made up of natural numbers, and its index set is made up of
non-negative numbers. This process is also known as the Poisson counting process because
it can be interpreted as a counting process.

A homogeneous Poisson process is a Poisson process defined by a single positive constant (the rate). The homogeneous Poisson process belongs to the same class of stochastic processes as the Markov and Lévy processes.
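Because the interarrival times of a homogeneous Poisson process with rate λ are independent exponential random variables with mean 1/λ, one sample path can be generated by accumulating exponential gaps produced by the inverse-transform method. The C sketch below is illustrative (λ = 2 and a horizon of 5 time units are arbitrary choices).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    double lambda  = 2.0;    /* arrival rate (events per unit time)   */
    double horizon = 5.0;    /* simulate on the interval [0, horizon] */
    double t = 0.0;
    int    count = 0;

    for (;;) {
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);   /* U in (0,1) */
        t += -log(u) / lambda;            /* exponential interarrival time */
        if (t > horizon) break;
        ++count;
        printf("event %d at time %.3f\n", count, t);
    }
    printf("N(%.1f) = %d (expected count is lambda * t = %.1f)\n",
           horizon, count, lambda * horizon);
    return 0;
}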

2.3 ESTIMATION OF MEANS, VARIANCES, AND CORRELATIONS


We have seen that simulation output data are correlated, and thus formulas from classical
statistics based on IID observations cannot be used directly for estimating variances.

2.3.1 Standard Deviation


The Standard Deviation is a measure of how spread out numbers are.
Its symbol is σ (the greek letter sigma)
The formula is easy: it is the square root of the Variance. So now you ask, "What is the
Variance?"

2.3.2 Variance
The Variance is defined as:
The average of the squared differences from the Mean.
To calculate the variance follow these steps:
 Work out the Mean (the simple average of the numbers)
 Then for each number: subtract the Mean and square the result (the squared
difference).
 Then work out the average of those squared differences. (Why Square?)

Example
You and your friends have just measured the heights of your dogs (in millimeters):

The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm.
Find out the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:
Answer:
Mean = (600 + 470 + 170 + 430 + 300) / 5
     = 1970 / 5
     = 394
so the mean (average) height is 394 mm. Let's plot this on the chart:

Now we calculate each dog's difference from the Mean:

To calculate the Variance, take each difference, square it, and then average the result:

Variance
σ² = (206² + 76² + (−224)² + 36² + (−94)²) / 5
= (42436 + 5776 + 50176 + 1296 + 8836) / 5
= 108520 / 5
= 21704
So the Variance is 21,704
And the Standard Deviation is just the square root of Variance, so:

Standard Deviation
σ = √21704
= 147.32...
= 147 (to the nearest mm)
Using the Standard Deviation, we have a "standard" way of knowing what is normal, and what
is extra large or extra small.
Rottweilers are tall dogs. And Dachshunds are a bit short, right?
We can expect about 68% of values to be within plus-or-minus 1 standard deviation.
Our example has been for a Population (the 5 dogs are the only dogs we are interested in).
But if the data is a Sample (a selection taken from a bigger Population), then the calculation
changes!

When you have "N" data values that are:


 The Population: divide by N when calculating Variance (like we did)
 A Sample: divide by N-1 when calculating Variance
All other calculations stay the same, including how we calculated the mean.
Example: if our 5 dogs are just a sample of a bigger population of dogs, we divide by 4
instead of 5 like this:

Sample Variance = 108,520 / 4 = 27,130


Sample Standard Deviation = √27,130 = 165 (to the nearest mm)
Think of it as a "correction" when your data is only a sample.

2.3.3 Formulas
Here are the two formulas:
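In standard notation (xi denotes each data value, N the number of values):

Population Standard Deviation: σ = √[ (1/N) Σ (xi − μ)² ], where μ is the population mean
Sample Standard Deviation: s = √[ (1/(N − 1)) Σ (xi − x̄)² ], where x̄ is the sample mean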
Looks complicated, but the important change is to
divide by N-1 (instead of N) when calculating a Sample Standard Deviation.

2.3.4 Correlation
When two sets of data are strongly linked together we say they have a High Correlation.
The word Correlation is made of Co- (meaning "together"), and Relation
 Correlation is Positive when the values increase together, and
 Correlation is Negative when one value decreases as the other increases
A correlation is assumed to be linear (following a line).

Correlation can have a value:


 1 is a perfect positive correlation
 0 is no correlation (the values don't seem linked at all)
 -1 is a perfect negative correlation
The value shows how good the correlation is (not how steep the line is), and if it is positive
or negative.

As a formula it is:

r = Σ(ab) / √( Σa² × Σb² )

Where:
 Σ is Sigma, the symbol for "sum up"
 a is each x-value minus the mean of x
 b is each y-value minus the mean of y
You probably won't have to calculate it like that, but at least you know it is not "magic", but
simply a routine set of calculations.

Note for Programmers


You can calculate it in one pass through the data. Just sum up x, y, x², y² and xy (no need
for the a or b calculations above) then use the formula:
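r = [ nΣxy − (Σx)(Σy) ] / √{ [ nΣx² − (Σx)² ] [ nΣy² − (Σy)² ] }

As a small sketch in plain Python (the function name and sample data are chosen here only for illustration), the single-pass computation looks like this:

import math

def pearson_one_pass(pairs):
    # Accumulate the five running sums in a single pass over the data.
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

print(pearson_one_pass([(1, 2), (2, 4.1), (3, 5.9), (4, 8.2)]))  # close to +1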

2.4 CONFIDENCE INTERVALS AND HYPOTHESIS TESTS FOR THE MEAN

 Let X1, X2, . . . , Xn be IID random variables with finite mean μ and finite variance σ².
 We consider how to construct a confidence interval for μ and also the complementary problem of
testing the hypothesis that μ = μ0.
 If n is "sufficiently large," the random variable Zn = [X̄(n) − μ] / √(σ²/n) will be approximately
distributed as a standard normal random variable, regardless of the underlying distribution of the Xi's.
 It can also be shown for large n that the sample mean X̄(n) is approximately
distributed as a normal random variable with mean μ and variance σ²/n.
 The difficulty with using the above results in practice is that the variance σ² is
generally unknown.
 Thus, there is nothing probabilistic about the single confidence interval [l(n, α), u(n, α)]
after the data have been obtained and the interval's endpoints have been given
numerical values.
 The correct interpretation to give to the confidence interval is the following: if one
constructs a very large number of independent 100(1 − α) percent confidence
intervals, each based on n observations, where n is sufficiently large, the proportion
of these confidence intervals that contain μ should be close to 1 − α.
 We call this proportion the coverage for the confidence interval. To further amplify
the correct interpretation to be given to a confidence interval, we generated 15
independent samples of size n = 10 from a normal distribution with mean 5 and
variance 1.
 For each data set we constructed a 90 percent confidence interval for μ, which we
know has a true value of 5.
 Suppose that the 10 observations 1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50,
and 1.09 are from a normal distribution with unknown mean μ and that our objective
is to construct a 90 percent confidence interval for μ (a sketch of the calculation follows).
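A minimal version of this calculation in Python (scipy is used only for the t quantile; the data are the 10 observations above):

import math
from scipy import stats

data = [1.20, 1.50, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.50, 1.09]
n = len(data)
xbar = sum(data) / n                                        # sample mean
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)           # sample variance
half_width = stats.t.ppf(0.95, n - 1) * math.sqrt(s2 / n)   # t(0.95, 9) = 1.833
print(xbar - half_width, xbar + half_width)                 # roughly (1.11, 1.58)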

2.5 Validation of Simulation Model


 One of the most difficult problems facing a simulation analyst is that of trying to
determine whether a simulation model is an accurate representation of the actual
system being studied, i.e., whether the model is valid.
 We present a practical discussion of how to build valid and credible models.
 We also provide guidelines on how to determine the level of detail for a model of a
complex system, also a critical and challenging issue.
 We begin by defining the important terms used in this chapter, including verification,
validation, and credibility.
 Verification is concerned with determining whether the “assumptions document” has
been correctly translated into a computer “program,” i.e., debugging the simulation
computer program.
 Although verification is simple in concept, debugging a large-scale simulation program
is a difficult and arduous task due to the potentially large number of logical paths.
 Validation is the process of determining whether a simulation model is an accurate
representation of the system, for the particular objectives of the study.
 The following are some general perspectives on validation:
• Conceptually, if a simulation model is “valid,” then it can be used to make decisions
about the system similar to those that would be made if it were feasible and cost-
effective to experiment with the system itself.
• The ease or difficulty of the validation process depends on the complexity of the
system being modeled and on whether a version of the system currently exists. For
example, a model of a neighborhood bank would be relatively easy to validate since
it could be closely observed. However, a model of the effectiveness of a naval
weapons system in the year 2025 would be impossible to validate completely, since
the location of the battle and the nature of the enemy weapons would be unknown.
• A simulation model of a complex system can only be an approximation to the actual
system, no matter how much effort is spent on model building. There is no such
thing as absolute model validity, nor is it even desired. The more time (and hence
money) that is spent on model development, the more valid the model should be in
general. However, the most valid model is not necessarily the most cost-effective.
For example, increasing the validity of a model beyond a certain level might be quite
expensive, since extensive data collection may be required, but might not lead to
significantly better insight or decisions.
• A simulation model should always be developed for a particular set of purposes.
Indeed, a model that is valid for one purpose may not be for another.
• The measures of performance used to validate a model should include those that the
decision maker will actually use for evaluating system designs.
• Validation is not something to be attempted after the simulation model has
already been developed, and only if there is time and money remaining.
Unfortunately, our experience indicates that this recommendation is often not
followed.
• Each time a simulation model is being considered for a new application, its validity
should be reexamined. The current purpose may be substantially different from the
original purpose, or the passage of time may have invalidated certain model
parameters. A simulation model and its results have credibility if the manager and
other key project personnel accept them as "correct." Note that a credible model
is not necessarily valid, and vice versa. Also, a model can be credible and yet not
actually be used as an aid in making decisions. For example, a model could be
credible but not used because of political or economic reasons. The following things
help establish credibility for a model:
• The manager’s understanding of and agreement with the model’s assumptions
• Demonstration that the model has been validated and verified
• The manager’s ownership of and involvement with the project
• Reputation of the model developers
• Verification and validation that have been done
• Credibility of the model
• Simulation model development and use history (e.g., model developer and
similar applications)
• Quality of the data that are available
• Quality of the documentation
• Known problems or limitations with the simulation model
The timing and relationships of validation, verification, and establishing credibility can be
shown schematically in a diagram of the steps of the simulation study.
 The rectangles represent states of the model or the system of interest, the solid
horizontal arrows correspond to the actions necessary to move from one state to
another, and the curved dashed arrows show where the three major concepts are most
prominently employed.
 The numbers below each solid arrow correspond to the steps in a sound simulation
study.

2.6 GUIDELINES FOR DETERMINING THE LEVEL OF MODEL DETAIL


 A simulation practitioner must determine what aspects of a complex real-world system
actually need to be incorporated into the simulation model and at what level of detail,
and what aspects can be safely ignored.
 It is rarely necessary to have a one-to-one correspondence between each element of
the system and each element of the model.
 Modeling each aspect of the system will seldom be required to make effective
decisions, and might result in excessive model execution time, in missed deadlines, or
in obscuring important system factors.
 A dog-food manufacturer had a consulting company build a simulation model of its
manufacturing line, which produced 1 million cans per day at a constant rate. Because
each can of food was represented by a separate entity in the model, the model was
very expensive to run and, thus, not very useful. A few years later the model was
rewritten, treating the manufacturing process as a “continuous flow”.
 The new model produced accurate results and executed in a small fraction of the time
necessary for the original model.
 A simulation model of a 1.5-mile-long factory was built in 1985 at a cost of $250,000.
However, the model was so detailed that no runs were ever made due to excessive
computer memory requirements. We now present some general guidelines for
determining the level of detail required by a simulation model.
• Carefully define the specific issues to be investigated by the study and the measures
of performance that will be used for evaluation.
 Models are not universally valid, but are designed for specific purposes. If the issues
of interest have not been delineated, then it is impossible to determine the appropriate
level of model detail.
A U.S. military analyst worked on a simulation model for six months without interacting
with the general who requested it. At the Pentagon briefing for the study, the general
walked out after 5 minutes stating, “That’s not the problem I’m interested in.”
• The entity moving through the simulation model does not always have to be the
same as the entity moving through the corresponding system .
Furthermore, it is not always necessary to model each component of the system in
complete detail
A large food manufacturer built a simulation model of its manufacturing line for
snack crackers. Initially, they tried to model each cracker as a separate entity, but
the computational requirements of the model made this approach infeasible. As a
result, the company was forced to use a box of crackers as the entity moving
through the model. The validity of this modeling approach was determined by using
sensitivity analysis .
• Use subject-matter experts (SMEs) and sensitivity analyses to help determine the
level of model detail. People who are familiar with systems similar to the one of
interest are asked what components of the proposed system are likely to be the
most important and, thus, need to be carefully modeled. Sensitivity analyses can
be used to determine what system factors (e.g., parameters or distributions) have
the greatest impact on the desired measures of performance. Given a limited
amount of time for model development, one should obviously concentrate on the
most important factors.
• A mistake often made by beginning modelers is to include an excessive amount of
model detail.
As a result, we recommend starting with a “moderately detailed” model, which can
later be embellished if needed.
The adequacy of a particular version of the model is determined in part by
presenting the model to SMEs and managers.

Regular interaction with these people also maintains their interest in the simulation
study. In one study of a manufacturing line, to gain credibility with these members of
the project team, it was necessary to include machine breakdowns and contention for
resources. Furthermore, after the initial model runs were made, it was necessary to
make additional changes to the model suggested by a mixer operator.
• Do not have more detail in the model than is necessary to address the issues of
interest, subject to the proviso that the model must have enough detail to be
credible. Thus, it may sometimes be necessary to include things in a model that are
not strictly required for model validity, due to credibility concerns.
• The level of model detail should be consistent with the type of data available. A
model used to design a new manufacturing system will generally be less detailed
than one used to fine-tune an existing system, since little or no data will be
available for a proposed system.
• In virtually all simulation studies, time and money constraints are a major factor in
determining the amount of model detail.
• If the number of factors of interest for the study is large, then use a "coarse" simulation
model or an analytic model to identify which factors have a significant impact on
system performance.

2.7 VERIFICATION OF SIMULATION COMPUTER PROGRAMS


 Some of these techniques may be used to debug any computer program, while others
we believe to be unique to simulation modeling.
 Technique in developing a simulation model, write and debug the computer program
in modules or subprograms.
 By way of example, for a 10,000-statement simulation model it would be poor
programming practice to write the entire program before attempting any debugging.
 When this large, untested program is finally run, it almost certainly will not execute,
and determining the location of the errors in the program will be extremely difficult.
 Instead, the simulation model’s main program and a few of the key subprograms
should be written and debugged first, perhaps representing the other required
subprograms as “dummies” or “stubs.” Next, additional subprograms or levels of detail
should be added and debugged successively, until a model is developed that
satisfactorily represents the system under study. In general, it is always better to start
with a “moderately detailed” model, which is gradually made as complex as needed,
than to develop “immediately” a complex model, which may turn out to be more
detailed than necessary and excessively expensive to run.
 For the multi-teller bank with jockeying, a good programming approach would be first
to write and debug the computer program without letting customers jockey from
queue to queue.
 It is advisable in developing large simulation models to have more than one person
review the computer program, since the writer of a particular subprogram may get
into a mental rut and, thus, may not be a good critic. In some organizations, this idea
is implemented formally and is called a structured walk-through of the program.
 For example, all members of the modeling team, say, systems analysts,
programmers, etc., are assembled in a room, and each is given a copy of a
particular set of subprograms to be debugged.
 Then the subprograms’ developer goes through the programs but does not proceed
from one statement to another until everyone is convinced that a statement is correct.
 Run the simulation under a variety of settings of the input parameters, and check to
see that the output is reasonable.
 In some cases, certain simple measures of performance may be computed exactly
and used for comparison. For example, if the average utilization from a simulation run is
close to the theoretical utilization factor ρ, there is some indication that the program may
be working correctly; a minimal check of this kind is sketched below.
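As a rough sketch in plain Python (the arrival rate and service rate are assumed values): simulate a single-server queue with exponential interarrival and service times and compare the observed server utilization with the theoretical value ρ = λ/μ.

import random

def utilization(lam, mu, num_customers, seed=1):
    """Simulate an M/M/1 queue and return the fraction of time the server is busy."""
    rng = random.Random(seed)
    arrival = 0.0        # arrival time of the current customer
    depart = 0.0         # departure time of the previous customer
    busy = 0.0           # total service (busy) time accumulated
    for _ in range(num_customers):
        arrival += rng.expovariate(lam)
        service = rng.expovariate(mu)
        start = max(arrival, depart)     # service begins when the server is free
        depart = start + service
        busy += service
    return busy / depart                 # busy time divided by total simulated time

lam, mu = 1.0, 1.25                      # assumed arrival and service rates
print("theoretical:", lam / mu, "simulated:", round(utilization(lam, mu, 200_000), 3))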
 One of the most powerful techniques that can be used to debug a discrete-event
simulation program is a “trace.”
 In a trace, the state of the simulated system, i.e., the contents of the event list, the
state variables, certain statistical counters, etc., are displayed just after each event
occurs and are compared with hand calculations to see if the program is operating as
intended.
 In performing a trace it is desirable to evaluate each possible program path as well
as the program’s ability to deal with “extreme” conditions.
 Sometimes such a thorough evaluation may require that special (perhaps
deterministic) input data be prepared for the model.
 Most simulation packages provide the capability to perform traces. A batch-mode
trace often produces a large volume of output, which must be checked event by event
for errors.
 it is usually preferable to use an interactive debugger to find programming errors.
 An interactive debugger allows an analyst to stop the simulation at a selected point
in time, and to examine and possibly change the values of certain variables.
 This latter capability can be used to “force” the occurrence of certain types of errors.
Many modern simulation packages have an interactive debugger.

2.8 TECHNIQUES FOR INCREASING MODEL VALIDITY AND CREDIBILITY


 Collect High-Quality Information and Data on the System In developing a simulation
model, the analyst should make use of all existing information, including the following:
Conversations with Subject-Matter Experts A simulation model is not an abstraction
developed by an analyst working in isolation; in fact, the modeler must work closely
with people who are intimately familiar with the system.
 There will never be one single person or document that contains all the information
needed to build the model.
 Therefore, the analyst will have to be resourceful to obtain a complete and accurate
set of information.
 Care must be taken to identify the true SMEs for each subsystem and to avoid
obtaining biased data.
 Ideally, SMEs should have some knowledge of simulation modeling, so that they
supply relevant information.
 The process of bringing all the system information together in one place is often
valuable in its own right, even if a simulation study is never performed.
 Note that since the specifications for a system may be changing during the course of
a simulation study, the modeler may have to talk to some SMEs on a continuing basis.
 For a manufacturing system, the modelers should obtain information from sources
such as machine operators, manufacturing and industrial engineers, maintenance
personnel, schedulers, managers, vendors, and blueprints.
 For a communications network, relevant people might include end-users, network
designers, technology experts, system administrators, application architects,
maintenance personnel, managers, and carriers.
 Observations of the System If a system similar to the one of interest exists, then data
should be obtained from it for use in building the model.
 These data may be available from historical records or may have to be collected during
a time study.
 Since the people who provide the data might be different from the simulation
modelers, it is important that the following two principles be followed:
• The modelers need to make sure that the data requirements (type, format,
amount, conditions under which they should be collected, why needed, etc.) are
specified precisely to the people who provide the data.
• The modelers need to understand the process that produced the data, rather
than treat the observations as just abstract numbers. The following are five potential
difficulties with data
• Data are not representative of what one really wants to model.
The data that have been collected during a military field test may not be
representative of actual combat conditions due to differences in troop behavior and
lack of battlefield smoke
• Data are not of the appropriate type or format.
In modeling a manufacturing system, the largest source of randomness is usually
random downtimes of a machine.
 Ideally, we would like data on time to failure (in terms of actual machine busy time)
and time to repair of a machine. Sometimes data are available on machine
breakdowns, but quite often they are not in the proper format.
 For example, the times to failure might be based on wall-clock time and include periods
that the machine was idle or off-shift.
 Data may contain measurement, recording, or rounding errors. EXAMPLE 2.18. Repair
times for military-aircraft components were often rounded to the nearest day, making
it impossible to fit a continuous probability distribution.
 Data may be “biased” because of self-interest.
 The maintenance department in an automotive factory reported the reliability of
certain machines to be greater than reality to make themselves look good.
• Data may have inconsistent units.
 The U.S. Transportation Command transports military cargo by air, land, and sea.
Sometimes there is confusion in building simulation models because the U.S. Air Force
and the U.S. Army use short tons (2000 pounds) while the U.S. Navy uses long tons
(2240 pounds). Existing Theory For example, if one is modeling a service system such
as a bank and the arrival rate of customers is constant over some time period, theory
tells us that the interarrival times of customers are quite likely to be IID exponential
random variables; in other words, customers arrive in accordance with a Poisson
process .
 Relevant Results from Similar Simulation Studies If one is building a simulation model
of a military ground encounter (as has been done many times in the past), then results
from similar studies should be sought out and used, if possible.
 Experience and Intuition of the Modelers It will often be necessary to use one's
experience or intuition to hypothesize how certain components of a complex system
operate, particularly if the system does not currently exist in some form.
 It is hoped that these hypotheses can be substantiated later in the simulation study.
 It is extremely important for the modeler to interact with the manager on a regular basis
throughout the course of the simulation study. This approach has the following
benefits:
 When a simulation study is initiated, there may not be a clear idea of the problem to
be solved. Thus, as the study proceeds and the nature of the problem becomes clearer,
this information should be conveyed to the manager, who may reformulate the study’s
objectives. Clearly, the greatest model for the wrong problem is invalid!
 The manager’s interest and involvement in the study are maintained.
 The manager’s knowledge of the system contributes to the actual validity of the model.
 The model is more credible since the manager understands and accepts the model’s
assumptions.
 The documentation of all model concepts, assumptions, algorithms, and data
summaries in a written assumptions document can greatly lessen this problem, and it
will also enhance the credibility of the model.
 The assumptions document should be written to be readable by analysts, SMEs, and
technically trained managers alike, and it should contain the following:
 An overview section that discusses overall project goals, the specific issues to be
addressed by the simulation study, model inputs, and the performance measures for
evaluation.
• Detailed descriptions of each subsystem in bullet format and how these subsystems
interact.
• What simplifying assumptions were made and why. Remember that a simulation model is
supposed to be a simplification or abstraction of reality.
• Limitations of the simulation model.
• Summaries of a data set such as its sample mean and a histogram. Detailed statistical
analyses or other technical material should probably be placed in appendices to the
report—remember that the assumptions document should be readable by technical
managers.
• Sources of important or controversial information (people, books, technical papers, etc.).
 There is a considerable danger that the simulation modeler will not obtain a complete
and correct description of the system.
 One way of dealing with this potential problem is to conduct a structured walk-through
of the assumptions document before an audience of SMEs and managers.
 Using a projection device, the simulation modeler goes through the assumptions
document bullet by bullet, but not proceeding from one bullet to the next until
everybody in the room is convinced that a particular bullet is correct and at an
appropriate level of detail.
 A structured walk-through will increase both the validity and the credibility of the
simulation model.
 The structured walk-through ideally should be held at a remote site (e.g., a hotel
meeting room), so that people give the meeting their full attention.
 Furthermore, it should be held prior to the beginning of programming in case major
problems are uncovered at the meeting.
 The assumptions document should be sent to participants prior to the meeting and
their comments requested.
 We do not, however, consider this to be a replacement for the structured walk-through
itself, since people may not have the time or motivation to review the document
carefully on their own.
 Furthermore, the interactions that take place at the actual meeting are invaluable.
[Within DoD the structured walk-through of the assumptions document (conceptual
model) is sometimes called conceptual model validation.] It is imperative that all key
members of the project team be present at the structured walk-through and that they
all take an active role.
 It is likely that many model assumptions will be found to be incorrect or to be missing
at the structured walk-through.
 Thus, any errors or omissions found in the assumptions document should be corrected
before programming begins.
 We now present two examples of structured walk-throughs, the first being very
successful and the other producing quite surprising but still useful results.
 The process resulted in several erroneous assumptions being discovered and
corrected, a few new assumptions being added, and some level-of-detail issues being
resolved.
 Furthermore, at the end of the meeting, all nine people felt that they had a valid
model! In other words, they had taken ownership of the model.
 Various people were assigned responsibilities to collect information on different parts
of the system.
 The collected information was used to update the assumptions document, and a
second walk-through was successfully performed.
 This experience pointed out the critical importance of having all key project members
present at the kickoff meeting.
 Some people think that the need for an assumptions document and its formal review
are just common sense.
 However, based on talking to literally thousands of simulation practitioners, we believe
that, perhaps, 75 percent of all simulation models have inadequate documentation
 if several sets of data have been observed for the “same” random phenomenon, then
the correctness of merging these data can be assessed by the Kruskal-Wallis test of
homogeneity of populations.
 If the data sets appear to be homogeneous, they can be merged and the combined
data set used for some purpose in the simulation model; a small example of such a check
is sketched below.
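A brief sketch using scipy's Kruskal-Wallis test (the three samples are illustrative stand-ins for data sets collected, say, on different days):

from scipy import stats

day1 = [4.2, 3.9, 5.1, 4.8, 4.5]     # hypothetical observations of the same quantity
day2 = [4.0, 4.4, 5.0, 4.6, 4.1]
day3 = [4.3, 4.7, 4.9, 5.2, 4.4]

stat, p_value = stats.kruskal(day1, day2, day3)
if p_value > 0.05:
    print("No evidence the data sets differ; merging them is reasonable.")
else:
    print("The data sets appear to differ; do not merge them.")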
 If a particular factor appears to be important, then it needs to be modeled carefully.
The following are examples of factors that could be investigated by a sensitivity
analysis:
• The value of a parameter
• The choice of a distribution
• The entity moving through the simulated system
• The level of detail for a subsystem
 A sensitivity analysis was performed, and it was found that using one-quarter of a
case of candy bars (150 candy bars) produced virtually the same simulation results
for the desired performance measure, cases produced per shift, while reducing the
execution time considerably.
 We developed a simulation model of the assembly and test area for a PC
manufacturing company. Later the company managers decided that they wanted to
run the model on their own computers, but the memory requirements of the model
were too great. As a result, we were forced to simplify greatly the model of the
assembly area to save computer memory. (The main focus of the simulation study
was the required capacity for the test area.)
 We ran the simplified simulation model (the model of the test area was unchanged)
and found that the desired performance measure, daily throughput, differed by only
2 percent from that of the original model.
 Thus, a large amount of detail was unnecessary for the assembly area. Note, however,
that the simplified model would not have been appropriate to study how to improve
the efficiency of the assembly area.
 On the other hand, it may not have been necessary to model the test area in this
case. When one is performing a sensitivity analysis, it is important to use the method
of common random numbers to control the randomness in the simulation.
 Otherwise, the effect of changing one factor may be confounded with other changes
(e.g., different random values from some input distribution) that inadvertently occur.
If one is trying to determine the sensitivity of the simulation output to changes in two or
more factors of interest, then it is not correct, in general, to vary one factor at a time while
setting the other factors to some arbitrary values.
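A minimal illustration of the common-random-numbers idea (the seed and the two mean service times are assumed values for the example): the same stream of U(0, 1) numbers drives both alternatives, so the observed difference reflects the parameter change rather than different random inputs.

import math, random

def total_service(mean_service, uniforms):
    # Inverse-transform the shared U(0,1) stream into exponential service times.
    return sum(-mean_service * math.log(1.0 - u) for u in uniforms)

rng = random.Random(123)
common_stream = [rng.random() for _ in range(10_000)]    # one stream, used for both designs

design_a = total_service(0.90, common_stream)            # design A: mean service 0.90 min
design_b = total_service(0.80, common_stream)            # design B: mean service 0.80 min
print(design_a - design_b)                               # difference is due only to the change in mean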
Unit III
SELECTION OF INPUT PROBABILITY DISTRIBUTIONS

3.1 Introduction:
 To carry out a simulation using random inputs such as inter arrival times or demand
sizes, we have to specify their probability distributions.
 For example, in the simulation of the single-server queuing system , the inter arrival
times were taken to be IID exponential random variables with a mean of 1 minute;
the demand sizes in the inventory simulation were specified to be 1, 2, 3, or 4 items
with respective probabilities 1/6, 1/3, 1/3, 1/6.
 Then, given that the input random variables to a simulation model follow particular
distributions, the simulation proceeds through time by generating random values from
these distributions.
 Almost all real-world systems contain one or more sources of randomness.
For example, a histogram of observed machine processing times
for an automotive manufacturer typically shows a longer right tail (positive skewness), with a
minimum value well above zero (approximately 25 minutes in one observed data set).
Histograms of real input data rarely have a symmetric shape like that of a normal
distribution, despite the fact that many simulation practitioners and simulation
books widely use normal input distributions.
 it is generally necessary to represent each source of system randomness by a
probability distribution (rather than just its mean) in the simulation model.
 The following example shows that failure to choose the "correct" distribution can affect the
accuracy of a model's results, sometimes drastically. A single-server queueing system (e.g., a
single machine in a factory) has exponential interarrival times with a mean of 1 minute.
Suppose that 200 service times are available from the system, but their underlying
probability distribution is unknown.
 We fitted the exponential, gamma, Weibull, lognormal, and normal distributions to the
observed service-time data.
 In the case of the exponential distribution, we chose the mean b so that the resulting
distribution most closely “resembled” the available data.
 We then made 100 independent simulation runs (i.e., different random numbers were
used for each run) of the queueing system, using each of the five fitted distributions.
 For the normal distribution, if a service time was negative, then it was generated again.
 Each of the 500 simulation runs was continued until 1000 delays in queue were
collected.
 The Weibull distribution actually provides the best model for the service-time data.
Thus, the average delay for the real system should be close to 4.36 minutes.
 The average delays for the normal and lognormal distributions are 6.04 and 7.19
minutes, respectively, corresponding to model output errors of 39 percent and 65
percent.
 This is particularly surprising for the lognormal distribution, since it has the same
general shape (i.e., skewed to the right) as the Weibull distribution.
 It turns out that the lognormal distribution has a “thicker” right tail, which allows larger
service times and delays to occur.
 The probability distributions can evidently have a large impact on the simulation output
and, potentially, on the quality of the decisions made with the simulation results.
 If it is possible to collect data on an input random variable of interest, these data can
be used in one of the following approaches to specify a distribution (in increasing order
of desirability):
1. The data values themselves are used directly in the simulation. For example, if the data
represent service times, then one of the data values is used whenever a service time is needed
in the simulation. This is sometimes called a trace-driven simulation.

2. The data values themselves are used to define an empirical distribution function in some
way. If these data represent service times, we would sample from this distribution when a
service time is needed in the simulation.

3. Standard techniques of statistical inference are used to "fit" a theoretical distribution form,
e.g., exponential or Poisson, to the data and to perform hypothesis tests to determine the
goodness of fit. If a particular theoretical distribution with certain values for its parameters is
a good model for the service-time data, then we would sample from this distribution when a
service time is needed in the simulation.
 Two drawbacks of approach 1 are that the simulation can only reproduce what has
happened historically and that there is seldom enough data to make all the desired
simulation runs.
 Approach 2 avoids these shortcomings since, at least for continuous data, any value
between the minimum and maximum observed data points can be generated . Thus,
approach 2 is generally preferable to approach 1.
 Approach 1 does have its uses. For example, suppose that it is desired to compare a
proposed material-handling system with the existing system for a distribution center.
 For each incoming order there is an arrival time, a list of the desired products, and a
quantity for each product. Modeling a stream of orders for a certain period of time
(e.g., for 1 month) will be difficult, if not impossible, using approach 2 or 3.
 Thus, in this case the existing and proposed systems will often be simulated using the
historical order stream.
 Approach 1 is also recommended for model validation when model output for an
existing system is compared with the corresponding output for the system itself.
 If a theoretical distribution can be found that fits the observed data reasonably well
(approach 3), then this will generally be preferable to using an empirical distribution
(approach 2) for the following reasons:
• An empirical distribution function may have certain “irregularities,” particularly if only
a small number of data values are available. A theoretical distribution, on the other
hand, “smooths out” the data and may provide information on the overall underlying
distribution.

If empirical distributions are used in the usual way, it is not possible to generate values
outside the range of the observed data in the simulation. This is unfortunate, since many
measures of performance for simulated systems depend heavily on the probability of an
“extreme” event’s occurring, e.g., generation of a very large service time. With a
fitted theoretical distribution, however, values outside the range of the observed data can be
generated.

There may be a compelling physical reason in some situations for using a certain theoretical
distribution form as a model for a particular input random variable. Even when we are
fortunate enough to have this kind of information, it is a good idea to use observed data to
provide empirical support for the use of this particular distribution.
• A theoretical distribution is a compact way of representing a set of data values.
Conversely, if n data values are available from a continuous distribution, then 2n values
(e.g., data and corresponding cumulative probabilities) must be entered and stored in
the computer to represent an empirical distribution in simulation packages. Thus, use
of an empirical distribution will be cumbersome if the data set is large.
• A theoretical distribution is easier to change. For example, suppose that a set of inter
arrival times is found to be modeled well by an exponential distribution with a mean
of 1 minute. If we want to determine the effect on the simulated system of increasing
the arrival rate by 10 percent, then all we have to do is to change the mean of the
exponential distribution to 0.909.
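As a small sketch of this last point (the "observed" interarrival times below are synthetic stand-ins): fit the exponential mean by its maximum-likelihood estimate, the sample mean, and then rescale it to raise the arrival rate by 10 percent.

import random, statistics

rng = random.Random(7)
observed = [rng.expovariate(1.0) for _ in range(500)]   # stand-in for collected interarrival times

mean_hat = statistics.fmean(observed)                   # MLE of the exponential mean
new_mean = mean_hat / 1.10                              # 10% higher arrival rate => smaller mean

def next_interarrival():
    # Sample an interarrival time from the refitted, rescaled exponential distribution.
    return rng.expovariate(1.0 / new_mean)

print(round(mean_hat, 3), round(new_mean, 3))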

3.2 Probability Distributions:


 The purpose of this section is to discuss a variety of distributions that have been found
to be useful in simulation modeling and to provide a unified listing of relevant
properties of these distributions .
 It provides a short discussion of common methods by which continuous distributions
are defined, or parameterized.

3.3 Continuous Distribution:


 For a given family of continuous distributions, e.g., normal or gamma, there are usually
several alternative ways to define, or parameterize, the probability density function.
 if the parameters are defined correctly, they can be classified, on the basis of their
physical or geometric interpretation, as being one of three basic types: location, scale,
or shape parameters.
 A location parameter γ specifies an abscissa (x axis) location point of a distribution’s
range of values;
 Usually γ is the midpoint (e.g., the mean m for a normal distribution) or lower endpoint
of the distribution’s range.
 As γ changes, the associated distribution merely shifts left or right without otherwise
changing.
 Also, if the distribution of a random variable X has a location parameter of 0, then the
distribution of the random variable Y = X + γ has a location parameter of γ.
 A scale parameter β determines the scale (or unit) of measurement of the values in
the range of the distribution.
 The standard deviation s is a scale parameter for the normal distribution. A change in
β compresses or expands the associated distribution without altering its basic form.
 If the distribution of the random variable X has a scale parameter of 1, then the
distribution of the random variable Y = βX has a scale parameter of β.
 A shape parameter α determines, distinct from location and scale, the basic form or shape
of a distribution within the general family of distributions of interest.
 A change in α generally alters a distribution's properties (e.g., skewness) more
fundamentally than a change in location or scale. Some distributions (e.g., exponential and
normal) do not have a shape parameter, while others (e.g., beta) may have two.
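A short illustration using scipy's gamma family (the parameter values are arbitrary): loc plays the role of the location parameter γ, scale the role of β, and a the role of the shape parameter α.

from scipy import stats

base      = stats.gamma(a=2.0, loc=0.0, scale=1.0)
shifted   = stats.gamma(a=2.0, loc=5.0, scale=1.0)   # location: the whole curve slides right by 5
stretched = stats.gamma(a=2.0, loc=0.0, scale=3.0)   # scale: values are stretched by a factor of 3
reshaped  = stats.gamma(a=0.5, loc=0.0, scale=1.0)   # shape: the basic form of the density changes

# The mean of a gamma distribution is loc + a * scale.
print(base.mean(), shifted.mean(), stretched.mean(), reshaped.mean())   # 2.0  7.0  6.0  0.5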

Continuous probability distribution: A probability distribution in which the random variable X


can take on any value (is continuous). Because there are infinite values that X could assume,
the probability of X taking on any one specific value is zero. Therefore we often speak in
ranges of values (p(X>0) = .50). The normal distribution is one example of a continuous
distribution. The probability that X falls between two values (a and b) equals the integral (area
under the curve) from a to b:
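In symbols, with f(x) denoting the density function:

P(a < X < b) = ∫ₐᵇ f(x) dx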
The Normal Probability Distribution

A probability distribution is formed from all possible outcomes of a random process (for a
random variable X) and the probability associated with each outcome. Probability
distributions may either be discrete (distinct/separate outcomes, such as number of children)
or continuous (a continuum of outcomes, such as height). A probability density function is
defined such that the likelihood of a value of X between a and b equals the integral (area
under the curve) between a and b. This probability is always positive. Further, we know that
the area under the curve from negative infinity to positive infinity is one.

The normal probability distribution, one of the fundamental continuous distributions of


statistics, is actually a family of distributions (an infinite number of distributions with differing
means (μ) and standard deviations (σ)). Because the normal distribution is a continuous
distribution, we cannot calculate an exact probability for a single outcome, but instead we calculate
a probability for a range of outcomes (for example the probability that a random variable X is
greater than 10).
The normal distribution is symmetric and centered on the mean (same as the median and
mode). While the x-axis ranges from negative infinity to positive infinity, nearly all of the X
values fall within +/- three standard deviations of the mean (99.7% of values), while ~68%
are within +/-1 standard deviation and ~95% are within +/- two standard deviations. This
is often called the three sigma rule or the 68-95-99.7 rule. The normal density function is
shown below.
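f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)), where μ is the mean and σ is the standard deviation.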

The standard normal probability distribution has a mean of
zero and a standard deviation of one. Oftentimes the x values of the standard normal
distribution are called z-scores. We can calculate probabilities using a normal distribution table
(z-table). It is important to note that in these
tables, the probabilities are the area to the LEFT of the z-score. If you need to find the area
to the right of a z-score (Z greater than some value), you need to subtract the value in the
table from one.

Using this table, we can calculate p(−1 < z < 1). To do so, first look up the probability that z is
less than negative one [p(z < −1) = 0.1587]. Because the normal distribution is symmetric,
we therefore know that the probability that z is greater than one also equals 0.1587 [p(z > 1)
= 0.1587]. To calculate the probability that z falls between −1 and 1, we take 1 − 2(0.1587)
= 0.6826. This is roughly 68% of the area under the
curve, which agrees with the three sigma rule stated earlier.
We can convert any and all normal distributions to the standard normal distribution using the
equation below. The z-score equals an X minus the population mean (μ) all divided by the
standard deviation (σ).
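In symbols: z = (X − μ) / σ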

Example Normal Problem


We want to determine the probability that a randomly selected blue crab has a weight greater
than 1 kg. Based on previous research we assume that the distribution of weights (kg) of
adult blue crabs is normally distributed with a population mean (μ) of 0.8 kg and a standard
deviation (σ) of 0.3 kg. How do we determine this probability? First, we calculate the z score
by replacing X with 1, the mean (μ) with 0.8 and standard deviation (σ) with 0.3. We calculate
our z-score to be (1-0.8)/0.3=0.6667. We can then look in our z table to determine the
p(z>0.6667) is roughly 1-0.748 (pulled from the chart, somewhere between 0.7454 and
0.7486) = 0.252. Therefore, based on our normality assumption, we conclude that the
likelihood that a randomly selected adult blue crab weighs more than one kilogram is roughly
25.2%.
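The same calculation can be checked with a short script using scipy's normal distribution functions (the parameter values are those assumed above):

from scipy import stats

mu, sigma = 0.8, 0.3                      # assumed mean and standard deviation of crab weights (kg)
z = (1.0 - mu) / sigma                    # z-score for a 1 kg crab
p = 1.0 - stats.norm.cdf(z)               # upper-tail probability from the standard normal
# equivalently: stats.norm.sf(1.0, loc=mu, scale=sigma)
print(round(z, 4), round(p, 4))           # roughly 0.6667 and 0.2525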
3.4 Discrete distribution:
What Is Discrete Distribution?
A discrete distribution is a probability distribution that depicts the occurrence of discrete
(individually countable) outcomes, such as 1, 2, 3... or zero vs. one. The binomial
distribution, for example, is a discrete distribution that evaluates the probability of a "yes"
or "no" outcome occurring over a given number of trials, given the event's probability in each
trial—such as flipping a coin one hundred times and having the outcome be "heads".

Statistical distributions can be either discrete or continuous. A continuous distribution is built


from outcomes that fall on a continuum, such as all numbers greater than 0 (which would
include numbers whose decimals continue indefinitely, such as pi = 3.14159265...). Overall,
the concepts of discrete and continuous probability distributions and the random
variables they describe are the underpinnings of probability theory and statistical analysis.

KEY TAKEAWAYS
 A discrete probability distribution counts occurrences that have countable or finite
outcomes.
 This is in contrast to a continuous distribution, where outcomes can fall anywhere on
a continuum.
 Common examples of discrete distribution include the binomial, Poisson, and Bernoulli
distributions.
 These distributions often involve statistical analyses of "counts" or "how many times"
an event occurs.
 In finance, discrete distributions are used in options pricing and forecasting market
shocks or recessions.
Understanding Discrete Distribution
Distribution is a statistical concept used in data research. Those seeking to identify the
outcomes and probabilities of a particular study will chart measurable data points from a
data set, resulting in a probability distribution diagram. There are many types of probability
distribution diagram shapes that can result from a distribution study, such as the normal
distribution ("bell curve").
Statisticians can identify the development of either a discrete or continuous distribution by
the nature of the outcomes to be measured. Unlike the normal distribution, which is
continuous and accounts for any possible outcome along the number line, a discrete
distribution is constructed from data that can only follow a finite or discrete set of outcomes.

Discrete distributions thus represent data that has a countable number of outcomes, which
means that the potential outcomes can be put into a list. The list may be finite or infinite.
For example, when studying the probability distribution of a die with six numbered sides the
list is {1, 2, 3, 4, 5, 6}. A binomial distribution is built from trials with just two possible outcomes:
zero or one—for instance, flipping a coin gives you the list {Heads, Tails}. The Poisson
distribution is a discrete distribution that counts the frequency of occurrences as integers,
whose list {0, 1, 2, ...} can be infinite.

Distributions must be either discrete or continuous.


Examples of Discrete Distribution

The most common discrete probability distributions include binomial, Poisson, Bernoulli,
and multinomial.

The Poisson distribution is also commonly used to model financial count data where the tally
is small and is often zero. For one example, in finance, it can be used to model the number
of trades that a typical investor will make in a given day, which can be 0 (often), or 1, or 2,
etc. As another example, this model can be used to predict the number of "shocks" to the
market that will occur in a given time period, say over a decade.

Another example where such a discrete distribution can be valuable for businesses
is inventory management. Studying the frequency of inventory sold in conjunction with a
finite amount of inventory available can provide a business with a probability distribution
that leads to guidance on the proper allocation of inventory to best utilize square footage.

The binomial distribution is used in options pricing models that rely on binomial trees. In a
binomial tree model, the underlying asset can only be worth exactly one of two possible
values—with the model, there are just two possible outcomes with each iteration—a move
up or a move down with defined probabilities.

Discrete distributions can also be seen in the Monte Carlo simulation. Monte Carlo simulation
is a modeling technique that identifies the probabilities of different outcomes through
programmed technology. It is primarily used to help forecast scenarios and identify risks. In
Monte Carlo simulation, outcomes with discrete values will produce discrete distributions for
analysis. These distributions are used in determining risk and trade-offs among different
items being considered.
3.5 Hypothesizing Families Of Distributions:
 The first step in selecting a particular input distribution is to decide what general
families—e.g., exponential, normal, or Poisson—appear to be appropriate on the
basis of their shapes, without worrying (yet) about the specific parameter values for
these families.
 It describes some general techniques that can be used to hypothesize families of
distributions that might be representative of a simulation input random variable.
 In some situations, use can be made of prior knowledge about a certain random
variable's role in a system to select a modeling distribution or at least rule out some
distributions;
 This is done on theoretical grounds and does not require any data at all.
 For example, if we feel that customers arrive to a service facility one at a time, at a
constant rate, and so that the numbers of customers arriving in disjoint time intervals
are independent, then there are theoretical reasons for postulating that the interarrival
times are IID exponential random variables.
 Several discrete distributions—binomial, geometric, and negative binomial—were
developed from a physical model.
 The range of a distribution can rule it out as a modeling distribution. Service times, for
example, should not be generated directly from a normal distribution (at least in
principle), since a random value from any normal distribution can be negative.
 The proportion of defective items in a large batch should not be assumed to have a
gamma distribution, since proportions must be between 0 and 1, whereas gamma
random variables have no upper bound.
 Information should be used whenever available, but confirming the postulated
distribution with data is also strongly recommended.
 In practice, we seldom have enough of this kind of theoretical prior information to
select a single distribution, and the task of hypothesizing a distribution family from
observed data is somewhat less structured.
 In the remainder of this section, we discuss various heuristics, or guidelines, that can
be used to help one choose appropriate families of distributions.

3.5.1 Summary statistics


Summary statistics helps us summarize statistical information.
Let's consider an example to understand this better.
A school conducted a blood donation camp.
The blood groups of 30 students were recorded.
We can represent this data in a tabular form, with one column listing each blood group and
another listing the number of students having it.

This table is known as a frequency distribution table.


You can observe that all the collected data is organized under two columns.
This makes it easy for us to understand the given information.
Thus, summary statistics condenses the data to a simpler form so that it is easy for us to
observe its features at a glance.

Summary Statistics
Let us first understand the meaning of summary statistics.

Definition of Summary Statistics: Summary statistics is a part of descriptive statistics that


summarizes and provides the gist of the information about the sample data.
Summary statistics deals with summarizing statistical information.

This indicates that we can efficiently use summary statistics to quickly get the gist of
the information.

Statistics generally deals with the presentation of information quantitatively or visually.


"Summary statistics" is a part of descriptive statistics.

Descriptive statistics deals with the collection, organization, summaries, and presentation of
data.

What Is a Summary Statistics Table?


Big data related to population, economy, stock prices, and unemployment needs to be
summarized systematically to interpret it correctly.

It is usually done using a summary statistics table.


The summary table is a visual representation that summarizes statistical information about
the data in a tabular form.
Here are a few summary statistics about a certain country:
 The population of the country now stands at 1,351,800.
 60% of people describe their health as very good or excellent.
 20,800 have immigrated into the country while 21,500 people emigrated out of the
country.
 The per capita gross annual pay now stands at $21,000.
 There were 105,023 recorded crimes.
 Unemployment is at 2.8%.

How Do you Explain Summary Statistics?


Summary statistics is a part of descriptive statistics that summarizes and provides the gist of
information about the sample data.
Statisticians commonly try to describe and characterize the observations by finding:
 a measure of location, or central tendency, such as the arithmetic mean
 a measure of statistical dispersion, such as the standard deviation or mean absolute deviation
 a measure of the shape of the distribution like skewness
 if more than one variable is measured, a measure of statistical dependence such as a
correlation coefficient

How Do you Analyze Summary Statistics?


In a class, the collection of scores obtained by 30 students is the description of data collected.

To find the mean of the data, we will need to find the average marks of 30 students. If the
average marks obtained by 30 students is 75 out of 100, then we can derive a conclusion or
give judgment about the performance of the students on the basis of this result.

1. Summary statistics helps us get the gist of the information instantly.


2. Statisticians describe the observations using the following measures.
 Measure of location, or central tendency: arithmetic mean
 Measure of statistical dispersion: standard deviation or mean absolute deviation
 Measure of the shape of the distribution: skewness
 Measure of statistical dependence: correlation coefficient

Measures of Location
The arithmetic mean, median, mode, and inter quartile mean are the common measures of
location or central tendency.

Measures of Spread
Standard deviation, range, variance, absolute deviation, inter quartile range, distance
standard deviation, etc. are the common measures of spread/dispersion.

The coefficient of variation (CV) is a statistical measure of the relative spread of data points
around the mean.
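A compact sketch of these measures in Python (the dog heights are from the earlier example; the paired weights are made up for illustration):

import numpy as np
from scipy import stats

heights = np.array([600, 470, 170, 430, 300])        # dog heights (mm) from the earlier example
weights = np.array([40.0, 31.0, 9.0, 28.0, 17.0])    # hypothetical paired weights (kg)

print(np.mean(heights), np.median(heights))          # measures of location
print(np.std(heights), np.std(heights, ddof=1))      # population vs. sample standard deviation
print(np.std(heights) / np.mean(heights))            # coefficient of variation (CV)
print(stats.skew(heights))                           # shape of the distribution
print(np.corrcoef(heights, weights)[0, 1])           # correlation between the two variables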

Graphs / charts
Some of the graphs and charts frequently used in the statistical representation of the data
are given below.
Graphs:
 Line graph
 Bar graph
 Histogram
 Scatter plot
 Frequency distribution graph
Charts:
 Flow chart
 Pie chart

3.5.2 Histograms:
What Is a Histogram?
A histogram is a graphical representation of data points organized into user-specified ranges.
Similar in appearance to a bar graph, the histogram condenses a data series into an easily
interpreted visual by taking many data points and grouping them into logical ranges or bins.

KEY TAKEAWAYS
 A histogram is a bar graph-like representation of data that buckets a range of classes
into columns along the horizontal x-axis.
 The vertical y-axis represents the number count or percentage of occurrences in the
data for each column
 Columns can be used to visualize patterns of data distributions.
 In trading, the MACD histogram is used by technical analysts to indicate changes in
momentum.
 The MACD histogram columns can give earlier buy and sell signals than the
accompanying MACD and signal lines.

How Histograms Work


Histograms are commonly used in statistics to demonstrate how many of a certain type of
variable occur within a specific range.
For example, a census focused on the demography of a town may use a histogram to show
how many people are between the ages of zero - 10, 11 - 20, 21 - 30, 31 - 40, 41 - 50, 51
-60, 61 - 70, and 71 - 80.

This histogram example would look similar to the chart below. Let's say the numerals along
the vertical axis represent thousands of people. To read this histogram example, you can
start with the horizontal axis and see that, beginning on the left, there are approximately
500 people in the town who are less than one year old up to 10 years old. There are 4,000
people in town who are 11 to 20 years old. And so on.
Histograms can be customized in several ways by analysts. They can change the interval
between buckets. In the example referenced above, there are eight buckets with an interval
of ten. This could be changed to four buckets with an interval of 20.
Another way to customize a histogram is to redefine the y-axis. The most basic label used is
the frequency of occurrences observed in the data. However, one could also use percentage
of total or density instead.
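As a brief illustration of these choices (bucket width and the frequency label on the y-axis), the sketch below builds the age histogram in Python; the age data and bin edges are assumptions made for the example:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages of town residents (illustrative data only)
rng = np.random.default_rng(42)
ages = rng.integers(0, 81, size=10_000)

# Eight buckets with an interval of ten, as in the census example above
bins = np.arange(0, 90, 10)
counts, edges = np.histogram(ages, bins=bins)
print(counts)                              # frequency of each age bucket

plt.hist(ages, bins=bins, edgecolor="black")
plt.xlabel("Age")
plt.ylabel("Number of people")
plt.title("Age distribution (histogram)")
plt.show()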
Histograms vs. Bar Charts

Both histograms and bar charts provide a visual display using columns, and people often use
the terms interchangeably. Technically, however, a histogram represents the frequency
distribution of variables in a data set. A bar graph typically represents a graphical comparison
of discrete or categorical variables.
UNIT-IV
GENERATING RANDOM VARIATES

4.1 Random Numbers:


A random number is a number chosen from some specified distribution in such a way that the
selection of a large set of these numbers reproduces the underlying distribution.
Properties of Random Numbers:
1. Uniformity: The random numbers generated should be uniform; that is, every part of
the interval should be equally probable.

If N random numbers are divided into K class intervals, then the expected
number of observations in each class interval is e = N / K.

2. Independence: Each random number should be an independent sample drawn from a
continuous uniform distribution between 0 and 1.

The probability density function is given by

        f(x) = 1,  0 ≤ x ≤ 1
        f(x) = 0,  otherwise
3. Maximum Density: Within a given range, a large number of samples should be generated;
that is, the random numbers should fill the interval densely, without large gaps.

4. Maximum Cycle: The sequence of generated numbers should repeat only after a very
large number of values (a long cycle length).

4.2 Generation of pseudo-random numbers:


Pseudo-random numbers: Pseudo-random numbers are random numbers that are
generated by some known method so as to produce a sequence of numbers in [0,1]
that simulates the ideal properties of random numbers.
Pseudo-random numbers are not completely random, since the set of numbers can be
replicated because a known method is used.

The problems associated with pseudo random numbers are:


1. The generated numbers might not be uniformly distributed.
2. The generated numbers might be discrete valued instead of continuous valued.
3. The mean of the generated numbers might be too high or too low.
4. The variance of the generated numbers might be too high or too low.
5. There might be presence of correlation between the generated numbers.

Considerations of Generating Random Numbers:


 The method used to generate random number should be fast because the simulation
problem requires a large set of random numbers which can increase time complexity
of the system.
 The method used should be portable to different platforms and programming
languages so as to generate the same results wherever it is executed.
 The method should have long cycle.
 The random numbers should be replicable. It means that the same set of random
numbers should be generated with same starting point.
 The generated random numbers should approximate the uniformity and
independence properties.
4.3 Techniques for generating random numbers:
Linear Congruential Method:
The linear congruential method produces a sequence of integers X1, X2, X3, . . . between zero and
m − 1 according to the following recursive relationship:

        Xi+1 = (a Xi + c) mod m,   i = 0, 1, 2, . . .
Here,
 The initial value x0 is called the seed;
 a is called the constant multiplier;
 c is the increment
 m is the modulus
For example,
The sequence obtained when X0 = a = c = 7, m = 10, is
7, 6, 9, 0, 7, 6, 9, 0...
This example shows that the sequence is not always "random" for all choices of X0, a, c, and m;
choosing these values appropriately is the most important part of this method.
 When c is not equal to 0, the form is called the mixed congruential method;
 When c is equal to 0, the form is known as the multiplicative congruential
method.
Combined or Mixed Congruential method:
Combining two or more multiplicative congruential generators may increase the length of
the period and improve other statistical properties.
Procedure for generating Random Numbers using Linear Congruential Method:
 Choose the seed value X0, Modulus parameter m, Multiplier term a, and increment term
c.
 Initialize the required amount of random numbers to generate (say, an integer
variable noOfRandomNums).
 Define a storage to keep the generated random numbers (here, vector is considered) of
size noOfRandomNums.
 Initialize the 0th index of the vector with the seed value.
 For rest of the indexes follow the Linear Congruential Method to generate the random
numbers.
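A minimal Python sketch of this procedure follows; the long-period parameter set in the second call is a common textbook choice and is only an assumption for illustration:

def lcg(seed, a, c, m, n):
    """Generate n pseudo-random numbers in [0, 1) with a linear congruential generator."""
    xs = [seed]                            # the 0th index holds the seed
    for _ in range(n - 1):
        xs.append((a * xs[-1] + c) % m)    # X(i+1) = (a*X(i) + c) mod m
    return [x / m for x in xs]             # scale the integers to [0, 1)

# Example from the text: X0 = a = c = 7, m = 10 gives the short cycle 7, 6, 9, 0, ...
print(lcg(seed=7, a=7, c=7, m=10, n=8))

# Multiplicative generator with a long period (assumed parameters: a = 7**5, c = 0, m = 2**31 - 1)
print(lcg(seed=123457, a=7**5, c=0, m=2**31 - 1, n=5))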

4.4 Tests for Random Numbers:


When a random-number generator is devised, its properties need to be tested. Two
properties are considered when testing random numbers: uniformity and
independence.
The following are different tests used for testing random numbers:
1. Frequency test
2. Runs test
3. Autocorrelation test
4. Gap test
5. Poker test

 The first test checks for uniformity; the second through fifth test for independence.

1. Frequency test:
• The frequency test is a test of uniformity.
• Two different methods used in the frequency test are
a. Kolmogorov-Smirnov test and
b. chi-square test.
• Both these two tests measure the agreement between the distribution of a sample of
generated random numbers and the theoretical uniform distribution.
• Both tests are based on the null hypothesis of no significant difference between the
sample distribution and the theoretical distribution

The Kolmogorov-Smirnov test:


This test compares the cdf of the uniform distribution, F(x), to the empirical cdf, SN(x), of the
sample of N observations.
The following are the properties of the Kolmogorov-Smirnov test:
• F(x) = x, 0 ≤ x ≤ 1
• As N becomes larger, SN(x) should become close to F(x)
• The Kolmogorov-Smirnov test is based on the statistic

        D = max | F(x) − SN(x) |

• Here D is a random variable whose sampling distribution is tabulated.


• If the calculated D value is greater than the ones listed in the Table, the hypothesis (no
disagreement between the samples and the theoretical value) should be rejected; otherwise,
we don't have enough information to reject it.
The following steps are taken to perform the test:

1. Rank the data from smallest to largest: R(1) ≤ R(2) ≤ . . . ≤ R(N)

2. Compute D+ = max{ i/N − R(i) } and D− = max{ R(i) − (i−1)/N }, for 1 ≤ i ≤ N

3. Compute D = max(D+, D−)

4. Determine the critical value, Dα, from the Kolmogorov-Smirnov table for the specified significance
level α and the given sample size N.
5. If the sample statistic D is greater than the critical value Dα, the null hypothesis
that the sample data come from a uniform distribution is rejected; if D ≤ Dα, then
there is no evidence to reject it.

Example: Suppose N=5 numbers: 0.44, 0.81, 0.14, 0.05,0.93


Step 1: rank the numbers from smallest to largest.

i                 1      2      3      4      5
R(i)              0.05   0.14   0.44   0.81   0.93
i/N               0.20   0.40   0.60   0.80   1.00
i/N − R(i)        0.15   0.26   0.16   -      0.07
R(i) − (i−1)/N    0.05   -      0.04   0.21   0.13

(A dash marks a negative value, which cannot be the maximum.)

Step 2: D+ = 0.26 and D− = 0.15


Step 3: D = max(D+, D-) = 0.26
Step 4: For α = 0.05,
Dα = 0.565 > D = 0.26
Hence, H0 is not rejected.
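The same calculation can be scripted; the short Python sketch below reproduces this worked example, using the tabulated critical value Dα = 0.565 quoted in Step 4:

import numpy as np

def ks_uniformity(sample):
    """Kolmogorov-Smirnov statistic D for uniformity on [0, 1]."""
    r = np.sort(np.asarray(sample))
    n = len(r)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - r)            # max{ i/N - R(i) }
    d_minus = np.max(r - (i - 1) / n)     # max{ R(i) - (i-1)/N }
    return max(d_plus, d_minus)

sample = [0.44, 0.81, 0.14, 0.05, 0.93]
d = ks_uniformity(sample)
print(f"D = {d:.2f}")                                      # 0.26, as in the example
print("Reject H0" if d > 0.565 else "Do not reject H0")    # D_alpha = 0.565 for N = 5, alpha = 0.05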

Chi-square test:
The chi-square test is a statistical procedure for determining the difference between
observed and expected data. It can also be used to determine whether two categorical
variables in the data are related, i.e. whether a difference between them is due to chance
or to a genuine relationship. The test statistic is

        χ0² = Σ (Oi − Ei)² / Ei,   the sum taken over i = 1, . . . , n
where n is the number of classes (e.g. intervals),


Oi is the number of samples observed in the interval,
Ei is expected number of samples in the interval.
If the sample size is N, in a uniform distribution, Ei =N/n

Some of the uses of the Chi-Squared test:


 The Chi-squared test can be used to see if your data follows a well-known theoretical
probability distribution like the Normal or Poisson distribution.
 The Chi-squared test allows you to assess your trained regression model's goodness
of fit on the training, validation, and test data sets.

There are two main types of Chi-Square tests namely -


1. Independence
2. Goodness-of-Fit

1. Independence
The Chi-Square Test of Independence is an inferential statistical test which examines whether
two categorical variables are likely to be related to each other or not. This test is used when
we have counts of values for two nominal or categorical variables and is considered a
non-parametric test. A relatively large sample size and independence of observations are the
required criteria for conducting this test.
For Example-
In a movie theatre, suppose we made a list of movie genres. Let us consider this as the first
variable. The second variable is whether or not the people who came to watch those genres
of movies bought snacks at the theatre. Here the null hypothesis is that the genre of the
film and whether people bought snacks or not are unrelated. If this is true, the movie
genres don't impact snack sales.
2. Goodness-Of-Fit
In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a
variable is likely to come from a given distribution or not. We must have a set of data values
and an idea of the distribution of this data. We can use this test when we have value counts
for categorical variables. This test provides a way of deciding whether the data values have a
"good enough" fit to our idea, or whether they are a representative sample of the entire population.
For Example
Suppose we have bags of balls with five different colours in each bag. The given condition is
that each bag should contain an equal number of balls of each colour. The idea we would like
to test here is that the proportions of the five colours of balls in each bag match this specification.

Example 1: Suppose we want to know whether gender has anything to do with political party
preference. We poll 440 voters in a simple random sample to find out which political party
they prefer. The results of the survey are shown in the table below:

To see if gender is linked to political party preference, perform a Chi-Square test of


independence using the steps below.

Step 1: Define the Hypothesis


H0: There is no link between gender and political party preference.
H1: There is a link between gender and political party preference.

Step 2: Calculate the Expected Values

Calculate the expected frequency of each cell as

        E = (row total × column total) / grand total

For example, the expected value for Male Republicans is the Male row total multiplied by the
Republican column total, divided by the overall total of 440.

Similarly, you can calculate the expected value for each of the cells.

Step 3: Calculate (O − E)² / E for Each Cell in the Table

Calculate (O − E)² / E for each cell in the table,


Where
O = Observed Value
E = Expected Value
Step 4: Calculate the Test Statistic χ²

χ² is the sum of all the values computed in Step 3:

        χ² = 0.743 + 2.05 + 2.33 + 3.33 + 0.384 + 1 = 9.837
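To complete the test (a step not shown above): assuming the six cells form a 2 × 3 table (gender by party), the degrees of freedom are (2 − 1)(3 − 1) = 2, the 5 percent critical value is 5.991, and since 9.837 > 5.991 the null hypothesis of no link would be rejected. For this course, the more relevant application is the chi-square frequency test for uniformity of random numbers; a minimal Python sketch follows (the number of classes and the sample are illustrative assumptions):

import numpy as np

def chi_square_uniformity(sample, n_classes=10):
    """Chi-square frequency test statistic for uniformity of numbers in [0, 1)."""
    observed, _ = np.histogram(sample, bins=n_classes, range=(0.0, 1.0))
    expected = len(sample) / n_classes          # Ei = N / n for a uniform distribution
    return np.sum((observed - expected) ** 2 / expected)

rng = np.random.default_rng(0)
sample = rng.random(100)                        # 100 numbers to be tested
chi2 = chi_square_uniformity(sample)
print(f"chi-square statistic = {chi2:.3f}")
# Compare with the tabulated critical value for n_classes - 1 = 9 degrees of freedom,
# e.g. 16.92 at the 5 percent significance level.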

2. Runs Test:
1. Runs up and down
 The runs test examines the arrangement of numbers in a sequence to test the
hypothesis of independence.
 A run is defined as a succession of similar events preceded and followed by a
different event.
E.g. in a sequence of tosses of a coin, we may have
{H T T H H T T T H T}
The first toss is preceded and the last toss is followed by a "no event". This
sequence has six runs: the first with a length of one, the second and third with length two,
the fourth with length three, and the fifth and sixth with length one.
 A few features of a run
o two characteristics: number of runs and the length of run
o an up run is a sequence of numbers each of which is succeeded by a larger
number; a down run is a sequence of numbers each of which is succeeded by
a smaller number
 If a sequence of numbers has too few runs, it is unlikely to be a truly random sequence.
E.g. 0.08, 0.18, 0.23, 0.36, 0.42, 0.55, 0.63, 0.72, 0.89, 0.91: the sequence
has one run, an up run. It is not likely to be a random sequence.
 If a sequence of numbers has too many runs, it is unlikely to be a truly random sequence.
E.g. 0.08, 0.93, 0.15, 0.96, 0.26, 0.84, 0.28, 0.79, 0.36, 0.57: it has nine
runs, five up and four down. It is not likely to be a random sequence.
 If a is the total number of runs in a truly random sequence, the mean and variance
of a are given by

        E(a) = (2N − 1) / 3        and        V(a) = (16N − 29) / 90

For N > 20, the distribution of a is reasonably approximated by a normal
distribution, N(E(a), V(a)). Converting it to a standardized normal
distribution gives the test statistic

        Z0 = [ a − (2N − 1)/3 ] / sqrt( (16N − 29)/90 )

 Failure to reject the hypothesis of independence occurs when −z(α/2) ≤ Z0 ≤ z(α/2),

 where α is the level of significance.
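A compact Python sketch of the runs-up-and-down test based on these formulas (the sample of 40 numbers is an illustrative assumption):

import numpy as np
from scipy.stats import norm

def runs_up_down_test(sample, alpha=0.05):
    """Runs-up-and-down test for independence; returns (a, Z0, reject?)."""
    x = np.asarray(sample)
    signs = np.sign(np.diff(x))               # +1 for an "up" step, -1 for a "down" step
    a = 1 + np.count_nonzero(np.diff(signs))  # number of runs = number of sign changes + 1
    n = len(x)
    mean_a = (2 * n - 1) / 3                  # E(a)
    var_a = (16 * n - 29) / 90                # V(a)
    z0 = (a - mean_a) / np.sqrt(var_a)
    reject = abs(z0) > norm.ppf(1 - alpha / 2)
    return a, z0, reject

sample = np.random.default_rng(7).random(40)
a, z0, reject = runs_up_down_test(sample)
print(f"runs = {a}, Z0 = {z0:.3f}, reject H0: {reject}")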

2. Runs above and below the mean:


 The previous test for runs up and runs down is important, but it is not
adequate to assure that the sequence is random.
 Let n1 and n2 be the number of individual observations above and below the mean,
and let b be the total number of runs (N = n1 + n2).
 For a given n1 and n2, the mean and variance of b can be expressed as

        E(b) = 2 n1 n2 / N + 1/2        and        V(b) = 2 n1 n2 (2 n1 n2 − N) / [ N² (N − 1) ]

 For either n1 or n2 greater than 20, b is approximately normally distributed, and the
test statistic is

        Z0 = ( b − E(b) ) / sqrt( V(b) )

Failure to reject the hypothesis of independence occurs when −z(α/2) ≤ Z0 ≤ z(α/2),

where α is the level of significance.
3. Runs test: length of runs.
 E.g.: 0.16, 0.27, 0.58, 0.63, 0.45, 0.21, 0.72, 0.87, 0.27, 0.15, 0.92, 0.85...
If the same pattern continues, two numbers below average, two numbers above
average, it is unlikely a random number sequence. But this sequence will pass other
tests.
 We need to test the randomness of the length of runs.
 Let Yi be the number of runs of length i in a sequence of N numbers. E.g. if the
above sequence stopped at 12 numbers (N = 12), then, counting runs above and below the
mean, Y2 = 6 and all other Yi = 0. Obviously Yi is a random variable. Among the various kinds
of runs, the expected value of Yi for runs up and down is given by

        E(Yi) = (2 / (i + 3)!) [ N(i² + 3i + 1) − (i³ + 3i² − i − 4) ]   for i ≤ N − 2
        E(Yi) = 2 / N!                                                    for i = N − 1

For the number of runs above and below the mean, also random variables, the expected
value of Yi is approximated by

        E(Yi) ≈ N wi / E(I)

where E(I) is the approximate expected length of a run and wi is the approximate
probability that a run has length i.
 wi is given by

        wi = (n1/N)^i (n2/N) + (n2/N)^i (n1/N)

 E(I) is given by

        E(I) = n1/n2 + n2/n1

 The approximate expected total number of runs (of all lengths) in a sequence of
length N is given by

        E(A) = N / E(I)

(total number divided by expected run length).

 The appropriate test is the chi-square test with Oi being the observed number of
runs of length i:

        χ0² = Σ [ Oi − E(Yi) ]² / E(Yi),   summed over i = 1, . . . , L

where L = N − 1 for runs up and down, and L = N for runs above and below the mean.

3. Auto-correlation:
The tests for auto-correlation are concerned with the dependence between numbers in a
sequence.
 The test computes the autocorrelation between every m numbers (m is also known
as the lag), starting with the ith number.
Thus the autocorrelation ρ_im between the following numbers would be of interest:

        R(i), R(i+m), R(i+2m), . . . , R(i + (M+1)m)

The value M is the largest integer such that i + (M + 1)m ≤ N, where N is the
total number of values in the sequence.
E.g. for N = 17, i = 3, m = 4, the positions of interest are 3, 7, 11, 15 (M = 2).
The reason we require M + 1 instead of M is that we need at least two
numbers to test (M = 0) the autocorrelation.
 Since a non-zero autocorrelation implies a lack of independence, the following
two-tailed test is appropriate:

        H0: ρ_im = 0        H1: ρ_im ≠ 0

 For large values of M, the distribution of the estimator of ρ_im, denoted ρ̂_im, is
approximately normal if the values R(i), R(i+m), . . . are uncorrelated.
 Form the test statistic

        Z0 = ρ̂_im / σ(ρ̂_im)

which is distributed normally with a mean of zero and a variance of one.

 The formulas for ρ̂_im and its standard deviation are

        ρ̂_im = [ 1/(M+1) ] Σ R(i+km) R(i+(k+1)m) − 0.25,   the sum taken over k = 0, . . . , M
        σ(ρ̂_im) = sqrt(13M + 7) / [ 12(M + 1) ]

 After computing Z0, do not reject the null hypothesis of independence if −z(α/2) ≤ Z0 ≤ z(α/2),
where α is the level of significance.
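A short Python sketch of this test using the formulas above; the starting index i, the lag m and the sample are illustrative assumptions:

import numpy as np
from scipy.stats import norm

def autocorrelation_test(r, i, m, alpha=0.05):
    """Test the lag-m autocorrelation starting at the ith number (1-based, as in the text)."""
    r = np.asarray(r)
    n = len(r)
    big_m = (n - i) // m - 1                   # largest M with i + (M+1)m <= N
    idx = i - 1 + m * np.arange(big_m + 2)     # 0-based positions of R(i), R(i+m), ..., R(i+(M+1)m)
    rho_hat = np.sum(r[idx[:-1]] * r[idx[1:]]) / (big_m + 1) - 0.25
    sigma = np.sqrt(13 * big_m + 7) / (12 * (big_m + 1))
    z0 = rho_hat / sigma
    reject = abs(z0) > norm.ppf(1 - alpha / 2)
    return rho_hat, z0, reject

sample = np.random.default_rng(11).random(30)
rho, z0, reject = autocorrelation_test(sample, i=3, m=5)
print(f"rho_hat = {rho:.4f}, Z0 = {z0:.3f}, reject H0: {reject}")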


4. Gap Test: The gap test is used to determine the significance of the interval between
recurrence of the same digit.
 A gap of length x occurs between recurrences of the same digit.
 The probability of a particular gap length can be determined by a Bernoulli trial.

If we are only concerned with the digits 0 through 9, then

        P(gap of length x) = P(x non-occurrences followed by one occurrence) = (0.9)^x (0.1),   x = 0, 1, 2, . . .

 The theoretical cumulative frequency distribution for randomly ordered digits is given by

        F(x) = 0.1 Σ (0.9)^n  (summed over n = 0, . . . , x)  = 1 − (0.9)^(x+1)
The following are the Steps involved in the test.


Step 1.
Specify the cdf for the theoretical frequency distribution based on the selected class
interval width (See Table 8.6 for an example).
Step 2.
Arrange the observed sample of gaps in a cumulative distribution with these same
classes.

Step 3.
Find D, the maximum deviation between F(x) and SN(x).
Step 4.
Determine the critical value, Dα, from Table A.8 for the specified value of α and
the sample size N.
Step 5.
If the calculated value of D is greater than the tabulated value of Dα, the null
hypothesis of independence is rejected.

5. Poker Test
 The poker test for independence is based on the frequency in which certain digits are
repeated in a series of numbers.
 For example 0.255, 0.577, 0.331, 0.414, 0.828, 0.909, 0.303, 0.001... In each case,
a pair of like digits appears in the number.
 In a three digit number, there are only three possibilities.
1. The individual digits can be all different. Case 1.
2. The individual digits can all be the same. Case 2.
3. There can be one pair of like digits. Case 3.
 P(case 1) = P(second differ from the first) * P(third differ from the first and second)
= 0.9 * 0.8 = 0.72
P(case 2) = P(second the same as the first) * P(third same as the first) = 0.1 * 0.1 =
0.01
P(case 3) = 1 - 0.72 - 0.01 = 0.27
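A brief Python sketch of the three-digit poker test: each number is classified by its digit pattern, and the observed counts are compared with the expected counts 0.72N, 0.01N and 0.27N using the chi-square statistic (the sample of 1000 numbers is an illustrative assumption):

import numpy as np

def poker_test_3digit(sample):
    """Three-digit poker test: returns observed pattern counts and the chi-square statistic."""
    observed = np.zeros(3)                        # [all different, all the same, one pair]
    for u in sample:
        digits = f"{u:.3f}"[2:5]                  # first three decimal digits
        distinct = len(set(digits))
        if distinct == 3:
            observed[0] += 1                      # case 1: all different
        elif distinct == 1:
            observed[1] += 1                      # case 2: all the same
        else:
            observed[2] += 1                      # case 3: exactly one pair
    n = len(sample)
    expected = n * np.array([0.72, 0.01, 0.27])   # probabilities derived above
    chi2 = np.sum((observed - expected) ** 2 / expected)
    return observed, chi2

sample = np.random.default_rng(3).random(1000)
obs, chi2 = poker_test_3digit(sample)
print(obs, f"chi2 = {chi2:.3f}")                  # compare with the critical value 5.99 (2 d.o.f., alpha = 0.05)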
4.5 Inverse-Transform Technique:
 The inverse transform technique can be used to sample from exponential, the
uniform, the Weibull and the triangle distributions.
 The basic principle is to find the inverse function of F, written F^-1, such
that F(F^-1(R)) = R.
 F^-1 denotes the solution of the equation r = F(x) in terms of r, not 1/F.
For example, the inverse of y = x is x = y; the inverse of y = 2x + 1 is x = (y − 1)/2.
The inverse-transform technique can be used in principle for any distribution.
• It is most useful when the cdf F(x) has an inverse F^-1 which is easy to compute.

The following are the steps involved in Inverse-Transform Technique


1. Compute the CDF of the desired random variable X
2. Set F(X) = R on the range of X
3. Solve the equation F(X) = R for X in terms of R
4. Generate uniform random numbers R1, R2, R3 ... and compute the desired random
variate by Xi = F-1(Ri)

Examples of other distributions for which inverse CDF works are:


• Uniform distribution
• Weibull distribution
• Triangular distribution
Uniform Distribution:
• Random variable X uniformly distributed over [a, b], with cdf

        F(x) = (x − a) / (b − a),   a ≤ x ≤ b

Setting F(X) = R and solving for X gives

        X = a + (b − a) R
Weibull Distribution:
 Probability density function (scale parameter α, shape parameter β):

        f(x) = (β / α^β) x^(β−1) e^(−(x/α)^β),   x ≥ 0

Step 1.
cdf:

        F(x) = 1 − e^(−(x/α)^β)   for x ≥ 0

Step 2.
Let F(X) = R, i.e. 1 − e^(−(X/α)^β) = R.
Step 3.
Solving for X in terms of R yields

        X = α [ −ln(1 − R) ]^(1/β)

Step 4.
Generate R uniformly distributed on (0,1), feeding it to the expression in Step 3 to
get X.
Triangular Distribution:
Consider, for example, a triangular distribution on [0, 2] with mode at 1.
Step 1.
cdf:

        F(x) = x²/2                   for 0 ≤ x ≤ 1
        F(x) = 1 − (2 − x)²/2         for 1 < x ≤ 2

Step 2.
Let F(X) = R for the two ranges 0 ≤ R ≤ 1/2 and 1/2 < R ≤ 1.

Step 3.
Solving for X in terms of R gives

        X = sqrt(2R)                  for 0 ≤ R ≤ 1/2
        X = 2 − sqrt(2(1 − R))        for 1/2 < R ≤ 1
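A minimal Python sketch of these inverse-transform generators; the exponential case uses the standard result X = −(1/λ) ln(1 − R), and the parameter values in the demonstration are assumptions made for illustration:

import numpy as np

rng = np.random.default_rng(5)

def uniform_variate(a, b, r):
    """Uniform on [a, b]: X = a + (b - a) R"""
    return a + (b - a) * r

def exponential_variate(lam, r):
    """Exponential with rate lambda: X = -(1/lambda) ln(1 - R)"""
    return -np.log(1.0 - r) / lam

def weibull_variate(alpha, beta, r):
    """Weibull (scale alpha, shape beta): X = alpha * (-ln(1 - R))**(1/beta)"""
    return alpha * (-np.log(1.0 - r)) ** (1.0 / beta)

def triangular_variate(r):
    """Triangular on [0, 2] with mode 1, as derived above."""
    return np.where(r <= 0.5, np.sqrt(2.0 * r), 2.0 - np.sqrt(2.0 * (1.0 - r)))

r = rng.random(5)                      # uniform random numbers R1, ..., R5
print(uniform_variate(2, 10, r))
print(exponential_variate(0.5, r))
print(weibull_variate(alpha=1.5, beta=2.0, r=r))
print(triangular_variate(r))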

4.6 Acceptance-Rejection Technique to Generate Random Variate


 The following steps are used to generate random numbers uniformly distributed
between 1/4 and 1.
Step 1.
Generate a random number R.

Step 2a.
If R ≥ 1/4, accept X = R, go to Step 3.

Step 2b.
If R < 1/4, reject R, return to Step 1.

Step 3.
If another uniform random variate on [1/4, 1] is needed, repeat the procedure
beginning at Step 1. Otherwise stop.
o Is the random variate generated by the above method indeed uniformly
distributed over [1/4, 1]? The answer is yes. Take any value x with 1/4 ≤ x ≤ 1:

        P(X ≤ x) = P(R ≤ x | 1/4 ≤ R ≤ 1) = (x − 1/4) / (3/4)

which is the correct probability for a uniform distribution on [1/4, 1].


o Efficiency: using this method in this particular example, the rejection probability
is 1/4 on average for each number generated. The number of rejections is a
geometrically distributed random variable with probability of "success"
p = 3/4, so the mean number of rejections is (1/p − 1) = 4/3 − 1 = 1/3 (i.e. 1/3 waste).
o For this reason, the inverse transform (X = 1/4 + (3/4) R) is the more efficient
method here.
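A tiny Python sketch of this acceptance-rejection loop, together with the more efficient inverse-transform alternative mentioned above:

import numpy as np

rng = np.random.default_rng(9)

def uniform_quarter_one_rejection():
    """Acceptance-rejection: keep drawing R until R >= 1/4."""
    while True:
        r = rng.random()              # Step 1: generate R
        if r >= 0.25:                 # Step 2a: accept X = R
            return r
        # Step 2b: reject R and try again

def uniform_quarter_one_inverse(r):
    """Inverse transform for the same distribution: X = 1/4 + (3/4) R."""
    return 0.25 + 0.75 * r

print([round(uniform_quarter_one_rejection(), 3) for _ in range(5)])
print(uniform_quarter_one_inverse(rng.random(5)))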

 Poisson Distribution
o pmf:

        P(N = n) = e^(−α) α^n / n!,   n = 0, 1, 2, . . .

where N can be interpreted as the number of arrivals in one unit of time and α is the mean
number of arrivals per unit time.

o From the original Poisson-process definition, we know the inter-arrival
times A1, A2, . . . are exponentially distributed with mean 1/α, i.e. α arrivals in
one unit of time.
o Relation between the two distributions:

        N = n   if and only if   A1 + A2 + . . . + An ≤ 1 < A1 + A2 + . . . + An + An+1

essentially this means that if there are n arrivals in one unit of time, the sum of the inter-
arrival times of the past n observations has to be less than or equal to one, but if
one more inter-arrival time is added, it is greater than one (unit time).
o The Ai's in the relation can be generated from uniformly distributed random
numbers Ri by Ai = −(1/α) ln Ri, thus the condition becomes

        Σ −(1/α) ln Ri ≤ 1 < Σ −(1/α) ln Ri      (left sum over i = 1, . . . , n; right sum over i = 1, . . . , n+1)

multiplying both sides by −α (which reverses the inequalities) and exponentiating gives

        R1 R2 . . . Rn ≥ e^(−α) > R1 R2 . . . Rn Rn+1
Now we can use the acceptance-rejection method to generate Poisson variates.
Step 1.
Set n = 0, P = 1.
Step 2.
Generate a random number Rn+1 and replace P by P · Rn+1.
Step 3.
If P < e^(−α), then accept N = n, meaning that in this unit of time there are n arrivals. Otherwise, reject the
current n, increase n by one, and return to Step 2.
o Efficiency: How many random numbers will be required, on the average, to
generate one Poisson variate, N? If N = n, then n + 1 random numbers are
required (because the product involves n + 1 random numbers), so on average α + 1
random numbers are needed per variate.

When α is large, say α ≥ 15, the acceptance-rejection technique described
here becomes too expensive, and a normal distribution can be used to approximate the Poisson
distribution. When α is large,

        Z = (N − α) / sqrt(α)

is approximately normally distributed with mean 0 and variance 1, thus

        N = ⌈ α + sqrt(α) Z − 0.5 ⌉   (rounded up to an integer, and set to 0 if negative)

can be used to generate a Poisson random variate.
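A short Python sketch of the acceptance-rejection procedure for Poisson variates; the value α = 3 and the sample size are illustrative assumptions:

import math
import numpy as np

rng = np.random.default_rng(21)

def poisson_variate(alpha):
    """Acceptance-rejection generation of one Poisson(alpha) variate."""
    n, p = 0, 1.0                  # Step 1: set n = 0, P = 1
    threshold = math.exp(-alpha)
    while True:
        p *= rng.random()          # Step 2: multiply P by a new random number
        if p < threshold:          # Step 3: accept N = n
            return n
        n += 1                     # otherwise increase n and return to Step 2

samples = [poisson_variate(3.0) for _ in range(10_000)]
print(np.mean(samples))            # should be close to alpha = 3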

4.7 Special properties of Generating Random Variates


The following methods exploit special properties of distributions to generate random variates.
1. Convolution method (the variate is generated as a sum of simpler variates):
Example:
i) A Binomial variate is a sum of Bernoulli trials
ii) An Erlang variate is a sum of Exponentials
2. Polar coordinates method
Example:
Normal distribution (the Box-Muller / polar method)
3. Exploiting relationships between distributions
UNIT-V

OUTPUT DATA ANALYSIS FOR A SINGLE SYSTEM

5.1 Introduction to Output Data Analysis


Output analysis is the modeling stage concerned with designing replications, computing
statistics from them and presenting them in textual or graphical format. Output analysis
focuses on the analysis of simulation results (output statistics). It provides the main value-
added of the simulation enterprise by trying to understand system behavior and generate
predictions for it.

Characteristics of output data analysis:


Replication design: A good design of simulation replications allows the analyst to obtain the
most statistical information from simulation runs for the least computational cost. In
particular, we seek to minimize the number of replications and their length, and still obtain
reliable statistics.

Estimation of performance metrics: Replication statistics provide the data for computing
point estimates and confidence intervals for system parameters of interest. Critical estimation
issues are the size of the sample to be collected and the independence of observations used
to compute statistics, particularly confidence intervals.

System analysis and experimentation: Statistical estimates are used in turn to


understand system behavior and generate performance predictions under various scenarios,
such as different input parameters (parametric analysis), scenarios of operation, and so on.
Experimentation with alternative system designs can elucidate their relative merits and
highlight design trade-offs

EXAMPLE: Consider a bank with five tellers and one queue, which opens its doors at 9 a.m.,
closes its doors at 5 p.m., but stays open until all customers in the bank at 5 p.m. have been
served. Assume that customers arrive in accordance with a Poisson process at rate 1 per
minute (i.e., IID exponential inter arrival times with mean 1 minute), that service times are
IID exponential random variables with mean 4 minutes, and that customers are served in a
FIFO manner. Table 5.1 shows several typical output statistics from 10 independent
replications of a simulation of the bank, assuming that no customers are present initially.

Table 5.1: 10 independent replications of a simulation of the bank


5.2 Transient and Steady-state Behaviour Of a Stochastic Process
Transient state: the behaviour of the output process at discrete time i for given initial conditions I.
The density of the random variable Yi varies from one replication to another and from one time index to another.

Steady state: the distribution of the random variable from a particular point onward is approximately
the same from time index to time index, and it no longer depends on the initial conditions I.
Since most simulations are stochastic in nature, their output can vary from run to run due to
random chance. We typically need to analyze our results over many runs. The analysis is
affected by the type of outputs. They generally fall into two categories of behaviors for a
stochastic process.

Transient behaviour:
Indicated by a simulation with a specific termination event (ex: runs for X minutes, or runs
until C customers have been processed, or runs until inventory is exhausted etc.)

Steady-state behaviour
Indicated by a simulation that runs over a very long period of simulated time, or with no
stated stop event.
Consider the output stochastic process Y1, Y2, . . . . Let Fi(y | I) = P(Yi ≤ y | I) for i = 1, 2, . . . ,
where y is a real number and I represents the initial conditions used to start the simulation
at time 0. [The conditional probability P(Yi ≤ y | I) is the probability that the event {Yi ≤ y}
occurs given the initial conditions I.] For a manufacturing system, I might specify the number of jobs
present, and whether each machine is busy or idle, at time 0. We call Fi(y | I) the transient
distribution of the output process at (discrete) time i for the initial conditions I.

The density functions for the transient distributions corresponding to the random variables Yi1,
Yi2, Yi3, and Yi4 are shown in Fig. 5.1 for a particular set of initial conditions I
and increasing time indices i1, i2, i3, and i4, where it is assumed that the random variable Yij
has density function fYij. The density specifies how the random variable Yij can vary from
one replication to another. In particular, suppose that we make a very large number of
replications, n, of the simulation and observe the stochastic process Y1, Y2, . . . on each one.
If we make a histogram of the n observed values of the random variable Yij, then this
histogram (when appropriately scaled) will look very much like the density fYij. For fixed y
and I, the probabilities F1(y | I), F2(y | I), . . . are just a sequence of numbers. If Fi(y | I) → F(y)
as i → ∞ for all y and for any initial conditions I, then F(y) is called the steady-state
distribution of the output process Y1, Y2, . . . . The steady-state distribution F(y) is only obtained
in the limit as i → ∞; in practice, however, there is often a finite time index beyond which the
distributions of the Yi's are approximately the same as each other.
Fig. 5.1 Transient and steady-state density functions for a particular stochastic
process Y1, Y2, . . . and initial conditions I.
Example: Consider the stochastic process D1, D2, . . . for the M/M/1 queue with ρ = 0.9 (arrival
rate λ = 1, service rate ω = 10/9), where Di is the delay in queue of the ith customer. In Fig. 5.2 we plot the
convergence of the transient mean E(Di) to the steady-state mean d = E(D) = 8.1.

Fig. 5.2 E(Di) as a function of i and the number in system at time 0, s, for the
M/M/1 queue with ρ = 0.9.
Table 5.2 summarizes the differences between the transient and steady-state behaviour of
stochastic processes.
5.3 Types of Simulation with respect to output analysis
The options available in designing and analyzing simulation experiments depend on the type
of simulation at hand, as depicted in Fig. 5.3. Simulations may be either terminating or non
terminating, depending on whether there is an obvious way for determining the run length.
Terminating simulation:
 Runs for some duration of time TE, where E is a specified event that stops the
simulation.
 Starts at time 0 under well-specified initial conditions.
 Ends at the stopping time TE.
 Bank example: Opens at 8:30 am (time 0) with no customers present and 8 of the 11
tellers working (initial conditions), and closes at 4:30 pm (time TE = 480 minutes).
 The simulation analyst chooses to consider it a terminating system because the object
of interest is one day’s operation.
A non terminating simulation is one that executes continuously.

Examples
1. A retail/commercial establishment, e.g. a bank, has working hours 9 to 5; the objective is to
measure the quality of customer service during this specified 8-hour period. Here the initial
condition is the number of customers present at time 0 (which must be specified).
2. An aerospace manufacturer receives a contract to produce 100 airplanes, which must be
delivered within 18 months.
3. A company that sells a single product would like to decide how many items to hold in
inventory during a 120-month planning horizon. Given some initial inventory level, the objective
is to determine how much to order each month so as to minimize the expected average cost
per month of the inventory system.
4. Consider a manufacturing company that operates 16 hours a day (two shifts) with work in
process carrying over from one day to the next. Would this qualify as a terminating
simulation with E = {16 hours of simulated time have elapsed}? No, since this
manufacturing operation is essentially a continuous process, with the ending conditions for
one day being the initial conditions for the next day.

Non-terminating simulation:
A non-terminating simulation is a system that runs continuously, or at least over a very long period
of time. It starts at simulation time 0 under initial conditions defined by the analyst and runs
for some analyst-defined period of time TE. A steady-state simulation is a simulation whose
objective is to study the long-run behaviour of a non-terminating system.

Example: Consider a company that is going to build a new manufacturing system and would
like to determine the long-run (steady-state) mean hourly throughput of their system after it
has been running long enough for the workers to know their jobs and for mechanical
difficulties to have been worked out. Assume that:
(a) The system will operate 16 hours a day for 5 days a week.
(b) There is negligible loss of production at the end of one shift or at the beginning of the next
shift .
(c) There are no breaks (e.g., lunch) that shut down production at specified times each day.
Fig 5.3: Types of simulations with regard to output analysis
This system could be simulated by “pasting together” 16-hour days, thus ignoring the system
idle time at the end of each day and on the weekend. Let Ni be the number of parts
manufactured in the ith hour. If the stochastic process N1, N2, . . . has a steady-state
distribution with corresponding random variable N, then we are interested in estimating the
mean ν = E(N).

stochastic processes for most real systems do not have steady-state distributions, since
the characteristics of the system change over time. For example, in a manufacturing system
the production-scheduling rules and the facility layout (e.g., number and location of
machines) may change from time to time.
A simulation model (which is an abstraction of reality) may have steady-state distributions,
since characteristics of the model are often assumed not to change over time.

Example: If the manufacturing company wanted to know the time required for the system to
go from startup to operating in a "normal" manner, this would be a terminating simulation
with terminating event E = {the simulated system is running "normally"} (if such an event can be defined).
Thus, a simulation for a particular system might be either terminating or non terminating,
depending on the objectives of the simulation study.
Consider a stochastic process Y1, Y2, . . . for a non terminating simulation that does not have
a steady-state distribution. Suppose that we divide the time axis into equal-length, contiguous
time intervals called cycles.

Let YiC be a random variable defined on the ith cycle, and assume that Y1C, Y2C, . . . are
comparable. Suppose that the process Y1C, Y2C, . . . has a steady-state distribution FC and
that YC ~ FC. Then a measure of performance is said to be a steady-state cycle parameter if it
is a characteristic of YC, such as the mean νC = E(YC). Thus, a steady-state cycle parameter is
just a steady-state parameter of the appropriate cycle process Y1C, Y2C, . . . .

Example: Suppose for the manufacturing system , there is a half-hour lunch break at the
beginning of the fifth hour in each 8-hour shift. Then the process of hourly throughputs N1,
N2, . . . has no steady-state distribution. Let Ni C be the average hourly throughput in the ith
8-hour shift (cycle).

For a non terminating simulation, suppose that the stochastic process Y1, Y2, . . .does
not have a steady-state distribution, and that there is no appropriate cycle definition such
that the corresponding process Y1C, Y2C, . . . has a steady-state distribution.
This can occur, for example, if the parameters for the model continue to change over time.
For instance, if the arrival rate of calls changes from week to week and from year to year,
then steady-state (cycle) parameters will probably not be well defined. In these cases,
however, there will typically be a fixed amount of data describing how input parameters
change over time. This provides, in effect, a terminating event E for the simulation and, thus,
the analysis techniques for terminating simulations are appropriate.

5.4 Statistical analysis for terminating simulations


Consider n independent replications of a terminating simulation, where each replication is
terminated by the event E and is begun with the "same" initial conditions. The independence
of replications is accomplished by using different random numbers for each replication.

Estimating Means
Suppose that we want to obtain a point estimate and confidence interval for the mean μ = E(X),
where X is a random variable defined on a replication as described above. Make n independent
replications of the simulation and let X1, X2, . . . , Xn be the resulting IID random variables.
Then the sample mean X̄(n) = (X1 + X2 + . . . + Xn)/n is an unbiased point estimator
for μ, and an approximate 100(1 − α) percent (0 < α < 1) confidence interval for μ is given by

        X̄(n) ± t(n−1, 1−α/2) sqrt( S²(n) / n )                -----(1)

where t(n−1, 1−α/2) is the upper 1 − α/2 critical point of the t distribution with n − 1 degrees of
freedom and the sample variance is S²(n) = Σ [Xj − X̄(n)]² / (n − 1); this is the fixed-sample-size
procedure.

Example:
For the bank, suppose that we want to obtain a point estimate and an approximate 90
percent confidence interval for the expected average delay of a customer over a day, which
is given by E(X), where X = (D1 + D2 + . . . + DN)/N and D1, . . . , DN are the delays of the
N customers served in a day.
Thus, subject to the correct interpretation to be given to confidence intervals , we can claim
with approximately 90 percent confidence that E(X) is contained in the interval [1.71, 2.35]
minutes
For the inventory system, suppose that we want to obtain a point estimate and an
approximate 95 percent confidence interval for the expected average cost over the 120-
month planning horizon, which is given by E(X), where X = (C1 + C2 + . . . + C120)/120 and
Ci is the cost in month i.

We made 10 independent replications and obtained the following Xj's:

129.35  127.11  124.03  122.13  120.44
118.39  130.17  129.77  125.52  133.75

which resulted in

        X̄(10) = 126.07,   S²(10) = 23.55

and the 95 percent confidence interval

        126.07 ± 3.47, or alternatively, [122.60, 129.54]
The estimated coefficient of variation, a measure of variability, is 0.04 for the inventory
system and 0.27 for the bank model. Thus the Xj’s for the bank model are inherently more
variable than those for the inventory system.
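The inventory-system interval above can be reproduced with a few lines of Python using formula (1):

import numpy as np
from scipy.stats import t

x = np.array([129.35, 127.11, 124.03, 122.13, 120.44,
              118.39, 130.17, 129.77, 125.52, 133.75])
n = len(x)
xbar = x.mean()
s2 = x.var(ddof=1)                              # sample variance S^2(n)
half = t.ppf(0.975, n - 1) * np.sqrt(s2 / n)    # t(n-1, 0.975) * sqrt(S^2(n)/n)

print(f"mean = {xbar:.2f}, S^2 = {s2:.2f}")     # 126.07 and 23.55
print(f"95% CI: [{xbar - half:.2f}, {xbar + half:.2f}]")   # approximately [122.60, 129.54]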

The decision to perform a terminating or non-terminating simulation has less to do with the
nature of the system than it does with the behaviour of interest. A terminating simulation is
one in which the simulation starts at a defined state or time and ends when it reaches
some other defined state or time.

The common goal is to estimate the expected value of an output random variable defined on a
replication, θ = E(Y).
In general, independent replications are used, each run using a different random number
stream and independently chosen initial conditions.

Statistical Background:
 Important to distinguish within-replication data from across replication data
 For example, simulation of a manufacturing system
o Two performance measures of that system: cycle time for parts and work in
process (WIP).
o Let Yij be the cycle time for the j-th part produced in the i-th replication
o Across-replication data are formed by summarizing within-replication data into Ȳi, the
sample mean of the i-th replication
The following are the properties of Across replication and within replication data.

Across-replication data:
 One summary value per replication (e.g. Ȳ1, Ȳ2, . . . , ȲR); these are discrete-time data.
Within-replication data:
 Observations recorded inside a single replication (e.g. Yi1, Yi2, . . . for cycle times); such data
may also be continuous-time data, such as the WIP level over time.

Within-replication data are not independent and not identically distributed, whereas across-
replication data are independent and identically distributed.

Confidence Intervals with Specified Precision


The half-length H of a 100(1 − α)% confidence interval for a mean θ, based on the t
distribution, is given by

        H = t(α/2, R−1) sqrt( S² / R )

where R is the number of replications and S² is the sample variance.
Suppose that an error criterion ε is specified with probability 1 − α, i.e. we want
P(|Ȳ − θ| < ε) ≈ 1 − α; a sufficiently large sample size should then satisfy

        R ≥ ( z(α/2) S0 / ε )²

where S0² is an initial estimate of the variance.
Example:
Call Center Example: estimate the agent's utilization ρ over the first 2 hours of the workday.
An initial sample of size R0 = 4 is taken, and the initial estimate of the population variance is
S0² = (0.072)² = 0.00518.

The error criterion is ε = 0.04 and the confidence coefficient is 1 − α = 0.95; hence, the final
sample size must be at least

        ( z(0.025) S0 / ε )² = (1.96 × 0.072 / 0.04)² = 12.45

• For the final sample size, choose the smallest R (starting from 13) for which the t-based
half-length t(0.025, R−1) sqrt(S0²/R) is at most ε = 0.04; checking R = 13, 14, 15 gives
half-lengths of roughly 0.043, 0.042 and 0.040, so R = 15 replications are required.
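A brief Python sketch of this sample-size search, using the values from the example (and scipy for the normal and t quantiles):

import numpy as np
from scipy.stats import norm, t

s0_sq = 0.072 ** 2          # initial variance estimate from R0 = 4 replications
eps = 0.04                  # error criterion
alpha = 0.05

# Normal-theory starting point: R >= (z_{alpha/2} * S0 / eps)^2
r_min = int(np.ceil((norm.ppf(1 - alpha / 2) * np.sqrt(s0_sq) / eps) ** 2))

# Increase R until the t-based half-length drops to eps or below
r = max(r_min, 2)
while t.ppf(1 - alpha / 2, r - 1) * np.sqrt(s0_sq / r) > eps:
    r += 1

print(r_min, r)             # 13 as the starting point, 15 as the final sample size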
