COSC 416: SIMULATION METHODOLOGY
COURSE MATERIAL
ACKNOWLEDGEMENT
We acknowledge the use of the Courseware of the National Open University of
Nigeria (NOUN) as the primary resource. Internal reviewers at Ahmadu Bello
University have also been duly listed.
COPYRIGHT PAGE
© 2018 Ahmadu Bello University (ABU) Zaria, Nigeria
All rights reserved. No part of this publication may be reproduced in any form or by
any means, electronic, mechanical, photocopying, recording or otherwise without the
prior permission of the Ahmadu Bello University, Zaria, Nigeria.
ISBN:
Published and printed in Nigeria by:
Ahmadu Bello University Press Ltd.
Ahmadu Bello University,
Zaria, Nigeria.
Tel: +234
E-mail:
COURSE WRITERS/DEVELOPMENT TEAM
COURSE STUDY GUIDE
i. COURSE INFORMATION
Course Code : COSC416
Course Title: Simulation Methodology
Credit Units: 3CU
Year of Study: Four
Semester: Second
Description:
This course highlights methodology and approaches in the conduct of simulation. In
the course of your studies, you will be taken through the definitions of common terms
relating to modelling and simulation, and through the methodology, theories,
experiments and languages used in conducting simulations.
iii. COURSE PREREQUISITES
You should note that although this course has no subject pre-requisite, you are
expected to have:
1. a satisfactory level of English proficiency;
2. basic computer operations proficiency;
3. an introductory course in Computer Science;
4. an introductory course in mathematical statistics;
5. good knowledge of some programming language (BASIC, FORTRAN,
Python, C, C++, Java or MATLAB) is an advantage but not mandatory.
iv. COURSE TEXTS AND FURTHER READING
Aldrich, C. (2004). Simulations and the Future of Learning: An Innovative (and Perhaps Revolutionary) Approach to e-Learning. San Francisco: Pfeiffer – John Wiley & Sons.
Percival, F., Lodge, S., & Saunders, D. (1993). The Simulation and Gaming Yearbook: Developing Transferable Skills in Education and Training.
Headrick, T. (2002). Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Computational Statistics and Data Analysis, 40(4), 685–711.
Karian, Z., & Dudewicz, E. (1998). Modern Statistical Systems and GPSS Simulation. CRC Press.
Korn, G. (2005). Real statistical experiments can use simulation-package software. Simulation Modelling Practice and Theory, 13(1), 39–54.
Cochran, W. G. (1977), as cited in Bartlett, J. E., et al., Organizational Research: Determining Appropriate Sample Size in Survey Research.
Lewis, P., & Orav, E. (1989). Simulation Methodology for Statisticians, Operations Analysts, and Engineers. Wadsworth Inc.
Robert, C., & Casella, G. (1999). Monte Carlo Statistical Methods. Springer.
Gross, D., & Harris, C. M. (1998). Fundamentals of Queuing Theory. Wiley.
Lazowska, E. D., Zahorjan, J., Graham, G. S., & Sevcik, K. C. (1984). Quantitative System Performance: Computer System Analysis Using Queuing Network Models. Prentice-Hall, Inc. http://www.cs.washington.edu/homes/lazowska/qsp/
Ross, S. M. (2007). Introduction to Probability Models (9th ed.). Academic Press.
Winston, W. L. (1991). Operations Research: Applications and Algorithms (2nd ed.). Boston: PWS-Kent Publishing.
Hillier, F. S., & Lieberman, G. J. (1995). Introduction to Operations Research (6th ed.). McGraw-Hill, Inc.
Karlin, S., & Taylor, H. M. (1981). A Second Course in Stochastic Processes. Academic Press, p. 432.
Brink, D. (2010). Essentials of Statistics. Ventus Publishing ApS. http://bookboon.com/int/student/statistics/statistics-essentials.pdf
Zukerman, M. Introduction to Queuing Theory and Stochastic Teletraffic Models. http://www.ee.cityu.edu.hk/~zukerman/classnotes.pdf
Taha, H. A. (1987). SIMNET simulation language. In Proceedings of the 19th Conference on Winter Simulation. ISBN 0-911801-32-4. doi:10.1145/318371.318410
Bajpai, A. C., Mustoe, L. R., & Walker, D. (1974). Engineering Mathematics. John Wiley. pp. 678–683.
Characteristics of Queuing Systems. Accessed 14-02-2006 at http://www.bsbpa.umkc.edu/classes/ashley/Chaptr14/sld007.htm
Online resources:
http://en.wikipedia.org/wiki/Probability_distribution
http://en.wikipedia.org/wiki/Probability#Mathematical_treatment
http://www.highbeam.com/doc/1G1-17387259.html
http://www.icaen.uiowa.edu/~kuhl/SoftEng/Slides5.pdf
http://en.wikipedia.org/wiki/Finite_element_method#History
http://www.answers.com/topic/hp-fem
http://www.answers.com/topic/extended-finite-element-method
http://www.answers.com/topic/finite-element-method#History
http://www.answers.com/library/Sci-Tech Encyclopedia-cid-2823403
http://www.answers.com/topic/meshfree-methods
http://en.wikipedia.org/wiki/Data_model
http://www.idiagram.com/ideas/visual_models.html
http://en.wikipedia.org/wiki/Inferential_statistics
http://en.wikipedia.org/wiki/Poisson_distribution
http://en.wikipedia.org/wiki/Queuing_theory
http://en.wikipedia.org/wiki/Queuing_theory#cite_note-flood-7
http://en.wikipedia.org/wiki/Computer_language
http://en.wikipedia.org/wiki/Simulation_language
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=185606
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=129494
http://en.wikipedia.org/wiki/SIMSCRIPT_II.5
v. COURSE AIM
This course aims to introduce students to the basics, concepts and features of
Modelling and Simulation. It is believed this knowledge will enable the reader
to understand and appreciate decision processes, especially for critical and costly
(in life, money, etc.) endeavours. It will help the reader to understand
the level of disparity between outcome and reality, and why we require a
decision analysis and support tool to enable us to evaluate, compare and
optimize alternatives.
8. Explain the properties of a good random number generator;
9. Explain the use of the congruential method for generating random numbers;
10. Choose appropriate parameters for the congruential method;
11. Translate the method to computer programs;
12. Use other very similar random number generating methods, such as the mid-square, mid-product and Fibonacci methods;
13. Describe Monte Carlo method
14. Trace the origin of Monte Carlo method
15. Give examples of the application of Monte Carlo method
16. Define Statistics
17. Explain Statistical Distributions
18. Compute measures of Central Tendency and Variations
19. Explain the Components of Statistical Distributions
20. Explain the role of probability distribution functions in simulations
21. Describe Probability theory
22. Explain the fundamental concepts of Probability theory
23. Explain Random Variable
24. Explain Limiting theorems
25. Describe Probability distributions in simulations
26. List common Probability distributions
27. Say what simulation is about
28. State why we need simulation
29. Describe how simulations are done
30. Describe various types of Simulations
31. Give examples of Simulation
32. Show areas of applications of Simulation
33. Define Modelling
34. Describe some basic modelling concepts
35. Differentiate between Visual and Conceptual models
36. Explain the characteristics of Visual models
37. Define Finite Element Method (FEM)
38. Describe the relationship between FEM and Finite element analysis
39. State the origin and Applications of FEM
40. Describe the Basics of FEM
41. Define Data modelling
42. Describe the different types and the three perspectives of data models
43. Have an overview of database model
44. Differentiate between Descriptive and Inferential statistics
45. Describe the features of Descriptive statistics
46. Describe the features of Inferential statistics
47. Compute the essential statistics for simulation
48. Define queuing theory
49. Describe queuing systems: parameters, questions and examples of queuing systems
50. Describe how probability theories are applied in queuing systems
51. Describe some essential queuing theories
52. Describe queuing systems using the Kendall-Lee notation and Little's formula
53. Explain the Queue discipline
54. State the relationship between Exponential and Poisson Probability Distributions
55. Define the Input and Output parameters of a typical queuing system
56. Describe the Steady-State probability for queues
57. Define queuing model
58. Describe the construction of models
59. State basic characteristics of queues
60. Describe the various queue models
61. Perform experiments on queues as applied in practice
62. Translate the experiments into Simulation flowcharts and programs
63. State the Purpose of simulation languages
64. List Types and Examples of Simulation Languages
65. State the approaches to model development
66. Design and develop simulation programs using SIMNET II language
67. Define Stochastic Process
68. Classify Stochastic Processes
69. Describe the Concepts in Stochastic Processes
70. Show how stochastic processes are applied in different fields
71. Describe the following processes: Ito, Levy, Wiener, Poisson, Point, Markov, and
Brownian, as stochastic process
72. Define Random walks
73. Explain various types of RW
74. Illustrate different dimensions of RW with probabilities of occurrence
75. Relate RW to Wiener, Markov and Brownian processes, and
76. List the applications of RW
77. Discuss the different methods of data collection: Census, Sample Survey,
Experiments and Observations
78. Describe the various methods of determining sample size
79. Describe data coding with respect to: What, Why, uses and determination of codes
80. Define and explain data coding
81. Describe the steps involved in coding
82. Explain code determination
83. Describe the outlier and how to handle it
viii. ACTIVITIES TO MEET COURSE OBJECTIVES
Specifically, this course shall comprise the following activities:
1. Studying courseware
2. Listening to course audios
3. Watching relevant course videos
4. Field activities, industrial attachment or internship, laboratory or studio
work (whichever is applicable)
5. Course assignments (individual and group)
6. Forum discussion participation
7. Tutorials (optional)
8. Semester examinations (CBT and essay based)
B. Summative assessment (Semester examination)
CBT based: 30%
Essay based: 30%
TOTAL: 100%
C. Grading Scale:
A = 70 – 100
B = 60 – 69
C = 50 – 59
D = 45 – 49
F = 0 – 44
D. Feedback
Courseware based:
1. In-text questions and answers (answers preceding references)
2. Self-assessment questions and answers (answers preceding references)
Tutor based:
1. Discussion Forum tutor input
2. Graded Continuous assessments
Student based:
1. Online programme assessment (administration, learning resource, deployment, and
assessment).
xi. LINKS TO OPEN EDUCATION RESOURCES
OSS Watch provides tips for selecting open source, or for procuring free or open
software.
SchoolForge and SourceForge are good places to find, create, and publish open software.
SourceForge, for one, has millions of downloads each day.
Open Source Education Foundation and Open Source Initiative, and other organisations
like these, help disseminate knowledge.
Creative Commons has a number of open projects from Khan Academy to Curriki where
teachers and parents can find educational materials for children or learn about Creative
Commons licenses. Also, they recently launched the School of Open that offers courses
on the meaning, application, and impact of "openness."
Numerous open or open educational resource databases and search engines exist. Some
examples include:
a. OEDb: over 10,000 free courses from universities as well as reviews of colleges
and rankings of college degree programmes
b. Open Tapestry: over 100,000 open licensed online learning resources for an
academic and general audience
c. OER Commons: over 40,000 open educational resources from elementary school
through to higher education; many of the elementary, middle, and high school
resources are aligned to the Common Core State Standards
d. Open Content: a blog, definition, and game of open source as well as a friendly
search engine for open educational resources from MIT, Stanford, and other
universities with subject and description listings
e. Academic Earth: over 1,500 video lectures from MIT, Stanford, Berkeley,
Harvard, Princeton, and Yale
f. JISC: Joint Information Systems Committee works on behalf of UK higher
education and is involved in many open resources and open projects including
digitising British newspapers from 1620-1900!
g. Live Binders: search, create, or organise digital information binders by age, grade,
or subject (why re-invent the wheel?)
xii. ABU DLC ACADEMIC CALENDAR/PLANNER
Period: Semesters 1, 2 and 3, running from JAN to DEC.
Activities in each semester: Registration; Resumption; Late Registration;
Facilitation; Revision/Consolidation; Semester Examination.
xiii. COURSE STRUCTURE AND OUTLINE
Course Structure
Every Study Session follows the same weekly activity pattern:
1. Read the Courseware for the corresponding Study Session.
2. View the Video(s) on the Study Session.
3. Listen to the Audio on the Study Session.
4. View any other video/YouTube material (links listed per week below).
5. Read the chapter/pages of a standard/relevant text.
6. Read any additional study material.
7. Do any out-of-class activity.
MODULE 1: FUNDAMENTALS OF MODELLING AND SIMULATION
Week 1 – Study Session 1: Basics of Modelling and Simulation (Pp. 29). Videos: http://bit.ly/2MJcGfm, http://bit.ly/30QFeax, http://bit.ly/2L3nBOU, http://bit.ly/327Z80L
Week 2 – Study Session 2: Random Numbers & Random Number Generation (Pp. 41). Videos: http://bit.ly/2ZwGGgD, http://bit.ly/2NE5EbI, http://bit.ly/2MIZQhc, http://bit.ly/2L0Ew4F, http://bit.ly/2Lc1e97, http://bit.ly/2Uh2XOr
Week 3 – Study Session 3: Monte Carlo Method & Statistical Distribution Functions (Pp. 62). Videos: http://bit.ly/2ZB1rvu, http://bit.ly/2Ua3XE7, http://bit.ly/2ZllBus, http://bit.ly/2MIZQhc, http://bit.ly/2Zwtbl6, http://bit.ly/2ZnhkXo, http://bit.ly/2NIromR, http://bit.ly/2XJEu5u
Week 4 – Study Session 4: Common Probability Distributions (Pp. 86). Videos: http://bit.ly/30J2arU, http://bit.ly/2Zu19CB, http://bit.ly/2UalP1D, http://bit.ly/2Lch03M, http://bit.ly/2L3q1Nu, http://bit.ly/2Wvhvtd
MODULE 2: MODELLING AND METHODS
Week 5 – Study Session 1: Simulation and Modelling; Study Session 2: Modelling Methods (Pp. 132). Videos: http://bit.ly/2L9Ltzq, http://bit.ly/2LhudZf, http://bit.ly/2UlmpK9, http://bit.ly/2UdU3kO, http://bit.ly/2ZvpKex, http://bit.ly/2L12lcH
Week 6 – Study Session 3: Finite Element Model (Pp. 144). Videos: http://bit.ly/2PjBJrw, http://bit.ly/327iJhy, http://bit.ly/2ZB2tYo, http://bit.ly/2NDIrpU
Week 7 – Study Session 4: Statistics for Modelling and Simulation (Pp. 168). Videos: http://bit.ly/2NIromR, http://bit.ly/2L9Ltzq, http://bit.ly/2LhudZf, http://bit.ly/2UlmpK9
MODULE 3: QUEUES
Week 8 – Study Session 1: Simple Theories of Queues (Pp. 199); Study Session 2: Basic Probability Theories in Queuing (Pp. 217)
Week 9 – Study Session 3: Queuing Models (Pp. 227); Study Session 4: Queuing Experiments (Pp. 240). Videos: http://bit.ly/2NAz5vb, http://bit.ly/2HxcnQV, http://bit.ly/32a3l3X, http://bit.ly/2L3VBec, http://bit.ly/2LgZnj7
MODULE 4: SIMULATION LANGUAGES
Week 10 – Study Session 2: SIMNET II Languages (Pp. 267). Videos: http://bit.ly/2UeWCmt, http://bit.ly/32cbKUw, http://bit.ly/3462WBr
Week 11 – Study Session 3: Stochastic Processes (Pp. 299); Study Session 4: Random Walks (Pp. 313). Videos: http://bit.ly/2Zvjdwl, http://bit.ly/2Pv3b62, http://bit.ly/2MIOb2a, http://bit.ly/2NFQouD, http://bit.ly/2Ldyf4B, http://bit.ly/2Htf5GU, http://bit.ly/2Le8i59, http://bit.ly/2Ucg0Ay, http://bit.ly/2Ubct5G
Week 12 – Study Session 5: Data Collection (Pp. 327); Study Session 6: Coding and Screening (Pp. 341). Videos: http://bit.ly/2wL2MjD, http://bit.ly/2wMm7B1, http://bit.ly/2zvvjuE, http://bit.ly/2X610rO, http://bit.ly/2NEsKi8
Week 13 – REVISION/TUTORIALS (On Campus or Online) & CONSOLIDATION WEEK
CONTENTS
Title Page…………………………………………………………….………………………1
Acknowledgement Page………………………………………………………………….....2
Copyright Page……………………………………………………………………………...3
Course Writers/Development Team…………………………………………………….....4
Study Session 4: Statistics for Modelling and Simulation…………………………………168
MODULE 3: Queues………………………………………........................................................199
Study Session 1: Simple Theories of Queues……………………………………….................199
Study Session 2: Basic Probability Theories in Queuing……………………………………..217
Study Session 3: Queuing Models………………………………………....................................227
Study Session 4: Queuing Experiments……………………………………………………240
Course Outline
MODULE 1: Fundamentals of Modelling and Simulation
Study Session 1: Basics of Modelling and Simulation
Study Session 2: Random Numbers & Random Number Generation
Study Session 3: Monte Carlo Method & Statistical Distribution Functions
Study Session 4: Common Probability Distributions
MODULE 2: Modelling and Methods
Study Session 1: Simulation and Modelling
Study Session 2: Modelling Methods
Study Session 3: Finite Element Model
Study Session 4: Statistics for Modelling and Simulation
MODULE 3: Queues
Study Session 1: Simple Theories of Queues
Study Session 2: Basic Probability Theories in Queuing
Study Session 3: Queuing Models
Study Session 4: Queuing Experiments
xii. STUDY MODULES
MODULE 1: Fundamentals of Modelling and Simulation
Contents:
Study Session 1: Basics of Modelling and Simulation
Study Session 2: Random Numbers & Random Number Generation
Study Session 3: Monte Carlo Method & Statistical Distribution Functions
Study Session 4: Common Probability Distributions
STUDY SESSION 1
Basics of Modelling and Simulation
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Modelling and Simulation Concepts
2.1 Definitions
2.2 What is Modelling and Simulation?
2.3 Types of Models
2.4 Advantages of Using Models
2.5 Applications
2.6 Modelling Procedure
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions and Answers
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. In this session, we will be discussing the concepts of modelling
and simulation. The ability of man to anticipate what may happen in the future and to
choose among alternatives lies at the heart of contemporary societies. Our knowledge
of the way things work, in society or in nature, is trailed by clouds of imprecision,
and vast harm has followed from a belief in certainty. To reduce the level of disparity
between outcome and reality, we require a decision analysis and support tool to
enable us to evaluate, compare and optimize alternatives. Such a tool should be able
to provide explanations to various stakeholders and defend the decisions. One such
tool that has been successfully employed is simulation, which we use to vary the
parameters of a model and observe the outcomes. Simulation has been particularly
valuable:
a) When there is significant uncertainty regarding the outcome or consequences
of a particular alternative under consideration, it allows you to deal with
uncertainty and imprecision in a quantifiable way.
b) When the system under consideration involves complex interactions and
requires input from multiple disciplines. In this case, it is difficult for any one
person to understand the whole system. A simulation of the model can, in such
situations, act as a framework for integrating the various components and for
better understanding their interactions. As such, it is a management tool that keeps
you focused on the "big picture" without getting lost in unimportant details.
c) When the consequences of a proposed action, plan or design cannot be directly
and immediately observed, (i.e. the consequences are delayed in time and/or
dispersed in space) and/or it is simply impractical or prohibitively expensive to
test the alternatives directly.
2. Explain when to and why we use models
3. Describe the modelling process
4. Describe different types of Models
2.1 Definitions
a. Modelling is the process of generating abstract, conceptual, graphical and/or
mathematical models. Science offers a growing collection of methods, techniques
and theory about all kinds of specialized scientific modelling.
Modelling also means to find relations between systems and models. Stated
otherwise, models are abstractions of real or imaginary worlds we create to
understand their behaviour, play with them by performing "what if" experiments,
make projections, animate or simply have fun.
b. A model, in general, is a pattern, plan, representation (especially in miniature), or
description designed to show the main object or workings of an object, system, or
concept.
A model (physical or hypothetical) is a representation of real-world phenomena
or elements (objects, concepts or events). Stated otherwise, a model is an attempt
to express a possible structure of physical causality.
Models in science are often theoretical constructs that represent a particular thing
with a set of variables and a set of logical and/or quantitative relationships between
them. A model in this sense is constructed to enable reasoning within an idealized
logical framework about these processes, and is an important component of scientific
theories.
Simulation is the manipulation of a model in such a way that it operates on time or
space to compress it, thus enabling one to perceive the interactions that would not
otherwise be apparent because of their separation in time or space.
Modelling and Simulation is a discipline for developing a level of understanding of
the interaction of the parts of a system, and of the system as a whole. The level of
understanding, which may develop via this discipline, is seldom achievable via any
other discipline.
A computer model is a simulation or model of a situation in the real world or an
imaginary world, which has parameters that the user can alter.
For example, Newton considered movement (of planets and of masses) and wrote
equations, among which is f = ma (where f is force, m mass and a acceleration), that
make the dynamics intelligible. By this expression, Newton makes a formidable
proposition: that force causes acceleration, with mass as the proportionality coefficient.
As another example, a model airplane is a physical representation of a real airplane;
models of airplanes are useful in predicting the behaviour of the real airplane when
subjected to different conditions: weather, speed, load, etc. Models help us frame our
thinking about objects in the real world. You should note that, more often than not, we
model dynamic (changing) systems.
The modelling and simulation process is iterative: one develops a model, simulates it,
learns from the result, revises the model, and continues the iterations until an
adequate level of understanding is attained.
Although Modelling and Simulation is a discipline, it is also very much an art form. One
can learn about riding a bicycle by reading a book, but to learn to ride a bicycle, you
must become actively engaged with a bicycle. Modelling and Simulation follows much
the same principle. You can learn much about modelling and simulation from reading
books and talking with other people, but skill and talent in developing models and
performing simulations is only developed through building models and
simulating them. It is very much a "learn as you go" process. From the interaction of
the developer and the models emerges an understanding of what makes sense and
what does not.
Question 1: Differentiate between Model, Modelling, Simulation and Computer model.
Answer
A model in general is a pattern, plan, representation (especially in miniature), or description
designed to show the main object or workings of an object, system, or concept.
Modelling is the process of generating abstract, conceptual, graphical and/or mathematical models.
Science offers a growing collection of methods, techniques and theory about all kinds of
specialized scientific modelling.
Simulation is the manipulation of a model in such a way that it operates on time or space to
compress it, thus enabling one to perceive the interactions that would not otherwise be apparent
because of their separation in time or space.
A computer model is a simulation or model of a situation in the real world or an imaginary world,
which has parameters that the user can alter.
2.3 Types of Models
Physical Models
These are called iconic models. Good examples of physical models are car models,
railway models, airplane models, scale models, etc. A railway model can be used to
study the behaviour of a real railway, also scale models can be used to study a plant
layout design. In simulation studies, iconic models are rarely used.
Mathematical Models
These are models used for predictive (projecting) purposes. They are abstract and
take the form of mathematical expressions of relationships. For example:
x² + y² = 1 (a mathematical model of a circle of radius 1)
Analogue Models
These are similar to iconic models. However, here, some other entities are used to
represent directly the entities of the real world. An example is the analogue computer
where the magnitudes of the electrical currents flowing in a circuit can be used to
represent quantities of materials or people moving around in a system. Other
examples are: the gauge used to check the pressure in a tyre, where the movement of
the dial represents the air pressure in the tyre; and, in medical examinations, the trace
of electrical current on paper, which is an analogue representation of the working of
muscles or organs.
Simulation Models
Here, instead of entities being represented physically, they are represented by
sequences of random numbers subject to the assumptions of the model. These
models represent (emulate) the behaviour of a real system. They are used where there
are no suitable mathematical models or where the mathematical model is too
complex or where it is not possible to experiment upon a working system without
causing serious disruption.
Heuristic Models
These models use intuitive (or futuristic) rules in the hope that they will produce
workable solutions, which can then be improved upon. For example, Arthur C.
Clarke's heuristic model was the forerunner of the communications satellite and
today's international television broadcasts.
Deterministic Models
These are models that contain known and fixed constants throughout
their formulation, e.g. the Economic Order Quantity (EOQ) for inventory
control under certainty.
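For reference, the classical EOQ formula is a handy worked illustration of such a deterministic model (the symbols below are the usual textbook ones, added here for illustration): EOQ = √(2DS/H), where D is the annual demand, S the fixed cost per order, and H the annual holding cost per unit. Every quantity is a known constant, which is what makes the model deterministic. For instance, D = 1,000 units, S = N10 per order and H = N0.50 per unit per year give EOQ = √(2 × 1000 × 10 / 0.5) = 200 units.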
Stochastic models
These models involve one or more uncertain variables and as such are subject to
probabilities.
2.5 Applications
One application of scientific modelling is the field of "Modelling and Simulation",
generally referred to as "M&S". M&S has a spectrum of applications that range from
concept development and analysis, through experimentation, measurement and
verification, to disposal analysis. Projects and programs may use hundreds of
different simulations, simulators and model analysis tools.
36
management control. For the factors included, assumptions have to be made about
their behaviour.
Run (simulate) the model and measure what happens. For example, if we have a
simulation of a queuing situation where two servers are employed, we can run it
for hundreds of customers passing through the system and obtain results such as the
average length of the queue and the average waiting time per customer. We can then
run it with three servers and see what new values are obtained for these parameters.
Many such runs can be carried out by making different changes to the structure and
assumptions of the model.
In the case of a mathematical model, we have to solve a set of equations of some
sort, e.g. linear programming problem where we have to solve a set of constraints as
simultaneous equations, or in stock control where we have to use previously
accumulated data to predict the future value of a particular variable.
Question 2: What are the steps to be followed in modelling?
Answer
i. examine the real-world situation;
ii. extract the essential features from the real-world situation;
iii. construct a model of the real object or system using just the essential features identified;
iv. solve and experiment with the model;
v. draw conclusions about the model;
vi. if further refinement is necessary, re-examine the model, readjust the parameters and
continue at (iv); otherwise continue at (vii);
vii. proceed with implementation.
When we have solved our mathematical model or evaluated some simulation runs,
we can draw conclusions about the model. For example, once we have seen how the
average queue length and the average waiting time for a queuing situation vary, we
can use this in conjunction with information on such matters as the
wage rates for servers and the value of time lost in the queue to arrive at decisions on the
best way to service the queue.
Finally, we use our conclusions about the model to draw some conclusions about the
original real world situation. The validity of the conclusions will depend on how well
our model actually represents the real world situation.
Usually, the first attempt at modelling a situation will almost certainly lead to
results at variance with reality. We then have to look back at the assumptions in the model
and adjust them; the model must be rebuilt and new results obtained. A
large number of iterations of this form is usually required before an acceptable model is
obtained. When an acceptable model has been obtained, it is necessary to test the
sensitivity of that model to possible changes in conditions.
The modelling process can then be considered for implementation when it is
decided that the model represents the real world (object or system) sufficiently
well for conclusions drawn from it to be a useful guide for action.
The model can be solved by hand, especially if it is simple, though it could take time to
arrive at an acceptable model. For complex models, or models that involve
tremendous amounts of data, the computer is very useful.
4.0 Summary
We have come to the end of our discussion. In introducing this study session, we
stated that simulation is a decision support tool, which enables us to evaluate,
compare and optimize alternative ways of solving a problem. In the course of our
discussion,
i. Modelling was defined
ii. The concepts of modelling were outlined
iii. Why we use models
iv. The application of models especially for simulations
v. The types of models, which include Physical, Mathematical, Analogue,
Simulation, Heuristic, Stochastic and Deterministic models, were
highlighted
vi. The steps that can be followed in modelling were listed
6.0 Additional Activities (Videos, Animations & Out-of-Class activities)
a. Visit YouTube at http://bit.ly/2MJcGfm, http://bit.ly/30QFeax,
http://bit.ly/2L3nBOU, http://bit.ly/327Z80L. Watch the videos and summarize them in one
paragraph.
b. View the animation on the basics of modelling and simulation and critique it in the
discussion forum.
c. Take a walk and engage any 3 students on the basics of modelling and simulation; in 2
paragraphs, summarize their opinion of the discussed topic.
8.0 References/Further Readings
http://en.wikipedia.org/wiki/Scientific_modelling#Scientific_modelling_basics
http://www.systems-thinking.org/modsim/modsim.htm
http://www.wisegeek.com/what-is-a-simulation-model.htm
STUDY SESSION 2
Random Numbers and Random Number Generation
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 How to Generate Random Numbers
2.1.1 Pseudorandom Number Generation
2.2 Random Numbers in Computer
2.3 Using the RND Function in BASIC
2.4 Simulating Randomness
2.5 Properties of a Good Random Number Generator
2.6 The Congruential Random Number Generation
2.6.1 Choice of a, c and m
2.6.2 RANECU Random Number Generator
2.6.3 Other Methods of Generating Random Numbers
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions and Answers
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome to another study session. Here, we shall discuss random numbers
and random number generation. The use of random numbers lies at the foundation
of modelling and simulation. Computer applications such as simulations, games,
graphics, etc., often need the ability to generate random numbers.
The quality of a random number generator is proportional to its period, that is, the
number of random numbers it can produce before a repeating pattern sets in. In
large-scale simulations, different algorithms (called shift-register and lagged-
Fibonacci generators) can be used; although these also have some drawbacks,
combining two different types of generators may produce the best results.
Generating a random number series from a single seed works fine with most
simulations that rely upon generating random events under the control of
probabilities (Monte Carlo simulations). However, although the sequence of numbers
generated from a given seed is randomly distributed, it is always the same series of
numbers for the same seed. Thus, a computer poker game that simply used a given
seed would always generate the same hands for each player.
What is needed is a large collection of potential seeds from which one can be more
or less randomly chosen. If there are enough possible seeds, the odds of ever getting
the same series of numbers become diminishingly small.
One way to do this is to read the time (and perhaps date) from the computer’s system
clock and generate a seed based on that value. Since the clock value is in
milliseconds, there are millions of possible values to choose from. Another common
technique is to use the interval between the user’s keystrokes (in milliseconds).
Although they are not perfect, these techniques are quite adequate for games.
The so-called true random number generators extract random numbers from physical
phenomena such as a radioactive source or even atmospheric noise as detected by a
radio receiver.
In some simulations, we use random numbers that are between 0 and 1. For example,
if you need such numbers with four decimal digits, you can take four at a time
from the recorded sequence of random digits and place a decimal point in front of
each group of four. To illustrate, if the sequence of digits is 358083429261…, then
the four-decimal-place random numbers are .3580, .8342, and .9261.
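As an added illustration of this grouping (the digit string below is the one used above; the program itself is not from the original text), a short QBASIC fragment can slice the recorded sequence into four-digit groups and prefix each with a decimal point:
CLS
REM Turn a recorded digit sequence into four-decimal-place
REM random numbers by taking four digits at a time.
DIGITS$ = "358083429261"
FOR P% = 1 TO LEN(DIGITS$) - 3 STEP 4
GROUP$ = MID$(DIGITS$, P%, 4) 'next four digits
R = VAL("." + GROUP$) 'e.g. "3580" becomes .3580
PRINT R
NEXT P%
END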
The results of experiments such as the one described above are published
in books of statistical tables. In hand simulation, it may be appropriate to use a
published table of random numbers.
The conventional six-sided unbiased die may also be used to generate a sequence of
random digits in the set {1, 2, 3, 4, 5, 6}, where each digit has a probability of 1/6 of
occurring.
The RANDOMIZE statement can be used to seed the generator:
RANDOMIZE
This way, we can control the sequence of random numbers generated.
RANDOMIZE will result in the following prompt on the VDU:
Random Number Seed (-32768 to 32767)?
Suppose your response to the above prompt is 100. Then the computer would use
this number, 100, to generate the first random number. This number generated is
used to generate the next random number. Thus by specifying the seed for the first
random number, we are in a way controlling all random numbers that will be
generated until the seed is reset. A control such as this can be very useful in
validating a simulation program or other computer programs that use random
numbers.
Consider the following BASIC program:
FOR K% = 1 TO 5
PRINT RND
NEXT K%
END
If the above program is run, seven-digit decimal numbers like the following
will be displayed: .6291626, .1948297, .6305799, .8625749, .736353.
Because no seed was set, the same sequence of numbers will be displayed every time
you run the above program.
Now add a RANDOMIZE statement to the program:
RANDOMIZE TIMER
FOR K% = 1 TO 5
PRINT RND
NEXT K%
END
If you run this program, the seed is taken from the system timer, and a sequence like
the following may be displayed: .1851404, .9877729, .806621, .8573399, .620…
An expression such as INT(5 * RND) produces an integer X in the range 0 ≤ X < 5.
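Building on that idiom (an added sketch; the expression INT(6 * RND) + 1 for a die roll is an assumed example, not from the original text), the following QBASIC fragment simulates ten rolls of a fair six-sided die:
CLS
REM Simulate ten rolls of a fair six-sided die.
REM INT(6 * RND) yields an integer 0..5; adding 1 shifts it to 1..6.
RANDOMIZE TIMER
FOR K% = 1 TO 10
PRINT INT(6 * RND) + 1;
NEXT K%
END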
Question 1
What is the name of the BASIC programming language numeric function used to generate random
numbers between 0 and 1?
Answer
RND
Example 2
A QBASIC program to simulate the tossing of a fair coin 10 times: the
program displays an H when a head appears and a T when a tail appears.
CLS
REM Program to simulate the tossing of a coin 10 times
REM and print the outcome
RANDOMIZE TIMER
FOR K% = 1 TO 10
RANDNO = RND
IF RANDNO <= 0.5 THEN PRINT "H"
IF RANDNO > 0.5 THEN PRINT "T"
NEXT K%
END
Example 3
Suppose the output of the program of example 2 is HHTHHTTTHH and that there
are two players’ X and Y involved in the tossing of the coin. Given that player X
wins, N50.00 from player Y if a head appears and loses it to player Y if a tail
appears. Determine who won the game and by how much.
Solution
From the output, there are 6 heads and 4 tails.
Player X wins N50.00 x 6 = N300.00 from player Y.
He loses N50.00 x 4 = N200.00 to player Y.
Thus, player X won the game with N300.00 – N200.00 = N100.00.
We say that two numbers x and y are congruent modulo m if (x − y) is an integral
multiple of m. Thus, we can write: x = y (modulo m).
For example, let m = 10; then we can write:
i. 13 = 3 (modulo 10)
ii. 24 = 4 (modulo 10)
The congruential method generates random numbers by computing the next
random number from the last random number obtained, given an initial random
number, say X0, called the seed.
The method uses the formula:
Xn+1 = (aXn + c) (modulo m), with X0 = seed,
where a, c and m are carefully chosen positive integer constants, of which a and c
must be less than m, and Xn is the seed or the last random number generated in the
sequence. Stated in a computer language, the above formula becomes:
Xn+1 = (a * Xn + c) MOD m
From the above formula, it follows that the random number generated must be
between 0 and (m − 1), since MOD (modulo) produces the remainder after division;
hence the formula produces the remainder after dividing (aXn + c) by m. So, to
generate a random number between p and m, we use:
Xn+1 = (aXn + c) (modulo (m + 1 − p)) + p, for m > p.
If the value of c is zero, the congruential method is termed the Multiplicative
Congruential Method. If the value of c is not zero, the method is called the Mixed
Congruential Method.
The multiplicative congruential method is very handy. It is obtained using the
general formula:
rn = a rn−1 (modulo m)
where the parameters a, m and the seed r0 are specified to give desirable statistical
properties of the resultant sequence. By virtue of modulo arithmetic, each rn must be
one of the numbers 0, 1, 2, …, m − 1. Clearly, you must be careful about the choice of a
and r0: their values should be chosen to yield the largest cycle or period, that is, to
give the largest value of n at which rn = r0 for the first time.
Example 4
To illustrate the technique, suppose you want to generate ten-decimal-place
numbers u1, u2, u3, …. It can be shown that if you use
un = rn × 10^(-10)
where rn = 100003 rn−1 (modulo 10^10) and r0 = any odd number not divisible by 5,
then the period of the sequence will be 5 × 10^8; that is, rn = r0 for the first time at
n = 5 × 10^8, and the cycle subsequently repeats itself.
As another example, using our mixed congruential formula with, say, a = 5, c = 7,
m = 8 and the seed X0 = 4, repeated application of Xn+1 = (5Xn + 7) (modulo 8)
generates the sequence 3, 6, 5, 0, 7, 2, 1, 4.
Note that the value of X8 is 4, which is the value of the seed X0. So if we compute
X9, X10, etc., the same random numbers 3, 6, 5, 0, 7, 2, 1, 4 will be generated once more.
Note also that if we divide the random integer values by 8, we obtain random
numbers in the range 0 ≤ Xn+1 < 1, which is similar to using the RND function of
BASIC.
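The short QBASIC sketch below is an added illustration (it assumes the parameter choice a = 5, c = 7, m = 8 and seed X0 = 4 used above); it reproduces the full cycle and shows the division by m that maps each integer into the range [0, 1):
CLS
REM Mixed congruential generator Xn+1 = (5*Xn + 7) MOD 8, seed 4.
REM Prints the integer cycle 3, 6, 5, 0, 7, 2, 1, 4 and the
REM corresponding fractions obtained by dividing by m = 8.
CONST A = 5, C = 7, M = 8
X% = 4 'the seed X0
FOR N% = 1 TO 8
X% = (A * X% + C) MOD M
PRINT X%; X% / M
NEXT N%
END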
2.6.1 Choice of a, c and m
The linear congruential method works by computing each successive random number
from the previous one. Starting with a seed, X0, the linear congruential method uses
the following formula:
Xi+1 = (A*Xi + C) mod M
In his book The Art of Computer Programming, Donald Knuth presents several rules
for maximizing the length of time before the random number generator comes up
with the same value as the seed. This is desirable because once the random number
generator comes up with the initial seed, it will start to repeat the same sequence of
random numbers (which will not be so random, since the second time around we can
predict what they will be). According to Knuth's rules, if M is prime, we can let C be 0.
The LCM defined above has full period if and only if the following conditions are
satisfied:
i. m and c are relatively prime;
ii. if q is a prime number that divides m, then q divides a − 1;
iii. if 4 divides m, then 4 divides a − 1.
Therefore, the values of a, c and m are not generated randomly; rather, they are
carefully chosen based on these considerations. For a binary computer with a word
length of r bits, the normal choice for m is m = 2^(r−1). With this choice of m, a can
assume any of the values 1, 5, 9, 13, …, and c can assume any of the values 1, 3, 5, 7, ….
However, experience shows that the congruential method works out very well if the
value of a is an odd integer not divisible by either 3 or 5, and c is chosen such that c
mod 8 = 5 (for a binary computer) or c mod 200 = 21 (for a decimal computer).
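As a quick worked check of these full-period conditions (an illustrative example added here, not from the original text), take m = 16, a = 5, c = 3: (i) gcd(3, 16) = 1, so m and c are relatively prime; (ii) the only prime dividing 16 is 2, and 2 divides a − 1 = 4; (iii) 4 divides 16, and 4 divides a − 1 = 4. All three conditions hold, so the generator Xn+1 = (5Xn + 3) mod 16 has full period: starting from X0 = 0 it visits all 16 residues 0–15 (namely 3, 2, 13, 4, 7, 6, 1, 8, 11, 10, 5, 12, 15, 14, 9, 0) before repeating.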
Example 5
Develop a function procedure called RAND in QBASIC, which generates a
random number between 0 and 1 using the mixed congruential method.
Assume a 16-bit computer.
Solution
FUNCTION RAND (SEED)
CONST M = 32767, A = 2743, C = 5923
IF SEED < 0 THEN SEED = SEED + M
SEED = (A* SEED + C) MOD M
RAND = SEED/M
END FUNCTION
Note that in the main program that references the above function, the
TIMER function can be used to generate the SEED to be passed to the
function RAND, as illustrated in Example 2.
Example 6
Write a program that generates 20 random integers distributed
between 1 and 64 inclusive, using the mixed congruential method.
Solution
QBASIC
DECLARE FUNCTION RAND (SEED)
CLS: REM Mixed Congruential Method
DIM SHARED SEED
SEED = TIMER
FOR K% = 1 TO 20
SEED = RAND(SEED) 'Call of function RAND
PRINT SEED; SPC(2);
NEXT K%
END 'End of main program
FUNCTION RAND (SEED)
CONST M = 64, A = 27, C = 13
IF SEED < 0 THEN SEED = SEED + M
SEED = (A * SEED + C) MOD M + 1
RAND = SEED
END FUNCTION 'End of the function program RAND
2.6.2 RANECU Random Number Generator
RANECU is a FORTRAN code for generating uniform random numbers on [0,1]. It is a
multiplicative linear congruential generator suitable for a 16-bit platform. It
combines three simple generators, and has a period exceeding 8 × 10^12.
It is constructed for more efficient use by providing for a sequence of such
numbers (LEN) to be returned in a single call. A set of three non-zero integer
seeds can be supplied, failing which a default set is employed. If supplied, these
three seeds, in order, should lie in the ranges [1, 32362], [1, 31726] and [1, 31656]
respectively. The program is given below.
Question 2
List three properties of a good Random Number Generator.
Answer
The random numbers generated should:
a. have as nearly as possible a uniform distribution
b. be fast.
c. not require large amounts of memory
d. have a long period
e. be able to generate a different set of random numbers or a series of numbers
f. not degenerate.
SUBROUTINE RANECU (RVEC, LEN)
C Returns a vector RVEC of LEN uniform random numbers on [0,1].
C The header, declarations and default seeds are reconstructed here;
C the default seed values shown are illustrative.
DIMENSION RVEC(LEN)
SAVE ISEED1, ISEED2, ISEED3
DATA ISEED1, ISEED2, ISEED3 /1234, 5678, 9876/
DO 100 I = 1, LEN
K = ISEED1/206
ISEED1 = 157 * (ISEED1 - K * 206) - K * 21
IF (ISEED1.LT.0) ISEED1 = ISEED1 + 32363
K = ISEED2/217
ISEED2 = 146 * (ISEED2 - K * 217) - K * 45
IF (ISEED2.LT.0) ISEED2 = ISEED2 + 31727
K = ISEED3/222
ISEED3 = 142 * (ISEED3 - K * 222) - K * 133
IF (ISEED3.LT.0) ISEED3 = ISEED3 + 31657
IZ = ISEED1 - ISEED2
IF (IZ.GT.706) IZ = IZ - 32362
IZ = IZ + ISEED3
IF (IZ.LT.1) IZ = IZ + 32362
RVEC(I) = REAL(IZ) * 3.0899E-5
100 CONTINUE
RETURN
ENTRY RECUIN (IS1, IS2, IS3)
ISEED1 = IS1
ISEED2 = IS2
ISEED3 = IS3
RETURN
ENTRY RECUUT (IS1, IS2, IS3)
IS1 = ISEED1
IS2 = ISEED2
IS3 = ISEED3
RETURN
END
2.6.3 Other Methods of Generating Random Numbers
The quadratic congruential method uses the formula:
Xn+1 = (dXn² + cXn + a) (modulo m)
where d is chosen in the same way as c, and m should be a power of 2 for the
method to yield satisfactory results.
The Fibonacci method computes each new number from the previous two, for
example Xn+1 = (Xn + Xn−1) (modulo m); hence two initial seeds need to be provided.
However, experience has shown that the random numbers generated using the
Fibonacci method fail to pass tests for randomness; the method therefore does not
give satisfactory results. From the foregoing discussions, it is obvious that the last
three methods – mid-square, mid-product and Fibonacci – are mainly of historical
significance and have detrimental and limiting characteristics.
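A minimal QBASIC sketch of this scheme follows (an added illustration; the modulus M = 100 and the two seed values are arbitrary assumptions, not from the original text):
CLS
REM Fibonacci-style generator: each new number is the sum of the
REM previous two, reduced modulo M. Two initial seeds are required.
CONST M = 100
X0% = 37 'first seed (arbitrary)
X1% = 58 'second seed (arbitrary)
FOR K% = 1 TO 10
X2% = (X0% + X1%) MOD M
PRINT X2%;
X0% = X1% 'shift the window forward
X1% = X2%
NEXT K%
END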
d) A sequence of five-digit random numbers such that Xn+1 = (21Xn +
53) (modulo 100) and X0 = 33.
4.0 Summary
In the course of our discussion in this study session, you have been introduced to
random number generation. You have also learnt how to manipulate the RND
function of QBASIC and how to design a random number generator.
What you have learnt in this study session concerns:
1. the different ways of generating pseudorandom numbers
2. the properties of a good random number generator
3. the use of the QBASIC RND function to simulate randomness
4. the congruential methods of generating random numbers
5. other random number generating methods
6.0 Additional Activities (Videos, Animations & Out-of-Class activities)
a. Visit YouTube at http://bit.ly/2ZwGGgD, http://bit.ly/2NE5EbI,
http://bit.ly/2MIZQhc, http://bit.ly/2L0Ew4F, http://bit.ly/2Lc1e97,
http://bit.ly/2Uh2XOr. Watch the videos and summarize them in one paragraph.
b. View the animation on random numbers & random number generation and
critique it in the discussion forum.
c. Take a walk and engage any 3 students on random numbers & random number
generation; in 2 paragraphs, summarize their opinion of the discussed topic.
7.0 Self-Assessment Question Answers
a. A seed is an arbitrary number of a specified length that is used to start the
generation of random digits.
To generate a random number, simply start with an arbitrary number with a
specified number of digits, for example 4 digits. This first number is called the
seed. The seed is multiplied by a constant with the same number of digits
(length), and the desired number of digits is taken off the right end of the product.
The result becomes the new seed; it is again multiplied by the original
constant to generate a new product, and the process is repeated as often as
desired.
b. A period is the number of random numbers a random number generator can
produce before a repeating pattern sets in.
A period may be lengthened by combining two different types of generators.
STUDY SESSION 3
Monte Carlo Method and Statistical Distribution Functions
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Monte Carlo Method
2.1.1 Overview of Monte Carlo Method
2.2 History of Monte Carlo Method
2.3 Applications of Monte Carlo Methods
2.4 What is Statistics?
2.4.1 What is a Statistical Distribution?
2.4.2 Measures of Central Tendency
2.4.3 Measures of Variation
2.4.4 Showing Data Distribution in Graphs
2.4.5 The Difference between a Continuous and a Discrete Distribution
2.4.6 Normal Distribution
2.4.7 What is a Percentile?
2.4.8 Probabilities in Discrete Distributions
2.4.9 Probability and the Normal Curve
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. Monte Carlo methods (or Monte Carlo experiments) are a class of
computational algorithms that rely on repeated random sampling to compute their
results. Monte Carlo methods are often used in simulating physical and mathematical
systems. The methods are especially useful in studying systems with a large number
of coupled (interacting) degrees of freedom, such as fluids, disordered materials,
strongly coupled solids, and cellular structures.
More broadly, Monte Carlo methods are useful for modelling phenomena with
significant uncertainty in inputs, such as the calculation of risk in business. These
methods are also widely used in mathematics: a classic use is for the evaluation of
definite integrals, particularly multidimensional integrals with complicated boundary
conditions. It is a widely successful method in risk analysis when compared with
alternative methods or human intuition. When Monte Carlo simulations have been
applied in space exploration and oil exploration, actual observations of failures, cost
overruns and schedule overruns are routinely better predicted by the simulations than
by human intuition or alternative "soft" methods.
Although simulation can be a valuable tool for better understanding the underlying
mechanisms that control the behaviour of a system, using simulation to make
predictions of the future behaviour of a system can be difficult. This is because, for
most real-world systems, at least some of the controlling parameters, processes and
events are often stochastic, uncertain and/or poorly understood. The objective of
many simulations is to identify and quantify the risks associated with a particular
option, plan or design. Simulating a system in the face of such uncertainty and
computing such risks requires that the uncertainties be quantitatively included in the
calculations. To do this, we collect data about the system parameters and subject them
to statistical analysis.
1.0 Study Session Learning Outcomes
After studying this session, I expect you to be able to:
1. Describe Monte Carlo method
2. Trace the origin of Monte Carlo method
3. Give examples of the application of Monte Carlo method
4. Define Statistics
5. Explain Statistical Distributions
6. Compute measures of Central Tendency and Variations
7. Explain the Components of Statistical Distributions
8. Identify Normal Distributions
a) Draw a square on the ground, then inscribe a circle within it. From plane
geometry, the ratio of the area of an inscribed circle to that of the surrounding
square is π / 4.
b) Uniformly scatter some objects of uniform size (e.g. grains of rice or sand)
throughout the square.
c) Count the number of objects inside the circle and the total number of objects;
the ratio of the two counts estimates π / 4, so multiplying it by 4 estimates π.
Notice how the π approximation follows the general pattern of Monte Carlo
algorithms. First, we define an input domain: in this case, it's the square which
circumscribes our circle. Next, we generate inputs randomly (scatter individual
grains within the square), then perform a computation on each input (test whether it
falls within the circle). At the end, we aggregate the results into our final result, the
approximation of π.
Note, also, two other common properties of Monte Carlo methods: the computation's
reliance on good random numbers, and its slow convergence to a better
approximation as more data points are sampled. If grains are purposefully dropped
into only, for example, the centre of the circle, they will not be uniformly distributed,
and so our approximation will be poor. An approximation will also be poor if only a
few grains are randomly dropped into the whole square. Thus, the approximation of
π will become more accurate both as the grains are dropped more uniformly and as
more are dropped.
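The grain-scattering experiment translates directly into a program. The QBASIC sketch below is an added illustration (the sample size of 100000 points is an arbitrary choice): it scatters random points in the unit square and counts how many fall inside the quarter circle.
CLS
REM Monte Carlo estimate of pi: scatter points in the unit square
REM and count those inside the quarter circle x^2 + y^2 <= 1.
RANDOMIZE TIMER
N& = 100000 'number of random points
HITS& = 0
FOR I& = 1 TO N&
X = RND
Y = RND
IF X * X + Y * Y <= 1 THEN HITS& = HITS& + 1
NEXT I&
PRINT "Estimate of pi:"; 4 * HITS& / N&
END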
To understand the Monte Carlo method theoretically, it is useful to think of it as a
general technique of numerical integration. It can be shown, at least in a trivial
sense, that every application of the Monte Carlo method can be represented as a
definite integral.
Suppose we need to evaluate a multi-dimensional definite integral of the
form:……..6
Most integrals can be converted to this form with a suitable change of variables, so
we can consider this a general application suitable for the Monte Carlo method.
The integral represents a non-random problem, but the Monte Carlo method
approximates a solution by introducing a random vector U that is uniformly
distributed on the region of integration. Applying the function f to U, we obtain a
random variable f(U). This has expectation:
………. (7)
………. (8)
Comparing [6] and [8], we obtain a probabilistic expression for the integral :
……………. (9)
so random variable f(U) has mean and some standard deviation . We define
66
…………. (10)
as an unbiased estimator for with standard error . This is a little unconventional,
since is an estimator that depends upon a sample {U [1]} of size one, but it is a valid
estimator nonetheless.
To estimate ψ with a standard error lower than σ, let's generalize our estimator to accommodate a larger sample {U[1], U[2], …, U[m]}. Applying the function f to each sample point and averaging, we obtain:
ψ̂m = (1/m) Σ f(U[k]), for k = 1, …, m …………. (11)
This estimator is also unbiased for ψ, and its standard error is σ/√m, which shrinks as the sample size m grows but does not depend on the dimension of the integral. This is why the Monte Carlo method does not suffer from the curse of dimensionality. It is as applicable to a 1000-dimensional integral as it is to a one-dimensional integral.
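A short Python sketch of the estimator in equation (11), under the assumption that the region of integration is the unit hypercube; the test integrand and sample size are illustrative choices of ours:

    import random

    def mc_integrate(f, dim, m):
        # Draw m points uniformly from [0,1]^dim and average f over them;
        # this is the sample-mean estimator of equation (11).
        total = 0.0
        for _ in range(m):
            u = [random.random() for _ in range(dim)]
            total += f(u)
        return total / m

    # Example: integrate f(u) = sum of squared coordinates over [0,1]^3;
    # the exact value is 3 x (1/3) = 1.
    est = mc_integrate(lambda u: sum(x * x for x in u), dim=3, m=100_000)
    print(est)  # close to 1.0

The same call works unchanged for dim=1000, which is the point about the curse of dimensionality.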
While increasing the sample size is one technique for reducing the standard error of
a Monte Carlo analysis, doing so can be computationally expensive. A better
solution is to employ some technique of variance reduction. These techniques
incorporate additional information about the analysis directly into the estimator. This
allows them to make the Monte Carlo estimator more deterministic, and hence have
a lower standard error.
Due to the advanced mathematics required, and the burden of understanding it at this level, we have to stop this discussion here.
Monte Carlo methods invert the typical mode of simulation: they treat a deterministic problem by first finding a probabilistic analogue. Earlier uses of simulation and statistical sampling generally did the opposite: using simulation to test a previously understood deterministic problem. Though examples of an "inverted" approach do exist historically, they were not considered a general method until the popularity of the Monte Carlo method spread.
It was only after electronic computers were first built (from 1945 on) that Monte
Carlo methods began to be studied in depth. In the 1950s, they were used at Los
Alamos for early work relating to the development of the hydrogen bomb, and
became popularized in the fields of physics, physical chemistry, and operations
research. The Rand Corporation and the U.S. Air Force were two of the major
organizations responsible for funding and disseminating information on Monte Carlo
methods during this time, and they began to find a wide application in many different
fields.
Question 1
A Monte Carlo algorithm is a Heuristic algorithm. True or False?
Answer
True
Uses of Monte Carlo methods require large amounts of random numbers, and it was their use that spurred the development of pseudorandom number generators, which were far quicker to use than the tables of random numbers that had previously been used for statistical sampling.
Physical sciences
Monte Carlo methods are very important in computational physics, physical
chemistry, and related applied fields, and have diverse applications from complicated
quantum calculations to designing heat shields and aerodynamic forms. The Monte
Carlo method is widely used in statistical physics, particularly Monte Carlo
molecular modelling as an alternative for computational molecular dynamics as well
as to compute statistical field theories of simple particle and polymer models. In
experimental particle physics, these methods are used for designing detectors,
understanding their behaviour and comparing experimental data to theory, or on the vastly larger scale of galaxy modelling.
Monte Carlo methods are also used in the models that form the basis of modern
weather forecasting operations.
Engineering
Monte Carlo methods are widely used in engineering for sensitivity analysis and
quantitative probabilistic analysis in process design. The need arises from the
interactive, co-linear and non-linear behaviour of typical process simulations. For
example, in microelectronics engineering, Monte Carlo methods are applied to analyse correlated and uncorrelated variations in analog and digital integrated circuits. This enables designers to estimate realistic 3-sigma corners and effectively optimise circuit yields.
Computer graphics
Monte Carlo methods are also used in computer graphics to render images of virtual 3D models, with applications in video games, architecture, design, and computer generated films.
Finance and business
Monte Carlo methods in finance are often used to calculate the value of companies,
to evaluate investments in projects at a business unit or corporate level, or to evaluate
financial derivatives. Monte Carlo methods used in these cases allow the
construction of stochastic or probabilistic financial models as opposed to the
traditional static and deterministic models, thereby enhancing the treatment of
uncertainty in the calculation.
Telecommunications
When planning a wireless network, design must be proved to work for a wide variety
of scenarios that depend mainly on the number of users, their locations and the
services they want to use. Monte Carlo methods are typically used to generate these
users and their states. The network performance is then evaluated and, if results are
not satisfactory, the network design goes through an optimization process.
Games
Monte Carlo methods have recently been applied in game playing related artificial
intelligence theory. Most notably the game of Battleship has seen remarkably
successful Monte Carlo algorithm based computer players. One of the main problems
that this approach has in game playing is that it sometimes misses an isolated, very
good move. These approaches are often strong strategically but weak tactically, as
tactical decisions tend to rely on a small number of crucial moves which the
randomly searching Monte Carlo algorithm easily misses.
In traditional "what if" (scenario) analysis, single-point estimates of the uncertain model inputs are manually chosen (such as best case, worst case, and most likely case), and the results recorded for each so-called "what if" scenario.
By contrast, Monte Carlo simulation considers random sampling of probability
distribution functions as model inputs to produce hundreds or thousands of possible
outcomes instead of a few discrete scenarios. The results provide probabilities of
different outcomes occurring.
For example, a comparison of a spreadsheet cost-construction model run using traditional "what if" scenarios, and then run again with Monte Carlo simulation and triangular probability distributions, shows that the Monte Carlo analysis has a narrower range than the "what if" analysis. This is because the "what if" analysis gives equal weight to all scenarios.
Uses in mathematics
In general, Monte Carlo methods are used in mathematics to solve various problems
by generating suitable random numbers and observing that fraction of the numbers,
which obeys some property or properties. The method is useful for obtaining
numerical solutions to problems, which are too complicated to solve analytically.
The most common applications of the Monte Carlo method in mathematics are:
i. Integration
Deterministic methods of numerical integration usually operate by taking a number
of evenly spaced samples from a function. In general, this works very well for
functions of one variable. However, for functions of vectors, deterministic
quadrature methods can be very inefficient. To numerically integrate a function of a
two-dimensional vector, equally spaced grid points over a two-dimensional surface
are required. For instance, a 10×10 grid requires 100 points. If the vector has 100 dimensions, the same spacing on the grid would require 10^100 points, which is far too many to be computed. However, 100 dimensions are by no means unusual, since in many physical problems, a "dimension" is equivalent to a degree of freedom.
Monte Carlo methods provide a way out of this exponential increase in computation time. As long as the function in question is reasonably well behaved, it can be estimated by randomly selecting points in 100-dimensional space and taking some kind of average of the function values at these points. By the law of large numbers, this method displays 1/√N convergence (i.e. quadrupling the number of sampled points will halve the error, regardless of the number of dimensions).
ii. Optimization
Most Monte Carlo optimization methods are based on random walks. The program moves a marker around in multi-dimensional space, tending to move in directions which lead to lower function values, but sometimes moving against the gradient.
Another popular application for random numbers in numerical simulation is in
numerical optimization (choosing the best element from some set of available
alternatives). These problems use functions of some often large-dimensional vector
that are to be minimized (or maximized). Many problems can be phrased in this
way: for example, a computer chess program could be seen as trying to find the
optimal set of, say, 10 moves which produces the best evaluation function at the end.
The travelling salesperson problem is another optimization problem. There are also
applications to engineering design, such as design optimization.
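As an illustration of a random-walk optimizer that usually moves downhill but sometimes against the gradient, here is a hedged Python sketch in the style of simulated annealing; the step scale, temperature schedule, and test function are all assumptions of ours, not prescriptions from the text:

    import math
    import random

    def random_walk_minimize(f, x0, steps=10_000, scale=0.1, temp=1.0):
        # Move a marker around multi-dimensional space; always accept
        # downhill moves, occasionally accept uphill ones.
        x, fx = list(x0), f(x0)
        for _ in range(steps):
            cand = [xi + random.gauss(0, scale) for xi in x]
            fc = f(cand)
            if fc < fx or random.random() < math.exp((fx - fc) / temp):
                x, fx = cand, fc
            temp *= 0.999  # slowly reduce the chance of uphill moves
        return x, fx

    # Example: minimise a bowl-shaped function in 5 dimensions.
    best, value = random_walk_minimize(lambda v: sum(xi * xi for xi in v), [1.0] * 5)
    print(value)  # close to 0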
iii. Inverse problems
Probabilistic formulation of inverse problems leads to the definition of a probability
distribution in the space models. This probability distribution combines a priori
(prior knowledge about a population, rather than that estimated by recent
observation) information with new information obtained by measuring some
observable parameters (data). As, in the general case, the theory linking data with model parameters is nonlinear, the a posteriori probability in the model space may not be easy to describe (it may be multimodal, some moments may not be defined, etc.).
When analysing an inverse problem, obtaining a maximum likelihood model is
usually not sufficient, as we normally also wish to have information on the
resolution power of the data. In the general case, we may have a large number of
model parameters, and an inspection of the marginal probability densities of interest
may be impractical, or even useless. However, it is possible to pseudo randomly
generate a large collection of models according to the posterior probability
distribution and to analyse and display the models in such a way that information on the relative likelihoods of model properties is conveyed to the observer. This can be accomplished by means of an efficient Monte Carlo method, even in cases where no explicit formula for the a priori distribution is available.
iv. Computational mathematics
Monte Carlo methods are useful in many areas of computational mathematics, where
a lucky choice can find the correct result. A classic example is Rabin's algorithm for primality testing (an algorithm which determines whether a given number is prime). It states that for any n which is not prime, a random x has at least a 75% chance of proving that n is not prime. Hence, if n is not prime but x says that it might be, we have observed at most a 1-in-4 event. If 10 different random x say that "n is probably prime" when it is not, we have observed roughly a one-in-a-million event. In general, a Monte Carlo algorithm of this kind gives one kind of answer with a guarantee (when it declares n composite, the witness x proves it so) and another kind without a guarantee ("n is probably prime"), but with a bound on how often that unguaranteed answer can be wrong: at most 25% of the time for each random x tried.
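The idea can be sketched in Python with the Miller-Rabin version of the test; repeating the trial k times drives the chance of wrongly reporting "probably prime" below (1/4)^k:

    import random

    def is_probably_prime(n, trials=10):
        # Each random witness x has at least a 75% chance of proving a
        # composite n composite, so a composite survives all `trials`
        # rounds with probability at most (1/4)**trials.
        if n in (2, 3):
            return True
        if n < 2 or n % 2 == 0:
            return False
        # Write n - 1 as 2^r * d with d odd.
        r, d = 0, n - 1
        while d % 2 == 0:
            r += 1
            d //= 2
        for _ in range(trials):
            x = random.randrange(2, n - 1)
            y = pow(x, d, n)
            if y in (1, n - 1):
                continue
            for _ in range(r - 1):
                y = pow(y, 2, n)
                if y == n - 1:
                    break
            else:
                return False  # x is a witness: n is certainly composite
        return True  # no witness found: n is probably prime

    print(is_probably_prime(561))     # False - this composite is caught
    print(is_probably_prime(104729))  # True - 104729 is prime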
Remark:
In physics, two systems are coupled if they are interacting with each other. Of
special interest is the coupling of two (or more) vibratory systems (e.g. pendula or
resonant circuits) by means of springs or magnetic fields, etc. Characteristic of a coupled oscillation is the effect of beats.
Median - the middle point of a set of numbers (for odd-numbered samples); to find it, first arrange the data points in increasing order.
Mode - the most frequently occurring number; in the example, Mode = 4 (4 occurs most).
The mean, median and mode are called measures of central tendency.
Standard deviation measures spread about the mean: compute the difference between each data point and the mean, square each difference, average the squared differences over the number of data points, and take the square root of the result.
Mean, Median, Mode, Range, and Standard Deviation are measurements in a sample (statistics) and can also be used to make inferences on a population.
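A small Python sketch of these computations, using a made-up sample chosen so that its mode is 4, echoing the example above:

    from collections import Counter

    def describe(data):
        n = len(data)
        mean = sum(data) / n
        # Median: arrange the data in increasing order, take the middle
        # point (or the average of the two middle points if n is even).
        s = sorted(data)
        mid = n // 2
        median = s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2
        # Mode: the most frequently occurring number.
        mode = Counter(data).most_common(1)[0][0]
        # Standard deviation: square root of the mean squared deviation.
        sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
        return mean, median, mode, sd

    print(describe([4, 2, 4, 3, 7, 4, 9]))  # mode is 4 (4 occurs most)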
Question 2
What is a probabilistic algorithm?
Answer
A probabilistic algorithm is an algorithm which employs a degree of randomness as part of its
logic.
2.4.5 The Difference between a Continuous and a Discrete Distribution
Continuous distributions describe an infinite number of possible data values (as
shown by the curve). For example, someone’s height could be 1.7m, 1.705m, 1.71m,
...
Discrete distributions describe a finite number of possible values. (shown by the
bars)
The random variable X in the normal equation is called the normal random
variable.
The normal equation is the probability density function for the normal distribution.
The graph of the normal distribution depends on two factors - the mean and the
standard deviation. The mean of the distribution determines the location of the center
of the graph, and the standard deviation determines the height and width of the
graph. When the standard deviation is large, the curve is short and wide; when the
standard deviation is small, the curve is tall and narrow. All normal distributions
look like a symmetric, bell-shaped curve, as shown in figure 3.
A normal random variable X is standardized by converting it to a z score using its mean µ and standard deviation σ:
z = (X - µ) / σ
Example 1 - Suppose Ada sat a test and her score corresponds to a z score of 0.90; we want the proportion of students who scored higher than she did.
Then, using a standard normal distribution table, we find the cumulative probability
associated with the z score. In this case, we find P(Z < 0.90) = 0.8159.
Therefore, the P(Z > 0.90) = 1 - P(Z < 0.90) = 1 - 0.8159 = 0.1841.
Thus, we estimate that 18.41 per cent of the students tested had a higher score than
Ada.
Example 2 - An average light bulb manufactured by the Acme Corporation lasts 300
days with a standard deviation of 50 days. Assuming that bulb life is normally
distributed, what is the probability that an Acme light bulb will last at most 365
days?
Solution: Given a mean score of 300 days and a standard deviation of 50 days, we
want to find the cumulative probability that bulb life is less than or equal to 365
days. Thus, we know the following:
i. The value of the normal random variable is 365 days.
ii. The mean is equal to 300 days.
iii. The standard deviation is equal to 50 days.
We enter these values into the formula and compute the cumulative probability. The
answer is: P(X < 365) = 0.90. Hence, there is a 90% chance that a light bulb will
burn out within 365 days.
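Both examples can be checked in Python via the error function, which gives the cumulative probability of the standard normal distribution; the function below is our own helper, not part of the text:

    import math

    def normal_cdf(x, mu, sigma):
        # P(X <= x) for a normal random variable with mean mu and
        # standard deviation sigma, computed via the error function.
        z = (x - mu) / sigma
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # Example 2: mean 300 days, standard deviation 50 days.
    print(normal_cdf(365, 300, 50))    # about 0.90 = P(X <= 365)
    # Example 1: probability of a z score above 0.90.
    print(1 - normal_cdf(0.90, 0, 1))  # about 0.1841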
2.4.6.4 Skewed Distributions
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. Skewed distributions are not symmetric. If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or to have positive skewness. If the reverse is the case, it is said to be skewed to the left, or to have negative skewness.
For skewed distributions, the mean tends to lie on the same side of the mode as the longer tail. Thus, a measure of the asymmetry is supplied by the difference:
Mean – mode. This can be made dimensionless if we divide it by a measure of dispersion, such as the standard deviation, leading to the definition:
Skewness = (mean – mode) / SD …………. (1)
To avoid using the mode, we can use the empirical formula:
Skewness = 3(mean – median) / SD ………. (2)
Equations (1) and (2) are called Pearson's first and second coefficients of skewness, respectively.
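A quick Python sketch of both coefficients; the figures passed in are invented for illustration (the tutor-marked assignment below asks you to apply the same formulas to its own figures):

    def pearson_skewness(mean, median, mode, sd):
        first = (mean - mode) / sd           # equation (1)
        second = 3 * (mean - median) / sd    # equation (2)
        return first, second

    # Hypothetical summary statistics, for illustration only.
    print(pearson_skewness(mean=10.5, median=10.2, mode=9.8, sd=2.0))
    # (0.35, 0.45): both coefficients indicate positive skewness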
Fig.1.3.4: illustration of Percentiles
3.0 Tutor Marked Assignments (Individual or Group)
a. How is Monte Carlo method different in approach from the typical
mode of simulation, in deterministic problems?
b. How is Monte Carlo method used in Engineering and Mathematics?
c. Why Convert to a Standard Normal Distribution?
d. What is the difference between a Continuous and a Discrete Distribution?
e. Given the following: mean=279.76, median=279.06, mode=277.5 and
SD=15.6, find the first and second coefficients of skewness
4.0 Summary
Our discussion in this study session, about Monte Carlo methods, shows that a
Monte Carlo method relies on repeated computation of random or pseudo-random
numbers. These methods are most suited to computations by a computer and tend to
be used when it is unfeasible or impossible to compute an exact result with a
deterministic algorithm (i.e. an algorithm whose behaviour can be completely
predicted from the input).
We use Statistical distributions to: investigate how a change in one variable relates to
a change in a second variable, represent situations with numbers, tables, graphs, and
verbal descriptions, understand measurable attributes of objects and their units,
systems, and processes of measurement, identify relationships among attributes of
entities or systems and their association.
In this study session, we discussed the following:
i. The algorithm of Monte Carlo method
ii. The history of Monte Carlo method which spurred the
development of pseudorandom number generators
iii. The application of Monte Carlo methods in areas such as physical
sciences, Engineering, Finance and Business, telecommunications, Games,
Mathematics, etc.
iv. We defined Statistics as a field of study that is concerned with the collection, description, and interpretation of data.
v. We saw that Statistical Distributions describe the numbers of times
each possible outcome occurs in a sample.
vi. We computed various measures of Central Tendency and Variations, which
can be used to make inferences, and explained the following components of
Statistical Distributions: Normal Distributions, z-score, percentile, Skewed
Distributions and ways to transform data to Graphs.
5.0 Self-Assessment Questions
1. List three areas of Applications of Monte Carlo Methods
2. The normal random variable of a standard normal distribution is called a
_________
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube via http://bit.ly/2ZB1rvu , http://bit.ly/2Ua3XE7 , http://bit.ly/2ZllBus , http://bit.ly/2MIZQhc , http://bit.ly/2Zwtbl6 , http://bit.ly/2ZnhkXo , http://bit.ly/2NIromR , http://bit.ly/2XJEu5u. Watch the videos & summarize each in 1 paragraph.
b. View the animation on Monte Carlo method & statistical distribution functions
and critique it in the discussion forum.
c. Take a walk and engage any 3 students on Monte Carlo Method & Statistical
Distribution Functions; In 2 paragraphs summarize their opinion of the discussed
topic. etc.
7.0 Self-Assessment Question Answers
1. Any three of the application areas discussed, for example:
i. Physical sciences
ii. Engineering
iii. Computer graphics
iv. Finance and Business
v. Telecommunications
vi. Games
2. Standard score or a z score
STUDY SESSION 4
Common Probability Distributions
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Distribution Functions and Simulation
2.1.1 Probability Definitions
2.2 Random Variables
2.3 Probability Function
2.4. Mathematical Treatment of Probability
2.5 Probability theory
2.6 The Limit theorems.
2.7 Probability Distribution Functions
2.8 Summary of Common Probability Distributions
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary and Conclusion
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
In this session, we shall look at the branch of statistics that deals with analysis of
random events. Probability is the numerical assessment of likelihood on a scale from
0 (impossibility) to 1 (absolute certainty). Probability is usually expressed as the
ratio between the number of ways an event can happen and the total number of
things that can happen (e.g., there are 13 ways of picking a diamond from a deck of
52 cards, so the probability of picking a diamond is 13/52, or ¼). Probability theory
grew out of attempts to understand card games and gambling. As science became
more rigorous, analogies between certain biological, physical, and social phenomena
and games of chance became more evident (e.g., the sexes of new born infants
follow sequences similar to those of coin tosses). As a result, probability became a
fundamental tool of modern genetics and many other disciplines.
Probabilistic simulation is the process of explicitly representing uncertainties by specifying inputs as probability distributions. The result of any analysis based on inputs represented by probability distributions is
itself a probability distribution. Hence, whereas the result of a deterministic
simulation of an uncertain system is a qualified statement ("if we build the dam, the
salmon population could go extinct"), the result of a probabilistic simulation of such
a system is a quantified probability ("if we build the dam, there is a 20% chance that
the salmon population will go extinct"). Such a result (in this case, quantifying the
risk of extinction) is typically much more useful to decision-makers who might
utilize the simulation results.
2.1.2 Probability Distribution
A probability distribution gathers all possible outcomes of a random variable (i.e.
any quantity for which more than one value is possible), and summarizes these
outcomes by indicating the probability of each of them. While a probability distribution is often associated with the bell-shaped curve, recognize that such a curve is only indicative of one specific type of probability distribution, the so-called normal distribution. However, in real life, a probability distribution can take any shape, size and form.
2.2 Random Variables
Random variable is discrete random variables if it can take on a finite or countable
number of possible outcomes. The previous example asking for a day of the week is
an example of a discrete variable, since it can only take seven possible values.
Monetary variables expressed in dollars and cents are always discrete, since money
is rounded to the nearest $0.01.
A random variable is a continuous random variable if it has infinitely many possible outcomes.
A rate of return (e.g. a growth rate) is continuous: a stock can grow by 9% next year or by 10%, and in between this range it could grow by 9.3%, 9.4%, 9.5%.
Clearly, there is no end to how precise the outcomes could be broken down; thus, it’s
described as a continuous variable.
Examples:
i. Rates of return can theoretically range from –100% to positive infinity.
ii. Time is bound on the lower side by 0.
iii. Market price of a security will also have a lower limit of $0, while its upper limit
will depend on the security – stocks have no upper limit (thus a stock price’s
outcome > $0),
iv. Bond prices are more complicated, bound by factors such as time-to-maturity and embedded call options. If the face value of a bond is $1,000, there is an upper limit (somewhere above $1,000) above which the price of the bond will not go, but pinpointing the upper value of that set is imprecise.
For a valid probability function, the probabilities of all possible outcomes must sum to 1. Consider the two candidate functions tabulated below:

X   f(x)      y   g(y)
1   0.31      6   0.32
2   0.43      7   0.40
3   0.26      8   0.23

Here f qualifies, since 0.31 + 0.43 + 0.26 = 1. The values of g sum to only 0.95, so either g is not a valid probability function, or there is a fourth possibility for y where g(y) = 0.05. Either way it needs to sum to 1.
Probabilities can also be assigned to ranges of values, for example:

X       f(x)
X < 0   0.2
X > 0   0.8
2.4.1 Joint Probability
If both the events A and B occur on a single performance of an experiment this is
called the intersection or joint probability of A and B, denoted as P(A ∩ B) or P(A
and B) or P(AB), which reads as the joint probability of A and B. If two events, A
and B are independent then the joint probability is:
P(A and B) = P(A ∩ B) = P(A)P(B)
For example, if two coins are flipped, the chance of both being heads is:
P(A and B) = P(A ∩ B) = P(A)P(B) = ½ × ½ = ¼.
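A short Python simulation confirming this joint probability; the trial count is an arbitrary choice:

    import random

    trials = 100_000
    both_heads = 0
    for _ in range(trials):
        # Two independent fair coin flips.
        a = random.random() < 0.5
        b = random.random() < 0.5
        if a and b:
            both_heads += 1

    print(both_heads / trials)  # close to 1/4 = P(A)P(B)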
2.4.2 Union of Events
If either of the events A or B (or both) occurs on a single performance of the experiment, this is their union, with probability P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For example, the chance of drawing a heart or a face card from a deck of 52 is 13/52 + 12/52 - 3/52 = 22/52; here the possibilities included in the "3 that are both" are included in each of the "13 hearts" and the "12 face cards" but should only be counted once.
2.4.3 Conditional Probability
This is the probability of some event A, given the occurrence of some other event B.
Conditional probability is written P(A|B), and is read "the probability of A, given B".
It is defined by:
P(A|B) = P(A ∩ B) / P(B).
If P(B) = 0, then P(A|B) is undefined by this expression.
Summary of probabilities

Event       Probability
A           P(A) ∈ [0,1]
not A       P(not A) = 1 - P(A)
A or B      P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
            = P(A) + P(B), if A and B are mutually exclusive
A and B     P(A ∩ B) = P(A|B)P(B)
            = P(A)P(B), if A and B are independent
A given B   P(A|B) = P(A ∩ B)/P(B)
Two or more events are mutually exclusive if the occurrence of any one of them
excludes the occurrence of the others.
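The rules in the table can be checked empirically. The Python sketch below simulates two independent events with illustrative probabilities of our own choosing (P(A) = 0.3, P(B) = 0.5) and verifies the addition rule and the conditional-probability definition:

    import random

    trials = 200_000
    a = b = a_and_b = a_or_b = 0
    for _ in range(trials):
        ea = random.random() < 0.3  # event A
        eb = random.random() < 0.5  # event B, independent of A
        a += ea
        b += eb
        a_and_b += ea and eb
        a_or_b += ea or eb

    p_a, p_b, p_ab = a / trials, b / trials, a_and_b / trials
    print(a_or_b / trials, p_a + p_b - p_ab)  # both near 0.65
    print(p_ab / p_a, p_b)  # P(B|A) near P(B) = 0.5, since A, B independent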
There have been at least two successful attempts to formalize probability, namely the
Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation,
sets are interpreted as events and probability itself as a measure on a class of sets. In
Cox's theorem, probability is taken as a primitive (that is, not further analysed) and
the emphasis is on constructing a consistent assignment of probability values to
propositions. In both cases, the laws of probability are the same, except for technical
details.
Question 1
What is a probabilistic simulation?
Answer
Probabilistic simulation is the process of explicitly representing uncertainties by specifying inputs
as probability distributions.
Probability theory is a mathematical science that permits one to find, using the
probabilities of some random events, the probabilities of other random events
connected in some way with the first.
The assertion that a certain event occurs with a probability equal, for example, to
1/2, is still not, in itself, of ultimate value, because we are striving for definite
knowledge. Of definitive, cognitive value are those results of probability theory that
allow us to state that the probability of occurrence of some event A is very close to 1
or (which is the same thing) that the probability of the non-occurrence of event A is
very small. According to the principle of “disregarding sufficiently small
probabilities,” such an event is considered practically reliable. Such conclusions,
which are of scientific and practical interest, are usually based on the assumption
that the occurrence or non-occurrence of event A depends on a large number of
factors that are slightly connected with each other.
Consequently, it can also be said that probability theory is a mathematical science
that clarifies the regularities that arise in the interaction of a large number of random
factors.
To describe the regular connection between certain conditions S and event A, whose
occurrence or non-occurrence under given conditions can be accurately established,
natural science usually uses one of the following schemes:
i. For each realization of conditions S, event A occurs. All the laws of classical
mechanics have such a form, stating that for specified initial conditions and
forces acting on an object or system of objects, the motion will proceed in an
unambiguously definite manner.
ii. Under conditions S, event A has a definite probability P(A/S) equal to p. Thus,
for example, the laws of radioactive emission assert that for each radioactive
substance there exists the specific probability that, for a given amount of a
substance, a certain number of atoms N will decompose within a given time
interval.
Let us call the frequency of event A in a given set of n trials (that is, of n repeated
realizations of conditions S) the ratio h = m/n of the number m of those trials in
which A occurs to the total number of trials n. The existence of a specific probability
equal to p for an event A under conditions S is manifested in the fact that in almost
every sufficiently long series of trials, the frequency of event A is approximately
equal to p.
Statistical laws, that is, laws described by a scheme of type (ii) above, were first discovered
in games of chance similar to dice. The statistical rules of birth and death (for
example, the probability of the birth of a boy is 0.515) have also been known for a
long time. A great number of statistical laws in physics, chemistry, biology, and
other sciences were discovered at the end of the 19th and in the first half of the 20th
century.
The possibility of applying the methods of probability theory to the investigation of
statistical laws, which pertain to a very wide range of scientific fields, is based on the
fact that the probabilities of events always satisfy certain simple relationships, which
will be discussed in the next section. The investigation of the properties of
probabilities of events on the basis of these simple relationships is also a topic of
probability theory.
Example: In the tossing of two dice, each of the 36 possible outcomes can be designated by (i, j), where i is the number of pips that comes up on the first die and j the number on the second. The outcomes are assumed to be equally likely. To the event A, "the sum of the pips is 4," three outcomes are favourable: (1,3); (2,2); (3,1). Consequently, P(A) = 3/36 = 1/12.
Starting from certain given events, it is possible to define two new events: their
union (sum) and intersection (product). Event B is called the union of events A1, A2,
…, Ar if it has the form “ A1 or A2, …, or Ar occurs.”
Event C is called the intersection of events A1, A2 …, Ar if it has the form “A1, and
A2, … , and Ar occurs.”
The union of events is designated by the symbol ∪, and the intersection by ∩. Thus, we write:
B = A1 ∪ A2 ∪ ... ∪ Ar
C = A1 ∩ A2 ∩ ... ∩ Ar
Events A and B are called disjoint if their simultaneous occurrence is impossible—
that is, if among the outcomes of a trial not one is favourable to A and B
simultaneously.
Two of the basic theorems of probability theory are connected with the operations of
union and intersection of events; these are the theorems of addition and
multiplication of probabilities.
Events A1, A2, …, Ar are said to be independent if the conditional probability of each
of them, under the condition that some of the remaining events have occurred, is
equal to its “absolute” probability.
Example:
Four shots are fired at a target, and the hit probability is 0.2 for each shot. The target
hits by different shots are assumed to be independent events. What is the probability
of hitting the target three times?
Each outcome of the trial can be designated by a sequence of four letters [for example, (s, f, f, s) denotes that the first and fourth shots hit the target (success), and the second and third miss (failure)]. There are 2 · 2 · 2 · 2 = 16 outcomes in all. In accordance with the assumption of independence of the results of individual shots, one should use formula (3) and the remarks about it to determine the probabilities of these outcomes. Thus, the probability of the outcome (s, f, f, f) is set equal to 0.2 × 0.8 × 0.8 × 0.8 = 0.1024; here, 0.8 = 1 - 0.2 is the probability of a miss for a single
shot. For the event "three shots hit the target," four outcomes are favourable: (s, s, s, f), (s, s, f, s), (s, f, s, s), and (f, s, s, s), and the probability of each is the same:
0.2 · 0.2 · 0.2 · 0.8 = · · · = 0.8 · 0.2 · 0.2 · 0.2 = 0.0064
Consequently, the desired probability is 4 × 0.0064 = 0.0256.
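A brief Python check of this result by direct simulation of the four independent shots:

    import random

    trials = 100_000
    three_hits = 0
    for _ in range(trials):
        hits = sum(random.random() < 0.2 for _ in range(4))  # four shots
        if hits == 3:
            three_hits += 1

    print(three_hits / trials)  # close to 4 x 0.0064 = 0.0256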
Generalizing the discussion of the given example, it is possible to derive one of the
fundamental formulas of probability theory: if events A1, A2, …, An are independent
and each has a probability p, then the probability of exactly m such events occurring
is:
Pn(m) = C(n, m) p^m (1 - p)^(n-m) ………….(4)
Here, C(n, m) denotes the number of combinations of n elements taken m at a time. For
large n, the calculation using formula (4) becomes difficult. In the preceding
example, let the number of shots equal 100; the problem then becomes one of finding
the probability x that the number of hits lies in the range from 8 to 32. The use of
formula (4) and the addition theorem gives an accurate, but not a practically useful,
expression for the desired probability.
The approximate value of the probability x can be found by the Laplace theorem
with the error not exceeding 0.0009. This is the simplest, but a typical, example of
the use of the limit theorems of probability theory.
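A sketch of that comparison in Python, computing the exact sum from formula (4) alongside the normal (Laplace) approximation; the continuity correction of 0.5 is a standard refinement we add ourselves:

    from math import comb, erf, sqrt

    n, p = 100, 0.2

    # Exact probability that the number of hits m lies between 8 and 32,
    # from formula (4) and the addition theorem.
    exact = sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in range(8, 33))

    # Laplace (normal) approximation with mean np and dispersion np(1 - p).
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    phi = lambda x: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    approx = phi(32.5) - phi(7.5)

    print(exact, approx)  # the two values agree closely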
The theorem of multiplication of probabilities turns out to be particularly useful in
the consideration of compound trials. Let us say that trial T consists of trials T1, T2,
…, Tn-1, Tn, if each outcome of trial T is the intersection of certain outcomes Ai, Bj, …, Xk, Yl of the corresponding trials T1, T2, …, Tn-1, Tn. From one or another consideration, the following probabilities are often known:
P(Ai), P(Bj/Ai), …, P(Yl/Ai ∩ Bj ∩ … ∩ Xk) ……………. (5)
According to the probabilities of (5), the probabilities P(E) for all outcomes E of the compound trial and, in addition, the probabilities of all events connected with this
trial can be determined using the multiplication theorem (just as was done in the
example above).
Two types of compound trials are the most significant from a practical point of view:
a) the component trials are independent, that is, the probabilities (5) are equal to
the unconditional probabilities P(Ai), P(Bj), …, P(Yl); and
b) the results of only the directly preceding trial have any effect on the
probabilities of the outcomes of any trial—that is, the probabilities (5) are
equal, respectively, to P(Ai), P(Bj/Ai), …, P(Yl/Xk).
In this case, it is said that the trials are connected in a Markov chain. The
probabilities of all the events connected with the compound trial are completely
determined here by the initial probabilities P(Ai) and the transition probabilities
P(Bj/Ai), …, P(Yl/Xk).
Often, instead of the complete specification of a probability distribution of a random
variable, it is preferable to use a small number of numerical characteristics. The most
frequently used are the mathematical expectation and the dispersion.
In addition to mathematical expectations and dispersions of these variables, a joint
distribution of several random variables is characterized by correlation coefficients
and so forth. The meaning of the listed characteristics is to a large extent explained
by the limit theorems
Let X1, X2, …, Xn be independent random variables that have one and the same probability distribution with EXk = a, DXk = σ², and let Yn be the arithmetic mean of the first n variables of the sequence, such that:
Yn = (X1 + X2 + X3 + · · · + Xn)/n
In accordance with the law of large numbers, for any ε > 0, the probability of the
inequality | Yn - a | ≤ ε has the limit 1 as n → ∞, and thus Yn, as a rule, differs little
from a.
The central limit theorem makes this result specific by demonstrating that the deviations of Yn from a are approximately subordinate to a normal distribution with mean zero and dispersion σ²/n. Thus, to determine the probabilities of one or another deviation of Yn from a for large n, there is no need to know all the details about the distribution of the variables Xk; it is sufficient to know only their dispersion.
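Both statements are easy to observe numerically. The Python sketch below uses uniform variables on [0, 1] (so a = 1/2 and σ² = 1/12), choices made purely for illustration:

    import random
    import statistics

    n, reps = 100, 10_000
    means = [statistics.fmean(random.random() for _ in range(n))
             for _ in range(reps)]

    print(statistics.fmean(means))      # close to a = 0.5
    print(statistics.pvariance(means))  # close to sigma^2/n = 1/1200 ≈ 0.00083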
In the 1920’s it was discovered that even in the scheme of a sequence of identically
distributed and independent random variables, limiting distributions that differ from
the normal can arise in a completely natural manner. Thus, for example, if X1 is the
time until the first reversion of some randomly varying system to the original state,
and X2 is the time between the first and second reversions, and so on, then under very
general conditions the distribution of the sum X1 + · · · + Xn (that is, of the time until the nth reversion), after multiplication by n^(-1/α) (α is a constant less than 1), converges to some limiting distribution. Thus, the time until the nth reversion increases, roughly speaking, as n^(1/α), that is, more rapidly than n (in the case of applicability of the law of large numbers, it is of the order of n).
The mechanism of the emergence of the majority of limiting regularities can be
understood ultimately only in connection with the theory of random processes.
A random process is a process (that is, a variation with time of the state of some system) whose course depends on chance and for which the probability of one or another of its courses is defined. In probability theory, a random process is usually
considered as a one-parameter family of random variables X(t). In an overwhelming
number of applications, the parameter t represents time, but this parameter can be,
for example, a point in space, and then we usually speak of a random function. In the
case when the parameter t runs through the integer-valued numbers, the random
function is called a random sequence. Just as a random variable is characterized by
a distribution law, a random process can be characterized by a set of joint
distribution laws for X(t1), X(t2), …, X(tn) for all possible moments of t1, t2, …, tn for
any n > 0.
Question 2
What is Probability theory?
Answer
Probability theory is a mathematical science that permits one to find, using the probabilities of
some random events, the probabilities of other random events connected in some way with the first.
The concept of the probability distribution, and of the random variables it describes, underlies the mathematical discipline of probability theory and the science of statistics. There is spread or variability in almost any value that can be measured
in a population (e.g. height of people, durability of a metal, sales growth, traffic flow,
etc.); almost all measurements are made with some intrinsic error; also in physics
many processes are described probabilistically, from the kinetic properties of gases
to the quantum mechanical description of fundamental particles. For these and many
other reasons, simple numbers are often inadequate for describing a quantity, while
probability distributions are often more appropriate.
For a discrete random variable X, the probabilities of the possible values u sum to one:
Σu P(X = u) = 1
Another convention reserves the term continuous probability distribution for
absolutely continuous distributions. These distributions can be characterized by a
probability density function: a non-negative Lebesgue-integrable function f defined on the real numbers such that
F(x) = μ((-∞, x]) = ∫-∞^x f(t) dt
Discrete distributions and some continuous distributions do not admit such a density.
Terminologies
The support of a distribution is the smallest closed interval/set whose complement
has probability zero. It may be understood as the points or elements that are actual
members of the distribution.
A discrete random variable is a random variable whose probability distribution is
discrete. Similarly, a continuous random variable is a random variable whose
probability distribution is continuous.
Some properties
i. The probability density function of the sum of two independent random
variables is the convolution of each of their density functions.
ii. The probability density function of the difference of two independent random
variables is the cross-correlation of their density functions.
iii. Probability distributions are not a vector space – they are not closed under
linear combinations, as these do not preserve non-negativity or total integral 1
– but they are closed under convex combination, thus forming a convex subset
of the space of functions (or measures).
In mathematics and, in particular, functional analysis, convolution is a mathematical
operation on two functions f and g, producing a third function that is typically
viewed as a modified version of one of the original functions. Convolution is similar
to cross-correlation. It has applications that include statistics, computer vision, image
and signal processing, electrical engineering, and differential equations.
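Property (i) above can be illustrated numerically, assuming NumPy is available; the grid spacing is an arbitrary choice. The density of the sum of two independent uniform [0, 1] variables is the convolution of their densities, a triangle on [0, 2]:

    import numpy as np

    dx = 0.001
    x = np.arange(0, 1, dx)
    f = np.ones_like(x)  # density f(x) = 1 on [0, 1]

    # Convolve the density with itself; multiply by dx to approximate
    # the convolution integral.
    g = np.convolve(f, f) * dx

    print(g.max())       # close to 1.0, the peak of the triangle at x = 1
    print(g.sum() * dx)  # close to 1.0, total probability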
2.8 Summary of Common Probability Distributions
The following is a list of some of the most common probability distributions,
grouped by the type of process that they are related to. Note that all of the univariate
distributions below are singly peaked; that is, it is assumed that the values cluster
around a single point. In practice, actually observed quantities may cluster around
multiple values. Such quantities can be modelled using a mixture distribution.
2.8.1 Related to real-valued quantities that grow linearly (e.g. errors, offsets)
i. Normal distribution (aka Gaussian distribution), for a single such quantity; the
most common continuous distribution.
ii. Multivariate normal distribution (aka multivariate Gaussian distribution), for
vectors of correlated outcomes that are individually Gaussian-distributed.
Related to Bernoulli trials (yes/no events, with a given probability):
i. Bernoulli distribution, for the outcome of a single Bernoulli trial;
ii. Binomial distribution, for the number of "positive occurrences" (e.g. successes,
yes votes, etc.) given a fixed total number of independent occurrences;
iii. Negative binomial distribution, for binomial-type observations but where the
quantity of interest is the number of failures before a given number of successes
occurs;
iv. Geometric distribution, for binomial-type observations but where the quantity of
interest is the number of failures before the first success; a special case of the
negative binomial distribution.
Related to categorical outcomes (events with K possible outcomes):
i. Categorical distribution, for a single categorical outcome;
ii. Multinomial distribution, for the number of each type of categorical outcome, given a fixed number of total occurrences;
iii. Multivariate hypergeometric distribution, similar to the multinomial distribution, but using sampling without replacement; a generalization of the hypergeometric distribution.
Useful as conjugate prior distributions in Bayesian inference:
i. Beta distribution, for a single probability (a real number between 0 and 1); conjugate to the Bernoulli and binomial distributions;
ii. Gamma distribution, for a non-negative scaling parameter; conjugate to the rate parameter of a Poisson or exponential distribution;
iii. Dirichlet distribution, for a vector of probabilities that must sum to 1;
conjugate to the categorical distribution and multinomial distribution;
generalization of the beta distribution
iv. Wishart distribution, for a symmetric non-negative definite matrix; conjugate
to the inverse of the covariance matrix of a multivariate normal distribution;
generalization of the gamma distribution
vi. Provided a listing of common probability distributions grouped by their
related processes
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit U-tube add http://bit.ly/30J2arU , http://bit.ly/2Zu19CB ,
http://bit.ly/2UalP1D , http://bit.ly/2Lch03M , http://bit.ly/2L3q1Nu ,
http://bit.ly/2Wvhvtd. Watch the video & summarize in 1 paragraph
b. View the animation on Common Probability Distributions and critique it in the
discussion forum
c. Take a walk and engage any 3 students on Common Probability Distributions; In 2
paragraphs summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
Sheldon M. Ross (2007). Introduction to Probability Models 9th Edition.
Academic press 2007.
David Brink, Essentials of Statistics, David Brink and Ventus Publishing Aps
2010. http://bookboon.com/int/student/statistics/statistics-essentials.pdf
Schaum's Outlines: Statistics, 3rd Edition, Murray R. Spiegel and Larry J. Stephens.
http://en.wikipedia.org/wiki/Probability_theory
http://en.wikipedia.org/wiki/Probability_distribution
Olav Kallenberg; Probabilistic Symmetries and Invariance Principles. Springer -
Verlag, New York (2005). 510 pp. ISBN 0-387-25115-4
Gut, Allan (2005). Probability: A Graduate Course. Springer-Verlag.
ISBN 0387228330.
http://en.wikipedia.org/wiki/Probability#Mathematical_treatment
MODULE 2
Modelling and Simulation Methods
Contents:
Study Session 1: Simulation and Modelling
Study Session 2: Modelling Methods
Study Session 3: Finite Element Model
Study Session 4: Statistics for Modelling and Simulation
STUDY SESSION 1
Simulation and Modelling
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content/What is Simulation?
2.1 When to Use simulation
2.2 Types of Simulations
2.3 Steps in Constructing A Simulation Model.
2.4 Applications of Computer Simulation
2.5 Model Evaluation
3.0 Tutor Marked Assignments (Individual or Group)
4.0 Conclusion/Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. You will agree with me that real-world phenomena are very dynamic and thus difficult to predict exactly. To make decisions in such circumstances, we need a tool, or verifiable procedures, that guides decision makers to an informed and provable decision and action. In this study session, we will look at one such tool: simulation, which has become a cornerstone of many probabilistic projects.
Simulation allows you to explore the manner in which the system will evolve and respond to its surroundings, so that you can identify any necessary changes that will help make the system perform the way that you want it to.
For example, a fisheries biologist could dynamically simulate the salmon population
in a river in order to predict changes to the population, and quantitatively understand
the impacts on the salmon of possible actions (e.g., fishing, loss of habitat) to ensure
that they do not go extinct at some point in the future.
In addition, flight simulator on PC is also a computer model of some aspect of the
flight; it shows on the screen the controls and what the pilot is supposed to see from
the “cockpit” (his armchair).
Simulation, therefore, is a technique (not a method) for representing a dynamic real-world system by a model and experimenting with the model in order to gain information about the system and hence take appropriate decisions. Simulation can be done by hand or by a computer.
Simulation is a powerful and important tool because it provides a way in which
alternative designs, plans and/or policies can be evaluated without having to
experiment on a real system, which may be prohibitively costly, time-consuming, or
simply impractical to do. That is, it allows you to ask "What if?" questions about a
system without having to experiment on the actual system itself (and hence incur the
costs of field tests, prototypes, etc.).
You can use simulation to observe the dynamic behaviour of a model of real or
imaginary system. Indeed, by simulating a complex system we are able to understand
the behaviour at low cost. Otherwise, we would have to carry out a complicated
theoretical research or to build a device (an electric heater, a building or a plane), and
observe how it changes to get hints for improvements in the design.
If you run a shop, a hospital or a bank, then computer simulation may show you
bottlenecks, service time, flows, and queues of clients and provide important
information on how to improve your business.
Note that often we describe a real world system by:
i. a physical model
ii. a mathematical or analytic model
iii. an analogue model
Now, what do you think happens when a system is not amenable to treatment using the above models? Constructing a real physical system could be very expensive, and worse still, testing it with live human beings and observing what happens could be fatal. Training a new pilot using a real airplane is suicidal. This is why simulation is designed and utilized.
Thus, simulation is the answer to our question. Many Operations Research analysts consider simulation to be a method of last resort, because it is most useful when other approaches cannot be used, for example, when a real-world situation is complex. Note that nothing prevents you from applying a simulation approach to an analytic problem; results can at least be compared.
Thus, before designing and implementing a real life system, it is necessary to find
out via simulation studies whether the system will work, otherwise the whole
exercise will be a wild goose chase. Inevitably, huge sums of money will have been
wasted.
Unlike the situation in mathematical programming, so far there is no clear-cut
underlying principle guiding the formulation of simulation models. Each application
is ad-hoc to a large extent. In general, there are three basic objectives of simulation
studies:
i. To Describe a Current System – Suppose that a manufacturing company has suddenly observed a marked deterioration in meeting due-dates of customers' orders. It may be necessary to build a simulation model to see how the current procedures for estimating due dates, scheduling production and ordering raw materials are giving rise to the observed delays.
ii. To Explore a Hypothetical System – before installing a new system, which will cost a lot of money, it might be better to build a hypothetical model of the system and learn from its behaviour.
iii. To Design an Improved System – for example, consider a supermarket that has one payment counter. Due to an increase in patronage, it is considering increasing the number of pay points. A simulation experiment may identify whether one, two or more additional points are needed or not.
iii. Stochastic models use random number generators to model the chance or
random events; they are also called Monte Carlo simulations.
There are two basic types of simulation for which models are built, and the process
of choosing the subset of characteristics or features is different for each. The
distinction between the two types is based on how time is represented; either as a
continuous variable or as a discrete variable.
Historically, continuous simulations were run on analogue computers; later, "analogue" simulations were run on conventional digital computers that emulate the behaviour of an analogue computer.
A typical continuous (stochastic) system has a large number of control parameters that can have a significant impact on the performance of the system. To establish a basic knowledge of the behaviour of a system under variation of its input parameters, sensitivity analysis is usually performed; it applies small changes to the nominal values of the input parameters. In such simulations, variations of an input parameter cannot be made infinitely small; the sensitivity of the performance measure with respect to an input parameter is therefore defined as a (partial) derivative.
Sensitivity analysis is concerned with evaluating sensitivities (gradients) of performance measures with respect to parameters of interest. It provides guidance for design and operational decisions and plays a pivotal role in identifying the most significant system parameters, as well as bottlenecks among subsystems.
In designing, analysing and operating such complex systems, one is interested not
only in performance evaluation but also in sensitivity analysis and optimisation.
Consider, for example, a petrol station with a single pump, and plot against time the number of cars in the system. Every time a car arrives the graph increases by one unit, while a departing car causes the graph to drop by one unit. This graph (called a sample path) could be obtained from observation of a real station, but could also be artificially constructed. Such artificial construction, and the analysis of the resulting sample path (or of more sample paths in more complex cases), constitutes the simulation.
The path consists of only horizontal and vertical lines, as car arrivals and departures occur at distinct points in time, which we refer to as events. Between two consecutive events, nothing happens – the graph is horizontal. When the number of events is finite, we call the simulation discrete event.
Discrete event systems (DES) are dynamic systems, which evolve in time by the
occurrence of events at possible irregular time intervals. DES abounds in real-world
applications. Examples include traffic systems, flexible manufacturing systems,
computer communication systems, production lines, flow networks etc. Most of
these systems can be modelled in terms of discrete events whose occurrence causes
the system to change from one state to another.
Simulations may be performed manually. Most often, however, the system model
is written either as a computer program or as some kind of input into simulator
software.
A discrete event simulation (DE) manages events in time. Most computer, logic-test
and fault-tree simulations are of this type. In this type of simulation, the simulator
maintains a queue of events sorted by the simulated time they should occur. The
simulator reads the queue and triggers new events as each event is processed. It is not
important to execute the simulation in real time. It's often more important to be able
to access the data produced by the simulation, to discover logic defects in the design,
or the sequence of events.
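To illustrate, here is a hedged Python sketch of such an event queue for the single-pump petrol station discussed in this session; the exponential inter-arrival and service times, the rates, and the time horizon are modelling assumptions of ours:

    import heapq
    import random

    def pump_simulation(arrival_rate=1.0, service_rate=1.5, horizon=1000.0):
        # The simulator keeps a queue of events sorted by simulated time
        # and processes them in order, scheduling new events as it goes.
        events = [(random.expovariate(arrival_rate), "arrival")]
        queue_len, busy, served = 0, False, 0
        while events:
            t, kind = heapq.heappop(events)
            if t > horizon:
                break
            if kind == "arrival":
                # Schedule the next arrival.
                heapq.heappush(events, (t + random.expovariate(arrival_rate), "arrival"))
                if busy:
                    queue_len += 1  # car joins the queue in front of the pump
                else:
                    busy = True  # start of service
                    heapq.heappush(events, (t + random.expovariate(service_rate), "departure"))
            else:  # end of service: the car departs
                served += 1
                if queue_len > 0:
                    queue_len -= 1
                    heapq.heappush(events, (t + random.expovariate(service_rate), "departure"))
                else:
                    busy = False
        return served

    print(pump_simulation())  # roughly arrival_rate x horizon cars served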
A special type of discrete simulation that does not rely on a model with an
underlying equation, but can nonetheless be represented formally, is agent-based
simulation. In agent-based simulation, the individual entities (such as molecules,
cells, trees or consumers) in the model are represented directly (rather than by their
density or concentration) and possess an internal state and set of behaviours or rules
which determine how the agent's state is updated from one time-step to the next.
In-text Question 1
Discrete event systems (DES) are dynamic systems. True or False?
Answer
True
a) State: the number of cars in the system at any given moment
b) Event: arrival of cars, start of service, end of service
c) Entities: these are the cars
d) Queue: the queue of cars in front of the pump waiting for service
Scheduling: is the act of assigning a new future event to an existing entity.
Random variable: is a quantity that is uncertain, such as the interval of time between two incoming flights or the number of defective parts in a shipment.
Random variate: is an artificially generated random variable.
Distribution: is the mathematical law, which governs the probabilistic
features of a random variable.
2.4 Applications of Computer Simulation
Computer simulation has become a useful part of modelling many natural systems in
physics, chemistry and biology, and human systems in economics and social science
(the computational sociology) as well as in engineering to gain insight into the
operation of those systems. A good example of the usefulness of using computers to
simulate can be found in the field of network traffic simulation. In such simulations,
the model behaviour will change each simulation according to the set of initial
parameters assumed for the environment. Computer simulations are often considered
to be human-out-of-the-loop simulations.
Computer graphics can be used to display the results of a computer simulation.
Animations can be used to experience a simulation in real-time e.g. in training
simulations. In some cases, animations may also be useful in faster than real-time or
even slower than real-time modes. For example, faster than real-time animations can
be useful in visualizing the build-up of queues in the simulation of humans
evacuating a building.
There are many different types of computer simulation; the common feature they all
share is the attempt to generate a sample of representative scenarios for a model in
which a complete enumeration of all possible states of the model would be
prohibitive or impossible. Several software packages also exist for running
computer-based simulation modelling that makes the modelling almost effortless and
simple.
i. Simulation in computer science
In computer science, simulation has an even more specialized meaning: Alan Turing
uses the term "simulation" to refer to what happens when a digital computer runs a
state transition table (runs a program) that describes the state transitions, inputs and
outputs of a subject discrete-state machine. The computer simulates the subject
machine.
In computer programming, a simulator is often used to execute a program that has to run on some inconvenient type of computer, or in a tightly controlled testing environment; for example, simulators are usually used to debug a microprogram or, sometimes, commercial application programs. Since the operation of the computer is simulated, all of the information about the computer's operation is directly available to the programmer, and the speed and execution of the simulation can be varied at will.
Simulators may also be used to interpret fault trees, or test very large scale
integration (VLSI) logic designs before they are constructed. In theoretical computer
science, the term simulation represents a relation between state transition systems.
ii. Simulation in training
Simulation is often used in the training of civilian and military personnel. This
usually occurs when it is prohibitively expensive or simply too dangerous to allow
trainees to use the real equipment in the real world. In such situations, they will
spend time learning valuable lessons in a "safe" virtual environment. Often the
convenience is to permit mistakes during training for a safety-critical system.
Training simulations typically come in one of three categories:
[1] live simulation (where real people use simulated (or "dummy") equipment in the real world);
[2] virtual simulation (where real people use simulated equipment in a simulated world, or virtual environment); and
[3] constructive simulation (where simulated people use simulated equipment in a simulated environment). Constructive simulation is often referred to as "war gaming" since it bears some resemblance to table-top war games in which players command armies of soldiers and equipment which move around a board.
iii. Simulation in Education
Simulations in education are somewhat like training simulations. They focus on
specific tasks. In the past, video has been used for teachers and students to observe,
problem solve and role play; however, a more recent use of simulation in education
include animated narrative vignettes (ANV). ANVs are cartoon-like video narratives
of hypothetical and reality-based stories involving classroom teaching and learning.
ANVs have been used to assess knowledge, problem solving skills and dispositions
of children, and pre-service and in-service teachers.
Another form of simulation has been finding favour in business education in recent
years. Business simulations that incorporate a dynamic model enable
experimentation with business strategies in a risk free environment and provide a
useful extension to case study discussions.
iv. Medical Simulators
Medical simulators are increasingly being developed and deployed to teach
therapeutic and diagnostic procedures as well as medical concepts and decision
making to personnel in the health professions. Simulators have been developed for
training procedures ranging from the basics such as blood draw, to laparoscopic
surgery and trauma care.
Many medical simulators involve a computer connected to a plastic simulation of the
relevant anatomy. Sophisticated simulators of this type employ a life size
mannequin, which responds to injected drugs and can be programmed to create
simulations of life-threatening emergencies. In others simulations, visual
components of the procedure are reproduced by computer graphics techniques, while
touch-based components are reproduced by haptic feedback devices combined with
physical simulation routines computed in response to the user's actions. Medical
simulations of this sort will often use 3D CT or MRI scans of patient data to enhance
realism.
Another important medical application of a simulator -- although, perhaps, denoting
a slightly different meaning of simulator -- is the use of a placebo drug, a
formulation that simulates the active drug in trials of drug efficacy.
v. City Simulators / Urban Simulation
A City Simulator is a tool used by urban planners to understand how cities are likely
to evolve in response to various policy decisions. UrbanSim: The City Simulator
developed at the University of Washington and ILUTE developed at the University
of Toronto are examples of modern, large-scale urban simulators designed for use by
urban planners. City simulators are generally agent-based simulations with explicit
representations for land use and transportation.
vi. Flight simulators
A flight simulator is used to train pilots on the ground. It permits a pilot to crash his
simulated "aircraft" without being hurt. Flight simulators are often used to train
pilots to operate aircraft in extremely hazardous situations, such as landings with no
engines, or complete electrical or hydraulic failures. The most advanced simulators
have high-fidelity visual systems and hydraulic motion systems. The simulator is
normally cheaper to operate than a real trainer aircraft.
vii. Marine simulators
Marine simulators bear a resemblance to flight simulators and are used to train
ship personnel. Simulators like these are mostly used to simulate large or complex
vessels, such as cruise ships and dredging ships. They often consist of a replication
of a ship's bridge, with operating desk(s), and a number of screens on which the
virtual surroundings are projected.
In engineering, simulation is also widely used. In electrical engineering, for
example, delay lines may be used to simulate propagation delay and phase shift
caused by an actual transmission line. Similarly, dummy loads may be used to
simulate impedance without simulating propagation; these are used in situations
where propagation is unwanted. A simulator may imitate only a few of the
operations and functions of the unit it simulates.
Contrast with: emulate.
Most engineering simulations entail mathematical modelling and computer-assisted
investigation. There are many cases, however, where mathematical modelling alone
is not reliable. Simulation of fluid dynamics problems often requires both
mathematical and physical simulations. In these cases, the physical models require
dynamic similitude. Physical and chemical simulations also have direct practical
uses, rather than research uses; in chemical engineering, for example, process
simulations are used to give the process parameters immediately used for operating
chemical plants, such as oil refineries.
Discrete Event Simulation is often used in industrial engineering, operations
management and operational research to model many systems (commerce, health,
defence, manufacturing, logistics, etc.); for example, the value-adding
transformation processes in businesses, to optimize business performance. Imagine a
business where each person could do 30 tasks, where thousands of products or
services involved dozens of tasks in a sequence, and where customer demand varied
seasonally and forecasting was inaccurate: this is the domain where such simulation
helps with business decisions across all functions.
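As a rough illustration of discrete-event simulation (not taken from the course text), the following Python sketch advances a simulated clock from event to event for a single-server queue; the arrival rate, service rate and customer count are arbitrary assumptions.

```python
# A minimal discrete-event sketch: a single-server queue. The arrival rate,
# service rate and customer count are arbitrary assumptions.
import random

random.seed(1)
ARRIVAL_RATE, SERVICE_RATE, N_CUSTOMERS = 1.0, 1.25, 10_000

clock = 0.0            # simulation time
server_free_at = 0.0   # time at which the server next becomes idle
total_wait = 0.0

for _ in range(N_CUSTOMERS):
    clock += random.expovariate(ARRIVAL_RATE)   # next arrival event
    start = max(clock, server_free_at)          # wait if the server is busy
    total_wait += start - clock
    server_free_at = start + random.expovariate(SERVICE_RATE)  # departure

print(f"mean wait in queue: {total_wait / N_CUSTOMERS:.3f} time units")
```

Because time jumps directly from one event to the next, thousands of simulated customers can be processed in a fraction of a second; this is what makes discrete-event simulation practical for the business systems described above.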
viii. Simulation and games
Strategy games - both traditional and modern - may be viewed as simulations of
abstracted decision-making for the purpose of training military and political leaders.
In a narrower sense, many video games are also simulators, implemented
inexpensively. These are sometimes called "sim games". Such games can simulate
various aspects of reality, from economics to piloting vehicles, such as flight
simulators (described above).
ix. The "classroom of the future"
The "classroom of the future" will probably contain several kinds of simulators, in
addition to textual and visual learning tools. This will allow students to enter school
better prepared, and with a higher skill level. The advanced student or postgraduate
will have a more concise and comprehensive method of retraining -- or of
incorporating new academic content into their skill set -- and regulatory bodies and
institution managers will find it easier to assess the proficiency and competence of
individuals.
In classrooms of the future, the simulator will be more than a "living" textbook; it
will become an integral part of the practice of education and training. The
simulator environment will also provide a standard platform for curriculum
development in educational institutions.
In-text Question 1
What is an Event in Simulation?
Answer
An Event is an occurrence at a point in time, which may change the state of the system, such as
arrival of a customer or start of work on a job.
A central question in validating a model is: how well does it describe the measured
data (interpolation)? Does the model describe well events outside the measurement
data (extrapolation)?
A common approach is to split the measured data into two parts: training data and
verification data. The training data is used to train the model, that is, to estimate the
model parameters (see above). The verification data is used to evaluate model
performance. Assuming that the training data and verification data are not the same,
we can assume that if the model describes the verification data well, then the model
describes the real system well.
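A minimal sketch of this split, using synthetic data and an assumed straight-line model (both invented for illustration), might look as follows in Python:

```python
# A minimal sketch of the split: synthetic data, an assumed straight-line
# model, 70% used for training and 30% held out for verification.
import random

random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in range(100)]
random.shuffle(data)
train, verify = data[:70], data[70:]

def fit_line(points):
    """Least-squares estimate of slope a and intercept b for y = a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

a, b = fit_line(train)  # estimate the model parameters from training data
mse = sum((y - (a * x + b)) ** 2 for x, y in verify) / len(verify)
print(f"slope={a:.3f}, intercept={b:.3f}, verification MSE={mse:.3f}")
```

A low error on the held-out verification data gives some confidence that the model describes the real system well, exactly as argued above.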
However, this still leaves the extrapolation question open. How well does this model
describe events outside the measured data? Consider again the Newtonian classical
mechanics model. Newton made his measurements without advanced equipment, so
he could not measure the properties of particles travelling at speeds close to the
speed of light. Likewise, he did not measure the movements of molecules and other
small particles, but macro particles only. It is then not surprising that his model does
not extrapolate well into these domains, even though his model is sufficient for
ordinary life physics.
The reliability of computer simulations, and the trust people put in them, depend on
the validity of the simulation model; therefore, verification and validation are of
crucial importance in the development of computer simulations. Another important
aspect of computer simulations is reproducibility of the results, meaning that a
simulation model should not provide a different answer for each execution. Although
this might seem obvious, it is a special point of attention in stochastic simulations,
where the random numbers should actually be pseudo-random numbers. An
exception to reproducibility is human-in-the-loop simulations, such as flight
simulations and computer games. Here a human is part of the simulation and thus
influences the outcome in a way that is hard, if not impossible, to reproduce exactly.
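A minimal Python sketch of how reproducibility is obtained in practice, assuming we are free to seed the pseudo-random number generator:

```python
# A minimal sketch: seeding the pseudo-random number generator makes a
# stochastic simulation reproducible run after run.
import random

def stochastic_run(seed):
    rng = random.Random(seed)                 # dedicated, seeded generator
    return sum(rng.random() for _ in range(1000))

print(stochastic_run(42) == stochastic_run(42))  # True: identical results
print(stochastic_run(42) == stochastic_run(43))  # False: different streams
```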
ii. In one sentence each, distinguish between the different types of simulation.
iii. Briefly describe simulation in five application areas.
c. Continuous or discrete (and as an important special case of
discrete, discrete event or DE models)
d. Local or distributed
ii. Stated that simulations are done by formulating the model, designing the
experiment, and developing the computer programs.
iii. Listed areas of application of simulation to include: computer science,
medicine, education, city/urban planning, training, etc.
2. The common approach is to split the measured data into two parts: training
data and verification data. The training data is used to train the model, that is,
to estimate the model parameters (see above). The verification data is used to
evaluate model performance. Assuming that the training data and verification
data are not the same, we can assume that if the model describes the
verification data well, then the model describes the real system well.
STUDY SESSION 2
Modelling Methods
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Definitions of Modelling
2.2 Basic Modelling Concepts
2.3 Visual and Conceptual models
2.4 Features of Visual and Conceptual Model
2.5 Cognitive Affordances of Visual Models
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary and Conclusion
5.0 Self-Assessment Questions and Answers
6.0 Additional Activities
7.0 References/Further Readings
Introduction
You are welcome to this study session, where we shall explain modelling methods.
Modelling is an essential and inseparable part of all scientific activity, and many
scientific disciplines have their own ideas about specific types of modelling. What
little general theory there is about scientific modelling is offered by the philosophy
of science, systems theory, and new fields like knowledge visualization.
We create models to represent the objects within a system together with the rules
that govern the interactions of those objects. The representation may be concrete, as
in the case of a spaceship or flight simulator, or abstract, as in the case of a computer
program that examines the number of checkout stations in a service queue.
1.0 Study Session Learning Outcomes
After studying this session, I expect you to be able to:
1) Define Modelling
2) Describe some basic modelling concepts
3) Differentiate between Visual and Conceptual models
4) Explain the characteristics of Visual models
2.1 Basic Modelling Concepts
Modelling as a substitute for direct measurement and experimentation
Models are typically used when it is either impossible or impractical to create
experimental conditions in which scientists can directly measure outcomes. Direct
measurement of outcomes under controlled conditions will always be more accurate
than modelled estimates of outcomes. When predicting outcomes, models use
assumptions, while measurements do not. As the number of assumptions in a model
increases, the accuracy and relevance of the model diminishes.
Modelling language
A modelling language is any artificial language that can be used to express
information or knowledge or systems in a structure that is defined by a consistent set
of rules. The rules are used for interpretation of the meaning of components in the
structure.
Simulation
A simulation is the implementation of a model. A steady state simulation provides
information about the system at an instant in time (usually at equilibrium, if it
exists). A dynamic simulation provides information over time. A simulation brings a
model to life and shows how a particular object or phenomenon will behave. It is
useful for testing, analysis or training where real-world systems or concepts can be
represented by a model.
Structure
Structure is a fundamental and sometimes intangible notion covering the recognition,
observation, nature, and stability of patterns and relationships of entities. From a
child's verbal description of a snowflake, to the detailed scientific analysis of the
properties of magnetic fields, the concept of structure is an essential foundation of
nearly every mode of inquiry and discovery in science, philosophy, and art.
Systems
A system is a set of interacting or interdependent entities, real or abstract, forming
an integrated whole. In general, a system is a construct or collection of different
elements that together can produce results not obtainable by the elements alone. The
concept of an 'integrated whole' can also be stated in terms of a system embodying a
set of relationships which are differentiated from relationships of the set to other
elements, and from relationships between an element of the set and elements not a
part of the relational regime.
There are two types of systems:
a) Discrete, in which the variables change instantaneously at separate points in
time and
b) Continuous, where the state variables change continuously with respect to time
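To illustrate the contrast, here is a small, hypothetical Python sketch; the draining-tank and stock-level models below are invented for illustration only.

```python
# A small, invented sketch contrasting the two system types.

# Continuous system: a tank level y obeys dy/dt = -0.5 * y; the state is
# advanced with small Euler steps, changing continuously with time.
y, dt = 10.0, 0.01
for _ in range(100):                 # simulate one time unit
    y += dt * (-0.5 * y)
print(f"continuous tank level at t=1: {y:.3f}")

# Discrete system: a stock level changes instantaneously at event times.
stock = 10
for event in ["sale", "sale", "delivery", "sale"]:
    stock += 5 if event == "delivery" else -1
print(f"discrete stock level after events: {stock}")
```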
2.1.2 Factors in evaluating a model
A model is evaluated first by its consistency with empirical data; any model
inconsistent with reproducible observations must be modified or rejected. However,
a fit to empirical data alone is not sufficient for a model to be accepted as valid.
Other factors important in evaluating a model include:
i. Ability to explain past observations
ii. Ability to predict future observations
iii. Cost of use, especially in combination with other models
iv. Refutability, enabling estimation of the degree of confidence in the model
v. Simplicity, or even aesthetic appeal
We can however define visual models by what they strive to do, and list some of the
important characteristics that distinguish 'visual models' from other kinds of graphic
art.
Fig 2.2.1: Visual and conceptual models
Sall (1991a) shows how this mechanical model neatly describes the effects of sample
size on the power of a test, the leverage of outlying observations in regression,
principal components, and collinearity, among others.
Conceptual Modelling
a. is used for abstract (visual) representation of the problem domain;
b. serves to enhance understanding of complex problem domains; and
c. provides a basis for communication among project team members.
In-text Question 1
What is a Conceptual model?
Answer
A conceptual model is a type of model used for abstract (visual) representation of the problem
domain
2.3 Features of Visual and Conceptual Model
A visual model should:
1) Render conceptual knowledge as opposed to quantitative data (information
visualization) or physical things (technical illustration). We usually express
conceptual knowledge with words alone, and yet the meaning behind those
words is often inherently visual. Visual models seek to directly render the
image-schematic meaning that lies behind our words.
2) Be good models - the images should accurately reflect the situation in the
world and embody the characteristics of a useful model.
3) Integrate the most salient aspects of the problem into a clear and coherent
picture.
4) Fit the visual structure to the problem – and not force the problem into a
predefined visual structure.
5) Use a consistent visual grammar.
6) Be visually and cognitively tractable. Visual models exist to support
robust qualitative thinking: they are software for 'human-simulation' (as
opposed to computer-simulation) of the issue at hand. To serve as effective
'simulation software', visual models must be 'readable' and 'runnable' by our
visuo-cognitive 'hardware' and should positively engage our prodigious visual
intelligence.
7) Tap into the power of elegant design. In other words, they should not be ugly.
Conceptual Modelling
a. A good conceptual model should NOT reflect a solution bias.
b. It should model the problem domain, not the solution domain.
c. The initial conceptual model may be rough and general.
d. It may be refined during incremental development.
2.4 Cognitive Affordances of Visual Models
Due to the limited capacity of our working memory (7 ± 2 'chunks' of information),
we cannot hold in our minds concepts, arguments, or problems that consist of more
than 5 to 9 objects or relationships. While this cognitive limitation severely restricts
our ability to think about complex things, we can do what we often do: extend our
intellectual abilities with external representations or 'models' of the problem.
The particular affordances of diagrams – their ability to simultaneously show many
objects and relationships – make them an ideal tool for thinking about conceptually
complex problems. Diagrams provide an external mnemonic aid that enables us to
see complicated relationships and easily move between various mind-sized
groupings of things.
In-text Question 2
Diagrams provide an external mnemonic aid that enables us to see complicated
relationships. True or False?
Answer
True
Summary
The essence of constructing a model is to identify a small subset of characteristics or
features that are sufficient to describe the behaviour of the system under
investigation. Since a model is an abstraction of a real system and not the system
itself, there is, therefore, a fine line between having too few characteristics to
accurately describe the behaviour of the system and having more than you need.
The goal should be to build the simplest model that
effectively describes the relevant behaviour of the system.
i. We defined modelling as the process of generating abstract, conceptual,
graphical and/or mathematical models. Science offers a growing
collection of methods, techniques and theory about all kinds of specialized
scientific modelling.
ii. We listed and briefly explained some basic modelling concepts.
iii. We differentiated between Visual and Conceptual models.
iv. We discussed the important factors in evaluating a model, which include:
a. ability to explain past observations
b. ability to predict future observations
c. cost of use, especially in combination with other models
d. refutability, enabling estimation of the degree of confidence in the model
e. simplicity, or even aesthetic appeal
v. We discussed the features of a good visual model, which include:
a. ability to render conceptual knowledge as opposed to quantitative data
(information visualization) or physical things (technical illustration),
b. images that accurately reflect the situation in the world,
c. integration of the most salient aspects of the problem into a clear and
coherent picture,
d. a visual structure fitted to the problem,
e. a consistent visual grammar, and
f. visual and cognitive tractability.
vi. We also stated the characteristics of Conceptual models.
4.0 Conclusion/Summary
This brings us to the end of our discussion. To sum up, we can say that simulation
is a very powerful problem-solving technique. Its applicability is so general that
it would be hard to point out disciplines or systems to which simulation has not been
applied. The basic idea behind simulation is simple: model the given system
by means of some equations and then determine its time-dependent behaviour. In
simulation, we build a conceptual model of the system, and the results are then
compared with the real system. Normally, simulation is used when either an exact
analytic expression for the behaviour of the system under investigation is not
available or the analytic solution is too time-consuming.
Simulation is considered an interdisciplinary subject because it uses concepts from
mathematics, computer science and the application field. A model is a representation
of an actual system, and models can be of different kinds. A discrete-event
simulation model is defined as one in which the state variables change only at
discrete points in time at which events occur. A deterministic system is defined as
one in which randomness does not affect the behaviour of the system, while a
stochastic system is defined as one in which randomness affects the behaviour of the
system. To carry out a simulation, we first formulate the problem, then implement
the defined model in a suitable programming language; finally, we perform
verification and validation and then analyse the output results.
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube at http://bit.ly/2L9Ltzq , http://bit.ly/2LhudZf ,
http://bit.ly/2UlmpK9 , http://bit.ly/2UdU3kO , http://bit.ly/2ZvpKex ,
http://bit.ly/2L12lcH. Watch the video & summarize in 1 paragraph
b. View the animation on Modelling Methods and critique it in the discussion forum
c. Take a walk and engage any 3 students on Modelling Methods; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
STUDY SESSION 3
Finite Element Model and Database Model
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Contents
2.1 Finite Element Model
2.2 Overview of Basic FEM
2.3 Discretization
2.4 Interpretation of FEM
2.6 Assembly Procedure
2.7 Boundary Conditions
2.8 Data-based models
2.9 The three perspectives of Data model
2.10 Database Model
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Conclusion/Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
In this study session, we shall talk about Finite Element Model and Database Model.
The finite element method is one of the most powerful approaches for approximate
solutions to a wide range of problems in mathematical physics. The method has
achieved acceptance in nearly every branch of engineering and is the preferred
approach in structural mechanics and heat transfer. Its application has extended to
soil mechanics, heat transfer, fluid flow, magnetic field calculations, and other areas.
Managing large quantities of structured and unstructured data is a primary
function of information systems. Data models describe the structure of data for
storage in data management systems such as relational databases. They typically do
not describe unstructured data, such as documents, word-processing files, e-mail
messages, pictures, digital audio, and video.
state problems), or rendering the PDE into an approximating system of ordinary
differential equations, which are then numerically integrated using standard
techniques such as Euler's method, Runge-Kutta, etc.
In solving partial differential equations, the primary challenge is to create an
equation that approximates the equation to be studied, but is numerically stable,
meaning that errors in the input and intermediate calculations do not accumulate and
cause the resulting output to be meaningless. There are many ways of doing this, all
with advantages and disadvantages. The steps may be broken down as follows:
i. Definition of the physical problem - development of the model.
ii. Formulation of the governing equations - systems of PDEs, ODEs, or
algebraic equations; defining initial conditions and/or boundary conditions to
get a well-posed problem.
iii. Discretization of the equations.
iv. Solution of the discrete system of equations.
v. Interpretation of the obtained results.
vi. Error analysis.
The Finite Element Method is a good choice for solving partial differential equations
over complicated domains (like cars and oil pipelines), when the domain changes (as
during a solid state reaction with a moving boundary), when the desired precision
varies over the entire domain, or when the solution lacks smoothness. For instance,
in a frontal crash simulation it is possible to increase prediction accuracy in
"important" areas like the front of the car and reduce it in its rear (thus reducing cost
of the simulation); another example would be the simulation of the weather pattern
on Earth, where it is more important to have accurate predictions over land than over
the wide-open sea.
2.1.1 The Finite Element Analysis
Finite Element Analysis is a method to computationally model reality in a
mathematical form to better understand a highly complex problem. In the real world,
everything that occurs is a result of interactions between atoms (and the sub-particles
of those atoms), billions and billions of them. If we were to simulate the world in a
computer, we would have to simulate this interaction based on the simple laws of
physics. However, no computer can process the near-infinite number of atoms in
objects, so instead we model 'finite' groups of them.
For example, we might model a gallon of water by dividing it up into 1000 parts and
measuring the interaction of these linked parts. If you divide into too few parts, your
simulation will be too inaccurate. If you divide into too many, your computer will sit
there for years calculating the result!
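The trade-off described above can be sketched numerically. The toy Python example below (not from the course text) approximates a continuous quantity, the integral of x² over [0, 1], whose exact value is 1/3, by summing over a finite number of pieces; accuracy improves, and cost grows, with the number of pieces.

```python
# A toy sketch of the trade-off: approximating the integral of x**2 on
# [0, 1] (exactly 1/3) by summing over a finite number of pieces.
def approximate(n_pieces):
    width = 1.0 / n_pieces
    # treat each piece as uniform, sampling it at its midpoint
    return sum(((i + 0.5) * width) ** 2 * width for i in range(n_pieces))

for n in (2, 10, 100, 1000):
    est = approximate(n)
    print(f"{n:5d} pieces: estimate={est:.6f}, error={abs(est - 1/3):.2e}")
```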
The finite element method originated from the need to solve complex elasticity and
structural analysis problems in civil and aeronautical engineering. Its development
can be traced back to the work of A. Hrennikoff (1941) and R. Courant (1942).
While the approaches used by these pioneers are different, they share one essential
characteristic: mesh discretization of a continuous domain into a set of discrete
sub-domains, usually called elements.
Hrennikoff's work discretizes the domain by using a lattice analogy while Courant's
approach divides the domain into finite triangular sub regions for solution of second
order elliptic partial differential equations (PDEs) that arise from the problem of
torsion of a cylinder. Courant's contribution was evolutionary, drawing on a large
body of earlier results for PDEs developed by Rayleigh, Ritz, and Galerkin.
Development of the finite element method began in earnest in the middle to late
1950s for airframe and structural analysis and gathered momentum at the University
of Stuttgart through the work of John Argyris and at Berkeley through the work of
Ray W. Clough in the 1960s for use in civil engineering. By the late 1950s, the key
concepts of the stiffness matrix and element assembly existed essentially in the form
used today. NASA issued a request for proposals for the development of the finite
element software NASTRAN in 1965. The method was provided with a
rigorous mathematical foundation in 1973 with the publication of Strang and Fix's
An Analysis of the Finite Element Method, and it has since been generalized into a
branch of applied mathematics for the numerical modelling of physical systems in a
wide variety of engineering disciplines, e.g., electromagnetism (thanks to Peter P.
Silvester) and fluid dynamics.
2.2 Overview of Basic FEM
The basic steps of the finite element method are discussed next in more generality.
Although attention is focused on structural problems, most of the steps translate to
other application problems, as noted above. The role of FEM in numerical
simulation is schematized in Figure 2.2.1 below.
This diagram displays the three key simulation steps: idealization, discretization and
solution. It also indicates that each step introduces different types of errors. For
example, the discretization error is the discrepancy obtained when the discrete
solution is substituted in the mathematical model.
Figure 2.2.1: Steps of the physical simulation process: idealization, discretization and solution.
In-text Question 1
___________________ is a method to computationally model reality in a mathematical form to
better understand a highly complex problem.
Answer
Finite Element Analysis
2.2.1 Idealization
Models
The word “model” has the traditional meaning of a scaled copy or representation of
an object. That is precisely how most dictionaries define it. We use here the term in a
more modern sense, which has become increasingly common since the advent of
computers:
A model is a symbolic device built to simulate and predict aspects of behaviour of a
system.
Note the careful distinction made between “behaviour” and “aspects of behaviour.”
To predict everything, in all physical scales, you must deal with the actual system. A
model abstracts aspects of interest to the modeller. The term “symbolic” means that
a model represents a system in terms of the symbols and language of another science.
For example, engineering systems may be (and are) modelled with the symbols of
mathematics and/or computer sciences.
Mathematical Models
Mathematical modelling, or idealization, is a process by which the engineer passes
from the actual physical system under study, to a mathematical model of the system,
where the term model is understood in the wider sense defined above.
The process is called idealization because the mathematical model is necessarily an
abstraction of the physical reality. (Note the phrase aspects of behaviour in the
definition.) The analytical or numerical results obtained for the mathematical model
are re-interpreted in physical terms only for those aspects.
Why is the mathematical model an abstraction of reality?
Engineering systems such as structures tend to be highly complex. To simulate their
behaviour, it is necessary to reduce that complexity to manageable proportions.
Mathematical modelling is an abstraction tool by which complexity can be brought
under control. This is achieved by “filtering out” physical details that are not
relevant to the analysis process. For example, a continuum material model
necessarily filters out the aggregate, crystal, molecular and atomic levels of matter. If
you are designing a bridge or building such levels are irrelevant. Consequently,
choosing a mathematical model is equivalent to choosing an information filter.
2.2.2 Implicit vs. Explicit Modelling
Suppose that you have to analyse a structure and at your disposal is a “black box”
general-purpose finite element program. This is also known in the trade as a “canned
program.” Those programs usually offer a catalogue of element types; for example,
bars, beams, plates, shells, axisymmetric solids, general 3D solids, and so on. The
moment you choose specific elements from the catalogue, you automatically accept
the mathematical models on which the elements are based. This is implicit modelling,
a process depicted in Figure 2.3.2.
Ideally, you should be fully aware of the implications of your choice. Providing such
“finite element literacy” is one of the objectives of this course.
Unfortunately, many users of commercial programs are unaware of the “implied
consent” aspect of implicit modelling.
Figure 2.3.2: Implicit modelling: picking elements from an existing FEM code implicitly
accepts an idealization. Read the fine print.
The other extreme occurs when you select a mathematical model of the physical
problem with your eyes wide open and then either shop around for a finite element
program that implements that model, or write the program yourself. This is explicit
modelling. It requires far more technical expertise, resources, experience and
maturity than implicit modelling. However, for problems that fall out of the ordinary
it may be the right thing to do.
In practice, a combination of implicit and explicit modelling is quite common. The
physical problem to be solved is broken down into sub problems. Those sub
problems that are conventional and fit existing programs may be treated with implicit
modelling. Those sub problems that require special handling may yield only to
explicit modelling treatment.
2.3 Discretization
2.3.1 Purpose
Mathematical modelling is a simplifying step. But models of physical systems are
not necessarily easy to solve. They usually involve coupled partial differential
equations in space and time subject to boundary and/or interface conditions. Such
analytical models have an infinite number of degrees of freedom.
At this point one faces the choice of trying for analytical or numerical solutions.
Analytical solutions, also called “closed form solutions,” are more intellectually
satisfying, particularly if they apply to a wide class of problems. Unfortunately, they
tend to be restricted to regular geometries and simple boundary conditions.
Moreover, a closed-form solution, expressed for example as the inverse of an integral
transform, often has to be numerically evaluated to be useful.
Most problems faced by the engineer either do not yield to analytical treatment or
doing so would require a disproportionate amount of effort. The practical way out is
numerical simulation. Here is where finite element methods and the digital computer
enter the scene.
To make numerical simulations practical it is necessary to reduce the number of
degrees of freedom to a finite number. This reduction is called discretization. The
end result of the discretization process is the discrete model depicted above.
Discretization can proceed in space dimensions as well as in the time dimension.
Because the present course deals only with static problems, we need not consider the
time dimension and are free to concentrate on spatial discretization.
2.3.2 Error Sources and Approximation
Figure 2.2.1 tries to convey graphically that each simulation step introduces a source
of error. In engineering practice, modelling errors are by far the most important.
However, they are difficult and expensive to evaluate, because such model validation
requires access to and comparison with experimental results.
Next in order of importance is the discretization error. Even if solution errors are
ignored (and usually they can be), the computed solution of the discrete model is in
general only an approximation, in some sense, to the exact solution of the
mathematical model. A quantitative measurement of this discrepancy is called the
discretization error. A branch of numerical mathematics called approximation theory
addresses the characterization and study of this error.
Intuitively one might suspect that the accuracy of the discrete model solution would
improve as the number of degrees of freedom is increased, and that the discretization
error goes to zero as that number goes to infinity. This loosely worded statement
describes the convergence requirement of discrete approximations. One of the key
goals of approximation theory is to make the statement as precise as it can be
expected from a branch of mathematics.
In-text Question 1
The characterization and study of discretization error is addressed by a branch of numerical
mathematics called _______________________
Answer
approximation theory
isolation, as individual entities. This is the key to the programming of element
libraries.
In the Direct Stiffness Method, elements are isolated by disconnection and
localization. This procedure involves the separation of elements from their
neighbours by disconnecting the nodes, followed by the referral of the element to a
convenient local coordinate system. After these two steps, we can consider generic
elements: a bar element, a beam element, and so on. From the standpoint of
computer implementation, it means that you can write one subroutine or module that
constructs all elements of one type, instead of writing one for each element instance.
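A minimal Python sketch of this idea, using hypothetical one-dimensional bar elements with invented material and geometric properties, shows one routine serving every element instance and a loop scattering each local matrix into the master stiffness matrix.

```python
# A minimal sketch of disconnection/localization: ONE routine builds the
# local stiffness matrix for any 2-node bar element, and an assembly loop
# scatters each element matrix into the master stiffness matrix. The
# three-element bar and its properties are invented for illustration.

def bar_stiffness(E, A, L):
    """Local stiffness matrix of a generic bar element (axial DOFs only)."""
    k = E * A / L
    return [[k, -k], [-k, k]]

n_nodes = 4
elements = [(0, 1), (1, 2), (2, 3)]            # node pair of each element
K = [[0.0] * n_nodes for _ in range(n_nodes)]  # master stiffness matrix

for i, j in elements:
    ke = bar_stiffness(E=200e9, A=1e-4, L=1.0)
    for a, p in enumerate((i, j)):             # scatter local -> global
        for b, q in enumerate((i, j)):
            K[p][q] += ke[a][b]

for row in K:
    print(["%+.1e" % v for v in row])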
Figure 2.2.5: Typical finite element geometries in one through three dimensions
The following is a summary of the data associated with an individual finite element.
This data is used in finite element programs to carry out element level calculations.
Dimensionality: Elements can have one, two or three space dimensions. (There are
also special elements with zero dimensionality, such as lumped springs.)
Nodal points: Each element possesses a set of distinguishing points called nodal
points or nodes for short. Nodes serve two purposes: definition of element geometry,
and home for degrees of freedom. They are located at the corners or end points of
elements (see Figure 2.2.5); in the so-called refined or higher-order elements, nodes
are also placed on sides or faces.
Geometry: The geometry of the element is defined by the placement of the nodal
points. Most elements used in practice have simple geometries. In one-dimension,
elements are usually straight lines or curved segments. In two dimensions, they are
of triangular or quadrilateral shape.
In three dimensions the three common shapes are tetrahedra, pentahedra (also called
wedges or prisms), and hexahedra (also called cuboids or “bricks”). See Figure 2.2.5
Degrees of freedom: The degrees of freedom (DOF) specify the state of the element.
They also function as “handles” through which adjacent elements are connected.
DOFs are defined as the values (and possibly derivatives) of a primary field variable
at nodal points.
The actual selection depends on criteria studied at length in Part II. Here we simply
note that the key factor is the way in which the primary variable appears in the
mathematical model. For mechanical elements, the primary variable is the
displacement field and the DOF for many (but not all) elements are the displacement
components at the nodes.
Nodal forces: There is always a set of nodal forces in a one-to-one correspondence
with degrees of freedom. In mechanical elements, the correspondence is established
through energy arguments.
Constitutive properties: For a mechanical element, these are the relations that
specify the material properties. For example, in a linear elastic bar element it is
sufficient to specify the elastic modulus E and the thermal coefficient of expansion.
Fabrication properties: For a mechanical element, these fabrication properties have
been integrated out from the element dimensionality. Examples are cross sectional
properties of MoM elements such as bars, beams and shafts, as well as the thickness
of a plate or shell element.
This data is used by the element generation subroutines to compute element
stiffness relations in the local system.
2.6 Assembly Procedure
The assembly procedure combines the element stiffness equations into the master
stiffness equations of the overall structure. The computer implementation
of this process is not necessarily as simple as the hand calculations of the truss
example suggest.
example suggest. The master stiffness relations in practical cases may involve
thousands (or even millions) of degrees of freedom. To conserve storage and
processing time, the use of sparse matrix techniques as well as peripheral storage is
required. This, however, inevitably increases the programming complexity.
2.7 Boundary Conditions
A key strength of the FEM is the ease and elegance with which it handles arbitrary
boundary and interface conditions. This power, however, has a down side. One of the
biggest hurdles a FEM newcomer faces is the understanding and proper handling of
boundary conditions. Surprisingly, prior exposure to partial differential equations,
without a balancing study of variational calculus, does not appear to be of much help
in this regard.
In the present session, we summarize some basic rules for treating boundary
conditions.
The term “direct” is meant to exclude derivatives of the primary function, unless
those derivatives also appear as degrees of freedom, such as rotations in beams and
plates.
computational time requirements can be managed simultaneously to address most
engineering applications. FEM allows entire designs to be constructed, refined, and
optimized before the design is manufactured.
This powerful design tool has significantly improved both the standard of
engineering designs and the methodology of the design process in many industrial
applications. The introduction of FEM has substantially decreased the time to take
products from concept to the production line. It is primarily through improved initial
prototype designs using FEM that testing and development have been accelerated. In
summary, benefits of FEM include increased accuracy, enhanced design and better
insight into critical design parameters, virtual prototyping, fewer hardware
prototypes, a faster and less expensive design cycle, increased productivity, and
increased revenue.
Data modelling may be performed during various types of projects and in multiple
phases of projects. Data models are progressive; there is no such thing as the final
data model for a business or application. Instead, you should consider a data model
as a living document that will change in response to a changing business. The data
models should ideally be stored in a repository so that they can be retrieved,
expanded, and edited over time.
a) Business rules, specific to how things are done in a particular place, are often
fixed in the structure of a data model. This means that small changes in the
way business is conducted lead to large changes in computer systems and
interfaces
b) Entity types are often not identified, or incorrectly identified. This can lead to
replication of data, data structure, and functionality, together with the
attendant costs of that duplication in development and maintenance
c) Data models for different systems are arbitrarily different. The result of this is
that complex interfaces are required between systems that share data; these
interfaces can account for between 25 and 70% of the cost of current systems
d) Data cannot be shared electronically with customers and suppliers, because
the structure and meaning of data has not been standardised. For example,
engineering design data and drawings for process plant are still sometimes
exchanged on paper
The reason for these problems is a lack of standards that will ensure that data
models both meet business needs and are consistent.
Conceptual schema - describes the semantics of a domain (the scope of the model);
it specifies the kinds of facts or propositions that can be expressed using the model,
defining the allowed expressions in an artificial 'language' with a scope that is
limited by the scope of the model. For example, it may be a model of the interest
area of an organization or industry. This consists of entity classes, representing
kinds of things of significance in the domain, and relationship assertions about
associations between pairs of entity classes. The use of a conceptual schema has
evolved to become a powerful communication tool with business users.
Often called a subject area model (SAM) or high-level data model (HDM), this
model is used to communicate core data concepts, rules, and definitions to a business
user as part of an overall application development or enterprise initiative. The
number of objects should be very small and focused on key concepts. Try to limit
this model to one page, although for extremely large organizations or complex
projects, the model might span two or more pages.
Logical schema - describes the semantics, as represented by a particular data
manipulation technology. This consists of descriptions of tables and columns, object
oriented classes, and XML tags, among other things.
Physical schema - describes the physical means by which data are stored. This is
concerned with partitions, CPUs, table spaces, and the like.
The significance of this approach, according to ANSI, is that it allows the three
perspectives to be relatively independent of each other.
Storage technology can change without affecting either the logical or the conceptual
model. The table/column structure can change without (necessarily) affecting the
conceptual model. In each case, of course, the structures must remain consistent with
the other model. The table/column structure may be different from a direct
translation of the entity classes and attributes, but it must ultimately carry out the
objectives of the conceptual entity class structure. Early phases of many software
development projects emphasize the design of a conceptual data model. Such a
design can be detailed into a logical data model. In later stages, this model may be
translated into a physical data model. However, it is also possible to implement a
conceptual model directly.
2.10 Database Model
A database model is a theory or specification describing how a database is
structured and used. Several such models have been suggested. Common models
include:
Flat model: This may not strictly qualify as a data model. The flat (or table) model
consists of a single, two-dimensional array of data elements, where all members of a
given column are assumed to be similar values, and all members of a row are
assumed to be related to one another.
Hierarchical model: In this model, data is organized into a tree-like structure,
implying a single upward link in each record to describe the nesting, and a sort field
to keep the records in a particular order in each same-level list.
Network model: This model organizes data using two fundamental constructs,
called records and sets. Records contain fields, and sets define one-to-many
relationships between records: one owner, many members.
Relational model: a database model based on first-order predicate logic. Its core
idea is to describe a database as a collection of predicates over a finite set of
predicate variables, describing constraints on the possible values and combinations
of values.
Object-relational model: Similar to a relational database model, but objects, classes
and inheritance are directly supported in database schemas and in the query
language.
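As a rough illustration (not part of the original text), the Python sketch below mimics three of these models with plain data structures; the student records are invented.

```python
# A rough sketch (invented records) mimicking three of the models above
# with plain Python data structures.

# Flat model: a single two-dimensional table.
flat = [["id", "name", "dept"],
        [1, "Ada", "CS"],
        [2, "Musa", "Math"]]

# Hierarchical model: a tree, one upward (parent) link per record.
hierarchical = {"faculty": {"CS": {"students": ["Ada"]},
                            "Math": {"students": ["Musa"]}}}

# Relational model: separate relations linked by key values; a join
# recombines them on demand.
students = [(1, "Ada", 10), (2, "Musa", 20)]   # (id, name, dept_id)
depts = [(10, "CS"), (20, "Math")]             # (dept_id, dept_name)
joined = [(s[1], d[1]) for s in students for d in depts if s[2] == d[0]]

print(flat[1])                   # one row of the flat table
print(hierarchical["faculty"])   # the tree under the root
print(joined)                    # [('Ada', 'CS'), ('Musa', 'Math')]
```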
5. Briefly discuss each of the data items associated with an individual finite
element, as used in finite element programs for element-level calculations.
6. With the aid of diagrams differentiate between the common data models
7. Briefly discuss the basic rules for the treatment of boundary conditions
4.0 Summary
This is the end of our discussion, and so far we have learned that the
development of systems and interfaces often costs more than it should to build,
operate, and maintain. A major cause is that the quality of the data models
implemented in systems and interfaces is poor. This is usually because of:
i. Violation of business rules, as a result of which small changes in the way
business is conducted lead to large changes in computer systems and interfaces.
ii. Unidentified or incorrectly identified entities, which can lead to replication
of data, data structure, and functionality, and increased costs of development
and maintenance.
Consequently, data cannot be shared electronically with customers and suppliers,
due to unstructured data and the lack of standard data models that can meet
business needs.
We also discussed the Physics-based Finite Element Method (FEM), which is
defined as a numerical technique for finding approximate solutions of partial
differential equations (PDE) as well as of integral equations.
i. Here we stated the uses, traced its origin and discussed its
applications
ii. Carried out the overview of the FEM basics including:
a. Idealization
b. Discretization: purpose and error sources; finite and boundary element
methods
c. Interpretations of FEM, and
d. Element attributes used in FEM
iii. Stated the three perspectives of data models; Conceptual, logical and Physical
schemas.
iv. Listed the different types of database models: flat, hierarchical, network,
relational and object-relational models.
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube at http://bit.ly/2PjBJrw , http://bit.ly/327iJhy ,
http://bit.ly/2ZB2tYo , http://bit.ly/2NDIrpU. Watch the video & summarize in 1
paragraph
b. View the animation on finite element model and critique it in the discussion
forum
c. Take a walk and engage any 3 students on finite element model; In 2
paragraphs summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
http://en.wikipedia.org/wiki/Finite_element_method#History
http://www.answers.com/topic/hp-fem
http://www.answers.com/topic/extended-finite-element-method
http://www.answers.com/topic/finite-element-method#History
http://www.answers.com/library/Sci-Tech Encyclopedia-cid-2823403
http://www.answers.com/topic/meshfree-methods
http://en.wikipedia.org/wiki/Data_model
http://www.idiagram.com/ideas/visual_models.html
STUDY SESSION 4
Statistics for Modelling and Simulation
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Conceptual Differences between Descriptive and Inference statistics
2.2 Descriptive Statistics
2.3 Inference Statistics
2.4 Other Essential Statistics for Simulations
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary and Conclusion
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 References and Further Readings
Introduction
You are welcome. In this study session, having looked at modelling methods, we
will discuss the two ways statistics are computed and applied in modelling and
simulation: inferential and descriptive statistics. Statistical inference is
generally distinguished from descriptive statistics. In simple terms, descriptive
statistics can be thought of as being just a straightforward presentation of facts, in
which modelling decisions made by a data analyst have had minimal influence.
Statistical inference is the process of drawing conclusions from data that are subject
to random variation, for example, observational errors or sampling variation. A
complete statistical analysis will nearly always include both descriptive statistics and
statistical inference, and will often progress in a series of steps where the emphasis
moves gradually from description to inference.
1.0 Study Session Learning Outcomes
After studying this session, I expect you to be able to:
1. Differentiate between Descriptive and Inference statistics
2.1 Conceptual Differences between Descriptive and Inference Statistics
Inferential statistics tries to make inferences about a population from the sample
data. We also use inferential statistics to make judgments of the probability that an
observed difference between groups is a dependable one, or that it might have
happened by chance in this study. Thus, we use inferential statistics to make
inferences from our data to conditions that are more general; we use descriptive
statistics simply to describe what is going on in our data.
2.2 Descriptive Statistics
Descriptive statistics provide simple summaries about the sample and the measures.
Together with simple graphics analysis, they form the basis of quantitative analysis
of data. Descriptive statistics summarize data. For example, the shooting percentage
in basketball is a descriptive statistic that summarizes the performance of a player or
a team. The percentage is the number of shots made divided by the number of shots
taken. A player who shoots 33% is making approximately one shot in every three.
One making 25% is hitting once in four. The percentage summarizes or describes
multiple discrete events. Alternatively, consider the score of many students, the
grade point average. This single number describes the general performance of a
student across the range of their course experiences.
Describing a large set of observations with a single indicator risks distorting
the original data or losing important detail. For example, the shooting percentage
does not tell you whether the shots are three-pointers or lay-ups, and GPA does not
tell you whether the student was in difficult or easy courses. Despite these
limitations, descriptive statistics provide a powerful summary that may enable
comparisons across people or other units.
With a variable such as income, a frequency distribution can take a very large number of
possible values. Typically, specific values are not particularly meaningful (an income
of 50,000 is typically not meaningfully different from 51,000). Grouping the raw
scores using ranges of values reduces the number of categories to something more
meaningful. For instance, we might group incomes into ranges of 0-10,000, 10,001-
30,000, etc.
Central tendency
The central tendency of a distribution locates the "center" of a distribution of values.
The three major types of estimates of central tendency are the mean, the median, and
the mode.
The mean is the most commonly used method of describing central tendency. To
compute the mean, take the sum of the values and divide by the count. For example,
the mean quiz score is determined by summing all the scores and dividing by the
number of students taking the exam. Consider the test score values:
15, 20, 21, 36, 15, 25, 15
The sum of these 7 values is 147, so the mean is 147/7 =21.
The mean is computed using the formula: x̄ = (Σ xi)/n, where the sum runs over i = 1 to n.
The median is the score found at the middle of the set of values, i.e., that has as
many cases with a larger value as have a smaller value. One way to compute the
median is to sort the values in numerical order, and then locate the value in the
middle of the list. For example, if there are 501 values, the value in the 251st
position is the median. Sorting the 7 scores above produces:
15, 15, 15, 20, 21, 25, 36
There are 7 scores and score #4 represents the halfway point. The median is 20. If
there is an even number of observations, then the median is the mean of the two
middle scores. In the example, if there were an 8th observation, with a value of 25,
the median becomes the average of the 4th and 5th scores, in this case 20.5:
15, 15, 15, 20, 21, 25, 25, 36
The mode is the most frequently occurring value in the set. To determine the mode,
compute the distribution as above. The mode is the value with the greatest frequency.
In the example, the modal value, 15, occurs three times. In some distributions there is
a "tie" for the highest frequency, i.e., there are multiple modal values. These are
called multi-modal distributions.
Notice that the three measures typically produce different results. The term "average"
obscures the difference between them and is better avoided. The three values are
equal if the distribution is perfectly "normal" (i.e., bell-shaped).
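For completeness, the three measures can be checked against the example scores with Python's standard library:

```python
# A minimal sketch verifying the three measures for the example scores.
from statistics import mean, median, mode

scores = [15, 20, 21, 36, 15, 25, 15]
print(mean(scores))    # 21 -> sum 147 divided by 7
print(median(scores))  # 20 -> the middle of the sorted scores
print(mode(scores))    # 15 -> the most frequent score
```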
Dispersion
Dispersion is the spread of values around the central tendency. There are two
common measures of dispersion, the range and the standard deviation. The range is
simply the highest value minus the lowest value. In our example distribution, the
high value is 36 and the low is 15, so the range is 36 − 15 = 21.
The standard deviation is a more accurate and detailed estimate of dispersion
because an outlier can greatly exaggerate the range (as was true in this example
where the single outlier value of 36 stands apart from the rest of the values). The
standard deviation shows the relation that set of scores has to the mean of the sample.
Again, let us take the set of scores:
15, 20, 21, 36, 15, 25, 15
To compute the standard deviation, we first find the distance between each value and
the mean. We know from above that the mean is 21. Therefore, the differences from
the mean are:
15 − 21 = −6
20 − 21 = −1
21 − 21 = 0
36 − 21 = 15
15 − 21 = −6
25 − 21 = +4
15 − 21 = −6
Notice that values below the mean have negative differences and values
above it have positive ones. Next, we square each difference:
(−6)² = 36
(−1)² = 1
(0)² = 0
(15)² = 225
(−6)² = 36
(+4)² = 16
(−6)² = 36
Now, we take these "squares" and sum them to get the sum of squares (SS) value.
Here, the sum is 350. Next, we divide this sum by the number of scores minus 1.
Here, the result is 350 / 6 = 58.3. This value is known as the variance. To get the
standard deviation, we take the square root of the variance (remember that we
squared the deviations earlier). This would be √58.3 = 7.63.
Although this computation may seem intricate, it is actually quite simple. In English,
we can describe the standard deviation as:
The square root of the sum of the squared deviations from the mean, divided by the
number of scores minus one; given as: s = √( Σ (xi − u)² / (n − 1) ), where xi = an
observed value and u = the mean.
The standard deviation allows us to reach some conclusions about specific scores in
our distribution. Assuming that the distribution of scores is close to "normal", the
following conclusions can be reached:
i. Approximately 68% of the scores in the sample fall within one standard
deviation of the mean (u - SD) and (u + SD)
ii. Approximately 95% of the scores in the sample fall within two standard
deviations of the mean (u-2SD) and (u+2SD)
iii. Approximately 99% of the scores in the sample fall within three standard
deviations of the mean (u - 3SD) and (u + 3SD)
For example, since the mean in our example is 21 and the standard deviation is 7.63,
we can from the above statement estimate that approximately 95% of the scores will
fall in the range of 21 − (2×7.63) to 21 + (2×7.63), or between 5.74 and 36.26.
Values beyond two standard deviations from the mean can be considered "outliers";
in our distribution, 36 is the only value that approaches this boundary, lying almost
exactly two standard deviations above the mean.
Outliers help identify observations for further analysis or possible problems in the
observations. Standard deviations also convert measures on very different scales,
such as height and weight, into values that can be compared.
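A short Python sketch, assuming the same example scores, reproduces the computation above and flags the most extreme score by its standard score (the 1.9 threshold is purely illustrative):

```python
# A minimal sketch: sample standard deviation and the standard scores
# used to spot extreme values. The 1.9 cut-off is illustrative only.
from statistics import mean, stdev

scores = [15, 20, 21, 36, 15, 25, 15]
u, sd = mean(scores), stdev(scores)   # stdev divides by n - 1, as above
print(f"mean = {u}, standard deviation = {sd:.2f}")
print(f"two-SD range: {u - 2 * sd:.2f} to {u + 2 * sd:.2f}")
for x in scores:
    z = (x - u) / sd                  # standard score of each observation
    note = "  <-- most extreme" if abs(z) > 1.9 else ""
    print(f"score {x:2d}: z = {z:+.2f}{note}")
```

Standard scores like these are also what allow measures on very different scales, such as height and weight, to be compared directly.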
iv. Other Statistics
In research involving comparisons between groups, emphasis is often placed on the
significance level for the hypothesis that the groups being compared differ to a
degree greater than would be expected by chance. This significance level is often
represented as a p-value, or sometimes as the standard score of a test statistic. In
contrast, an effect size conveys the estimated magnitude and direction of the
difference between groups, without regard to whether the difference is statistically
significant. Reporting significance levels without effect sizes is problematic, since
for large sample sizes even small effects of little practical importance can be
statistically significant.
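The following hypothetical Python sketch illustrates the point: with synthetic data and a deliberately tiny true effect, the p-value shrinks as the sample grows while the effect size stays negligible (the normal-approximation z-test used here is a simplifying assumption).

```python
# A hypothetical sketch: a tiny, fixed true effect becomes "significant"
# as n grows, while Cohen's d stays negligible. Data are synthetic; the
# p-value uses a simple normal-approximation (z) test.
import math
import random

random.seed(0)

def cohens_d_and_p(a, b):
    """Two-sample effect size (Cohen's d) and a two-sided z-test p-value."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    d = (ma - mb) / pooled_sd
    z = (ma - mb) / (pooled_sd * math.sqrt(1 / na + 1 / nb))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return d, p

for n in (50, 500, 50000):
    a = [random.gauss(0.00, 1) for _ in range(n)]
    b = [random.gauss(0.05, 1) for _ in range(n)]  # tiny true effect
    d, p = cohens_d_and_p(a, b)
    print(f"n={n:6d}: effect size d={d:+.3f}, p-value={p:.4f}")
```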
Examples of descriptive statistics include:
i. Measures of central tendency
ii. Measures of dispersion
iii. Measures of association
iv. Cross-tabulation, contingency table
v. Histogram
vi. Quantile, Q-Q plot
vii. Scatter plot
viii. Box plot
2.3 Inference Statistics
2.3.1 Some common forms of statistical proposition
a) An estimate - a particular value that best approximates some parameter of
interest,
b) A confidence interval (or set estimate) - an interval constructed from the data
in such a way that, under repeated sampling of datasets, such intervals would
contain the true parameter value with the probability at the stated confidence
level,
c) A credible interval - a set of values containing, for example, 95% of posterior
belief,
d) Rejection of a hypothesis
e) Clustering or classification of data points into groups
2.3.2 Models/Assumptions
Any statistical inference requires some assumptions. A statistical model is a set of
assumptions concerning the generation of the observed data and similar data.
Descriptions of statistical models usually emphasize the role of population
quantities of interest, about which we wish to draw inference.
ii. Non-parametric: The assumptions made about the process generating the
data are much less than in parametric statistics and may be minimal. For
example, every continuous probability distribution has a median, which may
be estimated using the sample median or the Hodges-Lehmann-Sen estimator,
which has good properties when the data arise from simple random sampling.
iii. Semi-parametric: This term typically implies assumptions 'between' fully
and non-parametric approaches. For example, one may assume that a
population distribution has a finite mean. Furthermore, one may assume that the
mean response level in the population depends in a truly linear manner on some
covariate (a parametric assumption) while making no parametric assumption about
the variance around that mean. More generally, semi-
parametric models can often be separated into 'structural' and 'random
variation' components. One component is treated parametrically and the other
non-parametrically.
samples" is approximately normally distributed, if the distribution is not heavy
tailed.
2.2.6 Randomization-based models
For a given dataset that was produced by a randomization design, the randomization
distribution of a statistic (under the null-hypothesis) is defined by evaluating the test
statistic for all of the plans that could have been generated by the randomization
design.
In frequentist inference, randomization allows inferences to be based on the
randomization distribution rather than a subjective model, and this is important
especially in survey sampling and design of experiments. Statistical inference from
randomized studies is also more straightforward than many other situations.
In Bayesian inference, randomization is also of importance: in survey sampling,
sampling without replacement ensures the exchangeability of the sample with the
population; in randomized experiments, randomization warrants a missing-at-random
assumption for covariate information.
Objective randomization allows properly inductive procedures. Many statisticians
prefer randomization-based analysis of data that was generated by well-defined
randomization procedures. However, it has been observed that in fields of science
with developed theoretical knowledge and experimental control, randomized
experiments may increase the costs of experimentation without improving the quality
of inferences. Nevertheless, results from randomized experiments are recommended by
leading statistical authorities as allowing inferences with greater reliability than do
observational studies of the same phenomena. However, a good observational study
may be better than a bad randomized experiment.
The statistical analysis of a randomized experiment may be based on the
randomization scheme stated in the experimental protocol and does not need a
subjective model. However, not all hypotheses can be tested by randomized
experiments or random samples, which often require a large budget, a lot of expertise
and time, and may have ethical problems.
2.2.7 Modes of inference
Different schools of statistical inference have become established. These schools (or
'paradigms') are not mutually exclusive, and methods that work well under one
paradigm often have attractive interpretations under other paradigms. The two main
paradigms in use are frequentist and Bayesian inference, which are both
summarized below.
i. Frequentist inference
This paradigm regulates the production of propositions by considering (notional)
repeated sampling of datasets similar to the one at hand. By considering its
characteristics under repeated sampling, the frequentist properties of any statistical
inference procedure can be described - although in practice this quantification may
be challenging. Examples of frequentist inference are the p-value and the confidence
interval.
The frequentist calibration of procedures can be done without regard to utility
functions. However, some elements of frequentist statistics, such as statistical
decision theory, do incorporate utility functions. Loss functions must be explicitly
stated for statistical theorists to prove that a statistical procedure has an optimality
property. For example, median-unbiased estimators are optimal under absolute value
loss functions, and least squares estimators are optimal under squared error loss
functions.
While statisticians using frequentist inference must choose for themselves the
parameters of interest, and the estimators/test statistic to be used, the absence of
obviously explicit utilities and prior distributions has helped frequentist procedures
to become widely viewed as 'objective'.
propositions. There are several different justifications for using the Bayesian
approach. Examples of Bayesian inference are credible intervals for interval
estimation and Bayes factors for model comparison.
Many informal Bayesian inferences are based on "intuitively reasonable" summaries
of the posterior. For example, the posterior mean, median and mode, highest
posterior density intervals, and Bayes Factors can all be motivated in this way.
While a user's utility function need not be stated for this sort of inference, these
summaries do all depend (to some extent) on stated earlier beliefs, and are generally
viewed as subjective conclusions.
Formally, Bayesian inference is calibrated with reference to an explicitly stated
utility, or loss function; the 'Bayes rule' is the one which maximizes expected utility,
averaged over the posterior uncertainty. Formal Bayesian inference therefore
automatically provides optimal decisions in a decision theoretic sense. Given
assumptions, data and utility, Bayesian inference can be made for essentially any
problem, although not every statistical inference need have a Bayesian
interpretation. Some advocates of Bayesian inference assert that inference must take
place in this decision-theoretic framework, and that Bayesian inference should not
conclude with the evaluation and summarization of posterior beliefs.
In-text Question 1
What are the three levels of modelling assumptions?
Answer
Fully parametric
Non-parametric
Semi-parametric
from a drawn sample back to a population, within the limits of random error.
However, when reviewing business education research, Wunsch (1986) stated that
“two of the most consistent flaws included:
i. Disregard for sampling error when determining sample size.
ii. Disregard for response and non-response bias”.
Within a quantitative survey design, determining sample size and dealing with
non-response bias is essential. “One of the real advantages of quantitative methods is
their ability to use smaller groups of people to make inferences about larger groups
that would be prohibitively expensive to study”. The question then is: how large a
sample is required to infer research findings back to a population?
Standard textbook authors and researchers offer tested methods that allow studies to
take full advantage of statistical measurements, which in turn give researchers the
upper hand in determining the correct sample size. Sample size is one of the four
inter-related features of a study design that can influence the detection of significant
differences, relationships or interactions (Peers, 1996). Generally, these survey
designs try to minimize both alpha error (finding a difference that does not actually
exist in the population) and beta error (failing to find a difference that actually exists
in the population) (Peers, 1996).
However, improvement is needed. Researchers are learning experimental statistics
from highly competent statisticians and then doing their best to apply the formulas
and approaches.
educational level, etc., which variable(s) should be used as the basis for sample size?
This is important because the use of gender as the primary variable will result in a
substantially larger sample size than if one used the seven-point scale as the primary
variable of measure.
Cochran (1977) addressed this issue by stating that “One method of determining
sample size is to specify margins of error for the items that are regarded as most vital
to the survey. An estimation of the sample size needed is first made separately for
each of these important items”. When these calculations are completed, researchers
will have a range of n’s, usually ranging from smaller n’s for scaled, continuous
variables, to larger n’s for dichotomous or categorical variables.
The researcher should make sampling decisions based on these data. If the n’s for the
variables of interest are relatively close, the researcher can simply use the largest n as
the sample size and be confident that the sample size will provide the desired results.
More commonly, there is sufficient variation among the n’s so that we are reluctant
to choose the largest, either from budgetary considerations or because this will give
an over-all standard of precision substantially higher than originally contemplated. In
this event, the desired standard of precision may be relaxed for certain items, in order
to permit the use of a smaller value of n. The researcher may also decide to use this
information in deciding whether to keep all of the variables identified in the study.
“In some cases, the n’s are so discordant that certain of them must be dropped from
the inquiry; . . .”.
Error Estimation
Cochran’s (1977) formula uses two key factors:
i. The risk the researcher is willing to accept in the study, commonly called the
margin of error, or the error the researcher is willing to accept, and
ii. The alpha level, the level of acceptable risk the researcher is willing to accept
that the true margin of error exceeds the acceptable margin of error.
Alpha Level
The alpha level used in determining sample size in most educational research studies
is either .05 or .01 (Ary, Jacobs, & Razavieh, 1996). In Cochran’s formula, the alpha
level is incorporated into the formula by utilizing the t-value for the alpha level
selected (e.g., t-value for alpha level of .05 is 1.96 for sample sizes above 120).
Researchers should ensure they use the correct t- value when their research involves
smaller populations, e.g., t-value for alpha of .05 and a population of 60 is 2.00.
In general, an alpha level of .05 is acceptable for most research. An alpha level of .10
or lower may be used if the researcher is more interested in identifying marginal
relationships, differences or other statistical phenomena as a precursor to further
studies. An alpha level of .01 may be used in those cases where decisions based on
the research are critical and errors may cause substantial financial or personal harm,
e.g., major programmatic changes.
Variance Estimation
A critical component of sample size formulas is the estimation of variance in the
primary variables of interest in the study. The researcher does not have direct control
over variance and must incorporate variance estimates into research design. Cochran
(1977) listed four ways of estimating population variances for sample size
determinations:
i. Take the sample in two steps, and use the results of the first step to determine
how many additional responses are needed to attain an appropriate sample size
based on the variance observed in the first step data;
ii. Use pilot study results;
iii. Use data from previous studies of the same or a similar population; or
iv. Estimate or guess the structure of the population assisted by some logical
mathematical results
The first three ways are logical and produce valid estimates of variance; therefore,
we may not need to discuss them further. However, in many educational and social
research studies, it is not feasible to use any of the first three ways and the researcher
must estimate variance using the fourth method.
A researcher typically needs to estimate the variance of scaled and categorical
variables. To estimate the variance of a scaled variable, one must determine the
inclusive range of the scale, and then divide by the number of standard deviations
that would include all possible values in the range, and then square this number. For
example, if a researcher used a seven-point scale, and given that six standard
deviations (three to each side of the mean) would capture approximately 98% of all
responses, the calculation would be as follows:
s = 7 (number of points on the scale) / 6 (number of standard deviations) = 1.167
When estimating the variance of a dichotomous (proportional) variable such as
gender, Krejcie and Morgan (1970) recommended that researchers should use .50 as
an estimate of the population proportion. This proportion will result in the
maximization of variance, which will also produce the maximum sample size. This
proportion can be used to estimate variance in the population. For example, squaring
.50 will result in a population variance estimate of .25 for a dichotomous variable.
n0 = (t)²(s)² / (d)² = (1.96)²(1.167)² / (7 × .03)² = 118
Where t = value for selected alpha level of .025 in each tail = 1.96 (the alpha level of
.05 indicates the level of risk the researcher is willing to take that true margin of
error may exceed the acceptable margin of error.)
s = estimate of the standard deviation in the population = 1.167 (estimated standard
deviation for a 7-point scale, calculated by using 7 [inclusive range of scale] divided
by 6 [number of standard deviations that include almost all (approximately 98%) of
the possible values in the range]).
d = acceptable margin of error for the mean being estimated = .21 (number of points
on the primary scale × acceptable margin of error; points on primary scale = 7;
acceptable margin of error = .03 [the error the researcher is willing to accept]).
Therefore, for a population of 1,679, the required sample size is 118. However, since
this sample size exceeds 5% of the population (1,679*.05=84), Cochran’s (1977)
correction formula should be used to calculate the final sample size. These
calculations are as follows:
n = n0 / (1 + n0/Population) = 118 / (1 + 118/1679) = 111
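For readers who want to check the arithmetic, below is a minimal Python sketch of
Cochran's continuous-data formula with the finite-population correction, using the
values of the worked example (t = 1.96, s = 7/6 ≈ 1.167, d = 7 × .03 = .21,
population = 1,679).

def cochran_continuous(t, s, d):
    # Cochran's sample size formula for continuous data: n0 = t^2 * s^2 / d^2
    return (t ** 2) * (s ** 2) / (d ** 2)

def corrected_sample_size(n0, population):
    # Cochran's correction, used when n0 exceeds 5% of the population
    return n0 / (1 + n0 / population)

n0 = cochran_continuous(t=1.96, s=7 / 6, d=7 * 0.03)
print(round(n0))  # ~118.6 before rounding choices; the text reports 118
if n0 > 0.05 * 1679:
    print(round(corrected_sample_size(n0, 1679)))  # ~111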
sample size are scarce. If the researcher decides to use over-sampling, four methods
may be used to determine the anticipated response rate:
i. Take the sample in two steps, and use the results of the first step to estimate
how many additional responses may be expected from the second step.
ii. Use pilot study results.
iii. Use responses rates from previous studies of the same or a similar population;
or
iv. Estimate the response rate.
The first three ways are logical and will produce valid estimates of response rates.
Categorical Data
The sample size formulas and procedures used for categorical data are very similar,
but some variations do exist. Assume a researcher has set the alpha level a priori at
.05, plans to use a proportional variable, has set the level of acceptable error at 5%,
and has estimated the standard deviation of the scale as .5. Cochran’s sample size
formula for categorical data and an example of its use is presented here along with
explanations as to how these decisions were made.
n0 = (t)²(p)(q) / (d)²
n0 = (1.96)²(.5)(.5) / (.05)² = 384
Where t = value for selected alpha level of .025 in each tail = 1.96 (the alpha level of
.05 indicates the level of risk the researcher is willing to take that true margin of
error may exceed the acceptable margin of error).
Where (p)(q) = estimate of variance = .25 (maximum possible proportion (.5) * 1-
maximum possible proportion (.5) produces maximum possible sample size).
Where d = acceptable margin of error for the proportion being estimated = .05 (the
error the researcher is willing to accept).
Therefore, for a population of 1,679, the required sample size is 384. However, since
this sample size exceeds 5% of the population (1,679*.05=84), Cochran’s (1977)
correction formula should be used to calculate the final sample size. These
calculations are as follows:
n1 = n0 / (1 + n0/Population) = 384 / (1 + 384/1679) = 313
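The categorical-data formula can be sketched the same way, again with the example's
values (t = 1.96, p = q = .5, d = .05, population = 1,679).

def cochran_categorical(t, p, d):
    # Cochran's sample size formula for categorical data: n0 = t^2 * p * (1 - p) / d^2
    return (t ** 2) * p * (1 - p) / (d ** 2)

n0 = cochran_categorical(t=1.96, p=0.5, d=0.05)
n1 = n0 / (1 + n0 / 1679)  # finite-population correction
print(round(n0), round(n1))  # 384 313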
on how these conditions are specified. These are concerned with the types of
assumptions made about the distribution of the parent population (population from
which the sample is drawn) and the actual sampling procedure.
One of the simplest versions of the theorem says that if x1, x2, …, xn is a random
sample of size n (say, n larger than 30) from an infinite population with finite
standard deviation, then the standardized sample mean converges to a standard
normal distribution or, equivalently, the sample mean approaches a normal
distribution with mean equal to the population mean and standard deviation equal to
the standard deviation of the population divided by the square root of the sample
size n. In applications of the central
limit theorem to practical problems in statistical inference, however, statisticians are
more interested in how closely the approximate distribution of the sample mean
follows a normal distribution for finite sample sizes, than the limiting distribution
itself. Sufficiently close agreement with a normal distribution allows statisticians to
use normal theory for making inferences about population parameters (such as the
mean) using the sample mean, irrespective of the actual form of the parent
population.
It is well known that whatever the parent population is, the standardized variable will
have a distribution with a mean 0 and standard deviation 1 under random sampling.
Moreover, if the parent population is normal, then it is distributed exactly as a
standard normal variable for any positive integer n. The central limit theorem states
the remarkable result that, even when the parent population is non-normal, the
standardized variable is approximately normal if the sample size is large enough
(say n > 30). It is generally not possible to state conditions under which the
approximation given by the central limit theorem works and what sample sizes are
needed before the approximation becomes good enough. As a general guideline,
statisticians have used the prescription that if the parent distribution is symmetric and
relatively short-tailed, then the sample mean reaches approximate normality for
smaller samples than if the parent population is skewed or long-tailed.
Under certain conditions, in large samples, the sampling distribution of the sample
mean can be approximated by a normal distribution. The sample size needed for the
approximation to be adequate depends strongly on the shape of the parent
distribution. Symmetry (or lack thereof) is particularly important. For a symmetric
parent distribution, even if very different from the shape of a normal distribution, an
adequate approximation can be obtained with small samples (e.g., 10 or 12 for the
uniform distribution). For symmetric short-tailed parent distributions, the sample
mean reaches approximate normality for smaller samples than if the parent
population is skewed and long-tailed. In some extreme cases (e.g., binomial), sample
sizes far exceeding the typical guidelines (e.g., 30) are needed for an adequate
approximation. For some distributions without first and second moments (e.g.,
Cauchy), the central limit theorem does not hold.
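One quick way to see this behaviour is to simulate it. The Python sketch below draws
repeated samples from a skewed exponential parent distribution (mean 1) and shows
that the spread of the sample means shrinks toward 1/√n, as the theorem predicts;
the sample sizes and replication count are arbitrary illustrative choices.

import random
import statistics

random.seed(1)

def sample_mean(n):
    # Mean of n draws from an exponential parent distribution with mean 1
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

for n in (5, 30, 100):
    means = [sample_mean(n) for _ in range(5000)]
    # CLT prediction: means are approximately normal with sd = 1 / sqrt(n)
    print(n, round(statistics.stdev(means), 3), round(1 / n ** 0.5, 3))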
2.3.4 ANOVA: Analysis of Variance
Analysis of Variance or ANOVA enables us to test the difference between 2 or
more means. ANOVA does this by examining the ratio of variability between two
conditions and variability within each condition. For example, if we give a drug that
we believe will improve memory to a group of people and give a placebo to another
group of people, we might measure memory performance by the number of words
recalled from a list we ask everyone to memorize. A t-test would compare the
likelihood of observing the difference in the mean number of words recalled for
each group. An ANOVA test, on the other hand, would compare the variability that
we observe between the two conditions to the variability observed within each
condition. We measure variability as the sum of the squared deviations of each score
from the mean. When we actually calculate an ANOVA we use a short-cut formula. Thus,
when the variability that we predict (between the two groups) is much greater than
the variability we do not predict (within each group) then we will conclude that our
treatments produce different results.
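As a small illustration of the memory-drug example, the sketch below runs a one-way
ANOVA on two hypothetical groups of word-recall scores using SciPy's f_oneway; the
data values are invented for the example, and SciPy is assumed to be installed.

from scipy import stats

# Hypothetical word-recall counts for the drug and placebo groups
drug = [12, 15, 14, 16, 13, 17]
placebo = [10, 11, 9, 12, 10, 11]

# One-way ANOVA compares between-group variability to within-group variability
f_stat, p_value = stats.f_oneway(drug, placebo)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")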
2.3.5 Exponential Density Function (EDF)
The EDF is used for an important class of decision problems under uncertainty, such
as the chance of the time between events. For example, the chance of the length of
time to the next breakdown of a machine not exceeding a certain time, such as the
photocopying machine in your office not breaking down during this week.
The exponential distribution gives the distribution of time between independent
events occurring at a constant rate. Its density function is:
f(t) = λ exp(−λt), for t ≥ 0,
where λ is the average number of events per unit of time, which is a positive number.
Applications include probabilistic assessment of the time between arrival of patients
to the emergency room of a hospital, and arrival of ships to a particular port.
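As a worked sketch, suppose (hypothetically) that a photocopier breaks down on
average once every 10 days, so λ = 0.1 per day. The probability that the next
breakdown occurs within t days is P(T ≤ t) = 1 − exp(−λt):

import math

lam = 0.1  # hypothetical rate: one breakdown per 10 days on average
t = 7.0    # horizon of interest: one week

p_within_week = 1 - math.exp(-lam * t)
print(f"P(breakdown within a week) = {p_within_week:.3f}")  # ~0.503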
i. If you like to use Exponential Applet to perform your computations. Visit:
http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/pvalues.htm for
computation of The P-values for the Popular Distributions.
ii. If you would also like to use the Lilliefors Test for Exponentiality to perform the
goodness-of-fit test, then visit:
http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/LilliExpon.htm
that the number of people joining the queue in a given time period follows the
Poisson model.
If you like to use Poisson Applet to perform your computations, visit:
http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/pvalues.htm
B. Runs tests: (run-ups, run-downs):
This is a direct test of the independence assumption. There are two test
statistics to consider: one based on a normal approximation and another using
numerical approximation.
Answer
True
T-TEST GROUPS=GENDER(1,2)/VARIABLES=X
For SPSS programs useful for simulation input/output analysis, visit the Data
Analysis Routines at http://home.ubalt.edu/ntsbarsh/stat-data/SPSSSAS.htm .
3.0 Tutor Marked Assignments (Individual or Group)
i. State the essential feature of Poisson process.
ii. List four ways of estimating population variances for sample size
determinations according to Cochran.
iii. What are the objectives of randomization, and state the importance of
randomization in frequentist and Bayesian inferences?
iv. State the Cochran’s sample size formula for continuous and categorical data.
v. Assume a researcher has set the alpha level a priori at 10%, plans to use a
proportional variable, has set the level of acceptable error at 5%, and has
estimated the standard deviation of the scale as .5. Find the sample size for a
population of 2500
vi. What are the Cochran’s key factors for error estimation?
4.0 Summary
We have come to the end of our discussion in this study session. Statistics is the
basis of simulation. In this study session, we have simply introduced some basic
statistics in modelling and simulations. We hope that the reader will broaden his/her
understanding by consulting the referenced texts or other statistics books.
In this study session, we were able to:
[1] Differentiate between the two broad components of statistics: descriptive
and inference statistics
[2] Have concise discussions of descriptive statistics on univariate statistical
measures: the distribution, central tendency, dispersion, etc., and gave some
examples
[3] Discuss Inference statistics under the following subheads: definition,
Model/assumptions, approximate distributions, random-based models and
modes of inference
[4] Introduce some essential statistical measures in simulation, such as: sample size
determination, the central limit theorem, the least squares model, analysis of
variance, the exponential distribution function, the Poisson distribution, the
uniform distribution, tests for randomness, and some commands of the Statistical
Package for the Social Sciences (SPSS).
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit the YouTube addresses http://bit.ly/2NIromR , http://bit.ly/2L9Ltzq ,
http://bit.ly/2LhudZf , http://bit.ly/2UlmpK9 . Watch the videos and summarize in
one paragraph.
b. View the animation on statistics for modelling and simulation and critique it in the
discussion forum
c. Take a walk and engage any 3 students on statistics for modelling and simulation;
In 2 paragraphs summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
Cochran, W. G. (1977) in Organizational Research: Determining Appropriate
Sample Size in Survey Research by James E. Bartlett, et al.
Headrick, T. (2002). Fast fifth-order polynomial transforms for generating univariate
and multivariate non-normal distributions. Computational Statistics and Data
Analysis, 40(4), 685-711.
Karian, Z., and E. Dudewicz (1998). Modern Statistical Systems and GPSS
Simulation. CRC Press.
Korn, G. (2005). Real statistical experiments can use simulation-package software.
Simulation Modelling Practice and Theory, 13(1), 39-54.
Lewis, P., and E. Orav (1989). Simulation Methodology for Statisticians, Operations
Analysts, and Engineers. Wadsworth Inc.
Robert, C., and G. Casella (1999). Monte Carlo Statistical Methods. Springer.
i. http://en.wikipedia.org/wiki/Inferential_statistics
MODULE 3
Queues
Contents
Study Session 1: Simple Theories of Queues
Study Session 2: Basic Probability Theories in Queuing
Study Session 3: Queuing Models
Study Session 4: Queuing Experiments
STUDY SESSION 1
Simple Theories of Queues
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Overview of Queuing Systems
2.2 Queuing theory
2.3 Queuing Discipline
2.4 Queuing networks
2.5 Role of Poisson process and Exponential distributions in Queues
2.6 Limitations of queuing theory
2.7 Kendall-Lee Notations.
2.8 Little’s Queuing Formula
2.9 Queuing Terminology and Notations
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary/Conclusion
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. In this session, we will look at a very useful type of simulations
called queuing system. We deal with queuing systems all the time in our daily lives.
For examples, when you stand in line to cash a cheque at the bank, you are dealing
with a queuing system. When you submit a “batch job” (such as a compilation) on a
mainframe computer, your job must wait in line until the CPU finishes the jobs
scheduled ahead of it. When you make a phone call to reserve an airline ticket and
get a recording that says, “Thank you for calling. The next available operator will
answer your call. Please wait” … you are dealing with a queuing system. Waiting is
the critical element of queuing systems.
The goal of a queuing system is to utilize the servers (the tellers, and so on) as fully
as possible while keeping the wait time within a reasonable limit. These goals
usually require a compromise between cost and customer satisfaction.
To put this on a personal level, no one likes to stand in line. If there were one
checkout counter for each customer in a supermarket, the customers would be
delighted. The supermarket, however, would not be in business very long. So a
compromise is made: The number of cashiers is kept within the limits set by the
store’s budget, and the average customer is not kept waiting too long.
How does a company determine the optimal compromise between the number of
servers and the wait time? One way is by experience; the company tries out different
numbers of servers and sees how things work out. There are two problems with this
approach: It takes too long and it is too expensive. Another way of examining this
problem is by using a computer simulation.
Queuing theory is generally considered a branch of operations research because the
results are often used when making business decisions about the resources needed to
provide service. It is applicable in a wide variety of situations that may be
encountered in business, commerce, industry, healthcare, public service and
engineering. Applications are frequently encountered in customer service situations
as well as transport and telecommunication. Queuing theory is directly applicable to
intelligent transportation systems, call centers, PABXs, networks,
telecommunications, server queuing, and mainframe computer queuing of
telecommunications terminals, advanced telecommunications systems, and traffic
flow.
The use of queuing allows the systems to queue their customers' requests until free
resources become available. This means that if traffic intensity levels exceed
available capacity, customers' calls are not lost; customers instead wait until they can
be served. This method is used in queuing customers for the next available operator.
Let us look at how we can solve this problem as a time-driven simulation.
A time-driven simulation is one in which the model is viewed at uniform time
intervals, say, every minute. To simulate the passing of a unit of time (a minute, for
example), we increment a clock. We run the simulation for a predetermined amount
of time, say, 100 minutes. (Of course, simulated time usually passes much more
quickly than real time; 100 simulated minutes pass in a flash on the computer.)
Think of the simulation as a big loop that executes a set of rules for each value of the
clock from 1 to 100, in our example. Here are the rules that are processed in the loop
body:
i. Rule 1: If a customer arrives, he or she gets in line.
ii. Rule 2: If the teller is free and if there is anyone waiting, the first customer in
line leaves the line and advances to the teller’s window. The service time is set
for that customer.
iii. Rule 3: If a customer is at the teller’s window, the time remaining for that
customer to be serviced is decremented.
iv. Rule 4: If there are customers in line, the additional minute that they have
remained in the queue (their wait time) is recorded.
The output from the simulation is the average wait time. We calculate this value
using the following formula:
Average wait time = total wait time for all customers / number of customers
Given this output, the bank can see whether their customers have an unreasonable
wait in a one-teller system. If so, the bank can repeat the simulation with two tellers.
There are still two unanswered questions:
i. How do we know if a customer arrived?
ii. How do we know when a customer has finished being served?
We must provide the simulation with information about the arrival times and the
service times. These are the variables (parameters) in the simulation. We can never
predict exactly when a customer arrives or how long each individual customer takes.
We can, however, make educated guesses, such as a customer arrives about every
five minutes and most customers take about three minutes to service.
a. How do we know whether a job has arrived in this particular clock unit?
The answer is a function of two factors: the number of minutes between arrivals (five
in this case) and chance. Chance? Queuing models are based on chance. Well, not
exactly. Let us express the number of minutes between arrivals another way- as the
probability that a job arrives in any given clock unit. Probabilities range from 0.0 (no
chance) to 1.0 (a sure thing). If on the average a new job arrives every five minutes,
then the chance of a customer arriving in any given minute is 0.2 (1 chance in 5).
Therefore, the probability of a new customer arriving in a particular minute is 1.0
divided by the number of minutes between arrivals.
b. Now, what about luck?
In computer terms, luck can be represented by the use of a random-number
generator. We simulate the arrival of a customer by writing a function that generates
a random number between 0 and 1 and apply the following rules.
i. If the random number is between 0.0 and the arrival probability, a job has
arrived.
ii. If the random number is greater than the arrival probability, no job arrived
in this clock unit.
By changing the rate of arrival, we simulate what happens with a one-teller system
where each transaction takes about three minutes as more and more cars arrive. We
can also have the duration of service time based on probability. For example, we
could simulate a situation where 60% of the people require three minutes, 30% of the
people require five minutes, and 10% of the people require ten minutes.
It is important at this point to note that simulation does not give us the answer or
even an answer. Simulation is a technique for trying out “what if” questions. We
build the model and run the simulation many times; trying various combinations of
the parameters and observing the average wait time. What happens if the cars arrive
more quickly? What happens if the service time is reduced by 10%? What happens if
we add a second teller?
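The four rules above translate almost line for line into code. Below is a minimal
Python sketch of the one-teller, time-driven simulation, assuming the illustrative
values used in the text: an arrival probability of 0.2 per minute and a fixed
three-minute service time.

import random

random.seed(42)

CLOCK_LIMIT = 100    # total simulated minutes
ARRIVAL_PROB = 0.2   # one chance in five of an arrival each minute
SERVICE_TIME = 3     # minutes each customer takes at the teller

queue = []              # accumulated wait time of each customer in line
remaining_service = 0   # minutes left for the customer at the window
total_wait = 0
customers = 0

for minute in range(CLOCK_LIMIT):
    # Rule 1: if a customer arrives, he or she gets in line
    if random.random() < ARRIVAL_PROB:
        queue.append(0)
        customers += 1
    # Rule 2: if the teller is free and anyone is waiting, serve the next customer
    if remaining_service == 0 and queue:
        total_wait += queue.pop(0)
        remaining_service = SERVICE_TIME
    # Rule 3: decrement the time remaining for the customer being served
    if remaining_service > 0:
        remaining_service -= 1
    # Rule 4: record one more minute of waiting for everyone still in line
    queue = [w + 1 for w in queue]

# Simplification: waits of customers still in line at closing are not added in
if customers:
    print("average wait time:", total_wait / customers)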
2.1.1 Queuing Examples:
i. Waiting to pay in the supermarket
ii. Waiting at the telephone for information
iii. Planes circle before they can land
In-text Question 1
A queuing system is made up of _________ and __________ of objects to be served on usually a
first-in, first-out structure.
Answer
Servers and Queues
2.2 Queuing Theory
Queuing theory is the study of how systems with limited resources distribute those
resources to elements waiting in line, and how those elements waiting in line
respond.
Queuing theory is a mathematical discipline that studies systems intended for
servicing a random flow of requests (the moments at which the requests appear as
well as the time for servicing them is usually random).
Queuing theory is the study of the behaviour of queues (waiting lines) and their
elements. Queuing theory is a tool for studying several performance parameters of
computer systems and is particularly useful in locating the reasons for “bottlenecks,”
compromised computer performance caused by too much data waiting to be acted on
at a particular phase.
Examples include the distribution of cars on highways (including traffic jams), data
through computer networks and phone calls through voice networks. In these
examples, Queue size and waiting time can be looked at, or items within queues can
be studied and manipulated according to factors such as priority, size, or time of
arrival.
The purpose of methods developed in queuing theory is to organize service
reasonably, so that a given quality is ensured. Queuing theory from this standpoint
can be considered as part of operations research.
2.3 Queuing Discipline
A queuing discipline determines the manner in which customers are selected for
service when a queue has formed. Common disciplines include the following:
First in first out
This principle states that customers are served one at a time and that the customer
that has been waiting the longest is served first.
Last in first out
This principle also serves customers one at a time; however, the customer with the
shortest waiting time will be served first.
Processor sharing
Customers are served equally. Network capacity is shared between customers and
they all effectively experience the same delay.
Priority
Customers with high priority are served first.
The queuing strategies are:
– FIFO (first in first out)
– LIFO (last in first out = stack)
– SIRO (service in random order)
– SPT (shortest processing time first)
– PR (priority)
While the particular discipline chosen will likely greatly affect waiting times for
particular customers (nobody wants to arrive early at an LCFS discipline), the
discipline generally does not affect important outcomes of the queue itself, since
arrivals are constantly receiving service regardless.
Control processes within exchanges, which can be modelled using state equations,
handle queuing. Queuing systems use a particular form of state equations known as a
Markov chain that models the system in each state. Incoming traffic to these systems
is modelled via a Poisson distribution and is subject to Erlang’s queuing theory
assumptions viz.
a. Pure-chance traffic – Call arrivals and departures are random and independent
events.
b. Statistical equilibrium – Probabilities within the system do not change.
c. Full availability – All incoming traffic can be routed to any other customer
within the network.
d. Congestion is cleared as soon as servers are free.
Classic queuing theory involves complex calculations to determine waiting time,
service time, server utilization and other metrics that are used to measure queuing
performance.
performance can be useful. The fact that such models often give "worst-case"
scenario evaluations appeals to system designers who prefer to include a safety factor
in their designs. In addition, the form of the solution of models based on the Poisson
process often provides insight into the form of the solution to a queuing problem
whose detailed behaviour is poorly mimicked. As a result, queuing models are
frequently modelled as Poisson processes with the exponential distribution.
2.7 Kendall-Lee Notations
The notation for describing the characteristics of a queuing model was first
suggested by David G. Kendall in 1953. Kendall's notation introduced an A/B/C
queuing notation that can be found in all standard modern works on queuing theory.
The A/B/C notation designates a queuing system having A as interarrival time
distribution, B as service time distribution and C as number of servers.
For example, "G/D/1" would indicate a General (may be anything) arrival process, a
Deterministic (constant time) service process and a single server. More details on
this notation are given later below in the queuing models.
Since describing all of the characteristics of a queue inevitably becomes very wordy,
a much simpler notation (known as Kendall-Lee notation) can be used to describe a
system. Kendall-Lee notation gives us six abbreviations for characteristics listed in
order separated by slashes. The first and second characteristics describe the arrival
and service processes based on their respective probability distributions. For the first
and second characteristics,
M represents an exponential distribution,
E represents an Erlang distribution, and
G represents a general distribution.
The third characteristic gives the number of servers working together at the same
time, also known as the number of parallel servers.
The fourth describes the queue discipline by its given acronym.
The fifth gives the maximum number of customers allowed in the system.
The sixth gives the size of the pool of customers that the system can draw from.
For example, M/M/5/FCFS/20/inf could represent a bank with 5 tellers, exponential
arrival times, exponential service times, an FCFS queue discipline, a total capacity of
20 customers, and an infinite population pool to draw from.
Kendall’s notation:
A/B/c/N/K, where:
A = the inter-arrival time distribution
B = the service time distribution
c = the number of parallel servers
N = the system capacity
K = the size of the target group (the calling population)
2.8 Little’s Queuing Formula
Let L denote the average number of customers in the queuing system, Lq the average
number in the queue itself, and Ls the average number in service. Since a customer in
the system can only be either in the queue or in service, it goes to show that: L = Lq + Ls.
Likewise, we can define W as the average time a customer spends in the queuing
system. Wq is the average amount of time spent in the queue itself and Ws is the
average amount of time spent in service. As was the similar case before, W = Wq +
Ws. It should be noted that all of the averages in the above definitions are the
steady-state averages.
Defining λ as the arrival rate into the system, that is, the number of customers
arriving at the system per unit of time, it can be shown that:
L = λW, Lq = λWq, Ls = λWs
This is known as Little’s queuing formula.
In the mathematical theory of queues, Little's result, theorem, lemma, law or
formula says:
The long-term average number of customers in a stable system L is equal to the
long-term average arrival rate, λ, multiplied by the long-term average time a
customer spends in the system, W; or expressed algebraically: L = λW.
Although it looks intuitively reasonable, it is a quite remarkable result, as it implies
that this behaviour is entirely independent of any of the detailed probability
distributions involved, and hence requires no assumptions about the schedule
according to which customers arrive or are serviced.
It is also a comparatively recent result; the first proof was published in 1961 by John
Little, then at Case Western Reserve University. Handily his result applies to any
system, and particularly, it applies to systems within systems. Therefore, in a bank,
the customer line might be one subsystem, and each of the tellers another subsystem,
and Little's result could be applied to each one, as well as the whole thing. The only
requirements are that the system is stable and non-pre-emptive; this rules out
transition states such as initial start up or shut down.
In some cases, it is possible to mathematically relate not only the average number in
the system to the average wait but relate the entire probability distribution (and
moments) of the number in the system to the wait.
For example - Imagine a small shop with a single counter and an area for browsing,
where only one person can be at the counter at a time, and no one leaves without
buying something. So the system is roughly:
Entrance → Browsing → Counter → Exit
This is a stable system, so the rate at which people enter the store is the rate at which
they arrive at the counter and the rate at which they exit as well. We call this the
arrival rate. By contrast, an arrival rate exceeding an exit rate would represent an
unstable system, and cause the store to overflow eventually.
Little's Law tells us that the average number of customers in the store, L, is the
arrival rate, λ, times the average time that a customer spends in the store, W.
Assume customers arrive at the rate of 10 per hour and stay an average of 0.5 hour.
This means we should find the average number of customers in the store at any time
to be 5.
Now suppose the store is considering doing more advertising to raise the arrival rate
to 20 per hour. The store must either be prepared to host an average of 10 occupants
or must reduce the time each customer spends in the store to 0.25 hour. The store
might achieve the latter by ringing up the bill faster or by walking up to customers
who seem to be taking their time browsing and saying, "Can I help you?"
We can apply Little's Law to systems within the shop, for example the counter and
its queue. Assume we notice that there are on average 2 customers in the queue and
at the counter. We know the arrival rate is 10 per hour, so customers must be
spending 0.2 hour on average checking out.
We can even apply Little's Law to the counter itself. The average number of people
at the counter would be in the range (0, 1), since no more than one person can be at
the counter at a time. In that case, the average number of people at the counter is also
known as the counter's utilization.
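The arithmetic in this example is easy to verify in a few lines of Python; the numbers
are exactly those of the shop example above.

arrival_rate = 10     # customers per hour
time_in_store = 0.5   # hours an average customer spends in the store

# Little's law: L = lambda * W
customers_in_store = arrival_rate * time_in_store
print(customers_in_store)  # 5.0

# Applied to the counter subsystem: L = 2 customers in queue plus counter
time_checking_out = 2 / arrival_rate  # W = L / lambda
print(time_checking_out)  # 0.2 hours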
2.9 Queuing Terminology and Notations
Usually, the following notations are used as standards:
i. State of the system = number of customers in the queuing system.
ii. Queue length = number of customers waiting for service = state of the system
minus the number of customers being served.
iii. N(t) = number of customers in the queuing system at time t (t ≥ 0).
iv. Pn(t) = probability of exactly n customers in the queuing system at time t.
v. S = number of servers (parallel service channels) in the queuing system.
vi. λn = mean arrival rate (expected number of arrivals per unit time) of new
customers when n customers are in the system.
vii. µn = mean service rate for the overall system (expected number of customers
completing service per unit time) when n customers are in the system. Note that
µn represents the combined rate at which all busy servers (those serving
customers) achieve service completion.
When λn is a constant for all n, this constant is denoted by λ. Certain notations are
also required to describe steady-state results. When a queuing system has recently
begun operation, the state of the system will be greatly affected by the initial state
and by the time that has since elapsed. The system is said to be in a transient
condition. However, after sufficient time has elapsed, the state of the system becomes
essentially independent of the initial state and the elapsed time (except under unusual
circumstances). The system has now reached a steady-state condition, where the
probability distribution of the state of the system remains the same (the steady-state
or stationary distribution) over time. Queuing theory has tended to focus largely on
steady-state conditions, partially because the transient case is more difficult analytically.
The following notations assume that the system is in a steady state:
Pn = probability of exactly n customers in the queuing system.
L = expected number of customers in the queuing system.
Lq = expected queue length (excluding customers being served).
W = expected waiting time in the system (including service time) for each individual
customer.
Wq = expected waiting time in the queue (excluding service time) for each individual
customer.
In-text Question 2
What is queuing discipline?
Answer
A queuing discipline determines the manner in which the exchange handles calls from customers.
4.0 Summary
Before we conclude our discussion, we ought to remind ourselves that queuing
systems are prevalent throughout society. The goals of queuing systems usually
require a compromise between cost and customer satisfaction. The adequacy of these
systems can have an important effect on the quality of life and productivity. In this
section, we looked at ways of utilizing queuing to optimize service delivery in order
to keep the waiting time, cost and customer satisfaction within reasonable limits.
In this session we:
i. Defined a queuing system as a discrete-event model that uses random
numbers to represent the arrival and duration of events, and queuing theory
as a mathematical discipline that studies systems intended for servicing a
random flow of requests, and learned how to construct queuing systems:
their parameters, questions and examples.
ii. In the overview section, we looked at:
[1] The parameters for construction of a queuing system,
[2] The necessary questions in queuing construction,
[3] The network queues, the role of Poisson and exponential distributions
[4] The limitations of queuing theory
iii. Described queuing systems using Kendall Lee notations and Little’s
formula
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit the YouTube addresses http://bit.ly/32aWO9a , http://bit.ly/2ztKq7W ,
http://bit.ly/2HxcnQV , http://bit.ly/2HylnVJ , http://bit.ly/347caNH ,
http://bit.ly/2Ix50u0 . Watch the videos and summarize in one paragraph.
b. View the animation on simple theories of queues and basic probability theories
in queuing and critique it in the discussion forum
c. Take a walk and engage any 3 students on simple theories of queues and basic
probability theories in queuing; In 2 paragraphs summarize their opinion of the
discussed topic. etc.
STUDY SESSION 2
Basic Probability Theories in Queuing Systems
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Exponential and Poisson Probability Distributions
2.2 The Input Process.
2.3 The Output Process
2.4 Output variables
2.5 Birth-Death Processes
2.6 Steady-state Probabilities
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. To begin understanding queues, we must first have some
knowledge of probability theory. In particular, we will review the exponential and
Poisson probability distributions.
3. describe the Steady-State probability for queues
2.0 Main Content
2.1 Exponential and Poisson Probability Distributions
The exponential distribution with parameter λ is given by f(t) = λe^(−λt) for t ≥ 0.
If T is a random variable that represents inter-arrival times with the exponential
distribution, then P(T ≤ t) = 1 − e^(−λt) and P(T > t) = e^(−λt). A key property of
this distribution is that it is memoryless: the probability that an arrival occurs in
the next instant does not depend on how much time has already passed. This makes
intuitive sense for a model where we’re measuring customer arrivals, because the
customers’ actions are clearly independent of one another.
It’s also useful to note the exponential distribution’s relation to the Poisson
distribution. The Poisson distribution is a "discrete probability distribution. It
expresses the probability of a number of events occurring in a fixed time if these
events occur with a known average rate, and are independent of the time since the
last event". Such events are said to be memoryless.
A Poisson distribution describes most queuing systems’ characteristics such as
arrival and departure processes. Assuming that arrivals and departures are random
and independent i.e. they exhibit pure-chance property; arrivals are described by a
Poisson random variable or Poisson random distribution as shown by equation
below.
The probability that there are exactly k occurrences (k being a non-negative integer,
k = 0, 1, 2, ...) under the Poisson distribution with parameter λ is given by:
f(k; λ) = e^(−λ) λ^k / k!  ……………..(1)
where λ is the expected number of occurrences during the given interval. A number
of textbooks (e.g., Schaum’s Outline of Statistics, 3rd edition) provide e^(−λ) for
various values of λ, or it can be computed using logarithms.
For instance, if the events occur on average every 4 minutes, and you are interested
in the number of events occurring in a 10-minute interval, you would use as model a
Poisson distribution with λ = 2.5.
Some properties of the Poisson distribution:
Mean = m = λ
Variance = σ² = λ
Example 1
The average rate of telephone calls received at an exchange of 8 lines is 6 per
minute. Find the probability that a caller is unable to make a connection, where this
is defined to occur when all lines are engaged within a minute of the time of the call.
Solution
We first need to assume that the overall rate of calls is constant; then we can use
equation (1). Since our time unit is 1 minute, λ = 6.
The probability of not being able to make a call occurs only when there are at least 9
calls in any interval of a minute. Summing the Poisson probabilities for k from 0 to 8
gives P(k ≤ 8) ≈ 0.8472, so the probability that a caller is unable to connect is
1 − 0.8472 ≈ 0.1528.
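The summation is easy to verify in a few lines of Python:

from math import exp, factorial

lam = 6  # average calls per minute

# P(k <= 8): probability of at most 8 calls in a one-minute interval
p_at_most_8 = sum(exp(-lam) * lam ** k / factorial(k) for k in range(9))
print(round(p_at_most_8, 4))      # ~0.8472
print(round(1 - p_at_most_8, 4))  # ~0.1528, the probability the caller is blocked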
With these distributions in mind, we can begin defining the input and output
processes of a basic queuing system, from which we can start developing the model
further.
Imagine an example where four customers are at a bank with three tellers with
exponentially distributed service times. Three of them receive service immediately,
while the fourth has to wait for one position to clear.
What is the probability that the fourth customer will be the final one to complete
service?
Due to the no-memory property of the exponential distribution, when the fourth
customer finally steps up to a teller, all three remaining customers have an equal
chance of finishing their service last, as the service time in this situation is not
governed by how long they have already been served. Thus, the answer to the
question is 1/3.
Unfortunately, the exponential distribution does not always represent service times
accurately. For a service that requires many different phases of service (for example,
scanning groceries, paying for groceries, and bagging the groceries), an Erlang
distribution can be used with the parameter k equal to the number of different phases
of service.
In-text Question 1
The probability that there are exactly k occurrences (k being a non-negative integer, k = 0, 1, 2, ...)
is the Poisson distribution with parameter λ is given by______________
Answer
𝑒 −𝜆 𝜆𝑘
𝑓(𝑘; 𝜆) =
𝑘!
v. Average time spent by a customer in the system, W (service and queue)
vi. Average time spent by a customer in the queue, Wq
The probability that a birth will occur between t and t + Δt is λjΔt, and such a birth
will increase the state from j to j + 1. The probability that a death will occur between
t and t + Δt is µjΔt, and such a death will decrease the state from j to j − 1.
Working from these transition probabilities, we obtain what are known as the flow
balance equations. You may notice that they suggest that the rate at which transitions
occur into a particular state equals the rate at which transitions occur out of the same
state. At this point, each steady-state probability can be determined by substituting in
probabilities from lower states. Starting with Π1 = (λ0/µ1)Π0 and substituting
repeatedly, we obtain:
Πj = Π0 (λ0 λ1 … λ(j−1)) / (µ1 µ2 … µj), for j ≥ 1,
where Π0 is determined by the condition that all of the Πj sum to one.
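A minimal sketch of this computation for a finite birth-death chain is given below;
the birth and death rates in the example are hypothetical.

def birth_death_steady_state(birth_rates, death_rates):
    # birth_rates[j] is lambda_j (out of state j); death_rates[j] is mu_{j+1}
    weights = [1.0]  # unnormalized Pi_0
    for lam, mu in zip(birth_rates, death_rates):
        weights.append(weights[-1] * lam / mu)  # Pi_{j+1} = (lambda_j / mu_{j+1}) * Pi_j
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the probabilities sum to one

# Hypothetical chain with states 0..3
print(birth_death_steady_state([2.0, 2.0, 2.0], [3.0, 3.0, 3.0]))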
In-text Question 1
In order to determine the steady-state probability Πj, we have to find a relation between
______________ and ____________ for a reasonably sized ___________
Answer
Pij(t + Δt), Pij(t), Δt
3.0 Tutor Marked Assignments (Individual or Group)
i. If 10% of the tools produced in a manufacturing process are defective, find
the Poisson approximation to the binomial distribution.
ii. Poisson distribution is given by p(x) = (0.72)xe-0.72/x! Find p(0) and p(2).
4.0 Summary
We have come to the end of our discussion in the study session and this unit,
reviews an important function; Poisson distribution and gives an example of its
application. In this study session, we discussed Probability theories applied in
queuing systems:
i. Looked at Poisson and Exponential distributions in queuing system
ii. Defined the Input, Output parameters of queues
iii. Discuss the Birth-Death process; births and deaths are synonymous with
arrivals and service completions respectively. A birth increases the state by
one while a death decreases the state by one.
iv. Defined the formula for calculating Steady- State probability
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit the YouTube addresses http://bit.ly/32aWO9a , http://bit.ly/2ztKq7W ,
http://bit.ly/2HxcnQV , http://bit.ly/2HylnVJ , http://bit.ly/347caNH ,
http://bit.ly/2Ix50u0 . Watch the videos and summarize in one paragraph.
b. View the animation on basic probability theories in queuing and critique it in the
discussion forum
c. Take a walk and engage any 3 students on basic probability theories in queuing;
In 2 paragraphs summarize their opinion of the discussed topic. etc.
STUDY SESSION 3
Queuing Models
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 - Queuing Model
2.2 - Single-Server Queue
2.3 - The Multiple and Infinite Server Systems
3.0 - Tutor Marked Assignments (Individual or Group assignments)
4.0 - Study Session Summary
5.0 - Self-Assessment Questions
6.0 - Additional Activities
7.0 - Self-Assessment Question Answers
8.0 - References/Further Readings
Introduction
I am delighted to welcome you to another exciting moment of learning. With the
foundation we have laid for the study of important characteristics of queuing
systems, we will In this study session, begin to analyse particular systems
themselves.
2.0 Main Content
2.1 Queuing Model
In queuing theory, a queuing model is used to approximate a real queuing situation
or system, so the queuing behaviour can be analysed mathematically. Queuing
models allow a number of useful steady state performance measures to be
determined, including:
i. The average number in the queue, or the system
ii. The average time spent in the queue, or the system
iii. The statistical distribution of those numbers or times
iv. The probability the queue is full, or empty and
v. The probability of finding the system in a particular state
2.1.1 Construction
Queuing models are generally constructed to represent the steady state of a queuing
system, that is, the typical, long run or average state of the system. As a consequence,
these are stochastic models that represent the probability that a queuing system will
be found in a particular configuration or state.
A general procedure for constructing and analysing such queuing models is:
i. Identify the parameters of the system, such as the arrival rate, service time,
queue capacity, and perhaps draw a diagram of the system.
ii. Identify the system states. (A state will generally represent the integer number
of customers, people, jobs, calls, messages, etc. in the system and may or may
not be limited.)
iii. Draw a state transition diagram that represents the possible system states and
identify the rates to enter and leave each state. This diagram is a representation
of a Markov chain.
iv. Because the state transition diagram represents the steady state situation
between states there is a balanced flow between states so the probabilities of
being in adjacent states can be related mathematically in terms of the arrival
and service rates and state probabilities.
v. Express all the state probabilities in terms of the empty state probability, using
the inter-state transition relationships.
vi. Determine the empty state probability by using the fact that all state
probabilities always sum to 1.
Whereas specific problems that have small finite state models can often be analysed
numerically, analysis of more general models, using calculus, yields useful formulae
that can be applied to whole classes of problems.
2.1.2 Parameters
In constructing a queuing model, we must know the following four things:
i. The number of events and how they affect the system in order to
determine the rules of entity interaction
ii. The number of servers
iii. The distribution of arrival times in order to determine if an entity enters the
system
iv. The expected service time in order to determine the duration of an event
Simulation uses these characteristics to predict the average wait time. The number of
servers, the distribution of arrival times, and the duration of service can be changed.
The average wait times are then examined to determine what a reasonable
compromise would be.
Example
Consider the case of a drive-in bank with one teller. How long does the average car
have to wait? If business gets better and cars start to arrive more frequently, what
would be the effect on the average wait time? When would the bank need to open a
second drive-in window?
This problem has the characteristics of a queuing model. The entities are a server
(the teller), the objects being served (customers in cars), and a queue to hold the
objects waiting to be served (customers in cars). The average wait time is what we
are interested in observing. The events in this system are the arrivals and the
departures of customers.
We first categorize queues into single-server and multiple-server systems according to their service delineations.
In-text Question 1
In queuing theory, a queuing model is used to approximate a real queuing situation or system, so
the queuing behaviour can be analysed mathematically. True or False?
Answer
True
2.2 Single-Server Queue
Being able to model and analyse a single-server queue's behaviour is a particularly useful thing to do.
2.2.1 The Poisson Arrivals and Service
M/M/1 represents a single server that has unlimited queue capacity and an infinite
calling population, both arrivals and service are Poisson (or random) processes,
meaning the statistical distribution of both the inter-arrival times and the service
times follow the exponential distribution. Because of the mathematical nature of the
exponential distribution, a number of quite simple relationships can be derived for
several performance measures based on knowing the arrival rate and service rate.
Substituting this into the equation for the steady-state probability, we get
Pn = (λ/µ)^n × P0
We will define ρ = λ/µ as the traffic intensity of the system, which is the ratio of the arrival and service rates. Knowing that the sum of all of the steady-state probabilities is equal to one, we get
P0 = 1 − ρ, and hence Pn = (1 − ρ)ρ^n
To solve for Ls, we have to determine how many customers are in service at any given moment. In this particular system, there is always exactly one customer in service except when there are no customers in the system. Thus, the average number in service can be calculated as
Ls = 0 × P0 + 1 × (1 − P0) = ρ
Using Little’s queuing formula, we can also solve for W, Ws, and Wq by dividing
each of the corresponding L values by λ.
A formula for L can be found in a similar fashion but is omitted because of the messy calculations; the technique is similar to the one used in the previous section. Calculating W is another issue. In Little's queuing formula, λ represents the arrival rate, but in this system not all of the customers who arrive will join the queue. In fact, λΠc arrivals per unit time arrive but leave the system, so only λ − λΠc = λ(1 − Πc) arrivals ever enter the system. Substituting this into Little's queuing formula gives us
W = L / (λ(1 − Πc))
In-text Question 2
_______________ represents a single server that has unlimited queue capacity and infinite calling
population
Answer
M/M/1
A key result is that a single large pool of servers performs better than two or more smaller pools, even though the total number of servers in the system is the same.
Example 1
Consider a system having 8 input lines, a single queue and 8 servers. The output line has a capacity of 64 kbit/s. Taking the arrival rate at each input as 2 packets/s, the total arrival rate is 16 packets/s. With an average of 2000 bits per packet, the service rate is 64 kbit/s ÷ 2000 bits = 32 packets/s. Hence, the average response time of the system is 1/(µ − λ) = 1/(32 − 16) = 0.0625 s.
Example 2
Consider a second system with 8 queues, one for each server. Each of the 8 output lines has a capacity of 8 kbit/s, so each server's service rate is 8 kbit/s ÷ 2000 bits = 4 packets/s. The calculation yields a response time of 1/(µ − λ) = 1/(4 − 2) = 0.5 s. The average waiting time in the queue, ρ/((1 − ρ)µ), is 0.03125 s in the first case and 0.25 s in the second.
If j ≤ s customers are in the system, every customer is being served. If j > s customers are in the system, then s customers are being served and the remaining j − s customers are waiting in the line.
To model this as a birth-death system, we have to observe that the death rate depends on how many servers are actually being used. If each server completes service at a rate of µ, then the actual death rate is µ times the number of customers actually being served. In solving for the steady-state probabilities, we define ρ = λ/(sµ). Notice that this definition is consistent with the other systems we have looked at.
A machine has broken down and will require 1 hour to be repaired. What is the probability that the number of new jobs that will arrive during this time is a. 0, b. 2, c. 5 or more?
d. If an M/M/1 queue has a utilization of 80%, what is its mean queue length? If the arrival rate is 100 jobs per second (and the utilization is 80%), what is the mean response time?
4.0 Summary
From our discussion so far, we can deduce that the goals of queuing systems and queuing models usually require a compromise between cost and customer satisfaction. In this session, we looked at ways of using queuing to optimize service delivery in order to keep the waiting time, cost and customer satisfaction within reasonable limits.
The applications of queuing theory extend well beyond waiting in line. It may take some creative thinking, but if there is any sort of scenario where time passes before a particular event occurs, there is probably some way to develop it into a queuing model. Queues are so commonplace in society that it is highly worthwhile to study them, even if only to shave a few seconds off one's wait in the checkout line.
We looked at:
i. The procedure for construction of queue models
ii. Some Basic Queuing models for:
a. Single-server systems with Poisson arrivals and service (and with general service)
b. Multiple and Infinite Server Systems
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube via http://bit.ly/2NAz5vb , http://bit.ly/2HxcnQV ,
http://bit.ly/32a3l3X , http://bit.ly/2L3VBec , http://bit.ly/2LgZnj7. Watch the video
& summarize in 1 paragraph.
b. View the animation on queuing models and critique it in the discussion forum
c. Take a walk and engage any 3 students on queuing models; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
http://en.wikipedia.org/wiki/Poisson_distribution
http://en.wikipedia.org/wiki/Queuing_theory
http://en.wikipedia.org/wiki/Queuing_theory#cite_note-flood-7
Frederick S.H and Gerald J.L. (1995). Introduction to Operations Research, sixth
Edition. McGraw-Hill, Inc.
Sheldon M. Ross(2007). Introduction to Probability Models 9th Edition.
Academic press.
Wayne L Winston(1991) Operations Research: Applications and Algorithms,
2nd edition. Boston:PWS-Kent Publishing.
STUDY SESSION 4
Queuing Experiments
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Car Wash Experiment
2.2 Salesman Calls
2.3 Salesman BASIC Program
2.4 Goods Production
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary and Conclusion
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are all welcome. This study session is about applications of queues in different daily activities, and about how these applications enable us to use observed queuing-system relationships to make informed decisions that reduce costs.
2.0 Main Content
2.1 Car Wash Experiment
A garage owner has installed an automatic car washing machine, which services cars one at a time. As his services become more popular, the owner is faced with a dilemma. As more customers use the car wash, the average waiting time tends to increase and the service becomes less attractive. The owner may react in several ways: do nothing and let the customer flow stabilize at a lower level, or build another car wash, which will keep the present customers happy and probably attract more. Since building a car wash entails considerable expense, the latter is not a decision to be taken lightly and demands some investigation.
Suppose the car wash services cars one at a time and each service takes 10 minutes. When a car arrives, it goes into the car wash if the wash is idle; otherwise, it must wait in the queue. As long as cars are waiting, the car wash is in continuous operation, serving on the first-come, first-served principle. If the arrival times of cars have been recorded for one day, then a very simple model is capable of reproducing the essential aspects of the system on that day. This model can generate data which describe the performance of the installation, such as equipment utilization, average number of waiting cars, time spent by each car, etc.
The operation of the model consists of two critical events:
i. Arrival of a car – The arrival time is noted. If the car wash is idle, a 10-minute service starts at once; otherwise, the car joins the queue.
ii. End of service – The elapsed time for that car is noted and the car leaves the system. If cars are waiting, the first car in the queue is served and another 10-minute service is started; otherwise, the machine becomes idle.
The car wash starts in the idle state and waits for the first arrival.
In order to understand the model better, we give a trace of the history of the model over a short period in which cars arrive at times 6, 10, 13, 28, 42, 43, 48 (in minutes). The state of the model only changes at the two critical moments noted above – when a car arrives and when a service is finished. Accordingly, we need only trace at these times.
CAR WASH TRACE
Time  Events
0     The car wash awaits the first arrival
6     Car 1 arrives and goes into the car wash
10    Car 2 arrives and must wait
13    Car 3 arrives and must wait
16    Car 1 leaves the system; Car 2 enters the car wash
26    Car 2 leaves the system; Car 3 enters the car wash
28    Car 4 arrives and must wait
36    Car 3 leaves the system; Car 4 enters the car wash
42    Car 5 arrives and must wait
43    Car 6 arrives and must wait
46    Car 4 leaves the system; Car 5 enters the car wash
48    Car 7 arrives and must wait
56    Car 5 leaves the system; Car 6 enters the car wash
66    Car 6 leaves the system; Car 7 enters the car wash
76    Car 7 leaves the system
The data recorded from the trace for the cars are:
Car number     1   2   3   4   5   6   7
Arrival time   6  10  13  28  42  43  48
Departed at   16  26  36  46  56  66  76
Elapsed time  10  16  23  18  14  23  28
In the real world, the owner would compare the behaviour of the model with the actual system behaviour. If the model does not approximate the actual system, the model structure must be changed or refined. When a proper model has been developed, it may be extended to predict hypothetical system behaviour, such as estimating the behaviour if another car wash is installed.
Let us demonstrate the trace using two car washes, running the model on the same data. The extra (unfair) decision rule is that car wash one is used if both car washes are idle.
The data recorded for the 7 cars for the two car washes are:
Car number     1   2   3   4   5   6   7
Arrival time   6  10  13  28  42  43  48
Departed at   16  20  26  38  52  53  62
Elapsed time  10  10  13  10  10  10  14
Serviced by    1   2   1   1   1   2   1
Car 3 waited for 3 minutes while car 7 waited for 4 minutes. Thus, the average waiting time prior to service is reduced from nearly 9 minutes to 1 minute by increasing the number of washes from 1 to 2.
Clearly, no management would base its decisions on such a small sample. To arrive at proper estimates, it is necessary to simulate the behaviour of the system over several days.
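Such repetition is exactly what a computer does well. The Python sketch below is our own (it is not part of the original text); with servers=1 it reproduces the first trace table and with servers=2 it reproduces the second, including the unfair rule that wash one is preferred when more than one wash is idle:

def car_wash(arrivals, servers=1, service_time=10):
    """Return (departure, elapsed, wash used) per car, FCFS discipline."""
    free_at = [0] * servers                # time each wash next becomes idle
    results = []
    for t in arrivals:
        # prefer the lowest-numbered wash that is already idle at time t
        idle = [i for i, f in enumerate(free_at) if f <= t]
        i = idle[0] if idle else free_at.index(min(free_at))
        start = max(t, free_at[i])         # wait if every wash is busy
        free_at[i] = start + service_time
        results.append((free_at[i], free_at[i] - t, i + 1))
    return results

for dep, elapsed, wash in car_wash([6, 10, 13, 28, 42, 43, 48], servers=2):
    print(dep, elapsed, wash)              # matches the two-wash table above

Wrapping the call in a loop over many simulated days would give the repeated runs that proper estimates require.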
2.2 Salesman Calls
Example – A salesman makes one call per day. Each call succeeds with probability 0.5 and fails with probability 0.5. Use random digits to simulate the outcomes of his calls over two 10-day periods.
Solution
The solution has two parts. The first part specifies the rules and the second applies them. The rules must link the random numbers to the outcomes so that success has probability 0.5 and failure has probability 0.5. A simple categorization of the digits (0,1,2,3,4,5,6,7,8,9) is to form the two subsets {0,1,2,3,4} and {5,6,7,8,9}, each containing five elements. We associate success with the first group and failure with the second. Note that this categorization is far from unique: any division of the ten digits into two groups of five would do, so one might as well have used {0,2,4,6,8} and {1,3,5,7,9} for success and failure respectively.
It is usually very useful to draw a flowchart of the simulation process as shown
below:
Fig. 2.4.1: Flowchart showing the activities of the salesman
The application of the rule for two 10-day periods is shown below.
First 10-day period
Day             1  2  3  4  5  6  7  8  9  10
Random no.      8  4  3  7  9  0  6  1  5  6
Result of call  F  S  S  F  F  S  F  S  F  F
Second 10-day period
Day             1  2  3  4  5  6  7  8  9  10
Random no.      3  6  6  7  1  0  0  8  2  3
Result of call  S  F  F  F  S  S  S  F  S  S
In-text Question 1
What are two events that can cause a change in a queuing system’s state?
Answer
Arrival and end of service
FOR N% = 1 TO 10
X = RND
A$ = "S"
IF X > 0.5 THEN A$ = "F"
PRINT A$
NEXT N%
END
A typical outcome of running this program is: sssfffffsss. The program may be run repeatedly, sometimes with a large value of N; in this case, the total numbers of S's and F's should be approximately equal.
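For comparison, an equivalent sketch in Python (ours; the function name is illustrative) generates the same kind of S/F sequence:

import random

def simulate_calls(days=10, p_success=0.5, seed=None):
    """One character per day: 'S' with probability p_success, else 'F'."""
    rng = random.Random(seed)
    return "".join("S" if rng.random() < p_success else "F" for _ in range(days))

print(simulate_calls())                    # e.g. SSFFSFFSSF
print(simulate_calls(10000).count("S"))    # close to 5000, as argued above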
At the start of the 10-day period, he assumed that 5 units were in stock and that a further 5 would be available for dispatch from day 6. It was the policy of the firm to dispatch orders on the same day they were placed. However, if no stock were available, orders would be held until the next delivery of stock.
Use a tabular simulation to cover the 10 days. Show whether each call is made and its result. Show also the level of stock held at the end of each day. Use the following random numbers: 5, 4, 5, 6, 2, 9, 3, 0, 3, 9, 3, 9, 4, 8, 4, 9, 8, 4.
Solution
Note that a tabular simulation is a table constructed to produce a record of what has been simulated to occur and to facilitate any analysis or monitoring of the processes required. The first column records an incremental number of events; other columns record random numbers and the corresponding results, while others monitor the implications of earlier columns. There is no fixed rule for doing this; flexibility is therefore very necessary.
Two look-up tables are needed to answer this question. First, we need to determine whether the call took place or was cancelled.
State of call  Chance  Random no.
Took place     90%     0,1,2,3,4,5,6,7,8
Cancelled      10%     9
The allocation of random numbers in these tables is based on the fact that each random digit is assumed to have a 10% chance of occurring. Therefore, chances of 10%, 30%, 50% and 90% require one, three, five and nine digits respectively.
The following table gives the result of the simulation.
STATE OF CALL / RESULT OF CALL
Day | Random no | Status | Random no | Stock level | Units sold | Opening stock | Ending stock
0 - - - 5 -
1 5 * 9 2 3S 5 2
2 4 * 3 2 0S 2 2
3 5 * 9 -1 0S,2H,1W 2 -1
4 6 * 4 -1 0S,2H,1W -1 -1
5 2 * 8 -3 0S,2H,3H -1 -3
6 6 * 4 5- 5S -3+5=2 2
7 9 X - 3=2 - - -
8 3 * 9 - -1 2 -1
9 0 * 8 -1 -3 -1 -3
10 3 * 4 -3 -3 -3 -3
-3
Note: * means the call took place and X means the call was cancelled.
Fig. 2.4.2: The flow-chart of the problem solution
In-text Question 2
What is a random variable?
Answer
A random variable is a quantity that is uncertain, such as interval time between two incoming
flights
Note that the random numbers for the state of call contain a 9 on day 7, so no call was made that day. Hence, only 9 random numbers are needed to determine the results of the calls.
It will be noted that each delivery was followed 3 days later by a stock-out. Thus, there would have been no stock-out if the initial stock level had been set to 8 units.
Notice that only two delivery periods have been simulated; useful conclusions can be drawn only after the simulation has been repeated many more times.
2.4 Goods Production
Example 5 – An engineering company has two machines, A and B, to produce a product Z. The daily output of each machine is specified in the table below.
MACHINE A               MACHINE B
Daily output  Chance    Daily output  Chance
0             20        0             10
7             20        6             30
8             30        7             40
9             30        8             20
In addition, quality control tests give each day's output a probability of 0.95 of being accepted and a probability of 0.05 of rejection. Complete a tabular simulation to cover a period of 10 days, monitoring daily and cumulative output.
Solution
The three look-up tables needed are one for each of the two machines and one for
quality check.
MACHINE A                            MACHINE B
Daily output  Chance%  Random no     Daily output  Chance%  Random no
0             20       0,1           0             10       0
7             20       2,3           6             30       1,2,3
8             30       4,5,6         7             40       4,5,6,7
9             30       7,8,9         8             20       8,9
As the chances in the first two tables are all multiples of 10%, only one digit from the set {0,1,2,3,4,5,6,7,8,9} is needed for each 10% chance.
However, in the quality table we need probabilities of 0.95 and 0.05. In this case we must use the random digits in pairs and select from the set {00,01,02,…,99}. The set contains 100 pairs, and we can assume that each pair has a 1% chance of being selected. To simulate a probability of 0.95 we use 95 of the pairs, and for 0.05 we use the remaining 5.
To carry out the simulation, random number blocks are used. A block is selected randomly and then its digits are used sequentially. No cheating is allowed: random numbers should be used systematically, in the order in which they appear.
The table is now produced as follows:
     Machine A        Machine B        Quality test
Day  Ran no  Output   Ran no  Output   Ran no  Result   Daily output  Cumm. output
1    4       8        1       6        79      Accept   14            14
2    0       0        4       7        85      Accept   7             21
3    8       9        0       0        57      Accept   9             30
4    5       8        2       6        68      Accept   14            44
5    5       8        1       6        18      Accept   14            58
6    6       8        9       8        08      Accept   16            74
7    0       0        2       6        27      Accept   6             80
8    9       9        2       6        05      Accept   15            95
9    1       0        9       8        66      Accept   8             103
10   9       9        3       6        39      Accept   15            118
Considering the first block of random numbers: the first random digit is 4, which is looked up in the table for machine A; the second digit, 1, is looked up in the table for machine B; and the next two digits, 7 and 9, form the pair 79, which is looked up in the table for the quality check. For each subsequent day, four random digits are taken from the block of random numbers in the same order.
We conclude that the cumulative output for the 10 days was 118 units and that no output was rejected. Further simulations can be carried out and the results compared with the above to give a clearer indication of the reliability of the result.
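The tabular simulation above can also be automated. The Python sketch below is our own (it draws fresh random digits rather than reusing the fixed blocks above); it encodes the three look-up tables and prints the daily and cumulative output:

import random

# Look-up tables from the solution: output keyed by a single random digit.
MACHINE_A = {0: 0, 1: 0, 2: 7, 3: 7, 4: 8, 5: 8, 6: 8, 7: 9, 8: 9, 9: 9}
MACHINE_B = {0: 0, 1: 6, 2: 6, 3: 6, 4: 7, 5: 7, 6: 7, 7: 7, 8: 8, 9: 8}

def simulate_production(days=10, seed=None):
    rng = random.Random(seed)
    cumulative = 0
    for day in range(1, days + 1):
        out_a = MACHINE_A[rng.randrange(10)]   # one digit for machine A
        out_b = MACHINE_B[rng.randrange(10)]   # one digit for machine B
        accepted = rng.randrange(100) < 95     # a pair 00-94 means accept
        daily = (out_a + out_b) if accepted else 0
        cumulative += daily
        print(day, out_a, out_b, "accept" if accepted else "reject",
              daily, cumulative)

simulate_production(seed=1)

Re-running with different seeds corresponds to drawing different random number blocks, which is exactly the repeated simulation recommended above.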
4.0 Conclusion/Summary
In this session, we have manually simulated the application of queuing systems in different areas of daily activity. The reader is expected to extend his/her knowledge by trying some other areas of daily endeavour and by completing the exercises.
In this session, we have designed queuing systems for:
i. a car wash
ii. sales calls
iii. goods production
and we have written computer programs in the BASIC language to simulate those experiments.
5.0 Self-Assessment Questions
1. What are the two critical events in the operation of the car wash model we considered in this study session?
2. When a proper model has been developed, it may not be extended to predict
hypothetical system behaviour. True or False?
MODULE 4
Simulation Languages and Stochastic Processes
Contents
Study Session 1: Example of Simulation Languages
Study Session 2: SIMNET II Language
Study Session 3: Stochastic Processes
Study Session 4: Random Walks
Study Session 5: Data Collection
Study Session 6: Coding and Screening
STUDY SESSION 1
Examples Of Simulation Languages
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 The Purpose of Simulation Languages
2.2 Types and Examples of Simulation Languages
2.3 Approaches to Model Development
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
Welcome. Most conventional programming languages are not suitable for writing simulation programs. The programmer is usually confronted with a number of detailed decisions. He needs flexible tools for generating dynamic models: model formulation, programming, verification, validation, and experimental design and analysis. The basic purpose of most simulation studies is to compare alternatives; therefore, the simulation program must be flexible enough to readily accommodate the alternatives that will be considered. Most of the instructions in a simulation program are logical operations, whereas the relatively little actual arithmetic work required is usually of a very simple type. This should be reflected in the choice of the computer programming language to be used.
The above considerations partly motivated the development of simulation languages in the early 1960s. These languages are designed especially to expedite the type of programming unique to simulation. Their specific purposes include the following:
1. To provide a convenient means of describing the elements that commonly
appear in simulation models.
2. To expedite changing the design configuration of the system being simulated
so that a large number of configurations can be considered easily.
3. To provide some form of internal timing and control mechanism with related
commands to assist in the kind of book-keeping that is required when
executing a simulation run.
4. To provide simple operational procedures, such as introducing changes into
simulation models, initializing the state of the model, altering the kind of
output data to be generated and stacking a series of simulation runs.
In simulation, computer models (literally) imitate the behaviour of the real situation
as a function of time. As the simulation advances with time, pertinent statistics are
gathered about the simulated system, in very much the same way it is carried out in
real life.
However, we pay attention to the system especially when changes in statistics take
place. Such changes are associated with the occurrence of events, for example in a
bank operation, arrival and departures from a facility, points in time at which the
length of the queue and/or the idle/busy status may change.
In-text Question 1
What is a Simulation language?
Answer
A computer simulation language describes the operation of a simulation on a computer.
2.2 Types and Examples of Simulation Languages
Discrete-event simulation languages view the model as a sequence of random events, each causing a change in state.
GPSS – the General Purpose Simulation System (originally Gordon's Programmable Simulation System, after creator Geoffrey Gordon; the name was changed when it was decided to release it as a product) is a discrete-time simulation language in which a simulation clock advances in discrete steps. A system is modelled as transactions that enter the system and are passed from one service (represented by blocks) to another. It is particularly well suited to problems such as a production factory. It was popular in the late 1960s and early 1970s but is little used today.
SIMAN – a language with a very good GUI (Arena), from Rockwell.
SimPy – a process-based, object-oriented discrete-event simulation language. It is implemented in standard Python and released as open-source software under the GNU Lesser General Public License (LGPL). It provides the modeller with components for building a simulation model, including Processes, for active entities like customers, messages and vehicles, and Resources, for passive components that form limited-capacity congestion points like servers, checkout counters and tunnels. There are two varieties of Buffer classes: Levels, to hold stored quantities, and Stores, to hold sets of objects. It has commands to aid interaction between entities. It provides Monitor and Tally objects to aid in gathering statistics, but the generation of random variates depends on the standard Python random module. Because it is implemented in Python, SimPy is platform-independent. SimPy simulates parallel processes by an efficient implementation of coroutines using Python's generator capability. It is based on ideas from Simula and SIMSCRIPT II.5. The first version was released in December 2002. Version 2.0, including an object-oriented but compatible interface and new documentation, was released in January 2009; version 2.0.1 was released in April 2009.
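As a taste of the language, here is a minimal M/M/1 queue written against the modern SimPy 3 API (our own sketch; note that this interface differs from the version 2.0 interface described above):

import random
import simpy

def customer(env, counter, service_rate, waits):
    arrive = env.now
    with counter.request() as req:         # join the FIFO queue
        yield req                          # wait until the server is free
        waits.append(env.now - arrive)
        yield env.timeout(random.expovariate(service_rate))   # service time

def arrivals(env, counter, arrival_rate, service_rate, waits):
    while True:
        yield env.timeout(random.expovariate(arrival_rate))   # inter-arrival
        env.process(customer(env, counter, service_rate, waits))

env = simpy.Environment()
counter = simpy.Resource(env, capacity=1)  # a single server
waits = []
env.process(arrivals(env, counter, 0.8, 1.0, waits))
env.run(until=50_000)
print(sum(waits) / len(waits))   # queue wait; about rho/(mu - lambda) = 4 here

The printed average queue wait should be close to the theoretical M/M/1 value ρ/(µ − λ) = 0.8/0.2 = 4 for these rates.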
SIMSCRIPT II.5 - is the latest incarnation of SIMSCRIPT, one of the oldest
computer simulation languages. Although military contractor CACI released it in
1971, it still enjoys wide use in large-scale military and air-traffic control
simulations.
Continuous simulation languages view the model essentially as a set of differential equations.
Advanced Continuous Simulation Language (ACSL), (pronounced "axle"), is a
computer language designed for modelling and evaluating the performance of
continuous systems described by time-dependent, nonlinear differential equations. It
is a dialect of the Continuous System Simulation Language (CSSL), originally
designed by the Simulations Council Inc (SCI) in 1967 in an attempt to unify the
continuous simulations field. ACSL is an equation-oriented language consisting of a
set of arithmetic operators, standard functions, a set of special ACSL statements, and
a MACRO capability that allows extension of the special ACSL statements.
ACSL is intended to provide a simple method of representing mathematical models
on a digital computer. Working from an equation description of the problem or a
block diagram, the user writes ACSL statements to describe the system under
investigation.
An important feature of ACSL is its sorting of the continuous model equations, in contrast to general-purpose programming languages such as Fortran, where program execution depends critically on statement order. Applications of ACSL in new areas are being developed constantly.
Typical areas in which ACSL is currently applied include control system design,
aerospace simulation, chemical process dynamics, power plant dynamics, plant and
animal growth, toxicology models, vehicle handling, microprocessor controllers, and
robotics.
Dynamo - was used for the system dynamics simulations of global resource-
depletion reported in the Club of Rome's Limits to Growth. Originally designed for
batch processing on mainframe computers, it was made available on minicomputers
in the late 1970s, and became available as "micro-Dynamo" on personal computers in
the early 1980s. The language went through several revisions from DYNAMO II up
to DYNAMO IV in 1983, but has since fallen into disuse.
SLAM - Simulation Language for Alternative Modelling
VisSim – a visual block-diagram language for the simulation of dynamical systems and model-based design of embedded systems. It is developed by Visual Solutions of Westford, Massachusetts, and is widely used in control system design and digital signal processing for multi-domain simulation and design. It includes blocks for arithmetic, Boolean and transcendental functions, as well as digital filters, transfer functions, numerical integration and interactive plotting. The most commonly modelled systems are aeronautical, biological/medical, digital power, electric motor, electrical, hydraulic, mechanical, process, thermal/HVAC and econometric.
Simulink - developed by MathWorks, is a commercial tool for modelling,
simulating and analysing multi domain dynamic systems. Its primary interface is a
graphical block diagramming tool and a customizable set of block libraries. It offers
tight integration with the rest of the MATLAB environment and can either drive
MATLAB or be scripted from it. Simulink is widely used in control theory and
digital signal processing for multi domain simulation and Model-Based Design.
In-text Question 2
List the different kinds of simulation language.
Answer
Discrete-event simulation languages
Continuous simulation languages
On this basis, we can also classify simulation languages according to the type of simulation:
Next-event scheduling: SIMSCRIPT, GASP
Process operation: SIMULA, GPSS, SIMNET II, SIMAN, SLAM
iii. Find more information on the simulation languages list above or those you
may find on the net to enrich your understanding.
iv. Also, check the following web address for computer simulation software
http://en.wikipedia.org/wiki/List_of_computer_simulation_software.
v. Briefly discuss the important features of two simulation languages, one from each category.
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube via http://bit.ly/2MJcGfm , http://bit.ly/2UeWCmt ,
http://bit.ly/32cbKUw , http://bit.ly/3462WBr. Watch the video & summarize in 1
paragraph
b. View the animation on Examples of Simulation Languages and critique it in the
discussion forum.
c. Take a walk and engage any 3 students on Examples of Simulation Languages;
In 2 paragraphs summarize their opinion of the discussed topic. etc.
STUDY SESSION 2
The SIMNET II Language
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Examples of Simulation Languages
2.2 Design of SIMNET II Language
2.3 SIMNET II Nodes Statements
2.4 The Definition the four Nodes
2.5 Queue Node Examples
2.6 Facility Node Examples
2.7 Example of Auxiliary Node
2.8 Rules for the Operation of Nodes
2.9 SIMNET II Mathematical Expressions
2.10 Layout of SIMNET II Language
2.11 SIMNET OUTPUT REPORT
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. In this study session, we shall discuss the SIMNET II language. SIMNET is a network-based discrete simulation language that differs from available process languages in that it utilizes exactly four nodes: a source, for creating transactions; a queue, where waiting may take place; a facility, where service is performed; and an auxiliary, which is introduced to enhance the modelling flexibility of the language. Each node is provided with sufficient information to define the exact manner in which a transaction enters, resides in, and leaves the node.
ISES generates the animation model without any special programming effort on the
part of the user.
The language is based on network approach that utilizes three main nodes and
one auxiliary node, which include:
i. Source node - from which transactions (customers) arrive;
ii. Queue node - where waiting takes place if necessary;
iii. Facility node - where service is performed;
iv. Auxiliary node - which is added to enhance the modelling capabilities of
SIMNET II.
2.1 SIMNET II Nodes Statements
Statements in SIMNET II are specified node by node. The general format is:
Node identifier; field 1; field 2; ….; field m:
The node identifier consists of a user-defined name (12 characters maximum) followed by one of the codes *S, *Q, *F, or *A, which identify the name as a source, a queue, a facility or an auxiliary, respectively. The node identifier is then followed by a number of fields separated by semicolons, with the last field terminating with a colon. Each field carries information that is needed for the operation of the node. The order of the fields must be followed for the processor to recognize their information content. If a field is not used, or defaulted, its position is indicated with a semicolon. For example, the statement
ARRIVE *S;10;;;;LIM = 500:
identifies a source node named ARRIVE. The value 10 for the first field is the time
between successive arrivals. Fields 2, 3 and 4 assume default values (see later) while
field 5 indicates that the maximum number of creations from ARRIVE is limited to
500 transactions.
The above statement could also be written as:
ARRIVE *S; 10; /5/LIM =500:
Notice that the semicolons indicating fields 2, 3 and 4 are omitted; instead, the next field number is indicated as shown. Also, /5/ (or, more generally, /n/) can be replaced by a descriptive reserved word or a single letter, e.g. /multiple/ or /m/, /limit/ or /l/, /source/ or /s/, and /resource/ or /r/.
SIMNET II programming is in free format and is upper/lower-case insensitive. A statement can spill over onto more than one line provided each continued line is terminated by & (ampersand).
Example:
ARRIVE *S; 10; /5/LI &    ! Line 1 of ARRIVE
M = 500:                  ! Line 2 of ARRIVE
The text following the exclamation mark (!) is a comment and is ignored by the SIMNET II processor. We now define the various fields of SIMNET II's four nodes.
The general format is SNAME *S; F1; F2; F3; MULT=F4; LIM=F5; F6; F7; *T:
The graphic symbol is a source-node symbol annotated with fields F1 to F6.
We use the examples below to look at the operation of fields F1 to F5 and *T.
Example 1
Individual customers, upon arrival at a car registration facility, are assigned serial numbers that identify the order in which they will receive service. The inter-arrival time is exponential with mean 12 minutes. The first customer arrives about 10 minutes after the facility opens.
The graphic symbol is a source node named CUSTOMERS with inter-arrival time EX(12) and first arrival at time 10.
The source statement defining the solution is:
CUSTOMERS *S;EX(12);10;-1:
Example 2:
TV units arrive every 5 minutes for packaging. It is desired to keep track of the
arrival time of each transaction in attribute 2.
The following statements are equivalent.
TVS *S;5;; 2;*PACKAGING:
TVS *S;5;; 2; goto-PACKAGING:
Explanation
The creation time for the first TV unit is 0 because field F2 is defaulted. Because F3 = 2 (2 > 0), the mark attribute A(2) of successive transactions will assume the respective values 0, 5, 10, 15, ….
The transactions leaving source TVS will be transferred to a queue (buffer) node named PACKAGING, as shown by the *T field.
Note: *T always occupies the last field of the node, regardless of the number of defaulted fields that may precede it. goto- in the second statement replaces the * in the first statement.
Example 3: A mill is contracted to receive 100 truckloads of logs. Each truckload includes 50 logs. The mill processes the logs one at a time. The arrival of trucks at the mill is spaced 45 minutes apart. The graphic symbol is a source node with inter-arrival time 45 and a creation limit of 100.
2. The initial number waiting at the start of the simulation may be zero or greater than zero.
3. Transactions waiting in a queue may leave according to the selected queue discipline.
4. The queue itself may act as an accumulator, whereby a specified number of waiting transactions are replaced by one leaving transaction.
The descriptions of the various fields of a queue node are summarized in tables 4.1.1 and 4.1.2 below.
In-text Question 1
SIMNET II uses a network approach that utilizes four main nodes what are they?
Answer
Source node, Queue node, facility node and Auxiliary node
The general format is QNAME *Q; F1(SUBF1), F2(SUBF2); F3; F4;F5; *T:
The graphic symbol is a queue-node symbol annotated with fields F1(SUBF1), F2(SUBF2), F3 and F4; SUBF2 may be an attribute number (see table 4.1.1).
Table 4.1.1: Rules for Computing Exiting-Transaction Attributes in an Accumulation Queue
Rule    Description
LO(#)   Attributes of the transaction having the lowest A(#) among all accumulated transactions.
the head of the queue. Notice that field 1 is defaulted, signifying that queue JOBQ has infinite capacity.
If the values assigned to A(1) are interchanged so that A(1) = 0 represents the rush job, the queue discipline must be changed to LO(1) as follows:
JOBQ *Q;;;LO(1):
Example 2.
Units of a product are packaged four to a carton. The buffer area can hold a maximum of 75 units. Initially, the buffer holds 30 units.
Solution
Let QUNIT represent the buffer; the associated statement is:
QUNIT *Q; 75(30); 4:
The first field sets the maximum queue capacity (75) and the initial number in the system (30). The queue discipline is FIFO because field 3 is defaulted. Field 2 indicates that four product units will be converted to a single carton. By default, the attributes of the carton transaction will equal those of the LAST of the four unit transactions forming the carton.
Explanation
Field 1 is not needed in this situation because it normally deals with multiple queue inputs to the facility, which will be discussed later. In field 2, the value 15 provides the service time of the facility. Field 3 shows that the facility has one server that is initially busy. The *T field shows that the completed transaction will be terminated by using *TERM (or goto-TERM). The symbol TERM is a reserved word of SIMNET II. It is not a node but simply a code that will cause the transaction to vanish from the system.
Example 2.
A facility has three parallel servers, two of which are initially busy. The
service is exponential with mean 3 [EX(3)].
The associated statement is: SRVR *F;;EX(3);3(2):
Example 3.
A small shop has one machine and 10 waiting jobs, in addition to the job that is
currently being processed. The processing time is exponential with mean 30 minutes.
The network representing this situation is shown below, where the symbol following the facility represents TERM. The associated SIMNET II statements are:
QJOB *Q;(10):
FJOB *F;;EX(30);(1);*TERM:
Explanation
In the network above, since FJOB is initially busy, as indicated by the entry (1) in
field 3, the facility will automatically process its resident job using a sample from
EX(30) as its processing time. After the job leaves FJOB to be terminated, the
facility will automatically look back and draw a new job from QJOB. This process is
repeated until all 10 jobs are processed.
The general format is: ANAME *A;F1;F2;F3;*T:
Example: Arriving applicants take 15 minutes each to fill out a form before joining a queue WAIT. The associated statements are:
ARIV *S;EX(25):
FORM *A;15;*WAIT:
WAIT *Q:
The network is: source ARIV with inter-arrival time EX(25), followed by auxiliary FORM with duration 15, followed by queue WAIT.
Explanation
The model assumes that the forms are immediately accessible to the arriving
applicants. This is the reason for representing the process of filling out the form by
the infinite capacity auxiliary FORM. If the forms were to be completed with the
assistance of a clerk, the auxiliary FORM would have to be replaced with a single-server facility preceded by a queue.
In-text Question 2
Statements in SIMNET II are specified node by node. The general format is ____________
Answer
Node identifier; field 1; field 2; ….; field m:
service. If the queue happens to be empty, the facility will go dormant until it is revived by a newly arriving transaction.
vii. Movement of transactions into and out of the queue can only be caused by other nodes. The queue itself is not capable of initiating this movement.
viii. When facilities follow one another in tandem, or when the intervening buffers (queues) have limited capacities, a transaction completing service in one of the facilities will be blocked if its successor node is a full (finite) queue or a busy facility. The unblocking will take place automatically, in a chain effect, when the cause of blocking subsides.
2.7 SIMNET II Mathematical Expressions
Mathematical expressions are used in certain fields of nodes, such as the inter-arrival time in a source. They may also be used in arithmetic assignments and conditions.
The rules for constructing and evaluating mathematical expressions in SIMNET II
are the same as in FORTRAN. An expression may include any legitimate
combination of the following elements:
I. User-defined non-subscripted or subscripted (array) variables.
II. All familiar algebraic and trigonometric functions (see table 2.4.3).
III. SIMNET II simulation variables that define the status of the simulation during
execution (see table 2.4.4).
IV. SIMNET II random samples from probabilistic distributions (see table 2.4.5).
V. SIMNET II special functions (see table2.4.6).
Names of user-defined variables may be of any length, although only the first 12
characters are recognizable by the SIMNET II processor. The name may include
intervening blanks but must exclude the following special symbols:
:,;( ) { } + - * / = < > $ & % ? !
These symbols are used to represent specific operations in SIMNET II. The
following are typical example of SIMNET II’s non-subscripted and array variables:
nbr of machines
TIME BET ARVL
sample(1*(J+K)**2)
SCORE(sample(I+J), MAX(K, nbr of machines))
The algebraic and trigonometric functions accepted by SIMNET II are listed in table 4.1.3. The arguments of these functions may be any legitimate mathematical expressions. All the given functions have the same properties as in FORTRAN.
SIMNET II simulation variables provide access to all simulation parameters and statistics during execution. Simulation statistics are provided in the form of current, highest, lowest and average values. For example, LEN(QQ), HLEN(QQ), LLEN(QQ) and ALEN(QQ) define the current, highest, lowest and average length of a queue named QQ.
Table 4.1.4 describes the SIMNET II simulation variables that you can access during execution. You may use these variables directly within any mathematical expression. Table 4.1.5 describes the random functions available in SIMNET II. All arguments can be represented by a SIMNET II mathematical expression. The default value of the random number stream, RS, is 1.
The last element of a mathematical expression is SIMNET II special functions. These include table look-ups TL(arg1, arg2) and mathematical functions FUN(arg1) and FFUN(arg1, arg2); see table 4.1.6.
Table 4.1.3: SIMNET II Intrinsic Functions (Algebraic)
VAL/HVAL/LVAL/AVAL (variable name)   Current/highest/lowest/average VALue of a statistical variable (see later)
LEV/HLEV/LLEV/ALEV (resource name)   Current/highest/lowest/average LEVel of a resource.
simulation or the number of updates
of a resource or a variable.
RUN.LEN Length of current run
TR.PRD Length of transient period
CUR.TIME Current simulation time
OBS Current statistical observation number
NOBS Total number of observations per run
RUN Current statistical run number
NRUNS Total number of runs
AQWA (queue name) Average wait in queue for all customers including
those who do not wait
AQWP (queue name) Average wait in queue for those who must wait
AFBL (facility name) Average blockage in a facility
AFTB (Facility name) Average blockage time in a facility
AFIT (facility name) Average time facility is idle
AFBT (Facility name) Average time facility is busy
ARTU (resource name) Average resource units in transit
ARTT (resource name) Average time a resource is in transit
ARBT (resource name) Average time a resource is busy (in use)
ARIT (resource name) Average time a resource is idle
AFRQ (variable name, cell #) Absolute histogram frequency of cell # of a variable
RFRQ (variable name, cell #) Relative histogram frequency of cell # of a variable
NTERM (node name) Number of transactions terminated from a node
NDEST (node name) Number of transactions destroyed from a node.
B(arg1, arg2, RS)    Binomial sample with parameters n = arg1 and p = arg2
GA(arg1, arg2, RS)   GAmma sample with shape parameter α = arg1 and scale 1/β = arg2; if α is a positive integer, the sample is Erlang
NE(arg1, arg2, RS)   NEgative binomial sample with parameters c = arg1 and p = arg2
NO(arg1, arg2, RS)   NOrmal sample with mean = arg1 and standard deviation = arg2
WE(arg1, arg2, RS)   WEibull sample with shape parameters α = arg1 and β = arg2
Variable Definition
TL(arg1,arg2) Value of a dependent variable obtained from Table Look-up number
arg1 given the value of the independent variable is arg2. If arg1 < 0,
TL is automatically determined by linear interpolation.
2.8 Layout of SIMNET II Language
Although SIMNET II statements are free formatted, the segments of the model
must follow a specific organization as shown below:
Definitions Segment:
$VARIABLES: (definitions of statistical variables):
$STOP:
$PROJECT and $DIMENSION are mandatory statements that always occupy the
first and second statements of the model. The $PROJECT provides general
information about the model. The $DIMENSION statement allocates memory
dynamically to the model’s files (queue, facilities and the E.FILE) and user-defined
arrays. The dimension m of ENTITY is an estimate of the maximum number of
transactions that can be in the system at any time. The only restriction on the use of
$DIMENSION is that attributes be defined by the reserved array name A(.). For
example, the statement:
$DIMENSION; ENTITY (50), A (5), sample (50,3):
indicates that the maximum number of transactions during execution is estimated not
to exceed 50 and each transaction will have five attributes. The double-subscripted
array sample (50,3) is defined to have 50 rows and three columns.
The optional $ATTRIBUTES statement is used when it is desired to assign
descriptive names to the elements of the A(.) array. For example, suppose that the
$DIMENSION statement specifies the attributes array as A(5), meaning that each
transaction will have five attributes. The statement
$ATTRIBUTES; Type, ser_nbr(2),, Prod_time:
signifies the following equivalences:
A(1)=Type
A(2)=ser_nbr(1)
A(3)=ser_nbr(2)
A(5)=Prod_time.
Notice that the name of A(4) has been defaulted, which means that it has no
descriptive name.
The definitions segment of the model defines the model’s statistical variables, logic
switches and resources. All three types of statements are optional.
The logic segment includes the code that describes the simulated system using the
nodes and branches.
The control segment provides information related to how output results are gathered during execution. Finally, the initial data segment provides all the data needed to initialise the simulation.
Explanation
The $DIMENSION statement estimates that at most 30 transactions (customers)
will be in the model at any one time. If during execution this estimate is exceeded,
SIMNET II will give an error message. The $DIMENSION of ENTITY must be
increased.
The model does not use $ATTRIBUTES, $VARIABLES, $SWITCHES or $RESOURCES, as shown by the absence of these statements.
The logic of the model is represented by the statements enclosed between $BEGIN and $END. Transactions are created automatically by source ARVL by randomly sampling the inter-arrival time EX(5), with the first arrival taking place at time 0 (the default of field 2). An arriving transaction enters queue LINE if all three clerks are busy; otherwise the queue is skipped. When a transaction completes service, it is terminated; at this point, facility CLRKS looks back at queue LINE and brings in the first transaction in line (the queue discipline is FIFO by default). The control data of the model show that it will be executed for one run of length 480 minutes. The standard output of the model is shown below:
2.9 Sample Standard Output of the Model
SIMNET OUTPUT REPORT
PROJECT: Post office        RUN LENGTH = 480.00       NBR RUNS = 1
DATE: 20 January, 2009      TRANSIENT PERIOD = 0.00   OBS/RUN = 1
ANALYST: M.C OKORONKWO      TIME BASE/OBS = 480.00
FACILITIES
NBR SEVRS  MIN/MAX/LAST  AV. GROSS UTILZ  AV. BLOCKAGE  AV. BLKGE TIME  AV. IDLE TIME  BUSY TIME
3          3/3/1         1.9931           .0000         .00             8.48           16.78
TRANSACTION COUNT AT T = 480.00 OF RUN 1:
NODE      IN  OUT  RESIDING (BLOCKED)  SKIPPING (DESTROYED)  UNLINKED/LINKED
*S:ARVL   96       (0)                 0
*Q:LINE   40  40   0                   56                    0/0
*F:CLRKS  96  95   1                   0                     0/95
The transaction count given at the end of the report gives a complete history of the flow of transactions during the run. The summary can be helpful in spotting irregularities in the model. In our example, during the 480-minute run, 96 transactions were created by source ARVL, 40 of which experienced some waiting in queue LINE while the remaining 56 skipped the queue. Facility CLRKS received 96 customers and released 95, with 1 transaction remaining unprocessed at the end of the run. The remaining columns of the count are all zeros. The UNLINKED/LINKED column is used only when the model performs some file manipulations. The (BLOCKED) and (DESTROYED) columns show positive values when a facility is blocked or when transactions are destroyed; neither case applies here.
Queue LINE has an infinite capacity (shown by ***). The IN:OUT ratio of 1:1 shows that each exiting transaction corresponds to one waiting transaction. The average length of 1.30 transactions represents the average number of waiting transactions over the entire length of the run. The average waiting time of all transactions (including those that do not wait), given by AV. DELAY (ALL), is 6.51 minutes. The next column, AV. DELAY (+ve WAIT), shows the average wait for those that must wait as 15.63 minutes. Finally, the last column indicates that 58% of the transactions arriving from source ARVL skip LINE, i.e. they do not experience any waiting at all.
Facility CLRKS has 3 parallel servers. The third column indicates that an average of 1.9931 servers (out of 3) were busy throughout the run, reflecting a gross percentage utilization of (1.9931/3)*100 = 66.4%. The AV. BLOCKAGE column records the average blockage occupancy of the facility.
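The report's queue statistics can be cross-checked with Little's formula (L = λW); the few Python lines below (ours) reproduce the two delay figures from the transaction counts:

lam = 96 / 480             # arrival rate from the transaction count, per minute
avg_len = 1.30             # average queue length from the report
delay_all = avg_len / lam              # AV. DELAY (ALL)
delay_pos = delay_all * 96 / 40        # AV. DELAY (+ve WAIT): only 40 waited
print(delay_all, delay_pos)            # about 6.5 and 15.6 minutes, as reported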
Show how these transactions will be ordered in QQ in each of the following cases:
a). QQ *Q: b) QQ *Q;/d/LIFO: c) QQ *Q;/d/HI(1): d) QQ *Q;/d/LO(2):
4.0 Summary
This concludes our study of the SIMNET II language. Note that the essence of this study session was to look at simulation languages designed to expedite simulation through:
i. provision of a convenient means of describing the elements that commonly appear in simulation models;
ii. expediting changes in the design configuration of the system being simulated so that a large number of configurations can be considered easily;
iii. provision of simple operational procedures, such as introducing changes into simulation models, initializing the state of the model, altering the kind of output data to be generated, and stacking a series of simulation runs.
5.0 Self-Assessment Questions
1. What are the rules for the operation of source, queue, facility and auxiliary
nodes?
2. The definitions segment of a model in SIMNET II
defines___________________________
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube via http://bit.ly/2MJcGfm , http://bit.ly/2UeWCmt ,
http://bit.ly/32cbKUw , http://bit.ly/3462WBr. Watch the video & summarize in 1
paragraph.
b. View the animation on add/site SIMNET II Languages and critique it in the
discussion forum
c. Take a walk and engage any 3 students on SIMNET II Languages; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
vi. If a facility is preceded by a queue, the facility will automatically attempt to draw from the waiting transactions immediately upon the completion of service. If the queue happens to be empty, the facility will go dormant until it is revived by a newly arriving transaction.
vii. Movement of transactions into and out of the queue can only be caused by other nodes. The queue itself is not capable of initiating this movement.
viii. When facilities follow one another in tandem, or when the intervening buffers (queues) have limited capacities, a transaction completing service in one of the facilities will be blocked if its successor node is a full (finite) queue or a busy facility. The unblocking will take place automatically, in a chain effect, when the cause of blocking subsides.
2. $VARIABLES, $SWITCHES and $RESOURCES
STUDY SESSION 3
Stochastic Processes
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Definition of Stochastic Process
2.2 Classification of Stochastic Processes
2.3 General Stochastic Processes Concepts
2.4 Application of Stochastic Processes
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. So far, in our work in probability and statistics, we have tended to deal with one (or at most two) random variables at a time. In many situations, we want to study the interaction of "chance" with "time", e.g. the behaviour of shares in a company on the stock market, the spread of an epidemic, or the movement of a pollen grain in water (Brownian motion). To model this, we need a family of random variables (all defined on the same probability space), (X(t); t ≥ 0), where X(t) represents, for example, the value of the share at time t. (X(t); t ≥ 0) is called a (continuous-time) stochastic process or random process.
In this session, we will look at the theory of stochastic processes in an elementary manner. In probability theory, a stochastic process, or sometimes random process, is the counterpart to a deterministic process (or deterministic system). Instead of
dealing with only one possible reality of how the process might evolve over time (as is the case in the solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions. This means that even if the initial condition (or starting point) is known, there are many paths the process might follow, though some paths are more probable than others.
If the index set is a countable set, we have a discrete-time stochastic process; if it is a non-countable continuous set, we have a continuous-time stochastic process. Any realization of X is called a sample path, which can be discrete or continuous.
Although in most applications the index set is simply a set of time instants tk, for the case of technical uncertainty this is not true.
Another definition is that of Dixit & Pindyck, which states: "a stochastic process is a variable that evolves over time in a way that is at least in part random".
So, a stochastic process involves time and randomness. In most cases, a stochastic variable has both an expected-value term (drift term) and a random term (volatility term).
We can see the stochastic process forecast for a random variable X as a forecast value (E[X]) plus a forecasting error, where the error follows some probability distribution. So:
X(t) = E[X(t)] + error(t)
A popular example is the Wiener process, where the error term is built up from Wiener increments.
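This decomposition can be simulated directly. The Python sketch below is ours (parameter names are illustrative); it builds a sample path of a generalized Wiener process X(t) = drift·t + σ·W(t) from independent normal increments dW ~ N(0, dt):

import random

def wiener_path(T=1.0, steps=1000, drift=0.0, sigma=1.0, seed=None):
    rng = random.Random(seed)
    dt = T / steps
    x, path = 0.0, [0.0]
    for _ in range(steps):
        dw = rng.gauss(0.0, dt ** 0.5)   # Wiener increment, N(0, dt)
        x += drift * dt + sigma * dw     # expected part plus random part
        path.append(x)
    return path

print(wiener_path(drift=0.5, seed=42)[-1])   # one realization of X(1); E[X(1)] = 0.5

Averaging the endpoint over many seeds recovers the drift term, while the scatter around it is the volatility term.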
In-text Question 1
What is a stochastic process?
Answer
A stochastic process is a probabilistic model of a system that evolves randomly in time and space.
Figure: a classification of stochastic processes by time and state, each of which may be continuous or discrete; the weekdays' range of temperature is one example.
Second-order stationary: a stochastic process is second-order stationary if it is first-order stationary and the covariance between X(t) and X(s) is a function only of t − s. In economic time series, a process is second-order stationary when we also stabilize its variance by some kind of transformation, such as taking the square root.
Note: a stationary process is second-order stationary; however, the reverse need not hold.
In simulation output statistical analysis, we are satisfied if the output is covariance stationary.
A discrete-time process {X(n); n ≥ 0} with the property that
P{X(n+1) = j | X(n) = i, X(n−1) = i(n−1), …, X(0) = i0} = P{X(n+1) = j | X(n) = i}
for all states i0, i1, i2, …, i(n−1), i, j and all n ≥ 0 is a Markov chain.
An Itô process is a generalized Wiener process. A Wiener process is also a special case of a strong diffusion process, which is a particular class of continuous-time Markov process.
A renewal counting process counts events whose inter-event times are independent, identically distributed random variables. The Poisson process is an example of a renewal counting process. A homogeneous Poisson process has the following three properties:
i. N(0) = 0;
ii. the numbers of events in disjoint time intervals are independent (independent increments);
iii. the number of events in any interval of length t follows a Poisson distribution with mean λt.
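These properties are equivalent to saying that the gaps between events are independent exponential variables with rate λ, which gives a direct way to simulate the process (sketch ours):

import random

def poisson_arrivals(rate, horizon, seed=None):
    """Event times in [0, horizon): exponential inter-event gaps with rate lambda."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)

print(len(poisson_arrivals(2.0, 100.0, seed=3)))   # about lambda*t = 200 events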
Interestingly the combination of Poisson processes with Brownian motions is related
to Lévy process. According to Karlin & Taylor, "The general Lévy process can be
represented as a sum of a Brownian motion, a uniform translation, and a limit (an
integral) of a one-parameter family of compound Poisson processes, where all the
contributing basic processes are mutually independent".
In the sample paths of Lévy processes, the large increments or "jumps" are called
"Lévy flights".
2.3 Application of Stochastic Processes
Familiar examples of processes modelled as stochastic time series include stock
market and exchange rate fluctuations, signals such as speech, audio and video,
medical data such as a patient's EKG, EEG, blood pressure or temperature, and
random movement such as Brownian motion or random walks.
Examples of random fields include static images and random terrain (landscapes).
2.3.1 Mathematical theory
The use of the term stochastic to mean "based on the theory of probability" has been
traced back to Ladislaus Bortkiewicz, who used it in the sense of making conjectures
that the Greek term has carried since the ancient philosophers, and after the title of
"Ars Conjectandi" that Bernoulli gave to his work on probability theory. In mathematics,
specifically in probability theory, the field of stochastic processes has been a major
area of research.
For example, a stochastic matrix is a matrix with non-negative real entries that sum
to one in each row, such as the 2 × 2 transition matrix used in the Markov chain
sketch above.
2.3.2 Artificial intelligence
In artificial intelligence, stochastic programs work by using probabilistic methods to
solve problems, as in stochastic neural networks, stochastic optimization, and
genetic algorithms. A problem itself may be stochastic as well, as in planning under
uncertainty. A deterministic environment is much simpler for an agent to deal with.
2.3.3 Natural science
An example of a stochastic process in the natural world is pressure in a gas as
modelled by the Wiener process. Even though (classically speaking) each molecule
is moving in a deterministic path, the motion of a collection of them is
computationally and practically unpredictable. A large set of molecules will exhibit
stochastic characteristics, such as filling the container, exerting equal pressure,
diffusing along concentration gradients, etc. These are emergent properties of the
systems.
In-text Question 2
What are Lévy flights?
Answer
Lévy flights are the large increments or "jumps" in the sample paths of Lévy processes.
2.3.4 Physics
Physics researchers popularized the name "Monte Carlo" for the stochastic
simulation method. Perhaps the most famous early use was by Enrico Fermi in
the 1930s, when he used a random method to calculate the properties of the newly
discovered neutron. Monte Carlo methods were central to the simulations required
for the Manhattan Project, though they were severely limited by the computational
tools of the time. Therefore, it was only after electronic computers were first built
(from 1945 on) that Monte Carlo methods began to be studied in depth. In the
1950s they were used at Los Alamos for early work relating to the development of
the hydrogen bomb, and became popularized in the fields of physics, physical
chemistry, and operations research.
2.3.5 Biology
In biological systems, introducing stochastic 'noise' has been found to help improve
the signal strength of the internal feedback loops for balance and other vestibular
communication. It has been found to help diabetic and stroke patients with balance
control.
2.3.6 Medicine
Stochastic effect or "chance effect" is one classification of radiation effects that
refers to the random, statistical nature of the damage. In contrast to the deterministic
effect, severity is independent of dose. Only the probability of an effect increases
with dose. Cancer is a stochastic effect.
2.3.7 Creativity
Simonton (2003, Psych Bulletin) argues that creativity in science (of scientists) is a
constrained stochastic behaviour such that new theories in all sciences are, at least in
part, the product of stochastic processes.
2.3.8 Music
In music, stochastic elements are generated by strict mathematical processes.
Stochastic processes can be used in music to compose a fixed piece or can be
produced in performance. Stochastic music was pioneered by Iannis Xenakis, who
used probability, game theory, group theory, set theory, and Boolean algebra, and
frequently used computers to produce his scores. Earlier, John Cage and others had
composed aleatoric or indeterminate music, which is created by chance processes
but does not have the strict mathematical basis.
line screens, which are amplitude modulated, had problems until stochastic
screening became available. A stochastic (or frequency-modulated) dot pattern
creates a sharper image.
2.3.12 Business
a. Manufacturing - Manufacturing processes are assumed to be stochastic
processes. This assumption is largely valid for either continuous or batch
manufacturing processes. Testing and monitoring of the process is recorded using
a process control chart which plots a given process control parameter over time.
Typically, a dozen or many more parameters will be tracked simultaneously.
Statistical models are used to define limit lines, which define when corrective
actions must be taken to bring the process back to its intended operational
window.
b. Finance - The financial markets use stochastic models to represent the seemingly
random behaviour of assets such as stocks, commodities and interest rates.
Quantitative analysts then use these models to value options on stock prices, bond
prices, and interest rates (see Markov models). Moreover, stochastic modelling is at
the heart of the insurance industry.
4.0 Summary and Conclusion
To end our study, in the simplest possible case (discrete time), a stochastic process
amounts to a sequence of random variables known as a time series. Another basic
type of a stochastic process is a random field, whose domain is a region of space, in
other words, a random function whose arguments are drawn from a range of
continuously changing values. One approach to stochastic processes treats them as
functions of one or several deterministic arguments (inputs, in most cases regarded
as time) whose values (outputs) are random variables: non-deterministic (single)
quantities which have certain probability distributions. Random variables
corresponding to various times (or points, in the case of random fields) may be
completely different. The main requirement is that these different random quantities
all have the same type. Although the random values of a stochastic process at
different times may be independent random variables, in most commonly considered
situations they exhibit complicated statistical correlations.
In this study session, we:
a. defined Stochastic Process as a probabilistic model of a system that evolves
randomly in time and space, or as a variable that evolves over time in a way that
is at least in part random
b. classified stochastic processes as continuous or discrete in time or in the change
of the state of the system
c. discussed the following concepts in stochastic processes: 1st-, 2nd-order and
covariance stationarity, and the Lévy, Itô, point and Poisson processes
d. highlighted the application of stochastic processes in different fields such as:
Mathematics, Artificial intelligence, Natural science, Physics, Medicine,
Business, etc.
5.0 Self-Assessment Questions
1. In most cases, a stochastic variable has both an expected value term
(drift term) and a random term (volatility term). True or False?
2. A stationary process is a second order stationary and vice versa. True or
False?
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube at http://bit.ly/2Zvjdwl , http://bit.ly/2Pv3b62 ,
http://bit.ly/2MIOb2a , http://bit.ly/2NFQouD , http://bit.ly/2Ldyf4B ,
http://bit.ly/2Htf5GU , http://bit.ly/2Le8i59 , http://bit.ly/2Ucg0Ay ,
http://bit.ly/2Ubct5G. Watch the video & summarize in 1 paragraph
b. View the animation on Stochastic Processes and critique it in the discussion forum
c. Take a walk and engage any 3 students on Stochastic Processes; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
STUDY SESSION 4
Random Walks (RW)
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Definition of Random Walk
2.2 Types of Random Walks
2.3 Random walk in two dimensions
2.4 Random walk on graphs
2.5 Wiener Process
2.6 Applications of Random Walks
2.7 Probabilistic interpretation
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
I am delighted to welcome you all once more. A random walk is a special case of a
stochastic process. This means that in this study session, we are simply extending
our knowledge of stochastic processes further.
3. Illustrate different dimensions of RW with probabilities of occurrence
4. Relate RW to Wiener, Markov and Brownian processes, and
5. List the applications of RW
2.2.1 Lattice Random Walk
A popular random walk model is that of a random walk on a regular lattice (i.e. a
network or web), where at each step the walk jumps to another site according to
some probability distribution.
In a simple random walk, the walk can only jump to neighbouring sites of the lattice.
In a simple symmetric random walk on a locally finite lattice, the probabilities of the
walk jumping to any one of its neighbours are the same. The best-studied
example is the random walk on the d-dimensional integer lattice (sometimes called
the hypercubic lattice).
Figure 3.4.1: Five flips of a fair coin
What can we say about the position Sn of the walk after n steps? This is of course
random, so we cannot calculate it exactly. But we may say quite a bit about its
distribution. It is not hard to see that the expectation E(Sn) of Sn is zero. That is, the
more you flip the coin, the closer the mean of all your −1's and 1's will be to zero:
E(Sn) = E(Z1) + E(Z2) + … + E(Zn) = 0
A similar calculation, using the independence of the random variables, shows that:
E(Sn²) = Σ(i=1..n) Σ(j=1..n) E(Zi Zj) = n
This hints that E(|Sn|), the expected translation distance after n steps, should be of
the order of √n.
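These moment calculations are easy to verify by simulation; a minimal Python sketch (sample sizes assumed) follows.

import numpy as np

rng = np.random.default_rng(3)
n, trials = 1000, 5_000
Z = rng.choice([-1, 1], size=(trials, n))   # coin flips Z_j = ±1
S = Z.sum(axis=1)                           # S_n for each trial

print(S.mean())                             # close to 0 = E(S_n)
print((S ** 2).mean(), n)                   # close to n = E(S_n^2)
print(np.abs(S).mean(), np.sqrt(n))         # E|S_n| is of the order of sqrt(n)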
Suppose we draw a line of some distance from the origin of the walk. How many
times will the random walk cross the line if permitted to continue walking forever?
The following, perhaps surprising, theorem is the answer: simple random walk on ℤ
will cross every point an infinite number of times.
This result has many names: the level-crossing phenomenon, recurrence or the
gambler's ruin.
The reason for the last name is as follows: if you are a gambler with a finite amount
of money playing a fair game against a bank with an infinite amount of money, you
will surely lose. The amount of money you have will perform a random walk, and it
will almost surely, at some time, reach 0 and the game will be over.
If a and b are positive integers, then the expected number of steps until a one-
dimensional simple random walk starting at 0 first hits b or −a is ab. The
probability that this walk will hit b before −a is a / (a + b), which can be
derived from the fact that simple random walk is a martingale.
Some of the results mentioned above can be derived from properties of Pascal's
triangle. The number of different walks of n steps where each step is +1 or −1 is
clearly 2^n. For the simple random walk, each of these walks is equally likely. In
order for Sn to be equal to a number k, it is necessary and sufficient that the number
of +1 steps in the walk exceeds the number of −1 steps by k. Thus, the number of
walks which satisfy Sn = k is precisely the number of ways of choosing (n + k)/2
elements from an n-element set (for this to be non-zero, it is necessary that n + k be
an even number), which is the entry in Pascal's triangle denoted by C(n, (n + k)/2).
Therefore, the probability that Sn = k is equal to 2^(−n) · C(n, (n + k)/2).
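This formula can be checked directly against a simulation; a minimal Python sketch (n and the trial count assumed) follows.

from math import comb
import numpy as np

def walk_pmf(n, k):
    # P(S_n = k) = 2**(-n) * C(n, (n + k) // 2), zero when n + k is odd
    if (n + k) % 2 or abs(k) > n:
        return 0.0
    return comb(n, (n + k) // 2) / 2 ** n

rng = np.random.default_rng(4)
n, trials = 10, 200_000
S = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
print(walk_pmf(n, 2), np.mean(S == 2))   # exact probability vs. simulated frequency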
The central limit theorem and the law of the iterated logarithm describe important
aspects of the behaviour of simple random walk on ℤ.
2.3 Random Walk in Two Dimensions
Imagine a random walk in two dimensions with more, and smaller, steps; in the
limit, for very small steps, one obtains Brownian motion.
A Stochastic process {X(t), t>=0} is said to be a Brownian motion process if:
i. X(0) = 0;
ii. {X(t), t>=0} has stationary and independent increments
iii. For every t > 0, X(t) is normally distributed with mean 0 and variance σ²t.
Imagine now a drunkard walking randomly in an idealized city. The city is
effectively infinite and arranged in a square grid, and at every intersection, the
drunkard chooses one of the four possible routes (including the one he came from)
with equal probability. Formally, this is a random walk on the set of all points in the
plane with integer coordinates. Will the drunkard ever get back to his home from the
bar? It turns out that he will. This is the high dimensional equivalent of the level
crossing problem discussed above. The probability of returning to the origin
decreases as the number of dimensions increases. In three dimensions, the
probability decreases to roughly 34%. A derivation, along with values of p(d), is
given at http://mathworld.wolfram.com/PolyasRandomWalkConstants.html (see the
References).
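A rough Monte Carlo check of this dimension dependence is sketched below in Python (step and trial budgets assumed; truncating the walk makes each estimate a lower bound on the true return probability).

import numpy as np

def return_fraction(d, max_steps=2_000, trials=2_000, seed=5):
    # fraction of d-dimensional lattice walks returning to the origin
    # within max_steps steps
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        axes = rng.integers(d, size=max_steps)          # axis moved at each step
        signs = rng.choice([-1, 1], size=max_steps)     # direction of each step
        steps = np.zeros((max_steps, d), dtype=int)
        steps[np.arange(max_steps), axes] = signs
        pos = steps.cumsum(axis=0)                      # positions along the walk
        if np.any(~pos.any(axis=1)):                    # some position is the origin
            hits += 1
    return hits / trials

print(return_fraction(2))   # creeps towards 1 as max_steps grows (recurrent)
print(return_fraction(3))   # stays near the ~0.34 Polya value (transient)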
The trajectory of a random walk is the collection of sites it visited, considered as a
set with disregard to when the walk arrived at the point. In one dimension, the
trajectory is simply all points between the minimum height the walk achieved and
the maximum (both are, on average, on the order of √n). In higher dimensions, the
set has interesting geometric properties. In fact, one gets a discrete fractal, that is, a
set which exhibits stochastic self-similarity on large scales; on small scales one
can observe "jaggedness" resulting from the grid on which the walk is performed.
2.4 Random Walk on Graphs
Assume now that our city is no longer a perfect square grid. When our drunkard
reaches a certain junction, he picks between the various available roads with equal
probability. Thus, if the junction has seven exits the drunkard will go to each one
with probability of one seventh. This is a random walk on a graph.
Will our drunkard reach his home? It turns out that under rather mild conditions, the
answer is still yes. For example, if the lengths of all the blocks are between a and b
(where a and b are any two finite positive numbers), then the drunkard will, almost
surely, reach his home. Notice that we do not assume that the graph is planar, i.e. the
city may contain tunnels and bridges. One way to prove this result is using the
connection to electrical networks. Take a map of the city and place a one ohm
resistor on every block. Now measure the "resistance between a point and infinity".
In other words, choose some number R, take all the points in the electrical network
with distance bigger than R from our point, and wire them together. This is now a
finite electrical network and we may measure the resistance from our point to the
wired points. Take R to infinity. The limit is called the resistance between a point
and infinity. It turns out that the following is true (an elementary proof can be found
in the book by Doyle and Snell):
In-text Question 1
What is a random walk?
Answer
A random walk, sometimes denoted by RW is a mathematical formalisation of a trajectory that
consists of taking successive random steps.
Theorem: a graph is transient if and only if the resistance between a point and
infinity is finite. It is not important which point is chosen if the graph is connected.
In other words, in a transient system, one only needs to overcome a finite resistance
to get to infinity from any point. In a recurrent system, the resistance from any point
to infinity is infinite.
This characterization of recurrence and transience is very useful, and specifically it
allows us to analyse the case of a city drawn in the plane with the distances bounded.
A random walk on a graph is a very special case of a Markov chain. Unlike a general
Markov chain, random walk on a graph enjoys a property called time symmetry or
reversibility. Roughly speaking, this property, also called the principle of detailed
balance, means that the probabilities to traverse a given path in one direction or in
the other have a very simple connection between them (if the graph is regular, they
are just equal). This property has important consequences.
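A minimal Python sketch of a random walk on a graph follows (the adjacency list is an assumed toy city map).

import random

graph = {                       # junctions and the roads leaving them
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

random.seed(6)
node, path = "A", ["A"]
for _ in range(10):
    node = random.choice(graph[node])   # pick an available road uniformly
    path.append(node)
print(" -> ".join(path))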
A good reference for random walk on graphs is the online book by Aldous and Fill.
For groups see the book of Woess. If the transition kernel p(x,y) is itself random
(based on an environment ω) then the random walk is called a "random walk in
random environment". When the law of the random walk includes the randomness of
ω, the law is called the annealed law; on the other hand, if ω is seen as fixed, the law
is called a quenched law.
d. In mathematical ecology, random walks are used to describe individual animal
movements, to empirically support processes of bio-diffusion, and occasionally
to model population dynamics.
i. In polymer physics, random walk describes an ideal chain. It is the simplest
model to study polymers.
ii. In other fields of mathematics, random walk is used to calculate solutions to
Laplace's equation, to estimate the harmonic measure, and for various
constructions in analysis and combinatorics.
iii. In computer science, random walks are used to estimate the size of the Web.
At the World Wide Web Conference 2006, Bar-Yossef et al. published their
findings and algorithms for this (the paper received the best paper award for
2006).
iv. In image segmentation, random walks are used to determine the labels (i.e.,
"object" or "background") to associate with each pixel. This algorithm is
typically referred to as the random walker segmentation algorithm.
v. In brain research, random walks and reinforced random walks are used to
model cascades of neuron firing in the brain.
vi. In vision science, fixation eye movements are well described by a random
walk.
vii. In psychology, random walks explain accurately the relation between the time
needed to make a decision and the probability that a certain decision will be
made.
viii. Random walk can be used to sample from a state space that is unknown or
very large, for example to pick a random page off the internet or, for research
of working conditions, a random worker in a given country.
When this last approach is used in computer science it is known as Markov
Chain Monte Carlo, or MCMC for short (a minimal sketch appears after this
list). Often, sampling from some complicated state space also allows one to get
a probabilistic estimate of the space's size. The estimate of the permanent of a
large matrix of zeros and ones was the first major problem tackled using this
approach.
ix. In wireless networking, random walk is used to model node movement.
x. Motile bacteria engage in a biased random walk.
xi. Random walk is used to model gambling.
xii. In physics, random walks underlie the method of Fermi estimation.
xiii. During World War II a random walk was used to model the distance that an
escaped prisoner of war would travel in a given time.
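As promised above, here is a minimal Python sketch of MCMC with a random-walk proposal (a Metropolis sampler; the target density and tuning values are assumed for illustration).

import math
import random

def metropolis(target, x0=0.0, steps=50_000, scale=1.0, seed=7):
    # random-walk Metropolis: propose x + U(-scale, scale), accept with
    # probability min(1, target(y) / target(x))
    random.seed(seed)
    x, out = x0, []
    for _ in range(steps):
        y = x + random.uniform(-scale, scale)
        if random.random() < min(1.0, target(y) / target(x)):
            x = y
        out.append(x)
    return out

samples = metropolis(lambda x: math.exp(-x * x / 2))   # unnormalized N(0, 1)
print(sum(samples) / len(samples))                     # close to the target mean 0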
In-text Question 2
What is annealed law?
Answer
When the law of the random walk includes the randomness of ω, the law is called the annealed law
similarly for ±1, ±10, and ±100. To be clear, for the 0.1 edge, most of the
time there will be no scattering, because the step length will be larger than
the distance to the edge.
b. View the animation on random walks and critique it in the discussion forum
c. Take a walk and engage any 3 students on random walks; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
Pearson, K. (1905). The problem of the Random Walk. Nature. 72, 294.
Leo Grady (2006): "Random Walks for Image Segmentation", IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 28, No. 11, pp. 1768-1783.
http://en.wikipedia.org/wiki/Random_walk
Introduction to Probability Models 9th Edition by Sheldon M. Ross,
Academic press 2007.
http://mathworld.wolfram.com/PolyasRandomWalkConstants.html.
STUDY SESSION 5
Data Collection
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Methods of Data Collection
2.2 Pros and Cons of Data Collection Methods
2.3 Sampling Survey Methods
2.4 Categorization of Sampling Methods
2.4.1. Non-Probability Sampling Methods
2.4.2. Probability Sampling Methods
2.5 Experiments Method in Data Collection
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Questions
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
Welcome to another special study session. The purpose of most simulation models is
to collect data and to analyse the data, in order to gain insights into the system being
simulated. Thus we literally 'play' with data in the conduct of 'what if' scenarios to
enable conclusions and decisions to be made. To derive good conclusions from data,
we do not just use any data; care and effort are usually taken to collect usable data
from select, at times 'restricted', source(s). Therefore, before decisions are made
based on statistics from data, we need to know how the data were collected; that is,
we need to know the method(s) of data collection, and the necessary screening on the
data that has taken place.
1.0 Study Session Learning Outcomes
After studying this session, I expect you to be able to:
1. Discuss the different methods of data collection: Census, Sample
Survey, Experiments and Observations
2. Describe the various methods of determining sample size
3. Describe data coding with respect to: What, Why, uses and determination of
codes
Observational study: Like experiments, observational studies attempt to understand
cause-and-effect relationships. However, unlike experiments, the researcher is not
able to control (1) how subjects are assigned to groups and/or (2) which treatments
each group receives.
Sample statistic. A sample statistic is an estimate, based on sample data, of a
population parameter.
Consider this example. A public opinion pollster wants to know the percentage of
voters that favour a flat-rate income tax. The actual percentage of all the voters is a
population parameter. The estimate of that percentage, based on sample data, is a
sample statistic.
The quality of a sample statistic (i.e., its accuracy, precision, and representativeness)
is strongly affected by the way the sample observations are chosen; that is, by the
sampling method.
A news show, for example, may ask viewers to participate in an on-line poll. This is
a volunteer sample because the viewers, not the survey administrator, choose the
sample.
ii. Convenience sample: A convenience sample is made up of people who are easy
to reach.
Consider the following example. A pollster interviews shoppers at a local mall. If the
mall was chosen because it was a convenient site from which to solicit survey
participants and/or because it was close to the pollster's home or business, this would
be a convenience sample.
In-text Question 1
What is Population parameter?
Answer
A population parameter is the true value of a population attribute.
There are many ways to obtain a simple random sample. One way would be the
lottery method. Each of the N population members is assigned a unique number.
The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded
researcher selects n numbers. Population members having the selected numbers
are included in the sample.
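The lottery method is exactly what random-number software automates; a minimal Python sketch (population size and sample size assumed) follows.

import random

random.seed(8)
population = list(range(1, 101))        # members numbered 1..N with N = 100
sample = random.sample(population, 10)  # draw n = 10 without replacement
print(sorted(sample))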
2.4.4 Stratified sampling Method (SSM)
With stratified sampling, the population is divided into groups, based on some
characteristic. Then, within each group, a probability sample (often a simple random
sample) is selected. In stratified sampling, the groups are called strata. As an
example, suppose we conduct a national survey. We might divide the population into
groups or strata, based on geography - north, east, south, and west. Then, within each
stratum, we might randomly select survey respondents.
2.4.7 Systematic Random Sampling
With systematic random sampling, we create a list of every member of the
population. From the list, we randomly select the first sample element from the first k
elements on the population list. Thereafter, we select every kth element on the list.
This method is different from simple random sampling since every possible sample
of n elements is not equally likely.
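Here is a minimal Python sketch of systematic random sampling (list size and interval k assumed).

import random

random.seed(9)
population = list(range(1, 101))   # ordered population list, N = 100
k = 10                             # sampling interval
start = random.randrange(k)        # random element among the first k
sample = population[start::k]      # then every k-th element
print(sample)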
In-text Question 2
What are the advantages of Non-probability sampling methods?
Answer
Convenience and Cost.
                      Vitamin C
                      0 mg          250 mg        500 mg
Vitamin E    0 mg     Treatment 1   Treatment 2   Treatment 3
           400 mg     Treatment 4   Treatment 5   Treatment 6
2.5.2 Characteristics of a Well-Designed Experiment
A well-designed experiment includes design features that allow researchers to
eliminate extraneous variables as an explanation for the observed relationship
between the independent variable(s) and the dependent variable. Some of these
features are listed below.
a) Control: Control refers to steps taken to reduce the effects of extraneous
variables (i.e., variables other than the independent variable and the dependent
variable); these extraneous variables are called lurking variables. Control
involves making the experiment as similar as possible for experimental units
in each treatment condition. Three control strategies are control groups,
placebos, and blinding.
b) Control group: A control group is a baseline group that receives no treatment
or a neutral treatment. To assess treatment effects, the experimenter compares
results in the treatment group to results in the control group.
c) Placebo: Often, participants in an experiment respond differently after they
receive a treatment, even if the treatment is neutral. A neutral treatment that
has no "real" effect on the dependent variable is called a placebo, and a
participant's positive response to a placebo is called the placebo effect. To
control for the placebo effect, researchers often administer a neutral treatment
(i.e., a placebo) to the control group. The classic example is using a sugar pill
in drug research. The drug is effective only if participants who receive the
drug have better outcomes than participants who receive the sugar pill.
d) Blinding: Of course, if participants in the control group know that they are
receiving a placebo, the placebo effect will be reduced or eliminated; and the
placebo will not serve its intended control purpose. Blinding is the practice of
not telling participants whether they are receiving a placebo. In this way,
participants in the control and treatment groups experience the placebo effect
equally. Often, knowledge of which groups receive placebos is also kept from
people who administer or evaluate the experiment. This practice is called
double blinding. It prevents the experimenter from "spilling the beans" to
participants through subtle cues; and it assures that the analyst's evaluation is
not tainted by awareness of actual treatment conditions.
e) Randomization: Randomization refers to the practice of using chance
methods (random number tables, flipping a coin, etc.) to assign experimental
units to treatments (a minimal sketch follows this list). In this way, the
potential effects of lurking variables are distributed at chance levels (hopefully
roughly evenly) across treatment conditions.
f) Replication: Replication refers to the practice of assigning each treatment to
many experimental units. In general, the more experimental units in each
treatment condition, the lower the variability of the dependent measures.
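A minimal Python sketch of randomization (unit names and group sizes assumed): chance alone decides which units receive the treatment.

import random

random.seed(10)
units = [f"unit{i}" for i in range(1, 21)]   # 20 experimental units
random.shuffle(units)                        # chance ordering
treatment, control = units[:10], units[10:]  # equal-sized groups
print(treatment)
print(control)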
2.6 Confounding
Confounding occurs when the experimental controls do not allow the experimenter
to reasonably eliminate plausible alternative explanations for an observed
relationship between independent and dependent variables.
Consider this example. A drug manufacturer tests a new cold medicine with 200
participants: 100 men and 100 women. The men receive the drug, and the women
do not.
At the end of the test period, the men report fewer colds.
This experiment implements no controls at all! As a result, many variables are
confounded, and it is impossible to say whether the drug was effective. For example,
gender is confounded with drug use. Perhaps, men are less vulnerable to the
particular cold virus circulating during the experiment, and the new medicine had no
effect at all. Alternatively, perhaps the men experienced a placebo effect.
This experiment could be strengthened with a few controls. Women and men could
be randomly assigned to treatments. One treatment could receive a placebo, with
blinding.
Then, if the treatment group (i.e., the group getting the medicine) had sufficiently
fewer colds than the control group, it would be reasonable to conclude that the
medicine was effective in preventing colds.
4.0 Summary
To recap our discussion, you must bear in mind that model simulation is simply
playing with relevant data in order to learn and make decisions. The result or
accuracy of such decisions, which may be far-reaching, is subject to the quality of
the data. This high quality, and thus good decisions, is achieved through the
conscious choice, planning and execution of the data collection method.
5.0 Self-Assessment Questions
1. List the four main methods of data collection discussed in this study session.
2. What is a sample statistic?
6.0 Additional Activities (Videos, Animations & Out of Class activities) e.g.
a. Visit YouTube at http://bit.ly/2wL2MjD , http://bit.ly/2wMm7B1 ,
http://bit.ly/2zvvjuE , http://bit.ly/2X610rO , http://bit.ly/2NEsKi8. Watch the video
& summarize in 1 paragraph.
b. View the animation on data collection and critique it in the discussion forum
c. Take a walk and engage any 3 students on data collection; In 2 paragraphs
summarize their opinion of the discussed topic. etc.
8.0 References/Further Readings
Bryman, and Timothy Futing Liao, v. 1, 132-136. Thousand Oaks,
Calif.: Sage.
Grbich, Carol. “Incorporating Data from Multiple Sources.” In Qualitative
Data Analysis. (Thousand Oaks, Calif.: Sage Publications, 2007): 195-
204.
Lockyer, Sharon (2004). "Coding Qualitative Data." In The Sage Encyclopedia
of Social Science Research Methods, Edited by Michael S. Lewis- Beck,
Alan Bryman, and Timothy Futing Liao, v. 1, 137-138. Thousand Oaks,
Calif.: Sage.
Lee, Epstein and Andrew Martin. "Coding Variables." In The Encyclopedia of
Social Measurement. Ed. Kimberly Kempf-Leonard, v.1, 321-327. New
York: Elsevier Academic Press, 2005.
Shenton, Andrew K (2004). “The analysis of qualitative data in LIS research
projects: A possible approach.” Education for Information 22: 143-
162.
Strauss, A. and J. Corbin (1990). Basics of Qualitative Research: Grounded
Theory Procedures and Techniques. Newbury Park, CA: Sage. As cited in
Lockyer, S. (2004).
http://polaris.gseis.ucla.edu/jrichardson/courses/datacoding.ppt.
STUDY SESSION 6
Data Coding and Screening
Section and Subsection Headings:
Introduction
1.0 Learning Outcomes
2.0 Main Content
2.1 Data Coding
2.2 Framework for Data Coding
2.2 Why do data coding?
2.3 When to code
2.4 Steps of coding (for qualitative data)
2.5 Uses of Data Screening
2.6 When to Determine Codes
2.7 Coding Mixed Methods
2.8 Outliers in Data Analysis
3.0 Tutor Marked Assignments (Individual or Group assignments)
4.0 Study Session Summary
5.0 Self-Assessment Question
6.0 Additional Activities
7.0 Self-Assessment Question Answers
8.0 References/Further Readings
Introduction
You are welcome. We have earlier stated that simulation is about playing with data
to study the behaviour of a system. To actually benefit from such an exercise, truly
representative data must be collected and screened. Quality data comes through
careful transformation from its raw stage into appropriate system parameter
variables that can be quantitatively analysed before it can yield true results.
1.0 Study Session Learning Outcomes
After studying this session, I expect you to be able to:
i. Define and explain data coding
ii. Describe the steps involved in coding
iii. Explain code determination
iv. Describe the outlier and how to handle it
Step 2: Marking your data
After carefully reading your data at least once (Step 1), you need to mark your
data. That is, underline, highlight, or circle points that seem significant or relevant
to your professional context. In marking your data, consider:
i. Points that are mentioned consistently across your data
ii. Points that are significant anomalies, ones that are important because they
indicate special or unique circumstances, standards, or procedures
iii. Points that contradict one another
iv. Connections (among persons, sites, documents, procedures, objects, etc.)
that are made explicitly
v. Connections or traces (among persons, sites, documents, procedures, objects,
etc.) that are implied
Step 4: Inserting coded data into "data coding grid"
After coding all of your data, download the "Data Coding Grid" form. Then, cut
and paste coded examples from your data into the appropriate categories in the
grid.
As noted above, some examples will fit into more than one column of your grid.
Once you have plotted your coded data into these issue categories, you can begin
considering which issues are most significant to your professional context. This
grid should also allow you to better develop your "Contextual Analysis Plan."
In-text Question 1
What is Data Coding?
Answer
Data Coding is a systematic way used to condense extensive data sets into smaller analysable
units through the creation of categories and concepts derived from the data.
The type of statistical analysis you can use depends on the type of data you
collect, how you collect it, and how it is coded.
Coding facilitates the organization, retrieval, and interpretation of data and
leads to conclusions because of that interpretation.
In-text Question 2
Why do we code data?
Answer
1. It lets you make sense of and analyse your data.
2. For qualitative studies, it can help you generate a general theory.
3. The type of statistical analysis you can use depends on the type of data you collect, how you
collect it, and how it is coded.
4. Coding facilitates the organization, retrieval, and interpretation of data and leads to
conclusions because of that interpretation.
2.5 Uses of Data Screening
i. It is used to identify miscoded, missing, or messy data
ii. It may be used to find possible outliers, non-normal distributions and other
anomalies in the data
iii. It can improve the performance of statistical methods
iv. It can make data conform to particular analysis methods
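These uses are easy to illustrate; below is a minimal Python sketch (the data values are assumed) that screens one column for missing, miscoded and outlying entries using range checks and the 1.5 × IQR rule.

import numpy as np

ages = np.array([23, 25, 27, -1, 24, 26, 250, 25, np.nan, 55, 28], dtype=float)

missing = np.isnan(ages)                      # holes in the data
miscoded = (ages < 0) | (ages > 120)          # impossible ages
q1, q3 = np.nanpercentile(ages[~miscoded], [25, 75])
iqr = q3 - q1                                 # interquartile range
outlier = (ages < q1 - 1.5 * iqr) | (ages > q3 + 1.5 * iqr)

print("missing at:", np.where(missing)[0])
print("miscoded at:", np.where(miscoded)[0])
print("outliers at:", np.where(outlier & ~miscoded)[0])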
a. makes methods transparent by recording analytical thinking
used to devise codes
b. allows comparison with other studies
Transcription features
Appropriate for open-ended answers as in focus groups, observation, individual
interviews, etc.
Strengthens “audit trail” since reviewers can see actual data.
Uses identifiers that make participants anonymous but still reveal information to
the researcher; for example, a staff/employee/student number.
2.6.1 Advantages of Mixed Methods:
i. Improves validity of findings
ii. Provides more in-depth data
iii. Increases the capacity to cross-check one data set against another
iv. Provides detail of individual experiences behind the statistics
v. Provides a more focused questionnaire
vi. Additional in-depth interviews can be used to tease out problems and
seek solutions
i. The outlier can be accommodated into the data set through sophisticated
statistical refinements.
ii. An outlier can be incorporated by replacing it with another model.
iii. The outlier can be used to identify another important feature of the
population being analysed, which can lead to new experimentation.
iv. If no other option is suitable, the outlier will be rejected and
regarded as a "contaminant" of the data set.
4.0 Summary
This brings us to the end of our study for this course. In this study session, we saw
that we must transcribe data into an appropriate format for system analysis. The
choice of transcription method adopted depends mostly on the system under
investigation. Therefore, you must exercise care in selecting the coding level,
especially for qualitative data.
In this study session, Data Coding and Screening, we:
i. Defined data coding as a systematic way used to condense extensive
data-sets into smaller analysable units through the creation of categories
and concepts derived from the data
ii. Explained why we have to code data, listed the uses of data screening
iii. Explained the determination of codes, why and when to create a code book,
and discussed the mixed coding method, listing its advantages and
disadvantages
iv. Discussed outliers and how they can be handled in data analysis
i. Improves validity of findings
ii. Provides more in-depth data
iii. Increases the capacity to cross-check one data set against another
iv. Provides detail of individual experiences behind the statistics
v. Provides a more focused questionnaire
vi. Additional in-depth interviews can be used to tease out problems and
seek solutions
Disadvantages
i. Inequality in data sets
ii. Numerical data set treated less theoretically ("mere proving of hypothesis")
iii. Presenting both data sets can overwhelm the reader
iv. Synthesized findings might be "dumbed-down" to make results more readable
Shenton, Andrew K (2004) “The analysis of qualitative data in LIS research
projects: A possible approach.” Education for Information 22 (2004):
143-162.
Strauss, A. and J. Corbin (1990). Basics of Qualitative Research: Grounded
Theory Procedures and Techniques. Newbury Park, CA: Sage. As cited in
Lockyer, S. (2004).
http://polaris.gseis.ucla.edu/jrichardson/courses/datacoding.ppt.
http://en.wikipedia.org/wiki/Linear_congruential_generator
http://www.une.edu.au/WebStat/unit_materials/c3_collecting_data/missing_values.
html
http://www.u.arizona.edu/%7Ekimmehea/purdue/421/datacoding.htm
http://www.u.arizona.edu/~kimmehea/purdue/421/exampledatacoding.htm
Glossary
Modelling: is the process of generating abstract, conceptual, graphical and/or
mathematical models. Science offers a growing collection of methods, techniques
and theory about all kinds of specialized scientific modelling.
Model: A model in general is a pattern, plan, representation (especially in
miniature), or description designed to show the main object or workings of an
object, system, or concept.
Simulation: is the manipulation of a model in such a way that it operates on time or
space to compress it, thus enabling one to perceive the interactions that would not
otherwise be apparent because of their separation in time or space.
Computer model: A computer model is a simulation or model of a situation in the
real world or an imaginary world, which has parameters that the user can alter.
Random Number: a number in a series that shows no consistent pattern, each
number being neither affected in any way by the preceding number nor predictable
from it.
Probability function: A probability function gives the probabilities that a random
variable will take on a given list of specific values.
Probability theory: is a mathematical science that permits one to find, using the
probabilities of some random events, the probabilities of other random events
connected in some way with the first.
Entity: An entity represents some object in the real system that must be explicitly
defined.
Distribution: is the mathematical law, which governs the probabilistic features of a
random variable.
Finite element method (FEM): The finite element method (FEM) (its practical
application often known as finite element analysis (FEA)) is a numerical
technique for finding approximate solutions of partial differential equations (PDE)
as well as of integral equations.
Data modelling: is a method used to define and analyse data requirements needed
to support the business processes of an organization.
Queuing theory: is the study of how systems with limited resources distribute
those resources to elements waiting in line, and how those elements waiting in line
respond.
Census: A census is a study that obtains data from every member of a population.
In most studies, a census is not practical, because of the cost and/or time required.
Sample survey: A sample survey is a study that obtains data from a subset of a
population, in order to estimate population attributes.
Experiment: An experiment is a controlled study in which the researcher attempts
to understand cause-and-effect relationships.
Sampling method: refers to the way that observations are selected from a
population to be in the sample for a sample survey.
Population parameter. A population parameter is the true value of a population
attribute.
Sample statistic. A sample statistic is an estimate, based on sample data, of a
population parameter.
Data Coding: is a systematic way used to condense extensive data sets into
smaller analysable units through the creation of categories and concepts derived
from the data.
Outlier: An outlier is a single observation or single mean which does not conform
with the rest of the data.