
System Analysis & Data Collection

Analysis of data

Uploaded by Mutasem abadleh
Copyright © All Rights Reserved

Simulation

Final Project Workshop #1:
• Basics of System Analysis
• Data Collection
• Input Analysis

Basics
• Description of the system
• Type of the system
  • Deterministic or stochastic?
  • Open or closed?
  • Static or dynamic?
• Aim(s) of the system
• Layout of the system
• Are there sub-systems?
• Organizational chart

System elements
• Raw material
• Workers
• Machines
• Transporters
• Equipment
• Facility

Processes (Main & Sub)
• What types of processes exist in the system?
• How are the objectives realized through the processes?
• Process flow chart
http://ogs.tamu.edu/wp-content/uploads/2012/09/thesis-flowchart-final3-full.png
Flowchart examples
• Higher-level flowcharts
• Detailed flowchart

How to develop a process flow chart in MS Visio?
https://www.youtube.com/watch?v=cibALNOlGUY
http://asq.org/learn-about-quality/process-analysis-tools/overview/flowchart.html

Variables
• Raw material: type, quality, etc.
• Shift
• # Workers
• Production rate
• Scrap rate
• Demand
• Production (supply)
The variables should have a critical/noncritical impact on the SYSTEM.

Parameters
• # workers per shift
• Daily working hours
• # breaks and break times
Don’t limit your observations to the examples here.

Relationships
• Space-based
• Time-based
• Logical
• Causal
Mentioning these after the flow chart will be cool!
e.g.
• Relationship between machines
• Relationship between the facility and machines
• Relationship between managers and workers
• Relationship between transporters and machines
• Relationship between market competition level and price, profit, etc.

Metrics
• Daily/Weekly/Monthly production rate
• Min/Max/Average waiting times in queue(s)
• Min/Max/Average utilization of resources
• Average time per part
• Min/Max/Average process times
• Scrap rates

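Waiting-time and utilization metrics like these can be computed directly from an observation log. The sketch below assumes a single server and a hypothetical log of (arrival, service start, service end) times, all in the same time unit; the function name and record layout are illustrative, not part of any tool used in the course.

```python
from statistics import mean

def queue_metrics(records, horizon):
    """Waiting-time and utilization metrics for one server.

    records: list of (arrival, start, end) tuples (hypothetical log format).
    horizon: length of the observation period, in the same time unit.
    """
    waits = [start - arrival for arrival, start, _ in records]  # time spent in queue
    busy = sum(end - start for _, start, end in records)         # total service time
    return {
        "min_wait": min(waits),
        "max_wait": max(waits),
        "avg_wait": mean(waits),
        "utilization": busy / horizon,  # fraction of the period the server was busy
    }

log = [(0, 0, 2), (1, 2, 5), (6, 6, 7)]
print(queue_metrics(log, horizon=10))
```

With several servers, the same idea applies per server; utilization is then averaged over servers.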
Constants
• Facility layout
• Facility inner/outer area
• Types of raw materials, subparts, products
• Dimensions of product(s)
• Daily work time per shift (can vary or be constant)
What is unchanged in the system or around the system?

Constraints
• Work areas
• Maximum possible production rate
• Efficiency of machines
• # machines
• Daily work time
• Financial constraints
Constraints are time, space, financial, and other limits on the system’s productivity.

System analysis cont’d.
• Description of
  • Entities
  • Processes
  • Resources
  • Queues
• Problems observed in
  • the system in general
  • processes, queues, word of mouth

Example: A hospital system
• System: X hospital
• Objective(s):
  • The main objective is to provide the best treatment service to the patients, depending on their needs and emergency situation, in a timely manner.
  • For companies, the mission and vision statements can be checked.

Example: A hospital system cont’d.
• Problem(s):
  • Unnecessary waiting times in various departments
  • Long lines in the emergency dept. and registration on some days and between some hours
  • Resource utilization is questionable for some workers and needs to be examined
  • Etc.

Example: A hospital system cont’d.
• Elements of the system
  • Doctors
  • Nurses
  • Administrators
  • Assistants
  • Security officers
  • Cleaning clerks
  • Labs, surgery rooms, examination rooms, etc.
  • Departments (emergency, etc.)
  • Elevators
  • Computers
  • Desks
  • Tables
  • …

Example: A hospital system cont’d.
• Parameters
  • Number of
    • Nurses
    • Doctors
    • Cleaning clerks
    • Rooms
    • Beds

Example: A hospital system cont’d.
• Variables
  • Average # patients (per day, per month, etc.)
  • Treatment times by department
  • Percentage of patients who had surgical treatment
  • Surgery times by type of surgery
• Variables are generally time-related

Example: A hospital system cont’d.
• Constants
  • Physical location of the system
  • Physical size of the system
  • # floors
http://www.sherwin-williams.com/images/bg/bg-facility-education.jpg
http://static6.businessinsider.com/image/167a6c792089f049ba869f00/map-pin-tbi.jpg
Example: A hospital system cont’d.
• Constraints
  • Financial
    • Budget
  • Physical
    • Location
    • System capacity
  • Time
    • Working hours (if fixed)

Example: A hospital system cont’d.
• Example process flow chart
[Figure: patient flow with the steps Patient Arrives, Registration, Patient Type, Dept. 1, Dept. 2, Dept. 3, …, Dept. n, Room (In room / On feet), Type of Treatment (Quick treatment / Need surgery? Yes: Scheduling + Treatment; No: Treatment), Final Examination, Check out, Exit]

Example: A hospital system cont’d.
• Performance metrics
  • Average waiting time (entrance, dept., etc.)
  • Average queue length (entrance, dept., etc.)
  • Transfer rate of beds/rooms
  • Average resource utilization (doctors, nurses, surgery rooms, treatment rooms, etc.)
  • Average cost per patient by type of treatment
  • Average length of treatment
  • Average registration time

Example: A hospital system cont’d.
• Assumptions
  • Patients will be accepted as long as the system capacity is adequate
  • During the working hours, all resources are available for all patients
  • No extraordinary events will happen (e.g., earthquake, tsunami)
  • The emergency room is considered as a separate system to be simulated

Popular methods that include system analysis
• McKinsey 7S framework
  • An effective way of analyzing an organization.
  • The originators of the model are the consultants at McKinsey & Company.
  • 7S emphasizes that an organization can be understood in terms of a dynamic relationship among seven key elements:
    1. Strategy
    2. Structure
    3. Systems
    4. Shared values
    5. Style
    6. Staff
    7. Skills
• Service Blueprinting
  • An operational planning tool that provides guidance on how a service will be provided, specifying the physical evidence, staff actions, and support systems / infrastructure needed to deliver the service across its different channels.
Including a McKinsey 7S or Service Blueprinting analysis in the final project report earns a 10% bonus.

McKinsey 7S Components
• Strategy: the plan devised to maintain and build competitive advantage over the competition.
• Structure: the way the organization is structured and who reports to whom.
• Systems: the daily activities and procedures that staff members engage in to get the job done (System Analysis).
• Shared Values: called "superordinate goals" when the model was first developed, these are the core values of the company that are evidenced in the corporate culture and the general work ethic.
• Style: the style of leadership adopted.
• Staff: the employees and their general capabilities.
• Skills: the actual skills and competencies of the employees working for the company.

Service Blueprinting

DATA COLLECTION

Data
• Definition
  • Information in raw or unorganized form (such as alphabets, numbers, or symbols) that refer to, or represent, conditions, ideas, or objects.
  • Data is limitless and present everywhere in the universe. See also information and knowledge.
• Data for Simulation Modeling
  • Interarrival times (time between 2 arrivals)
  • Service times (service or manufacturing systems)
  • Group size (group arrivals)
  • Defective part/product rate (manufacturing systems)
  • Transfer times (transportation of parts/people, etc.)
  • Setup times (setting up the machine/system, etc.)
  • Etc.
A flowchart can help identify what type of data is needed…
http://www.businessdictionary.com/definition/data.html#ixzz3lqUV0aC4

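In practice, interarrival times are usually logged as clock times of arrivals and then differenced. A minimal sketch, assuming a hypothetical list of arrival time stamps in minutes since the observation started; note that N arrivals yield only N - 1 interarrival times, so 31 arrivals are needed for 30 interarrival times.

```python
def interarrival_times(arrival_clock):
    """Convert arrival time stamps into interarrival times (time between 2 arrivals)."""
    times = sorted(arrival_clock)
    # Pair each arrival with the next one and take the gap between them.
    return [later - earlier for earlier, later in zip(times, times[1:])]

arrivals = [0.0, 2.5, 3.0, 7.5]  # hypothetical time stamps, minutes
print(interarrival_times(arrivals))  # [2.5, 0.5, 4.5]
```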
Collecting Data
• Generally hard, expensive, frustrating, boring
  • System might not exist
  • Data available on the wrong things
    • Modelers might have to change the model according to what’s available
  • Incomplete, “dirty” data
  • Too much data (!) or no data (!)
• Sensitivity of outputs to uncertainty in the inputs
• Match model detail to quality of data
• Cost: should be budgeted in the project
• Capture variability in data: model validity
• Garbage In, Garbage Out (GIGO)
I’m afraid of GIGO, so you should be too…

Data Collection for the Final Project
• Suggested steps:
  • Observe the system
  • Talk to the people
  • Discuss with your teammates
  • Make sure to have a complete understanding of the system prior to data collection
    • A draft model will be advantageous
  • Plan your data collection
    • Show your plan and observations
    • Show your draft system model (e.g., flowchart of processes, etc.)
  • Start data collection
    • Collect as much data as possible (so as not to violate the distribution-fitting assumptions)
    • With more data, there is a better chance of reliably fitting a particular distribution
  • Conduct the input analysis with the Arena Input Analyzer
    • A lab session will be provided about this particular task

Collecting Data for the Project
• The data needs to reflect the pattern of behavior that the system analyst observes in the system
  • Multiple days → data needs to be collected on 3 different days (e.g., M, W, F)
  • Multiple times in a day → at least 3 times in a day (e.g., morning, noon, afternoon, etc.)
  • Busy days / nonbusy days (based on the interviews with the workers/managers)
• Typical data types for the final project
  • Interarrival times: at least 30 (but more will provide a more reliable understanding)
  • Service times: at least 30 (e.g., 100 service times for a typical server)
• The sample needs to be definitely larger than 30, so that
  • working with Z tables is justified
  • the inverse transformation assumptions can be met
• If the data does not fit a particular distribution
  • multiple patterns might be in the data
  • further assessments might be required

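Before running the input analysis, it can help to check a data log against these guidelines (3 days, 3 time periods per day, at least 30 observations per data type). The sketch below assumes a hypothetical record format of (day, period, value) tuples and would be run once per data type; the function name is illustrative.

```python
from collections import defaultdict

def check_collection_plan(observations, min_days=3, min_periods=3, min_count=30):
    """Check one data type's log against the collection guidelines.

    observations: list of (day, period, value) tuples, e.g. ("Mon", "morning", 2.4).
    This record format is hypothetical. Returns a list of problems (empty = OK).
    """
    periods_by_day = defaultdict(set)
    for day, period, _ in observations:
        periods_by_day[day].add(period)

    problems = []
    if len(periods_by_day) < min_days:
        problems.append(f"only {len(periods_by_day)} day(s) observed, need {min_days}")
    for day, periods in sorted(periods_by_day.items()):
        if len(periods) < min_periods:
            problems.append(f"{day}: only {len(periods)} time period(s), need {min_periods}")
    if len(observations) < min_count:
        problems.append(f"only {len(observations)} observations, need at least {min_count}")
    return problems

# 3 days x 3 periods x 4 observations = 36 observations.
log = [(day, period, 1.0)
       for day in ("Mon", "Wed", "Fri")
       for period in ("morning", "noon", "afternoon")
       for _ in range(4)]
print(check_collection_plan(log))  # []
```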
Using Data:
Alternatives and Issues
• Use data “directly” in simulation
  • Read actual observed values to drive the model inputs (interarrivals, service times, part types, …)
  • All values will be “legal” and realistic
  • But can never go outside your observed data
  • May not have enough data for long or many runs
  • Computationally slow (reading disk files)
• Or, fit a probability distribution to the data
  • “Draw” or “generate” synthetic observations from this distribution to drive the model inputs
  • We’ve done it this way so far
  • Can go beyond observed data (good and bad)
  • May not get a good “fit” to the data: validity?

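The two alternatives can be contrasted in a few lines. This is a sketch with hypothetical service-time data; the exponential distribution is assumed here only for illustration (in the project, the distribution form comes from the input analysis).

```python
import random
from statistics import mean

# Hypothetical observed service times (minutes).
observed = [4.2, 1.1, 3.7, 0.6, 2.9, 5.4, 1.8, 2.2]

def trace_driven(data):
    """Option 1: replay the observed values in order.
    Every value is 'legal', but the supply ends when the data runs out."""
    yield from data

def fitted_exponential(data, rng):
    """Option 2: fit an exponential (MLE: mean = sample mean) and draw from it.
    The supply never runs out, and values may fall outside the observed range."""
    rate = 1.0 / mean(data)
    while True:
        yield rng.expovariate(rate)

rng = random.Random(7)                          # fixed seed for reproducibility
direct = list(trace_driven(observed))           # exactly 8 values, then exhausted
gen = fitted_exponential(observed, rng)
synthetic = [next(gen) for _ in range(1000)]    # as many values as the run needs

print(f"observed max: {max(direct)}, synthetic max: {max(synthetic):.2f}")
```

The synthetic maximum typically exceeds the observed maximum, which illustrates "can go beyond observed data (good and bad)".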
Fitting Distributions via the Arena Input Analyzer (cont’d.)
• Fitting = deciding on the distribution form (exponential, gamma, empirical, etc.) and estimating its parameters
  • Several different methods (maximum likelihood, moment matching, least squares, …)
• Assess goodness of fit via hypothesis tests
  • H0: the fitted distribution adequately represents the data
  • Get the p-value for the test (small p-value = poor fit)
• Fitted “theoretical” vs. empirical distribution
• Continuous vs. discrete data and distributions
• “Best” fit from among several distributions

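Two of the estimation methods named above can be sketched by hand: maximum likelihood (closed form for the exponential and uniform) and moment matching (shown for the gamma). This is an illustration only; the Input Analyzer's own estimation procedure may differ. The gamma result is returned in the order of Arena's GAMM(Beta, Alpha), assuming Beta is the scale and Alpha the shape parameter.

```python
from statistics import mean, variance

def fit_expo_mle(data):
    """MLE of the exponential mean is the sample mean (Arena: EXPO(Mean))."""
    return mean(data)

def fit_unif_mle(data):
    """MLE of Unif(a, b) is the sample (min, max) (Arena: UNIF(Min, Max))."""
    return min(data), max(data)

def fit_gamma_moments(data):
    """Moment matching for a gamma with shape alpha and scale beta.

    mean = alpha * beta and variance = alpha * beta**2, so
    beta = variance / mean and alpha = mean**2 / variance.
    Returned as (beta, alpha), assuming Arena's GAMM(Beta, Alpha) convention.
    """
    m, v = mean(data), variance(data)
    return v / m, m * m / v

service_times = [2, 4, 6]  # toy data
print(fit_expo_mle(service_times))
print(fit_unif_mle(service_times))
print(fit_gamma_moments(service_times))
```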
Input Analysis for Simulation Modeling
Input Analysis Activities
• Input Analysis activities consist of the following stages:
Stage 1: data collection
Stage 2: data analysis & dealing with outliers
Stage 3: modeling time series data
Stage 4: goodness-of-fit testing
• Random variables with negligible variability are simplified and modeled as deterministic quantities.
• Unknown distributions are postulated to have a particular functional form that incorporates any available partial information.

Data Collection
• To illustrate data collection activities, consider modeling a painting station, where
1. jobs arrive at random and wait in the buffer until the sprayer is available
2. having been sprayed, they leave the station
3. suppose that the spray nozzle can get clogged, an event that results in a stoppage during which the nozzle is cleaned or replaced
4. suppose further that the measure of interest is the expected job delay in the buffer
• The data collection activity in this simple case would consist of the following tasks:
1. collection of job interarrival times
2. collection of painting times
3. collection of times between nozzle clogging
4. collection of nozzle cleaning/replacement times

Data Analysis
• Data Analysis deals with statistics of empirical data:
  • statistics related to moments (mean, standard deviation, coefficient of variation, etc.)
  • statistics related to distributions (histograms)
  • statistics related to temporal dependence (autocorrelations within an empirical time series, or cross-correlations among two or more distinct time series)
• For example, consider the following sample of 100 repair time observations:
12.9 27.7 13.5 13.7 22.2
20.9 26.6 29.1 22.4 10.7
30.0 27.4 18.8 25.3 15.0
17.0 21.7 13.7 15.5 23.2
11.0 27.5 22.5 27.1 25.2
10.3 18.0 11.5 14.1 24.0
10.9 27.0 24.2 25.6 22.4
21.0 21.3 23.1 15.8 13.2
22.8 25.9 22.4 13.8 16.6
10.8 10.3 15.1 19.0 27.9
20.5 19.4 10.9 24.1 10.9
22.2 25.5 17.2 10.9 15.6
14.3 29.9 17.8 19.8 17.6
13.3 24.0 29.7 18.1 28.4
28.6 26.9 20.7 22.0 16.8
19.4 27.4 22.5 28.3 27.1
18.9 11.9 13.2 10.9 22.1
16.7 28.5 19.9 18.5 16.5
12.7 18.1 15.0 21.0 25.7
19.5 11.9 22.9 23.2 18.9
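The moment statistics listed above can be computed for this sample without SPSS or Minitab. This is a sketch in Python using the standard `statistics` module, a tool choice of ours and not part of the original workshop; the values are the 100 repair times above, transcribed row by row.

```python
from statistics import mean, stdev

# The 100 repair time observations listed above, row by row.
repair_times = [
    12.9, 27.7, 13.5, 13.7, 22.2,
    20.9, 26.6, 29.1, 22.4, 10.7,
    30.0, 27.4, 18.8, 25.3, 15.0,
    17.0, 21.7, 13.7, 15.5, 23.2,
    11.0, 27.5, 22.5, 27.1, 25.2,
    10.3, 18.0, 11.5, 14.1, 24.0,
    10.9, 27.0, 24.2, 25.6, 22.4,
    21.0, 21.3, 23.1, 15.8, 13.2,
    22.8, 25.9, 22.4, 13.8, 16.6,
    10.8, 10.3, 15.1, 19.0, 27.9,
    20.5, 19.4, 10.9, 24.1, 10.9,
    22.2, 25.5, 17.2, 10.9, 15.6,
    14.3, 29.9, 17.8, 19.8, 17.6,
    13.3, 24.0, 29.7, 18.1, 28.4,
    28.6, 26.9, 20.7, 22.0, 16.8,
    19.4, 27.4, 22.5, 28.3, 27.1,
    18.9, 11.9, 13.2, 10.9, 22.1,
    16.7, 28.5, 19.9, 18.5, 16.5,
    12.7, 18.1, 15.0, 21.0, 25.7,
    19.5, 11.9, 22.9, 23.2, 18.9,
]

n = len(repair_times)
sample_mean = mean(repair_times)
sample_sd = stdev(repair_times)   # sample standard deviation (divisor n - 1)
cv = sample_sd / sample_mean      # coefficient of variation

print(f"N = {n}")
print(f"mean = {sample_mean:.2f}, std dev = {sample_sd:.2f}, CV = {cv:.3f}")
print(f"min = {min(repair_times)}, max = {max(repair_times)}")
```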

Data Analysis with SPSS
• Descriptive statistics
  • SPSS, Minitab can be used…
  • SPSS → Analyze → Descriptive Statistics → Descriptives

Data Analysis on IA Example
• Data Analysis of the repair time data produced the histogram and summary
statistics shown below

Dealing with Outliers
• What are the typical impacts of outliers on simulation modeling?
• Rules of thumb for:
  • Discrete data?
  • Continuous data?
• A detailed article:
  • http://pareonline.net/getvn.asp?v=9&n=6

Steps of Outlier Analysis for the Final Project
1. Collect the data log
2. Prepare Excel or text files that include the collected data
3. Find the lower and upper bounds (LB and UB)
   a. the 1st and 3rd quartiles for discrete data
   b. Mean ± (2.5 × standard deviation) for continuous data
4. Remove the data that lie outside the LB and UB
5. Re-prepare the data without outliers
6. Analyze descriptive statistics in SPSS, Minitab, etc.
7. Do the input analysis with the Arena Input Analyzer

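Steps 3b and 4 (continuous data) can be sketched as a small filter. This implements only the mean ± 2.5 × standard deviation rule stated above, in a single pass; the function name is illustrative, and the bounds are computed from the data including any outliers, as the steps describe.

```python
from statistics import mean, stdev

def remove_outliers_continuous(data, k=2.5):
    """Keep values inside [mean - k*sd, mean + k*sd] (single pass, k = 2.5 by default).

    Returns the kept values and the (LB, UB) bounds that were used.
    """
    m, s = mean(data), stdev(data)
    lower, upper = m - k * s, m + k * s
    kept = [x for x in data if lower <= x <= upper]
    return kept, (lower, upper)

data = [10.0] * 20 + [100.0]  # toy data with one obvious outlier
kept, (lb, ub) = remove_outliers_continuous(data)
print(len(data) - len(kept), "outlier(s) removed; bounds:", round(lb, 2), round(ub, 2))
```

Because the mean and standard deviation change once a point is removed, repeating the filter can flag further points; whether to iterate is a modeling decision.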
Modeling Time Series Data
• Independent observations are modeled as a renewal time series, namely, a sequence of iid random variables. In this case, the analyst’s task is merely to identify (fit) a “good” distribution and its parameters to the empirical data.
• Arena provides built-in facilities for fitting distributions to empirical data.
• Dependent observations are modeled as random processes with temporal dependence. In this case, the analyst’s task is to identify (fit) a “good” probability law to the empirical data. This is a far more difficult task than the previous one, and it often requires advanced mathematics.
• Arena does not provide facilities for fitting dependent random processes.
• Examples:
  • Observed sequences of arrival times to a queue are often modeled as iid exponential interarrival times (i.e., Poisson processes)
  • For an observed sequence of times to failure and the corresponding repair times, the associated uptimes may be modeled as a Poisson process, and the downtimes as a renewal process or as a dependent process (e.g., Markov process)

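Before treating observations as iid, the sample autocorrelation can serve as a quick screen for temporal dependence. The sketch below computes the lag-k autocorrelation and applies a common rough rule (|r_k| within about 2/√N is consistent with no dependence at that lag); this rule of thumb is our assumption, not a procedure from the slides, and it does not replace a proper analysis.

```python
from statistics import mean

def lag_autocorrelation(series, lag=1):
    """Sample autocorrelation of a time series at the given lag."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[i] - m) * (series[i + lag] - m) for i in range(len(series) - lag))
    return num / denom

def looks_independent(series, lag=1):
    """Rough screen: |r_lag| <= 2/sqrt(N) is consistent with no dependence at that lag."""
    return abs(lag_autocorrelation(series, lag)) <= 2 / len(series) ** 0.5

alternating = [1.0, -1.0] * 5          # strongly dependent toy series
print(lag_autocorrelation(alternating)) # negative, close to -1
print(looks_independent(alternating))   # False
```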
The Arena Input Analyzer
The Arena Input Analyzer is a tool that fits a distribution to sample data.

Distribution   Arena Name   Arena Parameters
Exponential    EXPO         Mean
Normal         NORM         Mean, StdDev
Triangular     TRIA         Min, Mode, Max
Uniform        UNIF         Min, Max
Erlang         ERLA         ExpoMean, k
Beta           BETA         Beta, Alpha
Gamma          GAMM         Beta, Alpha
Johnson        JOHN         G, D, L, X
Log Normal     LOGN         LogMean, LogStdDev
Poisson        POIS         Mean
Weibull        WEIB         Beta, Alpha
Continuous     CONT         P1, V1, …
Discrete       DISC         P1, V1, …
Table: Arena-supported distributions and their parameters

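To experiment with some of these distributions outside Arena, the Arena expressions can be mapped onto Python's `random` module. This is a sketch: the helper names (`expo`, `tria`, …) are hypothetical, only a subset of the table is covered, and the conventions assumed for GAMM and WEIB (Beta = scale, Alpha = shape) should be checked against the Arena documentation. Pay attention to parameter order, which differs between the two tools.

```python
import random

rng = random.Random(2024)  # fixed seed so runs are reproducible

def expo(mean):
    """EXPO(Mean): exponential with the given mean (expovariate takes the rate)."""
    return rng.expovariate(1.0 / mean)

def norm(mean, std_dev):
    """NORM(Mean, StdDev)."""
    return rng.gauss(mean, std_dev)

def tria(low, mode, high):
    """TRIA(Min, Mode, Max). Note that random.triangular takes (low, high, mode)."""
    return rng.triangular(low, high, mode)

def unif(low, high):
    """UNIF(Min, Max)."""
    return rng.uniform(low, high)

def erla(expo_mean, k):
    """ERLA(ExpoMean, k): sum of k independent exponentials, each with mean ExpoMean."""
    return sum(rng.expovariate(1.0 / expo_mean) for _ in range(k))

def gamm(beta, alpha):
    """GAMM(Beta, Alpha), assuming Beta = scale, Alpha = shape (mean = Alpha * Beta).
    random.gammavariate takes (shape, scale)."""
    return rng.gammavariate(alpha, beta)

def weib(beta, alpha):
    """WEIB(Beta, Alpha), assuming Beta = scale, Alpha = shape.
    random.weibullvariate takes (scale, shape)."""
    return rng.weibullvariate(beta, alpha)

# Usage: the sample mean of GAMM(3, 2) draws should be close to 2 * 3 = 6.
draws = [gamm(3.0, 2.0) for _ in range(20000)]
print(sum(draws) / len(draws))
```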
Figure: Best-fit uniform distribution for the repair time data
Figure: Best-fit beta distribution for the repair time data
Figure: Best-fit gamma distribution for a sample of lead time data
Figure: Fit All Summary for a sample of lead time data

Goodness-of-Fit Tests for Distributions
• Tests of goodness-of-fit for distributions determine the likelihood that an empirical sample is drawn from a given distribution
  • a statistical hypothesis is formulated
  • a statistic is computed from the empirical data
  • the distribution of the statistic is assumed known under the null hypothesis, allowing the computation of the probability that it exceeds the observed value
  • rejection or acceptance decisions can be taken at a given significance level, but these are subject to Type I and Type II statistical errors
• Common goodness-of-fit tests for distributions:
1. Chi-Square test
2. Kolmogorov-Smirnov test

Chi-Square Test
• The Chi-Square test compares the empirical histogram density, constructed from sample data, to a candidate theoretical density
  • assume that the empirical sample $x_1, \ldots, x_N$ is a set of $N$ iid realizations from an underlying (unknown) random variable $X$
  • this sample is used to construct an empirical histogram with $J$ cells, where cell $j$ corresponds to the interval $[l_j, r_j)$
• The estimator of the probability $p_j = \Pr\{X \in [l_j, r_j)\}$ of cell $j$ is
$$\hat{p}_j = \frac{N_j}{N}, \quad j = 1, \ldots, J$$
  • $N_j$ is the number of observations in cell $j$
  • it is commonly suggested to take $N_j > 5$ for statistical reliability

Chi-Square Test (Cont.)
• Let $F_X(x)$ be some theoretical candidate distribution of the random variable $X$ whose goodness-of-fit is to be assessed
• Compute the corresponding theoretical probabilities
$$p_j = \Pr\{X \in [l_j, r_j)\} = F_X(r_j) - F_X(l_j), \quad j = 1, \ldots, J$$
  • for continuous data we have
$$p_j = F_X(r_j) - F_X(l_j) = \int_{l_j}^{r_j} f_X(x)\,dx, \quad j = 1, \ldots, J$$
where $f_X(x)$ is the density of $X$
• The Chi-Square test statistic is then given by
$$\chi^2 = \sum_{j=1}^{J} \frac{(N_j - N p_j)^2}{N p_j}$$

Chi-Square Test Example
• As an example, consider the repair time sample data of size N = 100, given earlier, for which a histogram with J = 10 cells was constructed by the Input Analyzer
• The table below displays the elements of the Chi-Square test for the repair data

Cell   Cell interval   Number of observations   Relative frequency   Theoretical probability
j      [l_j, r_j)      N_j                      p̂_j                  p_j
1      [10,12)         13                       0.13                 0.10
2      [12,14)          9                       0.09                 0.10
3      [14,16)          8                       0.08                 0.10
4      [16,18)          9                       0.09                 0.10
5      [18,20)         12                       0.12                 0.10
6      [20,22)          8                       0.08                 0.10
7      [22,24)         13                       0.13                 0.10
8      [24,26)         10                       0.10                 0.10
9      [26,28)         10                       0.10                 0.10
10     [28,30)          8                       0.08                 0.10

Chi-Square Test Example (Cont.)
• The histogram of the repair data suggests that a uniform distribution $\mathrm{Unif}(a, b)$ is an acceptably good fit to the sample repair data
• The parameters of the uniform distribution are estimated as:
$$\hat{a} = \min\{x_i : 1 \le i \le N\} = 10.3 \approx 10, \qquad \hat{b} = \max\{x_i : 1 \le i \le N\} = 30$$
• The squared error and the Chi-Square statistic computations yield
$$e^2 = \sum_{j=1}^{10} (\hat{p}_j - p_j)^2 = (0.13 - 0.10)^2 + \cdots + (0.08 - 0.10)^2 = 0.0036$$
$$\chi^2 = \frac{(13 - 10)^2}{10} + \cdots + \frac{(8 - 10)^2}{10} = 3.6$$
• A Chi-Square table shows that for significance level $\alpha = 0.10$ and $d = 10 - 2 - 1 = 7$ degrees of freedom (10 cells, 2 estimated parameters), the critical value is $\chi^2_{0.10,\,7} = 12.0$
Since the test statistic computed above is $\chi^2 = 3.6 < 12.0$, we do not reject the null hypothesis that the uniform distribution Unif(10,30) is an acceptably good fit to the sample repair data.

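The arithmetic of this example can be checked in a few lines of Python, using the cell counts $N_j$ from the table and $p_j = 0.1$ for every cell. The critical value 12.0 is taken from the Chi-Square table quoted above; the function names are illustrative.

```python
def chi_square_statistic(observed, probs):
    """Chi-square statistic: sum over cells of (N_j - N*p_j)^2 / (N*p_j)."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def squared_error(observed, probs):
    """Sum of squared differences between relative frequencies and p_j."""
    n = sum(observed)
    return sum((o / n - p) ** 2 for o, p in zip(observed, probs))

# Cell counts N_j from the table; under Unif(10, 30) each 2-unit cell has p_j = 0.1.
observed = [13, 9, 8, 9, 12, 8, 13, 10, 10, 8]
probs = [0.1] * len(observed)

chi2 = chi_square_statistic(observed, probs)
e2 = squared_error(observed, probs)
dof = len(observed) - 2 - 1   # J cells - 2 estimated parameters (a, b) - 1
critical = 12.0               # table value quoted above for alpha = 0.10, d = 7

print(f"chi2 = {chi2:.2f}, e^2 = {e2:.4f}, d = {dof}")
print("reject H0" if chi2 > critical else "do not reject H0")
```

If SciPy is available, `scipy.stats.chi2.ppf(0.90, 7)` returns the critical value (about 12.02) instead of a table lookup.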
Kolmogorov-Smirnov Test
• The Kolmogorov-Smirnov (K-S) test compares the empirical cdf to a theoretical counterpart
  • while the Chi-Square test requires a considerable amount of data (at least to set up a reasonably “smooth” histogram), the K-S test can get away with smaller samples, since it does not require a histogram
• The K-S test procedure is as follows:
  • sort the sample $x_1, \ldots, x_N$ in ascending order as $x_{(1)}, \ldots, x_{(N)}$
  • construct the empirical cdf
$$\hat{F}_X(x) = \frac{\max\{j : x_{(j)} \le x\}}{N}$$
  • construct the K-S test statistic
$$KS = \max_x \left| \hat{F}_X(x) - F_X(x) \right|$$
The smaller the observed value of KS, the better the fit.

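The maximum in the K-S statistic only needs to be checked at the sorted sample points, on both sides of each jump of the empirical cdf. A minimal sketch, with a tiny illustrative sample tested against Unif(0, 1); the function names are ours.

```python
def ks_statistic(sample, cdf):
    """K-S statistic: the largest vertical gap between the empirical cdf and cdf(x).

    The empirical cdf jumps at each sorted point x_(i), so the maximum is found
    by checking both sides of every jump: i/N - F(x_(i)) and F(x_(i)) - (i-1)/N.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(a, b):
    """Cdf of Unif(a, b)."""
    return lambda x: min(max((x - a) / (b - a), 0.0), 1.0)

sample = [0.9, 0.1, 0.5]
print(ks_statistic(sample, uniform_cdf(0.0, 1.0)))  # about 0.2333 (= 7/30)
```

Applied to the repair data with `uniform_cdf(10, 30)`, the same function gives the KS value for the Unif(10,30) fit. Because a and b are estimated from the same data, the standard K-S critical values are conservative in that case.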
The Overall Process for: Data Collection & Input Analysis
START
1-System Analysis
2-Identify the Required Data
3-Schedule Data Collection with TEAM
4-Data Collection (Logs)
5-Data Preparation (enter into Excel or text files)
6-Raw Data Preparation
Outlier Analysis
Clean Data
Analysis with Arena Input Analyzer
Integration of Fitted Distributions into the ARENA model
Adding the new contents and updating the project report
END

Lab
• Focus: Working with the Arena Input Analyzer (IA)
• To do:
  • Analyzing 3 sample datasets with Arena IA
  • Building histograms
  • Finding fitted distributions
  • Results interpretation
• See Lab guide

References
• Starbucks McKinsey 7S model: http://www.slideshare.net/asifastral/starmc-kinsey-7-s-framework-model
• Slide 1 images:
  • http://www.xcellimark.com/sites/all/themes/xm_adaptive/images/img_analysis2.jpg
  • http://www.odmguide.com/wp-content/uploads/2013/09/Marketing-Analysis.png
  • http://aprilannfrancis.com/wp-content/uploads/2013/10/stock-vector-analysis-magnifying-glass-over-seamless-background-with-different-association-terms-vector-69601843.jpg
• Slide 6 image:
  • http://www.richrelevance.com/wp-content/uploads/2014/02/outlier.jpg
• The majority of the slides are taken from: Altiok / Melamed, Simulation Modeling and Analysis with Arena, Chapter 7
  • www.courses.vcu.edu/MATH-jrm/OPER641/Slides/Ch_07_Input.ppt

Time for Lab
