
Chapter 9 - Review Input Analysis

Chapter 9 focuses on input analysis in simulation, emphasizing the importance of data collection and the potential challenges involved. It outlines the steps for input modeling, including data collection, identifying probability distributions, estimating parameters, and applying goodness-of-fit tests. Various probability distributions and their applications in modeling real-world scenarios are discussed, highlighting the significance of selecting appropriate distributions for accurate simulations.

Uploaded by

khanh.lekim2505

4/19/2023

Chapter 9. Review Input Analysis

Nguyen VP Nguyen, Ph.D.


Department of Industrial & Systems Engineering, HCMUT
Email: [email protected]

[Figure: course roadmap. It maps the chapters (what is simulation, review of statistics and distributions, spreadsheet simulation, discrete-event simulation, simple queueing theory, system modeling and ARENA's logics, input analysis, steady-state operations analysis, process analysis) onto the steps of a simulation study: formulation, define a model, construct and verify a model, run pilots and validate the model, make runs, analyze output data.]


Data Collection

• One of the biggest tasks in solving a real problem. GIGO: garbage in, garbage out.
• Suggestions that may enhance and facilitate data collection:
▪ Plan ahead: begin with a practice or pre-observation session and watch for unusual circumstances.
▪ Analyze the data as they are being collected: check their adequacy.
▪ Combine homogeneous data sets, e.g. successive time periods, or the same time period on successive days.
▪ Be aware of data censoring: the quantity of interest is not observed in its entirety, so long process times are in danger of being left out.
▪ Check for relationships between variables, e.g. build a scatter diagram.
▪ Check for autocorrelation.
▪ Collect input data, not performance data.

Steps of input modeling

1) Collect data from the real system of interest
▪ Requires substantial time and effort
▪ Use expert opinion when sufficient data are not available
2) Identify a probability distribution to represent the input process
▪ Draw a frequency distribution or histogram
▪ Choose a family of theoretical distributions
3) Estimate the parameters of the selected distribution
4) Apply goodness-of-fit tests to evaluate the chosen distribution and its parameters
▪ Chi-square test
▪ Kolmogorov-Smirnov test
5) If the tests reject the chosen distribution, choose a new theoretical distribution and go back to step 3. If all theoretical distributions fail, either use an empirical distribution or collect the data again.


Step 1: Data Collection includes lots of difficulties

1. Nonhomogeneous interarrival time distribution: the distribution changes with the time of day, the day of the week, etc. You can't merge all these data for distribution fitting!
2. Two arrival processes might be dependent, like the demand for washing machines and dryers. You shouldn't treat them separately!
3. The start and end of service durations might not be clear. You should split the service into well-defined processes!
4. Machines may break down randomly. You should collect data for both up times and down times!

Step 2.1: Identify the Probability Distribution

• Histogram with Discrete Data

Raw Data
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Arrivals per period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1

[Figure: Histogram of Arrivals per Period (x-axis: arrivals per period, 0 to 11; y-axis: frequency)]
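As a minimal sketch, the frequency table for the discrete data can be tabulated from the raw observations with Python's standard library. The variable names (`arrivals`, `frequency`) are illustrative and not part of the original slides.

```python
from collections import Counter

# Raw data from the slide: arrivals per period, 100 observations.
arrivals = [
    10, 8, 5, 1, 6, 0, 4, 6, 2, 3,
    2, 3, 5, 9, 2, 0, 2, 4, 2, 3,
    5, 1, 8, 9, 1, 9, 3, 7, 4, 0,
    2, 6, 3, 1, 4, 5, 0, 3, 3, 2,
    2, 10, 0, 3, 6, 0, 6, 5, 7, 0,
    8, 2, 3, 7, 0, 2, 2, 1, 0, 4,
    0, 2, 4, 1, 2, 5, 1, 5, 3, 2,
    8, 6, 3, 4, 6, 11, 3, 2, 8, 0,
    2, 4, 2, 4, 1, 3, 1, 2, 1, 2,
    3, 10, 0, 7, 3, 5, 3, 7, 3, 4,
]

# With ample discrete data, each possible value gets its own histogram cell.
frequency = Counter(arrivals)

# Print a text histogram: value, count, and one star per observation.
for value in range(min(arrivals), max(arrivals) + 1):
    count = frequency.get(value, 0)
    print(f"{value:>2} {count:>3} {'*' * count}")
```

The shape of the printed bars is what guides the choice of a distribution family in the next step.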


Step 2.1: Identify the Probability Distribution

• Histogram with Continuous Data

Raw Data
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562

Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                    5
[9-12)                   1
[12-15)                  1
[15-18)                  2
[18-21)                  0
[21-24)                  1
[24-27)                  1
[27-30)                  0
[30-33)                  1
[33-36)                  1
...                     ...
[42-45)                  1
...                     ...
[57-60)                  1
...                     ...
[78-81)                  1
...                     ...
[144-147)                1

[Figure: Histogram of Component Life (x-axis: component life in days, class boundaries 3 to 36; y-axis: frequency)]

Histograms [Identifying the distribution]

• Vehicle Arrival Example: the number of vehicles arriving at an intersection between 7:00 am and 7:05 am was monitored for 100 random workdays.

Arrivals per Period   Frequency
 0                    12
 1                    10
 2                    19
 3                    17
 4                    10
 5                     8
 6                     7
 7                     5
 8                     5
 9                     3
10                     3
11                     1

[Figure: histograms of the same data with different interval sizes]

• There are ample data, so the histogram may have a cell for each possible value in the data range.


Step 2.2: Selecting the Family of Distributions [Identifying the distribution]

• A family of distributions is selected based on:
▪ The context of the input variable
▪ The shape of the histogram
• Frequently encountered distributions:
▪ Easier to analyze: exponential, normal and Poisson
▪ Harder to analyze: beta, gamma and Weibull

Step 2.2: Selecting the family of distributions

1. The purpose of preparing a histogram is to infer a known pdf or pmf.
2. This theoretical distribution is used to generate random variates, such as interarrival times and service times, during simulation runs.
3. The exponential, normal and Poisson distributions are frequently encountered and are not difficult to analyze.
4. The beta, gamma and Weibull families, in turn, provide a wide variety of shapes.


Applications of Exponential Distribution

• Used to model the time between independent events, like arrivals or breakdowns
• Inappropriate for modeling process delay times

Applications of Poisson Distribution

• Discrete distribution, used to model the number of independent events occurring per unit time, e.g. batch sizes of customers and items.
• If the time between successive events is exponential, then the number of events in a fixed time interval is Poisson.
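The link between exponential interarrival times and Poisson counts can be checked empirically. The sketch below is only an illustration under assumed values (a rate of 3.64 events per period and 20,000 simulated periods); the function name `count_events` is hypothetical. For a Poisson count the mean and the variance are both equal to the rate, so the two printed numbers should be close to each other.

```python
import random


def count_events(rate, horizon, rng):
    """Count events in [0, horizon) when interarrival times are exponential."""
    t = rng.expovariate(rate)  # time of the first event
    n = 0
    while t < horizon:
        n += 1
        t += rng.expovariate(rate)  # add the next exponential gap
    return n


rng = random.Random(42)   # fixed seed so the run is repeatable
rate = 3.64               # assumed mean number of events per period
periods = 20000           # assumed number of simulated periods

counts = [count_events(rate, 1.0, rng) for _ in range(periods)]
mean = sum(counts) / periods
var = sum((c - mean) ** 2 for c in counts) / (periods - 1)

# For Poisson counts, both statistics should be close to `rate`.
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```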


Applications of Beta Distribution:

• Often used as a rough model in the absence of data
• Represents random proportions
• Can be transformed into a scaled beta sample on [a, b]: Y = a + (b - a)X
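The scaling Y = a + (b - a)X can be sketched in a few lines. The shape parameters (2 and 5) and the interval [10, 25] below are assumed purely for illustration, and `scaled_beta` is a hypothetical helper name.

```python
import random


def scaled_beta(a, b, alpha, beta, rng):
    """Draw X ~ Beta(alpha, beta) on [0, 1] and rescale it to [a, b]."""
    x = rng.betavariate(alpha, beta)
    return a + (b - a) * x


rng = random.Random(1)
a, b = 10.0, 25.0        # assumed bounds of the scaled variable
alpha, beta = 2.0, 5.0   # assumed shape parameters

samples = [scaled_beta(a, b, alpha, beta, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)

# The mean of Beta(alpha, beta) is alpha / (alpha + beta), so the
# scaled mean is a + (b - a) * alpha / (alpha + beta).
expected_mean = a + (b - a) * alpha / (alpha + beta)
print(f"sample mean = {sample_mean:.3f}, expected = {expected_mean:.3f}")
```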


Applications of Erlang Distribution

• Used to represent the time required to complete a task that can be represented as the sum of k exponentially distributed durations.
• For large k, the Erlang distribution approaches the normal distribution.
• For k = 1, the Erlang distribution is the exponential distribution with rate = 1/β.
• Special case of the gamma distribution in which the shape parameter α equals k.
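The "sum of k exponentials" definition translates directly into a sampler. This is a sketch with assumed values (k = 3 phases, each with mean β = 2.0); `erlang_sample` is a hypothetical name. Under these assumptions the theoretical mean is kβ = 6 and the variance is kβ² = 12.

```python
import random


def erlang_sample(k, exp_mean, rng):
    """Sum k independent exponential durations, each with mean exp_mean."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    return sum(rng.expovariate(1.0 / exp_mean) for _ in range(k))


rng = random.Random(7)
k, exp_mean = 3, 2.0   # assumed number of phases and mean of each phase

samples = [erlang_sample(k, exp_mean, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)

print(f"mean = {mean:.3f} (theory {k * exp_mean})")
print(f"variance = {var:.3f} (theory {k * exp_mean ** 2})")
```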


Applications of Gamma Distribution

• Used to represent the time required to complete a task
• Same as the Erlang distribution when the shape parameter α is an integer.

Applications of Johnson Distribution

• Its flexible domain, which can be bounded or unbounded, allows it to fit many data sets.
• If δ > 0, the domain is bounded.
• If δ < 0, the domain is unbounded.

Applications of Lognormal Distribution


• Used to represent quantities that are the product of a large number of random quantities
• Used to represent task times that are skewed to the right. If X ~ LOGN(μ_l, σ_l), then ln X ~ NORM(μ, σ)
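The two parameter pairs are related by a closed-form conversion. The sketch below assumes that the LOGN parameters are the mean and standard deviation of X itself (written `mean_l` and `std_l` here), while μ and σ describe ln X; the numeric values are illustrative and `lognormal_to_normal` is a hypothetical helper name.

```python
import math
import random


def lognormal_to_normal(mean_l, std_l):
    """Convert the mean/std of a lognormal X into the mean/std of ln X."""
    sigma_sq = math.log(1.0 + (std_l / mean_l) ** 2)
    mu = math.log(mean_l) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


mean_l, std_l = 145.0, 67.9   # assumed mean and std dev of X
mu, sigma = lognormal_to_normal(mean_l, std_l)

# Sample X by exponentiating a normal variate, then check that ln X
# has a mean close to mu.
rng = random.Random(3)
samples = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
log_mean = sum(math.log(x) for x in samples) / len(samples)

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}, mean of ln X = {log_mean:.4f}")
```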

Applications of Weibull Distribution

• Widely used in reliability models to represent lifetimes.
• If a system consists of a large number of parts that fail independently, the time between successive failures can be Weibull.
• Used to model nonnegative task times that are skewed to the left.
• It reduces to the exponential distribution when its shape parameter equals 1.

Applications of Continuous Empirical Distribution

• Used to incorporate empirical data as an alternative to a theoretical distribution when there are multiple modes, significant outliers, etc.

Applications of Discrete Empirical Distribution

• Used for discrete assignments such as job type, visitation sequence or batch size

Step 3: Estimate the parameters of the selected distribution

• A theoretical distribution is specified by its parameters, which are properties of the whole population.
Ex: Let V, W, X, Y, Z be random variables; then
V ~ N(µ, σ²), where µ is the mean and σ² is the variance.
W ~ Poisson(λ), where λ is the mean.
X ~ Exponential(β), where β is the mean.
Y ~ Triangular(a, m, b), where a, m and b are the minimum, mode and maximum of the data.
Z ~ Uniform(a, b), where a and b are the minimum and maximum of the data.

• These parameters are estimated by applying point estimators to the sample data.


Step 3: Estimate the parameters of the selected distribution

• The sample mean and the sample variance are the point estimators of the population mean and the population variance.
Let $X_i$, $i = 1, 2, \ldots, n$, be iid random variables (the raw data are known); then the sample mean $\bar{X}$ and the sample variance $s^2$ are calculated as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad s^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$

Discrete Raw Data
10  8  5  1  6  0  4  6  2  3
 2  3  5  9  2  0  2  4  2  3
 5  1  8  9  1  9  3  7  4  0
 2  6  3  1  4  5  0  3  3  2
 2 10  0  3  6  0  6  5  7  0
 8  2  3  7  0  2  2  1  0  4
 0  2  4  1  2  5  1  5  3  2
 8  6  3  4  6 11  3  2  8  0
 2  4  2  4  1  3  1  2  1  2
 3 10  0  7  3  5  3  7  3  4

Continuous Raw Data
 79.919   3.081   0.062   1.961   5.845
  3.027   6.505   0.021   0.013   0.123
  6.769  59.899   1.192  34.760   5.009
 18.387   0.141  43.565  24.420   0.433
144.695   2.663  17.967   0.091   9.003
  0.941   0.878   3.148   2.157   7.579
  0.624   5.380   3.371   7.078  23.960
  0.590   1.928   0.300   0.002   0.543
  7.004  31.764   1.005   1.147   0.219
  3.217  14.382   1.008   2.336   4.562

Step 3: Estimate the parameters of the selected distribution

• If the data are discrete and have been grouped in a frequency distribution, i.e., the raw data are not known, then

$$\bar{X} = \frac{1}{n}\sum_{j=1}^{k} f_j X_j, \qquad s^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$

where k is the number of distinct values of X and $f_j$, $j = 1, 2, \ldots, k$, is the observed frequency of the value $X_j$ of X.

Arrivals per period   Frequency      Arrivals per period   Frequency
0                     12             6                     7
1                     10             7                     5
2                     19             8                     5
3                     17             9                     3
4                     10             10                    3
5                     8              11                    1
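As a worked sketch, the grouped estimators (frequency-weighted mean, and the frequency-weighted sum of squares minus n times the squared mean, divided by n - 1) can be applied to the arrivals table. The dictionary name `freq` is illustrative. The resulting mean, 3.64, is the value used later as the Poisson parameter in the chi-square example.

```python
# Frequency table from the slide: value of X -> observed frequency f_j.
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}

n = sum(freq.values())                                # total observations
sum_fx = sum(f * x for x, f in freq.items())          # sum of f_j * X_j
sum_fx2 = sum(f * x * x for x, f in freq.items())     # sum of f_j * X_j^2

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)

print(f"n = {n}, mean = {mean:.2f}, variance = {variance:.2f}")
```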


Step 3: Estimate the parameters of the selected distribution

• If the data are discrete or continuous and have been grouped in class intervals, i.e., the raw data are not known, then

$$\bar{X} \approx \frac{1}{n}\sum_{j=1}^{c} f_j m_j, \qquad s^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$

where $f_j$, $j = 1, 2, \ldots, c$, is the observed frequency of the jth class interval and $m_j$ is the midpoint of the jth interval.

Component Life (days)   Frequency
[0-3)                   23
[3-6)                   10
[6-9)                    5
[9-12)                   1
[12-15)                  1
[15-18)                  2
[18-21)                  0
[21-24)                  1
[24-27)                  1
[27-30)                  0
[30-33)                  1
[33-36)                  1
...                     ...
[42-45)                  1
...                     ...
[57-60)                  1
...                     ...
[78-81)                  1
...                     ...
[144-147)                1

Step 3: Estimate the parameters of the selected distribution

• The minimum, mode (i.e., the data value with the highest frequency) and maximum of the population are estimated from the sample data as

$$\hat{a} = \min_{1 \le i \le n} X_i, \qquad \hat{m} = X_t, \qquad \hat{b} = \max_{1 \le i \le n} X_i$$

where $X_t$ is the data value that has the highest frequency.
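These three estimators are simple to compute for discrete data. The sketch below applies them to the arrivals-per-period frequency table from the earlier slides; `triangular_estimates` is a hypothetical helper name. For continuous data the mode would first require grouping into class intervals, which this sketch does not attempt.

```python
from collections import Counter


def triangular_estimates(data):
    """Return (minimum, mode, maximum) of a discrete sample."""
    counts = Counter(data)
    mode_value, _ = counts.most_common(1)[0]  # value with highest frequency
    return min(data), mode_value, max(data)


# Rebuild the sample from the arrivals-per-period frequency table.
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
data = [x for x, f in freq.items() for _ in range(f)]

a_hat, m_hat, b_hat = triangular_estimates(data)
print(a_hat, m_hat, b_hat)
```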


Step 4: Goodness of fit test

• Goodness-of-fit tests (GFTs) provide helpful guidance for evaluating the suitability of the selected input model as a simulation input.
• GFTs check the discrepancy between the empirical and the selected theoretical distribution to decide whether the sample was drawn from that theoretical distribution.
• The role of the sample size, n:
▪ If n is small, GFTs are unlikely to reject any theoretical distribution, since the discrepancy is attributed to sampling error!
▪ If n is large, GFTs are likely to reject almost all distributions.

Step 4: Goodness of fit tests
Chi square test

• The chi-square test is valid for large sample sizes, and for both discrete and continuous distributional assumptions, when the parameters are estimated by maximum likelihood.

• Hypothesis test:
Ho: The random variable X conforms to the theoretical distribution with the estimated parameters
Ha: The random variable X does NOT conform to the theoretical distribution with the estimated parameters

We need a test statistic to either reject or fail to reject Ho. This test statistic should measure the discrepancy between the theoretical and the empirical distribution.
If this test statistic is high, then Ho is rejected.
Otherwise we fail to reject Ho, and the distribution is retained as the input model.

Step 4: Goodness of fit tests
Chi square test

Test statistic:
Arrange the n observations into a set of k class intervals or cells. The test statistic is given by

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency in the ith class interval and $E_i$ is the expected frequency in the ith class interval,

$$E_i = n p_i$$

where $p_i$ is the theoretical probability associated with the ith class, i.e., $p_i$ = P(random variable X belongs to the ith class).

Step 4: Goodness of fit tests
Chi square test

• Recommendations for the number of class intervals for continuous data

Sample Size, n    Number of Class Intervals, k
20                Do not use the chi-square test
50                5 to 10
100               10 to 20
>100              √n to n/5

▪ It is suggested that $E_i \ge 5$. If an expected frequency is smaller, that class should be combined with the adjacent classes. The corresponding $O_i$ values should be combined in the same way, and k should be reduced by one for each cell that is combined.

Step 4: Goodness of fit tests
Chi square test

• Evaluation
Let α = P(rejecting Ho when it is true) be the significance level; here α = 5%.

Under Ho, the test statistic $\chi_0^2$ approximately follows the chi-square distribution with k - s - 1 degrees of freedom, where s is the number of estimated parameters.

[Figure: chi-square density split at the critical value into a "Fail to Reject Ho" region and a "Reject Ho" region]

If the p-value of the test statistic is less than α, reject Ho (and hence the distribution); otherwise, fail to reject Ho.

Chi-square distribution table

[Table: critical values $\chi^2_{\alpha,\,k-s-1}$, indexed by the degrees of freedom (k - s - 1) and the significance level α]

Step 4: GFT - chi square test
Ex: Poisson distribution

• Consider the discrete data analyzed in Step 2.
Ho: the number of arrivals X ~ Poisson(λ = 3.64)
Ha: otherwise
λ is the mean rate of arrivals, estimated by the sample mean, 3.64.
• The following probabilities are found by using the pmf:
P(0)=0.026   P(6)=0.085
P(1)=0.096   P(7)=0.044
P(2)=0.174   P(8)=0.020
P(3)=0.211   P(9)=0.008
P(4)=0.192   P(10)=0.003
P(5)=0.140   P(≥11)=0.001

Step 4: GFT - chi square test
Ex: Poisson distribution

• Calculation of the chi-square test statistic with k - s - 1 = 7 - 1 - 1 = 5 degrees of freedom and α = 0.05.

So, Ho is rejected!
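The calculation can be sketched as follows. The grouping into k = 7 cells is an assumption: it merges cells so that every expected frequency is at least 5, which is consistent with the 7 - 1 - 1 = 5 degrees of freedom stated above. The critical value 11.07 is the chi-square table entry for α = 0.05 and 5 degrees of freedom. Because the code uses unrounded Poisson probabilities, its statistic may differ slightly from a hand calculation based on the rounded values.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


observed = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
            6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
n = sum(observed.values())
lam = 3.64  # estimated from the sample mean (s = 1 estimated parameter)

# Assumed cells: {0,1}, {2}, {3}, {4}, {5}, {6}, {7 or more}.
# A high bound of None marks the open-ended upper tail.
cells = [(0, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, None)]

chi_sq = 0.0
for low, high in cells:
    if high is None:
        p = 1.0 - sum(poisson_pmf(x, lam) for x in range(low))
        o = sum(f for x, f in observed.items() if x >= low)
    else:
        p = sum(poisson_pmf(x, lam) for x in range(low, high + 1))
        o = sum(observed.get(x, 0) for x in range(low, high + 1))
    e = n * p                      # expected frequency E_i = n * p_i
    chi_sq += (o - e) ** 2 / e

dof = len(cells) - 1 - 1           # k - s - 1
critical = 11.07                   # chi-square table: alpha = 0.05, 5 d.o.f.
reject = chi_sq > critical

print(f"chi-square = {chi_sq:.2f}, d.o.f. = {dof}, reject Ho: {reject}")
```

The two merged cells at the ends contribute most of the statistic: the data contain more periods with very few arrivals and with many arrivals than a Poisson model with this mean predicts.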

Step 4: GFT - chi square test
Ex: Arena Input Analyzer

Distribution Summary
Distribution: Normal
Expression: NORM(225, 89)
Square Error: 0.037778

Chi Square Test
Number of intervals = 12
Degrees of freedom = 9
Test Statistic = 1.22e+004
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 27009
Min Data Value = 1
Max Data Value = 1.88e+003
Sample Mean = 225
Sample Std Dev = 89

Histogram Summary
Histogram Range = 0.999 to 1.88e+003
Number of Intervals = 40

Fit All Summary
Function      Sq Error
-----------------------
Normal        0.0506
Gamma         0.0625
Beta          0.0639
Erlang        0.0673
Weibull       0.079
Lognormal     0.0926
Exponential   0.286
Triangular    0.311
Uniform       0.36

Reject the Normal distribution at the 5% significance level!

Step 4: GFT - chi square test
Ex: Arena Input Analyzer

Distribution Summary
Distribution: Lognormal
Expression: 2 + LOGN(145, 67.9)
Square Error: 0.000271

Chi Square Test
Number of intervals = 4
Degrees of freedom = 1
Test Statistic = 207
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 21547
Min Data Value = 2
Max Data Value = 6.01e+003
Sample Mean = 146
Sample Std Dev = 79.5

Histogram Summary
Histogram Range = 2 to 6.01e+003
Number of Intervals = 40

Reject the Lognormal distribution at the 5% significance level!

Step 4: GFT - chi square test
Ex: Arena Input Analyzer

Distribution Summary
Distribution: Weibull
Expression: 0.999 + WEIB(94.7, 0.928)
Square Error: 0.002688

Chi Square Test
Number of intervals = 20
Degrees of freedom = 17
Test Statistic = 838
Corresponding p-value < 0.005

Data Summary
Number of Data Points = 12418
Min Data Value = 1
Max Data Value = 1.47e+003
Sample Mean = 108
Sample Std Dev = 135

Histogram Summary
Histogram Range = 0.999 to 1.47e+003
Number of Intervals = 40

Reject the Weibull distribution at the 5% significance level!

Step 4: Goodness of fit tests
Drawbacks of Chi-square GFT

1. The chi-square test uses parameter estimates obtained from the sample, which decreases the degrees of freedom.
2. For continuous distributions, the chi-square test requires the data to be placed in class intervals; these classes are arbitrary and affect the value of the test statistic.
3. The distribution of the chi-square test statistic is known only approximately, and the power of the test (the probability of rejecting an incorrect theoretical distribution) is sometimes low.
Hence other GFTs are also needed!

Step 4: Goodness of fit tests
Kolmogorov-Smirnov test

• Useful when the sample size is small and when no parameters are estimated from the sample data.
• Compares the cdf of the theoretical distribution, $F(x)$, with the empirical cdf, $S_N(x)$, of the sample of N observations.

• Hypothesis test:
Ho: Data follow the selected pdf
Ha: Data do NOT follow the selected pdf

• Test statistic:
The largest deviation, D, between $F(x)$ and $S_N(x)$.

Step 4: Goodness of fit tests
Kolmogorov-Smirnov test

Steps of K-S Test:

1. Rank the data so that $X_{(1)} \le X_{(2)} \le \cdots \le X_{(N)}$

2. Calculate the maximum discrepancy D between F and $S_N$, where

$$F(X_{(i)}) = P(X \le X_{(i)})$$

$$S_N(X_{(i)}) = \frac{\#\text{ of sampled random variables} \le X_{(i)}}{N} = \frac{i}{N}$$

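Step 4: Goodness of fit tests
Kolmogorov-Smirnov test

• If F is discrete, $D = \max\{D^+, D^-\}$, where

$$D^+ = \max_{1 \le i \le N}\left\{S_N(X_{(i)}) - F(X_{(i)})\right\} = \max_{1 \le i \le N}\left\{\frac{i}{N} - F(X_{(i)})\right\}$$

$$D^- = \max_{1 \le i \le N}\left\{F(X_{(i)}) - S_N(X_{(i-1)})\right\} = \max_{1 \le i \le N}\left\{F(X_{(i)}) - \frac{i-1}{N}\right\}$$

• If F is continuous,

$$D = \max_{1 \le i \le N}\left|F(X_{(i)}) - S_N(X_{(i)})\right|$$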

Step 4: Goodness of fit tests
Kolmogorov-Smirnov test

3. Evaluation

If $D > D_{\alpha,N}$, then reject Ho.

If $D \le D_{\alpha,N}$, then fail to reject Ho.

Step 4: Goodness of fit tests
Example: Kolmogorov-Smirnov test

Consider the data:
0.44, 0.81, 0.14, 0.05, 0.93

Ho: Data are uniform between (0,1)
Ha: otherwise

i                        1      2      3      4      5
X_(i)                    0.05   0.14   0.44   0.81   0.93
F(X_(i)) = X_(i)         0.05   0.14   0.44   0.81   0.93
S_N(X_(i)) = i/N         0.20   0.40   0.60   0.80   1.00
i/N - F(X_(i))           0.15   0.26   0.16   -      0.07
F(X_(i)) - (i-1)/N       0.05   -      0.04   0.21   0.13

(A dash marks a negative difference, which cannot be the maximum.)

Since D = 0.26 < $D_{0.05,5}$ = 0.565, Ho is not rejected: the data are consistent with a uniform distribution on (0,1).
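The example can be reproduced with a short sketch of the K-S statistic, taking the larger of the two one-sided deviations as in the table. The function names are illustrative, and the critical value 0.565 is the one quoted on the slide for α = 0.05 and N = 5.

```python
def ks_statistic(data, cdf):
    """Return (D, D_plus, D_minus) for a sample and a theoretical cdf."""
    xs = sorted(data)                      # step 1: rank the data
    n = len(xs)
    # enumerate() is zero-based, so the rank i is index + 1.
    d_plus = max((idx + 1) / n - cdf(x) for idx, x in enumerate(xs))
    d_minus = max(cdf(x) - idx / n for idx, x in enumerate(xs))
    return max(d_plus, d_minus), d_plus, d_minus


def uniform01_cdf(x):
    """cdf of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


data = [0.44, 0.81, 0.14, 0.05, 0.93]
d, d_plus, d_minus = ks_statistic(data, uniform01_cdf)

critical = 0.565   # D_{0.05,5} as quoted on the slide
print(f"D+ = {d_plus:.2f}, D- = {d_minus:.2f}, D = {d:.2f}")
print("reject Ho" if d > critical else "fail to reject Ho")
```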
