Chap_05_Data_Collection_and_Analysis

ENGR502 lecture

Uploaded by

jtflag
Copyright © All Rights Reserved
Some errata

 The barbershop/hairstyling simulation
 Fixed some typos; look for “…V4.pdf”
 Only 3 hairdressers, 2 barbers

1
Modeling, fun issues
 Graphical library—create your own!
 Create your own graphics library and carry it around with
you on a USB drive or similar
 I’d simply save it in the same folder as any .MOD file you create.
 Tutorial link is here:
 https://www.promodel.com/onlinehelp/solutions/promodel/#How%20to%20Create%20Your%20Own%20Graphics%20Library.htm#kanchor128
 You may need to “show all content” in MS Explorer
 The spot where you save your own unique library is about 2
minutes into the video.
 Suggest you take the default graphics library, SAVE AS to a
new location, and then ADD your own stuff to it.
 That way, you have both default icons and your own icons you
added.

2
Modeling, Routing order
 The order of an entity being routed is simply the
order on the Routing table (top to bottom),
 unless there is some programmed logic to override that
order (later in semester).
 You can also have an entity change into a different
entity as a result of processing:
 A block of steel (“stock_AL” entity) can be machined;
the result sent onward is a “gear”.
 A log can be processed into one toothpick, or into 1000
toothpicks.
 A mailbag can be emptied into 30 letters and sent onward.
 Here's a nice example that lays it out, step by step:
 Defining process logic
 Summarized on next page for class…

3
Defining processing order
and entity changes
 Logic of processing should be written down
first, before modeling
 The Demo “Mfg Cost.mod” is more complex than this
(has resources, too).

4
Defining processing order
and entity changes
 Logic of processing should be written down first, before modeling
 Write it out logically:
1) When an entity called Pallet arrives at location Receive there is no
operation time or processing logic (it's just a storage location). The
resulting output is six entities called Blank that are routed to the
First available destination of either NC_301L or NC_302L.
2) When Blanks arrive at NC_301L or NC_302L, the processing time is
a normal distribution with a mean of 3 and a standard deviation
of .2 minutes. The name of the entity is now changed to Cog, and
the Cog is sent to the Degrease location (First is the default routing
rule).
3) Two Cogs are accumulated at Degrease and processed for 5
minutes. When the degrease cycle is complete, Cogs are routed to
location Inspect.
4) The inspection time is a uniform distribution with a mean of 3.2 and
a half-range of .3 minutes. Ninety-six percent of the Cogs pass
inspection and exit the system, while four percent of the Cogs fail
inspection and become Rejects.
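Before building the model, you can also prototype the written logic as plain code to sanity-check the numbers. The Python sketch below is a hypothetical illustration, not ProModel syntax. The function names are made up, and it assumes the uniform in step 4 (mean 3.2, half-range .3) spans 2.9 to 3.5 minutes.

```python
import random

# Hypothetical helpers that mirror the written logic for steps 2-4.
def machining_time(rng: random.Random) -> float:
    """Step 2: normal with mean 3 and standard deviation 0.2 minutes."""
    return rng.gauss(3.0, 0.2)

def degrease_time() -> float:
    """Step 3: fixed 5-minute cycle for a batch of two Cogs."""
    return 5.0

def inspection_time(rng: random.Random) -> float:
    """Step 4: uniform with mean 3.2 and half-range 0.3, i.e. on [2.9, 3.5]."""
    mean, half_range = 3.2, 0.3
    return rng.uniform(mean - half_range, mean + half_range)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    blanks_per_pallet = 6    # step 1: one Pallet yields six Blanks
    for i in range(blanks_per_pallet):
        print(f"Blank {i + 1}: machine {machining_time(rng):.2f} min, "
              f"inspect {inspection_time(rng):.2f} min")
    print(f"Degrease cycle: {degrease_time():.1f} min per pair of Cogs")
```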

5
Defining processing order
and entity changes
 First, create Locations*
 Then, create Entities*
 Go to the process table and connect the two,
according to your written logic
 Fill in the details afterwards
 You don’t need to put in the logic yet, either;
just get the tables set up rationally

*Technically, you don’t need graphics; add them later


6
Defining processing order
and entity changes
 I show this last one because of the Rules (far-right
column). They were set simply by selecting the
Probability option in the table.
 Either a Cog entity or a “Reject” entity is output;
both leave through a single exit.
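Outside ProModel, the same Probability rule amounts to a weighted random choice. Here is a minimal Python sketch, assuming the 96%/4% split from the written logic. The function name is hypothetical.

```python
import random
from collections import Counter

# Output options and probabilities from step 4 of the processing logic.
OUTPUTS = ["Cog", "Reject"]
WEIGHTS = [0.96, 0.04]

def route_after_inspection(rng: random.Random) -> str:
    """Return the entity name leaving Inspect, chosen by probability."""
    return rng.choices(OUTPUTS, weights=WEIGHTS, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(7)
    counts = Counter(route_after_inspection(rng) for _ in range(10_000))
    # Proportions should be close to 0.96 Cog and 0.04 Reject.
    print(counts)
```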



8
Chapter 5

Data Collection and Analysis


“You can observe a lot just by
watching.”
– Yogi Berra
Questions to be answered:
 What types of data should be gathered?
 How should data be gathered?
 What statistical background do I need?
 How should data be analyzed?
 How do you get data in the right form for
use in simulation?
 How should data be documented?

5-10
Sources of Data
 Historical records (production,
sales, scrap rates, equipment
reliability)
 System documentation (process
plans, facility layouts, work
procedures)
 Personal Observation (facility walk-
through, time studies, work
sampling)
5-11
Sources of Data (cont.)
 Comparative systems (same or
similar industries)
 Vendor claims (cycle times,
equipment reliability)
 Design estimates (process times,
move times, etc. for a new system)
 Literature (published research on
learning curves, predetermined
time studies, etc.)
5-12
Know when to quit.

5-13
Random System Variables
[Figure: examples of random system variables: cycle times, time between failures, inter-arrival times, number of rejects, boxes per pallet]

5-14
Characterizing Random
Variables
You don’t need to be a professional
statistician; all you need is a basic
knowledge of:
 descriptive statistics (describe the data)
 data analysis (looks for correlations in the
data)
 distribution fitting (determines the
appropriate probability distribution to
represent the data)

5-15
Discrete vs. Continuous
Variables
 Continuous – The variable can take on
any value within a range (e.g. Height,
Weight, Time, etc.)
 Discrete – The variable can only take
select values within a range (e.g. Gender,
Patient Class, Part Type, Counts, etc.)

5-16
Data Groupings
 Class – Category or range of values for
grouping data.
 Frequency -- The number of observations
that fall in a class.
 Frequency distribution -- A listing of all
classes along with their frequencies.
 Relative frequency -- The ratio of the
frequency of a class to the total number of
observations.
 Relative-frequency distribution -- A
listing of all classes along with their
relative frequencies.
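The definitions above map directly onto a few lines of code. This is a minimal Python sketch, assuming equal-width classes and hypothetical integer data. The function name and sample values are illustrative only.

```python
from collections import Counter

def frequency_distribution(data, class_width, start):
    """Return (lower, upper, frequency, relative_frequency) for each class.

    Classes are [start + k*w, start + (k+1)*w); data must be non-empty.
    """
    n = len(data)
    # Frequency: number of observations falling in each class index k.
    counts = Counter(int((x - start) // class_width) for x in data)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lower = start + k * class_width
        freq = counts.get(k, 0)
        # Relative frequency: class frequency / total number of observations.
        rows.append((lower, lower + class_width, freq, freq / n))
    return rows

# Hypothetical repair times in minutes.
repair_times = [2, 3, 3, 5, 6, 6, 6, 9]
for row in frequency_distribution(repair_times, class_width=3, start=0):
    print(row)
# (0, 3, 1, 0.125)
# (3, 6, 3, 0.375)
# (6, 9, 3, 0.375)
# (9, 12, 1, 0.125)
```

Printing every row gives the relative-frequency distribution. The relative frequencies always sum to 1.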
5-17
Histograms reveal the shape,
center, and spread of a
variable
 Shape refers to the shape formed by
the bars of the histogram
 Center refers to the mean of the
variable. If the histogram were an
object, it would “balance” on the mean.
(The point that splits the area in half
is the median, which coincides with the
mean when the histogram is symmetric.)
 Spread refers to how far dispersed the
data values are

5-18
Measures of Center or
“Location”
 Mean
 Median
 Mode

5-19
Calculating Sample Mean
Formula: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
That is, add up all of the data points and divide
by the number of data points.
Data (# of classes skipped): 2 8 3 4 1
Sample Mean = (2+8+3+4+1)/5 = 3.6
Do not round! “Mean” need not be a whole number.
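The same calculation in Python, using the example data above. The standard-library `statistics.mean` gives the same answer as summing and dividing.

```python
from statistics import mean

# Number of classes skipped, from the example above.
skipped = [2, 8, 3, 4, 1]

# Add up all of the data points and divide by the number of data points.
manual_mean = sum(skipped) / len(skipped)

print(manual_mean)    # 3.6
print(mean(skipped))  # 3.6
```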
5-20
Median
 Another name for 50th percentile.
 Appropriate for describing measurement
data.
 “Robust to outliers,” that is, not affected
much by unusual values.

5-21
Mode
 The value that occurs most frequently.
 One data set can have many modes.
 Appropriate for all types of data, but most
useful for categorical data or discrete data
with only a small number of possible values.
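A short Python sketch of the median and mode, reusing the sample-mean data. It replaces one value with an outlier to show that the median barely moves. The part-type list is hypothetical and illustrates a data set with more than one mode.

```python
from statistics import mean, median, multimode

skipped = [2, 8, 3, 4, 1]          # data from the sample-mean example
print(median(skipped))             # 3, the middle of 1 2 3 4 8

# Robust to outliers: change 8 to 80 and the median does not move,
# while the mean jumps from 3.6 to 18.
with_outlier = [2, 80, 3, 4, 1]
print(median(with_outlier), mean(with_outlier))   # 3 18

# Hypothetical categorical data: part types seen at a station.
part_types = ["A", "B", "A", "C", "B"]
print(multimode(part_types))       # ['A', 'B'] (two modes)
```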

5-22
Histograms can be unimodal,
multimodal or uniform

5-23
A histogram can show you if
there are outliers in the data.
5-24
Measures of Spread
 Range
 Variance
 Standard Deviation
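For reference, the range is max − min. The sample variance is $s^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2/(n-1)$, and the standard deviation is $s = \sqrt{s^2}$. Here is a minimal Python sketch using the same example data. `statistics.variance` and `statistics.stdev` use the n − 1 divisor, while `pvariance` and `pstdev` are the population versions.

```python
from statistics import stdev, variance

skipped = [2, 8, 3, 4, 1]

data_range = max(skipped) - min(skipped)   # 8 - 1 = 7
s2 = variance(skipped)                     # sample variance, divides by n - 1
s = stdev(skipped)                         # square root of the sample variance

print(data_range, s2, round(s, 4))         # 7 7.3 2.7019
```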

5-25
How and why should data
be analyzed?
 Data analysis ensures that your data is
meaningful and useful.
 Types of analysis include:
 Test for independence (randomness).
 Test for homogeneity (same source).
 Test for stationarity (not varying over time).

5-26
Testing for Independence
(Randomness)
 Scatter Plot
 Autocorrelation Plot
 Runs Test

These tests can be run using
Stat::Fit, which can be run “Stand-alone”
or from the ProModel Tools menu.
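To show what a runs test checks, here is a minimal Python sketch of one common variant: runs above and below the median, with a normal approximation. This is an illustration only. Stat::Fit offers its own runs-test options, which may be computed differently, and the function name is hypothetical.

```python
import math
from statistics import median

def runs_test_median(data):
    """Runs test for randomness using runs above/below the median.

    Values equal to the median are dropped. Returns (runs, z), where z is the
    normal-approximation statistic; |z| > 1.96 suggests non-randomness at 5%.
    """
    med = median(data)
    above = [x > med for x in data if x != med]
    n1 = sum(above)              # observations above the median
    n2 = len(above) - n1         # observations below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need values both above and below the median")
    # A new run starts every time the above/below label changes.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(var)
    return runs, z

trend = [1, 2, 3, 4, 5, 6, 7, 8]          # steadily increasing, not random
print(runs_test_median(trend))            # 2 runs, z about -2.29
```

Too few runs (a trend) or too many runs (alternation) both produce a large |z|. The normal approximation is only reasonable for moderately large samples.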
5-27
Stat::Fit
 Stat::Fit is the data analysis and
distribution fitting software package
bundled with PROMODEL products.
 You can enter or import a dataset into
Stat::Fit for distribution fitting.
 You can copy and paste the fitted
distribution parameters into a ProModel
model.

5-28
Entering Data in Stat::Fit
 Type values.
 Open a .dat file
(File > Open).
 Copy and paste
from spreadsheet.

5-29
Scatter Plot
 Tests for Independence.
 Plots successive pairs of data as x,y values
(n-1 points).
 Random scatter of points indicates
independence.
 If data are correlated, the points will fall
along a line or curve.
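The n − 1 points in the scatter plot are just each value paired with the next one. Here is a minimal Python sketch with hypothetical inspection times. Any plotting tool can then draw these pairs as x,y points.

```python
def successive_pairs(data):
    """Return the n - 1 (x_i, x_{i+1}) points used in the scatter plot."""
    return list(zip(data, data[1:]))

times = [3.1, 2.9, 3.4, 3.0, 3.2]   # hypothetical inspection times
print(successive_pairs(times))
# [(3.1, 2.9), (2.9, 3.4), (3.4, 3.0), (3.0, 3.2)]
```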

5-30
Scatter Plot for 100 Inspection
Times

5-31
Scatter Plot for 100
Temperatures

5-32
Autocorrelation Plot
 Another test for independence.
 Independence is ascertained by computing
autocorrelations for data values at varying
time lags.
 If independent, such autocorrelations
should be near zero for any and all time-
lag separations.
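One common estimator of the lag-k autocorrelation is $r_k = \sum_{i=1}^{n-k}(x_i - \bar{x})(x_{i+k} - \bar{x}) \big/ \sum_{i=1}^{n}(x_i - \bar{x})^2$. The Python sketch below implements that formula. Stat::Fit may normalize slightly differently. A common rule of thumb treats values within roughly ±2/√n as "near zero."

```python
def autocorrelation(data, lag):
    """Sample autocorrelation of data at the given lag (1 <= lag < n)."""
    n = len(data)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and n - 1")
    m = sum(data) / n
    denom = sum((x - m) ** 2 for x in data)   # zero if all values are equal
    num = sum((data[i] - m) * (data[i + lag] - m) for i in range(n - lag))
    return num / denom

# Strongly dependent data: values alternate, so lag 1 is very negative.
alternating = [1, -1] * 5
print([round(autocorrelation(alternating, k), 2) for k in (1, 2, 3)])
# [-0.9, 0.8, -0.7]
```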

5-33
Autocorrelation Plot for
Inspection Times

5-34
Autocorrelation Plot for
Temperatures

5-35
Data that tends to be non-homogeneous
 Activity times that take longer or
shorter depending on the type of entity
being processed.
 Inter-arrival times that fluctuate in
length depending on the time of day or
day of the week.
 Time between failures and time to
repair where the failure may result from
a number of different causes.

5-36
Testing for Identically Distributed
(Homogeneous) Data
[Figure: frequency of occurrence vs. repair time, with separate peaks for part jams and mechanical failures]
Bimodal Distribution of Downtimes Indicating Multiple Causes
5-37
Nonstationary (time-variant)
Data

 Behavior that changes over time

Examples:
 Customer arrivals
 Equipment reliability

5-38
Non-stationary Data
[Figure: rate of arrival vs. time of day, 10:00 a.m. to 6:00 p.m.]
Change in Rate of Customer Arrivals Between 10 a.m. and 6 p.m.


5-39
Three ways to represent
data
 Use actual data -- e.g. read from text file
 Use a frequency table -- called an
empirical or user-defined distribution
 Use a standard distribution -- best guess or
Stat::Fit
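You sample a frequency table (empirical or user-defined distribution) by picking each value with probability proportional to its observed frequency. Here is a minimal Python sketch using a hypothetical table of boxes per pallet. This is not ProModel's own user-distribution syntax.

```python
import random
from collections import Counter

# Hypothetical frequency table: value -> number of times it was observed.
boxes_per_pallet = {10: 12, 12: 30, 14: 45, 16: 13}

def sample_empirical(table, rng, k=1):
    """Draw k values, each with probability frequency / total frequency."""
    values = list(table)
    weights = list(table.values())
    return rng.choices(values, weights=weights, k=k)

rng = random.Random(3)
draws = sample_empirical(boxes_per_pallet, rng, k=10_000)
print(Counter(draws))   # counts roughly in the ratio 12 : 30 : 45 : 13
```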

5-40
Probability Distributions
 A Probability Distribution defines all
possible values of a system variable
plotted against their respective
probabilities.
 Distributions can be either discrete
(probability mass function) or continuous
(probability density function).

5-41
Bernoulli

 The output of a process is either defective
or non-defective
 An employee shows up for work or not
 An operation is required or not

5-42
Binomial
[Figure: binomial probability mass function f(x) for x = 0 to 6]
 The number of defective items in a batch.
 The number of customers of a particular
type that enter the system.
 The number of employees out of a group
of employees who call in sick on a given
day.
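The binomial probability of exactly x "successes" (e.g. defective items) in n independent trials, each with probability p, is $f(x) = \binom{n}{x}p^x(1-p)^{n-x}$. Here is a minimal Python sketch. The batch size of 6 and defect probability of 0.2 are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: batch of 6 items, each defective with probability 0.2.
probs = [binomial_pmf(x, 6, 0.2) for x in range(7)]
print([round(q, 3) for q in probs])
# [0.262, 0.393, 0.246, 0.082, 0.015, 0.002, 0.0]
```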
5-43
Poisson
[Figure: Poisson probability mass function f(x) for x = 0 to 10]
 The number of entities arriving each hour.
 The number of defects per item.
 The number of times a resource is
interrupted each hour.
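For a Poisson count with mean λ per interval, $f(x) = e^{-\lambda}\lambda^x / x!$. Here is a minimal Python sketch, assuming a hypothetical mean of 4 arrivals per hour.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), e.g. arrivals per hour with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Hypothetical example: on average 4 entities arrive each hour.
for x in range(0, 11):
    print(x, round(poisson_pmf(x, 4.0), 4))
```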

5-44
Geometric
[Figure: geometric probability mass function f(x) for x up to 7]
 The number of machine cycles before a
failure occurs.
 The number of items inspected before a
defective item is found.
 The number of customers processed
before a particular type is encountered.
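The sketch below assumes the convention "number of trials up to and including the first success," $f(x) = (1-p)^{x-1}p$ for x = 1, 2, …. Some texts count the failures before the first success instead, which shifts everything by one. It also simulates inspections directly. The defect probability of 0.3 is hypothetical.

```python
import random

def geometric_pmf(x, p):
    """P(X = x), X = number of trials up to and including the first success."""
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

def sample_geometric(p, rng):
    """Simulate inspections until the first defective item is found."""
    trials = 1
    while rng.random() >= p:   # item is good with probability 1 - p
        trials += 1
    return trials

rng = random.Random(11)
draws = [sample_geometric(0.3, rng) for _ in range(20_000)]
print(sum(draws) / len(draws))   # should be near 1 / p, about 3.33
```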

5-45
Uniform
[Figure: uniform density f(x), constant between a and b]
 The type of an incoming entity given that
each possible type is equally likely to
occur

5-46
Triangular
[Figure: triangular density f(x) with minimum a, mode m, and maximum c]
• good first approximation to the true
underlying distribution when data is sparse
and no distribution fitting analysis has been
performed
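Python's standard library includes a triangular sampler. Note that `random.triangular` takes its arguments in the order (low, high, mode), which differs from the a, m, c order on the slide. Here is a minimal sketch with hypothetical minimum, most likely, and maximum estimates.

```python
import random
from statistics import mean

# Hypothetical estimates for an activity time in minutes:
a, m, c = 2.0, 3.0, 6.0   # minimum, most likely (mode), maximum

rng = random.Random(5)
# Python's argument order is triangular(low, high, mode).
samples = [rng.triangular(a, c, m) for _ in range(20_000)]

print(min(samples) >= a, max(samples) <= c)   # True True: bounded by a and c
print(round(mean(samples), 2))                # near (a + m + c) / 3, about 3.67
```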

5-47
Normal
[Figure: bell-shaped normal density f(x)]
 Popular but rarely a true representation of
actual data.

5-48
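Exponential
[Figure: exponential density f(x)]
 intervals between occurrences such as
the time between customer arrivals.
 certain repair times or activities such as
the duration of telephone conversations.
 Related to the Poisson: if the number of
occurrences per unit time is Poisson with
rate λ, the times between occurrences are
exponential with mean 1/λ.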
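The Poisson/exponential link can be checked by simulation. Generate exponential inter-arrival times with rate λ and count the arrivals in each hour. The hourly counts should then average about λ, with a variance also about λ, as a Poisson count would. Here is a minimal Python sketch with a hypothetical rate of 4 per hour. The function name is made up.

```python
import random

def arrivals_per_hour(rate_per_hour, hours, rng):
    """Count arrivals in each hour when inter-arrival times are exponential.

    rng.expovariate takes the rate lambda, so the mean gap is 1 / lambda hours.
    """
    counts = [0] * hours
    t = rng.expovariate(rate_per_hour)
    while t < hours:
        counts[int(t)] += 1
        t += rng.expovariate(rate_per_hour)
    return counts

rng = random.Random(2024)
counts = arrivals_per_hour(4.0, 10_000, rng)
print(sum(counts) / len(counts))   # average arrivals per hour, near 4
```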

5-49
 =0.5,  =2

 =1.0,  =2.0

 =.5,  =.5
f(x)

Beta
0 1.0
x

 random proportions such as the


percentage of defective items in a lot
 activity times, particularly when
multiple tasks make up the activity
(PERT analysis is based on beta
distribution).
5-50
Lognormal
[Figure: lognormal density f(x)]
 manual activities such as assembly,
inspection or repair.
 The time between failures is often
lognormally distributed.

5-51
 1

Gamma
f(x)

>2

 manual tasks such as service times or


repair times

5-52
 1

f(x)

Weibull =2

 used in reliability theory for defining the


time until failure particularly due to
items (e.g. bearings, tooling, etc.) that
wear

5-53
Bounded vs. Boundless
Distributions
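 Bounded distributions (e.g. uniform, triangular)
prevent extreme values from occurring, even ones
that could plausibly happen in the real system.
 Boundless distributions (e.g. normal,
exponential) can generate extreme values that
are unlikely or unrealistic.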

5-54
Fitting Distributions to Data
Using Stat::Fit
1. Enter or import data as previously
discussed.
2. Plot the data and look at parameters to
get a sense of the shape of the data.
3. Select distributions to fit and analysis to
use.
4. Run the analysis and view rankings.
5. Make a selection.

5-55
Plotting the Data
To plot the raw data
you have imported,
select Input > Input
Graph.
The shape of the input
graph can help
determine the
appropriate
distribution.
5-56
Descriptive Statistics

If desired, you can
view descriptive
statistics (data
parameters) for the
data to get an idea of
the center and spread.
Select Statistics >
Descriptive.

5-57
Setup
Select Fit  Setup
The window that
comes up allows
you to select the
distributions
Stat::Fit will fit to
the data set.

5-58
Setup
By selecting the
Calculations tab,
you can change
the tests to be run,
the estimators to be
used, and the level
of significance.

5-59
Estimators
Stat::Fit allows you to
change the estimation
technique utilized to fit
the distributions.

MLEs (Maximum
Likelihood Estimators)
are generally
preferred, but in some
cases the MLE
doesn’t exist,
so you must use
Moments.

5-60
Tests
You can choose to
perform any
combination of the
three goodness-of-fit
tests that are
available.
All three tests
measure how well
the fitted distribution
models the data set,
but in different ways.
5-61
Chi-Squared Test
1. Compares actual counts (values from the
input dataset) versus expected counts
(values from the estimated distribution)
2. Derives a p-value from how much these
values differ
3. Better with larger sample sizes
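The chi-squared statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$. Here $O_i$ is the observed count in class i, and $E_i$ is n times the fitted probability of that class. Here is a minimal Python sketch of the statistic only. Stat::Fit also handles the class boundaries and the p-value. The counts are hypothetical.

```python
def chi_square_statistic(observed, expected_probs):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E.

    observed: counts per class from the data set.
    expected_probs: probability of each class under the fitted distribution.
    """
    if len(observed) != len(expected_probs):
        raise ValueError("need one expected probability per class")
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = n * p                      # expected count in this class
        stat += (o - e) ** 2 / e
    return stat

# Hypothetical counts of which of 5 equally likely part types arrived.
observed = [18, 22, 20, 25, 15]
stat = chi_square_statistic(observed, [0.2] * 5)
print(round(stat, 2))   # 2.9
```

Compare the statistic to a chi-squared critical value with (classes − 1 − parameters estimated from the data) degrees of freedom. Here that is 4, with a 5% critical value of about 9.49, so 2.9 would not lead to rejection. A common rule of thumb asks for an expected count of at least 5 per class, which is one reason the test works better with larger samples.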

5-62
Kolmogorov-Smirnov Test
1. Measures the largest difference between the
cumulative distribution of the data and that of
the fitted distribution
2. Most conservative: least likely to reject
the correct distribution in error
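The K-S statistic D is the largest vertical gap between the data's empirical CDF and the fitted CDF. Here is a minimal Python sketch. The exponential with mean 3 stands in for a hypothetical fitted distribution, and the data values are made up.

```python
import math

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF steps from (i - 1)/n up to i/n at x,
        # so check the gap on both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def exponential_cdf(x, mean=3.0):
    """CDF of an exponential with the given mean (hypothetical fit)."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

times = [0.4, 1.1, 1.9, 2.5, 3.2, 4.8, 6.0]   # hypothetical data
d = ks_statistic(times, exponential_cdf)
print(round(d, 3))
```

For a fully specified distribution, the large-sample 5% critical value is roughly 1.36/√n. When the parameters are estimated from the same data, those standard critical values become conservative.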

5-63
Anderson-Darling Test
1. Like Kolmogorov-Smirnov, but gives a
heavier weight to differences in the tails of
the distribution
2. Good for any sample size
3. Not good for discrete data

5-64
Tests
 The Hypotheses:
 H0: The distribution fit is in fact the correct
distribution to describe the variable of interest.

 H1: The distribution fit is NOT the correct
distribution to describe the variable of interest.

5-65
How often are you willing to
be wrong?
You set the value of
the level of
significance based
on the answer to
the question above.

It tells you how likely
the test is to reject a
distribution that
accurately
describes the data.

5-66
Errors
 There are two types of errors that can be
made when performing a statistical test:
 Type I: you reject H0 when in fact H0 is true
 Type II: you accept H0 when in fact H1 is true
 The level of significance you choose IS the
probability of a Type I error

5-67
Fitting the Data
There are two ways to perform the
goodness-of-fit tests:
 Select the Auto::Fit button from the toolbar
 Select Fit > Goodness of Fit

5-68
Auto::Fit
Within the Auto::Fit
window you can select
to fit continuous or
discrete distributions.

If the distribution has a
lower bound, that value
can be specified here.

5-69
Auto::Fit

Using Auto::Fit, the distributions are automatically ranked
according to which seem to fit the data the best.

5-70
Goodness of Fit Tests
The test results are
given along with the
actual distribution fit.
Each test has a result:
Reject or Do Not Reject.

Do Not Reject means
there was not enough
evidence to conclude it
is not the correct
distribution to describe
the data.

5-71
Distribution Graph
After picking out the
top few distributions,
it can be useful to
graph the fit
distribution against
the data.

Select Fit  Results


Graph 
Comparison
5-72
Picking a Fit
 Compare test results.
 Compare graphs.

 Use what you know about the
process.

5-73
Exporting
Once you have
selected a fit, you
need to export it to
the PROMODEL
product.

Select File  Export


 Export Fit or the
Export button from
the toolbar.
5-74
Exporting
 Select the
application you
would like to
export to
(PROMODEL
Products)
 Select the
distribution to
export

5-75
Exporting
 The precision box
allows you to
change the number
of decimal places
in the distribution
parameters
 Select OK
 The distribution is
now in the correct
form to paste into
your model

5-76
Stat::Fit
 What to avoid:
 Small samples
 Using all goodness-of-fit tests – it
increases the Type I error rate
 Taking the distribution into the
model without exporting
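Here is a rough way to see why running every test inflates the Type I error rate. Suppose each test is run at significance level α. The chance that at least one of the three rejects a correct distribution then lies between α and 3α. The three tests share the same data, so they are not independent; the independent-case figure below is only an illustration of the upper range.

$$
\alpha \;\le\; P(\text{at least one Type I error}) \;\le\; 3\alpha,
\qquad
1-(1-\alpha)^3 = 1 - 0.95^3 \approx 0.143 \ \text{ for } \alpha = 0.05 \text{ (independent case)}
$$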

5-77
Frequency Histogram of
Inspection Times

5-78
Best Distribution Fit for
Inspection Times

5-79
Beta Curve Representing
Inspection Times

5-80
Appropriate Adjustments
 Remember, you fit a distribution to
historical data, not necessarily to the
data reflecting the design period.
 Don’t forget to adjust the data to reflect
the period of interest.
 Is there a growth rate to factor in?
 Is there a learning curve to consider?

5-81
Handling Rare Behavior
 Repeating behavior – e.g. Occasional
abnormally long downtimes.
 Can include if not too infrequent.
 Can model once, like non-repeating behavior.
 Non-repeating behavior – e.g. Labor strike
 Throw one in to see what happens.

5-82
Absence of Data
 A single, most likely or mean value
 Minimum and maximum values defining a
range
 Minimum, most likely and maximum values
 Use sensitivity analysis:
 Best case
 Worst case
 Most likely case

5-83
Assumptions
 All models are based on assumptions.
 Relative comparisons may still be valid.
 Sensitivity analysis can show crucial
assumptions.

5-84
Use of an Assumption List
Assumptions Examples
 No downtimes are considered (downtimes are rare).
 Operators are dedicated at each workstation and are always
available during the scheduled work time.
 Rework times are half of the normal operation times.

5-85
