Chap_05_Data_Collection_and_Analysis
1
Modeling, fun issues
Graphical library—create your own!
Create your own graphics library and carry it around with
you on a USB drive or similar
I’d simply save it with any .MOD file you create, same folder.
Tutorial Link is here:
https://www.promodel.com/onlinehelp/solutions/promodel/#How%20to%20Create%20Your%20Own%20Graphics%20Library.htm#kanchor128
You may need to “show all content” in MS Explorer
The spot where you save your own unique library is about 2
minutes into the video.
Suggest you take the default graphics library, SAVE AS to a
new location, and then ADD your own stuff to it.
That way, you have both default icons and your own icons you
added.
2
Modeling, Routing order
The order in which an entity is routed is simply the
order of the entries in the Routing table (top to bottom),
unless there is programmed logic to override that
order (covered later in the semester).
You can also have the entity change into something
else as a result of processing:
A block of steel (“stock_AL” entity) can be machined.
Result sent onward is a “gear”.
A log can be processed into a toothpick. Or 1000
toothpicks.
A mailbag can be emptied into 30 letters and sent onward.
Here's a nice example that lays it out, step by step:
Defining process logic
Summarized on next page for class…
3
Defining processing order
and entity changes
Logic of processing should be written down
first, before modeling
The Demo “Mfg Cost.mod” is more complex than this
(has resources, too).
4
Defining processing order
and entity changes
Logic of processing should be written down first, before modeling
Write it out logically (a code sketch of these steps follows the list):
1) When an entity called Pallet arrives at location Receive there is no
operation time or processing logic (it's just a storage location). The
resulting output is six entities called Blank that are routed to the
First available destination of either NC_301L or NC_302L.
2) When Blanks arrive at NC_301L or NC_302L, the processing time is
a normal distribution with a mean of 3 and a standard deviation
of .2 minutes. The name of the entity is now changed to Cog, and
the Cog is sent to the Degrease location (First is the default routing
rule).
3) Two Cogs are accumulated at Degrease and processed for 5
minutes. When the degrease cycle is complete, Cogs are routed to
location Inspect.
4) The inspection time is a uniform distribution with a mean of 3.2 and
a half range of .3 minutes. Ninety-six percent of the Cogs pass
inspection and exit the system, while four percent of the Cogs fail
inspection and become Rejects.
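For class discussion, here is a rough sketch of the same four steps in Python using the SimPy library. This is not ProModel code (the real model lives in ProModel's process and routing tables); the single Pallet arrival, the pooled NC machines, and the run length are illustrative assumptions.
```python
import random
import simpy

RNG = random.Random(42)

def pallet(env, nc_machines, degrease_q):
    # 1) Pallet arrives at Receive: no operation time; output is six Blanks.
    for _ in range(6):
        env.process(blank(env, nc_machines, degrease_q))
    yield env.timeout(0)

def blank(env, nc_machines, degrease_q):
    # 2) Machined at the first available NC machine for N(3, .2) minutes,
    #    then the entity becomes a Cog and is sent on to Degrease.
    with nc_machines.request() as req:
        yield req
        yield env.timeout(max(0.0, RNG.normalvariate(3.0, 0.2)))
    yield degrease_q.put("Cog")

def degrease(env, degrease_q, counts):
    # 3) Accumulate two Cogs, then degrease the pair for 5 minutes.
    while True:
        yield degrease_q.get()
        yield degrease_q.get()
        yield env.timeout(5.0)
        for _ in range(2):
            env.process(inspect(env, counts))

def inspect(env, counts):
    # 4) Inspection takes U(3.2 +/- 0.3) minutes; 96% pass and exit,
    #    4% become Rejects.
    yield env.timeout(RNG.uniform(2.9, 3.5))
    counts["Reject" if RNG.random() < 0.04 else "Pass"] += 1

env = simpy.Environment()
nc = simpy.Resource(env, capacity=2)   # NC_301L and NC_302L, pooled as "first available"
dq = simpy.Store(env)                  # queue feeding the Degrease location
counts = {"Pass": 0, "Reject": 0}
env.process(degrease(env, dq, counts))
env.process(pallet(env, nc, dq))       # a single Pallet arrival (assumption)
env.run(until=60)
print(counts)                          # totals for the six Cogs from the one Pallet
```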
5
Defining processing order
and entity changes
First, create the Locations
Then, create the Entities
Go to the Process table and connect the two,
according to your written-out logic
Fill in the details afterwards
You don’t need to put in the logic yet, either;
just get the tables set up rationally
5-10
Sources of Data
Historical records (production,
sales, scrap rates, equipment
reliability)
System documentation (process
plans, facility layouts, work
procedures)
Personal Observation (facility walk-
through, time studies, work
sampling)
5-11
Sources of Data (cont.)
Comparative systems (same or
similar industries)
Vendor claims (cycle times,
equipment reliability)
Design estimates (process times,
move times, etc. for a new system)
Literature (published research on
learning curves, predetermined
time studies, etc.)
5-12
Know when to quit.
5-13
Random System Variables
[Figure: examples of random system variables: cycle times, inter-arrival times, time between failures, number of rejects, boxes per pallet.]
5-14
Characterizing Random
Variables
You don’t need to be a professional
statistician; all you need is a basic
knowledge of descriptive statistics
(describe the data)
5-15
Discrete vs. Continuous
Variables
Discrete – The variable can take on only
distinct, countable values (e.g. number of
rejects, boxes per pallet)
Continuous – The variable can take on
any value within a range (e.g. height,
weight, time)
5-16
Data Groupings
Class – Category or range of values for
grouping data.
Frequency – The number of observations
that fall in a class.
Frequency distribution – A listing of all
classes along with their frequencies.
Relative frequency – The ratio of the
frequency of a class to the total number of
observations.
Relative-frequency distribution – A
listing of all classes along with their
relative frequencies.
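A small sketch of these groupings (the inspection-time values below are made up) using NumPy:
```python
# Group made-up inspection times into classes and report the frequency
# and relative-frequency distributions.
import numpy as np

times = np.array([3.1, 3.4, 2.9, 3.0, 3.6, 3.3, 3.2, 3.1, 3.5, 2.8])

freq, edges = np.histogram(times, bins=4)   # 4 classes (ranges of values)
rel_freq = freq / freq.sum()                # relative frequencies

for lo, hi, f, rf in zip(edges[:-1], edges[1:], freq, rel_freq):
    print(f"{lo:.2f}-{hi:.2f}: frequency {f}, relative frequency {rf:.2f}")
```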
5-17
Histograms reveal the shape,
center, and spread of a
variable
Shape refers to the shape formed by
the bars of the histogram
Center refers to the mean of the
variable. If the histogram were an
object, it would “balance” on the mean
(half the area is to the left, half to the
right).
Spread refers to how widely dispersed the
data values are
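A quick matplotlib sketch (the data are simulated, purely for illustration) showing how the mean marks the balance point of a histogram:
```python
# Plot a histogram of simulated data and mark the mean (the "balance point")
# so shape, center, and spread can be read off the plot.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=3.2, scale=0.3, size=200)   # illustrative sample

plt.hist(data, bins=15, edgecolor="black")
plt.axvline(data.mean(), color="red", linestyle="--",
            label=f"mean = {data.mean():.2f}")
plt.xlabel("value")
plt.ylabel("frequency")
plt.legend()
plt.show()
```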
5-18
Measures of Center or
“Location”
Mean
Median
Mode
5-19
Calculating Sample Mean
Formula: X̄ = (Σᵢ Xᵢ) / n
That is, add up all of the data points and divide
by the number of data points.
Data (# of classes skipped): 2 8 3 4 1
Sample Mean = (2+8+3+4+1)/5 = 3.6
Do not round! “Mean” need not be a whole number.
5-20
Median
Another name for 50th percentile.
Appropriate for describing measurement
data.
“Robust to outliers,” that is, not affected
much by unusual values.
5-21
Mode
The value that occurs most frequently.
One data set can have many modes.
Appropriate for all types of data, but most
useful for categorical data or discrete data
with only a small number of possible values.
5-22
Histograms can be unimodal,
multimodal or uniform
5-23
A histogram can show you if
there are outliers in the data.
5-24
Measures of Spread
Range
Variance
Standard Deviation
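Using the class-skipping sample from the sample-mean slide (2, 8, 3, 4, 1), a quick sketch of the center and spread measures with Python's statistics module:
```python
# Compute the measures of center and spread for the sample 2, 8, 3, 4, 1.
import statistics

data = [2, 8, 3, 4, 1]

print("mean   =", statistics.mean(data))      # 3.6
print("median =", statistics.median(data))    # 3
print("mode   =", statistics.mode(data))      # 2 here; every value occurs once,
                                              # so the mode isn't meaningful for this sample
print("range  =", max(data) - min(data))      # 7
print("sample variance =", statistics.variance(data))  # 7.3
print("sample std dev  =", statistics.stdev(data))     # about 2.70
```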
5-25
How and why should data
be analyzed?
Data analysis ensures that your data is
meaningful and useful.
Types of analysis include:
Test for independence (randomness).
Test for homogeneity (same source).
Test for stationarity (non-varying over time).
5-26
Testing for Independence
(Randomness)
Scatter Plot
Autocorrelation Plot
Runs Test
5-28
Entering Data in Stat::Fit
Type values.
Open a .dat file
(File > Open).
Copy and paste
from spreadsheet.
5-29
Scatter Plot
Tests for Independence.
Plots successive pairs of data as x,y values
(n-1 points).
Random scatter of points indicates
independence.
If data are correlated, the points will fall
along a line or curve.
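A minimal sketch of this successive-pairs plot, with simulated values standing in for a recorded sample:
```python
# Plot successive pairs (x[i], x[i+1]) -- n-1 points -- to eyeball independence.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(3.2, 0.3, size=100)    # stand-in for 100 recorded inspection times

plt.scatter(x[:-1], x[1:])            # (x_i, x_{i+1}) pairs
plt.xlabel("observation i")
plt.ylabel("observation i+1")
plt.title("Random scatter suggests independence; a line or curve suggests correlation")
plt.show()
```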
5-30
Scatter Plot for 100 Inspection
Times
5-31
Scatter Plot for 100
Temperatures
5-32
Autocorrelation Plot
Another test for independence.
Independence is ascertained by computing
autocorrelations for data values at varying
time lags.
If independent, such autocorrelations
should be near zero for any and all time-
lag separations.
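A short sketch of the same idea (simulated data; the lag-by-lag estimator below is one common form):
```python
# Estimate the autocorrelation of a series at several lags; values near zero
# at every lag are consistent with independence.
import numpy as np

def autocorr(x, lag):
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc)

rng = np.random.default_rng(2)
series = rng.normal(3.2, 0.3, size=100)   # stand-in for the recorded data

for k in range(1, 11):
    print(f"lag {k:2d}: autocorrelation = {autocorr(series, k):+.3f}")
```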
5-33
Autocorrelation Plot for
Inspection Times
5-34
Autocorrelation Plot for
Temperatures
5-35
Data that tends to be non-
homogeneous
Activity times that take longer or
shorter depending on the type of entity
being processed.
Inter-arrival times that fluctuate in
length depending on the time of day or
day of the week.
Time between failures and time to
repair where the failure may result from
a number of different causes.
5-36
Testing for Identically Distributed
(Homogeneous) Data
[Figure: bimodal distribution of downtimes (frequency of occurrence vs. repair time), with separate modes for part jams and mechanical failures, indicating multiple causes.]
5-37
Nonstationary (time-variant)
Data
Examples:
Customer arrivals
Equipment reliability
5-38
Non-stationary Data
[Figure: arrival rate vs. time of day (10:00 a.m. through 6:00 p.m.), showing the arrival rate changing over the day.]
5-40
Probability Distributions
A Probability Distribution defines all
possible values of a system variable
plotted against their respective
probabilities.
Distributions can be either discrete
(probability mass function) or continuous
(probability density function).
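As a quick illustration of the pmf/pdf distinction (using SciPy; the distributions and parameter values are arbitrary):
```python
# Discrete: a probability mass function gives P(X = x) directly.
# Continuous: a probability density function gives a density; probabilities
# come from integrating it (here, via the CDF).
from scipy import stats

print(stats.binom.pmf(2, n=6, p=0.3))            # P(exactly 2 successes in 6 trials)
print(stats.norm.pdf(3.2, loc=3.2, scale=0.3))   # density at the mean, not a probability
print(stats.norm.cdf(3.5, 3.2, 0.3)
      - stats.norm.cdf(2.9, 3.2, 0.3))           # P(2.9 <= X <= 3.5)
```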
5-41
Bernoulli
5-42
Binomial
[Figure: probability mass function f(x) for x = 0, 1, ..., 6.]
Poisson
[Figure: probability mass function f(x) for x = 0, 1, ..., 10.]
5-44
Geometric
[Figure: probability mass function f(x) for x = 1, 2, ..., 7.]
5-45
Uniform
[Figure: probability density function f(x), constant between a and b.]
5-46
Triangular
[Figure: probability density function f(x) with minimum a, mode m, and maximum c.]
5-47
Normal
[Figure: bell-shaped probability density function f(x).]
5-48
Exponential
[Figure: probability density function f(x), highest at zero and decreasing with x.]
5-49
Beta
[Figure: probability density functions f(x) on the interval 0 to 1 for three shape-parameter pairs: (0.5, 2), (1.0, 2.0), and (0.5, 0.5).]
Lognormal
[Figure: right-skewed probability density function f(x) for x > 0.]
5-51
Gamma
[Figure: probability density functions f(x) for different shape-parameter values (e.g. 1 and greater than 2).]
5-52
Weibull
[Figure: probability density functions f(x) for different shape-parameter values (e.g. 1 and 2).]
5-53
Bounded vs. Boundless
Distributions
Bounded distributions prevent extreme
values from occurring, even ones that are
possible in the real system.
Boundless distributions can cause unlikely,
unrealistically extreme values to occur.
5-54
Fitting Distributions to Data
Using Stat::Fit
1. Enter or import data as previously
discussed.
2. Plot the data and look at parameters to
get a sense of the shape of the data.
3. Select distributions to fit and analysis to
use.
4. Run the analysis and view rankings.
5. Make a selection.
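Stat::Fit performs steps 2 through 5 through its menus. As a rough stand-in (not Stat::Fit itself), the same fit-and-rank idea can be sketched with SciPy, using simulated data and an arbitrary candidate list:
```python
# Fit several candidate distributions by maximum likelihood and rank them
# by their Kolmogorov-Smirnov statistic (smaller is better).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.gamma(shape=4.0, scale=0.8, size=200)   # pretend these were imported

candidates = {
    "normal":      stats.norm,
    "lognormal":   stats.lognorm,
    "gamma":       stats.gamma,
    "exponential": stats.expon,
}

results = []
for name, dist in candidates.items():
    params = dist.fit(data)                              # MLE fit
    ks_stat, p_value = stats.kstest(data, dist.cdf, args=params)
    results.append((ks_stat, p_value, name))

for ks_stat, p_value, name in sorted(results):           # best fit first
    print(f"{name:12s} KS = {ks_stat:.3f}  p = {p_value:.3f}")
```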
5-55
Plotting the Data
To plot the raw data you have imported,
select Input > Input Graph.
For descriptive statistics, select
Statistics > Descriptive.
5-57
Setup
Select Fit > Setup.
The window that
comes up allows
you to select the
distributions
Stat::Fit will fit to
the data set.
5-58
Setup
By selecting the
Calculations tab,
you can change
the tests to be run,
the estimators to be
used, and the level
of significance.
5-59
Estimators
Stat::Fit allows you to
change the estimation
technique used to fit
the distributions.
MLEs (Maximum
Likelihood Estimators)
are generally
preferred, but in some
cases the MLE does
not exist, so you must
use the Moments
estimators.
5-60
Tests
You can choose to
perform any
combination of the
three goodness of fit
tests that are
available.
5-62
Kolmogorov-Smirnov Test
Based on the largest difference between the
cumulative distribution of the data and the
fitted distribution.
5-63
Anderson-Darling Test
Like the Kolmogorov-Smirnov test, but gives
heavier weight to differences in the tails of
the distribution.
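Outside Stat::Fit, the same two tests can be run with SciPy; here is a minimal sketch with simulated data and a fully specified normal distribution (both of which are illustrative assumptions):
```python
# Run the K-S and Anderson-Darling tests against a normal distribution whose
# parameters are chosen up front (not estimated from this sample).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(3.2, 0.3, size=100)

# Kolmogorov-Smirnov: largest gap between the empirical CDF and the fitted CDF.
ks_stat, ks_p = stats.kstest(sample, "norm", args=(3.2, 0.3))
print(f"K-S:  statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")

# Anderson-Darling: similar idea, but weights the tails more heavily.
ad = stats.anderson(sample, dist="norm")
print(f"A-D:  statistic = {ad.statistic:.3f}")
print("critical values:", ad.critical_values,
      "at significance levels", ad.significance_level)

# Reject H0 at level alpha if the p-value falls below alpha (e.g. alpha = 0.05).
```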
5-64
Tests
The Hypotheses:
H0: The fitted distribution is the correct
distribution to describe the variable of interest.
H1: The fitted distribution is not the correct
distribution.
5-65
How often are you willing to
be wrong?
You set the value of
the level of
significance based
on the answer to
the question above.
5-66
Errors
There are two types of errors that can be
made when performing a statistical test:
Type I: you reject H0 when in fact H0 is true.
Type II: you accept H0 when in fact H1 is true.
5-67
Fitting the Data
There are two ways to perform the goodness-of-fit
tests: set them up manually (the Setup, Estimators,
and Tests options just described) or let Auto::Fit
run them automatically (next slides).
5-68
Auto::Fit
Within the Auto::Fit
window you can select
to fit continuous or
discrete distributions.
5-69
Auto::Fit
5-70
Goodness of Fit Tests
The test results are
given along with the
actual distribution fit.
Each test has a result:
Reject or Do Not Reject.
5-71
Distribution Graph
After picking out the
top few distributions,
it can be useful to
graph the fitted
distribution against
the data as part of the
selection process.
5-73
Exporting
Once you have
selected a fit, you
need to export it to
the PROMODEL
product.
5-75
Exporting
The precision box
allows you to
change the number
of decimal places
in the distribution
parameters
Select OK
The distribution is
now in the correct
form to paste into
your model
5-76
Stat::Fit
What to avoid:
Small samples
5-77
Frequency Histogram of
Inspection Times
5-78
Best Distribution Fit for
Inspection Times
5-79
Beta Curve Representing
Inspection Times
5-80
Appropriate Adjustments
Remember, you fit a distribution to
historical data, not necessarily to the
data reflecting the design period.
Don’t forget to adjust the data to reflect
the period of interest.
Is there a growth rate to factor in?
Is there a learning curve to consider?
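For example, a fitted arrival rate might be scaled for projected growth before it goes into the model (the numbers below are illustrative):
```python
# Adjust a fitted arrival rate for a projected growth rate before using it
# in the model for the design period.
fitted_rate_per_hr = 42.0       # arrivals/hour fitted from historical data
annual_growth = 0.08            # assumed 8% growth to the design period

design_rate_per_hr = fitted_rate_per_hr * (1 + annual_growth)
mean_interarrival_min = 60.0 / design_rate_per_hr
print(f"design-period mean interarrival time: {mean_interarrival_min:.2f} min")
```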
5-81
Handling Rare Behavior
Repeating behavior – e.g. occasional,
abnormally long downtimes.
Can be included if not too infrequent.
Can be modeled once, like non-repeating behavior.
Non-repeating behavior – e.g. a labor strike.
Throw one in to see what happens.
5-82
Absence of Data
A single, most likely or mean value
Minimum and maximum values defining a
range
Minimum, most likely and maximum values
Use sensitivity analysis:
Best case
Worst case
Most likely case
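When only minimum, most likely, and maximum estimates are available, a triangular distribution is one common stand-in (the values below are illustrative):
```python
# Sample an activity time from a triangular distribution built from a
# minimum / most likely / maximum estimate, then rerun the model at the
# best-case, most-likely, and worst-case settings for sensitivity analysis.
import random

low, mode, high = 2.0, 3.2, 5.0     # illustrative min / most likely / max (minutes)

sampled_time = random.triangular(low, high, mode)   # note the argument order
print(f"one sampled activity time: {sampled_time:.2f} min")

for label, value in [("best case", low), ("most likely", mode), ("worst case", high)]:
    print(f"{label}: run the model with the time fixed at {value} min")
```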
5-83
Assumptions
All models are based on assumptions.
Relative comparisons may still be valid.
Sensitivity analysis can show crucial
assumptions.
5-84
Use of an Assumption List
Example assumptions:
No downtimes are considered (downtimes are rare).
Operators are dedicated at each workstation and are always
available during the scheduled work time.
Rework times are half of the normal operation times.
5-85