ERM 4b Final

Why a Manager Needs to Know About Statistics

 To Know How to Properly Present Information
 To Know How to Draw Conclusions about Populations Based on Sample Information
 To Know How to Improve Processes
 To Know How to Obtain Reliable Forecasts

Why We Need Data
 To Provide Input to a Survey
 To Provide Input to a Study
 To Measure Performance of an Ongoing Service or Production Process
 To Evaluate Conformance to Standards
 To Assist in Formulating Alternative Courses of Action
 To Satisfy Curiosity
Binomial Probability Distribution

 ‘n’ Identical Trials
 E.g., 15 tosses of a coin; 10 light bulbs taken from a warehouse
 2 Mutually Exclusive Outcomes on Each Trial
 E.g., Heads or tails in each toss of a coin; defective or not defective light bulb
 Trials are Independent
 The outcome of one trial does not affect the outcome of the others
Binomial Probability Distribution (continued)

 Constant Probability for Each Trial
 E.g., Probability of getting a tail is the same each time we toss the coin
 2 Sampling Methods
 Infinite population without replacement
 Finite population with replacement
Binomial Probability Distribution Function

P(X) = [n! / (X!(n − X)!)] p^X (1 − p)^(n − X)

P(X): probability of X successes given n and p
X: number of "successes" in sample (X = 0, 1, …, n)
p: the probability of each "success"
n: sample size

Example: Tails in 2 Tosses of a Coin
X    P(X)
0    1/4 = .25
1    2/4 = .50
2    1/4 = .25
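As a quick check of the formula, the short Python sketch below (not part of the original slides) reproduces the tails-in-2-tosses table using the fair-coin values n = 2, p = 0.5 from the example:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Tails in 2 tosses of a fair coin (n = 2, p = 0.5)
probs = {x: binomial_pmf(x, 2, 0.5) for x in range(3)}
# probs -> {0: 0.25, 1: 0.5, 2: 0.25}
```

The probabilities sum to 1, as they must for any valid distribution.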
Poisson Distribution
(named after Siméon Poisson)
 Discrete events (“successes”) occurring in a given
area of opportunity (“interval”)
 “Interval” can be time, length, surface area, etc.
 The probability of a “success” in a given “interval” is
the same for all the “intervals”
 The number of “successes” in one “interval” is
independent of the number of “successes” in other
“intervals”
 The probability of two or more “successes” occurring
in an “interval” approaches zero as the “interval”
becomes smaller
 E.g., # customers arriving in 15 minutes
 E.g., # defects per case of light bulbs
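The slides list the Poisson conditions but not the probability function itself; the standard pmf is P(X = x) = e^(−λ) λ^x / x!, where λ is the mean number of successes per interval. The sketch below uses a made-up arrival rate (λ = 3 customers per 15 minutes) purely as an illustration:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!  (standard Poisson pmf)."""
    return exp(-lam) * lam ** x / factorial(x)

# Hypothetical example: customers arrive at an average rate of 3 per
# 15-minute interval; probability of exactly 2 arrivals in one interval:
p2 = poisson_pmf(2, 3.0)
```

Summing the pmf over a wide range of x values approaches 1, confirming it behaves as a probability distribution.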
The Normal Distribution

 “Bell Shaped”
 Symmetrical
 Mean, Median and Mode are Equal
 Interquartile Range Equals 1.33 σ
 Random Variable Has Infinite Range
(Figure: bell-shaped density f(X), with mean, median and mode coinciding at the center)
The Mathematical Model

f(X) = (1 / (√(2π) σ)) e^(−(1/2)[(X − μ)/σ]²)

f(X): density of random variable X
π ≈ 3.14159; e ≈ 2.71828
μ: population mean
σ: population standard deviation
X: value of random variable (−∞ < X < ∞)
Many Normal Distributions
There are an Infinite Number of Normal Distributions
By Varying the Parameters μ and σ, We Obtain Different Normal Distributions
The Standardized Normal Distribution
When X is normally distributed with mean μ and standard deviation σ,

Z = (X − μ) / σ

follows a standardized (normalized) normal distribution with mean μ_Z = 0 and standard deviation σ_Z = 1.
(Figure: f(X) with mean μ and spread σ mapped to f(Z) with μ_Z = 0 and σ_Z = 1)
Finding Probabilities
Probability is the area under the curve!

P(c ≤ X ≤ d) = ?

(Figure: shaded area under f(X) between X = c and X = d)
Standardizing Example

Z = (X − μ)/σ = (6.2 − 5)/10 = 0.12

(Figure: X = 6.2 on the normal distribution with μ = 5, σ = 10 maps to Z = 0.12 on the standardized normal distribution with μ_Z = 0, σ_Z = 1)

Example: P(2.9 ≤ X ≤ 7.1) = .1664

Z = (2.9 − 5)/10 = −.21    Z = (7.1 − 5)/10 = .21

(Figure: areas of .0832 on each side of the mean, between X = 2.9 and X = 7.1, correspond to the area between Z = −.21 and Z = .21)
Example: P(2.9 ≤ X ≤ 7.1) = .1664 (continued)
Cumulative Standardized Normal Distribution Table (Portion), μ_Z = 0, σ_Z = 1

Z     .00    .01    .02
0.0  .5000  .5040  .5080
0.1  .5398  .5438  .5478
0.2  .5793  .5832  .5871   (Z = 0.21 gives .5832)
0.3  .6179  .6217  .6255

Z      .00    .01    .02
-0.3  .3821  .3783  .3745
-0.2  .4207  .4168  .4129   (Z = −0.21 gives .4168)
-0.1  .4602  .4562  .4522
0.0   .5000  .4960  .4920

P(2.9 ≤ X ≤ 7.1) = .5832 − .4168 = .1664
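The table lookups above can be reproduced without a table: the standardized normal CDF has the closed form Φ(z) = ½(1 + erf(z/√2)), a standard identity rather than something from the slides. A minimal sketch for the μ = 5, σ = 10 example:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standardized normal distribution: P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 5, 10
lo = phi((2.9 - mu) / sigma)   # P(Z <= -0.21), about .4168
hi = phi((7.1 - mu) / sigma)   # P(Z <=  0.21), about .5832
prob = hi - lo                 # about .1664
```

The result agrees with the table-based answer to four decimal places.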
Recovering X Values for Known Probabilities
(Figure: standardized normal distribution with area .6179 below Z = 0.30 and .3821 above it, mapped back to the normal distribution with μ = 5, σ = 10)

X = μ + Zσ = 5 + (.30)(10) = 8
More Examples of Normal Distribution Using PHStat
A set of final exam grades was found to be normally distributed with a mean of 73 and a standard deviation of 8. What is the probability of getting a grade no higher than 91 on this exam?

X ~ N(73, 8²)    P(X ≤ 91) = ?

PHStat output:
Mean                  73
Standard Deviation     8
Probability for X <= X
X Value               91
Z Value               2.25
P(X<=91)              0.9877756

(Figure: shaded area to the left of X = 91, i.e., Z = 2.25, on the curve with μ = 73)
More Examples of Normal Distribution Using PHStat (continued)
What percentage of students scored between 65 and 89?

X ~ N(73, 8²)    P(65 ≤ X ≤ 89) = ?

PHStat output:
Probability for a Range
From X Value          65
To X Value            89
Z Value for 65        -1
Z Value for 89         2
P(X<=65)          0.1587
P(X<=89)          0.9772
P(65<=X<=89)      0.8186

(Figure: shaded area between X = 65 (Z = −1) and X = 89 (Z = 2), with μ = 73)
More Examples of Normal Distribution Using PHStat (continued)
The middle 50% of the students scored between what two scores?

X ~ N(73, 8²)    P(a ≤ X ≤ b) = .50

PHStat output:
Find X and Z Given Cum. Pctage.
Cumulative Percentage    25.00%
Z Value                -0.67449
X Value                67.60408

Find X and Z Given Cum. Pctage.
Cumulative Percentage    75.00%
Z Value                 0.67449
X Value                78.39592

(Figure: area of .25 in each tail outside X = 67.6 (Z = −0.67) and X = 78.4 (Z = 0.67), with μ = 73)
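The PHStat "Find X and Z Given Cum. Pctage." step is an inverse-CDF lookup. Since Φ is strictly increasing, it can be inverted by simple bisection; this is a sketch of the idea, not PHStat's actual algorithm:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standardized normal distribution: P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def phi_inv(p, lo=-10.0, hi=10.0):
    """Invert phi by bisection; phi is strictly increasing on [lo, hi]."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma = 73, 8
q1 = mu + phi_inv(0.25) * sigma   # about 67.60
q3 = mu + phi_inv(0.75) * sigma   # about 78.40
```

The recovered scores match the PHStat output (67.604 and 78.396) to two decimal places.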
Assessing Normality

 Not All Continuous Random Variables are Normally Distributed
 It is Important to Evaluate How Well the Data Set is Approximated by a Normal Distribution
Assessing Normality (continued)
 Construct Charts
 For small- or moderate-sized data sets, do the
stem-and-leaf display and box-and-whisker plot
look symmetric?
 For large data sets, does the histogram or polygon
appear bell-shaped?
 Compute Descriptive Summary Measures
 Do the mean, median and mode have similar
values?
 Is the interquartile range approximately 1.33 σ?
 Is the range approximately 6 σ?
Assessing Normality (continued)

 Observe the Distribution of the Data Set
 Do approximately 2/3 of the observations lie between the mean ± 1 standard deviation?
 Do approximately 4/5 of the observations lie between the mean ± 1.28 standard deviations?
 Do approximately 19/20 of the observations lie between the mean ± 2 standard deviations?
 Evaluate Normal Probability Plot
 Do the points lie on or close to a straight line with positive slope?
Assessing Normality
(continued)

 Normal Probability Plot


 Arrange Data into Ordered Array
 Find Corresponding Standardized Normal Quantile
Values
 Plot the Pairs of Points with Observed Data Values
on the Vertical Axis and the Standardized Normal
Quantile Values on the Horizontal Axis
 Evaluate the Plot for Evidence of Linearity
Normal Probability Plot
(Figure: four normal probability plots of observed X against standardized normal quantiles Z, from −2 to 2, showing Left-Skewed, Right-Skewed, Rectangular and U-Shaped patterns; none is linear)
Unbiasedness ( E(X̄) = μ )
(Figure: sampling distributions f(X̄) for an unbiased estimator, centered at μ, and a biased estimator, centered away from μ)
Effect of Large Sample
For sampling with replacement: as n increases, σ_X̄ decreases.
(Figure: f(X̄) is narrower for the larger sample size and wider for the smaller sample size, both centered at μ)
When the Population is Normal

Population Distribution: μ = 50, σ = 10
Central Tendency: μ_X̄ = μ
Variation: σ_X̄ = σ/√n

Sampling Distributions:
 n = 4: σ_X̄ = 5
 n = 16: σ_X̄ = 2.5
(Figure: sampling distributions centered at μ_X̄ = 50, narrower as n grows)
When the Population is Not Normal

Population Distribution: μ = 50, σ = 10
Central Tendency: μ_X̄ = μ
Variation: σ_X̄ = σ/√n

Sampling Distributions:
 n = 4: σ_X̄ = 5
 n = 30: σ_X̄ = 1.8
(Figure: sampling distributions centered at μ_X̄ = 50, approximately normal for large n)
Level of Significance and the Rejection Region

 H0: μ ≥ 3.5 vs. H1: μ < 3.5: rejection region of area α in the lower tail
 H0: μ ≤ 3.5 vs. H1: μ > 3.5: rejection region of area α in the upper tail
 H0: μ = 3.5 vs. H1: μ ≠ 3.5: rejection regions of area α/2 in each tail
The critical value(s) mark the boundary of the rejection region(s).
One-Tail Z Test for Mean (σ Known)

 Assumptions
 Population is normally distributed
 If not normal, requires large samples
 Null hypothesis has ≤ or ≥ sign only
 σ is known
 Z Test Statistic

Z = (X̄ − μ_X̄)/σ_X̄ = (X̄ − μ)/(σ/√n)
Rejection Region

 H0: μ ≥ μ0, H1: μ < μ0: reject H0 in the lower tail of area α; Z must be significantly below 0 to reject H0
 H0: μ ≤ μ0, H1: μ > μ0: reject H0 in the upper tail of area α; small values of Z don't contradict H0, so don't reject H0!
Reject and Do Not Reject Regions

H0: μ ≤ 368    H1: μ > 368

(Figure: rejection region of area .05 above the critical value Z = 1.645; the observed X̄ = 372.5 corresponds to Z = 1.5, which falls in the Do Not Reject region; μ_X̄ = μ = 368)
Connection to Confidence Intervals
For X̄ = 372.5, σ = 15 and n = 25, the 95% confidence interval is:

372.5 − 1.96(15/√25) ≤ μ ≤ 372.5 + 1.96(15/√25)
or
366.62 ≤ μ ≤ 378.38

We are 95% confident that the population mean is between 366.62 and 378.38.
If this interval contains the hypothesized mean (368), we do not reject the null hypothesis.
It does. Do not reject.
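The interval arithmetic above is easy to verify with a few lines of Python (a sketch using the slide's values; 1.96 is the usual 95% standardized normal critical value):

```python
from math import sqrt

xbar, sigma, n = 372.5, 15, 25
z = 1.96                                      # 95% critical value
half_width = z * sigma / sqrt(n)              # 1.96 * 3 = 5.88
ci = (xbar - half_width, xbar + half_width)   # (366.62, 378.38)
contains_368 = ci[0] <= 368 <= ci[1]          # True, so do not reject H0
```

Because 368 lies inside the interval, the two-sided test and the confidence interval lead to the same conclusion.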
t Test: σ Unknown
 Assumptions
 Population is normally distributed
 If not normal, requires a large sample
 σ is unknown
 t Test Statistic with n − 1 Degrees of Freedom

t = (X̄ − μ)/(S/√n)
Example Solution: One-Tail
H0: μ ≤ 368    H1: μ > 368
α = 0.01, n = 36, df = 35
Critical Value: 2.4377

Test Statistic:
t = (X̄ − μ)/(S/√n) = (372.5 − 368)/(15/√36) = 1.80

Decision: Do Not Reject at α = .01 (1.80 < 2.4377).
Conclusion: Insufficient evidence that the true mean is more than 368.
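The test statistic computation can be sketched directly (the critical value 2.4377 is taken from the slide's t table, df = 35, α = .01):

```python
from math import sqrt

xbar, mu0, s, n = 372.5, 368, 15, 36
t = (xbar - mu0) / (s / sqrt(n))   # 4.5 / 2.5 = 1.80
t_crit = 2.4377                    # upper-tail critical value from the slide
reject = t > t_crit                # False: do not reject H0
```

Since 1.80 does not exceed 2.4377, the decision matches the slide: do not reject at α = .01.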
p-Value Solution
The p-value is between .025 and .05, which is greater than α = 0.01. Do Not Reject.

(Figure: on the t35 distribution, the test statistic 1.80 falls below the critical value 2.4377, in the Do Not Reject region; the rejection region has area α = 0.01)
Potential Pitfalls and Ethical Issues
 Data Collection Method is Not Randomized to Reduce Selection Biases
 Human Subjects are Treated Without Informed Consent
 Data Snooping is Used to Choose between One-Tail and Two-Tail Tests, and to Determine the Level of Significance
Potential Pitfalls and Ethical Issues (continued)
 Data Cleansing is Practiced to Hide Observations that Do Not Support a Stated Hypothesis
 Failing to Report Pertinent Findings
One-Way Analysis of Variance F Test
 Evaluate the Difference Among the Mean Responses of 2 or More (c) Populations
 E.g., Several types of tires, oven temperature settings
 Assumptions
 Samples are randomly and independently drawn
 This condition must be met
 Populations are normally distributed
 F Test is robust to moderate departures from normality
 Populations have equal variances
 Less sensitive to this requirement when samples are of equal size from each population
Why ANOVA?
 Could Compare the Means One by One using Z or t Tests for Difference of Means
 Each Z or t Test Contains Type I Error
 The Total Type I Error with k Pairs of Means is 1 − (1 − α)^k
 E.g., If there are 5 means and we use α = .05
 Must perform 10 comparisons
 Type I Error is 1 − (.95)^10 = .40
 40% of the time you will reject the null hypothesis of equal means in favor of the alternative when the null is true!
Hypotheses of One-Way ANOVA
 H0: μ1 = μ2 = … = μc
 All population means are equal
 No treatment effect (no variation in means among groups)
 H1: Not all μi are the same
 At least one population mean is different (others may be the same!)
 There is a treatment effect
 This does not mean that all population means are different
One-Way ANOVA (No Treatment Effect)
H0: μ1 = μ2 = … = μc    H1: Not all μi are the same
The Null Hypothesis is True
(Figure: the population distributions coincide: μ1 = μ2 = μ3)

One-Way ANOVA (Treatment Effect Present)
H0: μ1 = μ2 = … = μc    H1: Not all μi are the same
The Null Hypothesis is NOT True
(Figure: the population distributions are shifted: μ1 = μ2 ≠ μ3, or μ1 ≠ μ2 ≠ μ3)
One-Way ANOVA (Partition of Total Variation)

Total Variation SST = Variation Due to Group SSA + Variation Due to Random Sampling SSW

SSA is commonly referred to as: Among Group Variation, Sum of Squares Among, Sum of Squares Between, Sum of Squares Model, Sum of Squares Explained, Sum of Squares Treatment.
SSW is commonly referred to as: Within Group Variation, Sum of Squares Within, Sum of Squares Error, Sum of Squares Unexplained.
Total Variation

SST = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (X_ij − X̿)²

X_ij: the i-th observation in group j
n_j: the number of observations in group j
n: the total number of observations in all groups
c: the number of groups
X̿ = (Σ_{j=1}^{c} Σ_{i=1}^{n_j} X_ij) / n, the overall or grand mean
Total Variation (continued)

SST = (X_11 − X̿)² + (X_21 − X̿)² + … + (X_{n_c c} − X̿)²

(Figure: response X plotted for Group 1, Group 2 and Group 3, with deviations measured from the grand mean)
One-Way ANOVA F Test Statistic
 Test Statistic

F = MSA / MSW

 MSA is the mean square among
 MSW is the mean square within
 Degrees of Freedom
 df1 = c − 1
 df2 = n − c
One-Way ANOVA Summary Table

Source of       Degrees of   Sum of            Mean Squares        F Statistic
Variation       Freedom      Squares           (Variance)
Among (Factor)  c − 1        SSA               MSA = SSA/(c − 1)   MSA/MSW
Within (Error)  n − c        SSW               MSW = SSW/(n − c)
Total           n − 1        SST = SSA + SSW
Features of One-Way ANOVA F Statistic
 The F Statistic is the Ratio of the Among Estimate
of Variance and the Within Estimate of Variance
 The ratio must always be positive
 df1 = c -1 will typically be small
 df2 = n - c will typically be large
 The Ratio Should be Close to 1 if the Null is True
 If the Null Hypothesis is False
 The numerator should be greater than the denominator
 The ratio should be larger than 1
One-Way ANOVA F Test Example
As Production Manager, you want to see if 3 filling machines have different mean filling times. You assign 15 similarly trained & experienced workers, 5 per machine, to the machines. At the .05 significance level, is there a difference in mean filling times?

Machine1  Machine2  Machine3
25.40     23.40     20.00
26.31     21.80     22.20
24.10     23.50     19.75
23.74     22.75     20.60
25.10     21.60     20.40
One-Way ANOVA Example: Scatter Diagram
(Figure: filling times for the three machines plotted on a scale from 19 to 27, with group means X̄1 = 24.93, X̄2 = 22.61, X̄3 = 20.59 and grand mean X̿ = 22.71)
One-Way ANOVA Example Computations

Machine1  Machine2  Machine3
25.40     23.40     20.00
26.31     21.80     22.20
24.10     23.50     19.75
23.74     22.75     20.60
25.10     21.60     20.40

X̄1 = 24.93, X̄2 = 22.61, X̄3 = 20.59, X̿ = 22.71, n_j = 5, c = 3, n = 15

SSA = 5[(24.93 − 22.71)² + (22.61 − 22.71)² + (20.59 − 22.71)²] = 47.164
SSW = 4.2592 + 3.112 + 3.682 = 11.0532
MSA = SSA/(c − 1) = 47.164/2 = 23.5820
MSW = SSW/(n − c) = 11.0532/12 = .9211
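The computations above can be sketched as a short Python program over the machine data; this reproduces SSA, SSW and the F statistic from first principles:

```python
groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],   # Machine 1
    [23.40, 21.80, 23.50, 22.75, 21.60],   # Machine 2
    [20.00, 22.20, 19.75, 20.60, 20.40],   # Machine 3
]
n = sum(len(g) for g in groups)                 # 15
c = len(groups)                                 # 3
grand = sum(sum(g) for g in groups) / n         # grand mean, 22.71
means = [sum(g) / len(g) for g in groups]       # 24.93, 22.61, 20.59

# among-group and within-group sums of squares
ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
msa = ssa / (c - 1)
msw = ssw / (n - c)
f_stat = msa / msw   # about 25.6
```

The computed F of roughly 25.6 matches the slide's value and far exceeds the critical value used in the next slide.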
One-Way ANOVA Example Solution
H0: μ1 = μ2 = μ3    H1: Not All Equal
α = .05, df1 = 2, df2 = 12
Critical Value: F = 3.89

Test Statistic:
F = MSA/MSW = 23.5820/.9211 = 25.6

Decision: Reject at α = 0.05 (25.6 > 3.89).
Conclusion: There is evidence that at least one μi differs from the rest.
Two-Way ANOVA

 Examines the Effect of:


 Two factors on the dependent variable
 E.g., Percent carbonation and line speed on soft
drink bottling process
 Interaction between the different levels of these
two factors
 E.g., Does the effect of one particular percentage of
carbonation depend on which level the line speed is
set?
Two-Way ANOVA (continued)

 Assumptions
 Normality
 Populations are normally distributed
 Homogeneity of Variance
 Populations have equal variances
 Independence of Errors
 Independent random samples are drawn
Two-Way ANOVA Total Variation Partitioning

Total Variation SST (d.f. = n − 1) =
 Variation Due to Factor A: SSA (d.f. = r − 1)
 + Variation Due to Factor B: SSB (d.f. = c − 1)
 + Variation Due to Interaction: SSAB (d.f. = (r − 1)(c − 1))
 + Variation Due to Random Sampling: SSE (d.f. = rc(n′ − 1))
Two-Way ANOVA Total Variation Partitioning (continued)

r = the number of levels of factor A
c = the number of levels of factor B
n′ = the number of values (replications) for each cell
n = the total number of observations in the experiment
X_ijk = the value of the k-th observation for level i of factor A and level j of factor B
Features of Two-Way ANOVA F Test
 Degrees of Freedom Always Add Up
 rcn′ − 1 = rc(n′ − 1) + (c − 1) + (r − 1) + (c − 1)(r − 1)
 Total = Error + Column + Row + Interaction
 The Denominator of the F Test is Always the Same but the Numerator is Different
 The Sums of Squares Always Add Up
 Total = Error + Column + Row + Interaction
Purpose of Regression Analysis

 Regression Analysis is Used Primarily to Model


Causality and Provide Prediction
 Predict the values of a dependent (response)
variable based on values of at least one
independent (explanatory) variable
 Explain the effect of the independent variables on
the dependent variable
Types of Regression Models
(Figure: scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship)
Simple Linear Regression Model

 Relationship between Variables is Described


by a Linear Function
 The Change of One Variable Causes the Other
Variable to Change
 A Dependency of One Variable on the Other
Simple Linear Regression Model (continued)
The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other:

Y_i = β0 + β1 X_i + ε_i

Y_i: dependent (response) variable
X_i: independent (explanatory) variable
β0: population Y intercept
β1: population slope coefficient
ε_i: random error
μ_{Y|X} = β0 + β1 X: population regression line (conditional mean)
Simple Linear Regression Model (continued)
(Figure: an observed value of Y is Y_i = β0 + β1 X_i + ε_i; the random error ε_i is the vertical distance from the observed value to the population regression line μ_{Y|X} = β0 + β1 X_i)
Linear Regression Equation
The sample regression line provides an estimate of the population regression line as well as a predicted value of Y:

Y_i = b0 + b1 X_i + e_i

b0: sample Y intercept
b1: sample slope coefficient
e_i: residual

Ŷ = b0 + b1 X (Fitted Regression Line, Predicted Value): the simple regression equation
Linear Regression Equation (continued)
 b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:

Σ_{i=1}^{n} (Y_i − Ŷ_i)² = Σ_{i=1}^{n} e_i²

 b0 provides an estimate of β0
 b1 provides an estimate of β1
Linear Regression Equation (continued)
(Figure: observed values Y_i = b0 + b1 X_i + e_i scattered about the fitted line Ŷ_i = b0 + b1 X_i, which estimates the population line μ_{Y|X} = β0 + β1 X_i; the residual e_i is the vertical distance from the observed value to the fitted line)
Interpretation of the Slope and Intercept
 β0 = E(Y|X = 0) is the average value of Y when the value of X is zero
 β1 = (change in E(Y|X)) / (change in X) measures the change in the average value of Y as a result of a one-unit change in X
Interpretation of the Slope and Intercept (continued)
 b0 = Ê(Y|X = 0) is the estimated average value of Y when the value of X is zero
 b1 = (change in Ê(Y|X)) / (change in X) is the estimated change in the average value of Y as a result of a one-unit change in X
Simple Linear Regression: Example
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage. Sample data for 7 stores were obtained. Find the equation of the straight line that fits the data best.

Store  Square Feet  Annual Sales ($000)
1      1,726        3,681
2      1,542        3,395
3      2,816        6,653
4      5,555        9,543
5      1,292        3,318
6      2,208        5,563
7      1,313        3,760
Scatter Diagram: Example
(Figure: scatter plot of annual sales ($000), 0 to 12,000, against square feet, 0 to 6,000)

Excel Output
(Figure: Excel regression output for the produce stores data)
Simple Linear Regression Equation: Example

Ŷ_i = b0 + b1 X_i = 1636.415 + 1.487 X_i

From Excel Printout:
Coefficients
Intercept       1636.414726
X Variable 1    1.486633657
Interpretation of Results: Example

Ŷ_i = 1636.415 + 1.487 X_i

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The equation estimates that for each increase of 1 square foot in the size of the store, expected annual sales are predicted to increase by $1,487.
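The least-squares coefficients Excel reports can be recomputed directly from the formulas b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and b0 = Ȳ − b1 X̄. A minimal sketch using the 7-store data:

```python
sqft  = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # annual sales, $000

n = len(sqft)
xbar = sum(sqft) / n
ybar = sum(sales) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(sqft, sales))
sxx = sum((x - xbar) ** 2 for x in sqft)

b1 = sxy / sxx            # slope, about 1.4866
b0 = ybar - b1 * xbar     # intercept, about 1636.41
```

This reproduces the Excel printout's coefficients (1636.414726 and 1.486633657) to rounding.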
Residual Analysis

 Purposes
 Examine linearity
 Evaluate violations of assumptions
 Graphical Analysis of Residuals
 Plot residuals vs. X and time
Residual Analysis for Linearity
(Figure: scatter plots of Y vs. X and residuals e vs. X; a curved residual pattern indicates the relationship is not linear, while a patternless band around zero indicates it is linear)
Residual Analysis for Homoscedasticity
(Figure: plots of Y vs. X and standardized residuals SR vs. X; a fan-shaped residual pattern indicates heteroscedasticity, while a constant-width band indicates homoscedasticity)
Residual Analysis: Excel Output for Produce Stores Example

Observation  Predicted Y   Residuals
1            4202.344417   -521.3444173
2            3928.803824   -533.8038245
3            5822.775103    830.2248971
4            9894.664688   -351.6646882
5            3557.14541    -239.1454103
6            4918.90184     644.0981603
7            3588.364717    171.6352829

(Figure: residual plot against square feet, 0 to 6,000)
Residual Analysis for Independence
 The Durbin-Watson Statistic
 Used when data is collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)
 Measures violation of the independence assumption

D = Σ_{i=2}^{n} (e_i − e_{i−1})² / Σ_{i=1}^{n} e_i²

Should be close to 2. If not, examine the model for autocorrelation.
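The Durbin-Watson formula can be applied to the produce stores residuals from the earlier Excel output. The slides do not report D for this data (and the stores are not a time series, so this is purely an arithmetic illustration):

```python
# residuals from the produce stores Excel output, in observation order
e = [-521.3444173, -533.8038245, 830.2248971, -351.6646882,
     -239.1454103, 644.0981603, 171.6352829]

num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
den = sum(ei ** 2 for ei in e)
D = num / den   # a value near 2 suggests little autocorrelation
```

D always falls between 0 and 4; values near 0 suggest positive autocorrelation and values near 4 suggest negative autocorrelation.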
Sample Observations from Various r Values
(Figure: scatter plots of Y vs. X for r = −1, r = −.6, r = 0, r = .6 and r = 1)
Features of rand r

 Unit Free
 Range between -1 and 1
 The Closer to -1, the Stronger the Negative
Linear Relationship
 The Closer to 1, the Stronger the Positive
Linear Relationship
 The Closer to 0, the Weaker the Linear
Relationship
t Test for Correlation
 Hypotheses
 H0: ρ = 0 (no correlation)
 H1: ρ ≠ 0 (correlation)
 Test Statistic

t = (r − ρ) / √((1 − r²)/(n − 2))

where

r = √r² = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / √(Σ_{i=1}^{n} (X_i − X̄)² Σ_{i=1}^{n} (Y_i − Ȳ)²)
Example: Produce Stores
Is there any evidence of a linear relationship between the annual sales of a store and its square footage at the .05 level of significance?

From Excel Printout:
Regression Statistics
Multiple R           0.9705572
R Square             0.94198129
Adjusted R Square    0.93037754
Standard Error       611.751517
Observations         7

H0: ρ = 0 (no association)
H1: ρ ≠ 0 (association)
α = .05, df = 7 − 2 = 5
Example: Produce Stores Solution

t = (r − ρ) / √((1 − r²)/(n − 2)) = .9706 / √((1 − .9420)/5) = 9.0099

Critical Value(s): ±2.5706 (.025 in each tail)
Decision: Reject H0.
Conclusion: There is evidence of a linear relationship at the 5% level of significance.
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient.
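The statistic is quick to recompute; using the full-precision Multiple R from the printout (0.9705572) recovers the slide's 9.0099, and the critical value ±2.5706 is the slide's t table value for df = 5, α = .05 two-tail:

```python
from math import sqrt

r, n = 0.9705572, 7
t = r / sqrt((1 - r ** 2) / (n - 2))   # about 9.01
t_crit = 2.5706                        # two-tail critical value from the slide
reject = abs(t) > t_crit               # True: evidence of a linear relationship
```

Since |t| far exceeds 2.5706, H0: ρ = 0 is rejected, matching the slide's decision.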
Estimation of Mean Values
Confidence Interval Estimate for μ_{Y|X=X_i}: The Mean of Y Given a Particular X_i

Ŷ_i ± t_{n−2} S_{YX} √(1/n + (X_i − X̄)² / Σ_{i=1}^{n} (X_i − X̄)²)

 t value from table with df = n − 2
 S_{YX}: standard error of the estimate
 The size of the interval varies according to the distance of X_i from the mean X̄
Prediction of Individual Values
Prediction Interval for Individual Response Y_i at a Particular X_i

Ŷ_i ± t_{n−2} S_{YX} √(1 + 1/n + (X_i − X̄)² / Σ_{i=1}^{n} (X_i − X̄)²)

The addition of 1 increases the width of the interval from that for the mean of Y.
Interval Estimates for Different Values of X
(Figure: the prediction interval for an individual Y_i is wider than the confidence interval for the mean of Y; both widen as the given X moves away from X̄)
Example: Produce Stores
Data for 7 Stores:

Store  Square Feet  Annual Sales ($000)
1      1,726        3,681
2      1,542        3,395
3      2,816        6,653
4      5,555        9,543
5      1,292        3,318
6      2,208        5,563
7      1,313        3,760

Consider a store with 2,000 square feet.
Regression Model Obtained: Ŷ_i = 1636.415 + 1.487 X_i
Estimation of Mean Values: Example
Confidence Interval Estimate for μ_{Y|X=X_i}
Find the 95% confidence interval for the average annual sales for stores of 2,000 square feet.

Predicted Sales: Ŷ_i = 1636.415 + 1.487 X_i = 4610.45 ($000)
X̄ = 2350.29, S_{YX} = 611.75, t_{n−2} = t_5 = 2.5706

Ŷ_i ± t_{n−2} S_{YX} √(1/n + (X_i − X̄)² / Σ(X_i − X̄)²) = 4610.45 ± 612.66

3997.02 ≤ μ_{Y|X=X_i} ≤ 5222.34
What is a Time Series?

 Numerical Data Obtained at Regular Time Intervals
 The Time Intervals Can Be Annually, Quarterly, Daily, Hourly, Etc.
 Example:
Year:  1994 1995 1996 1997 1998
Sales: 75.3 74.2 78.5 79.7 80.2
Time-Series Components
 Trend
 Cyclical
 Seasonal
 Random
Trend Component
 Overall Upward or Downward Movement
 Data Taken Over a Period of Years
(Figure: sales over time showing a long-run trend)
Cyclical Component
 Upward or Downward Swings
 May Vary in Length
 Usually Lasts 2 - 10 Years
(Figure: sales over time showing recurring cycles)
Seasonal Component
 Upward or Downward Swings
 Regular Patterns
 Observed Within 1 Year
(Figure: sales by month or quarter, peaking in summer and dipping in winter)
Random or Irregular Component

 Erratic, Nonsystematic, Random, “Residual”


Fluctuations
 Due to Random Variations of
 Nature
 Accidents
 Short Duration and Non-Repeating
Example: Quarterly Retail Sales with Seasonal Components
(Figure: quarterly sales, 0 to 25, over roughly 32 periods, showing a repeating seasonal pattern)

Example: Quarterly Retail Sales with Seasonal Components Removed
(Figure: the same series Y(t) after removing seasonal components, leaving a smoother underlying pattern)
Multiplicative Time-Series Model
 Used Primarily for Forecasting
 Observed Value in Time Series is the Product of Components
 For Annual Data: Y_i = T_i C_i I_i
 For Quarterly or Monthly Data: Y_i = T_i S_i C_i I_i
where T_i = Trend, C_i = Cyclical, S_i = Seasonal, I_i = Irregular
Moving Averages

 Used for Smoothing


 Series of Arithmetic Means Over Time
 Result Dependent Upon Choice of L (Length of
Period for Computing Means)
 To Smooth Out Cyclical Component, L Should
Be Multiple of the Estimated Average Length
of the Cycle
 For Annual Time Series, L Should Be Odd
Moving Averages (continued)
 Example: 3-Year Moving Average
 First average: MA(3) = (Y1 + Y2 + Y3)/3
 Second average: MA(3) = (Y2 + Y3 + Y4)/3
Moving Average Example
John is a building contractor with a record of a total of 24 single family homes constructed over a 6-year period. Provide John with a 3-year moving average graph.

Year  Units  Moving Average
1994  2      NA
1995  5      3
1996  2      3
1997  2      3.67
1998  7      5
1999  6      NA
Moving Average Example Solution

Year  Response  Moving Average (L = 3)
1994  2         NA
1995  5         3
1996  2         3
1997  2         3.67
1998  7         5
1999  6         NA

(Figure: units built, 0 to 8, from 1994 to 1999, with the 3-year moving average overlaid)
No MA for the first and last (L − 1)/2 years.
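The table above can be sketched as a centered moving average in a few lines; the `None` entries mirror the "NA" rows for the first and last (L − 1)/2 years:

```python
units = [2, 5, 2, 2, 7, 6]   # homes built, 1994-1999
L = 3
half = (L - 1) // 2

# centered moving average: undefined for the first and last `half` years
ma = [None] * half + [
    sum(units[i - half:i + half + 1]) / L
    for i in range(half, len(units) - half)
] + [None] * half
# ma -> [None, 3.0, 3.0, 3.67 (approx.), 5.0, None]
```

The computed values (3, 3, 3.67, 5) match the slide's table.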
Exponential Weight: Example

E_i = W Y_i + (1 − W) E_{i−1}

Year  Response  Smoothing Value (W = .2, 1 − W = .8)   Forecast
1994  2         2                                       NA
1995  5         (.2)(5) + (.8)(2) = 2.6                 2
1996  2         (.2)(2) + (.8)(2.6) = 2.48              2.6
1997  2         (.2)(2) + (.8)(2.48) = 2.384            2.48
1998  7         (.2)(7) + (.8)(2.384) = 3.307           2.384
1999  6         (.2)(6) + (.8)(3.307) = 3.846           3.307
Exponential Weight: Example Graph
(Figure: sales, 0 to 8, from 1994 to 1999, showing the raw data and the smoothed series)
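The recursion E_i = W·Y_i + (1 − W)·E_{i−1} in the table can be sketched directly, with each period's forecast being the previous period's smoothed value:

```python
units = [2, 5, 2, 2, 7, 6]   # responses for 1994-1999
W = 0.2

smoothed = [units[0]]        # E_1 = Y_1 = 2
for y in units[1:]:
    smoothed.append(W * y + (1 - W) * smoothed[-1])
# smoothed -> approximately [2, 2.6, 2.48, 2.384, 3.307, 3.846]

# the forecast for each period is the prior period's smoothed value
forecasts = [None] + smoothed[:-1]
```

The smoothed values reproduce the table: 2.6, 2.48, 2.384, 3.307, 3.846.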
Linear Trend Model
Use the method of least squares to obtain the linear trend forecasting equation:

Ŷ_i = b0 + b1 X_i

Year  Coded X  Sales (Y)
95    0        2
96    1        5
97    2        2
98    3        2
99    4        7
00    5        6
Linear Trend Model (continued)
Linear trend forecasting equation:

Ŷ_i = b0 + b1 X_i = 2.143 + .743 X_i

Excel Output:
Coefficients
Intercept       2.14285714
X Variable 1    0.74285714

(Figure: sales, 0 to 8, against coded X, 0 to 6, with the fitted trend line projected to year 2001)
The Quadratic Trend Model
Use the method of least squares to obtain the quadratic trend forecasting equation:

Ŷ_i = b0 + b1 X_i + b2 X_i²

Year  Coded X  Sales (Y)
95    0        2
96    1        5
97    2        2
98    3        2
99    4        7
00    5        6
The Quadratic Trend Model (continued)

Ŷ_i = b0 + b1 X_i + b2 X_i² = 2.857 − .33 X_i + .214 X_i²

Excel Output:
Coefficients
Intercept       2.85714286
X Variable 1    -0.3285714
X Variable 2    0.21428571

(Figure: sales, 0 to 8, against coded X, 0 to 6, with the fitted quadratic trend projected to year 2001)
The Exponential Trend Model
After taking logarithms, use the method of least squares to get the forecasting equation:

Ŷ_i = b0 b1^(X_i)   or   log Ŷ_i = log b0 + X_i log b1

Year  Coded X  Sales (Y)
95    0        2
96    1        5
97    2        2
98    3        2
99    4        7
00    5        6

Excel Output of Values in Logs:
Coefficients
Intercept       0.33583795
X Variable 1    0.08068544

antilog(0.33583795) = 2.17
antilog(0.08068544) = 1.2

Ŷ_i = (2.17)(1.2)^(X_i)
Autoregressive Modeling
 Used for Forecasting
 Takes Advantage of Autocorrelation
 1st order - correlation between consecutive values
 2nd order - correlation between values 2 periods apart
 Autoregressive Model for p-th Order:

Y_i = A0 + A1 Y_{i−1} + A2 Y_{i−2} + … + Ap Y_{i−p} + δ_i

(δ_i = random error)
Autoregressive Model: Example
The Office Concept Corp. has acquired a number of office units (in thousands of square feet) over the last 8 years. Develop the 2nd order autoregressive model.

Year  Units
93    4
94    3
95    2
96    3
97    2
98    2
99    4
00    6
Autoregressive Model: Example Solution
Develop the 2nd order table:

Year  Yi  Yi-1  Yi-2
93    4   ---   ---
94    3   4     ---
95    2   3     4
96    3   2     3
97    2   3     2
98    2   2     3
99    4   2     2
00    6   4     2

Use Excel to estimate a regression model. Excel Output:
Coefficients
Intercept       3.5
X Variable 1    0.8125
X Variable 2    -0.9375

Ŷ_i = 3.5 + .8125 Y_{i−1} − .9375 Y_{i−2}
Autoregressive Model Example: Forecasting
Use the 2nd order model to forecast the number of units for 2001:

Ŷ_i = 3.5 + .8125 Y_{i−1} − .9375 Y_{i−2}
Ŷ_2001 = 3.5 + .8125 Y_2000 − .9375 Y_1999
       = 3.5 + .8125(6) − .9375(4)
       = 4.625
Autoregressive Modeling Steps
1. Choose p: Note that df = n − 2p − 1
2. Form a Series of “Lag Predictor” Variables Yi-1, Yi-2, …, Yi-p
3. Use Excel to Run a Regression Model Using All p Variables
4. Test the Significance of Ap
 If the null hypothesis is rejected, this model is selected
 If the null hypothesis is not rejected, decrease p by 1 and repeat
