Reference Card (PDQ Sheet)

STAT 2607 - Business Statistics II


School of Mathematics and Statistics, Carleton University

Week 1: Statistical Inference for One and Two Population Variances
Chapter 10: 10.6 - 10.7 (Test of One Population Variance)
• Chi-square distribution is used to test one population variance σ².
• There is no direct test for the population standard deviation σ, so any test about σ is converted into an equivalent test about the variance σ², as follows:

• Hypotheses:

(i) H0 : σ² = σ0² vs HA : σ² ≠ σ0² (Two-tailed Test)
(ii) H0 : σ² ≤ σ0² vs HA : σ² > σ0² (Right-tailed Test)
(iii) H0 : σ² ≥ σ0² vs HA : σ² < σ0² (Left-tailed Test)

• Test Statistic: χ² = (n − 1)s² / σ0²
• Rejection Rule (Critical Value Approach):

(i) Reject H0 if χ² > χ²α/2 or χ² < χ²1−α/2 (Two-tailed Test)
(ii) Reject H0 if χ² > χ²α (Right-tailed Test)
(iii) Reject H0 if χ² < χ²1−α (Left-tailed Test)

• Rejection Rule (p-value Approach): If p-value < α, reject H0 ; otherwise, if p-value ≥ α, do not reject H0 (Note: in most cases, you need to approximate the p-value from the chi-square table, as the table does not have enough detail)

• Rejection Rule (Confidence Interval Approach, for two-tailed test only): If the confidence interval ((n − 1)s²/χ²U , (n − 1)s²/χ²L) contains σ0², do not reject H0 ; otherwise, reject H0
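
As a quick check of the hand calculation, here is a minimal Python/SciPy sketch of this test; the sample data and hypothesized variance σ0² below are illustrative assumptions, not from the course:

import numpy as np
from scipy import stats

x = np.array([12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9])  # illustrative sample
sigma0_sq = 0.04   # hypothesized population variance (an assumption for this sketch)
alpha = 0.05

n = len(x)
s_sq = np.var(x, ddof=1)               # sample variance s^2
chi_sq = (n - 1) * s_sq / sigma0_sq    # test statistic

# Two-tailed critical values from the chi-square distribution with df = n - 1
lower = stats.chi2.ppf(alpha / 2, df=n - 1)
upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
# Two-tailed p-value: twice the smaller tail area
p = 2 * min(stats.chi2.cdf(chi_sq, n - 1), stats.chi2.sf(chi_sq, n - 1))

print(f"chi2 = {chi_sq:.3f}, reject H0: {chi_sq < lower or chi_sq > upper}, p = {p:.4f}")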

Chapter 11: 11.4 - 11.5 (Test of Two Population Variances)
• F distribution is used to test two population variances σ1², σ2².

• The F distribution is right-skewed and depends on two parameters: the numerator degrees of freedom df1 and the denominator degrees of freedom df2.

• The test always uses the upper critical region. If the test is two-tailed, the upper tail area is α/2; if the test is one-tailed (left-tailed or right-tailed), the upper tail area is α.

• Hypotheses:

(i) H0 : σ1² = σ2² vs HA : σ1² ≠ σ2² (Two-tailed Test)
(ii) H0 : σ1² ≤ σ2² vs HA : σ1² > σ2² (Right-tailed Test)
(iii) H0 : σ1² ≥ σ2² vs HA : σ1² < σ2² (Left-tailed Test)

• Test Statistic:

(i) Two-tailed test (HA : σ1² ≠ σ2²): F = (larger of s1², s2²) / (smaller of s1², s2²)
(df1 = the size of the sample having the larger variance − 1,
df2 = the size of the sample having the smaller variance − 1)
(ii) Right-tailed test (HA : σ1² > σ2²): F = s1² / s2² (df1 = n1 − 1, df2 = n2 − 1)
(iii) Left-tailed test (HA : σ1² < σ2²): F = s2² / s1² (df1 = n2 − 1, df2 = n1 − 1)
• Rejection Rule: (Critical Value Approach):

(i) Reject H0 if F > Fα/2 (Two-tailed Test)
(ii) Reject H0 if F > Fα (Right-tailed Test)
(iii) Reject H0 if F > Fα (Left-tailed Test)

• Rejection Rule (p-value Approach): If p-value < α, reject H0 ; otherwise, if p-value ≥ α, do not reject H0 (Note: in most cases, you need to approximate the p-value from the F table, as the table does not have enough detail)
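
A minimal SciPy sketch of the two-tailed version of this test, with illustrative sample data:

import numpy as np
from scipy import stats

x1 = np.array([21.0, 23.5, 19.8, 22.1, 24.0, 20.6])
x2 = np.array([18.2, 18.9, 19.5, 18.4, 19.1, 18.7, 19.0])
alpha = 0.05

s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
# Two-tailed form: larger sample variance goes in the numerator
F = max(s1, s2) / min(s1, s2)
df1 = (len(x1) if s1 >= s2 else len(x2)) - 1   # sample with the larger variance
df2 = (len(x2) if s1 >= s2 else len(x1)) - 1

crit = stats.f.ppf(1 - alpha / 2, df1, df2)    # upper alpha/2 critical value
p = 2 * stats.f.sf(F, df1, df2)                # two-tailed p-value
print(f"F = {F:.3f}, critical = {crit:.3f}, p = {p:.4f}")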

Week 1: Experimental Design and One-way ANOVA
Chapter 12: 12.1 - 12.1 (One-way ANOVA or Completely Random-
ized Design (Parametric Test))
• Preliminaries:

(i) Observational Study: Researcher simply observes the subjects without interfering
and the data are collected without particular design. Cause and effect relationship
cannot be established.
(ii) Experimental Study: Experimenter manipulates the attributes of study partici-
pants and observes the consequences. Cause and effect relationship can be estab-
lished.
(iii) Experimental unit: The object on which a measurement is taken.
(iv) Factor: The variable whose effect is to be studied.
(v) Level: The values (or intensities) of the factor.
(vi) Treatment: A combination of factor levels.
(vii) Response: The outcome of the treatment, measured by the experimenter.

• Assumptions (if not met, perform nonparametric test):

(i) All populations are normally distributed.


(ii) The population variances are equal.
(iii) The observations are independent.
(iv) The data are measured on interval or ratio scale (i.e. quantitative).

• Notations:

(i) x̄i : Average of the i-th treatment, i = 1, · · · , p


(ii) si : Standard deviation of the i-th treatment, i = 1, · · · , p
(iii) ni : Sample size of the i-th treatment, i = 1, · · · , p
(iv) p = Number of treatments

• Calculation of Sum of Squares:

(i) SSTO = Σi Σj (xij − x̄)² = (n − 1)s², summing over i = 1, …, p and j = 1, …, ni (Note: obtain s² from the calculator)
(ii) SST = Σi ni (x̄i − x̄)² (Note: obtain x̄ from the calculator and calculate the x̄i manually)
(iii) SSE = SSTO − SST

• ANOVA Table:

• Hypotheses:

H0 : µ1 = µ2 = · · · = µp
HA : At least two of µ1 , µ2 · · · µp differ
• Test Statistic: F = MST / MSE
• Rejection Rule: (Critical Value Approach):

Reject H0 if F > Fα with df1 = p − 1, df2 = n − p

• Rejection Rule: (p-value Approach): If p-value < α, reject H0 ; Otherwise, if p-value


≥ α, do not reject H0

• Post Hoc test: Tukey's Simultaneous Confidence Intervals (only if H0 is rejected)

– Number of pairs to be compared: C(p, 2) = p! / (2!(p − 2)!)
– Point estimate for the mean difference: x̄i − x̄h , where i and h designate two different treatments
– (i) When sample sizes are unequal: (x̄i − x̄h) ± (qα/√2) √(MSE (1/ni + 1/nh))
(ii) When sample sizes are equal (ni = nh = m): (x̄i − x̄h) ± qα √(MSE/m)
(iii) Critical value qα : qα is the upper α percentage point of the studentized range table for p (which is r) and (n − p) (which is v)
(iv) Decision: If the confidence interval contains 0, the difference between the population treatment means is NOT significant. If the confidence interval does NOT contain 0, the difference is significant.
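
A minimal SciPy sketch of the one-way ANOVA F test followed by Tukey's post hoc comparisons (scipy.stats.tukey_hsd needs SciPy 1.8 or newer); the three samples are illustrative:

import numpy as np
from scipy import stats

g1 = np.array([24.1, 22.8, 25.3, 23.9, 24.6])
g2 = np.array([20.4, 21.7, 19.9, 21.0, 20.8])
g3 = np.array([23.0, 24.2, 22.5, 23.8, 23.3])

F, p = stats.f_oneway(g1, g2, g3)
print(f"F = {F:.3f}, p = {p:.4f}")

# Post hoc only if H0 is rejected: a pair of treatments differs significantly
# if its Tukey confidence interval excludes 0.
if p < 0.05:
    res = stats.tukey_hsd(g1, g2, g3)
    print(res)
    print(res.confidence_interval(confidence_level=0.95))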

Week 2: One-way ANOVA (Nonparametric Test) + Randomized Block Design (Parametric + Nonparametric)
Chapter 18: 18.4 (Kruskal-Wallis H Test: A Nonparametric Test
for Comparing Several Population Means)
• When to use a nonparametric test? Nonparametric tests (or distribution-free methods) are used when the assumptions of parametric tests (equal variances, normal distributions) are not fulfilled

• Efficiency of parametric vs nonparametric tests:

(i) When the usual assumptions are fulfilled, nonparametric tests are about 95% as efficient as their parametric equivalents
(ii) When the normality and common variance assumptions are not satisfied, nonparametric procedures are more efficient than their parametric equivalents

• Step 1: Given p independent samples (n1 , n2 , · · · , np ≥ 5) from p populations

• Step 2: Rank the combined sample from smallest to largest (average ranks for ties)
(Sort −→ Potential (or Initial) Rank −→ Final Rank: Use EXCEL for ranking)

• Step 3: Find T1 : The sum of ranks for sample 1, T2 : The sum of ranks for sample 2,
· · · , Tp : The sum of ranks for sample p where p is the number of populations to be
compared

• Step 4: Hypotheses

(i) H0 : The p populations are identical
(ii) HA : At least two of the populations differ in location

• Test Statistic: H = [12 / (n(n + 1))] Σ (Ti² / ni) − 3(n + 1), where n = n1 + n2 + · · · + np

• Rejection Rule (Critical Value Approach): Reject H0 if H > χ²α , where χ²α is the upper-α critical value of the chi-square distribution with (p − 1) degrees of freedom.

• Rejection Rule (p-value Approach): If p-value (exact p-value: MINITAB, approximate


p-value based on chi-square table) < α, reject H0 ; Otherwise, if p-value ≥ α, do not
reject H0
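
A minimal SciPy sketch of the H test on illustrative data (SciPy handles the ranking and the tie correction internally):

from scipy import stats

g1 = [27, 31, 29, 35, 33]
g2 = [22, 25, 24, 28, 26]
g3 = [30, 36, 34, 38, 32]

H, p = stats.kruskal(g1, g2, g3)     # ties get average ranks automatically
print(f"H = {H:.3f}, p = {p:.4f}")   # p from chi-square with p - 1 = 2 df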

Chapter 12: 12.3 (Randomized Block Design (RBD)) (Parametric
Test)
• When to use RBD? There are situations where a nuisance factor (not the primary factor) may affect the observed response in a one-way design

• Notion: Blocking technique can be used to eliminate the additional factor’s effect on
the statistical analysis of the main (or primary) factor of interest

• Factors: Treatment (primary factor), Block (nuisance factor) (In the data layout, treat-
ment and block can swap the positions of row and column)

• Notations:

(i) p : Number of treatments (primary factor)
(ii) b : Number of blocks
(iii) x̄i. : Treatment means
(iv) x̄.j : Block means
(v) x̄ : Overall mean of the b × p values
(vi) SST : Sum of Squares due to Treatments
(vii) SSB : Sum of Squares due to Blocks
(viii) SSTO : Total Sum of Squares
(ix) SSE : Sum of Squares due to Error
(x) MST : Mean Square due to Treatments
(xi) MSB : Mean Square due to Blocks
(xii) MSE : Mean Square due to Error

• Calculation of Sum of Squares:


(i) SSTO = Σi Σj (xij − x̄)² = (n − 1)s², summing over i = 1, …, p and j = 1, …, b, where n = bp (Note: obtain s² from the calculator)
(ii) SST = b Σi (x̄i. − x̄)² = Number of blocks × between-treatment-means variability (Note: obtain x̄ from the calculator and calculate the x̄i. manually)
(iii) SSB = p Σj (x̄.j − x̄)² = Number of treatments × between-block-means variability (Note: obtain x̄ from the calculator and calculate the x̄.j manually)
(iv) SSE = SSTO − SST − SSB

• ANOVA Table:

• Test of Treatment (Primary Factor) Means:

– Hypotheses
H0 : µ1 = µ2 = · · · = µp
HA : At least two of µ1 , µ2 , · · · , µp differ
– Test Statistic: F = MST / MSE
– Rejection Rule (Critical Value Approach):
Reject H0 if F > Fα with df1 = (p − 1), df2 = (p − 1)(b − 1)
– Rejection Rule (p-value Approach): If p-value < α, reject H0 ; otherwise, if p-value ≥ α, do not reject H0

• Test of Block (Nuisance Factor) Means:

– Hypotheses:
H0 : µb1 = µb2 = · · · = µbb (all block means are equal)
HA : At least two block effects (i.e. block means) differ
– Test Statistic: F = MSB / MSE
– Rejection Rule (Critical Value Approach):
Reject H0 if F > Fα with df1 = (b − 1), df2 = (p − 1)(b − 1)
– Rejection Rule (p-value Approach): If p-value < α, reject H0 ; otherwise, if p-value ≥ α, do not reject H0

• Post Hoc test: Tukey's Simultaneous Confidence Intervals (only if H0 is rejected)

– Number of pairs to be compared: C(p, 2) = p! / (2!(p − 2)!)
– Point estimate for the mean difference: x̄i. − x̄h. , where i and h designate two different treatments
– Confidence Interval: (x̄i. − x̄h.) ± qα √(MSE/b)
– Critical value qα : qα is the upper α percentage point of the studentized range table for p (which is r) and (p − 1)(b − 1) (which is v)
– Decision: If the confidence interval contains 0, the difference between the population treatment means is NOT significant. If the confidence interval does NOT contain 0, the difference is significant.
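
A minimal NumPy/SciPy sketch of the two RBD F tests computed from the sum-of-squares formulas above; the data layout (rows = treatments, columns = blocks) and values are illustrative:

import numpy as np
from scipy import stats

x = np.array([[31.0, 29.5, 30.8, 32.1],    # treatment 1 across 4 blocks
              [28.2, 27.9, 28.8, 29.4],    # treatment 2
              [33.5, 32.0, 33.1, 34.2]])   # treatment 3
p, b = x.shape
grand = x.mean()

sst = b * np.sum((x.mean(axis=1) - grand) ** 2)   # treatments
ssb = p * np.sum((x.mean(axis=0) - grand) ** 2)   # blocks
ssto = np.sum((x - grand) ** 2)
sse = ssto - sst - ssb

mse = sse / ((p - 1) * (b - 1))
F_trt = (sst / (p - 1)) / mse
F_blk = (ssb / (b - 1)) / mse
print(f"Treatments: F = {F_trt:.3f}, p = {stats.f.sf(F_trt, p - 1, (p - 1) * (b - 1)):.4f}")
print(f"Blocks:     F = {F_blk:.3f}, p = {stats.f.sf(F_blk, b - 1, (p - 1) * (b - 1)):.4f}")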

Friedman Fr Test: A Nonparametric Test for RBD
• Step 1: Rank the p measurements within each block from 1, . . . , p after sorting from
smallest to largest (average ranks for ties) (Sort −→ Potential (or Initial) Rank −→
Final Rank: Use EXCEL for ranking)

• Step 2: Find T1 : The sum of ranks for treatment 1, T2 : The sum of ranks for treatment
2, · · · , Tp : The sum of ranks for treatment p where p is the number of treatments

• Step 3: Hypotheses
H0 : The p treatments are identical
HA : At least one treatment distribution is different

• Step 4: Test Statistic: Fr = [12 / (bp(p + 1))] Σ Ti² − 3b(p + 1)

• Step 5: Rejection Rule (Critical Value Approach): Reject H0 if Fr > χ²α , where χ²α is the upper-α critical value of the chi-square distribution with (p − 1) degrees of freedom (use the upper critical region).

• Step 5: Rejection Rule (p-value Approach): If p-value (exact p-value: MINITAB; approximate p-value based on the chi-square table) < α, reject H0 ; otherwise, if p-value ≥ α, do not reject H0
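
A minimal SciPy sketch of the Friedman test; each argument is one treatment measured across the same blocks (illustrative data):

from scipy import stats

t1 = [7.0, 6.5, 8.1, 7.4, 6.9]   # treatment 1 across 5 blocks
t2 = [8.2, 7.9, 9.0, 8.5, 8.1]   # treatment 2
t3 = [6.1, 5.9, 7.2, 6.4, 6.0]   # treatment 3

Fr, p = stats.friedmanchisquare(t1, t2, t3)
print(f"Fr = {Fr:.3f}, p = {p:.4f}")   # p from chi-square with p - 1 = 2 df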

Week 3: Chi-square Goodness-of-Fit Test and Test of
Independence
Chapter 13: 13.1 (Chi-square Goodness-of-Fit Test)
• Inference for count data

• Research Question: Does the actual distribution differ from the hypothesized model (no preference or a specific preference)? How good is the fit?

• The goodness-of-fit test compares observed frequencies with expected frequencies, where the expected frequencies are predicted by the null hypothesis

• Hypotheses:

(i) Case I: No Preference or Equal Proportions: H0 : p1 = p2 = · · · = pk = a
(where a is the probability value we want to test)
HA : At least one pi is not equal to its specified value
(ii) Case II: Specific Preference or Proportions: H0 : p1 = a1 , p2 = a2 , · · · , pk = ak
(where a1 , a2 , · · · , ak are the values we want to test)
HA : At least one pi is not equal to its specified value

• Observed Frequencies: fi (given)

• Expected Frequencies: Ei = npi

• Probability of Category i, pi : Needs to be calculated (or may sometimes be given, e.g. in the case of a Normal distribution). Review the Normal distribution (STAT 2606): how to calculate P (X > a), P (a < X < b), and P (X < b) using the Standard Normal Table.

• Conditions for Validity of Chi-square: Hierarchical Process

(i) Conservative Rule: Ei ≥ 5


(ii) Relaxed Rule:
∗ Number of groups, k ≥ 5
∗ Average of the Ei (i.e., n/k) ≥ 5
∗ Smallest Ei ≥ 1
(iii) Remedial Measure to Achieve Conservative Rule: If Ei < 5, adjacent cells need to
be meaningfully collapsed until benchmark (i.e. Ei ≥ 5) is achieved and degrees
of freedom (k − 1) should be adjusted accordingly.
• Test Statistic: χ² = Σ (fi − Ei)² / Ei

• Rejection Rule (Critical Value Approach):

Reject H0 if χ² > χ²α , where χ²α is the upper-α critical value of the chi-square distribution with (k − 1) degrees of freedom, or (k − 1 − m) degrees of freedom when m parameters must be estimated from the data (e.g. in a goodness-of-fit test for a normal distribution)

• Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if p-value


≥ α, do not reject H0
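
A minimal SciPy sketch of the goodness-of-fit test; the observed counts and hypothesized proportions are illustrative:

import numpy as np
from scipy import stats

f_obs = np.array([48, 35, 15, 22])          # observed frequencies
p0 = np.array([0.40, 0.30, 0.10, 0.20])     # hypothesized proportions (sum to 1)
f_exp = f_obs.sum() * p0                    # expected frequencies E_i = n * p_i

chi2, p = stats.chisquare(f_obs, f_exp)     # df = k - 1 by default
print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
# If m parameters were estimated from the data, pass ddof=m to adjust the df.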

Chapter 13: 13.2 (Chi-square Test of Independence or Contingency
Table Analysis)
• Inference for count data

• Research Question: Are the underlying categorical variables independent?

• A contingency table (or pivot table, or cross tabulation) compares observed frequencies with expected frequencies, where the expected frequencies are predicted by the null hypothesis. A contingency table is usually described by its dimension r × c, where r is the number of rows and c is the number of columns

• Hypotheses:
H0 : The two classifications are statistically independent (or not associated)
HA : The two classifications are statistically dependent (or associated)

• Observed Frequencies: fij (given)


• Expected Frequencies: Êij = (ri × cj) / n = (row total × column total) / grand total,
where ri is the i-th row total and cj is the j-th column total

• Conditions for Validity of Chi-square: Hierarchical Process

(i) Conservative Rule: Êij ≥ 5

(ii) Relaxed Rule:
∗ Number of cells, rc ≥ 5
∗ Average of the Êij (i.e., n/(rc)) ≥ 5
∗ Smallest Êij ≥ 1
(iii) Remedial Measure to Achieve the Conservative Rule: If Êij < 5, adjacent cells need to be meaningfully collapsed until the benchmark (i.e. Êij ≥ 5) is achieved, and the degrees of freedom (r − 1)(c − 1) should be adjusted accordingly.
• Test Statistic: χ² = Σ (fij − Êij)² / Êij
• Rejection Rule (Critical Value Approach):

Reject H0 if χ² > χ²α , where χ²α is the upper-α critical value of the chi-square distribution with (r − 1)(c − 1) degrees of freedom.

• Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if p-value


≥ α, do not reject H0
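
A minimal SciPy sketch of the test of independence on an illustrative 2 × 3 table of observed counts:

import numpy as np
from scipy import stats

table = np.array([[30, 20, 10],
                  [25, 35, 30]])
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.4f}")
print(expected)   # E_ij = row total * column total / grand total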

Week 4: Correlation and Simple Linear Regression Analysis
Chapter 14.6: Correlation
• Pearson Correlation Coefficient: Measures and describes the direction, strength and
form of relationship between two quantitative variables.

(i) Graphically: Scatter Plot (x vs. y plot)


(ii) Numerically: r = SSxy / √(SSxx SSyy), where

SSxy = Σxy − (Σx)(Σy)/n , SSxx = Σx² − (Σx)²/n , SSyy = Σy² − (Σy)²/n
• Properties:

(i) Valid range: −1 ≤ r ≤ +1


(ii) r ≈ 0 −→ Weak relationship
(iii) r = +1 −→ Perfect positive/direct/upward relationship
(iv) r = −1 −→ Perfect negative/inverse/downward relationship
(v) Interpretation of positive correlation: If X ↑, Y ↑, or X ↓ Y ↓ (The changes tend
to occur in the same direction)
(vi) Interpretation of negative correlation: If X ↑, Y ↓, or X ↓ Y ↑ (The changes tend
to occur in the opposite direction)

• Caution:

(i) Correlation and Causation: Correlation does NOT imply causation (because causal
path is asymmetric, while correlation is symmetric).
(ii) Spurious Correlation: Effect of lurking variable.
(iii) Correlation and Restricted Range of Scores: Never generalize a correlation be-
yond the sample range of data.
(iv) Correlation and Outliers: Outliers produce a disproportionately large impact on
the correlation coefficient

• Test of Population Correlation Coefficient (ρ)

• Assumptions:

(i) The data are measured on interval or ratio level
(ii) The two variables (x and y) are distributed as a Bivariate Normal distribution

• Hypotheses:

(i) H0 : ρ = 0 vs HA : ρ ≠ 0 (Two-tailed Test)
(ii) H0 : ρ ≤ 0 vs HA : ρ > 0 (Right-tailed Test)
(iii) H0 : ρ ≥ 0 vs HA : ρ < 0 (Left-tailed Test)

• Test Statistic: t = (r − ρ) / √((1 − r²)/(n − 2)) ∼ t(d.f. = n − 2)

• Rejection Rule (Critical Value Approach):

(i) Reject H0 if t > tα/2 or t < −tα/2 (Two-tailed Test)
(ii) Reject H0 if t > tα (Right-tailed Test)
(iii) Reject H0 if t < −tα (Left-tailed Test)

• Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if p-value


≥ α, do not reject H0 (Note: Exact p-value can be obtained from MINITAB, while
approximate p-value can be obtained from t table as the t table does not have enough
details)
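
A minimal SciPy sketch of the two-tailed correlation test; pearsonr returns r and the two-tailed p-value directly, and the x, y data are illustrative:

import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([65.0, 70.0, 72.0, 78.0, 80.0, 86.0])

r, p = stats.pearsonr(x, y)
n = len(x)
t = r * np.sqrt((n - 2) / (1 - r ** 2))   # same statistic as the formula above
print(f"r = {r:.4f}, t = {t:.3f}, two-tailed p = {p:.5f}")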

Chapter 14.6: Simple Linear Regression Analysis
• Origin of the word “Regression”: “Regress” means “Step back”. Here the goal is to
find the average regression line (stepping back from all data points towards ‘average’)
which goes through the middle of all points, hence the naming.

• Why Simple? Because it involves one independent (or explanatory or predictor) vari-
able X and one dependent (or response) variable Y .

• Objective:

(i) An objective method of precisely defining the best-fitting line that describes the linear relationship between X and Y .
(ii) Prediction of Y based on knowledge of X

• Regression Models and Lines:

(i) Population Model: Y = β0 + β1 X + ε, where β0 is the intercept, β1 is the slope, and ε is the error or disturbance term (a surrogate or proxy for all types of errors). Here, β0 and β1 are unknown and must be estimated from the sample.
(ii) Population Regression Line: E(Y |X) = β0 + β1 X.
(iii) Sample Regression Model: Y = b0 + b1 X + e, where b0 is the estimated intercept, b1 is the estimated slope, and e = Y − Ŷ is the residual, where Ŷ is the predicted Y.
(iv) Sample Regression Line: Ŷ = b0 + b1 X where

(a) b1 = SSxy / SSxx , where SSxy = Σxy − (Σx)(Σy)/n , SSxx = Σx² − (Σx)²/n .
(b) b0 = ȳ − b1 x̄ , where ȳ = Σy/n , x̄ = Σx/n .
n n
– Interpretation of b0 , b1 :

(a) b0 : b0 is the estimated average value of Y when X = 0.


(b) b1 : If the sign of b1 is positive: for each additional unit of X, the estimated average value of Y increases by b1 units. If the sign of b1 is negative: for each additional unit of X, the estimated average value of Y decreases by |b1| units.

• Assessing the Model and Quality of Fit:

(i) Coefficient of Determination: r² = explained variation / total variation = SSR/SST = 1 − SSE/SST .
(ii) Valid Range of r² : 0 ≤ r² ≤ 1.

(iii) Interpretation of r2 : r2 = 52% means that 52% of the variation of Y is explained
by the model (i.e., X) and 48% of the variation of Y is still unexplained.

(iv) Standard Error of Estimate: s = √(SSE/(n − k − 1)) = √(SSE/(n − 2)) = √MSE . Here k = 1, the number of independent variables in the simple linear regression model.
(v) Partition of Sum of Squares:

SST (total variation) = SSR (explained variation) + SSE (unexplained variation)

where SST = Σy² − (Σy)²/n , SSE = (1 − r²) SST , and SSR = SST − SSE .
(vi) Relationship between r and r2 :

(a) Given r, find r2 : Just square the value of r.


(b) Given r², find r:
r = +√(r²) if b1 is positive
r = −√(r²) if b1 is negative

(c) Takeaway: The sign of r and b1 MUST align.
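
A minimal SciPy sketch that fits the least squares line and reports r, r², and s; the data are illustrative:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

fit = stats.linregress(x, y)        # returns slope, intercept, rvalue, ...
print(f"b1 = {fit.slope:.3f}, b0 = {fit.intercept:.3f}")
print(f"r = {fit.rvalue:.4f}, r^2 = {fit.rvalue ** 2:.4f}")

# Standard error of estimate: s = sqrt(SSE / (n - 2))
resid = y - (fit.intercept + fit.slope * x)
s = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
print(f"s = {s:.4f}")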

Week 5: Correlation and Simple Linear Regression (cont’d)
Chapter 14.3 - 14.5: Simple Linear Regression Analysis
• Test of Significance of Simple Regression Model: Three equivalent tests.

(i) Test 1: Test for significance of the correlation between X and Y (Please refer to
the section of Correlation).
(ii) Test 2: Test for significance of the coefficient of determination
∗ Hypotheses:

H0 : ρ² = 0 vs HA : ρ² > 0 (the F-test always uses the upper critical region and is right-tailed)

∗ Test Statistic: F = (SSR/k) / (SSE/(n − k − 1)) = (SSR/1) / (SSE/(n − 2)), where k = 1 is the number of independent variables in the simple linear regression model.

∗ Rejection Rule (Critical Value Approach): Reject H0 if F > Fα where Fα is


the critical value from F distribution with k and (n−k −1) degrees of freedom
(Right-tailed Test).

∗ Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if


p-value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from
MINITAB, while approximate p-value can be obtained from F table as the F
table does not have enough details)
∗ Formula of ANOVA Table of Regression Model:

(iii) Test 3: Test for Regression Slope, β1
∗ Hypotheses:

(a) H0 : β1 = 0 vs HA : β1 6= 0 (two-tailed test)

(b) H0 : β1 ≤ 0 vs HA : β1 > 0 (right-tailed test)

(c) H0 : β1 ≥ 0 vs HA : β1 < 0 (left-tailed test)

∗ Test Statistic: t = (b1 − β1) / sb1 , where sb1 = s / √(Σx² − (Σx)²/n) and s = standard error of estimate.

∗ Rejection Rule (Critical Value Approach):

(a) Reject H0 if t > tα/2 or t < −tα/2 , where tα/2 is the critical value from the t distribution with (n − k − 1) = (n − 2) degrees of freedom. (two-tailed test)

(b) Reject H0 if t > tα (right-tailed test)

(c) Reject H0 if t < −tα (left-tailed test)

∗ Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if


p-value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from
MINITAB, while approximate p-value can be obtained from t table as the t
table does not have enough details)
∗ Decision Region:

∗ Rejection Rule (Confidence Interval Approach): If the confidence interval


b1 ± t × sb1 contains 0, do not reject H0 ; if the confidence interval does not
contain 0, reject H0 .
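
A minimal SciPy sketch of the slope t test; linregress reports the slope's standard error directly, and the data are illustrative:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])

fit = stats.linregress(x, y)
t = fit.slope / fit.stderr            # tests H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), df=len(x) - 2)
print(f"t = {t:.3f}, p = {p:.6f}")    # matches fit.pvalue (two-tailed)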

• Test of Significance of y-intercept, β0 :

– Hypotheses:

(a) H0 : β0 = 0 vs HA : β0 6= 0 (two-tailed test)

(b) H0 : β0 ≤ 0 vs HA : β0 > 0 (right-tailed test)

(c) H0 : β0 ≥ 0 vs HA : β0 < 0 (left-tailed test)


– Test Statistic: t = (b0 − β0) / sb0 , where sb0 = s √(1/n + x̄²/SSxx) and s = standard error of estimate.

– Rejection Rule (Critical Value Approach):

(a) Reject H0 if t > tα/2 or t < −tα/2 , where tα/2 is the critical value from the t distribution with (n − k − 1) = (n − 2) degrees of freedom. (two-tailed test)

(b) Reject H0 if t > tα (right-tailed test)

(c) Reject H0 if t < −tα (left-tailed test)

– Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if p-


value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from MINITAB,
while approximate p-value can be obtained from t table as the t table does not
have enough details)
• Confidence and Prediction Intervals:

(i) Distance Value: 1/n + (x0 − x̄)²/SSxx
(ii) Confidence Interval for the Mean of Y given X = x0 :
ŷ ± tα/2 s √(distance value) = ŷ ± tα/2 s √(1/n + (x0 − x̄)²/SSxx)
(iii) Prediction Interval for an individual Y given X = x0 :
ŷ ± tα/2 s √(1 + distance value) = ŷ ± tα/2 s √(1 + 1/n + (x0 − x̄)²/SSxx)
(iv) Takeaway: The Prediction Interval is wider than the Confidence Interval due to the extra term 1 under the square root.
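
A minimal SciPy sketch of both intervals at an illustrative x0, following the formulas above:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9, 12.3])
x0, alpha = 3.5, 0.05

n = len(x)
fit = stats.linregress(x, y)
resid = y - (fit.intercept + fit.slope * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))          # standard error of estimate
ssxx = np.sum((x - x.mean()) ** 2)
dist = 1 / n + (x0 - x.mean()) ** 2 / ssxx         # distance value
t = stats.t.ppf(1 - alpha / 2, df=n - 2)
yhat = fit.intercept + fit.slope * x0

ci = (yhat - t * s * np.sqrt(dist), yhat + t * s * np.sqrt(dist))
pi = (yhat - t * s * np.sqrt(1 + dist), yhat + t * s * np.sqrt(1 + dist))
print(f"CI for mean Y:       ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"PI for individual Y: ({pi[0]:.3f}, {pi[1]:.3f})")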

Week 6: Correlation and Simple Linear Regression (cont’d)
Chapter 14.7: Simple Linear Regression Analysis (Residual Anal-
ysis)
• Model Assumptions

– Linearity Condition: The regression model is linear, is correctly specified, and


has an additive error term.
– Constant Variance Assumption: At any value of X, the population of poten-
tial error term values has constant variance.
– Normality Assumption: At any given value of X, the population of potential
error term values has a normal distribution.
– Independence Assumption: Any one value of the error term ε is statistically independent of any other value of ε.

• Checking Model Assumptions: The population regression assumptions are checked by analyzing the sample regression residuals. The residual, e = y − ŷ, is a point estimate of ε.

(a) Linearity: Plot of (Residual(e) vs Independent Variable) (X) or


(Residual (e) vs Predicted Y (Ŷ )).

Nonlinear Pattern: A group of residuals having the same sign (positive or negative) followed by a group of residuals having the opposite sign.

Linear Pattern: No pattern can be discerned. Residuals are fairly randomly dis-
tributed over the reference line (0).

(b) Constant Variance: Homoskedasticity
Plot of (Residual(e) vs Independent Variable) (X) or
(Residual (e) vs Predicted Y (Ŷ )).

Nonconstant Variance Pattern: Residuals “fan out” or “funnel in”; residuals do not spread equally over all values of X.

Constant Variance Pattern: Residuals form a horizontal band spreading fairly


equally over all values of X.

(c) Normality:
Histogram of Residuals: Plot of Residual(e)

Normal Probability Plot (NPP):

Step 1: Order residuals from smallest to largest


Step 2: Plot e(i) on the vertical axis against z(i) on the horizontal axis, where z(i) is the point on the horizontal axis under the z curve such that the area under the curve to its left is (3i − 1)/(3n + 1). If the normality assumption holds, the plot should have a straight-line appearance.

(d) Independence: If the observations of the error terms are correlated with each other in time series data, the assumption is violated and the problem of autocorrelation or serial correlation arises.

Detection of Autocorrelation (Graphically):

Positive Autocorrelation: Positive autocorrelation occurs when a positive error


term in time period i tends to be followed by another positive value in i + k.
(Cyclic Pattern)

Negative Autocorrelation: Negative autocorrelation occurs when a positive error


term tends to be followed by a negative value. (Alternating Pattern)

No Autocorrelation: Different observations of the error term are completely un-
correlated with each other.

Detection of First-order Autocorrelation (Numerically)(Durbin-Watson Test) :

Step 1: Hypotheses

∗ (i) Test of Positive Autocorrelation (Most Frequently Used)


H0 : ρ ≤ 0
HA : ρ > 0
∗ (ii) Test of Negative Autocorrelation (Rarely Used)
H0 : ρ ≥ 0
HA : ρ < 0
∗ (iii) Test of Autocorrelation
H0 : ρ = 0
HA : ρ 6= 0

Step 2: Test Statistic

d = Σ (et − et−1)² / Σ et² , where the numerator sum runs from t = 2 to n and the denominator sum runs from t = 1 to n

Step 3: Decision Rule

∗ (i) Test of Positive Autocorrelation

· If d < dL,α Reject H0 .


· If d > dU,α Do not reject H0 .
· If dL,α ≤ d ≤ dU,α Inconclusive, where dL,α and dU,α are critical values
from Durbin-Watson Table at k = 1, α.
∗ (ii) Test of Negative Autocorrelation

· If (4 − d) < dL,α Reject H0 .


· If (4 − d) > dU,α Do not reject H0 .
· If dL,α ≤ (4 − d) ≤ dU,α Inconclusive, where dL,α and dU,α are critical
values from Durbin-Watson Table at k = 1, α .
∗ (iii) Test of Autocorrelation

· Both the positive and negative autocorrelation tests should be performed, using critical values dL,α/2 , dU,α/2 .
· If either test (positive or negative) says to reject H0 , then reject H0 .
· If both tests (positive and negative) say not to reject H0 , then do not reject H0 .
· If either test (positive or negative) is inconclusive, then the test is inconclusive.
Values of d statistic

∗ Extreme positive serial correlation: d = 0


∗ Extreme negative serial correlation: d ≈ 4
∗ No serial correlation: d ≈ 2
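
A minimal NumPy sketch of the d statistic computed from an illustrative residual series (the critical values dL and dU still come from the Durbin-Watson table):

import numpy as np

e = np.array([0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.7, 0.9, 0.2])
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # diff gives e_t - e_{t-1}
print(f"d = {d:.3f}")   # d near 2 suggests no first-order autocorrelation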

Week 7: Multiple Regression and Model Building
Chapter 15: Multiple Regression and Model Building (15.1 - 15.5)
• Multiple regression is an extension of simple regression that allows more than one
predictor (or independent or explanatory) variable.

• This method is based on regression outputs; no manual calculations are required. Please be familiar with the multiple linear regression outputs of MINITAB and EXCEL.

• Multiple Linear Regression Model (Population and Sample)

• Interpretation of (partial) regression coefficients, b2 : For instance, b2 means that the


average value of Y will increase (or decrease, depending on the sign of the coefficient)
for each additional unit of X2 , holding constant the effect of remaining independent
variables (or while the remaining variables are in the model).

Interpretation of intercept, b0 : The average value of Y when X1 = X2 = · · · = Xk = 0

• Quality of Fit:

(a) R2 : Fraction of the total variation in y accounted for by the model (all the pre-
dictor variables included).

R² = SSR/SST = 1 − SSE/SST

Caveats of R2 : Adding new predictor variables (even nonsensical) to a model


never decreases R2 and may increase it.

(b) Adjusted R2 : Adjusted R2 increases only if the gain in SSR outweighs the loss of
degrees of freedom due to addition of new independent variables. Adjusted R2 is
always less than R2 .

R̄² = (R² − k/(n − 1)) × ((n − 1)/(n − k − 1)),

where n is the sample size and k is the number of independent variables.

(c) Standard Error of Estimate, s: A point estimate of the standard deviation of the
error term, σ.


s = √MSE = √(SSE/(n − k − 1))
Interpretation of s: Is this value large or small? Take into account the mean size of y for comparison (using the 68–95–99.7% rule). If the interval ȳ ± 2s appears too wide to be acceptable for prediction purposes, steps should be taken to reduce the standard error of estimate, s, by improving the regression model.

• ANOVA Table

• Test of Significance of Multiple Regression Model:

(i) Test for significance of the Overall Model

∗ Hypotheses:

H0 : β1 = β2 = · · · = βk = 0
HA : At least one βi 6= 0
(The F-test always uses the upper critical region and is right-tailed.)
∗ Test Statistic: F = (SSR/k) / (SSE/(n − k − 1)), where k is the number of independent variables in the multiple linear regression model and n is the sample size.

∗ Rejection Rule (Critical Value Approach): Reject H0 if F > Fα where Fα is


the critical value from F distribution with k and (n−k −1) degrees of freedom
(Right-tailed Test).

∗ Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if


p-value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from
MINITAB, while approximate p-value can be obtained from F table as the F
table does not have enough details)
(ii) Test of Significance of Individual Regression Coefficient, βi (if overall model ap-
pears to be significant or H0 is rejected)
∗ Hypotheses:

(a) H0 : βi = 0 vs HA : βi 6= 0 (two-tailed test)

(b) H0 : βi ≤ 0 vs HA : βi > 0 (right-tailed test)

(c) H0 : βi ≥ 0 vs HA : βi < 0 (left-tailed test)

∗ Test Statistic: t = (bi − βi) / sbi , where sbi is the standard error of the coefficient bi (from the MINITAB output).

∗ Rejection Rule (Critical Value Approach):

(a) Reject H0 if t > tα/2 or t < −tα/2 , where tα/2 is the critical value from the t distribution with (n − k − 1) degrees of freedom. (two-tailed test)

(b) Reject H0 if t > tα (right-tailed test)

(c) Reject H0 if t < −tα (left-tailed test)

∗ Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if
p-value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from
MINITAB, while approximate p-value can be obtained from t table as the t
table does not have enough details)
∗ Rejection Rule (Confidence Interval Approach): If the confidence interval
bi ± t × sbi contains 0, do not reject H0 ; if the confidence interval does not
contain 0, reject H0 .
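
A minimal sketch of these tests in Python; statsmodels is one common stand-in for the MINITAB/EXCEL output the course uses, and the data are simulated purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30
X = rng.normal(size=(n, 2))                  # two predictors x1, x2
y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)                   # b0, b1, b2
print(model.fvalue, model.f_pvalue)   # overall F test of the model
print(model.tvalues, model.pvalues)   # individual t tests for each coefficient
print(model.rsquared, model.rsquared_adj)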

Week 8: Multiple Regression and Model Building
Chapter 15: Multiple Regression and Model Building (15.6 - 15.9)
• Confidence and Prediction Intervals:

(i) Distance Value: Not directly available in multiple regression output. Here is a workaround formula:
distance value = (sŷ / s)² , where sŷ is the standard error of fit and s is the standard error of estimate available in the MINITAB output.
(ii) Confidence Interval for the Mean of Y given X1 = x10 , X2 = x20 , · · · , Xk = xk0 :

ŷ ± tα/2 s √(distance value)

(iii) Prediction Interval for an individual Y given X1 = x10 , X2 = x20 , · · · , Xk = xk0 :

ŷ ± tα/2 s √(1 + distance value)

where tα/2 is the t critical value from t distribution with (n − k − 1) degrees


of freedom.
(iv) Takeaway: Prediction Interval is wider than Confidence Interval due to the extra
term 1 under the square root.

• Dummy Variable

(i) Some variables are inherently “qualitative” (rather than “quantitative”) and can’t
be expressed as a number.
(ii) Some qualitative variables can be quantified by using a dummy variable or indi-
cator variable. A dummy variable takes on value of 0 (if condition is not met) or
1 (if condition is met).
(iii) Rule: Create one fewer dummy than there are conditions. Specifically, k categories require k − 1 dummy variables. The omitted condition is treated as the “reference” or “benchmark” category; every other category is compared with the reference or benchmark category when interpreting the dummy variable coefficients.
(iv) Intercept Dummy (Dummy variables without interaction effect in the model): In-
tercept changes but slope remains constant (i.e., parallel).

Model for Di = 1 : Yi = (β0 + β2) + β1 Xi (intercept = β0 + β2 , slope = β1)
Model for Di = 0 : Yi = β0 + β1 Xi (intercept = β0 , slope = β1)

(v) Interpretation of Dummy Variable Coefficients:

Method 1: β2 is the difference between the average Y when Di = 1 and av-


erage Y when Di = 0 (Yi (Di = 1) − Yi (Di = 0) = β2 )

Method 2: On average, Y is b2 more (if the sign is positive) or less (if the sign is negative) for the Di = 1 category compared to the Di = 0 category.

(vi) Slope Dummy (Dummy variables with interaction term in the model): A quanti-
tative variable and dummy variable interaction is typically called a slope dummy.
If a slope dummy and intercept dummy are added to an equation, both intercept
and slope change (i.e., intercepts shift and lines are NOT parallel).

Model for Di = 1 : Yi = (β0 + β2) + (β1 + β3) Xi (intercept = β0 + β2 , slope = β1 + β3)
Model for Di = 0 : Yi = β0 + β1 Xi (intercept = β0 , slope = β1)

• Using Squared and Interaction Variables

(i) Quadratic regression model:

y = β0 + β1 x + β2 x² + ε

(ii) Interaction Term: Regression models often contain interaction variables. In the following regression model, x3 and x4 interact, and x3 also enters as a quadratic.

y = β0 + β1 x4 + β2 x3 + β3 x3² + β4 x3 x4 + ε

(iii) Partial F Test:

The Partial F-test is a formal hypothesis test designed to deal with a single
hypothesis about a group (or subset) of coefficients.

The following two regression outputs are required:

Full or Unconstrained Model:


y = β0 + β1 x1 + · · · + βk xk

Reduced or Constrained Model (the M variables under test removed):

y = β0 + βM+1 xM+1 + · · · + βk xk

∗ Hypotheses:

H0 : β1 = β2 = · · · = βM = 0
HA : At least one βi 6= 0

(The F-test always uses the upper critical region and is right-tailed.) Here, M is the number of coefficients to be tested, i.e. the number of constraints.

∗ Test Statistic: F = ((SSEM − SSE)/M) / (SSE/(n − k − 1)), where
· k is number of independent variables in multiple linear regression model.
· n is sample size.
· SSE is the residual sum of squares from the full (unconstrained) model.
· SSEM is the residual sum of squares from the reduced (constrained)
model.
· M is the number of constraints or coefficients in the null hypothesis to
be tested
∗ Rejection Rule (Critical Value Approach): Reject H0 if F > Fα where Fα
is the critical value from F distribution with M and (n − k − 1) degrees of
freedom (Right-tailed Test).

∗ Rejection Rule (p-value Approach): If p-value < α, reject H0 ; Otherwise, if


p-value ≥ α, do not reject H0 (Note: Exact p-value can be obtained from
MINITAB, while approximate p-value can be obtained from F table as the F
table does not have enough details)
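
A minimal NumPy/SciPy sketch of the partial F test on simulated data, comparing a full model with k = 3 predictors to a reduced model with the M = 2 tested predictors removed:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 3))                  # x1, x2, x3 (k = 3)
y = 1 + 2 * X[:, 0] + rng.normal(size=n)     # only x1 truly matters here

def sse(y, cols):
    # SSE of a least squares fit with an intercept and the given columns
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return r @ r

k, M = 3, 2                                  # test H0: beta2 = beta3 = 0
sse_full = sse(y, [X[:, 0], X[:, 1], X[:, 2]])
sse_reduced = sse(y, [X[:, 0]])              # constrained model

F = ((sse_reduced - sse_full) / M) / (sse_full / (n - k - 1))
p = stats.f.sf(F, M, n - k - 1)
print(f"F = {F:.3f}, p = {p:.4f}")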

Week 9: Multiple Regression and Model Building
Chapter 15: Multiple Regression and Model Building (15.10 -
15.11)
• Multicollinearity:

Perfect Multicollinearity: When two independent variables are perfectly linearly correlated (i.e., r = +1 or −1). Perfect multicollinearity is a violation of the classical regression assumptions.

Imperfect Multicollinearity: A linear functional relationship between two or more independent variables strong enough that it can significantly affect the estimation of the coefficients (−1 < r < 1).

Consequences:

(i) Estimates will remain unbiased.


(ii) The variances and standard errors of the estimates will increase, hence t scores
will fall.
(iii) Estimates will become sensitive to changes in specification.
(iv) A combination of a high (R̄²) and no statistically significant individual variables is an indication of multicollinearity.
(v) It is possible for an F-test of overall significance to reject the null hypothesis even though none of the individual t-tests do.
Detection of Multicollinearity:

(i) Simple Correlation Coefficient (e.g., correlation matrix) (for two independent vari-
ables)
(ii) Variance Inflation Factor (VIF) (for two or more independent variables):
VIFj = 1 / (1 − Rj²) , where (1 − Rj²) is called the tolerance.
Criteria for Assessing Multicollinearity:

∗ Severe when VIF > 10


∗ Moderately strong for VIF > 5
∗ Problematic when average VIF of independent variables > 1
Steps of Calculation of VIF:

(i) Step 1: Run an OLS regression that has Xj as a function of all the other
independent variables in the equation.

Xj = β0 + β1 X1 + · · · + βj−1 Xj−1 + βj+1 Xj+1 + · · · + βk Xk + v.

(ii) Step 2: Calculate VIFj for the independent variable Xj :

VIFj = 1 / (1 − Rj²)
(iii) Run k similar auxiliary regressions following steps 1 and 2.
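
A minimal NumPy sketch of these steps on simulated data with a deliberately collinear pair of predictors:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=50)   # deliberately collinear with x1
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress column j on the remaining columns; VIF_j = 1 / (1 - R_j^2)
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r_sq = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r_sq)

for j in range(X.shape[1]):
    print(f"VIF_{j + 1} = {vif(X, j):.2f}")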

• Comparing Models: A model is considered desirable if the following criteria are fulfilled:

– Criterion 1: Adjusted R². Adjusted R² should be used to compare candidate models: the higher the value, the better the model. R² cannot tell us that adding an independent variable is undesirable, as R² never decreases even when a nonsensical variable is added.

– Criterion 2: s and Length of Prediction Interval: A small standard error of estimate, s = √(SSE/(n − k − 1)), and a short prediction interval are desirable.

– Mallows Cp (a.k.a. C):
C = SSE/sp² − [n − 2(k + 1)]; a model is desirable when C ≤ (k + 1). Here sp² is the mean square error for the model containing all p potential independent variables, and SSE is the sum of squared errors for a reduced model with k independent variables.

• Residual Analysis: (Same as Simple Linear Regression Model) Residuals, ei = yi − ŷi ,


are used to check the assumptions of multiple linear regression model.

(i) Normality: Normal Probability Plot or Histogram of Residuals

(ii) Constant Variance: Plot of residuals vs predicted value; plot of residuals vs each
independent variable
(iii) Independence: Plot of residuals vs time (if data is time series)
• Detection of Unusual Observations:

(i) Extrapolation and Prediction: Linear models ought not be trusted beyond
the span of the x-values of the data which is used to fit the model. If you extrap-
olate far into the future, be prepared for the actual values to be (possibly quite)
different from your predictions.
(ii) Outlier: An outlying y value (a large distance from ȳ).

(iii) Leverage: An outlying x value (a large distance from x̄). A leverage value is considered to be large if it is greater than 2(k + 1)/n.

(iv) Influential: A point is called influential if omitting it changes the slope dramatically.
(v) Standardized or Studentized Residuals: A residual divided by an estimate of its standard deviation. If a studentized residual is greater than 2 in absolute value, the observation is deemed an outlier.

(vi) Deleted Residuals: e(i) = yi − ŷ(i) , where the prediction ŷ(i) is obtained using least squares point estimates based on all n observations except observation i. An observation is an outlier with respect to y if

|e(i)| > t.005 , where t.005 comes from the t distribution with df = n − k − 2

Week 10: Time Series Forecasting
Chapter 17: Time Series Forecasting (17.1 - 17.2)
• Distinction between Cross-sectional and Time Series Data:

Time Series Data: Variables that are measured at regular intervals for the same entity over time are called time series data.
Cross-sectional Data: When several variables are all measured at the same point in time for different entities, the data are called cross-sectional data.

• Components of Time Series:

(i) A trend is the long-term increase or decrease in a variable being measured over
time (linear or nonlinear).

(ii) A seasonal component is a wavelike pattern that is repeated throughout a time


series and has a recurrence period of at most one year.

(iii) A cyclical component consists of long-run up-and-down fluctuations around the trend level, with a recurrence period of more than one year (2 to 10 years or even longer, measured from peak to peak or trough to trough).

(iv) A random component depicts changes in time-series data that are unpredictable
(follow no regular or recognizable pattern) and cannot be associated with a trend,
seasonal, or cyclical component.

• Modelling Trend Components:

No Trend Regression Model

(i) Model: yt = β0 + εt .
(ii) Point Estimate: ŷt = b0 = ȳ.
(iii) Interval Estimate: ŷt ± tα/2 s √(1 + 1/n) , where tα/2 is the critical value from the t distribution with (n − 1) degrees of freedom.

Trend Regression Model (same as the simple linear regression model)

(i) Model: yt = β0 + β1 t + εt

(ii) Point Estimates:

b1 = (Σ t yt − (Σ t)(Σ yt)/n) / (Σ t² − (Σ t)²/n)

b0 = Σ yt / n − b1 (Σ t / n)
• Modelling Seasonal Components:

Using Dummy Variable

Within regression, seasonality can be modeled using dummy variables. If the time
series data is quarterly, the following model can be considered:

ŷt = b0 + b1 t + bQ2 Q2 + bQ3 Q3 + bQ4 Q4 , where Q1 (quarter 1) is the reference category.

Using Transformation

Sometimes transforming the time series data (with nonconstant seasonality) makes
it easier to forecast. The following are typical transformation techniques:

(i) Square Root Transformation


(ii) Quartic Root Transformation
(iii) Natural Logarithm

Constant Seasonality: The magnitude of the swing does not depend on the level of the
time series.

Nonconstant Seasonality: The magnitude of the swing increases as the level of the time
series increases.

Steps of Forecasting using Transformation:

(i) Model the time series data with the transformed yt (for example, the quartic root transformation):

yt^(1/4) = β0 + β1 t + βM1 M1 + βM2 M2 + · · · + βM11 M11

(Predicted model for time period 169 at M1 = 1, for example:)

ŷ169^(1/4) = b0 + b1 (169) + bM1 (1) = 4.8073 + 0.0035(169) − 0.0525(1) = 5.3489

(ii) Untransform the point and prediction interval.

Point Estimate: ŷ169 = (5.3489)⁴ = 818.57

95% Prediction Interval Estimate for y169 : [(5.2913)⁴ , (5.4065)⁴] = [783.88, 854.41]

Chapter 17: Adjusting for Seasonality: Multiplicative Decomposition (17.3)
• Multiplicative Decomposition Model:

Multiplicative Time Series:

yt = trt × snt × clt × irt

When to use:

If a time series exhibits increasing (or decreasing) seasonal variation, the multiplicative decomposition method can be used to decompose the time series into its trend, seasonal, cyclical, and irregular components.

Moving Average:

The successive average of n consecutive values in a time series. A moving average is calculated to remove the seasonal effect, allowing us to see any trend in the data.

Seasonal Adjustment Process


(Please review the Tasty Cola Case example of Week 10 (Day 2))

(i) Compute each moving average from the k appropriate consecutive data values, where k is the number of values in one period of the time series
(ii) Compute the centered moving averages (if the number of values in one period is odd, the centering procedure is not necessary) (trt × clt)
(iii) Isolate the seasonal component by computing the ratio-to-moving-average values (yt / (trt × clt) = snt × irt)
(iv) Compute the seasonal indexes by averaging the ratio-to-moving-average values for comparable periods (s̄nt)
(v) Normalize the seasonal indexes (if necessary): the normalizing coefficient is L / Σ s̄nt , and snt = s̄nt × (L / Σ s̄nt)
(vi) Deseasonalize the time series by dividing the actual data by the appropriate seasonal index (yt / snt)
(vii) Use least squares regression to develop the trend line using the deseasonalized data (trt = b0 + b1 t)
(viii) Develop the unadjusted forecasts using trend projection
(ix) Seasonally adjust the forecasts by multiplying the unadjusted forecasts by the appropriate seasonal index (this returns seasonality to the forecast) (trt × snt)
(x) Estimate the period-by-period cyclical and irregular components by dividing the deseasonalized observation from Step (vi) by the forecast from Step (viii) (yt / (trt × snt) = clt × irt)
(xi) Use a three-period moving average to average out the irregular component (clt)
(xii) The value from Step (x) divided by the value from Step (xi) gives the irregular component (irt = (clt × irt) / clt)

Week 11: Time Series Forecasting
Chapter 17: Simple Exponential Smoothing (17.4)
• Uses: Forecast a series with no trend and no seasonality
• Advantage: Simple, popular, adaptive (learn continuously from recent data)
• Key Concept: Smoothing constant (α)
• Model (No Trend, No Seasonality): yt = β0 + εt
• Level Updating Equation:

ST = αyT + (1 − α)ST −1
• Smoothing Constant, α: Controls the speed of learning (0 ≤ α ≤ 1).

(a) α = 1 : Past values have no influence over forecast (under-smoothing)


(b) α = 0 : Past values have equal influence over forecast (over-smoothing)
• Why is the method called exponential smoothing?

The level updating equation can be expanded as follows:

ST = αyT + (1 − α)ST−1
= αyT + (1 − α)[αyT−1 + (1 − α)ST−2]
= αyT + (1 − α)αyT−1 + (1 − α)²[αyT−2 + (1 − α)ST−3]
= αyT + α(1 − α)yT−1 + α(1 − α)²yT−2 + · · ·

Because the weights α, α(1 − α), α(1 − α)², … decay exponentially into the past, the method is called exponential smoothing.
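
A minimal NumPy sketch of the level updating equation on an illustrative series (initializing the level at the first observation is one common convention, assumed here):

import numpy as np

y = np.array([102.0, 98.5, 101.2, 99.8, 100.6, 97.9, 100.1, 101.4])
alpha = 0.3

S = y[0]                                 # initialize the level at the first value
for yt in y[1:]:
    S = alpha * yt + (1 - alpha) * S     # level updating equation
print(f"smoothed level = {S:.3f}")       # forecast for any future period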

Week 12: Time Series Forecasting
Chapter 17: Double Exponential Smoothing - Holt-Winter's Model (17.5)
• Uses: Forecast a series with trend and/or seasonality

• Advantage: Popular and cheap to compute

• Key Concept: Smoothing constants (α, γ), updating equations

• Model (Trend, No Seasonality): yt = β0 + β1 t + εt

• Level Updating Equation:

lT = αyT + (1 − α)[lT −1 + bT −1 ]

The equation says that lT equals a fraction α of the newly observed time series value
yT plus a fraction (1 − α) of the level and trend in time period (T − 1).

• Trend Updating Equation:

bT = γ[lT − lT −1 ] + (1 − γ)bT −1

The equation says that bT equals a fraction γ of [lT − lT−1], an estimate of the difference between the levels of the time series in periods T and T − 1, plus a fraction (1 − γ) of bT−1 , the estimate of the trend in time period (T − 1).

• Point Forecasts: One-step and two-step ahead forecasts

(a) ŷ25 (24) = l24 + b24 (One-step ahead forecast i.e., forecast made in time 24 (origin)
for future time 25).

(b) ŷ26 (24) = l24 + 2b24 (Two-step ahead forecast i.e., forecast made in time 24 (ori-
gin) for future time 26).

Chapter 17: Multiplicative Winter’s Model (17.5)
• Uses: Forecast a series with trend and seasonality

• Key Concept: Smoothing constants (α, γ, δ), updating equations

• Model (Trend and Seasonality): yt = (β0 + β1 t) snt + εt

• Updating Deseasonalized Level:


lT = α (yT / snT−L) + (1 − α)[lT−1 + bT−1]

The equation says that lT equals a fraction α of the newly observed deseasonalized time series value yT / snT−L plus a fraction (1 − α) of the level and trend in time period (T − 1).

• Trend Updating Equation:

bT = γ[lT − lT −1 ] + (1 − γ)bT −1

The equation says that bT equals a fraction γ of [lT − lT−1], an estimate of the difference between the levels of the time series in periods T and T − 1, plus a fraction (1 − γ) of bT−1 , the estimate of the trend in time period (T − 1).

• Updating Seasonal Factor:

snT = δ (yT / lT) + (1 − δ) snT−L

• Point Forecasts: One-step and seven-step ahead forecasts

(a) ŷ37 (36) = (l36 + b36 )sn25 (One-step ahead forecast i.e., forecast made in time 36
(origin) for future time 37; sn25 is the most recent seasonal factor).

(b) ŷ43 (36) = (l36 + 7b36 )sn31 (Seven-step ahead forecast i.e., forecast made in time
36 (origin) for future time 43; sn31 is the most recent seasonal factor).

Chapter 17: Additive Winter’s Model (17.5)
• Uses: Forecast a series with trend and seasonality

• Key Concept: Smoothing constants (α, γ, δ), updating equations

• Model (Trend and Seasonality): yt = β0 + β1 t + snt + εt

• Updating Deseasonalized Level:

lT = α(yT − snT −L ) + (1 − α)[lT −1 + bT −1 ]

The equation says that lT equals a fraction α of the newly observed deseasonalized
time series value (yT − snT −L ) plus a fraction (1 − α) of the level and trend in time
period (T − 1).

• Trend Updating Equation:

bT = γ[lT − lT −1 ] + (1 − γ)bT −1

The equation says that bT equals a fraction γ of [lT − lT−1], an estimate of the difference between the levels of the time series in periods T and T − 1, plus a fraction (1 − γ) of bT−1 , the estimate of the trend in time period (T − 1).

• Updating Seasonal Factor:

snT = δ (yT − lT) + (1 − δ) snT−L

Chapter 17: Forecast Error Comparisons (17.7)
• Forecast Error: et = yt − ŷt . In general, we want the forecast model with the least forecast error.

• Forecast comparison criteria:

– Mean Absolute Deviation (MAD): Σ|et| / n = Σ|yt − ŷt| / n
– Mean Squared Deviation (MSD): Σet² / n = Σ(yt − ŷt)² / n
– Mean Absolute Percentage Error (MAPE): (Σ|(yt − ŷt)/yt| / n) × 100%
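
A minimal NumPy sketch of the three criteria on illustrative actuals and forecasts:

import numpy as np

y = np.array([100.0, 104.0, 97.0, 102.0, 99.0])      # actual values
yhat = np.array([98.5, 102.0, 99.0, 103.5, 100.0])   # forecasts

e = y - yhat
mad = np.mean(np.abs(e))                 # Mean Absolute Deviation
msd = np.mean(e ** 2)                    # Mean Squared Deviation
mape = np.mean(np.abs(e / y)) * 100      # Mean Absolute Percentage Error
print(f"MAD = {mad:.3f}, MSD = {msd:.3f}, MAPE = {mape:.2f}%")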
