
Business Forecasting

Introduction to Regression Models


Generating Process
For the time series Yt, a basic representation of the generating process is:

Yt = f(systematic component, random component)

Yt = mt + et

The modeller needs to determine the relevant functional form for mt. This will depend on the type of patterns observed in the time series and the type of model (time series or causal).
Regression
In regression modelling the systematic component mt is f(X1, X2, X3, ..., Xk), where the Xj are explanatory variables:

Yt = f(X1, X2, X3, ..., Xk) + et

The exact functional form and the particular independent variables (Xj) to be included in the model are a matter of judgement.

A regression model is a causal model since the prediction of the target time series is linked to other time series.
Why Use Regression?

Advantages:
1. Regression allows the forecaster to incorporate theoretical knowledge of the time series, the independent variables and the functional form.

2. Regression provides a "causal" explanation of why the prediction may be appropriate.

3. Regression models can be used to provide strategy and scenario-based prediction. In particular, regression can be used to analyse best and worst case scenarios. This provides some indication of the range of likely values for the target time series and helps management identify sources of risk.
Why use Regression? (cont)
Disadvantages:
1. Regression requires much more data than other forecast methods. Data is required on the target time series and the independent variables. In addition, more theoretical knowledge of the time series generating process is required.

2. Regression analysis requires more resources (time, money, skill) to produce forecasts than the previously examined time series methods. This time and effort may not necessarily result in greater predictive accuracy. Regression may prove to be an expensive, time consuming way of producing inferior forecasts.
When to use Regression?

Regression is a relatively time consuming and expensive way of generating forecasts.

Typically, regression should be used for forecasts of some importance to the firm or organisation, or when strategic options and/or scenario analysis are required.

The benefits to the firm (improved understanding, scenarios) must outweigh the considerable costs.

Since regression is resource hungry it should only be undertaken when there are sufficient resources (time, money, data etc.) to enable a proper regression analysis.
Regression Forecasting
There are basically three tasks involved:

1. Choose an appropriate model. This includes independent variable selection and choosing the specific functional form of the model.

2. Use a joint sample of observations on the dependent and independent variables to derive estimates of the regression coefficients.

3. Use the estimated model and predicted values of the independent variables to generate forecasts of the dependent variable.
Choosing an Appropriate Model
Choosing appropriate independent variables relies on economic theory, logic, the observed time series and the experience of the modeller.

Typically, the modeller considers the above and selects a candidate group of variables which may appear as independent variables in a final regression model.

Functional form is another issue. Once again use logic, theory, experience and the observed time series (Linear vs Non-Linear, Statics vs Dynamics, Levels vs Changes).

How the predictive model will be used, and what information and/or forecasts are required, may also influence the functional form.
Estimation
Basic Functional Form (population model):

Yt = β0 + β1 X1t + β2 X2t + ... + βk Xkt + et

A joint sample on Yt and X1t, X2t, ..., Xkt is collected.

Estimation of the regression model coefficients (b0, b1, b2, ..., bk) is typically via Ordinary Least Squares (OLS) in EXCEL or Minitab. This generates the sample regression model:

E(Yt) = b0 + b1 X1t + b2 X2t + ... + bk Xkt
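The notes use EXCEL or Minitab for estimation. Purely as an illustrative alternative, the sketch below fits a model of this form by OLS in Python with statsmodels; the two explanatory variables and the small simulated data set are hypothetical, not taken from the notes.

```python
# Minimal OLS sketch (hypothetical data, illustrative only).
import numpy as np
import statsmodels.api as sm

# Hypothetical joint sample: Yt and two explanatory variables X1t, X2t.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                 # columns play the role of X1t, X2t
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=30)

X_design = sm.add_constant(X)                # adds the intercept column for b0
results = sm.OLS(y, X_design).fit()          # Ordinary Least Squares estimates

print(results.params)                        # b0, b1, b2
print(results.summary())                     # output similar to EXCEL's SUMMARY OUTPUT
```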


Why Use OLS?
OLS estimates have good forecast properties.

Under the conditions that the model is correctly specified and the random error at each observation is independently drawn with zero mean and constant variance, the OLS estimates and forecasts will be:

1. Unbiased: on average the OLS estimates and forecasts will equal the true values.

2. Efficient: the OLS estimators and forecasts will be the most precise of any linear unbiased estimators or forecasts.
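As a rough illustration of the unbiasedness property (a hypothetical Python simulation, not part of the notes), repeatedly drawing samples from a known linear model and estimating the slope by OLS gives estimates that average out close to the true value.

```python
# Hypothetical simulation illustrating unbiasedness of the OLS slope estimate.
import numpy as np

rng = np.random.default_rng(1)
true_b0, true_b1 = 2.0, 0.9        # assumed "true" population coefficients
slopes = []

for _ in range(2000):
    x = rng.uniform(0, 20, size=25)
    e = rng.normal(0, 2, size=25)              # zero mean, constant variance
    y = true_b0 + true_b1 * x + e
    b1, b0 = np.polyfit(x, y, 1)               # OLS for a single regressor
    slopes.append(b1)

print(np.mean(slopes))   # close to 0.9: on average the estimates equal the true slope
```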
Estimation (cont)
Before the model can be used for forecasts it needs to be checked for adequacy and for violations of the assumptions underpinning OLS.

Assumptions of OLS include a correctly specified functional form and error term (et) behaviour.

Residuals, other diagnostics and associated statistical tests are used to determine the adequacy of the model.

Only after examination of the above diagnostics and determination of adequacy should the estimated model be used for forecasts.
Regression: Example 1

Week   Sales (Y) (1,000s)   Advertising (X) ($100s)
1      10                   9
2      6                    7
3      5                    5
4      12                   14
5      10                   15
6      15                   12
7      5                    6
8      12                   10
9      17                   15
10     20                   21

[Figure: Advertising vs Sales over time - scatter plot of Advertising ($100s) against Sales (1,000s), with the estimated regression line/equation (OLS) St = b0 + b1*At, fitted as y = 0.9137x + 0.7842]
Excel: Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.891
R Square            0.795
Adjusted R Square   0.769
Standard Error      2.448
Observations        10

ANOVA
             df    SS        MS        F        Sig F
Regression   1     185.658   185.658   30.980   0.001
Residual     8     47.942    5.993
Total        9     233.6

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     0.784          2.025            0.387    0.709     -3.886      5.454
Advertising   0.914          0.164            5.566    0.001     0.535       1.292

[Figure: Advertising vs Sales over time - scatter plot of Advertising ($100s) against Sales (1,000s) with the fitted line y = 0.7842 + 0.9137*x]

Regression: Example 1 (cont.)
The sample estimated equation is:

Sales = 0.7842 + 0.9137 * A

(Estimated through Excel or MINITAB)

Intercept: 0.7842 is the estimated Sales (000s) when A = 0.

Slope: 0.9137 is the estimated constant increase in Sales (000s) when Advertising increases by 1 unit ($100).
Forecasts:
When A = 10: S = 0.7842 + 0.9137 * 10 = 9.921 (000s)

When A = 15: S = 0.7842 + 0.9137 * 15 = 14.490 (000s)

Measures of Estimated Model Performance
R2 - Coefficient of Determination:
• R2 is the % of dependent variable variation (in the sample) explained by the estimated regression
• R2 is between 0 and 1, and the closer R2 is to 1 the better the estimated model fits the sample data
• EXCEL calculates R2 as part of the standard regression estimation routine
• In practice, many unskilled modellers place too much emphasis on R2 or its close counterpart, adjusted R2
• R2 has many flaws and can be easily manipulated. Don't place too much reliance on it; use it as one tool of many in deciding the suitability of models
Performance Measures (cont.)
Standard Error

The standard error is approximately the "average" residual of the regression.

It is similar, although not identical, to RMSE.

It is provided in the standard regression output in EXCEL and Minitab.

The standard error can be used to compare competing regression models.
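Both measures can be recomputed from the ANOVA table in the Example 1 output. The short Python sketch below shows the arithmetic, using the sums of squares from that table.

```python
# R-squared and standard error from the Example 1 ANOVA table.
import math

SSR = 185.658       # regression sum of squares
SSE = 47.942        # residual sum of squares
SST = 233.6         # total sum of squares (= SSR + SSE)
n, k = 10, 1        # observations, number of explanatory variables

r_squared = SSR / SST                             # approx 0.795
standard_error = math.sqrt(SSE / (n - k - 1))     # sqrt(5.993) approx 2.448

print(r_squared, standard_error)
```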
More on Performance Measures

Out of sample forecast performance:

R2 and the standard error are indicators of the in-sample predictive ability of the model.

In-sample prediction is easier since both dependent and independent variable data are available and used to obtain the "best" (most accurate) model.

For out-of-sample prediction, the dependent variable is not available (that's why we are predicting it!). The values of the independent variables may also need to be estimated.

The predictive ability of the model will be different out of sample. The forecaster should test the out-of-sample predictive ability by leaving aside a portion of the most recent observations as a test set. The usual error criteria can be used.
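As a rough sketch of this hold-out idea, the snippet below (Python/statsmodels assumed; the 7/3 split is purely illustrative given only 10 observations) fits the Example 1 model on the earlier weeks and computes RMSE and MAE on the most recent three.

```python
# Hold-out (out-of-sample) check: fit on early weeks, test on the most recent ones.
import numpy as np
import statsmodels.api as sm

advertising = np.array([9, 7, 5, 14, 15, 12, 6, 10, 15, 21], dtype=float)
sales = np.array([10, 6, 5, 12, 10, 15, 5, 12, 17, 20], dtype=float)

train, test = slice(0, 7), slice(7, 10)          # last 3 weeks held back (illustrative split)
fit = sm.OLS(sales[train], sm.add_constant(advertising[train])).fit()

pred = fit.predict(sm.add_constant(advertising[test], has_constant='add'))
errors = sales[test] - pred

rmse = np.sqrt(np.mean(errors ** 2))             # usual error criteria on the test set
mae = np.mean(np.abs(errors))
print(rmse, mae)
```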
Excel: Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.891
R Square            0.795
Adjusted R Square   0.769
Standard Error      2.448
Observations        10

ANOVA
             df    SS        MS        F        Sig F
Regression   1     185.658   185.658   30.980   0.001
Residual     8     47.942    5.993
Total        9     233.6

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     0.784          2.025            0.387    0.709     -3.886      5.454
Advertising   0.914          0.164            5.566    0.001     0.535       1.292

Annotations: R-sq = SSR/SST = 185.7/233.6 = 0.795; Standard Error = 2.448; MSE (Mean Square Error) = SSE/residual df = 47.942/8 = 5.993


Coefficient of Determination (R2)

[Figure: Advertising vs Sales over time - scatter plot of Advertising ($100s) against Sales (1,000s) with fitted line y = 0.914x + 0.784 and R² = 0.795]

79.5% of the sample variation in Sales is explained by the variation in Advertising expenditure.
Statistical Testing

• Test for overall model significance – F test (a worked computation is sketched below)
• Test for individual variable significance – t tests
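As a worked illustration of the F test, the sketch below (scipy assumed; not part of the notes) reproduces the overall-significance test from the Example 1 ANOVA table, where F = MSR/MSE.

```python
# Overall significance (F test) from the Example 1 ANOVA table.
from scipy import stats

MSR, MSE = 185.658, 5.993      # regression and residual mean squares
df1, df2 = 1, 8                # regression df, residual df

F = MSR / MSE                          # approx 30.98, as in the output
p_value = stats.f.sf(F, df1, df2)      # upper-tail probability, approx 0.0005
print(F, p_value)                      # rounds to the Sig F of 0.001 shown in EXCEL
```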
Testing Individual Variables
Testing individual coefficients:

Separate tests of the population slope coefficients (βj) being zero (null hypothesis).

If a slope coefficient is zero it suggests the independent variable being examined does not influence the dependent variable.

Further, the independent variable being examined may be an irrelevant variable and could possibly be dropped from the model specification.

t test: check the p-value (< 0.05 then Reject H0).

Individual Coefficient Tests
Single Coefficient Tests:

Hypothesis tests can be applied to the coefficients of all variables separately. For a model given by

Y = β0 + β1*X1 + β2*X2 + ... + βk*Xk + e

the relevant test (each coefficient separately) is

H0: βj = 0 vs H1: βj ≠ 0

The test statistic has a t distribution, with the p-value indicating support for H0 or H1.
Are Individual Variables
Significant?
Use t tests of the individual variable slopes. These show whether there is a relationship between the variable Xj and Y.
Hypotheses:
H0: βj = 0 (no linear relationship exists between Xj and Y)
H1: βj ≠ 0 (a linear relationship does exist between Xj and Y)
Are Individual Variables
Significant? - (2)
H0: βj = 0 (no relationship)
H1: βj ≠ 0 (relationship does exist between Xj and Y)

Test statistic:

t = (bj − 0) / s(bj),  with df = n − k − 1

where s(bj) is the standard error of bj.

Check the p-value in the output (< 0.05: Reject H0)
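The sketch below (scipy assumed; not part of the notes) reproduces the t test for the Advertising slope in Example 1 from the coefficient and standard error in the output.

```python
# t test for the Advertising slope in Example 1.
from scipy import stats

b1, se_b1 = 0.914, 0.164      # coefficient and its standard error from the output
n, k = 10, 1                  # observations and number of explanatory variables

t = (b1 - 0) / se_b1                        # approx 5.57 (5.566 in the output, unrounded)
df = n - k - 1                              # 8 degrees of freedom
p_value = 2 * stats.t.sf(abs(t), df)        # two-tailed; small, rounds to the 0.001 shown
print(t, df, p_value)

# p_value < 0.05, so H0: beta_1 = 0 is rejected: Advertising is significant.
```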


Excel: Example 1 - t tests from Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.891
R Square            0.795
Adjusted R Square   0.769
Standard Error      2.448
Observations        10

ANOVA
             df    SS        MS        F        Sig F
Regression   1     185.658   185.658   30.980   0.001
Residual     8     47.942    5.993
Total        9     233.6

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     0.784          2.025            0.387    0.709     -3.886      5.454
Advertising   0.914          0.164            5.566    0.001     0.535       1.292

Note: the P-value is the probability of obtaining a sample t value as extreme as, or more extreme than, the one observed, given that β is zero.

Check the Residuals
As in time series models, a necessary condition for the adequacy of a forecast model is non-systematic errors.

Check residuals for randomness: visual inspection of residual plots (vs. time and vs. all explanatory variables separately).

Examine the ACF and PACF of the residuals.

Systematic residuals may indicate a violation of the regression assumptions.

Other objective tests can be used (more on this next week).

Forecasting with Regression

The diagnostic tests are used as a tool to check specified models and to suggest potential improvements to the model specification.

Models may be modified (according to the diagnostic information) and the process of estimation and diagnostic testing repeated.

Once a final model is determined (with acceptable diagnostics) it is used for forecasts.

Forecasts use the estimated equation and estimates of future Xj values to forecast Yf.
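Tying back to the scenario analysis mentioned earlier, the sketch below (Python/statsmodels assumed; the future advertising levels are hypothetical) forecasts Yf from assumed future values of the explanatory variable.

```python
# Forecast Yf for assumed future advertising levels (worst/expected/best scenarios).
import numpy as np
import statsmodels.api as sm

advertising = np.array([9, 7, 5, 14, 15, 12, 6, 10, 15, 21], dtype=float)
sales = np.array([10, 6, 5, 12, 10, 15, 5, 12, 17, 20], dtype=float)
fit = sm.OLS(sales, sm.add_constant(advertising)).fit()

# Hypothetical future advertising scenarios ($100s): worst, expected, best case.
future_adv = np.array([8.0, 12.0, 18.0])
forecasts = fit.predict(sm.add_constant(future_adv, has_constant='add'))

for a, f in zip(future_adv, forecasts):
    print(f"Advertising = {a:.0f} ($100s) -> forecast Sales = {f:.2f} (000s)")
```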
