
MGS3100 Chapter 13 Forecasting: Slides 13c: Causal Models and Regression Analysis

This document discusses causal forecasting models and regression analysis. It provides an example of using historical sales and traffic flow data from five gas stations to construct a linear regression model to forecast sales at a new proposed location based on its traffic flow. The method of least squares is used to fit a straight line model that minimizes the sum of squared deviations between the observed and predicted values. This provides equations to calculate the slope and intercept of the best fitting line. The model finds that approximately 70% of the variation in hourly sales is explained by the number of cars per hour. It then uses the regression equation to forecast that sales would be about $227/hour at a location with traffic of 183 cars/hour.

Uploaded by

kalam1989


MGS3100 Chapter 13

Forecasting
Slides 13c:
Causal Models and
Regression Analysis
In a causal forecasting model, the forecast for the
quantity of interest “rides piggyback” on another
quantity or set of quantities.
In other words, our knowledge of the value of one
variable (or perhaps several variables) enables us
to forecast the value of another variable.

In this model, let

y denote the true value of some variable of
interest and
$\hat{y}$ denote a predicted or forecast value for
that variable.
Then, in a causal model,
$$\hat{y} = f(x_1, x_2, \ldots, x_n)$$

where
f is a forecasting rule, or function, and
$x_1, x_2, \ldots, x_n$ is a set of variables.
In this representation, the x variables are often
called independent variables, whereas $\hat{y}$ is the
dependent or response variable.
We either know the independent variables in
advance or can forecast them more easily than $\hat{y}$.
Then the independent variables will be used in the
forecasting model to forecast the dependent
variable.
Companies often find by looking at past
performance that their monthly sales are directly
related to the monthly GDP, and thus figure that a
good forecast could be made using next month’s
GDP figure.
The only problem is that this quantity is not
known, or it may just be a forecast and thus not a
truly independent variable.
Using a causal forecasting model requires two
conditions:
1. There must be a relationship between
values of the independent and dependent
variables such that the former provides
information about the latter.
2. The values for the independent variables
must be known and available to the
forecaster at the time the forecast is made.
The mere existence of a mathematical
relationship does not guarantee that there is
really cause and effect.

One commonly used approach in creating a causal
forecasting model is called curve fitting.

CURVE FITTING:
AN OIL COMPANY EXPANSION
Consider an oil company that is planning to
expand its network of modern self-service
gasoline stations.
The company plans to use traffic flow (measured
in the average number of cars per hour) to
forecast sales (measured in average dollar sales
per hour).
The firm has had five stations in operation for
more than a year and has used historical data to
calculate the following averages:
The averages are plotted in a scatter diagram.

[Figure: scatter diagram of Sales/hour ($) versus Cars/hour for the five stations]
Now, these data will be used to construct a
function that will be used to forecast sales at any
proposed location by measuring the traffic flow at
that location and plugging its value into the
constructed function.
Least Squares Fits
The method of least squares is a formal procedure
for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a
straight line or quadratic curve).
2. Within the set of functions specified in step
1, choose the specific function that
minimizes the sum of the squared
deviations between the data points and the
function values.
To demonstrate the process, consider the sales-
traffic flow example.
1. Assume a straight line; that is, functions of
the form y = a + bx.
2. Draw the line in the scatter diagram and
indicate the deviations between observed
points and the function as di .
For example,
d1 = y1 – [a +bx1] = 220 – [a + 150b]
where
y1 = actual sales/hr at location 1
x1 = actual traffic flow at location 1
a = y-axis intercept for the function
b = slope for the function
[Figure: scatter diagram of Sales/hour ($) versus Cars/hour showing the line y = a + bx and the deviations d1 through d5 between each observed point and the line]

The value $d_1^2$ is one measure of how close the
value of the function [a + bx1] is to the observed
value, y1; that is, it indicates how well the function
fits at this one point.
One measure of how well the function fits overall
is the sum of the squared deviations:
$$\sum_{i=1}^{5} d_i^2$$
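As a minimal sketch of this measure, the Python snippet below computes the deviations d_i and their squared sum for a candidate line. Only station 1 (150 cars/hour, $220/hour) comes from the slides; the other four data points, the two candidate lines, and the function name are made-up placeholders for illustration.

```python
# Hypothetical data: only the first point (150, 220) appears in the slides;
# the remaining four stations are invented for illustration.
cars_per_hour = [150, 55, 220, 130, 95]
sales_per_hour = [220, 75, 250, 145, 200]


def sum_squared_deviations(a, b, xs, ys):
    """Return the sum of d_i^2, where d_i = y_i - (a + b * x_i)."""
    deviations = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(d * d for d in deviations)


# Two arbitrary candidate lines: a flat line and a sloped line.
sse_flat = sum_squared_deviations(178.0, 0.0, cars_per_hour, sales_per_hour)
sse_sloped = sum_squared_deviations(50.0, 1.0, cars_per_hour, sales_per_hour)
print(sse_flat, sse_sloped)
```

On these placeholder numbers the sloped line has the smaller sum, so it would be judged the closer fit of the two.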

Consider a general model with n as opposed to
five observations. Since each di = yi – (a + bxi),
the sum of the squared deviations can be written
as:
$$\sum_{i=1}^{n} (y_i - [a + b x_i])^2$$

Using the method of least squares, select a and b
so as to minimize the sum in the equation above.
Now, take the partial derivative of the sum with
respect to a and set the resulting expression
equal to zero.
$$\sum_{i=1}^{n} -2 (y_i - [a + b x_i]) = 0$$

A second equation is derived by following the
same procedure with b.
$$\sum_{i=1}^{n} -2 x_i (y_i - [a + b x_i]) = 0$$
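Dividing each of the two conditions by -2 and separating the sums gives a pair of linear equations in a and b, often called the normal equations. This intermediate step does not appear on the slides and is sketched here only to connect the derivatives to the solution:

```latex
\begin{aligned}
\sum_{i=1}^{n} y_i &= n a + b \sum_{i=1}^{n} x_i \\
\sum_{i=1}^{n} x_i y_i &= a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2
\end{aligned}
```

Solving the first equation for a and substituting it into the second yields the slope b.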

Recall that the values for xi and yi are the
observations, and our goal is to find the values of
a and b that satisfy these two equations.
The solution is:
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2}$$
$$a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \, \frac{1}{n} \sum_{i=1}^{n} x_i$$

The next step is to determine the values for:
$$\sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i^2, \qquad \sum_{i=1}^{n} y_i, \qquad \sum_{i=1}^{n} x_i y_i$$

Note that these quantities depend only on
observed data and can be found with simple
arithmetic operations or automatically using
Excel’s predefined functions.
Using Excel, click on Tools – Data Analysis …
In the resulting dialog, choose Regression.
In the Regression dialog, enter the Y-range and
X-range.
Choose to place the output in a new worksheet
called Results.
Select Residual Plots and Normal Probability Plots
to be created along with the output.
Click OK to produce the following results:

Note that a (Intercept) and b (X Variable 1) are
reported as 57.104 and 0.92997, respectively.
To add the resulting least squares line, first click
on the worksheet Chart 1 which contains the
original scatter plot.
Next, click on the data series so that they are
highlighted and then choose Add Trendline …
from the Chart pull-down menu.
Choose Linear Trend in the resulting dialog and
click OK.
A linear trend is fit to the data:
[Figure: scatter plot of Sales/hour ($) versus Cars/hour (Series1) with the fitted linear trendline, Linear (Series1)]
One of the other summary output values that is
given in Excel is: R Square = 69.4%
This is a “goodness of fit” measure which
represents the $R^2$ statistic discussed in
introductory statistics classes.
$R^2$ ranges in value from 0 to 1 and gives an
indication of how much of the total variation in Y
from its mean is explained by the new trend line.
In fact, there are three different sums of squares:
TSS (Total Sum of Squares)
ESS (Error Sum of Squares)
RSS (Regression Sum of Squares)
The basic relationship between them is:
TSS = ESS + RSS
They are defined as follows:
$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
$$ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$RSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
Essentially, the ESS is the amount of variation
that can’t be explained by the regression.
The RSS quantity is effectively the amount of the
original, total variation (TSS) that could be
removed using the regression line.
$R^2$ is defined as:
$$R^2 = \frac{RSS}{TSS}$$

If the regression line fits perfectly, then ESS = 0
and RSS = TSS, resulting in $R^2$ = 1.

In this example, $R^2$ = .694, which means that
approximately 70% of the variation in the Y
values is explained by the one explanatory
variable (X), cars per hour.
Now, returning to the original question: Should
we build a station at Buffalo Grove where traffic
is 183 cars/hour?

The best guess at what the corresponding sales
volume would be is found by placing this X value
into the new regression equation:
$$\hat{y} = a + b x$$
Sales/hour = 57.104 + 0.92997 * (183 cars/hour)
= $227.29
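The substitution is easy to reproduce. This short Python sketch uses the intercept and slope reported by Excel in the slides; the function and variable names are illustrative only.

```python
INTERCEPT = 57.104  # a, as reported by Excel in the slides
SLOPE = 0.92997     # b, as reported by Excel in the slides


def forecast_sales(cars_per_hour):
    """Forecast average sales/hour ($) from average traffic (cars/hour)."""
    return INTERCEPT + SLOPE * cars_per_hour


buffalo_grove = forecast_sales(183)
print(f"${buffalo_grove:.2f} per hour")  # prints "$227.29 per hour"
```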

However, it would be nice to be able to state a
95% confidence interval around this best guess.
We can get the information to do this from Excel’s
Summary Output.
Excel reports that the standard error (Se) is 44.18.
This quantity represents the amount of scatter in
the actual data around the regression line.
The formula for Se is:
$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}}$$
where n is the number of data points (e.g., 5)
and k is the number of independent variables
(e.g., 1).
This equation is equivalent to:
$$S_e = \sqrt{\frac{ESS}{n - k - 1}}$$
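A small sketch of the standard error calculation, assuming the usual definition Se = sqrt(ESS / (n - k - 1)). The slides do not show the ESS itself, so the value used here is back-computed from the reported Se of 44.18 purely to illustrate the formula; it is not read from the Excel output.

```python
import math


def standard_error(ess, n, k):
    """S_e = sqrt(ESS / (n - k - 1)) for n data points and k independent variables."""
    degrees_of_freedom = n - k - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more data points than fitted parameters")
    return math.sqrt(ess / degrees_of_freedom)


# Assumption: ESS implied by S_e = 44.18 with n = 5 and k = 1,
# i.e. 44.18**2 * 3. This number is not shown in the slides.
implied_ess = 5855.6172
se = standard_error(implied_ess, n=5, k=1)
print(round(se, 2))
```

With only five observations and one independent variable, just three degrees of freedom remain, which is one reason the scatter estimate is so large relative to the forecast.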

Once we know Se and based on the normal
distribution, we can state that
• We have 68% confidence that the actual
value of sales/hour is within ±1 Se of the
predicted value ($227.29).
• We have 95% confidence that the actual
value of sales/hour is within ±2 Se of the
predicted value ($227.29).
The 95% confidence interval is:
[227.29 – 2(44.18); 227.29 + 2(44.18)]
[$138.93; $315.65]
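The interval arithmetic can be sketched as follows, using the forecast and standard error quoted in the slides and the same rule of thumb (forecast plus or minus a multiple of Se). The function name is illustrative.

```python
PREDICTED = 227.29  # forecast sales/hour at 183 cars/hour (from the slides)
STD_ERROR = 44.18   # S_e reported by Excel (from the slides)


def rough_interval(predicted, std_error, multiplier):
    """Return (low, high) = predicted -/+ multiplier * std_error."""
    half_width = multiplier * std_error
    return predicted - half_width, predicted + half_width


low68, high68 = rough_interval(PREDICTED, STD_ERROR, 1)
low95, high95 = rough_interval(PREDICTED, STD_ERROR, 2)
print(f"68%: [{low68:.2f}; {high68:.2f}]")
print(f"95%: [{low95:.2f}; {high95:.2f}]")
```

This is the normal-distribution approximation used on the slide; with only five data points, a more careful interval would typically be wider.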
Another value of interest in the Summary report
is the t-statistic for the X variable and its
associated values.

The t-statistic is 2.61 and the P-value is 0.0798.

A P-value less than 0.05 would indicate, with at
least 95% confidence, that the slope parameter
(b) is statistically significantly different from 0 (zero).
Here the P-value of 0.0798 is greater than 0.05,
so the slope is not significant at that level.
A slope of 0 results in a flat trend line and
indicates no relationship between Y and X.
The 95% confidence interval for b is [-0.205; 2.064]
Thus, we can’t exclude the possibility that the
true value of b might be 0.
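The interval for b can be approximately reconstructed from the reported slope and t-statistic. In this sketch the standard error of the slope is recovered as b / t, and the critical value 3.182 is the standard two-sided 95% Student's t value for n - k - 1 = 3 degrees of freedom; that critical value is taken from a t table, not from the slides. Small differences from the Excel limits come from the rounded t-statistic.

```python
SLOPE = 0.92997     # b, from the slides
T_STAT = 2.61       # t-statistic for the slope, from the slides
T_CRITICAL = 3.182  # assumption: t table value, 95% two-sided, 3 degrees of freedom

# The t-statistic is slope / standard error, so the standard error is slope / t.
std_error_slope = SLOPE / T_STAT
half_width = T_CRITICAL * std_error_slope
lower, upper = SLOPE - half_width, SLOPE + half_width

# If zero lies inside the interval, a flat line cannot be ruled out.
includes_zero = lower <= 0.0 <= upper
print(f"[{lower:.3f}; {upper:.3f}] includes zero: {includes_zero}")
```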
Also given in the Summary report is the
F-significance. Since there is only one
independent variable, the F-significance is
identical to the P-value for the t-statistic.

In the case of more than one X variable, the
F-significance tests the hypothesis that all the X
variable parameters as a group are statistically
significantly different from zero.
Concerning multiple regression models, as you
add other X variables, the $R^2$ statistic will never
decrease (and in practice almost always increases),
meaning the RSS has increased.
In this case, the Adjusted $R^2$ statistic is a more
reliable indicator of the true goodness of fit because
it compensates for the reduction in the ESS due to
the addition of more independent variables.
Thus, it may report a decreased adjusted $R^2$ value
even though $R^2$ has increased, unless the
improvement in RSS more than compensates
for the addition of the new independent
variables.
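The slides do not give the formula, but the standard definition is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). The sketch below applies it to the example (R^2 = 0.694, n = 5, k = 1) and then to a purely hypothetical second model in which an extra variable lifts R^2 to 0.70; that second scenario is invented to show the penalty at work.

```python
def adjusted_r_squared(r_squared, n, k):
    """Standard adjusted R^2 for n observations and k independent variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Values from the slides: R^2 = 0.694 with five stations and one X variable.
one_variable = adjusted_r_squared(0.694, n=5, k=1)

# Hypothetical scenario (not from the slides): a second X variable
# nudges R^2 up to 0.70.
two_variables = adjusted_r_squared(0.70, n=5, k=2)

print(round(one_variable, 3), round(two_variables, 3))
```

In this made-up comparison R^2 rises slightly while the adjusted value falls, which is the situation the adjusted statistic is meant to flag.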
WHICH CURVE TO FIT?
If, for example, a quadratic function fits better
than a linear function, why not choose a more
general form, thereby getting an even better fit?
In practice, functions of the form (with only a
single independent variable for illustrative
purposes) are often suggested:
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
Such a function is called a polynomial of degree n,
and it represents a broad and flexible class of
functions.
n=2 quadratic
n=3 cubic
n=4 quartic
…
One must proceed with caution when fitting data
with a polynomial function.
For example, it is possible to find a (k – 1)-degree
polynomial that will perfectly fit k data points.
To be more specific, suppose we have seven
historical observations, denoted
(xi , yi), i = 1, 2, …, 7
It is possible to find a sixth-degree polynomial
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_6 x^6$$
that exactly passes through each of these seven
data points.
A perfect fit gives zero for the sum of squared
deviations.
However, this is deceptive, for it does not imply
much about the predictive value of the model for
use in future forecasting.
Despite the perfect fit of the polynomial function,
the forecast is very inaccurate. The linear fit
might provide more realistic forecasts.
Also, note that the polynomial fit has hazardous
extrapolation properties (i.e., the polynomial
“blows up” at its extremes).
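A small experiment illustrates both points. The sketch below builds the unique sixth-degree polynomial through seven points using Lagrange interpolation (one of several equivalent ways to construct it; the slides do not say which method was used) and compares its forecasts with a least squares line. The seven observations are invented for illustration.

```python
# Hypothetical observations (not from the slides): roughly linear with noise.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [10, 12, 11, 14, 13, 16, 15]


def lagrange_polynomial(xs, ys):
    """Return a function evaluating the degree-(k-1) polynomial through k points."""
    def evaluate(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = float(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return evaluate


def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


poly = lagrange_polynomial(xs, ys)
a, b = fit_line(xs, ys)

# The polynomial reproduces every observation, so its squared error is zero.
sse_poly = sum((y - poly(x)) ** 2 for x, y in zip(xs, ys))
print("SSE of sixth-degree fit:", sse_poly)

# Forecasts one to three periods beyond the data.
for x_new in (8, 9, 10):
    print(x_new, round(poly(x_new), 1), round(a + b * x_new, 1))
```

On these made-up points the polynomial forecasts -109 at x = 8, while the line gives about 16.6, even though the observed values all lie between 10 and 16.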
Reliability and Validity
• Does the model make intuitive sense? Is the
model easy to understand and interpret?
• Are the coefficients statistically significant
(p-values less than .05)?
• Are the signs associated with the coefficients
as expected?
• Does the model predict values that are
reasonably close to the actual values?
• Is the model sufficiently sound (high $R^2$, low
standard error, etc.)?
Correlation Coefficient and
Coefficient of Determination
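$$r = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[ n \sum X_i^2 - \left( \sum X_i \right)^2 \right] \left[ n \sum Y_i^2 - \left( \sum Y_i \right)^2 \right]}}$$

• Coefficient of determination = $r^2$.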

• Correlation coefficient = r.
• Where: Yi = dependent variable.
• Xi = independent variable.
• n = number of observations.
Summary: Causal Forecasting Models
• The goal of a causal forecasting model is to develop
the best statistical relationship between a dependent
variable and one or more independent variables.
• The most common model approach used in practice is
regression analysis. Only linear regression models
are examined in this course.
• In causal forecasting models, when one tries to
predict a dependent variable using a single
independent variable, it is called a simple regression
model.
• When one uses more than one independent variable
to forecast the dependent variable, it is called a
multiple regression model.
