MGS3100 Chapter 13 Forecasting: Slides 13c: Causal Models and Regression Analysis
In a causal forecasting model, the forecast for the
quantity of interest “rides piggyback” on another
quantity or set of quantities.
In other words, our knowledge of the value of one
variable (or perhaps several variables) enables us
to forecast the value of another variable.
\[ \hat{y} = f(x_1, x_2, \ldots, x_n) \]
where
f is a forecasting rule, or function, and
x_1, x_2, ..., x_n is a set of variables.
In this representation, the x variables are often
called independent variables, whereas \(\hat{y}\) is the
dependent or response variable.
We either know the independent variables in
advance or can forecast them more easily than \(\hat{y}\).
Then the independent variables will be used in the
forecasting model to forecast the dependent
variable.
Companies often find by looking at past
performance that their monthly sales are directly
related to the monthly GDP, and thus figure that a
good forecast could be made using next month’s
GDP figure.
The problem is that next month’s GDP is not
known when the forecast is made. At best it is
itself a forecast, and thus not a truly independent
variable.
Using a causal forecasting model requires two
conditions:
1. There must be a relationship between
values of the independent and dependent
variables such that the former provides
information about the latter.
2. The values for the independent variables
must be known and available to the
forecaster at the time the forecast is made.
The existence of a mathematical relationship
does not guarantee that there is really cause
and effect.
CURVE FITTING:
AN OIL COMPANY EXPANSION
Consider an oil company that is planning to
expand its network of modern self-service
gasoline stations.
The company plans to use traffic flow (measured
in the average number of cars per hour) to
forecast sales (measured in average dollar sales
per hour).
The firm has had five stations in operation for
more than a year and has used historical data to
calculate the following averages:
The averages are plotted in a scatter diagram.
[Scatter diagram: Sales/hour ($) on the vertical axis versus Cars/hour on the horizontal axis]
Now, these data will be used to construct a
function that will be used to forecast sales at any
proposed location by measuring the traffic flow at
that location and plugging its value into the
constructed function.
Least Squares Fits
The method of least squares is a formal procedure
for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a
straight line or quadratic curve).
2. Within the set of functions specified in step
1, choose the specific function that
minimizes the sum of the squared
deviations between the data points and the
function values.
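The two steps can be sketched in Python as follows. This is a minimal illustration, not the procedure Excel uses: the data points and the three candidate lines are made-up values (the slide's table of averages is not reproduced here), and a real least squares fit searches over all possible a and b rather than a short list of candidates.

```python
# Hypothetical observations (NOT the slide data): (cars/hour, sales/hour in $)
data = [(50, 100), (100, 150), (150, 230), (200, 260)]

def sum_squared_deviations(a, b, points):
    """Sum of squared deviations between each observed y and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)

# Step 1: fix the functional form, here a straight line y = a + b*x.
# Step 2: within that form, keep the (a, b) pair with the smallest sum.
# The candidate pairs are arbitrary trial values chosen for illustration.
candidates = [(0.0, 1.5), (40.0, 1.1), (50.0, 1.0)]
best = min(candidates, key=lambda ab: sum_squared_deviations(ab[0], ab[1], data))

for a, b in candidates:
    print(a, b, sum_squared_deviations(a, b, data))
print("best candidate:", best)
```

Among these three candidates the line with a = 40 and b = 1.1 has the smallest sum of squared deviations, so it is the one step 2 would keep.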
To demonstrate the process, consider the sales-
traffic flow example.
1. Assume a straight line; that is, functions of
the form y = a + bx.
2. Draw the line in the scatter diagram and
indicate the deviations between observed
points and the function as d_i.
For example,
\[ d_1 = y_1 - [a + b x_1] = 220 - [a + 150b] \]
where
y_1 = actual sales/hr at location 1
x_1 = actual traffic flow at location 1
a = y-axis intercept for the function
b = slope for the function
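As a small sketch, the deviation for location 1 can be computed for any trial line. The values 220 and 150 come from the example above; the intercept and slope used here are hypothetical trial values, not the fitted coefficients.

```python
def deviation(y_actual, x, a, b):
    """Observed value minus the value on the line: d = y - (a + b*x)."""
    return y_actual - (a + b * x)

# Location 1: x1 = 150 cars/hour, y1 = $220/hour (from the example).
# a = 50 and b = 1.0 are hypothetical trial values for illustration only.
d1 = deviation(220, 150, a=50.0, b=1.0)
print(d1)  # 20.0
```

A positive deviation means the observed point lies above the trial line; a negative one means it lies below.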
[Scatter diagram: Sales/hour ($) versus Cars/hour, with the line y = a + bx drawn through the points and the deviations d1 to d5 marked]
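The intercept of the least squares line is the mean of y minus b times the mean of x:
\[ a = \frac{1}{n}\sum_{i=1}^{n} y_i - b\,\frac{1}{n}\sum_{i=1}^{n} x_i \]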
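A minimal sketch of the closed-form fit follows. The intercept uses the formula above (mean of y minus b times mean of x). The slope formula is not in the surviving slide text; the sketch assumes the standard least squares result, b = sum of (x - mean x)(y - mean y) divided by sum of (x - mean x) squared. The data are made-up values, not the five stations from the example.

```python
def fit_line(points):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Standard least squares slope (assumed; not shown in the slide text).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    b = s_xy / s_xx
    # Intercept: mean of y minus b times mean of x.
    a = y_bar - b * x_bar
    return a, b

# Hypothetical observations (NOT the slide data): (cars/hour, sales/hour in $)
data = [(50, 100), (100, 150), (150, 230), (200, 260)]
a, b = fit_line(data)

# Forecast sales/hour at a proposed site with 120 cars/hour (hypothetical).
forecast = a + b * 120
print(round(a, 2), round(b, 2), round(forecast, 2))
```

For these made-up points the fit gives an intercept of about 45 and a slope of about 1.12, so each additional car per hour adds roughly $1.12 to forecast hourly sales.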
In the resulting dialog, choose Regression.
In the Regression dialog, enter the Y-range and X-range.
Choose to place the output in a new worksheet called Results.
[Chart: Sales/hour ($) versus Cars/hour, showing the data points (Series1) and the fitted trend line (Linear (Series1))]
Another value given in Excel's summary output
is: R Square = 69.4%
This is a “goodness of fit” measure which
represents the \(R^2\) statistic discussed in
introductory statistics classes.
\(R^2\) ranges in value from 0 to 1 and gives an
indication of how much of the total variation in Y
from its mean is explained by the fitted trend line.
In fact, there are three different sums of squares:
TSS (Total Sum of Squares)
ESS (Error Sum of Squares)
RSS (Regression Sum of Squares)
The basic relationship between them is:
TSS = ESS + RSS
They are defined as follows:
\[ \mathrm{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]
\[ \mathrm{ESS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
\[ \mathrm{RSS} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
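The three definitions can be checked numerically with a short sketch. The data are made-up values, not the slide data, and the coefficients a = 45 and b = 1.12 are the least squares fit for those particular points. Note that TSS = ESS + RSS holds for the least squares line; it does not hold in general for an arbitrary line.

```python
def sums_of_squares(points, a, b):
    """Return (TSS, ESS, RSS) for the line y = a + b*x."""
    n = len(points)
    y_bar = sum(y for _, y in points) / n
    fitted = [a + b * x for x, _ in points]
    tss = sum((y - y_bar) ** 2 for _, y in points)                 # total
    ess = sum((y - f) ** 2 for (_, y), f in zip(points, fitted))   # error
    rss = sum((f - y_bar) ** 2 for f in fitted)                    # regression
    return tss, ess, rss

# Hypothetical observations (NOT the slide data): (cars/hour, sales/hour in $)
data = [(50, 100), (100, 150), (150, 230), (200, 260)]
# Least squares coefficients for these hypothetical points.
tss, ess, rss = sums_of_squares(data, a=45.0, b=1.12)
print(round(tss, 2), round(ess, 2), round(rss, 2))
```

For these points the sketch gives TSS of about 16100, ESS of about 420, and RSS of about 15680, which add up as the basic relationship requires.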
Essentially, the ESS is the amount of variation
that can’t be explained by the regression.
The RSS quantity is effectively the amount of the
original, total variation (TSS) that could be
removed using the regression line.
\(R^2\) is defined as:
\[ R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}} \]
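Combining this definition with TSS = ESS + RSS gives an equivalent form in terms of the unexplained variation. The short derivation below is a worked restatement, with the second line applying it to the R Square value of 69.4% reported by Excel for the example.

```latex
\[
R^2 = \frac{\mathrm{RSS}}{\mathrm{TSS}}
    = \frac{\mathrm{TSS} - \mathrm{ESS}}{\mathrm{TSS}}
    = 1 - \frac{\mathrm{ESS}}{\mathrm{TSS}}
\]
\[
R^2 = 0.694 \;\Rightarrow\; \frac{\mathrm{ESS}}{\mathrm{TSS}} = 1 - 0.694 = 0.306
\]
```

In other words, about 30.6% of the total variation in sales per hour is left unexplained by the regression line.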
Correlation Coefficient and
Coefficient of Determination
• Correlation coefficient = r.
• Where: Y_i = dependent variable.
• X_i = independent variable.
• n = number of observations.
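The formula for r did not survive in the slide text. The sketch below assumes the standard computational form of the Pearson correlation coefficient, written in terms of n, X_i and Y_i, and uses made-up data rather than the slide data. In simple linear regression, the square of r equals the coefficient of determination.

```python
from math import sqrt

def correlation(points):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_y2 = sum(y * y for _, y in points)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical observations (NOT the slide data): (cars/hour, sales/hour in $)
data = [(50, 100), (100, 150), (150, 230), (200, 260)]
r = correlation(data)
r_squared = r ** 2  # coefficient of determination for a simple regression
print(round(r, 4), round(r_squared, 4))
```

The sign of r matches the sign of the slope, so a positive r here indicates that sales rise with traffic flow.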
Summary: Causal Forecasting Models
• The goal of a causal forecasting model is to develop
the best statistical relationship between a dependent
variable and one or more independent variables.
• The most common modeling approach used in practice is
regression analysis. Only linear regression models
are examined in this course.
• In causal forecasting models, when one tries to
predict a dependent variable using a single
independent variable, it is called a simple regression
model.
• When one uses more than one independent variable
to forecast the dependent variable, it is called a
multiple regression model.
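The difference between the two model types can be sketched with one forecasting function. The function name, the coefficients, and the input values below are all hypothetical, chosen only to show that a simple regression uses one independent variable and a multiple regression uses several.

```python
def linear_forecast(intercept, coefficients, values):
    """Forecast y as intercept + sum of coefficient * independent variable."""
    if len(coefficients) != len(values):
        raise ValueError("need one value per coefficient")
    return intercept + sum(c * v for c, v in zip(coefficients, values))

# Simple regression: one independent variable (e.g., cars/hour).
# Coefficients are hypothetical, not fitted to the slide data.
simple = linear_forecast(45.0, [1.12], [120])

# Multiple regression: two independent variables (both hypothetical).
multiple = linear_forecast(20.0, [1.0, 0.5], [120, 30])

print(round(simple, 2), multiple)
```

In both cases the coefficients would come from a regression fit; only the number of independent variables changes.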