Extrapolation
Learning Objectives
1. Extrapolation
Concepts, assumptions, limitations
Alternative functional forms, linear and nonlinear
2.
3.
Part 1: Extrapolation
Forecasting Context
Uncertainty (forecasting error) increases with:
Longer forecast horizons
Smaller areas
Extrapolation Technique
Fit a function to a set of observations and extend this pattern into the future.
Use the function of best fit (least squares or regression).
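As a minimal sketch of the technique, the following fits a straight line by least squares and extends it to a future period. The helper name `fit_line` and the five sample observations are illustrative assumptions; the observations are generated from the slide's example line Y = 10X + 100.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b

# Illustrative observations lying on Y = 10X + 100
periods = [1, 2, 3, 4, 5]
values = [10 * x + 100 for x in periods]

a, b = fit_line(periods, values)
forecast = a + b * 8            # extend the fitted pattern to period 8
print(a, b, forecast)
```

With real data the points will not lie exactly on the line, so the fitted coefficients only approximate the trend.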
Assumptions
Use of aggregate data, generally across time (population, employment, etc.)
Future movement of the data series is determined by past patterns embedded in the series
The essential information about the future of the data series is contained in the history of the series
Past trends will continue into the future
Advantages / Benefits
Computational simplicity
Transparent methodology
Ease of application
May work for large areas, short time horizons, and slow-growth areas
Disadvantages / Risks
Does not account for underlying causes or structural conditions (example: cohorts are invisible)
Ignores structural and systemic context
Current trends often do not continue
Excludes any external considerations
Klosterman's Technique
Klosterman's approach transforms curves into lines and then performs linear regression.
Some functions are not available in Excel (e.g., Gompertz, modified exponential, logistic). His approach can be applied in Excel so that these alternative functions are available:
1. Transform the data according to his technique.
2. Fit a trend line using the Excel function.
3. Reverse the transformation to compute forecasted values.
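The transform, fit, reverse sequence can be sketched for a geometric curve Y = ab^X, where taking logarithms gives the straight line ln Y = ln a + X ln b. This is an illustration of the general idea under that assumption, not necessarily Klosterman's exact worksheet procedure; `fit_line` is a hypothetical helper, and the data come from the example curve Y = 10(1.1^X).

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

xs = list(range(6))
ys = [10 * 1.1 ** x for x in xs]          # observations on Y = 10(1.1^X)

log_ys = [math.log(y) for y in ys]        # 1. transform the curve into a line
ln_a, ln_b = fit_line(xs, log_ys)         # 2. fit a linear trend
a_hat, b_hat = math.exp(ln_a), math.exp(ln_b)
forecast_10 = a_hat * b_hat ** 10         # 3. reverse the transformation

print(round(a_hat, 4), round(b_hat, 4), round(forecast_10, 4))
```

The Gompertz, modified exponential, and logistic curves need different transformations, which are not shown here.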
Linear Function
[Figure: Linear function $Y = 10X + 100$; dependent variable (0 to 450) plotted against the independent variable (0 to 30)]
$Y_C = a + bX$
Geometric Function
[Figure: Geometric function $Y_C = 10(1.1^X)$; dependent variable (0 to 200) plotted against the independent variable (0 to 30)]
$Y_C = ab^X$
Parabolic Function
[Figure: Parabolic function $Y_C = 10 + 1.5X + 2X^2$; dependent variable plotted against the independent variable]
$Y_C = a + bX + cX^2$
Modified Exponential
[Figure: Modified exponential function; dependent variable (0 to 250) plotted against the independent variable (0 to 30)]
$Y_C = c + ab^X$
where
a is the Y intercept minus c (negative when the curve rises toward c)
b is the ratio of successive growth increments (constant)
c is the asymptotic value
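A short numerical check of these properties, as a sketch only: the parameter values a = -150, b = 0.8, c = 200 are hypothetical and are not taken from the slide's chart.

```python
def modified_exponential(x, a, b, c):
    """Y = c + a * b**x."""
    return c + a * b ** x

a, b, c = -150.0, 0.8, 200.0   # hypothetical parameters

ys = [modified_exponential(x, a, b, c) for x in range(6)]
increments = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
ratios = [increments[i + 1] / increments[i] for i in range(len(increments) - 1)]

intercept = ys[0]                                 # equals c + a
far_value = modified_exponential(200, a, b, c)    # approaches c

print(intercept, [round(r, 6) for r in ratios], far_value)
```

Each growth increment is b times the previous one, so the curve flattens as it approaches c.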
Gompertz Function
[Figure: Gompertz function $Y_C = 100(0.9)^{0.8^X}$; dependent variable (0 to 120) plotted against the independent variable (-20 to 20)]
$Y_C = ca^{b^X}$
If $\ln(a) < 0$ and $0 < b < 1$, then c is the upper limit.
The ratio of successive increments in the logarithms of the observations is constant (equal to b).
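The Gompertz properties can be checked numerically with the plotted example's parameters (c = 100, a = 0.9, b = 0.8). This is a sketch; the function name `gompertz` is an illustrative choice. It relies on ln Y = ln c + b^X ln a, so successive increments of ln Y shrink by the factor b.

```python
import math

def gompertz(x, a, b, c):
    """Y = c * a**(b**x)."""
    return c * a ** (b ** x)

c, a, b = 100.0, 0.9, 0.8      # parameters of the plotted example

log_ys = [math.log(gompertz(x, a, b, c)) for x in range(6)]
log_increments = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]
ratios = [log_increments[i + 1] / log_increments[i]
          for i in range(len(log_increments) - 1)]

upper = gompertz(100, a, b, c)  # approaches c because ln(a) < 0 and 0 < b < 1

print([round(r, 6) for r in ratios], round(upper, 6))
```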
Logistic Function
[Figure: Logistic function $Y_C = \dfrac{1}{10 + (0.5)(0.5)^X}$; dependent variable (scale up to 0.120) plotted against the independent variable]
$Y_C = \dfrac{1}{c + ab^X}$
If
1. b is between 0 and 1 and
2. a > 0 (as in the plotted example, where a = 0.5), with c > 0
then
1. the curve takes the S shape,
2. 1/c is the asymptotic value (upper limit), and
3. 0 is the lower limit.
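The limits of the logistic curve can be checked with the plotted example's parameters (c = 10, a = 0.5, b = 0.5). This is a sketch; the function name `logistic` is an illustrative choice. The reciprocal 1/Y = c + ab^X has the modified exponential form, which is what makes a linearizing transformation possible.

```python
def logistic(x, a, b, c):
    """Y = 1 / (c + a * b**x)."""
    return 1.0 / (c + a * b ** x)

c, a, b = 10.0, 0.5, 0.5       # parameters of the plotted example

upper = logistic(50, a, b, c)   # a*b**x -> 0, so Y -> 1/c
lower = logistic(-50, a, b, c)  # a*b**x grows without bound, so Y -> 0
middle = logistic(0, a, b, c)   # 1 / (c + a)

print(upper, lower, round(middle, 6))
```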
Regression
Fitting Equations
Given the existence of a time trend, fitting equations enables us to identify the mathematical function that best captures the relationship
Meanings of Coefficients
$R^2$: coefficient of determination, $0 \le R^2 \le 1$
$R^2 = 0$: no relationship
$R^2 = 1$: perfect fit
$R$: correlation coefficient
Square root of $R^2$, signed according to the direction of the relationship, $-1 \le R \le 1$
$R = 1$: perfect fit, positive relationship
$R = -1$: perfect fit, inverse relationship
$R = 0$: no relationship
[Figure: fitted line showing an observed value, its fitted value, and the deviation between them]
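Normal equations for the least-squares line (here a is the slope and b is the intercept, $f(x) = ax + b$):
$$a\sum_{i=1}^{n} x_i + bn = \sum_{i=1}^{n} y_i$$
$$a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i$$
Inverse
[Worked matrix solution for a and b; surviving values: inverse-matrix entries 0.0070 and -0.0574, right-hand side 27.5 and 299]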
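The normal equations can be solved with a 2x2 matrix inverse, sketched below using the five-point data set from this section's Excel example (x = 2, 4, 7, 11, 17; y = 2.00, 3.50, 4.50, 8.00, 9.50). The variable names are illustrative; a is the slope and b the intercept, matching the order of the unknowns in the equations above.

```python
xs = [2, 4, 7, 11, 17]
ys = [2.0, 3.5, 4.5, 8.0, 9.5]
n = len(xs)

sum_x = sum(xs)
sum_y = sum(ys)                               # 27.5
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 299.0

# System: [[sum_x, n], [sum_xx, sum_x]] * [a, b] = [sum_y, sum_xy]
det = sum_x * sum_x - n * sum_xx
inv = [[sum_x / det, -n / det],
       [-sum_xx / det, sum_x / det]]

a = inv[0][0] * sum_y + inv[0][1] * sum_xy    # slope
b = inv[1][0] * sum_y + inv[1][1] * sum_xy    # intercept

print(round(inv[0][0], 4), round(inv[0][1], 4))
print(round(a, 6), round(b, 6))
```

The first row of the inverse reproduces the values -0.0574 and 0.0070 that survive from the slide, and the solution matches the slope and intercept in the Excel regression output.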
$$r^2 = 1 - \frac{SSE}{SST}$$
Where:
$$SSE = \sum_{i=1}^{n} [y_i - f(x_i)]^2$$
$$SST = \sum_{i=1}^{n} [y_i - \bar{y}]^2$$
Values of $r^2$
$0 \le r^2 \le 1$
As $r^2$ approaches 1, the fit is better.
As $r^2$ approaches 0, the fit is worse.
Calculating $r^2$ in Excel

                                Error        Error^2
i   x    y      f(x)        y-f(x)       (y-f(x))^2   y-(AveY)   (y-(AveY))^2
1   2    2.00   2.308824    -0.308824    0.095372     -3.50      12.25
2   4    3.50   3.338235     0.161765    0.026168     -2.00       4.00
3   7    4.50   4.882353    -0.382353    0.146194     -1.00       1.00
4   11   8.00   6.941176     1.058824    1.121107      2.50       6.25
5   17   9.50   10.029412   -0.529412    0.280277      4.00      16.00
                                         1.67 (SSE)              39.50 (SST)

Average of Y: 5.5

[Figure: Y and Predicted Y plotted against X Variable 1]

               Coefficients   Standard Error   t Stat     P-value
Intercept      1.279411765    0.610944132      2.094155   0.127272
X Variable 1   0.514705882    0.062419278      8.245944   0.003734
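The worksheet calculation can be reproduced outside Excel. The sketch below uses the data and the fitted coefficients from the regression output above; the variable names are illustrative.

```python
xs = [2, 4, 7, 11, 17]
ys = [2.0, 3.5, 4.5, 8.0, 9.5]

intercept = 1.279411765   # from the Excel regression output
slope = 0.514705882

fitted = [intercept + slope * x for x in xs]
mean_y = sum(ys) / len(ys)

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # sum of squared errors
sst = sum((y - mean_y) ** 2 for y in ys)              # total sum of squares
r2 = 1 - sse / sst

print(round(sse, 2), round(sst, 2), round(r2, 4))
```

With SSE of about 1.67 and SST of 39.50, the fit explains roughly 96 percent of the variation in y.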
Worksheet examples
Go to Options tab
Tip: Select the formula label, then format, and increase the number of digits to the largest possible. This will result in a more precise computation.
[Figure: observed values by year with a fitted linear trend line; x axis 1940 to 2020, y axis 0 to 3,000,000]
Trend line equation shown on the chart: y = 29,481.45x + 316,996.53
$R^2 = 0.962$
Tip: For final presentation purposes, reduce the number of digits displayed in the equation.
29481.450370525*(2020) + 316996.533799534 = 59,869,526
This is much too high.
29481.450370525*(81) + 316996.533799534 = 2,704,994
This is the correct formula. Excel fitted the line to an index, not to the calendar year: x = 81 is the index value for 2020 when the series starts at 1 in 1940. You can determine how your version of Excel interprets the x in your equations by experimenting. See the example spreadsheet CalcExtrap.xls.
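The two calculations can be compared in a few lines. In this sketch the helper `year_to_index` and the base year of 1940 are assumptions inferred from the fact that 2020 corresponds to x = 81; the coefficients are the ones reported above.

```python
slope = 29481.450370525
intercept = 316996.533799534

def year_to_index(year, base_year=1940):
    """Index value for a year when the base year is indexed as 1 (assumed base)."""
    return year - base_year + 1

using_year = slope * 2020 + intercept                   # x taken as the calendar year
using_index = slope * year_to_index(2020) + intercept   # x taken as the index

print(year_to_index(2020), round(using_year), round(using_index))
```

Substituting the calendar year inflates the forecast by more than a factor of twenty, which is why it matters to check how x was defined when the trend line was fitted.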
[Figure: residuals plotted by year, 1940 to 2000]
A linear regression fitted to the residuals collapses onto the x axis. The sum of the residuals is zero.
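This property of least-squares residuals can be verified numerically. The sketch below reuses the five-point data set from the $r^2$ example, not the population series behind the chart; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

xs = [2, 4, 7, 11, 17]
ys = [2.0, 3.5, 4.5, 8.0, 9.5]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

residual_sum = sum(residuals)
res_a, res_b = fit_line(xs, residuals)   # regress the residuals on x

print(abs(residual_sum) < 1e-9, abs(res_a) < 1e-9, abs(res_b) < 1e-9)
```

Both the intercept and the slope of the second fit are zero up to rounding error, so the fitted line lies on the x axis.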
Ratio Methods
Smith, Tayman, Swanson, Chapter 8
The smaller region (city) is contained in a larger region (county or state).
The projection of the larger region is used to derive the projection of the smaller region.
Depends upon a preexisting forecast or projection of the larger region.
These will be used in the economic models section of the course.
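One simple ratio method holds the smaller region's share of the larger region constant. The sketch below assumes that constant-share variant, which is only one of several ratio methods, and all population figures in it are hypothetical.

```python
# Hypothetical figures, for illustration only
city_base = 50_000          # smaller region, base year
county_base = 200_000       # larger region, base year
county_projection = 240_000 # preexisting projection of the larger region

share = city_base / county_base              # smaller region's share of the larger
city_projection = share * county_projection  # constant-share assumption

print(share, city_projection)
```

The result is only as good as the larger region's projection and the assumption that the share stays fixed.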
Extrapolation - Summary
Use with care.
Just because a function fits (high $r^2$) does not mean that the extrapolation is reasonable.
Make your assumptions explicit.
Generally there are growth limits at some point.
Calculation of the forecast value when using Excel may require the construction of an index.
Use the reported equation and substitute either the year or index number into the formula for x. If you create an index, the beginning value should be 1.