
Forecasting - Introduction

Dinesh Kumar
"Those who have knowledge don't predict. Those who predict don't have knowledge."
- Lao Tzu

"I think there is a world market for maybe 5 computers."
- Thomas Watson, Chairman of IBM, 1943

"Computers in the future may weigh no more than 1.5 tons."
- Popular Mechanics, 1949

"640K ought to be enough for everybody."
- Attributed to Bill Gates, 1981 (attribution disputed)

But, forecasting helps!
Forecasting
• Forecasting is the process of estimating an unknown event or parameter.
• The term forecasting is most commonly used in connection with time series data.
• A time series is a sequence of data points measured at successive time intervals.
[Figure: Forecasting in the corporate planning hierarchy]
• Business forecasting feeds corporate strategy, product and market planning, and financial planning.
• Aggregate forecasting feeds aggregate production planning and resource planning.
• Item forecasting feeds master production planning and capacity planning.
• Spares forecasting feeds materials requirement planning and capacity requirement planning.
Forecasting Methods
• Qualitative techniques
  – Expert opinion, astrologers, Vaastu experts.
• Quantitative techniques
  – Time series techniques.
• Causal models
  – Use information about relationships between system elements (e.g. regression).
Time Series Techniques
• Moving Average.
• Exponential Smoothing.
• Extrapolation.
• Trend Estimation.
• Auto-regression.
Time Series Analysis - Application
• Time series analysis helps to explain:
  – Any systematic variation in the series of data, which is usually due to seasonality.
  – Cyclical patterns that repeat.
  – Trends in the data.
  – Growth rates of these trends.
Time Series Components
• Trend
• Cyclical
• Seasonal
• Irregular
Trend Component
• Persistent, overall upward or downward pattern.
• Due to population, economy, technology, etc.
• Lasts several years.
Cyclical Component
• Repeating up-and-down movements.
• Due to the interaction of factors influencing the economy.
• Usually 2-10 years in duration.
Seasonal Component
• Regular pattern of up-and-down movements.
• Due to weather, customs, etc.
• Occurs within one year.
Irregular Component
• Erratic, unsystematic fluctuations.
• Due to random variation or unforeseen events such as strikes or floods.
• Short duration and non-repeating.
[Figure: Demand (y-axis) vs. time (x-axis) in four panels: (a) trend with random movement, (b) cycle, (c) seasonal pattern, (d) trend with seasonal pattern.]
Time Series Techniques for Forecasting
Why Time Series Analysis?
• Time series analysis helps to identify and explain:
  – Any systematic variation in the series of data that is due to seasonality.
  – Cyclical patterns that repeat.
  – Trends in the data.
  – Growth rates in the trends.
Seasonal vs. Cyclical
• When a cyclical pattern in the data has a period of one year, it is referred to as seasonal variation.
• When the cyclical pattern has a period of more than one year, we refer to it as cyclical variation.
Seasonal Model
The level, trend, and seasonality can be combined in three basic ways:

Multiplicative: Systematic component = level × trend × seasonal factor

Additive: Systematic component = level + trend + seasonal factor

Mixed: Systematic component = (level + trend) × seasonal factor
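
A minimal Python sketch of the three combinations (the function and argument names are illustrative, not from the slides):

```python
# Sketch: the three ways of combining level, trend, and seasonality.

def systematic_component(level, trend, seasonal_factor, form="mixed"):
    """Return the systematic demand component for one period."""
    if form == "multiplicative":
        return level * trend * seasonal_factor
    if form == "additive":
        return level + trend + seasonal_factor
    if form == "mixed":
        return (level + trend) * seasonal_factor
    raise ValueError(f"unknown form: {form}")

# Example: level 100, trend 5, seasonal factor 1.2
print(systematic_component(100, 5, 1.2, "mixed"))  # (100 + 5) * 1.2 = 126.0
```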


Smoothing Techniques
• Smoothing is a technique for removing random variation from the data while retaining the trend and cyclic variation.
• Smoothing techniques:
  – Moving average smoothing
  – Exponential smoothing
Moving Average (Rolling Average)
• Simple moving average
  – Used mainly to capture the trend and smooth short-term fluctuations.
  – The most recent n data points are given equal weights.
• Weighted moving average
  – Uses unequal weights for the data points.
Simple Moving Average
• The forecast for period t+1, F_{t+1}, is the average of the n most recent observations:

$$F_{t+1} = \frac{1}{n}\sum_{i=t-n+1}^{t} D_i$$

where F_{t+1} is the forecast for period t+1 and D_i is the demand observed in period i.
Moving Average Example - Electra TV Sales

• Electra City is a retail store that sells electronic goods. Each week the manager of the store must order merchandise from a distant warehouse. Currently the manager is trying to estimate how many TVs the store is likely to sell next week. To assist this process, she has collected weekly sales data (TV sales.xls); the table below shows the number of TVs sold in each of the previous 16 weeks. She wants to use this data for decision making.
TV Sales Data for 16 Weeks

Week | TVs sold | Week | TVs sold
1 | 49 | 9 | 63
2 | 77 | 10 | 85
3 | 90 | 11 | 98
4 | 79 | 12 | 88
5 | 57 | 13 | 73
6 | 90 | 14 | 102
7 | 92 | 15 | 98
8 | 80 | 16 | 89
Moving Average with n = 2 and n = 4

$$F_{t+1,2} = \frac{1}{2}\sum_{i=t-1}^{t} D_i \qquad\qquad F_{t+1,4} = \frac{1}{4}\sum_{i=t-3}^{t} D_i$$
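
A minimal Python sketch of these forecasts, using the 16 weeks of TV sales data above (the function name is illustrative):

```python
# Simple moving average forecast F_{t+1}: average of the n most recent values.
tv_sales = [49, 77, 90, 79, 57, 90, 92, 80,
            63, 85, 98, 88, 73, 102, 98, 89]

def moving_average_forecast(data, n):
    """Average of the n most recent observations."""
    return sum(data[-n:]) / n

print(moving_average_forecast(tv_sales, n=2))  # (98 + 89) / 2 = 93.5
print(moving_average_forecast(tv_sales, n=4))  # (73 + 102 + 98 + 89) / 4 = 90.5
```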
[Figure: TV sales vs. time, showing the actual series with the 2-period and 4-period moving-average forecasts.]
Length of Moving Average
• What n should be chosen?
  – A small n makes the level very responsive to the last observed demand points.
  – A large n makes the level less responsive (smoother).
Forecasting Accuracy

• The forecast error is the difference between the actual value and the forecast value for the corresponding period:

$$E_t = Y_t - F_t$$

where E_t is the forecast error at period t, Y_t is the actual value at period t, and F_t is the forecast for period t.
Measures of Aggregate Error

Mean absolute error (MAE): $$MAE = \frac{1}{n}\sum_{t=1}^{n}|E_t|$$

Mean absolute percentage error (MAPE): $$MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{E_t}{Y_t}\right|$$ (usually expressed as a percentage)

Mean squared error (MSE): $$MSE = \frac{1}{n}\sum_{t=1}^{n}E_t^2$$

Root mean squared error (RMSE): $$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}E_t^2}$$
Exponential Smoothing
• A form of weighted moving average:
  – Weights decline exponentially.
  – The largest weight is given to the most recent observation, less weight to the immediately preceding observation, and so on.
• Requires a smoothing constant (α), which ranges from 0 to 1.
Exponential Smoothing

Next forecast = α × (present actual value) + (1 - α) × (present forecast)
Simple Exponential Smoothing Equations

• Smoothing equations:

$$L_t = \alpha Y_{t-1} + (1-\alpha) L_{t-1}, \qquad L_1 = Y_1$$

where L_t is the level (smoothed value) at time period t.
• Forecast equation:

$$F_t = L_t$$
Simple Exponential Smoothing Equations

• Expanding the smoothing equation shows the exponentially declining weights:

$$L_t = \alpha Y_{t-1} + \alpha(1-\alpha) Y_{t-2} + \alpha(1-\alpha)^2 Y_{t-3} + \dots$$
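
A minimal Python sketch of simple exponential smoothing following the slide's convention (L_1 = Y_1, F_t = L_t); the choice of alpha is illustrative:

```python
# Simple exponential smoothing:
# L_1 = Y_1, L_t = alpha * Y_{t-1} + (1 - alpha) * L_{t-1}, F_t = L_t.

def ses_levels(y, alpha):
    levels = [y[0]]  # L_1 = Y_1
    for t in range(1, len(y)):
        levels.append(alpha * y[t - 1] + (1 - alpha) * levels[-1])
    return levels

tv_sales = [49, 77, 90, 79, 57, 90, 92, 80]
alpha = 0.4  # illustrative choice
levels = ses_levels(tv_sales, alpha)
forecast_next = alpha * tv_sales[-1] + (1 - alpha) * levels[-1]  # F_{n+1}
print(round(forecast_next, 1))
```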
[Figure: Demand vs. time, showing the actual series with exponential smoothing forecasts for alpha = 0.8, 0.6, and 0.4.]

Choice of α
• The larger the value of α, the faster the forecast series responds to changes in the original series.
• The smaller the value of α, the less sensitive the forecast is to changes in the original series.
Choice of α
• For "smooth" data, try a high value of α; the forecast is responsive to the most current data.
• For "noisy" data, try a low value of α; the forecast is more stable and less responsive.
Double Exponential Smoothing – Holt's Model

• A problem with simple exponential smoothing is that it produces consistently biased forecasts in the presence of a trend.
• Holt's method (double exponential smoothing) is appropriate when demand has a trend but no seasonality.
• Systematic component of demand = level + trend.
Holt's Method
• Holt's method can be used to forecast when there is a linear trend present in the data.
• The method requires separate smoothing constants for the slope and the intercept.
Level Model
The demand x_t in a specific period t consists of the level a plus random noise u_t (which cannot be estimated by a forecasting method). Thus

$$x_t = a + u_t$$

Trend Model
The linear trend b is added to the level model's equation:

$$x_t = a + bt + u_t$$
Holt's Method
• Holt's equations:

$$\text{(i)}\quad L_t = \alpha Y_t + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$\text{(ii)}\quad T_t = \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1}$$

• Forecast equations:

$$F_{t+1} = L_t + T_t, \qquad F_{t+m} = L_t + m T_t$$
Initial Values of L_t and T_t
• L_1 is in general set to Y_1.
• T_1 can be set to any one of the following values (or use regression to get initial values):

$$T_1 = Y_2 - Y_1$$
$$T_1 = \left[(Y_2 - Y_1) + (Y_3 - Y_2) + (Y_4 - Y_3)\right] / 3$$
$$T_1 = (Y_n - Y_1)/(n - 1)$$
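
A minimal Python sketch of Holt's method using L_1 = Y_1 and T_1 = Y_2 - Y_1 from above; the smoothing constants are illustrative:

```python
# Holt's (double exponential) smoothing with L_1 = Y_1, T_1 = Y_2 - Y_1.

def holt_forecast(y, alpha, beta, m=1):
    """m-step-ahead forecast F_{n+m} = L_n + m * T_n."""
    level, trend = y[0], y[1] - y[0]
    for t in range(1, len(y)):
        new_level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + m * trend

tv_sales = [49, 77, 90, 79, 57, 90, 92, 80, 63, 85, 98, 88, 73, 102, 98, 89]
print(round(holt_forecast(tv_sales, alpha=0.3, beta=0.1), 1))
```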
Double Exponential Smoothing
[Figure: Actual vs. forecast demand over 18 periods using double exponential smoothing.]
Forecasting Power of a Model
Theil's Coefficients

$$U_1 = \frac{\sqrt{\sum_{t=1}^{n}(Y_t - F_t)^2}}{\sqrt{\sum_{t=1}^{n}Y_t^2} + \sqrt{\sum_{t=1}^{n}F_t^2}}, \qquad U_2 = \sqrt{\frac{\sum_{t=1}^{n-1}\left(\frac{F_{t+1} - Y_{t+1}}{Y_t}\right)^2}{\sum_{t=1}^{n-1}\left(\frac{Y_{t+1} - Y_t}{Y_t}\right)^2}}$$

• U_1 is bounded between 0 and 1, with values closer to zero indicating greater accuracy.
• If U_2 = 1, there is no difference between the naive forecast and the forecasting technique.
• If U_2 < 1, the technique is better than the naive forecast.
• If U_2 > 1, the technique is no better than the naive forecast.
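
A minimal Python sketch of U_1 and U_2 as defined above (function names and sample numbers are illustrative):

```python
import math

def theil_u1(actual, forecast):
    num = math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)))
    den = (math.sqrt(sum(y * y for y in actual))
           + math.sqrt(sum(f * f for f in forecast)))
    return num / den

def theil_u2(actual, forecast):
    # Relative errors of the technique vs. the naive forecast, t = 1..n-1
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
              for t in range(len(actual) - 1))
    return math.sqrt(num / den)

# Illustrative numbers only
y = [90.0, 92.0, 80.0, 63.0, 85.0]
f = [88.0, 90.0, 85.0, 70.0, 80.0]
print(round(theil_u1(y, f), 3), round(theil_u2(y, f), 3))
```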
Theil's Coefficients for the TV Sales Problem

Method | U1 | U2
Moving average with 2 periods | 0.1488 | 0.8966
Double exponential smoothing | 0.1304 | 0.7864
Auto-Regressive Integrated Moving Average (ARIMA)
• ARIMA has the following three components:
  – Auto-regressive component: a function of past values of the time series.
  – Integration component: differencing the time series to make it a stationary process.
  – Moving average component: a function of past error values.
Auto-Regression
• An auto-regression is a regression model in which Y_t is regressed against its own lagged values.
• The number of lags used as regressors is called the order of the autoregression.
• In a first-order autoregression, Y_t is regressed against Y_{t-1}.
• In a p-th order autoregression, Y_t is regressed against Y_{t-1}, Y_{t-2}, ..., Y_{t-p}.
Auto-Regressive Process (AR(p))
• Assume {ε_t} is purely random with mean zero and standard deviation σ_ε.
• Then the auto-regressive process of order p, or AR(p) process, is

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t$$

• An AR(p) process models each future observation as a function of the p previous observations.
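
As a sketch, an AR(2) model can be fitted with the statsmodels package (assuming statsmodels and numpy are installed); the simulated series and coefficients are illustrative:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(2, 200):  # simulate Y_t = 0.6 Y_{t-1} - 0.3 Y_{t-2} + e_t
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

fit = AutoReg(y, lags=2).fit()
print(fit.params)                             # intercept and lag coefficients
print(fit.predict(start=len(y), end=len(y)))  # one-step-ahead forecast
```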
Moving Average Process (MA(q))
• Start with {ε_t} being white noise (purely random) with mean zero and standard deviation σ_ε.
• {Y_t} is a moving average process of order q, written MA(q), if for some constants α_0, α_1, ..., α_q we have

$$Y_t = \alpha_0 + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \dots + \alpha_q \epsilon_{t-q} + \epsilon_t$$

• An MA(q) process models each future observation as a function of the q previous errors.
Stationarity
• A stochastic process is stationary if:
  – The mean is constant over time.
  – The variance is constant over time.
  – The covariance between two time periods (Y_t and Y_{t+k}) depends only on the lag k, not on the time t.
[Figure: ACF plots of a non-stationary and a stationary process.]
Non-Stationary Process
• The existence of non-stationarity is indicated by an ACF that remains large at long lags.
• Stationarity can be achieved by differencing. Differencing once is generally sufficient; twice may be needed.
Differencing
• Differencing is the process of transforming a non-stationary process into a stationary one.
• In differencing, we create a new process X_t, where X_t = Y_t - Y_{t-1}.
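
A short sketch of first and second differencing with pandas (assuming it is installed; the data values are illustrative):

```python
import pandas as pd

y = pd.Series([49, 77, 90, 79, 57, 90, 92, 80])
x = y.diff().dropna()          # X_t = Y_t - Y_{t-1}  (d = 1)
x2 = y.diff().diff().dropna()  # differences of the differences (d = 2)
print(x.tolist())
```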


Integration (d)
• The integration order d captures whether the process is stationary or non-stationary.
• Instead of the observed values, the differences between observed values are modelled.
• When d = 0, the observations are modelled directly. If d = 1, the differences between consecutive observations are modelled. If d = 2, the differences of the differences are modelled.
ARIMA(p, d, q)
• The q and p values are identified using the auto-correlation function (ACF) and the partial auto-correlation function (PACF), respectively.
• Usually p + q <= 4 and d <= 2.
ARIMA(p, 0, q) Model

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \alpha_1 \epsilon_{t-1} + \dots + \alpha_q \epsilon_{t-q} + \epsilon_t$$

where the β terms form the AR(p) part and the α terms form the MA(q) part.
ARIMA(p, 1, q) Process

$$X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_p X_{t-p} + \alpha_1 \epsilon_{t-1} + \dots + \alpha_q \epsilon_{t-q} + \epsilon_t$$

where X_t = Y_t - Y_{t-1}.
Auto-Correlation
• Auto-correlation is the correlation between successive observations over time.
• The autocorrelation for a k-period lag is given by:

$$r_k = \frac{\sum_{t=1}^{n-k}\left(Y_{t+k} - \bar{Y}\right)\left(Y_t - \bar{Y}\right)}{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)^2}$$
Auto-Correlation Function
• A plot of the autocorrelations across k lags is called the autocorrelation function (ACF) or a correlogram.
[Figure: ACF plot for Harmon Foods]
[Figure: PACF plot for Harmon Foods]
Hypothesis Test for Autocorrelation
• To test whether the autocorrelation at lag k is significantly different from 0, the following hypothesis test is used:
  – H0: r_k = 0
  – HA: r_k ≠ 0
• For any k, reject H0 if |r_k| > 1.96/√n, where n is the number of observations.
Harmon Foods Example
• With n = 48 observations, the significance bound is 1.96/√48 ≈ 0.28; autocorrelations larger than this in magnitude are significantly different from zero.
Partial Auto-Correlation
• The partial auto-correlation at lag k is the auto-correlation between Y_t and Y_{t+k} after the removal of the linear dependence of Y_{t+1} through Y_{t+k-1}.
• To test whether the partial autocorrelation at lag k (φ_k) is significantly different from 0, the following hypothesis test is used:
  – H0: φ_k = 0
  – HA: φ_k ≠ 0
• For any k, reject H0 if |φ_k| > 1.96/√n, where n is the number of observations.
[Figure: Harmon Foods partial auto-correlation plot]
Pure AR and MA Processes
• A non-stationary process has an ACF that is significant for a large number of lags.
• Auto-regressive processes have an exponentially declining ACF and spikes in the first one or more lags of the PACF. The number of spikes indicates the order of the auto-regression.
• Moving average processes have spikes in the first one or more lags of the ACF and an exponentially declining PACF. The number of spikes indicates the order of the moving average.
• Mixed (ARMA) processes typically show exponential declines in both the ACF and the PACF.
Pure AR & MA Model Identification

Model | ACF | PACF
AR(1) | Exponential decay: on the positive side if β1 > 0, alternating in sign starting on the negative side if β1 < 0. | Spike at lag 1, then cuts off to zero. Spike positive if β1 > 0, negative if β1 < 0.
AR(p) | Exponential decay: pattern depends on the signs of β1, β2, etc. | Spikes at lags 1 to p, then cuts off to zero.
MA(1) | Spike at lag 1, then cuts off to zero. Spike positive if α1 > 0, negative if α1 < 0. | Exponential decay: on the negative side if α1 > 0, on the positive side if α1 < 0.
MA(q) | Spikes at lags 1 to q, then cuts off to zero. | Exponential decay or sine wave. Exact pattern depends on the signs of α1, α2, etc.
ARMA(p, q) Model Identification
• ARMA(p, q) models are not easy to identify. We usually start with pure AR and MA processes. The following rule of thumb may be used:

Process | ACF | PACF
ARMA(p, q) | Tails off after (q - p) lags | Tails off after (p - q) lags

• The final ARMA model may be selected based on measures such as RMSE, MAPE, AIC, and BIC (see the sketch below).
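
A sketch of this selection step with statsmodels (assuming statsmodels and numpy are installed); the candidate orders and data are illustrative:

```python
# Fit a few candidate ARIMA(p, d, q) models and compare AIC/BIC.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.array([49, 77, 90, 79, 57, 90, 92, 80,
              63, 85, 98, 88, 73, 102, 98, 89], dtype=float)

for order in [(1, 0, 0), (0, 0, 1), (1, 0, 1)]:
    fit = ARIMA(y, order=order).fit()
    print(order, "AIC:", round(fit.aic, 1), "BIC:", round(fit.bic, 1))
```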
Forecasting Model Evaluation
Akaike's Information Criterion:

AIC = -2LL + 2m

where LL is the model's log-likelihood and m is the number of parameters estimated in the model.

Bayesian Information Criterion:

BIC = -2LL + m ln(n)

where m is the number of parameters estimated in the model and n is the number of observations.
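
A minimal sketch of these two formulas in Python (the log-likelihood value in the example is made up):

```python
import math

def aic(log_likelihood, m):
    """AIC = -2LL + 2m, where m = number of estimated parameters."""
    return -2 * log_likelihood + 2 * m

def bic(log_likelihood, m, n):
    """BIC = -2LL + m ln(n), where n = number of observations."""
    return -2 * log_likelihood + m * math.log(n)

# Illustrative values only
print(aic(-52.3, m=3), round(bic(-52.3, m=3, n=16), 2))
```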
Recommended Readings
• F. Diebold, "Forecasting Applications and Methods", Cengage Learning, 2009.
• J. Holton Wilson and Barry Keating, "Business Forecasting", Tata McGraw Hill, 2010.
