Forecasting - Assignment1
Ravinderpal Wasu - 71710004
FORECASTING ANALYTICS - 1
Forecasting Analytics -1 Assignment-1 Ravinderpal S Wasu - 71710004
a. Time plot
[Graph 1: time plots of Shampoo Sales by Month-Year, Jan-95 to Dec-97 (y-axis 0 to 500); series shown: Shampoo Sales, Linear (Shampoo Sales), Expon. (Shampoo Sales), Poly. (Shampoo Sales)]
c. Shampoo is a daily-use hygiene product. Its sales are not tied to festivals or climatic seasons, so seasonality is weak.
b) Examining time plots of the series and of model forecasts only for the training period: No.
We should examine both the training and the validation periods.
d) Looking at MAPE and RMSE values for the validation period: Yes.
We should use the validation-period data to test the model built on the training period.
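A minimal Python sketch of the two validation metrics, assuming `actual` and `forecast` are aligned arrays covering only the validation period. The function names and sample numbers are hypothetical, not taken from the assignment:

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error over the validation period."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Hypothetical validation-period values, for illustration only
valid_actual = [100.0, 200.0, 300.0]
valid_forecast = [110.0, 190.0, 330.0]
print(rmse(valid_actual, valid_forecast), mape(valid_actual, valid_forecast))
```

RMSE is in the units of the series, while MAPE is scale-free, which is why both are usually reported together.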
c) What is the naive forecast for the validation period? (assume that you must provide forecasts for 12
months ahead)
Created the R code file FA_Assignment1_SouvenirSales_71710004.R.
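The R script itself is not reproduced here. As a hedged illustration only, a naive forecast simply repeats the last training-period value for every one of the 12 months ahead. `naive_forecast` and the sample values are hypothetical:

```python
import numpy as np

def naive_forecast(train, horizon=12):
    """Naive forecast: repeat the last observed training value for each future period."""
    train = np.asarray(train, dtype=float)
    return np.full(horizon, train[-1])

# Hypothetical training-period sales, for illustration only
train_sales = [5.0, 7.0, 6.0, 9.0]
forecast = naive_forecast(train_sales)
print(forecast)
```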
e) Histogram of errors
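The histogram itself did not survive extraction. A minimal sketch of how validation errors (actual minus forecast) could be binned for such a histogram, using NumPy's `np.histogram` on hypothetical error values:

```python
import numpy as np

# Hypothetical validation-period errors (actual - forecast), for illustration only
errors = np.array([-120.0, -40.0, 15.0, 30.0, 55.0, 80.0, 210.0, -10.0])

# Count errors in 4 equal-width bins spanning the observed range
# (NumPy's last bin also includes its right edge)
counts, bin_edges = np.histogram(errors, bins=4)
for lo, hi, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{lo:8.1f} to {hi:8.1f}: {'#' * int(n)}")
```

A roughly symmetric histogram centred near zero suggests the model is not systematically over- or under-forecasting.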
Create a new data sheet (ForecastingSheet) containing the next 12 months as input predictors, together with the 12 monthly dummy variables.
From the Data sheet, select the data range, the predictor inputs, and Log_Sales as the output variable.
Then select the ForecastingSheet, which holds the new forecast dates.
This produces a new score sheet, MLR_NewScore1, with the forecasts for the date range in the ForecastingSheet:
Date    Predicted Value (log)   Exp(Predicted Value)
Jan-02  9.5092637               13484.06
Feb-02  9.7827003               17724.45
Mar-02  10.249256               28261.51
Apr-02  9.9593767               21149.61
May-02  10.00683                22177.42
Jun-02  10.068191               23580.87
Jul-02  10.251837               28334.55
Aug-02  10.251367               28321.23
Sep-02  10.354752               31405.93
Oct-02  10.454834               34711.77
Nov-02  10.93621                56174.03
Dec-02  11.713723               122237.7
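Because the model was fitted to Log_Sales, the second column is the exponential of the first. A short Python check of that back-transformation, using the log-scale values from the table above:

```python
import math

# Log-scale predictions from the MLR_NewScore1 table (Jan-02 to Dec-02)
log_predictions = {
    "Jan-02": 9.5092637, "Feb-02": 9.7827003, "Mar-02": 10.249256,
    "Apr-02": 9.9593767, "May-02": 10.00683, "Jun-02": 10.068191,
    "Jul-02": 10.251837, "Aug-02": 10.251367, "Sep-02": 10.354752,
    "Oct-02": 10.454834, "Nov-02": 10.93621, "Dec-02": 11.713723,
}

# Undo the log transform: forecast sales = exp(predicted log sales)
sales_forecast = {month: math.exp(v) for month, v in log_predictions.items()}
for month, value in sales_forecast.items():
    print(f"{month}: {value:,.2f}")
```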
Because the RMSE of the linear trend model is lower than that of the quadratic trend model, the linear trend model could fit the series.
However, the series shows a downward and then an upward trend, which a straight line cannot capture.
Hence, the quadratic trend model seems the better choice to fit the series.
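A sketch of how linear and quadratic trends can be compared on a validation period, assuming a hypothetical U-shaped series (not the assignment data) and using NumPy's `polyfit`/`polyval`:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 49)  # hypothetical 48 monthly periods
# Hypothetical series that falls and then rises (U-shaped trend) plus noise
y = 300 - 8 * t + 0.2 * t**2 + rng.normal(0, 5, size=t.size)

n_train = 36
t_train, y_train = t[:n_train], y[:n_train]
t_valid, y_valid = t[n_train:], y[n_train:]

def validation_rmse(degree):
    """Fit a polynomial trend on the training period and score it on validation."""
    coefs = np.polyfit(t_train, y_train, deg=degree)
    pred = np.polyval(coefs, t_valid)
    return float(np.sqrt(np.mean((y_valid - pred) ** 2)))

rmse_linear = validation_rmse(1)
rmse_quadratic = validation_rmse(2)
print(rmse_linear, rmse_quadratic)
```

On a series that changes direction, the straight line extrapolates the wrong slope into the validation period, which is the weakness described above.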
b) and c)
Linear trend model: lag 1 shows a strong positive autocorrelation (0.8113547).
Lag  ACF
0    1
1    0.8113547
2    0.5372829
3    0.3468594
4    0.163308
5    0.007701
6    -0.099131
7    -0.216145
8    -0.336966
9    -0.367947
10   -0.35435

Lag  ACF
0    1
1    0.8684287
2    0.6858585
3    0.5293604
4    0.3555479
5    0.1802078
6    0.0350135
7    -0.105849
8    -0.242297
9    -0.322102
10   -0.354119

[ACF Plot for Var1: ACF by lag (0 to 10) with upper (UCI) and lower (LCI) confidence limits]
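For reference, a minimal sketch of the sample autocorrelation at lags 0 to 10, plus the common ±1.96/√n approximation for the confidence bounds. XLMiner's UCI/LCI may be computed differently, so treat the bounds as an assumption; the residual series is hypothetical:

```python
import numpy as np

def sample_acf(x, max_lag=10):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:len(x) - k]) / denom
                     for k in range(max_lag + 1)])

def acf_bounds(n):
    """Approximate 95% limits (LCI, UCI) for a white-noise series of length n."""
    bound = 1.96 / np.sqrt(n)
    return -bound, bound

# Hypothetical residual series, for illustration only
rng = np.random.default_rng(3)
example_residuals = rng.normal(size=60)
acf_values = sample_acf(example_residuals)
lci, uci = acf_bounds(len(example_residuals))
print(acf_values.round(3), lci, uci)
```

Lags whose ACF falls outside (LCI, UCI) are the ones flagged as significant.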
For both plots, the input predictor for the linear trend model is 'Time (t)', the running month index from January 1995 to December 2001.
For the quadratic trend model, we add 'Time2 (t²)' as a predictor alongside t.
As both plots show seasonality, we can add monthly dummy variables as additional predictors. Since we have 12 months of data for each of 7 years, we can add 12 dummy variables, one for each month, to capture the seasonality. (When the model includes an intercept, one of the 12 dummies is redundant, so only 11 enter the regression.)
Total predictors = Time (t) and the 12 dummy variables d1, d2, …, d12.
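A hedged sketch of this seasonal regression design in Python, assuming monthly data indexed t = 1, 2, … with a month number attached. It keeps an intercept and therefore uses 11 dummies (January as the baseline); the data and coefficients are hypothetical:

```python
import numpy as np

def design_matrix(t, month):
    """Columns: intercept, t, and 11 dummies for Feb..Dec (January is the baseline)."""
    t = np.asarray(t, dtype=float)
    month = np.asarray(month)
    dummies = np.column_stack([(month == m).astype(float) for m in range(2, 13)])
    return np.column_stack([np.ones_like(t), t, dummies])

# Hypothetical 7 years of monthly data (84 months), for illustration only
t = np.arange(1, 85)
month = (t - 1) % 12 + 1              # 1 = January, ..., 12 = December
rng = np.random.default_rng(1)
season = np.where(month == 12, 5000.0, 0.0)   # hypothetical December spike
y = 1000 + 200.0 * t + season + rng.normal(0, 100, size=t.size)

X = design_matrix(t, month)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"trend coefficient: {coef[1]:.2f}, December effect: {coef[-1]:.0f}")
```

Each dummy coefficient is then read as that month's average difference from January, after the trend is accounted for.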
In XLMiner, we created a data partition on the time series and ran the prediction.
[Screenshots: Data partition; Prediction setup]
(i) Sales start picking up in mid-November, peak in December, and drop in January.
The reasons are:
o It is the Christmas holiday season, when people buy gifts and souvenirs.
o It is winter in the northern hemisphere, so people travel south to Australia to enjoy the summer there.
The estimated trend coefficient of 245.36 means that each one-unit increase in the predictor t is associated with an expected increase of 245.36 units in the output.
In this case, sales are expected to increase by 245.36 Australian dollars with each passing month, for a given month-of-year effect.
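A quick worked check of what the coefficient implies: comparing the same calendar month one year apart, the seasonal dummy cancels and only the trend term remains, so the expected difference is 12 × 245.36. The helper name is hypothetical:

```python
TREND_PER_MONTH = 245.36  # estimated trend coefficient from the regression output

def expected_change(months_ahead, slope=TREND_PER_MONTH):
    """Expected change in sales (Australian dollars) from the trend term alone."""
    return slope * months_ahead

# Same calendar month one year later: only the trend term differs
print(round(expected_change(12), 2))
```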
(i) Here we fit an exponential trend model with seasonality.
The trend represents a constant percentage growth per month.
Fit a linear regression with log(y_t) as the output and t as the predictor:
$y_t = a\,e^{b t}\,e^{\epsilon_t}$
$\log(y_t) = b_0 + b_1 t + \epsilon_t$, where $b_0 = \log a$ and $b_1 = b$.
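A minimal sketch of this fit, assuming a hypothetical noiseless series generated from y_t = a·exp(b·t), so that regressing log(y_t) on t should recover b_1 = b and exp(b_0) = a:

```python
import numpy as np

t = np.arange(1, 61)              # hypothetical 60 monthly periods
a_true, b_true = 2000.0, 0.02     # hypothetical level and growth parameter
y = a_true * np.exp(b_true * t)   # noiseless exponential trend, for illustration

# Fit log(y_t) = b0 + b1 * t by least squares; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(t, np.log(y), deg=1)
a_hat = np.exp(b0)                # back-transform the intercept to the level a
print(b1, a_hat)
```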
In XLMiner, we created a data partition on the time series and ran the prediction.
[Screenshots: Data partition; Prediction setup]
The estimated trend coefficient of 0.02 in the log(sales) model means that each one-unit increase in t multiplies expected sales by e^0.02 ≈ 1.0202, that is, growth of about 2%.
In this case, sales are expected to grow each month by about 2% of the previous month's sales.
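The conversion from the log-scale coefficient to a percentage can be checked directly; `math.expm1` computes e^x − 1 accurately:

```python
import math

b1 = 0.02                        # trend coefficient on t in the log(sales) regression
approx_growth = b1               # small-coefficient approximation: about 2% per month
exact_growth = math.expm1(b1)    # exact monthly growth: e^b1 - 1, about 2.02%
print(f"approx {approx_growth:.2%}, exact {exact_growth:.2%}")
```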
Criterion                                    Model A            Model B
Multiplicative                               No                 Yes
Increase/decrease of the series over time    This is additive   This is multiplicative
RMSE                                         17451.547          7101.44
From the data above, Model B performs better because:
i) Its RMSE (7101.44) is lower than Model A's RMSE (17451.547).
ii) Model B captures a multiplicative trend over time.
e) ACF plot and AR(2) model for the exponential trend model with seasonality
Here, none of the lags beyond lag 2 reach the UCI or LCI; lags 1 and 2, however, are significant.
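A minimal sketch of fitting an AR(2) model to regression residuals by ordinary least squares. This conditional least-squares fit may give slightly different estimates from XLMiner's ARIMA routine, and the residual series here is simulated and hypothetical:

```python
import numpy as np

def fit_ar2(resid):
    """Least-squares fit of e_t = c + phi1 * e_{t-1} + phi2 * e_{t-2} + u_t."""
    e = np.asarray(resid, dtype=float)
    # Regressors: intercept, lag-1 values, lag-2 values, aligned with targets e[2:]
    X = np.column_stack([np.ones(len(e) - 2), e[1:-1], e[:-2]])
    coef, *_ = np.linalg.lstsq(X, e[2:], rcond=None)
    return coef  # [c, phi1, phi2]

# Hypothetical residuals simulated from an AR(2) process, for illustration only
rng = np.random.default_rng(42)
n = 5000
e = np.zeros(n)
for i in range(2, n):
    e[i] = 0.5 * e[i - 1] + 0.2 * e[i - 2] + rng.normal()

c, phi1, phi2 = fit_ar2(e)
print(round(phi1, 3), round(phi2, 3))
```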
We then take the new residuals to the MLR_ValidationScore sheet and recalculate the predicted values, adjusting them by the errors forecast by the AR(2) model.
Before the AR(2) adjustment, the Jan 2002 forecast was 9780.022 (reported at a 95% confidence level).
After the AR(2) residual correction, the Jan 2002 forecast is 10902.93 (reported at a 95% confidence level).
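The adjustment described above can be sketched as a two-level forecast: the regression forecast plus the AR(2) forecast of the next residual. The function name and inputs are hypothetical; if the regression was fitted to log(sales), both terms are on the log scale and the sum would be exponentiated afterwards:

```python
def ar2_adjusted_forecast(regression_forecast, e_last, e_prev, c, phi1, phi2):
    """Add the AR(2) one-step-ahead residual forecast to the regression forecast.

    e_last is the most recent residual (e_t) and e_prev the one before it (e_{t-1}).
    """
    residual_forecast = c + phi1 * e_last + phi2 * e_prev
    return regression_forecast + residual_forecast

# Hypothetical inputs, not the values behind the Jan 2002 figures
adjusted = ar2_adjusted_forecast(1000.0, e_last=50.0, e_prev=20.0,
                                 c=0.0, phi1=0.5, phi2=0.2)
print(adjusted)
```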