Assignment 1

Ravinderpal Wasu - 71710004

FORECASTING ANALYTICS - 1
Forecasting Analytics -1 Assignment-1 Ravinderpal S Wasu - 71710004

1. Page 44, Problem 6
2. Page 58, Problem 2
3. Page 58, Problem 1
4. Page 103, Problem 2
5. Page 111, Problem 6

1. Page 44, Problem 6

SOLUTION – Page 44, Problem 6


Shampoo sales data provided for three years (1995, 1996 and 1997):

Month   1995    1996    1997
Jan     266.0   194.3   339.7
Feb     145.9   149.5   440.4
Mar     183.1   210.1   315.9
Apr     119.3   273.3   439.3
May     180.3   191.4   401.3
Jun     168.5   287.0   437.4
Jul     231.8   226.0   575.5
Aug     224.5   303.6   407.6
Sep     192.8   289.9   682.0
Oct     122.9   421.6   475.3
Nov     336.5   264.5   581.3
Dec     185.9   342.3   646.9



a. Time plot

Graph 1: Shampoo Sales v/s Time (y-axis: Shampoo Sales, 0 to 800; x-axis: Month-Year, Jan-95 to Dec-97). Series shown: Shampoo Sales, with Linear, Exponential (Expon.) and Polynomial (Poly.) trend lines fitted to it.

b. Time series components present in this data:

Component              Details
Systematic part
  Level                Yes. Average = 312.6.
                       Represented by the Level line in the graph below.
                       [Graph: Shampoo Sales v/s Time; y-axis: Shampoo Sales (0 to 800), x-axis: Month - Year; series: Shampoo Sales, Level]
  Trend                Yes. There is an upward exponential trend with multiplicative seasonality.
                       Represented in the time plot (Graph 1) above by:
                       a) Linear trend
                       b) Exponential trend
                       c) Polynomial trend
  Seasonal patterns    Yes. There is repetitive, cyclical behavior, although it is not perfectly regular.
                       Sales go up every Sep-Oct and go down in Dec-Jan.
                       In addition, sales in this season increase every year, so the seasonality is also multiplicative.
Non-systematic part
  Noise                Yes. Noise is always present in a time series,
                       although it cannot be gauged visually from the graph.
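The level of 312.6 is simply the mean of all 36 monthly values. The short sketch below, using only the Python standard library, confirms that figure from the data table and also prints the average for each year, whose steady rise is a simple numerical view of the upward trend. The grouping by year is just for illustration.

```python
from statistics import mean

# Shampoo sales by year, copied from the data table of this problem.
SALES_BY_YEAR = {
    1995: [266.0, 145.9, 183.1, 119.3, 180.3, 168.5, 231.8, 224.5, 192.8, 122.9, 336.5, 185.9],
    1996: [194.3, 149.5, 210.1, 273.3, 191.4, 287.0, 226.0, 303.6, 289.9, 421.6, 264.5, 342.3],
    1997: [339.7, 440.4, 315.9, 439.3, 401.3, 437.4, 575.5, 407.6, 682.0, 475.3, 581.3, 646.9],
}

# Level: the average of all 36 observations.
all_sales = [y for values in SALES_BY_YEAR.values() for y in values]
level = mean(all_sales)
print(f"Level (overall average): {level:.1f}")  # prints "Level (overall average): 312.6"

# Yearly averages: a rising sequence indicates an upward trend.
yearly = {year: mean(values) for year, values in SALES_BY_YEAR.items()}
for year, avg in yearly.items():
    print(f"{year}: {avg:.2f}")
```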

c. Shampoo is a daily-use hygiene product. Its sales are not tied to festivals or climatic seasons, so the seasonality is weak.

2. Page 58, Problem 2

SOLUTION – Page 58, Problem 2


For forecasting, we should use the following:

a) Partition the data into training and validation periods - Yes


We should fit the model only to the training period and assess its performance on the validation period. Because the validation data are not used to build the model, this exposes overfitting. Finally, deploy the model by recombining the training and validation periods and use it to forecast the future.

b) Examining time plots of the series and of model forecasts only for the training period – No.
We should examine both the training and the validation periods.

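c) Looking at MAPE and RMSE values for the training period – No.

The training period is used to build the model, so its errors only show how well the model fits data it has already seen and tend to overstate forecast accuracy.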

d) Looking at MAPE and RMSE values for the validation period – Yes.
We should use the validation period data to test the model built on the training period.

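e) Computing naive forecasts – Yes.

This is the simplest way to forecast: the most recent observation is taken as the prediction, on the logic that recent data is often the most predictive. Naive forecasts also serve as a baseline against which other models are compared.
The naive k-step-ahead forecast is $F_{t+k} = y_t$.
For a seasonal series with $M$ seasons, the seasonal naive forecast is $F_{t+k} = y_{t-M+k}$ (for $k \le M$).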
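The two naive formulas translate directly into code. This is a minimal sketch in plain Python; the function names are hypothetical, the series is treated as 1-based y_1..y_t in the comments, and the seasonal version assumes 1 <= k <= M, as in the formula. The example series `list(range(1, 25))` is a made-up 24-month series used only to show the indexing.

```python
def naive_forecast(series, k):
    """Naive k-step-ahead forecast: F_{t+k} = y_t, the last observed value."""
    return series[-1]


def seasonal_naive_forecast(series, k, m):
    """Seasonal naive forecast: F_{t+k} = y_{t-M+k}, assuming 1 <= k <= M."""
    if not 1 <= k <= m:
        raise ValueError("this sketch assumes 1 <= k <= m")
    t = len(series)               # 1-based index of the last observation
    return series[t - m + k - 1]  # convert 1-based index t-M+k to a 0-based list index


# Hypothetical monthly series of two years: 1, 2, ..., 24.
history = list(range(1, 25))
print(naive_forecast(history, 1))                                  # 24
print([seasonal_naive_forecast(history, k, 12) for k in range(1, 13)])  # 13 ... 24
```

Forecasting 12 months ahead with the seasonal version simply repeats the last observed year, which is why it suits strongly seasonal monthly data.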

3. Page 58, Problem 1



SOLUTION – Page 58, Problem 1


a) Why were the data partitioned?
To predict the future, we use the past as the baseline, so part of the data is used to train the model. We also have to validate (assess) the model's predictive performance, for which we hold out a portion (10-20%) of the past data, compare it with the forecasts and compute error measures such as RMSE and MAPE.
The data is partitioned into:
i. Training period
ii. Validation period

b) Why did the analyst choose 12 months for validation period?


The data provided is monthly and the goal is to forecast the next 12 months. Hence, the analyst chose a 12-month validation period, which matches the forecast horizon and accounts for approximately 14% of the data (12 of 84 months).

c) What is the naive forecast for the validation period? (Assume that you must provide forecasts for 12 months ahead.)
R code created: FA_Assignment1_SouvenirSales_71710004.R

Forecast for the validation period (seasonal naive: each month is forecast by the same month of the previous year):


Month      Point Forecast  Lo 80        Hi 80     Lo 95        Hi 95
Jan 2001   7615.03         -673.8117    15903.87  -5061.6594   20291.72
Feb 2001   9849.69         1560.8483    18138.53  -2826.9994   22526.38
Mar 2001   14558.40        6269.5583    22847.24  1881.7106    27235.09
Apr 2001   11587.33        3298.4883    19876.17  -1089.3594   24264.02
May 2001   9332.56         1043.7183    17621.40  -3344.1294   22009.25
Jun 2001   13082.09        4793.2483    21370.93  405.4006     25758.78
Jul 2001   16732.78        8443.9383    25021.62  4056.0906    29409.47
Aug 2001   19888.61        11599.7683   28177.45  7211.9206    32565.30
Sep 2001   23933.38        15644.5383   32222.22  11256.6906   36610.07
Oct 2001   25391.35        17102.5083   33680.19  12714.6606   38068.04
Nov 2001   36024.80        27735.9583   44313.64  23348.1106   48701.49
Dec 2001   80721.71        72432.8683   89010.55  68045.0206   93398.40

d) Compute the RMSE and MAPE for the naive forecasts

               ME         RMSE       MAE        MPE        MAPE       MASE       ACF1        Theil's U
Training set   3401.361   6467.818   3744.801   22.39270   25.64127   1.000000   0.4140974   NA
Test set       7828.278   9542.346   7828.278   27.27926   27.27926   2.090439   0.2264895   0.7373759

For the validation (test) period, RMSE = 9542.346 and MAPE = 27.28%.
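The error measures in the table follow standard definitions. The sketch below computes ME, RMSE, MAE and MAPE in plain Python, assuming the convention error = actual - forecast (the sign convention under which a positive ME means the forecasts were too low); the two-point actual/forecast lists in the example are hypothetical and only show the arithmetic.

```python
import math


def forecast_errors(actual, forecast):
    """Return ME, RMSE, MAE and MAPE (in %) for paired actual/forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    errors = [a - f for a, f in zip(actual, forecast)]  # error = actual - forecast
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical example: errors are 10 and -20.
print(forecast_errors([100, 200], [90, 220]))
```

RMSE squares the errors before averaging, so it weights large misses more heavily than MAE; MAPE is scale-free, which makes it easier to compare across series.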

e) Histogram of errors

f) Forecast for 2002

Excel: SouvenirSalesExponentialTrend.xlsx (PFA), exponential trend model with seasonality.
This file contains the sales data provided.

Create a new data sheet (ForecastingSheet) in which we add the next 12 months as input predictors, together with the 12 dummy variables.

From the Data sheet, select the data range, the predictor inputs, and Log_Sales as the output, as shown below.
Then select the ForecastingSheet, which holds the new forecast dates.

Select the data range and click 'Match by Name'.

This produces a new score sheet, MLR_NewScore1, with the forecasts for the dates in the ForecastingSheet:

Date     Predicted Value (log)   Exp(Predicted Value)
Jan-02   9.5092637               13484.06
Feb-02   9.7827003               17724.45
Mar-02   10.249256               28261.51
Apr-02   9.9593767               21149.61
May-02   10.00683                22177.42
Jun-02   10.068191               23580.87
Jul-02   10.251837               28334.55
Aug-02   10.251367               28321.23
Sep-02   10.354752               31405.93
Oct-02   10.454834               34711.77
Nov-02   10.93621                56174.03
Dec-02   11.713723               122237.7

Take the exponential of the predicted log values to obtain sales in the original units.

Forecasted sales for 2002 (in Australian dollars)
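Because the model predicts log(Sales), each forecast has to be back-transformed with the exponential function. The sketch below applies `math.exp` to the first three predicted log values copied from the MLR_NewScore1 table; the dictionary name is hypothetical and the results should match the Exp(Predicted Value) column up to rounding.

```python
import math

# Predicted log(Sales) for early 2002, copied from the MLR_NewScore1 table.
predicted_log = {"Jan-02": 9.5092637, "Feb-02": 9.7827003, "Mar-02": 10.249256}

# Back-transform each log prediction to sales in Australian dollars.
forecast_sales = {month: math.exp(value) for month, value in predicted_log.items()}

for month, sales in forecast_sales.items():
    print(f"{month}: {sales:,.2f}")
```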

4. Page 103, Problem 2

SOLUTION – Page 103, Problem 2


a) I do not see any seasonality here.
Visually, the quadratic trend model fits better.
I fitted both a linear trend and a quadratic trend to CanadianWorkHours.xls.

For the linear trend model, CanadianWorkHoursLinearTrendModel.xls,
sheet MLR_Output: RMSE = 1.23964928980373

For the quadratic trend model, CanadianWorkHoursQuadraticTrendModel.xls,
sheet MLR_Output: RMSE = 1.5436761

Since the RMSE of the linear trend model is lower than that of the quadratic trend model, the linear trend model could fit the series.
However, the series trends downward and then upward, a pattern that a linear trend cannot capture.
Hence, the quadratic trend model seems the better choice to fit the series.
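A linear trend (y_t = b0 + b1 t) and a quadratic trend (y_t = b0 + b1 t + b2 t²) are both ordinary least-squares regressions on powers of t. The sketch below fits either one in plain Python by solving the normal equations with Gaussian elimination and reports the RMSE; `fit_polynomial_trend` and the short U-shaped series are hypothetical and stand in for the CanadianWorkHours data, which is not reproduced here. One design point worth keeping in mind: on the same data used for fitting, the quadratic model contains the linear one, so its RMSE can never be higher; RMSE comparisons between the two are therefore most informative on validation data.

```python
import math


def fit_polynomial_trend(y, degree):
    """Least-squares fit of y_t = b0 + b1*t + ... + b_d*t^d with t = 1..n.

    Returns (coefficients, RMSE on the fitted data)."""
    n, k = len(y), degree + 1
    X = [[t ** p for p in range(k)] for t in range(1, n + 1)]
    # Normal equations (X'X) b = X'y, written as an augmented matrix.
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    fitted = [sum(bp * row[p] for p, bp in enumerate(b)) for row in X]
    rmse = math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n)
    return b, rmse


# Hypothetical series that falls and then rises, like the shape described above.
hypothetical_series = [37.5, 37.0, 36.6, 36.4, 36.4, 36.6, 37.0, 37.6]
_, lin_rmse = fit_polynomial_trend(hypothetical_series, 1)
_, quad_rmse = fit_polynomial_trend(hypothetical_series, 2)
print(f"linear RMSE = {lin_rmse:.4f}, quadratic RMSE = {quad_rmse:.4f}")
```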

b) and c)
Linear Trend Model – lag 1 shows positive autocorrelation = 0.8113547

Lags ACF
0 1
1 0.8113547
2 0.5372829
3 0.3468594
4 0.163308
5 0.007701
6 -0.099131
7 -0.216145
8 -0.336966
9 -0.367947
10 -0.35435

Quadratic Trend Model – lag 1 shows positive autocorrelation = 0.8684287

Lags ACF
0 1
1 0.8684287
2 0.6858585
3 0.5293604
4 0.3555479
5 0.1802078
6 0.0350135
7 -0.105849
8 -0.242297
9 -0.322102
10 -0.354119

[ACF Plot for Var1: ACF by lag (0 to 10) with UCI and LCI bounds]
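The ACF values above are sample autocorrelations of the model residuals. The sketch below computes them with the standard formula r_k = sum over t of (e_t - ē)(e_{t-k} - ē) divided by sum of (e_t - ē)², in plain Python. The function names are hypothetical, and the ±1.96/√n band is a common large-sample approximation for white noise; XLMiner's UCI/LCI may be computed with a different formula.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def approx_bound(n):
    """Approximate 95% significance bound for a white-noise ACF (assumption)."""
    return 1.96 / math.sqrt(n)


# Tiny illustrative series: r_1 = 0.4 and r_2 = -0.1.
print(acf([1, 2, 3, 4, 5], 2))
```

A lag-1 value as large as 0.81 or 0.87 sits far outside such a band for any reasonable series length, which is what flags the strong positive autocorrelation left in the trend-model residuals.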

5. Page 111, Problem 6



SOLUTION – Page 111, Problem 6


a) Both plots are time series plots: Plot 1 shows Sales and Plot 2 shows log(Sales) over time.
For Plot 1 the output would be Sales, and for Plot 2 the output would be log(Sales).

For both plots, the input predictor for a linear trend model would be 'Time (t)', the running month index from Jan 1995 (t = 1) to Dec 2001 (t = 84).
For a quadratic trend model, we add 'Time² (t²)' as a second predictor alongside t.

As both plots show seasonality, we can add seasonal predictors (dummy variables). With monthly data (over 7 years), we create 12 dummy variables, one for each month. Because the model has an intercept, only 11 of them enter the regression (d1 to d11), with December (d12) as the reference month; including all 12 would make them perfectly collinear with the intercept. This is why the regression output below lists only d1 to d11.

Total predictors = Time (t) and 11 dummy variables, d1, d2, ..., d11
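The predictor layout described above can be sketched as one regression input row per month. The minimal example below (plain Python, hypothetical function name) assumes t = 1 is January 1995 and December is the reference month, so a row is [1 (intercept), t, d1, ..., d11]; a December row has all dummies equal to 0 and is absorbed by the intercept.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def design_row(t, reference="Dec"):
    """Regression inputs for month index t (assumption: t = 1 is Jan 1995).

    Returns [1 (intercept), t, d1, ..., d11]; the reference month gets no
    dummy, because the 12 dummies together would always sum to 1 and
    duplicate the intercept column."""
    month = MONTHS[(t - 1) % 12]
    dummies = [1 if month == m else 0 for m in MONTHS if m != reference]
    return [1, t] + dummies


print(design_row(1))   # Jan 1995: d1 = 1
print(design_row(12))  # Dec 1995: all dummies 0
```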

b) Linear trend model with Seasonality


Excel created: SouvenirSalesLinearTrend.xlsx (PFA)

Prepare the data sheet:

From the data provided, use the month index as the 't' predictor and Sales as the output.
Create dummy variables for the 12 months, d1, d2, d3, ..., d12, each taking the value 1 or 0 according to the month:
d1 is for January: enter 1 in the January rows and 0 elsewhere
d2 is for February: enter 1 in the February rows and 0 elsewhere
and so on, up to d12 for December

In XLMiner, create the data partition on the time series and run the prediction as shown in the screenshots (Data partition, Prediction setup).

(i) Sales start picking up in mid-November, peak in December and drop in January.
The reasons are:
o The Christmas holidays, when people buy gifts and souvenirs
o Winter in the northern hemisphere, when people travel south to Australia to enjoy the summer there

(ii) For the linear trend model, $y_t = b_0 + b_1 t + e$, with the monthly dummies d1 to d11 added as further predictors.


Results from Excel, MLR_Output sheet

Regression Model
Input Variables  Coefficient  Std. Error   t-Statistic  P-Value      CI Lower   CI Upper   RSS Reduction
Intercept        29403.996    2811.365701  10.4589725   4.65991E-15  23778.467  35029.526  9796552803
t                245.36417    34.08279399  7.19906275   1.24479E-09  177.16466  313.56369  2632660324
d1               -32469.55    3442.362193  -9.4323459   2.19089E-13  -39357.7   -25581.4   200983029
d2               -31350.17    3438.817107  -9.1165554   7.30808E-13  -38231.22  -24469.11  164835092
d3               -28060.71    3435.606496  -8.1676134   2.85041E-11  -34935.34  -21186.07  34090020.7
d4               -31006.98    3432.731299  -9.0327441   1.00753E-12  -37875.86  -24138.1   204610629
d5               -31023.36    3430.192358  -9.0442029   9.64225E-13  -37887.15  -24159.56  268422518
d6               -30601.57    3427.990423  -8.9269716   1.51215E-12  -37460.97  -23742.18  320573509
d7               -29480.99    3426.126141  -8.6047583   5.23484E-12  -36336.65  -22625.32  333412323
d8               -29241.97    3424.600064  -8.5387985   6.75564E-12  -36094.58  -22389.36  474017994
d9               -28513.99    3423.412646  -8.329113    1.52229E-11  -35364.22  -21663.76  681000134
d10              -27647.89    3422.564237  -8.0781225   4.03707E-11  -34496.43  -20799.36  1179203458
d11              -20944.91    3422.055091  -6.1205655   8.15052E-08  -27792.43  -14097.4   1315937529

The estimated trend coefficient of 245.36 means that each one-unit increase in the predictor t is associated with an expected increase of 245.36 units in the output.
In this case, sales are expected to increase by 245.36 Australian dollars each month.
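The fitted linear model can be evaluated directly from the MLR_Output coefficients. The sketch below assumes t = 1 for January 1995 and December as the reference month (coefficient 0); `predict_linear` is a hypothetical helper. It shows that two predictions for the same month one year apart differ by exactly 12 × 245.36, i.e., the trend adds a constant amount regardless of the sales level, which is what makes this model additive.

```python
# Coefficients copied from the MLR_Output sheet of the linear trend model.
INTERCEPT = 29403.996
TREND = 245.36417
MONTH_EFFECT = {  # December is the reference month, so its effect is 0.
    "Jan": -32469.55, "Feb": -31350.17, "Mar": -28060.71, "Apr": -31006.98,
    "May": -31023.36, "Jun": -30601.57, "Jul": -29480.99, "Aug": -29241.97,
    "Sep": -28513.99, "Oct": -27647.89, "Nov": -20944.91, "Dec": 0.0,
}


def predict_linear(t, month):
    """y_t = b0 + b1 * t + (monthly dummy coefficient)."""
    return INTERCEPT + TREND * t + MONTH_EFFECT[month]


dec_2001 = predict_linear(84, "Dec")  # assumption: t = 84 is Dec 2001
dec_2000 = predict_linear(72, "Dec")
print(f"Dec 2001 fitted value: {dec_2001:.2f}")
print(f"Change over one year:  {dec_2001 - dec_2000:.2f}")  # 12 * 245.36417
```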

c) Exponential Trend Model with Seasonality

(i) Here we build an exponential trend model with seasonality.
The trend is expressed as a percentage monthly growth.
Fit a linear regression with log(y_t) as the output and t as the predictor:
$y_t = a\,e^{b_1 t + e}$
$\log(y_t) = b_0 + b_1 t + e$, where $b_0 = \log a$

Excel created: SouvenirSalesExponentialTrend.xlsx (PFA)

Prepare the data sheet:
From the data provided, compute log(Sales). Use the month index as the 't' predictor and log(Sales) as the output.
Create dummy variables for the 12 months, d1, d2, d3, ..., d12, each taking the value 1 or 0 according to the month:
d1 is for January: enter 1 in the January rows and 0 elsewhere
d2 is for February: enter 1 in the February rows and 0 elsewhere
and so on, up to d12 for December

In XLMiner, create the data partition on the time series and run the prediction as shown in the screenshots (Data partition, Prediction setup).
Data partition Prediction setup

(ii) Results from Excel, MLR_Output sheet

Regression Model
Input Variables  Coefficient  Std. Error   t-Statistic  P-Value      CI Lower   CI Upper   RSS Reduction
Intercept        9.5985651    0.089571244  107.161235   2.61821E-69  9.4193335  9.7777967  5926.8709
t                0.0211196    0.001085892  19.4491301   2.41603E-27  0.0189468  0.0232925  18.194163
d1               -1.952202    0.109675046  -17.799876   2.12881E-25  -2.171661  -1.732743  2.6881029
d2               -1.670187    0.109562098  -15.244207   3.98408E-22  -1.889421  -1.450954  1.1797006
d3               -1.257204    0.109459807  -11.485531   1.11431E-16  -1.476233  -1.038175  0.0203974
d4               -1.578329    0.109368202  -14.431332   5.15543E-21  -1.797174  -1.359484  1.0016503
d5               -1.530492    0.10928731   -14.0043     2.04559E-20  -1.749176  -1.311809  1.033963
d6               -1.505156    0.109217156  -13.781316   4.23997E-20  -1.723699  -1.286613  1.2387331
d7               -1.368822    0.109157759  -12.539854   2.75598E-18  -1.587247  -1.150398  0.9130081
d8               -1.405306    0.109109137  -12.879815   8.61563E-19  -1.623632  -1.186979  1.6218147
d9               -1.316637    0.109071306  -12.071343   1.40195E-17  -1.534888  -1.098386  1.9481593
d10              -1.222712    0.109044275  -11.212984   2.96586E-16  -1.440909  -1.004515  2.8686189
d11              -0.751248    0.109028054  -6.8904114   4.14576E-09  -0.969413  -0.533084  1.6929533

The estimated trend coefficient of 0.0211 is on the log scale: each one-unit increase in t multiplies expected sales by $e^{0.0211} \approx 1.021$, i.e., about 2.1% growth.
In this case, sales are expected to grow by about 2.1% per month over the previous month's sales.
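Converting a log-scale trend coefficient into a growth rate only takes the exponential function. The sketch below uses b1 = 0.0211196 from the MLR_Output sheet; compounding over 12 months is shown as an extra illustration of what a roughly 2.1% monthly rate implies over a year.

```python
import math

b1 = 0.0211196  # trend coefficient on the log scale, from MLR_Output

monthly_growth = math.exp(b1) - 1       # growth per one-unit increase in t
annual_growth = math.exp(12 * b1) - 1   # the same rate compounded over 12 months

print(f"Monthly growth: {monthly_growth:.2%}")
print(f"Annual growth:  {annual_growth:.1%}")
```

For small coefficients, exp(b1) - 1 is close to b1 itself, which is why 0.0211 reads directly as roughly 2.1% per month.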

(iii) Forecasting sales for Feb 2002:

The extra steps needed are as follows:
Create a new data sheet (ForecastingSheet) in which we add the next 12 months as input predictors, together with the 12 dummy variables.

From the Data sheet, select the data range, the predictor inputs, and Log_Sales as the output, as shown below.
Then select the ForecastingSheet, which holds the new forecast dates.

Select the data range and click 'Match by Name'.

This produces a new score sheet, MLR_NewScore1, with the forecasts for the dates in the ForecastingSheet:

Date     Predicted Value (log)   Exp(Predicted Value)
Jan-02   9.5092637               13484.06
Feb-02   9.7827003               17724.45
Mar-02   10.249256               28261.51
Apr-02   9.9593767               21149.61
May-02   10.00683                22177.42
Jun-02   10.068191               23580.87
Jul-02   10.251837               28334.55
Aug-02   10.251367               28321.23
Sep-02   10.354752               31405.93
Oct-02   10.454834               34711.77
Nov-02   10.93621                56174.03
Dec-02   11.713723               122237.7

Take the exponential of the predicted log values to obtain sales in the original units.

Forecasted sales for February 2002 = 17724.45 Australian dollars

d) Comparing the two regression models from the XLMiner results.

Model A = SouvenirSalesLinearTrend.xlsx (PFA): linear trend model with seasonality

Model B = SouvenirSalesExponentialTrend.xlsx (PFA): exponential trend model with seasonality

Metric to compare                     Model A                                  Model B
Trend coefficient                     245.36                                   0.0211 (about 2.1% per month)
(predictor: time in months, t)        Constant expected growth in output       Expected percentage growth in output
                                      units ($) per predictor unit (t)         units ($) per predictor unit (t)
Multiplicative increase/decrease      No (additive)                            Yes (multiplicative)
of the series over time
RMSE                                  17451.547                                7101.44

From the data above, Model B performs better because:
i) The RMSE of Model B (7101.44) is lower than the RMSE of Model A (17451.547)
ii) Model B captures the multiplicative change of the series over time

e) ACF plot and AR(2) model for the exponential trend model with seasonality

(i) Results are in SouvenirSalesExponentialTrend.xlsx (PFA).

From the MLR_TrainingScore sheet, select the Residual data, then XLMiner → ARIMA → Autocorrelations, and select the data range, the variable, and lags = 15.

This produces the ACF plot in the ACF_Output sheet.

Apart from Lag 1 and Lag 2, which are significant, none of the lags touch the UCI or LCI bounds.

Run an AR(2) model, i.e., ARIMA with p = 2, d = 0, q = 0:

Create a new ArimaInput sheet with the MLR_TrainingScore residuals and t.
Enter the t column and the residual column as data.
Run the AR(2) model to obtain the residuals and the forecast errors.

From Arima_Output sheet


ARIMA Model
ARIMA Coeff StErr p-value
Const. term -9.8344E-16 0.0065285 1
AR1 0.307216 0.0991386 0.0019427
AR2 0.368709304 0.0869801 2.245E-05

Forecast
f % Confidence Level

Date t Forecast Lower Upper


Jan-02 Forecast 1 0.108689385 -0.166048 0.3834263
Feb-02 Forecast 2 0.099605127 -0.187805 0.3870148
Mar-02 Forecast 3 0.070675076 -0.243636 0.384986
Apr-02 Forecast 4 0.058437851 -0.263619 0.3804943
May-02 Forecast 5 0.044011601 -0.285245 0.3732679
Jun-02 Forecast 6 0.035067648 -0.297516 0.3676508
Jul-02 Forecast 7 0.027000829 -0.307939 0.3619409
Aug-02 Forecast 8 0.021224855 -0.315011 0.3574604
Sep-02 Forecast 9 0.016476072 -0.320593 0.3535447
Oct-02 Forecast 10 0.012887514 -0.324669 0.3504443
Nov-02 Forecast 11 0.010034132 -0.327826 0.347894
Dec-02 Forecast 12 0.007834392 -0.330207 0.3458761

Next, take the AR(2) forecast errors to the MLR_ValidationScore sheet and recalculate the predicted values by adding each forecast error to the corresponding predicted log value before exponentiating.

Predicted values adjustment

Date    Predicted   Predicted    AR2 Forecasted  Adjusted Predicted Value  95% Confidence Intervals  95% Prediction Intervals
        Value       Value (EXP)  Error           = EXP(Predicted Value     Lower      Upper          Lower      Upper
                                                 + Error)
Jan-02  9.188097    9780.022     0.1086894       10902.93                  9113.889   13043.14594    7176.587   16564.11
Feb-02  9.4912315   13243.09     0.0996051       14630.11                  12229.48   17501.96287    9629.913   22226.57
Mar-02  9.9253346   20441.75     0.0706751       21938.75                  18338.87   26245.27651    14440.65   33330.12
Apr-02  9.6253294   15143.54     0.0584379       16054.87                  13420.46   19206.40054    10567.73   24391.12
May-02  9.6942856   16224.63     0.0440116       16954.65                  14172.6    20282.80662    11159.99   25758.1
Jun-02  9.7407414   16996.14     0.0350676       17602.73                  14714.33   21058.10239    11586.57   26742.68
Jul-02  9.8981948   19894.42     0.0270008       20438.91                  17085.13   24451.02041    13453.42   31051.51
Aug-02  9.8828312   19591.11     0.0212249       20011.37                  16727.75   23939.56358    13172      30401.99
Sep-02  9.9926192   21864.49     0.0164761       22227.72                  18580.42   26590.9699     14630.86   33769.13
Oct-02  10.107664   24530.3      0.0128875       24848.48                  20771.15   29726.18212    16355.91   37750.69
Nov-02  10.600248   40144.77     0.0100341       40549.62                  33895.92   48509.42304    26690.81   61604.42
Dec-02  11.372615   86908.87     0.0078344       87592.42                  73219.58   104786.6223    57655.6    133073.5
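The AR(2) step and the adjustment can be checked by hand. For forecast horizons h >= 3, an AR(2) forecast depends only on the two previous forecasts, F_h = c + φ1 F_(h-1) + φ2 F_(h-2). The first two forecasts also depend on the last two training residuals, which are not listed here, so the sketch below seeds the recursion with F1 and F2 as reported in the Arima_Output sheet; the coefficients are copied from that sheet and the helper name is hypothetical. The last lines reproduce the first adjusted prediction, EXP(Predicted Value + Error).

```python
import math

# AR(2) coefficients copied from the Arima_Output sheet.
CONST, PHI1, PHI2 = -9.8344e-16, 0.307216, 0.368709304


def extend_ar2_forecasts(f1, f2, horizon):
    """Extend AR(2) forecasts with F_h = c + phi1 * F_{h-1} + phi2 * F_{h-2} for h >= 3."""
    forecasts = [f1, f2]
    while len(forecasts) < horizon:
        forecasts.append(CONST + PHI1 * forecasts[-1] + PHI2 * forecasts[-2])
    return forecasts


# Seed with the first two forecast errors reported by XLMiner.
residual_forecasts = extend_ar2_forecasts(0.108689385, 0.099605127, 12)
print([round(f, 6) for f in residual_forecasts])

# Adjusted prediction for the first row: EXP(predicted log value + forecast error).
predicted_log = 9.188097
adjusted = math.exp(predicted_log + residual_forecasts[0])
print(f"Adjusted prediction: {adjusted:.2f}")
```

With positive φ1 and φ2 whose sum is below 1, the forecast errors shrink towards zero as the horizon grows, so the AR(2) correction matters most for the first few months.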

(ii) Improved forecast for January 2002

Before the AR(2) correction, the January 2002 forecast was 9780.022.

After the AR(2) residual correction, the January 2002 forecast is 10902.93. The corresponding 95% confidence and prediction intervals are given in the table above.

f) Different goals of understanding components

To forecast sales for different components, we would examine the trend of each component separately, and then analyze and build a separate model for each.
Some components that come to mind are:
a) The country the customers come from
b) The gender of the customers
c) The age of the customers
d) The price range of the items the customers buy
