Time Series Forecasting Project Report
PROJECT REPORT
DSBA
NAME : SREEVATHSAN S S
BATCH : PGPDSBA ONLINE APRIL_B 2021
PROBLEM:
For this assignment, sales data for different types of wine in the 20th century is to be analyzed. Both series come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked to analyze and forecast wine sales in the 20th century.
Dataset : Sparkling
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Check for the stationarity of the data on which the model is being built on using appropriate statistical
tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take
appropriate steps to make it stationary. Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
5. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using
the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data
using RMSE.
6. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.
7. Build a table with all the models built along with their corresponding parameters and the respective
RMSE values on the test data.
8. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict
12 months into the future with appropriate confidence intervals/bands.
9. Comment on the model thus built and report your findings and suggest the measures that the company
should be taking for future sales.
Dataset : Rose
10. Read the data as an appropriate Time Series data and plot the data.
11. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
12. Split the data into training and test. The test data should start in 1991.
13. Check for the stationarity of the data on which the model is being built on using appropriate statistical
tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take
appropriate steps to make it stationary. Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
14. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using
the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data
using RMSE.
15. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.
16. Build a table with all the models built along with their corresponding parameters and the respective
RMSE values on the test data.
17. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict
12 months into the future with appropriate confidence intervals/bands.
18. Comment on the model thus built and report your findings and suggest the measures that the company
should be taking for future sales.
Sparkling:
Data Dictionary:
YearMonth – Month and year of sales
Sparkling – Number of units of Sparkling wine sold
1. Read the data as an appropriate Time Series data and plot the data.
We will read the data as time-series data by parsing the 'YearMonth' column as dates and setting 'YearMonth' as the index.
We will now plot the data to see how the sales values vary over time.
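As a sketch of this step: the read can be reproduced as below. Only the column names (YearMonth, Sparkling) come from the data dictionary; the three rows here are an illustrative miniature standing in for the actual Sparkling CSV.

```python
import io
import pandas as pd

# Hypothetical miniature of the dataset (illustrative values); the real input is
# the Sparkling CSV supplied with the assignment.
csv_text = "YearMonth,Sparkling\n1980-01,1686\n1980-02,1591\n1980-03,2304\n"

# Read as a time series: parse YearMonth into dates and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["YearMonth"], index_col="YearMonth")
df = df.asfreq("MS")  # declare the monthly frequency for downstream models

print(df.head())
# df.plot() then draws sales against time, as in the report
```

Setting the frequency explicitly means later models (decomposition, SARIMA) do not have to infer it.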
• The dataset contains monthly Sparkling wine sales from Jan-1980 to Jul-1995
• It has a total of 187 observations
• There are no missing values in this dataset
• The average monthly sales value is around 2402.41 and the median is around 1874, which implies the data is right-skewed
Observations:
1. The yearly sales trend is almost constant throughout the 16 years; however, the variance of the monthly sales values within each year widens after 1984
2. Almost every year has at least one positive outlier
3. The monthly box plot clearly shows that sales are lower and roughly constant until June, after which an increasing trend is observed, with the highest sales recorded in December
4. The month-wise comparison plot also shows that, across all years, sales are highest in December followed by November
5. Clear seasonality is visible in this dataset
Decomposition of Data:
We will decompose the data to segregate the Trend, Seasonality and Residual components.
The individual components and their plots are shown below.
Fig 6: Decomposition graph of time series
3. Split the data into training and test. The test data should start in 1991.
We have split the data into training and test sets. The training data runs from January 1980 to December 1990 and the test data from January 1991 to July 1995.
The training set has 132 records and the test set has 55.
We have displayed the last 5 records of the training data followed by the first 5 records of the test data.
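The calendar-based split can be sketched as follows. The values are synthetic, but the date range and the resulting record counts match the split described above.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the Sparkling series (Jan 1980 - Jul 1995, 187 points).
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
series = pd.Series(np.random.default_rng(0).integers(1000, 7000, len(idx)), index=idx)

# Split on the calendar: training up to Dec 1990, test from Jan 1991 onwards.
train = series[:"1990-12"]
test = series["1991-01":]

print(len(train), len(test))  # 132 and 55
```

Splitting by date (rather than a random shuffle) preserves the temporal order, which is essential for time-series evaluation.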
4. Build various exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naive forecast
models, simple average models etc. should also be built on the training data and check
the performance on the test data using RMSE.
For the 2-point Moving Average forecast on the training data, the RMSE is 813.401
For the 4-point Moving Average forecast on the training data, the RMSE is 1156.590
For the 6-point Moving Average forecast on the training data, the RMSE is 1283.927
For the 9-point Moving Average forecast on the training data, the RMSE is 1346.278
Before we go on to build the various Exponential Smoothing models, let us plot all the models
[only the most optimum Moving Average model (one with least RMSE) is plotted] and
compare the Time Series plots.
The plot below showcases the various models evaluated on the test data.
The higher the alpha value, the more weight is given to recent observations: the assumption is that what happened recently will happen again.
We have run a loop with different alpha values to understand which particular value works best
for alpha on the test set. Below are the top 5 𝛼 values with the least test RMSE values.
Now we will go ahead and plot the graph with auto predicted 𝛼 (0.216) as well as the 𝛼 with the
least test RMSE values (0.1).
Method 6 – Double Exponential Smoothing (Holt's Model)
Two parameters 𝛼 and 𝛽 are estimated in this model. Level and Trend are accounted for in this
model. This particular Time Series seems to have a Seasonality as well. Let us see how Holt's
Model behaves in such a scenario.
For this dataset Python has optimized the smoothing level 𝛼 to be 0.400 and 𝛽 to be 0.072.
We have run the model by setting different alpha and beta values.
We have run a loop with different alpha and beta values to understand which particular value
combination works best on the test set. Below are the top 5 𝛼 and 𝛽 value combinations with
the least test RMSE values.
Now we will go ahead and plot the graph with the auto-optimized 𝛼 (0.400) and 𝛽 (0.072) as well as the 𝛼 and 𝛽 combination with the least test RMSE.
Test RMSE: 1778.564670 Test MAPE: 85.874037
We have run a loop with different alpha, beta and gamma values to understand which
particular value combination works best on the test set. Below are the top 5 𝛼, 𝛽 and 𝛾 value
combinations with the least test RMSE values.
Now we will go ahead and plot the graph with auto predicted 𝛼 (0.111), 𝛽(0.049) and 𝛾(0.395)
as well as the 𝛼, 𝛽 and 𝛾 with the least test RMSE values (0.4, 0.3 and 0.1).
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
Hypotheses:
H0: The data is not stationary
H1: The data is stationary
We have checked the stationarity of the data using the Augmented Dickey-Fuller test. From the figure below we can infer that at the 5% significance level we cannot reject the null hypothesis, and hence the time series is not stationary.
Since the series is not stationary, we have taken the first-order difference and re-checked the stationarity. At alpha = 0.05 we can reject the null hypothesis, as the p-value is almost 0 and less than 0.05; hence the series is indeed stationary after first-order differencing.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criterion (AIC) on the training data and evaluate this model on the test data using RMSE.
ARIMA Model:
This model requires values for the parameters p, d and q. The best values of these parameters can be selected based on the lowest AIC among the candidate models. We therefore build models over the parameter combinations mentioned below: p and q range from 0 to 4 (Python range(0, 5)) and d takes the values 1 and 2.
We have built ARIMA models over this grid and sorted the results by the Akaike Information Criterion (AIC). The lowest AIC on the training data (2213.509213) is obtained for the parameters (2,1,2).
Below are the results of applying the best parameters identified – ARIMA(2,1,2). Both the lag and error terms are significant.
Fig. 8 ARIMA(2,1,2) Result
The test RMSE for ARIMA(2,1,2) is 1299.980869.
SARIMA
The SARIMA model requires six parameters: p, d, q and P, D, Q. We have built SARIMA models accounting for seasonality over the range 0 to 2 and selected the combination with the lowest AIC on the training data (1054.718055) – SARIMA(0, 1, 1)x(1, 0, 1, 12).
Below are the results of applying the best parameters identified – SARIMA(0, 1, 1)x(1, 0, 1, 12).
Fig. 9 SARIMA(0,1,1)x(1,0,1,12) Result
The test RMSE for SARIMA(0,1,1)x(1,0,1,12) is 603.649011; compared to ARIMA(2,1,2) it is much lower, owing to the seasonality present in the dataset.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training
data and evaluate this model on the test data using RMSE.
We plot the autocorrelation and partial autocorrelation functions on the whole data. From the ACF we determine q and Q, and from the PACF we determine p and P, based on the significance level.
The ACF, plotted using statsmodels as shown in Figure 10, has lags up to the 3rd within the significant region, hence q can be taken as 3.
Similarly, the 2nd seasonal lag lies in the significant region, hence Q is taken as 2.
Fig.10 ACF plot
The partial autocorrelation function, plotted using statsmodels as shown in Figure 11, suggests p = 3; and since every seasonal lag is significant, P can be taken as 1.
All the required values have been read off the plots:
p = 3, q = 3, P = 1, Q = 2
An ARIMA model has been built with parameters p=3, d=1, q=3; its results are shown in Figure 12. All AR and MA terms are significant in this model.
Fig 12. ARIMA(3,1,3) Result
The test RMSE of ARIMA(3,1,3) is 1228.4889, slightly lower than ARIMA(2,1,2) but much higher than SARIMA(0,1,1)x(1,0,1,12).
A SARIMA model has been built with parameters p=3, d=1, q=3, P=1, D=0, Q=2; its results are shown in Figure 13.
The test RMSE of SARIMA(3,1,3)x(1,0,2,12) is 623.9257, lower than both ARIMA models but slightly higher than SARIMA(0,1,1)x(1,0,1,12).
We have also built an auto-ARIMA model using the pmdarima package in Python, with p and q ranging from 0 to 3 and d = 1.
Figure 14 shows the result of the model built with pmdarima.
Fig 14. ARIMA(2,1,3) Result
The test RMSE of this pmdarima model is 1300.1634, which is higher than the RMSE of the other models.
Similarly, we have built a SARIMA model using pmdarima, with p and q ranging from 0 to 4, P and Q starting from 0, and a seasonal period of 12.
Figure 15 shows the result of SARIMA(3, 1, 0)x(1, 0, 1, 12) built with pmdarima.
The test RMSE of SARIMA(3, 1, 0)x(1, 0, 1, 12) is 899.7035, lower than all the ARIMA models but slightly higher than the other two SARIMA models.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
We have built a table (Fig 16) with the test RMSE values of all the models built so far. Out of the 6 models, SARIMA(0,1,1)x(1,0,1,12) has the least RMSE. Hence we finalise this model as the optimum model to forecast 12 months of data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.
Based on the above 6 models, we have finalised SARIMA(0,1,1)x(1,0,1,12) since it has the least test RMSE.
We have built the SARIMA(0,1,1)x(1,0,1,12) model on the full dataset; its results are shown in Figure 17.
Fig 17. SARIMA(0,1,1)x(1,0,1,12) Result
The RMSE of this model on the full dataset is 519.0809.
We have forecast 12 months of values, from Aug 1995 to Jul 1996; the forecast values are listed in the table below.
We have also plotted the forecast of SARIMA(0,1,1)x(1,0,1,12) along with the original series, as shown in Fig 18.
Fig 18. Original & Forecasted value of the model SARIMA (0,1,1)x(1,0,1,12)
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.
Rose:
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
The dataset contains monthly Rose wine sales from January 1980 to July 1995. There are two null values, which we have interpolated using the linear interpolation method.
We have looked at the summary statistics (mean, standard deviation and other measures) of the given data. The mean monthly Rose wine sales over the period is 90.39 and the median is 86, which indicates slight right skewness.
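The interpolation step can be sketched on a hypothetical fragment with two missing months; the values below are illustrative, not taken from the Rose data.

```python
import numpy as np
import pandas as pd

# Hypothetical Rose-like fragment with two missing months.
idx = pd.date_range("1994-01-01", periods=6, freq="MS")
rose = pd.Series([45.0, np.nan, np.nan, 48.0, 52.0, 44.0], index=idx)

# Linear interpolation fills each gap on a straight line between its neighbours.
filled = rose.interpolate(method="linear")
print(filled)
```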
We also plotted the data against time and studied the pattern. (Figure 1)
In the figures below we studied the distribution of the data and confirmed its skewness. Most of the data values lie between 30 and 190. (Figure 2)
Figure 2: Histogram & Density of time series data - rose wine sales
We also plotted the yearly box plot, from which we can clearly see that Rose wine sales are highest in 1980 and 1981 and have gradually decreased over the years. We also notice some outliers, which are negligible in size and hence left untreated. (Figure 3)
We have also plotted the monthly sales data across the years and confirmed that, in every year, December records the highest sales. (Figure 6)
Figure 6: Yearly Line plot
We have also decomposed the time series to examine the Trend, Seasonality and Residual components. The individual components and their plots are shown below. (Figure 7)
The decomposition graph shows strong month-on-month seasonality. The series is additive because the seasonal variation does not increase over time. The trend shows a decreasing pattern from 1981 onwards.
Figure 7: Decomposition of data
3. Split the data into training and test. The test data should start in 1991.
We have split the data into training and test sets. The training data runs from January 1980 to December 1990 and the test data from January 1991 to July 1995.
The training set has 132 records and the test set has 55.
We have displayed the last 5 records of the training data followed by the first 5 records of the test data.
Figure 8: Rose wine sales – Split into Test and Train data
4. Build various exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naive forecast
models, simple average models etc. should also be built on the training data and check
the performance on the test data using RMSE.
For this particular simple average method, we will forecast by using the average of the training values.
Test RMSE: 53.460570 Test MAPE: 110.587957
Model 4 – Moving Average
For the moving average model, we are going to calculate rolling means (or moving averages) for
different intervals. The best interval can be determined by the maximum accuracy (or the
minimum error).
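The moving-average comparison can be sketched as below; the series is synthetic and the windows match the 2/4/6/9-point models reported for the wine data.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the train/test split used in the report.
rng = np.random.default_rng(8)
series = pd.Series(90 + rng.normal(0, 10, 100))
train, test = series[:80], series[80:]

rmses = {}
for window in (2, 4, 6, 9):
    # Trailing mean over the last `window` observations, shifted one step so
    # each forecast uses only past values.
    ma = series.rolling(window).mean().shift(1)
    rmses[window] = float(np.sqrt(np.mean((test - ma[test.index]) ** 2)))

for window, rmse in rmses.items():
    print(f"{window}-point MA RMSE: {rmse:.3f}")
```

The window with the lowest RMSE is the one carried forward to the comparison plot.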
Before we go on to build the various Exponential Smoothing models, let us plot all the models
[only the most optimum Moving Average model (one with least RMSE) is plotted] and
compare the Time Series plots.
Model 5 – Simple Exponential Smoothing
In the Simple Exponential Smoothing Model, only the level of the Time Series is accounted for.
Here, we can see that the data has both trend and seasonality. This particular Simple
Exponential Smoothing model is built only to showcase how Simple Exponential Smoothing
models are built in Python.
For this dataset Python has optimized the smoothing level 𝛼 to be 0.216.
Test RMSE: 36.796242 Test MAPE: 75.909219
The higher the alpha value, the more weight is given to recent observations: the assumption is that what happened recently will happen again.
We have run a loop with different alpha values to understand which particular value works best
for alpha on the test set. Below are the top 5 𝛼 values with the least test RMSE values.
Now we will go ahead and plot the graph with auto predicted 𝛼 (0.216) as well as the 𝛼 with the
least test RMSE values (0.1).
Method 6 – Double Exponential Smoothing (Holt's Model)
Two parameters 𝛼 and 𝛽 are estimated in this model. Level and Trend are accounted for in this
model. This particular Time Series seems to have a Seasonality as well. Let us see how Holt's
Model behaves in such a scenario.
For this dataset Python has optimized the smoothing level 𝛼 to be 0.400 and 𝛽 to be 0.072.
We have run the model by setting different alpha and beta values.
We have run a loop with different alpha and beta values to understand which particular value
combination works best on the test set. Below are the top 5 𝛼 and 𝛽 value combinations with
the least test RMSE values.
Now we will go ahead and plot the graph with auto predicted 𝛼 (0.400) and 𝛽(0.072) as well as
the 𝛼 and 𝛽 with the least test RMSE values (0.1 and 0.1).
Model 7 – Triple Exponential Smoothing (Holt - Winter's Model)
Three parameters 𝛼, 𝛽 and 𝛾 are estimated in this model. Level, Trend and Seasonality are
accounted for in this model. This particular Time Series looks to have trend as well as
seasonality, so Holt-Winter's model theoretically seems to be a correct fit. Let us see how the
model behaves.
For this dataset Python has optimized the smoothing level 𝛼 to be 0.111, 𝛽 to be 0.049 and 𝛾 to
be 0.395.
We have run the model by setting different alpha, beta and gamma values.
We have run a loop with different alpha, beta and gamma values to understand which
particular value combination works best on the test set. Below are the top 5 𝛼, 𝛽 and 𝛾 value
combinations with the least test RMSE values.
Now we will go ahead and plot the graph with auto predicted 𝛼 (0.111), 𝛽(0.049) and 𝛾(0.395)
as well as the 𝛼, 𝛽 and 𝛾 with the least test RMSE values (0.4, 0.3 and 0.1).
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
Hypotheses:
H0: The data is not stationary
H1: The data is stationary
We have checked the stationarity of the data using the Augmented Dickey-Fuller test. From the figure below we can infer that at the 5% significance level we cannot reject the null hypothesis, and hence the time series is not stationary.
We have then taken the first-order difference and re-checked the stationarity. At alpha = 0.05 we can reject the null hypothesis, and hence the differenced series is indeed stationary.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
We have built ARIMA models for p and q ranging from 0 to 4 (Python range(0, 5)) and sorted the results by the Akaike Information Criterion (AIC). The lowest AIC on the training data (1274.695172) is obtained for the parameters (2,1,3).
Below are the results of applying the best parameters identified – ARIMA(2,1,3). From these results, the error terms at the 1-period and 3-period lags are slightly insignificant.
We have also built SARIMA models accounting for seasonality over the range 0 to 2 and selected the combination with the lowest AIC on the training data (1054.718055) – SARIMA(1, 1, 1)x(1, 0, 1, 12).
Below are the results of applying the best parameters identified – SARIMA(1, 1, 1)x(1, 0, 1, 12).
We have calculated the test RMSE for the ARIMA and SARIMA models: 36.813755 for ARIMA(2,1,3) and 21.703017 for SARIMA(1, 1, 1)x(1, 0, 1, 12).
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.
We have plotted the autocorrelation and partial autocorrelation functions on the whole data. From Figure 9 we take q as 3; in the case of Q, all seasonal lags are significant, so Q is assumed to be 2.
Figure 9: Autocorrelation function plot
From Figure 10, p is taken as 5 and P as 3, since the plot appears significant at those lags.
We build the ARIMA model with parameters (5,1,3), based on the ACF and PACF plots. Below are the results. We note that the 1-period and 4-period lags are slightly insignificant.
We build the SARIMA model with parameters (5,1,3) x (3,0,2,12), based on the ACF and PACF plots. It shows that only the error terms at the 2-period lag and the 12-period (1st seasonal) lag are significant.
We have calculated the RMSE values for the ARIMA (5,1,3) and SARIMA (5,1,3) x (3,0,2,12) models.
We have also built an ARIMA model using the pmdarima function in Python for the range 0 to 3. Below is the result.
We have also built a SARIMA model using pmdarima, with ranges 0 to 4 for the trend terms and 0 to 5 for the seasonal terms. Below is the result.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
We have calculated the RMSE values for the different models; the best model is the pmdarima SARIMA (1, 1, 2) x (1,0,1,12) with an RMSE of 14.562001.
9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.
We build the SARIMA (1, 1, 2) x (1,0,1,12) model, which has the lowest RMSE, on the whole dataset. Below is the result.
We have forecast the next 12 months – August 1995 to July 1996 – using the best model, SARIMA (1, 1, 2) x (1,0,1,12), and also calculated the RMSE on the full period.
Figure 11: Rose wine sales (original data – Jan 1980 to Jul 1995, Forecast – Aug 1995 to Jul
1996)
10. Comment on the model thus built and report your findings and suggest the measures that
the company should be taking for future sales.