
Answer Book - Sparkling Wines

The document discusses time series analysis and forecasting of sparkling wine sales data from 1981-1995. Key steps included: 1. Plotting the timeseries and performing exploratory analysis to identify seasonality but no trend. 2. Decomposition showed seasonality and white noise but no trend. The series was identified as additive. 3. Various forecasting models were fitted on a training-test split and evaluated on test data, with triple exponential smoothing and SARIMA performing best. 4. The most accurate models (triple exponential smoothing and SARIMA) were used to forecast the next 12 months of sales with 95% confidence intervals.

Uploaded by

Ashish Agrawal

Answer Book – Ashish

Dataset – Sparkling Wines

Please do perform the following questions on each of these two data sets separately.
1. Read the data as an appropriate Time Series data and plot the data.

The data was read and converted into a time series by adding a timestamp index to the sales
data, so that it can be analysed as a time series in Python.
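As a minimal sketch of this step, the snippet below attaches first-of-month timestamps to a list of sales values using only the standard library. The helper name `monthly_stamps` and the sales figures are assumptions made for illustration; they are not the author's code or data.

```python
from datetime import date

def monthly_stamps(start_year, start_month, n):
    """Return n first-of-month dates starting at the given year and month."""
    stamps = []
    for i in range(n):
        months = (start_month - 1) + i  # months elapsed since January of start_year
        stamps.append(date(start_year + months // 12, months % 12 + 1, 1))
    return stamps

# Made-up sales figures, for illustration only (not the real data).
sales = [1686, 1591, 2304, 1712, 1471, 1377]

# Pair every observation with its timestamp to obtain a time series.
series = list(zip(monthly_stamps(1981, 1, len(sales)), sales))

for stamp, value in series:
    print(stamp.isoformat(), value)
```

With a data-frame library the same idea is usually expressed as a date index on the sales column.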

The plot shows monthly wine sales for the period Jan-1981 to Jul-1995.

2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.

The yearly boxplot shows that a trend is not prominent in the time series, since the mean
does not fluctuate much from year to year.

However, the monthly boxplots show a clear seasonal component. Sales peak in December.
There is little fluctuation in sales from January to September, whereas sales rise
significantly from October to December.
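The monthly comparison can also be checked numerically. The sketch below groups a series of (date, value) pairs by calendar month and averages each group. The function name `mean_by_month` and the numbers are made up for illustration and only mimic a year-end peak.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def mean_by_month(series):
    """Average the values of (date, value) pairs for each calendar month."""
    buckets = defaultdict(list)
    for stamp, value in series:
        buckets[stamp.month].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}

# Made-up pattern with a year-end peak, repeated over two years.
pattern = [10, 10, 11, 10, 10, 10, 11, 11, 12, 16, 22, 30]
series = [(date(2000 + y, m, 1), pattern[m - 1] + y)
          for y in range(2) for m in range(1, 13)]

monthly = mean_by_month(series)
peak = max(monthly, key=monthly.get)  # calendar month with the highest average
print(peak, monthly[peak])
```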

Decomposition

i) Additive
ii) Multiplicative

The decomposition shows no increasing or decreasing trend in this time series. However,
seasonality is present, and the residual component looks like white noise. Based on the
charts, we conclude that the series is additive.
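For reference, an additive decomposition assumes y = trend + seasonal + residual. The sketch below implements the classical version with a centred moving average, under the assumptions of an even period and at least two full cycles of data. It is a simplified stand-in for a library decomposition routine, and the input series is synthetic.

```python
def decompose_additive(y, period=12):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    n, half = len(y), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        # Centred moving average for an even period:
        # the two end points of the window get half weight.
        window = y[t - half:t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values for each position in the cycle.
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    centre = sum(raw) / period
    # Shift the seasonal effects so that they sum to zero over one cycle.
    seasonal = [raw[t % period] - centre for t in range(n)]
    resid = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
             for t in range(n)]
    return trend, seasonal, resid

# Synthetic series: linear trend plus made-up seasonal effects that sum to zero.
pattern = [-3, -3, -2, -3, -3, -3, -2, -2, -1, 3, 7, 12]
y = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, resid = decompose_additive(y)
print([round(s, 2) for s in seasonal[:12]])
```

A multiplicative decomposition would divide by the trend instead of subtracting it; which form fits better is judged from the residuals.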

3. Split the data into training and test. The test data should start in 1991.

The data is split into training and test sets, with the test set starting in January 1991.
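The original code is not reproduced in this document. A minimal sketch of such a split, assuming the series is held as (date, value) pairs and using placeholder values instead of the real sales, could look like this:

```python
from datetime import date

def split_by_year(series, test_start_year):
    """Split (date, value) pairs so the test set begins on 1 January of test_start_year."""
    cutoff = date(test_start_year, 1, 1)
    train = [(d, v) for d, v in series if d < cutoff]
    test = [(d, v) for d, v in series if d >= cutoff]
    return train, test

# Placeholder values on a Jan-1981 to Jul-1995 monthly index (not the real sales data).
series = [(date(1981 + i // 12, i % 12 + 1, 1), 1000 + i) for i in range(175)]

train, test = split_by_year(series, 1991)
print(len(train), len(test))
```

The split is chronological, never random, so that the models are always evaluated on observations that come after the ones they were fitted on.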

4.  Build all the exponential smoothing models on the training data and evaluate the model using
RMSE on the test data. Other additional models such as regression, naïve forecast models,
simple average models, moving average models should also be built on the training data and
check the performance on the test data using RMSE.
The following models were fitted on the training data, and their performance was then
measured on the test data using RMSE.

- Linear regression
- Naïve forecast
- Simple average
- Moving average
- Simple exponential smoothing
- Double exponential smoothing (Holt’s method)
- Triple exponential smoothing (Holt-Winters method)

The above models were fitted on the training data and then evaluated on the test data
using RMSE. The lower the RMSE, the better the model.
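To make the evaluation concrete, the sketch below computes RMSE for three of the simpler baselines. The function names and the tiny data set are assumptions for illustration, and the moving-average variant shown (repeating the mean of the last few training values) is only one of several ways to build such a model.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_forecast(train, horizon):
    """Repeat the last observed training value."""
    return [train[-1]] * horizon

def simple_average_forecast(train, horizon):
    """Repeat the mean of all training values."""
    return [sum(train) / len(train)] * horizon

def moving_average_forecast(train, horizon, window):
    """Repeat the mean of the last `window` training values."""
    recent = train[-window:]
    return [sum(recent) / len(recent)] * horizon

# Made-up numbers, for illustration only.
train = [10, 12, 14, 16]
test = [15, 15]

scores = {
    "naive": rmse(test, naive_forecast(train, len(test))),
    "simple average": rmse(test, simple_average_forecast(train, len(test))),
    "moving average (2)": rmse(test, moving_average_forecast(train, len(test), 2)),
}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(name, score)
```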

The triple exponential smoothing model (with Alpha = 0.1, Beta = 0.9 and Gamma = 0.6) gave
the lowest RMSE and MAPE on the test data and appears to be the best exponential smoothing
model for forecasting. A model with a MAPE below 10% is generally considered a good
model.
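The roles of Alpha (level), Beta (trend) and Gamma (seasonality) can be seen in the textbook additive Holt-Winters recursions sketched below, together with a MAPE helper. This is an illustrative implementation with a deliberately simple initialisation taken from the first two seasons; a library implementation may initialise and parameterise the model differently, so its numbers need not match. The input series is synthetic.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: return `horizon` forecasts after the end of y."""
    # Simple initial values from the first two seasons (an assumption of this sketch).
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    season = [y[i] - first for i in range(period)]
    for t in range(period, len(y)):
        s = season[t % period]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)       # Alpha: level
        trend = beta * (level - prev_level) + (1 - beta) * trend         # Beta: trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s    # Gamma: seasonality
    n = len(y)
    return [level + (h + 1) * trend + season[(n + h) % period] for h in range(horizon)]

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Synthetic, purely seasonal series (made-up effects that sum to zero).
pattern = [-3, -3, -2, -3, -3, -3, -2, -2, -1, 3, 7, 12]
y = [100 + pattern[t % 12] for t in range(36)]

forecasts = holt_winters_additive(y, 12, alpha=0.1, beta=0.9, gamma=0.6, horizon=12)
expected = [100 + p for p in pattern]
print(round(mape(expected, forecasts), 6))
```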

5. Check for the stationarity of the data on which the model is being built on using appropriate
statistical tests and also mention the hypothesis for the statistical test. If the data is found to
be non-stationary, take appropriate steps to make it stationary. Check the new data for
stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.

We checked for stationarity using the Augmented Dickey-Fuller (ADF) test.
The p-value is greater than 0.05, and hence we fail to reject the null hypothesis
stated below.

H0 = Time series is non-stationary
Ha = Time series is stationary

We took a first-order difference and checked again whether the time series is stationary.
Based on the new p-value, we are able to reject the null hypothesis, which means that after
differencing of order 1 the time series is stationary. However, seasonality is still
present in the time series.
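The differencing step itself is simple to sketch. The example below uses a synthetic series with a linear trend and a period-4 seasonal pattern (made-up numbers) to illustrate why a first difference removes the trend while the seasonal pattern survives. The ADF test is not reimplemented here; it would normally come from a statistics library, for example `adfuller` in statsmodels.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag]; the result is `lag` values shorter than y."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Synthetic series: linear trend plus a made-up seasonal pattern of period 4.
pattern = [0, 1, 0, 5]
y = [2 * t + pattern[t % 4] for t in range(12)]

first_diff = difference(y)            # removes the linear trend
seasonal_diff = difference(y, lag=4)  # removes the period-4 seasonal pattern as well

print(first_diff)     # still repeats every 4 values
print(seasonal_diff)  # constant once the seasonal pattern is differenced out
```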

6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.

We built the ARIMA/SARIMA models on the training data and measured their performance on
the test data using RMSE. The lower the RMSE, the better the model.

We first fitted the above models on the training data with Optimization = True in our code,
so that Python automatically identified the optimal values of p, d and q (or P, D and Q).
We also fitted the models on the training data with Optimization = False, identifying the
optimal values of p, d and q (or P, D and Q) manually and passing the combination with the
lowest AIC to the model for the best results.
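The AIC-based selection can be sketched as a grid search: enumerate candidate orders, obtain an AIC for each (AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximised likelihood), and keep the lowest. In the sketch below, `select_by_aic` and `fake_aic` are hypothetical names; `fake_aic` is a made-up formula standing in for an actual model fit, so the selected order is not a result from the wine data.

```python
from itertools import product

def select_by_aic(orders, fit_aic):
    """Return (best_order, best_aic); fit_aic(order) must return the AIC of that fit."""
    scored = [(fit_aic(order), order) for order in orders]
    best_aic, best_order = min(scored)
    return best_order, best_aic

# Candidate non-seasonal orders (p, d, q), with d fixed at 1 (one differencing step).
orders = [(p, 1, q) for p, q in product(range(3), repeat=2)]

def fake_aic(order):
    """Hypothetical stand-in for fitting a model and reading its AIC (made-up formula)."""
    p, d, q = order
    return 100.0 + (p - 2) ** 2 + (q - 1) ** 2

best_order, best_aic = select_by_aic(orders, fake_aic)
print(best_order, best_aic)
```

In practice `fit_aic` would fit an ARIMA or SARIMA model on the training data only, and a seasonal search would extend the grid with (P, D, Q) and the seasonal period.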

Please see below diagnostics.

ARIMA model

SARIMA model
The SARIMA model with p=1, d=1, q=2, P=1, D=0, Q=2 and a seasonal frequency of 12 months
resulted in the lowest RMSE. We therefore used SARIMA with these inputs to train the model
on the full dataset and forecast sales for the next 12 months, as shown in the plot
below.

7. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.

Please see the table below for the following models.

- Linear regression
- Naïve forecast
- Simple average
- Moving average
- Simple exponential smoothing
- Double exponential smoothing (Holt’s method)
- Triple exponential smoothing (Holt-Winters method)

Please see the table below for the following models.

- ARMA
- ARIMA
- SARIMA

The lowest RMSE is achieved by the triple exponential smoothing model (Holt-Winters method) with
Alpha = 0.1, Beta = 0.9 and Gamma = 0.6, followed by the SARIMA model. Hence, we trained these two
models on the full data.

8. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.

The lowest RMSE is achieved by the triple exponential smoothing model (Holt-Winters method)
with Alpha = 0.1, Beta = 0.9 and Gamma = 0.6, followed by the SARIMA model. Hence, we trained
these two models on the full data and predicted the next 12 months.
Forecast plot using the Holt-Winters model
Below are the results when seasonality is treated as “Additive”

RMSE on full data - 465.5687141603981

Forecast data based on 95% confidence interval

Forecast plot for next 12 months.

Forecast plot using SARIMA

RMSE on full data - 539.9831608568093

Forecast data based on 95% confidence interval
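The forecast tables are not reproduced in this document. As a rough illustration of how a 95% band can be attached to point forecasts, the sketch below uses forecast ± 1.96 × the residual standard deviation, assuming roughly normal residuals with constant spread. This is a simplification with made-up numbers; model-based intervals, such as those of a SARIMA fit, generally widen with the forecast horizon.

```python
from math import sqrt

Z_95 = 1.96  # approximate two-sided 95% quantile of the standard normal

def interval_95(forecasts, residuals):
    """Return (lower, point, upper) triples using a constant residual standard deviation."""
    sigma = sqrt(sum(r ** 2 for r in residuals) / len(residuals))
    return [(f - Z_95 * sigma, f, f + Z_95 * sigma) for f in forecasts]

# Made-up point forecasts and in-sample residuals, for illustration only.
forecasts = [100.0, 110.0]
residuals = [2.0, -2.0, 2.0, -2.0]

bands = interval_95(forecasts, residuals)
for lower, point, upper in bands:
    print(round(lower, 2), point, round(upper, 2))
```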


9. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.

Based on the forecast for the next 12 months, there is little evidence of a trend in the
time series. However, seasonality is present in the data, and sales increase significantly in
the last quarter of the year. The company should keep sufficient stocks of sparkling wine and
accelerate manufacturing ahead of quarter 4, because sales are high during this period
owing to the festive season.
