Project - Time Series Forecasting (Sparkling - CSV) & (Rose - CSV)
(Sparkling.csv) & (Rose.csv)
Problem:
For this assignment, sales data for different types of wine in the 20th century are to be
analysed. Both datasets come from the same company but cover different wines. As an analyst
at ABC Estate Wines, you are tasked with analysing and forecasting wine sales in the 20th century.
1. Read the data as an appropriate Time series data and plot the data
A time series is a sequence of observations recorded at regular time intervals.
Sparkling Data
Import libraries
Read the data
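A minimal sketch of the read step, assuming Sparkling.csv and Rose.csv sit in the working directory:

import pandas as pd

# Read both monthly sales series; YearMonth holds strings such as "1980-01"
sparkling_df = pd.read_csv('Sparkling.csv')
rose_df = pd.read_csv('Rose.csv')

print(sparkling_df.head())
print(sparkling_df.tail())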
Head and Tail of the data
Head of the data:
   YearMonth  Sparkling
0    1980-01       1686
1    1980-02       1591
2    1980-03       2304
3    1980-04       1712
4    1980-05       1471

Tail of the data:
     YearMonth  Sparkling
182    1995-03       1897
183    1995-04       1862
184    1995-05       1670
185    1995-06       1688
186    1995-07       2031
2. Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.
Plot graph
Timestamp
   YearMonth  Sparkling  Time_Stamp
0    1980-01       1686  2013-01-31
1    1980-02       1591  2013-02-28
2    1980-03       2304  2013-03-31
3    1980-04       1712  2013-04-30
4    1980-05       1471  2013-05-31
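A minimal sketch of attaching a month-end timestamp index with pd.date_range (assuming the index should start at the first observation, January 1980; the printed output above shows a differently specified start date):

# Attach a month-end timestamp to each monthly observation
sparkling_df['Time_Stamp'] = pd.date_range(start='1980-01-31',
                                           periods=len(sparkling_df), freq='M')
sparkling_df = sparkling_df.set_index('Time_Stamp')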
Yearly plot
Monthly Plot
2) Irregular remainder (random): the residual left in the series after removal of the trend and seasonal
components. For an additive decomposition, the remainder is calculated as R_t = Y_t − T_t − S_t
(and as R_t = Y_t / (T_t × S_t) for a multiplicative decomposition).
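A minimal sketch of the decomposition using statsmodels (assuming the monthly series sits in sparkling_df['Sparkling'] with a datetime index, as built above):

from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Additive decomposition: remainder = observed - trend - seasonal
decomposition = seasonal_decompose(sparkling_df['Sparkling'],
                                   model='additive', period=12)
decomposition.plot()
plt.show()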
3. Split the data into training and test. The test data should start in 1991.
The test data for both Sparkling and Rose wine sales starts in 1991; a sketch of the split follows.
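A minimal sketch of the date-based split (assuming the DatetimeIndex built above):

# Everything before 1991 is training data; 1991 onwards is test data
train = sparkling_df[sparkling_df.index < '1991-01-01']
test = sparkling_df[sparkling_df.index >= '1991-01-01']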
4. Build various exponential smoothing models on the training data and evaluate the
models using RMSE on the test data. Other models such as regression, naïve forecast
models, simple average models, etc. should also be built on the training data, and their
performance checked on the test data using RMSE.
Please do try to build as many models as possible and as many iterations of models as
possible with different parameters.
Exponential smoothing methods form a family of models described by the ETS (Error,
Trend, Seasonality) notation, where each component can be none (N), additive (A), additive
damped (Ad), multiplicative (M) or multiplicative damped (Md).
Simple exponential smoothing is suitable for forecasting data with no clear trend or seasonal pattern.
In Single Exponential Smoothing, the forecast at time (t + 1) is given by (Winters, 1960):
F_{t+1} = α Y_t + (1 − α) F_t
The parameter α is called the smoothing constant and its value lies between 0 and 1. Since the
model uses only one smoothing constant, it is called Single Exponential Smoothing.
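For example, with α = 0.3, the latest observation Y_t = 100 and the previous forecast F_t = 110, the next forecast is F_{t+1} = 0.3 × 100 + 0.7 × 110 = 107.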
The Rose and Sparkling datasets give monthly wine sales for the 20th century, from 1980 to 1995.
Training and test time instances (identical for the Sparkling and Rose sets):

Training time instances:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
 123, 124, 125, 126, 127, 128, 129, 130]

Test time instances:
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
 97, 98, 99]

We see that we have successfully generated the numerical time instance order for both the
training and test sets. Now we will add these values to the training and test sets.

Sparkling - First few rows of Training Data:
Time_Stamp  YearMonth  Sparkling  time
2013-01-31    1980-01       1686     1
2013-02-28    1980-02       1591     2
2013-03-31    1980-03       2304     3
2013-04-30    1980-04       1712     4
2013-05-31    1980-05       1471     5

Sparkling - Last few rows of Training Data:
Time_Stamp  YearMonth  Sparkling  time
NaT           1990-06       1457   126
NaT           1990-07       1899   127
NaT           1990-08       1605   128
NaT           1990-09       2424   129
NaT           1990-10       3116   130

Sparkling - First few rows of Test Data:
Time_Stamp  YearMonth  Sparkling  time
NaT           1990-11       4286    43
NaT           1990-12       6047    44
NaT           1991-01       1902    45
NaT           1991-02       2049    46
NaT           1991-03       1874    47

Rose - First few rows of Training Data:
Time_Stamp   Rose  time
2013-01-31  112.0     1
2013-02-28  118.0     2
2013-03-31  129.0     3
2013-04-30   99.0     4
2013-05-31  116.0     5

Rose - Last few rows of Training Data:
Time_Stamp   Rose  time
NaT          76.0   126
NaT          78.0   127
NaT          70.0   128
NaT          83.0   129
NaT          65.0   130

Rose - First few rows of Test Data:
Time_Stamp   Rose  time
NaT         110.0    43
NaT         132.0    44
NaT          54.0    45
NaT          55.0    46
NaT          66.0    47

Rose - Last few rows of Test Data:
Time_Stamp   Rose  time
NaT          45.0    95
NaT          52.0    96
NaT          28.0    97
NaT          40.0    98
NaT          62.0    99

Now that our training and test data have been modified, let us go ahead and use
LinearRegression to build the model on the training data and test the model on the test data.
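A minimal sketch of the regression-on-time model (the 'time' and 'Sparkling' column names follow the tables above; the forecast column name RegOnTime is illustrative):

from sklearn.linear_model import LinearRegression

# Fit a linear trend on the numerical time instances
lr = LinearRegression()
lr.fit(train[['time']], train['Sparkling'])

# Predict over the test-period time instances
test['RegOnTime'] = lr.predict(test[['time']])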
We can calculate the RMSE using the helper function mean_squared_error() from the
scikit-learn library, which calculates the mean squared error between a list of expected
values (the test set) and the list of predictions. We can then take the square root of this
value to give us an RMSE score.
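A minimal sketch of that calculation (column names follow the regression sketch above):

import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE = square root of the mean squared error between actuals and forecasts
rmse = np.sqrt(mean_squared_error(test['Sparkling'], test['RegOnTime']))
print('RegressionOnTime RMSE: %.6f' % rmse)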
## Test Data - RMSE and MAPE
Test RMSE
RegressionOnTime 1374.550202
For this particular naive model, we say that the prediction for tomorrow is the same as the value
for today, and the prediction for the day after tomorrow is the same as for tomorrow; since the
prediction for tomorrow equals today's value, the prediction for the day after tomorrow is also today's value.
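A minimal sketch of the naive forecast (the last training observation, 3116 for Sparkling and 65.0 for Rose, is repeated over the whole test horizon):

# Every test-period prediction equals the final training value
naive_forecast = pd.Series(train['Sparkling'].iloc[-1],
                           index=test.index, name='naive')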
Timestamps (Sparkling):
Time_Stamp
NaT    3116
NaT    3116
NaT    3116
NaT    3116
NaT    3116
Name: naive, dtype: int64

Timestamps (Rose):
Time_Stamp
NaT    65.0
NaT    65.0
NaT    65.0
NaT    65.0
NaT    65.0
Name: naive, dtype: float64
Test RMSE
NaiveModel 1496.444629
For the Moving Average models, we compute trailing averages over the entire data; see the sketch below.
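A minimal sketch of the trailing moving averages (computed on the full series; the test-period rows of each rolling mean are what get scored below):

# Trailing moving averages over windows of 2, 4, 6 and 9 months
for window in [2, 4, 6, 9]:
    sparkling_df['%dpointMA' % window] = (
        sparkling_df['Sparkling'].rolling(window).mean()
    )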
Model evaluation (done only on the test data):
For 2 point Moving Average Model forecast, RMSE is 811.179
For 4 point Moving Average Model forecast, RMSE is 1184.213
For 6 point Moving Average Model forecast, RMSE is 1337.201
For 9 point Moving Average Model forecast, RMSE is 1422.653
Test RMSE
RegressionOnTime 1374.550202
NaiveModel       1496.444629

Test RMSE
SimpleAverageModel          1368.746717
2pointTrailingMovingAverage  811.178937
4pointTrailingMovingAverage 1184.213295
6pointTrailingMovingAverage 1337.200524
9pointTrailingMovingAverage 1422.653281
We have the Sparkling wine sales data from Jan 1980 to Jul 1995.
Split the data into train and test in the ratio 70:30. Use the Single Exponential Smoothing method to
forecast sales on the test data. Calculate the values of RMSE and MAPE. Plot the forecasted values
along with the original values.
Forecasts are calculated using weighted averages, where the weights decrease exponentially as
observations come from further in the past; the smallest weights are associated with the oldest
observations.
So essentially we have a weighted moving average with two weights: α and 1 − α.
As we can see, 1 − α is multiplied by the previous forecast F_t, which makes the expression
recursive, and this is why the method is called exponential. The forecast at time t + 1 is a
weighted average between the most recent observation Y_t and the most recent forecast F_t.
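A minimal sketch of fitting Single Exponential Smoothing with statsmodels (the parameter dictionaries below are in this library's format):

from statsmodels.tsa.api import SimpleExpSmoothing

# Fit SES on the training series; statsmodels optimises alpha by default
ses_model = SimpleExpSmoothing(train['Sparkling']).fit()
print(ses_model.params)  # includes the fitted smoothing_level (alpha)

# Flat forecast over the test horizon
ses_forecast = ses_model.forecast(steps=len(test))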
Sparkling Data:
Parameters
{'smoothing_level': 0.0,
'smoothing_slope': nan,
'smoothing_seasonal': nan,
'damping_slope': nan,
'initial_level': 2361.2692421307943,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}
Test data:
YearMonth Sparkling predict
Time_Stamp
NaT 1990-11 4286 NaN
NaT 1990-12 6047 NaN
NaT 1991-01 1902 NaN
NaT 1991-02 2049 NaN
NaT 1991-03 1874 NaN
Rose data:
Parameters
{'smoothing_level': 0.10272096910328038,
'smoothing_slope': nan,
'smoothing_seasonal': nan,
'damping_slope': nan,
'initial_level': 134.26283319141947,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}
Test data
Rose predict
Time_Stamp
NaT 110.0 NaN
NaT 132.0 NaN
NaT 54.0 NaN
NaT 55.0 NaN
NaT 66.0 NaN
HOLT METHOD
Holt extended simple exponential smoothing to allow forecasting of data with a trend. It is nothing
more than exponential smoothing applied to both the level (the average value in the series) and the
trend. To express this in mathematical notation we now need three equations: one for the level, one
for the trend, and one to combine the level and trend to get the expected forecast ŷ.
Using the Holt-Winters method will be the best option among the rest of the models because of the
seasonality factor. The Holt-Winters seasonal method comprises the forecast equation and three
smoothing equations: one for the level ℓ_t, one for the trend b_t and one for the seasonal component
denoted by s_t, with smoothing parameters α, β and γ.
ADDITIVE
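A minimal sketch of the additive Holt-Winters model with statsmodels (seasonal_periods=12 for monthly data):

from statsmodels.tsa.api import ExponentialSmoothing

# Additive trend + additive seasonality, 12-month seasonal period
hw_add = ExponentialSmoothing(train['Sparkling'], trend='add',
                              seasonal='add',
                              seasonal_periods=12).fit()
hw_forecast = hw_add.forecast(steps=len(test))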
5. Check for the stationarity of the data on which the model is being built using
appropriate statistical tests, and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: stationarity should be checked at alpha =
0.05.
Augmented Dickey-Fuller test
Statistical tests make strong assumptions about the data. They can only be used to inform the
degree to which a null hypothesis can be rejected or fail to be rejected. The result must be
interpreted for a given problem to be meaningful. They can provide a quick check and
confirmatory evidence that your time series is stationary or non-stationary. The Augmented
Dickey-Fuller test is a type of statistical test called a unit root test. The intuition behind a unit root
test is that it determines how strongly a time series is defined by a trend. There are a number of
unit root tests, and the Augmented Dickey-Fuller test may be one of the more widely used. It uses an
autoregressive model and optimizes an information criterion across multiple different lag values.
The null hypothesis of the test is that the time series can be represented by a unit root, i.e. that it is
not stationary (has some time-dependent structure). The alternate hypothesis (rejecting the null
hypothesis) is that the time series is stationary.
• Null Hypothesis (H0): If it fails to be rejected, it suggests the time series has a unit root,
meaning it is non-stationary. It has some time-dependent structure.
• Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not
have a unit root, meaning it is stationary. It does not have time-dependent structure.
We interpret this result using the p-value from the test. A p-value below a threshold (such as 5%
or 1%) suggests we reject the null hypothesis (stationary); otherwise a p-value above the
threshold suggests we fail to reject the null hypothesis (non-stationary).
• p-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
• p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
Applying the ADF test at alpha = 0.05 to our two series:
• Rose: p-value > 0.05, so we fail to reject the null hypothesis (H0); the data has a unit root and
is non-stationary.
• Sparkling: p-value <= 0.05, so we reject the null hypothesis (H0); the data does not have a unit
root and is stationary.
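A minimal sketch of running the test with statsmodels (adfuller returns the test statistic first and the p-value second):

from statsmodels.tsa.stattools import adfuller

# ADF test on the raw series at alpha = 0.05
result = adfuller(sparkling_df['Sparkling'].dropna())
print('ADF statistic: %f' % result[0])
print('p-value: %f' % result[1])

# If non-stationary, difference once and re-test
result_diff = adfuller(sparkling_df['Sparkling'].diff().dropna())
print('p-value after differencing: %f' % result_diff[1])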
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
ARIMA is a very popular statistical method for time series forecasting. ARIMA stands for Auto-
Regressive Integrated Moving Average. ARIMA models work on the following assumptions:
• The data series is stationary, which means that the mean and variance should not vary with time.
A series can be made stationary by using log transformation or by differencing the series.
• The data provided as input must be a univariate series, since ARIMA uses the past values to
predict the future values.
ARIMA has three components: AR (autoregressive term), I (differencing term) and MA (moving
average term). Let us understand each of these components:
• The AR term refers to the past values used for forecasting the next value. The AR term is defined
by the parameter 'p' in ARIMA. The value of 'p' is determined using the PACF plot.
• The MA term defines the number of past forecast errors used to predict the future values. The
parameter 'q' in ARIMA represents the MA term. The ACF plot is used to identify the correct 'q' value.
• The order of differencing 'd' specifies the number of times the differencing operation is performed
on the series to make it stationary. Tests like ADF and KPSS can be used to determine whether the
series is stationary and help in identifying the d value.
Auto ARIMA takes into account the AIC and BIC values generated by the candidate fits. Akaike's
information criterion (AIC) compares the quality of a set of statistical models to each other.
Hyperparameter tuning for ARIMA: in order to choose the best combination of the above
parameters, we'll use a grid search, as sketched below. The best combination of parameters will give
the lowest Akaike information criterion (AIC) score. AIC tells us the quality of statistical models for
a given set of data.
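A minimal sketch of such a grid search with statsmodels SARIMAX (the parameter ranges are illustrative and kept small):

import itertools
import statsmodels.api as sm

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

# Keep the (order, seasonal_order) pair with the lowest AIC
best_aic, best_order = float('inf'), None
for order in pdq:
    for seasonal_order in seasonal_pdq:
        try:
            fit = sm.tsa.statespace.SARIMAX(
                train['Sparkling'], order=order,
                seasonal_order=seasonal_order,
                enforce_stationarity=False,
                enforce_invertibility=False).fit(disp=False)
            if fit.aic < best_aic:
                best_aic, best_order = fit.aic, (order, seasonal_order)
        except Exception:
            continue

print(best_order, best_aic)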
The above plot shows that our predicted values catch up to the observed values in the dataset.
Our forecasts seem to align with the ground truth very well, and the spike up through 1995-96
appears as expected. RMSE is also low in this case. So the final ARIMA model can be represented
as SARIMAX. This is the best we can do with ARIMA, so let's try another model to see whether we
can decrease the RMSE.
In this case ARIMA performed the best.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
For example, a seasonal MA model will show:
• a spike at the seasonal lag in the ACF but no other significant spikes;
• exponential decay in the seasonal lags of the PACF.
Similarly, a seasonal AR model will show:
• exponential decay in the seasonal lags of the ACF;
• a single significant spike at the seasonal lag in the PACF.
In considering the appropriate seasonal orders for a seasonal ARIMA model, restrict attention to the
seasonal lags. The modelling procedure is almost the same as for non-seasonal data, except that we
need to select seasonal AR and MA terms as well as the non-seasonal components of the model. To
determine a proper model for a given time series, it is necessary to carry out ACF and PACF
analysis. These statistical measures reflect how the observations in a time series are related to each
other. For modeling and forecasting purposes it is often useful to plot the ACF and PACF against
consecutive time lags. These plots help in determining the order of the AR and MA terms. Below we
give their mathematical definitions: for a time series { x(t), t = 0, 1, 2, ... }, the autocovariance
[21, 23] at lag k is defined as:
γ_k = Cov(x_t, x_{t+k}) = E[(x_t − μ)(x_{t+k} − μ)]
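A minimal sketch of the ACF/PACF plots used to read off q and p (differencing first, since the plots should be drawn on a stationary series):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF suggests the MA order q; PACF suggests the AR order p
plot_acf(train['Sparkling'].diff().dropna(), lags=30)
plot_pacf(train['Sparkling'].diff().dropna(), lags=30)
plt.show()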