Business Report Time Series
Problem Statement
As analysts at ABC Estate Wines, we are presented with historical data on the sales of different
types of wine throughout the 20th century. The datasets originate from the same company but record
sales figures for distinct wine varieties. Our objective is to explore the data and analyze the trends,
patterns, and factors influencing wine sales over the course of the century. By applying data analytics
and forecasting techniques, we aim to gain actionable insights that can inform strategic decision-making
and optimize future sales strategies.
Objective
The primary objective of this project is to analyze and forecast wine sales trends for the 20th century based on
historical data provided by ABC Estate Wines. We aim to equip ABC Estate Wines with the necessary insights and
foresight to enhance sales performance, capitalize on emerging market opportunities, and maintain a
competitive edge in the wine industry.
Read the data as an appropriate time series - Plot the data -
Perform EDA - Perform Decomposition.
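As a minimal sketch of the first step, the snippet below parses monthly records into a date-ordered series using only the Python standard library. The column names `YearMonth` and `Sales` and the three sample rows are made up for illustration; the report does not show the actual file layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: column names and values are illustrative only.
SAMPLE_CSV = """YearMonth,Sales
1980-01,100
1980-02,120
1980-03,
"""

def read_monthly_series(text):
    """Return (date, value) pairs sorted by date; blank values become None."""
    series = []
    for row in csv.DictReader(io.StringIO(text)):
        # Parse "YYYY-MM" into the first day of that month.
        stamp = datetime.strptime(row["YearMonth"], "%Y-%m").date()
        raw = row["Sales"].strip()
        series.append((stamp, float(raw) if raw else None))
    series.sort(key=lambda pair: pair[0])
    return series

series = read_monthly_series(SAMPLE_CSV)
for stamp, value in series:
    print(stamp.isoformat(), value)
```

Keeping missing months as `None` rather than dropping them preserves the regular monthly spacing that later steps (decomposition, differencing) rely on.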
ROSE WINE:-
SPARKLING WINE:-
The center line in each box-plot represents the median sales price.
The median is the price point where half of the sales were for a higher price and
the other half were for a lower price.
The box in each plot contains the middle 50% of the sales data. The upper edge
of the box is the third quartile (Q3) and the lower edge is the first quartile (Q1).
The whiskers extend from the top and bottom of the box to the highest and
lowest sales prices within 1.5 times the interquartile range (IQR). The IQR is the
difference between Q3 and Q1. Sales prices outside this range are considered
outliers and are shown as individual points in the plot.
The median sales price for Rose wine is lower than the median sales price for
Sparkling wine.
The sales price for Rose wine is more spread out than the sales price for Sparkling
wine, meaning that Rose wine covers a wider range of prices.
There are more outliers in the sales price for Sparkling wine than for Rose wine.
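The box-plot quantities described above (quartiles, IQR, whiskers, and outliers) can be computed directly. The sketch below uses the standard library's `statistics.quantiles` with the "inclusive" method; plotting libraries may use a slightly different quantile convention, so small differences from a rendered plot are possible. The input values are invented for illustration.

```python
from statistics import quantiles

def boxplot_summary(values):
    """Compute the numbers a box plot draws, using the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Illustrative data, not taken from the wine datasets.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```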
The top graph shows the original time series data for Rose wine sales.
The middle graph shows the seasonal component of the sales data, that is, how
Rose wine sales vary throughout a typical year. For example, Rose wine sales
tend to be higher in February around Valentine's Day and in May around
Mother's Day.
The bottom graph shows the trend component of the sales data, that is, the
overall increase or decrease in Rose wine sales over time.
The top graph shows the original time series data for Sparkling wine sales. It
appears to show a declining trend in sales over several years.
The middle graph shows the seasonal component of the sales data, that is, how
Sparkling wine sales vary throughout a typical year. There is a peak in
December, which could be due to holiday sales.
The bottom graph shows the trend component of the sales data. This graph
confirms the declining trend in sparkling wine sales over time.
The graph titled "Residual Component of Rose Wine Sales" shows the difference
between the actual sales figures and the baseline implied by the trend and
seasonal components.
The x-axis shows years, ranging from 1983 to 1995.
The y-axis shows the residual sales. Positive values on the y-axis indicate that
sales were higher than predicted in that year. Negative values on the y-axis
indicate that sales were lower than predicted in that year.
Residual sales fluctuate from year to year, with some years having higher than
predicted sales and other years having lower than predicted sales.
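The trend, seasonal, and residual panels described above can be illustrated with a classical additive decomposition. The sketch below is a simplified standard-library version written for this explanation: it assumes an additive model and an even period of 12 months, neither of which the report states explicitly, and it runs on a synthetic series rather than the wine data.

```python
def decompose_additive(values, period=12):
    """Classical additive decomposition: observed = trend + seasonal + residual."""
    n = len(values)
    if period % 2 != 0 or n < 2 * period:
        raise ValueError("this sketch needs an even period and at least two full cycles")
    half = period // 2

    # Trend: centred 2 x period moving average (end points get half weight).
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Seasonal: average detrended value per position in the cycle, centred on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period
    seasonal = [means[t % period] - centre for t in range(n)]

    # Residual: what is left after removing trend and seasonal parts.
    resid = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, resid

# Synthetic series: linear trend plus a fixed 12-month pattern (illustrative only).
pattern = [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, resid = decompose_additive(series, period=12)
```

Because the synthetic series contains no noise, its residuals are essentially zero; on real sales data the residual panel shows whatever the trend and seasonal components fail to explain.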
B. Data Pre-Processing:
Missing value treatment - Visualize the
TREATMENT:-
ROSE:-
There appears to be an upward trend in rose wine sales over this time period.
Sales appear to be higher in later years (1990s) than in earlier years (1980s).
SPARKLING:-
ROSE:-
SPARKLING:-
Simplicity vs. Accuracy: A 2-point moving average is a very simple model that only
considers the most recent two data points. This can be beneficial because it's easy to
understand and implement. However, it may not capture more complex trends in
the data.
Data Availability: A 2-point moving average only needs two data points to make a
prediction. This can be an advantage if data is limited.
Compare the 2-point moving average model to other models: Try using more
sophisticated models like ARIMA or SARIMA which can capture trends and
seasonality. Compare the RMSE scores of these models to the 2-point moving
average model.
Visualize the forecasts: Plot the actual sales data along with the forecasts from the
2-point moving average model and any other models you consider. This helps us
see how well the models capture the trends in the data.
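To make the 2-point moving average and its RMSE concrete, here is a small sketch. It assumes the model forecasts each point as the mean of the two observations immediately before it; the report does not spell out the exact formulation, and the sales values are invented.

```python
from math import sqrt

def moving_average_forecast(values, window=2):
    """Forecast each point as the mean of the `window` observations before it."""
    return [sum(values[t - window : t]) / window for t in range(window, len(values))]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))

# Illustrative numbers only.
sales = [10, 12, 14, 16, 18]
forecast = moving_average_forecast(sales, window=2)
score = rmse(sales[2:], forecast)  # the first two points have no forecast
print(forecast, score)
```

On a steadily rising series the forecast always lags behind the actual values, which is the "may not capture more complex trends" weakness noted above.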
JUST COMPARING:
Best model:-
To address the non-stationarity, we apply a single level of differencing and check
whether the series becomes stationary.
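A single level of differencing replaces each value with its change from the previous period. The sketch below shows the operation on an invented, steadily declining series; in practice the differenced series would then be passed to a stationarity test (for example an augmented Dickey-Fuller test, which is an assumption here since the report does not name the test).

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

# Illustrative series with a constant downward trend.
trending = [100, 95, 90, 85, 80, 75]
first_diff = difference(trending)
print(first_diff)
```

The differenced series is one observation shorter than the original, and a linear trend collapses to a constant, which is why differencing is the usual first remedy for a trending series.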
The p-value is less than the significance level of 0.05, so we reject the null hypothesis.
We conclude that after applying a lag of 10, the Rose data becomes stationary.
The p-value is less than the significance level of 0.05, so we reject the null hypothesis.
As a result, after applying a lag of 10, the Sparkling data becomes stationary.
SARIMA (3, 1, 1) (3, 0, 0, 12) Diagnostic Plot - SPARKLING
Model                             Test RMSE Sparkling    Test MAPE Sparkling
ARIMA(2,1,2)                      1299.98                47.10
SARIMA (3, 1, 1) (3, 0, 0, 12)    601.24                 25.87
ACF analysis helps determine the moving average parameter 'q' in ARIMA or SARIMA models.
The Partial Auto-Correlation Function (PACF) measures the correlation between a time series
and its lagged values after removing the effect of the intermediate observations. For a lag
denoted 'k', the PACF gives the correlation between the current value and the value 'k' time
units earlier, with the influence of the observations in between filtered out. The PACF
is displayed as a plot of these correlations against the lag. It assists
in determining the auto-regressive parameter 'p' in ARIMA or SARIMA models.
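For readers who want to see what the ACF and PACF plots are built from, the sketch below computes both with the standard library: the sample autocorrelation directly, and the partial autocorrelation via the Durbin-Levinson recursion. This is one common estimator; library plotting routines may default to a different PACF method, so values can differ slightly. The input series is invented.

```python
def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

def pacf(values, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = acf(values, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi
    return result

# Illustrative series only.
sample = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]
r = acf(sample, 3)
p = pacf(sample, 3)
print(r)
print(p)
```

At lag 1 the two functions agree, because there are no intermediate observations to remove; they start to differ from lag 2 onward.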
E.1 Generate ACF & PACF plot and find the AR, MA Values:-
Using a significance level of 0.05 and examining the PACF and ACF plots, we select the
Auto-Regressive (AR) parameter 'p' as 2 and the Moving-Average (MA) parameter 'q' also
as 2. This decision is guided by the significant lags in each plot before it cuts off:
the last significant lag in the PACF plot before the cut-off informs the choice of 'p',
while the last significant lag in the ACF plot before the cut-off guides the selection
of 'q'. These parameter values are crucial for constructing ARIMA or SARIMA models, as
they describe the temporal dependencies within the time series data.
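The "last significant lag before the cut-off" rule can be expressed as a small helper. This sketch assumes the usual approximate 95% band of ±1.96/√n around zero, matching the 0.05 significance level; the function name and the correlation values are hypothetical and not read from the report's plots.

```python
from math import sqrt

def last_significant_lag(correlations, n_obs, z=1.96):
    """Count leading lags (starting at lag 1) whose correlation lies outside +/- z/sqrt(n)."""
    bound = z / sqrt(n_obs)
    lag = 0
    for k, value in enumerate(correlations[1:], start=1):
        if abs(value) > bound:
            lag = k
        else:
            break  # the plot has "cut off"; later spikes are ignored
    return lag

# Hypothetical correlations for lags 0..4 from a series of 100 observations.
example = [1.0, 0.6, 0.3, 0.05, 0.4]
order = last_significant_lag(example, n_obs=100)
print(order)
```

Applied to PACF values the result suggests 'p'; applied to ACF values it suggests 'q'. The rule is a starting point for model search rather than a final answer.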
For ARIMA:
Auto-Regressive (AR) parameter (p) = 0
Moving-Average (MA) parameter (q) = 0
Differencing parameter (d) = 1
For SARIMA:
Auto-Regressive (AR) parameter (p) = 0
Moving-Average (MA) parameter (q) = 0
Differencing parameter (d) = 1
Seasonal Auto-Regressive (SAR) parameters (P) = 0, 1, 2, 3
Seasonal Moving-Average (SMA) parameters (Q) = 1, 2, 3
Seasonal differencing parameter (D) = 0
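The parameter values listed above define a small grid of candidate SARIMA orders. The sketch below only enumerates that grid; the seasonal period of 12 is taken from the model notation used elsewhere in the report, and the fitting step (for instance with a SARIMAX-style model, ranked by AIC or test RMSE) is left out because the report does not show how it was done.

```python
from itertools import product

# Values as listed in the report for the SARIMA search.
p_values, d_values, q_values = [0], [1], [0]
P_values, D_values, Q_values = [0, 1, 2, 3], [0], [1, 2, 3]
SEASON = 12  # monthly data with a yearly cycle

candidates = [
    ((p, d, q), (P, D, Q, SEASON))
    for p, d, q, P, D, Q in product(
        p_values, d_values, q_values, P_values, D_values, Q_values
    )
]

for order, seasonal_order in candidates:
    print(order, seasonal_order)
```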
E.2 Build different ARIMA models (Auto ARIMA, Manual ARIMA) + E.3
Build different SARIMA models (Auto SARIMA, Manual SARIMA):-
ROSE (MANUAL+AUTO):
SPARKLING(MANUAL):
Best model:-
ROSE:- SARIMA(3,1,1)(3,0,2,12)
SPARKLING:- SARIMA(3,1,1)(3,0,2,12)
Conclude with the key takeaways (actionable insights and recommendations) for the
business:-
ROSE WINE:-
Long-term decline: Sales have been dropping since 1980, indicating a decrease
in popularity.
Seasonal spike: Sales rise significantly during the holiday season (Oct-Dec),
peaking in December. (Likely due to holiday celebrations)
Post-holiday slump: Sales sharply decline in the first quarter (Jan-Mar), possibly
reflecting a post-holiday slowdown.
Gradual recovery: Sales slowly pick up again by May-June.
SPARKLING WINE:-