Business Report Time Series

The document outlines a data analysis project for ABC Estate Wines, focusing on historical wine sales data from the 20th century. It includes steps for exploratory data analysis, data pre-processing, model building, and performance comparison of various forecasting models, ultimately aiming to provide actionable insights for optimizing sales strategies. Key findings indicate trends and seasonal patterns in sales for both rose and sparkling wines, with recommendations for using advanced models like ARIMA and SARIMA for better forecasting accuracy.

CONTENT

A. Define the problem and perform Exploratory Data Analysis
B. Data Pre-processing
C. Model Building - Original Data
D. Check for Stationarity
E. Model Building - Stationary Data
F. Compare the performance of the models
G. Actionable Insights & Recommendations

Problem Statement
As analysts at ABC Estate Wines, we are presented with historical data encompassing the sales of different
types of wines throughout the 20th century. These datasets originate from the same company but represent
sales figures for distinct wine varieties. Our objective is to delve into the data, analyze trends, patterns, and
factors influencing wine sales over the course of the century. By leveraging data analytics and forecasting
techniques, we aim to gain actionable insights that can inform strategic decision-making and optimize sales
strategies for the future.
Objective
The primary objective of this project is to analyze and forecast wine sales trends for the 20th century based on
historical data provided by ABC Estate Wines. We aim to equip ABC Estate Wines with the necessary insights and
foresight to enhance sales performance, capitalize on emerging market opportunities, and maintain a
competitive edge in the wine industry.

A. Define the problem and perform Exploratory Data Analysis:

Read the data as an appropriate time series data - Plot the data -
Perform EDA - Perform Decomposition.

A.1 Read the data as an appropriate time series data:-

ROSE WINE:-

SPARKLING WINE:-

A.2 Plot the data:-



A.3 Perform EDA:-


 Correlation between Rose and Sparkling sales: 0.40457904770543324
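
The report does not show how the correlation figure was computed. As a minimal sketch, assuming it is the Pearson correlation between the two monthly series, the calculation looks like this. The function name and the sample figures are hypothetical and are not the report's data.

```python
from math import sqrt

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-movement of the two series around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly sales figures, for illustration only.
rose = [112, 118, 129, 99, 116, 168]
sparkling = [1686, 1591, 2304, 1712, 1471, 1377]
r = pearson_correlation(rose, sparkling)
```

A value near 0.4, as reported, would indicate a moderate positive relationship: the two wines tend to sell well in the same months, but far from perfectly.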

• The center line in each box plot represents the median sales.
• The median is the point where half of the observations are higher and the other
half are lower.
• The box in each plot contains the middle 50% of the sales data. The upper edge
of the box is the third quartile (Q3) and the lower edge is the first quartile (Q1).
• The whiskers extend from the top and bottom of the box to the highest and
lowest sales values within 1.5 times the interquartile range (IQR) of the box edges.
The IQR is the difference between Q3 and Q1. Sales values outside this range are
considered outliers and are shown as individual points in the plot.
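
The box-plot rules above can be sketched in a few lines. This is an illustration under assumptions, not the report's code: the function name and sample values are hypothetical, and the quartiles use linear interpolation (the "inclusive" method), which is one common convention among several.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, IQR, whisker ends and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sales values with one unusually large month.
summary = box_plot_summary([10, 20, 30, 40, 50, 60, 70, 80, 90, 500])
```

Here the value 500 falls above the upper fence, so it would be drawn as an individual point rather than extending the whisker.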

ACCORDING TO THE DATA OF ROSE AND SPARKLING WINE:

• The median sales for Rose wine is lower than the median sales for Sparkling
wine.
• Rose wine sales are more spread out than Sparkling wine sales. This means that
there is a wider range of sales values for Rose than there is for Sparkling.
• There are more outliers in the Sparkling wine sales than there are in the Rose
wine sales.

A.4 Perform Decomposition:-

• The top graph shows the original time series data for rose sales.

• The middle graph shows the seasonal component of the sales data. This graph
shows how rose sales vary throughout a typical year. For example, rose wine sales
tend to rise during the holiday season (Oct-Dec) and peak in December.
• The bottom graph shows the trend component of the sales data. This graph
shows the overall increase or decrease in rose sales over time.

• The top graph shows the original time series data for sparkling sales. It appears
to show a declining trend in sparkling wine sales over several years.
• The middle graph shows the seasonal component of the sales data. This graph
shows how sparkling sales vary throughout a typical year. There is a peak in
December, which could be due to holiday sales.
• The bottom graph shows the trend component of the sales data. This graph
confirms the declining trend in sparkling wine sales over time.
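
The report does not show how the decomposition was computed. As a rough sketch, a classical additive decomposition splits each observation into trend + seasonal + residual, assuming monthly data with a 12-month cycle. The function name and the synthetic series below are hypothetical.

```python
def additive_decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Trend: centered moving average. For an even period the window spans
    # period + 1 points and the two end points get half weight.
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    # Seasonal: average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # center the seasonal effects so they sum to zero
    seasonal = [raw[t % period] - offset for t in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual

# Synthetic series: a linear trend plus a fixed monthly pattern with a December peak.
pattern = [-30, -20, -10, 0, 5, 10, 15, 5, -5, 0, 10, 20]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, residual = additive_decompose(series)
```

The trend is undefined for the first and last six months because the centered window does not fit there, which is why decomposition plots often show a shorter date range than the raw data.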

TREND, SEASONALITY, RESIDUAL OF ROSE WINE:-



• The title, "Residual Component of Rose Wine Sales", means this graph shows
what remains of the actual sales figures after the trend and seasonal components
are removed.
• The x-axis shows years, ranging from 1983 to 1995.
• The y-axis shows the residual sales. Positive values indicate that sales were
higher than the trend and seasonal components explain. Negative values indicate
that sales were lower.
• Residual sales fluctuate from year to year, with some periods above and others
below what the trend and seasonal components explain.

TREND, SEASONALITY, RESIDUAL OF SPARKLING WINE:-

Coefficient of Variation for the Residual Component: -296.08335294144075
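
A negative coefficient of variation can look odd at first. As a small illustration, assuming the usual definition (standard deviation divided by mean), the ratio turns negative and large in magnitude whenever the mean is negative and close to zero, which is typical of residuals. Whether the report used the population or sample standard deviation, or multiplied by 100, is not stated. The values below are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (a unitless ratio)."""
    m = mean(values)
    if m == 0:
        raise ZeroDivisionError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m

# Hypothetical residuals centered near zero with a slightly negative mean.
residuals = [-12.0, 9.0, -7.0, 10.0, -4.0, 3.0]
cv = coefficient_of_variation(residuals)
```

For residuals, the sign and size of this ratio say more about how close the mean is to zero than about the variability itself.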



B. Data Pre-Processing:
Missing value treatment - Visualize the processed data - Train-test split

B.1 Missing values and treatment:-

TREATMENT:-

B.2 Visualize the processed data:-

ROSE:-

• The y-axis shows the sales of rose wine.
• The x-axis shows the year and month. The time period ranges from January 1980
to December 1996.

• There appears to be a downward trend in rose wine sales over this time period.
Sales appear to be lower in later years (1990s) than in earlier years (1980s),
consistent with the long-term decline noted in the recommendations.

SPARKLING:-

• The y-axis shows the number of sparkling wine sales.
• The x-axis shows the year, ranging from 1980 to 1996.
• There appears to be some fluctuation in sales throughout the years, but there is
no clear upward or downward trend. Sales are at their highest point in 1988 and
at their lowest around 1995.

B.3 Train-Test Split:-

ROSE:-

SPARKLING:-

C. Model Building - Original Data:-


Build forecasting models - Linear regression - Simple Average - Moving Average - Exponential Models
(Single, Double, Triple) - Check the performance of the models built

C.1 Model 1-Linear Regression:-



C2. Model 2- Simple Average :-



C3. Model 3- Moving Average:-



COMPARING THE FIRST 3 MODELS:-

Based on the above information, a 2-point moving average model seems to be a
good starting point for forecasting both rose and sparkling wine sales due to its
low Root Mean Squared Error (RMSE) score. However, it's important to consider
some additional factors before settling on this model:

Simplicity vs. Accuracy: A 2-point moving average is a very simple model that only
considers the most recent two data points. This can be beneficial because it's easy to
understand and implement. However, it may not capture more complex trends in
the data.
Data Availability: A 2-point moving average only needs two data points to make a
prediction. This can be an advantage if data is limited.

For further analysis:

Compare the 2-point moving average model to other models: Try using more
sophisticated models like ARIMA or SARIMA, which can capture trends and
seasonality. Compare the RMSE scores of these models to the 2-point moving
average model.
Visualize the forecasts: Plot the actual sales data along with the forecasts from the
2-point moving average model and any other models you consider. This will help us
see how well the models are capturing the trends in the data.
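
To make the moving average comparison concrete, here is a minimal sketch of a trailing moving average scored by RMSE. It assumes one-step-ahead forecasts, where each forecast is the mean of the previous observations in the window. The report does not show its exact procedure, so the function names and sample figures are hypothetical.

```python
from math import sqrt

def trailing_moving_average(series, window=2):
    """One-step-ahead forecasts: the forecast for t is the mean of the previous `window` values."""
    forecasts = [None] * len(series)
    for t in range(window, len(series)):
        forecasts[t] = sum(series[t - window : t]) / window
    return forecasts

def rmse(actual, predicted):
    """Root mean squared error over the points that have a forecast."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

sales = [100, 120, 110, 130, 125, 140]  # hypothetical monthly sales
for window in (2, 4):
    print(window, round(rmse(sales, trailing_moving_average(sales, window)), 2))
```

A short window reacts quickly to recent changes, which often gives a low one-step RMSE, but it cannot project a trend or a seasonal pattern further into the future.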

C4. Model 4- Exponential Models (Single, Double, Triple):-

Exponential Smoothing Models -


• Single/Simple Exponential Smoothing with Additive Errors - ETS(A, N, N)
• Double Exponential Smoothing with Additive Errors, Additive Trends -
ETS(A, A, N)
• Triple Exponential Smoothing with Additive Errors, Additive Trends, Additive
Seasonality - ETS(A, A, A)
• Triple Exponential Smoothing with Additive Errors, Additive Trends,
Multiplicative Seasonality - ETS(A, A, M)
• Triple Exponential Smoothing with Additive Errors, Additive DAMPED
Trends, Additive Seasonality - ETS(A, Ad, A)
• Triple Exponential Smoothing with Additive Errors, Additive DAMPED
Trends, Multiplicative Seasonality - ETS(A, Ad, M).
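
The simplest member of this family, ETS(A, N, N), can be sketched as follows. This is an illustration only: it assumes the level is initialised at the first observation, whereas fitted models usually estimate both the initial level and alpha. The function name is hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return (one-step-ahead forecasts, final level) for a smoothing weight alpha in [0, 1]."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    level = series[0]  # assumed initial level: the first observation
    fitted = []
    for y in series:
        fitted.append(level)  # forecast made before observing y
        # New level is a weighted average of the observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return fitted, level  # the final level is the flat forecast for every future step

fitted, next_forecast = simple_exponential_smoothing([10, 20, 30], alpha=0.5)
```

Because there is no trend or seasonal term, every future forecast equals the final level, which is why this model shows up as a flat line on a forecast plot.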

a) SINGLE Exponential Smoothing with additive errors:-



b) DOUBLE Exponential Smoothing with additive errors:-



Rose - Alpha = 0 ; Beta = 0
Sparkling - Alpha = 0.665 ; Beta = 0.0001

Double Exponential Smoothing (DES) is a clear improvement over Single
Exponential Smoothing (SES) in this case because it can capture trends in the data.
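
The role of alpha and beta can be seen in a small sketch of double exponential smoothing with an additive trend (Holt's method). The initial level and trend here are simple assumptions (first observation and first difference), whereas fitted models normally estimate them. The function name and data are hypothetical.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with additive-trend double exponential smoothing."""
    level = series[0]               # assumed initial level
    trend = series[1] - series[0]   # assumed initial trend
    for y in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

data = [10, 12, 15, 13]
straight_line = holt_forecast(data, alpha=0.0, beta=0.0, horizon=2)
reactive = holt_forecast(data, alpha=1.0, beta=1.0, horizon=2)
```

With Alpha = 0 and Beta = 0, as reported for Rose, the level and trend never update, so the forecast is a fixed straight line determined entirely by the initial level and trend.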

c) Triple Exponential Smoothing with additive errors:-



JUST COMPARING:

• Rose - Alpha = 0.0849 ; Beta = 0.0 ; Gamma = 0.00054


• Sparkling - Alpha = 0.11127 ; Beta = 0.01236 ; Gamma = 0.46071

d) Triple Exponential Smoothing with Additive Errors, Additive Trends,
Multiplicative Seasonality - ETS(A, A, M):-

• Rose - Alpha = 0.07736, Beta = 0.03936, Gamma = 0.00083


• Sparkling - Alpha = 0.07736, Beta = 0.04943, Gamma = 0.36205

e) Triple Exponential Smoothing with Additive Errors, Additive DAMPED Trends,
Additive Seasonality - ETS(A, Ad, A):-

• Rose - Alpha = 0.07842, Beta = 0.01153, Gamma = 0.07738, Damping factor = 0.97503
• Sparkling - Alpha = 0.10062, Beta = 0.00018, Gamma = 0.51151, Damping factor = 0.97025

f) Triple Exponential Smoothing with Additive Errors, Additive DAMPED Trends,
Multiplicative Seasonality - ETS(A, Ad, M):-

Comparing all the Models:-

Best model:-

• Rose - Triple Exponential Smoothing (Multiplicative Season)
• Sparkling - Triple Exponential Smoothing (Additive Season)

D. Check for Stationarity:-

The hypothesis in a simple form for the ADF test is:

H0 : The Time Series has a unit root and is thus non-stationary.


H1 : The Time Series does not have a unit root and is thus stationary.

• At the 5% significance level, the test shows that the Rose series is non-stationary.

• To address this, we apply a single level of differencing and check whether the
series becomes stationary.

• The p-value is less than the significance level of 0.05, so we reject the null hypothesis.

• We conclude that after differencing, with a lag of 10, the Rose data becomes stationary.

• At the 5% significance level, the Sparkling series appears to be non-stationary.

• Let's apply one level of differencing and check whether the series becomes stationary.

• The p-value is less than the significance level of 0.05, so we reject the null hypothesis.

• As a result, after differencing, with a lag of 10, the Sparkling data becomes stationary.

• As per the industry standard, the confidence level is 95%, so alpha = 0.05.
• ADF test: if p-value < alpha, we reject the null hypothesis and conclude that the
time series is stationary.
• If p-value > alpha, we fail to reject the null hypothesis and conclude that the
time series is not stationary.
• If the time series is not stationary, we apply one level of differencing and check
for stationarity again.
• If the time series is still not stationary, we apply one more level of differencing
and check for stationarity again.
• Generally, the time series becomes stationary within at most 2 levels of differencing.
• Once the time series is stationary, we are ready to apply ARIMA / SARIMA models.
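
The decision procedure above can be sketched as a small loop. To keep the example self-contained, the stationarity test is passed in as a function that returns a p-value. In practice this could wrap an ADF implementation such as statsmodels' adfuller, whose second return value is the p-value, though the report does not show which tool it used. The function names and the stand-in test below are hypothetical.

```python
def difference(series, lag=1):
    """First difference: each value minus the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def difference_until_stationary(series, p_value_fn, alpha=0.05, max_d=2):
    """Return (d, differenced series) for the smallest d whose test p-value is below alpha."""
    current = list(series)
    for d in range(max_d + 1):
        if p_value_fn(current) < alpha:  # reject H0 (unit root): treat as stationary
            return d, current
        if d < max_d:
            current = difference(current)
    raise ValueError(f"series not stationary after {max_d} levels of differencing")

# Stand-in for a real ADF test, used only so the sketch runs on its own:
# it reports a small p-value when the series has a narrow range.
def toy_p_value(series):
    return 0.01 if max(series) - min(series) < 5 else 0.5

d, stationary = difference_until_stationary([1, 3, 5, 7, 9, 11, 13], toy_p_value)
```

The returned d is the differencing order that would then be used as the 'd' parameter in ARIMA / SARIMA.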

E. Model Building - Stationary Data:-


Generate ACF & PACF Plot and find the AR, MA values. - Build different ARIMA models
(Auto ARIMA - Manual ARIMA) - Build different SARIMA models (Auto SARIMA -
Manual SARIMA ) - Check the performance of the models built.

Auto-Correlation Function (ACF) is a statistical tool used to measure the correlation
between a time series and its past values. It examines how each point in a time
series relates to its previous points. The "auto" aspect of auto-correlation implies
that it measures the correlation between a specific time instance and its preceding
instances within the same time series. ACF is often visualized through a plot that
displays correlations up to a certain lag unit, providing insights into the relationship
between consecutive observations. ACF analysis helps determine the moving average
parameter 'q' in ARIMA or SARIMA models.

SARIMA (3, 1, 1) (3, 0, 0, 12) Diagnostic Plot - SPARKLING

Model                            Test RMSE Sparkling   Test MAPE Sparkling
ARIMA(2,1,2)                     1299.98               47.10
SARIMA (3, 1, 1) (3, 0, 0, 12)   601.24                25.87

Partial Auto-Correlation Function (PACF) measures the correlation between a time series
and its lagged values, excluding the intermediate instances. For instance, if the lag is
denoted as 'k,' PACF computes the correlation between the current value and the
value 'k' time units ago, disregarding the impact of observations between them. PACF
is represented through a plot that illustrates correlations among lag points. It assists
in determining the auto-regressive parameter 'p' in ARIMA or SARIMA models.
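
Both functions can be computed directly from the data. The sketch below assumes the standard sample autocorrelation and obtains the partial autocorrelations with the Durbin-Levinson recursion, which is one of several estimation methods. The function names and sample series are hypothetical.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit: phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # correlation at lag k with shorter lags accounted for
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        result.append(phi_kk)
    return result

sample = [2, 4, 3, 7, 6, 8, 5, 9, 10, 7]  # hypothetical values
r = acf(sample, 3)
p = pacf(sample, 3)
```

At lag 1 the two functions agree, since there are no intermediate observations to account for. They diverge from lag 2 onward.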

E.1 Generate ACF & PACF plot and find the AR, MA Values:-

ACF & PACF of Rose:-

Using a significance level of 0.05 and analyzing the characteristics of the PACF and ACF
plots, we select the Auto-Regressive (AR) parameter 'p' as 2 and the Moving-Average
(MA) parameter 'q' also as 2. This decision is guided by identifying significant lags in
both plots before they cut off. The significant lag in the PACF plot before it
terminates informs the choice of 'p', while the significant lag in the ACF plot before it
cuts off guides the selection of 'q'. These parameter values are crucial for
constructing ARIMA or SARIMA models, providing insights into the temporal
dependencies within the time series data.
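
Reading the cut-off lag from a plot can be mirrored in code. This sketch assumes the common approximate 95% band of +/- 1.96 / sqrt(n) around zero (plotting libraries may draw a slightly different band) and counts the leading lags that fall outside it. The function name and the correlation values are hypothetical, not read from the report's plots.

```python
from math import sqrt

def cutoff_order(correlations, n_obs, z=1.96):
    """Count consecutive significant lags, starting at lag 1, before the first insignificant one."""
    bound = z / sqrt(n_obs)  # approximate 95% band for a white-noise series
    order = 0
    for r in correlations[1:]:  # skip lag 0, which is always 1
        if abs(r) <= bound:
            break
        order += 1
    return order

# Hypothetical correlations for lags 0..4 from a series of 100 observations.
example_pacf = [1.0, 0.45, 0.30, 0.05, -0.02]
p_order = cutoff_order(example_pacf, n_obs=100)
```

Applied to the PACF values this suggests 'p', and applied to the ACF values it suggests 'q'. These are starting points to be checked against model fit, not final answers.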

ACF & PACF of Sparkling:-

For ARIMA:
Auto-Regressive (AR) parameter (p) = 0
Moving-Average (MA) parameter (q) = 0
Differencing parameter (d) = 1

For SARIMA:
Auto-Regressive (AR) parameter (p) = 0
Moving-Average (MA) parameter (q) = 0
Differencing parameter (d) = 1
Seasonal Auto-Regressive (SAR) parameters (P) = 0, 1, 2, 3
Seasonal Moving-Average (SMA) parameters (Q) = 1, 2, 3
Seasonal differencing parameter (D) = 0
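
The SARIMA ranges listed above define a small grid of candidate models. As a sketch, the combinations can be enumerated as below, assuming a seasonal period of 12 as used in the report's model orders. Fitting and scoring each candidate is left out, and the variable names are hypothetical.

```python
from itertools import product

# Parameter ranges as listed for Sparkling.
p_values, d_values, q_values = [0], [1], [0]
P_values, D_values, Q_values = [0, 1, 2, 3], [0], [1, 2, 3]
seasonal_period = 12

# Each candidate is (order, seasonal_order) in the usual (p, d, q), (P, D, Q, s) layout.
candidates = [
    ((p, d, q), (P, D, Q, seasonal_period))
    for p, d, q, P, D, Q in product(
        p_values, d_values, q_values, P_values, D_values, Q_values
    )
]

for order, seasonal_order in candidates:
    print(order, seasonal_order)
```

Each pair could then be passed to a SARIMA implementation (for example the order and seasonal_order arguments of statsmodels' SARIMAX) and the fits compared on a criterion such as AIC or test RMSE.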

E.2 Build different ARIMA models (Auto ARIMA , Manual ARIMA) + E.3
Build different SARIMA models (Auto SARIMA , Manual SARIMA ):-

ROSE (MANUAL+AUTO):

1. ARIMA Auto- Rose (2,1,3):-



2. ARIMA Manual- Rose (2,1,2):-



3. SARIMA Auto- Rose (3,1,1):-



4. SARIMA Manual- Rose (2,1,2)(2,1,2,12):-



5. SARIMA Manual- Rose (2,1,2)(3,1,2,12):-



SPARKLING (MANUAL+AUTO):

1. ARIMA Auto- Sparkling (2,1,2):-



2. ARIMA Manual- Sparkling (0,1,0):-



3. SARIMA Auto- Sparkling (3,1,1)(3,0,0,12):-



4. SARIMA Manual- Sparkling (0,1,0)(1,1,1,12):-



5. SARIMA Manual- Sparkling (0,1,0)(2,1,1,12):-



6. SARIMA Manual- Sparkling (0,1,0)(3,1,1,12):-



ACCORDING TO THE DATA OF ARIMA / SARIMA

BEST MODEL FOR ROSE:- SARIMA(3,1,1)(3,0,2,12)


BEST MODEL FOR SPARKLING:- SARIMA (3,1,1)(3,0,2,12)

F. Compare the performance of the models:-


Compare the performance of all the models built - Choose the best model with
proper rationale - Rebuild the best model using the entire data - Make a forecast for
the next 12 months

F.1 Compare the performance of all the models built:-
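
The comparison rests on test-set error metrics. As a minimal sketch, RMSE and MAPE can be computed as below, and the model with the lowest score selected. The function names are hypothetical, and the two scores in the example are the Sparkling test RMSE values quoted earlier in the report.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the sales data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def best_model(scores):
    """Return the model name with the lowest score from a {name: score} dict."""
    return min(scores, key=scores.get)

sparkling_rmse = {"ARIMA(2,1,2)": 1299.98, "SARIMA(3,1,1)(3,0,0,12)": 601.24}
winner = best_model(sparkling_rmse)
```

RMSE penalises large misses more heavily, while MAPE is scale-free, so it helps when comparing Rose and Sparkling, whose sales volumes differ considerably.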

F.2 Choose the best model with proper rationale:-

Best model:-

• Rose - Triple Exponential Smoothing (Multiplicative Season)
• Sparkling - Triple Exponential Smoothing (Additive Season)

ACCORDING TO THE DATA OF ARIMA / SARIMA:-

 ROSE:- SARIMA(3,1,1)(3,0,2,12)

 SPARKLING:- SARIMA (3,1,1)(3,0,2,12)

F.3 Rebuild the best model using the entire data:-

FOR ROSE:- Triple Exponential Smoothing (Multiplicative Season)

• Rose - Alpha = 0.07736, Beta = 0.03936, Gamma = 0.00083



ACCORDING TO THE DATA OF ARIMA / SARIMA:-

 ROSE:- SARIMA(3,1,1)(3,0,2,12)

FOR SPARKLING:- Triple Exponential Smoothing (Additive Season)

• Sparkling - Alpha = 0.11127 ; Beta = 0.01236 ; Gamma = 0.46071



ACCORDING TO THE DATA OF ARIMA / SARIMA:-

 SPARKLING:- SARIMA(3,1,1)(3,0,2,12)

F.4 Make a forecast for the next 12 months:-

• Rose - Triple Exponential Smoothing (Multiplicative Season)

• Sparkling - Triple Exponential Smoothing (Additive Season)

G. Actionable Insights & Recommendations:-

Conclude with the key takeaways (actionable insights and recommendations) for the
business:-

ROSE WINE:-

• Long-term decline: Sales have been dropping since 1980, indicating a decrease
in popularity.
• Seasonal spike: Sales rise significantly during the holiday season (Oct-Dec),
peaking in December, likely due to holiday celebrations.
• Post-holiday slump: Sales sharply decline in the first quarter (Jan-Mar), possibly
reflecting a post-holiday slowdown.
• Gradual recovery: Sales slowly pick up again by May-June.

Rose Wine Sales - Action Plan:

Capitalize on the Holiday Season:


 Increase Inventory: Stock up on rose wine in anticipation of the rising sales and
December peak (based on forecast).

Address Long-Term Decline:


 Data Analysis: Conduct further data analysis to understand the reasons behind
the long-term decline in sales.

Rebranding & Innovation:


 Consider Rebranding: Explore rebranding the existing rose wine with a fresh
image, potentially alongside a new winemaker.

Marketing & Promotions:


 Pre-Holiday Push (Aug-Oct): Launch targeted marketing campaigns and special
offers to attract new customers, particularly first-time wine drinkers and those
open to different brands.

Decision Point (Post-Holiday Season):


 Evaluate Sales Performance: Assess the overall sales trend after the December
peak.
 Positive Trend: Continue with the existing rose wine variant.

SPARKLING WINE:-

Flat Trend, Seasonal Fluctuations:


 Unlike rose wine, sparkling wine sales show no long-term upward or downward
trend, indicating a stable but stagnant market.

Holiday Boom & Post-Holiday Slump:


 Similar to rose, sparkling wine experiences a significant seasonal spike during
the holiday season (Oct-Dec), with sales in December reaching nearly triple the
volume of September. This surge likely aligns with holiday celebrations.
 Following the December peak, sales plummet in the first quarter (Jan-Mar),
mirroring the post-holiday slump observed in rose.

Recovery and Recommendations:


 Sales gradually recover by July-August, suggesting a potential opportunity to:

1. Target summer celebrations: Develop marketing campaigns promoting sparkling
wine for summer gatherings (e.g., picnics, barbecues).
2. Maintain steady inventory: While December brings a significant sales increase, a
consistent inventory level throughout the year might help capture potential
customers who enjoy sparkling wine outside the holiday season.

Sparkling Wine Sales - Action Plan:

Capitalize on the Holiday Season:


 Increase Inventory: Build stock in anticipation of rising sales and the December
peak (based on forecast).
 Targeted Advertising (Oct-Dec): Launch focused advertising campaigns during
the holiday season (Oct-Dec) to leverage the existing buying trend and
potentially boost sales further.

Product Innovation & Marketing:


 Celebration-Themed Design: Consider introducing a special, lower-priced bottle
design specifically for celebratory purposes (e.g., bottle designed for popping).
 Summer Marketing: Explore marketing campaigns promoting sparkling wine for
summer gatherings (e.g., picnics, barbecues) to capitalize on the sales recovery
period (Jul-Aug).

Investigate Flat Sales Trend:


 Deep Sales Dive (Jan-Mar): Utilize the first quarter (Jan-Mar) slowdown to
conduct a thorough analysis of year-over-year sales data to understand the
stagnant sales trend. This will help identify potential areas for improvement
outside the holiday season.
