TSF - Problem Statement

Uploaded by

ashvvinachhu

Context & Objective

Context:
In today's dynamic business environment, precise sales and production forecasts are essential for strategic
planning and operational efficiency. Companies like IJK Shoe Company and RST Firm have accumulated
extensive monthly data on shoe sales and soft drink production, respectively, spanning from January 1980
to July 1995. Leveraging advanced time series forecasting techniques, these companies aim to utilize their
historical data to predict future trends accurately. This initiative enables them to make informed decisions,
optimize resource allocation, and adapt proactively to market dynamics.

Objective:
The primary objective is to predict future sales for IJK Shoe Company and production volumes for RST
Firm over the next one year. By analyzing the historical monthly data spanning from January 1980 to July
1995, our goal is to develop accurate forecasting models that capture the underlying patterns and
seasonality inherent in the sales and production processes. Through this task, we aim to empower IJK Shoe
Company and RST Firm with actionable insights that facilitate proactive planning, optimize resource
allocation, and enhance operational efficiency. By anticipating future trends in sales and production, both
companies can align their strategies, streamline production-related activities, and capitalize on emerging
opportunities in their respective markets.

Data Overview

Shoe Sales
Head:
     YearMonth  Shoe_Sales
0    1980-01    85
1    1980-02    89
2    1980-03    109
3    1980-04    95
4    1980-05    91

Tail:
     YearMonth  Shoe_Sales
182  1995-03    188
183  1995-04    195
184  1995-05    189
185  1995-06    220
186  1995-07    274

Plot the data:

The plot shows sales rising from July through October and then declining.

Soft Drinks

Head:
     YearMonth  SoftDrinkProduction
0    1980-01    1954
1    1980-02    2302
2    1980-03    3054
3    1980-04    2414
4    1980-05    2226

Tail:
     YearMonth  SoftDrinkProduction
182  1995-03    4067
183  1995-04    4022
184  1995-05    3937
185  1995-06    4365
186  1995-07    4290

Plot the data

Production varies from month to month, with November and December standing out, which is
likely because these months fall in the holiday season.

Exploratory Data Analysis


Shoe Sales:

Statistics summary:
       Shoe_Sales   year          month
count  187.000000   187.000000    187.000000
mean   245.636364   1987.299465   6.406417
std    121.390804   4.514749      3.450972
min    85.000000    1980.000000   1.000000
25%    143.500000   1983.000000   3.000000
50%    220.000000   1987.000000   6.000000
75%    315.500000   1991.000000   9.000000
max    662.000000   1995.000000   12.000000

The minimum sales value is 85, recorded in 1980.
The maximum sales value is 662. The 1995 in the same row is the maximum of the year column, not necessarily the year in which sales peaked.
The first quartile is 143.50, the median is 220 and the third quartile is 315.50.
The standard deviation is 121.39.

We can notice outliers in the months of April and May.
Sales are highest in September, November and December, which is likely owing to the onset
of winter and the holiday season.
The median sales differ from month to month.
There is an upward trend from July to December.
Sales are predominantly lower in March, May and June, which may be seasonal: people may
prefer lighter footwear or more casual wear in those months.

Outliers are noted in the years 1980, 1984, 1988 through 1991, and 1993 and 1994.
Sales saw a huge volume in 1987.
There were fewer sales during 1984.
There has been an upward trend year on year, which may be attributed to the branding and
advertising of the product.
Month-wise, sales trend upward after July, likely due to the onset of winter and the
holiday season.

Decomposition:
It is evident that sales increase after July, with an upward movement during the last
quarter of the year, although the month-to-month variation is otherwise small.
Sales clearly improved between 1985 and 1991.

Soft Drinks:
Bivariate Analysis

       Softdrinks    year          month
count  187.000000    187.000000    187.000000
mean   3262.609626   1987.299465   6.406417
std    728.357367    4.514749      3.450972
min    1954.000000   1980.000000   1.000000
25%    2748.000000   1983.000000   3.000000
50%    3134.000000   1987.000000   6.000000
75%    3741.000000   1991.000000   9.000000
max    5725.000000   1995.000000   12.000000

The standard deviation of production is 728.36 (the 4.51 in the year column only reflects the spread of the calendar years).
The first quartile is 2748 and the third quartile is 3741.
The average production is 3262.61.

We are able to see outliers in the months of April, May, June, August and November.
Production is highest in December, which may be due to the holiday season.
February and July also see considerable production compared to other months.
The median production also varies from month to month.

Outliers are visible from 1989 through 1995.
The maximum production was seen in 1988.
The median also varies each year, ranging from about 2400 to 4200.
The years 1980, 1983 and 1985 also saw good production compared to other years.
Production trends upward after July.
Year on year there has been an increase in production.
The seasonal pattern, however, remains mostly the same.

Decomposition:
The plots above show the distribution of production over the years and months.
Production rises through the year, most steeply in the months towards the end of the year.

The trend component shows that production has increased year on year.

Data Pre-processing
Shoe Sales
Train Test Split

The data was split chronologically into train and test sets.

The split is described as following an 80/20 rule; with the cut at the start of 1991, the proportions are closer to 71% train (132 months) and 29% test (55 months).

The train set covers the months before 1991 and the test set covers 1991 onward.

The train set shows an upward trend.

The test set also shows an upward trend year on year.

Soft Drinks
Train Test Split
The data was split into train and test sets using the same nominal 80/20 rule.

The train set covers the months before 1991 and the test set covers 1991 onward.

In the training set, the maximum production occurred in 1989.

In the test set, the highest contribution came from 1994 and 1995.
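
A minimal sketch of the chronological split, assuming the cut falls at January 1991 (the text only says "prior to" and "post" 1991). The helper names are illustrative and the values are placeholders, not the actual series.

```python
def monthly_index(start_year, start_month, n):
    """Return n consecutive (year, month) pairs starting at the given month."""
    out = []
    year, month = start_year, start_month
    for _ in range(n):
        out.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


def split_by_year(index, values, first_test_year):
    """Chronological split: rows before first_test_year go to train, the rest to test."""
    pairs = list(zip(index, values))
    train = [(k, v) for k, v in pairs if k[0] < first_test_year]
    test = [(k, v) for k, v in pairs if k[0] >= first_test_year]
    return train, test


index = monthly_index(1980, 1, 187)   # Jan 1980 through Jul 1995
values = list(range(187))             # placeholder values, not the real data
train, test = split_by_year(index, values, first_test_year=1991)

print(len(train), len(test))                    # 132 55
print(round(len(train) / len(index), 3))        # 0.706
```

Under this assumption the split is roughly 71/29 rather than 80/20, which matters when comparing RMSE values against results obtained with a different cut.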

Model Building - Original Data:


Shoes Sales
Linear Regression
With the Linear Regression model, the forecast shows an upward trend on the test dataset, which is also the trend seen in the data.
The test RMSE is 263.79, one of the highest among the models compared, so a linear trend alone does not perform well on the test dataset.

Moving Average:
With the averaging models, there is not much variance against the test data.
The average sales remain very similar and follow a straight line.
This does not hurt performance on the test dataset.
RMSE is 63.98 for the Simple Average model.
RMSE is 45.95 for the 2-point Moving Average model.

Simple Exponential Model:
We built a Simple Exponential Smoothing model on the train dataset and evaluated it on the test dataset.
This model exhibits a simple trend and predicts upward sales on the test dataset.
The RMSE for this model is 196.40.
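
The RMSE metric and the two averaging baselines can be sketched in plain Python as follows. The function names are illustrative, and how the source applied the moving average to the test period is not stated, so the trailing window here is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def simple_average_forecast(train, horizon):
    """Forecast every future step with the mean of the training data."""
    mean = sum(train) / len(train)
    return [mean] * horizon


def trailing_moving_average(series, window):
    """Mean of the last `window` values ending at each point (None until the window fills)."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


head = [85, 89, 109, 95, 91]                 # first five Shoe_Sales values from the head
print(trailing_moving_average(head, 2))      # [None, 87.0, 99.0, 102.0, 93.0]
print(simple_average_forecast(head, 2))      # the training mean, repeated
```

The simple average is a flat line by construction, which is why it "follows a straight line" in the plots; a short moving average stays close to the most recent observations.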

Double Exponential Model

Model                                              RMSE
Linear Regression Model                            263.790974
Simple Average Model                               63.984570
Moving Average (2) Model                           45.948736
Single Exp. Smoothing Model: Level 0.61            196.404837
Double Exp Smoothing Model: Level 0.59, Trend 0.0  288.576717
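
The single and double exponential smoothing recursions can be sketched as below. This is a textbook form with a simple initialisation (first value as level, first difference as trend), which is an assumption; a library fit may initialise and optimise differently.

```python
def simple_exp_smoothing(series, alpha, horizon):
    """Single exponential smoothing: the forecast is flat at the last smoothed level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def holt_linear(series, alpha, beta, horizon):
    """Double exponential smoothing (Holt): level plus a smoothed linear trend."""
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]


print(simple_exp_smoothing([10, 20], 0.5, 2))       # [15.0, 15.0]
print(holt_linear([1, 2, 3, 4, 5], 0.5, 0.5, 3))    # continues the line: 6, 7, 8
```

In this recursion a trend parameter of 0.0, as reported in the table, means the trend is never updated and stays at its initial estimate, which can hurt the forecast if that estimate does not match the test period.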

Triple Exponential Model:

Model                                                                 RMSE
Moving Average (2) Model                                              45.948736
Simple Average Model                                                  63.984570
Triple Exp Smoothing Model: Level 0.57, Trend 0.0, Seasonality 0.29   128.992526
Single Exp. Smoothing Model: Level 0.61                               196.404837
Linear Regression Model                                               263.790974
Double Exp Smoothing Model: Level 0.59, Trend 0.0                     288.576717

Among the exponential smoothing models, the Triple Exponential model has the lowest RMSE
(128.99) and predicts the seasonality better.

Tuned Exponential Model:

     Alpha Values  Beta Values  Gamma Values  RMSE
162  0.11          0.61         0.21          43.951774
163  0.11          0.61         0.31          44.081443
183  0.11          0.81         0.31          47.060942
193  0.11          0.91         0.31          47.231154
164  0.11          0.61         0.41          47.879588

We tuned the model by searching over combinations of the smoothing parameters alpha, beta and gamma.

The tuned model works quite well on the training dataset and produces a forecast that varies with the data.
With the tuned parameters applied to the test data, triple exponential smoothing performs quite well.
Model                                                                 RMSE         Alpha Values  Beta Values  Gamma Values
162 (tuned Triple Exp Smoothing)                                      43.951774    0.11          0.61         0.21
Moving Average (2) Model                                              45.948736    NaN           NaN          NaN
Simple Average Model                                                  63.984570    NaN           NaN          NaN
Triple Exp Smoothing Model: Level 0.57, Trend 0.0, Seasonality 0.29   128.992526   NaN           NaN          NaN
Single Exp. Smoothing Model: Level 0.61                               196.404837   NaN           NaN          NaN
Linear Regression Model                                               263.790974   NaN           NaN          NaN
Double Exp Smoothing Model: Level 0.59, Trend 0.0                     288.576717   NaN           NaN          NaN

The untuned Triple Exponential Smoothing model remains at an RMSE of 128.99, while the tuned version reaches 43.95, the lowest RMSE among the models built so far, so the tuned model is the more accurate one for predicting future trends.
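
The tuning step can be sketched as a grid search over alpha, beta and gamma. The row indices in the tuning table (for example 162 for 0.11, 0.61, 0.21) are consistent with a 10 x 10 x 10 grid from 0.01 to 0.91 in steps of 0.1, which is what the sketch uses. The additive Holt-Winters recursion, its initialisation and the synthetic series are assumptions for illustration; the source does not say whether an additive or multiplicative form was fitted.

```python
import itertools
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters with season length m; needs at least two full seasons."""
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    trend = (second - first) / m
    # Seasonal offsets: first-season values minus a straight line through the season.
    season = [series[i] - (first + (i - (m - 1) / 2) * trend) for i in range(m)]
    level = first + (m - 1) / 2 * trend          # level at the end of the first season
    for t in range(m, len(series)):
        s = season[t % m]
        prev_level = level
        level = alpha * (series[t] - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (series[t] - level) + (1 - gamma) * s
    n = len(series)
    return [level + (h + 1) * trend + season[(n + h) % m] for h in range(horizon)]


def grid_search(train, test, m, grid):
    """Score every (alpha, beta, gamma) combination on the test set, best first."""
    rows = []
    for a, b, g in itertools.product(grid, repeat=3):
        pred = holt_winters_additive(train, m, a, b, g, len(test))
        rows.append((rmse(test, pred), a, b, g))
    return sorted(rows)


grid = [round(0.01 + 0.1 * i, 2) for i in range(10)]    # 0.01, 0.11, ..., 0.91
# Synthetic monthly series (trend + seasonality + a deterministic wiggle), not the real data.
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) + (t * 7) % 5 for t in range(96)]
train, test = series[:72], series[72:]
for score, a, b, g in grid_search(train, test, 12, grid)[:5]:
    print(f"alpha={a} beta={b} gamma={g} RMSE={score:.3f}")
```

Selecting parameters by test-set RMSE, as here, tends to make the reported RMSE optimistic; a separate validation window would give a fairer estimate.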

Soft Drinks
Linear Regression
With the Linear Regression model, an upward trend year on year is captured, with a test RMSE of
775.76.
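
A linear regression forecast of this kind regresses the series on a time index. A minimal sketch in plain Python, assuming the regressor is simply t = 1, 2, ..., n (the source does not show its feature construction):

```python
def fit_time_trend(y):
    """Ordinary least squares of y on t = 1..n; returns (intercept, slope)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def forecast_trend(intercept, slope, n_train, horizon):
    """Extend the fitted line to time steps n_train + 1 .. n_train + horizon."""
    return [intercept + slope * (n_train + h) for h in range(1, horizon + 1)]


train = [1954, 2302, 3054, 2414, 2226]    # first five production values from the head
intercept, slope = fit_time_trend(train)
print(forecast_trend(intercept, slope, len(train), 3))
```

Because the fitted line has no seasonal term, its errors on a strongly seasonal series stay large even when the overall direction is right.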

Moving Average:
With the averaging models, there is not much variance against the test data.

The average production remains very similar and follows a straight line.

RMSE is 556.73 for the 2-point Moving Average model.

Simple Exponential Model:

We built a Simple Exponential Smoothing model on the train dataset and evaluated it on the test dataset.

This model exhibits a simple trend but predicts a downward trend on the test dataset.

The RMSE for this model is 819.40.

Double Exponential Model:

We then built a Double Exponential Smoothing model on the train dataset and evaluated it on the test dataset.

Its forecast has an upward trend and visually follows the direction of the data more closely than the simple model.

However, its test RMSE is 1074.33, the highest of the models compared, so it does not improve accuracy.

Triple Exponential Model

Model                                                                 RMSE
Simple Average Model                                                  63.984570
Triple Exp Smoothing Model: Level 0.15, Trend 0.04, Seasonality 0.26  458.965428
Moving Average (2) Model                                              556.725418
Linear Regression Model                                               775.757118
Single Exp. Smoothing Model: Level 0.16                               819.401213
Double Exp Smoothing Model: Level 0.12, Trend 0.11                    1074.329653

Tuned Exponential Model:

     Alpha Values  Beta Values  Gamma Values  RMSE
204  0.21          0.01         0.41          401.709838
205  0.21          0.01         0.51          404.012373
203  0.21          0.01         0.31          407.935870
303  0.31          0.01         0.31          409.870512
206  0.21          0.01         0.61          412.604312

We tuned the model by searching over combinations of the smoothing parameters alpha, beta and gamma.

The tuned model works quite well on the training dataset and produces a forecast that varies with the data.

Model                                                                 RMSE         Alpha Values  Beta Values  Gamma Values
Simple Average Model                                                  63.984570    NaN           NaN          NaN
204 (tuned Triple Exp Smoothing)                                      401.709838   0.21          0.01         0.41
Triple Exp Smoothing Model: Level 0.15, Trend 0.04, Seasonality 0.26  458.965428   NaN           NaN          NaN
Moving Average (2) Model                                              556.725418   NaN           NaN          NaN
Linear Regression Model                                               775.757118   NaN           NaN          NaN
Single Exp. Smoothing Model: Level 0.16                               819.401213   NaN           NaN          NaN
Double Exp Smoothing Model: Level 0.12, Trend 0.11                    1074.329653  NaN           NaN          NaN

Stationarity Check:
Shoe Sales:

Results of Dickey-Fuller Test (original series):
Test Statistic                   -1.717397
p-value                           0.422172
#Lags Used                       13.000000
Number of Observations Used     173.000000
Critical Value (1%)              -3.468726
Critical Value (5%)              -2.878396
Critical Value (10%)             -2.575756
dtype: float64

Results of Dickey-Fuller Test (after differencing):
Test Statistic                   -3.144211
p-value                           0.023450
#Lags Used                       13.000000
Number of Observations Used     117.000000
Critical Value (1%)              -3.487517
Critical Value (5%)              -2.886578
Critical Value (10%)             -2.580124
dtype: float64

DF test statistic is -1.361
DF test p-value is 0.6008
DF test statistic is -3.144
DF test p-value is 0.0234
After differencing, the time series becomes stationary at the 5% significance level, as the p-value of 0.02 is less than alpha = 0.05.
The Augmented Dickey-Fuller (ADF) test is a unit root test that determines whether there is a unit root and, consequently, whether the series is non-stationary.
The hypotheses for the ADF test, in simple form, are:
H0: The time series has a unit root and is thus non-stationary.
H1: The time series does not have a unit root and is thus stationary.
We want the series to be stationary for building ARIMA models, so we want the p-value of this test to be less than the α value.

Soft Drinks
Results of Dickey-Fuller Test (original series):
Test Statistic                   -1.717397
p-value                           0.422172
#Lags Used                       13.000000
Number of Observations Used     173.000000
Critical Value (1%)              -3.468726
Critical Value (5%)              -2.878396
Critical Value (10%)             -2.575756
dtype: float64

At the 5% significance level the time series is non-stationary, as the p-value of 0.42 is greater than alpha = 0.05.
We use the differencing approach to make the series stationary.

Results of Dickey-Fuller Test (after differencing):
Test Statistic                   -3.144211
p-value                           0.023450
#Lags Used                       13.000000
Number of Observations Used     117.000000
Critical Value (1%)              -3.487517
Critical Value (5%)              -2.886578
Critical Value (10%)             -2.580124
dtype: float64

DF test statistic is -1.361
DF test p-value is 0.6008
DF test statistic is -3.144
DF test p-value is 0.0234
The test checks whether a unit root is present in the autoregressive time series model.
A p-value of 0.95 is greater than 0.05, so we fail to reject the null hypothesis and
conclude that a unit root is present, i.e. the series is non-stationary. After differencing,
the series follows the same trend as its rolling mean, which is consistent with stationarity.
The value of 8.87 reported for the differenced series cannot be a p-value as written, since
p-values lie between 0 and 1; the conclusion drawn from it is that the alternative hypothesis
holds and that the moving average has a significant effect on the performance of the model.

Check the ACF and PACF Plots


Shoes Sales
Soft Drinks:
Model Building Stationary Data
Shoe Sales
Auto ARIMA
Some parameter combinations for the Model...


Model: (0, 0, 1)
Model: (0, 0, 2)
Model: (1, 0, 0)
Model: (1, 0, 1)
Model: (1, 0, 2)
Model: (2, 0, 0)
Model: (2, 0, 1)
Model: (2, 0, 2)
ARIMA(0, 0, 0) - AIC:1510.0865416806232
ARIMA(0, 0, 1) - AIC:1498.6078723523256
ARIMA(0, 0, 2) - AIC:1496.135095936209
ARIMA(1, 0, 0) - AIC:1503.3488752554235
ARIMA(1, 0, 1) - AIC:1493.2550628348156
ARIMA(1, 0, 2) - AIC:1495.1540870299634
ARIMA(2, 0, 0) - AIC:1500.5666398730273
ARIMA(2, 0, 1) - AIC:1495.1680883328036
ARIMA(2, 0, 2) - AIC:1495.0868899629804
param AIC
4 (1, 0, 1) 1493.255063
8 (2, 0, 2) 1495.086890
5 (1, 0, 2) 1495.154087
7 (2, 0, 1) 1495.168088
2 (0, 0, 2) 1496.135096
1 (0, 0, 1) 1498.607872
6 (2, 0, 0) 1500.566640
3 (1, 0, 0) 1503.348875
0 (0, 0, 0) 1510.086542

Manual ARIMA:

Auto SARIMA

Manual SARIMA
Soft Drinks:
Auto ARIMA
Some parameter combinations for the Model...
Model: (0, 0, 1)
Model: (0, 0, 2)
Model: (1, 0, 0)
Model: (1, 0, 1)
Model: (1, 0, 2)
Model: (2, 0, 0)
Model: (2, 0, 1)
Model: (2, 0, 2)
ARIMA(0, 0, 0) - AIC:758.6368500789519
ARIMA(0, 0, 1) - AIC:759.8713034294582
ARIMA(0, 0, 2) - AIC:761.6791565339296
ARIMA(1, 0, 0) - AIC:759.9391043347649
ARIMA(1, 0, 1) - AIC:761.4777625813728
ARIMA(1, 0, 2) - AIC:763.4596380764991
ARIMA(2, 0, 0) - AIC:761.8332949007508
ARIMA(2, 0, 1) - AIC:763.4512219957444
ARIMA(2, 0, 2) - AIC:763.9034264159081
Comparison of Model Performance
Shoe Sales:
Model                                                                 RMSE         Alpha Values  Beta Values  Gamma Values
ARIMA Model (1, 1, 1)                                                 27.004010    NaN           NaN          NaN
SARIMA Model (1, 1, 1, 12)                                            30.763507    NaN           NaN          NaN
162 (tuned Triple Exp Smoothing)                                      43.951774    0.11          0.61         0.21
Moving Average (2) Model                                              45.948736    NaN           NaN          NaN
Simple Average Model                                                  63.984570    NaN           NaN          NaN
Triple Exp Smoothing Model: Level 0.57, Trend 0.0, Seasonality 0.29   128.992526   NaN           NaN          NaN
Single Exp. Smoothing Model: Level 0.61                               196.404837   NaN           NaN          NaN
Linear Regression Model                                               263.790974   NaN           NaN          NaN
Double Exp Smoothing Model: Level 0.59, Trend 0.0                     288.576717   NaN           NaN          NaN

The ARIMA, SARIMA and tuned Triple Exponential Smoothing models have the lowest RMSE
(27.00, 30.76 and 43.95) and hence are the most accurate forecasting models, in that order.
The 2-point Moving Average model follows closely at 45.95.

Soft Drinks:

Model                                                                 RMSE         Alpha Values  Beta Values  Gamma Values
ARIMA Model (1, 1, 1)                                                 27.004010    NaN           NaN          NaN
SARIMA Model (1, 1, 1, 12)                                            30.763507    NaN           NaN          NaN
Simple Average Model                                                  63.984570    NaN           NaN          NaN
204 (tuned Triple Exp Smoothing)                                      401.709838   0.21          0.01         0.41
Triple Exp Smoothing Model: Level 0.15, Trend 0.04, Seasonality 0.26  458.965428   NaN           NaN          NaN
Moving Average (2) Model                                              556.725418   NaN           NaN          NaN
Linear Regression Model                                               775.757118   NaN           NaN          NaN
Single Exp. Smoothing Model: Level 0.16                               819.401213   NaN           NaN          NaN
Double Exp Smoothing Model: Level 0.12, Trend 0.11                    1074.329653  NaN           NaN          NaN

In this table, the ARIMA, SARIMA and Simple Average forecasting models have the lowest RMSE
and hence are the most accurate forecasting models, in that order.

Soft drink production is poised to increase substantially in the next 12 months.
There is a seasonality component in production that amplifies output in each quarter.
Within a quarter, production peaks in the middle.
Production is high at the end of the year because of increased demand during festive
seasons such as Halloween, Christmas and New Year.
It is recommended to plan branding the soft drink with Santa and New Year themes.

Shoe sales are poised to experience a significant upward trend in the coming year.
Historical data reveals consistent growth in sales since 1980, with a temporary decline observed in the late 1980s and early 1990s.
However, sales figures are projected to reach unprecedented heights, nearly doubling previous records.
Notably, seasonal patterns greatly influence shoe sales, with a surge in demand towards the end of the year, likely due to winter.
To leverage this anticipated growth, it is advisable to launch targeted marketing campaigns and offer attractive discounts at the beginning of each year.
By doing so, businesses can capitalize on the expected rise in sales and maximize their market impact.

Over the next 12 months, there is a strong expectation of a significant increase in soft drink production.
This growth is attributed to a seasonality component that amplifies production levels in each quarter, with a peak occurring in the middle of each quarter.
Notably, the end of the year experiences a surge in production due to heightened demand during festive seasons such as Halloween, Christmas and New Year.
Considering this trend, it is advisable to strategically plan branding initiatives that align soft drinks with Santa and New Year themes.
By doing so, companies can effectively capitalize on the increased demand and reinforce their products' association with these popular festivities.
