Time Series Forecasting Project (Shoe Sales)

This document describes analyzing a time series dataset of shoe sales to build forecasting models. It includes splitting the data into training and test sets, exploring the data through decomposition and plots, building linear regression, naive, simple average and exponential smoothing models on the training data and evaluating them on the test set using RMSE.

Uploaded by

PARIJAT DEV

Time Series Forecasting

Extended Project
(Shoes Sales)
By - Parijat Dev

1. Executive Summary
2. Data details
Q1 - Read the data as an appropriate Time Series data and plot the data
Q2 - Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.
Data Description
Q3 - Split the data into training and test. The test data should start in 1991.
Q4 - Build various exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naïve forecast
models, simple average models etc. should also be built on the training data and check
the performance on the test data using RMSE.
Q5 - Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
Q6 - Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
Q7 - Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.
Q8 - Build a table with all the models built along with their corresponding parameters and
the respective RMSE values on the test data.
Q9 - Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
Q10 - Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.

1. Executive Summary
You are an analyst at the IJK shoe company and are expected to forecast the sales of pairs of
shoes for the 12 months following the point where the data ends. Monthly sales data (pairs of
shoes) have been provided from January 1980 to July 1995.

2. Data details

Figure 1- Shoe Sales Time Series Plot

The data set contains two columns: the first column gives the month and year, and the second
column gives the corresponding quantity recorded for that month.

Q1 - Read the data as an appropriate Time Series data and plot the data
I have imported the data series. As we can observe, each entry has a YearMonth value with it,
which is not really a data point but an index for the sales entry. So in reality the dataset has a
single column that contains the quantity of shoe sales in that particular month. While reading
the dataset, I passed arguments so that the first column, which is the date column, is parsed as
dates, and used squeeze to indicate that this is a one-column series. The dataset starts in
January 1980 and runs until July 1995, so there are 187 entries in total.
I have also loaded the dataset with no arguments (that is, without parsing the dates), in which
case a time stamp has to be provided manually. For this version I removed the YearMonth
variable and added a time stamp to the dataset myself. I have plotted the time series below.
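
The two ways of loading described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the report's own code: the in-memory CSV, the column name Shoe_Sales, the column name Time_Stamp and all values are made up for the example.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real file; the column name
# "Shoe_Sales" and all values are made up for illustration.
csv_text = "YearMonth,Shoe_Sales\n1980-01-01,85\n1980-02-01,89\n1980-03-01,109\n"

# Option 1: parse the date column while reading, use it as the index,
# and squeeze the one-column frame into a Series.
frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["YearMonth"], index_col="YearMonth")
series = frame.squeeze("columns")

# Option 2: read without date parsing, then attach a monthly time stamp by hand.
raw = pd.read_csv(io.StringIO(csv_text))
raw["Time_Stamp"] = pd.date_range(start="1980-01-01", periods=len(raw), freq="MS")
manual = raw.drop(columns="YearMonth").set_index("Time_Stamp")

# January 1980 through July 1995 at month-start frequency.
full_index = pd.date_range(start="1980-01-01", end="1995-07-01", freq="MS")
print(len(full_index))
```

The final line prints 187, which agrees with the number of entries stated above.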

Figure 2- Shoe Sales Time Series Plot

As we can observe from the above plot, shoe sales followed an upward direction. A seasonal
pattern is also visible in the graph. We will explore the trend and seasonality further during
decomposition, where we will be able to view these two factors in much more detail.

Q2- Perform appropriate Exploratory Data Analysis
to understand the data and also perform
decomposition.
Data Description

Figure 3 - Description of the Dataset

As we can see from the above, the shoe sales time series appears to be skewed. The standard
deviation is high, since the minimum and maximum differ significantly. The mean and the
median also differ, which is consistent with the skewness. As mentioned earlier, there are 187
records in total in the dataset.
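
As a small illustration of how these summary statistics relate to skewness, here is a sketch with pandas on a made-up sample (not the shoe sales data). The sample is deliberately right-skewed; the direction of skew in the actual data is not stated in the report.

```python
import pandas as pd

# Made-up right-skewed sample, not the shoe sales data.
sample = pd.Series([1, 2, 2, 3, 3, 3, 4, 20])

summary = sample.describe()                       # count, mean, std, min, quartiles, max
mean_minus_median = sample.mean() - sample.median()
skewness = sample.skew()                          # > 0 indicates a longer right tail

print(summary)
print(mean_minus_median, skewness)
```

A single large value pulls the mean above the median and inflates the standard deviation, which is the pattern the description table is being read for.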

Yearly Boxplot

Figure 4 - Yearly Boxplot of the Dataset

As we can see, the data shows both an upward and a downward trend: shoe sales grew
significantly until 1987, after which growth stalled and sales started to decline. The highest
shoe sales occurred in 1987. The year 1984 is the year with the lowest variation in sales.

Monthly Boxplot

Figure 4 - Monthly Boxplot of the Dataset

As we can observe, more sales occur in the second half of the year than in the first half.
The highest sales occur in December.

Monthly Sales Across Years

Figure 5 - Monthly sales line chart across Years

December and November are the months that bring in the highest sales across the years.
From the above chart, the seasonal pattern is clearly visible.

Decomposition

Figure 6 - Multiplicative Decomposition of the data

From the above decomposition chart it is clear that both trend and seasonality are present in
the dataset. The residuals are small in the multiplicative decomposition.
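
To make the idea of a multiplicative decomposition concrete, here is a minimal hand-rolled sketch in pandas. It is an assumption-laden illustration, not the report's method: the report does not name the routine it used, the series below is synthetic, and the classical centered moving-average approach shown here is only one way to estimate the components.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (not the shoe sales data): linear trend times a seasonal factor.
index = pd.date_range("1980-01-01", periods=72, freq="MS")
true_seasonal = np.tile(np.linspace(0.8, 1.2, 12), 6)
y = pd.Series((100 + 2 * np.arange(72)) * true_seasonal, index=index)

# Trend: centered 2x12 moving average (12-month mean, then 2-point mean, re-centered).
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal indices: average detrended value per calendar month, rescaled to mean 1.
detrended = y / trend
monthly = detrended.groupby(detrended.index.month).mean()
monthly = monthly / monthly.mean()
seasonal = pd.Series(monthly.reindex(index.month).to_numpy(), index=index)

# Residual: what is left after dividing out trend and seasonality (close to 1 if the fit is good).
residual = y / (trend * seasonal)
print(monthly.round(3))
```

In a multiplicative model the series is treated as trend x seasonal x residual, so residuals that stay close to 1 indicate that trend and seasonality explain most of the variation.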

Q3- Split the data into training and test. The test
data should start in 1991.
I have split the time series dataset into Train and Test datasets below. The question specifies
that the Test data should start in 1991.

Figure 7 - Test and Training Datasets

I have also confirmed that the Train dataset indeed ends in 1990, and the Test dataset indeed
starts in 1991, by using the head and tail functions on the Train and Test datasets. As we can
observe, the Train data frame has 132 observations and the Test data frame has 55
observations.
I have also plotted the Train and Test data frames below:

Figure 8 - Test and Training Datasets

We can observe the training and test data in the above plot: the blue part of the plot depicts the
Train dataset (January ’80 – December ’90), and the orange part of the plot depicts the Test
dataset (January ’91 – July ’95).
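
A split of this kind can be sketched as below. The values are placeholders on the same date range as the report; only the index matters for the illustration, and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder values on the report's date range (January 1980 to July 1995).
index = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Everything before 1991 is training data; 1991 onwards is test data.
train = series[series.index.year < 1991]
test = series[series.index.year >= 1991]

print(train.tail(1).index[0].date(), test.head(1).index[0].date())
print(len(train), len(test))
```

The lengths come out as 132 and 55, matching the sizes reported above.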

Q4- Build various exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naïve forecast
models, simple average models etc. should also be built on the training data and
check the performance on the test data using RMSE.
In this section I will run the various available models on the time series dataset. Let’s kick off
the analysis with the Linear Regression model.

4.1 Linear Regression

Figure 9 - Test and Training Data for linear regression

Following are the results from a Linear Regression model on the dataset:

Figure 10 - Test and Training Data for linear regression

For the regression-on-time forecast on the Test Data, RMSE = 266.28
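
A regression on time and its RMSE on a held-out window can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, NumPy's polyfit stands in for whichever regression routine the report used, and the helper name rmse is hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Synthetic stand-in for the series (not the shoe sales data).
rng = np.random.default_rng(0)
n_train, n_test = 132, 55
time = np.arange(1, n_train + n_test + 1)          # 1, 2, ..., 187
values = 50 + 1.5 * time + rng.normal(0, 10, size=time.size)

# Fit value ~ intercept + slope * time on the training window only.
slope, intercept = np.polyfit(time[:n_train], values[:n_train], deg=1)

# Extrapolate the fitted line over the test window and score it.
test_pred = intercept + slope * time[n_train:]
test_rmse = rmse(values[n_train:], test_pred)
print(round(slope, 3), round(test_rmse, 3))
```

Because the fitted line is simply extended forward, this model cannot follow a change in trend direction or any seasonal pattern in the test period.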

4.2 Naive Model

Figure 11 - Test and Training Data for Naive Model

For the Naive model forecast on the Test Data, RMSE = 245.1

4.3 Simple Average Model

Figure 12 - Test and Training Data for Simple Average

An extract of the Training data for the Simple Average Model can be seen below:

Figure 13 - Extract of Training data

For Simple Average Model, RMSE = 63.985
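
The naive and simple average forecasts from sections 4.2 and 4.3 can be sketched with a few made-up numbers. The values and the helper name rmse are assumptions for illustration only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Made-up numbers, not the shoe sales data.
train = np.array([10.0, 12.0, 14.0, 20.0])
test = np.array([18.0, 22.0])

# Naive model: every future value is forecast as the last training value.
naive_pred = np.full(test.shape, train[-1])

# Simple average model: every future value is forecast as the training mean.
average_pred = np.full(test.shape, train.mean())

naive_rmse = rmse(test, naive_pred)
average_rmse = rmse(test, average_pred)
print(naive_rmse, average_rmse)
```

Both forecasts are flat lines, so their RMSE depends entirely on how far the test values drift from the last training value or from the training mean.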

4.4 Moving Average Model

Figure 13 - Dataset for the moving Average

Results from Moving Average

Figure 15: Moving Average Model Outcome

For 2 point Moving Average Model forecast on the Testing Data, RMSE = 45.949
For 4 point Moving Average Model forecast on the Testing Data, RMSE = 57.873
For 6 point Moving Average Model forecast on the Testing Data, RMSE = 63.457
For 9 point Moving Average Model forecast on the Testing Data, RMSE = 67.724

I have applied 2, 4, 6 and 9-point trailing averages on the dataset.

As we can observe from the above plots, all of the trailing average plots show prediction values
below the actual train and test data, and the 9-point trailing average plot shows the lowest
prediction of all the plots. The closest prediction to the actual data is given by the 2-point
trailing moving average model. This observation is corroborated by the RMSE scores for each of
these moving average models.
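
A trailing moving average evaluation of this kind can be sketched as below. The sketch assumes, as the phrase "trailing averages on the dataset" suggests, that the rolling mean is computed over the whole series and then scored on the test window; the data are synthetic and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk series on the report's date range (not the shoe sales data).
index = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
rng = np.random.default_rng(1)
series = pd.Series(100 + rng.normal(0, 10, len(index)).cumsum(), index=index)

test_mask = series.index.year >= 1991
scores = {}
for window in (2, 4, 6, 9):
    # Mean of the current point and the previous (window - 1) points.
    trailing = series.rolling(window).mean()
    errors = series[test_mask] - trailing[test_mask]
    scores[window] = float(np.sqrt((errors ** 2).mean()))

print(scores)
```

Note that a rolling mean computed this way uses the actual test values inside each window, so its RMSE is not directly comparable with that of models that forecast from the training data alone.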

4.5 Simple Exponential Smoothing

Figure 15: Moving Average Model Outcome

Figure 16: SES Parameters

Following is the result from running an SES model on the dataset:

Figure 17: SES Model Outcome

For the Simple Exponential Smoothing model with alpha = 0.605, the forecast on the Test Data
has RMSE = 196.405
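
The mechanics of simple exponential smoothing can be sketched by hand as follows. This is a minimal illustration, not the library routine the report used: the function name ses_forecast is hypothetical, the data are made up, and initialising the level to the first observation is an assumption (libraries may initialise or optimise it differently). Only the value alpha = 0.605 is taken from the report.

```python
import numpy as np

def ses_forecast(train, alpha, horizon):
    """Flat forecast from simple exponential smoothing.

    The level is initialised to the first observation (an assumption) and
    updated as level = alpha * value + (1 - alpha) * level.
    """
    level = float(train[0])
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    # SES has no trend or seasonal term, so every future step gets the last level.
    return np.full(horizon, level)

# Made-up numbers, not the shoe sales data.
train = np.array([10.0, 12.0, 11.0, 15.0])
forecast = ses_forecast(train, alpha=0.605, horizon=3)
print(forecast)
```

Since the forecast is a flat line at the last smoothed level, SES is poorly suited to a series with visible trend and seasonality.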

4.6 Double Exponential Smoothing

Figure 18: DES Parameters

Following is the result from running the SES and DES models on the dataset:

Figure 19: Smoothing Models Outcome

For the Double Exponential Smoothing model forecast on the Test Data, RMSE = 288.547

4.7 Triple Exponential Smoothing

Figure 20: TES Parameters

Following is the result from running the SES, DES and TES models on the dataset:

Figure 21: Smoothing Models Outcome

For the Triple Exponential Smoothing model forecast on the Test Data, RMSE = 128.993

The summarized performance of the models run on the Shoe Sales dataset can be seen
below:

Figure 22: Performance metrics of Different models

Q5- Check for the stationarity of the data on which the model is
being built on using appropriate statistical tests and also mention
the hypothesis for the statistical test. If the data is found to be
non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.
5.1 Dickey Fuller test on the stationarity of the data
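DF test statistic is -1.717
DF test p-value is 0.4222

The null hypothesis of the Augmented Dickey-Fuller test is that the series has a unit root, i.e. it
is non-stationary; the alternative hypothesis is that the series is stationary. As the p-value is
greater than 0.05, we fail to reject the null hypothesis, so the data is not stationary. We can try
different methods to make the data stationary.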

5.2 Data Stationarity by Differencing

Figure 23: New Data after Differencing it by 1

ADF test on the new data after differencing
DF test statistic is -3.479
DF test p-value is 0.0085

5.3 Data Stationarity by Differencing and Log Transformation

Figure 24: New Data after Differencing it by 1 and Log Transformation

ADF test on the new data after differencing and log transformation

DF test statistic is -3.479
DF test p-value is 0.0086
As the p-value is below 0.05, the data has become stationary.

Q6 - Build an automated version of the ARIMA/SARIMA model in
which the parameters are selected using the lowest Akaike
Information Criteria (AIC) on the training data and evaluate this
model on the test data using RMSE.

6.1 ARIMA

Figure 24: Best AIC

Figure 25: ARIMA Results

Figure 26: ARIMA Model Performance

The Root Mean Squared Error of ARIMA forecasts is 175.196

6.2 SARIMA

Figure 27: Auto SARIMA Performance

Figure 28: Best AIC values for SARIMA

Diagnostic Plot

Figure 29: SARIMA Diagnostic Plot

The Root Mean Squared Error of SARIMA forecasts is 69.03

Q7 - Build ARIMA/SARIMA models based on the cut-off
points of ACF and PACF on the training data and evaluate
this model on the test data using RMSE.

Figure 30: ACF Parameters

Figure 31: PACF Parameters

Q8 - Build a table with all the models built along with their
corresponding parameters and the respective RMSE
values on the test data.

Figure 32: Model Performances

As we can see from the above table, the best-performing model is the 2-point
Trailing Average (RMSE = 45.949).

Q9- Based on the model-building exercise, build the
most optimum model(s) on the complete data and
predict 12 months into the future with appropriate
confidence intervals/bands.

Figure 33: Prediction of 12 months ahead

12-Month Predictions

Figure 34: Prediction of 12 months ahead

Q10 - Comment on the model thus built and report your
findings and suggest the measures that the company
should be taking for future sales.
The dataset contains 187 entries across 2 variables.
The data has outliers present.
There are more sales in the second half of the year than in the first half.
December records the highest sales.
Shoe sales have decreased drastically over a few years.
The company can achieve more sales than forecasted if it focuses on product innovation
and on applying marketing strategies.
The decrease in the sales figures over the years suggests that there has been a significant rise
in competition and that other brands are providing much better shoes than the company at
much better prices.
To remain competitive in the market, the company needs to implement multiple strategies.
