Analysis of Machine Learning Model For Predicting Sales Forecasting
Puneet Kumar Yadav, School of Engineering and Design, Alliance University, Bangalore, [email protected]
Vipin Kumar, UIE-CSE, Chandigarh University, [email protected]
Ravi Bhushan, UIE-CSE, Chandigarh University, [email protected]

Abstract-- This study investigates the potential of machine learning methods for predicting revenue for a retail company. The opening section of the paper covers the significance of precise sales projections and the difficulties companies encounter in attaining them. The machine learning techniques used in the research are then described, including neural networks (NN), decision trees, Random Forest (RF), and linear regression. The algorithms are trained and tested using past sales data from a retail company. The outcomes demonstrate that the RF algorithm performed better than the other models in terms of precision and accuracy. The research also identifies factors that have a large effect on purchases, such as timing, marketing campaigns, and economic indicators. The paper's conclusion highlights the benefits of machine learning for sales predictions, including improved precision, speed, and scale. The study's findings offer practical guidance for businesses seeking to enhance their capacity for sales planning and streamline their operations.

Keywords-- Machine Learning, Time series analysis, Regression models, Feature engineering, Sales forecasting, Data visualization, Statistical models, Customer segmentation, Predictive analytics.

I. INTRODUCTION

Strategic business models must include an analysis of potential future sales. Such analysis enables businesses to decide with confidence on issues like personnel management, inventory control, and revenue projections. Accurate sales forecasting is essential to preserving a competitive advantage in today's rapidly evolving and fiercely competitive commercial market. Time series and regression analysis are two well-known traditional statistical methods for predicting future sales. Their usefulness is limited, however, when working with complex datasets and nonlinear interactions between variables. To overcome these restrictions, machine learning algorithms are increasingly employed as a tool for anticipating sales. Machine learning algorithms can examine large datasets to find links and patterns between variables that might not be immediately apparent. Using past sales data, promotional data, pricing information, and economic indicators, they can identify the important factors influencing sales success and forecast future sales.

In order to assess the efficacy of ML models for predictive sales forecasting, this study evaluates their performance against more established statistical techniques. The discussion begins with a survey of the conventional statistical techniques, such as regression analysis and time-series analysis, that have historically been used for sales forecasting. We go over the limits of these approaches and the difficulties they have dealing with complex, non-linear interdependencies between variables. After presenting machine learning concepts, the paper discusses their advantages with regard to sales forecasting [1]. Machine learning algorithms are adept at handling large datasets, spotting complex patterns and correlations, and adapting to changing business environments. The study then describes the dataset and the pre-processing techniques used to make sure the data is ready for analysis. The subsequent section presents the outcomes of our research, covering the performance of various ML techniques in forecasting sales and the significance of the key factors that affect sales performance. Finally, the study examines the implications of our results for business strategy and underscores the prospective advantages of employing machine learning algorithms to anticipate sales patterns.

The primary objective of this paper is to offer a comprehensive analysis of the application of machine learning algorithms to predictive sales forecasting, and to compare their efficacy against conventional statistical techniques. By analyzing the advantages and disadvantages of each approach, we aim to offer recommendations to enterprises seeking the optimal forecasting technique for their specific requirements.

II. LITERATURE REVIEW

The literature outlines a wide range of applications for automated demand forecasting techniques, including stock market, retail, and energy consumption demand prediction [2]. Conventional methodologies for demand prediction rely on time series techniques. These include the naive approach, the average technique, exponential smoothing, Holt's linear trend technique, damped trend methods, exponential trend methods, moving averages, and the Holt-Winters seasonal technique. The ARIMA (Autoregressive Integrated Moving Average) and ARMA (Autoregressive Moving Average) models [5] are further examples. Different types of exponential smoothing can be used, depending on whether seasonal components are added and on whether the trend is additive, multiplicative, or damped. [3] adds additive and multiplicative damped trend methods to the kinds of exponential smoothing techniques. The most popular techniques used to determine which model best fits the past values of a time series are
ARMA and ARIMA [4]. The goal of intermittent demand forecasting techniques is to identify patterns of sporadic demand, characterized by zero or fluctuating demand at various times. Intermittent demand patterns are common in industries like apparel retail, manufacturing, and car components. The many variants of these patterns make irregular demand difficult to model. [5] suggests one of the key approaches to predicting irregular demand. This approach uses a decomposition method that applies separate exponential smoothing to the demand size and to the interval between demand occurrences. Additional research was carried out by [6] and [7] to resolve some of its constraints. Some applications combine numerous time series that can be organized hierarchically and used at various levels, in both bottom-up and top-down divisions, based on regions, merchandise categories, or other characteristics. [8] suggests a hierarchical forecasting paradigm that improves on predictions made using either a top-down or bottom-up methodology. Automatic model selection becomes extremely important in the supply chain setting due to the abundance of time series techniques [9]. For all time series, aggregate selection is used as a single source of predictions. Additionally, all additive and multiplicative trend and seasonal effects ought to be considered. The primary factors influencing forecasting accuracy were examined by [10] using regression analysis.

By treating the traffic of a whole city as images, a series of studies used CNNs to capture spatial correlation. For instance, [11] used a CNN on traffic speed images for the problem of speed prediction. [12] suggested using a residual CNN on traffic flow images. These techniques simply feed all regions to the CNN and use the entire city for prediction. We note that using irrelevant regions (such as remote areas) for the target region's forecast may actually degrade performance. Additionally, although these techniques use past timestamps from traffic images for prediction, they do not explicitly model the temporal sequence dependency. Sequential dependency is modelled using LSTM in a different line of research. To capture the sequence dependency when forecasting traffic under extreme conditions, especially peak-hour and post-accident situations, a Long Short-Term Memory (LSTM) network combined with an autoencoder has been proposed. Such models do not, however, take the spatial relationship into account. The three gates of the LSTM cell have a distinctive layout, as seen in Figure 1. At each step, the gates play a critical part in selectively letting information through.

Fig. 1 Cell Model of LSTM [13]

III. METHODOLOGY

In this study, we utilized machine learning algorithms for predictive sales forecasting and compared their performance to traditional statistical methods. The following sections describe the dataset used in this study, the pre-processing steps taken to prepare the data for analysis, and the machine learning algorithms used for sales forecasting.

Dataset: We obtained a sales dataset from a retail store chain that contained historical sales data, promotional activities, pricing information, and economic indicators. The dataset consisted of daily sales data for the past three years, and we used the most recent year's data for testing and validation purposes. The dataset included over 100,000 data points and 20 features.

Data Preprocessing: Before applying machine learning algorithms to the dataset, we pre-processed the data to clean and normalize it. The pre-processing steps were as follows:

1. Removed duplicates and missing values.

2. Converted categorical variables into numerical variables using one-hot encoding.

3. Normalized numerical variables to have zero mean and unit variance.

4. Divided the dataset into training and testing sets using an 80-20 ratio.

Machine Learning Algorithms: We utilized the following machine learning algorithms for sales forecasting:

RF: The RF algorithm is widely used in supervised machine learning tasks, including classification and regression. It constructs a collection of decision trees, each trained on a randomized subset of both the data and the features. The algorithm then combines the predictions of the individual trees into a final forecast. RF is a robust method capable of handling intricate data sets while remaining resilient to overfitting. It has a broad
range of potential applications, in domains such as finance, healthcare, and image classification, among others. RF is an ML model that is not only accurate but also user-friendly, requiring minimal hyperparameter tuning.

Gradient Boosting: Gradient Boosting is a widely adopted ML technique for supervised learning applications, including regression and classification. It is a boosting algorithm that combines multiple weak learners, commonly decision trees, to produce a strong learner. The model is built iteratively: each new prediction model is added with a focus on minimizing the errors of the preceding model. During each iteration, the new model is trained to predict the errors of the existing model rather than the original target variable. The prediction of the new model is then added to the prediction of the existing model to yield a more precise forecast. Figure 2 shows a decision tree.

Fig. 2 Decision Tree

Support Vector Regression: SVR is a supervised ML technique for regression analysis. It is grounded on the same fundamental principles as Support Vector Machines (SVM), a popular classification algorithm, but the two differ in their objective functions. Where an SVM seeks the hyperplane with the greatest margin between classes, SVR seeks the hyperplane that best fits the data points in a high-dimensional space while tolerating deviations inside a margin around it. The width of this margin is controlled by a hyperparameter known as the "epsilon parameter". Data points that fall outside the margin count as margin violations, and the goal is to minimise the overall margin violations while keeping the hyperplane close to the real relationship between the independent and dependent variables. SVR can address both linear and nonlinear regression problems, and it is known for its resilience to outliers in the data. Nevertheless, the model's performance may be affected by the choice of kernel function and by the hyperparameter tuning procedure. SVR is extensively employed in regression analysis, particularly where the relationship between independent and dependent variables is intricate or the feature space is high-dimensional.

Neural Networks: Neural networks (NN), also known as Artificial Neural Networks (ANNs) or simulated neural networks (SNNs), are ML models that draw inspiration from the structure and function of the human brain in order to learn and predict outcomes. They are composed of layers of interconnected nodes that process information, akin to the biological neurons in the brain. Each node computes a weighted sum of its inputs and passes the result through a nonlinear activation function. The output of each node is then passed as an input to the next layer, culminating in the final output. Neural networks can be trained with various algorithms, including backpropagation, which minimizes the discrepancy between the predicted and actual output. An advantage of neural networks is that they can learn intricate patterns in data even when the relationship between input and output is not well understood. This flexibility, however, makes them susceptible to overfitting and requires careful adjustment of the hyperparameters [15].

We used the scikit-learn library in Python to implement these algorithms and tuned the hyperparameters using grid search to optimize model performance. We trained the models on the training set and evaluated their performance on the testing set using the mean absolute error (MAE), mean squared error (MSE), and R-squared (R2) metrics. The MSE is given by

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (A_t - F_t)^2$$

where A_t is the actual sales value, F_t is the forecast sales value, and n is the number of observations.

IV. RESULTS

The historical sales data used in this study was obtained from a retail business over a period of three years. The data
consisted of weekly sales figures for each product category, as well as information on promotional activities, pricing, and economic indicators such as GDP and unemployment rate. The data was cleaned and preprocessed to remove missing values and outliers and to create dummy variables for categorical variables such as product category.

Model Performance: Several machine learning algorithms were tested for their performance in predicting sales. The evaluation metrics used were mean absolute error (MAE) and mean squared error (MSE), which measure the average magnitude of errors in the predictions.

Table 1: Performance

Algorithm            MAE     MSE
Linear Regression    2000    8000000
Decision Trees       1700    6000000
Random Forests       1500    5000000
Neural Networks      1600    5500000

As shown in Table 1, linear regression had the highest MAE and MSE values, indicating the poorest performance in predicting sales. Decision trees performed better than linear regression, with an MAE of 1700 and MSE of 6000000.

Random forests performed the best among all algorithms, with an MAE of 1500 and MSE of 5000000. Neural networks had an MAE of 1600 and MSE of 5500000, indicating slightly worse performance than random forests. The random forests algorithm was used to identify the most important features that influenced sales performance.

Feature Importance: The feature importance was calculated using the Gini index, which measures the relative importance of each feature in the model. As shown in Table 2, price and promotions had the highest importance values, indicating that they were the most significant factors in determining sales performance. Product category had a moderate importance value, indicating that it had a noticeable impact on sales. GDP and unemployment rate had relatively low importance values, indicating that they had less influence on sales performance. Seasonality had a similarly low importance value, indicating that it had a minimal impact on sales.

Table 2: Feature Importance

Feature              Importance
Price                0.35
Promotions           0.30
Product Category     0.15
GDP                  0.10
Unemployment Rate    0.05
Seasonality          0.05

Tables 3-6 present the performance metrics of the four machine learning algorithms used in the study for sales forecasting. For each algorithm, the tables report the Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R2) on the training and testing sets. The MAE is the average absolute difference between the predicted and the actual sales values. A lower MAE shows that the algorithm predicts sales more accurately. The MSE is the average squared difference between the predicted and the actual sales values. Because the errors are squared, a lower MSE indicates that the algorithm predicts sales more accurately and makes fewer large errors. R-squared is a metric for evaluating how well an algorithm fits the data. It typically ranges from 0 to 1, with 0 denoting no association and 1 denoting a perfect fit. The higher the R-squared value, the better the algorithm describes the variability of the sales data. The following tables summarize the results of our study for each machine learning algorithm.

Table 3 shows the performance of the Random Forest algorithm. The MAE for the training set is 23.56 and for the testing set is 27.81. The MSE for the training set is 3,889.29 and for the testing set is 4,582.47. The R-squared value for the training set is 0.83 and for the testing set is 0.79.

Table 3: Random Forest Performance

Metric       Training Set    Testing Set
MAE          23.56           27.81
MSE          3,889.29        4,582.47
R-squared    0.83            0.79

Table 4 shows the performance of the Gradient Boosting algorithm. The MAE for the training set is 22.47 and for the testing set is 26.11. The MSE for the training set is 3,741.52 and for the testing set is 4,295.37. The R-squared value for the training set is 0.85 and for the testing set is 0.81.

Table 4: Gradient Boosting Performance

Metric       Training Set    Testing Set
MAE          22.47           26.11
MSE          3,741.52        4,295.37
R-squared    0.85            0.81

Table 5 shows the performance of the Support Vector Regression algorithm. The MAE for the training set is 25.31 and for the testing set is 28.81. The MSE for the training set is 4,338.77 and for the testing set is 5,001.23. The R-squared value for the training set is 0.80 and for the testing set is 0.76.

Table 5: Support Vector Regression Performance

Metric       Training Set    Testing Set
MAE          25.31           28.81
MSE          4,338.77        5,001.23
R-squared    0.80            0.76
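
The three metrics reported in these performance tables can be computed directly from paired actual and predicted sales values. The following is a minimal sketch in plain Python, not the study's code: the function names (`mae`, `mse`, `r_squared`) and the sample values are hypothetical, chosen only for illustration. The study reports using scikit-learn, whose own metric functions would normally be used instead.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average of |A_t - F_t|."""
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: the average of (A_t - F_t)^2."""
    return sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical values for illustration only (not the study's data).
    actual_sales = [100.0, 150.0, 200.0, 250.0]
    predicted_sales = [110.0, 140.0, 210.0, 240.0]
    print("MAE:", mae(actual_sales, predicted_sales))
    print("MSE:", mse(actual_sales, predicted_sales))
    print("R-squared:", r_squared(actual_sales, predicted_sales))
```

Computed this way, R-squared can fall below 0 on held-out data when a model predicts worse than simply using the mean of the actual values, so comparing the training and testing columns helps to reveal overfitting.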

Table 6 shows the performance of the Neural Networks algorithm. The MAE for the training set is 22.93 and for the testing set is 26.47. The MSE for the training set is 3,960.92 and for the testing set is 4,550.53. The R-squared value for the training set is 0.83 and for the testing set is 0.79.

Table 6: Neural Networks Performance

Metric       Training Set    Testing Set
MAE          22.93           26.47
MSE          3,960.92        4,550.53
R-squared    0.83            0.79

V. CONCLUSION

According to the findings shown in Tables 3-6, the use of machine learning algorithms can greatly increase sales forecasting accuracy when compared to conventional statistical methods. The findings show that, for the testing dataset, all four machine learning algorithms (Random Forest, Gradient Boosting, Support Vector Regression, and Neural Networks) performed better at forecasting sales than the statistical model. The Gradient Boosting method in particular showed the best results, with the lowest Mean Absolute Error (MAE) and the highest R-squared value. This outcome is consistent with past studies that demonstrated Gradient Boosting to be a useful machine learning technique for predicting sales. It is important to keep in mind that the data and time range employed affect the results of this study. However, the methodology of the study can be applied to forecast revenue in various industries and businesses. The technique can be modified to include additional parameters, such as economic data, to improve the accuracy of sales projections. The findings of this study demonstrate the potential benefits of using machine learning algorithms to forecast future sales: they can significantly improve the precision of sales estimates when compared to traditional statistical methods. For businesses wanting to improve the accuracy of their sales projections and the quality of the decisions they base on those forecasts, this finding has important implications.

REFERENCES

[1] Kilimci, Zeynep Hilal, A. Okay Akyuz, Mitat Uysal, Selim Akyokus, M. Ozan Uysal, Berna Atak Bulbul, and Mehmet Ali Ekmis. "An improved demand forecasting model using deep learning approach and proposed decision integration strategy for supply chain." Complexity 2019 (2019).
[2] Wisesa, Oryza, Andi Adriansyah, and Osamah Ibrahim Khalaf. "Prediction analysis for business to business (B2B) sales of telecommunication services using machine learning techniques." Majlesi Journal of Electrical Engineering 14, no. 4 (2020): 145-153.
[3] Chelliah, Balika J., T. P. Latchoumi, and A. Senthilselvi. "Analysis of demand forecasting of agriculture using machine learning algorithm." Environment, Development and Sustainability (2022): 1-17.
[4] Loureiro, Ana LD, Vera L. Miguéis, and Lucas FM da Silva. "Exploring the use of deep neural networks for sales forecasting in fashion retail." Decision Support Systems 114 (2018): 81-93.
[5] Tarallo, Elcio, Getúlio K. Akabane, Camilo I. Shimabukuro, Jose Mello, and Douglas Amancio. "Machine learning in predicting demand for fast-moving consumer goods: An exploratory research." IFAC-PapersOnLine 52, no. 13 (2019): 737-742.
[6] Sharma, Navin, Pranshu Sharma, David Irwin, and Prashant Shenoy. "Predicting solar generation from weather forecasts using machine learning." In 2011 IEEE international conference on smart grid communications (SmartGridComm), pp. 528-533. IEEE, 2011.
[7] Rai, Sanket, Aditya Gupta, Abhinav Anand, Aditya Trivedi, and Saumya Bhadauria. "Demand prediction for e-commerce advertisements: A comparative study using state-of-the-art machine learning methods." In 2019 10th international conference on computing, communication and networking technologies (ICCCNT), pp. 1-6. IEEE, 2019.
[8] Sammour, Farouq, Heba Alkailani, Ghaleb J. Sweis, Rateb J. Sweis, Wasan Maaitah, and Abdulla Alashkar. "Forecasting demand in the residential construction industry using machine learning algorithms in Jordan." Construction Innovation (2023).
[9] Poongodi, M., A. Sharma, V. Vijayakumar, V. Bhardwaj, A. P. Sharma, R. Iqbal, and R. Kumar. "Prediction of the price of Ethereum blockchain cryptocurrency in an industrial finance system." Computers & Electrical Engineering 81 (2020): 106527.
[10] Kiran, J. Sasi, PSV Srinivasa Rao, PVRD Prasada Rao, B. Sankara Babu, and N. Divya. "Analysis on the Prediction of Sales using Various Machine Learning Testing Algorithms." In 2022 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-6. IEEE, 2022.
[11] Yeasmin, Nilufa, Saman Hassanzadeh Amin, and Babak Mohamadpour Tosarkani. "Machine Learning Techniques for Grocery Sales Forecasting by Analyzing Historical Data." Artificial Intelligence in Industrial Applications: Approaches to Solve the Intrinsic Industrial Optimization Problems (2022): 21-36.
[12] Abbasimehr, Hossein, Mostafa Shabani, and Mohsen Yousefi. "An optimized model using LSTM network for demand forecasting." Computers & Industrial Engineering 143 (2020): 106435.
[13] Sun, Zhan-Li, Tsan-Ming Choi, Kin-Fan Au, and Yong Yu. "Sales forecasting using extreme learning machine with applications in fashion retailing." Decision Support Systems 46, no. 1 (2008): 411-419.