Forecasting Retail Sales using Machine Learning Models
Article history
Submitted 17.02.2025 Revised Version Received 16.03.2025 Accepted 18.04.2025
Abstract
Purpose: This paper's main objective is to examine common machine learning techniques and time series analysis for sales forecasting, in a bid to identify the best-fitted technique and to provide more logical hypotheses for raising future profit margins, while obtaining an in-depth historical understanding of prior demand using business intelligence software such as Tableau or Microsoft Power BI. The outcomes are laid out with regard to the dependability and precision of the various forecasting models employed.
Materials and Methods: In this project, a sales prediction is carried out on five years of store-item sales data for 50 different items in 10 different stores, with the dataset obtained from Kaggle. The study focuses on machine learning methods including Random Forest, Gradient Boosting Regression (XGBoost), and Linear Regression, as well as the standard time series Autoregressive Integrated Moving Average (ARIMA) method; these were analysed and contrasted to measure their effectiveness for sales prediction.
Findings: This study demonstrates the potential of machine learning algorithms to forecast sales accurately, which can be extremely valuable for businesses in optimizing their operations, inventory management, and financial planning. By leveraging these predictive models, companies can make data-driven decisions to improve efficiency, reduce costs, and increase profitability. The findings also highlight the importance of selecting the most appropriate algorithm for a given dataset and problem, as well as the need for proper model tuning and validation to ensure reliable results. Furthermore, the study underscores the significance of understanding and interpreting error metrics such as RMSE and MAE to effectively evaluate and compare model performance.
Unique Contribution to Theory, Practice and Policy: Factors such as seasonality, trend, promotional offers and randomness are known to be important influences on the outcome of sales forecasting, which is why the Mean Absolute Error (MAE), the coefficient of determination (R²) and the Root Mean Square Error (RMSE) are compared across the different algorithms used, to help identify the preferred algorithm to be adopted, which turned out to be the XGBoost method.
Keywords: Sales forecasting, Machine learning, Time series analysis, Random Forest, ARIMA, LSTM, XGBoost, Linear regression, RMSE, MAE.
INTRODUCTION
During the early years, many businesses would embark on the production of goods without taking into account the sales numbers resulting from demand for such products. Competing in a market without using data about the demand for the same or similar items to evaluate whether to expand or reduce the quantity produced puts any manufacturer at risk of losing out. Bajari, Nekipelov, Ryan, and Yang mention that varying specifications are used by various businesses in gauging sales. Previously existing techniques either fall short of capturing context-specific, erratic trends or fail to incorporate all available data in the face of a data shortage.
For every business to be successful, forecasting the right demand at each outlet is essential, because it aids the management of stock, improves the distribution of products among stores, and reduces excessive stocking and understocking at individual stores, which essentially enhances profits and client happiness while keeping losses to a minimum [Berry and Linoff (2004)]. Given the limited shelf lives of many products across a lot of businesses, which results in income losses in both shortage and surplus scenarios, the importance of sales forecasting cannot be overemphasized [Jain, Menon, and Chandra (2015)].
Sales forecasting should be supported by computer systems that can assist in making quality decisions by offering projections of future sales, because human contingencies such as illness and death can cause a scarcity of the managers who would usually make the (often imprecise) sales predictions [Tsoumakas (2019a)]. Alternatively, using machine learning algorithms to build precise sales forecasting techniques from the abundance of available sales data is a better and easier process, because it is unbiased and adaptive to any data changes, unlike what is attainable with the human attributes of a sales manager. A machine learning algorithm can also surpass the imperfections of a human expert's accuracy.
How can we collect globally accepted sales data (wholesale or retail) that a computer can analyse? Every time a sales transaction occurs between buyers and sellers in a market, it is best practice to document the agreed-upon transaction price at the time. To estimate future sales and demand, many businesses have traditionally centred on analytical models such as linear regression, time series and random forest models.
A set of samples taken over time in chronological order is called a time series. Time series can represent many different types of data, such as a regular sequence of the number of items dispatched from a facility, a weekly series of the number of traffic accidents, daily rainfall totals, or regular measurements of the output of a chemical process. Examples of time series abound in a variety of disciplines, including finance, engineering, geology, astronomy, and sociology. One key inherent characteristic of time series is that neighboring observations are dependent, and it is highly practical to understand how observations in a time series interact with one another. Strategies for studying this dependence are the focus of time series analysis, which also critically observes seasonality, irregularities and patterns. [Box, Jenkins, Reinsel, and Ljung (2015)]
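As an illustrative sketch (not drawn from the study's data), the dependence between neighboring observations can be quantified with the lag-1 autocorrelation of a series; values near 1 indicate strong dependence between successive points:

```python
def autocorr(series, lag=1):
    """Lag-k autocorrelation: correlation of the series with itself shifted by k steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

# A steadily rising series: neighboring points are strongly dependent
trend = [float(t) for t in range(20)]
print(round(autocorr(trend), 3))  # 0.85
```

A white-noise series, by contrast, would show a lag-1 autocorrelation close to zero, which is the intuition behind the dependence structure that time series analysis studies.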
Linear regression is one basic and important statistical modeling tool that represents the regression function as a combination of predictors. It is widely used because it offers a good approximation to the underlying regression function when the sample size is small, and it can safely be referred to as the fundamental foundation of all contemporary modeling tools [Su, Yan, and Tsai (2012)]. In machine learning, feature engineering involves creating quantitative representations of significant patterns based on subject-matter expertise, allowing better analysis of the data and an additional beneficial viewpoint [Li, Ma, and Xin (2017)]. Random Forest is a more advanced approach that merges several trees to produce judgments; the random forest model averages the predictions of all individual trees to produce projections that are more precise [Boyapati and Mummidi (2020)].
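As a minimal sketch of the linear regression idea (not the study's implementation, and with purely illustrative numbers), a single-predictor model y = a + b·x can be fitted in closed form by ordinary least squares:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical monthly sales rising by 10 units per month
months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]
a, b = fit_line(months, sales)
print(a, b)  # 100.0 10.0
```

In practice a library such as scikit-learn handles the multi-predictor case the same way; the closed form above is the one-variable special case.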
Regression rather than time series analysis is the better approach for predicting sales.
Regression algorithms can frequently provide us with better outcomes than time series
methods, as experience has shown. The time series can be examined for trends using machine
learning algorithms [Pavlyshenko (2019)].
Research Questions
i. What limitations are evident in previous retail sales revenue projections research works?
ii. What are the most frequent and important characteristics or elements that affect the
forecasting of retail sales?
iii. Which algorithms are most effective in solving issues with product sales forecasting?
iv. How are sales volume or market projections improved by machine learning algorithms
over the classical statistical approach?
Research Objective(s)
The ultimate objective of this study is to employ machine learning and conventional algorithms to determine which categories of goods sell more at a given business outlet, which will assist shop owners in making better choices about stock procurement. To find out which machine learning algorithms perform best on the provided data, a comparative analysis of four distinct machine learning algorithms is conducted. The application of a conventional technique is also covered in later sections of this study for ease of comparison.
Why Forecast?
As forecasting involves the future, it greatly benefits many sectors. For families, it equips the household's main provider to plan properly based on past spending, a practice known as budgeting. When projections are reliable and accurate, enterprises in the private sector can have significant confidence in their upcoming investments. Sales forecasts are vital components of many decision-making processes at the organisational level across a variety of functional areas, including administration, promotion, advertising, manufacturing, and financing [Cheriyan, Ibrahim, Mohanan, and Treesa (2018)]. Of great importance is knowing that having a reliable sales forecast available offers a firm the following advantages: it directs how sales targets and anticipated profits should line up; it aids in making better future decisions; it shortens the time required for setting targets and planning market penetration; and it helps set criteria that can be used to identify future trends. It is also worth remembering how catastrophic erroneous forecasts can be for a business, as they introduce disparities and gaps in the business's strategic goals.
Forecasting Methods
The two most fundamental forecasting techniques with which businesses can conduct and
execute sales forecasting are: the top-down (TD) sales forecast strategy and the bottom-up (BU)
sales forecast strategy.
When a company chooses to use the bottom-up sales forecast strategy, it begins by concentrating on the number of units it anticipates it will probably sell, then multiplies that number by the average price per unit. The number of available sales agents, the number of potential retail sites, and digital presence are other variables that can be integrated into this strategy. The bottom-up approach seeks to begin with the smallest elements of the projection and work its way up. The number of sales agents and retail sites, as well as the pricing or quantity of a product, can all be dynamically altered under the bottom-up sales forecast strategy. As a result, this method offers information that is relatively detailed [Ofoegbu (2021)].
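The bottom-up arithmetic can be sketched in a few lines; all channel names, unit counts and prices below are hypothetical, chosen only to show the units-times-price roll-up:

```python
# Hypothetical bottom-up forecast: expected units per channel * average unit price,
# summed upward to a total revenue projection
channels = {
    "retail_site_A": {"units": 1200, "avg_price": 4.50},
    "retail_site_B": {"units": 800,  "avg_price": 4.50},
    "online":        {"units": 2000, "avg_price": 4.25},
}

forecast = sum(c["units"] * c["avg_price"] for c in channels.values())
print(forecast)  # 1200*4.5 + 800*4.5 + 2000*4.25 = 17500.0
```

Because each element (a site, an agent, a price) is represented explicitly, any of them can be adjusted and the total re-derived, which is the "relatively detailed" property the text describes.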
Ofoegbu also mentions that, on the contrary, the top-down sales forecasting strategy focuses primarily on global demand (the total addressable market, TAM). By predicting the proportion of the overall market they want to sell to in the reviewed period, it calculates the potential market share that the company can obtain. The bottom-up strategy should be used first, followed by the top-down strategy, to guarantee that the prediction is practicable.
However, both of these strategies should be used to obtain solid and deployable sales
projections. Over the past decade, numerous studies have explored various forecasting
techniques, and their findings are well-documented in the literature. However, it is
important to note that each forecasting method has its own limitations and drawbacks. For
instance, the accuracy of standard statistical methods heavily depends on the
characteristics of the time series data, which can significantly impact the precision of the
forecast. On the other hand, artificial intelligence (AI) approaches have the potential to
surpass traditional statistical forecasting models in terms of accuracy. Nevertheless, these
AI methods often require more time and computational resources to generate results [Liu,
Ren, Choi, Hui, and Ng (2013)]. Consequently, researchers have consistently
recommended the simultaneous integration of different methodologies to develop a novel
"hybrid technique" that can provide a reliable and effective forecasting outcome [Ofoegbu,
2021].
Achieving Precise Forecasts
Management of businesses ought to be able to integrate these 5 processes in their forecast
criteria to guarantee an appropriate forecast is generated from a business perspective:
Past data availability: Historical data and patterns should be easily accessible for analysis.
The foundation of a sound prediction begins with a breakdown of the data by revenue,
transaction time, volume of sales, and other pertinent criteria that should be addressed.
Next, this information is formed into a "sales operating margin," which is regarded as the
anticipated number of sales for the evaluation period. In light of this, the CRISP-DM
methodology’s deployment is essential at this point since it will assist the analyst in
gathering data that is relevant to the business’s needs and objectives and ensure that the data
gathered for the prediction is accurate [Shearer (2000)].
Address anticipated changes: This requires adjusting the sales operating margin to reflect
anticipated future changes. This could involve adjustments to the pricing owing to rivalry or
other causes, adjustments to the client base that are increasing or declining, incentives,
adjustments to the outlets that are either incremental or reducing, and adjustments to the
commodity itself [Liu et al. (2013)].
Select suitable evaluation metrics: Many evaluations rely on measures such as the mean absolute percentage error (MAPE) and accuracy, but in this study, measurements such as the mean squared error, the mean absolute error, and the coefficient of determination (R²) are taken into account, as they do not ignore the weightings, the significance of errors, and the correlation between the actual and projected data.
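For concreteness, the three metrics named above can be computed as follows; this is a generic sketch with illustrative numbers, not the study's evaluation code:

```python
import math

def mae(actual, pred):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root mean squared error: penalises large errors more heavily, same scale as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def r2(actual, pred):
    """Coefficient of determination: fraction of variance explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [100, 150, 200, 250]
pred   = [110, 140, 210, 240]
print(mae(actual, pred), rmse(actual, pred), r2(actual, pred))  # 10.0 10.0 0.968
```

Note that RMSE equals MAE here only because every error has the same magnitude; with uneven errors RMSE rises faster, which is exactly the weighting property the text refers to.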
Model training, testing, and validation: The models in this study will be trained, tested, and
validated using a combination of techniques. First, the dataset will be split into training and
testing sets using a train-test split approach. This involves randomly dividing the data into two
subsets: a larger portion (e.g., 80%) for training the models and a smaller portion (e.g., 20%)
for evaluating their performance on unseen data. Additionally, cross-validation techniques,
such as k-fold cross-validation, will be employed to assess the models' generalization ability
and to reduce the risk of overfitting. In k-fold cross-validation, the training data is further
divided into k subsets, and the model is trained and validated k times, each time using a
different subset for validation while the remaining subsets are used for training. This process
helps to ensure that the models' performance is consistent across different subsets of the data.
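The split-and-fold mechanics described above can be sketched without any library; the 80/20 ratio and fold count are the illustrative values from the text, and the helper names are this sketch's own:

```python
import random

def train_test_split(data, test_frac=0.2, seed=42):
    """Shuffle indices and split the data into train/test portions."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * (1 - test_frac))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs: each fold validates once, trains on the rest."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
for tr, va in kfold_indices(10, 5):
    assert len(va) == 2 and len(tr) == 8
```

One caveat worth noting: for time-series data, a chronological split (or time-series cross-validation with expanding windows) is often preferred over a random shuffle, so that the model is never validated on observations earlier than its training data.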
Research Structure
The proposed model for this thesis study is created using the blueprint below.
Exploring a sales predictive model for fast fashion, Choi et al. (2014) propose a new algorithm known as the "Rapid Fashion Forecasting (3F) algorithm" that handles prediction using a small dataset and in a constrained amount of time. The issue of prediction with a small dataset under a time constraint is the subject of this research for the first time. A four-period real sales dataset from a knitwear fashion company that operates on the fast-fashion principle is used for the study. The dataset is tested with the 3F algorithm and the resulting outcomes are recorded. GM-EELMs with two neurons and two inputs are found to meet the time restriction of 0.5 seconds, whereas those with two neurons and seven inputs are found to complete the task in under 20 seconds (and not 0.5 s). The investigation concludes that the 3F algorithm gives more impressive prediction precision than the GM and can also perform excellently in other prediction settings, such as financial markets and short-lived seasonal products. [Choi et al. (2014)]
”Time Series Forecasting of Agricultural Products’ Sales Volumes Based on Seasonal
Long Short-Term Memory” (Yoo & Oh, 2020)
To balance agricultural supply and demand, this paper proposes a deep-learning-based system (Seasonal Long Short-Term Memory, SLSTM) for forecasting agricultural sales. Sales of agricultural products are mostly time series, which makes it possible to project them using numerous traditional time-series projection techniques; most of the time, machine learning or traditional statistical models are used to handle the supply and demand prediction problem for agricultural products. Time-series records comprising five products (Chinese Mallow, Jujube Mini Tomato, Onion, Lettuce, and Welsh Onion) from a regional food shop's point-of-sale system in Wanju, South Korea, covering June 2014 to December 2019, were selected for this study because they contained the fewest missing dates. In this study, the well-known models (Prophet, LSTM, SARIMA, as well as the suggested SLSTM) were implemented, and their characteristics and execution results compared. Three error measures, the normalised mean absolute error (NMAE), root mean squared error (RMSE) and mean absolute error (MAE), have been adopted to appraise the forecasting
performances. The Prophet model produced lower accuracy, its predictions falling short of the actual values. Both LSTM and SLSTM predictions are more accurate than Prophet's at predicting the direction of the actual data, which explains why LSTM and SLSTM are more accurate than Prophet and auto-ARIMA. The paper concludes that, in comparative evaluations with Prophet, auto-ARIMA and regular LSTM models, the SLSTM with enhanced seasonal variation has a low prediction error rate and NMAE [Yoo and Oh (2020)]. For comparison, a stacked LSTM extends the standard LSTM architecture by incorporating multiple hidden LSTM layers. In a standard LSTM there is a single hidden layer composed of LSTM units, which are responsible for capturing and learning the temporal dependencies in the data. By stacking multiple LSTM layers, the model allows a more complex and hierarchical representation of the temporal patterns: each additional layer can learn increasingly abstract features from the previous layer's output, enabling the model to capture more intricate and long-term dependencies in the data.
”Machine-Learning Models for Sales Time Series Forecasting” (Pavlyshenko, 2019)
To forecast future purchases in this study, the "Rossmann Store Sales" dataset from Kaggle is used, with the primary Python packages used for the calculations. The analysis takes attributes such as average past sales, public and school holidays, the distance between a store and those of its rivals, and the store's assortment type into consideration for projection, given that a regression model captures patterns across the entire set of stores or consumer items. The data is analysed using different machine learning algorithms, such as Random Forest, a neural network, an ARIMA model, an Extra Trees model, and others. The outputs of the first-level models feed the second stacking level, where the linear model and the Extra Trees model from Python's scikit-learn package, plus a neural network, were implemented; at the third level the results from level 2 were weighted and added. The analysis revealed that using regression models for sales prediction can commonly yield better outcomes than time-series approaches. By taking into consideration the variance in outputs from numerous models with different permutations of inputs, stacking makes it possible to increase quality on the validation and out-of-sample sets of data. [Pavlyshenko (2019)]
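A minimal sketch of the stacking idea (not Pavlyshenko's actual pipeline, and with hypothetical numbers): predictions from two level-1 models are blended at the next level with a weight chosen, in closed form, to minimise squared error on a validation set:

```python
# Two hypothetical level-1 predictions on a validation set, and the true values
p1 = [10.0, 12.0, 15.0, 20.0]   # e.g. a tree-based model's predictions
p2 = [12.0, 11.0, 18.0, 19.0]   # e.g. a linear model's predictions
y  = [11.0, 12.0, 16.0, 20.0]

# Level-2 blend y_hat = w*p1 + (1-w)*p2; least-squares optimal w has a closed form:
# w = sum((y - p2)(p1 - p2)) / sum((p1 - p2)^2)
num = sum((yi - b) * (a - b) for yi, a, b in zip(y, p1, p2))
den = sum((a - b) ** 2 for a, b in zip(p1, p2))
w = num / den

blend = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
print(round(w, 3))  # 0.667
```

Real stacking pipelines fit the level-2 combiner on out-of-fold predictions (so the blend weights are not tuned on data the base models trained on); the closed-form weight above is the simplest two-model case of that idea.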
”Time-Series Forecasting of Seasonal Items Sales using Machine Learning: A
Comparative Analysis” (Ensafi, Amin, Zhang, & Shah, 2022)
With the use of various models, including SARIMA, Prophet, neural networks and exponential smoothing, this paper's objective is to perform time-series analysis of cyclical data; the best model in this inquiry is the one that produces the most accurate results. The Superstore sales dataset (Community.tableau.com, 2017) has no missing values and exhibits seasonality in its sales trend. The study's dataset, which comprises close to 10,000 data points and 21 attributes, describes retail outlet volumes between 2014 and 2017, including sales data for three categories: office supplies, furniture, and technology items. The variable of interest in this analysis is furniture sales, which exhibits seasonal trends. The most popular indicators have been employed in this study to gauge forecast accuracy. The first indicator used is the MSE; since the RMSE is on the same scale as the data, it can be preferred to the MSE as an indicator in this study. The third and final metric is the MAPE, which has the benefit of being scale-independent. With better MAPE and RMSE values, the Stacked LSTM outperforms the other models according to all three metrics.
This study concludes that, out of 13 models, the CNN, Vanilla LSTM, Stacked LSTM, and Prophet models are among the five best able to predict seasonal time-series sales data [Ensafi et al. (2022)]. The reason the Stacked LSTM outperforms the other
models in this study can be attributed to its deeper architecture, which allows it to better
capture and model the complex seasonal patterns present in the sales data. The multiple
layers of LSTM units enable the model to learn hierarchical representations of the time
series at different levels of abstraction, from low-level patterns to high-level trends and
seasonality. This deep learning approach is particularly effective in handling the non-linear
and non-stationary characteristics of sales data, as it can adapt to changing patterns and
extract meaningful features from the historical observations. In contrast, traditional models
like ARIMA and standard LSTM may struggle to fully capture the complexity of the
seasonal patterns, leading to less accurate predictions.
”Intelligent Sales Prediction Using Machine Learning Techniques” (Cheriyan et al.,
2018)
A dataset from a fashion store's sales over three consecutive years is utilised, with the sales forecast covering 2015 to 2017. The data mining approach goes through several stages of exploratory data analysis, including data interpretation, preparation, modelling, evaluation, and deployment. A smoothed average that has been vertically trend-adjusted, and then seasonality-adjusted as well, serves as the projection's basis. Future revenue is predicted using machine learning methods such as the Gradient Boosted Tree (GBT), Decision Tree (DT), and Generalised Linear Model (GLM). According to the results obtained, the Gradient Boost algorithm offers a 98% accuracy rate, the Decision Tree follows as runner-up with roughly 71% accuracy, and the Generalised Linear Model comes in last with 64% accuracy. Ultimately, when the three algorithms are empirically evaluated, the Gradient Boosted Tree, which offers the highest prediction accuracy of all the algorithms, is shown to be the best fit. [Cheriyan et al. (2018)]
Further Literature
It is possible to go back more than six decades in the history of sales forecasting [Boulden
(1957), Winters (1960a)]. Ever since, numerous studies on sales forecasting have been released
[Chen and Ou (2009), Lo (1994), Sakai, Nakajima, Higashihara, Yasuda, and Oosumi
(1999), Wong and Guo (2010), Winters (1960b)], involving a robust range of real-world
applications in the agriculture [Yoo and Oh (2020)], fashion [Liu et al. (2013)], furniture
[Yucesan, Gul, and Erkan (2017)], and food [Tsoumakas (2019b)] industries, amongst others.
Retailers generally concur that a retail product’s initial sales are a reliable predictor of future
sales [M. L. Fisher, Raman, and McClelland (2000)]. Nevertheless, little focus has been placed on using early purchases to predict total retail product sales up to this point. Tanaka produced the sole piece of research discovered, forecasting long-term purchases based on early purchases and the relationships between short-term and long-term cumulative purchases within related product groupings. The predictive accuracy of Tanaka's work was heavily dependent on the choice of the sample population, which was made on the basis of professional expertise, making it discretionary and perhaps untrustworthy. Further, Tanaka (2010) did not take into account how several influencing elements, such as manufacturing qualities and advertising strategies, may affect sales volume, leaving it unable to handle fluctuations in sales brought on by these factors.
The creation and enhancement of sales forecasting models for the retail industry have received
a lot of attention over the past few decades as comprehensively discussed in M. Fisher &
Raman, 2018. Like all companies, retailers must decide how to evolve strategically in the
face of shifting market conditions and technology advancements. Forecasts are often a
requirement for the conventional components of a sales approach, including market and
competitive considerations in dynamic technological and legislative environments [Levy,
Weitz, Grewal, and Madore (2012)]. The climate for small traditional vendors is also unstable, and there is ambiguity over region and intended audience. Although many of the issues that major merchants confront still exist (such as digital service), there is little in the literature that even briefly summarizes the outcomes of the numerous small vendors' site choices. Accurate, quick Economic Order Quantity (EOQ) unit-demand estimates are essential in just-in-time systems, since they serve as the input elements for the supply chain scheduling process, which is often performed on a daily, weekly, or monthly basis. Since both underestimating and overestimating quick demand are costly, owing to the resulting supply constraints and poor customer service, overstocking, and goods depreciation, forecasting performance can significantly affect a firm's earnings [Sanders and Graman (2009)]. The fundamental approaches to forecasting item sales rely
solely on historical sales data and linear modelling techniques. Conventional time series
approaches are used in the retail sector [KALAOGLU et al. (2015)], including basic moving
averages and the Auto Regressive Moving Average (ARMA) methodology. Another study area uses model-based forecasting to anticipate sales of manufactured products, integrating promotional (and other) data. [Fildes, Ma, and Kolassa (2022)] assert that these methods
frequently depend on more advanced business theories or numerous linear models, whose
components are related with periodicity, dated activities, climate, cost, and marketing
features. The objective of new findings on projecting sales volumes has been to utilise a
generic prediction model that can be applied to all the items under investigation. According to Wolpert and Macready's (1997) "no-free-lunch theorem", there is no guarantee that any method, no matter how sophisticated, ranks higher than any other for a diverse set of series. This indicates that it is nearly impossible for one particular method to be superior to the others for all products and all coming years, and it emphasises how crucial it is to choose a method that fits the features of the problem. There has not been a significant amount of study in the general forecasting community examining the advantages of various selection methods, contrasting individual and aggregate selection and combination [Fildes and Petropoulos (2015)].
Before using autoregressive integrated moving average (ARIMA) models, one must first remove the trend and seasonality from the data. For instance, if one were analysing the number of regular visitors to a website and it was increasing by 15% monthly, this trend would have to be eliminated from the time series; to obtain the final forecasts, the trend would be reintroduced after the model had been developed and had begun to generate projections. Comparably, if one were attempting to forecast the monthly sales of water bottles, one would likely see significant seasonality, with high sales during the summer recurring yearly. By computing the difference between the value at each time interval and the value from the same period of the previous year (a technique referred to as differencing), one can eliminate this seasonality from the time series. Likewise, to obtain the final forecasts, one would need to reintroduce the seasonal pattern once the model has been developed and has made its projections. [Betel (2022)]
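The remove-then-reintroduce step described above can be sketched with toy numbers (a short series with a period of 4 stands in for yearly seasonality; none of these figures come from the study):

```python
period = 4  # e.g. quarterly seasonality, for a short illustration

sales = [100, 200, 150, 300, 110, 210, 160, 310]  # two "years" of 4 periods each

# Seasonal differencing: change versus the same period one cycle earlier
diff = [sales[t] - sales[t - period] for t in range(period, len(sales))]
print(diff)  # [10, 10, 10, 10] -- the seasonal pattern is gone, only the drift remains

# After a model is fitted to the differenced series (here we simply assume it
# predicts another change of 10), the seasonality is reintroduced by adding
# the forecast change back onto the value from one cycle earlier
next_forecast_diff = 10
forecast = sales[-period] + next_forecast_diff
print(forecast)  # 110 + 10 = 120
```

The differenced series is nearly constant, which is exactly what makes it easier to model; the seasonal shape re-enters only at the reconstruction step.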
The Grey method (GM) has a reputation for being a strong contender for forecasting with
limited past information. For instance, [C.-C. Hsu and Chen (2003a)] used the GM-based
algorithm to project the demand for electricity despite having very little historical data, and
they found some encouraging outcomes before an enhanced GM-based strategy was put
out by [Yao, Chi, and Chen (2003)] also applying it to the forecasting of electricity demand.
In order to create a projection technique for the integrated-circuit industry, Chen worked with the GM-based model and obtained favorable prediction results. Yao et al. (2003) discovered that their proposed enhanced GM approach performed extremely effectively with a relatively small amount of past data under a number of key situations that were particularly relevant to the electricity business. [Lin and Lee (2007), L.-C. Hsu (2009), Lei
and Feng (2012), L.-C. Hsu and Wang (2007)] all carry out recent studies on predicting
using GM models. Although GM modelling techniques were thought to be an effective instrument for forecasting with little available data, they were still found to perform poorly, especially when the supporting data pattern is highly erratic and devoid of any recognizable trend [Deng (1989), C.-C. Hsu and Chen (2003b)].
[Tsao, Chen, Chiu, Lu, and Vu (2022a)] analyzed actual sales data as internal operational
data from a business in the server industry. They created a revolutionary intelligent forecasting
methodology that modifies demand forecasting findings by adding external information
indices, stating that the suggested framework for intelligent forecasting offers a useful tool
for businesses susceptible to inconsistent demand. Tsao et al. concluded that better forecasting
outcomes should make it possible for a business to undertake manufacturing planning and
control with more accuracy suggesting that when supply and demand are easily met,
network with convolutional layers that enable automatic extraction of properties of time
series at any tier. Estimated coefficients accessible at any level of the hierarchy are combined
with these features in the network. The findings of the study carried out by Mancuso et al.,
2021 shows that the goal of combining all the relevant information through the hierarchy,
both concealed and supplied by the parameter estimates, in a single machine, without the
requirement for post-processing on the succession to reach optimal precision and homogeneity
across sections, was achieved.
Customer retention describes the measures a firm takes to maintain customer loyalty and
prevent customer churn, incorporating initiatives like personalised advertisement, membership
initiatives, and customer relations. Most of the time, keeping current customers is simpler and
cheaper than finding new ones [Derby (2018)]. A technique for predicting which clients will
be kept and which will be lost one or more months in advance is proposed and evaluated by
[Schaeffer and Sanchez (2020)]; the technique is tested using a variety of algorithms for
machine learning (the Support Vector Machine, AdaBoost, as well as Random Forests)
that were identified to be effective in these situations. The results demonstrated that several
techniques yield valuable classifications up to three months in advance. We can also conclude
from the study carried out by [Saradhi and Palshikar (2011)] that identifying loyal customers
is simpler than predicting customer attrition.
The pioneering study to compare the outcomes of over ten predictive models, spanning
both conventional and modern techniques such as Prophet and LSTM approaches, was
carried out by [Ensafi et al. (2022)]. The study examined the ability of Convolutional
Neural Networks, primarily used for image recognition, to forecast sales of seasonal goods
with results of the Ensafi et al. study showing that most of the neural network algorithms
outperformed the traditional forecasting techniques. A neural network whose connections
form cycles is known as a Recurrent Neural Network (RNN). RNNs are very important
in time-series forecasting because of their capacity to retain memories of the past [Gamboa
(2017)]. Two models were proposed by [Zhuge, Xu, and Zhang (2017)], namely an
emotional analysis model and an LSTM; they reported that the LSTM model outperformed
alternatives in time series processing due to its ability to preserve both temporal
properties of events and contextual information.
ARIMA, which stands for Auto Regressive Integrated Moving Average, is a popular model for
time series forecasting. One of the key assumptions of ARIMA is that the time series data
should be stationary, meaning that its statistical properties (such as mean and variance) do not
change over time. However, many real-world time series data, including sales data, often
exhibit non-stationary behavior, such as trends and seasonality. To address this issue, ARIMA
employs a technique called differencing, which involves computing the differences between
consecutive observations. The purpose of differencing is to remove the non-stationary
components from the time series, making it more suitable for modeling. The number of times
the differencing operation is applied is denoted by the "d" parameter in the ARIMA (p,d,q)
notation.
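As a brief illustration of the differencing step (the series values below are hypothetical, not taken from the study's dataset):

```python
import pandas as pd

# Hypothetical monthly sales series with a linear upward trend (non-stationary mean).
sales = pd.Series([100, 110, 120, 130, 140, 150])

# First-order differencing (d = 1 in ARIMA(p, d, q)):
# each value becomes the change from the previous observation.
diff1 = sales.diff().dropna()

print(diff1.tolist())  # the trend is removed: every difference is 10.0
```

After one round of differencing the series has a constant mean, so d = 1 would suffice here; a series with a stronger trend or seasonality may need further (or seasonal) differencing.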
MATERIALS AND METHODS
Data Overview
When choosing a dataset for further investigation, there are a number of elements to take into
account, such as accessibility, anonymity, and data volume [Zhang, Zhang, Sun, and Liu
(2018)]. We were able to retrieve 5 years of store-item sales data (Kaggle.com, 2019) from the
Kaggle portal, which has no missing data, despite the fact that there are not many publicly
released datasets with sufficient observations for execution of time-series forecasting. There
are sales data from items in various stores included in this study. These data comprise the
date of each sale, the item number, the store number and the amount of daily sales at each
store. The training dataset under analysis has 4 attributes with 913,000 instances each, every
entry showing the daily sales volume at an individual store from 2013 to 2017 inclusive. The
training and testing portions of the dataset have been correctly separated.
Table 3.1: Description of Variables
Variable Description
Date Date of each sales occurrence. No holidays or store closures are recorded
Store An ID is assigned to each store
Item Each product is assigned an ID
Sales The number of products sold at a specific store on a specific date
Data Pre-Processing
One vital step for enhancing any project’s likelihood of success is to pre-process the data.
In this study on sales forecasting, time-series aggregation (grouping) is employed, which
greatly decreases the required computing capacity and improves prediction quality [Kotzur,
Markewitz, Robinius, and Stolten (2018)]. Because of the significant variation in daily sales,
the daily data used in this study are re-scaled to a monthly frequency before projection.
The sales volume and the date each piece of data was ordered are the most crucial components
for performing sales forecasting. The JupyterLab and Google Colab environment in Python
programming language was utilised for the execution of this task with an outlined approach
shown below:
i. Importing and loading the required libraries and dataset.
ii. Performing an exploratory data analysis on the dataset to give a proper insight into
the dataset.
iii. Applying the various machine learning models to the dataset.
iv. Viewing and comparing the performance output of the various models using
appropriate indicators.
iv. Seaborn: A statistical data visualisation library built on top of Matplotlib that
provides a variety of colour palettes accessible to the public, including those that are
frequently used to map colour to a parameter. [Sial, Rashdi, and Khan (2021)]
v. SciKit-learn: Through a consistent, task-oriented interface, Scikit-learn exposes a
huge variety of machine learning algorithms, both supervised and unsupervised,
making it simple to compare approaches for a particular application. It can be
seamlessly integrated into systems outside the conventional scope of statistical data
analysis because it relies on the scientific Python ecosystem. [Pedregosa et al. (2011)]
vi. Matplotlib: In data science, visualisation is an important process. Using
visualisation, one may immediately comprehend data insights. Production-quality
graphs are generated by the Python library for 2D plotting called Matplotlib.
It accommodates both interactive and static plots and supports a variety of output
formats for image storage (PNG, PS, etc.). It offers a large selection of graphic
layouts (bar charts, stacked pie charts, line charts, histograms, and many more).
Because MATLAB was exceptionally good at graphing, Matplotlib was modelled
after MATLAB. Many people switched from MATLAB to Matplotlib due to its
excellent level of similarity, extreme versatility, user-friendliness, and complete
scalability. [Tosi (2009)]
After loading all the mentioned Libraries above, the retail sales dataset, in CSV format,
was loaded into our python environment as illustrated below.
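The original code listing is not reproduced here; a minimal equivalent sketch, assuming the Kaggle file carries the four columns of Table 3.1, is shown below together with the monthly re-scaling described earlier. The in-memory CSV merely stands in for the real train.csv file:

```python
import io
import pandas as pd

# Stand-in for the Kaggle train.csv: same four columns as Table 3.1.
csv_text = """date,store,item,sales
2013-01-01,1,1,13
2013-01-02,1,1,11
2013-01-03,1,1,14
"""

# In the study this would be pd.read_csv("train.csv", parse_dates=["date"]).
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.shape)           # (3, 4)
print(list(df.columns))   # ['date', 'store', 'item', 'sales']

# Re-scale daily sales to a monthly frequency, as described in the pre-processing step.
monthly = df.set_index("date")["sales"].resample("MS").sum()
print(monthly.iloc[0])    # 38 (13 + 11 + 14 for January 2013)
```

On the full dataset the same `read_csv` and `resample` calls would produce the 913,000-row frame and its month-by-month totals.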
An ARIMA model can be tailored to a time series' distinctive values [Swami, Shah, and Ray
(2020)]. In essence, the adaptability of ARIMA modelling allows for the creation of an
appropriate model that is derived from the structure of the data. The partial autocorrelation
and autocorrelation functions enable the identification of information such as trend, random
variation, cyclical components, repetitive patterns, and cointegration by approximating the
stochastic aspect of the time series. Deriving roughly accurate predictions of the series'
anticipated values is therefore straightforward [Ho and Xie (1998)].
Measurement of Forecast Error
The competence of the models is revealed through time-series forecasting evaluation criteria.
Model evaluation can be done in a variety of ways [Hamzaçebi (2008)]. Three widely used
metrics are employed in this study to estimate the model’s competence; the root mean
squared error (RMSE), the mean absolute error (MAE), and the R-squared score. They are
briefly discussed below:
i. Root Mean Squared Error: The root mean squared error measures the average magnitude
of a group of predictions' errors. As seen in Equation (1), the squared errors are summed,
averaged, and the square root is taken. This ensures that the direction of each error is
inconsequential and that all errors carry the same weight. Given that it is a quadratic
function, a global minimum will always be reached. [Mitra, Jain, Kishore, and Kumar
(2022)]

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i - E_i)^2} \qquad (1)$$

where the observed value is $O_i$, the predicted value is $E_i$, and $n$ is the number
of observations.
ii. Mean Absolute Error: The mean absolute error measures the average magnitude of a
collection of predictions' errors. Because the absolute value is taken, the sign of each
error is neglected and every individual error is given the same significance. MAE is
simple to calculate, as Equation (2) illustrates: the sum of the absolute values of the
errors is divided by the number of observations to obtain the average error. [Wang and
Bovik (2009)]

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert O_i - E_i \rvert \qquad (2)$$

where the observed value is $O_i$, the predicted value is $E_i$, and $n$ is the number
of observations.
iii. R-square Score: The coefficient of determination is a number that represents the amount
of variability in the dependent variable that a framework can comprehend [Valbuena et
al. (2019)]. The R-square score is used to evaluate the dispersed data relative to a line
of best fit. Smaller disparities between the expected and true data are evidenced by
higher R-square scores for related datasets. On a scale ranging from zero to 1, it
evaluates the agreement between the predicted and actual data [Mitra et al. (2022)]. An
R-square value of 0.75, for instance, means that variation in the independent variable
accounts for 75% of the variation in the dependent variable under study. Equation (3)
gives the definition:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(O_i - E_i)^2}{\sum_{i=1}^{n}(O_i - \mu)^2} \qquad (3)$$

where the true value is $O_i$, the forecast value is $E_i$, and the mean of the observations
is $\mu$. The sum of squared residuals is $SS_{res}$, while the total sum of squares is
$SS_{tot}$.
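For reference, all three metrics can be computed with scikit-learn as in the following sketch (the observed and predicted arrays are illustrative only, not the study's data):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

O = np.array([10.0, 12.0, 14.0, 16.0])  # observed values O_i (illustrative)
E = np.array([11.0, 12.0, 13.0, 17.0])  # predicted values E_i

rmse = np.sqrt(mean_squared_error(O, E))  # sqrt of the mean squared error
mae = mean_absolute_error(O, E)           # mean of |O_i - E_i|
r2 = r2_score(O, E)                       # 1 - SS_res / SS_tot

print(rmse, mae, r2)  # ≈ 0.8660, 0.75, 0.85
```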
FINDINGS
The examination of the results is treated in this section. Each model is trained on a training
set comprising 80% of the total dataset and tested on the remaining 20%.
ARIMA
The grid search approach is used to determine this ARIMA model's optimal order, which is
(12, 0, 0) [Brownlee (2021)]. The sales forecast is shown in Fig. 4.1 below. The Seasonal
ARIMA (SARIMA) model is likewise tuned with a grid search [Vincent (2021)]. The
Auto-ARIMA grid search performance is assessed against a manually created grid search
architecture, which discovered that (1, 1, 0) x (0, 1, 0, 12) is the ideal order. The SARIMA
model, which utilised Auto-ARIMA, has an RMSE value of 14959.893466.
LSTM
Despite using the same architecture trained on identical data, neural network techniques
yield varying outcomes because they seed random weights during the training phase. A
Stacked LSTM was utilised, and the average outcome of the training and testing procedures,
repeated 50 times, is used to obtain solid results. The Stacked LSTM's MAE value is
12314.000000, its R2 value is 0.992499, and its RMSE is 14579.789785. The results of the
LSTM model's sales predictions are shown in Fig. 4.2 below.
Random Forest
The ultimate output of the Random Forest model is the average of the outputs from each
decision tree in its collection of decision trees. The variables n_estimators and max_depth
define the number of decision trees and their maximum depth. After defining the Random
Forest's features, we fit the model to the training data, forecast the resulting sale difference
values, and then compare the anticipated and actual sale values to gauge the model's
performance. The model achieved an RMSE of 17208.303725, an MAE of 14559.250000
and an R2 value of 0.989551. We see the result in Fig. 4.4 below.
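A hedged sketch of this procedure, using synthetic stand-in data rather than the study's engineered features, might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic features/target standing in for the engineered sales data.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=500)

# 80/20 split, as in the study; shuffle=False would better respect time order.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# n_estimators = number of trees, max_depth = their maximum depth.
model = RandomForestRegressor(n_estimators=100, max_depth=8, random_state=0)
model.fit(X_train, y_train)

# Compare predicted and actual values on the held-out 20%.
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"RMSE: {rmse:.3f}")
```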
Result Comparison
Now that different models have been explored, we can compare the outcomes. By combining
the outcomes from each model that was employed, a comparison can be established. Table
4.1 provides a convenient summary of the results.
Table 4.1: Summary of Error Scores
Model RMSE MAE R2 Percentage Off
Random Forest 17208.303725 14559.250000 0.989551 1.63%
Linear Regression 16221.040791 12433.000000 0.990716 1.39%
ARIMA 14959.893466 11265.335748 0.983564 1.26%
LSTM 14579.789785 12314.000000 0.992499 1.38%
XGBoost 13574.792632 11649.666667 0.993498 1.3%
The Root mean square error (RMSE) and Mean Absolute Error (MAE) will be examined
to compare model performance. Although these two metrics have distinct statistical and
intuitive meanings, both are frequently used to compare the performance of models. By
taking the square root of the mean of all squared errors, we obtain the root mean square
error (RMSE). Because the errors are squared, greater errors have a greater influence on the
total error, while smaller errors have less of an impact. The mean absolute error, or MAE,
indicates
how far off the mark on average our forecasts are from reality. The weight of each error is the
same in this situation. The formula for calculating the percentage by which the predictions
deviate from the actual values, given in Table 4.1 above, is:

$$\text{Percentage Off} = \frac{\mathrm{MAE}}{\bar{O}} \times 100$$

where $\bar{O}$ is the mean of the actual sales values.
XGBoost builds an ensemble of decision trees sequentially, each new tree correcting the
errors of its predecessors, to create a strong predictive model. This ensemble approach
allows it to learn complex patterns in the data, including non-linear interactions between
features. Additionally,
XGBoost employs regularization techniques, such as L1 and L2 regularization, which help
prevent overfitting by penalizing overly complex models. These properties make XGBoost
particularly well-suited for handling the complexities and potential noise in retail sales data.
However, some key hyperparameters that could be considered for tuning include:
i. For XGBoost: max_depth, learning_rate, n_estimators, subsample, colsample_bytree
ii. For Random Forest: n_estimators, max_depth, min_samples_split, min_samples_leaf
iii. For LSTM: hidden_layer_sizes, dropout_rate, learning_rate, batch_size, and epochs.
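As a sketch of such tuning, the following uses scikit-learn's GridSearchCV with GradientBoostingRegressor as a stand-in for XGBoost; the grid values and synthetic data are illustrative, not the study's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the engineered sales features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

# Small illustrative grid over the boosting hyperparameters named above.
param_grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 200],
    "subsample": [0.8, 1.0],
}

# Cross-validated search; RMSE (negated, since sklearn maximises scores) is the criterion.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to Random Forest directly; LSTM hyperparameters are usually tuned with a comparable loop over Keras model configurations rather than GridSearchCV.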
Recommendations for Future Work
This paper’s research can be expanded in numerous ways. First, it would be intriguing to
be able to conduct a comparable analysis for sales forecasting by taking into account the
data with its daily frequency rather than a monthly one. Indeed, retailers and distributors
may find value in more frequent sales forecasts. Furthermore, a more thorough variation
of the study could differentiate between customer segments because the sales tactics used
by the businesses while interacting with consumers of different types and sizes do vary.
Customer segmentation is a very intriguing and active area of applied study, and it can
definitely increase businesses' profits. In the near future, the analysis can therefore be
continued by fusing time-series forecasting with customer classification techniques: knowing
when and what each person buys, how individual clients perceive particular items
(identifying supplementary and alternative commodities for each of them), and how the
demand price elasticities of various products and customers correspond to one another are
all factors that can result in financial gains for businesses. Significant future study should
incorporate discounts and holidays data into the model because they have a major impact
on sales forecasting. There are also additional areas for this study's future research. More
comprehensive LSTM architectures can be tested in order to enhance the outcomes. Adopting multivariate
time-series forecasting is another option to investigate. Additionally, developing hybrid
models, which mix traditional and modern forecasting methodologies, can be advantageous
and represent a viable study area. Lastly, the machine learning algorithms can be attempted
in Actuarial Employee Benefit Valuations as they are also mostly a month-by-month dataset
of employee salaries trying to forecast the future liability needed to be reserved.
The forecasting techniques used in this study for retail sales could potentially be adapted to
actuarial science because both domains involve analyzing time-series data to make predictions
about future outcomes. In retail, the goal is to forecast future sales based on historical patterns,
while in actuarial science, the aim is to predict future claims, premiums, or reserves based on
past data. The underlying principles of time-series analysis, such as identifying trends,
seasonality, and other patterns, are applicable in both fields. However, it's important to note
that actuarial data may have its own unique characteristics and requirements, such as longer
time horizons, different types of risks, and regulatory constraints. Therefore, careful
consideration and domain expertise would be necessary to effectively apply these techniques
in an actuarial context.
Limitations
The lack of a fully detailed dataset capturing metrics such as promotional offers and
discounts prevents the study from having a complete picture of how accurate the models'
predictions are, because these are very important metrics that should be considered in such
an exercise. The computer's inability to run the Python scripts was one restriction for
this study. The majority of this study had to be executed on Google Colab because the keras
library kept shutting down the Jupyter notebook kernel on the computer system that was
used, which indicates that a high-performance processing system is needed for forecasting
exercises.
There is a risk of data leakage and overfitting in this study, particularly given the small dataset
and the use of powerful models like XGBoost. Data leakage can occur if future information is
inadvertently included in the training data, leading to overly optimistic performance estimates.
To mitigate this risk, it's crucial to ensure a strict separation between the training and testing
data and to use techniques like time-based cross-validation to simulate real-world forecasting
scenarios. Overfitting is another concern, especially with models that have high flexibility, such
as XGBoost. Overfitting occurs when a model learns to fit the noise in the training data,
resulting in poor generalization to new data. To address this, techniques like cross-validation,
regularization, and early stopping can be employed. Additionally, collecting more data and
using larger datasets can help reduce the risk of overfitting.
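The time-based cross-validation mentioned above can be sketched with scikit-learn's TimeSeriesSplit, which guarantees that every training index precedes every test index:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 chronological observations standing in for monthly sales totals.
y = np.arange(12)

tscv = TimeSeriesSplit(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(tscv.split(y)):
    # Every training index precedes every test index: no future leakage.
    assert train_idx.max() < test_idx.min()
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
# fold 0: train=[0, 1, 2] test=[3, 4, 5]
# fold 1: train=[0, 1, 2, 3, 4, 5] test=[6, 7, 8]
# fold 2: train=[0, 1, 2, 3, 4, 5, 6, 7, 8] test=[9, 10, 11]
```

Passing such a splitter to any of the models above (for example via cross-validation utilities) simulates the real-world setting where only past data is available at prediction time.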
Acknowledgment
Firstly, all glory be to God Almighty for His favours in enabling me to satisfactorily
conclude this project work. I am particularly appreciative of my project Supervisor who
happens to double as my Personal tutor, Dr. Terry Sithole for her advice, persistence and
feedback in ensuring my completion of this assignment. All my various module lecturers who
have in one way or the other also impacted us as students thereby aiding in the production
of this project work are also not left out. Lastly, I will always be appreciative of my brother,
Babajide Mustapha and my entire family for their financial and psychological support. Their
faith in me kept motivating me to continue.
REFERENCES
Akanksha, A., Yadav, D., Jaiswal, D., Ashwani, A., & Mishra, A. (2022). Store-sales
forecasting model to determine inventory stock levels using machine learning. In 2022
international conference on inventive computation technologies (icict) (pp. 339–344).
Alexis, C. (2022, January). Time series - arima, dnn, xgboost comparison.
http://www.kaggle.com/code/alexisbcook/xgboost.
Bajari, P., Nekipelov, D., Ryan, S. P., & Yang, M. (2015). Machine learning methods for
demand estimation. American Economic Review, 105(5), 481–85.
Berry, M. J., & Linoff, G. S. (2004). Data mining techniques: for marketing, sales, and
customer relationship management. John Wiley & Sons.
Betul, G. (2022, September). Forecasting future sales using machine learning.
http://www.kaggle.com/code/badl071/forecasting-future-sales-using-machine-learning.
Boulden, J. B. (1957). Fitting the sales forecast to your firm. Business Horizons, 1(1), 65–72.
Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis:
forecasting and control. John Wiley & Sons.
Boyapati, S. N., & Mummidi, R. (2020). Predicting sales using machine learning techniques.
Breiman, L. (2001). Random forests. Machine learning, 45(1), 5–32.
Brownlee, J. (2021, November). How to grid search arima model hyperparameters with
python. machinelearningmastery.com/grid-search-arima-hyperparameters-with-python.
Chen, F., & Ou, T. (2009). Gray relation analysis and multilayer functional link network
sales forecasting model for perishable food in convenience store. Expert Systems
with Applications, 36(3), 7054–7063.
Cheriyan, S., Ibrahim, S., Mohanan, S., & Treesa, S. (2018). Intelligent sales prediction using
machine learning techniques. In 2018 international conference on computing,
electronics & communications engineering (iccece) (pp. 53–58).
Choi, T.-M., Hui, C.-L., Liu, N., Ng, S.-F., & Yu, Y. (2014). Fast fashion sales forecasting
with limited data and time. Decision Support Systems, 59, 84–92.
Cui, R., Gallino, S., Moreno, A., & Zhang, D. J. (2018). The operational value of social
media information. Production and Operations Management, 27(10), 1749–1769.
Deng, J. (1989). Introduction to grey theory system. The Journal of Grey System, 1(1), 1–
24.
Derby, N. (2018). Reducing customer attrition with machine learning for financial
institutions. Proceedings of the SAS Global Forum, 1796.
Dey, A., Singh, J., & Singh, N. (2016). Analysis of supervised machine learning algorithms for
heart disease prediction with reduced number of attributes using principal component
analysis. International Journal of Computer Applications, 140(2), 27–31.
Enolac5. (2018, May). Time series - arima, dnn, xgboost comparison. http://www.kaggle
Ensafi, Y., Amin, S. H., Zhang, G., & Shah, B. (2022). Time-series forecasting of seasonal
items sales using machine learning–a comparative analysis. International Journal of
Information Management Data Insights, 2(1), 100058.
Ferreira, K. J., Lee, B. H. A., & Simchi-Levi, D. (2016). Analytics for an online retailer:
Demand forecasting and price optimization. Manufacturing & service operations
management, 18(1), 69–88.
Fildes, R., & Petropoulos, F. (2015). Simple versus complex selection rules for forecasting
many time series. Journal of Business Research, 68(8), 1692–1701.
Fildes, R., Ma, S., & Kolassa, S. (2022). Retail forecasting: Research and practice.
International Journal of Forecasting, 38(4), 1283–1318.
Fisher, M. L., Raman, A., & McClelland, A. S. (2000). Rocket science retailing is almost
here-are you ready? Harvard Business Review, 78(4), 115–123.
Fisher, M., & Raman, A. (2018). Using data and big data in retailing. Production and
Operations Management, 27(9), 1665–1669.
Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals
of statistics, 1189–1232.
Gamboa, J. C. B. (2017). Deep learning for time-series analysis. arXiv preprint
arXiv:1701.01887.
Hamzaçebi, C. (2008). Improving artificial neural networks' performance in seasonal time
series forecasting. Information Sciences, 178(23), 4550–4559.
Ho, S. L., & Xie, M. (1998). The use of arima models for reliability forecasting and
analysis. Computers & industrial engineering, 35(1-2), 213–216.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation,
9(8), 1735–1780.
Hsu, C.-C., & Chen, C.-Y. (2003a). Applications of improved grey prediction model for
power demand forecasting. Energy Conversion and management, 44(14), 2241–2249.
Hsu, L.-C. (2009). Forecasting the output of integrated circuit industry using genetic algorithm
based multivariable grey optimization models. Expert systems with applications,
36(4), 7898–7903.
Hsu, L.-C., & Wang, C.-H. (2007). Forecasting the output of integrated circuit industry
using a grey model improved by the Bayesian analysis. Technological Forecasting and
Social Change, 74(6), 843–853.
Hyndman, R., Lee, A., Wang, E., & Wickramasuriya, S. (2018). hts: Hierarchical and
grouped time series. R package version 5.1. 5.
Jain, A., Menon, M. N., & Chandra, S. (2015). Sales forecasting for retail chains. San
Diego, California: UC San Diego Jacobs School of Engineering.
Kaggle.com. (2019, September). Store item demand forecasting challenge.
https://www.kaggle.com/competitions/demand-forecasting-kernels-only/data.
Kalaoglu, Ö. I., Akyuz, E. S., Ecemis, S., Eryuruk, S. H., Sümen, H., & Kalaoglu, F.
(2015). Retail demand forecasting in clothing industry. Textile and Apparel, 25(2),
172–178.
Kotzur, L., Markewitz, P., Robinius, M., & Stolten, D. (2018). Impact of different time
series aggregation methods on optimal energy system design. Renewable energy, 117,
474–487.
Lei, M., & Feng, Z. (2012). A proposed grey model for short-term electricity price forecasting
in competitive power markets. International Journal of Electrical Power & Energy
Systems, 43(1), 531–538.
Levy, M., Weitz, B. A., Grewal, D., & Madore, M. (2012). Retailing management
(Vol. 6).
Li, Z., Ma, X., & Xin, H. (2017). Feature engineering of machine-learning
chemisorption models for catalyst design. Catalysis today, 280, 232–238.
Lin, Y.-H., & Lee, P.-C. (2007). Novel high-precision grey forecasting model. Automation
in construction, 16(6), 771–777.
Liu, N., Ren, S., Choi, T.-M., Hui, C.-L., & Ng, S.-F. (2013). Sales forecasting for
fashion retailing service industry: a review. Mathematical Problems in
Engineering, 2013.
Lo, T. (1994). An expert system for choosing demand forecasting techniques.
International Journal of Production Economics, 33(1-3), 5–15.
Lou, W., Wang, X., Chen, F., Chen, Y., Jiang, B., & Zhang, H. (2014). Sequence based
prediction of dna-binding proteins based on hybrid feature selection using random
forest and gaussian naive bayes. PloS one, 9(1), e86703.
Ma, S., & Fildes, R. (2021). Retail sales forecasting with meta-learning. European Journal
of Operational Research, 288(1), 111–128.
Manaswi, N. K. (2018). Understanding and working with keras. In Deep learning with
applications using python (pp. 31–43). Springer.
Mancuso, P., Piccialli, V., & Sudoso, A. M. (2021). A machine learning approach for
forecasting hierarchical time series. Expert Systems with Applications, 182, 115102.
McKinney, W. (2012). Python for data analysis: Data wrangling with pandas, numpy, and
ipython. “O’Reilly Media, Inc.".
Mitchell, T. M. (1997). Does machine learning really work? AI magazine, 18(3), 11–11.
Mitra, A., Jain, A., Kishore, A., & Kumar, P. (2022). A comparative study of demand
forecasting models for a multi-channel retail company: A novel hybrid machine
learning approach. In Operations research forum (Vol. 3, pp. 1–22).
Ofoegbu, K. (2021). A comparison of four machine learning algorithms to predict product sales
in a retail store (Unpublished doctoral dissertation). Dublin Business School.
Pal, M. (2005). Random forest classifier for remote sensing classification. International journal
of remote sensing, 26(1), 217–222.
Pavlyshenko, B. M. (2019). Machine-learning models for sales time series forecasting. Data,
4(1), 15.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.
(2011). Scikit-learn: Machine learning in python. The Journal of machine learning
research, 12, 2825–2830.
Prudêncio, R. B., & Ludermir, T. B. (2004). Meta-learning approaches to selecting time series
models. Neurocomputing, 61, 121–137.
Rokach, L. (2016). Decision forest: Twenty years of research. Information Fusion, 27, 111–
125.
Sakai, H., Nakajima, H., Higashihara, M., Yasuda, M., & Oosumi, M. (1999). Development
of a fuzzy sales forecasting system for vending machines. Computers & industrial
engineering, 36(2), 427–449.
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3(3), 210-229. doi: 10.1147/rd.33.0210
Sanders, N. R., & Graman, G. A. (2009). Quantifying costs of forecast errors: A case study
of the warehouse environment. Omega, 37(1), 116–125.
Saradhi, V. V., & Palshikar, G. K. (2011). Employee churn prediction. Expert Systems
with Applications, 38(3), 1999–2006.
Schaeffer, S. E., & Sanchez, S. V. R. (2020). Forecasting client retention: A machine-
learning approach. Journal of Retailing and Consumer Services, 52, 101918.
Shearer, C. (2000). The crisp-dm model: the new blueprint for data mining. Journal of
data warehousing, 5(4), 13–22.
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (rnn) and long short-term
memory (lstm) network. Physica D: Nonlinear Phenomena, 404, 132306.
Sial, A. H., Rashdi, S. Y. S., & Khan, A. H. (2021). Comparative analysis of data
visualization libraries matplotlib and seaborn in python. International Journal, 10(1).
Su, X., Yan, X., & Tsai, C.-L. (2012). Linear regression. Wiley Interdisciplinary
Reviews: Computational Statistics, 4(3), 275–294.
Swami, D., Shah, A. D., & Ray, S. K. (2020). Predicting future sales of retail products
using machine learning. arXiv preprint arXiv:2008.07779.
Tanaka, K. (2010). A sales forecasting model for new-released and nonlinear sales
trend products. Expert Systems with Applications, 37(11), 7387–7393.
Tosi, S. (2009). Matplotlib for python developers. Packt Publishing Ltd.
Tsao, Y.-C., Chen, Y.-K., Chiu, S.-H., Lu, J.-C., & Vu, T.-L. (2022a). An innovative
demand forecasting approach for the server industry. Technovation, 110, 102371.
Tsoumakas, G. (2019a). A survey of machine learning techniques for food sales prediction.
Artificial Intelligence Review, 52(1), 441–447.
Valbuena, R., Hernando, A., Manzanera, J. A., Görgens, E. B., Almeida, D. R., Silva,
C. A., & García-Abril, A. (2019). Evaluating observed versus predicted forest
biomass: R- squared, index of agreement or maximal information coefficient?
European Journal of Remote Sensing, 52(1), 345–358.
Van Der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The numpy array: a structure
for efficient numerical computation. Computing in science & engineering, 13(2), 22–
30.
Vincent, T. (2021, November). Arima time series data forecasting and visualization
in python.www.digitalocean.com/community/tutorials/a-guide-to-time-series-
forecastingwith-arima-in-python-3.
Wang, Z., & Bovik, A. C. (2009). Mean squared error: Love it or leave it? A new look at
signal fidelity measures. IEEE signal processing magazine, 26(1), 98–117.
Wei, Z., & Shan, X. (2019). Research on forecast method of railway passenger flow
demand in pre-sale period. In Iop conference series: Materials science and
engineering (Vol. 563, p. 052080).
Winters, P. R. (1960a). Forecasting sales by exponentially weighted moving averages.
Management science, 6(3), 324–342.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE
transactions on evolutionary computation, 1(1), 67–82.
Wong, W. K., & Guo, Z. (2010). A hybrid intelligent model for medium-term sales
forecasting in fashion retail supply chains using extreme learning machine and
harmony search algorithm. International Journal of Production Economics, 128(2),
614–624.
Yao, A. W., Chi, S., & Chen, J. (2003). An improved grey-based approach for
electricity demand forecasting. Electric Power Systems Research, 67(3), 217–224.
Yoo, T.-W., & Oh, I.-S. (2020). Time series forecasting of agricultural products sales volumes
based on seasonal long short-term memory. Applied Sciences, 10(22), 8169.
Yucesan, M., Gul, M., & Erkan, E. (2017). Application of artificial neural networks
using Bayesian training rule in sales forecasting for furniture industry. Drvna
industrija, 68(3), 219–228.
Zhang, C., Zhang, H., Sun, Q., & Liu, K. (2018). Mechanical properties of zr41. 2ti13.
8ni10cu12. 5be22. 5 bulk metallic glass with different geometric confinements. Results
in Physics, 8, 1–6.
Zhuge, Q., Xu, L., & Zhang, G. (2017). Lstm neural network with emotional analysis for
prediction of stock price. Engineering letters, 25(2).
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work
simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that
allows others to share the work with an acknowledgment of the work's authorship and initial
publication in this journal.