Paper 4
https://doi.org/10.1007/s42979-021-00970-5
ORIGINAL RESEARCH
Received: 16 July 2021 / Accepted: 12 November 2021 / Published online: 26 November 2021
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2021
Abstract
The stock market is complex in nature and very difficult to predict, because many factors affect stock values. The stock
market plays an important role in the financial side of a country's growth. Demand for predicting stock values is very
high, hence the need for stock market analysis. Investors always take risks when they invest money in the stock market to
gain profit. Various machine-learning techniques are available to predict the stock market, and this article focuses on
predicting stock market values. At present, stock market forecasting is done using machine learning and artificial
intelligence, which make the prediction process easier by training on previous values to estimate the current stock rate.
Stock price prediction is based on time series data, meaning that every new data point depends on previous values. The
dataset used in this article is the Dell daily stock price for the period 17 Aug 2016–21 May 2021. Different kinds of
models can help in predicting the stock market. A simple machine-learning model cannot be applied directly to time series
data, which is why we studied models such as LSTM and ARIMA, which are suited to time series data. In the end, we saw that
ARIMA is one of the best models for predicting stock market values for short time series. This model is based on previous
values, and it gave more accurate results than the LSTM model.
Introduction
SN Computer Science (2022) 3:88
weak type of market efficacy as well as a high noise level. Wang et al. [2] used an ANN to predict stock prices in 2003 and concentrated on capacity, which is a unique element of the stock. Among the significant conclusions was that increasing the capacity did not improve predictive performance on the datasets they studied, which were the S&P 500 and the DJI.

In [3], Ince et al. focused on short-term forecasting and used a support vector machine (SVM) model for stock price prediction. The authors' key contribution is a comparison of MLP with SVM, which indicated that SVM outperformed MLP in the majority of situations; however, the outcome was also influenced by the different transaction methods used. Meanwhile, financial domain experts were studying stock market data using traditional statistical approaches and signal-processing techniques.

Short-term stock price prediction was also done by means of optimization methods, e.g., principal component analysis (PCA) [4]. Over time, researchers have attempted to evaluate stock market transactions such as volume burst risks, broadening the field of stock market analysis and indicating that this area still has a lot of promise [5]. As AI-based methods improved in recent years, numerous proposed solutions, such as Liu and Wang [6], attempted to merge deep learning (DL) and ML methods built on earlier methods.

To assess different quantitative strategies in stock markets, Liu et al. introduced a model based on a CNN and an LSTM neural network in [7]. The CNN is used for stock selection and automatically extracts features from numerical data; an LSTM is then used to retain the time series structure in order to increase profits. To predict the stock market index, a recent study offers a hybrid neural network design that combines a CNN with a bidirectional LSTM [8].

Many techniques are used for stock market prediction, such as ANN, fuzzy logic, and SVM. Recently, the ARIMA method has been used for this problem to predict the pattern. ARIMA has been successful in analyzing and predicting time series and is best known for short-term prediction. In this article, we use a daily fractional change in the stock value. Section 2 of the paper presents an extensive literature survey of similar work in the area. The following section discusses the proposed methodology, and results are presented in the section after that.

Literature Survey

Lee et al. compared the forecasting technique and reliability of the BPNN model and a time series (SARIMA) model on the Korean Stock Exchange [9]. That article reported that the SARIMA model's forecasts are more accurate than those of the BPNN model for the KOSPI model. The KOSPI model best predicts nonlinear, uncertain, and volatile data. Forecasting an accurate value depends entirely on the process used to develop the predictive model; this study also helps to establish the procedure for building the SARIMA and BPNN models.

Adebiyi et al. presented the ARIMA model to forecast future movements of securities, stocks, and financial markets by examining a regression analysis of their data [10]. Using the ARIMA model, an individual or any entity can predict the value of a stock in the stock market. Its main goal is to predict the differences between values in the series. The ARIMA model helps to predict the short-term price of a stock, which helps the investor to invest in the stock at the right point in time.

Rafiqul et al. presented a comparative study of three models, Geometric Brownian Motion, ARIMA, and ANN, for forecasting future stock prices [11]. The study shows that the ARIMA and Geometric Brownian Motion models are much better than the ANN at predicting the future price of a stock. Both ARIMA and the stochastic model can be used for short-term prediction using time series data.

Devi et al. showed in their paper how to infer a new investment decision based on the lowest error percentage obtained [12]. This paper also highlighted forecasting of each index over the next few years.

Alma Sarah et al. presented a model that showed short-term prediction of the high technology procedure [13]. ARIMA models are fitted over the previous year's dataset to improve short-term prediction. In this paper, they show the use of the method on banking stock market data and verify its accuracy. This study has shown that the method was limited to short-term forecasting.

Proposed Approach

ARIMA Model

ARIMA stands for auto-regressive integrated moving average. This model is a generalization of the autoregressive and moving average models. It is best for short time series forecasting. Generally, time series analysis needs stationary data, but stock market data are non-stationary. Such non-stationary data are handled using the ARIMA approach, which was introduced by Box and Jenkins in 1970. This model is well suited to predicting stock market values. In this model, the future value of a variable is based on a linear combination of past errors and past values.
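The linear combination of past values and past errors used by ARIMA can be written out explicitly. The form below is the usual Box-Jenkins notation, given here as an assumption: the symbols (the series value Y_t after differencing, the error terms, and the coefficients phi and theta) are not defined in the text above.

```latex
\[
Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p}
      + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}
\]
```

Here p is the number of autoregressive terms, q is the number of moving average terms, and the series is differenced d times beforehand, which gives the ARIMA (p,d,q) naming used later in the article.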
LSTM Model

The LSTM model was presented by Hochreiter and Schmidhuber [14]. LSTMs are designed to avoid the long-term dependency problem. This model can predict an arbitrary number of steps into the future. It is useful for both long-term and short-term data modeling. It has five components.

Cell state (c_t): It represents the internal memory of the cell, which stores both short-term and long-term memories.
Hidden state (h_t): This is the output state information, computed from the previous hidden state, the current input, and the current cell input, which is eventually used to predict the future stock market values.
Input gate (i_t): It decides how much information flows from the current input to the cell state.
Forget gate (f_t): It decides how much information flows into the current cell state from the previous cell state and the current input.
Output gate (o_t): It decides how much information flows from the current cell state into the hidden state, which helps to choose between long-term memories and short-term memories.

Methodology

This section is divided into three subsections. One deals with the data that are primarily used for building the model. The other two subsections describe the theories and processes used for building the models. The performance of the model is tested by the following:

• Root mean square error
• Mean absolute error
• Mean square error and
• Regression score

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \lvert y_i - x_i \rvert}{n}$$

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n}}$$

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.$$
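The four error measures above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the article: the function names are hypothetical, RSS and TSS are taken to be the residual and total sums of squares, and the sample numbers are the first four actual and ARIMA-predicted prices from the prediction table later in the article.

```python
import math

def mae(actual, predicted):
    # Mean absolute error: average of |actual - predicted|.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    # Mean square error: average of squared differences.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean square error: square root of the MSE.
    return math.sqrt(mse(actual, predicted))

def r2_score(actual, predicted):
    # R^2 = 1 - RSS / TSS, with TSS measured around the mean of the actual values.
    mean_actual = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - rss / tss

# First four rows of the ARIMA (0,1,1) prediction table (actual, predicted).
actual = [50.0, 49.89, 49.62, 50.46]
predicted = [49.97671, 50.00259, 49.87748, 49.59137]

print("MAE :", mae(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2  :", r2_score(actual, predicted))
```

A negative R2 is possible on such a short window, because the score compares the model against simply predicting the mean of the actual values.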
Dataset
• volume,
• close,
• low,
• high,
• daily open,
• adjusted close price.
For the LSTM model described above, the equations are represented as follows:
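A commonly used formulation of the LSTM gate equations is sketched below. It is the standard textbook form, given as an assumption rather than as the article's own notation: the weight matrices W and U, the bias vectors b, the candidate cell state, and the input x_t are symbols introduced here for illustration.

```latex
\begin{align*}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{align*}
```

In this form the forget gate f_t scales the previous cell state, the input gate i_t scales the new candidate, and the output gate o_t controls how much of the cell state reaches the hidden state h_t, matching the roles of the five components listed for the LSTM model.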
function from the pmdarima Python package, which returned the value 1.

From the PACF and ACF plots of the data in Fig. 4, the autoregressive and moving average orders p and q were found. In both the PACF and ACF plots, there is no significant lag, which indicates an AR process of order p = 0 and an MA process of order q = 0, i.e., ARIMA (0,1,0). Other ARIMA (p,d,q) models were also considered in this article. The most accurate model was picked using the Akaike information criterion (AIC); the lower the value, the better the model. From Table 1, the ARIMA (0,1,1) model has the minimum AIC value. Table 2 lists three error measures, MAE, MSE, and RMSE, together with the R2 score, which shows the goodness of fit of the model.
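The value 1 reported above appears to be the differencing order d, which is consistent with the ARIMA (0,1,0) and ARIMA (0,1,1) models named in the text. As a minimal sketch of what first-order differencing does, the hypothetical helper below replaces each price with its change from the previous day; the sample prices are the first five actual values from the ARIMA prediction table, used only for illustration.

```python
def difference(series, order=1):
    # Apply first differencing `order` times: y'[t] = y[t] - y[t-1].
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result

# First five actual prices from the ARIMA (0,1,1) prediction table.
prices = [50.0, 49.89, 49.62, 50.46, 50.83]

# With d = 1 the model works on day-to-day changes instead of price levels.
changes = difference(prices, order=1)
print([round(change, 2) for change in changes])
```

Each round of differencing shortens the series by one observation, so a series of n prices yields n - d changes.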
LSTM

The first step after data collection is data pre-processing, which covers data transformation, data cleaning, and data integration. Data transformation involved data normalization: MinMaxScaler scales all the data to lie in the range 0 to 1. After the dataset is normalized and cleaned, it is divided into training and testing sets. The testing data are kept as 30% of the total dataset.

We use an LSTM RNN to create our model, with 70% of the data for training and 30% for testing. We optimize the model using mean squared error as the training loss and train it for 300 epochs. Table 3 shows a summary of the resulting LSTM model, and Table 4 gives the error measures that help to decide how the model is performing.

Result

In this section, we discuss the above two models and give an overview of the actual and predicted prices, shown through graphical representations.

ARIMA

To calculate the error, the predicted values were compared with the real values. This comparison is shown in Table 5.

$$\text{Error} = \frac{\text{actual} - \text{predicted}}{\text{actual}} \times 100.$$

From Table 5, we can observe that the errors are less than 4% in magnitude for the daily forecast. Relative errors lie in the range of −3.58617 to 2.210314. In Fig. 5, the actual prices are plotted against the predicted prices. Figure 5 shows that the predicted prices are very close to the actual prices for ARIMA (0,1,1). The performance of the ARIMA (0,1,1) model was assessed using the error measures in Table 2, and Table 5 illustrates the comparison between test and predicted results. Figure 6 is a zoomed-in version of Fig. 5.

Table 6 Prediction by ARIMA (0,1,1) model
No.   Actual   Predicted   Error (%)
1     50       49.97671    0.046588
2     49.89    50.00259    −0.22568
3     49.62    49.87748    −0.51891
4     50.46    49.59137    1.721431
5     50.83    50.55635    0.538376
6     52.01    50.86041    2.210314
7     50.98    52.13819    −2.27185
8     50.91    50.85306    0.111842
9     51.39    50.91623    0.921906
10    52.29    51.44202    1.621683
11    50.57    52.38352    −3.58617
12    51.08    50.37357    1.382995
13    49.8     51.15498    −2.72085
14    50       49.65782    0.684369
15    49.62    50.03563    −0.83762
16    49.51    49.57673    −0.13479
17    49.83    49.50305    0.656138
18    50.86    49.86406    1.958199
19    50.6     50.96416    −0.71968
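The Error column of the ARIMA prediction table follows directly from the error formula, Error = (actual − predicted) / actual × 100. The sketch below recomputes it for the first four rows; the function name is hypothetical, and small differences in the last digits are to be expected because the tabulated predictions are rounded.

```python
def percent_error(actual, predicted):
    # Relative error in percent: (actual - predicted) / actual * 100.
    return (actual - predicted) / actual * 100

# (actual, predicted) pairs from the first four rows of the ARIMA (0,1,1) table.
rows = [
    (50.0, 49.97671),
    (49.89, 50.00259),
    (49.62, 49.87748),
    (50.46, 49.59137),
]

for actual, predicted in rows:
    print(f"{actual:6.2f}  {predicted:9.5f}  {percent_error(actual, predicted):9.5f}")
```

A positive error means the model predicted below the actual price, and a negative error means it predicted above it.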
LSTM

To calculate the error, the predicted prices were compared with the actual prices, as given in Table 6, using the error formula:

$$\text{Error} = \frac{\text{actual} - \text{predicted}}{\text{actual}} \times 100.$$

Table 6 shows that the relative errors for the daily forecast are less than 5% in magnitude, ranging from −4.52004 to 1.18007. Figure 7 depicts a graph of the actual data and the LSTM model's predicted stock price. The blue line in this graph reflects Dell's actual stock price, while the orange line represents its forecasted stock price. The performance of this model is evaluated using the error measures in Table 6 and Table 4, which represent the comparison between actual and predicted values. Figure 8 shows a zoomed-in view of Fig. 7.

Table 7 Prediction by LSTM model
No.   Actual   Predicted   Error (%)
1     50       50.56885    −1.1377
2     49.89    50.58073    −1.38451
3     49.62    50.43788    −1.64828
4     50.46    50.14669    0.620905
5     50.83    51.09684    −0.52497
6     52.01    51.39624    1.18007
7     50.98    52.64707    −3.27004
8     50.91    51.35383    −0.87179
9     51.39    51.40783    −0.0347
10    52.29    51.93532    0.678292
11    50.57    52.85579    −4.52004
12    51.08    50.8346     0.480429
13    49.8     51.62289    −3.66042
14    50       50.15513    −0.31026
15    49.62    50.53177    −1.83751
16    49.51    50.09539    −1.18236
17    49.83    50.03061    −0.40258
18    50.86    50.40498    0.894646
19    50.6     51.50288    −1.78434

Conclusion

We now discuss the collective results obtained from the two models discussed above. Table 7 presents the experimental output generated from the supplied models, whereas Fig. 9 graphically depicts the result.

The output of the ARIMA (0,1,1) model and the output of the LSTM model are quite close, and they sometimes match, as seen in Fig. 10.

When comparing the error measures in Table 8, it is evident that the ARIMA model outperforms the LSTM model when it comes to predicting the next-day stock price.

The goal of this research is to compare the performance of an LSTM model and a time series ARIMA model in forecasting. We discover the following using DELL data: first, the ARIMA model consistently outperforms the LSTM
Fig. 10 Zoom view of prediction by ARIMA (0,1,1) and LSTM against actual price
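The comparison between the two models can be illustrated using only the 19 rows listed in the two prediction tables. The sketch below computes the mean absolute error of each model on those rows. It is a small worked example under stated assumptions, not a reproduction of the article's Table 8: the variable and function names are hypothetical, and the 19 rows are only a sample of the test period.

```python
# Actual prices and model predictions, copied from the two prediction tables.
actual = [50.0, 49.89, 49.62, 50.46, 50.83, 52.01, 50.98, 50.91, 51.39, 52.29,
          50.57, 51.08, 49.8, 50.0, 49.62, 49.51, 49.83, 50.86, 50.6]
arima_pred = [49.97671, 50.00259, 49.87748, 49.59137, 50.55635, 50.86041,
              52.13819, 50.85306, 50.91623, 51.44202, 52.38352, 50.37357,
              51.15498, 49.65782, 50.03563, 49.57673, 49.50305, 49.86406,
              50.96416]
lstm_pred = [50.56885, 50.58073, 50.43788, 50.14669, 51.09684, 51.39624,
             52.64707, 51.35383, 51.40783, 51.93532, 52.85579, 50.8346,
             51.62289, 50.15513, 50.53177, 50.09539, 50.03061, 50.40498,
             51.50288]

def mean_absolute_error(actual_values, predicted_values):
    # Average absolute difference between actual and predicted prices.
    total = sum(abs(a - p) for a, p in zip(actual_values, predicted_values))
    return total / len(actual_values)

arima_mae = mean_absolute_error(actual, arima_pred)
lstm_mae = mean_absolute_error(actual, lstm_pred)

print(f"ARIMA (0,1,1) MAE on tabulated rows: {arima_mae:.3f}")
print(f"LSTM MAE on tabulated rows:          {lstm_mae:.3f}")
```

On these rows the ARIMA predictions have the smaller mean absolute error (about 0.61 against about 0.70 for the LSTM), which agrees in direction with the comparison stated in the Conclusion.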
3. Jeon S, Hong B, Chang V. Pattern graph tracking-based stock price prediction using big data. Future Gener Comput Syst. 2018;80:171–87. https://doi.org/10.1016/j.future.2017.02.010.
4. Lin X, Yang Z, Song Y. Expert systems with applications short-term stock price prediction based on echo state networks. Expert Syst Appl. 2009;36(3):7313–7. https://doi.org/10.1016/j.eswa.2008.09.049.
5. Shih D, Hsu H, Shih P. A study of early warning system in volume burst risk assessment of stock with big data platform. In: 2019 IEEE 4th international conference on cloud computing and big data analysis (ICCCBDA). 2019. p. 244–248. https://doi.org/10.1109/ICCCBDA.2019.8725738.
6. Liu G, Wang X. A new metric for individual stock trend prediction. Eng Appl Artif Intell. 2019;82:1–12. https://doi.org/10.1016/j.engappai.2019.03.019.
7. Liu S, Zhang C, Ma J. CNN-LSTM neural network model for quantitative strategy analysis in stock markets. 2017;1:198–206. https://doi.org/10.1007/978-3-319-70096-0.
8. Eapen J, Bein D, Verma A. Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction. In: 2019 IEEE 9th annual computing and communication workshop and conference (CCWC). 2019. p. 264–70. https://doi.org/10.1109/CCWC.2019.8666592.
9. Lee K, Yoo S, Jongdae JJ. Neural network model vs SARIMA model in forecasting Korean Stock Price Index (KOSPI). Issues Inf Syst. 2007;8(2):372–8.
10. Ariyo AA, Adewumi AO, Ayo CK. Stock price prediction using the ARIMA model. In: 2014 UKSim-AMSS 16th international conference on computer modelling and simulation. 2014. p. 106–112.
11. Islam MR, Nguyen N. Comparison of financial models for stock price prediction. J Risk Financ Manag. 2020;13(8):181.
12. Uma Devi B, Sundar D, Alli P. An effective time series analysis for stock trend prediction using ARIMA model for nifty Midcap-50. Int J Data Min Knowl Manag Process. 2013;3(1).
13. Almasarweh M, Al Wadi S. ARIMA model in predicting banking stock market data. Mod Appl Sci. 2018;12(11).
14. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
15. Edward AJ, Manoj Jyothi. Forecast model using ARIMA for stock prices of automobile sector. International Journal of Research in Finance and Marketing. 2016;6(4):174–178.
16. Srivastava AK, Kumar Y, Singh PK. A rule-based monitoring system for accurate prediction of diabetes: monitoring system for diabetes. Int J E-Health Med Commun. 2020;11(3):32–53. https://doi.org/10.4018/IJEHMC.2020070103.
17. Moghar A, Hamiche M. Stock market prediction using LSTM recurrent neural network. Procedia Comput Sci. 2020;170:1168–1173. https://doi.org/10.1016/j.procs.2020.03.049.
18. Nochai R, Nochai T. ARIMA model for forecasting oil palm price. In: Proceedings of 2nd IMT-GT regional conference on mathematics, statistics and applications, Universiti Sains Malaysia, Penang, June 13–15, 2006.
19. Roondiwala R, Patel H, Varma S. Predicting stock prices using LSTM. Int J Sci Res. 2015.
20. Srivastava AK, Singh PK, Kumar Y. A taxonomy on machine learning based techniques to identify the heart disease. In: Prateek M, Sharma D, Tiwari R, Sharma R, Kumar K, Kumar N, editors. Next generation computing technologies on computational intelligence. NGCT 2018. Communications in computer and information science, vol 922. Springer, Singapore; 2019. https://doi.org/10.1007/978-981-15-1718-1_2.
21. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/. Accessed 5 Aug 2021 (for complete understanding of LSTM model).
22. https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/. Accessed 5 Aug 2021 (for complete understanding of ARIMA model).
23. https://machinelearningmastery.com/how-to-load-visualize-and-explore-a-complex-multivariate-multistep-time-series-forecasting-dataset/ (how to load, visualize and explore a time series dataset).

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.