Stock Movement Prediction Based On Technical Indicators Applying Hybrid Machine Learning Models

Zakia ZOUAGHIA, Zahra KODIA AOUINA, Lamjed BEN SAID
University of Tunis, Department of Computer Science, SMART LAB, ISG Tunis, Tunisia
[email protected], [email protected], [email protected]

2023 International Symposium on Networks, Computers and Communications (ISNCC) | 979-8-3503-3559-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/ISNCC58260.2023.10323971

Abstract—The prediction of stock price movements is one of the most challenging tasks in the financial market field. Stock price trends depend on various external factors, such as investors' sentiments and health and political crises, which can make stock prices more volatile and chaotic. Lately, two crises have affected the variation of stock prices: the COVID-19 pandemic and the Russia-Ukraine conflict. Investors need a robust system to predict future stock trends in order to make successful investments and to cope with huge losses in uncertain situations. Recently, various machine learning (ML) models have been proposed to make accurate stock movement predictions. In this paper, a framework including five ML classifiers (Gaussian Naive Bayes (GNB), Random Forest (RF), Gradient Boosting (GB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN)) is proposed to predict closing price trends. Technical indicators are calculated and used with historical stock data as input. These classifiers are hybridized with the Principal Component Analysis (PCA) method for feature selection and the Grid Search (GS) optimization algorithm for hyper-parameter tuning. Experiments are conducted on National Association of Securities Dealers Automated Quotations (NASDAQ) stock data covering the period from 2018 to 2023. The best result was obtained with the Random Forest classifier, which achieved the highest accuracy (61%).

Index Terms—Stock market, Machine learning classifiers, Technical indicators, Optimization, Automatic features selection, Prediction, Financial decision making, NASDAQ index.

I. INTRODUCTION

A stock market is a place where investors (buyers or sellers of stocks) gather; they seek to maximize profits and minimize risks by making timely, accurate decisions: selling, buying or holding a given stock. Stock markets are an important part of countries' financial systems. Stock price movements are generally driven by numerous factors [3], such as investors' beliefs and political (lately the Russian-Ukrainian, European, and American conflict), economic [1] or health (lately the COVID-19 pandemic) factors. Thus, due to these recent crises, stock markets have become more volatile, which is a sign of uncertainty and fear among investors. This condition has made stock movement prediction a difficult task. In recent years, huge progress has been seen in the application of artificial intelligence (AI) techniques in the financial field, especially ML algorithms, which are a subset of AI and one of the most powerful recent tools. Furthermore, ML is widely used to analyze financial data and to make predictions of stock price trends [2].

In this research, we address the problem of stock price prediction as a classification problem. The work has three principal objectives: (i) to use historical stock data of the NASDAQ Composite index and technical indicators as input; (ii) to use five hybrid ML algorithms combined with two methods (PCA for automatic feature selection and the GS technique for hyper-parameter optimization) to predict closing price trends; and (iii) to analyze and compare the outcomes of the considered models and recommend the best one to investors. This contribution reveals the power of a set of traditional, ensemble, and non-parametric ML models hybridized with other methods to make accurate financial decisions.

This paper is organized into six sections. After the introduction, Section II reviews related work on the use of ML models to predict stock price trends. Section III explains the baseline algorithms and methods used in the developed framework. Section IV presents the data used and the architecture of the proposed approach. Section V reports the outputs of the experiments. Finally, Section VI presents concluding remarks, discusses limitations of this work and proposes future research directions.

II. LITERATURE REVIEW

In [4], the authors proposed a prediction system for banking stock prices (BRI) that uses the Naive Bayes (NB) algorithm to classify stock price movements. They suggested that this system could be a useful tool to help investors discover the movement of future stock prices in order to make financial decisions. The proposed model achieved 57% accuracy.

In [5], the authors implemented four ML models (Artificial Neural Network (ANN), NB, RF and SVM) that use technical indicators as input features to predict future stock price trends. Experiments were conducted on Shanghai Stock Exchange (SSE) 50 index stocks. Results revealed that the average test accuracy over the considered stocks was 53.9%, 51.4%, 50.4% and 48.6% for the ANN, SVM, RF and NB models, respectively.

In [9], the authors predicted the daily close returns of three traded stocks (GARAN, THYAO, and ISCTR) in the Borsa İstanbul (BIST 100) index. Technical indicators were calculated from the stock prices and then used as input. Additionally, to choose the most important features, the authors used selection methods combined with a GB classifier model. Accuracy and F-measure metrics were used to evaluate the quality of prediction. The accuracy of this model reached 59.9%, 55.8% and 58.1% for the GARAN, THYAO, and ISCTR stocks, respectively.

In [12], the authors used open and close stock prices, in addition to a calculated technical indicator called Change Over Time (COT), as input to the KNN algorithm to predict stock price movements. Results showed that with a K value equal to 6, the KNN model attained an accuracy of 58.82%.

In [13], the authors proposed a hybrid model combining the Binary Gravity Search Algorithm (BGSA) with SVM to forecast and trade the stock indices S&P500, DAX, FTSE100, NKY and CAC. The proposed BGSA-SVM model outperformed other models such as Random Walk (RW), SVM, and Buy-and-Hold. It achieved 52.87% in terms of accuracy.

The reviewed papers are summarized in Table I.

TABLE I
SUMMARY OF LITERATURE REVIEW

Paper | Data | Model | Accuracy
[4] | Share price data at BRI | NB | 57%
[5] | SSE | ANN, NB, RF, and SVM | ANN=53.9%, SVM=51.4%, RF=50.4%, and NB=48.6%
[9] | GARAN, THYAO, and ISCTR | GBC | 59.9%, 55.8% and 58.1%
[12] | Stock dataset | KNN | 58.82%
[13] | S&P500, DAX, FTSE100, NKY, and CAC | BGSA-SVM | 52.82%

III. BASELINES

A. Gaussian Naïve Bayes (GNB) algorithm

GNB is a supervised ML algorithm and a probabilistic classifier based on Bayes' theorem [11]. It computes a set of probabilities by aggregating the frequencies and value combinations in a given dataset [4]. This algorithm is used for solving classification problems.

B. Random Forest (RF) algorithm

RF is a popular supervised ML algorithm. It uses an ensemble learning technique, combining several classifiers to solve intricate problems. It can be used for both classification and regression problems; this paper focuses on building a classification model. As an ensemble learning method, RF combines multiple trees to predict the class of the data. It trains many decision trees on different subspaces of the feature space, and none of the trees in the forest sees the whole training data [5]. The data is recursively divided into partitions, as illustrated in Figure 1.

C. Gradient Boosting (GB) algorithm

GB is used for regression and classification tasks. It is a well-known ensemble method designed for binary trees [6] and has shown successful performance in several applications [8]. The model rests on the principle of combining a set of weak learners into a single strong learner by applying the boosting method [7].

D. Support Vector Machine (SVM) algorithm

SVM is considered one of the most popular supervised ML algorithms and is employed for both classification and regression problems. Its main idea is to find the best classification hyperplane (also called the decision boundary) that separates two categories. This separation makes it possible to place a new data point in the correct class. This classifier is used in this paper because the number of studies that use hybrid SVMs to predict future stock trends is relatively low [14].

E. K-Nearest Neighbors (KNN) algorithm

KNN is a popular machine learning algorithm for solving regression or classification problems. Various studies have shown that this algorithm generates highly accurate predictions [12]. The algorithm computes the similarity between a new data point and all the training data points. Then, it groups similar cases into the same category.

F. Grid Search (GS) method

The GS algorithm is a method used to optimize the hyper-parameter values of a given ML algorithm [15]. Manually trying all possible combinations of values consumes a large amount of resources and time. The GS method automates the tuning, and the search is performed over a defined subset of the hyper-parameter space.

G. Principal Components Analysis (PCA)

PCA is a popular method that has shown great ability in feature selection [16]. The principal goal of PCA [17] is to discover linear dependencies among the attributes of a dataset, subject to the following specifications: to combine the original attributes, to make the combinations orthogonal to each other, and to capture the maximum amount of variation in the data in order to achieve high dimensionality reduction [18].

IV. ARCHITECTURE OF THE PROPOSED METHODOLOGY

In this paper, we focus on five ML models combined with PCA and GS. The proposed framework is illustrated in Figure 2. The details of each step are explained in the next section, in the experiments phase.
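As a rough illustration of how such a hybrid framework could be wired together, the following sketch chains PCA, grid search and each of the five classifiers using scikit-learn. It is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic data, the standardization step before PCA, the number of retained components, the candidate grids and the 80/20 chronological split are all hypothetical choices made only for the example.

```python
# Minimal sketch of a PCA + grid-search + classifier framework.
# Everything marked "assumption" is illustrative and not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the real inputs (20 features, binary up/down label).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

# Assumption: 80/20 split without shuffling, as one would do for time-ordered data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Assumption: small hypothetical search spaces, one per classifier.
candidates = {
    "GNB": (GaussianNB(), {"clf__var_smoothing": [1e-9, 1e-7]}),
    "RF": (
        RandomForestClassifier(random_state=0),
        {"clf__max_depth": [3, 5], "clf__max_features": ["sqrt"], "clf__n_estimators": [50, 100]},
    ),
    "GB": (
        GradientBoostingClassifier(random_state=0),
        {"clf__n_estimators": [50], "clf__max_depth": [2, 3]},
    ),
    "SVM": (SVC(), {"clf__C": [0.1, 1.0], "clf__kernel": ["rbf", "linear"]}),
    "KNN": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
}


def fit_hybrid_models(X_train, y_train, X_test, y_test, n_components=5, cv=3):
    """Tune each PCA + classifier pipeline with grid search and score it on the test split."""
    results = {}
    for name, (clf, grid) in candidates.items():
        pipe = Pipeline([
            ("scale", StandardScaler()),              # assumption: standardize before PCA
            ("pca", PCA(n_components=n_components)),  # assumption: keep 5 components
            ("clf", clf),
        ])
        search = GridSearchCV(pipe, grid, scoring="accuracy", cv=cv)
        search.fit(X_train, y_train)
        results[name] = {
            "best_params": search.best_params_,
            "test_accuracy": search.score(X_test, y_test),
        }
    return results


results = fit_hybrid_models(X_train, y_train, X_test, y_test)
for name, res in results.items():
    print(f"{name}: accuracy={res['test_accuracy']:.3f} params={res['best_params']}")
```

Placing PCA inside the pipeline means it is refitted on each training fold during the search, so the held-out fold never influences the projection. For real time-ordered data, a time-aware splitter such as scikit-learn's `TimeSeriesSplit` may be a safer choice for `cv` than the default folds used here.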

Figure 1. The general architecture of RF classifier algorithm

Figure 2. The general architecture of the proposed framework
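The input stage of the framework, in which technical indicators are derived from historical prices, can be sketched in plain Python as follows. The paper names the indicators it uses (SMA, EMA, CCI, ROC and Stochastic %K) but does not give their formulas or window lengths, so the definitions below are common textbook versions and should be read as assumptions. The `movement_labels` helper (1 when the next close is higher, 0 otherwise) and the toy price series are likewise hypothetical.

```python
# Common textbook definitions of the five indicators; window lengths are assumptions.
from statistics import mean


def sma(values, n):
    """Simple moving average; None until n observations are available."""
    return [None if i + 1 < n else mean(values[i + 1 - n:i + 1]) for i in range(len(values))]


def ema(values, n):
    """Exponential moving average with smoothing 2 / (n + 1), seeded with the first value."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def roc(close, n):
    """Rate of change in percent relative to the close n periods earlier."""
    return [None if i < n else 100.0 * (close[i] - close[i - n]) / close[i - n]
            for i in range(len(close))]


def stochastic_k(high, low, close, n):
    """Stochastic %K: position of the close inside the n-period high-low range."""
    out = []
    for i in range(len(close)):
        if i + 1 < n:
            out.append(None)
            continue
        highest = max(high[i + 1 - n:i + 1])
        lowest = min(low[i + 1 - n:i + 1])
        out.append(100.0 * (close[i] - lowest) / (highest - lowest) if highest != lowest else 0.0)
    return out


def cci(high, low, close, n):
    """Commodity Channel Index based on the typical price and its mean absolute deviation."""
    typical = [(h + l + c) / 3.0 for h, l, c in zip(high, low, close)]
    out = []
    for i in range(len(typical)):
        if i + 1 < n:
            out.append(None)
            continue
        window = typical[i + 1 - n:i + 1]
        centre = mean(window)
        deviation = mean(abs(x - centre) for x in window)
        out.append((typical[i] - centre) / (0.015 * deviation) if deviation else 0.0)
    return out


def movement_labels(close):
    """Hypothetical target: 1 if the next close is higher than the current one, else 0."""
    return [1 if close[i + 1] > close[i] else 0 for i in range(len(close) - 1)]


# Toy price series (not real NASDAQ data).
high = [11, 12, 13, 14, 15, 16]
low = [9, 10, 11, 12, 13, 14]
close = [10, 11, 12, 13, 14, 15]

features = {
    "SMA": sma(close, 3),
    "EMA": ema(close, 3),
    "ROC": roc(close, 1),
    "S%K": stochastic_k(high, low, close, 3),
    "CCI": cci(high, low, close, 3),
}
labels = movement_labels(close)
```

Rows where an indicator is still `None` (the warm-up period of each window) would need to be dropped before the features are passed to a classifier, and the last row has no label because its next close is unknown.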

V. EXPERIMENTS AND DISCUSSION

A. Experiment environment

The experiments are conducted on a computer with an Intel Core i7 7th Gen 2.8 GHz processor, 32 GB of RAM and a Microsoft Windows 10 Professional operating system. The implementation was written in the Python programming language.

B. Data

In this paper, two types of data are used as input: (i) historical stock data of the NASDAQ index, covering the period from March 2018 to March 2023 with 1258 daily observations, and (ii) calculated technical indicators, namely Simple Moving Average (SMA), Exponential Moving Average (EMA), Commodity Channel Index (CCI), Rate Of Change (ROC) and Stochastic %K (S%K). The stock dataset includes seven variables: Date, Open, High, Low, Close, Adj Close prices, and Volume. The use of technical indicators as input has proved successful in improving prediction accuracy [10].

C. Feature selection method

In this paper, twenty features are used as input to the PCA method, which selects the most important features and then passes them as input to the ML models.

D. Parameters tuning method

For each ML model, a search space is initially defined, and the GridSearchCV method is then executed iteratively, testing all the combinations until the optimal hyper-parameter values are found. For example, the selected hyper-parameters of the RF model are summarized in Table II.

TABLE II
HYPER-PARAMETERS SETTING OF THE RF MODEL

Hyper-parameter name | Selected value
max_depth | 3
max_features | sqrt
n_estimators | 100

E. Performance evaluation metrics

In this paper, we address a classification problem. Accuracy is a popular metric used to evaluate the prediction quality of ML models in this kind of problem. It is calculated as shown in Equation 1.

\[ \text{Accuracy} = \frac{TN + TP}{TP + FP + TN + FN} \times 100 \qquad (1) \]

where TN is the number of True Negative values, TP the number of True Positive values, FN the number of False Negative values and FP the number of False Positive values.

F. Results analysis and discussion

In this paper, we have implemented five ML models (GNB, RF, GB, SVM and KNN classifiers) hybridized with two methods: (i) PCA for feature selection and (ii) GS for hyper-parameter tuning. Experiments are conducted on the NASDAQ stock index. The objective of this research work is to find the best model to predict future closing price movements. The results achieved by the developed models are summarised in Table III. Additionally, Figure 3 provides a graphical comparison of the implemented ML models in terms of accuracy percentage.

TABLE III
PERFORMANCE ACCURACY OF THE MODELS

Models | Accuracy
GNB | 57.03%
RF | 60.64%
GB | 54.61%
SVM | 55.82%
KNN | 58.63%

Figure 3. Comparative accuracy of the implemented models

According to the experimental results, the RF classifier gives the maximum accuracy, close to 61%, followed by KNN with 59%, GNB with 57% and SVM with 56%. GB produced the lowest accuracy (55%). Additionally, the proposed RF model outperformed the same classifier used in [5] by 10.24 percentage points (60.64% against the 50.4% reported in Table I). From Figure 3, it is evident that the RF classifier performs well in predicting future stock price movements compared with the other models. Additionally, the proposed hybrid classifier (PCA-GS-RF) is tested on data from the last two years (2022 and 2023), a period characterised by two major crises: the Russia-Ukraine conflict and the COVID-19 pandemic health crisis [19]. This model is able to capture patterns and fluctuations in stock prices, so we recommend it to decision makers in similar situations.

VI. CONCLUSION AND FUTURE WORK

In this research, we have shown the importance of combining ML models with the PCA and GS methods to reveal patterns in stock data and to predict stock price movements. The experimental results show that all the implemented models produced satisfactory results when compared with the reviewed papers. Finally, we recommend the PCA-RF model to investors to make accurate decisions: to buy, sell or hold a given stock. The achieved results could be improved by applying other hyper-parameter tuning techniques (such as Bayesian Optimization) and other feature selection techniques (such as Convolutional Neural Networks).

REFERENCES

[1] Islam, S., Sikder, M. S., Hossain, M. F., & Chakraborty, P. (2021). Predicting the daily closing price of selected shares on the Dhaka Stock Exchange using machine learning techniques. SN Business & Economics, 1, 1-16.
[2] dos Santos Pinheiro, L., & Dras, M. (2017, December). Stock market prediction with deep learning: A character-based neural language model for event-based trading. In Proceedings of the Australasian Language Technology Association Workshop 2017 (pp. 6-15).
[3] Berislav, Ž., & Hrvoje, J. (2020). Forecasting stock market indices using machine learning algorithms. Interdisciplinary Description of Complex Systems: INDECS, 18(4), 471-489.
[4] Setiani, I., Tentua, M. N., & Oyama, S. (2021, March). Prediction of Banking Stock Prices Using Naïve Bayes Method. In Journal of Physics: Conference Series (Vol. 1823, No. 1, p. 012059). IOP Publishing.
[5] Zhang, C., Ji, Z., Zhang, J., Wang, Y., Zhao, X., & Yang, Y. (2018, October). Predicting Chinese stock market price trend using machine learning approach. In Proceedings of the 2nd International Conference on Computer Science and Application Engineering (pp. 1-5).
[6] Cortes, C., Mohri, M., & Storcheus, D. (2019). Regularized gradient boosting. Advances in Neural Information Processing Systems, 32.
[7] Zhou, F., Zhang, Q., Sornette, D., & Jiang, L. (2019, November). Cascading logistics regression onto gradient boosted decision tree for forecasting and trading stock indices. Applied Soft Computing, 84. DOI: 10.1016/j.asoc.2019.105747
[8] Schapire, R. E., & Freund, Y. (2012). Boosting: Foundations and Algorithms. Cambridge, MA, USA: MIT Press.
[9] Gündüz, H., Çataltepe, Z., & Yaslan, Y. (2017). Stock daily return prediction using expanded features and feature selection. Turkish Journal of Electrical Engineering and Computer Sciences, 25(6), 4829-4840.
[10] Chandar, S. K. (2022). Convolutional neural network for stock trading using technical indicators. Automated Software Engineering, 29, 1-14.
[11] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112, p. 18). New York: Springer.
[12] Latha, R. S., Sreekanth, G. R., Suganthe, R. C., Geetha, M., Selvaraj, R. E., Balaji, S., ... & Ponnusamy, P. P. (2022, January). Stock Movement Prediction using KNN Machine Learning Algorithm. In 2022 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1-5). IEEE.
[13] Kang, H., Zong, X., Wang, J., & Chen, H. (2023). Binary gravity search algorithm and support vector machine for forecasting and trading stock indices. International Review of Economics & Finance, 84, 507-526.
[14] Zong, X. (2020). MLP, CNN, LSTM and Hybrid SVM for Stock Index Forecasting Task to INDU and FTSE100. Available at SSRN 3644034.
[15] Syarif, I., Prugel-Bennett, A., & Wills, G. (2016). SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA (Telecommunication Computing Electronics and Control), 14(4), 1502-1509.
[16] Song, F., Guo, Z., & Mei, D. (2010, November). Feature selection using principal component analysis. In 2010 International Conference on System Science, Engineering Design and Manufacturing Informatization (Vol. 1, pp. 27-30). IEEE.
[17] Jolliffe, I. T. (2002). Principal component analysis for special types of data (pp. 338-372). Springer New York.
[18] Janecek, A., Gansterer, W., Demel, M., & Ecker, G. (2008, September). On the relationship between feature selection and classification accuracy. In New Challenges for Feature Selection in Data Mining and Knowledge Discovery (pp. 90-105). PMLR.
[19] Gaio, L. E., Stefanelli, N. O., Júnior, T. P., Bonacim, C. A. G., & Gatsios, R. C. (2022). The impact of the Russia-Ukraine conflict on market efficiency: Evidence for the developed stock market. Finance Research Letters, 50, 103302.
