Proceedings of the 2nd International Conference on Inventive Communication and Computational Technologies (ICICCT 2018)

IEEE Xplore Compliant - Part Number: CFP18BAC-ART; ISBN:978-1-5386-1974-2

A comparative study of supervised machine learning algorithms for stock market trend prediction
Indu Kumar, Kiran Dogra, Chetna Utreja, Premlata Yadav
Computer Engineering Department, NIT Kurukshetra

Abstract- The impact of many factors on stock prices makes stock prediction a difficult and highly complicated task. In this paper, machine learning techniques have been applied to stock price prediction in order to overcome such difficulties. In the implemented work, five models have been developed and their performance in predicting stock market trends has been compared. These models are based on five supervised learning techniques, i.e., Support Vector Machine (SVM), Random Forest, K-Nearest Neighbor (KNN), Naive Bayes, and Softmax. The experimental results show that the Random Forest algorithm performs best for large datasets and the Naive Bayes classifier performs best for small datasets. The results also reveal that reducing the number of technical indicators reduces the accuracy of each algorithm.

Keywords- machine learning, classifier, Random Forest, SVM, KNN, Naïve Bayes, Softmax

I. INTRODUCTION

The stock market plays a very important role in the fast economic growth of developing countries such as India. The growth of India and other developing nations may therefore depend on the performance of the stock market: if the stock market rises, the country's economic growth would be high, and if it falls, economic growth would slow down. In other words, the stock market and a country's growth are tightly bound together. In any country, only 10% of people engage in stock market investment, because of the dynamic nature of the stock market. There is also a misconception that buying or selling shares is an act of gambling.

This misconception can be changed by raising awareness among people. Prediction techniques can play a crucial role in bringing more people, along with existing investors, into the stock market. Among the popular methods that have been employed, machine learning techniques are widely used because of their capacity to identify stock trends from massive amounts of data that capture the underlying stock price dynamics. In this paper, we apply supervised learning methods to stock price trend forecasting.

The rest of the paper is structured as follows. Section II reviews related work in this field. Section III describes the research data. Section IV presents the proposed work. Section V discusses the obtained results, and Section VI concludes the paper.

II. RELATED WORK

Correct prediction of stock market trends is of great importance to investors, as it helps them determine whether an investment will pay off. Many methods have been deployed for this purpose. An Artificial Neural Network based method was the first

978-1-5386-1974-2/18/$31.00 ©2018 IEEE 1003


Authorized licensed use limited to: NED UNIV OF ENGINEERING AND TECHNOLOGY. Downloaded on October 25,2024 at 18:59:06 UTC from IEEE Xplore. Restrictions apply.

technique used for stock market trend prediction [1]. Machine learning has also been used to predict the direction of movement of stock market indices. Kim [2] first applied SVM to stock market price prediction. Random Forest is another machine learning model that has been used to predict the trend direction of stocks [3], with five-day-ahead and ten-day-ahead models. In this paper, a comparative study of supervised machine learning algorithms using time windows of size 1 to 90 is proposed. The algorithms are compared on two parameters: the size of the dataset and the number of technical indicators used. Accuracy and F-measure values have been computed for each algorithm using a long-term model.

III. RESEARCH DATA

The data used in this paper has been collected from sources such as Yahoo Finance, Quandl, NSE-India, and YCharts. The available data has the following attributes: Date, Open, High, Close, and Volume. Twelve technical indicators have been used for model prediction. The first is the Moving Average (MA10 and MA50), which smooths the stock price signal and makes trends easier to identify; moving averages over 10 and 50 days have been used. The Relative Strength Index (RSI) detects whether a stock is overbought or oversold. The Rate of Change (RoC) measures the change in price from one period to another; RoC1 and RoC2 have been used. Volatility measures the dispersion of returns for a given security and has been calculated over a period of 10 days. The Disparity Index (DI) measures the position of the most recent closing price relative to a selected moving average; the 10-day Disparity Index has been calculated. The Stochastic Oscillator depicts the location of the closing price relative to the high-low range. Williams %R is a momentum indicator that shows the level of the closing price relative to the highest high. The Price Volume Trend relates volume and price. Finally, the Commodity Channel Index (CCI) measures the current price level relative to an average price level over a given period of time.

IV. PROPOSED METHODOLOGY

The proposed architecture for the implemented work consists of four main steps: feature extraction from the given dataset, supervised classification of the training dataset, supervised classification of the test dataset, and result evaluation. The flow chart of the proposed methodology is shown in Figure 1.

[Figure 1: Dataset → Feature Extraction → Supervised Classification (Training Dataset) → Supervised Classification (Test Dataset) → Result Evaluation]
Figure 1. Flow chart of proposed methodology
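As an illustration of the feature-extraction step in Figure 1, the sketch below computes the twelve indicators from daily price data with pandas. It is a minimal sketch, not the authors' code: it assumes a `Low` column (the stochastic oscillator, Williams %R and CCI need the high-low range), and the windows for RSI, the stochastic oscillator, Williams %R and CCI (14, 14, 14 and 20 days) and the 1- and 2-day periods for RoC1 and RoC2 are assumptions, since the text does not state them. The synthetic prices at the end only exercise the function.

```python
import numpy as np
import pandas as pd


def technical_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Compute twelve indicators from daily data with High, Low, Close, Volume columns.

    Windows other than 5, 10 and 50 days are illustrative assumptions.
    """
    close, high, low, vol = df["Close"], df["High"], df["Low"], df["Volume"]
    out = pd.DataFrame(index=df.index)

    # Moving averages over 10 and 50 days
    out["MA10"] = close.rolling(10).mean()
    out["MA50"] = close.rolling(50).mean()

    # Rate of change over 1 and 2 days (assumed periods), in percent
    out["RoC1"] = (close / close.shift(1) - 1) * 100
    out["RoC2"] = (close / close.shift(2) - 1) * 100

    # RSI from simple 14-day averages of gains and losses (assumed window)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)

    # 10-day volatility: standard deviation of daily returns
    returns = close / close.shift(1) - 1
    out["Volatility10"] = returns.rolling(10).std()

    # Stochastic %K and Williams %R over a 14-day high-low range (assumed window)
    hh = high.rolling(14).max()
    ll = low.rolling(14).min()
    out["Stochastic"] = 100 * (close - ll) / (hh - ll)
    out["WilliamsR"] = -100 * (hh - close) / (hh - ll)

    # Commodity Channel Index over 20 days (assumed window)
    tp = (high + low + close) / 3
    sma = tp.rolling(20).mean()
    mad = tp.rolling(20).apply(lambda x: np.mean(np.abs(x - x.mean())), raw=True)
    out["CCI"] = (tp - sma) / (0.015 * mad)

    # Disparity: close relative to its 5- and 10-day moving average, in percent
    out["Disparity5"] = 100 * close / close.rolling(5).mean()
    out["Disparity10"] = 100 * close / close.rolling(10).mean()

    # Price Volume Trend: running sum of volume weighted by daily return
    out["PVT"] = (returns * vol).fillna(0).cumsum()
    return out


# Synthetic random-walk prices, only to exercise the function
rng = np.random.default_rng(0)
n = 120
close = pd.Series(100 + rng.normal(0, 1, n).cumsum())
df = pd.DataFrame({"Close": close, "High": close + 1, "Low": close - 1,
                   "Volume": rng.integers(1_000, 5_000, n)})
features = technical_indicators(df).dropna()
print(features.tail())
```

Rows before the 50-day moving average is defined are dropped, so the 120 synthetic rows yield 71 complete feature rows.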


A. Dataset

The dataset has been collected from Yahoo Finance, Quandl, NSE-India, and YCharts, and features have been extracted from it. Data for five companies, Amazon, Cipla, Eicher, Bata, and Bosch, have been collected over periods of five to ten years. Amazon, Bata, and Bosch are the large datasets, with approximately 4500 entries; Cipla and Eicher are the small datasets, with approximately 1800 entries.

B. Feature Extraction

For feature extraction, twelve technical indicators have been calculated from the downloaded data for each company: Moving Average (MA10, MA50), Rate of Change (RoC1, RoC2), Relative Strength Index (RSI), Volatility10, Williams %R, Stochastic Oscillator, Commodity Channel Index (CCI), Disparity (Disparity5, Disparity10), and Price Volume Trend.

C. Supervised Classification (Training Dataset)

The data has been divided into training and testing sets in a 70:30 ratio. The learning algorithms are trained on the training data, and predictions are then made on the test set.

D. Supervised Classification (Test Dataset)

The test dataset is 30% of the total data. The trained supervised learning models have been applied to the test data, and their output is compared with the actual output.

E. Result Evaluation

Results have been evaluated by calculating the accuracy and F-measure of each learning algorithm. A time window of size 90 has been used, i.e., models from day 1 to day 90 are evaluated. The total accuracy has been calculated by averaging the accuracies of the 1- to 90-day models. The accuracies of the algorithms have been compared on the basis of two parameters: the size of the dataset and the number of technical indicators.

V. EXPERIMENTAL RESULTS

For the comparative study of supervised learning algorithms for stock market prediction, accuracy and F-measure are used in this paper. Accuracy is expressed by equation (1) and F-measure by equation (2), where TP is true positive, TN is true negative, FP is false positive, and FN is false negative:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2}$$

with $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$.

The results for twelve indicators are shown in Table 1 and for six indicators in Table 2.

Table 1. Long term model for twelve indicators (Accuracy/F-measure in %)

Algorithm      | Large dataset                             | Small dataset
               | Amazon        Bosch         Bata          | Cipla         Eicher
---------------+-------------------------------------------+---------------------------
SVM            | 67.16/75.98   64.56/73.85   62.35/75.20   | 58.51/65.84   58.98/65.80
Random Forest  | 72.36/80.55   64.51/73.30   66.28/75.88   | 55.71/64.28   55.80/63.95
KNN            | 65.56/77.00   55.06/69.22   60.89/74.20   | 45.94/57.26   45.81/57.06
Naive Bayes    | 70.80/60.42   63.36/50.04   50.93/50.24   | 63.84/62.32   64.03/50.14
Softmax        | 57.80/64.74   53.18/66.13   60.00/70.62   | 45.90/48.11   46.93/46.94

Table 2. Long term model for six indicators (Accuracy/F-measure in %)


Algorithm      | Large dataset                             | Small dataset
               | Amazon        Bosch         Bata          | Cipla         Eicher
---------------+-------------------------------------------+---------------------------
SVM            | 66.14/78.36   59.80/71.74   66.14/78.36   | 55.28/61.04   54.42/60.02
Random Forest  | 69.73/79.87   60.49/71.12   69.58/79.69   | 52.40/61.92   54.66/59.89
KNN            | 65.56/76.99   55.08/69.23   65.60/77.04   | 45.91/57.25   43.45/55.89
Naive Bayes    | 67.61/50.72   60.78/59.58   67.61/50.72   | 60.07/43.32   60.09/48.33
Softmax        | 57.22/63.16   52.94/65.48   56.61/61.21   | 45.7/48.72    45.87/44.32

Figure 2 compares the algorithms on the small datasets and confirms that Naive Bayes performs best for small datasets. Figure 3 shows the comparison for the large datasets and confirms that Random Forest performs best for large datasets. To see how the accuracy of the algorithms is affected when training features are removed, technical indicators were removed; the accuracy of the algorithms decreases as the number of technical indicators is reduced.

[Figure 2: plot of SVM, Random Forest, KNN, Naive Bayes and Softmax; horizontal axis 15 to 90]
Figure 2. Comparison of algorithms for small dataset (Naive Bayes performs best).

[Figure 3: plot of SVM, Random Forest, KNN, Naive Bayes and Softmax; horizontal axis 15 to 90]
Figure 3. Comparison of algorithms for large dataset (Random Forest performs best).

[Figure 4: plot of SVM, Random Forest, KNN, Naive Bayes and Softmax; horizontal axis 15 to 90]
Figure 4. Comparison of algorithms with fewer technical indicators for small dataset (compare with Figure 2).

[Figure 5: plot of SVM, Random Forest, KNN, Naive Bayes and Softmax; horizontal axis 15 to 90]
Figure 5. Comparison of algorithms with fewer technical indicators for large dataset (compare with Figure 3).
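The training, testing and evaluation steps of Section IV, together with equations (1) and (2), can be sketched with scikit-learn as below. This is an illustration under stated assumptions rather than the authors' implementation: the up/down label (whether the close h days ahead exceeds today's close), keeping the 70:30 split in time order, the hyperparameters, and modelling Softmax as logistic regression are all assumptions, and the random data only stands in for real indicators.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def accuracy(tp, tn, fp, fn):
    """Equation (1): fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_measure(tp, fp, fn):
    """Equation (2): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def make_models():
    # Hyperparameters are defaults or illustrative choices, not the paper's.
    # "Softmax" is modelled as logistic regression; for a binary up/down
    # label, softmax regression reduces to ordinary logistic regression.
    return {
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Naive Bayes": GaussianNB(),
        "Softmax": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }


def evaluate(features, close, horizons, train_frac=0.7):
    """Average accuracy and F-measure (in %) of each model over the horizons."""
    scores = {name: ([], []) for name in make_models()}
    for h in horizons:
        # Assumed label: 1 if the close h days ahead is above today's close
        y = (close[h:] > close[:-h]).astype(int)
        X = features[:-h]
        split = int(len(X) * train_frac)  # 70:30 split, kept in time order
        for name, model in make_models().items():
            model.fit(X[:split], y[:split])
            pred = model.predict(X[split:])
            tn, fp, fn, tp = confusion_matrix(y[split:], pred, labels=[0, 1]).ravel()
            scores[name][0].append(accuracy(tp, tn, fp, fn))
            scores[name][1].append(f_measure(tp, fp, fn))
    return {name: (100 * np.mean(acc), 100 * np.mean(f1))
            for name, (acc, f1) in scores.items()}


# Synthetic stand-in data: a random-walk price and twelve random "indicator" columns
rng = np.random.default_rng(42)
n = 400
close = 100 + rng.normal(0, 1, n).cumsum()
features = rng.normal(size=(n, 12))
results = evaluate(features, close, horizons=[1, 5, 10])
for name, (acc, f1) in results.items():
    print(f"{name}: accuracy {acc:.2f}%, F-measure {f1:.2f}%")
```

Passing `horizons=range(1, 91)` would mirror the 1- to 90-day models of Section IV.E under the same labeling assumption, at the cost of fitting 450 models. On random data the printed scores carry no information about real stocks.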


Six technical indicators have been removed, namely RSI, Williams %R, MA50, Disparity10, RoC2, and CCI. Results after removing these indicators are shown in Figure 4 for the small datasets and Figure 5 for the large datasets.
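The reduced-feature experiment can be illustrated as follows: the sketch drops the six indicators named above and compares hold-out accuracy with and without them. It assumes a chronological 70:30 split and uses Random Forest as a single example classifier, whereas the paper compares all five. The data is synthetic and deliberately built so that two of the removed columns (RSI and CCI) carry the signal, so the accuracy drop it prints is by construction and says nothing about the paper's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

ALL_INDICATORS = ["MA10", "MA50", "RoC1", "RoC2", "RSI", "Volatility10",
                  "WilliamsR", "Stochastic", "CCI", "Disparity5", "Disparity10", "PVT"]
# The six indicators removed in the reduced-feature experiment
REMOVED = ["RSI", "WilliamsR", "MA50", "Disparity10", "RoC2", "CCI"]
REDUCED = [c for c in ALL_INDICATORS if c not in REMOVED]


def holdout_accuracy(X: pd.DataFrame, y: np.ndarray, train_frac: float = 0.7) -> float:
    """Accuracy of a Random Forest on a 70:30 split kept in time order (assumed)."""
    split = int(len(X) * train_frac)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X.iloc[:split], y[:split])
    return accuracy_score(y[split:], model.predict(X.iloc[split:]))


# Synthetic data: the label depends only on two of the removed columns plus noise
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(600, 12)), columns=ALL_INDICATORS)
y = ((X["RSI"] + X["CCI"] + 0.5 * rng.normal(size=600)) > 0).astype(int).to_numpy()

acc_full = holdout_accuracy(X[ALL_INDICATORS], y)
acc_reduced = holdout_accuracy(X[REDUCED], y)
print(f"twelve indicators: {acc_full:.3f}, six indicators: {acc_reduced:.3f}")
```

With real indicators, the same two calls on the full and reduced column lists would give the kind of comparison reported in Tables 1 and 2.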
VI. CONCLUSION
In this paper, the supervised machine learning algorithms SVM, Random Forest, KNN, Naive Bayes, and Softmax have been applied to stock price prediction. The results reveal that for the large datasets, the Random Forest algorithm achieves the highest average accuracy of all the algorithms, and for the small datasets, which contain less than half as many entries (about 1800 compared with about 4500), the Naive Bayes algorithm shows the best accuracy. Reducing the number of technical indicators also reduces the accuracy of each algorithm in predicting stock market trends.
REFERENCES

[1] G. Zhang, B. E. Patuwo, and M. Y. Hu, "Forecasting with artificial neural networks: The state of the art," Int. J. Forecasting, vol. 14, pp. 35–62, 1998.
[2] K. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, pp. 307–319, 2003.
[3] T. Manojlović and I. Štajduhar, "Predicting stock market trends using Random Forest: A sample of the Zagreb Stock Exchange," IEEE International Convention, pp. 1189–1193, 2015.
[4] Y. Dai and Y. Zhang, "Machine learning in stock price trend forecasting," 2013.
[5] S. B. Achelis, Technical Analysis from A to Z, 2nd ed. McGraw-Hill Education, 2000.
[6] K. Golmohammadi, O. R. Zaiane, and D. Díaz, "Detecting stock market manipulation using supervised learning algorithms," IEEE International Conference on Data Science and Advanced Analytics, pp. 435–441, 2014.
[7] P. Hajek, "Forecasting stock market trend using prototype generation classifiers," WSEAS Transactions on Systems, vol. 11, no. 12, pp. 671–680, 2012.
