
Received: April 15, 2020. Revised: May 14, 2020.


An Efficient Stock Market Trend Prediction Using the Real-Time Stock Technical Data and Stock Social Media Data

Lakshmana Phaneendra Maguluri1*, Ragupathy Rengaswamy1

1 Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu 608002, India
* Corresponding author's Email: [email protected]

Abstract: Stock market trend prediction is one of the major issues that has been problematic for both financial analysts and scientific research on real-time streaming data. Forecasting stock market returns from technical data is also difficult because of noise and mixed data types. As the number of stock technical tools and techniques grows along with stock news data, it becomes more difficult to analyze and predict the market trend from historical market data. Therefore, data analysis and market trend prediction are very challenging tasks for research institutes and financial brokerages. Financial data is unstructured by nature, and it is difficult to find essential features for intraday bullish and bearish trend prediction. In this work, a new stock technical indicator based non-linear SVM model is designed and implemented on real-time stock market data for trend prediction. In this model, a novel stock technical data transformation technique, a stock technical and text feature extraction method, and a non-linear SVM classification algorithm are proposed to predict the stock trend on a daily and weekly basis. Experimental results have shown that the proposed stock market trend prediction approach improves computational runtime (ms) by about 7% and average stock prediction accuracy by about 10% compared to existing stock market trend prediction models.

Keywords: Stock market prediction, Support vector machine, Sentiment prediction, Trend analysis.

1. Introduction

The stock market is considered the most vital and active part of financial institutions and investors. Financial articles and trend data have been considered the primary factors for market trend prediction. Most organizations depend on high computational systems in order to predict the market trend based on the sentiment score and stock technical data. These predictions are used to filter positive and negative sentiment stocks, which are then used by investors to take appropriate decisions. Hence, the modelling and analysis of news articles are very much essential in order to make accurate predictions. Organizations can become market prominent if they can attract the attention of more and more investors [1]. As an example, if an investor holds 100 stocks, the investor may not be able to filter the feasible top positive and negative trend stocks based on the company announcements and technical data, as shown in Fig. 1.

Figure 1. Historical stock market trend prediction

Stock exchanges are inconsistent and may vary over time; the correlation between social media and stock market data certainly varies over time. A large number of methods have been proposed in the literature to find and predict the correlation of stock features with financial time series data. These methods are applicable to structured and numerical databases. In traditional systems, different market analytical tools and statistical analysis tools are used to predict the sentiment of the stock market trend. In the stock market, share prices can depend on many factors, ranging from corporate news to political news [2].

Stock exchanges can be seen as systems that vary over time; the correlation between social media and stock market data certainly varies over time [3]. Taking this time-varying behaviour into account may result in more accuracy, as correlations that vary in time can now be captured. This is particularly useful in the prediction of stock markets using social media information, because no time-invariant correlation can be anticipated. For example, a "credit crunch" search term may not be equally relevant for months [4]. Assuming a constant correlation for a long time would probably lead to incorrect consideration of the data when it is no longer useful, or exclude data for time periods for which useful information may be available. Truly time-varying SVR extensions are not known. To implement time-diverging behaviour into the SVR, the horizontal approach is applied: the SVR is trained on a limited number of previous data points each time, so that it focuses only on the latest history. This method is easy to implement and was used previously with the LASSO [5]. The basic formulation of the LASSO is shown in Eq. (1), where A is the input data matrix, x and y are the variables, ε is a small constant, E[·] is the expectation operator, and J is the predicted value.

$$
\begin{aligned}
J &= \|Ax - y\|_2^2 = x^T (A^T A) x - y^T A x - x^T A^T y + y^T y \\
\frac{\delta J}{\delta x} &= 2(A^T A)x - 2A^T y = 0 \;\;\text{(solution)} \\
x &= (A^T A)^{-1} A^T y \\
E[x] &= E[(A^T A)^{-1} A^T (Ax + \epsilon)] = E[x] + E[\epsilon] = x \\
\Sigma_x &= E[x^2] - E[x]^2 \\
&= E\big[((A^T A)^{-1} A^T (Ax + \epsilon))\,((A^T A)^{-1} A^T (Ax + \epsilon))^T\big] - x x^T \\
&= E\big[(x + (A^T A)^{-1} A^T \epsilon)\,(x + (A^T A)^{-1} A^T \epsilon)^T\big] - x x^T
\end{aligned}
\tag{1}
$$
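To make the "horizontal" (rolling-window) training idea described above concrete, the following is a minimal sketch, assuming a scikit-learn style SVR and a synthetic price series; the window length, lag features and hyper-parameters are illustrative, not the settings used in this paper.

```python
import numpy as np
from sklearn.svm import SVR

def rolling_svr_forecast(prices, window=60, lags=5):
    """Refit an SVR on only the most recent `window` points and predict
    the next value, so that older (possibly stale) correlations are ignored."""
    preds = []
    for t in range(window + lags, len(prices)):
        recent = prices[t - window - lags:t]
        # Lagged feature matrix built from the recent history only.
        X = np.array([recent[i:i + lags] for i in range(window)])
        y = recent[lags:lags + window]
        model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
        preds.append(model.predict(prices[t - lags:t].reshape(1, -1))[0])
    return np.array(preds)

prices = np.cumsum(np.random.randn(300)) + 100.0  # synthetic close prices
print(rolling_svr_forecast(prices)[:5])
```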


Stock markets have been studied repeatedly to develop useful models and to forecast their movements. Most researchers and financing investors use statistical tools and techniques in order to predict stock market conditions. The behaviour of the stock market is generally not known to the financial analysts who invest in stock markets [6]. They can act on it immediately if they can forecast the future action of stock prices. Pricing trends based solely on data analysis are very popular. However, only the event is included in numerical time series data, not the reason behind it. The use of textual information in combination with numerical time-series data increases the quality of the input and improves stock prediction. The application of these methods establishes the link between news and stock prices and allows a prediction system to be learnt using a text classifier. Technical analysis, which uses technical analytical indicators, is built on the numerical time series data and attempts to predict the stock market trend. It is based on the hypothesis that all the reactions to news on the market in real time are included to predict the market's future price action. The main objective of this work is to identify current trends and predict future stock trends in charts [7]. In chart analysis, the timing of the market is considered critical, and opportunities are identified by estimating historical changes in prices and volumes and comparing them with current prices. Technicians use charts and models to determine trends in price and volume. Different types of Web-based financial information sources provide electronic versions of daily issues. All these sources contain political and economic global and regional news, quotes from leading bankers and politicians, and financial analysts' recommendations [8].

Technical analysis with historical trading data can be used for data analysis. This includes stocks, futures, commodities, fixed income, currencies and other securities. Technical analysis is much more prevalent in commodities and foreign exchange markets, where traders concentrate on short-term price movements [9].

Technical indicators, known as "technicals", focus on traditional trading data such as price, volume and open interest rather than on the fundamentals of a business such as income, revenue and profit margins. Technical indicators are commonly used by active traders because they are intended to analyze short-term price moves, but long-term investors can also use technical indicators to identify points of entry and exit.

Relative Strength Index: The Relative Strength Index is a measure of the strength inherent in a field, calculated using the amount of upward and downward price changes over a given time frame. It has a range of 0 to 100, with typical values between 30 and 70. Higher Relative Strength Index values indicate overbought conditions, while lower values indicate oversold conditions. The formula for the Relative Strength Index is shown in Eq. (2)

$$
RSI = 100 - \frac{100}{1 + RS}
\tag{2}
$$

where RS = Average Gain / Average Loss. RSI is the relative strength index, which is used to find the trend of the stock based on the candlestick values.

Money Flow Index: The Money Flow Index is a measure of the strength of the money flowing into or out of an open-market stock. It is mainly derived by comparing the volume of upward and downward changes in prices over a given period. The Money Flow Index is based on the Money Ratio, which is the ratio of positive money flow to negative money flow over the period concerned [10].

Moving average: The moving average of a field is returned for a given period of time. The moving average is calculated by combining past values with current values over the given period.

MACD: The MACD is the difference between the short and long moving averages of a field. The MACD is usually a particular instance of a value oscillator and is mostly used to detect price trends at the closing price of a security. If the MACD is on a growing trend, prices are higher. If the MACD is on a downward trend, prices are lower [11].
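The indicator definitions above can be reproduced with a few lines of pandas; the sketch below is only illustrative (a simple rolling-mean RSI rather than Wilder smoothing, and generic window lengths), not the exact computation used in this paper.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # RSI = 100 - 100 / (1 + RS), RS = average gain / average loss (Eq. 2).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def macd(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    # Difference between the short and long (exponential) moving averages.
    return close.ewm(span=fast).mean() - close.ewm(span=slow).mean()

def moving_average(close: pd.Series, period: int = 20) -> pd.Series:
    return close.rolling(period).mean()

close = pd.Series([101, 102, 100, 103, 104, 102, 105, 107, 106, 108] * 5, dtype=float)
print(rsi(close).iloc[-1], macd(close).iloc[-1], moving_average(close).iloc[-1])
```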
Many feature selection metrics have been examined for text categorization, including information gain (IG), chi-square (CHI), coefficient of correlation (CC) and odds ratios (OR). An overview of the comparative study of various feature selection metrics demonstrated that the conventional selection scores are still the best for the categorization of text.

Figure 2. Basic sentiment based classification model for trend prediction

The main contribution of this work is sentiment-based stock classification in order to improve the buying behaviour of investors on the futures market, as shown in Fig. 2. As shown in the figure, stock news and real-time data are taken as input for sentiment analysis and technical analysis. These features are given to a classification model for stock trend prediction. Sentiment analysis reveals the effect of unstructured market data on investor emotions for decision making. Market sentiment, or the purchasing behaviour of traders, is based on the stock technicals and the market trend. This work is an extension of the hybrid Bayesian network-based stock market prediction model [12], in which a hybrid Bayesian classifier is used to predict the stock trend on a real-time dataset. In this paper, a novel real-time stock market trend prediction model is proposed using data pre-processing and a non-linear classifier. This model predicts the stock trend on a time basis.

The main advantage of the proposed model is that it finds and predicts the trend of the stock on a time basis. This model is better suited to real-time stock trend prediction in an intraday and customized time manner. In this paper, a novel technical indicator is proposed to predict the trend of the stock using the training data.

The rest of the paper is organized as follows: Section 2 deals with related work; Section 3 presents the filter based stock technical prediction model; Section 4 conveys the obtained results; Section 5 gives the inference; Section 6 discusses the conclusions drawn.

2. Related works

Feature subset selection can improve classification accuracy by creating an optimal subset of features from the high dimensional feature space. A feature selection approach has been used for the selection of a limited number of features from the original stock feature set [13]. AdaBoost (Adaptive Boosting) is a meta learning based approach from the ensemble learning group. The main objective of AdaBoost is to build a strong classifier from a group of weak base classifiers. The AdaBoost approach is an iterative method, and in each iteration a weak base classifier is selected to minimize the error rate of the model.

High dimensionality is one of the severe issues for machine learning models. In order to optimize the precision of the classification algorithm, most classification approaches use feature selection measures such as mutual information, correlation coefficient, rough set, chi-square test, etc. [14], to select a subset of features from the high dimensional space. They implemented a PSO based spectral filtering model on the high dimensional features of the original training data. Two reconstruction methods are used: one is principal component analysis and the other is maximum likelihood estimation. Several distribution algorithms were used in randomization models. In most of these approaches, Bayesian analysis is used to predict the original data distribution using the randomization operator and the randomized data [15].

A multi-scale filter bank is used in order to represent the texture and structure characteristics of stock trend image data. Different efficient and effective classification schemes are implemented to train the system. Subsequently, another generalized system is developed which is responsible for regional trend classification. The basic prior probability of the stock trend prediction is shown in Eq. (3), where X is the stock random variable, C is the stock class label, N is the total number of stocks, τ is the scaling factor (0.5), and Φ is the parametric equation.

$$
\begin{aligned}
\Pr(X < C \mid X < 0) &= \Pr\!\Big(y + \tfrac{\tau}{2N} < C\Big) \\
&= \Pr\!\Big(N(X^{true}, 1/\sqrt{N}) < C - \tfrac{\tau}{2N}\Big) \\
&= \Pr\!\Big(N(0, 1/\sqrt{N}) < C - X^{true} - \tfrac{\tau}{2N}\Big) \\
&= \sqrt{N}\,\Pr\!\Big(N(0,1) < \sqrt{N}(C - X^{true}) - \tfrac{\tau}{2\sqrt{N}}\Big) \\
&= \sqrt{N}\,\Phi\!\Big(\sqrt{N}(C - X^{true}) - \tfrac{\tau}{2\sqrt{N}}\Big)
\end{aligned}
\tag{3}
$$

Most of the traditional approaches detect inappropriate and computationally infeasible patterns on high dimensional datasets. Hence, it is difficult to process all of the stock patterns that are not required during the process of classification, and the overall computational overhead also increases significantly. Unwanted noise results during the process of classification. Hence, it is very much required to select the essential stock patches during the classification process. All of the traditional stock selection techniques involve a combination of filter and wrapper schemes. Filtering approaches have the responsibility of ranking every individual feature according to its goodness. During the process of ranking, the relationship of every individual stock with its respective class label is considered. Univariate scoring metrics play a significant role in the above ranking process. It is important to note that other approaches can be taken to make sure that causality testing is done properly when the time series being used are non-stationary, as shown in Eq. (4)

$$
\begin{aligned}
RSS_{AR} &= \sum_{n=1}^{N}\Big(y_n - \sum_{i=1}^{D_y} x_{y,i}\, y_{n-i}\Big)^2 \\
RSS_{ARX} &= \sum_{n=1}^{N}\Big(y_n - \sum_{i=0}^{D_z-1} x_{z,i}\, z_{n-i} + \sum_{i=1}^{D_y} x_{y,i}\, y_{n-i}\Big)^2 \\
u &= \frac{(RSS_{AR} - RSS_{ARX})/D_z}{RSS_{ARX}/(N - D_z - D_y)} \sim F_{D_z,\; N - D_z - D_y}
\end{aligned}
\tag{4}
$$

RSS_AR is the deviation of the expected and observed values of the stock data (x, y). RSS_ARX is the sum of squares of deviations of observed and expected values with the z-tabulated values, where N is the number of samples and D is the degrees of freedom.

The top ranked stock candlestick patterns are selected prior to the execution of the classification schemes. On the contrary, wrapper schemes require the stock selection approach to be integrated with a classifier. The prime objective of this technique is to evaluate the classification performance of every individual stock subset. The optimal subset of trend patterns is detected according to the ranking of each feature. Traditional filtering schemes are incapable and inefficient at measuring the relationship between different stocks [16].

Directional Accuracy (DA) and Mean Absolute Percentage Error (MAPE) are used to evaluate each feature or feature subset to optimize the classification accuracy [17]. The filter method evaluates each feature independently of the classification algorithm, ranks the stock features after evaluation and considers the superior ones. This evaluation is performed using information, dependency, distance and consistency. The basic DA and MAPE functions are shown in Eq. (5)

$$
\begin{aligned}
DA_n &= 1 \;\text{if}\; (\hat{y}_{n+1} - y_n)(y_{n+1} - y_n) > 0, \;\text{else}\; DA_n = 0 \\
DA &= \frac{1}{N}\sum_{n=1}^{N} DA_n \\
MAPE &= \frac{1}{N}\sum_{n=1}^{N} \left|\frac{y_n - \hat{y}_n}{y_n}\right|
\end{aligned}
\tag{5}
$$

Here N is the number of stocks and y is the random variable for data prediction. DA and MAPE are used to estimate the trend of the stock data based on the training values.
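A minimal sketch of the DA and MAPE evaluation metrics of Eq. (5), assuming NumPy arrays of actual and predicted prices (the example values are placeholders):

```python
import numpy as np

def directional_accuracy(actual: np.ndarray, predicted: np.ndarray) -> float:
    # DA_n = 1 when the predicted and actual next-step moves share the same sign.
    actual_move = np.diff(actual)
    predicted_move = predicted[1:] - actual[:-1]
    return float(np.mean(actual_move * predicted_move > 0))

def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    # Mean absolute percentage error (Eq. 5).
    return float(np.mean(np.abs((actual - predicted) / actual)))

actual = np.array([100.0, 101.5, 101.0, 102.2, 103.0])
predicted = np.array([100.2, 101.0, 101.4, 102.0, 103.5])
print(directional_accuracy(actual, predicted), mape(actual, predicted))
```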

In general, the wrapper model is slower than the filter model because of cross validation and the repeated iterations needed to evaluate feature subsets. The traditional wrapper model is more effective because the classification technique affects the overall accuracy, although the subset selection is NP-hard. However, if the number of features involved in complex data increases, finding new trend patterns can become difficult due to the complex relationships among features. Feature ranking methods compute a measure for each feature and rank them accordingly. These ranking methods select the top 'k' features based on the highest rank and eliminate those having lower feature ranks [18]. Information gain is one of the attribute selection measures based on the entropy value. Efficient subsets of attributes contain class-related and non-related attributes. The method is used to determine how closely the characteristic vectors are linked [19]. When the coefficient of correlation between two vectors is above "0", the characteristics are said to have a highly positive correlation. Likewise, the features are said to be negatively correlated if the correlation coefficient between these two characteristic vectors is less than "0". The features are said to be uncorrelated if the correlation coefficient of the two characteristic vectors is equal to "0" [20]. Chi-square is based on statistical analysis; it measures the independence of the feature vector. The strength of the relationship between two random variables is tested with observed and anticipated values. The descriptors should ideally be invariant to operations like scale, rotation and illumination changes. This invariance enables descriptors to be matched across videos which have differences in these parameters.

The Extreme classification model is an extension of the traditional neural network model for data classification. It partitions the whole problem into a number of sub-problems and merges them to find an optimal stock market trend prediction. The parameters of the hidden layer, which contain the training data samples, are mapped to the output layer. In the traditional Feed-Forward Neural Networks (FFNN) approach, the adjustments of parameters are iterative in nature and this causes some issues. These issues are overcome by the suggested Extreme classifier approach [21]. Most of the traditional learning models for training Single Hidden Layer Feed-Forward Neural Networks (SLFN) are comparatively slower than non-parametric approaches on stock detection. This approach operates slowly because the parameters are required to be tuned iteratively. Moreover, these models require high computational memory and also increase the overall computation time of the mapping process. An extended and slightly modified version of the traditional FFNN approach was developed as the Extreme classifier [22]. This method is used to enhance the efficiency and performance of conventional SLFNs. Also, most of the neural network-based learning schemes perform manual tuning of control parameters (such as learning rate, learning epochs, etc.) and suffer from local minima. But the Extreme classifier is applied automatically and there is no need for manual iterative tuning. The classification boundary is not optimal in FFNN and the boundary is constant throughout the stock training phase. Hence, there are chances of misclassification of samples closer to the boundary. This approach requires a large number of hidden neurons compared to other traditional tuning-based approaches.

Each stock's data is scanned and transformed into normalized continuous data. The main issues with the stock datasets are their high dimensionality and imbalanced nature. Traditional machine learning classifiers consider a subset of features for classification and trend prediction, with high true negative rates and error rates. Attribute selection is used to compute a measure for each feature and rank them accordingly. These ranking methods select the top 'k' features based on the highest rank and eliminate those having lower feature ranks. Information gain is one of the attribute selection measures based on the entropy value. The information gain approach is the mutual information of a target random variable, say P, and an independent random variable Q. The main limitation of this approach is that it favours features having a large number of distinct values over features having fewer distinct values. They developed an evolutionary technique in order to detect stock anomalies using hidden Markov models for imbalance detection. In this work, they proposed an imbalance-based technique which is responsible for monitoring the bandwidth consumption of the sub-stock. The normal behaviour scheme completely depends upon the bandwidth consumption of the sub-stock. The most common variables of the hidden Markov models are bandwidth consumption and the total amount of time required for all stock activities. In this model, feature ranking measures such as information gain and correlation are used to filter the feature space. After successful completion of the feature ranking, a feature reduction approach is implemented. The feature reduction technique is implemented through the integration of the ranks generated from the information gain and correlation processes. The reduced features are given as input to a feed forward neural network to train and test the stock features in the stock dataset. In this method, the pre-processing is carried out manually, which is a severe drawback of this model.

Mittal et al. proposed a knowledge-maximized ensemble approach for various kinds of concept drift [23]. In this work, they presented an advanced stream classifier which is known as the knowledge maximized ensemble. Hence, it becomes hard and complicated to restrict the amount of training data. This technique can handle different kinds of concept drift through the integration of various imbalance detection approaches. Decision tree induction is a simple and powerful classification process that produces a tree and a set of rules from a specific dataset [24], representing a model for the different classes. During regression analysis, linear combination splitting criteria are used. In building a classification model, the training data set is used, while the test data record is used in model validation. It is used to classify and predict new records, which differ between training and testing. Supervised learning algorithms are preferable to unsupervised learning algorithms (like clustering), because their prior knowledge of the class labels of the data logs simplifies the selection of features/attributes, and therefore leads to better prediction/classification accuracies. Some researchers have succeeded in adopting the theory of rough sets for the classification of different stock complications. The error rates for rough sets were found to be fully comparable with, and often significantly lower than, the other computational techniques.

Recently, ensemble learning models have become popular and widely accepted for high dimensional and imbalanced datasets. Most of the traditional ensemble classification models are processed with a limited feature space and a small data size. As the size of the feature space increases, traditional ensemble classifiers select a predefined number of features for classification. Learning classification models with all the high dimensional features may result in serious issues such as performance and scalability. Feature selection measures can be categorized into three types: wrappers, filters and embedded models. Several studies have been carried out with the aim of classifying trends with high classification accuracy using different types of Artificial Neural Network (ANN) architectures developed on different datasets. Random Forests, Gradient Boosting, or even Logistic Regression can also be used to predict and classify trends with high dimensional feature sets. Although high classification accuracies have been reported, further dynamic evaluation of the trend is needed to gain information about the stocks. It is important to measure the trend function because stocks frequently do not recognize or emphasize small movements, especially if they last a longer time. Learning classification models with all the high dimensional features may result in serious issues such as performance and scalability [25]. The main problems in the existing models are:
1. Feature selection on high dimensional datasets.
2. Predicting the trend with a high true positive rate and a low error rate.
3. Handling high dimensional and large datasets using the parallel processing model.

Class-imbalanced data are common in the domain of data categorization. It generally categorizes many irrelevant documents, but some articles are categorized under the interesting category. BN approaches are mostly implemented as standard classifiers. These approaches give rise to exact results along with the capability of representing the relationships between variables. This approach is unable to resolve the traditional class-imbalanced problem [26]. The above process continues executing till it matches the size of the other class and the cost-sensitive learning scheme. It includes the modification of the relative cost associated with the misclassification of the positive and negative class [27]. The outcomes of both methods are analyzed and compared with the performance achieved without balancing.

3. Filter based stock technical prediction model

In this paper, we propose a novel filter based classification model on the technical data to find the bullish trend stocks in real-time market data. This model is tested on the continuous type of technical data for trend prediction. In the proposed framework, a novel stock market trend prediction model is designed and implemented on real-time market data. In this model, real-time stock market technical data and its social media comments are used to predict the trend of each stock for the classification problem.

Figure 3. Proposed stock market trend prediction framework

Fig. 3 describes the flowchart of the proposed model for stock market trend prediction. Initially, real-time market data is taken from stock market sites such as NSE or Upstox. Social media stock comments are extracted from the TradingView or Moneycontrol site. Stock related technical factors such as symbol, price, ADX, ADR, RSI, MACD, news sentiment score, etc., are used as the training data. Here, text pre-processing and data pre-processing operations are performed on the stock text comments and technical data. New features are extracted from the comments and technical data for stock trend prediction. Finally, each stock trend is predicted using the proposed non-linear SVM classifier at different levels of time frame.

3.1 Text pre-processor

In a first step, all stock market comments of the stocks are captured from the Moneycontrol website as the training data. All the known stop words are subsequently removed from the corpus. Here we adopt a standard POS tagger for extracting nouns from the stock sentiment data. If a model uses only P, then in the left side of Fig. 4 a rational prediction "trends down without reversing" would be used, and in the right side of Fig. 4, "trends up without reversing"; "returning and upward trend" in the left side of Fig. 4 and "reversing and downward trend" in the right side of Fig. 4 would increase the model's error rate. If a model uses only N, the derivative f could not explain why the price still "drops down", as the left side of Fig. 4 shows, when good news is released, and why the price is "trending up" when bad news is released.

Figure 4. Bullish and bearish trend prediction

In this work, a new comprehensive and finance-specific word list is used to find the stock sentiment score. Using the standard TF/IDF weighting scheme, the weight of each term is used to find the relevance of the bullish or bearish trend in the stock. The weight of the stock comment term t is computed as shown in Eq. (6)

$$
w(t,d) = \left(0.5 + 0.5 \times \frac{f(t,d)}{\max_{t'} f(t',d)}\right) \times I(t), \qquad
I(t) = \log\frac{C}{c(t)}
\tag{6}
$$

where f(t,d) is the frequency of the term in the comment, max_{t'} f(t',d) is the maximum frequency over all terms in all the stock comments, C is the count of both positive and negative classes, and c(t) is the maximized positive or negative terms in the comments list.
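A minimal sketch of the augmented-frequency term weight of Eq. (6); the tokens and word sets are placeholders, the dictionary is only a stand-in for the paper's finance-specific word list, and the handling of C and c(t) is a guess since the paper defines them only loosely.

```python
import math
from collections import Counter

def term_weight(term, comment_tokens, pos_terms, neg_terms):
    """w(t,d) = (0.5 + 0.5 * f(t,d)/max_t' f(t',d)) * log(C / c(t))  (Eq. 6)."""
    freq = Counter(comment_tokens)
    if term not in freq:
        return 0.0
    tf_part = 0.5 + 0.5 * freq[term] / max(freq.values())
    C = 2                                                   # positive and negative classes
    c_t = 1 if (term in pos_terms or term in neg_terms) else 2  # assumed interpretation
    return tf_part * math.log(C / c_t)

bullish = {"gain", "rise", "boost"}
bearish = {"drop", "fall", "plunge"}
tokens = "hdfc may rise again after the gain in volumes".split()
print(term_weight("rise", tokens, bullish, bearish))
```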


The public sentiment on each stock is specified as bullish or bearish. The bullishness of the stock is computed using Eq. (7)

$$
Bulls^{+} = \sum_{i=0}^{\tau}\sum_{j=0}^{K} \frac{P_{i,j} \times w(t,d)_j}{l_i}
\tag{7}
$$

where P_{i,j} represents the count of bullish terms in each comment, Tp_i is the total count of the bullish terms in all the comments, Tn_i is the total count of the bearish words in all the comments, w(t,d) represents the weight of the comment term, and N_{i,j} represents the number of bearish terms in each comment. The bearish sentiment score of each comment is computed as Eq. (8)

$$
Bears^{-} = \sum_{i=0}^{\tau}\sum_{j=0}^{K} \frac{N_{i,j} \times w(t,d)_j}{l_i} \times Tn_i
\tag{8}
$$

3.2 Data pre-processor

In the data pre-processor, each feature in the technical data is normalized to find the correlation between the data samples for trend prediction. Let x define the input vector, which is normalized to the specified range x ∈ R → x ∈ [R1, R2] to remove the sparsity issue. Here R1 and R2 represent the predefined normalized range, as shown in Eq. (9)

$$
x' = \frac{x - \min(x)}{\max(x) - \min(x)} \times (R2 - R1) + R1
\tag{9}
$$
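A hedged sketch of Eqs. (7)-(9): aggregating weighted bullish and bearish term counts per comment and rescaling a technical feature to a target range. The aggregation is simplified (the Tp_i/Tn_i normalizers are folded into a single per-comment denominator), so it illustrates the idea rather than reproducing the exact formulas.

```python
import numpy as np

def bull_bear_scores(comments, bullish, bearish, weight):
    """Sum weighted bullish/bearish term counts over all comments (Eqs. 7-8)."""
    bulls, bears = 0.0, 0.0
    for tokens in comments:
        length = max(len(tokens), 1)
        bulls += sum(weight(t) for t in tokens if t in bullish) / length
        bears += sum(weight(t) for t in tokens if t in bearish) / length
    return bulls, bears

def rescale(x: np.ndarray, r1: float = 0.0, r2: float = 1.0) -> np.ndarray:
    """Min-max normalization of a technical feature into [R1, R2] (Eq. 9)."""
    return (x - x.min()) / (x.max() - x.min()) * (r2 - r1) + r1

comments = [["price", "may", "rise"], ["volumes", "drop", "sharply"]]
print(bull_bear_scores(comments, {"rise"}, {"drop"}, weight=lambda t: 1.0))
print(rescale(np.array([120.0, 135.0, 128.0, 150.0])))
```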

3.3 Feature extraction

3.3.1. Stock comment feature score

For each comment in the stock corpus D, we construct a dictionary of words that contains bullish and bearish words. By using this dictionary, each stock comment d is represented as a bag-of-words vector w. Let st(i) define the ith comment term in the stock comment d. In order to compare the comments of different stocks, the term frequencies (tf) are used to normalize the words in all the stock comments. This normalized data is rescaled using the inverse document frequency (idf), as shown in Eq. (10)

$$
tfidf(st(i), d, SC) = tf(st(i), d) \times idf(st(i), SC), \qquad
idf(st(i), d, SC) = \log\frac{|SC|}{1 + |\{d \in SC \mid st(i) \in d\}|}
\tag{10}
$$

where |SC| is the cardinality of the stock related comments SC.

3.3.2. New stock technical indicator

Mutual information (MI) is used to find the variation in two or more data distributions of stock features. In real-time market data, it is used to analyze the contextual feature relationship over time. It is a measure of the correlation and dependency of the features in a high dimensional dataset. The hybrid mutual information is represented in terms of the bullish and bearish cases as shown in Eq. (11)

$$
IG_a(l) = p(l, bu)\,\log\frac{p(l, be)}{p(l) \times p(be)}
\tag{11}
$$

where bu represents the bullish type and be represents the bearish type, p(l, bu) is the probability of the term in the bullish dictionary, p(be) is the probability of the bearish terms, and IG_a(l) is the information gain of the term l in the bullish or bearish terms list.

Here, the computed stock sentiment score is used as an additional attribute in the training data.

$$
I(X, Y) = P_{Joint}(x, y)\,\log\!\left(\frac{P_{Joint}(x, y)}{P_X(x)\,P_Y(y)}\right), \qquad
x^2(w)\cdot p(\beta)\sum_{t=1}^{T}\sum_{\beta} I(X, Y) \;\;\text{for technical data}
\tag{12}
$$

Let α, β, γ represent the three essential stock trend technical factors taken from the training data:

α = Stock Performance (D)
β = Volatility (D)
γ = RSI (D)

$$
Stock_{sentiment} = \frac{\alpha \times \beta}{\gamma}
$$

SSC(3) = stock sentiment score;

$$
P(\beta) = \alpha \times SSC(3) \times RSI \sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\,\beta_j^2 + \alpha|\beta_j|\right), \quad \alpha \in [0, 1]
$$

where RSI > 70 indicates an overbought condition and RSI < 30 indicates an oversold condition.
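A small sketch of the proposed technical-factor combination, assuming the α (performance), β (volatility) and γ (RSI) factors described above; the exact aggregation in the paper is only partially specified, so this simply follows the Stock_sentiment = (α × β) / γ form with the stated RSI thresholds.

```python
def stock_technical_sentiment(performance: float, volatility: float, rsi: float) -> dict:
    """Combine performance (alpha), volatility (beta) and RSI (gamma) into a
    single technical sentiment factor, flagging RSI extremes."""
    sentiment = (performance * volatility) / rsi if rsi else 0.0
    condition = "overbought" if rsi > 70 else "oversold" if rsi < 30 else "neutral"
    return {"stock_sentiment": sentiment, "rsi_condition": condition}

print(stock_technical_sentiment(performance=0.12, volatility=1.8, rsi=76.0))
```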
3.3.3. Bullish dictionary words

In this section, a list of bullish dictionary words is used to predict the trend of the stock using the technical indicator. The following set of bullish words is used to compute the trend of a stock in real-time market data:

escalated, gain, enjoy, expansion, aggrandize, elevated, increment, rise, prefer, hallow, expand, supersize, idolize, positive, appreciate, plus, relish, accelerate, augment, raise, more, amplify, soar, adore, appreciative, approbatory, desire, esteem, approving, raised, swell, extend, addition, worship, climb, add, commendatory, venerate, augmentation, fancy, revere, friendly, proliferate, addendum, increased, escalate, proliferation, accumulate, love, stoke, complimentary, heightened, hype, uprise, accrual, boost, up, applauding, enlarge, admire, admiring, good, multiply, accretion.

3.3.4. Bearish dictionary words

In this section, a list of bearish dictionary words is used to predict the trend of the stock using the technical indicator. The following set of bearish words is used to compute the trend of a stock in real-time market data:

descend, recede, depreciative, abhor, drop, diminution, depreciatory, uncomplimentary, adverse, deplore, slide, detest, lower, plunge, lessen, unappreciative, depletion, dislike, dive, reduce, decrease, depressed, decreased, under, diminish, dip, low, derogatory, disapprove, unfavourable, negative, lowering, loathe, disfavour, unflattering, sink, receded, disdain, hate, decrement, unfriendly, subtract, loss, abate, decline, despise, fall, diminishment, lessening, downsize, abominate, minify, execrate, deprecate, in-appreciative, dropped, shrinkage, reduction, wane, abatement, disapproving, dwindle, down.

3.3.5. Stock data transformation

Input: Training dataset D, F(D): feature space of D.
Output: Kernel filtered or transformed data KD.
Procedure:
Read input data D.
For each pair of features F[i], F[j] in the feature space F(D)
Do
    Apply the kernel transformation on F[i] as
    KernelTransform(F[i]) = ∅ = (1/√(2π)) · e^(−(F[i] − μ(F[i]))/σ(F[i]))
    where n = Σ (F[i] − μ) / max{F[i]}
    If (FT(F[i]) > 0 and n > 0)
    Then
        Normalize F[i] using min-max normalization to [0, 1]
    Else
        Normalize F[i] and F[j] within [0, 1] using min-max normalization as KD:
        x' = (x − min(x)) / (max(x) − min(x)) × (R2 − R1) + R1
    End if
Done

Here, min-max normalization is used to scale the data in the range 0 to 1. This approach is used to clean the data in high dimensional datasets.
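A minimal sketch of the data transformation step in Section 3.3.5, assuming a Gaussian-style kernel transform followed by min-max scaling; the exact form of the transform in the paper is only loosely specified, so this is an interpretation rather than a faithful reimplementation.

```python
import numpy as np

def kernel_transform(feature: np.ndarray) -> np.ndarray:
    """Gaussian-style transform of a single technical feature column (assumed form)."""
    mu, sigma = feature.mean(), feature.std() or 1.0
    return (1.0 / np.sqrt(2 * np.pi)) * np.exp(-(feature - mu) / sigma)

def min_max(feature: np.ndarray, r1: float = 0.0, r2: float = 1.0) -> np.ndarray:
    return (feature - feature.min()) / (feature.max() - feature.min()) * (r2 - r1) + r1

def transform_dataset(X: np.ndarray) -> np.ndarray:
    """Apply the kernel transform and rescale every feature column to [0, 1]."""
    return np.column_stack([min_max(kernel_transform(X[:, j])) for j in range(X.shape[1])])

X = np.array([[55.0, 1.2], [62.0, 0.8], [47.0, 1.9], [70.0, 1.1]])  # e.g. RSI, volatility
print(transform_dataset(X))
```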
and y values
Read input data D.
For each pair of feature F[i], F[j] in feature 𝜏𝑚 , 𝑠𝑖 constants
space F (D)
Do
Apply Kernel transformation on I as 3.4 Proposed non-linear SVM classification
KernelTransform(F[i]) = ∅ model
1 −(𝐹[𝑖]−𝜇(𝐹[𝑖]))⁄𝜎(𝐹[𝑖])
= 𝑒 Input the stock features for data classification.
√2𝜋
Where 𝑛 = ∑ (𝐹[𝑖] − 𝜇)⁄𝑚𝑎𝑥{𝐹[𝑖]} For each feature set do
If (FT (F[i]>0 and n>0) for each stock in SD.
Then do
Normalize F[i] using Min-max Apply SVM multi-class optimization
1 2
normalization [0, 1] models as 𝑚𝑖𝑛𝑊𝑘 ,𝑎𝑘 2 ‖𝑊𝑘 ‖1 + 𝜏𝑚 +
Else ∑𝑙𝑖=1 𝑎𝑖 (𝑦𝑖 [𝐾𝑒𝑟 < 𝑥, 𝑦 >. 𝑤 + 𝑏] − 1 + 𝑠 𝑒 𝑖 ) −
Normalize F[i] and F[j] within [0, 1] using ∑𝑙𝑖=1 𝛾𝑖 𝑠 𝑒 𝑖
Min-max normalization as KD
𝑠. 𝑡 𝐾𝑒𝑟 < 𝑥, 𝑦 >. 𝑤 + 𝑏 ≥ 1 − 𝑠𝑖 𝑒𝑛 − 𝜏𝑚 , 𝑠𝑖 𝑒𝑛
> 0, 𝜏𝑚 > 0; 𝑚 = 1 … 𝑐𝑙𝑎𝑠𝑠𝑒𝑠


Here the kernel function Ker<x, y> represents the kernel functions defined from the trend feature space:

$$
Ker\langle x, y\rangle =
\begin{cases}
e^{-s_i e_n \log(\sum\|x-y\|^2)} & \text{if } x = y \\
e^{-s_i e_n \log(\sum\|x-y\|^{1/2})} & \text{if } x < y \\
e^{-s_i e_n \log(\sum\|y\|^2)} & \text{if } x > y
\end{cases}
$$

Test data is assigned to the class y with the largest decision value, i.e. $\arg\max_k\{W_k^T D_i + b_k\}$.
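As a rough, hedged sketch of the classification stage, scikit-learn's SVC with a standard RBF kernel stands in for the custom kernel above (which is not reproduced exactly here), and the feature matrix combines normalized technical indicators with the sentiment score attribute described in Section 3. The example is written in Python for brevity, although the paper's experiments use a Java environment; all data and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Columns: normalized RSI, MACD, ADX, volatility, sentiment score (illustrative).
X = rng.random((400, 5))
y = (0.6 * X[:, 0] + 0.4 * X[:, 4] > 0.5).astype(int)  # 1 = bullish, 0 = bearish

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
clf = SVC(kernel="rbf", C=10.0, gamma="scale", decision_function_shape="ovr")
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```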

4. Experimental results

Experimental results are simulated using a Java environment and real-time market data. In this work, real-time NSE stock market data is taken as input for technical analysis. These real-time data are taken from the Zerodha/Upstox brokerage API. The proposed model is compared to the traditional stock market classification models to verify the performance of the hybrid classification model against the traditional models. Also, the proposed model is compared to the traditional techniques using various statistical performance measures such as accuracy, true positive rate, recall, precision, false positive rate, ROC area, etc. These performance metrics are analysed and compared using third party Java libraries. Different statistical metrics such as recall, precision, accuracy and F-measure are evaluated on the stock market sentiment data along with the technical data. These statistical measures are evaluated based on the confusion matrix shown in Table 1.

Table 1. Stock measuring metrics
Actual \ Predicted: Stock positive | Stock negative
Stock positive: Stock true positive (STP) | Stock false positive (SFP)
Stock negative: Stock false negative (SFN) | Stock true negative (STN)

Accuracy: It is the ratio of correctly labelled stock class predictions to the entire set of stock class labels, as shown in Eq. (13)

$$
Stock\ Accuracy\ (SA) = \frac{STP + STN}{STP + SFP + SFN + STN}
\tag{13}
$$

Precision: It is the ratio of correctly classified positive stock class labels to all stock instances predicted as positive, as shown in Eq. (14)

$$
Stock\ Precision\ (SP) = \frac{STP}{STP + SFP}
\tag{14}
$$

Recall: It is the ratio of correctly classified positive stock class labels to all actual positive stock class labels, as shown in Eq. (15)

$$
Stock\ Recall\ (SR) = \frac{STP}{STP + SFN}
\tag{15}
$$

F-Measure: It is the harmonic average of recall and precision, as shown in Eq. (16)

$$
Stock\ F\text{-}measure\ (SF) = \frac{2 \times SR \times SP}{SR + SP}
\tag{16}
$$

Figure 6. Upstox 1 min candle data for testing
Figure 7. Upstox 5 min candle data for testing
Figure 8. Upstox one day candle data for testing

Fig. 6 illustrates the 1 min candlestick pattern graph on the Upstox brokerage website. From the graph it is clearly noted that the performance of HDFC Bank is an up trend in the afternoon session and neutral in the morning session.

Fig. 7 illustrates the 5 min candlestick pattern graph on the Upstox brokerage website. From the graph it is clearly noted that the performance of HDFC Bank is an up trend in the morning session and neutral in the afternoon session.

Fig. 8 illustrates the 1 day candlestick pattern graph on the Upstox brokerage website. From the graph it is clearly noted that the performance of HDFC Bank is an up trend up to June and neutral between July and September.
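A small sketch of the confusion-matrix metrics of Eqs. (13)-(16); the counts are placeholders, not results from the paper.

```python
def stock_metrics(stp: int, sfp: int, sfn: int, stn: int) -> dict:
    accuracy = (stp + stn) / (stp + sfp + sfn + stn)             # Eq. (13)
    precision = stp / (stp + sfp)                                 # Eq. (14)
    recall = stp / (stp + sfn)                                    # Eq. (15)
    f_measure = 2 * recall * precision / (recall + precision)    # Eq. (16)
    return {"SA": accuracy, "SP": precision, "SR": recall, "SF": f_measure}

print(stock_metrics(stp=80, sfp=12, sfn=8, stn=100))
```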

Table 2. Stock feature extraction using the proposed models


MI [29] Chi-Square [28] Rough set [31] IG [30] GA [32] PSO [33] Proposed FS
58 56 78 63 56 51 41
53 54 78 60 52 56 37
55 54 74 59 62 69 43
56 52 82 57 53 60 46
58 55 63 58 60 72 48
56 59 85 64 54 52 44
65 65 75 58 58 69 42
53 65 77 55 62 75 50
64 63 79 50 62 56 41
65 52 85 59 57 69 36
60 71 69 62 56 50 36
56 61 70 57 61 70 38
59 70 67 65 58 74 42
56 67 69 62 60 58 45
54 59 69 51 55 60 39
51 72 74 60 57 72 45
53 64 61 57 53 60 46
60 59 85 50 58 65 44
56 62 76 62 61 69 38
58 62 71 51 60 54 38

Figure 9. Multiple technical trend prediction model

Fig. 9 illustrates the various candlestick patterns on the Upstox brokerage website. From the graph it is clearly noted that various types of technical trends are used on HDFC Bank to check the trend from the morning session to the afternoon session.

Table 2 illustrates the performance of stock trend feature extraction using the proposed approach on large datasets. From Table 2, it is clearly shown that the present feature extraction procedure has a higher filtering rate compared to the existing approaches.

Fig. 10 describes sample Yes Bank user comments on the Moneycontrol website. This data is extracted using web drivers in real time.

Figure 10. Sample Yes Bank user comments

Table 3 illustrates the performance of stock trend feature extraction using the proposed approach on large datasets. From Table 3, it is clearly shown that the present feature extraction procedure has a higher filtering rate compared to the existing approaches.

Table 4 describes the computational runtime (ms) of stock trend feature extraction using the proposed approach on large datasets. From Table 4, it is clearly shown that the present feature extraction procedure has a lower computational runtime compared to the existing approaches.


Table 3. Stock trend feature extraction using the proposed approach

MI Chi-Square Rough set IG GA PSO Proposed FS
64 63 66 60 56 60 48
64 52 77 62 59 66 47
57 58 80 56 56 53 48
62 50 72 64 58 66 38
55 63 65 51 58 65 35
60 53 77 54 54 60 47
60 68 71 54 63 72 46
63 73 66 64 51 68 40
53 54 78 54 58 52 46
53 74 71 57 52 52 40
61 73 80 61 62 59 46
58 55 75 56 57 58 48
57 56 64 54 58 51 44
58 70 74 60 59 72 42
64 60 81 50 65 61 42
55 66 65 55 52 69 36
61 69 60 59 65 50 48
52 62 84 56 53 58 47
54 51 64 63 53 72 44
64 57 66 50 50 63 37

Table 4. Performance analysis of computational runtime (ms) with different traditional feature selection models
Features Size MI Chi-square Roughset Information Gain Genetic Algorithm PSO Proposed FS
StockID-100 5417 6297 7230 7024 6638 6481 4747
StockID-200 5365 6529 6957 5917 6253 6516 3965
StockID-300 6200 6609 5959 6882 7099 6469 3474
StockID-400 6122 5514 6055 5729 7430 5584 3495
StockID-500 6727 5676 7488 6685 6103 6753 4809
StockID-600 5774 6089 6823 6273 5996 6597 4292
StockID-700 7205 6045 6060 7449 7448 6319 3888
StockID-800 6976 7340 6036 6544 6855 5864 4723
StockID-900 7080 5710 6692 5661 6866 7365 4488
StockID-1000 5854 7118 7087 5916 5541 5552 4368
StockID-1100 6431 7224 5597 7359 7022 6253 4470
StockID-1200 6595 5767 5989 7086 5950 7552 3995
StockID-1300 6077 7018 5357 6242 5645 6886 4335
StockID-1400 7350 6953 5711 6927 7133 7061 4439
StockID-1500 7536 5695 5414 6211 6909 6438 4303
StockID-1600 7557 7425 6421 6729 6922 5686 3618
StockID-1700 7032 5520 6334 5614 5942 5840 4524
StockID-1800 7556 7183 5417 5700 7236 5630 4734
StockID-1900 5669 5589 5708 7281 5900 6588 4568
StockID-2000 6318 6253 5603 6852 5925 7090 4178


Figure 11. Performance analysis of recall using different traditional feature selection based classification frameworks (x-axis: recall; y-axis: stock data size; methods compared: SVM, RF, HT, NN, BN, CNN, RNN, HNN, Proposed_FS_SVM)

4.1 Recall

Fig. 11 describes the recall performance of stock trend classification using the proposed learning framework on large datasets. From Fig. 11 it is clearly shown that the present framework has higher recall compared to the existing frameworks.

4.2 Precision

Fig. 12 describes the precision performance of stock trend classification using the proposed learning framework on large datasets. From Fig. 12 it is clearly shown that the present framework has higher precision compared to the existing models.

Figure 12. Performance analysis of precision using different traditional classification learning frameworks (x-axis: precision; y-axis: stock data size; methods compared: SVM, RF, HT, NN, BN, CNN, RNN, HNN, Proposed_FS_SVM)


Figure 13. Performance analysis of F1-measure using different traditional classification learning frameworks (x-axis: F1-measure; y-axis: stock data size; methods compared: SVM, RF, HT, NN, BN, CNN, RNN, HNN, Proposed_FS_SVM)

Figure 14. Performance analysis of accuracy using different traditional classification learning frameworks (x-axis: accuracy; y-axis: stock data size; methods compared: SVM, RF, HT, NN, BN, CNN, RNN, HNN, Proposed_FS_SVM)

4.3 F1-measure

Fig. 13 describes the F1-measure performance of stock trend classification using the proposed learning framework on large datasets. From Fig. 13 it is clearly shown that the present framework has a higher F1-measure compared to the existing frameworks.

4.4 Accuracy

Fig. 14 describes the accuracy performance of stock trend classification using the proposed learning framework on large datasets. From Fig. 14 it is clearly shown that the present framework has higher accuracy compared to the existing frameworks.

5. Inference

As described in the above results, the data filtering and feature selection of the proposed model improve the various performance measures such as precision, recall, F-measure, accuracy and runtime. From the above tables, it is clearly identified that the proposed feature extraction and scoring approach optimizes the stock sentiment of the social media comments and its technical data. Compared to the traditional feature extraction measures, the proposed stock feature extraction function has high computational efficiency with less runtime on real-time stock market databases.

In the experimental results, the Nifty 50 stocks and their technical data are taken as input data to the proposed model. Initially, the input data is pre-processed using the data transformation approach. Here, the filtered data is used as training data for the classification model. In the above tables, various feature extraction measures such as mutual information (MI), chi-square, rough set, information gain and the genetic algorithm are used to find the essential features in the input stock data for the classification problem. The output of the classification approach is the stock trend prediction, i.e. buy (1) or sell (0).

From the results, it is observed that the proposed model has about 7% better efficiency for feature identification and runtime (ms).

The performance of the proposed non-linear SVM classifier is compared to traditional classifiers such as SVM, random forest (RF), Hoeffding tree (HT), neural network (NN) and Bayesian networks. From the above tables, it is observed that the performance of the proposed non-linear classifier is better than the traditional classifiers in terms of recall, precision, F-measure, accuracy and runtime (ms). Also, on average, accuracy is improved by approximately 8-10% in the proposed model compared with the traditional stock market prediction classifiers. Discussion: Tables 2-4 describe the efficiency of the hybrid feature selection approach compared to the conventional feature selection models on stock market data. From the results, it is noted that the proposed approach has better efficiency in selecting the features for the classification problem. Sections 4.1-4.4 illustrate the performance of the classification measures on the real-time stock market data.

6. Conclusion

In this paper, a new sentiment and technical based stock market trend prediction model is designed and implemented on real-time market data. Most of the existing technical indicators find it difficult to predict the bullish or bearish trend using the technical data and social stock comments. Also, these indicators contain noise during data pre-processing and stock feature extraction. In the proposed work, a new technical indicator for the stock market data is proposed to find the bullish or bearish trend in each stock. Here, social media stock related comments are used to find the movement or trend of the stock along with the technical indicator. Experimental results proved that the present model has higher computational efficiency than the traditional technical indicators in terms of accuracy, F-measure, precision and recall. From the experimental results, it is observed that the proposed stock market trend prediction model improves runtime (ms) by about 7% and average classification accuracy by about 10% compared to the traditional trend prediction models on the training and test datasets.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Author Contributions

The entire work of conceptualization, formal analysis, validation, and the writing, editing and modification of the article was done by Lakshmana Phaneendra Maguluri under the supervision of Ragupathy Rengaswamy.

References

[1] K. Gadiaa and K. Bhowmick, "Parallel Text Mining in Multicore Systems Using FP-tree Algorithm", Procedia Computer Science, Vol. 45, pp. 111-117, 2015.
[2] J. Bollen, H. Mao, and X. Zeng, "Twitter mood predicts the stock market", Journal of Computational Science, Vol. 2, No. 1, pp. 1-8, March 2011.
[3] F. Xianghua, L. Wangwang, X. Yingying, and C. Laizhong, "Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis", Neurocomputing, Vol. 241, pp. 18-27, 2017.
[4] D. Lin, L. Li, D. Cao, Y. Lv, and X. Ke, "Multi-modality weakly labeled sentiment learning based on Explicit Emotion Signal for Chinese microblog", Neurocomputing, Vol. 272, pp. 258-269, 2018.
[5] M. Giatsoglou, M. G. Vozalis, K. Diamantaras, A. Vakali, G. Sarigiannidis, and K. C. Chatzisavvas, "Sentiment analysis leveraging emotions and word embeddings", Expert Systems with Applications, Vol. 69, pp. 214-224, 2017.
[6] K. Guo, Y. Sun, and X. Qian, "Can investor sentiment be used to predict the stock price? Dynamic analysis based on China stock market", Physica A: Statistical Mechanics and its Applications, Vol. 469, pp. 390-396, 2017.
[7] C. Hung, "Word of mouth quality classification based on contextual sentiment lexicons", Information Processing & Management, Vol. 53, No. 4, pp. 751-763, 2017.
[8] O. Araque, I. C. Platas, J. F. S. Rada, and C. A. Iglesias, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications", Expert Systems with Applications, Vol. 77, pp. 236-246, 2017.
[9] D. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)", In: Proc. of International Conference on Learning Representations, 2016.
[10] T. Chen, X. He, and M. Y. Kan, "Context-aware image tweet modeling and recommendation", In: Proc. of the 24th ACM International Conference on Multimedia, pp. 1018-1027, 2016.
[11] X. He, H. Zhang, M. Y. Kan, and T. S. Chua, "Fast matrix factorization for online recommendation with implicit feedback", In: Proc. of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 549-558, 2016.
[12] L. P. Maguluri and R. Ragupathy, "A New Sentiment Score Based Improved Bayesian Networks for Real-Time Intraday Stock Trend Classification", International Journal of Advanced Trends in Computer Science and Engineering, Vol. 8, No. 4, 2019.
[13] P. C. Chang, C. H. Liu, J. L. Lin, C. Y. Fan, and C. S. P. Ng, "A neural network with a case based dynamic window for stock trading prediction", Expert Systems with Applications, Vol. 36, No. 3, pp. 6889-6898, 2009.
[14] B. B. Nair, V. P. Mohandas, and N. R. Sakthivel, "A Decision Tree-Rough Set Hybrid System for Stock Market Trend Prediction", International Journal of Computer Applications, Vol. 6, No. 9, pp. 1-6, 2010.
[15] B. B. Nair, V. P. Mohandas, and N. R. Sakthivel, "A Stock Market Trend Prediction System Using a Hybrid Decision Tree-Neuro-Fuzzy System", In: Proc. of Advances in Recent Technologies in Communication and Computing, pp. 381-385, 2010.
[16] R. Majhi, G. Panda, and G. Sahoo, "Development and performance evaluation of FLANN based model for forecasting of stock markets", Expert Systems with Applications, Vol. 36, No. 3, pp. 6800-6808, 2009.
[17] A. P. Ratto, S. Merello, L. Oneto, Y. Ma, L. Malandri, and E. Cambria, "Ensemble of Technical Analysis and Machine Learning for Market Trend Prediction", In: Proc. of IEEE Symposium Series on Computational Intelligence (SSCI), pp. 2091-2096, 2018.
[18] T. Loughran and B. McDonald, "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks", The Journal of Finance, pp. 35-65, 2011.
[19] R. Blagus and L. Lusa, "Class prediction for high-dimensional class-imbalanced data", BMC Bioinformatics, pp. 1-17, 2010.
[20] R. Luss and A. d'Aspremont, "Predicting abnormal returns from news using text classification", Quantitative Finance, Vol. 15, No. 6, pp. 1-14, 2015.
[21] Y. Ma, H. Peng, and E. Cambria, "Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM", In: Proc. of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 5876-5883, 2018.
[22] S. Merello, A. P. Ratto, L. Oneto, and E. Cambria, "Predicting Future Market Trends: Which Is the Optimal Window?", Recent Advances in Big Data and Deep Learning, pp. 180-185, 2019.
[23] A. Mittal and A. Goel, "Stock Prediction Using Twitter Sentiment Analysis", Stanford University, CS229, pp. 1-5, 2011.
[24] P. Areekul, T. Senjyu, H. Toyama, and A. Yona, "Notice of Violation of IEEE Publication Principles: A Hybrid ARIMA and Neural Network Model for Short-Term Price Forecasting in Deregulated Market", IEEE Transactions on Power Systems, Vol. 25, No. 1, pp. 524-530, 2009.
[25] G. E. P. Box, G. Jenkins, G. Reinsel, and G. Ljung, "Time series analysis: Forecasting and control", Journal of Time, Vol. 31, pp. 238-242, 1976.
[26] S. K. Chandar, M. Sumathi, and S. N. Sivanandam, "Prediction of Stock Market Price using Hybrid of Wavelet Transform and Artificial Neural Network", Indian Journal of Science and Technology, Vol. 9, No. 8, pp. 1-5, 2016.
[27] M. Syamala and N. J. Nalini, "A Filter Based Improved Decision Tree Sentiment Classification Model for Real-Time Amazon Product Review Data", International Journal of Intelligent Engineering and Systems, Vol. 13, No. 1, pp. 191-202, 2020.
[28] Y. Zhai, W. Song, X. Liu, L. Liu, and X. Zhao, "A Chi-Square Statistics Based Feature Selection Method in Text Classification", In: Proc. of Software Engineering and Service Science (ICSESS), Beijing, China, pp. 160-163, 2018.
[29] X. Wang, B. Guo, Y. Shen, C. Zhou, and X. Duan, "Input Feature Selection Method Based on Feature Set Equivalence and Mutual Information Gain Maximization", IEEE Access, Vol. 7, pp. 151525-151538, 2019.
[30] Y. Wang, "Unsupervised Representative Feature Selection Algorithm Based on Information Entropy and Relevance Analysis", IEEE Access, Vol. 6, pp. 45317-45324, 2018.
[31] H. Zhao, P. Wang, Q. Hu, and P. Zhu, "Fuzzy Rough Set Based Feature Selection for Large-Scale Hierarchical Classification", IEEE Transactions on Fuzzy Systems, Vol. 27, No. 10, pp. 1891-1903, 2019.
[32] K. Nag and N. R. Pal, "A Multiobjective Genetic Programming-Based Ensemble for Simultaneous Feature Selection and Classification", IEEE Transactions on Cybernetics, Vol. 46, No. 2, pp. 499-510, 2016.
[33] B. Tran, B. Xue, and M. Zhang, "Variable-Length Particle Swarm Optimization for Feature Selection on High-Dimensional Classification", IEEE Transactions on Evolutionary Computation, Vol. 23, No. 3, pp. 473-487, 2019.