
Engineering Applications of Artificial Intelligence 85 (2019) 569–578


Sentiment analysis on stock social media for stock price movement prediction

Ali Derakhshan, Hamid Beigy
Sharif Intelligent Systems Laboratory, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

Keywords: Sentiment analysis; Opinion mining; Model-based opinion mining; Stock prediction

ABSTRACT

The opinions of other people are an essential piece of information for making informed decisions. With the growth of Internet use, the web has become an excellent source of users' viewpoints in different domains. However, the growing volume of opinionated text on the one hand, and the complexity caused by contrasting user opinions on the other, make it almost impossible to read all of these reviews and reach an informed decision. These requirements have encouraged a new line of research on mining user reviews, called opinion mining. Users' viewpoints can change over time, and this is an important issue for companies. One of the most challenging problems of opinion mining is model-based opinion mining, which aims to model the generation of words by modeling their probabilities. In this paper, we address the problem of model-based opinion mining by introducing a part-of-speech graphical model to extract users' opinions and test it on two datasets, one in English and one in Persian, where the Persian dataset was gathered during this work from an Iranian stock market social network. In predicting the stock market with this model, we achieved better accuracy than methods that use explicit sentiment labels for comments.

1. Introduction

The ability to predict stock prices is an essential issue for academia as well as business, but it is difficult to build a model that can do so. In current stock markets, the sentiment of stockholders, whether positive or negative, is an essential indicator of the future value of a stock. In recent years, the expansion of the Internet and social networks has made user opinions about the shares of various companies available in large volumes, and social networks built specifically for stockholders have emerged, in which people can express and discuss their ideas about the future of each stock. Sentiment information about stocks, in addition to their historical prices, could help to predict future stock prices better.

Stock prices are affected by many factors, including macroeconomics and various news. However, the focus of this research is solely on the feelings of users, expressed through their comments. To achieve the best possible prediction model for stocks, we should aggregate all information about them, including news and the companies' periodic reports, but our goal here is to achieve the best possible accuracy using only users' opinions and comments in social media.

To extract sentiments from these social networks, we must perform opinion mining on large quantities of data, which is a tough task because texts in social networks are usually short, full of idioms, grammatically unusual, and problematic in many other ways. Furthermore, the literature in this context shows conflicting results on predicting the market. Although some recent researchers reported weak to strong predictive capabilities (Nguyen et al., 2015; Bollen et al., 2011), earlier researchers concluded that mood information in social media has no predictive power (Antweiler and Frank, 2004; Tumarkin and Whitelaw, 2001). Thus, the use of comments on social networks to predict stock prices remains challenging.

The goal of this research is to develop a model using mood information in social media that can predict stock market fluctuations (up or down) for the next day. In the proposed model, we use features extracted from one or two consecutive previous days and train a model in a supervised manner to predict the next day's prices.

One of our contributions is incorporating part-of-speech information into the LDA model by separating words based on their part-of-speech tags. We call the proposed method LDA-POS, and this model achieved notable results on two datasets, one in English and the other in Persian. The English dataset we used is from Nguyen et al. (2015) and contains comments on 18 stocks over more than one year, which makes it a large dataset in a field where datasets are usually small. Another contribution of this paper is a Persian dataset gathered during this research, which contains comments on five stocks in Iran's stock market


for about six months. The experimental results show the superiority of the proposed model on both datasets.

The remainder of the paper is organized as follows. Section 2 presents a literature review of related research on predictive approaches. Section 3 describes both the English and Persian datasets in detail, including their statistics, their sources, and the way we used them in our task. Section 4 presents methods for stock movement prediction, and Section 5 describes the evaluation methods we used and compares the proposed model to other related models. Section 6 concludes the paper and provides directions for further work.

2. Related work

Predicting the stock market is an exciting field both for academia and industry. The central question in this area is whether it is possible to predict stock price movements at all. Some research is based on the random walk and Efficient Market Hypothesis (EMH) theories (Fama, 1991; Fama et al., 1969). They suggest that changes in stock prices are caused only by news and events that affect prices, and because it is not possible to predict events and news, it is impossible to predict stock price changes (Walczak, 2001). In practice, some studies showed that stock prices do not follow the random walk theory and can be predicted to some degree (Qian and Rasheed, 2007; Bollen et al., 2011; Vu et al., 2012). Some studies achieved decent accuracy results of about a 56% hit rate in predicting the direction of changes in stock prices (Schumaker and Chen, 2009a; Si et al., 2013; Tsibouris and Zeidenberg, 1995).

In addition to the theories discussed, there are two main philosophies in stock trading: fundamental analysis and technical analysis. In fundamental analysis, the economic conditions of the company and economic indicators are used to assess the overall status of that company and to predict its stock prices. Technical analysis, on the other hand, is based on analyzing a stock's price history over time, searching for repetitive patterns in prices to predict stock price fluctuations. Some researchers used only historical prices (Cervelló-Royo et al., 2015; Patel et al., 2015; Ticknor, 2013; Zuo and Kita, 2012a,b). To find patterns in the price history, many researchers applied methods including Bayesian networks (Zuo and Kita, 2012a,b) and time-series and autoregressive models (Patel et al., 2015; Zuo and Kita, 2012a).

While the mentioned techniques did not use sentiments on social media to predict stock prices, including these data could be beneficial for improving prediction performance.

Most researchers in this field used only one stock (Bollen et al., 2011; Qian and Rasheed, 2007; Si et al., 2013), and in many of these studies the number of test samples was insufficient (about 15 samples), which seems inadequate to reach a conclusion (Bollen et al., 2011; Vu et al., 2012). To our knowledge, there is no research showing noteworthy results on several stocks over a long time. In this study, we used two datasets with more than 24 stocks over relatively long intervals.

Sentiment analysis has been widely used for product and restaurant reviews (Liu and Zhang, 2012; Pang and Lee, 2008). Some researchers have tried to include textual data to improve stock market prediction. There are two primary sources of textual data that can be used in this task: the first, which has been available for a longer time, is economic news, and the second, which is a newer source, is social media, especially social media built exclusively for the stock market. These sentiments are aggregated in the model (Schumaker and Chen, 2009a,b; Sadeghi and Beigy, 2013).

The focus of sentiment analysis has shifted over time to aspect-level opinion mining, since in many scenarios such as product or restaurant reviews it is crucial to separate the aspects of different features and determine their polarity separately. Some initial studies in this domain started from the idea that most nouns could be aspects and their nearby adjectives could carry the polarity for that aspect (Hu and Liu, 2004). Other prominent aspect-based opinion mining models parse the sentences, extract their syntax trees, and find the coreferences of the nouns to address the problem successfully (Federici and Dragoni, 2016).

Another approach to opinion mining is argumentation-based opinion mining, which uses argumentation theory to model and evaluate pieces of information called arguments. An argument can support, contradict, or explain a statement, and by presenting the relations between arguments in the form of a graph, one can make decisions based on them. A paper in this field introduced SMACk, an argumentation-based opinion mining framework that can analyze online social media texts (Dragoni et al., 2016). This framework is based on abstract bipolar argumentation theory and can extract relevant documents from textual documents (Dragoni et al., 2016). A recent paper in this field combined argumentation theory and natural language processing methods to find the most debated arguments in an online shopping framework and enable users to make more informed decisions (Dragoni et al., 2018). The introduced model combines argumentation and aspect-based opinion mining (Dragoni et al., 2018).

Some research on understanding emotions from text emphasized the broad applications of polarity detection in academia and industry (Cambria, 2016). It also suggests that although sentiment mining approaches mainly use the bag-of-words model, since at first glance the primary linguistic unit is the word, determining specific semantics and sentics requires multi-word expressions. Sentics in particular capture the emotional information of real-world entities and are essential in sentiment polarity detection. This research highlights the importance of integrating semantic knowledge in addition to machine learning approaches. The author also suggests that next-generation models will include commonsense knowledge as well as brain-inspired reasoning methods (Cambria, 2016).

Based on a survey of applications of text mining in the financial domain (Kumar and Ravi, 2016), about 70% of previous research in this field was done using regular methods such as decision trees, SVMs, and regression analysis. Based on another recent survey, the usage share of regular methods is about the same as that of other methods, as complicated models generally achieve poor performance (Xing et al., 2018b).

For predicting the market, most studies did not find the features extracted from text sufficient, and they usually combined numeric economic data with their features or used ensemble methods, whether at the feature level or the decision level, to make their predictions robust (Xing et al., 2018b).

In recent years, with the emergence of deep neural networks, their usage in this field of research has dramatically increased. One prominent analysis used Deep Belief Networks (DBN) in addition to Recurrent Neural Networks (RNN) to predict the market and reduced the binary classification error rate to 40.05% from a baseline with a 47.30% error rate (Yoshihara et al., 2015).

A very recent paper in this field predicted the market well using ensemble learning of evolving clustering and LSTMs. This paper also emphasized that it is not sufficient for individuals to make investments solely based on public mood data, since public mood does not affect the market directly and other factors must also be considered (Xing et al., 2018a).

To predict stock market prices using Twitter messages, the authors of Si et al. (2013) applied a non-parametric topic model. This model was a continuous Dirichlet Process Mixture (cDPM) used to learn daily topics. Then, time series were created over the daily topics. Using a non-parametric model and the ability to estimate the number of topics automatically are the main advantages of this method. However, they used a small dataset, with messages for only three months.

3. Dataset

In this study, we used two types of datasets as model inputs: the market social network datasets, which are used for comment analysis, and the market data, which gives the daily prices per share. We used the


Persian and English language social networks to explore comments and sentiments, where users submit their comments on the Iranian and U.S. markets, respectively. The specifications of each dataset are briefly described in the rest of this section.

3.1. English dataset

This dataset was collected during a previous study in this field (Nguyen et al., 2015). It consists of English-language user comments on 18 shares posted on the Yahoo Message Board (Yahoo Finance Message Board) over a period of one year (from 23 July 2012 to 19 July 2013). The names of the stock companies and their stock abbreviation symbols are summarized in Table 1. We use the stock symbols in the result tables. There are a total of 787,547 comments in the dataset. Some shares receive more comments per day while others receive fewer. The number of comments for the different shares in this dataset is also given in Table 1.

In order to prepare the training and testing sets, we divided the comment dataset into two time periods. For the AAPL stock, 61 working days from 16 July 2012 to 01 Oct 2012 are used for the training set, and 83 working days from 12 Nov 2012 to 13 March 2013 are used for the testing set. For all other stocks, the period from 23 July 2012 to 28 March 2013, containing 171 working days, is used for the training set, and 78 working days from 01 April 2013 to 19 July 2013 are used for the testing set.

As mentioned in Nguyen et al. (2015), the comments for the current transaction date are the set of all comments posted from 4 pm of the previous transaction date to 4 pm of the current transaction date, because 4 pm is when the U.S. market closes.

Table 1 shows the statistics of the English dataset stocks. For the AAPL, DELL and KO stocks, the minimum number of messages per transaction date is zero, which means there are some transaction dates in this dataset with no messages at all. Since we want to predict stock price movements using the sentiments in users' messages, having days with no data could add random error to the results of the different methods. To keep the results as accurate as possible, and to reliably compare the various methods, we decided to remove these three stocks from the dataset and use the other 15 stocks, which have at least one message per day, in our experiments.

Some previous studies have used Twitter as a major source of comments and sentiments to analyze market opinions (Azar and Lo, 2016). However, this dataset has some advantages over Twitter that inspired us to use it. Anyone can comment on any topic on Twitter; we can separate market-specific comments with hashtags and a set of distinguishing words, but a large number of irrelevant comments remain. We also do not have emotion labels on Twitter to use as a basis for comparing the results (the human sentiment method). Although there are comments in the English social network dataset that are unrelated to the market, and even contradictory or false, they are much better suited than comments on Twitter.

The U.S. stock market dataset contains the prices of the stocks used in the prediction model as labels. This information was extracted from the Yahoo Finance website (Yahoo Finance). The U.S. stock markets are closed during weekends and major holidays but are open on weekdays, so daily stock prices are available for regular business days. They include daily opening prices, the highest and lowest prices, closing prices and adjusted closing prices. Since we want to use a reasonable number of comments to predict stock prices for the next trading day, intraday price fluctuations are not important to us; therefore, we use the adjusted closing price per working day. U.S. markets use dollars for trading with two decimal places (cents).

3.2. Persian dataset

This dataset was collected during this research. User comments were gathered from the SAHAMYAB website (Sahamyab stock twitter). There is a part of this website called tweets, where users can post their comments on different stocks. The dataset contains user comments spanning approximately six months (four months for training and two months for testing). The names of the stock companies and their stock symbols are summarized in Table 2. We use these stock abbreviation symbols in the result tables.

There are a total of 21,205 comments from stock users in this dataset, among which 11,183 comments were directly sentiment-labeled by users, indicating that a high percentage of comments carry emotion labels. The dataset's daily statistics of the number of comments for the different shares, given in Table 2, cover user comments on the shares mentioned above for 106 working days. In this dataset, 76 days from April 30, 2016 to August 28, 2016 are allocated to training data and 30 days from September 13, 2016 to November 02, 2016 are allocated to testing data.

The comments for the current transaction date in this dataset are the set of all comments posted from 12 pm of the previous transaction date to 8 am of the current transaction date. Iran's stock market is open from 8 am to 12 pm on regular working days; hence, for each transaction date we used the users' comments posted before the market opened, and the comments posted during the four open hours of the market (8 am to 12 pm) are discarded.

The comments in this dataset can have metadata such as images and titles, but we only used the comment texts. Also, we used only comments without sentiment labels for the methods that use the comment texts, and we used the labels of explicitly labeled comments in the human sentiment method as a baseline.

Iran's stock market dataset contains the prices of the stocks used in the prediction model as labels. This information was extracted from the Tehran Securities Exchange Technology Management Co. website (Tehran Securities Exchange). Iran's market is closed on weekends and other official holidays, with fewer days of activity compared to similar markets abroad. Daily opening prices, the highest and lowest prices, closing prices and final settlement prices per share can be accessed online for trading days. In this study, we use the closing price per share for each working day. The Iranian rial is the currency of Iran, with no decimal places.

3.3. The balancing of the labels

Our training datasets are not balanced, and the numbers of positive and negative samples in the training sets can differ. However, in the datasets we used, the ratios between positive and negative samples are about the same, so they are not highly unbalanced, and the F-measure results also confirm this. In both our English and Persian datasets, the relative numbers of the labels in the test sets are also significant in addition to the provided statistics. Imagine that the positive labels (corresponding to an increase in prices) make up 90% of the labels; in this case, if a classifier always predicts that prices will increase, it would be correct 90% of the time, so it is essential to investigate this property of the datasets. For the test set labels of our datasets, we counted the positive and negative labels and calculated their relative abundance; after that, we calculated the average of the most frequent label (always bigger than 0.5 in two-class classification) over all the stocks. For the Persian dataset this indicator is equal to 54.00%, and for the English dataset it is equal to 55.21%. This indicator is not a method to predict price movements, since our classifier does not have access to the test sets in the training phase, and it is possible that for some stocks the most frequent class in the test and training sets differs.
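The majority-label indicator described above can be computed directly from the test-set labels. The following is a minimal sketch, assuming each stock's daily closing prices are available as an ordered list; the function names, the data layout and the example prices are illustrative, not part of the original implementation.

```python
from typing import Dict, List

def majority_label_rate(closes: List[float]) -> float:
    """Fraction of the most frequent up/down label in a closing-price series."""
    labels = ["up" if nxt > cur else "down" for cur, nxt in zip(closes, closes[1:])]
    ups = labels.count("up")
    return max(ups, len(labels) - ups) / len(labels)

def average_majority_rate(test_closes_per_stock: Dict[str, List[float]]) -> float:
    """Average of the per-stock majority-label rates (always at least 0.5)."""
    rates = [majority_label_rate(c) for c in test_closes_per_stock.values()]
    return sum(rates) / len(rates)

# Made-up prices for illustration; the paper reports 54.00% (Persian) and 55.21% (English).
print(average_majority_rate({"VNAFT": [100, 102, 101, 103, 103.5, 102]}))
```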


Table 1
Statistics of the English dataset for each transaction date (Nguyen et al., 2015).

Stocks  Company name                    Number of messages                    Mean number of
                                        Min   Median   Mean   Max            human sentiments
AAPL    Apple Inc.                      0     1093     1678   11,220         350
AMZN    Amazon.com Inc.                 24    154      192    1,963          28
BA      The Boeing Company              46    173      203    1,053          16
BAC     Bank of America Corporation     94    282      343    1,366          49
CSCO    Cisco Systems Inc.              69    247      274    972            10
DELL    Dell Inc.                       0     18       42     587            10
EBAY    eBay Inc.                       1     17       29     267            3
ETFC    E Trade Financial Corporation   2     42       56     315            12
GOOG    Google Inc.                     10    69       93     1,305          16
IBM     IBM Inc.                        3     14       20     195            3
INTC    Intel Corporation               37    177      200    958            29
KO      The Coca-Cola Company           0     6        8      89             2
MSFT    Microsoft Corporation           27    139      172    815            53
NVDA    NVIDIA Corporation              10    65       80     410            11
ORCL    Oracle Corporation              5     67       79     372            6
T       AT&T Inc.                       10    52       59     251            8
XOM     Exxon Mobil Corporation         10    37       44     202            4
YHOO    Yahoo! Inc.                     22    121      141    860            27

Table 2
Statistics of the Persian dataset for each transaction date.

Stocks      Company name                   Number of messages                 Mean number of
                                           Min   Median   Mean   Max          human sentiments
KHZAMIA     Vehicle parts manufacturing    3     17       22     90           13
SHABENDAR   Bandar Abbas Oil Refining      4     28       42     323          21
SHAPNA      Esfahan Oil Refining           0     7        10     59           5
VNAFT       Oil, Gas industry investment   0     7        10     42           5
KHODRO      Iran Khodro Automobile         21    94       116    521          62
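As described in Section 3.2, the comments assigned to a Persian-market transaction date are those posted between 12 pm of the previous transaction date and 8 am of the current one, so that only pre-opening messages are used. A minimal sketch of that filtering is shown below; the comment representation (a timestamp plus text) and the function name are assumptions made for illustration.

```python
from datetime import datetime, time
from typing import List, Tuple

Comment = Tuple[datetime, str]  # (posting time, text)

def comments_for_transaction_date(comments: List[Comment],
                                  prev_date: datetime,
                                  curr_date: datetime) -> List[str]:
    """Keep comments posted from 12 pm of the previous transaction date
    up to 8 am of the current one; comments posted while the market is
    open (8 am to 12 pm) fall outside this window and are discarded."""
    start = datetime.combine(prev_date.date(), time(12, 0))
    end = datetime.combine(curr_date.date(), time(8, 0))
    return [text for ts, text in comments if start <= ts < end]
```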

4. Methods for stock movement prediction

To predict stock prices, a set of features is first extracted on a daily basis (using different methods); then a Support Vector Machine (SVM) is used to predict the price movement by classifying the features into two categories, up and down, indicating an increase or decrease in the stock price, respectively. As mentioned in Section 3, the English dataset used in this article is the same as the dataset used in Nguyen et al. (2015), and we want to compare our proposed method with the methods presented in Nguyen et al. (2015). So we kept the parameters and the prediction model the same as in that article to have a fair comparison. Therefore we chose the linear kernel for the SVM as the default kernel (as in the base article).

4.1. Price only method

In this method, we use only the stock price history as our features to predict the price movement. This method is used as one of our baseline methods, and we want to see how well it is possible to predict market behavior using only historical prices.

For each transaction date, the price movement (up or down) of one previous date is denoted by price_{t-1} and of two previous dates by price_{t-2}. In this method, price_{t-1} and price_{t-2} are the only features fed into the given classification model. The features used in each method are shown in Table 3, and the suffixes En and Per denote the features of the English and Persian datasets, respectively.

Following Nguyen et al. (2015), in our English dataset we add the price only features to the feature sets of the other methods, but in our Persian dataset we do not include the features generated by the price only method in the feature sets of the other methods; we use only the features generated by each method for that method. By this separation, we can determine and compare the predictive power of the features of each method exclusively. When we want to create a practical stock prediction model, we can see each method as an expert, and by aggregating each expert's opinion about the price movement using ensemble learning methods, we can build a model which incorporates all the available methods and outperforms all of them.

4.2. Human sentiment method

As we saw in Section 3, both the English and Persian datasets consist of comments in which users expressed their opinions. In some of the comments, users explicitly stated their optimistic or pessimistic view of a stock by explicitly labeling their comment (a buy label means that the user believes the price of that stock will rise and users should buy it, and a sell label means the opposite opinion). Since we want to predict stock price movements using users' opinions, this is the most prominent feature showing the opinion of users about a specific stock.

In the English dataset the labels are strong sell, sell, hold, buy and strong buy, while in the Persian dataset there are only two labels, buy and sell.

For each transaction date, we count the number of messages with buy and sell labels in the Persian dataset. Since we have only two labels here, from the percentage of one label we can derive the percentage of the other, so we use the percentage of buy for this dataset. For each transaction date, we calculate the percentage of the buy label among the explicitly labeled messages for the current and previous transaction dates and denote them by HsentPer_{i,t} and HsentPer_{i,t-1}, respectively. Then we use these features in our prediction model. The features used for the English dataset are built similarly to the Persian ones: the percentage of each label for each transaction date is calculated, and these five percentages represent the human sentiments of that day.

The features used in the prediction model for this dataset are HsentEn_{i,t} and HsentEn_{i,t-1} in addition to priceEn_{t-1} and priceEn_{t-2}. Table 3 shows the features used in the human sentiment method.
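A minimal sketch of the prediction step shared by all methods follows: one feature vector per transaction date (here the price-only and human-sentiment features of Sections 4.1 and 4.2) is fed to an SVM with a linear kernel that predicts the next day's up or down movement. scikit-learn is used purely for illustration; the paper does not state which SVM implementation was used, and the toy feature values are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative daily rows: [price_{t-1}, price_{t-2}, HsentPer_{i,t}, HsentPer_{i,t-1}],
# with price movements encoded as +1 (up) / -1 (down) and sentiments as buy-label percentages.
X_train = np.array([[+1, -1, 0.62, 0.55],
                    [-1, +1, 0.40, 0.62],
                    [+1, +1, 0.71, 0.40]])
y_train = np.array([1, 0, 1])          # 1 = price goes up the next day, 0 = down

clf = SVC(kernel="linear")             # linear kernel, as stated in the paper
clf.fit(X_train, y_train)

X_test = np.array([[+1, -1, 0.66, 0.71]])
print(clf.predict(X_test))             # predicted movement for the next transaction date
```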


Table 3
Features of the prediction model.

Method                   Features in English dataset                                                    Features in Persian dataset
Price only method        priceEn_{t-1}, priceEn_{t-2}                                                   pricePer_{t-1}, pricePer_{t-2}
Human sentiment          HsentEn_{i,t}, HsentEn_{i,t-1}, priceEn_{t-1}, priceEn_{t-2}                   HsentPer_{i,t}, HsentPer_{i,t-1}
LDA-based method         priceEn_{t-1}, priceEn_{t-2}, ldaEn_{i,t}, ldaEn_{i,t-1}                       ldaPer_{i,t}
Aspect-based sentiment   priceEn_{t-1}, priceEn_{t-2}, Asent_{i,t}, Asent_{i,t-1}, I_{i,t}, I_{i,t-1}   Unavailable
LDA-POS method           PosAdjEn_{i,t}, PosNounEn_{i,t}, PosPrepEn_{i,t}, PosVerbEn_{i,t}              PosAdjPer_{i,t}, PosNounPer_{i,t}, PosPrepPer_{i,t}, PosVerbPer_{i,t}
Neural network method    ldaEn_{i,t}                                                                    Unavailable
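The feature sets in Table 3 differ between the two datasets mainly in whether the price-only features are appended. A short sketch of that assembly step, with hypothetical feature arrays standing in for the real ones:

```python
import numpy as np

def build_features(method_features: np.ndarray,
                   price_features: np.ndarray,
                   dataset: str) -> np.ndarray:
    """English feature vectors are concatenated with the price-only features
    (following Nguyen et al., 2015); Persian feature vectors are used alone."""
    if dataset == "english":
        return np.concatenate([price_features, method_features])
    return method_features

lda_features = np.full(50, 0.02)               # e.g. 50 topic probabilities for one day
price = np.array([+1.0, -1.0])                 # price_{t-1}, price_{t-2}
print(build_features(lda_features, price, "english").shape)   # (52,)
print(build_features(lda_features, price, "persian").shape)   # (50,)
```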

4.3. LDA-based method

This model is based on the generative probabilistic model of Latent Dirichlet Allocation, which considers every document (messages, in our model) as a mixture of latent topics, where each topic is a probability distribution over words. We used this model as a baseline, and for comparison with Nguyen et al. (2015) we used the same parameter values. Fig. 1 gives the graphical model representation of LDA, and Table 4 shows the notation used in the LDA model.

Fig. 1. Graphical model representation of LDA (Blei et al., 2003).

Table 4
Notations in LDA (Blei et al., 2003; Nguyen et al., 2015).
Notation   Definition
α, β       Hyperparameters
φ          The distribution over words
T          The number of topics
θ          The message-specific topic distribution
z          A topic
w          A word in the message d
N_d        The number of words in the message d
D          The number of messages

For the English dataset, stop words are removed, and the remaining words are lemmatized using Stanford CoreNLP (Manning et al., 2014). LDA is trained on the training dataset, and topics are then inferred for the messages in the test dataset. We used Gibbs sampling with 1000 iterations to infer the topics and chose 50 topics.¹ For each transaction date t in the test dataset, we fetch all messages, calculate the probability of each topic for every message, and use their average as the input to the proposed prediction model.

For the Persian dataset, stop words are removed, and the remaining words are lemmatized using the JHazm library (a library developed specifically for the Persian language). Because messages are short in this dataset, all messages of each transaction date are concatenated together, and we call the result the document of that day. LDA is trained on the documents of the training dataset, and topics are then inferred for the documents of the test dataset. We used Gibbs sampling with 1000 iterations to infer the topics and chose 50 topics.² For each transaction date t in the test dataset, we calculate the probability of each topic for the day's document and use it as input (a vector of 50 probability values, as we have 50 topics) to our prediction model.

The features used in the prediction model for the English dataset are priceEn_{t-1}, priceEn_{t-2}, ldaEn_{i,t} and ldaEn_{i,t-1}, where the subscripts t and t-1 indicate transaction dates t and t-1, respectively. For the Persian dataset, the feature used in the prediction model is ldaPer_{i,t}. Table 3 shows the features used in the LDA-based method.

4.4. Aspect-based sentiment

Aspect-based sentiment is the method proposed by Nguyen et al. (2015). We used the English dataset provided by that paper in our work, and we want to compare the results of our proposed method with it as well.

In this method, frequent consecutive nouns (occurring more than 10 times) are detected in all messages of the dataset, and then the sentiment polarity of these common consecutive nouns, which are called topics, is determined. To find the sentiment score of each topic, the opinion words of the sentences containing that topic are detected using SentiWordNet (an opinion word list with polarities; Baccianella et al., 2010), and the topics are scored based on the distance and polarity of each opinion word in a sentence containing that topic. The final score of each topic is the average of its scores over all the sentences containing it. To highlight the importance of more frequent topics, the percentage of messages containing each topic is also included as a feature of this method.

The features used in the prediction model for this method are Asent_{i,t}, Asent_{i,t-1}, I_{i,t}, I_{i,t-1}, priceEn_{t-1} and priceEn_{t-2}. Asent denotes the polarity of the topics and I indicates their corresponding importance in the prediction model. The subscripts t and t-1 indicate transaction dates t and t-1, respectively. Table 3 shows the features used in aspect-based sentiment; for a detailed explanation of this method, refer to Nguyen et al. (2015).

To implement this method on the Persian dataset, we would need an opinion word lexicon for the Persian language. Although there are some lexicons of Persian opinion words and their polarity, their total size is about 1500 words (Dashtipour et al., 2016; Dehdarbehbahani et al., 2014), which is not comparable to SentiWordNet with more than 117 thousand entries. Because there are many synonyms to describe an opinion in Persian, and detecting similarity and expanding these small Persian lexicons is outside the focus of this research, we decided not to implement this method in Persian. The lack of sufficient opinion words in the lexicon makes it impossible to score the topics in the sentences.

4.5. LDA-POS method

In this section, we describe the proposed LDA-POS model. In the LDA-based method, stop words are first removed from the dataset, and then the distribution of the different topics is assessed. In this method, instead of removing stop words from the dataset, the part of speech of every word in the sentences is determined; then we create four categories of POS tags, and the words with the same POS category tags are grouped into a document for that POS tag.

1 We used the Mallet library to implement LDA; in this library, English stop words are already provided.
2 We used the Mallet library to implement LDA, and because Mallet does not provide Persian stop words, we added the stop words from the https://www.ranks.nl/stopwords/persian website, which is the same source from which the English stop words in Mallet are acquired.
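A compact sketch of the LDA-based feature construction of Section 4.3 is given below, using gensim as a stand-in for the Mallet implementation used in the paper: an LDA model with 50 topics is fitted on the preprocessed training messages, and for each test day the per-message topic distributions are averaged into one 50-dimensional feature vector. The whitespace tokenization and the function names are simplifications for illustration only.

```python
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

def train_lda(train_messages, num_topics=50):
    texts = [m.lower().split() for m in train_messages]      # simplified preprocessing
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
    return lda, dictionary

def daily_topic_features(lda, dictionary, day_messages, num_topics=50):
    """Average the topic distribution of every message posted on one transaction date."""
    feats = np.zeros(num_topics)
    for msg in day_messages:
        bow = dictionary.doc2bow(msg.lower().split())
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            feats[topic_id] += prob
    return feats / max(len(day_messages), 1)
```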


Afterward, we assume that each document with a specific POS tag has a distribution over different topics within that POS tag, where each topic is a distribution over words. We used the training part of our datasets and POS taggers (explained in the following paragraphs) to create the documents of each POS category. Then we used these tagged documents to infer the topic distributions. Fig. 2 shows the graphical model representation of this method, and the notation of this model is described in Table 5.

Fig. 2. Graphical model representation of LDA-POS.

Table 5
Notations in LDA-POS.
Symbol   Definition
α, β     Hyperparameters
φ        The distribution over words
T        The number of topics for all POS tags
θ        The document-specific topic distribution
z        A topic
w        A word in the document d
N_d      The number of words in the document d
pos      The part-of-speech tag of the document
D        The number of documents

We implemented this method on both the English and Persian datasets. For each transaction date in the training days of the datasets, we determined the part-of-speech role of the words in the sentences of all messages (in Persian we used JHazm³ (Nourian et al., 2015), and in English we used Stanford CoreNLP (Manning et al., 2014) for part-of-speech tagging and lemmatization). After that, we lemmatized the words and grouped the words with the same part-of-speech category into a document corresponding to that part of speech (Table 6 shows the part-of-speech categories and how the different part-of-speech tags of the English and Persian taggers are mapped to them). For each transaction date in the training part of the datasets, we generate one document per part-of-speech category, and in each part-of-speech category we infer the topics using Gibbs sampling with 1000 iterations. We choose 50 topics for each part-of-speech category. Then the probability of each topic in the different part-of-speech categories is calculated for each transaction date. The feature generated by this method for each transaction date is a set of 50 topic probabilities for each of the verb, adjective, preposition and noun part-of-speech categories, summing up to a vector of 200 numbers.

While we separated the POS categories and created a 200-dimensional feature vector for each transaction date containing the topics of all categories, some of the topics might have tiny probabilities for some stocks and therefore harm the performance of the SVM classifier. To prevent this, we perform a feature selection step on the feature vector of each stock: we calculate the average of a feature over the training set, and if its square is bigger than 0.03 we keep that feature in the feature vector; otherwise we simply omit it. The parameter 0.03 was chosen experimentally: if it were bigger, it might clip some valuable features, and if it were smaller, all the features would be selected; we found this value appropriate. By using this preprocessing step, our results on the English dataset improved by over 1%, and the results on the Persian dataset remained the same.

The features used in the prediction model for the English dataset are priceEn_{t-1}, priceEn_{t-2}, PosAdjEn_{i,t}, PosNounEn_{i,t}, PosPrepEn_{i,t} and PosVerbEn_{i,t}, where Adj, Noun, Prep and Verb indicate the topic probabilities of the adjective, noun, preposition and verb part-of-speech categories, respectively. The features used in the prediction model for the Persian dataset are PosAdjPer_{i,t}, PosNounPer_{i,t}, PosPrepPer_{i,t} and PosVerbPer_{i,t}. Table 3 shows the features used in the LDA-POS method.

4.6. Neural network method

Instead of using an SVM with a linear kernel, we can use different classification methods. Neural network models have gained much popularity because of the performance they show on various tasks; however, deep models have too many parameters to learn and usually need large training sets to train well, so we used a two-layered neural network with one hidden layer. The activation function of the neurons in the hidden layer is "tanh" and the output neuron activation function is "sigmoid". The inputs of this model (the input layer) are the features generated by the LDA method. The number of topics in the LDA is 50, so the input layer size of this shallow neural network is equal to 50. For the output layer, we chose the sigmoid activation function so that we can interpret the output as the probability of a sample belonging to either of the two classes 0 or 1, which correspond to an increase or decrease of the next day's price. If the output is bigger than 0.5, the sample belongs to class 1; otherwise it belongs to class 0.

The number of hidden units is 40 in our implementation. We selected this number experimentally by testing different values and trying to maximize the results; this amount also seems a good heuristic. Error backpropagation is used to update the weights of this model. The number of iterations is 10,000, and the learning rate is 0.1. To avoid overfitting, it is conventional to add a regularization term (weight decay) to favor small weights, and it has proven to be helpful in practice. The regularization amount here is 0.995. Table 7 shows the accuracy results for this model on the English dataset.

5. Experimental results

In this section, we use computer experiments to evaluate the proposed model, and then we analyze the results.

5.1. Evaluation measure

Both the Persian and English datasets are divided into two parts: one for training the prediction model and the other for testing. In the English dataset, the period from July 23, 2012 to March 28, 2013 is used for training and contains 171 transaction dates, and the period from April 1, 2013 to July 19, 2013 is used for testing and contains 78 transaction dates, as mentioned by Nguyen et al. (2015). In the Persian dataset, the period from April 30, 2016 to August 28, 2016 is used for training and contains 76 transaction dates, and the period from September 2, 2016 to November 2, 2016 is used for testing and contains 30 transaction dates. Since the Iranian stock market is closed on weekends, and since on normal working days a stock can be frozen due to an administrative decision, the transaction days in our dataset might not be exactly the same for all stocks, but in the mentioned period the number of transaction dates is the same.⁴ We give each transaction date an up or down label based on the price increase or decrease compared to the previous transaction date. Accuracy is selected as the evaluation measure for this paper because it shows the proportion of true results in the test set, is suitable for our task, and

3 https://github.com/mojtaba-khallash/JHazm.
4 Section 3.2 describes the Persian dataset in detail, and all transaction dates for each stock are available in the dataset statistics files.
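The following is a plain-NumPy sketch of the shallow network of Section 4.6: 50 LDA features, one hidden layer of 40 tanh units, a sigmoid output, learning rate 0.1 and 10,000 iterations. The multiplicative weight decay of 0.995 per update is one possible reading of the "regularization amount" mentioned above, and the cross-entropy loss and random initialization are assumptions, since the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_two_layer_net(X, y, hidden=40, lr=0.1, iters=10_000, decay=0.995):
    """X: (n_samples, 50) LDA features; y: (n_samples,) labels in {0, 1}."""
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    for _ in range(iters):
        h = np.tanh(X @ W1 + b1)                        # hidden layer, tanh
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))        # sigmoid output probability
        delta_out = (p - y) / n                         # grad of cross-entropy w.r.t. pre-sigmoid
        gW2 = h.T @ delta_out
        gb2 = delta_out.sum(axis=0)
        delta_h = (delta_out @ W2.T) * (1.0 - h ** 2)   # backprop through tanh
        gW1 = X.T @ delta_h
        gb1 = delta_h.sum(axis=0)
        # gradient step followed by multiplicative weight decay (assumed reading of 0.995)
        W2 = (W2 - lr * gW2) * decay; b2 -= lr * gb2
        W1 = (W1 - lr * gW1) * decay; b1 -= lr * gb1
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    p = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
    return (p.ravel() > 0.5).astype(int)                # threshold at 0.5 to pick the class
```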

Table 6
List of word tags in each category.
Category name POS tag POS tagger Language Explanation
Verb VB StanfordTagger English Verb, base form
Verb VBD StanfordTagger English Verb, past tense
Verb VBG StanfordTagger English Verb, gerund or present participle
Verb VBN StanfordTagger English Verb, past participle
Verb VBP StanfordTagger English Verb, non-3rd person singular present
Verb VBZ StanfordTagger English Verb, 3rd person singular present
Verb V JHazm Persian Verb
Adjective JJ StanfordTagger English Adjective
Adjective JJR StanfordTagger English Adjective, comparative
Adjective JJS StanfordTagger English Adjective, superlative
Adjective RB StanfordTagger English Adverb
Adjective RBR StanfordTagger English Adverb, comparative
Adjective RBS StanfordTagger English Adverb, superlative
Adjective ADJ JHazm Persian Adjective
Adjective ADV JHazm Persian Adverb
Preposition CC StanfordTagger English Coordinating conjunction
Preposition DT StanfordTagger English Determiner
Preposition IN StanfordTagger English Preposition or subordinating conjunction
Preposition MD StanfordTagger English Modal
Preposition PDT StanfordTagger English Predeterminer
Preposition UH StanfordTagger English Interjection
Preposition PP JHazm Persian Prepositional phrase
Preposition PREP JHazm Persian Preposition
Preposition CONJ JHazm Persian Conjunction
Preposition PUNC JHazm Persian Punctuation
Noun NN StanfordTagger English Noun, singular or mass
Noun NNS StanfordTagger English Noun, plural
Noun NNP StanfordTagger English Proper noun, singular
Noun NNPS StanfordTagger English Proper noun, plural
Noun N JHazm Persian Noun
Noun NP JHazm Persian Noun phrase
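The mapping in Table 6 can be expressed as a small lookup table that folds the tagger-specific POS tags into the four categories used by LDA-POS. A sketch follows, with only the English (Stanford tagger) tags shown and a hypothetical tagged-token input format; the Persian JHazm tags (V, ADJ, ADV, PP, PREP, CONJ, PUNC, N, NP) would be added to the same lookup in the bilingual setting.

```python
# Stanford tagger tags folded into the four LDA-POS categories of Table 6
POS_CATEGORY = {
    **{t: "verb" for t in ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ")},
    **{t: "adjective" for t in ("JJ", "JJR", "JJS", "RB", "RBR", "RBS")},
    **{t: "preposition" for t in ("CC", "DT", "IN", "MD", "PDT", "UH")},
    **{t: "noun" for t in ("NN", "NNS", "NNP", "NNPS")},
}

def group_by_pos_category(tagged_tokens):
    """tagged_tokens: iterable of (lemma, pos_tag) pairs for one transaction date.
    Returns one pseudo-document (list of lemmas) per POS category."""
    docs = {"verb": [], "adjective": [], "preposition": [], "noun": []}
    for lemma, tag in tagged_tokens:
        category = POS_CATEGORY.get(tag)
        if category is not None:          # tags outside Table 6 are ignored
            docs[category].append(lemma)
    return docs

print(group_by_pos_category([("rise", "VB"), ("strong", "JJ"), ("price", "NN")]))
```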

also enables comparison with the results given in Nguyen et al. (2015). Eq. (1) shows the accuracy measure:

Accuracy = (tp + tn) / (tp + fp + fn + tn)    (1)

where tp is the number of correctly categorized positive samples, tn is the number of correctly rejected negative samples, fp is the number of incorrectly categorized positive samples, and fn is the number of incorrectly rejected negative samples.

5.2. The results

Tables 7 and 8 show the accuracy results of the different stocks for the English and Persian datasets. These tables also provide the average and variance of each method over all stocks. Using the LDA-POS method, we reached an overall average accuracy of 56.24% on 15 stocks in the English dataset and an overall average accuracy of 55.33% on five stocks in the Persian dataset. As mentioned in Section 2, some papers report an accuracy of 56% as a pleasant result, but it should be considered that in most of them the dataset is limited to a short period and usually consists of only one stock. In this paper, we implement our method on two datasets in different languages with a considerable number of stocks and a relatively protracted period (see the dataset details in Section 3). For some stocks, the accuracy of our method is outstanding, for example VNAFT with 66.66% and YAHOO with 62.82%.

The fundamental question in utilizing mood information in social networks to predict stock price movements is whether this information has any predictive power. To answer this question, we compare the results of the human sentiment and the proposed LDA-POS methods to the price only method. In the English dataset, the human sentiment method outperforms the price only method by 0.26% and our LDA-POS method outperforms the price only method by 2.38%; in the Persian dataset, these improvement rates are 0.67% for human sentiment and 2.33% for our LDA-POS method. Based on these accuracy improvements, we can infer that using mood information in social media can be helpful in predicting the stock market.

To be able to compare our results with the results reported in Nguyen et al. (2015), we add the price only features to the feature vectors of the other methods in the prediction model. In Nguyen et al. (2015), the price only features were also concatenated to the feature vectors of the other methods in the English dataset (Table 3 shows the features used in the prediction model for each method and dataset). We believe this approach is inefficient if not fallacious, so we do not include these features in the feature vectors of the different methods in the Persian dataset. If we added the price only features to the LDA-POS feature vector in the Persian dataset, the average accuracy would be higher. Omitting other features from the feature vectors of the different methods helps us determine the predictive power of the various methods precisely, and therefore their comparison is more meaningful. The features of the price only method are just ones and negative ones; adding them to the feature vectors of other methods, whose elements are usually probability distributions with small values, makes the feature vector heterogeneous and could reduce the performance of the prediction model. In order to aggregate the features of different methods and improve the overall accuracy, we can treat the outcome of each method as the belief of an expert and use ensemble learning techniques to aggregate these beliefs and achieve an accuracy higher than all of them. In this paper, the focus is on proposing a method with the best possible accuracy, but in practical implementations integrating the results of different methods and approaches is necessary to achieve the best result.

The LDA-POS method outperforms the human sentiment method by 2.12% in the English dataset and by 0.66% in the Persian dataset. Therefore, we can claim that our method captures sentiment information from the messages of each transaction day in a way that has more predictive power than explicitly labeled human sentiments. In the implementation of the LDA-POS method, we used only messages without a human sentiment label, so we can apply the LDA-POS method to messages in social networks that do not have sentiment labels,
these improvement rates are 0.67% for human sentiment and 2.33% for without human sentiment label, so we can apply LDA-POS method on
our LDA-POS method. Based on these accuracy improvements we can messages in social networks that they do not have sentiment labels,


Table 7
Results of accuracies of 15 stocks. The first four columns are baseline models and the last three correspond to the proposed model.
Stocks Price only LDA-based method Human sentiment Aspect-based sentiment LDA-POS method Neural model on LDA LDA-POS F-measure
AMZN 0.4605 0.5132 0.4868 0.7105 0.5769 0.3846 0.7179
BA 0.6316 0.5526 0.6053 0.5921 0.6154 0.5512 0.7619
BAC 0.5658 0.5526 0.5921 0.4474 0.5897 0.6153 0.7288
CSCO 0.5526 0.4737 0.4474 0.4605 0.5513 0.5384 0.7058
EBAY 0.5921 0.5658 0.4605 0.5789 0.5128 0.5384 0.6724
ETFC 0.5789 0.4868 0.5921 0.5526 0.5513 0.5512 0.7058
GOOG 0.5000 0.5658 0.5658 0.5263 0.5385 0.5512 0.6896
IBM 0.4868 0.5395 0.4737 0.5526 0.5513 0.4743 0.6956
INTC 0.4474 0.5000 0.4605 0.5263 0.5769 0.4102 0.7317
MSFT 0.5789 0.5526 0.6579 0.5263 0.5385 0.4743 0.6896
NVDA 0.6053 0.3947 0.5789 0.5395 0.5385 0.4871 0.6896
ORCL 0.4868 0.5921 0.5263 0.5395 0.5128 0.5000 0.6724
T 0.5526 0.5000 0.4737 0.5132 0.5513 0.5512 0.7008
XOM 0.4868 0.4342 0.6447 0.5395 0.6026 0.5769 0.7155
YAHOO 0.5526 0.5263 0.5526 0.5526 0.6282 0.5384 0.7289
AVERAGE 0.5386 0.5166 0.5412 0.5439 0.5624 0.5162 0.7071
VARIANCE 0.0032 0.0028 0.0050 0.0036 0.0012 0.0038 0.0006

Table 8
Results of accuracies of 5 Iranian market stocks. The first three columns are baseline models and the last two correspond to the proposed model.
Stocks Price only LDA-based method Human sentiment LDA-POS linear LDA-POS F-measure
Khzamia 0.5333 0.4666 0.6000 0.5333 0.4166
Shabendar 0.5000 0.5333 0.5000 0.5000 0.5714
Shapna 0.5333 0.4666 0.5000 0.6000 0.6000
Vnaft 0.6333 0.4666 0.6300 0.6666 0.5833
Khodro 0.5000 0.4666 0.5000 0.4666 0.6363
AVERAGE 0.5400 0.4800 0.5467 0.5533 0.5616
VARIANCE 0.0030 0.0008 0.0042 0.0064 0.0072

and we can expect a better or comparable result than human-annotated sentiments.

The LDA-POS method outperforms the aspect-based sentiment, human sentiment, and LDA-based methods in the English dataset by 1.85%, 2.12% and 4.58%, respectively. The proposed method also outperforms the human sentiment and LDA-based methods in the Persian dataset by 0.66% and 7.33%, respectively.

The aspect-based sentiment method needs a sentiment polarity lexicon in its implementation. The polarity and sentiment of words for various stocks and domains can be completely different, and the use of a static sentiment polarity lexicon could harm the effectiveness of this method for several stocks. Also, generating a comprehensive sentiment polarity lexicon like the one used in this method for the English dataset is a grim task. Therefore, implementation of this method in languages other than English depends on the availability of such a lexicon. For example, in the Persian language there are only extremely limited polarity lexicons, with a maximum total size of 1500 words, which makes implementation of aspect-based sentiment inefficient if not impossible. The advantage of the LDA-based and LDA-POS methods is that they do not need any further or language-dependent data to find the different topic distributions, and they have fewer limitations when implemented in various languages. Another issue of aspect-based sentiment is its high variance. In the English dataset, the variance of this method is three times higher than that of our LDA-POS method. Aspect-based sentiment works well on a small number of stocks such as AMZN and BA, but for some stocks its accuracy is even less than 50%. The reason for this could be the domain sensitivity of the polarity lexicon, which limits the use of aspect-based sentiment to only a few stocks. On the other hand, in the LDA-POS method the accuracies are consistent, and for an unknown stock in an unknown language, the LDA-POS method produces more trustworthy results.

The accuracy results for the LDA-based method were 52.63% in the English dataset and 48% in the Persian dataset. If we augment the feature vector of the LDA-based method in Persian with the price only features, its accuracy becomes 52.66%. In the LDA-based method, the first step is removing the stop words from all documents, and then further procedures are applied to what remains in order to infer the hidden topics. The LDA-based method treats these remaining words as a bag of words and does not consider their part-of-speech roles in the sentences, while the meaning of each word can be assessed only in its context, with a specific part of speech. If we want to extract features that capture semantic changes in users' opinions, it is better to seek this change within each part of speech. For example, to assess users' opinions about a product or company, we should observe how the adjectives that users use change, and how the other part-of-speech categories change, in order to capture the general sentiment change accurately.

Fig. 3. Features used in the classification model. In this figure, "a" stands for adjective, "n" stands for noun, "v" stands for verb, "p" stands for preposition, and their concatenation stands for their union.


The LDA-POS method focus is on the part-of-speech role of words in they open it with a new price that could be dramatically different from
the sentences, therefore at first in this method, the words with similar the old price. To avoid these halt periods, we choose the time-line of our
part-of-speech are grouped, and then we infer the topic distributions stocks carefully, and all the shares in our dataset are open to exchange
in each category. In the LDA-POS method, we divided the words in during the dataset period. If we want to extend our dataset and add
the datasets into four categories, which the detail of categories and more working days to it, some of our stocks would be halted in the
implementation of this method is described in Section 4.5. In the extended period. It is possible to extend one of our stocks to more
Persian language, the way the words are used and their corresponding elongated period, but we regard having the same time-line and number
part-of-speech plays a crucial role in determining the meaning of a of samples for all the stocks a positive point for our dataset, and we do
sentence, in such a level that a subtle change could alter the meaning not want to make it fragmented.
of the whole sentence, while the English language is not as sensitive as
Persian. To assess the importance of the features (topic distributions) 6. Conclusion and future work
for each part-of-speech in LDA-POS based method, we add the inferred
features for each part-of-speech category one by one to the feature Recently predicting stock markets using machine learning methods
vector in both English and Persian dataset. Fig. 3 shows the changes in has gained much attention. There are different approaches to satisfy
the accuracies while expanding the feature vector. As it can be observed this goal, including using sentiments in social media by assessing the
in Fig. 3, the accuracy changes in the English dataset are negligible, human sentiments in user reviews. In this paper, we proposed a new
while in the Persian language, in each step we can see a significant im- method which incorporates part-of-speech tags into topic modeling
provement, even when we added features for the preposition category methods, and we call our method ‘‘LDA-POS’’ method. The average
(𝑇𝑝𝑟𝑒𝑝 ), a considerable increase in accuracy is observed, while in LDA- accuracy results for this method on quite large datasets in both English
based method removing stop words almost removes all the prepositions and Persian languages reaches promising results of 56.24% and 55.33%
in this language. In the English dataset, we could see no accuracy respectively, and we outperform the related work which we used
improvement by adding features from preposition category, and this its English dataset. Usually, the sentiment extractions methods which
shows that some part-of-speech roles do not play an essential role in perform flawlessly in the English language, do not perform well on
the sentiment of the sentences in English and can be safely removed. the Persian language, while LDA-POS method results were similar in
These results illustrate the massive importance of part-of-speech roles both languages. We show that some words with specific part-of-speech
in the Persian language. do not have significant in English and can usually be removed in the
The LDA-POS method outperforms the presented Neural Network preprocessing step, while these words could have great significance
model by 4.62%. The reason we used 2-layered neural network but in Persian. Also in this paper, we generated a dataset for the Persian
not a deeper model is that neural networks have many parameters language including five stocks, their user reviews and price movements
corresponding to the weights of their neurons which should be learned which is a valuable resource and as our knowledge it is a first Persian
in the learning stage and that demands comparably more number of stock dataset containing quite a protracted time.
samples than simpler models to confidently learn their value. To predict There are some ideas to improve the proposed model and suggest
the changes in the price well, the trend of the prices, social sentiments, them for future works. In our method, we select 50 topics, but in fact,
and economic indicators are crucially important even more than their the real number of topics for each stock could be different. We can solve
absolute value. To capture the patterns one could use recurrent neural this problem by using non-parametric methods or use some methods
models to obtain them, and the sentiments derived from the LDA-POS to guess the number of topics before applying the method. Also, we
method could be used in that model which are reasonably beneficial label price movements up and down. It is better to have more granular
based on their performance. labels. Although the size of our Persian dataset seems to be sufficient,
One of the weaknesses of the LDA-POS method is that it searches for topics within POS categories, so if the number of comments in a day, or the amount of textual information in general, is limited, the method cannot capture the actual distribution of the topics. For example, if there are few comments in a day, the number of words in the adjective category will be low, and the method overfits to the few samples it observes on that day. In the datasets we used there are, on average, plenty of comments each day, so this is not a problem in our case, but the limited number of days (samples) leads us to use simpler classification models.
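A simple safeguard, sketched below under the assumption that per-day tokens are already grouped by POS category, is to flag days whose adjective counts are too small for a reliable topic estimate; the threshold is arbitrary.

```python
# Sketch: flag days whose adjective category is too sparse for reliable topic estimates.
# `daily_pos_tokens` maps a date to {pos_category: [tokens]}; the threshold is arbitrary.
MIN_ADJECTIVES = 30

def sparse_days(daily_pos_tokens, min_count=MIN_ADJECTIVES):
    return [day for day, cats in daily_pos_tokens.items()
            if len(cats.get("adjective", [])) < min_count]
```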
Several factors limit the performance obtained on our Persian dataset. The first restriction stems from the characteristics of the Persian language. In formal written Persian the vowels are omitted, so one written word can have different pronunciations and different meanings. Moreover, Persian users usually write words according to their pronunciation and do not follow the formal orthography. Consequently, there is not even a universal slang dictionary for Persian, because a single word can have many written forms, which makes such a dictionary intractable.
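A first normalization step commonly applied to Persian social-media text is to unify Arabic and Persian character variants before any dictionary lookup; the sketch below shows the idea with an example mapping, and it is not the normalization used in the paper.

```python
# Sketch: map common Arabic/Persian character variants to one canonical form so that
# different written forms of a word collapse together. The mapping is only an example.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
    "\u0640": None,      # tatweel (kashida) is dropped
})

def normalize_fa(text: str) -> str:
    return text.translate(CHAR_MAP)
```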
The second factor hindering the performance is the scarcity of comments for many stocks. Not all shares receive enough comments on a daily basis to let us predict their daily price fluctuations, so we are limited to the stocks that regularly receive a reasonable number of comments.
price fluctuation, so we have a limited number of stocks to only which cal resource for sentiment analysis and opinion mining. In: Lrec, Vol. 10. pp.
receive a reasonable amount of comments in regularly. 2200–2204.
Further limitations come from the regulators of the market. In Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent dirichlet allocation. J. Mach. Learn.
contrast to the global market, which trading the stocks of companies Res. 3, 993–1022, URL http://dl.acm.org/citation.cfm?id=944919.944937.
Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. J. Comput.
are most of the time possible; in Iran stock market, it is prevalent that Sci. 2 (1), 1–8.
the market administrator halts the exchange of a particular stock. It Cambria, E., 2016. Affective computing and sentiment analysis. IEEE Intell. Syst. 31
Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions which improved the paper.

References

Antweiler, W., Frank, M.Z., 2004. Is all that talk just noise? the information content of internet stock message boards. J. Finance 59 (3), 1259–1294.
Azar, P., Lo, A.W., 2016. The wisdom of twitter crowds: Predicting stock market reactions to fomc meetings via twitter feeds. J. Portf. Manag.
Baccianella, S., Esuli, A., Sebastiani, F., 2010. Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Lrec, Vol. 10. pp. 2200–2204.
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022, URL http://dl.acm.org/citation.cfm?id=944919.944937.
Bollen, J., Mao, H., Zeng, X., 2011. Twitter mood predicts the stock market. J. Comput. Sci. 2 (1), 1–8.
Cambria, E., 2016. Affective computing and sentiment analysis. IEEE Intell. Syst. 31 (2), 102–107.
Cervelló-Royo, R., Guijarro, F., Michniuk, K., 2015. Stock market trading rule based on pattern recognition and technical analysis: Forecasting the djia index with intraday data. Expert Syst. Appl. 42 (14), 5963–5975.
Dashtipour, K., Hussain, A., Zhou, Q., Gelbukh, A., Hawalah, A.Y., Cambria, E., 2016. Persent: a freely available persian sentiment lexicon. In: International Conference on Brain Inspired Cognitive Systems. Springer, pp. 310–320.
Dehdarbehbahani, I., Shakery, A., Faili, H., 2014. Semi-supervised word polarity identification in resource-lean languages. Neural Netw. 58, 50–59.
Dragoni, M., da Costa Pereira, C., Tettamanzi, A.G., Villata, S., 2018. Combining argumentation and aspect-based opinion mining: The smack system. AI Commun. (Preprint), 1–21.
Dragoni, M., Pereira, C.D.C., Tettamanzi, A.G., Villata, S., 2016. Smack: An argumentation framework for opinion mining. In: International Joint Conference on Artificial Intelligence (IJCAI). IJCAI/AAAI Press, pp. 4242–4243.
Fama, E.F., 1991. Efficient capital markets: II. J. Financ. 46 (5), 1575–1617.
Fama, E.F., Fisher, L., Jensen, M.C., Roll, R., 1969. The adjustment of stock prices to new information. Int. Econ. Rev. 10 (1), 1–21.
Federici, M., Dragoni, M., 2016. A knowledge-based approach for aspect-based opinion mining. In: Semantic Web Evaluation Challenge. Springer, pp. 141–152.
Hu, M., Liu, B., 2004. Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 168–177.
Kumar, B.S., Ravi, V., 2016. A survey of the applications of text mining in financial domain. Knowl.-Based Syst. 114, 128–147.
Liu, B., Zhang, L., 2012. A survey of opinion mining and sentiment analysis. In: Mining Text Data. Springer, pp. 415–463.
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D., 2014. The stanford corenlp natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 55–60.
Nguyen, T.H., Shirai, K., Velcin, J., 2015. Sentiment analysis on social media for stock movement prediction. Expert Syst. Appl. 42 (24), 9603–9611.
Nourian, A., Rasooli, M.S., Imany, M., Faili, H., 2015. On the importance of ezafe construction in persian parsing. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Vol. 2. pp. 877–882.
Pang, B., Lee, L., 2008. Opinion Mining and Sentiment Analysis (Foundations and Trends (R) in Information Retrieval). Now Publishers Inc.
Patel, J., Shah, S., Thakkar, P., Kotecha, K., 2015. Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 42 (4), 2162–2172.
Qian, B., Rasheed, K., 2007. Stock market prediction with multiple classifiers. Appl. Intell. 26 (1), 25–33.
Sadeghi, S., Beigy, H., 2013. A new ensemble method for feature ranking in text mining. Int. J. Artif. Intell. Tools 22 (03), 1350010.
Sahamyab stock twitter. Oct 2016, https://www.sahamyab.com/stocktwits.
Schumaker, R.P., Chen, H., 2009a. A quantitative stock prediction system based on financial news. Inf. Process. Manage. 45 (5), 571–583.
Schumaker, R.P., Chen, H., 2009b. Textual analysis of stock market prediction using breaking financial news: The azfin text system. ACM Trans. Inf. Syst. (TOIS) 27 (2), 12.
Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., Deng, X., 2013. Exploiting topic based twitter sentiment for stock prediction. ACL (2) 2013, 24–29.
Tehran Securities Exchange Technology Management Co. Oct 2016, http://www.tsetmc.com/Loader.aspx?ParTree=15.
Ticknor, J.L., 2013. A bayesian regularized artificial neural network for stock market forecasting. Expert Syst. Appl. 40 (14), 5501–5506.
Tsibouris, G., Zeidenberg, M., 1995. Testing the efficient markets hypothesis with gradient descent algorithms. In: Neural Networks in the Capital Markets, Vol. 8. Wiley, Chichester, pp. 127–136.
Tumarkin, R., Whitelaw, R.F., 2001. News or noise? internet postings and stock prices. Financ. Anal. J. 57 (3), 41–51.
Vu, T.-T., Chang, S., Ha, Q.T., Collier, N., 2012. An experiment in integrating sentiment features for tech stock prediction in twitter. In: Proceedings of the Workshop on Information Extraction and Entity Analytics on Social Media Data. p. 2338.
Walczak, S., 2001. An empirical analysis of data requirements for financial forecasting with neural networks. J. Manag. Inf. Syst. 17 (4), 203–222.
Xing, F.Z., Cambria, E., Welsch, R.E., 2018a. Intelligent asset allocation via market sentiment views. IEEE Comput. Intell. Mag. 13 (4), 25–34.
Xing, F.Z., Cambria, E., Welsch, R.E., 2018b. Natural language based financial forecasting: a survey. Artif. Intell. Rev. 50 (1), 49–73.
Yahoo Finance. Oct 2016, https://finance.yahoo.com/.
Yahoo Finance Message Board. Oct 2016, https://finance.yahoo.com/quote/GE.
Yoshihara, A., Seki, K., Uehara, K., 2015. Leveraging temporal properties of news events for stock market prediction. Artif. Intell. Res. 5 (1), 103.
Zuo, Y., Kita, E., 2012a. Stock price forecast using Bayesian network. Expert Syst. Appl. 39 (8), 6729–6737.
Zuo, Y., Kita, E., 2012b. Up/down analysis of stock index by using Bayesian network. Eng. Manag. Res. 1 (2), 46.