Sentiment Analysis of Stock Prices and News Headlines Using The MCDM Framework
Abstract—In the 21st century, rapid progress in digital data acquisition has led to a fast-growing amount of data kept in databases, data warehouses, and other data stores. The main reason is that digital data enables the fast spread of information and increases technology usage. The stock market is one of the most competitive financial markets, where traders want to compute financial capacities with low latency and high throughput. In this study, we introduce an unsupervised MCDM-based Grey Relational Analysis (GRA) model that assigns appropriate sentiment tags to news headlines and supports the prediction of upcoming stock movement. To check the proposed model's applicability, we used the INFOSYS and WIPRO datasets, on which the proposed model gives satisfactory results. We recorded an accuracy of around 87%. We use the GRA model to evaluate and recommend the best stocks based on news headlines from multiple web sources. The system's performance is evaluated using real-time data from WIPRO and INFOSYS.

Keywords—Stock market, Sentiment analysis, GRA model, News headlines

I. INTRODUCTION
Financial markets are among the most intriguing developments of our time [1]. They significantly influence many things, including business, education, jobs, technology, and the economy [2]. To begin, we discuss some stock market terminology. In the inventory sense, stock refers to products or items stored on store or warehouse premises and available for sale or distribution. In the financial sense, the stock of a company or organization is expressed as a percentage of the total number of shares, with one share of the stock representing partial ownership. The company's shares are divided at the firm's founding, and the absolute number of shares should be stated. Shares represent a portion of a company's ownership. A company can issue classes of shares with different ownership rights or regulations. In its most basic form, the stock market gathers buyers and sellers of stocks, reflecting firms' ownership claims [3]. Most of the time, stocks listed on a public stock exchange can also be sold privately, and the future market trend is forecast from the aggregate interaction between buyers and sellers. Stock traders who correctly forecast stock price patterns can earn large rewards. As a result, stock traders need to be able to foresee future stock market movements to make informed decisions. Investors assess a company's performance before purchasing its stock to avoid risky stocks. Stock exchanges encourage people to invest [4]. This investment is crucial for economic growth, wealth, and commerce [5].

Stock market analysts are working to establish a viable method for predicting stock values [6]. For sentiment analysis, multiple methods are used to build a link between stock market values and sentiments in news feeds, and then a trained model is used to predict stock market rates [7]. If the news sentiment is favorable, the stock price is more likely to rise; if the news sentiment is negative, the stock price is more likely to fall [8]. This study aims to develop a model to forecast news polarity, which influences stock market patterns. The different types of sentiments are positive, negative, and neutral. Analysts use historical data to predict the future price of a company's stock or of a financial instrument on an exchange. Predicting stock prices that will yield a significant return is another challenging endeavor. The company's current health is considered when making future stock price predictions, rather than only looking at historical data. It is possible to predict the stock price by evaluating historical data and employing various techniques [9].

According to Random Walk Theory, a company's stock price cannot be predicted by any method, because stock prices move randomly. Nevertheless, fuzzy logic [10] and artificial intelligence approaches [11], such as neural networks, can be used to anticipate stock prices [12]. Other techniques that have been applied include Dempster–Shafer theory, k-nearest neighbors (k-NN) [13], support vector machines (SVM) [14], [15], fuzzy logic, and artificial neural networks.

A. Effect of sentiments on the stock market
Stock sentiment analysis, which employs text mining, natural language processing, and computational approaches to extract sentiments from a text automatically, is another approach to forecasting the stock trend [16]. Its goal is to identify a text's polarity at the phrase or class level, determining whether it represents a positive, negative, or neutral viewpoint [17]. In stock market prediction, two primary sources of text are used: reliable social media, such as tweets from official Twitter accounts, and financial news articles [18].

For example, in 2021, demand for medicines was high because of COVID, and the stock of Candela healthcare rose accordingly. Similarly, the BSE Sensex fell because of the lockdown and COVID. This illustrates how people's sentiments, as reflected through social media, affect business statistics and the profit and loss made by a company.

Financial news articles are thus viewed as a more reliable and consistent source of information. Financial news stories have been found to correlate strongly with stock market changes, suggesting that studying such stories can help with stock market forecasting. Using a unified latent space model, the authors of [19] investigated the association between stock prices and news item releases. The result shows a high level of return accuracy, indicating that news item analysis significantly influences stock market movement [19].
Authorized licensed use limited to: Zhejiang University. Downloaded on November 03,2023 at 14:47:23 UTC from IEEE Xplore. Restrictions apply.
B. Research Problem
This study intends to create a framework that improves the accuracy of assessing the sentiment orientation of newspaper headlines.

C. Contribution
The proposed model employs mathematical optimization techniques for sentiment analysis of news headlines. We introduce an MCDM-based GRA model, which is used for sentiment tagging of news headlines by integrating context scores evaluated from SentiWordNet and emotion scores from Python libraries.

Vu et al. [20] improved the named-entity identification approach for noise reduction on the Twitter dataset. Martin [21] used a neural network with Twitter sentiment analysis for stock market move prediction. Li et al. [22] introduced a sentiment analysis model that outperforms a bag-of-words approach. Shastri et al. improved the sentiment analysis process by incorporating different words that might influence stock movement more. The study by Wu et al. [23] only looked at news stories from the Knowledge Management Winner newspaper. Maqsood et al. [24] require the use of many social media websites to analyze sentiment about a specific event. Liu and Wang [25] apply the NBA model at the index level or to industry-level data. Li et al. [1] proposed a topic modelling framework for sentiment analysis.

II. METHODOLOGY
The methodology is a step-by-step procedure for evaluating the sentiment tag of the headlines. The pipeline of the proposed model is explained in the sections below. First, we extract the context and emotion of the reviews. Using SentiWordNet and Python, we transform the news headlines from text to a numerical form termed a context score and then assess the sentiments they contain. The proposed model is then fed the emotion and context scores to make review sentiment predictions. The steps are laid out in detail below.

Step 1) News Sources: The initial step is the collection of news. This study gathered information from press releases issued by the respective companies and from traditional financial news sources. In this stage, we scraped web news about Indian companies such as Infosys and Wipro covering 10 years and saved the information in a database. The subsequent step used natural language processing to extract emotion from unstructured news articles. Stop words were eliminated. The remaining valuable terms were subsequently used to analyze the sentiment. We assessed the performance of SVM and Naive Bayes classifiers for classifying sentiments.

Step 2) Textual Pre-processing: It is common practice to carry out data cleaning as a component of data pre-processing to ensure that the data is free of errors and outliers. Text pre-processing is the initial step in the NLP algorithm development procedure.

Step 3) Tokenization: Splitting the sentence into words.

Step 4) Stop words removal: Stop words are the most frequently used terms in a text (a, an, etc.). These terms do not help to distinguish between two texts; hence they do not carry any meaningful significance.

Step 5) Stemming: It transforms a word into its basic form.

Step 6) Lemmatization: To determine a word's lemma, its part of speech must be determined. This assists in converting the term to its correct root form. Additional computational linguistics resources, such as a part-of-speech tagger, are required to accomplish this.

Step 7) Evaluating the context score of news headlines using SentiWordNet: This step determines the context score of textual comments using SentiWordNet [26]. SentiWordNet includes both positive and negative polarity word lists. From the feedback, we extract the POS-tagged content words. To determine comment context scores, we look up the positive and negative polarity values of each word in SentiWordNet.

Step 8) Evaluate emotion score: We evaluate the emotion score using Python libraries and classify the scores into three categories: positive, negative, and neutral, named E1, E2, and E3 as mentioned in Figure 4.

Step 9) Grey Relational Analysis (GRA) technique for deduced tag: Having obtained the context and emotion scores of each review, we use Algorithm 1 to implement the method. The review tag is generated by selecting the alternative with the highest rank while ignoring the others.

Algorithm 1: GRA Technique
Step 1) Normalize the data to the range [0, 1].
\[
x_i^{*}(k) =
\begin{cases}
\dfrac{x_i^{\tau}(k) - \min x_i^{\tau}(k)}{\max x_i^{\tau}(k) - \min x_i^{\tau}(k)} & \text{for higher-the-better} \\[2ex]
\dfrac{\max x_i^{\tau}(k) - x_i^{\tau}(k)}{\max x_i^{\tau}(k) - \min x_i^{\tau}(k)} & \text{for lower-the-better}
\end{cases}
\]
Step 2) Evaluate the deviation sequence using the formula below.
\[
x_i^{*}(k) = 1 - \frac{\lvert x_i^{\tau}(k) - x_i^{\tau} \rvert}{\max x_i^{\tau}(k) - x_i^{\tau}}
\]
Step 3) Evaluate the grey relational coefficient (GRC).
\[
\xi_i(k) = \frac{\Omega_{\min} + \xi \, \Omega_{\max}}{\Omega_{\Delta i}(k) + \xi \, \Omega_{\max}}
\]
where \( \Omega_{\Delta i}(k) = \lvert x_i^{\tau}(k) - x_i^{\tau} \rvert \), \( \Omega_{\max} = 1 \), and \( \Omega_{\min} = 0 \).
Step 4) Estimate the grey relational grade.
\[
\gamma_i = \frac{1}{n} \sum_{k=1}^{n} \omega_k \, \xi_i(k)
\]
Step 5) Rank the alternatives.
Step 6) The alternative with rank 1 is the deduced tag.

The review's emotional label was obtained using the GRA model.

III. RESULT EVALUATION
We have identified the sentiment categories as positive, negative, and neutral. The well-known Naïve Bayes, KNN, and SVM classifiers classify a stock news headline into the appropriate category. The result is compared with the human annotations, and the accuracy is computed with Equation (1).
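Algorithm 1 (Section II) can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation. The decision matrix (candidate tags as alternatives, context and emotion scores as criteria), the equal weights, the distinguishing coefficient of 0.5, and the function name `grey_relational_grades` are all hypothetical. Every criterion is treated as higher-the-better, and the ideal normalized value 1 is assumed as the reference when computing deviations.

```python
from typing import Dict, List


def grey_relational_grades(
    scores: Dict[str, List[float]],
    weights: List[float],
    zeta: float = 0.5,  # distinguishing coefficient; 0.5 is an assumed value
) -> Dict[str, float]:
    """Return the grey relational grade of each alternative."""
    names = list(scores)
    n_criteria = len(weights)

    # Step 1: min-max normalize each criterion (higher-the-better form).
    normalized: Dict[str, List[float]] = {name: [] for name in names}
    for k in range(n_criteria):
        column = [scores[name][k] for name in names]
        low, high = min(column), max(column)
        for name in names:
            value = scores[name][k]
            # A constant column carries no information; treat it as ideal.
            normalized[name].append(
                (value - low) / (high - low) if high > low else 1.0
            )

    grades: Dict[str, float] = {}
    for name in names:
        coefficients = []
        for value in normalized[name]:
            # Step 2: deviation from the reference value (assumed to be 1).
            deviation = abs(1.0 - value)
            # Step 3: grey relational coefficient with min deviation 0, max 1.
            coefficients.append((0.0 + zeta * 1.0) / (deviation + zeta * 1.0))
        # Step 4: weighted grade averaged over the criteria.
        grades[name] = (
            sum(w * c for w, c in zip(weights, coefficients)) / n_criteria
        )
    return grades


# Hypothetical [context score, emotion score] values for one headline.
headline_scores = {
    "positive": [0.62, 0.70],
    "negative": [0.10, 0.05],
    "neutral": [0.28, 0.25],
}
grades = grey_relational_grades(headline_scores, weights=[1.0, 1.0])

# Steps 5 and 6: rank the alternatives; rank 1 is the deduced tag.
ranking = sorted(grades, key=grades.get, reverse=True)
deduced_tag = ranking[0]
print(ranking, deduced_tag)
```

With these made-up scores the "positive" alternative attains the ideal value on both criteria, so it ranks first and becomes the deduced tag. Different weights or a different reference sequence would change the grades, which is why those choices are flagged as assumptions.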
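Equation (1) does not state how the expected and actual outputs are encoded. One possible reading, assumed here rather than taken from the paper, is that each label is a one-hot vector over the three classes. Under that reading, the summed absolute difference divided by 2 counts the misclassified headlines, and accuracy is the complement of the resulting error rate. The sketch below illustrates this reading only; the label lists and the helper names `one_hot`, `mismatches`, and `accuracy` are hypothetical.

```python
from typing import List

LABELS = ["positive", "negative", "neutral"]


def one_hot(label: str) -> List[int]:
    """Encode a sentiment label as a one-hot vector over LABELS."""
    return [1 if label == name else 0 for name in LABELS]


def mismatches(expected: List[str], actual: List[str]) -> float:
    """Summed absolute one-hot difference divided by 2 (one per wrong tag)."""
    total = 0
    for exp_label, act_label in zip(expected, actual):
        exp_vec, act_vec = one_hot(exp_label), one_hot(act_label)
        total += sum(abs(e - a) for e, a in zip(exp_vec, act_vec))
    return total / 2


def accuracy(expected: List[str], actual: List[str]) -> float:
    """Share of headlines whose predicted tag matches the annotation."""
    return 1.0 - mismatches(expected, actual) / len(expected)


# Hypothetical human annotations and model predictions for four headlines.
human_tags = ["positive", "negative", "neutral", "positive"]
model_tags = ["positive", "neutral", "neutral", "positive"]
print(mismatches(human_tags, model_tags), accuracy(human_tags, model_tags))
```

Each wrong tag changes two entries of the one-hot vector, which is why the division by 2 yields one count per misclassified headline.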
\[
\text{Accuracy} = \frac{\operatorname{sum}\bigl(\operatorname{abs}(EXP_{output} - ACT_{output})\bigr)}{2} \tag{1}
\]

We applied the proposed model to the WIPRO and INFOSYS news datasets. We compared the results with the existing approaches [3] in Tables I and II.

TABLE I. COMPARISON IN TERMS OF ACCURACY FOR THE INFOSYS DATASET

                            Accuracy (%)
Sentiment term polarity     Naïve Bayes   SVM     KNN     Proposed Model
Positive                    71.92         46.58   48.59   73.56
Negative                    96.41         81.77   52.76   94.89
Neutral                     90.64         71.06   88.42   92.56

TABLE II. COMPARISON IN TERMS OF ACCURACY FOR THE WIPRO DATASET

                            Accuracy (%)
Sentiment term polarity     Naïve Bayes   SVM     KNN     Proposed Model
Positive                    62.727        60.00   48.59   66.12
Negative                    63.636        49.29   52.76   78.78
Neutral                     78.18         24.51   88.42   85.23

Tables I and II report the performance recorded on the INFOSYS and WIPRO datasets, respectively, by the different machine learning models for three-class classification. The results produced by the GRA model compare satisfactorily. In terms of overall accuracy, the proposed model outperforms the other models, with 87% on the INFOSYS dataset and 76.71% on the WIPRO dataset (the mean of its three per-class accuracies in Tables I and II). To show the overall performance, we calculate the average performance of all the models, which is depicted in Fig. 1.

Fig. 1. The average performance of models in terms of accuracy.

IV. CONCLUSION
An optimization technique to predict sentiment around stock prices is proposed here. The sentiment around companies was predicted by first filtering relevant real-time news headlines and press releases from many business news sources. Although the proposed model's accuracy is satisfactory, it can be improved by combining it with more advanced machine learning methods or with data mining approaches. This paper's future scope is open. We may analyze the sentiment of alternative brands and see how one company's sentiment affects the others over time. We can also look into other elements, such as social and legal difficulties, to see if they affect the overall market situation. Using the proposed model, we may undertake financial modelling such as portfolio management, risk estimation, and strategic planning. Future work may entail expanding the analyses and possibly adding a new model. Alternative models could also be included to compare stock market predictions with sentiment analysis technologies. This can be achieved by establishing a platform that incorporates future modifications into the existing model.

REFERENCES
[1] J. Li, G. Li, M. Liu, X. Zhu, and L. Wei, “A novel text-based framework for forecasting agricultural futures using massive online news headlines,” International Journal of Forecasting, vol. 38, no. 1, pp. 35–50, Jan. 2022, doi: 10.1016/J.IJFORECAST.2020.02.002.
[2] M. Li, L. Chen, J. Zhao, and Q. Li, “Sentiment analysis of Chinese stock reviews based on BERT model,” Applied Intelligence, vol. 51, no. 7, pp. 5016–5024, Jul. 2021, doi: 10.1007/S10489-020-02101-8.
[3] M. Li, L. Chen, J. Zhao, and Q. Li, “Sentiment analysis of Chinese stock reviews based on BERT model,” Applied Intelligence, vol. 51, no. 7, pp. 5016–5024, Jul. 2021, doi: 10.1007/S10489-020-02101-8.
[4] G. Kumar and N. Parimala, “A weighted sum method MCDM approach for recommending product using sentiment analysis,” International Journal of Business Information Systems, vol. 35, no. 2, pp. 185–203, 2020, doi: 10.1504/IJBIS.2020.110172.
[5] M. Keshavarz Ghorabaee, E. Kazimieras Zavadskas, L. Olfat, and Z. Turskis, “Multi-criteria inventory classification using a new method of evaluation based on distance from average solution (EDAS),” content.iospress.com, vol. 26, no. 3, pp. 435–451, 2015, doi: 10.15388/Informatica.2015.57.
[6] A. E. de Oliveira Carosia, G. P. Coelho, and A. E. A. da Silva, “Investment strategies applied to the Brazilian stock market: A methodology based on Sentiment Analysis with deep learning,” Expert Systems with Applications, vol. 184, p. 115470, 2021, doi: 10.1016/j.eswa.2021.115470.
[7] N. Jing, Z. Wu, and H. Wang, “A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction,” Expert Systems with Applications, vol. 178, 2021, doi: 10.1016/j.eswa.2021.115019.
[8] B. Luo, J. Zeng, and J. Duan, “Emotion space model for classifying opinions in stock message board,” Expert Systems with Applications, vol. 44, pp. 138–146, 2016, doi: 10.1016/j.eswa.2015.08.023.
[9] A. Nasir, M. A. Shah, U. Ashraf, A. Khan, and G. Jeon, “An intelligent framework to predict socioeconomic impacts of COVID-19 and public sentiments,” Computers & Electrical Engineering, vol. 96, p. 107526, Dec. 2021, doi: 10.1016/J.COMPELECENG.2021.107526.
[10] M. Lata Joshi, N. Joshi, and B. Vidyapith, “SGATS: Semantic Graph-based Automatic Text Summarization from Hindi Text Documents,” ACM Trans. Asian Low-Resour. Lang. Inf. Process, vol. 20, 2021, doi: 10.1145/3464381.
[11] S. Poria, E. Cambria, G. Winterstein, and G. Bin Huang, “Sentic patterns: Dependency-based rules for concept-level sentiment analysis,” Knowledge-Based Systems, vol. 69, no. 1, pp. 45–63, 2014, doi: 10.1016/j.knosys.2014.05.005.
[12] R. Ahuja and S. C. Sharma, “Transformer-Based Word Embedding With CNN Model to Detect Sarcasm and Irony,” Arabian Journal for Science and Engineering, pp. 1–14, Sep. 2021, doi: 10.1007/S13369-021-06193-3.
[13] D. K. Kirange and R. R. Deshmukh, “Sentiment Analysis of News Headlines for Stock Price Prediction,” An international journal of advanced computer technology, vol. 5, no. 3, 2016, doi:
10.13140/RG.2.1.4606.3765.
[14] Y. Liu, J. W. Bi, and Z. P. Fan, “A method for ranking products through online reviews based on sentiment classification and interval-valued intuitionistic fuzzy topsis,” International Journal of Information Technology and Decision Making, vol. 16, no. 6, pp. 1497–1522, Nov. 2017, doi: 10.1142/S021962201750033X.
[15] Y. Liu, J. W. Bi, and Z. P. Fan, “A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm,” Information Sciences, vol. 394–395, pp. 38–52, Jul. 2017, doi: 10.1016/J.INS.2017.02.016.
[16] J. Heidary Dahooie, R. Raafat, A. R. Qorbani, and T. Daim, “An intuitionistic fuzzy data-driven product ranking model using sentiment analysis and multi-criteria decision-making,” Technological Forecasting and Social Change, vol. 173, p. 121158, Dec. 2021, doi: 10.1016/j.techfore.2021.121158.
[17] L. Zhu, W. Li, Y. Shi, and K. Guo, “SentiVec: Learning Sentiment-Context Vector via Kernel Optimization Function for Sentiment Analysis,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 6, pp. 2561–2572, Jun. 2021, doi: 10.1109/TNNLS.2020.3006531.
[18] M. Etter, E. Colleoni, L. Illia, et al., “Measuring organizational legitimacy in social media: Assessing citizens’ judgments with sentiment analysis,” journals.sagepub.com, vol. 57, no. 1, pp. 60–97, Jan. 2018, doi: 10.1177/0007650316683926.
[19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, vol. 1, pp. 4171–4186, Oct. 2018, Accessed: Oct. 11, 2021. [Online]. Available: https://arxiv.org/abs/1810.04805v2.
[20] M. Zaman, L. Botti, and T. V. Thanh, “Weight of criteria in hotel selection: An empirical illustration based on TripAdvisor criteria,” European Journal of Tourism Research, vol. 13, pp. 132–138, Jul. 2016, doi: 10.54055/EJTR.V13I.236.
[21] M. T. Martín-Valdivia, E. Martínez-Cámara, J. M. Perea-Ortega, and L. A. Ureña-López, “Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches,” Expert Systems with Applications, vol. 40, no. 10, pp. 3934–3942, Aug. 2013, doi: 10.1016/J.ESWA.2012.12.084.
[22] S. Xu et al., “Deep retinex decomposition network for underwater image enhancement,” Computers and Electrical Engineering, vol. 100, p. 107822, May 2022, doi: 10.1016/J.COMPELECENG.2022.107822.
[23] T. Wu et al., “Video sentiment analysis with bimodal information-augmented multi-head attention,” Knowledge-Based Systems, vol. 235, p. 107676, Jan. 2022, doi: 10.1016/J.KNOSYS.2021.107676.
[24] F. Nazir, M. A. Ghazanfar, M. Maqsood, F. Aadil, S. Rho, and I. Mehmood, “Social media signal detection using tweets volume, hashtag, and sentiment analysis,” Multimedia Tools and Applications, vol. 78, no. 3, pp. 3553–3586, Feb. 2019, doi: 10.1007/S11042-018-6437-Z.
[25] X. Wang, L. (Rebecca) Tang, and E. Kim, “More than words: Do emotional content and linguistic style matching matter on restaurant review helpfulness?,” International Journal of Hospitality Management, vol. 77, pp. 438–447, 2019, doi: 10.1016/j.ijhm.2018.08.007.
[26] A. Esuli and F. Sebastiani, “SENTIWORDNET: A publicly available lexical resource for opinion mining,” Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2006, pp. 417–422, 2006, Accessed: Sep. 23, 2021. [Online]. Available: http://nmis.isti.cnr.it/sebastiani/Publications/2007TR02.pdf.