Sentiment Analysis
All content following this page was uploaded by Nedaa Al Barghuthi on 20 November 2022.
Intelligent Computing
Systems
Third International Symposium, ISICS 2020
Sharjah, United Arab Emirates, March 18–19, 2020
Proceedings
Sentiment Analysis on Predicting Presidential
Election: Twitter Used Case
Abstract. Twitter is a popular tool for social interaction over the Internet. It allows
users to share and post opinions, follow social media events, and interact with political
figures and ordinary people. According to a 2019 statistical report on the Statista web
site, the number of Twitter users has grown dramatically over the past couple of years
to reach 300 million. Twitter has become the largest source of news and postings for
presidents and other key political figures. According to the Trackalytics 2019 report,
the president of the USA posted 4,000 tweets per year, an average of 11–12 tweets per
day. Our research proposes a technique that extracts and analyzes tweets from blogs
and predicts election results based on tweet analysis. It assesses public opinion and
studies the impact that might predict the final results for the candidates in the Turkey
2018 presidential election. The predicted results were compared with the actual election
results and showed a high prediction accuracy based on the 22,000 collected tweets.
1 Introduction
At present, social media provides a massive amount of data about users and their social
interaction. This data plays a useful role in policy, health, finance, and many other
sectors as well as predicting future events and actions. In a relatively short period,
social media has gained popularity as a tool for mass communications and public
participation when it comes to political purposes and governance. Obtaining a successful
data forecast helps to understand the limitations of predictability in social media and
avoid false expectations, misinformation, or unintended consequences. Rapid dissemination
of information through social networking platforms, such as Twitter, enables politicians
and activists to broadcast their messages to broad audiences immediately and directly
outside traditional media channels.
During the 2008 US election, Twitter was used as an essential tool to influence the
results of Barack Obama’s campaign [1]. Obama’s campaign succeeded in using Twitter
as a campaign tool and in gaining more followers. Accordingly, more voters were engaged during
© Springer Nature Switzerland AG 2020
C. Brito-Loeza et al. (Eds.): ISICS 2020, CCIS 1187, pp. 105–117, 2020.
https://doi.org/10.1007/978-3-030-43364-2_10
106 N. Baker Al Barghuthi and H. E. Said
the 17 months of the election period. The campaign published 262 tweets and gained
about 120,000 new followers. Following the success of this Twitter campaign, all major
candidates and political parties now use social media as an essential tool to convey
their messages [1]. This paper aims to study the expected patterns of political activity
and Twitter campaigns in this context. The Turkish presidential election is used as a
case scenario to extract and analyze the final campaign results. Section 2 highlights
the related work on election prediction around the Asia region. Section 3 describes the
approach and methods adopted. The results are presented in Sect. 4, and a final
discussion and conclusion are given in Sect. 5.
Fig. 1. Tweet volume chart
Fig. 2. Correlation coefficient for Trump and Clinton tweets 43 days before the election using a
moving average of k days
Sentiment Analysis on Predicting Presidential Election 107
the election period; when a moving-average technique was applied, a 94% correlation
with the poll data was recorded (see Fig. 2).
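The moving-average comparison described above can be sketched as follows. This is a minimal illustration, not the cited study's code: the daily values are hypothetical, and the helper names `moving_average` and `pearson` are introduced here only for the example.

```python
from math import sqrt


def moving_average(values, k):
    """Trailing k-day moving average; returns len(values) - k + 1 points."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values (assumption: NOT data from the cited study).
tweet_sentiment = [0.10, 0.30, 0.20, 0.50, 0.40, 0.60, 0.70]
poll_share = [40.0, 41.0, 41.5, 42.0, 43.0, 43.5, 44.0]

k = 3
smoothed = moving_average(tweet_sentiment, k)
# Align the poll series with the smoothed series by dropping the first k - 1 days.
r = pearson(smoothed, poll_share[k - 1:])
print(f"correlation with k={k}: {r:.3f}")
```

Repeating the calculation for several values of k shows how the amount of smoothing changes the agreement between the tweet signal and the polls.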
The lexicon dictionary [3] contains 1600 positive and 1200 negative words. The
result of applying this analysis is shown in Figs. 3 and 4.
Fig. 3. Lexicon sentiment for Trump hashtags
Fig. 4. Lexicon sentiment for Clinton hashtags
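A lexicon-based score of the kind described above can be sketched in a few lines. The word lists below are tiny hypothetical stand-ins for the much larger lexicon in [3], and the scoring rule (positive count minus negative count) is an assumption made for illustration.

```python
import re

# Tiny illustrative word lists (assumption); the lexicon in [3] is far larger.
POSITIVE = {"good", "great", "win", "strong", "hope"}
NEGATIVE = {"bad", "corrupt", "lose", "weak", "fear"}


def lexicon_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def label(score):
    """Map a lexicon score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["Great rally, strong message", "weak and corrupt", "the debate is tonight"]:
    print(tweet, "->", label(lexicon_score(tweet)))
```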
The Naïve Bayes algorithm was applied using the Natural Language Toolkit (NLTK).
Five hundred negative and five hundred positive tweets were labeled for this algorithm.
Two key search terms, ‘Hillary Clinton’ and ‘Donald Trump’, were used. The criteria
of this algorithm and the analysis results are shown in Figs. 5 and 6. The authors in [1]
used a machine learning approach for automatic tweet labeling. A Convolutional Neural
Network (CNN) was implemented using Python and TensorFlow. The model was trained on
the Sentiment140 data set, after which the collected tweets were labeled as having
positive or negative sentiment. The election dataset was used for evaluation and for
predicting the votes with two algorithms and sentiment analysis. The predicted votes
were compared with the polling statistics collected during the election period, and the
expected result matched the actual voting result [5].
Fig. 5. NLTK sentiment for Clinton hashtags
Fig. 6. NLTK sentiment for Trump hashtags
Moreover, [6] presented an analysis of the 2016 American elections. In this research,
three million tweets were collected within 21 days before and after Election Day.
Sentiment analysis was used to examine users’ sentiment behavior in relation to their
Twitter profiles and associated features. Both research papers studied Twitter and
discussed its relevance to news and events. SentiStrength was most often used for
sentiment score calculation. The collected tweets were filtered to messages containing
the search terms ‘election 2016’, ‘Hillary Clinton’ or ‘Donald Trump.’ Furthermore,
[6] tested several hypotheses to study and discuss the characteristics of political
user behavior, views, and other discussions on Twitter. The analysis showed that the
majority of the collected tweets expressed negative feelings toward both candidates.
This indicates negative public feeling about the latest elections. The analysis also
showed that only a small number of tweets were posted during the debate sessions and
that most of them were retweets. The authors in [7] provided a novel method to facilitate
data mining of participants’ opinions with the support of linguistic analysis and
sentiment ratings. It was used to identify the sentiment score level of those involved
in politics in Pakistan. According to the literature, this paper was the first to study
sentiment analysis on social media. The authors introduced a new technique for mining
opinion in a political context. Two classification analyses were applied to the collected
data using the Naïve Bayes and Support Vector Machine (SVM) algorithms. The result of
applying this approach is shown in Fig. 8. The sentiment analysis was performed on the
Sentiment Viz web interface, which provides a graphical analysis of the different levels
of sentiment categories (see Fig. 7). Figure 8 also shows that SVM performed better than
the Naïve Bayes algorithm and gave higher accuracy.
Fig. 7. Sentiment classification defined over the keyword “PTI”
Fig. 8. Comparisons of SVM and Naive Bayes
Figure 9 shows an output display of the model proposed in [7]. This model has an option
to display the data in a visual environment. The authors of [7] used it to represent
negative opinion about the Pakistan People’s Party (PPP) across different places and
cities. Figure 9 shows that Lahore has the highest level of negative opinion about PPP.
Other parties can view the same information from their own angle, which allows an easy
and clear comparison concerning PPP. The proposed model was tested on the Pakistan
election. SVM was also used to determine the polarity of the sentiment. Based on
features of the data, SVM labels the sentiment of each tweet as positive or negative;
otherwise, the tweet is labeled as neutral. The label is stored along with the name of
the political party that the tweet refers to [9] (see Fig. 10).
Fig. 9. The magnitude of negativity about PPP in different cities
Fig. 10. Working of SVM Model
The author in [8] improved political sentiment analysis on Twitter using a dynamic
keyword method. The paper presents a new method for collecting data to predict election
results and uses topic models to extract topics [10]. An RSS aggregator combines news
articles with dynamic keywords shared on Twitter at the same time, creating an
intelligent prediction method that depends mainly on tweet volume. It also attempts to
enhance electoral predictions on social media using a dynamic keyword-based methodology
for the Delhi Assembly Election 2015. Data were collected through two channels: RSS
(see Fig. 11) and the Twitter API. Two types of analysis were applied to the collected
data: volume-based and sentiment-based. This analysis helps to detect voting activity
in the tweets by using keyword searches related to ‘election’ with an LDA-based
approach (see Fig. 12). It covers both simple opinions and comparison-based opinions.
The graph (see Fig. 10) shows that SVM techniques give good results.
Fig. 11. Selection procedure for topics and keywords from RSS feeds
Fig. 12. Tag cloud showing the most relevant keywords in RSS
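A volume-based analysis of the kind mentioned above can be illustrated with a short sketch: count the tweets that mention each party and normalize the counts into shares. The party names, keywords, and tweets are hypothetical, the function `volume_shares` is introduced only for this example, and plain substring matching is a deliberate simplification rather than the method of [8].

```python
from collections import Counter


def volume_shares(tweets, party_keywords):
    """Share of party-mentioning tweets per party (a tweet may count for several parties)."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for party, keywords in party_keywords.items():
            # Simplification: substring match on lower-cased text.
            if any(keyword in lowered for keyword in keywords):
                counts[party] += 1
    total = sum(counts.values())
    return {party: (counts[party] / total if total else 0.0) for party in party_keywords}


# Hypothetical parties, keywords, and tweets (assumption, for illustration only).
party_keywords = {"A": ["party a"], "B": ["party b"]}
tweets = [
    "Party A rally today",
    "party b manifesto released",
    "I support Party A",
    "weather is nice",
]
shares = volume_shares(tweets, party_keywords)
print(shares)
```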
3 Methodology
Two datasets were generated using the Twitter application for data collection. Each
dataset has 22,000 tweets. The tweets were collected before the Turkish presidential
election period. The election took place between 22 and 24 June 2018. The Spyder Python
development environment was used for the data mining and sentiment analysis process
(see Table 1). The analysis was performed on these datasets in multiple stages. Step 1
translates the text portion of the datasets into English, using a Python script written
in Spyder and a translator parser.
Step 2 converts the dataset’s text to lower case. A filtering process was then applied
to eliminate unwanted text and remove redundant data for better extraction. A custom
stopwords lexicon dictionary that contained English and Turkish words was used to filter
the dataset’s text. The output text from this filter was stripped of punctuation symbols,
URLs, and hashtags. In addition, all retweeted texts were excluded from the filtered
text. After that, word and sentence tokenization was carried out, dividing each tweet
into two parts: discrete words and sentences. Sentiment analysis was then performed on
the processed text using a Naïve Bayes Classifier, which classifies sentiment polarity
into three levels: positive, negative, and neutral.
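To make the three-level classification step concrete, the following is a minimal multinomial Naïve Bayes sketch over tokenized tweets, written with the standard library only. It is not the authors' implementation: the `NaiveBayes` class, the add-one smoothing, and the six training examples are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words tokens with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    logp += math.log((self.word_counts[label][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labeled tweets, already tokenized (assumption, not the paper's data).
train = [
    (["great", "leader"], "positive"),
    (["strong", "economy", "great"], "positive"),
    (["corrupt", "government"], "negative"),
    (["bad", "economy", "corrupt"], "negative"),
    (["election", "sunday"], "neutral"),
    (["polls", "open", "sunday"], "neutral"),
]
model = NaiveBayes().fit([tokens for tokens, _ in train], [label for _, label in train])
print(model.predict(["great", "strong"]))
```

In practice the training set would be far larger, and the tokens would come from the cleaning and tokenization steps described above.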
Finally, several processing steps were applied to the dataset text using word cloud and
word frequency algorithms, and multiple visual presentations and measurements were
generated and displayed. The actual election results were then compared with the Twitter
sentiment analysis results to evaluate the precision percentage of the overall
prediction. Figures 13 and 14 show the processes performed on the datasets.
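The word frequency step that feeds a word cloud can be sketched with `collections.Counter`. The function name `top_words` and the sample tokens are hypothetical; the default of 100 mirrors the "top 100 words" mentioned in the caption of Table 3.

```python
from collections import Counter


def top_words(token_lists, n=100):
    """Return the n most frequent tokens across all tweets as (word, count) pairs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


# Hypothetical tokenized tweets (assumption, for illustration only).
sample = [["vote", "turkey"], ["vote", "election"], ["vote", "turkey"]]
print(top_words(sample, n=2))
```

The resulting (word, count) pairs are the usual input for a word cloud renderer.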
Fig. 16. Converting text to lower case and spell check process
c. Create custom stop words lists and remove the unwanted text from the main data
set
Two custom stopwords dictionaries were created using utf-8 encoding: one stopwords
list for English and another for Turkish. These lists contain the most frequent
words that people usually use. These words should be removed from the dataset before
and after the data processing, as shown in Fig. 17.
Fig. 17. Create custom stop words lists and remove the unwanted text from the main dataset
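The cleaning and stop word filtering described in this section can be sketched as below. The stop word sets are small hypothetical samples rather than the authors' lists, and the helper names (`is_retweet`, `clean_tweet`, `preprocess`) as well as the "RT @" retweet convention are assumptions made for the example.

```python
import re
import string

# Small illustrative stop word sets (assumption); the authors' custom lists are not reproduced.
STOPWORDS_EN = {"the", "a", "is", "and", "to", "of", "for"}
STOPWORDS_TR = {"ve", "bir", "bu", "da", "de"}
STOPWORDS = STOPWORDS_EN | STOPWORDS_TR

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def is_retweet(text):
    """Assumed convention: retweets start with 'RT @'."""
    return text.lstrip().lower().startswith("rt @")


def clean_tweet(text):
    """Lower-case, strip URLs, hashtags and ASCII punctuation, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [word for word in text.split() if word not in STOPWORDS]


def preprocess(tweets):
    """Exclude retweets and return one token list per remaining tweet."""
    return [clean_tweet(t) for t in tweets if not is_retweet(t)]


print(preprocess(["RT @user: vote now", "The election is TODAY! #Turkey2018 https://t.co/abc"]))
```

URLs and hashtags are removed before punctuation so that the patterns can still match the `://` and `#` characters.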
Fig. 19. Measure the overall polarity sentiment level for each elector
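Measuring the overall polarity level for a candidate amounts to labeling each tweet by its polarity score and reporting the share of each label. The sketch below assumes polarity scores in [-1, 1] and a zero threshold for the neutral class; both the threshold and the sample scores are assumptions, not values confirmed by the paper.

```python
def polarity_label(polarity, threshold=0.0):
    """Label a polarity score; scores within [-threshold, threshold] are neutral."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def polarity_percentages(polarities):
    """Percentage of positive, negative and neutral tweets for one candidate."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for p in polarities:
        counts[polarity_label(p)] += 1
    total = len(polarities)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical polarity scores for one candidate's tweets (assumption).
percentages = polarity_percentages([0.42, 0.4, 0.0, -0.3])
print(percentages)
```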
5 Observation
The datasets were examined using the approach described above, and interesting results
were generated. The testing covered two election candidates: Mr. Recep Tayyip Erdoğan
(RT Erdoğan, candidate #1) and his competitor Mr. Muharrem İnce (İnce, candidate #2).
The same experimental procedures were applied to both candidates’ datasets. Table 2
shows samples of positive, negative and neutral opinions for candidate #1 using the
Naïve Bayes Classifier.
The results showed that candidate #1 has the larger share of positive sentiment votes:
48% of the tweets about him were positive, which suggests that more people were pleased
to elect him. Moreover, 23% of the tweets about candidate #1 were negative, whereas 30%
were neutral. As for candidate #2, 35% of the tweets were identified as positive and
only 17% as negative, while the neutral votes were recorded as 48% of the overall
tweets. These results are comparable to the official election results, as shown in
Fig. 20 and Tables 2, 3 and 4.
Table 2. Comparison between actual Turkey presidential final result vs. Twitter predicted result
Table 3. The word cloud results from the collected dataset during June 22nd–24th, 2018, for both
candidates. It shows the top 100 most frequently repeated words and the 100 most positive words
about both candidates
Polarity | Sentiment score | Sample tweet
Positive | polarity=0.42023809523809524, subjectivity=0.559077380952381 | Zeeshan Haider, iamshanichadhar, 2018-06-22 03:55, 0, 0, "I do not know what he say in this video but every word is important for every Muslim he is real Hero as a leader
Positive | polarity=0.4, subjectivity=0.625 | Now look, where are the citizens of a great country when the president is in the palaces? how is this country represented from this profession? just open your mind and look at the state of the country. how does it look?
Neutral | polarity=0.0, subjectivity=0.0 | The AKP lie beneath the party feet that it enlists.
Sentimental Polarity Percentage
The actual Turkish presidential election results are shown in Fig. 14. Candidate 1
received 53.3% of the overall election votes, whereas candidate 2 received 30.4%. As
shown in Table 3, the proposed sentiment analysis model was successful in predicting
the final result of the presidential election. The model predicted positive support for
candidate 1 from the community, with 48% of the predicted Twitter votes. Compared with
the actual result of 53.3%, the prediction deviates by 5.3 percentage points, a relative
error of 9.94%. Nowadays, social media plays an essential role in sentiment analysis,
particularly when it is used to predict votes before the actual election results are
announced [12, 13]. The proposed model can also be useful for politicians, especially
if they need to understand their followers and supporters. As shown in Table 4, the
word cloud is an important tool for collecting negative sentiments from the community
and visualizing them. Candidates can use these data as a resource to improve their
strategy and attract more supporters. In addition, the model can be used in several
other cases, such as studying the activities of a candidate’s followers on social media
to understand their thoughts, opinions, and subjectivity.
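The comparison between predicted and actual vote share for candidate 1 can be checked with a few lines of arithmetic. The figures 48% and 53.3% come from the text above; reading the reported 9.94% as the relative error of the prediction is an interpretation, and `relative_error` is a helper introduced only for this sketch.

```python
def relative_error(predicted, actual):
    """Relative error of a prediction, in percent of the actual value."""
    return abs(actual - predicted) / actual * 100.0


predicted_share = 48.0  # positive-sentiment share for candidate 1 (from the text)
actual_share = 53.3     # reported actual vote share for candidate 1 (from the text)

abs_error = abs(actual_share - predicted_share)
rel_error = relative_error(predicted_share, actual_share)
print(f"absolute error: {abs_error:.1f} percentage points")
print(f"relative error: {rel_error:.2f}%")
print(f"100 - relative error: {100 - rel_error:.2f}%")
```

Under this reading, the prediction is off by 5.3 percentage points, which corresponds to a relative error of 9.94% and an agreement of about 90%.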
References
1. Heredia, B., Prusa, J., Khoshgoftaar, T.: Exploring the effectiveness of Twitter at polling the
United States 2016 presidential election. In: IEEE 3rd International Conference on Collaboration
and Internet Computing (CIC), pp. 283–290. IEEE (2017). https://doi.org/10.1109/cic.2017.00045
2. Abuaiadah, D., Rajendran, D., Jarrar, M.: Clustering Arabic tweets for sentiment analysis. In:
IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA),
pp. 449–456. IEEE (2017). https://doi.org/10.1109/aiccsa.2017.162
3. Joyce, B., Deng, J.: Sentiment analysis of tweets for the 2016 US presidential election.
In: Undergraduate Research Technology Conference (URTC), IEEE MIT, pp. 1–4. IEEE
(2017)
4. Huberty, M.: Can we vote with our tweet? On the perennial difficulty of election forecasting
with social media. Int. J. Forecast. 31(3), 992–1007 (2015).
https://doi.org/10.1016/J.IJFORECAST.2014.08.005
5. Yaqub, U., Atluri, V., Chun, S.A., Vaidya, J.: Sentiment based analysis of tweets during the
US presidential elections. In: Proceedings of the 18th Annual International Conference on
Digital Government Research, pp. 1–10 (2017)
6. Ussama, Y., Chun, S.A., Atluri, V., Vaidya, J.: Analysis of political discourse on Twitter in
the context of the 2016 US presidential elections. Gov. Inf. Q. 34(4), 613–626 (2017).
https://doi.org/10.1016/J.GIQ.2017.11.001
7. Gull, R., Shoaib, U., Rasheed, S., Abid, W., Zahoor, B.: Pre processing of Twitter’s data
for opinion mining in political context. In: Proceedings of 20th International Conference on
Knowledge-Based and Intelligent Information and Engineering Systems, KES 2016 (2016)
8. Jain, S., Sharma, V., Kaushal, R.: PoliticAlly: finding political friends on Twitter. In: IEEE
International Conference on Advanced Networks and Telecommunications Systems (ANTS),
pp. 1–3 (2015). https://doi.org/10.1109/ANTS.2015.7413659
9. Sharma, U., Yaqub, R., Pabreja, S., Chun, A., Atluri, V., Vaidya, J.: Analysis and visualization
of subjectivity and polarity of Twitter location data. In: Proceedings of the 19th Annual
International Conference on Digital Government Research Governance in the Data Age -
DGO 2018, pp. 1–10 (2018)
10. Ceron, A., Curini, L., Iacus, S.M., Porro, G.: Every tweet counts? How sentiment analysis of
social media can improve our knowledge of citizens’ political preferences with an application
to Italy and France. New Media Soc. 16(2), 340–358 (2014)
11. Hurriyet Daily News, As it happened: Erdoğan re-elected president, “People’s Alliance”
wins majority at parliament (2018).
http://www.hurriyetdailynews.com/turkey-election-live-updates-vote-counting-starts-as-polls-close-across-turkey-133726.
Accessed 16 August 2018
12. Burnap, P., Gibson, R., Sloan, L., Southern, R., Williams, M.: 140 characters to victory?:
Using Twitter to predict the UK 2015 general election. Electoral. Stud. 41, 230–233 (2016)
13. Vinay, J., Shishir, K.: Towards prediction of election outcomes using social media. Int. J.
Intell. Syst. Appl. 12, 20–28 (2017)