
Sentiment Analysis on Predicting Presidential Election: Twitter Used Case

Chapter in Communications in Computer and Information Science · March 2020


DOI: 10.1007/978-3-030-43364-2_10


2 authors:

Nedaa Al Barghuthi Huwida E. Said


Higher Colleges of Technology, United Arab Emirates, Sharjah Zayed University

All content following this page was uploaded by Nedaa Al Barghuthi on 20 November 2022.



Carlos Brito-Loeza
Arturo Espinosa-Romero
Anabel Martin-Gonzalez
Asad Safi (Eds.)

Communications in Computer and Information Science 1187

Intelligent Computing
Systems
Third International Symposium, ISICS 2020
Sharjah, United Arab Emirates, March 18–19, 2020
Proceedings
Sentiment Analysis on Predicting Presidential
Election: Twitter Used Case

Nedaa Baker Al Barghuthi (1) and Huwida E. Said (2)

(1) Higher Colleges of Technology, Sharjah, United Arab Emirates
(2) Zayed University, Dubai, United Arab Emirates

Abstract. Twitter is a popular tool for social interaction over the Internet. It allows
users to share and post opinions, follow social media events, and interact with political
figures and ordinary people. According to the Statista 2019 statistical report, the number
of Twitter users has grown dramatically over the past couple of years to reach an estimated
300 million. Twitter has become the largest source of news and postings for key presidents
and political figures. According to the Trackalytics 2019 report, the recent president of
the USA posted 4,000 tweets per year, an average of 11–12 tweets per day. Our research
proposes a technique that extracts and analyzes tweets and predicts election results based
on tweet analysis. It assesses public opinion and studies its impact in order to predict
the final results for the candidates in the Turkey 2018 presidential election. The predicted
results were compared with the actual election results and showed a high prediction
accuracy based on the 22,000 collected tweets.

Keywords: Twitter API · Virtualization · Data mining · Sentiment analysis · Tweets ·
Election · Positive polarity · Negative polarity

1 Introduction
At present, social media provides a massive amount of data about users and their social
interaction. This data plays a useful role in policy, health, finance, and many other sec-
tors as well as predicting future events and actions. In a relatively short period, social
media has gained popularity as a tool for mass communications and public participa-
tion when it comes to political purposes and governance. Obtaining a successful data
forecast helps to understand the limitations of predictability in social media and avoid
false expectations, misinformation, or unintended consequences. Rapid dissemination
of information through social networking platforms, such as Twitter, enables politicians
and activists to broadcast their messages to broad audiences immediately and directly
outside traditional media channels.
During the 2008 US election, Twitter was used as an essential tool to influence the
results of Barack Obama’s campaign [1]. The campaign succeeded in using Twitter as a
campaigning channel to gain more followers and, accordingly, more voters during the
17 months of the election period. It published 262 tweets and gained about 120,000 new
followers. As a result of this Twitter campaign, all major candidates and political
parties nowadays use social media as an essential tool to convey their messages [1].
This paper aims to study the expected patterns of political activity and Twitter
campaigns in this context. The Turkish presidential election is used as a case scenario
to extract and analyze the final campaign results. Section 2 highlights related work on
Twitter-based election prediction. Section 3 describes the approach and methods adopted,
and Sect. 4 details the sentiment analysis process. Section 5 reports the observations,
Sect. 6 analyzes and discusses the results, and Sect. 7 concludes the paper.

© Springer Nature Switzerland AG 2020
C. Brito-Loeza et al. (Eds.): ISICS 2020, CCIS 1187, pp. 105–117, 2020.
https://doi.org/10.1007/978-3-030-43364-2_10

2 Related Work on Election Prediction Based on Twitter


Twitter is a vital tool for assessing how users engage with social media [2]. Its
applications include gauging social and political sentiment, measuring the popularity of
political and social policies, marketing products, extracting positive and negative
opinions, discovering recent trends, and detecting product popularity. Distinguishing
sarcastic from non-sarcastic tweets is also crucial for detecting sentiment. Many studies
have examined the characteristics and features of political discourse on Twitter during
the 2016 US election. The objective of these studies is to develop a robust approach for
extracting data from Twitter to predict the results of upcoming elections. According to
[1, 3, 4], Twitter is a popular social media application. During the 2016 US presidential
election, people used social media to express their admiration for or dissatisfaction
with a particular presidential candidate. The authors measured the sentiment these tweets
expressed and compared it with poll data. They used an Opinion Finder lexicon and the
Naive Bayes machine learning algorithm to measure the sentiment of politics-related
tweets. In [3], 3,068,000 tweets containing the text ‘Donald Trump’ and 4,603,000
containing ‘Hillary Clinton’ were collected within 100 days before the election (see
Fig. 1). Similarly, [1] collected 3 million tweets within the same period. In both
studies, the authors manually labeled the collected tweets; another technique was also
used to label tweets automatically based on hashtag content. The authors concluded that
Twitter had become a more reliable environment. Observing the tweets during the 43 days
before the election with a moving-average technique revealed a 94% correlation with the
poll data (see Fig. 2).

Fig. 1. Tweet volume chart
Fig. 2. Correlation coefficient for Trump and Clinton tweets 43 days before the election
using a moving average of k days
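The moving-average comparison with poll data can be illustrated with a short sketch. This is not the code used in [1] or [3]: it assumes a trailing k-day average and a Pearson correlation coefficient, and the daily values below are invented purely for illustration.

```python
from math import sqrt


def moving_average(values, k):
    """Trailing k-day moving average; returns len(values) - k + 1 points."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and len(values)")
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily values (assumption: not taken from any cited study).
daily_sentiment = [0.40, 0.55, 0.35, 0.60, 0.50, 0.70, 0.65, 0.80]
daily_poll = [41.0, 41.5, 42.0, 42.2, 43.0, 43.5, 44.1, 44.8]

k = 3
smoothed = moving_average(daily_sentiment, k)
# Drop the first k - 1 poll days so both series cover the same dates.
aligned_poll = daily_poll[k - 1:]
r = pearson(smoothed, aligned_poll)
print(f"correlation with k={k}: {r:.3f}")
```

Repeating the last three lines for several values of k would give one correlation per window length, which is the kind of curve Fig. 2 describes.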
The lexicon [3] contains 1,600 positive and 1,200 negative words. The results of
applying this analysis are shown in Figs. 3 and 4.
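A lexicon-based score of this kind can be sketched as a count of positive words minus a count of negative words. The word sets below are a tiny hypothetical stand-in for the lexicon in [3], and the zero threshold is an assumption rather than the rule used in that study.

```python
import string

# Hypothetical mini-lexicon; the real lexicon in [3] is far larger.
POSITIVE_WORDS = {"great", "win", "strong", "good"}
NEGATIVE_WORDS = {"bad", "corrupt", "weak", "lie"}


def lexicon_score(text):
    """Number of positive words minus number of negative words in the text."""
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    return positives - negatives


def lexicon_label(text):
    """Map the score to a label, assuming zero is the neutral point."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("A great and strong speech"))
```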

Fig. 3. Lexicon sentiment for Trump hashtags
Fig. 4. Lexicon sentiment for Clinton hashtags

The Natural Language Toolkit (NLTK) was used for the Naïve Bayes algorithm. Five hundred
negative and five hundred positive tweets were labeled for this algorithm. Two search
keywords (‘Hillary Clinton’ and ‘Donald Trump’) were used. The criteria of this algorithm
and the analysis results are shown in Figs. 5 and 6. The authors in [1] used a machine
learning approach for automatic tweet labeling. A Convolutional Neural Network (CNN) was
implemented using Python and TensorFlow. The model was trained on the Sentiment140 data
set, after which the collected tweets were labeled as having positive or negative
sentiment. The election dataset was used for evaluation and for predicting the votes
using two algorithms and sentiment analysis. The predicted votes were compared with the
polling statistics collected during the election period, and the expected result matched
the actual voting result [5].

Fig. 5. NLTK sentiment for Clinton hashtags
Fig. 6. NLTK sentiment for Trump hashtags

Moreover, [6] produced a text analysis of the 2016 American elections. In this research,
three million tweets were collected within 21 days before and after Election Day.
Emotion analysis was used to examine users’ sentiment behavior in relation to their
Twitter profiles and associated features. Both research papers studied Twitter and
discussed its relevance to news and events. SentiStrength was used for sentiment score
calculation. The collected tweets were filtered to messages referring to the search
keywords ‘election 2016’, ‘Hillary Clinton’ or ‘Donald Trump’. Furthermore, [6] tested
several hypotheses to study the characteristics of political user behavior, views, and
discussions on Twitter. The analysis showed that the majority of the sentiment in the
collected tweets was negative for both candidates.
This indicates a negative mood surrounding the latest election. The analysis also
revealed that only a small number of tweets were posted during debate sessions and that
most of them were retweets. The authors in [7] provided a novel method to facilitate
mining participants’ opinions with the support of linguistic analysis and sentiment
ratings. It was used to identify the sentiment score level of political participants in
Pakistan. According to the literature, this paper was recognized as the first to study
sentiment analysis on social media. The authors introduced a new technique for mining
opinion in a political context. Two classification analyses were applied to the collected
data using the Naïve Bayes and Support Vector Machine (SVM) algorithms. The results of
this approach are shown in Fig. 8. The sentiment analysis was performed with the
Sentiment Viz web interface, which provides a graphical analysis of the different
sentiment categories (see Fig. 7). Figure 8 also shows that SVM performed better than
the Naïve Bayes algorithm and gave higher accuracy.

Fig. 7. Sentiment classification defined over the keyword “PTI”
Fig. 8. Comparisons of SVM and Naive Bayes

Figure 9 shows an output display of the proposed model. The model can display the data
in a visual environment. The authors of [7] used it to represent negative opinion about
the Pakistan People’s Party (PPP) across different places and cities. The proposed model
was tested on the Pakistan election, and the sentiment analysis shows that Lahore has
the highest percentage of negative opinion about the PPP in the country (see Fig. 9).
Other parties can view this information from a different angle, which allows an easy and
clear comparison with respect to the PPP. SVM was also used to determine the polarity of
the sentiment. Based on features of the data, it labels the polarity of each tweet’s
sentiment as positive or negative, or otherwise as neutral, along with the political
party name for the posted tweets [9] (see Fig. 10).

Fig. 9. The magnitude of negativity about PPP in different cities
Fig. 10. Working of SVM Model
The author in [8] improved political sentiment analysis on Twitter using a dynamic
keyword method. That paper presents a new method for collecting data to predict election
results and a way for topic models to extract topics [10]. An RSS aggregator tracks news
articles while dynamic keywords shared on Twitter are collected at the same time,
creating an intelligent prediction method that depends mainly on tweet volume. It also
attempts to enhance electoral predictions on social media using a dynamic keyword-based
methodology for the Delhi Assembly Election 2015. Data was collected through two
channels, RSS (see Fig. 11) and the Twitter API. Two types of analysis were applied to
the collected data: volume-based and sentiment-based. This analysis is useful for
detecting voting activity in tweets through keyword searches related to ‘election’ using
the LDA-based approach (see Fig. 12). It is based on simple opinion and comparison-based
opinion. It is evident from the graph (see Fig. 10) that SVM techniques give good
results.

Fig. 11. Selection procedure for topics and keywords from RSS feeds
Fig. 12. Tag cloud showing the most relevant keywords in RSS

3 Methodology
Two datasets were generated using the Twitter API for data collection. Each dataset has
22,000 tweets. The tweets were collected in the days immediately before the Turkish
presidential election, which was held on 24 June 2018. The Spyder Python development
environment was used for the data mining and sentiment analysis process (see Table 1).
The datasets passed through multiple processing stages during the analysis. Step 1
translates the text in the datasets into English, using a Python script written in
Spyder together with a translation parser.

Table 1. Tools and Software packages used in the Tweet Analysis

Tools used                                Software package used
Twitter API                               Tweepy package
Spyder Python Development Environment     Json and CSV packages
Anaconda Navigator Environment            Textblob package
SentiStrength Classifier Toolbox          Naive Bayes Analyzer package
Twitter API                               Wordcloud package

Step 2 converts the dataset’s text to lower case. A filtering process was then applied
to eliminate unwanted text and remove redundant data for better extraction. A custom
stop-word lexicon containing English and Turkish words was used to filter the dataset’s
text. The filtered text was stripped of punctuation symbols, URLs, and hashtags, and all
re-tweeted texts were excluded. After that, word and sentence tokenization was
performed, dividing each tweet into discrete words and sentences. Sentiment analysis was
then performed on the output text using a Naïve Bayes classifier, which classifies
sentiment polarity into three levels: positive, negative, and neutral.
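The filtering stages described above can be sketched with the Python standard library alone. This is an illustrative reconstruction, not the authors' script: the stop-word set is a small hypothetical sample of English and Turkish words, retweets are assumed to be recognizable by a leading "RT @", and hashtags are assumed to be dropped entirely rather than having only the "#" removed.

```python
import re
import string

# Hypothetical stop-word sample; the paper's custom lists are not reproduced here.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "ve", "bir", "bu", "için"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def is_retweet(text):
    """Assumption: retweets start with the conventional 'RT @' prefix."""
    return text.lstrip().upper().startswith("RT @")


def preprocess(text):
    """Lower-case, strip URLs, hashtags and punctuation, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [token for token in text.split() if token not in STOPWORDS]


def clean_dataset(tweets):
    """Exclude retweets and return one token list per remaining tweet."""
    return [preprocess(tweet) for tweet in tweets if not is_retweet(tweet)]


sample = ["The vote is TODAY! #Election https://t.co/abc", "RT @user: hello"]
print(clean_dataset(sample))
```

URLs are removed before punctuation so that the pattern can still match the "://" separator.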

Fig. 13. Typical pre-processing scenarios for standard text and tweets

Finally, several data processes were applied to the dataset text using word cloud and
word frequency algorithms, and multiple visual presentations and measurements were
generated and displayed. The actual election results were then compared with the Twitter
sentiment analysis results to evaluate the precision of the overall prediction.
Figures 13 and 14 show the processes performed on the datasets.

Fig. 14. Flowchart sentiment analysis algorithm
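The word-frequency step that feeds a word cloud can be sketched with `collections.Counter`. The function name `top_words` and the sample tokens are hypothetical; the sketch assumes the tweets have already been cleaned and tokenized.

```python
from collections import Counter


def top_words(token_lists, n=100):
    """Return the n most frequent tokens across all tweets as (word, count) pairs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


# Hypothetical tokenized tweets, for illustration only.
tokenized = [["vote", "turkey"], ["vote", "election"], ["vote", "turkey"]]
print(top_words(tokenized, n=2))
```

The resulting (word, count) pairs are the kind of input a word cloud renderer typically consumes.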

4 Sentiment Analysis Process


a. Tokenize each text into words
Each tweet is tokenized and split into separate words, as shown in Fig. 15.

Fig. 15. Tokenize each text into words



b. Converting text to lower case and spell check process


Each tokenized word is converted to lower case. It is then validated against an English
word list. If a typo is found, it is corrected and written back to the dataset, as shown
in Fig. 16.

Fig. 16. Converting text to lower case and spell check process

c. Create custom stop-word lists and remove the unwanted text from the main dataset
Two custom stop-word dictionaries were created using UTF-8 encoding, one for English and
one for Turkish. These lists contain the most frequent words that people commonly use.
Such words are removed from the dataset before and after the data processing, as shown
in Fig. 17.

Fig. 17. Create a custom stop words and remove the unwanted text from the main dataset

d. Sentiment Polarity Analysis Process


The TextBlob and NaiveBayesAnalyzer packages were installed and imported. The Naïve
Bayes sentiment classifier is used for word sentiment classification. The function
returns a score between −1 and 1. If the sentiment score is greater than 0, the tweet
has a positive sentiment. If the score is less than 0, the tweet has a negative
sentiment. If the score is equal to zero, the tweet is considered to have a neutral
sentiment, as shown in Fig. 18.

Fig. 18. Sentiment polarity analysis process
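The mapping from a polarity score to a label can be written as a small function. The sketch assumes the score lies in the range −1 to 1 with zero as the neutral point, as produced for example by the `sentiment.polarity` attribute of a TextBlob object; the function name `polarity_label` is hypothetical, and the example scores are the polarity values listed in Table 4.

```python
def polarity_label(score):
    """Map a polarity score in [-1, 1] to positive, negative, or neutral."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity score must lie between -1 and 1")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Polarity values taken from the sample rows of Table 4.
for score in (0.4, -0.46875, 0.0):
    print(score, polarity_label(score))
```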

e. Measure the overall polarity sentiment level for each candidate


The overall polarity level percentage (positive, neutral, and negative) is calculated
for each candidate (see Fig. 19).

Fig. 19. Measure the overall polarity sentiment level for each candidate
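The per-candidate percentages can be computed by counting the labels assigned to that candidate's tweets. This is a minimal sketch; the function name `polarity_percentages` and the label counts are hypothetical and do not reproduce the paper's dataset.

```python
from collections import Counter


def polarity_percentages(labels):
    """Percentage of positive, neutral, and negative labels in a list of labels."""
    total = len(labels)
    if total == 0:
        raise ValueError("at least one labeled tweet is required")
    counts = Counter(labels)
    return {
        name: 100.0 * counts[name] / total
        for name in ("positive", "neutral", "negative")
    }


# Hypothetical labels for one candidate (100 tweets, invented for illustration).
labels = ["positive"] * 48 + ["neutral"] * 30 + ["negative"] * 22
print(polarity_percentages(labels))
```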

5 Observation
The datasets were examined using the approach described above, and interesting results
were generated. The testing covered two election candidates: Mr. Recep Tayyip Erdoğan
(RT Erdoğan, candidate #1) and his competitor Mr. Muharrem İnce (İnce, candidate #2).
The same experimental procedures were applied to both candidates’ datasets. Table 4
shows samples of positive, negative, and neutral opinions obtained with the Naïve Bayes
classifier.
The results showed that candidate #1 has more positive sentiment votes: 48% positive
sentiment reflects that more people are pleased to elect him, whereas his competitor has
35% positive sentiment votes. Moreover, 23% of the tweets about candidate 1 carried
negative sentiment, whereas 30% were neutral. As for candidate 2, 35% of the tweets were
identified as positive and only 17% as negative; the neutral tweets made up 48% of the
overall tweets for this candidate. These results are comparable to the official election
results, as shown in Fig. 20 and Tables 2, 3 and 4.

Fig. 20. Actual Turkey presidential elections 2018 results [11]

Table 2. Comparison between actual Turkey presidential final result vs. Twitter predicted result

Turkey presidential elections 2018                            Candidate 1   Candidate 2
Actual final election results on June 25, 2018                53.3%         30.4%
Predicted election results through the proposed Twitter
sentiment analysis on June 20–24, 2018                        48.0%         35.0%
Accuracy error percentage                                     9.94%         11.84%

Table 3. Word cloud results from the dataset collected during June 22nd–24th, 2018, for
both candidates. It shows the top 100 most frequently repeated words and the top 100
positive words about each candidate

Related tweets to:                                            Candidate 1    Candidate 2
Top 100 frequent words on Twitter related to each candidate   [word cloud]   [word cloud]
Top 100 positive words related to each candidate              [word cloud]   [word cloud]

Table 4. Positive, negative and neutral sentiment

Positive
  Candidate 1
    Sentiment: polarity=0.42023809523809524, subjectivity=0.559077380952381
    Tweet: Zeeshan Haider, iamshanichadhar, 2018-06-22 03:55,0,0," ""I do not know what
    he say in this video but every word is important for every Muslim he is real Hero as
    a leader
  Candidate 2
    Sentiment: polarity=0.4, subjectivity=0.625
    Tweet: Now look, where are the citizens of a great country when the president is in
    the palaces? how is this country represented from this profession? just open your
    mind and look at the state of the country. how does it look?

Negative
  Candidate 1
    Sentiment: polarity=-0.6999999999999998, subjectivity=0.6666666666666666
    Tweet: Vicki Andrada, BlindNewsGirl, 2018-06-22 03:27,0,1," ""I fear if # Erdogan
    realizes by doing it the democratic way is not working," he will go really violent
    on the Turkish people, worse than he I have already had it .. Perhaps then he'd go
    too far for Turks," but it's a very scary situation
  Candidate 2
    Sentiment: polarity=-0.46875, subjectivity=0.8
    Tweet: No trolls! It is no? Peace, theft, contradiction with Allah

Neutral
    Sentiment: polarity=0.0, subjectivity=0.0
    Tweet: The AKP lie beneath the party feet that it enlists.

Sentimental Polarity Percentage
    [figure]

6 Analysis and Discussion

The actual Turkey presidential election results are shown in Fig. 20. Candidate 1
received 53.3% of the overall votes, whereas candidate 2 received 30.4%. As shown in
Table 2, the proposed sentiment analysis model was successful in predicting the final
outcome of the presidential election. The model predicted that candidate 1 had positive
support from the community with 48% of the predicted Twitter votes. Compared with the
actual voting result, this prediction has an accuracy error of 9.94%. Nowadays, social
media plays an essential role in sentiment analysis, particularly when it is used to
predict votes before the actual election results are announced [12, 13]. The proposed
model can also be useful for politicians who need to understand their followers and
supporters. As shown in Table 3, the word cloud is important for collecting and
visualizing the negative sentiments of the community. Candidates can use these data as a
resource to improve their strategy for attracting more supporters. Furthermore, the
model can be used in other cases, such as studying the activities of a candidate’s
followers on social media to understand their thoughts, opinions, and subjectivity.
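The 9.94% figure for candidate 1 is consistent with a relative error, |actual − predicted| / actual. The paper does not state its formula, so the sketch below is an assumption that happens to reproduce that value from the shares in Table 2; the function name `relative_error_pct` is hypothetical.

```python
def relative_error_pct(actual, predicted):
    """Relative error of a predicted vote share, as a percentage of the actual share."""
    return abs(actual - predicted) / actual * 100.0


# Shares as printed in Table 2.
candidate_1 = relative_error_pct(53.3, 48.0)
candidate_2 = relative_error_pct(30.4, 35.0)
print(round(candidate_1, 2), round(candidate_2, 2))
```

Applied to the candidate 2 shares as printed (30.4% actual, 35.0% predicted), the same formula gives about 15.13% rather than the 11.84% listed in Table 2, so that entry may have been computed differently or from a different predicted share.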

7 Conclusion and Future Work


In this research, a systematic sentiment analysis of collected tweets for predicting
presidential election results was introduced. The data was collected from Twitter using
the Twitter API. Polarity sentiment analysis and word cloud counts were applied to the
collected tweets, and text files were created to compare the tweets. The tweets were
classified with a Naïve Bayes classifier into three levels: positive, negative, and
neutral. When the experiment was executed, we were able to generate a word cloud of the
most repeated words found in the retrieved tweets. The classified data was visualized
with pie charts and word clouds based on the sentiment scores. As future work, the
authors would like to develop an in-depth analysis of the emotional behavior for
election candidates. We would also like to enhance our prediction approach by adopting
multiple classifiers and comparing the outcome with the real results.

References
1. Heredia, B., Prusa, J., Khoshgoftaar, T.: Exploring the effectiveness of Twitter at polling the
United States 2016 presidential election. In: IEEE 3rd International Conference on
Collaboration and Internet Computing (CIC), pp. 283–290. IEEE (2017).
https://doi.org/10.1109/cic.2017.00045
2. Abuaiadah, D., Rajendran, D., Jarrar, M.: Clustering Arabic tweets for sentiment analysis. In:
IEEE/ACS 14th International Conference on Computer Systems and Applications (AICCSA),
pp. 449–456. IEEE (2017). https://doi.org/10.1109/aiccsa.2017.162
3. Joyce, B., Deng, J.: Sentiment analysis of tweets for the 2016 US presidential election.
In: Undergraduate Research Technology Conference (URTC), IEEE MIT, pp. 1–4. IEEE
(2017)
4. Huberty, M.: Can we vote with our tweet? On the perennial difficulty of election
forecasting with social media. Int. J. Forecast. 31(3), 992–1007 (2015).
https://doi.org/10.1016/J.IJFORECAST.2014.08.005
5. Yaqub, U., Atluri, V., Chun, S.A., Vaidya, J.: Sentiment based analysis of tweets during the
US presidential elections. In: Proceedings of the 18th Annual International Conference on
Digital Government Research, pp. 1–10 (2017)
6. Ussama, Y., Chun, S.A., Atluri, V., Vaidya, J.: Analysis of political discourse on Twitter in
the context of the 2016 US presidential elections. Gov. Inf. Q. 34(4), 613–626 (2017).
https://doi.org/10.1016/J.GIQ.2017.11.001
7. Gull, R., Shoaib, U., Rasheed, S., Abid, W., Zahoor, B.: Pre processing of Twitter’s data
for opinion mining in political context. In: Proceedings of 20th International Conference on
Knowledge-Based and Intelligent Information and Engineering Systems, KES 2016 (2016)
8. Jain, S., Sharma, V., Kaushal, R.: PoliticAlly: finding political friends on Twitter. In: IEEE
International Conference on Advanced Networks and Telecommunications Systems (ANTS),
pp. 1–3 (2015). https://doi.org/10.1109/ANTS.2015.7413659
9. Sharma, U., Yaqub, R., Pabreja, S., Chun, A., Atluri, V., Vaidya, J.: Analysis and visualization
of subjectivity and polarity of Twitter location data. In: Proceedings of the 19th Annual
International Conference on Digital Government Research Governance in the Data Age -
DGO 2018, pp. 1–10 (2018)
10. Ceron, A., Curini, L., Iacus, S.M., Porro, G.: Every tweet counts? How sentiment analysis of
social media can improve our knowledge of citizens’ political preferences with an application
to Italy and France. New Media Soc. 16(2), 340–358 (2014)
11. Hurriyet Daily News, As it happened: Erdoğan re-elected president, “People’s Alliance”
wins majority at parliament (2018).
http://www.hurriyetdailynews.com/turkey-election-live-updates-vote-counting-starts-as-polls-close-across-turkey-133726.
Accessed 16 August 2018
12. Burnap, P., Gibson, R., Sloan, L., Southern, R., Williams, M.: 140 characters to victory?:
Using Twitter to predict the UK 2015 general election. Electoral. Stud. 41, 230–233 (2016)
13. Vinay, J., Shishir, K.: Towards prediction of electronic outcomes using social media. Int. J.
Intell. Syst. Appl. 12, 20–28 (2017)
