Social Network Based Sentiment Analysis and Election Prediction

This document discusses using social media sentiment analysis to predict election results in India. It analyzes tweets to gauge public opinion on political parties and leaders. The researchers collected Twitter data during the 2019 Indian general election and analyzed it using sentiment analysis techniques. Their model found Narendra Modi and the BJP to be the most popular leader and party on Twitter. When compared to actual results, the predictions from social media analysis proved fairly accurate. Previous related work on predicting elections through Twitter is also discussed.


Social Network based Sentiment Analysis and Election Prediction

Dr. Purnendu Karmakar, Department of Electronics and Communication Engineering
Aakash Jhawar, Department of Electronics and Communication Engineering
Saket Ranjan, Department of Communication and Computer Engineering
Vaibhav Munjal, Department of Electronics and Communication Engineering
The LNM Institute of Information Technology, Jaipur, India

Abstract—The General Elections in India are a colossal event. Over 900 million eligible voters were registered this term to choose their leaders. With a population of around 1.2 billion people, India is one of the world's largest democracies, and its elections have many interesting aspects. One of them is predicting the results: Who will win? Who will return for a second term? What is the general mood of the public? Which party do the people favour? Social media is abuzz with opinions and discussions from all over India, so with some effort it can give a fairly solid idea of the public's sentiment about a specific party or topic. Much previous work has gauged the mindset of people using social media, specifically through sentiment analysis of Twitter posts. Sentiment analysis attributes a sentiment to a document using features of the document and Natural Language Processing (NLP) techniques. A document can be classified into categories such as positive, negative, neutral, angry, happy or sad, which is especially helpful when gauging people's opinion of a subject from what they wrote about it. We used Twitter data and applied sentiment analysis to find the general consensus of the public about different political leaders and political parties. The model was first tested on the US Elections 2016, where it gave fairly accurate results, with Donald Trump leading in social media popularity. We then used the model to predict the outcome of the General Elections 2019 in India. The analysis gave some interesting results and also showed the effect of recent controversies circulating on social media. The model put Narendra Modi as the most popular leader and the BJP as the most popular party on Twitter. The analysis also showed what fraction of users supported BJP/Narendra Modi and what fraction supported Congress/Rahul Gandhi. The results were fairly accurate when compared with other sources, and the model and analysis could prove useful in gauging the mindset of the general public about a specific topic, product or idea.

Index Terms—component, formatting, style, styling, insert

I. INTRODUCTION

A. Problem Addressed
Social media has become part of our lives; people use it to express their thoughts, demands and ideas. The role of social media grew considerably between the elections held in 2009 and 2014, and political parties now use it as an advertising medium. According to a market research agency, the number of Indian internet users has grown to 627 million. The ICUBE 2018 report [2], which tracks digital adoption and usage trends in India, noted that the number of internet users in India recorded 18 percent annual growth and was estimated at 566 million as of December 2018, an overall increase of 40 percent.
Many social media platforms are used in India, including Facebook, Twitter, Instagram, Google+, LinkedIn and Quora. Twitter, the microblogging service, is the second most used social media website in India, with about 34.4 million users [3]. The introduction of cheap mobile networks has dramatically increased the number of internet users in India, and candidates use this medium to connect with millions of voters.
The problem this paper addresses is whether this vast data can be used to find patterns, preferences and trends that predict future results. Many researchers have tried to use Twitter to predict election outcomes. The logic behind these studies is to collect data using the Twitter API, apply different classifiers, and look for trends that predict the election. Despite these efforts, many issues remain; for example, whether a user holds a positive or negative view on a topic is inferred only from the words in the text of the gathered tweets.

B. Research Motivation
Twitter is the second most used social media platform in India. The microblogging website lets users express their opinions within a limit of 140 characters. Many politicians, media houses, journalists and members of the general public in India use Twitter to express their political orientations.
Many research papers have been written in this context, but on American elections, which had only 2 candidates, about 50 million users, and English as the only medium of communication [4]. India is a vast country, with thousands of parties and thousands of candidates in the Lok Sabha elections 2019. With this kind of diversity, the problem of finding political orientation becomes bigger and more complex. All these factors motivated us to work on this idea and to look for interesting patterns in the data.

C. Research Aim
With data collected in real time, we want to relate tweets to the sentiments associated with them and draw inferences and patterns from them. We aimed to find the common hashtags and patterns, the people at the centre of the discussion, and the kinds of propaganda circulating at election time. The second part of the project was to measure the popularity of politicians such as Rahul Gandhi and Narendra Modi.
Since two alliances dominated the news, the National Democratic Alliance (NDA) and the United Progressive Alliance (UPA), we tried to classify statements as ProBJP, ProCongress or neutral.
Since Indian politics is vast, we used a wide range of keywords for data acquisition, discussed in the following sections, pre-processed the tweets to analyze them better, and applied sentiment analysis to them. The main aim of our project is to analyze the tweets collected over the entire period and draw meaningful conclusions from them.

II. BACKGROUND
Although analyzing social media became popular only a few years ago, many researchers have tried to develop methods to find out what users think about different topics.
Tumasjan et al. [5] analyzed about 100,000 tweets in 2010, when analyzing text for patterns and sentiments was just starting, and claimed that the mere mention count, or volume, of tweets was enough to predict an election. We tried to apply this concept to our model.
During a presidential election, Wolska and Bouguera [6] published a paper whose purpose was not only to predict the correct election result but also to find interesting patterns during the elections. Furthermore, opinion mining was used to predict the loyalty of users and the general public. The authors claimed that social network analysis and sentiment analysis can be used to predict political as well as economic changes in the future.
A 2013 paper [7] tried to predict the result of the 2013 Pakistan election, considering three major political parties in Pakistan: Tehreek-e-Insaaf (PTI), Pakistan Muslim League Nawaz (PMLN), and Muttahida Qaumi Movement (MQM). The authors analyzed tweets from the last 4 days of the election, pre-processed them, and applied different algorithms to find each person's opinion. According to their analysis PTI should have emerged as the winner, but the actual winner was PMLN. PTI did, however, obtain a significant victory in one province. We could conclude that Twitter helped PTI make a positive impact on the election and motivate young voters in its favour, although Twitter cannot be considered to represent the whole voting population.
A 2012 paper [8] presented a tool called Tara tweet for defining experiments and capturing defined conversations. Using this tool, the authors analyzed 500,000 (5 lakh) tweets, showed the results on the Tara tweet website, and obtained adequate results. Because voters can change their minds until the last second, the vote percentages of some parties changed, which is normal; such experiments let us analyze the relation between actual votes and mentions. The experiment concluded that Twitter may be used to analyze the political situation in a country.
Another interesting paper [9], "Emotion analysis of Twitter using opinion mining", went beyond classifying opinions as positive, negative or neutral and categorized them into 5 emotions: Happiness, Anger, Fear, Sadness and Disgust. The authors followed a two-step approach: they first extracted the opinion words and then applied a novel algorithm to assign emotional values to them. Their work suggests that the approach can be applied to visualizing political scenarios.
Another researcher [10] focused on showing that using social media sometimes leads to wrong results because the data is not adequately tested. The concern was that many political parties push propaganda and fake news through various means and involve "actors" who influence people. The paper concluded that linking tweets to votes is not an appropriate measure, but that Twitter can be used to find voter sentiment.
The paper "Prediction of Indian Election Using Sentiment Analysis on Hindi Twitter" [11] starts from the observation that India is a diverse country and that many tweets about Indian elections are written in Hindi. The authors used HindiSentiWordNet and different algorithms to classify tweets as positive, negative or neutral, and successfully predicted voter sentiment from Hindi tweets.
III. PROPOSED WORK

Figure 1. Project Flow Chart

A. Project Work Flow
The project was divided into five subtasks to keep the workflow organized and make collaboration easier. The first task was to collect the data for our model so that it could be processed and worked upon. The detailed process is explained later in the paper.
Our next task was to convert the JSON-format tweets into CSV format to make them easier to work with. The tweets were then cleaned and pre-processed to extract only the meaningful information.
After cleaning and pre-processing, we ran some preliminary analysis modules to learn the basic facts about the data and to generate other stories from it.
The next step was to perform sentiment analysis with TextBlob on the cleaned and pre-processed tweets and to save the polarity of each tweet along with its text, which tells us the sentiment of the tweet or of the person who wrote it.
The last step was to gather the information gained from the sentiment analysis to determine the positive and negative support for each candidate or party, and then to predict the result of the event from the percentage of positive and negative tweets.

B. Data Acquisition

Figure 2. Data Acquisition Flow Chart

1) Collecting data using Twitter APIs: The first step of our project was to collect relevant data. We used Twitter posts for our sentiment analysis, so we collected data from Twitter over a period of time. The official Twitter API offers two methods for collecting data. The first is the REST API, which only provides data for the last 7 days. The other is the Streaming API, which collects data in real time: we track certain hashtags or terms of interest, and the API returns a stream of Twitter posts related to those terms in JSON format.
2) Tracking specific terms in Streaming API: We first used the REST API to collect data for the last 7 days for some specific terms. After that, we used the Streaming API to track terms relevant to us in real time and saved the incoming tweets in a JSON file. The tracked terms had to be chosen manually and carefully, since collecting relevant data was a major part of our analysis. The terms we decided on were: 'BJP', 'Congress', 'Narendra Modi', 'Rahul Gandhi', 'Loksabha Elections 2019', 'India Elections', 'Elections2019', 'Elections 2019', 'AssemblyElections2019', 'LokSabhaElections2019', 'LokSabha2019', 'Loksabha Elections 2019'.
3) Timings for data acquisition and volume of Tweets: As can be seen, we limited our search to a couple of popular parties, popular leaders, and terms and hashtags related to the elections in general. We also decided to collect data three times a day, because the volume of tweets was very high from 8 AM to 10 AM, from 12 PM to 2 PM, and from 8 PM to 10 PM. Collecting during these slots rather than around the clock gave us a large amount of data: 11 GB and a total of 14.5 million tweets in JSON format over the collection days.
The tweets returned in JSON format were mainly of 3 types: normal tweets, retweeted tweets and quoted tweets. Each type carries different items, so we needed to process each type differently to extract the items important for our analysis, as discussed in the next section.

C. Data Preprocessing

Figure 3. Data Preprocessing Flow Chart

The main objective of the pre-processing step is to clean out noise that is of little relevance to the sentiment of a tweet or text. This noise can be punctuation, numbers, special characters, and terms that carry little weight in the context of the text.
1) Removing Twitter Handles @user: The tweets in the dataset contain many Twitter user handles of the form @user, which is how each Twitter user is identified. We removed all these handles from the data, as they convey little information. We used a regular expression that picks out words beginning with @ and removes them.
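The handle-removal step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the paper does not give its regular expression, so the pattern `@\w+`, the names `HANDLE_PATTERN` and `remove_handles`, and the sample tweet are all assumptions.

```python
import re

# Assumed pattern: '@' followed by letters, digits or underscores.
# The paper only says it matches "words beginning with @".
HANDLE_PATTERN = re.compile(r"@\w+")


def remove_handles(tweet: str) -> str:
    """Delete @user handles and collapse the whitespace left behind."""
    without_handles = HANDLE_PATTERN.sub("", tweet)
    return " ".join(without_handles.split())


if __name__ == "__main__":
    # Made-up tweet used only for illustration.
    sample = "@user1 thanks @user_2 for the #Elections2019 update"
    print(remove_handles(sample))  # thanks for the #Elections2019 update
```

Note that this simple pattern also matches the part of an e-mail address from the @ sign up to the first dot, so a stricter pattern may be preferable if such strings occur in the data.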
2) Removing Punctuations, Numbers, and Special Characters: As discussed above, numbers, punctuation, emoticons and special characters do not help much, as they carry little weight. We therefore kept only alphabetic characters and the pound character (#) and replaced everything else with spaces.
3) Remove short words: We removed the words whose length is less than 2. The length threshold must be selected carefully. For example, words like oh and hmm are of very little use, and it is better to get rid of them.
4) Tokenization: We tokenized all the cleaned tweets. Tokens are usually individual terms or words, and "tokenization" is taking a text or set of texts and breaking it up into its individual words. These tokens are then used as the input for other types of analysis or tasks, such as parsing (automatically tagging the syntactic relationship between words). Tokenizing a tweet or a sentence thus yields a sequence of tokens that roughly correspond to words. Tokenization is one of the basic tasks of NLP.
5) Stemming: We used the Porter Stemmer algorithm [12] to stem the dataset. Stemming strips suffixes (ing, ly, es, s, etc.) from a word, and the Porter Stemmer does this for English suffixes. For example, love, lovely, loved, lovable and loving are different variations of the root word love.

D. Sentiment Analysis
TextBlob [13] is a Python library for performing basic Natural Language Processing tasks on textual data. All of its NLP operations are available through a simple TextBlob API. TextBlob objects behave like Python strings, so they can be changed and transformed in the same way.
TextBlob takes a string, such as a tweet, as input and returns a floating-point number between -1 and 1 that represents the polarity of the sentence.
TextBlob has been used by many researchers for sentiment analysis. A recent paper by Nausheen and Begum [14], "Sentiment Analysis to Predict Election Results Using Python", used TextBlob for sentiment analysis and translation. The authors found that the result predicted using TextBlob matched the real outcome.

IV. SIMULATION AND RESULTS
We classified the tweets into two categories, positive and negative. Based on our set of keywords, we collected 1464943 tweets for the India Elections 2019.

A. Most common hash tags

Figure 4. Occurrences of Most Frequent Hashtags

The hashtags '#bjp', '#everyvoteformodi', '#indiawantsmodiagain' and '#bharatmodikesaath' were the most common and the hottest topics during the election period. There were also anti-BJP hashtags such as '#bjpjumlamanifesto' and '#modiwithterrorist'. The tweets therefore contain a variety of hashtags, both for and against the BJP, but the hashtags in favour of Narendra Modi or the BJP outnumber those in favour of Rahul Gandhi or Congress.

B. Most frequent terms

Figure 5. Occurrences of Most Frequent Words

The most frequent words used during the elections were 'bjp', 'congress', 'modi', 'rahul', 'pragya', etc. These words can come from positive or negative tweets, but the absolute number of tweets pertaining to the BJP was considerably higher than for Congress, so people were tweeting more about the BJP than about Congress. Modi was also much more popular than Rahul Gandhi on Twitter.
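Frequency counts like those behind the hashtag and word charts can be produced with Python's standard library. The sketch below is illustrative only: the function names, the regular expressions, the `min_length` default and the sample tweets are assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Assumed patterns: a hashtag is '#' plus word characters; a word is a run of
# lowercase ASCII letters (tweets are lowercased before matching).
HASHTAG_PATTERN = re.compile(r"#\w+")
WORD_PATTERN = re.compile(r"[a-z]+")


def top_hashtags(tweets, n=10):
    """Return the n most common hashtags, compared case-insensitively."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(tweet))
    return counts.most_common(n)


def top_words(tweets, n=10, min_length=3):
    """Return the n most common words, skipping hashtags and short words."""
    counts = Counter()
    for tweet in tweets:
        text = HASHTAG_PATTERN.sub(" ", tweet.lower())
        counts.update(w for w in WORD_PATTERN.findall(text) if len(w) >= min_length)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up tweets used only for illustration.
    tweets = [
        "Vote for #BJP today",
        "#bjp rally in Jaipur",
        "Congress rally today #Elections2019",
    ]
    print(top_hashtags(tweets, 1))
    print(top_words(tweets, 2))
```

Lowercasing before counting merges variants such as '#BJP' and '#bjp', which is consistent with the lowercase hashtags reported above. Because `WORD_PATTERN` only matches ASCII letters, words in Hindi script would need a different pattern.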
C. Most common bi-grams

Figure 6. Occurrences of Most Frequent Bi-grams

Sadhvi-Pragya was the most recurring bi-gram during the Indian Elections 2019, which reflects the controversy around her during April 2019.
Other popular bi-grams were Rahul-Gandhi, Narendra-Modi, Hindu-Terror and Vote-BJP, which give an idea of the ongoing propaganda. An interesting thing to note is that the keyword 'Rahul Gandhi' was used more than 'Narendra Modi'.

D. Sentiment distribution for both candidates

Figure 7. Histogram of Sentiment relating to each candidate

Positive tweets were more numerous for Modi than for Rahul, while negative-sentiment tweets were almost equal for Narendra Modi and Rahul Gandhi. Neutral tweets were also more numerous for Modi. From the above plot, which compares the positive, negative and neutral tweets for Modi and Rahul, we predict from the Twitter data we collected that Narendra Modi will win the elections. The absolute number of tweets about Narendra Modi is also higher than for Rahul Gandhi.

E. Sentiment distribution for both alliances

Figure 8. Histogram of Sentiment relating to each alliance

Positive tweets were more numerous in the BJP alliance data than in the Congress data, and so were neutral and negative tweets. People were thus tweeting more about the BJP or the National Democratic Alliance than about Congress or the United Progressive Alliance.

F. Candidate Wise comparison

Figure 9. Candidate wise positive and negative tweets

Table I
STATS FOR CANDIDATE WISE SENTIMENT ANALYSIS

         Positive Tweets   Negative Tweets   % of Positive tweets
Modi     53399             28867             64.9%
Rahul    26251             15017             63.6%
Total    79650             43884             64.5%

The absolute number of supporters of Narendra Modi is higher than that of Rahul Gandhi. The positive tweets for Modi and Rahul are 43.2% and 21.3% of the total tweets respectively. The negative tweets for Modi and Rahul are 23.4% and 12.2% of the total tweets respectively.
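The percentages for the candidate-wise comparison can be recomputed directly from the counts in Table I. The following Python sketch does this; only the four counts come from the table, while the dictionary layout, the key names and the function `sentiment_shares` are illustrative assumptions.

```python
# Positive and negative tweet counts copied from Table I.
TABLE_I = {
    "Modi": {"positive": 53399, "negative": 28867},
    "Rahul": {"positive": 26251, "negative": 15017},
}


def sentiment_shares(table):
    """Compute percentage shares per candidate from positive/negative counts."""
    grand_total = sum(c["positive"] + c["negative"] for c in table.values())
    shares = {}
    for name, c in table.items():
        own_total = c["positive"] + c["negative"]
        shares[name] = {
            # Share of the candidate's own tweets that are positive.
            "positive_within": 100 * c["positive"] / own_total,
            # Shares measured against all positive and negative tweets.
            "positive_of_all": 100 * c["positive"] / grand_total,
            "negative_of_all": 100 * c["negative"] / grand_total,
        }
    return shares


if __name__ == "__main__":
    for name, s in sentiment_shares(TABLE_I).items():
        print(name, {key: round(value, 1) for key, value in s.items()})
```

With these counts, Modi's positive tweets come to about 43.2% of all positive and negative tweets and Rahul's to about 21.3%, while the positive share within each candidate's own tweets is about 64.9% and 63.6%. The two candidates are therefore close in the proportion of positive tweets and differ mainly in volume.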
G. Alliance Wise comparison

Figure 10. Alliance wise positive and negative sentiment analysis

Table II
STATS FOR ALLIANCE WISE SENTIMENT ANALYSIS

            Positive Tweets   Negative Tweets   % of Positive tweets
BJP         193303            114215            62.9%
Congress    38003             20631             64.8%
Total       231306            134846            63.2%

The positive tweets for BJP and Congress are 52.8% and 10.4% of the total tweets respectively. The negative tweets for BJP and Congress are 31.2% and 5.6% of the total tweets respectively. BJP dominates Congress by a large margin.

V. CONCLUSION
We applied our model to 14 million tweets collected over a period of a month and a half. We found that '#bjp', '#everyvoteformodi', '#indiawantsmodiagain' and '#bharatmodikesaath' were the most common hashtags over the period. No pro-Congress hashtag appeared in the top 10, but anti-BJP hashtags such as '#bjpjumlamanifesto' and '#modiwithterrorist' were also trending. The word "BJP" also occurred more frequently than "congress". Applying sentiment analysis to the tweets, we found more positive tweets for the NDA than for the UPA; comparing the candidates Rahul Gandhi and Narendra Modi, we likewise found that Narendra Modi had more positive tweets than Rahul Gandhi. From this we predict that the NDA will win more seats than the UPA.

REFERENCES
[1] Michel Goossens, Frank Mittelbach, and Alexander Samarin. The LaTeX Companion. Addison-Wesley, Reading, Massachusetts, 1993.
[2] ICUBE. 21st edition ICUBE: Digital adoption and usage trends.
[3] Statista. Number of Twitter users in India from 2012 to 2019 (in millions). https://www.statista.com/statistics/381832/twitter-users-india/
[4] Abhishek Bhola. Twitter and Polls: Analyzing and estimating political orientation of Twitter users in India General. IIIT-D.
[5] Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner, Isabell M. Welpe. Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment.
[6] Lazaros Oikonomou, Christos Tjortjis. A Method for Predicting the Winner of the USA Presidential Elections using Data extracted from Twitter.
[7] Tariq Mahmood, Tasmiyah Iqbal, Farnaz Amin, Wajeeta Lohanna, Atika Mustafa. Mining Twitter big data to predict 2013 Pakistan election winner. Department of Computer Science, National University of Computer and Emerging Sciences, Karachi, Pakistan.
[8] Juan M. Soler, Fernando Cuartero, Manuel Roblizo. Twitter as a Tool for Predicting Elections Results.
[9] Akshi Kumar, Prakhar Dogra, Vikrant Dabas. Emotion analysis of Twitter using opinion mining.
[10] Andreas Jungherr. Twitter use in election campaigns: A systematic literature review.
[11] Parul Sharma, Teng-Sheng Moh. Prediction of Indian Election Using Sentiment Analysis on Hindi Twitter.
[12] M. F. Porter. An algorithm for suffix stripping. Computer Laboratory, Cambridge, UK, 1980.
[13] TextBlob. Simplified Text Processing. https://textblob.readthedocs.io/en/dev/
[14] Farha Nausheen, Sayyada Hajera Begum. Sentiment Analysis to Predict Election Results Using Python. Hyderabad, India. https://textblob.readthedocs.io/en/dev/
[15] Pandas. https://pandas.pydata.org
[16] NLTK. https://www.nltk.org
[17] Matplotlib. https://matplotlib.org/#documentation
[18] GeoPy. https://geopy.readthedocs.io/en/stable/#welcome-to-geopy-s-documentation
[19] Cartopy. https://scitools.org.uk/cartopy/docs/latest/
[20] jsonpickle. https://jsonpickle.github.io/
