
(AUTONOMOUS)

MINI-PROJECT REPORT

DEPARTMENT OF INFORMATION TECHNOLOGY

SECOND YEAR

IV SEMESTER

TWITTER SENTIMENT ANALYSIS

PRESENTED BY:

B.V.L. Pravallika – 19NG1A1207
D. Sree Pujitha – 19NG1A1216
P. Sri Harika – 19NG1A1246
V. Mounika – 19NG1A1257
ACKNOWLEDGEMENT

I feel privileged to thank our Chairman, Director, Principal and Head of Department
for their support. My sincere thanks to the faculty members, and I also thank my team members for
their cooperation in making this mini project a success.

I also thank our parents and our dear friends for their help and support.
ABSTRACT

Sentiment analysis deals with identifying and classifying opinions or sentiments
expressed in source text. Social media generates a vast amount of sentiment-rich data in the form
of tweets, status updates, blog posts and so on. Sentiment analysis of this user-generated data is very useful
for knowing the opinion of the crowd. Twitter sentiment analysis is difficult compared to general
sentiment analysis due to the presence of slang words and misspellings, and because tweets are short:
they were originally limited to 140 characters (now 280).

The knowledge-base approach and the machine-learning approach are the two strategies used
for analysing sentiments in text. By doing sentiment analysis in a specific domain, it is
possible to identify the effect of domain information on sentiment classification. We present a new
feature vector for classifying tweets as positive or negative and extract people's opinions about
products.

Social media has received much attention nowadays. Public and private opinion
about a wide variety of subjects is expressed and spread continually via numerous social media
platforms, and Twitter is one that is gaining popularity. Twitter offers organizations a fast and
effective way to analyze customer perspectives, which is critical to success in the marketplace.
This report describes the design of a sentiment analysis system that extracts and analyses a vast
number of tweets.
TABLE OF CONTENTS

S.NO  CONTENTS
1     Introduction
2     Software and Hardware Requirements
3     Algorithm Specification
4     Advantages and Disadvantages
5     Applications
6     Streaming Tweets
7     Cursor and Pagination
8     Analysing Tweet Data
9     Visualizing Tweet Data
10    Sentimental Analysis of Tweet Data
11    Outputs
12    Conclusion
13    References
INTRODUCTION
As the internet grows bigger, its horizons become wider. Social media and microblogging
platforms like Facebook and Twitter dominate in spreading encapsulated news and trending topics
across the globe at a rapid pace. A topic becomes trending when more and more users contribute
their opinions and judgements, making it a valuable source of online perception. These topics
are generally intended to spread awareness or to promote public figures, political campaigns during
elections, product endorsements and entertainment such as movies and award shows.

Large organizations and firms take advantage of people's feedback to improve their products and
services, which further helps in enhancing marketing strategies. One example is leaking
pictures of an upcoming iPhone to create hype, gauge people's emotions and market the product
before its release. Thus, there is huge potential for discovering and analysing interesting patterns
in the endless stream of social media data for business-driven applications.

Software Requirement Analysis


A) Problem
Sentiment analysis is the prediction of emotions in a word, sentence or corpus of
documents. It is intended to serve as an application to understand the attitudes, opinions and
emotions expressed within an online mention. The intention is to gain an overview of the wider
public opinion behind certain topics. Precisely, it is a paradigm of categorizing conversations into
positive, negative or neutral labels. Many people use social media sites for networking with other
people and to stay up to date with news and current events. These sites (Twitter, Facebook,
Instagram, Google+) offer people a platform to voice their opinions. For example, people post
their reviews online as soon as they watch a movie and then start a series of comments to
discuss the acting skills depicted in it. This kind of information forms a basis for
people to evaluate and rate the performance of not only movies but other products, and to
judge whether they will be a success. Such vast information on these sites can be used
for marketing and social studies. Sentiment analysis therefore has wide applications, including
emotion mining, polarity classification and influence analysis.

Twitter is an online networking site driven by tweets, which were originally limited to
140 characters (now 280). This character limit encourages the use of hashtags for text classification.
Currently around 6,500 tweets are published per second, which results in approximately 561.6 million
tweets per day. These streams of tweets are generally noisy, reflecting multi-topic, changing-attitude
information in an unfiltered and unstructured format. Twitter sentiment analysis involves the use of
natural language processing to extract, identify and characterize the sentiment content. Sentiment
analysis is often carried out at two levels: 1) coarse level and 2) fine level. At the coarse level,
entire documents are analysed, while at the fine level, individual attributes are analysed. The
sentiments present in text are of two types: direct and comparative. Comparative sentiments involve
a comparison of objects in the same sentence, while in direct sentiments the objects are
independent of one another.

However, analysing tweets is not an easy job. A lot of
challenges are involved in terms of the tonality, polarity, lexicon and grammar of the tweets. They tend
to be highly unstructured and non-grammatical, which makes it difficult to interpret their meaning.
Moreover, extensive use of slang words, acronyms and out-of-vocabulary words is quite common while
tweeting online. Categorizing such words by polarity is tough for the natural language processors
involved. The rest of this project report is structured as follows. Section II details related
work by highlighting the software and hardware requirements, Section III covers the methodology and
implementation of the project, and the final section concludes the report.

B) Modules and Functionalities

This project consists of five modules. They are:

MODULE 1 : Streaming Tweets

In this step, we hit the API by performing authentication and stream the (unfiltered) data from the
requested Twitter account.

Step 1: Connect to the API.

Step 2: Stream the data using a listener.

Step 3: Get the response.

MODULE 2 : Cursor and Pagination

Pagination is a technique for breaking a large amount of data into smaller portions called pages.
The Twitter standard APIs utilize a technique called cursoring to paginate large result sets. Simply
put, it handles pagination so that we can specify the number of tweets we want to get.

Step 1: Connect to the API and import Cursor from tweepy.

Step 2: Access a user's timeline tweets using a Twitter client (e.g. pycon).

(Returns a collection of the most recent tweets posted by the user indicated by the screen name or
user id parameter.)
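Cursoring can be sketched independently of Twitter: each request returns one page of results plus a cursor for the next page, and iteration stops when no cursor remains. The `fake_api` function below is a hypothetical stand-in for a real paginated endpoint, purely for illustration of the technique tweepy's Cursor automates.

```python
def fake_api(cursor=0, page_size=3):
    """Stand-in for a paginated endpoint: returns (page, next_cursor)."""
    data = list(range(10))  # pretend these are tweets
    page = data[cursor:cursor + page_size]
    next_cursor = cursor + page_size if cursor + page_size < len(data) else None
    return page, next_cursor

def paginate(api):
    """Yield items page by page until the endpoint stops returning a cursor."""
    cursor = 0
    while cursor is not None:
        page, cursor = api(cursor)
        yield from page

print(list(paginate(fake_api)))  # all ten items, in order
```

This is the same pattern Cursor(...).items(n) hides: it keeps requesting pages and hands tweets back one at a time until the requested count is reached.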

MODULE 3 : Analysing Tweet Data

Analysing tweet data compiles all the behaviours and actions the audience takes when they come
across your posts and profile: the clicks, likes and retweets. The tweet analyzer is integrated with
Twitter and fetches the 5 most recent tweets from a given Twitter handle.

Step 1: Connect to the API

Step 2: Using the tweet analyzer functionality, we analyse and categorize content from tweets.

MODULE 4 : Visualizing Tweet Data

Data visualisation is a part of statistical analysis. After collecting and analysing the data, a
good visual representation of it is designed. A picture can speak a thousand words, and different
models give different perspectives on the data.

Step 1: Connect to the API.

Step 2: Analyse the data.

Step 3: Plot the data.

MODULE 5 : Sentiment Analysis of Tweet Data

A key aspect of sentiment analysis is to analyse a body of text based on its polarity. The
sentiment polarity of an element defines the orientation of the expressed sentiment.

Step 1: Connect to the API.

Step 2: Analyse the data.

Step 3: If sentiment polarity > 0 ------> return 1

Step 4: If sentiment polarity = 0 ------> return 0

Step 5: If sentiment polarity < 0 ------> return -1
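The thresholding in Steps 3-5 can be sketched as a small helper; the function name here is ours, for illustration (the project's own implementation appears in the Sentimental Analysis section).

```python
def polarity_to_label(polarity: float) -> int:
    """Map a sentiment polarity score to 1 (positive), 0 (neutral) or -1 (negative)."""
    if polarity > 0:
        return 1
    elif polarity == 0:
        return 0
    else:
        return -1

print(polarity_to_label(0.8))   # 1
print(polarity_to_label(0.0))   # 0
print(polarity_to_label(-0.3))  # -1
```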

SOFTWARE AND HARDWARE REQUIREMENTS

SOFTWARE REQUIREMENTS

Operating System : Windows 7/8/8.1/10

Twitter Developer Account

R Studio

Python IDLE 3.9.1

HARDWARE SPECIFICATIONS

Processor : Intel(R) Core(TM) i3 or more

Installed RAM : 4.00 GB or more

System type : 64-bit operating system, x64-based processor

Monitor : 1024 x 720 Display

ALGORITHM SPECIFICATION
Naive Bayes Classification
Written reviews are great datasets for doing sentiment analysis because they often come with a score
that can be used to train an algorithm. Naive Bayes is a popular algorithm for classifying text.

Consider, for example, the following phrases extracted from positive and negative reviews of movies
and restaurants. Words like great, richly, awesome, pathetic, awful and ridiculously are very
informative cues:

+ ...zany characters and richly applied satire, and some great plot twists
− It was pathetic. The worst part about it was the boxing scenes...
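The idea can be sketched with a tiny hand-rolled Naive Bayes text classifier using word counts and add-one smoothing. The toy training phrases below are illustrative (adapted from the review snippets above), not data from the project, and the function names are ours.

```python
from collections import Counter
import math

# Toy training data: (text, label) pairs.
train = [
    ("great plot twists and richly applied satire", "pos"),
    ("awesome characters and a great story", "pos"),
    ("it was pathetic and the worst boxing scenes", "neg"),
    ("ridiculously awful acting", "neg"),
]

def fit(docs):
    """Count words per class and documents per class."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Pick the class maximizing log prior + sum of smoothed log likelihoods."""
    vocab = set(w for counts in word_counts.values() for w in counts)
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the class.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = fit(train)
print(predict("great satire", word_counts, label_counts))    # pos
print(predict("pathetic awful", word_counts, label_counts))  # neg
```

The "naive" part is the independence assumption: each word contributes its likelihood separately, which is what makes the per-word counts sufficient for classification.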

TEXTBLOB PACKAGE

The TextBlob package for Python is a convenient way to do many Natural Language Processing
(NLP) tasks. For example:

from textblob import TextBlob

TextBlob("not a very great calculation").sentiment

This tells us that the English phrase "not a very great calculation" has a polarity of about -0.3,
meaning it is slightly negative, and a subjectivity of about 0.6, meaning it is fairly subjective.

When calculating sentiment for a single word, TextBlob uses a sophisticated technique known to
mathematicians as "averaging".

TextBlob("great").sentiment

## Sentiment(polarity=0.8, subjectivity=0.75)

TextBlob("very great").sentiment

## Sentiment(polarity=1.0, subjectivity=0.9750000000000001)

The polarity gets maxed out at 1.0, but you can see that the subjectivity is also modified by
"very", rising from 0.75 to about 0.975.

TextBlob("not very great").sentiment

## Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)

TextBlob will ignore one-letter words in its sentiment phrases.

ADVANTAGES

1. UPSELLING OPPORTUNITIES
Identifying happy, satisfied customers and increasing sales of the product.
2. AGENT MONITORING
Supervisors can monitor the quality of service provided by each team member.
3. IDENTIFYING KEY EMOTIONAL TRIGGERS
Identifying emotional triggers such as sad or happy emojis to understand customer
satisfaction.
4. HANDLING MULTIPLE CUSTOMERS
By handling multiple customers at once, we can save time and manage other work in
parallel.
5. ADAPTIVE CUSTOMER SERVICE
When the service provided is good, customers can easily adapt to it.
6. QUICK ESCALATIONS
Finding negative emojis quickly and satisfying the needs of the customer.
7. REDUCING CUSTOMER CHURN
Identifying unsatisfied customers and providing smooth service to satisfy their needs.
8. TRACKING OVERALL CUSTOMER SATISFACTION
Tracking customer satisfaction from time to time.
9. DETECTING CHANGES IN OPINION
Detecting changes in customer opinion and satisfying their needs, because customer
opinion often changes before and after receiving a product.

DISADVANTAGES
1. Inability to perform well across different domains.
2. Inadequate accuracy and performance when trained on insufficient data.
3. Incapability to deal with complex sentences that require more than sentiment words and simple
analysis.
4. Application issues with slang and shortened forms of words.

APPLICATIONS
Twitter sentiment analysis is designed to analyze the sentiment of tweets. It is ideal for social
listening and detecting brand sentiment in real time.
Based on a scoring mechanism, sentiment analysis monitors conversations and evaluates language
and voice inflections to quantify attitudes, opinions and emotions related to a business, product,
service or topic. Sentiment analysis is sometimes also referred to as opinion mining.
The applications of sentiment analysis are endless and can be applied to any industry, from finance
and retail to hospitality and technology. The most popular applications of sentiment analysis in real
life are:
1.Social media monitoring

2.Customer support

3.Customer feedback

4.Brand monitoring and reputation management

5.Voice of customer (VoC)

6.Voice of employee

7.Product analysis

8.Market research and competitive research

Sentiment analysis is one of those technologies whose usefulness depends entirely on how well it is
understood.

It can be extremely useful if you know how to use it, and completely useless if you apply it to
something it is not meant to do.

STREAMING TWEETS
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

import consumer  # local module holding the API keys

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        pass

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API.
        listener = StdOutListener(fetched_tweets_filename)
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        stream = Stream(auth, listener)

        # Filter the stream to capture tweets matching the keywords.
        stream.filter(track=hash_tag_list)

class StdOutListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    hash_tag_list = ["donald trump", "hillary clinton", "barack obama", "bernie sanders"]
    fetched_tweets_filename = "tweets.txt"

    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)
CURSOR AND PAGINATION

from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

import consumer  # local module holding the API keys

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)

        self.twitter_user = twitter_user

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

    def get_home_timeline_tweets(self, num_tweets):
        home_timeline_tweets = []
        for tweet in Cursor(self.twitter_client.home_timeline, id=self.twitter_user).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets

# # # # TWITTER AUTHENTICATOR # # # #
class TwitterAuthenticator():

    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API.
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)

        # Filter the stream to capture tweets matching the keywords.
        stream.filter(track=hash_tag_list)

class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Returning False disconnects the stream when a rate limit occurs.
            return False
        print(status)

if __name__ == '__main__':
    # Authenticate using the keys in consumer.py and connect to the Twitter API.
    hash_tag_list = ["donald trump", "hillary clinton", "barack obama"]
    fetched_tweets_filename = "tweets.txt"

    twitter_client = TwitterClient("pycon")
    print(twitter_client.get_user_timeline_tweets(1))

    # twitter_streamer = TwitterStreamer()
    # twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)
ANALYSING TWEET DATA
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

import consumer  # local module holding the API keys
import numpy as np
import pandas as pd

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)

        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

    def get_home_timeline_tweets(self, num_tweets):
        home_timeline_tweets = []
        for tweet in Cursor(self.twitter_client.home_timeline, id=self.twitter_user).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets

# # # # TWITTER AUTHENTICATOR # # # #
class TwitterAuthenticator():

    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API.
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)

        # Filter the stream to capture tweets matching the keywords.
        stream.filter(track=hash_tag_list)

# # # # TWITTER STREAM LISTENER # # # #
class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Returning False disconnects the stream when a rate limit occurs.
            return False
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])

        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])

        return df

if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()

    api = twitter_client.get_twitter_client_api()

    # Twitter handles contain no spaces; "narendramodi" is the handle used here.
    tweets = api.user_timeline(screen_name="narendramodi", count=20)

    # print(dir(tweets[0]))
    # print(tweets[0].retweet_count)

    df = tweet_analyzer.tweets_to_data_frame(tweets)

    print(df.head(10))
VISUALIZING TWEET DATA
from tweepy import API
from tweepy import Cursor  # needed by the client helper methods below
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

import consumer  # local module holding the API keys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)

        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

    def get_home_timeline_tweets(self, num_tweets):
        home_timeline_tweets = []
        for tweet in Cursor(self.twitter_client.home_timeline, id=self.twitter_user).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets

# # # # TWITTER AUTHENTICATOR # # # #
class TwitterAuthenticator():

    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API.
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)

        # Filter the stream to capture tweets matching the keywords.
        stream.filter(track=hash_tag_list)

# # # # TWITTER STREAM LISTENER # # # #
class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Returning False disconnects the stream when a rate limit occurs.
            return False
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])

        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])

        return df

if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()

    api = twitter_client.get_twitter_client_api()

    # Twitter handles contain no spaces; "narendramodi" is the handle used here.
    tweets = api.user_timeline(screen_name="narendramodi", count=20)

    # print(dir(tweets[0]))
    # print(tweets[0].retweet_count)

    df = tweet_analyzer.tweets_to_data_frame(tweets)

    # Get average length over all tweets:
    print(np.mean(df['len']))

    # Get the number of likes for the most liked tweet:
    # print(np.max(df['likes']))

    # Get the number of retweets for the most retweeted tweet:
    print(np.max(df['retweets']))

    # print(df.head(10))

    time_favs = pd.Series(data=df['likes'].values, index=df['date'])
    time_favs.plot(figsize=(16, 4), color='r')
    plt.show()

    # time_retweets = pd.Series(data=df['retweets'].values, index=df['date'])
    # time_retweets.plot(figsize=(16, 4), color='r')
    # plt.show()

    # Layered Time Series:
    # time_likes = pd.Series(data=df['likes'].values, index=df['date'])
    # time_likes.plot(figsize=(16, 4), label="likes", legend=True)
    time_retweets = pd.Series(data=df['retweets'].values, index=df['date'])
    time_retweets.plot(figsize=(16, 4), label="retweets", legend=True)
    plt.show()
SENTIMENTAL ANALYSIS OF TWEET DATA
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

from textblob import TextBlob

import consumer  # local module holding the API keys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import re

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)

        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

    def get_home_timeline_tweets(self, num_tweets):
        home_timeline_tweets = []
        for tweet in Cursor(self.twitter_client.home_timeline, id=self.twitter_user).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets

# # # # TWITTER AUTHENTICATOR # # # #
class TwitterAuthenticator():

    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to the Twitter Streaming API.
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)

        # Filter the stream to capture tweets matching the keywords.
        stream.filter(track=hash_tag_list)

# # # # TWITTER STREAM LISTENER # # # #
class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Returning False disconnects the stream when a rate limit occurs.
            return False
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def clean_tweet(self, tweet):
        # Strip @mentions, URLs and non-alphanumeric characters before scoring.
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def analyze_sentiment(self, tweet):
        analysis = TextBlob(self.clean_tweet(tweet))

        if analysis.sentiment.polarity > 0:
            return 1
        elif analysis.sentiment.polarity == 0:
            return 0
        else:
            return -1

    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])

        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])

        return df

if __name__ == '__main__':

    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()

    api = twitter_client.get_twitter_client_api()

    # Twitter handles contain no spaces; "narendramodi" is the handle used here.
    tweets = api.user_timeline(screen_name="narendramodi", count=200)

    df = tweet_analyzer.tweets_to_data_frame(tweets)
    df['sentiment'] = np.array([tweet_analyzer.analyze_sentiment(tweet) for tweet in df['tweets']])
    print(df.head(10))
OUTPUTS
1. STREAMING TWEETS (screenshot)

2. CURSOR AND PAGINATION (screenshot)

3. ANALYSING TWEET DATA (screenshot)

4. VISUALIZING TWEET DATA (screenshot)

5. SENTIMENTAL ANALYSIS OF TWEET DATA (screenshot)
CONCLUSION
Twitter sentiment analysis comes under the category of text and opinion mining. It focuses
on analyzing the sentiments of tweets and feeding the data to a machine learning model
to train it and then check its accuracy, so that the model can be reused in the future
according to the results.

It comprises steps such as data collection, text preprocessing, sentiment detection, sentiment
classification, and training and testing the model. This research topic has evolved over the last
decade, with models reaching an accuracy of almost 85-90%. But it still lacks the
dimension of diversity in the data. It also has many application issues with slang and
the shortened forms of words. Many analyzers do not perform well when the
number of classes is increased. Also, it remains untested how accurate a model will
be for topics other than the one under consideration.

Hence sentiment analysis has a very bright scope for development in the future.

REFERENCES
1. Sahar A. El_Rahman, "Sentiment Analysis of Twitter Data", College of Computer and Information
Sciences, Princess Nourah Bint Abdulrahman University.

2. Anurag P. Jain, "Sentiments Analysis Of Twitter Data Using Data Mining", ICIP, 2015.

3. Rasika Wagh and Payal Punde, "Survey on Sentiment Analysis using Twitter Dataset", ICECA, 2018.

4. Adyan Marendra Ramadhani and Hong Soon Goo, "Twitter Sentiment Analysis using Deep Learning
Methods", Department of Management Information Systems, Dong-A University, Busan, South Korea, 2017.

5. Bing Liu, "Sentiment Analysis and Opinion Mining", Morgan & Claypool Publishers, May 2012.

6. V. Kharde and S. Sonawane, "Sentiment Analysis of Twitter Data: A Survey of Techniques",
International Journal of Computer Applications, vol. 139, p. 11, 2016.

7. Huma Parveen and Shikha Pandey, "Sentiment Analysis on Twitter Data-set using Naive Bayes
Algorithm", Dept. of Computer Science and Engineering, Rungta College of Engineering and
Technology, Bhilai, India, 2016.
