Mini Project Report
MINI-PROJECT REPORT
SECOND YEAR
IV SEMESTER
PRESENTED BY:
B.V.L.Pravallika - 19NG1A1207
D.Sree pujitha – 19NG1A1216
P.Sri harika – 19NG1A1246
V.Mounika – 19NG1A1257
ACKNOWLEDGEMENT
I feel privileged to thank our Chairman sir, Director sir, Principal sir and our HOD
sir for their support. My sincere thanks go to the faculty members, and I also thank my team members
for their cooperation in making this mini project a success.
I also thank our parents and our dear friends for their help and support.
ABSTRACT
The knowledge-base approach and the machine-learning approach are the two strategies used
for analyzing sentiment in text. By doing sentiment analysis in a specific domain, it is
possible to identify the effect of domain information on sentiment classification. We present a new
feature vector for classifying tweets as positive or negative and for extracting people's opinions
about products.
Social media has received more attention in recent years. Public and private opinions
about a wide variety of subjects are expressed and spread continually via numerous social media
platforms. Twitter is one of the social media platforms that is gaining popularity. Twitter offers
organizations a fast and effective way to analyze customers' perspectives on the factors critical to
success in the marketplace.
This paper reports on the design of a sentiment analysis, extracting a vast amount of tweets
TABLE OF CONTENTS
1 Introduction
3 Algorithm Specification
5 Applications
6 Streaming Tweets
11 Outputs
12 Conclusion
13 References
INTRODUCTION
As the internet grows, its horizons widen. Social media and microblogging platforms such as
Facebook and Twitter dominate in spreading encapsulated news and trending topics across the globe
at a rapid pace. A topic becomes trending when more and more users contribute their opinions and
judgements, making it a valuable source of online perception. These topics are generally intended
to spread awareness or to promote public figures, political campaigns during elections, product
endorsements, and entertainment such as movies and award shows.
Large organizations and firms take advantage of people's feedback to improve their products and
services, which in turn helps to enhance their marketing strategies. One example is leaking
pictures of an upcoming iPhone to create hype, gauge people's emotions, and market the product
before its release. Thus, there is huge potential for discovering and analysing interesting
patterns in the vast volume of social media data for business-driven applications.
Twitter is an online networking site driven by tweets, which are short messages originally
limited to 140 characters (the limit was raised to 280 in 2017). The character limit encourages the
use of hashtags for text classification. Currently around 6500 tweets are published per second,
which results in approximately 561.6 million tweets per day (6500 x 86,400 seconds). These streams
of tweets are generally noisy, reflecting multi-topic information and changing attitudes in an
unfiltered and unstructured format. Twitter sentiment analysis involves the use of natural language
processing to extract, identify, and characterize the sentiment content. Sentiment analysis is
often carried out at two levels: 1) coarse level and 2) fine level. At the coarse level, entire
documents are analysed, while at the fine level, individual attributes are analysed. The sentiments
present in text are of two types: direct and comparative. Comparative sentiments involve comparing
objects within the same sentence, while in direct sentiments the objects are independent of one
another in the same sentence.
However, analysing tweets is not an easy job. A lot of challenges are involved in terms of
the tonality, polarity, lexicon, and grammar of the tweets. Tweets tend to be highly unstructured
and non-grammatical, which makes their meaning difficult to interpret. Moreover, slang words,
acronyms, and out-of-vocabulary words are quite common in tweets. Categorizing such words by
polarity is hard for the natural language processors involved.
The rest of this project report is structured as follows. Section II covers the background of our
project by highlighting the software and hardware requirements. Section III covers the details of
the methodology and implementation of the project. Finally, Section VI concludes the report.
In this step, we authenticate with the Twitter API and stream the (unfiltered) data from the
requested Twitter account.
Pagination is a technique for breaking a large amount of data into smaller portions called pages.
The Twitter standard APIs use a technique called cursoring to paginate large result sets. Put
simply, the cursor handles pagination for us, so that we only need to specify the number of tweets
we want to get.
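To make the idea of cursoring concrete, here is a minimal sketch that does not contact Twitter at all. The names `FAKE_TWEETS`, `PAGE_SIZE`, `fetch_page` and `iter_items` are hypothetical and exist only for this illustration, and integer offsets stand in for the opaque cursor values a real API returns. It shows how a caller can ask for a number of items while the pages are fetched behind the scenes.

```python
from typing import Iterator, List, Optional, Tuple

# Stand-in data: 23 fake tweets. A real client would fetch these over HTTP.
FAKE_TWEETS = [f"tweet {i}" for i in range(23)]
PAGE_SIZE = 10  # hypothetical page size chosen for this sketch


def fetch_page(cursor: int) -> Tuple[List[str], Optional[int]]:
    """Return one page of results plus the cursor for the next page (None when done)."""
    page = FAKE_TWEETS[cursor:cursor + PAGE_SIZE]
    next_cursor: Optional[int] = cursor + PAGE_SIZE
    if next_cursor >= len(FAKE_TWEETS):
        next_cursor = None
    return page, next_cursor


def iter_items(limit: int) -> Iterator[str]:
    """Yield up to `limit` items, requesting further pages only when needed."""
    cursor: Optional[int] = 0
    yielded = 0
    while cursor is not None and yielded < limit:
        page, cursor = fetch_page(cursor)
        for item in page:
            if yielded >= limit:
                return
            yield item
            yielded += 1


first_12 = list(iter_items(12))
print(len(first_12), first_12[-1])  # 12 tweet 11
```

Asking for 12 items triggers two page requests here, and asking for more items than exist simply stops after the last page.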
Step 2: Access user timeline tweets using the Twitter client (e.g. pycon).
(This returns a collection of the most recent tweets posted by the user indicated by the screen
name or user ID parameter.)
Analysing tweet data compiles all the behaviours and actions the audience takes when they come
across your posts and profile: the clicks, likes, and retweets. The tweet analyzer is integrated
with Twitter and fetches the 5 most recent tweets from a given Twitter handle.
Step 3: Using the tweet analyser functionality, we analyse and categorize content from tweets.
Data visualisation is a part of statistical analysis. After the data is collected and analysed, a
good visual representation is designed for it. A picture can speak a thousand words. Different
models give different perspectives on the data.
A key aspect of sentiment analysis is analysing a body of text based on its polarity.
The sentiment polarity of an element defines the orientation of the expressed sentiment.
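As a small illustration of polarity as an orientation, the sketch below maps a numeric polarity score to a coarse label. The function name `polarity_label` and the example scores are assumptions made for this illustration only. The thresholds follow the usual convention that scores above zero are positive and scores below zero are negative.

```python
def polarity_label(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


scores = [0.8, 0.0, -0.3]  # made-up example scores
labels = [polarity_label(score) for score in scores]
print(labels)  # ['positive', 'neutral', 'negative']
```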
SOFTWARE AND HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS
R Studio
HARDWARE SPECIFICATIONS
ALGORITHM SPECIFICATION
Naive Bayes Classification
Written reviews are great datasets for sentiment analysis because they often come with a score
that can be used to train an algorithm. Naive Bayes is a popular algorithm for classifying text.
Consider, for example, the following phrases extracted from positive and negative reviews of movies
and restaurants. Words like great, richly, and awesome, and pathetic, awful, and ridiculously are
very informative cues:
+ ...zany characters and richly applied satire, and some great plot twists
− It was pathetic. The worst part about it was the boxing scenes...
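The following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written from scratch so that the counting is visible. The training sentences are invented for this illustration (loosely echoing the cue words above), and the function names `train_naive_bayes` and `classify` are our own. It is not the classifier used in this project.

```python
import math
from collections import Counter, defaultdict

# Tiny invented training set, for illustration only.
TRAIN = [
    ("great plot twists and richly applied satire", "pos"),
    ("awesome food and great service", "pos"),
    ("it was pathetic and awful", "neg"),
    ("ridiculously slow service and awful food", "neg"),
]


def train_naive_bayes(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def classify(text, class_docs, word_counts, vocab):
    """Return the class with the highest log posterior (add-one smoothing)."""
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(classify("great satire", *model))        # pos
print(classify("awful and pathetic", *model))  # neg
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that is unseen in one class from driving that class's probability to zero.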
TEXTBLOB PACKAGE
The TextBlob package for Python is a convenient way to do a lot of Natural Language Processing
(NLP) tasks. For example:
from textblob import TextBlob
TextBlob("not a very great calculation").sentiment
## Sentiment(polarity=-0.3076923076923077, subjectivity=0.5769230769230769)
This tells us that the English phrase “not a very great calculation” has a polarity of about -0.3,
meaning it is slightly negative, and a subjectivity of about 0.6, meaning it is fairly subjective.
When calculating sentiment for a single word, TextBlob uses a sophisticated technique known to
mathematicians as “averaging”.
TextBlob("great").sentiment
## Sentiment(polarity=0.8, subjectivity=0.75)
TextBlob("very great").sentiment
## Sentiment(polarity=1.0, subjectivity=0.9750000000000001)
The polarity gets maxed out at 1.0, but you can see that subjectivity is also modified by “very” to
become 0.75 ⋅ 1.3 = 0.975.
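The numbers quoted above can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not TextBlob's actual code: it assumes a lexicon entry of (0.8, 0.75) for "great" and an intensity of 1.3 for "very", and it assumes that a negation multiplies polarity by -0.5 and inverts the intensifier. These assumptions were chosen because they match the quoted outputs, and all names here are hypothetical.

```python
# Simplified model of how a lexicon-based analyzer might combine modifiers.
# The values below are the ones implied by the outputs quoted in the text.
LEXICON = {"great": (0.8, 0.75)}   # word -> (polarity, subjectivity)
INTENSIFIERS = {"very": 1.3}       # word -> intensity multiplier
NEGATIONS = {"not"}


def clamp(value, low, high):
    """Keep a value inside the closed interval [low, high]."""
    return max(low, min(high, value))


def phrase_sentiment(phrase):
    """Score a phrase that contains at most one sentiment-bearing word."""
    negated = False
    intensity = 1.0
    for word in phrase.lower().split():
        if word in NEGATIONS:
            negated = True
        elif word in INTENSIFIERS:
            intensity *= INTENSIFIERS[word]
        elif word in LEXICON:
            polarity, subjectivity = LEXICON[word]
            if negated:
                # Assumed rule: negation flips and halves polarity
                # and inverts the effect of the intensifier.
                polarity *= -0.5 / intensity
                subjectivity /= intensity
            else:
                polarity *= intensity
                subjectivity *= intensity
            return clamp(polarity, -1.0, 1.0), clamp(subjectivity, 0.0, 1.0)
    return 0.0, 0.0


for phrase in ["great", "very great", "not a very great calculation"]:
    print(phrase, phrase_sentiment(phrase))
```

Under these assumptions, "very great" gives 0.8 x 1.3 = 1.04 (capped at 1.0) for polarity, and the negated phrase gives 0.8 x (-0.5 / 1.3), which is about -0.31, with a subjectivity of 0.75 / 1.3, which is about 0.58.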
ADVANTAGES
1.UPSELLING OPPORTUNITIES
Identifying happy and satisfied customers makes it possible to increase sales of the product.
2.AGENT MONITORING
Superiors can monitor the quality of service provided by each team member.
3.IDENTIFYING KEY EMOTIONAL TRIGGERS
Identifying emotional triggers such as sad or happy emoji helps in understanding customer
satisfaction.
4.HANDLING MULTIPLE CUSTOMERS
By handling multiple customers at once, we can save time and use it for other work.
5.ADAPTIVE CUSTOMER SERVICE
If the service provided to the customer is good, the customer can easily adapt to the
service.
6.QUICK ESCALATIONS
Finding negative emoji quickly and satisfying the needs of the customer.
7.REDUCE THE CUSTOMER CHURN
Identifying unsatisfied customers and providing a smooth service to satisfy their needs.
8.TRACKING OVERALL CUSTOMER SATISFACTION
Tracking customer satisfaction from time to time.
9.DETECT CHANGES IN OPINION
Detecting changes in customer opinions and satisfying their needs, because customer
opinion often changes before and after receiving a product.
DISADVANTAGES
1.Inability to perform well in different domains.
2.Inadequate accuracy and performance in sentiment analysis when the data is insufficient.
3.Inability to deal with complex sentences that require more than sentiment words and simple
analysis.
4.Many application issues with the slang used and the short forms of words.
APPLICATIONS
Twitter sentiment analysis is designed to analyze the sentiment of tweets. It is ideal for social
listening and detecting brand sentiment in real time.
Based on a scoring mechanism, sentiment analysis monitors conversations and evaluates language
and voice inflections to quantify attitudes, opinions, and emotions related to a business, product,
service, or topic. Sentiment analysis is sometimes also referred to as opinion mining.
The applications of sentiment analysis are numerous and span many industries, from finance
and retail to hospitality and technology. The most popular applications of sentiment analysis in
real life are:
1.Social media monitoring
2.Customer support
3.Customer feedback
6.Voice of employee
7.Product analysis
Sentiment analysis is one of those technologies whose usefulness depends wholly on how well its
capabilities are understood.
It can be extremely useful if you know how to use it, and completely useless if you apply it
to something it is not meant to do.
STREAMING TWEETS
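# Written for tweepy 3.x; StreamListener was removed in tweepy 4.0.
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import consumer  # local module holding the Twitter API credentials

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        pass

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to Twitter Streaming API
        listener = StdOutListener(fetched_tweets_filename)
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        stream = Stream(auth, listener)
        # Keep only the tweets that match the tracked keywords
        stream.filter(track=hash_tag_list)

class StdOutListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            # Append each raw tweet to the output file
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        print(status)

if __name__ == '__main__':
    hash_tag_list = []  # fill in the keywords or hashtags to track
    fetched_tweets_filename = "tweets.txt"
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)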
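Once tweets have been saved to a file, they need to be read back for analysis. The sketch below assumes that each streamed record is a JSON object on its own line and that tweets carry their content in a "text" field; both are assumptions about the file layout rather than something the listing above guarantees. The function name `load_tweet_texts` and the sample records are made up for this illustration.

```python
import json
import os
import tempfile


def load_tweet_texts(path):
    """Read a file of JSON records, one per line, and return the tweet texts."""
    texts = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or malformed records
            if isinstance(record, dict) and "text" in record:
                texts.append(record["text"])
    return texts


# Demo with made-up records written to a temporary file.
sample = [
    {"id": 1, "text": "Loving the new phone!"},
    {"limit": {"track": 5}},  # a non-tweet notice without a "text" field
    {"id": 2, "text": "Battery life is awful"},
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    for record in sample:
        tmp.write(json.dumps(record) + "\n")
    tmp.write("{not valid json\n")

demo_texts = load_tweet_texts(tmp.name)
os.remove(tmp.name)
print(demo_texts)  # ['Loving the new phone!', 'Battery life is awful']
```

Skipping malformed lines rather than raising keeps one damaged record from discarding the rest of the collected data.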
CURSOR AND PAGINATION
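from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import consumer  # local module holding the Twitter API credentials

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_home_timeline_tweets(self, num_tweets):
        # Cursor pages through the timeline until num_tweets items are collected
        home_timeline_tweets = []
        for tweet in Cursor(self.twitter_client.home_timeline, id=self.twitter_user).items(num_tweets):
            home_timeline_tweets.append(tweet)
        return home_timeline_tweets

# # # # TWITTER AUTHENTICATER # # # #
class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        stream.filter(track=hash_tag_list)

class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename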
ANALYSING TWEET DATA
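from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import consumer  # local module holding the Twitter API credentials
import numpy as np
import pandas as pd

class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        stream.filter(track=hash_tag_list)

class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
        return df

if __name__ == '__main__':
    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()
    api = twitter_client.get_twitter_client_api()
    # Handle and count are assumptions based on the report's description
    tweets = api.user_timeline(screen_name="pycon", count=5)
    #print(dir(tweets[0]))
    #print(tweets[0].retweet_count)
    df = tweet_analyzer.tweets_to_data_frame(tweets)
    print(df.head(10))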
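To show the kind of per-tweet measurements this step is after without needing API credentials, the sketch below uses stand-in tweet objects with made-up values. The attribute names `text`, `favorite_count` and `retweet_count` mirror those exposed on tweet objects, while the helper `tweets_to_rows` and the sample data are assumptions made for this illustration. Plain dictionaries are used here in place of a DataFrame to keep the example dependency-free.

```python
from types import SimpleNamespace

# Stand-in tweet objects with made-up values; real ones would come from the Twitter API.
tweets = [
    SimpleNamespace(text="Launch day!", favorite_count=120, retweet_count=30),
    SimpleNamespace(text="Thanks for the feedback", favorite_count=45, retweet_count=5),
    SimpleNamespace(text="Update rolling out now", favorite_count=80, retweet_count=12),
]


def tweets_to_rows(tweets):
    """Flatten tweet objects into plain dictionaries, one row per tweet."""
    return [
        {
            "tweets": tweet.text,
            "len": len(tweet.text),
            "likes": tweet.favorite_count,
            "retweets": tweet.retweet_count,
        }
        for tweet in tweets
    ]


rows = tweets_to_rows(tweets)
most_liked = max(rows, key=lambda row: row["likes"])
most_retweeted = max(rows, key=lambda row: row["retweets"])
average_length = sum(row["len"] for row in rows) / len(rows)

print("Most liked:", most_liked["tweets"])
print("Most retweeted:", most_retweeted["tweets"])
print("Average length:", round(average_length, 1))
```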
VISUALIZING TWEET DATA
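from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import consumer  # local module holding the Twitter API credentials
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# # # # TWITTER CLIENT # # # #
class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_friend_list(self, num_friends):
        friend_list = []
        for friend in Cursor(self.twitter_client.friends, id=self.twitter_user).items(num_friends):
            friend_list.append(friend)
        return friend_list

# # # # TWITTER AUTHENTICATER # # # #
class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        stream.filter(track=hash_tag_list)

class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])
        # Columns needed by the plot below
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        return df

if __name__ == '__main__':
    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()
    api = twitter_client.get_twitter_client_api()
    # Handle and count are assumptions based on the report's description
    tweets = api.user_timeline(screen_name="pycon", count=5)
    #print(dir(tweets[0]))
    #print(tweets[0].retweet_count)
    df = tweet_analyzer.tweets_to_data_frame(tweets)
    #print(df.head(10))
    # Plot the number of likes over time
    time_favs = pd.Series(data=df['likes'].values, index=df['date'])
    time_favs.plot(figsize=(16, 4), color='r')
    plt.show()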
SENTIMENTAL ANALYSIS OF TWEET DATA
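from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from textblob import TextBlob
import consumer  # local module holding the Twitter API credentials
import numpy as np
import pandas as pd

class TwitterClient():
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

    def get_user_timeline_tweets(self, num_tweets):
        tweets = []
        for tweet in Cursor(self.twitter_client.user_timeline, id=self.twitter_user).items(num_tweets):
            tweets.append(tweet)
        return tweets

# # # # TWITTER AUTHENTICATER # # # #
class TwitterAuthenticator():
    def authenticate_twitter_app(self):
        auth = OAuthHandler(consumer.CONSUMER_KEY, consumer.CONSUMER_SECRET)
        auth.set_access_token(consumer.ACCESS_TOKEN, consumer.ACCESS_TOKEN_SECRET)
        return auth

# # # # TWITTER STREAMER # # # #
class TwitterStreamer():
    """
    Class for streaming and processing live tweets.
    """
    def __init__(self):
        self.twitter_autenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # This handles Twitter authentication and the connection to Twitter Streaming API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_autenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        stream.filter(track=hash_tag_list)

class TwitterListener(StreamListener):
    """
    This is a basic listener that just prints received tweets to stdout.
    """
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Returning False on_data method in case rate limit occurs.
            return False
        print(status)

class TweetAnalyzer():
    """
    Functionality for analyzing and categorizing content from tweets.
    """
    def analyze_sentiment(self, tweet):
        # Score the tweet text with TextBlob and reduce polarity to 1, 0 or -1
        analysis = TextBlob(tweet)
        if analysis.sentiment.polarity > 0:
            return 1
        elif analysis.sentiment.polarity == 0:
            return 0
        else:
            return -1

    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])
        return df

if __name__ == '__main__':
    twitter_client = TwitterClient()
    tweet_analyzer = TweetAnalyzer()
    api = twitter_client.get_twitter_client_api()
    # Handle and count are assumptions based on the report's description
    tweets = api.user_timeline(screen_name="pycon", count=5)
    df = tweet_analyzer.tweets_to_data_frame(tweets)
    df['sentiment'] = np.array([tweet_analyzer.analyze_sentiment(tweet) for tweet in df['tweets']])
    print(df.head(10))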
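Once every tweet carries a label of 1, 0 or -1, the labels can be summarised into an overall picture of opinion. The sketch below is one possible way to do that; the function name `sentiment_summary` and the example labels are made up for this illustration and are not part of the project code.

```python
from collections import Counter


def sentiment_summary(labels):
    """Return the share of positive (1), neutral (0) and negative (-1) labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    return {
        "positive": counts[1] / total,
        "neutral": counts[0] / total,
        "negative": counts[-1] / total,
    }


labels = [1, 1, 0, -1, 1, -1, 0, 1]  # made-up labels for eight tweets
summary = sentiment_summary(labels)
print(summary)  # {'positive': 0.5, 'neutral': 0.25, 'negative': 0.25}
```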
OUTPUTS
1.STREAMING TWEETS
3.ANALYSING TWEET DATA
5.SENTIMENTAL ANALYSIS OF TWEET DATA
CONCLUSION
Twitter sentiment analysis comes under the category of text and opinion mining. It focuses
on analyzing the sentiments of tweets and feeding the data to a machine learning model
to train it and then check its accuracy, so that the model can be used in future
according to the results.
It comprises steps such as data collection, text preprocessing, sentiment detection, sentiment
classification, and training and testing the model. This research topic has evolved during the
last decade, with models reaching an efficiency of almost 85%-90%. But it still lacks the
dimension of diversity in the data. Along with this, it has a lot of application issues with the
slang used and the short forms of words. Many analyzers don't perform well when the
number of classes is increased. Also, it has not yet been tested how accurate the model will
be for topics other than the one under consideration.