
Sentiment Analysis of Tweets Using Machine Learning

The document discusses sentiment analysis of tweets using machine learning techniques. It describes classifying tweets as positive or negative using classifiers like Naive Bayes, support vector machines, and recurrent neural networks. The workflow involves preprocessing tweets, extracting features, training classifiers on labeled data, and evaluating performance on test data. Applications include analyzing consumer sentiment for organizations and improving marketing based on public opinions.

Uploaded by

Makp112
SENTIMENT ANALYSIS OF TWEETS USING MACHINE LEARNING


• Problem Statement:
• To study and apply machine learning techniques for sentiment analysis in order to classify tweets as positive or negative.
• Scope:
• Sentiment analysis can be used in diverse applications across various fields, for example to help companies maximize their interests or profit based on the reviews they receive.
• Motivation:
• The opinions of individuals towards an entity are very valuable. In the age of the internet, these opinions are produced in huge amounts on social platforms like Twitter. Humans cannot process such large amounts of data manually, which creates the need to automate the process using sentiment analysis.
• With the help of machine learning algorithms, this has become a fairly easy and efficient task.
SENTIMENT ANALYSIS

• Sentiment analysis deals with identifying and classifying opinions or sentiments expressed in source text towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes.
• Social media generates a vast amount of sentiment-rich data in the form of tweets, status updates, blog posts, etc. This data can be used for sentiment analysis.
• The amount of user-generated content is too large for a person to analyze manually, so various sentiment analysis techniques are used to automate the task.
TECHNIQUES TO PERFORM SENTIMENT ANALYSIS

• Knowledge-based approach: This technique requires a large database of predefined emotions and an efficient knowledge representation for identifying sentiments. It is difficult in practice because it needs a huge lexical database, which makes it tedious and error-prone.
• Machine learning approach: This technique uses a training set to develop a classifier that classifies sentiments. Unlike the knowledge-based approach, it does not require a large database of predefined emotions.
MACHINE LEARNING TECHNIQUES

• This approach makes use of a training set and a test set.
• The training set consists of input feature vectors and their corresponding class labels. Using this set, a classification model is developed which tries to assign each input feature vector to its corresponding class label.
• The test set is used to validate the model by predicting the class labels of unseen feature vectors.
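The training/test split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the slides: the function names, the 25% test fraction and the fixed seed are all assumptions made for the example.

```python
import random


def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the labelled data and split it into a training and a test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = int(len(samples) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def accuracy(true_labels, predicted_labels):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

The classifier is fitted only on the training part; accuracy is then measured on the held-out test part, whose feature vectors the model has not seen.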
CLASSIFIERS
• Naïve Bayes Classifier:
• This classifier is based on Bayes’ theorem. It assumes that all input features are independent of each other given the class and that each contributes equally to the outcome.
• The conditional probability for Naive Bayes can be defined as :
$$P(X \mid y_j) = \prod_{i=1}^{m} P(x_i \mid y_j)$$
• Naive Bayes does not consider the relationships between features, so it cannot utilize the relationships between part-of-speech tag, emotional keyword and negation.
• ’X’ is the feature vector defined as X = {x1, x2, ..., xm} and yj is the class label.
• In sentiment analysis of tweets there are different independent features, such as emoticons, emotional keywords, counts of positive and negative keywords, and counts of positive and negative hashtags, which the Naive Bayes classifier uses effectively for classification.
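As a rough sketch of how the product of per-feature probabilities turns into a classifier, the following Python class counts feature values per class and picks the class with the highest score. It is an illustration under assumptions, not the slides’ implementation: features are treated as discrete values, add-one (Laplace) smoothing is added so unseen combinations do not produce zero probabilities, and logarithms are summed instead of multiplying probabilities.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        self.value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
        self.values = defaultdict(set)            # feature index -> values seen in training
        for x, label in zip(X, y):
            for i, v in enumerate(x):
                self.value_counts[(label, i)][v] += 1
                self.values[i].add(v)
        return self

    def log_score(self, x, label):
        # log P(y_j) + sum_i log P(x_i | y_j), the logarithm of the product formula
        score = math.log(self.class_counts[label] / self.n)
        for i, v in enumerate(x):
            count = self.value_counts[(label, i)][v]
            k = len(self.values[i])
            score += math.log((count + 1) / (self.class_counts[label] + k))
        return score

    def predict(self, x):
        return max(self.classes, key=lambda label: self.log_score(x, label))
```

A usage example with made-up feature vectors of the form (emoticon, positive keyword count, negative keyword count):

```python
X = [(1, 2, 0), (1, 1, 0), (-1, 0, 2), (-1, 0, 1)]
y = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(X, y)
print(model.predict((1, 2, 0)))
```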
• Support Vector Machine:
• This is a binary classifier, i.e. it can classify the input feature vectors into only two distinct classes.
• It separates the tweets using a hyperplane.
• For classification of tweets we have used a linear kernel, as it maintains a wide gap between the two classes.
• The support vector machine is given a set of labelled training data from the two categories and is trained on this data.
• The mathematical function used:
$$g(X) = w^{T} \phi(X) + b$$
’X’ is the feature vector, ’w’ is the weight vector and ’b’ is the bias term. φ() is the non-linear mapping from the input space to a high-dimensional feature space; with a linear kernel, φ is simply the identity mapping.
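For the linear kernel, the decision function reduces to g(X) = w·X + b, and the sign of g(X) gives the class. The sketch below trains such a function with plain hinge-loss subgradient updates. It is a simplification chosen for illustration: it has no regularization term, so it finds a separating hyperplane but not necessarily the maximum-margin one that a full SVM solver would return. The learning rate, epoch count and toy data are assumptions.

```python
def decision(w, b, x):
    """g(x) = w . x + b for a linear kernel (phi is the identity)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, labels, lr=0.5, epochs=10):
    """Hinge-loss subgradient descent; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * decision(w, b, x) < 1:  # point is inside the margin or misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


# Toy feature vectors: (positive keyword count, negative keyword count)
samples = [(2, 0), (0, 2), (3, 1), (1, 3)]
labels = [1, -1, 1, -1]
w, b = train_linear_svm(samples, labels)
```

On this toy data the learned hyperplane weighs positive keywords positively and negative keywords negatively, so a tweet with more positive than negative keywords falls on the positive side.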
• Recurrent Neural Network:
• Recurrent neural networks (RNNs) are popular and efficient models which have proven useful in natural language processing (NLP). RNNs make use of sequential information.
• What distinguishes an RNN from the algorithms above and from other neural networks is its ability to connect previous information to the current task, i.e. it makes use of memory.
• An RNN has a memory which stores information about previous computations.
• An RNN has three layers: an input layer, a hidden layer and an output layer.
• It computes results based on the correlation between the current data step and the previous data step, much as humans make decisions.
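The idea of a hidden state that carries information from one step to the next can be sketched as follows. This is a minimal forward pass of a simple (Elman-style) recurrent layer with a tanh activation; the weight names `W_xh`, `W_hh` and `b` are illustrative assumptions, the output layer is left out, and no training is shown.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(W_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h


def rnn_forward(sequence, W_xh, W_hh, b):
    """Run the whole sequence; the hidden state acts as the network's memory."""
    h = [0.0] * len(b)
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b)
    return h
```

Because each hidden state depends on the previous one, feeding the same inputs in a different order gives a different final state, which is what lets the model use word order in a tweet.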
SENTIMENT ANALYSIS OF TWEETS

• Twitter is a social media platform widely used by everyone from individuals to large organizations.
• Users on Twitter use the platform to express their views or opinions about any entity, such as a person, product, service or organization.
• Sentiment analysis of tweets is a challenging task because tweets are short (at most 140 characters at a time) and often contain misspellings, slang and emoticons.
• Tweets are short, noisy and cover a variety of topics. Users also often use different vocabularies. All of this makes sentiment analysis of tweets challenging.
WORKFLOW
DATASET
• The dataset used here was the Niek Sanders corpus file.
• This file contains sentiment labels for tweets about well-known organizations: Apple, Microsoft, Google and Twitter.
• It has been formatted in the following way:
Company      Sentiment (positive, negative, neutral, irrelevant)   Twitter ID
Apple        Positive                                              1.26E+17
Microsoft    Irrelevant                                            1.26E+17
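A corpus file with this three-column layout could be read as sketched below. The column names, the sample rows and the tweet IDs are hypothetical placeholders, and keeping only the positive and negative rows is an assumption based on the two-class problem statement. IDs are read as strings, since a value shown as 1.26E+17 has already lost the digits needed to look up the tweet.

```python
import csv
import io

# Hypothetical sample in the company / sentiment / tweet ID layout
SAMPLE = """Company,Sentiment,TweetId
apple,positive,100000000000000001
microsoft,irrelevant,100000000000000002
google,negative,100000000000000003
"""


def load_labels(handle, keep=("positive", "negative")):
    """Return (company, sentiment, tweet_id) tuples for the kept sentiment classes."""
    rows = []
    for row in csv.DictReader(handle):
        sentiment = row["Sentiment"].strip().lower()
        if sentiment in keep:
            rows.append((row["Company"], sentiment, row["TweetId"]))  # ID stays a string
    return rows


rows = load_labels(io.StringIO(SAMPLE))
```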


PRE-PROCESSING OF TWEETS
• Twitter’s policies do not allow tweets to be stored for more than 24 hours, so we retrieve the tweets from Twitter using a Python library for the Twitter API.
• To retrieve the tweets we need the tweet ID, which is obtained from the Niek Sanders corpus file.
• The tweets obtained may contain misspellings, slang words and emoticons, so they require pre-processing before being given to the classifier.
• Pre-processing steps include removing URLs and handling misspellings and slang words. Misspellings caused by elongated words are handled by reducing any run of repeated characters to 2 occurrences.
• Slang words contribute much to the emotion of a tweet, so they cannot simply be removed.
• A slang word dictionary is maintained to replace slang words occurring in tweets with their associated meanings.
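The pre-processing steps above can be sketched with Python’s `re` module. This is a hedged example rather than the original code: the URL pattern is a simple approximation, and the slang dictionary contains only a few hypothetical entries.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # rough URL pattern (assumption)
REPEAT_RE = re.compile(r"(.)\1{2,}")            # a character repeated 3 or more times

# Hypothetical slang dictionary; a real one would be much larger
SLANG = {"gr8": "great", "luv": "love", "u": "you"}


def preprocess(tweet, slang=SLANG):
    text = URL_RE.sub("", tweet)         # 1. remove URLs
    text = REPEAT_RE.sub(r"\1\1", text)  # 2. reduce repeated characters to 2 occurrences
    words = [slang.get(w.lower(), w) for w in text.split()]  # 3. expand slang words
    return " ".join(words)


print(preprocess("I luv this phone sooooo much http://t.co/xyz"))
# prints "I love this phone soo much"
```

Reducing repeats to two characters rather than one keeps correctly spelled words such as "good" intact while still shortening "gooood".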
FEATURE VECTORS

• The feature vector is composed of 8 relevant features:
• Part of speech (pos) tag
• Special keyword
• Presence of negation
• Emoticon
• Number of positive keywords
• Number of negative keywords
• Number of positive hash tags
• Number of negative hash tags.
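Six of the eight features listed above can be computed with simple lookups, as the sketch below shows. The word lists and emoticon sets are tiny hypothetical lexicons, and the part-of-speech tag and special keyword features are omitted because they need a tagger and a domain keyword list that the slides do not specify.

```python
# Hypothetical lexicons for illustration only
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad"}
NEGATIONS = {"not", "no", "never", "cannot"}
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}


def extract_features(tweet):
    raw_tokens = tweet.split()
    tokens = [t.lower().strip(".,!?") for t in raw_tokens]
    words = [t for t in tokens if not t.startswith("#")]
    hashtags = [t[1:] for t in tokens if t.startswith("#")]

    # Emoticon feature: +1 positive, -1 negative, 0 if none is present
    if any(t in POSITIVE_EMOTICONS for t in raw_tokens):
        emoticon = 1
    elif any(t in NEGATIVE_EMOTICONS for t in raw_tokens):
        emoticon = -1
    else:
        emoticon = 0

    return {
        "negation": int(any(w in NEGATIONS for w in words)),
        "emoticon": emoticon,
        "pos_keywords": sum(w in POSITIVE_WORDS for w in words),
        "neg_keywords": sum(w in NEGATIVE_WORDS for w in words),
        "pos_hashtags": sum(h in POSITIVE_WORDS for h in hashtags),
        "neg_hashtags": sum(h in NEGATIVE_WORDS for h in hashtags),
    }
```

The resulting dictionary can be turned into a fixed-order list of numbers to form the feature vector passed to a classifier.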
EVALUATION

• After the pre-processing and feature extraction steps, a support vector machine is defined and trained on the resulting data.
• When tested on the keyword “GOOGLE”, it returned 82% positive and 18% negative sentiment.
• A CSV file is also generated, which stores the sentiment of each tweet along with the search word.
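The reporting step, i.e. turning per-tweet predictions into percentages and a CSV file, can be sketched as follows. The column names and the example tweets and labels are hypothetical and are not results from the slides.

```python
import csv
import io
from collections import Counter


def sentiment_percentages(labels):
    """Share of each predicted label, in percent."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in Counter(labels).items()}


def write_results(handle, search_word, tweets, labels):
    """Write one CSV row per tweet: search word, tweet text, predicted sentiment."""
    writer = csv.writer(handle)
    writer.writerow(["search_word", "tweet", "sentiment"])
    for tweet, label in zip(tweets, labels):
        writer.writerow([search_word, tweet, label])


# Hypothetical predictions for four tweets
tweets = ["tweet one", "tweet two", "tweet three", "tweet four"]
labels = ["positive", "positive", "negative", "positive"]
buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_results(buffer, "GOOGLE", tweets, labels)
percentages = sentiment_percentages(labels)
```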
(Screenshots: creating the test set, pre-processing of the test set, initializing the tweets)
FUTURE SCOPE
• Sarcastic comments are very difficult to identify. Tweets containing sarcasm give exactly the opposite result, because the literal wording differs from what the author means.
• The interpretation of a word changes with the context in which it is used. For example, ‘unpredictable’ in ‘unpredictable plot’ is negative in the context of a plot of land, whereas it is positive in the context of a movie’s plot. So it is important to relate the interpretation to the context of the tweet.
• Tweets that mix a native language with English are difficult to interpret.
• One way to improve accuracy is to train the system so that it derives the sentiment of a word from the entire tweet, i.e. if a word in the tweet has more than one meaning, the system compares all the meanings of the word and takes the one which best suits the sentiment of the entire tweet.
APPLICATIONS
• Sentiment analysis of tweets can be extended to any review-related website, for example product reviews (to understand a product’s popularity), movie reviews, etc.
• It is highly useful as a sub-component technology, for example in detecting antagonistic or heated language in emails, context-sensitive information detection, spam detection, etc.
• Determining consumer attitudes and trends for organizations is one of the major applications of sentiment analysis.
• Consumers can use sentiment analysis to research products or services before making a purchase.
• Marketers can use it to research public opinion of their company and products, or to analyse customer satisfaction.
CONCLUSION

• Various machine learning algorithms were used to perform sentiment analysis of tweets.
• A comparison of the different models and their performance (accuracy) was also obtained.
• Reducing the dataset through feature extraction enhanced the performance of the classifiers used and produced better results.
