ProjectFinalReport 2copies

Uploaded by

Akash Bhosale
Sentiment Analysis on Twitter

1. Introduction

Twitter is a popular social networking site where millions of users post tweets every day
about topics such as society, politics, sports, and entertainment. The standard syntax of a
tweet includes hashtags, retweets, and user mentions. Hashtags are words or phrases prefixed
with “#”, and a user mention refers to another Twitter user (a person, company, or brand) by
placing the “@” symbol before their username. Tweets thus help people understand how others
feel about ongoing events, government policies, sports tournaments, and so on. Brands can
analyse tweets to learn people’s sentiments towards their products.

The main motivation for Twitter trend analysis is to identify recent trends across the world
using big data and machine learning techniques. This helps to analyze what has happened in the
past and what may happen in the future. It also helps to track customer trends and interests,
especially what customers like, how they behave, and how these change over time.

Sentiment analysis has gained significant importance in recent years due to the explosive
growth of social media and the vast amount of user-generated content available. It allows
businesses, researchers, and decision-makers to gain a deeper understanding of the public's
perception and sentiment towards products, services, brands, or even broader topics such as
social issues and political events.

Department of Computer Science & Engineering, DYPCET. Page 1



1.1. Need of work

The proposed methodology consists of several steps, namely collecting static and real-time
tweets from Twitter and performing trend analysis on them. The tweets first need to be
pre-processed for further analysis. Various machine learning techniques are then applied to
these static and real-time tweets to analyse the trends.

Sentiment analysis helps users judge whether the information about a product is
satisfactory before they acquire it. The pre-processed tweets are run through different
machine learning algorithms, which reveal the polarity of the tweets. The algorithms used
were support vector regression, decision tree, random forest, and multinomial logistic
regression, of which decision tree showed the highest accuracy.

1.2. Problem Statement

To develop a real-time, end-to-end Twitter monitoring system that lets an enterprise
evaluate Twitter data to inform business decisions. The model is aimed at analyzing the
trending topics on Twitter using different machine learning algorithms, by collecting static
and real-time tweets from Twitter and extracting the unstructured tweet texts to perform
trend analysis.


1.3. Objectives

i. To classify the sentiment polarity of a text as positive, negative, or neutral. This
classification can be done at the sentence level, the document level, or the entity
and aspect level.
ii. To pre-process Twitter data and prepare it for training with machine learning
techniques. This step removes unnecessary noise such as slang, abbreviations, URLs,
and special characters from the text, which also reduces the size of the data set.
iii. To achieve high performance, a reduced data size, and more accurate results by
extracting features from sentiment words, attaching their polarity, and applying
machine learning techniques to the derived data set.
iv. To track all relevant Twitter content about a brand in real time, perform analysis
as topics or issues emerge, and detect anomalies with alerting. By monitoring brand
mentions on Twitter, brands can inform engagement and deliver better experiences
for their customers across the world.
v. To implement an interface that visualizes the sentiment distribution through bar
graphs, pie charts, histograms, and more.


2. Literature Review

Sentiment analysis of Twitter data is under constant study in the research community, as its
applications strongly influence how many industries work today. The main challenges for this
type of analysis are the variation in language and the complex structure of the extracted data.

As stated in the section above, sentiment analysis can be applied to politics. Tumasjan and
colleagues recognized its benefits for elections and used it to predict the results of the 2009
German federal election. They extracted approximately 100,000 tweets about the political
parties of that time and region and analysed them to derive sentiments. For this they used the
software LIWC2007 (Linguistic Inquiry and Word Count), which derives sentiments through
textual analysis. The results of this analysis were very similar to the actual election results.
Another interesting study was carried out by Dr Rajiv and colleagues. They applied sentiment
analysis in a new way, to improve responses in crisis situations. They collected data about the
2014 flood in Kashmir. Their data set consisted of almost 8490 tweets, to which the naïve Bayes
classification technique was applied. Their research showed that applying sentiment analysis
in such crisis situations could help the government save lives.


3. Software Requirement Specification (SRS)

The system aims to classify sentiment polarity from text, distinguishing between positive,
negative, and neutral sentiments. This classification can occur at the sentence level, document
level, or even at the entity and aspect level. The SRS emphasizes the need to preprocess Twitter
data effectively, ensuring the data is prepared for training using machine learning techniques.
This involves eliminating unnecessary noise, such as slang, abbreviations, URLs, and special
characters, thereby reducing the dataset's size while maintaining data integrity. After
preprocessing the data, to obtain strong performance and accurate results we used a Transformer
model from deep learning, which is trained on the VADER lexicon dataset. The SRS highlights the
importance of real-time tracking of relevant Twitter content pertaining to a brand. By
continuously monitoring brand mentions, the system can perform analysis as topics or issues
emerge, enabling brands to stay proactive and detect anomalies. This functionality empowers
brands to enhance customer engagement and deliver better experiences worldwide. For visualizing
the overall results we used different types of graphs from different Python libraries. The
sentiment analysis project's SRS combines the objectives of accurate sentiment classification,
efficient data preprocessing, high performance, real-time tracking, and intuitive visualization,
thereby facilitating the development of a powerful and comprehensive sentiment analysis solution.

3.1 Methodology

Dataset Collection

A major part of solving any problem with machine learning is obtaining a proper dataset for
training the model. This means gathering or identifying data that correlates with the outcomes
the system is meant to predict. To find the polarity of tweets, we also have to study natural
language processing. Acquiring the dataset is the first step in machine learning.


To build and develop machine learning models, you must first acquire the relevant dataset. This
dataset is made up of data gathered from multiple, disparate sources, which are then combined
in a consistent format. Dataset formats differ according to use case. For instance, a business
dataset will be entirely different from a medical dataset: a business dataset contains relevant
industry and business data, while a medical dataset contains healthcare-related data. You can
also create a dataset by collecting data through different Python APIs. Once the dataset is
ready, it must be stored in a file format such as CSV, HTML, or XLSX.

Collection of Dataset

The first part is the collection of datasets. This dataset will be made up of data collected
from various and different sources, which will then be integrated in the right way to produce a
dataset.

Data Pre-processing

The adjustments we apply to our data before feeding it to the algorithm are referred to as
pre-processing. Data pre-processing is a technique for converting raw data into a clean data
collection. In other words, whenever data is acquired from various sources, it arrives in a raw
format that is not suitable for analysis.
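As a minimal sketch of this kind of cleaning, assuming the noise to remove is URLs, user mentions, the "#" symbol, and special characters, the step could look as follows. The function name `clean_tweet` and the sample tweet are hypothetical and are not taken from the project code.

```python
import re

def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, '#' symbols and special characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove user mentions
    text = text.replace("#", "")                         # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # drop remaining special characters
    return re.sub(r"\s+", " ", text).strip()             # collapse repeated whitespace

# Hypothetical sample tweet, used only for illustration.
sample = "Loving the new #MotoEdge40!! Thanks @motorola https://example.com/x"
print(clean_tweet(sample))
```

Keeping the hashtag word while dropping the "#" symbol is a design choice in this sketch; a pipeline could equally remove hashtags entirely.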

Prepare the Data

Data preparation involves exploratory data analysis and visualization, followed by statistical
modelling, and checks that the data contains no obvious errors. Data pre-processing is the
process of transforming the raw data into an understandable format.

Data Visualization

Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization techniques provide an accessible way
to see and understand trends, outliers, and patterns in data. Data visualization provides an
important suite of tools for gaining a qualitative understanding. This is helpful when exploring
a dataset, and can help with identifying patterns, corrupt data, outliers, and much more.

Dataset Splitting

Data splitting is when data is divided into two or more subsets. Typically, with a two-part
split, one part is used to train the model and the other to evaluate or test it. Data splitting
is an important aspect of data science, particularly for creating models based on data. This
technique helps ensure that data models, and processes that use them such as machine learning,
are accurate. In this project we use 70% of the data for training and 30% for testing.
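The 70/30 split can be sketched with the standard library alone. This is an illustration under assumptions: the helper `split_dataset`, the fixed seed, and the placeholder tweets are hypothetical, and the project may well have used a library routine instead.

```python
import random

def split_dataset(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    rows = list(rows)                      # copy so the caller's data is not reordered
    random.Random(seed).shuffle(rows)      # seeded shuffle for repeatable splits
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Placeholder data standing in for 100 collected tweets.
tweets = [f"tweet {i}" for i in range(100)]
train, test = split_dataset(tweets)
print(len(train), len(test))
```

Shuffling before cutting avoids a split that simply follows the order in which the tweets were collected.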

Modelling

The process of modelling means training a machine learning algorithm to predict the labels
from the features, tuning it for the business need, and validating it on holdout data. A machine
learning model is built by learning and generalizing from training data, then applying that
acquired knowledge to new data it has never seen before to make predictions. We used the
sentiment module from the Natural Language Toolkit (NLTK) library to decide the polarity of
tweets.

AI based Text Comparison Module

This module is a tool that compares a given piece of text to a predefined reference text. To
achieve this goal, the project uses natural language processing techniques to analyze and
compare the texts at a semantic level. The script splits the predefined text and the user text
into lists of words, and then iterates through the words in the user text. If a word is found in
the predefined text, the score is increased by 1. The script then calculates the percentage of
similarity by dividing the total score by the length of the predefined text (its number of
words) and multiplying the result by 100. Finally, the script prints the percentage of
similarity.
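The scoring steps described above can be sketched as follows. The function name `similarity_percentage` and the two sample texts are hypothetical, and lowercasing the words before comparison is an added assumption not stated in the description.

```python
def similarity_percentage(reference_text: str, user_text: str) -> float:
    """Percentage of similarity: matched user words / reference length * 100."""
    reference_words = reference_text.lower().split()
    user_words = user_text.lower().split()
    if not reference_words:
        return 0.0                      # avoid dividing by zero on an empty reference
    score = 0
    for word in user_words:
        if word in reference_words:     # a user word also present in the reference
            score += 1
    return score / len(reference_words) * 100

# Hypothetical sample texts, used only for illustration.
reference = "twitter sentiment analysis with machine learning"
user = "sentiment analysis on twitter"
print(similarity_percentage(reference, user))
```

Because every matching word in the user text adds to the score, a user text that repeats reference words can score above 100; counting each distinct word once would cap the result.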


3.2 Algorithms

Support Vector Machine (SVM):

SVM is a supervised machine learning algorithm used for classification and regression tasks.
Its primary goal is to find an optimal hyperplane that separates data points into different classes
with the maximum margin. The margin is the distance between the hyperplane and the nearest
data points from each class. SVM can handle linear and non-linear data by using different
kernel functions to map the data into a higher-dimensional space.
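For the linearly separable case, the maximum-margin idea can be written down compactly. The notation below (weight vector w, bias b, labels y_i in {-1, +1}) is the standard textbook formulation and is introduced here for illustration; it is not taken from the project itself.

```latex
\begin{aligned}
&\text{Separating hyperplane:} && \mathbf{w}^{\top}\mathbf{x} + b = 0 \\
&\text{Distance to the nearest points (margin):} && \frac{1}{\lVert \mathbf{w} \rVert} \\
&\text{Training problem:} && \min_{\mathbf{w},\, b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,
\quad i = 1, \dots, n
\end{aligned}
```

Minimizing the norm of w is equivalent to maximizing the margin, which is why the widest separation between the classes is obtained.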

Recurrent Neural Network (RNN):

RNN is a type of neural network architecture that is designed to handle sequential data, where
the order of data points matters. Unlike traditional feedforward neural networks, RNNs have
connections between nodes that form directed cycles, allowing them to retain information
about previous inputs and make decisions based on the context. RNNs utilize recurrent
connections to propagate information through time, which enables them to process sequences
of varying lengths. This makes them suitable for tasks like natural language processing, speech
recognition, and time series analysis.
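The recurrence can be illustrated with a single hidden unit. This is a toy sketch, assuming scalar inputs and hand-picked weights (`w_x`, `w_h`, `b` are hypothetical values, not trained parameters): the hidden state h is updated as tanh(w_x * x + w_h * h + b) at each step.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0                      # initial hidden state
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# The same values in a different order lead to different final states.
print(rnn_forward([1.0, 0.0, 0.0]))
print(rnn_forward([0.0, 0.0, 1.0]))
```

The two calls receive the same values in a different order and end in different states, which is the order sensitivity described above.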

Logistic Regression:

Logistic Regression is a statistical model used for binary classification problems, where the
goal is to predict a binary outcome (e.g., true/false, yes/no). Despite its name, logistic
regression is a classification algorithm rather than a regression algorithm. It estimates the
probability of the binary outcome using a logistic function (also called the sigmoid function),
which maps any real-valued input into a value between 0 and 1. The algorithm learns the
optimal coefficients for the input features to maximize the likelihood of the observed data.
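A small sketch of the prediction step, assuming coefficients that have already been learned. The weights, bias, and feature values below are made up for illustration only.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real value into (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical coefficients and features: z = 1.5 * 2.0 - 2.0 * 1.0 + 0.0 = 1.0
weights, bias = [1.5, -2.0], 0.0
p = predict_proba([2.0, 1.0], weights, bias)
label = "positive" if p >= 0.5 else "negative"
print(round(p, 4), label)
```

The 0.5 threshold used to turn the probability into a label is the usual default and can be moved when one kind of error is more costly than the other.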


4. Design

4.1. System Architecture Diagram

Fig 1: System Architecture Diagram


Fig 1 shows the system architecture of the proposed sentiment analysis system and the workflow of the
model. First, the model is trained using a dataset containing positive and negative tweets. Feature
extraction refers to the process of transforming raw data into numerical features that can be processed
while preserving the information in the original data set. After the feature extraction phase, sentiment
analysis is performed on the pre-processed dataset using different machine learning algorithms. Finally,
the system outputs the classified tweets with their sentiment scores.
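One common way to turn text into numerical features is a bag-of-words count vector. The sketch below assumes this technique purely for illustration; the report does not say which feature extraction method was used, and the helper names and sample documents are hypothetical.

```python
from collections import Counter

def build_vocabulary(documents):
    """Assign each distinct word a column index, in alphabetical order."""
    words = sorted({word for doc in documents for word in doc.split()})
    return {word: index for index, word in enumerate(words)}

def vectorize(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.split())
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:           # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector

# Hypothetical pre-processed tweets.
docs = ["good phone good camera", "bad battery"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```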

5. Implementation And Coding

5.1. Language Used

Python:

Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code
readability with the use of significant indentation. Python is dynamically typed and garbage-collected.
It supports multiple programming paradigms, including structured (particularly procedural),
object-oriented, and functional programming.

5.2. Technology Used

Machine Learning:

Machine learning is a growing technology that enables computers to learn automatically from past
data. It uses various algorithms to build mathematical models and make predictions from historical
or current data.

Sentiment Analysis

The sentiment analysis process was applied to the preprocessed tweets to gain insights into the
sentiment expressed in the textual content. Two methods, VADER and Transformers, were utilized
in this analysis to capture different aspects of sentiment. The following is an overview of these
methods and how they were incorporated into the sentiment analysis process:


1. VADER (Valence Aware Dictionary and sEntiment Reasoner): VADER is a rule-based
sentiment analysis tool widely used for sentiment analysis of social media texts. It incorporates a
lexicon that maps words to sentiment scores, indicating the positivity, negativity, or neutrality of
the word. VADER calculates a compound sentiment score by aggregating the individual
sentiment scores of the words in a given text. This compound score represents the overall
sentiment expressed in the text.

2. Incorporation of Custom Word Dictionary: In addition to VADER's built-in lexicon, a custom
word dictionary was incorporated to enhance the sentiment analysis process. The custom word
dictionary allowed the inclusion of domain-specific or context-specific words and their associated
sentiment scores. By updating VADER's lexicon with the custom word dictionary, the sentiment
analysis was tailored to the specific domain or context of the tweets, leading to more accurate
sentiment classification.

3. Transformers: Transformers is a powerful natural language processing (NLP) library that
includes pre-trained models for various NLP tasks, including sentiment analysis. In this project,
a pre-trained sentiment analysis model from Transformers was utilized. This model leverages
deep learning techniques and contextual embeddings to analyze the sentiment expressed in the
tweets. It provides a sentiment label (e.g., positive, negative, or neutral) and a sentiment score
indicating the confidence of the prediction.

4. Combined Sentiment Score and Label: To derive the final sentiment analysis results, the
sentiment scores and labels obtained from VADER and Transformers were combined. The
combined sentiment score was calculated by taking the average of the compound score from
VADER and the sentiment score from Transformers. The combined sentiment label was
determined based on this combined score. If the combined score was equal to or greater than 0,
the sentiment label was considered positive; otherwise, it was considered negative.
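The four steps above can be sketched without the actual libraries. Everything here is a simplified stand-in: the toy lexicon values are illustrative, the custom words are hypothetical, the normalization `x / sqrt(x^2 + 15)` is assumed to mirror the one used by the reference VADER implementation, and giving the Transformers confidence a negative sign for negative labels is an assumption the report does not spell out.

```python
import math

# Toy lexicon standing in for VADER's built-in one (values are illustrative).
base_lexicon = {"good": 1.9, "great": 3.1, "bad": -2.5, "slow": -1.0}
# Hypothetical domain-specific entries from a custom word dictionary.
custom_words = {"podium": 2.0, "dnf": -2.0}
lexicon = {**base_lexicon, **custom_words}   # custom entries extend the lexicon

def normalize(total: float, alpha: float = 15.0) -> float:
    """Squash a raw valence sum into the range (-1, 1)."""
    return total / math.sqrt(total * total + alpha)

def lexicon_compound(text: str, word_scores: dict) -> float:
    """Sum the word valences and normalize them (no VADER rules applied)."""
    total = sum(word_scores.get(word, 0.0) for word in text.lower().split())
    return normalize(total)

def signed_model_score(label: str, confidence: float) -> float:
    """Assumption: a negative label turns the confidence into a negative score."""
    return confidence if label == "positive" else -confidence

def combine(compound: float, model_score: float):
    """Average both scores; a combined score >= 0 is labelled positive."""
    combined = (compound + model_score) / 2
    return combined, ("positive" if combined >= 0 else "negative")

compound = lexicon_compound("great podium but slow pitstop", lexicon)
model_score = signed_model_score("positive", 0.9)   # hypothetical model output
combined, label = combine(compound, model_score)
print(round(compound, 4), round(combined, 4), label)
```

Note that with a threshold at 0 there is no neutral band, so a tweet with no sentiment-bearing words is labelled positive under this rule.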


5.3. Dataset Used

Dataset 1: Redbull Racing

The dataset on Redbull Racing consists of 100 text snippets related to the Redbull Racing team and their
activities. The dataset provides insights into various aspects of Redbull Racing, such as partnerships,
content creation, web news, motorsports, and other related topics. Each snippet in the dataset is labelled
with sentiment polarity, indicating whether the sentiment expressed is positive, negative, or neutral.

Sample Data from the Redbull Racing Dataset:

"Ysten Labs SUI Network partners with Red Bull Racing team" - Sentiment: Positive

"Do you see your favourite content creators running HR races and want to get in on the action but are a
bit nervous to start? Fear not! I've got some tips and advice on how to get started in this piece I wrote
for..." - Sentiment: Positive

"Today in web, your daily dose of web news: Getty Images launching second NFT collection, SUI collabs
with Redbull Racing, Gamestop joins forces with Telos, OpenAI CTO falls to a Twitter hack, Magic
raises $1 million in funding led by PayPal Ventures" - Sentiment: Positive

"The greatest days in motorsports sure impressed this year, and how about that Penske power!" -
Sentiment: Positive

"Friday's recap: Soon to be integrated with partners with the racing team, United States dollar-pegged
stablecoin to launch a new native version of USDC on..." - Sentiment: Positive

This dataset provides a diverse range of text snippets, allowing for sentiment analysis and further
exploration of Redbull Racing-related topics. The sentiment labels can be used to train models or analyze
sentiment trends and patterns related to Redbull Racing.


Dataset 2: Inflation

The dataset on inflation contains 100 text snippets discussing various aspects of inflation, such as its
causes, effects, economic implications, government policies, and related topics. Each snippet in the
dataset is labelled with sentiment polarity, indicating whether the sentiment expressed is positive,
negative, or neutral.

Dataset 3: Moto Edge 40

The dataset on Moto Edge 40 consists of 100 text snippets related to the Moto Edge 40 smartphone
model. The snippets may include discussions about the phone's features, specifications, user reviews,
comparisons with other models, and more. Each snippet in the dataset is labeled with sentiment polarity,
indicating whether the sentiment expressed is positive, negative, or neutral.

Dataset 4: IPL Final

The dataset on IPL Final includes 100 text snippets related to the Indian Premier League (IPL) cricket
tournament's final match. The snippets may cover various aspects of the final, such as team performances,
player performances, match highlights, fan reactions, and more. Each snippet in the dataset is labeled
with sentiment polarity, indicating whether the sentiment expressed is positive, negative, or neutral.

These datasets can be used for sentiment analysis, text classification, trend analysis, or any other relevant
analysis related to the respective topics. The sentiment labels enable the identification of sentiment
patterns and trends within the datasets, aiding in understanding public opinion and sentiment towards the
specific topics of interest.

5.4. Interface Implementation

The sentiment analysis project provides a user-friendly interface built with Gradio, which lets users
upload the tweet dataset file and the custom word dictionary file. The interface runs the sentiment
analysis, generates the output files ("sentiment_analysis_tweets.csv" and
"sentiment_analysis_score.csv"), and provides visualizations such as bar graphs, pie charts, and
histograms of the sentiment distribution of the analyzed tweets. This interface improves the usability
and accessibility of the sentiment analysis tool, enabling users to perform sentiment analysis tasks
efficiently and gain insights into the sentiment patterns within the dataset.

5.5. Hardware & Software Requirements

System Requirements

Hardware interface:

Device: Laptop, Desktop Computer

Processor: Core i3 3rd Gen (minimum) and above

RAM: 4 GB (minimum) and above

Hard disk: 100 GB (minimum) and above

Software Requirements:

Operating System: Windows

Platforms: Google Colab, PyCharm

Dataset: CSV

Languages: Python, Python Libraries


5.6. Screenshots


OUTPUTS:

Fig 1. Pie Chart (WTC Final)
Fig 2. Histogram (WTC Final)
Fig 3. Area Chart (WTC Final)
Fig 4. Bubble Chart (WTC Final)
Fig 5. Bar Plot (WTC Final)
Fig 6. Scatter Plot (WTC Final)

The screenshots above give a detailed comparison of the model's output. Fig 1 shows the output as a
pie chart, where the blue portion indicates positive tweets and the orange portion negative tweets.
Fig 2 shows a histogram of the overall sentiment score distribution. Fig 3 is an area chart showing
the range over which the sentiment scores are distributed. Fig 4 is a bubble chart that represents
positive tweets with green dots and negative tweets with red dots. Fig 5 is a simple bar chart giving
a broad view of the output. Fig 6 is a scatter plot that likewise represents positive sentiment
scores with green dots and negative scores with red dots.


6. Model Testing

Table of Accuracy:

Test Data 1: “I Used oneplus 10r my experience was good but Nowadays oneplus going down.”

Here is the table of accuracy results with the method used for each model:

Model                  Method           Positive   Negative   Neutral
Logistic Regression    NLP              0.0010     0.2613     0.0010
SVM                    NLP              0.0557     0.7047     0.0557
RNN                    Deep Learning    0.0834     0.0344     0.8820
Transformers           Deep Learning    0.2444     0.7555     -

Table no. 1 Accuracy Comparison Table (Test Data 1)


Test Data 2: “whine all you want but ChatGPT is genuinely useful for us students. lol.”

Model                  Method           Positive   Negative   Neutral
Logistic Regression    NLP              0.0010     0.2613     0.0010
SVM                    NLP              0.1854     0.2072     0.1854
RNN                    Deep Learning    0.083      0.0147     0.9015
Transformers           Deep Learning    0.4537     0.5462     -

Table no 2 Accuracy Comparison Table (Test Data 2)

Note: The accuracy for every model is different and may require further.

Here we compared different NLP and deep learning methods on two test inputs. Table no. 1 shows the
results on Test Data 1 and Table no. 2 on Test Data 2. From this comparison we can say that the
Transformers model, a deep learning method, gives more accurate results than RNN, SVM, and
Logistic Regression.
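One way to read the per-class scores in these tables is to take the class with the highest score as each model's prediction. The sketch below applies that reading to the Test Data 1 values from Table no. 1; treating the highest score as the predicted label is an assumption, and the helper name `predicted_label` is hypothetical.

```python
def predicted_label(scores: dict) -> str:
    """Return the class whose score is highest."""
    return max(scores, key=scores.get)

# Scores copied from Table no. 1 (Test Data 1).
test_data_1 = {
    "Logistic Regression": {"Positive": 0.0010, "Negative": 0.2613, "Neutral": 0.0010},
    "SVM": {"Positive": 0.0557, "Negative": 0.7047, "Neutral": 0.0557},
    "RNN": {"Positive": 0.0834, "Negative": 0.0344, "Neutral": 0.8820},
    "Transformers": {"Positive": 0.2444, "Negative": 0.7555},
}

predictions = {model: predicted_label(scores) for model, scores in test_data_1.items()}
for model, label in predictions.items():
    print(f"{model}: {label}")
```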


6.1 Result Analysis

The sentiment analysis was performed on the preprocessed tweets using the combined approach of
VADER and Transformers. The results provide valuable insights into the sentiment expressed in the
collected tweets. The following summarizes the key findings and analysis of the sentiment analysis
results:

1. Distribution of Sentiment Labels: Visualizations such as bar graphs, pie charts, and histograms
were employed to showcase the distribution of sentiment labels derived from the sentiment
analysis. The bar graph displays the count of positive, negative, and neutral sentiment labels,
allowing us to understand the overall sentiment distribution. The pie chart provides a visual
representation of the proportion of each sentiment label, highlighting the dominant sentiment.
Additionally, the histogram depicts the distribution of the combined sentiment scores, enabling
us to identify the sentiment trends across the dataset.

2. Sentiment Patterns and Trends: Analyzing the sentiment analysis results revealed several
notable findings and trends. By examining the distribution of sentiment labels, we observed the
dominant sentiment prevailing within the collected tweets related to the given topic. Furthermore,
analyzing the combined sentiment scores allowed us to identify the intensity and polarization of
sentiment. This analysis shed light on whether the sentiment expressed in the tweets was
predominantly positive, negative, or neutral and provided insights into the overall sentiment
tendencies within the dataset.

3. Insights into Tweet Sentiment: The sentiment analysis results offer valuable insights into the
sentiment of the collected tweets related to the given topic. Through the sentiment labels and
combined sentiment scores, we gained a deeper understanding of the public opinion and
sentiment surrounding the topic. By examining the positive and negative sentiment labels, we
identified the aspects or factors that elicit positive or negative sentiment among Twitter users.
These insights can contribute to understanding public sentiment, guiding decision-making, and
identifying areas for improvement or further investigation related to the given topic.


The visualizations and analysis of the sentiment analysis results provide a comprehensive overview
of the sentiment expressed in the collected tweets. By examining the distribution of sentiment labels
and combined sentiment scores, notable patterns and trends can be identified, allowing for a deeper
understanding of public sentiment regarding the given topic. The insights gleaned from the sentiment
analysis results serve as a valuable resource for decision-makers, researchers, and stakeholders
seeking to comprehend the public opinion landscape and make data-driven decisions.
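The numbers behind such charts can be computed before any plotting. The sketch below, with made-up labels and scores, counts the labels (the data for a bar graph or pie chart) and bins the combined scores into equal-width intervals (the data for a histogram); the helper names and the choice of four bins over the range -1 to 1 are assumptions.

```python
from collections import Counter

def label_counts(labels):
    """Count how many tweets carry each sentiment label."""
    return Counter(labels)

def score_histogram(scores, bins=4, low=-1.0, high=1.0):
    """Count the scores falling into each of `bins` equal-width intervals."""
    width = (high - low) / bins
    counts = [0] * bins
    for score in scores:
        index = int((score - low) / width)
        index = max(0, min(index, bins - 1))   # clamp the edges into the outer bins
        counts[index] += 1
    return counts

# Made-up results, used only to illustrate the calculation.
labels = ["positive", "negative", "positive", "positive"]
scores = [-0.8, -0.1, 0.2, 0.6, 0.9, 1.0]
print(label_counts(labels))
print(score_histogram(scores))
```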

6.2 Validation

To validate our project work, we conducted a comparison of our sentiment analysis model with various
existing models available on the internet. We utilized the same topics from Twitter and gathered results
from different models. Here are the outcomes:

The Twitter-roBERTa-base sentiment analysis model yielded a positive sentiment rate of 78.66%.

The MonkeyLearn sentiment analysis model produced a positive sentiment rate of 82.2%.

Our sentiment analysis model achieved a positive sentiment rate of 89.0%.

Through this evaluation, our model demonstrated superior performance, surpassing the accuracy of the
other models. These results indicate the effectiveness and reliability of our sentiment analysis approach.
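For reference, a positive sentiment rate is the share of tweets that a model labels as positive. A minimal sketch of that calculation, using a made-up list of labels and a hypothetical helper `positive_rate`:

```python
def positive_rate(labels) -> float:
    """Percentage of labels that are 'positive'."""
    if not labels:
        return 0.0
    positives = sum(1 for label in labels if label == "positive")
    return 100 * positives / len(labels)

# Made-up predictions for four tweets.
example_labels = ["positive", "positive", "negative", "positive"]
print(positive_rate(example_labels))
```

This rate describes how a model's predictions are distributed; measuring accuracy additionally requires comparing each prediction against a known correct label.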


6.3 Comparative Analysis

Model Name             Confidence (WTC Final)
VADER                  62.0%
MonkeyLearn            54.9%
VADER+TRANSFORMERS     81.2%

Table No. 3 Comparative analysis WTC Final Dataset

Model Name             Confidence (Android 13)
VADER                  56.0%
MonkeyLearn            41.8%
VADER+TRANSFORMERS     79.6%

Table No. 4 Comparative analysis Android 13 Dataset


Table no. 3 and Table no. 4 present comparative analysis results for two datasets, the WTC Final
dataset and the Android 13 dataset respectively. The confidence scores of three sentiment analysis
models, namely VADER, MonkeyLearn, and VADER+TRANSFORMERS, are reported for each dataset.

For the WTC Final dataset, the VADER model achieved a confidence score of 62.0%, while
MonkeyLearn obtained a slightly lower score of 54.9%. On the other hand, the combined approach of
VADER+TRANSFORMERS demonstrated a significantly higher confidence score of 81.2%. These
results indicate that VADER+TRANSFORMERS outperformed both VADER and MonkeyLearn in
analysing sentiment for the WTC Final dataset.

Moving on to the Android 13 dataset, the VADER model attained a confidence score of 56.0%.
MonkeyLearn, on the other hand, achieved a lower score of 41.8%. Similar to the previous dataset, the
VADER+TRANSFORMERS approach showcased superior performance with a confidence score of
79.6%. Hence, once again, VADER+TRANSFORMERS emerged as the top-performing model for
sentiment analysis in the context of the Android 13 dataset.


7. Conclusion

In conclusion, this project aimed to analyze the sentiment expressed in tweets related to a given topic
using a combined approach of VADER and Transformers sentiment analysis methods. The key
findings and outcomes of the project are summarized below:

1. Key Findings: The sentiment analysis of the collected tweets provided valuable insights into the
sentiment expressed by Twitter users regarding the given topic. By analyzing the distribution of
sentiment labels and combined sentiment scores, we gained a comprehensive understanding of
the overall sentiment tendencies within the dataset. The visualizations showcased the prevalence
of positive, negative, and neutral sentiments, allowing us to identify sentiment patterns and
trends.

2. Effectiveness and Limitations: The sentiment analysis approach employed in this project
utilizing both VADER and Transformers demonstrated effectiveness in capturing and analyzing
sentiment in the tweets. The incorporation of a custom word dictionary enhanced the sentiment
analysis by refining the sentiment scores generated by VADER. However, it is important to
acknowledge the limitations of sentiment analysis, such as the reliance on textual data and the
inherent challenges in accurately capturing nuanced sentiment. While the approach provided
valuable insights, there may be cases where the analysis falls short in fully capturing the
complexity and context of sentiment expressed in the tweets.

3. Areas of Improvement and Future Work: There are several potential areas of improvement
and avenues for future work in this project. Firstly, expanding the dataset by collecting more
tweets or exploring multiple data sources can provide a broader understanding of public
sentiment. Additionally, fine-tuning the sentiment analysis models, incorporating domain-
specific lexicons, or exploring advanced natural language processing techniques may further
improve the accuracy and granularity of sentiment analysis results.


