Sentiment Analysis Task on Twitter Data

This research paper evaluates the performance of BERT for sentiment analysis on Twitter data, benchmarking it against traditional methods like Logistic Regression and Support Vector Machines. The study utilizes the SemEval 2017 Tweets dataset, demonstrating that BERT outperforms the baseline models in precision, recall, and F1-score. The findings suggest that BERT is a suitable choice for analyzing short text sequences typical of Twitter, despite its limitations with longer texts.


Text Mining [2020]

Final Assignment

Abhishek Akshat (s2581418)

[email protected]

Leiden University, Leiden, NL

Abstract. In this research paper, we use BERT (Bidirectional Encoder
Representations from Transformers) and benchmark its performance
against current methods on the Sentiment Analysis task for Twitter data.

Keywords: Sentiment Analysis · Twitter · BERT

1 Introduction

Determining the polarity of a text to identify its sentiment has become an
important task in Natural Language Processing (NLP) and data science. The
field is attracting growing interest because of the large volume of social
media messages. Tweets in particular have become an integral part of the
lives of people who want to express their views concisely on Twitter, within
its 140-character limit. Various NLP techniques are applied to these tweets
to gather more information about people’s sentiment. In this research, we are
interested in one such technique, called Sentiment Analysis.

Pang and Lee (2008) [1] define Sentiment Analysis as follows: “Sentiment
analysis is the process to identify and analyze polarity from short texts,
sentences, and documents.”

According to them, sentiment seems to require a higher level of understanding
than topic-based classification alone. Sentiment analysis has many uses in
the political and social sciences as well as in business. Companies can use
it to measure customers’ satisfaction with their products and improve those
products based on customer opinions, so as to provide better services in the
future.

In this research paper, we are going to use BERT (Bidirectional Encoder Rep-
resentations from Transformers) [2]. We will benchmark BERT against current
methods to perform the Sentiment Analysis task on the Twitter data.

2 Related Work

Sentiment analysis has become an active research field with the growing
volume of text messages from social media and blog posts. Pang and Lee
(2008) [1] give a comprehensive overview of prior work, describing the
approaches and techniques for opinion-oriented information retrieval. Pak and
Paroubek (2010) [4] collected tweets using the Twitter API and built a corpus
in which each tweet was annotated by its emoticons. They trained and tested a
Multinomial Naïve Bayes classifier on this corpus, using N-grams and POS tags
as features. Parikh and Movassate (2009) [5] used a Naïve Bayes bigram model
to classify tweets and compared it with a Maximum Entropy model. They
concluded that the Naïve Bayes model performs much better than the Maximum
Entropy model. In contrast, Go, Bhayani and Huang (2009) [6] report that SVM
outperforms other models; their feature space consisted of unigrams, bigrams
and POS tags. Cliche (2017) described a Twitter sentiment classifier based on
Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM)
networks. His classifier used large volumes of unlabeled data to pre-train
word embeddings. A subset of the unlabeled data was then used to refine the
embeddings through distant supervision. Finally, the CNNs and LSTMs were
trained on the SemEval-2017 Twitter dataset, where the word embeddings were
refined again. To improve the performance of the classifier, several CNNs and
LSTMs were combined.

3 Data

For this research paper, we used the SemEval 2017 Tweets dataset [3]. Data
collected from Twitter is particularly useful for Sentiment Analysis for the
following reasons:

– Micro-blogging platforms such as Twitter are used by people from a
variety of backgrounds to convey their views on diverse topics, making tweets
a valuable source of public opinion.
– Twitter contains a large number of tweets that grows every day, so the
gathered data set can be made arbitrarily large.
– The user base of Twitter is diverse, ranging from regular users to business
executives, celebrities, politicians, presidents and prime ministers. Hence,
it is possible to gather tweets from users in diverse social, political and
interest groups.
– Twitter’s user base also includes people from different countries.

We explore the data in the following sections.


3.1 Data Description

The dataset consists of 11 .tsv (tab-separated values) files. We merged all
of them into one large file to simplify data pre-processing and the
subsequent tasks.

Fig. 1. Data Sample

A sample of our Twitter data is shown in Figure 1. Each file in the dataset
has 3 columns, namely

– Item ID
– Sentiment (Positive, Neutral, Negative)
– Sentiment Text (Tweet)
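
The merging step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used for the experiments; the function name `merge_tsv_files` and the column names in `COLUMNS` are hypothetical, and it assumes that every input line holds the three tab-separated fields listed above.

```python
from pathlib import Path

# Hypothetical column names for the three fields described in the text.
COLUMNS = ("item_id", "sentiment", "text")


def merge_tsv_files(input_dir, output_path):
    """Concatenate every .tsv file in input_dir into one tab-separated file.

    Returns the number of data rows written (the header is not counted).
    """
    rows = []
    for path in sorted(Path(input_dir).glob("*.tsv")):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                # Skip lines that do not carry all three fields.
                if len(fields) >= 3:
                    rows.append(fields[:3])

    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write("\t".join(COLUMNS) + "\n")
        for fields in rows:
            handle.write("\t".join(fields) + "\n")
    return len(rows)
```

Writing the merged file outside `input_dir` keeps it from being picked up again if the function is re-run.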

4 Method

In this section, we describe the methods and experiments performed. We used a
pre-trained general BERT model to perform the Sentiment Analysis task. To
evaluate and compare its performance against current methods, we chose
Logistic Regression and Support Vector Machines as our baseline methods.

4.1 Data Pre-processing

The data was collected from Twitter; hence, it contains noise and needs to be
pre-processed before it can be used to train our models. We therefore
performed several cleaning steps.
We first removed all NaN values and then removed stop-words from the tweets.
We also converted the text to lowercase and removed punctuation. Such text
normalization is important for noisy text like Twitter data.
After cleaning, the final dataset contained 48302 tweets. We then divided the
dataset into training and testing sets, using 80% of the data for training
and the remaining 20% for testing.
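
The cleaning steps and the 80/20 split described above can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: the stop-word list is a tiny placeholder (the text does not say which list was used), the function names are hypothetical, and the order of the steps is one reasonable choice.

```python
import random
import string

# Placeholder stop-word list; an assumption for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}

# Translation table that deletes every ASCII punctuation character.
# Note that this also strips the '@' and '#' symbols common in tweets.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Lowercase a tweet, remove punctuation, and drop stop-words."""
    text = text.lower().translate(_PUNCTUATION_TABLE)
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(tokens)


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def preprocess(tweets):
    """Drop missing or empty tweets, then clean the remaining ones."""
    return [clean_tweet(tweet) for tweet in tweets if tweet]
```

With 48302 items and `test_fraction=0.2`, `split_train_test` yields 38642 training and 9660 testing items.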

Fig. 2. Sentiment Distribution

The distribution of tweets with respect to polarity is shown in Figure 2. It
shows that polarity is almost evenly distributed across the dataset used for
the experiments in this paper.
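
A class distribution like the one in the figure can be computed directly from the label column. The sketch below is illustrative; the function name `sentiment_distribution` is hypothetical and the labels in the usage line are made-up examples, not the dataset's counts.

```python
from collections import Counter


def sentiment_distribution(labels):
    """Return the share of each sentiment label as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Usage with made-up labels:
shares = sentiment_distribution(["positive", "neutral", "negative", "neutral"])
```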

4.2 Experiment

The BERT model used is a 12-layer, 768-hidden, 12-head, 110M-parameter
neural network architecture, as shown in Figure 3. This pre-trained BERT was
trained on English Wikipedia (2,500M words) and BooksCorpus (800M words).

Fig. 3. BERT Model

We chose ‘CategoricalCrossEntropy’ as our loss function,
‘SparseCategoricalAccuracy’ as our accuracy metric and ‘Adam’ as our
optimizer for configuring the BERT model. We fine-tuned the model for 5
epochs on the training dataset and then evaluated its performance on the
testing data.
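
To make the loss and the metric concrete, the sketch below computes both in plain Python for the three sentiment classes. It shows what the two quantities measure and is not the training code; the function names are hypothetical. It assumes the usual conventions in which categorical cross-entropy takes one-hot targets and sparse categorical accuracy takes integer class indices, so the helper `to_one_hot` converts between the two label formats.

```python
import math


def to_one_hot(class_indices, num_classes):
    """Convert integer labels such as [0, 2] into one-hot rows."""
    return [
        [1.0 if k == label else 0.0 for k in range(num_classes)]
        for label in class_indices
    ]


def categorical_cross_entropy(one_hot_targets, predicted_probs, eps=1e-12):
    """Mean over the batch of -sum_k y_k * log(p_k)."""
    total = 0.0
    for target, probs in zip(one_hot_targets, predicted_probs):
        # eps guards against log(0) for zero-probability predictions.
        total -= sum(y * math.log(max(p, eps)) for y, p in zip(target, probs))
    return total / len(one_hot_targets)


def sparse_categorical_accuracy(class_indices, predicted_probs):
    """Fraction of examples whose arg-max prediction equals the integer label."""
    hits = 0
    for label, probs in zip(class_indices, predicted_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        hits += int(predicted == label)
    return hits / len(class_indices)


# Usage: two tweets, three classes (0 = negative, 1 = neutral, 2 = positive).
labels = [0, 1]
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
loss = categorical_cross_entropy(to_one_hot(labels, 3), probs)
accuracy = sparse_categorical_accuracy(labels, probs)
```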

For baseline comparison, we used two traditional machine learning algorithms,
namely Logistic Regression and Support Vector Machines. We trained all our
baseline models with vectorization strategies such as bag-of-words, TF-IDF
and Word2Vec, and we trained and evaluated them on the same training and
testing sets as BERT.
We recorded the Precision, Recall and F1-Score for our models. The results of
the experiments are shown and explained in the next section.
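
Two of the vectorization strategies named above, bag-of-words and TF-IDF, can be sketched in plain Python. This is an illustration rather than the implementation used for the baselines: the function names are hypothetical, tokenization is a simple whitespace split, and the weighting uses idf = log(N / df), which is only one of several common TF-IDF variants.

```python
import math
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocabulary = sorted({token for doc in documents for token in doc.split()})
    index = {token: i for i, token in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        vector = [0] * len(vocabulary)
        for token, count in Counter(doc.split()).items():
            vector[index[token]] = count
        vectors.append(vector)
    return vocabulary, vectors


def tf_idf(count_vectors):
    """Weight raw term counts by idf = log(N / df)."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0]) if count_vectors else 0
    # Document frequency: number of documents containing each term.
    doc_freq = [
        sum(1 for vector in count_vectors if vector[j] > 0)
        for j in range(n_terms)
    ]
    return [
        [
            vector[j] * math.log(n_docs / doc_freq[j]) if doc_freq[j] else 0.0
            for j in range(n_terms)
        ]
        for vector in count_vectors
    ]


# Usage on three made-up, already cleaned tweets:
vocabulary, counts = bag_of_words(["good movie", "bad movie", "good good day"])
weights = tf_idf(counts)
```

The resulting vectors are what a classifier such as Logistic Regression or a Support Vector Machine would take as input features.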

5 Results

After training the models on a collection of 38642 tweets and evaluating them
on the test set of 9660 tweets, we recorded the results for our pre-trained
BERT, Logistic Regression and Support Vector Machines.

Model                     Precision   Recall   F1-Score

Pre-trained BERT          0.5188      0.4961   0.5476
Logistic Regression       0.4282      0.4115   0.4062
Support Vector Machines   0.4316      0.3972   0.3880

From the results table, we can see that pre-trained BERT performs better
than both baseline models on all metrics. Among the baselines, Logistic
Regression achieves higher recall and F1-score than Support Vector Machines,
although the SVM’s precision is marginally higher. SVMs tend to perform worse
on large datasets that contain a lot of noise; since Twitter data is noisy,
this may be the reason behind the SVM’s performance.
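
For a three-class task, precision, recall and F1-score have to be aggregated over the classes. The sketch below computes macro-averaged scores, that is, the unweighted mean of the per-class values. Macro averaging is an assumption made for illustration, since the text does not state which averaging scheme was used, and the function name `macro_scores` is hypothetical.

```python
def macro_scores(true_labels, predicted_labels):
    """Return macro-averaged (precision, recall, f1) over all classes."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    precisions, recalls, f1_scores = [], [], []
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Per-class F1 is the harmonic mean of precision and recall.
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Usage with made-up labels:
y_true = ["pos", "pos", "neg", "neu", "neu", "neg"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "neg"]
precision, recall, f1 = macro_scores(y_true, y_pred)
```

Because each per-class F1 is a harmonic mean, the macro F1-score computed this way never exceeds the mean of macro precision and macro recall.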

6 Conclusion

We successfully built a transformer network with a pre-trained BERT model
and achieved good results on the sentiment analysis of the Twitter dataset.
From the experiments performed, we can conclude that our pre-trained BERT
model performs better than baseline methods such as Logistic Regression and
Support Vector Machines. One limitation of BERT is that it cannot handle long
text sequences (its input is limited to 512 tokens). In our case, however,
the Twitter data consists mostly of short texts, which makes BERT a good
choice. This answers our research question: the pre-trained BERT method is
better than traditional baseline methods such as Logistic Regression and
Support Vector Machines.

References

1. Bo Pang and Lillian Lee (2008): “Opinion Mining and Sentiment Analysis”,
Foundations and Trends® in Information Retrieval: Vol. 2: No. 1–2, pp 1-135.
http://dx.doi.org/10.1561/1500000011
2. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova (2018):
“BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding”. https://arxiv.org/abs/1810.04805
3. SemEval-2017 Task 4. https://alt.qcri.org/semeval2017/task4/
4. Pak, Alexander and Paroubek, Patrick (2010): “Twitter as a Corpus for
Sentiment Analysis and Opinion Mining”. Proceedings of LREC. 10.
https://www.researchgate.net/publication/220746311_Twitter_as_a_Corpus_for_Sentiment_Analysis_and_Opinion_Mining
5. Parikh, Ravi and Movassate, Matin (2009): “Sentiment Analysis of
User-Generated Twitter Updates using Various Classification Techniques”.
https://www.researchgate.net/publication/242660794_Sentiment_Analysis_of_User-Generated_Twitter_Updates_using_Various_Classification_Techniques/citation/download
6. Go, Alec, Bhayani, Richa and Huang, Lei (2009): “Twitter sentiment
classification using distant supervision”. Processing. 150.
https://www.researchgate.net/publication/228523135_Twitter_sentiment_classification_using_distant_supervision
