A Review Paper on Use of BERT Algorithm in Twitter Sentiment Analysis
Abstract: Tweet sentiment analysis has long focused on conventional messages, such as film reviews and product reviews, while significant improvement has been made as deep learning has become widespread and comprehensive data sets (going well beyond emoticons and hashtags) have become accessible for training. Nevertheless, prior opinion-analysis experiments on tweets typically assume only two forms of global polarity (positive and negative) in their training/validation/test data sets. What is more, systems' judgments are not explicitly aligned with the specified appraisal objects. In this review, we discuss the BERT deep learning approach for Twitter sentiment analysis, and we describe how a BERT-based model is trained to obtain good accuracy results.
concerns are on the increase, and take steps before issues worsen. In real time, one can track brand references in posts on social networking sites with sentiment analysis and obtain valuable insights.
Posts related to particular brands on networking sites can be analyzed automatically in large numbers instead of manually. As the volume of networking-site information increases, sentiment analysis tools scale easily and continue to acquire useful insights.
Sentiment analysis also stops discrepancies resulting from human mistakes. Customer representatives may not always agree about which tag to apply to a given piece of information, so manual labeling can end up with incorrect results. Because machine learning techniques conduct sentiment analysis with a single consistent system of rules rather than many individual judgments, one can guarantee that all the information is reliably tagged.
III. METHODOLOGY
Sentiment analysis is a classification job: positive, negative, and neutral are the three classes in sentiment analysis. The analysis is performed using machine learning (ML) and natural language processing (NLP) methods. Within the narrow scope of this study, ML and information retrieval (IR) methods are used for the analysis.
The methodology consists of two main steps in general: the first is the dataset, and the second is the model utilized for sentiment analysis.
In this study, the dataset considered is a combination of information extracted from Twitter and existing data based on movie reviews. For the extraction of information from Twitter, the following steps need to be performed.
Step 1: Set up the Twitter application either in a web browser or on a smartphone and create an account. Usually, a web browser is highly recommended.
Step 2: Get developer access by creating an application on Twitter. Once the user gets developer access, Twitter provides a consumer key, consumer secret key, access token key, and access token secret key, which are crucial and private to the user for extracting information from Twitter.
Step 3: By importing the tweepy library, we can use its various functionalities to attain authentication.
Step 4: Search the tweets using various search words related to movies, together with a date, to obtain the data from that specific date up to the date and time of the data extraction.
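As a minimal sketch of Steps 2-4, the snippet below authenticates and collects movie-related tweets. It assumes Tweepy v4, where the search endpoint is named search_tweets, and the credential strings are placeholders for the four keys issued in Step 2.

    import tweepy

    # Placeholder credentials from the Twitter developer portal (Step 2)
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)  # Step 3: authenticated client

    # Step 4: collect recent English tweets matching a movie-related query
    tweets = [status.full_text
              for status in tweepy.Cursor(api.search_tweets,
                                          q="movie review", lang="en",
                                          tweet_mode="extended").items(100)]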
Step 5: The data obtained is considered raw data and is combined with the existing data, along with ratings, in a structured manner as per the existing data.
Step 6: The complete data needs to be processed using the nltk library by removing common words, punctuation, and stop words.
Step 7: The processed data is then considered for further analysis to classify the sentiments hidden in the reviews.
The data gathering discussed above is the major step in sentiment analysis. Once the data is gathered and processed, the remaining part is easier to analyze.
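A minimal sketch of the Step 6 clean-up with nltk follows; the stop-word list and tokenizer are standard nltk resources, while the clean_tweet helper name is our own.

    import string
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")      # tokenizer model
    nltk.download("stopwords")  # common-word list

    STOP = set(stopwords.words("english"))

    def clean_tweet(text):
        # Lowercase, tokenize, then drop stop words and punctuation (Step 6)
        tokens = word_tokenize(text.lower())
        return [t for t in tokens if t not in STOP and t not in string.punctuation]

    print(clean_tweet("The movie was surprisingly good, but too long!"))
    # ['movie', 'surprisingly', 'good', 'long']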
Now the main second step, the model utilized for sentiment analysis, is discussed below.
The model is a pre-trained neural network based on natural language processing and utilized for sentiment analysis; it is popularly known as BERT, which stands for "Bidirectional Encoder Representations from Transformers". In simple terms, it is a combination of a neural network and natural language processing. Natural language processing, also referred to as NLP, corresponds to a branch of AI that interacts with linguistics to enable computers to learn and understand the communications that humans naturally produce. The major applications of NLP are social listening, sentiment analysis, word suggestions, chatbots, and so on. BERT broke with the conventional way of training such models: training in this particular model is done bidirectionally (from first to last and again from last to first). A complete sequence of words is passed to the model, and the model is trained as mentioned previously. Instead of considering only the word that immediately precedes or follows a given word, BERT enables the language framework to understand word meaning from its full context.
Fig. 1: Overall pre-training and fine-tuning procedures for BERT

There are two steps in our framework: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks. Each downstream task has separate fine-tuned models, even though they are initialized with the same pre-trained parameters. The question-answering example in Figure 1 will serve as a running example for this section.
A distinctive feature of BERT is its unified architecture across different tasks. There is minimal difference between the pre-trained architecture and the final downstream architecture. BERT's model architecture is a multi-layer bidirectional Transformer encoder based on the original implementation described in Vaswani et al. (2017) and released in the tensor2tensor library. Because the use of Transformers has become common and our implementation is almost identical to the original, we will omit an exhaustive background description of the model architecture and refer readers to Vaswani et al. (2017) as well as excellent guides such as "The Annotated Transformer." A visualization of this construction can be seen in Figure 2.
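As a concrete handle on this encoder (using the Hugging Face transformers library as an assumed toolkit, not one named by the paper), the pre-trained BERTBASE encoder can be loaded and its configuration inspected:

    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-uncased")  # BERTBASE
    cfg = model.config
    print(cfg.num_hidden_layers, cfg.hidden_size, cfg.num_attention_heads)
    # 12 768 12  (L = 12 layers, H = 768 hidden size, A = 12 attention heads)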
BERTBASE was chosen to have the same model size as OpenAI GPT for comparison purposes. Critically, however, the BERT Transformer uses bidirectional self-attention, while the GPT Transformer uses constrained self-attention where every token can only attend to the context to its left.
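To make the contrast concrete, here is a small illustrative sketch (our own, not from the paper) of the two attention patterns in PyTorch: a bidirectional encoder lets every token attend to every position, while a left-to-right model applies a lower-triangular (causal) mask.

    import torch

    n = 5  # sequence length

    # Bidirectional (BERT-style): every token may attend to every position
    bidirectional_mask = torch.ones(n, n)

    # Constrained (GPT-style): token i may attend only to positions <= i
    causal_mask = torch.tril(torch.ones(n, n))

    print(causal_mask)
    # tensor([[1., 0., 0., 0., 0.],
    #         [1., 1., 0., 0., 0.],
    #         [1., 1., 1., 0., 0.],
    #         [1., 1., 1., 1., 0.],
    #         [1., 1., 1., 1., 1.]])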
To make BERT handle a variety of downstream tasks, our input representation is able to unambiguously represent both a single sentence and a pair of sentences in one token sequence. Throughout this work, a "sentence" can be an arbitrary span of contiguous text, rather than an actual linguistic sentence. A "sequence" refers to the input token sequence to BERT, which may be a single sentence or two sentences packed together. We use WordPiece embeddings (Wu et al., 2016) with a 30,000-token vocabulary. The first token of every sequence is always a special classification token ([CLS]). The final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks. Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP]). Second, we add a learned embedding to every token indicating whether it belongs to sentence A or sentence B. As shown in Figure 1, we denote the input embedding as E, the final hidden vector of the special [CLS] token as C ∈ R^H, and the final hidden vector for the i-th input token as T_i ∈ R^H. For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings.
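As an illustrative sketch with the transformers library (again our choice of toolkit), the encoding below shows the [CLS] and [SEP] tokens, the segment (sentence A/B) ids, and how WordPiece splits a rare word into sub-words.

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer("the movie was great", "i would watch it again")

    print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
    # ['[CLS]', 'the', 'movie', 'was', 'great', '[SEP]',
    #  'i', 'would', 'watch', 'it', 'again', '[SEP]']
    print(enc["token_type_ids"])  # segment ids: 0 = sentence A, 1 = sentence B
    # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

    print(tokenizer.tokenize("I have a new GPU!"))
    # ['i', 'have', 'a', 'new', 'gp', '##u', '!']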
[3] proposed a system that concentrated on finding a fast and interactive segmentation method for liver and tumor segmentation. In the pre-processing stage, a mean-shift filter is applied to the CT image, and a statistical thresholding method is applied to reduce the processing area while improving the detection rate. In the second stage, the liver region is segmented using the algorithm of the proposed method. [6] proposed a system in which a predicate is defined for measuring the evidence for a boundary between two regions using a geodesic graph-based representation of the image. The algorithm is applied to image segmentation using two different kinds of local neighborhoods in constructing the graph. Liver and hepatic tumor segmentation can be processed automatically by the geodesic graph-cut based method.
IV. PROPOSED METHODOLOGY
How the Proposed Methodology Works:
Data Collection:
The first step in any sentiment analysis project is to gather relevant data. In this case, a dataset of Twitter tweets is required, where each tweet is labeled with its corresponding sentiment (positive, negative, or neutral). There are publicly available datasets specifically designed for sentiment analysis tasks, or you can create a custom dataset by manually annotating tweets with sentiment labels.
Data Preprocessing:
Twitter data often contains noise in the form of hashtags, mentions, URLs, and special characters. Preprocessing is crucial to remove these elements and ensure that the data is cleaned and ready for analysis. Additionally, tokenization is applied to split the tweets into individual words or sub-words for further processing.
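A minimal sketch of this Twitter-specific clean-up follows, complementing the earlier nltk step; the regular expressions are a common convention rather than a recipe prescribed by the paper.

    import re

    def strip_twitter_noise(text):
        text = re.sub(r"http\S+|www\.\S+", "", text)  # URLs
        text = re.sub(r"@\w+", "", text)              # mentions
        text = re.sub(r"#", "", text)                 # keep hashtag words, drop '#'
        text = re.sub(r"[^A-Za-z0-9\s]", "", text)    # remaining special characters
        return re.sub(r"\s+", " ", text).strip()

    print(strip_twitter_noise("@critic Loved it!! #mustwatch https://t.co/xyz"))
    # Loved it mustwatch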
Fine-tuning the BERT Model:
BERT, a transformer-based language model, has shown exceptional performance in various NLP tasks, including sentiment analysis. However, pre-trained BERT models are not directly suitable for sentiment classification on Twitter data. Therefore, the BERT model needs to be fine-tuned using the labeled Twitter dataset.
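The sketch below shows one common way to set up such fine-tuning with the transformers library and PyTorch; the model name, the three-label mapping, and the hyperparameters are illustrative assumptions, not values prescribed by the paper.

    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # Pre-trained encoder plus a fresh classification head with three labels
    # (hypothetical ids: 0 = positive, 1 = negative, 2 = neutral)
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3)

    texts = ["what a great film", "worst movie ever", "it opens on friday"]
    labels = torch.tensor([0, 1, 2])

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    model.train()
    for _ in range(3):  # a few illustrative passes over one tiny batch
        out = model(**batch, labels=labels)  # out.loss is the cross-entropy loss
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()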
Tokenization:
Tokenization is a critical step to convert the preprocessed text into tokens that the BERT model can understand.
c. Stemming and Lemmatization: Stemming and lemmatization perform almost the same operation, but they differ minutely. Both reduce an existing word to its basic form, but stemming does not consider the meaning of the resulting word, whereas lemmatization does. It is therefore always essential that lemmatization be implemented along with stemming, as the sketch below illustrates.
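A small demonstration with nltk's PorterStemmer and WordNetLemmatizer makes the difference visible; the example words are our own.

    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet")  # lexical database consulted by the lemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "running"]:
        print(word, stemmer.stem(word), lemmatizer.lemmatize(word, pos="v"))
    # studies studi study   <- the stem is not a real word; the lemma is
    # running run run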
3. Extraction of features: The processed data needs to be passed into the proposed model to identify the features and their mapping for the identification of the various sentiments present in the dataset.
4. Identifying the keywords and classifying the movie ratings according to the recognized sentiments.
5. The training dataset and the testing dataset need to be passed into the proposed model in order to evaluate the model.
6. Finally, the validation dataset is passed to the trained and tested model to obtain predictions, i.e., the identification of polarized sentiments as positive or negative (see the inference sketch below).
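Continuing the fine-tuning sketch above, the trained model can be applied to unseen tweets as follows; the label mapping is the hypothetical one introduced earlier.

    model.eval()
    new = tokenizer(["that ending was incredible"],
                    padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**new).logits        # shape: (batch, 3)
    pred = logits.argmax(dim=-1).item()     # highest-scoring class index
    print({0: "positive", 1: "negative", 2: "neutral"}[pred])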
Fig. 3: Flowchart representing the proposed framework

Various Stages of Twitter Sentiment Analysis:
Twitter-based sentiment analysis consists of several crucial steps to attain the sentiments hidden in the text. Those steps can be represented as follows:
1. Gathering the information from the social networking site Twitter using keywords as well as tags.
2. Processing the obtained data into a structured format so that it is ready for further analysis.
3. Creating a model that extracts the hidden sentiments from the information obtained from Twitter.
4. Visualizing the results based on the classification of sentiment aspects, which is necessary for presenting the obtained results.
REFERENCES
1. Alsaeedi, Abdullah, and Mohammad Zubair Khan. "A study on sentiment analysis techniques of Twitter data." International Journal of Advanced Computer Science and Applications 10.2 (2019): 361-374.
2. Chen, Xin, et al. "A novel feature extraction methodology for sentiment analysis of product reviews." Neural Computing and Applications 31.10 (2019): 6625-6642.
3. Christo Ananth, D.L. Roshni Bai, K. Renuka, C. Savithra, and A. Vidhya. "Interactive Automatic Hepatic Tumor CT Image Segmentation." International Journal of Emerging Research in Management & Technology (IJERMT) 3.1 (January 2014): 16-20.
4. Haque, Tanjim Ul, Nudrat Nawal Saber, and Faisal Muhammad Shah. "Sentiment analysis on large scale Amazon product reviews." 2018 IEEE International Conference on Innovative Research and Development (ICIRD). IEEE, 2018.
5. Ireland, Robert, and Ang Liu. "Application of data analytics for product design: Sentiment analysis of online product reviews." CIRP Journal of Manufacturing Science and Technology 23 (2018): 128-144.
6. Christo Ananth, D.L. Roshni Bai, K. Renuka, A. Vidhya, and C. Savithra. "Liver and Hepatic Tumor Segmentation in 3D CT Images." International Journal of Advanced Research in Computer Engineering & Technology (IJARTET) 3.2 (February 2014): 496-503.
7. Kiran, M. Vamsee Krishna, et al. "User-specific product recommendation and rating system by performing sentiment analysis on product reviews." 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS). IEEE, 2017.
8. Kouloumpis, Efthymios, Theresa Wilson, and Johanna Moore. "Twitter sentiment analysis: The good the bad and the omg!" Fifth International AAAI Conference on Weblogs and Social Media. 2011.
9. Liang, Ruxia, and Jian-Qiang Wang. "A linguistic intuitionistic cloud decision support model with sentiment analysis for product selection in e-commerce." International Journal of Fuzzy Systems 21.3 (2019): 963-977.
10. Mostafa, M. M. "Clustering halal food consumers: A Twitter sentiment analysis." International Journal of Market Research 61.3 (2019): 320-337.
11. Mubarok, Mohamad Syahrul, Adiwijaya, and Muhammad Dwi Aldhi. "Aspect-based sentiment analysis to review products using Naïve Bayes." AIP Conference Proceedings. Vol. 1867. No. 1. AIP Publishing LLC, 2017.
12. Ramadhani, A. M., and H. S. Goo. "Twitter sentiment analysis using deep learning methods." Proc. 2017 7th International Annual Engineering Seminar. 2017.