An N-gram-Based BERT Model For Sentiment Classification Using Movie Reviews
Abstract—An abundance of product reviews and opinions is being produced every day across the internet and other media. Sentiment analysis analyzes those data and classifies them as positive or negative. In this paper, a classification model is proposed for n-gram sentiment analysis using BERT. Specifically, the large IMDB movie review dataset, which contains 50K instances, is used. This dataset is tokenized and encoded into unigrams, bigrams, and trigrams and their combinations: unigram and bigram; bigram and trigram; and unigram, bigram, and trigram. The proposed BERT model is applied to these extracted features. The model is then evaluated using the F1 score and its micro, macro, and weighted averages. The model shows comparable results to state-of-the-art methods for all n-gram features. In particular, the model achieves its highest accuracies with the combined features: 94.64% for bigram and trigram features, and 94.68% for unigram, bigram, and trigram features.
Index Terms—Sentiment classification, Deep learning, Transformers, BERT, N-gram features.
I. INTRODUCTION
Thanks to the explosion of social media, companies often have to deal with mountains of customer feedback. Therefore, sentiment analysis is useful for quickly gaining insights from large volumes of text data [1]. It also helps organizations to measure the ROI of their marketing campaigns and improve their customer service [2]. Since sentiment analysis offers organizations insights into their customers’ emotions, they can become aware of an emerging crisis in good time and manage it appropriately [3]. Many statistical models can be used to achieve this task [4], [5]. With the advancement in deep learning, neural network architectures have shown a decent improvement in performance in solving several natural language processing (NLP) tasks like language modeling, text classification, machine translation, etc. [6]. In 2018, Google introduced the transformer model [7], which has been used for transfer learning in various NLP tasks with state-of-the-art performance ever since. Transfer learning is a mechanism in which a deep learning model is trained on a large dataset and then used to perform similar tasks on another dataset. Such a model is known as a pre-trained model.
In this paper, the Bidirectional Encoder Representations from Transformers (BERT) model [8] is used. It has a large neural network architecture with a huge number of parameters. In practice, training a BERT model from scratch on a small dataset would lead to overfitting. Hence, a pre-trained BERT model that has already been trained on a huge dataset is used. This model is then fine-tuned on a relatively smaller dataset for the sentiment classification task.
Following the recognition and popularity of the BERT model, researchers have used it on various NLP tasks such as document classification, recommendation systems, and question answering. However, most of them have targeted binary sentiment classification. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a broad range of NLP tasks. In particular, because pre-trained BERT models are fast, easy, and powerful to use for various downstream tasks, they are likely to give promising results on the chosen sentiment datasets as well. Most of the existing works have focused on unigrams. In this work, the BERT transformer is applied with an N-gram feature representation. The contributions of this paper are as follows.
• Addresses the sentiment classification task for movie reviews with context-independent features.
• Employs a BERT-based transformer model with N-gram features such as unigrams, bigrams, trigrams, and their combinations.
• The proposed N-gram-based BERT model achieves better results than existing models in terms of precision, recall, and F1 scores.
The rest of this paper is structured as follows: Section II discusses related works in sentiment analysis; Section III introduces the BERT-based model for IMDB movie reviews; Section IV discusses experiment results; finally, Section V offers concluding remarks.
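The n-gram features named in the contributions (unigrams, bigrams, trigrams, and their combinations) can be illustrated with a minimal Python sketch. The function names `ngrams` and `combined_ngrams` are hypothetical, and whitespace tokenization is an assumed simplification; the tokenizer actually used with BERT is not described in this section.

```python
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def combined_ngrams(tokens: List[str], orders: List[int]) -> List[str]:
    """Concatenate the n-gram lists for each requested order, e.g. [1, 2] for 1G+2G."""
    features: List[str] = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


# Assumption: plain whitespace tokenization, used here only for illustration.
tokens = "this movie is a pure masterpiece".split()

unigrams = ngrams(tokens, 1)                        # 1G
bigrams = ngrams(tokens, 2)                         # 2G
trigrams = ngrams(tokens, 3)                        # 3G
uni_bi = combined_ngrams(tokens, [1, 2])            # 1G+2G
bi_tri = combined_ngrams(tokens, [2, 3])            # 2G+3G
uni_bi_tri = combined_ngrams(tokens, [1, 2, 3])     # 1G+2G+3G
```

For the six-token example, this yields 6 unigrams, 5 bigrams, and 4 trigrams, so the 1G+2G+3G combination contains 15 features.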
II. RELATED WORK
Sentiment analysis is used in various applications such as tourism [9], finance [10], healthcare [11], social network analysis [12], and social media monitoring [13]. Wang et al. [14] studied sentiment and topic classification using bigram features. Their study indicated that bigram word features consistently improve performance on sentiment analysis tasks; that Naïve Bayes (NB) performs well on short snippets, while the Support Vector Machine (SVM) performs well on longer snippets; and that simple NB and SVM variants perform well on various datasets. Tripathy et al. [15] performed sentiment classification for the movie reviews dataset using n-gram features. They employed NB, SVM, stochastic gradient descent (SGD), and maximum entropy (ME) classifiers on the n-gram features and on combinations of n-gram features. The authors indicated that accuracy decreases for higher-order n-gram features such as trigrams, four-grams, and five-grams. Their results show that the combination of unigram and bigram achieves a better result. Fang et al. [16] presented a multi-task learning model to improve the performance of stance prediction. In particular, the authors applied both supervised and unsupervised models to multiple NLP tasks. They achieved 91.2% accuracy for the sentiment analysis task using unigram features. Vashishtha et al. [17] proposed an unsupervised method using n-gram features for the task of sentiment analysis. This method formulates phrases and computes opinion scores and opinion polarity using a fuzzy linguistic method. In particular, the authors used k-means clustering with a fuzzy entropy filter to extract keyphrases that are significant for sentiment analysis. Cambria et al. [18] used neurosymbolic AI for sentiment analysis.
Moreover, Das et al. [19] studied unstructured text with n-gram features and TF-IDF features. They applied multinomial Naïve Bayes (MNB), NB, SVM, decision tree (DT), random forest (RF), and k-nearest neighbors (KNN) classifiers to these two feature sets. Their results indicated that logistic regression (LR) achieves an accuracy of 90.47% using bigram features and the SVM achieves an accuracy of 91.99% using TF-IDF features. Ali et al. [20] developed a hybrid model for the sentiment classification task that combines convolutional neural network (CNN) and long short-term memory (LSTM) networks. With this model, the authors achieved 89.2% accuracy on the IMDB movie review dataset. Wang et al. [28] proposed a convolutional recurrent neural network for the text modeling task. Their study indicated that the proposed hybrid model strengthens the semantic understanding of the text. In particular, the authors achieved 90.39% accuracy on the IMDB movie reviews dataset. Tian et al. [29] implemented an attention-aware bidirectional gated recurrent unit (BiGRU) framework for the sentiment analysis task. The authors incorporated interaction between words using pre-attention BiGRU and extracted the predicted features using post-attention. Their results show that the attention-aware BiGRU model achieves 90.3% accuracy. Rauf et al. [21] determined human emotions from the IMDB movie reviews using the BERT model. The authors achieved 89.90% accuracy for the sentiment analysis task. Alaparthi et al. [22] investigated the sentiment analysis task using an LR classifier, a lexicon-based method, LSTM, and BERT. Their study achieved 92.31% accuracy using the BERT model. Furthermore, Ekbal and Bhattacharyya [23] addressed the problem of resource scarcity in sentiment analysis using a high-resource language. The authors used a multi-task multilingual framework, which transfers knowledge and maps semantic meaning between different languages. In particular, the authors extracted character n-grams to generate vectors.
Ashok Kumar et al. [24] studied n-gram features for Abilify drug user reviews using supervised learning methods. The authors indicated that TF-IDF-based n-gram features achieve a better result. Bhuvaneshwari et al. [25] introduced a Bi-LSTM with a self-attention-based CNN model for subjectivity identification. The authors used pre-trained word embeddings with n-gram features to capture context information between words and sentences. Arevalillo-Herrez et al. [26] adopted the Dual Intent and Entity Transformer for the task of sentiment analysis using the Rasa NLU open-source toolkit. The authors achieved a performance of 90.7% on the IMDb dataset. Notably, their study indicated that n-gram features with traditional machine learning and deep learning models do not perform well.
Srikanth et al. [27] investigated deep belief neural networks to analyze sentiment in COVID-19 tweets. The authors used different combinations of preprocessing techniques to investigate sentiment in tweets using n-gram features. In summary, existing research has studied the n-gram sentiment analysis task using bag of words, TF-IDF, and context-dependent features. Therefore, this paper considers context-independent features of texts with N-grams. In this context, the n-gram-based BERT model is proposed with contraction word mapping for sentiment analysis.
III. THE PROPOSED METHODS
An n-gram-based pre-trained BERT model is proposed for sentiment classification using the large IMDB movie reviews dataset, as shown in Fig. 1. The proposed model is split into four main components, namely, input data, pre-processing, n-gram characterization, and the BERT pre-trained fine-tuning model.
A. Input data
The IMDB movie review dataset [30] is used for the n-gram sentiment analysis task. This dataset contains 50K movie reviews, which are categorized into 25K positive reviews and 25K negative reviews. For instance, the review “It is a funny film, and it doesn’t make you smile. What a pity!! It’s a simply painful film. The story is presented without a goal” is labeled as negative sentiment. Similarly, the review “I like the whole film and everything in it. I almost felt like watching my friends and me on screen. This movie is a pure masterpiece, very creative and original” is associated with positive sentiment.
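The negative example review contains contractions (“doesn’t”, “It’s”), which is where the contraction word mapping mentioned earlier would apply during pre-processing. The following Python sketch shows one way such a mapping could work. The `CONTRACTIONS` table and the `expand_contractions` helper are hypothetical and deliberately tiny; the mapping actually used in the proposed model is not given in this section.

```python
import re

# Hypothetical, deliberately small mapping for illustration only.
CONTRACTIONS = {
    "doesn't": "does not",
    "it's": "it is",
    "don't": "do not",
    "can't": "cannot",
}

# One alternation over all known contractions, matched case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded form."""

    def _replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. "It's" becomes "It is".
        return expanded.capitalize() if word[0].isupper() else expanded

    # Normalize typographic apostrophes so they match the ASCII keys.
    return _PATTERN.sub(_replace, text.replace("\u2019", "'"))


review = "It is a funny film, and it doesn\u2019t make you smile. It\u2019s a simply painful film."
expanded = expand_contractions(review)
```

Expanding contractions before n-gram extraction makes negations explicit tokens, so a bigram such as "does not" can appear as a feature.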
Fig. 1. Architecture Diagram of N-gram-Based BERT model for Sentiment Analysis
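The introduction notes that a pre-trained BERT model can be fine-tuned with just one additional output layer. As a rough illustration of what such a layer computes for binary sentiment, the sketch below applies a linear map and a softmax to a pooled sentence vector. It uses only the standard library. The 4-dimensional vector, the weights, and the function names are all hypothetical (BERT-base uses 768 dimensions), and an actual fine-tuning run would use a deep learning framework and also update the BERT weights.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pooled: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Linear output layer (one weight row per class) followed by softmax."""
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical pooled sentence representation and layer parameters.
pooled = [0.5, -1.0, 0.25, 2.0]
weights = [
    [0.1, 0.2, -0.3, 0.0],   # row for the negative class
    [-0.1, 0.0, 0.4, 0.5],   # row for the positive class
]
bias = [0.0, 0.0]

probs = classify(pooled, weights, bias)          # [P(negative), P(positive)]
predicted = "positive" if probs[1] > probs[0] else "negative"
```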
TABLE I
CONFUSION MATRIX FOR N-GRAM-BASED BERT MODEL
TABLE II
THE PERFORMANCE OF THE UNIGRAM (1G) AND BIGRAM (2G)
TABLE III
THE PERFORMANCE OF THE TRIGRAM (3G), AND UNIGRAM AND BIGRAM (1G+2G)
TABLE IV
THE PERFORMANCE OF THE BIGRAM AND TRIGRAM (2G+3G), AND UNIGRAM, BIGRAM AND TRIGRAM (1G+2G+3G)
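The evaluation measures named in the paper (precision, recall, and the F1 score with micro, macro, and weighted averages) can all be derived from a confusion matrix. The Python sketch below shows the standard definitions. The counts are made up for illustration and are not the values of Table I, and the function names are hypothetical.

```python
from typing import Dict, List


def per_class_scores(cm: List[List[int]]) -> List[Dict[str, float]]:
    """cm[i][j] counts samples with true class i that were predicted as class j."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": float(sum(cm[c]))})
    return scores


def averaged_f1(cm: List[List[int]]) -> Dict[str, float]:
    """Micro, macro, and support-weighted averages of the per-class F1 scores."""
    scores = per_class_scores(cm)
    total = sum(s["support"] for s in scores)
    macro = sum(s["f1"] for s in scores) / len(scores)
    weighted = sum(s["f1"] * s["support"] for s in scores) / total
    # For single-label classification, micro-averaged F1 equals accuracy.
    micro = sum(cm[c][c] for c in range(len(cm))) / total
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Made-up counts for illustration only; rows are true classes (negative, positive).
cm = [[90, 10],
      [20, 80]]
result = averaged_f1(cm)
```

On a balanced dataset such as the 25K/25K split described in Section III-A, the macro and weighted averages coincide because every class has the same support.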