An N-gram-Based BERT Model For Sentiment Classification Using Movie Reviews

Tina Esther Trueman, Department of Computer Science, University of the People, United States
Ashok Kumar Jayaraman, Information Science and Technology, Anna University, Chennai, India
Gayathri Ananthakrishnan, Department of Information Technology, VIT University, Vellore, India
Erik Cambria, Computer Science and Engineering, Nanyang Technological University, Singapore
Satanik Mitra, Department of ISE, Indian Institute of Technology, Kharagpur

Abstract—An abundance of product reviews and opinions is produced every day across the internet and other media. Sentiment analysis analyzes these data and classifies them as positive or negative. In this paper, a classification model is proposed for n-gram sentiment analysis using BERT. Specifically, the large IMDB movie review dataset, which contains 50K instances, is used. This dataset is tokenized and encoded into unigrams, bigrams, and trigrams, and their combinations: unigram and bigram; bigram and trigram; and unigram, bigram, and trigram. The proposed BERT model is applied to these extracted features. The model is then evaluated using the F1 score and its micro, macro, and weighted averages. The model shows comparable results to state-of-the-art methods for all n-gram features. In particular, it achieves its highest accuracies with the combination of bigram and trigram features (94.64%) and the combination of unigram, bigram, and trigram features (94.68%).

Index Terms—Sentiment classification, Deep learning, Transformers, BERT, N-gram features.

I. INTRODUCTION

Thanks to the explosion of social media, companies often have to deal with mountains of customer feedback. Sentiment analysis is therefore useful for quickly gaining insights from large volumes of text data [1]. It also helps organizations measure the ROI of their marketing campaigns and improve their customer service [2]. Because sentiment analysis gives organizations insight into their customers' emotions, they can become aware of an emerging crisis in good time and manage it appropriately [3]. Many statistical models can be used for this task [4], [5]. With advances in deep learning, neural network architectures have shown a decent improvement in performance on several natural language processing (NLP) tasks such as language modeling, text classification, and machine translation [6]. In 2017, Google researchers introduced the transformer model [7], which has since been used for transfer learning in various NLP tasks with state-of-the-art performance. Transfer learning is a mechanism in which a deep learning model is first trained on a large dataset and is then used to perform similar tasks on another dataset. Such a model is known as a pre-trained model.

In this paper, the Bidirectional Encoder Representations from Transformers (BERT) model [8] is used. It has a large neural network architecture with a huge number of parameters. In practice, training a BERT model from scratch on a small dataset would lead to overfitting. Hence, a pre-trained BERT model that has already been trained on a huge dataset is used. This model is then fine-tuned on a relatively smaller dataset for the sentiment classification task. Following the recognition and popularity of the BERT model, researchers have used it on various NLP tasks such as document classification, recommendation systems, and question answering. However, most of them have targeted binary sentiment classification. The pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a broad range of NLP tasks. Because pre-trained BERT models are fast, easy, and powerful to use for various downstream tasks, they are likely to give promising results on the chosen sentiment datasets as well. Most existing works have focused on unigrams. This work focuses on the BERT transformer with N-gram feature representations. The contributions of this paper are as follows.

• Addresses the sentiment classification task for movie reviews with context-independent features.
• Employs a BERT-based transformer model with N-gram features such as unigrams, bigrams, trigrams, and their combinations.
• Achieves better results than existing models in terms of precision, recall, and F1 scores with the proposed N-gram-based BERT model.

The rest of this paper is structured as follows: Section II discusses related works in sentiment analysis; Section III introduces the BERT-based model for IMDB movie reviews; Section IV discusses experiment results; finally, Section V offers concluding remarks.
II. RELATED WORK

Sentiment analysis is used in various applications such as tourism [9], finance [10], healthcare [11], social network analysis [12], and social media monitoring [13]. Wang et al. [14] studied sentiment and topic classification using bigram features. Their study indicated that bigram word features consistently improve performance on sentiment analysis tasks; that NB performs well on short snippet sentiment tasks, whereas SVM performs well on longer documents; and that a simple variant combining NB and SVM performs well on various datasets. Tripathy et al. [15] performed sentiment classification on the movie reviews dataset using n-gram features. They applied Naïve Bayes (NB), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and maximum entropy (ME) classifiers to the n-gram features and their combinations. The authors indicated that system accuracy decreases for higher-order n-gram features such as trigram, four-gram, and five-gram. Their results show that the combination of unigram and bigram achieves a better result. Fang et al. [16] presented a multi-task learning model to improve the performance of stance prediction. In particular, the authors used both supervised and unsupervised models for multiple NLP tasks. They achieved 91.2% accuracy for the sentiment analysis task using unigram features. Vashishtha et al. [17] proposed an unsupervised method using n-gram features for sentiment analysis. This method formulates phrases and computes opinion scores and opinion polarity using the fuzzy linguistic method. In particular, the authors used k-means clustering with a fuzzy entropy filter to extract keyphrases that are significant for sentiment analysis. Cambria et al. used neurosymbolic AI for sentiment analysis [18].

Moreover, Das et al. [19] studied unstructured text with n-gram features and TF-IDF features. They applied several classifiers, including MNB, NB, SVM, DT, RF, and KNN, to these two feature types. Their results indicated that LR achieves an accuracy of 90.47% using bigram features and SVM achieves an accuracy of 91.99% using TF-IDF features. Ali et al. [20] developed a hybrid model for the sentiment classification task. This model combines convolutional neural network (CNN) and long short-term memory (LSTM) networks. With this model, the authors achieved 89.2% accuracy on the IMDB movie review dataset. Wang et al. [28] proposed a convolutional recurrent neural network for the text modeling task. Their study indicated that the proposed hybrid model strengthens the semantic understanding of the text. In particular, the authors achieved 90.39% accuracy on the IMDB movie reviews dataset. Tian et al. [29] implemented an attention-aware bidirectional gated recurrent unit (BiGRU) framework for the sentiment analysis task. The authors incorporated interaction between words using pre-attention BiGRU and extracted the predicted features using post-attention. Their results show that the attention-aware BiGRU model achieves 90.3% accuracy. Rauf et al. [21] determined human emotions from the IMDB movie reviews using the BERT model. The authors achieved 89.90% accuracy for the sentiment analysis task. Alaparthi et al. [22] investigated the sentiment analysis task using an LR classifier, a lexicon-based approach, LSTM, and BERT. Their study achieved 92.31% accuracy using the BERT model. Furthermore, Ekbal and Bhattacharyya [23] addressed the problem of resource scarcity in sentiment analysis using a high-resource language. The authors used a multi-task multilingual framework, which transfers knowledge and maps semantic meaning between different languages. In particular, the authors extracted character n-grams to generate vectors.

Ashok Kumar et al. [24] studied n-gram features for Abilify drug user reviews using supervised learning methods. The authors indicated that TF-IDF-based n-gram features achieve a better result. Bhuvaneshwari et al. [25] introduced a Bi-LSTM with a self-attention-based CNN model for subjectivity identification. The authors used pre-trained word embeddings with n-gram features to capture context information between words and sentences. Arevalillo-Herráez et al. [26] adopted the Dual Intent and Entity Transformer for the task of sentiment analysis using the Rasa NLU open-source toolkit. The authors achieved a performance of 90.7% on the IMDb dataset. In particular, their study indicated that n-gram features with traditional machine learning and deep learning models do not perform well.

Srikanth et al. [27] investigated deep belief neural networks to analyze sentiment in COVID-19 tweets. The authors used different combinations of preprocessing techniques to investigate sentiment in tweets using n-gram features. In summary, existing research has studied the n-gram sentiment analysis task using bag-of-words, TF-IDF, and context-dependent features. Therefore, this paper considers context-independent features of texts with N-grams. In this context, the n-gram-based BERT model is proposed with contraction word mapping for sentiment analysis.

III. THE PROPOSED METHODS

An n-gram-based BERT pre-trained model is proposed for sentiment classification using the large IMDB movie reviews dataset, as shown in Fig. 1. The proposed model is split into four main parts, namely, input data, pre-processing, n-gram characterization, and the BERT pre-trained fine-tuning model.

A. Input data

The IMDB movie review dataset [30] is used for the n-gram sentiment analysis task. This dataset contains 50K movie reviews, which are categorized into 25K positive reviews and 25K negative reviews. For instance, the review “It is a funny film, and it doesn’t make you smile. What a pity!! It’s a simply painful film. The story is presented without a goal” is labeled as negative sentiment. Similarly, the review “I like the whole film and everything in it. I almost felt like watching my friends and me on screen. This movie is a pure masterpiece, very creative and original” is associated with positive sentiment.
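The balanced 25K/25K dataset is later divided into 40,500 training, 4,500 validation, and 5,000 testing reviews by stratified sampling (Section IV). The paper does not say which tool was used, so the following is only a minimal standard-library sketch of a stratified split; the function name `stratified_split`, the seed, and the stand-in label list are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, sizes, seed=0):
    """Split example indices into parts while preserving class proportions.

    labels: one class label per example.
    sizes:  part sizes that sum to len(labels).
    Assumes the sizes divide evenly across classes, as they do here.
    """
    assert sum(sizes) == len(labels)
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts = [[] for _ in sizes]
    total = len(labels)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        start = 0
        for part, size in zip(parts, sizes):
            # Each class contributes in proportion to the part's share of the data.
            count = len(idxs) * size // total
            part.extend(idxs[start:start + count])
            start += count
    return parts

# Stand-in labels: 0 = negative, 1 = positive (25K each, as in the dataset).
labels = [0] * 25_000 + [1] * 25_000
train, val, test = stratified_split(labels, [40_500, 4_500, 5_000])
```

With these sizes, each part holds the same number of reviews per class (20,250, 2,250, and 2,500), which matches the row totals of the confusion matrices in Table I.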
Fig. 1. Architecture Diagram of N-gram-Based BERT model for Sentiment Analysis

TABLE I
CONFUSION MATRIX FOR THE N-GRAM-BASED BERT MODEL

                     1G            2G            3G            1G+2G         2G+3G         1G+2G+3G
Dataset     Class    N      P      N      P      N      P      N      P      N      P      N      P
Training    N        20154  96     20157  93     20132  118    20166  84     20151  99     20146  104
            P        76     20174  84     20166  81     20169  78     20172  76     20174  74     20176
Validation  N        2119   131    2108   142    2096   154    2131   119    2111   139    2102   148
            P        133    2117   105    2145   71     2179   134    2116   116    2134   111    2139
Testing     N        2357   143    2349   151    2343   157    2359   141    2378   122    2346   154
            P        153    2347   141    2359   131    2369   154    2346   146    2354   112    2388
* N: negative sentiment, P: positive sentiment
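As a quick check, the testing accuracy for each feature set can be recomputed from the testing rows of Table I. The sketch below is not the authors' code; it assumes that each matrix row is an actual class and each column a predicted class, and the names `TEST_CONFUSION` and `accuracy` are illustrative.

```python
# Testing-set confusion matrices copied from Table I.
# Assumed layout: ((N as N, N as P), (P as N, P as P)).
TEST_CONFUSION = {
    "1G": ((2357, 143), (153, 2347)),
    "2G": ((2349, 151), (141, 2359)),
    "3G": ((2343, 157), (131, 2369)),
    "1G+2G": ((2359, 141), (154, 2346)),
    "2G+3G": ((2378, 122), (146, 2354)),
    "1G+2G+3G": ((2346, 154), (112, 2388)),
}

def accuracy(matrix):
    """Percentage of examples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

TEST_ACCURACY = {name: round(accuracy(m), 2) for name, m in TEST_CONFUSION.items()}
```

The resulting values (94.08, 94.16, 94.24, 94.10, 94.64, and 94.68) agree with the micro-averaged testing scores in Tables II to IV and with the last row of Table V.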

TABLE II
PERFORMANCE OF THE UNIGRAM (1G) AND BIGRAM (2G) FEATURES

Unigram (1G) Bigram (2G)

Class Training (%) Validation (%) Testing (%) Training (%) Validation (%) Testing (%)
P R F1 P R F1 P R F1 P R F1 P R F1 P R F1
Negative 99.62 99.53 99.58 94.09 94.18 94.14 93.90 94.28 94.09 99.59 99.54 99.56 95.26 93.69 94.47 94.34 93.96 94.15
Positive 99.53 99.62 99.58 94.17 94.09 94.13 94.26 93.88 94.07 99.54 99.59 99.56 93.79 95.33 94.56 93.98 94.36 94.17
Macro 99.58 99.58 99.58 94.13 94.13 94.13 94.08 94.08 94.08 99.56 99.56 99.56 94.52 94.51 94.51 94.16 94.16 94.16
Micro 99.58 99.58 99.58 94.13 94.13 94.13 94.08 94.08 94.08 99.56 99.56 99.56 94.51 94.51 94.51 94.16 94.16 94.16
Weighted 99.58 99.58 99.58 94.13 94.13 94.13 94.08 94.08 94.08 99.56 99.56 99.56 94.52 94.51 94.51 94.16 94.16 94.16
* P-Precision, R-Recall, F1-F1 Score
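The per-class precision, recall, and F1 scores and their macro average follow directly from a confusion matrix. The following is a minimal sketch using the unigram testing matrix from Table I, again assuming rows are actual classes and columns are predicted classes; the function names are illustrative and not taken from the paper.

```python
# Unigram (1G) testing confusion matrix from Table I; index 0 = negative, 1 = positive.
ONE_GRAM_TEST = ((2357, 143), (153, 2347))

def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1)} as fractions.

    matrix[i][j] counts examples of actual class i predicted as class j.
    """
    n = len(matrix)
    metrics = {}
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        metrics[c] = (precision, recall, f1)
    return metrics

def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 over the classes."""
    return tuple(sum(col) / len(metrics) for col in zip(*metrics.values()))

def as_percent(values):
    return tuple(round(100 * v, 2) for v in values)

METRICS = per_class_metrics(ONE_GRAM_TEST)
NEGATIVE = as_percent(METRICS[0])
POSITIVE = as_percent(METRICS[1])
MACRO = as_percent(macro_average(METRICS))
```

Because both classes have the same number of testing reviews, the weighted average coincides with the macro average here, and the micro average equals the accuracy.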

TABLE III
PERFORMANCE OF THE TRIGRAM (3G), AND UNIGRAM AND BIGRAM (1G+2G) FEATURES

Trigram (3G) Unigram and Bigram (1G+2G)

Class Training (%) Validation (%) Testing (%) Training (%) Validation (%) Testing (%)
P R F1 P R F1 P R F1 P R F1 P R F1 P R F1
Negative 99.60 99.42 99.51 96.72 93.16 94.91 94.70 93.72 94.21 99.61 99.59 99.60 94.08 94.71 94.40 93.87 94.36 94.12
Positive 99.42 99.60 99.51 93.40 96.84 95.09 93.78 94.76 94.27 99.59 99.61 99.60 94.68 94.04 94.36 94.33 93.84 94.08
Macro 99.51 99.51 99.51 95.06 95.00 95.00 94.24 94.24 94.24 99.60 99.60 99.60 94.38 94.38 94.38 94.10 94.10 94.10
Micro 99.51 99.51 99.51 95.00 95.00 95.00 94.24 94.24 94.24 99.60 99.60 99.60 94.38 94.38 94.38 94.10 94.10 94.10
Weighted 99.51 99.51 99.51 95.06 95.00 95.00 94.24 94.24 94.24 99.60 99.60 99.60 94.38 94.38 94.38 94.10 94.10 94.10
* P-Precision, R-Recall, F1-F1 Score

TABLE IV
PERFORMANCE OF THE BIGRAM AND TRIGRAM (2G+3G), AND UNIGRAM, BIGRAM, AND TRIGRAM (1G+2G+3G) FEATURES

Bigram and Trigram (2G+3G) Unigram, Bigram and Trigram (1G+2G+3G)

Class Training (%) Validation (%) Testing (%) Training (%) Validation (%) Testing (%)
P R F1 P R F1 P R F1 P R F1 P R F1 P R F1
Negative 99.62 99.51 99.57 94.79 93.82 94.30 94.22 95.12 94.67 99.63 99.49 99.56 94.98 93.42 94.20 95.44 93.84 94.63
Positive 99.51 99.62 99.57 93.88 94.84 94.36 95.07 94.16 94.61 99.49 99.63 99.56 93.53 95.07 94.29 93.94 95.52 94.72
Macro 99.57 99.57 99.57 94.34 94.33 94.33 94.64 94.64 94.64 99.56 99.56 99.56 94.26 94.24 94.24 94.69 94.68 94.68
Micro 99.57 99.57 99.57 94.33 94.33 94.33 94.64 94.64 94.64 99.56 99.56 99.56 94.24 94.24 94.24 94.68 94.68 94.68
Weighted 99.57 99.57 99.57 94.34 94.33 94.33 94.64 94.64 94.64 99.56 99.56 99.56 94.26 94.24 94.24 94.69 94.68 94.68
* P-Precision, R-Recall, F1-F1 Score
TABLE V
COMPARISON OF THE PROPOSED MODEL WITH EXISTING MODELS (ACCURACY, %)

Authors Methods 1G 2G 3G 1G+2G 2G+3G 1G+2G+3G

Wang and Manning [14] MNB 83.55 86.59 - - - -
SVM 86.95 89.16 - - - -
NBSVM 88.29 91.22 - - - -
Tripathy et al. [15] NB 83.65 84.06 70.53 86.00 83.82 86.23
MaxEnt 88.48 83.22 71.38 88.42 82.94 83.36
SVM 86.97 83.87 70.16 88.88 83.63 88.94
SGD 85.11 62.36 58.40 83.36 58.74 83.36
Das et al. [19] LogisticRegression - 90.47 - - - -
Vashishtha et al. [17] SentiScore+Fuzzy entropy+k-means - - - 68.60 53.60 69.10
Wang et al. [28] Conv-RNN 90.39 - - - - -
Tian et al. [29] Attention-Aware BiGRU 90.30 - - - - -
Ali et al. [20] CNN-LSTM 89.20 - - - - -
Fang et al. [16] MTransSAN 91.20 - - - - -
Rauf et al. [21] BERT 89.90 - - - - -
Alaparthi et al. [22] BERT 92.31 - - - - -
Proposed N-grams+BERT+CM 94.08 94.16 94.24 94.10 94.64 94.68
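The margin of the proposed model over the best earlier result in each column of Table V is simple arithmetic. The sketch below hard-codes the baseline values from the table (missing entries are omitted); the dictionary names are illustrative assumptions.

```python
# Baseline accuracies (%) per feature set, copied column by column from Table V.
BASELINES = {
    "1G": [83.55, 86.95, 88.29, 83.65, 88.48, 86.97, 85.11,
           90.39, 90.30, 89.20, 91.20, 89.90, 92.31],
    "2G": [86.59, 89.16, 91.22, 84.06, 83.22, 83.87, 62.36, 90.47],
    "3G": [70.53, 71.38, 70.16, 58.40],
    "1G+2G": [86.00, 88.42, 88.88, 83.36, 68.60],
    "2G+3G": [83.82, 82.94, 83.63, 58.74, 53.60],
    "1G+2G+3G": [86.23, 83.36, 88.94, 83.36, 69.10],
}
# Last row of Table V (proposed N-grams+BERT+CM model).
PROPOSED = {"1G": 94.08, "2G": 94.16, "3G": 94.24,
            "1G+2G": 94.10, "2G+3G": 94.64, "1G+2G+3G": 94.68}

# Gain in percentage points over the strongest baseline in each column.
GAIN = {name: round(PROPOSED[name] - max(values), 2)
        for name, values in BASELINES.items()}
```

Rounded to whole percentage points, the gains are 2, 3, 23, 5, 11, and 6, which are the improvements quoted in the discussion of Table V.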

Fig. 2. The obtained result of the BERT model with N-grams

B. Pre-processing

The following preprocessing steps are performed on the movie reviews before feeding them into the BERT model [19]. First, punctuation is removed, except for single and double quotes and periods. Second, all reviews are converted from upper case to lower case. Third, the special tokens [CLS] and [SEP] are added at the appropriate positions [7], [8]. Finally, a contraction map is applied to expand contractions such as “aren’t” into “are not”.

C. N-Grams Features

The preprocessed reviews are tokenized using the WordPiece tokenizer. It breaks words into subword units, such as prefix, root, and suffix, to handle unseen words better. In particular, the WordPiece tokenizer is used to create n-gram features: unigram (1G), bigram (2G), trigram (3G), unigram and bigram (1G+2G), bigram and trigram (2G+3G), and unigram, bigram, and trigram (1G+2G+3G) features [14], [15]. An n-gram is a contiguous sequence of n tokens from a given review. Moreover, training the model on n-gram features gives a good idea of the probability of a word occurring after a certain word.

D. BERT pre-trained fine-tuning model

BERT is a language representation model developed by Google [8] that excels in natural language processing tasks because it is pre-trained on a large corpus. It overcomes a limitation of other language models, which learn from either the left or the right context only. BERT learns from both directions and hence has been very successful at natural language prediction. BERT is pre-trained on a large corpus of unlabeled text, including the entire Wikipedia and BookCorpus, totaling 3,300 million words. During pre-training, BERT randomly masks tokens and predicts the masked words, so it learns the left and right context of a word at the same time. The two variants of BERT are BERT Base and BERT Large. Both models are encoder-only stacks derived from the original transformer model. BERT Base consists of 12 transformer layers and 12 attention heads with 110 million parameters, and BERT Large consists of 24 transformer layers and 16 attention heads with 340 million parameters. Each encoder layer has self-attention and feed-forward sublayers. Self-attention relates positions with each other through queries, keys, and values. The feed-forward sublayer transforms the self-attention output at each position, and its weights are learned through backpropagation. In this work, BERT Base is used for n-gram sentiment analysis using IMDB movie reviews.

IV. RESULTS AND DISCUSSION

The n-gram-based BERT model is implemented for the sentiment analysis task. Specifically, the large IMDB movie review dataset, which contains 25K positive reviews and 25K negative reviews, is used. The data were pre-processed using case conversion, punctuation removal, and a contraction map. Then, the dataset was divided into training (40,500), validation (4,500), and testing (5,000) sets using stratified sampling. Next, the n-gram features were created for 1G, 2G, 3G, 1G+2G, 2G+3G, and 1G+2G+3G. The BERT Base model was applied to these n-gram features. It uses a sequence length of 512, a maximum of 20,000 word features, 3 epochs, and a one-cycle learning rate of 2e-5.
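The pre-processing and n-gram steps can be illustrated with a short standard-library sketch. It is not the authors' implementation: the three-entry contraction map is a hypothetical stand-in for the full map (which the paper does not list), the contractions are expanded before punctuation is stripped for simplicity, whitespace splitting stands in for the WordPiece tokenizer, and the [CLS] and [SEP] tokens are left to the tokenizer.

```python
import re

# Tiny illustrative contraction map (assumption; the paper's map is not given).
CONTRACTIONS = {"aren't": "are not", "doesn't": "does not", "it's": "it is"}

def preprocess(review):
    """Lowercase, expand contractions, and drop most punctuation."""
    text = review.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Keep letters, digits, whitespace, quotes, and periods; drop other punctuation.
    text = re.sub(r"[^a-z0-9\s'\".]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def ngrams(tokens, n):
    """All contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(tokens, orders):
    """Concatenate n-grams of several orders, e.g. (2, 3) for 2G+3G."""
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features

cleaned = preprocess("It's a simply painful film!! It doesn't make you smile.")
tokens = cleaned.split()          # whitespace split stands in for WordPiece
bigrams = ngrams(tokens, 2)
bi_and_tri = ngram_features(tokens, (2, 3))
```

For this sentence, `cleaned` is "it is a simply painful film it does not make you smile." and the first bigram is "it is". A combination such as 2G+3G is simply the bigram list followed by the trigram list.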
Table I shows the confusion matrix of the n-gram features for training, validation, and testing, respectively. Tables II, III, and IV show the performance of the unigram, bigram, and trigram features individually, as well as the unigram and bigram, bigram and trigram, and unigram, bigram, and trigram combinations, based on the precision, recall, F1 score, and its micro, macro, and weighted averages [31], [32]. In these tables, the training set achieves nearly 100% accuracy (99.51% to 99.60%) for all n-gram features. Rounded to the nearest percent, the validation set achieves 94% accuracy for 1G, 1G+2G, 2G+3G, and 1G+2G+3G features and 95% accuracy for 2G and 3G features, and the testing set achieves 94% accuracy for 1G, 2G, 3G, and 1G+2G features and 95% accuracy for 2G+3G and 1G+2G+3G features. Overall, the combination of bigram and trigram features and the combination of unigram, bigram, and trigram features achieve the highest testing accuracy of about 95% (94.64% and 94.68%, respectively). Table V compares our proposed model with other models. Our model performs better than other state-of-the-art models (Fig. 2). In particular, compared with the best existing result, our model improves accuracy by about 2 percentage points for 1G features, 3 for 2G features, 23 for 3G features, 5 for 1G+2G features, 11 for 2G+3G features, and 6 for 1G+2G+3G features. Moreover, the n-gram-based BERT model is not compared with studies that used only 25K reviews. The proposed model therefore appears to outperform existing models for all n-gram features. A limitation of the proposed method is that training and weight updates take longer because of the large corpus size. It also incurs a higher computational cost.

V. CONCLUSION

Online movie reviews influence the promotion and box-office revenue of a movie. Reviewing is therefore one of the most influential processes in the film industry. In this work, an n-gram-based BERT model is applied to the task of sentiment classification using the IMDB movie reviews dataset. The dataset was pre-processed into the input format of BERT, which accepts input, segment, and position information. In particular, a list of n-gram features is created: unigrams, bigrams, trigrams, and their combinations. Then, the BERT-based model was applied to these n-gram features for the task of sentiment analysis. This paper mainly focused on the context-independent features of n-grams. The obtained results are better than those of other existing models for all n-gram features. In particular, the highest accuracy is achieved with the combination of bigram and trigram features (94.64%) and the combination of unigram, bigram, and trigram features (94.68%). These results indicate that combining higher-order n-gram features improves the accuracy. In future work, the n-gram features can be studied with gender information using graph neural network-based transformers and quantum machine learning approaches.

ACKNOWLEDGMENT

This work was supported by the UGC (University Grants Commission), Government of India under the National Doctoral Fellowship.

REFERENCES

[1] Shayaa, S., Jaafar, N. I., Bahri, S., Sulaiman, A., Wai, P. S., Chung, Y. W., ... & Al-Garadi, M. A. (2018). Sentiment analysis of big data: Methods, applications, and open challenges. IEEE Access, 6, 37807-37827.
[2] Yom-Tov, G. B., Ashtar, S., Altman, D., Natapov, M., Barkay, N., Westphal, M., & Rafaeli, A. (2018, April). Customer sentiment in web-based service interactions: Automated analyses and new insights. In Companion Proceedings of the The Web Conference 2018 (pp. 1689-1697).
[3] Oneto, L., Bisio, F., Cambria, E., Anguita, D. (2016). Statistical Learning Theory and ELM for Big Social Data Analysis. IEEE Computational Intelligence Magazine 11(3), 45-55.
[4] Cambria, E., Schuller, B., Liu, B., Wang, H., Havasi, C. (2013). Statistical Approaches to Concept-Level Sentiment Analysis. IEEE Intelligent Systems 28(3), 6-9.
[5] Ragusa, E., Gastaldo, P., Zunino, Cambria, E. (2020). Balancing Computational Complexity and Generalization Ability: A Novel Design for ELM. Neurocomputing 401, 405-417.
[6] Chaturvedi, I., Ong, Y., Tsang, I., Welsch, R., Cambria, E. (2016). Learning Word Dependencies in Text by Means of a Deep Recurrent Belief Network. Knowledge-Based Systems 108, 144-154.
[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
[8] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[9] Guerreiro, C., Cambria, E., Nguyen, H. (2019). Understanding the Role of Social Media in Backpacker Tourism. Proceedings of ICDM Workshops, 530-537.
[10] Merello, S., Ratto, A., Oneto, L., Cambria, E. (2019). Ensemble Application of Transfer Learning and Sample Weighting for Stock Market Prediction. Proceedings of IJCNN.
[11] Mondal, A., Cambria, E., Das, D., Bandyopadhyay, S. (2017). MediConceptNet: An Affinity Score Based Medical Concept Network. Proceedings of FLAIRS, 335-340.
[12] Chandra, P., Cambria, E., Hussain, A. (2012). Clustering Social Networks Using Interaction Semantics and Sentics. Advances in Neural Networks, 379-385.
[13] Rosso, P., Bosco, C., Damiano, R., Patti, V., Cambria, E. (2016). Emotion and sentiment in social and expressive media: Introduction to the special issue. Information Processing and Management 52(1), 1-4.
[14] Wang, S. I., & Manning, C. D. (2012, July). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 90-94).
[15] Tripathy, A., Agrawal, A., & Rath, S. K. (2016). Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications, 57, 117-126.
[16] Fang, W., Nadeem, M., Mohtarami, M., & Glass, J. (2019, November). Neural multi-task learning for stance prediction. In Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER) (pp. 13-19).
[17] Vashishtha, S., & Susan, S. (2021). Highlighting keyphrases using senti-scoring and fuzzy entropy for unsupervised sentiment analysis. Expert Systems with Applications, 169, 114323.
[18] Cambria, E., Liu, Q., Decherchi, S., Xing, F., Kwok, K. (2022). SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis. Proceedings of LREC, 3829-3839.
[19] Das, M., Kamalanathan, S., & Alphonse, P. (2020). A Comparative Study on TF-IDF Feature Weighting Method and its Analysis using Unstructured Dataset.
[20] Ali, N. M., Abd El Hamid, M. M., & Youssif, A. (2019). Sentiment analysis for movies reviews dataset using deep learning models. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol, 9.
[21] Rauf, S. A., Qiang, Y., Ali, S. B., & Ahmad, W. (2019). Using BERT for Checking the Polarity of Movie Reviews. International Journal of Computer Applications, 975, 8887.
[22] Alaparthi, S., & Mishra, M. (2020). Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey. arXiv preprint arXiv:2007.01127.
[23] Ekbal, A., & Bhattacharyya, P. (2022). Exploring Multi-lingual, Multi-task, and Adversarial Learning for Low-resource Sentiment Analysis. Transactions on Asian and Low-Resource Language Information Processing, 21(5), 1-19.
[24] Ashok Kumar, J., Abirami, S., & Trueman, T. E. (2022). An N-Gram Feature-Based Sentiment Classification Model for Drug User Reviews. In Artificial Intelligence and Evolutionary Computations in Engineering Systems (pp. 277-297). Springer, Singapore.
[25] Bhuvaneshwari, P., Rao, A. N., Robinson, Y. H., & Thippeswamy, M. N. (2022). Sentiment analysis for user reviews using Bi-LSTM self-attention based CNN model. Multimedia Tools and Applications, 81(9), 12405-12419.
[26] Arevalillo-Herráez, M., Arnau-González, P., & Ramzan, N. (2022). On adapting the DIET architecture and the Rasa conversational toolkit for the sentiment analysis task. IEEE Access.
[27] Srikanth, J., Damodaram, A., Teekaraman, Y., Kuppusamy, R., & Thelkar, A. R. (2022). Sentiment Analysis on COVID-19 Twitter Data Streams Using Deep Belief Neural Networks. Computational Intelligence and Neuroscience, 2022.
[28] Wang, C., Jiang, F., & Yang, H. (2017, August). A hybrid framework for text modeling with convolutional RNN. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 2061-2069).
[29] Tian, Z., Rong, W., Shi, L., Liu, J., & Xiong, Z. (2018, August). Attention aware bidirectional gated recurrent unit based framework for sentiment analysis. In International Conference on Knowledge Science, Engineering and Management (pp. 67-78). Springer, Cham.
[30] Maas, A., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 142-150).
[31] Grandini, M., Bagli, E., & Visani, G. (2020). Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756.
[32] Alejo, R., Antonio, J. A., Valdovinos, R. M., & Pacheco-Sánchez, J. H. (2013, June). Assessments metrics for multi-class imbalance learning: A preliminary study. In Mexican Conference on Pattern Recognition (pp. 335-343). Springer, Berlin, Heidelberg.

5 pages
2019 BERT Stock Market
No ratings yet
2019 BERT Stock Market
5 pages
PDS - Proj - Report-2 RISHI B VATSAL P ANISHA M
No ratings yet
PDS - Proj - Report-2 RISHI B VATSAL P ANISHA M
49 pages
The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) - Jay Alammar - Visualizing Machine Learning One Concept at A Time
No ratings yet
The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) - Jay Alammar - Visualizing Machine Learning One Concept at A Time
19 pages
Conference Template A4 1
No ratings yet
Conference Template A4 1
6 pages
Comparison of Word Embedding Features Using Deep Learning in Sentiment Analysis
No ratings yet
Comparison of Word Embedding Features Using Deep Learning in Sentiment Analysis
10 pages
Ensemble BERT A Student Social Network Text Sentiment Classification Model Based On Ensemble Learning and BERT Architecture
No ratings yet
Ensemble BERT A Student Social Network Text Sentiment Classification Model Based On Ensemble Learning and BERT Architecture
4 pages
Group3 POC Assignment 3
No ratings yet
Group3 POC Assignment 3
9 pages
BERT in Sentimemnt Analysis
No ratings yet
BERT in Sentimemnt Analysis
13 pages
PES1PG24CS018 Debjit DLTP Assignment-2 Sentiment Analysis Report
No ratings yet
PES1PG24CS018 Debjit DLTP Assignment-2 Sentiment Analysis Report
8 pages
NLPNEW
No ratings yet
NLPNEW
3 pages
Emotion Detection Via Bert-Based Deep Learning Approaches in Natural Language Processing (#1524075) - 4105316
No ratings yet
Emotion Detection Via Bert-Based Deep Learning Approaches in Natural Language Processing (#1524075) - 4105316
12 pages
NLP Sentiment Analysis
No ratings yet
NLP Sentiment Analysis
1 page
Sentiment Analysis Task On Twitter Data
No ratings yet
Sentiment Analysis Task On Twitter Data
6 pages
M1 - L1 - 3 - Procedure For Sterilization and Sanitation of Nail Care Tools and Equipments
100% (2)
M1 - L1 - 3 - Procedure For Sterilization and Sanitation of Nail Care Tools and Equipments
2 pages
Lesson Exemplar English Grade 11
100% (1)
Lesson Exemplar English Grade 11
5 pages
Factors Affecting The Academic Performance of ABM Students of Santa Isabel College of Manila AY 2017-2018
No ratings yet
Factors Affecting The Academic Performance of ABM Students of Santa Isabel College of Manila AY 2017-2018
14 pages
CBC Structure of Education
100% (1)
CBC Structure of Education
10 pages
Leadership 101
No ratings yet
Leadership 101
112 pages
Midbrain Activation Franchise
No ratings yet
Midbrain Activation Franchise
26 pages
Cot 1 Quarter 1 Lesson Plan
No ratings yet
Cot 1 Quarter 1 Lesson Plan
5 pages
Tle DLL 8
No ratings yet
Tle DLL 8
4 pages
Instructional Materials and English Proficiecyof Grade 10 Students in Magwawaintegrated SCHOOLS.Y 2021-2022
No ratings yet
Instructional Materials and English Proficiecyof Grade 10 Students in Magwawaintegrated SCHOOLS.Y 2021-2022
33 pages
CDPNHS Entrep Q4 W1 Las1
No ratings yet
CDPNHS Entrep Q4 W1 Las1
12 pages
Sas3 Edu542
No ratings yet
Sas3 Edu542
9 pages
Sentiment Analysis of IMDb Movie Reviews Using LSTM
No ratings yet
Sentiment Analysis of IMDb Movie Reviews Using LSTM
4 pages
Working in Partnership With Health and Social Care
No ratings yet
Working in Partnership With Health and Social Care
5 pages
RoBERTa-LSTM A Hybrid Model For Sentiment Analysis With Transformer and Recurrent Neural Network
No ratings yet
RoBERTa-LSTM A Hybrid Model For Sentiment Analysis With Transformer and Recurrent Neural Network
9 pages
TEFL OT Certificate - Sheline 2
No ratings yet
TEFL OT Certificate - Sheline 2
2 pages
Application of The Universal Design For Learning: Inclusive Education - Case Study
No ratings yet
Application of The Universal Design For Learning: Inclusive Education - Case Study
10 pages
Week 2 - ACTIVITY 1 (ASYNC) Sept. 27, 2022 David James B. Ignacio
No ratings yet
Week 2 - ACTIVITY 1 (ASYNC) Sept. 27, 2022 David James B. Ignacio
1 page
Staj 1
No ratings yet
Staj 1
7 pages
Computational Thinking: Computational Thinking
No ratings yet
Computational Thinking: Computational Thinking
22 pages
Lecture-2 Uniprocessor
No ratings yet
Lecture-2 Uniprocessor
18 pages
Syllabus For General English in Higher Education Fitriani A21 English Education
No ratings yet
Syllabus For General English in Higher Education Fitriani A21 English Education
7 pages
Student Fee Rates For FY 2020 and FY 2021
No ratings yet
Student Fee Rates For FY 2020 and FY 2021
18 pages
CLAVE Thesis Proposal EDIT
No ratings yet
CLAVE Thesis Proposal EDIT
23 pages
Neha Chheda Resume 2016
No ratings yet
Neha Chheda Resume 2016
2 pages
Acc QSN at Ans Gce PDF
No ratings yet
Acc QSN at Ans Gce PDF
156 pages
U. 2/ Law and Order Lesson: 1 Lesson Topic: Learning Outcomes
No ratings yet
U. 2/ Law and Order Lesson: 1 Lesson Topic: Learning Outcomes
3 pages
Teacher S Resource Centre at A Glance L4
No ratings yet
Teacher S Resource Centre at A Glance L4
1 page
Wrist EMG Improves Gesture Classification For Stroke Patients
No ratings yet
Wrist EMG Improves Gesture Classification For Stroke Patients
6 pages
Work Ethic Rubric
No ratings yet
Work Ethic Rubric
1 page
Design Thinking: Running Head: Reflective Report
No ratings yet
Design Thinking: Running Head: Reflective Report
22 pages
Achievement and Response of Students at Favorite Junior High Schools in Sukabumi On Trends in International Mathematics
No ratings yet
Achievement and Response of Students at Favorite Junior High Schools in Sukabumi On Trends in International Mathematics
13 pages
Name: Erika Mae Aliviano Teacher: Ms. Ailun Jaugan Gr. & Sec.: Stem 11-Innovativeness III - Self - Learning Activities
No ratings yet
Name: Erika Mae Aliviano Teacher: Ms. Ailun Jaugan Gr. & Sec.: Stem 11-Innovativeness III - Self - Learning Activities
2 pages
BERT Foundations and Applications: Definitive Reference for Developers and Engineers
From Everand
BERT Foundations and Applications: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Machine Learning Algorithms for Data Scientists: An Overview
From Everand
Machine Learning Algorithms for Data Scientists: An Overview
Vinaitheerthan Renganathan
No ratings yet


N-grams 1G 2G 3G 1G+2G 2G+3G 1G+2G+3G

[Figure panels: Unigram (1G); Bigram (2G); Trigram (3G); Unigram and Bigram (1G+2G); Bigram and Trigram (2G+3G); Unigram, Bigram and Trigram (1G+2G+3G)]

Authors Methods 1G 2G 3G 1G+2G 2G+3G 1G+2G+3G

D. BERT pre-trained fine-tuning model
