Modern Approaches in Sentiment Analysis Models
Aleksei Makin
Northeastern University
Abstract—This study evaluates the efficiency and accuracy of various sentiment analysis models and word embedding techniques on datasets with differing text lengths. Using separate datasets for long and short texts, the performance of models incorporating Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and GloVe embeddings combined with Naive Bayes, bidirectional LSTM, conv-LSTM, and SieBERT models was compared. Results demonstrate that advanced embeddings like GloVe, when used with LSTM-based models, significantly improve accuracy. Notably, the SieBERT model achieved the highest accuracy of 94.55% on the TADA dataset and 92.00% on the IMDB dataset. Additionally, it was found that using a subset of approximately 1000 labeled samples to finetune SieBERT helped mitigate overfitting and achieve optimal performance. These findings suggest that careful selection of models and vectorizers based on text length can lead to more efficient and accurate sentiment analysis. Simple models like Naive Bayes with TF-IDF are still powerful enough. However, finetuning SieBERT for more specific data types and lengths can yield state-of-the-art results. This approach is particularly beneficial for different industries, where domain-specific sentiment analysis can provide highly accurate insights into customer feedback, market trends, and other types of communication. This study offers guidance for future work on achieving state-of-the-art results on specific knowledge domains, which is crucial for deploying these models in real-world applications.

Index Terms—Sentiment Analysis, Natural Language Processing (NLP), Machine Learning, Deep Learning, Word Embeddings, Text Classification.

I. INTRODUCTION

Accurately analyzing the emotional component of customers' and clients' communication is critical in the race for leadership for most companies in various industries, from online trading to traditional business customer service. Determining this emotional tone depends on many factors and is a non-trivial task when the tonality, a positive or a negative reaction, must be identified accurately.

Today, professionals have a plethora of algorithms at their disposal to tackle the challenge of evaluating the emotional content of consumer texts. Sentiment analysis, a practical application of Natural Language Processing (NLP), is one such tool.

The field of sentiment analysis employs a diverse array of approaches and techniques, from traditional ML classifiers like Naive Bayes and Logistic Regression to advanced models like LSTM and BERT. To achieve the desired accuracy, it is crucial to adhere to best practices at every stage: preprocessing, feature extraction, and model training.

One of the factors influencing recognition accuracy is the length of the text. On the one hand, too short a text may not contain enough data for a determination; on the other hand, a long text may carry too much context, which can complicate the formation of an unambiguous assessment [2].

This article evaluates various algorithms and techniques, highlighting that the SieBERT model achieved the highest accuracy of 94.55% on the TADA dataset and 92.00% on the IMDB dataset. However, the NB classifier with a TF-IDF vectorizer also performed well, demonstrating that simpler models can be effective.

This paper's contribution lies in presenting a comparative analysis of different algorithms, from classic machine learning to recent deep learning models, with various word embedding techniques. Efficiency is compared across datasets with short and long texts, and the results are reported.

The paper is organized as follows: Section 2 presents an overview of related works. Section 3 discusses the search and implementation methodology in detail. Section 4 shows and discusses the results of the tested models. Section 5 addresses ethical questions.

II. LITERATURE REVIEW

Customer opinions are a valuable source of information that businesses need to capture. Automated sentiment analysis frameworks are essential tools for achieving this. These frameworks can help companies guide customers, recommend suitable products, and address negative feedback. Additionally, sentiment analysis can be used to evaluate competitors and learn from their mistakes. Various models can be employed to extract sentiment from text.

Sentiment analysis models range from rule-based and traditional machine learning approaches to cutting-edge deep learning techniques. However, these models face challenges related to training speed, contextual understanding, and model complexity. This study compares different models and word embedding methods to determine the most effective approach for analyzing short and long texts.

A. Word Embeddings

Word embeddings are a fundamental technique in natural language processing (NLP) for transforming words into continuous vector spaces where semantically similar words are closer together. Key word-embedding techniques include Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, GloVe, and BERT.
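To make these vectorization techniques concrete, the short sketch below shows how BoW, TF-IDF, and Word2Vec representations are typically produced with Scikit-learn and Gensim, two of the libraries listed later in the experimental environment. The toy corpus and all hyperparameter values are illustrative assumptions rather than the settings used in this study:

# Illustrative sketch: BoW, TF-IDF, and Word2Vec features for a toy corpus.
# Hyperparameters here are assumptions, not the settings used in this study.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

corpus = [
    "the movie was surprisingly good",
    "the plot was slow and boring",
]

bow = CountVectorizer().fit_transform(corpus)      # Bag of Words counts
tfidf = TfidfVectorizer().fit_transform(corpus)    # TF-IDF weighted counts

# Word2Vec learns dense vectors from tokenized sentences (skip-gram here).
# Pre-trained GloVe vectors can be loaded similarly, e.g. via gensim.downloader.
tokens = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokens, vector_size=100, window=5, min_count=1, sg=1)

print(bow.shape, tfidf.shape, w2v.wv["movie"].shape)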
TF-IDF: Term Frequency-Inverse Document Frequency (TF-IDF) measures the importance of words relative to a corpus by evaluating their frequency in the target document versus the entire collection. It transforms text into continuous numerical vectors for use in machine learning models. In sentiment analysis, TF-IDF combined with deep learning models like DNN, CNN, and RNN has shown notable performance. For example, a study demonstrated that TF-IDF with a CNN achieved a precision of 0.777 and a recall of 0.790 on the Sentiment140 dataset [1].

Word2Vec: Developed by Google, Word2Vec includes two models: Continuous Bag of Words (CBOW) and Skip-gram. Both models are designed to capture word context, with CBOW predicting target words from context words and Skip-gram doing the reverse. Studies have shown Word2Vec to be effective in various sentiment analysis tasks, achieving notable performance improvements in capturing semantic relationships between words. For instance, combining Word2Vec embeddings and CNN models has demonstrated superior results in emotion detection tasks. Specifically, this approach achieved an F1-score of 0.64, significantly outperforming traditional methods like TF-IDF combined with Naive Bayes and Logistic Regression, which scored 0.56 and 0.58, respectively [2], [3].

GloVe: Created at Stanford, GloVe (Global Vectors for Word Representation) generates word vectors by aggregating global word-word co-occurrence statistics from a corpus, producing a vector space where word relationships are represented. The use of GloVe embeddings in sentiment analysis has led to significant improvements in accuracy, particularly when combined with deep learning models like LSTM. For instance, the GloVe-CNN-BiLSTM model achieved an accuracy of 95.60% with an F1-score of 0.9489. Another study noted that models utilizing TF-IDF-GloVe with various neural network architectures, such as BiLSTM and CNN, attained accuracies up to 93.58% [2], [3].

B. Models

Logistic Regression (LR): A popular statistical method for binary classification, logistic regression has been effectively used for sentiment analysis due to its simplicity and interpretability. Studies have shown LR combined with TF-IDF to achieve robust performance in various sentiment analysis tasks [4].

Naive Bayes (NB): A probabilistic classifier based on Bayes' theorem, NB is particularly effective for text classification tasks such as sentiment analysis despite its strong assumption of feature independence. NB combined with TF-IDF has demonstrated competitive accuracy, making it a reliable baseline for sentiment analysis models [5].

Shallow Neural Networks (SNN): These include feed-forward neural networks with a few hidden layers. They are simpler than deep networks but can still capture non-linear relationships in the data. Research has shown that SNNs can achieve satisfactory performance in sentiment analysis when used with appropriate feature extraction techniques.

Convolutional Neural Networks (CNN): Originally designed for image processing, CNNs have been adapted for text classification tasks. They use convolutional layers to capture local features, followed by pooling layers to reduce dimensionality and fully connected layers for classification. CNNs are effective for sentiment analysis, especially when combined with word embeddings like Word2Vec and GloVe [3].

Long Short-Term Memory (LSTM): A recurrent neural network (RNN) architecture capable of learning long-term dependencies. LSTM networks are particularly useful for sequence prediction problems in NLP, such as sentiment analysis. Studies have demonstrated that LSTMs, especially when combined with GloVe embeddings, can achieve high accuracy in sentiment classification tasks [3], [4].

ConvLSTM: A combination of CNN and LSTM, ConvLSTM integrates convolutional operations into LSTM units, making it suitable for spatiotemporal data and complex NLP tasks. Recent research has shown ConvLSTM to outperform traditional LSTM models in sentiment analysis, achieving improved accuracy by capturing both local and sequential features of text [4].

BERT: Bidirectional Encoder Representations from Transformers (BERT), developed by Google, is a transformer-based model pre-trained on a large corpus of text and then fine-tuned for specific tasks. BERT considers context from both directions (left and right), making it highly effective for a wide range of NLP tasks. Research has demonstrated that BERT, particularly its smaller variant DistilBERT, outperforms traditional models in sentiment analysis, achieving high accuracy scores on datasets like SST2 [5].

SieBERT: SieBERT (Sentiment in English BERT) is a pre-trained language model specifically designed for sentiment analysis tasks. It leverages the transformer architecture to provide contextualized embeddings and has been fine-tuned on a large-scale dataset of sentiment-labeled text documents. SieBERT has demonstrated superior performance in sentiment classification tasks, achieving high accuracy with minimal training. The model's ability to be fine-tuned on specific datasets further enhances its performance, making it a valuable tool for sentiment analysis [7].
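As a concrete illustration of this off-the-shelf usage, the sketch below loads the publicly released SieBERT checkpoint through the Hugging Face Transformers pipeline, one of the tools listed in the experimental environment. The example sentences are invented, and whether this exact loading path matches the experimental setup is an assumption:

# Illustrative sketch: applying the public SieBERT checkpoint
# ("siebert/sentiment-roberta-large-english" on the Hugging Face Hub)
# without any additional training.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="siebert/sentiment-roberta-large-english",
)

reviews = [
    "A short, tweet-length rave: absolutely loved it!",
    "The film dragged on and the ending made no sense.",
]
for review, result in zip(reviews, sentiment(reviews)):
    # Each result is a dict such as {"label": "POSITIVE", "score": 0.99}.
    print(result["label"], round(result["score"], 3), "-", review)

Fine-tuning the same checkpoint on a small, domain-specific labeled subset, as discussed in the abstract, follows the standard Transformers fine-tuning workflow.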
C. Detailed Analysis from Related Studies

Xu (2023) proposed a Convolutional Long Short-Term Memory (ConvLSTM) model for movie review sentiment analysis. The model captures sequential information and long-distance dependencies in text, outperforming traditional methods in sentiment analysis tasks. The ConvLSTM model integrates convolutional operations into LSTM units, making it suitable for spatiotemporal data and complex NLP tasks. The ConvLSTM cell can be represented as follows:

C_t = \sigma(W_x \ast x_t + W_h \ast h_{t-1} + b)

where C_t is the cell state at time t, W_x and W_h are weight matrices, x_t is the input at time t, and \sigma is the activation function [4].

Wang (2024) explored the application of Word2Vec and Support Vector Machine (SVM) in sentiment analysis of Amazon reviews. The study found that combining Word2Vec for feature extraction with SVM for classification achieved efficient and accurate sentiment classification, outperforming traditional methods. The Word2Vec model generates word embeddings by minimizing the following objective function:

J(\theta) = -\sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)

where w_t is the target word, the w_{t+j} are the context words, and c is the context window size [3].

Suresh Kumar et al. (2024) introduced a hybrid machine learning model using the Enhanced Vector Space Model (EVSM) and Hybrid Support Vector Machine (HSVM) classifier. This approach achieved an accuracy of 92.78%, demonstrating improved sentiment analysis performance by leveraging advanced vector space models and multiclass classification techniques. The SVM classifier aims to find the optimal hyperplane that maximizes the margin between different classes, represented by:

\max_{w,b} \frac{2}{\lVert w \rVert}

subject to y_i (w \cdot x_i + b) \ge 1 for all i, where w is the weight vector, b is the bias, the x_i are the input vectors, and the y_i are the class labels [2].

Wu et al. (2024) conducted research on the application of deep learning-based BERT models in sentiment analysis. The study highlighted the significant potential of incorporating BERT models into sentiment analysis tasks, achieving an accuracy of 91.3% with DistilBERT on the SST2 dataset, outperforming traditional models like FastText, Word2Vec, and GloVe. The BERT architecture is built on the transformer's scaled dot-product attention:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V

where Q is the query matrix, K is the key matrix, V is the value matrix, and d_k is the dimension of the key vectors [5].

Hartmann et al. (2022) introduced SieBERT, a pre-trained language model designed explicitly for sentiment analysis tasks. SieBERT leverages the transformer architecture to provide contextualized embeddings and has been fine-tuned on a large-scale dataset of sentiment-labeled text documents. The study demonstrated that SieBERT performs better in sentiment classification tasks with minimal training [7].

III. DATA QUALITY AND LENGTH OF TEXTS IN ANALYSIS

Sentiment analysis of short texts like tweets can be challenging due to their limited context, while long texts like reviews can be complex to analyze due to potentially mixed sentiments. However, hybrid models such as SVM combined with Enhanced Vector Space Models (EVSMs) have shown effectiveness in analyzing short texts, with reported accuracy rates as high as 92.78% [2].

The quality of data used for sentiment analysis significantly affects the accuracy of the results. Li et al. (2024) examined the impact of data quality on sentiment classification performance, considering three criteria: informativeness, readability, and subjectivity. The study highlighted that higher readability and shorter text datasets led to more accurate sentiment classification. Important preprocessing steps, such as removing noise and normalizing data, improve data quality and help ensure reliable sentiment analysis results [2].

Based on the findings of these related works, recent deep-learning models with word embeddings perform close to ideal. However, there is no clear picture of how approaches for short texts should differ from those for long ones.

IV. METHODOLOGY

This study adopted a top-down quantitative research approach to explore the efficiency and accuracy of sentiment analysis (SA) across various word embedding (WE) techniques and models. The key factor significantly influencing SA performance was examined: the length of the texts.

The research began with a literature review to identify the latest advancements in WE techniques, models, and approaches for sentiment analysis. The primary goal was to map the field's current state and identify potential gaps and challenges, particularly in handling texts of varying lengths.

Extensive searches were conducted in academic databases and journals to find recent papers on WE techniques and sentiment analysis models. This review enabled the identification of recent and classic WE models, such as TF-IDF, Word2Vec, and GloVe, along with sentiment analysis models, including Naive Bayes, Logistic Regression, SNN, CNN, LSTM, ConvLSTM, and BERT with finetuned SieBERT, created especially for sentiment analysis tasks.

The identified models and techniques were categorized and analyzed, highlighting their strengths, weaknesses, and application areas. This mapping exercise provided a clear understanding of the cutting-edge techniques in the field. It revealed areas that require further research, particularly the impact of text length on model performance. Datasets such as IMDB for long texts and TADA for short texts were selected, ensuring they were balanced for training purposes. The result of this analysis is represented in Figure 1.

By systematically evaluating different WE techniques and models on these datasets, the research aimed to provide insights into the most effective strategies for handling short and long texts in sentiment analysis. The research was guided by the hypothesis that text length and training dataset balance are critical factors affecting the accuracy of sentiment analysis models.
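Because this hypothesis makes class balance and text length the central dataset properties, a quick inspection of both is a natural preliminary step. The sketch below shows one way to perform it with Pandas; the file names and the text/label column names are assumptions made for illustration, not the actual identifiers used in this study:

# Illustrative sketch: checking class balance and text length for the two
# training sets. File and column names are assumed, not taken from this study.
import pandas as pd

for name in ("imdb_train.csv", "tada_train.csv"):    # hypothetical file names
    df = pd.read_csv(name)
    lengths = df["text"].str.split().str.len()       # length in whitespace tokens
    print(name)
    print("  label balance:")
    print(df["label"].value_counts(normalize=True))
    print("  mean / median length:", lengths.mean(), lengths.median())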
B. Framework of Experiments

The following steps outline the main framework for building and evaluating sentiment analysis models in this study, as shown in Figure 3.
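As a complement to Figure 3, the sketch below illustrates one plausible realization of such a pipeline (preprocessing, vectorization, training, and evaluation) built from the libraries listed in Section VI, with NLTK for stop-word removal and Scikit-learn for TF-IDF and Naive Bayes. The placeholder data, split ratio, and preprocessing choices are assumptions rather than the exact steps of the figure:

# Illustrative sketch of a preprocess -> vectorize -> train -> evaluate pipeline.
# The placeholder data, split ratio, and preprocessing choices are assumptions.
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase the text and drop stop words; heavier cleaning is also possible.
    return " ".join(w for w in text.lower().split() if w not in stop_words)

texts = [
    "I loved this movie, great acting",
    "absolutely wonderful and heartwarming",
    "terrible plot and even worse acting",
    "a boring, painful waste of time",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    [preprocess(t) for t in texts], labels, test_size=0.5, random_state=42
)

vectorizer = TfidfVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(X_train), y_train)
preds = clf.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, preds))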
Fig. 4. IMDB Train Dataset Analysis
Fig. 5. IMDB Test Dataset Analysis
Fig. 6. TADA Train Dataset Analysis
Fig. 7. TADA Test Dataset Analysis
size of 32 and a dataset size of 500, achieving an accuracy of 93.20%.

VI. EXPERIMENTAL ENVIRONMENT

The experiments were conducted on Google Colab Pro, which utilizes GPU-enabled machines for faster computation. This environment provides the necessary computational power to efficiently handle large datasets and complex models. The primary machine specifications included an NVIDIA Tesla P100 or T4 GPU.

The following libraries and tools were used for the implementation of the experiments:

• TensorFlow: for building and training neural network models, including dense networks and LSTMs.
• Scikit-learn: for implementing traditional machine learning models like Naive Bayes and Logistic Regression, as well as for vectorizing text data using TF-IDF and BoW.
• Gensim: for training Word2Vec models and handling word embeddings.
• Hugging Face Transformers: for loading pre-trained BERT models and tokenizers.
• NLTK: for natural language processing tasks such as tokenization and stop word removal.
• Pandas and NumPy: for data manipulation and numerical operations.
• Matplotlib and Seaborn: for data visualization and plotting results.

VII. RESULTS

Various models' performance was evaluated using IMDB (with long sentences) and TADA (with short sentences). The table below summarizes the accuracy achieved by each model and vectorizer combination:

Fig. 8. Accuracy achieved by each model and vectorizer combination on the IMDB and TADA datasets.

The results indicate the following key findings:

a) Naive Bayes with BoW and TF-IDF: Both BoW and TF-IDF performed well with Naive Bayes, achieving 84.84% and 86.39% accuracy on the IMDB dataset, respectively, and 87.52% and 88.02% on the TADA dataset, respectively. TF-IDF slightly outperformed BoW. This approach could be very efficient for short sentences, being the fastest and most cost-effective of all the options.

b) Word2Vec: Word2Vec showed significantly lower accuracy than the other vectorizers, achieving 66.95% on the IMDB dataset and 55.41% on the TADA dataset, highlighting its limitations.

c) Conv-LSTM with GloVe: GloVe embeddings achieved higher accuracy when combined with conv-LSTM models. The conv-LSTM model with GloVe achieved 91.54% accuracy on the IMDB dataset and 93.12% on the TADA dataset, demonstrating strong performance, particularly for short text analysis.

d) SieBERT: SieBERT achieved the highest accuracy among all models, with 92.00% on the IMDB dataset and 94.55% on the TADA dataset, indicating its superior performance in handling both short and long texts.

A. Comparison with Related Work

The table below compares the accuracy of the models and vectorizers used in this study with results from related research:

TF-IDF with Naive Bayes achieved 86.39% accuracy on long texts and 88.02% on short texts, significantly higher than the related work's 71.10%. GloVe with LSTM showed improvements, reaching 87.62% for long texts and 90.23% for short texts, compared to 87.18% reported elsewhere. GloVe with ConvLSTM performed exceptionally well, with 87.05% accuracy on long texts and 91.25% on short texts, compared to 89.36% reported in related work. SieBERT matched closely with related studies, achieving 92.00% accuracy on long texts and 94.55% on short texts, compared to 91.30% reported elsewhere.

Fig. 9. Comparison of the accuracy of models and vectorizers with results from related research.

B. Observations

The use of advanced embeddings like GloVe combined with LSTM-based models, or of fine-tuned BERT transformers, significantly improved the accuracy of sentiment analysis, particularly on datasets with varying text lengths. These findings underscore the importance of selecting appropriate vectorizers and models based on the characteristics of the text data. However, Naive Bayes with TF-IDF still achieves nearly the same level of accuracy without requiring extensive computational resources for training.
transparency. Additionally, sentiment analysis can infer emo- state-of-the-art results in specific knowledge domains. Under-
tions and opinions that individuals might prefer to keep private, standing each approach’s resource requirements and domain-
leading to a sense of intrusion. To mitigate these concerns, it is specific performance will help select the most suitable models
essential to implement robust data anonymization techniques, for deployment in real-world applications, balancing accuracy
obtain explicit user consent, and ensure transparency about with computational efficiency.
data usage. Additionally, providing users with tools to clean From an ethical perspective, sentiment analysis provides
or neutralize sentiment in their text before submission can many benefits, such as understanding customer feedback and
help protect their privacy. These tools can automatically detect improving user experiences. However, it also presents ethical
and neutralize emotional content, allowing users to control the challenges, including privacy invasion, potential biases, and
sentiment conveyed in their data. the risk of misclassification. To address these concerns, adopt-
ing a responsible approach to data handling, model develop-
B. Errors and Misclassification ment, and obtaining user consent is crucial. Ensuring ethical
Errors in sentiment analysis can have practical and ethical use of sentiment analysis technology involves implementing
implications. Misclassifications can lead to incorrect conclu- robust data anonymization techniques, conducting regular bias
sions about individuals’ opinions and emotions, potentially audits, and maintaining transparency about data usage. Fu-
resulting in inappropriate responses or recommendations. For ture work should focus on developing advanced methods to
example, a negative sentiment incorrectly identified as positive mitigate bias, enhance privacy protections, and improve the
might lead to recommendations that frustrate users, while a transparency and accountability of sentiment analysis sys-
positive sentiment misclassified as negative could result in tems. Additionally, offering users tools to clean or neutralize
unnecessary concern or intervention. To mitigate the impact of sentiment in their text and incorporating randomization in
errors, it is essential to implement robust validation and testing recommendations can further protect privacy and support user
procedures, use ensemble methods to improve model accuracy, autonomy.
and provide mechanisms for users to correct misclassifications.
R EFERENCES
Continuous monitoring and updating of models based on user
feedback can also help reduce errors over time. [1] N. C. Dang, M. N. Moreno-Garcı́a, and F. De la Prieta, ”Sen-
timent Analysis Based on Deep Learning: A Comparative Study,”
Electronics, vol. 9, no. 3, pp. 1-29, 2020. [Online]. Available:
C. Impact on User Autonomy http://dx.doi.org/10.3390/electronics9030483.
Sentiment analysis can influence user autonomy by subtly [2] K. Suresh Kumar, A. S. Radha Mani, T. Ananth Kumar, Ahmad Jalili,
Mehdi Gheisari, Yasir Malik, Hsing-Chung Chen, and Ata Jahangir
shaping user experiences and decisions. For instance, person- Moshayedi, ”Sentiment Analysis of Short Texts Using SVMs and
alized content recommendations based on sentiment analysis VSMs-Based Multiclass Semantic Classification,” Applied Artificial
can create echo chambers, where users are only exposed to Intelligence, vol. 38, no. 1, pp. e2321555, 2024. [Online]. Available:
https://doi.org/10.1080/08839514.2024.2321555.
information that reinforces their existing views. This can limit [3] A. Mahmood, T. Adnan, and M. H. Ali, ”Emotion detection us-
users’ exposure to diverse perspectives and restrict their ability ing Word2Vec and convolutional neural networks,” Procedia Com-
to make fully informed decisions. To mitigate this impact, it puter Science, vol. 177, pp. 349-355, 2020. [Online]. Available:
https://doi.org/10.1016/j.procs.2020.10.048.
is important to design recommendation systems that promote [4] Y. Jang, J. Park, and S. Kim, ”Word2Vec and SVM Fusion
diverse content exposure and provide users with options to for Advanced Sentiment Analysis,” Journal of Information Sci-
customize their content preferences. Transparency about how ence, vol. 45, no. 4, pp. 564-577, 2019. [Online]. Available:
https://doi.org/10.1177/0165551519837184.
recommendations are generated and offering users control over [5] Y. Wu, Z. Jin, C. Shi, P. Liang, and T. Zhan, ”Research on the
their data can further enhance user autonomy. Additionally, application of deep learning-based BERT model in sentiment anal-
incorporating some degree of randomization in recommenda- ysis,” in Proceedings of the 2nd International Conference on Soft-
ware Engineering and Machine Learning, 2024. [Online]. Available:
tions can introduce novel content and ideas, fostering creativity https://doi.org/10.54254/2755-2721/71/2024MA0051.
and preventing the formation of echo chambers. [6] L. Xiaoyan, R. C. Raga, and S. Xuemei, ”GloVe-CNN-BiLSTM
Model for Sentiment Analysis on Text Reviews,” Journal of Sen-
IX. C ONCLUSION sors, vol. 2022, Article ID 7212366, 2022. [Online]. Available:
https://doi.org/10.1155/2022/7212366.
This study evaluated various models and vectorizers for [7] J. Hartmann, M. Heitmann, C. Siebert, C. Schamp, ”More than a
sentiment analysis on datasets with different text lengths. The Feeling: Accuracy and Application of Sentiment Analysis,” Interna-
tional Journal of Research in Marketing, 2022. [Online]. Available:
best performance was observed with the SieBERT model, https://doi.org/10.1016/j.ijresmar.2022.05.005.
which achieved 94.55% accuracy on short texts and 92.00%
accuracy on long texts. The ConvLSTM model using GloVe
embeddings also showed strong results, achieving 93.12% ac-
curacy on short texts and 91.54% accuracy on long texts. Naive
Bayes with TF-IDF demonstrated competitive performance
with 88.02% accuracy on short texts and 86.39% accuracy
on long texts, making it a viable option for scenarios with
limited computational resources.