Comparative Analysis of Large Language Models and Traditional Methods for Sentiment Analysis of Tweets Dataset
Abstract:- Sentiment analysis is widely recognised as one of the most actively researched areas in data mining. These days, a number of social media platforms have been created, and Twitter is a crucial instrument for exchanging and gathering people's thoughts, feelings, opinions, and attitudes about specific things. This paper presents a comparative analysis of Large Language Models (LLMs) and ML models for sentiment analysis on a Twitter dataset. The study evaluates the performance of XLNet against established algorithms including KNN, RF, and XGBoost. The sentiment analysis methodology involves pre-processing the Twitter dataset through noise removal and tokenisation, followed by feature extraction using methods like Bag-of-Words and Word2Vec. Results show that XLNet is superior to the conventional models: it achieves 99% precision, recall, and F1-score and an accuracy of 99.54%. In comparison, KNN achieves 78% accuracy, 85% precision, 88% recall, and an 86% F1-score, while RF and XGBoost exhibit lower performance, with accuracy rates of 69% and 60%, respectively. The performance comparison highlights the superior capabilities of XLNet for sentiment classification tasks, indicating its potential for enhancing text classification applications. Future research can look at ways to improve XLNet's performance on bigger and more complicated datasets by combining it with more sophisticated deep learning methods such as attention mechanisms and transfer learning.

Keywords:- Sentiment Analysis, LLM, Twitter Dataset, Natural Language Processing, Text Classification, Social Media Analytics.

I. INTRODUCTION

The Internet has revolutionised the way individuals share their thoughts and ideas. These days, most people do so on websites that allow product reviews, on social media, and in online forums and blogs [1]. Facebook, Twitter, Google Plus, and many other social media platforms attract millions of users every day [2], who share their thoughts, feelings, and comments regarding their everyday lives [3]. Consumers may educate and persuade one another through online forums and communities, which are an interactive kind of media [4]. Blog entries, reviews, comments, tweets, and status updates all contribute to the mountain of sentiment-rich data generated by social media [5].

Sentiment analysis (SA) informs consumers, before they purchase a product, whether the available information is suitable [6][7]. This analytical data is used by marketers and businesses to better understand their goods and services so that they may be provided according to the user's requirements [8][9]. Processing, finding, or analysing the factual material available is the major emphasis of textual information retrieval strategies [10].

Worldwide, thanks to technological advancements, social media platforms like Instagram, Facebook, LinkedIn, and YouTube have flourished [11][12]; through them, people's ideas and thoughts on things, events, or objectives can be expressed [13][14]. Today, people all over the world love to communicate their opinions and thoughts through short messages called tweets on the popular microblogging network known as Twitter [15][16]. An enormous amount of sentiment data derived from analyses of tweets is frequently generated [17]. Twitter is a great platform for sharing news and interacting with other users online [18][19]. The way people feel on Twitter has a big impact on many parts of our lives. The goal of SA and text classification is to extract information from texts and then classify its polarity as neutral (Ne), negative (N), or positive (P) [20].

Traditional machine learning (ML) approaches have long been utilised for SA tasks, relying on extensive preprocessing and feature extraction techniques [21][22]. However, the advent of LLMs such as GPT and BERT has transformed the landscape of NLP [23][24]. These models, pre-trained on vast corpora of text, demonstrate a superior ability to comprehend context, capture nuances, and generate accurate predictions, thereby outperforming conventional methods in many scenarios [25][26]. The main objective of this project is to use Twitter datasets to evaluate and contrast LLMs with more traditional ML techniques for sentiment analysis [27]. By leveraging the strengths of both paradigms, this research seeks to highlight their efficacy, limitations, and practical applications.

Significance and Contribution of Study
This study highlights the significance of leveraging both LLMs and traditional methods for sentiment analysis to enhance accuracy and contextual understanding. Finding out how well each method works on a real-world Twitter dataset allows us to compare and contrast their merits and shortcomings. The findings contribute to advancing sentiment analysis techniques, enabling more effective applications in social media analytics and NLP. The contribution of this study
II. LITERATURE REVIEW

In this study, Ihnaini et al. (2024) describe a fine-tuning process that incorporates both supervised techniques and reinforcement learning from human feedback, specifically designed to align the models with the historical and cultural context of Song Ci. Notably, the ChatGLM-6B (8-bit) model achieved the best F1-score of 0.840, demonstrating its exceptional ability to merge ancient literary analysis with modern computational technology, thereby broadening our understanding of the emotional spectrum of classical Chinese poetry [28].

In this study, Patrick et al. (2023) develop and test a model using a variety of learning algorithms, including Linear Regression, KNN, SVM, RF, Bagging, and Gradient Boosting. There are two sets of data, a test dataset and a training dataset, both of which were used to create and evaluate the model. The reliability check determined that 1.1% of respondents were untrustworthy due to unanswered questions. The results demonstrate that, of the two algorithms, RF-supervised ML yields a superior accuracy of 0.711 compared to KNN's 0.515 [29].

In this study, Gope et al. (2022) note that sentiment analysis seeks to categorise the amount of positive and negative emotion expressed in a given text. ML methods such as LR, MNB, BNB, RF, and LSVM were used, achieving a 91.90% success rate with the RFC. They also achieved maximum accuracy (97.52%) using an RNN with LSTM as a DL strategy; their model is well-suited to the RNN-LSTM technique [33].

In this paper, Al-Hagree and Al-Gaphari (2022) collect data for SA tasks from evaluations left by users of banking mobile applications on the Google Play Store. Arabic sentiment analysis is carried out using ML algorithms, namely the NB, KNN, DT, and SVM models. The NB model outperformed the competing DT, KNN, and SVM algorithms in terms of evaluation quality: in comparison to the other models, it achieved exceptional results in recall (88.08%), accuracy (88.25%), and F-score (88.25%) [34].

Below, Table 1 provides a summary of the literature review, with dataset approaches, results, and limitations for text dataset classification.
III. METHODOLOGY
Figure 2 illustrates sentiment analysis across common words, categorised into positive (blue), negative (orange), and neutral (green) sentiments. Words like "thi," "go," "time," "like," and names like "trump" and "obama" are analysed, with the y-axis showing frequency counts up to 250. This analysis, likely derived from social media or text data, reveals how word usage correlates with sentiment expression, offering insights into linguistic and emotional trends in the dataset.

Feature Extraction
Feature extraction is a powerful tool for reducing resource requirements while preserving critical data [47][48]. The effectiveness and precision of ML models are greatly enhanced by feature extraction [49]. Emotion counts (both positive and negative), question marks, hashtags, and exclamation points are some of the key characteristics [50].

Word2Vec
The process of Word2Vec involves converting words to vectors and then identifying words that are similar to each other. This enables it to identify words that reflect distinct emotions [51]. The construction of the Word2Vec features involves calculating the average similarity of a whole tweet to a given word, with the average values normalised to [0,1] to account for variations in word count across tweets [52][53]. Figure 3 displays the terms that are comparable.
Figure 3 charts three keywords - "good," "bad," and "information" - showing their associations with different sentiment categories. For each word, the bars display varying heights indicating the frequency or strength of association, with "bad" showing the highest positive sentiment count (around 1000), followed by "good" (around 800), while "information" has more balanced sentiment distributions across all three categories. This visualisation effectively demonstrates how words that might seem inherently positive or negative can actually have complex sentiment associations in real-world usage.

Data Splitting
Data splitting is a crucial step for evaluating the performance of ML models. The basic premise is to partition the dataset into separate parts for testing and training. We use 80% of the data for training and just 20% for testing.

Large Language Model (XLNET)
XLNet [54] is one of Google AI's 2019 transfer learning models; it is similar to BERT but uses an autoregressive (AR) pre-training strategy to generalise its features, making it perform better than BERT on many benchmark datasets [12][55]. We show below how XLNet uses permutation language modelling (PLM) to overcome the shortcomings of autoencoding (AE) models, namely the difficulty of obtaining bidirectional context [56]. In comparison to BERT, XLNet's convergence time is much longer because it trains through all possible word sequences, utilising permutations of occurrences for a particular word [57][58].

The core concept of XLNet is to enhance PLM with additional capabilities in order to capture bidirectional context [59][60]. AR factorisation can be accomplished over T! separate sequences by considering every possible position of each token in a phrase of T tokens [61]. Let Z_T be the collection of all possible permutations of a length-T sequence:

\max \; \mathbb{E}_{z \sim Z_T}\!\left[\sum_{t=1}^{T} \log p\!\left(x_{z_t} \mid x_{z_{<t}}\right)\right] \quad (1)

where z_t and z_{<t} denote the t-th element and the first t-1 elements of a permutation z in Z_T, respectively.

The likelihood of token x_{z_t} given the previous tokens x_{z_{<t}} is computed using XLNet's autoregressive permutation approach, as displayed in Equation (1). Although XLNet only modifies the factorisation order, not the sequence order, during training, it preserves the original sequence order and utilises Transformers to match the original sequence's positional encoding [62][63]. This quality is helpful for fine-tuning, as it takes into account just the sequence's natural order [64]. We thus use XLNet in our research, since its design differs from that of BERT [65][66].

Performance Metrics
To evaluate the performance of the models, a set of evaluation metrics, also known as performance metrics, is used [67][68]. A confusion matrix may be used to compare the predicted and actual values of a model in order to evaluate its accuracy [69][70]. Four measures were utilised to evaluate the final models: accuracy, precision, recall, and F1-score. This evaluation process begins with confusion matrices that rank the models using TP, FP, TN, and FN, where:

TP indicates the number of occurrences where the actual class was accurately predicted.
FN shows how many times the real class was wrongly predicted as some other class.
TN displays the quantity of records correctly classified as normal.
FP is the number of times a different class was wrongly predicted as the real class.
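The permutation-based factorisation described here - T! orderings of a length-T sequence, with each token conditioned on its predecessors in the sampled order - can be made concrete with a toy sketch; the three-token sequence is invented for illustration and only the conditioning structure (not an actual model) is shown.

```python
from itertools import permutations

# Toy illustration of permutation language modelling: for T tokens
# there are T! factorisation orders (the set Z_T), and each token is
# predicted from the tokens preceding it in the sampled order, not in
# the original left-to-right order.
tokens = ["the", "movie", "rocks"]  # T = 3
T = len(tokens)

Z_T = list(permutations(range(T)))
print(len(Z_T))  # 3! = 6 factorisation orders

# Show the conditioning structure p(x_{z_t} | x_{z_<t}) for one order.
z = Z_T[1]  # the order (0, 2, 1)
for t in range(T):
    context = [tokens[p] for p in z[:t]]
    print(f"predict {tokens[z[t]]!r} given {context}")
```

Averaging the log-likelihood over all such orders lets every token eventually see context from both sides, which is how the objective in Equation (1) captures bidirectional context while remaining autoregressive.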
Precision: According to formula (3), precision is the percentage of predicted positive observations that are actually positive:

\text{Precision} = \frac{TP}{TP + FP} \quad (3)

Recall: Recall is the percentage of actual positive observations that are properly detected, as computed using Eq. (4):

\text{Recall} = \frac{TP}{TP + FN} \quad (4)

The purpose of these matrices is to facilitate the evaluation of various ML models on the Twitter dataset.

IV. RESULT ANALYSIS AND DISCUSSION

The experiment is conducted on the Twitter dataset. The Large Language Model (XLNet) is applied to this Twitter data and compared (see Table 3) with the KNN [71], Random Forest [72], and XGBoost [73] models. Table 2 displays the outcomes of sentiment analysis using the LLM model.
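The metric pipeline, from raw TP/FP/TN/FN counts to the precision and recall scores of Equations (3) and (4), can be sketched as follows; the label lists are made-up examples, not the paper's data.

```python
# Sketch: deriving precision, recall, accuracy and F1 from TP/FP/TN/FN,
# using made-up binary labels (not the paper's actual confusion matrix).
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)                      # Eq. (3)
recall    = tp / (tp + fn)                      # Eq. (4)
accuracy  = (tp + tn) / (tp + fp + tn + fn)
f1        = 2 * precision * recall / (precision + recall)

print(tp, fp, tn, fn)  # 3 1 3 1
print(precision, recall, accuracy, f1)  # 0.75 0.75 0.75 0.75
```

For the multi-class case (positive/negative/neutral), the same counting is repeated once per class, treating that class as "positive" and all others as "negative".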
The bar graph in Figure 4 displays the performance of the XLNet model on the Twitter data. In this graph, the XLNet model achieves an accuracy of 99.54%, precision of 99%, recall of 99%, and F1-score of 99% for sentiment analysis, reflecting its excellent overall results in classification tasks.

Figure 5 is a line plot showing the accuracy of XLNet during training and validation across epochs. Accuracy is displayed on the y-axis, while the number of epochs is displayed on the x-axis. The plot shows fluctuation in validation accuracy around epochs 2-3, with a notable dip, while training accuracy remains relatively stable.

Figure 6 shows the loss plot for XLNet during training. The x-axis spans epochs 0 to 4, and the y-axis represents loss values from approximately 0.01 to 0.03. This graph is useful for tracking the model's learning progress and spotting signs of under- or overfitting.