Twitter and Emotions: Exploring Sentiment Detection

1st José Carmen Morales Castro
Departamento de Estudios Multidisciplinarios
Universidad de Guanajuato
Yuriria, Guanajuato, México
[email protected]

2nd Rafael Guzmán Cabrera (corresponding author)
Departamento de Ingeniería Eléctrica
Universidad de Guanajuato
Salamanca, Guanajuato, México
[email protected]

3rd Tirtha Prasad Mukhopadhyay
Departamento de Ingeniería Eléctrica
Universidad de Guanajuato
Salamanca, Guanajuato, México
[email protected]

4th John R. Baker
University of Economics and Finance, Vietnam
Shinawatra University, Thailand
[email protected]

Abstract—This paper investigates human expression on Twitter by applying sentiment analysis through natural language processing and machine learning techniques. The study demonstrates that classifiers such as SVM, Naive Bayes, and Decision Trees achieve high accuracy in sentiment classification, with SVM performing the best across different preprocessing stages. Results are presented in comparative tables and graphs, showing that SVM obtained a precision of 79.79% for the 1000-tweet dataset and 83.76% for the 5000-tweet dataset. These findings illustrate the effectiveness of combining base classifiers, lexical resources, and deep learning techniques in identifying and categorizing tweet content. The demonstrated results contribute to the development of automated tools for extracting information from unstructured text, which is crucial for decision-making based on relevant and precise data.

Keywords—Natural Language Preprocessing, Sentiment Analysis, Machine Learning.

I. INTRODUCTION

Within the context of the exponential growth of social networks, Twitter stands out as a virtual space where millions of users share their opinions, emotions, and experiences in real time. This platform offers a unique window into understanding how people relate to their surroundings, which is why sentiment analysis has become essential for understanding the complexity of human expression in the digital world [1]. However, sentiment analysis on social networks like Twitter presents various challenges due to the nature of the messages. To overcome this ambiguity, employing a multifaceted approach that combines different techniques and methodologies is vital.

One example of these techniques is using base classifiers and lexical resources, which provide a foundation for identifying sentiments and categorizing tweets as positive, negative, or neutral. This can significantly facilitate the initial processing of the data [2]. The work also involves a meta-classifier that integrates multiple models and approaches to generate more robust and reliable predictions about a tweet's sentiment. Additionally, including a deep learning technique allows us to explore complex and non-linear data patterns.

This study introduces an innovative approach by integrating a meta-classifier combining Support Vector Machines (SVM), Naive Bayes, and Decision Trees, demonstrating a significant improvement in sentiment detection accuracy compared to traditional methods. In addition, advanced deep learning techniques such as convolutional neural networks (CNNs) are implemented using WekaDeeplearning4j, enabling efficient scalability on large datasets and highlighting the effectiveness of these approaches in social network contexts such as Twitter.

II. PROBLEM STATEMENT

A key challenge is automatically identifying sentiment in unstructured texts, particularly tweets, using an architecture that combines base classifiers and lexical resources. To address this, we developed automated tools to extract subjective information (opinions or feelings) from natural language texts. This process allows for the generation of structured, processable knowledge for decision-making systems, enabling a better understanding of users' perceptions and facilitating the adoption of strategic measures based on accurate, relevant information.

III. METHODS

To address sentiment in unstructured texts (tweets), we began with an exhaustive review of related work, building upon methodologies previously used by [1] and [3], to identify the different types of classifiers, methodologies, and evaluation metrics. This culminated in a research design that allowed us to address the task competitively and efficiently (Fig. 1).

Fig. 1. Methodology implemented.

In the next stage, we selected a suitable database and performed data preprocessing, applying techniques drawn from related research. We utilized a dataset of approximately 163,000 manually labeled tweets, categorized by polarity as positive, negative, or neutral. These tweets were sourced from an archived dataset [see 6].
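As a minimal illustration of this dataset-preparation step — using a hypothetical stand-in corpus, since the archived dataset itself is not reproduced here — drawing fixed-size labeled subsets and tallying their polarity distribution might look as follows:

```python
import random
from collections import Counter

# Hypothetical stand-in for the archived corpus of ~163,000 manually
# labeled tweets: each entry is (text, polarity), with polarity encoded
# as 1 (positive), 0 (neutral), or -1 (negative).
random.seed(0)
corpus = [(f"tweet {i}", random.choice([1, 0, -1])) for i in range(163_000)]

def make_subset(data, size, seed=42):
    """Draw a reproducible random subset of the labeled corpus."""
    rng = random.Random(seed)
    return rng.sample(data, size)

# The two experimental subsets used for comparison.
subset_1k = make_subset(corpus, 1_000)
subset_5k = make_subset(corpus, 5_000)

# Tally how many tweets of each polarity each subset contains.
dist_1k = Counter(label for _, label in subset_1k)
dist_5k = Counter(label for _, label in subset_5k)
```

The fixed seed only serves reproducibility of the sketch; the subset sizes (1,000 and 5,000) match those used in the experiments.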



Our study builds on this previous work by expanding the scope through the introduction of new deep learning models, such as convolutional neural networks (CNNs), and by analyzing additional dimensions of sentiment classification. Furthermore, we enhanced the methodological framework by incorporating additional machine learning models and refining the preprocessing techniques. This extension enables us to explore more advanced methodologies and provides deeper insights into how various approaches affect the accuracy and scalability of sentiment analysis tasks, particularly when working with large datasets. In this case, the aim was to evaluate perceptions of public figures on social networks during a national event (the 2019 general elections in India) and to automatically classify these opinions, facilitating adjustments to public figures' messaging or moderating discourse based on public sentiment.

The texts were labeled with opinion values: (0) for neutral, (1) for positive, and (-1) for negative. After identifying the dataset, we divided it into two subsets for the analysis: one consisting of 1,000 tweets and the other of 5,000 tweets. The choice of these specific sizes was to facilitate comparison between a smaller and a larger dataset, allowing us to observe the impact of dataset size on model performance. For both subsets, all messages were filtered using relevant keywords, including hashtags and stems. This process was part of our experimental design to ensure consistency and relevance in the data utilized for sentiment analysis.

Afterward, convolutional neural networks (CNNs) were used to compare the results. In this context, we chose to implement CNNs in the Weka platform using the WekaDeeplearning4j extension, which is based on the library of the same name; the procedure begins with the installation of the extension in the platform. The WekaDeeplearning4j extension offers a graphical interface for configuring, training, and evaluating deep learning models. It can extract spatial features from data and offers an API for integration with Java applications. A key feature is its ability to leverage GPUs and distributed clusters, significantly speeding up model training and inference, especially with large datasets [4].

In the next step, once the corpus was obtained, data preprocessing (Fig. 2) was performed. For this stage, a series of steps were taken to standardize the structure of all the tweets in the corpus, facilitating their interpretation during processing.

Fig. 2. Steps to follow within preprocessing.

Five steps were taken to carry out the preprocessing. First, stopwords (empty words, i.e., those that have no meaning by themselves) were removed [2]. Then, uppercase letters were converted to lowercase to homogenize the corpus. Afterward, the tweets were tokenized, segmenting the text into phrases or words. Next, the tweets were lemmatized to reduce morphological variability and improve the accuracy of text processing. Finally, the information gain technique was applied to measure the relevance of each attribute within the dataset.

Following the preprocessing phase, both datasets were divided into four files, each with a different preprocessing stage. We termed the first set the baseline; this file received no additional preprocessing, keeping the tweets in their original form without stopword removal, lemmatization, or information gain. For the second set, full preprocessing was carried out: eliminating stopwords, lemmatizing, and applying information gain, resulting in 352 selected attributes for the 1000-tweet set and 637 attributes for the 5000-tweet set. For the third set, only the information gain technique was applied to select the most relevant attributes, resulting in 422 attributes for the 1000-tweet set and 2399 attributes for the 5000-tweet set. Finally, for the fourth set, stopwords were eliminated and lemmatization was applied, without information gain, generating 3319 attributes for the 1000-tweet set and 597 attributes for the 5000-tweet set.

Our research used two classification scenarios. The first was 10-fold cross-validation, a method for evaluating predictive models and preventing overfitting: the model is trained on a subset of the data and validated on the remaining folds, and the process is repeated ten times to ensure an accurate evaluation [5]. The second scenario used training and testing sets, where the dataset is divided in two, one part for training and one for testing. Most of the data were used in training the model, while a smaller portion was allocated for testing and evaluating its performance. The training set adjusts the model using selected data to improve its accuracy on new data, and the testing set evaluates the model's performance and guards against overfitting by comparing predictions with actual classifications [6].

In both cases, supervised learning methods were used to classify the comments according to their corresponding label. These techniques included (1) Support Vector Machines (SVM), a learning-based method for classification and regression built on training and resolution phases, the result of which is a proposed output for an established problem [7]; (2) Naive Bayes (NB), a classifier that calculates the probability of an event given prior information, based on Bayes' theorem with naive independence assumptions [8]; and (3) Decision Trees (the J48 algorithm), a machine learning algorithm that builds decision trees for classification by selecting, at each node, the feature with the highest discrimination capacity to divide the dataset into subsets [9]. These techniques proved effective in achieving precise class separation and obtaining high performance in comment classification.
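To make the Naive Bayes computation in (2) concrete, the following is a minimal word-count sketch built from first principles (illustration only; the study itself used Weka's classifiers, not this code, and the toy vocabulary is hypothetical):

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label) pairs.
    Returns class priors, per-class word counts, and the vocabulary."""
    priors = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return priors, word_counts, vocab

def predict_nb(priors, word_counts, vocab, tokens):
    """Pick the label maximizing log P(label) + sum_w log P(w | label),
    with add-one (Laplace) smoothing for unseen words."""
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for label, prior in priors.items():
        n_label = sum(word_counts[label].values())
        score = math.log(prior / total)
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / (n_label + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy labeled tweets: 1 = positive, -1 = negative.
train = [
    (["love", "this", "great"], 1),
    (["happy", "great", "day"], 1),
    (["hate", "this", "awful"], -1),
    (["terrible", "awful", "day"], -1),
]
priors, counts, vocab = train_nb(train)
print(predict_nb(priors, counts, vocab, ["great", "day"]))   # 1
print(predict_nb(priors, counts, vocab, ["awful", "this"]))  # -1
```

The log-space sum avoids numerical underflow when tweets contain many tokens, which is the standard way the NB product of probabilities is implemented in practice.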
SVM was chosen because of its ability to handle nonlinear data and its effectiveness in classifying short texts such as tweets, where features are not always easily separable. Naive Bayes was selected for its simplicity and speed of training, making it ideal for processing large volumes of data quickly. Decision Trees (J48) provide a clear visualization of decision making, which is useful for understanding what factors influence classification. The meta-classifier was implemented to combine the advantages of these approaches, thus optimizing accuracy and reducing the error rate.

Precision was used as the evaluation metric, a performance measure applied to data retrieved from a set, corpus, or sample space. It is also termed positive predictive value, representing the proportion of relevant retrieved instances, as indicated in Eq. 1:

Precision = tp / (tp + fp)    (1)

where "tp" corresponds to a true positive value and "fp" to a false positive value [10].

Finally, as an additional step, a meta-classifier was implemented that combined the three learning techniques with the best percentage of accuracy obtained in the experiments: SVM, Naive Bayes, and Decision Trees.

The results were presented in tables and comparative graphs, showing the best outcomes for each set under both classification scenarios from the Weka platform. These highlighted the highest precision values, a key performance metric (positive predictive value) that represents the proportion of relevant instances among those retrieved, indicating the percentage of correctly classified instances.

IV. RESULTS

This study aimed to investigate sentiment analysis on Twitter using various machine learning techniques, particularly Support Vector Machines (SVM), Naive Bayes, and Decision Trees. We conducted experiments on two datasets of 1,000 and 5,000 tweets, employing different preprocessing techniques.

Our key findings indicate that SVM consistently outperformed other classifiers across various preprocessing methods and dataset sizes. The preprocessing approach utilizing information gain yielded the best results for both datasets. Additionally, cross-validation and training/testing scenarios revealed similar trends in performance, while a meta-classifier combining SVM, Naive Bayes, and Decision Trees improved overall accuracy.

The detailed results are presented in the following figures.

Fig. 3. Comparison of 1000 data Cross-Validation.

Fig. 4. Comparison of 1000 data Training and Testing Sets.

The following figures show the results obtained for the dataset containing 5000 tweets.

Fig. 5. Comparison of 5000 data Cross-Validation.

Fig. 6. Comparison of 5000 data Training and Testing Sets.

Analysis of the results reveals that the SVM classifier outperforms the other methods in terms of accuracy, particularly when advanced preprocessing techniques were applied. This is due to SVM's ability to handle nonlinear features and maximize the margin of separation between classes, which is crucial in an environment such as Twitter, where texts are short and often ambiguous. By applying a meta-classifier composed of SVM, Naive Bayes, and Decision Trees, an increase in precision was achieved, demonstrating the effectiveness of combining complementary approaches to improve model robustness.

However, we identified some limitations, especially in the classification of tweets with mixed or ambiguous sentiments.
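The paper does not state how the meta-classifier combines the three base learners; assuming a simple majority vote (one plausible scheme, not necessarily the one used), and applying the precision definition of Eq. 1, a sketch could be:

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-instance predictions from several base classifiers
    (e.g., SVM, Naive Bayes, J48) by majority vote; three-way ties fall
    back to the first classifier's prediction."""
    combined = []
    for preds in zip(*prediction_lists):
        top, n = Counter(preds).most_common(1)[0]
        combined.append(top if n > 1 else preds[0])
    return combined

def precision(y_true, y_pred, positive=1):
    """Eq. 1: precision = tp / (tp + fp) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy per-tweet predictions from three hypothetical base classifiers,
# using the paper's label encoding: 1 positive, 0 neutral, -1 negative.
svm = [1, -1, 0, 1, 1]
nb = [1, -1, 1, 1, -1]
j48 = [1, 0, 0, 1, 1]
true = [1, -1, 0, 1, -1]

meta = majority_vote(svm, nb, j48)
print(meta)                   # [1, -1, 0, 1, 1]
print(precision(true, meta))  # 2/3: two of the three predicted positives are correct
```

Note that precision is computed per class here (positive by default); the percentages reported above aggregate over the whole dataset.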
Even the use of convolutional neural networks (CNNs) did not present significant improvements in these cases, suggesting the need to develop specific techniques to address this complexity. This aspect will be explored in future work, in order to optimize the classification of texts with different tonalities.

Our results demonstrate that sentiment analysis plays a crucial role in extracting information from unstructured texts on Twitter, generating structured knowledge useful for decision-making.

The Support Vector Machine (SVM) algorithm consistently obtained the best performance across all experiments, particularly in the preprocessing that included information gain. For the 1000-tweet dataset, this preprocessing resulted in 422 attributes, while for the 5000-tweet dataset, it yielded 637 attributes.

V. DISCUSSION AND CONCLUSION

This study demonstrates that classifiers such as SVM, Naive Bayes, and Decision Trees can achieve high accuracy in sentiment classification. The strong performance of these classifiers, especially when combined in a meta-classifier, contributes to the development of automated tools for extracting information from unstructured text. These tools can aid in decision-making processes by providing relevant and precise data derived from social media sentiments.

The sentiment analysis techniques explored in this study have significant implications for various fields. In product development, they enable companies to understand user reactions for data-driven improvements. In marketing, these tools help gauge public sentiment toward campaigns, allowing for real-time strategy adjustments. In political analysis, they provide insights into public opinion, benefiting campaign managers and policymakers. Additionally, organizations can improve customer service by automating the categorization and prioritization of feedback. Finally, they have important implications for linguistic education, particularly in preparing students who intend to enter these areas.

However, this study has limitations that future research should address. The dataset's focus on specific political figures suggests a need for broader topic exploration to test the generalizability of the methods. Further studies could also examine the effectiveness of these techniques in different languages, analyze temporal aspects, compare traditional and deep learning approaches more comprehensively, and develop systems for real-time sentiment analysis.

For future research, we plan to expand the analysis to tweets in different languages, to evaluate the adaptability and robustness of the model in multilingual contexts. In addition, we plan to explore the use of this architecture for real-time analysis, monitoring live events such as elections or product launches, which will allow strategies to be adjusted dynamically and efficiently according to audience reactions.

REFERENCES

1. Berrar, D. (2019). Cross-validation.
2. Bowers, A. J., & Zhou, X. (2019). Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. Journal of Education for Students Placed at Risk, 24(1), 20-46.
3. Castro, J. C. M., Carrillo, L. M. L., & Cabrera, R. G. Identificación de polaridad en Twitter usando validación cruzada.
4. Jianqiang, Z., & Xiaolin, G. (2017). Comparison research on text pre-processing methods on Twitter sentiment analysis. IEEE Access, 5, 2870-2879.
5. Lang, S., Bravo-Marquez, F., Beckham, C., Hall, M., & Frank, E. (2019). WekaDeeplearning4j: A deep learning package for Weka based on Deeplearning4j. Knowledge-Based Systems, 178, 48-50.
6. Morales-Castro, J. C., Pérez-Crespo, J. A., Prasad-Mukhopadhyay, T., & Guzmán-Cabrera, R. (2022). Automatic identification of sentiment in unstructured text. 6(15).
7. Morales-Castro, J. C., Ruiz-Pinales, J., Lozano-García, J. M., & Guzmán-Cabrera, R. (2022). Use of image processing for the detection of Parkinson's disease. 6-16.
8. Morales Castro, W., & Guzmán Cabrera, R. (2020). Tuberculosis: Diagnóstico mediante procesamiento de imágenes. Computación y Sistemas, 24(2), 875-882.
9. Santana Mansilla, P. F., Costaguta, R. N., & Missio, D. (2014). Aplicación de Algoritmos de Clasificación de Minería de Textos para el Reconocimiento de Habilidades de E-tutores Colaborativos.
10. Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2005). Data Mining: Practical Machine Learning Tools and Techniques.
