Twitter and Emotions: Exploring Sentiment Detection
1st José Carmen Morales Castro
Departamento de Estudios Multidisciplinarios
Universidad de Guanajuato
Yuriria, Guanajuato; México
[email protected]

2nd Rafael Guzmán Cabrera (corresponding author)
Departamento de Ingenieria Electrica
Universidad de Guanajuato
Salamanca, Guanajuato; México
[email protected]

3rd Tirtha Prasad Mukhopadhyay
Departamento de Ingenieria Electrica
Universidad de Guanajuato
Salamanca, Guanajuato; México
[email protected]
Abstract—This paper investigates human expression on Twitter by applying sentiment analysis through natural language processing and machine learning techniques. The study demonstrates that classifiers such as SVM, Naive Bayes, and Decision Trees achieve high accuracy in sentiment classification, with SVM performing the best across different preprocessing stages. Results are presented in comparative tables and graphs, showing that SVM obtained a precision of 79.79% for the 1000-tweet dataset and 83.76% for the 5000-tweet dataset. These findings illustrate the effectiveness of combining base classifiers, lexical resources, and deep learning techniques in identifying and categorizing tweet content. The demonstrated results contribute to the development of automated tools for extracting information from unstructured text, which is crucial for decision-making based on relevant and precise data.

Keywords— Natural Language Preprocessing, Sentiment Analysis, Machine Learning.

I. INTRODUCTION
Within the context of the exponential growth of social networks, Twitter stands out as a virtual space where millions of users share their opinions, emotions, and experiences in real time. This platform offers a unique window into understanding how people relate to their surroundings, which is why sentiment analysis has become essential for understanding the complexity of human expressions in the digital world [1]. However, sentiment analysis on social networks like Twitter presents various challenges due to the nature of the messages. To overcome these challenges, it is vital to employ a multifaceted approach that combines different techniques and methodologies.

One example of these techniques is using base classifiers and lexical resources, which provide a foundation for identifying sentiments and categorizing tweets as positive, negative, or neutral. This can significantly facilitate the initial processing of the data [2]. The work also involves a meta-classifier that integrates multiple models and approaches to generate more robust and reliable predictions about a tweet's sentiment. Additionally, including a deep learning technique allows us to explore complex and non-linear data patterns.

This study introduces an innovative approach by integrating a meta-classifier combining Support Vector Machines (SVMs), Naive Bayes, and Decision Trees, demonstrating a significant improvement in sentiment detection accuracy compared to traditional methods. In addition, advanced deep learning techniques such as convolutional neural networks (CNNs) are implemented using WekaDeeplearning4j, enabling efficient scalability on large datasets and highlighting the effectiveness of these approaches in social network contexts such as Twitter.

II. PROBLEM STATEMENT
A key challenge is automatically identifying sentiment in unstructured texts, particularly tweets, using an architecture that combines base classifiers and lexical resources. To address this, we developed automated tools to extract subjective information (opinions or feelings) from natural language texts. This process allows for the generation of structured, processable knowledge for decision-making systems, enabling a better understanding of users' perceptions and facilitating the adoption of strategic measures based on accurate, relevant information.

III. METHODS
To address sentiment detection in unstructured texts (tweets), we began with an exhaustive review of related work, building upon methodologies previously used by [1] and [3], to identify the different types of classifiers, methodologies, and evaluation metrics. This culminated in a research design that allowed us to address the task competitively and efficiently (Fig. 1).

Fig. 1. Methodology implemented.

In the next stage, we selected a suitable database and performed data preprocessing, applying techniques drawn from related research. We utilized a dataset of approximately 163,000 manually labeled tweets, categorized by polarity as positive, negative, or neutral. These tweets were sourced from an archived dataset [6]. Our study builds on this previous work by expanding the scope through the introduction of new deep learning models, such as convolutional neural networks
Finally, as an additional step, a meta-classifier was implemented that combined the three best learning techniques, selected by the highest accuracy obtained in the experiments: SVM, Naive Bayes, and Decision Trees.

The results were presented in tables and comparative graphs, showing the best outcomes for each set using both classification scenarios from the Weka platform. These highlighted the highest precision values. Precision (positive predictive value) is a key performance metric that represents the proportion of relevant instances among those retrieved, that is, the fraction of instances assigned to a class that actually belong to it.

IV. RESULTS
This study aimed to investigate sentiment analysis on Twitter using various machine learning techniques, particularly Support Vector Machines (SVM), Naive Bayes, and Decision Trees. We conducted experiments on two datasets of 1,000 and 5,000 tweets, employing different preprocessing techniques.

Our key findings indicate that SVM consistently outperformed other classifiers across various preprocessing methods and dataset sizes. The preprocessing approach utilizing information gain yielded the best results for both datasets. Additionally, cross-validation and training/testing scenarios revealed similar trends in performance, while a meta-classifier combining SVM, Naive Bayes, and Decision Trees improved overall accuracy.

The detailed results are presented in the following figures.

The following figures show the results obtained for the dataset containing 5000 tweets.

Fig. 5. Cross-validation comparison for the 5000-tweet dataset.
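As a rough illustration of the information-gain preprocessing mentioned in the results, the following minimal Python sketch scores each term by how much knowing its presence or absence reduces the entropy of the polarity labels. The toy tweets, the set-of-tokens representation, and the function names are assumptions made for this example only; the reported experiments were run in Weka, not with this code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting the tweets on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(
        (len(part) / total) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - remainder


# Toy, made-up tweets as token sets (not taken from the paper's dataset).
docs = [
    {"love", "this", "phone"},
    {"love", "the", "service"},
    {"hate", "this", "phone"},
    {"hate", "the", "delay"},
]
labels = ["positive", "positive", "negative", "negative"]

# Rank the vocabulary from most to least informative term.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda term: information_gain(docs, labels, term),
    reverse=True,
)
```

In this toy data, "love" and "hate" separate the two classes perfectly and therefore rank first, while terms such as "this" appear equally in both classes and carry no information. Keeping only the top-ranked terms is one common way to use such a score for attribute selection.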
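The idea behind a meta-classifier that combines SVM, Naive Bayes, and Decision Trees can be sketched with simple majority voting. The text does not state how the three outputs are combined, so the voting rule, the tie-breaking rule, and the example predictions below are all assumptions chosen for illustration rather than a description of the configuration actually used.

```python
from collections import Counter


def majority_vote(votes, fallback_index=0):
    """Return the label most base classifiers agree on for one tweet.

    `votes` holds one predicted label per base classifier. On a tie, the
    prediction of the classifier at `fallback_index` is returned (an
    illustrative tie-breaking rule, not taken from the paper).
    """
    counts = Counter(votes).most_common()
    top_label, top_count = counts[0]
    tied = [label for label, n in counts if n == top_count]
    return top_label if len(tied) == 1 else votes[fallback_index]


# Hypothetical per-tweet predictions from the three base classifiers.
svm_preds = ["positive", "negative", "neutral", "positive"]
nb_preds = ["positive", "positive", "negative", "negative"]
tree_preds = ["negative", "negative", "positive", "positive"]

# Combine the three predictions tweet by tweet.
combined = [
    majority_vote(list(votes))
    for votes in zip(svm_preds, nb_preds, tree_preds)
]
```

Falling back to the first classifier on a three-way tie reflects the observation that SVM was the strongest single model in these experiments. Other combination rules, such as averaging class probabilities, would be equally plausible.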
REFERENCES
1. Berrar, D. (2019). Cross-validation. In.
2. Bowers, A. J., & Zhou, X. (2019). Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. Journal of Education for Students Placed at Risk (JESPAR), 24(1), 20-46.